Data Mining 1


Part I:

1. Suppose that the data for analysis include the attribute age. The age values for the data tuples are (in increasing order):

13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.

(a) Use min-max normalization to transform the value 35 for age onto the range [0.0, 1.0].

(b) Use z-score normalization to transform the value 35 for age, where the standard deviation of age is 12.94 years.

(c) Use normalization by decimal scaling to transform the value 35 for age.

(d) Comment on which method you would prefer to use for the given data, giving reasons as to why.

(a) Given that the minimum age value is 13 and the maximum value is 70, we can transform the value 35 for age onto the range [0.0, 1.0] by min-max normalization as follows:

$$v' = \frac{v - min_{age}}{max_{age} - min_{age}}\,(new\_max_{age} - new\_min_{age}) + new\_min_{age} = \frac{35 - 13}{70 - 13}\,(1 - 0) + 0 \approx 0.39$$
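The min-max calculation above can be checked with a short Python sketch. The function name `min_max_normalize` and its parameter names are illustrative choices, not part of the original exercise; the age list is copied from the problem statement.

```python
# Age values from the problem statement (27 tuples, increasing order).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def min_max_normalize(v, values, new_min=0.0, new_max=1.0):
    """Map v from [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return (v - lo) / (hi - lo) * (new_max - new_min) + new_min


print(round(min_max_normalize(35, ages), 2))  # 0.39
```

Note that a value outside [13, 70] would map outside [0.0, 1.0], which is the out-of-range issue raised in part (d).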

(b) Given that the standard deviation of age is 12.94 years, we may use z-score normalization to transform the value 35 for age:

$$\overline{age} = \frac{809}{27} = 29.96$$

$$v' = \frac{v - \overline{age}}{\sigma_{age}} = \frac{35 - 29.96}{12.94} = 0.39$$
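As a rough check of the z-score arithmetic, the following Python sketch recomputes the mean from the age list and uses the standard deviation of 12.94 exactly as given in the problem, rather than recomputing it. The names `z_score` and `STD_AGE` are illustrative.

```python
# Age values from the problem statement (27 tuples, sum 809).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

STD_AGE = 12.94  # standard deviation as stated in the problem, not recomputed


def z_score(v, mean, std):
    """Number of standard deviations by which v lies above the mean."""
    return (v - mean) / std


mean_age = sum(ages) / len(ages)  # 809 / 27
print(round(mean_age, 2))                        # 29.96
print(round(z_score(35, mean_age, STD_AGE), 2))  # 0.39
```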

(c) By decimal scaling normalization, we transform the value 35 for age as

$$v' = \frac{v}{10^{j}} = \frac{35}{100} = 0.35,$$

where j = 2 is the smallest integer such that max(|v'|) < 1.
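One possible way to find j programmatically is sketched below. The helper name `decimal_scaling_exponent` is hypothetical, and the search starts at j = 0, which assumes the data contain at least one value with absolute value of 1 or more (true for the age data).

```python
# Age values from the problem statement.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def decimal_scaling_exponent(values):
    """Smallest integer j >= 0 such that max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return j


j = decimal_scaling_exponent(ages)
print(j)             # 2
print(35 / 10 ** j)  # 0.35
```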

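(d) I would prefer normalization by decimal scaling. With min-max normalization, a future input value that falls outside the original range of age may cause an “out-of-bounds” error, while z-score normalization requires additionally computing and storing two parameters, the mean and the standard deviation. Since the age values almost never exceed two digits, j can be fixed at 2; that is, every value is simply divided by 100.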

2. A database has four transactions. Let min_sup = 60% and min_conf = 80%.

(a) At the granularity of item_category (e.g., item_i could be “milk”), for the following rule template,

∀X ∈ transaction, buys(X, item_1) ∧ buys(X, item_2) ⇒ buys(X, item_3) [s, c]

list the frequent k-itemset for the largest k and all of the strong association rules (with their support s and confidence c) containing the frequent k-itemset for the largest k.

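The relative minimum support is min_sup = 60%, so the absolute minimum support count is 0.6 × 4 = 2.4; that is, an itemset must appear in at least 3 of the 4 transactions.

The nonempty proper subsets of the frequent 3-itemset {milk, cheese, bread} are {milk, cheese}, {cheese, bread}, {milk, bread}, {milk}, {cheese}, and {bread}. The resulting association rules, each listed with its confidence, are:

milk ∧ cheese ⇒ bread, confidence = 3/3 = 100%
cheese ∧ bread ⇒ milk, confidence = 3/3 = 100%
milk ∧ bread ⇒ cheese, confidence = 3/4 = 75%
milk ⇒ bread ∧ cheese, confidence = 3/4 = 75%
cheese ⇒ milk ∧ bread, confidence = 3/3 = 100%
bread ⇒ milk ∧ cheese, confidence = 3/4 = 75%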

Given the minimum confidence min_conf = 80%, the strong association rules are:

∀X ∈ transaction, buys(X, milk) ∧ buys(X, cheese) ⇒ buys(X, bread) [s = 75%, c = 100%]
∀X ∈ transaction, buys(X, cheese) ∧ buys(X, bread) ⇒ buys(X, milk) [s = 75%, c = 100%]
∀X ∈ transaction, buys(X, cheese) ⇒ buys(X, milk) ∧ buys(X, bread) [s = 75%, c = 100%]
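The rule filtering can be reproduced with a small Python sketch. The transaction table itself is not reproduced in this document, so the support counts below are an assumption read off the numerators and denominators of the confidence fractions in the answer (for example, 3/4 for milk ∧ bread ⇒ cheese implies {milk, bread} occurs in 4 transactions). The function name `strong_rules` is illustrative.

```python
import math
from itertools import combinations

N_TRANSACTIONS = 4
MIN_SUP, MIN_CONF = 0.60, 0.80

# Assumed support counts, inferred from the confidence fractions in the answer.
support = {
    frozenset({"milk"}): 4,
    frozenset({"cheese"}): 3,
    frozenset({"bread"}): 4,
    frozenset({"milk", "cheese"}): 3,
    frozenset({"cheese", "bread"}): 3,
    frozenset({"milk", "bread"}): 4,
    frozenset({"milk", "cheese", "bread"}): 3,
}

# 0.6 * 4 = 2.4, so an itemset needs a support count of at least 3.
min_count = math.ceil(MIN_SUP * N_TRANSACTIONS)


def strong_rules(itemset, support, n, min_conf):
    """Return (antecedent, consequent, s, c) for every rule A => itemset - A
    whose confidence support(itemset) / support(A) reaches min_conf."""
    itemset = frozenset(itemset)
    s = support[itemset] / n
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            a = frozenset(antecedent)
            c = support[itemset] / support[a]
            if c >= min_conf:
                rules.append((tuple(sorted(a)), tuple(sorted(itemset - a)), s, c))
    return rules


for a, b, s, c in strong_rules({"milk", "cheese", "bread"}, support,
                               N_TRANSACTIONS, MIN_CONF):
    print(f"{' ^ '.join(a)} => {' ^ '.join(b)}  [s = {s:.0%}, c = {c:.0%}]")
```

Under these assumed counts, the sketch yields the same three strong rules as listed above, each with s = 75% and c = 100%.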

(b) At the granularity of brand-item_category (e.g., item_i could be “sunset-milk”), for the following rule template,

∀X ∈ customer, buys(X, item_1) ∧ buys(X, item_2) ⇒ buys(X, item_3)

list the frequent k-itemset for the largest k. Note: do not print any rule.

From L_1 it can be seen that the largest k is 1, so the frequent k-itemset is l = {Wonder-bread}.

3. When mining cross-level association rules, suppose it is found that the itemset
