Chinese Academy of Sciences, Liu Ying, Big Data Mining Course: Homework 2

HW2

Due Date: Nov. 23

Part I: Written Assignment

1.

a)Compute the Information Gain for Gender, Car Type and Shirt Size.

This problem has two classes, C0 and C1.

I(C0,C1)= I(10,10)=1
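
For reference, the notation I(a, b) is never defined in the solution. The sketch below assumes it is the standard two-class entropy for a node holding a records of C0 and b records of C1, with the usual convention that 0 log 0 = 0. Under that assumption the root value follows directly:

$$
I(a,b) = -\frac{a}{a+b}\log_2\frac{a}{a+b} - \frac{b}{a+b}\log_2\frac{b}{a+b},
\qquad
I(10,10) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1
$$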

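$$
\begin{aligned}
\mathrm{Info}_{\mathrm{Gender}}(D) &= \frac{10}{20} I(6,4) + \frac{10}{20} I(4,6) \\
&= \frac{10}{20}\left(-\frac{6}{10}\log_2\frac{6}{10} - \frac{4}{10}\log_2\frac{4}{10}\right) + \frac{10}{20}\left(-\frac{6}{10}\log_2\frac{6}{10} - \frac{4}{10}\log_2\frac{4}{10}\right) \\
&= 0.971
\end{aligned}
$$

$$
\mathrm{Gain}(\mathrm{Gender}) = I(C_0, C_1) - \mathrm{Info}_{\mathrm{Gender}}(D) = 1 - 0.971 = 0.029
$$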

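$$
\begin{aligned}
\mathrm{Info}_{\mathrm{CarType}}(D) &= \frac{4}{20} I(1,3) + \frac{8}{20} I(8,0) + \frac{8}{20} I(1,7) \\
&= \frac{4}{20}\left(-\frac{1}{4}\log_2\frac{1}{4} - \frac{3}{4}\log_2\frac{3}{4}\right) + \frac{8}{20}\cdot 0 + \frac{8}{20}\left(-\frac{1}{8}\log_2\frac{1}{8} - \frac{7}{8}\log_2\frac{7}{8}\right) \\
&= 0.3797
\end{aligned}
$$

$$
\mathrm{Gain}(\mathrm{CarType}) = I(C_0, C_1) - \mathrm{Info}_{\mathrm{CarType}}(D) = 1 - 0.3797 = 0.6203
$$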

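$$
\begin{aligned}
\mathrm{Info}_{\mathrm{ShirtSize}}(D) &= \frac{5}{20} I(3,2) + \frac{7}{20} I(3,4) + \frac{4}{20} I(2,2) + \frac{4}{20} I(2,2) \\
&= \frac{5}{20}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) + \frac{7}{20}\left(-\frac{3}{7}\log_2\frac{3}{7} - \frac{4}{7}\log_2\frac{4}{7}\right) + \frac{4}{10}\left(-\frac{2}{4}\log_2\frac{1}{2} - \frac{2}{4}\log_2\frac{1}{2}\right) \\
&= 0.9876
\end{aligned}
$$

$$
\mathrm{Gain}(\mathrm{ShirtSize}) = I(C_0, C_1) - \mathrm{Info}_{\mathrm{ShirtSize}}(D) = 1 - 0.9876 = 0.0124
$$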
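
The three gains in part a) can be re-derived from the class counts alone. The following is a minimal sketch, assuming the (C0, C1) counts per attribute value that appear in the I(·,·) terms above; the helper names `entropy` and `info_gain` are illustrative and not part of the assignment.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a node given its per-class record counts."""
    total = sum(counts)
    # Empty classes contribute nothing (0 * log 0 is taken as 0).
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(parent, partitions):
    """Parent entropy minus the weighted entropy of the child partitions."""
    total = sum(parent)
    remainder = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent) - remainder

# (C0, C1) counts taken from the I(.,.) terms in part a).
parent = (10, 10)
splits = {
    "Gender": [(6, 4), (4, 6)],
    "CarType": [(1, 3), (8, 0), (1, 7)],
    "ShirtSize": [(3, 2), (3, 4), (2, 2), (2, 2)],
}

gains = {name: info_gain(parent, parts) for name, parts in splits.items()}
for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.4f}")
```

The values agree with the hand calculation to the precision shown there, and CarType has the largest gain.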

b)Construct a decision tree with Information Gain.

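① From part a), CarType has the largest information gain, so it is chosen as the first splitting attribute.

CarType takes the values Luxury, Family and Sports. All Sports records belong to class C0, so that branch needs no further splitting.

② Split the Luxury branch further: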

I(C0,C1)= I(1,7)=0.5436

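$$
\begin{aligned}
\mathrm{Info}_{\mathrm{Gender}}(D) &= \frac{1}{8} I(0,1) + \frac{7}{8} I(1,6) \\
&= 0 + \frac{7}{8}\left(-\frac{1}{7}\log_2\frac{1}{7} - \frac{6}{7}\log_2\frac{6}{7}\right) \\
&= 0.5177
\end{aligned}
$$

$$
\mathrm{Gain}(\mathrm{Gender}) = I(C_0, C_1) - \mathrm{Info}_{\mathrm{Gender}}(D) = 0.5436 - 0.5177 = 0.0259
$$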

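$$
\mathrm{Info}_{\mathrm{ShirtSize}}(D) = \frac{2}{8} I(0,2) + \frac{3}{8} I(0,3) + \frac{2}{8} I(1,1) + \frac{1}{8} I(0,1) = 0.25
$$

$$
\mathrm{Gain}(\mathrm{ShirtSize}) = I(C_0, C_1) - \mathrm{Info}_{\mathrm{ShirtSize}}(D) = 0.5436 - 0.25 = 0.2936
$$

ShirtSize is therefore chosen as the splitting attribute here.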

③ Split the Family branch further:

I(C0,C1)= I(1,3)=0.811

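$$
\mathrm{Gain}(\mathrm{Gender}) = I(C_0, C_1) - \mathrm{Info}_{\mathrm{Gender}}(D) = 0.811 - I(1,3) = 0
$$

$$
\begin{aligned}
\mathrm{Gain}(\mathrm{ShirtSize}) &= I(C_0, C_1) - \mathrm{Info}_{\mathrm{ShirtSize}}(D) \\
&= 0.811 - \frac{1}{4} I(1,0) - \frac{1}{4} I(0,1) - \frac{1}{4} I(0,1) - \frac{1}{4} I(0,1) \\
&= 0.811
\end{aligned}
$$

ShirtSize is therefore chosen as the splitting attribute here.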
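
The choice of attribute at the Luxury and Family nodes can be checked the same way. This is a rough sketch under the assumption that each child node is described by the (C0, C1) counts used in the calculations above; `best_split` is a hypothetical helper that simply returns the candidate with the largest gain.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a node given its per-class record counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(parent, partitions):
    """Parent entropy minus the weighted entropy of the child partitions."""
    total = sum(parent)
    return entropy(parent) - sum(sum(p) / total * entropy(p) for p in partitions)

def best_split(parent, candidates):
    """Return (attribute with the largest gain, dict of all gains)."""
    gains = {attr: info_gain(parent, parts) for attr, parts in candidates.items()}
    return max(gains, key=gains.get), gains

# Luxury node: 1 record of C0 and 7 of C1.
luxury_attr, luxury_gains = best_split((1, 7), {
    "Gender": [(0, 1), (1, 6)],
    "ShirtSize": [(0, 2), (0, 3), (1, 1), (0, 1)],
})

# Family node: 1 record of C0 and 3 of C1; Gender does not split it at all.
family_attr, family_gains = best_split((1, 3), {
    "Gender": [(1, 3)],
    "ShirtSize": [(1, 0), (0, 1), (0, 1), (0, 1)],
})

print("Luxury:", luxury_attr, luxury_gains)
print("Family:", family_attr, family_gains)
```

Both nodes select ShirtSize, which matches the hand calculation.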

④ From the calculations above, the decision tree for this problem is as follows:

CarType
  Sports: C0
  Family: ShirtSize
    Small: C0
    Medium: C1
    Large: C1
    Extra Large: C1
  Luxury: ShirtSize
    Small: C1
    Medium: C1
    Large: C0 / C1 (one record of each class)
    Extra Large: C1

The same tree with branches of the same class merged:

CarType
  Sports: C0
  Family: ShirtSize
    Small: C0
    Other: C1
  Luxury: ShirtSize
    Large: C0 / C1
    Other: C1

2.

(a) Design a multilayer feed-forward neural network (one hidden layer) for the data set in Q1. Label the nodes in the input and output layers.

From the attributes of the data set, the input layer has 8 nodes:

x1 Gender ( Gender = M: x1 = 1; Gender = F: x1 = 0 )

x2 Car Type = Sports ( Y = 1; N = 0)

x3 Car Type = Family( Y = 1; N = 0)

x4 Car Type = Luxury ( Y = 1; N = 0)

x5 Shirt Size = Small ( Y = 1; N = 0)

x6 Shirt Size = Medium ( Y = 1; N = 0)

x7 Shirt Size = Large ( Y = 1; N = 0)

x8 Shirt Size = Extra Large ( Y = 1; N = 0)

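The hidden layer has three nodes: x9, x10 and x11. The output is a two-class problem, so there is only one output node, x12 (C0 = 1; C1 = 0).

The network diagram uses the following notation: Wij denotes the weight from input-layer node i to hidden-layer node j (for ease of computation, the weights from node i to nodes 9, 10 and 11 are set equal), and Wi-j denotes the weight from hidden-layer node i to the output-layer node.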
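
The 8-3-1 architecture described above can be sketched as a single forward pass. This is only an illustration: the sigmoid activation, the zero biases and every weight value below are assumptions, since the solution does not give them, and the record ("M", "Family", "Small") is just a sample input.

```python
from math import exp

CAR_TYPES = ["Sports", "Family", "Luxury"]                 # x2..x4
SHIRT_SIZES = ["Small", "Medium", "Large", "Extra Large"]  # x5..x8

def encode(gender, car_type, shirt_size):
    """Map one record to the 8 binary inputs x1..x8."""
    x = [1.0 if gender == "M" else 0.0]
    x += [1.0 if car_type == c else 0.0 for c in CAR_TYPES]
    x += [1.0 if shirt_size == s else 0.0 for s in SHIRT_SIZES]
    return x

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer (x9..x11) followed by a single output node (x12)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

# Hypothetical weights. Each input node uses the same weight towards all
# three hidden nodes, mirroring the simplification stated in the text.
w_in = [0.1, -0.2, 0.3, 0.1, -0.1, 0.2, 0.05, -0.3]
w_hidden = [list(w_in) for _ in range(3)]
b_hidden = [0.0, 0.0, 0.0]   # assumed: no bias is mentioned in the source
w_out = [0.2, -0.1, 0.3]     # hypothetical hidden-to-output weights
b_out = 0.0

x = encode("M", "Family", "Small")
y = forward(x, w_hidden, b_hidden, w_out, b_out)
print(x, y)  # y near 1 would be read as C0, near 0 as C1
```

With equal input weights the three hidden nodes compute the same value, so in practice the weights would be initialised differently before training.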
