Computational Intelligence and Machine Learning: Homework 2
Homework 2
First Part: Comparing three Machine Learning Algorithms (10 marks)
In this first part, you will use Weka to compare three machine learning algorithms: decision trees (J48), a rule set built from a single attribute (OneR), and Naive Bayes (NB), which uses all the attributes equally to make predictions.
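To make the single-attribute idea concrete, here is a minimal Python sketch of a OneR-style learner. It is not Weka's implementation: the data rows are a hand transcription of the standard weather.nominal data set, and the function name `one_r` and the tie-breaking rule (keep the earliest attribute) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hand-transcribed copy of the 14 weather.nominal instances (an assumption:
# check it against your own weather.nominal.arff before relying on it).
# Column order: outlook, temperature, humidity, windy, play.
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
DATA = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def one_r(rows, attribute_names):
    """Return (attribute, rule, training_errors) for a simplified OneR."""
    best = None
    for idx, name in enumerate(attribute_names):
        # Count the class labels seen for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[idx]][row[-1]] += 1
        # Each attribute value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[idx]] != row[-1])
        # Strict "<" keeps the earliest attribute when error counts tie.
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best


attribute, rule, errors = one_r(DATA, ATTRIBUTES)
accuracy = 100.0 * (len(DATA) - errors) / len(DATA)
print(attribute, rule, errors, f"{accuracy:.4f}%")
```

On this transcription the sketch selects outlook with 4 training errors out of 14, which is 10/14 correct.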
1. Load the weather.nominal.arff data and run the Naive Bayes classifier on this data. Look at the top part of the output and answer the following question:
a) How many instances are there?
Answer: There are 14 instances.
b) Go back to Preprocess. Click on Filter Choose. Notice that the filters are organized in a hierarchical fashion. The supervised ones depend on the class value, and the unsupervised ones do not. Within each category, the filter can be based on attribute or instance. Choose unsupervised, instance, RemoveWithValues. Now click on the RemoveWithValues box (next to the Choose button), and use the attributeIndex and the nominalIndices to remove the instances that have Outlook = Overcast. Click on Apply to make the change. Run the Naive Bayes classifier on this version of the data, and give and explain the numbers corresponding to the ones above. For example, you should find now that Class yes (0.5).
Answer: After removing the instances that have Outlook = Overcast, the number of instances drops from 14 to 10. The percentage of correctly classified instances drops from 92.8571% to 90%, and the prior probability reported for class yes decreases from 0.63 to 0.5.
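The instance counts and class priors in this answer can be reproduced with a short Python sketch. It assumes that the reported priors are Laplace-smoothed, (count + 1) / (N + number of classes), which is consistent with the 0.63 and 0.5 values quoted here. The (outlook, play) pairs are a hand transcription of weather.nominal, and `smoothed_prior` is a hypothetical helper, not a Weka function.

```python
# (outlook, play) pairs transcribed by hand from the 14 weather.nominal
# instances (an assumption: verify against your own copy of the file).
INSTANCES = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]


def smoothed_prior(instances, label, num_classes=2):
    """Laplace-smoothed class prior: (count + 1) / (N + num_classes)."""
    count = sum(1 for _, play in instances if play == label)
    return (count + 1) / (len(instances) + num_classes)


# Mimic the effect of RemoveWithValues: drop every Outlook = overcast instance.
filtered = [(outlook, play) for outlook, play in INSTANCES if outlook != "overcast"]

prior_before = smoothed_prior(INSTANCES, "yes")  # (9 + 1) / (14 + 2)
prior_after = smoothed_prior(filtered, "yes")    # (5 + 1) / (10 + 2)

print(len(INSTANCES), prior_before)
print(len(filtered), prior_after)
```

All four overcast instances belong to class yes, so removing them leaves 5 yes and 5 no instances. The unrounded prior before filtering is 0.625, which appears as 0.63 when rounded half up to two decimals.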
2. Reload the original weather.nominal.arff data. What is the accuracy for each of the three classifiers (J48, OneR, NB) when the testing is done on the training data (resubstitution rate)? What is the accuracy using 10-fold cross-validation (10x-cv)? Which is the better estimate of the accuracy of the model?
Answer: When the testing is done on the training data, J48 has the highest accuracy. The accuracy of the J48 classifier is 100%.
The accuracy of the OneR classifier is 71.4286%.
The accuracy of the NB classifier is 92.8571%.
When using 10-fold cross-validation (10x-cv), NB has the highest accuracy. The accuracy of the J48 classifier is 50%.
The accuracy of the OneR classifier is 42.8571%.
The accuracy of the NB classifier is 57.1429%.
The 10-fold cross-validation figure is the better estimate of model accuracy, because resubstitution tests each model on the same instances it was trained on and is therefore optimistically biased.
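The difference between the two evaluation schemes can be sketched in Python. This is a simplified illustration only: Weka also shuffles and stratifies the folds, which this sketch does not do, and the helper names `k_fold_indices` and `accuracy_percent` are hypothetical.

```python
def k_fold_indices(n_instances, k):
    """Split indices 0..n-1 into k folds whose sizes differ by at most one."""
    folds = [[] for _ in range(k)]
    for i in range(n_instances):
        folds[i % k].append(i)
    return folds


def accuracy_percent(correct, total):
    """Accuracy as a percentage of correctly classified instances."""
    return 100.0 * correct / total


folds = k_fold_indices(14, 10)
for test_fold in folds:
    train = [i for fold in folds if fold is not test_fold for i in fold]
    # A classifier would be trained on `train` and evaluated on `test_fold`,
    # so no instance is ever used for both within a single round.
    assert not set(train) & set(test_fold)

fold_sizes = [len(fold) for fold in folds]
print(fold_sizes)

# With 14 instances, each reported percentage corresponds to a whole count.
print(f"{accuracy_percent(13, 14):.4f}%")  # 13 of 14 correct
print(f"{accuracy_percent(8, 14):.4f}%")   # 8 of 14 correct
```

With only 14 instances, every test fold holds one or two instances, so a single misclassification changes the cross-validated accuracy by about 7 percentage points.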
3. Load the weather.arff data, for which humidity and temperature are numeric. What are the two accuracies (training data and 10x-cv) for each of the three algorithms?
Answer: When the testing is done on the training data, the accuracy of the J48 classifier is 100%.
The accuracy of the OneR classifier is 71.4286%.
The accuracy of the NB classifier is 92.8571%.