《白话机器学习算法》: Data Sources and References

k-Means Clustering: Personality Traits of Facebook Users

Stillwell, D., & Kosinski, M. (2012). myPersonality Project: .

Kosinski, M., Matz, S., Gosling, S., Popov, V., & Stillwell, D. (2015). Facebook as a Social Science Research Tool: Opportunities, Challenges, Ethical Considerations and Practical Guidelines. American Psychologist.

Principal Component Analysis: Nutritional Composition of Food

United States Department of Agriculture (2015). USDA Food Composition Databases:

Association Rules: Grocery Store Data

The dataset is included in the following R package: Hahsler, M., Buchta, C., Gruen, B., & Hornik, K. (2016). arules: Mining Association Rules and Frequent Itemsets. R package version 1.5-0.

Hahsler, M., Hornik, K., & Reutterer, T. (2006). Implications of Probabilistic Data Modeling for Mining Association Rules. In Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., & Gaul, W. (Eds.), From Data and Information Analysis to Knowledge Engineering, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 598-605. Berlin, Germany: Springer-Verlag.

Hahsler, M., & Chelluboina, S. (2011). Visualizing Association Rules: Introduction to the R-extension Package arulesViz. R Project Module, 223-238.

Regression Analysis: Predicting House Prices

Harrison, D., & Rubinfeld, D. (1978). Hedonic Prices and the Demand for Clean Air. Journal of Environmental Economics and Management, 5, 81-102.

k-Nearest Neighbors: Chemical Composition of Wine

Forina, M., et al. (1998). Wine Recognition: .

Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling Wine Preferences by Data Mining from Physicochemical Properties. Decision Support Systems, 47(4), 547-553.

Support Vector Machine: Predicting Heart Disease

Robert Detrano (M.D., Ph.D), from Virginia Medical Center, Long Beach and Cleveland Clinic Foundation (1988). Heart Disease Database (Cleveland) [Data file and description]: +Disease.

Detrano, R., et al. (1989). International Application of a New Probability Algorithm for the Diagnosis of Coronary Artery Disease. The American Journal of Cardiology, 64(5), 304-310.

Decision Tree: Titanic Passenger Data

Report on the Loss of the 'Titanic' (S.S.) (1990). British Board of Trade Inquiry Report (reprint), Gloucester, UK: Allan Sutton Publishing. The data are discussed in Dawson, R. J. M. (1995). The 'Unusual Episode' Data Revisited. Journal of Statistics Education, 3(3).

Random Forest: San Francisco Crime Incident Data

SF OpenData, City and County of San Francisco (2016). Crime Incidents.

Random Forest: San Francisco Weather

National Oceanic and Atmospheric Administration, National Centers for Environmental Information (2016). Quality Controlled Local Climatological Data (QCLCD).

Neural Network: Handwritten Digits

LeCun, Y., & Cortes, C. (1998). The MNIST Database of Handwritten Digits [Data file and description]: .

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11), 2278-2324.

For more open datasets, please visit:
