An Empirical Study of Customer Churn Prediction Based on Data Mining
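Master's Thesis in Management
An Empirical Study of Customer Churn Prediction Based on Data Mining
Si Xuefeng (司学峰)
Beijing University of Technology
May 2009
Classification number: C93
Institution code: 10005    Student ID: S200611087    Confidentiality level: Public
Beijing University of Technology Master's Thesis
Title (Chinese): 基于数据挖掘的客户流失预测实证研究
Title (English): Demonstration Study of Customer Churn Prediction based on Data Mining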
Contents
Abstract (Chinese)
Abstract
Chapter 1 Introduction
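Graduate student: Si Xuefeng (司学峰)
Major: Management Science and Engineering
Research area: Information Management and Information Systems
Supervisor: Jiang Guorui (蒋国瑞)
Academic title: Professor
Thesis submission date:        Degree-granting institution (name and address):
Degree conferral date: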
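Statement of Originality
I declare that the thesis submitted here presents research work that I carried out myself under my supervisor's guidance, together with the results of that work. To the best of my knowledge, except where specifically marked and acknowledged in the text, the thesis contains no research results previously published or written by others, nor any material used to obtain a degree or certificate from Beijing University of Technology or any other educational institution. Any contribution to this research made by colleagues who worked with me has been clearly stated in the thesis and acknowledged with thanks.
Signature:
Date:
Statement on Authorization for Use of the Thesis
I fully understand the regulations of Beijing University of Technology on the retention and use of degree theses, namely: the university has the right to retain copies of the submitted thesis and to allow the thesis to be consulted and borrowed; the university may publish all or part of the thesis and may preserve it by photocopying, reduced-size printing, or other means of reproduction. (Classified theses are subject to these regulations once declassified.)
Signature:
Supervisor's signature:
Date:
Abstract (translated from the Chinese 摘要)
In the real world, data are often unevenly distributed, and the class-imbalance problem already affects many application areas, such as customer churn, fraud detection, and risk management. As data mining research deepens, mining imbalanced data is becoming a new research focus.
The customer churn data set studied in this thesis is typical imbalanced data, and the customers in question are the enterprise customers of online recruitment websites. The global online recruitment industry is developing rapidly: reportedly, about 20 million job postings are published worldwide every day, more than 30 million people post résumés online, and the global recruitment market was worth USD 17.2 billion in 2006. In China, the online recruitment market was worth RMB 970 million in 2007 and RMB 1.25 billion in 2008, and was expected to reach RMB 1.61 billion in 2009. The huge market and good profit prospects have given rise to new specialized, industry-specific, and regional recruitment websites, and have also intensified competition in the industry.
Data-mining-based churn prediction models have been built for the telecommunications, banking, and insurance industries, with fruitful results, whereas research on enterprise-customer churn in the online recruitment industry is still at an early stage. This thesis analyzes earlier research and introduces imbalanced data; reviews and summarizes churn prediction theory, research methods, and their development; introduces the support vector machine (SVM), a popular data mining technique that has become a focus of applied research in recent years owing to its solid theoretical foundation and good generalization performance; and discusses the characteristics, market size, and prospects of China's online recruitment industry. Finally, using enterprise-customer profile data and online behavior logs from a well-known Chinese recruitment website, it carries out an empirical study of churn prediction modeling and retention strategy with data mining techniques.
The main results of this thesis are:
1) To address the class imbalance of churn data sets and the unequal costs of misclassification, cost-sensitive learning theory is introduced into the traditional SVM and a cost-sensitive SVM method for churn prediction modeling is proposed. Experiments confirm the method's effectiveness, and it offers a useful reference for similar problems.
2) To address the reduction of high-dimensional features in churn prediction data sets, a prediction modeling method combining principal component analysis with a neural network is proposed. The empirical results show that the combination reduces the number of attributes, simplifies the network topology, and improves predictive performance.
3) To address customer retention for online recruitment companies, the factors influencing churn are analyzed, customers are segmented by their online behavior using K-means clustering, and customer relationship management strategies are discussed for each segment.
Keywords: data mining; customer churn prediction; imbalanced data; cost-sensitive learning; support vector machine
Abstract
In the real world, data are often class-imbalanced. The imbalanced-data problem affects many applications, for example customer churn, fraud detection, and risk management. As data mining research deepens, mining imbalanced data is becoming a hot new field of research.
The customer churn data sets studied in this thesis are typical imbalanced data, and the customers are the enterprise customers of web recruitment sites. The global web recruitment industry is developing rapidly. It has been reported that about 20 million job postings are published worldwide every day and that more than 30 million people post their résumés on the Internet; in 2006 the global recruitment market reached 17.2 billion US dollars. In China, the web recruitment market reached 0.97 billion RMB in 2007 and 1.25 billion RMB in 2008, and was expected to reach 1.61 billion RMB in 2009. The huge market and good profit prospects have led to many new specialized, industry-specific, and local recruitment web sites, and have also intensified competition in the industry.
Churn prediction models based on data mining have been built for the telecommunications, banking, and insurance industries and have achieved fruitful results. However, the study of enterprise-customer churn in the web recruitment industry is still at an initial stage. This thesis studies the problem of mining imbalanced data; reviews and summarizes customer churn theory, research methods, and their development; and analyzes and discusses the characteristics, market size, and growth prospects of China's web recruitment industry. The Support Vector Machine (SVM), a popular data mining technique that has become a research hotspot in recent years owing to its solid theoretical foundation and good generalization performance, is introduced and systematically described. On this basis, an empirical study of customer churn and retention strategy is carried out with data mining techniques, using enterprise customers' profile data and online behavior log data collected from a well-known domestic web recruitment site.
The results of the research are as follows. Customer churn data sets are typically imbalanced, and the costs of the different kinds of misclassification differ. Cost-sensitive learning is therefore introduced into the traditional SVM, and a cost-sensitive SVM approach to churn prediction modeling is proposed; experiments verify its validity, and it provides a reference for solving such problems.
To address the high dimensionality of customer churn data sets, a prediction modeling approach that combines principal component analysis with a neural network is proposed. The empirical results show that the combination reduces the high-dimensional attributes, simplifies the neural network topology, and improves the model's predictive performance.
On the issue of retaining enterprise customers, the thesis discusses retention strategy; in addition, customer online behavior is analyzed with K-means clustering.
Keywords: data mining; customer churn prediction; imbalanced data; cost-sensitive learning; support vector machine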
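The cost-sensitive SVM idea named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a linear kernel and plain stochastic subgradient descent, the misclassification costs `cost_pos` and `cost_neg` are hypothetical, and the toy data are made up purely for illustration.

```python
import random

def weighted_hinge(w, b, x, y, cost_pos, cost_neg):
    """Hinge loss for one sample, scaled by the cost of misclassifying its class."""
    cost = cost_pos if y == 1 else cost_neg
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return cost * max(0.0, 1.0 - y * score)

def train_cost_sensitive_svm(X, y, cost_pos=5.0, cost_neg=1.0,
                             lam=0.01, lr=0.01, epochs=200, seed=0):
    """Linear SVM fitted by stochastic subgradient descent on the weighted hinge loss.

    Labels: +1 = churned (minority class), -1 = retained.
    cost_pos and cost_neg are hypothetical costs, not values from the thesis.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            # L2 regularisation: shrink the weights a little on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if weighted_hinge(w, b, xi, yi, cost_pos, cost_neg) > 0.0:
                cost = cost_pos if yi == 1 else cost_neg
                # Subgradient step, scaled by the cost attached to this sample's class.
                w = [wj + lr * cost * yi * xj for wj, xj in zip(w, xi)]
                b += lr * cost * yi
    return w, b

def predict(w, b, x):
    """Return +1 (churn) or -1 (retained)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Made-up, imbalanced toy data: one feature, 8 retained customers and 2 churners.
X_toy = [[-2.0], [-1.5], [-1.0], [-2.5], [-1.2], [-1.8], [-2.2], [-1.6], [1.0], [1.5]]
y_toy = [-1, -1, -1, -1, -1, -1, -1, -1, 1, 1]

w, b = train_cost_sensitive_svm(X_toy, y_toy)
predictions = [predict(w, b, x) for x in X_toy]
```

The only change relative to an ordinary soft-margin SVM is the per-class factor in `weighted_hinge`: an error on a churner is penalized `cost_pos / cost_neg` times more heavily than an error on a retained customer. In practice a library implementation would normally be used; scikit-learn's `SVC`, for instance, accepts a `class_weight` argument that scales the penalty per class.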
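The combination of principal component analysis with a neural network can likewise be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the model built in the thesis: PCA is done by power iteration, the network is a tiny one-hidden-layer perceptron trained by stochastic gradient descent, and the three behavior features and their values are invented. It shows how projecting correlated inputs onto one principal component lets the network use a single input node instead of three.

```python
import math
import random

def pca_fit(X, k, iters=100, seed=0):
    """Top-k principal directions via power iteration with deflation."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Z = [[row[j] - mean[j] for j in range(d)] for row in X]
    C = [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            v = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(t * t for t in v)) or 1.0
            v = [t / norm for t in v]
        eig = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
        comps.append(v)
        # Deflation: remove the direction just found before searching for the next.
        C = [[C[a][b] - eig * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, comps

def pca_transform(X, mean, comps):
    """Project centered rows onto the principal directions."""
    return [[sum((row[j] - mean[j]) * c[j] for j in range(len(mean))) for c in comps]
            for row in X]

def init_mlp(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    return {"W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
            "b1": [0.0] * n_hidden,
            "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
            "b2": 0.0}

def forward(net, x):
    """Return (hidden activations, predicted churn probability)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    a = sum(w * hj for w, hj in zip(net["W2"], h)) + net["b2"]
    return h, 1.0 / (1.0 + math.exp(-a))

def log_loss(net, X, y, eps=1e-12):
    """Mean cross-entropy of the network's predictions."""
    total = 0.0
    for x, t in zip(X, y):
        p = forward(net, x)[1]
        total -= t * math.log(p + eps) + (1 - t) * math.log(1.0 - p + eps)
    return total / len(X)

def train(net, X, y, lr=0.1, epochs=300):
    """Stochastic gradient descent with backpropagation."""
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, p = forward(net, x)
            delta_out = p - t  # gradient of the loss w.r.t. the output pre-activation
            for j in range(len(h)):
                delta_h = delta_out * net["W2"][j] * (1.0 - h[j] ** 2)
                net["W2"][j] -= lr * delta_out * h[j]
                for i in range(len(x)):
                    net["W1"][j][i] -= lr * delta_h * x[i]
                net["b1"][j] -= lr * delta_h
            net["b2"] -= lr * delta_out
    return net

# Made-up behavior features per customer: [logins, job postings, days since last visit].
activity = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
X_raw = [[10.0 + t, 20.0 + 2.0 * t, 5.0 - t] for t in activity]
y = [1 if t < 0 else 0 for t in activity]  # 1 = churned (low activity)

mean, comps = pca_fit(X_raw, k=1)
Z = pca_transform(X_raw, mean, comps)      # three inputs reduced to one
net = init_mlp(n_in=1, n_hidden=2)
loss_before = log_loss(net, Z, y)
train(net, Z, y)
loss_after = log_loss(net, Z, y)
```

Because the three invented features are perfectly correlated, a single component captures all of their variance here; on real churn data the number of components to keep would have to be chosen from the explained variance.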