Graduation Project: Foreign-Literature Translation
Application of Bayesian Network in Improving Customer Credit Precision
1. Introduction
The concept of CRM (customer relationship management) was first proposed by Gartner Group in 1997. It is a cyclic process of interaction between an enterprise and its customers: customer data are generated, collected, and analyzed, and the results are then applied to the enterprise's services and marketing activities. CRM is a brand-new idea, but it is not merely a management philosophy; it is also a new management mechanism that requires enterprises to shift from a "product-centered" model to a "customer-centered" model.
The core of CRM is the customer. CRM studies customers systematically in order to raise the level of customer service. Its ultimate goals are to increase customer satisfaction and loyalty, retain existing customers, win new customers and business opportunities, and bring more profit to the enterprise. Its purpose is to make the enterprise understand its customers, develop its relationships with them as fully as possible, maximize customer value, and achieve a win-win outcome for the enterprise and its customers. This paper first introduces how Bayesian networks are computed, then describes the conventional way of building a Bayesian network structure from sample data, and finally uses the network to mine, analyze, and evaluate mobile telecommunication customers, so as to analyze customer credit and support the enterprise in making correct strategic decisions.
2. Bayesian network
Data-mining algorithms, a classical technique in CRM, are valued by more and more experts and enterprises. However, traditional data-mining algorithms have their own shortcomings, so we introduce Bayesian networks into customer credit analysis. The Bayesian network model has the following characteristics. By combining prior information with sample data, it avoids the subjective bias that can result from using prior information alone, as well as the blind search and tedious computation that arise when sample data are lacking; it also avoids the effect of noise that arises from using posterior information alone. Because it incorporates prior information and describes the relationships among the data graphically, the Bayesian network model is a very convenient tool for prediction.
2.1 Bayesian network model
A Bayesian network is also called a Bayesian belief network. It represents a joint probability distribution by specifying a set of conditional independence assumptions (a directed acyclic graph) together with a set of local conditional probabilities. A Bayesian network allows conditional independence to be defined among subsets of variables, provides a graphical representation of cause-and-effect relations, and serves as the basis for building learning models. Consider a set of random variables $X=\{X_1,X_2,\ldots,X_n\}$, where each $X_i$ is an m-dimensional random vector.
A Bayesian network G consists of two parts. The first is the network structure S, which expresses the independence and conditional independence among variables. In general, two directly connected variables have neither marginal independence nor conditional independence. The second part is the joint probability distribution p of X.
A Bayesian network is written as $G=\langle S,p\rangle$. Here S is a directed acyclic graph whose vertices correspond one-to-one to the random variables of the finite set $X=\{X_1,X_2,\ldots,X_n\}$, and each arc represents a functional dependence.
If there is an arc from variable Y to X, then Y is a parent node of X and X is a child node of Y. The set of all parents of $X_i$ is denoted $Pa(X_i)$, and the set of all its children is denoted $Child(X_i)$.
p represents the joint distribution of X. If X is a set of discrete variables, p can be parameterized by conditional probabilities. For every value $x_i$ of $X_i$, the conditional probability $p(x_i\mid Pa(X_i))$ equals a parameter $\theta_{x_i\mid Pa(x_i)}$, which is the conditional probability that event $x_i$ occurs given $Pa(X_i)$. In effect, a Bayesian network specifies the joint probability distribution over the variable set X:
$$p(X_1,X_2,\ldots,X_n)=\prod_{i=1}^{n}p\bigl(X_i\mid Pa(X_i)\bigr)$$
Through this decomposition, a Bayesian network greatly reduces the cost of computing and storing the joint distribution. If all n variables are binary, the joint distribution has $2^n$ states, so $2^n$ parameters (of which $2^n-1$ are free, since they must sum to 1) are needed to describe it. When the joint distribution is decomposed into a product of conditional distributions over a few low-dimensional variables, the number of parameters drops sharply.
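As a rough illustration of this saving, the sketch below counts free parameters under the assumption that every variable is binary. A full joint table over n binary variables needs $2^n-1$ free numbers. A binary node with k binary parents needs one free number per parent configuration, that is, $2^k$. The function names are our own, and the parent map encodes the five-node structure of Figure 1.

```python
def full_joint_free_params(n: int) -> int:
    """Free parameters of an unrestricted joint table over n binary variables."""
    return 2 ** n - 1


def factorized_free_params(parents: dict[str, tuple[str, ...]]) -> int:
    """Free parameters of a Bayesian network whose variables are all binary.

    A binary node with k binary parents needs one free probability for each
    of the 2**k parent configurations.
    """
    return sum(2 ** len(pa) for pa in parents.values())


# Structure of Figure 1: X1 -> X2, X1 -> X3, (X2, X3) -> X4, X4 -> X5.
figure1 = {"X1": (), "X2": ("X1",), "X3": ("X1",), "X4": ("X2", "X3"), "X5": ("X4",)}

print(full_joint_free_params(5))        # 31
print(factorized_free_params(figure1))  # 11
```

The gap widens quickly as n grows. For a chain of 20 binary variables, the factorization needs 1 + 19 × 2 = 39 free parameters, whereas the full table needs 2^20 − 1 = 1,048,575.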
Figure 1 shows the structure of a Bayesian network over five random variables, $X=\{X_1,X_2,X_3,X_4,X_5\}$. $X_1$ is the root node and has two child nodes, $X_2$ and $X_3$. These two nodes are in turn the parents of $X_4$, and $X_4$ is the parent of $X_5$. From the graph, the joint distribution of X decomposes into a product of five conditional distributions:
$$p(X)=\prod_{i=1}^{5}p\bigl(X_i\mid Pa(X_i)\bigr)=p(X_1)\,p(X_2\mid X_1)\,p(X_3\mid X_1)\,p(X_4\mid X_2,X_3)\,p(X_5\mid X_4)$$
If every $X_i$ is binary, let $x_i^j$ denote the event $X_i=j$ ($j=1,2$), and let $p(x_i^j)$ denote the probability of the event $X_i=j$. Then, for example:
$$p(x_1^1,x_2^1,x_3^1,x_4^1,x_5^1)=p(x_1^1)\,p(x_2^1\mid x_1^1)\,p(x_3^1\mid x_1^1)\,p(x_4^1\mid x_2^1,x_3^1)\,p(x_5^1\mid x_4^1)$$
$$p(x_1^1,x_2^2,x_3^1,x_4^2,x_5^1)=p(x_1^1)\,p(x_2^2\mid x_1^1)\,p(x_3^1\mid x_1^1)\,p(x_4^2\mid x_2^2,x_3^1)\,p(x_5^1\mid x_4^2)$$
$$p(x_1^2,x_2^2,x_3^2,x_4^2,x_5^2)=p(x_1^2)\,p(x_2^2\mid x_1^2)\,p(x_3^2\mid x_1^2)\,p(x_4^2\mid x_2^2,x_3^2)\,p(x_5^2\mid x_4^2)$$
Figure 1. An example Bayesian network with five random variables
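The factorization can be checked numerically. The sketch below encodes the Figure 1 structure using states 1 and 2, which match the $x_i^j$ notation. It evaluates $p(x_1,\ldots,x_5)$ as the product of the local conditional probabilities and confirms that all 32 joint probabilities sum to 1. The conditional probability values are invented purely for illustration, since the paper gives none for this example.

```python
from itertools import product

# Figure 1 structure: the parents of each node i.
PARENTS = {1: (), 2: (1,), 3: (1,), 4: (2, 3), 5: (4,)}

# Hypothetical CPTs: P(X_i = 1 | parent states); P(X_i = 2 | ...) is 1 minus this.
P_STATE1 = {
    1: {(): 0.6},
    2: {(1,): 0.7, (2,): 0.2},
    3: {(1,): 0.4, (2,): 0.9},
    4: {(1, 1): 0.95, (1, 2): 0.5, (2, 1): 0.3, (2, 2): 0.1},
    5: {(1,): 0.8, (2,): 0.25},
}


def joint(x: dict[int, int]) -> float:
    """p(x_1, ..., x_5) = prod_i p(x_i | pa(x_i)); x maps node -> state 1 or 2."""
    prob = 1.0
    for node, pa in PARENTS.items():
        p1 = P_STATE1[node][tuple(x[q] for q in pa)]
        prob *= p1 if x[node] == 1 else 1.0 - p1
    return prob


# p(x1^1) p(x2^1|x1^1) p(x3^1|x1^1) p(x4^1|x2^1,x3^1) p(x5^1|x4^1)
all_ones = {i: 1 for i in range(1, 6)}
print(joint(all_ones))

# Summing over every assignment must give 1 for a valid factorization.
total = sum(joint(dict(zip(range(1, 6), states))) for states in product((1, 2), repeat=5))
print(round(total, 10))  # 1.0
```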
To build a Bayesian network model, we first determine the relevant variables and a reliable training data set, and then use the data for training. This requires three steps. First, fix the goal of the model, that is, determine the relevant interpretation of the variables. Next, build a directed acyclic graph that encodes the conditional independences. Finally, assign the local probability distributions. In the discrete case, a distribution must be assigned to each variable for every state of its parent set.
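One common way to assign these local distributions from training data is to count how often each child state occurs under each parent configuration. The sketch below does this with optional add-alpha (Laplace) smoothing. It is a generic maximum-likelihood estimator with smoothing, not necessarily the training procedure the authors used, and the six records are made up for illustration.

```python
from collections import Counter
from itertools import product


def estimate_cpt(records, child, parents, states, alpha=1.0):
    """Estimate P(child | parents) by counting over a list of dict records.

    alpha is an add-alpha (Laplace) pseudo-count; alpha=0 gives the plain
    maximum-likelihood estimate. Returns {parent_config: {child_state: prob}}.
    """
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in records)
    cpt = {}
    # One distribution per state of the parent set, as the text requires.
    for cfg in product(*(states[p] for p in parents)):
        n = {v: counts[(cfg, v)] + alpha for v in states[child]}
        total = sum(n.values())
        if total == 0:  # unseen configuration and no smoothing: fall back to uniform
            cpt[cfg] = {v: 1 / len(states[child]) for v in states[child]}
        else:
            cpt[cfg] = {v: c / total for v, c in n.items()}
    return cpt


# Hypothetical training records for F (past arrears) and R (stable income).
records = [
    {"F": "yes", "R": "no"}, {"F": "yes", "R": "no"}, {"F": "yes", "R": "yes"},
    {"F": "no", "R": "yes"}, {"F": "no", "R": "yes"}, {"F": "no", "R": "no"},
]
states = {"F": ["yes", "no"], "R": ["yes", "no"]}

print(estimate_cpt(records, "R", ["F"], states, alpha=0))
print(estimate_cpt(records, "F", [], states, alpha=0))  # root node: a single distribution
```

Smoothing (alpha > 0) keeps unseen parent/child combinations from receiving probability zero. This matters when some parent configurations are rare in the training set.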
2.2 Predicting valuable customers and improving the service level
The Bayesian network model is a graphical model that combines prior information and posterior information. Because of this combination, it avoids the bias that arises from using prior information alone, as well as the blind search and tedious computation that arise when sample data are lacking. It also avoids the effect of noise that arises from using posterior information alone. Bayesian network models play an important role in CRM. They help classify customers appropriately and meet their individual needs, which is crucial for retaining existing golden customers and for turning more new customers into golden customers. For different customers, a Bayesian network builds different models, forecasts customers' next demands, and analyzes customers' credit so as to help the enterprise avoid potential risks. By combining prior information with sample data, a Bayesian network can analyze customers effectively and provide strong support for correct strategic decisions.
Customer arrears have long troubled the telecommunications industry. Should a customer in arrears be disconnected immediately? Immediate disconnection will drive away some customers, but without disconnection the operator faces an imbalance between revenue and expenditure. This article uses a Bayesian network to compute the credit precision of mobile communication customers. The result serves as the basis for customer credit forecasting and appraisal, and is used to judge whether a telecom customer in arrears should be disconnected immediately. This supports the enterprise's decision-making and thus brings it greater income.
Five variables are selected to illustrate the practical application of the Bayesian network model to computing customer credit precision. The variables F, R, B, S, and A denote, respectively, whether the customer has previously been in arrears (F), whether the customer has a stable income (R), the customer type (B), the customer's monthly call time (S), and the service used (A).
B = {student, staff, senior citizen}, S = {few (s < 100 hours), many (s > 100 hours)}
A = {M-zone, ShenZhouxing, populace card}
We first determine the variables and then choose a suitable network structure according to prior knowledge. With 5 variables there are 5! = 120 possible orderings of the variables and an even larger number of candidate network structures. Analyzing each one is unnecessary, however, because most unreasonable structures can be eliminated using existing knowledge. The research found that there is no connection between S and A, so $S_1$ and $S_2$ are the two structures actually chosen.
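To give a sense of the search space, the sketch below compares the number of variable orderings, $n!$, with the number of distinct labeled DAGs on $n$ nodes. The DAG count uses Robinson's recurrence, $a(n)=\sum_{k=1}^{n}(-1)^{k+1}\binom{n}{k}2^{k(n-k)}a(n-k)$ with $a(0)=1$. This is an aside that motivates pruning with domain knowledge, not part of the authors' method.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 6):
    print(n, factorial(n), count_dags(n))
# last line printed: 5 120 29281
```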
Figure 1 shows structure $S_1$; $S_2$ is $S_1$ with one additional arc from A to R. Both structures are assumed to have the same prior probability, $p(S_1^h)=p(S_2^h)=0.5$. In $S_1$ we have the following relations:
$$p(a\mid f)=p(a),\quad p(s\mid f,a)=p(s),\quad p(r\mid f,a,s)=p(r\mid f),\quad p(b\mid f,a,s,r)=p(b\mid f,a,s)$$
$$p(f=\text{yes})=0.00001,\quad p(a\le 30)=0.25,\quad p(a=30\text{--}50)=0.49,\quad p(s=\text{many})=0.5$$
$$p(r=\text{yes}\mid f=\text{yes})=0.2,\quad p(r=\text{yes}\mid f=\text{no})=0.01,\quad p(b=\text{yes}\mid f=\text{yes},a=*,s=*)=0.05$$
$$p(b=\text{yes}\mid f=\text{no},a\le 30,s=\text{many})=0.001,\quad p(b=\text{yes}\mid f=\text{no},a=30\text{--}50,s=\text{many})=0.004,\quad p(b=\text{yes}\mid f=\text{no},a\ge 50,s=\text{many})=0.0002$$
$$p(b=\text{yes}\mid f=\text{no},a\le 30,s=\text{few})=0.0005,\quad p(b=\text{yes}\mid f=\text{no},a=30\text{--}50,s=\text{few})=0.02,\quad p(b=\text{yes}\mid f=\text{no},a\ge 50,s=\text{few})=0.001$$
Given the observed values of the other variables, the probability that a telecommunication customer is in arrears on phone charges is:
Figure 1. Bayesian network model $S_1$ for mobile customers' phone-bill arrears
$$p(f\mid a,s,r,b)=\frac{p(f,a,s,r,b)}{p(a,s,r,b)}=\frac{p(f,a,s,r,b)}{\sum_{f'}p(f',a,s,r,b)}=\frac{p(f)\,p(a)\,p(s)\,p(r\mid f)\,p(b\mid f,a,s)}{\sum_{f'}p(f')\,p(a)\,p(s)\,p(r\mid f')\,p(b\mid f',a,s)}=\frac{p(f)\,p(r\mid f)\,p(b\mid f,a,s)}{\sum_{f'}p(f')\,p(r\mid f')\,p(b\mid f',a,s)}$$
Here $f'$ ranges over all possible states of f. Let D denote the data sample and $\xi$ the prior knowledge. Then:
$$p(S_1^h\mid D,\xi)=0.26,\qquad p(S_2^h\mid D,\xi)=0.74$$
The network structures $S_1$ and $S_2$ are similar, yet the posterior probabilities computed for them differ markedly. This indicates that the Bayesian network is highly sensitive to its structure and is well suited to computing customer credit precision.
3. Conclusion
CRM, a management concept and technique for improving an enterprise's core competitiveness, has attracted the attention of more and more experts and enterprises. The Bayesian network is a young data-mining technique that integrates prior information and sample data; it has a powerful ability to process data and offers better logic and interpretability. The main purpose of analyzing customer credit is to offer different customers personalized services, so as to reduce losses caused by customer arrears and improper disconnection. We used the model to analyze customer credit. The results show that both the misjudgment rate and the miss rate meet practical requirements, so the model can give more customers high-quality personalized service while keeping serious arrears under control. However, the Bayesian network model still needs to be improved and refined:
1. Other customer attributes may also influence credit. As these attributes grow stronger, the model's classification ability may decline, so it becomes necessary to consider more attributes.
2. Improve the algorithm so that the network converges quickly, raise the model's training precision, increase the training time, and improve the model's classification accuracy.
3. Further analyze the credit status used as the ideal output, in order to classify customers in more detail.
Using a Bayesian Network Model to Improve the Precision of Customer Credit Computation
I. Background
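The concept of CRM was first proposed by Gartner Group in 1997. It is a cyclic process of interaction between an enterprise and its customers, through which customer data are generated, collected, and analyzed; the results are then applied to the enterprise's services and marketing activities.
CRM is a brand-new concept. It is not only a management philosophy but also a new management mechanism, which requires enterprises to shift from a "product-centered" model to a "customer-centered" model.
The core idea of CRM is customer-centeredness: customers are studied systematically in order to improve the level of customer service. Its ultimate goal is to raise customer satisfaction and loyalty, retain existing customers, continually win new customers and new business opportunities, and bring more profit to the enterprise.
The purpose of CRM is to require enterprises to know their customers, develop their relationships with customers to the greatest extent, maximize customer value, and create a win-win situation between the enterprise and its customers.
This paper addresses the question of whether a mobile customer in arrears should be disconnected immediately. It builds a model with a Bayesian network, predicts the credit precision of mobile communication customers, and analyzes and evaluates the results, in order to support the enterprise in making correct strategic decisions.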
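II. Bayesian Networks
As a classical technique of customer relationship management, data-mining algorithms are receiving growing attention from experts and enterprises. To address the shortcomings of traditional data-mining algorithms, we introduce Bayesian networks into customer credit analysis.
The Bayesian network model has the following characteristics. By combining prior information with sample data, it avoids both the subjective bias that may result from using prior information alone and the blind search and tedious computation that occur when sample data are lacking. It also avoids the effect of noise caused by using posterior information alone.
By incorporating prior information and describing the relationships among data graphically, the Bayesian network model is very convenient for predictive analysis.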
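1 Bayesian network model
A Bayesian network, also called a Bayesian belief network, represents a joint probability distribution by specifying a set of conditional independence assumptions (a directed acyclic graph) together with a set of local conditional probabilities.
A Bayesian network allows conditional independence to be defined among subsets of variables and provides a graphical representation of causal relations. A model built on a Bayesian network can be used for further analysis of the data and provides technical support for making correct decisions.
Consider a set of random variables $X=\{X_1,X_2,\ldots,X_n\}$, where $X_i$ is an m-dimensional random vector.
A Bayesian network G consists of two parts.
The first is the network structure S, which expresses the independence and conditional independence among variables.
In general, two directly connected variables have neither marginal independence nor conditional independence.
The second is the joint distribution p of X.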
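A Bayesian network is written as $G=\langle S,p\rangle$. Here S is a directed acyclic graph whose vertices correspond to the random variables $X_1,X_2,\ldots,X_n$ of the finite set X, and each arc represents a functional dependence.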
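If there is an arc from variable Y to X, then Y is a parent node of X and X is a child node of Y.
The set of all parent variables of $X_i$ is denoted $Pa(X_i)$, and the set of all child variables of $X_i$ is denoted $Child(X_i)$.
p denotes the joint distribution of X.
If X is a set of discrete variables, p can be parameterized by conditional probabilities.
For each value $x_i$ of $X_i$, the conditional probability $p(x_i\mid Pa(X_i))$ corresponds to a parameter $\theta_{x_i\mid Pa(x_i)}$. This parameter is the conditional probability that event $x_i$ occurs given $Pa(X_i)$.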
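In effect, a Bayesian network specifies the joint probability distribution over the variable set X:
$$p(X_1,X_2,\ldots,X_n)=\prod_{i=1}^{n}p\bigl(X_i\mid Pa(X_i)\bigr)$$
Through this decomposition, a Bayesian network greatly reduces the cost of computing and storing the joint distribution.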
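If all n variables are binary, the joint distribution has $2^n$ states, so $2^n$ parameters are needed to describe it.
When the joint distribution is decomposed into a product of conditional distributions over several lower-dimensional variables, far fewer parameters are needed.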
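The directed acyclic graph in Figure 1 is the structure of a Bayesian network containing five random variables, $X=\{X_1,X_2,X_3,X_4,X_5\}$.
$X_1$ is the root node and has two child nodes, $X_2$ and $X_3$.
$X_2$ and $X_3$ are in turn the parent nodes of $X_4$.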
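From the structure of the graph, the joint distribution of X can be decomposed into the product of five conditional distributions:
$$p(X)=\prod_{i=1}^{5}p\bigl(X_i\mid Pa(X_i)\bigr)=p(X_1)\,p(X_2\mid X_1)\,p(X_3\mid X_1)\,p(X_4\mid X_2,X_3)\,p(X_5\mid X_4)$$
If X consists of binary variables, let $x_i^j$ denote the event $X_i=j$ ($j=1,2$).
Let $p(x_i^j)$ denote the probability that the event $X_i=j$ occurs.
Then, for example:
$$p(x_1^1,x_2^1,x_3^1,x_4^1,x_5^1)=p(x_1^1)\,p(x_2^1\mid x_1^1)\,p(x_3^1\mid x_1^1)\,p(x_4^1\mid x_2^1,x_3^1)\,p(x_5^1\mid x_4^1)$$
$$p(x_1^1,x_2^2,x_3^1,x_4^2,x_5^1)=p(x_1^1)\,p(x_2^2\mid x_1^1)\,p(x_3^1\mid x_1^1)\,p(x_4^2\mid x_2^2,x_3^1)\,p(x_5^1\mid x_4^2)$$
$$p(x_1^2,x_2^2,x_3^2,x_4^2,x_5^2)=p(x_1^2)\,p(x_2^2\mid x_1^2)\,p(x_3^2\mid x_1^2)\,p(x_4^2\mid x_2^2,x_3^2)\,p(x_5^2\mid x_4^2)$$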
Figure 1. An example Bayesian network with five random variables
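To build a Bayesian network model, one must first determine the relevant variables and a reliable training data set, and then train the model on the data.
To this end, first determine the goal of the model, that is, the relevant interpretation. Next, build a directed acyclic graph of conditional independences. Then assign the local probability distributions.
In the discrete case, a distribution must be assigned for each state of each variable's parent set.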
2 Predicting valuable customers and improving the service level
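The Bayesian network learning model is a graphical model that combines prior and posterior information. Because it integrates both kinds of information, it avoids several problems when analyzing customer data. It avoids the subjective bias that classical data-mining algorithms incur by using only prior information, and the tedious search and computation that arise when sample information is lacking. It also avoids the effect of noise in the sample data.
Bayesian network models play an important role in customer relationship management. With a Bayesian network, customers can be classified well and their individual needs met, which is essential for keeping existing golden customers and turning more customers into golden customers. For different customers, the Bayesian network models each customer and predicts the customer's demands. It also analyzes customer credit, thereby helping the enterprise avoid potential risks.
By combining prior information and sample data, a Bayesian network can analyze customer credit effectively, thereby providing strong support for the enterprise's correct strategic decisions.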
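Customer arrears have long troubled the communications industry. Should a customer in arrears be disconnected immediately? Disconnection may drive away some customers, but without it the carrier faces an imbalance between revenue and expenditure.
This paper uses a Bayesian network to compute the credit precision of mobile communication customers and takes the result as the basis for customer credit forecasting and evaluation. The result is used to judge whether a telecom customer in arrears should be disconnected immediately. This supports the enterprise's decision-making and brings it greater income.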
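The following five variables are selected to compute customer credit precision: whether the customer has previously been in arrears (F), whether the customer has a stable income (R), the customer type (B), the customer's monthly call time (S), and the service used (A).
B = {student, staff, senior citizen}, S = {few (s < 100 hours), many (s > 100 hours)}
A = {M-zone, ShenZhouxing, populace card}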
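First determine the variables, then choose a suitable network structure according to prior knowledge.
For a data sample with 5 variables, there are 5! = 120 possible orderings of the variables and an even larger number of candidate network structures. Computing and analyzing every structure is unnecessary, because existing knowledge can rule out a large number of unreasonable structures.
The research found that there is no connection between S and A; based on practical considerations, only two network structures, $S_1$ and $S_2$, are chosen.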
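Figure 1 shows the structure of $S_1$; $S_2$ adds one arc from A to R. Both structures are assumed to have the same prior probability, $p(S_1^h)=p(S_2^h)=0.5$. In network structure $S_1$, the following relations hold:
$$p(a\mid f)=p(a),\quad p(s\mid f,a)=p(s),\quad p(r\mid f,a,s)=p(r\mid f),\quad p(b\mid f,a,s,r)=p(b\mid f,a,s)$$
$$p(f=\text{yes})=0.00001,\quad p(a\le 30)=0.25,\quad p(a=30\text{--}50)=0.49,\quad p(s=\text{many})=0.5$$
$$p(r=\text{yes}\mid f=\text{yes})=0.2,\quad p(r=\text{yes}\mid f=\text{no})=0.01,\quad p(b=\text{yes}\mid f=\text{yes},a=*,s=*)=0.05$$
$$p(b=\text{yes}\mid f=\text{no},a\le 30,s=\text{many})=0.001,\quad p(b=\text{yes}\mid f=\text{no},a=30\text{--}50,s=\text{many})=0.004,\quad p(b=\text{yes}\mid f=\text{no},a\ge 50,s=\text{many})=0.0002$$
$$p(b=\text{yes}\mid f=\text{no},a\le 30,s=\text{few})=0.0005,\quad p(b=\text{yes}\mid f=\text{no},a=30\text{--}50,s=\text{few})=0.02,\quad p(b=\text{yes}\mid f=\text{no},a\ge 50,s=\text{few})=0.001$$
Given the observed values of the other variables, the probability that a telecom customer is in arrears on phone charges is: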
$$p(f\mid a,s,r,b)=\frac{p(f,a,s,r,b)}{p(a,s,r,b)}=\frac{p(f,a,s,r,b)}{\sum_{f'}p(f',a,s,r,b)}=\frac{p(f)\,p(a)\,p(s)\,p(r\mid f)\,p(b\mid f,a,s)}{\sum_{f'}p(f')\,p(a)\,p(s)\,p(r\mid f')\,p(b\mid f',a,s)}=\frac{p(f)\,p(r\mid f)\,p(b\mid f,a,s)}{\sum_{f'}p(f')\,p(r\mid f')\,p(b\mid f',a,s)}$$
Here $f'$ denotes all possible states of f, D is the data sample, and $\xi$ denotes prior knowledge. Then:
Figure 1. Bayesian network model $S_1$ for mobile customers' phone-bill arrears
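$$p(S_1^h\mid D,\xi)=0.26,\qquad p(S_2^h\mid D,\xi)=0.74$$
Although the network structures $S_1$ and $S_2$ differ little, the results computed from them differ greatly.
This shows that the Bayesian network is highly sensitive and is well suited to computing customer credit precision in customer relationship management.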
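The reported structure posteriors follow from Bayes' rule over structures, $p(S_k^h\mid D,\xi)\propto p(S_k^h\mid\xi)\,p(D\mid S_k^h,\xi)$. With equal priors of 0.5, the posteriors depend only on the ratio of the two marginal likelihoods. The text does not report those likelihoods, so the 13 : 37 ratio below is back-solved from 0.26 and 0.74 purely for illustration.

```python
def structure_posteriors(priors, marginal_likelihoods):
    """Normalize p(S_k) * p(D | S_k) over the candidate structures (Bayes' rule)."""
    weights = [p * m for p, m in zip(priors, marginal_likelihoods)]
    z = sum(weights)
    return [w / z for w in weights]


# Equal structure priors as assumed in the text; likelihoods matter only up to a common factor.
post = structure_posteriors([0.5, 0.5], [13.0, 37.0])
print([round(p, 2) for p in post])  # [0.26, 0.74]
```

Equivalently, the implied Bayes factor in favor of $S_2$ is 0.74 / 0.26 ≈ 2.85. When the structure priors are equal, this ratio alone determines the posteriors.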
3 Conclusion
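Customer relationship management has become a management philosophy and technique for improving enterprises' core competitiveness, and it is favored by more and more experts and enterprises.
As a new mining technique, the Bayesian network integrates prior information and sample information, has strong data-processing capability, and offers better logic and interpretability.
The main purpose of analyzing customer credit status is to provide personalized services to different customers, thereby reducing losses caused by customer arrears and improper disconnection.
The model used in this paper was applied to analyze customer credit status. Both the misjudgment rate and the miss rate meet practical requirements, so the model can curb large arrears while giving more customers high-quality, personalized service.
However, the Bayesian network model established so far still needs further improvement and refinement.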
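(1) Other customer attributes also influence the computation of credit precision. As these other attributes become stronger, the model's classification ability decreases somewhat, so more attributes need to be considered.
(2) Further improve the algorithm so that the network converges quickly. At the same time, training accuracy can be improved and training time increased, so that the model's classification accuracy is further improved.
(3) Further analyze the credit status used as the ideal output, in order to classify customers in more detail.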