Big Data and Data Mining Technology
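In recent years, new ways of generating information have emerged, typified by social networking sites and location-based services (LBS). At the same time, cloud computing, mobile, and Internet of Things technologies have developed rapidly. Ubiquitous mobile devices, wireless sensors, and similar equipment generate data continuously, and Internet services used by hundreds of millions of people produce data interactions at every moment: the era of big data has arrived. Big data is now a hot topic. Businesses and individuals alike are discussing it or working on related projects, and while we create big data, we are also surrounded by it. The task is to find meaningful patterns and rules in large amounts of data. With data this plentiful, acquiring it is no longer an obstacle but an advantage. Yet big data has long since exceeded the terabyte scale, grows at an astonishing rate, and has strict real-time requirements, so analyzing, managing, and exploiting it still poses a number of challenges.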
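IDC defines big data as a new generation of architectures and technologies designed to extract value more economically from high-frequency, high-volume data of diverse structures and types. Definitions of big data are generally built from its characteristics, describing and generalizing them. Across these definitions, the characteristics of big data can be summarized as volume (scale), variety (diversity), velocity (speed), and value, often called the "four Vs".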
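The core of big data is data mining, and we cannot get away from it at any stage. Many of us have worked with data mining ever since university, but we care less about what data mining is than about how to use the mining process to find what we need. Big data mining is the process of discovering valuable, potentially useful information and knowledge hidden in massive, incomplete, noisy, fuzzy, and random large databases; it is also a decision-support process. It draws mainly on artificial intelligence, machine learning, pattern recognition, and statistics. By analyzing big data in a highly automated way, drawing inductive inferences, and uncovering latent patterns, data mining can help enterprises, merchants, and users adjust market policies, reduce risk, approach the market rationally, and make sound decisions. Today, in many fields, especially commercial ones such as banking, telecommunications, and e-commerce, data mining addresses many problems, including formulating marketing strategies, background analysis, and enterprise crisis management. Common big data mining methods include classification, regression analysis, clustering, association rules, neural networks, and Web data mining, each of which examines the data from a different angle.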
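The mining methods listed above are only named in the text. As a rough illustration of two of them, here is a minimal Python sketch, assuming small in-memory data: it scores two-item association rules by support and confidence, and clusters points with plain k-means (Lloyd's algorithm). The function names `support`, `pair_rules`, and `kmeans`, the thresholds, and the sample data are hypothetical, chosen only for illustration; a real big-data pipeline would use a distributed framework and a complete algorithm such as Apriori or FP-Growth.

```python
import math
from itertools import combinations

# Illustrative sketch only: function names, thresholds, and sample data
# are hypothetical and not taken from any particular library.


# --- Association rules: support and confidence -----------------------------

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for two-item rules lhs -> rhs.

    Only item pairs are checked; a full miner such as Apriori grows
    itemsets level by level and prunes infrequent candidates.
    """
    items = sorted({i for t in transactions for i in t})
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue  # pair is not frequent enough to form a rule
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            conf = s_ab / support({lhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, s_ab, conf))
    return rules


# --- Clustering: plain k-means (Lloyd's algorithm) -------------------------

def kmeans(points, centroids, max_iter=10):
    """Cluster points (equal-length tuples) from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            k = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            clusters[k].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new == centroids:  # no centroid moved: converged
            break
        centroids = new
    return centroids, clusters


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"milk", "eggs"},
    ]
    print(pair_rules(baskets))  # [('butter', 'bread', 0.5, 1.0)]

    pts = [(1, 1), (1.5, 2), (8, 8), (9, 8.5)]
    centers, groups = kmeans(pts, [(1, 1), (9, 9)])
    print(centers)  # [(1.25, 1.5), (8.5, 8.25)]
```

Because the rule miner only checks pairs, it stays short but misses longer itemsets; Apriori scales this up by using the fact that every subset of a frequent itemset must itself be frequent. The k-means result depends on the initial centroids, which are fixed here so that the output is reproducible.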
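Finally, big data is not the final answer but a reference answer. We must not deify it, because the line between deification and demonization is often thin. Remember that the bigger data is humanity itself: use this technological resource with humility, and always keep human nature in mind.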