Literature Translation: Data Types Generalization for Data Mining Algorithms



Data Types Generalization for Data Mining Algorithms

Abstract

With the growth of database applications, mining interesting information from huge databases has become a major concern, and a variety of mining algorithms have been proposed in recent years. The data processed in data mining may be obtained from many sources that use different data types. However, no algorithm can be applied to all applications, because each algorithm accepts only certain data types; the selection of an appropriate mining algorithm therefore depends not only on the goal of the application but also on the data fittability. Transforming non-fitting data types into the target ones is thus an important task in data mining, but it is often tedious or complex because many data types exist in the real world. Merging the similar data types relevant to a selected mining algorithm into a generalized data type appears to be a good way to reduce the transformation complexity. In this work, the data types fittability problem for six kinds of widely used data mining techniques is discussed, and a data type generalization process consisting of merging and transforming phases is proposed. In the merging phase, the original data types of the data sources to be mined are first merged into generalized ones. The transforming phase then converts the generalized data types into the target ones for the selected mining algorithm. Using the data type generalization process, the user can select an appropriate mining algorithm based solely on the goal of the application, without considering the data types.

1. Introduction

In recent years, the amount of data of various kinds has grown rapidly. Widely available, low-cost computer technology now makes it possible both to collect historical data and to institute on-line analysis of newly arriving data. Automated data generation and gathering leads to tremendous amounts of data stored in databases. Although we are flooded with data, we lack knowledge. Data mining is the automated discovery of non-trivial, previously unknown, and potentially useful knowledge embedded in databases. Different kinds of data mining methods and algorithms have been proposed, each of which has its own advantages and suitable application domains. However, it is difficult for users to choose an appropriate one by themselves, because the data provided cannot be used directly by data mining algorithms. Since most data mining algorithms can only be applied to some specific data types, the types of data stored in databases restrict the choice of data mining methods. If certain kinds of knowledge need to be obtained using some data mining algorithms, data types transformation should be done first; this is what we call “the data types fittability problem” for data mining. For the time being, there is no tool that can help users to do this kind of data types transformation. In this paper, we survey and analyze the data types fittability problem for data mining algorithms, and then propose a “data types generalization process” to solve the data types fittability problem for the attributes in relational databases.

The “data types generalization process”, which consists of merging and transforming phases, is a procedure for transforming the data types of the attributes contained in relations (tables). In the merging phase, the original data types of the data sources to be mined are first merged into generalized ones. The transforming phase then converts the generalized data types into the target ones for the selected mining algorithm. Using the data type generalization process, the user can select an appropriate mining algorithm based solely on the goal of the application, without considering the data types.
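The two phases can be illustrated with a minimal Python sketch. Everything named here is a hypothetical choice made for illustration and does not come from the paper: the generalized types "numeric" and "categorical", the MERGE_TABLE grouping of SQL-style column types, equal-width binning for the numeric-to-categorical conversion, and integer coding for the reverse direction. It shows only the shape of a merge step followed by a transform step.

```python
from typing import Dict, List, Sequence

# Hypothetical merge table: original column types -> generalized types.
# These groupings are illustrative assumptions, not the paper's taxonomy.
MERGE_TABLE: Dict[str, str] = {
    "smallint": "numeric",
    "integer": "numeric",
    "float": "numeric",
    "decimal": "numeric",
    "char": "categorical",
    "varchar": "categorical",
    "boolean": "categorical",
}


def merge_type(source_type: str) -> str:
    """Merging phase: map an original data type to a generalized one."""
    key = source_type.lower()
    if key not in MERGE_TABLE:
        raise ValueError(f"no generalized type known for {source_type!r}")
    return MERGE_TABLE[key]


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[str]:
    """Discretize numeric values into n_bins equal-width interval labels."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are equal.
    width = (hi - lo) / n_bins or 1.0
    labels = []
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"bin{idx}")
    return labels


def transform(values: Sequence, generalized_type: str,
              target_type: str, n_bins: int = 3) -> List:
    """Transforming phase: convert a generalized column to the target type."""
    if generalized_type == target_type:
        return list(values)
    if generalized_type == "numeric" and target_type == "categorical":
        return equal_width_bins(values, n_bins)
    if generalized_type == "categorical" and target_type == "numeric":
        # Assumption: integer codes in sorted label order; this imposes an
        # arbitrary ordering that may not suit every mining algorithm.
        codes = {label: i for i, label in enumerate(sorted(set(values)))}
        return [float(codes[v]) for v in values]
    raise ValueError(f"cannot transform {generalized_type} to {target_type}")


def generalize_column(values: Sequence, source_type: str,
                      target_type: str, n_bins: int = 3) -> List:
    """Run the merging phase, then the transforming phase, on one attribute."""
    return transform(values, merge_type(source_type), target_type, n_bins)
```

With this structure, supporting a new source type only requires a new entry in the merge table, while supporting a new mining algorithm only requires a conversion from each generalized type to that algorithm's target type. This is the sense in which merging first can reduce the number of transformations that must be written.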

2. Related work

As mentioned above, because many data mining algorithms can only be applied to a restricted range of data types, users may need to perform data types transformation before the selected algorithm is executed. In this paper, we propose a general concept called the “data types generalization process”, which provides a procedure for this kind of data types transformation. Data types generalization can be seen as a pre-processing step for data mining. Of course, other pre-processing steps such as data selection, data cleaning, dimension (attribute) reduction, and missing data handling may also need to be performed before running the selected data mining algorithm.
