Game Auction House System: Graduation Project (Thesis)
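School of Information Engineering, Dalian Jiaotong University
Graduation Project (Thesis) Assignment Book. Title: Game Auction House System
Graduation Project (Thesis) Progress Plan and Assessment Form
Supervisor's signature:        Date (year/month/day):        Note: The "Planned work" column is to be filled in carefully by the student; all other fields are filled in by the supervisor at assessment time.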
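School of Information Engineering, Dalian Jiaotong University
Graduation Project (Thesis): Translation of Foreign-Language Literature
Student name: Li Qinglin (李青霖)    Major and class: Software Engineering, Class 08-1
Supervisors: Chang Jingyan (常敬岩), Shi Yuan (史原)    Titles: Senior Engineer, Lecturer
Affiliation: Software Engineering Teaching and Research Section, Department of Information Science
Head of the Teaching and Research Section: Liu Ruijie (刘瑞杰)
Completion date: April 13, 2012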
A clustering method to distribute a database on a grid
ScienceDirect: Future Generation Computer Systems 23 (2007) 997–1002
Summary: Clusters and grids of workstations provide available resources for data mining processes. To exploit these resources, new distributed algorithms are necessary, particularly concerning the way to distribute data and to use this partition. We present a clustering algorithm dubbed Progressive Clustering that provides an “intelligent” distribution of data on grids. The usefulness of this algorithm is shown for several distributed data mining tasks.
Keywords: Grid and parallel computing; Data mining; Clustering
Introduction
Knowledge discovery in databases, also called data mining, is a valuable engineering tool for extracting useful information from very large databases. It usually requires high computing capabilities, which parallelism and distribution can provide. The work presented here is part of the DisDaMin project, which addresses data mining problems (such as association rules, clustering, ...) using distributed computing. DisDaMin’s aim is to develop parallel and distributed solutions for data mining problems. It achieves two gains in execution time: one from the use of parallelism and one from reduced computation (through an intelligent distribution of data and computation). In parallel and distributed environments such as grids or clusters, algorithms must take into account the constraints inherent to the execution platform. The absence of a central memory forces us to distribute the database into fragments and to handle these fragments in parallel. Because communication is expensive in this kind of environment, parallel computations must be as autonomous as possible to avoid costly communications (or at least synchronizations). However, existing grid data mining projects (e.g. Discovery Net, GridMiner, DMGA [7], or Knowledge Grid [11]) provide mechanisms for integrating and deploying classical algorithms on grids, but not new grid-specific algorithms.
The DisDaMin project, by contrast, intends to tackle data mining tasks by considering the specifics of both data mining and grid computing. For data mining problems, an intelligent data partition is needed so that data fragments can be processed more independently. The main problem is how to obtain this intelligent partition. For the association rules problem, for example, the main criterion for an intelligent partition is that data rows within a fragment are as similar as possible (according to the values of each attribute), while data rows in different fragments are as dissimilar as possible. This criterion allows us to parallelize a problem that normally requires access to the whole database, and it also decreases complexity (see [2]). Because this distribution criterion resembles the objective of clustering algorithms, the partition can be produced by a clustering step. The usefulness of the intelligent partition obtained from clustering for the association rules problem has already been studied (see [2]).
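The partition criterion described above (similar rows within a fragment, dissimilar rows across fragments) can be illustrated with a small sketch. It uses plain k-means as a stand-in and is not the Progressive Clustering algorithm of the paper. The function name partition_rows, the sample rows, and the choice of the first k rows as initial centroids are assumptions made purely for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def partition_rows(rows, k, iterations=20):
    """Split rows into k fragments with plain k-means (illustrative stand-in)."""
    # Assumption: the first k rows serve as initial centroids (deterministic).
    centroids = [tuple(r) for r in rows[:k]]
    fragments = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each row joins the fragment with the nearest centroid.
        fragments = [[] for _ in range(k)]
        for row in rows:
            nearest = min(range(k), key=lambda i: dist(row, centroids[i]))
            fragments[nearest].append(row)
        # Update step: move each centroid to the mean of its fragment;
        # an empty fragment keeps its previous centroid.
        updated = [
            tuple(sum(col) / len(frag) for col in zip(*frag)) if frag else centroids[i]
            for i, frag in enumerate(fragments)
        ]
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return fragments


# Hypothetical numeric rows with two attributes each.
rows = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
fragments = partition_rows(rows, k=2)
```

In a distributed setting each fragment would then be sent to a separate node, so that a task such as association rule mining can run on it with little or no communication with the other nodes.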
Clearly, the clustering phase itself has to be distributed and fast so that it does not increase the global execution time. Clustering methods are described first, before the Distributed Progressive Clustering algorithm for execution on grids is introduced.