Specialized English Translation for Information Management Majors
The ETL (extract, transform, load) subsystem is an important subsystem of MSMiner. The main purpose of the ETL function module is to transform operational data in the source databases into analytical data in the data warehouse. Data in a data warehouse is extracted from dispersed databases (for example Oracle, SQL Server, Access, FoxPro, Excel, and DB2) and then integrated. Operational data in the source databases also differs in many ways from analytical data in the warehouse, so loading data from the various sources directly into the warehouse is not a good approach. To obtain clean data for the warehouse, data from the source databases must first be cleaned, collected, and transformed, and only then integrated into the warehouse. This is a key and complex step in building a data warehouse. Generally speaking, the ETL subsystem needs to accomplish the following tasks:
(1) Because the source data from dispersed databases contains duplicate and conflicting records, the subsystem should reconcile the conflicting data.
(2) To obtain comprehensive data in the data warehouse, the subsystem should restructure the original data from an application-oriented form into a subject-oriented one, and generate and compute derived data.
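The two tasks above can be illustrated with a minimal Python sketch. The text does not say how MSMiner resolves conflicts. The source systems, the field names, and the conflict rule (the most recently updated record wins) are all assumptions made for illustration. Task (1) merges customer records from two sources into one record per key. Task (2) reorganizes application-oriented order rows around a "customer" subject and derives totals.

```python
from collections import defaultdict

# Hypothetical operational records from two source systems; field names are illustrative.
SOURCE_A = [  # e.g. an order-entry application
    {"customer_id": 1, "name": "Li Wei", "city": "Beijing", "updated": 3},
    {"customer_id": 2, "name": "Zhang San", "city": "Shanghai", "updated": 1},
]
SOURCE_B = [  # e.g. a billing application holding overlapping customers
    {"customer_id": 1, "name": "Li Wei", "city": "Tianjin", "updated": 5},
    {"customer_id": 3, "name": "Wang Fang", "city": "Wuhan", "updated": 2},
]
ORDERS = [  # application-oriented transaction rows
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 3, "amount": 50.0},
]


def unify(*sources):
    """Task (1): merge records from all sources, keeping one record per key.

    Conflict rule (an assumption): the most recently updated record wins.
    """
    merged = {}
    for source in sources:
        for rec in source:
            key = rec["customer_id"]
            if key not in merged or rec["updated"] > merged[key]["updated"]:
                merged[key] = rec
    return merged


def to_subject(customers, orders):
    """Task (2): reorganize data around the 'customer' subject and derive totals."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for order in orders:
        totals[order["customer_id"]] += order["amount"]
        counts[order["customer_id"]] += 1
    return [
        {
            "customer_id": cid,
            "name": rec["name"],
            "city": rec["city"],
            "order_count": counts[cid],    # 0 for customers with no orders
            "total_amount": totals[cid],   # 0.0 for customers with no orders
        }
        for cid, rec in sorted(customers.items())
    ]


if __name__ == "__main__":
    for row in to_subject(unify(SOURCE_A, SOURCE_B), ORDERS):
        print(row)
```

Because of the assumed rule, customer 1 keeps the city from the newer SOURCE_B record. A real system might instead rank sources by trustworthiness or flag conflicts for manual review.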
The ETL subsystem consists of four modules:
(1) A user-friendly interface
Through this interface, users can conveniently perform ETL operations such as designing ETL tasks, registering new ETL DLL functions, scheduling and executing ETL tasks, and viewing the results of ETL tasks.
(2) Integrated ETL function management and ETL task management
This module handles registering new ETL DLL functions, building new ETL tasks, scheduling and processing ETL tasks, and so on.
(3) Uniform metadata management
The whole subsystem is developed in a metadata-oriented way: all information in the subsystem, including data sources, algorithms, and results, is managed through metadata.
(4) The database server
The ETL subsystem supports a variety of dispersed databases (for example Oracle, SQL Server, Access, FoxPro, Excel, and DB2).
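The metadata-oriented design of module (3) can be sketched in Python. In this sketch an ETL task is described entirely as data: its sources, the ETL functions it applies, and the table that receives its result. The interface, the task manager, and the executor can then all work from the same record. The class and field names, the DLL file name, and the in-memory JSON store are hypothetical and not taken from MSMiner.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class DataSourceMeta:
    name: str
    db_type: str      # e.g. "Oracle", "SQL Server", "Access"
    connection: str   # hypothetical connection string


@dataclass
class EtlFunctionMeta:
    name: str
    library: str      # file name of the DLL implementing the function (hypothetical)
    params: Dict[str, object] = field(default_factory=dict)


@dataclass
class EtlTaskMeta:
    task_id: str
    sources: List[DataSourceMeta]    # where the data comes from
    steps: List[EtlFunctionMeta]     # ETL functions, applied in order
    target_table: str                # where the result goes in the warehouse
    schedule: str = ""               # e.g. "daily 02:00"; empty means run on demand


class MetadataRepository:
    """In-memory stand-in for the metadata store; a real one would persist it."""

    def __init__(self) -> None:
        self._tasks: Dict[str, str] = {}

    def save(self, task: EtlTaskMeta) -> None:
        # asdict() also converts the nested dataclasses inside the lists;
        # storing plain JSON lets every module read the same description.
        self._tasks[task.task_id] = json.dumps(asdict(task))

    def load(self, task_id: str) -> dict:
        return json.loads(self._tasks[task_id])


if __name__ == "__main__":
    repo = MetadataRepository()
    repo.save(EtlTaskMeta(
        task_id="load_customers",
        sources=[DataSourceMeta("crm", "Oracle", "crm-host:1521/crm")],
        steps=[EtlFunctionMeta("dedupe", "dedupe.dll", {"key": "customer_id"})],
        target_table="dim_customer",
    ))
    print(repo.load("load_customers")["steps"][0]["library"])  # dedupe.dll
```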
The subsystem provides an extensible ETL function library. The main ETL algorithms are implemented as dynamic link libraries (DLLs) that share a uniform interface, and users design ETL tasks to suit their needs by choosing the relevant ETL DLLs. At present the subsystem provides about 30 kinds of ETL DLLs. Users can also develop new ETL DLLs that conform to the uniform interface and add them to the ETL function library. To improve efficiency, ETL tasks can be scheduled to run at designated times and processed concurrently.
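The extensible function library can be approximated in Python, with plain functions standing in for DLLs. The text does not give MSMiner's actual DLL interface, so this sketch assumes a uniform interface of `(rows, params) -> rows`. The names `EtlFunctionLibrary`, `drop_nulls`, `uppercase`, `run_task`, and `run_concurrently` are hypothetical. In the sketch:

- Registering a function mirrors adding a DLL to the library.
- A task is an ordered list of the functions a user chose.
- Independent tasks run concurrently in a thread pool.

Scheduling at designated times is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# Assumed uniform interface: every ETL function takes rows plus parameters and returns rows.
EtlFunction = Callable[[List[Row], dict], List[Row]]
Step = Tuple[str, dict]  # (registered function name, parameters)


class EtlFunctionLibrary:
    """Registry standing in for the extensible ETL function library (DLLs in MSMiner)."""

    def __init__(self) -> None:
        self._functions: Dict[str, EtlFunction] = {}

    def register(self, name: str, fn: EtlFunction) -> None:
        # Refuse duplicate names so an existing function is never silently replaced.
        if name in self._functions:
            raise ValueError(f"ETL function {name!r} is already registered")
        self._functions[name] = fn

    def get(self, name: str) -> EtlFunction:
        return self._functions[name]


def drop_nulls(rows: List[Row], params: dict) -> List[Row]:
    # Remove rows where the given column is missing or None.
    col = params["column"]
    return [r for r in rows if r.get(col) is not None]


def uppercase(rows: List[Row], params: dict) -> List[Row]:
    # Normalize a text column to upper case, leaving the input rows unchanged.
    col = params["column"]
    return [{**r, col: str(r[col]).upper()} for r in rows]


def run_task(library: EtlFunctionLibrary, rows: List[Row], steps: List[Step]) -> List[Row]:
    """Run one ETL task: apply the chosen functions in order."""
    for name, params in steps:
        rows = library.get(name)(rows, params)
    return rows


def run_concurrently(library, tasks, max_workers=4):
    """Process independent tasks in parallel; returns results keyed by task name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            task_name: pool.submit(run_task, library, rows, steps)
            for task_name, (rows, steps) in tasks.items()
        }
        return {task_name: f.result() for task_name, f in futures.items()}


if __name__ == "__main__":
    lib = EtlFunctionLibrary()
    lib.register("drop_nulls", drop_nulls)
    lib.register("uppercase", uppercase)
    tasks = {
        "cities": ([{"city": "beijing"}, {"city": None}],
                   [("drop_nulls", {"column": "city"}), ("uppercase", {"column": "city"})]),
        "names": ([{"name": "li"}], [("uppercase", {"column": "name"})]),
    }
    print(run_concurrently(lib, tasks))
    # {'cities': [{'city': 'BEIJING'}], 'names': [{'name': 'LI'}]}
```

A thread pool suits I/O-bound work such as reading from source databases. For CPU-heavy transformations, `concurrent.futures.ProcessPoolExecutor` is the usual alternative in Python, though it requires the functions and data to be picklable.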