Chapter2-DataWarehouse
The Data Warehouse ETL Toolkit 2nd
Contents
Chapter 2 Launching and Managing the Project/Program
Define the Project
Assess Your Readiness for DW/BI
Strong Senior Business Management Sponsor(s)
Compelling Business Motivation
Feasibility
Factors Not Considered Readiness Deal Breakers
Address Shortfalls and Determine Next Steps
Strong Sponsor, Compelling Business Need, and Quality Data
Poor Quality Data
Weak Business Sponsor or IT-Only Sponsor
Too Much Demand from Multiple Business Sponsors
Well Meaning, But Overly Aggressive Business Sponsor
Legacy of Underperforming, Isolated Data Silos
Develop the Preliminary Scope and Charter
Focus on a Single Business Process
The Role of Rapid Application Development
Document the Scope/Charter
Build the Business Case and Justification
Determine the Financial Investments and Costs
Determine the Financial Returns and Benefits
Combine the Investments and Returns to Calculate ROI
Plan the Project
Establish the Project Identity
Staff the Project
Front Office: Sponsors and Drivers
Coaches: Project Managers and Leads
Regular Lineup: Core Project Team
Special Teams
Free Agents
Convert Individual Talent into a Team
Develop the Project Plan
Develop the Communication Plan
Project Team
Sponsor and Driver Briefings
Business User Community
Communication with Other Interested Parties
Manage the Project
Conduct the Project Team Kickoff Meeting
Monitor Project Status
Project Status Meetings
Project Status Reports
Maintain the Project Plan
Consolidate the Project Documentation
Data Warehouse (Translation of a Foreign-Language Text)
DATA WAREHOUSE
Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon: a way to keep customers by learning more about their needs.
“So,” you may ask, full of intrigue, “what exactly is a data warehouse?” Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.
According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process.” This short but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.
(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
(3) Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.
(4) Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.
In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.
“OK,” you now ask, “what, then, is data warehousing?” Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers” (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing” to refer only to the process of data warehouse construction, while the term “warehouse DBMS” is used to refer to the management and utilization of data warehouses. We will not make this distinction here.
“How are organizations using the information from data warehouses?” Many organizations are using this information to support business decision making activities, including:
(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending),
(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies,
(3) analyzing operations and looking for sources of profit,
(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.
Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. Integrating such data and providing easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.
The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.
Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.
1. Differences between operational database systems and data warehouses
Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.
The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers” in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.
The major distinguishing features between OLTP and OLAP are summarized as follows.
(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.
(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.
(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.
(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.
(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.
Other features that distinguish OLTP from OLAP systems include database size, frequency of operations, performance metrics, and so on.
2. But, why have a separate data warehouse?
“Since operational databases store huge amounts of data,” you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?” A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned” queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.
Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.
Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.
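The contrast drawn above, between a short OLTP lookup by primary key and a read-only OLAP aggregation over a star schema, can be made concrete with a small sketch. It uses Python's standard sqlite3 module and an in-memory database. The table names (dim_time, dim_region, fact_sales), the function names, and the sample rows are all hypothetical and chosen only for illustration; they do not come from the text.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
        CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE fact_sales (
            sale_id   INTEGER PRIMARY KEY,
            time_id   INTEGER REFERENCES dim_time(time_id),
            region_id INTEGER REFERENCES dim_region(region_id),
            amount    REAL
        );
        """
    )
    # Illustrative sample data only.
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                     [(1, 2023, "Q1"), (2, 2023, "Q2")])
    conn.executemany("INSERT INTO dim_region VALUES (?, ?)",
                     [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 100.0), (2, 1, 2, 50.0), (3, 2, 1, 75.0),
                      (4, 2, 1, 25.0), (5, 2, 2, 60.0)])
    conn.commit()
    return conn


def lookup_sale(conn: sqlite3.Connection, sale_id: int):
    """OLTP-style access: fetch a single record by its primary key."""
    return conn.execute(
        "SELECT amount FROM fact_sales WHERE sale_id = ?", (sale_id,)
    ).fetchone()


def sales_by_quarter_and_region(conn: sqlite3.Connection):
    """OLAP-style access: read-only aggregation across the dimensions."""
    return conn.execute(
        """
        SELECT t.year, t.quarter, r.region, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_time   AS t ON f.time_id = t.time_id
        JOIN dim_region AS r ON f.region_id = r.region_id
        GROUP BY t.year, t.quarter, r.region
        ORDER BY t.year, t.quarter, r.region
        """
    ).fetchall()


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    print(lookup_sale(warehouse, 3))
    for row in sales_by_quarter_and_region(warehouse):
        print(row)
```

The lookup touches one row, while the aggregate scans and groups the whole fact table. On realistic volumes that difference is the performance argument for keeping analytical queries off the operational database.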
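Books on Data Warehousing - Reply
Here are a few influential books on data warehousing:
1. The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling (《数据仓库工具箱:维度建模权威指南》)
Author: Ralph Kimball
This book is regarded as a classic in the data warehousing field. The author is a leading advocate of dimensional modeling, and the book explains the concepts, methods, and practical cases of dimensional modeling in detail. It is very helpful for understanding and building data warehouses.
2. Data Warehouse Design and Implementation (《数据仓库设计与实现》)
Author: W.H. Inmon
W.H. Inmon is known as the “father of the data warehouse.” This book, one of his representative works, systematically covers the design principles, architecture, and implementation process of data warehouses. It suits readers who want a comprehensive study of the subject.
3. Big Data Technology Principles and Applications: From Data Acquisition to Business Intelligence (《大数据技术原理与应用——从数据获取到商业智能》)
Author: Geng Xin (耿新) et al.
This book combines theory and practice and explains data warehouse technology in a big data environment in accessible terms, including Hadoop, data warehouse models, and ETL. It also discusses how to apply these technologies to business intelligence.
4. Data Warehousing and Business Intelligence: Concepts, Technologies, and Applications (《数据仓库与商务智能:概念、技术和应用》)
Authors: Ramesh Sharda, Dursun Delen, Efraim T. Turban
This book gives a full introduction to the basic concepts, key technologies, and application examples of data warehousing and business intelligence. It suits beginners as well as professionals who want a deeper understanding of how data warehouses are applied in business intelligence.
5. Data Lakes in Practice: Building an Enterprise Data Warehouse (《数据湖实战:构建企业级数据仓库》)
Author: Jarek Ratajski
With the development of big data technology, the data lake has become a new hot topic. This book mainly describes how to build enterprise-grade data warehouses and data lakes on modern big data technologies such as Hadoop and Spark.
You can choose among these books according to your needs and interests to better understand and master data warehousing knowledge and skills.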
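Computer English: Answers to the End-of-Chapter Exercises
Computer English: Reference Answers
Chapter 1
1. (1) Central Processing Unit (2) Random-access Memory (3) International Business Machines (4) Integrated Circuit (5) Large Scale Integration (6) Very Large Scale Integration (7) Personal Digital Assistant (8) Graphical User Interface
2. (1) data (2) software (3) IC (4) ENIAC (5) supercomputer (6) superconductivity
3. (1) F (ENIAC is the second digital computer, after the Atanasoff-Berry Computer) (2) T (3) F (Data is unorganized) (4) T (5) T (6) T
4. (1) artificial intelligence (2) optical computer (3) neural network (4) operating system (5) parallel processing (6) vacuum tube (7) integrated circuit (8) electrical resistance (9) silicon chip (10) minicomputer
5. Data is a collection of unorganized items that can include characters, numbers, graphics, and sound. A computer manages data and processes it to produce information. Data entered into a computer is called input, and the results of processing are called output. A computer can hold data and information in a place called storage for future use. The full cycle of input, processing, output, and storage is called the information processing cycle. People who interact with a computer or use the information it generates are called users.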
1. (1) Light-Emitting Diode (2) Static Random Access Memory (3) Read Only Memory (4) Arithmetic and Logical Unit (5) Cathode Ray Tube (6) Visual Display Unit (7) Programmable Read Only Memory (8) Liquid Crystal Display
2. (1) CPU (2) peripheral (3) memory (4) modem (5) control unit (6) byte
3. (1) T (2) T (3) F (RAM is volatile memory because the information within the computer chips is erased as soon as the computer is powered off, whereas ROM is nonvolatile) (4) T (5) T (6) F (Microphones and digital cameras are input devices)
4. (1) register set (2) host machine (3) binary (4) algorithm (5) optical disc (6) CD-RW (7) logic operation (8) barcode (9) peripheral device (10) volatile memory
5. A computer's memory can be viewed as a series of cells in which numbers can be stored and retrieved.
IBM DB2 Warehouse 2 Product Brochure
To better support innovation and differentiation, you need the ability to bring together a “customer view” with a traditional “product view” and you need to give more users and processes on demand access to accurate, in-context and actionable information. Of course, the idea of more timely and widespread information access is great. But the technologist side of your brain is probably screaming, “Complexity!” And the business side is probably dubious, given the potential costs and risks. Both sides know that status quo data warehousing solutions and approaches will not support these seemingly conflicting needs. That's why a new approach that employs more dynamic and balanced warehousing capabilities is required.
With IBM Balanced Warehouse offerings, IBM can help your company optimize warehousing performance with best practices to enable you to:
• Coordinate marketing plans across channels to position your company for growth.
• Manage inventory across channels and plan assortment based on marketplace needs.
• Tailor promotions to each customer segment.
• Enable staff with right-time views into inventory availability.
Watch ideas take flight with a flexible, manageable approach
IBM provides all of the software and hardware capabilities you need to deploy, maintain and evolve an enterprise-wide data warehouse through IBM Balanced Warehouse solutions. A robust combination of database, analytic and warehousing software, servers and storage components gives you the ability to analyze and act on large amounts of structured and unstructured information. Moreover, Balanced Warehouse solutions rely on industry open standards and nonproprietary hardware, so they'll work with your existing systems and support easy redeployment as needed.
IBM Balanced Warehouse solutions are preconfigured using best practices and extensive certification to support the needs of enterprise environments, including the need to:
• Handle large data volumes. IBM uses a modular design that enables you to easily and cost-effectively scale units to support data growth.
• Maintain high availability. Balanced Warehouse solutions use IBM components selected for optimum price and performance, and include hardware component redundancy and a fault-tolerant design for robust availability.
• Work with comprehensive, integrated software. All of the software tools you need to get started (including information storage, management and delivery tools, and business analytics tools) come standard.
Given their advanced, integrated capabilities and performance attributes, IBM Balanced Warehouse solutions are an ideal foundation to support dynamic warehousing. This approach enables you to leverage immediate business insight across merchandising, supply chain, store and channel operations, rather than limiting you to providing only after-the-fact reports and analysis from data warehouses. So more people and processes have the information they need to create differentiated customer experiences that help improve customer satisfaction and loyalty.
The heart of dynamic warehousing: IBM DB2 Warehouse
Derive more value from information more quickly without adding IT staff. Unlike most data warehousing and business intelligence solutions that are pieced together with components from multiple vendors, IBM DB2® Warehouse software, which is the heart of the IBM Balanced Warehouse solution, provides a complete, integrated and highly flexible and scalable data warehousing stack that works together from day one. It offers the tooling and infrastructure to simplify the design, deployment and maintenance of an enterprise data warehouse. And built-in retail data models (for example, models for customer centricity, merchandising management, store operations and product management, and supply chain management) and other industry-optimized mining tools and in-line analytics extend powerful warehousing capabilities to all frontline users.
Imagine what the IT department, decision makers and even store employees could do with a data warehouse that enables you to:
• Store more with less and improve query performance dramatically with the help of row compression tools, which can help reduce disk storage needs by 50 percent, and with materialized query tables and multidimensional clusters, which are designed to improve the performance of complex aggregate queries.
• Reduce investment risks with a modular, quality-tested solution that provides around-the-clock support from a single phone number and easy growth at a predictable cost.
• Provide users with visibility into operational and transactional data within the context of the applications they use every day, to support greater responsiveness to business needs.
• Exchange data in two directions to help ensure that the data warehouse is feeding accurate data to operational and transactional systems and business intelligence applications.
• Provide high performance for mixed workload query processing with the help of a shared-nothing architecture that can scale multiple workloads up and out without affecting performance.
• Unify business intelligence into a single solution with built-in analytic building blocks that help you extend analytics into applications.
Start seeing the advantage of a balanced warehouse
Based on IBM's experience providing data warehousing to leading companies around the world, IBM has identified three strategic pillars for warehouse solutions that guide its solution design: simplicity, reliability and performance, and extended insight. As your data volumes and need for dynamic information grow, you can be confident that IBM solutions designed using these principles will help you optimize the value of your information.
Choose a solution that's right for you
IBM understands what it takes to run a data warehouse in a retail enterprise. To meet your company's unique needs, IBM offers DB2 Warehouse in standalone solutions or as part of preconfigured, preintegrated, pretested and highly scalable IBM Balanced Warehouse solutions. Access to accurate information across merchandising, supply chain, store and channel operations is the key to delivering a superior shopping experience, creating a demand-driven supply chain and driving operational excellence. DB2 Warehouse solutions offer targeted analysis for merchandising, supply chain, multichannel and store applications. And with prebuilt retail data models, a proven implementation methodology and embedded mining capabilities, you can potentially achieve a faster time to value from data warehousing efforts when you employ DB2 Warehouse. By helping you give more users and applications access to dynamic information, Balanced Warehouse solutions can help you unlock the value of all of your data. So you can drive greater efficiency, differentiation and customer loyalty.
For more information
To learn more about IBM Balanced Warehouse solutions and IBM DB2 Warehouse, and for help choosing the solution that's right for you, contact your IBM sales representative or visit: /software/bi
© Copyright IBM Corporation 2007
IBM Corporation, Software Group, Route 100, Somers, NY 10589, U.S.A.
Produced in the United States of America 08-07. All Rights Reserved.
DB2, IBM, the IBM logo and are trademarks of International Business Machines Corporation in the United States, other countries or both. Other company, product and service names may be trademarks or service marks of others.
References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.
The information contained in this documentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this documentation, it is provided “as is” without warranty of any kind, express or implied. In addition, this information is based on IBM's current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this documentation or any other documentation. Nothing contained in this documentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM (or its suppliers or licensors), or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
The IBM home page on the Internet can be found at®.
IMB10923-USEN-00
Data Warehouse Model Design Process
Designing a data warehouse model is a crucial step in building a robust and efficient data infrastructure. It involves structuring and organizing data in a way that facilitates easy access, retrieval, and analysis for decision-making. A well-designed data warehouse model should be able to integrate data from multiple sources, maintain data quality, and provide valuable insights for business operations.
One of the key aspects of designing a data warehouse model is understanding the specific requirements of the organization and its stakeholders. This involves conducting thorough interviews and meetings with various departments and business users to gather requirements and ensure that the data warehouse model meets the needs of all stakeholders.
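The goal stated above, integrating data from multiple sources while keeping its quality consistent, can be illustrated with a minimal sketch. It assumes two hypothetical source systems (a "CRM" export and a "billing" export) that disagree on field names, gender encoding, and currency units. Every name in the code (CustomerSale, from_crm, from_billing, integrate, and the field names) is invented for illustration and does not describe any particular product or method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerSale:
    """Unified warehouse record with one naming and encoding convention."""
    customer: str      # title-cased, trimmed name
    gender: str        # "M", "F", or "U" (unknown)
    amount_usd: float  # always in dollars


def from_crm(row: dict) -> CustomerSale:
    # Hypothetical CRM source: names may be padded or lower-case,
    # gender is a letter, amount is a dollar string.
    return CustomerSale(
        customer=row["cust_name"].strip().title(),
        gender=row["gender"].upper(),
        amount_usd=float(row["amount"]),
    )


def from_billing(row: dict) -> CustomerSale:
    # Hypothetical billing source: gender is coded 1/0, amount is in cents.
    gender = {1: "M", 0: "F"}.get(row["sex"], "U")
    return CustomerSale(
        customer=row["customer"].strip().title(),
        gender=gender,
        amount_usd=row["amount_cents"] / 100,
    )


def integrate(crm_rows: list, billing_rows: list) -> list:
    """Map both sources to the unified schema and drop exact duplicates."""
    unified = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
    # dict.fromkeys keeps first occurrences in order; records are hashable
    # because the dataclass is frozen.
    return list(dict.fromkeys(unified))


if __name__ == "__main__":
    crm = [{"cust_name": " alice smith ", "gender": "f", "amount": "19.99"}]
    billing = [{"customer": "BOB JONES", "sex": 1, "amount_cents": 2500}]
    for record in integrate(crm, billing):
        print(record)
```

Keeping one small mapping function per source is a design choice made for this sketch: each source's quirks stay isolated, and a new source needs only one more function.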
Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kindsof data, it is necessary to maintain separate databases.数据仓库数据仓库为商务运作提供结构与工具,以便系统地组织、理解和使用数据进行决策。
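The contrast between OLTP and OLAP access drawn above can be illustrated with a small sketch. It assumes a toy star schema (one fact table, two dimension tables) held in an in-memory SQLite database; the table names, column names, and sample rows are hypothetical and chosen only for illustration, and SQLite stands in for whatever DBMS a real warehouse would use.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table and two dimension tables (all names hypothetical)."""
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        CREATE TABLE fact_sales (
            sale_id INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "laptop", "computers"), (2, "mouse", "accessories")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2023, 1), (2, 2023, 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 1000.0), (2, 2, 1, 20.0),
                      (3, 1, 2, 1500.0), (4, 2, 2, 30.0)])
    conn.commit()


def oltp_lookup(conn, sale_id):
    """OLTP-style access: fetch a single record through its primary key."""
    return conn.execute(
        "SELECT amount FROM fact_sales WHERE sale_id = ?", (sale_id,)
    ).fetchone()


def olap_sales_by_category_and_quarter(conn):
    """OLAP-style access: a read-only scan that joins dimensions and aggregates."""
    return conn.execute("""
        SELECT p.category, d.year, d.quarter, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.year, d.quarter
        ORDER BY p.category, d.year, d.quarter
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(oltp_lookup(conn, 3))  # (1500.0,)
for row in olap_sales_by_category_and_quarter(conn):
    print(row)
```

The lookup touches one row through an index, while the aggregate reads every fact row and summarizes it by dimension. That difference in workload is the reason the text gives for tuning and hosting the two kinds of systems separately.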
Data Warehousing Manual
Trademarks
All products and service marks mentioned herein are trademarks of the respective owners mentioned in the articles and/or on the website. The publishers cannot attest to the accuracy of the information provided. Use of a term in this book and/or website should not be regarded as affecting the validity of any trademark or service mark.
1st Edition April 2001
All rights reserved. Copyright © 2001 Friedr. Vieweg & Sohn Verlagsgesellschaft mbH, Braunschweig/Wiesbaden
international IT-training corporation, the editors have easy access to the latest information on IT-developments and are kept well-informed by their colleagues. In their research
IBM DB2 Warehouse Solution Guide
Data warehousing
To support your business objectives

Rely on business insight, not just intuition.
Guide business strategy more effectively with IBM DB2 Warehouse solutions for small data warehouse implementations

Chances are your systems contain all of the hard evidence that business leaders need to understand trends and make informed decisions. The problem is you don’t have a way to easily and cost-efficiently compile and analyze the data, so your organization’s decision makers must rely heavily on experience and intuition. But what intuition suggests and what data shows are often surprisingly different.

Take the example of a small bank in the northwestern United States. Bank management assumed that wealthy clients who made large deposits were the bank’s most profitable clients. However, after implementing a solution that enabled bank leaders to analyze customer profitability, they realized that the opposite was true. Wealthy clients were savvy negotiators who typically negotiated all of the profits out of the instruments they purchased from the bank. In fact, smaller, more modest investors were more profitable; the bank just needed a strategy to attract more of them.

Fact or best guess? The decision-making disconnect

In recent years, small and large organizations alike have seen rapid increases in the amounts of company and customer data available to them. For many companies, however, an inability to easily access and use this information strategically limits their ability to optimize operational efficiencies and differentiate themselves from the competition. And given commoditization and intense competitive pressures, businesses need every advantage they can get to deliver higher service levels, increase efficiency and address regulatory requirements.

Transform information into a strategic asset with a data warehouse

In any industry, whether you work in a small business or in a large company, if you can’t compile and analyze historical data, you can’t separate facts from best guesses to maximize your competitive advantage. And the reason that many organizations can’t get to this information is a lack of integration among company data sources ranging from desktops to legacy systems, servers and intranets. While it may be easy to access current sales or financial data in a single system, pulling together different types of historical data from multiple systems to see whether you can find any business opportunities is another story.

Fortunately, there is an easy and cost-effective way to compile and analyze your data for strategic advantage: a data warehouse. In fact, a data warehouse, which is a central repository of information from the systems across your business, can help you improve decision making and give flight to new ideas across key strategic areas of your business, including:
• Sales analysis. Understand the regions and time periods in which products are selling, and identify the factors that contribute to wins and losses.
• Customer relationship management. Better understand who your customers are, what they want and what they’re buying so you can give them what they want.
• Resource planning. Identify cost-cutting opportunities and budget trends to support better investment decisions.

Make more informed decisions: IBM DB2 Warehouse software

An IBM DB2® Warehouse solution can provide a data warehouse that delivers an up-to-the-moment, single view of company-wide data without overstretching your IT team or bank account. By integrating data sources ranging from spreadsheets to heterogeneous, siloed, legacy systems, DB2 Warehouse can help decision makers capitalize on an organization-wide view of the information. And with the help of online analytical processing (OLAP) and data mining capabilities from IBM Business Partners, the software can help you navigate and find hidden relationships in your data to spark innovative ideas and see new business opportunities.

DB2 Warehouse software includes a powerful graphical tool, called the SQW warehousing tool, for designing, deploying and loading the warehouse to support data mining and analytics activities. And an easy-to-use interface makes it possible for a wide range of employees to access the capabilities. Moreover, unlike some small-scale data warehouse solutions from third-party vendors that support only limited, difficult-to-scale solutions, flexibility is a key attribute of DB2 Warehouse software. DB2 Warehouse can support both operational and transactional workloads and prioritize different requests, organizations and users. The architecture also enables you to add more complex workloads, and it easily scales as your business requirements demand.

A vast network of IBM Business Partners underpins DB2 Warehouse solutions. Business Partners provide an added layer of local support as well as solutions that are proven to integrate easily with IBM technology.

Clear insight, costs and a growth path

Let’s face it, if you can’t deploy a data warehouse relatively quickly and maintain it with existing staff, then you probably don’t want one. What’s more, if it doesn’t start adding value soon after it’s in place, then it’s hard to justify the investment. Whether it’s by providing new insight through summarized data or through the output of a Business Partner application analyzing the key metrics for your business, DB2 Warehouse software can begin delivering insight as soon as your data is loaded.

To help simplify the deployment of a data warehouse, IBM offers the IBM Balanced Warehouse solution. Specifically designed to jump-start smaller warehousing implementations, the IBM Warehouse C class provides out-of-the-box solutions that include preintegrated, preconfigured DB2 Warehouse software and IBM systems and storage technology that are pretested to support optimal performance. Based on nonproprietary, readily available hardware, IBM Balanced Warehouse solutions can be easily reused and redeployed depending on changing business needs. They’re competitively priced and simple to use, and they scale easily as your business grows, helping to reduce hidden costs related to training, maintenance and growth.

If you want to implement a data warehouse solution on your own hardware, you can also choose from three different competitively priced versions of the DB2 Warehouse software, based on the features that make sense for your business.

IBM’s data warehouse offerings are flexible and agile so you can implement a solution that supports your current business needs and scale it all the way up to hundreds of terabytes of data. Take comfort in knowing that you can transform your data into reliable, consistent business insight and easily grow your data warehouse if your requirements change.

For more information

To learn more about the IBM DB2 Warehouse or IBM Balanced Warehouse solutions that best meet your business needs, visit: /software/bi

© Copyright IBM Corporation 2007
IBM Corporation Software Group, Route 100, Somers, NY 10589, U.S.A.
Produced in the United States of America 03-07
All Rights Reserved
DB2, IBM and the IBM logo are trademarks of International Business Machines Corporation in the United States, other countries or both.
Other company, product and service names may be trademarks or service marks of others.
References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.
The information contained in this documentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this documentation, it is provided “as is” without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this documentation or any other documentation. Nothing contained in this documentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM (or its suppliers or licensors), or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
The IBM home page on the Internet can be found at .
IMB10902-USEN-00
Artificial Intelligence and Data Mining course slides: 2.datawarehouse.ppt
The Data Warehouse is always growing.
Operational Database vs. Data warehouse
Operational DB: Similar data can have different representations
Data Warehouse: Unified view of all data elements
Why Data warehouse
The most common issue companies face when looking at data mining is that the information is not in one place.
The biggest challenge business analysts face in using data mining is how to extract, integrate, cleanse, and prepare data to solve their most pressing business problems.
Data Mart
Data Marts can serve as a test vehicle for companies exploring the potential benefits of Data Warehouses.
Chapter 2: Data Warehouse
summarization) of data from heterogeneous sources data quality: different sources typically use inconsistent data
element”.
Data Warehouse—Non-Volatile
A physically separate store of data transformed from the operational environment. Operational update of data does not occur in the data warehouse environment. Does not require transaction processing, recovery, and concurrency
OLAP: unit of work: complex query; records accessed: millions; users: hundreds; DB size: 100GB-TB; metric: query throughput, response
Why Separate Data Warehouse?
High performance for both systems DBMS— tuned for OLTP: access methods, indexing, concurrency
Data Warehouse—Integrated
Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures,
Artificial Intelligence and Data Mining course slides: 2.datawarehou
Operational Database vs. Data warehouse
Operational DB
Similar data can have different representations or meanings
The Data Warehouse
Integrated
The Data Warehouse is a centralized, consolidated database that integrates data retrieved from the entire organization.
Data Warehousing Process Overview
Operational Vs. Multidimensional View Of Sales
Creating A Data Warehouse
Data Warehouse environment
the source systems from which data is extracted
the tools used to extract data for loading the data warehouse
the data warehouse database itself where the data is stored
the desktop query and reporting tools used for decision support
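The four parts of the warehouse environment listed above (source system, extraction tool, warehouse database, query and reporting tool) can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular product: the CSV text stands in for a source system, an in-memory SQLite database stands in for the warehouse, and the names `SOURCE_CSV`, `extract`, `load`, and `report` are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source system: order records exported as CSV text.
SOURCE_CSV = "order_id,region,amount\n1,north,100\n2,south,80\n3,north,50\n"


def extract(csv_text):
    """Extraction tool: read rows out of the source system."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def load(conn, rows):
    """Load step: convert types and store the rows in the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(int(r["order_id"]), r["region"], float(r["amount"])) for r in rows],
    )
    conn.commit()


def report(conn):
    """Query and reporting tool: summarize sales by region for decision support."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


warehouse = sqlite3.connect(":memory:")
load(warehouse, extract(SOURCE_CSV))
print(report(warehouse))  # [('north', 150.0), ('south', 80.0)]
```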
The Data Warehouse
Time Variant
The Warehouse data represent the flow of data through time. It can even contain projected data.
Warehouse (Yard) Management Policy (Chinese and English)
Chapter 1: Purpose
This policy is established to strengthen the management of warehouse (yard) materials, to define the procedures and workflow for receiving and issuing materials, and to keep the warehouse (yard) orderly and safe.
Chapter 2: Duties of Warehouse (Yard) Personnel
1. Keep materials in the warehouse (yard) neatly stacked and clearly labeled, and keep the environment clean, sanitary, and safe.
2. Receiving: Using the materials and equipment procurement plan and the customs declaration documents, verify the model, specification, and quantity of incoming materials and check that the packaging is intact. Once the check passes, put the materials into storage and fill in the Unpacking Inspection Record and the Equipment and Materials Receiving Ledger.
3. Issuing: Issue materials according to the Construction Work Ticket prepared by the work-area construction personnel, and fill in the Materials and Equipment Requisition Form. On the 25th of each month, compile the totals and fill in the Materials and Equipment Usage Ledger.
Practical Logistics English (English-Chinese bilingual)
Summary
The chapter focuses on the concept of supply chain and supply chain management. A supply chain consists of firms collaborating to serve the needs of end-customers, to take advantage of strategic position, and to improve operating efficiency.
True or False
1. There are a variety of definitions of the term "logistics", each with a slightly different meaning.
2. Logistics involves the flow and storage of "goods, services, and related information".
5. Good customer service is to make sure that the right person receives the right product in the right quantity at the right place at the right time in the right condition, even if the cost is very high.
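[Figure: architecture diagram. Labels: client applications, browsers, SSL, Internet, application layer, web servers, data management layer, document, DB2.]
Decision-analysis data is multidimensional, and the analysis performed on it is complex. In a transaction processing environment, decision makers may not care about specific details. In a decision-analysis environment, an excessive volume of detail data both seriously degrades analysis efficiency and distracts decision makers.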
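Solution
[Figure: solution architecture. Labels: source systems (production, purchasing, finance); ETL (data extraction, migration, loading; real-time and incremental); data quality control; data reorganization; enterprise data model (RDW, DDB, DM); enterprise information integration (EII); reports and ad hoc queries for managers and analysts; Internet.]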
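BI systems vs. decision blind spots
When the head of a large state-owned enterprise reviewed the company's production and operating data for the past ten years, he was handed many different data reports.
After studying these reports closely, he was surprised to find that different systems led to opposite conclusions. For one product, for example, the dynamic cost shown in the ERP system differed greatly from that shown in the CRM and SCM systems. Judged by the ERP and CRM data, it was a successful, well-selling product. In the SCM system, however, its procurement and logistics costs were so high that this seemingly successful product was actually losing money.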
事务型处理数据和分析型处理数据的区别
特性 OLTP OLAP
特征 面向 用户 功能 DB 设计 数据 汇总 视图 工作单位 存取 关注 操作 访问记录数 用户数 DB规模 优先 度量
操作处理 事务 办事员、DBA、数据库专业人员 日常操作 基于E-R,面向应用 当前的;确保最新 原始的,高度详细 详细,一般关系 短的、简单事务 读/写 数据进入 主关键字上索引/散列 数十个 数千 100MB到GB 高性能,高可用性 事务吞吐量 共46页
共46页
11
集成性
数据仓库中的数据是从原 有分散的源数据库中提取 出来的,其每一个主题所 对应的源数据在原有的数 据库中有许多冗余和不一 致,且与不同的应用逻辑 相关。为了创建一个有效 的主题域,必须将这些来 自不同数据源的数据集成 起来,使之遵循统一的编 码规则。
共46页 12
稳定性
数据仓库内的数据有很长的时间跨度,通常是5-10 年。 数据仓库中的数据反映的是一段时间内历史数据的内 容,是不同时点的数据库快照的集合,以及基于撰写 快照进行统计、综合和重组的导出数据。主要供 企业高层决策分析之用,所涉及的数据操作主要是查 询,一般情况下并不进行修改操作. 数据仓库中的数据是不可实时更新的,仅当超过规 定的存储期限,才将其从数据仓库中删除,提取新的 数据经集成后输入数据仓库。
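OLAP column of the OLTP/OLAP comparison:
Feature: OLAP
Characteristic: informational processing
Orientation: analysis
User: knowledge worker (e.g., manager, executive, analyst)
Function: long-term informational requirements, decision support
DB design: star/snowflake, subject-oriented
Data: historical; maintained over time
Summarization: summarized, consolidated
View: summarized, multidimensional
Unit of work: complex query
Access: mostly read
Focus: information out
Operations: lots of scans
Number of records accessed: millions
Number of users: hundreds
DB size: 100 GB to TB
Priority: high flexibility, end-user autonomy
Metric: query throughput, response time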
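The contrast in unit of work and access pattern can be illustrated with a small sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the sample rows are hypothetical. A single in-memory database stands in for both kinds of system purely for illustration; in practice the two workloads run on separate databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_id, region, amount) VALUES (?, ?, ?)",
    [(1, "north", 120.0), (2, "south", 80.0), (3, "north", 50.0), (4, "east", 200.0)],
)

# OLTP-style unit of work: a short read/write transaction that touches one
# record located through the primary key.
with conn:
    conn.execute("UPDATE orders SET amount = amount + 10 WHERE order_id = ?", (2,))
oltp_row = conn.execute(
    "SELECT order_id, region, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# OLAP-style query: read-only, scans the whole table and returns summaries.
olap_rows = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)
print(olap_rows)
conn.close()
```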
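Limitations of Database Systems
Databases are suited to storing highly structured detail data from day-to-day transactions, whereas decision-oriented data is mostly historical, summarized, or computed. Such data is largely static: it does not need direct updates but can be refreshed periodically.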
Data Warehouse Process
IBM Information Analysis Framework
[Figure: IBM information analysis framework. Labels: production data sources (ERP, CRM, SCM); purchased data; DB2; Red Brick; DB2 Warehouse Manager; marts; virtual tables]
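Analytical Processing
Analytical processing supports decision analysis by managers, for example DSS, EIS, and multidimensional analysis. It helps decision makers analyze data to spot trends and diagnose problems. Analytical processing often has to access large volumes of historical data and supports complex queries. It also frequently uses external data, which is not produced by the transaction processing systems but comes from other external data sources.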
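Support for Management Decision Making
A data warehouse supports OLAP (online analytical processing), data mining, and decision analysis. Starting from the consolidated data in the warehouse, OLAP provides an analysis-oriented multidimensional model and uses multidimensional analysis to examine the data from multiple angles and at multiple levels, so that decision makers can analyze data in a more natural way. Data mining builds on the data in the warehouse and in multidimensional databases to discover latent patterns and make predictions. The purpose of a data warehouse is therefore to support well-founded management decisions, not transaction processing.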
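Analyzing data "from multiple angles and at multiple levels" can be sketched in a few lines of Python. The fact records, the three dimensions, and the helper names `roll_up` and `slice_cube` are illustrative assumptions; a real OLAP engine would do this over far larger, pre-aggregated data.

```python
from collections import defaultdict

# Hypothetical fact records: (year, region, product, sales amount).
FACTS = [
    (2023, "north", "widget", 100),
    (2023, "south", "widget", 60),
    (2023, "north", "gadget", 40),
    (2024, "north", "widget", 120),
    (2024, "south", "gadget", 70),
]
DIMENSIONS = ("year", "region", "product")


def roll_up(facts, keep):
    """Aggregate sales over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[i] for i in idx)] += fact[-1]
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Keep only the facts where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [f for f in facts if f[i] == value]


# Coarse level: one total per year.
by_year = roll_up(FACTS, ["year"])
# Finer level: totals per year and region.
by_year_region = roll_up(FACTS, ["year", "region"])
# Different angle: fix region = north, then summarize by product.
north_by_product = roll_up(slice_cube(FACTS, "region", "north"), ["product"])

print(by_year)
print(by_year_region)
print(north_by_product)
```

Moving from `by_year` to `by_year_region` changes the level of detail, while the slice changes the angle of analysis; both read the same underlying facts without modifying them.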
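Structure of a Data Warehouse System
[Figure: data warehouse system structure. Labels: metadata management (business metadata, technical metadata, etc.); data acquisition; data sources; data migration; data management; ETL; data cleansing; data analysis; business model; data mart management; data presentation; security and analysis management; end users; data storage management; data warehouse metadata management; data warehouse. Further labels that appear to belong to the solution diagram: ETL; sales system; data extraction; decision makers; Intranet; daily data additions; weekly data loads; daily and periodic product reports]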
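The extract, cleanse, and load steps named in the diagram labels can be sketched as three small Python functions. The source rows, the `sales_fact` table, and the quality rule (reject rows whose amount is not numeric) are hypothetical and only illustrate the shape of an ETL step, not a production pipeline.

```python
import sqlite3

# Hypothetical rows as they might arrive from a source sales system.
SOURCE_ROWS = [
    {"order_id": "1", "region": " North ", "amount": "120.50"},
    {"order_id": "2", "region": "south", "amount": "n/a"},  # fails the quality check
    {"order_id": "3", "region": "SOUTH", "amount": "80"},
]


def extract():
    """Extract: read raw records from the source (here an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: cleanse values and set aside rows that fail quality checks."""
    clean, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)
            continue
        clean.append((int(row["order_id"]), row["region"].strip().lower(), amount))
    return clean, rejected


def load(conn, rows):
    """Load: append the cleansed rows to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
clean_rows, rejected_rows = transform(extract())
load(conn, clean_rows)
loaded = conn.execute(
    "SELECT order_id, region, amount FROM sales_fact ORDER BY order_id"
).fetchall()
print(loaded)
print(rejected_rows)
```

Returning the rejected rows separately, instead of dropping them, keeps the data quality control step auditable.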
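Time Variance
Many business analyses require forecasting trends, and trend analysis needs access to historical data. The data warehouse must therefore continually capture changed data from the OLTP databases, generate snapshots of those databases, and add them to the warehouse after integration. In addition, as time passes the warehouse must delete expired data that no longer helps analysis, and it must add summarized data at prescribed time intervals.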
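The snapshot-and-purge cycle described above can be sketched as follows. The record layout, the helper names, and the 5-year retention value are assumptions chosen for illustration, and the cutoff uses 365-day years as a simplification.

```python
from datetime import date, timedelta

RETENTION_YEARS = 5  # assumed retention period, approximated with 365-day years


def add_snapshot(warehouse, snapshot_date, operational_rows):
    """Append a dated copy of the operational data; existing rows are never edited."""
    for row in operational_rows:
        warehouse.append({"snapshot_date": snapshot_date, **row})


def purge_expired(warehouse, today):
    """Return the warehouse without snapshots older than the retention period."""
    cutoff = today - timedelta(days=365 * RETENTION_YEARS)
    return [r for r in warehouse if r["snapshot_date"] >= cutoff]


warehouse = []
add_snapshot(warehouse, date(2018, 12, 31), [{"account": "A", "balance": 100}])
add_snapshot(warehouse, date(2023, 12, 31), [{"account": "A", "balance": 150}])
add_snapshot(warehouse, date(2024, 6, 30), [{"account": "A", "balance": 170}])

warehouse = purge_expired(warehouse, today=date(2024, 6, 30))

# The retained snapshots form a time series that trend analysis can read.
trend = [(r["snapshot_date"].isoformat(), r["balance"]) for r in warehouse]
print(trend)
```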
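Technical Requirements of a Data Warehouse
High performance for complex analysis: such analysis aggregates and consolidates large volumes of data, and complex queries often use multi-table joins, accumulation, grouping, sorting, and similar operations.
Integration of extracted data: the data in a warehouse is extracted from multiple application domains and takes different structures and forms in different domains and database systems, so how to integrate it is an important part of building a data warehouse.
Interface support for end users who make high-level decisions: the warehouse must provide a variety of analysis tools.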
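The kind of query the first requirement refers to, combining a multi-table join with accumulation, grouping, and sorting, can be sketched with Python's built-in `sqlite3` module. The small star schema below (`fact_sales`, `dim_product`, `dim_date`) and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'tools'), (2, 'toys');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES
        (1, 10, 100.0), (1, 11, 150.0), (2, 10, 30.0), (2, 11, 20.0), (1, 11, 50.0);
    """
)

# Join the fact table to both dimensions, sum the amounts per group,
# and sort each year's categories by total sales.
report = conn.execute(
    """
    SELECT d.year, p.category, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY d.year, p.category
    ORDER BY d.year, total DESC
    """
).fetchall()

for row in report:
    print(row)
conn.close()
```

On warehouse-scale data, every one of these steps reads many rows, which is why performance for such queries is listed as a requirement in its own right.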
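Data Warehouse
A data warehouse stores data selected from multiple databases or other information sources and gives upper-layer applications a unified user interface for data query and analysis. It is built to support the main business of the whole enterprise, and its main characteristic is that it contains a large amount of consolidated and derived information covering the entire enterprise.
A data warehouse is an analytical database that serves as the foundation for DSS. It holds large volumes of read-only data and supplies the information needed to make decisions.
A data warehouse is a collection of data that is separate from operational systems, integrated on a standard enterprise model, time-stamped, subject-oriented, and non-updatable.
Data warehousing has developed rapidly since W. H. Inmon published Building the Data Warehouse in 1992, which is taken as its starting point. Inmon is known as the father of the data warehouse.
Inmon's definition: a data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data.
[Fragment from another slide; its beginning is missing] ... query has to start multiple local systems, so communication and execution overhead is high.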
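Prerequisites for Implementing a Data Warehouse
The accumulated data has reached a certain scale.
The organization faces fierce market competition.
Funding for IT can be guaranteed.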
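Development of the Data Warehouse
NCR built the first data warehouse, for Wal-Mart.
In 1996, IDC of Canada surveyed 62 European and American companies that had implemented data warehouses. The results showed that data warehouses brought these companies large returns.
Early data warehouses mostly used the client/server architecture that was popular at the time. In recent years distributed object technology has advanced rapidly, and the overall data warehouse architecture has been divided by function into a number of distributed objects. These objects can be used directly to build a data warehouse and can also expose callable interfaces to users within applications.
IBM's laboratories have carried out more than 10 years of research on data warehousing and have turned the results into commercial products.
Other database vendors have also put forward their own solutions in the data warehouse field.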