星型模型和雪花模型(数据仓库设计模型)
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
We are often told that one of the benefits of OBI EE is the speed and ease of development. Data sources can easily be added into the system, users can then quickly build queries and the results are easy to distribute. While I completely support this, to me this leaves a few questions when you go beyond the slick sales demos: do we need a Data Warehouse? How do we deal with data quality? How do we test? How do we ensure our great looking reports get from our development environment to our production environment and still display the same data?
This posting just concentrates on the Data Warehouse. Think of the project if we didn’t need a Data Warehouse, initially in terms of cost: no database license required, no ETL/Data Warehouse development required, no ongoing maintenance and maybe in the eyes of the business no black hole of money and time where the hairy developers go away, grumble about data quality and take longer than everyone thought. Think of all this new way from a business point of view: there is a new reporting requirement, they sit down with a business analyst, add the data source into the physical and then business model, graphically create the joins, and 10 minutes later the data is on the CEO’s dashboard, marvelous, and not a hairy developer type to be found anywhere in the process, so no cost and no hold-ups. But wait a minute: why have we been spending money developing Data Warehouses over the past couple of decades?
What was the Data Warehouse actually doing?
For starters, they prevented queries being run against the live transactional systems. Query and analysis tools can be very powerful, and hence can generate complex queries. If each of these queries was being run against the live systems then performance would be impacted, plus the reporting system would be dependent on all the systems being live all the time. Thus loading all the data into a ‘reporting database’ protected the transactional systems.
(我们应该防止SQL query直接在我们的transactional systems 上运行,这样会影响我们的application system)
But what about our real-time Data Warehouse, managers need up to the minute information to make informed decisions? Real-time Data Warehouses can be a double-edged sword, it is great to have up the minute information, but the downside is that reports keep on changing; it becomes more difficult to reconcile information or to get a consistent view of the organization. Sometimes it is actually useful to have a static view of the data for 24 hours. Real-time Data Warehousing can also be implemented using replication features like Change Data Capture in the database if required.
Data Warehouses also offer a number of functional advantages:
They can store historic data that may have been archived from the transactional system.
They can store the history of dimensions, so facts can be correctly categorized when they happened.
They can store data from different systems.
They can store the data in a way that makes it easy and efficient for users to query, this means that a large volume of data can be accessed and used effectively.