数据仓库--外文翻译
SQL数据库外文翻译--数据库的工作
Working with DatabasesThis chapter describes how to use SQL statements in embedded applications to control databases. There are three database statements that set up and open databases for access: SET DATABASE declares a database handle, associates the handle with an actual database file, and optionally assigns operational parameters for the database.SET NAMES optionally specifies the character set a client application uses for CHAR, VARCHAR, and text Blob data. The server uses this information totransli terate from a database‟s default character set to the client‟s character set on SELECT operations, and to transliterate from a client application‟s character set to the database character set on INSERT and UPDATE operations.g CONNECT opens a database, allocates system resources for it, and optionally assigns operational parameters for the database.All databases must be closed before a program ends. A database can be closed by using DISCONNECT, or by appending the RELEASE option to the final COMMIT or ROLLBACK in a program. Declaring a databaseBefore a database can be opened and used in a program, it must first be declared with SET DATABASE to:CHAPTER 3 WORKING WITH DATABASES. Establish a database handle. Associate the database handle with a database file stored on a local or remote node.A database handle is a unique, abbreviated alias for an actual database name. Database handles are used in subsequent CONNECT, COMMIT RELEASE, and ROLLBACK RELEASE statements to specify which databases they should affect. Except in dynamic SQL (DSQL) applications, database handles can also be used inside transaction blocks to qualify, or differentiate, table names when two or more open databases contain identically named tables.Each database handle must be unique among all variables used in a program. Database handles cannot duplicate host-language reserved words, and cannot be InterBase reserved words.The following statement illustrates a simple database declaration:EXEC SQLSET DATABASE DB1 = ‟employee.gdb‟;This database declaration identifies the database file, employee.gdb, as a database the program uses, and assigns the database a handle, or alias, DB1.If a program runs in a directory different from the directory that contains the database file, then the file name specification in SET DATABASE must include a full path name, too. For example, the following SET DATABASE declaration specifies the full path to employee.gdb:EXEC SQLSET DATABASE DB1 = ‟/interbase/examples/employee.gdb‟;If a program and a database file it uses reside on different hosts, then the file name specification must also include a host name. The following declaration illustrates how a Unix host name is included as part of the database file specification on a TCP/IP network:EXEC SQLSET DATABASE DB1 = ‟jupiter:/usr/interbase/examples/employee.gdb‟;On a Windows network that uses the Netbeui protocol, specify the path as follows: EXEC SQLSET DATABASE DB1 = ‟//venus/C:/Interbase/examples/employee.gdb‟; DECLARING A DATABASEEMBEDDED SQL GUIDE 37Declaring multiple databasesAn SQL program, but not a DSQL program, can access multiple databases at the same time. In multi-database programs, database handles are required. A handle is used to:1. Reference individual databases in a multi-database transaction.2. Qualify table names.3. Specify databases to open in CONNECT statements.Indicate databases to close with DISCONNECT, COMMIT RELEASE, and ROLLBACK RELEASE.DSQL programs can access only a single database at a time, so database handle use is restricted to connecting to and disconnecting from a database.In multi-database programs, each database must be declared in a separate SET DATABASE statement. For example, the following code contains two SET DATABASE statements:. . .EXEC SQLSET DATABASE DB2 = ‟employee2.gdb‟;EXEC SQLSET DATABASE DB1 = ‟employee.gdb‟;. . .4Using handles for table namesWhen the same table name occurs in more than one simultaneously accessed database, a database handle must be used to differentiate one table name from another. The database handle is used as a prefix to table names, and takes the formhandle.table.For example, in the following code, the database handles, TEST and EMP, are used to distinguish between two tables, each named EMPLOYEE:. . .EXEC SQLDECLARE IDMATCH CURSOR FORSELECT TESTNO INTO :matchid FROM TEST.EMPLOYEEWHERE TESTNO > 100;EXEC SQLDECLARE EIDMATCH CURSOR FORSELECT EMPNO INTO :empid FROM EMP.EMPLOYEEWHERE EMPNO = :matchid;. . .CHAPTER 3 WORKING WITH DATABASES38 INTERBASE 6IMPORTANTThis use of database handles applies only to embedded SQL applications. DSQL applications cannot access multiple databases simultaneously.4Using handles with operationsIn multi-database programs, database handles must be specified in CONNECT statements to identify which databases among several to open and prepare for use in subsequent transactions.Database handles can also be used with DISCONNECT, COMMIT RELEASE, and ROLLBACKRELEASE to specify a subset of open databases to close.To open and prepare a database w ith CONNECT, see “Opening a database” on page 41.To close a database with DISCONNECT, COMMIT RELEASE, or ROLLBACK RELEASE, see“Closing a database” on page 49. To learn more about using database handles in transactions, see “Accessing an open database” on p age 48.Preprocessing and run time databasesNormally, each SET DATABASE statement specifies a single database file to associate with a handle. When a program is preprocessed, gpre uses the specified file to validate the program‟s table and column referenc es. Later, when a user runs the program, the same database file is accessed. Different databases can be specified for preprocessing and run time when necessary.4Using the COMPILETIME clause A program can be designed to run against any one of several identically structured databases. In other cases, the actual database that a program will use at runtime is not available when a program is preprocessed and compiled. In such cases, SET DATABASE can include a COMPILETIME clause to specify a database for gpre to test against during preprocessing. For example, the following SET DATABASE statement declares that employee.gdb is to be used by gpre during preprocessing: EXEC SQLSET DATABASE EMP = COMPILETIME ‟employee.gdb‟;IMPORTANTThe file specification that follows the COMPILETIME keyword must always be a hard-coded, quoted string.DECLARING A DATABASEEMBEDDED SQL GUIDE 39When SET DATABASE uses the COMPILETIME clause, but no RUNTIME clause, and does not specify a different database file specification in a subsequent CONNECT statement, the same database file is used both for preprocessing and run time. To specify different preprocessing and runtime databases with SET DATABASE, use both the COMPILETIME andRUNTIME clauses.4Using the RUNTIME clauseWhen a database file is specified for use during preprocessing, SET DATABASE can specify a different database to use at run time by including the RUNTIME keyword and a runtime file specification:EXEC SQLSET DATABASE EMP = COMPILETIME ‟employee.gdb‟RUNTIME ‟employee2.gdb‟;The file specification that follows the RUNTIME keyword can be either ahard-coded, quoted string, or a host-language variable. For example, the following C code fragment prompts the user for a database name, and stores the name in a variable that is used later in SET DATABASE:. . .char db_name[125];. . .printf("Enter the desired database name, including node and path):\n");gets(db_name);EXEC SQLSET DATABASE EMP = COMPILETIME ‟employee.gdb‟ RUNTIME : db_name; . . .Note host-language variables in SET DATABASE must be preceded, as always, by a colon.Controlling SET DATABASE scopeBy default, SET DATABASE creates a handle that is global to all modules in an application.A global handle is one that may be referenced in all host-language modules comprising the program. SET DATABASE provides two optional keywords to change the scope of a declaration:g STATIC limits declaration scope to the module containing the SET DATABASE statement. No other program modules can see or use a database handle declared STATIC.CHAPTER 3 WORKING WITH DATABASES40 INTERBASE 6EXTERN notifies gpre that a SET DATABASE statement in a module duplicates a globally-declared database in another module. If the EXTERN keyword is used, then another module must contain the actual SET DATABASE statement, or an error occurs during compilation.The STATIC keyword is used in a multi-module program to restrict database handle access to the single module where it is declared. The following example illustrates the use of theSTATIC keyword:EXEC SQLSET DATABASE EMP = STATIC ‟employee.gdb‟;The EXTERN keyword is used in a multi-module program to signal that SET DATABASE in one module is not an actual declaration, but refers to a declaration made in a different module. Gpre uses this information during preprocessing. The following example illustrates the use of the EXTERN keyword:EXEC SQLSET DATABASE EMP = EXTERN ‟employee.gdb‟;If an application contains an EXTERN reference, then when it is used at run time, the actual SET DATABASE declaration must be processed first, and the database connected before other modules can access it.A single SET DATABASE statement can contain either the STATIC or EXTERN keyword, but not both. A scope declaration in SET DATABASE applies to both COMPILETIME and RUNTIME databases.Specifying a connection character setWhen a client application connects to a database, it may have its own character set requirements. The server providing database access to the client does not know about these requirements unless the client specifies them. The client application specifies its character set requirement using the SET NAMES statement before it connects to the database.SET NAMES specifies the character set the server should use when translating data from the database to the client application. Similarly, when the client sends data to the database, the server translates the data from the client‟s character set to the database‟s default character set (or the character set for an individual column if it differs from the databas e‟s default character set). For example, the following statements specify that the client is using the DOS437 character set, then connect to the database:EXEC SQLOPENING A DATABASEEMBEDDED SQL GUIDE 41SET NAMES DOS437;EXEC SQLCONNECT ‟europe.gdb‟ USER ‟JAMES‟ PASSWORD ‟U4EEAH‟;For more information about character sets, see the Data Definition Guide. For the complete syntax of SET NAMES and CONNECT, see the Language Reference. Opening a databaseAfter a database is declared, it must be attached with a CONNECT statement before it can be used. CONNECT:1. Allocates system resources for the database.2. Determines if the database file is local, residing on the same host where the application itself is running, or remote, residing on a different host.3. Opens the database and examines it to make sure it is valid.InterBase provides transparent access to all databases, whether local or remote. If the database structure is invalid, the on-disk structure (ODS) number does not correspond to the one required by InterBase, or if the database is corrupt, InterBase reports an error, and permits no further access. Optionally, CONNECT can be used to specify:4. A user name and password combination that is checked against the server‟s security database before allowing the connect to succeed. User names can be up to 31 characters.Passwords are restricted to 8 characters.5. An SQL role name that the user adopts on connection to the database, provided that the user has previously been granted membership in the role. Regardless of role memberships granted, the user belongs to no role unless specified with this ROLE clause.The client can specify at most one role per connection, and cannot switch roles except by reconnecting.6. The size of the database buffer cache to allocate to the application when the default cache size is inappropriate.Using simple CONNECT statementsIn its simplest form, CONNECT requires one or more database parameters, each specifying the name of a database to open. The name of the database can be a: Database handle declared in a previous SET DATABASE statement.CHAPTER 3 WORKING WITH DATABASES42 INTERBASE 61. Host-language variable.2. Hard-coded file name.4Using a database handleIf a program uses SET DATABASE to provide database handles, those handles should be used in subsequent CONNECT statements instead of hard-coded names. For example,. . .EXEC SQLSET DATABASE DB1 = ‟employee.gdb‟;EXEC SQLSET DATABASE DB2 = ‟employee2.gdb‟;EXEC SQLCONNECT DB1;EXEC SQLCONNECT DB2;. . .There are several advantages to using a database handle with CONNECT:1. Long file specifications can be replaced by shorter, mnemonic handles.2. Handles can be used to qualify table names in multi-database transactions. DSQL applications do not support multi-database transactions.3. Handles can be reassigned to other databases as needed.4. The number of database cache buffers can be specified as an additional CONNECT parameter.For more information about setting the number of database cache buffers, see “Setting d atabase cache buffers” on page 47. 4Using strings or host-language variables Instead of using a database handle, CONNECT can use a database name supplied at run time. The database name can be supplied as either a host-language variable or a hard-coded, quoted string.The following C code demonstrates how a program accessing only a single database might implement CONNECT using a file name solicited from a user at run time:. . .char fname[125];. . .printf(‟Enter the desired database name, including nodeand path):\n‟);OPENING A DATABASEEMBEDDED SQL GUIDE 43gets(fname);. . .EXEC SQLCONNECT :fname;. . .TipThis technique is especially useful for programs that are designed to work with many identically structured databases, one at a time, such as CAD/CAM or architectural databases.MULTIPLE DATABASE IMPLEMENTATIONTo use a database specified by the user as a host-language variable in a CONNECT statement in multi-database programs, follow these steps:1. Declare a database handle using the following SET DATABASE syntax:EXEC SQLSET DATABASE handle = COMPILETIME ‟ dbname‟;Here, handle is a hard-coded database handle supplied by the programmer, dbnameis a quoted, hard-coded database name used by gpre during preprocessing.2. Prompt the user for a database to open.3. Store the database name entered by the user in a host-language variable.4. Use the handle to open the database, associating the host-language variable with the handle using the following CONNECT syntax:EXEC SQLCONNECT : variable AS handle;The following C code illustrates these steps:. . .char fname[125];. . .EXEC SQLSET DATABASE DB1 = ‟employee.gdb‟;printf("Enter the desired database name, including nodeand path):\n");gets(fname);EXEC SQLCONNECT :fname AS DB1;. . .CHAPTER 3 WORKING WITH DATABASES44 INTERBASE 6In this example, SET DATABASE provides a hard-coded database file name for preprocessing with gpre. When a user runs the program, the database specified in the variable, fname, is used instead. 4Using a hard-coded database namesIN SINGE-DATABASE PROGRAMSIn a single-database program that omits SET DATABASE, CONNECT must contain a hard-coded, quoted file name in the following format:EXEC SQLCONNECT ‟[ host[ path]] filename‟; host is required only if a program and a dat abase file it uses reside on different nodes.Similarly, path is required only if the database file does not reside in the current working directory. For example, the following CONNECT statement contains ahard-coded file name that includes both a Unix host name and a path name:EXEC SQLCONNECT ‟valdez:usr/interbase/examples/employee.gdb‟;Note Host syntax is specific to each server platform.IMPORTANT A program that accesses multiple databases cannot use this form of CONNECT.IN MULTI-DATABASE PROGRAMSA program that accesses multiple databases must declare handles for each of them in separate SET DATABASE statements. These handles must be used in subsequent CONNECT statements to identify specific databases to open:. . .EXEC SQLSET DATABASE DB1 = ‟employee.gdb‟;EXEC SQLSET DATABASE DB2 = ‟employee2.gdb‟;EXEC SQLCONNECT DB1;EXEC SQLCONNECT DB2;. . .Later, when the program closes these databases, the database handles are no longer in use. These handles can be reassigned to other databases by hard-coding a file name in a subsequent CONNECT statement. For example,OPENING A DATABASEEMBEDDED SQL GUIDE 45. . .EXEC SQLDISCONNECT DB1, DB2;EXEC SQLCONNECT ‟project.gdb‟ AS DB1;. . .Additional CONNECT syntaxCONNECT supports several formats for opening databases to provide programming flexibility. The following table outlines each possible syntax, provides descriptions and examples, and indicates whether CONNECT can be used in programs that access single or multiple databases:For a complete discussion of CONNECT syntax and its uses, see the Language Reference.Syntax Description ExampleSingle accessMultiple accessCONNECT …dbfile‟; Opens a single, hard-coded database file, dbfile.EXEC SQLCONNECT …employee.gdb‟;Yes NoCONNECT handle; Opens the database file associated with a previously declared database handle. This is the preferred CONNECT syntax.EXEC SQLCONNECT EMP;Yes YesCONNECT …dbfile‟ AS handle;Opens a hard-coded database file, dbfile, and assigns a previously declared database handle to it.EXEC SQLCONNECT …employee.gdb‟AS EMP;Yes Yes CONNECT :varname AS handle;Opens the database file stored in the host-language variable, varname, and assigns a previously declared database handle to it.EXEC SQL CONNECT :fname AS EMP;Yes YesTABLE 3.1 CONNECT syntax summaryCHAPTER 3 WORKING WITH DATABASES46 INTERBASE 6Attaching to multiple databasesCONNECT can attach to multiple databases. To open all databases specified in previous SETDATABASE statements, use either of the following CONNECT syntax options: EXEC SQLCONNECT ALL;EXEC SQLCONNECT DEFAULT;CONNECT can also attach to a specified list of databases. Separate each database request from others with commas. For example, the following statement opens two databases specified by their handles:EXEC SQLCONNECT DB1, DB2;The next statement opens two hard-coded database files and also assigns them to previously declared handles:EXEC SQLCONNECT ‟employee.gdb‟ AS DB1, ‟employee2.gdb‟ AS DB2;Tip Opening multiple databases with a single CONNECT is most effective when a program‟s database access is simple and clear. In complex programs that open and close several databases, that substitute database names with host-language variables, or that assign multiple handles to the same database, use separate CONNECT statements to make program code easier to read, debug, and modify.Handling CONNECT errors. The WHENEVER statement should be used to trap and handle runtime errors that occur during database declaration. The following C code fragment illustrates an error-handling routine that displays error messages and ends the program in an orderly fashion:. . .EXEC SQLWHENEVER SQLERRORGOTO error_exit;. . .OPENING A DATABASEEMBEDDED SQL GUIDE 47:error_exitisc_print_sqlerr(sqlcode, status_vector);EXEC SQLDISCONNECT ALL;exit(1);. . .For a complete discussion of SQL error handling, see Chapter 12, “Error Handling and Recovery.”数据库的工作这章描述怎样使用在嵌入式应用过程中的SQL语句控制数据库。
大数据技术-大数据数据仓库
大数据数据仓库1、数据仓库基本概念数据仓库即Data Warehouse,简称DW,主要研究和解决从数据中获取信息的问题,为企业所有级别的决策制定过程,提供所有类型数据支持的战略集合。
本质上,数据仓库试图提供一种从操作型系统到决策支持系统的数据流架构模型。
主要是解决多重数据复制带来的高成本问题。
在有数仓之前,需要大量的冗余数据来支撑多个决策支持系统,尽管每个系统服务于不同的用户,但是这些系统经常需要大量相同的数据。
决策支持系统Decision support System,即DDS,是用于支持业务或组织决策活动的信息系统,服务于组织管理、运营和规划管理层(通常是中层或高级管理层),帮助人们对可能快速变化并且不容易预测结果的问题做出决策。
数据仓库就是为决策系统提供数据支持的。
1.1、数据仓库的概念数据仓库之父Bill Inmon提出了被广泛认可的数据仓库定义,把数据仓库定义为是一个面向主题的、集成的、非易失的和时变的数据集合,用于支持管理者的决策过程。
1.2、数据仓库的特性面向主题传统的操作型系统是围绕功能性应用来组织数据的,各个业务系统可能是相互隔离的,而数据仓库是面向主题的。
主题是一个抽象概念,简单的来说就是用户使用数据仓库进行决策时所关心的重点方面,一个主题通常与多个操作型系统相关。
例如一个保险公司要分析销售数据,就可以建立一个专注销售数据的数据仓库,通过这个数据仓库就可以得到“过去半年销售保险总额以及各险种的占比”,这个场景下销售就是一个数据主题,同时数据仓库设计时要排除对于决策无用的数据,上述场景决策关注销售,理赔不是这个主题的,因此理赔的数据就不需要在这个数据仓库中存储。
集成集成是与面向主题密切相关。
我们上面的保险的例子中,建立一个保险公司的销售数据仓库,不同的险种由不同的部门负责,他们有各自独立的销售数据库,此时要想从公司层面整体分析销售数据,必须将各分散的数据源统一成一致的、无歧义的数据格式后,再存储在数据仓库中,因此就需要解决各数据源的矛盾之处,例如字段的同名异意、异名同意、字段数据类型不一致,长度不一致,计量单位不一致等等。
外文翻译---数据库管理
英文资料翻译资料出处:From /china/ database英文原文:Database ManagementDatabase (sometimes spelled database) is also called an electronic database, referring to any collections of data, or information, that is specially organized for rapid search and retrieval by a computer. Databases are structured to facilitate the storage, retrieval modification and deletion of data in conjunction with various data-processing operations. Database can be stored on magnetic disk or tape, optical disk, or some other secondary storage device.A database consists of a file or a set of files. The information in the these files may be broken down into records, each of which consists of one or more fields are the basic units of data storage, and each field typically contains information pertaining to one aspect or attribute of the entity described by the database. Using keywords and various sorting commands, users can rapidly search, rearrange, group, and select the fields in many records to retrieve or create reports on particular aggregates of data.Database records and files must be organized to allow retrieval of the information. Early system were arranged sequentially (i.e., alphabetically, numerically, or chronologically); the development of direct-access storage devices made possible random access to data via indexes. Queries are the main way users retrieve database information. Typically the user provides a string of characters, and the computer searches the database for a corresponding sequence and provides the source materials in which those characters appear. A user can request, for example, all records in which the content of the field for a person’s last name is the word Smith.The many users of a large database must be able to manipulate the information within it quickly at any given time. Moreover, large business and other organizations tend to build up many independent files containing related and even overlapping data, and their data, processing activities often require the linking of data from several files.Several different types of database management systems have been developed to support these requirements: flat, hierarchical, network, relational, and object-oriented.In flat databases, records are organized according to a simple list of entities; many simple databases for personal computers are flat in structure. The records in hierarchical databases are organized in a treelike structure, with each level of records branching off into a set of smaller categories. Unlike hierarchical databases, which provide single links between sets of records at different levels, network databases create multiple linkages between sets by placing links, or pointers, to one set of records in another; the speed and versatility of network databases have led to their wide use in business. Relational databases are used where associations among files or records cannot be expressed by links; a simple flat list becomes one table, or “relation”, and multiple relations can be mathematically associated to yield desired information. Object-oriented databases store and manipulate more complex data structures, called “objects”, which are organized into hierarchical classes that may inherit properties from classes higher in the chain; this database structure is the most flexible and adaptable.The information in many databases consists of natural-language texts of documents; number-oriented database primarily contain information such as statistics, tables, financial data, and raw scientific and technical data. Small databases can be maintained on personal-computer systems and may be used by individuals at home. These and larger databases have become increasingly important in business life. Typical commercial applications include airline reservations, production management, medical records in hospitals, and legal records of insurance companies. The largest databases are usually maintained by governmental agencies, business organizations, and universities. These databases may contain texts of such materials as catalogs of various kinds. Reference databases contain bibliographies or indexes that serve as guides to the location of information in books, periodicals, and other published literature. Thousands of these publicly accessible databases now exist, covering topics ranging from law, medicine, and engineering to news and current events, games, classified advertisements, and instructional courses. Professionals such as scientists,doctors, lawyers, financial analysts, stockbrokers, and researchers of all types increasingly rely on these databases for quick, selective access to large volumes of information.一、DBMS Structuring TechniquesSequential, direct, and other file processing approaches are used to organize and structure data in single files. But a DBMS is able to integrate data elements from several files to answer specific user inquiries for information. That is, the DBMS is able to structure and tie together the logically related data from several large files.Logical Structures. Identifying these logical relationships is a job of the data administrator. A data definition language is used for this purpose. The DBMS may then employ one of the following logical structuring techniques during storage access, and retrieval operations.List structures. In this logical approach, records are linked together by the use of pointers. A pointer is a data item in one record that identifies the storage location of another logically related record. Records in a customer master file, for example, will contain the name and address of each customer, and each record in this file is identified by an account number. During an accounting period, a customer may buy a number of items on different days. Thus, the company may maintain an invoice file to reflect these transactions. A list structure could be used in this situation to show the unpaid invoices at any given time. Each record in the customer in the invoice file. This invoice record, in turn, would be linked to later invoices for the customer. The last invoice in the chain would be identified by the use of a special character as a pointer.Hierarchical (tree) structures. In this logical approach, data units are structured in multiple levels that graphically resemble an “upside down”tree with the root at the top and the branches formed below. There’s a superior-subordinate relationship in a hierarchical (tree) structure. Below the single-root data component are subordinate elements or nodes, each of which, in turn, “own”one or more other elements (or none). Each element or branch in this structure below the root has only a single owner. Thus, a customer owns an invoice, and the invoice has subordinate items. Thebranches in a tree structure are not connected.Network Structures. Unlike the tree approach, which does not permit the connection of branches, the network structure permits the connection of the nodes in a multidirectional manner. Thus, each node may have several owners and may, in turn, own any number of other data units. Data management software permits the extraction of the needed information from such a structure by beginning with any record in a file.Relational structures. A relational structure is made up of many tables. The data are stored in the form of “relations”in these tables. For example, relation tables could be established to link a college course with the instructor of the course, and with the location of the class.To find the name of the instructor and the location of the English class, the course/instructor relation is searched to get the name (“Fitt”), and the course/locati on relation is a relatively new database structuring approach that’s expected to be widely implemented in the future.Physical Structures. People visualize or structure data in logical ways for their own purposes. Thus, records R1 and R2 may always be logically linked and processed in sequence in one particular application. However, in a computer system it’s quite possible that these records that are logically contiguous in one application are not physically stored together. Rather, the physical structure of the records in media and hardware may depend not only on the I/O and storage devices and techniques used, but also on the different logical relationships that users may assign to the data found in R1and R2. For example, R1 and R2 may be records of credit customers who have shipments send to the same block in the same city every 2 weeks. From the shipping department manager’s perspective, then, R1 and R2 are sequential entries on a geographically organized shipping report. But in the A/R application, the customers represented by R1 and R2 may be identified, and their accounts may be processed, according to their account numbers which are widely separated. In short, then, the physical location of the stored records in many computer-based information systems is invisible to users.二、Database Management Features of OracleOracle includes many features that make the database easier to manage. We’ve divided the discussion in this section into three categories: Oracle Enterprise Manager, add-on packs, backup and recovery.1.Oracle Enterprise ManagerAs part of every Database Server, Oracle provides the Oracle Enterprise Manager (EM), a database management tool framework with a graphical interface used to manage database users, instances, and features (such as replication) that can provide additional information about the Oracle environment.Prior to the Oracle8i database, the EM software had to be installed on Windows 95/98 or NT-based systems and each repository could be accessed by only a single database manager at a time. Now you can use EM from a browser or load it onto Windows 95/98/2000 or NT-based systems. Multiple database administrators can access the EM repository at the same time. In the EM repository for Oracle9i, the super administrator can define services that should be displayed on other administrators’consoles, and management regions can be set up.2. Add-on packsSeveral optional add-on packs are available for Oracle, as described in the following sections. In addition to these database-management packs, management packs are available for Oracle Applications and for SAP R/3.(1) standard Management PackThe Standard Management Pack for Oracle provides tools for the management of small Oracle databases (e.g., Oracle Server/Standard Edition). Features include support for performance monitoring of database contention, I/O, load, memory use and instance metrics, session analysis, index tuning, and change investigation and tracking.(2) Diagnostics PackYou can use the Diagnostic Pack to monitor, diagnose, and maintain the health of Enterprise Edition databases, operating systems, and applications. With both historical and real-time analysis, you can automatically avoid problems before theyoccur. The pack also provides capacity planning features that help you plan and track future system-resource requirements.(3)Tuning PackWith the Tuning Pack, you can optimise system performance by identifying and tuning Enterprise Edition databases and application bottlenecks such as inefficient SQL, poor data design, and the improper use of system resources. The pack can proactively discover tuning opportunities and automatically generate the analysis and required changes to tune the systems.(4) Change Management PackThe Change Management Pack helps eliminate errors and loss of data when upgrading Enterprise Edition databases to support new applications. It impact and complex dependencies associated with application changes and automatically perform database upgrades. Users can initiate changes with easy-to-use wizards that teach the systematic steps necessary to upgrade.(5) AvailabilityOracle Enterprise Manager can be used for managing Oracle Standard Edition and/or Enterprise Edition. Additional functionality is provided by separate Diagnostics, Tuning, and Change Management Packs.3. Backup and RecoveryAs every database administrator knows, backing up a database is a rather mundane but necessary task. An improper backup makes recovery difficult, if not impossible. Unfortunately, people often realize the extreme importance of this everyday task only when it is too late –usually after losing business-critical data due to a failure of a related system.The following sections describe some products and techniques for performing database backup operations.(1) Recovery ManagerTypical backups include complete database backups (the most common type), database backups, control file backups, and recovery of the database. Previously,Oracle’s Enterprise Backup Utility (EBU) provided a similar solution on some platforms. However, RMAN, with its Recovery Catalog stored in an Oracle database, provides a much more complete solution. RMAN can automatically locate, back up, restore, and recover databases, control files, and archived redo logs. RMAN for Oracle9i can restart backups and restores and implement recovery window policies when backups expire. The Oracle Enterprise Manager Backup Manager provides a GUI-based interface to RMAN.(2) Incremental backup and recoveryRMAN can perform incremental backups of Enterprise Edition databases. Incremental backups back up only the blocks modified since the last backup of a datafile, tablespace, or database; thus, they’re smaller and faster than complete backups. RMAN can also perform point-in-time recovery, which allows the recovery of data until just prior to a undesirable event.(3) Legato Storage ManagerVarious media-management software vendors support RMAN. Oracle bundles Legato Storage Manager with Oracle to provide media-management services, including the tracking of tape volumes, for up to four devices. RMAN interfaces automatically with the media-management software to request the mounting of tapes as needed for backup and recovery operations.(4)AvailabilityWhile basic recovery facilities are available for both Oracle Standard Edition and Enterprise Edition, incremental backups have typically been limited to Enterprise Edition.Data IndependenceAn important point about database systems is that the database should exist independently of any of the specific applications. Traditional data processing applications are data dependent. COBOL programs contain file descriptions and record descriptions that carefully describe the format and characteristics of the data.Users should be able to change the structure of the database without affecting the applications that use it. For example, suppose that the requirements of yourapplications change. A simple example would be expanding ZIP codes from five digits to nine digits. On a traditional approach using COBOL programs each individual COBOL application program that used that particular field would have to be changed, recompiled, and retested. The programs would be unable to recognize or access a file that had been changed and contained a new data description; this, in turn, might cause disruption in processing unless the change were carefully planned.Most database programs provide the ability to change the database structure by simply changing the ZIP code field and the data-entry form. In this case, data independence allows for minimal disruption of current and existing applications. Users can continue to work and can even ignore the nine-digit code if they choose. Eventually, the file will be converted to the new nine-digit ZIP code, but the ease with which the changeover takes place emphasizes the importance of data independence.Data IntegrityData integrity refers to the accuracy, correctness, or validity of the data in the database. In a database system, data integrity means safeguarding the data against invalid alteration or destruction arise. The first has to do with many users accessing the database concurrently. For example, if thousands of travel agents and airline reservation clerks are accessing the database concurrently. For example, if thousands of travel agents and airline reservation clerks are accessing the same database at once, and two agents book the same seat on the same flight, the first agent’s booking will be lost. In such case the technique of locking the record or field provides the means for preventing one user from accessing a record while another user is updating the same record.The second complication relates to hardwires, software, or human error during the course of processing and involves database transactions treated as a single . For example, an agent booking an airline reservation involves several database updates (i.e., adding the passenger’s name and address and updating the seats-available field), which comprise a single transaction. The database transaction is not considered to be completed until all updates have been completed; otherwise, none of the updates will be allowed to take place.Data SecurityData security refers to the protection of a database against unauthorized or illegal access or modification. For example, a high-level password might allow a user to read from, write to, and modify the database structure, whereas a low-level password history of the modifications to a database-can be used to identify where and when a database was tampered with and it can also be used to restore the file to its original condition.三、Choosing between Oracle and SQL ServerI have to decide between using the Oracle database and WebDB vs. Microsoft SQL Server with Visual Studio. This choice will guide our future Web projects. What are the strong points of each of these combinations and what are the negatives?Lori: Making your decision will depend on what you already have. For instance, if you want to implement a Web-based database application and you are a Windows-only shop, SQL Server and the Visual Studio package would be fine. But the Oracle solution would be better with mixed platforms.There are other things to consider, such as what extras you get and what skills are required. WebDB is a content management and development tool that can be used by content creators, database administrators, and developers without any programming experience. WebDB is a browser-based tool that helps ease content creation and provides monitoring and maintenance tools. This is a good solution for organizations already using Oracle. Oracle also scales better than SQL Server, but you will need to have a competent Oracle administrator on hand.The SQL Sever/Visual Studio approach is more difficult to use and requires an experienced object-oriented programmer or some extensive training. However, you do get a fistful of development tools with Visual Studio: Visual Basic, Visual C++, and Visual InterDev for only $1,619. Plus, you will have to add the cost of the SQL Server, which will run you $1,999 for 10 clients or $3,999 for 25 clients-a less expensive solution than Oracle’s.Oracle also has a package solution that starts at $6,767, depending on the platform selected. The suite includes not only WebDB and Oracle8i butalso other tools for development such as the Oracle application server, JDeveloper, and Workplace Templates, and the suite runs on more platforms than the Microsoft solution does. This can be a good solution if you are a start-up or a small to midsize business. Buying these tools in a package is less costly than purchasing them individually.Much depends on your skill level, hardware resources, and budget. I hope this helps in your decision-making.Brooks: I totally agree that this decision depends in large part on what infrastructure and expertise you already have. If the decision is close, you need to figure out who’s going to be doing the work and what your priorities are.These two products have different approaches, and they reflect the different personalities of the two vendors. In general, Oracle products are designed for very professional development efforts by top-notch programmers and project leaders. The learning period is fairly long, and the solution is pricey; but if you stick it out you will ultimately have greater scalability and greater reliability.If your project has tight deadlines and you don’t have the time and/or money to hire a team of very expensive, very experienced developers, you may find that the Oracle solution is an easy way to get yourself in trouble. There’s nothing worse than a poorly developed Oracle application.What Microsoft offers is a solution that’s aimed at rapid development and low-cost implementation. The tools are cheaper, the servers you’ll run it on are cheaper, and the developers you need will be cheaper. Choosing SQL Sever and Visual Studio is an excellent way to start fast.Of course, there are trade-offs. The key problem I have with Visual Studio and SQL Server is that you’ll be tied to Microsoft operating systems and Intel hardware. If the day comes when you need to support hundreds of thousands of users, you really don’t have anywhere to go other than buying hundreds of servers, which is a management nightmare.If you go with the Microsoft approach, it sounds like you may not need morethan Visual Interdev. If you already know that you’re going to be developing ActiveX components in Visual Basic or Visual C++, that’s warning sign that maybe you should look at the Oracle solution more closely.I want to emphasize that, although these platforms have their relative strengths and weaknesses, if you do it right you can build a world-class application on either one. So if you have an organizational bias toward one of the vendors, by all means go with it. If you’re starting out from scratch, you’re going to have to ask yourself whether your organization leans more toward perfectionism or pragmatism, and realize that both “isms”have their faults.中文译文:数据库管理数据库(也称DataBase)也称为电子数据库,是指由计算机特别组织的用下快速查找和检索的任意的数据或信息集合。
数据库 专业英语翻译
数据库数据库(Database)是按照数据结构来组织、存储和管理数据的仓库,它产生于距今五十年前,随着信息技术和市场的发展,特别是二十世纪九十年代以后,数据管理不再仅仅是存储和管理数据,而转变成用户所需要的各种数据管理的方式。
数据库有很多种类型,从最简单的存储有各种数据的表格到能够进行海量数据存储的大型数据库系统都在各个方面得到了广泛的应用。
一.数据管理的诞生数据库的历史可以追溯到五十年前,那时的数据管理非常简单。
通过大量的分类、比较和表格绘制的机器运行数百万穿孔卡片来进行数据的处理,其运行结果在纸上打印出来或者制成新的穿孔卡片。
而数据管理就是对所有这些穿孔卡片进行物理的储存和处理。
然而,1 9 5 1 年雷明顿兰德公司(Remington Rand Inc.)的一种叫做Univac I 的计算机推出了一种一秒钟可以输入数百条记录的磁带驱动器,从而引发了数据管理的革命。
1956 年IBM生产出第一个磁盘驱动器——the Model 305 RAMAC。
此驱动器有50 个盘片,每个盘片直径是2 英尺,可以储存5MB的数据。
使用磁盘最大的好处是可以随机地存取数据,而穿孔卡片和磁带只能顺序存取数据。
数据库系统的萌芽出现于60 年代。
当时计算机开始广泛地应用于数据管理,对数据的共享提出了越来越高的要求。
传统的文件系统已经不能满足人们的需要。
能够统一管理和共享数据的数据库管理系统(DBMS)应运而生。
数据模型是数据库系统的核心和基础,各种DBMS 软件都是基于某种数据模型的。
所以通常也按照数据模型的特点将传统数据库系统分成网状数据库、层次数据库和关系数据库三类。
二.结构化查询语言(SQL)1974 年,IBM的Ray Boyce和Don Chamberlin将Codd关系数据库的12条准则的数学定义以简单的关键字语法表现出来,里程碑式地提出了SQL(Structured Query Language)语言。
数据仓库原理
数据仓库原理-by zvane 1.数据仓库概念因为,管理人员往往传统数据库以及OLTP(On-Line Transaction Processing 联机事务处理)在日常的管理事务处理中获得了巨大的成功,但是对管理人员的决策分析要求却无法满足。
希翼能够通过对组织中的大量数据进行分析,了解业务的发展趋势。
而传统数据库只保留了当前的业务处理信息,缺乏决策分析所需要的大量的历史信息。
为满足管理人员的决策分析需要,就需要在数据库的基础上产生适应决策分析的数据环境——数据仓库(Data Warehouse)。
1.1定义William H.Inmon 在1993 年所写的论著《Building the DataWarehouse》首先系统地阐述了关于数据仓库的思想、理论,为数据仓库的发展奠定了历史基石。
文中他将数据仓库定义为:A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data in support of management decisions.一个面向主题的、集成的、非易失性的、随时间变化的数据的集合,以用于支持管理层决策过程。
1.2特性1.2.1subject-oriented(面向主题性)面向主题表示了数据仓库中数据组织的基本原则,数据仓库中的数由数据都是环绕着某一主题组织展开的。
由于数据仓库的用户大多是企业的管理决策者,这些人所面对的往往是一些比较抽象的、层次较高的管理分析对象。
例如,企业中的客户、产品、供应商等都可以作为主题看待。
从信息管理的角度看,主题就是在一个较高的管理层次上对信息系统的数据按照某一具体的管理对象进行综合、归类所形成的分析对象。
从数据组织的角度看,主题是一些数据集合,这些数据集合对分析对象作了比较完整的、一致的描述,这种描述不仅涉及到数据自身,而且涉及到数据之间的关系。
数据仓库(外文翻译)
DATA WAREHOUSEData warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latestmust-have marketing weapon —— a way to keep customers by learning more about their needs.“So", you may ask, full of intrigue, “what exactly is a data warehouse?"Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." This short, but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.(1).Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.(3).Time-variant: Data are stored to provide information from a historical perspective(e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.(4)Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information onwhich an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.“OK", you now ask, “what, then, is data warehousing?"Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers" (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing" to refer only to the process of data warehouse construction, while the term warehouse DBMS is used to refer to the management and utilization of data warehouses. We will not make this distinction here.“How are organizations using the information from data warehouses?" Many organizations are using this information to support business decision making activities, including:(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending),(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies,(3) analyzing operations and looking for sources of profit,(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. To integrate such data, and provide easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which informationfrom multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.1.Differences between operational database systems and data warehousesSince most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as, purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers" in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.The major distinguishing features between OLTP and OLAP are summarized as follows.(1). Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.(2). Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier for use in informed decision making.(3). Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application -oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.(4). View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.(5). Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.Other features which distinguish between OLTP and OLAP systems include database size, frequency of operations, and performance metrics and so on.2.But, why have a separate data warehouse?“Since operational databases store huge amounts of data", you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?"A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned" queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kindsof data, it is necessary to maintain separate databases.数据仓库数据仓库为商务运作提供结构与工具,以便系统地组织、理解和使用数据进行决策。
SQL数据库外文翻译
Working with DatabasesThis chapter describes how to use SQL statements in embedded applications to control databases. There are three database statements that set up and open databases for access: SET DATABASE declares a database handle, associates the handle with an actual database file, and optionally assigns operational parameters for the database.SET NAMES optionally specifies the character set a client application uses for CHAR, VARCHAR, and text Blob data. The server uses this information totransli terate from a database’s default character set to the client’s character set on SELECT operations, and to transliterate from a client application’s character set to the database character set on INSERT and UPDATE operations.g CONNECT opens a database, allocates system resources for it, and optionally assigns operational parameters for the database.All databases must be closed before a program ends. A database can be closed by using DISCONNECT, or by appending the RELEASE option to the final COMMIT or ROLLBACK in a program. Declaring a databaseBefore a database can be opened and used in a program, it must first be declared with SET DATABASE to:CHAPTER 3 WORKING WITH DATABASES. Establish a database handle. Associate the database handle with a database file stored on a local or remote node.A database handle is a unique, abbreviated alias for an actual database name. Database handles are used in subsequent CONNECT, COMMIT RELEASE, and ROLLBACK RELEASE statements to specify which databases they should affect. Except in dynamic SQL (DSQL) applications, database handles can also be used inside transaction blocks to qualify, or differentiate, table names when two or more open databases contain identically named tables.Each database handle must be unique among all variables used in a program. Database handles cannot duplicate host-language reserved words, and cannot be InterBase reserved words.The following statement illustrates a simple database declaration:EXEC SQLSET DATABASE DB1 = ’employee.gdb’;This database declaration identifies the database file, employee.gdb, as a database the program uses, and assigns the database a handle, or alias, DB1.If a program runs in a directory different from the directory that contains the database file, then the file name specification in SET DATABASE must include a full path name, too. For example, the following SET DATABASE declaration specifies the full path to employee.gdb:EXEC SQLSET DATABASE DB1 = ’/interbase/examples/employee.gdb’;If a program and a database file it uses reside on different hosts, then the file name specification must also include a host name. The following declaration illustrates how a Unix host name is included as part of the database file specification on a TCP/IP network:EXEC SQLSET DATABASE DB1 = ’jupiter:/usr/interbase/examples/employee.gdb’;On a Windows network that uses the Netbeui protocol, specify the path as follows: EXEC SQLSET DATABASE DB1 = ’//venus/C:/Interbase/examples/employee.gdb’; DECLARING A DATABASEEMBEDDED SQL GUIDE 37Declaring multiple databasesAn SQL program, but not a DSQL program, can access multiple databases at the same time. In multi-database programs, database handles are required. A handle is used to:1. Reference individual databases in a multi-database transaction.2. Qualify table names.3. Specify databases to open in CONNECT statements.Indicate databases to close with DISCONNECT, COMMIT RELEASE, and ROLLBACK RELEASE.DSQL programs can access only a single database at a time, so database handle use is restricted to connecting to and disconnecting from a database.In multi-database programs, each database must be declared in a separate SET DATABASE statement. For example, the following code contains two SET DATABASE statements:. . .EXEC SQLSET DATABASE DB2 = ’employee2.gdb’;EXEC SQLSET DATABASE DB1 = ’employee.gdb’;. . .4Using handles for table namesWhen the same table name occurs in more than one simultaneously accessed database, a database handle must be used to differentiate one table name from another. The database handle is used as a prefix to table names, and takes the formhandle.table.For example, in the following code, the database handles, TEST and EMP, are used to distinguish between two tables, each named EMPLOYEE:. . .EXEC SQLDECLARE IDMATCH CURSOR FORSELECT TESTNO INTO :matchid FROM TEST.EMPLOYEEWHERE TESTNO > 100;EXEC SQLDECLARE EIDMATCH CURSOR FORSELECT EMPNO INTO :empid FROM EMP.EMPLOYEEWHERE EMPNO = :matchid;. . .CHAPTER 3 WORKING WITH DATABASES38 INTERBASE 6IMPORTANTThis use of database handles applies only to embedded SQL applications. DSQL applications cannot access multiple databases simultaneously.4Using handles with operationsIn multi-database programs, database handles must be specified in CONNECT statements to identify which databases among several to open and prepare for use in subsequent transactions.Database handles can also be used with DISCONNECT, COMMIT RELEASE, and ROLLBACKRELEASE to specify a subset of open databases to close.To open and prepare a database w ith CONNECT, see “Opening a database” on page 41.To close a database with DISCONNECT, COMMIT RELEASE, or ROLLBACK RELEASE, see“Closing a database” on page 49. To learn more about using database handles in transactions, see “Accessing an open database” on p age 48.Preprocessing and run time databasesNormally, each SET DATABASE statement specifies a single database file to associate with a handle. When a program is preprocessed, gpre uses the specified file to validate the program’s table and column referenc es. Later, when a user runs the program, the same database file is accessed. Different databases can be specified for preprocessing and run time when necessary.4Using the COMPILETIME clause A program can be designed to run against any one of several identically structured databases. In other cases, the actual database that a program will use at runtime is not available when a program is preprocessed and compiled. In such cases, SET DATABASE can include a COMPILETIME clause to specify a database for gpre to test against during preprocessing. For example, the following SET DATABASE statement declares that employee.gdb is to be used by gpre during preprocessing: EXEC SQLSET DATABASE EMP = COMPILETIME ’employee.gdb’;IMPORTANTThe file specification that follows the COMPILETIME keyword must always be a hard-coded, quoted string.DECLARING A DATABASEEMBEDDED SQL GUIDE 39When SET DATABASE uses the COMPILETIME clause, but no RUNTIME clause, and does not specify a different database file specification in a subsequent CONNECT statement, the same database file is used both for preprocessing and run time. To specify different preprocessing and runtime databases with SET DATABASE, use both the COMPILETIME andRUNTIME clauses.4Using the RUNTIME clauseWhen a database file is specified for use during preprocessing, SET DATABASE can specify a different database to use at run time by including the RUNTIME keyword and a runtime file specification:EXEC SQLSET DATABASE EMP = COMPILETIME ’employee.gdb’RUNTIME ’employee2.gdb’;The file specification that follows the RUNTIME keyword can be either ahard-coded, quoted string, or a host-language variable. For example, the following C code fragment prompts the user for a database name, and stores the name in a variable that is used later in SET DATABASE:. . .char db_name[125];. . .printf("Enter the desired database name, including node and path):\n");gets(db_name);EXEC SQLSET DATABASE EMP = COMPILETIME ’employee.gdb’ RUNTIME : db_name; . . .Note host-language variables in SET DATABASE must be preceded, as always, by a colon.Controlling SET DATABASE scopeBy default, SET DATABASE creates a handle that is global to all modules in an application.A global handle is one that may be referenced in all host-language modules comprising the program. SET DATABASE provides two optional keywords to change the scope of a declaration:g STATIC limits declaration scope to the module containing the SET DATABASE statement. No other program modules can see or use a database handle declared STATIC.CHAPTER 3 WORKING WITH DATABASES40 INTERBASE 6EXTERN notifies gpre that a SET DATABASE statement in a module duplicates a globally-declared database in another module. If the EXTERN keyword is used, then another module must contain the actual SET DATABASE statement, or an error occurs during compilation.The STATIC keyword is used in a multi-module program to restrict database handle access to the single module where it is declared. The following example illustrates the use of theSTATIC keyword:EXEC SQLSET DATABASE EMP = STATIC ’employee.gdb’;The EXTERN keyword is used in a multi-module program to signal that SET DATABASE in one module is not an actual declaration, but refers to a declaration made in a different module. Gpre uses this information during preprocessing. The following example illustrates the use of the EXTERN keyword:EXEC SQLSET DATABASE EMP = EXTERN ’employee.gdb’;If an application contains an EXTERN reference, then when it is used at run time, the actual SET DATABASE declaration must be processed first, and the database connected before other modules can access it.A single SET DATABASE statement can contain either the STATIC or EXTERN keyword, but not both. A scope declaration in SET DATABASE applies to both COMPILETIME and RUNTIME databases.Specifying a connection character setWhen a client application connects to a database, it may have its own character set requirements. The server providing database access to the client does not know about these requirements unless the client specifies them. The client application specifies its character set requirement using the SET NAMES statement before it connects to the database.SET NAMES specifies the character set the server should use when translating data from the database to the client application. Similarly, when the client sends data to the database, the server translates the data from the client’s character set to the database’s default character set (or the character set for an individual column if it differs from the databas e’s default character set). For example, the following statements specify that the client is using the DOS437 character set, then connect to the database:EXEC SQLOPENING A DATABASEEMBEDDED SQL GUIDE 41SET NAMES DOS437;EXEC SQLCONNECT ’europe.gdb’ USER ’JAMES’ PASSWORD ’U4EEAH’;For more information about character sets, see the Data Definition Guide. For the complete syntax of SET NAMES and CONNECT, see the Language Reference. Opening a databaseAfter a database is declared, it must be attached with a CONNECT statement before it can be used. CONNECT:1. Allocates system resources for the database.2. Determines if the database file is local, residing on the same host where the application itself is running, or remote, residing on a different host.3. Opens the database and examines it to make sure it is valid.InterBase provides transparent access to all databases, whether local or remote. If the database structure is invalid, the on-disk structure (ODS) number does not correspond to the one required by InterBase, or if the database is corrupt, InterBase reports an error, and permits no further access. Optionally, CONNECT can be used to specify:4. A user name and password combination that is checked against the server’s security database before allowing the connect to succeed. User names can be up to 31 characters.Passwords are restricted to 8 characters.5. An SQL role name that the user adopts on connection to the database, provided that the user has previously been granted membership in the role. Regardless of role memberships granted, the user belongs to no role unless specified with this ROLE clause.The client can specify at most one role per connection, and cannot switch roles except by reconnecting.6. The size of the database buffer cache to allocate to the application when the default cache size is inappropriate.Using simple CONNECT statementsIn its simplest form, CONNECT requires one or more database parameters, each specifying the name of a database to open. The name of the database can be a: Database handle declared in a previous SET DATABASE statement.CHAPTER 3 WORKING WITH DATABASES42 INTERBASE 61. Host-language variable.2. Hard-coded file name.4Using a database handleIf a program uses SET DATABASE to provide database handles, those handles should be used in subsequent CONNECT statements instead of hard-coded names. For example,. . .EXEC SQLSET DATABASE DB1 = ’employee.gdb’;EXEC SQLSET DATABASE DB2 = ’employee2.gdb’;EXEC SQLCONNECT DB1;EXEC SQLCONNECT DB2;. . .There are several advantages to using a database handle with CONNECT:1. Long file specifications can be replaced by shorter, mnemonic handles.2. Handles can be used to qualify table names in multi-database transactions. DSQL applications do not support multi-database transactions.3. Handles can be reassigned to other databases as needed.4. The number of database cache buffers can be specified as an additional CONNECT parameter.For more information about setting the number of database cache buffers, see “Setting d atabase cache buffers” on page 47. 4Using strings or host-language variables Instead of using a database handle, CONNECT can use a database name supplied at run time. The database name can be supplied as either a host-language variable or a hard-coded, quoted string.The following C code demonstrates how a program accessing only a single database might implement CONNECT using a file name solicited from a user at run time:. . .char fname[125];. . .printf(’Enter the desired database name, including nodeand path):\n’);OPENING A DATABASEEMBEDDED SQL GUIDE 43gets(fname);. . .EXEC SQLCONNECT :fname;. . .TipThis technique is especially useful for programs that are designed to work with many identically structured databases, one at a time, such as CAD/CAM or architectural databases.MULTIPLE DATABASE IMPLEMENTATIONTo use a database specified by the user as a host-language variable in a CONNECT statement in multi-database programs, follow these steps:1. Declare a database handle using the following SET DATABASE syntax:EXEC SQLSET DATABASE handle = COMPILETIME ’ dbname’;Here, handle is a hard-coded database handle supplied by the programmer, dbnameis a quoted, hard-coded database name used by gpre during preprocessing.2. Prompt the user for a database to open.3. Store the database name entered by the user in a host-language variable.4. Use the handle to open the database, associating the host-language variable with the handle using the following CONNECT syntax:EXEC SQLCONNECT : variable AS handle;The following C code illustrates these steps:. . .char fname[125];. . .EXEC SQLSET DATABASE DB1 = ’employee.gdb’;printf("Enter the desired database name, including nodeand path):\n");gets(fname);EXEC SQLCONNECT :fname AS DB1;. . .CHAPTER 3 WORKING WITH DATABASES44 INTERBASE 6In this example, SET DATABASE provides a hard-coded database file name for preprocessing with gpre. When a user runs the program, the database specified in the variable, fname, is used instead. 4Using a hard-coded database namesIN SINGE-DATABASE PROGRAMSIn a single-database program that omits SET DATABASE, CONNECT must contain a hard-coded, quoted file name in the following format:EXEC SQLCONNECT ’[ host[ path]] filename’; host is required only if a program and a dat abase file it uses reside on different nodes.Similarly, path is required only if the database file does not reside in the current working directory. For example, the following CONNECT statement contains ahard-coded file name that includes both a Unix host name and a path name:EXEC SQLCONNECT ’valdez:usr/interbase/examples/employee.gdb’;Note Host syntax is specific to each server platform.IMPORTANT A program that accesses multiple databases cannot use this form of CONNECT.IN MULTI-DATABASE PROGRAMSA program that accesses multiple databases must declare handles for each of them in separate SET DATABASE statements. These handles must be used in subsequent CONNECT statements to identify specific databases to open:. . .EXEC SQLSET DATABASE DB1 = ’employee.gdb’;EXEC SQLSET DATABASE DB2 = ’employee2.gdb’;EXEC SQLCONNECT DB1;EXEC SQLCONNECT DB2;. . .Later, when the program closes these databases, the database handles are no longer in use. These handles can be reassigned to other databases by hard-coding a file name in a subsequent CONNECT statement. For example,OPENING A DATABASEEMBEDDED SQL GUIDE 45. . .EXEC SQLDISCONNECT DB1, DB2;EXEC SQLCONNECT ’project.gdb’ AS DB1;. . .Additional CONNECT syntaxCONNECT supports several formats for opening databases to provide programming flexibility. The following table outlines each possible syntax, provides descriptions and examples, and indicates whether CONNECT can be used in programs that access single or multiple databases:For a complete discussion of CONNECT syntax and its uses, see the Language Reference.Syntax Description ExampleSingle accessMultiple accessCONNECT ‘dbfile’; Opens a single, hard-coded database file, dbfile.EXEC SQLCONNECT ‘employee.gdb’;Yes NoCONNECT handle; Opens the database file associated with a previously declared database handle. This is the preferred CONNECT syntax.EXEC SQLCONNECT EMP;Yes YesCONNECT ‘dbfile’ AS handle;Opens a hard-coded database file, dbfile, and assigns a previously declared database handle to it.EXEC SQLCONNECT ‘employee.gdb’AS EMP;Yes Yes CONNECT :varname AS handle;Opens the database file stored in the host-language variable, varname, and assigns a previously declared database handle to it.EXEC SQL CONNECT :fname AS EMP;Yes YesTABLE 3.1 CONNECT syntax summaryCHAPTER 3 WORKING WITH DATABASES46 INTERBASE 6Attaching to multiple databasesCONNECT can attach to multiple databases. To open all databases specified in previous SETDATABASE statements, use either of the following CONNECT syntax options: EXEC SQLCONNECT ALL;EXEC SQLCONNECT DEFAULT;CONNECT can also attach to a specified list of databases. Separate each database request from others with commas. For example, the following statement opens two databases specified by their handles:EXEC SQLCONNECT DB1, DB2;The next statement opens two hard-coded database files and also assigns them to previously declared handles:EXEC SQLCONNECT ’employee.gdb’ AS DB1, ’employee2.gdb’ AS DB2;Tip Opening multiple databases with a single CONNECT is most effective when a program’s database access is simple and clear. In complex programs that open and close several databases, that substitute database names with host-language variables, or that assign multiple handles to the same database, use separate CONNECT statements to make program code easier to read, debug, and modify.Handling CONNECT errors. The WHENEVER statement should be used to trap and handle runtime errors that occur during database declaration. The following C code fragment illustrates an error-handling routine that displays error messages and ends the program in an orderly fashion:. . .EXEC SQLWHENEVER SQLERRORGOTO error_exit;. . .OPENING A DATABASEEMBEDDED SQL GUIDE 47:error_exitisc_print_sqlerr(sqlcode, status_vector);EXEC SQLDISCONNECT ALL;exit(1);. . .For a complete discussion of SQL error handling, see Chapter 12, “Error Handling and Recovery.”数据库的工作这章描述怎样使用在嵌入式应用过程中的SQL语句控制数据库。
SAP数据仓库帮助目录翻译
Archivation of Request Administration Data .请求管理数据的档案集
BI Background Management .BI后台管理
Clients in BW .BW的客户
Transferring Transaction Data Using Web Services (RDA) .使用WEB服务传送和处理事务数据
Monitor for Real-Time Data Acquisition .实时数据获取的监视(监控)
Troubleshooting Real-Time Data Acquisition. 实时数据获取的困难解决
InfoSource .信息源。
Migration of Update Rules, 3.x InfoSources, and Transfer Rules .对更新规则、 3.x 信息源和传输规则的迁移。
Old Transformation Concept .旧的转型概念。
Further Processing Data .进一步处理数据。
Data Warehousing .数据仓库。
ห้องสมุดไป่ตู้
The Data Warehouse Concept .数据仓库概念。
Using a Data Warehouse .使用数据仓库。
Architecture of a Data Warehouse .数据仓库的构建。
Enterprise Data Warehouse (EDW) .企业数据仓库(EDW)。
Controlling Real-Time Data Acquisition with Process Chains .带处理链的实时数据获取控制
数据仓库(Data Warehousing)说明书
Trademarks All products and service marks mentioned herein are trademarks of the respective owners mentioned in the articles and or on the website. The publishers cannot attest to the accuracy of the information provided . Use of a term in this book and/or website should not be regarded as affecting the validity of any trademark or service mark. 1st Edition April 2001 All rights reserved . Copyright © 2001 Friedr. Vieweg & Sohn Verlagsgesellschaft mbH, Braunschweig/wiesbaden
international IT-training corporation, the editors have easy access to the latest information on IT-developments and are kept well-informed by their colleagues. In their research
中英文文献翻译
Database introduction and ACCESS2000The database is the latest technology of data management, and the important branch of computer science. The database , as its name suggests, is the warehouse to preserve the data. The warehouse to store apparatus in computer only, and data to deposit according to sure forms。
The so-called database is refers to the long-term storage the data acquisition which in the computer, organized, may share。
In the database data according to the certain data model organization, the description, and the storage, has a smaller redundance, the higher data independence and the easy extension, and may altogether shine for each kind of user。
The effective management database, frequently has needed some database management systems (DBMS) is the user provides to database operation each kind of order, the tool and the method, including database establishment and recording input, revision, retrieval, demonstration, deletion and statistics。
数据仓库技术知识
一、数据仓库数据仓库,英文名称为Data Warehouse,可简写为DW或DWH。
数据仓库,是为企业所有级别的决策制定过程,提供所有类型数据支持的战略集合。
它是单个数据存储,出于分析性报告和决策支持目的而创建。
为需要业务智能的企业,提供指导业务流程改进、监视时间、成本、质量以及控制。
1、数据仓库是面向主题的;操作型数据库的数据组织面向事务处理任务,而数据仓库中的数据是按照一定的主题域进行组织。
主题是指用户使用数据仓库进行决策时所关心的重点方面,一个主题通常与多个操作型信息系统相关。
2、数据仓库是集成的,数据仓库的数据有来自于分散的操作型数据,将所需数据从原来的数据中抽取出数据仓库的核心工具来,进行加工与集成,统一与综合之后才能进入数据仓库;数据仓库中的数据是在对原有分散的数据库数据抽取、清理的基础上经过系统加工、汇总和整理得到的,必须消除源数据中的不一致性,以保证数据仓库内的信息是关于整个企业的一致的全局信息。
数据仓库的数据主要供企业决策分析之用,所涉及的数据操作主要是数据查询,一旦某个数据进入数据仓库以后,一般情况下将被长期保留,也就是数据仓库中一般有大量的查询操作,但修改和删除操作很少,通常只需要定期的加载、刷新。
数据仓库中的数据通常包含历史信息,系统记录了企业从过去某一时点(如开始应用数据仓库的时点)到当前的各个阶段的信息,通过这些信息,可以对企业的发展历程和未来趋势做出定量分析和预测。
3、数据仓库是不可更新的,数据仓库主要是为决策分析提供数据,所涉及的操作主要是数据的查询;4、数据仓库是随时间而变化的,传统的关系数据库系统比较适合处理格式化的数据,能够较好的满足商业商务处理的需求。
稳定的数据以只读格式保存,且不随时间改变。
5、汇总的。
操作性数据映射成决策可用的格式。
6、大容量。
时间序列数据集合通常都非常大。
7、非规范化的。
Dw数据可以是而且经常是冗余的。
8、元数据。
将描述数据的数据保存起来。
第二章_数据仓库-文档资料
Data Warehouse—Time Variant
The time horizon for the data warehouse is significantly longer than that of operational systems. Operational database: current value data. Data warehouse data: provide information from a historical
attribute measures, etc. among different data sources
✓ E.g., Hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.
Data Warehouse—Subject-Oriented
Organized around major subjects, such as customer, product, sales. Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
第二章_数据仓库
外文翻译---数据仓库技术
附录1 外文原文Data warehouse techniqueThe data warehouse says allThe data warehouse is an environment, not a product. It provides the decision that customer used for current history data supports, these data is very difficult in traditional operation type database or can't get, say more tangibly, the data warehouse is a kind of system construction. Data warehouse than it customer relation the management is a concept that is been familiar with by person, it is 1991 the United States an information engineering learns what house W.H. Inmon Doctor put forward, its definition is" the data warehouse is a support decision the process faces to the topic of, gather of, at any time but change of, the last long data gathers".The technique system construction of the data warehouse The data of obtains mold piece: Used for obtaining the data from the source document with the source database, combine the proceeding sweep, delivering, adding it to data warehouse database inside.· Data management mold piece: Used for the movement of the management data warehouse.· Delivers mold piece: Used for the other warehouse in direction with assign the data warehouse data in the exterior system.· The data is in the center a mold piece: The end customer in direction tool that used for the method provides the interview data warehouse database.· Data interview mold piece: Used for providing the interview for the end customer of the business enterprise with the tool of the analysis data warehouse data.· Design mold piece: Used for the design data warehouse database.· Information catalogue mold piece: Used for governor to provide with the customer relevant saving contents in data warehouse data in the database with meaning information.How to establish the data warehouseCurrent, the internal calculator in business enterprise system is mutually independent, the data rule( legitimacy) demand of the system that have is affirmed from the other system, various data lacks to gather sex, conduct and actions trend, the data warehouse technique is an one of the most emollient way to makes these data gathered get ups, the data warehouse establishes can at logical realize various system interaction operation, this lay the foundation for the modern college in developments, also leads for the college layer science decision offering guarantees powerfully. Theprocess that establish in the data warehouse needs below step:1. Establish the data model to the end business need. The design of the data model not only consider only to the first topic, but also looks after both sides the need of the other management in college decision topic to searches the need of the topic with every kind of data, statement.2. The certain topic proceeding data sets up the mold. According to the decision need certain topic, choice data source, proceed logic construction design.3. The database of the design data warehouse. Put great emphasis on the saving construction in physics that apply in the topic development data warehouse inside data.4. Definition data source. According to the topic data model, choose different operation type database as the data source.5. Establish the model for a data. The model made sure into the data scope of the data warehouse, and with provision of relevant data. Complete a data, can let customer known, the data warehouse inside has actually what data, the data gathers the level of structure with how detailed degree is, can provide what information, how these information are carried calculates with organizes etc..6. Take( Extract), convert( Transmit), add from the operation type database inside take out the data that carry( Load) the database inside arrive the data warehouse.7. Choice data interview analysis tool, the customer will use the saving information within these toolses interview data warehouse, realizing decision support need.The data scoops out the techniqueAlong with the database technical develop continuously and extensive application in each profession in system in management in database, the backlog enlarges in the nasty play in amount of data in the database, but among them can use directly however opposite less in amount of information.People have been hoping can to conceal in the superficial information in these data, proceed many level of structures analyze, for the purpose of better land utilization acquire the benefit to operate in the business with these data, increase the information of the social competition ability.Current every kind of database management system although can realizes efficiently the data record into and search, statistics to wait the function, can't discover relation existed in the data with regulation, resulted in like this and then a kind of data Bang and knowledge needy keep both of phenomenon. According to the inquisition, the data collections increase with saving with every year 130% speed, but in the data only have 2% data to is analyzed availably. This exploitation that scoop out provided the vast space for the data .To the 2004, apply to attain USD1,000,000,000 in the data of the electronic commerce market of scooping out the tool.附录2 外文资料译文数据仓库技术数据仓库概述数据仓库是一个环境,而不是一件产品。
《数据库英文翻译》word版
databaseDatabase is in accordance with the data structure to organize, storage and management of data warehouse, which arises from fifty years ago, with the dating of information technology and the development of the market, especially since the 1990s, data management is no longer merely data storage and management, and transformed into user needs of the various data management way. The database has a variety of types, from the most simple storage have various data form to can be carried out mass data storage of large database systems are obtained in each aspect has extensive application.The birth of data managementDatabase's history can be traced back to fifty years ago, when the data management is very simple. Through a lot of classification, comparison and form rendering machine running millions of punched CARDS for data processing, its operation results on paper printed or punched card made new. While the data management is punched card for all these physical storage and handling. However, 1 9 5 1 year Remington Rand corporation (Remington Rand Inc.) an enzyme called Univac I computer launched a a second can input hundreds of recording tape drives, which has caused data management revolution. 1956 IBM produce the first disk drives -- the RAMAC Model 305. This drives have 50 blanks, each blanks diameter is 2 feet, can store 5 MB of data. The biggest advantage is use disk can be randomly access data, and punched CARDS and tape can order access data.Database system appears in the 1960s the bud. When computer began to widely used in data management, the sharing of data put forward more and more high demand. The traditional file system already cannot satisfy people's needs. Manage and share data can unify the database management system (DBMS) came into being. The data model is the core and foundation of database system, various DBMS software are based on a data model. So usually in accordance with the characteristics of the data model and the traditional database system into mesh database, the hierarchy database and relational database three types.Structured query language (SQL)commercial database systems require a query language that is more user friendly. In this chapter,we study SQL, themost influential commercially marketed query language, SQL. SQL uses a combination ofrelational-algebra and relational-calculus constructs.Although we refer to the SQL language as a “query language,” it can do much more than just query a database. It can define the structure of the data, modify data in the database, and specify security constraints.It is not our intention to provide a complete users’ guide for SQL.Rather,we present SQL’s fundamental constructs and concepts. Individual implementations of SQL may differ in details, or may support only a subset of the full language.2.1 BackgroundIBM developed the original version of SQL at its San Jose Research Laboratory (nowthe Almaden Research Center). IBM implemented the language, originally called Sequel, as part of the System R project in the early 1970s. The Sequel language hasevolved since then, and its name has changed to SQL (Structured Query Language). Many products now support the SQL language. SQL has clearly established itself as the standard relational-database language.In 1986, the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) published an SQL standard, called SQL-86.IBM published its own corporate SQL standard, the Systems Application Architecture Database Interface (SAA-SQL) in 1987. ANSI published an extended standard forSQL, SQL-89, in 1989. The next version of the standard was SQL-92 standard, and the most recent version is SQL:1999. The bibliographic notes provide references to these standards.Chapter 4 SQLIn this chapter, we present a survey of SQL, based mainly on the widely implemented SQL-92 standard. The SQL:1999 standard is a superset of the SQL-92 standard;we cover some features of SQL:1999 in this chapter, and provide more detailed coverage in Chapter 9. Many database systems support some of the new constructs inSQL:1999, although currently no database system supports all the new constructs. You should also be aware that some database systems do not even support all the features of SQL-92, and that many databases provide nonstandard features that we donot cover here.The SQL language has several parts:•Data-definition language (DDL). The SQL DDL provides commands for defining relation schemas, deleting relations, and modifying relation schemas.•Interactive data-manipulation language (DML). The SQL DML includes a query language based on both the relational algebra and the tuple relational calculus. It includes also commands to insert tuples into, delete tuples from,and modify tuples in the database.•View definition.The SQL DDL includes commands for defining views.•Transaction control. SQL includes commands for specifying the beginning and ending of transactions.•Embedded SQL and dynamic SQL. Embedded and dynamic SQL define how SQL statements can be embedded within general-purpose programming languages, such as C, C++, Java, PL/I, Cobol, Pascal, and Fortran.•Integrity.The SQL DDL includes commands for specifying integrity constraints that the data stored in the database must satisfy. Updates that violate integrity constraints are disallowed.•Authorization.The SQL DDL includes commands for specifying access rights to relations and views.In this chapter, we cover the DML and the basic DDL features of SQL.Wealso briefly outline embedded and dynamic SQL, including the ODBC and JDBC standards for interacting with a database from programs written in the C and Java languages.SQL features supporting integrity and authorization are described in Chapter 6,while Chapter 9 outlines object-oriented extensions to SQL.The enterprise that we use in the examples in this chapter, and later chapters, is abanking enterprise with the following relation schemas:Branch-schema = (branch-name, branch-city, assets)Customer-schema = (customer-name, customer-street, customer-city)Loan-schema = (loan-number, branch-name, amount)Borrower-schema = (customer-name, loan-number)Account-schema = (account-number, branch-name, balance)Depositor-schema = (customer-name, account-number)Note that in this chapter, as elsewhere in the text, we use hyphenated names for schema, relations, and attributes for ease of reading. In actual SQL systems, however,hyphens are not valid parts of a name (they are treated as the minus operator). A simple way of translating the names we use to valid SQL names is to replace all hy phens by the underscore symbol (“ ”). For example, we use branch name in place ofbranch-name.2.2 Basic StructureA relational database consists of a collection of relations, each of which is assigneda unique name. Each relation has a structure similar to that presented in Chapter3.SQL allows the use of null values to indicate that the value either is unknown or does not exist. It allows a user to specify which attributes cannot be assigned null values,as we shall discuss in Section4.11.The basic structure of an SQL expression consists of three clauses: select, from,and where.•The select clause corresponds to the projection operation of the relational algebra. It is used to list the attributes desired in the result of a query.•The from clause corresponds to the Cartesian-product operation of the relational algebra. It lists the relations to be scanned in the evaluation of the expression. •The where clause corresponds to the selection predicate of the relational algebra. It consists of a predicate involving attributes of the relations that appear in the fromclause.That the term select has different meaning in SQL than in the relational algebra is an unfortunate historical fact. We emphasize the different interpretations here to minimize potential confusion.A typical SQL query has the formselect A1,A2,...,Anfrom r1,r2,...,rmwhere PEach Ai represents an attribute, and each ri arelation. P is a predicate. The query isequivalent to the relational-algebra expressionΠA1,A2,...,An(σP (r1 × r2 × ··· × rm))If the where clause is omitted, the predicate P is true. However, unlike the result of a relational-algebra expression, the result of the SQL query may containmultiple copies of some tuples; we shall return to this issue in Section 4.2.8.SQL forms the Cartesian product of the relations named in the from clause,performs a relational-algebra selection using the where clause predicate, and then projects the result onto the attributes of the select clause. In practice, SQL may convert the expression into an equivalent form that can be processed more efficiently.However, we shall defer concerns about efficiency to Chapters 13 and 14.In 1974, IBM's Ray Boyce and Don Chamberlin will Codd relational database 12 rule mathematical definition with simple keyword grammar expression comes out, put forward the landmark Structured Query Language (SQL) Language. SQL language features include inquiry, manipulation, definition and control, is a comprehensive, general relational database language, and at the same time, a highly the process of language, only request users do not need pointed out how do pointed out. SQL integration achieved database of all life cycle operation. SQL database provides and relations interact with the method, it can work with standard programming language. The date of the produce, SQL language became the touchstone of inspection relational database, and SQL standard every variation of guidingthe relational database product development direction. However, until the twentieth century, the mid 1970s to the theory of relation in commercial database Oracle and SQL used in DB2.In 1986, the SQL as ANSI relational database language American standards, that same year announced the standard SQL text. Currently SQL standard has three versions. ANSIX3135 - is defined as the basic SQL Database Language - 89, "Enhancement" SQL. A ANS89] [, generally called SQL - 89. SQL - 89 defines the schema definition, data operation and the transaction. SQL - 89 and subsequent ANSIX3168-1989, "Language - Embedded SQL Database, constituted the first generation of SQL standard. ANSIX3135-1992 [ANS92] describes a enhancements of SQL, now called SQL - 92 standards. SQL - 92 including mode operation, dynamic creation and SQL statements dynamic executive, network environment support enhancement. Upon completion of SQL - 92 ANSI and ISO standard, they started SQL3 standards development cooperation. The main features SQL3 abstract data types support, for the new generation of object relational database provides standard.The nature of database data1. Data integrity: database is a unit or an application field of general data processing system, he storage is to belong to enterprise and business departments, organizations and individuals set of related data. Database data from a global view, he according to certain data model organization, description and storage. Based on the structure of data between natural relation, thus can provide all the necessary access route, and data no longer deal with a certain applications, but for total organization, with a whole structural features.2. Data sharing: database data is for many users sharing their information and the establishment, got rid of the specific procedures restrictions and restraint. Different users can use the database according to their respective usage the data; Multiple users can also Shared database data resource, i.e., different users can also access database in the same data. Data sharing each user not only meets the requirements of information, but also meet the various users of information communication between the requirements.Object-oriented databaseAlong with the development of information technology and the market, people found relational database system, while technology is mature, but its limitations is obvious: it can be a very good treatment of so-called "form of data", but of dominating the more and more complex appear helpless type of data. Since the 1990s, technology has been studying and seek new database system. But in what is the development direction of the new database system, industry once is quite confused. The influence of agitation by technology at the time, for quite some time, people put a lot of energy spent on research "object-oriented database system (object oriented database)" or simply as "OO database system". What is worth mentioning, the United States Stonebraker professor proposed object-oriented RDS theory once favored by industry. And in Stonebraker himself Informix spend big money was then appointed technology director always.However, several years of development, spatio-temporal object-oriented relational database system product market development situation is not good. Theoretically perfect sex didn't bring market warm response. The main reason of success, the main design thought database products with new database system is an attempt to replace the existing database system. This for many has been using database system for years and accumulated the massive job data, especially big customer for customers, is unable to withstand the conversion between old and new data that huge workload and big spending. In addition, object-oriented RDS system makes query language extremely complex, so whether database development businessman or application customers depending on the complicated application technology to be a dangerous road.Basic structureThe basic structure of database, reflects the three levels of observation database of three different Angle.(1) physical data layer.It is the most lining, is database on physical storage equipment actually stored data collection. These data are raw data, the object, the user isprocessed by internal model describing the throne of handled the instructions of string, character and word.(2) conceptual data layer.It is a layer of database, is among the whole logic said. Database Points out the logic of each data definition and the logical connection between data collection of storage and record, is. It is related to the database all objects logical relationship, not their physical condition, is under the database administrator concept of database.(3) logical data layer.It is the user sees and use the database, says one or some specific users use collections of data, namely the logical record set.Database different levels is the connection between conversion by mapping.Main features(1) to implement the data sharing.Data sharing contains all users can also access database data, including the user can use all sorts of ways to use the database through interfaces, and provide data sharing.(2) reduce data redundancy.Compared with the file system, because the database to achieve data sharing so as to avoid the user respective establish application documentation. Reduce a lot of repeating data, reduce the data redundancy, maintain the consistency of the data.(3) data of independence.Data independence including database database of logical structure and application independent, also including data physical structure change does not affect the data of the logical structure.(4) data realize central control.File management mode, data in a decentralized state, different user or same users in different treatment had no relation between the documents.Using the database of data can be concentrated control and management, and through the data model of data organization and said the relation between data.(5) the data consistency and maintainability, to ensure the safety and reliability of the data.Mainly includes: (1) the safety control: to prevent data loss, error updating and excessive use; (2) the integrity control: ensure data accuracy, effectiveness and compatibility; (3) the concurrent control: make in the same time period to allow data realization muli-access, and can prevent users of abnormal interaction between; (4) fault finding and recovery: the database management system provides a set of method, can isolate faults and repair fault, thereby preventing data breaches(6) fault recovery.The database management system provides a set of method, can isolate faults and repair fault, thereby preventing data breaches. Database system can restore database system is running as soon as possible, is probably the fault occurred in the physical or logical error. For instance, in system caused by incorrect operation data error, etc.Database classification1. The MaiJie openPlant real-time databaseReal-time database system is a new field in database theory expansion, in power, chemical, steel, metallurgy, papermaking, traffic control and securities finance and other fields has a very broad application prospect. It can provide enterprises with high speed, timely real-time data services, to the rapidly changing real-time data to carry on the long-term effective history storage, is a factory control layer (fieldbus, DCS, PLC, etc) and production management system of the connection between the bridge, also process simulation, advanced control, online optimization, fault diagnosis system data platform.OpenPlant real-time database system used in today's advanced technology and architecture, can realize safe, stable and field each control system of the interface, and collected data efficient data compression and China's longhistory of storage, meanwhile, we also provide convenient client application and general data interface (API/DDE/ODBC/JDBC/OPC, etc.), make the enterprise management and decision makers can prompt, and comprehensive understanding of the current production situation, also can look back at past production conditions, the timely discovery and the problems existing in the production, improve equipment utilization rate, reduce production cost, the enhancement enterprise's core competitive ability.2. IBM DB2As the pioneer and relational database fields, IBM pilot in 1977 completed System R System prototype, 1980 began to provide integrated database server - System / 38, followed by SQL/DSforVSE and VM, and its initial version is closely related with SystemR research prototype. In 1983 forMVSV1 DB2 launch. This version of the objective is to provide the new plan promised simplicity, data don't correlation and user productivity. 1988 DB2 for MVS provides a powerful online transaction processing (OLTP) support, 1989 and 1993 respectively to remote work unit and distributed work unit realized distributed database support. Recently launched Universal Database 6.1 is DB2 Database model gm, is the first online function with multimedia relational Database management system, support includes a series of platform, Linux.3. OracleOracle predecessor called SDL Ellison and another by a benchwarmer in 1977 founded two programmers, they have developed their own fist product in the market, a large sale, 1979, Oracle company introduces the first commercial SQL relational database management system. Oracle corporation is the earliest development relational database, its product support and producers of the most widely operating system platform. Now Oracle relational database products market share a front-runner.4. InformixInformix founded in 1980, the purpose is for Unix operating system to provide professional such open relational database products. The company's name from Information and Informix is the combination of Unix. Informix first truly support SQL database products is the relationship between Informix SE(StandardEngine). InformixSE is at the time of the microcomputer Unix environment main database products. It is also the first to be transplanted into the commercial database products on Linux.5. SybaseSybase company was founded in 1984, the company name "Sybase" from "the system" and "database" combination of meaning. One of the company's founder Bob Sybase Epstein is Ingres university edition (and System/R the same period the relational database model products) of main design personnel. The company's first a relational database products are launched in 1987 SQLServer1.0 Sybase may. Sybase are first proposed the structure of database system/Server thoughts, and took the lead in SQLServer Sybase realize.6. SQL ServerIn 1987, Microsoft and IBM cooperative development complete OS / 2, IBM in its sales OS / 2 ExtendedEdition system binding Manager, and 2Database OS/production line is still lack of database of Microsoft products. Therefore, Microsoft will Sybase, Sybase foreign-exchange signed cooperation agreements with the technical development, the use of Sybase based on OS / 2 platform relational database. In 1989, Microsoft released SQL Server version 1.0.7. PostgreSQLPostgreSQL is a characteristic very complete free software objects - relational database management system (ORDBMS), and many of its characteristic is many of today's commercial database predecessor. The earliest PostgreSQL Ingres project started in BSD the. The characteristics of PostgreSQL covering the SQL - 2 / SQL - 92 and SQL - 3. First, it includes can say to the world's most abundant data types of support; Second, at present PostgreSQL is the only support affairs, son query, multiple versions parallel control system, data integrity checking the only features such as a free software database management system.8. MySQLMySQL is a small relational database management system, developers for Sweden mySQL AB corporation. In January 2008, was 16 from takeover. Currently MySQL is widely used in the small and medium-sized websites on the Internet. Because of its small size, speed, overall has low cost, especially open-source this one characteristic, many small and medium-sized web site, in order to reduce the overall cost of ownership website and selected MySQL as website database9 in Access databasesAmerican Microsoft company in 1994 launched microcomputer database management system. It has friendly interface, easy easy to use, development is simple, flexible, and other features, is the interface typical of the new generation of desktop database management system. Its main features below:(1) perfect management, various database objects of strong data organization, user management, safety inspection functions.(2) strong data processing functions, in a group level of network environment, use Access development multi-user database management system have traditional XBASE (DBASE, FoxBASE collectively) database system can achieve client/Server (Cient/Server) structure and the corresponding database security mechanism, the Access has many advanced large database management system has the characteristics, such as transaction processing/error rollback ability, etc.(3) can be easily generate various data object, using the data stored build forms and statements, visibility.(4) as Office suite, and part of the integrated, realize seamless connection version.(5) can use Web searching and release data, realization and Internet connection. Access mainly suitable for small and medium application system, or as a client/server system of the client database.10. SQLiteThe ACID is to comply with SQLite relational databases management system, it contained within a relatively small C library. It is ichardHipp D.Restablished public domain project. Not as common client/server architecture examples, with SQLite engine is not a process of communication independent process, but connected to the program become a major part of it. So the main communication protocol is within the programming language direct API calls. This in gross consumption, time delay and overall simplicity have positive role. The entire database (definition, table, indexing and data itself) are both in the host host stored in a single file. It is through the simple design of starting a business in the whole data files and complete lock.11. FoxPro databaseAt first by American Fox 1988, 1992 Fox launched by Microsoft company, have introduced after the takeover FoxPro2.5, 2.6 and VisualFoxPro etc, its function and performance version has greatly improved. FoxPro2.5, 2.6 into DOS and Windows two versions, respectively in DOS and running Windows environment. FoxPro FoxBASE in function and performance than and have improved greatly, mainly introducing the window, button, list box and text box control such as to enhance the system development capabilities.Common databases1. MySQL is the most popular open source SQL database management system, the company develops by MySQL AB, issuing, and support. MySQL AB is founded by several MySQL developer a commercial company. It is a second-generation open-source company, combined with open source value orientation, method and successful business model.Features:MySql core program using fully multi-threaded programming. Threading is lightweight processes, it can be flexibly provides services for users, and the system resources. But manyMySql can run in different operating systems. Say simply, MySql can support Windows95/98 / NT / 2000 and UNIX, Linux and OS from various operating system platform.MySql have a very flexible and safe access and password system. When a customer and MySql server connection between all the password, theytransmit encrypted, and MySql support host authentication.Support ODBC for Windows. MySql support all the ODBC 2.5 function and many other functions, like this may use Access connections MySql server, thus make the MySql applications are greatly extend.MySql support large database. Although written in Php web page for it as long as can deposit hundreds of above record data is enough, but MySql can easily support millions of recorded database.MySql have a very rapid and stable memory allocation system based on the thread, can continue to use face don't have to worry about its stability.The strong search function. MySql support inquires the ramp of the SELECT and statements all operators and function, and can be in the same query from different database table mixes, thus make inquires the become quicker and easier.PHP provides strong support for MySql, provide a whole set of the PHP function to MySql MySql, carry on the omni-directional support.2. Based on the server is SQLServer database can be used for medium and large capacity data applications, on the function management is much more ambitious than Access. In handling mass data efficiency, backstage development aspects of flexibility, scalability is strong. Because now database are using standard SQL language to database management, so if it is standard SQL language, both basically can generic. 92HeZu nets all rent space can be used both double Access database, while supporting the SQL Server. SQL Server and more extended stored procedure, can use the database size without limit restrictions.Graphical user interface, make the system management and database management more intuitive and simple.The real client/server architecture.Rich programming interface tools for users to program designed to provide more choices.SQL Server with Windows NT, using a fully integrated many functions, suchas NT send and receive messages, management login security, etc. SQL Server can be a very good and Microsoft BackOffice product integration.Has the very good flexibility, can span from running Windows 95/98 laptop to run Windows 2000 large multiprocessor and so on many kinds of platform use.Technical support to the Web, users can easily will database data released on to the Web page.SQL Server provides the data warehouse function, this function only in Oracle and other more expensive DBMS is only found in.3. Access is a desktop database, is suitable only for data quantity is little, in dealing with the application of a database of data and single visit is very good, the efficiency is high. But it's also visit the client can't more than four. The access database has certain limit, if the data reach 100M or so, is very easy to create the server iis feign death, or consume server memory to crash the server.The future development trendWith the expansion of information management content, appeared to rich variety of data model (hierarchical model, meshy model, relation model, object-oriented model, half structural model and so on), the new technology also emerge in endlessly (data streams, Web data management, data mining, etc.). Now every few years, international senior database experts assembled, discusses database research status, problems and future needs the new technology focus attention. The past existing several similar reports include: 1989 The Future Directions inDBMS Laguna BeachParticipants true - DatabaseSystems: Achievements in 1990, Opportunities, 1995. Inmon W.H. Database 1991: constructing The publication of The data warehouse"。
计算机毕业设计外文翻译---数据仓库
DATA WAREHOUSEData warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon —— a way to keep customers by learning more about their needs.“So", you may ask, full of intrigue, “what exactly is a data warehouse?"Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." This short, but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.(1)Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.(2)Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on..(3)Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.(4)Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data..In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.“OK", you now ask, “what, then, is data warehousing?"Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers" (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisionsbased on information in the warehouse. Some authors use the term “data warehousing" to refer only to the process of data warehouse construction, while the term warehouse DBMS is used to refer to the management and utilization of data warehouses. We will not make this distinction here.“How are organizations using the information from data warehouses?" Many organizations are using this information to support business decision making activities, including:(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending).(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies.(3) analyzing operations and looking for sources of profit.(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. To integrate such data, and provide easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.1.Differences between operational database systems and data warehousesSince most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as, purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers" in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.The major distinguishing features between OLTP and OLAP are summarized as follows.(1)Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.(2)Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier for use in informed decision making.(3)Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application -oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.(4)View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.(5). Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.Other features which distinguish between OLTP and OLAP systems include database size, frequency of operations, and performance metrics and so on.2.But, why have a separate data warehouse?“Since operational databases store huge amounts of data", you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?"A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned" queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.数据仓库数据仓库为商务运作提供了组织结构和工具,以便系统地组织、理解和使用数据进行决策。
数据仓库的构建及其多维数据集分析
数据仓库的构建及其多维数据集分析什么是数据仓库?数据仓库(Data Warehouse)是指某一个组织中各类数据集合的集中存储,以支持企业决策和分析等活动。
据此可以看出,数据仓库的设计是为了支持和提高企业的数据决策和分析能力,以支持企业的高效决策和优化作用。
数据仓库是在业务数据的基础上构建的,通过对数据挖掘、数据分析等处理,将原始业务数据集合转换成为信息化的数据仓库。
数据仓库的构建过程在进行数据仓库的构建过程中,常用的方法是ETL,即Extract、Transform、Load的缩写。
这种构建方法是从源数据中抽取数据,进行转换和清洗,然后载入数据仓库。
抽取(Extract)抽取是指从一定范围内,不同来源的业务数据中,确定需要抽取的数据。
在抽取数据的时候,主要要考虑到数据的完整性和准确性。
对于不必要的数据或者错误的数据可以过滤掉,以便提高数据的质量。
转换(Transform)数据转换主要是指将抽取出来的数据进行清洗、矫正、数值变换等等操作。
在数据转换时,可以对数据进行简单的汇总、聚集,或者通过复杂的算法来产生派生数据。
载入(Load)在数据转换操作完成后,需要将数据载入数据仓库中。
载入数据仓库时,需要考虑到数据的完整性和一致性。
同时在进行载入的时候还要对数据进行一些检测,以避免数据存入后对整个数据仓库造成影响。
由此可见,数据仓库的构建涉及到多个环节,每个环节都需要严格执行,以保证数据的准确和完整性。
多维数据集分析通过数据仓库的构建,可以很方便地进行多维数据集分析,也就是OLAP(On-Line Analytical Processing)分析。
OLAP与传统的数据分析有所不同,它可以在不同纬度下查看数据,比如按时间、地区、产品等不同的纬度进行数据分析,以更好地满足企业的需要。
多维数据集多维数据集也就是指超过三个维度的数据。
在多维数据集中,每个维度都有其属性和层次结构,并且在维度之间存在着关系和交互作用。
数据库外文翻译
DatabaseA database may be defined as a collection of interrelated data stored together with as little redundancy as possible to serve one or more applications in an optimal fashion; the data are stored so that they are independents of programs which use the data; a common and controlled approach is used in adding new data and in modifying and retrieving existing data within the data base one system is said to contain a collection of databases if they are entirely separate in structure.The restructuring should be possible without having to rewrite the application program and in general should cause as little upheaval as possible the ease with which a database can be changed will have a major effect on the rate at which data-processing application can be developed in a corporation.The term data independence is often quoted as being one of the main attributes of a database int implies that the data and the may be changed without changing the other, when a single setoff data items serves a variety of applications, different application programs perceive different relationships between the data items, to a large extent database organization is concerned with the as how and where the data are stored.A database used for many applications can have multiple interconnection referred to as entities. An entity may be a tangible object or no tangible if it has various properties which we may wish to record. It can describe the real world. The data item represents an attribute and the attribute must be associated which the relevant entity. We relevant entity we design values to the attributes one attribute has a special significance in that it identifies the entity.Logical data description is called a model.We must distinguish a record and a record examples, when talking about all personnel records when it is really a record type, not combined with its data value.A model is used to describe the database used storage in the database data item type and record types of general charts, subschema paragraph refers to an application programmer view of data, many different patterns can get from one mode. The schemaand the subschema are both used by the database management system the primary function of which is to serve the application programs by execution their data operations.A DBMS will usually be handling multiple data calls concurrently, it must organize its system buffers so that different data operations can be in process together, it provides a data definition language to specify the conceptual schema and most likely some of the details regarding the implementation of the conceptual schema by the physical schema the describe the conceptual schema in terms for a “data model”.The choice of a data model is a difficult one, since it must be such enough in structure to describe significant aspects of the real world, yet it must be possible to determine fairly automatically an efficient implementation of the conceptual conceptual schema by a physical schema. It should be emphasized that while a DBMS might be used to build small databases many databases involve millions of bytes and an inefficient implementation can be disastrous.The hierarchical and network structures have been used for DBMS since the 1960’s . the relational structure was introduced in the early 1970’s.In the relational model two-dimensional tables represent the entities and their relationships every table represents an entities are represented by common columns containing values from a domain or range of possible values .The end user is presented with a simple data model his and her request and don not reflect any complexities due to system-oriented aspects a relational data model is what the user sees , but it is mot necessarily what will be implemented physically.The relational data model removes the details of storage structure and access strategy from the user inter-face the model providers a relatively higher degree of data to make use of this property of the relational data model however, the design of the relations must be complete and accurate. Although some DBMS based on the relational data model are commercially available today it is difficult to provide a complete set of operational capabilities with required efficiency on a large scale it appears today that technological improvements in providing faster and more reliable hardware may answer the question positively.The hierarchical data model is based no a tree-like structure made up of nodes and branches a node is a collection of data attributes describing the entity at that opine the highest node of the hierarchical.A hierarchical data model always starts with a root node every node consists of one or more attributes describing the entity at that node dependent nodes can follow the succeeding levels the mode in the receding level becomes the parent node of the new dependent nodes a parent node can have one child node as a dependent or many children nodes the major advantage of the hierarchical data model is the existence of proven database management systems that use the hierarchical data model as the basic structure there is a reduction of data dependency but any child mode is accessible only in a clumsy way this often results in a redundancy in stored data.The database concept has evolved since the 1960s to ease increasing difficulties in designing, building, and maintaining complex information systems (typically with many concurrent end-users, and with a large amount of diverse data). It has evolved together with database management systems which enable the effective handling of databases. Though the terms database and DBMS define different entities, they are inseparable: a database's properties are determined by its supporting DBMS and vice-versa. With the progress in technology in the areas of processors, computer memory, computer storage and computer networks, the sizes, capabilities, and performance of databases and their respective DBMSs have grown in orders of magnitudes. For decades it has been unlikely that a complex information system can be built effectively without a proper database supported by a DBMS. The utilization of databases is now spread to such a wide degree that virtually every technology and product relies on databases and DBMSs for its development and commercialization, or even may have such embedded in it. Also, organizations and companies, from small to large, heavily depend on databases for their operations.No widely accepted exact definition exists for DBMS. However, a system needs to provide considerable functionality to qualify as a DBMS. Accordingly its supported data collection needs to meet respective usability requirements (broadly defined by therequirements below) to qualify as a database. Thus, a database and its supporting DBMS are defined here by a set of general requirements listed below. Virtually all existing mature DBMS products meet these requirements to a great extent, while less mature either meet them or converge to meet them.Database researchDatabase research has been an active and diverse area, with many specializations, carried out since the early days of dealing with the database concept in the 1960s. It has strong ties with database technology and DBMS products. Database research has taken place at research and development groups of companies (e.g., notably at IBM Research, who contributed technologies and ideas virtually to any DBMS existing today), research institutes, and Academia. Research has been done both through Theory and Prototypes. The interaction between research and database related product development has been very productive to the database area, and many related key concepts and technologies emerged from it. Notable are the Relational and the Entity-relationship models, related Concurrency,control,techniques,Query,languages,and Query,optimization metho ds, RAID, and more. Research has provided deep insight to virtually all aspects of databases, though not always has been pragmatic, effective (and cannot and should not always be: research is exploratory in nature, and not always leads to accepted or useful ideas). Ultimately market forces and real needs determine the selection of problem solutions and related technologies, also among those proposed by research. However, occasionally, not the best and most elegant solution wins (e.g., SQL). Along their history DBMSs and respective databases, to a great extent, have been the outcome of such research, while real product requirements and challenges triggered database research directions and sub-areas.The database research area has several notable dedicated academic journals and annual conferences(e.g.,ACM PODS, VLDB, IEEE ICDE, and more), as well as an active and quite heterogeneous (subject-wise) research community all over the world. Functional requirementsCertain general functional requirements need to be met in conjunction with a database. They describe what is needed to be defined in a database for any specific application.Defining the structure of data: Data modeling and Data definition languagesThe database needs to be based on a data model that is sufficiently rich to describe in the database all the needed respective application's aspects. A data definition language exists to describe the databases within the data model. Such language is typically data model specific.Manipulating the data: Data manipulation languages and Query languagesA database data model needs support by a sufficiently rich data manipulation language to allow all database manipulations and information generation (from the data) as needed by the respective application. Such language is typically data model specific. Protecting the data: Setting database security types and levelsThe DB needs built-in security means to protect its content (and users) from dangers of unauthorized users (either humans or programs). Protection is also provided from types of unintentional breach. Security types and levels should be defined by the database owners.Database designDatabase design is done before building it to meet needs of end-users within a given application/information-system that the database is intended to support. The database design defines the needed data and data structures that such a database comprises. A design is typically carried out according to the common three architectural levels of a database (see Database architecture above). First, the conceptual level is designed, which defines the over-all picture/view of the database, and reflects all the real-world elements (entities) the database intends to model, as well as the relationships among them. On top of it the external level, various views of the database, are designed according to (possibly completely different) needs of specific end-user types. More external views can be added later. External views requirements may modify the design of the conceptual level (i.e., add/remove entities and relationships), but usually a welldesigned conceptual level for an application well supports most of the needed external views. The conceptual view also determines the internal level (which primarily deals with data layout in storage) to a great extent. External views requirement may add supporting storage structures, like materialized views and indexes, for enhanced performance. Typically the internal layer is optimized for top performance, in an average way that takes into account performance requirements (possibly conflicting) of different external views according to their relative importance. While the conceptual and external levels design can usually be done independently of any DBMS (DBMS-independent design software packages exist, possibly with interfaces to some specific popular DBMSs), the internal level design highly relies on the capabilities and internal data structure of the specific DBMS utilized (see the Implementation section below).A common way to carry out conceptual level design is to use the Entity-relationship model (ERM) (both the basic one, and with possible enhancement that it has gone over), since it provides a straightforward, intuitive perception of an application's elements and semantics. An alternative approach, which preceded the ERM, is using the Relational model and dependencies (mathematical relationships) among data to normalize the database, i.e., to define the ("optimal") relations (data record or tupple types) in the database. Though a large body of research exists for this method it is more complex, less intuitive, and not more effective than the ERM method. Thus normalization is less utilized in practice than the ERM method.The ERM may be less subtle than normalization in several aspects, but it captures the main needed dependencies which are induced bykeys/identifiers of entities and relationships. Also the ERM inherently includes the important inclusion dependencies (i.e., an entity instance that does not exist (has not been explicitly inserted) cannot appear in a relationship with other entities) which usually have been ignored in normalization..Another aspect of database design is its security. It involves both defining access control to database objects (e.g., Entities, Views) as well as defining security levels and methods for the data itself.数据库一个数据库可以被定义成一个相关数据的集合,这个集合尽可能小的冗余为一个或多个应用程序在最理想的方式下服务,存贮数据的目的是使他们与使用数据的程序独立,一种相同的控制方法用于数据库更新数据和修改,恢复已存在的数据,如果一个系统在结构上完全分离,则他们被称为一个数据库集合。
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
数据仓库数据仓库为商务运作提供结构与工具,以便系统地组织、理解和使用数据进行决策。
大量组织机构已经发现,在当今这个充满竞争、快速发展的世界,数据仓库是一个有价值的工具。
在过去的几年中,许多公司已花费数百万美元,建立企业范围的数据仓库。
许多人感到,随着工业竞争的加剧,数据仓库成了必备的最新营销武器——通过更多地了解客户需求而保住客户的途径。
“那么”,你可能会充满神秘地问,“到底什么是数据仓库?”数据仓库已被多种方式定义,使得很难严格地定义它。
宽松地讲,数据仓库是一个数据库,它与组织机构的操作数据库分别维护。
数据仓库系统允许将各种应用系统集成在一起,为统一的历史数据分析提供坚实的平台,对信息处理提供支持。
按照W. H. Inmon,一位数据仓库系统构造方面的领头建筑师的说法,“数据仓库是一个面向主题的、集成的、时变的、非易失的数据集合,支持管理决策制定”。
这个简短、全面的定义指出了数据仓库的主要特征。
四个关键词,面向主题的、集成的、时变的、非易失的,将数据仓库与其它数据存储系统(如,关系数据库系统、事务处理系统、和文件系统)相区别。
让我们进一步看看这些关键特征。
(1)、面向主题的:数据仓库围绕一些主题,如顾客、供应商、产品和销售组织。
数据仓库关注决策者的数据建模与分析,而不是构造组织机构的日常操作和事务处理。
因此,数据仓库排除对于决策无用的数据,提供特定主题的简明视图。
(2)、集成的:通常,构造数据仓库是将多个异种数据源,如关系数据库、一般文件和联机事务处理记录,集成在一起。
使用数据清理和数据集成技术,确保命名约定、编码结构、属性度量的一致性等。
(3)、时变的:数据存储从历史的角度(例如,过去5-10 年)提供信息。
数据仓库中的关键结构,隐式或显式地包含时间元素。
(4)、非易失的:数据仓库总是物理地分离存放数据;这些数据源于操作环境下的应用数据。
由于这种分离,数据仓库不需要事务处理、恢复和并行控制机制。
通常,它只需要两种数据访问:数据的初始化装入和数据访问。
概言之,数据仓库是一种语义上一致的数据存储,它充当决策支持数据模型的物理实现,并存放企业决策所需信息。
数据仓库也常常被看作一种体系结构,通过将异种数据源中的数据集成在一起而构造,支持结构化和启发式查询、分析报告和决策制定。
“好”,你现在问,“那么,什么是建立数据仓库?”根据上面的讨论,我们把建立数据仓库看作构造和使用数据仓库的过程。
数据仓库的构造需要数据集成、数据清理、和数据统一。
利用数据仓库常常需要一些决策支持技术。
这使得“知识工人”(例如,经理、分析人员和主管)能够使用数据仓库,快捷、方便地得到数据的总体视图,根据数据仓库中的信息做出准确的决策。
有些作者使用术语“建立数据仓库”表示构造数据仓库的过程,而用术语“仓库DBMS”表示管理和使用数据仓库。
我们将不区分二者。
“组织机构如何使用数据仓库中的信息?”许多组织机构正在使用这些信息支持商务决策活动,包括:(1)、增加顾客关注,包括分析顾客购买模式(如,喜爱买什么、购买时间、预算周期、消费习惯);(2)、根据季度、年、地区的营销情况比较,重新配置产品和管理投资,调整生产策略;(3)、分析运作和查找利润源;(4)、管理顾客关系、进行环境调整、管理合股人的资产开销。
从异种数据库集成的角度看,数据仓库也是十分有用的。
许多组织收集了形形色色数据,并由多个异种的、自治的、分布的数据源维护大型数据库。
集成这些数据,并提供简便、有效的访问是非常希望的,并且也是一种挑战。
数据库工业界和研究界都正朝着实现这一目标竭尽全力。
对于异种数据库的集成,传统的数据库做法是:在多个异种数据库上,建立一个包装程序和一个集成程序(或仲裁程序)。
这方面的例子包括IBM 的数据连接程序和Informix的数据刀。
当一个查询提交客户站点,首先使用元数据字典对查询进行转换,将它转换成相应异种站点上的查询。
然后,将这些查询映射和发送到局部查询处理器。
由不同站点返回的结果被集成为全局回答。
这种查询驱动的方法需要复杂的信息过滤和集成处理,并且与局部数据源上的处理竞争资源。
这种方法是低效的,并且对于频繁的查询,特别是需要聚集操作的查询,开销很大。
对于异种数据库集成的传统方法,数据仓库提供了一个有趣的替代方案。
数据仓库使用更新驱动的方法,而不是查询驱动的方法。
这种方法将来自多个异种源的信息预先集成,并存储在数据仓库中,供直接查询和分析。
与联机事务处理数据库不同,数据仓库不包含最近的信息。
然而,数据仓库为集成的异种数据库系统带来了高性能,因为数据被拷贝、预处理、集成、注释、汇总,并重新组织到一个语义一致的数据存储中。
在数据仓库中进行的查询处理并不影响在局部源上进行的处理。
此外,数据仓库存储并集成历史信息,支持复杂的多维查询。
这样,建立数据仓库在工业界已非常流行。
1.操作数据库系统与数据仓库的区别由于大多数人都熟悉商品关系数据库系统,将数据仓库与之比较,就容易理解什么是数据仓库。
联机操作数据库系统的主要任务是执行联机事务和查询处理。
这种系统称为联机事务处理(OLTP)系统。
它们涵盖了一个组织的大部分日常操作,如购买、库存、制造、银行、工资、注册、记帐等。
另一方面,数据仓库系统在数据分析和决策方面为用户或“知识工人”提供服务。
这种系统可以用不同的格式组织和提供数据,以便满足不同用户的形形色色需求。
这种系统称为联机分析处理(OLAP)系统。
OLTP 和OLAP 的主要区别概述如下。
(1)、用户和系统的面向性:OLTP 是面向顾客的,用于办事员、客户、和信息技术专业人员的事务和查询处理。
OLAP 是面向市场的,用于知识工人(包括经理、主管、和分析人员)的数据分析。
(2)、数据内容:OLTP 系统管理当前数据。
通常,这种数据太琐碎,难以方便地用于决策。
OLAP 系统管理大量历史数据,提供汇总和聚集机制,并在不同的粒度级别上存储和管理信息。
这些特点使得数据容易用于见多识广的决策。
(3)、数据库设计:通常,OLTP 系统采用实体-联系(ER)模型和面向应用的数据库设计。
而OLAP 系统通常采用星形或雪花模型和面向主题的数据库设计。
(4)、视图:OLTP 系统主要关注一个企业或部门内部的当前数据,而不涉及历史数据或不同组织的数据。
相比之下,由于组织的变化,OLAP 系统常常跨越数据库模式的多个版本。
OLAP 系统也处理来自不同组织的信息,由多个数据存储集成的信息。
由于数据量巨大,OLAP 数据也存放在多个存储介质上。
(5)、访问模式:OLTP 系统的访问主要由短的、原子事务组成。
这种系统需要并行控制和恢复机制。
然而,对OLAP 系统的访问大部分是只读操作(由于大部分数据仓库存放历史数据,而不是当前数据),尽管许多可能是复杂的查询。
OLTP 和OLAP 的其它区别包括数据库大小、操作的频繁程度、性能度量等。
2.但是,为什么需要一个分离的数据仓库“既然操作数据库存放了大量数据”,你注意到,“为什么不直接在这种数据库上进行联机分析处理,而是另外花费时间和资源去构造一个分离的数据仓库?”分离的主要原因是提高两个系统的性能。
操作数据库是为已知的任务和负载设计的,如使用主关键字索引和散列,检索特定的记录,和优化“罐装的”查询。
另一方面,数据仓库的查询通常是复杂的,涉及大量数据在汇总级的计算,可能需要特殊的数据组织、存取方法和基于多维视图的实现方法。
在操作数据库上处理OLAP 查询,可能会大大降低操作任务的性能。
此外,操作数据库支持多事务的并行处理,需要加锁和日志等并行控制和恢复机制,以确保一致性和事务的强健性。
通常,OLAP 查询只需要对数据记录进行只读访问,以进行汇总和聚集。
如果将并行控制和恢复机制用于这种OLAP 操作,就会危害并行事务的运行,从而大大降低OLTP 系统的吞吐量。
最后,数据仓库与操作数据库分离是由于这两种系统中数据的结构、内容和用法都不相同。
决策支持需要历史数据,而操作数据库一般不维护历史数据。
在这种情况下,操作数据库中的数据尽管很丰富,但对于决策,常常还是远远不够的。
决策支持需要将来自异种源的数据统一(如,聚集和汇总),产生高质量的、纯净的和集成的数据。
相比之下,操作数据库只维护详细的原始数据(如事务),这些数据在进行分析之前需要统一。
由于两个系统提供很不相同的功能,需要不同类型的数据,因此需要维护分离的数据库。
Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon ——a way to keep customers by learning more about their needs.“So", you may ask, full of intrigue, “what exactly is a data warehouse?"Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." This short, but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.(1).Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling andanalysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.(3).Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.(4)Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.“OK", you now ask, “what, then, is data warehousing?"Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers" (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing" to refer only to the process of data warehouse construction, while the term warehouse DBMS is used to refer to the management and utilization of data warehouses. We will not make this distinction here.“How are organizations using the information from data warehouses?" Many organizations are using this information to support business decision making activities, including:(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending),(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies,(3) analyzing operations and looking for sources of profit,(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. To integrate such data, and provide easy and efficient access to it is highly desirable, yet challenging.Much effort has been spent in the database industry and research community towards achieving this goal.The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.1. Differences between operational database systems and data warehousesSince most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as, purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers" in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.The major distinguishing features between OLTP and OLAP are summarized as follows.(1). Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.(2). Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier for use in informed decision making.(3). Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application -oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.(4). View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.(5). Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.Other features which distinguish between OLTP and OLAP systems include database size, frequency of operations, and performance metrics and so on.2. But, why have a separate data warehouse?“Since operational databases store huge amounts of data", you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?"A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned" queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.。