Graduation Thesis Foreign-Literature Translation: Data Warehouse
Foreign Literature Translation: An Introduction to Database Management System
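English translation: An Introduction to Database Management System, Raghu Ramakrishnan
A database (sometimes spelled "data base"), also called an electronic database, is a specially organized collection of data or information, arranged so that a computer can query and retrieve it quickly.
The structure of a database is purpose-designed; supported by various data-processing commands, it simplifies storing, retrieving, modifying, and deleting data.
A database can be stored on magnetic disk, magnetic tape, optical disc, or other secondary storage devices.
A database consists of one file or a set of files whose information can be broken down into records, each of which contains one or more fields.
The field is the basic unit of data access.
A database is used to describe entities, and a field typically holds information about one attribute of an entity.
Using keywords and various sorting commands, users can query, rearrange, group, or select fields across many records in order to retrieve a particular class of data, and can also generate reports.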
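The record-and-field operations described above can be sketched in a few lines of Python. This is only an illustration: the sample records, the field names (`id`, `name`, `city`, `balance`), and the helper functions are hypothetical and are not taken from the source text.

```python
from collections import defaultdict

# Hypothetical records: each dict is one record, each key is one field.
records = [
    {"id": 1, "name": "Li", "city": "Beijing", "balance": 120.0},
    {"id": 2, "name": "Wang", "city": "Shanghai", "balance": 75.5},
    {"id": 3, "name": "Zhao", "city": "Beijing", "balance": 30.0},
]

def select(rows, field, value):
    """Return the records whose field equals the given key value."""
    return [r for r in rows if r[field] == value]

def sort_by(rows, field):
    """Return the records ordered by one field (ascending)."""
    return sorted(rows, key=lambda r: r[field])

def group_by(rows, field):
    """Group records by the value of one field."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[field]].append(r)
    return dict(groups)

def report(rows, field, total_field):
    """A tiny 'report': the sum of total_field for each group."""
    return {key: sum(r[total_field] for r in group)
            for key, group in group_by(rows, field).items()}
```

A real DBMS performs the same kinds of selection, sorting, and grouping, but over files on secondary storage rather than an in-memory list.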
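All but the simplest databases contain complex data relationships and linkages.
The system software package that handles the complex tasks of creating, accessing, and maintaining database records is called a database management system (DBMS).
The programs in a DBMS package establish an interface between the database and its users.
(These users may be application programmers, managers and others who need information, and various operating system programs.) A DBMS can organize, process, and present selected data elements from the database.
This capability lets decision makers search, probe, and query the contents of the database to answer nonrecurring, unplanned questions that regular reports do not cover.
These questions may initially be vague and/or poorly defined, but people can browse the database until they obtain the information they need.
In short, the DBMS "manages" the stored data items and assembles the needed items from the common database in response to the queries of non-programmers.
A DBMS consists of three main parts: (1) a storage subsystem, which stores and retrieves data in files; (2) a modeling and manipulation subsystem, which provides ways to organize data and to add, delete, maintain, and update it; and (3) an interface between the user and the DBMS.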
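Several important trends are emerging that increase the value and effectiveness of database management systems:
1. Managers need up-to-date information to make effective decisions.
2. Customers demand increasingly sophisticated information services and more current information about their orders, invoices, and accounts.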
Foreign Literature with Chinese Translation: Database
English original 2: DBA Survivor: Become a Rock Star DBA, by Thomas LaRock, published by Apress, 2010
You know that a database is a collection of logically related data elements that may be structured in various ways to meet the multiple processing and retrieval needs of organizations and individuals. There's nothing new about databases: early ones were chiseled in stone, penned on scrolls, and written on index cards. But now databases are commonly recorded on magnetizable media, and computer programs are required to perform the necessary storage and retrieval operations.
You'll see in the following pages that complex data relationships and linkages may be found in all but the simplest databases. The system software package that handles the difficult tasks associated with creating, accessing, and maintaining database records is called a database management system (DBMS). The programs in a DBMS package establish an interface between the database itself and the users of the database. (These users may be applications programmers, managers and others with information needs, and various OS programs.)
A DBMS can organize, process, and present selected data elements from the database. This capability enables decision makers to search, probe, and query database contents in order to extract answers to nonrecurring and unplanned questions that aren't available in regular reports. These questions might initially be vague and/or poorly defined, but people can "browse" through the database until they have the needed information. In short, the DBMS will "manage" the stored data items and assemble the needed items from the common database in response to the queries of those who aren't programmers.
In a file-oriented system, users needing special information may communicate their needs to a programmer, who, when time permits, will write one or more programs to extract the data and prepare the information [4]. The availability of a DBMS, however, offers users a much faster alternative communications path.
If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it does not automatically leave an audit trail of actions and does not provide the kinds of controls necessary in a multiuser organization. These controls are only available when a set of application programs is customized for each data entry and updating function.
Software for personal computers that performs some of the DBMS functions has been very popular. Personal computers were intended for use by individuals for personal information storage and processing. These machines have also been used extensively by small enterprises and by professionals such as doctors, architects, engineers, and lawyers. By the nature of their intended usage, database systems on these machines are exempt from several of the requirements of full-fledged database systems. Since data sharing is not intended, and concurrent operation even less so, the software can be less complex. Security and integrity maintenance are de-emphasized or absent. As data volumes will be small, performance efficiency is also less important. In fact, the only aspect of a database system that is important is data independence. Data independence, as stated earlier, means that application programs and user queries need not recognize the physical organization of data on secondary storage. The importance of this aspect, particularly for the personal computer user, is that it greatly simplifies database usage. The user can store, access, and manipulate data at a high level (close to the application) and be totally shielded from the low-level (close to the machine) details of data organization.
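Data independence can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table, column, and index names here are hypothetical, chosen only for illustration. The same query text is run before and after the physical organization changes (an index is added), and the application code does not need to change.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, doctor TEXT)"
)
conn.executemany(
    "INSERT INTO patient (name, doctor) VALUES (?, ?)",
    [("Ann", "Dr. Lee"), ("Bob", "Dr. Kim"), ("Cid", "Dr. Lee")],
)
conn.commit()

# The application only ever sees this logical query.
QUERY = "SELECT name FROM patient WHERE doctor = ? ORDER BY name"

def patients_of(doctor):
    """Return patient names for one doctor using the fixed logical query."""
    return [row[0] for row in conn.execute(QUERY, (doctor,))]

before = patients_of("Dr. Lee")

# Change the physical organization: add an index on the searched column.
conn.execute("CREATE INDEX idx_patient_doctor ON patient (doctor)")
conn.commit()

# Same query text, same answer; only the access path may differ.
after = patients_of("Dr. Lee")
```

The index may change how the rows are located on storage, but the query and its result stay the same, which is the shielding from low-level detail that the passage describes.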
We will not discuss details of specific PC DBMS software packages here. Let us summarize the strengths and weaknesses of personal computer database software systems.
The most obvious positive factor is the user-friendliness of the software. A user with no prior computer background would be able to use the system to store personal and professional data, retrieve it, and perform related processing. The user should, of course, satisfy himself about the quality of the software and its freedom from errors (bugs) so that investments in data are protected.
For the programmer implementing applications with them, the advantage lies in the support for application development, such as input screen generation and output report generation, offered by these systems.
The main negative point concerns the absence of data protection features. Unless encrypted, data can be accessed by whoever has access to the machine. Data can be destroyed through mistakes or malicious intent. The second weakness of many PC-based systems is performance. If data volumes grow to a few thousand records, performance could become a bottleneck.
For organizations where growth in data volumes is expected, the availability of the same or compatible software on large machines should be considered.
This is one of the most common misconceptions about database management systems that are used in personal computers. Thoroughly comprehensive and sophisticated business systems can be developed in dBASE, Paradox and other DBMSs. However, they are created by experienced programmers using the DBMS's own programming language. That is not the same as users who create and manage personal files that are not part of the mainstream company system.
Transaction Management of Database
The objective of long-duration transactions is to model long-duration, interactive database access sessions in application environments.
The fundamental assumption of short duration that underlies the traditional model of transactions is inappropriate for long-duration transactions. The implementation of the traditional model of transactions may cause intolerably long waits when transactions attempt to acquire locks before accessing data, and may also cause a large amount of work to be lost when transactions are backed out in response to user-initiated aborts or system failure situations.
The objective of a transaction model is to provide a rigorous basis for automatically enforcing a criterion for database consistency for a set of multiple concurrent read and write accesses to the database in the presence of potential system failure situations. The consistency criterion adopted for traditional transactions is the notion of serializability. Serializability is enforced in conventional database systems through the use of locking for automatic concurrency control, and logging for automatic recovery from system failure situations. A "transaction" that doesn't provide a basis for automatically enforcing database consistency is not really a transaction. To be sure, a long-duration transaction need not adopt serializability as its consistency criterion. However, there must be some consistency criterion.
Version System Management of Database
Despite a large number of proposals on version support in the context of computer-aided design and software engineering, the absence of a consensus on version semantics has been a key impediment to version support in database systems. Because of the differences between files and databases, it is intuitively clear that the model of versions in database systems cannot be as simple as that adopted in file systems to support software engineering.
For databases, it may be necessary to manage not only versions of single objects (e.g. a software module, a document), but also versions of a collection of objects (e.g. a compound document, a user manual, etc.)
and perhaps even versions of the schema of the database (e.g. a table or a class, a collection of tables or classes).
Broadly, there are three directions of research and development in versioning. The first is the notion of "parameterized versioning", that is, designing and implementing a versioning system whose behavior may be tailored by adjusting system parameters. This may be the only viable approach, in view of the fact that there are various plausible choices for virtually every single aspect of versioning.
The second is to revisit these plausible choices for every aspect of versioning, with a view to discarding some of them as either impractical or flawed. The third is the investigation into the semantics and implementation of versioning collections of objects and of versioning the database.
There is no consensus on the definition of the term "management information system". Some writers prefer alternative terminology such as "information processing system", "information and decision system", "organizational information system", or simply "information system" to refer to the computer-based information processing system which supports the operations, management, and decision-making functions of an organization. This text uses "MIS" because it is descriptive and generally understood; it also frequently uses "information system" instead of "MIS" to refer to an organizational information system.
A definition of a management information system, as the term is generally understood, is an integrated, user-machine system for providing information to support operations, management, and decision-making functions in an organization. The system utilizes computer hardware and software; manual procedures; models for analysis, planning, control, and decision making; and a database. The fact that it is an integrated system does not mean that it is a single, monolithic structure; rather, it means that the parts fit into an overall design.
The elements of the definition are highlighted below.
Computer-based user-machine system.
Conceptually, a management information system can exist without computers, but it is the power of the computer which makes MIS feasible. The question is not whether computers should be used in a management information system, but the extent to which information use should be computerized. The concept of a user-machine system implies that some tasks are best performed by humans, while others are best done by machine. The user of an MIS is any person responsible for entering input data, instructing the system, or utilizing the information output of the system. For many problems, the user and the computer form a combined system with results obtained through a set of interactions between the computer and the user.
User-machine interaction is facilitated by operation in which the user's input-output device (usually a visual display terminal) is connected to the computer. The computer can be a personal computer serving only one user or a large computer that serves a number of users through terminals connected by communication lines. The user input-output device permits direct input of data and immediate output of results. For instance, a person using the computer interactively in financial planning poses "what if" questions by entering input at the terminal keyboard; the results are displayed on the screen in a few seconds.
The computer-based user-machine characteristics of an MIS affect the knowledge requirements of both system developer and system user. "Computer-based" means that the designer of a management information system must have a knowledge of computers and of their use in processing. The "user-machine" concept means the system designer should also understand the capabilities of humans as system components (as information processors) and the behavior of humans as users of information.
Information system applications should not require users to be computer experts.
However, users need to be able to specify their information requirements; some understanding of computers, the nature of information, and its use in various management functions aids users in this task.
Management information systems typically provide the basis for integration of organizational information processing. Individual applications within information systems are developed for and by diverse sets of users. If there are no integrating processes and mechanisms, the individual applications may be inconsistent and incompatible. Data items may be specified differently and may not be compatible across applications that use the same data. There may be redundant development of separate applications when actually a single application could serve more than one need. A user wanting to perform analysis using data from two different applications may find the task very difficult and sometimes impossible.
The first step in integration of information system applications is an overall information system plan. Even though application systems are implemented one at a time, their design can be guided by the overall plan, which determines how they fit in with other functions. In essence, the information system is designed as a planned federation of small systems.
Information system integration is also achieved through standards, guidelines, and procedures set by the MIS function. The enforcement of such standards and procedures permits diverse applications to share data, meet audit and control requirements, and be shared by multiple users. For instance, an application may be developed to run on a particular small computer. Standards for integration may dictate that the equipment selected be compatible with the centralized database. The trend in information system design is toward separating application processing from the data used to support it.
The separate database is the mechanism by which data items are integrated across many applications and made consistently available to a variety of users. The need for a database in MIS is discussed below.
The terms "information" and "data" are frequently used interchangeably; however, information is generally defined as data that is meaningful or useful to the recipient. Data items are therefore the raw material for producing information.
The underlying concept of a database is that data needs to be managed in order to be available for processing and have appropriate quality. This data management includes both software and organization. The software to create and manage a database is a database management system.
When all access to and use of the database is controlled through a database management system, all applications utilizing a particular data item access the same data item, which is stored in only one place. A single updating of the data item updates it for all uses. Integration through a database management system requires a central authority for the database. The data can be stored in one central computer or dispersed among several computers; the overriding requirement is that there be an organizational function to exercise control.
It is usually insufficient for human recipients to receive only raw data or even summarized data. Data usually needs to be processed and presented in such a way that the result is directed toward the decision to be made. To do this, processing of data items is based on a decision model.
For example, an investment decision relative to new capital expenditures might be processed in terms of a capital expenditure decision model.
Decision models can be used to support different stages in the decision-making process. "Intelligence" models can be used to search for problems and/or opportunities. Models can be used to identify and analyze possible solutions.
Choice models such as optimization models may be used to find the most desirable solution.
In other words, multiple approaches are needed to meet a variety of decision situations. The following are examples of the types of model that might be included in an MIS to aid analysis in support of decision making. In a comprehensive information system, the decision maker has available a set of general models that can be applied to many analysis and decision situations, plus a set of very specific models for unique decisions. Similar models are available for planning and control. The set of models is the model base for the MIS.
Models are generally most effective when the manager can use interactive dialog to build a plan or to iterate through several decision choices under different conditions.
Chinese translation 2: DBA Survivor: Become a Rock Star DBA
As is well known, a database is a collection of logically related data elements. These data elements can be organized in different structures to meet the varied processing and retrieval needs of organizations and individuals.
Graduation Thesis: English References and Translation
Inventory Management
Inventory Control
Many people take "inventory control" to mean "warehouse management", which is a serious distortion.
The traditional, narrow view treats inventory control as a warehouse matter: stocktaking, data processing, storage, and issuing of materials, together with measures such as corrosion protection and temperature and humidity control, all intended to keep the physical stock in the best possible condition. This is only one form of inventory control, and it can be defined as physical inventory control. How, then, should inventory control be understood in the broad sense? Inventory control should be tied to the company's financial and operating objectives, in particular operating cash flow. By optimizing the entire demand and supply chain management (DSCM) process, setting sensible ERP control strategies, and supporting them with appropriate information-processing tools, it aims to reduce inventory levels as far as possible, and with them the risk of obsolescence and devaluation, while still ensuring on-time delivery. In this sense, physical inventory control is only one means of reaching the financial goal, and only one necessary part of inventory control as a whole. From the perspective of organizational functions, physical inventory control is mainly the responsibility of warehouse management, whereas inventory control in the broad sense is the responsibility of demand and supply chain management and of the whole company.
Why is many people's understanding of inventory control still limited to physical inventory control? Two reasons cannot be ignored. First, our enterprises do not attach importance to inventory control. In companies that are doing relatively well in particular, as long as there is money on hand, few people think about inventory turnover.
Inventory control is simply equated with warehouse management. Only when money has to be spent might anyone look at the inventory problem, and the conclusion is often simplistic: purchasing bought too much, or the warehouse department did not do its job.
Second, ERP vendors have been misleading. Simple purchase-sales-inventory ("invoicing") software has the audacity to call itself ERP, and its vendors claim that their so-called ERP can cut inventory by some amount, as if inventory control could be handled by a small software package. Even major ERP vendors such as SAP and BAAN label the simple warehouse management functions in their modules "inventory management" or "inventory control". This leaves people who did not fully understand inventory control in the first place even less sure what it is.
In fact, understood broadly, inventory control should include the following.
First, the fundamental purpose of inventory control. In so-called world-class manufacturing, the two key performance indicators (KPIs) are customer satisfaction and inventory turns, and improving inventory turns is in fact the fundamental objective of inventory control.
Second, the means of inventory control. Physical inventory control alone is not enough to increase inventory turns; inventory turns is an output of the overall demand and supply chain management process. Besides warehouse management, the more important parts of that process include forecasting and order processing, production planning and control, materials planning and purchasing control, the planning and forecasting of inventory itself, distribution and delivery strategies for finished products and raw materials, and even customs management. Running through the whole demand and supply chain management process are the management of information flow and the management of capital flow.
In other words, inventory exists at every stage of the demand and supply chain management process. To achieve the fundamental purpose of inventory control, a company must control inventory at every stage rather than only manage the physical stock on hand.
Third, the organizational structure and performance assessment for inventory control. Since inventory control is an output of the demand and supply chain management process, achieving its fundamental purpose requires a sound organizational structure that matches this process. Even now, many companies have only a purchasing department, with the warehouse reporting to it. This falls far short of what inventory control requires. Analysis of the demand and supply chain management process shows that purchasing and warehouse management are typical executing functions, whereas inventory control should focus on prevention. It is very difficult for executing functions to "prevent inventory", for the simple reason that their assessment indicators are largely about ensuring supply (to production and to customers). How to design a sound demand and supply chain management process for the actual situation, and then set up the corresponding organizational structure, is a question many of our enterprises still need to explore.
The role of inventory control
Inventory management is an important part of business management. In production and operating activities, inventory management must meet the production plant's demand for raw materials and spare parts, and it also directly affects purchasing and sales activities. Keeping a company's inventory liquid, speeding up cash flow, and minimizing the funds tied up in stock while still securing supply all directly affect operational efficiency.
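The text names inventory turns as the key indicator but does not define it. The sketch below assumes the common accounting definition (cost of goods sold divided by average inventory over the period); that definition, the function names, and the sample figures are assumptions added for illustration and do not come from the source.

```python
def inventory_turns(cost_of_goods_sold, opening_inventory, closing_inventory):
    """Inventory turns for a period, using the common definition
    COGS / average inventory (an assumption; the source gives no formula)."""
    average_inventory = (opening_inventory + closing_inventory) / 2
    if average_inventory <= 0:
        raise ValueError("average inventory must be positive")
    return cost_of_goods_sold / average_inventory

def days_of_inventory(turns, days_in_period=365):
    """Average number of days stock is held, derived from the turns figure."""
    if turns <= 0:
        raise ValueError("turns must be positive")
    return days_in_period / turns

# Hypothetical figures: COGS of 1200, stock of 180 at the start, 220 at the end.
turns = inventory_turns(1200.0, 180.0, 220.0)
```

Under this definition, reducing average stock while keeping sales constant raises the turns figure, which is why the passage treats the whole supply chain, not just the warehouse, as the lever.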
On the premise of meeting production and operating needs, inventory control keeps stock at a reasonable level; tracks inventory dynamically and places orders at the right time and in the right quantity to avoid overstocking or stockouts; reduces the space inventory occupies and the total cost of inventory; and controls the funds tied up in stock to speed up cash flow.
Problems caused by excessive inventory: more warehouse space and higher storage costs, which raise product costs; large amounts of working capital tied up, which makes capital sluggish, increases interest payments, and affects the time value of money and opportunity income; physical and intangible losses of finished products and raw materials; large amounts of idle enterprise resources, which hampers their rational allocation and optimization; and the masking of contradictions and problems across the whole production and operating process, which does not help improve management.
Problems caused by too little inventory: lower service levels, which hurt sales profit and corporate reputation; an inadequate supply of raw materials or other materials to the production system, which disrupts normal production; more frequent orders placed to shorten lead times, which raises ordering (production) costs; and disruption to balanced production and to the assembly of complete sets.
Notes
Inventory management should pay particular attention to the following two questions.
First, for goods produced according to sales plans and circulated in the market, where should they be stored, and in what quantity?
Second, starting from service levels and economic benefit, how should inventory be secured and replenished?
These two questions relate to the functions of inventory in the logistics process. In general, the functions of inventory are:
(1) To prevent interruption.
Shortening the time from receiving an order to delivering the goods ensures quality service and, at the same time, prevents stockouts.
(2) To maintain appropriate inventory levels and save inventory costs.
(3) To reduce logistics costs. Replenishing at suitable intervals, in quantities that match reasonable demand, reduces logistics costs.
(4) To support production planning and to smooth out or avoid sales fluctuations.
(5) To serve a display function.
(6) To act as a reserve. Storing in bulk when prices fall reduces losses and provides for disasters and other contingencies.
On the question of what to hold in which warehouse, both quantity and location must be considered. A distribution center should, as far as possible, be set up at a suitable place according to customer needs; a central warehouse whose main purpose is to replenish distribution centers has no particular location requirement. Once the stocking bases are established, the company must consider which commodities to store at each location.
Inventory Management
Inventory Control
When it comes to so-called "inventory control", many people take it to mean "warehouse management", which is in fact a serious misreading.
Graduation Thesis: Translation of Foreign-Language Materials [Management Materials]
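Appendix 1: Translation of Foreign-Language Material
An Introduction to Databases
1. Database management systems (DBMS).
As is well known, a database is a collection of logically related data elements.
These data elements can be organized in different structures to meet the varied processing and retrieval needs of organizations and individuals.
Databases themselves are nothing new: early ones were recorded on stone, written on scrolls, and entered on index cards.
Today, databases are commonly recorded on magnetizable media, and computer programs are needed to perform the necessary storage and retrieval operations.
As later sections show, all but the simplest databases contain complex data relationships and linkages.
The system software package that handles the complex tasks of creating, accessing, and maintaining database records is called a database management system (DBMS).
The programs in a DBMS package establish an interface between the database and its users (these users may be application programmers, managers and others who need information, and various operating system programs).
A DBMS can organize, process, and present selected data elements from the database.
This capability lets decision makers search, probe, and query the contents of the database to answer nonrecurring, unplanned questions that are not covered by regular reports.
These questions may initially be vague and poorly defined, but people can browse the database until they find the answer.
In other words, the DBMS "manages" the stored data items and assembles the needed items from the common database to answer the queries of non-programmers.
In a file-oriented system, users who need particular information can communicate their requirements to a programmer.
The programmer, when time permits, writes one or more programs to extract the data and prepare the information.
A DBMS, however, offers users a much faster alternative communication path.
Sequential, direct, and other file-processing methods are commonly used to organize and structure data within a single file, whereas a DBMS can access and retrieve data from non-key record fields; that is, a DBMS can organize and link logically related data from several large files.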
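Logical structures.
Identifying these logical relationships is the job of the data administrator and is done with a data definition language.
A DBMS may use the following logical structuring techniques during storage, access, and retrieval operations: (1) List structure.
In this logical approach, records are linked together by pointers.
A pointer is a data item in a record that gives the storage location of another, logically related record. For example, a record in a customer master file will contain each customer's name and address, and each record in the file is identified by an account number.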
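The list structure described above can be sketched as records that each carry a pointer to the storage location of the next logically related record. In this illustration the storage addresses, account numbers, and customer details are all hypothetical, and a Python dictionary stands in for secondary storage.

```python
# Simulated storage: address -> record. The "next" field is the pointer,
# a data item holding the storage location of the next related record.
storage = {
    100: {"account": "A-001", "name": "Chen", "address": "1 Main St", "next": 300},
    200: {"account": "A-003", "name": "Sun", "address": "9 Lake Rd", "next": None},
    300: {"account": "A-002", "name": "Zhou", "address": "5 Hill Ave", "next": 200},
}
HEAD = 100  # storage location of the first record in the chain

def walk(store, head):
    """Follow the pointers from the head record and yield each record in turn."""
    location = head
    while location is not None:
        record = store[location]
        yield record
        location = record["next"]

def find(store, head, account):
    """Locate a record by account number by following the chain."""
    for record in walk(store, head):
        if record["account"] == account:
            return record
    return None
```

Note that the logical order (A-001, A-002, A-003) differs from the physical order of the addresses; the pointers, not the storage positions, define the sequence.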
Database Security: Foreign Literature with Parallel Chinese and English Text
(This document contains the English original and the Chinese translation.)
Database Security in a Web Environment
Introduction
Databases have been common in government departments and commercial enterprises for many years. Today, databases in any organization are increasingly opened up to a multiplicity of suppliers, customers, partners and employees - an idea that would have been unheard of a few years ago. Numerous applications and their associated data are now accessed by a variety of users requiring different levels of access via manifold devices and channels - often simultaneously. For example:
• Online banks allow customers to perform a variety of banking operations - via the Internet and over the telephone - whilst maintaining the privacy of account data.
• E-Commerce merchants and their Service Providers must store customer, order and payment data on their merchant server - and keep it secure.
• HR departments allow employees to update their personal information - whilst protecting certain management information from unauthorized access.
• The medical profession must protect the confidentiality of patient data - whilst allowing essential access for treatment.
• Online brokerages need to be able to provide large numbers of simultaneous users with up-to-date and accurate financial information.
This complex landscape leads to many new demands upon system security. The global growth of complex web-based infrastructures is driving a need for security solutions that provide mechanisms to segregate environments; perform integrity checking and maintenance; enable strong authentication and non-repudiation; and provide for confidentiality. In turn, this necessitates comprehensive business and technical risk assessment to identify the threats, vulnerabilities and impacts, and from this define a security policy.
This leads to security definitions throughout the infrastructure - operating system, database management system, middleware and network.
Financial, personal and medical information systems and some areas of government have strict requirements for security and privacy. Inappropriate disclosure of sensitive information to the wrong parties can have severe social, legal and regulatory consequences. Failure to address the basics can result in substantial direct and consequential financial losses - witness the fraud losses through the compromise of several million credit card numbers in merchants' databases [Occf], plus associated damage to brand-image and loss of consumer confidence.
This article discusses some of the main issues in database and web server security, and also considers important architecture and design issues.
A Simple Model
At the simplest level, a web server system consists of front-end software and back-end databases with interface software linking the two. Normally, the front-end software will consist of server software and the network server operating system, and the back-end database will be a relational or object-oriented database fulfilling a variety of functions, including recording transactions, maintaining accounts and inventory. The interface software typically consists of Common Gateway Interface (CGI) scripts used to receive information from forms on web sites to perform online searches and to update the database.
Depending on the infrastructure, middleware may be present; in addition, security management subsystems (with session and user databases) that address the web server's and related applications' requirements for authentication, access control and authorization may be present.
Communications between this subsystem and the web server, middleware or database are via application program interfaces (APIs). This simple model is depicted in Figure 1.
Figure 1: A Simple Model.
Security can be provided by the following components:
• Web server.
• Middleware.
• Operating system.
• Database and Database Management System.
• Security management subsystem.
The security of such a system addresses aspects of authenticity, integrity and confidentiality and is dependent on the security of the individual components and their interactions. Some of the most common vulnerabilities arise from poor configuration, inadequate change control procedures and poor administration. However, even if these areas are properly addressed, vulnerabilities still arise. The appropriate combination of people, technology and processes holds the key to providing the required physical and logical security. Attention should additionally be paid to the security aspects of planning, architecture, design and implementation.
In the following sections, we consider some of the main security issues associated with databases, database management systems, operating systems and web servers, as well as important architecture and design issues. Our treatment seeks only to outline the main issues and the interested reader should refer to the references for a more detailed description.
Database Security
Database management systems normally run on top of an operating system and provide the security associated with a database. Typical operating system security features include memory and file protection, resource access control and user authentication. Memory protection prevents the memory of one program interfering with that of another and limits access and use of the objects employing techniques such as memory segmentation.
The operating system also protects access to other objects (such as instructions, input and output devices, files and passwords) by checking access with reference to access control lists. Security mechanisms in common operating systems vary tremendously and, for those that are lacking, there exists special-purpose security software that can be integrated with the existing environment. However, this can be an expensive, time-consuming task and integration difficulties may also adversely impact application behaviors.
Most database management systems consist of a number of modules - including database querying and database and file management - along with authorization, concurrent access and database description tables. These management systems also use a variety of languages: a data definition language supports the logical definition of the database; developers use a data manipulation language; and a query language is used by non-specialist end-users.
Database management systems have many of the same security requirements as operating systems, but there are significant differences since the former are particularly susceptible to the threat of improper disclosure, modification of information and also denial of service. Some of the most important security requirements for database management systems are:
• Multi-Level Access Control.
• Confidentiality.
• Reliability.
• Integrity.
• Recovery.
These requirements, along with security models, are considered in the following sections.
Multi-Level Access Control
In a multi-application and multi-user environment, administrators, auditors, developers, managers and users - collectively called subjects - need access to database objects, such as tables, fields or records. Access control restricts the operations available to a subject with respect to particular objects and is enforced by the database management system.
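The idea of restricting the operations a subject may perform on an object can be sketched as a lookup in an access control list. The subjects, objects, and operation names below are hypothetical, and a real database management system enforces this internally rather than in application code.

```python
# Hypothetical access control list: (object, subject) -> permitted operations.
ACL = {
    ("orders", "clerk"): {"read", "insert"},
    ("orders", "auditor"): {"read"},
    ("salaries", "manager"): {"read", "update"},
}

def is_allowed(subject, obj, operation):
    """Return True only if the ACL explicitly grants the operation.
    Anything not listed is denied (default deny)."""
    return operation in ACL.get((obj, subject), set())

def perform(subject, obj, operation):
    """Gate an operation behind the access check."""
    if not is_allowed(subject, obj, operation):
        raise PermissionError(f"{subject} may not {operation} {obj}")
    return f"{operation} on {obj} by {subject}"
```

The default-deny choice in `is_allowed` is a design assumption of this sketch: a subject with no entry for an object gets no access at all.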
Mandatory access controls require that each controlled object in the database be labeled with a security level, whereas discretionary access controls may be applied at the choice of a subject.
Access control in database management systems is more complicated than in operating systems since, in the latter, all objects are unrelated, whereas in a database the converse is true. Databases are also required to make access decisions based on a finer degree of subject and object granularity. In multi-level systems, access control can be enforced by the use of views (filtered subsets of the database) containing precisely the information that a subject is authorized to see.
A general principle of access control is that a subject with a high security level should not be able to write to a lower-level object, and this poses a problem for database management systems that must read all database objects and write new objects. One solution to this problem is to use a trusted database management system.
Confidentiality
Some databases will inevitably contain what is considered confidential data. For example, data could be inherently sensitive, its source may be sensitive, or it may belong to a sensitive table, thus making it difficult to determine what is actually confidential. Disclosure is also difficult to define, as it can be direct or indirect, or involve the disclosure of bounds or even mere existence.
An inference problem exists in database management systems whereby users can infer sensitive information from relatively insensitive queries. A trivial example is a request for the average salary of a group of employees where the group turns out to contain just one employee, thus revealing that employee’s salary. However, much more sophisticated statistical inference attacks can also be mounted.
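The single-employee example above can be made concrete with a small sketch using Python's built-in sqlite3 module. The staff table, its columns and the min_group_size parameter are assumptions made for illustration; the point is only that an aggregate over a group of one reveals an individual value, and that refusing to answer for small groups is one simple, incomplete mitigation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("Ann", "audit", 52000.0), ("Bob", "sales", 40000.0), ("Cy", "sales", 44000.0)],
)


def average_salary(dept, min_group_size=1):
    """Return the department's average salary, or None if the group is too small."""
    count, avg = conn.execute(
        "SELECT COUNT(*), AVG(salary) FROM staff WHERE dept = ?", (dept,)
    ).fetchone()
    if count < min_group_size:
        return None  # suppress the answer instead of exposing an individual
    return avg


leaked = average_salary("audit")                        # group of one: this is Ann's salary
suppressed = average_salary("audit", min_group_size=2)  # refused
sales_avg = average_salary("sales", min_group_size=2)   # answered normally
```

Suppressing small groups does not stop a user who combines several permitted queries, which is where the more sophisticated statistical attacks come in.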
This highlights the fact that, although the data itself may be properly controlled, confidential information may still leak out.
Controls can take several forms: not divulging sensitive information to unauthorized parties (which depends on the respective subject and object security levels), logging what each user knows, or masking response data. The first control can be implemented fairly easily, the second quickly becomes unmanageable for a large number of users, and the third leads to imprecise responses and exemplifies the trade-off between precision and security. Polyinstantiation refers to multiple instances of a data object existing in the database. It can provide a partial solution to the inference problem, whereby different data values are supplied, depending on the security level, in response to the same query. However, this makes consistency management more difficult.
Another issue arises when the security level of an aggregate is different from that of its elements (a problem commonly referred to as aggregation). This can be addressed by defining appropriate access control using views.
Reliability, Integrity and Recovery
Arguably, the most important requirements for databases are to ensure that the database presents consistent information to queries and can recover from any failures. An important aspect of consistency is that transactions execute atomically; that is, they either execute completely or not at all.
Concurrency control addresses the problem of allowing simultaneous programs access to a shared database, while avoiding incorrect behavior or interference. It is normally addressed by a scheduler that uses locking techniques to ensure that the transactions are serializable and independent.
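Atomic execution, as described above, can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The accounts table, the transfer function and the non-negative balance rule are assumptions made for illustration; the article does not prescribe any particular product or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst; both updates take effect together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed, so the credit was undone too


def balances():
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer("a", "b", 30)       # succeeds
failed = transfer("a", "b", 500)  # would overdraw "a", so nothing is applied
final = balances()
```

In the failed transfer the credit to the destination account has already been executed when the debit is rejected; the rollback removes it, so no partial result is ever visible.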
A common technique used in commercial products is two-phase locking (or variations thereof), in which the database management system controls when transactions obtain and release their locks according to whether or not transaction processing has been completed. In the first phase, the database management system collects the necessary data for the update; in the second phase, it updates the database. This means that the database can recover from incomplete transactions by repeating either of the appropriate phases. This technique can also be used in a distributed database system using a distributed scheduler arrangement.
System failures can arise from the operating system and may result in corrupted storage. The main copy of the database is used for recovery from failures and communicates with a cached version that is used as the working version. In association with the logs, this allows the database to recover to a very specific point in the event of a system failure, either by removing the effects of incomplete transactions or by applying the effects of completed transactions. Instead of having to recover the entire database after a failure, recovery can be made more efficient by the use of checkpointing. It is used during normal operations to write additional updated information (such as logs, before-images of incomplete transactions and after-images of completed transactions) to the main database, which reduces the amount of work needed for recovery. Recovery from failures in distributed systems is more complicated, since a single logical action is executed at different physical sites and the prospect of partial failure arises.
Logical integrity, at field level and for the entire database, is addressed by the use of monitors to check important items such as input ranges, states and transitions.
Error-correcting and error-detecting codes are also used.
Security Models
Various security models exist that address different aspects of security in operating systems and database management systems. For example, the Bell-LaPadula model defines security in terms of mandatory access control and addresses confidentiality only. The Bell-LaPadula model, and other models including the Biba model for integrity, are described more fully in [Cast95] and [Pfle89]. These models are implementation-independent and provide a powerful insight into the properties of secure systems, lead to design policies and principles, and, in some cases, form the basis for security evaluation criteria.
Web Server Security
Web servers are now one of the most common interfaces between users and back-end databases and, as such, their security becomes increasingly important. Exploitation of vulnerabilities in the web server can lead to unforeseen attacks on middleware and back-end databases, bypassing any controls that may be in place. In this section, we focus on common web server vulnerabilities and how the authentication requirements of web servers and databases are met.
In general, a web server platform should not be shared with other applications and should be the only machine allowed to access the database. Using a firewall can provide additional security, either between the web server and users or between the web server and back-end database, and often the web server is placed in a de-militarized zone (DMZ) of a firewall. While firewalls can be used to block certain incoming connections, they must allow HTTP (and HTTPS) connections through to the web server, and so attacks can still be launched via the ports associated with these connections.
Vulnerabilities
Vulnerabilities appear on a weekly basis and, here, we prefer to focus on some general issues rather than specific attacks.
Common web server vulnerabilities include:
• No policy exists.
• The default configuration is on.
• Reusable passwords appear in clear.
• Unnecessary ports available for network services are not disabled.
• New security holes are not tracked. Even if they are, well-known vulnerabilities are not always fixed, as the source code patches are not applied by the system administrator and old programs are not re-compiled or removed.
• Security tools are not used to scan the network for weaknesses and changes or to detect intrusions.
• Faulty and buggy software - for example, buffer overflow and stack smashing attacks.
• Automatic directory listings - this is of particular concern for the interface software directories.
• Server root files are generally visible or accessible.
• Lack of logs and backups.
• File access is often not explicitly configured by the system administrator according to the security policy. This applies to configuration, client, administration and log files, administration programs, and CGI program sources and executables. CGI scripts allow dynamic web pages and make program development (in, for example, Perl) easy and rapid. However, their successful exploitation may allow execution of malicious programs, launching of denial-of-service attacks and, ultimately, privilege escalation on a server.
Web Server and Database Authentication
While user, browser and web server authentication are relatively well understood [Garf97], [Ghos98] and [Tree98], the introduction of additional components, such as databases and middleware, raises a number of authentication issues. There are a variety of options for authentication in a simple model (Figure 1). Firstly, both the web server and database management system can individually authenticate a user. This option requires the user to authenticate twice, which may be unacceptable in certain applications, although a single sign-on device (which aims to manage authentication in a user-transparent way) may help.
Secondly, a common approach is for the database to automatically grant user access based on web server authentication. However, this option should only be used for accessing publicly available information. Finally, the database may grant user access employing the web server authentication credentials as a basis for its own user authentication, using security management subsystems (Figure 1). We consider this last option in more detail.
Web-based communications use the stateless HTTP protocol, with the implication that state, and hence authentication, is not preserved when browsing successive web pages. Cookies, or files placed on a user’s machine by a web server, were developed as a means of addressing this issue and are often used to provide authentication. However, after initial authentication, there is typically no re-authentication per page in the same realm, only the use of unencrypted cookies (sometimes in association with IP addresses). This approach provides limited security, as both cookies and IP addresses can be tampered with or spoofed.
A stronger authentication method, commonly used by commercial implementations, uses digitally signed cookies. This allows additional systems, such as databases, to use digitally signed cookie data, including a session ID, as a basis for authentication. When a user has been authenticated by a web server (using a password, for example), a session ID is assigned and is stored in a security management subsystem database. When a user subsequently requests information from a database, the database receives a copy of the session ID, the security management subsystem checks this session ID against its local copy and, if authentication is successful, user access is granted to the database.
The session ID is typically transmitted in the clear between the web server and database, but may be protected by SSL or even by physical security measures.
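The signed-cookie scheme described above can be sketched as follows. This is a minimal illustration, assuming an HMAC-SHA256 tag over the session ID computed with a secret shared between the web server and the security management subsystem; the article does not say which signature algorithm commercial products use, and the names SECRET_KEY, ACTIVE_SESSIONS, issue_cookie and verify_session are hypothetical. A deployment based on true digital signatures would use an asymmetric key pair instead of a shared secret.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # assumption: secret shared by the trusted components
ACTIVE_SESSIONS = set()  # stands in for the security management subsystem's database


def _tag(session_id):
    """Compute the authentication tag for a session ID."""
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def issue_cookie():
    """Called after the web server has authenticated the user."""
    session_id = secrets.token_hex(16)
    ACTIVE_SESSIONS.add(session_id)
    return f"{session_id}.{_tag(session_id)}"


def verify_session(cookie):
    """Return the session ID if the tag is valid and the session is known, else None."""
    session_id, sep, tag = cookie.rpartition(".")
    if not sep:
        return None
    # Constant-time comparison avoids leaking how much of the tag matched.
    if not hmac.compare_digest(tag.encode(), _tag(session_id).encode()):
        return None
    return session_id if session_id in ACTIVE_SESSIONS else None


cookie = issue_cookie()
valid = verify_session(cookie)
# A forged session ID carrying a tag copied from a genuine cookie is rejected.
tampered = verify_session("0" * 32 + "." + cookie.rpartition(".")[2])
```

The tag only shows that the cookie has not been altered; the lookup in ACTIVE_SESSIONS corresponds to the check against the local copy of the session ID, and neither step hides the session ID in transit, which is why SSL is still mentioned.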
The communications between the browser and web servers, and between the web servers and the security management subsystem (and its databases), are normally protected by SSL and use a web server security API that digitally signs and verifies browser cookies. The communications between the back-end databases and the security management subsystem (and its databases) are also normally protected by SSL and use a database security API that verifies session IDs originating from the database and provides additional user authorization credentials. The web server security API is generally proprietary while, for the database security API, many vendors have adopted standards such as the Generic Security Services API (GSS-API) or CORBA [RFC2078] and [Corba].
Architecture and Design
Security requirements for designing, building and implementing databases are important so that the systems, as part of the overall infrastructure, meet their requirements in actual operation. The various security models provide an important insight into the design requirements for databases and their management systems.
Secure Database Management System Architectures
In multi-level database management systems, a variety of architectures are possible: trusted subject, integrity locked, kernels and replicated. Trusted subject is used by most of the leading database management system vendors and can be integrated in existing products. Basically, the trusted subject architecture allows users to access a database via an untrusted front-end, a trusted database management system and a trusted operating system. The operating system provides physical access to the database and the database management system provides multi-level object protection.
The other architectures (integrity locked, kernels and replicated) all vary in detail, but they use a trusted front-end and an untrusted database management system. For details of these architectures and research prototypes, the reader is referred to [Cast95].
Different architectures are suited to different environments: for example, the trusted subject architecture is less integrated with the underlying operating system and is best suited to environments where a trusted path can be assured between applications and the database management system.
Secure Database Management System Design
As discussed above, there are several fundamental differences between operating system and database management system design, including object granularity, multiple data types, data correlations and multi-level transactions. Other differences include the fact that database management systems include both physical and logical objects and that the database lifecycle is normally longer.
These differences must be reflected in the design requirements, which include:
• Access, flow and inference controls.
• Access granularity and modes.
• Dynamic authorization.
• Multi-level protection.
• Polyinstantiation.
• Auditing.
• Performance.
These requirements should be considered alongside basic information integrity principles, such as:
• Well-formed transactions - to ensure that transactions are correct and consistent.
• Continuity of operation - to ensure that data can be properly recovered, depending on the extent of a disaster.
• Authorization and role management - to ensure that distinct roles are defined and users are authorized.
• Authenticated users - to ensure that users are authenticated.
• Least privilege - to ensure that users have the minimal privilege necessary to perform their tasks.
• Separation of duties - to ensure that no single individual has access to critical data.
• Delegation of authority - to ensure that the database management system policies are flexible enough to meet the organization’s requirements.
Of course, some of these requirements and principles are not met by the database management system, but by the operating system and also by organizational and procedural measures.
Database Design Methodology
Various approaches to design exist, but most contain the same main stages. The principal aim of a design methodology is to provide a robust, verifiable design process and also to separate policies from how policies are actually implemented. An important requirement during any design process is that different design aspects can be merged, and this applies equally to security.
A preliminary analysis should be conducted that addresses the system risks, environment, existing products and performance. Requirements should then be analyzed with respect to the results of a risk assessment. Security policies should be developed that include specification of granularity, privileges and authority.
These policies and requirements form the input to the conceptual design, which concentrates on subjects, objects and access modes without considering implementation details. Its purpose is to express information and process flows in a complete and consistent way.
The logical design takes into account the operating system and database management system that will be used and which of the security requirements can be provided by which mechanisms. The physical design considers the actual physical realization of the logical design and, indeed, may result in a revision of the conceptual and logical phases due to physical constraints.
Security Assurance
Once a product has been developed, its security assurance can be assessed by a number of methods including formal verification, validation, penetration testing and certification. For example, if a database is to be certified as TCSEC Class B1, then it must implement the Bell-LaPadula mandatory access control model in which each controlled object in the database must be labeled with a security level.
Most of these methods can be costly and lengthy to perform and are typically specific to particular hardware and software configurations.
However, the international Common Criteria certification scheme provides the added benefit of a mutual recognition arrangement, thus avoiding the prospect of multiple certifications in different countries.
Conclusion
This article has considered some of the security principles that are associated with databases and how these apply in a web-based environment. It has also focused on important architecture and design principles. These principles have focused mainly on the prevention, assurance and recovery aspects, but other aspects, such as detection, are equally important in formulating a total information protection strategy. For example, host-based intrusion detection systems as well as a robust and tested set of business recovery procedures should be considered.
Any fit-for-purpose, secure e-business infrastructure should address all the above aspects: prevention, assurance, detection and recovery. Certain industries are now starting to specify their own set of global, secure e-business requirements. International card payment associations have recently started to require minimum information security standards from electronic commerce merchants handling credit card data, to help manage fraud losses and associated impacts such as brand-image damage and loss of consumer confidence.
Database Security in a Networked Environment
Introduction
Databases have been in widespread use in government departments and commercial organizations for many years.
Database Graduation Project: Foreign Literature Translation
Appendix
Appendix A: Foreign Literature Translation - Original Text:
CUSTOMER TARGETING
The earliest determinant of success in the development of a profitable card scheme will lie in the quality of applicants that are attracted by the marketing effort. Not only must there be sufficient creditworthy applicants to avoid fruitless and expensive application processing, but it is critical that the overall mix of new accounts meets the standard necessary to ensure ultimate profitability. For example, the marketing initiatives may attract a sufficient volume of applicants that are assessed as above the scorecard cut-off, but the proportion of acceptances in the upper bands may be insufficient to deliver the level of profit and the lower level of bad debt required to achieve the financial objectives of the scheme.
This chapter considers the range of data sources available to support the development of a credit card scheme and the tools that can be applied to maximize the flow of applications from the required categories.
Data availability
The data that makes up the ingredients from which marketing campaigns can be constructed can come from many diverse sources. Typically, it will fall into four categories:
1 the national or regional register of voters;
2 the national or regional register of court judgments that records the outcome of creditor-debtor legislation;
3 any national or regional pooled information showing the credit history of clients of the participating lenders; and
4 commercially compiled data including and culled from name and address lists, survey results and other market analysis data, e.g. neighborhoods and lifestyle categorization through geo-demographic information systems.
The availability and quality of this data will vary from country to country and bureau to bureau.
Availability is not only governed by the extent to which the responsible agency has undertaken to record it, but also by the feasibility of accessing the data and the extent (if any) to which local consumer legislation or other considerations (e.g. religious principles) will allow it to be used. Other limitations on the use of available data may lie in the simple impossibility or expense of accessing the information sources, perhaps because necessary consumer consent for divulgence has been withheld or because the records are not yet stored electronically.
The local credit information bureaux will be able to provide guidance on all of these matters, as will many local trade or professional associations or the relevant government departments.
Data segmentation and Analyses
The following remarks deal with the ways in which lawfully obtained data may then be processed and analyzed in order to maximize its value as the basis of a marketing prospect list. Examples of the types and uses of data that will play a role in the credit decision area are discussed later in the chapter, within the context of application processing.
The key categories into which prospects may be segmented include lifestyle, propensity to purchase specific products (financial or otherwise) and levels of risk. The leading international information bureaux will be able to provide segmentation systems that are able to correlate each of these data categories to provide meaningful prospect lists in rank order. Additionally, many bureaux will have the capability to further enhance the strength and value of the data. Through the selective purchasing of data from bona fide market sources, and by overlaying generic factors deduced from the analysis of the broad mass of industry information that routinely passes through their systems, the best international operators are now able to offer marketing and credit information support that can add significantly to the quality of new applicants.
The importance of the role and standard of this data in influencing the quality of the target population for mailings, etc. should not be underestimated.
Information that is dated or inaccurate may not only lead a marketer and the organization into embarrassment and damage their reputations, but it will also open the credit card scheme to applicants from outside the target sector or, worse still, applicants outside the lender’s view of an acceptable credit risk.
From this, it follows that you should seek to use an information bureau whose business principles and operating practices comply with the highest levels of both competence and integrity.
Developing the prospect database
This is the process by which the raw data streams are brought together and subjected to progressive refinement, with the output representing the refined base from which prospecting can begin in earnest. Wide experience, often across many different markets and countries, in the sourcing, handling and analysis of data inevitably improves the quality of the ideas and systems that a bureau can offer for the development of the prospect database.
In summary, the typical shape of the service available from the very best bureaux will support a process that runs as follows:
1. collect and consolidate all data to be screened for inclusion;
2. merge the various streams;
3. sort and classify the data by market and credit categories;
4. screen the data using predetermined marketing and credit criteria; and
5. consolidate and output the refined list.
Bureaux will charge for the use of their expertise and systems. Therefore, consideration should be given to the volumes of data that are to be processed and the costs involved at each stage. The most cost-effective approach to constructing prospect databases undertakes only the lowest-cost screening processes in the earlier stages.
The more expensive screening processes are not employed until the mass of the data has been reduced by earlier filtering.
It is impossible to be prescriptive about the range and levels of service that are available, but reference to one of the major bureaux operating in the region could certainly be a good starting point.
Campaign Management and Analysis
Again, this is an area where excellent support is available from the best-of-breed bureaux. They will provide both the operational support and software capabilities to mount, monitor and analyse your marketing campaign, should you so wish. Their depth of experience and capabilities in the credit sector will often open up income:cost possibilities from the solicitation exercise that would not otherwise be available to the new entrant.
The First Important Applications of DBMS’s
Data items include names and addresses of customers, accounts, loans and their balances, and the connections between customers and their accounts and loans, e.g., who has signature authority over which accounts. Queries for account balances are common, but far more common are modifications representing a single payment from or deposit to an account.
As with the airline reservation system, we expect that many tellers and customers (through ATM machines) will be querying and modifying the bank’s data at once. It is vital that simultaneous accesses to an account not cause the effect of an ATM transaction to be lost. Failures cannot be tolerated. For example, once the money has been ejected from an ATM machine, the bank must record the debit, even if the power immediately fails. On the other hand, it is not permissible for the bank to record the debit and then not deliver the money because the power fails. The proper way to handle this operation is far from obvious and can be regarded as one of the significant achievements in DBMS architecture.
Database systems changed significantly.
Codd proposed that database systems should present the user with a view of data organized as tables called relations. Behind the scenes, there might be a complex data structure that allowed rapid response to a variety of queries. But unlike the user of earlier database systems, the user of a relational system would not be concerned with the storage structure. Queries could be expressed in a very high-level language, which greatly increased the efficiency of database programmers. Relations are tables. Their columns are headed by attributes.
Client-Server Architecture
Many varieties of modern software use a client-server architecture, in which requests by one process (the client) are sent to another process (the server) for execution. Database systems are no exception, and it is common to divide the work of the components shown into a server process and one or more client processes.
In the simplest client/server architecture, the entire DBMS is a server, except for the query interfaces that interact with the user and send queries or other commands across to the server. For example, relational systems generally use the SQL language for representing requests from the client to the server. The database server then sends the answer, in the form of a table or relation, back to the client. The relationship between client and server can get more complex, especially when answers are extremely large. We shall have more to say about this matter in Section 1.3.3. There is also a trend to put more work in the client, since the server will be a bottleneck if there are many simultaneous database users.
Appendix B: Foreign Literature Translation - Translated Text:
Customer targeting: The earliest determinant of success in developing a profitable card scheme lies in the quality of the applicants attracted by the marketing effort.
Computer Science Foreign Literature Translation: English-Language Literature on Database Systems
Foreign Literature: Original Text
Database Systems
1. Introduction to Database System
Today, more than at any previous time, the success of an organization depends on its ability to acquire accurate and timely data about its operation, to manage this data effectively, and to use it to analyze and guide its activities. Phrases such as the information superhighway have become ubiquitous, and information processing is a rapidly growing multibillion dollar industry. The amount of information available to us is literally exploding, and the value of data as an organizational asset is being widely recognized. This paradox drives the need for increasingly powerful and flexible data management systems.
A database is a collection of data, typically describing the activities of one or more related organizations. For example, a university database might contain information about the following:
● Entities such as students, faculty, courses, and classrooms.
● Relationships between entities, such as students’ enrollment in courses, faculty teaching courses, and the use of rooms for courses.
A database management system, or DBMS, is software designed to assist in maintaining and utilizing large collections of data, and the need for such systems, as well as their use, is growing rapidly. The alternative to using a DBMS is to use ad hoc approaches that do not carry over from one application to another, for example, to store the data in files and write application-specific code to manage it.
The area of database management systems is a microcosm of computer science in general.
The issues addressed and the techniques used span a wide spectrum, including languages, object-orientation and other programming paradigms, compilation, operating systems, concurrent programming, data structures, algorithms, theory, parallel and distributed systems, user interfaces, expert systems and artificial intelligence, statistical techniques, and dynamic programming.
Database management continues to gain importance as more and more data is brought on-line and made ever more accessible through computer networking. Today the field is being driven by exciting visions such as multimedia databases, interactive video, digital libraries, a host of scientific projects such as the human genome mapping effort and NASA’s Earth Observation System project, and the desire of companies to consolidate their decision-making processes and mine their data repositories for useful information about their business. Commercially, database management systems represent one of the largest and most vigorous market segments. Thus the study of database systems could prove to be richly rewarding in more ways than one.
2. Components of a Database
A database consists of a file or a set of files. The information in these files may be broken down into records, each of which consists of one or more fields. Fields are the basic units of data storage, and each field typically contains information pertaining to one aspect or attribute of the entity described by the database. Using keywords and various sorting commands, users can rapidly search, rearrange, group, and select the fields in many records to retrieve or create reports on particular aggregates of data.
Database records and files must be organized to allow retrieval of the information. Early systems were arranged sequentially (i.e., alphabetically, numerically, or chronologically); the development of direct-access storage devices made possible random access to data via indexes.
Queries are the main way users retrieve database information. Typically, the user provides a string of characters, and the computer searches the database for a corresponding sequence and provides the source materials in which those characters appear. A user can request, for example, all records in which the content of the field for a person’s last name is the word Smith.
In flat databases, records are organized according to a simple list of entities; many simple databases for personal computers are flat in structure. The records in hierarchical databases are organized in a treelike structure, with each level of records branching off into a set of smaller categories. Unlike hierarchical databases, which provide single links between sets of records at different levels, network databases create multiple linkages between sets by placing links, or pointers, to one set of records in another; the speed and versatility of network databases have led to their wide use in business.
Relational databases are used where associations among files or records cannot be expressed by links; a simple flat list becomes one table, or “relation”, and multiple relations can be mathematically associated to yield desired information. Object-oriented databases store and manipulate more complex data structures, called “objects”, which are organized into hierarchical classes that may inherit properties from classes higher in the chain; this database structure is the most flexible and adaptable.
3. Structure of the Relational database
The relational model is the basis for any relational database management system (RDBMS). A relational model has three core components: a collection of objects or relations, operators that act on the objects or relations, and data integrity methods.
In other words, it has a place to store the data, a way to create and retrieve the data, and a way to make sure that the data is logically consistent.

A relational database uses relations, or two-dimensional tables, to store the information needed to support a business.

3.1. Tables, Rows, and Columns

A table in a relational database, alternatively known as a relation, is a two-dimensional structure used to hold related information. A database consists of one or more related tables.

Note: Don't confuse a relation with relationships. A relation is essentially a table, and a relationship is a way to correlate, join, or associate two tables.

A row in a table is a collection or instance of one thing, such as one employee or one line item on an invoice. A column contains all the information of a single type, and the piece of data at the intersection of a row and a column, a field, is the smallest piece of information that can be retrieved with the database's query language. For example, a table with information about employees might have a column called LAST_NAME that contains all of the employees' last names. Data is retrieved from a table by filtering on both the row and the column.

3.2. Primary Keys, Data types, and Foreign Keys

Relation: A two-dimensional structure used to hold related information, also known as a table.

Row: A group of one or more data elements in a database table that describes a person, place, or thing.

Column: The component of a database table that contains all of the data of the same name and type across all rows.

Primary Key: A column (or columns) in a table that makes the row in the table distinguishable from every other row in the same table.

Data types: numeric values, character or alphabetic values, and date values.

A foreign key enforces the concept of referential integrity in a relational database.

Foreign Key: A column (or columns) in a table that draws its values from a primary or unique key column in another table.
A foreign key assists in ensuring the data integrity of a table.

Referential Integrity: A method employed by a relational database system that enforces one-to-many relationships between tables.

3.3. Data Modeling

In data modeling, the developer conceptualizes and documents all the tables for the database. One of the common methods for modeling a database is called ERA, which stands for entities, relationships, and attributes. The database designer uses an application that can maintain entities, their attributes, and their relationships. In general, an entity corresponds to a table in the database, and the attributes of the entity correspond to columns of the table.

Data Modeling: A process of defining the entities, attributes, and relationships between the entities in preparation for creating the physical database.

The data-modeling process involves defining the entities, defining the relationships between those entities, and then defining the attributes for each of the entities. Once a cycle is complete, it is repeated as many times as necessary to ensure that the designer is capturing what is important enough to go into the database. Let's take a closer look at each step in the data-modeling process.

3.4. Defining the Entities

First, the designer identifies all of the entities within the scope of the database application. The entities are the persons, places, or things that are important to the organization and need to be tracked in the database. Entities will most likely translate neatly to database tables.
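The definitions above (entity as table, attribute as column, primary key, foreign key, referential integrity) can be made concrete with a small sketch using Python's built-in `sqlite3` module. The `department` and `employee` entities and all of their column names are assumptions chosen for illustration; they do not come from the text. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# One entity becomes one table; its attributes become columns.
conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,   -- primary key: identifies each row
        name    TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        last_name TEXT NOT NULL,       -- character data
        hire_date TEXT,                -- date stored as ISO text (an assumption)
        dept_id   INTEGER NOT NULL REFERENCES department(dept_id)  -- foreign key
    )
""")

conn.execute("INSERT INTO department VALUES (10, 'Accounting')")
conn.execute("INSERT INTO employee VALUES (1, 'Smith', '2001-04-02', 10)")

# Referential integrity: an employee cannot point at a missing department.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Jones', '2003-09-15', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The one-to-many relationship is carried entirely by the `dept_id` column: many employee rows may share a department, but each must reference a department row that exists.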
Graduation Project: Foreign Literature on Database Management
1. Database management system

A Database Management System (DBMS) is a set of computer programs that controls the creation, maintenance, and use of a database. It allows organizations to place control of database development in the hands of database administrators (DBAs) and other specialists. A DBMS is a system software package that supports the use of an integrated collection of data records and files known as databases. It allows different user application programs to easily access the same database. DBMSs may use any of a variety of database models, such as the network model or relational model. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way. Instead of having to write computer programs to extract information, users can ask simple questions in a query language. Thus, many DBMS packages provide fourth-generation programming languages (4GLs) and other application development features. A DBMS helps to specify the logical organization of a database and to access and use the information within it. It provides facilities for controlling data access, enforcing data integrity, managing concurrency, and restoring the database from backups. A DBMS also provides the ability to logically present database information to users.

2. Overview

A DBMS is a set of software programs that controls the organization, storage, management, and retrieval of data in a database. DBMSs are categorized according to their data structures or types. The DBMS accepts requests for data from an application program and instructs the operating system to transfer the appropriate data. The queries and responses must be submitted and received according to a format that conforms to one or more applicable protocols. When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change.
New categories of data can be added to the database without disruption to the existing system.

Database servers are computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions.

3. History

Databases have been in use since the earliest days of electronic computing. Unlike modern systems which can be applied to widely different databases and needs, the vast majority of older systems were tightly linked to the custom databases in order to gain speed at the expense of flexibility. Originally DBMSs were found only in large organizations with the computer hardware needed to support large data sets.

3.1 1960s Navigational DBMS

As computers grew in speed and capability, a number of general-purpose database systems emerged; by the mid-1960s there were a number of such systems in commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, Integrated Data Store (IDS), founded the "Database Task Group" within CODASYL, the group responsible for the creation and standardization of COBOL. In 1971 they delivered their standard, which generally became known as the "Codasyl approach", and soon there were a number of commercial products based on it available.

The Codasyl approach was based on the "manual" navigation of a linked data set which was formed into a large network.
When the database was first opened, the program was handed back a link to the first record in the database, which also contained pointers to other pieces of data. To find any particular record the programmer had to step through these pointers one at a time until the required record was returned. Simple queries like "find all the people in India" required the program to walk the entire data set and collect the matching results. There was, essentially, no concept of "find" or "search". This might sound like a serious limitation today, but in an era when the data was most often stored on magnetic tape such operations were too expensive to contemplate anyway.

IBM also had its own DBMS in 1968, known as IMS. IMS was a development of software written for the Apollo program on the System/360. IMS was generally similar in concept to Codasyl, but used a strict hierarchy for its model of data navigation instead of Codasyl's network model. Both concepts later became known as navigational databases due to the way data was accessed, and Bachman's 1973 Turing Award presentation was The Programmer as Navigator. IMS is classified as a hierarchical database. IDS and IDMS, both CODASYL databases, as well as Cincom's TOTAL database, are classified as network databases.

3.2 1970s Relational DBMS

Edgar Codd worked at IBM in San Jose, California, in one of their offshoot offices that was primarily involved in the development of hard disk systems. He was unhappy with the navigational model of the Codasyl approach, notably the lack of a "search" facility which was becoming increasingly useful. In 1970, he wrote a number of papers that outlined a new approach to database construction that eventually culminated in the groundbreaking A Relational Model of Data for Large Shared Data Banks.[1]

In this paper, he described a new system for storing and working with large databases.
Instead of records being stored in some sort of linked list of free-form records as in Codasyl, Codd's idea was to use a "table" of fixed-length records. A linked-list system would be very inefficient when storing "sparse" databases where some of the data for any one record could be left empty. The relational model solved this by splitting the data into a series of normalized tables, with optional elements being moved out of the main table to where they would take up room only if needed.

For instance, a common use of a database system is to track information about users, their name, login information, various addresses and phone numbers. In the navigational approach all of these data would be placed in a single record, and unused items would simply not be placed in the database. In the relational approach, the data would be normalized into a user table, an address table and a phone number table (for instance). Records would be created in these optional tables only if the address or phone numbers were actually provided.

Linking the information back together is the key to this system. In the relational model, some bit of information was used as a "key", uniquely defining a particular record. When information was being collected about a user, information stored in the optional (or related) tables would be found by searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as its key. This "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for.

Just as the navigational approach would require programs to loop in order to collect records, the relational approach would require loops to collect information about any one record. Codd's solution to the necessary looping was a set-oriented language, a suggestion that would later spawn the ubiquitous SQL.
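The user, address and phone number example can be sketched with Python's built-in `sqlite3` module. The table names (`users`, `addresses`, `phones`), the column names and the sample data are assumptions made for illustration. The single `SELECT` at the end is the set-oriented step: it re-links the normalized tables through the login key without any explicit loop in the program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (login TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE addresses (login TEXT NOT NULL REFERENCES users(login),
                            street TEXT NOT NULL);
    CREATE TABLE phones    (login TEXT NOT NULL REFERENCES users(login),
                            number TEXT NOT NULL);
""")

conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("asmith", "Alice Smith"), ("bjones", "Bob Jones")])
# Rows exist in the optional tables only when the data was actually provided.
conn.executemany("INSERT INTO addresses VALUES (?, ?)",
                 [("asmith", "1 Main St")])
conn.executemany("INSERT INTO phones VALUES (?, ?)",
                 [("asmith", "555-0100"), ("asmith", "555-0101")])

# One set-oriented statement re-links the data through the shared key.
rows = conn.execute("""
    SELECT u.login, a.street, p.number
    FROM users AS u
    LEFT JOIN addresses AS a ON a.login = u.login
    LEFT JOIN phones    AS p ON p.login = u.login
    ORDER BY u.login, p.number
""").fetchall()

address_rows = conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
```

Bob Jones has no address or phone number, so he occupies no space in the optional tables, yet he still appears in the joined result with empty values.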
Using a branch of mathematics known as tuple calculus, Codd demonstrated that such a system could support all the operations of normal databases (inserting, updating etc.) as well as providing a simple system for finding and returning sets of data in a single operation.

Codd's paper was picked up by two people at Berkeley, Eugene Wong and Michael Stonebraker. They started a project known as INGRES using funding that had already been allocated for a geographical database project, using student programmers to produce code. Beginning in 1973, INGRES delivered its first test products which were generally ready for widespread use in 1979. During this time, a number of people had moved "through" the group; perhaps as many as 30 people worked on the project, about five at a time. INGRES was similar to System R in a number of ways, including the use of a "language" for data access, known as QUEL. QUEL was in fact relational, having been based on Codd's own Alpha language, but has since been corrupted to follow SQL, thus violating much the same concepts of the relational model as SQL itself.

IBM itself did one test implementation of the relational model, PRTV, and a production one, Business System 12, both now discontinued. Honeywell did MRDS for Multics, and now there are two new implementations: Alphora Dataphor and Rel. All other DBMS implementations usually called relational are actually SQL DBMSs. In 1968, the University of Michigan began development of the Micro DBMS relational database management system. It was used to manage very large data sets by the US Department of Labor, the Environmental Protection Agency and researchers from University of Alberta, the University of Michigan and Wayne State University. It ran on mainframe computers using Michigan Terminal System. The system remained in production until 1996.

3.3 Late 1970s SQL DBMS

IBM started working on a prototype system loosely based on Codd's concepts as System R in the early 1970s.
The first version was ready in 1974/5, and work then started on multi-table systems in which the data could be split so that all of the data for a record (much of which is often optional) did not have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized query language, SQL, had been added. Codd's ideas were establishing themselves as both workable and superior to Codasyl, pushing IBM to develop a true production version of System R, known as SQL/DS, and, later, Database 2 (DB2).

Many of the people involved with INGRES became convinced of the future commercial success of such systems, and formed their own companies to commercialize the work but with an SQL interface. Sybase, Informix, NonStop SQL and eventually Ingres itself were all being sold as offshoots to the original INGRES product in the 1980s. Even Microsoft SQL Server is actually a re-built version of Sybase, and thus, INGRES. Only Larry Ellison's Oracle started from a different chain, based on IBM's papers on System R, and beat IBM to market when the first version was released in 1978.

Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is now known as PostgreSQL. PostgreSQL is often used for global mission critical applications (the .org and .info domain name registries use it as their primary data store, as do many large companies and financial institutions).

In Sweden, Codd's paper was also read and Mimer SQL was developed from the mid-70s at Uppsala University. In 1984, this project was consolidated into an independent enterprise. In the early 1980s, Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented in most other DBMSs.

3.4 1980s Object Oriented Databases

The 1980s, along with a rise in object oriented programming, saw a growth in how data in various databases were handled.
Programmers and designers began to treat the data in their databases as objects. That is to say that if a person's data were in a database, that person's attributes, such as their address, phone number, and age, were now considered to belong to that person instead of being extraneous data. This allows relationships between data to be expressed as relations between objects and their attributes rather than between individual fields.

Another big game changer for databases in the 1980s was the focus on increasing reliability and access speeds. In 1989, two professors from the University of Wisconsin at Madison published an article at an ACM associated conference outlining their methods on increasing database performance. The idea was to replicate specific important, and often queried, information and store it in a smaller temporary database that linked these key features back to the main database. This meant that a query could search the smaller database much more quickly than the entire dataset. This eventually led to the practice of indexing, which is used by almost every operating system from Windows to the system that operates Apple iPod devices.

4. DBMS building blocks

A DBMS includes four main parts: modeling language, data structure, database query language, and transaction mechanisms.

4.1 Components of DBMS

• DBMS Engine accepts logical requests from the various other DBMS subsystems, converts them into physical equivalents, and actually accesses the database and data dictionary as they exist on a storage device.
• Data Definition Subsystem helps users to create and maintain the data dictionary and define the structure of the files in a database.
• Data Manipulation Subsystem helps users to add, change, and delete information in a database and query it for valuable information. Software tools within the data manipulation subsystem are most often the primary interface between users and the information contained in a database. It allows users to specify their logical information requirements.
• Application Generation Subsystem contains facilities to help users to develop transaction-intensive applications. It usually requires that users perform a detailed series of tasks to process a transaction. It facilitates easy-to-use data entry screens, programming languages, and interfaces.
• Data Administration Subsystem helps users to manage the overall database environment by providing facilities for backup and recovery, security management, query optimization, concurrency control, and change management.

4.2 Modeling language

A data modeling language defines the schema of each database hosted in the DBMS, according to the DBMS database model. The four most common types of models are the:

• hierarchical model,
• network model,
• relational model, and
• object model.

Inverted lists and other methods are also used. A given database management system may provide one or more of the four models. The optimal structure depends on the natural organization of the application's data, and on the application's requirements (which include transaction rate (speed), reliability, maintainability, scalability, and cost).

The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists who believe this model is a corruption of the relational model, since it violates several of its fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity API, which provides a standard way for programmers to access the DBMS.

Before the database management approach, organizations relied on file processing systems to organize, store, and process data files. End users became aggravated with file processing because data was stored in many different files, each organized in a different way. Each file was specialized to be used with a specific application.
Needless to say, file processing was bulky, costly and inflexible when it came to supplying needed data accurately and promptly. Data redundancy is an issue with the file processing system because the independent data files produce duplicate data, so when updates were needed each separate file would need to be updated. Another issue is the lack of data integration. The data is dependent on other data to organize and store it. Lastly, there was not any consistency or standardization of the data in a file processing system, which makes maintenance difficult. For all these reasons, the database management approach was produced. Database management systems (DBMS) are designed to use one of five database structures to provide simple access to information stored in databases. The five database structures are hierarchical, network, relational, multidimensional and object-oriented models.

The hierarchical structure was used in early mainframe DBMS. Records' relationships form a treelike model. This structure is simple but inflexible because the relationship is confined to a one-to-many relationship. IBM's IMS system and the RDM Mobile are examples of a hierarchical database system with multiple hierarchies over the same data. RDM Mobile is a newly designed embedded database for a mobile computer system. The hierarchical structure is used primarily today for storing geographic information and file systems.

The network structure consists of more complex relationships. Unlike the hierarchical structure, it can relate to many records and accesses them by following one of several paths. In other words, this structure allows for many-to-many relationships.

The relational structure is the most commonly used today. It is used by mainframe, midrange and microcomputer systems. It uses two-dimensional rows and columns to store data. The tables of records can be connected by common key values. While working for IBM, E.F. Codd designed this structure in 1970.
The model is not easy for the end user to run queries with because it may require a complex combination of many tables.

The multidimensional structure is similar to the relational model. The dimensions of the cube-like model have data relating to elements in each cell. This structure gives a spreadsheet-like view of data. This structure is easy to maintain because records are stored as fundamental attributes, the same way they are viewed, and the structure is easy to understand. Its high performance has made it the most popular database structure when it comes to enabling online analytical processing (OLAP).

The object-oriented structure can handle graphics, pictures, voice and text without difficulty, unlike the other database structures. This structure is popular for multimedia Web-based applications. It was designed to work with object-oriented programming languages such as Java.

4.3 Data structure

Data structures (fields, records, files and objects) are optimized to deal with very large amounts of data stored on a permanent data storage device (which implies relatively slow access compared to volatile main memory).

4.4 Database query language

A database query language and report writer allows users to interactively interrogate the database, analyze its data and update it according to the users' privileges on data. It also controls the security of the database. Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of it called subschemas. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data.

If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases.
However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user organization. These controls are only available when a set of application programs are customized for each data entry and updating function.

4.5 Transaction mechanism

A database transaction mechanism ideally guarantees ACID properties in order to ensure data integrity despite concurrent user accesses (concurrency control), and faults (fault tolerance). It also maintains the integrity of the data in the database. The DBMS can maintain the integrity of the database by not allowing more than one user to update the same record at the same time. The DBMS can help prevent duplicate records via unique index constraints; for example, no two customers with the same customer numbers (key fields) can be entered into the database. See ACID properties for more information (Redundancy avoidance).

5. DBMS topics

5.1 External, Logical and Internal view

A database management system provides the ability for many different users to share data and process resources. But as there can be many different users, there are many different database needs. The question now is: How can a single, unified database meet the differing requirements of so many users?

A DBMS minimizes these problems by providing three views of the database data: an external view (or user view), a logical view (or conceptual view) and a physical (or internal) view. The user's view of a database program represents data in a format that is meaningful to a user and to the software programs that process those data. That is, the logical view tells the user, in user terms, what is in the database. The physical view deals with the actual, physical arrangement and location of data in the direct access storage devices (DASDs). Database specialists use the physical view to make efficient use of storage and processing resources.
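As a rough illustration of how several user-facing views can sit on top of one stored table, the sketch below uses SQL views in Python's built-in `sqlite3` module. The `employee` table, its columns and the two view names are hypothetical. SQLite has no user accounts or privileges, so this shows only how each view shapes the data, not how access to a view would be restricted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: one table describing the whole employee record.
    CREATE TABLE employee (
        emp_id       INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        salary       REAL NOT NULL,
        medical_note TEXT
    );

    -- External level: each group of users sees only the columns it needs.
    CREATE VIEW payroll_view AS
        SELECT emp_id, name, salary FROM employee;
    CREATE VIEW medical_view AS
        SELECT emp_id, name, medical_note FROM employee;
""")
conn.execute("INSERT INTO employee VALUES (1, 'Smith', 52000.0, 'none')")

payroll_cursor = conn.execute("SELECT * FROM payroll_view")
payroll_columns = [column[0] for column in payroll_cursor.description]
payroll_rows = payroll_cursor.fetchall()

medical_cursor = conn.execute("SELECT * FROM medical_view")
medical_columns = [column[0] for column in medical_cursor.description]
```

How SQLite lays the `employee` rows out on disk (the physical level) is invisible to both views, which is the separation the section describes.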
With the logical view users can see data differently from how they are stored, and they do not want to know all the technical details of physical storage. After all, a business user is primarily interested in using the information, not in how it is stored.

One strength of a DBMS is that while there is typically only one conceptual (or logical) and physical (or internal) view of the data, there can be an endless number of different external views. This feature allows users to see database information in a more business-related way rather than from a technical, processing viewpoint. Thus the logical view refers to the way the user views data, and the physical view to the way the data are physically stored and processed...

5.2 DBMS features and capabilities

Alternatively, and especially in connection with the relational model of database management, the relation between attributes drawn from a specified set of domains can be seen as being primary. For instance, the database might indicate that a car that was originally "red" might fade to "pink" in time, provided it was of some particular "make" with an inferior paint job. Such higher arity relationships provide information on all of the underlying domains at the same time, with none of them being privileged above the others.

5.3 DBMS simple definition

A database management system is a system in which related data is stored in an "efficient" and "compact" manner. Efficient means that the data stored in the DBMS can be accessed very quickly, and compact means that the data stored in the DBMS takes up very little space in the computer's memory. In the above definition, the phrase "related data" means that the data stored in the DBMS is about some particular topic.

Throughout recent history specialized databases have existed for scientific, geospatial, imaging, document storage and like uses. Functionality drawn from such applications has lately begun appearing in mainstream DBMSs as well.
However, the main focus there, at least when aimed at the commercial data processing market, is still on descriptive attributes on repetitive record structures.

Thus, the DBMSs of today roll together frequently needed services or features of attribute management. By externalizing such functionality to the DBMS, applications effectively share code with each other and are relieved of much internal complexity. Features commonly offered by database management systems include:

5.3.1 Query ability

Querying is the process of requesting attribute information from various perspectives and combinations of factors. Example: "How many 2-door cars in Texas are green?" A database query language and report writer allow users to interactively interrogate the database, analyze its data and update it according to the users' privileges on data.

5.3.2 Backup and replication

Copies of attributes need to be made regularly in case primary disks or other equipment fails. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. DBMSs usually provide utilities to facilitate the process of extracting and disseminating attribute sets. When data is replicated between database servers, so that the information remains consistent throughout the database system and users cannot tell or even know which server in the DBMS they are using, the system is said to exhibit replication transparency.

5.3.3 Rule enforcement

Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example, we may have a rule that says each car can have only one engine associated with it (identified by Engine Number). If somebody tries to associate a second engine with a given car, we want the DBMS to deny such a request and display an error message. However, with changes in the model specification such as, in this example, hybrid gas-electric cars, rules may need to change.
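The one-engine-per-car rule can be sketched with Python's built-in `sqlite3` module. The `car_engine` table, the index name `one_engine_per_car` and the helper `attach_engine` are hypothetical names chosen for this illustration. Expressing the rule as a separate unique index is one possible design, not the only one; it is used here because the rule can later be dropped without touching the table layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE car_engine (
        car_id        INTEGER NOT NULL,
        engine_number TEXT    NOT NULL
    )
""")
# The rule lives in an index, separate from the table definition.
conn.execute("CREATE UNIQUE INDEX one_engine_per_car ON car_engine(car_id)")


def attach_engine(connection, car_id, engine_number):
    """Try to associate an engine with a car.

    Returns None on success, or an error message if the DBMS denies it.
    """
    try:
        connection.execute(
            "INSERT INTO car_engine VALUES (?, ?)", (car_id, engine_number)
        )
        return None
    except sqlite3.IntegrityError as exc:
        return f"request denied: {exc}"


first_engine = attach_engine(conn, 1, "ENG-1001")   # accepted
second_engine = attach_engine(conn, 1, "ENG-2002")  # denied by the rule

# The model changes (for example hybrid cars): remove the rule, keep the data.
conn.execute("DROP INDEX one_engine_per_car")
after_rule_change = attach_engine(conn, 1, "ENG-2002")  # now accepted

engine_count = conn.execute("SELECT COUNT(*) FROM car_engine").fetchone()[0]
```

The application never checks the rule itself; it only reports the refusal that the DBMS raises.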
Ideally such rules should be able to be added and removed as needed without significant data layout redesign.

5.3.4 Security

Often it is desirable to limit who can see or change which attributes or groups of attributes. This may be managed directly by individual, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements.

5.3.5 Computation

There are common computations requested on attributes such as counting, summing, averaging, sorting, grouping, cross-referencing, etc. Rather than have each computer application implement these from scratch, they can rely on the DBMS to supply such calculations.

5.3.6 Change and access logging

Often one wants to know who accessed what attributes, what was changed, and when it was changed. Logging services allow this by keeping a record of access occurrences and changes.

5.3.7 Automated optimization

If there are frequently occurring usage patterns or requests, some DBMSs can adjust themselves to improve the speed of those interactions. In some cases the DBMS will merely provide tools to monitor performance, allowing a human expert to make the necessary adjustments after reviewing the statistics collected.

5.4 Meta-data repository

Metadata is data describing data. For example, a listing that describes what attributes are allowed to be in data sets is called "meta-information". The meta-data is also known as data about data.

5.5 Current trends

In 1998, database management was in need of new style databases to solve current database management problems. Researchers realized that the old trends of database management were becoming too complex and there was a need for automated configuration and management. Surajit Chaudhuri, Gerhard Weikum and Michael Stonebraker were the pioneers that dramatically affected the thought of database management systems.
They believed that database management needed a more modular approach and that various users have many different specification needs. Since this new development process emerged, database management has offered endless possibilities. Database management is no longer limited to "monolithic entities". Many solutions have developed to satisfy individual needs of users. Development of numerous database options has created flexible solutions in database management.

Today there are several ways database management has affected the technology world as we know it. Demand for directory services has become a necessity as organizations grow. Businesses are now able to use directory services that provide prompt searches for their company information. Mobile devices are not only able to store contact information of users but have grown to bigger capabilities. Mobile technology is able to cache large amounts of information that is used for computers and is able to display it on smaller devices. Web searches have even been affected by database management. Search engine queries are able to locate data.
Data Warehouse (Foreign Literature Translation)
DATA WAREHOUSEData warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latestmust-have marketing weapon —— a way to keep customers by learning more about their needs.“So", you may ask, full of intrigue, “what exactly is a data warehouse?"Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." This short, but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.(1).Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. 
Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
(3) Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.
(4) Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.
In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.
“OK,” you now ask, “what, then, is data warehousing?”
Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation.
The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers” (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing” to refer only to the process of data warehouse construction, while the term warehouse DBMS is used to refer to the management and utilization of data warehouses. We will not make this distinction here.
“How are organizations using the information from data warehouses?” Many organizations are using this information to support business decision making activities, including:
(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending),
(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies,
(3) analyzing operations and looking for sources of profit,
(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.
Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. To integrate such data, and provide easy and efficient access to it, is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.
The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category.
When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.
Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.
1. Differences between operational database systems and data warehouses
Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.
The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems.
They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers” in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.
The major distinguishing features between OLTP and OLAP are summarized as follows.
(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.
(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.
(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.
(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores.
Because of their huge volume, OLAP data are stored on multiple storage media.
(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.
Other features which distinguish between OLTP and OLAP systems include database size, frequency of operations, performance metrics, and so on.
2. But, why have a separate data warehouse?
“Since operational databases store huge amounts of data,” you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?”
A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned” queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.
Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation.
Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.
Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.
Data Warehouse
Data warehousing provides structures and tools for business operations, so that data can be systematically organized, understood, and used for decision making.
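The data warehouse excerpt above mentions that an OLAP system typically adopts a star model and that organizations compare sales by quarter, by year, and by geographic region. The following is only a minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for a real warehouse. The table names (dim_time, dim_region, fact_sales), the function names, and the sample figures are all hypothetical and do not come from the source text.

```python
import sqlite3


def build_sales_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory star schema: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_time (
            time_id INTEGER PRIMARY KEY,
            year    INTEGER NOT NULL,
            quarter INTEGER NOT NULL
        );
        CREATE TABLE dim_region (
            region_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        -- The fact table holds one row per sale and points at the dimensions.
        CREATE TABLE fact_sales (
            time_id   INTEGER NOT NULL REFERENCES dim_time (time_id),
            region_id INTEGER NOT NULL REFERENCES dim_region (region_id),
            amount    REAL NOT NULL
        );
        """
    )
    # Hypothetical sample data, loaded once (the "initial loading" step).
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                     [(1, 2023, 1), (2, 2023, 2)])
    conn.executemany("INSERT INTO dim_region VALUES (?, ?)",
                     [(1, "North"), (2, "South")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 100.0), (1, 1, 50.0), (1, 2, 70.0),
                      (2, 1, 30.0), (2, 2, 20.0)])
    conn.commit()
    return conn


def sales_by_quarter_and_region(conn: sqlite3.Connection) -> list:
    """Read-only summarization query of the kind the text calls OLAP."""
    query = """
        SELECT t.year, t.quarter, r.name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_time   AS t ON t.time_id = f.time_id
        JOIN dim_region AS r ON r.region_id = f.region_id
        GROUP BY t.year, t.quarter, r.name
        ORDER BY t.year, t.quarter, r.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in sales_by_quarter_and_region(build_sales_warehouse()):
        print(row)
```

The query only reads and aggregates; nothing in it updates individual records, which is the access pattern the excerpt contrasts with short OLTP transactions.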
University Graduation Project: Two Foreign-Language Translations on Databases
Original text: Structure of the Relational Database, from Database System Concepts
Part 1: Relational Databases
The relational model is the basis for any relational database management system (RDBMS). A relational model has three core components: a collection of objects or relations, operators that act on the objects or relations, and data integrity methods. In other words, it has a place to store the data, a way to create and retrieve the data, and a way to make sure that the data is logically consistent.
A relational database uses relations, or two-dimensional tables, to store the information needed to support a business. Let's go over the basic components of a traditional relational database system and look at how a relational database is designed. Once you have a solid understanding of what rows, columns, tables, and relationships are, you'll be well on your way to leveraging the power of a relational database.
Tables, Rows, and Columns
A table in a relational database, alternatively known as a relation, is a two-dimensional structure used to hold related information. A database consists of one or more related tables.
Note: Don't confuse a relation with relationships. A relation is essentially a table, and a relationship is a way to correlate, join, or associate two tables.
A row in a table is a collection or instance of one thing, such as one employee or one line item on an invoice. A column contains all the information of a single type, and the piece of data at the intersection of a row and a column, a field, is the smallest piece of information that can be retrieved with the database's query language. For example, a table with information about employees might have a column called LAST_NAME that contains all of the employees' last names. Data is retrieved from a table by filtering on both the row and the column.
Primary Keys, Datatypes, and Foreign Keys
The examples throughout this article will focus on the hypothetical work of Scott Smith, database developer and entrepreneur.
He just started a new widget company and wants to implement a few of the basic business functions using the relational database to manage his Human Resources (HR) department.
Relation: A two-dimensional structure used to hold related information, also known as a table.
Note: Most of Scott's employees were hired away from one of his previous employers, some of whom have over 20 years of experience in the field. As a hiring incentive, Scott has agreed to keep the new employees' original hire date in the new database.
Row: A group of one or more data elements in a database table that describes a person, place, or thing.
Column: The component of a database table that contains all of the data of the same name and type across all rows.
You'll learn about database design in the following sections, but let's assume for the moment that the majority of the database design is completed and some tables need to be implemented. Scott creates the EMP table to hold the basic employee information, and it looks something like this:
Notice that some fields in the Commission (COMM) and Manager (MGR) columns do not contain a value; they are blank. A relational database can enforce the rule that fields in a column may or may not be empty. In this case, it makes sense for an employee who is not in the Sales department to have a blank Commission field. It also makes sense for the president of the company to have a blank Manager field, since that employee doesn't report to anyone.
Field: The smallest piece of information that can be retrieved by the database query language. A field is found at the intersection of a row and a column in a database table.
On the other hand, none of the fields in the Employee Number (EMPNO) column are blank. The company always wants to assign an employee number to an employee, and that number must be different for each employee. One of the features of a relational database is that it can ensure that a value is entered into this column and that it is unique.
The EMPNO column, in this case, is the primary key of the table.
Primary Key: A column (or columns) in a table that makes the row in the table distinguishable from every other row in the same table.
Notice the different datatypes that are stored in the EMP table: numeric values, character or alphabetic values, and date values.
As you might suspect, the DEPTNO column contains the department number for the employee. But how do you know what department name is associated with what number? Scott created the DEPT table to hold the descriptions for the department codes in the EMP table.
The DEPTNO column in the EMP table contains the same values as the DEPTNO column in the DEPT table. In this case, the DEPTNO column in the EMP table is considered a foreign key to the same column in the DEPT table.
A foreign key enforces the concept of referential integrity in a relational database. The concept of referential integrity not only prevents an invalid department number from being inserted into the EMP table, but it also prevents a row in the DEPT table from being deleted if there are employees still assigned to that department.
Foreign Key: A column (or columns) in a table that draws its values from a primary or unique key column in another table. A foreign key assists in ensuring the data integrity of a table.
Referential Integrity: A method employed by a relational database system that enforces one-to-many relationships between tables.
Data Modeling
Before Scott created the actual tables in the database, he went through a design process known as data modeling. In this process, the developer conceptualizes and documents all the tables for the database. One of the common methods for modeling a database is called ERA, which stands for entities, relationships, and attributes. The database designer uses an application that can maintain entities, their attributes, and their relationships.
In general, an entity corresponds to a table in the database, and the attributes of the entity correspond to columns of the table.
Data Modeling: A process of defining the entities, attributes, and relationships between the entities in preparation for creating the physical database.
The data-modeling process involves defining the entities, defining the relationships between those entities, and then defining the attributes for each of the entities. Once a cycle is complete, it is repeated as many times as necessary to ensure that the designer is capturing what is important enough to go into the database. Let's take a closer look at each step in the data-modeling process.
Defining the Entities
First, the designer identifies all of the entities within the scope of the database application. The entities are the persons, places, or things that are important to the organization and need to be tracked in the database. Entities will most likely translate neatly to database tables. For example, for the first version of Scott's widget company database, he identifies four entities: employees, departments, salary grades, and bonuses. These will become the EMP, DEPT, SALGRADE, and BONUS tables.
Defining the Relationships Between Entities
Once the entities are defined, the designer can proceed with defining how each of the entities is related. Often, the designer will pair each entity with every other entity and ask, "Is there a relationship between these two entities?" Some relationships are obvious; some are not.
In the widget company database, there is most likely a relationship between EMP and DEPT, but depending on the business rules, it is unlikely that the DEPT and SALGRADE entities are related. If the business rules were to restrict certain salary grades to certain departments, there would most likely be a new entity that defines the relationship between salary grades and departments.
This entity would be known as an associative or intersection table and would contain the valid combinations of salary grades and departments.
Associative Table: A database table that stores the valid combinations of rows from two other tables and usually enforces a business rule. An associative table resolves a many-to-many relationship.
In general, there are three types of relationships in a relational database:
One-to-many: The most common type of relationship is one-to-many. This means that for each occurrence in a given entity, the parent entity, there may be one or more occurrences in a second entity, the child entity, to which it is related. For example, in the widget company database, the DEPT entity is a parent entity, and for each department, there could be one or more employees associated with that department. The relationship between DEPT and EMP is one-to-many.
One-to-one: In a one-to-one relationship, a row in a table is related to only one or none of the rows in a second table. This relationship type is often used for subtyping. For example, an EMPLOYEE table may hold the information common to all employees, while the FULLTIME, PARTTIME, and CONTRACTOR tables hold information unique to full-time employees, part-time employees, and contractors, respectively. These entities would be considered subtypes of an EMPLOYEE and maintain a one-to-one relationship with the EMPLOYEE table. These relationships are not as common as one-to-many relationships, because if one entity has an occurrence for a corresponding row in another entity, in most cases, the attributes from both entities should be in a single entity.
Many-to-many: In a many-to-many relationship, one row of a table may be related to many rows of another table, and vice versa. Usually, when this relationship is implemented in the database, a third entity is defined as an intersection table to contain the associations between the two entities in the relationship.
For example, in a database used for school class enrollment, the STUDENT table has a many-to-many relationship with the CLASS table: one student may take one or more classes, and a given class may have one or more students. The intersection table STUDENT_CLASS would contain the combinations of STUDENT and CLASS to track which students are in which classes.
Once the designer has defined the entity relationships, the next step is to assign the attributes to each entity. This is physically implemented using columns, as shown here for the SALGRADE table as derived from the salary grade entity.
After the entities, relationships, and attributes have been defined, the designer may iterate the data modeling many more times. When reviewing relationships, new entities may be discovered. For example, when discussing the widget inventory table and its relationship to a customer order, the need for a shipping restrictions table may arise.
Once the design process is complete, the physical database tables may be created. Logical database design sessions should not involve physical implementation issues, but once the design has gone through an iteration or two, it's the DBA's job to bring the designers "down to earth." As a result, the design may need to be revisited to balance the ideal database implementation versus the realities of budgets and schedules.
Translation: Structure of the Relational Database, from Database System Concepts
Chapter 1: Relational Databases
The relational model is the basis of any relational database management system (RDBMS).
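The primary key, foreign key, and intersection table ideas described in the article above can be sketched in a few lines. This is a hedged illustration only, using Python's built-in sqlite3 module rather than the database product the article has in mind. The column subsets, the sample rows, and the helper names build_schema and is_rejected are assumptions made for this example.

```python
import sqlite3


def build_schema() -> sqlite3.Connection:
    """Create in-memory EMP/DEPT and STUDENT/CLASS tables with keys."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE dept (
            deptno INTEGER PRIMARY KEY,      -- primary key: must be unique
            dname  TEXT NOT NULL
        );
        CREATE TABLE emp (
            empno  INTEGER PRIMARY KEY,
            ename  TEXT NOT NULL,
            mgr    INTEGER,                  -- may be blank (the president)
            comm   REAL,                     -- may be blank (non-sales staff)
            deptno INTEGER NOT NULL REFERENCES dept (deptno)  -- foreign key
        );
        CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE class   (class_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
        -- Intersection table: resolves the many-to-many relationship.
        CREATE TABLE student_class (
            student_id INTEGER NOT NULL REFERENCES student (student_id),
            class_id   INTEGER NOT NULL REFERENCES class (class_id),
            PRIMARY KEY (student_id, class_id)
        );
        """
    )
    # Hypothetical sample rows, not taken from the article's lost tables.
    conn.execute("INSERT INTO dept VALUES (10, 'ACCOUNTING')")
    conn.execute("INSERT INTO dept VALUES (30, 'SALES')")
    conn.execute("INSERT INTO emp VALUES (1, 'SMITH', NULL, NULL, 10)")
    conn.execute("INSERT INTO student VALUES (1, 'Ann')")
    conn.execute("INSERT INTO class VALUES (1, 'Databases')")
    conn.execute("INSERT INTO student_class VALUES (1, 1)")
    conn.commit()
    return conn


def is_rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    db = build_schema()
    # An employee in a department that does not exist violates the foreign key.
    print(is_rejected(db, "INSERT INTO emp VALUES (2, 'JONES', NULL, NULL, 99)"))
    # A department that still has employees cannot be deleted.
    print(is_rejected(db, "DELETE FROM dept WHERE deptno = 10"))
```

The composite primary key on student_class is one common way to keep the same student and class pair from being recorded twice; other designs use a separate surrogate key instead.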
Database Foreign-Language Translation, Foreign and English Literature: Database Security
Database Security
“Why do I need to secure my database server? No one can access it; it’s in a DMZ protected by the firewall!” This is often the response when it is recommended that such devices are included within a security health check. In fact, database security is paramount in defending an organization's information, as it may be indirectly exposed to a wider audience than realized.
This is the first of two articles that will examine database security. In this article we will discuss general database security concepts and common problems. In the next article we will focus on specific Microsoft SQL and Oracle security concerns.
Database security has become a hot topic in recent times. With more and more people becoming increasingly concerned with computer security, we are finding that firewalls and Web servers are being secured more than ever (though this does not mean that there are not still a large number of insecure networks out there). As such, the focus is expanding to consider technologies such as databases with a more critical eye.
◆ Common sense security
Before we discuss the issues relating to database security it is prudent to highlight the necessity to secure the underlying operating system and supporting technologies. It is not worth spending a lot of effort securing a database if a vanilla operating system is failing to provide a secure basis for the hardening of the database. There are a large number of excellent documents in the public domain detailing measures that should be employed when installing various operating systems.
One common problem that is often encountered is the existence of a database on the same server as a web server hosting an Internet (or Intranet) facing application. Whilst this may save the cost of purchasing a separate server, it does seriously affect the security of the solution. Where this is identified, it is often the case that the database is openly connected to the Internet.
One recent example I can recall is an Apache Web server serving an organization's Internet offering, with an Oracle database available on the Internet on port 1521. When investigating this issue further it was discovered that access to the Oracle server was not protected (including lack of passwords), which allowed the server to be stopped. The database was not required from an Internet facing perspective, but the use of default settings and careless security measures rendered the server vulnerable.
The points mentioned above are not strictly database issues, and could be classified as architectural and firewall protection issues also, but ultimately it is the database that is compromised. Security considerations have to be made from all parts of a public facing network. You cannot rely on someone or something else within your organization protecting your database from exposure.
◆ Attack tools are now available for exploiting weaknesses in SQL and Oracle
I came across one interesting aspect of database security recently while carrying out a security review for a client. We were performing a test against an intranet application, which used a database back end (SQL) to store client details. The security review was proceeding well, with access controls being based on Windows authentication. Only authenticated Windows users were able to see data belonging to them. The application itself seemed to be handling input requests, rejecting all attempts to access the database directly.
We then happened to come across a backup of the application in the office in which we were working. This media contained a backup of the SQL database, which we restored onto our laptop. All security controls which were in place originally were not restored with the database and we were able to browse the complete database, with no restrictions in place to protect the sensitive data.
This may seem like a contrived way of compromising the security of the system, but it does highlight an important point. It is often not the direct approach that is taken to attack a target, and ultimately the endpoint is the same: system compromise. A backup copy of the database may be stored on the server, and thus facilitates access to the data indirectly.
There is a simple solution to the problem identified above. SQL 2000 can be configured to use password protection for backups. If the backup is created with password protection, this password must be used when restoring the backup. This is an effective and uncomplicated method of stopping simple capture of backup data. It does however mean that the password must be remembered!
◆ Current trends
There are a number of current trends in IT security, with a number of these being linked to database security.
The focus on database security is now attracting the attention of the attackers. Attack tools are now available for exploiting weaknesses in SQL and Oracle. The emergence of these tools has raised the stakes and we have seen focused attacks against specific database ports on servers exposed to the Internet.
One common theme running through the security industry is the focus on application security, and in particular bespoke Web applications. With the functionality of Web applications becoming more and more complex, it brings the potential for more security weaknesses in bespoke application code. In order to fulfill the functionality of applications, the backend data stores are commonly being used to format the content of Web pages. This requires more complex coding at the application end. With developers using different styles in code development, some of which are not as security conscious as others, this can be the source of exploitable errors.
SQL injection is one such hot topic within the IT security industry at the moment.
Discussions are now commonplace among technical security forums, with more and more ways and means of exploiting databases coming to light all the time. SQL injection is a misleading term, as the concept applies to other databases, including Oracle, DB2 and Sybase.
◆ What is SQL Injection?
SQL Injection is simply the method of communication with a database using code or commands sent via a method or application not intended by the developer. The most common form of this is found in Web applications. Any user input that is handled by the application is a common source of attack. One simple example of mishandling of user input is highlighted in Figure 1.
Many of you will have seen this common error message when accessing web sites, and it often indicates that the user input has not been correctly handled. On getting this type of error, an attacker will focus in with more specific input strings.
The damage done by this type of vulnerability can be far reaching, though this depends on the level of privileges the application has in relation to the database.
If the application is accessing data with full administrator type privileges, then maliciously run commands will also pick up this level of access, and system compromise is inevitable. Again this issue is analogous to operating system security principles, where programs should only be run with the minimum of permissions that is required. If normal user access is acceptable, then apply this restriction.
Again the problem of SQL security is not totally a database issue. Specific database commands or requests should not be allowed to pass through the application layer.
This can be prevented by employing a “secure coding” approach.
Again this is veering off-topic, but it is worth detailing a few basic steps that should be employed.
The first step in securing any application should be the validation and control of user input. Strict typing should be used where possible to control specific data (e.g. if numeric data is expected), and where string based data is required, specific non-alphanumeric characters should be prohibited where possible. Where this cannot be performed, consideration should be made to try and substitute characters (for example the use of single quotes, which are commonly used in SQL commands).
Specific security-related coding techniques should be added to the coding standard in use within your organization. If all developers are using the same baseline standards, with specific security measures, this will reduce the risk of SQL injection compromises.
Another simple method that can be employed is to remove all procedures within the database that are not required. This restricts the extent that unwanted or superfluous aspects of the database could be maliciously used. This is analogous to removing unwanted services on an operating system, which is common security practice.
◆ Overall
In conclusion, most of the points I have made above are common sense security concepts, and are not specific to databases. However all of these points DO apply to databases and if these basic security measures are employed, the security of your database will be greatly improved.
The next article on database security will focus on specific SQL and Oracle security problems, with detailed examples and advice for DBAs and developers.
There are a lot of similarities between database security and general IT security, with generic simple security steps and measures that can be (and should be) easily implemented to dramatically improve security.
While these may seem like common sense, it is surprising how many times we have seen that common security measures are not implemented and so cause a security exposure.
◆ User account and password security
One of the basic first principles in IT security is “make sure you have a good password”. Within this statement I have assumed that a password is set in the first place, though this is often not the case.
I touched on common sense security in my last article, but I think it is important to highlight this again. As with operating systems, the focus of attention within database account security is aimed at administration accounts. Within SQL this will be the SA account and within Oracle it may be the SYSDBA or ORACLE account.
It is very common for SQL SA accounts to have a password of ‘SA’ or even worse a blank password, which is just as common. This password laziness breaks the most basic security principles, and should be stamped down on. Users would not be allowed to have a blank password on their own domain account, so why should valuable system resources such as databases be allowed to be left unprotected? For instance, a blank ‘SA’ password will enable any user with client software (i.e. Microsoft Query Analyser or Enterprise Manager) to ‘manage’ the SQL server and databases.
With databases being used as the back end to Web applications, the lack of password control can result in a total compromise of sensitive information. With system level access to the database it is possible not only to execute queries into the database, create/modify/delete tables etc, but also to execute what are known as Stored Procedures.
Database Security
“Why do I need to secure my database server? No one can access it; it is in a DMZ protected by the firewall!” This is the usual response when we recommend that such devices be included within a security health check.
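The SQL injection problem discussed in this article can be illustrated with a short sketch. The article itself recommends validating input and restricting special characters; the example below shows a different, complementary technique that the article does not mention, namely binding user input as a query parameter. It uses Python's built-in sqlite3 module, and the clients table, its columns, and the function names are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a small in-memory table of client details (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clients ("
        "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, detail TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO clients (owner, detail) VALUES (?, ?)",
        [("alice", "A-1"), ("bob", "B-1")],
    )
    conn.commit()
    return conn


def find_unsafe(conn: sqlite3.Connection, owner: str) -> list:
    """VULNERABLE: user input is pasted straight into the SQL text."""
    sql = "SELECT detail FROM clients WHERE owner = '" + owner + "'"
    return [row[0] for row in conn.execute(sql)]


def find_safe(conn: sqlite3.Connection, owner: str) -> list:
    """Safer: the input is bound as a parameter and is never parsed as SQL."""
    sql = "SELECT detail FROM clients WHERE owner = ?"
    return [row[0] for row in conn.execute(sql, (owner,))]


if __name__ == "__main__":
    db = make_db()
    # A single quote in the input changes the meaning of the unsafe query.
    payload = "nobody' OR '1'='1"
    print(find_unsafe(db, payload))  # leaks every client's detail
    print(find_safe(db, payload))    # matches no owner
```

Parameter binding does not replace the least-privilege advice in the article: if the application connects with administrator rights, any flaw that remains is still as damaging as the text describes.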
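Database Management Systems: Graduation Thesis Foreign-Literature Translation (Chinese-English Parallel Text) and Literature Review
An Introduction to Database Management Systems
Raghu Ramakrishnan
A database (sometimes spelled “data base”), also called an electronic database, is a specially organized collection of data or information designed for rapid search and retrieval by a computer.
Databases are structured so that, with the support of various data-processing commands, the storage, retrieval, modification, and deletion of data are simplified.
A database can be stored on magnetic disk, tape, optical disk, or other secondary storage devices.
A database consists of a file or a set of files; the information in these files can be broken down into records, each of which contains one or more fields.
Fields are the basic units of data access.
A database is used to describe entities, and each field typically holds information about one attribute of an entity.
Using keywords and various sorting commands, users can query, rearrange, group, or select the fields of many records to retrieve a particular class of data, and can also generate reports.
Complex data relationships and linkages are found in all but the simplest databases.
The system software package that handles the difficult tasks associated with creating, accessing, and maintaining database records is called a database management system (DBMS).
The programs in a DBMS package establish an interface between the database itself and the users of the database.
(These users may be applications programmers, managers and others with information needs, and various operating-system programs.) A DBMS can organize, process, and present selected data elements from the database.
This capability enables decision makers to search, probe, and query database contents in order to extract answers to nonrecurring and unplanned questions that aren't available in regular reports.
These questions might initially be vague and/or poorly defined, but people can browse through the database until they have the needed information.
In short, the DBMS will “manage” the stored data items and assemble the needed items from the common database in response to the queries of those who aren't programmers.
A DBMS is composed of three major parts: (1) a storage subsystem that stores and retrieves data in files; (2) a modeling and manipulation subsystem that provides the means to organize the data and to add, delete, maintain, and update it; and (3) an interface between users and the DBMS.
Several major trends are emerging that enhance the value and usefulness of database management systems: 1. Managers require up-to-date information to make effective decisions.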
Database Design: Foreign Literature Translation
Foreign-language material

As information technology advances, a variety of management systems have emerged that make daily work more coherent. Using network resources sensibly can significantly reduce the inconvenience and wasted time of manual management. With the accelerating modernization of the 21st century, the continuous improvement of scientific and cultural levels, and the rapid growth in the number of students, the pressure on student information management will inevitably increase, and inefficient manual retrieval is completely incompatible with the community's needs. The Student Information Management System is one kind of information management system. As information technology continues to develop, network technology has been applied widely in every trade around us. With this development, colleges and universities use computers to manage their teaching affairs, and the tedious work that schools used to do by hand is now handled quickly and efficiently. The student results management system in particular plays a large role in a school: it lets students and teachers understand and manage all aspects of information more conveniently, quickly, and accurately.

Abstract

Managing a bulky database by hand is heavy and tedious work. Data entry, querying, and modification suffer from a heavy workload, low efficiency, and long turnaround, so a computer management system brings a substantial change. Because the school has so many students, the volume of student information is huge, which makes managing it complicated and tedious. This system targets the school and, based on a practical requirements analysis, uses VB6.0 to develop the student information management system.

The whole design process follows the principles of simple operation, an attractive and vivid interface, and practical requirements. The student information management system includes system management, basic information management, study management, reward and punishment management, report printing, and so on. Use in practice shows that the system designed in this thesis can satisfy the school's needs in managing student information. The thesis introduces the development background, the required functions, and the design process. It mainly explains the key points of the system design, the design approach, the technical difficulties, and their solutions. The student management system greatly reduces manual inconvenience and makes the management of all student data more scientific and reasonable.

The most distinctive feature of this system is that a back-end database unifies the management of student information. The system is mainly divided into system management, student major management, student file management, tuition fee management, course management, results management, and report printing. The interface is built with VB, and the modules above all connect to the back-end database by binding VB controls. The back-end database is divided roughly into the following tables: the major information table, the fee category table, the student post table, the student information table, the student political status table, and the user login table.

The system uses a Client/Server design, in which one data server and a number of workstations form a LAN. Users with different permissions can view different parts of the system and submit personal data, and the back-end database quickly returns the content they are authorized to see.

Marks management is an important part of a school's work. The original manual management has many shortcomings: the school has a large number of students and each student's information is complex, so the workload is extremely heavy and statistics and queries are inconvenient. How to overcome these shortcomings and make marks management more convenient, faster, and more efficient has therefore become a key question. With the rapid development of science and technology, school automation of marks management is also increasingly urgent, so it is essential to develop a marks registration software system to assist school teaching management, improve marks management, and raise management efficiency.

“We cut nature up, organize it into concepts, and ascribe significances as we do, largely because we are parties to an agreement that holds throughout our speech community and is codified in the patterns of our language … we cannot talk at all except by subscribing to the organization and classification of data which the agreement decrees.” Benjamin Lee Whorf (1897-1941)

The genesis of the computer revolution was in a machine. The genesis of our programming languages thus tends to look like that machine.

But computers are not so much machines as they are mind amplification tools (“bicycles for the mind,” as Steve Jobs is fond of saying) and a different kind of expressive medium. As a result, the tools are beginning to look less like machines and more like parts of our minds, and also like other forms of expression such as writing, painting, sculpture, animation, and filmmaking.
Object-oriented programming (OOP) is part of this movement toward using the computer as an expressive medium.

This chapter will introduce you to the basic concepts of OOP, including an overview of development methods. This chapter, and this book, assumes that you have some programming experience, although not necessarily in C. If you think you need more preparation in programming before tackling this book, you should work through the Thinking in C multimedia seminar, downloadable from .

This chapter is background and supplementary material. Many people do not feel comfortable wading into object-oriented programming without understanding the big picture first. Thus, there are many concepts that are introduced here to give you a solid overview of OOP. However, other people may not get the big picture concepts until they’ve seen some of the mechanics first; these people may become bogged down and lost without some code to get their hands on. If you’re part of this latter group and are eager to get to the specifics of the language, feel free to jump past this chapter; skipping it at this point will not prevent you from writing programs or learning the language. However, you will want to come back here eventually to fill in your knowledge so you can understand why objects are important and how to design with them.

All programming languages provide abstractions. It can be argued that the complexity of the problems you’re able to solve is directly related to the kind and quality of abstraction. By “kind” I mean, “What is it that you are abstracting?” Assembly language is a small abstraction of the underlying machine. Many so-called “imperative” languages that followed (such as FORTRAN, BASIC, and C) were abstractions of assembly language. These languages are big improvements over assembly language, but their primary abstraction still requires you to think in terms of the structure of the computer rather than the structure of the problem you are trying to solve.
The programmer must establish the association between the machine model (in the “solution space,” which is the place where you’re implementing that solution, such as a computer) and the model of the problem that is actually being solved (in the “problem space”).

Thinking in Java, Bruce Eckel

The object-oriented approach goes a step further by providing tools for the programmer to represent elements in the problem space. This representation is general enough that the programmer is not constrained to any particular type of problem. We refer to the elements in the problem space and their representations in the solution space as “objects.” (You will also need other objects that don’t have problem-space analogs.) The idea is that the program is allowed to adapt itself to the lingo of the problem by adding new types of objects, so when you read the code describing the solution, you’re reading words that also express the problem. This is a more flexible and powerful language abstraction than what we’ve had before. Thus, OOP allows you to describe the problem in terms of the problem, rather than in terms of the computer where the solution will run. There’s still a connection back to the computer: each object looks quite a bit like a little computer; it has a state, and it has operations that you can ask it to perform. However, this doesn’t seem like such a bad analogy to objects in the real world: they all have characteristics and behaviors.

Java is making possible the rapid development of versatile programs for communicating and collaborating on the Internet. We're not just talking word processors and spreadsheets here, but also applications to handle sales, customer service, accounting, databases, and human resources, the meat and potatoes of corporate computing.
Java is also making possible a controversial new class of cheap machines called network computers, or NCs, which Sun, IBM, Oracle, Apple, and others hope will proliferate in corporations and our homes.

The way Java works is simple. Unlike ordinary software applications, which take up megabytes on the hard disk of your PC, Java applications, or "applets", are little programs that reside on the network in centralized servers; the network delivers them to your machine only when you need them. Because the applets are so much smaller than conventional programs, they don't take forever to download.

Say you want to check out the sales results from the southwest region. You'll use your Internet browser to find the corporate Internet website that dishes up financial data and, with a mouse click or two, ask for the numbers. The server will zap you not only the data, but also the sales-analysis applet you need to display it. The numbers will pop up on your screen in a Java spreadsheet, so you can noodle around with them immediately rather than hassle with importing them to your own spreadsheet program.
Database Cluster Analysis: Foreign Literature Translation
(The document contains the English original and its Chinese translation.)

Clustering

5.1 INTRODUCTION

Clustering is similar to classification in that data are grouped. However, unlike classification, the groups are not predefined. Instead, the grouping is accomplished by finding similarities between data according to characteristics found in the actual data. The groups are called clusters. Some authors view clustering as a special type of classification. In this text, however, we follow a more conventional view in that the two are different. Many definitions for clusters have been proposed:

● Set of like elements. Elements from different clusters are not alike.
● The distance between points in a cluster is less than the distance between a point in the cluster and any point outside it.

A term similar to clustering is database segmentation, where like tuples (records) in a database are grouped together. This is done to partition or segment the database into components that then give the user a more general view of the data. In this text, we do not differentiate between segmentation and clustering. A simple example of clustering is found in Example 5.1. This example illustrates the fact that determining how to do the clustering is not straightforward.

As illustrated in Figure 5.1, a given set of data may be clustered on different attributes. Here a group of homes in a geographic area is shown. The first type of clustering is based on the location of the home. Homes that are geographically close to each other are clustered together. In the second clustering, homes are grouped based on the size of the house.

Clustering has been used in many application domains, including biology, medicine, anthropology, marketing, and economics. Clustering applications include plant and animal classification, disease classification, image processing, pattern recognition, and document retrieval. One of the first domains in which clustering was used was biological taxonomy.
Recent uses include examining Web log data to detect usage patterns.

When clustering is applied to a real-world database, many interesting problems occur:

● Outlier handling is difficult. Here the elements do not naturally fall into any cluster. They can be viewed as solitary clusters. However, if a clustering algorithm attempts to find larger clusters, these outliers will be forced to be placed in some cluster. This process may result in the creation of poor clusters by combining two existing clusters and leaving the outlier in its own cluster.
● Dynamic data in the database implies that cluster membership may change over time.
● Interpreting the semantic meaning of each cluster may be difficult. With classification, the labeling of the classes is known ahead of time. However, with clustering, this may not be the case. Thus, when the clustering process finishes creating a set of clusters, the exact meaning of each cluster may not be obvious. Here is where a domain expert is needed to assign a label or interpretation for each cluster.
● There is no one correct answer to a clustering problem. In fact, many answers may be found. The exact number of clusters required is not easy to determine. Again, a domain expert may be required. For example, suppose we have a set of data about plants that have been collected during a field trip. Without any prior knowledge of plant classification, if we attempt to divide this set of data into similar groupings, it would not be clear how many groups should be created.
● Another related issue is what data should be used for clustering. Unlike learning during a classification process, where there is some a priori knowledge concerning what the attributes of each classification should be, in clustering we have no supervised learning to aid the process.
Indeed, clustering can be viewed as similar to unsupervised learning.

We can then summarize some basic features of clustering (as opposed to classification):

● The (best) number of clusters is not known.
● There may not be any a priori knowledge concerning the clusters.
● Cluster results are dynamic.

The clustering problem is stated as shown in Definition 5.1. Here we assume that the number of clusters to be created is an input value, $k$. The actual content (and interpretation) of each cluster, $K_j$, $1 \le j \le k$, is determined as a result of the function definition. Without loss of generality, we will view that the result of solving a clustering problem is that a set of clusters is created: $K = \{K_1, K_2, \ldots, K_k\}$.

DEFINITION 5.1. Given a database $D = \{t_1, t_2, \ldots, t_n\}$ of tuples and an integer value $k$, the clustering problem is to define a mapping $f : D \to \{1, \ldots, k\}$ where each $t_i$ is assigned to one cluster $K_j$, $1 \le j \le k$. A cluster $K_j$ contains precisely those tuples mapped to it; that is, $K_j = \{t_i \mid f(t_i) = K_j,\ 1 \le i \le n,\ \text{and } t_i \in D\}$.

A classification of the different types of clustering algorithms is shown in Figure 5.2. Clustering algorithms themselves may be viewed as hierarchical or partitional. With hierarchical clustering, a nested set of clusters is created. Each level in the hierarchy has a separate set of clusters. At the lowest level, each item is in its own unique cluster. At the highest level, all items belong to the same cluster. With hierarchical clustering, the desired number of clusters is not input. With partitional clustering, the algorithm creates only one set of clusters. These approaches use the desired number of clusters to drive how the final set is created. Traditional clustering algorithms tend to be targeted to small numeric databases that fit into memory. There are, however, more recent clustering algorithms that look at categorical data and are targeted to larger, perhaps dynamic, databases.
Algorithms targeted to larger databases may adapt to memory constraints by either sampling the database or using data structures, which can be compressed or pruned to fit into memory regardless of the size of the database. Clustering algorithms may also differ based on whether they produce overlapping or nonoverlapping clusters. Even though we consider only nonoverlapping clusters, it is possible to place an item in multiple clusters. In turn, nonoverlapping clusters can be viewed as extrinsic or intrinsic. Extrinsic techniques use labeling of the items to assist in the classification process. These algorithms are the traditional classification supervised learning algorithms in which a special input training set is used. Intrinsic algorithms do not use any a priori category labels, but depend only on the adjacency matrix containing the distance between objects. All algorithms we examine in this chapter fall into the intrinsic class.

The types of clustering algorithms can be further classified based on the implementation technique used. Hierarchical algorithms can be categorized as agglomerative or divisive. “Agglomerative” implies that the clusters are created in a bottom-up fashion, while divisive algorithms work in a top-down fashion. Although both hierarchical and partitional algorithms could be described using the agglomerative vs. divisive label, it typically is more associated with hierarchical algorithms. Another descriptive tag indicates whether each individual element is handled one by one, serial (sometimes called incremental), or whether all items are examined together, simultaneous. If a specific tuple is viewed as having attribute values for all attributes in the schema, then clustering algorithms could differ as to how the attribute values are examined. As is usually done with decision tree classification techniques, some algorithms examine attribute values one at a time, monothetic. Polythetic algorithms consider all attribute values at one time.
Finally, clustering algorithms can be labeled based on the mathematical formulation given to the algorithm: graph theoretic or matrix algebra. In this chapter we generally use the graph approach and describe the input to the clustering algorithm as an adjacency matrix labeled with distance measures.

We discuss many clustering algorithms in the following sections. This is only a representative subset of the many algorithms that have been proposed in the literature. Before looking at these algorithms, we first examine possible similarity measures and examine the impact of outliers.

5.2 SIMILARITY AND DISTANCE MEASURES

There are many desirable properties for the clusters created by a solution to a specific clustering problem. The most important one is that a tuple within one cluster is more like tuples within that cluster than it is similar to tuples outside it. As with classification, then, we assume the definition of a similarity measure, $\mathrm{sim}(t_i, t_l)$, defined between any two tuples, $t_i, t_l \in D$. This provides a more strict and alternative clustering definition, as found in Definition 5.2. Unless otherwise stated, we use the first definition rather than the second. Keep in mind that the similarity relationship stated within the second definition is a desirable, although not always obtainable, property.

A distance measure, $\mathrm{dis}(t_i, t_j)$, as opposed to similarity, is often used in clustering. The clustering problem then has the desirable property that given a cluster $K_j$, $\forall t_{jl}, t_{jm} \in K_j$ and $t_i \notin K_j$, $\mathrm{dis}(t_{jl}, t_{jm}) \le \mathrm{dis}(t_{jl}, t_i)$.

Some clustering algorithms look only at numeric data, usually assuming metric data points. Metric attributes satisfy the triangular inequality. The cluster can then be described by using several characteristic values. Given a cluster $K_m$ of $N$ points $\{t_{m1}, t_{m2}, \ldots, t_{mN}\}$, we make the following definitions [ZRL96]:

centroid: $C_m = \frac{\sum_{i=1}^{N} t_{mi}}{N}$
radius: $R_m = \sqrt{\frac{\sum_{i=1}^{N} (t_{mi} - C_m)^2}{N}}$
diameter: $D_m = \sqrt{\frac{\sum_{i=1}^{N} \sum_{j=1}^{N} (t_{mi} - t_{mj})^2}{N(N-1)}}$

Here the centroid is the “middle” of the cluster; it need not be an actual point in the cluster.
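The characteristic values of a cluster can be sketched in code. This is a minimal illustration under stated assumptions, not the text's own implementation: it assumes numeric points with Euclidean geometry, takes the centroid as the coordinate-wise mean, the radius as the root of the mean squared distance to the centroid, and the diameter as the root of the summed squared pairwise distances divided by N(N-1). The function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean; need not be an actual point of the cluster."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dims)]


def radius(points: List[Point]) -> float:
    """Root of the mean squared distance from each point to the centroid."""
    c = centroid(points)
    return math.sqrt(sum(squared_distance(p, c) for p in points) / len(points))


def diameter(points: List[Point]) -> float:
    """Root of the averaged squared distance over all ordered pairs."""
    n = len(points)  # assumes at least two points
    total = sum(squared_distance(p, q) for p in points for q in points)
    return math.sqrt(total / (n * (n - 1)))
```

For the two points (0, 0) and (2, 0), the centroid is (1, 0), which is not itself a member of the cluster.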
Some clustering algorithms alternatively assume that the cluster is represented by one centrally located object in the cluster called a medoid. The radius is the square root of the average mean squared distance from any point in the cluster to the centroid, and the diameter is the square root of the average mean squared distance between all pairs of points in the cluster. We use the notation $M_m$ to indicate the medoid for cluster $K_m$.

Many clustering algorithms require that the distance between clusters (rather than elements) be determined. This is not an easy task given that there are many interpretations for distance between clusters. Given clusters $K_i$ and $K_j$, there are several standard alternatives to calculate the distance between clusters. A representative list is:

● Single link: Smallest distance between an element in one cluster and an element in the other. We thus have $\mathrm{dis}(K_i, K_j) = \min(\mathrm{dis}(t_{il}, t_{jm}))$ $\forall t_{il} \in K_i \notin K_j$ and $\forall t_{jm} \in K_j \notin K_i$.
● Complete link: Largest distance between an element in one cluster and an element in the other. We thus have $\mathrm{dis}(K_i, K_j) = \max(\mathrm{dis}(t_{il}, t_{jm}))$ $\forall t_{il} \in K_i \notin K_j$ and $\forall t_{jm} \in K_j \notin K_i$.
● Average: Average distance between an element in one cluster and an element in the other. We thus have $\mathrm{dis}(K_i, K_j) = \mathrm{mean}(\mathrm{dis}(t_{il}, t_{jm}))$ $\forall t_{il} \in K_i \notin K_j$ and $\forall t_{jm} \in K_j \notin K_i$.
● Centroid: If clusters have a representative centroid, then the centroid distance is defined as the distance between the centroids. We thus have $\mathrm{dis}(K_i, K_j) = \mathrm{dis}(C_i, C_j)$, where $C_i$ is the centroid for $K_i$ and similarly for $C_j$.
● Medoid: Using a medoid to represent each cluster, the distance between the clusters can be defined by the distance between the medoids: $\mathrm{dis}(K_i, K_j) = \mathrm{dis}(M_i, M_j)$.

5.3 OUTLIERS

As mentioned earlier, outliers are sample points with values much different from those of the remaining set of data. Outliers may represent errors in the data (perhaps a malfunctioning sensor recorded an incorrect data value) or could be correct data values that are simply much different from the remaining data.
A person who is 2.5 meters tall is much taller than most people. In analyzing the height of individuals, this value probably would be viewed as an outlier.

Some clustering techniques do not perform well with the presence of outliers. This problem is illustrated in Figure 5.3. Here if three clusters are found (solid line), the outlier will occur in a cluster by itself. However, if two clusters are found (dashed line), the two (obviously) different sets of data will be placed in one cluster because they are closer together than the outlier. This problem is complicated by the fact that many clustering algorithms actually have as input the number of desired clusters to be found.

Clustering algorithms may actually find and remove outliers to ensure that they perform better. However, care must be taken in actually removing outliers. For example, suppose that the data mining problem is to predict flooding. Extremely high water level values occur very infrequently, and when compared with the normal water level values may seem to be outliers. However, removing these values may not allow the data mining algorithms to work effectively because there would be no data that showed that floods ever actually occurred.

Outlier detection, or outlier mining, is the process of identifying outliers in a set of data. Clustering, or other data mining, algorithms may then choose to remove or treat these values differently. Some outlier detection techniques are based on statistical techniques. These usually assume that the set of data follows a known distribution and that outliers can be detected by well-known tests such as discordancy tests. However, these tests are not very realistic for real-world data because real-world data values may not follow well-defined data distributions. Also, most of these tests assume a single attribute value, and many attributes are involved in real-world datasets. Alternative detection techniques may be based on distance measures.

Cluster Analysis

5.1 Introduction

Cluster analysis is similar to classification in that data are grouped.
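The inter-cluster distance alternatives listed in Section 5.2 (single link, complete link, average, centroid, medoid) can be compared side by side in a short sketch. The code below is illustrative only: it assumes Euclidean distance via `math.dist` (Python 3.8 or later), and it assumes the medoid is the member with the smallest total distance to the other members, which is one common reading of "centrally located object". All function names are hypothetical.

```python
import math
from itertools import product
from statistics import mean


def dis(a, b):
    """Euclidean distance between two points (an assumed metric)."""
    return math.dist(a, b)


def single_link(ki, kj):
    """Smallest distance between an element of ki and an element of kj."""
    return min(dis(a, b) for a, b in product(ki, kj))


def complete_link(ki, kj):
    """Largest distance between an element of ki and an element of kj."""
    return max(dis(a, b) for a, b in product(ki, kj))


def average_link(ki, kj):
    """Mean distance over all cross-cluster pairs."""
    return mean(dis(a, b) for a, b in product(ki, kj))


def centroid(cluster):
    """Coordinate-wise mean of the cluster's points."""
    n = len(cluster)
    return [sum(p[d] for p in cluster) / n for d in range(len(cluster[0]))]


def medoid(cluster):
    """Member with the smallest total distance to the other members."""
    return min(cluster, key=lambda c: sum(dis(c, p) for p in cluster))


def centroid_distance(ki, kj):
    return dis(centroid(ki), centroid(kj))


def medoid_distance(ki, kj):
    return dis(medoid(ki), medoid(kj))
```

On the same pair of clusters these measures can differ considerably, which is one reason the choice of measure affects which clusters a hierarchical algorithm merges first.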
Foreign Literature Translation: Data Warehouse Technology
Appendix 1: Original foreign-language text

Data Warehouse Technology

Data warehouse overview

The data warehouse is an environment, not a product. It provides users with current and historical data for decision support, data that is difficult or impossible to obtain from a traditional operational database. More concretely, the data warehouse is a kind of system architecture. Compared with customer relationship management, the data warehouse is a more familiar concept. It was put forward in 1991 by the American information engineering scholar Dr. W. H. Inmon, who defined it as follows: "a data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of the decision-making process".

The technical architecture of the data warehouse

· Data acquisition module: obtains data from source files and source databases, cleans and transfers it, and adds it to the data warehouse database.
· Data management module: manages the operation of the data warehouse.
· Delivery module: distributes data warehouse data to other warehouses and to external systems.
· Middleware module: gives end-user tools a way to access the data warehouse database.
· Data access module: provides the enterprise's end users with tools to access and analyze data warehouse data.
· Design module: used to design the data warehouse database.
· Information catalogue module: provides administrators and users with information about the contents and meaning of the data stored in the data warehouse database.

How to establish the data warehouse

At present, the computer systems within an enterprise are mutually independent. In some systems the validity of the data has to be confirmed from another system, and the various data lack integration. Data warehouse technology is one of the most effective ways to bring these data together. Building a data warehouse can achieve interoperation among the various systems at the logical level, which lays a foundation for the development of a modern college and provides a strong guarantee for scientific decision-making by college leaders. Establishing a data warehouse requires the following steps:

1. Establish a data model oriented to business needs. The design of the data model should consider not only the first subject, but also the needs of the other management and decision subjects in the college and the needs of the various data queries and reports.
2. Model the data for the chosen subject. According to the decision needs, determine the subject, select the data sources, and carry out the logical structure design.
3. Design the data warehouse database. Focus on the physical storage structure of the data in the subject-oriented data warehouse.
4. Define the data sources. According to the subject data model, choose different operational databases as the data sources.
5. Establish the metadata model. The metadata model determines the scope of the data that enters the data warehouse and provides descriptions of the relevant data. Complete metadata lets users know what data the warehouse actually holds, what the hierarchy and level of detail of the data are, what information can be provided, and how that information is calculated and organized.
6. Extract the data from the operational databases, transform it, and load it into the data warehouse database.
7. Select data access and analysis tools. Users will use these tools to access the information stored in the data warehouse and so meet their decision-support needs.

Data mining technology

With the continuous development of database technology and the wide application of database management systems in every profession, the amount of data accumulated in databases is growing sharply, but the amount of information in it that can be used directly is comparatively small. People have long hoped to analyze, at many levels, the information hidden beneath the surface of these data, so as to make better use of the data for the benefit of business operations and to increase competitiveness. Current database management systems can efficiently handle data entry, query, and statistics, but they cannot discover the relationships and rules that exist in the data. The result is the coexistence of a data explosion and a poverty of knowledge. According to surveys, the amount of data collected and stored grows by 130% every year, but only 2% of the data is analyzed effectively. This leaves vast room for the development of data mining. By 2004, data mining tools applied to the e-commerce market were projected to reach USD 1,000,000,000.

Appendix 2: Translation of the foreign-language material

Data Warehouse Technology

Data warehouse overview

The data warehouse is an environment, not a product.
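The extract, transform, and load step described above can be sketched as three small functions. This is only an illustrative sketch under loud assumptions: the operational table `marks`, the warehouse table `fact_marks`, their columns, and the 0 to 100 validity rule are all hypothetical and do not come from the text. It uses Python's standard `sqlite3` module for both the operational database and the warehouse.

```python
import sqlite3


def extract(source: sqlite3.Connection):
    """Extract: read raw rows from the (hypothetical) operational table."""
    return source.execute(
        "SELECT student_id, course, mark FROM marks"
    ).fetchall()


def transform(rows):
    """Transform: drop invalid marks and normalize course names."""
    cleaned = []
    for student_id, course, mark in rows:
        # Assumed validity rule: marks must lie between 0 and 100.
        if mark is None or not (0 <= mark <= 100):
            continue
        cleaned.append((student_id, course.strip().upper(), float(mark)))
    return cleaned


def load(warehouse: sqlite3.Connection, rows) -> None:
    """Load: append the cleaned rows to the (hypothetical) fact table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_marks "
        "(student_id INTEGER, course TEXT, mark REAL)"
    )
    warehouse.executemany("INSERT INTO fact_marks VALUES (?, ?, ?)", rows)
    warehouse.commit()


def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Run the three stages in order and report how many rows were loaded."""
    rows = transform(extract(source))
    load(warehouse, rows)
    return len(rows)
```

Keeping the three stages as separate functions mirrors the way the steps are listed, and makes it possible to test the cleaning rules without touching either database.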
Graduation Project (Logistics): Foreign Literature Translation, Chinese and English: Warehousing
Warehousing

This chapter presents a description of a small, fictitious warehouse that distributes office supplies and some office furniture to small retail stores and individual mail-order customers. The facility was purchased from another company, and it is larger than required for the immediate operation. The operation, currently housed in an older facility, will move in a few months. The owners foresee substantial growth in their high-quality product lines, so the extra space will accommodate the growth for the next few years. The description of the warehouse is of the planned operation after moving into the facility.

The purpose of this chapter is to introduce the reader to the operations of warehouses. Basic functions are described, typical equipment types are illustrated, and operations within departments are presented in some detail so that the reader can understand the relationships among products, orders, order lines, storage space, and labor requirements. Storage assignment and retrieval strategies are briefly discussed. Evaluation of the planned operation includes turnover, performance, and cost analyses. Additional information can be found in other chapters of this volume and in the reference material.

Role of the Warehouse in the Supply Chain

Warehouses can serve different roles within the larger organization. For example, a stock room serving a manufacturing facility must provide a fast response time. The major activities would be piece (item) picking, carton picking, and preparation of assembly kits (kitting). A mail-order retailer usually must provide a great variety of products in small quantities at low cost to many customers. A factory warehouse usually handles a limited number of products in large quantities. A large, discount chain warehouse typically “pushes” some products out to its retailers based on marketing campaigns, with other products being “pulled” by the store managers. Shipments are often full and half truckloads.
The warehouse described here is a small chain warehouse that carries a limited product line for distribution to its retailers and independent customers.
The purpose of the warehouse is to provide the utility of time and place to its customers, both retail and individual. Manufacturers of office supplies and furniture are usually not willing to supply products in the quantities requested by small retailers and individual customers. Production schedules often result in long runs and large lot sizes. Thus, manufacturers usually are not able to meet the delivery dates of small retailers and individuals. The warehouse bridges the gap and enables both parties, manufacturer and customer, to operate within their own spheres.
Product and Order Descriptions
1. Product Descriptions
The products handled include paper products, pens, staplers, small storage units, other desktop products, low-priced media like CD and DVD blanks, book and electronic titles, and office furniture. High-value electronic products are delivered directly from other distributors and not handled by the warehouse. One would say that the warehouse handles relatively low-value products from the viewpoint of manufacturing cost.
Products are sold by the warehouse as pieces, cartons, and on pallets. Figure 12.1 shows the relationships among these load types. Individuals usually request pieces; retailers may also request pieces of slow movers, products that are not in high demand. Retailers usually request fast movers, products that are in high demand, in carton quantities. Bulky products like large desktop storage units may be in high enough demand so that they are sold by the warehouse in pallets. Furniture units are also sold on pallets for ease of movement in the warehouse and in the delivery trucks.
[…] shows the number of products to be stored and the number of storage locations needed. The latter issue is discussed in Section […].
The typical dimensions of a piece are 10 × 25 × 3.5 cm, with a typical volume of 0.875 liters. A carton has typical dimensions of 33 × 43 × 30 cm, with a typical volume of 42.6 liters. Thus, a typical carton contains 48.7 pieces. The typical dimensions of a pallet are 80 × 120 × 140 cm, with the last dimension being the height. The pallet base is about 10 cm high, so the typical product volume is 1.25 m³, corresponding to 29.3 cartons. The pallet base allows for pickup by forklift truck from any of the four sides. Table 12.2 summarizes these values. Different products, of course, have different dimensions and relationships. The conversion factors can vary depending on whether the product is sold mainly in piece, carton, or pallet quantities. We will not introduce further complexity here and use the values given here for determining storage and labor requirements.
2. Order Descriptions
There are two types of orders processed at the warehouse. Large orders are placed by the retailers who belong to the same corporation; these are delivered by less-than-truckload (LTL) carrier. Small orders are placed by individuals, and these are delivered by package courier services like United States Postal Service (USPS), United Parcel Service (UPS), and Federal Express (FedEx). Large orders contain more products, and the quantity per product is greater than for small orders.
Pallet Pick Operations
Full pallet picking is done primarily in the floor storage area and occasionally in the pallet rack area. These pallets move directly to outbound staging. A forklift truck has the capacity to transport one pallet at a time. Travel within the pallet floor storage area follows the rectilinear distance metric (Francis et al. 1992).
Sorting, Packing, Staging, Shipping Operations
Pieces and cartons that are picked using batch picking must first be sorted by order before further processing.
The method of batch picking, described in the following, is designed to facilitate this process without requiring extensive conveyor equipment. In addition, all pieces must be packed into overpack cartons, and these are then consolidated with regular (single product) cartons by order. Some cartons and overpacks move to outbound staging for package courier services like USPS, UPS, and FedEx. Others move to outbound staging for LTL carrier service. The package courier services load their vehicles manually, and the LTL carriers are loaded by warehouse personnel using either forklift trucks or pallet jacks.
Support Operations, Rewarehousing, Returns Processing
At irregular times, the warehouse staff must perform additional functions that are not part of the normal process. Whenever a new store is being prepared for opening, a large quantity of product, for the full product line, must be picked and staged. There is a separate area set aside for this staging.
Occasionally, some products need to be repackaged and/or labeled for retail stores. This value-added processing is performed between picking and packing. Returned merchandise must be inspected, possibly repackaged, and then returned to storage locations. The volume is not significant, and it is handled in the value-added area. Periodically, product locations must be changed to reflect changing demand. This rewarehousing is performed during slack periods so as not to require additional labor.
In addition, the warehouse contains an office for management and sales personnel, toilets for both staff and truck drivers, and a break room with space for vending machines and dining. There is a battery charging room for the electric batteries used by forklifts and pallet jacks, and a small maintenance room.
Storage Department Descriptions and Operations
This section presents details on the individual storage departments and their operations.
Here we determine the storage space requirements, and we describe the pick methods and obtain labor requirements.
Bin Shelving
The bin shelving area contains 1000 slow moving products that are picked as pieces. They are housed in shelving units that are 40 cm deep, 180 cm high, and 100 cm wide, for a cubic volume of 0.72 m³. Using a cubic space utilization factor of 0.6 to allow for clearances and mismatches of carton dimensions with the shelves, each shelving unit can accommodate on average 0.72 × 0.6/0.0426 = 10.14 cartons. If each product requires at most one carton, then we need 1000/10.14 = 98.6, or 99, shelving units. Rounding this to 100 units, with shelving on both sides of the aisle, implies a pick line of 100/2 = 50 m. One way to implement this is to establish two pick aisles, each 25 m long, as shown in Figure 12.9. In the final layout, the system is expanded to a length of 30 m. In addition, space is provided for two future aisles. Although all the products stored here are considered slow movers, with some exceptions for products with small total required inventory measured in cubic volume, the principle of activity-based storage is extended further to identify the faster moving products (among the slow movers). These are placed in the ergonomically desirable golden zone.
The small number of requests per order for slow moving products makes it appropriate to use a sort-while-pick (SWP) method for retrieval. An order picker uses a cart with multiple compartments to pick items for several orders on one trip past the shelves. The compartments keep items for different orders from being mixed. Later, when the cart is moved to sorting, consolidation, and packing, there is actually little sorting work to do, but mainly consolidation and packing.
Warehouse Management
The operation of the warehouse requires careful and constant management. The scanning of received products is just one example of the functions performed by the warehouse management system (WMS). It is beyond the scope of this chapter to present details of a typical WMS.
However, some main features should be mentioned here.
The tracking of flows throughout the warehouse is one of the basic functions of a WMS. This can be done manually, but most facilities today use barcode scanners, and many use barcode scanners inte[…] database. A typical WMS enables the functions listed below. These requirements are not all-inclusive, but only indicate the types of functions desired. Further details are in (Sharp, 2001).
The WMS should enable scheduling of personnel, including regular full-time employees and temporary and part-time employees. Tracking of employee productivity is useful for training and workload balancing. Workload scheduling should be linked to forecast information, and product volumes should be automatically converted to labor hours by function and employee productivity.
[…] out-of-stock conditions, process partial receipts, and quarantine products requiring inspection. It should generate labels for pallets and cartons with data on SKU (unique product type), description, date received, lot or purchase order number, expiration code(s), and location code(s). It should assign storage locations recognizing physical characteristics of the product, physical characteristics of the location, environmental restrictions, and stock rotation. It should also have the ability to send products directly to outbound vehicles (cross-docking). The ability to schedule trucks and assign them to docks is also useful.
[…] confirmation of stow (storage) action, updating of inventory upon stow, stock reservation capability, and provision for cycle counting. The WMS should support more than one location per SKU and more than one SKU per location. Report generation should include stock activity reports (fast, medium, slow, dead), empty location reports, and anticipated replenishment of forward pick areas.
Warehousing
This chapter describes a small, fictitious warehouse that distributes office supplies and office furniture to small retail stores and individual mail-order customers.
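The load-unit conversions and the bin-shelving sizing worked out in the chapter can be re-derived in a few lines. The following is a minimal sketch in Python that uses only the dimensions and the 0.6 utilization factor quoted in the text; the function and variable names are hypothetical, and the carton volume is rounded to 42.6 liters as the chapter does.

```python
import math

# Typical load-unit dimensions quoted in the chapter, in centimetres.
PIECE_CM = (10, 25, 3.5)
CARTON_CM = (33, 43, 30)
PALLET_CM = (80, 120, 140)   # the last dimension is the height
PALLET_BASE_CM = 10          # height taken up by the pallet base


def volume_liters(dims_cm):
    """Volume in liters of a box given (length, width, height) in cm."""
    length, width, height = dims_cm
    return length * width * height / 1000.0


piece_l = volume_liters(PIECE_CM)
# The chapter works with the carton volume rounded to one decimal (42.6 liters).
carton_l = round(volume_liters(CARTON_CM), 1)
# Product volume on a pallet excludes the 10 cm base.
pallet_product_l = volume_liters(
    (PALLET_CM[0], PALLET_CM[1], PALLET_CM[2] - PALLET_BASE_CM)
)

pieces_per_carton = piece_l and carton_l / piece_l
cartons_per_pallet = pallet_product_l / carton_l


def shelving_units_needed(num_products, unit_dims_cm=(40, 180, 100), utilization=0.6):
    """Shelving units needed if each product occupies at most one carton."""
    cartons_per_unit = volume_liters(unit_dims_cm) * utilization / carton_l
    return math.ceil(num_products / cartons_per_unit)


units = shelving_units_needed(1000)
# With 1 m wide units on both sides of an aisle, 100 units give a 50 m pick line.
pick_line_m = 100 * 1.0 / 2

print(f"pieces per carton:  {pieces_per_carton:.1f}")
print(f"cartons per pallet: {cartons_per_pallet:.1f}")
print(f"shelving units:     {units}")
print(f"pick line length:   {pick_line_m:.0f} m")
```

Changing `utilization` or the shelving dimensions shows how sensitive the shelving count is to the clearance assumption, which is the main judgment call in this sizing step.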
Computer Science Graduation Project, Foreign Literature Translation: Data Warehouse
DATA WAREHOUSE
Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon: a way to keep customers by learning more about their needs.
“So,” you may ask, full of intrigue, “what exactly is a data warehouse?”
Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.
According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process.” This short but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.
(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales.
Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
(3) Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.
(4) Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.
In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.
“OK,” you now ask, “what, then, is data warehousing?”
Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation.
The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers” (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing” to refer only to the process of data warehouse construction, while the term warehouse DBMS is used to refer to the management and utilization of data warehouses. We will not make this distinction here.
“How are organizations using the information from data warehouses?” Many organizations are using this information to support business decision making activities, including:
(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending);
(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies;
(3) analyzing operations and looking for sources of profit;
(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.
Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. To integrate such data, and provide easy and efficient access to it, is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.
The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category.
When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.
Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.
1. Differences between operational database systems and data warehouses
Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.
The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems.
They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers” in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.
The major distinguishing features between OLTP and OLAP are summarized as follows.
(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.
(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.
(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.
(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores.
Because of their huge volume, OLAP data are stored on multiple storage media.
(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.
Other features which distinguish between OLTP and OLAP systems include database size, frequency of operations, performance metrics, and so on.
2. But why have a separate data warehouse?
“Since operational databases store huge amounts of data,” you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?”
A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned” queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.
Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation.
Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.
Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.
Data Warehouse
Data warehousing provides business operations with architectures and tools for systematically organizing, understanding, and using data to make decisions.
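The contrast drawn above between OLTP access (short lookups by primary key) and OLAP access (read-only aggregation over a star schema) can be made concrete with a small sketch. The example below uses Python's standard `sqlite3` module with an in-memory database; the table names (`dim_product`, `dim_time`, `fact_sales`) and the sample rows are hypothetical and chosen only for illustration, not taken from the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A tiny star schema: one fact table referencing two dimension tables.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    amount     REAL
);
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "stapler", "desktop"), (2, "copy paper", "paper")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, 2009, 4), (2, 2010, 1)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 10.0), (2, 1, 2, 15.0), (3, 2, 1, 40.0), (4, 2, 2, 60.0)])

# OLTP-style access: fetch one detailed record through its primary key.
oltp_row = conn.execute(
    "SELECT amount FROM fact_sales WHERE sale_id = ?", (3,)
).fetchone()

# OLAP-style access: read-only aggregation grouped by subject and time.
olap_rows = conn.execute("""
    SELECT p.category, t.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_time    AS t ON t.time_id    = f.time_id
    GROUP BY p.category, t.year
    ORDER BY p.category, t.year
""").fetchall()

print(oltp_row)
for row in olap_rows:
    print(row)
```

The point lookup touches a single row, whereas the aggregate scans and joins the whole fact table; on real data volumes that difference is the reason the text gives for keeping the two workloads on separate systems.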
Warehouse Management System: Foreign Literature Translation for an Undergraduate Thesis
At a bare minimum, a WMS should:
Have a flexible location system.
Utilize user-defined parameters to direct warehouse tasks and use live documents to execute these tasks.
The Reality:
The implementation of a WMS along with automated data collection will likely give you increases in accuracy, reduction in labor costs (provided the labor required to maintain the system is less than the labor saved on the warehouse floor), and a greater ability to service the customer. Expectations of inventory reduction and increased storage capacity are less likely. While increased accuracy and efficiencies in the receiving process may reduce the level of safety stock required, the impact of this reduction will likely be negligible in comparison to overall inventory levels. The predominant factors that control inventory levels are lot
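Foreign Literature Translation: Database
Undergraduate Graduation Project (Thesis)
Foreign Literature Translation
Original title: Database Management
Translated title: 数据库管理 (Database Management)
Author's department: Department of Computer Science and Engineering
Author's major: Network Engineering
Author's class: B07522
Author's name: Li Jianjian (李健健)
Author's student ID: 20074052232
Advisor's name: Zhao Liyan (赵丽艳)
Advisor's title: Lecturer
Completion date: November 2010
Prepared by the Academic Affairs Office, North China Institute of Aerospace Engineering (北华航天工业学院)
Notes: 1. When reviewing the translation, the advisor should check the following: (a) whether the translated foreign literature is closely related to the topic of the graduation project (thesis) and is listed among its foreign-language references; (b) whether the translated text reaches the required length (at least 3,000 characters); (c) whether the translation is accurate, fluent, and of reference value.
2. The original foreign-language text should be attached after the translation as an appendix.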
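Hebei University of Engineering (河北工程大学) Graduation Thesis (Design): Copy of the Original English Reference and Its Translation
Data Warehouse
Data warehousing provides business operations with architectures and tools for systematically organizing, understanding, and using data to make decisions.
Many organizations have found that data warehouse systems are valuable tools in today's competitive, fast-evolving world.
In the past several years, many companies have spent millions of dollars building enterprise-wide data warehouses.
Many people feel that, as competition intensifies across industries, data warehousing has become the latest must-have marketing weapon: a way to retain customers by learning more about their needs.
“So,” you may ask, full of intrigue, “what exactly is a data warehouse?” Data warehouses have been defined in many ways, which makes a rigorous definition difficult.
Loosely speaking, a data warehouse is a database that is maintained separately from an organization's operational databases.
Data warehouse systems allow a variety of application systems to be integrated, and they support information processing by providing a solid platform of consolidated historical data for analysis.
According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making.”
This short but comprehensive definition presents the major features of a data warehouse.
The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems (such as relational database systems, transaction processing systems, and file systems).
Let us take a closer look at each of these key features.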
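(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, supplier, product, and sales.
A data warehouse focuses on modeling and analyzing data for decision makers rather than on an organization's day-to-day operations and transaction processing.
Hence, a data warehouse excludes data that are not useful for decision making and provides a concise view of particular subjects.
(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records.
Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
(3) Time-variant: Data are stored to provide information from a historical perspective (for example, the past 5-10 years).
The key structures in the data warehouse contain, implicitly or explicitly, an element of time.
(4) Nonvolatile: A data warehouse is always a physically separate store of data; these data are derived from the application data in the operational environment.
Because of this separation, a data warehouse does not require transaction processing, recovery, or concurrency control mechanisms.
It usually requires only two kinds of data access: the initial loading of data and the access of data.
In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and holds the information an enterprise needs for its decisions.
A data warehouse is also often viewed as an architecture, constructed by integrating data from heterogeneous sources, that supports structured and ad hoc queries, analytical reporting, and decision making.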
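“OK,” you now ask, “what, then, is data warehousing?” Based on the discussion above, we view data warehousing as the process of constructing and using data warehouses.
The construction of a data warehouse requires data integration, data cleaning, and data consolidation.
Using a data warehouse often requires a collection of decision support technologies.
This allows “knowledge workers” (for example, managers, analysts, and executives) to use the warehouse to obtain an overview of the data quickly and conveniently, and to make sound decisions based on the information in the warehouse.
Some authors use the term “data warehousing” for the process of constructing a data warehouse, and the term “warehouse DBMS” for the management and use of data warehouses.
We will not make this distinction.
“How are organizations using the information from data warehouses?” Many organizations use this information to support business decision-making activities, including: (1) increasing customer focus, which includes analyzing customer buying patterns (such as buying preference, buying time, budget cycles, and spending habits); (2) repositioning products and managing product portfolios by comparing sales performance by quarter, year, and region, in order to fine-tune production strategies; (3) analyzing operations and looking for sources of profit; (4) managing customer relationships, making environmental corrections, and managing the cost of corporate assets.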
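Data warehousing is also very useful from the standpoint of heterogeneous database integration.
Many organizations collect diverse kinds of data and maintain large databases from multiple heterogeneous, autonomous, and distributed information sources.
Integrating such data and providing easy, efficient access to it is highly desirable, yet challenging.
Both the database industry and the research community have devoted much effort to this goal.
The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple heterogeneous databases.
Examples include IBM's Data Joiner and Informix's DataBlade.
When a query is posed to a client site, a metadata dictionary is first used to translate the query into queries appropriate for the individual heterogeneous sites involved.
These queries are then mapped and sent to local query processors.
The results returned from the different sites are integrated into a global answer set.
This query-driven approach requires complex information filtering and integration, and it competes for resources with processing at the local sources.
It is inefficient and expensive for frequent queries, especially queries that require aggregation.
Data warehousing offers an interesting alternative to this traditional approach to heterogeneous database integration.
Data warehousing uses an update-driven approach rather than a query-driven one.
In this approach, information from multiple heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis.
Unlike on-line transaction processing databases, data warehouses do not contain the most current information.
However, a data warehouse brings high performance to an integrated heterogeneous database system because data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantically consistent data store.
Query processing in the data warehouse does not interfere with processing at the local sources.
Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries.
As a result, data warehousing has become very popular in industry.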
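The update-driven approach described above can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration: the two in-memory "sources", their field names, and their encodings are hypothetical. The sketch shows data being cleaned into one consistent naming and encoding scheme and loaded in advance, so that later queries read only the integrated store and never contact the sources.

```python
from collections import defaultdict

# Two hypothetical heterogeneous sources with different field names,
# gender encodings, and currency units.
source_a = [
    {"cust_id": "C1", "gender": "F", "amount_usd": 120.0},
    {"cust_id": "C2", "gender": "M", "amount_usd": 80.0},
]
source_b = [
    {"customer": "C1", "sex": 0, "amount_cents": 3000},
    {"customer": "C3", "sex": 1, "amount_cents": 4550},
]


def clean_a(row):
    """Map a source A record onto the warehouse's naming convention."""
    return {"customer_id": row["cust_id"],
            "gender": row["gender"],
            "amount": row["amount_usd"]}


def clean_b(row):
    """Map a source B record: rename fields, recode gender, convert cents."""
    return {"customer_id": row["customer"],
            "gender": "M" if row["sex"] == 1 else "F",
            "amount": row["amount_cents"] / 100.0}


def load_warehouse():
    """Integrate all sources in advance (the update-driven step)."""
    return [clean_a(r) for r in source_a] + [clean_b(r) for r in source_b]


def total_by_customer(warehouse):
    """An analytical query that reads only the integrated store."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


warehouse = load_warehouse()
totals = total_by_customer(warehouse)
print(totals)
```

In a query-driven design, by contrast, `total_by_customer` would have to translate the request for each source and merge the answers at query time, which is the cost the text attributes to frequent aggregate queries.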
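1. Differences between operational database systems and data warehouses
Since most people are familiar with commercial relational database systems, comparing the two makes it easy to understand what a data warehouse is.
The major task of on-line operational database systems is to perform on-line transaction and query processing.
These systems are called on-line transaction processing (OLTP) systems.
They cover most of an organization's day-to-day operations, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting.
Data warehouse systems, on the other hand, serve users or “knowledge workers” in data analysis and decision making.
Such systems can organize and present data in various formats to accommodate the diverse needs of different users.
These systems are called on-line analytical processing (OLAP) systems.
The major differences between OLTP and OLAP are summarized as follows.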
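(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals.
An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.
(2) Data contents: An OLTP system manages current data.
Typically, such data are too detailed to be used easily for decision making.
An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity.
These features make the data easier to use for informed decision making.
(3) Database design: An OLTP system usually adopts an entity-relationship (ER) model and an application-oriented database design.
An OLAP system typically adopts a star or snowflake model and a subject-oriented database design.
(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data from different organizations.
In contrast, an OLAP system often spans multiple versions of a database schema because of the evolution of the organization.
OLAP systems also deal with information that originates from different organizations, integrating information from many data stores.
Because of their huge volume, OLAP data are also stored on multiple storage media.
(5) Access patterns: Access to an OLTP system consists mainly of short, atomic transactions.
Such a system requires concurrency control and recovery mechanisms.
Accesses to OLAP systems, however, are mostly read-only operations (since most data warehouses store historical rather than current data), although many may be complex queries.
Other differences between OLTP and OLAP include database size, frequency of operations, and performance metrics.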
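2. But why have a separate data warehouse?
“Since operational databases store huge amounts of data,” you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?” A major reason for the separation is to promote the high performance of both systems.
An operational database is designed for known tasks and workloads, such as indexing and hashing on primary keys, searching for particular records, and optimizing “canned” queries.
Data warehouse queries, on the other hand, are often complex: they involve computation over large amounts of data at summarized levels and may require special data organization, access methods, and implementation methods based on multidimensional views.
Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.
Moreover, an operational database supports the concurrent processing of multiple transactions and requires concurrency control and recovery mechanisms, such as locking and logging, to ensure the consistency and robustness of transactions.
An OLAP query usually needs only read-only access to data records for summarization and aggregation.
If concurrency control and recovery mechanisms were applied to such OLAP operations, they could jeopardize the execution of concurrent transactions and substantially reduce the throughput of the OLTP system.
Finally, data warehouses are separated from operational databases because the structure, content, and use of the data in the two systems differ.
Decision support requires historical data, whereas operational databases do not typically maintain historical data.
In this context, the data in operational databases, though abundant, are usually far from complete for decision making.
Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, producing high-quality, cleansed, and integrated data.
In contrast, operational databases maintain only detailed raw data (such as transactions), which must be consolidated before analysis.
Since the two systems provide quite different functionality and require different kinds of data, separate databases must be maintained.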
Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon: a way to keep customers by learning more about their needs.
“So,” you may ask, full of intrigue, “what exactly is a data warehouse?”
Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.
According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process.” This short but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.
(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales.
Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
(3) Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.
(4) Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.
In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.
“OK,” you now ask, “what, then, is data warehousing?”
Based on the above, we view data warehousing as the process of constructing and using data warehouses.
The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers” (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing” to refer only to the process of data warehouse construction, while the term warehouse DBMS is used to refer to the management and utilization of data warehouses. We will not make this distinction here.
“How are organizations using the information from data warehouses?” Many organizations are using this information to support business decision making activities, including:
(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending),
(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies,
(3) analyzing operations and looking for sources of profit,
(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.
Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources.
Integrating such data and providing easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community toward achieving this goal.
The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.
Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries.
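The difference between the query-driven and update-driven approaches can be illustrated with a minimal sketch. It assumes two hypothetical sources (store_a and store_b) whose table names, column names, and encodings are invented for illustration, and it uses in-memory SQLite databases as stand-ins for the autonomous sites. It is not meant to represent any particular mediator or warehouse product.

```python
import sqlite3

def make_source(schema_sql, insert_sql, rows):
    """Create an in-memory database standing in for one autonomous source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn

# Two hypothetical sources with different naming and encoding conventions:
# store_a records cents per lowercase region, store_b dollars per capitalized area.
store_a = make_source(
    "CREATE TABLE orders (region TEXT, amount_cents INTEGER)",
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 1250), ("south", 800), ("north", 450)],
)
store_b = make_source(
    "CREATE TABLE sales (area TEXT, total_dollars REAL)",
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 20.0), ("South", 5.5)],
)

def query_driven_totals():
    """Query-driven: translate the query for each source at query time,
    run it against every source, then merge the partial answers."""
    totals = {}
    for region, cents in store_a.execute(
        "SELECT region, SUM(amount_cents) FROM orders GROUP BY region"
    ):
        totals[region] = totals.get(region, 0.0) + cents / 100
    for area, dollars in store_b.execute(
        "SELECT area, SUM(total_dollars) FROM sales GROUP BY area"
    ):
        key = area.lower()
        totals[key] = totals.get(key, 0.0) + dollars
    return totals

def load_warehouse():
    """Update-driven: clean and integrate the sources once, in advance."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE sales_fact (region TEXT, amount REAL, source TEXT)")
    rows_a = store_a.execute("SELECT region, amount_cents FROM orders").fetchall()
    rows_b = store_b.execute("SELECT area, total_dollars FROM sales").fetchall()
    wh.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, 'a')",
        [(region, cents / 100) for region, cents in rows_a],
    )
    wh.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, 'b')",
        [(area.lower(), dollars) for area, dollars in rows_b],
    )
    wh.commit()
    return wh

def warehouse_totals(wh):
    """Later queries read only the warehouse and never touch the sources."""
    return dict(
        wh.execute("SELECT region, SUM(amount) FROM sales_fact GROUP BY region")
    )

warehouse = load_warehouse()
```

In this sketch, query_driven_totals() contacts both sources on every call, whereas warehouse_totals(warehouse) pays the integration cost once in load_warehouse() and then answers from the integrated store. The price is that the warehouse reflects the sources only as of the last load.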
As a result, data warehousing has become very popular in industry.
1. Differences between operational database systems and data warehouses
Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.
The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or "knowledge workers" in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.
The major distinguishing features between OLTP and OLAP are summarized as follows.
(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.
(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.
(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design.
An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.
(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.
(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.
Other features that distinguish OLTP from OLAP systems include database size, frequency of operations, and performance metrics.
2. But, why have a separate data warehouse?
"Since operational databases store huge amounts of data," you observe, "why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?"
A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing "canned" queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views.
Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.
Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access to data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.
Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.
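The contrast between OLTP and OLAP access patterns discussed in this section can be made concrete with a small sketch. It assumes two separate in-memory SQLite databases standing in for an operational store and a warehouse. All table and column names (orders, fact_sales, dim_date, dim_region) and the sample rows are hypothetical, and the warehouse uses a minimal star schema: one fact table keyed by two dimension tables.

```python
import sqlite3

# Operational (OLTP) store: current, detailed records changed by short transactions.
ops = sqlite3.connect(":memory:")
ops.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

def place_order(region, amount):
    """OLTP-style access: a short, atomic write of a single record."""
    with ops:  # commits on success, rolls back if an exception is raised
        cur = ops.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )
    return cur.lastrowid

# Warehouse (OLAP) store: historical data organized as a star schema.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date(date_id),
        region_id INTEGER REFERENCES dim_region(region_id),
        amount REAL
    );
    INSERT INTO dim_date VALUES (1, 2023, 1), (2, 2023, 2);
    INSERT INTO dim_region VALUES (1, 'East'), (2, 'West');
    INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 75.0);
""")

def sales_by_quarter_and_region():
    """OLAP-style access: a read-only aggregation over many fact rows."""
    return warehouse.execute("""
        SELECT d.year, d.quarter, r.name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_id = f.date_id
        JOIN dim_region AS r ON r.region_id = f.region_id
        GROUP BY d.year, d.quarter, r.name
        ORDER BY d.year, d.quarter, r.name
    """).fetchall()

first_order_id = place_order("East", 19.99)
summary = sales_by_quarter_and_region()
```

Here place_order touches one row inside a transaction, while sales_by_quarter_and_region scans and summarizes the whole fact table without modifying it. Keeping the two workloads on separate connections mirrors the separation argued for above, although a real deployment would also need a periodic load step from the operational store into the warehouse, which this sketch omits.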