DataStage Basic Training: Exercises


300 DataStage Interview Questions

1. What are the Environmental variables in Datastage?2. Check for Job Errors in datastage3. What are Stage V ariables, Derivations and Constants?4. What is Pipeline Parallelism?5. Debug stages in PX6. How do you remove duplicates in dataset7. What is the difference between Job Control and Job Sequence8. What is the max size of Data set stage?9. performance in sort stage10. How to develop the SCD using LOOKUP stage?12. What are the errors you expereiced with data stage13. what are the main diff between server job and parallel job in datastage14. Why you need Modify Stage?15. What is the difference between Squential Stage & Dataset Stage. When do u use them.16. memory allocation while using lookup stage17. What is Phantom error in the datastage. How to overcome this error.18. Parameter file usage in Datastage19. Explain the best approch to do a SCD type2 mapping in parallel job?20. how can we improve the performance of the job while handling huge amount of data21. HI How can we create read only jobs in Datastage.22. how to implement routines in data stage,have any one has any material for data stage23. How will you determine the sequence of jobs to load into data warehouse?24. How can we Test jobs in Datastage??25. DataStage - delete header and footer on the source sequential26. How can we implement Slowly Changing Dimensions in DataStage?.27. Differentiate Database data and Data warehouse data?28. How to run a Shell Script within the scope of a Data stage job?29. what is the difference between datastage and informatica30. Explain about job control language such as (DS_JOBS)32. What is Invocation ID?33. How to connect two stages which do not have any common columns between them?34. In SAP/R3, How do you declare and pass parameters in parallel job .35. Difference between Hashfile and Sequential File?36. How do you fix the error "OCI has fetched truncated data" in DataStage37. A batch is running and it is scheduled to run in 5 minutes. But after 10 days the time changes to 10 minutes. What type of error is this and how to fix it?38. Which partition we have to use for Aggregate Stage in parallel jobs ?39. What is the baseline to implement parition or parallel execution method in datastage job.e.g. more than 2 millions records only advised ?40. how do we create index in data satge?41. What is the flow of loading data into fact & dimensional tables?42. What is a sequential file that has single input link??43. Aggregators –What does the warning “Hash table has grown to …xyz‟ ….” mean?44. what is hashing algorithm?45. How do you load partial data after job failedsource has 10000 records, Job failed after 5000 records are loaded. This status of the job is abort , Instead of removing 5000 records from target , How can i resume the load46. What is Orchestrate options in generic stage, what are the option names. value ? Name of an Orchestrate operator to call. what are the orchestrate operators available in datastage for AIX environment.47. Type 30D hash file is GENERIC or SPECIFIC?48. Is Hashed file an Active or Passive Stage? When will be it useful?49. 
How do you extract job parameters from a file?50.1.What about System variables?2.How can we create Containers?3.How can we improve the performance of DataStage?4.what are the Job parameters?5.what is the difference between routine and transform and function?6.What are all the third party tools used in DataStage?7.How can we implement Lookup in DataStage Server jobs?8.How can we implement Slowly Changing Dimensions in DataStage?.9.How can we join one Oracle source and Sequential file?.10.What is iconv and oconv functions?51What are the difficulties faced in using DataStage ? or what are the constraints in using DataStage ?52. Have you ever involved in updating the DS versions like DS 5.X, if so tell us some the steps you have53. What r XML files and how do you read data from XML files and what stage to be used?54. How do you track performance statistics and enhance it?55. Types of vies in Datastage Director?There are 3 types of views in Datastage Director a) Job View - Dates of Jobs Compiled. b) Log View - Status of Job last run c) Status View - Warning Messages, Event Messages, Program Generated Messag56. What is the default cache size? How do you change the cache size if needed?Default cache size is 256 MB. We can incraese it by going into Datastage Administrator and selecting the Tunable Tab and specify the cache size over there.57. How do you pass the parameter to the job sequence if the job is running at night?58. How do you catch bad rows from OCI stage?59. what is quality stage and profile stage?60. what is the use and advantage of procedure in datastage?61. What are the important considerations while using join stage instead of lookups.62. how to implement type2 slowly changing dimenstion in datastage? give me with example?63. How to implement the type 2 Slowly Changing dimension in DataStage?64. What are Static Hash files and Dynamic Hash files?65. What is the difference between Datastage Server jobs and Datastage Parallel jobs?66. What is ' insert for update ' in datastage67. How did u connect to DB2 in your last project?Using DB2 ODBC drivers.68. How do you merge two files in DS?Either used Copy command as a Before-job subroutine if the metadata of the 2 files are same or created a job to concatenate the 2 files into one if the metadata is different.69. What is the order of execution done internally in the transformer with the stage editor having input links on the lft hand side and output links?70. How will you call external function or subroutine from datastage?71. What happens if the job fails at night?72. Types of Parallel Processing?Parallel Processing is broadly classified into 2 types. a) SMP - Symmetrical Multi Processing. b) MPP - Massive Parallel Processing.73. What is DS Administrator used for - did u use it?74. How do you do oracle 4 way inner join if there are 4 oracle input files?75. How do you pass filename as the parameter for a job?76. How do you populate source files?77. How to handle Date convertions in Datastage? Convert a mm/dd/yyyy format to yyyy-dd-mm? We use a) "Iconv" function - Internal Convertion. b) "Oconv" function - External Convertion. Function to convert mm/dd/yyyy format to yyyy-dd-mm is Oconv(Iconv(Filedname,"D/M78. How do you execute datastage job from command line prompt?Using "dsjob" command as follows. dsjob -run -jobstatus projectname jobname79. Differentiate Primary Key and Partition Key?Primary Key is a combination of unique and not null. It can be a collection of key values called as composite primary key. 
Partition Key is a just a part of Primary Key. There are several methods of80 How to install and configure DataStage EE on Sun Micro systems multi-processor hardware running the Solaris 9 operating system?Asked by: Kapil Jayne81. What are all the third party tools used in DataStage?82. How do you eliminate duplicate rows?83. what is the difference between routine and transform and function?84. Do you know about INTEGRITY/QUALITY stage?85. how to attach a mtr file (MapTrace) via email and the MapTrace is used to record all the execute map errors86. Is it possible to calculate a hash total for an EBCDIC file and have the hash total stored as EBCDIC using Datastage?Currently, the total is converted to ASCII, even tho the individual records are stored as EBCDIC.87. If your running 4 ways parallel and you have 10 stages on the canvas, how many processes does datastage create?88. Explain the differences between Oracle8i/9i?89. How will you pass the parameter to the job schedule if the job is running at night? What happens if one job fails in the night?90. what is an environment variable??91. how find duplicate records using transformer stage in server edition92. what is panthom error in data stage93. How can we increment the surrogate key value for every insert in to target database94. what is the use of environmental variables?95. how can we run the batch using command line?96. what is fact load?97. Explain a specific scenario where we would use range partitioning ?98. what is job commit in datastage?99. hi..Disadvantages of staging area Thanks,Jagan100. How do you configure api_dump102. Does type of partitioning change for SMP and MPP systems?103. what is the difference between RELEASE THE JOB and KILL THE JOB?104. Can you convert a snow flake schema into star schema?105. What is repository?106. What is Fact loading, how to do it?107. What is the alternative way where we can do job control??108.Where we can use these Stages Link Partetionar, Link Collector & Inter Process (OCI) Stage whether in Server Jobs or in Parallel Jobs ?And SMP is a Parallel or Server ?109. Where can you output data using the Peek Stage?110. Do u know about METASTAGE?111. In which situation,we are using RUN TIME COLUMN PROPAGA TION option?112. what is the difference between datasatge and datastage TX?113. 1 1. Difference between Hashfile and Sequential File?. What is modulus?2 2. What is iconv and oconv functions?.3 3. How can we join one Oracle source and Sequential file?.4 4. How can we implement Slowly Changing Dimensions in DataStage?.5 5. How can we implement Lookup in DataStage Server jobs?.6 6. What are all the third party tools used in DataStage?.7 7. what is the difference between routine and transform and function?.8 8. what are the Job parameters?.9 9. Plug-in?.10 10.How can we improv114. Is it possible to query a hash file? Justify your answer...115. How to enable the datastage engine?116. How I can convert Server Jobs into Parallel Jobs?117. Suppose you have table "sample" & three columns in that tablesample:Cola Colb Colc1 10 1002 20 2003 30 300Assume: cola is primary keyHow will you fetch the record with maximum cola value using data stage tool into the target system118. How to parametarise a field in a sequential file?I am using Datastage as ETL Tool,Sequential file as source.119. What is TX and what is the use of this in DataStage ? As I know TX stand for Transformer Extender, but I don't know how it will work and where we will used ?120. What is the difference betwen Merge Stage and Lookup Stage?121. 
Importance of Surrogate Key in Data warehousing?Surrogate Key is a Primary Key for a Dimension table. Most importance of using it is it is independent of underlying database. i.e Surrogate Key is not affected by the changes going on with a databas122. What is the difference between Symetrically parallel processing,Massively parallel processing?123.What is the diffrence between the Dynamic RDBMS Stage & Static RDBMS Stage ?124. How to run a job using command line?125. What is user activity in datastage?126. how can we improve the job performance?127. how we can create rank using datastge like in informatica128. What is the use of job controle??129. What does # indicate in environment variables?130. what are two types of hash files??131. What are different types of star schema??132. what are different types of file formats??133. What are different dimension table in your project??Plz explain me with an example?? 134. what is the difference between buildopts and subroutines ?135. how can we improve performance in aggregator stage??136. What is SQL tuning? how do you do it ?137. What is the use of tunnable??138. how to distinguish the surogate key in different dimensional tables?how can we give for different dimension tables?139. how can we load source into ODS?140. What is the difference between sequential file and a dataset? When to use the copy stage?141. how to eleminate duplicate rows in data stage?142. What is complex stage? In which situation we are using this one?143. What is the sequencer stage??144. where actually the flat files store?what is the path?145. what are the different types of lookups in datastage?146. What are the most important aspects that a beginner must consider doin his first DS project ?147. how to find errors in job sequence?148. it is possible to access the same job two users at a time in datastage?149. how to kill the job in data stage?150. how to find the process id?explain with steps?151. Why job sequence is use for? what is batches?what is the difference between job sequence and batches?152. What is Integrated & Unit testing in DataStage ?153. What is iconv and oconv functions?154. For what purpose is the Stage Variable is mainly used?155. purpose of using the key and difference between Surrogate keys and natural key156. how to read the data from XL FILES?my problem is my data file having some commas in data,but we are using delimitor is| ?how to read the data ,explain with steps?157. How can I schedule the cleaning of the file &PH& by dsjob?158. Hot Fix for ODBC Stage for AS400 V5R4 in Data Stage 7.1159. what is data stage engine?what is its purpose?160. What is the difference between Transform and Routine in DataStage?161. what is the meaning of the following..1)If an input file has an excessive number of rows and can be split-up then use standard 2)logic to run jobs in parallel3)Tuning should occur on a job-by-job basis. Use the power of DBMS.162. Why is hash file is faster than sequential file n odbc stage??163. Hello,Can both Source system(Oracle,SQLServer,...etc) and Target Data warehouse(may be oracle,SQLServer..etc) can be on windows environment or one of the system should be in UNIX/Linux environment.Thanks,Jagan164. How to write and execute routines for PX jobs in c++?165. what is a routine?166. how to distinguish the surrogate key in different dimentional tables?167. how can we generate a surrogate key in server/parallel jobs?168. what is NLS in datastage? how we use NLS in Datastage ? what advantages in that ? 
at thetime of installation i am not choosen that NLS option , now i want to use that options what can i do ? to reinstall that datastage or first uninstall and install once again ?169. how to read the data from XL FILES?explain with steps?170. whats the meaning of performance tunning techinque,Example??171. differentiate between pipeline and partion parallelism?172. What is the use of Hash file??insted of hash file why can we use sequential file itself?173. what is pivot stage?why are u using?what purpose that stage will be used?174. How did you handle reject data?175. Hiwhat is difference betweend ETL and ELT?176. how can we create environment variables in datasatage?177. what is the difference between static hash files n dynamic hash files?178. how can we test the jobs?179. What is the difference between reference link and straight link ?180. What are the command line functions that import and export the DS jobs?181. what is the size of the flat file?182. Whats difference betweeen operational data stage (ODS) & data warehouse?183. I have few questions1. What ar ethe various process which starts when the datastage engine starts?2. What are the changes need to be done on the database side, If I have to use dB2 stage?3. datastage engine is responsible for compilation or execution or both?184. Could anyone plz tell abt the full details of Datastage Certification.Title of Certification?Amount for Certification test?Where can v get the Tutorials available for certification?Who is Conducting the Certification Exam?Whether any training institute or person for guidens?I am very much pleased if anyone enlightwn me abt the above saidSuresh185. how to use rank&updatestratergy in datastage186. What is Ad-Hoc access? What is the difference between Managed Query and Ad-Hoc access?187. What is Runtime Column Propagation and how to use it?188. how we use the DataStage Director and its run-time engine to schedule running the solution, testing and debugging its components, and monitoring the resulting e/xecutable versions on ad hoc or scheduled basis?189. What is the difference bitween OCI stage and ODBC stage?190. Is there any difference b/n Ascential DataStage and DataStage.191. How do you remove duplicates without using remove duplicate stage?192. if we using two sources having same meta data and how to check the data in two sorces is same or nif we using two sources having same meta data and how to check the data in two sorces is same or not?and if the data is not same i want to abort the job ?how we can do this?193. If a DataStage job aborts after say 1000 records, how to continue the job from 1000th record after fixing the error?194. Can you tell me for what puorpse .dsx files are used in the datasatage195. how do u clean the datastage repository.196. give one real time situation where link partitioner stage used?197. What is environment variables?what is the use of this?198. How do you call procedures in datastage?199. How to remove duplicates in server job200. What is the exact difference betwwen Join,Merge and Lookup Stage??202. What are the new features of Datastage 7.1 from datastage 6.1203. How to run the job in command prompt in unix?204. How to know the no.of records in a sequential file before running a server job?205. Other than Round Robin, What is the algorithm used in link collecter? Also Explain How it will works?206. how to drop the index befor loading data in target and how to rebuild it in data stage?207. How can ETL excel file to Datamart?208. 
what is the transaction size and array size in OCI stage?how these can be used?209. what is job control?how it is developed?explain with steps?210. My requirement is like this :Here is the codification suggested: SALE_HEADER_XXXXX_YYYYMMDD.PSVSALEMy requirement is like this :Here is the codification suggested: SALE_HEADER_XXXXX_YYYYMMDD.PSVSALE_LINE_XXXXX_YYYYMMDD.PSVXXXXX = LVM sequence to ensure unicity and continuity of file exchangesCaution, there will an increment to implement.YYYYMMDD = LVM date of file creation COMPRESSION AND DELIVERY TO: SALE_HEADER_XXXXX_YYYYMMDD.ZIP AND SALE_LINE_XXXXX_YYYYMMDD.ZIPif we run that job the target file names are like this sale_header_1_20060206 & sale_line_1_20060206.If we run next time means the211. what is the purpose of exception activity in data stage 7.5?212. How to implement slowly changing dimentions in Datastage?213. What does separation option in static hash-file mean?214. how to improve the performance of hash file?215. Actually my requirement is like that :Here is the codification suggested: SALE_HEADER_XXXXX_YYYYMMActually my requirement is like that :Here is the codification suggested: SALE_HEADER_XXXXX_YYYYMMDD.PSVSALE_LINE_XXXXX_YYYYMMDD.PSVXXXXX = LVM sequence to ensure unicity and continuity of file exchangesCaution, there will an increment to implement.YYYYMMDD = LVM date of file creation COMPRESSION AND DELIVERY TO: SALE_HEADER_XXXXX_YYYYMMDD.ZIP AND SALE_LINE_XXXXX_YYYYMMDD.ZIPif we run that job the target file names are like this sale_header_1_20060206 & sale_line_1_20060206.if we run next216. How do u check for the consistency and integrity of model and repository?217. how we can call the routine in datastage job?explain with steps?218. what is job control?how can it used explain with steps?219. how to find the number of rows in a sequential file?220. If the size of the Hash file exceeds 2GB..What happens? Does it overwrite the current rows?221. where we use link partitioner in data stage job?explain with example?222 How i create datastage Engine stop start script.Actually my idea is as below.!#bin/bashdsadm - usersu - rootpassword (encript)DSHOMEBIN=/Ascential/DataStage/home/dsadm/Ascential/DataStage/DSEngine/binif check ps -ef | grep DataStage (client connection is there) { kill -9 PID (client connection) }uv -admin - stop > dev/nulluv -admin - start > dev/nullverify processcheck the connectionecho "Started properly"run it as dsadm223. can we use shared container as lookup in datastage server jobs?224. what is the meaning of instace in data stage?explain with examples?225. wht is the difference beteen validated ok and compiled in datastage.226. hi all what is auditstage,profilestage,qulaitystages in datastge please explain indetail227what is PROFILE STAGE , QUALITY STAGE,AUDIT STAGE in datastage..please expalin in detail.thanks in adv228. what are the environment variables in datastage?give some examples?229. What is difference between Merge stage and Join stage?230. Hican any one can explain what areDB2 UDB utilitiesub231. What is the difference between drs and odbc stage232. Will the data stage consider the second constraint in the transformer once the first condition is satisfied ( if the link odering is given)233. How do you do Usage analysis in datastage ?234. how can u implement slowly changed dimensions in datastage? explain?2) can u join flat file and database in datastage?how?235. How can you implement Complex Jobs in datastage236. DataStage from Staging to MDW is only running at 1 row per second! 
What do we do to remedy?237. what is the mean of Try to have the constraints in the 'Selection' criteria of the jobs iwhat is the mean of Try to have the constraints in the 'Selection' criteria of the jobs itself. This will eliminate the unnecessary records even getting in before joins are made?238. * What are constraints and derivation?* Explain the process of taking backup in DataStage?*What are the different types of lookups available in DataStage?239. # How does DataStage handle the user security?240. What are the Steps involved in development of a job in DataStage?241. What is a project? Specify its various components?242. What does a Config File in parallel extender consist of?Config file consists of the following. a) Number of Processes or Nodes. b) Actual Disk Storage Location.243. how to implement type2 slowly changing dimensions in data stage?explain with example?244. How much would be the size of the database in DataStage ?What is the difference between Inprocess and Interprocess ?245. Briefly describe the various client components?246. What are orabulk and bcp stages?247. What is DS Director used for - did u use it?248. what is meaning of file extender in data stage server jobs.can we run the data stage job from one job to another job that file data where it is stored and what is the file extender in ds jobs.249. What is the max capacity of Hash file in DataStage?250. what is merge and how it can be done plz explain with simple example taking 2 tables .......251. it is possible to run parallel jobs in server jobs?252. what are the enhancements made in datastage 7.5 compare with 7.0253. If I add a new environment variable in Windows, how can I access it in DataStage?254. what is OCI?255. Is it possible to move the data from oracle ware house to SAP Warehouse using withDA TASTAGE Tool.256. How can we create Containers?257. what is data set? and what is file set?258. How can I extract data from DB2 (on IBM iSeries) to the data warehouse via Datastage as the ETL tool. I mean do I first need to use ODBC to create connectivity and use an adapter for the extraction and transformation of data? Thanks so much if anybody could provide an answer.259. it is possible to call one job in another job in server jobs?260. how can we pass parameters to job by using file.261. How can we implement Lookup in DataStage Server jobs?262. what user varibale activity when it used how it used !where it is used with real example263. Did you Parameterize the job or hard-coded the values in the jobs?Always parameterized the job. Either the values are coming from Job Properties or from a …Parameter Manager‟ – a third part tool. There is no way you will hard–code some parameters in your jobs. The o264. what is hashing algorithm and explain breafly how it works?265. what happends out put of hash file is connected to transformer ..what error it throughs266. what is merge ?and how to use merge? merge is nothing but a filter conditions that have been used for filter condition267. What will you in a situation where somebody wants to send you a file and use that file as an input What will you in a situation where somebody wants to send you a file and use that file as an input or reference and then run job.268. What is the NLS equivalent to NLS oracle code American_7ASCII on Datastage NLS?269. Why do you use SQL LOADER or OCI STAGE?270. What about System variables?271. what are the differences between the data stage 7.0 and 7.5in server jobs?272. 
How the hash file is doing lookup in serverjobs?How is it comparing the key values?273. how to handle the rejected rows in datastage?274. how is datastage 4.0 functionally different from the enterprise edition now?? what are the exact changes?275. What is Hash file stage and what is it used for?Used for Look-ups. It is like a reference table. It is also used in-place of ODBC, OCI tables for better performance.276. What is the utility you use to schedule the jobs on a UNIX server other than using Ascential Director?Use crontab utility along with d***ecute() function along with proper parameters passed.277. How can I connect my DB2 database on AS400 to DataStage? Do I need to use ODBC 1st to open the database connectivity and then use an adapter for just connecting between the two? Thanks alot of any replies.278. what is the OCI? and how to use the ETL Tools?OCI means orabulk data which used client having bulk data its retrive time is much more ie., your used to orabulk data the divided and retrived Asked by: ramanamv279. what is difference between serverjobs & paraller jobs280. What is the difference between Datastage and Datastage TX?281. Hi!Can any one tell me how to extract data from more than 1 hetrogenious Sources.mean, example 1 sequenal file, Sybase , Oracle in a singale Job.282. How can we improve the performance of DataStage jobs?283. How good are you with your PL/SQL?On the scale of 1-10 say 8.5-9284. What are OConv () and Iconv () functions and where are they used?IConv() - Converts a string to an internal storage formatOConv() - Converts an expression to an output format.285. If data is partitioned in your job on key 1 and then you aggregate on key 2, what issues could arise?286. How can I specify a filter command for processing data while defining sequential file output data?287. There are three different types of user-created stages available for PX. What are they? Which would you use? What are the disadvantage for using each type?288. What is DS Manager used for - did u use it?289. What are Sequencers?Sequencers are job control programs that execute other jobs with preset Job parameters.290. Functionality of Link Partitioner and Link Collector?291. Containers : Usage and Types?Container is a collection of stages used for the purpose of Reusability. There are 2 types of Containers. a) Local Container: Job Specific b) Shared Container: Used in any job within a project.292. Does Enterprise Edition only add the parallel processing for better performance?Are any stages/transformations available in the enterprise edition only?293. what are validations you perform after creating jobs in designer.what r the different type of errors u faced during loading and how u solve them294. how can you do incremental load in datastage?295. how we use NLS function in Datastage? what are advantages of NLS function? where we can use that one? explain briefly?296. Dimension Modelling types along with their significanceData Modelling is Broadly classified into 2 types. a) E-R Diagrams (Entity - Relatioships). b) Dimensional Modelling.297. Did you work in UNIX environment?Yes. One of the most important requirements.298. What other ETL's you have worked with?Informatica and also DataJunction if it is present in your Resume.299. What is APT_CONFIG in datastage300. Does the BibhudataStage Oracle plug-in better than OCI plug-in coming from DataStage? What is theBibhudataStage extra functions?301. How do we do the automation of dsjobs?302. what is trouble shhoting in server jobs ? 
what are the different kinds of errors encountered while ...
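For one of the questions above, fetching the record with the maximum cola value from the sample table, a plain-SQL sketch of the usual answer looks like this (a DataStage job would typically reproduce the same logic with an Aggregator or a sort before the final output):

    -- Returns the row(s) of SAMPLE holding the largest COLA value.
    SELECT cola, colb, colc
    FROM   sample
    WHERE  cola = (SELECT MAX(cola) FROM sample);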

DataStage Official Training Course 10

DataStage is an ETL tool that provides a wide range of data connectivity options along with data transformation and cleansing capabilities, helping enterprises build and manage data warehouses.

As the world becomes increasingly data-driven, the demand for data management and ETL applications grows ever more pressing.

DataStage Official Training Course 10 covers the common DataStage tasks and operations and gives users learning DataStage complete guidance.

This article analyzes and works through DataStage Official Training Course 10 to help readers better understand and master DataStage.

Structure and content: DataStage Official Training Course 10 is an official training text based on DataStage version 11.7.

It contains 15 units organized into 4 parts.

Part 1 introduces the DataStage overview and installation process, including the DataStage architecture, components, and workflow.

Part 2 covers defining data sources, moving data, and transforming data in DataStage.

Part 3 focuses on error handling and debugging, including logs, reports, and source and target checks.

Part 4 covers advanced topics such as DataStage administration, performance tuning, shared resources, and integrating JDBC drivers.

Study methods and tips: DataStage Official Training Course 10 is a detailed text that takes patience and time to work through.

The following study methods and tips can help readers master DataStage more effectively.

1. Follow the structure of the material: work through each part and unit in order, so that you gradually deepen your understanding of each topic.

In particular, when studying the first two parts, read the concepts and procedures for defining data sources, moving data, and transforming data carefully, and understand why they matter and how they relate to each other.

2. Work through the examples in full: the material provides many examples demonstrating the different aspects of DataStage, and readers can reproduce them in their own DataStage environment to deepen their understanding and proficiency.

Note that the advanced topics call for some hands-on experience and skill; without it they may take considerably more time and effort.

Getting Started with DataStage Application Development (A Detailed Example)

1 DataStage overview
DataStage consists of four main components: Administrator, Manager, Designer, and Director.

1. Use DataStage Administrator to create or delete projects and to set project-wide properties such as permissions.

2. Use DataStage Designer to connect to a given project and design jobs.

3. Use DataStage Director to run and monitor jobs, for example to set the schedule for a job you have designed.

4. Use DataStage Manager for job-management tasks such as backing up jobs.

2 Designing an example job
2.1 Preparing the environment
Goal: move the data in the source table into the target table.

1. Database: posuser/posuser@WHORADB, IP: 192.168.100.88
2. Source table: a_test_from
3. Target table: a_test_to
The two tables have the same structure; reference DDL:

    create table A_TEST_FROM
    (
      ID INTEGER not null,
      CR_SHOP_NO CHAR(15),
      SHOP_NAME VARCHAR2(80),
      SHOP_TEL CHAR(20),
      YEAR_INCOME NUMBER(16,2),
      SHOP_CLOSE_DATE DATE,
      SHOP_OPEN_DATE DATE
    );
    alter table A_TEST_FROM add constraint TEST primary key (ID);

4. Sample data:

    insert into A_TEST_FROM (ID, CR_SHOP_NO, SHOP_NAME, SHOP_TEL, YEAR_INCOME, SHOP_CLOSE_DATE, SHOP_OPEN_DATE)
    values (24402, '105420580990038', '宜昌市云集门诊部', '82714596 ', 1000, to_date('01-05-2008', 'dd-mm-yyyy'), to_date('01-06-2008', 'dd-mm-yyyy'));
    insert into A_TEST_FROM (ID, CR_SHOP_NO, SHOP_NAME, SHOP_TEL, YEAR_INCOME, SHOP_CLOSE_DATE, SHOP_OPEN_DATE)
    values (24403, '105420559982198', '于志良', '82714596 ', 2000, to_date('02-05-2008', 'dd-mm-yyyy'), to_date('02-06-2008', 'dd-mm-yyyy'));
    insert into A_TEST_FROM (ID, CR_SHOP_NO, SHOP_NAME, SHOP_TEL, YEAR_INCOME, SHOP_CLOSE_DATE, SHOP_OPEN_DATE)
    values (24404, '105420556410012', '阳光儿童广场', '82714596 ', 3000, to_date('03-05-2008', 'dd-mm-yyyy'), to_date('03-06-2008', 'dd-mm-yyyy'));
    insert into A_TEST_FROM (ID, CR_SHOP_NO, SHOP_NAME, SHOP_TEL, YEAR_INCOME, SHOP_CLOSE_DATE, SHOP_OPEN_DATE)
    values (24405, '105420580620033', '秭归县医疗中心', '82714596 ', 4000, to_date('04-05-2008', 'dd-mm-yyyy'), to_date('04-06-2008', 'dd-mm-yyyy'));
    insert into A_TEST_FROM (ID, CR_SHOP_NO, SHOP_NAME, SHOP_TEL, YEAR_INCOME, SHOP_CLOSE_DATE, SHOP_OPEN_DATE)
    values (24406, '105420559120063', '同德医药零售北门连锁店', '82714596 ', 5000, to_date('05-05-2008', 'dd-mm-yyyy'), to_date('05-06-2008', 'dd-mm-yyyy'));

2.2 Opening Designer
Task: open DataStage Designer and connect to the DataStage server. 1. Double-click the DataStage Designer icon on the desktop.
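Because A_TEST_TO is described above as having exactly the same structure as A_TEST_FROM, the effect of this example job can be summarized by the following SQL, given here only as a reference for checking the job's result, not as part of the job itself:

    -- Reference only: the Designer job should produce the same rows in A_TEST_TO.
    INSERT INTO a_test_to
    SELECT id, cr_shop_no, shop_name, shop_tel, year_income, shop_close_date, shop_open_date
    FROM   a_test_from;
    COMMIT;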

DataStage Basic Training (PPT Slides)

Global variables vs. job variables
• Global variables -- lifetime: the whole project -- defined in Administrator
• Job variables -- lifetime: a single job -- defined in Designer or Manager
Demo: define a job variable (a parameter defined in Designer)
Metadata definition
Debug and Tuning
• View Status and Logs - several views: status, log, detail - used together with Monitor for troubleshooting and tuning
Job Status
• Not Compiled • Compiled • Reset • Running • Finished • Finished (with warning) • Abort
Demo: building a fact table (detail table → join → aggregate → fact table)
Hash File
• Uses: -- as the reference (secondary) table in a left join -- for data sets that are read many times -- to hold other temporary data
• Key points: -- a key must be specified -- the output column positions must match the input
Transformer
• Uses: -- provides a rich set of operators and functions -- data cleansing and transformation -- joining multiple data sources
DataStage Basic Training
Jerry 2006.03
Agenda
• Hello World • DataStage Components • Define Parameter & Table • Hash File, Transformer, Aggregator • Director & Monitor • Administrator & Manager • Routine & Control

DataStage Introductory Training
Designer: Creates DataStage jobs that are compiled into executable programs
Director: Used to run and monitor the DataStage jobs
Manager: Allows you to view and edit the contents of the repository
Designer: Clear job log; Set Director options: Row limits, Abort after x warnings
Director Log View
Click the Log button in the toolbar to view the job log. The job log records events that occur during the execution of a job.
DataStage Director
DataStage Designer
What Is a Job?
• Executable DataStage program
• Created in DataStage Designer, but can use components from Manager
• Built using a graphical user interface
• Compiles into Orchestrate shell language (OSH)
DataStage Manager

DataStage Beginner Tutorial

DataStage summary
I. Installing DataStage
A. Installing the server: install the virtual machine (the registration code is in the file) -> unpack the DataStage server package redhat3__Datastage -> run Red Hat Enterprise Linux 3 from the unpacked files -> install -> start the virtual machine -> check the virtual machine's IP address and verify from a DOS window that it can be reached -> open SecureCRT and connect to the virtual machine -> open /app/oracle/product/10.2/network/admin/tnsnames.ora -> press E, then i, to enter edit mode -> set the IP address to the local machine's IP and choose your own database instance name -> press ESC and then :wq to exit -> done.
B. Installing the client: unpack the DataStage client package Datastageclient -> run datastage7.5.3\datastage client from the unpacked files -> install -> the registration code is in "datastage7.5.1 download address and license" -> done.

II. The main DataStage stages
1. Transformer (Oracle -> Transformer -> file)
Source Oracle stage settings: Properties\Source\Read method = Auto-generated SQL; Properties\Source\Table = the table to load. Click Connection and set Remote server = the database instance name, User = scott, Password = tiger. Under Columns set suitable lengths -> Load -> Oracle 9i -> pick the table to import -> OK.
Note: if you do not know the table's layout, under Columns set suitable lengths -> Load -> Import -> Plug-in Meta Data Definitions -> Oracle 9i -> OK -> enter the database instance name, user name, and password -> OK -> browse under the scott user -> pick the table -> import.
Transformer settings: drag across the columns you want to output -> OK.
Target file (Sequential File) settings: under Properties set File = the output path and First line is column names = True. Under Format, at Record level add Record delimiter = UNIX newline; under Field defaults add Null field value = 0 and Quote = none. Under Columns set suitable lengths -> OK.
For the stages below, whenever the source or target is Oracle or a file, configure it the same way as in this Transformer example.
2. Re-store (file -> Transformer -> file): Transformer settings: drag across the columns to output -> OK.
3. Load (file -> Transformer -> Oracle): Transformer settings: drag across the columns to output -> OK.
4. Copy (file -> Copy -> several files): one input, several outputs. Copy settings: on the Stage tab, when there is only one input and one output it is best to set Force = True; on the Output tab drag across the columns to output -> OK.
5. Filter (file -> Filter -> several files): one input, several outputs. Filter settings: Stage\Properties\Where clause = the filter condition; clicking the Where clause shows Output link = the link value (see the matching value under Link Ordering); on the Output tab drag across the columns -> OK.
6. Join (several Oracle sources -> Join -> file): multi-table join. Join settings: Stage\Properties\Join keys\Key = the join column; Options\Join type = the join type (inner, full, left, or right); on the Output tab drag across the columns -> OK.
7. Lookup (several Oracle sources -> Lookup -> file): data lookup. Lookup settings: connect the key columns, then drag across the columns to output.
8. Merge (several files -> Merge -> file): merging matching data. Merge settings: Stage\Properties\Merge keys\Key = the column, Sort order = the sort; Options\Unmatched masters mode = keep or drop; on the Output tab drag across the columns -> OK.
9. Funnel (several files -> Funnel -> file): combining data sets. Funnel settings: Stage\Properties\Options\Funnel type = the funnel mode; on the Output tab drag across the columns -> OK.
10. Aggregator (Oracle -> Aggregator -> file): grouping and summarizing data. Aggregator settings: Stage\Properties\Grouping keys\Group = the grouping column; click Aggregations\Aggregation type and set Column for calculation = the column to aggregate and the aggregation method; max, min, sum, count, and many other aggregation types are available.
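For reference, the grouping and summarizing that the Aggregator stage performs corresponds to an ordinary SQL aggregate query. The sketch below uses the SCOTT demo schema mentioned above (scott/tiger); the choice of grouping and calculation columns is illustrative only:

    -- Roughly what an Aggregator stage computes with
    -- Grouping key = DEPTNO and Column for calculation = SAL (sum, count).
    SELECT deptno,
           SUM(sal) AS sal_sum,
           COUNT(*) AS row_cnt
    FROM   emp
    GROUP  BY deptno;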

DataStage Technical Training (Classic Collection)

Commonly used stages in DataStage Designer
• Sequential File -- features: suitable for ordinary sequential files (fixed- or variable-length); it can read text files or IBM mainframe EBCDIC files.
Commonly used stages in DataStage Designer
Modify file properties, file name, reject mode, and so on.
Commonly used stages in DataStage Designer
Classifies and organizes the objects of each project unit, including table definitions, centralized transform routines, and metadata connections.
Overview of the DataStage client components
• Director: provides interactive control for starting, stopping, and monitoring jobs.
• Administrator: manages DataStage projects on the server side and assigns user permissions.
The DataStage server
...after some simple settings, click to compile our job and it can then be run (we normally run jobs in Director). • Designer's main functions are designing and compiling jobs and writing functions, subroutines, scripts, and so on.
Below we use an example to introduce how Designer is used.
DataStage functional components -- Director
Double-click: the job being edited.
Enter the Director login screen, taking care to select the project you want to enter...
• Server: the main engine of data integration. On the server you can control several parallel processes at run time to move data between many different data sources and targets. The server can be installed in an NT or UNIX/Linux environment and can be tuned to take advantage of multiple processors and memory. By using the many efficient features included in DataStage, enterprises can shorten the learning curve, simplify administration, make the most of their resources, and thereby shorten the development and maintenance cycle of data-integration applications.
DataStage Technical Training
Introduction to ETL
ETL (Extract-Transform-Load: extracting, transforming, and loading data), as the core and soul of BI/DW (Business Intelligence / Data Warehousing), integrates data according to unified rules and raises its value; it is the process that moves data from the source systems into the target data warehouse and is the key step in implementing one. If the data-warehouse model design is the blueprint of a building and the data are the bricks, then ETL is the construction of the building. The hardest parts of a project are requirements analysis and model design, but ETL rule design and implementation carry the largest workload, roughly 60%-80% of the whole project; this is the general consensus drawn from practice at home and abroad.

IBM DataStage Skills Training

Calling a stored procedure from DataStage (method 1)
A stored procedure can be called through a SQL statement block.
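A minimal sketch of method 1, issued as the stage's user-defined SQL; the procedure name PRC_LOAD_FACT and its argument are assumptions for illustration, not something named in the slides:

    -- Hypothetical anonymous block calling a stored procedure.
    BEGIN
      prc_load_fact('200305');
    END;

As the comparison below notes, a call made this way cannot hand the procedure's return values back to the job.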
Calling a stored procedure from DataStage (method 2)
Comparing the two approaches: the first is simple and clear but cannot retrieve the stored procedure's return values; the second is more complex and is something you can try out later in a test environment.
DataStage backup (export)
Different objects can be selected for backup; a full backup is generally recommended (select Whole project).
...when to start executing the activities that follow in the sequence.
UserVariables_Activity: a control for user-defined parameters; it defines global parameters that the whole job can reference.
6. Routine_Activity: calls a packaged routine; a routine is similar to a stored procedure in SQL. Demo: SEQ_POL_MAIN
A worked example of extraction, transformation, and loading. Demo: CopyCopyPjob_PRIP_LJTEMPFEE. Purpose of the example: the suspense-fee table, reworked from the ZhongBaoXin phase-II logic, used to introduce Parallel...
...comparable in sophistication to DataStage; it is also developed through a graphical interface, many of its controls work much like DataStage's, it seems to be cheaper than DataStage, and it can run on Windows, Linux, Unix, AIX, and other environments.
• Kettle
Kettle ("water kettle" in Chinese) is an open-source ETL tool written in pure Java. Being open source it is free, but free tools can be less convenient to use: many functions require additional Java development. It runs on Windows, Linux, Unix, and AIX, and extracts data efficiently and stably.
• ODI
A tool from Oracle, the database vendor; it has limitations and is too tightly coupled to the Oracle database.
A complete development environment
The IBM WebSphere DataStage development environment is based on a client/server model: development is done through a DataStage client connected to a DataStage server. The DataStage server supports multiple platforms, such as Windows, Red Hat Linux, IBM AIX, and HP-UX.

Learning DataStage

1. The basic tools: users access the development, configuration, and maintenance functions of DataStage Enterprise Edition through the various client tools.

These tools include: Designer -- used to create and edit DataStage jobs and table definitions.

The Job Sequencer in Designer controls job execution based on conditions such as the successful completion (or failure, and so on) of other jobs.

Administrator -- used for administrative tasks such as creating DataStage users, creating and deleting projects, and setting purging criteria.

Manager -- used to view and edit the DataStage repository that holds a user's project.

Director -- used to validate, schedule, run, and monitor Enterprise Edition jobs.

2. A worked example (Figure 2: DataStage Enterprise Edition data-flow diagram).

2.1 The Enterprise Edition Aggregator Stage editor is shown below.

(Figure 3: the Enterprise Edition Aggregator Stage icon and Stage editor.) 2.2 The Enterprise Edition Transformer Stage is a powerful and flexible component that lets the user transform the data arriving on an input link.

It then passes the data to another active stage, or writes it to a target database or file.

The Transformer editor (shown below) lets the user easily build mappings between input links and output links, and arbitrary transformations can be written in languages such as BASIC.

These transformations can execute in parallel to raise throughput and performance.

Enterprise Edition provides more than 100 built-in functions, and routines written in C or C++ can also be used in transformations and interoperate with them.

(Figure 4: the Enterprise Edition Transformer Stage icon and Stage editor.) 2.3 Enterprise Deployment and Management: many large companies have their own standards for configuring, scheduling, monitoring, and managing applications in complex production environments.

DataStage Enterprise Edition provides flexible capabilities to meet these needs.

First, DataStage provides a graphical job sequencer that lets users define the order in which jobs run.

Designing a job sequence is just like designing a job.

Users design job sequences within DataStage.

DataStage Tutorial

1. [Chapter 1] Introduction to DataStage and how it works. 1. Overview: the data in a data center (data warehouse) come from many operational sources; these sources may sit on different hardware platforms, run different operating systems, and use very different data models, so the data are stored in different ways in different databases.

How to obtain these large and varied volumes of data and load them into the data center (data warehouse) has become a key problem in building one.

Given that today's systems have complex data sources and the analytical applications are not yet mature, a professional extraction, transformation, and loading tool such as DataStage is the best choice.

WebSphere DataStage is an integration tool that simplifies and automates the extraction, transformation, and maintenance of data from many kinds of operational data sources and loads the results into a target data mart or data center (data warehouse) database.

DataStage can process data from many kinds of sources, including large mainframe databases, relational databases on open systems, and ordinary file systems. Its main supported sources are: mainframe databases such as IMS, DB2, ADABAS, and VSAM; relational databases on open systems such as Informix, Oracle, Sybase, DB2, and Microsoft SQL Server; ERP systems such as SAP/R3 and PeopleSoft; ordinary files and complex file systems, FTP file systems, XML, and so on; web server systems such as IIS, Netscape, and Apache; and email systems such as Outlook.

DataStage can extract data from multiple business systems and from data sources on multiple platforms, transform and cleanse them, and load them into a variety of target systems.

Every step can be done in a graphical tool, and jobs can just as easily be scheduled by external systems. Dedicated design tools are provided for defining transformation and cleansing rules, and many complex, practical functions such as incremental extraction and job scheduling are implemented.

Simple transformations can be built by drag-and-drop in the GUI and by calling DataStage's predefined transformation functions; complex transformations can be built by writing scripts or through extensions in other languages. DataStage also provides a debugging environment, which greatly improves the efficiency of developing and debugging extraction and transformation programs.

DataStage Optimization Training Notes

Sequential File
1. Pay attention to the reject mode setting.
2. Optimization (when the file records are fixed-length): set "number of readers per node" to use several readers on a single node, choosing the count according to the situation; set "read from multiple nodes" to read the data across multiple nodes.

Change Capture Stage
It sorts the data before comparing; if the data have already been sorted upstream, change the sort-related properties accordingly.

Be careful with the "before" and "after" inputs; do not set them the wrong way round.

Copy Stage
A stage that operates in memory; for one input with multiple outputs, the Copy stage is recommended.

Transformer Stage
An embedded program: when a job reaches this stage it pauses the process and calls an external .so program. Besides the functions built into the Transformer, you can write your own functions and embed them (via routines). Filter cannot express complex conditions, and Copy cannot add columns with default values.

Sort Stage
Avoid it where possible; it is a blocking stage that has to receive all of the data before it can sort.

LookUp vs. Join
Join requires the data to be sorted before processing (lower efficiency), whereas LookUp is pipelined (do not use LookUp when the reference data exceed about 800 MB).

Data Set Stage
The stage stores the data as fixed-length records and supports multi-reader access; "drop on input" can be used to restrict the input data.

Production-environment tuning: watch CPU (degree of parallelism, number of logical nodes, number of physical jobs), memory, and I/O.
1. In the Oracle Enterprise stage, select as few columns as possible in the SELECT statement.
2. When a LookUp stage takes its reference data from Oracle, set Lookup type = sparse on the lookup (reference) table; the data are then not pulled into memory and the lookups run directly against the table.
3. In the Oracle Enterprise stage, setting Partition table = <the table being queried> enables multi-process reads.
4. In the file system, to balance node load, put the input and output data on different disks (this can be configured per node, for example via the FILE path in a Sequential File stage).
5. Use repartitioning as little as possible (stages such as Sort and Join repartition the data).
6. Make sure there is enough scratch space; once it is full the system spills to tmp and efficiency drops.
7. Network bottlenecks affect job performance (LAN traffic and communication between nodes).
8. On the MAIN machine, decide whether to shut down the job monitor process (pools "" denotes the default node pool; if "" is given another value, the node is no longer a default node and is not used for running).
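A small illustration of point 1 above, with hypothetical table and column names: list only the columns the job actually uses rather than selecting every column.

    -- Preferred: only the columns the job needs.
    SELECT order_id, cust_id, order_amt
    FROM   src_orders;
    -- Avoid: SELECT * FROM src_orders;  (pushes every column through the job)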

DataStage Training Outline

1. What ETL means: the ETL process extracts data from source systems, cleanses and transforms it, and finally loads it into the target database or data warehouse.

Data extraction: extraction deals with the scattered data of the various business systems and branches. After fully understanding the data definitions, plan the required data sources and their definitions, establish workable data sources, and define the rules for incremental extraction.
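One common way to express an incremental-extraction rule is to filter on a last-modified timestamp. A sketch in SQL, where the table and column names are illustrative and #LAST_EXTRACT_TS# stands for a job parameter holding the previous run's cut-off time:

    -- Incremental extraction sketch: pull only rows changed since the last successful run.
    SELECT *
    FROM   src_orders
    WHERE  last_update_ts > TO_DATE('#LAST_EXTRACT_TS#', 'YYYY-MM-DD HH24:MI:SS');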

Data transformation and cleansing: transformation is the key step that actually turns source data into target data; it covers format conversion, data-type conversion, summarization, concatenation, and so on.

Depending on the situation this work can be done in different places, for example during extraction or during loading.

Data cleansing addresses problems that may appear anywhere in the system, such as ambiguity, duplication, incompleteness, and violations of business rules; a trial extraction can be used to weed out the problem records first, and the cleansing operations are then adjusted to the actual situation.

Data loading: loading puts the transformed and cleansed data into the data warehouse (or database); the operator can load either directly from data files or over a direct database connection.

2. Choosing an ETL tool. 2.1 Supported platforms: as application data volumes grow rapidly and reliability requirements keep rising, extraction tools are expected to extract, transform, and load tens or even hundreds of gigabytes within a few hours; this challenge requires them to provide strong support for high-performance hardware and hosts.

We can therefore judge whether an extraction tool is up to the enterprise environment by the platforms it supports; the mainstream platforms today include Sun Solaris, HP-UX, IBM AIX, AS/400, OS/390, SCO UNIX, Linux, and Windows.

2.2 Supported data sources: the importance of data-source support goes without saying, so this criterion must be examined carefully.

First, we need a clear picture of the data sources the project may involve; second, we need a thorough understanding of the interface types each tool offers. For the same database, using a generic interface (such as ODBC/JDBC) rather than the vendor's own native interface can make a large difference in extraction efficiency, which directly determines whether the ETL work can be finished in the time available.

DataStage Study Document

Work summary (contents)
1 How to restart the DataStage server: the steps
2 Accumulated DataStage development experience
2.1 Template-based development
2.2 Adding Server Job stages to a Parallel Job through a Server Shared Container
2.3 Removing unneeded columns
2.4 Using the Transformer stage
2.5 Null handling in Lookup/Join
2.6 Points to watch with default and implicit type conversions in DataStage
2.7 After configuring each input or output, do a View Data check right away instead of hunting for the error at run time
2.8 Date-type data are rather troublesome
2.9 Row/column transposition: Horizontal Pivot (Pivot stage)
2.10 Row/column transposition: Vertical Pivot
2.11 Errors when viewing data in the Oracle EE stage, and how to resolve them
2.12 Using the DataStage SAP stage
2.13 Using the Column Import stage
2.14 Using the Column Export stage
2.15 Resolving "Got error: Cannot find any process number for stages in Job JobName"
2.16 "Unable to create RT_CONFIG nnn"
2.17 Viewing the background processes for jobs and clients
2.18 Forcibly killing DS processes
2.19 Viewing the Server Engine processes
2.20 Viewing Server Locks
2.21 What to do when the service will not start on UNIX
2.22 "Locked by other user"
2.23 Handling the DataStage job log
2.24 Some BASIC string-handling functions
2.25 Some BASIC syntax used in programs
3 Common problems recorded in DS
3.1 Permission management
3.2 "Job may be being monitored" or cleanup problems
3.3 Problems deleting files
3.4 Errors during sequence scheduling
3.17 Character-set problems
3.18 Version Control problems
3.19 A sequence failing to launch a job
3.20 Sequence scheduling failures
3.21 Configuring DS to send email
3.22 Random errors
3.23 Date problems in DS
3.24 Problems connecting DS to Oracle

DataStage Introductory Training

I. Getting started with the tool. DataStage is an ETL tool: it extracts, transforms, and loads data.

Put informally, it is a tool for processing and extracting data, and most of that data exists as tables in databases, so to use the tool you first need to understand some basic relational-database concepts, such as fields, keys, and records.

DataStage delivers its ETL functionality through the design of jobs.

Designing a job works like any ordinary IDE: you drag and drop controls and add scripts.

The controls are called stages; each kind of stage processes data differently. By combining stages into a job, then compiling and running the job, you carry out the extraction, transformation, and loading of the data.

1. Install DataStage and read the study guide to get a rough feel for the tool and a general idea of the differences between Administrator, Designer, Director, and Manager.

Understand the main purpose of the DataStage tool: put simply, a batch of data comes in as input, goes through all kinds of transformation and cleansing, and then goes out as output; the whole thing is the ETL process.

The operations we perform most often with the four tools are: Administrator: 1. project management, mainly creating and deleting projects; 2. licensing management, mainly replacing the license.

Designer: the core of DataStage. All development is done in Designer; here you edit your jobs and use the various stage controls.

Director: 1. viewing logs. When a job run ends, whether it succeeded or failed, we can view the log in Director; it reflects the state of the run. When a job fails we usually check the log first, analyze the cause, and then go back into Designer to fix it.

2. Another very useful Director function is logging out (unlocking) a job: when the server or the network has problems, the job being edited is very likely to be locked, and even if you close Designer and log in again you still cannot open it; it reports that the job is in use. In that case go into Director and log the job out, after which it can be used again.

DataStage Basic Training: Exercises
2011-04-12
Exercise
1. Requirement
Analyze orders along multiple dimensions, by department and by city.

2. Design
1) Create a new fact table for multidimensional order analysis.
(Note: before inserting data, first ... the current ...)
2) Extract from the sources:
- from the orders table orders, get the order data for the current accounting period (200305);
- from the employee table emp, get each employee's department;
- from the supplier table suppliers, get each supplier's city.
3) Build the ETL that loads the department- and city-based multidimensional order data into the fact table, with these requirements:
A. Create one job that first lands the order data in an intermediate data file.
B. Create a second job that reads the order data from the intermediate file, matches it against the employee and supplier tables to obtain the multidimensional attributes, and loads the result into the fact table.
C. Wrap the two jobs in a Job Sequence and define the dependency between them.
D. The stages used should include, but are not limited to: Oracle Stage, DataSet Stage, Join Stage, Lookup Stage, Transformer Stage, Aggregator Stage.

3. Environment
(1) Shenzhen SIEDW database, USER/PASSWORD: BI_APP/bi_app
SIEDW =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.254)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = SIEDW)
    )
  )
(2) DataStage environment
IP: 192.168.1.253
Project: SIEProjectA  u/p: dsadm/dsadm

Reference ETL (SQL)
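A sketch of what the reference ETL might look like in SQL; the fact-table name, the join keys, and the measure column are assumptions, since the slides only name the source tables orders, emp, and suppliers and the accounting period 200305:

    -- Multidimensional order analysis by department and city for period 200305.
    -- fact_order_analysis, emp_id, supplier_id, and order_amt are illustrative names.
    INSERT INTO fact_order_analysis (period, dept_id, city, order_cnt, order_amt)
    SELECT '200305',
           e.dept_id,
           s.city,
           COUNT(*),
           SUM(o.order_amt)
    FROM   orders    o
    JOIN   emp       e ON e.emp_id      = o.emp_id
    JOIN   suppliers s ON s.supplier_id = o.supplier_id
    WHERE  o.period = '200305'
    GROUP  BY e.dept_id, s.city;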
Q&A