2. Extract is the process of reading data from a source database.
Transform is the process of converting the extracted data from its
previous form into the form it needs to be in so that it can be placed
into another database. Transformation occurs by applying rules or
lookup tables, or by combining the data with other data.
Load is the process of writing the transformed data into the target database.
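The three steps above can be sketched end to end in a few lines of Python. This is a minimal illustration, not how any particular ETL tool works internally. It assumes two in-memory SQLite databases standing in for the source and target, and the table names (`orders`, `orders_clean`), the country lookup table, and the exchange rates are all hypothetical. The transform step shows each technique named on the slide: a rule (name normalization), a lookup table (country codes), and combining with other data (exchange rates).

```python
import sqlite3

# Hypothetical lookup table: maps source country codes to the names the target expects.
COUNTRY_LOOKUP = {"US": "United States", "DE": "Germany", "FR": "France"}


def extract(source: sqlite3.Connection) -> list[tuple]:
    """Extract: read raw rows from the (hypothetical) source table."""
    return source.execute(
        "SELECT id, customer, country_code, amount, currency FROM orders"
    ).fetchall()


def transform(rows: list[tuple], usd_rates: dict[str, float]) -> list[tuple]:
    """Transform: apply a rule, a lookup table, and data from another source."""
    out = []
    for order_id, customer, code, amount, currency in rows:
        out.append((
            order_id,
            customer.strip().title(),              # rule: normalize customer names
            COUNTRY_LOOKUP.get(code, "Unknown"),   # lookup table
            round(amount * usd_rates[currency], 2),  # combine with other data (KeyError if rate missing)
        ))
    return out


def load(target: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the transformed rows into the (hypothetical) target table."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS orders_clean "
        "(id INTEGER PRIMARY KEY, customer TEXT, country TEXT, amount_usd REAL)"
    )
    target.executemany("INSERT INTO orders_clean VALUES (?, ?, ?, ?)", rows)
    target.commit()


def demo() -> list[tuple]:
    """Build a tiny source database, run E -> T -> L, and return the target rows."""
    src = sqlite3.connect(":memory:")
    src.execute(
        "CREATE TABLE orders "
        "(id INTEGER, customer TEXT, country_code TEXT, amount REAL, currency TEXT)"
    )
    src.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
        [(1, "  alice smith ", "US", 100.0, "USD"), (2, "BOB JONES", "DE", 50.0, "EUR")],
    )
    rates = {"USD": 1.0, "EUR": 1.1}  # hypothetical exchange rates
    tgt = sqlite3.connect(":memory:")
    load(tgt, transform(extract(src), rates))
    return tgt.execute("SELECT * FROM orders_clean ORDER BY id").fetchall()


if __name__ == "__main__":
    print(demo())
```

Keeping the three steps as separate functions mirrors the definitions on the slide and makes it easy to swap the source or target without touching the transformation rules.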
4. data migration
data management
data cleansing
data synchronization
data consolidation
5. •Oracle ETL
•Ab Initio
•Pentaho Data Integration (Kettle project, open source ETL)
•SAS ETL Studio
•Cognos DecisionStream
•Business Objects Data Integrator (BODI)
•Microsoft SQL Server Integration Services (SSIS)
•Informatica PowerCenter
•Talend
6. Talend Open Studio for Data Integration
◦ http://www.talend.com/download
VirtualBox
◦ https://www.virtualbox.org/wiki/Downloads
Hortonworks Sandbox VM
◦ http://hortonworks.com/products/hortonworks-sandbox/#install
14. Talend Studio offers nearly comprehensive connectivity to:
•Packaged applications (ERP, CRM, etc.), databases, mainframes, files, web services, and so on, to
address the growing disparity of sources.
•Data warehouses, data marts, and OLAP applications, for analysis, reporting, dashboarding,
scorecarding, and so on.
It also provides built-in advanced components for ETL, including string manipulation, Slowly Changing
Dimensions, automatic lookup handling, bulk-load support, and so on.
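Slowly Changing Dimensions are easier to picture with a concrete case. The sketch below shows the common "Type 2" approach in plain Python: when a tracked attribute changes, the current dimension row is expired and a new version is appended, so history is preserved. It is a generic illustration, not Talend's component, and the `DimRow` fields and `apply_scd2` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    key: int                  # surrogate key assigned by the warehouse
    customer_id: int          # natural (business) key from the source
    city: str                 # attribute whose history we want to keep
    valid_from: date
    valid_to: Optional[date]  # None means "still current"
    is_current: bool


def apply_scd2(dim: list[DimRow], incoming: dict[int, str], as_of: date) -> list[DimRow]:
    """Apply a Type 2 SCD update to `dim` in place and return it.

    `incoming` maps customer_id -> city as seen in the latest extract.
    """
    current = {r.customer_id: r for r in dim if r.is_current}
    next_key = max((r.key for r in dim), default=0) + 1
    for cust_id, city in incoming.items():
        row = current.get(cust_id)
        if row is not None and row.city == city:
            continue  # unchanged: keep the existing version
        if row is not None:
            # Changed: close out the old version instead of overwriting it.
            row.valid_to = as_of
            row.is_current = False
        # New customer or changed attribute: append a new current version.
        dim.append(DimRow(next_key, cust_id, city, as_of, None, True))
        next_key += 1
    return dim
```

By contrast, a Type 1 dimension would simply overwrite `city` in place and lose the previous value; Type 2 trades extra rows for a full change history.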
22. Data volumes are growing exponentially.
Data is arriving at ever higher velocity.
As information systems grow in complexity, the disparity of
sources is growing as well.
Target structures such as data warehouses, data marts, and OLAP applications
each have different data transformation requirements and different tolerances for latency.
Transformations involved in ETL processes can be highly complex.