ETL Technologies.pptx

  1. ETL Technologies (Gaurav Bhatnagar, Draft v.01)
  2. Introduction (ETL vs ELT)
     ETL (Extract, Transform, Load)
     • The ETL process collects raw data from various sources (your CRM, ad accounts, ERP, email servers, …) and saves it to a staging area.
     • Before the data is loaded into the target data warehouse or database of your choice, it undergoes extensive transformations.
     • Depending on your business logic, you might mask sensitive personal information, remove outliers, or aggregate metrics to make your analysts' lives easier, before finally loading the data into storage.
     ELT (Extract, Load, Transform)
     • A variant of ETL in which the extracted data is first loaded into the target system.
     • Transformations are performed after the data has been loaded into the data warehouse.
     • ELT typically works well when the target system is powerful enough to handle the transformations. Analytical databases such as Amazon Redshift and Google BigQuery are often used in ELT pipelines because they perform transformations efficiently.
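To make the ETL flow on the Introduction slide concrete, here is a minimal Python sketch. It assumes two hypothetical in-memory sources standing in for a CRM and an ad account, uses an in-memory SQLite database as a stand-in for the target warehouse, and treats any value above ten times the median as an outlier (an arbitrary, illustrative rule). The point is the ordering: every transformation happens in staging, before anything reaches the target.

```python
import sqlite3
from statistics import median

# Hypothetical source extracts standing in for a CRM and an ad account.
CRM_ROWS = [
    {"region": "north", "revenue": 120.0},
    {"region": "north", "revenue": 70.0},
    {"region": "south", "revenue": 95.0},
]
ADS_ROWS = [
    {"region": "south", "revenue": 105.0},
    {"region": "south", "revenue": 10_000.0},  # an obvious outlier
]


def extract():
    """Extract: collect raw rows from every source into a staging list."""
    return CRM_ROWS + ADS_ROWS


def transform(staged, outlier_factor=10.0):
    """Transform (in staging): drop outliers, then aggregate revenue per region."""
    cutoff = outlier_factor * median(r["revenue"] for r in staged)
    totals = {}
    for r in staged:
        if r["revenue"] <= cutoff:
            totals[r["region"]] = totals.get(r["region"], 0.0) + r["revenue"]
    return sorted(totals.items())


def load(conn, aggregates):
    """Load: write only the already-transformed aggregates into the target."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_region (region TEXT PRIMARY KEY, revenue REAL)"
    )
    conn.executemany("INSERT INTO revenue_by_region VALUES (?, ?)", aggregates)
    conn.commit()


def run_etl(conn):
    load(conn, transform(extract()))


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    run_etl(target)
    print(target.execute("SELECT * FROM revenue_by_region ORDER BY region").fetchall())
```

The outlier never reaches the warehouse: in this ETL sketch the target only ever sees the cleaned, aggregated result.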
  3. Workflow [diagram]
  4. Differences: ETL vs ELT
     1) Support for data warehouses
        • ETL: Yes. ETL is the traditional process for transforming and integrating structured or relational data into a cloud-based or on-premises data warehouse.
        • ELT: Yes. ELT is the modern process for transforming and integrating structured or unstructured data into a cloud-based data warehouse.
     2) Support for data lakes, data marts, and lakehouses
        • ETL: No. ETL is not an appropriate process for data lakes, data marts, or data lakehouses.
        • ELT: Yes. The ELT process is tailored to provide a data pipeline for data lakes, data marts, and data lakehouses.
     3) Size and type of data set
        • ETL: Most appropriate for smaller, relational data sets that require complex transformations and have been predetermined as relevant to the analysis goals.
        • ELT: Handles data of any size or type and is well suited to processing both structured and unstructured big data. Because the entire data set is loaded, analysts can choose at any time which data to transform and use for analysis.
  5. Differences: ETL vs ELT (continued)
     4) Implementation
        • ETL: The process has been around for decades, and there is a mature ecosystem of ETL tools and experts readily available to help with implementation.
        • ELT: A newer approach; the ecosystem of tools and experts needed to implement it is still growing.
     5) Transformation
        • ETL: Transformation is performed in a staging area outside the data warehouse, and all of the data must be transformed before loading. As a result, transforming large data sets can take a long time up front, but analysis can begin as soon as the ETL process is complete.
        • ELT: Transformation is performed as needed inside the target system itself. As a result, the transformation step takes little time up front but can slow down querying and analysis if there is not enough processing power.
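The Transformation row says ELT transforms data on an as-needed basis inside the target system. A minimal Python sketch of that idea follows, again using an in-memory SQLite database as a stand-in warehouse: raw rows are loaded untouched, and a transformation is defined later as a SQL view. The table, view, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw events, loaded into the target exactly as extracted.
RAW_EVENTS = [
    ("2023-01-01", "north", 120.0),
    ("2023-01-01", "south", 95.0),
    ("2023-01-02", "north", 70.0),
    ("2023-01-02", "south", 105.0),
]


def load_raw(conn, rows):
    """Load: land untransformed rows directly in the target system."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_events (day TEXT, region TEXT, revenue REAL)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", rows)
    conn.commit()


def transform_on_demand(conn, view_name, select_sql):
    """Transform: define a view inside the target only when someone needs it.

    The view is evaluated at query time, so the transformation cost is paid
    during analysis rather than before loading. view_name and select_sql are
    interpolated directly, so they must come from trusted code, not users.
    """
    conn.execute(f"CREATE VIEW IF NOT EXISTS {view_name} AS {select_sql}")
    return conn.execute(f"SELECT * FROM {view_name}").fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    load_raw(warehouse, RAW_EVENTS)
    daily = transform_on_demand(
        warehouse,
        "daily_revenue",
        "SELECT day, SUM(revenue) AS revenue FROM raw_events GROUP BY day",
    )
    print(sorted(daily))
```

Because the raw table is kept, a different view (say, revenue per region) can be defined later without re-extracting anything, which is the flexibility the slide attributes to ELT; the trade-off is that each query against a view recomputes it.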
  6. Differences: ETL vs ELT (continued)
     6) Loading
        • ETL: Data is loaded into a staging area before being loaded into the target system. This multi-step process takes longer than the ELT process.
        • ELT: The full data set is loaded directly into the target system. Because there is only one step, and it happens only once, loading is faster than in ETL.
     7) Cost
        • ETL: Can be cost-prohibitive for many small and medium-sized businesses.
        • ELT: Benefits from a robust ecosystem of cloud-based platforms that offer much lower costs and a variety of plan options for storing and processing data.
     8) Compliance
        • ETL: Better suited to compliance with GDPR, HIPAA, and CCPA, because users can omit any sensitive data before loading it into the target system.
        • ELT: Carries more risk of exposing private data and failing to comply with GDPR, HIPAA, and CCPA, because all data is loaded into the target system.
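The Compliance row rests on ETL's ability to strip sensitive data before loading. A minimal Python sketch of such a pre-load step follows: it drops some fields outright and replaces others with a keyed HMAC-SHA256 token, so records can still be joined on the token while the raw values never reach the target. The field lists (SENSITIVE_FIELDS, DROP_FIELDS) and the scrub helper are hypothetical, and keyed hashing is pseudonymization rather than anonymization.

```python
import hashlib
import hmac

# Hypothetical policy: which fields are tokenized and which are never loaded.
SENSITIVE_FIELDS = {"email", "phone"}
DROP_FIELDS = {"ssn"}


def scrub(record, key):
    """Return a copy of record with sensitive fields removed or tokenized."""
    clean = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue  # omitted entirely before loading
        if field in SENSITIVE_FIELDS:
            # Keyed hash: the same input and key give the same token, so joins
            # still work, but the raw value never reaches the warehouse.
            clean[field] = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
        else:
            clean[field] = value
    return clean


if __name__ == "__main__":
    secret = b"replace-with-a-managed-secret"  # hypothetical; keep real keys in a secrets store
    print(scrub({"id": 1, "email": "a@example.com", "ssn": "123-45-6789"}, secret))
```

Using a keyed HMAC rather than a plain hash means that someone without the key cannot recompute tokens for guessed values such as common email addresses.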
  7. Use case: ETL vs ELT
     Transform technologies
        • ETL: Scripting languages, SQL procedures
        • ELT: Data-warehouse-specific solutions
     Physical space required to store data
        • ETL: Lower
        • ELT: Higher
     Maturity
        • ETL: Tested and proven
        • ELT: Novel and (sometimes) experimental
     Engineering expertise required
        • ETL: Medium
        • ELT: High
     Data type
        • ETL: All, but best for structured (relational) data
        • ELT: All, but excels at unstructured data
     Pros
        • ETL: Simpler to deploy and maintain; many human and technical resources are available.
        • ELT: Can handle massive amounts of data; best for unstructured data.
     Cons
        • ETL: Scaling becomes increasingly complex for large data deployments.
        • ELT: Needs a higher level of expertise to deploy and maintain; edge cases are not always polished for reliability.
  8. AWS Glue
     • AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. You can use it for analytics, machine learning, and application development. It also includes additional productivity and DataOps tooling for authoring and running jobs and implementing business workflows.
     • With AWS Glue, you can discover and connect to more than 70 diverse data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. You can also immediately search and query cataloged data using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
     AWS Glue components:
     • AWS Glue console
     • AWS Glue Data Catalog
     • AWS Glue crawlers and classifiers
     • AWS Glue ETL operations
     • Streaming ETL in AWS Glue
     • The AWS Glue jobs system
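The crawler-and-classifier idea in the AWS Glue component list can be illustrated with a toy version in plain Python. This is not the AWS Glue API: crawl, classify_csv, and the dict used as a catalog are hypothetical stand-ins. The sketch only shows the concept of sampling data, inferring a schema, and recording it in a central catalog; the type names loosely mirror Hive-style types (bigint, double, string).

```python
import csv
import io


def _infer_type(values):
    """Pick the narrowest type that every non-empty sample value parses as."""
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "string"
    for type_name, parse in (("bigint", int), ("double", float)):
        try:
            for v in non_empty:
                parse(v)
            return type_name
        except ValueError:
            continue  # at least one value did not parse; try a wider type
    return "string"


def classify_csv(sample_text):
    """Toy 'classifier': infer column names and types from a CSV sample."""
    reader = csv.DictReader(io.StringIO(sample_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return {col: _infer_type([row[col] for row in rows]) for col in columns}


def crawl(catalog, table_name, sample_text):
    """Toy 'crawler': run the classifier and record the schema in the catalog."""
    catalog[table_name] = classify_csv(sample_text)
    return catalog[table_name]


if __name__ == "__main__":
    data_catalog = {}
    crawl(data_catalog, "orders", "id,amount,region\n1,9.5,north\n2,12,south\n")
    print(data_catalog)
```

Real crawlers also handle formats such as JSON and Parquet, partitions, and schema changes between runs; the toy version ignores all of that.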
  9. Data flow in AWS Glue [diagram]
  10. Azure Data Factory
     • Azure Data Factory (ADF) is a cloud-based data integration service in Microsoft Azure.
     • It orchestrates and automates the movement and transformation of data.
     • Because data comes from many different products, we need a powerful tool to store and analyze all of it; ADF fills that role.
     How ADF helps:
     • Storing the data in Azure Data Lake Storage
     • Analyzing the data
     • Transforming the data with pipelines
     • Publishing the organized data
     • Visualizing the data with third-party applications, and processing it further with frameworks such as Apache Spark and Hadoop
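ADF orchestrates pipelines of activities, where an activity can be configured to run only after the activities it depends on. The ordering logic can be sketched in Python 3.9+ with the standard-library graphlib module. run_pipeline and the activity names are hypothetical, and this models only dependency ordering, not ADF's actual runtime, triggers, or dependency conditions.

```python
from graphlib import CycleError, TopologicalSorter


def run_pipeline(activities, depends_on):
    """Run each activity after all of its upstream activities.

    activities: activity name -> zero-argument callable (hypothetical activity bodies)
    depends_on: activity name -> set of upstream activity names
    Raises CycleError if the dependencies form a loop and ValueError if an
    activity depends on one that is not defined. An exception raised by an
    activity stops the run, so its downstream activities never start.
    """
    graph = {name: set(depends_on.get(name, ())) for name in activities}
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:
        raise ValueError(f"undefined upstream activities: {sorted(unknown)}")
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        results[name] = activities[name]()
    return order, results


if __name__ == "__main__":
    steps = {
        "publish": lambda: "published",
        "copy": lambda: "copied",
        "transform": lambda: "transformed",
    }
    print(run_pipeline(steps, {"transform": {"copy"}, "publish": {"transform"}}))
```

Declaring dependencies instead of a fixed sequence lets independent activities (for example, two copy steps from different sources) be scheduled in any order, or in parallel in a real orchestrator.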
  11. Flow process of Data Factory [diagram]
  12. Business Objects Data Services (BODS)
     • SAP BODS is an ETL tool for extracting data from disparate systems, transforming it into meaningful information, and loading it into a data warehouse. It is designed to deliver enterprise-class solutions for data integration, data quality, data processing, and data profiling. BODS stands for Business Objects Data Services.
     • Repository, Management Console, Designer, Job Server, and Access Server are important components of the SAP BODS architecture.
     • SAP BusinessObjects offers better profiling as a result of its many acquisitions of other companies.
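Data profiling, one of the capabilities the slide lists for BODS, means summarizing what a data set actually contains before integrating it. The generic Python sketch below is not the BODS API; profile is a hypothetical helper that reports null counts, distinct counts, and min/max per column, and it assumes the non-null values within each column are mutually comparable.

```python
def profile(rows, columns):
    """Basic column profile: nulls, distinct values, min and max of non-null values."""
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # a missing key counts as null
        present = [v for v in values if v is not None]
        report[col] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    return report


if __name__ == "__main__":
    sample = [
        {"id": 1, "city": "Leeds"},
        {"id": 2, "city": None},
        {"id": 3, "city": "Leeds"},
    ]
    print(profile(sample, ["id", "city"]))
```

A profile like this surfaces problems such as unexpected nulls or a supposedly unique key with duplicates before they propagate into the warehouse.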
  13. Conclusion
     We need to examine the business and technical problems, decide what our reference data-model architecture should be, and then build a roadmap toward it.
     • ETL is best suited for fast analytics in small-to-medium data environments, where the source data and data operations are well controlled and do not change constantly (that is, they do not need flexibility).
     • ELT, in contrast, is best suited for semi-structured or unstructured data in big data environments, where changing data-operation requirements call for a lot of flexibility.

Speaker notes

  • We are building up a base of integrated expertise through data transfer. Key use case: by sharing information through the warehouse, we can easily reuse the data in other systems, thereby enabling collaboration with various ecosystems and encouraging customer-centric thinking within the Southern Water IT landscape. Digital first is the key to achieving this.
  • Essentially, we are trying to establish an ecosystem with a foundation for sharing data.
