SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
Talend Data Integration and Management
Data Integration



   Data Integration involves combining data
 residing in differente sources and providing the
        user with a unified view of the data


Data Management combines different disciplines
    to manage data as a valuable resource




                                         www.robertomarchetto.com
Talend


●   Talend is a company focused on Data
    Integration and Data Management solutions
●   Talend is a „Cool Vendor“ for Gartner (2010)
●   Present in more than 12 locations around the
    World
●   Fast growing company




                                          www.robertomarchetto.com
Talend Open Studio




                     www.robertomarchetto.com
Talend Open Studio

●   Open Source, professional tool
●   Draw procedures linking components, each
    component performs an operation
●   DB vendor-specific optimized components
●   Produces fully editable Java (or Perl) code
●   Deployment with small and fast compiled Java
    or as Web Service
●   Eclipse based IDE, excellent flexibility
●   BI Platform indipendent, DB Vendor indipendent
                                               www.robertomarchetto.com
Automatic code generation, diffent
           deployment




                             www.robertomarchetto.com
Extracion Transformation Loading


●   ETL is a common process in Data Integration
    ●   Extract, reading data from different datasources
        (database, flat files, spreadsheet files, web
        services, etc)
    ●   Transfom, converting data in a form so that it can
        be placed in another container (database, web
        services, files, etc). Cleaning, computations and
        verifications are also performed
    ●   Load, write the data in the target format



                                                    www.robertomarchetto.com
Tutorial, Source data




                        www.robertomarchetto.com
Tutorial, Destination data (Datawarehouse)




                                 www.robertomarchetto.com
Tutorial, Metadata


●   Talend requires a preliminary definition of the
    metadata
●   Often a strong metadata definition means, as in
    programming languages, fast, robust and
    maintenable applications
●   ..demo..




                                            www.robertomarchetto.com
Tutorial, Talend jobs basics



●   Place components on the designer
●   Link components to build a transformation
●   Main type of link: Rows flow
●   Schema metadata is propagated and must be
    coherent
●   ..demo..



                                         www.robertomarchetto.com
Tutorial, users_dimension




                        www.robertomarchetto.com
Test the job




               www.robertomarchetto.com
Tutorial, accounts_dimension




                         www.robertomarchetto.com
Tutorial, dates_dimension




                        www.robertomarchetto.com
Tutorial, write a Java library




                            www.robertomarchetto.com
Tutorial, opportunities_fact




                          www.robertomarchetto.com
Tutorial, define a root job




                          www.robertomarchetto.com
Deploy and run




                 www.robertomarchetto.com
Extensibility, comunity plugins


                ●   Many official
                    components
                ●   Components for
                    every task released
                    by the comunity
                ●   Geospatial
                    components, log
                    analysis, Google
                    analytics, data
                    encryption, etc

                                www.robertomarchetto.com
Scheduler




            www.robertomarchetto.com
And now.. reports, dashboards, OLAP,
        Geoanalysis, KPIs..




                              www.robertomarchetto.com
Do you trust your data?




                     www.robertomarchetto.com
What about data quality?

●   Customer A is present 5 times with different
    names
●   Null values can vary statistical indexes like
    mean calculation
●   Duplicated records
●   Blank values
●   Some records can contain errors (es -1 field
    values)
●   Some records can be garbage

                                            www.robertomarchetto.com
Talend Open Profiler




                       www.robertomarchetto.com
What abount data storage size?


●   Some fields can be oversized for the data they
    contain
●   Sometimes fields are related and can be
    calculated
●   Some keys or values are never used
●   When data grow garbage grow
●   Data storage is not free (disks, electricity,
    backups, DB licenses)

                                              www.robertomarchetto.com
Data is „the black gold“ that can produce
                knowledge


●   Data is a resource, you can extract knowledge
●   A lot of Data produces concise informations
●   Data storage is not free and a lot of data can
    make system not fast
●   Data cleansing is a central process in statistical
    analysis and Data Mining




                                            www.robertomarchetto.com
Talend Master Data Management




                         www.robertomarchetto.com

Contenu connexe

Tendances

Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Talend Introduction by TSI
Talend Introduction by TSITalend Introduction by TSI
Talend Introduction by TSIRemain Software
 
Talend Big Data Capabilities Overview
Talend Big Data Capabilities OverviewTalend Big Data Capabilities Overview
Talend Big Data Capabilities OverviewRajan Kanitkar
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseSnowflake Computing
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
An Introduction to Talend Integration Cloud
An Introduction to Talend Integration CloudAn Introduction to Talend Integration Cloud
An Introduction to Talend Integration CloudTalend
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault ModelingKent Graziano
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringHadi Fadlallah
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Data Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph DatabasesData Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph DatabasesDATAVERSITY
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeKent Graziano
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Modularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkModularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkDatabricks
 
Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...
Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...
Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...Edureka!
 

Tendances (20)

Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Talend Introduction by TSI
Talend Introduction by TSITalend Introduction by TSI
Talend Introduction by TSI
 
Talend Big Data Capabilities Overview
Talend Big Data Capabilities OverviewTalend Big Data Capabilities Overview
Talend Big Data Capabilities Overview
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
An Introduction to Talend Integration Cloud
An Introduction to Talend Integration CloudAn Introduction to Talend Integration Cloud
An Introduction to Talend Integration Cloud
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault Modeling
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Data Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph DatabasesData Modeling & Metadata for Graph Databases
Data Modeling & Metadata for Graph Databases
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Informatica Cloud Overview
Informatica Cloud OverviewInformatica Cloud Overview
Informatica Cloud Overview
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Modularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkModularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache Spark
 
Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...
Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...
Talend Big Data Tutorial | Talend DI and Big Data Certification | Talend Onli...
 

Similaire à Talend Open Studio Data Integration

Business Intelligence Open Source
Business Intelligence Open SourceBusiness Intelligence Open Source
Business Intelligence Open SourceRoberto Marchetto
 
Integrating Oracle Data Integrator with Oracle GoldenGate 12c
Integrating Oracle Data Integrator with Oracle GoldenGate 12cIntegrating Oracle Data Integrator with Oracle GoldenGate 12c
Integrating Oracle Data Integrator with Oracle GoldenGate 12cEdelweiss Kammermann
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dan Lynn
 
Spark Workflow Management
Spark Workflow ManagementSpark Workflow Management
Spark Workflow ManagementRomi Kuntsman
 
Python for Data Logistics
Python for Data LogisticsPython for Data Logistics
Python for Data LogisticsKen Farmer
 
An Introduction To Palomino
An Introduction To PalominoAn Introduction To Palomino
An Introduction To PalominoLaine Campbell
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dan Lynn
 
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...Mark Rittman
 
Liferay portals in real projects
Liferay portals  in real projectsLiferay portals  in real projects
Liferay portals in real projectsIBACZ
 
Scaling Magento
Scaling MagentoScaling Magento
Scaling MagentoCopious
 
Fighting legacy with hexagonal architecture and frameworkless php
Fighting legacy with hexagonal architecture and frameworkless phpFighting legacy with hexagonal architecture and frameworkless php
Fighting legacy with hexagonal architecture and frameworkless phpFabio Pellegrini
 
Amit Kumar_Resume
Amit Kumar_ResumeAmit Kumar_Resume
Amit Kumar_ResumeAmit Kumar
 
Resume ETL-Informatica developer
Resume  ETL-Informatica developerResume  ETL-Informatica developer
Resume ETL-Informatica developerajayagrawal92
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSKimmo Kantojärvi
 

Similaire à Talend Open Studio Data Integration (20)

Business Intelligence Open Source
Business Intelligence Open SourceBusiness Intelligence Open Source
Business Intelligence Open Source
 
Integrating Oracle Data Integrator with Oracle GoldenGate 12c
Integrating Oracle Data Integrator with Oracle GoldenGate 12cIntegrating Oracle Data Integrator with Oracle GoldenGate 12c
Integrating Oracle Data Integrator with Oracle GoldenGate 12c
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
Spark Workflow Management
Spark Workflow ManagementSpark Workflow Management
Spark Workflow Management
 
Python for Data Logistics
Python for Data LogisticsPython for Data Logistics
Python for Data Logistics
 
An Introduction To Palomino
An Introduction To PalominoAn Introduction To Palomino
An Introduction To Palomino
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016
 
Resume
ResumeResume
Resume
 
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
 
Are we there yet?
Are we there yet?Are we there yet?
Are we there yet?
 
Liferay portals in real projects
Liferay portals  in real projectsLiferay portals  in real projects
Liferay portals in real projects
 
Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017
 
Scaling Magento
Scaling MagentoScaling Magento
Scaling Magento
 
DevOps Days Rockies MLOps
DevOps Days Rockies MLOpsDevOps Days Rockies MLOps
DevOps Days Rockies MLOps
 
Fighting legacy with hexagonal architecture and frameworkless php
Fighting legacy with hexagonal architecture and frameworkless phpFighting legacy with hexagonal architecture and frameworkless php
Fighting legacy with hexagonal architecture and frameworkless php
 
Boobalan_Muthukumarasamy_Resume_DW_8_Yrs
Boobalan_Muthukumarasamy_Resume_DW_8_YrsBoobalan_Muthukumarasamy_Resume_DW_8_Yrs
Boobalan_Muthukumarasamy_Resume_DW_8_Yrs
 
Amit Kumar_Resume
Amit Kumar_ResumeAmit Kumar_Resume
Amit Kumar_Resume
 
Resume ETL-Informatica developer
Resume  ETL-Informatica developerResume  ETL-Informatica developer
Resume ETL-Informatica developer
 
Odi ireland rittman
Odi ireland rittmanOdi ireland rittman
Odi ireland rittman
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 

Dernier

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Dernier (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

Talend Open Studio Data Integration

  • 1. Talend Data Integration and Management
  • 2. Data Integration Data Integration involves combining data residing in differente sources and providing the user with a unified view of the data Data Management combines different disciplines to manage data as a valuable resource www.robertomarchetto.com
  • 3. Talend ● Talend is a company focused on Data Integration and Data Management solutions ● Talend is a „Cool Vendor“ for Gartner (2010) ● Present in more than 12 locations around the World ● Fast growing company www.robertomarchetto.com
  • 4. Talend Open Studio www.robertomarchetto.com
  • 5. Talend Open Studio ● Open Source, professional tool ● Draw procedures linking components, each component performs an operation ● DB vendor-specific optimized components ● Produces fully editable Java (or Perl) code ● Deployment with small and fast compiled Java or as Web Service ● Eclipse based IDE, excellent flexibility ● BI Platform indipendent, DB Vendor indipendent www.robertomarchetto.com
  • 6. Automatic code generation, diffent deployment www.robertomarchetto.com
  • 7. Extracion Transformation Loading ● ETL is a common process in Data Integration ● Extract, reading data from different datasources (database, flat files, spreadsheet files, web services, etc) ● Transfom, converting data in a form so that it can be placed in another container (database, web services, files, etc). Cleaning, computations and verifications are also performed ● Load, write the data in the target format www.robertomarchetto.com
  • 8. Tutorial, Source data www.robertomarchetto.com
  • 9. Tutorial, Destination data (Datawarehouse) www.robertomarchetto.com
  • 10. Tutorial, Metadata ● Talend requires a preliminary definition of the metadata ● Often a strong metadata definition means, as in programming languages, fast, robust and maintenable applications ● ..demo.. www.robertomarchetto.com
  • 11. Tutorial, Talend jobs basics ● Place components on the designer ● Link components to build a transformation ● Main type of link: Rows flow ● Schema metadata is propagated and must be coherent ● ..demo.. www.robertomarchetto.com
  • 12. Tutorial, users_dimension www.robertomarchetto.com
  • 13. Test the job www.robertomarchetto.com
  • 14. Tutorial, accounts_dimension www.robertomarchetto.com
  • 15. Tutorial, dates_dimension www.robertomarchetto.com
  • 16. Tutorial, write a Java library www.robertomarchetto.com
  • 17. Tutorial, opportunities_fact www.robertomarchetto.com
  • 18. Tutorial, define a root job www.robertomarchetto.com
  • 19. Deploy and run www.robertomarchetto.com
  • 20. Extensibility, comunity plugins ● Many official components ● Components for every task released by the comunity ● Geospatial components, log analysis, Google analytics, data encryption, etc www.robertomarchetto.com
  • 21. Scheduler www.robertomarchetto.com
  • 22. And now.. reports, dashboards, OLAP, Geoanalysis, KPIs.. www.robertomarchetto.com
  • 23. Do you trust your data? www.robertomarchetto.com
  • 24. What about data quality? ● Customer A is present 5 times with different names ● Null values can vary statistical indexes like mean calculation ● Duplicated records ● Blank values ● Some records can contain errors (es -1 field values) ● Some records can be garbage www.robertomarchetto.com
  • 25. Talend Open Profiler www.robertomarchetto.com
  • 26. What abount data storage size? ● Some fields can be oversized for the data they contain ● Sometimes fields are related and can be calculated ● Some keys or values are never used ● When data grow garbage grow ● Data storage is not free (disks, electricity, backups, DB licenses) www.robertomarchetto.com
  • 27. Data is „the black gold“ that can produce knowledge ● Data is a resource, you can extract knowledge ● A lot of Data produces concise informations ● Data storage is not free and a lot of data can make system not fast ● Data cleansing is a central process in statistical analysis and Data Mining www.robertomarchetto.com
  • 28. Talend Master Data Management www.robertomarchetto.com