SlideShare une entreprise Scribd logo
1  sur  12
Data Cleansing and Prep
with Azure Synapse Analytics
Data Flows
Mark Kromer @kromerbigdata
Azure Data Governance
Principal PM
https://techcommunity.microsoft.com/t5/azure-data-
factory/how-to-save-your-data-profiler-summary-stats-in-adf-data-flows/ba-p/1243251
• iif (length(title) == 0,toString(null()),title)
https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-dedupe-nulls-snippets
https://www.youtube.com/watch?v=E_UD3R-VpYE
https://mssqldude.wordpress.com/2020/08/31/adf-data-flow-metadata-functions-explained/
• https://www.youtube.com/watch?v=wjsi9g3ffco
• https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-dedupe-nulls-snippets
• https://www.youtube.com/watch?v=O8gmv5-lXhs
• https://www.youtube.com/watch?v=7gdwExjHBbw
• https://www.youtube.com/watch?v=ryYo8UFUgTI
• https://www.youtube.com/watch?v=E_UD3R-VpYE
• ADF Metafunctions explained
• https://mssqldude.wordpress.com/2020/08/31/adf-data-flow-metadata-functions-explained/
• This slide deck
• https://www.slideshare.net/kromerm/data-cleansing-and-data-prep-with-synapse-data-flows
• Tutorial
• https://github.com/kromerm/adfdataflowdocs/blob/master/sampledata/synapse-dataflows-tutorial-
001.docx

Contenu connexe

Tendances

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
Databricks
 

Tendances (20)

Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview Slides
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data Flows
 
Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021
 
Data quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADFData quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADF
 
Lift SSIS package to Azure Data Factory V2
Lift SSIS package to Azure Data Factory V2Lift SSIS package to Azure Data Factory V2
Lift SSIS package to Azure Data Factory V2
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Azure Purview Data Toboggan Erwin de Kreuk
Azure Purview Data Toboggan Erwin de KreukAzure Purview Data Toboggan Erwin de Kreuk
Azure Purview Data Toboggan Erwin de Kreuk
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
DevOps for Databricks
DevOps for DatabricksDevOps for Databricks
DevOps for Databricks
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene Polonichko
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Azure Data Factory presentation with links
Azure Data Factory presentation with linksAzure Data Factory presentation with links
Azure Data Factory presentation with links
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Azure Data Engineering.pptx
Azure Data Engineering.pptxAzure Data Engineering.pptx
Azure Data Engineering.pptx
 

Similaire à Data cleansing and prep with synapse data flows

FEDSPUG November 2013: Developing Apps SharePoint 2013 by Danny Jessee
FEDSPUG November 2013: Developing Apps SharePoint 2013 by Danny JesseeFEDSPUG November 2013: Developing Apps SharePoint 2013 by Danny Jessee
FEDSPUG November 2013: Developing Apps SharePoint 2013 by Danny Jessee
WSPDC & FEDSPUG
 

Similaire à Data cleansing and prep with synapse data flows (20)

Data Lake ETL in the Cloud with ADF
Data Lake ETL in the Cloud with ADFData Lake ETL in the Cloud with ADF
Data Lake ETL in the Cloud with ADF
 
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...201905 Azure Certification DP-100: Designing and Implementing a Data Science ...
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...
 
December 2022 Microsoft 365 Need to Know Webinar
December 2022 Microsoft 365 Need to Know WebinarDecember 2022 Microsoft 365 Need to Know Webinar
December 2022 Microsoft 365 Need to Know Webinar
 
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham ALSecrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
 
Microsoft Data Science Technologies: Back Office Edition
Microsoft Data Science Technologies: Back Office EditionMicrosoft Data Science Technologies: Back Office Edition
Microsoft Data Science Technologies: Back Office Edition
 
Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...
Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...
Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
LinkedIn Certifications List
LinkedIn Certifications ListLinkedIn Certifications List
LinkedIn Certifications List
 
Azureと楽しいクリスマスを迎える安心の手引き
Azureと楽しいクリスマスを迎える安心の手引きAzureと楽しいクリスマスを迎える安心の手引き
Azureと楽しいクリスマスを迎える安心の手引き
 
Microsoft Data Science Technologies: Architecture Edition 201509
Microsoft Data Science Technologies: Architecture Edition 201509Microsoft Data Science Technologies: Architecture Edition 201509
Microsoft Data Science Technologies: Architecture Edition 201509
 
FEDSPUG November 2013: Developing Apps SharePoint 2013 by Danny Jessee
FEDSPUG November 2013: Developing Apps SharePoint 2013 by Danny JesseeFEDSPUG November 2013: Developing Apps SharePoint 2013 by Danny Jessee
FEDSPUG November 2013: Developing Apps SharePoint 2013 by Danny Jessee
 
202310
202310202310
202310
 
Microsoft Azure News - April 2022
Microsoft Azure News - April 2022Microsoft Azure News - April 2022
Microsoft Azure News - April 2022
 
Microsoft Windows 10 Bootcamp - Active directory
Microsoft Windows 10 Bootcamp - Active directoryMicrosoft Windows 10 Bootcamp - Active directory
Microsoft Windows 10 Bootcamp - Active directory
 
Oracle Cloud Infrastructure:2020年5月度サービス・アップデート
Oracle Cloud Infrastructure:2020年5月度サービス・アップデートOracle Cloud Infrastructure:2020年5月度サービス・アップデート
Oracle Cloud Infrastructure:2020年5月度サービス・アップデート
 
Develop computer vision applications with azure computer vision api
Develop computer vision applications with azure computer vision apiDevelop computer vision applications with azure computer vision api
Develop computer vision applications with azure computer vision api
 
いそがしいひとのための Microsoft Ignite 2018 最新情報 Data 編
いそがしいひとのための Microsoft Ignite 2018 最新情報 Data 編いそがしいひとのための Microsoft Ignite 2018 最新情報 Data 編
いそがしいひとのための Microsoft Ignite 2018 最新情報 Data 編
 
Power BI dataflows と Power Platform Data Integration の使いどころ
Power BI dataflows と Power Platform Data Integration の使いどころPower BI dataflows と Power Platform Data Integration の使いどころ
Power BI dataflows と Power Platform Data Integration の使いどころ
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014
 
Oracle to Azure PostgreSQL database migration webinar
Oracle to Azure PostgreSQL database migration webinarOracle to Azure PostgreSQL database migration webinar
Oracle to Azure PostgreSQL database migration webinar
 

Plus de Mark Kromer

Plus de Mark Kromer (20)

Fabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxFabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptx
 
Build data quality rules and data cleansing into your data pipelines
Build data quality rules and data cleansing into your data pipelinesBuild data quality rules and data cleansing into your data pipelines
Build data quality rules and data cleansing into your data pipelines
 
Data cleansing and data prep with synapse data flows
Data cleansing and data prep with synapse data flowsData cleansing and data prep with synapse data flows
Data cleansing and data prep with synapse data flows
 
Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021
 
Azure Data Factory Data Wrangling with Power Query
Azure Data Factory Data Wrangling with Power QueryAzure Data Factory Data Wrangling with Power Query
Azure Data Factory Data Wrangling with Power Query
 
Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101
 
Data Quality Patterns in the Cloud with ADF
Data Quality Patterns in the Cloud with ADFData Quality Patterns in the Cloud with ADF
Data Quality Patterns in the Cloud with ADF
 
Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)
 
Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005
 
ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300
 
ADF Mapping Data Flows Training V2
ADF Mapping Data Flows Training V2ADF Mapping Data Flows Training V2
ADF Mapping Data Flows Training V2
 
ADF Mapping Data Flow Private Preview Migration
ADF Mapping Data Flow Private Preview MigrationADF Mapping Data Flow Private Preview Migration
ADF Mapping Data Flow Private Preview Migration
 
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the CloudSQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
 
Azure Data Factory Data Flow Limited Preview for January 2019
Azure Data Factory Data Flow Limited Preview for January 2019Azure Data Factory Data Flow Limited Preview for January 2019
Azure Data Factory Data Flow Limited Preview for January 2019
 
Microsoft Azure Data Factory Data Flow Scenarios
Microsoft Azure Data Factory Data Flow ScenariosMicrosoft Azure Data Factory Data Flow Scenarios
Microsoft Azure Data Factory Data Flow Scenarios
 
Azure Data Factory Data Flow Preview December 2019
Azure Data Factory Data Flow Preview December 2019Azure Data Factory Data Flow Preview December 2019
Azure Data Factory Data Flow Preview December 2019
 
Azure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekAzure Data Factory for Azure Data Week
Azure Data Factory for Azure Data Week
 
Azure Data Factory for Redmond SQL PASS UG Sept 2018
Azure Data Factory for Redmond SQL PASS UG Sept 2018Azure Data Factory for Redmond SQL PASS UG Sept 2018
Azure Data Factory for Redmond SQL PASS UG Sept 2018
 
Microsoft Build 2018 Analytic Solutions with Azure Data Factory and Azure SQL...
Microsoft Build 2018 Analytic Solutions with Azure Data Factory and Azure SQL...Microsoft Build 2018 Analytic Solutions with Azure Data Factory and Azure SQL...
Microsoft Build 2018 Analytic Solutions with Azure Data Factory and Azure SQL...
 
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSISMicrosoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Dernier (20)

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 

Data cleansing and prep with synapse data flows