CI/CD for a Data Platform

Discover how your organization can use automation to streamline data pipelines with Azure Data Factory.

Published in: Technology

  1. CI/CD for a Data Platform: How to enable consistent data pipelines
  2. Your Host | Koen Rottiers (@KoenRottiers) | Senior Consultant @ Codit | 9 years in IT, track record in networking and infrastructure | Combining people, business and technology
  3. Agenda | A Data Platform? | What is Azure Data Factory? | The Data Lake architecture | Why do CI/CD for a Data Platform? | Azure Data Factory Git integration
  4. A Data Platform?
  5. Data Platform overview | Ingestion from different sources | Centralized data store | Data flows through | Output curated data | Multiple inputs and outputs
  6. What is Azure Data Factory?
  7. Azure Data Factory | Orchestrator | Connectors to different data sources | Cloud and on-premises | Data Mapping flows | Data Wrangling flows | External compute integration | Databricks | AzureML | Azure Functions | ...
  8. Place in the data platform
  9. The Data Lake architecture
  10. High-Level Architecture (diagram labels) | On-Premises | Other Azure Resources | Azure DevOps Project for DataLake infra and code | DB | DB | File Server | ExpressRoute | vNet Integrated External Connections/Sources/Destinations | Transformation | Rg-bru-{env}-datalake-001 | App-bru-{env}-{action}-datalake-001 | la-bru-{env}-{action}-datalake-001 | Kb-bru-{env}-datalake-001 | Stabru{env}landingdatalake001 | mi-bru-{env}-datalake-001 | Stabru{env}rawdatalake001 | Stabru{env}curateddatalake001 | Stabru{env}outputdatalake001 | df-bru-{env}-datalake-001
  11. Self-Hosted integration runtimes (diagram labels) | On-Premises | Azure Networks | DB | File Server | ExpressRoute | vNet Peering | df-bru-{env}-datalake-001 | Hub Network | Self-Hosted Runtime | Azure Integration Runtime | DB
  12. Why do CI/CD for a Data Platform?
  13. Data Platform Roles and Responsibilities | Data platform owner: owns and is responsible for the overall data platform. | Data platform operator: responsible for the day-to-day operational tasks of the platform. | Data pipeline owner: different pipelines run on the platform, each with its own purpose and therefore its own owner, someone from the BI team or the business. | Data pipeline developer: develops new pipelines or adjusts existing ones. | Data source owner: different data sources are integrated with the data platform, and every data source needs an owner who determines access rights, access method, ... This person is responsible for the data residing in the source system and is usually the owner of the application that uses the data source.
  14. Key Advantages | Consistent deployment of data pipelines | Full testing of data flows in the Data Lake | Better collaboration | Feature development tracking | Pipeline quality reviews | More fine-grained data security | Tracking data movements
  15. Azure Data Factory Git integration
  16. Data Factory Git Integration
  17. Repos and branches
  18. What does it look like?
  19. Azure DevOps – Infra Git Repository
  20. Azure DevOps – Pipelines Git Repository
  21. Azure DevOps – Pipelines
  22. Azure Data Factory – Git Integration
  23. So why? | Let data engineers and data scientists focus on delivering value and insights to the business | Enable an agile process in data engineering | Consistency across environments | Track feature development and bug fixing | Be able to audit your data streams
  24. Do you want a demo? Feel free to reach out to us.
  25. Q&A
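
The resource names on slide 10 share an `{env}` placeholder, and some also use `{action}`, which is what lets a deployment produce the same set of resources in every environment. Here is a minimal sketch of how such templates could be expanded per environment. The template strings are copied from slide 10 with their capitalization unchanged. The function name `render_names` and the example values `dev`, `prd` and `ingest` are assumptions for illustration only, not something the deck specifies.

```python
# Name templates copied verbatim from slide 10 of the deck.
# What each prefix stands for is not stated in the transcript,
# so the templates are kept as an unlabeled list.
SLIDE_10_TEMPLATES = [
    "Rg-bru-{env}-datalake-001",
    "App-bru-{env}-{action}-datalake-001",
    "la-bru-{env}-{action}-datalake-001",
    "Kb-bru-{env}-datalake-001",
    "mi-bru-{env}-datalake-001",
    "df-bru-{env}-datalake-001",
    "Stabru{env}landingdatalake001",
    "Stabru{env}rawdatalake001",
    "Stabru{env}curateddatalake001",
    "Stabru{env}outputdatalake001",
]


def render_names(env: str, action: str) -> list[str]:
    """Fill the {env} and {action} placeholders of every template.

    str.format ignores keyword arguments a template does not use,
    so templates without {action} are rendered with {env} only.
    """
    return [t.format(env=env, action=action) for t in SLIDE_10_TEMPLATES]


if __name__ == "__main__":
    # Hypothetical environment and action names, chosen for illustration.
    for env in ("dev", "prd"):
        print(env, render_names(env, "ingest"))
```

Generating the names from a single list keeps every environment on the same convention, which is one way to support the "consistency across environments" point on slide 23.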