SlideShare a Scribd company logo
1 of 11
Download to read offline
Presented By: Ram Indukuri
Airbyte - ELT in Modern
Data Stack
KnolX Etiquettes
Punctuality
Join the session 5 minutes prior to
the session start time. We start on
time and conclude on time!
Feedback
Make sure to submit a constructive
feedback for all sessions as it is
very helpful for the presenter.
Silent Mode
Keep your mobile devices in silent
mode, feel free to move out of
session in case you need to attend
an urgent call.
Avoid Disturbance
Avoid unwanted chit chat during
the session.
Our Agenda
01 The Holy Grail
02 Modern Data Stack
03 Transformations
04 Airbyte
05 Demo
Holy Grail of Modern Data Architecture - Data Mesh
Modern data stack
ELT Data Pipeline
Cloud Based Data Lake
Data Transformation Tool
Modern BI Tools
1
2
3
4
Modern data stack
1
2
3
4
# PYTHON
# connect to db using wrapper around psycopg2
db = DatabaseConnection(db='db', user='username', password='password')
# grab data from db and load into memory
df = db.run_query("SELECT * FROM cleaned_table;")
df = pd.DataFrame(df, columns=['user_id', 'series_id', 'timestamp'])
# calculate time since first visit
df = df.assign(time_since_first=df.groupby('user_id', sort=False).timestamp.apply(lambda x: x -
x.min()))
# calculate time since last visit
df = df.assign(time_since_last=df.sort_values(['timestamp'], ascending=True).groupby('user_id',
sort=False)['timestamp'].transform(pd.Series.diff))
# save df to compressed csv
df.to_csv('transform_time_test.gz', compression='gzip')
-- SQL equivalent
-- increase the working memory (be careful with this)
set work_mem='600MB';
-- create a dual index on the partition
CREATE INDEX IF NOT EXISTS user_time_index ON table(user_id, timestamp);
-- calculate time since last visit and time since first visit in one pass
SELECT *, AGE(timestamp, LAG(timestamp, 1, timestamp) OVER w) AS time_since_last, AGE(timestamp,
FIRST_VALUE(timestamp) OVER w) AS time_since_first FROM table WINDOW w AS (PARTITION BY user_id
ORDER BY timestamp);
SQL has taken over as dominant transformation tool
dbt
Airbyte - Plugin Architecture
Airbyte Internal Architecture
Airbyte-ModernDataStack

More Related Content

Similar to Airbyte-ModernDataStack

Taming the Beast: A SharePoint Survival Guide for the Server Admin
Taming the Beast: A SharePoint Survival Guide for the Server AdminTaming the Beast: A SharePoint Survival Guide for the Server Admin
Taming the Beast: A SharePoint Survival Guide for the Server AdminCraigToThePoint
 
Oracle data guard for beginners
Oracle data guard for beginnersOracle data guard for beginners
Oracle data guard for beginnersPini Dibask
 
End-to-end working of Apache Spark
End-to-end working of Apache SparkEnd-to-end working of Apache Spark
End-to-end working of Apache SparkKnoldus Inc.
 
ETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationKnoldus Inc.
 
Automating Yourself Out of Trouble
Automating Yourself Out of TroubleAutomating Yourself Out of Trouble
Automating Yourself Out of TroubleJose De La Rosa
 
Getting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksGetting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksKnoldus Inc.
 
FOG drive Keerthana3rd ppt.pptx
FOG drive Keerthana3rd ppt.pptxFOG drive Keerthana3rd ppt.pptx
FOG drive Keerthana3rd ppt.pptxrohithprakash16
 
How to connect Postgres SQL db using slick library with play framework
How to connect Postgres SQL db using slick library with play frameworkHow to connect Postgres SQL db using slick library with play framework
How to connect Postgres SQL db using slick library with play frameworkKnoldus Inc.
 
IOUG Collaborate 18 - Data Guard for Beginners
IOUG Collaborate 18 - Data Guard for BeginnersIOUG Collaborate 18 - Data Guard for Beginners
IOUG Collaborate 18 - Data Guard for BeginnersPini Dibask
 
Oracle Data Guard for Beginners
Oracle Data Guard for BeginnersOracle Data Guard for Beginners
Oracle Data Guard for BeginnersPini Dibask
 
Syed Iqbal haider_updated
Syed Iqbal haider_updatedSyed Iqbal haider_updated
Syed Iqbal haider_updatedSyed Haider
 
01-01-2017 This section will lay out the implementation plan o.docx
01-01-2017 This section will lay out the implementation plan o.docx01-01-2017 This section will lay out the implementation plan o.docx
01-01-2017 This section will lay out the implementation plan o.docxhoney725342
 
PPT - Internovation (1)
PPT - Internovation (1)PPT - Internovation (1)
PPT - Internovation (1)Pritish Mishra
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachDatabricks
 
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Soroosh Khodami
 
Spark Structured APIs
Spark Structured APIsSpark Structured APIs
Spark Structured APIsKnoldus Inc.
 
Data Model for Mainframe in Splunk: The Newest Feature of Ironstream
Data Model for Mainframe in Splunk: The Newest Feature of IronstreamData Model for Mainframe in Splunk: The Newest Feature of Ironstream
Data Model for Mainframe in Splunk: The Newest Feature of IronstreamPrecisely
 
NLP based Data Engineering and ETL Tool - Ask On Data.pdf
NLP based Data Engineering and ETL Tool - Ask On Data.pdfNLP based Data Engineering and ETL Tool - Ask On Data.pdf
NLP based Data Engineering and ETL Tool - Ask On Data.pdfHelicalInsight1
 
James Williams_IT110_IP1.doc.docxIT110-1404A-02 Introduction.docx
James Williams_IT110_IP1.doc.docxIT110-1404A-02 Introduction.docxJames Williams_IT110_IP1.doc.docxIT110-1404A-02 Introduction.docx
James Williams_IT110_IP1.doc.docxIT110-1404A-02 Introduction.docxpriestmanmable
 

Similar to Airbyte-ModernDataStack (20)

Taming the Beast: A SharePoint Survival Guide for the Server Admin
Taming the Beast: A SharePoint Survival Guide for the Server AdminTaming the Beast: A SharePoint Survival Guide for the Server Admin
Taming the Beast: A SharePoint Survival Guide for the Server Admin
 
Oracle data guard for beginners
Oracle data guard for beginnersOracle data guard for beginners
Oracle data guard for beginners
 
End-to-end working of Apache Spark
End-to-end working of Apache SparkEnd-to-end working of Apache Spark
End-to-end working of Apache Spark
 
ETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake Presentation
 
Automating Yourself Out of Trouble
Automating Yourself Out of TroubleAutomating Yourself Out of Trouble
Automating Yourself Out of Trouble
 
Getting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksGetting Started with Delta Lake on Databricks
Getting Started with Delta Lake on Databricks
 
FOG drive Keerthana3rd ppt.pptx
FOG drive Keerthana3rd ppt.pptxFOG drive Keerthana3rd ppt.pptx
FOG drive Keerthana3rd ppt.pptx
 
How to connect Postgres SQL db using slick library with play framework
How to connect Postgres SQL db using slick library with play frameworkHow to connect Postgres SQL db using slick library with play framework
How to connect Postgres SQL db using slick library with play framework
 
IOUG Collaborate 18 - Data Guard for Beginners
IOUG Collaborate 18 - Data Guard for BeginnersIOUG Collaborate 18 - Data Guard for Beginners
IOUG Collaborate 18 - Data Guard for Beginners
 
Oracle Data Guard for Beginners
Oracle Data Guard for BeginnersOracle Data Guard for Beginners
Oracle Data Guard for Beginners
 
Mysql Latency
Mysql LatencyMysql Latency
Mysql Latency
 
Syed Iqbal haider_updated
Syed Iqbal haider_updatedSyed Iqbal haider_updated
Syed Iqbal haider_updated
 
01-01-2017 This section will lay out the implementation plan o.docx
01-01-2017 This section will lay out the implementation plan o.docx01-01-2017 This section will lay out the implementation plan o.docx
01-01-2017 This section will lay out the implementation plan o.docx
 
PPT - Internovation (1)
PPT - Internovation (1)PPT - Internovation (1)
PPT - Internovation (1)
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
 
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
 
Spark Structured APIs
Spark Structured APIsSpark Structured APIs
Spark Structured APIs
 
Data Model for Mainframe in Splunk: The Newest Feature of Ironstream
Data Model for Mainframe in Splunk: The Newest Feature of IronstreamData Model for Mainframe in Splunk: The Newest Feature of Ironstream
Data Model for Mainframe in Splunk: The Newest Feature of Ironstream
 
NLP based Data Engineering and ETL Tool - Ask On Data.pdf
NLP based Data Engineering and ETL Tool - Ask On Data.pdfNLP based Data Engineering and ETL Tool - Ask On Data.pdf
NLP based Data Engineering and ETL Tool - Ask On Data.pdf
 
James Williams_IT110_IP1.doc.docxIT110-1404A-02 Introduction.docx
James Williams_IT110_IP1.doc.docxIT110-1404A-02 Introduction.docxJames Williams_IT110_IP1.doc.docxIT110-1404A-02 Introduction.docx
James Williams_IT110_IP1.doc.docxIT110-1404A-02 Introduction.docx
 

More from Knoldus Inc.

Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML ParsingMastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML ParsingKnoldus Inc.
 
Akka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On IntroductionAkka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On IntroductionKnoldus Inc.
 
Entity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptxEntity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptxKnoldus Inc.
 
Introduction to Redis and its features.pptx
Introduction to Redis and its features.pptxIntroduction to Redis and its features.pptx
Introduction to Redis and its features.pptxKnoldus Inc.
 
GraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdfGraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdfKnoldus Inc.
 
NuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptxNuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptxKnoldus Inc.
 
Data Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable TestingData Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable TestingKnoldus Inc.
 
K8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose KubernetesK8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose KubernetesKnoldus Inc.
 
Introduction to Circle Ci Presentation.pptx
Introduction to Circle Ci Presentation.pptxIntroduction to Circle Ci Presentation.pptx
Introduction to Circle Ci Presentation.pptxKnoldus Inc.
 
Robusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxRobusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxKnoldus Inc.
 
Optimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxOptimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxKnoldus Inc.
 
Azure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxAzure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxKnoldus Inc.
 
CQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxCQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxKnoldus Inc.
 
Scripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationScripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationKnoldus Inc.
 
Getting started with dotnet core Web APIs
Getting started with dotnet core Web APIsGetting started with dotnet core Web APIs
Getting started with dotnet core Web APIsKnoldus Inc.
 
Introduction To Rust part II Presentation
Introduction To Rust part II PresentationIntroduction To Rust part II Presentation
Introduction To Rust part II PresentationKnoldus Inc.
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Configuring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAConfiguring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAKnoldus Inc.
 
Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Knoldus Inc.
 
Azure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptxAzure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptxKnoldus Inc.
 

More from Knoldus Inc. (20)

Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML ParsingMastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
 
Akka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On IntroductionAkka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On Introduction
 
Entity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptxEntity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptx
 
Introduction to Redis and its features.pptx
Introduction to Redis and its features.pptxIntroduction to Redis and its features.pptx
Introduction to Redis and its features.pptx
 
GraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdfGraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdf
 
NuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptxNuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptx
 
Data Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable TestingData Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable Testing
 
K8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose KubernetesK8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose Kubernetes
 
Introduction to Circle Ci Presentation.pptx
Introduction to Circle Ci Presentation.pptxIntroduction to Circle Ci Presentation.pptx
Introduction to Circle Ci Presentation.pptx
 
Robusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxRobusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptx
 
Optimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxOptimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptx
 
Azure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxAzure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptx
 
CQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxCQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptx
 
Scripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationScripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics Presentation
 
Getting started with dotnet core Web APIs
Getting started with dotnet core Web APIsGetting started with dotnet core Web APIs
Getting started with dotnet core Web APIs
 
Introduction To Rust part II Presentation
Introduction To Rust part II PresentationIntroduction To Rust part II Presentation
Introduction To Rust part II Presentation
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Configuring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAConfiguring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRA
 
Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)
 
Azure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptxAzure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptx
 

Recently uploaded

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Recently uploaded (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Airbyte-ModernDataStack

  • 1. Presented By: Ram Indukuri Airbyte - ELT in Modern Data Stack
  • 2. KnolX Etiquettes Punctuality Join the session 5 minutes prior to the session start time. We start on time and conclude on time! Feedback Make sure to submit a constructive feedback for all sessions as it is very helpful for the presenter. Silent Mode Keep your mobile devices in silent mode, feel free to move out of session in case you need to attend an urgent call. Avoid Disturbance Avoid unwanted chit chat during the session.
  • 3. Our Agenda 01 The Holy Grail 02 Modern Data Stack 03 Transformations 04 Airbyte 05 Demo
  • 4. Holy Grail of Modern Data Architecture - Data Mesh
  • 5. Modern data stack ELT Data Pipeline Cloud Based Data Lake Data Transformation Tool Modern BI Tools 1 2 3 4
  • 7. # PYTHON # connect to db using wrapper around psycopg2 db = DatabaseConnection(db='db', user='username', password='password') # grab data from db and load into memory df = db.run_query("SELECT * FROM cleaned_table;") df = pd.DataFrame(df, columns=['user_id', 'series_id', 'timestamp']) # calculate time since first visit df = df.assign(time_since_first=df.groupby('user_id', sort=False).timestamp.apply(lambda x: x - x.min())) # calculate time since last visit df = df.assign(time_since_last=df.sort_values(['timestamp'], ascending=True).groupby('user_id', sort=False)['timestamp'].transform(pd.Series.diff)) # save df to compressed csv df.to_csv('transform_time_test.gz', compression='gzip') -- SQL equivalent -- increase the working memory (be careful with this) set work_mem='600MB'; -- create a dual index on the partition CREATE INDEX IF NOT EXISTS user_time_index ON table(user_id, timestamp); -- calculate time since last visit and time since first visit in one pass SELECT *, AGE(timestamp, LAG(timestamp, 1, timestamp) OVER w) AS time_since_last, AGE(timestamp, FIRST_VALUE(timestamp) OVER w) AS time_since_first FROM table WINDOW w AS (PARTITION BY user_id ORDER BY timestamp); SQL has taken over as dominant transformation tool
  • 8. dbt
  • 9. Airbyte - Plugin Architecture