DataOps at TripActions
27 Oct 2020
Agenda
01 - Intro to TripActions: TA goals, customers, and team
02 - Data at TripActions: Data team objectives and architectural objectives
03 - Infrastructure: Architecture, platforms, and tooling
04 - Dev/Test Flow: How our team builds and tests changes
05 - Deployments: How code moves into production and is monitored
06 - The Future: Future objectives in tooling and process
TripActions Overview
OUR MISSION
To Move People, Ideas & Businesses Forward
● 63% feel they have to handle everything on their own when something goes wrong
● 83% spend over an hour booking a trip
Built for the traveler by the traveler
By the numbers
● 97% traveler adoption
● 34% hotel cost savings
● 1.5M hotel rooms
● Managing travel for >4000 companies: partners range from small businesses to Fortune 100 companies in a variety of industries
● Supporting more than a million travelers: TripActions provides booking and support services for all forms of business and personal travel
● 800 employees around the globe: headquartered in California, TripActions has offices around the globe, including Amsterdam
Data at TripActions
Who is the Data Team?
BI Palo Alto (BI-PA) - 3 CA, 1 IL
● Product BI
● Liquid (credit card product) reporting and analysis
BI Amsterdam (BI-AMS) - 6 AMS
● Operational BI (Customer Service, Success, Supply)
● Finance reporting and analysis
Data Science (DS) - 7 AMS, 1 Israel
● Insights and analytics
● Predictive modelling
● Production ML services
Data Engineering (DE) - 4 AMS, 1 CA
● Data integration
● Data warehousing
● Infrastructure
● Tooling
BI @ TripActions
Business Intelligence Pillars
● Standardized Reporting: >50% of the company uses standard reporting daily, with >1000 daily report views
● Training and Development: >65% of the company has attended BI training
● Ad Hoc Reporting and Analytics: ~100 self-service ad hoc reports per week
Data Science
● Personalizing User Experience
● Empowering Decision Making
Architecture and Infrastructure
Overall BI/Data Engineering Architecture
Additional Services
Data Flows
Pipelinewise
What is it?
● Extensible, "any source to any target" singer.io wrapper
● Provides a Stitch-like experience for job management via YAML definition files
● TripActions maintains a custom fork that extends logging, metrics, and functionality
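Job management through YAML definitions means a misconfigured tap can be caught before it runs. The sketch below is a minimal illustration, not TripActions' fork. It shows a PipelineWise-style tap definition as the Python dict a YAML loader would produce, plus a few sanity checks. The key names (`id`, `type`, `target`, `schemas`, `tables`, `replication_method`, `replication_key`) are modeled on PipelineWise tap files and should be verified against the PipelineWise documentation. The tap id, schema, and table names are hypothetical.

```python
# Hypothetical PipelineWise-style tap definition, shown as the dict a YAML
# loader would return. Key names are modeled on PipelineWise tap files;
# verify them against the PipelineWise docs. All names are illustrative.
TAP_DEFINITION = {
    "id": "postgres_bookings",
    "name": "Bookings database",
    "type": "tap-postgres",
    "target": "snowflake",
    "batch_size_rows": 20000,
    "schemas": [
        {
            "source_schema": "public",
            "target_schema": "raw_bookings",
            "tables": [
                {"table_name": "bookings", "replication_method": "INCREMENTAL",
                 "replication_key": "updated_at"},
                {"table_name": "currencies", "replication_method": "FULL_TABLE"},
            ],
        }
    ],
}

REQUIRED_TOP_LEVEL = ("id", "name", "type", "target", "schemas")
VALID_METHODS = {"FULL_TABLE", "INCREMENTAL", "LOG_BASED"}


def validate_tap(definition: dict) -> list[str]:
    """Return human-readable problems; an empty list means the definition looks sane."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in definition:
            problems.append(f"missing top-level key: {key}")
    for schema in definition.get("schemas", []):
        for table in schema.get("tables", []):
            name = f"{schema.get('source_schema')}.{table.get('table_name')}"
            method = table.get("replication_method")
            if method not in VALID_METHODS:
                problems.append(f"{name}: unknown replication_method {method!r}")
            # Incremental replication needs a column to track progress.
            if method == "INCREMENTAL" and "replication_key" not in table:
                problems.append(f"{name}: INCREMENTAL requires replication_key")
    return problems
```

A check like this could run in CI on every changed definition file, so broken job configs fail review instead of failing the nightly load.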
dbt: Puts the T in ELT
Code Architecture - dbt
Data Warehouse
Core integration of all data for concepts around users, activity, finance, etc.
● Basis for all reporting and data science
● Provides rich, integrated data
● Updated every 30 minutes
Event Models
"Big data" models to transform raw events from logs and event tracking into usable data
● Integrates ~15TB of data from three event sources
● Enriches and normalizes to a common data model
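The event models themselves are dbt SQL. As an illustration only, the Python sketch below shows the idea of mapping three differently shaped event sources onto one common model and enriching each record. Every source name and field name is hypothetical. The sketch also assumes all sources carry epoch-millisecond timestamps.

```python
from datetime import datetime, timezone

# Hypothetical per-source mapping: common field -> source field.
FIELD_MAPS = {
    "web_tracking": {"user_id": "uid", "event_name": "event", "ts": "timestamp_ms"},
    "mobile_tracking": {"user_id": "userId", "event_name": "action", "ts": "sentAt"},
    "server_logs": {"user_id": "user", "event_name": "type", "ts": "epoch_ms"},
}


def normalize(source: str, raw: dict) -> dict:
    """Map a raw event to the common model, tagging its source and converting time to UTC."""
    mapping = FIELD_MAPS[source]
    ts_ms = int(raw[mapping["ts"]])  # assumption: epoch milliseconds everywhere
    return {
        "source": source,
        "user_id": str(raw[mapping["user_id"]]),
        # Normalize event names so "Search" and " search " collapse together.
        "event_name": str(raw[mapping["event_name"]]).strip().lower(),
        "event_at": datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).isoformat(),
    }
```

Keeping the per-source differences in one mapping table means adding a fourth source is a configuration change rather than a new transformation.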
Reporting Marts
Denormalized reporting views for BI reporting and self-service
● Underlies every Tableau dashboard and >1400 self-service reports
Data Science
Data transformations to feed into our ML analytics and services
● Used to power every site interaction via personalized experiences
● Drives target setting and operations planning
How We Develop and Test
Development approach
● Work close to the truth: let analysts use real data and test directly against the prod DWH to measure impact
● Make it easy to validate, hard to fail: tooling should make it hard to make mistakes and easy to commit with confidence
● ALWAYS test and document: no change should be deployed without documentation and tests in place first
Rapid, high-quality code changes
Combining tooling, process, and education allows anyone to make changes to core data models continuously and confidently
Analyst/developer workflow
1. Begin with a Jira issue: most changes begin with Jira tickets to track development and manage stakeholder communication
2. Analyst builds the change in dbt: all analysts and collaborators are proficient in dbt, and 100% of transformations are built using it; tooling makes it easy
3. Automated quality review (local): all analysts use an automated suite, the dbt validator, which verifies transformations and repeatability, runs tests, and adds documentation and new tests
4. Automated quality review (remote): automated tests on the PR check general code quality, formatting, dependencies, etc.
5. Guided PR review and merge: PR processes minimize waiting for review and distraction for others
dbt Development
● Every user has their own dev database
● Before starting, analysts can either clone tables or create views over production for project dependencies
● All raw data can be modelled and tested against actual prod data
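Below is a minimal sketch of how a personal dev database might be seeded. It generates Snowflake SQL that either zero-copy clones production tables or creates views over them. The `DEV_<USER>` naming scheme and the `schema.table` inputs are assumptions for illustration. The statements themselves use standard Snowflake syntax (`CREATE DATABASE IF NOT EXISTS`, `CREATE OR REPLACE TABLE ... CLONE`, `CREATE OR REPLACE VIEW ... AS SELECT`).

```python
# Hypothetical helper: build Snowflake statements that seed a personal dev
# database with production dependencies, as clones or as views.
def dev_setup_sql(user: str, prod_db: str, tables: list[str], mode: str = "clone") -> list[str]:
    dev_db = f"DEV_{user.upper()}"  # assumed naming convention
    statements = [f"CREATE DATABASE IF NOT EXISTS {dev_db};"]
    # Create each schema once, in a stable order.
    schemas = sorted({t.split(".")[0] for t in tables})
    statements += [f"CREATE SCHEMA IF NOT EXISTS {dev_db}.{s};" for s in schemas]
    for table in tables:
        src, dst = f"{prod_db}.{table}", f"{dev_db}.{table}"
        if mode == "clone":
            # Zero-copy clone: a point-in-time snapshot that can be modified freely.
            statements.append(f"CREATE OR REPLACE TABLE {dst} CLONE {src};")
        elif mode == "view":
            # View: always current, but reads production directly.
            statements.append(f"CREATE OR REPLACE VIEW {dst} AS SELECT * FROM {src};")
        else:
            raise ValueError(f"unknown mode: {mode}")
    return statements
```

The choice between the two modes is a trade-off. Clones are cheap and isolated but go stale. Views always reflect prod but cannot be altered for testing.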
Local quality review
Quality review is intended to check the following areas:
1. Code runnability
2. Existing tests
3. Data quality
4. Documentation
5. New tests
Code quality testing and automated tests
● Code quality checks run in the following ways:
○ On the changed table and all dependent tables in the project
○ As incremental refreshes, if models are incrementally loaded
● Tests run on the changed model and all dependents
● Other projects are then checked for potential dependencies
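The checks above can be sketched as follows. This is not the TripActions dbt validator. The wrapper functions and their inputs are hypothetical. The dbt flags are standard CLI options: `--select` with the `+` graph operator selects a model and everything downstream, and `--full-refresh` rebuilds incremental models from scratch. The cross-project check is a simple whole-word text search, which is an assumed stand-in for however the real tooling resolves dependencies.

```python
import re


def validation_commands(model: str, incremental: bool) -> list[list[str]]:
    """dbt invocations covering the changed model and all of its dependents."""
    selector = f"{model}+"  # the model plus everything downstream of it
    commands = [["dbt", "run", "--select", selector, "--full-refresh"]]
    if incremental:
        # A second run without --full-refresh exercises the incremental path.
        commands.append(["dbt", "run", "--select", selector])
    commands.append(["dbt", "test", "--select", selector])
    return commands


def cross_project_references(table: str, projects: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Find files in other projects whose SQL mentions `table` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(table)}\b", re.IGNORECASE)
    hits = {}
    for project, files in projects.items():
        matched = sorted(path for path, sql in files.items() if pattern.search(sql))
        if matched:
            hits[project] = matched
    return hits
```

Each command list can be passed to `subprocess.run(..., check=True)`, so the first failing step stops the review.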
Data Validation - Manual
Documentation and new tests
Net result: Low work, high confidence in changes
PR Review
● PRs follow a standard structure and labelling
○ The local testing report card becomes the body of the PR
● Slack automation coordinates the review
○ Notifies the reviewers of the new PR
○ Informs the developer of change requests
○ Tracks and labels when the PR is approved and then merged
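A report card like the one used as the PR body might be rendered roughly as below. The five rows are the areas from the local quality review (code runnability, existing tests, data quality, documentation, new tests). The result and notes format is an assumption, and `report_card` is a hypothetical helper.

```python
from typing import Optional

# The five local quality-review areas listed in the deck.
AREAS = ["Code runnability", "Existing tests", "Data quality", "Documentation", "New tests"]


def report_card(model: str, results: dict[str, bool], notes: Optional[dict[str, str]] = None) -> str:
    """Render review results as a Markdown table suitable for a PR description."""
    notes = notes or {}
    lines = [f"## Report card: `{model}`", "", "| Area | Result | Notes |", "|---|---|---|"]
    for area in AREAS:
        # Areas that were not run are shown as pending rather than silently omitted.
        status = "pending" if area not in results else ("pass" if results[area] else "FAIL")
        lines.append(f"| {area} | {status} | {notes.get(area, '')} |")
    return "\n".join(lines)
```

Because every area always appears, reviewers see at a glance which checks were skipped as well as which failed.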
Deployments and Monitoring
Deploying into Snowflake
● Changes in dbt models are detected when a PR is merged
● Deploy processes kick off automatically, running:
○ The changed model
○ Dependent models (based on model type and name)
● Global data dictionaries are updated on the server and in Google Sheets with the new information
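One way to detect changed models on merge is to map the files touched by the PR (for example, the output of `git diff --name-only`) to dbt model names. The sketch below does that and builds a single run command. The deck selects dependents by model type and name. This sketch swaps in dbt's `+` graph operator instead, with several selectors passed after `--select` as a union. The `models/` prefix is dbt's default layout and is assumed here.

```python
from pathlib import PurePosixPath
from typing import Optional


def changed_models(paths: list[str], models_dir: str = "models") -> list[str]:
    """Model names for changed .sql files under the models directory."""
    names = set()
    for p in paths:
        path = PurePosixPath(p)
        if path.suffix == ".sql" and path.parts and path.parts[0] == models_dir:
            names.add(path.stem)  # dbt model name = file name without extension
    return sorted(names)


def deploy_command(paths: list[str]) -> Optional[list[str]]:
    """A dbt run covering every changed model and its dependents, or None if nothing changed."""
    models = changed_models(paths)
    if not models:
        return None
    return ["dbt", "run", "--select", *[f"{m}+" for m in models]]
```

Returning `None` when only docs, macros, or YAML changed lets the deploy job skip the run entirely.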
In depth: deployment evaluation process
What if it goes wrong?
Monitoring via automated testing
● All data is tested every six hours
● Any failing tests are posted to a channel
● The SQL for failing tests is added to a pastebin for easy troubleshooting
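Below is a minimal sketch of the alerting step. It picks failing tests out of an artifact shaped like dbt's `run_results.json` (a `results` list whose entries carry `unique_id`, `status`, and, in recent dbt versions, `failures`) and builds a Slack incoming-webhook payload with a `text` field. Field names should be checked against your dbt version. Pastebin links are passed in because how they are created is not described. Sending the payload (an HTTP POST to the webhook URL) is left out.

```python
import json
from typing import Optional


def failing_tests(run_results: dict) -> list[dict]:
    """Entries from a run_results.json-style dict whose status is fail or error."""
    return [r for r in run_results.get("results", []) if r.get("status") in ("fail", "error")]


def slack_payload(run_results: dict, sql_links: Optional[dict[str, str]] = None) -> Optional[str]:
    """JSON body for a Slack incoming webhook, or None when every test passed."""
    failures = failing_tests(run_results)
    if not failures:
        return None  # nothing to post
    sql_links = sql_links or {}
    lines = [f":rotating_light: {len(failures)} failing dbt test(s)"]
    for r in failures:
        test_id = r["unique_id"]
        count = r.get("failures")
        detail = f" ({count} rows)" if count is not None else ""
        link = sql_links.get(test_id)
        # Slack mrkdwn link syntax: <url|label>
        lines.append(f"• {test_id}{detail}" + (f" <{link}|SQL>" if link else ""))
    return json.dumps({"text": "\n".join(lines)})
```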
Looking to the Future
great_expectations: What is it?
● Standardized data profiling and testing
● Alerting on changes in data quality or structure
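To make "alerting on changes in data quality or structure" concrete, here is a plain-Python sketch that compares two simple column profiles. It flags added or removed columns, type changes, and large shifts in null rate. This is not the great_expectations API. The profile format and the 10-percentage-point default threshold are assumptions.

```python
def profile_drift(old: dict[str, dict], new: dict[str, dict], max_null_shift: float = 0.10) -> list[str]:
    """Compare {column: {"type": ..., "null_rate": ...}} profiles and list notable changes."""
    alerts = []
    # Structural changes: columns that disappeared or appeared.
    for col in sorted(old.keys() - new.keys()):
        alerts.append(f"column removed: {col}")
    for col in sorted(new.keys() - old.keys()):
        alerts.append(f"column added: {col}")
    # Per-column changes for columns present in both profiles.
    for col in sorted(old.keys() & new.keys()):
        if old[col]["type"] != new[col]["type"]:
            alerts.append(f"type changed: {col} {old[col]['type']} -> {new[col]['type']}")
        shift = abs(new[col]["null_rate"] - old[col]["null_rate"])
        if shift > max_null_shift:
            alerts.append(f"null rate shifted: {col} by {shift:.0%}")
    return alerts
```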
Planned integration at TripActions
● Directly generate test profiles and configurations via pipelinewise
● Integrate great_expectations tests and data directly into tadoc / dbt docs
Pipelinewise 2.0
● Extend to "anywhere to anywhere" functionality with a standardized JSON API importer
● Source data discovery and reporting to show analysts/DS new data objects
dbt Validator 2.0
● Smart, dynamic re-cloning of objects into dev databases for faster testing
○ Cleanup functionality to prevent testing on stale objects
○ Fast clone based on the dbt DAG to accelerate development
● Extended test capabilities, including custom tests and data validation -> automated tests
● Automated reporting of BI dependencies on marts and tables
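The planned stale-object cleanup could work along these lines. A dev clone is treated as stale if it is older than a cutoff, or if its production source was altered after the clone was taken. The inputs are assumptions: the cloned-at times and the prod last-altered times are keyed by `schema.table`, and the latter could plausibly come from the `LAST_ALTERED` column of Snowflake's `INFORMATION_SCHEMA.TABLES`. The 24-hour cutoff and the `stale_clone_drops` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_clone_drops(dev_db: str, clones: dict[str, datetime], prod_last_altered: dict[str, datetime],
                      now: Optional[datetime] = None, max_age: timedelta = timedelta(hours=24)) -> list[str]:
    """DROP statements for dev clones that are too old or older than their prod source."""
    if now is None:
        now = datetime.now(timezone.utc)
    drops = []
    for name, cloned_at in sorted(clones.items()):
        too_old = now - cloned_at > max_age
        source_changed = prod_last_altered.get(name)
        # Prod changed after we cloned it, so tests would run on outdated data.
        outdated = source_changed is not None and source_changed > cloned_at
        if too_old or outdated:
            drops.append(f"DROP TABLE IF EXISTS {dev_db}.{name};")
    return drops
```

Dropping rather than refreshing keeps the helper simple. The re-clone step can then recreate only the objects the next dbt run actually selects.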
Rob Winters | Director, Data | rwinters@tripactions.com
Thank you!

 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

Data Ops at TripActions

  • 13. Overall BI/Data Engineering Architecture Additional Services
  • 15. Pipelinewise
    What is it?
    ● Extensible, “any source to any target” singer.io wrapper
    ● Provides a Stitch-like experience for job management via YAML definition files
    ● TripActions maintains a custom fork that extends logging, metrics, and functionality
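Because Pipelinewise wraps singer.io, every tap it runs speaks the Singer message protocol: JSON objects written one per line, with a SCHEMA message describing a stream, RECORD messages carrying rows, and a STATE message holding a bookmark for the next incremental run. The sketch below shows that message flow for a single stream. The stream name `trips`, its schema, and the `updated_at` bookmark column are hypothetical, and the bookmark logic assumes rows arrive sorted by `updated_at`.

```python
import json
import sys
from typing import Dict, Iterable, Iterator, Optional, TextIO

# Hypothetical stream schema; a real tap would discover this from the source.
TRIPS_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "updated_at": {"type": "string", "format": "date-time"},
    },
}


def tap_messages(stream: str, rows: Iterable[Dict]) -> Iterator[Dict]:
    """Yield Singer SCHEMA, RECORD and STATE messages for one stream."""
    # SCHEMA must precede the records it describes.
    yield {"type": "SCHEMA", "stream": stream, "schema": TRIPS_SCHEMA, "key_properties": ["id"]}
    bookmark: Optional[str] = None
    for row in rows:
        yield {"type": "RECORD", "stream": stream, "record": row}
        # Assumes rows are sorted by updated_at, so the last one is the newest.
        bookmark = row["updated_at"]
    # STATE carries the bookmark so the next run can resume incrementally.
    yield {"type": "STATE", "value": {"bookmarks": {stream: {"updated_at": bookmark}}}}


def write_messages(messages: Iterable[Dict], out: TextIO = sys.stdout) -> None:
    """Write one JSON message per line, which is what a Singer target reads on stdin."""
    for message in messages:
        out.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    rows = [
        {"id": 1, "updated_at": "2020-10-27T08:00:00Z"},
        {"id": 2, "updated_at": "2020-10-27T09:30:00Z"},
    ]
    write_messages(tap_messages("trips", rows))
```

Keeping message generation separate from writing makes the tap easy to test without capturing stdout.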
  • 16. dbt: Puts the T in ELT
  • 17. Code Architecture - dbt
    Data Warehouse
    Core integration of all data for concepts around users, activity, finance, etc.
    ● Basis for all reporting and data science
    ● Provides rich, integrated data
    ● Updated every 30 minutes
    Event Models
    “Big data” models that transform raw events from logs and event tracking into usable data
    ● Integrates ~15 TB of data from three event sources
    ● Enriches and normalizes events into a common data model
    Reporting Marts
    Denormalized reporting views for BI reporting and self-service
    ● Underlie every Tableau dashboard and >1400 self-service reports
    Data Science
    Data transformations that feed our ML analytics and services
    ● Used to power every site interaction via personalized experiences
    ● Drive target setting and operations planning
  • 18. How We Develop and Test
  • 19. Development approach
    Work close to the truth
    Let analysts use real data and test directly against the prod DWH to measure impact
    Make it easy to validate, hard to fail
    Tooling should make it hard to make mistakes and easy to commit with confidence
    ALWAYS test and document
    No change should be deployed without documentation and tests in place first
    Rapid, high-quality code changes
    Combining tooling, process, and education allows anyone to change core data models continuously and confidently
  • 20. Analyst/developer workflow
    Begin with a Jira issue
    Most changes begin with a Jira ticket to track development and manage stakeholder communication
    Analyst builds change in dbt
    All analysts and collaborators are proficient in dbt, and 100% of transformations are built with it; tooling makes it easy
    Automated quality review - local
    All analysts use an automated suite (the dbt validator) that verifies transformations and repeatability, runs tests, and adds documentation and new tests
    Automated quality review - remote
    Automated tests on the PR check general code quality, formatting, dependencies, etc.
    Guided PR review and merge
    The PR process minimizes both the wait for review and the distraction for others
  • 22. dbt Development
    ● Every user has their own dev database
    ● Before starting, analysts can either clone tables or create views pointing at production for project dependencies
    ● All raw data can be modelled and tested against actual prod data
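Assuming the dev databases live in Snowflake (the deployment slide names Snowflake as the warehouse), the two setup options map onto two kinds of statement: a zero-copy `CLONE`, which gives a writable snapshot of prod, or a view that always reads live prod data. The helper below is a hypothetical sketch that generates either kind; the database names `dev_alice` and `analytics` are made up for illustration.

```python
from typing import List


def dev_setup_sql(dev_db: str, prod_db: str, tables: List[str], mode: str = "clone") -> List[str]:
    """Build Snowflake statements that mirror prod dependencies into a personal dev database.

    tables: "schema.table" names the project depends on.
    mode:   "clone" for zero-copy clones, "view" for views that read prod directly.
    """
    if mode not in ("clone", "view"):
        raise ValueError(f"unknown mode: {mode}")
    statements: List[str] = []
    created_schemas = set()
    for name in tables:
        schema = name.split(".")[0]
        if schema not in created_schemas:
            statements.append(f"CREATE SCHEMA IF NOT EXISTS {dev_db}.{schema};")
            created_schemas.add(schema)
        if mode == "clone":
            # Snapshot of prod at clone time; writable, and no storage is copied up front.
            statements.append(f"CREATE OR REPLACE TABLE {dev_db}.{name} CLONE {prod_db}.{name};")
        else:
            # Always reflects live prod data, but cannot be written to.
            statements.append(f"CREATE OR REPLACE VIEW {dev_db}.{name} AS SELECT * FROM {prod_db}.{name};")
    return statements


for stmt in dev_setup_sql("dev_alice", "analytics", ["core.users", "core.trips"]):
    print(stmt)
```

The trade-off is freshness versus isolation: clones can be modified freely but drift from prod over time, while views stay current but are read-only.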
  • 23. Local quality review
    Quality review is intended to check the following areas:
    1. Code runnability
    2. Existing tests
    3. Data quality
    4. Documentation
    5. New tests
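The PR Review slide says the local testing report card becomes the body of the PR. A minimal sketch of rendering the five review areas as a Markdown checklist is shown below; the layout, the "not run" default, and the example notes are assumptions for illustration, not the dbt validator's actual output.

```python
from typing import Dict, Tuple

# The five review areas listed on the slide, in order.
AREAS = ["Code runnability", "Existing tests", "Data quality", "Documentation", "New tests"]


def report_card(results: Dict[str, Tuple[bool, str]]) -> str:
    """Render check results as a Markdown checklist; missing areas count as not run."""
    lines = ["## Local quality review", ""]
    all_passed = True
    for area in AREAS:
        passed, note = results.get(area, (False, "not run"))
        all_passed = all_passed and passed
        lines.append(f"- [{'x' if passed else ' '}] {area}: {note}")
    lines += ["", f"Overall: {'PASS' if all_passed else 'FAIL'}"]
    return "\n".join(lines)


# Example notes are hypothetical.
print(report_card({
    "Code runnability": (True, "model and dependents built"),
    "Existing tests": (True, "all passed"),
    "Data quality": (True, "row counts compared with prod"),
    "Documentation": (True, "all columns described"),
    "New tests": (False, "no tests on new column"),
}))
```

Treating a missing area as a failure keeps a skipped check from silently producing a passing card.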
  • 24. Code quality testing and automated tests
    ● Code quality checks run in the following ways:
      ○ On the changed table and all dependent tables in the project
      ○ As incremental refreshes, if models are incrementally loaded
    ● Tests run on the changed model and all of its dependents
    ● Other projects are then checked for potential dependencies
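"The changed model and all dependents" is the set of nodes reachable downstream from the changed models in the dbt DAG. The sketch below computes that set with a breadth-first walk; the model names and edges in `CHILDREN` are a hypothetical project graph.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical project graph: each model maps to the models that select from it.
CHILDREN: Dict[str, List[str]] = {
    "stg_trips": ["fct_trips"],
    "fct_trips": ["mart_trips", "ds_trip_features"],
    "stg_users": ["dim_users"],
    "dim_users": ["mart_trips"],
}


def downstream(children: Dict[str, List[str]], changed: Set[str]) -> Set[str]:
    """Breadth-first walk returning the changed models plus every model downstream of them."""
    selected = set(changed)
    queue = deque(changed)
    while queue:
        model = queue.popleft()
        for child in children.get(model, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


print(sorted(downstream(CHILDREN, {"stg_trips"})))
# ['ds_trip_features', 'fct_trips', 'mart_trips', 'stg_trips']
```

Within a single dbt project, the `+` graph operator selects the same set (for example `dbt run --select stg_trips+` followed by `dbt test --select stg_trips+`), and running `dbt run` a second time exercises the incremental branch of incremental models. An explicit walk like this may be more useful for the cross-project dependency check, which falls outside one project's graph.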
  • 27. Net result: Low work, high confidence in changes
  • 28. PR Review
    ● PRs follow a standard structure and labelling
      ○ The local testing report card becomes the body of the PR
    ● Slack automation coordinates the review
      ○ Notifies the reviewers of the new PR
      ○ Informs the developer of change requests
      ○ Tracks and labels when the PR is approved and then merged
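One simple way to implement the "notify the reviewers" step is a Slack incoming webhook, which accepts a JSON body with a `text` field. The sketch below assumes that mechanism (the slide does not say how the automation is built); the reviewer IDs and PR URL are placeholders, and `<@USER_ID>` and `<url|label>` are Slack's mention and link syntax.

```python
import json
import urllib.request
from typing import Dict, List


def review_request(pr_url: str, title: str, reviewer_ids: List[str]) -> Dict:
    """Build an incoming-webhook payload that @-mentions reviewers and links the PR."""
    mentions = " ".join(f"<@{uid}>" for uid in reviewer_ids)
    # Slack mrkdwn link syntax: <url|label>
    return {"text": f"{mentions} New PR ready for review: <{pr_url}|{title}>"}


def post_to_slack(webhook_url: str, payload: Dict) -> int:
    """POST a JSON payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Usage would look like `post_to_slack(webhook_url, review_request("https://example.com/pr/42", "Add trips mart", ["U01", "U02"]))`, with the webhook URL kept in a secret rather than in code. The change-request and merge notifications could reuse `post_to_slack` with different payloads.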
  • 30. Deploying into Snowflake
    ● Changes to dbt models are detected when a PR is merged
    ● Deploy processes kick off automatically, running:
      ○ The changed model
      ○ Dependent models (based on model type and name)
    ● Global data dictionaries are updated on the server and in Google Sheets with the new information
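Change detection on merge can start from the list of files the PR touched (for example, the output of `git diff --name-only` between the old and new main branch). The sketch below maps those paths to dbt model names and builds a `dbt run` command that selects each changed model plus its descendants. It assumes models live under the default `models/` directory and uses only the model name; the type-based rules mentioned on the slide are not modelled.

```python
from pathlib import PurePosixPath
from typing import List


def changed_models(paths: List[str], model_dir: str = "models") -> List[str]:
    """Map files touched by a merged PR to dbt model names (the .sql file stem)."""
    names = set()
    for p in paths:
        path = PurePosixPath(p)
        if path.parts[:1] == (model_dir,) and path.suffix == ".sql":
            names.add(path.stem)
    return sorted(names)


def deploy_command(paths: List[str]) -> List[str]:
    """Build a dbt invocation for the changed models and all of their descendants."""
    models = changed_models(paths)
    if not models:
        return []  # nothing to deploy
    # The trailing "+" is dbt's graph operator for "this model and everything downstream".
    return ["dbt", "run", "--select"] + [f"{m}+" for m in models]


print(deploy_command(["models/core/fct_trips.sql", "models/core/schema.yml", "README.md"]))
# ['dbt', 'run', '--select', 'fct_trips+']
```

Returning an argument list rather than a shell string lets it be passed straight to `subprocess.run` without quoting issues.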
  • 31. In depth: deployment evaluation process
  • 32. What if it goes wrong?
  • 33. Monitoring via automated testing
    ● All data is tested every six hours
    ● Any failing tests are posted to a Slack channel
    ● The SQL is added to a pastebin for easy troubleshooting
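A six-hour test run could be scheduled with a cron entry such as `0 */6 * * *` that invokes `dbt test`; dbt then writes its results to `target/run_results.json`. The sketch below picks out failing tests and formats a channel message. It assumes the result layout of recent dbt versions (a `results` list whose entries have `unique_id`, `status`, and `failures`); these field names have changed across dbt releases, so check them against the version in use. The compiled SQL of each test, which is what would go to the pastebin, is written under `target/compiled/`.

```python
import json
from typing import Dict, List


def failing_tests(run_results: Dict) -> List[Dict]:
    """Return test results whose status is 'fail' or 'error'."""
    return [r for r in run_results.get("results", []) if r.get("status") in ("fail", "error")]


def alert_text(failures: List[Dict]) -> str:
    """Format a channel message; an empty string means there is nothing to post."""
    if not failures:
        return ""
    lines = [f"{len(failures)} dbt test(s) failing:"]
    for r in failures:
        # 'failures' is the failing-row count; it may be None for errored tests.
        lines.append(f"- {r['unique_id']} ({r['status']}, failing rows: {r.get('failures')})")
    return "\n".join(lines)


def load_run_results(path: str = "target/run_results.json") -> Dict:
    """Read the results file dbt writes after a run."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Posting only when `alert_text` returns a non-empty string keeps the channel quiet when everything passes.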
  • 34. Looking to the Future
  • 35. What is it?
    ● Standardized data profiling and testing
    ● Alerting on changes in data quality or structure
    Planned integration at TripActions
    ● Generate test profiles and configurations directly via Pipelinewise
    ● Integrate great_expectations tests and data directly into tadoc / dbt docs
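The idea behind "alerting on changes in data quality or structure" can be shown without any library: profile a table, compare the profile with the previous one, and alert on differences. The plain-Python sketch below does this for column sets and null rates. It is not great_expectations' API (which expresses such checks as expectations, e.g. `expect_column_values_to_not_be_null`), and the 5-point null-rate threshold is an arbitrary assumption.

```python
from typing import Dict, List


def profile(rows: List[Dict]) -> Dict:
    """Tiny profile: row count, column names and per-column null rate."""
    columns = sorted({key for row in rows for key in row})
    n = len(rows)
    # A key missing from a row counts as null for that row.
    null_rate = {
        c: (sum(1 for row in rows if row.get(c) is None) / n if n else 0.0)
        for c in columns
    }
    return {"row_count": n, "columns": columns, "null_rate": null_rate}


def drift_alerts(old: Dict, new: Dict, max_null_increase: float = 0.05) -> List[str]:
    """Compare two profiles and describe structural or quality changes."""
    alerts = []
    added = sorted(set(new["columns"]) - set(old["columns"]))
    removed = sorted(set(old["columns"]) - set(new["columns"]))
    if added:
        alerts.append(f"columns added: {added}")
    if removed:
        alerts.append(f"columns removed: {removed}")
    for c in sorted(set(old["columns"]) & set(new["columns"])):
        delta = new["null_rate"][c] - old["null_rate"][c]
        if delta > max_null_increase:
            alerts.append(f"null rate of {c} rose by {delta:.0%}")
    return alerts
```

Storing each profile alongside the data would let the same comparison run on every load, which is roughly what generating profiles via Pipelinewise would enable.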
  • 36. Pipelinewise 2.0
    ● Extend to “anywhere to anywhere” functionality with a standardized JSON API importer
    ● Source data discovery and reporting to show analysts/DS new data objects
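Source data discovery reporting reduces to a catalog diff: compare what discovery finds in a source against what is already known and report the new tables and columns. The sketch below assumes a catalog shaped as a mapping from `schema.table` to a set of column names; the `salesforce` table names are hypothetical.

```python
from typing import Dict, List, Set

# "schema.table" -> column names
Catalog = Dict[str, Set[str]]


def discovery_report(known: Catalog, discovered: Catalog) -> Dict:
    """List tables and columns present in the source but not yet in the known catalog."""
    new_tables = sorted(set(discovered) - set(known))
    new_columns: Dict[str, List[str]] = {}
    for table in sorted(set(discovered) & set(known)):
        extra = discovered[table] - known[table]
        if extra:
            new_columns[table] = sorted(extra)
    return {"new_tables": new_tables, "new_columns": new_columns}


known = {"salesforce.account": {"id", "name"}}
discovered = {"salesforce.account": {"id", "name", "tier"}, "salesforce.opportunity": {"id"}}
print(discovery_report(known, discovered))
# {'new_tables': ['salesforce.opportunity'], 'new_columns': {'salesforce.account': ['tier']}}
```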
  • 37. dbt Validator 2.0
    ● Smart, dynamic re-cloning of objects into dev databases for faster testing
      ○ Cleanup functionality to prevent testing on stale objects
      ○ Fast cloning based on the dbt DAG to accelerate development
    ● Extended test capabilities, including custom tests and data validation -> automated tests
    ● Automated reporting of BI dependencies on marts and tables
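DAG-based re-cloning can be read as: clone only the upstream objects the model under development actually reads from, and refresh any clone that is missing or older than some maximum age. The sketch below is one hypothetical way to plan that; the `parents` graph, the clone timestamps, and the 24-hour limit are illustrative assumptions. In dbt's own selector syntax the same ancestor set is `+mart_trips`, minus the model itself.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Set


def upstream(parents: Dict[str, List[str]], model: str) -> Set[str]:
    """All ancestors of `model` in the DAG (the objects it needs in order to build)."""
    seen: Set[str] = set()
    stack = [model]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def plan_reclone(
    parents: Dict[str, List[str]],
    model: str,
    cloned_at: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> List[str]:
    """Ancestors that are missing from the dev database or were cloned longer ago than max_age."""
    return sorted(
        name for name in upstream(parents, model)
        if name not in cloned_at or now - cloned_at[name] > max_age
    )


if __name__ == "__main__":
    now = datetime(2020, 10, 27, 12, 0, tzinfo=timezone.utc)
    parents = {"mart_trips": ["fct_trips"], "fct_trips": ["stg_trips", "dim_users"]}
    cloned_at = {"fct_trips": now - timedelta(hours=1), "stg_trips": now - timedelta(hours=48)}
    print(plan_reclone(parents, "mart_trips", cloned_at, now))
    # ['dim_users', 'stg_trips']
```

Each name in the plan could then be refreshed with a statement like the `CREATE OR REPLACE TABLE ... CLONE ...` pattern used for dev setup.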
  • 38. Thank you!
    Rob Winters | Director, Data | rwinters@tripactions.com