An overview of how we have approached DataOps to allow analysts and data scientists to work quickly and release frequently with high confidence. Covers:
- Cloud/multi-cloud architecture
- CI/CD in the data space
- Development, testing, and deployment
- Monitoring and alerting
2. Agenda
01 - Intro TripActions: TA goals, customers, and team
02 - Data at TripActions: Data team objectives and architectural objectives
03 - Infrastructure: Architecture, platforms, and tooling
04 - Dev/Test Flow: How our team builds and tests changes
05 - Deployments: How code moves into production and is monitored
06 - The Future: Future objectives in tooling and process
5. 63% feel they have to handle everything on their own when something goes wrong
83% spend over an hour booking a trip
6. Built for the traveler by the traveler
● 97% traveler adoption
● 34% hotel cost savings
● 1.5M hotel rooms
and many more...
7. By the numbers
● Managing travel for >4,000 companies: partners range from small businesses to Fortune 100 companies in a variety of industries
● Supporting more than a million travellers: TripActions provides booking and support services for all forms of business and personal travel
● 800 employees around the globe: headquartered in California, TripActions has offices around the globe including Amsterdam
9. Who is the Data Team?
BI Palo Alto (BI-PA) - 3 CA, 1 IL
● Product BI
● Liquid (credit card product) reporting and analysis
BI Amsterdam (BI-AMS) - 6 AMS
● Operational BI (Customer Service, Success, Supply)
● Finance reporting and analysis
Data Science (DS) - 7 AMS, 1 Israel
● Insights and analytics
● Predictive modelling
● Production ML services
Data Engineering (DE) - 4 AMS, 1 CA
● Data integration
● Data warehousing
● Infrastructure
● Tooling
10. BI @ TripActions
Business Intelligence pillars: Standardized Reporting, Training and Development, and Ad Hoc Reporting and Analytics
● >50% of the company uses standard reporting daily, with >1000 daily report views
● >65% of the company has attended BI training
● ~100 weekly self-service ad hoc reports
15. Pipelinewise
What is it?
● Extensible, "any source to any target" singer.io wrapper
● Provides a Stitch-like experience for job management via yml definition files
● TripActions maintains a custom fork that extends logging, metrics, and functionality
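A pipelinewise job is declared in a yml file along these lines. This is a minimal sketch, assuming a Postgres source loading into Snowflake; the tap id, connection details, and schema/table names are illustrative, not TripActions' actual configuration:

```yaml
# Hypothetical pipelinewise tap definition (postgres -> snowflake)
id: "postgres_app_db"
name: "Application database"
type: "tap-postgres"
target: "snowflake"            # id of a target defined in its own yml file
db_conn:
  host: "db.example.com"
  port: 5432
  user: "pipelinewise"
  dbname: "app"
schemas:
  - source_schema: "public"
    target_schema: "raw_app"
    tables:
      - table_name: "users"
        replication_method: "INCREMENTAL"
        replication_key: "updated_at"
```

Because the job is just declarative yml, adding or changing a replicated table is a small, reviewable diff rather than custom pipeline code.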
17. Code Architecture - dbt
Data Warehouse: core integration of all data for concepts around users, activity, finance, etc.
● Basis for all reporting and data science
● Provides rich, integrated data
● Updated every 30 minutes
Event Models: "big data" models to transform raw events from logs and event tracking into usable data
● Integrates ~15TB of data from three event sources
● Enriches and normalizes to a common data model
Reporting Marts: denormalized reporting views for BI reporting and self-service
● Underlies every Tableau dashboard and >1400 self-service reports
Data Science: data transformations to feed into our ML analytics and services
● Used to power every site interaction via personalized experiences
● Drives target setting and operations planning
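A reporting-mart model in this layered layout is typically a thin denormalization over warehouse models, wired together with dbt's `ref()` so the DAG stays explicit. A minimal sketch; the model and column names (`fct_bookings`, `dim_users`, etc.) are hypothetical, not TripActions' actual models:

```sql
-- Hypothetical dbt reporting-mart model: flattens warehouse facts and
-- dimensions into one wide view for BI self-service.
select
    b.booking_id,
    b.booked_at,
    u.company_name,
    b.hotel_cost_usd
from {{ ref('fct_bookings') }} as b
join {{ ref('dim_users') }} as u
    on b.user_id = u.user_id
```

Using `ref()` rather than hard-coded table names is what lets later tooling (testing, deploys) resolve which downstream models a change affects.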
19. Development approach
Rapid, high-quality code changes: combining tooling, process, and education allows anyone to continuously, confidently make changes to core data models.
● Work close to the truth: let analysts use real data and test directly against the prod DWH to measure impact
● Make it easy to validate, hard to fail: tooling should make it hard to make mistakes and easy to commit with confidence
● ALWAYS test and document: no change should be deployed without documentation and tests in place first
20. Analyst/developer workflow
1. Begin with a Jira issue: most changes begin with Jira tickets to track development and manage stakeholder communication
2. Analyst builds the change in dbt: all analysts and collaborators are proficient in dbt, and 100% of transformations are built using it; tooling makes it easy
3. Automated quality review (local): all analysts use an automated suite, dbt validator, which verifies transformations and repeatability, runs tests, and adds documentation and new tests
4. Automated quality review (remote): automated tests on the PR check for general code quality, formatting, dependencies, etc.
5. Guided PR review and merge: PR processes allow minimum waiting for review and minimum distraction for others
22. dbt Development
● Every user has their own dev database
● Prior to starting, analysts can either clone tables or create views to production for project dependencies
● All raw data can be modelled and tested based on actual prod data
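In Snowflake, per-analyst dev databases are cheap because cloning is zero-copy: the clone shares storage with the source until either side changes. A minimal sketch of the two options above (the `dev_alice` and `prod.analytics` names are illustrative, not the team's actual naming):

```sql
-- Zero-copy clone: a private, writable copy of a prod table that
-- shares storage until modified.
create database if not exists dev_alice;
create schema if not exists dev_alice.analytics;
create table dev_alice.analytics.users
    clone prod.analytics.users;

-- Or a view, when a read-only passthrough to prod is enough:
create view dev_alice.analytics.bookings as
    select * from prod.analytics.bookings;
```

Either way the analyst models against real prod data without risking writes to production objects.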
23. Local quality review
Quality review is intended to check the following areas:
1. Code runnability
2. Existing tests
3. Data quality
4. Documentation
5. New tests
24. Code quality testing and automated tests
● Code quality checks run in the following ways:
○ Changed table and all dependent tables in the project
○ If models are incrementally loaded, incremental refreshes
● Tests run on the changed model and all dependents
● Other projects are then checked for potential dependencies
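Running checks on "the changed model and all dependents" amounts to walking the model DAG downstream from the changed node. A minimal sketch of that traversal; the DAG structure and model names are illustrative, not the team's actual tooling:

```python
from collections import deque

def downstream_models(dag, changed):
    """Return the changed model plus every model that depends on it,
    in breadth-first order. `dag` maps model -> list of direct dependents."""
    seen = {changed}
    order = [changed]
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in dag.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order

# Example: a staging model feeds a dimension, which feeds two marts.
dag = {
    "stg_users": ["dim_users"],
    "dim_users": ["mart_adoption", "mart_finance"],
}
print(downstream_models(dag, "stg_users"))
# -> ['stg_users', 'dim_users', 'mart_adoption', 'mart_finance']
```

Breadth-first order also gives a valid run order here, since each model is emitted before anything further downstream of it.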
28. PR Review
● PRs follow a standard structure and labelling
○ The local testing report card becomes the body of the PR
● Slack automation coordinates the review
○ Notifies reviewers of the new PR
○ Informs the dev of change requests
○ Tracks and labels when the PR is approved and then merged
30. Deploying into Snowflake
● Changes in dbt models are detected when a PR is merged
● Deploy processes kick off automatically, running:
○ The changed model
○ Dependent models (based on model type and name)
● Global data dictionaries are updated on the server and in Google Sheets with new information
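Detecting changed models on merge can be as simple as mapping the changed file paths (e.g. from `git diff --name-only`) to dbt selectors, where a trailing `+` is dbt's syntax for "this model and everything downstream". A minimal sketch, assuming models live under a `models/` directory; the paths shown are hypothetical:

```python
def dbt_selector_for(changed_paths):
    """Map changed dbt model files to a dbt selection string that runs
    each changed model plus its dependents (dbt's trailing-`+` syntax)."""
    models = [
        p.rsplit("/", 1)[-1][: -len(".sql")]
        for p in changed_paths
        if p.startswith("models/") and p.endswith(".sql")
    ]
    return " ".join(f"{m}+" for m in sorted(models))

changed = [
    "models/marts/mart_finance.sql",
    "README.md",                      # non-model changes are ignored
    "models/staging/stg_users.sql",
]
print(dbt_selector_for(changed))
# -> mart_finance+ stg_users+
```

The resulting string can be passed straight to `dbt run -m <selector>` by the deploy job.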
33. Monitoring via automated testing
● All data is tested every six hours
● Any failing tests are posted to a channel
● The failing SQL is added to a pastebin for easy troubleshooting
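Posting failures to a channel usually means formatting test results into a message and sending it to a Slack incoming webhook. A minimal sketch; the webhook URL, message layout, and failure tuple shape are assumptions, not the team's actual alerting code:

```python
import json
from urllib import request

def format_failures(failures):
    """Build a Slack message body from failing tests.
    `failures` is a list of (test_name, model, n_failing_rows) tuples."""
    lines = [f"dbt test failures ({len(failures)}):"]
    for test, model, rows in failures:
        lines.append(f"- `{test}` on `{model}`: {rows} failing rows")
    return {"text": "\n".join(lines)}

def post_to_slack(webhook_url, payload):
    """POST the payload to a (hypothetical) Slack incoming webhook,
    which accepts a JSON body of the form {"text": ...}."""
    req = request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    request.urlopen(req)

msg = format_failures([("not_null_user_id", "dim_users", 3)])
print(msg["text"])
```

Keeping formatting separate from posting makes the message body easy to unit test without touching the network.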
35. great_expectations
What is it?
● Standardized data profiling and testing
● Alerting on changes in data quality or structure
Planned integration at TripActions
● Directly generate test profiles and configurations via pipelinewise
● Integration of great_expectations tests and data directly into tadoc / dbt docs
36. Pipelinewise 2.0
● Extend to "anywhere to anywhere" functionality with a standardized JSON API importer
● Source data discovery and reporting to show analysts/DS new data objects
37. dbt Validator 2.0
● Smart, dynamic re-cloning of objects into dev databases for faster testing
○ Cleanup functionality to prevent testing on stale objects
○ Fast clone based on the dbt DAG to accelerate development
● Extended test capabilities, including custom tests and data validation -> automated tests
● Automated reporting of BI dependencies on marts and tables
38. Rob Winters | Director, Data | rwinters@tripactions.com
Thank you!