An overview of how we have approached DataOps to allow analysts and data scientists to work quickly and release frequently with high confidence. Covers:
- Cloud/multi-cloud architecture
- CI/CD in the data space
- Development, testing, and deployment
- Monitoring and alerting
2. Agenda
01 - Intro TripActions: TA goals, customers, and team
02 - Data at TripActions: Data team objectives and architectural objectives
03 - Infrastructure: Architecture, platforms, and tooling
04 - Dev/Test Flow: How our team builds and tests changes
05 - Deployments: How code moves into production and is monitored
06 - The Future: Future objectives in tooling and process
5. 63% feel they have to handle everything on their own when something goes wrong
83% spend over an hour booking a trip
6. Built for the traveler by the traveler
● 97% traveler adoption
● 34% hotel cost savings
● 1.5M hotel rooms
and many more...
7. By the numbers
● Managing travel for >4,000 companies: partners range from small businesses to Fortune 100 companies in a variety of industries
● Supporting more than a million travellers: TripActions provides booking and support services for all forms of business and personal travel
● 800 employees around the globe: headquartered in California, TripActions has offices around the globe including Amsterdam
9. Who is the Data Team?
BI Palo Alto (BI-PA) - 3 CA, 1 IL
● Product BI
● Liquid (credit card product) reporting and analysis
BI Amsterdam (BI-AMS) - 6 AMS
● Operational BI (Customer Service, Success, Supply)
● Finance reporting and analysis
Data Science (DS) - 7 AMS, 1 Israel
● Insights and analytics
● Predictive modelling
● Production ML services
Data Engineering (DE) - 4 AMS, 1 CA
● Data integration
● Data warehousing
● Infrastructure
● Tooling
10. BI @ TripActions
Business Intelligence pillars: Standardized Reporting, Training and Development, and Ad Hoc Reporting and Analytics
● >50% of the company uses standard reporting daily, with >1000 daily report views
● >65% of the company has attended BI training
● ~100 weekly self-service ad hoc reports
15. Pipelinewise
What is it?
● Extensible, "any source to any target" singer.io wrapper
● Provides a Stitch-like experience for job management via yml definition files
● TripActions maintains a custom fork that extends logging, metrics, and functionality
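A pipelinewise job is declared in a yml file along these lines. This is a minimal sketch, assuming a Postgres source loading into Snowflake; the tap id, connection details, and schema/table names are illustrative, not TripActions' actual configuration:

```yaml
# Hypothetical pipelinewise tap definition (postgres -> snowflake)
id: "postgres_app_db"
name: "Application database"
type: "tap-postgres"
target: "snowflake"            # id of a target defined in its own yml file
db_conn:
  host: "db.example.com"
  port: 5432
  user: "pipelinewise"
  dbname: "app"
schemas:
  - source_schema: "public"
    target_schema: "raw_app"
    tables:
      - table_name: "users"
        replication_method: "INCREMENTAL"
        replication_key: "updated_at"
```

Because the job is just declarative yml, adding or changing a replicated table is a small, reviewable diff rather than custom pipeline code.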
17. Code Architecture - dbt
Data Warehouse: core integration of all data for concepts around users, activity, finance, etc.
● Basis for all reporting and data science
● Provides rich, integrated data
● Updated every 30 minutes
Event Models: "big data" models to transform raw events from logs and event tracking into usable data
● Integrates ~15TB of data from three event sources
● Enriches and normalizes to a common data model
Reporting Marts: denormalized reporting views for BI reporting and self-service
● Underlies every Tableau dashboard and >1400 self-service reports
Data Science: data transformations to feed into our ML analytics and services
● Used to power every site interaction via personalized experiences
● Drives target setting and operations planning
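A reporting-mart model in this layered layout is typically a thin denormalization over warehouse models, wired together with dbt's `ref()` so the DAG stays explicit. A minimal sketch; the model and column names (`fct_bookings`, `dim_users`, etc.) are hypothetical, not TripActions' actual models:

```sql
-- Hypothetical dbt reporting-mart model: flattens warehouse facts and
-- dimensions into one wide view for BI self-service.
select
    b.booking_id,
    b.booked_at,
    u.company_name,
    b.hotel_cost_usd
from {{ ref('fct_bookings') }} as b
join {{ ref('dim_users') }} as u
    on b.user_id = u.user_id
```

Using `ref()` rather than hard-coded table names is what lets later tooling (testing, deploys) resolve which downstream models a change affects.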
19. Development approach
Rapid, high-quality code changes: combining tooling, process, and education allows anyone to continuously, confidently make changes to core data models.
● Work close to the truth: let analysts use real data and test directly against the prod DWH to measure impact
● Make it easy to validate, hard to fail: tooling should make it hard to make mistakes and easy to commit with confidence
● ALWAYS test and document: no change should be deployed without documentation and tests in place first
20. Analyst/developer workflow
1. Begin with a Jira issue: most changes begin with Jira tickets to track development and manage stakeholder communication
2. Analyst builds the change in dbt: all analysts and collaborators are proficient in dbt, and 100% of transformations are built using it; tooling makes it easy
3. Automated quality review (local): all analysts use an automated suite, dbt validator, which verifies transformations and repeatability, runs tests, and adds documentation and new tests
4. Automated quality review (remote): automated tests on the PR check for general code quality, formatting, dependencies, etc.
5. Guided PR review and merge: PR processes allow minimum waiting for review and minimum distraction for others
22. dbt Development
● Every user has their own dev database
● Prior to starting, analysts can either clone tables or create views to production for project dependencies
● All raw data can be modelled and tested based on actual prod data
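In Snowflake, per-analyst dev databases are cheap because cloning is zero-copy: the clone shares storage with the source until either side changes. A minimal sketch of the two options above (the `dev_alice` and `prod.analytics` names are illustrative, not the team's actual naming):

```sql
-- Zero-copy clone: a private, writable copy of a prod table that
-- shares storage until modified.
create database if not exists dev_alice;
create schema if not exists dev_alice.analytics;
create table dev_alice.analytics.users
    clone prod.analytics.users;

-- Or a view, when a read-only passthrough to prod is enough:
create view dev_alice.analytics.bookings as
    select * from prod.analytics.bookings;
```

Either way the analyst models against real prod data without risking writes to production objects.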
23. Local quality review
Quality review is intended to check the following areas:
1. Code runnability
2. Existing tests
3. Data quality
4. Documentation
5. New tests
24. Code quality testing and automated tests
● Code quality checks run in the following ways:
○ Changed table and all dependent tables in the project
○ If models are incrementally loaded, incremental refreshes
● Tests run on the changed model and all dependents
● Other projects are then checked for potential dependencies
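Running checks on "the changed model and all dependents" amounts to walking the model DAG downstream from the changed node. A minimal sketch of that traversal; the DAG structure and model names are illustrative, not the team's actual tooling:

```python
from collections import deque

def downstream_models(dag, changed):
    """Return the changed model plus every model that depends on it,
    in breadth-first order. `dag` maps model -> list of direct dependents."""
    seen = {changed}
    order = [changed]
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in dag.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order

# Example: a staging model feeds a dimension, which feeds two marts.
dag = {
    "stg_users": ["dim_users"],
    "dim_users": ["mart_adoption", "mart_finance"],
}
print(downstream_models(dag, "stg_users"))
# -> ['stg_users', 'dim_users', 'mart_adoption', 'mart_finance']
```

Breadth-first order also gives a valid run order here, since each model is emitted before anything further downstream of it.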
28. PR Review
● PRs follow a standard structure and labelling
○ The local testing report card becomes the body of the PR
● Slack automation coordinates the review
○ Notifies reviewers of the new PR
○ Informs the dev of change requests
○ Tracks and labels when the PR is approved and then merged
30. Deploying into Snowflake
● Changes in dbt models are detected when a PR is merged
● Deploy processes kick off automatically, running:
○ The changed model
○ Dependent models (based on model type and name)
● Global data dictionaries are updated on the server and in Google Sheets with new information
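Detecting changed models on merge can be as simple as mapping the changed file paths (e.g. from `git diff --name-only`) to dbt selectors, where a trailing `+` is dbt's syntax for "this model and everything downstream". A minimal sketch, assuming models live under a `models/` directory; the paths shown are hypothetical:

```python
def dbt_selector_for(changed_paths):
    """Map changed dbt model files to a dbt selection string that runs
    each changed model plus its dependents (dbt's trailing-`+` syntax)."""
    models = [
        p.rsplit("/", 1)[-1][: -len(".sql")]
        for p in changed_paths
        if p.startswith("models/") and p.endswith(".sql")
    ]
    return " ".join(f"{m}+" for m in sorted(models))

changed = [
    "models/marts/mart_finance.sql",
    "README.md",                      # non-model changes are ignored
    "models/staging/stg_users.sql",
]
print(dbt_selector_for(changed))
# -> mart_finance+ stg_users+
```

The resulting string can be passed straight to `dbt run -m <selector>` by the deploy job.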
33. Monitoring via automated testing
● All data is tested every six hours
● Any failing tests are posted to a channel
● The failing SQL is added to a pastebin for easy troubleshooting
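Posting failures to a channel usually means formatting test results into a message and sending it to a Slack incoming webhook. A minimal sketch; the webhook URL, message layout, and failure tuple shape are assumptions, not the team's actual alerting code:

```python
import json
from urllib import request

def format_failures(failures):
    """Build a Slack message body from failing tests.
    `failures` is a list of (test_name, model, n_failing_rows) tuples."""
    lines = [f"dbt test failures ({len(failures)}):"]
    for test, model, rows in failures:
        lines.append(f"- `{test}` on `{model}`: {rows} failing rows")
    return {"text": "\n".join(lines)}

def post_to_slack(webhook_url, payload):
    """POST the payload to a (hypothetical) Slack incoming webhook,
    which accepts a JSON body of the form {"text": ...}."""
    req = request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    request.urlopen(req)

msg = format_failures([("not_null_user_id", "dim_users", 3)])
print(msg["text"])
```

Keeping formatting separate from posting makes the message body easy to unit test without touching the network.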
35. great_expectations
What is it?
● Standardized data profiling and testing
● Alerting on changes in data quality or structure
Planned integration at TripActions
● Directly generate test profiles and configurations via pipelinewise
● Integration of great_expectations tests and data directly into tadoc / dbt docs
36. Pipelinewise 2.0
● Extend to "anywhere to anywhere" functionality with a standardized JSON API importer
● Source data discovery and reporting to show analysts/DS new data objects
37. dbt Validator 2.0
● Smart, dynamic re-cloning of objects into dev databases for faster testing
○ Cleanup functionality to prevent testing on stale objects
○ Fast clone based on the dbt DAG to accelerate development
● Extended test capabilities, including custom tests and data validation -> automated tests
● Automated reporting of BI dependencies on marts and tables
38. Rob Winters | Director, Data | rwinters@tripactions.com
Thank you!