Over the past eight or nine years, applying DevOps practices to various areas of technology within business has grown in popularity and produced demonstrable results. These principles are particularly fruitful when applied to a data analytics environment. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains. Bob also outlines why DevOps and disruption management go hand in hand.
Topics include:
- The benefits of a DevOps approach, with an emphasis on improving quality and efficiency of data analytics
- Why the push for a DevOps practice needs to come from the C-suite and how it can be integrated into all levels of business
- An overview of the best tools for developers, data analysts, and everyone in between, based on the business’s existing data ecosystem
- The challenges that come with transforming into an analytics-driven company and how to overcome them
- Practical use cases from Caserta clients
This presentation was originally given by Bob at the 2017 Strata Data Conference in New York City.
3. About Caserta
Data Intelligence Consulting and Modern Data Engineering
Award-winning data innovation
Internationally recognized workforce
Strategy, Architecture, Governance, Implementation
5. What is DevOps for Analytics?
First some terminology…
DevOps
Associated with a movement, primarily in the application development space, over the last 5-10 years
Focused on very fast and continuous software product releases
Think intra-day Prod releases at Netflix, Amazon, etc.
Convergence of development and operations methodologies to minimize TTR
Tons of resources – devops.com, DZone
6. What is DevOps for Analytics?
Some more terminology…
DataOps
Re-emergent term
Seems to have a broader context
Applying DevOps to data management or to handling backend databases
Also tends to carry a strong legacy connotation
Manual operations such as database backups and restores
7. What is DevOps for Analytics?
And finally…
AnalyticsOps
This is a term that we see starting to be used more
It’s focused on applying DevOps practices within a data analytics and data science context
This is the area we’re interested in for this talk
We’ll use the terms AnalyticsOps and the more explicit DevOps for Analytics interchangeably
10. DevOps…
Speak with anyone and they will tell you first that DevOps is a “culture”
Based primarily on teamwork
Aims to address the underlying conflict between development and operations objectives
Innovation @ speed vs. Performance @ quality
Change vs. Stability
Culture is not “implemented”
It needs to evolve
Good news is it can be seeded
11. DevOps…
It works!
75% of IT and product dev organizations were successfully using DevOps to some extent
– Source: RightScale 2016 State of the Cloud Report
It’s flexible
No two companies’ DevOps approaches will look the same
Infinite number of ways to create teamwork
A reflection of the organization itself
12. DevOps…
DevOps tenets
Continuous Integration
Test Automation
Continuous Delivery
Continuous Deployment
End-to-end automation is still aspirational for most companies
Justify how much automation you need based on business requirements
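To make the difference between the tenets concrete, here is a minimal Python sketch. It assumes a hypothetical `Stage`/`run_pipeline` structure rather than any particular CI product: stages run in order and stop at the first failure (continuous integration plus test automation), and a single flag decides whether a passing build ships automatically (continuous deployment) or waits for a human to approve it (continuous delivery).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One pipeline step (hypothetical structure, not a specific CI tool's API)."""
    name: str
    run: Callable[[], bool]  # returns True on success


def run_pipeline(stages: List[Stage], auto_deploy: bool) -> List[str]:
    """Run stages in order, stopping at the first failure.

    auto_deploy=True models continuous deployment (ship automatically);
    auto_deploy=False models continuous delivery (release-ready, but a
    human approves the final push to Prod).
    """
    log: List[str] = []
    for stage in stages:
        ok = stage.run()
        log.append(f"{stage.name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            log.append("pipeline stopped")
            return log
    if auto_deploy:
        log.append("deployed to PROD automatically")
    else:
        log.append("release candidate ready; awaiting manual approval")
    return log


if __name__ == "__main__":
    stages = [
        Stage("build", lambda: True),
        Stage("unit tests", lambda: True),
        Stage("integration tests", lambda: True),
    ]
    for line in run_pipeline(stages, auto_deploy=False):
        print(line)
```

In this sketch the `auto_deploy` flag is where the business-driven decision about how much automation to adopt would live.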
13. DevOps…
DevOps is not a toolchain implementation
Tools help the team execute within the culture
Don’t rush out, put an end-to-end toolchain in place, and then expect adoption
Let’s talk about tools for a minute…
Explosion of both open-source and commercial DevOps tooling
Tools serve every discrete need: requirements management, SCM, test automation, defect tracking, build, deployment, monitoring and more
1,500+ tools available
14. DevOps…
Tooling categories:
Code: Code development, version control tools, code merging
Build: Continuous integration tools, build status
Test: Test tools and results that determine performance
Package: Artifact repository, application pre-deployment staging
Release: Change management, release automation
Configure: Infrastructure configuration and management, Infrastructure as Code tools
Monitor: Application performance monitoring, end-user experience
16. Why DevOps for Analytics?
“The fact is that analytic teams are being compared by their businesses to Amazon Prime – 2-day delivery of almost anything”
Source: Unknown
18. Why DevOps for Analytics?
A couple of recent real world examples…
Data Science Rock Star
Process Overengineering
19. Why DevOps for Analytics?
In analytics and data science projects, what used to take months to achieve is now happening in days or hours
Businesses typically like that and want more…
Enabled by the strong trend toward cloud analytic platforms/services
Infrastructure as code (IaC) allows extension of software development practices to servers and infrastructure
We can automate the build of complex analytic pipelines (storage, processing engines, etc.) with relative ease
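The core idea behind infrastructure as code can be sketched in a few lines of Python: infrastructure is described declaratively as desired state, kept under version control, and a tool computes the actions needed to make reality match it. The resource names and the `plan` function below are hypothetical illustrations; real IaC tools use their own formats and do far more.

```python
from typing import Dict, List

# Hypothetical resource specs: resource name -> settings.
Spec = Dict[str, Dict[str, object]]


def plan(desired: Spec, current: Spec) -> List[str]:
    """Compare desired vs. current infrastructure and list the actions needed."""
    actions: List[str] = []
    for name, settings in sorted(desired.items()):
        if name not in current:
            actions.append(f"create {name}")
        elif current[name] != settings:
            actions.append(f"update {name}")
    for name in sorted(current):
        if name not in desired:
            actions.append(f"destroy {name}")
    return actions


# Desired state for a small analytic pipeline (storage plus a processing engine).
desired = {
    "raw-landing-bucket": {"type": "object_storage"},
    "spark-cluster": {"type": "compute", "workers": 8},
}
# What currently exists.
current = {
    "spark-cluster": {"type": "compute", "workers": 4},
    "old-scratch-bucket": {"type": "object_storage"},
}

if __name__ == "__main__":
    print(plan(desired, current))
    # ['create raw-landing-bucket', 'update spark-cluster', 'destroy old-scratch-bucket']
```

Because the spec is plain text, it can go through the same review, CI and versioning practices as application code.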
20. DevOps for Analytics
DevOps for Analytics combines the development and operations teams and establishes best practices that improve coordination between data science and operations
BUT… Data Science and Analytics are different from application development
Especially in Big Data environments: you need big data to test big data applications
Much more diverse mix of tools and technologies, not just Java
Some differences in approach are needed
21. DevOps for Analytics
AnalyticsOps is still in its early days
There aren’t any really solid published industry success stories
People are still trying to figure out what works and aren’t openly sharing their experiences just yet
Not a lot of experienced practitioners
But there are some early themes and guidelines emerging
22. DevOps for Analytics
Environments
Separate DEV and PROD environments
Should you reuse any of the PROD data assets?
Separate landing area, destination area (Data Lake), etc.
Trickier with increasing data volumes; do it smart to avoid double costs
Sharing compute cluster resources is OK
Make all job inputs and outputs configuration-driven (PROD and DEV code doesn’t change) to support CI
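A minimal Python sketch of configuration-driven jobs: the job code is identical in DEV and PROD, and only the looked-up settings change. The paths, the `CONFIGS` table and the `ANALYTICS_ENV` variable name are hypothetical; in practice the settings would come from a separate configuration repo rather than being inlined.

```python
import os
from typing import Dict

# Hypothetical per-environment settings: separate landing and Data Lake areas.
CONFIGS: Dict[str, Dict[str, str]] = {
    "DEV": {"input_path": "/landing/dev/orders", "output_path": "/lake/dev/orders_daily"},
    "PROD": {"input_path": "/landing/prod/orders", "output_path": "/lake/prod/orders_daily"},
}


def load_config(env: str) -> Dict[str, str]:
    """Return settings for the named environment; fail fast on unknown names."""
    if env not in CONFIGS:
        raise ValueError(f"unknown environment: {env}")
    return CONFIGS[env]


def run_job(config: Dict[str, str]) -> str:
    """The job logic is identical in DEV and PROD; only its config differs."""
    # Placeholder for the real read/transform/write steps.
    return f"read {config['input_path']} -> write {config['output_path']}"


if __name__ == "__main__":
    # The environment name comes from outside the code (here an environment
    # variable), so the same tested artifact can be promoted unchanged.
    env = os.environ.get("ANALYTICS_ENV", "DEV")
    print(run_job(load_config(env)))
```

Failing fast on an unknown environment name is a deliberate choice: silently defaulting to PROD paths from a misconfigured DEV run is exactly the kind of mistake this separation is meant to prevent.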
23. DevOps for Analytics
Automated Testing
It’s almost impossible to get full code coverage
How do you unit test Spark SQL scripts? Regression tests? Data validation?
Test data is a complex problem; handle it as a cross-functional initiative
Analytic results are often buried in complex outputs, so QA becomes forensic data analysis
Automate what you can, and supplement with community-based real-world data testing in a parallel Dev/Test environment
The role of the Test/QA Engineer is still really important
Test/QA Engineers need Data Engineering experience
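One partial answer to unit testing SQL is to run the query against a tiny, hand-built fixture and assert on the exact result, then apply a few data-validation rules. The sketch below uses Python's built-in `sqlite3` purely as a lightweight stand-in engine; Spark SQL's dialect and functions differ, so treat it as an illustration of the testing pattern, not of Spark itself. The table, query and validation rules are hypothetical.

```python
import sqlite3

# The SQL under test (hypothetical daily-totals transformation).
DAILY_TOTALS_SQL = """
    SELECT order_date, SUM(amount) AS total
    FROM orders
    WHERE amount IS NOT NULL
    GROUP BY order_date
    ORDER BY order_date
"""


def run_transform(rows):
    """Load small fixture rows into an in-memory DB and run the SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(DAILY_TOTALS_SQL).fetchall()
    finally:
        conn.close()


def validate(result):
    """Simple data-validation rules: no duplicate dates, no negative totals."""
    dates = [d for d, _ in result]
    problems = []
    if len(dates) != len(set(dates)):
        problems.append("duplicate dates")
    if any(total < 0 for _, total in result):
        problems.append("negative total")
    return problems


def test_daily_totals():
    fixture = [
        ("2017-09-01", 10.0),
        ("2017-09-01", 5.0),
        ("2017-09-02", 7.5),
        ("2017-09-02", None),  # NULL amounts must be ignored
    ]
    result = run_transform(fixture)
    assert result == [("2017-09-01", 15.0), ("2017-09-02", 7.5)]
    assert validate(result) == []


if __name__ == "__main__":
    test_daily_totals()
    print("ok")
```

Small fixtures like this cover logic, not scale; they complement, rather than replace, the real-world data testing in a parallel Dev/Test environment.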
24. DevOps for Analytics
Monitoring
Tracking and analyzing intra-day demand and longer-term trends in infrastructure performance (standard DevOps)
But then…
By their nature, analytics processes require monitoring and tuning over time with real-world inputs
Data drifts; predictive models have a finite lifetime
Silent failures
Feedback to developers so they can see how their code is performing and affecting the Prod environment
Continuous improvement
The next wave is analytics on analytics…
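Data drift is one of the silent failures that standard infrastructure monitoring will not catch. A common technique for detecting it is the Population Stability Index (PSI), which compares how a feature's values are distributed across bins in a baseline (e.g. training) sample versus current production inputs. The sketch below is one possible implementation; the bin edges, the epsilon guard and the 0.25 alert threshold are assumptions (0.25 is a frequently quoted rule of thumb, not a standard).

```python
import bisect
import math
from typing import List, Sequence


def bin_shares(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values in each bin [edges[i], edges[i+1]).

    Values outside the edges are clamped into the first or last bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        i = min(max(i, 0), len(counts) - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]


def psi(baseline: Sequence[float], current: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    total = 0.0
    for e, a in zip(bin_shares(baseline, edges), bin_shares(current, edges)):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total


def drifted(baseline, current, edges, threshold: float = 0.25) -> bool:
    """Flag drift when PSI exceeds the (assumed) alert threshold."""
    return psi(baseline, current, edges) > threshold


if __name__ == "__main__":
    edges = [0, 10, 20, 30]
    baseline = list(range(30))              # evenly spread across the 3 bins
    shifted = [v + 10 for v in baseline]    # production inputs moved upward
    print(round(psi(baseline, shifted, edges), 2), drifted(baseline, shifted, edges))
```

Feeding a metric like this back into the monitoring dashboard is one concrete form of the "analytics on analytics" idea.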
25. DevOps for Analytics
Emerging DevOps for Analytics environments usually contain:
SCM
CI
Repo to store analytics app
Repo to store configuration
An API to deploy to the cluster
Mechanism to monitor behavior and performance
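The pieces listed above meet at the deploy step. A minimal Python sketch, assuming a hypothetical `Release` record and a `ClusterAPI` interface standing in for whatever deployment API the cluster actually exposes: the deployment pins both the analytics app version and the configuration version, and an in-memory fake lets the deploy logic itself be tested in CI.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Release:
    """What gets deployed: a versioned app artifact plus a versioned config."""
    app_artifact: str     # build stored in the analytics app repo
    app_version: str
    config_version: str   # commit or tag in the separate configuration repo
    environment: str


class ClusterAPI(Protocol):
    """Stand-in for the cluster's deployment API (assumed interface)."""
    def submit(self, manifest: Dict[str, str]) -> str: ...


class FakeCluster:
    """In-memory fake that records submissions, for testing the deploy step."""
    def __init__(self) -> None:
        self.submitted: List[Dict[str, str]] = []

    def submit(self, manifest: Dict[str, str]) -> str:
        self.submitted.append(manifest)
        return f"job-{len(self.submitted)}"


def deploy(release: Release, cluster: ClusterAPI) -> str:
    """Build a manifest pinning both versions, then hand it to the cluster API."""
    manifest = {
        "artifact": f"{release.app_artifact}:{release.app_version}",
        "config": release.config_version,
        "env": release.environment,
    }
    return cluster.submit(manifest)


if __name__ == "__main__":
    cluster = FakeCluster()
    job_id = deploy(Release("orders-daily", "1.4.2", "cfg-a1b2c3", "PROD"), cluster)
    print(job_id, cluster.submitted[0])
```

Recording both versions in the manifest means the monitoring mechanism can attribute a change in behavior or performance to a specific code or configuration change.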
26. DevOps for Analytics Organization
Building a DevOps for Analytics culture is not an easy undertaking
Should fall under the purview of a dedicated data organization
These organizations are typically led by the Chief Data Officer
More recently, by a Chief Data Scientist or a Chief Analytics Officer
Key responsibilities include
Fostering adoption
Clarifying and aligning to the business' vision
Securing reasonable funding
27. DevOps for Analytics Organization
The goal over time is to create lean, high-performing, cross-functional, extremely effective teams
Business Stakeholders
Data Engineers
Data Analysts & Data Scientists
QA
Operations
All of these skills are important, but when in doubt, get more Data Engineers!
Everyone on the team has an equal voice
Everyone codes & Everyone needs to know what Prod looks like
28. DevOps for Analytics Organization
Start-up condition: Bring in an experienced set of DevOps for Analytics Engineers
Help define the culture, lead by example
Identify the Innovators and get them involved and leading
The DevOps Engineer’s job is ultimately to engineer themselves out of the equation
Source: Matthew Skelton, DevOps Patterns - Team Topologies
29. Final Thoughts
“We aim to engineer systems and processes to better integrate development and operations, resulting in decreased time to market and an application infrastructure that is instrumented, scalable and fault tolerant… and immortal!”
- Will Liu, Equinox Data Team
30. Final Thoughts
There are plenty of benefits in establishing a DevOps for Analytics culture in your organization
For the business: Speed to insight
For the teams: Professional and personal satisfaction
Be fearless: go build your own DevOps for Analytics culture!
33. Thank You
Bob Eilbacher
Vice President Operations, Caserta
bob@casertaconcepts.com
Upcoming Training Opportunity:
Caserta is hosting 3 Days of Training Courses October 18-20th in NYC,
taught by Joe Caserta, co-author of The Data Warehouse ETL Toolkit:
Day 1: Agile Data Warehouse Design & Dimensional Modeling
Day 2: ETL Architecture & Design
Day 3: Big Data for Data Warehouse Practitioners
More info at casertaconcepts.com/event/