Our ability to produce, ingest and store data has grown exponentially, but our ability to parse out insights from data has not. In the 90s, an organization’s data would live in a data warehouse with an ETL pipeline and one reporting layer on top. Information was well controlled, if somewhat limited in breadth and slow to trickle down. Now, with the onset of self-service analytics, anyone can create a report and an insight, and there are many different sources of “truth.” For example, a seemingly straightforward question like "how many customers do we have?" will likely return different answers from sales, finance and customer success, depending on their definitions and the data at hand. There is simply too much data (and duplicate data), too many tools, and too many systems storing data, which leads to time-consuming searches, confusion and a lack of trust. Hear Stephanie discuss how a data catalog can help solve the noise-to-signal problem, making information easier to find, easier to understand and more trustworthy. She will describe how organizations like Safeway, Albertsons, Munich Re and Pfizer leverage a data catalog to find data and collaborate on data, gain a fuller understanding of its meaning and, ultimately, solve important problems.
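To make the "how many customers do we have?" example concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the account records, the three flags, and the per-team definitions are assumptions chosen for illustration, not definitions used by any organization named in this talk. It only shows how one dataset can yield several different "truths" when each team counts by its own rule.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical fields, invented for this illustration only.
    name: str
    signed_contract: bool
    paid_invoice: bool
    active_last_90_days: bool


# Made-up sample data.
accounts = [
    Account("Acme", True, True, True),
    Account("Globex", True, False, False),
    Account("Initech", True, True, False),
    Account("Umbrella", False, False, True),  # e.g. a free-trial user
    Account("Hooli", True, False, True),
]

# Each team applies its own (assumed) definition of "customer".
definitions = {
    "sales": lambda a: a.signed_contract,
    "finance": lambda a: a.paid_invoice,
    "customer_success": lambda a: a.active_last_90_days,
}


def count_customers(records, rule):
    """Count the records that satisfy one team's definition."""
    return sum(1 for r in records if rule(r))


counts = {team: count_customers(accounts, rule) for team, rule in definitions.items()}
print(counts)
```

With this sample data the three teams report three different customer counts from the same five records. Recording which definition sits behind each number is the kind of context a catalog's business glossary is meant to hold.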
3. What will we discuss today?
• Noise in the data pipeline
• Three familiar stories of challenges
• Why the problem is so pervasive
• How data catalogs focus organizations on signal
4. BRIAN HOPKINS
VICE PRESIDENT & PRINCIPAL ANALYST
FORRESTER, MARCH 9, 2016
“Despite big investments in big data,
we found that while 74% of firms want
to be ‘data-driven’, only 29% are
connecting analytics to action.”
7. “We asked respondents about their
difficulty in locating relevant
content (e.g., data, metadata) for
analysis. Forty-seven percent of
respondents indicate that analytic
consumers have difficulty
locating/accessing relevant
analytic content.”
Dresner Advisory Services, Data Catalog Study, June 18, 2018
8. Enterprises are struggling with too many
• Too many…
• Raw files & tables to FIND data easily
• Processing engines to UNDERSTAND transformation trade-offs
• Self-service reports & dashboards to TRUST the validity &
accuracy of results
10. …and they’re being asked to focus change
Measure
• Track usage across a
wider variety of users
• Understand new,
iterative & discovery-based
use cases
• Leverage usage
insights to build data
literacy
Manage
• Democratize access &
discovery
• Ensure the data is
readily available to
consume
• Govern for compliance
& insights
Monetize
• Drive business
outcomes from data
• Evaluate information &
its artifacts as an asset
• Establish data
ownership & rights
13. …to pharmaceutical companies moving to the Cloud…
“Data science shouldn’t be
confined to mathematicians.”
- JEFF KEISLING
CHIEF INFORMATION OFFICER
14. …and tech companies maximizing Teradata
“The biggest sin of data governance
is if a random person queries some
data, puts it in Excel, modifies it,
puts it into a PowerPoint, and ships
it around. We had this happening a
lot.”
- ZOHER KARU
FORMER CHIEF DATA OFFICER
15. In all of these scenarios, data catalogs are
driving transformational results…
16. Data catalogs are enabling new Business Units
“Our data strategy is geared to offer
new and better risk-related services
to our customers. A core piece in that
strategy is our integrated self-service
data analytics platform. Alation’s social
catalog is part of that platform and
already helps more than 600 users in
the group to discover data easily and
to share knowledge with each other.”
- WOLFGANG HAUNER
CHIEF DATA OFFICER
Protecting Against Natural Catastrophes, Cyber Attacks, and Future Risks
17. Data catalogs are enabling new diagnoses & trials
Delivering break-through drugs to market faster
• Aggregates disparate data sources – including physician
notes, lab reports, demographics & co-morbidities
• Uses ML models to potentially identify rare forms of heart
failure, such as transthyretin cardiomyopathy. The disease
often goes undiagnosed because its symptoms are
similar to other forms of heart failure; identifying
candidates helps with diagnosis and with recruiting
clinical trial participants
• Consistent, global information asset re-use – Analyst &
data science assets (like R scripts, SQL queries) are
registered for broad-based re-use
18. Data catalogs are enabling over 1,000 eBay
employees to analyze data weekly, growing to 3,000
• Business glossary delivers trust in data –
Consolidated glossary with hundreds of analysts and data
stewards contributing
• On-boarding in days – Alation users are consistently
able to self-serve SQL data through Alation
SmartSuggest
• Consistent, global information asset re-use –
Analyst & data science assets (like R scripts & SQL
queries) are registered for broad-based re-use
19. Why has every company had this
challenge? What’s so hard about
achieving these results?
20. Evolution of the Data Ecosystem
Diagram labels: Consumption, Storage, Preparation
“Single Source of Truth” (1985 - Present)
• Curator: Head of IT
• Consistency: Consistent
• Creation Speed: Slow
• Ability to dig: Limited to what’s there
+
“Self Service” (2005 - Present)
• Curator: Report Author
• Consistency: Multiple “Truths”
• Creation Speed: Fast
• Ability to dig: Unlimited
What’s the REAL problem today? Complexity
‣ Too many tools creating locally conflicting data
‣ Complicated code that’s impossible to parse
‣ Too much data in too many different systems breeds confirmation bias
‣ Information is hard to find, understand, & trust.
21. We’re only halfway through the
self-service analytics revolution
22. Changing traditional data governance comes next
• People & process heavy - small teams tasked with the impossible
• Business impact is hard to quantify in real dollar amounts
• Often just a response to government regulation
• Provides an alluring but elusive “Single source of truth”
• One single pre-modeled repository (defined during data
modeling)
• Rigid data access rules
• IT point of control & slow to change
23. Gartner, Magic Quadrant for Business
Intelligence and Analytics Platforms, Rita L.
Sallam, Cindi Howson, Carlie J. Idoine, Thomas
W. Oestreich, James Laurence
Richardson, Joao Tapadinhas, 16 February 2017
“By 2020, organizations that offer users
access to a curated catalog of internal
and external data will realize 2x the
business value from analytics
investments than those that do not.”
24. Data Catalogs: Analysis in hours that’s trusted & impactful
Standard Timeline: 1-2 months
• Find the data (3-6 weeks)
• Understand the data (1-2 days)
• Trust the data (1-2 days)
• Write the query (1-10 hours)
• Run the query (minutes+)
Timeline: 1-2 days
• Reclaimed time for deeper insights
• Enhance the analytical productivity of each analyst and
business user by up to 50%
• Produce accurate documentation up to 40% faster
• Build business value by speeding analysis and reducing
time-to-insight
25. Top 5 things to look for in a data catalog
alation.com/cdp
1. Unified view for all your data - not just a manual inventory, a living catalog
• Metadata enriched for business accessibility
2. Machine-human collaboration to curate data context
• Human-in-the-loop AI & ML
3. Verification of sources so you can trust your data
• Data lineage that spans sources, processing engines & data visualization tools
4. Just-in-time guidance
• Certification/badging & a data recommendation engine
5. Collaborative capabilities to break down organizational silos
• Reviews, comments, rankings
27. Interested in more information?
Download a copy of the
latest research:
alation.com/MLDC
Register for a personal demo:
alation.com/GetDemo18