1. The First Step in Information Management
looker.com
Produced by:
MONTHLY SERIES
In partnership with:
Data-centric Analytics and Understanding the Full Data Supply Chain
May 3, 2018
2. Welcome to Today’s Discussion
Understanding the data supply chain and how it impacts analytics
Data-centric design considerations for the data supply chain
Data supply chain features and components for the data lake
Key roles and responsibilities
How analytics must interact with the data supply chain
Best practices and key takeaways
Q&A
pg 2© 2018 First San Francisco Partners www.firstsanfranciscopartners.com
3. Background
• An insurance company believed it had found a strong predictor for policy renewal, only to discover the model was based on an indicator that actually meant a policy was cancelled, not expiring.
• A real estate AI model was corrupted because, while only one record in a million was wrong, it was off by a factor of 1,000, and there was no way to tell whether it was correct or an error.
• “There are many downstream processes, including EHR configuration, data transport, aggregation, normalization, and reporting mechanisms, that through omission or commission can negatively impact data quality.” (Healthcare IT News)
• Postal and email addresses change constantly, so 20–23% of this data is wrong as soon as it is received.
• There are new examples of incorrect conclusions and bad AI results every day. Example site: www.towardsdatascience.com
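The real estate example suggests one cheap safeguard: flag values that differ from the rest of a field by orders of magnitude so a person can review them before a model consumes them. The following is a minimal sketch, not a method from the presentation. The function name, the threshold of 100 and the sample prices are all hypothetical, and a flag means "review this", not "this is wrong".

```python
from statistics import median


def flag_magnitude_outliers(values, factor=100.0):
    """Return indices of values more than `factor` times above or below the median.

    Hypothetical helper: it only handles positive quantities and only
    flags records for review; it cannot tell an error from a true extreme.
    """
    center = median(values)
    if center <= 0:
        return []  # the ratio test below is meaningless for a non-positive median
    flagged = []
    for i, v in enumerate(values):
        if v <= 0:
            continue  # non-positive values need a different check
        ratio = v / center
        if ratio > factor or ratio < 1.0 / factor:
            flagged.append(i)
    return flagged


# Made-up sale prices; the fourth was entered 1,000 times too large.
prices = [310_000, 295_000, 402_000, 350_000_000, 288_000]
print(flag_magnitude_outliers(prices))
```

A ratio test against the median is used here because a single extreme value barely moves the median, whereas it would distort a mean-based check.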
4. What is a Data Supply Chain?
The data supply chain represents the sources, flows, management and distribution of data and information in an organization.
A regular supply chain coordinates material sourcing and assembly to fulfill and deliver goods to customers.
Goods supply chain:
• Raw Material: Steel, Aluminum
• Make/Acquire Parts: Supplier, Fabrication
• Assembly: Processes, Sequence
• Distribute: Ship, Store, Sell
Data supply chain:
• Raw Material: Internal Database, Derived Data
• Make/Acquire Parts: External File, Clean Data, Standardize and Position Data
• Assembly: Populate Sandbox, Develop and Run Models
• Distribute: Visualization, Publish and Pull, Sell Results
5. Why Data Supply Chains Are Important
A well-run factory has a “parts and tools” crib or cage.
Contents are well-managed and tracked.
Distribution from that area depends on strong guidance, policy and lean management.
6. Why Data Supply Chains Are Important
The tool crib has an inventory control system and an item master.
A data supply chain and data lake use metadata and data governance.
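To make the item-master analogy concrete, here is a minimal sketch of a data catalog that records what each data element is, who owns it and where it came from. The class names, fields and the sample entry are hypothetical illustrations, not part of the presentation or of any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One 'item master' record for a data element (hypothetical fields)."""
    name: str
    owner: str
    source: str
    description: str
    tags: list = field(default_factory=list)


class DataCatalog:
    """A tiny in-memory catalog: register entries once, then look them up."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        if entry.name in self._entries:
            raise ValueError(f"{entry.name} is already registered")
        self._entries[entry.name] = entry

    def lookup(self, name):
        return self._entries[name]

    def find_by_tag(self, tag):
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="policy_status_flag",
    owner="policy_admin_team",
    source="internal policy database",
    description="'C' means the policy was cancelled, not that it is expiring",
    tags=["policy", "renewal_model_input"],
))
print(catalog.lookup("policy_status_flag").description)
```

A recorded definition like the one above is the kind of metadata that could have exposed the cancelled-versus-expiring mix-up from the Background slide before a model was built on it.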
7. Why Data Supply Chains Are Important
Unknown bad data is like a hidden manufacturing fault, but instead of issuing a recall, you get to explain that the model and AI are in error and have been putting out bad recommendations.
Gathering external and internal data and then blending in deduced data is like switching suppliers in a supply chain.
Sometimes the pieces don’t fit in spite of the same specs.
This isn’t new, but data scientists who are new to this kind of data use are “discovering” things that have already been discovered.
8. Why Data Supply Chains Are Important
As in modern manufacturing, you no longer fix the product after it is built. You fix the process; that is, you build a data supply chain.
9. Data-centric Thinking for Design
Treat and View Your Data as an Asset
Sell to Suppliers, Not Consumers
Integrate With Your Data Strategy
Treat Like a Real Business Line
AI and analytics architectures are really logistical challenges.
Pretend the analytics results are a product, even if internal.
Design the data supply chain.
10. Manage Your Data Architecture
Most large organizations have complicated architectures.
Even smaller organizations need to balance COTS, homegrown and modern data assets.
Diagram: Data Insight Architecture (Business Strategy; Data Access Layer; Data Management Layer; Integration & Abstraction Layer; Vintage Area; Contemporary Area)
11. Manage Your Data Architecture
Use the concept of a supply chain to understand, design and manage the balancing of the vintage and contemporary sides of your data architecture.
Diagram: Data Insight Architecture (Business Strategy; Data Access Layer; Data Management Layer; Integration & Abstraction Layer; Vintage Area; Contemporary Area), overlaid with the supply chain stages Source, Fabricate, Assembly and Distribute
12. Data Supply Chain
Features of a Modern Data Supply Chain
Goods and Services Supply Chain
• Functions: Distribution, Purchasing, Operations, Integration
• Capabilities: Logistics, Compliance, Organization, Product Mgmt
Data Supply Chain
• Functions: Data Push and Pull, Data Management, Data Operations, Data Integration
• Capabilities: BI and Analytics, Governance, Engagement Model, Product Mgmt
• Life cycle: CREATE, USE, UPDATE, MEASURE, MODEL, ANALYZE, DELETE
13. Major Components of the Data Supply Chain
Data Supply Chain
CREATE USE UPDATE MEASURE MODEL ANALYZE DELETE
14. Major Components of the Data Supply Chain
Data Supply Chain
CREATE USE UPDATE MEASURE MODEL ANALYZE DELETE
Supported by: Data Governance, Data Catalog (Metadata), Data Quality
15. Major Components of the Data Supply Chain
Data Supply Chain
CREATE USE UPDATE MEASURE MODEL ANALYZE DELETE
Supported by: Data Governance, Data Catalog (Metadata), Data Quality
• Data Sources: Legacy Apps, ERP, External Files
• Data Lake: Landing Zone, Standardization Zone, Analytics Platforms
• Traditional BI and Reporting: Data Warehouse, Data Mart, Reporting
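The zones named on this slide can be sketched as three small steps: land the data untouched, standardize it while setting aside what fails, then hand clean rows to analytics. This is an illustrative sketch only. The function names, record fields and sample data are hypothetical, and a real data lake would use storage and processing engines rather than in-memory lists.

```python
def land(raw_records):
    """Landing zone: keep each record exactly as received, tagged with its source."""
    return [{"source": src, "payload": dict(rec)} for src, rec in raw_records]


def standardize(landed):
    """Standardization zone: normalize names and types; set aside records that fail."""
    clean, rejected = [], []
    for item in landed:
        payload = item["payload"]
        try:
            row = {
                "source": item["source"],
                "customer_id": str(payload["id"]).strip(),
                "amount": float(payload["amount"]),
            }
        except (KeyError, ValueError, TypeError):
            rejected.append(item)  # kept for data quality review, not silently dropped
            continue
        clean.append(row)
    return clean, rejected


def total_by_source(clean):
    """Analytics platform: a trivial aggregate computed over standardized rows."""
    totals = {}
    for row in clean:
        totals[row["source"]] = totals.get(row["source"], 0.0) + row["amount"]
    return totals


# Made-up records from the three source types named on the slide.
raw = [
    ("erp", {"id": 17, "amount": "120.50"}),
    ("external_file", {"id": " 18 ", "amount": "80"}),
    ("legacy_app", {"id": 19, "amount": "n/a"}),
]
landed = land(raw)
clean, rejected = standardize(landed)
totals = total_by_source(clean)
print(totals, len(rejected))
```

Keeping rejected records instead of discarding them reflects the deck's point about fixing the process: the rejects show which source needs attention.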
16. Roles – Treat and View Your Data as an Asset
Roles and responsibilities:
• Information product management: Data Product Manager, quality control, etc. (Custodian, Steward).
• Architect, engineer: Define the supply chain, from source through production, marketing and shipping.
• Governance and quality: Manage data quality; the data scientists cannot sustain the product without good raw materials. Govern data; there will be standards required for the source and usage of data.
• Leadership and alignment: 90% of the same support mechanisms are shared. Integrate with the data strategy and vision for the organization.
17. Responsibilities – Treat Like a Real Business Line
Roles and responsibilities:
• Oversight of data use: Define revenue streams; monitor costs; measure effectiveness.
• Manage the numbers: Measure the business; monitor costs and returns on sales.
• Manage the customer: Engage, transact, fulfill and service exist for both internal and external customers.
• Alignment and acceptance: Attain peer status with finance, legal and other products/revenue streams.
18. Best Practices
Sell to suppliers, not consumers
• Solve customer needs with data products. Provide data (or data access) that someone else can “productize.”
• Develop premium data products for sellers versus buyers. Sellers (and their agents) are more willing to pay than buyers. Direct-to-consumer tends to be fickle and higher-cost (Zillow and Uber, as examples).
• Develop a data exchange with customers on the platform of your customer’s preference. Don't make your customer invest heavily; standards are one thing, capital is another.
• Achieve market scale quickly. POCs are good, but make a call, or someone else will beat you.
19. Best Practices
Integrate with data strategy
• IaaS and internal data ecosystems tend to be intertwined. They will share the same data management and governance functions (it is a win-win).
• Metadata is your inventory management system. Indices, registries and lineage are vital functions for oversight and scaling.
• Data landscape management is key: keep a good inventory of data assets. The most important or biggest DB might be one you don’t own (e.g. external data). Fuse data from many sources and formats.
• Culture has strategy for lunch. Acceptance of new standards and capabilities is never smooth.
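Lineage, one of the oversight functions named above, can be sketched as a small dependency graph that answers "which sources feed this result?". The dataset names and the helper below are hypothetical and only illustrate the idea; they are not taken from the presentation or from any lineage tool.

```python
# Hypothetical lineage registry: each dataset maps to the datasets it is built from.
lineage = {
    "renewal_model_scores": ["standardized_policies"],
    "standardized_policies": ["landed_policy_extract", "external_address_file"],
    "landed_policy_extract": [],
    "external_address_file": [],
}


def upstream_sources(dataset, graph):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(dataset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(graph.get(current, []))
    return sorted(seen)


print(upstream_sources("renewal_model_scores", lineage))
```

With a registry like this, a problem found in an external file can be traced forward or backward to the models and reports that depend on it, including data the organization does not own.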
20. How Analytics Must Interact with the Data Supply Chain
Analytics will lie somewhere along your data supply chain.
Data Supply Chain: Data Sources, Landing Zone, Standardization Zone, Analytics Platforms, Data Consumers
Participants and functions: Data Governance, Data Operations, Data Management, Data Scientists
Life cycle: Create, Use, Update, Measure, Model, Delete
21. How Analytics Must Interact with the Data Supply Chain
Analytics will lie somewhere along your data supply chain.
Data Supply Chain: Data Sources, Landing Zone, Standardization Zone, Analytics Platforms, Data Consumers
Participants and functions: Data Governance, Data Operations, Data Management, Data Scientists
Life cycle: Create, Use, Update, Measure, Report, Delete
22. Best Practices and Key Takeaways
• Understand the bigger picture: balancing old and new, and many sources and many uses of data.
• Understand that the supply chain is a strong metaphor.
• Realize that lean management, logistics and compliance are models from the non-data world that can be applied to managing data supply chains.
• Ensure the end point is the data lake and analytics/insights.
• Start with information requirements when you plan the data supply chain.
• Fully exploit the assembly line metaphor: raw data goes in, and an analytics conclusion comes out.
• Remember well-established best practices along the way.
• Metadata and the data catalog are your item master and inventory management systems.
24. Thank you for joining us today!
Our Thursday, June 7 #DIAnaltyics webinar is:
Top 5 Priorities of an Analytics Leader.
John Ladley @jladley
john@firstsanfranciscopartners.com
Kelle O’Neal @kellezoneal
kelle@firstsanfranciscopartners.com