www.scling.com
Industrialised data - the key to AI success
Lars Albertsson, Founder, Scling
2024-04-25
The great capability divide
Myth:
● We are all doing quite ok
● 2-10x leader-to-rear span
Reality:
● Few leaders in each area
● 100-10000x leader-to-rear span
[Figure: number of organisations (# orgs) plotted against capability in X, myth versus reality]
Capability KPIs
DORA research / State of DevOps report:
● Deployment frequency
● Lead time for changes
● Change failure rate
● Time to restore service
Small elite
~1000x span
Observed differences in data organisations:
● Lead time from idea to production
● Time to mend / change pipeline
● Number of pipelines / developer
● Number of datasets / day / developer
Small elite
100 - 10000x span (or more)
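As a rough illustration of how the four DORA KPIs above could be measured, here is a minimal Python sketch. The `Deployment` record layout, the field names, and the sample values are assumptions made for this example; they are not taken from the DORA reports or from any tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    """One production deployment (hypothetical record layout)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def mean_hours(deltas: list[timedelta]) -> float:
    """Average a list of durations and express the result in hours."""
    if not deltas:
        return 0.0
    return sum(deltas, timedelta()) / len(deltas) / timedelta(hours=1)


def capability_kpis(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Compute the four KPIs for deployments made during `period_days` days."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deployments if d.failed]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "lead_time_hours": mean_hours([d.deployed_at - d.committed_at for d in deployments]),
        "change_failure_rate": len(failures) / len(deployments),
        "time_to_restore_hours": mean_hours(
            [d.restored_at - d.deployed_at for d in failures if d.restored_at is not None]
        ),
    }


# Made-up sample data: two deployments in two days, one of which failed.
t0 = datetime(2024, 4, 1, 9, 0)
sample = [
    Deployment(committed_at=t0, deployed_at=t0 + timedelta(hours=2)),
    Deployment(
        committed_at=t0 + timedelta(days=1),
        deployed_at=t0 + timedelta(days=1, hours=4),
        failed=True,
        restored_at=t0 + timedelta(days=1, hours=5),
    ),
]
kpis = capability_kpis(sample, period_days=2)
```

The same shape of calculation would apply to the data organisation KPIs, for example by counting datasets produced per day per developer instead of deployments.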
Efficiency gap, data cost & value
● Data processing produces datasets
○ Each dataset has business value
● Proxy value/cost metric: datasets / day
○ S-M traditional: < 10
○ Bank, telecom, media: 100-1000
2014: 6500 datasets / day
2016: 20000 datasets / day
2018: 100000+ datasets / day, 25% of staff use BigQuery
2021: 500B events collected / day
2016: 1 600 000 000 datasets / day
[Figure: value plotted against effort, with regions labelled "Financial, reporting", "Insights, data-fed features", and "Disruptive value of data, machine learning"]
Enabling innovation
"The actual work that went into
Discover Weekly was very little,
because we're reusing things we
already had."
https://youtu.be/A259Yo8hBRs
https://youtu.be/ZcmJxli8WS8
https://musically.com/2018/08/08/daniel-ek-would-have-killed-discover-weekly-before-launch/
"Discover Weekly wasn't a great
strategic plan and 100 engineers.
It was 3 engineers that decided to
build something."
"I would have killed it. All of a sudden,
they shipped it. It’s one of the most
loved product features that we have."
- Daniel Ek, CEO
Data not acted upon - value left on table
The car has sensors for precipitation,
temperature, and window state. I would
have liked to receive a mobile app
warning when all windows were down
during a snowy night.
Changing to winter tyres at Bilia. When I arrived,
the staff could see that mechanics were currently
not fully booked, and offered me a discounted
front wheel adjustment, which I accepted.
Data value taken from table.
Neighbour jumps in their Volvo, talks to husband
through window, and drives off. On arrival she
discovers that she has no car key - husband's key
was close enough to start car. An early warning
that key is no longer in proximity would have been
valuable.
I make the effort to submit a detailed bug report
regarding Volvo's known over-aggressive
automatic safety brake problem, in order to
provide more data for debugging. I receive no
feedback from the process, and lose interest.
Our Volvo incorrectly activates the automatic safety
brake again. This has been a known problem for years,
and it affects our car occasionally. I have no
efficient channel to report the time and circumstances
of the incident, which would give Volvo more data
on the problem and make me confident that the
issue is being worked on.
After a repair at Bilia of the rear view mirror, the
car does not properly detect cars in front. Camera
near mirror is suspected. Bilia cannot obtain
camera measurements or car detection statistics
from Volvo, so a cycle of trial and error repairs
follows, taking me to the mechanic multiple times.
I send a design suggestion to Volvo with pics of a
hacked solution for a luggage problem. A simple
hook could solve it. I do not receive "We have
assigned your suggestion id X, and will get back if
it is implemented." I will not send another, and feel
less connected to the brand.
Wife has driven our Volvo. Mirrors and seats
automatically move to my position, except for the
right mirror, which I need to correct again, as I
have for years. Volvo could have measured and
detected irregular changes of settings, and found
this bug.
Wife unlocks Volvo. Driver seat switches to her
position. I take the driver's seat, pull it back, and
change to my profile. Car switches seat back to
her position, and I have to pull it back again.
When seat heating has been automatically activated,
the "Increase seat heating" audio command turns it
off. I issue additional commands to get to the
desired state. Measurements could identify
repeated audio or UX commands.
The Volvo phone app warns about many things:
interrupted charging, car parked but not being charged,
etc. But not about the roof and windows left open in the
rain. Rear view mirror electronics now need a repair.
I change the audio balance in the
Volvo with six screen clicks.
Measurements could show that
long click sequences at high speed
indicate a poor user interface.
The car keeps informing me
that there is a software
upgrade, but no matter what I
do, nothing gets upgraded.
Map updates fail to auto-install.
Support cannot obtain diagnostic
information. Manual installation
attempts fail without error messages.
IT craft to factory
[Figure: craft practices (Security, Waterfall, Application delivery, Traditional operations, Traditional QA, Infrastructure) alongside factory practices (DevSecOps, Agile, Containers, DevOps, CI/CD, Infrastructure as code)]
Data factories
[Figure: craft practices (Security, Waterfall, Application delivery, Traditional operations, Traditional QA, Infrastructure, DB-oriented architecture) alongside factory practices (DevSecOps, Agile, Containers, DevOps, CI/CD, Infrastructure as code, Data factories / data pipelines / DataOps)]
From craft to process
● Multiple time windows
● Assess ingress data quality
● Repair broken data from complementary source
● Forecast based on history, multiple parameter settings
● Assess outcome data quality
● Assess forecast success, adapt parameters
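The process steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the moving-average forecast stands in for a real model, and every function name, the `None` marker for broken values, and the sample series are invented for the example.

```python
import math
from statistics import mean
from typing import Optional

Series = list[Optional[float]]  # None marks a missing or broken measurement


def ingress_quality(series: Series) -> float:
    """Assess ingress data quality as the fraction of usable values."""
    return sum(v is not None for v in series) / len(series)


def repair(primary: Series, complementary: Series) -> list[float]:
    """Repair broken values in the primary source from a complementary source."""
    repaired = [p if p is not None else c for p, c in zip(primary, complementary)]
    if any(v is None for v in repaired):
        raise ValueError("value missing in both sources")
    return repaired


def forecast(history: list[float], window: int) -> float:
    """Forecast the next value as the mean of the last `window` values."""
    return mean(history[-window:])


def backtest_error(history: list[float], window: int) -> float:
    """Assess forecast success: mean absolute one-step error over known history."""
    return mean(
        abs(forecast(history[:i], window) - history[i])
        for i in range(window, len(history))
    )


def best_window(history: list[float], windows: list[int]) -> int:
    """Adapt parameters: keep the time window with the lowest backtest error."""
    return min(windows, key=lambda w: backtest_error(history, w))


def outcome_ok(prediction: float) -> bool:
    """Assess outcome data quality with a simple sanity check."""
    return math.isfinite(prediction) and prediction >= 0


# Made-up sample data with one broken value in the primary source.
primary = [10.0, 12.0, None, 16.0, 18.0, 20.0]
complementary = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

quality = ingress_quality(primary)
history = repair(primary, complementary)
window = best_window(history, [1, 2, 3])
prediction = forecast(history, window)
```

In a data factory each of these functions would typically be its own pipeline step writing an immutable dataset, so that quality and forecast success can be tracked over time.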
Naive machine learning
Sustainable production machine learning
● Multiple models, parameters, features
● Assess ingress data quality
● Repair broken data from complementary source
● Choose model and parameters based on performance and input data
● Benchmark models
● Try multiple models, measure, A/B test
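A minimal sketch of the "benchmark models, choose by performance, A/B test" steps above, assuming three toy candidate models and a hash-based variant assignment. The model functions, `benchmark`, `choose_model`, and `ab_variant` are hypothetical names for this example, not part of any library.

```python
import hashlib
from statistics import mean
from typing import Callable

Model = Callable[[list[float]], float]  # maps history to a prediction of the next value


def last_value(history: list[float]) -> float:
    return history[-1]


def moving_average(history: list[float]) -> float:
    return mean(history[-3:])


def linear_trend(history: list[float]) -> float:
    return history[-1] + (history[-1] - history[-2])


CANDIDATES: dict[str, Model] = {
    "last_value": last_value,
    "moving_average": moving_average,
    "linear_trend": linear_trend,
}


def benchmark(model: Model, series: list[float], warmup: int = 3) -> float:
    """Mean absolute one-step error of `model` on the known series."""
    return mean(abs(model(series[:i]) - series[i]) for i in range(warmup, len(series)))


def choose_model(series: list[float]) -> tuple[str, dict[str, float]]:
    """Benchmark every candidate and return the best name plus all scores."""
    scores = {name: benchmark(model, series) for name, model in CANDIDATES.items()}
    return min(scores, key=scores.get), scores


def ab_variant(user_id: str, variants: list[str]) -> str:
    """Assign a user to a variant deterministically, so repeat visits match."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]


# Made-up sample series with a steady upward trend.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
best, scores = choose_model(series)
variant = ab_variant("user-42", ["linear_trend", "last_value"])
```

Offline benchmarks like this only narrow the field; the A/B assignment is what lets the remaining candidates be measured on live traffic.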
Generative AI → more complex data engineering
[Figure: Input → LLM → Output]
Generative AI is simple?
Unless you want
● Relevance
● Correct facts
● Multiple modalities
● Prevent undesirable outcomes
Retrieval-augmented generation (RAG)
● Training an adapted large language model is expensive and difficult
● RAG hack:
○ Find relevant document chunks
○ Concatenate with prompt (context)
○ Combined general and specific information!
● How to search?
○ Keywords / embeddings / neural?
○ Rerank chunks?
● How to chunk documents?
○ Boundaries
○ Chunks need higher context
■ E.g. "These uses of X may cause fire: <snip> …"
[Figure: Input → LLM → Output, with a search engine retrieving document chunks from a document corpus]
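The RAG hack can be sketched with the simplest of the search options, keyword overlap. This is an illustration only: `tokenize`, `retrieve`, `build_prompt`, the prompt wording, and the sample chunks are all invented for the example, and the call to an actual LLM is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case a text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Find relevant chunks: rank by the number of words shared with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Concatenate the retrieved chunks with the question to form the LLM prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Invented sample corpus, not taken from any real manual.
corpus_chunks = [
    "Wash wool at 30 degrees on the delicate program.",
    "Descale the machine every three months.",
    "Remove egg stains with cold water before washing.",
]
question = "How do I remove egg stains?"
relevant = retrieve(question, corpus_chunks, top_k=1)
prompt = build_prompt(question, relevant)
```

Swapping `retrieve` for an embedding search or adding a reranking pass changes only the retrieval step; the concatenation step stays the same.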
RAG experiment: Household appliance interface
● Products that will be disrupted by generative AI?
○ How many different washing machine programs do you use?
○ What if you could talk to the machine? Show your stained shirt?
● Naive chunks:
"Saturate with prewash stain remover or nonflammable dry cleaning fluid. Baby formula, dairy
products, egg."
● Domain-specific data engineering divides generative AI products from demos.
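One way to give chunks the higher context they need is to carry the nearest heading along with each chunk. The sketch below assumes a document where heading lines start with "# "; that layout, the function name, and the sample text are assumptions for illustration and are not taken from the experiment.

```python
def chunk_with_context(lines: list[str]) -> list[str]:
    """Turn body lines into chunks prefixed with the heading they appear under."""
    chunks: list[str] = []
    heading = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("# "):
            heading = line[2:]  # remember the current heading, emit nothing
        else:
            chunks.append(f"{heading}: {line}" if heading else line)
    return chunks


# Invented sample text, not taken from any real manual.
manual = [
    "# Grease stains",
    "Saturate with prewash stain remover.",
    "",
    "# Egg stains",
    "Soak in cold water before washing.",
]
chunks = chunk_with_context(manual)
```

A naive fixed-size splitter would instead glue an instruction to the heading of the next section, which is the failure shown in the naive chunk above.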
Generative AI → more complex data engineering
[Figure: Input → LLM → Output]
Generative AI is simple?
Want multi-modal?
Correct facts too?
Safety mechanisms to avoid undesirable use?
Add relevance? → Retrieval augmented generation (RAG)
[Figure: Input → LLM → Output, with a search engine supplying document snippets from a document corpus]
Nvidia: Nemo Guardrails
Langchain: Multi-modal RAG
USTC + UCLA + Google: Corrective RAG
Data engineering in the future
DW
~10 year capability gap
"data factory engineering"
Enterprise big data failures
"Modern data stack" -
traditional workflows, new technology
4GL / UML phase of data engineering
Data engineering education
Don't build. Grow.
● Every data / AI project has either
○ failed
○ cost a fortune
● Every leading data company has
○ solved product challenges at hand
○ improved process / org / ways of working
○ had data / AI success through enabled teams
● Brief recipe for success
○ Align teams with value chains
○ Automation + data for current challenges
○ Centralised, immutable data in pipelines
○ It's a software engineering problem
■ QA, composability, DevOps, …
○ Feedback cycle speed < hour (day)
[Figure labels: Technology change; Process change; Negative ROI; Positive ROI; Sustainable data+AI feature development; "We must be AI first!"; "Solve today's challenges with data & automation"]
Data factory adoption
Observations:
● Industrial success requires veterans
● Not enough factory experience around
● Minimal flow of veterans to incumbents / consultants
● Artisanal tools (Data warehouse / low-code / …) inadequate for domain-specific AI
● Industrial tools not helpful without veterans
Belief: We need new ways of collaboration.
● Other disciplines?
○ Shared innovation
○ E.g. Foxconn, Autoliv
● Tech leaders
○ Grow, don't build
○ Product-driven
[Figure labels: Customer; Data factory; Data platform & lake; data; domain expertise; Value from data!; Data innovation as leaders do it; Learning by doing, in collaboration]
What we learnt
Success under good circumstances
● Match Spotify's numbers per developer
● 10-1000x client's own capability
● Data + AI innovation on par with leaders
Challenge: Data capability → innovation
● Domain-specific competence
● "Innovation building blocks" needed
○ Agile, pull vs push, iterative
○ Digital thinking
○ Aligned, cross-functional teams
○ Product focus
○ Value chain alignment
Greater challenge: "How hard can it be?"
● Unawareness of data divide
What we learnt, what we need
Humble, but competent clients
● Data with value potential
● Domain experts
● Innovation building blocks
● Humble regarding data divide
Other ways to slice the challenge?
Partners that bridge innovation gap
● Digitalisation in some vertical
● Interested in new business models
