Bo Peng • @bo_p
Iterative design for data science projects
for QCon San Francisco • Nov 7, 2016
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/iterative-design-data-science
Presented at QCon San Francisco
www.qconsf.com
http://heritagehealthprize.com
Goal: Create an algorithm that predicts how many days a patient will spend in a hospital in the next year.
case study: heritage health prize
approach
2 years
1,363 teams
25,316 entries
[Figure: score vs. time (in months), with reference levels marked "constant value", "all zeros", and "goal"]
What can we learn from this?
Solving business problems can rarely be
reduced to minimizing a model’s RMSE.
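The two baselines named on the chart, "all zeros" and "constant value", can be scored with a few lines of code. The sketch below is only an illustration: it assumes plain RMSE as named on the slide, the patient values are made up, and the constant used is the mean of the targets (the constant that minimizes RMSE). It is not the contest's data or scoring code.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up days-in-hospital values for eight patients (illustrative only).
days = [0, 0, 0, 0, 1, 0, 3, 12]

# Baseline 1: predict zero days for every patient.
all_zeros = [0.0] * len(days)

# Baseline 2: predict one constant for every patient.
# Among all constants, the mean of the targets gives the lowest RMSE.
mean_days = sum(days) / len(days)
constant = [mean_days] * len(days)

rmse_zeros = rmse(all_zeros, days)
rmse_constant = rmse(constant, days)

print(f"all zeros:      {rmse_zeros:.3f}")
print(f"constant value: {rmse_constant:.3f}")
```

Both baselines need no model at all, which is why they are useful reference lines: any submission has to beat them before its score means anything.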
Contests are fun.
agenda
- A common approach to data science
- The design approach:
- a simple model goes a long way (eDiscovery)
- finding & recommending experts within P&G
How simple models + design go a long way
Data driven e-discovery for Daegis
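data-driven e-discovery
daegis
- a document is either about the patent or not about the patent
- it is either turned over to the plaintiff or not turned over
- about the patent but not turned over: adverse inference
- not about the patent but turned over: give away trade secrets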
create a “document map”
algorithm design
patents
marketing
finances
fantasy football
lunch
coffee
review away shades of grey
reduce reviews by 90-99%
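One possible reading of "review away shades of grey" is a triage step: documents that are clearly relevant or clearly irrelevant are handled automatically, and people only review the ambiguous middle. The sketch below assumes a hypothetical relevance score between 0 and 1 per document and hypothetical thresholds `low` and `high`. The slides do not describe how the Daegis system actually scored or grouped documents, and the scores here are invented.

```python
from typing import Dict, List, Tuple


def triage(
    scores: Dict[str, float], low: float = 0.1, high: float = 0.9
) -> Tuple[List[str], List[str], List[str]]:
    """Split documents into (withhold, review, produce) by relevance score."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low <= high <= 1")
    withhold: List[str] = []
    review: List[str] = []
    produce: List[str] = []
    for doc_id in sorted(scores):
        score = scores[doc_id]
        if score <= low:
            withhold.append(doc_id)   # clearly not about the patent
        elif score >= high:
            produce.append(doc_id)    # clearly about the patent
        else:
            review.append(doc_id)     # shades of grey: needs a human
    return withhold, review, produce


# Invented scores for the topic groups shown on the document map.
example_scores = {
    "patents": 0.97,
    "algorithm design": 0.93,
    "marketing": 0.55,
    "finances": 0.40,
    "fantasy football": 0.02,
    "lunch": 0.01,
    "coffee": 0.03,
}

withhold, review, produce = triage(example_scores)
share_reviewed = len(review) / len(example_scores)
print("needs human review:", review)
print(f"share of documents reviewed: {share_reviewed:.0%}")
```

The thresholds carry the business trade-off from the earlier slide: lowering `high` risks giving away trade secrets, while raising `low` risks an adverse inference.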
care about design.
simple, powerful interfaces relay analytics better.
iterative problem solving
generate ideas
build prototype
evaluate
rapid iterations
plan, build, test, and iterate as quickly as possible
Procter & Gamble
Data driven expertise exploration
High level goals:
- reveal areas of expertise
- evaluate connectivity within experts
Lorem Ipsum: a narrative about blankets.
Author: Charlie Brown
Date: 31 Jan 2012
Lorem Ipsum is a dummy text used when typesetting or marking up documents. It has a long
history starting from the 1500s and is still used in digital millennium for typesetting electronic
documents, page designs, etc.
In itself, the original text of Lorem Ipsum might have been taken from an ancient Latin book
that was written about 50 BC. Nevertheless, Lorem Ipsum’s words have been changed so
they don’t read as a proper text.
Naturally, page designs that are made for text documents must contain some text rather than
placeholder dots or something else. However, should they contain proper English words and
sentences almost every reader will deliberately try to interpret it eventually, missing the
design itself.
However, a placeholder text must have a natural distribution of letters and punctuation or
otherwise the markup will look strange and unnatural. That’s what Lorem Ipsum helps to
achieve.
I would like to thank Peppermint Patty for her support on studying Lorem
Ipsum as well as the infinite wisdom of Linus van Pelt and his willingness to use
his blanket in my experiments.
High level goals:
- reveal areas of expertise
- evaluate connectivity within experts
let’s compare countries.
+ 1
10 5 5 20
8 25 2 5
12 3 30 10
1 20 25 50
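The slides do not say what the rows and columns of this matrix stand for. Assuming, purely for illustration, that each row is one country and each column is a count for one area of expertise, one simple way to compare countries is the cosine similarity between rows. This is a sketch of that idea, not the method used in the project.

```python
import math
from typing import List, Sequence

# The matrix from the slide. Treating rows as countries and columns as
# expertise counts is an assumption made only for this example.
counts: List[List[int]] = [
    [10, 5, 5, 20],
    [8, 25, 2, 5],
    [12, 3, 30, 10],
    [1, 20, 25, 50],
]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Pairwise similarity between every pair of rows.
similarity = [[cosine(u, v) for v in counts] for u in counts]

for row in similarity:
    print("  ".join(f"{value:.2f}" for value in row))
```

Cosine similarity compares the shape of each row rather than its total, so a small country and a large one with the same mix of expertise come out as similar.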
design influences data science.
care about design.
Iterative design for data science projects
Bo Peng • @bo_p
for QCon San Francisco • Thanks!
