Asking “Why?”
A lesson for Data Scientists and those who manage them
Adapted from a post by Mike Stringer & Dean Malmgren, founders of Datascope
???
The other day we had a conversation with a bespectacled senior data scientist at another organization (named X to protect the innocent).
Many of us have had similar conversations with people like X, and many of us have been X before.
Data scientists, being curious individuals, are often drawn to projects because:
☑ they’re interesting
☑ they’re fun
☑ they’re technically challenging
☑ their boss heard about “big data” in the Wall Street Journal
These reasons are all distinctly different from trying to solve an important problem.
Important problems in business are often daunting to data scientists because they don’t strictly require data to solve…
…and there are established experts already working on them.
Operations, Product Development, Strategy, Human Resources, Marketing, IT, R&D, Sales
Yet these roles increasingly have an opportunity to use data in innovative ways, making dents in long-standing problems where quantitative approaches were previously impossible.
To tap this abundant resource of useful problems to solve, data scientists must:
1. learn from business domain experts about real problems
2. think creatively about whether and how data can be used as part of a solution
3. focus on problems whose solutions would actually improve the business.
Taking these steps in any other order is a recipe for disillusionment about big data’s true potential.
Starting with a real problem, rather than with some interesting dataset, often leads data scientists down a completely different and much more fruitful path.
A real example from our work at Datascope:
In 2010, Brian Uzzi introduced us to Daegis, an e-discovery services provider.
When a company gets sued, it has to provide all documents relevant to the case.
E-discovery companies like Daegis use a combination of technology and lawyers to help sued companies provide these documents without providing anything they don’t need to.
Early conversations circled around “social network analysis.”
Daegis’ client datasets contained millions of emails we could parse, study, and visualize!
☑ Interesting
☑ Fun
☑ Technically challenging
☐ Useful to the business
But we caught ourselves and asked one important question:
Why?
Instead of studying social networks, we devoted the first phase of our project to building a quick prototype using data from the Text Retrieval Conference (TREC).
We demonstrated that our transductive learning algorithms could reduce the number of documents that needed to be reviewed by 80-99%.
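The slides don’t describe how the transductive algorithms worked. As a rough illustration only, here is a minimal sketch of one well-known transductive technique, iterative label propagation over a document-similarity graph, together with a way to measure how much review it saves. The toy graph, the seed labels, and the function names (`build_graph`, `propagate_labels`, `review_fraction`) are hypothetical; this is not Datascope’s actual method.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Turn (doc_a, doc_b, similarity) triples into an undirected adjacency dict."""
    graph = defaultdict(list)
    for a, b, weight in edges:
        graph[a].append((b, weight))
        graph[b].append((a, weight))
    return dict(graph)


def propagate_labels(graph, seeds, iterations=50):
    """Iterative label propagation: each unlabeled document's relevance score
    becomes the similarity-weighted average of its neighbours' scores, while
    reviewer-labelled seed documents stay clamped to 1.0 or 0.0."""
    scores = {doc: seeds.get(doc, 0.5) for doc in graph}
    for _ in range(iterations):
        updated = {}
        for doc, neighbours in graph.items():
            if doc in seeds:
                updated[doc] = seeds[doc]
                continue
            total = sum(w for _, w in neighbours)
            updated[doc] = sum(scores[n] * w for n, w in neighbours) / total
        scores = updated
    return scores


def review_fraction(scores, seeds, relevant, target_recall=1.0):
    """Fraction of unlabeled documents a reviewer must read, highest score
    first, to find target_recall of the relevant unlabeled documents."""
    unlabeled = sorted((d for d in scores if d not in seeds),
                       key=lambda d: scores[d], reverse=True)
    needed = math.ceil(target_recall * sum(d in relevant for d in unlabeled))
    if needed == 0:
        return 0.0
    found = 0
    for reviewed, doc in enumerate(unlabeled, start=1):
        if doc in relevant:
            found += 1
            if found == needed:
                return reviewed / len(unlabeled)
    return 1.0


# Hypothetical toy corpus: two tight clusters joined by one weak link.
relevant_docs = ["d0", "d1", "d2", "d3"]
other_docs = ["d4", "d5", "d6", "d7"]
edges = [(a, b, 1.0) for a, b in combinations(relevant_docs, 2)]
edges += [(a, b, 1.0) for a, b in combinations(other_docs, 2)]
edges.append(("d3", "d4", 0.1))

graph = build_graph(edges)
seeds = {"d0": 1.0, "d7": 0.0}  # two documents a reviewer has labelled
scores = propagate_labels(graph, seeds)
fraction = review_fraction(scores, seeds, set(relevant_docs))
print(f"Reviewed {fraction:.0%} of unlabeled documents")  # Reviewed 50% of unlabeled documents
```

On this toy graph the reviewer reads only the three highest-scoring unlabeled documents (half of the six) to find every relevant one. Real review sets are vastly larger; the 80-99% figure above comes from Datascope’s own TREC experiments, not from this sketch.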
This was huge!
We were going to help Daegis gain a tremendous advantage, and Daegis’ clients would be able to defend themselves against frivolous lawsuits.
+1 for the good guys. Right?
There’s that “why” again.
Had we asked about this at the beginning of the project, we would’ve known the importance of defensibility.
After more design iterations (see our Strata presentation or slides if you’re interested), we arrived at some insights: what we developed needed to be educational, transparent, and understandable.
By the end, if you had to summarize the project, it would be closer to “educating attorneys about information retrieval” than “social network analysis.”
The final result is a product that Daegis sells under the name Acumen.
This case illustrates a lesson for data scientists:
Ask why first!
But beware.
The answers to this deceptively simple question may surprise you, take you into challenging uncharted territory, and inspire you to think about problems in completely different ways.
Learn more about us at http://datasco.pe
Thanks for your attention.
