Street Fighting Data Science
Pete Skomoroch
@peteskomoroch
O’Reilly Strata Conference
February 28, 2012
To solve hard problems:
Think like a street fighter
Analyze
Improvise
Anticipate
Adapt
How does this apply to Data Science?
Pricing model decreases profit in test stores by 30%
What went wrong?
• Ran complex “black box” model
• Didn’t analyze the data first
• Didn’t anticipate elasticity errors
How could this have been avoided?
The Men Who Stare at Charts
Look at your data
Raw Data: FEC Contributions
not employed              118672    retired                                 32938
self employed              92973    self-employed                           25454
information requested      17627    information requested per best efforts   1313
refused                      728    homemaker                                4992
unemployed                  1493    the bank of new york                       65
self-employed               5919    john mccain 2008                           57
university of california     825    u.s. government                           121
microsoft                    915    idt corp.                                  54
university of chicago        616    merrill lynch                             273
harvard university           848    blank rome l.l.p.                          51
google                       662    department of defense                     100
stanford university          716    u.s. army                                  90
university of washington     614    us army                                   141
ibm                         1016    none                                      642
columbia university          782    greenberg traurig                         118
university of michigan       514    northrop grumman                          105
freelance                    372    at&t                                      141
sa                           150    citigroup                                 134
sidley austin llp            509    bridgewater associates                     44
na                           999    univision communications inc.              36
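The raw values above contain near-duplicates such as "self employed" / "self-employed" and "u.s. army" / "us army". As a rough sketch of what looking at the data can lead to, the Python below merges a few of the listed counts after a simple string normalization. The `normalize` function and the `ALIASES` map are illustrative assumptions, not something shown in the talk, and whether "unemployed" should really be folded into "not employed" is a judgment call.

```python
import re
from collections import Counter

# A few (value, count) pairs copied from the slide above.
raw_counts = [
    ("self employed", 92973),
    ("self-employed", 5919),
    ("u.s. army", 90),
    ("us army", 141),
    ("not employed", 118672),
    ("unemployed", 1493),
    ("ibm", 1016),
]

# Hypothetical hand-written alias map for values that no string rule will merge.
ALIASES = {"unemployed": "not employed"}

def normalize(value: str) -> str:
    """Lowercase, drop periods, turn hyphens into spaces, collapse whitespace."""
    value = value.lower().replace(".", "").replace("-", " ")
    value = re.sub(r"\s+", " ", value).strip()
    return ALIASES.get(value, value)

# Sum the counts of all raw values that normalize to the same key.
merged = Counter()
for value, count in raw_counts:
    merged[normalize(value)] += count

for value, count in merged.most_common():
    print(f"{count:>7}  {value}")
```

Printing the merged counts next to the raw ones is a quick way to see how much of the mess comes from spelling variants alone.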
Katherine Alexandra
“Don't indulge in any unnecessary, sophisticated moves. You'll get clobbered if you do, and in a street fight you'll have your shirt zipped off you.”
- Bruce Lee
Look at your errors
• Sanity check row counts
• Track errors over time
• Find patterns in the error data
• Add missing features to models
• Replace models entirely
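The first three checks in the list above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function names, the 1% row-loss tolerance, and the records themselves, which are made-up values rather than data from the talk.

```python
from collections import Counter, defaultdict

def check_row_counts(rows_in: int, rows_out: int, max_loss: float = 0.01) -> None:
    """Raise ValueError if a pipeline step lost more than max_loss of its input rows."""
    if rows_in <= 0:
        raise ValueError("input has no rows")
    lost = (rows_in - rows_out) / rows_in
    if lost > max_loss:
        raise ValueError(f"step dropped {lost:.1%} of rows ({rows_in} -> {rows_out})")

def mean_abs_error_by(records, key_index):
    """Mean absolute error grouped by the field at key_index."""
    totals = defaultdict(float)
    counts = Counter()
    for rec in records:
        key = rec[key_index]
        totals[key] += abs(rec[2] - rec[3])  # |actual - predicted|
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

# (day, segment, actual, predicted): made-up illustrative values.
records = [
    ("2012-02-01", "store_a", 100.0, 98.0),
    ("2012-02-01", "store_b", 100.0, 80.0),
    ("2012-02-02", "store_a", 110.0, 111.0),
    ("2012-02-02", "store_b", 90.0, 60.0),
]

check_row_counts(rows_in=1000, rows_out=995)   # within tolerance, no exception

errors_by_day = mean_abs_error_by(records, 0)      # track errors over time
errors_by_segment = mean_abs_error_by(records, 1)  # look for patterns in the errors

print(errors_by_day)
print(errors_by_segment)
```

In this toy data the per-segment view shows one store carrying almost all of the error, which is the kind of pattern that might suggest a missing feature or a different model for that segment.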
Analyze
Improvise
Anticipate
Adapt
Think like a street fighter
