The document discusses how data science can help build better products. It explains that products are initially built to quickly test ideas through lightweight and imperfect means. Data science helps understand customer value and enables continuous learning through a process of analyzing data, making discoveries, and pivoting the product based on what is learned. This contrasts with the traditional approach where functionality is locked in place. The document advocates for an adaptive software environment that allows for rapid changes based on new insights. It provides tips for building successful data products through iterative improvements informed by data.
How Data Science Builds Better Products - Data Science Pop-up Seattle
1. #datapopupseattle
How Data Science Builds Better Products
Sean McClure, Ph.D
Data Scientist, Senior Consultant, ThoughtWorks
WorldOfDataSci Thoughtworks
2. Data Science POP-UP in Seattle
Produced by Domino Data Lab (www.dominodatalab.com)
Domino’s enterprise data science platform is used by leading analytical organizations to increase productivity, enable collaboration, and publish models into production faster.
12. data science
the right decisions
understands strategy + understands data
BETTER DISCOVERY
13. STANDARD SOFTWARE
What’s Wrong With the Usual Approach?
All functionality is locked in place
• Control structures
• Flow
• Loops
• If-then-(else)
• Case and switch
• Conditions
• Exceptions
• Coroutines
• Continuations
• Count-controlled loops
• Condition-controlled loops
• Collection-controlled loops
• Infinite loops
• Restart loop
• Generators
• Early exit from loops
• Loop variants and invariants
• Loop system cross-references
• Structured non-local control flow
17. ADAPTIVE SOFTWARE
What is the New Approach?
unlocked
• Learning algorithms
• Model Validation
• Model Performance
• Data visualization
• Operationalizing Models
• Scientific computing libraries
• Data cleansing
• Data preparation
• Probability and statistics
• Control structures
• Flow
• Loops
• If-then-(else)
• Case and switch
• Conditions
• Exceptions
• Coroutines
• Continuations
• Count-controlled loops
• Condition-controlled loops
• Collection-controlled loops
• Infinite loops
• Restart loop
• Generators
• Early exit from loops
• Loop variants and invariants
• Loop system cross-references
• Structured non-local control flow
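The contrast between locked and unlocked functionality can be illustrated with a small sketch. It is not taken from the talk: the function name, the class, and the "fraction of the mean" rule are all hypothetical, chosen only to show a decision rule fixed in code next to one that is re-estimated from data.

```python
from statistics import mean


def is_low_ctr_locked(ctr: float) -> bool:
    """Standard software: the threshold is fixed in code (hypothetical value)."""
    return ctr < 0.01


class AdaptiveLowCtrRule:
    """Adaptive software: the threshold is learned from observed data.

    The rule used here (a fraction of the observed mean) is an assumption
    for illustration, not a recommended model.
    """

    def __init__(self, fraction_of_mean: float = 0.5):
        self.fraction_of_mean = fraction_of_mean
        self.threshold = None

    def fit(self, observed_ctrs):
        # Re-estimate the threshold whenever new data arrives.
        self.threshold = self.fraction_of_mean * mean(observed_ctrs)
        return self

    def is_low(self, ctr: float) -> bool:
        if self.threshold is None:
            raise ValueError("call fit() before is_low()")
        return ctr < self.threshold


if __name__ == "__main__":
    rule = AdaptiveLowCtrRule().fit([0.02, 0.04, 0.06])
    print(is_low_ctr_locked(0.015), rule.is_low(0.015))
```

The locked rule gives the same answer until a developer edits the code; the adaptive rule changes its answer when `fit` is called again on new data.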
23. Successful Data Products
• establish early benchmarks
• understand true validation
• build sophistication via iteration
• provide APIs to model results
• get continuous exposure to domain experience
• design product experiments
Choose technologies that make it possible to build data products successfully
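As one possible reading of "establish early benchmarks", the sketch below compares a candidate model against a naive baseline that always predicts the training mean. The helper names, the choice of mean absolute error, and the sample numbers are assumptions for illustration, not part of the talk.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def baseline_predictions(train_targets, n):
    """Naive benchmark: always predict the mean of the training targets."""
    m = mean(train_targets)
    return [m] * n


def beats_benchmark(actual, model_preds, train_targets):
    """Return (is_better, baseline_error, model_error)."""
    base_err = mae(actual, baseline_predictions(train_targets, len(actual)))
    model_err = mae(actual, model_preds)
    return model_err < base_err, base_err, model_err


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    better, base_err, model_err = beats_benchmark(
        actual=[12, 18, 30],
        model_preds=[11, 19, 27],
        train_targets=[10, 20, 30],
    )
    print(better, round(base_err, 3), round(model_err, 3))
```

Each later iteration of the model can be passed through the same check, so added sophistication has to show a measurable gain over the benchmark.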
24. Search Engine Marketing - Recommendation
• Increasing CTR?
• Decreasing CPC?
• Call volume trends
• Percentage of Good Call trends
• Page Position
• Visits vs Cost Per Visit
• Impressions vs CTR graph.
• Breakdown of CVT types
• Click-to-call
• Daily Budget Spend
• Top 5 KWs vs Previous Good Cycle
• Budget distribution
• Impressions per publisher
• Revenue per publisher
• Page position per publisher
• Review for Negative KWs
• Review for Partner site issues
• Review for OAT
• Check Category page
• Impression Share
• Are the ads approved and running?
• Below 1st Page Bid KWs
• Quality Score
• Is it loading?
• Are all numbers replacing correctly?
• Out of Area Traffic
• High Spend – Low Revenue
• Super Low CTRs
making decisions
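Several of the checks above reduce to simple arithmetic on campaign counts. The sketch below uses the standard definitions of CTR (clicks divided by impressions) and CPC (cost divided by clicks). The `CampaignStats` type, the `flags` helper, and both thresholds are hypothetical and would need to be set from real data.

```python
from dataclasses import dataclass


@dataclass
class CampaignStats:
    impressions: int
    clicks: int
    cost: float
    revenue: float


def ctr(s: CampaignStats) -> float:
    """Click-through rate: clicks per impression."""
    return s.clicks / s.impressions if s.impressions else 0.0


def cpc(s: CampaignStats) -> float:
    """Cost per click: spend divided by clicks."""
    return s.cost / s.clicks if s.clicks else 0.0


def flags(s: CampaignStats, min_ctr: float = 0.005, min_return: float = 1.0):
    """Return the checks this campaign fails (thresholds are assumptions)."""
    found = []
    if ctr(s) < min_ctr:
        found.append("super low CTR")
    if s.cost > 0 and s.revenue / s.cost < min_return:
        found.append("high spend, low revenue")
    return found


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    stats = CampaignStats(impressions=10_000, clicks=20, cost=50.0, revenue=30.0)
    print(ctr(stats), cpc(stats), flags(stats))
```

Encoding each manual check as a function makes it possible to run the whole list on every campaign and to replace fixed thresholds with learned ones later.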
25. Data Product
Search Engine Marketing - Recommendation
[Architecture diagram. Recoverable labels: Databases; Campaign; Creative; Publishers; Proxy Logs; Call Logs; Others; Sqoop; Flume; DB Data Producer; Queue; Hadoop Cluster; HDFS; CPI Space (Raw, Normalized, Core); Core Jobs; CPI Jobs; Data Core; CPI Data Mart; Reporting Data; Operational Data; rl_op; rl_keyword; rl_report; CPI Admin Console]