Roger S. Barga, Ph.D.
General Manager
Amazon Web Services
Driving Business Value with
Data Science
Recent Experience
Fielded Solutions
• Customer Segmentation & Targeting
• Which category item will customer buy next?
• Azure ML Reference Customer
• Predictive Analytics to reduce school
dropout rate
• Predictive model identifies which students
are at risk of dropping out at K12
• Predictive Model shows when JLL can charge
above or below the market for a specific deal
• Predictive maintenance & Internet of Things
• Built model to predict causes of elevator failure
• Reference Customer for Azure ML and ISS
REFERENCE CUSTOMERS
Predictive Maintenance at ThyssenKrupp
ThyssenKrupp partnered with Microsoft to build
a new predictive maintenance solution to
improve service margins for its elevator business
• Great Internet of Things example
• Used ISS and Azure Machine Learning
• ML model predicts top causes of failure in an
elevator – 5M elevators in production, $400
cost savings annually.
Key Benefits
• Ease of use across skillsets
• Ease of deployment
• Increased productivity
Now we have the ability to
use live data to define the
needed repair before a
breakdown happens,
reducing costs for ourselves
and our customers.
Dr. Rory Smith
ThyssenKrupp
Problem:
To leverage the history of a person’s behavior on
Microsoft.com to identify their interests and
predict future actions
Findings:
• Opportunity to provide upsell after users hit on
Microsoft online products such as Bing,
SkyDrive, Xbox Live, Zune
• Target messaging on Windows Phone extends
the functionality of Microsoft products
Methodology:
• Big Data Platform – HDP for Windows/Azure
HDInsight and Advanced Analytics support
• Develop statistical models to determine the
probability of users buying a Surface Device
Customer Targeting With Machine Learning
Problem: Detect suspicious activity on the network servers early
and eliminate the threat.
Methodology:
• File system to store massive security data.
• Fully automated workflow to drive end-to-end
data receiving and transformation process.
• Analysis and visualizations of Windows Events to
identify pre-defined threat scenarios.
• Move from descriptive analytics to a mature
predictive archetype.
Preventing Network Intrusion with Machine Learning
A Sample Project
• Create a demo to show the potential of Predictive Analytics
• Develop a demo to answer the question “What factors drive
our client to charge over or below market rates?”
• Create 2 predictive models to predict
• If our client can charge over the market average for Landlords, and
• Whether our client can charge below the market for Tenants
• Develop strategies to explain key factors that drive these outcomes
• Visualize results in Power BI.
Building Predictive Models
[Figure: process diagram with numbered steps 1 to 5; label: Business Insights]
Note:
This is a variant of the Cross-Industry
Standard Process for Data Mining
(CRISP-DM)
Conceptual Solution
Source Data #1
Source Data #2
Data Pre-processing on Hadoop (Hive queries)
Data Preparation and Predictive Models with Machine Learning
Visualization in Power BI
How to Use the Predictive Model
Broker → Data on a new deal → Predictive Model
1 = You can charge above the market average
0 = You can charge below market
Data Preparation
• Source data: 1 internal and 1
external data source
• Internal data source prepared on
Hadoop cluster
• Both datasets joined in our internal
Machine Learning tool
• New column created to determine
when our client charges above or
below market average
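The preparation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pipeline: the field names (`deal_id`, `rate`, `market_avg_rate`), the join key, the sample values, and the choice to label ties as 0 are all assumptions made for the example.

```python
# Hypothetical records; field names and values are illustrative assumptions.
internal_deals = [
    {"deal_id": 1, "rate": 32.0},
    {"deal_id": 2, "rate": 27.5},
    {"deal_id": 3, "rate": 30.0},
]
external_market = [
    {"deal_id": 1, "market_avg_rate": 30.0},
    {"deal_id": 2, "market_avg_rate": 29.0},
]


def join_and_label(internal, external):
    """Inner-join the two sources on deal_id and add a binary label column."""
    market_by_id = {row["deal_id"]: row for row in external}
    joined = []
    for deal in internal:
        market = market_by_id.get(deal["deal_id"])
        if market is None:
            continue  # no market benchmark for this deal, so it cannot be labeled
        row = {**deal, **market}
        # 1 = charged above the market average, 0 = at or below it (assumption)
        row["above_market"] = int(row["rate"] > row["market_avg_rate"])
        joined.append(row)
    return joined


labeled = join_and_label(internal_deals, external_market)
```

Deals without a matching market record are dropped here; whether to drop or impute them is a design choice the slides do not specify.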
Data Source #1 Data Source #2
Predictive Model
• Tested several algorithms
including Logistic Regression,
Boosted Decision Trees, etc.
• Models were trained with 10-fold
cross validation.
• Boosted Decision Trees was the
best algorithm – see ROC curve
• Area under curve for Boosted
Decision Trees was 92.4%!
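The evaluation procedure above (10-fold cross-validation scored by area under the ROC curve) can be sketched as follows. This is an illustration only: the data is synthetic, the folds are stratified by class (an assumption, the slides only say "10-fold"), and a deliberately simple one-feature scorer stands in for Logistic Regression or Boosted Decision Trees, which would normally come from a third-party library. All function names are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(ys, k, seed=0):
    """Split indices into k folds, dealing each class out evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i, y in enumerate(ys) if y == label]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def fit_centroid_scorer(xs, ys):
    """Toy stand-in model: score a point by its signed distance from the class midpoint."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    midpoint = (mean_pos + mean_neg) / 2
    direction = 1.0 if mean_pos >= mean_neg else -1.0
    return lambda x: direction * (x - midpoint)


def cross_validated_auc(xs, ys, k=10):
    """Train on k-1 folds, score the held-out fold, and average the k AUCs."""
    aucs = []
    for fold in stratified_folds(ys, k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_centroid_scorer([xs[i] for i in train], [ys[i] for i in train])
        aucs.append(roc_auc([ys[i] for i in fold], [model(xs[i]) for i in fold]))
    return sum(aucs) / len(aucs)


# Synthetic data: positives are drawn from a higher-mean distribution.
rng = random.Random(42)
ys = [i % 2 for i in range(200)]
xs = [rng.gauss(1.5 if y == 1 else 0.0, 1.0) for y in ys]
mean_auc = cross_validated_auc(xs, ys, k=10)
```

Running the same loop once per candidate algorithm and comparing the averaged AUCs is the kind of comparison the slide reports.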
Predictive Model for Landlords - Results
• Boosted Decision Trees - Area under
the curve = 92.4%!
• Logistic Regression - Area under the
curve = 81.2%!
Visualization in Power BI
Industry Overview: Financial Services
Data Science applied to the Financial Services sector enables insights into:
“The opportunity for the Financial sectors are to unlock
the potential in their data through analytics and shape
the strategy for business through reliable factual insight
rather than intuition…” - Deloitte, 2013
Fraud & Financial Crimes
• Enterprise fraud and financial crimes
• Fraud Detection
• Credit Risk Management
Analytics
• Actuarial analysis, portfolio management and rate making
• Forecasting and econometrics
• Predictive analytics and data mining
• Mathematical optimization and simulations
Marketing & Customer Experience
• Social media analytics
• Customer Segmentation
• Customer Targeting
Customer Experience Enhancement
• Clickstream analysis
• Customer lifecycle management
• Dynamic profiling and enhanced customer
segmentation
Banks, Insurance,
Real Estate
Industry Overview: Healthcare
Providers,
Payers,
Pharmaceuticals
& Biotechnology
Data Science applied to the Healthcare sector enables insights into:
“Predictive analytics addresses today's pressing challenges in
healthcare effectiveness and economics by improving
operations across the spectrum of healthcare functions…”
- Predictive Analytics World Healthcare, 2014
Quality & Outcomes
• Readmissions Avoidance Analysis
• Health outcomes
• Patient safety
Consumer Analytics
• Customer acquisition
• Health intervention
• Member & Population Health
• Value-based care and
payment models
• Membership portfolio
optimization
Risk & Incentives
• A holistic view of patient episodes
• Value-based care and payment models
Care Delivery
• Health care cost analytics
• Performance management
• Workforce planning
Cost Containment
• Fraud and improper payments
• Eligibility fraud
• Enterprise case management
Industry Overview: Oil & Gas
Oil & Gas Producers,
Oil Equipment,
Services &
Distribution,
Alternative Energy
Data Science applied to the Oil and Gas sector enables insights into:
Oil Field Analytics
• Seismic analyses
• Reservoir characterization
• Drilling optimization
• Unconventional completions
• Production forecasting
Assets & Operations
• Facility integrity
• Demand forecasting
• Integrated operations and
logistics
• Operational risk/environment,
health and safety (EH&S)
Data Management
• Complex Event Processing
• Data Quality
• Master Data Management
“Access to more information from multiple sources and
disciplines and more sophisticated analytics will improve the
oil and gas industry's ability to optimize production…
Analytics will provide a way to bring optimization from
statisticians to the business.” – IDC, 2013
How to be Successful
How to be successful?
1. Create value
2. Capture some for yourself
How to create value (as a data scientist)
Extract insights from data for decision support
Productive Use of Time
Have a bias against writing learning algorithms
• Bias in favor of leveraging 3rd party implementations
• Add data: more information beats better algorithms
You will write data manipulation algorithms
• Data is surprising enough, need algorithm certainty
• Defect count is proportional to line count
• Use as high level a language as possible
Analysis and Diminishing Returns
First few models tend to capture most of the value
Distinguish between:
• Marginal improvements important (e.g., search, WalMart);
• Marginal improvements unimportant (typical).
Latter case: get first 80%, move to new problem
The Importance of Starting Small
When you first encounter a data set, you know nothing.
• Ergo: first piece of data is very informative.
• Think of data set utility as roughly logarithmic in size.
Don’t require a large data set before starting analysis.
Always try things out on small portions of data first.
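One way to follow this advice when the full data set is too large to load is reservoir sampling, which draws a uniform random sample in a single pass without knowing the total row count in advance. The sketch below is an assumption about how one might do it, not something the slides prescribe; the function name and the stand-in data are hypothetical.

```python
import random


def reservoir_sample(rows, k, seed=0):
    """Return k rows chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)  # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row  # replace an existing entry with probability k/(i+1)
    return sample


# Stand-in for a large file or table; in practice `rows` could be a file iterator.
big_dataset = (f"record-{n}" for n in range(100_000))
small = reservoir_sample(big_dataset, k=1000)
```

If utility is roughly logarithmic in size, as suggested above, a 1,000-row sample of a 100,000-row set should already carry a large share of what the full set can teach.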
Timescales and Failing Fast
1. Immediate zone: less than 60 seconds
• 100s per day
2. Bathroom break zone: less than 5 minutes
• 10s per day
3. Lunch zone: less than an hour
• 5 per day
4. Overnight zone: less than 12 hours
• 1 per day
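To see which zone a given experiment falls into, one could wrap each step in a small timer. The helper below is a hypothetical sketch; the zone boundaries come from the list above, while the function names and the "beyond overnight" label are assumptions.

```python
import time

# Upper bounds in seconds, taken from the four zones listed above.
ZONES = [
    (60, "immediate"),
    (5 * 60, "bathroom break"),
    (60 * 60, "lunch"),
    (12 * 60 * 60, "overnight"),
]


def zone_for(seconds):
    """Map an elapsed time in seconds to the name of the iteration zone it falls in."""
    for limit, name in ZONES:
        if seconds < limit:
            return name
    return "beyond overnight"


def timed(step, *args, **kwargs):
    """Run one step and return its result, elapsed seconds, and zone name."""
    start = time.perf_counter()
    result = step(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, zone_for(elapsed)


result, elapsed, zone = timed(sum, range(100_000))
```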
Failing Fast: Summary
1. Move code to data, not the converse!
2. Do feature engineering with a fast learning algorithm
(e.g., linear), then switch to a slower algorithm for
the final product (e.g., GBDT, NN).
3. Subsample your data intelligently.
4. Fewer examples (rows), e.g., imbalanced classification.
5. Fewer features (columns), e.g., random projections.
Productivity demands debugging as fast as possible.
Stay in the immediate zone
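As an example of subsampling rows intelligently for imbalanced classification, one common approach is to keep every example of the rare class and only a fraction of the common class, attaching a weight to the kept rows so that class proportions can still be recovered. The sketch below assumes label 1 is the rare class; the function name, the `keep_rate` parameter, and the data are hypothetical.

```python
import random


def downsample_majority(rows, label_of, keep_rate, seed=0):
    """Keep every positive row and a random fraction of the negative rows.

    Returns (row, weight) pairs; kept negatives get weight 1/keep_rate so that
    weighted counts still match the original class balance in expectation.
    """
    rng = random.Random(seed)
    sampled = []
    for row in rows:
        if label_of(row) == 1:
            sampled.append((row, 1.0))
        elif rng.random() < keep_rate:
            sampled.append((row, 1.0 / keep_rate))
    return sampled


# Hypothetical imbalanced data: 1% positives.
data = [{"id": i, "label": int(i % 100 == 0)} for i in range(10_000)]
small = downsample_majority(data, lambda r: r["label"], keep_rate=0.05)
```

With these settings the working set shrinks to a few hundred rows, which helps keep each experiment in the immediate zone.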
Proxy Metrics
Proxy Metric: Something you can measure and optimize
• Revenue per impression
• Clickthrough rate
• Reciprocal communication rate
• Polling results
• Gene expression levels
• Value at risk
Reality: Something you actually care about
Proxy Metric → Reality
• Revenue per impression → Economic Value Created
• Clickthrough rate → User Experience Quality
• Reciprocal communication rate → Match Quality
• Polling results → Election Outcome
• Gene expression levels → Drug Efficacy in Vivo
• Value at risk → Portfolio Quality
Proxy Metrics vs. Reality
Agree on the OEC (Overall Evaluation Criterion)
A concrete goal begets concrete stopping conditions and
concrete acceptance criteria.
The less specific the goal, the likelier that the project will go
unbounded, because no result will be "good enough."
If you don't know what you want to achieve, you don't know
when to stop trying – or even what to try. When the project
eventually terminates – because either time or resources run
out – no one will be happy with the outcome…
Key Takeaways
Think about your data, not about your software.
Productivity is about not waiting for answers.
Mind the gap (between proxy metrics and reality).
Agree upon the OEC with business stakeholders.
Best Defense: close collaboration with a business expert.
Know Your (re)Sources
You can make much stronger inferences about a woman named Brittany. That name was very
popular from the mid-1980s through the mid-1990s, but it wasn’t all that common before and
hasn’t been since. If you know a Brittany, she is probably of college age or just a bit older. Half
of living American Brittanys are between the ages of 19 and 25.
Blogs to Follow…
• FastML, covering practical applications of machine learning and data science
• Hilary Mason's blog, by the Bitly Chief Scientist, covering Data Science and Machine
Learning on Big Data.
• Hunch.net, by John Langford, a leading applied machine learning researcher; his
blog covers the intersection of theory and practice
• Kaggle blog No Free Hunch, covering Kaggle data science and machine learning
competitions
• KDnuggets, news, jobs, software, events, and more in Data Mining and Data Science
research and applications
• Normal Deviate by Larry Wasserman, CMU Prof. of Statistics and Machine Learning
• Statistical Modeling, Causal Inference, and Social Science by Andrew Gelman
• Three-Toed Sloth by Cosma Shalizi
• FiveThirtyEight Blog by Nate Silver, a very popular and non-technical blog covering
analytics applied mainly to politics and sports
• Data Mining Research blog by Sandro Saitta
• Data Mining: Text Mining, Visualization, and Social Media, by Matthew Hurst, a leading data
scientist at Microsoft
• DecisionStats, by Ajay Ohri, covering business analytics and R, with practical examples, and
interviews of field leaders
• Geeking with Greg, by Greg Linden, inventor of the Amazon recommendation engine and
internet entrepreneur
• IA Ventures blog, by Roger Ehrenberg and team, one of the leading Big Data venture capitalists
• Occam's Razor, by Avinash Kaushik, brilliant Digital Marketing Evangelist at Google
• R-bloggers, best blogs from the community of R, with code, examples, and visualizations
• Smart Data Collective, an aggregation of blogs from many interesting data science people
• Steve Miller blog, covering data science, statistics, R, and other topics at Information
Management.
• Tom H. C. Anderson blog, focusing on market research with data and text mining.
• What's the Big Data, by Gil Press. Gil covers the Big Data space and also writes a column on
Big Data and Business in Forbes.
THANK YOU!

Barga Galvanize Sept 2015

  • 2. Roger S. Barga, Ph.D., General Manager, Amazon Web Services: Driving Business Value with Data Science
  • 9. Fielded Solutions • Customer Segmentation & Targeting • Which category item will the customer buy next? • Azure ML Reference Customer • Predictive Analytics to reduce school dropout rate • Predictive model identifies which K-12 students are at risk of dropping out • Predictive Model shows when JLL can charge above or below the market for a specific deal • Predictive maintenance & Internet of Things • Built model to predict causes of elevator failure • Reference Customer for Azure ML and ISS Reference Customers
  • 10. Predictive Maintenance at ThyssenKrupp ThyssenKrupp partnered with Microsoft to build a new predictive maintenance solution to improve service margins for its elevator business • Great Internet of Things example • Used ISS and Azure Machine Learning • ML model predicts top causes of failure in an elevator – 5M elevators in production, $400 cost savings annually. Key Benefits • Ease of use across skillsets • Ease of deployment • Increased productivity "Now we have the ability to use live data to define the needed repair before a breakdown happens, reducing costs for ourselves and our customers." - Dr. Rory Smith, ThyssenKrupp
  • 11. Customer Targeting With Machine Learning Problem: To leverage the history of a person’s behavior on Microsoft.com to identify their interests and predict future actions Findings: • Opportunity to provide upsell after users hit on Microsoft online products such as Bing, SkyDrive, Xbox Live, Zune • Target messaging on Windows Phone extends the functionality of Microsoft products Methodology: • Big Data Platform – HDP for Windows/Azure HDInsight and Advanced Analytics support • Develop statistical models to determine the probability of users buying a Surface Device
  • 12. Preventing Network Intrusion with Machine Learning Problem: Detect suspicious activity on the network servers early and eliminate the threat. Methodology: • File system to store massive security data. • Fully automated workflow to drive the end-to-end data receiving and transformation process. • Analysis and visualizations of Windows Events to identify pre-defined threat scenarios. • Move from descriptive analytics to a mature predictive archetype.
  • 14. • Create a demo to show the potential of Predictive Analytics • Develop a demo to answer the question “What factors drive our client to charge above or below market rates?” • Create 2 predictive models to predict • If our client can charge above the market average for Landlords, and • Whether our client can charge below the market for Tenants • Develop strategies to explain key factors that drive these outcomes • Visualize results in Power BI.
  • 15. Building Predictive Models [Figure: five numbered steps leading to Business Insights] Note: This is a variant of the Cross-Industry Standard Process for Data Mining (CRISP-DM)
  • 16. Conceptual Solution Data Pre-processing on Hadoop (Hive queries) Data Preparation and Predictive Models with Machine Learning Source Data #1 Source Data #2 Visualization in Power BI
  • 17. How to Use the Predictive Model [Figure: Data on a new deal, Predictive Model, Broker] Model output: 1 = You can charge above the market average; 0 = You can charge below market
  • 18. Data Preparation • Source data: 1 internal and 1 external data source • Internal data source prepared on Hadoop cluster • Both datasets joined in our internal Machine Learning tool • New column created to determine when our client charges above or below market average
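The join-and-label step on this slide can be sketched roughly as follows. This is a minimal illustration, not the tool the slides refer to: the field names (`deal_id`, `submarket`, `rate`, `market_avg_rate`), the sample values, and the choice to label "at or below average" as 0 are all assumptions made for the example.

```python
# Hypothetical records; the slides do not give the real column names or values.
internal_deals = [
    {"deal_id": 1, "submarket": "CBD", "rate": 52.0},
    {"deal_id": 2, "submarket": "CBD", "rate": 44.5},
    {"deal_id": 3, "submarket": "Suburban", "rate": 31.0},
]
external_market = [
    {"submarket": "CBD", "market_avg_rate": 48.0},
    {"submarket": "Suburban", "market_avg_rate": 30.0},
]


def join_and_label(deals, market):
    """Join deals to market averages and add a 0/1 label column."""
    avg_by_submarket = {m["submarket"]: m["market_avg_rate"] for m in market}
    labeled = []
    for deal in deals:
        avg = avg_by_submarket.get(deal["submarket"])
        if avg is None:
            continue  # no market benchmark for this deal, so it cannot be labeled
        row = dict(deal, market_avg_rate=avg)
        # Assumed convention: 1 = charged above the market average, 0 = at or below it.
        row["above_market"] = 1 if deal["rate"] > avg else 0
        labeled.append(row)
    return labeled


labeled_deals = join_and_label(internal_deals, external_market)
```

The new `above_market` column is what a classifier would then be trained to predict from the other deal attributes.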
  • 19. Predictive Model • Tested several algorithms including Logistic Regression, Boosted Decision Trees, etc. • Models were trained with 10-fold cross validation. • Boosted Decision Trees was the best algorithm – see ROC curve • Area under curve for Boosted Decision Trees was 92.4%!
  • 20. Predictive Model for Landlords - Results • Boosted Decision Trees - Area under the curve = 92.4%! • Logistic Regression - Area under the curve = 81.2%!
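For readers unfamiliar with the two evaluation ideas on these slides, here is a small pure-Python sketch of area under the ROC curve and of 10-fold splitting. It is an illustration only and not the tooling used in the talk; the function names and the toy labels and scores are made up for the example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative
    one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_rows, k=10):
    """Yield (train_idx, test_idx) pairs; every row is held out exactly once."""
    for fold in range(k):
        test_idx = [i for i in range(n_rows) if i % k == fold]
        train_idx = [i for i in range(n_rows) if i % k != fold]
        yield train_idx, test_idx


# Toy example: one negative (0.7) outranks one positive (0.6).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In practice the rows would be shuffled before splitting, and the AUC would be averaged over the held-out folds for each candidate algorithm before comparing them.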
  • 22. Industry Overview: Financial Services (Banks, Insurance, Real Estate) Data Science applied to the Financial Services sector enables insights into: “The opportunity for the Financial sectors are to unlock the potential in their data through analytics and shape the strategy for business through reliable factual insight rather than intuition…” - Deloitte, 2013 Fraud & Financial Crimes • Enterprise fraud and financial crimes • Fraud Detection • Credit Risk Management Analytics • Actuarial analysis, portfolio management and rate making • Forecasting and econometrics • Predictive analytics and data mining • Mathematical optimization and simulations Marketing & Customer Experience • Social media analytics • Customer Segmentation • Customer Targeting Customer Experience Enhancement • Clickstream analysis • Customer lifecycle management • Dynamic profiling and enhanced customer segmentation
  • 23. Industry Overview: Healthcare Providers, Payers, Pharmaceuticals & Biotechnology Data Science applied to the Healthcare sector enables insights into: “Predictive analytics addresses today's pressing challenges in healthcare effectiveness and economics by improving operations across the spectrum of healthcare functions…” - Predictive Analytics World Healthcare, 2014 Quality & Outcomes • Readmissions Avoidance Analysis • Health outcomes • Patient safety Consumer Analytics • Customer acquisition • Health intervention • Member & Population Health • Value-based care and payment models • Membership portfolio optimization Risk & Incentives • A holistic view of patient episodes • Value-based care and payment models Care Delivery • Health care cost analytics • Performance management • Workforce planning Cost Containment • Fraud and improper payments • Eligibility fraud • Enterprise case management
  • 24. Industry Overview: Oil & Gas (Oil & Gas Producers, Oil Equipment, Services & Distribution, Alternative Energy) Data Science applied to the Oil and Gas sector enables insights into: Oil Field Analytics • Seismic analyses • Reservoir characterization • Drilling optimization • Unconventional completions • Production forecasting Assets & Operations • Facility integrity • Demand forecasting • Integrated operations and logistics • Operational risk/environment, health and safety (EH&S) Data Management • Complex Event Processing • Data Quality • Master Data Management “Access to more information from multiple sources and disciplines and more sophisticated analytics will improve the oil and gas industry's ability to optimize production… Analytics will provide a way to bring optimization from statisticians to the business.” – IDC, 2013
  • 25. How to be Successful
  • 26. How to be successful? 1. Create value 2. Capture some for yourself
  • 27. How to create value (as a data scientist) Extract insights from data for decision support
  • 30. Productive Use of Time Have a bias against writing learning algorithms • Bias in favor of leveraging 3rd party implementations • Add data: more information beats better algorithms You will write data manipulation algorithms • Data is surprising enough, need algorithm certainty • Defect count is proportional to line count • Use as high level a language as possible
  • 34. Analysis and Diminishing Returns First few models tend to capture most of the value Distinguish between: • Marginal improvements important (e.g., search, WalMart); • Marginal improvements unimportant (typical). Latter case: get first 80%, move to new problem
  • 38. The Importance of Starting Small When you first encounter a data set, you know nothing. • Ergo: first piece of data is very informative. • Think of data set utility as roughly logarithmic in size. Don’t require a large data set before starting analysis. Always try things out on small portions of data first.
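One way to "try things out on a small portion of data first" is to draw a uniform random sample in a single pass, without loading the full data set. The sketch below uses reservoir sampling (Algorithm R), which is a choice made here for illustration and is not mentioned in the slides; the function name and the sample sizes are likewise assumptions.

```python
import random


def reservoir_sample(rows, k, seed=0):
    """Return k rows chosen uniformly at random from an iterable of unknown
    length, reading it once. If there are fewer than k rows, return them all."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)  # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row  # replace an existing entry with probability k / (i + 1)
    return sample


# Stand-in for a large data source; in practice this could be a file iterator.
small = reservoir_sample(range(100_000), 1_000)
```

Because the sample fits comfortably in memory, exploratory analysis on it stays fast, and the same code can later be pointed at the full data.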
  • 39. Timescales and Failing Fast 1. Immediate zone: less than 60 seconds • 100s per day 2. Bathroom break zone: less than 5 minutes • 10s per day 3. Lunch zone: less than an hour • 5 per day 4. Overnight zone: less than 12 hours • 1 per day
  • 40. Timescales and Failing Slow 1. Immediate zone: less than 60 seconds • 100s per day 2. Bathroom break zone: less than 5 minutes • 10s per day 3. Lunch zone: less than an hour • 5 per day 4. Overnight zone: less than 12 hours • 1 per day
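The zones above can be written down as a simple lookup, which is handy for tagging how long each experiment took. This is only a sketch: the thresholds come from the slide, while the function name and the "beyond overnight" label for anything longer than 12 hours are assumptions added here.

```python
# (upper limit in seconds, zone name), in increasing order of duration.
ZONES = [
    (60, "immediate"),            # less than 60 seconds
    (5 * 60, "bathroom break"),   # less than 5 minutes
    (60 * 60, "lunch"),           # less than an hour
    (12 * 60 * 60, "overnight"),  # less than 12 hours
]


def zone_for(runtime_seconds):
    """Map an experiment's runtime to the timescale zone it falls into."""
    for limit, name in ZONES:
        if runtime_seconds < limit:
            return name
    return "beyond overnight"  # assumed label; the slide stops at 12 hours
```

For example, `zone_for(45)` falls in the immediate zone, while a 20-minute job lands in the lunch zone and allows only a handful of iterations per day.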
  • 42. Failing Fast: Summary 1. Move code to data, not the converse! 2. Do feature engineering with a fast learning algorithm (e.g., linear), then switch to a slower algorithm for the final product (e.g., GBDT, NN). 3. Subsample your data intelligently. 4. Fewer examples (rows), e.g., imbalanced classification. 5. Fewer features (columns), e.g., random projections.
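The last two points can be sketched in a few lines of pure Python. Both functions are illustrative assumptions rather than the speaker's code: the first keeps every minority-class row and a random fraction of the majority class (recording a weight so the original class balance can be recovered), and the second multiplies each feature vector by a random Gaussian matrix to reduce the number of columns.

```python
import math
import random


def downsample_majority(rows, label_key="label", keep_fraction=0.1, seed=0):
    """Fewer rows: keep all minority rows and about keep_fraction of the majority
    class. `rows` must be a list of dicts, because it is scanned twice."""
    rng = random.Random(seed)
    counts = {}
    for row in rows:
        counts[row[label_key]] = counts.get(row[label_key], 0) + 1
    majority = max(counts, key=counts.get)
    sampled = []
    for row in rows:
        if row[label_key] != majority:
            sampled.append(dict(row, weight=1.0))
        elif rng.random() < keep_fraction:
            # Each kept majority row stands in for 1 / keep_fraction original rows.
            sampled.append(dict(row, weight=1.0 / keep_fraction))
    return sampled


def random_projection(vectors, out_dim, seed=0):
    """Fewer columns: project each vector onto out_dim random Gaussian directions."""
    rng = random.Random(seed)
    in_dim = len(vectors[0])
    scale = 1.0 / math.sqrt(out_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
              for _ in range(out_dim)]
    return [[sum(w * x for w, x in zip(direction, v)) for direction in matrix]
            for v in vectors]
```

Either trick shrinks the data enough to keep experiments in the faster zones; the final model can still be trained on the full rows and columns once the approach has settled.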
  • 43. Productivity demands debugging as fast as possible. Stay in the immediate zone
  • 44. Proxy Metrics Proxy Metric: Something you can measure and optimize • Revenue per impression • Clickthrough rate • Reciprocal communication rate • Polling results • Gene expression levels • Value at risk
  • 45. Proxy Metric → Reality (something you actually care about) • Revenue per impression → Economic Value Created • Clickthrough rate → User Experience Quality • Reciprocal communication rate → Match Quality • Polling results → Election Outcome • Gene expression levels → Drug Efficacy in Vivo • Value at risk → Portfolio Quality
  • 46. Proxy Metrics vs. Reality
  • 47. Agree on the OEC (Overall Evaluation Criterion) A concrete goal begets concrete stopping conditions and concrete acceptance criteria. The less specific the goal, the likelier that the project will go unbounded, because no result will be "good enough." If you don't know what you want to achieve, you don't know when to stop trying – or even what to try. When the project eventually terminates – because either time or resources run out – no one will be happy with the outcome…
  • 48. Key Takeaways • Think about your data, not about your software. • Productivity is about not waiting for answers. • Mind the gap (between proxy metrics and reality). • Agree upon the OEC with business stakeholders. • Best defense: close collaboration with a business expert.
  • 51. You can make much stronger inferences about a woman named Brittany. That name was very popular from the mid-1980s through the mid-1990s, but it wasn’t all that common before and hasn’t been since. If you know a Brittany, she is probably of college age or just a bit older. Half of living American Brittanys are between the ages of 19 and 25.
  • 59. Blogs to Follow… • FastML, covering practical applications of machine learning and data science • Hilary Mason's blog, from the Bitly Chief Scientist, covering Data Science and Machine Learning on Big Data • Hunch.net, by John Langford, a leading applied machine learning researcher; his blog covers the intersection of theory and practice • Kaggle blog No Free Hunch, covering Kaggle data science and machine learning competitions • KDnuggets, news, jobs, software, events, and more in Data Mining and Data Science research and applications • Normal Deviate by Larry Wasserman, CMU Prof. of Statistics and Machine Learning • Statistical Modeling, Causal Inference, and Social Science by Andrew Gelman • Three-Toed Sloth by Cosma Shalizi • FiveThirtyEight Blog by Nate Silver, a very popular and non-technical blog covering analytics applied mainly to politics and sports
  • 60. Blogs to Follow… • Data Mining Research blog by Sandro Saitta • Data Mining: Text Mining, Visualization, and Social Media, by Matthew Hurst, a leading data scientist at Microsoft • DecisionStats, by Ajay Ohri, covering business analytics and R, with practical examples and interviews of field leaders • Geeking with Greg, by Greg Linden, inventor of the Amazon recommendation engine and internet entrepreneur • IA Ventures blog, from Roger Ehrenberg and team, one of the leading Big Data venture capitalists • Occam's Razor, by Avinash Kaushik, brilliant Digital Marketing Evangelist at Google • R-bloggers, best blogs from the community of R, with code, examples, and visualizations • Smart Data Collective, an aggregation of blogs from many interesting data science people • Steve Miller blog, covering data science, statistics, R, and other topics at Information Management • Tom H. C. Anderson blog, focusing on market research with data and text mining • What's the Big Data, by Gil Press. Gil covers the Big Data space and also writes a column on Big Data and Business in Forbes.