© Kalido | Kalido Confidential | May 14, 2013
Data Scientist: Your Must-Have
Business Investment NOW
Today’s Speakers #DataScienceNow
Gregory Piatetsky, Editor, KDnuggets; co-founder of KDD and ACM SIGKDD
David Smith, Data Scientist, Revolution Analytics
Carla Gentry, Data Scientist, Analytical Solution
Darren Peirce, CTO, Kalido
Eric Kavanagh, DM Radio Host, Information Management Magazine’s DM Radio
Revolution Confidential
© Dov Harrington, CC By-2.0
http://www.flickr.com/photos/idovermani/4110546683/
          | Statistician              | Data Scientist
Image     | Baseball (Cricket)        | HBR Sexiest Job of 21st Century
Mode      | Reactive                  | Consultative
Works     | Solo                      | In a team
Inputs    | Data File, Hypothesis     | A Business Problem
Data      | Pre-prepared, clean       | Distributed, messy, unstructured
Data Size | Kilobytes                 | Gigabytes
Tools     | SAS, Mainframe            | R, Python, awk, Hadoop, Linux, …
Nouns     | Tables                    | Data Visualizations
Focus     | Inference (why)           | Prediction (what)
Output    | Report                    | Data App / Data Product
Latency   | Weeks                     | Seconds
Stars     | G.E.P. Box, Trevor Hastie | Hilary Mason, Nate Silver
http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
Three Essential Skills of Data Scientists
Drew Conway
http://www.dataists.com/2010/09/the-data-science-venn-diagram/
Data Science to the Rescue!
[Figure: diagram with the labels Data Integration, Mashups, Applications, Models, Visualization, Predictions, Uncertainty, Problems, Data Sources, Credibility, Effective Data Applications]
              | Business Intelligence                  | Data Science
Perspective   | Looking backwards                      | Looking forwards
Actions       | Slice and Dice                         | Interact
Expertise     | Business User                          | Data Scientist
Data          | Warehoused, Siloed                     | Distributed, real-time
Scope         | Unlimited                              | Specific business question
Questions     | What happened?                         | What will happen? What if?
Output        | Table                                  | Answer
Applicability | Historic, possible confounding factors | Future, correcting for influences
Tools         | SAP, Cognos, Microstrategy, SAS        | Revolution R Enterprise, QlikView, Tableau, Jaspersoft
Hot or not?   | So 1997                                | Transformational
What is Data Science?
By Carla Gentry
Data Scientist
Analytical-Solution
Data Science is…
• The term "data science" has existed for over thirty years (first mentioned by Peter Naur in 1960), but more recently it has gained a lot of attention!
Data Science can be broken down into
4 main areas of expertise.
• Data knowledge
– design & structure
• Programming
– SAS, R, SQL, NoSQL
• Analytics
– Insight
• Communication
– Tell the story
Data Knowledge: Part analyst, part IT
• What kind of servers do you own?
– Servers vs. Mainframe
• What kind of load can the server handle?
– Iterations matter
– Why ask this?
Programming – Pick a language and use it wisely
• Efficiency is KING!
– Why?
• Number of iterations & complex algorithms or scripts. Snowflake vs. Star schema?
– Design is important, but why?
• Key things: normalize and index. There is more to Data Science than just analytics.
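To illustrate why indexing matters for efficiency, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, index name, and data are hypothetical and not from the talk. It asks SQLite for its query plan before and after an index is created on the filtered column.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' text of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail string is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

# Hypothetical in-memory table of sales, 10 rows per customer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10000)],
)

sql = "SELECT SUM(amount) FROM sales WHERE customer_id = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filtered column, it can search instead of scan.
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
plan_after = query_plan(conn, sql, (42,))

total = conn.execute(sql, (42,)).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
print("total for customer 42:", total)
```

The exact plan wording varies between SQLite versions, but the second plan should name the index. On a table this small the difference is negligible. It grows with the row count and with the number of iterations a script runs.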
How can I learn about Data Science?
• For those who want to invest their time and
talent there are resources.
• College Courses
• Online
• Webinars
• Blogs
Data Science and Data Scientists
Now
Gregory Piatetsky, @kdnuggets
Analytics, Big Data,
Data Mining, and Data Science Resources
© KDnuggets 2013
• Statistics, 1830-
• Data mining, 1980-
• Knowledge Discovery in
Data (KDD), 1989-
• Business Analytics, 1997-
• Predictive Analytics, 2002-
• Data Analytics, 2011-
• Data Science, 2011-
• …?
Same Core Idea: Finding Useful Patterns in Data
Different Emphasis
Trends from Google Ngrams (1800-2008) and Google Trends (2005-2013)
Big Data > Data Mining > Business Analytics > Predictive Analytics > Data Science
[Chart: Google Trends search, Jan 2008 - Apr 2013, Worldwide, comparing "Big Data" and "Data mining"]
Data Scientist – sexiest job of the 21st Century (???)
say Thomas H. Davenport and D.J. Patil (HBR, Oct 2012)
“Data Scientist”: fastest-growing term on www.kdnuggets.com/jobs
• 1% of jobs in 2010
• 4% of jobs in 2011
• 19% of jobs in 2012
• 23% of jobs in 2013
[Chart: job trends for "Data Mining", "Big Data", "Data Scientist", and "Statistician"]
“Data mining” jobs are more common, but “Big Data” jobs are surging much faster than “Data Scientist”.
“Statistician” jobs are steady, but not growing.
• Big Data can produce better predictions, but expect limited improvement
• Example: the Netflix Prize took 3 years to reduce the prediction error (RMSE) for movie ratings from 0.95 stars to 0.86
• Inherent randomness in human behavior
• Data Science should help separate hype from reality
• Biggest effects from Big Data are from new platforms, like Google, Facebook, LinkedIn; Personalized medicine
• However, Big Data makes privacy online almost impossible
Gregory Piatetsky-Shapiro, Big Data Hype and Reality, Harvard Business Review blog, Oct 2012
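To put the Netflix Prize figures above in perspective, here is a small sketch. It assumes that 0.95 and 0.86 are root-mean-square errors (RMSE) in stars, the metric the contest used, and it uses the rounded values from the slide. The toy ratings are made up purely to show how RMSE is computed.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))

def relative_improvement(baseline_error, improved_error):
    """Fraction by which the error shrank relative to the baseline."""
    return (baseline_error - improved_error) / baseline_error

# Hypothetical star ratings, only to illustrate the metric.
actual_ratings = [4, 3, 5, 2, 1]
predicted_ratings = [3.5, 3.0, 4.0, 2.5, 2.0]
toy_error = rmse(predicted_ratings, actual_ratings)

# Rounded figures from the slide: 0.95 stars before, 0.86 after.
gain = relative_improvement(0.95, 0.86)

print(f"toy RMSE: {toy_error:.3f} stars")
print(f"relative improvement: {gain:.1%}")
```

With these rounded inputs the error shrinks by a little under a tenth, which is the sense in which the improvement was limited.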
Gartner Hype Cycle
Big Data
Gartner VP says Big Data is Falling into the Trough of Disillusionment, Jan 2013
Q&A
Gregory Piatetsky, Editor, KDnuggets; co-founder of KDD and ACM SIGKDD (@kdnuggets)
David Smith, Data Scientist, Revolution Analytics (@revodavid)
Carla Gentry, Data Scientist, Analytical Solution (@data_nerd)
Darren Peirce, CTO, Kalido (@DarrenPeirce)
Eric Kavanagh, DM Radio Host, Information Management Magazine’s DM Radio (@eric_kavanagh)
Summer Sessions: Two Tracks For YOU

Agile Information Foundation for the Data Scientist
Series Kickoff
May 14: Data Scientist: Your must-have business investment now.
(30 Minute Learning Sessions)
May 28: Rapid Data Integration tools and methods
June 4: Harmonizing Data for the Warehouse
June 11: Rapid Iteration Methodology Using Information Models

TCO: Find data warehouse costs before they find you.
Series Kickoff
June 25: Find your data warehouse’s hidden costs before they find you.
(30 Minute Learning Sessions)
July 2: The real cost per release cycle
July 9: Automate to reduce operating costs
July 16: Reduce tool cost
July 23: Scale drives cost reductions

Visit get.kalido.com/summer-series to register

Powerful Google developer tools for immediate impact! (2023-24 C)
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Data Scientists: Your Must-Have Business Investment

  • 1. 1 May 14, 2013© Kalido I Kalido Confidential May 14, 2013 Data Scientist: Your Must-Have Business Investment NOW
  • 2. Today’s Speakers #DataScienceNow: Gregory Piatetsky (Editor, KDnuggets; co-founder, KDD and ACM SIGKDD); David Smith (Data Scientist, Revolution Analytics); Carla Gentry (Data Scientist, Analytical Solution); Darren Peirce (CTO, Kalido); Eric Kavanagh (DM Radio Host, Information Management Magazine’s DM Radio)
  • 3. Revolution Confidential 3 © Dov Harrington, CC By-2.0 http://www.flickr.com/photos/idovermani/4110546683/
  • 4. Revolution Confidential
       (row) | Statistician | Data Scientist
       Image | Baseball (Cricket) | HBR Sexiest Job of 21st Century
       Mode | Reactive | Consultative
       Works | Solo | In a team
       Inputs | Data File, Hypothesis | A Business Problem
       Data | Pre-prepared, clean | Distributed, messy, unstructured
       Data Size | Kilobytes | Gigabytes
       Tools | SAS, Mainframe | R, Python, awk, Hadoop, Linux, …
       Nouns | Tables | Data Visualizations
       Focus | Inference (why) | Prediction (what)
       Output | Report | Data App / Data Product
       Latency | Weeks | Seconds
       Stars | G.E.P. Box, Trevor Hastie | Hilary Mason, Nate Silver
       Source: http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
  • 5. Revolution Confidential Statistician vs. Data Scientist comparison, repeated from slide 4 without the HBR link
  • 6. Revolution Confidential Three Essential Skills of Data Scientists (diagram). Source: Drew Conway, http://www.dataists.com/2010/09/the-data-science-venn-diagram/ Diagram labels: Data Integration, Mashups, Applications, Models, Visualization, Predictions, Uncertainty, Problems, Data Sources, Credibility, Effective Data Applications
  • 8. Revolution Confidential
       (row) | Business Intelligence | Data Science
       Perspective | Looking backwards | Looking forwards
       Actions | Slice and Dice | Interact
       Expertise | Business User | Data Scientist
       Data | Warehoused, Siloed | Distributed, real-time
       Scope | Unlimited | Specific business question
       Questions | What happened? | What will happen? What if?
       Output | Table | Answer
       Applicability | Historic, possible confounding factors | Future, correcting for influences
       Tools (column split not recoverable) | SAP, Cognos, Microstrategy, SAS Revolution R Enterprise QlikView, Tableau, Jaspersoft
       Hot or not? | So 1997 | Transformational
  • 9. What is Data Science? By Carla Gentry Data Scientist Analytical-Solution
  • 10. Data Science is… • The term "data science" has existed for over thirty years. It was first mentioned by Peter Naur in 1960, but it has gained a lot of attention more recently!
  • 11. Data Science can be broken down into 4 main areas of expertise. • Data knowledge – design & structure • Programming – SAS, R, SQL, NoSQL • Analytics – Insight • Communication – Tell the story
  • 12. Data Knowledge: Part analyst - part IT • What kind of servers do you own? - Servers vs. Mainframe • What kind of load can the server handle? - Iterations matter – Why ask this?
  • 13. Programming – Pick a language and use it wisely • Efficiency is KING! - Why? • Number of iterations & complex algorithms or scripts. Snowflake vs. star schema? - Design is important, but why? • Key things: normalize, index. There is more to Data Science than just analytics.
  • 14. How can I learn about Data Science? • For those who want to invest their time and talent there are resources. • College Courses • Online • Webinars • Blogs
  • 15. Data Science and Data Scientists Now Gregory Piatetsky, @kdnuggets Analytics, Big Data, Data Mining, and Data Science Resources 15© KDnuggets 2013
  • 16. Same Core Idea: Finding Useful Patterns in Data. Different Emphasis. • Statistics, 1830- • Data mining, 1980- • Knowledge Discovery in Data (KDD), 1989- • Business Analytics, 1997- • Predictive Analytics, 2002- • Data Analytics, 2011- • Data Science, 2011- • …? Trends from Google Ngrams (1800-2008) and Google Trends (2005-2013)
  • 17. Big Data > Data Mining > Business Analytics > Predictive Analytics > Data Science (chart: Google Trends search, Jan 2008 - Apr 2013, Worldwide; labeled series: Big Data, Data mining)
  • 18. Data Scientist – sexiest job of the 21st Century (???) say Thomas H. Davenport and D.J. Patil (HBR, Oct 2012). “Data Scientist” is the fastest growing term on www.kdnuggets.com/jobs: 1% of jobs in 2010, 4% of jobs in 2011, 19% of jobs in 2012, 23% of jobs in 2013
  • 19. “Data mining” jobs are more common, but “Big Data” jobs are surging much faster than “Data Scientist”. “Statistician” jobs are steady, but not growing. (Chart series: Data Mining, Big Data, Data Scientist, Statistician)
  • 20. • Big Data can produce better predictions, but expect limited improvement • Example: the Netflix Prize took 3 years to improve the prediction error (RMSE) for movie ratings from 0.95 stars to 0.86 • Inherent randomness in human behavior • Data Science should help separate hype from reality • Biggest effects from Big Data are from new platforms, like Google, Facebook, LinkedIn; personalized medicine • However, Big Data makes privacy online almost impossible. Source: Gregory Piatetsky-Shapiro, Big Data Hype and Reality, Harvard Business Review blog, Oct 2012
  • 21. Gartner Hype Cycle: Big Data. Gartner VP says Big Data is Falling into the Trough of Disillusionment, Jan 2013
  • 22. Q&A: Gregory Piatetsky (Editor, KDnuggets; co-founder, KDD and ACM SIGKDD; @kdnuggets); David Smith (Data Scientist, Revolution Analytics; @revodavid); Carla Gentry (Data Scientist, Analytical Solution; @data_nerd); Darren Peirce (CTO, Kalido; @DarrenPeirce); Eric Kavanagh (DM Radio Host, Information Management Magazine’s DM Radio; @eric_kavanagh)
  • 23. Summer Sessions: Two Tracks For YOU
       Track 1, Agile Information Foundation for the Data Scientist. Series Kickoff May 14: Data Scientist: Your must-have business investment now. (30 Minute Learning Sessions) May 28: Rapid Data Integration tools and methods; June 4: Harmonizing Data for the Warehouse; June 11: Rapid Iteration Methodology Using Information Models
       Track 2, TCO: Find data warehouse costs before they find you. Series Kickoff June 25: Find your data warehouse’s hidden costs before they find you. (30 Minute Learning Sessions) July 2: The real cost per release cycle; July 9: Automate to reduce operating costs; July 16: Reduce tool cost; July 23: Scale drives cost reductions
       Visit get.kalido.com/summer-series to register
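
The Netflix Prize figures on slide 20 can be put in relative terms. The following is a minimal sketch in Python, assuming the 0.95 and 0.86 values are root-mean-square errors (RMSE) measured in stars. The function names and the sample ratings are hypothetical and serve only as illustration.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(old_error, new_error):
    """Fraction by which the error shrank, relative to the old error."""
    return (old_error - new_error) / old_error


# Hypothetical ratings, only to show how an error "in stars" is computed.
print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))

# Figures quoted on the slide: 0.95 -> 0.86 stars.
print(f"{relative_improvement(0.95, 0.86):.1%}")
```

Under that assumption the three-year effort amounts to a relative improvement of about 9.5%, which is the "limited improvement" the slide warns about.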

Editor's Notes

  1. Most data scientists wear hipster glasses and T-shirts with ironic, geeky quotes.
  2. http://www.hilarymason.com/media_and_press/im-in-glamour-magazine/
  3. http://www.hilarymason.com/media_and_press/im-in-glamour-magazine/ Ivan Fellegi, Chief Statistician of Canada and SSC President for 1981: http://www.flickr.com/photos/ssc_liaison/431047111/
  4. Churn: the best algorithms for predicting churn have a lift of 5-7, i.e. 5-7 times better than random. Behavioral advertising: 2-3% CTR, 10 times better than random.
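
The lift figure in note 4 is a ratio of response rates. The following is a minimal sketch, assuming lift is defined as the churn rate among the customers a model flags divided by the churn rate across the whole customer base. The function name and all counts are hypothetical.

```python
def lift(hits_flagged, n_flagged, hits_all, n_all):
    """Rate among flagged customers divided by the overall (random) rate."""
    if min(n_flagged, n_all, hits_all) <= 0:
        raise ValueError("counts must be positive")
    # Cross-multiplied form of (hits_flagged / n_flagged) / (hits_all / n_all).
    return (hits_flagged * n_all) / (n_flagged * hits_all)


# Hypothetical: 10,000 customers, 200 of whom churn overall (2%).
# The model flags 1,000 customers, 120 of whom churn (12%).
print(lift(120, 1000, 200, 10000))  # 6.0
```

A lift of 6 falls inside the 5-7 range quoted for churn models. Flagging 1,000 customers at random would be expected to give a lift of about 1.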