Grab some coffee and enjoy the pre-show banter before the top of the hour!
HOT Technologies of 2014
HOST: Eric Kavanagh
THIS YEAR is…
Data Science
• Considered a highly specialized field
• Perceived as an expensive position to fill given the required skill set
• Typically involves, among other things, data preparation for advanced analytics
THE LINE UP
ANALYST: John Myers, Research Director, Enterprise Management Associates
ANALYST: Robin Bloor, Chief Analyst, The Bloor Group
GUEST: Chuck Yarbrough, Director of Big Data Product Marketing, Pentaho
GUEST: Mark Kromer, Big Data Analytics Product Manager, Pentaho
INTRODUCING John Myers
Today’s Presenters 
John Myers, Research Director, EMA 
John has over 10 years of experience working in areas related to business analytics 
in professional services consulting and product development roles. Additionally, John 
helps organizations solve their business analytics problems, whether they relate to 
operational platforms – such as customer care or billing – or applied analytical 
applications – such as revenue assurance or fraud management. 
Slide 8 © 2014 Enterprise Management Associates, Inc.
How are companies using Data Science? 
Data Science Defined 
Data Science is the study of the generalizable extraction 
of business or domain knowledge from data. It 
incorporates varying elements and builds on techniques and 
theories from many fields, including signal processing, 
mathematics, probability models, machine learning, 
statistical learning, computer programming, data engineering, 
pattern recognition and learning, visualization, 
uncertainty modeling, data warehousing, and high 
performance computing. Data Science is not
restricted to Big Data, although the fact that data is
increasing in load, complexity and structure
makes Big Data an important aspect of Data Science.
Vision of a “Data Scientist” 
Few and far between… 
Who’s really performing Data Science… 
Many more Business Analysts… 
EMA Hybrid Data Ecosystem 
Empowering Data Scientists AND Business 
Analysts to perform Data Science 
INTRODUCING Robin Bloor
The Data Science Dance
Robin Bloor, Ph.D.
Take Note!
You can know more about a business from its data than by any other means
The Driving Force of Insight
[Diagram labels: Hindsight, Oversight, Foresight, INSIGHT, and OPTIMIZATION?]
What is a Data Scientist?
• Project manager
• Qualified statistician
• Domain Business expert
• Experienced data architect
• Software engineer
(IT’S A TEAM)
A Process, Not an Activity
• Data Analytics is a multi-disciplinary end-to-end process
• Until recently it was a walled garden. But the walls were torn down by:
  • Data availability
  • Scalable technology
  • Open source tools
The Impact of Machine Learning
Machine learning and processing power (parallelism) will CHANGE the data analysis process
Machine learning AUTOMATES “data science” to some degree
The Data Analysis Budget
• Data Analysis is BUSINESS R&D
• The focus is on business process
• The outcome of successful R&D is a CHANGED PROCESS
• Think of manufacturing for a useful example
INTRODUCING Chuck Yarbrough & Mark Kromer
DATA SCIENCE PACK 
© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide 26 +1 (866) 660-7555 
CHUCK YARBROUGH 
DIRECTOR, BIG DATA PRODUCT MARKETING 
@CYARBROUGH 
MARK KROMER 
BIG DATA ANALYTICS PRODUCT MANAGER 
@KROMERBIGDATA 
JUNE 18, 2014 
Pentaho’s Hot Topic
The strength of Pentaho lies in the power of combination
• Data integration + Business analytics
• Big data + Any data
• The IT department + Lines of business
Any data. Any environment. Any analytics.
OUR VISION
The New Reality: Powerful yet simplified analytics for all users
[Diagram labels: Billing, Social Media, Location, Customer, Web, Network, Analytics]
ANY Analytics
• Reports
• Dashboards
• Visualizations
• Discovery
• Predictive
• Any role
Existing & New Data Infrastructure & Processes
ANY Environment
• Data warehouses
• Data marts
• Stack vendors
• Cloud
• Embedded
ANY Data
• Relational
• Operational
• Big Data
• Data sources not yet anticipated
Pentaho 5.0 Architected for the Future
Simplified analytics experience for all users
• Simplified Analytics Experience
• Blended Big Data
• Enterprise Big Data Integration
A Spectrum of Big Data Use Cases
WHAT THE MARKET IS DEPLOYING TODAY AND PLANNING FOR TOMORROW
[Chart: use cases plotted by Use Case Complexity (Entry to Advanced) and Business Impact (Optimize to Transform)]
• Data Warehouse Optimization
• Streamlined Data Refinery
• Big Data Exploration
• Customer 360 Degree View
• Harnessing Machine & Sensor Data
• Next Generation Applications
• Internal Big Data as a Service
• On-Demand Big Data Blending
• Big Data Predictive Analytics
• Monetize My Data
Pentaho Data Science Pack 
OPERATIONALIZE R AND WEKA, OFFLOAD DATA PREPARATION 
• Allow Data Scientists to focus on analysis 
• Use familiar tools (R, Weka) 
• Leverage a graphical ETL tool to manage data preparation
• Blend Big Data Sources Easily 
• Provide access to data with governance 
• Operationalize the analytic workflow 
• Enable IT to partner with Data Scientists 
What’s in the Pack?
TOOLS FAMILIAR TO THE DATA SCIENTIST
• R SCRIPT EXECUTOR
  • Provides access to 5,500+ advanced algorithms
• WEKA FORECASTING
  • Machine learning, time series analysis
• WEKA SCORING
  • Calculates probability values for better predictions
[Diagram labels: Analytic Data Flows, PDI, R/Weka]
LEVERAGING THE DATA SCIENCE PACK
Providing a more complete view for customers
“…we are now helping clients blend a 360-degree view of all equipment data sources for early prediction of potential machinery failure.”
“There was a gap in the market … people like myself were piecing together solutions to help with the data preparation, cleansing and orchestration of analytic data sets. The Pentaho Data Science Pack fills that gap to operationalize the data integration process for advanced and predictive analytics.”
Ken Krooner, President at ESRG
“USING WEKA WITH PDI, WE ARE NOW HELPING CLIENTS HAVE A 360-DEGREE VIEW OF ALL EQUIPMENT DATA SOURCES TO ENABLE EARLY PREDICTION OF POTENTIAL MACHINERY FAILURE.”
• Provide remote and onboard analytics for maritime fleets and ships
• Weka with PDI, to help clients blend a 360-degree view
Diagram labels:
• Fleet Data via Satellite
• Local Machine and Server Data
• Cross Department Operations Data
• PDI, Data Marts, Business Analytics Server
• Business User (COO): Reporting on Operations and Efficiency
• End Users: Dashboards and Reports on Machine Performance
• Data Scientist: Data Mining and Predictive
• Data Governance
Predictive View of the Customer
LEVERAGE BLENDED BIG DATA & DATA SCIENCE TO SEIZE OPPORTUNITIES
Key Considerations
• Requires data scientists and PhDs - expensive resources
• Data prep for predictive modeling can be labor-intensive
• Tech fit: Various data stores, Distributed Weka, Enterprise R
What is it?
• Brings multi-source data together for an on-demand analytic view across customer touch points
• Applies predictive models to data as part of the integration process, to optimize customer-facing decisions
Why Do It?
• Recommend profitable decisions for front line teams
• Automate and scale optimal customer interactions
• Boost upsell, reduce churn
Thank You 
JOIN THE CONVERSATION. YOU CAN FIND US ON: 
blog.pentaho.com 
@Pentaho 
Facebook.com/Pentaho 
Pentaho Business Analytics
THANK YOU!
The Archive Trifecta:
• Inside Analysis: www.insideanalysis.com
• SlideShare: www.slideshare.net/InsideAnalysis
• YouTube: www.youtube.com/user/BloorGroup

The Ultimate Toolkit – Equipping the Data Scientist
