SlideShare a Scribd company logo
1 of 11
Download to read offline
Data Science
Shankar Radhakrishnan
Cognizant
History…
• Questions first, data later	

• Data model first, data processing later	

• Size first, project second, react overtime	

• Focus on accuracy, assume little	

• Importance to completeness and comprehensiveness	

• Expose raw data to decision makers	

• Provide insights but those that are not actionable	

• Bound by constraints (Procurement, Process, Build Insights,
Interaction)
What’s Changed ?
• Medium to participate is vast	

• Mode to reach expanded	

• Data types are vast and voluminous	

• Noise is huge, yet accepted	

• Urgency precedes accuracy	

• Guidance is better than completeness	

• Cost to store and process has fallen (and still falling)	

• More ways and means to process data at scale
Speaking of Data
• Volume - Data at rest	

• Variety - Data in many forms	

• Velocity - Data in motion	

• Veracity - Data in doubt
Data Science
“ Data Science is the art of turning data into actions ”
This is accomplished through creation of data
products, that provide actionable information

without exposing underlying data or analytics
“ Scientific study of the creation, validation
and transformation of data to create meaning ”
http://www.datascienceassn.org/code-of-conduct.html
While we are on definitions…
Data Mining
“ Non-trivial process of identifying valid, novel, potentially
useful and understandable structures or patterns or models or
relationships in data to enable data driven decision making ”
Statistics
“ Science of learning from data or of 

making sense out of data ”
Science of Data Science
• Analyze and understand data that’s available	

• Find and acquire what more is needed	

• Discover what’s not known from data	

• Predict and build “actionable insights” from data	

• Build data products that has “immediate” business impact	

• Make it easy for business to “use”	

• Help decision making to drive “business value”
Data Science Toolkit
Python	

R	

Java	

Textwrangler	

SQL	

C, C++
Mahout	

NLTK	

OpenNLP	

GPText	

SciPy	

Pandas	

scikit-leam
Hadoop	

Hive	

HAWQ	

PL/Python	

PL/R	

PL/Java	

Proprietary
D3.js	

Gephi	

Graphviz	

R	

Tableau	

Proprietary	

Languages Libraries Database Visualization
Approach, Techniques
• Classification	

• Filtering	

• Structure	

• Clustering	

• Disambiguation	

• De-duplication	

• Normalization	

• Correlation	

• Prediction
• Discover	

• Reason	

• Model	

• Deploy	

• Visualize	

• Recommend	

• Predict	

• Explore
• Machine Learning	

• Decision Trees	

• Bayesian Networks	

• Logistic Regression	

• Monte Carlo Methods	

• Component Analysis	

• Fuzzy Modeling	

• Neural Networks	

• Genetic Algorithms
Step Process Technology
Data Science In Action
• Improving User Experience	

• Multi-device event stream analysis	

• Intrusion detection, avoidance	

• Collocation analysis from 

cell-phone towers	

• Text Mining, Bandwidth Throttling
• Network Performance &
Optimization	

• Mobile User Location Analytics	

• Customer Churn Prevention	

• Social Media and Sentiment
Analysis	

• Location Based Initiatives
Thanks !

More Related Content

What's hot (20)

Maximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data PlatformMaximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data Platform
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Big Data’s Big Impact on Businesses
Big Data’s Big Impact on BusinessesBig Data’s Big Impact on Businesses
Big Data’s Big Impact on Businesses
 
Big data 101
Big data 101Big data 101
Big data 101
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
What is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesWhat is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use Cases
 
BIG DATA and USE CASES
BIG DATA and USE CASESBIG DATA and USE CASES
BIG DATA and USE CASES
 
Big data
Big dataBig data
Big data
 
Big data-ppt
Big data-pptBig data-ppt
Big data-ppt
 
Big data analysis
Big data analysisBig data analysis
Big data analysis
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Big data 101
Big data 101Big data 101
Big data 101
 
Big Data Tutorial V4
Big Data Tutorial V4Big Data Tutorial V4
Big Data Tutorial V4
 
Unit i big data introduction
Unit  i big data introductionUnit  i big data introduction
Unit i big data introduction
 
Motivation for big data
Motivation for big dataMotivation for big data
Motivation for big data
 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data Team
 
Bigdata
BigdataBigdata
Bigdata
 

Similar to Data science

The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedDunn Solutions Group
 
intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...jybufgofasfbkpoovh
 
JavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceJavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceMark West
 
Intro to dh data management
Intro to dh data management Intro to dh data management
Intro to dh data management Rachel Di Cresce
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and InnovationCaserta
 
DiscoverText Product Overview
DiscoverText Product OverviewDiscoverText Product Overview
DiscoverText Product OverviewStuart Shulman
 
Pivotal Data Warehouse in the Age of Digital Transformation
Pivotal Data Warehouse in the Age of Digital TransformationPivotal Data Warehouse in the Age of Digital Transformation
Pivotal Data Warehouse in the Age of Digital TransformationVMware Tanzu
 
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...Sri Ambati
 
Guy avoiding-dat apocalypse
Guy avoiding-dat apocalypseGuy avoiding-dat apocalypse
Guy avoiding-dat apocalypseENUG
 
How to find new ways to add value to your audits
How to find new ways to add value to your auditsHow to find new ways to add value to your audits
How to find new ways to add value to your auditsCaseWare IDEA
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceMark West
 
Tour of Big Data
Tour of Big DataTour of Big Data
Tour of Big DataRaymond Yu
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsSri Ambati
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceMark West
 
Trendspotting: Helping you make sense of large information sources
Trendspotting: Helping you make sense of large information sourcesTrendspotting: Helping you make sense of large information sources
Trendspotting: Helping you make sense of large information sourcesMarieke Guy
 
Getting to Grips with Research Data Management
Getting to Grips with Research Data Management Getting to Grips with Research Data Management
Getting to Grips with Research Data Management IzzyChad
 
Supporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data ManagementSupporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data ManagementMarieke Guy
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorialJosh Young
 

Similar to Data science (20)

TOUG Big Data Challenge and Impact
TOUG Big Data Challenge and ImpactTOUG Big Data Challenge and Impact
TOUG Big Data Challenge and Impact
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...
 
JavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceJavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data Science
 
Intro to dh data management
Intro to dh data management Intro to dh data management
Intro to dh data management
 
Lean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science teamLean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science team
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
 
DiscoverText Product Overview
DiscoverText Product OverviewDiscoverText Product Overview
DiscoverText Product Overview
 
Pivotal Data Warehouse in the Age of Digital Transformation
Pivotal Data Warehouse in the Age of Digital TransformationPivotal Data Warehouse in the Age of Digital Transformation
Pivotal Data Warehouse in the Age of Digital Transformation
 
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
Krish Swamy + Balaji Gopalakrishnan, Wells Fargo - Building a World Class Dat...
 
Guy avoiding-dat apocalypse
Guy avoiding-dat apocalypseGuy avoiding-dat apocalypse
Guy avoiding-dat apocalypse
 
How to find new ways to add value to your audits
How to find new ways to add value to your auditsHow to find new ways to add value to your audits
How to find new ways to add value to your audits
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
 
Tour of Big Data
Tour of Big DataTour of Big Data
Tour of Big Data
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
 
Trendspotting: Helping you make sense of large information sources
Trendspotting: Helping you make sense of large information sourcesTrendspotting: Helping you make sense of large information sources
Trendspotting: Helping you make sense of large information sources
 
Getting to Grips with Research Data Management
Getting to Grips with Research Data Management Getting to Grips with Research Data Management
Getting to Grips with Research Data Management
 
Supporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data ManagementSupporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data Management
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial
 

Recently uploaded

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Recently uploaded (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Data science

  • 2. History… • Questions first, data later • Data model first, data processing later • Size first, project second, react overtime • Focus on accuracy, assume little • Importance to completeness and comprehensiveness • Expose raw data to decision makers • Provide insights but those that are not actionable • Bound by constraints (Procurement, Process, Build Insights, Interaction)
  • 3. What’s Changed ? • Medium to participate is vast • Mode to reach expanded • Data types are vast and voluminous • Noise is huge, yet accepted • Urgency precedes accuracy • Guidance is better than completeness • Cost to store and process has fallen (and still falling) • More ways and means to process data at scale
  • 4. Speaking of Data • Volume - Data at rest • Variety - Data in many forms • Velocity - Data in motion • Veracity - Data in doubt
  • 5. Data Science “ Data Science is the art of turning data into actions ” This is accomplished through creation of data products, that provide actionable information
 without exposing underlying data or analytics “ Scientific study of the creation, validation and transformation of data to create meaning ” http://www.datascienceassn.org/code-of-conduct.html
  • 6. While we are on definitions… Data Mining “ Non-trivial process of identifying valid, novel, potentially useful and understandable structures or patterns or models or relationships in data to enable data driven decision making ” Statistics “ Science of learning from data or of 
 making sense out of data ”
  • 7. Science of Data Science • Analyze and understand data that’s available • Find and acquire what more is needed • Discover what’s not known from data • Predict and build “actionable insights” from data • Build data products that has “immediate” business impact • Make it easy for business to “use” • Help decision making to drive “business value”
  • 8. Data Science Toolkit Python R Java Textwrangler SQL C, C++ Mahout NLTK OpenNLP GPText SciPy Pandas scikit-leam Hadoop Hive HAWQ PL/Python PL/R PL/Java Proprietary D3.js Gephi Graphviz R Tableau Proprietary Languages Libraries Database Visualization
  • 9. Approach, Techniques • Classification • Filtering • Structure • Clustering • Disambiguation • De-duplication • Normalization • Correlation • Prediction • Discover • Reason • Model • Deploy • Visualize • Recommend • Predict • Explore • Machine Learning • Decision Trees • Bayesian Networks • Logistic Regression • Monte Carlo Methods • Component Analysis • Fuzzy Modeling • Neural Networks • Genetic Algorithms Step Process Technology
  • 10. Data Science In Action • Improving User Experience • Multi-device event stream analysis • Intrusion detection, avoidance • Collocation analysis from 
 cell-phone towers • Text Mining, Bandwidth Throttling • Network Performance & Optimization • Mobile User Location Analytics • Customer Churn Prevention • Social Media and Sentiment Analysis • Location Based Initiatives