Making Sense out of Big Data
Peter Morgan - July 2013
Table of Contents
1. Definition and Overview
2. Data Sources
3. Databases
4. Big Data Analytics
Glossary
References
1. Definition and Overview
What is big data?
More and more data is being collected and stored each day
Four main components
• Data
– Structured and unstructured
• Databases
– Proprietary and open source
• Query language
– Querying the database
• Analytics
– Analysing the data
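The four components can be seen working together in a very small sketch. It is only an illustration: the sensor readings, table name and column names are invented for this example, and SQLite (from the Python standard library) stands in for whichever database is actually used.

```python
import sqlite3
from statistics import mean

# 1. Data: a few structured records (hypothetical sensor readings).
readings = [("london", 18.5), ("london", 21.0), ("paris", 24.5), ("paris", 23.5)]

# 2. Database: an in-memory SQLite database (open source, relational).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (city TEXT, temperature REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", readings)

# 3. Query language: SQL selects the rows of interest.
rows = conn.execute(
    "SELECT temperature FROM readings WHERE city = ? ORDER BY temperature",
    ("london",),
).fetchall()

# 4. Analytics: compute a summary statistic from the query result.
london_mean = mean(t for (t,) in rows)
print(f"Mean London temperature: {london_mean:.2f}")
conn.close()
```

At big data scale each step would typically be handled by a separate distributed system, but the division of labour stays the same.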
How big is big?
• Large data sets
– Greater than 1,000 Terabytes? (1 Petabyte)
– 1,000,000 Terabytes? (1 Exabyte)
• An Excel 2013 worksheet can have 1,048,576 rows by 16,384 columns
– About 17 billion cells, so on the order of 10 Gigabytes of data
• Only going to get bigger
– 90% of all data was produced in the past two years!
– Rate is increasing
• Recall
– Giga = 10⁹
– Tera = 10¹²
– Peta = 10¹⁵
– Exa = 10¹⁸
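The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses the decimal prefixes and the Excel 2013 limits quoted above; the one-byte-per-cell figure is an assumption made purely for illustration, since real cells vary in size.

```python
# Decimal (SI) prefixes, as listed on the slide.
PREFIXES = {"Giga": 10**9, "Tera": 10**12, "Peta": 10**15, "Exa": 10**18}

# Excel 2013 worksheet limits quoted on the slide.
ROWS, COLUMNS = 1_048_576, 16_384

# Assumption (not from the slides): one byte per cell.
BYTES_PER_CELL = 1


def excel_sheet_bytes(bytes_per_cell: int = BYTES_PER_CELL) -> int:
    """Return the size of a completely full worksheet in bytes."""
    return ROWS * COLUMNS * bytes_per_cell


def in_units(n_bytes: int, prefix: str) -> float:
    """Express a byte count in the given decimal unit."""
    return n_bytes / PREFIXES[prefix]


if __name__ == "__main__":
    size = excel_sheet_bytes()
    print(f"Cells in a full sheet: {ROWS * COLUMNS:,}")
    print(f"Size at 1 byte per cell: {in_units(size, 'Giga'):.1f} GB")
    print(f"Terabytes in one Petabyte: {PREFIXES['Peta'] // PREFIXES['Tera']:,}")
    print(f"Terabytes in one Exabyte: {PREFIXES['Exa'] // PREFIXES['Tera']:,}")
```

Even a completely full spreadsheet is therefore several orders of magnitude short of a Petabyte.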
Big Data Evolution
2. Data Sources
Where does the data come from?
• Science – particle, astrophysics
• Industry – oil, finance, telecom
– Actually all verticals
• Social – Facebook, LinkedIn, Twitter
• Medicine – genome, neuroscience
• Government – census, education, police
• Sports – statistics
• Environment – weather, sensors
Unstructured Data
• 80% of data is unstructured
• NoSQL
• Document based
– Documents
– Texts, tweets
– Emails
– Machine logs
– Blogs
– Web pages
– Photos
– Videos (YouTube)
• Graph based
– Social media sites
– Facebook has 1.1 billion users (MicroStrategy, July 27, 2013)
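The difference between document-based and graph-based data can be sketched with plain Python structures. The records, user names and the `friends_of_friends` helper below are all hypothetical; a real system would use a document store or graph database rather than in-memory dictionaries.

```python
import json

# Document-based: each record is self-describing and fields may vary.
tweet = {"user": "alice", "text": "Big data is big", "tags": ["bigdata"]}
email = {"from": "bob", "to": ["alice"], "subject": "Logs", "body": "See attached."}
documents = [json.dumps(tweet), json.dumps(email)]

# Records without a given field are simply skipped; there is no fixed schema.
tagged = [d for d in map(json.loads, documents) if "tags" in d]

# Graph-based: users are nodes and friendships are edges (adjacency sets).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice", "dave"},
    "dave": {"carol"},
}


def friends_of_friends(graph: dict[str, set[str]], user: str) -> set[str]:
    """Return users two hops away who are not already direct friends."""
    direct = graph[user]
    two_hops = set().union(*(graph[f] for f in direct))
    return two_hops - direct - {user}


print(len(tagged), sorted(friends_of_friends(friends, "alice")))
```

Queries such as "friends of friends" are natural on a graph but awkward to express over rows and columns, which is one reason social media sites favour graph models.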
Why do we need to use big data?
Use in public and private sector to:
• Make faster and more accurate business decisions
• Make accurate predictions
• Gain competitive advantage
• Implement smarter marketing – CRM
• Discover new opportunities
• Enhance Business Intelligence
• Enable fraud detection
• Reduce crime
• Improve scientific research
• Quicken analysis (up to real time)
– Weeks, days → minutes, seconds
Big Data Startup - Case Study
• Rocket Fuel
• No. 4 on Forbes' 2013 Most Promising Companies In America list
• Digital advertising startup
• Screens over 26 billion ads per day
• “Advertising that learns” big data platform
• Distributed planet-scale computing engine
• Hadoop implementation
• Founders from Yahoo!, Salesforce.com, DoubleClick
• Targeting algorithms use lifestyle, purchase intent and social data
Some big statistics
3. Databases
Database Timeline
Relational databases – SQL
Proprietary
• Oracle DB
• IBM DB2
• Microsoft SQL Server
• SAP
• EMC
Open Source
• MySQL
• PostgreSQL
• Drizzle
• Firebird
Non-relational databases – NoSQL
• BigTable – Google
• Cassandra – Facebook
• DynamoDB – Amazon
• HBase – Hadoop
• MongoDB – 10gen
• Neo4j – Neo Technology
• CouchDB – Apache
• Couchbase
• Riak – Basho
• Redis – Pivotal
4. Big Data Analytics
Big Data Analytics - Incumbents
• Oracle – Exadata, Exalytics
• Microsoft – HDInsight, xVelocity
• IBM – Netezza, Cognos, BigInsights
• SAP – HANA, Business Objects
• EMC – Pivotal (Greenplum)
• HP – Vertica, HAVEn
• All offer Hadoop integration
Big Data Analytics – Pure Plays
• Pure plays – definition:
– Been around more than 20 years
– Purely data analytics companies
• Teradata – Aster
• SAS
• MicroStrategy
Big Data Analytics – New Entrants
• Hortonworks
• Cloudera
• MapR
• Acunu
• Pentaho
• Tableau
• Talend
• Splunk
(Some of) IBM’s Big Data Acquisitions
• Algorithmics
– Oct 2011, $400 million
• OpenPages
– Oct 2010, ?
• Netezza
– Sept 2010, $1.7 billion
• SPSS
– Jan 2010, $1.2 billion
• Cognos
– Jan 2008, $4.9 billion
• About $10 billion in four years
http://en.wikipedia.org/wiki/List_of_mergers_and_acquisitions_by_IBM
Big Data Science Tools
• Hadoop
• NoSQL
• MapReduce
• R
• Matlab
• Python
• Statistics
Big Data Hadoop Stack
• Hadoop is the de facto big data operating system
• Based on designs published by Google (MapReduce, GFS); started in 2005 and developed largely at Yahoo!
• It is distributed, open source and managed by the Apache Software Foundation
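The MapReduce programming model at the heart of Hadoop can be illustrated with the classic word count. The sketch below runs in a single Python process and does not use any Hadoop API; the function names are chosen for this example only. On a real cluster the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["big data is big", "data is everywhere"]))
```

Because each line is mapped independently and each key is reduced independently, the work can be split across machines without changing the result.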
Analytic Technologies
• A/B testing
• Genetic algorithms
• Machine learning
• Natural language processing
• Neural networks
• Pattern recognition
• Anomaly detection
• Decision trees
• Predictive modeling
• Regression analysis
• Sentiment analysis
• Signal processing
• Simulations
• Time series analysis
• Visualization
• Multivariate analysis
• Text analytics
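As one concrete instance from the list above, anomaly detection can be sketched with a simple z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. This is only one of many possible techniques, and the threshold of 2.0 and the sample response times are assumptions made for illustration.

```python
from statistics import mean, pstdev


def find_anomalies(values: list[float], threshold: float = 2.0) -> list[float]:
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical response times in milliseconds, with one obvious outlier.
response_times = [102, 98, 101, 99, 100, 103, 97, 250]
print(find_anomalies(response_times))
```

A z-score rule is easy to explain but is itself distorted by large outliers, so production systems often prefer more robust statistics or learned models.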
Glossary
• OLTP = Online Transaction Processing
• OLAP = Online Analytical Processing
• ODBC = Open Database Connectivity
• IMDB = In-Memory Database
• CRUD = Create, Read, Update, Delete
• ETL = Extract, Transform and Load
• CDO = Chief Data Officer
• NLP = Natural Language Processing
• GQL = Graph Query Language
• AaaS = Analytics as a Service
• EDW = Enterprise Data Warehouse
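Of the glossary terms, ETL is perhaps the easiest to show in code. The sketch below is a toy pipeline under stated assumptions: the CSV text, the `sales` table and the cleaning rules are all invented for this example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or API.
RAW_CSV = """name,amount
 Alice ,10.50
BOB,3
carol,not-a-number
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse the raw CSV into rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, float]]:
    """Transform: normalise names and drop rows whose amount is not numeric."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        clean.append((row["name"].strip().title(), amount))
    return clean


def load(rows: list[tuple[str, float]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall())
```

Keeping the three stages as separate functions makes each one easy to test and to replace, for example by swapping the CSV source for a log file.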
References
• MicroStrategy website, 27 July 2013: Michael Saylor presentation at MicroStrategy World 2013, http://www.microstrategy.com/
• Teradata website, www.teradata.com
• Wikipedia, http://en.wikipedia.org/wiki/
• Google Images, www.google.co.uk
• IBM website, www.ibm.com
• YouTube, www.youtube.com
• Hadoop, www.hortonworks.com
Any Questions?
