NICK HALSTEAD, FOUNDER
DATASIFT, @NIK
Big Data
“Myths and Legends”
#BDW13
Thursday, 25 April 13
#BDW13
BIG DATA + SOCIAL DATA
TV MONITORING, POLITICAL TRACKING, FINANCIAL FEEDS
1.5 BILLION ITEMS PER DAY
1.5 PETABYTES OF STORAGE
5000 CPU HADOOP CLUSTER
#DATASIFT
BIG DATA PERCEPTION
#GOOGLE
I THOUGHT I WOULD ASK GOOGLE....
BIG DATA VENDOR “MYTHS”
1. YOU MUST BUY ALL OF THIS (for one job!)
#BDW13
2. HOW BIG IS “BIG”
20 PETABYTES IN EACH SEARCH INDEX REBUILD (this was 2 years ago)
900,000 SERVERS
#BDW13
#BDW13
3.2 BILLION LIKES AND COMMENTS PER DAY
OVER HALF A PETABYTE … EVERY 24 HOURS
150 MILLION SENSORS DELIVERING DATA 40 MILLION TIMES PER SECOND
TENS OF PETABYTES PER YEAR
#BDW13 #HADRON
A TYPICAL COMPANY
100 EMPLOYEES
10,000 CUSTOMERS
1 MILLION TRANSACTION RECORDS
5,000 BYTES PER TRANSACTION
25 DATABASES (customers, transactions, etc.)
= 4 GIGABYTES (for largest database)
= 20 GIGABYTES (for ALL company data)
A TYPICAL HARD DRIVE
2000 GIGABYTES (2 TB)
4000 GIGABYTES (4 TB)
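The arithmetic behind the typical-company and hard-drive slides can be checked with a short sketch. The inputs are the figures from the slides; the variable names are illustrative and not from the talk. The result depends on whether a gigabyte is taken as 10^9 or 2^30 bytes, and the slides' 4 GB and 20 GB figures appear to be rounded estimates.

```python
# Back-of-envelope check of the "typical company" numbers from the slides.
# Variable names are illustrative assumptions; the inputs come from the slides.

TRANSACTIONS = 1_000_000          # 1 million transaction records
BYTES_PER_TRANSACTION = 5_000     # 5,000 bytes per transaction
ALL_COMPANY_DATA_GB = 20          # the slide's estimate for ALL company data
DRIVE_SIZES_GB = (2_000, 4_000)   # 2 TB and 4 TB hard drives

largest_db_bytes = TRANSACTIONS * BYTES_PER_TRANSACTION
largest_db_gb = largest_db_bytes / 10**9    # decimal gigabytes
largest_db_gib = largest_db_bytes / 2**30   # binary gibibytes

print(f"Largest database: {largest_db_gb:.1f} GB ({largest_db_gib:.2f} GiB)")

# How much of a single drive the whole company's data would occupy.
drive_shares = {gb: ALL_COMPANY_DATA_GB / gb for gb in DRIVE_SIZES_GB}
for drive_gb, share in drive_shares.items():
    print(f"All company data fills {share:.1%} of a {drive_gb} GB drive")
```

On either reading of "gigabyte", the whole data set occupies about one percent or less of a single commodity drive.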
3. YOU NEED *LOTS* OF DATA SCIENTISTS
#DILBERT #BDW13
4. HOW BIG DATA IS USED
#BDW13
BANKING
COMMUNICATIONS
GOVERNMENT
4. HOW BIG DATA IS USED
#BDW13
WEB LOGS 51%
CLICK STREAM 35%
5. HADOOP GONE BAD
+
SQL
#BDW13 #HADOOPGONEBAD
RDBMS - RELATIONAL DATABASE
NEEDS TO BE PRE-DEFINED
REQUIRES INDEX TO PERFORM
QUERIES ARE CONSTRAINED
#BDW13
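A minimal sketch of the three relational-database points, using Python's built-in sqlite3 module as a stand-in for any relational database. The table, column, and index names are hypothetical, and the exact query-plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. The schema has to be defined before any data can be stored.
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

def query_plan(sql):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text

lookup = "SELECT amount FROM transactions WHERE customer_id = 42"

# 2. Without an index the lookup scans the whole table;
#    once the index exists, the planner can search it instead.
plan_before = query_plan(lookup)
conn.execute("CREATE INDEX idx_customer ON transactions (customer_id)")
plan_after = query_plan(lookup)

# 3. Queries are constrained to the columns the schema declares.
try:
    conn.execute("SELECT region FROM transactions")
    undefined_column_rejected = False
except sqlite3.OperationalError:
    undefined_column_rejected = True

print("before index:", plan_before)
print("after index: ", plan_after)
print("undefined column rejected:", undefined_column_rejected)
```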
MAP REDUCE
PROCESS CLOSE TO THE DATA
PARALLEL EXECUTION
ANY TYPE OF ANALYSIS
HIDES DETAILS OF FAULT TOLERANCE, LOCALITY AND LOAD BALANCING
#MAPREDUCE #BDW13
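The map, shuffle, and reduce steps can be illustrated with a toy word count. This is a sketch under stated assumptions, not how Hadoop is implemented: threads on one machine stand in for cluster nodes, the input strings are made up, and the function names are hypothetical. The fault tolerance, data locality, and load balancing that a real framework hides from the programmer are simply absent here.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # Each chunk is mapped independently, so the map calls can run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, chunks))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

# Hypothetical input: each string stands in for a block of data on one node.
chunks = ["big data myths", "big data legends", "data is the new oil"]
counts = word_count(chunks)
print(counts)
```

Only `map_phase` and `reduce_phase` carry the analysis itself; swapping them out changes the job without touching the plumbing, which is the sense in which the model supports any type of analysis.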
BIG DATA SCHEMA #NOSQL
HBASE
COLUMNS FILES
#BDW13
(QUICK ASIDE)
#SIDEBAR
GOOGLE FILE SYSTEM (GFS)
GOOGLE MAPREDUCE (GMR)
GOOGLE STARTED ALL THIS....
GOOGLE DREMEL
INTERACTIVE ANALYSIS
SCALE UP TO 10,000 SERVERS
COLUMN STORAGE
http://bit.ly/mS8QxX
#BDW13
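The idea behind column storage can be sketched in a few lines. This is only an illustration of the general row-versus-column layout, with made-up records and a hypothetical helper name; it does not reproduce Dremel's actual storage format.

```python
# Row-oriented layout: all fields of a record are stored together.
# The records below are invented purely for illustration.
rows = [
    {"user": "alice", "country": "UK", "clicks": 12},
    {"user": "bob", "country": "US", "clicks": 7},
    {"user": "carol", "country": "UK", "clicks": 30},
]

def to_columns(records):
    """Pivot a list of records into one list per field (column-oriented)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns

columns = to_columns(rows)

# An aggregate over one field reads a single column
# instead of touching every field of every record.
total_clicks = sum(columns["clicks"])
print(total_clicks)
```

When a query touches only a few fields of very wide records, reading just those columns is one reason a column layout suits interactive analysis.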
OpenDremel
GOOGLE BIGQUERY
#BDW13
http://research.google.com/archive/spanner.html
GOOGLE SPANNER
#SPANNER #NEWSQL
RELATIONAL DATABASE
GLOBALLY DISTRIBUTED
USES GPS / TRUETIME
NO OPEN SOURCE EQUIVALENT
BIG DATA IS THE NEW OIL
NICK HALSTEAD, FOUNDER
HTTP://DATASIFT.COM
WE ARE HIRING!!
Big Data Week - Myths and Legends