Big data
- Overview -
2016/03/04
Mulodo Vietnam Co., Ltd.
What is “Big data”?
Types
- Science : LHC (Large Hadron Collider)
- Medical : Gene analysis
- Market (IT?) : Business use
History of Data processing
50’s
- “BI : Business Intelligence” (1958)
80’s
- “DSS : Decision support system” (80’s)
- “SQL86” (1986)
- “Knowledge Discovery in Databases” (1989)
- “BI (Redefinition)” (1989)
90’s
- “Data Warehouse” (1990)
- “OLAP: online analytical processing” (1993)
- “Improvement of computing power” (90’s)
- “Price reduction of storage” (90’s)
- “Data Mining” (1996)
History of Data processing
2000’s
- “Spread of the Internet” (00’s)
- “Google: Big data stack 1.0” (00’s)
- “MapReduce framework” (2004)
- “Hadoop project becomes independent from Nutch” (2006)
- “Amazon: S3” (2006)
- “Explosive growth of EC (e-commerce)” (00’s)
2010’s
- “Big data” in ‘The Economist(UK)’ (2010)
- “Google: BigQuery” (2010)
- “fluentd” (2011)
- “Amazon: Redshift” (2012)
- “DMP: data management platform” (10’s)
- “Google: Big data stack 2.0-3.0” (10’s)
- “Apache Crunch, Impala, Presto, ...” (10’s)
80's 90's 00's 10's
Let's look back on the history of Big data
(especially storage and query engines)
SQL(86)
- Easy to use, structured/ruled.
- Independent from storage.
MapReduce, big data stack/GFS
- Data is too huge to handle on a usual RDBMS.
- Uses HUGE data, batch-like processing (for huge logs).
- But: proprietary.
Hadoop, HBase
- Open source products!
- We need source. We love freedom.
Hive, Pig
- E-commerce requires huge data analysis.
- M/R is too heavy to use......
- Hive and Pig: easy to use.
Hive
SQL -> (M/R) -> Result
Pig
Original language (Pig Latin) <=> (M/R)
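A minimal sketch of what "SQL -> (M/R) -> Result" can look like, assuming a query such as `SELECT page, COUNT(*) FROM logs GROUP BY page`. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `count_by_page`) and the sample `LOGS` rows are hypothetical, made up for illustration. This is plain single-process Python, not the job that Hive actually generates.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical sample rows standing in for a "logs" table.
LOGS = [
    {"page": "/home", "user": "a"},
    {"page": "/cart", "user": "b"},
    {"page": "/home", "user": "c"},
]


def map_phase(rows: Iterable[dict]) -> Iterator[tuple[str, int]]:
    """Map: emit one (key, 1) pair per row; the key is the GROUP BY column."""
    for row in rows:
        yield row["page"], 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: collect all values that share the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: aggregate each group; summing the 1s gives COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_page(rows: Iterable[dict]) -> dict[str, int]:
    """Run the three phases in order, like one M/R job."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(LOGS))  # {'/home': 2, '/cart': 1}
```

In a real cluster the map and reduce phases run on many machines and the shuffle moves data over the network, which is part of why M/R feels heavy for small interactive questions.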
Dremel, BigQuery (big data stack/CFS)
- We want to analyze huge data interactively.
- Google announced Dremel for interactive analysis of huge data.
Dremel
1. Divide the SQL query across shards.
2. Process them in parallel.
It's not a wrapper of M/R, but processes SQL in a massively parallel way.
(i.e. a full scan for each query, on thousands of servers, without an index)
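The two steps above can be sketched as a scatter-gather query. This is only an illustration under stated assumptions: threads in one process stand in for the thousands of servers, and `make_shards`, `scan_shard`, `parallel_query` and the `ROWS` data are hypothetical names, not Dremel's or BigQuery's API. The query imitated is roughly `SELECT COUNT(*), SUM(amount) FROM orders WHERE amount >= 50`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "orders" table: amounts 10, 20, ..., 100.
ROWS = [{"order_id": i, "amount": i * 10} for i in range(1, 11)]


def make_shards(rows: list, n_shards: int) -> list:
    """Step 1: split the table so each worker gets its own slice."""
    return [rows[i::n_shards] for i in range(n_shards)]


def scan_shard(shard: list, min_amount: int) -> tuple:
    """Full scan of one shard, no index: filter, then partially aggregate."""
    matched = [row["amount"] for row in shard if row["amount"] >= min_amount]
    return len(matched), sum(matched)


def parallel_query(rows: list, min_amount: int, n_shards: int = 4) -> tuple:
    """Step 2: scan all shards in parallel, then merge the partial results."""
    shards = make_shards(rows, n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(lambda s: scan_shard(s, min_amount), shards))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, total


if __name__ == "__main__":
    print(parallel_query(ROWS, min_amount=50))  # (6, 450)
```

Each worker returns only a small partial result (a count and a sum), so the merge step stays cheap however large the shards are.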
Presto, Impala
- Open source products!
- We need source. We love freedom.
Add social circumstances to this figure.
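Figure: timeline (80's to 10's) with social circumstances added
- Storage and query engines: SQL(86), MapReduce, big data stack/GFS, big data stack/CFS, Hadoop, HDFS, HBase, Hive, Pig, Dremel, BigQuery, Presto, Impala, S3, Redshift
- Business side: BI, DSS, BI (redefinition), DWH, Data Mining, DMP
- Social circumstances: Improvement of computing power, Price reduction of storage, Spread of the Internet, Explosive prosperity of EC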
Many requests
Many solutions...
But you can work out which solution is
better for your project. (I hope)
How to use Big data
A) How to aggregate data?
- Huge amount of data
- Very high-frequency data
B) How to maintain data?
- Data will increase....
- Query engine cost, storage cost.
- Data check cost
C) How to analyze data? (what for?)
- UI / UX
- Understanding of business requirements
How to aggregate data
<Libevent shock>
parallel -> event driven.
* similar to “parallel -> USB”
Fluentd
- Async
- (Pseudo) realtime <-> Periodic batch
Others
- Logstash
- Lambda and Kinesis (AWS)
- ...
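A rough sketch of the "event driven, async, (pseudo) realtime vs. periodic batch" idea, assuming a single event loop that buffers incoming log events and forwards them in small batches. `BufferedForwarder`, `produce`, `periodic_flush` and `run_demo` are hypothetical names for illustration; this is not Fluentd's API or its buffer implementation.

```python
import asyncio
import contextlib


class BufferedForwarder:
    """Collects events and hands them on in batches."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.buffer: list = []
        self.batches: list = []  # stands in for "sent to storage"

    def emit(self, event: str) -> None:
        """Accept one event; flush early once the buffer is full."""
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Forward whatever is buffered as one batch."""
        if self.buffer:
            self.batches.append(self.buffer)
            self.buffer = []


async def produce(fwd: BufferedForwarder, source: str, n_events: int) -> None:
    """One log source; it yields to the event loop after every event."""
    for i in range(n_events):
        fwd.emit(f"{source}:{i}")
        await asyncio.sleep(0)


async def periodic_flush(fwd: BufferedForwarder, interval: float) -> None:
    """Timer-driven flush: a short interval feels close to realtime."""
    while True:
        await asyncio.sleep(interval)
        fwd.flush()


async def run_demo() -> list:
    fwd = BufferedForwarder(batch_size=3)
    flusher = asyncio.create_task(periodic_flush(fwd, interval=0.01))
    # Two sources share one thread; the event loop interleaves them.
    await asyncio.gather(produce(fwd, "web", 4), produce(fwd, "app", 4))
    flusher.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await flusher
    fwd.flush()  # forward the remainder before shutting down
    return fwd.batches


if __name__ == "__main__":
    for batch in asyncio.run(run_demo()):
        print(batch)
```

Raising `interval` and `batch_size` moves this toward a periodic batch; lowering them moves it toward (pseudo) realtime, at the cost of more, smaller writes.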
How to analyze data
UI / UX
<solution set for log monitoring>
* ELK : Logstash + Elasticsearch + Kibana
* Fluentd + Norikra + GrowthForecast
Next :
* Trying some storage options
* Trying to build a system design
* Diving into some solutions
