A rough history of technical issues related to big data.
Mulodo Open Study Group (MOSG) @ Ho Chi Minh City, Vietnam
http://www.meetup.com/Open-Study-Group-Saigon/events/229243903/
3. What is “Big data”?
Types
Science: LHC (Large Hadron Collider)
Medical: Gene analysis
Market (IT?): Business use
5. History of Data processing
50’s
- “BI : Business Intelligence” (1958)
80’s
- “DSS : Decision support system” (80’s)
- “SQL86” (1986)
- “Knowledge Discovery in Databases” (1989)
- “BI (Redefinition)” (1989)
90’s
- “Data Warehouse” (1990)
- “OLAP: online analytical processing” (1993)
- “Improvement of computing power” (90’s)
- “Price reduction of storage” (90’s)
- “Data Mining” (1996)
6. History of Data processing
2000’s
- “Spread of The Internet” (00’s)
- ‘Google: Big data stack 1.0’ (00’s)
- “MapReduce framework” (2004)
- “Independence of Hadoop project from Nutch” (2006)
- “Amazon: S3” (2006)
- “Explosive growth of EC (e-commerce)” (00’s)
2010’s
- “Big data” in ‘The Economist(UK)’ (2010)
- “Google: BigQuery” (2010)
- “fluentd” (2011)
- “Amazon: Redshift” (2012)
- “DMP: data management platform” (10’s)
- “Google: Big data stack 2.0-3.0” (10’s)
- “Apache Crunch, Impala, Presto, ...” (10’s)
7. 80's 90's 00's 10's
Let's look back on the history of Big data
(especially storage and query engines)
8. 80's 90's 00's 10's
[Figure: timeline with SQL(86)]
SQL(86): easy to use, structured/ruled, independent of storage.
9. 80's 90's 00's 10's
[Figure: timeline with SQL(86), MapReduce, big data stack/GFS]
MapReduce, big data stack/GFS:
- Uses HUGE data.
- Batch-like processing (for huge logs).
- But proprietary.
The data was too huge to handle on a usual RDBMS.
10. 80's 90's 00's 10's
[Figure: timeline with SQL(86), MapReduce, big data stack/GFS, Hadoop, HBase]
Hadoop, HBase: open source products!
We need source.
We love freedom.
11. 80's 90's 00's 10's
[Figure: timeline with SQL(86), MapReduce, big data stack/GFS, Hadoop, HBase, Hive, Pig]
Hive, Pig: easy to use.
E-commerce requires huge data analysis.
M/R is too heavy to use...
12. 80's 90's 00's 10's
[Figure: timeline with SQL(86), MapReduce, big data stack/GFS, Hadoop, HBase, Hive, Pig]
Hive
SQL -> (M/R) -> Result
Pig
Original language <=> (M/R)
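To make the "SQL -> (M/R) -> Result" idea concrete, here is a minimal sketch of how a GROUP BY query could be expressed as map, shuffle, and reduce steps. It is not Hive's actual planner; the rows, field names, and function names are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical access-log rows; the field names are assumptions for illustration.
ROWS = [
    {"page": "/home", "bytes": 120},
    {"page": "/cart", "bytes": 300},
    {"page": "/home", "bytes": 80},
]


def map_phase(rows):
    """Map: emit one (key, value) pair per input row."""
    for row in rows:
        yield row["page"], row["bytes"]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group of values into a single aggregate."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(rows):
    # Roughly: SELECT page, SUM(bytes) FROM logs GROUP BY page
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))
```

The user only writes the SQL in the comment; the three phases are what a translation layer would generate and run as a batch job.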
13. 80's 90's 00's 10's
[Figure: timeline with SQL(86), MapReduce, big data stack/GFS, big data stack/CFS, Hadoop, HBase, Hive, Pig, Dremel, BigQuery]
Google announced Dremel for interactive analysis of huge data.
We want to analyze huge data interactively.
14. 80's 90's 00's 10's
[Figure: timeline with SQL(86), MapReduce, big data stack/GFS, big data stack/CFS, Hadoop, HBase, Hive, Pig, Dremel, BigQuery]
Dremel
1. Divide the SQL query across shards.
2. Process them in parallel.
It is not a wrapper around M/R; it processes SQL in a massively parallel way
(i.e., a full scan for each query across thousands of servers, without an index).
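The two steps above can be sketched as follows. This is only a toy model under stated assumptions, not Dremel's real design: a thread pool stands in for the thousands of servers, and the row layout and the function names (`make_shards`, `scan_shard`, `parallel_query`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_shards(rows, n_shards):
    """Split rows into n_shards slices (round-robin)."""
    return [rows[i::n_shards] for i in range(n_shards)]


def scan_shard(shard, min_bytes):
    """Full scan of one shard, with no index: return a partial (count, total)."""
    count = 0
    total = 0
    for row in shard:
        if row["bytes"] >= min_bytes:
            count += 1
            total += row["bytes"]
    return count, total


def parallel_query(rows, min_bytes, n_shards=4):
    # Roughly: SELECT COUNT(*), SUM(bytes) FROM logs WHERE bytes >= min_bytes
    shards = make_shards(rows, n_shards)
    # Each worker thread stands in for one server scanning its own shard.
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(lambda s: scan_shard(s, min_bytes), shards))
    # Merge the partial aggregates from every shard into the final answer.
    return sum(c for c, _ in partials), sum(t for _, t in partials)


if __name__ == "__main__":
    rows = [{"bytes": b} for b in range(1, 101)]
    print(parallel_query(rows, min_bytes=51))
```

Every query reads every row, so the response time depends on how many shards are scanned at once rather than on an index.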
15. 80's 90's 00's 10's
[Figure: timeline with SQL(86), MapReduce, big data stack/GFS, big data stack/CFS, Hadoop, HBase, Hive, Pig, Dremel, BigQuery, Presto, Impala]
Presto, Impala: open source products!
We need source.
We love freedom.
16. 80's 90's 00's 10's
[Figure: timeline with SQL(86), MapReduce, big data stack/GFS, big data stack/CFS, Hadoop, HBase, Hive, Pig, Dremel, BigQuery, Presto, Impala]
Add social circumstances to this figure.
17. 80's 90's 00's 10's
[Figure: timeline with SQL(86), MapReduce, big data stack/GFS, big data stack/CFS, Hadoop, HDFS, HBase, Hive, Pig, Dremel, BigQuery, Presto, Impala, S3, Redshift]
Social circumstances added to the figure:
- BI, DSS, BI (redefinition), DWH, Data Mining, DMP
- Improvement of computing power
- Price reduction of storage
- Spread of the Internet
- Explosive growth of EC
20. How to use Big data
A) How to aggregate data?
- huge amount of data
- very high-frequency data
B) How to maintain data?
- Data will increase....
- Query engine cost, Storage cost.
- Data check cost
C) How to analyze data? (what for?)
- UI / UX
- Understanding of business requirements
21. How to aggregate data
<Libevent shock>
parallel -> event driven.
* similar to “parallel -> USB”
Fluentd
- Async
- (Pseudo) realtime <-> Periodic Batch
Others
- Logstash
- Lambda and Kinesis (AWS)
- ...
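The "async, (pseudo) realtime" style of aggregation can be illustrated with a small event-driven sketch. This is not Fluentd's implementation or API; it only assumes a design where producers drop events into a queue without waiting, and a forwarder flushes small batches either when a batch fills up or when no event arrives for a short interval. The names `forwarder`, `sink`, and `_STOP` are hypothetical.

```python
import asyncio

_STOP = object()  # hypothetical sentinel that tells the forwarder to shut down


async def forwarder(queue, sink, batch_size=3, flush_interval=0.05):
    """Drain events from the queue and flush them to the sink in small batches.

    A batch is flushed when it reaches batch_size, or when no new event
    arrives within flush_interval seconds.
    """
    batch = []
    while True:
        try:
            event = await asyncio.wait_for(queue.get(), timeout=flush_interval)
        except asyncio.TimeoutError:
            event = None  # nothing arrived in time: flush what we have
        if event is _STOP:
            break
        if event is not None:
            batch.append(event)
        if batch and (event is None or len(batch) >= batch_size):
            sink.append(list(batch))
            batch.clear()
    if batch:
        sink.append(list(batch))  # flush the remainder on shutdown


async def main():
    queue = asyncio.Queue()
    sink = []  # stands in for remote storage
    task = asyncio.create_task(forwarder(queue, sink))
    for i in range(7):
        queue.put_nowait({"seq": i})  # the producer never waits on I/O
    queue.put_nowait(_STOP)
    await task
    return sink


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A periodic batch job would instead collect everything and ship it once per period; here the delay is bounded by the batch size and the flush interval.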
22. How to analyze data
UI / UX
<solution set for log monitoring>
* ELK : Logstash + Elasticsearch + Kibana
* Fluentd + Norikra + GrowthForecast
23. Next :
* Trying some storage systems
* Trying to build a system design
* Diving into some solutions