Big data (overview) - (MOSG)

Big data
- Overview -
2016/03/04
Mulodo Vietnam Co., Ltd.

What is “Big data”?
Types
Science :
- LHC: Large Hadron Collider
Medical :
- Gene analysis
Market (IT?):
- Business use

History of Data processing
50’s
- “BI : Business Intelligence” (1958)
80’s
- “DSS : Decision support system” (80’s)
- “SQL86” (1986)
- “Knowledge Discovery in Databases” (1989)
- “BI (Redefinition)” (1989)
90’s
- “Data Warehouse” (1990)
- “OLAP: online analytical processing” (1993)
- “Improvement of computing power” (90’s)
- “Price reduction of storage” (90’s)
- “Data Mining” (1996)

History of Data processing
2000’s
- “Spread of The Internet” (00’s)
- ‘Google: Big data stack 1.0’ (00’s)
- “MapReduce framework” (2004)
- “Independence of Hadoop project from Nutch” (2006)
- “Amazon: S3” (2006)
- “Explosive prosperity of EC” (00’s)
2010’s
- “Big data” in ‘The Economist(UK)’ (2010)
- “Google: BigQuery” (2010)
- “fluentd” (2011)
- “Amazon: Redshift” (2012)
- “DMP: data management platform” (10’s)
- “Google: Big data stack 2.0-3.0” (10’s)
- “Apache Crunch, Impala, Presto, ...” (10’s)

[Timeline figure, 80's to 10's]
Let's look back on the history of Big data
(especially storage and query engines)

[Timeline figure, 80's to 10's: SQL(86)]
SQL(86)
- Easy to use
- Structured/ruled
- Independent from storage

[Timeline figure, 80's to 10's: SQL(86), MapReduce, big data stack/GFS]
MapReduce, big data stack/GFS
- Uses HUGE data
- Batch-like processing (for huge logs)
- But proprietary
- Too huge to handle on a usual RDBMS

[Timeline figure, 80's to 10's: SQL(86), MapReduce, big data stack/GFS, Hadoop, HBase]
Open source products!
- We need source.
- We love freedom.

[Timeline figure, 80's to 10's: SQL(86), MapReduce, big data stack/GFS, Hadoop, HBase, Hive, Pig]
Easy to use
- E-commerce requires huge data analysis.
- M/R is too heavy to use...

[Timeline figure, 80's to 10's: SQL(86), MapReduce, big data stack/GFS, Hadoop, HBase, Hive, Pig]
Hive
- SQL -> (M/R) -> Result
Pig
- Original language <=> (M/R)
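
The "SQL -> (M/R) -> Result" flow can be pictured with a small sketch. This is only an illustration in plain Python, not how Hive actually plans or runs a query: it assumes a query such as `SELECT page, COUNT(*) FROM logs GROUP BY page`, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `count_by_page`) and the `page` field are hypothetical.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit one (key, 1) pair per log record."""
    for record in records:
        yield record["page"], 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}

def count_by_page(records):
    """Roughly what a GROUP BY / COUNT(*) query asks for, written as M/R steps."""
    return reduce_phase(shuffle(map_phase(records)))

# Hypothetical sample log records.
logs = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
result = count_by_page(logs)
print(result)
```

Writing the three phases by hand for every question is the "too heavy" part; a SQL layer lets the user state only the query.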

[Timeline figure, 80's to 10's: SQL(86), MapReduce, big data stack/GFS, big data stack/CFS, Hadoop, HBase, Hive, Pig, Dremel, BigQuery]
- We want to analyze huge data interactively.
- Google announced Dremel for interactive analysis of huge data.

[Timeline figure, 80's to 10's: SQL(86), MapReduce, big data stack/GFS, big data stack/CFS, Hadoop, HBase, Hive, Pig, Dremel, BigQuery]
Dremel
1. Divide the SQL query across shards.
2. Process them in parallel.
It is not a wrapper around M/R; it processes SQL in a massively parallel way.
(i.e. a full scan for each query on thousands of servers, without an index)
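
The two steps above can be sketched as a scatter-gather scan. This is a toy illustration under stated assumptions, not Dremel's implementation: the shards are in-memory lists, threads stand in for servers, and the names `scan_shard` and `parallel_query` and the `status` / `bytes` fields are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_shard(shard, predicate):
    """Full scan of one shard (no index): return a partial (count, total)."""
    count, total = 0, 0
    for row in shard:
        if predicate(row):
            count += 1
            total += row["bytes"]
    return count, total

def parallel_query(shards, predicate, workers=4):
    """Run the same scan on every shard in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: scan_shard(s, predicate), shards))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, total

# Hypothetical data: three shards of access-log rows.
shards = [
    [{"status": 200, "bytes": 10}, {"status": 500, "bytes": 5}],
    [{"status": 200, "bytes": 30}],
    [{"status": 404, "bytes": 7}, {"status": 200, "bytes": 2}],
]
# Roughly: SELECT COUNT(*), SUM(bytes) FROM logs WHERE status = 200
result = parallel_query(shards, lambda row: row["status"] == 200)
print(result)
```

Each shard does a plain full scan; the speed comes from spreading the scan over many workers rather than from an index.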

[Timeline figure, 80's to 10's: SQL(86), MapReduce, big data stack/GFS, big data stack/CFS, Hadoop, HBase, Hive, Pig, Dremel, BigQuery, Presto, Impala]
Open source products!
- We need source.
- We love freedom.

[Timeline figure, 80's to 10's: SQL(86), MapReduce, big data stack/GFS, big data stack/CFS, Hadoop, HBase, Hive, Pig, Dremel, BigQuery, Presto, Impala]
Add social circumstances to this figure.

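[Timeline figure, 80's to 10's, with social circumstances]
Storage and query engines:
- SQL(86), MapReduce, big data stack/GFS, big data stack/CFS, Hadoop, HDFS, HBase, Hive, Pig, Dremel, BigQuery, Presto, Impala, S3, Redshift
Concepts:
- BI, DSS, BI (redefinition), DWH, Data Mining, DMP
Social circumstances:
- Improvement of computing power
- Price reduction of storage
- Spread of the Internet
- Explosive prosperity of EC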

Many requests
Many solutions...
But you can think which solution is
better for your project. (I hope)

How to use Big data
A) How to aggregate data?
- huge amount of data
- too high frequency data
B) How to maintain data?
- Data will increase....
- Query engine cost, storage cost.
- Data check cost
C) How to analyze data? (what for?)
- UI / UX
- Understanding of business requirements

How to aggregate data
<Libevent shock> 
parallel -> event driven.
* similar to “parallel -> USB”
Fluentd
- Async
- (Pseudo) realtime <-> Periodic Batch
Others
- Logstash
- Lambda and Kinesis (AWS)
- ...
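
The "(pseudo) realtime <-> periodic batch" trade-off can be sketched with a buffered forwarder. This is a hypothetical toy class, not Fluentd's API or configuration: `BufferedForwarder`, `max_events` and `flush_interval` are names assumed for illustration, and the clock is passed in explicitly so the example is deterministic.

```python
class BufferedForwarder:
    """Hypothetical collector: buffers events and forwards them in chunks."""

    def __init__(self, sink, max_events=3, flush_interval=5.0):
        self.sink = sink                      # callable that receives a list of events
        self.max_events = max_events          # flush when the chunk is this large
        self.flush_interval = flush_interval  # or when this many seconds have passed
        self.buffer = []
        self.last_flush = 0.0

    def emit(self, event, now):
        """Append one event; flush if the chunk is full or the interval has elapsed."""
        self.buffer.append(event)
        chunk_full = len(self.buffer) >= self.max_events
        interval_over = now - self.last_flush >= self.flush_interval
        if chunk_full or interval_over:
            self.flush(now)

    def flush(self, now):
        """Send the buffered events to the sink as one chunk."""
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer.clear()
        self.last_flush = now

# Hypothetical event stream: (timestamp in seconds, message).
chunks = []
fwd = BufferedForwarder(chunks.append, max_events=3, flush_interval=5.0)
for t, msg in [(0.1, "a"), (0.2, "b"), (0.3, "c"), (1.0, "d"), (6.0, "e")]:
    fwd.emit(msg, now=t)
print(chunks)
```

A small chunk size and a short interval behave like pseudo realtime; large values behave like a periodic batch.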

How to analyze data
UI / UX
<solution sets for log monitoring>
* ELK : Logstash + Elasticsearch + Kibana
* Fluentd + Norikra + GrowthForecast

Next :
* Trying some storage
* Trying to build a system design
* Diving into some solutions
