Big Data Engineering - Top 10 Pragmatics
1. The road lies plain before me;--'tis a theme
Single and of determined bounds; …
- Wordsworth, The Prelude
Krishna Sankar, http://doubleclix.wordpress.com
EC4000–PhD Guest Seminar, Naval Postgraduate School
April 27, 2012
2. Agenda
o To cover the broad picture
o Touch upon instances of the technologies employed
o Of the Big Data domain …
Topics: What is Big Data?; Big Data to smart data; Big Data Pipeline; Analytics/Modeling; Analytic Algorithms; Cloud Architectures; Processing - Storage - Visualization; Hadoop; NOSQL; R
3. Thanks to …
The giants whose shoulders I am standing on
Special Thanks to:
Peter Ateshian, NPS
Prof Murali Tummala, NPS
Shirley Bailes, O’Reilly
Ed Dumbill, O’Reilly
Jeff Barr, AWS
Jenny Kohr Chynoweth, AWS
4. Porcelain vs. Plumbing
• The balance is always interesting …
• This talk has both
• Would be happy to dive deep into plumbing topics like Hadoop, R, MongoDB, Cassandra et al…
5. EBC322
① Volume
o Scale
② Velocity
o Data change rate vs. decision window
③ Variety
o Different sources & formats
o Structured vs. Unstructured
④ Variability
o Breadth of interpretation
o Depth of analytics
http://doubleclix.wordpress.com/2011/09/13/when-is-big-data-really-big-data/
http://www.hpts.ws/posters/Poster2011_13_Bulkowski.pdf
http://www.quora.com/Business-Intelligence/What-is-the-future-of-business-intelligence
9. EBC322
① Volume
o Scale
② Velocity
o Data change rate vs. decision window
③ Variety
o Different sources & formats
o Structured vs. Unstructured
④ Variability
o Breadth of interpretation
o Depth of analytics
⑤ Contextual
o Dynamic variability
o Recommendation
⑥ Connectedness
http://doubleclix.wordpress.com/2011/09/13/when-is-big-data-really-big-data/
http://www.hpts.ws/posters/Poster2011_13_Bulkowski.pdf
10. • “… they didn’t need a genius, … but build the world’s most impressive dilettante … battling the efficient human mind with spectacular flamboyant inefficiency” – Final Jeopardy by Stephen Baker
• 15 TB memory, across 90 IBM 760 servers, in 10 racks
• 1 TB of dataset
• 200 Million pages processed by Hadoop
• This is a good example of Connected data
– Contextual w/ variability
– Breadth of interpretation
– Analytics depth
http://doubleclix.wordpress.com/2011/03/01/the-education-of-a-machine-%E2%80%93-review-of-book-%E2%80%9Cfinal-jeopardy%E2%80%9D-by-stephen-baker/
http://doubleclix.wordpress.com/2011/02/17/watson-at-jeopardy-a-race-of-machines/
12. Ref: http://goo.gl/Mm83k
[Diagram: layers, top to bottom] Infer-ability; Model; Context; Connectedness; Variety; Variability; Velocity; Volume
[Tools named in the diagram] Internal dashboards, Tableau; Hand coded Programs, R, Mahout, …; SQL, BI Tools; Hadoop, Pig, Hive; SQL, .NET, Dryad; NOSQL; Logs, HDFS, XML, files, …; Scribe, Flume, Storm, Hadoop, …; Various other tools
Decomplexify! Contextualize! Network! Reason! Infer!
13. Twitter
§ 200 million tweets/day
§ Peak 10,000/second
§ How would you handle the fire hose for social network analytics?
AWS – 900 Billion objects!
Zynga
§ “Analytics company, not a gaming company!”
§ Harvests data: 15 TB/day
§ Test new features
§ Target advertising
§ 230 million players/month
Storage
§ 4 U box = 40 TB, 1 PB = 25 boxes!
http://goo.gl/dcBsQ
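The figures on this slide can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the original talk: the function names are hypothetical, decimal units (1 PB = 1000 TB) are assumed because that is what makes "1 PB = 25 boxes" work out, and the average tweet rate is derived from the daily total rather than quoted from the slide.

```python
import math

TB_PER_PB = 1000          # assumption: decimal units (1 PB = 1000 TB)
SECONDS_PER_DAY = 86_400


def boxes_needed(capacity_tb, box_tb=40):
    """Number of 4U boxes (40 TB each, per the slide) needed to hold capacity_tb."""
    return math.ceil(capacity_tb / box_tb)


def average_rate_per_second(events_per_day):
    """Average events per second implied by a daily total."""
    return events_per_day / SECONDS_PER_DAY


boxes_per_pb = boxes_needed(TB_PER_PB)             # storage: boxes for 1 PB
avg_tweets = average_rate_per_second(200e6)        # Twitter: 200 million tweets/day
peak_to_avg = 10_000 / avg_tweets                  # peak 10,000/second vs. average
days_per_box = 40 / 15                             # Zynga: 15 TB/day into one 40 TB box

print(f"boxes per PB: {boxes_per_pb}")
print(f"average tweets/second: {avg_tweets:.0f}")
print(f"peak / average: {peak_to_avg:.2f}")
print(f"days to fill one box at 15 TB/day: {days_per_box:.2f}")
```

The gap between the average and the peak rate is the practical point: a system sized for the daily average would fall behind during bursts.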
16. [Diagram: Big Data pipeline stages] Collect; Store; Transform & Analyze; Model & Reason; Predict, Recommend & Visualize
[Tools named in the diagram] Scribe, Flume, Storm, NOSQL, Cassandra, MongoDB, Hbase, Neo4j, Hadoop, Pig/Hive, Splunk, R, Mahout, BI Tools, Dashboard, Tableau, D3.js
When I think of my own native land,
In a moment I seem to be there;
But, alas! recollection at hand
Soon hurries me back to despair.
- Cowper, The Solitude Of Alexander Selkirk
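The pipeline stages named on this slide (Collect, Store, Transform & Analyze, Model & Reason, Predict/Recommend) can be sketched as a chain of small functions. This is a toy illustration under stated assumptions, not the architecture from the talk: the "user,item" log format, the function names, and the popularity-based "model" are all hypothetical, and an in-memory list stands in for a real data store.

```python
from collections import Counter


def collect(raw_lines):
    """Collect: parse raw 'user,item' log lines, skipping malformed ones."""
    events = []
    for line in raw_lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and all(parts):
            events.append({"user": parts[0], "item": parts[1]})
    return events


def store(events):
    """Store: an in-memory list standing in for a real data store."""
    return list(events)


def transform_and_analyze(stored):
    """Transform & Analyze: count how often each item appears."""
    return Counter(event["item"] for event in stored)


def model_and_reason(counts):
    """Model & Reason: rank items by popularity (a deliberately trivial model)."""
    return [item for item, _ in counts.most_common()]


def recommend(ranking, stored, user, k=2):
    """Predict/Recommend: top-k popular items the user has not seen yet."""
    seen = {event["item"] for event in stored if event["user"] == user}
    return [item for item in ranking if item not in seen][:k]


raw = ["ann,book", "bob,book", "bob,lamp", "cat,lamp",
       "cat,book", "cat,pen", "bad line"]
stored = store(collect(raw))
ranking = model_and_reason(transform_and_analyze(stored))
print(recommend(ranking, stored, "ann"))
```

Each stage takes only the previous stage's output, so any one of them could in principle be replaced by one of the tools named in the diagram without touching the others.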
17. NOSQL
Key Value
o In-memory: Memcached
o Disk Based: Redis, Tokyo Cabinet, Dynamo, Voldemort
Column
o SimpleDB, Google BigTable, HBase, Cassandra, HyperTable, Azure TS
Document
o CouchDB, MongoDB, Lotus Domino, Riak
Graph
o Neo4j, FlockDB, InfiniteGraph
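The four NOSQL families on this slide differ mainly in the shape of the data they hold. The sketch below models each shape with plain Python structures; it does not use the API of any product listed above, and every key, field, and name in it is made up for illustration.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": b'{"name": "Ada"}'}

# Column (wide-column): row key -> column family -> column -> value.
column_store = {
    "user:42": {
        "profile": {"name": "Ada", "city": "Monterey"},
        "stats": {"logins": 7},
    }
}

# Document: nested, self-describing records that can be queried by field.
documents = [
    {"_id": 42, "name": "Ada", "tags": ["admin", "dev"]},
    {"_id": 43, "name": "Bob", "tags": ["dev"]},
]

# Graph: nodes plus edges; queries walk relationships.
follows = {"Ada": {"Bob"}, "Bob": {"Cy"}, "Cy": set()}


def friends_of_friends(graph, person):
    """Two-hop neighbours, excluding direct neighbours and the person."""
    direct = graph.get(person, set())
    two_hops = {fof for friend in direct for fof in graph.get(friend, set())}
    return two_hops - direct - {person}


# A field query is natural for documents; a traversal is natural for graphs.
admins = [doc["name"] for doc in documents if "admin" in doc["tags"]]
print(admins)
print(friends_of_friends(follows, "Ada"))
```

The key-value store can only answer "what is stored under this key", while the document and graph shapes support queries over fields and relationships, which is one way to read the left-to-right ordering of the slide.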
19. Software As A Service
Platform As A Service
Infrastructure As A Service
21. Amazon – Canonical Cloud
• S3 – Blob storage
• Dynamo DB – NOSQL
• EMR – Elastic Map Reduce
• EC2 – Compute
• 1% of Internet traffic
“Scalability is about building wider roads, not about building faster cars” – Steve Swartz
http://blog.deepfield.net/2012/04/18/how-big-is-amazons-cloud/
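The Map Reduce model behind EMR and Hadoop can be illustrated with a single-process word count. This is a minimal sketch of the programming model only: it does not use the EMR or Hadoop APIs, the function names are hypothetical, and a real job would run the map and reduce phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence in one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "smart data", "Big Data pipeline"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, adding machines adds capacity without changing the code, which is the "wider roads" idea in the quote above.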