Internet Infrastructures for Big Data
Talk given at Verisign's Distinguished Speaker Series, 2014
Prof. Philippe Cudré-Mauroux
eXascale Infolab
http://exascale.info/
1. Internet Infrastructures
for Big Data
Philippe Cudré-Mauroux
eXascale Infolab, University of Fribourg
Switzerland
VeriSign EMEA
June 26, 2014
2. eXascale Infolab
• New lab @ U. of Fribourg, Switzerland
• Financed by Swiss Federal State / companies / private
foundations
• Big (non-relational) data management
(Volume, Velocity, Variety) (… mostly)
3. On the Menu Today
• Big Data!
– Big Data Buzz
– 3 Big Data projects w/ XI & Verisign
5. Big Data “Central Theorem”
Data + Technology → Actionable Insight → $$
Reporting, Monitoring, Root Cause Analysis,
(User) Modeling, Prediction
6. Big Data Buzz
Between now and 2015, the firm expects big data to
create some 4.4 million IT jobs globally; of those, 1.9
million will be in the U.S. Applying an economic
multiplier to that estimate, Gartner expects each new big-
data-related IT job to create work for three more people
outside the tech industry, for a total of almost 6 million
more U.S. jobs.
Growth in the Asia Pacific Big Data market
is expected to accelerate rapidly in two to
three years time, from a mere US$258.5
million last year to in excess of $1.76
billion in 2016, with highest growth in the
storage segment.
7. Big Data Everywhere!
• The Age of Big Data (NYTimes Feb. 11, 2012)
http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html
“Welcome to the Age of Big Data. The new megarich of Silicon Valley,
first at Google and now Facebook, are masters at harnessing the data
of the Web — online searches, posts and messages — with Internet
advertising. At the World Economic Forum last month in Davos,
Switzerland, Big Data was a marquee topic. A report by the forum, “Big
Data, Big Impact,” declared data a new class of economic asset, like
currency or gold.”
10. The 3-Vs of Big Data
• Volume
– amount of data
• Velocity
– speed of data in and out
• Variety
– range of data types and sources
• [Gartner 2012] "Big Data are high-volume, high-velocity, and/or
high-variety information assets that require new forms of
processing to enable enhanced decision making, insight
discovery and process optimization"
Coming up: 3 examples from XI
11. Volume: Fixing the Hadoop
Distributed File System
• Hadoop (YARN): “cluster Operating System”
• Often synonymous with Big Data
• Used everywhere (… even in CH)
12. HDFS Block Placement Strategy
[Figure: DataNodes in two racks, Rack 1 and Rack 2]
• 1st replica on the local node or a random node
• 2nd replica on a different node in a different rack
• 3rd replica on a different node in the same rack as the 2nd replica
➡ Not hardware-aware
➡ Block level rather than file level
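The three replica rules on this slide can be sketched in a few lines of Python. This is only an illustration of the rules as listed, not HDFS's actual implementation: the function name `place_replicas`, the `topology` dictionary, and the `writer` argument are assumptions made for the example. It assumes at least two racks and at least two nodes in the rack chosen for the 2nd replica.

```python
import random

def place_replicas(topology, writer=None, rng=random):
    """Pick three (rack, node) targets following the slide's three rules.

    topology: dict mapping rack name -> list of node names (hypothetical layout).
    writer:   (rack, node) of the client if it runs on a DataNode, else None.
    """
    nodes = [(rack, node) for rack, members in topology.items() for node in members]
    # 1st replica: the local node, or a random node if the writer is not a DataNode.
    first = writer if writer in nodes else rng.choice(nodes)
    # 2nd replica: a different node in a different rack.
    second = rng.choice([n for n in nodes if n[0] != first[0]])
    # 3rd replica: a different node in the same rack as the 2nd replica.
    third = rng.choice([n for n in nodes if n[0] == second[0] and n != second])
    return [first, second, third]

# Example layout (made up for illustration).
topology = {"rack1": ["a", "b"], "rack2": ["c", "d"]}
replicas = place_replicas(topology, writer=("rack1", "a"), rng=random.Random(0))
```

Note that nothing in the function looks at disk type, CPU generation, or current utilization, and each block is placed independently of the file it belongs to. These are the two limitations the slide points out.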
13. Solution: Hadaps File Placement
• Assigns weights to DataNodes
– I/O-bound jobs finish earlier on new media
– CPU-bound jobs finish earlier on new CPUs
• Uses lower utilization servers first
• Moves more blocks to newer generations
• Operates on file level
Up to 300% performance improvement by activating all nodes
[Figure: file blocks 1 to 9 placed on DataNodes A to F; labels: Blocks, Weight]
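A minimal sketch of weight-proportional, file-level placement, in the spirit of the bullets above. This is not the Hadaps algorithm itself: the function `distribute_blocks`, the node names, and the weight values are hypothetical, and replication is ignored to keep the example short. Each block of a file goes to the node whose load is lowest relative to its weight, so higher-weight (newer) nodes end up with more blocks.

```python
def distribute_blocks(num_blocks, weights):
    """Assign the block indices of one file to nodes in proportion to node weight.

    weights: dict node -> positive weight (hypothetical; higher = newer hardware).
    """
    assigned = {node: [] for node in weights}
    for block in range(num_blocks):
        # Choose the node with the lowest load relative to its weight.
        # Ties go to the higher-weight node, then to the smaller name.
        target = min(
            weights,
            key=lambda n: (len(assigned[n]) / weights[n], -weights[n], n),
        )
        assigned[target].append(block)
    return assigned

# Example: one old node and one node weighted twice as high (made-up values).
placement = distribute_blocks(9, {"old": 1, "new": 2})
```

With these example weights, the higher-weight node receives twice as many of the nine blocks as the lower-weight one.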
16. Data at each Vertex!
• Spatial + temporal statistical processing (mini-Lisas)
• Stream processing (Storm) + Array processing (SciDB)
[Figure: base stations 17, 29 and 42 with sensors 1053 and 1054; Peer Information Management overlay; Array Data Management System; three HYRISE (OLTP/OLAP) instances]
[Figure: Stream Processing Flow. Steps: Sliding-Window Average, Mini-Lisa Computations, Delta Compression. Decisions: Missing Data?, Anomaly Detected?, Fluctuation?. Events: Data Gap Event, Anomaly Event, Anomaly Detection Alert, Publish Value Event, Alive Event]
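The stream processing flow on this slide names a sliding-window average, a missing-data check, and delta compression that publishes a value only when it fluctuates. The sketch below strings those three steps together in plain Python. The ordering of the steps, the function name `process_stream`, the `window` and `threshold` parameters, and the event labels are assumptions made for illustration. The deck's actual pipeline runs on Storm and also includes mini-Lisa anomaly detection, which is omitted here.

```python
from collections import deque

def process_stream(readings, window=3, threshold=0.5):
    """Yield (event, value) pairs for a sequence of sensor readings.

    readings: iterable of floats, with None marking a missing sample.
    """
    recent = deque(maxlen=window)   # sliding window of the latest readings
    last_published = None
    for reading in readings:
        if reading is None:
            # Missing data: report the gap and keep the window unchanged.
            yield ("data_gap", None)
            continue
        recent.append(reading)
        average = sum(recent) / len(recent)
        # Delta compression: publish only when the smoothed value moved enough.
        if last_published is None or abs(average - last_published) > threshold:
            last_published = average
            yield ("publish_value", average)
        else:
            # No significant fluctuation: just signal that the sensor is alive.
            yield ("alive", None)

events = list(process_stream([10.0, 10.0, None, 10.2, 13.0]))
```

Sending only an "alive" marker when the value has not changed much keeps the traffic from each sensor small, which is the point of the delta-compression step.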
18. Variety: Sharing Data Locally & Globally
• 70+% of the world’s population has no or
very limited access to the Web
[Ahmed Shams 2013]
19. Our Solution: ERS, the
Entity Registry System
• Three-tier solution to deploy data-powered apps
– Flexible
• Seamlessly reconcile entities in local / ad-hoc / global modes
– Collaborative
• Transactional consistency, data versioning
– Scalable
• Bridges, scale-out servers, tunable consistency
– Open-source
• https://github.com/ers-devs
20. Ongoing Deployments
• Entity-powered apps for the Sugar Learning Platform
• Ambient Assisted Living for elderly persons in tropical environments
21. Special Thanks to…
• Vincenzo Russo, Benoit Perroud, Matt
Thomas, Romain Cholat and the whole
Verisign Fribourg office
• Burt Kaliski and his team
• Allison Mankin, Scott Hollenbeck, Debra
Anderson & the Internet Infrastructures Grant
team
… for their continued support