1. Paradigm Shifts: Big Data
Pini Cohen
VP and Senior Analyst
STKI Summit 2012
"Tell me and I'll forget
Show me and I may remember
Involve me and I'll understand"
2. The “Magic” of internet companies
Source: http://venturebeat.com/2011/10/24/next-hot-internet-companies-not-in-us/internet-company-growth/
Pini Cohen’s work Copyright STKI@2012
Do not remove source or attribution from any slide or graph
3. Pinterest
4. Pinterest Architecture Update - 18 Million Visitors, 10x Growth, 12 Employees, 410 TB of Data
• 80 million objects stored in S3 with 410 terabytes of user
data, 10x what they had in August. EC2 instances have
grown by 3x. Around $39K for S3 and $30K for EC2 a month.
• Pay for what you use saves money. Most traffic happens in
the afternoons and evenings, so they reduce the number of
instances at night by 40%.
• 12 employees as of last December. Using the cloud a site can
grow dramatically while maintaining a very small team.
Looks like 31 employees as of now.
Source: http://highscalability.com/blog/2012/5/21/pinterest-architecture-update-18-million-visitors-10x-growth.html
5. Instagram
• The Instagram philosophy:
• Simplicity
• Optimized for minimal operational burden
• Instrument everything
6. Scaling Instagram
• Instagram went to 30+ million users in less than two years
and then rocketed to 40 million users 10 days after the
launch of its Android application.
• After the release of the Android they had 1 million new
users in 12 hours.
• 2 engineers in 2010.
• 3 engineers in 2011
• 5 engineers 2012, 2.5 on the backend. This includes
iPhone and Android development.
Source: http://highscalability.com/blog/2012/4/16/instagram-architecture-update-whats-new-with-instagram.html
7. Tumblr – Microblogging social networking platform
• 500 million page views a day
• 15B+ page views a month
• Peak rate of ~40k requests per second
• 1+ TB/day into Hadoop cluster
• Many TB/day into MySQL/HBase/Redis/Memcache
• Growing at 30% a month
• ~1000 hardware nodes in production (not cloud)
• ~20 engineers (total 106 employees)
Source: http://highscalability.com/blog/2012/2/13/tumblr-architecture-15-billion-page-views-a-month-and-harder.html STKI modifications
8. Technology listing
• Hadoop MapReduce
• NoSQL DBMS (Cassandra, MongoDB, HBase)
• Sharding
• In-memory DBMS
• Memcached
• MemSQL
• Solr
• Redis
• Django
• Python
• ELB - Amazon Elastic Load Balancing
9. Paradigm shifts agenda
• Big Data:
• Big Data definition and background
• Big Data value
• Big Data technology
Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
10. Big Data Definition – 4 V’s (or more…)
• Volume – tens of TBs and more (15-20TB+)
• Velocity – the speed in which data is added – 10M items
per hour and more. And the speed in which the data needs
to be processed
• Variety – different types of data – structured &
unstructured. In many cases deals with internet of things,
social media, but also with voice, video, etc.
• Variability - able to cope with new attributes and changing
data types – without interrupting the analytical process
(without “import-export”)
• Other optional V's - validity, volatility, viscosity (resistance to flow), etc.
Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
11. The origins of the 3V’s:
• 2001 research by Doug Laney of META Group (now part of Gartner):
12. “Big Data” theme main current usage:
• ""Big Data" is just marketing jargon." - Doug Laney, Gartner
Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
Source: http://winnbadisa.com/wp-content/uploads/2011/12/marketing-career-cloud.jpg
• STKI: doing something significantly different from what you've done until now
13. Big Data at work:
• Orbitz Worldwide has collected 750 terabytes of
unstructured data on their consumers’ behavior – detailed
information from customer online visits and browsing
sessions. Using Hadoop, models have been developed
intended to improve search results and tailor the user
experience based on everything from location, interest in
family travel versus solo travel, and even the kind of device
being used to explore travel options.
• The result? To date, a 7% increase in interaction rate, 37% growth in stickiness of sessions, and a net 2.6% increase in booking path engagement.
Source: http://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/us_cons_techtrends2012_013112.pdf
14. DW appliances will be discussed later
Teradata, EMC Greenplum, Oracle Exadata, Microsoft Parallel Data Warehouse
Source: http://www.asugnews.com/2011/09/06/inside-saps-product-naming-strategies/
15. What is the business value of big data analytics?
• Big data is now a technology looking for a business need
• It can mean doing the same thing but better / faster
(better segmentation, more accurate analysis model)
• Or it can mean doing completely new things (telematics,
sentiment analysis, recommendation engine, matching
competition’s pricing in real time, being able to analyze
data we haven’t been able to analyze in the past)
16. Decision making – old school vs. new school (big data)
• Old School:
• Phase 1 : Analyze existing data and prepare general model
• Phase 2: Apply the general model to specific client
• This means applying the same model for many clients when they
arrive
• Issues with Old School decision making:
• Time gap between preparing and applying the model
• # of combinations might be too big for a general model (example:
recommendations based on interests)
• The general model generated is biased towards “main stream”
population
• New School (Big Data):
• Phase 1: Prepare specific model for the client and apply the model
– instantly
17. Big data use cases
• Recommendation engines - match users to one another and provide
recommendations based on similar users (Examples: LinkedIn - people
you may know; Amazon)
• Sentiment Analysis (Macro or individual user)
• Fraud Detection - customer behavior, historical and
transactional data combined. Same but more affordable
• Customer Churn
• Social graph analysis – influencers
• Customer experience analysis – combine data from call
center, web, social media etc.
• Improved segmentation – more data (clickstream, call
records) for more accurate analysis
• Improved customer retention
18. Technology: Elements Concepts
• Storing data for analytics (mainly):
• HDFS - Hadoop Distributed File System
• MapReduce - programming model, mainly for analytics
• Other add-ons: Pig, Hive, JAQL (IBM)
• Storing and retrieving data - DBMS:
• NoSQL – DBMS (not only SQL):
• Cassandra
• MongoDB
• CouchDB
• HBase
19. Who Uses Hadoop?
• Amazon/A9
• AOL
• Facebook
• Fox Interactive Media
• Netflix
• New York Times
• PowerSet (now Microsoft)
• Quantcast
• Rackspace/Mailtrust
• Veoh
• Yahoo!
More at http://wiki.apache.org/hadoop/PoweredBy
20. Who Uses Cassandra?
• Facebook
• Digg
• Despegar
• Ooyala
• Imagini
• SimpleGeo
• Rackspace
• Shazam
• SoftwareProjects
21. Big Data technologies (Hadoop etc.) vs. traditional IT
Traditional IT vs. Big Data:
• Centralized storage vs. local storage
• Brand redundant servers vs. cheap HW "white boxes"
• Standard infrastructure and virtual servers vs. "is standardization needed at the HW level?" and no server virtualization
• Well established backup and DRP procedures vs. "why do I need backup?" and DRP via compute clusters stretched over locations
• Traditional vendors vs. open source solutions
• Mature products and procedures vs. new patches for specific issues that sometimes say "not implemented yet"
• Traditional programming and SQL vs. a different kind of programming (map-reduce), with no joins
Will Big Data infrastructure be part of existing infrastructure, or will it be developed as a new domain?
22. New type of scale:
• Hadoop:
• Up to 4,000 machines in a cluster
• Up to 20 PB in a cluster
• Currently, traditional IT technologies cannot handle this
kind of scale.
• This scale comes with a cost!
Source: http://www.techsangam.com/wp-content/uploads/2012/01/i_love_scalability_mug.jpg
23. Brewer's (CAP) Theorem
• It is impossible for a distributed computer system to
simultaneously provide all three of the following
guarantees:
• Consistency (all nodes see the same data at the same time)
• Availability (node failures do not prevent survivors from
continuing to operate)
• Partition Tolerance (the system continues to operate in many
partitions and despite arbitrary message loss)
Professor Eric A. Brewer
Source: Scalebase, STKI modifications
24. Dealing With CAP
• Drop Consistency
• Welcome to the “Eventually Consistent” term.
• In the end, everything will work out fine, and sometimes that is a
good enough solution
• When no updates occur for a long period of time, eventually all
updates will propagate through the system and all the nodes will
be consistent
• For a given accepted update and a given node, eventually either
the update reaches the node or the node is removed from service
• Known as BASE (Basically Available, Soft state, Eventual
consistency), as opposed to ACID
Source: Scalebase
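The BASE behavior above can be sketched with a toy last-writer-wins replica. This is a minimal illustration of eventual consistency, not how any particular NoSQL store is implemented; the class, method names, and timestamps are all hypothetical:

```python
class Replica:
    """A toy replica that resolves conflicts by last-writer-wins."""
    def __init__(self):
        self.log = []                 # (timestamp, value) updates seen so far

    def receive(self, ts, value):
        # An update propagates to this node, possibly much later than to others.
        self.log.append((ts, value))

    def read(self):
        # Answer with the newest value this node has seen; other nodes
        # may answer differently until propagation completes (soft state).
        return max(self.log)[1] if self.log else None

a, b, c = Replica(), Replica(), Replica()
for r in (a, b, c):
    r.receive(1, "v1")                # initial write reaches every node

a.receive(2, "v2")                    # a new write reaches A first...
stale = c.read()                      # ...so C still answers the old value

b.receive(2, "v2")                    # once the update reaches every node
c.receive(2, "v2")                    # and no new writes arrive...
print(a.read(), b.read(), c.read())   # ...all replicas converge on "v2"
```

The window where `stale` differs from A's answer is exactly the "eventually" in eventually consistent.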
25. Hadoop
• Apache Hadoop is a software framework that supports
data-intensive distributed applications
• It enables applications to work with thousands of nodes
and petabytes of data.
• Hadoop was inspired by Google's MapReduce and Google
File System (GFS) papers
• Contains (basically):
• HDFS - the Hadoop Distributed File System
• MapReduce programming model
26. HDFS – Hadoop Distributed File System
• Parallel
• Distributed on commodity elements
• Throughput over latency
• Reliable and self healing
• For large scale – typical file is gigabytes to terabytes (for
one file!)
• Applications need a write-once-read-many access
model (mainly analytics)
27. HDFS motivation
• What if you needed to write a program that distributes data on
commodity HW (PCs or servers)? You would need to take care of:
• Where is the data located
• How to distribute data between the nodes
• How many times you want to replicate the data
• How to insert, select and update data
• What to do if one node or more fails
• How to add node or to take out a node
• Manage and monitor the environment
• The Hadoop Distributed File System does this for you!
28. HDFS: Hadoop Distributed File Systems
• Data nodes and Name node
• The client requests metadata about a file from the namenode (file name, block id → block id, block location)
• Data is served directly from a datanode (block id, byte range → block data)
[Diagram: an application and HDFS client talking to the namenode (file namespace, e.g. /user/css534/input) for instructions and state, and to datanodes, each storing blocks on a local Linux file system]
source: http://www.google.co.il/url?sa=t&rct=j&q=Rob+Jordan++Chris+Livdahl+hadoop+filetype%3Apptx&source=web&cd=1&ved=0CCIQFjAA
29. Datanode Blockreports
File "part-0" will be replicated twice and will be saved in blocks 1
and 3 (the file is big, so it has to be divided into 2 blocks).
Block 1 is on data nodes A and C.
source: http://www.google.co.il/url?sa=t&rct=j&q=Rob+Jordan++Chris+Livdahl+hadoop+filetype%3Apptx&source=web&cd=1&ved=0CCIQFjAA
30. HDFS basic limitations
• Namenode is single point of failure
• Write-once model
• Plan to support appending-writes
• A namespace with an extremely large number of files
exceeds Namenode’s capacity to maintain
• Cannot be mounted by an existing OS
• Getting data in and out is tedious
• HDFS does not implement / support user quotas / access
permissions
• Data balancing schemes
• No periodic checkpoints
31. Map Reduce programming model
• At its most basic: it brings the program to the data
• Contains two elements:
• Map: this part of the job is performed in parallel, asynchronously, by each node
• Reduce: gathers the results from the relevant nodes
• In more detail:
• Map: returns (writes to a temp file) a list containing zero or more (k, v) pairs
• The output can use a different key from the input
• Multiple outputs can share the same key
• Reduce: returns a new list of reduced output from the input
32. MapReduce motivation
• What if you needed to write a program that processes data
that’s on distributed computers?
• You would need to write a distributed program that:
• Finds where the data is located
• Works on each node and then combines the results from all nodes
• Decides where (on the local node) and how (in what format) to write the intermediate results
• Detects when the jobs of all participating nodes have concluded and then starts the "aggregation" part
• Handles a stuck job (restarts it, or asks another node to perform the same job)
• Hadoop MapReduce is the framework for you!
33. MapReduce example:
map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
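The same word-count job can be sketched in plain, runnable Python. The shuffle step that groups intermediate pairs by key is done here with a dict, standing in for what the framework does between map and reduce; the function names are illustrative, not a Hadoop API:

```python
from collections import defaultdict

def map_phase(doc_name, contents):
    # Map: emit a (word, 1) pair for every word in the document.
    return [(w, 1) for w in contents.split()]

def reduce_phase(key, values):
    # Reduce: sum all counts emitted for one word.
    return (key, sum(values))

def run_job(documents):
    # Shuffle: group intermediate pairs by key, as the framework would.
    grouped = defaultdict(list)
    for name, text in documents.items():
        for k, v in map_phase(name, text):
            grouped[k].append(v)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())

counts = run_job({"doc1": "Hello World Bye World",
                  "doc2": "Hello Hadoop Goodbye Hadoop"})
print(counts)  # Hello and World and Hadoop count 2; Bye and Goodbye count 1
```

The input here is the same "Hello World Bye World" / "Hello Hadoop Goodbye Hadoop" pair used in the dataflow slides that follow, so the output matches the final answer shown there.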
34. Dataflow in Hadoop
[Diagram: a "Word Count" job is submitted to the master, which schedules map tasks and reduce tasks; all elements run on standard HW]
Source: Haifa Labs IBM
35. Dataflow in Hadoop
[Diagram: the input file is read from HDFS as two blocks: block 1, "Hello World Bye World", and block 2, "Hello Hadoop Goodbye Hadoop"; each block goes through a map task, yielding Hello 1, World 2, Bye 1 and Hello 1, Hadoop 2, Goodbye 1 on the way to the reduce tasks]
Source: Haifa Labs IBM
36. Dataflow in Hadoop
[Diagram: each map task writes its intermediate results to the local file system and reports "finished + location" to the master, which forwards the locations to the reduce tasks]
Source: Haifa Labs IBM
37. Dataflow in Hadoop
[Diagram: the reduce tasks pull the intermediate map output from each node's local file system via HTTP GET]
Source: Haifa Labs IBM
38. Dataflow in Hadoop
[Diagram: the reduce tasks write the final answer to HDFS: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2]
Source: Haifa Labs IBM
39. Components of Cluster Node
[Diagram: the software stack on each cluster node, from top to bottom]
• Flow file input processor and flow analysis map/reduce (cluster file system)
• Flow-tools
• Hadoop: HDFS and the MapReduce library
• Java Virtual Machine
• Operating system (Linux)
• Hardware (CPU, HDD, memory, NIC)
Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt
40. Hive: MapReduce helper:
• Code examples:
hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' SELECT a.invites, a.pokes FROM profiles a;
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(*) FROM invites a WHERE a.ds='2008-08-15';
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;
41. NoSQL DBMS: storing and retrieving data
• Key/Value
• A big hash table
• Examples: Voldemort, Amazon’s Dynamo
• Big Table
• Big table, column families
• Examples: HBase, Cassandra
• Document based
• Collections of collections
• Examples: CouchDB, MongoDB
• Graph databases
• Based on graph theory
• Examples: Neo4J
• Each solves a different problem
Source: Scalebase
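The first family, key/value, really is "a big hash table" at heart. A toy in-process sketch makes the point; the class and method names are hypothetical and this is not any real product's API:

```python
class KVStore:
    """A toy key/value store: one big hash table with put/get/delete."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value           # values are opaque blobs to the store

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KVStore()
store.put("user:42", {"name": "Ada", "plan": "pro"})
print(store.get("user:42"))               # the store never inspects the value
```

Real key/value stores (Voldemort, Dynamo) add what this toy omits: partitioning the hash table across nodes, replication, and failure handling.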
42. Pros/Cons
• Pros:
• Performance
• BigData
• Most solutions are open source
• Data is replicated to nodes and is therefore fault-tolerant
(partitioning)
• Don't require a schema
• Can scale up and down
• Cons:
• Code change
• No framework support
• Not ACID
• Ecosystem (BI, backup)
• There is always a database at the backend
• Some APIs are just too simple
Source: Scalebase
43. Apache Cassandra
• Cassandra is a highly scalable, eventually
consistent, distributed, structured key-value
store
• Child of Google’s BigTable and Amazon’s
Dynamo
• Peer-to-peer architecture; all nodes are equal
Source: ids.snu.ac.kr/w/images/1/18/2011SS-03.ppt
• Cassandra’s replication factor (RF) is the total
number of nodes onto which the data will be
placed. RF of at least 2 is highly recommended,
keeping in mind that your effective number of
nodes is (N total nodes / RF).
• CQL (Cassandra Query Language) command line
• Time stamp for each value written
44. Consistent Hashing
• Partitioning uses consistent hashing (to pick the first node on which data is placed), based on an MD5 distributed hash table algorithm
• Keys hash to a point on a fixed circular space
• The ring is partitioned into a set of ordered slots; servers and keys are hashed over these slots
• Nodes take positions on the circle
• Example: A, B, and D exist
• B is responsible for the A-B range (for replication factor = 2, the default)
• D is responsible for the B-D range
• A is responsible for the D-A range
• C joins: B and D split ranges, and C gets the B-C range from D
Source: http://www.intertech.com/resource/usergroup/NoSQL.ppt
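The ring above can be sketched in a few lines of Python. This is a minimal illustration with no virtual nodes and no replication, so it simplifies what Cassandra actually does; the node and key names are made up:

```python
import hashlib
from bisect import bisect_right

def point(name):
    # Hash a key or node name to a point on the ring via MD5,
    # the same hash family the slide mentions.
    return int(hashlib.md5(name.encode()).hexdigest(), 16)

class Ring:
    """A minimal consistent-hash ring (no virtual nodes, no replication)."""
    def __init__(self, nodes):
        self.points = sorted((point(n), n) for n in nodes)
        self.hashes = [p for p, _ in self.points]

    def node_for(self, key):
        # Walk clockwise from the key's point to the next node position.
        idx = bisect_right(self.hashes, point(key)) % len(self.points)
        return self.points[idx][1]

keys = ["k%d" % i for i in range(20)]
small = Ring(["A", "B", "D"])
big = Ring(["A", "B", "C", "D"])          # C joins the ring
before = {k: small.node_for(k) for k in keys}
after = {k: big.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys in the range C took over change owners; everything else
# stays put, which is the point of consistent hashing.
print(moved, all(after[k] == "C" for k in moved))
```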
45. Cassandra’s tunable consistency (write)
• ANY: ensure that the write has been written to at least 1 node, including HintedHandoff recipients.
• ONE: ensure that the write has been written to at least 1 replica's commit log and memory table before responding to the client.
• TWO: ensure that the write has been written to at least 2 replicas before responding to the client.
• THREE: ensure that the write has been written to at least 3 replicas before responding to the client.
• QUORUM: ensure that the write has been written to N / 2 + 1 replicas before responding to the client.
• LOCAL_QUORUM: ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes within the local datacenter (requires NetworkTopologyStrategy).
• EACH_QUORUM: ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes in each datacenter (requires NetworkTopologyStrategy).
• ALL: ensure that the write is written to all N replicas before responding to the client. Any unresponsive replicas will fail the operation.
Source: wiki
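The N / 2 + 1 arithmetic behind QUORUM is simple but worth spelling out, since it is what makes a quorum read guaranteed to overlap a quorum write. This is a sketch of the arithmetic, not Cassandra code:

```python
def quorum(n):
    # A strict majority of the n replicas: n // 2 + 1.
    return n // 2 + 1

for n in (1, 2, 3, 5):
    w = r = quorum(n)
    # R + W > N guarantees every quorum read overlaps the latest
    # quorum write in at least one replica.
    print(n, w, r + w > n)
```

With a replication factor of 3, for example, a QUORUM write needs 2 acknowledgements, and a QUORUM read also touches 2 of the 3 replicas, so the two sets always share at least one node.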
46. Cassandra’s data model structure
• Think of Cassandra as row-oriented
[Diagram: a keyspace (settings, e.g. partitioner) contains column families (settings, e.g. comparator, type [Std]); a column family contains columns, each a (name, value, clock) triple]
Source: http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applications%20with%20Cassandra%20Presentation.ppt
47. Data Model – “flexible” scheme!
ColumnFamily: Rockets
Key 1: name = Rocket-Powered Roller Skates; toon = Ready, Set, Zoom; inventoryQty = 5; brakes = false
Key 2: name = Little Giant Do-It-Yourself Rocket-Sled Kit; toon = Beep Prepared; inventoryQty = 4; brakes = false
Key 3: name = Acme Jet Propelled Unicycle; toon = Hot Rod and Reel; inventoryQty = 1; wheels = 1
Source: http://wenku.baidu.com/view/6e254321482fb4daa58d4b87.html
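The "flexible schema" idea maps naturally onto a dict of dicts: a column family is a map from row key to that row's own set of columns. This is a conceptual sketch of the data above, not Cassandra's storage format:

```python
# Each row carries its own set of columns; row 3 has "wheels" where
# rows 1 and 2 have "brakes" -- no table-wide schema is enforced.
rockets = {
    1: {"name": "Rocket-Powered Roller Skates", "toon": "Ready, Set, Zoom",
        "inventoryQty": 5, "brakes": False},
    2: {"name": "Little Giant Do-It-Yourself Rocket-Sled Kit",
        "toon": "Beep Prepared", "inventoryQty": 4, "brakes": False},
    3: {"name": "Acme Jet Propelled Unicycle", "toon": "Hot Rod and Reel",
        "inventoryQty": 1, "wheels": 1},
}

# Adding a new attribute to one row disturbs nothing else -- no
# "import-export", in the Variability sense from the 4 V's slide.
rockets[3]["color"] = "red"
print(sorted(rockets[3]))
```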
48. Cassandra's CQL - Cassandra Query Language
• SQL-like. Examples:
CREATE KEYSPACE test WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor=1;
CREATE INDEX ON users (birth_date);
SELECT * FROM users WHERE state='UT' AND birth_date > 1970;
• However:
• No Joins
• No UPDATES/DELETES
49. NoSQL benchmark – for scale!
Source: research.yahoo.com/files/ycsb-v4.pdf
50. Can we live with NoSQL limitations?
• Facebook has dropped Cassandra
• “..we found Cassandra's eventual consistency model to be a
difficult pattern to reconcile for our new Messages
infrastructure”
• Facebook selected HBase (a columnar DBMS) instead.
http://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919
51. What about other NoSQL DBMS?
• MongoDB
• Hbase
• CouchDB
• Maybe next session….
52. Big Data potential implications on IT
• Will traditional RDBMS be obsolete? Surely not!
• Several areas are Big Data zone by definition – Internet
marketing, Cyber, DW, etc.
• How well can we live with “Eventually Consistent” which in
most cases means 1-2 minutes delay?!
• Can we define that all batch data can live well on Big Data
technologies?
• Will we see in the end (10 years from now) that only a small
portion of data still resides on RDBMS and most of the data
resides on Big Data technologies?!
53. Big data challenges
• NLP in Hebrew (entity recognition is more difficult)
• Adapting analytical algorithms to match big data world
(Anomaly detection needs to be redefined)
• Some problems with consistency
• Skills problem - BI staff need to program in Java and to have
Hadoop and NoSQL knowledge
54. Example of big data technology: SPLUNK
• Splunk is a traditional IT vendor whose product has been based
on MapReduce (since 2009)
55. Thanks for your patience - hope you enjoyed it!
The latest version of this presentation is available at http://www.slideshare.net/pini