So Happy Together
 BigTable + Dynamo
 Semi-structured data model
 Decentralized – no special roles
 Ridiculously fast writes, fast reads
 Tunably consistent
 Cross-DC capable
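
"Tunably consistent" can be made concrete with the usual Dynamo-style overlap rule: with N replicas, a read that contacts R of them is guaranteed to see the latest acknowledged write to W of them when R + W > N. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and nothing here calls Cassandra.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n
```

With N = 3, reading and writing at a single replica trades that guarantee for latency, while quorum reads plus quorum writes (2 + 2 > 3) keep it.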
 You design your data model based on your query model
 Real-time ad-hoc queries aren’t viable
 Secondary indexes help (0.7)
 What about analytics?
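
As a rough illustration of query-first modeling, the sketch below writes each record once per query it has to serve, so every read is a single-row lookup. Plain dicts stand in for column families; the names `sightings_by_state`, `sightings_by_shape`, and `record_sighting` are hypothetical and not part of any Cassandra API.

```python
from collections import defaultdict

# Plain dicts stand in for column families: row key -> {column name: value}.
sightings_by_state = defaultdict(dict)  # serves "all sightings in a state"
sightings_by_shape = defaultdict(dict)  # serves "all sightings of a shape"


def record_sighting(sighting_id, sighted_at, state, shape):
    """Write one sighting into every structure that a known query reads."""
    column = f"{sighted_at}:{sighting_id}"  # date prefix keeps columns time-ordered
    sightings_by_state[state][column] = shape
    sightings_by_shape[shape][column] = state


def sightings_in_state(state):
    """Single-row read; no scan across other rows is needed."""
    return sorted(sightings_by_state.get(state, {}).items())
```

Any question that was not planned for (for example, sightings grouped by duration) has no matching row to read, which is the gap that batch analytics fills.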
 Hadoop has analytics
 MapReduce
 Pig/Hive and other tools built above MapReduce
 Configurable data sources/destinations
 Many already familiar with it
 Active community
 Always able to output to Cassandra directly
 0.6
 ColumnFamilyInputFormat
 Pig support – Cassandra LoadFunc
 0.7
 ColumnFamilyOutputFormat
 Hadoop Streaming Output
 Streamlined configuration
 Recipe
 Overlay Hadoop on top of Cassandra
 Separate server for name node and job tracker
 Co-locate task trackers with Cassandra nodes
 Add data nodes to taste
 Voilà
 Data locality
 Analytics engine scales with data
 Example
 Cassandra specific InputFormat
 Configuration – ConfigHelper, Hadoop variables
 InputSplits over the data – tunable
 Example usage in contrib/word_count
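
To illustrate the idea of tunable input splits that carry replica locations, here is a minimal sketch. It is not the ColumnFamilyInputFormat implementation: the `Split` type, `make_splits`, `pick_host`, and the integer token range are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Split:
    start_token: int      # exclusive lower bound of the range
    end_token: int        # inclusive upper bound of the range
    locations: List[str]  # hosts that hold a replica of this range


def make_splits(start: int, end: int, replicas: List[str],
                rows_in_range: int, split_size: int) -> List[Split]:
    """Cut the token range (start, end] into pieces of about split_size rows."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    pieces = max(1, -(-rows_in_range // split_size))  # ceiling division
    width = end - start
    bounds = [start + width * i // pieces for i in range(pieces + 1)]
    return [Split(bounds[i], bounds[i + 1], list(replicas)) for i in range(pieces)]


def pick_host(split: Split, task_trackers: List[str]) -> Optional[str]:
    """Prefer a task tracker that runs on a host holding a replica."""
    for host in split.locations:
        if host in task_trackers:
            return host
    return None  # no local tracker; the scheduler would fall back to a remote read
```

A smaller `split_size` yields more, shorter map tasks; the `locations` list is what lets the scheduler place each task next to its data.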
 OutputFormat
 Configuration – ConfigHelper, Hadoop variables
 Batches output – tunable
 Don’t have to use the Cassandra API directly
 Some optimizations (e.g. ConsistencyLevel.ONE)
 Example usage in contrib/word_count
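
Batched output with a tunable batch size can be sketched as follows. This is an illustration of the buffering pattern only, not the ColumnFamilyOutputFormat code: `BatchingWriter`, the `send` callback, and the default of 64 are hypothetical choices for the example.

```python
from typing import Callable, Dict, List, Tuple

Mutation = Tuple[str, Dict[str, str]]  # (row key, {column name: value})


class BatchingWriter:
    """Buffers mutations and hands them to `send` in groups of `batch_size`."""

    def __init__(self, send: Callable[[List[Mutation]], None], batch_size: int = 64):
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._send = send              # stands in for one batched write to the cluster
        self._batch_size = batch_size  # arbitrary default, chosen for the sketch
        self._buffer: List[Mutation] = []

    def write(self, row_key: str, columns: Dict[str, str]) -> None:
        self._buffer.append((row_key, dict(columns)))
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._send(self._buffer)
            self._buffer = []  # start a fresh list; the sent batch is left untouched

    def close(self) -> None:
        self.flush()  # do not lose a final, partially filled batch
```

A reducer would call `write` once per output row and `close` at the end of the task; larger batches mean fewer round trips at the cost of more buffered memory.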
 60,000+ Documented UFO Sightings
 Data set from http://infochimps.com
sighted_at | reported_at | location | shape | duration | description
19951009 | 19951009 | Iowa City, IA | | | Man repts. Witnessing “flash, followed by a classic UFO, w/ a tailfin at back.” …
19940801 | 19950220 | Renton, WA | | | Man repts. seeing 2x large ships hovering in night sky while using Russian-made night binoculars.
19970111 | 19970111 | St. Cloud, MN | pyramid | 2 min. | Summary : Right when me and my friend left my house we saw a bright green glowing object that looked like a 4 sided pyramid then after about 2 min it took off straight into the sky leaving a yellow trail behind it…
 What about languages outside of Java?
 Build on what Hadoop uses - Streaming
 Output streaming in 0.7.0
 Example in contrib/hadoop_streaming_output
 Input streaming in progress, likely 0.7.1
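
Hadoop Streaming runs any executable as a mapper or reducer: records arrive on stdin one per line, key/value pairs are written to stdout separated by a tab, and the reducer sees its input sorted by key. The sketch below counts sightings per shape under that convention. It assumes tab-separated input in the column order of the UFO data set above, and it does not show the Cassandra-specific output encoding (see contrib/hadoop_streaming_output for that).

```python
from itertools import groupby

# Column order assumed from the UFO data set; tab-separated input is an assumption.
FIELDS = ["sighted_at", "reported_at", "location", "shape", "duration", "description"]


def map_lines(lines):
    """Mapper: emit 'shape<TAB>1' for each well-formed sighting record."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != len(FIELDS):
            continue  # skip malformed records
        shape = parts[FIELDS.index("shape")].strip() or "unknown"
        yield f"{shape}\t1"


def reduce_lines(lines):
    """Reducer: input is sorted by key, so equal shapes are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for shape, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{shape}\t{sum(int(count) for _, count in group)}"
```

In a real job each function would sit in its own script that iterates over `sys.stdin` and prints each yielded line.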
 Developed at Yahoo!
 PigLatin/Grunt shell
 Powerful scripting language for analytics
 Example usage in contrib/pig
 Configuration – Hadoop/Env variables
 Raptr.com
 Home grown solution -> Cassandra + Hadoop
 Query time: hours -> minutes
 Pig obviated their need for multi-lingual MR
 Speed and ease are enabling
 Imagini/Visual DNA
 US Government (Digital Reasoning)
 See http://github.com/digitalreasoning/PyStratus
 Hive support in progress (HIVE-1434)
 Hadoop Input Streaming (likely 0.7.1)
 Performance improvements
 Hadoop analytics for Cassandra
 Data locality for processing
 Scales with the cluster
 More information
 http://cassandra.apache.org
 http://wiki.apache.org/cassandra/HadoopSupport
 Cassandra: The Definitive Guide
 About me:
 jeremy.hanna@rackspace.com
 @jeromatron on Twitter
 jeromatron on IRC in #cassandra

Cassandra + Hadoop @ApacheCon


Editor's Notes

  1. Talk a little about background of the theme – hippies, The Turtles, readability.
  2. Mention Jeff Hodges, Johan, Stu, and Todd Lipcon.
  3. Mention how InputSplit works and how it can choose among replicas – array of locations returned.
  4. Highlight how this is the same extension point that is used with HDFS, HBase and any other data source/destination for MapReduce.
  5. In other words, are people using this stuff in the real world? In production? Put some notes in here about Raptr’s and Imagini’s use cases.