SlideShare une entreprise Scribd logo
1  sur  109
Télécharger pour lire hors ligne
Ephemeral	
  Hadoop	
  Clusters	
  in	
  the	
  Cloud	
  




                                               [1]	
  


                Greg	
  Fodor,	
  Etsy	
  
               gfodor@etsy.com	
  
about	
  me	
  
 gfodor@etsy.com	
  
     @gfodor	
  
   Data	
  Wrangler	
  
about	
  etsy	
  
the	
  world’s	
  handmade	
  marketplace	
  
total	
  members:	
  9,000,000	
  
total	
  acHve	
  shops:	
  800,000	
  
     items	
  listed:	
  9.5M	
  
page	
  views	
  per	
  month:	
  >1B	
  
    2010	
  sales:	
  $314.3M	
  
lots	
  of	
  data	
  
about	
  this	
  talk	
  
ephemeral?	
  
[5]	
  
“elasHc”	
  to	
  the	
  extreme	
  
how	
  did	
  we	
  get	
  here?	
  
wanted	
  to	
  dip	
  our	
  toes	
  
        stop	
  hiWng	
  the	
  database	
  
          stop	
  grepping	
  log	
  files	
  
2	
  data	
  sources	
  -­‐>	
  S3	
  
database	
  snapshots	
  

                          input:	
  
                       nightly	
  diffs	
  
(SELECT	
  *	
  FROM	
  <table>	
  WHERE	
  update_date	
  >	
  1	
  day	
  ago)	
  




               output:	
  
full	
  tables	
  as	
  sequence	
  files	
  
visit	
  logs	
  
      input:	
  
akamai	
  access	
  logs	
  
 (event	
  beacons)	
  

       output:	
  
 [visit_id,	
  [event]]	
  
processing	
  the	
  data	
  
[2]	
  
data	
  flow	
  
 joins,	
  group	
  bys,	
  etc.	
  
cascading	
  
      Chris	
  Wensel	
  
hhp://www.cascading.org/	
  
great	
  implementaHon	
  
Java	
  syntax	
  




                 [10]	
  
cascading.jruby	
  
Grégoire	
  Marabout	
  (Qualtera),	
  Mah	
  Walker	
  (Etsy),	
  Stefan	
  Karpinski	
  (Etsy),	
  Steve	
  Mardenfeld	
  (Etsy)	
  

                                               github:	
  hhp://bit.ly/o3DNtC	
  
                                               blog:	
  hhp://etsy.me/cFytuL	
  
“push”	
  job	
  binaries	
  to	
  S3	
  

run	
  on	
  ElasHc	
  Map/Reduce	
  
           starts	
  cluster,	
  runs,	
  shuts	
  down	
  




    access	
  results	
  on	
  S3	
  
next	
  project:	
  
shop	
  recommendaHons	
  
3	
  steps:	
  
✔ data	
  preparaHon	
  -­‐	
  Cascading	
  
     ✖ analysis/training	
  
           ✖ predicHon	
  
sparse	
  implementaHon	
  of	
  SVD	
  
3	
  steps:	
  
✔ data	
  preparaHon	
  -­‐	
  Cascading	
  
 ✖ analysis/training	
  -­‐	
  MATLAB	
  
   ✖ predicHon	
  -­‐	
  MATLAB	
  
“MATLAB,	
  in	
  my	
  	
  
Hadoop	
  cluster?”	
  
hadoop	
  streaming	
  
arbitrary	
  scripts	
  for	
  map	
  &	
  reduce	
  
Swiss	
  army	
  knife	
  
Full	
  dataset	
  analysis	
  
Matlab,	
  Ruby	
  scripts	
  



‘ArHfact’	
  outputs	
  
Tokyo	
  Cabinet,	
  Lucene,	
  SQLite	
  



Side-­‐effects	
  
MySQL,	
  CloudFront	
                                            [3]	
  
3	
  steps:	
  
✔ data	
  preparaHon	
  -­‐	
  Cascading	
  
✔ analysis/training	
  -­‐	
  MATLAB	
  
   ✔ predicHon	
  -­‐	
  MATLAB	
  
Job	
  2	
     Job	
  1	
  




                              [4]	
  
Barnum	
  
Sinatra	
  web	
  service	
  on	
  EC2	
  
barnum	
  starts	
  job	
  and	
  passes	
  
             callback	
  URL	
  

  when	
  job	
  finishes,	
  hadoop	
  hits	
  
callback	
  URL	
  to	
  barnum	
  to	
  proceed	
  
Barnum	
  constructs	
  
3	
  steps:	
  
✔ data	
  preparaHon	
  -­‐	
  Cascading	
  
✔ analysis/training	
  -­‐	
  MATLAB	
  
   ✔ predicHon	
  -­‐	
  MATLAB	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
suggested_shops.yaml:	
  
geWng	
  data	
  back	
  to	
  web	
  stack?	
  
[6]	
  
v1	
  
ad-­‐hoc	
  shell	
  scripts	
  
TSV	
  into	
  unsharded	
  MySQL	
  
          not	
  re-­‐usable	
  



                                        [6]	
  
v2	
  
datasets	
  are	
  versioned	
  based	
  upon	
  
        job	
  execuHon	
  Hme	
  
MySQL	
  Tables:	
  




Memcache	
  Cluster:	
  
Output	
  dataset	
  <-­‐>	
  ORM	
  Model	
  
PHP:	
  
PHP:	
  




Cascading:	
  
PHP:	
  




Cascading:	
  




PHP:	
  
Old	
  tables	
  regularly	
  dropped	
  
how	
  we’re	
  using	
  this	
  stack	
  


analyHcs	
                   products	
  
   (internal)	
                 (external)	
  
analyHcs	
  
products	
  
search	
  quality	
  
recommendaHons	
  
May	
  2011:	
  	
  
4,926	
  successful	
  job	
  runs	
  
[5]	
  
scale	
  up	
  from	
  zero	
  
isolaHon	
  
isolaHon	
  across	
  runs	
  
       fresh	
  machine	
  each	
  Hme	
  
isolaHon	
  between	
  developers	
  
              no	
  toe-­‐stepping	
  
heterogeneous	
  clusters	
  
big	
  RAM	
  when	
  you	
  need	
  it	
  
            (but	
  not	
  when	
  you	
  don’t)	
  
need	
  one	
  machine?	
  	
  
 use	
  one	
  machine.	
  
wriHng	
  jobs	
  
PHENOMENAL	
  
  COSMIC	
  
  POWERS	
  

                 [7]	
  
prototyping	
  
run	
  slow,	
  unopHmized	
  version	
  on	
  500	
  machine	
  for	
  <	
  $100	
  
parameter	
  tuning	
  
Try	
  N=1,	
  2,	
  5,	
  10	
  and	
  see	
  which	
  results	
  in	
  best	
  output	
  
[9]	
  
quesHons?	
  
photo	
  credits	
  
[1]	
  by	
  elfike	
  hhp://www.flickr.com/photos/elfike/157439707/	
  
[2]	
  by	
  Dan4th	
  hhp://www.flickr.com/photos/43264265@N00/5371557240/	
  
[3]	
  by	
  mandolux	
  	
  hhp://www.flickr.com/photos/73935252@N00/34418046/	
  
[4]	
  by	
  The	
  Suss-­‐Man	
  hhp://www.flickr.com/photos/8692813@N06/4580254188/	
  
[5]	
  by	
  Stephen	
  Rees	
  hhp://www.flickr.com/photos/60142746@N00/214461223/	
  
[6]	
  by	
  Let	
  Ideas	
  Compete	
  
hhp://www.flickr.com/photos/quesHon_everything/3414827746/	
  
[7]	
  by	
  funkandjazz	
  hhp://www.flickr.com/photos/phunk/2484159004/	
  
[8]	
  by	
  ViaMoi	
  hhp://www.flickr.com/photos/12187843@N07/3343619603/	
  
[9]	
  by	
  kreg.steppe	
  hhp://www.flickr.com/photos/spyndle/500305000/	
  
[10]	
  clipart	
  (really)	
  
[11]	
  by	
  Chris	
  Pirillo	
  hhp://www.flickr.com/photos/49503157467@N01/34588230/	
  

Contenu connexe

Tendances

How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...Jos Boumans
 
Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsHadoop User Group
 
scalable machine learning
scalable machine learningscalable machine learning
scalable machine learningSamir Bessalah
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talksyhadoop
 
Prototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPeter Skomoroch
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reducerantav
 
[4DEV][Łódź] Ivan Vaskevych - InfluxDB and Grafana fighting together with IoT...
[4DEV][Łódź] Ivan Vaskevych - InfluxDB and Grafana fighting together with IoT...[4DEV][Łódź] Ivan Vaskevych - InfluxDB and Grafana fighting together with IoT...
[4DEV][Łódź] Ivan Vaskevych - InfluxDB and Grafana fighting together with IoT...PROIDEA
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Kevin Weil
 
Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...
Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...
Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...Dataiku
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
 
Hadoop & Hive Change the Data Warehousing Game Forever
Hadoop & Hive Change the Data Warehousing Game ForeverHadoop & Hive Change the Data Warehousing Game Forever
Hadoop & Hive Change the Data Warehousing Game ForeverDataWorks Summit
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupHadoop User Group
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesSadayuki Furuhashi
 
Hadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドHadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドTatsuya Sasaki
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGAdam Kawa
 

Tendances (20)

How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...
 
Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-tools
 
Cascalog internal dsl_preso
Cascalog internal dsl_presoCascalog internal dsl_preso
Cascalog internal dsl_preso
 
scalable machine learning
scalable machine learningscalable machine learning
scalable machine learning
 
20160908 hivemall meetup
20160908 hivemall meetup20160908 hivemall meetup
20160908 hivemall meetup
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Elasticwulf Pycon Talk
Elasticwulf Pycon TalkElasticwulf Pycon Talk
Elasticwulf Pycon Talk
 
Prototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.org
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
[4DEV][Łódź] Ivan Vaskevych - InfluxDB and Grafana fighting together with IoT...
[4DEV][Łódź] Ivan Vaskevych - InfluxDB and Grafana fighting together with IoT...[4DEV][Łódź] Ivan Vaskevych - InfluxDB and Grafana fighting together with IoT...
[4DEV][Łódź] Ivan Vaskevych - InfluxDB and Grafana fighting together with IoT...
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)
 
Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...
Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...
Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Over...
 
MapReduce basic
MapReduce basicMapReduce basic
MapReduce basic
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 
Hadoop & Hive Change the Data Warehousing Game Forever
Hadoop & Hive Change the Data Warehousing Game ForeverHadoop & Hive Change the Data Warehousing Game Forever
Hadoop & Hive Change the Data Warehousing Game Forever
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user group
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Hadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドHadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッド
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
 

En vedette

Data mining for_product_search
Data mining for_product_searchData mining for_product_search
Data mining for_product_searchAaron Beppu
 
Transforming Search in the Digital Marketplace
Transforming Search in the Digital MarketplaceTransforming Search in the Digital Marketplace
Transforming Search in the Digital MarketplaceJason Davis
 
Responding to Outages Maturely
Responding to Outages MaturelyResponding to Outages Maturely
Responding to Outages MaturelyJohn Allspaw
 
Migrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without DowntimeMigrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without DowntimeMatt Graham
 
Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013
Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013
Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013Gregg Donovan
 
Solr & Lucene @ Etsy by Gregg Donovan
Solr & Lucene @ Etsy by Gregg DonovanSolr & Lucene @ Etsy by Gregg Donovan
Solr & Lucene @ Etsy by Gregg DonovanGregg Donovan
 
Resilient Response In Complex Systems
Resilient Response In Complex SystemsResilient Response In Complex Systems
Resilient Response In Complex SystemsJohn Allspaw
 
Outages, PostMortems, and Human Error
Outages, PostMortems, and Human ErrorOutages, PostMortems, and Human Error
Outages, PostMortems, and Human ErrorJohn Allspaw
 
Scaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightScaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightRoss Snyder
 
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012Nick Galbreath
 
Code as Craft: Building a Strong Engineering Culture at Etsy
Code as Craft: Building a Strong Engineering Culture at EtsyCode as Craft: Building a Strong Engineering Culture at Etsy
Code as Craft: Building a Strong Engineering Culture at EtsyChad Dickerson
 

En vedette (13)

Solr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene EuroconSolr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene Eurocon
 
Data mining for_product_search
Data mining for_product_searchData mining for_product_search
Data mining for_product_search
 
Transforming Search in the Digital Marketplace
Transforming Search in the Digital MarketplaceTransforming Search in the Digital Marketplace
Transforming Search in the Digital Marketplace
 
Responding to Outages Maturely
Responding to Outages MaturelyResponding to Outages Maturely
Responding to Outages Maturely
 
Migrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without DowntimeMigrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without Downtime
 
Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013
Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013
Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013
 
DevTools at Etsy
DevTools at EtsyDevTools at Etsy
DevTools at Etsy
 
Solr & Lucene @ Etsy by Gregg Donovan
Solr & Lucene @ Etsy by Gregg DonovanSolr & Lucene @ Etsy by Gregg Donovan
Solr & Lucene @ Etsy by Gregg Donovan
 
Resilient Response In Complex Systems
Resilient Response In Complex SystemsResilient Response In Complex Systems
Resilient Response In Complex Systems
 
Outages, PostMortems, and Human Error
Outages, PostMortems, and Human ErrorOutages, PostMortems, and Human Error
Outages, PostMortems, and Human Error
 
Scaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightScaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went Right
 
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
 
Code as Craft: Building a Strong Engineering Culture at Etsy
Code as Craft: Building a Strong Engineering Culture at EtsyCode as Craft: Building a Strong Engineering Culture at Etsy
Code as Craft: Building a Strong Engineering Culture at Etsy
 

Similaire à Emphemeral hadoop clusters in the cloud

First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNAFirst Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNATomas Cervenka
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightGert Drapers
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbMongoDB APAC
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceDataWorks Summit
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatternsgrepalex
 
(GAM303) Riot Games: Migrating Mountains of Data to AWS
(GAM303) Riot Games: Migrating Mountains of Data to AWS(GAM303) Riot Games: Migrating Mountains of Data to AWS
(GAM303) Riot Games: Migrating Mountains of Data to AWSAmazon Web Services
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionCloudera, Inc.
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Sujee Maniyam
 
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Holden Karau
 
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...StampedeCon
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderSadayuki Furuhashi
 
RubyEnRails2007 - Dr Nic Williams - Keynote
RubyEnRails2007 - Dr Nic Williams - KeynoteRubyEnRails2007 - Dr Nic Williams - Keynote
RubyEnRails2007 - Dr Nic Williams - KeynoteDr Nic Williams
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptSanket Shikhar
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceobdit
 
Hivemall tech talk at Redwood, CA
Hivemall tech talk at Redwood, CAHivemall tech talk at Redwood, CA
Hivemall tech talk at Redwood, CAMakoto Yui
 
DrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performanceDrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performanceAshok Modi
 
Capacity Management from Flickr
Capacity Management from FlickrCapacity Management from Flickr
Capacity Management from Flickrxlight
 

Similaire à Emphemeral hadoop clusters in the cloud (20)

First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNAFirst Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
 
NYC_2016_slides
NYC_2016_slidesNYC_2016_slides
NYC_2016_slides
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatterns
 
(GAM303) Riot Games: Migrating Mountains of Data to AWS
(GAM303) Riot Games: Migrating Mountains of Data to AWS(GAM303) Riot Games: Migrating Mountains of Data to AWS
(GAM303) Riot Games: Migrating Mountains of Data to AWS
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An Introduction
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2
 
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
 
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loader
 
RubyEnRails2007 - Dr Nic Williams - Keynote
RubyEnRails2007 - Dr Nic Williams - KeynoteRubyEnRails2007 - Dr Nic Williams - Keynote
RubyEnRails2007 - Dr Nic Williams - Keynote
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduce
 
Hivemall tech talk at Redwood, CA
Hivemall tech talk at Redwood, CAHivemall tech talk at Redwood, CA
Hivemall tech talk at Redwood, CA
 
DrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performanceDrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performance
 
Capacity Management from Flickr
Capacity Management from FlickrCapacity Management from Flickr
Capacity Management from Flickr
 

Dernier

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Dernier (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Emphemeral hadoop clusters in the cloud