SlideShare une entreprise Scribd logo
1  sur  17
Hadoop gets Groovy
Steve Loughran– Hortonworks
stevel at hortonworks.com
@steveloughran

Berlin, June 2012




© Hortonworks Inc. 2012
Where are you in this diagram?
                                    Hadoop Skills
                                                      Doug,Owen
                                                      Arun, Jakob
   Groovy Skills



                                         @steveloughran




                     James Strachan
                     Guillamue Laforge
                                                               Page 2
          © Hortonworks Inc. 2012
Grumpy : Groovy Hadoop Library

• Something lightweight for testing

• Wanted to play in the M/R layer

• Already using Groovy

• Liked: JVM integration, tooling, libraries, IntelliJ
 IDEA, Books…



        git@github.com:steveloughran/grumpy.git


                                                         Page 3
      © Hortonworks Inc. 2012
What is Groovy?
A dynamic language within the JVM

• Java++
   –Maps, lists, tuples, Closures

• Flavours of Ruby and Python
    –'Duck' typing, Grails, (Scripting)



A way to do things in the JVM that Sun didn't imagine



                                                    Page 4
      © Hortonworks Inc. 2012
Can use & subclass java classes:
class LineCountMapper
  extends Mapper<LongWritable, Text, Text, IntWritable> {

static final def emitKey = new Text("lines")
static final def one = new IntWritable(1)

void map(LongWritable key,
           Text value,
           Mapper.Context context) {
    context.write(emitKey, one)
  }
}




                                                       Page 5
      © Hortonworks Inc. 2012
Closures & lists

class CountReducer2 extends Reducer {

    def reduce(Text k,
               Iterable values,
               Reducer.Context ctx) {

        def sum = values.collect() {it.get() }.sum()

        ctx.write(k, new IntWritable(sum));
    }

}




                                                       Page 6
          © Hortonworks Inc. 2012
Closures & lists

values.collect() {
    it.get()
  }.sum()

List<values> -> List<int> -> int




                                   Page 7
    © Hortonworks Inc. 2012
Result: MR jobs in Groovy
In:
gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17,Fleurball
gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20,vklaptop
gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:21,Franky Panky
gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:23,
gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:24,Vas
gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:32,Fleurball
gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:36,vklaptop
gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:37,Franky Panky
gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:38,Vas
gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:43,
gate1,2afaf990ce75f0a7208f7f012c8d12ad,,2006-10-30,16:06:54,Smiley



Out: 163,198,223 device sightings!




                                                                       Page 8
       © Hortonworks Inc. 2012
why no Pig? Sliding Window Debounce
void map(LongWritable key, BlueEvent event,
           Mapper.Context context) {

    BlueEvent ev2 = window.insert(event)
    List<BlueEvent> expired = window.purgeExpired(event)
    expired.each { evt ->
        emit(context, evt)
    }
}

void cleanup(Mapper.Context context) {
  window.each { evt ->
    emit(context, evt)
  }
}

                                                           Page 9
        © Hortonworks Inc. 2012
Device sightings by day for 2007

1600000




1400000




1200000




1000000




800000




600000




400000




200000




      0
          1   6   11   16   21   26   31   36   41   46   51   56   61   66   71   76   81   86   91   96   101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186 191 196 201 206 211 216 221 226 231 236 241 246 251 256 261 266 271 276 281 286 291 296 301 306 311 316 321 326 331 336 341 346 351 356 361 366




                                                                                                                                                                                                                                                                                                      Page 10
                                                      © Hortonworks Inc. 2012
Improving Hadoop APIs
Configuration.metaClass.setAt = { key, val ->
 set(key.toString(), val.toString())
}

Configuration.metaClass.getAt = { key ->
  get(key)
}

Configuration.metaClass.add = {map ->
  map.each {elt ->
    set((elt.key).toString(),
        (elt.value).toString() )
}


                                                Page 11
     © Hortonworks Inc. 2012
& Configuration gets better

conf['mapscript'] = new File(src).text

String scriptText = conf['mapscript']

conf.add([
  window:60000,
  'redscript':reduceScript
  ])



Extending to Job class trickier –subclassing better


                                                      Page 12
     © Hortonworks Inc. 2012
New today! script driven MR jobs!
protected void setup(Mapper.Context ctx) {
   this.ctx = ctx
   this.conf = ctx.configuration
   ScriptCompiler comp = new ScriptCompiler(conf)
   String scriptText = conf['mapscript']
   map = comp.parse(scriptText, this, ctx)
 }

 protected void map(Writable key, Writable value,
    Mapper.Context ctx) {
   map.setProperty('key',key)
   map.setProperty('value',value)
   map.run()
 }



                                                    Page 13
     © Hortonworks Inc. 2012
Things to consider
• Performance: Groovy 2 on Java7
• 'False friends' -Types, if(), exceptions

• If you can use Pig, use it.
• Use Groovy for testing, extending Hadoop
  classes (output formatter, etc)
• Play with YARN and Giraph with it




                                             Page 14
     © Hortonworks Inc. 2012
Questions?

hortonworks.com




                             Page 15
   © Hortonworks Inc. 2012
hortonworks.com




                             Page 16
   © Hortonworks Inc. 2012
Performance?
• Groovy 1 over-introspects
• HLL hides a lot of overhead



• If your work is I/O bound, less important
• Speed of development vs execution
• Need to benchmark on Java 7




                                              Page 17
     © Hortonworks Inc. 2012

Contenu connexe

Tendances

Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOAltinity Ltd
 
Hadoop institutes in hyderabad
Hadoop institutes in hyderabadHadoop institutes in hyderabad
Hadoop institutes in hyderabadKelly Technologies
 
Trading volume mapping R in recent environment
Trading volume mapping R in recent environment Trading volume mapping R in recent environment
Trading volume mapping R in recent environment Nagi Teramo
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOAltinity Ltd
 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansattilacsordas
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovAltinity Ltd
 
Hive Percona 2009
Hive Percona 2009Hive Percona 2009
Hive Percona 2009prasadc
 
Distributed caching and computing v3.7
Distributed caching and computing v3.7Distributed caching and computing v3.7
Distributed caching and computing v3.7Rahul Gupta
 
GoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesGoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesDataWorks Summit/Hadoop Summit
 
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry PiMonitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry PiInfluxData
 
Hadoop & Hive Change the Data Warehousing Game Forever
Hadoop & Hive Change the Data Warehousing Game ForeverHadoop & Hive Change the Data Warehousing Game Forever
Hadoop & Hive Change the Data Warehousing Game ForeverDataWorks Summit
 
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...Altinity Ltd
 
NoSQL @ CodeMash 2010
NoSQL @ CodeMash 2010NoSQL @ CodeMash 2010
NoSQL @ CodeMash 2010Ben Scofield
 
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesWebinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesAltinity Ltd
 
Apache Spark Internals - Part 2
Apache Spark Internals - Part 2Apache Spark Internals - Part 2
Apache Spark Internals - Part 2Jéferson Machado
 

Tendances (20)

Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
 
Hadoop institutes in hyderabad
Hadoop institutes in hyderabadHadoop institutes in hyderabad
Hadoop institutes in hyderabad
 
Trading volume mapping R in recent environment
Trading volume mapping R in recent environment Trading volume mapping R in recent environment
Trading volume mapping R in recent environment
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticians
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
 
2008 Ur Tech Talk Zshao
2008 Ur Tech Talk Zshao2008 Ur Tech Talk Zshao
2008 Ur Tech Talk Zshao
 
Indexed Hive
Indexed HiveIndexed Hive
Indexed Hive
 
Hive Percona 2009
Hive Percona 2009Hive Percona 2009
Hive Percona 2009
 
Distributed caching and computing v3.7
Distributed caching and computing v3.7Distributed caching and computing v3.7
Distributed caching and computing v3.7
 
GoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesGoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with Dependencies
 
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry PiMonitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
 
Hadoop & Hive Change the Data Warehousing Game Forever
Hadoop & Hive Change the Data Warehousing Game ForeverHadoop & Hive Change the Data Warehousing Game Forever
Hadoop & Hive Change the Data Warehousing Game Forever
 
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
 
NoSQL @ CodeMash 2010
NoSQL @ CodeMash 2010NoSQL @ CodeMash 2010
NoSQL @ CodeMash 2010
 
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesWebinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
 
Apache Spark Internals - Part 2
Apache Spark Internals - Part 2Apache Spark Internals - Part 2
Apache Spark Internals - Part 2
 
Presto in Treasure Data
Presto in Treasure DataPresto in Treasure Data
Presto in Treasure Data
 

En vedette

Strategic review (Sample)
Strategic review (Sample)Strategic review (Sample)
Strategic review (Sample)guestbbb20c4
 
mpx Replay, Expedite Your Catch-Up and C3 Workflow 2 of 2
mpx Replay, Expedite Your Catch-Up and C3 Workflow 2 of 2mpx Replay, Expedite Your Catch-Up and C3 Workflow 2 of 2
mpx Replay, Expedite Your Catch-Up and C3 Workflow 2 of 2thePlatform
 
Diarrhea:Myths and facts, Precaution
Diarrhea:Myths and facts, Precaution Diarrhea:Myths and facts, Precaution
Diarrhea:Myths and facts, Precaution Wuzna Haroon
 
Energy Strategy Group_Report 2012 efficienza energetica
Energy Strategy Group_Report 2012 efficienza energeticaEnergy Strategy Group_Report 2012 efficienza energetica
Energy Strategy Group_Report 2012 efficienza energeticaEugenio Bacile di Castiglione
 
Alta White Paper D2C eCommerce Case Study 2016
Alta White Paper D2C eCommerce Case Study 2016Alta White Paper D2C eCommerce Case Study 2016
Alta White Paper D2C eCommerce Case Study 2016Patrick Nicholson
 
Secure PIN Management How to Issue and Change PINs Securely over the Web
Secure PIN Management How to Issue and Change PINs Securely over the WebSecure PIN Management How to Issue and Change PINs Securely over the Web
Secure PIN Management How to Issue and Change PINs Securely over the WebSafeNet
 
Enterprise workspaces - Extending SAP NetWeaver Portal capabilities
Enterprise workspaces - Extending SAP NetWeaver Portal capabilities Enterprise workspaces - Extending SAP NetWeaver Portal capabilities
Enterprise workspaces - Extending SAP NetWeaver Portal capabilities SAP Portal
 

En vedette (16)

Teamcenter – sap integration gateway
Teamcenter – sap integration gatewayTeamcenter – sap integration gateway
Teamcenter – sap integration gateway
 
Strategic review (Sample)
Strategic review (Sample)Strategic review (Sample)
Strategic review (Sample)
 
Advanced Work Packaging in Construction: An Introduction
Advanced Work Packaging in Construction: An IntroductionAdvanced Work Packaging in Construction: An Introduction
Advanced Work Packaging in Construction: An Introduction
 
S&OP Process
S&OP ProcessS&OP Process
S&OP Process
 
mpx Replay, Expedite Your Catch-Up and C3 Workflow 2 of 2
mpx Replay, Expedite Your Catch-Up and C3 Workflow 2 of 2mpx Replay, Expedite Your Catch-Up and C3 Workflow 2 of 2
mpx Replay, Expedite Your Catch-Up and C3 Workflow 2 of 2
 
Diarrhea:Myths and facts, Precaution
Diarrhea:Myths and facts, Precaution Diarrhea:Myths and facts, Precaution
Diarrhea:Myths and facts, Precaution
 
Energy Strategy Group_Report 2012 efficienza energetica
Energy Strategy Group_Report 2012 efficienza energeticaEnergy Strategy Group_Report 2012 efficienza energetica
Energy Strategy Group_Report 2012 efficienza energetica
 
Nt1310 project
Nt1310 projectNt1310 project
Nt1310 project
 
Alta White Paper D2C eCommerce Case Study 2016
Alta White Paper D2C eCommerce Case Study 2016Alta White Paper D2C eCommerce Case Study 2016
Alta White Paper D2C eCommerce Case Study 2016
 
Information från Läkemedelsverket #5 2013
Information från Läkemedelsverket #5 2013Information från Läkemedelsverket #5 2013
Information från Läkemedelsverket #5 2013
 
"15 Business Story Ideas to Jump on Now"
"15 Business Story Ideas to Jump on Now""15 Business Story Ideas to Jump on Now"
"15 Business Story Ideas to Jump on Now"
 
Credit cards
Credit cardsCredit cards
Credit cards
 
cathy resume
cathy resumecathy resume
cathy resume
 
Secure PIN Management How to Issue and Change PINs Securely over the Web
Secure PIN Management How to Issue and Change PINs Securely over the WebSecure PIN Management How to Issue and Change PINs Securely over the Web
Secure PIN Management How to Issue and Change PINs Securely over the Web
 
Enterprise workspaces - Extending SAP NetWeaver Portal capabilities
Enterprise workspaces - Extending SAP NetWeaver Portal capabilities Enterprise workspaces - Extending SAP NetWeaver Portal capabilities
Enterprise workspaces - Extending SAP NetWeaver Portal capabilities
 
Basics of Coding in Pediatrics Medical Billing
Basics of Coding in Pediatrics Medical BillingBasics of Coding in Pediatrics Medical Billing
Basics of Coding in Pediatrics Medical Billing
 

Similaire à Hadoop gets Groovy

The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightGert Drapers
 
Big Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your BrowserBig Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your Browsergethue
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
LISA Qooxdoo Tutorial Handouts
LISA Qooxdoo Tutorial HandoutsLISA Qooxdoo Tutorial Handouts
LISA Qooxdoo Tutorial HandoutsTobias Oetiker
 
Writing Hadoop Jobs in Scala using Scalding
Writing Hadoop Jobs in Scala using ScaldingWriting Hadoop Jobs in Scala using Scalding
Writing Hadoop Jobs in Scala using ScaldingToni Cebrián
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech ProjectsJody Garnett
 
Cascading - A Java Developer’s Companion to the Hadoop World
Cascading - A Java Developer’s Companion to the Hadoop WorldCascading - A Java Developer’s Companion to the Hadoop World
Cascading - A Java Developer’s Companion to the Hadoop WorldCascading
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRick Copeland
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the CloudNick Dimiduk
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerancePallav Jha
 
Taste Java In The Clouds
Taste Java In The CloudsTaste Java In The Clouds
Taste Java In The CloudsJacky Chu
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelt3rmin4t0r
 
Hadoop past, present and future
Hadoop past, present and futureHadoop past, present and future
Hadoop past, present and futureCodemotion
 

Similaire à Hadoop gets Groovy (20)

The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
Hadoop Jungle
Hadoop JungleHadoop Jungle
Hadoop Jungle
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Big Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your BrowserBig Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your Browser
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
LISA Qooxdoo Tutorial Handouts
LISA Qooxdoo Tutorial HandoutsLISA Qooxdoo Tutorial Handouts
LISA Qooxdoo Tutorial Handouts
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Writing Hadoop Jobs in Scala using Scalding
Writing Hadoop Jobs in Scala using ScaldingWriting Hadoop Jobs in Scala using Scalding
Writing Hadoop Jobs in Scala using Scalding
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech Projects
 
Cascading - A Java Developer’s Companion to the Hadoop World
Cascading - A Java Developer’s Companion to the Hadoop WorldCascading - A Java Developer’s Companion to the Hadoop World
Cascading - A Java Developer’s Companion to the Hadoop World
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the Cloud
 
Hackathon bonn
Hackathon bonnHackathon bonn
Hackathon bonn
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
 
Scalding for Hadoop
Scalding for HadoopScalding for Hadoop
Scalding for Hadoop
 
Taste Java In The Clouds
Taste Java In The CloudsTaste Java In The Clouds
Taste Java In The Clouds
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthel
 
Clojure And Swing
Clojure And SwingClojure And Swing
Clojure And Swing
 
Hadoop past, present and future
Hadoop past, present and futureHadoop past, present and future
Hadoop past, present and future
 

Plus de Steve Loughran

The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is overSteve Loughran
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)Steve Loughran
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionSteve Loughran
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!Steve Loughran
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()Steve Loughran
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming DeployedSteve Loughran
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?Steve Loughran
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveSteve Loughran
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupSteve Loughran
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSteve Loughran
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresSteve Loughran
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object StoresSteve Loughran
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraSteve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionSteve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateSteve Loughran
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARNSteve Loughran
 

Plus de Steve Loughran (20)

Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit Edition
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming Deployed
 
Testing
TestingTesting
Testing
 
I hate mocking
I hate mockingI hate mocking
I hate mocking
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User Group
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object stores
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object Stores
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object Stores
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony Era
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
 
YARN Services
YARN ServicesYARN Services
YARN Services
 

Dernier

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Dernier (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Hadoop gets Groovy

  • 1. Hadoop gets Groovy Steve Loughran– Hortonworks stevel at hortonworks.com @steveloughran Berlin, June 2012 © Hortonworks Inc. 2012
  • 2. Where are you in this diagram? Hadoop Skills Doug,Owen Arun, Jakob Groovy Skills @steveloughran James Strachan Guillamue Laforge Page 2 © Hortonworks Inc. 2012
  • 3. Grumpy : Groovy Hadoop Library • Something lightweight for testing • Wanted to play in the M/R layer • Already using Groovy • Liked: JVM integration, tooling, libraries, IntelliJ IDEA, Books… git@github.com:steveloughran/grumpy.git Page 3 © Hortonworks Inc. 2012
  • 4. What is Groovy? A dynamic language within the JVM • Java++ –Maps, lists, tuples, Closures • Flavours of Ruby and Python –'Duck' typing, Grails, (Scripting) A way to do things in the JVM that Sun didn't imagine Page 4 © Hortonworks Inc. 2012
  • 5. Can use & subclass java classes: class LineCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> { static final def emitKey = new Text("lines") static final def one = new IntWritable(1) void map(LongWritable key, Text value, Mapper.Context context) { context.write(emitKey, one) } } Page 5 © Hortonworks Inc. 2012
  • 6. Closures & lists class CountReducer2 extends Reducer { def reduce(Text k, Iterable values, Reducer.Context ctx) { def sum = values.collect() {it.get() }.sum() ctx.write(k, new IntWritable(sum)); } } Page 6 © Hortonworks Inc. 2012
  • 7. Closures & lists values.collect() { it.get() }.sum() List<values> -> List<int> -> int Page 7 © Hortonworks Inc. 2012
  • 8. Result: MR jobs in Groovy In: gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17,Fleurball gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20,vklaptop gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:21,Franky Panky gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:23, gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:24,Vas gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:32,Fleurball gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:36,vklaptop gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:37,Franky Panky gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:38,Vas gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:43, gate1,2afaf990ce75f0a7208f7f012c8d12ad,,2006-10-30,16:06:54,Smiley Out: 163,198,223 device sightings! Page 8 © Hortonworks Inc. 2012
  • 9. why no Pig? Sliding Window Debounce void map(LongWritable key, BlueEvent event, Mapper.Context context) { BlueEvent ev2 = window.insert(event) List<BlueEvent> expired = window.purgeExpired(event) expired.each { evt -> emit(context, evt) } } void cleanup(Mapper.Context context) { window.each { evt -> emit(context, evt) } } Page 9 © Hortonworks Inc. 2012
  • 10. Device sightings by day for 2007 1600000 1400000 1200000 1000000 800000 600000 400000 200000 0 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186 191 196 201 206 211 216 221 226 231 236 241 246 251 256 261 266 271 276 281 286 291 296 301 306 311 316 321 326 331 336 341 346 351 356 361 366 Page 10 © Hortonworks Inc. 2012
  • 11. Improving Hadoop APIs Configuration.metaClass.setAt = { key, val -> set(key.toString(), val.toString()) } Configuration.metaClass.getAt = { key -> get(key) } Configuration.metaClass.add = {map -> map.each {elt -> set((elt.key).toString(), (elt.value).toString() ) } Page 11 © Hortonworks Inc. 2012
  • 12. & Configuration gets better conf['mapscript'] = new File(src).text String scriptText = conf['mapscript'] conf.add([ window:60000, 'redscript':reduceScript ]) Extending to Job class trickier –subclassing better Page 12 © Hortonworks Inc. 2012
  • 13. New today! script driven MR jobs! protected void setup(Mapper.Context ctx) { this.ctx = ctx this.conf = ctx.configuration ScriptCompiler comp = new ScriptCompiler(conf) String scriptText = conf['mapscript'] map = comp.parse(scriptText, this, ctx) } protected void map(Writable key, Writable value, Mapper.Context ctx) { map.setProperty('key',key) map.setProperty('value',value) map.run() } Page 13 © Hortonworks Inc. 2012
  • 14. Things to consider • Performance: Groovy 2 on Java7 • 'False friends' -Types, if(), exceptions • If you can use Pig, use it. • Use Groovy for testing, extending Hadoop classes (output formatter, etc) • Play with YARN and Giraph with it Page 14 © Hortonworks Inc. 2012
  • 15. Questions? hortonworks.com Page 15 © Hortonworks Inc. 2012
  • 16. hortonworks.com Page 16 © Hortonworks Inc. 2012
  • 17. Performance? • Groovy 1 over-introspects • HLL hides a lot of overhead • If your work is I/O bound, less important • Speed of development vs execution • Need to benchmark on Java 7 Page 17 © Hortonworks Inc. 2012

Notes de l'éditeur

  1. What is the knowledge/skill level of the audience?I’m taking about Groovy, not time to cover Hadoop as wellDisclaimer: I am a Groovy user, not an Expert.
  2. How to describe Groovy? It depends on what you are doing with it? You can look at it and come to different conclusions based on your useIt&apos;s like Java with the datatypes they forgotIt&apos;s got ruby concepts (closures), keywords from python, and dynamic &apos;duck&apos; typingIt lets you do things in the JVM that the original Java authors didn&apos;t expect
  3. It&apos;s a map and reduce -in the reduce. Turtles all the way down. Or at least elephants.