March 2012 HUG: JuteRC compiler
Hadoop Jute RC Compiler


          Tanping Wang

         Yahoo! User Data Analytics
Agenda

• How we use Jute compiler today
• JuteRC compiler
• What we have achieved by using JuteRC




Yahoo! Presentation Template, Confidential   2   03/30/12
Hadoop Record Compiler (Jute)


• Generates serialization code to store data in
  the Sequence File format
       org.apache.hadoop.record




How Do We Use Jute Today

• Use Data Definition Language to define my
  data type:
      class MyDataType {
          buffer myBuffer;
          long myLong;
      }
• Use Jute compiler to generate serialization
  code:
      $ rcc --language java mydatatype.jr


Jute Generates Serialization Code for me

public void serialize(final org.apache.hadoop.record.RecordOutput _rio_a,
                      final String _rio_tag)
    throws java.io.IOException {
  _rio_a.startRecord(this, _rio_tag);
  _rio_a.writeBuffer(myBuffer, "myBuffer");
  _rio_a.writeLong(myLong, "myLong");
  _rio_a.endRecord(this, _rio_tag);
}


private void deserializeWithoutFilter(final org.apache.hadoop.record.RecordInput _rio_a,
                                      final String _rio_tag)
    throws java.io.IOException {
  _rio_a.startRecord(_rio_tag);
  myBuffer = _rio_a.readBuffer("myBuffer");
  myLong = _rio_a.readLong("myLong");
  _rio_a.endRecord(_rio_tag);
}
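The generated methods above follow one pattern: write each field in declaration order, then read the fields back in the same order. The sketch below illustrates that pattern without any Hadoop dependency. `MyDataTypeSketch` and its helper methods are hypothetical stand-ins rather than the `org.apache.hadoop.record` classes, and the byte layout (a 4-byte length prefix for the buffer, 8 bytes for the long) is an assumption made for illustration, not Jute's actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a record with the two fields from the DDL example.
class MyDataTypeSketch {
    byte[] myBuffer = new byte[0];
    long myLong;

    // Write the fields in declaration order, as the generated serialize() does.
    void serialize(DataOutputStream out) throws IOException {
        out.writeInt(myBuffer.length); // assumed layout: length prefix, then the bytes
        out.write(myBuffer);
        out.writeLong(myLong);
    }

    // Read the fields back in exactly the same order.
    void deserialize(DataInputStream in) throws IOException {
        byte[] buf = new byte[in.readInt()];
        in.readFully(buf);
        myBuffer = buf;
        myLong = in.readLong();
    }

    // Convenience helper: serialize into a byte array.
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            serialize(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Convenience helper: rebuild a record from a byte array.
    static MyDataTypeSketch fromBytes(byte[] data) {
        MyDataTypeSketch record = new MyDataTypeSketch();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            record.deserialize(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }
}
```

Because the reader depends on the writer's field order, both methods have to be generated from the same definition, which is the job the record compiler takes over.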



Today, the Yahoo! audience ETL pipeline processes
      tens of terabytes of data per day.
      We rely on Jute, and we store our data in
      Sequence Files.




However, We Want To Use RC Format.




RC File Format
  •     RCFile has much in common with Sequence File, but it splits a file
        into row groups. Inside each row group, the data is stored column
        by column rather than row by row.

  •     Values of the same column, and therefore of the same type, are
        stored next to each other. This can give a better compression ratio.
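The row-group idea can be sketched in a few lines. The example below only shows the order in which values are laid out; it is not the real RCFile on-disk format, which the slide does not describe in detail. The class name `RowGroupSketch`, the method `toRowGroups`, and the use of strings as cell values are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows value ordering inside row groups,
// not the actual RCFile on-disk format.
class RowGroupSketch {

    // Splits rows into groups of at most rowsPerGroup rows, then lays out
    // each group column by column. Assumes every row has the same width.
    static List<List<String>> toRowGroups(List<String[]> rows, int rowsPerGroup) {
        if (rowsPerGroup <= 0) {
            throw new IllegalArgumentException("rowsPerGroup must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += rowsPerGroup) {
            int end = Math.min(start + rowsPerGroup, rows.size());
            int columns = rows.get(start).length;
            List<String> group = new ArrayList<>();
            for (int col = 0; col < columns; col++) {
                // All values of one column are emitted before the next column starts.
                for (int row = start; row < end; row++) {
                    group.add(rows.get(row)[col]);
                }
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"a1", "b1"});
        rows.add(new String[] {"a2", "b2"});
        rows.add(new String[] {"a3", "b3"});
        // prints [[a1, a2, b1, b2], [a3, b3]]
        System.out.println(toRowGroups(rows, 2));
    }
}
```

With two rows per group, the first group holds both `a` values followed by both `b` values, which is what places similar data side by side for the compressor.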


Jute only supports the Sequence File format.
                              So we built the JuteRC compiler.




Data Type




Also…

• For each JType, override genReadMethod
  and genWriteMethod.
• Changed the CodeGenerator in Jute.
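One way to picture this change: each field type knows how to emit the Java statement that writes or reads a field, and the RC variant overrides those emitters to produce position-based calls instead of tag-based ones. The sketch below is hypothetical. `FieldTypeSketch` and `RcFieldTypeSketch` are simplified stand-ins, and their method signatures are assumptions, not the actual Jute `JType` API. The emitted strings mirror the shape of the generated code shown elsewhere in this deck.

```java
// Hypothetical, simplified stand-in for a Jute field type; not the real JType API.
class FieldTypeSketch {
    protected final String suffix; // for example "Long" or "Buffer"

    FieldTypeSketch(String suffix) {
        this.suffix = suffix;
    }

    // Record-style emitters: fields are written and read by tag.
    String genWriteMethod(String field) {
        return "_rio_a.write" + suffix + "(" + field + ",\"" + field + "\");";
    }

    String genReadMethod(String field) {
        return field + "=_rio_a.read" + suffix + "(\"" + field + "\");";
    }
}

// RC-style variant: overrides both emitters to use column positions instead of tags.
class RcFieldTypeSketch extends FieldTypeSketch {
    private static final String UTIL = "com.yahoo.ccdi.fetl.RcUtil";

    RcFieldTypeSketch(String suffix) {
        super(suffix);
    }

    @Override
    String genWriteMethod(String field) {
        return UTIL + ".write" + suffix + "(this, " + field + ", writeIndx++);";
    }

    @Override
    String genReadMethod(String field) {
        return field + "=" + UTIL + ".read" + suffix + "(bra, readIndx++);";
    }
}
```

Keeping the per-type emitters as the only point of variation means the rest of a generator can stay unaware of which file format it targets.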




Serialization Code Generated by Jute vs.
JuteRC
  Jute:
  public void serialize(final org.apache.hadoop.record.RecordOutput _rio_a,
                        final String _rio_tag)
      throws java.io.IOException {
    _rio_a.startRecord(this, _rio_tag);
    _rio_a.writeBuffer(myBuffer, "myBuffer");
    _rio_a.writeLong(myLong, "myLong");
    _rio_a.endRecord(this, _rio_tag);
  }

  JuteRC:
  public class MyDataType extends
      org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable {
    public void serialize() {
      int writeIndx = 0;
      try {
        com.yahoo.ccdi.fetl.RcUtil.writeBuffer(this, myBuffer, writeIndx++);
        com.yahoo.ccdi.fetl.RcUtil.writeLong(this, myLong, writeIndx++);
      } catch (java.io.IOException e) { }
    }

Deserialization Code
  Jute:
  private void deserializeWithoutFilter(final org.apache.hadoop.record.RecordInput _rio_a,
                                        final String _rio_tag)
      throws java.io.IOException {
    _rio_a.startRecord(_rio_tag);
    myBuffer = _rio_a.readBuffer("myBuffer");
    myLong = _rio_a.readLong("myLong");
    _rio_a.endRecord(_rio_tag);
  }

  JuteRC:
    public void deserialize(
        org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable bra) {
      int readIndx = 0;
      try {
        myBuffer = com.yahoo.ccdi.fetl.RcUtil.readBuffer(bra, readIndx++);
        myLong = com.yahoo.ccdi.fetl.RcUtil.readLong(bra, readIndx++);
      } catch (java.io.IOException e) { }
    }
  }
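The JuteRC-generated code addresses each field by column position rather than by tag. A dependency-free sketch of that idea follows. A plain `List<byte[]>` stands in for `BytesRefArrayWritable`, and `RcUtilSketch` is a hypothetical stand-in for the `RcUtil` helper, whose implementation is not shown in this deck. Encoding the long as 8 big-endian bytes is likewise an assumption.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the RcUtil helper; one list slot per column.
class RcUtilSketch {
    static void writeBuffer(List<byte[]> columns, byte[] value, int index) {
        set(columns, index, value.clone());
    }

    static void writeLong(List<byte[]> columns, long value, int index) {
        // Assumed encoding: 8 bytes, big-endian.
        set(columns, index, ByteBuffer.allocate(Long.BYTES).putLong(value).array());
    }

    static byte[] readBuffer(List<byte[]> columns, int index) {
        return columns.get(index).clone();
    }

    static long readLong(List<byte[]> columns, int index) {
        return ByteBuffer.wrap(columns.get(index)).getLong();
    }

    // Grows the list as needed so that the slot at index exists.
    private static void set(List<byte[]> columns, int index, byte[] bytes) {
        while (columns.size() <= index) {
            columns.add(new byte[0]);
        }
        columns.set(index, bytes);
    }
}

// Mirrors the shape of the JuteRC-generated class: fields map to column positions.
class MyRcRecordSketch {
    byte[] myBuffer = new byte[0];
    long myLong;

    List<byte[]> serialize() {
        List<byte[]> columns = new ArrayList<>();
        int writeIndx = 0;
        RcUtilSketch.writeBuffer(columns, myBuffer, writeIndx++);
        RcUtilSketch.writeLong(columns, myLong, writeIndx++);
        return columns;
    }

    void deserialize(List<byte[]> columns) {
        int readIndx = 0;
        myBuffer = RcUtilSketch.readBuffer(columns, readIndx++);
        myLong = RcUtilSketch.readLong(columns, readIndx++);
    }
}
```

Since the column index is implied by field order, the write and read sides must increment their counters in the same sequence, which generated code guarantees by construction.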




Using RC
• Converting files from the Sequence File format to the RC format
  reduced file size by 26-28%.
• Faster IO performance: reading/writing 0.6X
• We process our data using both Hive and Pig on
  top of HCatalog.

Open Source

• We are in the process of open-sourcing JuteRC.
  It is under review by the Yahoo! Open Source
  Working Group.
• MapReduce programmers can directly plug in
  the code generated by JuteRC and store their
  data in the RC format.




References

•    RCFile: A Fast and Space-efficient Data Placement Structure in
     MapReduce-based Warehouse Systems
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-


•    Hive RCFile:
http://hive.apache.org/docs/r0.7.0/api/org/apache/hadoop/hive/ql/io/RCFile.ht




Yahoo! Presentation Template, Confidential   16   03/30/12

Contenu connexe

Tendances

SWT Lecture Session 6 - RDFS semantics, inference techniques, sesame rdfs
SWT Lecture Session 6 - RDFS semantics, inference techniques, sesame rdfsSWT Lecture Session 6 - RDFS semantics, inference techniques, sesame rdfs
SWT Lecture Session 6 - RDFS semantics, inference techniques, sesame rdfsMariano Rodriguez-Muro
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012Jimmy Lai
 
Development of the irods rados plugin @ iRODS User group meeting 2014
Development of the irods rados plugin @ iRODS User group meeting 2014Development of the irods rados plugin @ iRODS User group meeting 2014
Development of the irods rados plugin @ iRODS User group meeting 2014mgrawinkel
 
JahiaOne - Jahia 7, The External Data Provider
JahiaOne - Jahia 7, The External Data ProviderJahiaOne - Jahia 7, The External Data Provider
JahiaOne - Jahia 7, The External Data ProviderJahia Solutions Group
 
Pig - Analyzing data sets
Pig - Analyzing data setsPig - Analyzing data sets
Pig - Analyzing data setsCreditas
 
Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014rpbrehm
 
Scientific data curation and processing with Apache Tika
Scientific data curation and processing with Apache TikaScientific data curation and processing with Apache Tika
Scientific data curation and processing with Apache TikaChris Mattmann
 
Big Data Analytics using Mahout
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using MahoutIMC Institute
 
Hands On Spring Data
Hands On Spring DataHands On Spring Data
Hands On Spring DataEric Bottard
 
Lucene InputFormat (lightning talk) - TriHUG December 10, 2013
Lucene InputFormat (lightning talk) - TriHUG December 10, 2013Lucene InputFormat (lightning talk) - TriHUG December 10, 2013
Lucene InputFormat (lightning talk) - TriHUG December 10, 2013mumrah
 
Brief Introduction to the Provenance Vocabulary (for W3C prov-xg)
Brief Introduction to the Provenance Vocabulary (for W3C prov-xg)Brief Introduction to the Provenance Vocabulary (for W3C prov-xg)
Brief Introduction to the Provenance Vocabulary (for W3C prov-xg)Olaf Hartig
 
An Introduction to Spring Data
An Introduction to Spring DataAn Introduction to Spring Data
An Introduction to Spring DataOliver Gierke
 
Clustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache SparkClustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache SparkThamme Gowda
 
IEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style SimilarityIEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style SimilarityThamme Gowda
 

Tendances (20)

SWT Lecture Session 6 - RDFS semantics, inference techniques, sesame rdfs
SWT Lecture Session 6 - RDFS semantics, inference techniques, sesame rdfsSWT Lecture Session 6 - RDFS semantics, inference techniques, sesame rdfs
SWT Lecture Session 6 - RDFS semantics, inference techniques, sesame rdfs
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012
 
Spring Data in 10 minutes
Spring Data in 10 minutesSpring Data in 10 minutes
Spring Data in 10 minutes
 
Development of the irods rados plugin @ iRODS User group meeting 2014
Development of the irods rados plugin @ iRODS User group meeting 2014Development of the irods rados plugin @ iRODS User group meeting 2014
Development of the irods rados plugin @ iRODS User group meeting 2014
 
JahiaOne - Jahia 7, The External Data Provider
JahiaOne - Jahia 7, The External Data ProviderJahiaOne - Jahia 7, The External Data Provider
JahiaOne - Jahia 7, The External Data Provider
 
Pig - Analyzing data sets
Pig - Analyzing data setsPig - Analyzing data sets
Pig - Analyzing data sets
 
Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014
 
Scientific data curation and processing with Apache Tika
Scientific data curation and processing with Apache TikaScientific data curation and processing with Apache Tika
Scientific data curation and processing with Apache Tika
 
Week1 dbd
Week1 dbdWeek1 dbd
Week1 dbd
 
Big Data Analytics using Mahout
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using Mahout
 
Hands On Spring Data
Hands On Spring DataHands On Spring Data
Hands On Spring Data
 
Lucene InputFormat (lightning talk) - TriHUG December 10, 2013
Lucene InputFormat (lightning talk) - TriHUG December 10, 2013Lucene InputFormat (lightning talk) - TriHUG December 10, 2013
Lucene InputFormat (lightning talk) - TriHUG December 10, 2013
 
Hfile
HfileHfile
Hfile
 
Brief Introduction to the Provenance Vocabulary (for W3C prov-xg)
Brief Introduction to the Provenance Vocabulary (for W3C prov-xg)Brief Introduction to the Provenance Vocabulary (for W3C prov-xg)
Brief Introduction to the Provenance Vocabulary (for W3C prov-xg)
 
python and database
python and databasepython and database
python and database
 
An Introduction to Spring Data
An Introduction to Spring DataAn Introduction to Spring Data
An Introduction to Spring Data
 
Clustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache SparkClustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache Spark
 
Getting triples from records: the role of ISBD
Getting triples from records: the role of ISBDGetting triples from records: the role of ISBD
Getting triples from records: the role of ISBD
 
IEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style SimilarityIEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
 
HDF5 Advanced Topics
HDF5 Advanced TopicsHDF5 Advanced Topics
HDF5 Advanced Topics
 

En vedette

Tortilla de sobras de puchero
Tortilla de  sobras de pucheroTortilla de  sobras de puchero
Tortilla de sobras de pucheroChoquera
 
Ebook 16palavras ingles
Ebook 16palavras inglesEbook 16palavras ingles
Ebook 16palavras inglesElder Oliveira
 
História do Brasil - Boris Fausto
História do Brasil - Boris FaustoHistória do Brasil - Boris Fausto
História do Brasil - Boris FaustoAurelio Junior
 
No more Big Data Hacking—Time for a Complete ETL Solution with Oracle Data In...
No more Big Data Hacking—Time for a Complete ETL Solution with Oracle Data In...No more Big Data Hacking—Time for a Complete ETL Solution with Oracle Data In...
No more Big Data Hacking—Time for a Complete ETL Solution with Oracle Data In...Jérôme Françoisse
 
неделя математики 8 группа
неделя математики 8 группанеделя математики 8 группа
неделя математики 8 группаyuyukul
 
группа №13 развивающие игры воскобовича
группа №13 развивающие игры воскобовичагруппа №13 развивающие игры воскобовича
группа №13 развивающие игры воскобовичаyuyukul
 
Matter Incidental to the Execution of the Will
Matter Incidental to the Execution of the Will Matter Incidental to the Execution of the Will
Matter Incidental to the Execution of the Will a_sophi
 
임산부 영양상담
임산부 영양상담임산부 영양상담
임산부 영양상담mothersafe
 
Leading Age 2015 Presentation Repositioning ROI Synergy in Design, Care, and...
Leading Age 2015 Presentation Repositioning ROI  Synergy in Design, Care, and...Leading Age 2015 Presentation Repositioning ROI  Synergy in Design, Care, and...
Leading Age 2015 Presentation Repositioning ROI Synergy in Design, Care, and...Christine Rancourt
 
Spark ETL Techniques - Creating An Optimal Fantasy Baseball Roster
Spark ETL Techniques - Creating An Optimal Fantasy Baseball RosterSpark ETL Techniques - Creating An Optimal Fantasy Baseball Roster
Spark ETL Techniques - Creating An Optimal Fantasy Baseball RosterDon Drake
 

En vedette (10)

Tortilla de sobras de puchero
Tortilla de  sobras de pucheroTortilla de  sobras de puchero
Tortilla de sobras de puchero
 
Ebook 16palavras ingles
Ebook 16palavras inglesEbook 16palavras ingles
Ebook 16palavras ingles
 
História do Brasil - Boris Fausto
História do Brasil - Boris FaustoHistória do Brasil - Boris Fausto
História do Brasil - Boris Fausto
 
No more Big Data Hacking—Time for a Complete ETL Solution with Oracle Data In...
No more Big Data Hacking—Time for a Complete ETL Solution with Oracle Data In...No more Big Data Hacking—Time for a Complete ETL Solution with Oracle Data In...
No more Big Data Hacking—Time for a Complete ETL Solution with Oracle Data In...
 
неделя математики 8 группа
неделя математики 8 группанеделя математики 8 группа
неделя математики 8 группа
 
группа №13 развивающие игры воскобовича
группа №13 развивающие игры воскобовичагруппа №13 развивающие игры воскобовича
группа №13 развивающие игры воскобовича
 
Matter Incidental to the Execution of the Will
Matter Incidental to the Execution of the Will Matter Incidental to the Execution of the Will
Matter Incidental to the Execution of the Will
 
임산부 영양상담
임산부 영양상담임산부 영양상담
임산부 영양상담
 
Leading Age 2015 Presentation Repositioning ROI Synergy in Design, Care, and...
Leading Age 2015 Presentation Repositioning ROI  Synergy in Design, Care, and...Leading Age 2015 Presentation Repositioning ROI  Synergy in Design, Care, and...
Leading Age 2015 Presentation Repositioning ROI Synergy in Design, Care, and...
 
Spark ETL Techniques - Creating An Optimal Fantasy Baseball Roster
Spark ETL Techniques - Creating An Optimal Fantasy Baseball RosterSpark ETL Techniques - Creating An Optimal Fantasy Baseball Roster
Spark ETL Techniques - Creating An Optimal Fantasy Baseball Roster
 

Similaire à March 2012 HUG: JuteRC compiler

Content Storage With Apache Jackrabbit
Content Storage With Apache JackrabbitContent Storage With Apache Jackrabbit
Content Storage With Apache JackrabbitJukka Zitting
 
Hibernate complete Training
Hibernate complete TrainingHibernate complete Training
Hibernate complete Trainingsourabh aggarwal
 
JavaScript Miller Columns
JavaScript Miller ColumnsJavaScript Miller Columns
JavaScript Miller ColumnsJonathan Fine
 
Game Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupGame Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupJelena Zanko
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesCorley S.r.l.
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101Itiel Shwartz
 
20100730 phpstudy
20100730 phpstudy20100730 phpstudy
20100730 phpstudyYusuke Ando
 
jRecruiter - The AJUG Job Posting Service
jRecruiter - The AJUG Job Posting ServicejRecruiter - The AJUG Job Posting Service
jRecruiter - The AJUG Job Posting ServiceGunnar Hillert
 
.NET @ apache.org
 .NET @ apache.org .NET @ apache.org
.NET @ apache.orgTed Husted
 
Tools for A Preservation Ready Web
Tools for A Preservation Ready WebTools for A Preservation Ready Web
Tools for A Preservation Ready WebMichael Nelson
 
07 response-headers
07 response-headers07 response-headers
07 response-headershanichandra
 
JBoss Architect Forum London - October 2013 - Platform as a What?
JBoss Architect Forum London - October 2013 - Platform as a What?JBoss Architect Forum London - October 2013 - Platform as a What?
JBoss Architect Forum London - October 2013 - Platform as a What?JBossArchitectForum
 
Hibernate
HibernateHibernate
HibernateAjay K
 
Joget Workflow v6 Training Slides - 20 - Basic System Administration
Joget Workflow v6 Training Slides - 20 - Basic System AdministrationJoget Workflow v6 Training Slides - 20 - Basic System Administration
Joget Workflow v6 Training Slides - 20 - Basic System AdministrationJoget Workflow
 
Ruby for soul of BigData Nerds
Ruby for soul of BigData NerdsRuby for soul of BigData Nerds
Ruby for soul of BigData NerdsAbhishek Parolkar
 

Similaire à March 2012 HUG: JuteRC compiler (20)

Content Storage With Apache Jackrabbit
Content Storage With Apache JackrabbitContent Storage With Apache Jackrabbit
Content Storage With Apache Jackrabbit
 
Hibernate complete Training
Hibernate complete TrainingHibernate complete Training
Hibernate complete Training
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
 
JavaScript Miller Columns
JavaScript Miller ColumnsJavaScript Miller Columns
JavaScript Miller Columns
 
dJango
dJangodJango
dJango
 
Game Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupGame Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid Meetup
 
Test02
Test02Test02
Test02
 
Distributed Tracing
Distributed TracingDistributed Tracing
Distributed Tracing
 
Java se7 features
Java se7 featuresJava se7 features
Java se7 features
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting Languages
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
 
20100730 phpstudy
20100730 phpstudy20100730 phpstudy
20100730 phpstudy
 
jRecruiter - The AJUG Job Posting Service
jRecruiter - The AJUG Job Posting ServicejRecruiter - The AJUG Job Posting Service
jRecruiter - The AJUG Job Posting Service
 
.NET @ apache.org
 .NET @ apache.org .NET @ apache.org
.NET @ apache.org
 
Tools for A Preservation Ready Web
Tools for A Preservation Ready WebTools for A Preservation Ready Web
Tools for A Preservation Ready Web
 
07 response-headers
07 response-headers07 response-headers
07 response-headers
 
JBoss Architect Forum London - October 2013 - Platform as a What?
JBoss Architect Forum London - October 2013 - Platform as a What?JBoss Architect Forum London - October 2013 - Platform as a What?
JBoss Architect Forum London - October 2013 - Platform as a What?
 
Hibernate
HibernateHibernate
Hibernate
 
Joget Workflow v6 Training Slides - 20 - Basic System Administration
Joget Workflow v6 Training Slides - 20 - Basic System AdministrationJoget Workflow v6 Training Slides - 20 - Basic System Administration
Joget Workflow v6 Training Slides - 20 - Basic System Administration
 
Ruby for soul of BigData Nerds
Ruby for soul of BigData NerdsRuby for soul of BigData Nerds
Ruby for soul of BigData Nerds
 

Plus de Yahoo Developer Network

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaYahoo Developer Network
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Yahoo Developer Network
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanYahoo Developer Network
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Yahoo Developer Network
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolYahoo Developer Network
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathYahoo Developer Network
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsYahoo Developer Network
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 

Plus de Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

March 2012 HUG: JuteRC compiler

  • 1. Hadoop Jute RC Compiler. Tanping Wang, Yahoo! User Data Analytics
  • 2. Agenda
      • How we use the Jute compiler today
      • JuteRC compiler
      • What we have achieved by using JuteRC
  • 3. Hadoop Record Compiler (Jute)
      • Generates serialization code to store data in the Sequence File format
        (package org.apache.hadoop.record)
  • 4. How Do We Use Jute Today
      • Use the Data Definition Language to define a data type:

            class MyDataType {
                buffer myBuffer;
                long myLong;
            }

      • Use the Jute compiler to generate serialization code:

            $ rcc --language java mydatatype.jr
  • 5. Jute Generates Serialization Code for Me

        public void serialize(final org.apache.hadoop.record.RecordOutput _rio_a, final String _rio_tag)
            throws java.io.IOException {
          _rio_a.startRecord(this,_rio_tag);
          _rio_a.writeBuffer(myBuffer,"myBuffer");
          _rio_a.writeLong(myLong,"myLong");
          _rio_a.endRecord(this,_rio_tag);
        }

        private void deserializeWithoutFilter(final org.apache.hadoop.record.RecordInput _rio_a, final String _rio_tag)
            throws java.io.IOException {
          _rio_a.startRecord(_rio_tag);
          myBuffer=_rio_a.readBuffer("myBuffer");
          myLong=_rio_a.readLong("myLong");
          _rio_a.endRecord(_rio_tag);
        }
  • 6. Today the Yahoo! audience ETL pipeline processes tens of terabytes of data per day. We rely on Jute and store our data in Sequence Files.
  • 7. However, we want to use the RC format.
  • 8. RC File Format
      • RCFile is similar to Sequence File in many ways, but it splits a file into row groups. Inside each row group, the data is stored column by column: all values of one column are laid out together before the next column begins.
      • Values of the same data type are therefore grouped together, which can improve the compression ratio.
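The row-group layout described on slide 8 can be illustrated with a small sketch. This is not RCFile itself: the class name `RowGroupSketch`, the `toRowGroups` helper, the string-valued cells, and the group size of two rows are all assumptions made for illustration. It only shows the idea of cutting rows into groups and storing each group column by column.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupSketch {

    /**
     * Splits rows into groups of at most groupSize rows and stores each
     * group column by column. Hypothetical helper, not part of RCFile.
     * Result shape: groups -> columns -> values.
     */
    public static List<List<List<String>>> toRowGroups(List<List<String>> rows, int groupSize) {
        List<List<List<String>>> groups = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += groupSize) {
            int end = Math.min(start + groupSize, rows.size());
            int numColumns = rows.get(start).size();
            List<List<String>> columns = new ArrayList<>();
            for (int c = 0; c < numColumns; c++) {
                // Gather every value of column c within this row group.
                List<String> column = new ArrayList<>();
                for (int r = start; r < end; r++) {
                    column.add(rows.get(r).get(c));
                }
                columns.add(column);
            }
            groups.add(columns);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("a1", "10"),
                List.of("a2", "20"),
                List.of("a3", "30"));
        // Prints [[[a1, a2], [10, 20]], [[a3], [30]]]
        System.out.println(toRowGroups(rows, 2));
    }
}
```

Within each group the values of one column sit next to each other, which is what gives a compressor runs of similar data to work with.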
  • 9. Jute only supports the Sequence File format, so we built the JuteRC compiler.
  • 10. Data Type
  • 11. Also…
      • For each JType, override genReadMethod and genWriteMethod.
      • Changed CodeGenerator in Jute.
  • 12. Serialization Code Generated by Jute vs. JuteRC

      Jute:

        public void serialize(final org.apache.hadoop.record.RecordOutput _rio_a, final String _rio_tag)
            throws java.io.IOException {
          _rio_a.startRecord(this,_rio_tag);
          _rio_a.writeBuffer(myBuffer,"myBuffer");
          _rio_a.writeLong(myLong,"myLong");
          _rio_a.endRecord(this,_rio_tag);
        }

      JuteRC:

        public class MyDataType extends org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable {
          public void serialize() {
            int writeIndx = 0;
            try {
              com.yahoo.ccdi.fetl.RcUtil.writeBuffer(this, myBuffer, writeIndx++);
              com.yahoo.ccdi.fetl.RcUtil.writeLong(this, myLong, writeIndx++);
            } catch(java.io.IOException e) {
            }
          }
  • 13. Deserialization Code

      Jute:

        private void deserializeWithoutFilter(final org.apache.hadoop.record.RecordInput _rio_a, final String _rio_tag)
            throws java.io.IOException {
          _rio_a.startRecord(_rio_tag);
          myBuffer=_rio_a.readBuffer("myBuffer");
          myLong=_rio_a.readLong("myLong");
          _rio_a.endRecord(_rio_tag);
        }

      JuteRC:

          public void deserialize(org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable bra) {
            int readIndx = 0;
            try {
              myBuffer=com.yahoo.ccdi.fetl.RcUtil.readBuffer(bra, readIndx++);
              myLong=com.yahoo.ccdi.fetl.RcUtil.readLong(bra, readIndx++);
            } catch(java.io.IOException e) {
            }
          }
        }
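The JuteRC code on slides 12 and 13 addresses fields by column position (writeIndx, readIndx) rather than by tag name. The sketch below imitates that pattern without any Hive or Yahoo! dependency. It is an assumption-laden stand-in: `PositionalRecordSketch` is a hypothetical class, a plain `List<byte[]>` takes the place of `BytesRefArrayWritable`, and the 8-byte big-endian long encoding is chosen for illustration because the slides do not show how `RcUtil` encodes values.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PositionalRecordSketch {
    public byte[] myBuffer;
    public long myLong;

    /** Writes each field into the next column slot, in declaration order. */
    public List<byte[]> serialize() {
        List<byte[]> row = new ArrayList<>();
        row.add(myBuffer.clone());                                          // column 0
        row.add(ByteBuffer.allocate(Long.BYTES).putLong(myLong).array());   // column 1 (assumed encoding)
        return row;
    }

    /** Reads fields back by position; the order must match serialize(). */
    public void deserialize(List<byte[]> row) {
        int readIndex = 0;
        myBuffer = row.get(readIndex++).clone();
        myLong = ByteBuffer.wrap(row.get(readIndex++)).getLong();
    }

    public static void main(String[] args) {
        PositionalRecordSketch in = new PositionalRecordSketch();
        in.myBuffer = new byte[] {1, 2, 3};
        in.myLong = 42L;

        PositionalRecordSketch out = new PositionalRecordSketch();
        out.deserialize(in.serialize());
        // Prints [1, 2, 3] 42
        System.out.println(Arrays.toString(out.myBuffer) + " " + out.myLong);
    }
}
```

Because fields are matched by index only, the write order and the read order have to stay in sync, which is one reason to generate both methods from a single data definition.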
  • 14. Using RC
      • Converting Sequence File data to RC format achieved a 26 to 28% file size reduction.
      • Faster IO performance: reading/writing 0.6X
      • We process our data using both Hive and Pig on top of HCatalog.
  • 15. Open Source
      • We are in the process of open sourcing JuteRC. It is under review by the Yahoo! Open Source Working Group.
      • MapReduce programmers can directly plug in the code generated by JuteRC and store their data in RC format.
  • 16. References
      • RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems
        http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-
      • Hive RCFile:
        http://hive.apache.org/docs/r0.7.0/api/org/apache/hadoop/hive/ql/io/RCFile.ht