Future of HCatalog - Hadoop Summit 2012

Published in: Technology


  1. Future of HCatalog
     Alan F. Gates, @alanfgates
  2. Who Am I?
     • HCatalog committer and mentor
     • Co-founder of Hortonworks
     • Lead for Pig, Hive, and HCatalog at Hortonworks
     • Pig committer and PMC Member
     • Member of Apache Software Foundation and Incubator PMC
     • Author of Programming Pig from O’Reilly
     © Hortonworks Inc. 2012
  3. Hadoop Ecosystem
     [Diagram] Three tools sit on top of HDFS and the Metastore:
     • MapReduce: InputFormat/OutputFormat
     • Hive: SerDe, InputFormat/OutputFormat, Metastore Client
     • Pig: Load/Store
  4. Opening up Metadata to MR & Pig
     [Diagram] The same stack with HCatalog components added:
     • MapReduce: HCatInputFormat/HCatOutputFormat
     • Pig: HCatLoader/HCatStorer
     • Hive: SerDe, InputFormat/OutputFormat, Metastore Client
     • Underneath: HDFS and the Metastore
  5. Templeton - REST API
     • REST endpoints: databases, tables, partitions, columns, table properties
     • PUT to create/update, GET to list or describe, DELETE to drop
     Get a list of all tables in the default database:
     Request to Hadoop/HCatalog: GET http://…/v1/ddl/database/default/table
     Response: { "tables": ["counted", "processed"], "database": "default" }
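The list call above could be exercised from a client roughly as follows. This is a minimal sketch, not Templeton's official client: the function names are hypothetical, and because the slide elides the host (`http://…`), the base URL is a parameter the caller must supply. The request is only built, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def list_tables_request(base_url: str, database: str = "default") -> Request:
    """Build (but do not send) the GET request that lists a database's tables.

    base_url is an assumption: the slide elides the host, so the caller
    must provide the real Templeton endpoint.
    """
    url = f"{base_url.rstrip('/')}/v1/ddl/database/{quote(database, safe='')}/table"
    return Request(url, method="GET")


def parse_table_list(body: str) -> list[str]:
    """Extract table names from a response shaped like the slide's example."""
    return json.loads(body)["tables"]


if __name__ == "__main__":
    # Placeholder host for illustration only.
    req = list_tables_request("http://templeton.example.invalid")
    print(req.get_method(), req.full_url)
    sample = '{"tables": ["counted", "processed"], "database": "default"}'
    print(parse_table_list(sample))
```

To actually issue the call, the returned `Request` can be passed to `urllib.request.urlopen` once a real base URL is known.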
  6. Templeton - REST API
     • REST endpoints: databases, tables, partitions, columns, table properties
     • PUT to create/update, GET to list or describe, DELETE to drop
     Create new table "rawevents":
     Request to Hadoop/HCatalog: PUT http://…/v1/ddl/database/default/table/rawevents
     Request body: {"columns": [{ "name": "url", "type": "string" }, { "name": "user", "type": "string" }], "partitionedBy": [{ "name": "ds", "type": "string" }]}
     Response: { "table": "rawevents", "database": "default" }
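A client-side sketch of the create-table call, under the same caveats: the helper name is hypothetical, the base URL stands in for the host the slide elides, and the `Content-Type` header is an assumption rather than something the slide states. The JSON body follows the shape shown on the slide.

```python
import json
from urllib.request import Request


def create_table_request(base_url, database, table, columns, partitioned_by):
    """Build (but do not send) a PUT request that creates a table.

    columns and partitioned_by are lists of (name, type) pairs.
    """
    body = {
        "columns": [{"name": n, "type": t} for n, t in columns],
        "partitionedBy": [{"name": n, "type": t} for n, t in partitioned_by],
    }
    url = f"{base_url.rstrip('/')}/v1/ddl/database/{database}/table/{table}"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the server expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = create_table_request(
        "http://templeton.example.invalid",  # placeholder host
        "default",
        "rawevents",
        columns=[("url", "string"), ("user", "string")],
        partitioned_by=[("ds", "string")],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```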
  7. Templeton - REST API
     • REST endpoints: databases, tables, partitions, columns, table properties
     • PUT to create/update, GET to list or describe, DELETE to drop
     Describe table "rawevents":
     Request to Hadoop/HCatalog: GET http://…/v1/ddl/database/default/table/rawevents
     Response: { "columns": [{"name": "url", "type": "string"}, {"name": "user", "type": "string"}], "database": "default", "table": "rawevents" }
     • Included in HDP
     • Not yet checked in, but you can find the code on Apache’s JIRA HCATALOG-182
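One thing a client might do with a describe response is derive the schema it would otherwise type by hand, such as the `as (name:chararray, ...)` clause in the Pig scripts on the later slides. The sketch below is illustrative only: the function name is hypothetical and the type table is a deliberately partial mapping of Hive primitive types to Pig types.

```python
import json

# Partial, illustrative mapping of Hive primitive types to Pig types.
HIVE_TO_PIG = {
    "string": "chararray",
    "int": "int",
    "bigint": "long",
    "float": "float",
    "double": "double",
}


def pig_schema_from_describe(body: str) -> str:
    """Turn a describe-table response into a Pig schema string.

    Raises KeyError for Hive types missing from the partial mapping above.
    """
    columns = json.loads(body)["columns"]
    fields = [f'{c["name"]}:{HIVE_TO_PIG[c["type"]]}' for c in columns]
    return "(" + ", ".join(fields) + ")"


if __name__ == "__main__":
    response = (
        '{"columns": [{"name": "url", "type": "string"}, '
        '{"name": "user", "type": "string"}], '
        '"database": "default", "table": "rawevents"}'
    )
    print(pig_schema_from_describe(response))
```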
  8. Reading and Writing Data in Parallel
     • Use case: users want
       – to read and write records in parallel between Hadoop and their parallel system
       – driven by their system
       – in a language-independent way
       – without needing to understand Hadoop’s file formats
     • Example: an MPP data store wants to read data out of Hadoop as HCatRecords for its parallel jobs
     • What exists today
       – webhdfs
         – Language independent
         – Can move data in parallel
         – Driven from the user side
         – Moves only bytes, no understanding of file format
       – Sqoop
         – Can move data in parallel
         – Understands data format
         – Driven from Hadoop side
         – Requires connector or JDBC
  9. HCatReader and HCatWriter
     [Diagram] The Master calls getHCatReader on HCatalog and receives an HCatReader. Input splits are handed to the Slaves. Each Slave calls read and gets back an Iterator<HCatRecord> over data in HDFS.
     Right now all in Java, needs to be REST
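The master/slave flow in the diagram can be pictured with a small simulation. Everything here is a toy: `ToyReader`, `split_ids`, and `read_in_parallel` are hypothetical names, in-memory lists stand in for HDFS, and threads stand in for the slave processes. It is not the HCatalog Java API, only the shape of the pattern (the master obtains a reader and its splits, each slave iterates over one split).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, List

Record = Dict[str, str]  # stand-in for an HCatRecord


class ToyReader:
    """Hypothetical stand-in for the slide's reader; NOT the HCatalog API."""

    def __init__(self, records: List[Record], num_splits: int) -> None:
        # Deal the records round-robin into num_splits groups.
        self._splits = [records[i::num_splits] for i in range(num_splits)]

    def split_ids(self) -> List[int]:
        """Master side: the list of splits to hand out to slaves."""
        return list(range(len(self._splits)))

    def read(self, split_id: int) -> Iterator[Record]:
        """Slave side: iterate over the records of a single split."""
        return iter(self._splits[split_id])


def read_in_parallel(reader: ToyReader) -> List[List[Record]]:
    """Run one 'slave' per split and collect what each one read."""
    ids = reader.split_ids()
    with ThreadPoolExecutor(max_workers=max(1, len(ids))) as pool:
        return list(pool.map(lambda i: list(reader.read(i)), ids))


if __name__ == "__main__":
    records = [{"id": str(n)} for n in range(5)]
    for split in read_in_parallel(ToyReader(records, num_splits=2)):
        print(split)
```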
  10. Storing Semi-/Unstructured Data
      Table Users:
        Name   Zip
        Alice  93201
        Bob    76331
      File Users:
        {"name":"alice","zip":"93201"}
        {"name":"bob","zip":"76331"}
        {"name":"cindy"}
        {"zip":"87890"}
      Hive:
        select name, zip from users;
      Pig:
        A = load 'Users' as (name:chararray, zip:chararray);
        B = foreach A generate name, zip;
  11. Storing Semi-/Unstructured Data
      Same Users table and Users file as on the previous slide.
      Hive:
        select name, zip from users;
      Pig, with the schema declared in the script:
        A = load 'Users' as (name:chararray, zip:chararray);
        B = foreach A generate name, zip;
      Pig, without a declared schema:
        A = load 'Users';
        B = foreach A generate name, zip;
  12. Storing Semi-/Unstructured Data
      Same content as the previous slide, in a different layout.
      Hive:
        select name, zip from users;
      Pig, with the schema declared in the script:
        A = load 'Users' as (name:chararray, zip:chararray);
        B = foreach A generate name, zip;
      Pig, without a declared schema:
        A = load 'Users';
        B = foreach A generate name, zip;
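To see what projecting `name` and `zip` means for the Users file, where some records lack one of the two fields, here is a small sketch. It only illustrates the idea of reading semi-structured records by field name. The function name is hypothetical, and returning `None` for a missing field is an assumption of this sketch, not a statement about how HCatalog handles it.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def project(
    lines: Iterable[str], fields: Tuple[str, ...] = ("name", "zip")
) -> Iterator[Tuple[Optional[str], ...]]:
    """Yield one tuple per JSON line, with None where a field is absent."""
    for line in lines:
        record = json.loads(line)
        yield tuple(record.get(f) for f in fields)


# The four records from the slide's Users file.
USERS_FILE = [
    '{"name":"alice","zip":"93201"}',
    '{"name":"bob","zip":"76331"}',
    '{"name":"cindy"}',
    '{"zip":"87890"}',
]

if __name__ == "__main__":
    for row in project(USERS_FILE):
        print(row)
```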
  13. Hive ODBC/JDBC Today
      [Diagram] JDBC and ODBC clients connect to the Hive Server, which talks to Hadoop.
      • JDBC client issue: have to have Hive JDBC code on the client
      • Hive Server issues:
        – Not concurrent
        – Not secure
        – Not scalable
      • ODBC client issue: open source version not easy to use
  14. ODBC/JDBC Proposal
      [Diagram] JDBC and ODBC clients connect to a REST Server, which talks to Hadoop.
      • Clients: provide robust open source implementations
      • REST Server:
        – Spawns job inside cluster
        – Runs job as submitting user
        – Works with security
        – Scaling web services well understood
  15. Questions
