HCatalog & Templeton
                         Youngwoo Kim (brandon.kim@nexr.com, kt.com)
                                 Daegeun Kim (dani.kim@geekple.com)
                              Data Analytics Platform, KTCloudware (NexR)




Wednesday, July 18, 2012
HCatalog



Hadoop Ecosystem
                               (Many data processing tools)

                   MapReduce               Hive                 Pig

                                                              LoadFunc
                                                              StoreFunc
                                   Metastore      SerDe
                                                  SerDe


                                   RDBMS



                          InputFormat / OutputFormat / ...

                                      Filesystem

Problems


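                    •    No metastore exists outside of Hive

                    •    Integrating multiple tools on a single cluster is not easy.

                         •   Communication overhead is incurred every time

                             •   Where? How? What?

                         •   M/R and Pig users have a lot of information to remember

                         •   Changes to the schema, data path, or format heavily affect M/R and Pig jobs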




HCatalog

                    •    Apache Incubator

                    •    Based on the Hive metastore

                    •    Provides programming interfaces that let M/R and Pig users read and write data

                    •    Provides all DDL commands that do not require a MapReduce job (CLI commands)

                         •   Excludes import/export, CREATE TABLE AS SELECT, etc.

                    •    Provides data exploration features

                         •   SHOW TABLES, DESCRIBE

                         •   http://incubator.apache.org/hcatalog/docs/r0.4.0/cli.html

                    •    Developed by Hortonworks, Yahoo, Twitter, and others



Table abstraction



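                    •    Metadata

                         •   Data location, schema, compression, partitions, format, etc.

                    •    Data is abstracted through HCatalog

                         •   Metadata is managed in one place, which makes its role all the more important

                         •   Supported column types: primitives, map, list, struct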




HCatalog
                   MapReduce                 Hive                  Pig

                   HCatInputFormat                              HCatLoader
                  HCatOutputFormat                              HCatStorer


                                     Metastore      SerDe
                                                    SerDe


                                                  InputFormat
                                     RDBMS
                                                 OutputFormat



                                        Filesystem

Data types : Pig

        HCatalog = Hive                                             Pig

        primitives (int, long, float, double, string)               int, long, float, double, chararray
        map (contains key and value pairs)                          map
        list (contains a list of elements of the same data type)    bag
        struct (contains elements of different data types)          tuple
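
The type correspondence above can be written down as a small lookup. This is only a sketch in plain Java: the class and method names (`HCatToPigTypes`, `toPig`) are hypothetical and not part of HCatalog or Pig, and it covers only the types listed on the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not an HCatalog class): encodes the slide's type table.
final class HCatToPigTypes {
    private static final Map<String, String> TABLE = new LinkedHashMap<>();
    static {
        // Primitives, named as on the slide.
        TABLE.put("int", "int");
        TABLE.put("long", "long");
        TABLE.put("float", "float");
        TABLE.put("double", "double");
        TABLE.put("string", "chararray");
        // Complex types.
        TABLE.put("map", "map");
        TABLE.put("list", "bag");
        TABLE.put("struct", "tuple");
    }

    // Returns the Pig type name for an HCatalog/Hive type name from the slide.
    static String toPig(String hcatType) {
        String pigType = TABLE.get(hcatType);
        if (pigType == null) {
            throw new IllegalArgumentException("type not covered by the slide: " + hcatType);
        }
        return pigType;
    }

    public static void main(String[] args) {
        for (String hcatType : TABLE.keySet()) {
            System.out.println(hcatType + " -> " + toPig(hcatType));
        }
    }
}
```

Rejecting unknown names instead of guessing keeps the sketch honest about what the slide actually lists.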




Examples



DDL


            $HCAT_HOME/bin/hcat -e "
            drop table if exists rawevents;
            create external table rawevents (
              url string, user string
            )
            partitioned by (ds string)
            "

            $HIVE_HOME/bin/hive -e "
            LOAD DATA LOCAL INPATH '...' OVERWRITE INTO TABLE rawevents
            PARTITION (ds='20120530')
            "




Pig

   raw = LOAD '/data/rawevents/20120530' AS (url, user);

   botless = FILTER raw BY myudfs.NotABot(user);

   grpd = GROUP botless BY (url, user);

   -- After GROUP, the grouping key is exposed as the tuple 'group'.
   cntd = FOREACH grpd GENERATE FLATTEN(group), COUNT(botless);

   STORE cntd INTO '/data/counted/20120530';
   Source: http://www.slideshare.net/hortonworks/h-cat-berlinbuzzwords2012, page 8




Pig + HCatalog
   Pig
   raw = LOAD '/data/rawevents/20120530' AS (url, user);

   Pig + HCatalog
   raw = LOAD 'rawevents' USING org.apache.hcatalog.pig.HCatLoader();

   Pig + HCatalog (partition filter, replaces the path in LOAD '/data/rawevents/20120530')
   raw_0530 = FILTER raw BY ds == '20120530';

   Pig
   STORE cntd INTO '/data/counted/20120530';

   Pig + HCatalog
   STORE cntd INTO 'counted' USING org.apache.hcatalog.pig.HCatStorer();
   Source: http://www.slideshare.net/hortonworks/h-cat-berlinbuzzwords2012, page 8



MapReduce


                    •    Use the HCatInputFormat and HCatOutputFormat classes

                    •    The value class is HCatRecord by default

                         •   The key is not used

                    •    Set OutputValueClass to HCatRecord

                    •    As always, a Reducer is optional rather than required

                    •    Partitions can be controlled

                    •    Easy to control through the schema




MapReduce - Job
                 // Configure a job that reads from and writes to HCatalog tables.
                 Job job = new Job(getConf());
                 job.setJarByClass(HCatMRTest.class);
                 job.setJobName("HCatMRTest");

                 // The key is unused; values are HCatRecord instances.
                 job.setOutputKeyClass(WritableComparable.class);
                 job.setOutputValueClass(HCatRecord.class);

                 job.setMapperClass(HCatMRTest.Map.class);
                 job.setInputFormatClass(HCatInputFormat.class);
                 job.setOutputFormatClass(HCatOutputFormat.class);

                 // Map-only job: no reducers.
                 job.setNumReduceTasks(0);


MapReduce - DB, TBL, Partition
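                 // Partition values for the output table.
                 java.util.Map<String, String> partition = ...
                 partition.put("ds", "20120530");

                 // Input: database, table, partition filter. Output: database, table, partition values.
                 InputJobInfo in = InputJobInfo.create("DB", "rawevents", "ds='20120530'");
                 OutputJobInfo out = OutputJobInfo.create("DB", "counted", partition);

                 HCatInputFormat.setInput(job, in);
                 HCatOutputFormat.setOutput(job, out);

                 // Write using the output table's own schema.
                 HCatSchema s = HCatOutputFormat.getTableSchema(job);
                 HCatOutputFormat.setSchema(job, s);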
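
The input side takes a filter string (`ds='20120530'`) while the output side takes a partition map. The sketch below derives the former from the latter in plain Java with no HCatalog dependency. `PartitionFilters.toFilter` is a hypothetical helper, and joining several keys with `and` is an assumption about the filter syntax that should be checked against the HCatalog version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not an HCatalog class): turns partition values into a filter string.
final class PartitionFilters {
    static String toFilter(Map<String, String> partition) {
        // Assumption: multiple conditions are combined with "and".
        StringJoiner filter = new StringJoiner(" and ");
        for (Map.Entry<String, String> e : partition.entrySet()) {
            String value = e.getValue();
            // Values containing a quote are rejected rather than escaped.
            if (value.indexOf('\'') >= 0) {
                throw new IllegalArgumentException("quote in partition value: " + value);
            }
            filter.add(e.getKey() + "='" + value + "'");
        }
        return filter.toString();
    }

    public static void main(String[] args) {
        Map<String, String> partition = new LinkedHashMap<>();
        partition.put("ds", "20120530");
        System.out.println(toFilter(partition)); // prints ds='20120530'
    }
}
```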


MapReduce - HCatRecord


                    •    Class that represents a single record

                    •    boolean, byte, short, integer, long, float, double, string, list, struct, map

                         •   tinyint : HCatRecord.getByte

                         •   smallint : HCatRecord.getShort

                    •    Fields can be accessed by index or by column name

                         •   Access by column name requires the HCatSchema

                    •    Reserve space so that partition columns can be included
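
Why name-based access needs a schema can be illustrated without HCatalog on the classpath. In the sketch below, `MiniSchema` and `MiniRecord` are hypothetical stand-ins, not the real `HCatSchema` and `HCatRecord`. They only mimic the idea that a record stores values by position, so a column name has to be translated to a position through the schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MiniRecordDemo {
    public static void main(String[] args) {
        // Two data columns plus one slot reserved for the partition column.
        MiniSchema schema = new MiniSchema("url", "user", "ds");
        MiniRecord record = new MiniRecord(schema.size());
        record.set("url", schema, "http://example.org/");
        record.set(2, "20120530");
        System.out.println(record.get(0) + " " + record.get("ds", schema));
    }
}

// Hypothetical stand-in for a table schema: an ordered list of column names.
final class MiniSchema {
    private final List<String> columns;

    MiniSchema(String... columns) {
        this.columns = Arrays.asList(columns);
    }

    int positionOf(String name) {
        int index = columns.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("unknown column: " + name);
        }
        return index;
    }

    int size() {
        return columns.size();
    }
}

// Hypothetical stand-in for a record: values are stored purely by position.
final class MiniRecord {
    private final List<Object> values = new ArrayList<>();

    MiniRecord(int size) {
        for (int i = 0; i < size; i++) {
            values.add(null);
        }
    }

    Object get(int index) {
        return values.get(index);
    }

    void set(int index, Object value) {
        values.set(index, value);
    }

    // Name-based access is index-based access after a schema lookup.
    Object get(String name, MiniSchema schema) {
        return get(schema.positionOf(name));
    }

    void set(String name, MiniSchema schema, Object value) {
        set(schema.positionOf(name), value);
    }
}
```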




MapReduce - HCatRecord
            How to obtain table schema information

            HCatSchema in = HCatInputFormat.getTableSchema(context);
            HCatSchema out = HCatOutputFormat.getTableSchema(context);

            // HCatRecord is abstract; DefaultHCatRecord is its concrete implementation.
            HCatRecord record = new DefaultHCatRecord(3);
            record.set("url", out, value.get("url", in));

            context.write(null, record);



            The schema information is recorded (encoded) in job.xml
             * mapreduce.lib.hcat.job.info
             * mapreduce.lib.hcatoutput.info




Conclusions




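                    •    Metadata management becomes easier even when only Pig and MR are used

                    •    The benefit is greatest when multiple tools are used together

                    •    Contributions are arriving quickly, so more can be expected in the future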




Templeton



                The Templeton project is named after a
                character in the award-winning children's
                novel Charlotte's Web, by E. B. White. The
                novel's protagonist is a pig named Wilbur.
                Templeton is a rat who helps Wilbur by
                running errands and making deliveries.




Templeton

                •        HCatalog integration

                    •     Thrift
                    •     Java API (HCATALOG-419)
                    •     REST API
                •        Web services interface for HCatalog access and Pig, Hive,
                         and MR job execution
                    •     http://github.com/hortonworks/templeton
                    •     HCATALOG-182
                    •     a.k.a. 'webhcat'


Getting started

                  • Install
                     ◦ Requirements
                        ■ Hadoop 0.20.205 or Hadoop 1.x
                        ■ ZooKeeper
                        ■ HCatalog
                        ■ Hadoop Distributed Cache
                            ■ To use the Hive, Pig, or hadoop/streaming
                              resources
                  • Configuration
                     ◦ templeton-site.xml
                  • Security
                     ◦ Default security (without additional authentication)
                     ◦ Authentication via Kerberos

Templeton Resources



                :version
                   Returns a list of supported response types.
                status
                   Returns the Templeton server status.
                version
                   Returns a list of supported versions and the
                current version.




Templeton Resources (2)
                ddl
                    Performs an HCatalog DDL command.
                ddl/database
                    List HCatalog databases.
                ddl/database/:db (GET)
                    Describe an HCatalog database.
                ddl/database/:db (PUT)
                    Create an HCatalog database.
                ddl/database/:db (DELETE)
                    Delete (drop) an HCatalog database.
                ddl/database/:db/table
                    List the tables in an HCatalog database.
                ddl/database/:db/table/:table (GET)
                    Describe an HCatalog table.
                ddl/database/:db/table/:table (POST)
                    Rename an HCatalog table.
                ddl/database/:db/table/:table/partition
                    List all partitions in an HCatalog table.
                ddl/database/:db/table/:table/partition/:partition (GET)
                    Describe a single partition in an HCatalog table.
                    ......
                    ......
                ddl/database/:db/table/:table/partition/:partition (PUT)
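
The resource paths above are relative to a versioned base URL (the curl examples in this deck use `http://tb080:50111/templeton/v1`). A sketch of composing such URLs in plain Java follows. `TempletonPaths` and its methods are hypothetical helpers rather than part of Templeton, and restricting names to letters, digits and underscores is a simplifying assumption made to avoid URL escaping.

```java
// Hypothetical helper (not part of Templeton): builds DDL resource URLs.
final class TempletonPaths {
    // Simplifying assumption: only plain identifiers are accepted, so no escaping is needed.
    private static String ident(String name) {
        if (name == null || !name.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("unsupported name: " + name);
        }
        return name;
    }

    static String database(String base, String db) {
        return base + "/ddl/database/" + ident(db);
    }

    static String table(String base, String db, String table) {
        return database(base, db) + "/table/" + ident(table);
    }

    static String partitions(String base, String db, String table) {
        return table(base, db, table) + "/partition";
    }

    // Appends the user.name query parameter used in the deck's curl examples.
    static String withUser(String url, String user) {
        return url + "?user.name=" + ident(user);
    }

    public static void main(String[] args) {
        String base = "http://tb080:50111/templeton/v1";
        System.out.println(withUser(table(base, "default", "emp"), "nexr"));
        System.out.println(withUser(partitions(base, "default", "emp"), "nexr"));
    }
}
```

The HTTP method (GET, PUT, POST, DELETE) then selects the operation on the resource, as listed above.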
Templeton Resources (3)

                mapreduce/streaming
                    Creates and queues Hadoop streaming MapReduce jobs.
                mapreduce/jar
                    Creates and queues standard Hadoop MapReduce jobs.
                pig
                    Creates and queues Pig jobs.
                hive
                    Runs Hive queries and commands.
                queue
                    Returns a list of all jobids registered for the user.
                queue/:jobid (GET)
                    Returns the status of a job given its ID.
                queue/:jobid (DELETE)
                    Kill a job given its ID.




Examples

                $ curl -s 'http://tb080:50111/templeton/v1/status'
                {"status":"ok","version":"v1"}
                $ curl -s -d user.name=nexr -d 'exec=show tables;' \
                    'http://tb080:50111/templeton/v1/ddl'
                {
                  "stdout": "emp\nname\nname_a29\n",
                  "stderr": "WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. ......
                //[jar:file:/home/nexr/nexr_platforms/hadoop/hadoop-1.0.3/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nOK\nTime taken: 0.491 seconds\n",
                  "exitcode": 0
                }
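
The `-d` options in the second command are sent as one form body (`user.name=...&exec=...`). A Java client would normally URL-encode each field itself, which curl's `-d` does not do. The sketch below shows that encoding step only. `FormBody` is a hypothetical helper, and the field names `user.name` and `exec` are simply those from the curl command above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds an application/x-www-form-urlencoded body.
final class FormBody {
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user.name", "nexr");
        fields.put("exec", "show tables;");
        System.out.println(encode(fields)); // prints user.name=nexr&exec=show+tables%3B
    }
}
```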




Examples

                $ curl -s 'http://tb080:50111/templeton/v1/ddl/database/default/table/emp?user.name=nexr'
                {
                  "statement": "use default; desc emp; ",
                  "error": "...",
                  "exec": {
                    "stdout": "{\"columns\":[{\"name\":\"empno\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"deptno\",\"type\":\"int\"}]}\t \t \n",
                    "stderr": "WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. ......
                explanation.\nOK\nTime taken: 0.324 seconds\nOK\nTime taken: 0.398 seconds\n",
                    "exitcode": 0
                  }
                }




Examples
                $ curl -s -X PUT -HContent-type:application/json -d '{
                 "comment": "Test table",
                 "columns": [
                   { "name": "id", "type": "bigint" },
                   { "name": "price", "type": "float", "comment": "The unit price" } ],
                 "partitionedBy": [
                   { "name": "country", "type": "string" } ],
                 "format": { "storedAs": "rcfile" } }' \
                'http://tb080:50111/templeton/v1/ddl/database/default/table/test_table?user.name=nexr'
                hive> show tables;
                OK
                emp
                test_table
                Time taken: 0.477 seconds
                hive> describe extended test_table;
                OK
                id bigint
                price float The unit price
                country string

                Detailed Table Information Table(tableName:test_table,
                dbName:default, owner:nexr, createTime:1342578059, lastAccessTime:0,
                retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id,
                type:bigint, comment:null), FieldSchema(name:price, type:float,
                comment:The unit price), FieldSchema(name:country, type:string,

Future of Templeton



                  • webhcat
                  • Java API based on REST API
                  • Integrate or replace existing web interfaces, e.g.,
                    WebHDFS




References

                  • Apache HCatalog (Incubating), http://incubator.apache.org/hcatalog/
                  • HCatalog, http://www.slideshare.net/ydn/jan-2012-hug-hcatalog
                  • Future of HCatalog, http://www.slideshare.net/hortonworks/future-of-hcatalog-hadoop-summit-2012
                  • Introduction to HCatalog, http://geekdani.wordpress.com/2012/07/11/introduction-to-hcatalog/
                  • Installing HCatalog and integrating Hive & Pig schemas with HCatalog (in Korean), http://mixellaneous.tistory.com/1123




Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

HCatalog & Templeton

  • 1. HCatalog & Templeton. Youngwoo Kim (brandon.kim@nexr.com, kt.com), Daegeun Kim (dani.kim@geekple.com). Data Analysis Platform, KTCloudware (NexR). Wednesday, July 18, 12
  • 3. Hadoop Ecosystem (Many data processing tools)
      [Diagram: MapReduce, Hive, and Pig side by side. Pig uses LoadFunc / StoreFunc; Hive uses the Metastore (backed by an RDBMS) and SerDe. All three sit on InputFormat / OutputFormat / ... above the Filesystem.]
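  • 4. Problems
      • There is no metastore outside of Hive.
      • Integration is not easy when several tools are used on one cluster.
          • Communication costs are incurred every time.
              • Where? How? What?
          • M/R and Pig users have a lot of information to remember.
          • Changes to the schema, data path, or format heavily affect M/R and Pig jobs.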
  • 5. HCatalog
      • Apache Incubator
      • Based on the Hive metastore
      • Provides M/R and Pig users with a programming interface for reading and writing data
      • Provides every DDL command that does not require a MapReduce job (CLI commands)
          • Excludes import/export, CREATE TABLE AS SELECT, and the like
      • Provides data exploration features
          • SHOW TABLES, DESCRIBE
          • http://incubator.apache.org/hcatalog/docs/r0.4.0/cli.html
      • Developed by Hortonworks, Yahoo, Twitter, and others
  • 6. Table abstraction
      • Metadata
          • Data location, schema, compression, partitions, format, etc.
      • Data is abstracted through HCatalog.
      • Metadata is managed in one place, which makes its role all the more important.
      • Column types: primitives, map, list, and struct are supported.
  • 7. HCatalog
      [Diagram: MapReduce uses HCatInputFormat / HCatOutputFormat and Pig uses HCatLoader / HCatStorer. Together with Hive, both go through the Metastore (backed by an RDBMS) and SerDe, on top of InputFormat / OutputFormat and the Filesystem.]
  • 8. Data types : Pig
      HCatalog = Hive                                  Pig
      primitives (int, long, float, double, string)    int, long, float, double, chararray
      map                                              map (contains key and value pairs)
      list                                             bag (contains a list of elements of the same data type)
      struct                                           tuple (contains elements of different data types)
  • 10. DDL
      $HCAT_HOME/bin/hcat -e "
        drop table if exists rawevents;
        create external table rawevents (url string, user string)
          partitioned by (ds string)"
      $HIVE_HOME/bin/hive -e "
        LOAD DATA LOCAL INPATH '...' OVERWRITE INTO TABLE rawevents
          PARTITION (ds='20120530')"
  • 11. Pig
      raw = LOAD '/data/rawevents/20120530' AS (url, user);
      botless = FILTER raw BY myudfs.NotABot(user);
      grpd = GROUP botless BY (url, user);
      cntd = FOREACH grpd GENERATE FLATTEN(group), COUNT(botless);
      STORE cntd INTO '/data/counted/20120530';
      Source: http://www.slideshare.net/hortonworks/h-cat-berlinbuzzwords2012, page 8
  • 12. Pig + HCatalog
      Pig:
        raw = LOAD '/data/rawevents/20120530' AS (url, user);
      Pig + HCatalog:
        raw = LOAD 'rawevents' USING org.apache.hcatalog.pig.HCatLoader();
      Pig:
        LOAD '/data/rawevents/20120530'
      Pig + HCatalog (Partition Filter):
        raw_0530 = FILTER raw BY ds == '20120530';
      Pig:
        STORE cntd INTO '/data/counted/20120530';
      Pig + HCatalog:
        STORE cntd INTO 'counted' USING org.apache.hcatalog.pig.HCatStorer();
      Source: http://www.slideshare.net/hortonworks/h-cat-berlinbuzzwords2012, page 8
  • 13. MapReduce
      • Uses the HCatInputFormat and HCatOutputFormat classes
      • The value class is HCatRecord by default
      • The key is not used
      • Set OutputValueClass to HCatRecord
      • As always, a Reducer is optional, not required
      • Partitions can be controlled
      • Easy to control through the schema
  • 14. MapReduce - Job
      Job job = new Job(getConf());
      job.setJarByClass(HCatMRTest.class);
      job.setJobName("HCatMRTest");
      job.setOutputKeyClass(WritableComparable.class);
      job.setOutputValueClass(HCatRecord.class);
      job.setMapperClass(HCatMRTest.Map.class);
      job.setInputFormatClass(HCatInputFormat.class);
      job.setOutputFormatClass(HCatOutputFormat.class);
      job.setNumReduceTasks(0);
  • 15. MapReduce - DB, TBL, Partition
      java.util.Map<String, String> partition = ...
      partition.put("ds", "20120530");
      // Input: database, table, partition filter
      InputJobInfo in = InputJobInfo.create("DB", "rawevents", "ds='20120530'");
      // Output: database, table, partition values to write
      OutputJobInfo out = OutputJobInfo.create("DB", "counted", partition);
      HCatInputFormat.setInput(job, in);
      HCatOutputFormat.setOutput(job, out);
      HCatSchema s = HCatOutputFormat.getTableSchema(job);
      HCatOutputFormat.setSchema(job, s);
  • 16. MapReduce - HCatRecord
      • The class used to represent a single record
      • boolean, byte, short, integer, long, float, double, string, list, struct, map
          • tinyint : HCatRecord.getByte
          • smallint : HCatRecord.getShort
      • Fields can be accessed by index or by column name
          • Access by column name requires the HCatSchema
      • Reserve space in the record so that partition columns can fit
  • 17. MapReduce - HCatRecord
      How to obtain the table schema:
        HCatSchema in = HCatInputFormat.getTableSchema(context);
        HCatSchema out = HCatOutputFormat.getTableSchema(context);
        // HCatRecord is abstract, so instantiate DefaultHCatRecord
        HCatRecord record = new DefaultHCatRecord(3);
        record.set("url", out, value.get("url", in));
        context.write(null, record);
      The schema information is recorded (encoded) in job.xml:
        * mapreduce.lib.hcat.job.info
        * mapreduce.lib.hcatoutput.info
  • 18. Conclusions
      • Even if you use only Pig and MR, metadata management becomes easier.
      • It pays off most when several tools are used together.
      • Contributions are coming in quickly, so expect more in the future.
  • 21. The Templeton project is named after a character in the award-winning children's novel Charlotte's Web, by E. B. White. The novel's protagonist is a pig named Wilbur. Templeton is a rat who helps Wilbur by running errands and making deliveries.
  • 22. Templeton
      • HCatalog integration
      • Thrift
      • Java API (HCATALOG-419)
      • REST API
      • Web services interface for HCatalog access and for Pig, Hive, and MR job execution
      • http://github.com/hortonworks/templeton
      • HCATALOG-182
      • a.k.a. 'webhcat'
  • 23. Getting started
      • Install
          ◦ Requirements
              ■ Hadoop 0.20.205 or Hadoop 1.x
              ■ ZooKeeper
              ■ HCatalog
              ■ Hadoop Distributed Cache
              ■ To use the Hive, Pig, or hadoop/streaming resources
      • Configuration
          ◦ templeton-site.xml
      • Security
          ◦ Default security (without additional authentication)
          ◦ Authentication via Kerberos
  • 24. Templeton Resources
      :version    Returns a list of supported response types.
      status      Returns the Templeton server status.
      version     Returns a list of supported versions and the current version.
  • 25. Templeton Resources (2)
      ddl                                                          Performs an HCatalog DDL command.
      ddl/database                                                 List HCatalog databases.
      ddl/database/:db (GET)                                       Describe an HCatalog database.
      ddl/database/:db (PUT)                                       Create an HCatalog database.
      ddl/database/:db (DELETE)                                    Delete (drop) an HCatalog database.
      ddl/database/:db/table                                       List the tables in an HCatalog database.
      ddl/database/:db/table/:table (GET)                          Describe an HCatalog table.
      ddl/database/:db/table/:table (POST)                         Rename an HCatalog table.
      ddl/database/:db/table/:table/partition                      List all partitions in an HCatalog table.
      ddl/database/:db/table/:table/partition/:partition (GET)     Describe a single partition in an HCatalog table.
      ......                                                       ......
      ddl/database/:db/table/:table/partition/:partition (PUT)
  • 26. Templeton Resources (3)
      mapreduce/streaming       Creates and queues Hadoop streaming MapReduce jobs.
      mapreduce/jar             Creates and queues standard Hadoop MapReduce jobs.
      pig                       Creates and queues Pig jobs.
      hive                      Runs Hive queries and commands.
      queue                     Returns a list of all jobids registered for the user.
      queue/:jobid (GET)        Returns the status of a job given its ID.
      queue/:jobid (DELETE)     Kill a job given its ID.
  • 27. Examples
      $ curl -s 'http://tb080:50111/templeton/v1/status'
      {"status":"ok","version":"v1"}
      $ curl -s -d user.name=nexr -d 'exec=show tables;' 'http://tb080:50111/templeton/v1/ddl'
      {
        "stdout": "emp\nname\nname_a29\n",
        "stderr": "WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. ...... //[jar:file:/home/nexr/nexr_platforms/hadoop/hadoop-1.0.3/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nOK\nTime taken: 0.491 seconds\n",
        "exitcode": 0
      }
  • 28. Examples
      $ curl -s 'http://tb080:50111/templeton/v1/ddl/database/default/table/emp?user.name=nexr'
      {
        "statement": "use default; desc emp; ",
        "error": "...",
        "exec": {
          "stdout": "{\"columns\":[{\"name\":\"empno\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"deptno\",\"type\":\"int\"}]}\t \t \n",
          "stderr": "WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. ...... explanation.\nOK\nTime taken: 0.324 seconds\nOK\nTime taken: 0.398 seconds\n",
          "exitcode": 0
        }
      }
  • 29. Examples
      $ curl -s -X PUT -HContent-type:application/json -d '{
          "comment": "Test table",
          "columns": [
            { "name": "id", "type": "bigint" },
            { "name": "price", "type": "float", "comment": "The unit price" } ],
          "partitionedBy": [
            { "name": "country", "type": "string" } ],
          "format": { "storedAs": "rcfile" } }' \
        'http://tb080:50111/templeton/v1/ddl/database/default/table/test_table?user.name=nexr'
      hive> show tables;
      OK
      emp
      test_table
      Time taken: 0.477 seconds
      hive> describe extended test_table;
      OK
      id bigint
      price float The unit price
      country string
      Detailed Table Information Table(tableName:test_table, dbName:default, owner:nexr, createTime:1342578059, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:bigint, comment:null), FieldSchema(name:price, type:float, comment:The unit price), FieldSchema(name:country, type:string,
  • 30. Future of Templeton
      • webhcat
      • Java API based on REST API
      • Integrate or replace existing web interfaces, e.g., WebHDFS
  • 31. References
      • Apache HCatalog (Incubating), http://incubator.apache.org/hcatalog/
      • HCatalog, http://www.slideshare.net/ydn/jan-2012-hug-hcatalog
      • Future of HCatalog, http://www.slideshare.net/hortonworks/future-of-hcatalog-hadoop-summit-2012
      • Introduction to HCatalog, http://geekdani.wordpress.com/2012/07/11/introduction-to-hcatalog/
      • Installing HCatalog and integrating Hive & Pig schemas with HCatalog, http://mixellaneous.tistory.com/1123
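
The curl examples on slides 27 to 29 all follow one URL pattern: a host and port, the /templeton/v1 prefix, a resource path from the Templeton Resources tables, and a user.name parameter. A minimal Java sketch of that pattern follows. TempletonUrls and its method names are hypothetical helpers written for illustration and are not part of Templeton; the sketch only builds strings, sends no request, and assumes database, table, and job names are plain identifiers.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper, not part of Templeton: assembles v1 request URLs. */
final class TempletonUrls {
    private final String base; // e.g. http://tb080:50111/templeton/v1
    private final String user; // sent as the user.name parameter

    TempletonUrls(String host, int port, String user) {
        this.base = "http://" + host + ":" + port + "/templeton/v1";
        this.user = user;
    }

    /** Form-encodes a parameter value (space becomes '+', ';' becomes %3B). */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Rejects anything but a plain identifier, so path segments need no escaping. */
    static String ident(String name) {
        if (!name.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("not a plain identifier: " + name);
        }
        return name;
    }

    /** GET status: called without user.name, as in the slide 27 example. */
    String status() {
        return base + "/status";
    }

    /** GET ddl/database/:db/table/:table, which describes a table. */
    String table(String db, String table) {
        return base + "/ddl/database/" + ident(db) + "/table/" + ident(table)
                + "?user.name=" + enc(user);
    }

    /** GET or DELETE queue/:jobid, for job status or to kill a job. */
    String job(String jobId) {
        return base + "/queue/" + ident(jobId) + "?user.name=" + enc(user);
    }

    /** Form body for POST ddl, the equivalent of curl's two -d arguments. */
    String ddlForm(String exec) {
        return "user.name=" + enc(user) + "&exec=" + enc(exec);
    }

    public static void main(String[] args) {
        TempletonUrls t = new TempletonUrls("tb080", 50111, "nexr");
        System.out.println(t.status());
        System.out.println(t.table("default", "emp"));
        System.out.println(t.ddlForm("show tables;"));
    }
}
```

With the host, port, and user from the slides, table("default", "emp") produces the same URL as the slide 28 request. Restricting names to plain identifiers is a simplification chosen to keep the sketch short; a real client would need proper path escaping.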