SlideShare une entreprise Scribd logo
1  sur  17
Introduction to Pig Xiafei.qiu@PCA
Nested Data Model Field, Tuple, Bag, Map
Normal Operators Arithmetic Operators X = FOREACH A GENERATE f1, f2, f1 % f2; Boolean Operators X = FILTER A BY (f1 == 8) OR (NOT (f2+f3 > f1)); Cast operators X = FOREACH B GENERATE group, (chararray) COUNT(A) AS total; Comparison Operators X = FILTER A BY (f1 matches '.*apache.*') OR (NOT (f2+f3 > f1)); Flatten Operator Tuple: remove a level of nesting Bag :remove a level of nesting, may cause cross product 
Normal Operators
Relational Operators LOADa bag of tuples A = LOAD 'data' [USING function] [AS schema];  STORE A = STORE alias INTO 'directory' [USING function]; FOREACHtuple in the bag, produce a new tuple A = FOREACH queries GENERATE uid, expandQuery(query); FILTERa bag to produce a subset of it A = FILTER queries BY uidneq ‘bot’ OR notBot(uid);
Relational Operators COGROUP/GROUPone or less than 127 relations alias = GROUP … by …, … by… {group: int, A: {name: chararray,age: int,gpa: float}} (18,{(John,18,4.0F),(Joe,18,3.8F)})
Relational Operators JOIN(inner/outer) Replicated Joins one or more relations are small enough to fit into main memory.  Skewed Joins computes a histogram of the key space and uses this data to allocate reducers for a given key.  Merge Joins Sorted, perform join on map phase
Relational Operators
Relational Operators ORDERalias by filed DESC/ASC Unstable SPLITalias INTO alias IF …, alias IF … CROSS cross product X = CROSS A, B; DISTINCT Removes duplicate tuples in a relation. X = DISTINCT A; LIMIT LIMITE A 3; SAMPLE SAMPLE alias size; IMPORT Import other .pig file DEFINE Define a Pig macro.
Built In Eval Function AVG/MAX/MIN/SUM  on a single column of a bag; group it first COUNT/ COUNT_STAR  number of elements in a bag; COUNT_STAR  counts null CONCAT DIFF IsEmpty SIZE TOKENIZE
Other Built In Function Load/Store Functions Math Functions String Functions
Map-Reduce Plan Compilation Compile each GROUP into distinct Map-Reduce job Push commands between LOAD and GROUP to the Map Side Commands between subsequent GROUP Gi and Gi+1 pushed into the Reduce  Side of Gi
Map-Reduce Plan Compilation ORDER is compiled into two map-reduce jobs. MR1: sample the key space MR2: sort
User Defined Function Simple Eval Function public class UPPER extends EvalFunc<String>{  public String exec(Tuple input) throws IOException {     // .......  }}
User Defined Function Aggregate Functions Algebraic Interface they can be computed incrementally in a distributed fashion. Accumulator Interface designed to decrease memory usage
Accumulator Interface public interface Accumulator <T> {    public void accumulate(Tuple b) throws IOException;      public T getValue();      public void cleanup();}
Aggregate Functions public interface Algebraic{        public String getInitial();        public String getIntermed();        public String getFinal();}

Contenu connexe

Tendances

STACK ( LIFO STRUCTURE) - Data Structure
STACK ( LIFO STRUCTURE) - Data StructureSTACK ( LIFO STRUCTURE) - Data Structure
STACK ( LIFO STRUCTURE) - Data StructureYaksh Jethva
 
Mapreduce: Theory and implementation
Mapreduce: Theory and implementationMapreduce: Theory and implementation
Mapreduce: Theory and implementationSri Prasanna
 
Csa stack
Csa stackCsa stack
Csa stackPCTE
 
Presentation topic is stick data structure
Presentation topic is stick data structurePresentation topic is stick data structure
Presentation topic is stick data structureAizazAli21
 
Simple Linear Regression with R
Simple Linear Regression with RSimple Linear Regression with R
Simple Linear Regression with RJerome Gomes
 
Stacks Implementation and Examples
Stacks Implementation and ExamplesStacks Implementation and Examples
Stacks Implementation and Examplesgreatqadirgee4u
 
Stack organization
Stack organizationStack organization
Stack organizationchauhankapil
 
Data Visualization With R: Introduction
Data Visualization With R: IntroductionData Visualization With R: Introduction
Data Visualization With R: IntroductionRsquared Academy
 
Data Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple GraphsData Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple GraphsRsquared Academy
 

Tendances (20)

STACK ( LIFO STRUCTURE) - Data Structure
STACK ( LIFO STRUCTURE) - Data StructureSTACK ( LIFO STRUCTURE) - Data Structure
STACK ( LIFO STRUCTURE) - Data Structure
 
Introduction To Stack
Introduction To StackIntroduction To Stack
Introduction To Stack
 
Stack project
Stack projectStack project
Stack project
 
Algo>Stacks
Algo>StacksAlgo>Stacks
Algo>Stacks
 
Stack of Data structure
Stack of Data structureStack of Data structure
Stack of Data structure
 
Stack
StackStack
Stack
 
Stack - Data Structure
Stack - Data StructureStack - Data Structure
Stack - Data Structure
 
Mapreduce: Theory and implementation
Mapreduce: Theory and implementationMapreduce: Theory and implementation
Mapreduce: Theory and implementation
 
Stack push pop
Stack push popStack push pop
Stack push pop
 
Stack a Data Structure
Stack a Data StructureStack a Data Structure
Stack a Data Structure
 
Csa stack
Csa stackCsa stack
Csa stack
 
Presentation topic is stick data structure
Presentation topic is stick data structurePresentation topic is stick data structure
Presentation topic is stick data structure
 
stack presentation
stack presentationstack presentation
stack presentation
 
Simple Linear Regression with R
Simple Linear Regression with RSimple Linear Regression with R
Simple Linear Regression with R
 
Stack and queue
Stack and queueStack and queue
Stack and queue
 
Stacks Implementation and Examples
Stacks Implementation and ExamplesStacks Implementation and Examples
Stacks Implementation and Examples
 
Stack organization
Stack organizationStack organization
Stack organization
 
Stacks in c++
Stacks in c++Stacks in c++
Stacks in c++
 
Data Visualization With R: Introduction
Data Visualization With R: IntroductionData Visualization With R: Introduction
Data Visualization With R: Introduction
 
Data Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple GraphsData Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple Graphs
 

Similaire à Introduction to pig

ADT STACK and Queues
ADT STACK and QueuesADT STACK and Queues
ADT STACK and QueuesBHARATH KUMAR
 
Map reduce (from Google)
Map reduce (from Google)Map reduce (from Google)
Map reduce (from Google)Sri Prasanna
 
Functional Programming for OO Programmers (part 2)
Functional Programming for OO Programmers (part 2)Functional Programming for OO Programmers (part 2)
Functional Programming for OO Programmers (part 2)Calvin Cheng
 
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and ImplementationDistributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementationtugrulh
 
Stacks,queues,linked-list
Stacks,queues,linked-listStacks,queues,linked-list
Stacks,queues,linked-listpinakspatel
 
Mapfilterreducepresentation
MapfilterreducepresentationMapfilterreducepresentation
MapfilterreducepresentationManjuKumara GH
 
Functional Programming Concepts for Imperative Programmers
Functional Programming Concepts for Imperative ProgrammersFunctional Programming Concepts for Imperative Programmers
Functional Programming Concepts for Imperative ProgrammersChris
 
Type class survival guide
Type class survival guideType class survival guide
Type class survival guideMark Canlas
 
Data structures stacks
Data structures   stacksData structures   stacks
Data structures stacksmaamir farooq
 
Fp in scala part 1
Fp in scala part 1Fp in scala part 1
Fp in scala part 1Hang Zhao
 
Composition birds-and-recursion
Composition birds-and-recursionComposition birds-and-recursion
Composition birds-and-recursionDavid Atchley
 
Functional Pearls 4 (YAPC::EU::2009 remix)
Functional Pearls 4 (YAPC::EU::2009 remix)Functional Pearls 4 (YAPC::EU::2009 remix)
Functional Pearls 4 (YAPC::EU::2009 remix)osfameron
 

Similaire à Introduction to pig (20)

Lec_4_1_IntrotoPIG.pptx
Lec_4_1_IntrotoPIG.pptxLec_4_1_IntrotoPIG.pptx
Lec_4_1_IntrotoPIG.pptx
 
4.1-Pig.pptx
4.1-Pig.pptx4.1-Pig.pptx
4.1-Pig.pptx
 
Hadoop Pig
Hadoop PigHadoop Pig
Hadoop Pig
 
ADT STACK and Queues
ADT STACK and QueuesADT STACK and Queues
ADT STACK and Queues
 
Map reduce (from Google)
Map reduce (from Google)Map reduce (from Google)
Map reduce (from Google)
 
Functional Programming for OO Programmers (part 2)
Functional Programming for OO Programmers (part 2)Functional Programming for OO Programmers (part 2)
Functional Programming for OO Programmers (part 2)
 
Lec2 Mapred
Lec2 MapredLec2 Mapred
Lec2 Mapred
 
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and ImplementationDistributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
Distributed Computing Seminar - Lecture 2: MapReduce Theory and Implementation
 
Stacks,queues,linked-list
Stacks,queues,linked-listStacks,queues,linked-list
Stacks,queues,linked-list
 
Mapfilterreducepresentation
MapfilterreducepresentationMapfilterreducepresentation
Mapfilterreducepresentation
 
Functional Programming Concepts for Imperative Programmers
Functional Programming Concepts for Imperative ProgrammersFunctional Programming Concepts for Imperative Programmers
Functional Programming Concepts for Imperative Programmers
 
Type class survival guide
Type class survival guideType class survival guide
Type class survival guide
 
Data structures stacks
Data structures   stacksData structures   stacks
Data structures stacks
 
stacks and queues
stacks and queuesstacks and queues
stacks and queues
 
Fp in scala part 1
Fp in scala part 1Fp in scala part 1
Fp in scala part 1
 
Composition birds-and-recursion
Composition birds-and-recursionComposition birds-and-recursion
Composition birds-and-recursion
 
Pig
PigPig
Pig
 
Functional Pearls 4 (YAPC::EU::2009 remix)
Functional Pearls 4 (YAPC::EU::2009 remix)Functional Pearls 4 (YAPC::EU::2009 remix)
Functional Pearls 4 (YAPC::EU::2009 remix)
 
Zenith it-hadoop-training
Zenith it-hadoop-trainingZenith it-hadoop-training
Zenith it-hadoop-training
 
Pig workshop
Pig workshopPig workshop
Pig workshop
 

Dernier

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Dernier (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Introduction to pig