Introduction to Pig
Agenda
• What is Pig, and what is it used for?
• Pig Philosophy
• Pig’s Data Model
• Pig Example
• Pig Latin
• Pig Latin vs SQL
• Pig Macros
• Pig UDFs
What Is Pig, and What Is It Used For?
• Pig provides an engine that executes data flows in parallel on Hadoop, much as MapReduce distributes map tasks across the nodes of a cluster to get a job done.
• Pig uses its own language, Pig Latin, to express these data flows.
• Pig runs on Hadoop. It uses both the Hadoop Distributed File System (HDFS) and Hadoop’s processing system, MapReduce.
• By default, Pig reads input files from HDFS, uses HDFS to store intermediate data between MapReduce jobs, and writes its output to HDFS.
• Pig Latin use cases tend to fall into three categories: traditional extract-transform-load (ETL) data pipelines, research on raw data, and iterative processing.
Pig Philosophy
• Pigs eat anything
• Pigs live anywhere
• Pigs are domestic animals
• Pigs fly
Pig’s Data Model : Types
• Pig’s data types fall into two categories: scalar types, which contain a single value, and complex types, which contain other types.
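A rough way to picture the two categories is with Python stand-ins. The sketch below assumes the usual correspondence (a Pig tuple as a Python tuple, a bag as a list of tuples, a map as a dictionary); the variable names, the values, and the `describe` helper are made up for illustration and are not part of Pig.

```python
# Scalar stand-ins: each holds a single value.
year = "1950"        # chararray
temperature = 22     # int
score = 0.5          # float

# Complex stand-ins: each contains other values.
reading = (year, temperature)                 # tuple: ordered fields
readings = [("1950", 22), ("1950", -11)]      # bag: a collection of tuples
station = {"name": "hypothetical"}            # map: chararray keys to values


def describe(value):
    """Return 'complex' for tuple/bag/map stand-ins, 'scalar' otherwise."""
    return "complex" if isinstance(value, (tuple, list, dict)) else "scalar"


for value in (temperature, reading, readings, station):
    print(type(value).__name__, "->", describe(value))
```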
Pig’s Data Model : Schemas
• If a schema for the data is available, Pig uses it both for up-front error checking and for optimization.
• Syntax: Loads = LOAD 'data.txt' AS (col1:int, col2:chararray, col3:chararray, col4:float);
• A schema can also name the fields without giving explicit data types. In that case, each field’s type is assumed to be bytearray.
• Syntax: Loads = LOAD 'data.txt' AS (col1, col2, col3, col4);
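The effect of the two schema forms can be sketched in plain Python. This is only an illustration of the idea, not Pig's implementation: `apply_schema` is a hypothetical helper, the input line is made up, tab-separated input is assumed, and a failed cast is modeled as `None` to stand in for a Pig null.

```python
def apply_schema(line, schema):
    """Split a tab-separated line and cast each field as the schema requests.

    schema is a list of (name, type_name) pairs. A type_name of None stands
    for an undeclared type, which is kept as raw bytes (like bytearray).
    """
    casts = {"int": int, "float": float, "chararray": str}
    record = {}
    for (name, type_name), raw in zip(schema, line.split("\t")):
        if type_name is None:
            record[name] = raw.encode("utf-8")
            continue
        try:
            record[name] = casts[type_name](raw)
        except ValueError:
            record[name] = None  # value could not be cast
    return record


typed = [("col1", "int"), ("col2", "chararray"),
         ("col3", "chararray"), ("col4", "float")]
untyped = [(name, None) for name, _ in typed]

print(apply_schema("7\tfoo\tbar\t1.5", typed))
print(apply_schema("7\tfoo\tbar\t1.5", untyped))
```

With the typed schema the first field becomes an integer; with the untyped schema every field stays as raw bytes until something casts it.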
Pig Example
• Grunt is Pig’s interactive shell. It lets users enter Pig Latin interactively and also interact with HDFS.
• To enter Grunt, run pig with an execution mode: pig -x local, pig -x mapreduce, or pig -x tez.
• records = LOAD 'input/ncdc/micro-tab/sample.txt' AS (year:chararray, temperature:int, quality:int);
• filtered_records = FILTER records BY temperature != 9999 AND quality IN (0, 1, 4, 5, 9);
• grouped_records = GROUP filtered_records BY year;
• max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
• DUMP max_temp;
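To see what each statement in this data flow produces, the same steps can be mimicked in plain Python. This is a sketch of the semantics only: the rows below are made up rather than taken from sample.txt, and everything runs in one process rather than as parallel jobs.

```python
from collections import defaultdict

# Made-up (year, temperature, quality) rows standing in for the loaded file.
records = [
    ("1950", 0, 1),
    ("1950", 22, 1),
    ("1950", 9999, 1),   # dropped: temperature equals 9999
    ("1949", 111, 1),
    ("1949", 78, 2),     # dropped: quality 2 is not in the accepted list
]

# FILTER records BY temperature != 9999 AND quality IN (0, 1, 4, 5, 9)
filtered_records = [
    r for r in records if r[1] != 9999 and r[2] in (0, 1, 4, 5, 9)
]

# GROUP filtered_records BY year: one (group, bag of whole records) per year
grouped_records = defaultdict(list)
for record in filtered_records:
    grouped_records[record[0]].append(record)

# FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature)
# Sorted only to make the printed order stable.
max_temp = sorted(
    (year, max(r[1] for r in bag)) for year, bag in grouped_records.items()
)

# DUMP max_temp
for row in max_temp:
    print(row)
```

The point to notice is the GROUP step: each group carries a bag of complete records, which is why the FOREACH reaches into it with filtered_records.temperature.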
Pig Latin : Relational Operators
Pig Latin : Diagnostic/UDF Operators
Pig Latin vs SQL
Pig Macros
• Macros provide a way to package reusable pieces of Pig Latin code from within Pig Latin itself.
DEFINE max_by_group(X, group_key, max_field) RETURNS Y {
    A = GROUP $X BY $group_key;
    $Y = FOREACH A GENERATE group, MAX($X.$max_field);
};
records = LOAD 'input/ncdc/micro-tab/sample.txt' AS (year:chararray, temperature:int, quality:int);
filtered_records = FILTER records BY temperature != 9999 AND quality IN (0, 1, 4, 5, 9);
max_temp = max_by_group(filtered_records, year, temperature);
DUMP max_temp;
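A macro call is expanded into ordinary Pig Latin before the script runs. The sketch below imitates that substitution with Python's `string.Template`, whose `$name` placeholders happen to match the macro's parameter syntax. The `expand` helper is hypothetical, and the sketch leaves out details of Pig's real expansion, such as renaming aliases defined inside the macro (here `A`).

```python
from string import Template

# Body of the max_by_group macro with its $-prefixed parameters.
macro_body = Template(
    "A = GROUP $X BY $group_key;\n"
    "$Y = FOREACH A GENERATE group, MAX($X.$max_field);"
)


def expand(X, group_key, max_field, Y):
    """Substitute the call's arguments and its return alias into the body."""
    return macro_body.substitute(
        X=X, group_key=group_key, max_field=max_field, Y=Y
    )


# max_temp = max_by_group(filtered_records, year, temperature);
expanded = expand("filtered_records", "year", "temperature", "max_temp")
print(expanded)
```

The printed text is the GROUP and FOREACH pair from the Pig Example slide, which is what the macro call stands for.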
Pig UDFs
• A Filter UDF
• An Eval UDF
• A Load UDF
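Filter and load functions are normally written in Java against Pig's own classes, but an eval function can also be written in Python. Below is a minimal sketch of one. The function `strip_text`, the file name `udfs.py`, and the field name `cleaned` are all hypothetical, and the fallback decorator is an assumption added so the file can be run and tested outside Pig.

```python
# Hypothetical file udfs.py. Under Pig it might be registered with:
#   REGISTER 'udfs.py' USING streaming_python AS udfs;
#   cleaned = FOREACH records GENERATE udfs.strip_text(year);
try:
    # Supplied by Pig when it runs Python UDFs via streaming_python.
    from pig_util import outputSchema
except ImportError:
    # Assumption: stand-in decorator so the file also runs without Pig.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("cleaned:chararray")
def strip_text(value):
    """Eval UDF: trim surrounding whitespace, passing nulls (None) through."""
    if value is None:
        return None
    return value.strip()
```

The decorator tells Pig the name and type of the value the function returns, so later statements can refer to the result like any other typed field.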
