The Apache Hadoop HIVE
Omoyayi Ibrahim Omodamilola
Student No.: 20174831
PhD Biomedical Engineering
Outline
• Big Data
• History of Database (NoSQL vs SQL)
• New SQL database
• SQL
• NoSQL
• Factors Affecting the Selection of a database
• Hadoop Hive
– Functions of Hive on Hadoop
– Hive vs Java vs Pig
• Hadoop Distributed File System
• Hive Architecture
• Work Flow of Hive
• List of References
INTRODUCTION
• Facebook began developing Apache Hive in 2007 in response to its data growth.
• Its existing ETL system began to fail within a few years as more people joined Facebook.
• In August 2008, Facebook decided to move to a more scalable open-source Hadoop environment: Hive.
• Facebook, Netflix, and Amazon support Apache Hive’s SQL dialect, now known as HiveQL.
SQL (left) vs NoSQL (right)
Source: Google Images
NEW STRUCTURED QUERY LANGUAGE
NewSQL
• Relational + NoSQL
• designed for Web-scale applications
• provide many of the traditional SQL
operations
A class of modern relational database management systems that seek to provide the same scalable performance as NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional database system.
RELATIONAL DATABASES SQL
• Structured Query Language (SQL)
• A relational database consists of two or more tables with columns and rows
• The relationship between tables and field types is called a schema
• SQL is a programming language used by database architects (MySQL, Sybase, Oracle, or IBM DB2) to design relational databases.
• These databases are well understood and widely
supported
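The ideas on this slide (tables, field types, a schema, and SQL) can be sketched in a few lines. This is a minimal illustration, not part of the original slides: it uses Python's standard-library sqlite3 module as a stand-in for any relational database, and the table and column names (`authors`, `books`) are hypothetical.

```python
import sqlite3

# In-memory database; SQLite is used here only as a convenient stand-in
# for a relational database.
conn = sqlite3.connect(":memory:")

# The schema: tables, columns, field types, and the relationship between
# the tables (a foreign key). All names are hypothetical.
conn.executescript("""
    CREATE TABLE authors (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE books (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
""")

conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO books (title, author_id) VALUES (?, ?)",
    [("Notes", 1), ("Letters", 1)],
)

# A join follows the relationship between the two tables.
rows = conn.execute("""
    SELECT authors.name, books.title
    FROM books
    JOIN authors ON authors.id = books.author_id
    ORDER BY books.title
""").fetchall()

print(rows)
```

The join at the end is the kind of query that normalized relational data relies on.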
Popular SQL databases and RDBMSs
• MySQL: the most popular open-source database.
• Oracle: an object-relational DBMS written in C and C++.
• IBM DB2: a family of database server products from IBM that are built to handle advanced “big data” analytics.
• Sybase: a relational database server product for businesses, used primarily on Unix and Linux.
• MS SQL Server: a Microsoft-developed RDBMS for enterprise-level databases that supports both SQL and NoSQL architectures.
• Microsoft Azure: a cloud computing platform that supports any operating system and lets you store, compute, and scale data.
• MariaDB: an enhanced, drop-in replacement for MySQL.
• PostgreSQL: an enterprise-level, object-relational DBMS that supports procedural languages such as Perl and Python.
NOSQL DATABASES
• Easy to access
• Greater flexibility
• Document-oriented data
• Massive amounts of data
• Unclear data requirements
• Data includes: sensor data, social sharing, personal settings, photos, location-based information, online activity, usage metrics, etc.
Source: UpWork
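To make "document-oriented data" and "unclear data requirements" concrete, here is a small sketch in plain Python. It does not use any particular NoSQL product; the documents and field names are hypothetical and only show that records in one collection need not share a fixed set of columns.

```python
import json

# Three hypothetical documents in one collection. Each carries only the
# fields that apply to it, so no shared table schema is required.
raw_documents = [
    '{"user": "u1", "type": "photo", "location": {"lat": 35.1, "lon": 33.4}}',
    '{"user": "u2", "type": "sensor", "reading": 21.5, "unit": "C"}',
    '{"user": "u1", "type": "settings", "theme": "dark"}',
]

documents = [json.loads(doc) for doc in raw_documents]

# Structure is interpreted when the data is read: pick out the documents
# that happen to have a "location" field and ignore the rest.
with_location = [d["user"] for d in documents if "location" in d]

# Collect every field name seen across the collection.
all_fields = sorted({key for d in documents for key in d})

print(with_location)
print(all_fields)
```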
POPULAR NOSQL DATABASES
• MongoDB: the most popular NoSQL system.
• Apache CouchDB: a database built for the web that stores its documents in the JSON data exchange format.
• HBase: another Apache project, developed as part of Hadoop; an open-source, non-relational “column store”.
• Oracle NoSQL: Oracle’s entry into the NoSQL category.
• Apache Cassandra: born at Facebook and built to handle massive amounts of structured data. Users include Instagram, Comcast, Apple, and Spotify (a growing app).
• Riak: has fault-tolerant replication and automatic data distribution built in for excellent performance.
SQL
Pros
• Relational databases work with structured data.
• They support ACID (Atomicity, Consistency, Isolation, Durability) transactional consistency.
• They come with built-in data integrity and a large ecosystem.
• Relationships in this system have constraints.
• There is limitless indexing.
• Strong SQL.
Cons
• Relational databases do not scale out horizontally very well (in concurrency or data size), only vertically.
• Data is normalized, meaning lots of joins, which affects speed.
• They have problems working with semi-structured data.
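The ACID and built-in data integrity points can be illustrated with a short sketch. It is an assumption-laden example rather than anything from the slides: it uses Python's standard-library sqlite3 module, and the `accounts` table, the names, and the `transfer` helper are hypothetical. A CHECK constraint plays the role of the integrity rule, and a failed transfer is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the built-in integrity rule: no negative balances.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice; rolled back

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The credit is deliberately applied before the debit so that the rollback is visible: when the second statement violates the constraint, the first one is undone as well.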
NoSQL
Pros
• They scale out horizontally and work with unstructured and semi-structured data.
• Some support ACID transactional consistency.
• Schema-free or schema-on-read options.
• High availability of language training, setup, and development costs.
• Databases are open source and so “free”.
• Numerous commercial products are available.
Cons
• Data is denormalized, requiring mass updates (e.g. a product name change).
• Weaker or eventual consistency instead of ACID.
• Does not have built-in data integrity (must be enforced in code).
• Limited support.
Hadoop
• Facebook, Google, Yahoo, Amazon, and Microsoft
• Exponential growth of data
• Doug Cutting developed an open-source version of the MapReduce system, called Hadoop
• Hadoop is a software ecosystem that allows for massively parallel computing
• A large data job that might take 20 hours of processing time on a relational database may take only 3 minutes with Hadoop
• Hive’s query language, HQL, looks like traditional SQL
Hadoop clusters on Client computers
Hive is not
• A relational database
• Designed for online transaction processing (OLTP)
• A language for real-time queries and row-level
updates
FUNCTIONS OF HIVE ON HADOOP
• Data warehouse system built on top of Hadoop
• Takes advantage of Hadoop’s processing power
• Facilitates data summarization, ad-hoc queries, and analysis of large datasets stored in Hadoop
• Provides a SQL interface (known as HiveQL, or HQL) that is familiar to most programmers
• Saves time compared with hand-writing Hadoop MapReduce programs
• Provides a mechanism to project structure onto Hadoop datasets
• Loads data quickly and allows flexibility, at the cost of query time
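The last two points describe what is often called schema-on-read: data is stored as-is, and structure is projected onto it only when a query runs. The sketch below imitates that idea in plain Python. It is not Hive code; the file contents, the field names, and the `read_with_schema` helper are all hypothetical.

```python
import csv
import io

# Raw data is stored exactly as it arrived; nothing is validated at load time.
# The file contents and field names here are hypothetical.
raw_file = io.StringIO(
    "2024-01-01,alice,3\n"
    "2024-01-01,bob,5\n"
    "2024-01-02,alice,4\n"
)

# The "schema" is a description kept separately from the data and
# projected onto each row when a query runs.
schema = [("day", str), ("user", str), ("clicks", int)]


def read_with_schema(handle, schema):
    """Parse each raw line into a typed record at read time."""
    for fields in csv.reader(handle):
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


# Similar in spirit to: SELECT user, SUM(clicks) ... GROUP BY user
totals = {}
for record in read_with_schema(raw_file, schema):
    totals[record["user"]] = totals.get(record["user"], 0) + record["clicks"]

print(totals)
```

Loading is cheap because nothing is parsed up front; the parsing cost is paid on every query instead, which is the trade-off the slide refers to.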
Apache frameworks
• Sqoop: used to import and export data between HDFS and an RDBMS.
• Pig: a procedural language platform used to develop scripts for MapReduce operations.
• Hive: a platform used to develop SQL-like scripts that perform MapReduce operations.
Hive vs Java and Pig
Java
• Word count MapReduce example: list the words in a document and the number of occurrences of each
• Java takes 63 lines of code to write this; Hive takes only 7 easy lines of code.
Pig
• High-level programming language
• Good for ETL
• Powerful transformation capabilities
• Often used in combination with Hive
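For readers unfamiliar with the word count example, the sketch below shows the map, shuffle, and reduce steps it consists of. It is written in plain, single-process Python purely for illustration; it is neither the Java nor the Hive version the slide compares, and the function names and sample text are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))


document = ["Hive runs on Hadoop", "Pig runs on Hadoop too"]
counts = word_count(document)
print(sorted(counts.items()))
```

On a real cluster the map and reduce steps run in parallel across many machines; a SQL-like query expresses the same grouping and counting declaratively.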
Hive Architecture
HIVE DIRECTORY STRUCTURE
• Lib directory
– $HIVE_HOME/lib
– Location of the Hive JAR files
– Contains the actual Java code that implements the Hive functionality
• Bin directory
– $HIVE_HOME/bin
– Location of Hive scripts/services
• Conf directory
– $HIVE_HOME/conf
– Location of configuration files
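The layout above can be checked on a machine with a small script. This is a sketch under stated assumptions: it reads the `HIVE_HOME` environment variable, falls back to `/opt/hive` (a hypothetical path used only for illustration), and the `hive_directories` helper is not part of Hive.

```python
import os
from pathlib import Path


def hive_directories(hive_home):
    """Return the three directories named above, relative to HIVE_HOME."""
    home = Path(hive_home)
    return {
        "lib": home / "lib",    # Hive JAR files
        "bin": home / "bin",    # Hive scripts and services
        "conf": home / "conf",  # configuration files
    }


# "/opt/hive" is a hypothetical fallback used only when HIVE_HOME is unset.
dirs = hive_directories(os.environ.get("HIVE_HOME", "/opt/hive"))
for name, path in dirs.items():
    status = "found" if path.is_dir() else "missing"
    print(f"{name}: {path} ({status})")
```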
Summary & Conclusion
• Hive is a data warehouse infrastructure tool to process
structured data in Hadoop.
• It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
• Hive was initially developed by Facebook; the Apache Software Foundation later took it up and developed it further as open source under the name Apache Hive.
• It is used by different companies. For example,
Amazon uses it in Amazon Elastic MapReduce.
REFERENCES
• http://www.dataversity.net/review-pros-cons-different-databases-relational-versus-non-relational/
• https://segment.com/blog/choosing-a-database-for-analytics/
• https://www.upwork.com/hiring/data/sql-vs-nosql-databases-whats-the-difference/
DON’T THANK ME THANK HIVE
