This Sqoop presentation will help you learn what Sqoop is, why it is important, its key features, its architecture, how Sqoop import and export work, how Sqoop processes data, and finally how to work with Sqoop commands. Sqoop is a tool used to transfer bulk data between Hadoop and external data stores such as relational databases. This tutorial will help you understand how Sqoop can load data from a MySQL database into HDFS and process that data using Sqoop commands. Finally, you will learn how to export the table imported into HDFS back to the RDBMS. Now, let us get started and understand Sqoop in detail.
The following topics are covered in this Sqoop Hadoop presentation:
1. Need for Sqoop
2. What is Sqoop?
3. Sqoop features
4. Sqoop Architecture
5. Sqoop import
6. Sqoop export
7. Sqoop processing
8. Demo on Sqoop
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark Developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem, such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create databases and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro schemas, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, its architecture, sources, sinks, channels, and configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, including creating, transforming, and querying DataFrames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training.
Need for Sqoop
Processing huge volumes of data requires loading data from diverse sources into Hadoop clusters. This process of loading data from heterogeneous sources comes with a set of challenges:
1. Maintaining data consistency
2. Ensuring efficient utilization of resources
3. Loading bulk data into Hadoop was not possible
4. Loading data using scripts was slow
Solution: Sqoop overcame all of these challenges of the traditional approach and can load bulk data from an RDBMS into Hadoop easily.
What is Sqoop?
Sqoop is a tool used to transfer bulk data between Hadoop and external data stores such as relational databases (MS SQL Server, MySQL). Data moves in both directions: an import brings data from the RDBMS into Hadoop, and an export sends data from Hadoop back to the RDBMS.
SQOOP = SQL + HADOOP
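As a quick illustration, a minimal Sqoop import looks like the sketch below. The host, database, user, and table names (dbserver, employees_db, sqoop_user, employees) are hypothetical placeholders, not values from this presentation.

# Minimal sketch: import a MySQL table into HDFS.
# -P prompts for the database password interactively.
sqoop import \
  --connect jdbc:mysql://dbserver/employees_db \
  --username sqoop_user \
  -P \
  --table employees \
  --target-dir /user/hadoop/employees

The matching sqoop export command reverses the direction; an example appears in the Sqoop processing section below.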
Sqoop Features
1. Parallel import/export: Sqoop uses the YARN framework to import and export data, which provides fault tolerance on top of parallelism.
2. Import results of SQL query: Sqoop allows us to import the result set returned by a SQL query into HDFS.
3. Connectors for all major RDBMS databases: Sqoop provides connectors for multiple relational database management systems (RDBMSs), such as MySQL and MS SQL Server.
4. Kerberos security integration: Sqoop supports the Kerberos computer network authentication protocol, which allows nodes communicating over a non-secure network to prove their identity to one another in a secure manner.
5. Full and incremental load: Sqoop can load a whole table or part of a table with a single command, so it supports both full and incremental loads. Two command sketches illustrating features 2 and 5 follow this list.
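The query import and incremental load features map directly to command-line options, as the hedged sketches below show. The connection string, column names, and values are illustrative assumptions.

# Sketch: import the results of a SQL query (feature 2). Sqoop requires the
# $CONDITIONS token in the WHERE clause so it can split the query across
# mappers; --split-by names the column used to partition the work.
sqoop import \
  --connect jdbc:mysql://dbserver/employees_db \
  --username sqoop_user -P \
  --query 'SELECT id, name, salary FROM employees WHERE $CONDITIONS' \
  --split-by id \
  --target-dir /user/hadoop/emp_query

# Sketch: an incremental load (feature 5) that appends only rows whose
# id column is greater than the last value imported previously.
sqoop import \
  --connect jdbc:mysql://dbserver/employees_db \
  --username sqoop_user -P \
  --table employees \
  --incremental append \
  --check-column id \
  --last-value 1000 \
  --target-dir /user/hadoop/employees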
Sqoop Processing
1. Sqoop runs in the Hadoop cluster.
2. It imports data from an RDBMS or NoSQL database into HDFS.
3. It uses mappers to slice the incoming data into multiple splits and loads the data into HDFS.
4. It exports data back into the RDBMS while making sure that the schema of the data in the database is maintained.
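To illustrate step 4, here is a minimal export sketch, again with hypothetical names. The target table must already exist in the database with a schema matching the HDFS data, which is how Sqoop keeps the database schema intact.

# Sketch: export HDFS data back into an existing RDBMS table.
# -m 4 runs the export with four parallel mappers.
sqoop export \
  --connect jdbc:mysql://dbserver/employees_db \
  --username sqoop_user -P \
  --table employees_copy \
  --export-dir /user/hadoop/employees \
  -m 4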