Apache Sqoop
BY DAVIN.J.ABRAHAM
What is Sqoop
 Apache Sqoop is a tool designed for efficiently transferring bulk
data between Apache Hadoop and structured datastores such as
relational databases.
 Sqoop imports data from external structured datastores into HDFS or
related systems like Hive and HBase.
 Sqoop can also be used to export data from Hadoop to external structured datastores such as relational databases and enterprise data warehouses.
 Sqoop works with relational databases such as: Teradata, Netezza,
Oracle, MySQL, Postgres, and HSQLDB.
Why Sqoop?
 As more organizations deploy Hadoop to analyse vast streams of information, they may find they need to transfer large amounts of data between Hadoop and their existing databases, data warehouses and other data sources.
 Loading bulk data into Hadoop from production systems, or accessing it from map-reduce applications running on a large cluster, is a challenging task: transferring data using scripts is inefficient and time-consuming.
Hadoop-Sqoop?
 Hadoop is great for storing massive data in terms of volume using
HDFS
 It provides a scalable processing environment for structured and unstructured data
 But it's batch-oriented, and thus not suitable for low-latency interactive query operations
 Sqoop is basically an ETL tool used to copy data between HDFS and SQL databases
 Import SQL data to HDFS for archival or analysis
 Export HDFS data to SQL (e.g., summarized data used in a DW fact table)
What Sqoop Does
 Designed to efficiently transfer bulk data between Apache Hadoop
and structured datastores such as relational databases, Apache
Sqoop:
 Allows data imports from external datastores and enterprise data
warehouses into Hadoop
 Parallelizes data transfer for fast performance and optimal system
utilization
 Copies data quickly from external systems to Hadoop
 Makes data analysis more efficient
 Mitigates excessive loads to external systems.
How Sqoop Works
 Sqoop provides a pluggable connector mechanism for optimal
connectivity to external systems.
 The Sqoop extension API provides a convenient framework for
building new connectors which can be dropped into Sqoop
installations to provide connectivity to various systems.
 Sqoop itself comes bundled with various connectors that can be
used for popular database and data warehousing systems.
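As a sketch of what this looks like in practice, a database without a bundled connector can often still be reached through Sqoop's generic JDBC path by naming the driver class explicitly. The SQL Server host, database and table below are purely illustrative, and the driver jar is assumed to have been copied into Sqoop's lib directory:
$ sqoop import \
--connect "jdbc:sqlserver://my-sqlserver-node:1433;databaseName=sales" \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--table orders \
--target-dir /user/airawat/sqoop-generic/orders
When a dedicated connector exists (as for MySQL in the examples that follow), Sqoop selects it automatically from the JDBC URL and --driver is not needed.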
Who Uses Sqoop?
 Online marketer Coupons.com uses Sqoop to exchange data between Hadoop and the IBM Netezza data warehouse appliance. The organization can query its structured databases and pipe the results into Hadoop using Sqoop.
 Education company The Apollo Group also uses the software, not only to extract data from databases but to inject the results of Hadoop jobs back into relational databases.
 Countless other Hadoop users use Sqoop to efficiently move their data.
Importing Data - List databases in your MySQL database
$ sqoop list-databases --connect jdbc:mysql://<<mysql-server>>/employees --username airawat --password myPassword
.
.
.
13/05/31 16:45:58 INFO manager.MySQLManager: Preparing to use a MySQL
streaming resultset.
information_schema
employees
test
List tables in your MySQL database
$ sqoop list-tables --connect jdbc:mysql://<<mysql-server>>/employees --username airawat --password myPassword
.
.
.
13/05/31 16:45:58 INFO manager.MySQLManager: Preparing to use a MySQL
streaming resultset.
departments
dept_emp
dept_manager
employees
employees_exp_stg
employees_export
salaries
titles
Importing data from MySQL into HDFS
 Replace "airawat-mySqlServer-node" with the host name of the node running the MySQL server, and replace the login credentials and target directory with your own.
Importing a table into HDFS - basic import
$ sqoop import 
--connect jdbc:mysql://airawat-mySqlServer-node/employees 
--username myUID 
--password myPWD 
--table employees 
-m 1 
--target-dir /user/airawat/sqoop-mysql/employees
.
.
.
.9139 KB/sec)
13/05/31 22:32:25 INFO mapreduce.ImportJobBase: Retrieved 300024
records
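The introduction mentioned that Sqoop can also feed Hive and HBase. As a minimal sketch (assuming Hive is installed on the same client node; the Hive table name is illustrative), the same import can create and load a Hive table directly by adding the Hive flags:
$ sqoop import \
--connect jdbc:mysql://airawat-mySqlServer-node/employees \
--username myUID \
--password myPWD \
--table employees \
--hive-import \
--hive-table employees \
-m 1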
Executing imports with an options
file for static information
 Rather than repeat the connection-related arguments with each import command, you can pass an options file as an argument to sqoop.
 Create a text file as follows and save it locally on the node you are running the sqoop client on.
Sample Options file:
___________________________________________________________________________
$ vi SqoopImportOptions.txt
#
#Options file for sqoop import
#
import
--connect
jdbc:mysql://airawat-mySqlServer-node/employees
--username
myUID
--password
myPwd
#
#All other commands should be specified in the command line
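One caveat: the options file above holds the password in plain text. If that is a concern, the password entry can be dropped from the file and supplied at run time instead, either interactively with -P or, on newer Sqoop releases, from a permission-restricted file with --password-file. A sketch, assuming the --password lines have been removed from the options file (the password file path is illustrative):
$ sqoop --options-file SqoopImportOptions.txt \
--password-file /user/airawat/.mysql.password \
--table departments \
-m 1 \
--target-dir /user/airawat/sqoop-mysql/departments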
Options File - Command
The command
$ sqoop --options-file SqoopImportOptions.txt 
--table departments 
-m 1 
--target-dir /user/airawat/sqoop-mysql/departments
.
.
.
13/05/31 22:48:55 INFO mapreduce.ImportJobBase: Transferred 153 bytes
in 26.2453 seconds (5.8296 bytes/sec)
13/05/31 22:48:55 INFO mapreduce.ImportJobBase: Retrieved 9 records.
The -m argument specifies the number of mappers. The departments table has only a handful of records, so I am setting it to 1.
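One way to cross-check the "Retrieved 9 records" figure against the source table, without opening a mysql shell, is sqoop eval, which runs an arbitrary SQL statement over a JDBC connection. A sketch (the options file above names the import tool, so the connection arguments are repeated here):
$ sqoop eval \
--connect jdbc:mysql://airawat-mySqlServer-node/employees \
--username myUID \
--password myPWD \
--query "SELECT COUNT(*) FROM departments"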
The files created in HDFS
Files created in HDFS:
$ hadoop fs -ls -R sqoop-mysql/
drwxr-xr-x - airawat airawat 0 2013-05-31 22:48 sqoop-mysql/departments
-rw-r--r-- 3 airawat airawat 0 2013-05-31 22:48 sqoop-mysql/departments/_SUCCESS
drwxr-xr-x - airawat airawat 0 2013-05-31 22:48 sqoop-mysql/departments/_logs
drwxr-xr-x - airawat airawat 0 2013-05-31 22:48 sqoop-mysql/departments/_logs/history
-rw-r--r-- 3 airawat airawat 79467 2013-05-31 22:48 sqoop-mysql/departments/_logs/history/cdh-jt01_1369839495962_job_201305290958_0062_conf.xml
-rw-r--r-- 3 airawat airawat 12441 2013-05-31 22:48 sqoop-mysql/departments/_logs/history/job_201305290958_0062_1370058514473_airawat_departments.jar
-rw-r--r-- 3 airawat airawat 153 2013-05-31 22:48 sqoop-mysql/departments/part-m-00000
To view the contents of a table
Data file contents:
$ hadoop fs -cat sqoop-mysql/departments/part-m-00000 | more
d009,Customer Service
d005,Development
d002,Finance
d003,Human Resources
d001,Marketing
d004,Production
d006,Quality Management
d008,Research
d007,Sales
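By default Sqoop writes text imports with comma-separated fields and newline-terminated records, which is exactly what the listing above shows. If the source data itself contains commas, different delimiters can be requested at import time; a sketch (the tab delimiter and target directory here are illustrative):
$ sqoop --options-file SqoopImportOptions.txt \
--table departments \
--fields-terminated-by '\t' \
-m 1 \
--target-dir /user/airawat/sqoop-mysql/departmentsTabDelimited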
Import All Rows, but Only Specific Columns
$ sqoop --options-file SqoopImportOptions.txt 
--table dept_emp 
--columns "EMP_NO,DEPT_NO,FROM_DATE,TO_DATE" 
--as-textfile 
-m 1 
--target-dir /user/airawat/sqoop-mysql/DeptEmp
Import All Columns, but Only Specific Rows, Using a Where Clause
Import all columns, filtering rows using a where clause
$ sqoop --options-file SqoopImportOptions.txt 
--table employees 
--where "emp_no > 499948" 
--as-textfile 
-m 1 
--target-dir /user/airawat/sqoop-mysql/employeeGtTest
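A related pattern is pulling only the rows added since the previous run, rather than re-adjusting the --where filter by hand each time. Sqoop supports this through incremental imports; a sketch, assuming emp_no is a monotonically increasing key, 499948 was the highest value already imported, and the target directory name is illustrative:
$ sqoop --options-file SqoopImportOptions.txt \
--table employees \
--incremental append \
--check-column emp_no \
--last-value 499948 \
-m 1 \
--target-dir /user/airawat/sqoop-mysql/employeesIncremental
At the end of the run Sqoop logs the new high-water mark to use as --last-value next time; saved jobs (sqoop job --create ...) can track it automatically.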
Import - Free Form Query
. Import with a free form query with where clause
$ sqoop --options-file SqoopImportOptions.txt 
--query 'select EMP_NO,FIRST_NAME,LAST_NAME from employees where EMP_NO <
20000 AND $CONDITIONS' 
-m 1 
--target-dir /user/airawat/sqoop-mysql/employeeFrfrmQry1
Import without Where clause
Import with a free-form query without a where clause
$ sqoop --options-file SqoopImportOptions.txt 
--query 'select EMP_NO,FIRST_NAME,LAST_NAME from employees
where $CONDITIONS' 
-m 1 
--target-dir /user/airawat/sqoop-mysql/employeeFrfrmQrySmpl2
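The literal $CONDITIONS token is mandatory in every free-form query: Sqoop substitutes a range predicate for each mapper so the result set can be split. With -m 1 the split is trivial, but with several mappers a split column must be supplied; a sketch (the target directory is illustrative, and the single quotes around the query keep the shell from expanding $CONDITIONS):
$ sqoop --options-file SqoopImportOptions.txt \
--query 'select EMP_NO,FIRST_NAME,LAST_NAME from employees where $CONDITIONS' \
--split-by EMP_NO \
-m 4 \
--target-dir /user/airawat/sqoop-mysql/employeeFrfrmQrySplit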
Export: Create sample Table
Employees
Create a table in mysql:
mysql> CREATE TABLE employees_export (
emp_no int(11) NOT NULL,
birth_date date NOT NULL,
first_name varchar(14) NOT NULL,
last_name varchar(16) NOT NULL,
gender enum('M','F') NOT NULL,
hire_date date NOT NULL,
PRIMARY KEY (emp_no)
);
Import Employees to hdfs to
demonstrate export
Import some data into HDFS:
sqoop --options-file SqoopImportOptions.txt 
--query 'select EMP_NO,birth_date,first_name,last_name,gender,hire_date
from employees where $CONDITIONS' 
--split-by EMP_NO 
--direct 
--target-dir /user/airawat/sqoop-mysql/Employees
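Before running the export, it can help to confirm what actually landed in HDFS; a quick, illustrative check (output not shown in the original slides):
$ hadoop fs -ls /user/airawat/sqoop-mysql/Employees
$ hadoop fs -cat /user/airawat/sqoop-mysql/Employees/part* | head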
EXPORT – Create a stage table
Create a stage table in mysql:
mysql > CREATE TABLE employees_exp_stg (
emp_no int(11) NOT NULL,
birth_date date NOT NULL,
first_name varchar(14) NOT NULL,
last_name varchar(16) NOT NULL,
gender enum('M','F') NOT NULL,
hire_date date NOT NULL,
PRIMARY KEY (emp_no)
);
The Export Command
$ sqoop export 
--connect jdbc:mysql://airawat-mysqlserver-node/employees 
--username MyUID 
--password myPWD 
--table employees_export 
--staging-table employees_exp_stg 
--clear-staging-table 
-m 4 
--export-dir /user/airawat/sqoop-mysql/Employees
.
.
.
13/06/04 09:54:18 INFO manager.SqlManager: Migrated 300024
records from `employees_exp_stg` to `employees_export`
Results of Export
Results
mysql> select * from employees_export limit 1;
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+-----------+--------+------------+
| 200000 | 1960-01-11 | Selwyn | Koshiba | M | 1987-06-05 |
+--------+------------+------------+-----------+--------+------------+
mysql> select count(*) from employees_export;
+----------+
| count(*) |
+----------+
| 300024 |
+----------+
mysql> select * from employees_exp_stg;
Empty set (0.00 sec)
Export – Update Mode
Export in update mode
Prep:
I am going to set hire_date to null for some records, to try this functionality out.
mysql> update employees_export set hire_date = null where emp_no > 400000;
Query OK, 99999 rows affected, 65535 warnings (1.26 sec)
Rows matched: 99999 Changed: 99999 Warnings: 99999
Running the export in update mode
Sqoop command:
Next, we will export the same data to the same table and see whether the null hire dates get updated.
$ sqoop export 
--connect jdbc:mysql://airawat-mysqlserver-node/employees 
--username myUID 
--password myPWD 
--table employees_export 
--direct 
--update-key emp_no 
--update-mode updateonly 
--export-dir /user/airawat/sqoop-mysql/Employees
It Worked!
. Results:
mysql> select count(*) from employees_export where hire_date
is null;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.22 sec)
Export in upsert (Update+Insert)
mode
Upsert = insert if the row does not exist, update if it does.
Upsert Command
sqoop export 
--connect jdbc:mysql://airawat-mysqlserver-node/employees 
--username myUID 
--password myPWD 
--table employees_export 
--update-key emp_no 
--update-mode allowinsert 
--export-dir /user/airawat/sqoop-mysql/Employees
Exports may Fail due to
 Loss of connectivity from the Hadoop cluster to the database (either
due to hardware fault, or server software crashes)
 Attempting to INSERT a row which violates a consistency constraint
(for example, inserting a duplicate primary key value)
 Attempting to parse an incomplete or malformed record from the
HDFS source data
 Attempting to parse records using incorrect delimiters
 Capacity issues (such as insufficient RAM or disk space)
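The parsing-related failures in particular can usually be avoided by telling the export job exactly how the HDFS files are laid out. A sketch, assuming the files are comma-delimited and use the literal string 'null' for missing values (the delimiter and null-string values shown are illustrative, not from the original slides):
$ sqoop export \
--connect jdbc:mysql://airawat-mysqlserver-node/employees \
--username myUID \
--password myPWD \
--table employees_export \
--input-fields-terminated-by ',' \
--input-null-string 'null' \
--input-null-non-string 'null' \
--export-dir /user/airawat/sqoop-mysql/Employees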
Sqoop up Healthcare?
 Most hospitals today store patient information in relational
databases
 In order to analyse this data and gain some insight from it, we need
to get it into Hadoop.
 Sqoop will make that process very efficient.
Thank You For Your Time