Sqoop – Advanced Options
2015
Contents
1 What is Sqoop?
2 Import and Export data using Sqoop
3 Import and Export command in Sqoop
4 Saved Jobs in Sqoop
5 Option File
6 Important Sqoop Options
What is Sqoop?
Sqoop is a tool designed for efficiently transferring bulk data between Hadoop and
structured data stores such as relational databases.
Import and Export using Sqoop
The import command in Sqoop transfers the data from RDBMS to HDFS/Hive/HBase.
The export command in Sqoop transfers the data from HDFS/Hive/HBase back to
RDBMS.
Import command in Sqoop
The command to import data into Hive :
sqoop import --connect <connect-string>/dbname --username uname -P
--table table_name --hive-import -m 1
The command to import data into HDFS :
sqoop import --connect <connect-string>/dbname --username uname -P
--table table_name -m 1
The command to import data into HBase :
sqoop import --connect <connect-string>/dbname --username uname -P
--table table_name --hbase-table table_name
--column-family col_fam_name --hbase-row-key row_key_name --hbase-create-table -m 1
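For instance, a Hive import of the AdventureWorks table used later in this deck might look like the following (the connection details are illustrative, not a prescribed setup) :
sqoop import --connect jdbc:mysql://192.168.56.1:3306/adventureworks --username uname -P
--table transactionhistory --hive-import -m 1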
Export command in Sqoop
The command to export data from Hive to RDBMS (point --export-dir at the table's warehouse directory) :
sqoop export --connect <connect-string>/db_name --table table_name -m 1
--export-dir <path_to_export_dir>
The command to export data from HDFS to RDBMS :
sqoop export --connect <connect-string>/db_name --table table_name -m 1
--export-dir <path_to_export_dir>
Limitations of Import and Export command:
- Import and Export commands are convenient when data has to be transferred between RDBMS and
HDFS/Hive/HBase only a limited number of times.
So what if the import and export commands have to be executed several times a day?
In such situations a Saved Sqoop Job can save your time.
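As a concrete sketch, exporting the directory imported earlier back into a hypothetical MySQL table could look like this (the target table name is an assumption) :
sqoop export --connect jdbc:mysql://192.168.56.1:3306/adventureworks --username uname -P
--table transactionhistory_copy --export-dir /user/cloudera/datasets/trans -m 1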
Saved Jobs in Sqoop
A Saved Sqoop Job remembers the parameters used to define it, so the same job can be
re-executed any number of times simply by invoking it by name.
The following command creates a saved job :
sqoop job --create job_name -- import --connect <connect-string>/dbname --table table_name
The command above only defines a job under the name you specify; no data is moved yet.
The job is added to your saved-jobs list and can be executed later.
The following command executes a saved job (arguments supplied after -- override the stored options) :
sqoop job --exec job_name -- --username uname -P
Sample Saved Job
sqoop job --create JOB1
-- import --connect jdbc:mysql://192.168.56.1:3306/adventureworks
--username XXX
--password XXX
--table transactionhistory
--target-dir /user/cloudera/datasets/trans
-m 1
--columns "TransactionID,ProductId,TransactionDate"
--check-column TransactionDate
--incremental lastmodified
--last-value "2004-09-01 00:00:00";
Important Options in Saved Jobs in Sqoop
Sqoop option            Usage
--connect               Connection string for the source database
--table                 Source table name
--columns               Columns to be extracted
--username              User name for accessing the source table
--password              Password for accessing the source table
--check-column          Specifies the column to be examined when determining which rows to import.
--incremental           Specifies how Sqoop determines which rows are new (append or lastmodified).
--last-value            Specifies the maximum value of the check column from the previous import.
                        Only rows whose check-column value is newer than last-value are imported,
                        and a saved job updates this value automatically after each run.
--target-dir            Target HDFS directory
-m                      Number of mapper tasks
--compress              Specifies that compression is to be applied while loading data into the target.
--fields-terminated-by  Field separator used in the output directory
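A minimal sketch showing how the formatting and compression options above are typically combined in an import (database, table, and separator choices are illustrative) :
sqoop import --connect jdbc:mysql://localhost:3306/dbname --username uname -P
--table table_name --target-dir /user/cloudera/datasets/table_name
--fields-terminated-by ',' --compress -m 4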
Sqoop Metastore
• A Sqoop metastore keeps track of all jobs.
• By default, the metastore is contained in your home directory under .sqoop and is
only used for your own jobs. If you want to share jobs, you would need to install a
JDBC-compliant database and use the --meta-connect argument to specify its
location when issuing job commands.
• Important Sqoop job commands:
• sqoop job --list – Lists all jobs available in the metastore
• sqoop job --exec JOB1 – Executes JOB1
• sqoop job --show JOB1 – Displays the saved parameters (metadata) of JOB1
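As a sketch of the shared-metastore setup mentioned above: Sqoop ships with an HSQLDB-based metastore service; the sqoop metastore command starts it on the host that stores the shared jobs, and clients then point --meta-connect at it (the host name below is a placeholder, 16000 is the default metastore port) :
sqoop metastore
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore.host:16000/sqoop --list
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore.host:16000/sqoop --exec JOB1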
Option File
Certain arguments have to be written out every time you run an import or export command
or create a saved job.
What would be an alternative to this repetitive work?
For instance, the following arguments (shown in Option.txt below) are used repeatedly in
import and export commands as well as saved jobs :
• These arguments can be saved in a single text file, say option.txt.
• While executing a command, just pass this file to the --options-file argument.
Option.txt :
import
--connect
jdbc:mysql://localhost/dbname
--username
uname
-P
The following command shows the use of the --options-file argument :
sqoop --options-file <path_to_options_file> --table table_name
Option File
1. Each argument in the option file should be on its own line, with an option's value on the line that follows it.
2. Arguments are written exactly as they appear on the command line, e.g. --connect.
3. Lines starting with # are treated as comments, and empty lines are ignored.
4. An option file is generally used when a large number of Sqoop jobs share a common set
of parameters (see the sketch after this list), such as:
1. Source RDBMS ID, Password
2. Source database URL
3. Field Separator
4. Compression type
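A minimal sketch of such a shared option file; the file name, credentials, and separator are illustrative assumptions :
# shared-options.txt – arguments common to several Sqoop jobs
import
--connect
jdbc:mysql://localhost/dbname
--username
uname
-P
--fields-terminated-by
,
--compress
Each job then supplies only its own table and target directory :
sqoop --options-file <path_to>/shared-options.txt --table table_name --target-dir /user/cloudera/datasets/table_name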
Sqoop Design Guidelines for Performance
1. Sqoop imports data in parallel from database sources. You can specify the number
of map tasks (parallel processes) used to perform the import with the -m argument.
Some databases may see improved performance by increasing this value to 8 or 16.
Do not increase the degree of parallelism beyond what is available within your
MapReduce cluster.
2. By default, the import process will use JDBC. Some databases can perform imports
in a more high-performance fashion by using database-specific data movement
tools. For example, MySQL provides the mysqldump tool which can export data
from MySQL to other systems very quickly. By supplying the --direct argument,
you are specifying that Sqoop should attempt the direct import channel.
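As a sketch of how these two guidelines combine for a MySQL source (the host, database, and table names are placeholders) :
sqoop import --connect jdbc:mysql://dbhost:3306/dbname --username uname -P
--table table_name --target-dir /user/cloudera/datasets/table_name --direct -m 8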
Thank You