Apache Sqoop

A Data Transfer Tool for Hadoop




         Arvind Prabhakar, Cloudera Inc. Sept 21, 2011
What is Sqoop?

● Allows easy import and export of data from structured
  data stores:
   ○ Relational Database
   ○ Enterprise Data Warehouse
   ○ NoSQL Datastore

● Allows easy integration with Hadoop based systems:
   ○ Hive
   ○ HBase
   ○ Oozie
Agenda

● Motivation

● Importing and exporting data using Sqoop

● Provisioning Hive Metastore

● Populating HBase tables

● Sqoop Connectors

● Current Status and Road Map
Motivation

● Structured data stored in Databases and EDW is not easily
  accessible for analysis in Hadoop

● Access to Databases and EDW from Hadoop Clusters is
  problematic.

● Forcing MapReduce to access data from Databases/EDWs is
  repetitive, error-prone and non-trivial.

● Data preparation is often required for efficient consumption
  by Hadoop-based data pipelines.

● Current methods of transferring data are inefficient and
  ad hoc.
Enter: Sqoop

    A tool to automate data transfer between structured     
    datastores and Hadoop.

Highlights

 ● Uses datastore metadata to infer structure definitions
 ● Uses MapReduce framework to transfer data in parallel (see the
   sketch below)
 ● Allows structure definitions to be provisioned in the Hive
   metastore
 ● Provides an extension mechanism to incorporate high-performance
   connectors for external systems.
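
For example, the degree of parallelism is simply the number of map tasks. A minimal sketch using the standard -m / --num-mappers option against the ORDERS table used in the following slides (the count of 8 is only illustrative):

$ sqoop import --connect jdbc:mysql://localhost/acmedb \
    --table ORDERS --username test --password **** \
    --num-mappers 8   # run 8 parallel map tasks instead of the default 4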
Importing Data

mysql> describe ORDERS;
+-----------------+-------------+------+-----+---------+-------+
| Field           | Type        | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| ORDER_NUMBER    | int(11)     | NO   | PRI | NULL    |       |
| ORDER_DATE      | datetime    | NO   |     | NULL    |       |
| REQUIRED_DATE   | datetime    | NO   |     | NULL    |       |
| SHIP_DATE       | datetime    | YES  |     | NULL    |       |
| STATUS          | varchar(15) | NO   |     | NULL    |       |
| COMMENTS        | text        | YES  |     | NULL    |       |
| CUSTOMER_NUMBER | int(11)     | NO   |     | NULL    |       |
+-----------------+-------------+------+-----+---------+-------+
7 rows in set (0.00 sec)
Importing Data
$ sqoop import --connect jdbc:mysql://localhost/acmedb 
  --table ORDERS --username test --password ****
 ...

INFO mapred.JobClient: Counters: 12
INFO mapred.JobClient:   Job Counters 
INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=12873
...
INFO mapred.JobClient:     Launched map tasks=4
INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
INFO mapred.JobClient:   FileSystemCounters
INFO mapred.JobClient:     HDFS_BYTES_READ=505
INFO mapred.JobClient:     FILE_BYTES_WRITTEN=222848
INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=35098
INFO mapred.JobClient:   Map-Reduce Framework
INFO mapred.JobClient:     Map input records=326
INFO mapred.JobClient:     Spilled Records=0
INFO mapred.JobClient:     Map output records=326
INFO mapred.JobClient:     SPLIT_RAW_BYTES=505
INFO mapreduce.ImportJobBase: Transferred 34.2754 KB in 11.2754 seconds (3.0398 KB/sec)
INFO mapreduce.ImportJobBase: Retrieved 326 records.
Importing Data

$ hadoop fs -ls
Found 32 items
....
drwxr-xr-x - arvind staff 0 2011-09-13 19:12 /user/arvind/ORDERS
....

$ hadoop fs -ls /user/arvind/ORDERS
Found 6 items
... 0 2011-09-13 19:12 /user/arvind/ORDERS/_SUCCESS
... 0 2011-09-13 19:12 /user/arvind/ORDERS/_logs
... 8826 2011-09-13 19:12 /user/arvind/ORDERS/part-m-00000
... 8760 2011-09-13 19:12 /user/arvind/ORDERS/part-m-00001
... 8841 2011-09-13 19:12 /user/arvind/ORDERS/part-m-00002
... 8671 2011-09-13 19:12 /user/arvind/ORDERS/part-m-00003
Exporting Data

$ sqoop export --connect jdbc:mysql://localhost/acmedb 
  --table ORDERS_CLEAN --username test --password **** 
  --export-dir /user/arvind/ORDERS
...
INFO mapreduce.ExportJobBase: Transferred 34.7178 KB in 6.7482 seconds (5.1447 KB/sec)
INFO mapreduce.ExportJobBase: Exported 326 records.
$



  ● Default delimiters: ',' for fields, newlines for records
  ● Optionally specify an escape sequence
  ● Delimiters can be specified for both import and export
    (see the sketch below)
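
A minimal sketch of overriding those defaults; --fields-terminated-by, --escaped-by (import) and --input-fields-terminated-by (export) are standard Sqoop formatting options, and the tab delimiter here is only illustrative:

# import ORDERS as tab-delimited files, escaping special characters with '\'
$ sqoop import --connect jdbc:mysql://localhost/acmedb \
    --table ORDERS --username test --password **** \
    --fields-terminated-by '\t' --escaped-by '\\'

# export the same tab-delimited files back into a table
$ sqoop export --connect jdbc:mysql://localhost/acmedb \
    --table ORDERS_CLEAN --username test --password **** \
    --export-dir /user/arvind/ORDERS \
    --input-fields-terminated-by '\t'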
Exporting Data

Exports can optionally use Staging Tables

 ● Map tasks populate the staging table

 ● Each map write is broken down into many transactions

 ● The staging table is then used to populate the target table in a
   single transaction

 ● In case of failure, the staging table provides insulation from
   data corruption (see the sketch below).
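
A minimal sketch of routing an export through a staging table; --staging-table and --clear-staging-table are standard Sqoop export options, and ORDERS_STAGE is a hypothetical table that must already exist with the same schema as the target:

# ORDERS_STAGE is a hypothetical staging table with the same schema as ORDERS_CLEAN
$ sqoop export --connect jdbc:mysql://localhost/acmedb \
    --table ORDERS_CLEAN --username test --password **** \
    --export-dir /user/arvind/ORDERS \
    --staging-table ORDERS_STAGE --clear-staging-table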
Importing Data into Hive

$ sqoop import --connect jdbc:mysql://localhost/acmedb 
  --table ORDERS --username test --password **** --hive-import
 ...

INFO mapred.JobClient: Counters: 12
INFO mapreduce.ImportJobBase: Transferred 34.2754 KB in 11.3995 seconds (3.0068 KB/sec)
INFO mapreduce.ImportJobBase: Retrieved 326 records.
INFO hive.HiveImport: Removing temporary files from import process: ORDERS/_logs
INFO hive.HiveImport: Loading uploaded data into Hive
...
WARN hive.TableDefWriter: Column ORDER_DATE had to be cast to a less precise type in Hive
WARN hive.TableDefWriter: Column REQUIRED_DATE had to be cast to a less precise type in Hive
WARN hive.TableDefWriter: Column SHIP_DATE had to be cast to a less precise type in Hive
...
$
Importing Data into Hive

$ hive
hive> show tables;
OK
...
orders
...
hive> describe orders;
OK
order_number int
order_date string
required_date string
ship_date string
status string
comments string
customer_number int
Time taken: 0.236 seconds
hive>
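
Once provisioned, the imported table can be queried like any other Hive table; a minimal example (the count follows from the 326 records imported above, and the MapReduce job output is elided):

hive> select count(*) from orders;
...
326
hive>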
Importing Data into HBase

$ bin/sqoop import --connect jdbc:mysql://localhost/acmedb 
  --table ORDERS --username test --password **** 
  --hbase-create-table --hbase-table ORDERS --column-family mysql
...
INFO mapreduce.HBaseImportJob: Creating missing HBase table ORDERS
...
INFO mapreduce.ImportJobBase: Retrieved 326 records.
$


  ● Sqoop creates the missing table if instructed
  ● If no row key is specified, the primary key column is used
    (see the sketch below)
  ● Each output column is placed in the same column family
  ● Every record read results in an HBase put operation
  ● All values are converted to their string representation and
    inserted as UTF-8 bytes.
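
A minimal sketch of choosing the row key explicitly with the standard --hbase-row-key option; ORDER_NUMBER is already the primary key here, so this merely makes the default choice explicit:

$ sqoop import --connect jdbc:mysql://localhost/acmedb \
    --table ORDERS --username test --password **** \
    --hbase-create-table --hbase-table ORDERS \
    --column-family mysql --hbase-row-key ORDER_NUMBER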
Importing Data into HBase

hbase(main):001:0> list
TABLE 
ORDERS 
1 row(s) in 0.3650 seconds

hbase(main):002:0>  describe 'ORDERS'
DESCRIPTION                             ENABLED
{NAME => 'ORDERS', FAMILIES => [                true
 {NAME => 'mysql', BLOOMFILTER => 'NONE',
  REPLICATION_SCOPE => '0', COMPRESSION => 'NONE',
  VERSIONS => '3', TTL => '2147483647',
  BLOCKSIZE => '65536', IN_MEMORY => 'false',
  BLOCKCACHE => 'true'}]}
1 row(s) in 0.0310 seconds

hbase(main):003:0>
Importing Data into HBase

hbase(main):001:0> scan 'ORDERS', { LIMIT => 1 }
ROW COLUMN+CELL
10100 column=mysql:CUSTOMER_NUMBER,timestamp=1316036948264,
    value=363
10100 column=mysql:ORDER_DATE, timestamp=1316036948264,
    value=2003-01-06 00:00:00.0
10100 column=mysql:REQUIRED_DATE, timestamp=1316036948264,
    value=2003-01-13 00:00:00.0
10100 column=mysql:SHIP_DATE, timestamp=1316036948264,
    value=2003-01-10 00:00:00.0
10100 column=mysql:STATUS, timestamp=1316036948264,
    value=Shipped
1 row(s) in 0.0130 seconds

hbase(main):012:0>
Sqoop Connectors

● Connector Mechanism allows creation of new connectors
  that improve/augment Sqoop functionality.

● Bundled connectors include:
   ○ MySQL, PostgreSQL, Oracle, SQLServer, JDBC
   ○ Direct MySQL, Direct PostgreSQL

● Regular connectors are JDBC based.

● Direct Connectors use native tools for high-performance
  data transfer implementation.
Import using Direct MySQL Connector

$ sqoop import --connect jdbc:mysql://localhost/acmedb 
   --table ORDERS --username test --password **** --direct
...
manager.DirectMySQLManager: Beginning mysqldump fast path import
...

Direct import works as follows:
 ● Data is partitioned into splits using JDBC
 ● Map tasks use mysqldump to do the import with a conditional
   selection clause (-w 'ORDER_NUMBER' > ...)
 ● Header and footer information is stripped out

Direct export similarly uses the mysqlimport utility.
Third Party Connectors

● Oracle - Developed by Quest Software

● Couchbase - Developed by Couchbase

● Netezza - Developed by Cloudera

● Teradata - Developed by Cloudera

● Microsoft SQL Server - Developed by Microsoft

● Microsoft PDW - Developed by Microsoft

● Volt DB - Developed by VoltDB
Current Status

Sqoop is currently in Apache Incubator

  ● Status Page
     http://incubator.apache.org/projects/sqoop.html

  ● Mailing Lists
     sqoop-user@incubator.apache.org
     sqoop-dev@incubator.apache.org

  ● Release
     Current shipping version is 1.3.0
Hadoop World 2011


A gathering of Hadoop practitioners, developers,
business executives, industry luminaries and
innovative companies in the Hadoop ecosystem.

    November 8-9, Sheraton New York Hotel & Towers, NYC
    Learn more and register at www.hadoopworld.com

    ● Network: 1400 attendees, 25+ sponsors
    ● Learn: 60 sessions across 5 tracks for
         ○ Developers
         ○ IT Operations
         ○ Enterprise Architects
         ○ Data Scientists
         ○ Business Decision Makers
    ● Train: Cloudera training and certification
       (November 7, 10, 11)
Sqoop Meetup



      Monday, November 7, 2011, 8pm - 9pm

                       at

     Sheraton New York Hotel & Towers, NYC
Thank you!

   Q&A
