1. Apache Sqoop
● What is it ?
● How does it work ?
● Interfaces
● Example
● Architecture
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
2. Sqoop – What is it ?
● A command line interface
– ( plus web in Sqoop2 )
● For data import / export to Hadoop
● Uses map-only jobs from MapReduce
● Supports incremental loads
● Written in Java
● Licensed under the Apache License 2.0
● Uses plugins for new types of data source
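
The incremental load bullet above can be sketched as a command. This is a minimal sketch, not a tested recipe: the host `dbhost`, the user `dbuser`, the table `orders` and the check column `id` are assumptions for illustration. The helper prints the command by default, so it can be reviewed before it is run against a real cluster.

```shell
#!/bin/sh
# Sketch only: host, database, user, table and column names are assumptions.
DRY_RUN=1                 # set to 0 to invoke sqoop for real

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"    # show the command without executing it
  else
    "$@"
  fi
}

# Append only rows whose id is greater than the last imported value.
incremental_import() {
  last_value="$1"
  run sqoop import \
    --connect "jdbc:mysql://dbhost:3306/db3" \
    --username "dbuser" -P \
    --table "orders" \
    --incremental append \
    --check-column "id" \
    --last-value "$last_value"
}

incremental_import 1000
```

Here `-P` prompts for the password on the console instead of placing it on the command line. Each run needs the highest value imported by the previous run; a saved Sqoop job can track that value between runs.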
3. Sqoop – How does it work ?
● Data sliced into partitions
● Mappers transfer data
● Data types determined via metadata
● Many data transfer formats supported
– e.g. CSV, Avro
● Can import into
– Hive ( use --hive-import flag )
– HBase ( use --hbase-* flags, e.g. --hbase-table )
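
The slicing step above can be illustrated with a little arithmetic. The sketch below is not Sqoop's own code; it only shows the idea of dividing the range of an integer split column evenly among the mappers, with each mapper running its own bounded query. The table `orders` and the column `id` are assumptions.

```shell
#!/bin/sh
# Illustrative sketch of range splitting on an integer column.
# Not Sqoop's actual code; table and column names are assumptions.
split_ranges() {
  min="$1"; max="$2"; mappers="$3"
  step=$(( (max - min) / mappers ))
  i=0
  lo="$min"
  while [ "$i" -lt "$mappers" ]; do
    if [ "$i" -eq $(( mappers - 1 )) ]; then
      # Last slice is closed so that the maximum value is included.
      echo "SELECT * FROM orders WHERE id >= $lo AND id <= $max"
    else
      hi=$(( lo + step ))
      echo "SELECT * FROM orders WHERE id >= $lo AND id < $hi"
      lo="$hi"
    fi
    i=$(( i + 1 ))
  done
}

split_ranges 0 1000 4
```

In a real import the split column and the number of slices are chosen with the `--split-by` and `--num-mappers` options; by default Sqoop splits on the table's primary key.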
4. Sqoop – Interfaces
● Get data from
– Relational databases
– Data warehouses
– NoSQL databases
● Load to Hive and HBase
● Integrates with Oozie
– for scheduling
5. Sqoop – Example
An example Sqoop command to
– load data from MySQL into Hive
bin/sqoop-import \
--connect jdbc:mysql://<mysql host>:<mysql port>/db3 \
--username <username> \
--password <password> \
--table <tableName> \
--hive-table <Hive tableName> \
--create-hive-table \
--hive-import \
--hive-home <hive path>
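
The same import pattern can target HBase instead of Hive. The following is a hedged sketch: the host `dbhost`, the user `dbuser`, the table `orders`, the column family `cf` and the row key `id` are all assumed names. The function only prints the command so the values can be checked first.

```shell
#!/bin/sh
# Sketch only: connection details, table, column family and row key are assumptions.
hbase_import_cmd() {
  # Assemble the arguments, then print them rather than running sqoop.
  set -- sqoop import \
    --connect "jdbc:mysql://dbhost:3306/db3" \
    --username "dbuser" -P \
    --table "orders" \
    --hbase-table "orders" \
    --column-family "cf" \
    --hbase-row-key "id" \
    --hbase-create-table
  echo "$*"
}

hbase_import_cmd
```

`--hbase-create-table` asks Sqoop to create the target table and column family if they do not exist yet; omit it when the HBase table is managed separately.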
6. Sqoop – Architecture
Sqoop has moved from
● Sqoop1 to Sqoop2
● Changed from a client install to a server install
● Now has web and command line access
● Server now accesses Hive & HBase
● Oozie uses REST API
9. Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems