Unlocking Hadoop for Your Relational DB

Kathleen Ting | @kate_ting
Technical Account Manager, Cloudera | Sqoop PMC Member
Hadoop User Group UK
10 April 2014

Who Am I?
•  Started 3 yr ago as 1st Cloudera Support Eng
•  Now manages Cloudera’s 2 largest customers
•  Sqoop Committer, PMC Member
•  Co-Author of the Apache Sqoop Cookbook

What is Sqoop?
•  Apache Top-Level Project
•  SQl to hadOOP
•  Tool to transfer data from relational databases
    •  Teradata, MySQL, PostgreSQL, Oracle, Netezza
•  To/From Hadoop ecosystem
    •  HDFS (text, sequence file), Hive, HBase, Avro

Why Sqoop?
•  Efficient/Controlled resource utilization
    •  Concurrent connections, Time of operation
•  Datatype mapping and conversion
    •  Automatic, and User override
•  Metadata propagation
    •  Sqoop Record
    •  Hive Metastore
    •  Avro

Agenda
Sqoop 1
•  Sqoop 1 Architecture
•  Sqoop 1 Command Line
•  Sqoop 1 Examples
•  Sqoop 1 Challenges
•  Troubleshooting Sqoop 1
•  Common Sqoop 1 Issues
    •  Protecting Your Password
    •  Sqoop Works on CLI Not in Oozie
    •  Choosing Proper Connector
    •  Overriding Type Mapping
Sqoop 2
•  Sqoop 2 Architecture
•  Sqoop 2 Design Goals
•  Sqoop 2 UI in Hue
Resources

Sqoop 1 Architecture
[Diagram: Sqoop 1 architecture]

Sqoop 1 Command Line
sqoop TOOL PROPS ARG [-- EXTRA]
•  TOOL: import, export
•  PROPS
    •  Hadoop (java) properties
    •  -Dwhatever.whenever=yes
•  ARG
    •  Generic SQOOP arguments
    •  --table, --connect, ...
•  EXTRA
    •  connector specific
    •  --schema (PostgreSQL and Microsoft SQL Server)
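
As a hedged illustration of the synopsis above, the sketch below assembles a command from the four parts and prints it instead of running it. The host name pg.example.com, the job name, and the schema value are placeholders invented for this example, and mapreduce.job.name is used only as a sample Hadoop property.

```shell
# Assemble a Sqoop 1 command in the order TOOL, PROPS, ARG, then "--" and EXTRA.
# All concrete values below are illustrative placeholders, not values from the talk.
tool="import"
props="-Dmapreduce.job.name=cities-import"   # Hadoop (Java) property
args="--connect jdbc:postgresql://pg.example.com/sqoop --table cities"  # generic Sqoop arguments
extra="--schema public"                      # connector-specific, placed after the bare "--"

cmd="sqoop $tool $props $args -- $extra"
# Print the command rather than executing it.
echo "$cmd"
```

The ordering matters: the -D properties come directly after the tool name, and connector-specific options only after the standalone "--".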
  
Sqoop 1 Example
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop --password sqoop \
  --table cities

sqoop export \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop --password sqoop \
  --table cities \
  --export-dir /temp/cities

Sqoop 1 Challenges
•  Cryptic, contextual command line arguments
•  Security concerns
•  Type mapping is not clearly defined
•  Client needs access to Hadoop binaries/configuration and database
•  JDBC model is enforced

Troubleshooting Sqoop 1
•  Versions: Sqoop, Hadoop, OS, JDBC
•  Console log after running with the --verbose flag
    •  Capture the entire output via sqoop import … &> sqoop.log
•  Entire Sqoop command including the options-file if applicable
•  Expected output and actual output
•  Table definition
•  Small input data set that triggers the problem
    •  Especially with export, malformed data is often the culprit
•  Hadoop task logs
    •  Often the task logs contain further information describing the problem
•  Permissions on input files
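
The `&>` redirection is a bash shorthand; the portable form is `> file 2>&1`. The sketch below uses a stand-in function (fake_sqoop_job, a hypothetical name) instead of a real Sqoop run, only to show that both output streams end up in the same log file.

```shell
# Stand-in for a Sqoop run: writes one line to stdout and one to stderr.
fake_sqoop_job() {
  echo "INFO: progress message on stdout"
  echo "ERROR: failure detail on stderr" >&2
}

log=$(mktemp)
# Portable equivalent of "&> file": stdout goes to the file, then stderr follows it.
fake_sqoop_job > "$log" 2>&1
captured=$(wc -l < "$log" | tr -d ' ')
echo "captured $captured lines in $log"
```

Capturing stderr matters because error details typically arrive on that stream and would otherwise be missing from the log.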
  
Troubleshooting Sqoop 1
Imported table has more rows than source table?
•  Data contains char used as Hive’s delimiters
•  Clean up data
    •  --hive-drop-import-delims
        •  Removes \n, \r, and \01 char
    •  --hive-delims-replacement "SPECIAL"
        •  Replaces \n, \r, and \01 char with string SPECIAL
•  Not restricted to Hive - any import job using text files
•  Ensure output files have one line per imported row
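
To see why delimiter characters inflate the row count, the sketch below writes one logical row whose text column contains a newline and counts the resulting lines. It does not call Sqoop; the `tr` step is only a rough stand-in for what --hive-drop-import-delims does to string columns, and the sample values are invented.

```shell
# A single source row whose "name" column contains an embedded newline.
name=$(printf 'San Francisco\nBay Area')

# Written as-is, the one row occupies two lines of text.
raw_count=$(printf '1,%s,USA\n' "$name" | wc -l | tr -d ' ')

# Drop newline, carriage return, and \001 from the column value first.
clean=$(printf '%s' "$name" | tr -d '\n\r\001')
clean_count=$(printf '1,%s,USA\n' "$clean" | wc -l | tr -d ' ')

echo "raw=$raw_count clean=$clean_count"
```

A line-oriented reader such as Hive sees the raw form as two rows, which is the "more rows than the source table" symptom.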
  
Common Sqoop 1 Issues
•  Protecting Your Password
•  Sqoop Works on CLI Not in Oozie
•  Choosing Proper Connector
•  Overriding Type Mapping

Protecting Your Password
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  -P

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  --password-file my-sqoop-password
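
A minimal sketch of preparing a file for --password-file, assuming a local temporary file for illustration (the path comes from mktemp and is not the slide's my-sqoop-password). Sqoop's user guide notes that the whole file content is used as the password, so the sketch avoids a trailing newline and restricts the file to its owner.

```shell
pw_file=$(mktemp)
# printf without "\n" keeps a trailing newline out of the password.
printf '%s' 'sqoop' > "$pw_file"
# Owner read-only.
chmod 400 "$pw_file"

size=$(wc -c < "$pw_file" | tr -d ' ')
perms=$(ls -l "$pw_file" | cut -c1-10)
echo "$perms $size bytes"
```

Writing the file with echo instead would append a newline, and the login could then fail even though the visible password looks right.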
Sqoop Works on CLI Not in Oozie
Character parameter '|' has multiple characters;
only the first will be used.
Got error creating database manager:
java.io.IOException:
No manager for connect string: "jdbc:teradata..."

Sqoop Works on CLI Not in Oozie
sqoop import --password "spEci@l$" \
  --connect 'jdbc:x:/yyy;db=sqoop'
•  Remove all escaping that you’ve added for the shell
•  Use <arg> tags rather than <command>: the content of each <arg> is treated as one parameter
•  Put all -D parameters into the configuration section
•  Install the driver into the workflow’s lib/ directory or the shared action library /user/oozie/share/lib/sqoop/
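
The bullets above can be sketched as an Oozie Sqoop action fragment. It is written through a quoted here-document so the shell leaves the text untouched. The schema version (0.2), the ${jobTracker} and ${nameNode} workflow parameters, and the sample property are assumptions for illustration; the password and connect string are the slide's placeholders.

```shell
wf_fragment=$(mktemp)
# The quoted 'EOF' stops the shell from expanding $ or interpreting quotes.
cat > "$wf_fragment" <<'EOF'
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
  <job-tracker>${jobTracker}</job-tracker>
  <name-node>${nameNode}</name-node>
  <configuration>
    <!-- Hadoop properties (the -D parameters) go here -->
    <property>
      <name>mapreduce.job.name</name>
      <value>cities-import</value>
    </property>
  </configuration>
  <!-- One element per parameter, with no shell quoting or escaping -->
  <arg>import</arg>
  <arg>--password</arg>
  <arg>spEci@l$</arg>
  <arg>--connect</arg>
  <arg>jdbc:x:/yyy;db=sqoop</arg>
</sqoop>
EOF

arg_count=$(grep -c '<arg>' "$wf_fragment")
echo "wrote $arg_count arg elements to $wf_fragment"
```

The quotes that protected `$` and `;` on the command line are gone here: Oozie passes each element's content to Sqoop verbatim, so leftover quotes would become part of the value.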
  
Choosing Proper Connector
•  JDBC driver is a dependency for all three connectors
•  Sqoop automatically chooses the optimal connector (OraOop, built-in, Generic JDBC Connector)
•  Or explicitly choose:
--connection-manager com.quest.oraoop.OraOopConnManager

Overriding Type Mapping
--map-column-java parameter
•  comma separated list of key-value pairs
•  key = exact column name
•  value = target Java type
sqoop import --map-column-java c1=Float,c2=String,c3=String ...
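
As a small sketch, the comma-separated list can be assembled from individual column=type pairs. The column names are the slide's placeholders, and the loop only builds and prints the option without running Sqoop.

```shell
mapping=""
for pair in c1=Float c2=String c3=String; do
  # Append with a comma separator, except before the first pair.
  mapping="${mapping:+$mapping,}$pair"
done
echo "--map-column-java $mapping"
```

Because the key must be the exact column name, generating the list from the table definition is less error-prone than typing it by hand.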
Sqoop 2 Architecture
[Diagram: Sqoop 2 architecture]

Sqoop 2 Design Goals
•  Security and Separation of Concerns
    •  Role based access and use
•  Ease of extension
    •  No low-level Hadoop knowledge needed
    •  No functional overlap between Connectors
•  Ease of Use
    •  Uniform functionality
    •  Domain specific interactions

Sqoop 2 UI in Hue
•  Troubleshooting
    •  sqoop.log file is located in @LOGDIR@ and the rest should be in server/logs/*
    •  Look for catalina.out, catalina.log, localhost-*.log

Resources
Sqoop 2
http://archive-primary.cloudera.com/cdh5/cdh/5/sqoop2/
Sqoop 1

 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 

Dernier

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 

Dernier (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Apache Sqoop: Unlocking Hadoop for Your Relational Database

  • 1. Unlocking Hadoop for Your Relational DB
      Kathleen Ting | @kate_ting
      Technical Account Manager, Cloudera | Sqoop PMC Member
      Hadoop User Group UK
      10 April 2014
  • 2. Who Am I?
      • Started 3 yr ago as 1st Cloudera Support Eng
      • Now manages Cloudera's 2 largest customers
      • Sqoop Committer, PMC Member
      • Co-Author of the Apache Sqoop Cookbook
  • 3. What is Sqoop?
      • Apache Top-Level Project
      • SQl to hadOOP
      • Tool to transfer data from relational databases
          • Teradata, MySQL, PostgreSQL, Oracle, Netezza
      • To/From Hadoop ecosystem
          • HDFS (text, sequence file), Hive, HBase, Avro
  • 4. Why Sqoop?
      • Efficient/Controlled resource utilization
          • Concurrent connections, Time of operation
      • Datatype mapping and conversion
          • Automatic, and User override
      • Metadata propagation
          • Sqoop Record
          • Hive Metastore
          • Avro
  • 5. Agenda
      Sqoop 1
      • Sqoop 1 Architecture
      • Sqoop 1 Command Line
      • Sqoop 1 Examples
      • Sqoop 1 Challenges
      • Troubleshooting Sqoop 1
      • Common Sqoop 1 Issues
          • Protecting Your Password
          • Sqoop Works on CLI Not in Oozie
          • Choosing Proper Connector
          • Overriding Type Mapping
      Sqoop 2
      • Sqoop 2 Architecture
      • Sqoop 2 Design Goals
      • Sqoop 2 UI in Hue
      Resources
  • 8. Sqoop 1 Command Line
      sqoop TOOL PROPS ARG [-- EXTRA]
      • TOOL: import, export
      • PROPS
          • Hadoop (Java) properties
          • -Dwhatever.whenever=yes
      • ARG
          • Generic Sqoop arguments
          • --table, --connect, ...
      • EXTRA
          • Connector-specific
          • --schema (PostgreSQL and Microsoft SQL Server)
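As a rough illustration of the layout above, the following dry run assembles a command from the four parts and prints it instead of running it. The PostgreSQL host `db.example.com` and the schema name `public` are made-up values, and `-Dwhatever.whenever=yes` is the placeholder property from the slide, not a real setting.

```shell
#!/bin/sh
# Dry run only: builds the command string and prints it; sqoop is never invoked.
TOOL="import"
PROPS="-Dwhatever.whenever=yes"   # placeholder Hadoop property from the slide
ARG="--connect jdbc:postgresql://db.example.com/sqoop --table cities"   # hypothetical host
EXTRA="--schema public"           # hypothetical schema name; connector-specific

# PROPS come right after the tool name; EXTRA follows a bare "--".
sqoop_cmd="sqoop $TOOL $PROPS $ARG -- $EXTRA"
echo "$sqoop_cmd"
```

Keeping the four parts in separate variables makes it easier to see which options belong to Hadoop, which to Sqoop, and which to the connector.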
  • 9. Sqoop 1 Example
      sqoop import \
        --connect jdbc:mysql://mysql.example.com/sqoop \
        --username sqoop --password sqoop \
        --table cities

      sqoop export \
        --connect jdbc:mysql://mysql.example.com/sqoop \
        --username sqoop --password sqoop \
        --table cities \
        --export-dir /temp/cities
  • 10. Sqoop 1 Challenges
      • Cryptic, contextual command line arguments
      • Security concerns
      • Type mapping is not clearly defined
      • Client needs access to Hadoop binaries/configuration and database
      • JDBC model is enforced
  • 11. Troubleshooting Sqoop 1
      • Versions: Sqoop, Hadoop, OS, JDBC
      • Console log after running with the --verbose flag
          • Capture the entire output via sqoop import … &> sqoop.log
      • Entire Sqoop command including the options-file if applicable
      • Expected output and actual output
      • Table definition
      • Small input data set that triggers the problem
          • Especially with export, malformed data is often the culprit
      • Hadoop task logs
          • Often the task logs contain further information describing the problem
      • Permissions on input files
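The `&>` redirect on the slide is a bash shorthand; the portable spelling is `> file 2>&1`. A small sketch of the capture step follows, with a hypothetical `fake_job` function standing in for the real `sqoop import --verbose …` run so that it works without a cluster.

```shell
#!/bin/sh
# fake_job is a stand-in for the real Sqoop run: it writes one line to
# stdout and one to stderr, as a verbose job writes to both streams.
fake_job() {
    echo "INFO: message on stdout"
    echo "ERROR: message on stderr" >&2
}

log_file=$(mktemp)

# Send stdout to the log, then point stderr at the same place.
fake_job > "$log_file" 2>&1

log_lines=$(wc -l < "$log_file" | tr -d ' ')
echo "captured $log_lines lines in $log_file"
```

The order of the two redirects matters: writing `2>&1` before `> "$log_file"` would leave stderr on the console.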
  • 12. Troubleshooting Sqoop 1
      Imported table has more rows than source table?
      • Data contains characters used as Hive's delimiters
      • Clean up data
          • --hive-drop-import-delims
              • Removes \n, \t, and \01 characters
          • --hive-delims-replacement "SPECIAL"
              • Replaces \n, \t, and \01 characters with the string SPECIAL
      • Not restricted to Hive: applies to any import job using text files
          • Ensure output files have one line per imported row
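To see why the row count grows, consider a single row whose text column contains a newline. The sketch below uses plain shell tools on made-up data (the value `San\nJose` is hypothetical) and covers only the newline case; it imitates the effect of dropping or replacing the character, not Sqoop's own implementation.

```shell
#!/bin/sh
# A column value with an embedded newline (hypothetical data).
field=$(printf 'San\nJose')

# Written to a comma-delimited text file, one logical row becomes two lines.
raw_lines=$(printf '1,%s\n' "$field" | wc -l | tr -d ' ')

# Dropping the newline from the value keeps the row on one line.
dropped=$(printf '%s' "$field" | tr -d '\n')
clean_lines=$(printf '1,%s\n' "$dropped" | wc -l | tr -d ' ')

# Replacing it with another character (a space here) also works.
replaced=$(printf '%s' "$field" | tr '\n' ' ')

echo "raw: $raw_lines line(s), cleaned: $clean_lines line(s)"
echo "dropped='$dropped' replaced='$replaced'"
```

Any reader that treats each line as a record will count the uncleaned row twice, which is how the imported table ends up larger than the source.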
  • 14. Common Sqoop 1 Issues
      • Protecting Your Password
      • Sqoop Works on CLI Not in Oozie
      • Choosing Proper Connector
      • Overriding Type Mapping
  • 16. Protecting Your Password
      sqoop import \
        --connect jdbc:mysql://mysql.example.com/sqoop \
        --username sqoop \
        --table cities \
        -P

      sqoop import \
        --connect jdbc:mysql://mysql.example.com/sqoop \
        --username sqoop \
        --table cities \
        --password-file my-sqoop-password
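The file passed to `--password-file` has to be created first. The Sqoop documentation notes that the whole file content, including any trailing newline, is used as the password, and it suggests restricting the file to owner read-only. A minimal sketch, assuming a local scratch directory and the sample password `sqoop` from the slides:

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
work_dir=$(mktemp -d)
pw_file="$work_dir/my-sqoop-password"

# printf '%s' writes the password without a trailing newline
# (a plain echo would append one).
printf '%s' 'sqoop' > "$pw_file"

# Owner read-only.
chmod 400 "$pw_file"

pw_bytes=$(wc -c < "$pw_file" | tr -d ' ')
echo "wrote $pw_bytes bytes to $pw_file"
```

A byte count equal to the password length is a quick check that no newline slipped in.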
  • 18. Sqoop Works on CLI Not in Oozie
      Character parameter '|' has multiple characters; only the first will be used.

      Got error creating database manager: java.io.IOException: No manager for connect string: "jdbc:teradata..."
  • 19. Sqoop Works on CLI Not in Oozie
      sqoop import --password "spEci@l$" --connect 'jdbc:x:/yyy;db=sqoop'
      • Remove all escaping that you've added for the shell
      • Use <arg> tags rather than <command>: the content of each <arg> is passed as one parameter
      • Put all -D parameters into the configuration section
      • Install the driver into the workflow's lib/ directory or the shared action library /user/oozie/share/lib/sqoop/
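One way to find the values to put in the workflow is to let the shell do its own unquoting and print what is left. The sketch below prints each parameter of the command above as one `<arg>` element. It is only an illustration: it does no XML escaping, so values containing `&` or `<` would need extra care.

```shell
#!/bin/sh
# The command as typed in a shell, without the leading "sqoop".
set -- import --password 'spEci@l$' --connect 'jdbc:x:/yyy;db=sqoop'

arg_count=$#

# Print each parameter as the program would receive it:
# shell quotes are gone and every value lands in its own <arg>.
oozie_args=$(for a in "$@"; do printf '<arg>%s</arg>\n' "$a"; done)
echo "$oozie_args"
```

The quotes never appear in the output, which is the point of the first bullet: they exist only for the shell and must not be copied into the workflow definition.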
  • 21. Choosing Proper Connector
      • The JDBC driver is a dependency for all three connectors
      • Sqoop automatically chooses the most suitable connector (OraOop, built-in, Generic JDBC Connector)
      • Or explicitly choose:
        --connection-manager com.quest.oraoop.OraOopConnManager
  • 23. Overriding Type Mapping
      --map-column-java parameter
      • Comma-separated list of key-value pairs
      • key = exact column name
      • value = target Java type
      sqoop import --map-column-java c1=Float,c2=String,c3=String ...
  • 26. Sqoop 2 Design Goals
      • Security and Separation of Concerns
          • Role-based access and use
      • Ease of extension
          • No low-level Hadoop knowledge needed
          • No functional overlap between Connectors
      • Ease of Use
          • Uniform functionality
          • Domain-specific interactions
  • 27. Sqoop 2 UI in Hue
      • Troubleshooting
          • The sqoop.log file is located in @LOGDIR@ and the rest should be in server/logs/*
          • Look for catalina.out, catalina.log, localhost-*.log