Big data beyond Hadoop –
How to integrate ALL your data
Kai Wähner
kwaehner@talend.com
@KaiWaehner
www.kai-waehner.de
4/26/13
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner

Consulting
Developing
Coaching
Speaking
Writing
Main Tasks
Requirements Engineering
Enterprise Architecture Management
Business Process Management
Architecture and Development of Applications
Service-oriented Architecture
Integration of Legacy Applications
Cloud Computing
Big Data
Contact
Email: kontakt@kai-waehner.de
Blog: www.kai-waehner.de/blog
Twitter: @KaiWaehner
Social Networks: Xing, LinkedIn
Kai Wähner
Key messages
You have to care about big data to be competitive in the future!
You have to integrate different sources to get the most value out of it!
Big data integration is no (longer) rocket science!
Agenda
• Big data paradigm shift
• Challenges of big data
• Big data from a technology perspective
• Integration with an open source framework
• Integration with an open source suite
Why should you care about big data?
“If you can't measure it, you can't manage it.”
William Edwards Deming (1900–1993), American statistician, professor, author, lecturer and consultant
Why should you care about big data?
➜ “Silence the HiPPOs” (highest-paid person's opinion)
➜ Once you are able to interpret unimaginably large data streams, gut feeling is no longer justified!
What is big data? The Vs of big data
Volume (terabytes, petabytes)
Variety (social networks, blog posts, logs, sensors, etc.)
Velocity (realtime or near-realtime)
Value
Big data tasks to solve - before analysis
Big Data Integration
– Land data in a Big Data cluster
– Implement or generate parallel processes
Big Data Manipulation
– Simplify manipulation, such as sort and filter
– Computationally expensive functions
Big Data Quality & Governance
– Identify linkages and duplicates, validate big data
– Match component, execute basic quality features
Big Data Project Management
– Place frameworks around big data projects
– Common repository, scheduling, monitoring
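The manipulation and quality tasks listed on this slide (filter, sort, validate, identify duplicates) can be sketched on a single machine with plain Java collections. This only illustrates the operations themselves, not how a big data cluster or any particular tool executes them; the `Customer` class, its fields and the e-mail validation rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical single-machine sketch of "validate, de-duplicate, sort".
final class DataQualitySketch {

    // Hypothetical record layout: a customer id and an e-mail address.
    static final class Customer {
        final int id;
        final String email;

        Customer(int id, String email) {
            this.id = id;
            this.email = email;
        }
    }

    // Validate: keep only rows whose e-mail looks plausible (assumed rule).
    static boolean isValid(Customer c) {
        return c.email != null && c.email.contains("@");
    }

    // Drops invalid rows, removes duplicates by normalized e-mail, sorts by id.
    static List<Customer> clean(List<Customer> input) {
        Map<String, Customer> byEmail = new LinkedHashMap<>();
        for (Customer c : input) {
            if (!isValid(c)) {
                continue;
            }
            String key = c.email.trim().toLowerCase(Locale.ROOT);
            byEmail.putIfAbsent(key, c); // the first occurrence wins
        }
        List<Customer> result = new ArrayList<>(byEmail.values());
        result.sort(Comparator.comparingInt((Customer c) -> c.id));
        return result;
    }

    public static void main(String[] args) {
        List<Customer> raw = Arrays.asList(
                new Customer(3, "ann@example.com"),
                new Customer(1, "bob@example.com"),
                new Customer(2, "ANN@example.com "), // duplicate of id 3
                new Customer(4, "not-an-email"));    // fails validation
        for (Customer c : clean(raw)) {
            System.out.println(c.id + " " + c.email);
        }
    }
}
```

On a cluster the same steps are distributed across nodes, which is why generating the parallel processes is listed as a task of its own.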
Use case: Replacing ETL jobs
“The advantage of their new system is that they can now look at their data [from their log processing system] in any way they want:
➜ Nightly MapReduce jobs collect statistics about their mail system such as spam counts by domain, bytes transferred and number of logins.
➜ When they wanted to find out which part of the world their customers logged in from, a quick [ad hoc] MapReduce job was created and they had the answer within a few hours. Not really possible in your typical ETL system.”
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
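A statistic such as "spam counts by domain" is a group-and-count job. The sketch below shows the map and reduce steps in plain Java on one machine. The log line format (`<domain> <SPAM|HAM>`) and all names are assumptions made for illustration, not the actual log format or code of the quoted system.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine sketch of a "spam count by domain" job (hypothetical log format).
final class SpamCountSketch {

    // Map step: emit the domain for spam lines, or null for everything else.
    static String mapToSpamDomain(String logLine) {
        String[] parts = logLine.trim().split("\\s+");
        if (parts.length == 2 && parts[1].equals("SPAM")) {
            return parts[0];
        }
        return null;
    }

    // Reduce step: add one to the counter of every emitted domain.
    static Map<String, Integer> countSpamByDomain(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            String domain = mapToSpamDomain(line);
            if (domain != null) {
                counts.merge(domain, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "example.com SPAM",
                "example.org HAM",
                "example.com SPAM",
                "example.net SPAM");
        System.out.println(countSpamByDomain(log));
    }
}
```

An ad hoc question such as "logins by region" would only need a different map step, which is the flexibility the quote describes.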
Use case: Risk management
Deduce Customer Defections
http://hkotadia.com/archives/5021
Use case: Flexible pricing
➜ With revenue of almost USD 30 billion and a network of 800 locations, Macy's is considered the largest store operator in the USA
➜ Daily price check analysis of its 10,000 articles in less than two hours
➜ Whenever a neighboring competitor anywhere between New York and Los Angeles goes for aggressive price reductions, Macy's follows its example
➜ If there is no market competitor, the prices remain unchanged
http://www.t-systems.com/about-t-systems/examples-of-successes-companies-analyze-big-data-in-record-time-l-t-systems/1029702
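The pricing rule on this slide (follow an aggressive competitor, otherwise leave the price unchanged) might be sketched as follows. The 10% threshold for what counts as "aggressive" and all names are assumptions for illustration; the source does not describe the retailer's actual logic.

```java
import java.util.OptionalDouble;

// Hypothetical sketch of the "follow the competitor" pricing rule.
final class PricingRuleSketch {

    // Assumed threshold: a competitor price at least 10% below ours is "aggressive".
    static final double AGGRESSIVE_DISCOUNT = 0.10;

    // Returns the new price for one article at one location.
    static double newPrice(double ownPrice, OptionalDouble competitorPrice) {
        if (!competitorPrice.isPresent()) {
            return ownPrice; // no market competitor: the price remains unchanged
        }
        double competitor = competitorPrice.getAsDouble();
        boolean aggressive = competitor <= ownPrice * (1.0 - AGGRESSIVE_DISCOUNT);
        return aggressive ? competitor : ownPrice;
    }

    public static void main(String[] args) {
        System.out.println(newPrice(100.0, OptionalDouble.empty()));
        System.out.println(newPrice(100.0, OptionalDouble.of(95.0)));
        System.out.println(newPrice(100.0, OptionalDouble.of(80.0)));
    }
}
```

The rule itself is trivial; the big data part is evaluating it daily for every article at every location within the two-hour window.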
Limited big data experts
This is your company
Big Data Geek
Big data tool selection (business perspective)
➜ Wanna buy a big data solution for your industry?
➜ Maybe a competitor has a big data solution which adds business value?
➜ The competitor will never publish it (rat race)!
Big data tool selection (technical perspective)
Looking for “your” required big data product?
Support your data from scratch?
Good luck! :-)
How to solve these big data challenges?
Be no expert! Be simple!
➜ “[Often] simple models and big data trump more-elaborate [and complex] analytics approaches”
➜ “Often someone coming from outside an industry can spot a better way to use big data than an insider”
Erik Brynjolfsson / Lynn Wu
http://alfredopassos.tumblr.com/post/32461599327/big-data-the-management-revolution-by-andrew-mcafee
Be creative!
➜ Look at use cases of others (SMEs, but also large companies)
➜ How can you do something similar with your data?
➜ You have different data sources? Use them! Combine them! Play with them!
What is your Big Data process?
Step 1: Do not begin with the data, think about business opportunities
Step 2: Choose the right data (combine different data sources)
Step 3: Use easy tooling
http://hbr.org/2012/10/making-advanced-analytics-work-for-you
Technology perspective
How to process big data?
How to process big data?
The critical flaw in parallel ETL tools is the fact that the data is almost never local to the processing nodes. This means that every time a large job is run, the data has to first be read from the source, split N ways and then delivered to the individual nodes. Worse, if the partition key of the source doesn’t match the partition key of the target, data has to be constantly exchanged among the nodes. In essence, parallel ETL treats the network as if it were a physical I/O subsystem. The network, which is always the slowest part of the process, becomes the weakest link in the performance chain.
http://blog.syncsort.com/2012/08/parallel-etl-tools-are-dead
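The partition-key problem described in the Syncsort text can be made concrete with a small simulation. The sketch below assigns rows to nodes by a key and counts how many rows would have to travel over the network when the target is partitioned by a different key than the source. The row fields (`customerId`, `regionId`), the node count and the modulo partitioner are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical simulation: how many rows must move between nodes
// when source and target use different partition keys.
final class RepartitionSketch {

    static final class Row {
        final int customerId;
        final int regionId;

        Row(int customerId, int regionId) {
            this.customerId = customerId;
            this.regionId = regionId;
        }
    }

    // Simple modulo partitioner (an assumption; real systems use various schemes).
    static int nodeFor(int key, int nodes) {
        return Math.floorMod(key, nodes);
    }

    // Counts rows whose source node differs from their target node.
    static int rowsToMove(List<Row> rows, int nodes,
                          ToIntFunction<Row> sourceKey, ToIntFunction<Row> targetKey) {
        int moved = 0;
        for (Row row : rows) {
            int sourceNode = nodeFor(sourceKey.applyAsInt(row), nodes);
            int targetNode = nodeFor(targetKey.applyAsInt(row), nodes);
            if (sourceNode != targetNode) {
                moved++;
            }
        }
        return moved;
    }

    // Twelve sample rows; every four consecutive customers share a region.
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        for (int customerId = 0; customerId < 12; customerId++) {
            rows.add(new Row(customerId, customerId / 4));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        System.out.println("same key:      "
                + rowsToMove(rows, 4, r -> r.customerId, r -> r.customerId));
        System.out.println("different key: "
                + rowsToMove(rows, 4, r -> r.customerId, r -> r.regionId));
    }
}
```

With matching keys nothing moves; with mismatched keys most of the sample rows change nodes, which is the network traffic the text warns about.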
How to process big data?
Slides: http://www.slideshare.net/pavlobaron/100-big-data-0-hadoop-0-java
Video: http://www.infoq.com/presentations/Big-Data-Hadoop-Java
How to process big data?
The de facto standard for big data processing
How to process big data?
Even Microsoft (the .NET house) has relied on Hadoop since 2011
“A big part of [the company’s strategy] includes wiring SQL Server 2012 (formerly known by the codename “Denali”) to the Hadoop distributed computing platform, and bringing Hadoop to Windows Server and Azure”
What is Hadoop?
Apache Hadoop, an open-source software library, is a framework that allows for the distributed processing of large data sets across clusters of commodity hardware using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Map (Shuffle) Reduce
Simple example
• Input: (very large) text files with lists of strings, such as:
  “318, 0043012650999991949032412004...0500001N9+01111+99999999999...”
• We are interested in just some of the content: year and temperature (marked in red)
• The MapReduce function has to compute the maximum temperature for every year
Example from the book “Hadoop: The Definitive Guide, 3rd Edition”
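The three phases can be sketched in plain Java without a Hadoop cluster. To avoid guessing the fixed-width offsets of the elided weather record on the slide, the sketch assumes a simplified input of `year,temperature` lines. The class and method names are hypothetical; a real Hadoop job would implement the `Mapper` and `Reducer` classes of the Hadoop API instead.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine sketch of map, shuffle and reduce (simplified input format).
final class MaxTemperatureSketch {

    // Map: parse one "year,temperature" line into a (year, temperature) pair.
    static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.split(",");
        return new SimpleEntry<>(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    // Shuffle: group all temperatures by year, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), year -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: the maximum temperature among all readings of one year.
    static int reduce(List<Integer> temperatures) {
        int max = Integer.MIN_VALUE;
        for (int temperature : temperatures) {
            max = Math.max(max, temperature);
        }
        return max;
    }

    // Runs the three phases one after another.
    static Map<String, Integer> maxTemperatureByYear(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1949,111", "1949,78", "1950,0", "1950,22", "1950,-11");
        System.out.println(maxTemperatureByYear(lines));
    }
}
```

In Hadoop the map calls run in parallel on the nodes that hold the input blocks, and the framework performs the shuffle between the map and reduce phases.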
How to process big data?
Hadoop alternatives
Features included (few → many):
• Apache Hadoop: MapReduce, HDFS, Ecosystem
• Hadoop Distribution: + Packaging, Deployment-Tooling, Support
• Big Data Suite: + Tooling / Modeling, Code Generation, Scheduling, Integration
Alternatives for systems integration
Complexity of integration (low → high):
• Integration Framework: Connectivity, Routing, Transformation
• Enterprise Service Bus: Integration + Tooling, Monitoring, Support
• Integration Suite: + Business Process Mgt., Big Data / MDM, Registry / Repository, Rules Engine, “you name it”
Alternatives for systems integration
Complexity of integration (low → high): Integration Framework, Enterprise Service Bus, Integration Suite
More details about integration frameworks...
http://www.kai-waehner.de/blog/2012/12/20/showdown-integration-framework-spring-integration-apache-camel-vs-enterprise-service-bus-esb/
Enterprise Integration Patterns (EIP)
Apache Camel
Implements the EIPs
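One of the best-known Enterprise Integration Patterns is the Content-Based Router, which sends each message to a channel chosen by inspecting its content. The sketch below shows the idea in plain Java without Camel; the `Message` class, the message types and the channel names are hypothetical. In Camel itself, the same pattern is expressed with the `choice()`, `when()` and `otherwise()` methods of its Java DSL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java sketch of the Content-Based Router pattern (not Camel code).
final class ContentBasedRouterSketch {

    // Hypothetical message with a type header and a body.
    static final class Message {
        final String type;
        final String body;

        Message(String type, String body) {
            this.type = type;
            this.body = body;
        }
    }

    // One routing rule: a condition and the channel to use when it matches.
    private static final class Rule {
        final Predicate<Message> condition;
        final String channel;

        Rule(Predicate<Message> condition, String channel) {
            this.condition = condition;
            this.channel = channel;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final String defaultChannel;

    ContentBasedRouterSketch(String defaultChannel) {
        this.defaultChannel = defaultChannel;
    }

    // Registers a rule; rules are evaluated in registration order.
    ContentBasedRouterSketch when(Predicate<Message> condition, String channel) {
        rules.add(new Rule(condition, channel));
        return this;
    }

    // Returns the channel of the first matching rule, else the default channel.
    String route(Message message) {
        for (Rule rule : rules) {
            if (rule.condition.test(message)) {
                return rule.channel;
            }
        }
        return defaultChannel;
    }

    public static void main(String[] args) {
        ContentBasedRouterSketch router = new ContentBasedRouterSketch("deadLetter")
                .when(m -> "order".equals(m.type), "orders")
                .when(m -> "invoice".equals(m.type), "invoices");
        System.out.println(router.route(new Message("order", "3 books")));
        System.out.println(router.route(new Message("unknown", "?")));
    }
}
```

An integration framework adds what this sketch leaves out: ready-made connectors for the channels, error handling and monitoring.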
Architecture
http://java.dzone.com/articles/apache-camel-integration
Choose your required components
HTTP, FTP, File, XSLT, MQ, JDBC, Akka, TCP, SMTP, RSS, Quartz, Log, LDAP, JMS, EJB, AMQP, Atom, AWS-S3, Bean-Validation, CXF, IRC, Jetty, JMX, Lucene, Netty, RMI, SQL
Many, many more
Custom Components
Choose your favorite DSL
XML
(not production-ready yet)
Deploy it wherever you need
Standalone
OSGi
Application Server
Web Container
Spring Container
Cloud
Enterprise-ready
• Open Source
• Scalability
• Error Handling
• Transaction
• Monitoring
• Tooling
• Commercial Support
	
  
Example: Camel integration route
Example: camel-hdfs component
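// Producer: write messages from a JMS queue to a file in HDFS
from("jms:MyQueue")
    .to("hdfs:///myDirectory/myFile.txt?valueType=TEXT");
// Consumer: read the file from HDFS and hand it to the file component
from("hdfs:///myDirectory/myFile.txt")
    .to("file:target/reports/report.txt");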
Live demo
Apache Camel in action...
More details about ESBs and suites...
http://www.kai-waehner.de/blog/2013/01/23/spoilt-for-choice-how-to-choose-the-right-enterprise-service-bus-esb/
Vision: Democratize big data
Talend Open Studio for Big Data
…an open source ecosystem
• Improves efficiency of big data job design with a graphical interface
• Generates Hadoop code and runs transforms inside Hadoop
• Native support for HDFS, Pig, HBase, HCatalog, Sqoop and Hive
• 100% open source under an Apache License
• Standards based
Vision: Democratize big data
Talend Platform for Big Data
…an open source ecosystem
• Builds on Talend Open Studio for Big Data
• Adds data quality, advanced scalability and management functions
• MapReduce massively parallel data processing
• Shared repository and remote deployment
• Data quality and profiling
• Data cleansing
• Reporting and dashboards
• Commercial support, warranty/IP indemnity under a subscription license
Talend Open Studio for Big Data
Live demo
“Talend Open Studio for Big Data” in action...
Did you get the key message?
Key messages
You have to care about big data to be competitive in the future!
You have to integrate different sources to get the most value out of it!
Big data integration is no (longer) rocket science!
Did you get the key message?
Thank you for your attention. Questions?
kwaehner@talend.com
www.kai-waehner.de
LinkedIn / Xing
@KaiWaehner

Contenu connexe

Tendances

Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and Tableau
Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and TableauAnalyzing Billions of Data Rows with Alteryx, Amazon Redshift, and Tableau
Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and TableauDATAVERSITY
 
Big Data Platform and Architecture Recommendation
Big Data Platform and Architecture RecommendationBig Data Platform and Architecture Recommendation
Big Data Platform and Architecture RecommendationSofyan Hadi AHmad
 
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...Kai Wähner
 
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...Kai Wähner
 
Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA Kai Wähner
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsKai Wähner
 
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...Kai Wähner
 
Modern Data Platforms
Modern Data Platforms Modern Data Platforms
Modern Data Platforms Arne Roßmann
 
Piranha vs. mammoth predator appliances that chew up big data
Piranha vs. mammoth   predator appliances that chew up big dataPiranha vs. mammoth   predator appliances that chew up big data
Piranha vs. mammoth predator appliances that chew up big dataJack (Yaakov) Bezalel
 
How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?Slim Baltagi
 
Snowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySnowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySamanthaBerlant
 
5 Reasons to Move Your BI to the Cloud
5 Reasons to Move Your BI to the Cloud5 Reasons to Move Your BI to the Cloud
5 Reasons to Move Your BI to the CloudTableau Software
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017Jeremy Maranitch
 
Why Migrate from MySQL to Cassandra
Why Migrate from MySQL to CassandraWhy Migrate from MySQL to Cassandra
Why Migrate from MySQL to CassandraDATAVERSITY
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyDataStax
 
Expert Big Data Tips
Expert Big Data TipsExpert Big Data Tips
Expert Big Data TipsQubole
 
Pivotal Big Data Roadshow
Pivotal Big Data Roadshow Pivotal Big Data Roadshow
Pivotal Big Data Roadshow VMware Tanzu
 

Tendances (20)

Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and Tableau
Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and TableauAnalyzing Billions of Data Rows with Alteryx, Amazon Redshift, and Tableau
Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and Tableau
 
Big Data Platform and Architecture Recommendation
Big Data Platform and Architecture RecommendationBig Data Platform and Architecture Recommendation
Big Data Platform and Architecture Recommendation
 
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...
JBoss OneDayTalk 2013: "NoSQL Integration with Apache Camel - MongoDB, CouchD...
 
How to Streamline DataOps on AWS
How to Streamline DataOps on AWSHow to Streamline DataOps on AWS
How to Streamline DataOps on AWS
 
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
 
Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
 
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...
Big Data beyond Apache Hadoop - How to integrate ALL your Data

  • 1. Big data beyond Hadoop – How to integrate ALL your data. Kai Wähner, kwaehner@talend.com, @KaiWaehner, www.kai-waehner.de, 4/26/13
  • 2. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner. Kai Wähner: Consulting, Developing, Coaching, Speaking, Writing. Main Tasks: Requirements Engineering, Enterprise Architecture Management, Business Process Management, Architecture and Development of Applications, Service-oriented Architecture, Integration of Legacy Applications, Cloud Computing, Big Data. Contact: Email: kontakt@kai-waehner.de, Blog: www.kai-waehner.de/blog, Twitter: @KaiWaehner, Social Networks: Xing, LinkedIn
  • 3. Key messages: You have to care about big data to be competitive in the future! You have to integrate different sources to get the most value out of it! Big data integration is no (longer) rocket science!
  • 4. Agenda: • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite
  • 5. Agenda: • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite
  • 6. Why should you care about big data? “If you can't measure it, you can't manage it.” William Edwards Deming (1900–1993), American statistician, professor, author, lecturer and consultant
  • 7. Why should you care about big data? ➜ “Silence the HiPPOs” (highest-paid person's opinion) ➜ Once you can interpret unimaginably large data streams, gut feeling is no longer justified!
  • 8. What is big data? The Vs of big data: Volume (terabytes, petabytes), Variety (social networks, blog posts, logs, sensors, etc.), Velocity (real-time or near-real-time), Value
  • 9. Big data tasks to solve - before analysis. Big Data Integration: land data in a Big Data cluster; implement or generate parallel processes. Big Data Manipulation: simplify manipulation, such as sort and filter; computationally expensive functions. Big Data Quality & Governance: identify linkages and duplicates, validate big data; match component, execute basic quality features. Big Data Project Management: place frameworks around big data projects; common repository, scheduling, monitoring.
  • 10. Use case: Replacing ETL jobs. “The advantage of their new system is that they can now look at their data [from their log processing system] in any way they want: ➜ Nightly MapReduce jobs collect statistics about their mail system such as spam counts by domain, bytes transferred and number of logins. ➜ When they wanted to find out which part of the world their customers logged in from, a quick [ad hoc] MapReduce job was created and they had the answer within a few hours. Not really possible in your typical ETL system.” http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
  • 11. Use case: Risk management. Deduce customer defections. http://hkotadia.com/archives/5021
  • 12. Use case: Flexible pricing ➜ With revenue of almost USD 30 billion and a network of 800 locations, Macy's is considered the largest store operator in the USA ➜ Daily price check analysis of its 10,000 articles in less than two hours ➜ Whenever a neighboring competitor anywhere between New York and Los Angeles goes for aggressive price reductions, Macy's follows its example ➜ If there is no market competitor, the prices remain unchanged http://www.t-systems.com/about-t-systems/examples-of-successes-companies-analyze-big-data-in-record-time-l-t-systems/1029702
  • 13. Agenda: • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite
  • 14. This is your company Big Data Geek Limited big data experts
  • 15. Big data tool selection (business perspective) ➜ Wanna buy a big data solution for your industry? ➜ Maybe a competitor has a big data solution which adds business value? ➜ The competitor will never publish it (rat race)!
  • 16. Big data tool selection (technical perspective): Looking for “your” required big data product? Support your data from scratch? Good luck! :-)
  • 17. How to solve these big data challenges?
  • 18. Be no expert! Be simple! ➜ “[Often] simple models and big data trump more-elaborate [and complex] analytics approaches” ➜ “Often someone coming from outside an industry can spot a better way to use big data than an insider” Erik Brynjolfsson / Lynn Wu http://alfredopassos.tumblr.com/post/32461599327/big-data-the-management-revolution-by-andrew-mcafee
  • 19. Be creative! ➜ Look at use cases of others (SMU, but also large companies) ➜ How can you do something similar with your data? ➜ You have different data sources? Use it! Combine it! Play with it!
  • 20. What is your Big Data process? Step 1: Do not begin with the data, think about business opportunities. Step 2: Choose the right data (combine different data sources). Step 3: Use easy tooling. http://hbr.org/2012/10/making-advanced-analytics-work-for-you
  • 21. Agenda: • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite
  • 22. Technology perspective: How to process big data?
  • 23. How to process big data? The critical flaw in parallel ETL tools is the fact that the data is almost never local to the processing nodes. This means that every time a large job is run, the data has to first be read from the source, split N ways and then delivered to the individual nodes. Worse, if the partition key of the source doesn't match the partition key of the target, data has to be constantly exchanged among the nodes. In essence, parallel ETL treats the network as if it were a physical I/O subsystem. The network, which is always the slowest part of the process, becomes the weakest link in the performance chain. http://blog.syncsort.com/2012/08/parallel-etl-tools-are-dead
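The partition key mismatch described on slide 23 can be made concrete with a small sketch. It is not taken from the talk: the `Order` class, its two keys and the plain hash partitioning across three nodes are assumptions chosen for illustration. The sketch counts how many records would have to change nodes when the source is partitioned by one key and the target by another.

```java
import java.util.List;

public class RepartitionCost {

    /** Hypothetical record carrying two candidate partition keys. */
    static final class Order {
        final int customerId; // assumed partition key of the source
        final int regionId;   // assumed partition key of the target

        Order(int customerId, int regionId) {
            this.customerId = customerId;
            this.regionId = regionId;
        }
    }

    /** Node that holds a key when data is hash-partitioned across nodeCount nodes. */
    static int nodeFor(int key, int nodeCount) {
        return Math.floorMod(Integer.hashCode(key), nodeCount);
    }

    /** Counts the records whose source node differs from their target node. */
    static long recordsToMove(List<Order> orders, int nodeCount) {
        return orders.stream()
                .filter(o -> nodeFor(o.customerId, nodeCount) != nodeFor(o.regionId, nodeCount))
                .count();
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order(1, 1),  // node 1 -> node 1: stays local
                new Order(2, 1),  // node 2 -> node 1: crosses the network
                new Order(3, 3),  // node 0 -> node 0: stays local
                new Order(4, 2)); // node 1 -> node 2: crosses the network
        System.out.println(recordsToMove(orders, 3) + " of " + orders.size()
                + " records cross the network");
    }
}
```

With these four sample records, half of the data has to be shipped between nodes before any processing starts, which is the network cost the quoted article attributes to parallel ETL.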
  • 24. How to process big data? Slides: http://www.slideshare.net/pavlobaron/100-big-data-0-hadoop-0-java Video: http://www.infoq.com/presentations/Big-Data-Hadoop-Java
  • 25. How to process big data? The de facto standard for big data processing
  • 26. How to process big data? Even Microsoft (the .NET house) has relied on Hadoop since 2011: “A big part of [the company's strategy] includes wiring SQL Server 2012 (formerly known by the codename “Denali”) to the Hadoop distributed computing platform, and bringing Hadoop to Windows Server and Azure”
  • 27. What is Hadoop? Apache Hadoop, an open-source software library, is a framework that allows for the distributed processing of large data sets across clusters of commodity hardware using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
  • 28. Simple example: Map, (Shuffle), Reduce • Input: (very large) text files with lists of strings, such as: “318, 0043012650999991949032412004...0500001N9+01111+99999999999...” • We are interested in just some of the content: year and temperature (marked in red) • The MapReduce function has to compute the maximum temperature for every year. Example from the book “Hadoop: The Definitive Guide, 3rd Edition”
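The three phases named on slide 28 can be sketched in plain Java without a Hadoop cluster. This is an in-memory illustration, not the code from the book and not the Hadoop API: the simplified `year,temperature` record format and the class and method names are assumptions made for the example, since the real weather records use fixed-width fields that the slide elides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MaxTemperature {

    /** Map: turn one simplified "year,temperature" record into a (year, temperature) pair. */
    static Map.Entry<String, Integer> map(String record) {
        String[] fields = record.split(",");
        return Map.entry(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    /** Shuffle: group all temperatures that belong to the same year. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), year -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce: keep only the maximum temperature of one year. */
    static int reduce(List<Integer> temperatures) {
        int max = Integer.MIN_VALUE;
        for (int temperature : temperatures) {
            max = Math.max(max, temperature);
        }
        return max;
    }

    /** Runs map, shuffle and reduce one after the other on a list of records. */
    static Map<String, Integer> maxByYear(List<String> records) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String record : records) {
            pairs.add(map(record));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = List.of("1949,111", "1949,78", "1950,0", "1950,22", "1950,-11");
        System.out.println(maxByYear(records)); // prints {1949=111, 1950=22}
    }
}
```

In a real Hadoop job the map and reduce functions run in parallel on the nodes that hold the data, and the framework performs the shuffle between them; the sequential version here only shows what each phase contributes.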
  • 29. How to process big data?
  • 30. Hadoop alternatives, by features included (few to many): Apache Hadoop (MapReduce, HDFS, Ecosystem); Hadoop Distribution (+ Packaging, Deployment-Tooling, Support); Big Data Suite (+ Tooling / Modeling, Code Generation, Scheduling, Integration)
  • 31. Agenda: • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite
  • 32. Alternatives for systems integration, by complexity of integration (low to high): Integration Framework (INTEGRATION: Connectivity, Routing, Transformation); Enterprise Service Bus (+ Tooling, Monitoring, Support); Integration Suite (+ BUSINESS PROCESS MGT., BIG DATA / MDM, REGISTRY / REPOSITORY, RULES ENGINE, “YOU NAME IT”)
  • 33. Alternatives for systems integration, by complexity of integration (low to high): Integration Framework, Enterprise Service Bus, Integration Suite
  • 34. More details about integration frameworks... http://www.kai-waehner.de/blog/2012/12/20/showdown-integration-framework-spring-integration-apache-camel-vs-enterprise-service-bus-esb/
  • 35. Enterprise Integration Patterns (EIP): Apache Camel implements the EIPs
  • 36. Enterprise Integration Patterns (EIP)
  • 37. Enterprise Integration Patterns (EIP)
  • 38. Architecture http://java.dzone.com/articles/apache-camel-integration
  • 39. Choose your required components: HTTP, FTP, File, XSLT, MQ, JDBC, Akka, TCP, SMTP, RSS, Quartz, Log, LDAP, JMS, EJB, AMQP, Atom, AWS-S3, Bean-Validation, CXF, IRC, Jetty, JMX, Lucene, Netty, RMI, SQL, many many more, Custom Components
  • 40. Choose your favorite DSL XML (not production-ready yet)
  • 41. Deploy it wherever you need: Standalone, OSGi, Application Server, Web Container, Spring Container, Cloud
  • 42. Enterprise-ready: • Open Source • Scalability • Error Handling • Transaction • Monitoring • Tooling • Commercial Support
  • 43. Example: Camel integration route
  • 44. Example: camel-hdfs component
      // Producer
      from("jms:MyQueue").to("hdfs:///myDirectory/myFile.txt?valueType=TEXT");
      // Consumer
      from("hdfs:///myDirectory/myFile.txt").to("file:target/reports/report.txt");
  • 45. Live demo: Apache Camel in action...
  • 46. Agenda: • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite
  • 47. Alternatives for systems integration, by complexity of integration (low to high): Integration Framework (INTEGRATION: Connectivity, Routing, Transformation); Enterprise Service Bus (+ Tooling, Monitoring, Support); Integration Suite (+ BUSINESS PROCESS MGT., BIG DATA / MDM, REGISTRY / REPOSITORY, RULES ENGINE, “YOU NAME IT”)
  • 48. Alternatives for systems integration, by complexity of integration (low to high): Integration Framework, Enterprise Service Bus, Integration Suite
  • 49. More details about ESBs and suites... http://www.kai-waehner.de/blog/2013/01/23/spoilt-for-choice-how-to-choose-the-right-enterprise-service-bus-esb/
  • 50. Vision: Democratize big data. Talend Open Studio for Big Data, an open source ecosystem: • Improves efficiency of big data job design with a graphical interface • Generates Hadoop code and runs transforms inside Hadoop • Native support for HDFS, Pig, HBase, HCatalog, Sqoop and Hive • 100% open source under an Apache License • Standards based
  • 51. Vision: Democratize big data. Talend Platform for Big Data, an open source ecosystem: • Builds on Talend Open Studio for Big Data • Adds data quality, advanced scalability and management functions • MapReduce massively parallel data processing • Shared repository and remote deployment • Data quality and profiling • Data cleansing • Reporting and dashboards • Commercial support, warranty/IP indemnity under a subscription license
  • 52. Talend Open Studio for Big Data
  • 53. Live demo: “Talend Open Studio for Big Data” in action...
  • 54. Did you get the key message?
  • 55. Key messages: You have to care about big data to be competitive in the future! You have to integrate different sources to get the most value out of it! Big data integration is no (longer) rocket science!
  • 56. Did you get the key message?
  • 57. Thank you for your attention. Questions? kwaehner@talend.com www.kai-waehner.de LinkedIn / Xing @KaiWaehner