Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi

Bryan Bende – Software Engineer @Hortonworks
Future of Data NY – December 5th 2016
  
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
  
Agenda
• Problem Definition
• Introduction to Apache NiFi
• Introduction to Apache MiNiFi
• Demo!!
• Q&A
  
  
About Me
• Software Engineer @ Hortonworks
• Apache NiFi PMC & Committer
• Working with NiFi since 2011
• Recent focus on integrations with the Hadoop ecosystem
• bbende@hortonworks.com / Twitter @bbende / bryanbende.com
• Bethpage Class of 2001!
  
  
The Problem
  
  
It starts out so simple…
Team 2: Hey! We have some important data to send you!
Team 1: Cool! Your data is really important to us!
This should be easy, right?...
  
  
But what about formats & protocols?
Team 2: We can publish Avro records to a Kafka topic, does that work?
Team 1: Oh, well we have a REST service that accepts JSON…
  
  
And what about security & authentication?
Team 2: Hmm, what about security? We can authenticate via Kerberos.
Team 1: Sorry, we only support 2-Way TLS with certificates.
  
  
And what about all these devices at the edge?
Team 2: We also need to grab data from all these devices, how are we going to do that?
  
  
And What About…
• Organizational Politics (my data)
• Brittle Connectivity
• Firewalls/Security Domains
• Partnerships bring new data / need different formats
• Data has to be masked for compliance purposes
• Where is this data even from?
• Data is in that other system – I need it over here
• Bandwidth between those sites is limited
• My Big Data system needs it in this other better/faster/stronger format
• What schema is that from?
• It needs to be enriched first!
• No, not that reference set – this one!
• I didn't even know that system existed
  	
  
  
Ok, so let's fix this
• Enterprise Architecture – Standardize on…
  – …a format
  – …a schema (one that can evolve)
  – …a protocol
  – …an ontology
But now…
• Standard schema becomes complex
• Hard to agree on common changes
• Some teams stuck on older versions
• Productivity starts slowing…
  
  
Something to ponder – the disconnect is healthy
• Having Corporate Standards is a good thing.
• Innovation is a good thing.
Innovation often does not follow the Corporate Standard
  
  
What is Dataflow Management?
  
  
Dataflow Management
The systematic process by which data is acquired from all producers and delivered to all consumers
  	
  
  
Dataflow Management Considerations
• Promote Loosely Coupled Systems
  – Types of coupling: Format, Schema, Protocol, Priority, Size, Interest, …
• Promote Highly Cohesive Systems
  – Producers should focus on production (not the intricacies of consumption)
  – Consumers should focus on storage or processing (not the details of production)
• Provide Provenance
  – The who/what/when/where/why of data
  – Inter and Intra Process Latency
  – Enable enterprise version control for data
  
  
Dataflow Management Considerations
• Empower Understanding and Interaction
  – Ability to see the flow, safely and quickly iterate and experiment
  – Breaking production is bad – so too is not being able to evolve fast enough
• Secure
  – Bridge between security domains
  – Data Plane (transport)
  – Control Plane (C&C, Monitoring)
• Self Service
  – Centralized teams – hard to scale – slow turnaround times
  – Centralized systems – multi-tenant management works
  
  
The role of messaging systems
• Reduce variables: Fix protocol, Data Size, Provide Buffering
• Historically not very fast or replayable: Apache Kafka solved that
• Strong solution within a controlled domain
• But numerous challenges remain
  – Topics do not separate key concerns between producer and consumer pairs such as
    § Authorization
    § Format
    § Schema
    § Interest
    § Prioritization
  – Flow control
  
  
Introduction to Apache NiFi
  
  
The NSA Years
• Created in 2006
• Improved over eight years
• Simple initial vision – Visio for real-time dataflow management
• Key Lessons Learned
  – What scale means – down, up, and out
  – The fearsome force known as Compliance Requirements
  – The power of provenance!
  – Operational best-practices and anti-patterns
• NSA donated the codebase to the ASF in late 2014
  
NiFi Key Features
• Guaranteed delivery
• Data buffering
  – Backpressure
  – Pressure release
• Prioritized queuing
• Flow specific QoS
  – Latency vs. throughput
  – Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
  
  
NiFi Core Concepts
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
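The mapping above can be made concrete with a minimal, single-threaded Python sketch. Everything in it is hypothetical: `FlowFile`, `Connection`, `upper_case`, and `run_flow` are illustrative stand-ins rather than NiFi's Java API, the connection is an unbounded queue, and `run_flow` only gestures at the Flow Controller's job of moving FlowFiles between components.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: key/value attributes plus the content bytes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Buffer between processors (unbounded here for brevity)."""

    def __init__(self):
        self._queue = deque()

    def offer(self, flowfile):
        self._queue.append(flowfile)

    def poll(self):
        return self._queue.popleft() if self._queue else None


def upper_case(flowfile):
    """Black box / FlowFile Processor: transform content, record an attribute."""
    return FlowFile(
        content=flowfile.content.upper(),
        attributes={**flowfile.attributes, "transformed": "true"},
    )


def run_flow(inputs, processor=upper_case):
    """Scheduler / Flow Controller stand-in: move FlowFiles through the flow."""
    incoming, outgoing = Connection(), Connection()
    for data in inputs:
        incoming.offer(FlowFile(content=data, attributes={"filename": "demo.txt"}))
    # Drain the incoming queue through the processor into the outgoing queue.
    while (flowfile := incoming.poll()) is not None:
        outgoing.offer(processor(flowfile))
    results = []
    while (flowfile := outgoing.poll()) is not None:
        results.append(flowfile)
    return results
```

Because the processor only sees one FlowFile at a time and the connections sit between components, the producer and the processor could run at different rates, which is the role the table assigns to the Connection.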
  
  
Visual Command & Control
• Drag & drop processors to build a flow
• Start, stop, & configure components in real-time
• View errors & corresponding messages
• View statistics & health of the dataflow
• Create shareable templates of common flows
  
	
  
  
Provenance/Lineage
• Tracks data at each point as it flows through the system
• Records, indexes, and makes events available for display
• Handles fan-in/fan-out, i.e. merging and splitting data
• View attributes and content at given points in time
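To illustrate why recording events at every step makes fan-in/fan-out lineage answerable, the Python sketch below keeps hypothetical provenance events (each listing the FlowFiles it consumed and produced) and walks them backwards from one FlowFile. `ProvenanceEvent` and `lineage` are assumptions for illustration; the event type strings borrow NiFi's names for a split (FORK) and a merge (JOIN), but NiFi itself persists and indexes events in its provenance repository rather than scanning an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    """Hypothetical provenance record: what happened and to which FlowFiles."""
    event_type: str   # e.g. "RECEIVE", "FORK" (split), "JOIN" (merge)
    parents: tuple    # ids of FlowFiles the event consumed
    children: tuple   # ids of FlowFiles the event produced


def lineage(events, flowfile_id):
    """Return the ids of every FlowFile upstream of flowfile_id."""
    # Index events by the FlowFiles they produced.
    produced_by = {}
    for event in events:
        for child in event.children:
            produced_by.setdefault(child, []).append(event)
    # Walk parent links backwards; the seen-set copes with fan-in and fan-out.
    seen, stack = set(), [flowfile_id]
    while stack:
        current = stack.pop()
        for event in produced_by.get(current, []):
            for parent in event.parents:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
    return seen


# A is received and split into B and C; C is merged with D (received separately) into E.
events = [
    ProvenanceEvent("RECEIVE", (), ("A",)),
    ProvenanceEvent("RECEIVE", (), ("D",)),
    ProvenanceEvent("FORK", ("A",), ("B", "C")),
    ProvenanceEvent("JOIN", ("C", "D"), ("E",)),
]
```

Asking for the lineage of `E` returns `A`, `C`, and `D`: the merge pulls in both inputs, and the split links `C` back to `A`.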
  
  
Prioritization
• Configure a prioritizer per connection
• Determine what is important for your data – time based, arrival order, importance of a data set
• Funnel many connections down to a single connection to prioritize across data sets
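The per-connection prioritizer idea can be sketched as a priority queue whose ordering key is pluggable. The three key functions below are loosely modeled on time-based, arrival-order, and importance-based prioritization; their names, the dict-based FlowFiles, and the `priority` attribute convention (lower number first) are assumptions for this sketch, not NiFi's implementation. Offering FlowFiles from several sources into one `PrioritizedConnection` plays the role of the funnel.

```python
import heapq
import itertools


# Hypothetical prioritizers: each maps (flowfile, arrival_seq) to a sort key.
def arrival_order(flowfile, arrival_seq):
    return arrival_seq


def oldest_first(flowfile, arrival_seq):
    # Time based: earliest entry timestamp first, arrival order breaks ties.
    return (flowfile["entry_time"], arrival_seq)


def by_priority_attribute(flowfile, arrival_seq):
    # Importance of a data set: lower "priority" value first; missing sorts last.
    return (int(flowfile.get("priority", 10**9)), arrival_seq)


class PrioritizedConnection:
    """A connection queue that always hands out the highest-priority FlowFile."""

    def __init__(self, prioritizer):
        self._prioritizer = prioritizer
        self._heap = []
        self._arrivals = itertools.count()

    def offer(self, flowfile):
        seq = next(self._arrivals)
        # seq is unique, so heapq never has to compare the dicts themselves.
        heapq.heappush(self._heap, (self._prioritizer(flowfile, seq), seq, flowfile))

    def poll(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


# Funnel two data sets into one connection and prioritize across them.
bulk_feed = [{"name": "bulk-1", "priority": "5"}, {"name": "bulk-2", "priority": "5"}]
alert_feed = [{"name": "alert", "priority": "1"}]
funnel = PrioritizedConnection(by_priority_attribute)
for ff in bulk_feed + alert_feed:
    funnel.offer(ff)
```

Even though the alert arrives last, polling the funneled connection yields `alert` first, then the two bulk FlowFiles in arrival order.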
  
  
Back-Pressure
• Configure back-pressure per connection
• Based on number of FlowFiles or total size of FlowFiles
• Upstream processor no longer scheduled to run until below threshold
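A minimal Python sketch of per-connection back-pressure, assuming a connection that tracks both its object count and its total content size. The class, the function names, and the thresholds are hypothetical; the point is the scheduling rule from the slide: once either threshold is reached, the upstream component is simply not scheduled, and it resumes once the downstream side drains the queue below the threshold.

```python
from collections import deque


class BackPressureConnection:
    """Connection queue with an object-count and a total-size threshold."""

    def __init__(self, max_count, max_bytes):
        self.max_count = max_count
        self.max_bytes = max_bytes
        self._queue = deque()
        self._bytes = 0

    def offer(self, content: bytes):
        self._queue.append(content)
        self._bytes += len(content)

    def poll(self):
        if not self._queue:
            return None
        content = self._queue.popleft()
        self._bytes -= len(content)
        return content

    def is_full(self):
        # Back-pressure engages as soon as EITHER threshold is reached.
        return len(self._queue) >= self.max_count or self._bytes >= self.max_bytes


def schedule_upstream(connection, produce):
    """Run the upstream 'processor' once, unless its output connection is full."""
    if connection.is_full():
        return False  # not scheduled this cycle
    connection.offer(produce())
    return True
```

In this sketch `offer` never blocks, so a connection can overshoot its threshold by whatever a single run produces; the rule only stops further scheduling.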
  
  
Latency vs. Throughput
• Choose between lower latency or higher throughput on each processor
• Higher throughput allows the framework to batch together all operations for the selected amount of time for improved performance
• Processor developer determines whether to support this by using the @SupportsBatching annotation
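The latency/throughput trade-off can be illustrated with a small simulation, assuming each unit of work costs a fixed amount of simulated time and that everything done within one run-duration window is committed together. The function name and parameters are hypothetical and nothing here reflects NiFi's actual session internals; it only shows how a longer window yields fewer, larger commits at the cost of earlier items waiting longer before they are committed.

```python
def process_with_run_duration(items, run_duration_ms, cost_ms=1):
    """
    Simulate grouping work into commits.

    Each item costs cost_ms of simulated time. Everything processed since the
    last commit is committed together once run_duration_ms has elapsed, so
    run_duration_ms=0 commits every item individually (lowest latency) and
    larger windows produce fewer, larger commits (higher throughput).
    Returns the committed batches in order.
    """
    batches, current = [], []
    clock = window_start = 0
    for item in items:
        current.append(item)
        clock += cost_ms
        if clock - window_start >= run_duration_ms:
            batches.append(current)
            current, window_start = [], clock
    if current:  # flush whatever is left when the input ends
        batches.append(current)
    return batches
```

With ten items, a window of 0 gives ten single-item commits, a 5 ms window gives two commits of five, and a 25 ms window gives one commit of all ten. If every commit carries a fixed overhead, fewer commits means more throughput, while the first item in a large batch waits longest.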
  
  
Security
• Control Plane
  – Pluggable authentication
    • 2-Way TLS/SSL, LDAP, Kerberos
  – Pluggable authorization with multi-tenancy
    • NiFi Policy Based Authorizer
    • Apache Ranger Authorizer
  – Audit trail of all user actions
• Data Plane
  – Optional 2-Way TLS/SSL between cluster nodes
  – Optional 2-Way TLS/SSL on Site-To-Site connections (NiFi-to-NiFi)
  – Encryption/Decryption of data through processors
  – Provenance for audit trail of data
  
  
Extensibility
• Built from the ground up with extensions in mind
• Service-loader pattern for…
  – Processors
  – Controller Services
  – Reporting Tasks
• Extensions packaged as NiFi Archives (NARs)
  – Deploy to NiFi lib directory and restart
  – Provides ClassLoader isolation
  – Same model as standard components
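Java's `ServiceLoader` discovers implementations of an interface that jars on the classpath declare. As a loose Python analogy only, the sketch below finds every concrete subclass of a hypothetical extension point (`Processor`, `ReportingTask`) at runtime. The class names are made up for the example, and nothing in it models NAR packaging or ClassLoader isolation.

```python
from abc import ABC, abstractmethod


class Processor(ABC):
    """Hypothetical extension point, standing in for a processor interface."""

    @abstractmethod
    def on_trigger(self, data: bytes) -> bytes: ...


class ReportingTask(ABC):
    """A second hypothetical extension point."""

    @abstractmethod
    def report(self, stats: dict) -> str: ...


def discover(extension_point):
    """Find all concrete implementations of an extension point, recursively."""
    found, pending = {}, list(extension_point.__subclasses__())
    while pending:
        cls = pending.pop()
        pending.extend(cls.__subclasses__())
        # Skip classes that still leave abstract methods unimplemented.
        if not getattr(cls, "__abstractmethods__", None):
            found[cls.__name__] = cls
    return found


# "Extensions": in NiFi these would arrive packaged in a NAR.
class ReverseContent(Processor):
    def on_trigger(self, data):
        return data[::-1]


class LogStats(ReportingTask):
    def report(self, stats):
        return ", ".join(f"{k}={v}" for k, v in sorted(stats.items()))
```

The framework side only knows the extension points; implementations become available without the framework code naming them, which is the property the service-loader pattern provides.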
  
  
Architecture - Standalone
[Diagram: a single OS/host running a JVM that hosts the Flow Controller (Processor 1 … Extension N) and a Web Server, backed by the FlowFile, Content, and Provenance Repositories on local storage]
• FlowFile Repository
  – Write Ahead Log
  – State of every FlowFile
  – Pointers to content repository (pass-by-reference)
• Content Repository
  – FlowFile content
  – Copy-on-write
• Provenance Repository
  – Write Ahead Log + Lucene Indexes
  – Store & search lineage events
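The pass-by-reference and copy-on-write bullets can be sketched in a few lines of Python, under simplifying assumptions: the content repository is an in-memory dict of immutable claims (NiFi writes claims to disk), and a FlowFile record is just attributes plus a claim id. The names `ContentRepository`, `FlowFileRecord`, `clone`, and `modify_content` are hypothetical, and the FlowFile repository's write-ahead log is not modeled.

```python
import itertools
from dataclasses import dataclass, replace


class ContentRepository:
    """Hypothetical append-only content store: claims are never modified."""

    def __init__(self):
        self._claims = {}
        self._ids = itertools.count(1)

    def write(self, data: bytes) -> int:
        claim_id = next(self._ids)
        self._claims[claim_id] = data
        return claim_id

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]


@dataclass(frozen=True)
class FlowFileRecord:
    """What the FlowFile repository tracks: attributes plus a content pointer."""
    attributes: dict
    content_claim: int


def clone(flowfile):
    # Pass-by-reference: the copy shares the same content claim; no bytes copied.
    return replace(flowfile, attributes=dict(flowfile.attributes))


def modify_content(repo, flowfile, transform):
    # Copy-on-write: new content goes to a new claim and the old claim is left
    # untouched, so earlier versions remain readable.
    new_claim = repo.write(transform(repo.read(flowfile.content_claim)))
    return replace(flowfile, content_claim=new_claim)
```

Cloning is cheap because only the small record is copied, and modifying content never disturbs other FlowFiles that still point at the original claim.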
  
  
Architecture - Cluster
[Diagram: three nodes, each an OS/host running a JVM with the Flow Controller (Processor 1 … Extension N), Web Server, and FlowFile/Content/Provenance Repositories on local storage, all connected to ZooKeeper]
• Same dataflow on each node, data partitioned across cluster
• Access the UI from any node
• ZooKeeper for auto-election of Cluster Coordinator & Primary Node
• Cluster Coordinator receives heartbeats from other nodes, manages joining/disconnecting
• Primary Node for scheduling processors on a single node
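A minimal sketch of the Cluster Coordinator's heartbeat bookkeeping, assuming timestamps in plain seconds and a hypothetical timeout after which a silent node is considered disconnected. ZooKeeper-based election of the coordinator and primary node, and the real join handshake, are deliberately left out; `ClusterCoordinator` and its methods are illustrative names only.

```python
class ClusterCoordinator:
    """Hypothetical heartbeat tracking for a set of cluster nodes."""

    def __init__(self, heartbeat_timeout_s):
        self.heartbeat_timeout_s = heartbeat_timeout_s
        self.last_heartbeat = {}
        self.connected = set()

    def on_heartbeat(self, node_id, now):
        self.last_heartbeat[node_id] = now
        self.connected.add(node_id)  # a new or returning node (re)joins

    def check(self, now):
        """Disconnect nodes whose latest heartbeat is older than the timeout."""
        stale = {
            node for node in self.connected
            if now - self.last_heartbeat[node] > self.heartbeat_timeout_s
        }
        self.connected -= stale
        return stale
```

A node that stops sending heartbeats drops out of the connected set on the next check, and a later heartbeat from it brings it back.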
  
  
Site-To-Site
• Direct communication between two NiFi instances
• Push to Input Port on receiver, or Pull from Output Port on source
• Communicate between clusters, standalone instances, or both
• Handles load balancing and reliable delivery
• Secure connections using certificates (optional)
• Communicate over TCP or HTTP
  
	
  
  
Site-To-Site Push Model
• Source connects Remote Process Group to Input Port on destination
• Site-To-Site takes care of load balancing across the nodes in the cluster
[Diagram: a Remote Process Group (RPG) on a standalone NiFi pushing to the Input Port on each of NiFi Cluster Nodes 1, 2, and 3]
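Site-to-Site's load balancing can be illustrated with a simple heuristic: send each batch to the node currently reporting the fewest queued FlowFiles. That heuristic, the `distribute` function, and the node names are assumptions for this sketch; it is not a description of NiFi's actual peer-selection algorithm.

```python
import heapq


def distribute(batches, node_queue_sizes):
    """
    Hypothetical push-model balancing: each batch goes to the node that
    currently reports the fewest queued FlowFiles (ties break by node name),
    and that node's count grows by the batch size.
    Returns {node: [batch, ...]}.
    """
    heap = [(size, node) for node, size in node_queue_sizes.items()]
    heapq.heapify(heap)
    assignment = {node: [] for node in node_queue_sizes}
    for batch in batches:
        size, node = heapq.heappop(heap)
        assignment[node].append(batch)
        heapq.heappush(heap, (size + len(batch), node))
    return assignment
```

With nodes reporting 0, 5, and 0 queued FlowFiles, three batches of three go to node1, node3, then node1 again, while the busier node2 receives nothing.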
  
  
Site-To-Site Pull Model
• Destination connects Remote Process Group to Output Port on the source
• If the source were a cluster, each node would pull from every node in that cluster
[Diagram: Remote Process Groups (RPGs) on NiFi Cluster Nodes 1, 2, and 3, each pulling from the Output Port of a standalone NiFi]
  
  
Introduction to Apache MiNiFi
  
  
Apache MiNiFi
• Sub-project of Apache NiFi
• Created to more effectively collect data at the edge
• Smaller footprint, runs where a JVM can't
• Design & Deploy vs. Command & Control
  
  
MiNiFi Distributions
• Java
  – <40MB binary distribution
  – Requires Java 1.8
  – More feature complete
  – Targeted for any systems that can run a JVM (e.g., servers, Raspberry Pi)
• C++
  – 600KB code size and static data ~50KB
  – Dynamic heap of ~1MB based on use-case
  – Targeted for resource constrained environments (e.g., edge IoT devices)
• Both use the same config format and NiFi terminology
Different focuses depending on requirements
  
  
MiNiFi Java
[Diagram: MiNiFi = NiFi Framework + Components; NiFi = NiFi Framework + User Interface + Components]
  
  
MiNiFi Java
• Uses same NAR structure as NiFi
• Use any NAR from NiFi with MiNiFi Java
• NiFi standard processors are bundled by default
  – TailFile
  – UpdateAttribute
  – Route on content and attributes
  – PutEmail
  – ….
  
  
MiNiFi C++
• Initial set of processors
  – TailFile
  – GetFile
  – GenerateFlowFile
  – LogAttribute
  – ListenSyslog
• Site to Site Client implementation in C++ for talking to NiFi instances
  
	
  
  
Design & Deploy
Same approach for Java & C++…
1. Design a flow in NiFi UI
2. Export template to XML file
3. Run MiNiFi Toolkit to convert NiFi template to MiNiFi YAML
4. Deploy config.yml to MiNiFi instances
Initially targeting flows like…
1. GetFile/TailFile
2. Routing Decision
3. Site-To-Site back to core NiFi
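The first two steps of such a flow (tail a file, make a routing decision) can be sketched in Python. `FileTailer` and `route` are hypothetical names; the tailer assumes a rollover shows up as the file becoming shorter than the saved offset, which is a simplification (NiFi's TailFile processor handles rollover more robustly, for example by using a rolling filename pattern). Sending the routed lines over Site-to-Site is not modeled.

```python
import os


class FileTailer:
    """
    Remember a byte offset into one file and return only newly appended
    complete lines. If the file is now smaller than the offset, assume it was
    rolled over and start again from the beginning. (Anything written to the
    old file after the last poll is lost in this sketch.)
    """

    def __init__(self, path):
        self.path = path
        self.offset = 0

    def poll(self):
        if not os.path.exists(self.path):
            return []
        if os.path.getsize(self.path) < self.offset:
            self.offset = 0  # rolled over
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Keep any trailing partial line for the next poll.
        complete, sep, _partial = data.rpartition(b"\n")
        if not sep:
            return []
        self.offset += len(complete) + 1
        return complete.split(b"\n")


def route(lines, keyword=b"ERROR"):
    """Hypothetical routing decision: split lines into matched / unmatched."""
    matched = [line for line in lines if keyword in line]
    unmatched = [line for line in lines if keyword not in line]
    return matched, unmatched
```

Holding back the partial last line means a line that is still being written is only emitted once its newline arrives.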
  
  
Simple config.yml
Tail a rolling file -> Site to Site
  
  
MiNiFi Command and Control
• Design Flow at a centralized place, deploy on the edge
• Version control of flows
  – Align with NiFi SDLC work
• Agent status monitoring
• Bi-directional command and control
Currently a feature proposal, initial version being architected
https://cwiki.apache.org/confluence/display/MINIFI/MiNiFi+Command+and+Control
  
  
Demo!	
  
  
Demo Scenario
[Diagram: two Raspberry Pis, each with a Temp/Humidity Sensor and running MiNiFi Java, send data via site-to-site to NiFi, which feeds Solr, visualized with Banana]
  
  
Questions?
  
  
Learn more and join us!
  
Apache NiFi site
http://nifi.apache.org
Subproject MiNiFi site
http://nifi.apache.org/minifi/
Subscribe to and collaborate at
dev@nifi.apache.org
users@nifi.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
https://issues.apache.org/jira/browse/MINIFI
Follow us on Twitter
@apachenifi
  
Thank you!
  

Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
 
Using Apache® NiFi to Empower Self-Organising Teams
Using Apache® NiFi to Empower Self-Organising TeamsUsing Apache® NiFi to Empower Self-Organising Teams
Using Apache® NiFi to Empower Self-Organising Teams
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFi
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat Alwell
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & Community
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical Workshop
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
 
Apache NiFi Crash Course Intro
Apache NiFi Crash Course IntroApache NiFi Crash Course Intro
Apache NiFi Crash Course Intro
 

Plus de Bryan Bende

Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsBryan Bende
 
Devnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiDevnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiBryan Bende
 
Apache NiFi Record Processing
Apache NiFi Record ProcessingApache NiFi Record Processing
Apache NiFi Record ProcessingBryan Bende
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBryan Bende
 
Document Similarity with Cloud Computing
Document Similarity with Cloud ComputingDocument Similarity with Cloud Computing
Document Similarity with Cloud ComputingBryan Bende
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Bryan Bende
 

Plus de Bryan Bende (6)

Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC Improvements
 
Devnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiDevnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFi
 
Apache NiFi Record Processing
Apache NiFi Record ProcessingApache NiFi Record Processing
Apache NiFi Record Processing
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
 
Document Similarity with Cloud Computing
Document Similarity with Cloud ComputingDocument Similarity with Cloud Computing
Document Similarity with Cloud Computing
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014
 

Dernier

IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
Sales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales CoverageSales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales CoverageDista
 
Enterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze IncEnterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze Incrobinwilliams8624
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesShyamsundar Das
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionsNirav Modi
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmonyelliciumsolutionspun
 
Fields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxFields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxJoão Esperancinha
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilVICTOR MAESTRE RAMIREZ
 
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.Sharon Liu
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIIvo Andreev
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Jaydeep Chhasatia
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxAutus Cyber Tech
 
Introduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntroduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntelliSource Technologies
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdfMeon Technology
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?AmeliaSmith90
 
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfTobias Schneck
 
Growing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesGrowing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesSoftwareMill
 
Webinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptWebinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptkinjal48
 

Dernier (20)

IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Sales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales CoverageSales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales Coverage
 
Enterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze IncEnterprise Document Management System - Qualityze Inc
Enterprise Document Management System - Qualityze Inc
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security Challenges
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspections
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
 
Fields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxFields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptx
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-Council
 
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in Trivandrum
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptx
 
Introduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntroduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptx
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdf
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?
 
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
 
Growing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesGrowing Oxen: channel operators and retries
Growing Oxen: channel operators and retries
 
Webinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptWebinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.ppt
 

Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi

  • 1. Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
    Bryan Bende – Software Engineer @Hortonworks
    Future of Data NY – December 5th 2016
  • 2. Agenda
    © Hortonworks Inc. 2011 – 2016. All Rights Reserved
    - Problem Definition
    - Introduction to Apache NiFi
    - Introduction to Apache MiNiFi
    - Demo!!
    - Q&A
  • 3. About Me
    - Software Engineer @ Hortonworks
    - Apache NiFi PMC & Committer
    - Working with NiFi since 2011
    - Recent focus on integrations with Hadoop ecosystem
    - bbende@hortonworks.com / Twitter @bbende / bryanbende.com
    - Bethpage Class of 2001!
  • 4. The Problem
  • 5. It starts out so simple…
    Team 1 and Team 2:
    "Hey! We have some important data to send you!"
    "Cool! Your data is really important to us!"
    This should be easy, right?...
  • 6. But what about formats & protocols?
    Team 2: "We can publish Avro records to a Kafka topic. Does that work?"
    Team 1: "Oh, well we have a REST service that accepts JSON…"
  • 7. And what about security & authentication?
    Team 2: "Hmm, what about security? We can authenticate via Kerberos."
    Team 1: "Sorry, we only support 2-Way TLS with certificates."
  • 8. And what about all these devices at the edge?
    Team 2: "We also need to grab data from all these devices. How are we going to do that?"
  • 9. And What About…
    - Organizational Politics (my data)
    - Brittle Connectivity
    - Firewalls/Security Domains
    - Partnerships bring new data / need different formats
    - Data has to be masked for compliance purposes
    - Where is this data even from?
    - Data is in that other system – I need it over here
    - Bandwidth between those sites is limited
    - My Big Data system needs it in this other better/faster/stronger format
    - What schema is that from?
    - It needs to be enriched first!
    - No, not that reference set – this one!
    - I didn't even know that system existed
  • 10. OK, so let's fix this
    - Enterprise Architecture – Standardize on…
      - …a format
      - …a schema (one that can evolve)
      - …a protocol
      - …an ontology
    But now…
    - Standard schema becomes complex
    - Hard to agree on common changes
    - Some teams stuck on older versions
    - Productivity starts slowing…
  • 11. Something to ponder – the disconnect is healthy
    - Having Corporate Standards is a good thing.
    - Innovation is a good thing.
    Innovation often does not follow the Corporate Standard
  • 12. What is Dataflow Management?
  • 13. Dataflow Management
    The systematic process by which data is acquired from all producers and delivered to all consumers
  • 14. Dataflow Management Considerations
    - Promote Loosely Coupled Systems
      - Types of coupling: Format, Schema, Protocol, Priority, Size, Interest, …
    - Promote Highly Cohesive Systems
      - Producers should focus on production (not the intricacies of consumption)
      - Consumers should focus on storage or processing (not the details of production)
    - Provide Provenance
      - The who/what/when/where/why of data
      - Inter and Intra Process Latency
      - Enable enterprise version control for data
  • 15. Dataflow Management Considerations
    - Empower Understanding and Interaction
      - Ability to see the flow, safely and quickly iterate and experiment
      - Breaking production is bad – so too is not being able to evolve fast enough
    - Secure
      - Bridge between security domains
      - Data Plane (transport)
      - Control Plane (C&C, Monitoring)
    - Self Service
      - Centralized teams – hard to scale – slow turnaround times
      - Centralized systems – multi-tenant management works
  • 16. The role of messaging systems
    - Reduce variables: Fix protocol, Data Size, Provide Buffering
    - Historically not very fast or replayable: Apache Kafka solved that
    - Strong solution within a controlled domain
    - But numerous challenges remain
      - Topics do not separate key concerns between producer and consumer pairs, such as
        - Authorization
        - Format
        - Schema
        - Interest
        - Prioritization
      - Flow control
  • 17. Introduction to Apache NiFi
  • 18. The NSA Years
    - Created in 2006
    - Improved over eight years
    - Simple initial vision – Visio for real-time dataflow management
    - Key Lessons Learned
      - What scale means – down, up, and out
      - The fearsome force known as Compliance Requirements
      - The power of provenance!
      - Operational best-practices and anti-patterns
    - NSA donated the codebase to the ASF in late 2014
  • 19. NiFi Key Features
    - Guaranteed delivery
    - Data buffering
      - Backpressure
      - Pressure release
    - Prioritized queuing
    - Flow specific QoS
      - Latency vs. throughput
      - Loss tolerance
    - Data provenance
    - Recovery/recording a rolling log of fine-grained history
    - Visual command and control
    - Flow templates
    - Pluggable/multi-role security
    - Designed for extension
    - Clustering
  • 20. NiFi Core Concepts
    FBP Term           | NiFi Term          | Description
    -------------------|--------------------|------------
    Information Packet | FlowFile           | Each object moving through the system.
    Black Box          | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
    Bounded Buffer     | Connection         | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
    Scheduler          | Flow Controller    | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
    Subnet             | Process Group      | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
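To make the FBP-to-NiFi mapping concrete, here is a minimal in-memory sketch in Python. It is not NiFi's API (real NiFi processors are Java classes): `FlowFile`, `Connection`, `Processor`, and `run_flow` are hypothetical stand-ins, the connections are unbounded FIFO queues (no back-pressure), and the toy scheduler simply gives every processor a turn until none of them has work left.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class FlowFile:
    """Information packet: key/value attributes plus a content payload."""
    attributes: Dict[str, str]
    content: bytes


@dataclass
class Connection:
    """Buffer between two processors (an unbounded FIFO in this sketch)."""
    queue: Deque[FlowFile] = field(default_factory=deque)

    def offer(self, ff: FlowFile) -> None:
        self.queue.append(ff)

    def poll(self) -> Optional[FlowFile]:
        return self.queue.popleft() if self.queue else None


@dataclass
class Processor:
    """Black box: takes one FlowFile from its input, emits one to its output."""
    name: str
    work: Callable[[FlowFile], FlowFile]
    inbound: Connection
    outbound: Connection

    def on_trigger(self) -> bool:
        ff = self.inbound.poll()
        if ff is None:
            return False  # nothing queued, so no work this turn
        self.outbound.offer(self.work(ff))
        return True


def run_flow(processors: List[Processor]) -> None:
    """Toy scheduler: give every processor a turn until none of them did any work."""
    while True:
        progressed = [p.on_trigger() for p in processors]
        if not any(progressed):
            break


# Usage: source -> UpperCase -> Tag -> sink
source, middle, sink = Connection(), Connection(), Connection()
upper = Processor("UpperCase",
                  lambda ff: FlowFile({**ff.attributes, "upper": "true"}, ff.content.upper()),
                  source, middle)
tag = Processor("Tag",
                lambda ff: FlowFile({**ff.attributes, "tagged": "yes"}, ff.content),
                middle, sink)
source.offer(FlowFile({"filename": "a.txt"}, b"hello"))
run_flow([upper, tag])
result = sink.poll()
```

Each processor only sees its own inbound and outbound connections, which is what lets components run at differing rates in the real system.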
  • 21. Visual Command & Control
    - Drag & drop processors to build a flow
    - Start, stop, & configure components in real-time
    - View errors & corresponding messages
    - View statistics & health of the dataflow
    - Create shareable templates of common flows
  • 22. Provenance/Lineage
    - Tracks data at each point as it flows through the system
    - Records, indexes, and makes events available for display
    - Handles fan-in/fan-out, i.e. merging and splitting data
    - View attributes and content at given points in time
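The lineage idea can be sketched as an append-only event log plus a parent index. This is an illustrative assumption, not NiFi's Provenance Repository format: `ProvenanceLog` and its methods are hypothetical, although the event type names `CREATE`, `FORK` (split) and `JOIN` (merge) mirror NiFi's provenance event types. Fan-out records the children of a FlowFile; fan-in records its parents.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class ProvenanceEvent:
    event_id: int
    event_type: str              # e.g. "CREATE", "FORK" (split), "JOIN" (merge)
    flowfile_id: str
    parents: List[str] = field(default_factory=list)
    children: List[str] = field(default_factory=list)


class ProvenanceLog:
    """Append-only event list plus a child -> parents index for lineage queries."""

    def __init__(self) -> None:
        self.events: List[ProvenanceEvent] = []
        self._next_id = itertools.count(1)
        self._parents_of: Dict[str, Set[str]] = {}

    def record(self, event_type: str, flowfile_id: str,
               parents: Iterable[str] = (), children: Iterable[str] = ()) -> ProvenanceEvent:
        ev = ProvenanceEvent(next(self._next_id), event_type, flowfile_id,
                             list(parents), list(children))
        self.events.append(ev)
        for child in ev.children:        # fan-out: flowfile_id is the parent of each child
            self._parents_of.setdefault(child, set()).add(flowfile_id)
        for parent in ev.parents:        # fan-in: each parent feeds flowfile_id
            self._parents_of.setdefault(flowfile_id, set()).add(parent)
        return ev

    def lineage(self, flowfile_id: str) -> Set[str]:
        """Every ancestor of a FlowFile, following parent links transitively."""
        ancestors: Set[str] = set()
        stack = [flowfile_id]
        while stack:
            for parent in self._parents_of.get(stack.pop(), ()):
                if parent not in ancestors:
                    ancestors.add(parent)
                    stack.append(parent)
        return ancestors


# Usage: A is split into A1/A2, then A1 is merged with B into M
log = ProvenanceLog()
log.record("CREATE", "A")
log.record("CREATE", "B")
log.record("FORK", "A", children=["A1", "A2"])
log.record("JOIN", "M", parents=["A1", "B"])
```

Asking for the lineage of `M` walks back through the merge and the split to both original inputs.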
  • 23. Prioritization
    - Configure a prioritizer per connection
    - Determine what is important for your data – time based, arrival order, importance of a data set
    - Funnel many connections down to a single connection to prioritize across data sets
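A per-connection prioritizer is essentially an ordering rule over queued FlowFiles. The sketch below assumes one specific rule, loosely inspired by NiFi's `PriorityAttributePrioritizer` and `OldestFlowFileFirstPrioritizer` without reproducing their exact semantics: a numeric `priority` attribute wins (lowest number first), and everything else is ordered by entry time. `PrioritizedConnection` is a hypothetical name, and funnelling is modelled simply by offering FlowFiles from several data sets into one queue.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class PrioritizedConnection:
    """Connection queue whose order is chosen by the connection, not the producer.

    Assumed rule for this sketch: FlowFiles with a numeric 'priority' attribute
    go first (lowest number first); the rest follow, oldest entry time first.
    """

    _NO_PRIORITY = float("inf")

    def __init__(self) -> None:
        self._heap: List[Tuple[float, float, int, Dict[str, str]]] = []
        self._seq = itertools.count()  # tie-breaker so dicts are never compared

    def offer(self, attributes: Dict[str, str], entry_time: float) -> None:
        raw = attributes.get("priority")
        prio = float(raw) if raw is not None else self._NO_PRIORITY  # assumes numeric values
        heapq.heappush(self._heap, (prio, entry_time, next(self._seq), attributes))

    def poll(self) -> Optional[Dict[str, str]]:
        return heapq.heappop(self._heap)[3] if self._heap else None

    def __len__(self) -> int:
        return len(self._heap)


# Usage: funnel four data sets into one prioritized connection
funnel = PrioritizedConnection()
funnel.offer({"source": "logs"}, entry_time=1.0)
funnel.offer({"source": "alerts", "priority": "1"}, entry_time=3.0)
funnel.offer({"source": "metrics", "priority": "5"}, entry_time=2.0)
funnel.offer({"source": "audit"}, entry_time=0.5)
order = [funnel.poll()["source"] for _ in range(len(funnel))]
```

Because the ordering lives in the connection, producers do not need to know anything about the consumer's notion of importance.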
  • 24. Back-Pressure
    - Configure back-pressure per connection
    - Based on number of FlowFiles or total size of FlowFiles
    - Upstream processor no longer scheduled to run until below threshold
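Back-pressure reduces to a scheduling check: before running the upstream processor, look at its outbound connection. This sketch assumes a connection that tracks both a FlowFile count and a total byte size, and treats reaching either threshold as "full". `BackPressuredConnection` and `schedule_upstream` are hypothetical names; only the behaviour described on the slide is modelled.

```python
from collections import deque
from typing import Callable, Deque


class BackPressuredConnection:
    """Connection with a FlowFile-count threshold and a total-size threshold."""

    def __init__(self, max_count: int, max_bytes: int) -> None:
        self.max_count = max_count
        self.max_bytes = max_bytes
        self._queue: Deque[bytes] = deque()
        self._bytes = 0

    def is_full(self) -> bool:
        # back-pressure applies as soon as EITHER threshold is reached
        return len(self._queue) >= self.max_count or self._bytes >= self.max_bytes

    def offer(self, content: bytes) -> None:
        self._queue.append(content)
        self._bytes += len(content)

    def poll(self) -> bytes:
        content = self._queue.popleft()
        self._bytes -= len(content)
        return content

    def __len__(self) -> int:
        return len(self._queue)


def schedule_upstream(conn: BackPressuredConnection, produce: Callable[[], bytes]) -> bool:
    """Run the upstream processor only if its outbound connection is below threshold."""
    if conn.is_full():
        return False  # not scheduled this round
    conn.offer(produce())
    return True


# Usage: count threshold of 3 FlowFiles, size threshold of 1,000 bytes
conn = BackPressuredConnection(max_count=3, max_bytes=1_000)
runs = [schedule_upstream(conn, lambda: b"x" * 10) for _ in range(5)]
conn.poll()  # downstream consumes one FlowFile
resumed = schedule_upstream(conn, lambda: b"x" * 10)
```

Because the check happens before each run, one large FlowFile can carry the queue past the size threshold; in this sketch the thresholds gate scheduling rather than acting as hard caps.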
  • 25. Latency vs. Throughput
    - Choose between lower latency or higher throughput on each processor
    - Higher throughput allows the framework to batch together all operations for the selected amount of time for improved performance
    - The processor developer determines whether to support this by using the @SupportsBatching annotation
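The trade-off can be modelled as "how long to keep working before committing". The function below is a simplified, hypothetical model rather than NiFi's scheduler or session API: it groups timestamped items into one commit per run-duration window, so a zero duration commits each item immediately and a longer one spreads the per-commit overhead across a batch at the cost of added latency.

```python
from typing import List, Optional, Tuple


def batch_by_run_duration(events: List[Tuple[int, str]], run_duration_ms: int) -> List[List[str]]:
    """Group processed items into commits, one commit per run-duration window.

    run_duration_ms == 0 commits every item on its own (lowest latency);
    larger values give fewer, larger commits (higher throughput, more latency).
    `events` are (arrival time in ms, payload) pairs in arrival order.
    """
    commits: List[List[str]] = []
    batch: List[str] = []
    window_start: Optional[int] = None
    for ts, payload in events:
        if window_start is not None and ts - window_start >= run_duration_ms:
            commits.append(batch)          # window elapsed: commit what we have
            batch, window_start = [], None
        if window_start is None:
            window_start = ts              # first item opens a new window
        batch.append(payload)
    if batch:
        commits.append(batch)              # final commit at the end of the run
    return commits


events = [(0, "a"), (10, "b"), (20, "c"), (30, "d"), (40, "e")]
low_latency = batch_by_run_duration(events, run_duration_ms=0)
high_throughput = batch_by_run_duration(events, run_duration_ms=25)
```

With the same five items, the zero-duration setting performs five commits, while a 25 ms window performs two.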
  • 26. Security
    - Control Plane
      - Pluggable authentication
        - 2-Way TLS/SSL, LDAP, Kerberos
      - Pluggable authorization with multi-tenancy
        - NiFi Policy Based Authorizer
        - Apache Ranger Authorizer
      - Audit trail of all user actions
    - Data Plane
      - Optional 2-Way TLS/SSL between cluster nodes
      - Optional 2-Way TLS/SSL on Site-To-Site connections (NiFi-to-NiFi)
      - Encryption/Decryption of data through processors
      - Provenance for audit trail of data
  • 27. Extensibility
    - Built from the ground up with extensions in mind
    - Service-loader pattern for…
      - Processors
      - Controller Services
      - Reporting Tasks
    - Extensions packaged as NiFi Archives (NARs)
      - Deploy to the NiFi lib directory and restart
      - Provides ClassLoader isolation
      - Same model as standard components
  • 28. Architecture – Standalone
    [Diagram: a single OS/host running a JVM that contains the Flow Controller, the Web Server, and processors/extensions (Processor 1 … Extension N), backed by the FlowFile, Content, and Provenance Repositories on local storage]
    - FlowFile Repository
      - Write Ahead Log
      - State of every FlowFile
      - Pointers to content repository (pass-by-reference)
    - Content Repository
      - FlowFile content
      - Copy-on-write
    - Provenance Repository
      - Write Ahead Log + Lucene Indexes
      - Store & search lineage events
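The repository roles above can be sketched in a few classes. This is a toy, in-memory model under stated assumptions, not NiFi's on-disk format: `ContentRepository`, `FlowFileRecord`, and `FlowFileRepository` are hypothetical, the "write-ahead log" is a Python list, and recovery is a plain replay. It shows pass-by-reference (a cloned FlowFile shares its content claim) and copy-on-write (modifying content writes a new claim instead of changing the old one).

```python
import itertools
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


class ContentRepository:
    """Stores content claims; an existing claim is never modified (copy-on-write)."""

    def __init__(self) -> None:
        self._claims: Dict[int, bytes] = {}
        self._next_claim = itertools.count(1)

    def write(self, data: bytes) -> int:
        claim = next(self._next_claim)
        self._claims[claim] = bytes(data)
        return claim

    def read(self, claim: int) -> bytes:
        return self._claims[claim]


@dataclass(frozen=True)
class FlowFileRecord:
    flowfile_id: int
    attributes: Tuple[Tuple[str, str], ...]
    content_claim: int  # pointer into the content repository, not the bytes themselves


class FlowFileRepository:
    """Current state of every FlowFile; each change is appended to a write-ahead log first."""

    def __init__(self) -> None:
        self.wal: List[Tuple[str, FlowFileRecord]] = []
        self.state: Dict[int, FlowFileRecord] = {}

    def update(self, record: FlowFileRecord) -> None:
        self.wal.append(("UPDATE", record))       # log first...
        self.state[record.flowfile_id] = record   # ...then apply in memory

    def delete(self, flowfile_id: int) -> None:
        self.wal.append(("DELETE", self.state[flowfile_id]))
        del self.state[flowfile_id]

    @staticmethod
    def recover(wal: List[Tuple[str, FlowFileRecord]]) -> Dict[int, FlowFileRecord]:
        """Rebuild the in-memory state by replaying the log (e.g. after a restart)."""
        state: Dict[int, FlowFileRecord] = {}
        for op, record in wal:
            if op == "UPDATE":
                state[record.flowfile_id] = record
            else:
                state.pop(record.flowfile_id, None)
        return state


content = ContentRepository()
flowfiles = FlowFileRepository()
original = FlowFileRecord(1, (("filename", "a.txt"),), content.write(b"hello"))
flowfiles.update(original)
clone = replace(original, flowfile_id=2)  # pass-by-reference: same content claim
flowfiles.update(clone)
upper_claim = content.write(content.read(clone.content_claim).upper())  # copy-on-write
flowfiles.update(replace(clone, content_claim=upper_claim))
flowfiles.delete(1)
recovered = FlowFileRepository.recover(flowfiles.wal)
```

The original bytes survive the modification untouched, and replaying the log reproduces exactly the state held in memory.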
  • 29. Architecture – Cluster
    [Diagram: three nodes, each an OS/host running the same JVM stack as the standalone case (Flow Controller, Web Server, processors/extensions, and FlowFile/Content/Provenance Repositories on local storage), coordinated through ZooKeeper]
    - Same dataflow on each node, data partitioned across cluster
    - Access the UI from any node
    - ZooKeeper for auto-election of Cluster Coordinator & Primary Node
    - Cluster Coordinator receives heartbeats from other nodes, manages joining/disconnecting
    - Primary Node for scheduling processors on a single node
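The coordinator's heartbeat bookkeeping can be sketched as last-seen timestamps plus a timeout. Everything here is a hypothetical simplification: real NiFi clusters use ZooKeeper to elect the Cluster Coordinator and Primary Node, whereas this sketch picks the primary node by sorting node names purely to keep the example deterministic. The timeout value is likewise an assumption, not a NiFi default.

```python
from typing import Dict, Optional, Set


class ClusterCoordinator:
    """Tracks node liveness from heartbeats (simplified; no ZooKeeper involved)."""

    def __init__(self, heartbeat_timeout: float) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        # a node's first heartbeat doubles as its request to join the cluster
        self._last_seen[node] = now

    def connected_nodes(self, now: float) -> Set[str]:
        return {n for n, t in self._last_seen.items() if now - t <= self.heartbeat_timeout}

    def disconnected_nodes(self, now: float) -> Set[str]:
        return set(self._last_seen) - self.connected_nodes(now)

    def primary_node(self, now: float) -> Optional[str]:
        # stand-in for the ZooKeeper election: pick the lowest connected node name
        connected = sorted(self.connected_nodes(now))
        return connected[0] if connected else None


coordinator = ClusterCoordinator(heartbeat_timeout=40.0)
for node in ("node1", "node2", "node3"):
    coordinator.heartbeat(node, now=0.0)
coordinator.heartbeat("node1", now=30.0)
coordinator.heartbeat("node2", now=30.0)
```

At time 50, `node3` has been silent longer than the timeout and drops out, while the other two remain connected.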
  • 30. Site-To-Site
    - Direct communication between two NiFi instances
    - Push to Input Port on receiver, or Pull from Output Port on source
    - Communicate between clusters, standalone instances, or both
    - Handles load balancing and reliable delivery
    - Secure connections using certificates (optional)
    - Communicate over TCP or HTTP
  • 31. Site-To-Site Push Model
    - Source connects Remote Process Group to Input Port on destination
    - Site-To-Site takes care of load balancing across the nodes in the cluster
    [Diagram: a standalone NiFi with a Remote Process Group (RPG) pushing to the Input Port on each of NiFi Cluster Nodes 1, 2, and 3]
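The load balancing in the push model can be approximated by sending each batch to the least-loaded node. This is one plausible policy assumed for illustration, not a description of NiFi's actual peer-selection algorithm; `push_batches` and its queued-count input are hypothetical.

```python
from typing import Dict, List


def push_batches(batches: List[List[str]], queued: Dict[str, int]) -> Dict[str, List[str]]:
    """Send each batch to the Input Port of the least-loaded cluster node.

    `queued` is the number of FlowFiles each node reports as already queued.
    Ties go to the alphabetically first node name so results are deterministic.
    """
    loads = dict(queued)                      # don't mutate the caller's dict
    received: Dict[str, List[str]] = {node: [] for node in loads}
    for batch in batches:
        target = min(loads, key=lambda node: (loads[node], node))
        received[target].extend(batch)
        loads[target] += len(batch)           # the node is now busier
    return received


delivered = push_batches([["a", "b"], ["c", "d"], ["e", "f"], ["g"]],
                         {"node1": 0, "node2": 5, "node3": 0})
```

The already-busy `node2` receives nothing, and the remaining work alternates between the two idle nodes.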
  • 32. Site-To-Site Pull Model
    - Destination connects Remote Process Group to Output Port on the source
    - If the source were a cluster, each node would pull from each node in that cluster
    [Diagram: NiFi Cluster Nodes 1, 2, and 3, each with a Remote Process Group (RPG), pulling from the Output Port on a standalone NiFi]
  • 33. Introduction to Apache MiNiFi
  • 34. Apache MiNiFi
    - Sub-project of Apache NiFi
    - Created to more effectively collect data at the edge
    - Smaller footprint, run where the JVM can't
    - Design & Deploy vs. Command & Control
  • 35. MiNiFi Distributions
    - Java
      - <40MB binary distribution
      - Requires Java 1.8
      - More feature complete
      - Targeted at any system that can run a JVM (e.g. servers, Raspberry Pi)
    - C++
      - 600KB code size and static data ~50KB
      - Dynamic heap of ~1MB based on use-case
      - Targeted at resource-constrained environments (e.g. edge IoT devices)
    - Both use the same config format and NiFi terminology
    Different focuses depending on requirements
  • 36. MiNiFi Java
    [Diagram: MiNiFi = NiFi Framework + Components; NiFi = NiFi Framework + User Interface + Components]
  • 37. MiNiFi Java
    - Uses the same NAR structure as NiFi
    - Use any NAR from NiFi with MiNiFi Java
    - NiFi standard processors are bundled by default
      - TailFile
      - UpdateAttribute
      - Route on content and attributes
      - PutEmail
      - ….
  • 38. MiNiFi C++
    - Initial set of processors
      - TailFile
      - GetFile
      - GenerateFlowFile
      - LogAttribute
      - ListenSyslog
    - Site to Site Client implementation in C++ for talking to NiFi instances
  • 39. Design & Deploy
    Same approach for Java & C++…
    1. Design a flow in NiFi UI
    2. Export template to XML file
    3. Run MiNiFi Toolkit to convert NiFi template to MiNiFi YAML
    4. Deploy config.yml to MiNiFi instances
    Initially targeting flows like…
    1. GetFile/TailFile
    2. Routing Decision
    3. Site-To-Site back to core NiFi
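The first step of that target flow, tailing a rolling file, comes down to remembering a byte offset between polls. The sketch below assumes that a file shorter than the stored offset has been rolled over; NiFi's real TailFile processor handles rollover more carefully (for example through its rolling-filename pattern). `tail_new_lines`, `route`, and `demo` are hypothetical helpers, and the ERROR-based rule stands in for whatever routing decision a real flow would make.

```python
import os
import tempfile
from typing import List, Tuple


def tail_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended since `offset`, plus the new offset.

    If the file is now shorter than the stored offset, assume it was rolled
    over and start again from the beginning.
    """
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1   # a trailing partial line waits for the next poll
    return data[:end].decode("utf-8").splitlines(), offset + end


def route(line: str) -> str:
    """Hypothetical routing decision: error lines are separated from the rest."""
    return "errors" if "ERROR" in line else "other"


def demo() -> Tuple[List[str], List[str], List[str], int]:
    """Simulate appends, a partial line, and a rollover on a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")
        with open(path, "wb") as f:
            f.write(b"started\nERROR disk full\npart")
        first, offset = tail_new_lines(path, 0)
        with open(path, "ab") as f:
            f.write(b"ial line\n")
        second, offset = tail_new_lines(path, offset)
        with open(path, "wb") as f:           # rollover: replaced by a fresh, shorter file
            f.write(b"fresh\n")
        third, offset = tail_new_lines(path, offset)
    return first, second, third, offset
```

A size check alone misses a rollover if the new file has already grown past the old offset before the next poll, which is one reason a production tailer tracks more state than this.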
  • 40. Simple config.yml
    Tail a rolling file -> Site to Site
  • 41. MiNiFi Command and Control
    - Design Flow at a centralized place, deploy on the edge
    - Version control of flows
      - Align with NiFi SDLC work
    - Agent status monitoring
    - Bi-directional command and control
    Currently a feature proposal, initial version being architected
    https://cwiki.apache.org/confluence/display/MINIFI/MiNiFi+Command+and+Control
  • 42. Demo!
  • 43. Demo Scenario
    [Diagram: two Raspberry Pis, each with a temperature/humidity sensor and running MiNiFi Java, send data via site-to-site to NiFi, which delivers it to Solr, visualized with a Banana dashboard]
  • 44. Questions?
  • 45. Learn more and join us!
    - Apache NiFi site: http://nifi.apache.org
    - Subproject MiNiFi site: http://nifi.apache.org/minifi/
    - Subscribe to and collaborate at: dev@nifi.apache.org, users@nifi.apache.org
    - Submit Ideas or Issues: https://issues.apache.org/jira/browse/NIFI, https://issues.apache.org/jira/browse/MINIFI
    - Follow us on Twitter: @apachenifi
  • 46. Thank you!