Apache NiFi: Enterprise data flow management and FBP
© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Apache, NiFi, Apache NiFi, and the NiFi logo are trademarks of the Apache Software Foundation
Joe Witt | July 2017
About me
Member @ Apache Software Foundation
Member @ Apache NiFi PMC
VP Engineering @ Hortonworks
Agenda
• The journey to an FBP-like design
• Architectural elements for Dataflow Management
• Apache NiFi and FBP
• Live Demo and discussion
The journey to an FBP-like design
The data is over here, but I want it over there…
Basics of Connecting Systems
For every connection, these must agree:
1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Authorization access
8. Relevance
[Diagram: producer P1 connected to consumer C1]
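The eight points above can be read as a contract between P1 and C1. Below is a minimal sketch that lists where a producer and a consumer disagree. It assumes a hypothetical `Endpoint` record with one field per point. The record is not part of any NiFi API, and the comparison rules (for example, "the consumer must accept the producer's largest event") are illustrative choices.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of one side of a connection (illustration only)."""
    protocol: str
    format: str
    schema: str
    priority: str
    max_event_bytes: int   # size of event: largest sent (producer) or accepted (consumer)
    events_per_sec: int    # frequency: rate produced (producer) or sustainable (consumer)
    roles: frozenset       # authorization: roles allowed to read (producer) or held (consumer)
    topics: frozenset      # relevance: topics produced (producer) or wanted (consumer)

def disagreements(producer: Endpoint, consumer: Endpoint) -> list:
    """Return the names of the points on which producer and consumer do not agree."""
    problems = [name for name in ("protocol", "format", "schema", "priority")
                if getattr(producer, name) != getattr(consumer, name)]
    if producer.max_event_bytes > consumer.max_event_bytes:
        problems.append("size of event")
    if producer.events_per_sec > consumer.events_per_sec:
        problems.append("frequency of event")
    if not producer.roles & consumer.roles:    # no shared role
        problems.append("authorization access")
    if not producer.topics & consumer.topics:  # nothing the consumer wants
        problems.append("relevance")
    return problems

p1 = Endpoint("sftp", "csv", "v1", "normal", 10_000, 100,
              frozenset({"analyst"}), frozenset({"sensors"}))
c1 = Endpoint("http", "json", "v1", "normal", 1_000, 50,
              frozenset({"analyst"}), frozenset({"sensors", "logs"}))
print(disagreements(p1, c1))
# ['protocol', 'format', 'size of event', 'frequency of event']
```

Each disagreement is a form of coupling that someone has to resolve for this one pair. With many producers and consumers, that work multiplies.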
It started so simple
• Just needed to scan a directory for new data
• Send it over the link
But…
• Bandwidth was low, latency high, and communications unreliable
• Some data was more useful than other data
• The rules for deciding that could change often
• Lightweight in-line analysis could be used to determine relative value
• The value of the data decayed rapidly
• The data’s raw form was highly inefficient for transport, and in many cases large portions of it could simply be removed
• How to document, maintain, and fine-tune the configuration?
• Infrastructure was highly limited
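Several of the problems listed above combine naturally: a value assigned by in-line analysis, rapid decay, and a scarce link. This minimal sketch decides what to send in one round. Exponential decay with a fixed half-life and a greedy per-round byte budget are assumptions made for illustration; the slides do not specify a model. `decayed_value` and `pick_for_link` are hypothetical names.

```python
import heapq
import math

def decayed_value(base_value: float, age_s: float, half_life_s: float) -> float:
    """Exponential decay: the value halves every half_life_s seconds."""
    return base_value * math.pow(0.5, age_s / half_life_s)

def pick_for_link(items, now_s: float, budget_bytes: int, half_life_s: float = 60.0):
    """Choose items to send this round, highest current value first, within budget_bytes.

    items: iterable of (name, base_value, created_at_s, size_bytes) tuples, where
    base_value is whatever a lightweight in-line analysis assigned on arrival.
    Items that do not fit are skipped; a smaller, lower-value item may still fit.
    """
    heap = []
    for name, base_value, created_at_s, size_bytes in items:
        value = decayed_value(base_value, now_s - created_at_s, half_life_s)
        heapq.heappush(heap, (-value, name, size_bytes))  # negate: heapq is a min-heap
    chosen, remaining = [], budget_bytes
    while heap:
        _, name, size_bytes = heapq.heappop(heap)
        if size_bytes <= remaining:
            chosen.append(name)
            remaining -= size_bytes
    return chosen

items = [("old-alert", 100.0, 0.0, 400),
         ("fresh-report", 60.0, 110.0, 500),
         ("bulk-log", 5.0, 115.0, 900)]
print(pick_for_link(items, now_s=120.0, budget_bytes=1000))
# ['fresh-report', 'old-alert']
```

The older alert started out more valuable, but after two half-lives it ranks below the fresh report. Because the rules "could change often", the half-life and the scoring belong in configuration rather than in code.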
Challenges at the Edge
• Small footprint
• Low power
• Expensive bandwidth
• High latency
• Access to data exceeds bandwidth (if you're doing it right)
• Needs recoverability
• Needs to be secured for both the data plane and control plane
[Diagram: Gather, Deliver, Prioritize; track from the edge through the datacenter]
Simplistic View of Enterprise Data Flow
[Diagram: Acquire Data, The Data Flow Thing, Process and Analyze Data, Store Data]
Realistic View of Enterprise Data Flow
[Diagram: seven components marked “?”]
What is Dataflow Management?
Dataflow Management
The systematic process by which data is
acquired from all producers and delivered to all
consumers
Dataflow Management Considerations
• Promote Loosely Coupled Systems
• Types of coupling: Format, Schema, Protocol, Priority, Size, Interest, …
• Promote Highly Cohesive Systems
• Producers should focus on production (not the intricacies of consumption)
• Consumers should focus on storage or processing (not the details of production)
• Provide Provenance
• The who/what/when/where/why of data
• Inter- and Intra-Process Latency
• Enable enterprise version control for data
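This minimal sketch makes "the who/what/when/where/why of data" concrete with a provenance event record and a walk back through lineage. The field names and the `lineage` helper are hypothetical and do not reflect NiFi's actual provenance schema. The component names are just strings here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass(frozen=True)
class ProvenanceEvent:
    """Illustrative provenance record (hypothetical fields, not NiFi's schema)."""
    flowfile_id: str                 # what: the data item the event concerns
    event_type: str                  # e.g. "CREATE", "ROUTE", "CLONE"
    component: str                   # who: the component that acted on it
    host: str                        # where: the node it happened on
    details: str                     # why: the reason or rule behind the action
    parent_id: Optional[str] = None  # lineage link for clones
    timestamp: datetime = field(     # when
        default_factory=lambda: datetime.now(timezone.utc))

def lineage(events: List[ProvenanceEvent], flowfile_id: str) -> List[str]:
    """Follow parent links from flowfile_id back to its origin (assumes no cycles)."""
    parents = {e.flowfile_id: e.parent_id for e in events if e.parent_id is not None}
    chain = [flowfile_id]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain

events = [
    ProvenanceEvent("F1", "CREATE", "GetFile", "node-1", "picked up from input directory"),
    ProvenanceEvent("F1", "ROUTE", "RouteOnAttribute", "node-1", "matched rule 'urgent'"),
    ProvenanceEvent("F2", "CLONE", "RouteOnAttribute", "node-1", "copy for a second consumer",
                    parent_id="F1"),
]
print(lineage(events, "F2"))  # ['F2', 'F1']
```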
Dataflow Management Considerations
• Empower Understanding and Interaction
• Ability to see the flow, safely and quickly iterate and experiment
• Breaking production is bad – so too is not being able to evolve fast enough
• Secure
• Bridge between security domains
• Data Plane (transport)
• Control Plane (C&C, Monitoring)
• Self Service
• Centralized teams – hard to scale – slow turnaround times
• Centralized systems – multi-tenant management works
The role of messaging systems
• Reduce variables: fix the protocol and data size, provide buffering
• Historically not very fast or replayable: Apache Kafka solved that
• Strong solution within a controlled domain
• But numerous challenges remain
• Topics do not separate key concerns between producer and consumer pairs such as
– Authorization
– Format
– Schema
– Interest
– Prioritization
• Flow control (back-pressure, pressure release, filtering, etc.)
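The gap described above can be illustrated with one shared stream of records where authorization, interest, format, and prioritization are applied separately for each consumer. This minimal sketch assumes a hypothetical `Subscription` record and `fan_out` function. Neither is Kafka or NiFi API code.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

@dataclass
class Subscription:
    """Hypothetical per-consumer settings that a single shared topic does not express."""
    consumer: str
    allowed_labels: Set[str]             # authorization: security labels it may see
    interested: Callable[[dict], bool]   # interest: which records it wants
    to_wire: Callable[[dict], str]       # format/schema: how it wants them serialized
    priority_key: Callable[[dict], int]  # prioritization: lower keys delivered first

def fan_out(records: List[dict], subs: List[Subscription]) -> Dict[str, List[str]]:
    """Deliver one shared stream differently to each consumer."""
    out = {}
    for sub in subs:
        visible = [r for r in records
                   if r["label"] in sub.allowed_labels and sub.interested(r)]
        visible.sort(key=sub.priority_key)
        out[sub.consumer] = [sub.to_wire(r) for r in visible]
    return out

records = [{"id": 1, "label": "public", "severity": 2},
           {"id": 2, "label": "secret", "severity": 1},
           {"id": 3, "label": "public", "severity": 1}]
subs = [
    Subscription("dashboard", {"public"}, lambda r: True,
                 lambda r: json.dumps(r, sort_keys=True), lambda r: r["severity"]),
    Subscription("archive", {"public", "secret"}, lambda r: r["severity"] == 1,
                 lambda r: str(r["id"]), lambda r: r["id"]),
]
result = fan_out(records, subs)
print(result["archive"])  # ['2', '3']
```

The topic carries the same three records for both consumers. Everything that differs between the two consumers lives in a layer the topic itself does not provide.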
Apache NiFi – Built for Dataflow Management
The NSA Years
• Created in 2006
• Improved over eight years
• Simple initial vision – Visio for real-time dataflow management
• Key Lessons Learned
• What scale means – down, up, and out
• The fearsome force known as Compliance Requirements
• The power of provenance!
• Operational best-practices and anti-patterns
• NSA donated the codebase to the ASF in late 2014
“Maintainability is the real test.”
- J. Paul Morrison
NiFi Key Features
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow-specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery/recording: a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
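Three of the features above (buffering with back-pressure, pressure release, and prioritized queuing) can be pictured as properties of the queue between two processors. This is a minimal sketch that assumes a count-based threshold and age-based expiry. NiFi's own connection settings differ in detail, so treat the `Connection` class and its parameters as hypothetical.

```python
import heapq
import itertools

class Connection:
    """Illustrative prioritized, bounded queue between two processors (hypothetical API).

    back_pressure_count: once this many items are queued, new offers are refused.
    expiration_s: items older than this are discarded instead of delivered.
    Lower priority numbers are delivered first; ties keep arrival order.
    """

    def __init__(self, back_pressure_count: int, expiration_s: float):
        self.back_pressure_count = back_pressure_count
        self.expiration_s = expiration_s
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO within one priority

    def is_full(self) -> bool:
        # Expired items still count until poll() discards them (lazy expiry).
        return len(self._heap) >= self.back_pressure_count

    def offer(self, item, priority: int, now_s: float) -> bool:
        """Enqueue unless back-pressure is engaged; True if the item was accepted."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._seq), now_s, item))
        return True

    def poll(self, now_s: float):
        """Return the highest-priority unexpired item, or None if nothing is left."""
        while self._heap:
            _, _, enqueued_s, item = heapq.heappop(self._heap)
            if now_s - enqueued_s <= self.expiration_s:
                return item
            # Older than expiration_s: dropped (pressure release).
        return None

conn = Connection(back_pressure_count=2, expiration_s=30.0)
conn.offer("bulk", priority=5, now_s=0.0)    # accepted
conn.offer("alert", priority=1, now_s=10.0)  # accepted
print(conn.offer("extra", priority=1, now_s=11.0))  # False: back-pressure engaged
print(conn.poll(now_s=20.0))  # alert (higher priority)
print(conn.poll(now_s=45.0))  # None: 'bulk' waited 45 s > 30 s and was dropped
```

The design choice here is that back-pressure pushes the problem upstream, while expiry trades loss tolerance for latency. Which one is appropriate is exactly the "flow-specific QoS" decision.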
NiFi and FBP concept mapping
FBP term → NiFi term: description
• Information Packet → FlowFile: Each object moving through the system. NiFi does not have the concept of bracket IPs versus data IPs; there are just IPs.
• Black Box → FlowFile Processor: Performs the work, doing some combination of data routing, transformation, or mediation between systems. NiFi does not have the concept of named input ports on black boxes.
• Bounded Buffer → Connection: The linkage between processors, acting as a queue and allowing various processes to interact at differing rates.
• Scheduler → Flow Controller: Maintains the knowledge of how processes are connected, and manages the threads (and their allocation) that all processes use.
• Subnet → Process Group: A set of processes and their connections, which can receive and send data via named input/output ports. A process group allows the creation of an entirely new component simply by composing its components.
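Here is a toy, single-threaded rendering of the mapping above, with each name standing in for one of the terms: `FlowFile` (information packet), `BoundedBuffer` (connection), `Processor` (black box), and `run` (scheduler). It assumes one input and one output per processor and a round-robin loop. It does not show how NiFi's Flow Controller is implemented, and process groups are left out for brevity.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class FlowFile:
    """Information packet: attributes (metadata) plus content bytes."""
    attributes: dict
    content: bytes

class BoundedBuffer:
    """Connection: a FIFO queue with a fixed capacity."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = deque()

    def has_room(self) -> bool:
        return len(self.items) < self.capacity

class Processor:
    """Black box: takes one FlowFile from its inbox, applies fn, emits the result."""
    def __init__(self, name, fn, inbox: BoundedBuffer, outbox: BoundedBuffer):
        self.name, self.fn, self.inbox, self.outbox = name, fn, inbox, outbox

    def ready(self) -> bool:
        # Runnable only when there is input and the downstream buffer has room.
        return bool(self.inbox.items) and self.outbox.has_room()

    def step(self):
        self.outbox.items.append(self.fn(self.inbox.items.popleft()))

def run(processors, max_rounds: int = 100):
    """Scheduler: keep running any ready processor until none can make progress."""
    for _ in range(max_rounds):
        progressed = False
        for p in processors:
            if p.ready():
                p.step()
                progressed = True
        if not progressed:
            return

source, mid, sink = BoundedBuffer(10), BoundedBuffer(1), BoundedBuffer(10)
for i in range(3):
    source.items.append(FlowFile({"id": str(i)}, b"hello"))
upper = Processor("UpperCase",
                  lambda ff: FlowFile(ff.attributes, ff.content.upper()), source, mid)
tag = Processor("Tag",
                lambda ff: FlowFile({**ff.attributes, "tagged": "yes"}, ff.content), mid, sink)
run([upper, tag])
print([(ff.attributes["id"], ff.content) for ff in sink.items])
# [('0', b'HELLO'), ('1', b'HELLO'), ('2', b'HELLO')]
```

The capacity-1 buffer in the middle forces the two processors to alternate. This illustrates, in miniature, how a bounded buffer lets processes run at differing rates without unbounded growth.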
NiFi Architecture
[Diagram: a single node, and a cluster of one or more nodes]
NiFi Architecture – Repositories – Pass by reference
Excerpt of demo flow… What’s happening inside the repositories…
BEFORE
• FlowFile: F1 → C1
• Content: C1
• Provenance: P1 → F1 – Create
AFTER
• FlowFile: F1 → C1, F2 → C1
• Content: C1
• Provenance: P1 → F1 – Create; P2 → F1 – Route; P3 → F2 – Clone (F1)
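The BEFORE/AFTER states above can be reproduced with a toy model of the three repositories. This minimal sketch uses in-memory dicts in place of NiFi's on-disk repositories, and the class and method names are hypothetical. It illustrates that routing and cloning add FlowFile and provenance records but leave a single copy of the content (C1).

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowFileRecord:
    flowfile_id: str
    content_claim: str  # reference into the content repository, not the bytes themselves

class Repositories:
    """Toy FlowFile, content, and provenance repositories (pass by reference)."""

    def __init__(self):
        self.content = {}     # claim id -> bytes, stored once
        self.flowfiles = {}   # flowfile id -> FlowFileRecord
        self.provenance = []  # (event id, flowfile id, description)
        self._event_ids = itertools.count(1)
        self._claim_ids = itertools.count(1)

    def _record(self, flowfile_id, description):
        self.provenance.append((f"P{next(self._event_ids)}", flowfile_id, description))

    def create(self, flowfile_id, data: bytes):
        claim = f"C{next(self._claim_ids)}"
        self.content[claim] = data
        self.flowfiles[flowfile_id] = FlowFileRecord(flowfile_id, claim)
        self._record(flowfile_id, "Create")

    def route(self, flowfile_id):
        self._record(flowfile_id, "Route")  # routing touches metadata only

    def clone(self, parent_id, child_id):
        parent = self.flowfiles[parent_id]
        # The clone points at the same content claim; no bytes are copied.
        self.flowfiles[child_id] = FlowFileRecord(child_id, parent.content_claim)
        self._record(child_id, f"Clone ({parent_id})")

repos = Repositories()
repos.create("F1", b"demo payload")
repos.route("F1")
repos.clone("F1", "F2")
print(repos.flowfiles["F2"].content_claim)  # C1: same claim as F1
print(len(repos.content))                   # 1: content stored once
print(repos.provenance[-1])                 # ('P3', 'F2', 'Clone (F1)')
```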
NiFi Architecture – Repositories – Copy on Write
Excerpt of demo flow… What’s happening inside the repositories…
BEFORE
• FlowFile: F1 → C1
• Content: C1
• Provenance: P1 → F1 – CREATE
AFTER
• FlowFile: F1 → C1, F1.1 → C2
• Content: C1 (plaintext)*, C2 (encrypted)
• Provenance: P1 → F1 – CREATE; P2 → F1.1 – MODIFY
* C1 (plaintext) is now eligible to be removed. But if we keep it around as long as possible, what cool things can we do?
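The copy-on-write step above can be sketched the same way. A modification writes new bytes to a new claim and never touches the old one. In this sketch, F1.1 is assumed to replace F1 as the live record. That assumption is what leaves C1 unreferenced and therefore eligible for removal. `CopyOnWriteStore` is a hypothetical name, and byte reversal stands in for encryption only to change the bytes. The sketch reports eligibility and does not explore what retaining C1 could enable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowFileRecord:
    flowfile_id: str
    content_claim: str

class CopyOnWriteStore:
    """Toy content repository: content is never modified in place."""

    def __init__(self):
        self.claims = {}     # claim id -> bytes
        self.flowfiles = {}  # live flowfile id -> FlowFileRecord
        self._next = 1

    def _new_claim(self, data: bytes) -> str:
        claim = f"C{self._next}"
        self._next += 1
        self.claims[claim] = data
        return claim

    def create(self, flowfile_id, data: bytes):
        self.flowfiles[flowfile_id] = FlowFileRecord(flowfile_id, self._new_claim(data))

    def modify(self, flowfile_id, new_id, transform) -> str:
        """Write transformed bytes to a NEW claim; return the untouched old claim id."""
        old = self.flowfiles.pop(flowfile_id)
        new_claim = self._new_claim(transform(self.claims[old.content_claim]))
        self.flowfiles[new_id] = FlowFileRecord(new_id, new_claim)
        return old.content_claim

    def removable_claims(self):
        """Claims no live FlowFile references: eligible for removal (or retention)."""
        live = {ff.content_claim for ff in self.flowfiles.values()}
        return sorted(c for c in self.claims if c not in live)

store = CopyOnWriteStore()
store.create("F1", b"plaintext")
# Stand-in for encryption: byte reversal is NOT encryption, it only changes the bytes.
old_claim = store.modify("F1", "F1.1", lambda data: data[::-1])
print(store.flowfiles["F1.1"].content_claim)  # C2
print(store.claims[old_claim])                # b'plaintext' (never modified in place)
print(store.removable_claims())               # ['C1']
```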
NiFi Security – at a high level
Authentication
Authenticate users and systems
• TLS one-way or mutual auth, Username/Password via LDAP, Kerberos/SPNEGO – out of the box
Authorization
Provision access to data
• Pluggable authorization
• A simple file-based authority provider or an Apache Ranger-based provider, out of the box
• Fine-grained rights assignment per action/component for users and groups
Audit
Maintain a record of data access
• Detailed logging of all user actions
• Detailed logging of all REST API interactions (person or non-person)
• Detailed logging of key system behaviors
• Data provenance enables fine-grained, end-to-end tracking
Data Protection
Protect data at rest and in motion
• Support for a variety of SSL/encryption protocols
• Tag data and use the tags for fine-grained access controls
• Encrypt/decrypt content
• TDE (transparent data encryption) for the provenance repository (work on the content repository and FlowFile WAL is underway!)
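Fine-grained rights per action and component, for users and groups, can be pictured as a lookup keyed by (action, component). This minimal sketch works under that assumption and denies anything not explicitly granted. `PolicyStore` and the `"processor/GetFile"` resource string are hypothetical. They represent neither NiFi's authorizer interface nor the Apache Ranger plugin.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

@dataclass
class PolicyStore:
    """Toy policy store: (action, component) -> users and groups allowed (default deny)."""
    groups: Dict[str, Set[str]] = field(default_factory=dict)  # group name -> member users
    policies: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def grant(self, action: str, component: str, principal: str) -> None:
        """Allow a user or a group to perform action on component."""
        self.policies.setdefault((action, component), set()).add(principal)

    def is_authorized(self, user: str, action: str, component: str) -> bool:
        allowed = self.policies.get((action, component), set())
        if user in allowed:
            return True
        # Otherwise the user must belong to at least one allowed group.
        return any(user in self.groups.get(p, set()) for p in allowed)

store = PolicyStore(groups={"ops": {"alice", "bob"}})
store.grant("read", "processor/GetFile", "ops")
store.grant("write", "processor/GetFile", "alice")
print(store.is_authorized("bob", "read", "processor/GetFile"))     # True (via group)
print(store.is_authorized("bob", "write", "processor/GetFile"))    # False
print(store.is_authorized("alice", "write", "processor/GetFile"))  # True (direct)
print(store.is_authorized("carol", "read", "processor/GetFile"))   # False (default deny)
```

Default deny keeps the failure mode safe: a missing policy blocks access rather than granting it. Every decision made here is also a natural event to feed into the audit log.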
Questions
• Apache NiFi – Join the community!
• Feature Requests
• Bug Reports
• Code Contributions
• Peer Reviews
• Documentation
https://nifi.apache.org

Apache NiFi - Flow Based Programming Meetup

  • 1. Apache NiFi: Enterprise data flow management and FBP © Hortonworks Inc. 2011 – 2017. All Rights Reserved Apache, NiFi, Apache NiFi, and the NiFi logo are trademarks of the Apache Software Foundation Joe Witt | July 2017
  • 2. Page2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved About me Member @ Apache Software Foundation Member @ Apache NiFi PMC VP Engineering @ Hortonworks
  • 3. Page3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda • The journey to an FBP-like design • Architectural elements for Dataflow Management • Apache NiFi and FBP • Live Demo and discussion
  • 4. Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved The journey to an FBP like design
  • 5. The data is over here but I want it over there… Basics of Connecting Systems For every connection, these must agree: 1. Protocol 2. Format 3. Schema 4. Priority 5. Size of event 6. Frequency of event 7. Authorization access 8. Relevance P1 Producer C1 Consumer
  • 7. It started so simple • Just needed to scan a directory for new data • Send it over the link. But… • Bandwidth was low, latency high, comms unreliable • Some data was more useful than other data • The rules for that could change often • Lightweight in-line analysis could be used to determine relative value • The value of the data decayed rapidly • The data’s raw form was highly inefficient for transport, and large portions of the data could often simply be removed • How to document, maintain, and fine-tune the configuration? • Infrastructure was highly limited
  • 8. Challenges at the Edge • Small footprint • Low power • Expensive bandwidth • High latency • Access to data exceeds bandwidth (if you're doing it right) • Needs recoverability • Needs to be secured for both the data plane and the control plane (diagram labels: GATHER, DELIVER, PRIORITIZE; "Track from the edge through the datacenter")
  • 9. Simplistic View of Enterprise Data Flow (diagram labels: Acquire Data; The Data Flow Thing; Process and Analyze Data; Store Data)
  • 10. Realistic View of Enterprise Data Flow (diagram: seven components, each labeled only "?")
  • 11. What is Dataflow Management?
  • 12. Dataflow Management: The systematic process by which data is acquired from all producers and delivered to all consumers
  • 13. Dataflow Management Considerations • Promote Loosely Coupled Systems • Types of coupling: Format, Schema, Protocol, Priority, Size, Interest, … • Promote Highly Cohesive Systems • Producers should focus on production (not the intricacies of consumption) • Consumers should focus on storage or processing (not the details of production) • Provide Provenance • The who/what/when/where/why of data • Inter- and Intra-Process Latency • Enable enterprise version control for data
  • 14. Dataflow Management Considerations • Empower Understanding and Interaction • Ability to see the flow, safely and quickly iterate and experiment • Breaking production is bad – so too is not being able to evolve fast enough • Secure • Bridge between security domains • Data Plane (transport) • Control Plane (C&C, Monitoring) • Self Service • Centralized teams – hard to scale – slow turnaround times • Centralized systems – multi-tenant management works
  • 15. The role of messaging systems • Reduce variables: Fix protocol, Data Size, Provide Buffering • Historically not very fast or replayable: Apache Kafka solved that • Strong solution within a controlled domain • But numerous challenges remain • Topics do not separate key concerns between producer and consumer pairs such as – Authorization – Format – Schema – Interest – Prioritization • Flow control (back-pressure, pressure-release, filtering, etc.)
  • 16. Apache NiFi – Built for Dataflow Management
  • 17. The NSA Years • Created in 2006 • Improved over eight years • Simple initial vision – Visio for real-time dataflow management • Key Lessons Learned • What scale means – down, up, and out • The fearsome force known as Compliance Requirements • The power of provenance! • Operational best practices and anti-patterns • NSA donated the codebase to the ASF in late 2014
  • 18. “Maintainability is the real test.” - J. Paul Morrison
  • 19. NiFi Key Features • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow-specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Recovery/recording a rolling log of fine-grained history • Visual command and control • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering
  • 20. NiFi and FBP concept mapping
      FBP Term | NiFi Term | Description
      Information Packet | FlowFile | Each object moving through the system. NiFi does not distinguish bracket IPs from data IPs; there is just one kind of IP.
      Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems. NiFi does not have the concept of named input ports on black boxes.
      Bounded Buffer | Connection | The linkage between processors, acting as a queue and allowing various processes to interact at differing rates.
      Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads (and their allocation) that all processes use.
      Subnet | Process Group | A set of processes and their connections, which can receive and send data via named input/output ports. A process group allows an entirely new component to be created simply by composing existing components.
  • 21. NiFi Architecture (diagram panels: Single Node; Cluster of one or more nodes)
  • 22. NiFi Architecture – Repositories – Pass by reference (excerpt of demo flow: what’s happening inside the repositories)
      BEFORE: FlowFile repository: F1 → C1 | Content repository: C1 | Provenance repository: P1 → F1 – Create
      AFTER: FlowFile repository: F1 → C1, F2 → C1 | Content repository: C1 | Provenance repository: P1 → F1 – Create, P2 → F1 – Route, P3 → F2 – Clone (F1)
  • 23. NiFi Architecture – Repositories – Copy on Write (excerpt of demo flow: what’s happening inside the repositories)
      BEFORE: FlowFile repository: F1 → C1 | Content repository: C1 | Provenance repository: P1 → F1 - CREATE
      AFTER: FlowFile repository: F1 → C1, F1.1 → C2 | Content repository: C2 (encrypted), C1 (plaintext)* | Provenance repository: P1 → F1 - CREATE, P2 → F1.1 - MODIFY
      * C1 (plaintext) is now eligible to be removed. But if we keep it around as long as possible, what cool things can we do?
  • 24. NiFi Security – at a high level
      Authentication: Authenticate users and systems • TLS one-way or mutual auth, Username/Password via LDAP, Kerberos/SPNEGO – out of the box
      Authorization: Provision access to data • Pluggable authorization • Simple file-based authority provider OR Apache Ranger-based provider out of the box • Fine-grained rights assignment per action/component for users and groups
      Audit: Maintain a record of data access • Detailed logging of all user actions • Detailed logging of all REST API interactions (person or non-person) • Detailed logging of key system behaviors • Data Provenance enables fine-grained end-to-end tracking
      Data Protection: Protect data at rest and in motion • Support a variety of SSL/encryption protocols • Tag data and use the tags for fine-grained access controls • Encrypt/decrypt content • TDE for the Provenance repository (work on the content repository and FlowFile WAL is underway!)
  • 25. Questions • Apache NiFi – Join the community! • Feature Requests • Bug Reports • Code Contributions • Peer Reviews • Documentation https://nifi.apache.org
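Slide 5 lists eight properties on which a producer and a consumer must agree. This minimal Python sketch shows one way "agree" can be made concrete. It uses a hypothetical `Endpoint` record and `disagreements` helper, neither of which is part of NiFi. It assumes event size and frequency are capacity limits (the producer must not exceed the consumer) and that the other six properties must match exactly. That split and the sample values are illustrative.

```python
from dataclasses import dataclass, fields

# Assumption of this sketch: for these two properties the consumer only needs
# enough capacity (producer <= consumer); all others must match exactly.
CAPACITY_FIELDS = {"event_size_bytes", "events_per_second"}


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of one side of a connection (slide 5's eight properties)."""
    protocol: str
    format: str
    schema: str
    priority: str
    event_size_bytes: int
    events_per_second: int
    authorization: str
    relevance: str


def disagreements(producer: Endpoint, consumer: Endpoint) -> list:
    """Return the names of the properties on which producer and consumer do not agree."""
    problems = []
    for f in fields(Endpoint):
        p, c = getattr(producer, f.name), getattr(consumer, f.name)
        if f.name in CAPACITY_FIELDS:
            if p > c:  # producer emits more than the consumer can absorb
                problems.append(f.name)
        elif p != c:
            problems.append(f.name)
    return problems


p1 = Endpoint("sftp", "csv", "sensor-v1", "high", 4096, 100, "role:ingest", "telemetry")
c1 = Endpoint("https", "csv", "sensor-v2", "high", 1024, 500, "role:ingest", "telemetry")
print(disagreements(p1, c1))  # ['protocol', 'schema', 'event_size_bytes']
```

Each name in the result is a point of coupling. Either one side has to change, or something in between has to mediate it, which is the job slide 12 gives to dataflow management.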
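Slide 19 names back-pressure, pressure release, and prioritized queuing. This minimal Python sketch shows how a single queue can carry all three. It uses a hypothetical `Connection` class, not NiFi's Java API. The queue has object-count and byte thresholds that a scheduler would check, a `"priority"` attribute used for ordering, and age-based expiration as the release valve. The threshold values, the attribute name, and the default priority of 9 are all illustrative assumptions.

```python
import heapq
import itertools
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlowFile:
    attributes: dict
    size: int
    entry_time: float = field(default_factory=time.monotonic)


class Connection:
    """Hypothetical queue between two processors (not NiFi's API).

    - back-pressure: is_full() tells a scheduler to stop triggering the upstream
      processor once the object-count or byte threshold is reached
    - prioritized queuing: the lowest numeric "priority" attribute is polled first
    - pressure release: FlowFiles older than expiration_secs are dropped on poll
    """

    def __init__(self, max_objects=10_000, max_bytes=1 << 30, expiration_secs=None):
        self.max_objects = max_objects
        self.max_bytes = max_bytes
        self.expiration_secs = expiration_secs
        self.queued_bytes = 0
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO within one priority level

    def is_full(self) -> bool:
        return len(self._heap) >= self.max_objects or self.queued_bytes >= self.max_bytes

    def offer(self, ff: FlowFile) -> None:
        # Thresholds are advisory: offer() never refuses; the scheduler consults is_full().
        priority = int(ff.attributes.get("priority", 9))
        heapq.heappush(self._heap, (priority, next(self._seq), ff))
        self.queued_bytes += ff.size

    def poll(self, now: Optional[float] = None) -> Optional[FlowFile]:
        now = time.monotonic() if now is None else now
        while self._heap:
            _, _, ff = heapq.heappop(self._heap)
            self.queued_bytes -= ff.size
            if self.expiration_secs is not None and now - ff.entry_time > self.expiration_secs:
                continue  # expired: released from the queue instead of delivered
            return ff
        return None


conn = Connection(max_objects=2)
conn.offer(FlowFile({"priority": "5"}, size=10))
conn.offer(FlowFile({"priority": "1"}, size=20))
print(conn.is_full())          # True
print(conn.poll().attributes)  # {'priority': '1'}
```

In this sketch `offer()` always succeeds and enforcement is left to the scheduler. That way a task that is already running never fails partway through. The trade-off is that the queue can briefly overshoot its threshold.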
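Slide 20 maps FBP terms onto NiFi terms. This Python sketch makes the mapping concrete under simplifying assumptions:

- every processor has at most one input and one output connection;
- a source processor pulls from an iterator;
- the scheduler is a single-threaded round-robin loop.

All class names are hypothetical stand-ins, and the Process Group (subnet) is left out for brevity.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class FlowFile:                       # FBP: information packet
    attributes: dict
    content: bytes


@dataclass
class Connection:                     # FBP: bounded buffer
    capacity: int = 2
    queue: deque = field(default_factory=deque)

    def has_room(self) -> bool:
        return len(self.queue) < self.capacity


@dataclass
class Processor:                      # FBP: black box
    name: str
    fn: Callable[[FlowFile], Optional[FlowFile]]
    input: Optional[Connection] = None
    output: Optional[Connection] = None
    source: Optional[Iterator[FlowFile]] = None  # used only when there is no input

    def can_run(self) -> bool:
        if self.output is not None and not self.output.has_room():
            return False              # downstream buffer is full: back-pressure
        if self.input is not None:
            return bool(self.input.queue)
        return self.source is not None

    def run_once(self) -> bool:
        if self.input is not None:
            ff = self.input.queue.popleft()
        else:
            ff = next(self.source, None)
            if ff is None:
                self.source = None    # source exhausted; never schedule it again
                return False
        result = self.fn(ff)
        if result is not None and self.output is not None:
            self.output.queue.append(result)
        return True


class FlowController:                 # FBP: scheduler
    def __init__(self, processors):
        self.processors = processors

    def run(self) -> None:
        # Round-robin over processors until a full pass makes no progress.
        progressed = True
        while progressed:
            progressed = False
            for p in self.processors:
                if p.can_run() and p.run_once():
                    progressed = True


collected = []
a_to_b, b_to_c = Connection(), Connection()
get = Processor("GetData", fn=lambda ff: ff, output=a_to_b,
                source=(FlowFile({"n": str(i)}, b"x" * i) for i in range(5)))
upper = Processor("Transform", fn=lambda ff: FlowFile(ff.attributes, ff.content.upper()),
                  input=a_to_b, output=b_to_c)
put = Processor("PutData", fn=lambda ff: collected.append(ff), input=b_to_c)
FlowController([get, upper, put]).run()
print(len(collected))  # 5
```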
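Slide 22 shows what happens when a FlowFile is cloned. A new FlowFile record points at the existing content claim, and provenance events are appended, while the content repository stays unchanged. This minimal in-memory Python sketch reproduces the slide's BEFORE/AFTER state. It assumes hypothetical `Repositories` and `FlowFileRecord` types; NiFi's real repositories are persistent and far richer. The event type strings simply mirror the slide's labels.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowFileRecord:
    ff_id: str
    content_claim: str              # a reference into the content repository, not the bytes


class Repositories:
    """Hypothetical in-memory stand-ins for the FlowFile, content and provenance repositories."""

    def __init__(self):
        self.flowfiles = {}          # FlowFile repository: id -> record
        self.content = {}            # content repository: claim id -> bytes
        self.provenance = []         # provenance repository: append-only event tuples
        self._ff_ids = itertools.count(1)
        self._claim_ids = itertools.count(1)
        self._event_ids = itertools.count(1)

    def _record(self, event_type, ff_id, detail=""):
        self.provenance.append((f"P{next(self._event_ids)}", ff_id, event_type, detail))

    def create(self, data: bytes) -> str:
        claim = f"C{next(self._claim_ids)}"
        self.content[claim] = data
        ff_id = f"F{next(self._ff_ids)}"
        self.flowfiles[ff_id] = FlowFileRecord(ff_id, claim)
        self._record("CREATE", ff_id)
        return ff_id

    def route(self, ff_id: str, relationship: str) -> None:
        # Routing touches only metadata; the content is neither read nor copied.
        self._record("ROUTE", ff_id, relationship)

    def clone(self, ff_id: str) -> str:
        parent = self.flowfiles[ff_id]
        child_id = f"F{next(self._ff_ids)}"
        # The clone points at the parent's content claim: pass by reference.
        self.flowfiles[child_id] = FlowFileRecord(child_id, parent.content_claim)
        self._record("CLONE", child_id, f"parent={ff_id}")
        return child_id


repo = Repositories()
f1 = repo.create(b"sensor reading")
repo.route(f1, "success")
f2 = repo.clone(f1)
print(repo.flowfiles[f2].content_claim, len(repo.content))  # C1 1
print([event[:3] for event in repo.provenance])
# [('P1', 'F1', 'CREATE'), ('P2', 'F1', 'ROUTE'), ('P3', 'F2', 'CLONE')]
```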
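Slide 23 shows copy-on-write. Modifying content writes a new claim (C2) instead of overwriting C1, and C1 becomes eligible for removal once nothing references it. This Python sketch tracks that bookkeeping with reference counts, using a hypothetical `ContentRepository` and `modify` helper. The XOR transform is only a placeholder for the slide's encryption step and provides no security.

```python
from collections import Counter


class ContentRepository:
    """Hypothetical content store whose claims are immutable and reference-counted."""

    def __init__(self):
        self.claims = {}             # claim id -> bytes; never modified in place
        self.refs = Counter()        # claim id -> number of live FlowFiles pointing at it
        self._next_id = 1

    def write(self, data: bytes) -> str:
        claim = f"C{self._next_id}"
        self._next_id += 1
        self.claims[claim] = data
        return claim

    def acquire(self, claim: str) -> None:
        self.refs[claim] += 1

    def release(self, claim: str) -> None:
        self.refs[claim] -= 1

    def removable(self) -> list:
        # Claims that no live FlowFile references; eligible for deletion or archiving.
        return [c for c in self.claims if self.refs[c] <= 0]


def modify(repo: ContentRepository, flowfiles: dict, ff_id: str, transform) -> str:
    """Copy-on-write: the new content goes to a new claim; the old claim is left untouched."""
    old_claim = flowfiles.pop(ff_id)
    new_claim = repo.write(transform(repo.claims[old_claim]))
    repo.acquire(new_claim)
    repo.release(old_claim)
    new_id = ff_id + ".1"            # hypothetical naming that mirrors the slide's F1.1
    flowfiles[new_id] = new_claim
    return new_id


repo = ContentRepository()
c1 = repo.write(b"plaintext")
repo.acquire(c1)
flowfiles = {"F1": c1}             # live FlowFiles: id -> content claim
# XOR is only a placeholder transform standing in for encryption; it is NOT encryption.
f = modify(repo, flowfiles, "F1", lambda data: bytes(b ^ 0x5A for b in data))
print(f, flowfiles[f], repo.removable())  # F1.1 C2 ['C1']
print(repo.claims["C1"])                  # b'plaintext' (kept until cleanup runs)
```

Because old claims are never overwritten, delaying cleanup keeps earlier versions of the content available. This is the situation the slide's closing question is pointing at.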
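Slide 24 describes fine-grained rights per action and component for users and groups, with detailed logging of user actions. This minimal Python sketch shows a deny-by-default check keyed on (action, resource) that records every decision in an audit list. It assumes a hypothetical `InMemoryAuthorizer` and `Policy`; these are not NiFi's authorizer interface, file format, or resource naming, and the resource path used here is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    action: str                       # e.g. "read" or "write"
    resource: str                     # hypothetical component path, e.g. "/processors/get-logs"
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)


class InMemoryAuthorizer:
    """Hypothetical authorizer in the spirit of a file-based provider; not NiFi's interface."""

    def __init__(self, policies, memberships):
        self.policies = {(p.action, p.resource): p for p in policies}
        self.memberships = memberships    # user -> set of group names
        self.audit = []                   # (user, action, resource, allowed) per decision

    def authorize(self, user: str, action: str, resource: str) -> bool:
        policy = self.policies.get((action, resource))
        if policy is None:
            allowed = False               # deny by default when no policy covers the request
        else:
            groups = self.memberships.get(user, set())
            allowed = user in policy.users or bool(policy.groups & groups)
        self.audit.append((user, action, resource, allowed))  # every decision is recorded
        return allowed


authz = InMemoryAuthorizer(
    policies=[Policy("read", "/processors/get-logs", groups={"operators"}),
              Policy("write", "/processors/get-logs", users={"alice"})],
    memberships={"bob": {"operators"}},
)
print(authz.authorize("bob", "read", "/processors/get-logs"))     # True
print(authz.authorize("bob", "write", "/processors/get-logs"))    # False
print(authz.authorize("alice", "write", "/processors/get-logs"))  # True
```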