NiFi Best Practices for the Enterprise
August 8, 2017
Future of Data – New Jersey
Hosted by Honeywell
Greg Keys
Solutions Engineer @ Hortonworks
Agenda
• NiFi Quick Overview
• NiFi Comparison with Other Technology
• Best Practices: People, Process, Assets
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
NiFi Quick Overview
Hortonworks Data Flow (HDF)
• NiFi-like UI for building stream analytics
• Reusable schema registry across all components
• Data in Motion, Data at Rest
NiFi Crash Course
Other concepts
• template = code
• process group = encapsulation
• remote process group = communication across NiFi clusters
• expression language = dynamic property values
• provenance = data lineage
• audit = history of data state for all
• replay = playback data in queue
NiFi Big Picture Pattern: Diverse Flows from One Tool
NiFi Capability Sweetspots
Comparison to other technologies
NiFi vs HDF Streaming Analytics Manager (SAM)
NiFi: Flow Management (sweet spot: move and process diverse data)
• speed: batch, microbatch or streaming data
• content size: small (KB) to large (GB)
• content manipulation: parse, filter, transform, enrich, reformat
• dataflow management: route, merge, prioritization, back pressure, and edge intelligence
• data sources: diverse (structured, semi-, un-)
• analytics: diverse (e.g. text, video analytics)
• computation: minimal
SAM: Stream Processing (sweet spot: analytics and alerting on live data)
• speed: streaming (real-time flow)
• content size: small (KB, MB) per message
• content manipulation: minimal
• dataflow management: minimal (mostly routing and merging)
• data sources: typically Kafka or live stream
• analytics: analytics and event processing
• computation: rich and powerful, live data
Common Pattern: Kafka
Both have easy UIs to build complex data movement processing
NiFi Compared to Other Technologies
Apache NiFi/MiNiFi
NiFi as a single tool has a very wide sweet spot: the “Swiss Army Knife of Data Movement”
Deliver and process a wide range of data to the business (sensor, logs, relational, RESTful/JSON, video …)
• ETL Tools (Informatica, etc): NiFi gets you 80% there
• Enterprise Service Bus (Mulesoft, Fuse, etc): NiFi gets you 80% there
• Message Queue (Kafka, JMS, etc): NiFi is complementary
• Event Processing & Streaming Analytics: previous slide
NiFi vs ETL Tools
NiFi sweet spots:
• ETL on structured & semi-structured data (schema registry for structured)
• Diverse sources and targets (e.g. REST APIs, PDFs, .xls) including RDBMS
• All-encompassing flows: no data prep with another tool; can join/merge flows, do lookups, etc
• Rapid development
• Flow management capabilities (backpressure, prioritization, etc)
NiFi is not designed for large/complex joins & aggregations, log-based CDC (vs field-based), or industrial-strength cleansing
ETL Tools sweet spots:
• Tailored for RDBMS
• ETL based on schema / data modelling
• Highly efficient, optimized performance
ETL tools require data to be prepared and modelled beforehand, and are not designed for dataflow problems and diverse types of data
NiFi gets you 80% there
NiFi vs Enterprise Service Bus
NiFi sweet spots:
• Easy, robust integrations from diverse sources to diverse targets (e.g. Salesforce to Hive, Oracle to Mongo, sFTP to file system)
• Easy file conversions (e.g. XML to JSON)
• Easy protocol conversions (route on attribute, e.g. FTP to REST)
• Easy cross data center communication (remote process group)
NiFi is not a comprehensive single abstraction layer: no code in applications, minimal process choreography, SOA stuff, etc
Enterprise Service Bus sweet spots:
• Single abstraction layer for content movement throughout the enterprise
• Specialized tools for each component of abstraction (e.g. process orchestration, protocol binding)
• Highly efficient, optimized performance
An ESB's industrial-strength focus on a single abstraction layer prevents the flexibility and agile development of NiFi data flows (platform stack vs Swiss Army knife)
NiFi gets you 80% there
NiFi vs Message Queue
NiFi sweet spots:
• General and flexible data movement tool
• Great for acquiring and processing data before delivering to a queue
NiFi is not designed to be a message queue (i.e. asynchronous durable handoff to receiver of message)
Message Queue sweet spots:
• Low latency content transfer between sender and receiver
• Asynchronous: sender releases message without receiver taking it
• Durable: message persists until receiver consumes it
• Highly efficient, optimized performance
A message queue is not designed to be a data flow solution
NiFi and message queues are complementary
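The two queue properties named on this slide, asynchronous release by the sender and persistence until the receiver consumes, can be illustrated with a toy in-memory queue. This is only a sketch: `ToyQueue` and its method names are hypothetical, and it keeps messages in process memory, whereas a real message queue persists them so they survive restarts.

```python
from collections import deque
from typing import Deque, Optional


class ToyQueue:
    """Hypothetical in-memory stand-in for a message queue (not durable across restarts)."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, message: str) -> None:
        # Asynchronous: the sender returns immediately, whether or not
        # any receiver is currently listening.
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        # A message stays in the queue until a receiver takes it.
        # Returns None when nothing is waiting.
        return self._messages.popleft() if self._messages else None

    def pending(self) -> int:
        # Number of messages sent but not yet consumed.
        return len(self._messages)
```

In the complementary setup the slide describes, a flow tool such as NiFi would sit in front of `send`, acquiring and preparing data before handing it to the queue.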
NiFi Best Practices
People, process, assets
Scaling NiFi across the enterprise
For NiFi to scale across the enterprise and over time: think people and process
1) Think of NiFi flows like software projects
2) Establish a COE (Center of Excellence) to align stakeholders and provide direction
COE: strategize, align stakeholders
• Program Mgmt: plan
• SDLC: execute
• Ops: admin & monitor
• Standards
Challenges:
- multitenancy
- reuse / efficiencies
- platform sanity
Enterprise People, Process, Assets
Program Mgmt, SDLC, Ops, Standards, COE
Key players:
• enterprise architect
• program mgmt
• ops
• strategy leader
• security
• risk/compliance
• business leaders
Key assets:
• capabilities/value
• needs
• education plan
• success metrics
• data flow summary (owner, source(s), frequency, volume, target(s), goal of flow)
• principles / standards
• reusable components
• patterns
• failure response procedure
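The data flow summary asset listed above could be captured as a small structured record so that every flow is documented the same way. The sketch below is one possible shape, not a format prescribed by the deck: the class name `DataFlowSummary`, the field types, and the example values are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFlowSummary:
    """Hypothetical record with the six fields named on the slide."""
    owner: str
    sources: List[str]
    frequency: str
    volume: str
    targets: List[str]
    goal: str

    def describe(self) -> str:
        # One-line description suitable for a flow catalogue or review.
        return (
            f"{self.owner}: {', '.join(self.sources)} -> {', '.join(self.targets)} "
            f"({self.frequency}, {self.volume}); goal: {self.goal}"
        )


# Illustrative values only; none of these come from a real deployment.
example = DataFlowSummary(
    owner="sales-ops",
    sources=["Salesforce"],
    frequency="hourly",
    volume="about 2 GB/day",
    targets=["Hive"],
    goal="refresh pipeline reports",
)
```

Keeping the record immutable (`frozen=True`) is a design choice here: a summary is meant to be reviewed and versioned, not edited in place.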
Key points
• Similar to what you already know
• Education of diverse stakeholders is key to applying this to NiFi
• Use agile for process and assets: start small/simple; iterate, learn, advance … repeat
• Involve business leaders closely: NiFi (like all tech) is ultimately about providing business value and should be driven by it
Standards and SDLC examples
Standards (enterprise architect)
Principles / standards: strive to
• leverage NiFi sweet spots
• use reusable components when possible (including Schema Registry!)
• use Expression Language to make processors as flexible as possible
Reusable components
• N-retry error handling template
• store raw to HDFS template
SDLC & reuse
https://hortonworks.com/blog/enterprise-nifi-implementing-reusable-components-software-development-lifecycle/
Patterns
Pattern: N-retry error handling
Anti-pattern:
• infinite retry loop on long-lasting error and incoming flow file (e.g. HTTPRequest: URL endpoint is down)
Consequence:
• continuous retry with no email alert
• possible backward propagation of backpressure
Pattern flow: processor retries N times; when > N: PutEmail, PutFile
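The N-retry pattern above can be sketched outside NiFi as a bounded retry loop. This is a minimal illustration under stated assumptions, not the template from the deck: `process_with_n_retries` and `RetryOutcome` are hypothetical names, and the `alerts` and `parked` lists merely stand in for what PutEmail and PutFile would do in a real flow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RetryOutcome:
    succeeded: bool
    attempts: int
    alerts: List[str] = field(default_factory=list)   # stand-in for PutEmail
    parked: List[str] = field(default_factory=list)   # stand-in for PutFile


def process_with_n_retries(
    flow_file: str, action: Callable[[str], None], max_retries: int
) -> RetryOutcome:
    """Run `action` once, then retry at most `max_retries` times before giving up."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    outcome = RetryOutcome(succeeded=False, attempts=0)
    last_error: Optional[Exception] = None
    for _ in range(max_retries + 1):  # one initial attempt plus N retries
        outcome.attempts += 1
        try:
            action(flow_file)
            outcome.succeeded = True
            return outcome
        except Exception as exc:  # any failure counts towards the limit
            last_error = exc
    # Retry budget exhausted: alert someone and park the flow file
    # instead of looping forever and backing up the upstream queue.
    outcome.alerts.append(
        f"{flow_file}: gave up after {max_retries} retries ({last_error})"
    )
    outcome.parked.append(flow_file)
    return outcome
```

The point of the bound is that a long-lasting error produces exactly one alert and one parked file per flow file, rather than silent, continuous retries.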
HDF 3.1 Big Improvements Coming!
Best Practices: One NiFi Cluster or Many?
Challenges:
- multitenancy
- reuse / efficiencies
- platform sanity
Business groups: from less independent and smaller to semi-autonomous and larger
Challenges for single cluster: from smaller to larger
Separate clusters make more sense with larger, semi-autonomous business groups
COE still has governance role over multiple clusters / business groups
Diagram key: business group, NiFi cluster

Contenu connexe

Tendances

Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Timothy Spann
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaDataWorks Summit
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseAldrin Piri
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer GuideDeon Huang
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
Big Data Business Wins: Real-time Inventory Tracking with Hadoop
Big Data Business Wins: Real-time Inventory Tracking with HadoopBig Data Business Wins: Real-time Inventory Tracking with Hadoop
Big Data Business Wins: Real-time Inventory Tracking with HadoopDataWorks Summit
 
Building Your Data Streams for all the IoT
Building Your Data Streams for all the IoTBuilding Your Data Streams for all the IoT
Building Your Data Streams for all the IoTDevOps.com
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKai Wähner
 
Event Streaming in the Telco Industry with Apache Kafka® and Confluent
Event Streaming in the Telco Industry with Apache Kafka® and ConfluentEvent Streaming in the Telco Industry with Apache Kafka® and Confluent
Event Streaming in the Telco Industry with Apache Kafka® and Confluentconfluent
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifiAnshuman Ghosh
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
 

Tendances (20)

Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
 
Apache NiFi Crash Course Intro
Apache NiFi Crash Course IntroApache NiFi Crash Course Intro
Apache NiFi Crash Course Intro
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and Kafka
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer Guide
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Integrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data LakesIntegrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data Lakes
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Big Data Business Wins: Real-time Inventory Tracking with Hadoop
Big Data Business Wins: Real-time Inventory Tracking with HadoopBig Data Business Wins: Real-time Inventory Tracking with Hadoop
Big Data Business Wins: Real-time Inventory Tracking with Hadoop
 
Building Your Data Streams for all the IoT
Building Your Data Streams for all the IoTBuilding Your Data Streams for all the IoT
Building Your Data Streams for all the IoT
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid Cloud
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Event Streaming in the Telco Industry with Apache Kafka® and Confluent
Event Streaming in the Telco Industry with Apache Kafka® and ConfluentEvent Streaming in the Telco Industry with Apache Kafka® and Confluent
Event Streaming in the Telco Industry with Apache Kafka® and Confluent
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifi
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
 

Similaire à NiFi Best Practices for the Enterprise

Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Data Con LA
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveAldrin Piri
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiDataWorks Summit
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionMilind Pandit
 
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiData at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiAldrin Piri
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方HortonworksJapan
 
Enterprise data science at scale
Enterprise data science at scaleEnterprise data science at scale
Enterprise data science at scaleCarolyn Duby
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks
 
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHaimo Liu
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupJoseph Witt
 
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
[Hortonworks] Future Of Data: Madrid - HDF & Data in motionRaúl Marín
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerDataWorks Summit
 
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiIsheeta Sanghi
 
BigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiBigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiAldrin Piri
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataMats Johansson
 
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018Timothy Spann
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityAccumulo Summit
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
 

Similaire à NiFi Best Practices for the Enterprise (20)

Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFi
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi Introduction
 
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiData at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
 
Enterprise data science at scale
Enterprise data science at scaleEnterprise data science at scale
Enterprise data science at scale
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat Alwell
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1
 
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming Meetup
 
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
 
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
 
BigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiBigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFi
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
 
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & Community
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
 

Dernier

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 


NiFi Best Practices for the Enterprise

  • 1. NiFi Best Practices for the Enterprise. August 8, 2017. Future of Data – New Jersey, hosted by Honeywell. Greg Keys, Solutions Engineer @ Hortonworks. Agenda: NiFi Quick Overview; NiFi Comparison with Other Technology; Best Practices: People, Process, Assets.
  • 2. NiFi Quick Overview
  • 3. Hortonworks Data Flow (HDF): NiFi-like UI for building stream analytics; reusable schema registry across all components; Data in Motion; Data at Rest.
  • 4. NiFi Crash Course. Other concepts:
      - template = code
      - process group = encapsulation
      - remote process group = communication across NiFi clusters
      - expression language = dynamic
      - provenance = data lineage
      - audit = history of data state for all
      - replay = playback data in queue
  • 5. NiFi Big Picture Pattern: Diverse Flows from One Tool
  • 6. NiFi Capability Sweet Spots: Comparison to other technologies
  • 7. NiFi vs HDF Streaming Analytics Manager (SAM)
      - Dimensions compared: speed, content size, content manipulation, dataflow management, data sources, analytics
      - NiFi (Flow Management): batch, microbatch or streaming data; small (KB) to large (GB); parse, filter, transform, enrich, reformat; route, merge, prioritization, back pressure, and edge intelligence; diverse (structured, semi-, un-); diverse (e.g. text, video analytics); minimal
      - SAM (Stream Processing): streaming (real-time flow); small (KB, MB) per message; minimal; minimal (mostly routing and merging); typically Kafka or live stream; analytics and event processing; rich and powerful, live data
      - Sweet spot: NiFi moves and processes diverse data; SAM provides analytics and alerting on live data
      - Common pattern: Kafka
      - Both have easy UIs to build complex data movement, processing, and computation
  • 8. Apache NiFi/MiNiFi: NiFi Compared to Other Technologies
      - NiFi as a single tool has a very wide sweet spot: the "Swiss Army Knife of Data Movement"
      - Deliver and process a wide range of data to the business (sensor, logs, relational, RESTful/JSON, video …)
      - ETL Tools (Informatica, etc.): NiFi gets you 80% there
      - Enterprise Service Bus (Mulesoft, Fuse, etc.): NiFi gets you 80% there
      - Message Queue (Kafka, JMS, etc.): NiFi is complementary
      - Event Processing & Streaming Analytics: see previous slide
  • 9. NiFi vs ETL Tools
      - NiFi sweet spots: ETL on structured and semi-structured data (schema registry for structured); diverse sources and targets (e.g. REST APIs, PDFs, .xls), including RDBMS; all-encompassing flows (no data prep with another tool; can join/merge flows; lookup, etc.); rapid development; flow management capabilities (backpressure, prioritization, etc.)
      - NiFi is not designed for large/complex joins and aggregations, log-based CDC (vs field-based), or industrial-strength cleansing
      - ETL tools sweet spots: tailored for RDBMS; ETL based on schema / data modelling; highly efficient, optimized performance
      - ETL tools: must prepare and model data beforehand; not designed for dataflow problems and diverse types of data
      - NiFi gets you 80% there
  • 10. NiFi vs Enterprise Service Bus
      - NiFi sweet spots: easy, robust integrations from diverse sources to diverse targets (e.g. Salesforce to Hive, Oracle to Mongo, SFTP to file system); easy file conversions (e.g. XML to JSON); easy protocol conversions (route on attribute, e.g. FTP to REST); easy cross data center communication (remote process group)
      - NiFi is not a comprehensive single abstraction layer: no code in applications, minimal process choreography, SOA stuff, etc.
      - Enterprise Service Bus sweet spots: single abstraction layer for content movement throughout the enterprise; specialized tools for each component of abstraction (e.g. process orchestration, protocol binding); highly efficient, optimized performance
      - Enterprise Service Bus: industrial-strength focus on a single abstraction layer prevents the flexibility and agile development of NiFi data flows (platform stack vs Swiss Army knife)
      - NiFi gets you 80% there
  • 11. NiFi vs Message Queue
      - NiFi sweet spots: general and flexible data movement tool; great for acquiring and processing data before delivering to a queue
      - NiFi is not designed to be a message queue (i.e. asynchronous durable handoff to receiver of message)
      - Message queue sweet spots: low latency content transfer between sender and receiver; asynchronous (sender releases message without receiver taking it); durable (message persists until receiver consumes it); highly efficient, optimized performance
      - Message queue: not designed to be a data flow solution
      - NiFi and message queues are complementary
  • 12. NiFi Best Practices: People, process, assets
  • 13. Scaling NiFi across the enterprise
      - For NiFi to scale across the enterprise and over time: think people and process
      - 1) Think of NiFi like software projects
      - 2) Establish a COE to align stakeholders and provide direction
      - Diagram labels: COE, Program Mgmt, SDLC, Ops, Standards; plan, execute, admin & monitor, strategize, align stakeholders
      - Challenges: multitenancy; reuse / efficiencies; platform sanity
  • 14. Enterprise People, Process, Assets
      - Areas: Program Mgmt, SDLC, Ops, Standards, COE
      - Key players: enterprise architect, program mgmt, ops, strategy leader, security, risk/compliance, business leaders
      - Key assets: capabilities/value, needs, education plan, success metrics, data flow summary (owner, source(s), frequency, volume, target(s), goal of flow), principles / standards, reusable components, patterns, failure response procedure
      - Key points:
      - Similar to what you already know
      - Education of diverse stakeholders is key to applying this to NiFi
      - Use agile for process and assets: start small/simple, iterate, learn, advance … repeat
      - Involve business leaders closely: NiFi (like all tech) is ultimately about providing business value and should be driven by it
  • 15. Standards and SDLC examples
      - Labels: SDLC; Standards (principles / standards, reusable components, patterns); enterprise architect
      - Principles / standards, strive to: leverage NiFi sweet spots; use reusable components when possible (including Schema Registry!); use Expression Language to make processors as flexible as possible
      - Reusable components: N-retry error handling template; store raw to HDFS template
      - SDLC & reuse: https://hortonworks.com/blog/enterprise-nifi-implementing-reusable-components-software-development-lifecycle/
      - Pattern: N-retry error handling (diagram: processor retried N times; when > N, PutEmail and PutFile)
      - Anti-pattern: infinite retry loop on long-lasting error and incoming flow file (e.g. HTTPRequest: url endpoint is down)
      - Consequence: continuous retry with no email alert; possible backward propagation of backpressure
      - HDF 3.1: Big Improvements Coming!
  • 16. Best Practices: One NiFi Cluster or Many?
      - Challenges: multitenancy; reuse / efficiencies; platform sanity
      - Business groups range from less independent and smaller to semi-autonomous and larger
      - Challenges for a single cluster range from smaller to larger
      - Separate clusters make more sense with larger, semi-autonomous business groups
      - COE still has a governance role over multiple clusters / business groups
      - Diagram key: business group, NiFi cluster
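
The N-retry error handling pattern from slide 15 can be illustrated outside NiFi with a small Python sketch. This is not NiFi code and not the template from the talk: `FlowFile`, `run_with_n_retry`, `handle`, the `failure.count` and `last.error` attribute names, and the value `N = 3` are all assumptions made for illustration. The idea it shows is the one on the slide: count failures on the flow file itself, and once the count exceeds N, stop retrying and hand the flow file to an alert step and a "park it on disk" step (stand-ins for PutEmail and PutFile).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

MAX_RETRIES = 3  # "N" in the pattern; 3 is an assumed value


@dataclass
class FlowFile:
    """Hypothetical stand-in for a NiFi flow file: content plus attributes."""
    content: str
    attributes: Dict[str, str] = field(default_factory=dict)


def run_with_n_retry(flow_file: FlowFile,
                     process: Callable[[FlowFile], None],
                     max_retries: int = MAX_RETRIES) -> str:
    """Attempt `process` once, then retry at most `max_retries` times.

    Returns "success" or "failure". The failure count lives in the
    (hypothetical) attribute "failure.count", so the loop is always bounded.
    """
    while True:
        try:
            process(flow_file)
            return "success"
        except Exception as exc:
            failures = int(flow_file.attributes.get("failure.count", "0")) + 1
            flow_file.attributes["failure.count"] = str(failures)
            flow_file.attributes["last.error"] = str(exc)
            if failures > max_retries:
                # Count exceeded N: stop looping and route to failure handling.
                return "failure"


def handle(flow_file: FlowFile,
           process: Callable[[FlowFile], None],
           send_alert: Callable[[FlowFile], None],
           park: Callable[[FlowFile], None]) -> str:
    """Run the bounded retry, then alert and park the flow file on failure."""
    relationship = run_with_n_retry(flow_file, process)
    if relationship == "failure":
        send_alert(flow_file)  # stand-in for PutEmail
        park(flow_file)        # stand-in for PutFile
    return relationship


def endpoint_down(_flow_file: FlowFile) -> None:
    """Simulates the anti-pattern's trigger: an endpoint that stays down."""
    raise ConnectionError("endpoint is down")


if __name__ == "__main__":
    alerts, parked = [], []
    ff = FlowFile("payload")
    result = handle(ff, endpoint_down, alerts.append, parked.append)
    print(result, ff.attributes["failure.count"], len(alerts), len(parked))
```

The sketch retries immediately for brevity; a real flow would normally wait between attempts. The point is the contrast with the anti-pattern: a long-lasting error ends in exactly one alert and one parked flow file instead of an endless loop that raises no alert and lets backpressure build upstream.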