3. History
3
• Developed at NSA for over eight years
• Donated to the Apache Software Foundation Nov 2014
• Undergoing incubation
• Three ASF releases to date
• Closing in on 0.2.0 release
4. The problem space: Enterprise Dataflow
4
Automate the flow of
data from any source
…to systems which
extract meaning and
insight
…and to those that
store and make it
available for users
5. Use Cases for NiFi
5
• Remote sensor delivery
• Inter-site / global distribution
• Intra-site distribution
• ‘Big Data’ ingest
• Data Processing (enrichment, filtering, sanitization)
6. The challenges we faced
6
• Transport / Messaging was not enough
• Needed to understand the big picture
• Needed the ability to make *immediate* changes
• Must maintain chain of custody for data
• Rigorous security and compliance requirements
7. Why transport and messaging was not enough?
7
• Data access exceeded resources to transport
• Decoupling systems is about more than the connectivity
• Message sizes ranged from B to GB
• Not all data is created equal
• Needed precise security controls
• SSL and topic level authorization insufficient
8.
The basic building blocks
Real-time Command and Control
The Power of Provenance
8
Apache NiFi Foundational Concepts
2
3
1
9. HEADER
-‐ UUID
-‐ Name
-‐ Size
-‐ Entry
Time
A3ributes
Map
[[Key
|
Value]]
CONTENT
Flow File
9
• Types
• Events
• Objects
• Files
• Messages
• Media
• Formats
• JSON
• Avro
• Text
• Mp4
• Proprietary
• Sizes
• Bytes to GBs
15. Tighten the feedback loop
• Changes have consequences (good or bad)
• And you see them as they occur
Continuous Improvement
• Compare real-time vs. historical statistics
• View data provenance
• View Content at any stage
Intuitive user experience
• Visual programming
• Logical flow graph
15
Real-time command and control2
16. Latency Optimization
• Intra process
• Inter process
• End-to-end
Compliance
• Prove handling
• Assess impact
Understanding
• Step through time
• View content
• View Context
16
The Power of Provenance – Chain of custody for data3
18. Flow File Repo – Write Ahead Log
Content Repo
Add more partitions
Input/Output Streams
Copy on Write
Pass by Reference
Allow tradeoffs of latency vs throughput
18
How fast is it and why?
19. - User to System and System to System
- Authentication (2-Way SSL, more coming…)
- Authorization (pluggable)
- Authorize a specific piece of data to a specific system
- Data provenance
- Prove you have done the right thing
- Recover when you have not
19
How does it deal with security?
20. Web UI
Push API
Reporting Tasks (ganglia, graphite, etc…)
Pull API
REST API
20
How can I monitor this at runtime?
21. Flow File Processors
Advanced UI
Flow File Prioritizer
Reporting Tasks
Controller Services
Build Clients against our REST API
21
What are the points of extension?
22. Status and direction for NiFi
22
Efficient use of each node
- 100s of MB/s per node
- 100Ks transactions/s per node
Simple / Effective scaling model
Runtime Command and Control
Data Provenance
Distributed durability of data
- Maybe Kafka backed queues
High Availability Cluster Manager
Live / Rolling Upgrades
Provenance Query Language /
Reporting
A complete user experience enabled by
provenance
Existing Strengths Roadmap Highlights
23. Apache NiFi (incubating) site
http://nifi.incubator.apache.org
Subscribe to and collaborate at
dev@nifi.incubator.apache.org
users@nifi.incubator.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
@apachenifi
23
Learn more about Apache NiFi