Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi

633 vues

Publié le

Description: MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately.

Abstract: Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.

Expected prior knowledge / intended audience: developers and data flow managers should have passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The talk will focus on extending the data collection, routing, provenance, and governance capabilities of NiFi to IoT/edge integration via MiNiFi.

Takeaways: Attendees will learn about opportunities to bring their data flow and capture closer to the "edge" -- sources of data like IoT devices, vehicles, machinery, etc. They will understand the possibilities to prioritize, filter, secure, and manipulate this data earlier in the data lifecycle to enhance their data visibility and performance.

Publié dans : Technologie
  • Soyez le premier à commenter

Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi

  1. 1. © Hortonworks Inc. 2011–2019. All rights reserved;1 Help, I Have Too Many Raspberry Pis! Tim Spann NiFi Expert, Easily Distracted by New TensorFlow Models 21 March 2019 Dataworks Summit Barcelona
  2. 2. © Hortonworks Inc. 2011–2019. All rights reserved;2 The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi Andy LoPresto | @yolopey DIM Security Engineering Manager at Cloudera, Apache NiFi PMC & Committer 21 March 2019 Dataworks Summit Barcelona
  3. 3. © Hortonworks Inc. 2011–2019. All rights reserved;3 Gauging Audience Familiarity With NiFi “What’s a NeeFee?” No experience with dataflow No experience with NiFi “I can pick this up pretty quickly” Some experience with dataflow Some experience with NiFi “I refactored the Ambari integration endpoint to allow for mutual authentication TLS during my coffee break” Forgotten more about NiFi than most of us will ever know
  4. 4. © Hortonworks Inc. 2011–2019. All rights reserved;4 Agenda • What is dataflow and what are the challenges? • Apache NiFi • IoT Challenges • Apache MiNiFi • Exploration • Community • Roadmap • Q&A All slides provided online, no need to transcribe
  5. 5. © Hortonworks Inc. 2011–2019. All rights reserved;5 What is dataflow?
  6. 6. © Hortonworks Inc. 2011–2019. All rights reserved;6 What is dataflow? • Moving some content from A to B • Content could be any bytes • Logs • HTTP • XML • CSV • Images • Video • Telemetry Producers A.K.A Things Anything AND Everything Internet! Consumers • User • Storage • System • …More Things
  7. 7. © Hortonworks Inc. 2011–2019. All rights reserved;7 Moving data effectively is hard “Data Pipeline” https://xkcd.com/2054/
  8. 8. © Hortonworks Inc. 2011–2019. All rights reserved;8 • Standards • Formats • Protocols • Veracity • Validity • Schemas • Partitioning/ Bundling Data Dataflow Challenges In 3 Categories Infrastructure • “Exactly Once” Delivery • Ensuring Security • Overcoming Security • Credential Management • Network People • Compliance • “That [person| team|group]” • Consumers Change • Requirements Change • “Exactly Once” Delivery
  9. 9. © Hortonworks Inc. 2011–2019. All rights reserved;9 Raise your hand if you want to maintain Python scripts for the rest of your life Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
  10. 10. © Hortonworks Inc. 2011–2019. All rights reserved;10 What is Apache NiFi?
  11. 11. © Hortonworks Inc. 2011–2019. All rights reserved;11 • Server/DC class • GUI & REST API • Interacts with hundreds of services NiFi One Minute Intro to the NiFi Ecosystem MiNiFi (JVM/C++) • Agent class • Runs on minimal hardware • Interacts with NiFi • Simple event processing NiFi Registry • Standalone application • Handles asset management • Flow versioning & deployment • Extension registry
  12. 12. © Hortonworks Inc. 2011–2019. All rights reserved;12 Flowfiles Are Like HTTP Data HTTP Data FlowFile HTTP/1.1 200 OK Date: Sun, 10 Oct 2010 23:26:07 GMT Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT ETag: "45b6-834-49130cc1182c0" Accept-Ranges: bytes Content-Length: 13 Connection: close Content-Type: text/html Hello world! Standard FlowFile Attributes Key: 'entryDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016' Key: 'lineageStartDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016' Key: 'fileSize’ Value: '23609' FlowFile Attribute Map Content Key: 'filename’ Value: '15650246997242' Key: 'path’ Value: './’ Binary Content * Header Content
  13. 13. © Hortonworks Inc. 2011–2019. All rights reserved;13 User Interface Less of this…… more of this
  14. 14. © Hortonworks Inc. 2011–2019. All rights reserved;14 Deeper Ecosystem Integration: 286+ Processors, 61 Controller Services Hash Extract Merge Duplicate Scan GeoEnrich Replace ConvertSplit Translate Route Content Route Context Route Text Control Rate Distribute Load Generate Table Fetch Jolt Transform JSON Prioritized Delivery Encrypt Tail Evaluate Execute All Apache project logos are trademarks of the ASF and the respective projects. Fetch HTTP Syslog Email HTML Image HL7 FTP UDP XML SFTP AMQP WebSocket Parse Records Convert Records
  15. 15. © Hortonworks Inc. 2011–2019. All rights reserved;15 People want to know about their data Colin the Chicken | Portlandia | IFC
  16. 16. © Hortonworks Inc. 2011–2019. All rights reserved;16 “This is local? I’m going to ask you this one more time — this is local?” Colin the Chicken | Portlandia | IFC
  17. 17. © Hortonworks Inc. 2011–2019. All rights reserved;17 “Is that USDA organic, or Oregon organic, or Portland organic?” Colin the Chicken | Portlandia | IFC
  18. 18. © Hortonworks Inc. 2011–2019. All rights reserved;18 “Oh you have this information? This is fantastic!” Colin the Chicken | Portlandia | IFC
  19. 19. © Hortonworks Inc. 2011–2019. All rights reserved;19 Data Provenance ▪ Constrained ▪ High-latency ▪ Localized context ▪ Hybrid – cloud/on-premises ▪ Low-latency ▪ Global context Origin – attribution Replay – recovery Evolution of topologies Long retention Types of Lineage • Event • Configuration
  20. 20. © Hortonworks Inc. 2011–2019. All rights reserved;20 What are the IoT challenges?
  21. 21. © Hortonworks Inc. 2011–2019. All rights reserved;21 IoT Challenges • Limited computing capability • Limited power/network • Restricted software library/platform availability • No UI • Physically inaccessible • Not frequently updated • Competing standards/protocols • Scalability • Privacy & Security @_lennart
  22. 22. © Hortonworks Inc. 2011–2019. All rights reserved;22 • When the Mirai attack has its own Wikipedia page, that’s not good • Hackers stole high-roller database from casino via aquarium thermometer connected to internet (04/2018) Recent Examples
  23. 23. © Hortonworks Inc. 2011–2019. All rights reserved;23 “The S in IoT stands for security”Oleg Šelajev, @shelajev
  24. 24. © Hortonworks Inc. 2011–2019. All rights reserved;24 • Runs on JVM • Provides UI for flow design & monitoring • Security built-in • TLS, authentication/authorization, encrypted data • Handles practically any format/protocol NiFi Solves Everything*
  25. 25. © Hortonworks Inc. 2011–2019. All rights reserved;25 • NiFi supports AMQP, MQTT, UDP, TCP, HTTP(S), CEF, JMS, (S)FTP, AWSIoT • With a little pruning, NiFi can run on a Raspberry Pi NiFi for IoT
  26. 26. © Hortonworks Inc. 2011–2019. All rights reserved;26 • NiFi is designed to “own the box” • NiFi 0.7.x started up in about 10-15 minutes on RP3 (593 MB) • NiFi 1.x started up in about 30 minutes on RP3 (760 MB) • 33 new processors • Rewrite for multi tenant authorization • Complete UI overhaul So Why Do We Need A Different Solution?
  27. 27. © Hortonworks Inc. 2011–2019. All rights reserved;27 Enter Apache MiNiFi
  28. 28. © Hortonworks Inc. 2011–2019. All rights reserved;28 • Get the key parts of NiFi close to where data begins and provide bidirectional communication • NiFi lives in the data center — give it an enterprise server or a cluster of them • MiNiFi lives as close to where data is born and is a guest on that device or system • IoT • Connected car • Legacy hardware Apache NiFi Subproject: MiNiFi
  29. 29. © Hortonworks Inc. 2011–2019. All rights reserved;29 • NiFi is big • 1.9.1 release is 1.2 GB compressed • Can be modified to run in restricted environments, but requires manual surgery • Provides UI, provenance query, etc. • Runs on dedicated machines/clusters — “owns the box” • MiNiFi lives at the edge • No UI • 0.5.0 Java release is 67 MB, C++ release is 6.1 MB (0.2.0 fits on a floppy disk) • “Good guest” Why build MiNiFi?
  30. 30. © Hortonworks Inc. 2011–2019. All rights reserved;30 • MiNiFi Java (v0.5.0) • Modified version of NiFi • No UI • YAML configuration • Reduced processor count • 63+ by default, more 
 available with 
 additional NARs • MiNiFi C++ (v0.5.0) • Written from scratch • 33 processors by default • Bi-directional site-to-site & provenance data Flavors of MiNiFi
  31. 31. © Hortonworks Inc. 2011–2019. All rights reserved;31 NiFi vs MiNiFi Java Processes NiFi Framework Components MiNiFi NiFi Framework User Interface Components NiFi
  32. 32. © Hortonworks Inc. 2011–2019. All rights reserved;32 • NiFi • Design flows • Aggregate data from many sources • Perform routing/analysis/SEP • MiNiFi • Receive flows • Collect data • Send for processing How Does MiNiFi Interact With NiFi?
  33. 33. © Hortonworks Inc. 2011–2019. All rights reserved;33 • We’ve been imagining EDGE to CORE as a bi-directional linear system • Let’s expand 
 that to the real 
 world Let’s Add Dimensionality
  34. 34. © Hortonworks Inc. 2011–2019. All rights reserved;34 • Data tagging/provenance • Governance from edge (geopolitical restrictions) • Security (encryption, certificate-based authentication) • Low latency (immediate reactions & decision-making) What does MiNiFi provide? Connected Car Reference Platform Box Tuner + DSRC CardConnectivity Card
  35. 35. © Hortonworks Inc. 2011–2019. All rights reserved;35 MiNiFi on a Connected Car Comprehension Collection Processing / Synthesis Parse <> Listen <> CAN Bus Gateway MCU MCU MCU Ethernet / Ethernet AVB Local Interconnect Network Yet to be established protocol Listen Ethernet Listen LINListen CAN Parse CAN Parse Ethernet Parse LIN Route Transmit Execute PrioritizeFilter
  36. 36. © Hortonworks Inc. 2011–2019. All rights reserved;36 MiNiFi on a Connected Car
  37. 37. © Hortonworks Inc. 2011–2019. All rights reserved;37 • Site-to-Site • NiFi protocol • Two implementations • Raw socket • HTTP(S) • Secured with mutual authentication TLS • HTTP(S), (S)FTP, JMS, Syslog, File, Email, Process MiNiFi Exfil
  38. 38. © Hortonworks Inc. 2011–2019. All rights reserved;38 Edge Data Exploration
  39. 39. © Hortonworks Inc. 2011–2019. All rights reserved;39 Credit Card “Processing”
  40. 40. © Hortonworks Inc. 2011–2019. All rights reserved;40 Collect Card Contents • User swipes magnetic stripe card • Raspberry Pi collects data • Sends back to laptop
  41. 41. © Hortonworks Inc. 2011–2019. All rights reserved;41 Component List • IDTECH WCR3237-733U • Raspberry Pi 3 B+ • Apache MiNiFi 0.6.0-SNAPSHOT • Apache NiFi 1.9.0-SNAPSHOT • Apache NiFi Registry 0.4.0-SNAPSHOT
  42. 42. © Hortonworks Inc. 2011–2019. All rights reserved;42 Flow Design • NiFi Flow • Receive data via S2S • Route encrypted CC data to decrypt • Route all plaintext swipes through parser • Convert to JSON • Batch & write to file • MiNiFi Flow • Tail file containing swipe data • Detect CC swipes & encrypt • Transmit all swipes to NiFi via S2S
  43. 43. © Hortonworks Inc. 2011–2019. All rights reserved;43 NiFi Flow
  44. 44. © Hortonworks Inc. 2011–2019. All rights reserved;44 MiNiFi Flow
  45. 45. © Hortonworks Inc. 2011–2019. All rights reserved;45 Custom MSR Parsing Logic
  46. 46. © Hortonworks Inc. 2011–2019. All rights reserved;46 Running on Raspberry Pi
  47. 47. © Hortonworks Inc. 2011–2019. All rights reserved;47 Running on MacBook Pro
  48. 48. © Hortonworks Inc. 2011–2019. All rights reserved;48 Sense HAT & Inky pHAT
  49. 49. © Hortonworks Inc. 2011–2019. All rights reserved;49 Sense Hat & Inky pHAT
  50. 50. © Hortonworks Inc. 2011–2019. All rights reserved;50 Flow Design
  51. 51. © Hortonworks Inc. 2011–2019. All rights reserved;51 • Read JSON output from Python script • Send to NiFi via Site-to-Site Reading from Sense HAT
  52. 52. © Hortonworks Inc. 2011–2019. All rights reserved;52 • Ingest JSON from Site-to-Site • Extract JSON elements to attributes • Perform conversion Processing Data
  53. 53. © Hortonworks Inc. 2011–2019. All rights reserved;53 • Receive flowfiles from “Data Center” via Site-to-Site • Sample 10% (down throttling) • Safety valve • Send data to display Displaying Data
  54. 54. © Hortonworks Inc. 2011–2019. All rights reserved;54 Inky pHAT Python Code & VNC
  55. 55. © Hortonworks Inc. 2011–2019. All rights reserved;55 Sense Hat & Inky pHAT
  56. 56. © Hortonworks Inc. 2011–2019. All rights reserved;56 • Add icons & dynamic coloring to eInk display • Signal traffic via LED output on Sense HAT • Build simple UI for touchscreen to enable/disable • Send traffic to Grafana dashboard • Train machine learning model on optimal frequency for each data point sampling Next Steps
  57. 57. © Hortonworks Inc. 2011–2019. All rights reserved;57 Community
  58. 58. © Hortonworks Inc. 2011–2019. All rights reserved;58 Community Example • Jeremy Dyer • Alexa + MiNiFi + Dyer 2.0 http://www.opensourcedad.com/apache/minifi-cpp/2016/12/18/poop-scale.html
  59. 59. © Hortonworks Inc. 2011–2019. All rights reserved;59 What’s Next?
  60. 60. © Hortonworks Inc. 2011–2019. All rights reserved;60 • Cluster Data Management • Load-Balanced Connections • Node Decomissioning • SQL results to record format • Hot-loading NARs • Auto-infer schema for record processors • Docker improvements • TLS Toolkit signing w/ external CA (standalone only) New Features in 1.8.0 - 1.9.1
  61. 61. © Hortonworks Inc. 2011–2019. All rights reserved;61 Previously, on NiFi Clusters… • Ingest activity (FTP, file read, etc.) performed on single node • Routed via RPG to Input Port in cluster • Flowfiles distributed across all nodes Mark Payne, Load Balancing Data Across a NiFi Cluster
  62. 62. © Hortonworks Inc. 2011–2019. All rights reserved;62 Load-Balanced Connections • Individual connections can be load- balanced • None • Round Robin • Attribute node affinity • Single node Mark Payne, Load Balancing Data Across a NiFi Cluster
  63. 63. © Hortonworks Inc. 2011–2019. All rights reserved;63 • Previously, flows were exported via XML templates • Didn’t contain sensitive values • Couldn’t be updated in-place • No tracking system • NiFi Registry brings asset management as first-class citizen to NiFi • Flows can be versioned • Flows can be promoted between environments Introducing Apache NiFi Registry 0.3.0 NiFi Registry for Dataflows
  64. 64. © Hortonworks Inc. 2011–2019. All rights reserved;64 Roadmap
  65. 65. © Hortonworks Inc. 2011–2019. All rights reserved;65 Roadmap • Command & Control Architecture • Extension Registry
  66. 66. © Hortonworks Inc. 2011–2019. All rights reserved;66 MiNiFi Command & Control Architecture • Flow Versioning • Develop flows for class of MiNiFi instances • Command & Control (C2) API (in Java master) • FileChangeIngestor • RestAPIIngestor • PullHTTPIngestor
  67. 67. © Hortonworks Inc. 2011–2019. All rights reserved;67 MiNiFi C2 + Raspberry Pi • Abdelkrim Hadjidj • Step-by-step walkthrough • deploying services • configuring endpoints • developing flows How to build an IIoT system using Apache NiFi, MiNiFi, C2 Server, MQTT and Raspberry Pi
  68. 68. © Hortonworks Inc. 2011–2019. All rights reserved;68 Extension Registry • Uses NiFi Registry to manage extensions (“narballs”) • Allows faster download, build, and deployment • Think Ubuntu Server vs. Desktop • Create, upload, and download functional bundles • NiFi Wiki - Extension Registry Feature Proposal • NiFi Dev Mailing List - Discussion of implementation LinuxTechi.com
  69. 69. © Hortonworks Inc. 2011–2019. All rights reserved;69 • NiFi 1.9.1 — 16 Mar 2019 (167+ Jiras) • New processors (Kudu, Slack) • Hot-loading NARs • Security HTTP headers • Record processors automatically infer schema • MiNiFi C++ 0.6.0 — Voting now… • MiNiFi Java 0.5.0 — 7 July 2018 • NiFi Registry 0.3.0 — 25 Sept 2018 New Announcements
  70. 70. © Hortonworks Inc. 2011–2019. All rights reserved;70 Apache NiFi site
 https://nifi.apache.org Subproject MiNiFi site https://nifi.apache.org/minifi/ Subscribe to and collaborate at
 dev@nifi.apache.org users@nifi.apache.org Submit Ideas or Issues
 https://issues.apache.org/jira/browse/NIFI Follow us on Twitter @apachenifi Learn more and join us
  71. 71. © Hortonworks Inc. 2011–2019. All rights reserved;71 More NiFi This Week Title Date Speakers Room Cloudera Data in Motion Meetup - Future of Data Barcelona (free) Tuesday 3/19 1800 - 2000 Tim Spann, Dan Chaffelson, Andy LoPresto 127-128 Apache NiFi Crash Course Wednesday 3/20 1400 - 1600 Nathan Gough, Andy LoPresto 111 Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path towards DataOps 1450 - 1530 Arda Basar, Ivan Georgiev 129-130 IoT, Streaming, and Dataflow Birds of a Feather 1600 - 1730 Tim Spann, Dan Chaffelson, Purnima Reddy Kuchikulla, Andy LoPresto 129-130 Addressing Challenges with IoT Edge Management Thursday 3/21 1100 - 1140 Dinesh Chandrasekhar 127-128 Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi 1150 - 1230 Andy LoPresto 127-128 Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine Data 1400 - 1440 Tim Spann 129-130 BYOP: Custom Processor Development with Apache NiFi 1600 - 1640 Andy LoPresto 131-132 Apache Deep Learning 201 1650 - 1730 Tim Spann 122-123 Platform for the Research and Analysis of Cybernetic Threats 1650 - 1730 Monica Franceschini 124-125
  72. 72. © Hortonworks Inc. 2011–2019. All rights reserved;72 Thank you alopresto@hortonworks.com | alopresto@apache.org | @yolopey github.com/alopresto/slides

×