What is “dataflow”? It is the process and tooling around gathering necessary information and getting it into a useful form to make insights available. Dataflow needs change rapidly: what was noise yesterday may be crucial data today, an API endpoint changes, or a service switches from producing CSV to JSON or Avro. In addition, developers may need to design a flow in a sandbox and deploy it to QA or production, and those database passwords aren’t the same (hopefully). Learn about Apache NiFi, a robust and secure framework for dataflow development and monitoring.
Abstract: Identifying, collecting, securing, filtering, prioritizing, transforming, and transporting abstract data is a challenge faced by every organization. Apache NiFi and MiNiFi allow developers to create and refine dataflows with ease and ensure that their critical content is routed, transformed, validated, and delivered across global networks. Learn how the framework enables rapid development of flows, live monitoring and auditing, and data protection and sharing. From IoT and machine interaction to log collection, NiFi can scale to meet the needs of your organization. Able to handle both small event messages and “big data” on the scale of terabytes per day, NiFi provides a platform that lets engineers and non-technical domain experts collaborate to solve the ingest and storage problems that have plagued enterprises.
Expected prior knowledge / intended audience: Developers and dataflow managers who want to understand and solve their dataflow problems. No prior experience designing or modifying dataflows is required.
Takeaways: Attendees will gain an understanding of dataflow concepts, data management processes, and flow management (including versioning, rollbacks, promotion between deployment environments, and various backing implementations).
Current uses: I am a committer and PMC member for the Apache NiFi, MiNiFi, and NiFi Registry projects. I help numerous users deploy these tools to collect data from an incredibly diverse array of endpoints; aggregate, prioritize, filter, transform, and secure that data; and generate actionable insight from it. Current users of these platforms include many Fortune 100 companies, governments, startups, and individuals across fields such as telecommunications, finance, healthcare, automotive, aerospace, and oil & gas, with use cases including fraud detection, logistics management, supply chain management, machine learning, IoT gateways, connected vehicles, and smart grids.
Speaker: Andy LoPresto, Sr. Member of Technical Staff, Hortonworks