From Device to Data Center to Insights
Architectural Considerations for the Internet of Anything
P. Taylor Goetz, Hortonworks
@ptgoetz
About Me
• Tech Staff @ Hortonworks
• PMC Chair, Apache Storm
• ASF Member
• PMC, Apache Incubator, Apache Arrow, Apache
Kylin, Apache Apex
• Mentor/PPMC, Apache Eagle (Incubating), Apache
Mynewt (Incubating), Apache Metron (Incubating),
Apache Gossip (Incubating)
26 billion IoT devices by 2020
-Gartner
http://www.gartner.com/newsroom/id/2636073
IPv4 Address Space: ~4.3 billion (2^32)
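For a sense of scale, the comparison can be worked out directly. This is a small sketch: the 26 billion figure is the Gartner forecast quoted above, and the function name is only illustrative.

```python
# IPv4 addresses are 32 bits wide, so the whole space is 2**32 addresses.
IPV4_ADDRESSES = 2 ** 32

# Gartner forecast for 2020, as quoted in the deck.
FORECAST_DEVICES = 26_000_000_000


def devices_per_address(devices: int = FORECAST_DEVICES) -> float:
    """Return how many forecast devices exist per available IPv4 address."""
    return devices / IPV4_ADDRESSES
```

The ratio comes out at roughly six devices per IPv4 address, which is one reason IPv6-based protocols show up later in the deck.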
IoT Growth
• Everyone here should know IoT is huge
• Sensors, Phones, Connected Cars, Wearables, Software-as-a-
Sensor, ...
• Cuts across virtually all industries
IoT Architecture
Key Architectural Tiers
• Origin: Devices and Data Sources
• Transport: Orchestrating Bi-Directional Data Flow Between Sources
• Analytics: Analysis of Unbounded (Streaming) and Bounded
(Batch) Data, and Acting in Response
Origin Tier
Birthplace of IoT Data
Origin Tier
• Where data is born, but also a destination
• Sensors and Devices
• Constrained Hubs/Gateways
Origin Tier
Devices are getting smaller, cheaper, and increasingly network
enabled.
Examples:
• RaspberryPi ($35, Full OS)
• ESP8266 (<$5 WiFi-enabled microcontroller)
Origin Tier
Devices in the Origin Tier both transmit and receive data.
• Command and Control
• Actuators (interaction with the physical environment)
• End user alerts and notifications
IoT Protocol Considerations
IoT Protocol Considerations
• Device-Device / Device-Gateway Communication
• Radio Frequency Protocols
• IP-based Protocols
IoT Protocol Considerations
Radio Frequency Protocols
• Typically for very resource-constrained devices (Ex: Wireless
sensors in a home security system)
• Usually involve an intermediary hub/gateway as a protocol bridge
(Ex: Main panel in a home security system)
• Short range
• Low Power
Radio Frequency Protocols
ZigBee
• Intended for low power applications (~2 yr. battery life)
• Low data rates
• Simpler and less expensive than other WPANs like Bluetooth
Radio Frequency Protocols
ZigBee
• Range: 10–100 meters LOS (between nodes, but messages can
hop in a mesh network)
• Data Rate: 250 kbit/s
• Supports Star, Tree, and Mesh network topologies
• Requires a coordinator device for every network (usually the hub/
gateway)
Radio Frequency Protocols
Z-Wave
• Targets home automation
• Low power/Low data rate
• Proprietary
• Sole chip vendor
Radio Frequency Protocols
Z-Wave
• Range: ~30 meters LOS (between nodes, but messages can hop)
• Data Rate: 100 kbit/s
• Form source-routed mesh-networks (can route around failures/obstacles)
• Devices must be paired
• Requires a primary controller (e.g. the hub/gateway)
• Max 232 devices per network (but networks can be bridged)
Radio Frequency Protocols
Bluetooth/Bluetooth LE
• Targets wireless computer and device accessories
• High data rates
• Do not form routed networks like Zigbee and Z-Wave
• Usually one host to many device pairing
• Range: 0.5m (Class 4) - 100m (Class 1)
• Data Rate: 1 Mbit/s - 24 Mbit/s
Radio Frequency Protocols
Thread
• New wireless protocol introduced by Nest (Google/Alphabet), Samsung, ARM, Qualcomm
• Built on top of the same (IEEE 802.15.4) specification as ZigBee
• IPv6-based
• Mesh network with hops supported
• ~250 devices per network
• Very low power (purported years of operation on a single AA with deep sleep modes)
• Very new/unsure future — WiFi, Bluetooth, etc. already ubiquitous
IoT Protocol Considerations
IP-Based Protocols
• Require a full IP stack
• Higher power consumption
• Longer range (e.g. WiFi)
IP-Based Protocols
CoAP - Constrained Application Protocol
• Designed to be used on microcontrollers with as little as 10 KB of
memory.
• Simple request/response protocol
• Much like HTTP but based on UDP
• Based on the REST model (GET, PUT, POST, DELETE)
• Strong security via DTLS (Datagram Transport Layer Security)
IP-Based Protocols
CoAP - Constrained Application Protocol
• Simple 4-byte header
• Subset of MIME types and HTTP response codes
• Data model agnostic
• one-to-one
• Transport (UDP) <- Base Messaging (Simple Confirmable/Non-Confirmable
message transfer) <- REST Semantics
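To make the 4-byte header concrete, here is a minimal sketch of packing and unpacking it. The field layout (2-bit version, 2-bit type, 4-bit token length, 8-bit code, 16-bit message ID) follows RFC 7252. The function names are illustrative and not taken from any CoAP library, and options, tokens and payload are left out.

```python
import struct

COAP_VERSION = 1

# Message types defined by RFC 7252.
CON, NON, ACK, RST = 0, 1, 2, 3

# Request method codes (class 0): 0.01 GET, 0.02 POST, 0.03 PUT, 0.04 DELETE.
GET, POST, PUT, DELETE = 1, 2, 3, 4


def pack_header(msg_type: int, code: int, message_id: int, token_len: int = 0) -> bytes:
    """Build the fixed 4-byte CoAP header (no token, options, or payload)."""
    if not 0 <= token_len <= 8:
        raise ValueError("token length must be between 0 and 8 bytes")
    first_byte = (COAP_VERSION << 6) | (msg_type << 4) | token_len
    # "!BBH": network byte order, two unsigned bytes, one unsigned 16-bit int.
    return struct.pack("!BBH", first_byte, code, message_id)


def unpack_header(data: bytes) -> dict:
    """Decode the first 4 bytes of a CoAP message into its header fields."""
    first_byte, code, message_id = struct.unpack("!BBH", data[:4])
    return {
        "version": first_byte >> 6,
        "type": (first_byte >> 4) & 0x3,
        "token_len": first_byte & 0xF,
        "code": code,
        "message_id": message_id,
    }
```

A Confirmable GET with message ID 0x1234 packs to the bytes 40 01 12 34; a Non-Confirmable message differs only in the type bits of the first byte.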
IP-Based Protocols
MQTT - Message Queue Telemetry Transport
• Pub/Sub messaging protocol
• Requires a broker (though brokers can be lightweight)
• many-to-many broadcast
IP-Based Protocols
MQTT - Message Queue Telemetry Transport
• Message == Topic + Payload
• Topics: users/ptgoetz/office/thermostat
• Topic wildcards:
• Single level (+): users/ptgoetz/+/thermostat
• Multi-level (#): users/ptgoetz/office/#
• Payload: Just a bunch of bytes (you define the schema)
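The wildcard rules can be sketched as a small matching function. This is an illustration of the semantics rather than code from any MQTT broker or client; the name `topic_matches` is hypothetical, and special handling of topics that begin with `$` is ignored.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level. It matches
            # the parent level itself and any number of levels below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # "+" matches exactly one level; anything else must be equal.
            return False

    # Without a trailing "#", both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)
```

With the topics from the slide, `users/ptgoetz/+/thermostat` matches `users/ptgoetz/office/thermostat`, and `users/ptgoetz/office/#` matches everything under the office level.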
IP-Based Protocols
MQTT - Message Queue Telemetry Transport
• Delivery guarantees (QoS):
• 0: At-most-once
• 1: At-least-once
• 2: Exactly-once
• Last will and testament (when a device goes offline)
• Security via SSL/TLS
Apache Mynewt (incubating)
• Real-time, modular OS for IoT devices
• Designed for use in devices with power, memory and
storage constraints
• Support for many ARM Cortex-M based boards
(including Arduino)
• HAL for unified access to MCU features
• Connectivity with Bluetooth LE
• WiFi, CoAP, and Thread support (roadmap)
• Remote Firmware Upgrades
• Command-line tools for package management
Transport Tier
Data Flow From Device to Data Center
Transport Tier
• Connecting Edge Devices:
• To and from the Analytics Tier (data center)
• To and from one another (inter-device communication)
• Bridging Protocols:
• e.g. WPAN to IP
• Collecting/Transforming/Enriching Data in Motion
Apache NiFi
Apache NiFi
• Data flow orchestration tool
• Guaranteed Delivery
• Data provenance (important in the Analytics
Tier)
• Backpressure with release
• Flow-specific QoS
• Web-based UI for editing data flows
• Data flows modifiable at runtime
• Supports bi-directional data flows
• Integrates with just about any system
Apache NiFi
Basic Concepts
• Flow File: Unit of user data with associated
key-value metadata
• Processor: Component that creates, sends,
receives, transforms, or routes Flow Files
• Connection: Acts as the link between
processors.
• Flow Controller: Brokers the exchange of data
between processors
• Process Group: Set of Processors and
Connections with Input/Output ports. New
components can be created by composition.
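To illustrate how these concepts relate, here is a toy model in Python. It is not NiFi's API (NiFi processors are written in Java against its own interfaces); every class and function name is an assumption made for the sketch, and the back-pressure rule is reduced to a simple queue-size threshold.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Optional


@dataclass
class FlowFile:
    """Unit of user data plus key-value metadata (attributes)."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


class Connection:
    """Queue linking two processors, with a simple back-pressure threshold."""

    def __init__(self, max_queued: int = 2) -> None:
        self.queue: Deque[FlowFile] = deque()
        self.max_queued = max_queued

    def is_full(self) -> bool:
        return len(self.queue) >= self.max_queued

    def offer(self, flow_file: FlowFile) -> bool:
        """Enqueue a FlowFile; return False if the threshold is reached."""
        if self.is_full():
            return False
        self.queue.append(flow_file)
        return True

    def poll(self) -> Optional[FlowFile]:
        return self.queue.popleft() if self.queue else None


class Processor:
    """Applies a transform to FlowFiles moving between two connections."""

    def __init__(self, name: str, transform: Callable[[FlowFile], FlowFile]) -> None:
        self.name = name
        self.transform = transform

    def run_once(self, source: Connection, sink: Connection) -> bool:
        """Move one FlowFile from source to sink; False if nothing moved."""
        if sink.is_full():
            # Back pressure: leave data queued upstream until the sink drains.
            return False
        flow_file = source.poll()
        if flow_file is None:
            return False
        sink.offer(self.transform(flow_file))
        return True


def tag_device(flow_file: FlowFile) -> FlowFile:
    """Example transform: add a (hypothetical) device-type attribute."""
    return FlowFile(flow_file.content, {**flow_file.attributes, "device.type": "thermostat"})
```

When the downstream connection is full, `run_once` refuses to pull more data, and work resumes once the downstream queue drains. That is the "backpressure with release" behaviour in miniature.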
Apache NiFi MiNiFi
• Supplement to NiFi for constrained
devices/environments
• More suitable for edge devices
• Small footprint
• Designed to collect data near where
it originates and integrate with NiFi
Apache NiFi
For more information:
• https://nifi.apache.org
Some of the best technical
documentation I’ve ever seen:
• https://nifi.apache.org/docs.html
Analytics Tier
Acting on Insights
Analytics Tier
• Where IoT data often (but not always) intersects with Big Data
platforms and Cloud Computing
• Vertical scaling may suffice
Analytics Tier
• Many, many options…
• [insert your definition of Hadoop here]
Analytics Tier
Key Platform Considerations:
• Unbounded (Stream) data processing frequently necessary
• Apache Storm, Apache Flink, etc.
• Bounded (Batch) data processing frequently necessary
• e.g. Training machine learning models, etc.
• Apache Hadoop M/R, Apache Flink, Apache Spark
• Time Series DB a common requirement
• Apache HBase, Apache Cassandra, etc.
Analytics Tier
Key Platform Considerations:
• Latency matters for many use cases
• Latency can add up quickly, depending on the number of “hops”
• Windowing semantics and flexibility
When?
The importance of event time(s).
What is Event Time and why is it so important?
• Event Times: Origin Time vs. Processing Time
• Ex: Airplane Mode
• Other types of Event Time:
• Enrichment Time
• Ingest Time
• Processing Time 1, 2, n…
• Exit Time (e.g. “return” events, C2, bi-directional communication)
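The airplane-mode example can be sketched in a few lines: a device records readings while offline and delivers them all at once when it reconnects. The event tuple layout and the function name below are assumptions for illustration, not part of any streaming framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (origin_time_s, processing_time_s, value)
Event = Tuple[int, int, float]


def tumbling_windows(
    events: Iterable[Event], size_s: int, use_origin_time: bool = True
) -> Dict[int, List[float]]:
    """Group values into fixed-size windows keyed by window start time."""
    windows: Dict[int, List[float]] = defaultdict(list)
    for origin_s, processing_s, value in events:
        timestamp = origin_s if use_origin_time else processing_s
        # Align the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % size_s)
        windows[window_start].append(value)
    return dict(windows)


# Readings taken at t=10s, 20s and 70s, all delivered around t=130s.
OFFLINE_READINGS: List[Event] = [(10, 130, 20.0), (20, 131, 20.5), (70, 132, 21.0)]
```

Windowed by origin time with 60-second windows, the readings land in two windows (starting at 0 and 60). Windowed by processing time, all three collapse into the window starting at 120, which misrepresents when the measurements were taken.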
Choose a platform/API that gives
you the most flexibility with respect
to dealing with various event times.
Future-Proofing and Scaling
Small to Medium Scale:
• Not Big Data
• Investment in large-scale distributed system infrastructure
wouldn’t make sense.
• YAGNI (Yet…)
• Vertical scaling may suffice
Future-Proofing and Scaling
Medium to Large Scale:
• A single server is no longer cutting it
• “V”s are starting to pile up
• Need to move to a distributed architecture to scale with increasing
demand
• Your data is now Big
Apache Beam (incubating)
• Unified API for dealing with bounded/
unbounded data sources (i.e. batch/
streaming)
• One API. Multiple implementations
(execution engines). Called
“Runners” in Beamspeak.
Apache Beam (incubating)
• Major focus on Windowing and
properly dealing with Event Time(s)
• Sliding Windows, Tumbling Windows,
Session Windows, etc.
• Watermark capabilities for dealing
with late data
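As a rough illustration of what a watermark does, the sketch below tracks a heuristic watermark (the largest event time seen minus a fixed allowed delay) and flags events that arrive behind it. This is not Beam's API, and runners compute watermarks in their own ways; the class name and the fixed-delay heuristic are assumptions.

```python
from typing import Optional


class WatermarkTracker:
    """Heuristic watermark: max event time seen minus a fixed delay."""

    def __init__(self, max_delay_s: int) -> None:
        self.max_delay_s = max_delay_s
        self.max_event_time_s: Optional[int] = None

    def watermark(self) -> Optional[int]:
        """Event time before which no more data is expected, if known."""
        if self.max_event_time_s is None:
            return None
        return self.max_event_time_s - self.max_delay_s

    def observe(self, event_time_s: int) -> bool:
        """Record an event; return True if it arrived behind the watermark."""
        current = self.watermark()
        is_late = current is not None and event_time_s < current
        if self.max_event_time_s is None or event_time_s > self.max_event_time_s:
            self.max_event_time_s = event_time_s
        return is_late
```

A pipeline would then decide what to do with late events: drop them, route them to a side output, or re-open the affected window, depending on what the platform supports.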
Apache Beam (incubating)
• Runner/Execution Engine Availability
• Local runner (single machine)
• Runners for Google Cloud
Dataflow, Flink and Spark
• Others underway: Apache Storm,
Apache Apex and others
Apache Beam (incubating)
• Choose the right runner for your
current scaling and organizational
needs (you can switch later as
necessary)
• Understand the limits of different
runner implementations
• Outside of Google Cloud Dataflow, the
Flink runner is currently the most
feature-complete (this will change)
Apache Beam (incubating)
For a technical deep dive into Apache
Beam:
Apache Beam: A Unified Model for
Batch and Streaming Data
Processing
- Davor Bonaci, Google Inc.
Thursday 4:10PM, Ballroom A
Firmware, Parsers, and
Schemas
(Oh my!)
Problem: Data Formats
• Many IoT devices transmit data as a raw array of bytes
• The format of that data may be proprietary
• To be of any use it must be parsed into a machine-readable format
(i.e. Schema)
• Once parsed, you need to know the schema
Problem: Firmware Versions
• Deployed IoT devices may be running any number of versions
• Data formats may differ between firmware versions
• Multiple parsers may be necessary to accommodate different
device types and firmware versions
Solution: Parser Registry
• Allow manufacturers to supply proprietary parsers, load at runtime
• Parser API to include way to discover schema
• Tag data with device type + firmware version at the hub/gateway
• Look up associated parser when data arrives
• (This can be done in either the Transport or Analytics tier)
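A parser registry along these lines might look like the following sketch. The registry class, the device type and firmware strings, and both binary payload formats are hypothetical, invented only to show lookup by device type and firmware version.

```python
import struct
from typing import Callable, Dict, Tuple

Parser = Callable[[bytes], Dict[str, object]]


class ParserRegistry:
    """Maps (device type, firmware version) to a parser function."""

    def __init__(self) -> None:
        self._parsers: Dict[Tuple[str, str], Parser] = {}

    def register(self, device_type: str, firmware: str, parser: Parser) -> None:
        self._parsers[(device_type, firmware)] = parser

    def parse(self, device_type: str, firmware: str, payload: bytes) -> Dict[str, object]:
        """Look up the parser from the tags added at the gateway and apply it."""
        parser = self._parsers.get((device_type, firmware))
        if parser is None:
            raise LookupError(f"no parser for {device_type} firmware {firmware}")
        return parser(payload)


def parse_thermostat_v1(payload: bytes) -> Dict[str, object]:
    # Hypothetical format: big-endian int16, temperature in tenths of a degree C.
    (raw_temp,) = struct.unpack("!h", payload)
    return {"temperature_c": raw_temp / 10}


def parse_thermostat_v2(payload: bytes) -> Dict[str, object]:
    # Hypothetical newer firmware: same temperature field plus uint8 humidity.
    raw_temp, humidity = struct.unpack("!hB", payload)
    return {"temperature_c": raw_temp / 10, "humidity_pct": humidity}


registry = ParserRegistry()
registry.register("thermostat", "1.0", parse_thermostat_v1)
registry.register("thermostat", "2.0", parse_thermostat_v2)
```

The same raw bytes would be misread by the wrong parser, which is why the gateway tags each message with device type and firmware version before it travels further.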
Solution: Schema Registry
• When parsers are registered, also register the associated schema
• Downstream components (Transport/Analytics Tier) discover
schema based on metadata
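Schema discovery from metadata could be sketched as follows. The metadata keys (`device.type`, `firmware.version`), the schema representation (field name to Python type name), and the class name are all assumptions for illustration.

```python
from typing import Dict, Tuple

Schema = Dict[str, str]  # field name -> Python type name, e.g. "float"


class SchemaRegistry:
    """Stores one schema per (device type, firmware version)."""

    def __init__(self) -> None:
        self._schemas: Dict[Tuple[str, str], Schema] = {}

    def register(self, device_type: str, firmware: str, schema: Schema) -> None:
        # Copy so later changes to the caller's dict do not alter the registry.
        self._schemas[(device_type, firmware)] = dict(schema)

    def lookup(self, metadata: Dict[str, str]) -> Schema:
        """Find the schema using the tags carried alongside the data."""
        key = (metadata["device.type"], metadata["firmware.version"])
        return self._schemas[key]


def conforms(record: Dict[str, object], schema: Schema) -> bool:
    """Check that a parsed record has exactly the fields and types expected."""
    if record.keys() != schema.keys():
        return False
    return all(type(record[name]).__name__ == type_name for name, type_name in schema.items())
```

A downstream component only needs the metadata attached to a record to find out what fields to expect, without knowing which parser produced it.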
Who owns your IoT data?
Hint: It may not be you.
Who owns your data?
• Beware of 3rd-party device manufacturers
• Data is valuable, and everyone wants it
• Frequently exclusive access
Who owns your data?
• Device manufacturers may hoard data.
• Retention policies limit how long you can store the data.
• Aggregate/Derivative data okay, but what’s the definition?
Thank you!
Questions?
P. Taylor Goetz, Hortonworks
@ptgoetz

Contenu connexe

Tendances

End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017alanfgates
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveDataWorks Summit/Hadoop Summit
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseDataWorks Summit/Hadoop Summit
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieDataWorks Summit/Hadoop Summit
 
A TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoA TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoYu Liu
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceDataWorks Summit/Hadoop Summit
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveDataWorks Summit
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureDataWorks Summit
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016alanfgates
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksDataWorks Summit/Hadoop Summit
 

Tendances (20)

End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache Oozie
 
A TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoA TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with Presto
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and future
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
 

En vedette

Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesIsheeta Sanghi
 
分散システムにおけるUUID(汎用一意識別子)の利用拡大
分散システムにおけるUUID(汎用一意識別子)の利用拡大分散システムにおけるUUID(汎用一意識別子)の利用拡大
分散システムにおけるUUID(汎用一意識別子)の利用拡大Kazuki Aranami
 
ttyrecからGIFアニメを作る話
ttyrecからGIFアニメを作る話ttyrecからGIFアニメを作る話
ttyrecからGIFアニメを作る話Yoshihiro Sugi
 
Uuidはどこまでuuidか試してみた
Uuidはどこまでuuidか試してみたUuidはどこまでuuidか試してみた
Uuidはどこまでuuidか試してみたYu Yamada
 
Open Source and the Internet of Things
Open Source and the Internet of ThingsOpen Source and the Internet of Things
Open Source and the Internet of ThingsBlack Duck by Synopsys
 
IoT Open Source Integration Comparison (Kura, Node-RED, Flogo, Apache Nifi, S...
IoT Open Source Integration Comparison (Kura, Node-RED, Flogo, Apache Nifi, S...IoT Open Source Integration Comparison (Kura, Node-RED, Flogo, Apache Nifi, S...
IoT Open Source Integration Comparison (Kura, Node-RED, Flogo, Apache Nifi, S...Kai Wähner
 

En vedette (7)

Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup Slides
 
分散システムにおけるUUID(汎用一意識別子)の利用拡大
分散システムにおけるUUID(汎用一意識別子)の利用拡大分散システムにおけるUUID(汎用一意識別子)の利用拡大
分散システムにおけるUUID(汎用一意識別子)の利用拡大
 
UUID
UUIDUUID
UUID
 
ttyrecからGIFアニメを作る話
ttyrecからGIFアニメを作る話ttyrecからGIFアニメを作る話
ttyrecからGIFアニメを作る話
 
Uuidはどこまでuuidか試してみた
Uuidはどこまでuuidか試してみたUuidはどこまでuuidか試してみた
Uuidはどこまでuuidか試してみた
 
Open Source and the Internet of Things
Open Source and the Internet of ThingsOpen Source and the Internet of Things
Open Source and the Internet of Things
 
IoT Open Source Integration Comparison (Kura, Node-RED, Flogo, Apache Nifi, S...
IoT Open Source Integration Comparison (Kura, Node-RED, Flogo, Apache Nifi, S...IoT Open Source Integration Comparison (Kura, Node-RED, Flogo, Apache Nifi, S...
IoT Open Source Integration Comparison (Kura, Node-RED, Flogo, Apache Nifi, S...
 

Similaire à From Device to Data Center to Insights

ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...Altinity Ltd
 
IP Signal Distribution
IP Signal DistributionIP Signal Distribution
IP Signal DistributionrAVe [PUBS]
 
Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyserAlex Moskvin
 
Lightweight and scalable IoT Architectures with MQTT
Lightweight and scalable IoT Architectures with MQTTLightweight and scalable IoT Architectures with MQTT
Lightweight and scalable IoT Architectures with MQTTDominik Obermaier
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeTimothy Spann
 
Can a browser become an IoT Gateway?
Can a browser become an IoT Gateway?Can a browser become an IoT Gateway?
Can a browser become an IoT Gateway?Sooraj Sanker
 
Null mumbai-iot-workshop
Null mumbai-iot-workshopNull mumbai-iot-workshop
Null mumbai-iot-workshopNitesh Malviya
 
Global Azure boot camp 2015 - Microsoft IoT Solutions with Azure
Global Azure boot camp 2015 - Microsoft IoT Solutions with AzureGlobal Azure boot camp 2015 - Microsoft IoT Solutions with Azure
Global Azure boot camp 2015 - Microsoft IoT Solutions with AzureVinoth Rajagopalan
 
Web technologies: recap on TCP-IP
Web technologies: recap on TCP-IPWeb technologies: recap on TCP-IP
Web technologies: recap on TCP-IPPiero Fraternali
 
5 introduction to internet
5 introduction to internet5 introduction to internet
5 introduction to internetVedpal Yadav
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...ssuserd3a367
 
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - Optum
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - OptumUltralight data movement for IoT with SDC Edge. Guglielmo Iozzia - Optum
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - OptumData Driven Innovation
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About Jesus Rodriguez
 
How to Build a Compute Cluster
How to Build a Compute ClusterHow to Build a Compute Cluster
How to Build a Compute ClusterRamsay Key
 
09 Systems Software Programming-Network Programming.pptx
09 Systems Software Programming-Network Programming.pptx09 Systems Software Programming-Network Programming.pptx
09 Systems Software Programming-Network Programming.pptxKushalSrivastava23
 
LinkedIn's Approach to Programmable Data Center
LinkedIn's Approach to Programmable Data CenterLinkedIn's Approach to Programmable Data Center
LinkedIn's Approach to Programmable Data CenterShawn Zandi
 
IoT interoperability
IoT interoperabilityIoT interoperability
IoT interoperability1248 Ltd.
 

Similaire à From Device to Data Center to Insights (20)

ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
 
IP Signal Distribution
IP Signal DistributionIP Signal Distribution
IP Signal Distribution
 
Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyser
 
Lightweight and scalable IoT Architectures with MQTT
Lightweight and scalable IoT Architectures with MQTTLightweight and scalable IoT Architectures with MQTT
Lightweight and scalable IoT Architectures with MQTT
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
 
Can a browser become an IoT Gateway?
Can a browser become an IoT Gateway?Can a browser become an IoT Gateway?
Can a browser become an IoT Gateway?
 
Null mumbai-iot-workshop
Null mumbai-iot-workshopNull mumbai-iot-workshop
Null mumbai-iot-workshop
 
Introductionto SDN
Introductionto SDN Introductionto SDN
Introductionto SDN
 
Introduction to Software Defined Networking (SDN)
Introduction to Software Defined Networking (SDN)Introduction to Software Defined Networking (SDN)
Introduction to Software Defined Networking (SDN)
 
Global Azure boot camp 2015 - Microsoft IoT Solutions with Azure
Global Azure boot camp 2015 - Microsoft IoT Solutions with AzureGlobal Azure boot camp 2015 - Microsoft IoT Solutions with Azure
Global Azure boot camp 2015 - Microsoft IoT Solutions with Azure
 
Web technologies: recap on TCP-IP
Web technologies: recap on TCP-IPWeb technologies: recap on TCP-IP
Web technologies: recap on TCP-IP
 
5 introduction to internet
5 introduction to internet5 introduction to internet
5 introduction to internet
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - Optum
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - OptumUltralight data movement for IoT with SDC Edge. Guglielmo Iozzia - Optum
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - Optum
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
 
How to Build a Compute Cluster
How to Build a Compute ClusterHow to Build a Compute Cluster
How to Build a Compute Cluster
 
09 Systems Software Programming-Network Programming.pptx
09 Systems Software Programming-Network Programming.pptx09 Systems Software Programming-Network Programming.pptx
09 Systems Software Programming-Network Programming.pptx
 
LinkedIn's Approach to Programmable Data Center
LinkedIn's Approach to Programmable Data CenterLinkedIn's Approach to Programmable Data Center
LinkedIn's Approach to Programmable Data Center
 
ODP Presentation LinuxCon NA 2014
ODP Presentation LinuxCon NA 2014ODP Presentation LinuxCon NA 2014
ODP Presentation LinuxCon NA 2014
 
IoT interoperability
IoT interoperabilityIoT interoperability
IoT interoperability
 

Plus de DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Plus de DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
From Device to Data Center to Insights

  • 1. From Device to Data Center to Insights Architectural Considerations for the Internet of Anything P. Taylor Goetz, Hortonworks @ptgoetz
  • 2. About Me • Tech Staff @ Hortonworks • PMC Chair, Apache Storm • ASF Member • PMC, Apache Incubator, Apache Arrow, Apache Kylin, Apache Apex • Mentor/PPMC, Apache Eagle (Incubating), Apache Mynewt (Incubating), Apache Metron (Incubating), Apache Gossip (Incubating)
  • 3. 26 billion IoT devices by 2020 -Gartner http://www.gartner.com/newsroom/id/2636073
  • 4. IPv4 Address Space: ~4.3 billion (2^32)
  • 5. IoT Growth • Everyone here should know IoT is huge • Sensors, Phones, Connected Cars, Wearables, Software-as-a-Sensor, ... • Cuts across virtually all industries
  • 7. Key Architectural Tiers • Origin: Devices and Data Sources • Transport: Orchestrating Bi-Directional Data Flow Between Sources • Analytics: Analysis of Unbounded (Streaming) and Bounded (Batch) Data, and Acting in Response
  • 9. Origin Tier • Where data is born, but also a destination • Sensors and Devices • Constrained Hubs/Gateways
  • 10. Origin Tier Devices are getting smaller, cheaper, and increasingly network enabled. Examples: • RaspberryPi ($35, Full OS) • ESP8266 (<$5 WiFi-enabled microcontroller)
  • 11. Origin Tier Devices in the Origin Tier both transmit and receive data. • Command and Control • Actuators (interaction with the physical environment) • End user alerts and notifications
  • 13. IoT Protocol Considerations • Device-Device / Device-Gateway Communication • Radio Frequency Protocols • IP-based Protocols
  • 14. IoT Protocol Considerations Radio Frequency Protocols • Typically for very resource-constrained devices (Ex: Wireless sensors in a home security system) • Usually involve an intermediary hub/gateway as a protocol bridge (Ex: Main panel in a home security system) • Short range • Low Power
  • 15. Radio Frequency Protocols ZigBee • Intended for low power applications (~2 yr. battery life) • Low data rates • Simpler and less expensive than other WPANs like Bluetooth
  • 16. Radio Frequency Protocols ZigBee • Range: 10–100 meters LOS (between nodes, but messages can hop in a mesh network) • Data Rate: 250 kbit/s • Supports Star, Tree, and Mesh network topologies • Requires a coordinator device for every network (usually the hub/gateway)
  • 17. Radio Frequency Protocols Z-Wave • Targets home automation • Low power/Low data rate • Proprietary • Sole chip vendor
  • 18. Radio Frequency Protocols Z-Wave • Range: ~30 meters LOS (between nodes, but messages can hop) • Data Rate: 100kbit/s • Form source-routed mesh-networks (can route around failures/obstacles) • Devices must be paired • Requires a primary controller (e.g. the hub/gateway) • Max 232 devices per network (but networks can be bridged)
  • 19. Radio Frequency Protocols Bluetooth/Bluetooth LE • Targets wireless computer and device accessories • High data rates • Does not form routed networks like ZigBee and Z-Wave • Usually one host to many device pairing • Range: 0.5m (Class 4) - 100m (Class 1) • Data Rate: 1 Mbit/s - 24 Mbit/s
  • 20. Radio Frequency Protocols Thread • New wireless protocol introduced by Nest (Google/Alphabet), Samsung, ARM, Qualcomm • Built on top of the same (IEEE 802.15.4) specification as ZigBee • IPv6-based • Mesh network with hops supported • ~250 devices per network • Very low power (purported years of operation on a single AA with deep sleep modes) • Very new/unsure future — WiFi, Bluetooth, etc. already ubiquitous
  • 21. IoT Protocol Considerations IP-Based Protocols • Require a full IP stack • Higher power consumption • Longer range (e.g. WiFi)
  • 22. IP-Based Protocols CoAP - Constrained Application Protocol • Designed to be used on micro controllers with as little as 10k of memory. • Simple request/response protocol • Much like HTTP but based on UDP • Based on the REST model (GET, PUT, POST, DELETE) • Strong security via DTLS (Datagram Transport Layer Security)
  • 23. IP-Based Protocols CoAP - Constrained Application Protocol • Simple 4-byte header • Subset of MIME types and HTTP response codes • Data model agnostic • One-to-one • Layering: Transport (UDP) <- Base Messaging (Simple Confirmable/Non-Confirmable message transfer) <- REST Semantics
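
To make the "simple 4-byte header" concrete, here is a minimal Python sketch that packs and unpacks the fixed CoAP header. The field layout (2-bit version, 2-bit type, 4-bit token length, 8-bit code, 16-bit message ID) follows RFC 7252; the function names `pack_header` and `unpack_header` are illustrative only and do not come from the slides or from any particular CoAP library.

```python
import struct

COAP_VERSION = 1
# Message types carried in the 2-bit "T" field.
TYPES = {0: "CON", 1: "NON", 2: "ACK", 3: "RST"}


def pack_header(msg_type, token_length, code, message_id):
    """Build the fixed 4-byte CoAP header (hypothetical helper)."""
    first = (COAP_VERSION << 6) | (msg_type << 4) | token_length
    # "!" = network byte order: one byte, one byte, one unsigned 16-bit int.
    return struct.pack("!BBH", first, code, message_id)


def unpack_header(data):
    """Decode the fixed 4-byte CoAP header (hypothetical helper)."""
    first, code, message_id = struct.unpack("!BBH", data[:4])
    return {
        "version": first >> 6,
        "type": TYPES[(first >> 4) & 0x3],
        "token_length": first & 0xF,
        # Codes are written class.detail, e.g. 0.01 = GET, 2.05 = Content.
        "code": f"{code >> 5}.{code & 0x1F:02d}",
        "message_id": message_id,
    }


# A Confirmable GET request with no token and message ID 0x1234.
header = pack_header(msg_type=0, token_length=0, code=0x01, message_id=0x1234)
decoded = unpack_header(header)
```

Everything after these four bytes (token, options, payload) is variable-length and is left out of this sketch.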
  • 24. IP-Based Protocols MQTT - Message Queue Telemetry Transport • Pub/Sub messaging protocol • Requires a broker (though brokers can be lightweight) • many-to-many broadcast
  • 25. IP-Based Protocols MQTT - Message Queue Telemetry Transport • Message == Topic + Payload • Topics: users/ptgoetz/office/thermostat • Topic wildcards: • Single level (+): users/ptgoetz/+/thermostat • Multi-level (#): users/ptgoetz/office/# • Payload: Just a bunch of bytes (you define the schema)
  • 26. IP-Based Protocols MQTT - Message Queue Telemetry Transport • Delivery guarantees (QoS): • 0: At-most-once • 1: At-least-once • 2: Exactly-once • Last will and testament (when a device goes offline) • Security via SSL/TLS
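
The single-level (+) and multi-level (#) topic wildcards from the MQTT slides can be sketched as a small matching function. This is a simplified illustration, not a broker implementation: `topic_matches` is a hypothetical name, and the sketch ignores details such as the special handling of topics that begin with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a concrete topic."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level; it matches
            # the parent level and everything beneath it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        # "+" matches exactly one level; anything else must match literally.
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


office = "users/ptgoetz/office/thermostat"
single = topic_matches("users/ptgoetz/+/thermostat", office)
multi = topic_matches("users/ptgoetz/office/#", office)
other_room = topic_matches("users/ptgoetz/office/#", "users/ptgoetz/kitchen/thermostat")
```

Here `single` and `multi` are both true for the office thermostat topic, while `other_room` is false because the `office` level does not match `kitchen`.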
  • 27. Apache Mynewt (incubating) • Real-time, modular OS for IoT devices • Designed for use in devices with power, memory and storage constraints • Support for many ARM Cortex-M based boards (including Arduino) • HAL for unified access to MCU features • Connectivity with Bluetooth LE • WiFi, CoAP, and Thread support (roadmap) • Remote Firmware Upgrades • Command-line tools for package management
  • 28. Transport Tier Data Flow From Device to Data Center
  • 29. Transport Tier • Connecting Edge Devices: • To and from the Analytics Tier (data center) • To and from one another (inter-device communication) • Bridging Protocols: • e.g. WPAN to IP • Collecting/Transforming/Enriching Data in Motion
  • 31. Apache NiFi • Data flow orchestration tool • Guaranteed Delivery • Data provenance (important in the Analytics Tier) • Backpressure with release • Flow-specific QoS • Web-based UI for editing data flows • Data flows modifiable at runtime • Supports bi-directional data flows • Integrates with just about any system
  • 32. Apache NiFi Basic Concepts • Flow File: Unit of user data with associated key-value metadata • Processor: Components for creating, sending, receiving, transforming, routing, etc. Flow Files • Connection: Acts as the link between processors. • Flow Controller: Brokers the exchange of data between processors • Process Group: Set of Processors and Connections with Input/Output ports. New components can be created by composition.
  • 33. Apache NiFi MiNiFi • Supplement to NiFi for constrained devices/environments • More suitable for edge devices • Small footprint • Designed to collect data near where it originates and integrate with NiFi
  • 34. Apache NiFi For more information: • https://nifi.apache.org Some of the best technical documentation I’ve ever seen: • https://nifi.apache.org/docs.html
  • 36. Analytics Tier • Where IoT data often (but not always) intersects with Big Data platforms and Cloud Computing • Vertical scaling may suffice
  • 37. Analytics Tier • Many, many options… • [insert your definition of Hadoop here]
  • 38. Analytics Tier Key Platform Considerations: • Unbounded (Stream) data processing frequently necessary • Apache Storm, Apache Flink, etc. • Bounded (Batch) data processing frequently necessary • e.g. Training machine learning models, etc. • Apache Hadoop M/R, Apache Flink, Apache Spark • Time Series DB a common requirement • Apache HBase, Apache Cassandra, etc.
  • 39. Analytics Tier Key Platform Considerations: • Latency matters for many use cases • Latency can add up quickly, depending on the number of “hops” • Windowing semantics and flexibility
  • 40. When? The importance of event time(s).
  • 41. What is Event Time and why is it so important? • Event Times: Origin Time vs. Processing Time • Ex: Airplane Mode • Other types of Event Time: • Enrichment Time • Ingest Time • Processing Time 1, 2, n… • Exit Time (e.g. “return” events, C2, bi-directional communication)
  • 42. Choose a platform/API that gives you the most flexibility with respect to dealing with various event times.
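
A small sketch of why the choice of event time matters, using the airplane-mode example: the same events are counted in 60-second tumbling windows, once by origin time and once by ingest time. The field names `origin_time` and `ingest_time`, the timestamps, and the window size are all assumptions made up for illustration; no specific streaming platform's API is implied.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed tumbling window size


def window_start(timestamp, size=WINDOW_SECONDS):
    """Map a timestamp (in seconds) to the start of its tumbling window."""
    return timestamp - (timestamp % size)


def count_by_window(events, time_field):
    """Count events per window, keyed by the chosen notion of time."""
    counts = defaultdict(int)
    for event in events:
        counts[window_start(event[time_field])] += 1
    return dict(counts)


# Hypothetical readings; the second one was buffered on the device while it
# was in airplane mode and arrived long after it was produced.
events = [
    {"origin_time": 10, "ingest_time": 12},
    {"origin_time": 50, "ingest_time": 190},
    {"origin_time": 70, "ingest_time": 71},
    {"origin_time": 130, "ingest_time": 191},
]

by_origin = count_by_window(events, "origin_time")
by_ingest = count_by_window(events, "ingest_time")
```

Grouping by origin time places the buffered reading in the window where it actually happened, while grouping by ingest time moves it into a much later window, so the two views disagree about what happened when.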
  • 43. Future-Proofing and Scaling Small to Medium Scale: • Not Big Data • Investment in large-scale distributed system infrastructure wouldn’t make sense. • YAGNI (Yet…) • Vertical scaling may suffice
  • 44. Future-Proofing and Scaling Medium to Large Scale: • A single server is no longer cutting it • “V”s are starting to pile up • Need to move to a distributed architecture to scale with increasing demand • Your data is now Big
  • 45. Apache Beam (incubating) • Unified API for dealing with bounded/unbounded data sources (i.e. batch/streaming) • One API. Multiple implementations (execution engines). Called “Runners” in Beamspeak.
  • 46. Apache Beam (incubating) • Major focus on Windowing and properly dealing with Event Time(s) • Sliding Windows, Tumbling Windows, Session Windows, etc. • Watermark capabilities for dealing with late data
  • 47. Apache Beam (incubating) • Runner/Execution Engine Availability • Local runner (single machine) • Runners for Google Cloud Dataflow, Flink and Spark • Others underway: Apache Storm, Apache Apex and others
  • 48. Apache Beam (incubating) • Choose the right runner for your current scaling and organizational needs (you can switch later as necessary) • Understand the limits of different runner implementations • Outside of Google Cloud Dataflow, the Flink runner is currently the most feature-complete (this will change)
  • 49. Apache Beam (incubating) For a technical deep dive into Apache Beam: Apache Beam: A Unified Model for Batch and Streaming Data Processing - Davor Bonaci, Google Inc. Thursday 4:10PM, Ballroom A
  • 51. Problem: Data Formats • Many IoT devices transmit data as a raw array of bytes • The format of that data may be proprietary • To be of any use it must be parsed into a machine-readable format (i.e. Schema) • Once parsed, you need to know the schema
  • 52. Problem: Firmware Versions • Deployed IoT devices may be running any number of versions • Data formats may differ between firmware versions • Multiple parsers may be necessary to accommodate different device types and firmware versions
  • 53. Solution: Parser Registry • Allow manufacturers to supply proprietary parsers, load at runtime • Parser API to include a way to discover the schema • Tag data with device type + firmware version at the hub/gateway • Look up the associated parser when data arrives • (This can be done in either the Transport or Analytics tier)
  • 54. Solution: Schema Registry • When parsers are registered, also register the associated schema • Downstream components (Transport/Analytics Tier) discover schema based on metadata
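
The parser and schema registry idea can be sketched in a few lines of Python. This is one possible shape under stated assumptions, not the design from the talk: the `ParserRegistry` class, the `thermostat` device type, the firmware versions, and both binary payload layouts are invented for illustration.

```python
import struct
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (device type, firmware version) tag set at the hub/gateway
Parser = Callable[[bytes], dict]


class ParserRegistry:
    """Hypothetical registry keeping a parser and its schema per device/firmware."""

    def __init__(self) -> None:
        self._parsers: Dict[Key, Parser] = {}
        self._schemas: Dict[Key, dict] = {}

    def register(self, device_type: str, firmware: str, parser: Parser, schema: dict) -> None:
        # Registering a parser also registers the schema it produces.
        key = (device_type, firmware)
        self._parsers[key] = parser
        self._schemas[key] = schema

    def schema_for(self, device_type: str, firmware: str) -> dict:
        return self._schemas[(device_type, firmware)]

    def parse(self, device_type: str, firmware: str, payload: bytes) -> dict:
        return self._parsers[(device_type, firmware)](payload)


def parse_v1(payload: bytes) -> dict:
    # Assumed v1 layout: signed 16-bit temperature in tenths of a degree C.
    (temp,) = struct.unpack("!h", payload)
    return {"temperature_c": temp / 10}


def parse_v2(payload: bytes) -> dict:
    # Assumed v2 layout: v1 temperature plus one byte of relative humidity.
    temp, humidity = struct.unpack("!hB", payload)
    return {"temperature_c": temp / 10, "humidity_pct": humidity}


registry = ParserRegistry()
registry.register("thermostat", "1.0", parse_v1, {"temperature_c": "float"})
registry.register("thermostat", "2.0", parse_v2,
                  {"temperature_c": "float", "humidity_pct": "int"})

# Raw bytes arrive tagged with device type and firmware version.
reading = registry.parse("thermostat", "2.0", struct.pack("!hB", 215, 40))
schema = registry.schema_for("thermostat", "2.0")
```

Downstream components only need the (device type, firmware version) tag to find both the parser and the schema, which is what lets two firmware versions with different payload layouts coexist.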
  • 55. Who owns your IoT data? Hint: It may not be you.
  • 56. Who owns your data? • Beware of 3rd-party device manufacturers • Data is valuable, and everyone wants it • Frequently exclusive access
  • 57. Who owns your data? • Device manufacturers may hoard data. • Retention policies limit how long you can store the data. • Aggregate/Derivative data okay, but what’s the definition?
  • 58. Thank you! Questions? P. Taylor Goetz, Hortonworks @ptgoetz