© 2016 Mesosphere, Inc. All Rights Reserved.
SCALABLE TIME SERIES AND STREAM PROCESSING FOR IOT APPLICATIONS
Michael Hausenblas, Developer & Cloud Advocate | 2016-01-16
LET'S TALK ABOUT WORKLOADS* …
*) kudos to Timothy St. Clair, @timothysc
Workload types: batch (e.g. MapReduce), streaming, PaaS
MESSAGE QUEUES & ROUTERS
• Apache Kafka
• ØMQ, RabbitMQ, Disque (Redis-based), etc.
• fluentd, Logstash, Flume
• Akka streams
• cloud-only: AWS SQS, Google Cloud Pub/Sub
• see also queues.io
APACHE KAFKA
• High-throughput, distributed, persistent publish-subscribe messaging system
• Originates from LinkedIn
• Typically used as buffer/de-coupling layer in online stream processing
kafka.apache.org
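To make the buffer/de-coupling role more concrete, here is a minimal in-memory sketch in Python. It is not the Kafka client API: `TopicLog`, `publish`, and `poll` are hypothetical names, and persistence is only simulated by an append-only list. The point it tries to show is that producers append to a log while each consumer group tracks its own read offset, so a slow consumer does not hold back the producer or other consumers.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log with per-consumer-group offsets (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._messages = []               # retained messages, in arrival order
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, message):
        """Append a message and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, group, max_messages=10):
        """Return up to max_messages unread messages for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._messages[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for reading in [21.5, 21.7, 22.0]:
    log.publish({"sensor": "s1", "temp": reading})

fast = log.poll("dashboard")                 # this group reads everything available
slow = log.poll("archiver", max_messages=1)  # this group reads one message at a time
```

Because messages stay in the log after being read, the `archiver` group can catch up later at its own pace.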
FLUENTD
www.fluentd.org
STREAM PROCESSING PLATFORMS
• Apache Storm
• Apache Spark
• Apache Samza
• Apache Flink
• Concord
• cloud-only: AWS Kinesis, Google Cloud Dataflow
• see also my webinar on stream processing
APACHE STORM
• Distributed, fault-tolerant stream-processing platform
• Guaranteed message processing (replaying messages on failure)
• Concepts: tuples, streams, spouts, bolts, topologies
storm.apache.org
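The concepts above can be sketched in plain Python. This is not Storm's API (Storm topologies are normally written against its JVM interfaces); `SensorSpout`, `FlakyAverageBolt`, and `run_topology` are hypothetical names chosen for illustration. The sketch shows a spout emitting tuples into a stream, a bolt consuming them, and a failed tuple being replayed by the spout.

```python
class SensorSpout:
    """Emits (message id, tuple) pairs and replays any tuple reported as failed."""

    def __init__(self, readings):
        self._pending = list(enumerate(readings))
        self.emitted = 0

    def next_tuple(self):
        if not self._pending:
            return None
        self.emitted += 1
        return self._pending.pop(0)

    def fail(self, msg_id, tup):
        # Put the tuple back so it is emitted again later.
        self._pending.append((msg_id, tup))


class FlakyAverageBolt:
    """Keeps a running average; fails the first attempt at one tuple to force a replay."""

    def __init__(self, fail_once_on):
        self._fail_once_on = fail_once_on
        self.total = 0.0
        self.count = 0

    def execute(self, msg_id, tup):
        if msg_id == self._fail_once_on:
            self._fail_once_on = None
            raise RuntimeError("simulated failure")
        self.total += tup[1]
        self.count += 1


def run_topology(spout, bolt):
    """Wire the spout to the bolt and drain the stream, replaying failed tuples."""
    while True:
        item = spout.next_tuple()
        if item is None:
            break
        msg_id, tup = item
        try:
            bolt.execute(msg_id, tup)
        except RuntimeError:
            spout.fail(msg_id, tup)


spout = SensorSpout([("s1", 20.0), ("s1", 22.0), ("s1", 24.0)])
bolt = FlakyAverageBolt(fail_once_on=1)
run_topology(spout, bolt)
average = bolt.total / bolt.count
```

In this toy the bolt fails before it updates its state, so the replay does not double count. With replay-based guarantees in general, a tuple may be processed more than once, so bolts with side effects usually need to be idempotent.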
APACHE SPARK
spark.apache.org
• Libraries on top of Spark core (RDD): Spark SQL, Spark Streaming, MLlib (machine learning), GraphX (graph processing)
• Cluster managers: Standalone, YARN, Mesos
• Storage: filesystem (local, HDFS, S3) or data store (HBase, Cassandra, Elasticsearch, etc.)
TIME SERIES DATASTORES
• InfluxDB
• OpenTSDB
• KairosDB
• Prometheus
• see also iot-a.info
OPENTSDB
• Distributed time series database on top of HBase
• Store, index, query & plot metrics
• Extremely scalable
• Low-level monitoring
opentsdb.net
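As an illustration of what "storing a metric" looks like, the following Python sketch formats a data point for OpenTSDB's telnet-style `put` command (metric name, Unix timestamp, value, then one or more `key=value` tags). The helper name `opentsdb_put_line`, the metric `iot.temperature`, and the tag values are assumptions made up for this example; sending the line to a server is left out.

```python
def opentsdb_put_line(metric, timestamp, value, tags):
    """Format one data point as a line for OpenTSDB's telnet-style `put` command."""
    if not tags:
        # OpenTSDB expects every data point to carry at least one tag.
        raise ValueError("at least one tag is required")
    tag_part = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"put {metric} {int(timestamp)} {value} {tag_part}"


line = opentsdb_put_line(
    "iot.temperature",   # hypothetical metric name
    1452902400,          # Unix timestamp in seconds
    21.5,
    {"device": "sensor-01", "site": "dublin"},
)
```

Tags such as `device` and `site` are what make the series indexable and queryable per device or per location.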
INFLUXDB
• Time series database written in Go, with no external dependencies
• SQLish query language (incl. regex, fan out)
• Single-node or Raft-based distributed mode
influxdb.com
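A rough sketch of writing and querying a point, assuming the line protocol introduced with InfluxDB 0.9 (`measurement,tag=value field=value timestamp`). The helper `influx_line`, the measurement `temperature`, and the tag and field names are hypothetical. The helper handles float fields only and does no escaping, and the query string is shown for illustration without being executed.

```python
def influx_line(measurement, tags, fields, timestamp_ns):
    """Format one point in InfluxDB line protocol (float fields only, no escaping)."""
    tag_part = "".join(f",{key}={val}" for key, val in sorted(tags.items()))
    field_part = ",".join(f"{key}={float(val)}" for key, val in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


point = influx_line(
    "temperature",               # hypothetical measurement name
    {"device": "sensor-01"},     # tags are indexed
    {"value": 21.5},             # fields hold the actual readings
    1452902400000000000,         # timestamp in nanoseconds
)

# An SQL-like query using a regex to select every measurement starting with "temp".
QUERY = "SELECT MEAN(value) FROM /^temp.*/ WHERE time > now() - 1h GROUP BY time(10m)"
```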
CHALLENGES
• Setup and operation of components
• Elasticity: static vs. dynamic partitioning
• Efficient usage of resources (TCO)
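The difference between static and dynamic partitioning can be illustrated with a small back-of-the-envelope calculation. All numbers below are invented for the example and are not measurements. With static partitioning each service gets its own sub-cluster sized for its own peak; with dynamic partitioning one shared pool is sized for the peak of the combined load.

```python
# Hypothetical demand (in nodes) per service over four time slots.
demand = {
    "kafka":     [4, 4, 4, 4],
    "spark":     [1, 8, 1, 1],
    "webserver": [6, 2, 2, 6],
}


def static_capacity(demand):
    """Each service gets its own partition, sized for that service's peak."""
    return sum(max(series) for series in demand.values())


def dynamic_capacity(demand):
    """One shared pool, sized for the peak of the combined load."""
    return max(sum(slot) for slot in zip(*demand.values()))


def utilization(demand, capacity):
    """Average number of nodes in use divided by the number of nodes provisioned."""
    slots = len(next(iter(demand.values())))
    used = sum(sum(series) for series in demand.values())
    return used / (capacity * slots)


static_nodes = static_capacity(demand)    # 4 + 8 + 6 nodes
dynamic_nodes = dynamic_capacity(demand)  # busiest slot of the combined load
static_util = utilization(demand, static_nodes)
dynamic_util = utilization(demand, dynamic_nodes)
```

Because the services in this made-up scenario peak at different times, the shared pool needs fewer nodes and runs at higher average utilization.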
MEET THE DATACENTER OPERATING SYSTEM (DCOS)
LOCAL OS VS. DISTRIBUTED OS
http://bitly.com/os-vs-dcos
DCOS IS A DISTRIBUTED OPERATING SYSTEM
• local OS per node (+container enabled)
• scheduling (long-lived, batch)
• networking
• service discovery
• stateful services
• security
• monitoring, logging, debugging
BENEFITS
DCOS
• Run stateless services such as web servers or app servers and Big Data services like Kafka, Spark, or Cassandra together on one cluster
• Dynamic partitioning of your cluster, depending on your business requirements
• Increased utilization (10% → 80%++)
https://mesosphere.com/blog/2015/11/18/dcos-time-series-demo
https://github.com/mesosphere/time-series-demo
Q & A
• @mhausenblas
• mhausenblas.info
• @mesosphere
• mesosphere.io/product
• mesosphere.com/infinity