2. @zepouet#InfluxDB
:: InfluxDB :: Time Series ::
• About Me
• What is a time series?
• State of the Art in 2015
• Why yet another product for time series?
• Live Demo
• Q/A
What do we have for storage?
• At the moment, we have :
• Graphite
• OpenTSDB (events, Hadoop, HBase…)
• KairosDB (events, rewrite of OpenTSDB)
• Ganglia (more common in Big Data/Hadoop environments)
• And others…
What do we have for collection?
• At the moment, we have :
• CollectD
• Sensu
• DropWizard/Metrics
• JMXTrans
• Jolokia
Because in 2015, we need:
• A product that is simple to install and manage
• To store millions of points (IoT is here)
• Native HTTP support (JSON)
• A design built around an API
• Automatic removal of old data
• Easy scalability: cloud is a buzzword
Feedback
• Data volume (per sensor):
• 1 event / sensor / minute
• 1 × 60 × 24 = 1,440 events per day
• 43,200 events per month (30 days)
• 518,400 events per year (12 × 30 days)
• First mistake: using MySQL
• Second mistake: a bad data-modeling pattern in InfluxDB
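The per-sensor volume figures can be double-checked with a few lines of Python. This sketch assumes 30-day months and 12 months per year (the convention that yields 518,400 per year); a 365-day year gives 525,600. The fleet of 1,000 sensors is a hypothetical number, used only to show how quickly totals reach the millions.

```python
# Rate taken from the slide: one event per sensor per minute.
EVENTS_PER_MINUTE = 1

# Hypothetical fleet size, for illustration only.
SENSORS = 1_000


def volume_per_sensor(events_per_minute: int = EVENTS_PER_MINUTE) -> dict:
    """Return event counts per sensor, assuming 30-day months and 12 months/year."""
    per_day = events_per_minute * 60 * 24
    per_month = per_day * 30           # assumption: 30-day month
    per_year = per_month * 12          # 12 x 30 = 360 days
    per_calendar_year = per_day * 365  # exact 365-day year, for comparison
    return {
        "day": per_day,
        "month": per_month,
        "year": per_year,
        "calendar_year": per_calendar_year,
    }


if __name__ == "__main__":
    for period, count in volume_per_sensor().items():
        print(f"{period:>13}: {count:>9,} per sensor, "
              f"{count * SENSORS:>13,} for {SENSORS:,} sensors")
```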
InfluxDB :: design goals
• Simple to install and manage, thanks to Go.
• No external dependencies such as ZooKeeper or Hadoop.
• HTTP(s) interface for reading and writing data.
• Horizontally scalable.
• On disk and in memory. Most data is cold.
• Compute percentiles and other functions on the fly.
• Downsample data over different time windows.
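The HTTP interface means a write is just a JSON POST. The sketch below only builds the request with the standard library; it does not send it. It assumes the InfluxDB 0.8 write format as I recall it (POST to `/db/<database>/series?u=<user>&p=<password>` with a JSON list of `name`/`columns`/`points` objects), plus port 8086 and `root`/`root` credentials; treat all of these as assumptions to verify against the 0.8 HTTP API docs. The function name `build_write_request` and the series/column names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_write_request(host, database, user, password, series_name, columns, points):
    """Build (but do not send) a write request in the assumed InfluxDB 0.8 JSON style.

    Assumed body shape: [{"name": ..., "columns": [...], "points": [[...], ...]}]
    """
    body = [{"name": series_name, "columns": columns, "points": points}]
    query = urllib.parse.urlencode({"u": user, "p": password})
    url = f"http://{host}/db/{urllib.parse.quote(database)}/series?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_write_request(
        "localhost:8086", "sensors", "root", "root",
        "arduino.temperature", ["value", "sensor"], [[21.5, "dht11"], [22.0, "dht22"]],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Against a running server you would send it with: urllib.request.urlopen(req)
```

Leaving out a `time` column keeps the example independent of the server's time-precision setting.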
InfluxDB :: installing
• MacOS : $ brew install influxdb
• Debian : $ sudo dpkg -i influxdb_latest_amd64.deb
• CentOS : $ sudo rpm -ivh influxdb-latest-1.x86_64.rpm
• Docker : $ docker run tutum/influxdb
• ARM and Windows support coming soon
InfluxDB :: design
• Database (like in MySQL, PostgreSQL…)
• Time series (kind of like tables, with a time, a sequence number and columns)
• A time series is made of points or events (kind of like rows)
• The primary index is always time
• Null values are not stored
• You can have millions of series
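The data model above can be pictured as a toy in-memory structure: a database holds named series, each series keeps its points ordered by time (the only index) with a sequence number to break ties, and null columns are simply dropped. This is an illustration of the concepts, not how InfluxDB actually stores data; the class names `Database`, `Series` and `Point` are hypothetical.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass(order=True)
class Point:
    time: int                               # primary index: points sort by time first
    seq: int                                # sequence number breaks ties on equal times
    values: dict = field(compare=False)     # column -> value, with nulls removed


@dataclass
class Series:
    name: str
    points: list = field(default_factory=list)
    _next_seq: int = 0

    def write(self, time: int, **columns):
        # Null values are not stored: drop None columns before keeping the point.
        values = {k: v for k, v in columns.items() if v is not None}
        insort(self.points, Point(time, self._next_seq, values))  # stays time-ordered
        self._next_seq += 1


@dataclass
class Database:
    name: str
    series: dict = field(default_factory=dict)

    def series_for(self, name: str) -> Series:
        # Series are created on first write: there is no schema to declare.
        if name not in self.series:
            self.series[name] = Series(name)
        return self.series[name]


if __name__ == "__main__":
    db = Database("sensors")
    cpu = db.series_for("cpu.web0")
    cpu.write(3236766, value=77, unit=None)  # "unit" is null, so it is not stored
    cpu.write(3236765, value=78)             # out-of-order write is placed by time
    print([(p.time, p.values) for p in cpu.points])
```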
InfluxDB :: security
• Cluster admins
• Database admins
• Database users
• Read permissions
• only certain series
• only queries with a column having a specific value (e.g. customer_id = 32)
• Write permissions
• only certain series
• only columns having a specific value
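The fine-grained read permissions listed above combine two filters: a pattern on the series name and an optional column/value constraint such as `customer_id = 32`. The sketch below models that check in plain Python. It is a hypothetical model of the idea, not InfluxDB's implementation; `Rule`, `visible_points` and the `invoices.*` series names are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    series_pattern: str             # regex that the whole series name must match
    column: Optional[str] = None    # optional column restriction...
    value: object = None            # ...whose value must equal this

    def allows(self, series: str, point: dict) -> bool:
        if not re.fullmatch(self.series_pattern, series):
            return False
        if self.column is None:
            return True             # series-level permission only
        return point.get(self.column) == self.value


def visible_points(rules, series, points):
    """Keep only the points that at least one read rule allows."""
    return [p for p in points if any(r.allows(series, p) for r in rules)]


if __name__ == "__main__":
    rules = [Rule(r"invoices\..*", "customer_id", 32)]
    points = [{"customer_id": 32, "amount": 10}, {"customer_id": 7, "amount": 99}]
    print(visible_points(rules, "invoices.eu", points))
```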
InfluxDB :: Pitfalls
• Schemaless Warning
• Data partitioning with a single series
Time Name Host Metrics
3236765 cpu web0 78
3236765 disk_io web0 98344
3236765 load db1 5
3236765 eth_0 ldap0 8755
Data partitioning with one series per metric and host
Time Name Host Metrics
3236765 disk_io web0 98344
3236766 disk_io web0 98354
3236767 disk_io web0 98224
3236768 disk_io web0 98994
Time Name Host Metrics
3236765 eth_0 ldap0 8755
3236766 eth_0 ldap0 8721
3236767 eth_0 ldap0 8734
3236768 eth_0 ldap0 8723
Time Name Host Metrics
3236765 cpu web0 78
3236766 cpu web0 77
3236767 cpu web0 79
3236768 cpu web0 76
Time Name Host Metrics
3236765 load db1 5
3236766 load db1 6
3236767 load db1 5
3236768 load db1 7
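The split shown in the tables above, from one wide series to one narrow series per metric and host, can be sketched in Python. The rows reuse values from the slides; the `host.name` naming scheme for the resulting series is an illustrative assumption.

```python
from collections import defaultdict

# Rows of the single "wide" series from the slides: (time, name, host, metric).
rows = [
    (3236765, "cpu", "web0", 78),
    (3236765, "disk_io", "web0", 98344),
    (3236765, "load", "db1", 5),
    (3236765, "eth_0", "ldap0", 8755),
    (3236766, "cpu", "web0", 77),
]


def split_series(rows):
    """Split one wide series into one narrow series per (host, name) pair.

    Naming each series "host.name" is an assumption made for this example.
    """
    series = defaultdict(list)
    for time, name, host, metric in rows:
        series[f"{host}.{name}"].append((time, metric))
    for points in series.values():
        points.sort()  # keep each series ordered by time, its only index
    return dict(series)


if __name__ == "__main__":
    for name, points in split_series(rows).items():
        print(name, points)
```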
InfluxDB :: Why so many series?
• To take advantage of the storage engine
• Points are indexed by time, not by any other column
• Tip: it is easy to work with Grafana
InfluxDB works best with a large number of series and fewer columns in each one
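Why does a time-only index favour many narrow series? Within a time-sorted series, a time-range lookup is a binary search, while filtering on any other column means scanning every point. The Python sketch below illustrates that cost difference with `bisect`; it is a simplified analogy, not InfluxDB's storage engine. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# One series, stored as parallel lists kept sorted by time (the only index).
times = [3236765, 3236766, 3236767, 3236768]
values = [78, 77, 79, 76]


def time_range(times, values, start, end):
    """Points with start <= time <= end via binary search: O(log n + k)."""
    lo = bisect_left(times, start)
    hi = bisect_right(times, end)
    return list(zip(times[lo:hi], values[lo:hi]))


def where_value(times, values, predicate):
    """Filtering on any other column has no index to use: a full O(n) scan."""
    return [(t, v) for t, v in zip(times, values) if predicate(v)]


if __name__ == "__main__":
    print(time_range(times, values, 3236766, 3236767))
    print(where_value(times, values, lambda v: v > 77))
```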
:: Query Language
• select * from /.*/ limit 1
• select val1, val2 from serverA
• select cpu from /server.*/
• select * from /.*/ where time > now() - 1h
• select * from /.*/ where time > '2013-08-12 23:32:00'
• select * from /.*/ group by time(10m)
• select count(val) from /.*/ group by time(10m)
• select percentile(val, 95) from /.*/ group by time(10m)
• select count(distinct(val)) from /.*/
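To make `group by time(10m)` concrete, the sketch below buckets `(epoch_seconds, value)` points into 10-minute windows and computes a count and a 95th percentile per window, roughly what `select count(val), percentile(val, 95) ... group by time(10m)` returns. It uses the nearest-rank percentile, one common definition; InfluxDB's exact percentile definition may differ. Function names are hypothetical.

```python
import math
from collections import defaultdict


def group_by_time(points, interval_s):
    """Bucket (epoch_seconds, value) points into windows aligned on interval_s."""
    buckets = defaultdict(list)
    for t, v in points:
        buckets[t - t % interval_s].append(v)
    return dict(sorted(buckets.items()))


def percentile(values, p):
    """Nearest-rank percentile (one common definition; InfluxDB's may differ)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(points, interval_s=600):
    """Rough equivalent of count(val) and percentile(val, 95) per time window."""
    return {
        start: {"count": len(vs), "p95": percentile(vs, 95)}
        for start, vs in group_by_time(points, interval_s).items()
    }


if __name__ == "__main__":
    points = [(0, 5), (60, 7), (599, 9), (600, 1), (1199, 3)]
    print(summarize(points))
```

The same bucketing also gives a simple form of downsampling: replacing each window by one aggregate value yields a coarser series.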
:: Query Language
• DELETE
• delete from response_times where time < now() - 1h
• delete from /^stats.*/ where time < now() - 7d
• drop series response_times
• GROUP BY
• select count(type) from events group by time(10m);
• select count(type),type from events group by time(10m), type;
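A statement like `delete from /^stats.*/ where time < now() - 7d` combines a regex over series names with a time cutoff. The sketch below applies the same logic to a toy dict-based store; the store layout and the function `delete_older_than` are hypothetical and only mirror the semantics of the query.

```python
import re
import time


def delete_older_than(db, pattern, max_age_s, now=None):
    """Drop points older than now - max_age_s from every series matching pattern.

    db maps series name -> list of (epoch_seconds, value) tuples.
    Returns the number of points removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_s
    removed = 0
    for name, points in db.items():
        if re.search(pattern, name):
            kept = [p for p in points if p[0] >= cutoff]  # keep time >= cutoff
            removed += len(points) - len(kept)
            db[name] = kept  # replacing a value does not change the dict's size
    return removed


if __name__ == "__main__":
    store = {"stats.cpu": [(100, 1), (500, 2), (900, 3)], "response_times": [(100, 9)]}
    print(delete_older_than(store, r"^stats.*", 7 * 24 * 3600))
    print(store)
```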
:: Visualize and summarize
• Graphs
• Last 10 minutes
• Last 4 hours
• Last 24 hours
• Past week
• Past month
• All time
:: Merging :: Series
• select count(type)
from user_events merge admin_events
group by time(10m)
• select mean(value)
from merge(/.*az.1.*.cpu/)
group by time(1h)
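Merging series, as in `from user_events merge admin_events`, amounts to a k-way merge of time-sorted streams followed by the usual grouping. The sketch below shows this with `heapq.merge`; it illustrates the idea and makes no claim about InfluxDB internals. The sample events and function names are hypothetical.

```python
import heapq


def merge_series(*series):
    """Interleave several time-sorted series into one stream ordered by time.

    Each series is a list of (time, value) tuples already sorted by time;
    heapq.merge performs a lazy k-way merge over them.
    """
    return list(heapq.merge(*series, key=lambda point: point[0]))


def count_by_time(points, interval_s=600):
    """count(...) group by time(interval): number of points per window."""
    counts = {}
    for t, _ in points:
        start = t - t % interval_s
        counts[start] = counts.get(start, 0) + 1
    return counts


if __name__ == "__main__":
    user_events = [(0, "login"), (700, "follow")]
    admin_events = [(30, "ban"), (650, "login")]
    merged = merge_series(user_events, admin_events)
    print(merged)
    print(count_by_time(merged))
```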
:: Joining :: Series
• select hosta.value + hostb.value
from cpu_load as hosta inner join cpu_load as hostb
where hosta.host = 'hosta.influxdb.org'
and hostb.host = 'hostb.influxdb.org';
• select errors_per_minute.value / page_views_per_minute.value
from errors_per_minute inner join page_views_per_minute
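The second join above computes an error rate by pairing points from two series. The Python sketch below does the same, under the simplifying assumption that points pair up only when their timestamps match exactly; InfluxDB 0.8 may align points differently, so treat this as an illustration of the idea. Function names and sample data are hypothetical.

```python
def inner_join(left, right):
    """Pair points from two series that share the same timestamp.

    Assumption: exact timestamp match; points without a partner are dropped.
    """
    right_by_time = dict(right)
    return [(t, lv, right_by_time[t]) for t, lv in left if t in right_by_time]


def error_rate(errors, page_views):
    """errors_per_minute.value / page_views_per_minute.value per matching minute.

    Minutes with zero page views are skipped to avoid dividing by zero.
    """
    return [(t, e / v) for t, e, v in inner_join(errors, page_views) if v]


if __name__ == "__main__":
    errors_per_minute = [(0, 2), (60, 0), (120, 5)]
    page_views_per_minute = [(0, 100), (60, 50), (180, 10)]
    print(error_rate(errors_per_minute, page_views_per_minute))
```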
:: Naming Strategy :: 0.8
• Tag versus value
• Rule:
<tagName>.<tagValue>.serieName
• Examples:
arduino.uno.shield.ethernet.sensor.dht11.temperature
arduino.uno.shield.wifi.sensor.dht22.humidity
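With the `<tagName>.<tagValue>.serieName` rule, tags live inside the series name, so building and parsing names is plain string work. The sketch assumes every tag name and value is a single dot-free token and that the last token is the series name; `parse_series_name` and `build_series_name` are hypothetical helpers.

```python
def parse_series_name(name):
    """Split '<tagName>.<tagValue>...serieName' into (tags, series).

    Assumes tag names and values contain no dots and the last token is the series.
    """
    *pairs, series = name.split(".")
    if len(pairs) % 2:
        raise ValueError(f"odd number of tag tokens in {name!r}")
    tags = dict(zip(pairs[0::2], pairs[1::2]))
    return tags, series


def build_series_name(tags, series):
    """Inverse of parse_series_name: join tag pairs, then the series name."""
    return ".".join([f"{k}.{v}" for k, v in tags.items()] + [series])


if __name__ == "__main__":
    tags, series = parse_series_name("arduino.uno.shield.wifi.sensor.dht22.humidity")
    print(tags, series)
    print(build_series_name(tags, series))
```

Because tags are encoded positionally, a regex such as `/.*sensor\.dht22\..*/` can then select every series for one sensor model.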