4. What is InfluxDB?
Q2
Open source
Time series
Written in Go
Easy to use
Automated data retention policy
Schemaless
Client libraries available for the development
Storing large amounts of data and providing rapid query results
Developing very fast
3-15Guamaral Vasili
5. How to connect to InfluxDB?
CLI
Admin interface
6. Data Structure
Zero to many points
Measurement
Fields
Tags
Timestamp
Line protocol
Data type
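Line protocol format: <measurement>[,<tag-key>=<tag-value>...] <field-key>=<field-value>[,<field2-key>=<field2-value>...] [unix-nano-timestamp]
Data types: measurement, tag keys, tag values and field keys are String; field values are Float, Int, Boolean or String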
7. Data Structure
Measurement
Name is the description of data
Tags
if they’re commonly-queried meta data
if you plan to use them with GROUP BY()
Fields
At least one key-value field required
if you plan to use them with an InfluxQL function
if you need them to be something other than a string
Timestamp
Primary index is always time
8. Query Language
SQL-like query language
HTTP-API for writes & queries
Continuous queries
Supports some mathematical operators
Supports some functions
Supports some tools
Automated data retention policy
13. Continuous Query
Runs automatically and periodically
Syntax
Meta syntax
Query syntax
CREATE CONTINUOUS QUERY <cq_name> ON <db_name>
[RESAMPLE [EVERY <interval>] [FOR <interval>]]
BEGIN
SELECT <function>(<stuff>)[,<function>(<stuff>)]
INTO <different_measurement>
FROM <current_measurement> [WHERE <stuff>]
GROUP BY time(<interval>)[,<stuff>]
END
17. Thank you for your attention!
Do you have any questions?
Guamaral Vasili
LinkedIn: https://www.linkedin.com/in/guamaral-vasil-707393a5
GitHub: https://github.com/GuamaralVasili/influxDb
Sapienza University of Rome – DIAG – Pervasive Systems
Editor's notes
Good morning. My name is Guamaral. I am an exchange student from Mongolia. I will introduce InfluxDB.
Before I talk about InfluxDB, I would like to tell you briefly about time series data.
Time series data is a sequence of observations that are ordered in time.
For example: closing values of the stock markets,
periodically measured temperatures,
the CPU load of your computer,
and measurements of your heart rate, something like that.
This is the InfluxData platform. It is an end-to-end platform for collecting, storing, visualizing and managing time series data at scale. There are 4 main components: Telegraf, Chronograf, InfluxDB and Kapacitor.
Telegraf is used for collecting data. Chronograf is used for visualizing data.
Kapacitor is a data processing engine for InfluxDB that makes it easy to create alerts, run ETL (extract, transform and load) jobs and detect anomalies.
InfluxDB is used for storing data. It is the core component of the InfluxData platform.
What is InfluxDB?
It is an open source database designed specifically for time series data. It is also designed for high availability and I/O speed.
Written in Go.
It doesn’t require any other software to install and run.
It has a retention policy that describes how long it keeps data and how many copies of the data are stored in the cluster.
InfluxDB is a schemaless database, so it is easy to add points.
Client libraries are available for development in Java, C#, PHP, Python, Perl, JavaScript, Ruby and so on. The set of libraries is very rich.
It is mainly focused on quickly storing large amounts of incoming data and providing rapid query results on those datasets.
It is developing very fast. When I started studying InfluxDB 4 weeks ago, the latest version was 0.10, but now it is already 0.12.
We can connect to InfluxDB in 2 ways. The most common way is the command line interface. Just type influx, and it will automatically connect to InfluxDB.
When you install InfluxDB, the influx command should be available on the command line.
The second way is to connect using the admin user interface at http://localhost:8083.
The interface looks like this.
As I said before, data in InfluxDB is organized by time series. A series can have zero to many points.
A point consists of a measurement, tags, fields and a timestamp.
To write a point into InfluxDB, use the line protocol, which is a text-based format. This is the format of the line protocol.
You must specify the measurement. Tags are optional; if you add any tags, separate each one from the measurement and from the other tags with a comma. There must be at least one field. The first field is separated from the measurement and tags by a space, and further fields are separated from each other by commas. The timestamp is optional and is separated from the fields by a space.
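The format rules above can be sketched in Python roughly like this. It is a simplified illustration, not an official client: the helper name to_line_protocol, the host tag and the sample values are made up, and it does not escape spaces or commas inside names and values.

```python
def to_line_protocol(measurement, fields, tags=None, timestamp_ns=None):
    """Format one point as a line protocol string (simplified, no escaping)."""
    if not fields:
        raise ValueError("a point needs at least one field")

    def format_field_value(value):
        # bool is checked before int because bool is a subclass of int in Python
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"       # a trailing i marks an integer
        if isinstance(value, float):
            return repr(value)
        return f'"{value}"'          # string field values are double-quoted

    line = measurement
    # Tags follow the measurement, each one introduced by a comma
    for key, value in sorted((tags or {}).items()):
        line += f",{key}={value}"
    # A space separates the measurement/tags from the comma-separated fields
    line += " " + ",".join(
        f"{key}={format_field_value(value)}" for key, value in fields.items()
    )
    # The optional timestamp comes last, after another space
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line_protocol("cpu", {"usage_idle": 97.5}, {"host": "server01"},
                       1460000000000000000))
```

Leaving out tags and timestamp_ns gives the shortest valid point: a measurement, a space and one field.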
For the data types:
Measurements, tag keys, tag values and field keys are always stored as strings in the database. Field values can be stored as float, int, boolean or string, and a field value is always associated with a timestamp.
All subsequent values of a field must match the type of the first value written to that field.
If the first value you insert into a field is a boolean, then you can insert only boolean values into that field, not strings, integers or floats.
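To make the type rule concrete, here is a toy model in Python. It only imitates the behaviour described above and says nothing about how InfluxDB implements it internally; the class name FieldTypeRegistry and the door/open example are hypothetical.

```python
class FieldTypeRegistry:
    """Toy model: the first value written to a field fixes that field's type."""

    def __init__(self):
        self._types = {}  # (measurement, field_key) -> Python type

    def check(self, measurement, field_key, value):
        key = (measurement, field_key)
        # The first write registers the type; later writes are compared to it
        expected = self._types.setdefault(key, type(value))
        if type(value) is not expected:
            raise TypeError(
                f"field {field_key!r} in {measurement!r} is "
                f"{expected.__name__}, got {type(value).__name__}"
            )
        return True


registry = FieldTypeRegistry()
registry.check("door", "open", True)       # first write: the field becomes boolean
try:
    registry.check("door", "open", "yes")  # a string is rejected afterwards
except TypeError as error:
    print("rejected:", error)
```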
Now I will describe the parts of a point in detail.
A measurement is similar to an SQL table. The measurement name is the description of the data that is stored in the associated fields. The primary index is always time, because it is a time series database.
Tags and fields are similar to SQL table columns.
Tags are optional. You don't need to have tags in your data structure, but it's generally a good idea to make use of them because tags are indexed. Tags are ideal for storing commonly queried metadata, so queries on tags run faster than queries on fields.
You should store your data in tags if it is commonly queried metadata or if you plan to use it with GROUP BY().
Fields are a required part of InfluxDB's data structure: you cannot have data in InfluxDB without fields.
You should store your data in fields if you plan to use it with an InfluxQL function or if you need it to be something other than a string.
The timestamp is not required. When no timestamp is provided, the server inserts the point with the local server timestamp. Timestamps must be in Unix time and are assumed to be in nanoseconds.
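If you want to set the timestamp yourself, a nanosecond Unix timestamp can be produced in Python along these lines. This is a small sketch; the helper name to_unix_nanos is made up, and time.time_ns() needs Python 3.7 or newer.

```python
import time
from datetime import datetime, timezone


def to_unix_nanos(dt):
    """Convert a timezone-aware datetime to integer nanoseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer arithmetic avoids float rounding at nanosecond scale
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1000


# Current time in nanoseconds, taken from the local clock
print(time.time_ns())
# A fixed moment converted to nanoseconds
print(to_unix_nanos(datetime(2016, 4, 1, tzinfo=timezone.utc)))
```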
InfluxQL is an SQL-like query language for interacting with data in InfluxDB.
InfluxDB provides an HTTP API for writing and querying data.
Instead of stored procedures, it uses continuous queries.
InfluxDB supports some mathematical operators, such as addition, subtraction, multiplication and division.
But it doesn't support inequalities, miscellaneous operators or logical operators within mathematical expressions.
It comes with some functions and visualization tools, and it has an automated data retention policy.
There are many ways to write data into InfluxDB, including the command line interface, client libraries and plugins for common data formats. Two of them are very common.
The first one is the CLI.
To write points using the command line interface, use the insert command. The CLI returns nothing on success and should give an informative parser error if the point cannot be written.
This is the insert statement. Then we can select our new point.
The second one is writing data using the HTTP API.
To write points using HTTP, POST to the /write endpoint at port 8086 with curl.
The body of the POST is line protocol.
Successful writes will return a 204 HTTP Status Code.
Invalid syntax will return a 400 code.
You can write multiple points by separating each point with a new line.
Or you can write points from a file by passing @filename to curl.
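The write steps above can also be sketched in Python instead of curl. This sketch only builds the request and does not send it. The database name mydb and the helper name build_write_request are made up for illustration, and it assumes the target database is passed in the db query parameter, as it is for queries.

```python
import urllib.parse
import urllib.request


def build_write_request(database, lines, host="localhost", port=8086):
    """Build (but do not send) a POST request for the /write endpoint."""
    query = urllib.parse.urlencode({"db": database})
    url = f"http://{host}:{port}/write?{query}"
    # The body is line protocol, one point per line
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


request = build_write_request("mydb", [
    "cpu,host=server01 usage_idle=97.5",
    "cpu,host=server02 usage_idle=88.1",
])
print(request.get_method(), request.full_url)

# To send it to a running server (urlopen raises HTTPError on a 400 response):
# with urllib.request.urlopen(request) as response:
#     print(response.status)  # 204 on a successful write
```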
You can also perform a query using the HTTP API.
To perform a query, send a GET request to the /query endpoint.
You must specify the target database in the db query parameter and your query in the q query parameter.
InfluxDB returns a JSON response.
You can also change the timestamp format.
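As a rough Python sketch of the query steps above: the first helper builds the GET URL and the second flattens a response into rows. The names build_query_url and rows_from_response, the database mydb and the sample payload are made up for illustration. The sketch assumes the timestamp format is selected with an epoch query parameter and that the JSON has the results / series / columns / values shape.

```python
import json
import urllib.parse


def build_query_url(database, query, epoch=None, host="localhost", port=8086):
    """Build the GET URL for the /query endpoint."""
    params = {"db": database, "q": query}
    if epoch is not None:
        params["epoch"] = epoch  # assumed parameter, e.g. "s" for seconds
    return f"http://{host}:{port}/query?" + urllib.parse.urlencode(params)


def rows_from_response(payload):
    """Turn the columns/values arrays of each series into a list of dicts."""
    rows = []
    for result in payload.get("results", []):
        for series in result.get("series", []):
            for values in series["values"]:
                rows.append(dict(zip(series["columns"], values)))
    return rows


print(build_query_url("mydb", "SELECT * FROM cpu", epoch="s"))

# Illustrative payload only, not captured from a real server
sample_payload = json.loads(
    '{"results": [{"series": [{"name": "cpu", '
    '"columns": ["time", "value"], "values": [[1460000000, 0.64]]}]}]}'
)
print(rows_from_response(sample_payload))
```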
There are 3 kinds of functions: aggregations, selectors and transformations. Using them you can aggregate, select and transform your data.
A CQ is similar to an SQL stored procedure. CQs run automatically and periodically within a database and write the query results to another measurement.
The CQ syntax is separated into meta syntax and query syntax.
Meta syntax:
A CQ belongs to a database.
In the ON <database_name> clause, you specify the database where you want the CQ to live.
The optional RESAMPLE clause determines how often InfluxDB runs the CQ and over what time range. The RESAMPLE clause must specify EVERY, FOR, or both. EVERY sets how often InfluxDB runs your CQ, for example every 2 minutes, and FOR sets the time range that each run covers, for example the last 2 minutes.
Without the RESAMPLE clause, InfluxDB runs the CQ at the same interval as the GROUP BY time().
Query syntax:
In this section, we write our query in the SELECT clause.
The INTO clause determines where you want to save your query results. It is the destination measurement.
The FROM clause determines the current measurement whose data you want to calculate on.
The GROUP BY time() interval determines the window over which the function is calculated, for example 30 minutes of the field's values.
A CQ requires a function in the SELECT clause and must include a GROUP BY time() clause.
You can also create, show and drop continuous queries. CQs don't backfill data.
I created a continuous query on the telegraf database that calculates the median of buffered memory and the mean of free memory. The CQ calculates the mean and median over 2-minute intervals. It runs every 1 minute.
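A statement like the one described could be assembled in Python as sketched below. The CQ name cq_mem, the source measurement mem and the field names buffered and free are assumptions on my part, since the notes name only the telegraf database and the mem_copy destination. The helper build_continuous_query is hypothetical too.

```python
def build_continuous_query(name, database, select, into, source,
                           group_by, every=None, for_=None):
    """Assemble a CREATE CONTINUOUS QUERY statement as a string."""
    resample = ""
    if every or for_:
        # RESAMPLE is optional, but needs EVERY, FOR, or both when present
        resample = " RESAMPLE"
        if every:
            resample += f" EVERY {every}"
        if for_:
            resample += f" FOR {for_}"
    return (
        f'CREATE CONTINUOUS QUERY "{name}" ON "{database}"{resample} '
        f'BEGIN SELECT {select} INTO "{into}" FROM "{source}" '
        f'GROUP BY time({group_by}) END'
    )


statement = build_continuous_query(
    name="cq_mem",                              # hypothetical CQ name
    database="telegraf",
    select='median("buffered"), mean("free")',  # assumed field names
    into="mem_copy",
    source="mem",                               # assumed source measurement
    group_by="2m",
    every="1m",
)
print(statement)
```

The printed statement could then be pasted into the influx CLI or sent in the q parameter of the HTTP API.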
Here is the continuous query already created on the telegraf database.
Here the continuous query has run and inserted the query results into the mem_copy measurement.
A retention policy describes how long InfluxDB keeps data and how many copies of the data are stored in the cluster.
When you create a database, InfluxDB automatically creates an RP called default with an infinite duration.
DURATION determines how long InfluxDB keeps the data.
REPLICATION <n> determines how many independent copies of each point are stored in the cluster, where n is the number of data nodes.
DEFAULT sets the new retention policy as the default retention policy for the database.
The one_day RP is active, because its default value is true.
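For illustration, the statement behind a policy like one_day could be put together in Python as below. The helper name build_retention_policy is made up, and the choice of the telegraf database with a replication of 1 is an assumption based on the single-server setup used in this talk.

```python
def build_retention_policy(name, database, duration, replication=1, default=False):
    """Assemble a CREATE RETENTION POLICY statement as a string."""
    statement = (
        f'CREATE RETENTION POLICY "{name}" ON "{database}" '
        f'DURATION {duration} REPLICATION {replication}'
    )
    if default:
        # DEFAULT makes this the policy used when none is named
        statement += " DEFAULT"
    return statement


print(build_retention_policy("one_day", "telegraf", "1d", default=True))
```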
A very simple example that combines a retention policy and a continuous query:
Server admins have to work with statistical data. They need the mean of the data for every 15 minutes of each day. We can solve this problem using a retention policy and a continuous query.
1. We need a 1 day retention policy on the database. InfluxDB automatically deletes data that is older than 1 day.
2. Then we need a continuous query that calculates the 15-minute average of the data. There are many other examples as well.
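What these two steps do to the data can be imitated with a small in-memory model in Python. This is only a toy to show the idea of 15-minute means plus a 1 day cut-off; it is not how InfluxDB stores or expires data, and the function names and sample points are made up.

```python
from collections import defaultdict


def mean_per_window(points, window_seconds):
    """Group (unix_seconds, value) points into fixed windows and average each one."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - timestamp % window_seconds
        buckets[window_start].append(value)
    return {start: sum(values) / len(values)
            for start, values in sorted(buckets.items())}


def drop_expired(points, now, max_age_seconds):
    """Keep only the points that are not older than max_age_seconds."""
    return [(timestamp, value) for timestamp, value in points
            if now - timestamp <= max_age_seconds]


points = [(0, 10.0), (600, 20.0), (900, 30.0), (1700, 50.0)]
print(mean_per_window(points, 15 * 60))            # one mean per 15-minute window
print(drop_expired(points, 24 * 3600 + 700, 24 * 3600))  # only the last day survives
```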
InfluxDB connects to many tools, such as Telegraf, Grafana, Chronograf, etc.
Telegraf is a plugin-driven server agent for collecting and reporting metrics. I used the telegraf sample database, which collects CPU measurements automatically.
Grafana and Chronograf are used for visualizing time series data. Most InfluxDB users use Grafana to visualize their data.
Chronograf is also a visualization tool. It is written in Go and is simple to install. The feature set of the initial release is small;
it's just the starting point for an application and toolkit for visualizing time series data from InfluxDB.
Recently the InfluxData team officially announced that Chronograf is the official data visualization tool for InfluxDB, because they wanted something that non-programmers could use to quickly get answers to questions about their time series data.
Let me show some examples in Chronograf.
I have 2 databases: NOAA_water_database and telegraf.
First we have to connect to a database, here telegraf, and then apply.
Let's visualize the average idle CPU usage.
First we connect to the telegraf database.
There are 2 sections: filter by and extract by.
In filter by, you have to choose a measurement, and you can filter by tags in your WHERE clause. The tag is optional.
The tag key is cpu.
The tag value is cpu-total.
In the extract by section, you can select any of your field keys with a function.
For example, I want to select the average of the usage_idle field grouped by 15 minutes.
Here is our graph.
You can configure the time range, from what time until what time.
You can also create your own dashboard and add your existing visualizations to it. This is our new dashboard.