The IT industry is a diverse and dynamic world where applications and functions may be spread across, and move between, a multitude of providers and technologies such as Amazon AWS, Rackspace, KVM, volatile containers, and your traditional in-house IT infrastructure with physical servers.
Monitoring all of these might require one monitoring tool per platform, or at least several tools that seamlessly blend metrics, events and logs to get true observability of your environment. OP5 intends to address this with Project Omega, designed from the ground up with cloud-native technologies, packaged as containers to run on premises or as SaaS, and scaling horizontally with Kubernetes.
Initially, the focus is on monitoring OpenStack with the Monasca project. We have been developing the agent in and for the community, contributing patches and reviews since the OpenStack Queens release. The stack uses modern REST APIs, a time series database for metrics and Kafka message queues, and is being prepared for real-time analysis with Apache Storm.
Monitor everything from physical hardware to application functionality
1. Only 4 days
Welcome to our lavish smorgasbord of IT monitoring.
OP5 is the market leader in IT monitoring throughout the Nordic region and in over 50 countries around the world.
2. Nicolas Seyvet
Passionate software developer at OP5 AB.
Particular interests: coding, cloud, software engineering and architecture, and distributed and scalable systems.
3. The IT Monitoring Software Solution.
From Sweden. For a Global Market. Based on Open Source.
OP5 is a Swedish company founded in 2004. The vision was to develop an IT monitoring software solution, based on the open-source project Nagios, that would offer an unprecedented user experience: a solution that would be easy to implement, intuitive to work with, and provide unparalleled scalability to support clients and their ever-changing business needs.
Today, OP5 has grown into an international company with a presence in over 60 countries. Thousands of IT professionals across the world rely daily on OP5 solutions to monitor their business-critical IT services.
4. The OP5 product Monitor is based on Nagios:
- Checks
- Plugins
- BUT it assumes a static infrastructure
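For readers new to Nagios: a check is an external program (a plugin) that Nagios runs periodically against a host configured in advance. The plugin reports its result through its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and one status line, optionally followed by `|` and performance data. The sketch below is a hypothetical disk-usage plugin written for illustration only; it is not the official `check_disk`, and the function names and the 80 %/90 % thresholds are assumptions.

```python
import shutil

# Nagios plugin exit codes (from the Nagios plugin guidelines).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to a Nagios state (hypothetical thresholds)."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk_usage(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the file system holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    used_pct = 100.0 * usage.used / usage.total
    state = evaluate(used_pct, warn, crit)
    # Text before '|' is shown to operators; text after it is performance data:
    # label=value[unit];warn;crit;min;max
    line = (f"DISK {LABELS[state]} - {used_pct:.1f}% used on {path}"
            f" | used_pct={used_pct:.1f}%;{warn};{crit};0;100")
    return state, line
```

A real plugin would finish with `print(line)` followed by `raise SystemExit(state)`, so that Nagios can read both the status text and the exit code. Note that the path to check is fixed in the plugin's configuration, which is exactly what becomes a problem when hosts and services come and go.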
5. Infrastructure:
- Increased number of devices
- Virtual
Applications:
- On-demand deployments (cloud)
- Ephemeral/moving processes
- Distributed
Monitor everything in the data center?
The three Vs of Big Data:
- Volume
- Velocity
- Variety
Dynamic, complex environment
Outpacing humans
Average data center: ~20 000 servers
7. Time series
An event source produces metrics/events: multiple series of (timestamp, value) pairs over time.
<series name> (t0, v0) (t1, v1) (t2, v2) (t3, v3) …
Example series: pod.io.read_bytes_sec
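A series is identified by its name plus the key/value dimensions attached to it (Monasca models metrics this way, e.g. `disk.space_used_perc{hostname=compute_node_1}`), so every new dimension value, such as a freshly started pod, creates a new series. The in-memory sketch below illustrates only that identity rule; the class, the `pod` dimension and the sample values are hypothetical.

```python
from collections import defaultdict


class TimeSeriesStore:
    """Hypothetical in-memory store: one list of (timestamp, value) pairs per series."""

    def __init__(self):
        self._series = defaultdict(list)

    @staticmethod
    def series_id(name, dimensions):
        # Name + sorted dimensions identify a series; dimension order is irrelevant.
        return (name, tuple(sorted(dimensions.items())))

    def append(self, name, dimensions, timestamp, value):
        self._series[self.series_id(name, dimensions)].append((timestamp, value))

    def points(self, name, dimensions):
        # .get() avoids creating an empty series as a side effect of reading.
        return list(self._series.get(self.series_id(name, dimensions), []))

    def cardinality(self):
        """Number of distinct series known to the store."""
        return len(self._series)


store = TimeSeriesStore()
store.append("pod.io.read_bytes_sec", {"pod": "web-1"}, 0, 512.0)
store.append("pod.io.read_bytes_sec", {"pod": "web-1"}, 10, 640.0)
store.append("pod.io.read_bytes_sec", {"pod": "web-2"}, 0, 128.0)
print(store.cardinality())  # 2: same metric name, two dimension sets
```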
8. Not all sources are created equal
Lifetime of the sources over time:
- Physical infrastructure: long lived
- Virtual infrastructure: medium lived
- Application layer: ephemeral
9. An example
Let’s assume 20 000 servers with 4 micro-services per server:
→ 20 000 + 4 x 20 000 = 100 000 instances
Assume 100 metrics per instance:
→ 10 000 000 active time series
Out of which:
→ 2 000 000 are long lived
→ 8 000 000 are ephemeral
Add dynamicity and elasticity → 0.001%/s replacement rate:
→ 0.001% * 8 000 000 = 80 new time series/s
→ ~6 900 000 new time series per day
Then, add the virtual infrastructure, failures in the DC, new racks, etc.
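The back-of-the-envelope numbers of this example can be reproduced with a few lines of arithmetic. The function and parameter names are hypothetical. The replacement rate is expressed in parts per million per second to keep the arithmetic exact in integers; 80 new series/s and ~6.9 million per day correspond to 10 ppm/s, i.e. 0.001% per second.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_series(servers, services_per_server, metrics_per_instance,
                    replacement_ppm_per_s):
    """Reproduce the data-center sizing example (all inputs are assumptions)."""
    instances = servers + services_per_server * servers
    active = instances * metrics_per_instance
    long_lived = servers * metrics_per_instance  # series from the physical servers
    ephemeral = active - long_lived              # series from micro-service instances
    # ppm = parts per million: 10 ppm/s == 0.001 %/s, 100 ppm/s == 0.01 %/s.
    new_per_s = ephemeral * replacement_ppm_per_s // 1_000_000
    return {
        "instances": instances,
        "active": active,
        "long_lived": long_lived,
        "ephemeral": ephemeral,
        "new_per_s": new_per_s,
        "new_per_day": new_per_s * SECONDS_PER_DAY,
    }


print(estimate_series(20_000, 4, 100, replacement_ppm_per_s=10))
```

With 10 ppm/s this yields 80 new series per second and 6 912 000 per day; with 100 ppm/s (0.01% per second) the same function gives 800 new series per second, roughly 69 million per day.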
11. Monasca (http://monasca.io/) is an open-source, multi-tenant, massively scalable,
fault-tolerant monitoring-as-a-service solution.
Main features:
- An event driven architecture.
- A set of REST APIs for high-speed event processing and querying.
- A real-time streaming engine (alarms and transformations).
- An agent (collector) with plugins.
- A push based system.
Part of (but not limited to) the OpenStack family.
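Because Monasca is push based, agents and applications send their measurements to its REST API. A minimal sketch of building such a request with the Python standard library, assuming the Monasca API v2.0 `POST /v2.0/metrics` endpoint and its JSON fields (`name`, `dimensions`, `timestamp` in milliseconds since the epoch, `value`, optional `value_meta`) authenticated by an OpenStack Keystone token in `X-Auth-Token`. The host, port and token are placeholders, and the request is only built here, not sent.

```python
import json
import time
import urllib.request


def build_metric(name, dimensions, value, value_meta=None, timestamp_ms=None):
    """One measurement in the shape assumed for the Monasca v2.0 metrics API."""
    metric = {
        "name": name,
        "dimensions": dimensions,
        "timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
        "value": value,
    }
    if value_meta:
        metric["value_meta"] = value_meta
    return metric


def build_push_request(api_url, token, metrics):
    """Prepare (but do not send) a POST carrying a batch of metrics."""
    body = json.dumps(metrics).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_url}/v2.0/metrics",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
    )


request = build_push_request(
    "http://monasca-api.example:8070",  # hypothetical endpoint
    "PLACEHOLDER-KEYSTONE-TOKEN",       # hypothetical token
    [build_metric("disk.space_used_perc", {"hostname": "compute_node_1"}, 42.5)],
)
# urllib.request.urlopen(request) would actually push the batch.
```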
Monasca
12. OpenStack began in 2010 as a joint project between NASA and Rackspace.
Open source software for creating private and public clouds (Infrastructure as a Service).
Control large pools of compute, storage, and networking resources throughout a datacenter,
managed through a dashboard or via RESTful APIs.
OpenStack
Key Features
16. The core
Monasca API → Data/Event Bus (Publish/Subscribe)
Kafka is an open-source, massively scalable publish/subscribe message queue:
- horizontally scalable
- fault-tolerant
- high throughput (>100K to millions of events/s)
- at-least-once delivery guarantee
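At-least-once delivery means a consumer may see the same message twice, but never silently loses one. The toy, single-partition log below uses hypothetical classes, not the Kafka client API, to show the mechanism: each consumer group (for example a persister and a notification engine) tracks its own committed offset, and committing only after processing means that a crash between the two leads to redelivery.

```python
class Topic:
    """Hypothetical single-partition topic: append-only log + committed offset per group."""

    def __init__(self):
        self.log = []
        self.committed = {}  # consumer group -> offset of the next message to read

    def publish(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset of the new message

    def poll(self, group):
        start = self.committed.get(group, 0)
        return start, self.log[start:]

    def commit(self, group, next_offset):
        self.committed[group] = next_offset


def consume(topic, group, handler, crash_after=None):
    """Handle each message, then commit its offset: commit-after-process = at-least-once."""
    start, batch = topic.poll(group)
    for i, message in enumerate(batch):
        handler(message)
        if crash_after == i:
            return  # simulated crash: message handled but offset not committed
        topic.commit(group, start + i + 1)


topic = Topic()
for m in ("m0", "m1", "m2"):
    topic.publish(m)

persisted, notified = [], []
consume(topic, "persister", persisted.append, crash_after=1)  # crashes after handling m1
consume(topic, "persister", persisted.append)                 # restart: m1 is redelivered
consume(topic, "notification", notified.append)               # independent consumer group

print(persisted)  # ['m0', 'm1', 'm1', 'm2']
print(notified)   # ['m0', 'm1', 'm2']
```

Consumers therefore need to tolerate duplicates, for instance by making writes idempotent.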
20. Easy to extend
Event-driven architecture: components plug into the Data/Event Bus through publish/subscribe.
Components: Persister, Streaming Engine, Notification Engine, My Function/App, ...
21. Highest level: what to alarm on?
Domain Specific Language (DSL):
<expression>
::= <sub_expression> [(and | or) <sub_expression>]*
where a sub-expression is:
<sub_expression>
::= <function> '(' <metric> [',' deterministic] [',' period] ')' <operator> threshold_value ['times' periods]
<function>
::= min | max | sum | avg | count | last
Example:
avg(disk.space_used_perc{hostname=compute_node_1}) >= 99
and
count(log.error{hostname=compute_node_1,component=kafka},deterministic) >= 1
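To make the grammar concrete, here is a minimal parser and evaluator for it. This is a hypothetical sketch, not Monasca's implementation. It looks up values by metric name only (ignoring dimensions), evaluates a single period per sub-expression (the `times` clause is parsed but not applied), combines `and`/`or` strictly left to right without precedence or parentheses, and assumes a default period of 60 seconds.

```python
import re
from statistics import mean

# Statistics allowed by the <function> rule.
FUNCTIONS = {
    "min": min, "max": max, "sum": sum, "avg": mean,
    "count": len, "last": lambda values: values[-1],
}
OPERATORS = {
    ">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b, "<": lambda a, b: a < b,
}
SUB_EXPRESSION = re.compile(
    r"^\s*(?P<func>min|max|sum|avg|count|last)\("
    r"(?P<metric>[\w.]+)(?:\{(?P<dims>[^}]*)\})?"
    r"(?:,\s*(?P<deterministic>deterministic))?"
    r"(?:,\s*(?P<period>\d+))?\)\s*"
    r"(?P<op>>=|<=|>|<)\s*(?P<threshold>-?\d+(?:\.\d+)?)"
    r"(?:\s+times\s+(?P<periods>\d+))?\s*$"
)


def parse_sub_expression(text):
    """Parse one <sub_expression> into a dict; raise ValueError if it does not match."""
    m = SUB_EXPRESSION.match(text)
    if m is None:
        raise ValueError(f"invalid sub-expression: {text!r}")
    dimensions = {}
    if m["dims"]:
        for pair in m["dims"].split(","):
            key, value = pair.split("=", 1)
            dimensions[key.strip()] = value.strip()
    return {
        "function": m["func"],
        "metric": m["metric"],
        "dimensions": dimensions,
        "deterministic": m["deterministic"] is not None,
        "period": int(m["period"]) if m["period"] else 60,  # assumed default period
        "operator": m["op"],
        "threshold": float(m["threshold"]),
        "periods": int(m["periods"]) if m["periods"] else 1,  # parsed, not applied
    }


def evaluate_sub_expression(sub, values):
    """Apply the statistic to one period of values and compare it with the threshold."""
    if not values and sub["function"] != "count":
        return False  # no data: treated as "not firing" in this sketch
    statistic = FUNCTIONS[sub["function"]](values)
    return OPERATORS[sub["operator"]](statistic, sub["threshold"])


def evaluate_expression(expression, values_by_metric):
    """Evaluate sub-expressions joined by and/or, strictly left to right."""
    tokens = re.split(r"\s+(and|or)\s+", expression.strip())

    def truth(text):
        sub = parse_sub_expression(text)
        return evaluate_sub_expression(sub, values_by_metric.get(sub["metric"], []))

    result = truth(tokens[0])
    for operator, text in zip(tokens[1::2], tokens[2::2]):
        current = truth(text)
        result = (result and current) if operator == "and" else (result or current)
    return result


rule = ("avg(disk.space_used_perc{hostname=compute_node_1}) >= 99 and "
        "count(log.error{hostname=compute_node_1,component=kafka},deterministic) >= 1")
print(evaluate_expression(rule, {"disk.space_used_perc": [99.0, 99.5], "log.error": [1]}))  # True
```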
23. To sum up:
- Built for self-healing and elasticity (horizontal scalability)
- Can handle billions of time series at high throughput
- Multi-tenant
- Extensible
- DSL to monitor what matters
- Can combine different sources (metrics/events/logs)
Built on top of Kubernetes, runs on AWS, OpenStack and VMware.
$ # Deploy in one line
$ helm install op5_monasca
OP5 Monasca
24. OP5 HQ
Norgegatan 2
SE-164 32 Kista
Sweden
+46 (0)8 58 83 01 00
www.OP5.com
linkedin.com/company/OP5/
facebook.com/OP5ab
twitter.com/OP5ab
Call us
Follow us
Nicolas Seyvet
Backend Engineer
Email: nseyvet@op5.com
Twitter: @NicolasSeyvet
Blog: http://babounehacks.blogspot.se/
GitHub: https://github.com/nseyvet
https://github.com/baboune
Questions?