SlideShare une entreprise Scribd logo
1  sur  37
Télécharger pour lire hors ligne
Real-Time log analysis with
Mesos, Docker, Kafka, Spark, Cassandra and Solr at scale
whoami
CEO of Elodina http://www.elodina.net/ a big data as a service platform built on top open source
software. The Elodina platform enables customers to analyze data streams and programmatically
react to the results in real-time. We solve today’s data analytics needs by providing the tools and
support necessary to utilize open source technologies. As users, contributors and committers,
Elodina also provides support for frameworks that run on Mesos including Apache Kafka,
Exhibitor (Zookeeper), Apache Storm, Apache Cassandra and a whole lot more!
Apache Kafka Committer & PMC Member
LinkedIn: http://linkedin.com/in/charmalloc
Twitter : @allthingshadoop
2© 2015. All Rights Reserved.
1 Intro To Mesos, Kafka, Etc
2 Architecture Overview
3 Breaking it down into pieces
4 Questions?
3© 2015. All Rights Reserved.
Apache Mesos
4© 2015. All Rights Reserved.
Mesos Papers
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
http://static.usenix.org/event/nsdi11/tech/full_papers/Hindman_new.pdf
Google Borg - https://research.google.com/pubs/pub43438.html
Google Omega: flexible, scalable schedulers for large compute clusters http:
//eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
5
Static Partitioning
6
Static Partitioning
7
Static Partitioning
8
Static Partitioning
9
Fine Grained Resource Elasticity
"If people knew how low it really is, we’d all get fired."
https://gigaom.com/2013/11/30/the-sorry-state-of-server-utilization-and-the-impending-post-hypervisor-era/
10
An operating system for your data center
11
EVERYTHING ON MESOS
12
How it works
13
Marathon
14
https://github.com/mesosphere/marathon
Cluster-wide init and control system for
services in cgroups or docker based on
Apache Mesos
Docker on Marathon
{
"id": "basic-3",
"cmd": "python3 -m http.server 8080",
"cpus": 0.5,
"mem": 32.0,
"container": {
"type": "DOCKER",
"docker": {
"image": "python:3",
"network": "BRIDGE",
"portMappings": [
{ "containerPort": 8080, "hostPort": 0 }
]
}
}
}
15
Apache Kafka
16
Kafka papers
Apache Kafka was first open sourced by LinkedIn in 2011
Papers
● Building a Replicated Logging System with Apache Kafka http://www.vldb.org/pvldb/vol8/p1654-wang.pdf
● Kafka: A Distributed Messaging System for Log Processing http://research.microsoft.com/en-
us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
● Building LinkedIn’s Real-time Activity Data Pipeline http://sites.computer.org/debull/A12june/pipeline.pdf
● The Log: What Every Software Engineer Should Know About Real-time Data's Unifying Abstraction http://engineering.
linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
http://kafka.apache.org/
17
How Big Data Starts
18
More Big Data! More!
19
uhhhh
20
eeesh
21
Kafka de-couples data pipelines
22
Distributed Replicated Log
Read & Write
In real time
As much as you want
As fast as your network
23
Reference Architecture
24
Producers
syslog → Kafka via docker https://hub.docker.com/r/stealthly/syslog/
syslog → Kafka scheduler https://github.com/stealthly/syslog-service
statsd → Kafka scheduler https://github.com/stealthly/statsd-mesos-kafka
system stats collection → Kafka scheduler https://github.com/stealthly/syscol
tailf → Kafka https://github.com/stealthly/go_kafka_client/tree/master/producers/tailf
Any language https://cwiki.apache.org/confluence/display/KAFKA/Clients
25
Reference Architecture
26
Kafka on Mesos
https://github.com/mesos/kafka
27
Kafka on Mesos
● smart broker.id assignment.
● preservation of broker placement (through constraints and/or new features).
● ability to-do configuration changes.
● rolling restarts (for things like configuration changes).
● scaling the cluster up and down with automatic, programmatic and manual
options.
● smart partition assignment via constraints visa vi roles, resources and
attributes.
28
CLI & REST API
● scheduler - starts the scheduler.
● broker
○ add - adds one more more brokers to the cluster.
○ update - changes resources, constraints or broker properties one or more brokers.
○ remove - take a broker out of the cluster.
○ start - starts a broker up.
○ stop - this can either a graceful shutdown or will force kill it (./kafka-mesos.sh help stop)
● topic
○ list - list topics in cluster
○ add - add new topics in cluster
○ update - change topics in cluster
○ rebalance - allows you to rebalance a cluster either by selecting the brokers or topics to rebalance. Manual
assignment is still possible using the Apache Kafka project tools. Rebalance can also change the replication
factor on a topic.
● help - ./kafka-mesos.sh help || ./kafka-mesos.sh help {command}
29
Reference Architecture
30
Schema Avro or ProtoBuff
- https://github.com/stealthly/go_kafka_client/blob/master/syslog/syslog_proto/logline.proto
- https://github.com/stealthly/go_kafka_client/blob/master/logline.avsc
logline
• line
• logtypeid
• source
• tags (k/v pairs)
• timings (k/v pairs)
31
Consume from Kafka → Write to Cassandra
Implement CQL write here https://github.
com/stealthly/go_kafka_client/blob/master/consumers/consum
ers.go#L186-L194 with https://github.com/gocql/gocql
Go Kafka Client does fan out work processing, rebalance doesn’
t upset consumers that are reading already.
32
Reference Architecture
33
Sample Spark Job → Cassandra
https://github.com/stealthly/gauntlet
Uses the Cassandra Spark Connector https://github.
com/datastax/spark-cassandra-connector
34
Use DataStax Enterprise to enable Search
http://docs.datastax.com/en/datastax_enterprise/4.
8/datastax_enterprise/srch/srchOverview.html
35
Questions?
36
http://www.elodina.net
Thank you

Contenu connexe

Tendances

Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...Mydbops
 
War Stories: DIY Kafka
War Stories: DIY KafkaWar Stories: DIY Kafka
War Stories: DIY Kafkaconfluent
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEkawamuray
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouseAltinity Ltd
 
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...Michael Stack
 
DalmatinerDB and cockroachDB monitoring plataform
DalmatinerDB and cockroachDB monitoring plataformDalmatinerDB and cockroachDB monitoring plataform
DalmatinerDB and cockroachDB monitoring plataformLeandro Totino Pereira
 
LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...kawamuray
 
Red Hat Summit 2017: Wicked Fast PaaS: Performance Tuning of OpenShift and D...
Red Hat Summit 2017:  Wicked Fast PaaS: Performance Tuning of OpenShift and D...Red Hat Summit 2017:  Wicked Fast PaaS: Performance Tuning of OpenShift and D...
Red Hat Summit 2017: Wicked Fast PaaS: Performance Tuning of OpenShift and D...Jeremy Eder
 
Why you’re going to fail running java on docker!
Why you’re going to fail running java on docker!Why you’re going to fail running java on docker!
Why you’re going to fail running java on docker!Red Hat Developers
 
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streamsconfluent
 
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021StreamNative
 
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Amy W. Tang
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Mydbops
 
Espresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom QuiggleEspresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom Quiggleconfluent
 
Continuous SQL with Apache Streaming (FLaNK and FLiP)
Continuous SQL with Apache Streaming (FLaNK and FLiP)Continuous SQL with Apache Streaming (FLaNK and FLiP)
Continuous SQL with Apache Streaming (FLaNK and FLiP)Timothy Spann
 
Solution for events logging with akka streams and kafka
Solution for events logging with akka streams and kafkaSolution for events logging with akka streams and kafka
Solution for events logging with akka streams and kafkaAnatoly Sementsov
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereFlink Forward
 
Apache Apex & Bigtop
Apache Apex & BigtopApache Apex & Bigtop
Apache Apex & BigtopApache Apex
 

Tendances (20)

Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
Analyze corefile and backtraces with GDB for Mysql/MariaDB on Linux - Nilanda...
 
War Stories: DIY Kafka
War Stories: DIY KafkaWar Stories: DIY Kafka
War Stories: DIY Kafka
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINE
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouse
 
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
 
DalmatinerDB and cockroachDB monitoring plataform
DalmatinerDB and cockroachDB monitoring plataformDalmatinerDB and cockroachDB monitoring plataform
DalmatinerDB and cockroachDB monitoring plataform
 
LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...
 
Red Hat Summit 2017: Wicked Fast PaaS: Performance Tuning of OpenShift and D...
Red Hat Summit 2017:  Wicked Fast PaaS: Performance Tuning of OpenShift and D...Red Hat Summit 2017:  Wicked Fast PaaS: Performance Tuning of OpenShift and D...
Red Hat Summit 2017: Wicked Fast PaaS: Performance Tuning of OpenShift and D...
 
Stream Processing made simple with Kafka
Stream Processing made simple with KafkaStream Processing made simple with Kafka
Stream Processing made simple with Kafka
 
Why you’re going to fail running java on docker!
Why you’re going to fail running java on docker!Why you’re going to fail running java on docker!
Why you’re going to fail running java on docker!
 
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
 
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021
 
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
 
Postgres clusters
Postgres clustersPostgres clusters
Postgres clusters
 
Espresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom QuiggleEspresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom Quiggle
 
Continuous SQL with Apache Streaming (FLaNK and FLiP)
Continuous SQL with Apache Streaming (FLaNK and FLiP)Continuous SQL with Apache Streaming (FLaNK and FLiP)
Continuous SQL with Apache Streaming (FLaNK and FLiP)
 
Solution for events logging with akka streams and kafka
Solution for events logging with akka streams and kafkaSolution for events logging with akka streams and kafka
Solution for events logging with akka streams and kafka
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink Everywhere
 
Apache Apex & Bigtop
Apache Apex & BigtopApache Apex & Bigtop
Apache Apex & Bigtop
 

En vedette

Docker @ RelateIQ Presentation
Docker @ RelateIQ PresentationDocker @ RelateIQ Presentation
Docker @ RelateIQ PresentationJohn Fiedler
 
BIG DATA サービス と ツール
BIG DATA サービス と ツールBIG DATA サービス と ツール
BIG DATA サービス と ツールNgoc Dao
 
Scala, Apache Spark, The PlayFramework and Docker in IBM Platform As A Service
Scala, Apache Spark, The PlayFramework and Docker in IBM Platform As A ServiceScala, Apache Spark, The PlayFramework and Docker in IBM Platform As A Service
Scala, Apache Spark, The PlayFramework and Docker in IBM Platform As A ServiceRomeo Kienzler
 
Using docker for data science - part 2
Using docker for data science - part 2Using docker for data science - part 2
Using docker for data science - part 2Calvin Giles
 
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...Roberto Hashioka
 
Growing the Mesos Ecosystem
Growing the Mesos EcosystemGrowing the Mesos Ecosystem
Growing the Mesos EcosystemMesosphere Inc.
 
Time Series Processing with Solr and Spark
Time Series Processing with Solr and SparkTime Series Processing with Solr and Spark
Time Series Processing with Solr and SparkJosef Adersberger
 
Overview of DataStax OpsCenter
Overview of DataStax OpsCenterOverview of DataStax OpsCenter
Overview of DataStax OpsCenterDataStax
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming DataGeoffrey Fox
 
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...Data Con LA
 
Data analysis with Pandas and Spark
Data analysis with Pandas and SparkData analysis with Pandas and Spark
Data analysis with Pandas and SparkFelix Crisan
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka confluent
 
Fluentd and Kafka
Fluentd and KafkaFluentd and Kafka
Fluentd and KafkaN Masahiro
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Michael Noll
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on DockerRakesh Saha
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...Data Con LA
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaJay Kreps
 
Data processing platforms with SMACK: Spark and Mesos internals
Data processing platforms with SMACK:  Spark and Mesos internalsData processing platforms with SMACK:  Spark and Mesos internals
Data processing platforms with SMACK: Spark and Mesos internalsAnton Kirillov
 

En vedette (20)

Docker @ RelateIQ Presentation
Docker @ RelateIQ PresentationDocker @ RelateIQ Presentation
Docker @ RelateIQ Presentation
 
BIG DATA サービス と ツール
BIG DATA サービス と ツールBIG DATA サービス と ツール
BIG DATA サービス と ツール
 
Scala, Apache Spark, The PlayFramework and Docker in IBM Platform As A Service
Scala, Apache Spark, The PlayFramework and Docker in IBM Platform As A ServiceScala, Apache Spark, The PlayFramework and Docker in IBM Platform As A Service
Scala, Apache Spark, The PlayFramework and Docker in IBM Platform As A Service
 
Using docker for data science - part 2
Using docker for data science - part 2Using docker for data science - part 2
Using docker for data science - part 2
 
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
 
Growing the Mesos Ecosystem
Growing the Mesos EcosystemGrowing the Mesos Ecosystem
Growing the Mesos Ecosystem
 
Time Series Processing with Solr and Spark
Time Series Processing with Solr and SparkTime Series Processing with Solr and Spark
Time Series Processing with Solr and Spark
 
Overview of DataStax OpsCenter
Overview of DataStax OpsCenterOverview of DataStax OpsCenter
Overview of DataStax OpsCenter
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
 
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
Big Data Day LA 2016/ Big Data Track - Fluentd and Embulk: Collect More Data,...
 
Data analysis with Pandas and Spark
Data analysis with Pandas and SparkData analysis with Pandas and Spark
Data analysis with Pandas and Spark
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka
 
The basics of fluentd
The basics of fluentdThe basics of fluentd
The basics of fluentd
 
Fluentd and Kafka
Fluentd and KafkaFluentd and Kafka
Fluentd and Kafka
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on Docker
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache Kafka
 
Docker introduction
Docker introductionDocker introduction
Docker introduction
 
Data processing platforms with SMACK: Spark and Mesos internals
Data processing platforms with SMACK:  Spark and Mesos internalsData processing platforms with SMACK:  Spark and Mesos internals
Data processing platforms with SMACK: Spark and Mesos internals
 

Similaire à Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, Kafka, Spark, Cassandra and Solr at scale

Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloJoe Stein
 
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...C4Media
 
Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache MesosJoe Stein
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaJoe Stein
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache MesosJoe Stein
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaGuido Schmutz
 
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Lalit Panwar
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and dockerFabio Fumarola
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETconfluent
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedEdureka!
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache KafkaJoe Stein
 
Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?Micron Technology
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 

Similaire à Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, Kafka, Spark, Cassandra and Solr at scale (20)

Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
 
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
 
Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache Mesos
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
 
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
 
Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?Connect K of SMACK:pykafka, kafka-python or?
Connect K of SMACK:pykafka, kafka-python or?
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 

Plus de DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

Plus de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Dernier

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Dernier (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, Kafka, Spark, Cassandra and Solr at scale

  • 1. Real-Time log analysis with Mesos, Docker, Kafka, Spark, Cassandra and Solr at scale
  • 2. whoami CEO of Elodina http://www.elodina.net/ a big data as a service platform built on top open source software. The Elodina platform enables customers to analyze data streams and programmatically react to the results in real-time. We solve today’s data analytics needs by providing the tools and support necessary to utilize open source technologies. As users, contributors and committers, Elodina also provides support for frameworks that run on Mesos including Apache Kafka, Exhibitor (Zookeeper), Apache Storm, Apache Cassandra and a whole lot more! Apache Kafka Committer & PMC Member LinkedIn: http://linkedin.com/in/charmalloc Twitter : @allthingshadoop 2© 2015. All Rights Reserved.
  • 3. 1 Intro To Mesos, Kafka, Etc 2 Architecture Overview 3 Breaking it down into pieces 4 Questions? 3© 2015. All Rights Reserved.
  • 4. Apache Mesos 4© 2015. All Rights Reserved.
  • 5. Mesos Papers Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center http://static.usenix.org/event/nsdi11/tech/full_papers/Hindman_new.pdf Google Borg - https://research.google.com/pubs/pub43438.html Google Omega: flexible, scalable schedulers for large compute clusters http: //eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf 5
  • 10. Fine Grained Resource Elasticity "If people knew how low it really is, we’d all get fired." https://gigaom.com/2013/11/30/the-sorry-state-of-server-utilization-and-the-impending-post-hypervisor-era/ 10
  • 11. An operating system for your data center 11
  • 14. Marathon 14 https://github.com/mesosphere/marathon Cluster-wide init and control system for services in cgroups or docker based on Apache Mesos
  • 15. Docker on Marathon { "id": "basic-3", "cmd": "python3 -m http.server 8080", "cpus": 0.5, "mem": 32.0, "container": { "type": "DOCKER", "docker": { "image": "python:3", "network": "BRIDGE", "portMappings": [ { "containerPort": 8080, "hostPort": 0 } ] } } } 15
  • 17. Kafka papers Apache Kafka was first open sourced by LinkedIn in 2011 Papers ● Building a Replicated Logging System with Apache Kafka http://www.vldb.org/pvldb/vol8/p1654-wang.pdf ● Kafka: A Distributed Messaging System for Log Processing http://research.microsoft.com/en- us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf ● Building LinkedIn’s Real-time Activity Data Pipeline http://sites.computer.org/debull/A12june/pipeline.pdf ● The Log: What Every Software Engineer Should Know About Real-time Data's Unifying Abstraction http://engineering. linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying http://kafka.apache.org/ 17
  • 18. How Big Data Starts 18
  • 19. More Big Data! More! 19
  • 22. Kafka de-couples data pipelines 22
  • 23. Distributed Replicated Log Read & Write In real time As much as you want As fast as your network 23
  • 25. Producers syslog → Kafka via docker https://hub.docker.com/r/stealthly/syslog/ syslog → Kafka scheduler https://github.com/stealthly/syslog-service statsd → Kafka scheduler https://github.com/stealthly/statsd-mesos-kafka system stats collection → Kafka scheduler https://github.com/stealthly/syscol tailf → Kafka https://github.com/stealthly/go_kafka_client/tree/master/producers/tailf Any language https://cwiki.apache.org/confluence/display/KAFKA/Clients 25
  • 28. Kafka on Mesos ● smart broker.id assignment. ● preservation of broker placement (through constraints and/or new features). ● ability to-do configuration changes. ● rolling restarts (for things like configuration changes). ● scaling the cluster up and down with automatic, programmatic and manual options. ● smart partition assignment via constraints visa vi roles, resources and attributes. 28
  • 29. CLI & REST API ● scheduler - starts the scheduler. ● broker ○ add - adds one more more brokers to the cluster. ○ update - changes resources, constraints or broker properties one or more brokers. ○ remove - take a broker out of the cluster. ○ start - starts a broker up. ○ stop - this can either a graceful shutdown or will force kill it (./kafka-mesos.sh help stop) ● topic ○ list - list topics in cluster ○ add - add new topics in cluster ○ update - change topics in cluster ○ rebalance - allows you to rebalance a cluster either by selecting the brokers or topics to rebalance. Manual assignment is still possible using the Apache Kafka project tools. Rebalance can also change the replication factor on a topic. ● help - ./kafka-mesos.sh help || ./kafka-mesos.sh help {command} 29
  • 31. Schema Avro or ProtoBuff - https://github.com/stealthly/go_kafka_client/blob/master/syslog/syslog_proto/logline.proto - https://github.com/stealthly/go_kafka_client/blob/master/logline.avsc logline • line • logtypeid • source • tags (k/v pairs) • timings (k/v pairs) 31
  • 32. Consume from Kafka → Write to Cassandra Implement CQL write here https://github. com/stealthly/go_kafka_client/blob/master/consumers/consum ers.go#L186-L194 with https://github.com/gocql/gocql Go Kafka Client does fan out work processing, rebalance doesn’ t upset consumers that are reading already. 32
  • 34. Sample Spark Job → Cassandra https://github.com/stealthly/gauntlet Uses the Cassandra Spark Connector https://github. com/datastax/spark-cassandra-connector 34
  • 35. Use DataStax Enterprise to enable Search http://docs.datastax.com/en/datastax_enterprise/4. 8/datastax_enterprise/srch/srchOverview.html 35