Case Study
Elasticsearch Ingest @ Cisco Intercloud
Agenda
• Express Overview of StreamSets Data Collector
Kirit Basu, Product Management, StreamSets
• Introduction to Elastic
Catherine Johnson, Solutions Architect, Elastic
• Implementing Shipped Analytics Using StreamSets and Elasticsearch
Dmitri Chtchourov, Innovation Architect, Cloud Solutions CTO Group
Performance Management for Data Flows
© 2015 StreamSets, Inc. All rights reserved. May not be copied, modified, or distributed in whole or part without written consent of StreamSets, Inc.
StreamSets At a Glance
History: Founded by Informatica and Cloudera veterans.
Mission: Bring operational excellence to managing data in motion.
Challenge: Move data efficiently and with quality in the face of change.
Solution: Open source software enabling performance management of data flows.
Use cases: Hadoop Ingest, Search Ingest, Message Broker Enablement, Log Shipping, Cloud Migration, IoT, ...
Momentum: Thousands of downloads; hundreds of companies using it.
StreamSets Data Collector
Adaptable Flows for Efficiency
Design ingest pipelines with minimal coding and
maximum flexibility.
Data Flow KPIs for Control
Monitor and act on data flow performance and
data quality.
Containerized Architecture for Agility
Operate continuously in the face of constant
change.
Open source software for the rapid
development and reliable operation of
complex data flows.
Get Started with StreamSets
http://streamsets.com/opensource
https://github.com/streamsets/datacollector/
#streamsets
March 2016
Introduction to Elastic
Software that makes massive amounts of
structured and unstructured data usable for
search, logging, analytics, and more in
mission-critical systems and applications
Examples: Elastic Stack Use Cases
Logging: IT Operations, Application Management, Security Analytics
Analytics: Marketing Insights, Business Development, Customer Sentiment
Search: Website Search, Internal/Intranet Search, URL Search
Across internal and external systems/applications, for developers, IT/Ops, and business users.
Elastic Solves Many Developer Use Cases
Handles Complex & Diverse Data: Social, Location, User Activity, Machine (log files), Documents
Meets Today's Core Developer Requirements: Many users / use cases, Fast data processing, Large data volumes, Data quality & integrity, Cross-source insights
Solves Critical Use Cases: Application Search, Embedded Search, Logging, Security Analytics, Operational Analytics, More …
The Elastic Stack
Ingest; Store, Index & Analyze; User Interface
Plugins: Monitoring, Security, Alerting
Elastic Cloud: Hosted Elasticsearch
Thank you!
www.elastic.co
Implementing Shipped Analytics Using
StreamSets and Elasticsearch
Dmitri Chtchourov, Innovation Architect, Cloud Solutions CTO Group
Tymofii Polekhin, Software Engineer
Agenda
MANTL & Shipped
Shipped Analytics for Shipped
Why do we need Shipped Analytics?
Architecture and Data Flow
StreamSets Pipelines
End-to-end data flow and performance with Elasticsearch
Benefits of StreamSets
Demo
Microservices managed and scaled separately
Microservices managed by Mesos in a single platform
Microservices architecture for Mesos frameworks and other components
[Diagram: MANTL architecture. Infrastructure (CIS/AWS/Metastack/vSphere/UCS) provisioned with Terraform onto VMs or bare-metal hosts (VM1/BM1 through VM5/BM5). Mesos frameworks such as Spark (scheduler, executors 1..N) and Kafka (scheduler, brokers 1..N) run alongside Docker microservices behind Traefik (Microservices …). Other labels: REST API, scripted provisioning, direct provisioning, policy and auto-scaling, and probes connected to the Shipped Analytics cluster.]
• Both Shipped and Shipped Analytics running on MANTL
• Shipped Analytics – infra and app logs and metrics analysis
[Diagram: data collection on each node.]
Logs: mesos-master, mesos-slave, marathon, zookeeper, consul, syslog, frameworks
Metrics (collectd): cpu, memory, interface, disk, df, load, docker, zookeeper, marathon, mesos-slave, mesos-master
CollectD and Filebeat processes run on every node in the cluster.
Infrastructure layer: Zookeeper cluster, Consul cluster, Mesos cluster, Marathon framework, Kafka cluster
Beats under evaluation: topbeat, filebeat, journalbeat, dockerbeat
• Experimenting with Elastic Beats (unified architecture, closer to the microservices model)
• Elastic Beats to replace collectd plugins and cAdvisor for containers
Logstash
• Data normalization
• Tagging
• Cluster name decoration
Logstash is a single process per cluster, discoverable through the standard inter-cluster discovery mechanism (Consul DNS SRV records beats.logstash.service.consul and collectd.logstash.service.consul). It receives metrics from collectd and logs from Filebeat on every slave, normalizes the data, and sends it to the desired output.
NOTE: Logstash currently runs in a Docker container on every node; we will soon move to Filebeat and a Logstash Mesos framework.
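The SRV names above appear to follow Consul's `[tag.]<service>.service.<domain>` lookup form (tags `beats` and `collectd` on a `logstash` service). A minimal Python sketch of how a shipper could build that name and pick a target from the returned SRV records, using a simplified version of the RFC 2782 priority/weight rule. The `SrvRecord` type, host names, and ports are hypothetical; a real client would obtain the records from a DNS resolver rather than a hard-coded list.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value = preferred
    weight: int    # relative share among records with the same priority
    port: int
    target: str


def srv_query_name(tag: str, service: str, domain: str = "consul") -> str:
    """Build a Consul tagged service lookup name, e.g. beats.logstash.service.consul."""
    return f"{tag}.{service}.service.{domain}"


def pick_srv_target(records: Sequence[SrvRecord], rng: Optional[random.Random] = None) -> SrvRecord:
    """Simplified RFC 2782 selection: lowest priority wins, then a weighted random pick."""
    rng = rng or random.Random()
    if not records:
        raise ValueError("no SRV records returned")
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    total = sum(r.weight for r in candidates)
    if total == 0:
        return rng.choice(candidates)
    point = rng.uniform(0, total)
    running = 0
    for record in candidates:
        running += record.weight
        if point <= running:
            return record
    return candidates[-1]  # guard against floating-point edge cases
```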
Kafka 0.9.0.0 supports SSL authentication and data encryption for producers. This is must-have security when sending data to an external destination over a WAN.
Data is sent to the central Shipped Analytics (SA) cluster for long-term analytics.
[Diagram: Logstash in the Shipped cluster sends data across the WAN to Kafka in Shipped Analytics, using SSL encryption and SSL authentication.]
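For the SSL link described above, Kafka 0.9 brokers and clients are configured through standard properties such as `security.protocol`, `ssl.truststore.location`, and `ssl.client.auth`. The Python sketch below only renders those properties files; all paths, passwords, host names, and port 9093 are hypothetical placeholders, and certificate creation is out of scope. A producer keystore is needed only because the broker is set to require client authentication.

```python
from typing import Dict


def render_properties(props: Dict[str, str]) -> str:
    """Render a Java-style .properties file (one key=value per line)."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def broker_ssl_properties(host: str, keystore: str, keystore_password: str, key_password: str,
                          truststore: str, truststore_password: str, port: int = 9093) -> str:
    return render_properties({
        "listeners": f"SSL://{host}:{port}",
        "security.inter.broker.protocol": "SSL",  # brokers also talk to each other over SSL
        "ssl.keystore.location": keystore,
        "ssl.keystore.password": keystore_password,
        "ssl.key.password": key_password,
        "ssl.truststore.location": truststore,
        "ssl.truststore.password": truststore_password,
        "ssl.client.auth": "required",  # producers must present a certificate
    })


def producer_ssl_properties(truststore: str, truststore_password: str,
                            keystore: str, keystore_password: str, key_password: str) -> str:
    return render_properties({
        "security.protocol": "SSL",  # encrypt traffic crossing the WAN
        "ssl.truststore.location": truststore,
        "ssl.truststore.password": truststore_password,
        # Client certificate, required because the broker sets ssl.client.auth=required.
        "ssl.keystore.location": keystore,
        "ssl.keystore.password": keystore_password,
        "ssl.key.password": key_password,
    })
```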
StreamSets runs in Mesos in Spark cluster mode, processing data from multiple source Shipped clusters and storing it in an Elasticsearch cluster.
[Diagram: Kafka → StreamSets Spark Streaming cluster (a master instance plus several Spark jobs) → Elasticsearch.]
Lambda Reference Architecture
• Global Monitoring / Analytics Cluster (global, Texas-1)
• Local Monitoring / Analytics Clusters (local: Texas-3, Ams.-1, Lon.-1)
• Local components and deployment are the same as global, just smaller
• Real-time and batch processing (Lambda), anomaly detection, visualization
[Diagram links labeled SSL, Kafka, and MQTT.]
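In a Lambda architecture, queries combine a batch view (recomputed periodically over all stored data) with a real-time view that covers only events since the last batch run. A minimal Python sketch for one additive metric, events per cluster, using hypothetical event dicts with a `cluster` field; non-additive metrics such as distinct counts or percentiles would need a different merge.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def count_by_cluster(events: Iterable[Mapping[str, str]]) -> Counter:
    """Aggregate shared by both layers: number of events per cluster."""
    return Counter(event["cluster"] for event in events)


def merge_views(batch_view: Mapping[str, int], realtime_view: Mapping[str, int]) -> Dict[str, int]:
    """Serving-layer merge: counts are additive, so the two views are summed."""
    merged = Counter(batch_view)
    merged.update(realtime_view)
    return dict(merged)
```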
Divide nodes by role for more stable cluster operation and easier scaling:
• 3 master/search nodes
• 5 live data nodes (indices with Shards=5, Replicas=4)
• 3 archive data nodes (indices with Shards=5, Replicas=1)
Node specification shown: CPU=4, RAM=30GB, HDD=4TB
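One common way to pin live and archive indices to their node groups is Elasticsearch shard allocation filtering: tag each data node with a custom attribute and require that attribute in the index settings. The sketch below builds such settings and checks the replica arithmetic. The attribute name `box_type` is a hypothetical choice (set on nodes as `node.box_type` in Elasticsearch 2.x or `node.attr.box_type` in 5.x and later); the slides do not say how the roles were enforced.

```python
def tiered_index_settings(tier: str, shards: int, replicas: int) -> dict:
    """Index-creation body that keeps all shard copies on nodes tagged box_type=<tier>."""
    return {
        "settings": {
            "index.number_of_shards": shards,
            "index.number_of_replicas": replicas,
            "index.routing.allocation.require.box_type": tier,
        }
    }


def shard_copies_per_node(shards: int, replicas: int, nodes: int) -> float:
    """Primaries plus replicas, spread evenly over the nodes of one tier."""
    return shards * (1 + replicas) / nodes


def replicas_allocatable(replicas: int, nodes: int) -> bool:
    """Elasticsearch never places two copies of the same shard on one node."""
    return replicas <= nodes - 1
```

With the slide's numbers, live indices (5 shards, 4 replicas) give 25 shard copies on 5 live nodes, so every live node holds a full copy of each live index; archive indices (5 shards, 1 replica) give 10 copies over 3 archive nodes.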
StreamSets pipelines process incoming messages and transform them according to business logic requirements: normalizing metrics, parsing log lines, and extracting important information with GROK filters or scripts.
Pipeline stages: Cluster Name Decorator; Fields Type Normalization; Metrics/Logs Stream Splitter; ES Logs Output; General GROK Filters; Float Value Truncate; ES Metrics Output; Shipped GROK Logic
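GROK expressions are named regex macros of the form `%{PATTERN:field}`. A minimal Python sketch of how such an expression expands into a regular expression with named groups, plus a decimal truncation in the spirit of the "Float Value Truncate" stage (assuming that stage trims decimal places). The pattern table is a hypothetical, much looser subset of the real GROK library, and the sample log line is invented.

```python
import re
from decimal import ROUND_DOWN, Decimal

# Hypothetical, simplified stand-ins for a few standard GROK patterns.
GROK_PATTERNS = {
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
    "WORD": r"\w+",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "GREEDYDATA": r".*",
}

_GROK_TOKEN = re.compile(r"%\{(\w+):(\w+)\}")


def grok_compile(expression: str) -> "re.Pattern[str]":
    """Replace each %{PATTERN:field} with a named capture group and compile the result."""
    def expand(match: "re.Match[str]") -> str:
        pattern, field = match.group(1), match.group(2)
        return f"(?P<{field}>{GROK_PATTERNS[pattern]})"
    return re.compile(_GROK_TOKEN.sub(expand, expression))


def truncate_float(value: float, digits: int = 2) -> float:
    """Drop (not round) decimal places beyond `digits`."""
    quantum = Decimal(1).scaleb(-digits)  # e.g. 0.01 for digits=2
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_DOWN))
```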
Marathon
• StreamSets instances running in Docker containers in Marathon
o Easy deployment and scaling
o Fast upgrade to newer versions
• Issues we faced with this approach:
o Containers were killed by Marathon
o We had to re-import the pipeline every time we launched a container
• Working with StreamSets to resolve the OOM issue, we increased container memory and SDC heap size
• At first all looked normal and we thought SDC was just starved of resources, but several days later SDC was killed again
• We increased memory and heap even more, to 16G, but bought only another day or two before it was killed again
• It looked like the SDC heap was constantly filling with data that never went away, which eventually killed the container
• GC was also working hard, and we sometimes saw freezes of up to 60 seconds
• We decided to move away from Docker
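A container's memory limit has to cover more than the JVM heap (metaspace, thread stacks, direct buffers, GC structures); if it does not, the kernel OOM killer can end the process before the JVM reports anything. A minimal sketch of sizing a container from a chosen heap, using a hypothetical rule of 25% headroom with a 1 GB minimum. Headroom alone would not fix a heap that keeps growing, as described in these slides; the standard HotSpot flag `-XX:+HeapDumpOnOutOfMemoryError` helps capture evidence when the JVM itself runs out of heap.

```python
def container_memory_mb(heap_mb: int, overhead_ratio: float = 0.25, min_overhead_mb: int = 1024) -> int:
    """Container limit = heap + off-heap headroom (assumed ratio, with a floor)."""
    overhead = max(int(heap_mb * overhead_ratio), min_overhead_mb)
    return heap_mb + overhead


def jvm_heap_flags(heap_mb: int) -> str:
    # Fixed-size heap (Xms == Xmx) plus a heap dump if the JVM hits OutOfMemoryError.
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m -XX:+HeapDumpOnOutOfMemoryError"
```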
• StreamSets reads JSON messages from the Kafka cluster and outputs them to the Elasticsearch cluster
o Deserializing and serializing JSON was very slow in a single-threaded process
o A Kafka consumer performance test showed:
▪ JSON format: 5k records/sec avg
▪ Text format: 50k records/sec avg
▪ Binary format: 250k records/sec avg
• The StreamSets team was very proactive about this issue, and within 2 days we received a fix for multi-threaded JSON parsing
o New testing showed:
▪ JSON format: 66k records/sec avg
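The gap between formats comes from per-record work: binary needs no decoding, text needs only decoding, and JSON needs a full parse into objects. A rough single-threaded Python micro-benchmark of the same comparison on synthetic messages; the message shape is hypothetical and the absolute numbers will not match the slide, which measured SDC's Kafka consumer. By the slide's own figures, the multi-threaded fix raised JSON throughput by about 66k / 5k ≈ 13 times.

```python
import json
import time
from typing import Callable, Dict, List


def make_messages(n: int) -> List[bytes]:
    """Synthetic collectd-like metric messages (hypothetical shape)."""
    return [
        json.dumps({"host": f"node-{i % 7}", "plugin": "cpu", "value": i * 0.5}).encode("utf-8")
        for i in range(n)
    ]


def records_per_sec(messages: List[bytes], handle: Callable[[bytes], object]) -> float:
    start = time.perf_counter()
    for message in messages:
        handle(message)
    elapsed = time.perf_counter() - start
    return len(messages) / elapsed if elapsed > 0 else float("inf")


def compare_formats(n: int = 50_000) -> Dict[str, float]:
    messages = make_messages(n)
    return {
        "json": records_per_sec(messages, json.loads),                   # full parse into dicts
        "text": records_per_sec(messages, lambda b: b.decode("utf-8")),  # decode only
        "binary": records_per_sec(messages, bytes),                      # copy raw bytes only
    }


if __name__ == "__main__":
    for fmt, rate in compare_formats().items():
        print(f"{fmt:>6}: {rate:,.0f} records/sec")
```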
• StreamSets never failed because of an internal logic bug, but we kept seeing the oom-killer kick in, and recovery was not automated
• We decided to leave Docker and run SDC natively on the host, still using Marathon for scaling and failover
• Without Docker, we can now upload our pipeline on SDC startup, and it starts working as soon as the instance has loaded
We can freely scale up or down whenever we need.
We also got rid of the oom-killer issue.
Each of our 3 SDC instances has already processed ~3B messages with no issues!
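Running SDC natively under Marathon means an app definition without a `container` section, so Mesos launches the command directly on the host. A minimal sketch that builds such a definition as a Python dict (serialize it with `json.dumps` and POST it to Marathon's `/v2/apps` endpoint). The app id, resource sizes, and the `start-sdc-with-pipeline.sh` wrapper are hypothetical; the script stands in for whatever sets SDC's `http.port` to Marathon's assigned `$PORT0`, starts SDC, and imports and starts the pipeline. That SDC reads heap flags from `SDC_JAVA_OPTS` is an assumption to check against your version's environment script.

```python
import json


def sdc_marathon_app(app_id: str, heap_mb: int, instances: int = 3, cpus: float = 1.0) -> dict:
    return {
        "id": app_id,
        # Hypothetical wrapper: set http.port=$PORT0, start SDC, import and start the pipeline.
        "cmd": "./start-sdc-with-pipeline.sh",
        "cpus": cpus,
        "mem": heap_mb + 1024,  # assumed headroom above the JVM heap, in MB
        "instances": instances,
        "env": {"SDC_JAVA_OPTS": f"-Xms{heap_mb}m -Xmx{heap_mb}m"},
        "portDefinitions": [{"port": 0, "name": "http"}],  # 0 = Marathon assigns a port
        # No "container" key: Mesos runs the command natively on the host, not in Docker.
        "healthChecks": [{
            "protocol": "TCP",
            "portIndex": 0,
            "gracePeriodSeconds": 300,    # time for SDC startup and pipeline import
            "intervalSeconds": 30,
            "maxConsecutiveFailures": 3,  # after this, Marathon kills and restarts the task
        }],
    }


if __name__ == "__main__":
    print(json.dumps(sdc_marathon_app("/shipped-analytics/sdc", heap_mb=2048), indent=2))
```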
• The StreamSets pipeline consumes metrics gathered by collectd and logs gathered by Logstash from 4 different clusters (including its own), transforms and decorates them, and sends them to Elasticsearch for storage and analytics.
• First, we consume messages from a Kafka topic at an average of 5,000 messages per second. The consumer itself parses the JSON format and passes the records on.
• The next stage is a JavaScript script that decorates each message with a cluster name, based on the instance hostname in that message.
• Then we exclude Marathon events from the stream, sending them directly to ES.
• The next stage splits the stream into 2 parts: logs and metrics.
• Metrics are sent straight to ES without any transformation.
• Logs are the most interesting part:
o We pop Docker container logs from the stream, delete the "time" field (a duplicate timestamp), and send them to ES.
o We separate logs from specific clusters, because we need to apply special logic to them.
o Separation is done by mapping IPs to clusters in the pipeline in real time.
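The routing steps in this pipeline can be sketched in Python as two functions: one decorates a record with its cluster name, the other decides which output the record goes to. The actual pipeline used a JavaScript evaluator and StreamSets stream selectors; this is only an illustration of the same decisions. All field names (`host`, `source`, `type`, `container_id`, `ip`), the host and IP tables, and the output names are hypothetical.

```python
from typing import Dict

# Hypothetical lookup tables; the real pipeline maintained its own.
HOST_TO_CLUSTER: Dict[str, str] = {"worker-01": "shipped-a", "worker-02": "shipped-b"}
SPECIAL_IP_TO_CLUSTER: Dict[str, str] = {"10.1.2.3": "shipped-c"}


def decorate_cluster(record: dict) -> dict:
    """Cluster name decoration: derive the cluster from the instance hostname."""
    record["cluster"] = HOST_TO_CLUSTER.get(record.get("host", ""), "unknown")
    return record


def route(record: dict) -> str:
    """Return the name of the output the record should be sent to."""
    if record.get("source") == "marathon":
        return "es-marathon-events"      # Marathon events bypass the rest of the pipeline
    if record.get("type") == "metric":
        return "es-metrics"              # metrics go straight to ES, untransformed
    if "container_id" in record:         # Docker container log
        record.pop("time", None)         # duplicate of the timestamp field
        return "es-docker-logs"
    if record.get("ip", "") in SPECIAL_IP_TO_CLUSTER:
        return "special-cluster-logic"   # logs needing cluster-specific handling
    return "es-logs"
```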
• We collect data from several Mesos clusters and need to correlate container metrics with their logs
• We use appID, taskID, and runID to identify a specific container's logs
• Container logs themselves have all three of these, while mesos-master and mesos-agent logs lack runID
• All unidentified data is discarded
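A minimal Python sketch of this identification rule: build the most specific key a record supports and drop records that carry no usable IDs. Treating mesos-master/mesos-agent records (appID and taskID, but no runID) as task-level matches is an assumption, as are the field names `app_id`, `task_id`, and `run_id`.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, ...]


def correlation_key(record: Dict[str, str]) -> Optional[Key]:
    app_id, task_id, run_id = record.get("app_id"), record.get("task_id"), record.get("run_id")
    if app_id and task_id and run_id:
        return (app_id, task_id, run_id)  # container logs and metrics: exact container
    if app_id and task_id:
        return (app_id, task_id)          # mesos-master / mesos-agent logs: task level
    return None                           # unidentified


def group_identified(records: Iterable[Dict[str, str]]) -> Dict[Key, List[Dict[str, str]]]:
    """Group records by correlation key, discarding unidentified ones."""
    groups: Dict[Key, List[Dict[str, str]]] = {}
    for record in records:
        key = correlation_key(record)
        if key is not None:
            groups.setdefault(key, []).append(record)
    return groups
```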
Current Shipped Analytics prod cluster configuration:
Kafka cluster: 7 brokers with 4 CPU and 16GB RAM each
• Logstash topic for all incoming messages, with 7 partitions and 2 replicas
• Current data flow: avg 5,000 messages/sec to Kafka
• Current data size: avg 1.2MB/sec to Kafka
StreamSets: 3 instances with identical pipeline configuration reading from the Kafka cluster
• The 7 partitions are split between the 3 instances as 3/2/2
• All 3 instances run natively on the host (non-Docker) under Marathon
• Marathon restarts a failed instance with automatic pipeline upload and start
Elasticsearch: 7 nodes with 4 CPU, 16GB RAM and 2TB storage each
• Each metric is written to its own index, 15 indices in total
• Each index has 5 primary shards and 5 replica shards
• Total doc count: 17.5B; total doc size: 3.84TB
• 1-day rate count: ~500M; 1-day rate size: ~120GB
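The 3/2/2 split is what a range-style assignment produces for 7 partitions and 3 consumers in one group: each consumer gets 7 // 3 = 2 partitions, and the first 7 % 3 = 1 consumer gets one extra. The same sketch sanity-checks the traffic figures, assuming decimal megabytes: 1.2 MB/s over 5,000 messages/s is about 240 bytes per message, and 5,000 messages/s is about 432M messages per day, the same order as the ~500M/day and ~120GB/day (about 240 bytes per doc) quoted for Elasticsearch.

```python
from typing import List

SECONDS_PER_DAY = 86_400


def range_assign(num_partitions: int, num_consumers: int) -> List[List[int]]:
    """Range-style assignment of one topic's partitions to consumers (sorted by member id)."""
    per_consumer, extra = divmod(num_partitions, num_consumers)
    assignment: List[List[int]] = []
    start = 0
    for i in range(num_consumers):
        count = per_consumer + (1 if i < extra else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment


def avg_message_bytes(bytes_per_sec: float, messages_per_sec: float) -> float:
    return bytes_per_sec / messages_per_sec


def messages_per_day(messages_per_sec: float) -> float:
    return messages_per_sec * SECONDS_PER_DAY
```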
StreamSets is a great product to work with, and the team is super helpful and works fast:
• Lots of input and output connectors, huge processing capabilities
• Very intuitive and rich user interface
• Easy to create pipelines visually instead of writing code
• Clear data flow paths
• Small resource consumption relative to performance
• Can easily handle up to 10k records/sec to Elasticsearch with 1 CPU and 2GB RAM
• Simple configuration and deployment process
• Open source (!)
• Fast logic changes with minimal downtime
• Preview mode (!): check every stage before throwing all your data at it
• Rich data transformation possibilities
• GROK filters: easy to migrate from Logstash
• Smart error handling
• Reliable: StreamSets never once crashed by itself; every crash was a Docker, Marathon, or Mesos issue
Thank You!

Contenu connexe

Tendances

Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureDatabricks
 
From Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache ApexFrom Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache ApexDataWorks Summit
 
Kafka Summit SF 2017 - Keynote - Go Against the Flow: Databases and Stream Pr...
Kafka Summit SF 2017 - Keynote - Go Against the Flow: Databases and Stream Pr...Kafka Summit SF 2017 - Keynote - Go Against the Flow: Databases and Stream Pr...
Kafka Summit SF 2017 - Keynote - Go Against the Flow: Databases and Stream Pr...confluent
 
Meetup070416 Presentations
Meetup070416 PresentationsMeetup070416 Presentations
Meetup070416 PresentationsAna Rebelo
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun JeongSpark Summit
 
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn confluent
 
Lambda architecture: from zero to One
Lambda architecture: from zero to OneLambda architecture: from zero to One
Lambda architecture: from zero to OneSerg Masyutin
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...HostedbyConfluent
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Spark Summit
 
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector Yahoo Developer Network
 
Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Julien Le Dem
 
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational DatabasesReal-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational DatabasesSingleStore
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookTangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookDatabricks
 
Kappa Architecture on Apache Kafka and Querona: datamass.io
Kappa Architecture on Apache Kafka and Querona: datamass.ioKappa Architecture on Apache Kafka and Querona: datamass.io
Kappa Architecture on Apache Kafka and Querona: datamass.ioPiotr Czarnas
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Spark Summit
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 Databricks
 

Tendances (20)

Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data Capture
 
From Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache ApexFrom Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache Apex
 
Kafka Summit SF 2017 - Keynote - Go Against the Flow: Databases and Stream Pr...
Kafka Summit SF 2017 - Keynote - Go Against the Flow: Databases and Stream Pr...Kafka Summit SF 2017 - Keynote - Go Against the Flow: Databases and Stream Pr...
Kafka Summit SF 2017 - Keynote - Go Against the Flow: Databases and Stream Pr...
 
Meetup070416 Presentations
Meetup070416 PresentationsMeetup070416 Presentations
Meetup070416 Presentations
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
 
Lambda architecture: from zero to One
Lambda architecture: from zero to OneLambda architecture: from zero to One
Lambda architecture: from zero to One
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
 
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
August 2016 HUG: Open Source Big Data Ingest with StreamSets Data Collector
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020
 
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational DatabasesReal-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookTangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
 
Kappa Architecture on Apache Kafka and Querona: datamass.io
Kappa Architecture on Apache Kafka and Querona: datamass.ioKappa Architecture on Apache Kafka and Querona: datamass.io
Kappa Architecture on Apache Kafka and Querona: datamass.io
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
 
ASPgems - kappa architecture
ASPgems - kappa architectureASPgems - kappa architecture
ASPgems - kappa architecture
 

En vedette

Monitoring Kafka w/ Prometheus
Monitoring Kafka w/ PrometheusMonitoring Kafka w/ Prometheus
Monitoring Kafka w/ Prometheuskawamuray
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic IntroductionMayur Rathod
 
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudRick Bilodeau
 
Bad Data is Polluting Big Data
Bad Data is Polluting Big DataBad Data is Polluting Big Data
Bad Data is Polluting Big DataStreamsets Inc.
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorCask Data
 
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr) ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr) Andreas Chatzakis
 
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...DataStax
 
Real-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackReal-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackAnirvan Chakraborty
 
Demystifying salesforce for developers
Demystifying salesforce for developersDemystifying salesforce for developers
Demystifying salesforce for developersHeitor Souza
 
Kafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroringKafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroringAnant Rustagi
 
Extreme Salesforce Data Volumes Webinar
Extreme Salesforce Data Volumes WebinarExtreme Salesforce Data Volumes Webinar
Extreme Salesforce Data Volumes WebinarSalesforce Developers
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormEdureka!
 
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Martin Zapletal
 
Cassandra Day Atlanta 2016 - Monitoring Cassandra
Cassandra Day Atlanta 2016  - Monitoring CassandraCassandra Day Atlanta 2016  - Monitoring Cassandra
Cassandra Day Atlanta 2016 - Monitoring Cassandraaaronmorton
 
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Spark Summit
 
Prometheus lightning talk (Devops Dublin March 2015)
Prometheus lightning talk (Devops Dublin March 2015)Prometheus lightning talk (Devops Dublin March 2015)
Prometheus lightning talk (Devops Dublin March 2015)Brian Brazil
 
Handling of Large Data by Salesforce
Handling of Large Data by SalesforceHandling of Large Data by Salesforce
Handling of Large Data by SalesforceThinqloud
 
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...Data Con LA
 
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMachine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMartin Zapletal
 

En vedette (20)

Monitoring Kafka w/ Prometheus
Monitoring Kafka w/ PrometheusMonitoring Kafka w/ Prometheus
Monitoring Kafka w/ Prometheus
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
 
Bad Data is Polluting Big Data
Bad Data is Polluting Big DataBad Data is Polluting Big Data
Bad Data is Polluting Big Data
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data Collector
 
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr) ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
 
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS...
 
Real-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackReal-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stack
 
Demystifying salesforce for developers
Demystifying salesforce for developersDemystifying salesforce for developers
Demystifying salesforce for developers
 
Kafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroringKafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroring
 
Extreme Salesforce Data Volumes Webinar
Extreme Salesforce Data Volumes WebinarExtreme Salesforce Data Volumes Webinar
Extreme Salesforce Data Volumes Webinar
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and Storm
 
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
 
Cassandra Day Atlanta 2016 - Monitoring Cassandra
Cassandra Day Atlanta 2016  - Monitoring CassandraCassandra Day Atlanta 2016  - Monitoring Cassandra
Cassandra Day Atlanta 2016 - Monitoring Cassandra
 
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
 
Prometheus lightning talk (Devops Dublin March 2015)
Prometheus lightning talk (Devops Dublin March 2015)Prometheus lightning talk (Devops Dublin March 2015)
Prometheus lightning talk (Devops Dublin March 2015)
 
Handling of Large Data by Salesforce
Handling of Large Data by SalesforceHandling of Large Data by Salesforce
Handling of Large Data by Salesforce
 
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
 
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMachine learning at Scale with Apache Spark
Machine learning at Scale with Apache Spark
 
Salesforce REST API
Salesforce  REST API Salesforce  REST API
Salesforce REST API
 

Similaire à Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud

Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackRich Lee
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceAmazon Web Services
 

Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud

  • 1. Case Study: Elasticsearch Ingest @ Cisco Intercloud
  • 2. Agenda • Express Overview of StreamSets Data Collector (Kirit Basu, Product Management, StreamSets) • Introduction to Elastic (Catherine Johnson, Solutions Architect, Elastic) • Implementing Shipped Analytics Using StreamSets and Elasticsearch (Dmitri Chtchourov, Innovation Architect, Cloud Solutions CTO Group)
  • 4. StreamSets at a Glance • History: Founded by Informatica and Cloudera veterans. • Mission: Bring operational excellence to managing data in motion. • Challenge: Move data efficiently and with quality in the face of change. • Solution: Open source software enabling performance management of data flows. • Use cases: Hadoop Ingest, Search Ingest, Message Broker Enablement, Log Shipping, Cloud Migration, IoT, ... • Momentum: Thousands of downloads, hundreds of companies using it. © 2015 StreamSets, Inc. All rights reserved. May not be copied, modified, or distributed in whole or part without written consent of StreamSets, Inc.
  • 5. StreamSets Data Collector: open source software for the rapid development and reliable operation of complex data flows. • Adaptable Flows for Efficiency: design ingest pipelines with minimal coding and maximum flexibility. • Data Flow KPIs for Control: monitor and act on data flow performance and data quality. • Containerized Architecture for Agility: operate continuously in the face of constant change.
  • 6. Get Started with StreamSets http://streamsets.com/opensource https://github.com/streamsets/datacollector/ #streamsets
  • 8. Software that makes massive amounts of structured and unstructured data usable for search, logging, analytics, and more in mission-critical systems and applications
  • 9. Examples: Elastic Stack Use Cases • Logging: IT Operations, Application Management, Security Analytics • Analytics: Marketing Insights, Business Development, Customer Sentiment • Search: Website Search, Internal/Intranet Search, URL Search • Applied to internal and external systems/applications, for developers, IT/Ops, and business users
  • 10. Elastic Solves Many Developer Use Cases • Handles complex & diverse data: social, location, user activity, machine (log files), documents • Meets today’s core developer requirements: many users / use cases, fast data processing, large data volumes, data quality & integrity, cross-source insights • Solves critical use cases: application search, embedded search, logging, security analytics, operational analytics, more …
  • 11. The Elastic Stack • Ingest • Store, Index, & Analyze • User Interface • Plugins: Monitoring, Security, Alerting • Elastic Cloud: Hosted Elasticsearch
  • 13. Implementing Shipped Analytics Using StreamSets and Elasticsearch: Dmitri Chtchourov, Innovation Architect, Cloud Solutions CTO Group; Tymofii Polekhin, Software Engineer
  • 14. Agenda • MANTL & Shipped • Shipped Analytics for Shipped • Why do we need Shipped Analytics? • Architecture and Data Flow • StreamSets Pipelines • End-to-end data flow and performance with Elasticsearch • Benefits of StreamSets • Demo
  • 15. Diagram: microservices architecture for Mesos frameworks and other components (captions: “Microservices managed and scaled separately”, “Microservices managed by Mesos in a single platform”). Components shown: Spark scheduler with Spark executors 1…N, Kafka scheduler with Kafka brokers 1…N, Docker, Traefik, microservices with REST APIs; scripted provisioning (Terraform) and direct provisioning; policy and auto-scaling; hosts VM1/BM1 through VM5/BM5 on CIS/AWS/Metastack/vSphere/UCS…
  • 16.
  • 17. Diagram: Shipped Analytics cluster and probes • Both Shipped and Shipped Analytics run on MANTL • Shipped Analytics: analysis of infrastructure and application logs and metrics
  • 19. Infrastructure layer (diagram): ZooKeeper cluster, Consul cluster, Mesos cluster, Marathon framework, Kafka cluster; collectors: topbeat, filebeat, journalbeat, dockerbeat • Experimenting with Elastic Beats (unified architecture, closer to the microservices model) • Elastic Beats to replace collectd plugins and cAdvisor for containers
  • 20. Diagram: <file | top | *>beat and collectd send data to Logstash, located through the DNS SRV records beats.logstash.service.consul and collectd.logstash.service.consul; Logstash performs data normalization, tagging, and cluster name decoration. Logstash is a single process per cluster, discoverable with the standard inter-cluster discovery mechanism. It gets metrics from collectd and logs from Filebeat on every slave, normalizes the data, and sends it to the desired output. NOTE: Logstash currently runs in a Docker container on every node; we will be moving to Filebeat and a Logstash Mesos framework soon.
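The normalization, tagging, and cluster-name decoration on slide 20 can be illustrated with a short sketch. It is not the deck's Logstash configuration: it assumes the input is collectd's JSON format (fields `values`, `dstypes`, `dsnames`, `time`, `host`, `plugin`, `plugin_instance`, `type`, `type_instance`) and emits a hypothetical flat document per data source with a `cluster` tag.

```python
import json
from typing import Any, Dict, List


def normalize_collectd(raw: str, cluster: str) -> List[Dict[str, Any]]:
    """Flatten a collectd JSON batch into one document per data source (assumed layout)."""
    docs: List[Dict[str, Any]] = []
    for entry in json.loads(raw):
        # Dotted metric name built from plugin/type, skipping empty instance fields.
        parts = [entry["plugin"], entry.get("plugin_instance", ""),
                 entry["type"], entry.get("type_instance", "")]
        base = ".".join(p for p in parts if p)
        for name, value, dstype in zip(entry["dsnames"], entry["values"], entry["dstypes"]):
            docs.append({
                "metric": f"{base}.{name}",
                "value": float(value),
                "dstype": dstype,
                "host": entry["host"],
                "timestamp": entry["time"],
                "cluster": cluster,  # tagging / cluster-name decoration
            })
    return docs
```

A collectd `load` sample with three data sources becomes three documents (`load.load.shortterm`, `load.load.midterm`, `load.load.longterm`), each carrying the cluster tag.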
  • 21. Kafka 0.9.0.0 supports SSL authentication and data encryption for producers. This is must-have security when sending data to an external destination over a WAN. Logstash sends data to the central Shipped Analytics (SA) cluster for long-term analytics (diagram: Logstash in the Shipped cluster → WAN with SSL encryption and SSL authentication → Kafka in Shipped Analytics).
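The producer side of slide 21 comes down to a handful of Kafka client settings. The sketch below renders them as a Java `.properties` file. The property names are standard Kafka (0.9+) client settings, but the host name, file paths, and passwords are hypothetical. How these map onto Logstash's Kafka output options depends on the plugin version.

```python
from typing import Dict


def kafka_ssl_producer_props(bootstrap: str,
                             truststore: str, truststore_password: str,
                             keystore: str, keystore_password: str,
                             key_password: str) -> Dict[str, str]:
    """Kafka producer settings for SSL encryption plus client-certificate authentication."""
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SSL",
        # Truststore: lets the producer verify the brokers' certificates.
        "ssl.truststore.location": truststore,
        "ssl.truststore.password": truststore_password,
        # Keystore: the producer's own certificate, presented for SSL client authentication.
        "ssl.keystore.location": keystore,
        "ssl.keystore.password": keystore_password,
        "ssl.key.password": key_password,
    }


def to_properties(props: Dict[str, str]) -> str:
    """Render settings in Java .properties format, one key=value per line."""
    return "\n".join(f"{k}={v}" for k, v in sorted(props.items())) + "\n"
```

Client authentication only takes effect if the brokers also require it (`ssl.client.auth=required` on the broker side). Without that setting, the keystore is not requested, and the connection is encrypted but not mutually authenticated.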
  • 22. StreamSets runs on Mesos in Spark cluster mode, processing data from multiple source Shipped clusters and storing it in the Elasticsearch cluster (diagram: Kafka → StreamSets Spark Streaming cluster, with a master instance and several Spark jobs → Elasticsearch).
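Slide 22's Kafka-to-Elasticsearch leg ends with documents written to Elasticsearch in batches, and Elasticsearch's standard batch write path is the `_bulk` API. As a rough illustration (not StreamSets' internal code), this sketch builds a `_bulk` request body: one action line and one source line per document, newline-terminated. The index and type names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Optional


def bulk_body(index: str, docs: Iterable[Dict[str, Any]],
              doc_type: Optional[str] = None) -> str:
    """Build an Elasticsearch _bulk request body (newline-delimited JSON)."""
    lines = []
    for doc in docs:
        action: Dict[str, str] = {"_index": index}
        if doc_type is not None:
            # Elasticsearch versions of this era (2.x) also expect a mapping type.
            action["_type"] = doc_type
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```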
  • 23. Lambda Reference Architecture: a global Monitoring / Analytics cluster (Texas-1) connected over SSL (Kafka, MQTT) with local Monitoring / Analytics clusters (Texas-3, Ams.-1, Lon.-1). Local components and deployment are the same as global, just smaller. Capabilities: real-time and batch processing (Lambda), anomaly detection, visualization.
  • 24. Divide nodes by role for more stable cluster operation and ease of scalability • 3 master/search nodes • 5 live data nodes (Shards=5, Replicas=4) • 3 archive data nodes (Shards=5, Replicas=1) • Node spec: CPU=4, RAM=30GB, HDD=4TB
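The live/archive split on slide 24 is usually implemented in Elasticsearch with shard allocation filtering on a custom node attribute. This sketch assumes a hypothetical attribute named `box_type`, set on each node (`node.box_type: live` in 2.x, `node.attr.box_type: live` from 5.x). It also checks the constraint behind the replica counts: Elasticsearch never places two copies of the same shard on one node.

```python
from typing import Dict


def replicas_fit(replicas: int, data_nodes: int) -> bool:
    """Each copy of a shard (1 primary + replicas) needs its own data node,
    otherwise the surplus replicas stay unassigned."""
    return replicas + 1 <= data_nodes


def tier_index_settings(shards: int, replicas: int, tier: str) -> Dict[str, object]:
    """Index settings pinning an index to nodes tagged with the hypothetical box_type attribute."""
    return {
        "index.number_of_shards": shards,
        "index.number_of_replicas": replicas,
        "index.routing.allocation.require.box_type": tier,
    }
```

With Shards=5 and Replicas=4, each live index holds 5 × (1 + 4) = 25 shard copies, i.e. one copy of every shard on each of the 5 live nodes. Replicas=4 is therefore the maximum that tier can fully allocate. The archive tier's Replicas=1 needs only 2 of its 3 nodes per shard.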
  • 25. StreamSets pipelines process incoming messages and transform them according to business logic requirements: normalizing metrics, parsing log lines, and extracting important information with GROK filters or scripts. Pipeline stages (diagram): Cluster Name Decorator, Fields Type Normalization, Metrics/Logs Stream Splitter, General GROK Filters, Shipped GROK Logic, Float Value Truncate, ES Logs Output, ES Metrics Output.
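A GROK filter is a regular expression assembled from named pattern macros: `%{PATTERN:field}` expands to a named capture group. The sketch below implements that expansion for a tiny, simplified subset of patterns. The names `IPV4`, `WORD`, `NUMBER`, and `GREEDYDATA` follow Logstash's grok library, but the regexes here are simplified stand-ins, not the library's definitions.

```python
import re
from typing import Dict, Optional

# Simplified stand-ins for a few standard grok patterns.
PATTERNS: Dict[str, str] = {
    "IPV4": r"(?:\d{1,3}\.){3}\d{1,3}",
    "WORD": r"\w+",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "GREEDYDATA": r".*",
}

# Matches %{PATTERN} or %{PATTERN:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_compile(expr: str) -> "re.Pattern[str]":
    """Expand %{PATTERN:field} references into named regex groups."""
    def expand(m: "re.Match[str]") -> str:
        pattern, field = m.group(1), m.group(2)
        body = PATTERNS[pattern]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.compile(GROK_REF.sub(expand, expr))


def grok_match(expr: str, line: str) -> Optional[Dict[str, str]]:
    """Return the extracted fields, or None if the line does not match."""
    m = grok_compile(expr).match(line)
    return m.groupdict() if m else None
```

For example, `grok_match(r"%{IPV4:client} %{WORD:method} %{NUMBER:bytes}", "10.0.0.7 GET 512")` yields `client`, `method`, and `bytes` fields. Logstash users migrating pipelines write expressions in this same macro style.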
  • 26. Marathon • StreamSets instances ran in Docker containers in Marathon o Easy deployment and scaling o Fast upgrade to newer versions • Issues we faced with this approach: o Containers were killed by Marathon o We needed to re-import the pipeline every time we launched a container
  • 27. Marathon • Working with StreamSets to resolve the OOM issue, we increased the container memory and the SDC heap size • At first all looked normal, and we thought SDC had simply been starved of resources, but several days later SDC was killed again • We increased memory and heap even more, to 16G, but that bought us only another day or two before it was killed again • It looked like the SDC heap was constantly filling with data that didn't go away, eventually getting the container killed • GC was also working hard, and sometimes we got freezes of up to 60 seconds • We decided to move away from Docker
  • 28. Marathon • StreamSets reads JSON messages from the Kafka cluster and outputs them to the Elasticsearch cluster o De-serializing and serializing JSON was very slow in a single-threaded process o A Kafka consumption performance test showed: JSON format: 5k records/sec avg; Text format: 50k records/sec avg; Binary format: 250k records/sec avg • The StreamSets team was very proactive with this issue, and within 2 days we received a fix for multi-threaded JSON parsing o New testing showed: JSON format: 66k records/sec avg
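Slide 28's fix was multi-threaded JSON parsing inside SDC, which runs on the JVM. This sketch is not that fix. It only illustrates the general chunk-parse-reassemble structure, which keeps record order. One caveat: in CPython, threads do not run `json.loads` in parallel because of the GIL, so a real speedup there would need a `ProcessPoolExecutor`. JVM threads have no such limit.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List


def parse_batch(lines: List[str], workers: int = 4, chunk: int = 1000) -> List[Any]:
    """Parse a batch of JSON records in chunks on a worker pool, preserving input order."""
    chunks = [lines[i:i + chunk] for i in range(0, len(lines), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # executor.map yields results in submission order, so record order is kept.
        parsed_chunks = pool.map(lambda c: [json.loads(s) for s in c], chunks)
        return [rec for part in parsed_chunks for rec in part]
```

For scale, the reported result of 66k records/sec is roughly 13 times the earlier single-threaded JSON rate of 5k records/sec.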
  • 29. Marathon • StreamSets never failed because of internal logic bugs, but we kept seeing the oom-killer pop up, and recovery was not automated • We decided to leave Docker and run SDC natively on the host, still using Marathon for scaling and failover • Without Docker, we can now upload our pipeline on SDC startup, and it starts working as soon as the instance has loaded • We can freely scale up/down whenever we need • We also got rid of the oom-killer issue
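A native (non-Docker) SDC deployment under Marathon, as on slide 29, can be described by an app definition without a `container` section. Marathon relaunches a task when its process exits. This sketch builds such a definition. `id`, `cmd`, `instances`, `cpus`, `mem`, and `env` are standard Marathon app fields. The app id and the wrapper script `start-sdc-with-pipeline.sh` are hypothetical. Passing the heap through `SDC_JAVA_OPTS` is an assumption; check how your SDC version's `sdc-env.sh` handles that variable.

```python
from typing import Any, Dict


def sdc_marathon_app(instances: int, cpus: float, mem_mb: int, heap_mb: int) -> Dict[str, Any]:
    """Marathon app definition running SDC directly on the host (no "container" section)."""
    if heap_mb >= mem_mb:
        # The JVM also needs non-heap memory (metaspace, thread stacks, buffers),
        # so a heap equal to the task's memory leaves no headroom.
        raise ValueError("heap must be smaller than the task's memory allocation")
    return {
        "id": "/shipped-analytics/sdc",  # hypothetical app id
        # Hypothetical wrapper: starts SDC, then imports and starts the pipeline.
        "cmd": "./start-sdc-with-pipeline.sh",
        "instances": instances,
        "cpus": cpus,
        "mem": mem_mb,  # Marathon memory is given in MiB
        "env": {"SDC_JAVA_OPTS": f"-Xms{heap_mb}m -Xmx{heap_mb}m"},
    }
```

The wrapper script is where "upload our pipeline on SDC startup" would live. Because it runs on every launch, a task that Marathon restarts comes back with its pipeline already loaded.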
  • 30. Each of our 3 SDC instances has already processed ~3B messages, with no issues!
  • 31. • The StreamSets pipeline consumes metrics gathered by collectd and logs gathered by Logstash from 4 different clusters (including its own), transforms and decorates them, and sends them to Elasticsearch for storage and analytics. • First, we consume messages from a Kafka topic at an average of 5,000 messages per second. The consumer itself parses the JSON format and passes the records on. • The next stage is a JavaScript script that decorates each message with a cluster name, based on the instance hostname in that message. • Finally, we exclude Marathon events from the stream and send them directly to ES.
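The deck implements slide 31's decoration step as a JavaScript evaluator stage. The Python sketch below shows the same logic: derive a cluster name from the record's hostname, then route Marathon events away from the main stream. The hostname-prefix table and the `source` field that marks Marathon events are hypothetical, since the deck does not show its actual rules.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hostname-prefix -> cluster mapping.
HOST_PREFIXES: Dict[str, str] = {
    "shipped-a-": "shipped-a",
    "shipped-b-": "shipped-b",
    "sa-": "shipped-analytics",
}


def decorate(record: Dict) -> Dict:
    """Add a 'cluster' field (in place) derived from the record's hostname."""
    host = record.get("host", "")
    record["cluster"] = next(
        (cluster for prefix, cluster in HOST_PREFIXES.items() if host.startswith(prefix)),
        "unknown",
    )
    return record


def route(records: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Decorate records, then split them into (marathon_events, everything_else)."""
    marathon: List[Dict] = []
    rest: List[Dict] = []
    for rec in map(decorate, records):
        (marathon if rec.get("source") == "marathon" else rest).append(rec)
    return marathon, rest
```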
  • 32. • The next stage splits the stream into 2 parts: logs and metrics • Metrics are sent straight to ES without any transformation • Logs are the most interesting part: o We pop Docker container logs from the stream, delete the “time” field that duplicates the timestamp, and send them to ES o We separate logs from specific clusters, because we need to apply special logic to them o The separation is done by mapping IPs to clusters in the pipeline in real time
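Slide 32's splitter can be sketched as a single routing function. It sends metrics straight through, strips the duplicate `time` field from Docker container logs, and assigns the remaining logs to clusters by IP. The subnet table and the field names `kind`, `type`, and `ip` are hypothetical. The deck says IPs are mapped to clusters but does not list the mapping.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical subnet-to-cluster table.
CLUSTER_NETS = {
    "shipped-a": ipaddress.ip_network("10.1.0.0/16"),
    "shipped-b": ipaddress.ip_network("10.2.0.0/16"),
}


def cluster_for_ip(ip: str) -> Optional[str]:
    """Return the cluster whose subnet contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for name, net in CLUSTER_NETS.items():
        if addr in net:
            return name
    return None


def split(records: List[Dict]) -> Dict[str, List[Dict]]:
    """Route records into metrics, Docker logs, per-cluster logs, and other logs."""
    out: Dict[str, List[Dict]] = {"metrics": [], "docker_logs": [], "other_logs": []}
    for rec in records:
        if rec.get("kind") == "metric":
            out["metrics"].append(rec)  # metrics go to ES untransformed
        elif rec.get("type") == "docker":
            # Drop the "time" field that duplicates the timestamp.
            out["docker_logs"].append({k: v for k, v in rec.items() if k != "time"})
        else:
            cluster = cluster_for_ip(rec["ip"]) if "ip" in rec else None
            key = f"logs:{cluster}" if cluster else "other_logs"
            out.setdefault(key, []).append(rec)
    return out
```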
  • 33. • We collect data from several Mesos clusters and need to correlate container metrics with their logs • We use appID, taskID, and runID to identify a specific container's logs • Container logs themselves carry all three of these, while mesos-master and mesos-agent logs lack runID • All unidentified data is discarded
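The correlation rule on slide 33 amounts to grouping records on the (appID, taskID, runID) key. The sketch below takes the strictest reading of "all unidentified data is discarded": any record missing one of the three IDs is dropped, which includes mesos-master and mesos-agent logs without runID. The field names follow the slide. Storing the joined result as a dictionary is an illustrative assumption.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]


def container_key(rec: Dict) -> Optional[Key]:
    """Return (appID, taskID, runID) if all three are present, else None."""
    ids = tuple(rec.get(f) for f in ("appID", "taskID", "runID"))
    return ids if all(ids) else None  # type: ignore[return-value]


def correlate(metrics: List[Dict], logs: List[Dict]) -> Dict[Key, Dict[str, List[Dict]]]:
    """Group metrics and logs per container; records missing any ID are discarded."""
    joined: Dict[Key, Dict[str, List[Dict]]] = {}
    for kind, records in (("metrics", metrics), ("logs", logs)):
        for rec in records:
            key = container_key(rec)
            if key is None:
                continue  # unidentified, e.g. agent logs without runID
            joined.setdefault(key, {"metrics": [], "logs": []})[kind].append(rec)
    return joined
```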
  • 34. Current Shipped Analytics prod cluster configuration: • Kafka cluster: o 7 brokers with 4 CPU and 16GB RAM each o Logstash topic for all incoming messages, with 7 partitions and 2 replicas o Current data flow: avg 5,000 messages/sec to Kafka o Current data size: avg 1.2MB/sec to Kafka • StreamSets: o 3 instances with identical pipeline configuration reading from the Kafka cluster o The 7 partitions are split among the 3 instances as 3/2/2 o All 3 instances run natively on the host (non-Docker) under Marathon o Marathon restarts a failed instance with automatic pipeline upload and start • Elasticsearch: o 7 nodes with 4 CPU, 16GB RAM, and 2TB storage each o Each metric is written to its own index, 15 indexes in total o Each index has 5 primary shards and 5 replica shards o Total doc count: 17.5B o Total doc size: 3.84TB o Daily count: ~500M o Daily size: ~120GB
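The 3/2/2 split on slide 34 is what a range-style partition assignment produces. Suppose, as an assumption since the deck does not say how the split arises, that the three SDC instances share one Kafka consumer group. Kafka's range assignor sorts the members and gives each a contiguous block of partitions, with the first `partitions % members` consumers taking one extra. The instance names below are hypothetical.

```python
from typing import Dict, List


def range_assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Range-style assignment of one topic's partitions to sorted consumers."""
    if not consumers:
        raise ValueError("need at least one consumer")
    members = sorted(consumers)
    per, extra = divmod(num_partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` consumers receive one additional partition.
        count = per + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment
```

With 7 partitions and 3 consumers, `divmod(7, 3)` is `(2, 1)`, so the first instance gets 3 partitions and the other two get 2 each.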
  • 35. • StreamSets is a great product to work with, and the team is super helpful and works fast • Lots of input and output connectors, huge processing capabilities • Very intuitive and rich user interface • Easy to create pipelines visually instead of writing code • Clear data flow paths • Small resource consumption relative to performance • Easily handles up to 10k records/sec to Elasticsearch with 1 CPU and 2GB RAM • Simple configuration and deployment process • Open source(!) • Fast logic changes with minimum downtime • Preview mode(!): check every stage before throwing all your data at it • Rich data transformation possibilities • GROK filters: easy to migrate from Logstash • Smart error handling • Reliable: StreamSets never once crashed by itself; the only issues came from Docker, Marathon, and Mesos