SlideShare une entreprise Scribd logo
1  sur  50
Télécharger pour lire hors ligne
Prasanth Kothuri, CERN
Daniel Lanza Garcia, CERN
Real-time Detection Of Anomalies In
The Database Infrastructure Using
Apache Spark
#EUde13
Outline
• About CERN
• Apache Spark at CERN
• Data Lake Challenges
• Detecting Anomalous Connections
• Real-time Intelligent Monitoring
2#EUde13
About CERN
3#EUde13
• CERN - European Laboratory for Particle Physics
• Founded in 1954 by 12 Countries for fundamental
physics research in the post-war Europe
• Today 21 member states + world-wide
collaborations
• About ~1000 MCHF yearly budget
• 2’300 CERN personnel
• 10’000 users from 110 countries
Fundamental Research
4#EUde13
• What is 95% of the Universe made
of?
• Why do particles have mass?
• Why is there no antimatter left in
the Universe?
• What was the Universe like just
after the “Big Bang”
Who Changed My Plan? 5
The Large Hadron Collider (LHC)
Largest machine in the world
27km, 6000+ superconducting magnets
Emptiest place in the solar system
High vacuum inside the magnets
Hottest spot in the galaxy
During Lead ion collisions create temperatures 100 000x hotter than the heart of the sun;
Fastest racetrack on Earth
Protons circulate 11245 times/s (99.9999991% the speed of light)
Apache Spark @ CERN
• Spark is a popular compute engine for Analytics
+ ETL
– Deployed on four production Hadoop/YARN clusters
• Aggregated capacity (2017): ~1500 physical cores, 11 PB
– Adoption is growing. Key projects involving Spark:
• Analytics for accelerator controls and logging
• Monitoring IT Infrastructure, uses Spark + Spark streaming
• Analytics on aggregated logs
• Explorations on the use of Spark for high energy physics
6#EUdev2
Detection Of Anomalies In
Database Infrastructure
7#EUde13
CERN Database Infrastructure
8#EUde13
• Over 100 Oracle databases, most of them RAC
• Running Oracle 11.2.0.4 and 12.1.0.2
• 950 TB of data files for production DBs in total (without counting
replicas), NAS as storage
• Oracle In-Memory in production since July 2015
• Examples of critical production DBs:
• Quench Protection System: 150’000 changes/s
• LHC logging database: 492 TB, growing 180 TB/year
• But also DB as a Service (single instances)
• 310 MySQL, 70 PostgreSQL, 9 Oracle, 5 InfluxDB
Data Lake
9#EUde13
• Objective: Build central repository for database audit,
performance metrics and logs with the goal of real time
analytics and offline analytics
• Use cases
• Detection of Anomalies
• Troubleshooting
• Capacity Planning
Diverse Data
10#EUde13
• Tables
• Audit
• Performance Metrics
• Alert
• Log files
• Listener
Data pipeline Architecture
11#EUde13
alertlog
Audit
AWR
Kafka
500 events/minute
500 events/minute
500 events/minute
70.000 events/minute
70.000 events/minute
HDFS ACLs
Per-index ACLS
Aggregations and Dashboards
12#EUde13
Real-time analysis
13#EUde13
• Attempt to detect anomalous connections using Machine
Learning techniques
• First approach to use kNN classifier to classify connection
as normal or anomalous
• Anomaly detection suffer from high false positive rate, the
challenge is to choose features that best characterize
user connections pattern
Feature Selection
14#EUde13
• Data Preparation
• Feature Extraction
• Client_host
• Client_user
• Client_program
• Service_name
• Hour_of_the_day
• Day_of_the_week
{
"oracle_sid" : "XXXX_YYYY",
"listener_name": "listener_scan1",
"database_type" : "oracle",
"producer" : "oracle",
"source_type": "listener",
"hostname" : "XXXXX.cern.ch",
"flume_agent_version": "0.1.6-7.el6",
"type" : "dbconnection",
"event_timestamp" : "2017-09-27T04:45:27+0200",
"CONNECT_DATA_SERVER": "DEDICATED",
"CONNECT_DATA_SERVICE_NAME": “XXXr.cern.ch",
"client_program": "python",
"client_host" : "pxxxxj2.cern.ch",
"client_user": "merge",
"client_protocol" : "tcp",
"client_ip": "137.100.100.167",
"client_port": 38432,
"type" : "establish",
"service_name" : "XXXX.cern.ch",
"return_code": 0
}
Detecting anomalous connections job
• Machine learning: kNN
15#EUde13
Threshold
Detecting anomalous connections job
• Machine learning: kNN
16#EUde13
Threshold
Detecting anomalous connections job
• Machine learning: kNN
17#EUde13
Threshold
Detecting anomalous connections job
• Machine learning: kNN
18#EUde13
Threshold
Detecting anomalous connections job
• Machine learning: kNN
19#EUde13
Threshold
Detecting anomalous connections job
• Machine learning: kNN
– Making it scalable, each user should behave similarly
20#EUde13
Detecting anomalous connections job
• Distance Function:
21#EUde13
• Spark Streaming Job processing – 10 million / day
• 6 executors, each with 4 cores and 8GB of memory
Metrics monitor job
• A general purpose metrics monitor
22#EUde13
New metrics by
aggregation and
math equations
Metrics
source
Notifications
sink
(optional)
Metrics monitor job: data flow
23#EUde13
Monitor 1
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
Monitor 2
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
Metrics
source
Analysis
results
sink
(optional)
Monitor N
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator 1
Notificator 2
Notificator N
Metrics
source
Notifications
sink
(optional)
Metrics monitor job: modularity
24#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Metrics
source
Analysis
results
sink
(optional)
Notificator
• Easy implementation of new components
– Extending corresponding abstract class
– Stateful operations: a store is available
Metrics
source
Properties
source
• Which metrics should process this monitor?
– CPU Usage of all databases, Cache Hit % of MySQL
databases, all machines in PROD, …
• Filter metrics by attributes
• Accepts exact value or regular expressions
• If not applied, all metrics are processed
25#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Massage metric value
– Average, difference, filter, …
• Internal store for stateful pre-analysis
• If applied, result will be the analyzed value
26#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: average value of a period
27#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: weighted average value of a period
28#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: difference with previous value
29#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: difference with previous value
30#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Is the metric behaving as it should?
• Determine the status of each incoming metric
– OK, WARNING, ERROR, EXCEPTION
• Results can be sent to external storage
• Internal store for stateful analysis
31#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: thresholds fixed manually
32#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning from recent behaviour (average and variance)
– Errors when metric does not behave as it used to
33#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning from recent behaviour (average and variance)
– Easily affected by outliers
34#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning from recent behaviour (percentiles)
35#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning a season (hour, day, week)
36#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning a season (hour, day, week)
37#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning a season (hour, day, week)
38#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: learning a season (hour, day, week)
39#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Should we inform anyone?
– Metric has been in ERROR for 20 minutes…
• Determines when a notification is raised
• Based on metric statuses
• Notifications are consumed by Notifications sink
40#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: constant status for certain period
41#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
• Type: percentage of the period in certain statuses
42#EUde13
Monitor
Filter
(optional)
Pre-analysis
(optional)
Analysis
Notificator
Notificator
Metrics monitor job: configuration
43#EUde13
checkpoint.dir = <path_to_store_stateful_data> (default: /tmp/)
spark.batch.time = <seconds> (default: 30)
metrics.source.kafka-prod.type = kafka
metrics.source.kafka-prod.consumer.bootstrap.servers = server:port,server:port
metrics.source.kafka-prod.topics = db-logging-platform
metrics.source.kafka-prod.parser.attributes = INSTANCE_NAME METRIC_NAME
metrics.source.kafka-prod.parser.value.attribute = VALUE
metrics.source.kafka-prod.parser.timestamp.attribute = END_TIME
results.sink.type = elastic
results.sink.index = itdb_db-metric-results/log
notifications.sink.type = elastic
notifications.sink.index = itdb_db-metric-notifications/log
# Monitors
monitor.<monitor-id-1>.<confs>...
monitor.<monitor-id-2>.<confs>...
monitor.<monitor-id-n>.<confs>...
Properties
source
Metrics monitor job: monitor configuration
44#EUde13
# Monitor CPU of all instances in production
monitor.CPUUsage.filter.attribute.INSTANCE_NAME = regex:prod-.*
monitor.CPUUsage.filter.attribute.METRIC_NAME = CPU Usage Per Sec
monitor.CPUUsage.missing.max-period = 3m
monitor.CPUUsage.pre-analysis.type = weighted-average
monitor.CPUUsage.pre-analysis.period = 5m
monitor.CPUUsage.analysis.type = seasonal
monitor.CPUUsage.analysis.season = hour
monitor.CPUUsage.analysis.learning.ratio = 0.2
monitor.CPUUsage.notificator.error-perc.type = percentage
monitor.CPUUsage.notificator.error-perc.statuses = ERROR
monitor.CPUUsage.notificator.error-perc.period = 10m
monitor.CPUUsage.notificator.warn-perc.type = percentage
monitor.CPUUsage.notificator.warn-perc.statuses = WARNING
monitor.CPUUsage.notificator.warn-perc.period = 20m
• Configuration can be updated while running
• Add/remove monitor or change any parameter
Filter
Pre-
anal
ysis
Anal
ysis
Notificator
Notificator
Metrics monitor job
• We are monitoring:
– Around 30K metrics per minute (day season analysis)
– Processing time = 30 seconds/minute
– On 3 executors (YARN)
• Repo, user’s and developer’s manual, contributions,
issues:
– https://github.com/cerndb/metrics-monitor
45#EUde13
Tips and Suggestions
46#EUde13
Java 8:
47#EUde13
Java 8:
Extend RDDs and Streams + Optional + lambdas + +
java.streaming.* + reference methods =
48#EUde13
Not using Spark check-pointing feature
49#EUde13
• Pro: develop/update the job without loosing stateful data
• Cons: implement save/load
• Note: for stateful operations checkpoint needs to be enable
– But when restarting, data is removed
Conclusion
50#EUde13
• Building the data lake opens up for several kinds of analysis
• Implementation of anomalous connections helps us to identify
potential intrusions to the database infrastructure
• Intelligent monitoring tool learns and adapts the threshold
• Intelligent monitoring tool is open source and available for
everyone
• Apache Spark Streaming helps us to perform machine
learning at scale

Contenu connexe

Tendances

Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Cruise Control: Effortless management of Kafka clusters
Cruise Control: Effortless management of Kafka clustersCruise Control: Effortless management of Kafka clusters
Cruise Control: Effortless management of Kafka clustersPrateek Maheshwari
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...DataWorks Summit/Hadoop Summit
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uberconfluent
 
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...HostedbyConfluent
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Databricks
 
More Than Monitoring: How Observability Takes You From Firefighting to Fire P...
More Than Monitoring: How Observability Takes You From Firefighting to Fire P...More Than Monitoring: How Observability Takes You From Firefighting to Fire P...
More Than Monitoring: How Observability Takes You From Firefighting to Fire P...DevOps.com
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationDatabricks
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Kai Wähner
 
Linking Metrics to Logs using Loki
Linking Metrics to Logs using LokiLinking Metrics to Logs using Loki
Linking Metrics to Logs using LokiKnoldus Inc.
 
uReplicator: Uber Engineering’s Scalable, Robust Kafka Replicator
uReplicator: Uber Engineering’s Scalable,  Robust Kafka ReplicatoruReplicator: Uber Engineering’s Scalable,  Robust Kafka Replicator
uReplicator: Uber Engineering’s Scalable, Robust Kafka ReplicatorMichael Hongliang Xu
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...HostedbyConfluent
 
Hunting for Evil with the Elastic Stack
Hunting for Evil with the Elastic StackHunting for Evil with the Elastic Stack
Hunting for Evil with the Elastic StackElasticsearch
 
Life as a SRE at Instana
Life as a SRE at InstanaLife as a SRE at Instana
Life as a SRE at InstanaMarcel Birkner
 
Elastic Observability keynote
Elastic Observability keynoteElastic Observability keynote
Elastic Observability keynoteElasticsearch
 
The Exascale Computing Project and the future of HPC
The Exascale Computing Project and the future of HPCThe Exascale Computing Project and the future of HPC
The Exascale Computing Project and the future of HPCinside-BigData.com
 
Logging and observability
Logging and observabilityLogging and observability
Logging and observabilityAnton Drukh
 

Tendances (20)

Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Cruise Control: Effortless management of Kafka clusters
Cruise Control: Effortless management of Kafka clustersCruise Control: Effortless management of Kafka clusters
Cruise Control: Effortless management of Kafka clusters
 
Introducing Splunk – The Big Data Engine
Introducing Splunk – The Big Data EngineIntroducing Splunk – The Big Data Engine
Introducing Splunk – The Big Data Engine
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
 
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0
 
More Than Monitoring: How Observability Takes You From Firefighting to Fire P...
More Than Monitoring: How Observability Takes You From Firefighting to Fire P...More Than Monitoring: How Observability Takes You From Firefighting to Fire P...
More Than Monitoring: How Observability Takes You From Firefighting to Fire P...
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
 
Observability & Datadog
Observability & DatadogObservability & Datadog
Observability & Datadog
 
Linking Metrics to Logs using Loki
Linking Metrics to Logs using LokiLinking Metrics to Logs using Loki
Linking Metrics to Logs using Loki
 
Introduction to Software Defined Networking (SDN)
Introduction to Software Defined Networking (SDN)Introduction to Software Defined Networking (SDN)
Introduction to Software Defined Networking (SDN)
 
uReplicator: Uber Engineering’s Scalable, Robust Kafka Replicator
uReplicator: Uber Engineering’s Scalable,  Robust Kafka ReplicatoruReplicator: Uber Engineering’s Scalable,  Robust Kafka Replicator
uReplicator: Uber Engineering’s Scalable, Robust Kafka Replicator
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 
Hunting for Evil with the Elastic Stack
Hunting for Evil with the Elastic StackHunting for Evil with the Elastic Stack
Hunting for Evil with the Elastic Stack
 
Life as a SRE at Instana
Life as a SRE at InstanaLife as a SRE at Instana
Life as a SRE at Instana
 
Elastic Observability keynote
Elastic Observability keynoteElastic Observability keynote
Elastic Observability keynote
 
The Exascale Computing Project and the future of HPC
The Exascale Computing Project and the future of HPCThe Exascale Computing Project and the future of HPC
The Exascale Computing Project and the future of HPC
 
Logging and observability
Logging and observabilityLogging and observability
Logging and observability
 

Similaire à Real-Time Detection of Anomalies in the Database Infrastructure using Apache Spark with Daniel Lanza and Prasanth Kothuri

How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceinside-BigData.com
 
Combining out - of - band monitoring with AI and big data for datacenter aut...
Combining out - of - band monitoring with AI and big data  for datacenter aut...Combining out - of - band monitoring with AI and big data  for datacenter aut...
Combining out - of - band monitoring with AI and big data for datacenter aut...Ganesan Narayanasamy
 
Combining remote sensing earth observations and in situ networks: detection o...
Combining remote sensing earth observations and in situ networks: detection o...Combining remote sensing earth observations and in situ networks: detection o...
Combining remote sensing earth observations and in situ networks: detection o...Integrated Carbon Observation System (ICOS)
 
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic AnalyticsSAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic AnalyticsQin Liu
 
ET-NavSwarm 2016 URC Poster
ET-NavSwarm 2016 URC PosterET-NavSwarm 2016 URC Poster
ET-NavSwarm 2016 URC PosterJoshua Proulx
 
Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...
Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...
Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...Szymon Skorupinski
 
The Search for Gravitational Waves
The Search for Gravitational WavesThe Search for Gravitational Waves
The Search for Gravitational Wavesinside-BigData.com
 
Sensor Organism project presentation
Sensor Organism project presentationSensor Organism project presentation
Sensor Organism project presentationNaums Mogers
 
Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...
Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...
Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...Deep Learning Italia
 
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacIntro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacApache Apex
 
Python for High Throughput Science by Mark Basham
Python for High Throughput Science by Mark BashamPython for High Throughput Science by Mark Basham
Python for High Throughput Science by Mark BashamPyData
 
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...Spark Summit
 
Big Fast Data in High-Energy Particle Physics
Big Fast Data in High-Energy Particle PhysicsBig Fast Data in High-Energy Particle Physics
Big Fast Data in High-Energy Particle PhysicsAndrew Lowe
 
OSMC 2012 | Monitoring at CERN by Christophe Haen
OSMC 2012 | Monitoring at CERN by Christophe HaenOSMC 2012 | Monitoring at CERN by Christophe Haen
OSMC 2012 | Monitoring at CERN by Christophe HaenNETWAYS
 

Similaire à Real-Time Detection of Anomalies in the Database Infrastructure using Apache Spark with Daniel Lanza and Prasanth Kothuri (20)

How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental science
 
DA-JPL-final
DA-JPL-finalDA-JPL-final
DA-JPL-final
 
Combining out - of - band monitoring with AI and big data for datacenter aut...
Combining out - of - band monitoring with AI and big data  for datacenter aut...Combining out - of - band monitoring with AI and big data  for datacenter aut...
Combining out - of - band monitoring with AI and big data for datacenter aut...
 
ROBOTICS - Introduction to Robotics Microcontroller
ROBOTICS -  Introduction to Robotics MicrocontrollerROBOTICS -  Introduction to Robotics Microcontroller
ROBOTICS - Introduction to Robotics Microcontroller
 
Combining remote sensing earth observations and in situ networks: detection o...
Combining remote sensing earth observations and in situ networks: detection o...Combining remote sensing earth observations and in situ networks: detection o...
Combining remote sensing earth observations and in situ networks: detection o...
 
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic AnalyticsSAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
 
ET-NavSwarm 2016 URC Poster
ET-NavSwarm 2016 URC PosterET-NavSwarm 2016 URC Poster
ET-NavSwarm 2016 URC Poster
 
Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...
Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...
Prepare for the Worst: Reliable Data Protection with Oracle RMAN and Oracle D...
 
The Search for Gravitational Waves
The Search for Gravitational WavesThe Search for Gravitational Waves
The Search for Gravitational Waves
 
Super Computers
Super ComputersSuper Computers
Super Computers
 
Sensor Organism project presentation
Sensor Organism project presentationSensor Organism project presentation
Sensor Organism project presentation
 
Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...
Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...
Machine Learning Algorithms for Anomaly Detection in Particles Accelerators T...
 
OOW-IMC-final
OOW-IMC-finalOOW-IMC-final
OOW-IMC-final
 
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacIntro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
 
supercomputer
supercomputersupercomputer
supercomputer
 
Python for High Throughput Science by Mark Basham
Python for High Throughput Science by Mark BashamPython for High Throughput Science by Mark Basham
Python for High Throughput Science by Mark Basham
 
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozni...
 
Supercomputer @ manarat university by reza
Supercomputer  @ manarat university by rezaSupercomputer  @ manarat university by reza
Supercomputer @ manarat university by reza
 
Big Fast Data in High-Energy Particle Physics
Big Fast Data in High-Energy Particle PhysicsBig Fast Data in High-Energy Particle Physics
Big Fast Data in High-Energy Particle Physics
 
OSMC 2012 | Monitoring at CERN by Christophe Haen
OSMC 2012 | Monitoring at CERN by Christophe HaenOSMC 2012 | Monitoring at CERN by Christophe Haen
OSMC 2012 | Monitoring at CERN by Christophe Haen
 

Plus de Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

Plus de Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Dernier

Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjurptikerjasaptiker
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxVivek487417
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 

Dernier (20)

Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 

Real-Time Detection of Anomalies in the Database Infrastructure using Apache Spark with Daniel Lanza and Prasanth Kothuri

  • 1. Prasanth Kothuri, CERN Daniel Lanza Garcia, CERN Real-time Detection Of Anomalies In The Database Infrastructure Using Apache Spark #EUde13
  • 2. Outline • About CERN • Apache Spark at CERN • Data Lake Challenges • Detecting Anomalous Connections • Real-time Intelligent Monitoring 2#EUde13
  • 3. About CERN 3#EUde13 • CERN - European Laboratory for Particle Physics • Founded in 1954 by 12 Countries for fundamental physics research in the post-war Europe • Today 21 member states + world-wide collaborations • About ~1000 MCHF yearly budget • 2’300 CERN personnel • 10’000 users from 110 countries
  • 4. Fundamental Research 4#EUde13 • What is 95% of the Universe made of? • Why do particles have mass? • Why is there no antimatter left in the Universe? • What was the Universe like just after the “Big Bang”
  • 5. Who Changed My Plan? 5 The Large Hadron Collider (LHC) Largest machine in the world 27km, 6000+ superconducting magnets Emptiest place in the solar system High vacuum inside the magnets Hottest spot in the galaxy During Lead ion collisions create temperatures 100 000x hotter than the heart of the sun; Fastest racetrack on Earth Protons circulate 11245 times/s (99.9999991% the speed of light)
  • 6. Apache Spark @ CERN • Spark is a popular compute engine for Analytics + ETL – Deployed on four production Hadoop/YARN clusters • Aggregated capacity (2017): ~1500 physical cores, 11 PB – Adoption is growing. Key projects involving Spark: • Analytics for accelerator controls and logging • Monitoring IT Infrastructure, uses Spark + Spark streaming • Analytics on aggregated logs • Explorations on the use of Spark for high energy physics 6#EUdev2
  • 7. Detection Of Anomalies In Database Infrastructure 7#EUde13
  • 8. CERN Database Infrastructure 8#EUde13 • Over 100 Oracle databases, most of them RAC • Running Oracle 11.2.0.4 and 12.1.0.2 • 950 TB of data files for production DBs in total (without counting replicas), NAS as storage • Oracle In-Memory in production since July 2015 • Examples of critical production DBs: • Quench Protection System: 150’000 changes/s • LHC logging database: 492 TB, growing 180 TB/year • But also DB as a Service (single instances) • 310 MySQL, 70 PostgreSQL, 9 Oracle, 5 InfluxDB
  • 9. Data Lake 9#EUde13 • Objective: Build central repository for database audit, performance metrics and logs with the goal of real time analytics and offline analytics • Use cases • Detection of Anomalies • Troubleshooting • Capacity Planning
  • 10. Diverse Data 10#EUde13 • Tables • Audit • Performance Metrics • Alert • Log files • Listener
  • 11. Data pipeline Architecture 11#EUde13 alertlog Audit AWR Kafka 500 events/minute 500 events/minute 500 events/minute 70.000 events/minute 70.000 events/minute HDFS ACLs Per-index ACLS
  • 13. Real-time analysis 13#EUde13 • Attempt to detect anomalous connections using Machine Learning techniques • First approach to use kNN classifier to classify connection as normal or anomalous • Anomaly detection suffer from high false positive rate, the challenge is to choose features that best characterize user connections pattern
  • 14. Feature Selection 14#EUde13 • Data Preparation • Feature Extraction • Client_host • Client_user • Client_program • Service_name • Hour_of_the_day • Day_of_the_week { "oracle_sid" : "XXXX_YYYY", "listener_name": "listener_scan1", "database_type" : "oracle", "producer" : "oracle", "source_type": "listener", "hostname" : "XXXXX.cern.ch", "flume_agent_version": "0.1.6-7.el6", "type" : "dbconnection", "event_timestamp" : "2017-09-27T04:45:27+0200", "CONNECT_DATA_SERVER": "DEDICATED", "CONNECT_DATA_SERVICE_NAME": “XXXr.cern.ch", "client_program": "python", "client_host" : "pxxxxj2.cern.ch", "client_user": "merge", "client_protocol" : "tcp", "client_ip": "137.100.100.167", "client_port": 38432, "type" : "establish", "service_name" : "XXXX.cern.ch", "return_code": 0 }
  • 15. Detecting anomalous connections job • Machine learning: kNN 15#EUde13 Threshold
  • 16. Detecting anomalous connections job • Machine learning: kNN 16#EUde13 Threshold
  • 17. Detecting anomalous connections job • Machine learning: kNN 17#EUde13 Threshold
  • 18. Detecting anomalous connections job • Machine learning: kNN 18#EUde13 Threshold
  • 19. Detecting anomalous connections job • Machine learning: kNN 19#EUde13 Threshold
  • 20. Detecting anomalous connections job • Machine learning: kNN – Making it scalable, each user should behave similarly 20#EUde13
  • 21. Detecting anomalous connections job • Distance Function: 21#EUde13 • Spark Streaming Job processing – 10 million / day • 6 executors, each with 4 cores and 8GB of memory
  • 22. Metrics monitor job • A general purpose metrics monitor 22#EUde13
  • 23. New metrics by aggregation and math equations Metrics source Notifications sink (optional) Metrics monitor job: data flow 23#EUde13 Monitor 1 Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator Monitor 2 Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator Metrics source Analysis results sink (optional) Monitor N Filter (optional) Pre-analysis (optional) Analysis Notificator 1 Notificator 2 Notificator N Metrics source
  • 24. Notifications sink (optional) Metrics monitor job: modularity 24#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Metrics source Analysis results sink (optional) Notificator • Easy implementation of new components – Extending corresponding abstract class – Stateful operations: a store is available Metrics source Properties source
  • 25. • Which metrics should process this monitor? – CPU Usage of all databases, Cache Hit % of MySQL databases, all machines in PROD, … • Filter metrics by attributes • Accepts exact value or regular expressions • If not applied, all metrics are processed 25#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 26. • Massage metric value – Average, difference, filter, … • Internal store for stateful pre-analysis • If applied, result will be the analyzed value 26#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 27. • Type: average value of a period 27#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 28. • Type: weighted average value of a period 28#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 29. • Type: difference with previous value 29#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 30. • Type: difference with previous value 30#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 31. • Is the metric behaving as it should? • Determine the status of each incoming metric – OK, WARNING, ERROR, EXCEPTION • Results can be sent to external storage • Internal store for stateful analysis 31#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 32. • Type: thresholds fixed manually 32#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 33. • Type: learning from recent behaviour (average and variance) – Errors when metric does not behave as it used to 33#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 34. • Type: learning from recent behaviour (average and variance) – Easily affected by outliers 34#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 35. • Type: learning from recent behaviour (percentiles) 35#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 36. • Type: learning a season (hour, day, week) 36#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 37. • Type: learning a season (hour, day, week) 37#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 38. • Type: learning a season (hour, day, week) 38#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 39. • Type: learning a season (hour, day, week) 39#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 40. • Should we inform anyone? – Metric has been in ERROR for 20 minutes… • Determines when a notification is raised • Based on metric statuses • Notifications are consumed by Notifications sink 40#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 41. • Type: constant status for certain period 41#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 42. • Type: percentage of the period in certain statuses 42#EUde13 Monitor Filter (optional) Pre-analysis (optional) Analysis Notificator Notificator
  • 43. Metrics monitor job: configuration 43#EUde13 checkpoint.dir = <path_to_store_stateful_data> (default: /tmp/) spark.batch.time = <seconds> (default: 30) metrics.source.kafka-prod.type = kafka metrics.source.kafka-prod.consumer.bootstrap.servers = server:port,server:port metrics.source.kafka-prod.topics = db-logging-platform metrics.source.kafka-prod.parser.attributes = INSTANCE_NAME METRIC_NAME metrics.source.kafka-prod.parser.value.attribute = VALUE metrics.source.kafka-prod.parser.timestamp.attribute = END_TIME results.sink.type = elastic results.sink.index = itdb_db-metric-results/log notifications.sink.type = elastic notifications.sink.index = itdb_db-metric-notifications/log # Monitors monitor.<monitor-id-1>.<confs>... monitor.<monitor-id-2>.<confs>... monitor.<monitor-id-n>.<confs>... Properties source
  • 44. Metrics monitor job: monitor configuration 44#EUde13 # Monitor CPU of all instances in production monitor.CPUUsage.filter.attribute.INSTANCE_NAME = regex:prod-.* monitor.CPUUsage.filter.attribute.METRIC_NAME = CPU Usage Per Sec monitor.CPUUsage.missing.max-period = 3m monitor.CPUUsage.pre-analysis.type = weighted-average monitor.CPUUsage.pre-analysis.period = 5m monitor.CPUUsage.analysis.type = seasonal monitor.CPUUsage.analysis.season = hour monitor.CPUUsage.analysis.learning.ratio = 0.2 monitor.CPUUsage.notificator.error-perc.type = percentage monitor.CPUUsage.notificator.error-perc.statuses = ERROR monitor.CPUUsage.notificator.error-perc.period = 10m monitor.CPUUsage.notificator.warn-perc.type = percentage monitor.CPUUsage.notificator.warn-perc.statuses = WARNING monitor.CPUUsage.notificator.warn-perc.period = 20m • Configuration can be updated while running • Add/remove monitor or change any parameter Filter Pre- anal ysis Anal ysis Notificator Notificator
  • 45. Metrics monitor job • We are monitoring: – Around 30K metrics per minute (day season analysis) – Processing time = 30 seconds/minute – On 3 executors (YARN) • Repo, user’s and developer’s manual, contributions, issues: – https://github.com/cerndb/metrics-monitor 45#EUde13
  • 48. Java 8: Extend RDDs and Streams + Optional + lambdas + + java.streaming.* + reference methods = 48#EUde13
  • 49. Not using Spark check-pointing feature 49#EUde13 • Pro: develop/update the job without loosing stateful data • Cons: implement save/load • Note: for stateful operations checkpoint needs to be enable – But when restarting, data is removed
  • 50. Conclusion 50#EUde13 • Building the data lake opens up for several kinds of analysis • Implementation of anomalous connections helps us to identify potential intrusions to the database infrastructure • Intelligent monitoring tool learns and adapts the threshold • Intelligent monitoring tool is open source and available for everyone • Apache Spark Streaming helps us to perform machine learning at scale