SlideShare une entreprise Scribd logo
1  sur  41
Télécharger pour lire hors ligne
Model as a Service
Modern Streaming Data Science with Apache Metron
Casey Stella
@casey_stella
2017
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Introduction
Hi, I’m Casey Stella!
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Apache Metron
• Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking:
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Apache Metron
• Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking:
◦ We ingest network data from many different sources.
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Apache Metron
• Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking:
◦ We ingest network data from many different sources.
◦ That data is then normalized and enriched
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Apache Metron
• Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking:
◦ We ingest network data from many different sources.
◦ That data is then normalized and enriched
◦ Potential threats will be identified in the enriched data as it streams by and the
likelihood of the threat will be ranked
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Apache Metron
• Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking:
◦ We ingest network data from many different sources.
◦ That data is then normalized and enriched
◦ Potential threats will be identified in the enriched data as it streams by and the
likelihood of the threat will be ranked
◦ The enriched data will be indexed for further analysis in a realtime index and in a batch
index
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Apache Metron
• Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking:
◦ We ingest network data from many different sources.
◦ That data is then normalized and enriched
◦ Potential threats will be identified in the enriched data as it streams by and the
likelihood of the threat will be ranked
◦ The enriched data will be indexed for further analysis in a realtime index and in a batch
index
• Better enrichment =⇒ more context =⇒ better insights
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Enrichment =⇒ Data Science
Threat identification is a hard thing, it turns out. The key to the solution is enabling
better threat identification.
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Enrichment =⇒ Data Science
Threat identification is a hard thing, it turns out. The key to the solution is enabling
better threat identification.
• Fixed rules are too coarse and multiply aggressively.
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Enrichment =⇒ Data Science
Threat identification is a hard thing, it turns out. The key to the solution is enabling
better threat identification.
• Fixed rules are too coarse and multiply aggressively.
• Statistical baselining can adapt better than rules in some circumstances, but has
scale challenges as well
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Enrichment =⇒ Data Science
Threat identification is a hard thing, it turns out. The key to the solution is enabling
better threat identification.
• Fixed rules are too coarse and multiply aggressively.
• Statistical baselining can adapt better than rules in some circumstances, but has
scale challenges as well
• Machine learning can adapt better, but can be hard to scale at high velocity
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Enrichment =⇒ Data Science
Threat identification is a hard thing, it turns out. The key to the solution is enabling
better threat identification.
• Fixed rules are too coarse and multiply aggressively.
• Statistical baselining can adapt better than rules in some circumstances, but has
scale challenges as well
• Machine learning can adapt better, but can be hard to scale at high velocity
The ideal solution is to enable the mixing of rules, statistical baselining and machine
learning.
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
◦ Build atop scalable infrastructure: Storm + Hadoop
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
◦ Build atop scalable infrastructure: Storm + Hadoop
• Data Scientists rarely standardize on one technology or library for all problems
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
◦ Build atop scalable infrastructure: Storm + Hadoop
• Data Scientists rarely standardize on one technology or library for all problems
◦ Bring your own Language and Library
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
◦ Build atop scalable infrastructure: Storm + Hadoop
• Data Scientists rarely standardize on one technology or library for all problems
◦ Bring your own Language and Library
◦ Model interaction can be done via exposing models as REST endpoints
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
◦ Build atop scalable infrastructure: Storm + Hadoop
• Data Scientists rarely standardize on one technology or library for all problems
◦ Bring your own Language and Library
◦ Model interaction can be done via exposing models as REST endpoints
• Data Science can be computationally expensive
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
◦ Build atop scalable infrastructure: Storm + Hadoop
• Data Scientists rarely standardize on one technology or library for all problems
◦ Bring your own Language and Library
◦ Model interaction can be done via exposing models as REST endpoints
• Data Science can be computationally expensive
◦ If we assume models are state-free, they look a lot like traditional microservices and can
be scaled by replication
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
◦ Build atop scalable infrastructure: Storm + Hadoop
• Data Scientists rarely standardize on one technology or library for all problems
◦ Bring your own Language and Library
◦ Model interaction can be done via exposing models as REST endpoints
• Data Science can be computationally expensive
◦ If we assume models are state-free, they look a lot like traditional microservices and can
be scaled by replication
• Integration of new models and retirement of old models must be seamless
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
◦ Build atop scalable infrastructure: Storm + Hadoop
• Data Scientists rarely standardize on one technology or library for all problems
◦ Bring your own Language and Library
◦ Model interaction can be done via exposing models as REST endpoints
• Data Science can be computationally expensive
◦ If we assume models are state-free, they look a lot like traditional microservices and can
be scaled by replication
• Integration of new models and retirement of old models must be seamless
◦ Registry and Autodiscovery through zookeeper
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
◦ Build atop scalable infrastructure: Storm + Hadoop
• Data Scientists rarely standardize on one technology or library for all problems
◦ Bring your own Language and Library
◦ Model interaction can be done via exposing models as REST endpoints
• Data Science can be computationally expensive
◦ If we assume models are state-free, they look a lot like traditional microservices and can
be scaled by replication
• Integration of new models and retirement of old models must be seamless
◦ Registry and Autodiscovery through zookeeper
• Many instances of many models will be running at once
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Characteristics of a Solution
• There’s a lot of data and it comes at you fast
◦ Build atop scalable infrastructure: Storm + Hadoop
• Data Scientists rarely standardize on one technology or library for all problems
◦ Bring your own Language and Library
◦ Model interaction can be done via exposing models as REST endpoints
• Data Science can be computationally expensive
◦ If we assume models are state-free, they look a lot like traditional microservices and can
be scaled by replication
• Integration of new models and retirement of old models must be seamless
◦ Registry and Autodiscovery through zookeeper
• Many instances of many models will be running at once
◦ Share resources fairly by using Yarn for deployment and model lifecycle management
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Requirements of a Model Creator
We want to ask as little of data scientists and model creators as possible:
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Requirements of a Model Creator
We want to ask as little of data scientists and model creators as possible:
• Handle training your model wherever and however you like
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Requirements of a Model Creator
We want to ask as little of data scientists and model creators as possible:
• Handle training your model wherever and however you like
• Make your model stateless
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Requirements of a Model Creator
We want to ask as little of data scientists and model creators as possible:
• Handle training your model wherever and however you like
• Make your model stateless
• Provide a REST interface serving up your model
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Requirements of a Model Creator
We want to ask as little of data scientists and model creators as possible:
• Handle training your model wherever and however you like
• Make your model stateless
• Provide a REST interface serving up your model
From there, MaaS will handle deployment, management and model auto-discovery
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Model as a Service: Stellar
Now that we have a deployment and autodiscovery solution, we need to interact with the
models. We think of Stellar as like Excel functions that we can run on streaming data:
• Compose a rich set of built-in functions with user defined functions
• Provide simple primitives around the functions: boolean operations, conditionals,
numerical computation.
Model interaction is enabled as a set of Stellar functions that discover and interact with
model endpoints.
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Model as a Service: Architecture
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Interlude: Domain Generating Algorithms
Domain Generating Algorithms are used by botnets to communicate with compromised
computers in an evasive way. Traffic to a fixed command and control host would be
noticed and firewall rules could be modified to cut communication channels cleanly.
Instead, the botnet command and control hosts must move around to evade detection.
Typically this involves periodically generating a synthetic domain in a repeatable way
and having the compromised computers attempt to connect to a few candidate synthetic
domains daily with some moderate hope of guessing the right one. This traffic is small
enough to be lost in the shuffle of a large organization, making the evasion effective.
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Demo
• One solution for detecting synthetic domains is to construct a classifier that must be
run against every domain going across the network.
• At the BSides DFW Conference in 2013, ClickSecurity presented a model written in
Python for detecting synthetic domains.
• We can expose this model as a REST endpoint, deploy via MaaS and interact with
the model via Stellar
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
dga_model_endpoint := MAAS_GET_ENDPOINT('dga')
dga_result_map := MAAS_MODEL_APPLY( dga_model_endpoint
, { 'host' : domain_without_subdomains }
)
dga_result := MAP_GET('is_malicious', dga_result_map)
is_dga := dga_result != null && dga_result == 'dga'
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Next Steps
Going forward, we’d like to add a few things to make this an even more complete
solution
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Next Steps
Going forward, we’d like to add a few things to make this an even more complete
solution
• A UI for model deployment
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Next Steps
Going forward, we’d like to add a few things to make this an even more complete
solution
• A UI for model deployment
• Model Monitoring w/ dashboards
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Next Steps
Going forward, we’d like to add a few things to make this an even more complete
solution
• A UI for model deployment
• Model Monitoring w/ dashboards
• Alternative endpoint formats (e.g. gRPC)
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Next Steps
Going forward, we’d like to add a few things to make this an even more complete
solution
• A UI for model deployment
• Model Monitoring w/ dashboards
• Alternative endpoint formats (e.g. gRPC)
• Automatic conversion of common model output formats to an endpoint
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Next Steps
Going forward, we’d like to add a few things to make this an even more complete
solution
• A UI for model deployment
• Model Monitoring w/ dashboards
• Alternative endpoint formats (e.g. gRPC)
• Automatic conversion of common model output formats to an endpoint
• A searchable model registry
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
Questions
Thanks for your attention! Don’t forget to come to the cybersecurity Bird of a Feather
session Thursday.
• Find me at http://caseystella.com
• Twitter handle: @casey_stella
• Email address: cstella@hortonworks.com
Casey Stella@casey_stella (Hortonworks) Model as a Service 2017

Contenu connexe

Tendances

Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataDataWorks Summit
 
Spark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan SaldichSpark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan SaldichSpark Summit
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big DataDataWorks Summit
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...DataWorks Summit/Hadoop Summit
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...DataWorks Summit/Hadoop Summit
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...DataWorks Summit/Hadoop Summit
 
Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoSpark Summit
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataDataWorks Summit/Hadoop Summit
 
MapR-DB Elasticsearch Integration
MapR-DB Elasticsearch IntegrationMapR-DB Elasticsearch Integration
MapR-DB Elasticsearch IntegrationMapR Technologies
 
Building an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkBuilding an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkItai Yaffe
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...DataWorks Summit
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseDataWorks Summit
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureNiels Naglé
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesDataWorks Summit
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...DataWorks Summit
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhDAdnan Masood
 

Tendances (20)

Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government data
 
Spark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan SaldichSpark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan Saldich
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
 
Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah Guido
 
The EDW Ecosystem
The EDW EcosystemThe EDW Ecosystem
The EDW Ecosystem
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch data
 
MapR-DB Elasticsearch Integration
MapR-DB Elasticsearch IntegrationMapR-DB Elasticsearch Integration
MapR-DB Elasticsearch Integration
 
Building an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkBuilding an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using Spark
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
How to Use Innovative Data Handling and Processing Techniques to Drive Alpha ...
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architecture
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Shaping a Digital Vision
Shaping a Digital VisionShaping a Digital Vision
Shaping a Digital Vision
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
 

En vedette

En vedette (12)

Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
File Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and ParquetFile Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and Parquet
 
Best Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop EnvironmentBest Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop Environment
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Solving Cyber at Scale
Solving Cyber at ScaleSolving Cyber at Scale
Solving Cyber at Scale
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
 
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
 
Running Services on YARN
Running Services on YARNRunning Services on YARN
Running Services on YARN
 
Apache Metron: Community Driven Cyber Security
Apache Metron: Community Driven Cyber Security Apache Metron: Community Driven Cyber Security
Apache Metron: Community Driven Cyber Security
 
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 

Similaire à MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron

Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...DataStax Academy
 
Spring + QueryDSL + MongoDB Presentation
Spring + QueryDSL + MongoDB PresentationSpring + QueryDSL + MongoDB Presentation
Spring + QueryDSL + MongoDB PresentationRanga Vadlamudi
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsGeorge Stathis
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Spark Summit
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
 
How to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database WorldHow to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database WorldKaren Lopez
 
SRV318_Research at PNNL Powered by AWS
SRV318_Research at PNNL Powered by AWSSRV318_Research at PNNL Powered by AWS
SRV318_Research at PNNL Powered by AWSAmazon Web Services
 
Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017
Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017
Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017Amazon Web Services
 
Social Security Company Nexgate's Success Relies on Apache Cassandra
Social Security Company Nexgate's Success Relies on Apache CassandraSocial Security Company Nexgate's Success Relies on Apache Cassandra
Social Security Company Nexgate's Success Relies on Apache CassandraDataStax Academy
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketDremio Corporation
 
Practical Machine Learning in Information Security
Practical Machine Learning in Information SecurityPractical Machine Learning in Information Security
Practical Machine Learning in Information SecuritySven Krasser
 
Big Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLTugdual Grall
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxGautamPopli1
 
2017 bio it world
2017 bio it world2017 bio it world
2017 bio it worldChris Dwan
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 

Similaire à MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (20)

Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Spring + QueryDSL + MongoDB Presentation
Spring + QueryDSL + MongoDB PresentationSpring + QueryDSL + MongoDB Presentation
Spring + QueryDSL + MongoDB Presentation
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data Lessons
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Dremio introduction
Dremio introductionDremio introduction
Dremio introduction
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 
How to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database WorldHow to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database World
 
SRV318_Research at PNNL Powered by AWS
SRV318_Research at PNNL Powered by AWSSRV318_Research at PNNL Powered by AWS
SRV318_Research at PNNL Powered by AWS
 
Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017
Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017
Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017
 
Social Security Company Nexgate's Success Relies on Apache Cassandra
Social Security Company Nexgate's Success Relies on Apache CassandraSocial Security Company Nexgate's Success Relies on Apache Cassandra
Social Security Company Nexgate's Success Relies on Apache Cassandra
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
 
Practical Machine Learning in Information Security
Practical Machine Learning in Information SecurityPractical Machine Learning in Information Security
Practical Machine Learning in Information Security
 
Big Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQL
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptx
 
2017 bio it world
2017 bio it world2017 bio it world
2017 bio it world
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 

Plus de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Dernier (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron

  • 1. Model as a Service Modern Streaming Data Science with Apache Metron Casey Stella @casey_stella 2017 Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 2. Introduction Hi, I’m Casey Stella! Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 3. Apache Metron • Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking: Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 4. Apache Metron • Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking: ◦ We ingest network data from many different sources. Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 5. Apache Metron • Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking: ◦ We ingest network data from many different sources. ◦ That data is then normalized and enriched Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 6. Apache Metron • Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking: ◦ We ingest network data from many different sources. ◦ That data is then normalized and enriched ◦ Potential threats will be identified in the enriched data as it streams by and the likelihood of the threat will be ranked Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 7. Apache Metron • Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking: ◦ We ingest network data from many different sources. ◦ That data is then normalized and enriched ◦ Potential threats will be identified in the enriched data as it streams by and the likelihood of the threat will be ranked ◦ The enriched data will be indexed for further analysis in a realtime index and in a batch index Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 8. Apache Metron • Apache Metron is a streaming cybersecurity analytics solution. Roughly speaking: ◦ We ingest network data from many different sources. ◦ That data is then normalized and enriched ◦ Potential threats will be identified in the enriched data as it streams by and the likelihood of the threat will be ranked ◦ The enriched data will be indexed for further analysis in a realtime index and in a batch index • Better enrichment =⇒ more context =⇒ better insights Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 9. Enrichment =⇒ Data Science Threat identification is a hard thing, it turns out. The key to the solution is enabling better threat identification. Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 10. Enrichment =⇒ Data Science Threat identification is a hard thing, it turns out. The key to the solution is enabling better threat identification. • Fixed rules are too coarse and multiply aggressively. Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 11. Enrichment =⇒ Data Science Threat identification is a hard thing, it turns out. The key to the solution is enabling better threat identification. • Fixed rules are too coarse and multiply aggressively. • Statistical baselining can adapt better than rules in some circumstances, but has scale challenges as well Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 12. Enrichment =⇒ Data Science Threat identification is a hard thing, it turns out. The key to the solution is enabling better threat identification. • Fixed rules are too coarse and multiply aggressively. • Statistical baselining can adapt better than rules in some circumstances, but has scale challenges as well • Machine learning can adapt better, but can be hard to scale at high velocity Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 13. Enrichment =⇒ Data Science Threat identification is a hard thing, it turns out. The key to the solution is enabling better threat identification. • Fixed rules are too coarse and multiply aggressively. • Statistical baselining can adapt better than rules in some circumstances, but has scale challenges as well • Machine learning can adapt better, but can be hard to scale at high velocity The ideal solution is to enable the mixing of rules, statistical baselining and machine learning. Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 14. Characteristics of a Solution • There’s a lot of data and it comes at you fast Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 15. Characteristics of a Solution • There’s a lot of data and it comes at you fast ◦ Build atop scalable infrastructure: Storm + Hadoop Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 16. Characteristics of a Solution • There’s a lot of data and it comes at you fast ◦ Build atop scalable infrastructure: Storm + Hadoop • Data Scientists rarely standardize on one technology or library for all problems Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 17. Characteristics of a Solution • There’s a lot of data and it comes at you fast ◦ Build atop scalable infrastructure: Storm + Hadoop • Data Scientists rarely standardize on one technology or library for all problems ◦ Bring your own Language and Library Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 18. Characteristics of a Solution • There’s a lot of data and it comes at you fast ◦ Build atop scalable infrastructure: Storm + Hadoop • Data Scientists rarely standardize on one technology or library for all problems ◦ Bring your own Language and Library ◦ Model interaction can be done via exposing models as REST endpoints Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 19. Characteristics of a Solution • There’s a lot of data and it comes at you fast ◦ Build atop scalable infrastructure: Storm + Hadoop • Data Scientists rarely standardize on one technology or library for all problems ◦ Bring your own Language and Library ◦ Model interaction can be done via exposing models as REST endpoints • Data Science can be computationally expensive Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 20. Characteristics of a Solution • There’s a lot of data and it comes at you fast ◦ Build atop scalable infrastructure: Storm + Hadoop • Data Scientists rarely standardize on one technology or library for all problems ◦ Bring your own Language and Library ◦ Model interaction can be done via exposing models as REST endpoints • Data Science can be computationally expensive ◦ If we assume models are state-free, they look a lot like traditional microservices and can be scaled by replication Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 21. Characteristics of a Solution • There’s a lot of data and it comes at you fast ◦ Build atop scalable infrastructure: Storm + Hadoop • Data Scientists rarely standardize on one technology or library for all problems ◦ Bring your own Language and Library ◦ Model interaction can be done via exposing models as REST endpoints • Data Science can be computationally expensive ◦ If we assume models are state-free, they look a lot like traditional microservices and can be scaled by replication • Integration of new models and retirement of old models must be seamless Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 22. Characteristics of a Solution • There’s a lot of data and it comes at you fast ◦ Build atop scalable infrastructure: Storm + Hadoop • Data Scientists rarely standardize on one technology or library for all problems ◦ Bring your own Language and Library ◦ Model interaction can be done via exposing models as REST endpoints • Data Science can be computationally expensive ◦ If we assume models are state-free, they look a lot like traditional microservices and can be scaled by replication • Integration of new models and retirement of old models must be seamless ◦ Registry and Autodiscovery through zookeeper Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 23. Characteristics of a Solution • There’s a lot of data and it comes at you fast ◦ Build atop scalable infrastructure: Storm + Hadoop • Data Scientists rarely standardize on one technology or library for all problems ◦ Bring your own Language and Library ◦ Model interaction can be done via exposing models as REST endpoints • Data Science can be computationally expensive ◦ If we assume models are state-free, they look a lot like traditional microservices and can be scaled by replication • Integration of new models and retirement of old models must be seamless ◦ Registry and Autodiscovery through zookeeper • Many instances of many models will be running at once Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 24. Characteristics of a Solution • There’s a lot of data and it comes at you fast ◦ Build atop scalable infrastructure: Storm + Hadoop • Data Scientists rarely standardize on one technology or library for all problems ◦ Bring your own Language and Library ◦ Model interaction can be done via exposing models as REST endpoints • Data Science can be computationally expensive ◦ If we assume models are state-free, they look a lot like traditional microservices and can be scaled by replication • Integration of new models and retirement of old models must be seamless ◦ Registry and Autodiscovery through zookeeper • Many instances of many models will be running at once ◦ Share resources fairly by using Yarn for deployment and model lifecycle management Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 25. Requirements of a Model Creator We want to ask as little of data scientists and model creators as possible: Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 26. Requirements of a Model Creator We want to ask as little of data scientists and model creators as possible: • Handle training your model wherever and however you like Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 27. Requirements of a Model Creator We want to ask as little of data scientists and model creators as possible: • Handle training your model wherever and however you like • Make your model stateless Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 28. Requirements of a Model Creator We want to ask as little of data scientists and model creators as possible: • Handle training your model wherever and however you like • Make your model stateless • Provide a REST interface serving up your model Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 29. Requirements of a Model Creator We want to ask as little of data scientists and model creators as possible: • Handle training your model wherever and however you like • Make your model stateless • Provide a REST interface serving up your model From there, MaaS will handle deployment, management and model auto-discovery Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 30. Model as a Service: Stellar Now that we have a deployment and autodiscovery solution, we need to interact with the models. We think of Stellar as like Excel functions that we can run on streaming data: • Compose a rich set of built-in functions with user defined functions • Provide simple primitives around the functions: boolean operations, conditionals, numerical computation. Model interaction is enabled as a set of Stellar functions that discover and interact with model endpoints. Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 31. Model as a Service: Architecture Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 32. Interlude: Domain Generating Algorithms Domain Generating Algorithms are used by botnets to communicate with compromised computers in an evasive way. Traffic to a fixed command and control host would be noticed and firewall rules could be modified to cut communication channels cleanly. Instead, the botnet command and control hosts must move around to evade detection. Typically this involves periodically generating a synthetic domain in a repeatable way and having the compromised computers attempt to connect to a few candidate synthetic domains daily with some moderate hope of guessing the right one. This traffic is small enough to be lost in the shuffle of a large organization, making the evasion effective. Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 33. Demo • One solution for detecting synthetic domains is to construct a classifier that must be run against every domain going across the network. • At the BSides DFW Conference in 2013, ClickSecurity presented a model written in Python for detecting synthetic domains. • We can expose this model as a REST endpoint, deploy via MaaS and interact with the model via Stellar Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 34. dga_model_endpoint := MAAS_GET_ENDPOINT('dga') dga_result_map := MAAS_MODEL_APPLY( dga_model_endpoint , { 'host' : domain_without_subdomains } ) dga_result := MAP_GET('is_malicious', dga_result_map) is_dga := dga_result != null && dga_result == 'dga' Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 35. Next Steps Going forward, we’d like to add a few things to make this an even more complete solution Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 36. Next Steps Going forward, we’d like to add a few things to make this an even more complete solution • A UI for model deployment Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 37. Next Steps Going forward, we’d like to add a few things to make this an even more complete solution • A UI for model deployment • Model Monitoring w/ dashboards Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 38. Next Steps Going forward, we’d like to add a few things to make this an even more complete solution • A UI for model deployment • Model Monitoring w/ dashboards • Alternative endpoint formats (e.g. gRPC) Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 39. Next Steps Going forward, we’d like to add a few things to make this an even more complete solution • A UI for model deployment • Model Monitoring w/ dashboards • Alternative endpoint formats (e.g. gRPC) • Automatic conversion of common model output formats to an endpoint Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 40. Next Steps Going forward, we’d like to add a few things to make this an even more complete solution • A UI for model deployment • Model Monitoring w/ dashboards • Alternative endpoint formats (e.g. gRPC) • Automatic conversion of common model output formats to an endpoint • A searchable model registry Casey Stella@casey_stella (Hortonworks) Model as a Service 2017
  • 41. Questions Thanks for your attention! Don’t forget to come to the cybersecurity Bird of a Feather session Thursday. • Find me at http://caseystella.com • Twitter handle: @casey_stella • Email address: cstella@hortonworks.com Casey Stella@casey_stella (Hortonworks) Model as a Service 2017