OCTOBER 11-14, 2016 • BOSTON, MA
Leveraging the Power of Solr with Spark
JOHANNES WEIGEND
CTO, QAware GmbH / Germany
Agenda
Introduction to Solr Cloud and Spark
Importing
Searching and Aggregating
Scaling Up
It is Hard to Scale Horizontally!
■ Functions

- Trivial
- Load balancing of stateless services (macro- / microservices)
- More users -> more machines
- Nontrivial
- More machines -> faster response times
■ Data

- Trivial
- Linear distribution of data on multiple machines
- More machines -> more data
- Nontrivial
- Constant response times with growing datasets
Solr Cloud
- Document-based NoSQL database with outstanding search capabilities
  A document is a collection of fields (string, number, date, …)
  Single and multiple fields (fields can be arrays)
  Nested documents
  Static and dynamic schema
  Powerful query language (Lucene)
- Horizontally scalable with Solr Cloud
  Distributed data in separate shards
  Resilience through a combination of ZooKeeper and replication
- Powerful aggregations (aka facets)
[Diagram: a Solr Cloud collection is divided into shards (Shard1 to Shard9). Each shard has a leader and replicas (Replica1 to Replica9) hosted on Solr servers, and a ZooKeeper ensemble coordinates the cluster. Adding Solr servers scales the collection out.]
The Architecture of Solr Cloud
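How documents spread across the shards of a collection can be illustrated with a small sketch. This is not Solr's routing code: Solr Cloud's default router hashes the document id with MurmurHash3 and assigns hash ranges to shards, whereas the sketch below uses `zlib.crc32` and a modulo as a stand-in. The function names and the document ids are hypothetical.

```python
import zlib
from collections import Counter

NUM_SHARDS = 9  # the diagram shows Shard1 to Shard9


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document id to a shard number in 1..num_shards.

    Stand-in hash (CRC32 + modulo), not Solr's actual router.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards + 1


def distribute(doc_ids):
    """Count how many documents each shard would receive."""
    return Counter(shard_for(doc_id) for doc_id in doc_ids)


if __name__ == "__main__":
    counts = distribute(f"doc-{i}" for i in range(9000))
    for shard in sorted(counts):
        print(f"Shard{shard}: {counts[shard]} docs")
```

Because the shard is derived from the id alone, any client can compute the target shard without asking a central coordinator.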
Two Levels of Distribution
[Diagram: two levels of distribution. Solr level (Search): search, index, store. Spark level (Map): calculate, cache, join, combine. On top: Reduce in the business layer, then the frontend.]
Combining Solr + Spark
READ THIS: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
■ Distributed computing (up to 100x faster than Hadoop M/R for in-memory workloads, according to the Spark project)
■ Distributed Map/Reduce on distributed data can be done in-memory
■ Supports online and batch workloads
■ Written in Scala, with Java/Scala/Python APIs
■ Processes data from distributed and local sources
- Text files (accessible from all nodes)
- Hadoop File System (HDFS)
- Databases (JDBC)
- Solr via the Lucidworks API
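The idea of Map/Reduce over distributed, in-memory data can be sketched without Spark itself. The example below is an illustration only: a thread pool stands in for the executors, plain Python lists stand in for partitions, and none of the names are Spark APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(partition):
    """Map step: runs where the data lives and returns a partial (sum, count)."""
    return sum(partition), len(partition)


def combine(left, right):
    """Reduce step: merge two partial results."""
    return left[0] + right[0], left[1] + right[1]


def distributed_mean(partitions):
    """Average over all partitions; only partial results travel to the driver."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(map_partition, partitions))
    total, count = reduce(combine, partials)
    return total / count


if __name__ == "__main__":
    # Three partitions holding the CPU values used later in the talk.
    print(distributed_mean([[50, 60], [70], [80]]))
```

Only the small (sum, count) pairs are combined centrally, which is why the pattern scales with the number of partitions.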
Apache Spark
[Diagram: a driver coordinates several executors, each executing parallel tasks.]
The Cloud in a Box
! Used for all benchmarks in this talk

CPU:   6th generation Intel® Core™ i5-6260U processor with Intel® Iris™ graphics
       (1.9 GHz up to 2.8 GHz Turbo, Dual Core, 4 MB Cache, 15W TDP)
RAM:   32 GB Dual-channel DDR4 SODIMMs, 1.2V, 2133 MHz
DISK:  256 GB Samsung M.2 internal SSD
Total: 10 Cores, 20 HT Units, 160 GB RAM, 1.25 TB Disk
Apache Big Data North America | Vancouver | 05.05.2016 | Johannes Weigend | © QAware GmbH
Monitoring Sample Data
■ Single CSV per process, host, metric type

wls1_lpapp18_jmx.csv
Datetime             CPU % Usage   Heap % Usage   #GC Invocations
1/10/16 9:00,000     50            50             1000
1/10/16 10:00,000    60            60             1100
1/10/16 11:00,000    70            70             1300
1/10/16 12:00,000    80            80             1800

CSV -> one Solr document per cell
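A minimal sketch of the "one Solr document per cell" mapping follows. Several things are assumptions, since the slide does not show them: the semicolon delimiter (chosen because the timestamps contain a comma), the field names `date` and `value`, and the function name `cells_to_docs`. Process, host and type are taken from the file name, as in wls1_lpapp18_jmx.csv.

```python
import csv
import io

SAMPLE = """Datetime;CPU % Usage;Heap % Usage;#GC Invocations
1/10/16 9:00,000;50;50;1000
1/10/16 10:00,000;60;60;1100
1/10/16 11:00,000;70;70;1300
1/10/16 12:00,000;80;80;1800
"""


def cells_to_docs(file_name, text, delimiter=";"):
    """Turn every CSV cell into one flat document (a plain dict)."""
    # File names follow <process>_<host>_<type>.csv
    process, host, metric_type = file_name.rsplit(".", 1)[0].split("_")
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    docs = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        timestamp = row[0]
        for metric, value in zip(header[1:], row[1:]):
            docs.append({
                "process": process,
                "host": host,
                "type": metric_type,
                "date": timestamp,
                "metric": metric,
                "value": int(value),
            })
    return docs


if __name__ == "__main__":
    for doc in cells_to_docs("wls1_lpapp18_jmx.csv", SAMPLE)[:3]:
        print(doc)
```

Four rows with three metric columns already yield twelve documents, which is the data explosion discussed later in the talk.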
[Diagram: a single importer reads the input data, creates a batch and adds the batch to Solr with CloudSolrClient.add(List<document> batch); the client sends it to the shards on SOLR1, SOLR2 and SOLR3. Bottlenecks: processing and network.]
Importing and Indexing into Solr can be slow
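The three importer steps (read input data, create batch, add batch to Solr) can be sketched as follows. The `add_batch` callable is a hypothetical stand-in for the client call that sends one batch to Solr, and the batch size is an arbitrary assumption.

```python
from itertools import islice


def batches(docs, batch_size):
    """Yield lists of at most batch_size documents from any iterable."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def import_docs(docs, add_batch, batch_size=1000):
    """Send documents batch by batch and return how many were sent."""
    sent = 0
    for batch in batches(docs, batch_size):
        add_batch(batch)  # one network round trip per batch, not per document
        sent += len(batch)
    return sent


if __name__ == "__main__":
    received = []
    total = import_docs(range(2500), received.append, batch_size=1000)
    print(total, [len(batch) for batch in received])
```

Batching reduces the network overhead, but a single importer process still does all the reading and batching itself.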
Some Options to Speed Things Up
Parallel Import with Spark makes Import Scalable
[Diagram: parallel cloud importer with distributed input data. Each node (Node1, Node2, Node3, ... Node n) runs a Spark executor, a CloudSolrClient and a Solr server. Every executor reads input data, creates a batch and adds the batch to Solr with add(List<document> batch). Adding nodes scales the import up.]
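The parallel importer can be approximated in plain Python. In this sketch a thread pool plays the role of the Spark executors, each partition plays the role of one input file, and `add_batch` is a hypothetical stand-in for the Solr client call; it shows the shape of the approach, not the code used in the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def import_partition(partition, add_batch, batch_size=2):
    """Work done by one executor: take its share, batch it, send it."""
    sent = 0
    for start in range(0, len(partition), batch_size):
        batch = partition[start:start + batch_size]
        add_batch(batch)
        sent += len(batch)
    return sent


def parallel_import(partitions, add_batch, workers=3):
    """Run one import task per partition and return the total document count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda part: import_partition(part, add_batch), partitions)
        return sum(counts)


if __name__ == "__main__":
    received = []
    total = parallel_import([[1, 2, 3], [4, 5], [6]], received.append)
    print(total, len(received))
```

Each worker reads and batches only its own partition, so adding workers (nodes) spreads both the processing and the network load.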
How to Import Multiple (HDFS) Files
Solr UUID-Field
Import takes 78411 ms
-> 180,000 docs per second
Indexing 14 million docs in 1:20 min
SolrJ and Spark have Different Transitive Dependencies
Depending on the Software Version
■ Adding both libraries to your classpath pulls in conflicting transitive dependencies, which leads to serious problems at runtime (serialization errors, ClassNotFoundExceptions, …)
■ Pinning or excluding versions helps, but can produce strange errors. There is currently no satisfying solution for the big data classpath hell.
Using Solr Facet Queries for Aggregation
#
# Grouping per sub query
#
curl "$SOLR/$COLLECTION/select" \
  --data-urlencode 'q=process:wls1 AND metric:*.HeapMemoryUsage.used' \
  --data-urlencode 'rows=0' \
  --data-urlencode 'json.facet={
    Hosts: {
      type: terms,
      field: host,
      facet: {
        Off      : { query : "value: [* TO 0]" },
        Idle     : { query : "value: [0 TO 1000000000]" },
        Busy     : { query : "value: [1000000001 TO 10000000000]" },
        Overload : { query : "value: [10000000001 TO *]" }
      }
    }
  }'
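The same facet request can be assembled programmatically. The sketch below only builds the request parameters and does not contact a Solr server; the function name `heap_facet_params` is hypothetical, and the bucket names and ranges are copied from the query above.

```python
import json
from urllib.parse import urlencode


def heap_facet_params(process="wls1"):
    """Build the parameters for a terms facet on host with query sub-facets."""
    buckets = {
        "Off": "value: [* TO 0]",
        "Idle": "value: [0 TO 1000000000]",
        "Busy": "value: [1000000001 TO 10000000000]",
        "Overload": "value: [10000000001 TO *]",
    }
    facet = {
        "Hosts": {
            "type": "terms",
            "field": "host",
            # One sub-facet per load class, counted per host
            "facet": {name: {"query": query} for name, query in buckets.items()},
        }
    }
    return {
        "q": f"process:{process} AND metric:*.HeapMemoryUsage.used",
        "rows": 0,
        "json.facet": json.dumps(facet),
    }


if __name__ == "__main__":
    # URL-encoded body, ready to be POSTed to <solr>/<collection>/select
    print(urlencode(heap_facet_params()))
```

Building the facet as a dictionary keeps the nesting explicit and leaves the escaping to `json.dumps` and `urlencode`.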
Why Do We Need Even More?
■ Data-centric applications need a scalable way of
- Post processing search results or facets (business logic, ML, data analytics)
- Post filtering search results
- Processing denormalized data (if you store a one-to-many relation in a single Solr document)
Accessing Solr from Spark with SolrRDD
■ https://github.com/lucidworks/spark-solr
■ You have to build the library locally. There is no released version at Maven Central.
■ Make sure to adjust the versions depending on your environment
Streaming from Solr into Spark
Not bad! 14 million docs in 1:27 minutes
You Can Speed up Spark / Solr by a Factor of 10
Using the Export Handler
Using SolrRDD with Java
Reading 14 Million Docs in 10 Seconds
Streaming 14 million Solr documents into Spark takes 10 seconds
-> 1,400,000 docs per second
RDDs using the /export handler rock!
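For orientation, a request to the /export handler can be sketched as a plain URL. This is not the spark-solr API. The handler is understood here to need a query (`q`), a field list (`fl`) and a sort (`sort`), with all listed fields backed by docValues; check the Solr reference guide for your version. The base URL, the collection name `metrics` and the function name are hypothetical.

```python
from urllib.parse import urlencode


def export_url(solr_base, collection, query, fields, sort):
    """Build a URL for streaming a full, sorted result set from /export."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # fields to stream back
        "sort": sort,            # assumed to be required by the handler
    }
    return f"{solr_base}/{collection}/export?{urlencode(params)}"


if __name__ == "__main__":
    print(export_url(
        "http://localhost:8983/solr",
        "metrics",
        "process:wls1",
        ["host", "metric", "value"],
        "host asc",
    ))
```

Unlike a paged /select query, the handler streams the whole result set in one response.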
Scaling up
Recap: Monitoring Sample Data
■ Single CSV per process, host, metric type

wls1_lpapp18_jmx.csv
Date                 CPU % Usage   Heap % Usage   #GC Invocations
1/10/16 9:00,000     50            50             1000
1/10/16 10:00,000    60            60             1100
1/10/16 11:00,000    70            70             1300
1/10/16 12:00,000    80            80             1800

CSV -> Solr
1,000 lines with 10,000 columns = 3 MB gzipped
1,000 x 10,000 docs = 1 million Solr docs
A Naive Solr Datamodel
A single Solr document per CSV cell
‣ Advantage
You can use Solr for aggregation, sorting and
searching for values or time intervals
‣ Disadvantage
Data explosion (a single compressed CSV file of 3 MB produces 1 million Solr documents)
Column Based Denormalization
wls1_lpapp18_jmx.csv
Date                 CPU % Usage   Heap % Usage   #GC Invocations
1/10/16 9:00,000     50            50             1000
1/10/16 10:00,000    60            60             1100
1/10/16 11:00,000    70            70             1300
1/10/16 12:00,000    80            80             1800

CSV -> Solr: n CSV values : 1 Solr document

SolrDocument {
  process: wls1
  host: lpapp18
  type: jmx
  mindate: 1/10/16 9:00
  maxdate: 1/10/16 12:00
  metric: CPU % Usage
  values: [BINARY (Date, Long)]
  max: 80
  min: 50
  avg: 65
}

Store 1000-10000 events in a single document
Document per column
Storing 1-to-1400 Relation in a Single Document
Base64 encoded and gzipped
values: [{date: …, value:}, … ]
32k Limit for DocValues
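Packing many values into one field as "Base64 encoded and gzipped" could look like the sketch below. The serialization is an assumption: JSON is used for readability, whereas the slide describes a binary (Date, Long) layout. The function names are hypothetical.

```python
import base64
import gzip
import json


def encode_points(points):
    """Serialize (timestamp, value) pairs, gzip them and Base64-encode the result."""
    raw = json.dumps(points).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decode_points(blob):
    """Reverse encode_points: Base64-decode, gunzip and parse."""
    raw = gzip.decompress(base64.b64decode(blob))
    return [tuple(point) for point in json.loads(raw)]


if __name__ == "__main__":
    points = [("1/10/16 9:00", 50), ("1/10/16 10:00", 60)]
    blob = encode_points(points)
    print(len(blob), decode_points(blob) == points)
```

The encoded string is opaque to Solr, so anything that needs the individual values has to decode the field first.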
Benefits of Denormalization
‣ Benefits
- You can scale from xxx million documents in a Solr Cloud up to trillions of searchable events
- Import is vastly faster
‣ Drawbacks
- Searching on single values requires additional logic
- Counting and faceting require additional logic
‣ Spark can solve these problems by parallel post processing
- Decompressing, aggregating, joining, grouping
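Parallel post processing of denormalized documents can be sketched as follows. A thread pool stands in for Spark, the JSON-plus-gzip packing is an assumption rather than the format used in the talk, and all names are hypothetical. The example counts single values that match a predicate, which Solr cannot do on a packed field.

```python
import base64
import gzip
import json
from concurrent.futures import ThreadPoolExecutor


def pack(values):
    """Gzip and Base64-encode a list of values (assumed storage format)."""
    return base64.b64encode(gzip.compress(json.dumps(values).encode("utf-8"))).decode("ascii")


def unpack(blob):
    """Decode a packed field back into a list of values."""
    return json.loads(gzip.decompress(base64.b64decode(blob)))


def count_matches(doc, predicate):
    """Decompress one document and count the values that satisfy predicate."""
    return sum(1 for value in unpack(doc["values"]) if predicate(value))


def parallel_count(docs, predicate, workers=4):
    """Post-process all documents in parallel and add up the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda doc: count_matches(doc, predicate), docs))


if __name__ == "__main__":
    docs = [{"values": pack([50, 60, 70, 80])}, {"values": pack([10, 90])}]
    print(parallel_count(docs, lambda value: value >= 70))
```

Each document is decoded independently, so the work parallelizes per document without any coordination between workers.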
Accessing Compressed Data within Spark
Indexing 19 Million CSV Values
in 13,500 Solr documents
now takes 24 seconds (before: 1:20 min)
-> 800,000 values per second
Streaming One Billion Solr Values into Spark
Now takes 34 seconds (before: 700 s)
-> 29,000,000 values per second
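The throughput figures quoted in the talk follow from dividing the item count by the elapsed time. The sketch below recomputes them from the numbers on the slides; the labels are paraphrases, and the slides round the results.

```python
def per_second(count, seconds):
    """Throughput as items per second."""
    return count / seconds


# (label, item count, elapsed seconds) as quoted on the slides
MEASUREMENTS = [
    ("index 14 million docs, one doc per cell", 14_000_000, 78.411),
    ("stream 14 million docs into Spark", 14_000_000, 87),  # 1:27 min
    ("stream 14 million docs via /export", 14_000_000, 10),
    ("index 19 million values, column docs", 19_000_000, 24),
    ("stream 1 billion values, column docs", 1_000_000_000, 34),
]

if __name__ == "__main__":
    for label, count, seconds in MEASUREMENTS:
        print(f"{label}: {per_second(count, seconds):,.0f} per second")
```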
Summary
■ The combination of Solr Cloud and Spark gives you the power to deal with big data workloads in real time
■ Denormalization can make your Solr application vastly faster
■ Make use of the /export handler when using the SolrRDD
■ Parallel post processing is mandatory for nontrivial applications
■ If you want to learn more: come to the Chronix talk on Friday
Learn More
■ https://github.com/lucidworks/spark-solr
■ https://github.com/jweigend/solr-spark
■ http://chronix.io
■ https://github.com/ChronixDB/chronix.spark/
■ http://qaware.blogspot.de
41
42
43

Contenu connexe

Tendances

Real time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesosReal time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesosRahul Kumar
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLucidworks
 
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDBBuilding a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDBCody Ray
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Summit
 
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1Joe Stein
 
Adding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallAdding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallSpark Summit
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...DataWorks Summit
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connectorDuyhai Doan
 
Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data prajods
 
The How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache SparkThe How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache SparkLegacy Typesafe (now Lightbend)
 
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced AnalyticsETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced AnalyticsMiklos Christine
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016DataStax
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Spark Summit
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache SparkMammoth Data
 
An Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise SearchAn Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise SearchPatricia Gorla
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & SparkMatthias Niehoff
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit
 

Tendances (20)

Real time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesosReal time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesos
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
 
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDBBuilding a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
 
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the stream
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 
Adding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallAdding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug Grall
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connector
 
Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data
 
The How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache SparkThe How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache Spark
 
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced AnalyticsETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
An Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise SearchAn Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise Search
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & Spark
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 

En vedette

JEE on DC/OS - MesosCon Europe
JEE on DC/OS - MesosCon EuropeJEE on DC/OS - MesosCon Europe
JEE on DC/OS - MesosCon EuropeQAware GmbH
 
Microservices @ Work - A Practice Report of Developing Microservices
Microservices @ Work - A Practice Report of Developing MicroservicesMicroservices @ Work - A Practice Report of Developing Microservices
Microservices @ Work - A Practice Report of Developing MicroservicesQAware GmbH
 
Lightweight developer provisioning with gradle and seu as-code
Lightweight developer provisioning with gradle and seu as-codeLightweight developer provisioning with gradle and seu as-code
Lightweight developer provisioning with gradle and seu as-codeQAware GmbH
 
Automotive Information Research driven by Apache Solr
Automotive Information Research driven by Apache SolrAutomotive Information Research driven by Apache Solr
Automotive Information Research driven by Apache SolrQAware GmbH
 
Secure Architecture and Programming 101
Secure Architecture and Programming 101Secure Architecture and Programming 101
Secure Architecture and Programming 101QAware GmbH
 
Der Cloud Native Stack in a Nutshell
Der Cloud Native Stack in a NutshellDer Cloud Native Stack in a Nutshell
Der Cloud Native Stack in a NutshellQAware GmbH
 
Per Anhalter durch den Cloud Native Stack (extended edition)
Per Anhalter durch den Cloud Native Stack (extended edition)Per Anhalter durch den Cloud Native Stack (extended edition)
Per Anhalter durch den Cloud Native Stack (extended edition)QAware GmbH
 
Automotive Information Research driven by Apache Solr
Automotive Information Research driven by Apache SolrAutomotive Information Research driven by Apache Solr
Automotive Information Research driven by Apache SolrQAware GmbH
 
Vamp - The anti-fragilitiy platform for digital services
Vamp - The anti-fragilitiy platform for digital servicesVamp - The anti-fragilitiy platform for digital services
Vamp - The anti-fragilitiy platform for digital servicesQAware GmbH
 
Azure Functions - Get rid of your servers, use functions!
Azure Functions - Get rid of your servers, use functions!Azure Functions - Get rid of your servers, use functions!
Azure Functions - Get rid of your servers, use functions!QAware GmbH
 
A Hitchhiker's Guide to the Cloud Native Stack
A Hitchhiker's Guide to the Cloud Native StackA Hitchhiker's Guide to the Cloud Native Stack
A Hitchhiker's Guide to the Cloud Native StackQAware GmbH
 
Developing Skills for Amazon Echo
Developing Skills for Amazon EchoDeveloping Skills for Amazon Echo
Developing Skills for Amazon EchoQAware GmbH
 
Chronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusChronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusQAware GmbH
 
Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.
Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.
Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.QAware GmbH
 
Kubernetes 101 and Fun
Kubernetes 101 and FunKubernetes 101 and Fun
Kubernetes 101 and FunQAware GmbH
 
Hands-on K8s: Deployments, Pods and Fun
Hands-on K8s: Deployments, Pods and FunHands-on K8s: Deployments, Pods and Fun
Hands-on K8s: Deployments, Pods and FunQAware GmbH
 
Cloud Native Unleashed
Cloud Native UnleashedCloud Native Unleashed
Cloud Native UnleashedQAware GmbH
 
Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017
Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017
Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017Mario-Leander Reimer
 
Die Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickeln
Die Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickelnDie Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickeln
Die Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickelnQAware GmbH
 
ApacheCon NA 2015 Spark / Solr Integration
ApacheCon NA 2015 Spark / Solr IntegrationApacheCon NA 2015 Spark / Solr Integration
ApacheCon NA 2015 Spark / Solr Integrationthelabdude
 

En vedette (20)

JEE on DC/OS - MesosCon Europe
JEE on DC/OS - MesosCon EuropeJEE on DC/OS - MesosCon Europe
JEE on DC/OS - MesosCon Europe
 
Microservices @ Work - A Practice Report of Developing Microservices
Microservices @ Work - A Practice Report of Developing MicroservicesMicroservices @ Work - A Practice Report of Developing Microservices
Microservices @ Work - A Practice Report of Developing Microservices
 
Lightweight developer provisioning with gradle and seu as-code
Lightweight developer provisioning with gradle and seu as-codeLightweight developer provisioning with gradle and seu as-code
Lightweight developer provisioning with gradle and seu as-code
 
Automotive Information Research driven by Apache Solr
Automotive Information Research driven by Apache SolrAutomotive Information Research driven by Apache Solr
Automotive Information Research driven by Apache Solr
 
Secure Architecture and Programming 101
Secure Architecture and Programming 101Secure Architecture and Programming 101
Secure Architecture and Programming 101
 
Der Cloud Native Stack in a Nutshell
Der Cloud Native Stack in a NutshellDer Cloud Native Stack in a Nutshell
Der Cloud Native Stack in a Nutshell
 
Per Anhalter durch den Cloud Native Stack (extended edition)
Per Anhalter durch den Cloud Native Stack (extended edition)Per Anhalter durch den Cloud Native Stack (extended edition)
Per Anhalter durch den Cloud Native Stack (extended edition)
 
Automotive Information Research driven by Apache Solr
Automotive Information Research driven by Apache SolrAutomotive Information Research driven by Apache Solr
Automotive Information Research driven by Apache Solr
 
Vamp - The anti-fragilitiy platform for digital services
Vamp - The anti-fragilitiy platform for digital servicesVamp - The anti-fragilitiy platform for digital services
Vamp - The anti-fragilitiy platform for digital services
 
Azure Functions - Get rid of your servers, use functions!
Azure Functions - Get rid of your servers, use functions!Azure Functions - Get rid of your servers, use functions!
Azure Functions - Get rid of your servers, use functions!
 
A Hitchhiker's Guide to the Cloud Native Stack
A Hitchhiker's Guide to the Cloud Native StackA Hitchhiker's Guide to the Cloud Native Stack
A Hitchhiker's Guide to the Cloud Native Stack
 
Developing Skills for Amazon Echo
Developing Skills for Amazon EchoDeveloping Skills for Amazon Echo
Developing Skills for Amazon Echo
 
Chronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusChronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for Prometheus
 
Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.
Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.
Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.
 
Kubernetes 101 and Fun
Kubernetes 101 and FunKubernetes 101 and Fun
Kubernetes 101 and Fun
 
Hands-on K8s: Deployments, Pods and Fun
Hands-on K8s: Deployments, Pods and FunHands-on K8s: Deployments, Pods and Fun
Hands-on K8s: Deployments, Pods and Fun
 
Cloud Native Unleashed
Cloud Native UnleashedCloud Native Unleashed
Cloud Native Unleashed
 
Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017
Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017
Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017
 
Die Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickeln
Die Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickelnDie Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickeln
Die Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickeln
 
ApacheCon NA 2015 Spark / Solr Integration
ApacheCon NA 2015 Spark / Solr IntegrationApacheCon NA 2015 Spark / Solr Integration
ApacheCon NA 2015 Spark / Solr Integration
 

Similaire à Leveraging the Power of Solr with Spark

Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftSnapLogic
 
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...
Calum McCrea, Software Engineer at Kx Systems, "Kx: How Wall Street Tech can ...Dataconomy Media
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Apache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series databaseApache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series databaseFlorian Lautenschlager
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDBMongoDB
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkC4Media
 
CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring Tim Bell
 
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedis Labs
 
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Amazon Web Services
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseAll Things Open
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About ShardingMongoDB
 
BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentationlilyco
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Azure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAzure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAndre Essing
 
Aerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike, Inc.
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?Robert Grossman
 
MongoDB Knowledge Shareing
MongoDB Knowledge ShareingMongoDB Knowledge Shareing
MongoDB Knowledge ShareingPhilip Zhong
 

Similaire à Leveraging the Power of Solr with Spark (20)

Leveraging the Power of Solr with Spark

  • 1. October 11-14, 2016 • Boston, MA
  • 2. Leveraging the Power of Solr with Spark. Johannes Weigend, CTO, QAware GmbH / Germany
  • 3. Agenda: Introduction to Solr Cloud and Spark; Importing; Searching and Aggregating; Scaling Up
  • 4. It is Hard to Scale Horizontally!
    ■ Functions
      - Trivial: load balancing of stateless services (macro- / microservices); more users -> more machines
      - Nontrivial: more machines -> faster response times
    ■ Data
      - Trivial: linear distribution of data across multiple machines; more machines -> more data
      - Nontrivial: constant response times with growing datasets
  • 5. Solr Cloud
    - Document-based NoSQL database with outstanding search capabilities
      A document is a collection of fields (string, number, date, …)
      Single-valued and multi-valued fields (fields can be arrays)
      Nested documents
      Static and dynamic schema
      Powerful query language (Lucene)
    - Horizontally scalable with Solr Cloud
      Distributed data in separate shards
      Resilience through the combination of ZooKeeper and replication
    - Powerful aggregations (aka facets)
  • 6. The Architecture of Solr Cloud: Two Levels of Distribution. [Diagram labels: Solr Cloud, Solr servers, ZooKeeper ensemble, collection, shards (Shard1 to Shard9), replicas (Replica1 to Replica9), leader, scale out]
  • 7. Combining Solr + Spark. [Diagram labels: Search, Index, Store; Map, Reduce, Calculate, Cache, Join, Combine; Frontend; Business Layer]
  • 8. READ THIS: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
    ■ Distributed computing (up to 100x faster than Hadoop M/R)
    ■ Distributed Map/Reduce on distributed data can be done in-memory
    ■ Supports online and batch workloads
    ■ Written in Scala, with Java/Scala/Python APIs
    ■ Processes data from distributed and local sources
      - Text files (accessible from all nodes)
      - Hadoop File System (HDFS)
      - Databases (JDBC)
      - Solr via the Lucidworks API
  • 9. Apache Spark. [Diagram: a driver and two executors, each executing parallel tasks]
  • 11. The Cloud in a Box
    CPU: 6th generation Intel® Core™ i5-6260U processor with Intel® Iris™ graphics (1.9 GHz, up to 2.8 GHz Turbo, dual core, 4 MB cache, 15 W TDP)
    RAM: 32 GB dual-channel DDR4 SODIMMs, 1.2 V, 2133 MHz
    Disk: 256 GB Samsung M.2 internal SSD
    Total: 10 cores, 20 HT units, 160 GB RAM, 1.25 TB disk
    Used for all benchmarks in this talk
  • 13. Agenda: Introduction to Solr Cloud and Spark; Importing; Searching and Aggregating; Scaling Up
  • 14. Monitoring Sample Data (Apache Big Data North America | Vancouver | 05.05.2016 | Johannes Weigend | © QAware GmbH)
    ■ Single CSV per process, host, metric type
    wls1_lpapp18_jmx.csv:
      Datetime            CPU % Usage   Heap % Usage   #GC Invocations
      1/10/16 9:00,000    50            50             1000
      1/10/16 10:00,000   60            60             1100
      1/10/16 11:00,000   70            70             1300
      1/10/16 12:00,000   80            80             1800
    CSV -> one Solr document per cell
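The "one Solr document per cell" mapping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`process`, `host`, `type`, `metric`, `date`, `value`) are taken from later slides, the semicolon delimiter and the helper name `cells_to_documents` are hypothetical, and the documents are plain dicts rather than SolrJ objects.

```python
import csv
import io

def cells_to_documents(filename, csv_text):
    """Turn every CSV cell into one flat document (a plain dict)."""
    # File name convention from the slide: <process>_<host>_<type>.csv
    process, host, metric_type = filename[:-len(".csv")].split("_")
    # Assumption: semicolon-delimited input; the real delimiter is not shown.
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")
    header = next(reader)
    docs = []
    for row in reader:
        timestamp = row[0]
        # One document per (row, metric column) pair.
        for metric, value in zip(header[1:], row[1:]):
            docs.append({
                "process": process, "host": host, "type": metric_type,
                "metric": metric, "date": timestamp, "value": int(value),
            })
    return docs

sample = (
    "Datetime;CPU % Usage;Heap % Usage;#GC Invocations\n"
    "1/10/16 9:00;50;50;1000\n"
    "1/10/16 10:00;60;60;1100\n"
)
docs = cells_to_documents("wls1_lpapp18_jmx.csv", sample)
print(len(docs), docs[0])
```

Two rows with three metric columns already yield six documents, which is the multiplication effect the later slides call data explosion.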
  • 15. Importing and Indexing into Solr Can Be Slow: Some Options to Speed Things Up. [Diagram: a client reads input data, creates a batch, and adds the batch to Solr through CloudSolrClient add(List<document> batch), which targets the shards on SOLR1, SOLR2, SOLR3. Bottlenecks: processing and network]
  • 16. Parallel Import with Spark Makes Import Scalable. [Diagram: Parallel Cloud Importer. Each node (Node1, Node2, Node3, ... Node n) runs a Spark executor with a CloudSolrClient next to a Solr server; each executor reads distributed input data, creates a batch, and adds the batch to Solr with add(List<document> batch). Executors, Solr servers, and shards scale up together]
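The read / create batch / add batch loop from the two diagrams can be sketched without any cluster. In this hypothetical Python sketch a thread pool stands in for the Spark executors and the `send_batch` callback stands in for `CloudSolrClient.add(batch)`; the function names and the batch size of 1000 are assumptions, not the code used in the talk.

```python
from concurrent.futures import ThreadPoolExecutor

def batches(docs, batch_size):
    """Yield successive fixed-size batches from a list of documents."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]

def import_partition(partition, send_batch, batch_size=1000):
    """Import one partition: create batches and hand each one to the sender."""
    sent = 0
    for batch in batches(partition, batch_size):
        send_batch(batch)  # hypothetical stand-in for CloudSolrClient.add(batch)
        sent += len(batch)
    return sent

def parallel_import(partitions, send_batch, workers=3):
    """Import all partitions concurrently and return the total document count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda p: import_partition(p, send_batch), partitions)
        return sum(counts)

# Three partitions of dummy documents, one per simulated node.
partitions = [[{"id": f"{node}-{i}"} for i in range(2500)] for node in range(3)]
received = []
total = parallel_import(partitions, received.append)
print(total, len(received))
```

The point of the design is that each partition is batched and sent independently, so adding workers (executors) and Solr nodes together removes the single-client processing and network bottlenecks.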
  • 17. How to Import Multiple (HDFS) Files
  • 20. Indexing 14 million docs in 1:20 min. The import takes 78,411 ms -> about 180,000 docs per second
  • 21. SolrJ and Spark Have Different Transitive Dependencies Depending on the Software Version
    ■ Adding both libraries to your classpath pulls in conflicting transitive dependencies, which leads to serious problems at runtime (serialization errors, ClassNotFoundExceptions, …)
    ■ Pinning or excluding versions helps, but can produce strange errors. There is currently no satisfying solution for the big data classpath hell.
  • 22. Agenda: Introduction to Solr Cloud and Spark; Importing; Searching and Aggregating; Scaling Up
  • 23. Using Solr Facet Queries for Aggregation
      #
      # Grouping per sub query
      #
      curl $SOLR/$COLLECTION/select -d '
      q=process:wls1 AND metric:*.HeapMemoryUsage.used&
      rows=0&
      json.facet={
        Hosts: {
          type: terms,
          field: host,
          facet: {
            Off      : { query : "value:[* TO 0]" },
            Idle     : { query : "value:[0 TO 1000000000]" },
            Busy     : { query : "value:[1000000001 TO 10000000000]" },
            Overload : { query : "value:[10000000001 TO *]" }
          }
        }
      }'
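The same nested facet can be assembled programmatically, which helps when the buckets are generated rather than typed by hand. The sketch below only builds the request parameters with the Python standard library and sends nothing; the helper name `heap_usage_facet` is hypothetical, and it uses the strict-JSON long form `{"type": "query", "q": ...}`, which should be equivalent to the `{ query : "..." }` shorthand on the slide.

```python
import json
from urllib.parse import urlencode

def heap_usage_facet():
    """Build the nested facet: terms on host, with one query bucket per load class."""
    ranges = {
        "Off": "value:[* TO 0]",
        "Idle": "value:[0 TO 1000000000]",
        "Busy": "value:[1000000001 TO 10000000000]",
        "Overload": "value:[10000000001 TO *]",
    }
    return {
        "Hosts": {
            "type": "terms",
            "field": "host",
            # One sub-facet per named value range, evaluated per host bucket.
            "facet": {name: {"type": "query", "q": q} for name, q in ranges.items()},
        }
    }

params = {
    "q": "process:wls1 AND metric:*.HeapMemoryUsage.used",
    "rows": 0,  # only the facet counts are needed, no documents
    "json.facet": json.dumps(heap_usage_facet()),
}
body = urlencode(params)  # form-encoded body for a POST to the /select handler
print(json.dumps(heap_usage_facet(), indent=2))
```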
  • 24. Why Do We Need Even More?
    ■ Data-centric applications need a scalable way of
      - Post-processing search results or facets (business logic, ML, data analytics)
      - Post-filtering search results
      - Processing denormalized data (if you store a one-to-many relation in a single Solr document)
  • 25. Accessing Solr from Spark with SolrRDD
    ■ https://github.com/lucidworks/spark-solr
    ■ You have to build the library locally. There is no released version at Maven Central.
    ■ Make sure to adjust the versions depending on your environment
  • 26. Streaming from Solr into Spark. Not bad: 14 million docs in 1:27 minutes
  • 27. You Can Speed Up Spark / Solr by a Factor of 10 Using the Export Handler
  • 29. Reading 14 Million Docs in 10 Seconds. Streaming 14 million Solr documents into Spark takes 10 seconds -> 1,400,000 docs per second
  • 30. RDDs Using the /export Handler Rock!
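The speedup claim can be checked from the two timings quoted on the slides (14 million docs in 1:27 minutes versus 10 seconds). A small arithmetic sketch, using only those published numbers:

```python
def docs_per_second(doc_count, seconds):
    """Throughput as documents per second."""
    return doc_count / seconds

DOCS = 14_000_000
via_default = docs_per_second(DOCS, 87)  # 1:27 min without the /export handler
via_export = docs_per_second(DOCS, 10)   # 10 s with the /export handler
speedup = via_export / via_default
print(round(via_default), round(via_export), round(speedup, 1))
```

The measured ratio works out to roughly 8.7, so "factor 10" is a rounded figure rather than an exact one.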
  • 32. Recap: Monitoring Sample Data
    ■ Single CSV per process, host, metric type
    wls1_lpapp18_jmx.csv:
      Date                CPU % Usage   Heap % Usage   #GC Invocations
      1/10/16 9:00,000    50            50             1000
      1/10/16 10:00,000   60            60             1100
      1/10/16 11:00,000   70            70             1300
      1/10/16 12:00,000   80            80             1800
    CSV -> Solr
    1000 lines with 10,000 columns = 3 MB gzipped
    1000 x 10,000 docs = 1 Mio Solr docs
  • 33. A Naive Solr Data Model: a single Solr document per CSV cell
    ‣ Advantage: you can use Solr for aggregating, sorting, and searching for values or time intervals
    ‣ Disadvantage: data explosion (a single compressed CSV file of 3 MB produces 1 million Solr documents)
  • 34. Column Based Denormalization: one document per column (n CSV values -> 1 Solr document)
    wls1_lpapp18_jmx.csv:
      Date                CPU % Usage   Heap % Usage   #GC Invocations
      1/10/16 9:00,000    50            50             1000
      1/10/16 10:00,000   60            60             1100
      1/10/16 11:00,000   70            70             1300
      1/10/16 12:00,000   80            80             1800
    SolrDocument {
      process: wls1
      host: lpapp18
      type: jmx
      mindate: 1/10/16 9:00
      maxdate: 1/10/16 12:00
      metric: CPU % Usage
      values: [BINARY (Date, Long)]
      max: 80
      min: 50
      avg: 65
    }
    Store 1000-10000 events in a single document
  • 35. Storing a 1-to-1400 Relation in a Single Document
    values: [{date: …, value:}, … ], Base64 encoded and gzipped
    32k limit for DocValues
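A minimal sketch of the column document and its compressed `values` field, using only the Python standard library. The slides do not show the actual binary layout, so the JSON serialization here is an assumption, as are the helper names `encode_values`, `decode_values`, and `column_document`; the field names follow the SolrDocument shown on the previous slide.

```python
import base64
import gzip
import json

def encode_values(points):
    """Serialize (date, value) pairs to JSON, gzip them, and Base64-encode the result."""
    raw = json.dumps([{"date": d, "value": v} for d, v in points]).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")

def decode_values(encoded):
    """Reverse encode_values: Base64-decode, gunzip, and parse the JSON list."""
    raw = gzip.decompress(base64.b64decode(encoded))
    return [(p["date"], p["value"]) for p in json.loads(raw)]

def column_document(process, host, metric_type, metric, points):
    """Build one document for a whole CSV column, with precomputed aggregates."""
    # Assumption: points are already sorted by time, oldest first.
    values = [v for _, v in points]
    return {
        "process": process, "host": host, "type": metric_type, "metric": metric,
        "mindate": points[0][0], "maxdate": points[-1][0],
        "min": min(values), "max": max(values), "avg": sum(values) / len(values),
        "values": encode_values(points),
    }

cpu = [("1/10/16 9:00", 50), ("1/10/16 10:00", 60),
       ("1/10/16 11:00", 70), ("1/10/16 12:00", 80)]
doc = column_document("wls1", "lpapp18", "jmx", "CPU % Usage", cpu)
print(doc["min"], doc["max"], doc["avg"], len(doc["values"]))
```

Keeping `min`, `max`, and `avg` as ordinary fields means Solr can still filter and sort on them without touching the compressed payload.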
  • 36. Benefits of Denormalization
    ‣ Benefits
      - You can scale from xxx million documents in a Solr Cloud up to trillions of searchable events
      - Import is vastly faster
    ‣ Drawbacks
      - Searching on single values requires additional logic
      - Counting and faceting require additional logic
    ‣ Spark can solve these problems by parallel post-processing
      - Decompressing, aggregating, joining, grouping
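The post-processing step (decompress, then aggregate and group) has a map/reduce shape, sketched below in plain Python. In Spark the same two functions would be passed to `map` and `reduceByKey` on an RDD; here a local `reduce` stands in for the cluster. The payload format (a gzipped, Base64-encoded JSON list of numbers), the helper names, and the host `lpapp19` are assumptions for illustration.

```python
import base64
import gzip
import json
from functools import reduce

def pack(values):
    """Helper: compress a list of numbers the way this sketch expects them."""
    raw = json.dumps(values).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")

def map_document(doc):
    """Map step: decompress one document and emit (host, (count, sum, max))."""
    values = json.loads(gzip.decompress(base64.b64decode(doc["values"])))
    return doc["host"], (len(values), sum(values), max(values))

def merge(a, b):
    """Reduce step: combine two partial aggregates for the same host."""
    return a[0] + b[0], a[1] + b[1], max(a[2], b[2])

def aggregate_by_host(docs):
    """Group partial aggregates by host and merge them."""
    def fold(acc, item):
        host, partial = item
        acc[host] = merge(acc[host], partial) if host in acc else partial
        return acc
    return reduce(fold, map(map_document, docs), {})

docs = [
    {"host": "lpapp18", "values": pack([50, 60])},
    {"host": "lpapp18", "values": pack([70, 80])},
    {"host": "lpapp19", "values": pack([10, 20, 30])},
]
result = aggregate_by_host(docs)
for host, (count, total, peak) in result.items():
    print(host, count, total / count, peak)
```

Because `merge` is associative and commutative, the partial aggregates can be combined in any order, which is what allows the work to be spread across executors.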
  • 37. Accessing Compressed Data within Spark
  • 38. Indexing 19 million CSV values in 13,500 Solr documents now takes 24 seconds (before: 1:20 min) -> about 800,000 values per second
  • 39. Streaming one billion Solr values into Spark now takes 34 seconds (before: 700 s) -> about 29,000,000 values per second
  • 40. Summary
    ■ The combination of Solr Cloud and Spark gives you the power to deal with big data workloads in real time
    ■ Denormalization can make your Solr application vastly faster
    ■ Make use of the /export handler when using the SolrRDD
    ■ Parallel post-processing is mandatory for nontrivial applications
    ■ If you want to learn more: come to the Chronix talk on Friday
  • 41. Learn More
    ■ https://github.com/lucidworks/spark-solr
    ■ https://github.com/jweigend/solr-spark
    ■ http://chronix.io
    ■ https://github.com/ChronixDB/chronix.spark/
    ■ http://qaware.blogspot.de