Addressing Performance Issues in Titan+Cassandra
Introduction
● Nakul Jeirath
● Senior security engineer at WellAware (wellaware.us)
● WellAware: Oil & gas startup building a SaaS monitoring & analytics platform
Transitioned ~2 years ago
Titan+Cassandra Performance Factors
● Titan deployment methodology
● Cassandra tuning
● Titan JVM tuning
● Data modeling
● Indexing
○ Property indices
○ Vertex centric indices
● Query structure
● Caching
○ Transaction cache
○ Database level cache
● Titan options
Ted Wilmes - Cassandra Summit 2015:
Slides: http://www.slideshare.net/twilmes/modeling-the-iot-with-titandb-and-cassandra
Video: https://vimeopro.com/user35188327/cassandra-summit-2015/video/143695770
This talk: our focus will be reads; check out Ted's talk for write optimization
A Toy Example
http://coachesbythenumbers.com/sportsource-college-football-data-packages/
2005 College Football Data
● Team names & conferences
● Game record with dates and scores
● Interesting questions:
○ Records for all teams in conference X
○ Top 25 ranking using record + strength of opponents
○ Three team loop (A beat B beat C beat A)
Toy Model
Label: team
name: Purdue
conf: Big 10
Label: team
name: IU
conf: Big 10
label: beat
date: 11/19/05
score: 41-14
gremlin> g.V().count()
==>239
gremlin> g.E().count()
==>718
Test Bench
1. Shut down Titan
2. Clear Titan DB
3. Start Titan
4. Load test dataset
Source code: https://github.com/njeirath/titan-perf-tester
Test Runner
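// Imports are assumed (Apache Commons Math 3, Titan 1.0.0); PerfOperation comes from the linked repository.
import java.util.Date;
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
import com.thinkaurelius.titan.core.TitanGraph;
import com.thinkaurelius.titan.core.TitanTransaction;

public class PerfTestRunner {
    // Runs op `iterations` times, each in a fresh transaction, and collects elapsed times in ms.
    public static DescriptiveStatistics test(final TitanGraph graph, int iterations, PerfOperation op) {
        DescriptiveStatistics stats = new DescriptiveStatistics();
        for (int i = 0; i < iterations; i++) {
            TitanTransaction tx = graph.newTransaction();      // start new transaction
            Date start = new Date();
            op.run(tx);                                        // run test query
            Date end = new Date();
            stats.addValue(end.getTime() - start.getTime());   // record elapsed time (ms)
            tx.rollback();                                     // roll back the transaction
        }
        return stats;
    }
}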
Pass in test query as lambda
Start new transaction
Run test query
Record time
Rollback transaction
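The same measure-and-summarize pattern can be tried without a Titan installation. The following is a minimal sketch, not the author's code: `TimingSketch` and its methods are hypothetical names, the workload is a placeholder, and the standard deviation uses the sample (n - 1) form, on the assumption that `DescriptiveStatistics` above is the Apache Commons Math class.

```java
import java.util.Arrays;

/** Dependency-free stand-in for the timing loop (hypothetical names, illustration only). */
final class TimingSketch {

    /** Runs the operation `iterations` times and returns the elapsed milliseconds of each run. */
    static double[] time(int iterations, Runnable op) {
        double[] samples = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            op.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return samples;
    }

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    static double median(double[] xs) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) squares += (x - m) * (x - m);
        return Math.sqrt(squares / (xs.length - 1));
    }

    public static void main(String[] args) {
        double[] samples = time(10, () -> {
            long acc = 0;
            for (int i = 0; i < 1_000_000; i++) acc += i;   // placeholder workload
            if (acc < 0) throw new IllegalStateException();
        });
        System.out.printf("n: %d, mean: %.3f ms, median: %.3f ms, std dev: %.3f ms%n",
                samples.length, mean(samples), median(samples), stdDev(samples));
    }
}
```

Reporting the median next to the mean is useful because a single slow first run (for example, a cold cache) can dominate the mean.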
Anatomy of Gremlin Queries
● Simplest form of OLTP query
○ picks one or more entry points into the graph
○ traverses from the initial vertices
Initial graph entry selection
Edge traversal
Example:
How many games did a Big 10 team win?
g.V().has('conference', 'Big Ten Conference').outE('beat').count()
Selecting the Entry Point
Typically won't have vertex ID(s) to select directly
Will select based on one or more vertex properties
Feasible to scan all vertices in small graphs
Becomes prohibitively expensive on large graphs
Start from these
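The difference between scanning every vertex and selecting the entry point through an index can be illustrated in plain Java. This is a conceptual sketch only: `EntryPointSketch`, `Team`, and the sample rows are hypothetical, and a hash map stands in for whatever structure the database actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EntryPointSketch {

    static final class Team {
        final String name;
        final String conference;
        Team(String name, String conference) {
            this.name = name;
            this.conference = conference;
        }
    }

    /** Full scan: inspects every vertex, so the cost grows with the size of the graph. */
    static List<String> scan(List<Team> teams, String conference) {
        List<String> names = new ArrayList<>();
        for (Team t : teams) {
            if (t.conference.equals(conference)) names.add(t.name);
        }
        return names;
    }

    /** Builds an equality index: property value -> names of the vertices that carry it. */
    static Map<String, List<String>> buildIndex(List<Team> teams) {
        Map<String, List<String>> index = new HashMap<>();
        for (Team t : teams) {
            index.computeIfAbsent(t.conference, k -> new ArrayList<>()).add(t.name);
        }
        return index;
    }

    /** Indexed lookup: a single hash probe, independent of how many other vertices exist. */
    static List<String> lookup(Map<String, List<String>> index, String conference) {
        return index.getOrDefault(conference, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Team> teams = Arrays.asList(
                new Team("Purdue", "Big Ten Conference"),
                new Team("IU", "Big Ten Conference"),
                new Team("Texas", "Big 12 Conference"));   // illustrative rows only
        System.out.println(scan(teams, "Big Ten Conference"));
        System.out.println(lookup(buildIndex(teams), "Big Ten Conference"));
    }
}
```

Both calls return the same names; only the amount of work differs, which is why the gap widens as the graph grows.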
Property Index
Test query: g.V().has('conference', 'Big Ten Conference').toList()
Output:
07:45:24 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx -
Query requires iterating over all vertices [(conference = Big Ten Conference)]. For
better performance, use indexes
Titan is nice enough to warn us of this issue
Creating Index on "Conference" Property
// Access graph management, create composite index, and commit
mgmt = graph.openManagement()
conf = mgmt.getPropertyKey('conference')
mgmt.buildIndex('byConference', Vertex.class).addKey(conf).buildCompositeIndex()
mgmt.commit()
// Wait for key to be available and reindex
mgmt.awaitGraphIndexStatus(graph, 'byConference').call()
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byConference"), SchemaAction.REINDEX).get()
mgmt.commit()
Reference: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
"Conference" Index Timing Comparison
Without Index:
n: 10
min: 127.0
max: 203.0
mean: 159.9
std dev: 29.598986469134378
median: 151.0
With Index:
n: 10
min: 2.0
max: 7.0
mean: 3.2
std dev: 1.6865480854231356
median: 2.5
Represents a 49.96875x speedup in mean time (159.9 ms / 3.2 ms)
Property Indices in Titan
● Composite Index
○ Supports equality comparison only
○ Can handle combinations of properties but must be pre-defined (Ex: Name and Age)
● Mixed Index
○ Greater conditionality support
○ Can handle lookups on arbitrary combinations of indexed keys
● Titan also supports external indexing backends
● Reference documentation: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
High Order Vertices
Don't always want to traverse all edges incident on a vertex
Filtering based on some edge properties is desirable
Similar to vertices: feasible to inspect each edge for low order vertices
Prohibitive on high order vertices
Traverse these edges
Vertex Centric Index
Example query:
dateFilter = lt(20051000)
g.V().has('conference', 'Big Ten Conference').as('team', 'wins', 'losses')
.select('team', 'wins', 'losses')
.by('name')
.by(__.outE().has('date', dateFilter).count())
.by(__.inE().has('date', dateFilter).count())
Gets Big 10 team records for games played before October 2005
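The idea behind a vertex-centric index can be sketched in plain Java: keep the edges incident on one vertex sorted by the filtered property, so a range condition such as `date < 20051000` touches only the matching edges. The class and method names below are hypothetical, and a `TreeMap` is used purely as an illustration; it is not a description of Titan's storage layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Edges incident on ONE vertex, stored both unsorted and sorted by date (yyyyMMdd as int). */
final class VertexCentricSketch {

    static final class Edge {
        final int date;
        final String opponent;
        Edge(int date, String opponent) {
            this.date = date;
            this.opponent = opponent;
        }
    }

    private final List<Edge> allEdges = new ArrayList<>();
    private final TreeMap<Integer, List<Edge>> byDate = new TreeMap<>();

    void addEdge(int date, String opponent) {
        Edge e = new Edge(date, opponent);
        allEdges.add(e);
        byDate.computeIfAbsent(date, k -> new ArrayList<>()).add(e);
    }

    /** Without an index: test the date of every incident edge. */
    int countBeforeByScan(int date) {
        int n = 0;
        for (Edge e : allEdges) {
            if (e.date < date) n++;
        }
        return n;
    }

    /** With the sorted structure: visit only the entries whose key is below the bound. */
    int countBeforeByIndex(int date) {
        int n = 0;
        for (List<Edge> bucket : byDate.headMap(date, false).values()) {
            n += bucket.size();
        }
        return n;
    }
}
```

For a vertex with a handful of edges the two methods cost about the same; the sorted form pays off on vertices with very many incident edges.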
Notes on Vertex Centric Indices
From Titan 1.0.0 documentation:
Reference documentation: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html#vertex-indexes
Titan 0.4.4 does not automatically create vertex-centric indices
No need to create one for our example
May be necessary if a composite key query is being performed
Ex: Get Big 10 team records for games played before October 2005 and won by more than 14 points
Query Structuring
Order of steps in query can make a difference
Consider:
g.V().order().by(__.outE().count(), decr).has('conference', 'Big Ten Conference').values('name')
vs
g.V().has('conference', 'Big Ten Conference').order().by(__.outE().count(), decr).values('name')
Mean times: 1032.8 ms vs 42.8 ms respectively
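A rough way to see why the second form is cheaper: the sort key has to be computed for every element that reaches the ordering step, so filtering first shrinks that set. The sketch below models this with hypothetical names (`QueryOrderSketch`, `keyEvaluations`) and assumes one key evaluation per element; it does not reproduce how Gremlin executes the steps.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class QueryOrderSketch {

    static final class Team {
        final String name;
        final String conference;
        final int wins;
        Team(String name, String conference, int wins) {
            this.name = name;
            this.conference = conference;
            this.wins = wins;
        }
    }

    /** How many times the (assumed expensive) sort key was computed. */
    int keyEvaluations = 0;

    /** Stand-in for order().by(__.outE().count(), decr): one key evaluation per incoming element. */
    private List<Team> orderByWinsDesc(List<Team> input) {
        final Map<Team, Integer> key = new IdentityHashMap<>();
        for (Team t : input) {
            keyEvaluations++;
            key.put(t, t.wins);
        }
        List<Team> sorted = new ArrayList<>(input);
        sorted.sort((a, b) -> Integer.compare(key.get(b), key.get(a)));   // stable, descending
        return sorted;
    }

    /** Stand-in for has('conference', ...). */
    private static List<Team> hasConference(List<Team> input, String conference) {
        List<Team> kept = new ArrayList<>();
        for (Team t : input) {
            if (t.conference.equals(conference)) kept.add(t);
        }
        return kept;
    }

    private static List<String> names(List<Team> input) {
        List<String> out = new ArrayList<>();
        for (Team t : input) out.add(t.name);
        return out;
    }

    List<String> sortThenFilter(List<Team> all, String conference) {
        return names(hasConference(orderByWinsDesc(all), conference));
    }

    List<String> filterThenSort(List<Team> all, String conference) {
        return names(orderByWinsDesc(hasConference(all, conference)));
    }
}
```

Both methods return the same list of names, but `sortThenFilter` evaluates the key for every team while `filterThenSort` evaluates it only for the teams in the requested conference.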
Titan Caching
Support for database and transaction level caching
[Diagram: Client → TX Cache (one per client) → DB Cache (Titan) → Storage Backend]
Transaction Cache
Transaction starts on graph access and ends on commit or rollback
Useful for workloads accessing same data repeatedly
[Diagram: teams A, B, C and D connected by "beat" edges]
Rank of team A is a count of these "beat" edges
Ex: Team Rankings
g.V().order().by(__.out().out().count(), decr)
.as('team', 'score', 'wins', 'losses')
.select('team', 'score', 'wins', 'losses')
.by('name')
.by(__.out().out().count())
.by(__.outE().count())
.by(__.inE().count())
.limit(25)
With TX cache: 3361 ms, without TX cache: 5206 ms
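The ranking query revisits the same vertices many times through `out().out()`, which is the situation where a per-transaction cache helps. The following sketch models that effect by counting backend reads with and without a cache. All names (`TxCacheSketch`, `backendReads`) are hypothetical and the cache is a plain map, which is an assumption made for illustration rather than a description of Titan's implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TxCacheSketch {

    private final Map<String, List<String>> backend;   // team -> teams it beat
    private final Map<String, List<String>> txCache = new HashMap<>();
    private final boolean useCache;

    /** Number of adjacency lists fetched from the (simulated) storage backend. */
    int backendReads = 0;

    TxCacheSketch(Map<String, List<String>> backend, boolean useCache) {
        this.backend = backend;
        this.useCache = useCache;
    }

    /** Returns the teams beaten by v, reading the backend only on a cache miss. */
    private List<String> out(String v) {
        if (useCache) {
            List<String> cached = txCache.get(v);
            if (cached != null) return cached;
        }
        backendReads++;
        List<String> edges = backend.getOrDefault(v, Collections.emptyList());
        if (useCache) txCache.put(v, edges);
        return edges;
    }

    /** In the spirit of out().out().count() starting from a single team. */
    int score(String team) {
        int n = 0;
        for (String beaten : out(team)) {
            n += out(beaten).size();
        }
        return n;
    }
}
```

Scoring teams A, B and C on a three-team graph (A beat B and C, B beat C, C beat A) needs 7 backend reads without the cache and 3 with it, while the scores themselves are identical.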
/r/mildlyinteresting/
1. Texas
2. USC
3. Penn State
4. Ohio State
5. Virginia Tech
6. TCU
7. West Virginia
8. Louisiana State
9. Alabama
10. Oregon
11. Louisville
12. Georgia
13. UCLA
14. Miami (FL)
1. Texas
2. USC
3. Penn State
4. Virginia Tech
5. LSU
6. Ohio State
7. Georgia
8. TCU
9. West Virginia
10. Alabama
11. Boston College
12. Oklahoma
13. Florida
14. UCLA
http://www.collegefootballpoll.com/2005_archive_computer_rankings.html
2005 End of Season Computer Rankings
Our Query Results
Transaction Caching Gotchas
Cache Thrashing
Symptom: queries suddenly and significantly slow down as data size increases
Solve this by tuning the transaction cache size:
● Globally by setting cache.tx-cache-size
● Per transaction using TransactionBuilder
Memory Leak
Transactions are started automatically and are thread aware
With read-only access in separate threads, transaction caches can leak
Solved by calling g.rollback() at the end of the thread's execution (releases the TX cache)
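The sudden slowdown described under cache thrashing can be reproduced with a small simulation. The sketch below assumes an LRU-style eviction policy, which is an assumption about the cache and not something stated in the slides; `LruThrashSketch` and `hits` are hypothetical names. Once the set of repeatedly visited keys exceeds the cache capacity by even one entry, a cyclic access pattern stops hitting the cache entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruThrashSketch {

    /** Fixed-capacity LRU cache built on LinkedHashMap's access-order mode. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true);   // true = order entries from least to most recently accessed
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    /** Replays `rounds` passes over keys 0..workingSet-1 and returns the number of cache hits. */
    static int hits(int capacity, int workingSet, int rounds) {
        LruCache<Integer, Integer> cache = new LruCache<>(capacity);
        int hits = 0;
        for (int r = 0; r < rounds; r++) {
            for (int key = 0; key < workingSet; key++) {
                if (cache.get(key) != null) {
                    hits++;
                } else {
                    cache.put(key, key);   // simulate loading the element from the backend
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("capacity 100, working set 100: " + hits(100, 100, 10) + " hits");
        System.out.println("capacity  99, working set 100: " + hits(99, 100, 10) + " hits");
    }
}
```

With a capacity equal to the working set, every pass after the first is served from the cache; with one slot fewer, each key is evicted just before it is needed again. This is why raising the transaction cache size can restore performance abruptly.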
Transaction Cache Settings
Transaction cache can be set up in properties files
Settings can be overridden when creating transaction using TransactionBuilder:
Example:
tx=graph.buildTransaction().vertexCacheSize(50000).start()
Other transaction settings can be found here: http://s3.thinkaurelius.com/docs/titan/1.0.0/tx.html#tx-config
Database Level Caching
Database caching helps performance across transactions:
gremlin> stats2.getValues()
==>[1016.0, 41.0, 27.0, 26.0, 24.0, 23.0, 24.0, 21.0, 18.0, 18.0]
Trades consistency for speed in clusters
[Diagram: Titan instances 1..n in front of storage nodes 1..n, with cold-cache and warm-cache annotations and the sequence 1. Read, 2. Write, 3. Read]
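The consistency trade-off can be sketched as two instances that each keep a local cache in front of a shared store. This is a deliberately simplified model with hypothetical names (`StaleReadSketch`, `Instance`); it omits any cache expiry and is not Titan's implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class StaleReadSketch {

    /** One database instance with its own local cache in front of a shared store (simplified). */
    static final class Instance {
        private final Map<String, String> sharedStore;
        private final Map<String, String> localCache = new HashMap<>();

        Instance(Map<String, String> sharedStore) {
            this.sharedStore = sharedStore;
        }

        /** Serves from the local cache when possible; otherwise loads from the store and remembers it. */
        String read(String key) {
            return localCache.computeIfAbsent(key, sharedStore::get);
        }

        /** Writes through to the store; only THIS instance's cache sees the new value. */
        void write(String key, String value) {
            sharedStore.put(key, value);
            localCache.put(key, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("score", "41-14");
        Instance titan1 = new Instance(store);
        Instance titan2 = new Instance(store);

        System.out.println(titan1.read("score"));   // 1. read through instance 1 warms its cache
        titan2.write("score", "42-14");             // 2. write through instance 2
        System.out.println(titan1.read("score"));   // 3. instance 1 still returns its cached value
    }
}
```

Instance 1 keeps returning the value it cached before the write, which is the stale-read risk accepted in exchange for faster repeated reads.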
Titan Options
query.batch - Whether traversal queries should be batched when executed
against the storage backend. This can lead to significant performance
improvement if there is a non-trivial latency to the backend.
query.fast-property - Whether to pre-fetch all properties on first singular vertex
property access. This can eliminate backend calls on subsequent property access
for the same vertex at the expense of retrieving all properties at once. This can be
expensive for vertices with many properties
http://s3.thinkaurelius.com/docs/titan/1.0.0/titan-config-ref.html
Using "query.fast-property"
Test query: g.V().group().by('conference').by('name')
query.fast-property = false:
n: 10
min: 243.0
max: 262.0
mean: 250.4
std dev: 5.125101625008685
median: 250.0
query.fast-property = true:
n: 10
min: 127.0
max: 151.0
mean: 138.1
std dev: 7.233410137841088
median: 139.5
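The test query reads two properties ('conference' and 'name') from every vertex, so pre-fetching all properties on first access can save one backend call per vertex. The sketch below counts simulated backend calls under both settings. `FastPropertySketch` and its fields are hypothetical, and the model is a simplification that says nothing about how Titan fetches data internally.

```java
import java.util.HashMap;
import java.util.Map;

final class FastPropertySketch {

    private final Map<String, Map<String, String>> backend;   // vertex id -> all stored properties
    private final boolean fastProperty;
    private final Map<String, Map<String, String>> loaded = new HashMap<>();

    /** Number of simulated round trips to the storage backend. */
    int backendCalls = 0;

    FastPropertySketch(Map<String, Map<String, String>> backend, boolean fastProperty) {
        this.backend = backend;
        this.fastProperty = fastProperty;
    }

    String property(String vertexId, String key) {
        Map<String, String> known = loaded.computeIfAbsent(vertexId, id -> new HashMap<>());
        if (known.containsKey(key)) {
            return known.get(key);                    // already in memory, no backend call
        }
        backendCalls++;
        Map<String, String> stored = backend.get(vertexId);
        if (fastProperty) {
            known.putAll(stored);                     // pre-fetch every property in one call
        } else {
            known.put(key, stored.get(key));          // fetch only the requested property
        }
        return known.get(key);
    }
}
```

The saving shrinks, and can turn into a cost, when vertices carry many properties that the query never reads.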
Summary
● Titan indices
○ Property indices - vertex/edge lookups
○ Vertex centric indices - edge traversals
● Generally limiting elements early in traversal is a good thing
● Caching
○ Database level - improve speed while potentially increasing likelihood of stale data
○ Transaction level - helps when repeatedly visiting elements within a transaction
● Various other options available for specific tuning needs
Thanks For Watching
Questions
Nakul Jeirath
@njeirath
Senior Security Engineer - WellAware

Contenu connexe

Tendances

A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINEDB
 
Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comJungsu Heo
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
 
From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...Neville Li
 
DataStax: Rigorous Cassandra Data Modeling for the Relational Data Architect
DataStax: Rigorous Cassandra Data Modeling for the Relational Data ArchitectDataStax: Rigorous Cassandra Data Modeling for the Relational Data Architect
DataStax: Rigorous Cassandra Data Modeling for the Relational Data ArchitectDataStax Academy
 
Cassandra advanced data modeling
Cassandra advanced data modelingCassandra advanced data modeling
Cassandra advanced data modelingRomain Hardouin
 
ML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesSigmoid
 
Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data prajods
 
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormC*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormDataStax
 
codecentric AG: Using Cassandra and Clojure for Data Crunching backends
codecentric AG: Using Cassandra and Clojure for Data Crunching backendscodecentric AG: Using Cassandra and Clojure for Data Crunching backends
codecentric AG: Using Cassandra and Clojure for Data Crunching backendsDataStax Academy
 
Amazon Redshift
Amazon RedshiftAmazon Redshift
Amazon RedshiftJeff Patti
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...DataStax
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...InfluxData
 
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleScyllaDB
 
Golang in TiDB (GopherChina 2017)
Golang in TiDB  (GopherChina 2017)Golang in TiDB  (GopherChina 2017)
Golang in TiDB (GopherChina 2017)PingCAP
 
Wide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingWide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingScyllaDB
 
Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Jelena Zanko
 
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...DataStax
 
Postgres Performance for Humans
Postgres Performance for HumansPostgres Performance for Humans
Postgres Performance for HumansCitus Data
 

Tendances (20)

A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAIN
 
Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.com
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...
 
DataStax: Rigorous Cassandra Data Modeling for the Relational Data Architect
DataStax: Rigorous Cassandra Data Modeling for the Relational Data ArchitectDataStax: Rigorous Cassandra Data Modeling for the Relational Data Architect
DataStax: Rigorous Cassandra Data Modeling for the Relational Data Architect
 
Cassandra advanced data modeling
Cassandra advanced data modelingCassandra advanced data modeling
Cassandra advanced data modeling
 
ML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time Series
 
Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data
 
Druid
DruidDruid
Druid
 
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormC*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
 
codecentric AG: Using Cassandra and Clojure for Data Crunching backends
codecentric AG: Using Cassandra and Clojure for Data Crunching backendscodecentric AG: Using Cassandra and Clojure for Data Crunching backends
codecentric AG: Using Cassandra and Clojure for Data Crunching backends
 
Amazon Redshift
Amazon RedshiftAmazon Redshift
Amazon Redshift
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
 
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
 
Golang in TiDB (GopherChina 2017)
Golang in TiDB  (GopherChina 2017)Golang in TiDB  (GopherChina 2017)
Golang in TiDB (GopherChina 2017)
 
Wide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingWide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data Modeling
 
Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20
 
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...
 
Postgres Performance for Humans
Postgres Performance for HumansPostgres Performance for Humans
Postgres Performance for Humans
 

En vedette

Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionSymeon Papadopoulos
 
Understanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsUnderstanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsAcunu
 
TOWARDS SMART & INCLUSIVE SOCIETY: BUILDING 3D IMMERSIVE MUSEUM BY CHILDREN W...
TOWARDS SMART & INCLUSIVE SOCIETY: BUILDING 3D IMMERSIVE MUSEUM BY CHILDREN W...TOWARDS SMART & INCLUSIVE SOCIETY: BUILDING 3D IMMERSIVE MUSEUM BY CHILDREN W...
TOWARDS SMART & INCLUSIVE SOCIETY: BUILDING 3D IMMERSIVE MUSEUM BY CHILDREN W...Miguel Gea
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersSymeon Papadopoulos
 
Cassandra(no sql)によるシステム提案と開発
Cassandra(no sql)によるシステム提案と開発Cassandra(no sql)によるシステム提案と開発
Cassandra(no sql)によるシステム提案と開発kishimotosc
 
Faunus: Graph Analytics Engine
Faunus: Graph Analytics EngineFaunus: Graph Analytics Engine
Faunus: Graph Analytics EngineMarko Rodriguez
 
Cassandraのトランザクションサポート化 & web2pyによるcms用プラグイン開発
Cassandraのトランザクションサポート化 & web2pyによるcms用プラグイン開発Cassandraのトランザクションサポート化 & web2pyによるcms用プラグイン開発
Cassandraのトランザクションサポート化 & web2pyによるcms用プラグイン開発kishimotosc
 
MongoDB IoT City Tour LONDON: Why your Dad's database won't work for IoT. Joe...
MongoDB IoT City Tour LONDON: Why your Dad's database won't work for IoT. Joe...MongoDB IoT City Tour LONDON: Why your Dad's database won't work for IoT. Joe...
MongoDB IoT City Tour LONDON: Why your Dad's database won't work for IoT. Joe...MongoDB
 
GraphConnect Europe 2016 - IoT - where do Graphs fit with Business Requiremen...
GraphConnect Europe 2016 - IoT - where do Graphs fit with Business Requiremen...GraphConnect Europe 2016 - IoT - where do Graphs fit with Business Requiremen...
GraphConnect Europe 2016 - IoT - where do Graphs fit with Business Requiremen...Neo4j
 
Devsumi2013【15-e-5】NoSQLの野心的な使い方 ~Apache Cassandra編~
Devsumi2013【15-e-5】NoSQLの野心的な使い方 ~Apache Cassandra編~Devsumi2013【15-e-5】NoSQLの野心的な使い方 ~Apache Cassandra編~
Devsumi2013【15-e-5】NoSQLの野心的な使い方 ~Apache Cassandra編~kishimotosc
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBMohamed Taher Alrefaie
 
An indoor location aware system for an io t-based smart museum
An indoor location aware system for an io t-based smart museumAn indoor location aware system for an io t-based smart museum
An indoor location aware system for an io t-based smart museumieeepondy
 
Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Matthias Broecheler
 
Neo, Titan & Cassandra
Neo, Titan & CassandraNeo, Titan & Cassandra
Neo, Titan & Cassandrajohnrjenson
 
Titan mrktng strategy
Titan mrktng strategyTitan mrktng strategy
Titan mrktng strategyKallol Sarkar
 
Build a Recommendation Engine using Amazon Machine Learning in Real-time
Build a Recommendation Engine using Amazon Machine Learning in Real-timeBuild a Recommendation Engine using Amazon Machine Learning in Real-time
Build a Recommendation Engine using Amazon Machine Learning in Real-timeAmazon Web Services
 

En vedette (20)

Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detection
 
Understanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsUnderstanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problems
 
TOWARDS SMART & INCLUSIVE SOCIETY: BUILDING 3D IMMERSIVE MUSEUM BY CHILDREN W...
TOWARDS SMART & INCLUSIVE SOCIETY: BUILDING 3D IMMERSIVE MUSEUM BY CHILDREN W...TOWARDS SMART & INCLUSIVE SOCIETY: BUILDING 3D IMMERSIVE MUSEUM BY CHILDREN W...
TOWARDS SMART & INCLUSIVE SOCIETY: BUILDING 3D IMMERSIVE MUSEUM BY CHILDREN W...
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
 
Cassandra(no sql)によるシステム提案と開発
Cassandra(no sql)によるシステム提案と開発Cassandra(no sql)によるシステム提案と開発
Cassandra(no sql)によるシステム提案と開発
 
Faunus: Graph Analytics Engine
Faunus: Graph Analytics EngineFaunus: Graph Analytics Engine
Faunus: Graph Analytics Engine
 
Cassandraのトランザクションサポート化 & web2pyによるcms用プラグイン開発
Cassandraのトランザクションサポート化 & web2pyによるcms用プラグイン開発Cassandraのトランザクションサポート化 & web2pyによるcms用プラグイン開発
Cassandraのトランザクションサポート化 & web2pyによるcms用プラグイン開発
 
MongoDB IoT City Tour LONDON: Why your Dad's database won't work for IoT. Joe...
MongoDB IoT City Tour LONDON: Why your Dad's database won't work for IoT. Joe...MongoDB IoT City Tour LONDON: Why your Dad's database won't work for IoT. Joe...
MongoDB IoT City Tour LONDON: Why your Dad's database won't work for IoT. Joe...
 
GraphConnect Europe 2016 - IoT - where do Graphs fit with Business Requiremen...
GraphConnect Europe 2016 - IoT - where do Graphs fit with Business Requiremen...GraphConnect Europe 2016 - IoT - where do Graphs fit with Business Requiremen...
GraphConnect Europe 2016 - IoT - where do Graphs fit with Business Requiremen...
 
Devsumi2013【15-e-5】NoSQLの野心的な使い方 ~Apache Cassandra編~
Devsumi2013【15-e-5】NoSQLの野心的な使い方 ~Apache Cassandra編~Devsumi2013【15-e-5】NoSQLの野心的な使い方 ~Apache Cassandra編~
Devsumi2013【15-e-5】NoSQLの野心的な使い方 ~Apache Cassandra編~
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DB
 
IOT Trend and Solution Development in Taiwan
IOT Trend and Solution Development in TaiwanIOT Trend and Solution Development in Taiwan
IOT Trend and Solution Development in Taiwan
 
An indoor location aware system for an io t-based smart museum
An indoor location aware system for an io t-based smart museumAn indoor location aware system for an io t-based smart museum
An indoor location aware system for an io t-based smart museum
 
Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3
 
Titan
TitanTitan
Titan
 
Neo, Titan & Cassandra
Neo, Titan & CassandraNeo, Titan & Cassandra
Neo, Titan & Cassandra
 
Titan mrktng strategy
Titan mrktng strategyTitan mrktng strategy
Titan mrktng strategy
 
Titan ppt
Titan pptTitan ppt
Titan ppt
 
Titan watches
Titan watchesTitan watches
Titan watches
 
Build a Recommendation Engine using Amazon Machine Learning in Real-time
Build a Recommendation Engine using Amazon Machine Learning in Real-timeBuild a Recommendation Engine using Amazon Machine Learning in Real-time
Build a Recommendation Engine using Amazon Machine Learning in Real-time
 

Similaire à Addressing performance issues in titan+cassandra

A Journey from Relational to Graph
A Journey from Relational to GraphA Journey from Relational to Graph
A Journey from Relational to GraphNakul Jeirath
 
Database Development Replication Security Maintenance Report
Database Development Replication Security Maintenance ReportDatabase Development Replication Security Maintenance Report
Database Development Replication Security Maintenance Reportnyin27
 
Dublin Ireland Spark Meetup October 15, 2015
Dublin Ireland Spark Meetup October 15, 2015Dublin Ireland Spark Meetup October 15, 2015
Dublin Ireland Spark Meetup October 15, 2015eddiebaggott
 
How Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementHow Database Convergence Impacts the Coming Decades of Data Management
How Database Convergence Impacts the Coming Decades of Data ManagementSingleStore
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1MariaDB plc
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1MariaDB plc
 
To Study The Tips Tricks Guidelines Related To Performance Tuning For N Hib...
To Study The Tips Tricks  Guidelines Related To Performance Tuning For  N Hib...To Study The Tips Tricks  Guidelines Related To Performance Tuning For  N Hib...
To Study The Tips Tricks Guidelines Related To Performance Tuning For N Hib...Shahzad
 
Extra performance out of thin air
Extra performance out of thin airExtra performance out of thin air
Extra performance out of thin airKonstantine Krutiy
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Stamatis Zampetakis
 
TSAR (TimeSeries AggregatoR) Tech Talk
TSAR (TimeSeries AggregatoR) Tech TalkTSAR (TimeSeries AggregatoR) Tech Talk
TSAR (TimeSeries AggregatoR) Tech TalkAnirudh Todi
 
"Optimization of a .NET application- is it simple ! / ?", Yevhen Tatarynov
"Optimization of a .NET application- is it simple ! / ?",  Yevhen Tatarynov"Optimization of a .NET application- is it simple ! / ?",  Yevhen Tatarynov
"Optimization of a .NET application- is it simple ! / ?", Yevhen TatarynovFwdays
 
Yevhen Tatarynov "From POC to High-Performance .NET applications"
Yevhen Tatarynov "From POC to High-Performance .NET applications"Yevhen Tatarynov "From POC to High-Performance .NET applications"
Yevhen Tatarynov "From POC to High-Performance .NET applications"LogeekNightUkraine
 
How we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayHow we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayGrega Kespret
 
Instaclustr webinar 2017 feb 08 japan
Instaclustr webinar 2017 feb 08   japanInstaclustr webinar 2017 feb 08   japan
Instaclustr webinar 2017 feb 08 japanHiromitsu Komatsu
 
FMK2019 being an optimist in a pessimistic world by vincenzo menanno
FMK2019 being an optimist in a pessimistic world by vincenzo menannoFMK2019 being an optimist in a pessimistic world by vincenzo menanno
FMK2019 being an optimist in a pessimistic world by vincenzo menannoVerein FM Konferenz
 
The Ring programming language version 1.5 book - Part 8 of 31
The Ring programming language version 1.5 book - Part 8 of 31The Ring programming language version 1.5 book - Part 8 of 31
The Ring programming language version 1.5 book - Part 8 of 31Mahmoud Samir Fayed
 
AWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAmazon Web Services
 
The Nuts and Bolts of Kafka Streams---An Architectural Deep Dive
The Nuts and Bolts of Kafka Streams---An Architectural Deep DiveThe Nuts and Bolts of Kafka Streams---An Architectural Deep Dive
The Nuts and Bolts of Kafka Streams---An Architectural Deep DiveHostedbyConfluent
 

Addressing performance issues in titan+cassandra

  • 2. Introduction ● Nakul Jeirath ● Senior security engineer at WellAware (wellaware.us) ● WellAware: Oil & gas startup building a SaaS monitoring & analytics platform
  • 4. Titan+Cassandra Performance Factors ● Titan deployment methodology ● Cassandra tuning ● Titan JVM tuning ● Data modeling ● Indexing ○ Property indices ○ Vertex centric indices ● Query structure ● Caching ○ Transaction cache ○ Database level cache ● Titan options
  • 5. Titan+Cassandra Performance Factors ● Titan deployment methodology ● Cassandra tuning ● Titan JVM tuning ● Data modeling ● Indexing ○ Property indices ○ Vertex centric indices ● Query structure ● Caching ○ Transaction cache ○ Database level cache ● Titan options Ted Wilmes - Cassandra Summit 2015: Slides: http://www.slideshare.net/twilmes/modeling-the-iot-with-titandb-and-cassandra Video: https://vimeopro.com/user35188327/cassandra-summit-2015/video/143695770 This talk: our focus will be reads; see Ted's talk for write optimization.
  • 6. A Toy Example http://coachesbythenumbers.com/sportsource-college-football-data-packages/ 2005 College Football Data ● Team names & conferences ● Game record with dates and scores ● Interesting questions: ○ Records for all teams in conference X ○ Top 25 ranking using record + strength of opponents ○ Three team loop (A beat B beat C beat A)
  • 7. Toy Model Label: team name: Purdue conf: Big 10 Label: team name: IU conf: Big 10 label: beat date: 11/19/05 score: 41-14 gremlin> g.V().count() ==>239 gremlin> g.E().count() ==>718
  • 8. Test Bench Shut down Titan Clear Titan DB Start Titan Load test dataset Source code: https://github.com/njeirath/titan-perf-tester
  • 9. Test Runner public class PerfTestRunner { public static DescriptiveStatistics test(final TitanGraph graph, int iterations, PerfOperation op) { DescriptiveStatistics stats = new DescriptiveStatistics(); for (int i = 0; i < iterations; i++) { TitanTransaction tx = graph.newTransaction(); Date start = new Date(); op.run(tx); Date end = new Date(); stats.addValue(end.getTime() - start.getTime()); tx.rollback(); } return stats; } } Steps: pass in the test query as a lambda, start a new transaction, run the test query, record the time, roll back the transaction.
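The timing loop on this slide can be tried without a Titan installation. The sketch below is a stdlib-only stand-in, not the author's PerfTestRunner: the class name `TimingSketch`, the `Runnable` workload, and the hand-rolled `mean` and `median` helpers are assumptions made for illustration. The slide's code records the statistics in a `DescriptiveStatistics` object and runs each query in a fresh transaction that is rolled back afterwards.

```java
import java.util.Arrays;

/** Stdlib-only stand-in for the slide's timing loop; no Titan classes are used. */
final class TimingSketch {

    /** Runs op the given number of times and returns each elapsed time in milliseconds. */
    static double[] time(int iterations, Runnable op) {
        double[] samples = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime(); // monotonic clock, finer-grained than new Date()
            op.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return samples;
    }

    /** Arithmetic mean, or NaN for an empty sample set. */
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    /** Median of the samples, or NaN for an empty sample set. */
    static double median(double[] xs) {
        if (xs.length == 0) {
            return Double.NaN;
        }
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical workload; in the talk this would be a Gremlin query inside a transaction.
        double[] samples = time(10, () -> Math.sqrt(12345.678));
        System.out.printf("n: %d mean: %.4f median: %.4f%n",
                samples.length, mean(samples), median(samples));
    }
}
```

Rolling back after every iteration, as the slide does, keeps one run's transaction cache from speeding up the next, so each sample measures a cold transaction.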
  • 10. Anatomy of Gremlin Queries ● Simplest form of OLTP query ○ picks one or more entry points into the graph ○ traverses from the initial vertices Initial graph entry selection Edge traversal Example: How many games did Big 10 teams win? g.V().has('conference', 'Big Ten Conference').outE('beat').count()
  • 11. Selecting the Entry Point Typically won't have vertex ID(s) to select directly Will select based on one or more vertex property Feasible to scan all vertices in small graphs Becomes prohibitively expensive on large graphs Start from these
  • 12. Property Index Test query: g.V().has('conference', 'Big Ten Conference').toList() Output: 07:45:24 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [(conference = Big Ten Conference)]. For better performance, use indexes Titan is nice enough to warn us of this issue
  • 13. Creating Index on "Conference" Property mgmt = graph.openManagement() conf = mgmt.getPropertyKey('conference') mgmt.buildIndex('byConference', Vertex.class).addKey(conf).buildCompositeIndex() mgmt.commit() mgmt.awaitGraphIndexStatus(graph, 'byConference').call() mgmt = graph.openManagement() mgmt.updateIndex(mgmt.getGraphIndex("byConference"), SchemaAction.REINDEX).get() mgmt.commit() Steps: access graph management, create the composite index, and commit; then wait for the key to be available and reindex. Reference: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
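As a rough mental model of why the composite index helps, an equality index behaves like a map from a property value to the matching vertices. The sketch below is an analogy in plain Java, not how Titan stores its index in the backend. The class `CompositeIndexSketch`, its method names, and the "Example U" row are hypothetical; Purdue and IU come from the toy model slide.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory analogy for a composite (equality) index; not Titan's implementation. */
final class CompositeIndexSketch {

    static final class Team {
        final String name;
        final String conference;

        Team(String name, String conference) {
            this.name = name;
            this.conference = conference;
        }
    }

    /** Full scan: inspects every team, like the un-indexed has() lookup. */
    static List<String> scan(List<Team> teams, String conference) {
        List<String> names = new ArrayList<>();
        for (Team t : teams) {
            if (t.conference.equals(conference)) {
                names.add(t.name);
            }
        }
        return names;
    }

    /** Builds a value-to-names map once, playing the role of the 'byConference' index. */
    static Map<String, List<String>> buildIndex(List<Team> teams) {
        Map<String, List<String>> index = new HashMap<>();
        for (Team t : teams) {
            index.computeIfAbsent(t.conference, k -> new ArrayList<>()).add(t.name);
        }
        return index;
    }

    /** Equality lookup: a single hash probe, independent of the total number of teams. */
    static List<String> lookup(Map<String, List<String>> index, String conference) {
        return index.getOrDefault(conference, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Team> teams = new ArrayList<>();
        teams.add(new Team("Purdue", "Big Ten Conference"));
        teams.add(new Team("IU", "Big Ten Conference"));
        teams.add(new Team("Example U", "Other Conference")); // hypothetical row
        System.out.println(scan(teams, "Big Ten Conference"));
        System.out.println(lookup(buildIndex(teams), "Big Ten Conference"));
    }
}
```

Both calls return the same teams. The difference is that the scan touches every element, while the lookup only works for exact-match queries on the key it was built for, which mirrors the equality-only restriction of composite indices.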
  • 14. "Conference" Index Timing Comparison Without Index: n: 10 min: 127.0 max: 203.0 mean: 159.9 std dev: 29.598986469134378 median: 151.0 With Index: n: 10 min: 2.0 max: 7.0 mean: 3.2 std dev: 1.6865480854231356 median: 2.5 Times are in milliseconds. The index cuts the mean query time from 159.9 ms to 3.2 ms, roughly a 50x speedup (159.9 / 3.2 = 49.96875).
  • 15. Property Indices in Titan ● Composite Index ○ Supports equality comparison only ○ Can handle combinations of properties but must be pre-defined (Ex: Name and Age) ● Mixed Index ○ Greater conditionality support ○ Can handle lookups on arbitrary combinations of indexed keys ● Titan also supports other external indexing backends ● Reference documentation: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
  • 16. High Order Vertices Don't always want to traverse all edges incident on a vertex Filtering based on some edge properties is desirable Similar to vertices: feasible to inspect each edge for low order vertices Prohibitive on high order vertices Traverse these edges
  • 17. Vertex Centric Index Example query: dateFilter = lt(20051000) g.V().has('conference', 'Big Ten Conference').as('team', 'wins', 'losses') .select('team', 'wins', 'losses') .by('name') .by(__.outE().has('date', dateFilter).count()) .by(__.inE().has('date', dateFilter).count()) Gets Big 10 team records for games played before October 2005
  • 18. Notes on Vertex Centric Indices From Titan 1.0.0 documentation: Reference documentation: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html#vertex-indexes Titan 0.4.4 does not automatically create vertex-centric indices No need to create one for our example May be necessary if a composite key query is being performed Ex: Get Big 10 team records for games played before October 2005 and won by more than 14 points
  • 19. Query Structuring Order of steps in query can make a difference Consider: g.V().order().by(__.outE().count(), decr).has('conference', 'Big Ten Conference').values('name') vs g.V().has('conference', 'Big Ten Conference').order().by(__.outE().count(), decr).values('name') Mean times: 1032.8 ms vs 42.8 ms respectively
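The gap between the two orderings comes from how many vertices need the expensive sort key. The sketch below mimics this in plain Java under stated assumptions: `winCount` stands in for `__.outE().count()`, the teams T1 to T6, conferences X and Y, and all win totals are made-up placeholders, and the call counter replaces real backend reads.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Shows that filtering before ordering reduces how often the sort key is computed. */
final class FilterEarlySketch {

    static final class Team {
        final String name;
        final String conference;
        final int wins;

        Team(String name, String conference, int wins) {
            this.name = name;
            this.conference = conference;
            this.wins = wins;
        }
    }

    /** Counts how often the "expensive" sort key is computed. */
    static int keyCalls = 0;

    /** Stand-in for outE().count(); in a real graph this costs a read per vertex. */
    static int winCount(Team t) {
        keyCalls++;
        return t.wins;
    }

    /** Returns names in the conference ordered by wins descending. */
    static List<String> rank(List<Team> teams, String conference, boolean filterFirst) {
        // Optionally narrow the input before any sort key is computed.
        List<Team> input = new ArrayList<>();
        for (Team t : teams) {
            if (!filterFirst || t.conference.equals(conference)) {
                input.add(t);
            }
        }
        // Compute the key once per remaining team, then sort by it.
        List<Map.Entry<Team, Integer>> keyed = new ArrayList<>();
        for (Team t : input) {
            keyed.add(new AbstractMap.SimpleEntry<>(t, winCount(t)));
        }
        keyed.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        // Filter after the sort (a no-op when the input was already filtered).
        List<String> names = new ArrayList<>();
        for (Map.Entry<Team, Integer> e : keyed) {
            if (e.getKey().conference.equals(conference)) {
                names.add(e.getKey().name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Team> teams = new ArrayList<>();
        teams.add(new Team("T1", "X", 3)); // all rows are hypothetical
        teams.add(new Team("T2", "Y", 9));
        teams.add(new Team("T3", "X", 1));
        teams.add(new Team("T4", "Y", 4));
        teams.add(new Team("T5", "X", 7));
        teams.add(new Team("T6", "Y", 2));
        System.out.println(rank(teams, "X", false) + " key calls: " + keyCalls);
        keyCalls = 0;
        System.out.println(rank(teams, "X", true) + " key calls: " + keyCalls);
    }
}
```

Both orderings return the same names, but the filter-first version computes the key only for teams in the requested conference. In the slide's query each key is an edge count that may hit the storage backend, which is consistent with the measured gap.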
  • 20. Titan Caching Support for database and transaction level caching Storage Backend Titan DB Cache TX Cache TX Cache TX Cache Client Client Client
  • 21. Transaction Cache Transaction starts on graph access and ends on commit or rollback Useful for workloads accessing same data repeatedly A B C D Rank of team A is a count of these "beat" edges Ex: Team Rankings g.V().order().by(__.out().out().count(), decr).as('team', 'score', 'wins', 'losses').select('team', 'score', 'wins', 'losses').by('name').by(__.out().out().count()).by(__.outE().count()).by(__.inE().count()).limit(25) With TX cache: 3361 ms, without TX cache: 5206 ms
  • 22. /r/mildlyinteresting/ 1. Texas 2. USC 3. Penn State 4. Ohio State 5. Virginia Tech 6. TCU 7. West Virginia 8. Lousianna State 9. Alabama 10. Oregon 11. Louisville 12. Georgia 13. UCLA 14. Miami (FL) 1. Texas 2. USC 3. Penn State 4. Virginia Tech 5. LSU 6. Ohio State 7. Georgia 8. TCU 9. West Virginia 10. Alabama 11. Boston College 12. Oklahoma 13. Florida 14. UCLA http://www.collegefootballpoll.com/2005_archive_computer_rankings.html 2005 End of Season Computer Rankings Our Query Results
  • 23. Transaction Caching Gotchas Cache Thrashing Symptom: Queries suddenly & significantly slow down as data size increases Solve this by tuning transaction cache size ● Globally by setting cache.tx-cache-size ● Per transaction using TransactionBuilder Memory Leak Transactions are started automatically and are thread-aware With read-only access in separate threads, transaction caches can leak Solved by calling g.rollback() at the end of the thread execution (releases the TX cache)
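The sudden slowdown described under cache thrashing can be reproduced with any bounded least-recently-used cache. The sketch below uses a `LinkedHashMap` in access order as a stand-in; this is an assumption for illustration, and Titan's transaction cache may use a different eviction policy. The class name `LruThrashSketch` and the key counts are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Counts cache hits when a fixed set of keys is read repeatedly through an LRU cache. */
final class LruThrashSketch {

    /** Reads keys 0..distinctKeys-1 in order, 'passes' times, and returns the hit count. */
    static int hits(final int capacity, int distinctKeys, int passes) {
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        Map<Integer, Integer> cache = new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                return size() > capacity;
            }
        };
        int hits = 0;
        for (int p = 0; p < passes; p++) {
            for (int key = 0; key < distinctKeys; key++) {
                if (cache.get(key) != null) {
                    hits++;              // served from the cache
                } else {
                    cache.put(key, key); // simulated backend read, then cache fill
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("working set fits:       " + hits(100, 100, 3));
        System.out.println("one key over capacity:  " + hits(100, 101, 3));
    }
}
```

With 100 keys and capacity 100, every read after the first pass is a hit. With 101 keys, each read evicts the entry that will be needed soonest, so the hit count drops to zero. A cliff of this kind is why a modest increase in data size can turn into a large slowdown, and why raising the cache size above the working set restores performance.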
  • 24. Transaction Cache Settings Transaction cache can be set up in properties files Settings can be overridden when creating a transaction using TransactionBuilder. Example: tx=graph.buildTransaction().vertexCacheSize(50000).start() Other transaction settings can be found here: http://s3.thinkaurelius.com/docs/titan/1.0.0/tx.html#tx-config
  • 25. Database Level Caching Database caching helps performance across transactions: gremlin> stats2.getValues() ==>[1016.0, 41.0, 27.0, 26.0, 24.0, 23.0, 24.0, 21.0, 18.0, 18.0] Trades consistency for speed in clusters Node 1 Node 2 Node n Titan 1 Titan 2 Titan n Cold cache Warm cache 1. Read 2. Write 3. Read
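The read, write, read sequence in the cluster diagram can be sketched with two nodes that each keep a local cache in front of a shared store. This is a simplified model, not Titan's database cache: the class `StaleCacheSketch`, the `Node` type, and the explicit `expire()` call are assumptions, and time-based expiry is reduced to a manual call.

```java
import java.util.HashMap;
import java.util.Map;

/** Two nodes with private caches over one shared backend, showing a stale read. */
final class StaleCacheSketch {

    static final class Node {
        private final Map<String, String> backend; // shared storage stand-in
        private final Map<String, String> cache = new HashMap<>();

        Node(Map<String, String> backend) {
            this.backend = backend;
        }

        /** Serves from the local cache when possible, otherwise reads the backend. */
        String read(String key) {
            return cache.computeIfAbsent(key, backend::get);
        }

        /** Writes through to the backend and updates only this node's cache. */
        void write(String key, String value) {
            backend.put(key, value);
            cache.put(key, value);
        }

        /** Drops cached entries, standing in for cache expiry. */
        void expire() {
            cache.clear();
        }
    }

    public static void main(String[] args) {
        Map<String, String> backend = new HashMap<>();
        backend.put("k", "v1");
        Node node1 = new Node(backend);
        Node node2 = new Node(backend);
        System.out.println("1. node1 read:  " + node1.read("k"));
        node2.write("k", "v2");
        System.out.println("2. node2 wrote: v2");
        System.out.println("3. node1 read:  " + node1.read("k")); // still the old value
        node1.expire();
        System.out.println("after expiry:   " + node1.read("k"));
    }
}
```

Node 1 keeps returning the old value until its cached entry is dropped, which is the consistency cost the slide refers to. A single Titan instance does not see this problem for its own writes, because the writing node updates its own cache.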
  • 26. Titan Options query.batch - Whether traversal queries should be batched when executed against the storage backend. This can lead to significant performance improvement if there is a non-trivial latency to the backend. query.fast-property - Whether to pre-fetch all properties on first singular vertex property access. This can eliminate backend calls on subsequent property access for the same vertex at the expense of retrieving all properties at once. This can be expensive for vertices with many properties http://s3.thinkaurelius.com/docs/titan/1.0.0/titan-config-ref.html
  • 27. Using "query.fast-property" Test query: g.V().group().by('conference').by('name') query.fast-property = false: n: 10 min: 243.0 max: 262.0 mean: 250.4 std dev: 5.125101625008685 median: 250.0 query.fast-property = true: n: 10 min: 127.0 max: 151.0 mean: 138.1 std dev: 7.233410137841088 median: 139.5
  • 28. Summary ● Titan indices ○ Property indices - vertex/edge lookups ○ Vertex centric indices - edge traversals ● Generally limiting elements early in traversal is a good thing ● Caching ○ Database level - improve speed while potentially increasing likelihood of stale data ○ Transaction level - helps when repeatedly visiting elements within a transaction ● Various other options available for specific tuning needs
  • 29. Thanks For Watching Questions Nakul Jeirath @njeirath Senior Security Engineer - WellAware