ECMFA 2016 slides

Stress-Testing Centralised Model Stores
Antonio García-Domínguez, Dimitris Kolovos, Konstantinos
Barmpis, Ran Wei and Richard Paige
University of York, Aston University
ECMFA’16
July 6th, 2016
A. García-Domínguez et al. Stress-Testing Centralised Model Stores

Approaches for collaborative modelling
Use file-based models over a standard VCS
Simple to use, reuses mature VCS (SVN/Git)
Large models can be broken up into fragments
Loss of the big picture (no simple way to run model-wide queries)
Use specialized model repositories (e.g. CDO)
Harder to use, proprietary versioning, less widely adopted
Models are directly stored in a database
Queries are answered from the database
Hawk: addressing the limitations of file-based VCS
Mirrors and reconnects fragments into a graph DB
Queries are fast, versioning and storage are orthogonal

Simplified workflow of Hawk
Workflow implemented by Hawk
Hawk uses a monitor to watch over collections of model files:
local folders, SVN/Git repos, Eclipse workspaces...
When files change, the graph is updated to mirror their contents
The graph DB can then be queried through local/remote APIs
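The monitor-and-mirror loop above can be sketched as a polling comparison of file snapshots. This is a minimal sketch under stated assumptions: it only covers local folders, detects changes by modification time, and stores one opaque "model" per file; `scan_folder`, `diff_snapshots` and `update_graph` are hypothetical names, not Hawk's API, and Hawk's SVN/Git/workspace connectors work differently.

```python
import os
from typing import Callable, Dict, Set, Tuple

Snapshot = Dict[str, float]  # file path -> last modification time

def scan_folder(folder: str, suffix: str = ".xmi") -> Snapshot:
    """Record the modification time of every model file under a local folder."""
    snapshot: Snapshot = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name.endswith(suffix):
                path = os.path.join(root, name)
                snapshot[path] = os.path.getmtime(path)
    return snapshot

def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare two snapshots and return the (added, modified, removed) file paths."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    modified = {p for p in set(old) & set(new) if old[p] != new[p]}
    return added, modified, removed

def update_graph(graph: Dict[str, object], old: Snapshot, new: Snapshot,
                 load: Callable[[str], object]) -> None:
    """Bring the mirrored graph in line with the files that changed."""
    added, modified, removed = diff_snapshots(old, new)
    for path in removed:
        graph.pop(path, None)      # drop whatever was mirrored from deleted files
    for path in added | modified:
        graph[path] = load(path)   # re-mirror the current contents of the file
```

A monitor would call `scan_folder` periodically and pass the previous and current snapshots to `update_graph`; only changed files are reloaded, which is what keeps incremental updates cheap.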

Structure of a Hawk index
Figure: a metamodel and a model (left) and the graph they produce (right)
Node types: metamodels, types, instances and files
Two lookup tables: one for metamodels, one for files
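The index layout on this slide (four node kinds plus two lookup tables) can be illustrated with a small in-memory graph. This is a hypothetical sketch, not Hawk's Neo4j schema: the class names and the edge labels `types`, `ofType`, `file` and `instances` are assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass(eq=False)
class Node:
    kind: str                                   # "metamodel", "type", "instance" or "file"
    props: Dict[str, object] = field(default_factory=dict)
    edges: Dict[str, List["Node"]] = field(default_factory=dict)

    def link(self, label: str, target: "Node") -> None:
        self.edges.setdefault(label, []).append(target)

class GraphIndex:
    """Hypothetical stand-in for the node types and lookup tables on this slide."""

    def __init__(self) -> None:
        self.metamodels: Dict[str, Node] = {}   # lookup table: namespace URI -> metamodel node
        self.files: Dict[str, Node] = {}        # lookup table: file path -> file node

    def add_metamodel(self, uri: str, type_names: List[str]) -> Node:
        mm = Node("metamodel", {"uri": uri})
        for name in type_names:
            mm.link("types", Node("type", {"name": name}))
        self.metamodels[uri] = mm
        return mm

    def type_node(self, uri: str, name: str) -> Node:
        for t in self.metamodels[uri].edges.get("types", []):
            if t.props["name"] == name:
                return t
        raise KeyError(f"{name} is not a type of {uri}")

    def add_instance(self, path: str, uri: str, type_name: str, **attrs: object) -> Node:
        file_node = self.files.setdefault(path, Node("file", {"path": path}))
        type_node = self.type_node(uri, type_name)
        inst = Node("instance", dict(attrs))
        inst.link("ofType", type_node)          # instance -> its type
        inst.link("file", file_node)            # instance -> file it came from
        type_node.link("instances", inst)       # type -> instances, so "Type.all" is a walk
        return inst
```

Keeping a type-to-instances edge means that `Type.all` becomes a traversal from one node rather than a scan of the whole graph.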

Additional features in Hawk
Indexed attributes
Common scenario: find an Author by name
Users can tell Hawk to index a type by an attribute
EOL queries will reuse the index transparently, e.g.
“Author.all.select(x | x.name = 'Value')”
Derived features
Another scenario: find Authors with 10+ books
Hawk can be told to precompute this and prepare a lookup
EOL queries written with the new feature will be sped up, e.g.
“Author.all.select(x | x.nBooks >= 10)”
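The difference between scanning, an indexed attribute and a precomputed derived feature can be sketched on a toy set of Authors. The data layout (`name`, `books`) and the helper names are hypothetical, and the sorted-list range lookup is one possible way to serve `nBooks >= 10`, not necessarily how Hawk stores derived attributes.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def scan_by_name(authors: Dict[int, dict], name: str) -> Set[int]:
    # Author.all.select(x | x.name = name) without an index: visits every Author
    return {i for i, a in authors.items() if a["name"] == name}

class AttributeIndex:
    """Hypothetical value -> element-ID index, as when indexing Author by name."""

    def __init__(self, attribute: str) -> None:
        self.attribute = attribute
        self.by_value: Dict[object, Set[int]] = defaultdict(set)

    def add(self, elem_id: int, elem: dict) -> None:
        self.by_value[elem[self.attribute]].add(elem_id)

    def lookup(self, value: object) -> Set[int]:
        return set(self.by_value.get(value, set()))

def add_derived_nbooks(authors: Dict[int, dict]) -> List[Tuple[int, int]]:
    """Precompute a derived nBooks attribute and keep (value, id) pairs sorted."""
    for a in authors.values():
        a["nBooks"] = len(a["books"])
    return sorted((a["nBooks"], i) for i, a in authors.items())

def at_least(sorted_pairs: List[Tuple[int, int]], threshold: int) -> Set[int]:
    # Binary search for the first pair whose value is >= threshold
    start = bisect.bisect_left(sorted_pairs, (threshold, float("-inf")))
    return {i for _, i in sorted_pairs[start:]}
```

The scan costs time proportional to the number of Authors for every query, while the index and the derived feature pay that cost once, when the model is indexed or updated.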

Model repositories: Eclipse CDO
Pluggable storage
CDO can support multiple storage solutions
DB store is the most mature (embedded H2 by default)
Other stores include MongoDB, db4o or Objectivity
Caching and querying
CDO provides an EMF Resource implementation
Resource provides comprehensive generic caching
Remote queries are supported (OCL)

Comparing remote query APIs in CDO and Hawk
Hawk
Based on Apache Thrift (JSON / binary formats) + gzip
Stateless service-oriented API (e.g. “query”, “addRepository”)
Client → server: request-response
Server → client: subscribe-publish
Supports HTTP(S) and TCP
CDO
Based on Eclipse Net4j (binary)
Stateful buffer-oriented API (opaque sequences of bytes)
Bidirectional communication between client and server:
TCP: persistent connection
HTTP(S): client polls server
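To make the "stateless request-response + gzip" side concrete, here is a minimal sketch of an encoding round trip. It uses plain JSON as a stand-in for Thrift's JSON protocol, and the message fields (`operation`, `query`, `language`, `ids`) are hypothetical, not Hawk's Thrift schema.

```python
import gzip
import json
from typing import Callable, List

def encode_message(message: dict) -> bytes:
    """Serialise a message as JSON and gzip it before it goes on the wire."""
    return gzip.compress(json.dumps(message).encode("utf-8"))

def decode_message(body: bytes) -> dict:
    return json.loads(gzip.decompress(body).decode("utf-8"))

def handle_request(body: bytes, run_query: Callable[[str, str], List[str]]) -> bytes:
    # Stateless: the request carries everything needed, so no session lookup is required
    request = decode_message(body)
    if request.get("operation") != "query":
        return encode_message({"error": "unsupported operation"})
    ids = run_query(request["query"], request["language"])
    return encode_message({"ids": ids})
```

Because each request is self-contained, any server replica can answer it; a stateful, buffer-oriented protocol instead needs the same connection or session to persist between messages.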

Research questions
Observations about CDO and Hawk
Both represent a model as a database
Both have remote model querying APIs
Each system has made different API design choices
How do those choices impact query throughput?
Questions
RQ1: impact of HTTP vs TCP?
RQ2: impact of API design?
RQ3: impact of caching and indexed/derived attributes?

Experiment setup: systems used
Observations
CDO and Hawk used the same hardware, the same version of Eclipse
(Mars), the same HTTP server (Jetty) and the same amount of memory (4 GiB)
Only one of CDO or Hawk ran at a time
A controller manages the clients and collects results through SSH

Experiment setup: workload
Model used: set4 from GraBaTs 2009
Reverse-engineered from the Eclipse JDT source code
Contains 4.9M elements (a 677 MB XMI file)
1.4GB in CDO (H2 database)
1.9GB in Hawk (Neo4j graph)
Workload configurations
Servers are “warmed up” to a steady state first
Lightest workload: 1 machine runs 1000 queries over 1 thread
Other workloads: 2 machines, each running 500 queries over 2–32 threads
Measurements
Time to connect, run the query and retrieve the element IDs
Refer to the paper for notched box plots and statistical tests
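The workload and measurement scheme above can be sketched as a small load generator. This is an illustration under assumptions: client machines are simulated with threads in one process (the real setup used separate machines driven over SSH), and `run_query` is a hypothetical callable standing in for connect + query + retrieve element IDs.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def run_client(run_query: Callable[[], object], n_queries: int, n_threads: int) -> List[float]:
    """One client machine: n_queries spread over n_threads; returns per-query times in ms."""
    def timed(_: int) -> float:
        start = time.perf_counter()
        run_query()                      # stands in for connect + query + retrieve element IDs
        return (time.perf_counter() - start) * 1000.0
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(timed, range(n_queries)))

def run_workload(run_query: Callable[[], object], machines: int,
                 queries_per_machine: int, threads: int, warmup: int = 10) -> float:
    """Warm up the server, run all clients concurrently, return the median time in ms."""
    for _ in range(warmup):
        run_query()                      # bring the server to a steady state first
    with ThreadPoolExecutor(max_workers=machines) as clients:
        futures = [clients.submit(run_client, run_query, queries_per_machine, threads)
                   for _ in range(machines)]
        times = [t for f in futures for t in f.result()]
    return statistics.median(times)
```

With this sketch, the lightest workload would be `run_workload(q, 1, 1000, 1)` and the others `run_workload(q, 2, 500, t)`. Both total 1000 queries, so configurations differ only in how much concurrency the server sees.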

Queries: OCL
Listing 1: OQ: GraBaTs query in OCL for evaluating CDO
DOM::TypeDeclaration.allInstances()->select(td |
  td.bodyDeclarations->selectByKind(DOM::MethodDeclaration)
    ->exists(md : DOM::MethodDeclaration |
      md.modifiers
        ->selectByKind(DOM::Modifier)
        ->exists(mod : DOM::Modifier | mod.public)
      and md.modifiers
        ->selectByKind(DOM::Modifier)
        ->exists(mod : DOM::Modifier | mod.static)
      and md.returnType.oclIsTypeOf(DOM::SimpleType)
      and md.returnType.oclAsType(DOM::SimpleType).name.fullyQualifiedName
        = td.name.fullyQualifiedName))
Summary
Finds all possible singletons: types that declare a public static
method whose return type is the type itself.
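The logic of OQ (and of HQ1 below) can be sketched over a toy, dictionary-based model to show its cost: every type, every body declaration and every modifier is visited. The dictionary layout is a hypothetical simplification of the JDT DOM (for instance, names are plain strings instead of `name.fullyQualifiedName`).

```python
from typing import List

def is_singleton_accessor(md: dict, td: dict) -> bool:
    """Public and static modifiers, and a SimpleType return type named like td."""
    mods = md.get("modifiers", [])
    rt = md.get("returnType")
    return (any(m.get("public") for m in mods)
            and any(m.get("static") for m in mods)
            and rt is not None
            and rt["kind"] == "SimpleType"
            and rt["name"] == td["name"])

def singletons(type_declarations: List[dict]) -> List[str]:
    # Nested iteration, as in OQ/HQ1: each type, each body declaration, each modifier
    return [td["name"] for td in type_declarations
            if any(bd["kind"] == "MethodDeclaration" and is_singleton_accessor(bd, td)
                   for bd in td["bodyDeclarations"])]
```

As in the OCL version, "public" and "static" are checked as two separate `exists` over the modifier list, since each modifier is its own object.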

Queries: basic EOL
Listing 2: HQ1: translation of OQ to EOL for evaluating Hawk
return TypeDeclaration.all.select(td |
  td.bodyDeclarations.exists(md : MethodDeclaration |
    md.returnType.isTypeOf(SimpleType)
    and md.returnType.name.fullyQualifiedName == td.name.fullyQualifiedName
    and md.modifiers.exists(mod : Modifier | mod.public == true)
    and md.modifiers.exists(mod : Modifier | mod.static == true)));
Summary
Direct translation of the OCL query.

Queries: EOL + extended MethodDeclarations
Listing 3: HQ2: HQ1 using derived attributes on MethodDeclaration
return MethodDeclaration.all.select(md |
  md.isPublic and md.isStatic and md.isSameReturnType
).collect(md | md.eContainer).asSet;
Better approach
Tell Hawk to extend MethodDeclaration with “isPublic”,
“isStatic” and “isSameReturnType”
Look up the relevant MethodDeclarations directly
Retrieve the set of TypeDeclarations that contain them
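The HQ2 strategy can be sketched as precomputing three boolean derived attributes at indexing time, then answering the query with set intersections and a final container lookup. The method layout (with a `container` field standing in for `eContainer`) and the function names are hypothetical; how Hawk actually stores and intersects derived attributes is not shown here.

```python
from typing import Dict, Set

def derive_flags(methods: Dict[int, dict]) -> Dict[str, Set[int]]:
    """Indexing time: for each derived attribute, keep the IDs where it is true."""
    index: Dict[str, Set[int]] = {"isPublic": set(), "isStatic": set(), "isSameReturnType": set()}
    for mid, md in methods.items():
        if any(m.get("public") for m in md["modifiers"]):
            index["isPublic"].add(mid)
        if any(m.get("static") for m in md["modifiers"]):
            index["isStatic"].add(mid)
        rt = md.get("returnType")
        if rt is not None and rt["kind"] == "SimpleType" and rt["name"] == md["container"]:
            index["isSameReturnType"].add(mid)
    return index

def singleton_types(index: Dict[str, Set[int]], methods: Dict[int, dict]) -> Set[str]:
    # Query time: three lookups and two intersections instead of nested iteration
    hits = index["isPublic"] & index["isStatic"] & index["isSameReturnType"]
    # Equivalent of .collect(md | md.eContainer).asSet
    return {methods[mid]["container"] for mid in hits}
```

The per-method work moves from query time to indexing time, so repeated queries only pay for the intersections.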

Queries: EOL + extended TypeDeclarations
Listing 4: HQ3: HQ1 using derived attributes on TypeDeclaration
return TypeDeclaration.all.select(td | td.isSingleton);
Even better approach
Tell Hawk to extend TypeDeclaration with “isSingleton”
Look up the relevant TypeDeclarations directly

RQ1: protocol impact (CDO)
HTTP degrades CDO noticeably
[Figure: left, median response time (ms, up to 5×10^4) vs. client threads (1–64) for CDO over TCP and HTTP; right, failed queries (CDO+HTTP) vs. client threads (1–64)]
HTTP woes
635.66% slowdown with 1 client, still noticeable with 2 and 4
Occasional errors or incorrect results with 4 or more threads

RQ1: protocol impact (Hawk)
HTTP hit is consistent for Hawk
[Figure: median response time (ms, up to 5×10^4) vs. client threads (1–64) for Hawk over TCP and HTTP]
Hawk+HTTP has a roughly consistent 20% performance hit
No failed queries and no incorrect query results
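To read the percentages on the two RQ1 slides, it helps to convert them into multipliers. This sketch assumes that a "hit" is the increase of the HTTP median over the TCP median, relative to the TCP median; the slides do not state the baseline explicitly.

```python
def overhead_pct(tcp_ms: float, http_ms: float) -> float:
    """Relative slowdown of HTTP over TCP, as a percentage of the TCP median."""
    return (http_ms - tcp_ms) / tcp_ms * 100.0

def slowdown_factor(pct: float) -> float:
    """Convert a percentage hit into a multiplier on the response time."""
    return 1.0 + pct / 100.0
```

Under that assumption, CDO's 635.66% hit means HTTP responses take roughly 7.4 times as long as TCP ones with 1 client, whereas Hawk's 20% hit means about 1.2 times as long.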

RQ2: API design impact
Wireshark packet traces explain the HTTP results
CDO trace: 58 packets (10.2kB)
Session setup → query setup → 6s of silence → results
Conclusion: CDO+HTTP uses regular polling for server-to-client
communication, and CDO reports results asynchronously
This introduces delays and breaks down with many clients
Suggestion: long polling / WebSockets instead?
Hawk trace: 14 packets (2.8kB)
Single request/response pair (no session/query setup)
Simple and reliable for small result sets
May have problems transmitting large result sets
Suggestion: optional async query API (pub-sub)
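Why regular polling introduces delay can be shown with a little arithmetic: a result that is ready at time r is only seen at the next poll, while long polling or a WebSocket lets the server answer as soon as the result exists. The poll interval is a hypothetical parameter; the slides do not give CDO's actual interval.

```python
import math

def polling_delivery(ready_at: float, poll_interval: float) -> float:
    """Time at which a client polling every poll_interval seconds first sees a result."""
    return math.ceil(ready_at / poll_interval) * poll_interval

def long_polling_delivery(ready_at: float) -> float:
    # The pending request is answered as soon as the result is available
    return ready_at
```

With a 2 s interval, a result ready at 4.1 s is only seen at 6 s. If ready times are uniformly spread, the extra wait averages half an interval per asynchronous message, and every client adds its own stream of polls to the server's load.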

RQ3: impact of internals
[Figure: median response time (ms, log scale, 10^2 to 10^4) vs. client threads (1–64) for CDO + OCL, Hawk + EOL (basic), Hawk + EOL (isPublic) and Hawk + EOL (isSingleton)]
CDO has more extensive generic caching than Hawk: e.g. its SQL
log shows that it caches “X.all” in memory (Hawk relies on the DB cache)
Hawk outperforms CDO by 10x–100x with derived attributes
(lookups and set intersections replace iteration)
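The kind of generic caching seen in CDO's SQL log can be illustrated with a memoised "X.all" that is dropped when the model changes. This is a generic sketch, not CDO's implementation: `AllInstancesCache` and the `load_all` callable (standing in for a database scan) are hypothetical.

```python
from typing import Callable, Dict, List

class AllInstancesCache:
    """Memoise Type.all results in memory and drop them when the model changes."""

    def __init__(self, load_all: Callable[[str], List[int]]) -> None:
        self._load_all = load_all            # e.g. a database scan for one type
        self._cache: Dict[str, List[int]] = {}
        self.misses = 0

    def all(self, type_name: str) -> List[int]:
        if type_name not in self._cache:
            self.misses += 1
            self._cache[type_name] = self._load_all(type_name)
        return list(self._cache[type_name])  # copy, so callers cannot corrupt the cache

    def invalidate(self, type_name: str) -> None:
        # Call when an instance of type_name is added, changed or removed
        self._cache.pop(type_name, None)
```

A cache like this speeds up any query that touches the same types, but it still leaves the per-element iteration in place; derived attributes remove that iteration for the specific queries they were designed for.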

What would be my ideal API?
Service-oriented, sync+async sides
Service orientation makes third party integration easier
Synchronous req/resp: simple operations, small queries
Asynchronous pub/sub: complex operations, large queries
Sync API can set up async operations
Flexible encoding with transparent compression
Provide multiple encodings through code generation
Transparent gzip compression is easy to integrate
Note: HTTP fields did not add much overhead (about 20% in Hawk)
Internals for faster queries
Uncommon queries: extensive caching (as in CDO)
Common queries: query-specific indices (as in Hawk)
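The "sync+async sides" idea can be sketched as a service whose synchronous call either answers directly or sets up an asynchronous query whose result is published to a per-subscription queue. All names here (`QueryService`, `submit`, `receive`) are hypothetical, and a real implementation would publish over the network (e.g. a message broker) rather than in-process queues.

```python
import queue
import threading
import uuid
from typing import Callable, Dict, List

class QueryService:
    """Hypothetical service with a request/response side and a publish/subscribe side."""

    def __init__(self, run_query: Callable[[str], List[str]]) -> None:
        self._run_query = run_query
        self._topics: Dict[str, "queue.Queue[List[str]]"] = {}

    def query(self, text: str) -> List[str]:
        # Synchronous request/response: simple operations and small result sets
        return self._run_query(text)

    def submit(self, text: str) -> str:
        # The sync API sets up an async operation and returns a subscription ID at once
        sub_id = str(uuid.uuid4())
        topic: "queue.Queue[List[str]]" = queue.Queue()
        self._topics[sub_id] = topic
        threading.Thread(target=lambda: topic.put(self._run_query(text)), daemon=True).start()
        return sub_id

    def receive(self, sub_id: str, timeout: float = 5.0) -> List[str]:
        # Subscriber side: block until the result for this subscription is published
        result = self._topics[sub_id].get(timeout=timeout)
        del self._topics[sub_id]
        return result
```

Small queries keep the single round trip seen in the Hawk trace, while large or slow ones no longer tie up one HTTP request for their whole duration.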

Conclusions and future work
Summary
In collaborative modelling, many users will query the same
models repeatedly to arrive at shared answers
CDO and Hawk implement remote querying very differently
Based on our results, we have outlined what an ideal remote
query API could look like
Future work
Wider assortment of queries (e.g. ones that exercise larger
portions of the models or produce large result sets)
Extend the range of configurations (tools, stores)
Analysing remote queries to offload tasks to the client

End of the presentation
Questions?
@antoniogado
