ECMFA 2016 slides
1. Intro Experiment setup Results
Stress-Testing Centralised Model Stores
Antonio García-Domínguez, Dimitris Kolovos, Konstantinos Barmpis, Ran Wei and Richard Paige
University of York, Aston University
ECMFA’16
July 6th, 2016
A. García-Domínguez et al. Stress-Testing Centralised Model Stores 1 / 21
2. Intro Experiment setup Results
Approaches for collaborative modelling
Use file-based models over standard VCS
Simple to use, reuses mature VCS (SVN/Git)
Large models can be broken up into fragments
Loss of big picture (no simple way to do model-wide queries)
Use specialized model repositories (e.g. CDO)
Harder to use, proprietary versioning, less widely adopted
Models are directly stored in a database
Queries are answered from the database
Hawk: solving limitations with file-based VCS
Mirrors and reconnects fragments into a graph DB
Queries are fast, versioning and storage are orthogonal
3. Intro Experiment setup Results
Simplified workflow of Hawk
Workflow implemented by Hawk
Hawk uses a monitor to watch over collections of model files:
local folders, SVN/Git repos, Eclipse workspaces...
If files are changed, graph is updated to mirror their contents
The graph DB can then be queried through local/remote APIs
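The monitoring step can be pictured with a minimal sketch. It assumes a local folder of `.xmi` files and detects changes by content hash. Both choices, and the names `file_digest` and `sync_index`, are hypothetical; the slides do not say how Hawk detects changes, and a VCS-backed monitor would more likely compare revisions.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return a content hash used to decide whether a file changed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_index(folder: Path, index: dict[str, str]) -> list[str]:
    """Bring `index` (file name -> digest) in line with `folder`.

    Returns the names of files that were added, changed or removed,
    i.e. the ones whose graph fragment would need to be rebuilt.
    """
    current = {p.name: file_digest(p) for p in sorted(folder.glob("*.xmi"))}
    # Added or modified files: digest differs from what the index remembers.
    touched = [name for name, digest in current.items() if index.get(name) != digest]
    # Removed files: known to the index but no longer present on disk.
    touched += [name for name in index if name not in current]
    index.clear()
    index.update(current)
    return touched
```

Calling `sync_index` repeatedly on an unchanged folder returns an empty list, so the graph is only touched when a file actually differs.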
4. Intro Experiment setup Results
Structure of a Hawk index
Metamodel and model on the left side produce graph on the right side
Node types: metamodels, types, instances and files
Two lookup tables for metamodels and files
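A rough in-memory picture of this layout, as a sketch only: the four node kinds and the two lookup tables come from the slide, while the edge labels (`types`, `instances`, `ofType`, `file`, `contents`) and the class names `Node` and `Index` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison: the graph contains cycles
class Node:
    kind: str  # "metamodel", "type", "instance" or "file"
    name: str
    edges: dict[str, list["Node"]] = field(default_factory=dict)

    def link(self, label: str, target: "Node") -> None:
        self.edges.setdefault(label, []).append(target)


class Index:
    def __init__(self) -> None:
        self.metamodels: dict[str, Node] = {}  # lookup table: URI -> metamodel node
        self.files: dict[str, Node] = {}       # lookup table: path -> file node

    def add_type(self, metamodel_uri: str, type_name: str) -> Node:
        mm = self.metamodels.setdefault(metamodel_uri, Node("metamodel", metamodel_uri))
        for existing in mm.edges.get("types", []):
            if existing.name == type_name:
                return existing
        type_node = Node("type", type_name)
        mm.link("types", type_node)
        return type_node

    def add_instance(self, path: str, type_node: Node, name: str) -> Node:
        file_node = self.files.setdefault(path, Node("file", path))
        inst = Node("instance", name)
        inst.link("ofType", type_node)
        inst.link("file", file_node)
        type_node.link("instances", inst)
        file_node.link("contents", inst)
        return inst

    def all_of(self, metamodel_uri: str, type_name: str) -> list[Node]:
        """Start from the metamodel lookup table and follow edges to the instances."""
        for type_node in self.metamodels[metamodel_uri].edges.get("types", []):
            if type_node.name == type_name:
                return type_node.edges.get("instances", [])
        return []
```

A query such as "all instances of a type" enters the graph through the metamodel table, whereas an update after a file change enters through the file table.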
5. Intro Experiment setup Results
Additional features in Hawk
Indexed attributes
Common scenario: find an Author by name
Users can tell Hawk to index a type by an attribute
EOL queries will reuse index transparently, e.g.
“Author.all.select(x | x.name = 'Value')”
Derived features
Another scenario: find Authors with 10+ books
Hawk can be told to precompute this and prepare a lookup
EOL queries written with the new feature will be sped up, e.g.
“Author.all.select(x | x.nBooks >= 10)”
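The difference between scanning, an indexed attribute and a derived feature can be illustrated outside Hawk with plain Python. This is a sketch under assumptions: the `authors` data and all function names are invented, and it does not reproduce how Hawk stores its indices.

```python
import bisect
from collections import defaultdict

# Hypothetical data standing in for Author instances.
authors = [
    {"name": "Ann", "books": ["b%d" % i for i in range(12)]},
    {"name": "Bob", "books": ["b1"]},
    {"name": "Cy", "books": ["b%d" % i for i in range(10)]},
]


def find_by_name_scan(name):
    """No index: every query iterates over all authors."""
    return [a for a in authors if a["name"] == name]


# Indexed attribute: built once at indexing time, reused by every query.
name_index = defaultdict(list)
for a in authors:
    name_index[a["name"]].append(a)


def find_by_name_indexed(name):
    """Indexed attribute: a single dictionary lookup."""
    return name_index.get(name, [])


# Derived feature: nBooks is precomputed and kept sorted for range lookups.
by_nbooks = sorted((len(a["books"]), a["name"]) for a in authors)


def names_with_at_least(n):
    """Binary search to the first entry with nBooks >= n, then slice."""
    start = bisect.bisect_left(by_nbooks, (n, ""))
    return [name for _, name in by_nbooks[start:]]
```

The indexed and derived variants pay their cost once, when the model is indexed, which suits queries that are repeated many times.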
6. Intro Experiment setup Results
Model repositories: Eclipse CDO
Pluggable storage
CDO can support multiple storage solutions
DB store is the most mature (embedded H2 by default)
Other stores include MongoDB, db4o or Objectivity
Caching and querying
CDO provides an EMF Resource implementation
Resource provides comprehensive generic caching
Remote queries are supported (OCL)
7. Intro Experiment setup Results
Comparing remote query APIs in CDO and Hawk
Hawk
Based on Apache Thrift (JSON / binary formats) + gzip
Stateless service-oriented API (e.g. “query”, “addRepository”)
Client → server: request-response
Server → client: subscribe-publish
Supports HTTP(S) and TCP
CDO
Based on Eclipse Net4j (binary)
Stateful buffer-oriented API (opaque sequences of bytes)
Bidirectional communication between client and server:
TCP: persistent connection
HTTP(S): client polls server
8. Intro Experiment setup Results
Research questions
Observations about CDO and Hawk
Both represent a model as a database
Both have remote model querying APIs
Each system has made different API design choices
How do those choices impact query throughput?
Questions
RQ1: impact of HTTP vs TCP?
RQ2: impact of API design?
RQ3: impact of caching and indexed/derived attributes?
9. Intro Experiment setup Results Network Queries
Experiment setup: systems used
Observations
CDO and Hawk used same hardware, same version of Eclipse
(Mars), same HTTP server (Jetty) and memory (4GiB)
Only one of CDO or Hawk ran at a time
Controller manages clients and collects results through SSH
10. Intro Experiment setup Results Network Queries
Experiment setup: workload
Model used: set4 from GraBaTs 2009
Reverse engineered from Eclipse JDT source code
Contains 4.9M elements: 677MB XMI file
1.4GB in CDO (H2 database)
1.9GB in Hawk (Neo4j graph)
Workload configurations
Servers are “warmed up” to a steady state first
Lightest workload: 1 machine runs 1000 queries over 1 thread
Rest: 2 machines, each runs 500 queries over 2–32 threads
Measurements
Time to connect + query + retrieve element IDs
Refer to paper for notched box plots and statistical tests
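A load generator of this shape can be sketched with a thread pool. This is not the authors' harness: `fake_query` is a hypothetical stand-in for "connect + query + retrieve element IDs", and the function names are made up.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(query):
    """Run one query and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    query()
    return (time.perf_counter() - start) * 1000.0


def run_workload(query, total_queries, threads):
    """Issue `total_queries` calls to `query` from `threads` worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda _: timed_call(query), range(total_queries)))


def fake_query():
    # Stand-in for a remote call; a real client would talk to CDO or Hawk here.
    time.sleep(0.001)
```

For example, `statistics.median(run_workload(fake_query, 500, 8))` gives the median response time of one client machine running 500 queries over 8 threads.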
11. Intro Experiment setup Results Network Queries
Queries: OCL
Listing 1: OQ: GraBaTs query in OCL for evaluating CDO
DOM::TypeDeclaration.allInstances()->select(td |
  td.bodyDeclarations->selectByKind(DOM::MethodDeclaration)
    ->exists(md : DOM::MethodDeclaration |
      md.modifiers
        ->selectByKind(DOM::Modifier)
        ->exists(mod : DOM::Modifier | mod.public)
      and md.modifiers
        ->selectByKind(DOM::Modifier)
        ->exists(mod : DOM::Modifier | mod.static)
      and md.returnType.oclIsTypeOf(DOM::SimpleType)
      and md.returnType.oclAsType(DOM::SimpleType).name.fullyQualifiedName
        = td.name.fullyQualifiedName))
Summary
Finds all candidate singletons: types that declare a public static
method whose return type is the declaring type itself.
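For readers less familiar with OCL, the same predicate can be restated over plain dictionaries. This is a sketch only: the dictionary layout, the `method` helper and the two sample types are hypothetical and much simpler than the real JDT metamodel.

```python
def is_singleton_candidate(td):
    """Mirror the OCL predicate on a plain-dict stand-in for a TypeDeclaration."""
    return any(
        md["kind"] == "MethodDeclaration"
        and any(mod["public"] for mod in md["modifiers"])
        and any(mod["static"] for mod in md["modifiers"])
        and md["returnType"]["kind"] == "SimpleType"
        and md["returnType"]["name"] == td["name"]
        for md in td["bodyDeclarations"]
    )


def method(public, static, return_kind, return_name):
    """Build a hypothetical MethodDeclaration with a single modifier."""
    return {
        "kind": "MethodDeclaration",
        "modifiers": [{"public": public, "static": static}],
        "returnType": {"kind": return_kind, "name": return_name},
    }


model = [
    {"name": "Registry",
     "bodyDeclarations": [method(True, True, "SimpleType", "Registry")]},
    {"name": "Helper",
     "bodyDeclarations": [method(True, False, "SimpleType", "Helper"),
                          {"kind": "FieldDeclaration"}]},
]

singletons = [td["name"] for td in model if is_singleton_candidate(td)]
```

Here only `Registry` qualifies: `Helper` has a public method returning its own type, but the method is not static.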
12. Intro Experiment setup Results Network Queries
Queries: basic EOL
Listing 2: HQ1: translation of OQ to EOL for evaluating Hawk
return TypeDeclaration.all.select(td|
  td.bodyDeclarations.exists(md:MethodDeclaration|
    md.returnType.isTypeOf(SimpleType)
    and md.returnType.name.fullyQualifiedName == td.name.fullyQualifiedName
    and md.modifiers.exists(mod:Modifier|mod.public==true)
    and md.modifiers.exists(mod:Modifier|mod.static==true)));
Summary
Direct translation of the OCL query.
13. Intro Experiment setup Results Network Queries
Queries: EOL + extended MethodDeclarations
Listing 3: HQ2: HQ1 using derived attributes on MethodDeclaration
return MethodDeclaration.all.select(md |
  md.isPublic and md.isStatic and md.isSameReturnType
).collect( md | md.eContainer ).asSet;
Better approach
Tell Hawk to extend MethodDeclaration with “isPublic”,
“isStatic” and “isSameReturnType”
Perform lookup for the relevant MethodDeclarations
Retrieve the set of TypeDeclarations that contain them
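The three steps above can be sketched with Python sets: one precomputed lookup per derived flag, an intersection of the lookups, and a final mapping from methods to their containers. The `methods` data and the helper names are hypothetical, and the sketch does not claim to match Hawk's internal storage.

```python
# Hypothetical MethodDeclarations with their derived flags already computed.
methods = [
    {"id": "m1", "container": "Registry",
     "isPublic": True, "isStatic": True, "isSameReturnType": True},
    {"id": "m2", "container": "Registry",
     "isPublic": True, "isStatic": False, "isSameReturnType": False},
    {"id": "m3", "container": "Helper",
     "isPublic": False, "isStatic": True, "isSameReturnType": True},
    {"id": "m4", "container": "Cache",
     "isPublic": True, "isStatic": True, "isSameReturnType": True},
]


def ids_where(flag):
    """Build the lookup for one derived flag: the ids of methods where it holds."""
    return {m["id"] for m in methods if m[flag]}


# Built once at indexing time, one lookup per derived attribute.
flag_index = {flag: ids_where(flag)
              for flag in ("isPublic", "isStatic", "isSameReturnType")}
container_of = {m["id"]: m["container"] for m in methods}


def singleton_types():
    """Intersect the three lookups, then collect the containing types as a set."""
    matching = (flag_index["isPublic"]
                & flag_index["isStatic"]
                & flag_index["isSameReturnType"])
    return {container_of[method_id] for method_id in matching}
```

At query time no method is inspected individually; the work reduces to set intersections and a container lookup per match.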
14. Intro Experiment setup Results Network Queries
Queries: EOL + extended TypeDeclarations
Listing 4: HQ3: HQ1 using derived attributes on TypeDeclaration
return TypeDeclaration.all.select(td|td.isSingleton);
Even better approach
Tell Hawk to extend TypeDeclaration with “isSingleton”
Perform lookup for the relevant TypeDeclarations directly
15. Intro Experiment setup Results
RQ1: protocol impact (CDO)
HTTP degrades CDO noticeably
[Figure: median response time (ms, axis scaled ×10^4) against client threads (1–64) for CDO, TCP vs. HTTP]
[Figure: failed queries for CDO+HTTP against client threads (1–64)]
HTTP woes
635.66% hit for 1 client, still noticeable for 2 and 4
Slight chance of errors or incorrect results for 4+ threads
16. Intro Experiment setup Results
RQ1: protocol impact (Hawk)
HTTP hit is consistent for Hawk
[Figure: median response time (ms, axis scaled ×10^4) against client threads (1–64) for Hawk, TCP vs. HTTP]
Hawk+HTTP has a roughly consistent 20% performance hit
No failed queries and no incorrect query results
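A percentage hit of this kind can be computed as the relative increase in median response time. The sketch below assumes that definition, which the slides do not spell out, and the sample timings are invented rather than taken from the experiment.

```python
import statistics


def http_overhead_percent(tcp_ms, http_ms):
    """Relative increase of the HTTP median over the TCP median, in percent."""
    tcp_median = statistics.median(tcp_ms)
    http_median = statistics.median(http_ms)
    return (http_median - tcp_median) / tcp_median * 100.0


# Hypothetical response times in milliseconds.
tcp_samples = [90, 100, 110]
http_samples = [108, 120, 132]
overhead = http_overhead_percent(tcp_samples, http_samples)
```

With these made-up samples the medians are 100 ms and 120 ms, so `overhead` is 20 percent.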
17. Intro Experiment setup Results
RQ2: API design impact
Packet traces with Wireshark explain HTTP results
CDO trace: 58 packets (10.2kB)
Session setup → query setup → 6s of silence → results
Conclusion: CDO+HTTP uses regular polling for server-client
communication, and CDO reports results asynchronously
Introduces delay, breaks down for many clients
Suggestion: long polling / WebSockets instead?
Hawk trace: 14 packets (2.8kB)
Single request/response pair (no session/query setup)
Simple and reliable for small result sets
May have problems transmitting large result sets
Suggestion: optional async query API (pub-sub)
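Why regular polling adds delay can be shown with a little arithmetic. The sketch assumes a client that polls at fixed intervals starting at time 0; the interval and readiness times used in the example are hypothetical, not measured values from the traces.

```python
def delivery_time_polling(ready_ms, interval_ms):
    """Time at which a client polling at 0, T, 2T, ... first sees a result.

    The result becomes available on the server at `ready_ms`; the client
    only notices it at the next poll, i.e. the next multiple of `interval_ms`.
    """
    polls_needed = -(-ready_ms // interval_ms)  # integer ceiling division
    return polls_needed * interval_ms


def delivery_time_push(ready_ms):
    """With long polling or WebSockets the server can answer as soon as it is ready."""
    return ready_ms


def added_delay(ready_ms, interval_ms):
    """Extra latency introduced purely by the polling schedule."""
    return delivery_time_polling(ready_ms, interval_ms) - delivery_time_push(ready_ms)
```

For instance, with a 6000 ms interval a result that is ready after 500 ms is only seen at 6000 ms, so `added_delay(500, 6000)` is 5500 ms, regardless of how fast the query itself ran.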
18. Intro Experiment setup Results
RQ3: impact of internals
[Figure: median response time (ms, log scale from 10^2 to 10^4) against client threads (1–64) for CDO + OCL; Hawk + EOL, basic; Hawk + EOL, isPublic; Hawk + EOL, isSingleton]
CDO has more extensive generic caching than Hawk: e.g. SQL
log shows it caches “X.all” in memory (Hawk uses DB cache)
Hawk outperforms CDO by 10x–100x with derived attributes
(replaces iteration with lookups + set intersections)
19. Intro Experiment setup Results
What would be my ideal API?
Service-oriented, sync+async sides
Service orientation makes third party integration easier
Synchronous req/resp: simple operations, small queries
Asynchronous pub/sub: complex operations, large queries
Sync API can set up async operations
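One possible shape for such an API, as a sketch under assumptions: `QueryService`, `query` and `query_async` are hypothetical names, and an in-process queue stands in for a real pub/sub channel. Neither Hawk nor CDO is claimed to expose this interface.

```python
import queue
import threading
import uuid


class QueryService:
    """Toy service with a synchronous side and an asynchronous side."""

    def __init__(self, evaluate):
        self._evaluate = evaluate  # function that actually runs a query

    def query(self, text):
        """Synchronous request/response: suited to small, fast queries."""
        return self._evaluate(text)

    def query_async(self, text):
        """Synchronous call that only sets up an asynchronous operation.

        Returns a ticket and an inbox; the result is published to the
        inbox as (ticket, result) once the query has finished.
        """
        ticket = uuid.uuid4().hex
        inbox = queue.Queue()

        def worker():
            inbox.put((ticket, self._evaluate(text)))

        threading.Thread(target=worker, daemon=True).start()
        return ticket, inbox
```

A client would call `query` for small lookups, and `query_async` followed by `inbox.get()` for long-running queries, without holding a request open in the meantime.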
Flexible encoding with transparent compression
Provide multiple encodings through code generation
Transparent gzip compression is easy to integrate
Note: HTTP fields didn’t add that much overhead (20%)
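Transparent compression is easy to illustrate with the standard library. This sketch uses JSON plus gzip in place of Thrift-generated encodings, so it shows the idea rather than Hawk's actual wire format; `encode`, `decode` and the sample element IDs are hypothetical.

```python
import gzip
import json


def encode(result_ids):
    """Serialise a list of element IDs as JSON, then gzip the bytes."""
    return gzip.compress(json.dumps(result_ids).encode("utf-8"))


def decode(payload):
    """Reverse of encode: gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


# Hypothetical query result: element IDs tend to be repetitive, so they compress well.
ids = ["element-%d" % i for i in range(1000)]
raw = json.dumps(ids).encode("utf-8")
packed = encode(ids)
```

Because callers only see `encode` and `decode`, the compression step could be swapped out or disabled without touching the rest of the API.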
Internals for faster queries
Uncommon queries: extensive caching (as in CDO)
Common queries: query-specific indices (as in Hawk)
20. Intro Experiment setup Results
Conclusions and future work
Summary
In collaborative modelling, many users will query the same
models repeatedly to arrive at shared answers
CDO and Hawk implement remote querying very differently
From our results, we have suggested what an ideal remote
query API would be like
Future work
Wider assortment of queries (e.g. ones that exercise larger
portions of the models or produce large result sets)
Extend the range of configurations (tools, stores)
Analysing remote queries to offload tasks to client
21. Intro Experiment setup Results
End of the presentation
Questions?
@antoniogado