Intro Experiment setup Results
Stress-Testing Centralised Model Stores
Antonio García-Domínguez, Dimitris Kolovos, Konstantinos Barmpis, Ran Wei and Richard Paige
University of York, Aston University
ECMFA’16
July 6th, 2016
A. García-Domínguez et al. Stress-Testing Centralised Model Stores 1 / 21
Approaches for collaborative modelling
Use file-based models over standard VCS
Simple to use, reuses mature VCS (SVN/Git)
Large models can be broken up into fragments
Loss of big picture (no simple way to do model-wide queries)
Use specialized model repositories (e.g. CDO)
Harder to use, proprietary versioning, less widely adopted
Models are directly stored in a database
Queries are answered from the database
Hawk: solving limitations with file-based VCS
Mirrors and reconnects fragments into a graph DB
Queries are fast, versioning and storage are orthogonal
Simplified workflow of Hawk
Workflow implemented by Hawk
Hawk uses a monitor to watch over collections of model files:
local folders, SVN/Git repos, Eclipse workspaces...
If files are changed, graph is updated to mirror their contents
The graph DB can then be queried through local/remote APIs
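The monitoring loop described above can be sketched as follows. This is only an illustration in Python, not Hawk's implementation: `snapshot`, `sync` and the dictionary standing in for the graph DB are hypothetical names, and detecting changes by content hash is an assumption made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(folder: Path) -> dict:
    """Map each model file under `folder` to a hash of its contents."""
    return {
        str(p.relative_to(folder)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(folder.rglob("*.xmi"))
    }


def sync(graph: dict, folder: Path) -> set:
    """Update `graph` (a stand-in for the graph DB) to mirror `folder`.

    Returns the names of the files that were added, changed or removed.
    """
    current = snapshot(folder)
    changed = {f for f in current if graph.get(f) != current[f]}
    removed = set(graph) - set(current)
    for f in changed:
        graph[f] = current[f]  # a real indexer would re-parse the file here
    for f in removed:
        del graph[f]
    return changed | removed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.xmi").write_text("<model v='1'/>")
    graph = {}
    first = sync(graph, root)    # initial indexing picks up the file
    second = sync(graph, root)   # nothing changed since the last pass
    (root / "a.xmi").write_text("<model v='2'/>")
    third = sync(graph, root)    # the edit is detected
```

Only the files reported by `sync` need to be revisited, which is the point of watching the collection rather than reloading it.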
Structure of a Hawk index
Metamodel and model on the left side produce graph on the right side
Node types: metamodels, types, instances and files
Two lookup tables for metamodels and files
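As a rough picture of that structure, the sketch below models the four node kinds and the two lookup tables with plain Python objects. The edge labels (`metamodel`, `ofType`, `file`), the URI and the file name are assumptions chosen for illustration, not Hawk's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # nodes are compared by identity
class Node:
    kind: str   # "metamodel", "type", "instance" or "file"
    name: str
    edges: dict = field(default_factory=dict)  # label -> list of target nodes

    def link(self, label, target):
        self.edges.setdefault(label, []).append(target)


# The two lookup tables: metamodels by URI, files by path.
metamodels = {}
files = {}

mm = Node("metamodel", "http://example.org/library")  # hypothetical URI
metamodels[mm.name] = mm

author_type = Node("type", "Author")
author_type.link("metamodel", mm)

model_file = Node("file", "library.xmi")  # hypothetical file
files[model_file.name] = model_file

alice = Node("instance", "alice")
alice.link("ofType", author_type)
alice.link("file", model_file)


def instances_in_file(path, instances):
    """Return the instance nodes that came from the file at `path`."""
    file_node = files[path]
    return [n for n in instances if file_node in n.edges.get("file", [])]
```

The lookup tables give direct entry points into the graph, so a query can start from a metamodel or a file without scanning every node.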
Additional features in Hawk
Indexed attributes
Common scenario: find an Author by name
Users can tell Hawk to index a type by an attribute
EOL queries will reuse index transparently, e.g.
“Author.all.select(x | x.name = 'Value')”
Derived features
Another scenario: find Authors with 10+ books
Hawk can be told to precompute this and prepare a lookup
EOL queries written with the new feature will be sped up, e.g.
“Author.all.select(x | x.nBooks >= 10)”
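The two features can be illustrated with ordinary data structures. The sketch below is an assumption-laden Python analogy, not Hawk code: the `Author` class, the sample data and the helper names are hypothetical. An indexed attribute becomes a dictionary keyed by name, and a derived feature becomes a value computed once at indexing time and kept in sorted order for range lookups.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    name: str
    books: tuple


authors = [
    Author("Ada", tuple(f"a{i}" for i in range(12))),
    Author("Bob", tuple(f"b{i}" for i in range(3))),
    Author("Cyd", tuple(f"c{i}" for i in range(10))),
]

# Indexed attribute: name -> authors, built once instead of scanning per query.
by_name = {}
for a in authors:
    by_name.setdefault(a.name, []).append(a)

# Derived feature: nBooks is precomputed and stored sorted by value.
n_books_index = sorted((len(a.books), a.name) for a in authors)


def find_by_name(name):
    """Counterpart of Author.all.select(x | x.name = 'Value')."""
    return by_name.get(name, [])


def with_at_least(n):
    """Counterpart of Author.all.select(x | x.nBooks >= n)."""
    start = bisect.bisect_left(n_books_index, (n, ""))
    return [name for _, name in n_books_index[start:]]
```

Both helpers avoid iterating over every `Author`, which is the effect the slide attributes to indexed and derived attributes.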
Model repositories: Eclipse CDO
Pluggable storage
CDO can support multiple storage solutions
DB store is the most mature (embedded H2 by default)
Other stores include MongoDB, db4o or Objectivity
Caching and querying
CDO provides an EMF Resource implementation
Resource provides comprehensive generic caching
Remote queries are supported (OCL)
Comparing remote query APIs in CDO and Hawk
Hawk
Based on Apache Thrift (JSON / binary formats) + gzip
Stateless service-oriented API (e.g. “query”, “addRepository”)
Client → server: request-response
Server → client: subscribe-publish
Supports HTTP(S) and TCP
CDO
Based on Eclipse Net4j (binary)
Stateful buffer-oriented API (opaque sequences of bytes)
Bidirectional communication between client and server:
TCP: persistent connection
HTTP(S): client polls server
Research questions
Observations about CDO and Hawk
Both represent a model as a database
Both have remote model querying APIs
Each system has made different API design choices
How do those choices impact query throughput?
Questions
RQ1: impact of HTTP vs TCP?
RQ2: impact of API design?
RQ3: impact of caching and indexed/derived attributes?
Experiment setup: systems used
Observations
CDO and Hawk used same hardware, same version of Eclipse
(Mars), same HTTP server (Jetty) and memory (4GiB)
Only one of CDO or Hawk ran at a time
Controller manages clients and collects results through SSH
Experiment setup: workload
Model used: set4 from GraBaTs 2009
Reverse engineered from Eclipse JDT source code
Contains 4.9M elements: 677MB XMI file
1.4GB in CDO (H2 database)
1.9GB in Hawk (Neo4j graph)
Workload configurations
Servers are “warmed up” to a steady state first
Lightest workload: 1 machine runs 1000 queries over 1 thread
Rest: 2 machines, each runs 500 queries over 2–32 threads
Measurements
Time to connect + query + retrieve element IDs
Refer to paper for notched box plots and statistical tests
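A workload of this shape can be approximated with a small harness. The sketch below is a single-process simplification in Python and not the controller used in the experiment: `run_workload` and `fake_query` are hypothetical names, and the sleep stands in for the real connect + query + retrieve step.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_workload(query, total_queries: int, threads: int) -> float:
    """Run `query` `total_queries` times over `threads` workers.

    Returns the median wall-clock time of one call, in milliseconds.
    """
    def timed_call(_):
        start = time.perf_counter()
        query()
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=threads) as pool:
        timings = list(pool.map(timed_call, range(total_queries)))
    return statistics.median(timings)


def fake_query():
    time.sleep(0.001)  # placeholder for connect + query + retrieve IDs


median_ms = run_workload(fake_query, total_queries=50, threads=4)
```

Reporting the median rather than the mean keeps a few slow outliers from dominating the result, which matters once many threads contend for the same server.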
Queries: OCL
Listing 1: OQ: GraBaTs query in OCL for evaluating CDO
DOM::TypeDeclaration.allInstances()->select(td |
  td.bodyDeclarations->selectByKind(DOM::MethodDeclaration)
    ->exists(md : DOM::MethodDeclaration |
      md.modifiers
        ->selectByKind(DOM::Modifier)
        ->exists(mod : DOM::Modifier | mod.public)
      and md.modifiers
        ->selectByKind(DOM::Modifier)
        ->exists(mod : DOM::Modifier | mod.static)
      and md.returnType.oclIsTypeOf(DOM::SimpleType)
      and md.returnType.oclAsType(DOM::SimpleType).name.fullyQualifiedName
        = td.name.fullyQualifiedName))
Summary
Finds all possible singletons (returned from a static and public
method within the same type).
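For readers less familiar with OCL, the same condition can be restated over a toy in-memory model. This Python sketch is a simplification under stated assumptions: the `Method` and `TypeDecl` classes are hypothetical, modifiers are flattened into two booleans, and `return_type` holds the fully qualified name only when the return type is a simple type.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Method:
    return_type: Optional[str]  # fully qualified name, or None if not a simple type
    public: bool = False
    static: bool = False


@dataclass
class TypeDecl:
    name: str
    methods: list = field(default_factory=list)


def is_singleton(td: TypeDecl) -> bool:
    """True if some public static method of `td` returns `td` itself."""
    return any(
        m.public and m.static and m.return_type == td.name
        for m in td.methods
    )


def find_singletons(types):
    return [td.name for td in types if is_singleton(td)]


types = [
    TypeDecl("pkg.Registry", [Method("pkg.Registry", public=True, static=True)]),
    TypeDecl("pkg.Factory", [Method("pkg.Widget", public=True, static=True)]),
    TypeDecl("pkg.Hidden", [Method("pkg.Hidden", public=False, static=True)]),
]
singletons = find_singletons(types)
```

Note that the check visits every method of every type, which is the iteration cost that the later Hawk queries try to avoid.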
Queries: basic EOL
Listing 2: HQ1: translation of OQ to EOL for evaluating Hawk
return TypeDeclaration.all.select(td|
  td.bodyDeclarations.exists(md:MethodDeclaration|
    md.returnType.isTypeOf(SimpleType)
    and md.returnType.name.fullyQualifiedName == td.name.fullyQualifiedName
    and md.modifiers.exists(mod:Modifier|mod.public==true)
    and md.modifiers.exists(mod:Modifier|mod.static==true)));
Summary
Direct translation of the OCL query.
Queries: EOL + extended MethodDeclarations
Listing 3: HQ2: HQ1 using derived attributes on MethodDeclaration
return MethodDeclaration.all.select(md |
  md.isPublic and md.isStatic and md.isSameReturnType
).collect(md | md.eContainer).asSet;
Better approach
Tell Hawk to extend MethodDeclaration with “isPublic”,
“isStatic” and “isSameReturnType”
Perform lookup for the relevant MethodDeclarations
Retrieve the set of TypeDeclarations that contain them
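The three steps above amount to index lookups followed by set operations. The Python sketch below shows the idea with made-up data: the `index` and `container` tables and the numeric ids are hypothetical and only stand in for whatever Hawk stores internally.

```python
# Hypothetical derived-attribute indices: attribute name -> ids of the
# MethodDeclarations for which that attribute is true.
index = {
    "isPublic": {1, 2, 3, 5},
    "isStatic": {2, 3, 4},
    "isSameReturnType": {2, 3, 5},
}

# Hypothetical containment table: MethodDeclaration id -> TypeDeclaration.
container = {1: "A", 2: "B", 3: "B", 4: "C", 5: "D"}


def hq2(index, container):
    """Intersect the three lookups, then collect the containing types."""
    methods = index["isPublic"] & index["isStatic"] & index["isSameReturnType"]
    return {container[m] for m in methods}


result = hq2(index, container)
```

Using a set for the result mirrors the `asSet` call in the listing: a type with several matching methods (here `B`) is reported once.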
Queries: EOL + extended TypeDeclarations
Listing 4: HQ3: HQ1 using derived attributes on TypeDeclaration
return TypeDeclaration.all.select(td|td.isSingleton);
Even better approach
Tell Hawk to extend TypeDeclaration with “isSingleton”
Perform lookup for the relevant TypeDeclarations directly
RQ1: protocol impact (CDO)
HTTP degrades CDO noticeably
[Figure: two plots for CDO against client threads (1 to 64): median response time in ms (TCP vs HTTP), and failed queries for CDO+HTTP]
HTTP woes
635.66% hit for 1 client, still noticeable for 2 and 4
Slight chance of errors or incorrect results for 4+ threads
RQ1: protocol impact (Hawk)
HTTP hit is consistent for Hawk
[Figure: median response time in ms against client threads (1 to 64) for Hawk, TCP vs HTTP]
Hawk+HTTP has a roughly consistent 20% performance hit
No failed queries and no incorrect query results
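To put the percentages in perspective, the sketch below converts a relative hit into a slowdown factor. It assumes that "hit" means the relative increase of the HTTP median response time over the TCP one; the function names and the 100 ms / 120 ms timings are hypothetical.

```python
def relative_hit(tcp_ms: float, http_ms: float) -> float:
    """Relative increase of HTTP over TCP, as a percentage."""
    return (http_ms - tcp_ms) / tcp_ms * 100.0


def slowdown_factor(hit_percent: float) -> float:
    """How many times slower HTTP is than TCP for a given hit."""
    return 1.0 + hit_percent / 100.0


cdo_factor = slowdown_factor(635.66)   # about 7.36 times slower
hawk_factor = slowdown_factor(20.0)    # about 1.2 times slower
example_hit = relative_hit(tcp_ms=100.0, http_ms=120.0)  # hypothetical timings
```

Under that reading, the 635.66% hit reported for CDO with one client corresponds to HTTP taking more than seven times as long as TCP, against roughly 1.2 times for Hawk.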
RQ2: API design impact
Packet traces with Wireshark explain HTTP results
CDO trace: 58 packets (10.2kB)
Session setup → query setup → 6s of silence → results
Conclusion: CDO+HTTP uses regular polling for server-client
communication, and CDO reports results asynchronously
Introduces delay, breaks down for many clients
Suggestion: long polling / WebSockets instead?
Hawk trace: 14 packets (2.8kB)
Single request/response pair (no session/query setup)
Simple and reliable for small result sets
May have problems transmitting large result sets
Suggestion: optional async query API (pub-sub)
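The cost of regular polling can be estimated with a simple model. The sketch below is a back-of-the-envelope illustration in Python, not a description of Net4j: it assumes the client polls at fixed multiples of an interval, and the interval values used are hypothetical since the slides do not state the one CDO uses.

```python
import math


def seen_at(ready_at: float, interval: float) -> float:
    """Time at which a client polling every `interval` seconds first sees
    a result that became available `ready_at` seconds after the request."""
    polls_needed = max(1, math.ceil(ready_at / interval))
    return polls_needed * interval


def added_delay(ready_at: float, interval: float) -> float:
    """Extra waiting introduced by polling, compared with a direct response."""
    return seen_at(ready_at, interval) - ready_at


def poll_requests(clients: int, interval: float, duration: float) -> int:
    """Number of poll requests the server receives during `duration` seconds."""
    return clients * math.floor(duration / interval)
```

For example, `seen_at(0.5, 2.0)` is 2.0: a result ready after half a second is only noticed at the first poll. The request count grows linearly with the number of clients, which is consistent with the breakdown observed under load; long polling or WebSockets would remove both costs.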
RQ3: impact of internals
[Figure: median response time in ms (10^2 to 10^4) against client threads (1 to 64), for CDO + OCL, Hawk + EOL basic, Hawk + EOL isPublic and Hawk + EOL isSingleton]
CDO has more extensive generic caching than Hawk: e.g. SQL
log shows it caches “X.all” in memory (Hawk uses DB cache)
Hawk outperforms CDO by 10x–100x with derived attributes
(replaces iteration with lookups + set intersections)
What would be my ideal API?
Service-oriented, sync+async sides
Service orientation makes third party integration easier
Synchronous req/resp: simple operations, small queries
Asynchronous pub/sub: complex operations, large queries
Sync API can set up async operations
Flexible encoding with transparent compression
Provide multiple encodings through code generation
Transparent gzip compression is easy to integrate
Note: HTTP fields didn’t add that much overhead (20%)
Internals for faster queries
Uncommon queries: extensive caching (as in CDO)
Common queries: query-specific indices (as in Hawk)
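One way to picture such an API is sketched below in Python. It is a toy, in-process illustration under several assumptions: `QueryService`, `submit` and `subscribe` are hypothetical names, JSON stands in for the generated encodings, and queues stand in for a real publish/subscribe transport.

```python
import gzip
import json
import queue
import threading


def encode(payload) -> bytes:
    """JSON encoding with transparent gzip compression."""
    return gzip.compress(json.dumps(payload).encode("utf-8"))


def decode(blob: bytes):
    return json.loads(gzip.decompress(blob).decode("utf-8"))


class QueryService:
    def __init__(self, run_query):
        self._run_query = run_query  # callable: query text -> list of results
        self._topics = {}

    def query(self, text: str) -> bytes:
        """Synchronous request/response, suited to small result sets."""
        return encode(self._run_query(text))

    def submit(self, text: str) -> str:
        """Synchronous call that sets up an asynchronous query.

        Returns the name of the topic on which results will be published.
        """
        topic = f"results-{len(self._topics)}"
        results = self._topics[topic] = queue.Queue()

        def work():
            for item in self._run_query(text):
                results.put(encode(item))
            results.put(None)  # end-of-results marker

        threading.Thread(target=work, daemon=True).start()
        return topic

    def subscribe(self, topic: str):
        """Yield decoded results as they are published on `topic`."""
        results = self._topics[topic]
        while (blob := results.get()) is not None:
            yield decode(blob)


service = QueryService(lambda text: [f"{text}-{i}" for i in range(3)])
sync_result = decode(service.query("q"))
async_result = list(service.subscribe(service.submit("q")))
```

The synchronous `submit` call is what ties the two sides together: the client gets an immediate answer (the topic name) and then receives large result sets incrementally instead of in one response.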
Conclusions and future work
Summary
In collaborative modelling, many users will query the same
models repeatedly to arrive at shared answers
CDO and Hawk implement remote querying very differently
From our results, we have suggested what an ideal remote
query API would be like
Future work
Wider assortment of queries (e.g. ones that exercise larger
portions of the models or produce large result sets)
Extend the range of configurations (tools, stores)
Analysing remote queries to offload tasks to client
End of the presentation
Questions?
@antoniogado

  • 1. Intro Experiment setup Results Stress-Testing Centralised Model Stores Antonio García-Domínguez, Dimitris Kolovos, Konstantinos Barmpis, Ran Wei and Richard Paige University of York, Aston University ECMFA’16 July 6th, 2016 A. García-Domínguez et al. Stress-Testing Centralised Model Stores 1 / 21
  • 2. Intro Experiment setup Results Approaches for collaborative modelling Use file-based models over standard VCS Simple to use, reuses mature VCS (SVN/Git) Large models can be broken up into fragments Loss of big picture (no simple way to do model-wide queries) Use specialized model repositories (e.g. CDO) Harder to use, proprietary versioning, less widely adopted Models are directly stored in a database Queries are answered from the database Hawk: solving limitations with file-based VCS Mirrors and reconnects fragments into a graph DB Queries are fast, versioning and storage are orthogonal A. García-Domínguez et al. Stress-Testing Centralised Model Stores 2 / 21
  • 3. Intro Experiment setup Results Simplified workflow of Hawk Workflow implemented by Hawk Hawk uses a monitor to watch over collections of model files: local folders, SVN/Git repos, Eclipse workspaces... If files are changed, graph is updated to mirror their contents Graph DB can be then queried through local/remote APIs A. García-Domínguez et al. Stress-Testing Centralised Model Stores 3 / 21
  • 4. Intro Experiment setup Results Structure of a Hawk index Metamodel and model on the left side produce graph on the right side Node types: metamodels, types, instances and files Two lookup tables for metamodels and files A. García-Domínguez et al. Stress-Testing Centralised Model Stores 4 / 21
  • 5. Intro Experiment setup Results Additional features in Hawk Indexed attributes Common scenario: find an Author by name Users can tell Hawk to index a type by an attribute EOL queries will reuse index transparently, e.g. “Author.all.select(x | x.name = ’Value’)” Derived features Another scenario: find Authors with 10+ books Hawk can be told to precompute this and prepare a lookup EOL queries written with the new feature will be sped up, e.g. “Author.all.select(x | x.nBooks >= 10)” A. García-Domínguez et al. Stress-Testing Centralised Model Stores 5 / 21
  • 6. Model repositories: Eclipse CDO. Pluggable storage: CDO can support multiple storage solutions; the DB store is the most mature (embedded H2 by default); other stores include MongoDB, db4o or Objectivity. Caching and querying: CDO provides an EMF Resource implementation; the Resource provides comprehensive generic caching; remote queries are supported (OCL).
  • 7. Comparing remote query APIs in CDO and Hawk. Hawk: based on Apache Thrift (JSON / binary formats) + gzip; stateless service-oriented API (e.g. “query”, “addRepository”); client → server is request-response, server → client is subscribe-publish; supports HTTP(S) and TCP. CDO: based on Eclipse Net4j (binary); stateful buffer-oriented API (opaque sequences of bytes); bidirectional communication between client and server, using a persistent connection over TCP and client polling of the server over HTTP(S).
  • 8. Research questions. Observations about CDO and Hawk: both represent a model as a database; both have remote model querying APIs; each system has made different API design choices. How do those choices impact query throughput? Questions: RQ1: impact of HTTP vs TCP? RQ2: impact of API design? RQ3: impact of caching and indexed/derived attributes?
  • 9. Experiment setup: systems used. Observations: CDO and Hawk used the same hardware, the same version of Eclipse (Mars), the same HTTP server (Jetty) and the same memory (4GiB). Only one of CDO or Hawk ran at a time. A controller manages the clients and collects results through SSH.
  • 10. Experiment setup: workload. Model used: set4 from GraBaTs 2009, reverse engineered from Eclipse JDT source code. It contains 4.9M elements: a 677MB XMI file, 1.4GB in CDO (H2 database), 1.9GB in Hawk (Neo4j graph). Workload configurations: servers are “warmed up” to a steady state first; lightest workload: 1 machine runs 1000 queries over 1 thread; rest: 2 machines, each runs 500 queries over 2–32 threads. Measurements: time to connect + query + retrieve element IDs. Refer to the paper for notched box plots and statistical tests.
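As a rough sketch of the workload on slide 10, the configurations can be enumerated in a few lines of Python. The doubling of thread counts between 2 and 32 is an assumption (the slide only gives the range), and the dictionary keys are hypothetical names. The sketch shows that every configuration issues the same total number of queries.

```python
def workload_configurations():
    """Enumerates the client configurations described on the slide."""
    # Lightest workload: 1 machine, 1000 queries, 1 thread.
    configs = [{"machines": 1, "queries_per_machine": 1000, "threads_per_machine": 1}]
    threads = 2
    while threads <= 32:  # assumption: thread counts double from 2 up to 32
        configs.append(
            {"machines": 2, "queries_per_machine": 500, "threads_per_machine": threads}
        )
        threads *= 2
    return configs


def total_queries(config):
    """Total number of queries issued against the server in one configuration."""
    return config["machines"] * config["queries_per_machine"]


configs = workload_configurations()
totals = {total_queries(c) for c in configs}
```

Keeping the total fixed means that differences between configurations come from concurrency, not from the amount of work requested.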
  • 11. Queries: OCL. Listing 1: OQ: GraBaTs query in OCL for evaluating CDO
      DOM::TypeDeclaration.allInstances()->select(td |
        td.bodyDeclarations->selectByKind(DOM::MethodDeclaration)
          ->exists(md : DOM::MethodDeclaration |
            md.modifiers
              ->selectByKind(DOM::Modifier)
              ->exists(mod : DOM::Modifier | mod.public)
            and md.modifiers
              ->selectByKind(DOM::Modifier)
              ->exists(mod : DOM::Modifier | mod.static)
            and md.returnType.oclIsTypeOf(DOM::SimpleType)
            and md.returnType.oclAsType(DOM::SimpleType).name.fullyQualifiedName
              = td.name.fullyQualifiedName))
    Summary: finds all possible singletons (types returned from a static and public method within the same type).
  • 12. Queries: basic EOL. Listing 2: HQ1: translation of OQ to EOL for evaluating Hawk
      return TypeDeclaration.all.select(td|
        td.bodyDeclarations.exists(md:MethodDeclaration|
          md.returnType.isTypeOf(SimpleType)
          and md.returnType.name.fullyQualifiedName == td.name.fullyQualifiedName
          and md.modifiers.exists(mod:Modifier|mod.public==true)
          and md.modifiers.exists(mod:Modifier|mod.static==true)));
    Summary: direct translation of the OCL query.
  • 13. Queries: EOL + extended MethodDeclarations. Listing 3: HQ2: HQ1 using derived attributes on MethodDeclaration
      return MethodDeclaration.all.select(md |
        md.isPublic and md.isStatic and md.isSameReturnType
      ).collect( td | td.eContainer ).asSet;
    Better approach: tell Hawk to extend MethodDeclaration with “isPublic”, “isStatic” and “isSameReturnType”; perform a lookup for the relevant MethodDeclarations; retrieve the set of TypeDeclarations that contain them.
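The "better approach" of slide 13 can be sketched in Python as one lookup per derived attribute followed by a set intersection. This is an illustration only, not Hawk's code: the `Method` class, its field names and the sample methods are hypothetical, and in-memory sets stand in for the indices Hawk would keep in its graph database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    container: str  # name of the TypeDeclaration that contains the method
    is_public: bool
    is_static: bool
    is_same_return_type: bool  # return type equals the containing type


def build_lookups(methods):
    """Precomputes one set of methods per derived boolean attribute."""
    lookups = {"isPublic": set(), "isStatic": set(), "isSameReturnType": set()}
    for m in methods:
        if m.is_public:
            lookups["isPublic"].add(m)
        if m.is_static:
            lookups["isStatic"].add(m)
        if m.is_same_return_type:
            lookups["isSameReturnType"].add(m)
    return lookups


def possible_singletons(lookups):
    """Intersects the three lookups, then collects the containing types."""
    matching = lookups["isPublic"] & lookups["isStatic"] & lookups["isSameReturnType"]
    return {m.container for m in matching}


# Hypothetical sample data
methods = [
    Method("getInstance", "Registry", True, True, True),
    Method("create", "Factory", True, True, False),
    Method("copy", "Point", True, False, True),
]
singletons = possible_singletons(build_lookups(methods))
```

At query time no method is inspected again: the cost is that of intersecting three precomputed sets and reading the container of each match.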
  • 14. Queries: EOL + extended TypeDeclarations. Listing 4: HQ3: HQ1 using derived attributes on TypeDeclaration
      return TypeDeclaration.all.select(td|td.isSingleton);
    Even better approach: tell Hawk to extend TypeDeclaration with “isSingleton”; perform a lookup for the relevant TypeDeclarations directly.
  • 15. RQ1: protocol impact (CDO). HTTP degrades CDO noticeably. [Figures: median response time (ms) against client threads (1 to 64) for TCP and HTTP; failed queries (CDO+HTTP) against client threads (1 to 64).] HTTP woes: 635.66% hit for 1 client, still noticeable for 2 and 4. Slight chance of errors or incorrect results for 4+ threads.
  • 16. RQ1: protocol impact (Hawk). HTTP hit is consistent for Hawk. [Figure: median response time (ms) against client threads (1 to 64) for TCP and HTTP.] Hawk+HTTP has a roughly consistent 20% performance hit. No failed queries and no incorrect query results.
  • 17. RQ2: API design impact. Packet traces with Wireshark explain the HTTP results. CDO trace: 58 packets (10.2kB); session setup → query setup → 6s of silence → results. Conclusion: CDO+HTTP uses regular polling for server-client communication, and CDO reports results asynchronously; this introduces delay and breaks down for many clients. Suggestion: long polling / WebSockets instead? Hawk trace: 14 packets (2.8kB); a single request/response pair (no session/query setup); simple and reliable for small result sets, but may have problems transmitting large result sets. Suggestion: optional async query API (pub-sub).
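The delay that regular polling introduces can be illustrated with a toy Python model. It is a sketch under stated assumptions, not a model of CDO or Net4j: the polling interval and the result times below are hypothetical values in milliseconds, and network transfer time is ignored. The client polls at fixed intervals and can only see a result at the first poll after the server has it ready, whereas a single request/response pair returns as soon as the result exists.

```python
def polls_needed(ready_at_ms, poll_interval_ms):
    """Number of polls issued until the first one at or after ready_at_ms."""
    # Integer ceiling division; at least one poll is always needed.
    return max(1, -(-ready_at_ms // poll_interval_ms))


def polled_latency_ms(ready_at_ms, poll_interval_ms):
    """Time at which a regularly polling client first observes the result."""
    return polls_needed(ready_at_ms, poll_interval_ms) * poll_interval_ms


def request_response_latency_ms(ready_at_ms):
    """A blocking request/response pair returns when the result is ready."""
    return ready_at_ms


# Hypothetical numbers: result ready after 1500 ms, client polls every 1000 ms.
ready_at = 1500
interval = 1000
extra_delay = polled_latency_ms(ready_at, interval) - request_response_latency_ms(ready_at)
```

In this model each waiting client also sends one request per interval regardless of whether a result is ready, so the number of requests grows with the number of concurrent clients.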
  • 18. RQ3: impact of internals. [Figure: median response time (ms, axis from 10^2 to 10^4) against client threads (1 to 64) for CDO + OCL, Hawk + EOL basic, Hawk + EOL isPublic and Hawk + EOL isSingleton.] CDO has more extensive generic caching than Hawk: e.g. the SQL log shows it caches “X.all” in memory (Hawk uses the DB cache). Hawk outperforms CDO by 10x–100x with derived attributes (replaces iteration with lookups + set intersections).
  • 19. What would be my ideal API? Service-oriented, with sync+async sides: service orientation makes third party integration easier; synchronous req/resp for simple operations and small queries; asynchronous pub/sub for complex operations and large queries; the sync API can set up async operations. Flexible encoding with transparent compression: provide multiple encodings through code generation; transparent gzip compression is easy to integrate. Note: HTTP fields didn't add that much overhead (20%). Internals for faster queries: extensive caching for uncommon queries (as in CDO); query-specific indices for common queries (as in Hawk).
  • 20. Conclusions and future work. Summary: in collaborative modelling, many users will query the same models repeatedly to arrive at shared answers; CDO and Hawk implement remote querying very differently; from our results, we have suggested what an ideal remote query API would be like. Future work: a wider assortment of queries (e.g. ones that exercise larger portions of the models or produce large result sets); extending the range of configurations (tools, stores); analysing remote queries to offload tasks to the client.
  • 21. End of the presentation. Questions? @antoniogado