SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
Model-based Mining of 
Source Code Repositories 
Markus Scheidgen, Joachim Fischer 
{scheidge,fischer}@informatik.hu-berlin.de 
1 
Tuesday, 30. September 2014
Agenda 
▶ Mining Software Repositories (MSR) and Software Evolution 
Research (SER) 
▶ srcrepo – a model-based MSR system 
■ components and mining process 
■ distributed storage of very large models with emf-fragments 
■ a meta-model for source code repositories 
■ gathering software metrics with an OCL-like internal Scala DSL 
▶ work in progress - discussion of remaining problems and 
limitations 
2 
Tuesday, 30. September 2014
Relevant Research Fields 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
3 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
srcrepo – Components and Process 
4 
Tuesday, 30. September 2014
srcrepo – Components and Process 
4 
large scale software 
repositories 
(e.g. github, sourceforge) 
source code 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
software projects 
Tuesday, 30. September 2014
srcrepo – Components and Process 
revision tree 
4 
large scale software 
repositories 
(e.g. github, sourceforge) 
srcrepo storage 
(EMF-models via EMF-Fragments, 
e.g. on mongodb) 
source code 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
software projects 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
Tuesday, 30. September 2014
srcrepo – Components and Process 
revision tree 
4 
large scale software 
repositories 
(e.g. github, sourceforge) 
srcrepo storage 
(EMF-models via EMF-Fragments, 
e.g. on mongodb) 
srcrepo runtime 
(headless eclipse RCP) 
source code 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
software projects 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
revision tree 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis OCL 
M! M" M# 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
OCL EMF-Compare 
D!"# 
M!"# 
D#"$ 
M#"$ 
OCL 
M! M" M# 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
M! 
M!"# 
M# M# 
M#"$ 
store 
timelines of metrics 
OCL EMF-Compare 
D!"# 
M!"# 
D#"$ 
M#"$ 
OCL 
M! M" M# 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
statistics software 
(e.g. R, Matlab) 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
M! 
M!"# 
M# M# 
M#"$ 
store 
timelines of metrics 
export 
OCL EMF-Compare 
D!"# 
M!"# 
D#"$ 
M#"$ 
OCL 
M! M" M# 
Tuesday, 30. September 2014
Distributed Model Store with emf-fragments 
6 
normal references 
* 
Tuesday, 30. September 2014
Distributed Model Store with emf-fragments 
6 
normal references 
* 
fragmenting references 
«fragments» 
* 
Tuesday, 30. September 2014
A Meta-Model for Source Code Repositories 
7 
code revision tree 
» 
«fragments» 
Tuesday, 30. September 2014
8 
MoDisco source code revision tree 
«fragments» 
Tuesday, 30. September 2014
A OCL-like internal Scala DSL for Computing Metrics 
▶ OCL-like internal Scala DSL analog to our internal Scala 
model transformation language [1] 
▶ OCL collection operations mapped to Scala’s higher-order 
fuctions [2]: 
1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model 
Transformations - 5th International Conference, ICMT; 2012 
2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare 
9 
Tuesday, 30. September 2014
A OCL-like internal Scala DSL for Computing Metrics 
ownedPackages 
▶ OCL-like internal Scala Model DSL analog Package 
to our internal Scala 
model transformation language [1] 
▶ OCL collection operations mapped to Scala’s higher-order 
fuctions [2]: 
9 
pure OCL 
context Model: 
self.ownedElements->collect(p|p.ownedElements)->size 
Abstract 
Type 
Declaration 
ownedElements 
ownedElements 
* 
* 
* 
1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model 
Transformations - 5th International Conference, ICMT; 2012 
2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare 
Tuesday, 30. September 2014
A OCL-like internal Scala DSL for Computing Metrics 
Abstract 
Type 
Declaration 
ownedPackages 
* 
▶ OCL-like internal Scala DSL ownedElements 
Model analog Package 
to our internal Scala 
model transformation language [1] 
▶ OCL collection operations mapped to Scala’s higher-order 
fuctions [2]: 
9 
pure OCL 
context Model: 
* 
ownedElements 
* 
self.ownedElements->collect(p|p.ownedElements)->size 
OCL-like expression in Scala 
def numberOfFirstPackageLevelTypes(self: Model): Int = 
self.getOwnedElements().collect(p=>p.getOwnedElements()).size() 
1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model 
Transformations - 5th International Conference, ICMT; 2012 
2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare 
Tuesday, 30. September 2014
A OCL-like internal Scala DSL for Computing Metrics 
▶ Extending OCL’s collection operations: 
■ convenience operations 
■ closure 
■ aggregation 
■ execution 
trait OclCollection[E] extends java.lang.Iterable[E] 
{ 
def size(): Int 
def first(): E 
def exists(predicate: (E) => Boolean): Boolean 
def forAll(predicate: (E) => Boolean): Boolean 
def select(predicate: (E) => Boolean): OclCollection[E] 
def reject(predicate: (E) => Boolean): OclCollection[E] 
def collect[R](expr: (E) => R): OclCollection[R] 
def selectOfType[T]:OclCollection[T] 
def collectNotNull[R](expr: (E) => R): OclCollection[R] 
def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R] 
def closure(expr: (E) => OclCollection[10 
E]): OclCollection[E] 
123456789 
10 
11 
12 
13 
14 
15 
16 
Tuesday, 30. September 2014
trait OclCollection[E] extends java.lang.Iterable[E] 
{ 
def size(): Int 
def first(): E 
def exists(predicate: (E) => Boolean): Boolean 
def forAll(predicate: (E) => Boolean): Boolean 
def select(predicate: (E) => Boolean): OclCollection[E] 
def reject(predicate: (E) => Boolean): OclCollection[E] 
def collect[R](expr: (E) => R): OclCollection[R] 
def selectOfType[T]:OclCollection[T] 
def collectNotNull[R](expr: (E) => R): OclCollection[R] 
def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R] 
def closure(expr: (E) => OclCollection[E]): OclCollection[E] 
def aggregate[R,I](expr: (E) => I, start: () => R, aggr: (R, I) => R): R 
def sum(expr: (E) => Double): Double 
def product(expr: (E) => Double): Double 
def max(expr: (E) => Double): Double 
def min(expr: (E) => Double): Double 
def stats(expr: (E) => Double): Stats 
def run(runnable: (E) => Unit): Unit 
} 
11 
123456789 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
Tuesday, 30. September 2014
Complex Example: Average Weighted Methods per Class (WMC) 
▶ WMC is the first CK-metric [1]. There different commonly 
used weights; here we use cyclomatic complexity. 
def classes(model:Model):OclCollection[ClassDeclaration] = 
model.getOwnedElements() 
.collectClosure(pkg=>pkg.getOwnedPackages()) 
.collectAll(pkg=>pkg.getOwnedElements()) 
.collectClosure(typeDcl=> 
typeDcl.getBodyDeclarations() 
.selectOfType[ClassDeclaration]) 
12 
def WMC(model:Model):Double = 
classes(model).stats(clazz=> 
clazz.getBodyDeclarations() 
.selectOfType[MethodDeclaration]() 
.sum(method=>cyclmaticComplexity(method))).average 
def cyclomaticComplexity(method:MethodDeclaration):Int = 
... 
123456789 
10 
11 
12 
13 
14 
15 
16 
1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 
Tuesday, 30. September 2014
Complex Example: Average Weighted Methods per Class (WMC) 
▶ WMC is the first CK-metric [1]. There different commonly 
used weights; here we use cyclomatic complexity. 
def classes(model:Model):OclCollection[ClassDeclaration] = 
model.getOwnedElements() 
.collectClosure(pkg=>pkg.getOwnedPackages()) 
.collectAll(pkg=>pkg.getOwnedElements()) 
.collectClosure(typeDcl=> 
typeDcl.getBodyDeclarations() 
.selectOfType[ClassDeclaration]) 
Model 
Abstract 
Body 
Declaration 
12 
def WMC(model:Model):Double = 
classes(model).stats(clazz=> 
clazz.getBodyDeclarations() 
Package 
ownedPackages 
.selectOfType[MethodDeclaration]() 
.sum(method=>cyclmaticComplexity(method))).average 
Abstract 
Type 
Declaration 
def cyclomaticComplexity(method:MethodDeclaration):Int = 
... 
123456789 
10 
11 
12 
13 
14 
15 
16 
Class 
Declaration 
1 
1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 
TypeAccess 
ownedElements 
ownedElements 
bodyDeclarations 
superInterfaces 
superClass 
returnType 
type 
usagesInTypeAccess 
* 
* * 
* 
* * 
1 
1 
Tuesday, 30. September 2014
Model 
Package 
ownedElements 
Complex Example: Average Weighted Methods * per Class * 
ownedPackages 
(WMC) 
ownedElements 
* 
▶ WMC is the first CK-metric [1]. There different Abstract 
commonly 
Type 
used weights; here we use cyclomatic complexity. 
Declaration 
bodyDeclarations 
type 
* 
1 
def classes(model:Model):OclCollection[Abstract 
ClassDeclaration] = 
model.getOwnedElements() 
Body 
usagesInTypeAccess 
Declaration 
.collectClosure(pkg=>pkg.getOwnedPackages()) 
.collectAll(pkg=>pkg.getOwnedElements()) 
.collectClosure(typeDcl=> 
typeDcl.getBodyDeclarations() 
.selectOfType[ClassDeclaration]) 
12 
def WMC(model:Model):Double = 
classes(model).stats(clazz=> 
clazz.getBodyDeclarations() 
superInterfaces 
* * 
TypeAccess 
1 
returnType 
1 
Method 
Declaration 
superClass 
.selectOfType[MethodDeclaration]() 
.sum(method=>cyclmaticComplexity(method))).average 
def cyclomaticComplexity(method:MethodDeclaration):Int = 
... 
123456789 
10 
11 
12 
13 
14 
15 
16 
Class 
Declaration 
Abstract 
Method 
Invocation 
method 
1 
1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 
Tuesday, 30. September 2014
Implementation of the OCL-Collection Operations 
▶ Just in time iterator-based implementation rather than 
straight forward aggregation of result collections. 
13 
:RepositoryModel 
:Rev :Rev :Rev 
:Par... :Par... :Par... :Par... :Par... :Par... 
:Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! 
1 
2 
3 
4 
Tuesday, 30. September 2014
Future Work, Remaining Problems, and Limitations 
14 
Scalability 
■very large compilation units 
■incremental snapshot creation 
■batching OCL execution 
■experiments with large scale 
repository (e.g. git.eclipse.org) 
Heterogeneity 
■MoDisco for different 
programming languages 
■common metrics meta-model 
(e.g. OMG, KDM) 
■VCS abstraction and support for 
different VCS 
Accessibility 
■relating results to 
software repository 
entities 
■persisting and 
exporting results 
Information Depth 
■diff-models from 
comparison of 
compilation 
units 
▶ Very large compilation units (CU): e.g. a 3 MB, 600 kLOC CU in org.eclipse.emf 
■ tends to have lots of dependencies ➞ changes often ➞ makes problem even bigger 
■ CUs are smallest common denominator between text-based VCS view and syntax-based 
AST view 
■ smaller units require model-comparison or text-to-AST mappings 
▶ Support for different programming languages: either abstraction, parallel meta-models, or 
mixed approach 
■ MoDisco is extendable, but only Java support exists; other languages need to be 
implemented ➞ parallel meta-models 
■ A reasonable abstraction for multiple (or all) programming language probably does 
not exist. 
■ A shared abstract meta-model that all language meta-models extends could be an 
sensible compromise. 
Tuesday, 30. September 2014
15 
Tuesday, 30. September 2014

Contenu connexe

Dernier

CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
anilsa9823
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Dernier (20)

Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

En vedette

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

En vedette (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Model-based Mining of Software Repositories

  • 1. Model-based Mining of Source Code Repositories Markus Scheidgen, Joachim Fischer {scheidge,fischer}@informatik.hu-berlin.de 1 Tuesday, 30. September 2014
  • 2. Agenda ▶ Mining Software Repositories (MSR) and Software Evolution Research (SER) ▶ srcrepo – a model-based MSR system ■ components and mining process ■ distributed storage of very large models with emf-fragments ■ a meta-model for source code repositories ■ gathering software metrics with an OCL-like internal Scala DSL ▶ work in progress - discussion of remaining problems and limitations 2 Tuesday, 30. September 2014
  • 3. Relevant Research Fields 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 3 Tuesday, 30. September 2014
  • 4. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 5. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 6. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 7. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 8. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 9. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 10. srcrepo – Components and Process 4 Tuesday, 30. September 2014
  • 11. srcrepo – Components and Process 4 large scale software repositories (e.g. github, sourceforge) source code repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki software projects Tuesday, 30. September 2014
  • 12. srcrepo – Components and Process revision tree 4 large scale software repositories (e.g. github, sourceforge) srcrepo storage (EMF-models via EMF-Fragments, e.g. on mongodb) source code repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki software projects C! 1 2 3 A" B# A! AST-models of new and changed CUs import Tuesday, 30. September 2014
  • 13. srcrepo – Components and Process revision tree 4 large scale software repositories (e.g. github, sourceforge) srcrepo storage (EMF-models via EMF-Fragments, e.g. on mongodb) srcrepo runtime (headless eclipse RCP) source code repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki software projects C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 revision tree 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis Tuesday, 30. September 2014
  • 14. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis Tuesday, 30. September 2014
  • 15. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis OCL M! M" M# Tuesday, 30. September 2014
  • 16. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis OCL EMF-Compare D!"# M!"# D#"$ M#"$ OCL M! M" M# Tuesday, 30. September 2014
  • 17. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis M! M!"# M# M# M#"$ store timelines of metrics OCL EMF-Compare D!"# M!"# D#"$ M#"$ OCL M! M" M# Tuesday, 30. September 2014
  • 18. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki statistics software (e.g. R, Matlab) C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis M! M!"# M# M# M#"$ store timelines of metrics export OCL EMF-Compare D!"# M!"# D#"$ M#"$ OCL M! M" M# Tuesday, 30. September 2014
  • 19. Distributed Model Store with emf-fragments 6 normal references * Tuesday, 30. September 2014
  • 20. Distributed Model Store with emf-fragments 6 normal references * fragmenting references «fragments» * Tuesday, 30. September 2014
  • 21. A Meta-Model for Source Code Repositories 7 code revision tree » «fragments» Tuesday, 30. September 2014
  • 22. 8 MoDisco source code revision tree «fragments» Tuesday, 30. September 2014
  • 23. A OCL-like internal Scala DSL for Computing Metrics ▶ OCL-like internal Scala DSL analog to our internal Scala model transformation language [1] ▶ OCL collection operations mapped to Scala’s higher-order fuctions [2]: 1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model Transformations - 5th International Conference, ICMT; 2012 2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare 9 Tuesday, 30. September 2014
  • 24. A OCL-like internal Scala DSL for Computing Metrics ownedPackages ▶ OCL-like internal Scala Model DSL analog Package to our internal Scala model transformation language [1] ▶ OCL collection operations mapped to Scala’s higher-order fuctions [2]: 9 pure OCL context Model: self.ownedElements->collect(p|p.ownedElements)->size Abstract Type Declaration ownedElements ownedElements * * * 1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model Transformations - 5th International Conference, ICMT; 2012 2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare Tuesday, 30. September 2014
  • 25. A OCL-like internal Scala DSL for Computing Metrics Abstract Type Declaration ownedPackages * ▶ OCL-like internal Scala DSL ownedElements Model analog Package to our internal Scala model transformation language [1] ▶ OCL collection operations mapped to Scala’s higher-order fuctions [2]: 9 pure OCL context Model: * ownedElements * self.ownedElements->collect(p|p.ownedElements)->size OCL-like expression in Scala def numberOfFirstPackageLevelTypes(self: Model): Int = self.getOwnedElements().collect(p=>p.getOwnedElements()).size() 1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model Transformations - 5th International Conference, ICMT; 2012 2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare Tuesday, 30. September 2014
  • 26. A OCL-like internal Scala DSL for Computing Metrics ▶ Extending OCL’s collection operations: ■ convenience operations ■ closure ■ aggregation ■ execution trait OclCollection[E] extends java.lang.Iterable[E] { def size(): Int def first(): E def exists(predicate: (E) => Boolean): Boolean def forAll(predicate: (E) => Boolean): Boolean def select(predicate: (E) => Boolean): OclCollection[E] def reject(predicate: (E) => Boolean): OclCollection[E] def collect[R](expr: (E) => R): OclCollection[R] def selectOfType[T]:OclCollection[T] def collectNotNull[R](expr: (E) => R): OclCollection[R] def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R] def closure(expr: (E) => OclCollection[10 E]): OclCollection[E] 123456789 10 11 12 13 14 15 16 Tuesday, 30. September 2014
  • 27. trait OclCollection[E] extends java.lang.Iterable[E] { def size(): Int def first(): E def exists(predicate: (E) => Boolean): Boolean def forAll(predicate: (E) => Boolean): Boolean def select(predicate: (E) => Boolean): OclCollection[E] def reject(predicate: (E) => Boolean): OclCollection[E] def collect[R](expr: (E) => R): OclCollection[R] def selectOfType[T]:OclCollection[T] def collectNotNull[R](expr: (E) => R): OclCollection[R] def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R] def closure(expr: (E) => OclCollection[E]): OclCollection[E] def aggregate[R,I](expr: (E) => I, start: () => R, aggr: (R, I) => R): R def sum(expr: (E) => Double): Double def product(expr: (E) => Double): Double def max(expr: (E) => Double): Double def min(expr: (E) => Double): Double def stats(expr: (E) => Double): Stats def run(runnable: (E) => Unit): Unit } 11 123456789 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Tuesday, 30. September 2014
  • 28. Complex Example: Average Weighted Methods per Class (WMC) ▶ WMC is the first CK-metric [1]. There different commonly used weights; here we use cyclomatic complexity. def classes(model:Model):OclCollection[ClassDeclaration] = model.getOwnedElements() .collectClosure(pkg=>pkg.getOwnedPackages()) .collectAll(pkg=>pkg.getOwnedElements()) .collectClosure(typeDcl=> typeDcl.getBodyDeclarations() .selectOfType[ClassDeclaration]) 12 def WMC(model:Model):Double = classes(model).stats(clazz=> clazz.getBodyDeclarations() .selectOfType[MethodDeclaration]() .sum(method=>cyclmaticComplexity(method))).average def cyclomaticComplexity(method:MethodDeclaration):Int = ... 123456789 10 11 12 13 14 15 16 1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 Tuesday, 30. September 2014
  • 29. Complex Example: Average Weighted Methods per Class (WMC) ▶ WMC is the first CK-metric [1]. There different commonly used weights; here we use cyclomatic complexity. def classes(model:Model):OclCollection[ClassDeclaration] = model.getOwnedElements() .collectClosure(pkg=>pkg.getOwnedPackages()) .collectAll(pkg=>pkg.getOwnedElements()) .collectClosure(typeDcl=> typeDcl.getBodyDeclarations() .selectOfType[ClassDeclaration]) Model Abstract Body Declaration 12 def WMC(model:Model):Double = classes(model).stats(clazz=> clazz.getBodyDeclarations() Package ownedPackages .selectOfType[MethodDeclaration]() .sum(method=>cyclmaticComplexity(method))).average Abstract Type Declaration def cyclomaticComplexity(method:MethodDeclaration):Int = ... 123456789 10 11 12 13 14 15 16 Class Declaration 1 1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 TypeAccess ownedElements ownedElements bodyDeclarations superInterfaces superClass returnType type usagesInTypeAccess * * * * * * 1 1 Tuesday, 30. September 2014
  • 30. Model Package ownedElements Complex Example: Average Weighted Methods * per Class * ownedPackages (WMC) ownedElements * ▶ WMC is the first CK-metric [1]. There different Abstract commonly Type used weights; here we use cyclomatic complexity. Declaration bodyDeclarations type * 1 def classes(model:Model):OclCollection[Abstract ClassDeclaration] = model.getOwnedElements() Body usagesInTypeAccess Declaration .collectClosure(pkg=>pkg.getOwnedPackages()) .collectAll(pkg=>pkg.getOwnedElements()) .collectClosure(typeDcl=> typeDcl.getBodyDeclarations() .selectOfType[ClassDeclaration]) 12 def WMC(model:Model):Double = classes(model).stats(clazz=> clazz.getBodyDeclarations() superInterfaces * * TypeAccess 1 returnType 1 Method Declaration superClass .selectOfType[MethodDeclaration]() .sum(method=>cyclmaticComplexity(method))).average def cyclomaticComplexity(method:MethodDeclaration):Int = ... 123456789 10 11 12 13 14 15 16 Class Declaration Abstract Method Invocation method 1 1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 Tuesday, 30. September 2014
  • 31. Implementation of the OCL-Collection Operations ▶ Just in time iterator-based implementation rather than straight forward aggregation of result collections. 13 :RepositoryModel :Rev :Rev :Rev :Par... :Par... :Par... :Par... :Par... :Par... :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! 1 2 3 4 Tuesday, 30. September 2014
  • 32. Future Work, Remaining Problems, and Limitations 14 Scalability ■very large compilation units ■incremental snapshot creation ■batching OCL execution ■experiments with large scale repository (e.g. git.eclipse.org) Heterogeneity ■MoDisco for different programming languages ■common metrics meta-model (e.g. OMG, KDM) ■VCS abstraction and support for different VCS Accessibility ■relating results to software repository entities ■persisting and exporting results Information Depth ■diff-models from comparison of compilation units ▶ Very large compilation units (CU): e.g. a 3 MB, 600 kLOC CU in org.eclipse.emf ■ tends to have lots of dependencies ➞ changes often ➞ makes problem even bigger ■ CUs are smallest common denominator between text-based VCS view and syntax-based AST view ■ smaller units require model-comparison or text-to-AST mappings ▶ Support for different programming languages: either abstraction, parallel meta-models, or mixed approach ■ MoDisco is extendable, but only Java support exists; other languages need to be implemented ➞ parallel meta-models ■ A reasonable abstraction for multiple (or all) programming language probably does not exist. ■ A shared abstract meta-model that all language meta-models extends could be an sensible compromise. Tuesday, 30. September 2014
  • 33. 15 Tuesday, 30. September 2014