Presented at the 21st IEEE International Conference on Program Comprehension (ICPC 2013), San Francisco (USA). Website of the paper: http://softlang.uni-koblenz.de/explore-API-usage/
Exploring the Future Potential of AI-Enabled Smartphone Processors
Multi-dimensional Exploration of API Usage
1. Multi-dimensional
Exploration of API Usage
Coen De Roover1, Ralf Lämmel2, Ekaterina Pek3
1 Software Languages Lab, Vrije Universiteit Brussel, Belgium
2 Software Languages Team, University of Koblenz-Landau, Germany
3 ADAPT Lab, University of Koblenz-Landau, Germany
2. Exploration Story: JHotDraw
➜ relatively few references to SAX and DOM
what XML APIs are used and how extensively?
Swing!!java.lang!!JavaBeans!!java.io!!AWT!!java.util
Package org.jhotdraw.undo
AWT!!Swing!!java.io java.lang java.util
JavaBeans java.text java.lang.reflect!!DOM!!java.net
java.util.regex!!Java Print Service!!java.util.zip!!java.lang.annotation
java.math java.lang.ref java.util.concurrent Java security!!javax.imageio!!SAX
JHotDraw’s API Cocktail
Fig. 8. The API Cocktail of JHotDraw (cloud of API tags).
View – A list as in the case of the API Footprint insight, except
that it is narrowed down to a sub-API of interest.
GUI
IO!!For
XML!!D
JH
GUI
APIs
for the
API cloud
cept
for
th a
tory
ents.
heck
API
GUI!!Data!!Basics!!
IO!!Format!!Component!!Meta!!
XML!!Distribution!!Parsing!!Control!!Math!!Output!!Security!!Concurrency
JHotDraw’s API Domain Cocktail
GUI!!Basics!!Component!!IO
Package org.jhotdraw.undo
Project jhotdraw
Fig. 9. Cocktail of domains for JHotDraw.
java.lang!!java.net!!Swing!!JavaBeans!!java.io!!
APIs
API domains
Coupling in JHotDraw
for the interface org.jhotdraw.app.View
Fig. 10. API Coupling for JHotDraw’s interface org.jhotdraw.app.View.
software concepts. Consider Fig. 9 for illustration. It shows
API domains for all of JHotDraw and also for its undo
package. Thus, it presents the API cocktails of Fig. 8 in a
API domain cloud
?
Let me start by making the concept of exploring API usage more concrete.
Imagine you are a developer tasked with migrating JH from XML to JSON for persistency.
The first thing you would like to know is what APIs for manipulating XML are used, and how extensively these APIs are used.
You could gain these insights through the two tag clouds shown on the slide. The top one contains the domains of the APIs used by JH, the bottom one the actual APIs. The
size of a tag corresponds to the amount of references to the API or the domain. So we can conclude that XML apis are used by JH, more concretely DOM and SAX, but not
extensively. There are a lot more references to the AWT and SWING APIs from the GUI programming domain, for instance.
3. Exploration Story: JHotDraw
➜ footprint of DOM in JHotDraw is but 94 refs to 19 distinct elements
what elements of DOM are actually used?
table of referenced API elements (i.e., DOM slice)
?
The next insight to gain is whether the project uses the complete DOM API, or just a small subset. Given a table of referenced API elements, the latter seems to be the case.
There are only 94 references to 19 distinct types and methods.
Even better news, no exotic API elements are used.
4. Exploration Story: JHotDraw
Slice of JHotDraw
with DOM usage
➜ local to 1/13 top-level packages
How is DOM usage distributed across JHotDraw?
table of referencing project elements (i.e., JHotDraw slice)
?
in the view of hundreds of API elements declared by the
public void applyStylesTo(Element elem) {for (CSSRule rule : rules) {if (rule.matches(elem)) {rule.apply(elem);}
}
}
usage.
All good news so far, but it could still be the case that the API is used all over the project. Luckily, given a table of referencing project elements, the use of DOM is local to 4
classes in the org.jhotdraw.xml package.
Our exploration therefore shows that migrating from XML to JSON is feasible.
5. Exploring API Usage: Quaatlas API Atlas
API metadata
API named collection of elements (98)
API domain named collection of APIs addressing the same domain (27)
API facet named collection of API elements addressing a particular concern
basic
d API
erface
ot pay
works.
faces,
n soft-
vokes
efixes,
APIs.
ssibly
GUI
g as a
ckage
types
kages
APIs.
cause
Java’s
mmon
ubsets
which
hould
ysis is
given
y sort
rectly
ojects,
tions of projects or APIs as well as specific packages, types,
or methods thereof. For instance, we may be interested in #api
for a specific project. Also, we may be interested in #ref for
some part of an API.
Further, these metrics can be configured to count only
specific patterns. It is easy to see now that the given metrics
are not even orthogonal because, for example, #derive can be
obtained from #ref by only counting patterns for ‘extends’ and
‘implements’ relationships.
API Domains: We assume that each API addresses some
programming domain such as XML processing or GUI pro-
gramming. We are not aware of any general, widely adopted
attempt to associate APIs with domains, but the idea appears
to merit further research. We have begun collecting program-
ming domains (or in fact, API domains) and tagging APIs
appropriately. Let us list a few API domains and associate
them with well-known Java APIs:
GUI: GUI programming, e.g., Swing and AWT.
XML: XML processing, e.g., DOM, JDOM, and SAX.
Data: Data structures incl. containers, e.g., java.util.
IO: File- and stream-based I/O, e.g., java.io and java.nio.
Component: Component-oriented programming, e.g., JavaBeans.
Meta: Meta-programming incl. reflection, e.g., java.lang.reflect.
Basics: Basic language support, e.g., java.lang.String.
API domains are helpful in reporting API usage and quan-
tifying API usage of interest in more abstract terms than the
names of individual APIs, as will be illustrated in §VI.
API Facets: An API may contain dozens or hundreds
of types each of which has many method members in turn.
Some APIs use sub-packages to organize such API complexity,
but those sub-packages are typically concerned with advanced
API usage whereas the core facets of API usage are not
distinguished in any operational manner. This makes it hard
to understand API usage at a somewhat abstract level.
2. output : corpus
3. for each name in candidateList :
4. (psrc, pbin ) = obtainProject(name);
5. patches = exploratoryBuild(psrc, pbin );
6. timestamp = build(psrc, patches);
7. (java, classes, jars) = collectStats(psrc);
8. java0
= filter(java);
9. (jarsbuilt , jarslib) = detectJars(timestamp, java0
, jars);
10. java0
compiled = detectJava(timestamp, java0
, classes, jarsbuilt );
11. p0
src = (java0
compiled , jarslib);
12. p0
bin = jarsbuilt ;
13. p0
= (p0
src, p0
bin );
14. if validate(p0
) : corpus = corpus + p0
;
Fig. 4. Pseudocode describing the corpus (re)-engineering method.
Accordingly, we propose leveraging a notion of API facets
in the sense of aspects or concerns supported by the API.
In this paper, we assume that facets are represented as named
collections of specific API types or methods. As an illustration,
we name a few API facets of the typical DOM-like API such
as DOM itself, JDOM, or dom4j:
Input / Output: De-/serialization for DOM trees.
Observation: Getter-like access and other ‘read only’ forms.
Addition: Addition of nodes et al. as part also of construction.
Removal: Removal of nodes et al. as a form of mutation.
Namespaces: XML namespace manipulation.
Nontrivial XML: Use of CDATA, PI, and other XML idiosyncrasies.
Nontrivial API: Usage of types and methods that are beyond normal
API usage. For instance, XML APIs may provide some framework
for node factories or adapters for API integration.
API facets are helpful in communicating API usage to the
user at a more abstract level than the level of individual
types and methods, as will be illustrated in §VI. We leverage
knowledge of the APIs to identify (to name) API facets and to
tag APIs appropriately. The idea of grouping API members,
e.g., by their functional roles, has also been studied in related
work on code completion; see §III.
V. THE QUAATLAS CORPUS FOR API-USAGE ANALYSIS
Our study requires a suitable corpus of mature, well-
developed projects coming from different application domains.
Arguably, such projects show sufficient and advanced API
usage. We decided to restrict ourselves to open-source Java
projects; in order to increase quality and reproducibility of our
research, we decided to use an existing, established and cu-
rated, collection of Java projects—the QUALITAS corpus [27],
release 20101126r. As we discuss in §IV, API usage entails
the ability to resolve types. However, QUALITAS does not
guarantee the availability of a project’s library types. The
collection consists of source and binary forms as they are
provided by the project developers.
be exten
be added
projects.
Line 4
source a
project w
nature o
The exp
occur du
stage, w
in the b
set is sm
build scr
or invoc
to push
explorato
build the
After
modifica
Java file
types, fo
we explo
containe
On lin
that we
line 9, w
informat
classify
or as bui
and the
compiled
source c
types tog
the binar
The r
rebuildin
making s
the meth
and libra
we add t
This p
the proc
per proje
coverage
somethin
10 as an
it on reg
gathered by studying API usage in a corpus of projects
re-engineered Qualitas corpus to Eclipse projects that compile (79)
dependencies resolved and separated from project files
In the paper, we present a similar exploration-based approach for understanding API usage. This approach relies on a lot of meta-data about APIs that we have made
available in an API atlas.
For 98 APIs, this atlas describes the individual packages/types/methods the API consists of. A fine-grained description is necessary as libraries such as Google Guava or
even java.util group different APIs together.
We also associated a domain with each API. This resulted in 27 API domains. Finally, we have started describing groups of elements within an API that address a particular
concern.
We gathered this meta-data by studying the APIs used in a corpus of 79 mature projects. We re-engineered the projects from the Qualitas corpus such that all their
dependencies are resolved and separated from project files. This enables extracting precise API usage facts.
6. linked to 101
Note that the entire API atlas is available on the paper’s website.
There, we also present the meta-data in a human-readable format. One nice feature there is that each API is linked to its description on the 101companies wiki where you
can also browse through small example programs that use the API etc.
7. Exploring API Usage: Exapus Platform
scaled and ordered by usage metrics: #ref, #elem, #derive, #proj, #api, ...
computes exploration views on usage facts
selection of API references
organized as project or API slice
project members + outgoing refs within their scope
API members + incoming refs within their scope
rendered as graph, table or cloud
by referenced elements: API name, element, meta-data ...
by referencing elements: project name, element, syntactic pattern, ...
gathers API usage facts for a given corpus
referenced element, referencing scope, syntactic pattern (e.g., super call)
The actual exploration-based approach to understanding API usage is supported by a tool that extracts references to API elements from a single project or a corpus of
projects. During an AST visit, the tool records for each reference it discovers the referenced element, the project scope in which this reference resides, and the syntactic
form of the reference. This could be a method return type, a super call, or a type parameter, ..
The tool presents exploration views on the extracted facts, which can be configured along several dimensions. First of all, you can configure what API references to include
in a view using conditions on the referenced element and the referencing element. For instance, only the exceptions defined by an API from the XML domain that are caught
in the JH project. Next, you can choose to organize these references as a slice of project members with outgoing refs or as a slice of API members with incoming refs.
Finally, you can have these slices rendered as a graph/table/cloud scaled by a usage metric. For instance, a tag cloud scaled by the amount of subclassing along the border
between a project and an API.
8. What follows are some screenshots of the tool in action. At the far left, there is a list of predefined views. Their configuration can be edited in the top-right corner. Shown
here is the configuration of a view that results in the tag cloud we saw earlier. At the top, you can select what referenced elements to include. Here, we include all of them
using a wildcard pattern. At the bottom, you can select what referencing elements to include in a view. Here, we only include references from the JH project.
Note that even though the tool has a dynamic IDE-like feel, it is actually completely web-based. We hope this will encourage others to explore and augment our API meta-
data.
9. Here you see a project-centric table of outgoing references from JH to the Java collections API and DOM. We see for instance that the method add of StyleManager invokes
method add of java.util.List. At the bottom-left, you see a tag cloud for the currently selected project element. We see that there are more references to data APIs than to
XML apis in the StyleManager class.
The source code for this class is shown at the bottom-right. API references are highlighted within the source code.
10. Finally, here you see an API-centric graph of references from JH to the APIs known to us. Nodes are APIs. Borders of the nodes are scaled by the relative amount of
referenced elements. So this is basically another rendering of the tag cloud you saw earlier. You could also choose to scale the borders of the nodes using a different
metric, such as the amount of derivation that happens.
11. And of course, we also made this tool publicly available.
12. Insight: API Dispersion
intent
stakeholder
view
intelligence
understand and compare dispersion of an API across the corpus
API developer
project-centric table
usage metrics for quantitative comparison
API facets for qualitative comparison
Fig. 5. JDOM’s API Dispersion in QUAATLAS (project-centric table).
B. The API Dispersion Insight
choose compliance tests for API evolution
So, what insights about API usage can one hope to gain through such a tool? And how should you configure the tool such that it produces the right view for each insight?
In the paper, we discuss this in a structured manner for several API usage insights.
The one shown here is concerned with how dispersed or widespread an API is across a corpus of projects. It can be gained by configuring the tool to produce a table of
referencing project elements, together with some usage metrics. Here, we see JDOM’s dispersion in the corpus. The table is sorted by the amount of references each
project contains. We see that the informa project has the most references, but that jspwiki references the most distinct API elements. We also see that this project is one of
the few that contain subtypes of API elements. So who could benefit from this insight? This would be the developer of an API that needs to choose easy and difficult
projects for compliance testing after an API evolution.
13. Insight: API Footprint
intent
stakeholder
view
intelligence
understand what API elements are
actually used in a corpus
or in specific project scopes
API or project developer
API-centric table or tree
ordered or scaled by #ref
Fig. 6. JDOM’s API Footprint in QUAATLAS (API-centric table).
Nontrivial JDOM API usage in velocity
org.apache.velocity.anakia.AnakiaJDOMFactory
Scope Tags incl. facets #proj
...
API migration by project developer:
target effort
API evolution by API developer:
minimize breaking changes
The API footprint insight is dual to the API dispersion insight in the sense that it is gained through a slice of referenced API elements rather than through a table of
referencing project elements. API developers might want to gain this insight for an entire corpus of projects to minimize the impact of breaking API changes. A project
developer might want to gain this insight for a single project to decide whether a wrapper-based migration, where a wrapper of the new API has to be produced for each
referenced element, is feasible.
14. Insight: API Coupling
intent
stakeholder
view
intelligence
understand what APIs or API domains are
used in smaller project scopes
project developer
API-centric cloud, usage metrics applied
reveals potential code smell: too many APIs in small scope
on org.jhotdraw.app.AbstractView:
Basics!!Distribution!!GUI!!IO!!Component
java.lang!!java.net!!Swing!!JavaBeans!!java.io!!
APIs
API domains
Coupling in JHotDraw
for the interface org.jhotdraw.app.View
string
manipulation
view
saving
view
painting
change
notification
exceptions
during saving
helps understand design and motivation for API dependencies
Shown here is an insight that is targeted more towards project developers who would like to understand what APIs are used together in a small project scope. This insight
can be gained by configuring the tool to produce an API tag cloud for the currently selected project scope.
The one on the slide is for the AbstractView class of JH, which seems to be referencing quite a lot of different APIs.
For small project scopes, such as a method, this could be the sign of a code smell. For larger scopes, API tag clouds can also help understand the motivation behind API
dependencies. Here for instance, java.lang is referenced for string manipulation, java.net for saving a view to a URI, Swing for painting views, JavaBeans for change
notifications, and java.io for handling exceptions during the saving of a view.
15. Insight: API Profile
intent
stakeholder
view
intelligence
understand what API facets are used in varying project scopes
project developer
API-centric cloud of API facets, usage metrics applied
project scope: reveals API asbestos
smaller scope: API usage scenarios
Observation!!Input!!Exception!!
Package de.nava.informa.parsers
Observation!!Input!!
Nontrivial XML!!Manipulation Exception!!Renaming
Addition Namespaces!!Nontrivial API!!Output!!
Project informa
JDOM’s API Profile for informa
e.g., JDOM’s profile in informa
The API profile insight is similar, but is gained through a cloud of the facets of a single API used within a project scope rather than complete APIs. At the top, we see the
JDOM facets used within the entire informa project. Here, seldomly used non-trivial parts of an API reveal that the project might be difficult to change.
At the bottom, we see the JDOM facets used within a smaller scope of the project. Here, the displayed facets correspond to API usage scenarios: the parsers package reads
XML files and observes XML nodes.
16. Conclusion
described several insights to be gained about API usage
http://softlang.uni-koblenz.de/explore-API-usage
provided Quaatlas API atlas
re-engineered Qualitas projects for precise extraction of API usage
added meta-data concerning APIs, API domains, API facets
presented multi-dimensional exploration model
supported by IDE-like web-based platform Exapus
configurable views on API usage
cocktail, dispersion, distribution, footprint, coupling, profile
future work
empirical research on understanding API usage through exploration
support flow analyses in views