1. OMG - I Need a Study for my PhD
Walid Maalej – Feb 2014 – Kiel – @maalejw
2. Summary
1. Convince you of the potential of empirical research in software engineering
2. Introduce important terminology as an index to find more knowledge
3. Share my experience (best practices and pitfalls) and discuss it with you
3. Outline of my Talk
1. Motivation
2. Research Questions
3. !! Research Methods !!
4. [Data Collection and Analysis]
4. What is Empirical Research?
Systematic observation + data. The "new standard" in the SE community!
5. Other Research Approaches
• Engineering-driven: SuperDuper, Super1, Super2, MustBe1, MustBe2, MustBe3, …
• Analytical: mathematical, formal (e.g. ∧α, α ≥ √∞)
• Anecdotic
6. Goal of Empirical Studies
• Explore: understand phenomena and identify problems
• Evaluate: check and improve solutions, measure impact
7. Outline of my Talk
1. Motivation
2. Research Questions
3. Research Methods
4. [Data Collection and Analysis]
8. Define *What* You Want to Study!
• Which strategies (including steps and activities) do developers use in order to comprehend software?
• Which sources of information do developers use during program comprehension?
• Which tools do developers use when understanding programs, and in which way?
9. Iterate! Rephrase When You Are Done!
Start → Questions → Methods → Data Collection → Data Analysis → End
10. Be Concrete and Precise!
Bad: [NO research questions!]
Bad: How can we make software development more efficient?
Good: What is the impact of information dispersion on development productivity?
Bad: How do developers perceive tool integration?
Good: How do developers assess tool integration as-is?
11. Try not to solve the "World Hunger Problem" in your PhD!
12. Common Types of Questions…
• What/Which: Which tools do developers use during a bug-fixing task?
• How: How do developers proceed to fix a bug?
• Why: Why are agile methods popular in industry?
• When: When are code clones useful?
• How much / How often: How frequently do developers need to know the steps to reproduce a bug?
13. Outline of my Talk
1. Motivation
2. Research Questions
3. Research Methods
4. [Data Collection and Analysis]
15. Example: Tool Integration Revisited (Maalej 2009)
Phase 1: Exploratory, Qualitative
1. Semi-structured face-to-face interviews with engineers
2. Content analysis of project artifacts
Phase 2: Explanatory, Quantitative
3. Field experiments with engineers
4. Online questionnaire with professionals
16. How it was Presented in the Paper
Research Questions: 1. As-is assessment, 2. Problems, 3. Practices, 4. Requirements, 5. Appropriateness
Phase 1 (Exploratory, Qualitative): Repeated Interviews, Content Analysis
Phase 2 (Explanatory, Quantitative): Field Experiment, Questionnaire
30. Perfect Your Questions!
1. Remove unclear questions!
2. Put the least important last!
3. Match questions with answers
4. Think about the outliers
31. Exclude Non-Serious Subjects!
• Filter incomplete answers?
• Use "check" questions
• Remove "noise answers"
• Random order of the questions and answers
• …
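Randomizing the order of questions and answer options is easy to script when generating survey instances; a minimal sketch in Python (the questions and options are made-up placeholders, not from an actual survey):

```python
import random

def randomized_survey(questions, seed=None):
    """Return a per-respondent copy of the survey with shuffled
    question order and shuffled answer options."""
    rng = random.Random(seed)
    shuffled = []
    for question, options in questions:
        opts = list(options)
        rng.shuffle(opts)       # shuffle answer options within the question
        shuffled.append((question, opts))
    rng.shuffle(shuffled)       # shuffle the question order itself
    return shuffled

# Hypothetical questions (placeholders only)
survey = [
    ("Which tools do you use when fixing a bug?",
     ["Debugger", "Logging", "Print statements", "None"]),
    ("How often do you review code?",
     ["Never", "Seldom", "Often", "Usually"]),
]
per_respondent = randomized_survey(survey, seed=42)
```

Each respondent gets a fresh seed, so position effects average out across the sample.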
32. Motivate and Give Incentives
1. Share results (information and tools)
2. Raffle gifts
3. Offer dedicated analysis
4. Show the importance of your research
33. Use Likert or Semantic Scales for Flexibility!
[Figure: survey results table "Problems encountered due to missing knowledge", with per-item answer counts on a Never / Seldom / Often / Usually frequency scale. Selected rows: "Fixing a bug" (70.1% often-usually, mode: Usually), "Understanding others' code (e.g. for review or documentation)" (59.6%, mode: Often), "Reusing a component" (69.8%, mode: Often), "Implementing a feature" (59.0%, mode: Often); plus a fragment of a second table ("I need to know… what was the coder's intention as he wrote this", rated on a seldom/monthly, often/weekly, usually/daily scale).]
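The summaries such tables report (the mode and the combined often/usually share) are straightforward to compute from raw Likert counts; a sketch, with illustrative counts only, not the study's raw data:

```python
SCALE = ["Never", "Seldom", "Often", "Usually"]

def summarize_likert(counts):
    """counts: dict mapping a scale point to its number of answers.
    Returns the mode and the combined share of Often/Usually answers."""
    total = sum(counts.values())
    mode = max(SCALE, key=lambda point: counts.get(point, 0))
    often_usually = (counts.get("Often", 0) + counts.get("Usually", 0)) / total
    return mode, often_usually

# Illustrative counts only
counts = {"Never": 50, "Seldom": 250, "Often": 400, "Usually": 300}
mode, share = summarize_likert(counts)  # ("Often", 0.7)
```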
36. Focus on Quasi-Experimentation Instead of Representative Summaries!
[Figure: respondent demographics in four panels: (A) Development Experience (0-2, 3-5, 6-10, >10 years), (B) Size of Employer (Small: 1-5, Medium: 50-500, Large: >500 employees), (C) Types of Projects (Open Source, Closed Source, Both, Private, Public), (D) Collaborators Count incl. Team (1-7, 8-15, 16-30, >30 people); plus a repeat of the "Problems encountered due to missing knowledge" table from slide 33.]
37. OBSERVATIONS are rather…
• Objective
• Quali-/quantitative
• Exploratory
• With users and more than one researcher!
38. Observe Less, but in a Realistic Environment
How many subjects do we need?
39. Use an Observation Template!
Excerpt from "On the Comprehension of Program Comprehension" (36:5), Table II: Observation Protocol of Participant P5 (Observational Study):

Daytime | Relative time | Observation / Quote | Postponed questions
10:19 | 00:27 | Read Jira ticket; comment: "this sounds like the ticket from yesterday" | What information considered?
10:20 | 00:28 | Refresh source code repository |
10:24 | 00:32 | Publish code to local Tomcat |
10:26 | 00:34 | Debug code in local Tomcat | Why debugging?
10:28 | 00:36 | Open web application in browser and enter text into form fields |
10:29 | 00:37 | Change configuration in XML file content.xml; exclamation: "not this complicated xml file again" | How known what to change?
10:30 | 00:38 | Publish changes to local Tomcat |
10:31 | 00:39 | Debug local Tomcat |

"A single observation session lasted for 45 minutes, leaving another 45 minutes for the interview. We did not want to spend more than 90 minutes because concentration of both observed developer and observer decreases over time. In each session, one participant was observed and interviewed by one observer."

Prepare codes for observations!

"2.2.2. Online Survey. The survey focused on knowledge consumed and produced in software comprehension. Starting from the findings of several recent studies [Ko et al. 2007; Sillito et al. 2008; Fritz and Murphy 2010], we assumed that knowledge needs"
43. Talk About Your Observations (Peer Debriefing)!
• This helps to identify the relevant observations and to group them
• Avoid talking to subjects during the observation
47. What is Reliability?
• Reliability: measure correctly and reduce systematic errors. If the study were redone, the results would be the same.
• Validity: measure the right thing and reduce the risk of wrong assumptions. Results can be generalized to the population.
Why is Content Analysis Reliable?
48. Develop a Coding Guide
• Describe the coding task
• Give clear definitions and how to interpret the data
• Give examples

Excerpt from the API Knowledge Coding Guide, Version 7.2:

"You will be presented with documentation blocks extracted from API reference documentation (Javadocs and the like). For each block, you will also be presented with the name of its corresponding package/namespace, class, method, or field. Your task is to read each block carefully and evaluate whether the block contains knowledge of the different types described below. You will need to evaluate whether each block contains knowledge of each different type. Rate the knowledge type as true only if there is clear evidence that knowledge of that type is present in the block. If you hesitate about whether or not to rate a knowledge type as true, leave it as false.

Do not evaluate automatically generated information such as the declaration of an element (e.g. extends MyInterface), or generated links in "specified by". Only evaluate human-created documentation in the block (see last section on page 5 for more details).

Read the following description very carefully. It explains how to rate each knowledge type for a given block.

Knowledge Types

Functionality and Behavior

Describes what the API does (or does not do) in terms of functionality or features. The block describes what happens when the API is used (a field value is set, or a method is called). This also includes specified behavior, such as what an element does given special input values (for example, null) or what may cause an exception to be raised.

Functionality and behavior knowledge can also be found in the description of parameters (e.g., what the element does in response to a specific input), return values (e.g., what the API element returns), and thrown exceptions.

• Detects stream close and notifies the watcher
• Obtains the SSL session of the underlying connection, if any. If this connection is open, and the underlying socket is an SSLSocket, the SSL session of that socket is obtained. This is a potentially blocking operation.

Only rate this type as true if the block contains information that actually adds to what is obvious given the complete signature of the API element associated with the block. If a description of functionality only repeats the name of the method or field, it does not contain this type of knowledge and you should rate it as false, and instead rate the knowledge type non-information as true. For example, this would be the case if the documentation for a method called getTitle was

• Returns the title.

Similarly for constructors, if the documentation simply states "Constructs a new X", "Instantiates a new object", or something similar, the value is false (with non-information coded as true). In some cases non-information will be phrased to look like a description of functionality, for example with sentences that start with verbs like "gets", "adds", "determines", "initializes". Carefully read the name and signature of the API element and only assign a value of true for this knowledge type if the block adds something to the description of the element.

However, if any other details are provided, rate this type as true. For example:

• Creates a new MalformedChallengeException with a null detail message.

should get a value of true because of the additional information about the value of the message field.

Mentioning that a value can be obtained from a field, property, or getter method does not constitute a description of functionality, except when the API performs some additional functions when the value is accessed. For example, the block below does not represent a description of functionality. The non-information type for this block should be rated as true.

• [LoggerDescription.Verbosity Property] Gets the verbosity level for the logger.

Note (IMPORTANT): Description of functionality is not limited to the functionality of the element associated with the block, but the API as a whole. However, if the block explains a sequence of method calls or creation of particular objects (e.g. events), code this as Control-flow. For example, if setting the value of a field results in some perceived behavior by the framework, this knowledge counts as functionality. If the block describes a resulting sequence of method calls or events fired, this is control flow. If the block contains both, then both should be coded as true."

[Maalej & Robillard 2013] [Pagano & Maalej 2013]
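A common way to back up the reliability claim behind such a coding guide is to have two coders rate the same blocks independently and report an agreement statistic such as Cohen's kappa; a minimal sketch (the ratings below are made up, not from the study):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items
    (here: boolean 'knowledge type present' judgments)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # expected chance agreement, from each rater's marginal frequencies
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    return (observed - expected) / (1 - expected)

# Made-up ratings of 10 documentation blocks (True = type present)
a = [True, True, False, False, True, False, True, True, False, False]
b = [True, True, False, True, True, False, True, False, False, False]
kappa = cohens_kappa(a, b)  # 0.6 for these ratings
```

Kappa corrects raw agreement for agreement expected by chance; values well above 0 support the claim that the guide makes the coding reproducible.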
51.
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 39, NO. 9, SEPTEMBER 2013, p. 1264
Patterns of Knowledge in API Reference Documentation
Walid Maalej and Martin P. Robillard
Abstract—Reading reference documentation is an important part of programming with application programming interfaces (APIs).
Reference documentation complements the API by providing information not obvious from the API syntax. To improve the quality of
reference documentation and the efficiency with which the relevant information it contains can be accessed, we must first understand
its content. We report on a study of the nature and organization of knowledge contained in the reference documentation of the
hundreds of APIs provided as a part of two major technology platforms: Java SDK 6 and .NET 4.0. Our study involved the development
of a taxonomy of knowledge types based on grounded methods and independent empirical validation. Seventeen trained coders used
the taxonomy to rate a total of 5,574 randomly sampled documentation units to assess the knowledge they contain. Our results provide
a comprehensive perspective on the patterns of knowledge in API documentation: observations about the types of knowledge it
contains and how this knowledge is distributed throughout the documentation. The taxonomy and patterns of knowledge we present in
this paper can be used to help practitioners evaluate the content of their API documentation, better organize their documentation, and
limit the amount of low-value content. They also provide a vocabulary that can help structure and facilitate discussions about the
content of APIs.
Index Terms—API documentation, software documentation, empirical study, content analysis, grounded method, data mining, pattern
mining, Java, .NET
1 INTRODUCTION
Application programming interfaces (APIs) enable the reuse of libraries and frameworks in software development. In essence, an API is a contract between the component providing a functionality and the component using that functionality (the client). The syntactic information is, in all but the most trivial cases, insufficient to allow a developer to correctly use the API in a programming task. First, interfaces abstract complex behavior, knowledge of which may be necessary to understand a feature. Second, even if the behavior of a component could be completely specified by its interface, developers often need ancillary knowledge about that element: how it relates to domain terms, how to combine it with other elements, and so on [30]. This knowledge is generally provided by documentation, in particular by the API's reference documentation.
We define API reference documentation as a set of documents indexed by API element name, where each document specifically provides information about an element (class, method, etc.). For example, the API documentation of the Java Development Toolkit (JDK) is a set of web pages, one for each package or type in the API. Although many forms of API documentation exist, there is usually a clear distinction between reference documentation and other forms of documentation with a more pedagogical intent (e.g., tutorials, books, and FAQs).
Reference documentation is a necessary and significant
part of a framework. For example, the reference documentation of the JDK 6 (SE and EE) totals over three million
words, or six times the length of Tolstoy’s epic novel War
and Peace. Reference documentation also plays a crucial role
in how developers learn and use an API, and developers
can have high expectations about the information they
should find therein [14], [30]. Empirical studies have
described how developers have numerous and varied
questions about the use of APIs (see Section 8). Efficient
representation and access of knowledge in API reference
documentation is therefore a likely factor for improving
software development productivity.
Most technology platforms exposing APIs provide a documentation system with a uniform structure and look-and-feel for presenting and organizing the API documentation. For example, Java APIs are documented through Javadocs, documentation for Python modules can be generated with the pydoc utility, and Microsoft technologies, whose documentation is available through the MSDN website, follow the same look-and-feel. Unfortunately, no standard and few conventions exist regarding the content of reference documentation. For example, an early article explains the rationale behind Javadocs and gives a set of conventions for what should and should not be part of Javadocs [20]. In practice, however, these conventions are
56. In Software Engineering…
• Automation tools transform A -> B
• We often have A and B!
• We can use the tool with A and check whether the output is B
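This pattern (existing A/B pairs as a free oracle) can be sketched generically; the classifier and the "mined" pairs below are toy stand-ins, not a real tool or dataset:

```python
def evaluate_tool(tool, pairs):
    """Run `tool` on each known input A and compare against the
    known expected output B; returns the fraction of matches."""
    hits = sum(tool(a) == b for a, b in pairs)
    return hits / len(pairs)

# Stand-in "tool": a commit-message classifier simulated by a keyword
# rule; the (A, B) pairs mimic labels mined from project history.
def toy_classifier(message):
    return "bugfix" if "fix" in message.lower() else "other"

history = [
    ("Fix NPE in parser", "bugfix"),
    ("Add login feature", "other"),
    ("Hotfix for release build", "bugfix"),
    ("Update docs", "other"),
]
accuracy = evaluate_tool(toy_classifier, history)  # 1.0 on this toy data
```

The same loop works for any A -> B tool: traceability recovery checked against known links, bug prediction checked against later fixes, and so on.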
57. Sometimes We Have the Data… What Can We Do With It?
Examples:
• Revision history >> bug prediction
• Bug data >> link to source code
• Interaction data >> ??
Mixing the exploration and evaluation task
58. EXPERIMENT / USER STUDY are rather…
• Quantitative
• Evaluative
• Involve users
• In a lab setting?
59. Build a Control and an Experimental Group
With and without the tool (the "Aspirin"). Problems?
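Assignment to the two groups should be random and, for small samples, balanced in size; a minimal sketch (the subject names are placeholders):

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into a control and an experimental
    group of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (control, experimental)

subjects = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]
control, experimental = assign_groups(subjects, seed=7)
```

Blocking on a covariate such as experience before shuffling is a common refinement when groups are small.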
64. Outline of my Talk
1. Motivation
2. Research Questions
3. Research Methods
4. [Data Collection and Analysis]
65. Data Collection is Expensive
• Probably the most painful part, and expensive to redo!
• Try to use and reuse existing data before collecting new data!
• Reuse data from related work; ask the authors!
• Plenty of data in open source repositories!
67. Avoid Reporting too Many Numbers
[Example removed for privacy reasons]
• Use an appendix
• Share data online
68. Use Simple Visualization!
[Chart: "Channels Used to Access Knowledge", answers on a semantic scale from Never (-2) over Seldom (-1) to Often (+1) and Usually (+2).]

Channel | Often-Usually Count (%) | Mean
Other People | 1170 (81.6%) | 1.0
Project Artifacts
  Issue and bug reports | 895 (63.1%) | 0.4
  API description | 1076 (75.8%) | 0.9
  Comments in source code | 990 (69.3%) | 0.6
  Commit messages | 525 (38.1%) | -0.4
Personal Artifacts
  Personal notes (logbooks, diaries, post-its) | 339 (24.1%) | -0.9
  Personal e-mails | 906 (42.8%) | -0.2
  Work item/task descriptions (To-dos) | 710 (50.3%) | 0.0
Knowledge Management Systems
  Intranet | 437 (31.8%) | -0.7
  Project or organization Wiki | 532 (38.5%) | -0.5
  Experience databases / groupware systems | 181 (13.4%) | -1.0
Internet
  Forums and mailing lists | 742 (52.7%) | 0.1
  Web search engines (e.g. Google, Yahoo) | 1170 (81.4%) | 1.0
  Public documentation / web pages I know | 1081 (76.0%) | 0.9
69. Discuss Findings and Limitations!
Excerpt (Empir Software Eng):
Fig. 9: Dependencies between blogs and commits in terms of time
[…] between commit messages and blog posts. Based on this result, we calculated the average
time period for each grade. Figure 9 shows the results. The strength of dependency
between a commit message and a blog post decreases with an increasing time period
between the commit and the post.
To summarize, developers also use blogs to summarize their work. They are more
likely to publish information about recent activities they have performed than about
old activities.
6 Discussion
In this section we highlight three main findings. First, we discuss the importance of
blogs as a project medium and blogging as a project function. Second, we discuss
the purpose of blogging in open source software projects based on our results, differentiating between blogging committers and other stakeholders. Finally, we derive
insights for future research, in particular how to integrate blogs into development
environments and blogging into developers’ workflows as well as how to dissolve
boundaries between developers and other stakeholders.
6.1 Blogging is a Project Function
In all studied open source communities we observed regular and frequent blogging activities for several years and across many releases. This is not surprising, as
blogs became one of the most popular media for sharing and accessing software
engineering knowledge in the last years (Parnin and Treude 2011). While individual
developers only blog occasionally, the community as a whole constantly shares
information and produces an average of up to six blog posts per day. These posts
are written equally by committers as well as other community members.
Unlike committers in large open source projects, which have been studied quite
thoroughly (e.g. Mockus et al. 2002), other community members are less researched.
This non-committing group includes not only actual users of the software, but also
other stakeholders such as evangelists, community coordinators, companies’ proxies,
and managers. Evangelists might have created the project a long time ago. They have
large experience and special interests in the success of an open source project, and
therefore advertise it and demonstrate its usefulness. Managers and coordinators
might be hired by the community to plan releases or organize conferences. Crowston
topics and their popularities. Developers as well as other stakeholders discuss
requirements, implementation, and community aspects. On the one hand, developers
report about their recent development activities to communicate their project work
to a broad audience, including users and other stakeholders. On the other hand
users and other stakeholders seem to have their blogging peak time shortly after new
versions are released—reporting on their experiences with the new changes. Utilizing
these experiences and the volunteered resources provides a huge benefit for software
projects. We claim that communities should be created systematically and integrated
in software systems utilizing social media such as blogs. In (Maalej and Pagano 2011)
we envision a software framework that enables the development and maintenance of
such social software.
7 Results Validity
7.1 External Validity
Although our study was neither designed to be generalizable nor representative
for all developers and communities, we think that most of the results have a high
degree of generalizability, in particular for large open source communities. At
the design time of the study, we knew neither the entire population of software
development blogs, nor of blogging developers. Therefore we were unable to study
a random representative sample of blogs and bloggers. Instead, we aimed at a rather
exploratory, hypothesis-generating study to better understand blogs, their usage, and
role in the development project. The four studied communities should rather be
interpreted as four cases than as one homogeneous dataset. However, the careful
selection of these communities, their leading role in open source software, and the
large number of their blogs and bloggers give confidence that many of the results
apply for other comparable communities as well.
We think that our results are representative for each of the studied communities
due to the following reasons.
– Our datasets include all community blogs from the last seven years.
– We conducted statistical tests to check the statistical significance of our results and exclude hazard factors.
– We got similar results using different analysis methods (e.g. descriptive statistics and topic analysis).
– In two of the studied communities (Eclipse and GNOME), we were able to contact three senior active members. Among them were both committers, who had contributed for around three to four years (92 to over 600 commits each), and evangelists, who had been involved at least three years in the community. While discussing the results in detail they confirmed the findings based on their experiences.
Nevertheless, there are three limitations which should be considered when interpreting the results. First, for Eclipse and GNOME we were unable to analyze the blogs of