Empirical Methods in
Software Engineering
Alessio Ferrari, ISTI-CNR, Pisa, Italy

alessio.ferrari@isti.cnr.it
April, 2020
What is Software
Engineering?
• Software engineering is the systematic design and
development of software products and the management
of the software process.

• Software engineering has as one of its primary objectives
the production of programs that meet specifications, are
demonstrably accurate, produced on time, and within
budget.
cf. D. O’Neil, 1980. https://doi.org/10.1147/sj.194.0421
The Scope of Software Engineering (SE)
[Figure: the roles around a software system (customer / user, analyst, system designer, developer, tester, maintainer, customer service) and what they exchange (requirements, software, "apparently good" software)]
Typical SE Problems
How can I find bugs in my code?
How can I improve software
development speed?
How can I reduce the resources
dedicated to testing?
How can I improve my
requirements?
Typical SE Solutions (when we were not Empirical)
How can I find bugs in my code?
How can I improve software
development speed?
How can I reduce the resources
dedicated to testing?
How can I improve my
requirements?
This new testing
environment will allow you
to find all the bugs
Let us use this
new prototypical programming
language
We can use a controlled language
Let’s use model checking
Typical SE Failures
It is very complex! I need to re-
train all my team!
The language does not cover my
real cases!
The language does not allow me
to express what I want!
I need training! It takes too long!
Language is too strict!
The Software Engineer’s Illusion
We thought our tiny solutions would scale up to (ALL)
real-world cases, we thought a successful simple
example was sufficient to ensure that our idea was
working…
We were convinced that we could smoothly pass from theory to practice
Problem
Solution
We thought we could change the world without knowing the world
The Hard Truth
• Of course, we were wrong…

• Software development is a complex, context-dependent phenomenon involving
multiple stakeholders, professionals, needs, technologies, domains (aircraft software,
mobile app to track your diet, enterprise software to manage workflow, you name it…)

• You rarely start the development from scratch (there may be legacy systems to
refactor, need to interact with external systems and databases)

• Even if the software to be developed is new, developers have specific backgrounds
and skills that have an impact on the development

• You rarely know how the project will go, as the context is surely going to change
throughout the project

• We thought it was too simple; SE solutions rarely came from research, and as SE
researchers we felt useless…
We understood that, to change the world,
we should first learn about the world
Empirical Software
Engineering Research
• The use of a (not “the”) scientific method to investigate software
engineering problems (there is no “official” scientific method)

• Start from observation, formulate hypothesis, select methodology, validate hypothesis with respect to reality

• The simple idea is that if I understand how things work in practice, I can
find ways to improve them
• Knowledge and understanding are not regarded as a final goal, but a
means to an end, where the end is solving real-world problems (within
the scope of software engineering)

• Scientifically evaluating whether my solution has solved the problem is
also empirical software engineering
Typical Empirical Software
Engineering Cycle
Observe
Reality
Formulate
Problem Theory
Evaluate Theory
Against Reality
Formulate
Solution Theory
Evaluate Solution
Against Reality
Theory
Space
Reality
Space
Typical Empirical Software
Engineering Cycle (example)
• Observe Reality: “I have a lot of bugs in this software”; “People work hours and hours, but still, lots of bugs”
• Formulate Problem Theory: bugs may be produced by too many work hours; bugs may be associated with complex code
• Evaluate Theory Against Reality: “I see that most bugs are introduced between 8pm and 8am”; no relation with code complexity
• Formulate Solution Theory: install a system that prevents developers from working at night
• Evaluate Solution Against Reality: “I reduced the number of bugs!”
• Observe Reality (second iteration): “I cannot meet the delivery deadlines!”
• Formulate Problem Theory: development speed may be lower during the day
• Evaluate Theory Against Reality: more correct code during the day, but slower speed; more bugs during the night, but faster speed
• Formulate Solution Theory: dedicate more testing resources to code developed at night
• Evaluate Solution Against Reality: fewer bugs, more software
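As an illustration of the "Evaluate Theory Against Reality" step in the example, the sketch below compares the share of buggy changes made at night (8pm to 8am) with those made during the day. The change records are hypothetical and invented for illustration; in a real study they would come from a version-control or issue-tracking system.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (commit timestamp, whether the change later proved buggy).
changes = [
    (datetime(2020, 4, 1, 10, 15), False),
    (datetime(2020, 4, 1, 14, 30), False),
    (datetime(2020, 4, 1, 22, 5), True),
    (datetime(2020, 4, 2, 2, 40), True),
    (datetime(2020, 4, 2, 9, 0), True),
    (datetime(2020, 4, 2, 23, 50), False),
]


def is_night(ts):
    """Night is 8pm to 8am, as in the example."""
    return ts.hour >= 20 or ts.hour < 8


def bug_rate_by_period(records):
    """Return the fraction of buggy changes for 'day' and 'night'."""
    totals, buggy = Counter(), Counter()
    for ts, is_buggy in records:
        period = "night" if is_night(ts) else "day"
        totals[period] += 1
        buggy[period] += int(is_buggy)
    return {period: buggy[period] / totals[period] for period in totals}


rates = bug_rate_by_period(changes)
print(rates)
```

A difference between the two rates supports the problem theory only tentatively; with so few observations it says nothing about causes, which is why the cycle continues with further evaluation.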
Typical SE Problems
How can I find bugs in my code?
How can I improve software
development speed?
How can I reduce the resources
dedicated to testing?
How can I improve my
requirements?
Problems are the same…
Typical SE Solutions (Today)
The way of approaching problems
is different…
Let’s see how you write
requirements now
Let’s see what the
main quality problems of your
requirements are
Solutions are more context-specific,
start from reality,
and use a scientific method
How can I improve my
requirements?
I see your
requirements language is clear, but
requirements are just incomplete! How many
meetings do you normally do with your
customer?
… Ok, let’s try to
schedule a meeting each week to
revise the requirements with the
customer
But WHICH method?
Software Engineering as a
Strange Creature
It has a technical facet
but also a human
and social facet
Empirical Methods in
Software Engineering
Come from hard sciences
(mostly quantitative)
…but also from
social sciences (mostly
qualitative)
experiment
with human subjects
literature review /
archival analysis
interview
and ethnography
survey
case study
field study
experiment
with software subjects
judgment study
Goal and Scope
of this Course
• To learn a set of methods commonly used in empirical
software engineering research

• To learn when to use a certain scientific method

• To learn how to combine different methods
Remember that all methods are FLAWED!
All Methods Are Flawed! 

(from Steve Easterbrook)
• Experiments
• Real-world environment is simplified, as I have to focus only on a specific set of
variables (independent, dependent, controlled variables, we will see them later)

• Surveys
• People tell you what they think, not what they do, and it is hard to be sure that they
have correctly understood the questions

• Interviews and Ethnography
• Unavoidable researcher bias, as theories are derived from qualitative data

• Case Studies (which could include (quasi-)experiments, surveys, interviews, etc.)
• Hard to generalise and hard to separate environment from unit of analysis, as case
studies are real-world experiences (several confounding variables) in one or a few
companies (limited scope)
http://www.cs.toronto.edu/~sme/CSC2130/
Never stick to methodological purity!
WARNING: there is no acknowledged taxonomy for these methods!
Course Outline
• Overview of Empirical Methods

• Interviews and Ethnography

• Surveys

• Systematic Literature Reviews

• Qualitative Data Analysis Methods

• Experiments, Quasi-experiments and Hypothesis Testing

• Mining Software Repositories

• Case Studies and Action Research
Reference Book
Roles and Tasks in
Software Engineering
These are the things that you will study
as an Empirical Software Engineer
Roles in SE
• Board of Directors: A group of people, elected by stockholders, to establish corporate policies, and make
management decisions (can also be a single person in case the co)
• Managers: three different levels of management may be present in a large company (low, middle, top)

• Top-level managers are responsible for controlling and overseeing the entire organization.

• Middle-level managers are responsible for executing organizational plans which comply with the company’s
policies. These managers act as an intermediary between top-level management and low-level management.

• Low-level managers focus on controlling and directing. They serve as role models for the employees they
supervise.

• Customers: the ones who buy the system
• Users: the ones who use the system
• Requirements/Business Analysts: the ones that gather requirements from customers and users
• Designers and Architects: the ones that design the system at the high level
• Developers: the ones who code
• Testers: the ones who test the code
Companies may include only a subset of the roles
Some roles may be covered by the same person
The roles may depend on the adopted software process!
(Main) Tasks in Software Engineering
• Requirements Elicitation and Analysis

• Software Architecture

• Software Development 

• Software Testing

• Software Documentation

• Software Maintenance

• Software Process Management
Formulating Research
Questions
cf. R. Feldt http://www.robertfeldt.net/advice/guide_to_creating_research_questions.pdf
Research Question(s)
• Every research endeavour starts with a question about the world: a problem to solve, a curiosity about some
observed fact (subconsciously related to something relevant that you may not always be able to articulate, e.g.,
why do developers prefer to work at night? Why are you asking this question? Because it’s interesting, but why
is it so?), or a curiosity about some unknown fact (which are the most frequent defects in open-source code?)

• The research question is the inquiry that guides your research:

• e.g., Which are the most frequent defects in code developed by people with less than 6 months of experience?
Which are the most frequent defects in code developed by people with 6 months to 3 years of experience? […]
• You normally structure your research and reporting according to one or more research questions: they help to
clarify your GOAL to the reader but also TO YOU

• If you have more than one research question, it is good to establish a general research question (or research
objective):

• e.g., (mainly considering HOW aspects) To what extent are certain defect types related to the degree of
experience of the developer?

• e.g., (a more general one, may also include WHY aspects) What is the relationship between defect types and
degree of experience of the developer?
Many times a clear formulation of the general research question
comes AFTER the formulation of the more specific research questions
TIP: sometimes you can formulate the general research question
as a Research Objective, e.g.: Understanding to what extent certain defect types are
related to the degree of experience of a developer
Types of Research Questions (from Robert Feldt)
Research Questions (RQs)
• Solution-focused: Creating, Refining
• Knowledge-focused
  • Exploratory: Existence, Descriptive, Comparative
  • Base-rate: Frequency, Process
  • Relationship: Existence, Causality (Causality/Comparative, Causality/Context)
http://www.robertfeldt.net/advice/guide_to_creating_research_questions.pdf
• Exploratory: if not much is known about the phenomenon under study, we want to create tentative theories
and give some evidence that a certain phenomenon can be measured
(e.g., To which extent do developers get tired of coding?)
• Base-rate: describe when and how the phenomenon under study appears (normal patterns), when we already
have a well-defined problem and context
(e.g., When do developers get tired of coding?)
• Relationship: describe how the phenomenon under study relates to other phenomena
(e.g., Why do developers get tired of coding?)
• Solution-focused: describe better ways to solve a problem or improve a situation
(e.g., Which strategies help to achieve X? How can we refine S to achieve X in a better way?)
Sub-Types of RQs and Examples
• Exploratory/Existence: “Does X exist?”, “Is Y something that software engineers really do?”
• Exploratory/Descriptive: “What is X like?”, “What are its properties/attributes?”, “How can we categorize/measure X?”, “What are the components of X?”
• Exploratory/Comparative: “How does X differ from Y?”
• Base-rate/Frequency: “How often does X occur?”, “What is an average amount of X?”
• Base-rate/Process: “How does X normally work?”, “What is the process by which X happens?”, “In what sequence do the events of X occur?”
• Relationship/Existence: “Are X and Y related?”, “Do occurrences of X correlate with Y?”, “What correlates with X?”
• Relationship/Causality: “What causes X?”, “Does X cause Y?”, “Does X prevent Y?”
• Causality/Comparative: “Does X cause more Y than Z does?”, “Is X better at preventing Y than Z is?”
• Causality/Context: “Does X cause more Y under one condition than others?”
Creating Research Questions
• Select overarching research topic (e.g., software development speed)

• Do you want to create more and better understanding (Knowledge-based), or are you seeking a solution to a problem (Solution-based)?

• (Knowledge-based) e.g., what affects development speed? how can we measure development speed?
• How much is known about the topic?

• Not much (Exploratory): how can we measure development speed?

• We know the phenomenon, but not how or when it occurs (Base-rate): what is the
development speed of agile teams?
• We know the phenomenon, but not its causes (Relationship): what affects development
speed?
• (Solution-based) e.g., how can I improve development speed? what is the easiest way to improve
development speed?
• One research question is not sufficient and you need a combination of them, so try to find a main
research question or objective, and identify sub-questions, e.g., by checking the types of questions in
the previous table and adapting them to your problem
Data Types,
Measure, Scales
cf. Wohlin et al., 2012, https://doi.org/10.1007/978-3-642-29044-2
Empirical Methods in
Software Engineering
experiment
with human subjects
literature review /
archival analysis
interview
and ethnography
survey
case study
field study
experiment
with software subjects
judgment study
Empirical inquiries entail OBSERVATION
Regardless of the research method
you use, you will need to collect
and analyse data
Qualitative and Quantitative
Types of Data
• As you are doing empirical research, you will need to
collect data, regardless of the method you use

• Qualitative data (aka WORDS) come from interviews,
surveys but also from other sources that may be relevant
for SE, such as social-media opinions, code comments,
app reviews

• Quantitative data (aka NUMBERS) come from
measurements (also done on qualitative data)
Measure
• A MEASURE is a mapping from the attribute of an entity to a
measurement value, which can be numerical or categorical (a label)

• Entities are objects we can observe in the real world, and have attributes

• entity: source code; 

• attribute: complexity; 

• measure A: lines of code; value A: 1000 

• measure B: evaluation made by user; value B: “very complex”

• The purpose of mapping the attributes into a measurement value is to characterize
and manipulate the attributes in a formal way.

• To be valid, the measure must not violate any necessary properties of the attribute it
measures and it must be a proper mathematical characterization of the attribute (if
code X is more complex than code Y, this should be reflected in the measure)
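The entity / attribute / measure / value mapping above can be sketched as a small data structure. The class name, the LOC counting convention (non-blank, non-comment lines), and the sample snippet are assumptions made for illustration, not definitions from the course.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Measurement:
    """One mapping from an attribute of an entity to a measurement value."""
    entity: str
    attribute: str
    measure: str
    value: Union[int, float, str]  # numerical or categorical (a label)


def lines_of_code(source: str) -> int:
    """Count non-blank lines that are not pure '#' comments (one possible convention)."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


# Hypothetical source code used as the entity.
snippet = "x = 1\n\n# comment\ny = x + 1\nprint(y)\n"

measure_a = Measurement("source code", "complexity", "lines of code", lines_of_code(snippet))
measure_b = Measurement("source code", "complexity", "evaluation made by user", "very complex")

print(measure_a)
print(measure_b)
```

Both measurements describe the same attribute of the same entity; they differ in how the attribute is mapped to a value, which is what the notion of scale captures.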
Scale
• A mapping of an attribute to a measurement value can be
done in different ways, and each way is a scale

• Complexity can be measured in lines of code (LOC) or in
“evaluation made by user”, these are different scales
[Figure: Entity (Source Code) → Attribute (Complexity) → Measure → Measurement Value (1000), expressed on a Scale (LOC)]
Scale Types (Level of Measurement)
• Nominal (named values): maps the attribute of the entity into a name or symbol; can be
seen as a form of classification of the attribute (e.g., types of code defects)

• Ordinal (named and ordered values): the ordinal scale ranks the entities after an
ordering criterion (“greater than”, “better than”, and “more complex”), (e.g., catastrophic,
critical, marginal, negligible risk)
• Interval (named, ordered and proportionate intervals): the interval scale is used when
the difference between two measures is meaningful, but the value itself is not
meaningful. 

• This scale type orders the values in the same way as the ordinal scale but there is a
notion of “relative distance” between two entities 

• Rare in SE, temperature in Celsius is a typical interval scale, but you can set up a
scale like the IQ (Intelligence Quotient) also in SE (e.g., usability scale based on a test)
• Ratio (named, ordered, proportionate intervals, have a meaningful zero): if there
exists a meaningful zero value (negative values do not exist) and the ratio between two
measures is meaningful (e.g., lines of code is a ratio scale)
Scales: Time and Duration
• (Clock) Time and Duration: what types of scale are they?
• Time is an interval scale
• Duration is a ratio scale
• Time is an interval measure when using any standard calendar and time
measurement system as there is no fixed start point

• 2018/10/23:20:10 CE and 2018/10/23:20:20 CE; there is a 10 second gap but
the latter is not twice the former and there is no meaningful 0

• Duration (the amount of time something takes) is a ratio measure as it has a
meaningful zero

• 20 seconds is twice as long as 10 seconds and 10 days is twice as long as 5
days.
cf. https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/
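The interval/ratio distinction is visible in Python's standard `datetime` module, which is used here only as an illustration. Two clock times can be subtracted but not divided, while two durations can be divided. The timestamps below are hypothetical.

```python
from datetime import datetime, timedelta

# Clock time (interval scale): differences are meaningful, ratios are not.
t1 = datetime(2018, 10, 23, 20, 0, 10)
t2 = datetime(2018, 10, 23, 20, 0, 20)
gap = t2 - t1  # subtracting two clock times yields a duration

try:
    t2 / t1  # dividing clock times is not defined
    ratio_defined = True
except TypeError:
    ratio_defined = False

# Duration (ratio scale): there is a meaningful zero, so ratios make sense.
d1 = timedelta(seconds=10)
d2 = timedelta(seconds=20)
ratio = d2 / d1  # dividing two durations yields a plain number

print(gap, ratio_defined, ratio)
```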
Scale Types and Power
cf. https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/
Different scale types imply different allowed operations
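A minimal sketch of what "allowed operations" means in practice, using one typical summary per scale type: mode for nominal, median for ordinal, mean and differences for interval, ratios for ratio. All data values are hypothetical; the risk levels reuse the ordinal example given earlier.

```python
import statistics

# Nominal: values can only be counted and compared for equality, so use the mode.
defect_types = ["logic", "interface", "logic", "data", "logic"]
most_common_defect = statistics.mode(defect_types)

# Ordinal: values can be ranked, so the median is meaningful (the mean is not).
RISK_ORDER = ["negligible", "marginal", "critical", "catastrophic"]
risks = ["marginal", "critical", "negligible", "critical", "catastrophic"]
risk_ranks = [RISK_ORDER.index(r) for r in risks]
median_risk = RISK_ORDER[statistics.median_low(risk_ranks)]

# Interval: differences and means are meaningful, ratios are not.
celsius = [20.0, 25.0, 30.0]
mean_temperature = statistics.mean(celsius)
temperature_range = max(celsius) - min(celsius)

# Ratio: a meaningful zero exists, so "twice as large" makes sense.
loc_small, loc_large = 500, 1000
size_ratio = loc_large / loc_small

print(most_common_defect, median_risk, mean_temperature, temperature_range, size_ratio)
```

Each scale type also permits the operations of the weaker types above it: lines of code, for example, can be ranked and averaged as well as divided.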
Measure Types
• Objective: an objective measure is a measure where there is no judgement in the
measurement value and is therefore only dependent on the entity that is being
measured. 

• An objective measure can be measured several times and by different researchers,
and the same value can be obtained within the measurement error. 

• Subjective: a subjective measure is the opposite of the objective measure. The person
making the measurement contributes by making some sort of judgement. The measure
depends on both the entity and the viewpoint from which they are taken. 

• A subjective measure can be different if the entity is measured again. A subjective
measure is mostly of nominal or ordinal scale type.
• Direct: does not involve measurements on other attributes (e.g., LOC).
• Indirect: is derived from the other measurements of other attributes, possibly involving
more than one entity (e.g., defect density, productivity).
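As an example of an indirect measure, defect density combines two direct measures: a defect count and a size in lines of code. The sketch assumes the common "defects per thousand lines of code" definition; the function name and the numbers are illustrative.

```python
def defect_density(defects: int, loc: int) -> float:
    """Defects per 1000 lines of code (KLOC), derived from two direct measures."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    if defects < 0:
        raise ValueError("defects must be non-negative")
    return defects / (loc / 1000)


# Hypothetical module: 12 defects found in 4000 lines of code.
density = defect_density(defects=12, loc=4000)
print(density)
```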
Measurements in SE
• In SE we normally measure three classes of entities

• PROCESS: The process describes which activities
are needed to produce the software.

• PRODUCT: The products are the artifacts, code,
deliverables or documents that result from a process
activity.

• RESOURCES: Resources are the objects, such as
personnel, hardware, budget, needed for a process
activity.
Measurements in SE
• Relevant measures in SE are often indirect, subjective
and are normally expressed in nominal or ordinal scale

• Most of the times we want to link some internal attribute
(e.g., code size, colours of the GUI) to an external one
(e.g., perceived complexity, usability)

• In principle we could not apply advanced statistical
analysis when we deal with these measures…however,
we do it anyway (but we should always reflect on the risks
and on the value of our conclusions)
cf. Briand et al. https://doi.org/10.1007/BF00125812
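When an internal attribute (e.g., code size) is to be linked to an external one measured on an ordinal scale (e.g., perceived complexity), a rank-based statistic uses only the ordering of the values. The sketch below computes Spearman's rank correlation with the textbook formula for data without ties; the data are hypothetical, and real ratings usually contain ties, which this sketch deliberately rejects.

```python
def ranks(values):
    """Return 1-based ranks; this sketch assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("ties are not handled in this sketch")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman's rank correlation for two equally long samples without ties."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data: code size (ratio scale) and perceived complexity
# (ordinal scale, encoded as positions in an ordered list of labels).
LEVELS = ["very low", "low", "medium", "high", "very high"]
code_size = [120, 300, 450, 800, 1500]
perceived = ["very low", "medium", "low", "high", "very high"]
perceived_levels = [LEVELS.index(p) for p in perceived]

rho = spearman_rho(code_size, perceived_levels)
print(rho)
```

A high value indicates that larger modules tend to be rated as more complex in this sample; it does not show that size causes perceived complexity.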
Data Collection
Techniques in SE
cf. Lethbridge et al., 2005, https://doi.org/10.1007/s10664-005-1290-x
Data Collection
• Before measuring you need to collect data that can be relevant to your research
questions

• Depending on the question, you may need different data collection techniques

• Normally, the data collection technique is also driven by the context that you CAN
access: 

• are you in contact with a company and can you interview people? —interview

• are you in contact with the company and you can create meetings with their
developers? —focus group

• you do not have any direct contact with companies, but you can reach some
people? —questionnaire

• you need to compare the performance of different tools, which licenses do you
have? Can you buy them? —static and dynamic analysis of a system
Reaching the (RIGHT) source of information is one of the hardest parts…
Data Collection Techniques
cf. Lethbridge et al., 2005, https://doi.org/10.1007/s10664-005-1290-x
Table 1. Data collection techniques suitable for field studies of software engineering.
First Degree (direct involvement of software engineers)
  Inquisitive techniques:
  • Brainstorming and Focus Groups
  • Interviews
  • Questionnaires
  • Conceptual Modeling
  Observational techniques:
  • Work Diaries
  • Think-aloud Protocols
  • Shadowing and Observation
  • Synchronized Shadowing
  • Participant Observation (Joining the Team)
Second Degree (indirect involvement of software engineers)
  • Instrumenting Systems
  • Fly on the Wall (Participants Taping Their Work)
Third Degree (study of work artifacts only)
  • Analysis of Electronic Databases of Work Performed
  • Analysis of Tool Use Logs
  • Documentation Analysis
  • Static and Dynamic Analysis of a System
Data Collection Techniques
Humans tend not to be reliable reporters, as they often do not remember past events
with a high degree of accuracy. Records of activities, such as tapes, work products, and
repositories, tend to be more reliable. However, care must be taken when interpreting
these data sources as they may not be consistent, internally or with each other.
Despite their drawbacks, first degree techniques are invaluable because of their
flexibility and the phenomenon they can be used to study. Existing logs and repositories
are easy to use but the data available is highly constrained. Software engineers, on the
other hand, can be asked about a much wider range of topics. Second degree techniques
Figure 1. Cost, reliability, flexibility, and phenomena addressed.
From people intensive to data intensive:
• DIRECT involvement of people, e.g., brainstorming, questionnaires
• INDIRECT involvement of people, e.g., instrumentation
• NO INVOLVEMENT, e.g., analysis of data logs
1st Degree Techniques:
Direct Involvement
• Inquisitive:
• Brainstorming / Focus Groups

• Interviews

• Questionnaires / Surveys

• Conceptual modelling

• Observational:
• Think Aloud

• Shadowing / Observation

• Participant Observation
Brainstorming
and Focus Groups
• What are they: based on a simple trigger question, people are free to express whatever comes to their
mind, initially on paper, and then take turns to speak.

• Advantages:
• useful when you are new to a domain and seeking ideas for further exploration

• rapidly identifying what is important to the participant population

• sense of involvement in the research
• Disadvantages:
• can become unfocused

• hard to schedule with busy developers (you need to stop the activity of many people)

• Example: understanding factors leading to success and failure of software process improvement.
Researchers involved 13 software companies and ran 49 focus groups. Each group comprised
between 4 and 6 participants, and each session lasted 90 minutes. There were three types of
groups: senior managers, project managers, and developers. The focus groups were moderated and
tackled very specific questions about the factors leading to success and failure
of software process improvement.
Interviews
• What are they: ask a series of questions to some relevant actor of the software process
• Advantages:
• People are familiar with question-answering

• People tend to be happy when someone asks about them

• Create rapport with people

• Possibility to clarify

• Disadvantages:
• People are not always reliable, and this can bias the results

• Difficulties in sampling (random sampling often not applicable)

• Time consuming: scheduling, data transcription, etc.

• Example: a study of the design process used on 19 different projects at various organizations. The
researchers interviewed personnel at three different levels of the participating projects: systems engineers,
senior software designers and project managers. They conducted 97 interviews, which
resulted in over 3000 pages of transcripts of the audio recordings.
Questionnaires and Surveys
• What are they: written pre-defined questions to be answered by people.
• Advantages:
• quick and easy to administer

• reach more people

• Disadvantages:
• difficult to clarify questions and answers

• return rates can be low (10% normally, 20% if you’re lucky)

• Example: paper-based questionnaire to identify factors affecting a certain
tool adoption in 52 organizations. The author contacted organizations who
had purchased the tools and surveyed key information systems personnel
about the use of the tool.
Surveys and questionnaires are treated as synonyms here
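The return rates quoted on this slide translate directly into how many questionnaires must be sent out. A minimal sketch of that arithmetic follows; the helper name `invitations_needed` and the target of 100 answers are illustrative assumptions, and rates are given as integer percentages to avoid floating-point rounding.

```python
def invitations_needed(target_responses: int, return_rate_percent: int) -> int:
    """Smallest number of invitations expected to yield the target number of answers."""
    if target_responses < 0:
        raise ValueError("target_responses must be non-negative")
    if not 0 < return_rate_percent <= 100:
        raise ValueError("return_rate_percent must be in (0, 100]")
    # Integer ceiling division: ceil(target * 100 / rate)
    return -(-target_responses * 100 // return_rate_percent)


# Hypothetical target: 100 completed questionnaires
for rate in (10, 20):
    print(f"{rate}% return rate: send {invitations_needed(100, rate)} questionnaires")
```

The point of the sketch is planning: at the typical 10% rate the contact list must be ten times larger than the sample you hope to analyse.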
Conceptual Modelling
• What is it: participants create a diagram of some aspect of their work, often a system architecture
or organisational structure or process. The intent is to bring to light their mental models.
• Advantages:
• easy to collect (drawing)

• can explain systems that are hard to understand otherwise

• Disadvantages:
• requires domain knowledge to be interpreted

• can be hard to convince the engineers or other subjects to draw details

• Example:
• Identify the process in terms of tools, actors and tasks, for performing reimbursement of
expenses in a public administration office. The goal was to re-engineer the process. Interviews
with personnel to gather information, graphical diagrams shown to the personnel, and
validation of the diagrams.
Conceptual modelling requires interviews or focus groups
Work Diaries
• What are they: participants are asked to record various events that occur during the day, by filling out a
form at the end of the day, recording specific activities as they occur, or noting whatever the current
task is at pre-selected times.
• Advantages:
• better self-reports of events because they record activities on an ongoing basis rather than in
retrospect

• you can randomly sample work diary moments

• Disadvantages:
• you need to convince people

• can interfere with respondent as they work (recording can affect the work)

• people could neglect to record

• Example: I want to know the communication patterns in a company (who contacts whom, and about
what). I ask developers to record their communications for a period of one
week. From the diaries I identify the interactions between team members and the typical communication
patterns of developers.
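Random sampling of diary moments can be prepared in advance. The sketch below draws random prompt times within a working day; the function name, the 9:00 to 17:00 working day, and the five prompts per day are assumptions for illustration, not part of the original study design.

```python
import random
from datetime import datetime, timedelta


def sample_diary_prompts(day_start, day_end, n_prompts, seed=None):
    """Return n_prompts distinct, sorted prompt times between day_start and day_end."""
    rng = random.Random(seed)
    total_minutes = int((day_end - day_start).total_seconds() // 60)
    # Sample minute offsets without replacement so no two prompts coincide
    offsets = rng.sample(range(total_minutes), n_prompts)
    return sorted(day_start + timedelta(minutes=m) for m in offsets)


# Hypothetical working day
start = datetime(2020, 4, 6, 9, 0)
end = datetime(2020, 4, 6, 17, 0)
prompts = sample_diary_prompts(start, end, n_prompts=5, seed=42)
for moment in prompts:
    print("Ask: what are you working on right now?", moment.strftime("%H:%M"))
```

Fixing the seed makes the schedule reproducible, which helps when the sampling procedure has to be reported in the study protocol.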
Think Aloud
• What are they: researchers ask participants to think out loud while performing a task.
As software engineers sometimes forget to verbalize, experimenters may occasionally
remind them to continue thinking out loud. Sessions usually last no more than 2 hours.

• Advantages:
• One of the few ways to test a cognitive model

• Easy to implement

• You can also ask participants to write their thoughts down

• Disadvantages:
• Difficult and time consuming to analyse output

• Example: I want to understand the strategy used by developers when debugging. I
give them a certain piece of software and ask them to add some functions; the system will return
an error, and then I ask them to debug the code and think aloud about what they do.
Shadowing/Observation
• What are they: with shadowing, the experimenter follows and observes the
participant and records their activities. With observation, the experimenter observes several
participants at once (e.g., in meetings).
• Advantages:
• Easy to implement

• No special equipment needed

• Disadvantages:
• You only learn about the general, observable activity

• Need to know the environment and domain very well

• Can be annoying for people, and could bias their behaviour

• Example: I want to monitor informal communication in the group, and I observe an
open development space for a certain number of days.
Participant Observation (Join the Team)
• What are they: the researcher joins the development team and performs some activities like the
other members.
• Advantages:
• More acceptance by the participants

• Deeper understanding of the dynamics

• Create rapport with people

• You can contribute to the team

• Disadvantages:
• Extremely time consuming (it’s an additional job)

• May lose external perspective

• Example: Over 17 months, a researcher participated in 23 code inspection meetings. From his
participation, he developed a series of hypotheses on how factors such as familiarity,
organizational distance, and physical distance are related to how much time is spent on
discussion and tasks.
2nd Degree Techniques:
Indirect Involvement
• Instrumenting Systems

• Fly-on-the-Wall: participants recording their own work
The researcher needs to have contact
with the research environment and with the participants, but:
1. does not need to interact with them during data collection,
and 2. not much effort is required from participants
Instrumenting Systems
• What is it: monitor developer-system interaction during a certain task, e.g., with eye tracking, cameras,
wristband, or add-on tools for logging. 

• Advantages:
• No time commitment for software engineers (unless you carry out an experiment)

• Accurate information

• Disadvantages:
• Data are “raw” and do not have a clear meaning

• Ethical concerns in monitoring users

• Example: I want to monitor the degree of engagement of software developers in a company. I ask them
to wear wristbands during their day to record their engagement (sensed engagement). I instrument their
computers with a logger to check what they are doing. I ask them to write down their degree of
engagement every 30 minutes (work diary, reported engagement). I check to what extent the two
measures (sensed and reported engagement) agree, and what the developers were doing.
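One simple way to check whether sensed and reported engagement agree is a correlation coefficient over the paired 30-minute samples. The sketch below computes Pearson's r in plain Python; the data values are invented for illustration, and since reported engagement is ordinal, a rank-based coefficient could be a more appropriate choice in a real study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical paired samples, one pair per 30-minute slot
sensed = [0.2, 0.5, 0.7, 0.4, 0.9, 0.6]   # wristband score, assumed in [0, 1]
reported = [1, 3, 4, 2, 5, 3]             # diary score, assumed 1-5 scale

r = pearson(sensed, reported)
print(f"Pearson r between sensed and reported engagement: {r:.2f}")
```

A value close to 1 would suggest the wristband and the diary tell the same story; a low value would call for inspecting the logger data for those time slots.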
Fly-on-the-Wall
• What is it: participants are required to record or videotape themselves when they do a
specific task.

• Advantages:
• Little effort required by the participant

• No direct interaction with the researcher

• Disadvantages:
• High amount of data and high cost for analysing them

• Videos are multi-modal data and analysing them is not straightforward

• Not always easy to understand the content of videos

• Example: I ask the team to videotape each meeting they hold during a certain period (e.g.,
an iteration). I review the recordings to see specific patterns of interaction, and the roles
of the people.
3rd Degree Techniques
aka Mining Software Repositories
• Analysis of electronic database of work performed /
Analysis of tool logs 

• Document Analysis such as code documentation and
other software related documents

• Static and Dynamic analysis of a system (Software
Analytics)
Require access only to work artefacts, such
as source code or documentation
In recent years, with the development of shared repositories, such
as GitHub, these data collection activities go under the name
Mining Software Repositories
Analysis of electronic database
of Work Performed and Tool Logs
• What is it: access the platforms used for issue or bug reporting (e.g., Bugzilla), change requests, configuration
management, and version control (e.g., Git)
• Advantages:
• Large amount of data

• Stable and independent of the researcher

• People do not need to do extra work 

• Disadvantages:
• Too much data!

• Limited knowledge of work environment

• People do not necessarily fill all the information needed (e.g., in commit messages)

• Different companies have different process management policies, which may affect the data

• Example: I want to understand the typical patterns of software evolution. I analyse the change
requests and commits in a certain software repository and check, e.g., when they are typically performed,
by whom, and whether there is a typical sequence of actions.
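The "when and by whom" part of this example can be sketched with the standard library alone. The code below parses text in the shape produced by `git log --pretty=format:'%an|%aI'` (author name and strict ISO 8601 author date); the sample log and the author names are invented for illustration.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical output of: git log --pretty=format:'%an|%aI'
SAMPLE_LOG = """\
alice|2020-03-02T09:15:00+01:00
bob|2020-03-02T23:40:00+01:00
alice|2020-03-03T10:05:00+01:00
carol|2020-03-07T16:30:00+01:00
alice|2020-03-09T09:50:00+01:00
"""


def commit_patterns(log_text):
    """Count commits per author, per weekday, and per hour of the day."""
    by_author, by_weekday, by_hour = Counter(), Counter(), Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # Split on the last separator, in case an author name contains "|"
        author, stamp = line.rsplit("|", 1)
        when = datetime.fromisoformat(stamp)
        by_author[author] += 1
        by_weekday[WEEKDAYS[when.weekday()]] += 1
        by_hour[when.hour] += 1
    return by_author, by_weekday, by_hour


by_author, by_weekday, by_hour = commit_patterns(SAMPLE_LOG)
print("Most active author:", by_author.most_common(1))
print("Commits per weekday:", dict(by_weekday))
print("Commits per hour:", dict(by_hour))
```

On a real repository the same counters would be fed with the full log; detecting typical sequences of actions would need additional data such as the changed files per commit.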
Document Analysis
• What is it: analysis of documents related to the software process, such as code comments,
e-mails, Stack Overflow posts, tweets, app reviews, developer documentation, user manuals, etc.
• Advantages:
• Large amount of data in natural language (English, Italian, German, etc.)

• Written information can answer why questions

• Independent of the researcher

• Disadvantages:
• Requires knowledge of the context

• Natural language processing (NLP) techniques needed for large amounts of data

• Data are often “dirty”

• Example: I want to understand whether the app reviews on the Apple Store actually contain
potential new requirements for the app. I ask some subjects to check a certain number of reviews and
identify requirements, and I check their agreement (I can also decide to automatically predict
whether a certain review includes a requirement or not, based on the manually checked reviews).
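Agreement between two subjects labelling the same reviews is commonly summarised with Cohen's kappa, which corrects the raw agreement for agreement expected by chance. A minimal sketch follows; the labels and the two annotators are invented for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    if expected == 1:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten app reviews: "req" = contains a requirement
annotator_1 = ["req", "req", "other", "other", "req",
               "other", "other", "req", "other", "other"]
annotator_2 = ["req", "other", "other", "other", "req",
               "other", "req", "req", "other", "other"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

Here the annotators agree on 8 of 10 reviews, yet kappa is lower than 0.8 because part of that agreement would be expected by chance.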
Static and Dynamic Analysis of
a System (Software Analytics)
• What is it: analyze the code (static analysis) or traces generated by running the code (dynamic analysis) to
learn about the design, and indirectly about how software engineers think and work. One might compare the
programming or architectural styles of several software engineers by analyzing their use of various constructs,
or the values of various complexity metrics.
• Advantages:
• Large amount of data

• Independent of the researcher

• Analysis tools are emerging (https://github.com/ishepard/pydriller, https://github.com/uni-bremen-agst/libvcs4j, https://ghtorrent.org)

• Disadvantages:
• Source code is not always easy to understand

• Dynamic behaviour is even more difficult

• Need to resort to automatic support

• Example: I want to find the most frequent dynamic errors triggered by software hosted on GitHub.
I download a selection of representative projects, analyse them with an abstract interpretation tool, and
see which errors are typical.
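Comparing programming styles through the use of constructs, as described in the "What is it" bullet, can be illustrated with a small static analysis. The sketch below uses Python's standard `ast` module as a stand-in for a real analysis tool; the two code snippets and the chosen construct categories are assumptions for illustration.

```python
import ast
from collections import Counter


def construct_profile(source):
    """Count selected syntactic constructs in a piece of Python source code."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            counts["function"] += 1
        elif isinstance(node, (ast.For, ast.While)):
            counts["loop"] += 1
        elif isinstance(node, ast.If):
            counts["if"] += 1
        elif isinstance(node, (ast.ListComp, ast.SetComp,
                               ast.DictComp, ast.GeneratorExp)):
            counts["comprehension"] += 1
    return counts


# Two hypothetical engineers solving the same task in different styles
STYLE_A = """
def squares(xs):
    out = []
    for x in xs:
        if x > 0:
            out.append(x * x)
    return out
"""

STYLE_B = """
def squares(xs):
    return [x * x for x in xs if x > 0]
"""

profile_a = construct_profile(STYLE_A)
profile_b = construct_profile(STYLE_B)
print("Engineer A:", dict(profile_a))
print("Engineer B:", dict(profile_b))
```

The filter inside the comprehension is not an `if` statement in the syntax tree, so the two profiles differ even though the functions behave identically; this is the kind of style signal the slide refers to.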
Data Collection Techniques: Summary
Table 2. Questions asked by software engineering researchers (column 2) that can be answered by field study techniques.
Technique | Used by researchers when their goal is to understand | Volume of data | Also used by software engineers for
First Order Techniques
Brainstorming and Focus Groups | Ideas and general background about the process and product, general opinions (also useful to enhance participant rapport) | Small | Requirements gathering, project planning
Surveys | General information (including opinions) about process, product, personal knowledge etc. | Small to Large | Requirements and evaluation
Conceptual modeling | Mental models of product or process | Small | Requirements
Work Diaries | Time spent or frequency of certain tasks (rough approximation, over days or weeks) | Medium | -
Think-aloud sessions | Mental models, goals, rationale and patterns of activities | Medium to large | UI evaluation
Shadowing and Observation | Time spent or frequency of tasks (intermittent over relatively short periods), patterns of activities, some goals and rationale | Small | Advanced approaches to use case or task analysis
Participant observation (joining the team) | Deep understanding, goals and rationale for actions, time spent or frequency over a long period | Medium | -
Second Order Techniques
Instrumenting systems | Software usage over a long period, for many participants | Large | Software usage analysis
Fly on the wall | Time spent intermittently in one location, patterns of activities (particularly collaboration) | Medium | -
Third Order Techniques
Analysis of work databases | Long-term patterns relating to software evolution, faults etc. | Large | Metrics gathering
Analysis of tool use logs | Details of tool usage | Large | -
Documentation analysis | Design and documentation practices, general understanding | Medium | Reverse engineering
Static and dynamic analysis | Design and programming practices, general understanding | Large | Program comprehension, metrics, testing, etc.
NOTE: “first order” in this table means “first degree”
cf. Lethbridge et al., 2005, https://doi.org/10.1007/s10664-005-1290-x
Building Theories in
Software Engineering
cf. Sjøberg et al., 2009 https://dx.doi.org/10.1007/978-1-84800-044-5_12
cf. Mendez Fernandez, https://www.slideshare.net/mendezfe/an-introduction-into-philosophy-of-science-for-software-engineers

What is a Theory?
• A statement about the existence of some pattern in the
entities that belong to a certain context



• The boundary of the context determines the scope of
applicability of the theory (e.g., all the people in a certain
company vs all the C developers of the world)
A theory exists where some form of REGULARITY can be identified
“entities” =
“observable phenomena” =
“events and objects”
What is a Theory?
• I can have different levels of sophistication of a theory, depending on how
abstract the entities considered are (how far they are from direct,
measurable observations):

• Low: 90% of faults are found in functions that are longer than 1000
LOC (once a definition of fault is given, this can be verified quite
precisely)

• Medium: Requirements defects can be classified into unclarity,
incompleteness and incorrectness (I need precise definitions for the
three classes, I have to assess that all the existing defects can be
linked to one of the classes, I have to check that every reader classifies
in the same manner…verification of the theory is complicated)

• High: If the team leader is not self-confident, developers lose trust
(I need measures for self-confidence and trust; verification of the
theory is VERY complicated)
In SE you will find all these different types of theory
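The low-sophistication example is close to direct measurement, so checking it is little more than counting. A minimal sketch, assuming each recorded fault has already been traced to a function of known length; the LOC values are invented for illustration.

```python
def share_of_faults_in_long_functions(fault_function_loc, loc_threshold=1000):
    """Fraction of faults located in functions longer than loc_threshold LOC."""
    if not fault_function_loc:
        raise ValueError("no faults recorded")
    in_long = sum(1 for loc in fault_function_loc if loc > loc_threshold)
    return in_long / len(fault_function_loc)


# Hypothetical data: for each fault, the LOC of the function it was found in
fault_function_loc = [1500, 1200, 300, 2500, 1100, 1800, 1050, 4000, 1300, 90]

share = share_of_faults_in_long_functions(fault_function_loc)
print(f"Observed share: {share:.0%} (the theory claims 90%)")
```

The medium and high examples cannot be checked this way until their constructs (defect classes, self-confidence, trust) have been given an agreed measure.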
What Does a Theory Do?
• Description: descriptions and conceptualisations (taxonomies, ontologies, e.g., defect types example)
• Explanation: identify the motivation (e.g., team leader example)
• Prediction: predict according to a model (e.g., fault example)
• Explanation and Prediction: find model and motivation
• Design and Action: prescriptive (e.g., testing resources example, initial slides)
cf. also https://www.quora.com/How-can-statistics-tell-us-about-causality
What are the Elements of a Theory?
• The elements of a theory can be framed according to 6 questions: What, How, Why, Where, When, and for Whom
  • What: what are the entities in terms of which a theory offers description, explanation, prediction or prescription? These are the constructs of a theory.
  • How: how are the constructs related? Relationships between constructs make up a theory’s propositions, and describe how the constructs interact. Can lead to predictions.
  • Why: why do the relationships hold? Answers to this question are what give the theory explanatory power.
  • Where, When, for Whom (scope conditions): identify the circumstances in which the theory is applicable (the context).
Constructs (WHAT)
Table 3 Constructs, propositions, example explanations and scope of the theory of UML-based
development
Constructs
C1 UML-based development method
C2 Costs (total number of person hours in the project)
C3 Communication (ease of discussing solutions within development teams and in reviews)
C4 Design (perceived structural properties of the code)
C5 Documentation (the documentation of the system for the purpose of passing reviews as
well as for expected future maintainability)
C6 Testability (more efficient development of test cases and better quality, i.e., better coverage)
C7 Training (training in the UML-based method before the start of the project)
C8 Coordination (of requirements and teams)
C9 Legacy code (code that has not been reverse engineered to UML-models)
We will see more of this example later on
Propositions (HOW)
Propositions
P1 The use of a UML-based development method increases costs
P2 The use of a UML-based development method positively affects communication
P3 The use of a UML-based development method positively affects design
P4 The use of a UML-based development method positively affects documentation
P5 The use of a UML-based development method positively affects testability
P6 The positive effects of UML-based development are reduced if training is not sufficient and adapted
P7 The positive effects of UML-based development are reduced if there is insufficient coordination of modelling activities among distributed teams working on the same project
P8 The positive effects of UML-based development are reduced if the activity includes modification of legacy code
Explanation (WHY)
Explanations
E4 The documentation is
– More complete
– More consistent due to traceability among models and between models and code
– More readable, and makes it easier to find specific information, due to a common
format
– More understandable for non-technical people
– May be viewed from different perspectives due to different types of diagram
E5 Test cases based on UML models
– Are easier to develop
– Can be developed earlier
– Are more complete
– Have a more unified format
Moreover, traceability from requirements to code and test cases makes it easier to
identify which test cases must be run after an update
Scope Conditions (WHEN,
WHERE, for WHOM…)
… are specified further into propositions of the theory, as indicated in Fig. 3; the propositions P6–P8 are examples of moderators. The scope of the theory is also illustrated in the diagram. Scope conditions are typically modelled as subclasses or component classes. Figure 3 shows that our …
Scope
The theory is supposed to be applicable for distributed projects creating and modifying
large, embedded, safety-critical subsystems, based on legacy code or new code
This example theory answers all the questions,
but the theories you develop may answer
only a SUBSET of the questions
(e.g., WHY is left to other researchers)
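The four elements can be written down as a small data structure, which makes it easy to see which questions a theory leaves open. The sketch below encodes a fragment of the UML example; the class and field names are illustrative assumptions, and pairing explanation E4 with proposition P4 is assumed from the matching numbering.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Construct:            # WHAT
    ident: str
    description: str


@dataclass(frozen=True)
class Proposition:          # HOW
    ident: str
    cause: Construct
    effect: Construct
    relation: str


@dataclass
class Theory:
    constructs: List[Construct]
    propositions: List[Proposition]
    explanations: Dict[str, str] = field(default_factory=dict)  # WHY, keyed by proposition
    scope: str = ""                                             # WHERE / WHEN / for WHOM

    def unanswered_questions(self) -> List[str]:
        """List the questions for which this theory provides no element."""
        missing = []
        if not self.constructs:
            missing.append("WHAT")
        if not self.propositions:
            missing.append("HOW")
        if not self.explanations:
            missing.append("WHY")
        if not self.scope:
            missing.append("WHERE/WHEN/FOR WHOM")
        return missing


c1 = Construct("C1", "UML-based development method")
c2 = Construct("C2", "Costs (total number of person hours in the project)")
c5 = Construct("C5", "Documentation")
p1 = Proposition("P1", c1, c2, "increases")
p4 = Proposition("P4", c1, c5, "positively affects")

uml_theory = Theory(
    constructs=[c1, c2, c5],
    propositions=[p1, p4],
    explanations={"P4": "E4: the documentation is more complete, consistent and readable"},
    scope="distributed projects on large, embedded, safety-critical subsystems",
)
partial_theory = Theory(constructs=[c1, c2], propositions=[p1])

print(uml_theory.unanswered_questions())
print(partial_theory.unanswered_questions())
```

The second theory illustrates the point of the slide: it states WHAT and HOW but leaves WHY and the scope conditions to later work.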
How are Theories Formed?
Induction, Deduction, Abduction
(Diagram relating Observation, Theory, Hypothesis and Test through the three kinds of inference.)
• Induction: inference of a generalized conclusion from particular instances
• Deduction: derive testable hypotheses from a theory
• Abduction: generalize from theories
Criteria for Evaluating Theories
… the presence of a falsifiable theory, which gives rise to hypotheses that are tested by observation. Although this framework as such has been overtaken by other frameworks (Ruse, 1995), the principle of testability remains fundamental for empirically-based theories. There are no commonly agreed set of criteria for evaluating testability, but we will emphasize the criteria as follows: (1) The constructs …
Table 1 Criteria for evaluating theories
• Testability: the degree to which a theory is constructed such that empirical refutation is possible
• Empirical support: the degree to which a theory is supported by empirical studies that confirm its validity
• Explanatory power: the degree to which a theory accounts for and predicts all known observations within its scope, is simple in that it has few ad hoc assumptions, and relates to that which is already well understood
• Parsimony: the degree to which a theory is economically constructed with a minimum of concepts and propositions
• Generality: the breadth of the scope of a theory and the degree to which the theory is independent of specific settings
• Utility: the degree to which a theory supports the relevant areas of the software industry
cf. Sjøberg et al., 2009 https://dx.doi.org/10.1007/978-1-84800-044-5_12 

To what extent does my theory explain WHY?
Step-by-Step guide to
Formulating Theories (Deductive)
1. Define constructs of the theory (can be novel constructs,
existing ones, or refinements of existing ones)

2. Define propositions (novel, modifications/refinements of
existing ones)

3. Provide explanations to justify the theory (explicit assumptions
and logical justifications for the constructs and propositions of
the theory, referring to existing theories, also from other
disciplines)

4. Define the scope of interest (values of constructs, and
combinations thereof, that the theory is oriented to explain)
Every time you are applying an empirical method
you are actually building theories
Step-by-Step guide to
Formulating Theories (Deductive)
5. Test the theory through empirical research (examination of the validity of the theory’s
predictions through empirical studies):

1. Choosing an appropriate research setting and sample. The sample does not
only include the actors, but also the sample of technologies, activities (tasks)
and systems.

2. Operationalizing theoretical constructs into empirical variables (e.g., justify the
connection between complexity of software and its measure in lines of code)

3. Operationalizing theoretical propositions into empirically testable hypotheses
(definition of hypotheses in terms of empirical variables)

4. Application of qualitative or quantitative methods to test the hypotheses
(when speaking about hypothesis testing, we normally refer to quantitative
statistical tests, however the conceptual process is the same also for qualitative
methods)

6. Define scope of validity (part of the scope of interest in which the theory has actually
been validated)
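Steps 5.2 to 5.4 can be sketched on proposition P1 of the UML example (the method increases costs). The construct "costs" is operationalised as person-hours per project, the hypothesis becomes "UML projects have a higher mean of person-hours", and the test is an exact one-sided permutation test. The project data are invented for illustration, and the permutation test is one possible choice of statistical test, not the one used in the cited study.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(treated, control):
    """One-sided exact permutation test on the difference in means.

    Returns the observed difference and the p-value, i.e. the share of all
    regroupings whose difference is at least as large as the observed one.
    """
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated, n_control = len(treated), len(control)
    total = sum(pooled)
    regroupings = at_least_as_extreme = 0
    for idx in combinations(range(len(pooled)), n_treated):
        treated_sum = sum(pooled[i] for i in idx)
        diff = treated_sum / n_treated - (total - treated_sum) / n_control
        regroupings += 1
        if diff >= observed - 1e-9:  # tolerance for floating-point noise
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / regroupings


# Hypothetical person-hours for ten comparable projects
uml_hours = [1250, 1400, 1320, 1500, 1380]       # projects using the UML-based method
non_uml_hours = [1100, 1180, 1220, 1050, 1150]   # projects not using it

observed, p_value = exact_permutation_test(uml_hours, non_uml_hours)
print(f"Observed difference in mean person-hours: {observed}")
print(f"One-sided p-value: {p_value:.4f}")
```

A small p-value would support the hypothesis for this sample only; step 6 then asks in which part of the scope of interest (project size, domain, team distribution) that support actually holds.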
Step-by-Step Graphical Guide (Deductive)
1. Theory
2. Operationalisation (Variables, Hypothesis and Sample Definition)
3. Data Collection and Measurements
4. Data Analysis (Hypothesis Testing)
5. Confirm/Reject and Scope of Validity
Scope of Validity
… to the scope of interest. The first consideration to make in testing a theory is to make sure that the study fits the theory’s scope of interest. Otherwise, the results would be irrelevant to that theory. Moreover, in a given study, typically only a part …
Fig. 4 Scope of interest versus scope of validity
cf. Sjøberg et al., 2009 https://dx.doi.org/10.1007/978-1-84800-044-5_12
Threats to Validity
• Agreement (or inconsistency) between theoretical
propositions and empirical observations does not necessarily imply that
the theory is validated (or disconfirmed)

• Judgements regarding the validity of the theory require that the study
is well conducted, and not encumbered with

• Invalid operationalization of theoretical constructs and propositions

• Inappropriate research design

• Inaccuracy in data collection and data analysis

• Misinterpretation of empirical findings
Threats to Validity
• Construct: have I operationalised all the constructs
correctly? (e.g., is LOC a good measure of complexity?)

• Internal: are there aspects that may have influenced my
outcome and that I did not consider? (identify
confounding variables, e.g., had the participants already
seen the code they are evaluating?)

• External: to what extent are my findings generalisable?
(how much of the scope of interest is covered, e.g., which
types of languages are considered?)
Each research method has its own classification
of threats to validity, which we will see later;
these are three general notions
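The construct-validity question about LOC can be made concrete with a small sketch. It assumes that "complexity" is meant as control-flow complexity and uses a count of branching constructs as a rough, hypothetical proxy; the two snippets and both helper functions are illustrative and not taken from any study.

```python
import ast

def count_loc(source: str) -> int:
    """Count non-blank source lines (a simple LOC measure)."""
    return sum(1 for line in source.splitlines() if line.strip())

def count_decision_points(source: str) -> int:
    """Count branching constructs, a rough proxy for control-flow complexity."""
    branching = (ast.If, ast.For, ast.While, ast.BoolOp)
    return sum(isinstance(node, branching) for node in ast.walk(ast.parse(source)))

# Two hypothetical snippets with the same number of lines.
STRAIGHT_LINE = """\
def total(a, b, c):
    x = a + b
    y = x + c
    z = y * 2
    return z
"""

BRANCHING = """\
def classify(a, b, c):
    if a > b:
        if b > c:
            return 1
    return 0
"""

for name, src in [("straight-line", STRAIGHT_LINE), ("branching", BRANCHING)]:
    print(name, "LOC:", count_loc(src), "decision points:", count_decision_points(src))
```

Both snippets have the same LOC but differ in branching, so a study that operationalises complexity as LOC alone would treat them as equally complex.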
Theory
Operationalisation
Sample Definition, Data
Collection
Step-by-Step Graphical Guide (Inductive)
Often I do not have the information to identify
constructs, propositions, and explanations
before data collection
Therefore I start with data collection!
Constructs, propositions and
explanations are extracted from the
data (normally QUALITATIVE)
Theory
Data Analysis and
Operationalisation
Sample Definition,
Data Collection
Step-by-Step Graphical Guide (Inductive to Deductive)
INDUCTIVE
Operationalisation
Sample Definition,
Data Collection
Data Analysis /
Hypothesis Testing
Refutation/
Confirmation
Scope of Validity
DEDUCTIVE
Generating a Theory — Inductive to Deductive
An Example
• Field study in a company to investigate benefits and
challenges of the use of a UML-based method in a
large distributed development project
• Goal of the project: new safety-critical process-control
system based on several existing systems

• Four sites in three countries, 230 people, 100 using UML

• Data was collected through individual interviews,
questionnaires and project documents.
cf. Sjøberg et al., 2009 https://dx.doi.org/10.1007/978-1-84800-044-5_12
Generating a Theory: Example
• Step 1: Defining the constructs
• Interviews are performed to identify which are the
most significant concepts to consider. They applied
the so-called “open coding” to the interview transcripts
to identify the constructs
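Open coding itself is manual, interpretive work. What can be sketched is the bookkeeping that follows it: tallying in how many interviews each manually assigned code appears, to shortlist candidate constructs. The codes, interview names and the threshold below are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical, manually assigned codes per interview (illustrative only).
coded_interviews = {
    "interview_1": ["communication", "training", "cost"],
    "interview_2": ["communication", "documentation"],
    "interview_3": ["training", "communication", "legacy code"],
}

def candidate_constructs(coded, min_interviews=2):
    """Return codes that occur in at least `min_interviews` distinct interviews."""
    counts = Counter(code for codes in coded.values() for code in set(codes))
    return sorted(code for code, n in counts.items() if n >= min_interviews)

print(candidate_constructs(coded_interviews))
```

A frequency threshold is only one possible heuristic; a code mentioned once may still turn out to be an important construct.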
12 Building Theories in Software Engineering 323
Table 3 Constructs, propositions, example explanations and scope of the theory of UML-based development
Constructs
C1 UML-based development method
C2 Costs (total number of person hours in the project)
C3 Communication (ease of discussing solutions within development teams and in reviews)
C4 Design (perceived structural properties of the code)
C5 Documentation (the documentation of the system for the purpose of passing reviews as well as for expected future maintainability)
C6 Testability (more efficient development of test cases and better quality, i.e., better coverage)
C7 Training (training in the UML-based method before the start of the project)
C8 Coordination (of requirements and teams)
C9 Legacy code (code that has not been reverse engineered to UML-models)
Generating a Theory: Example
• Step 2: Defining the propositions
• From the interviews, relationships are identified between
constructs (e.g., relation between UML and cost), and these
are translated into propositions

• The resulting propositions are confirmed with questionnaires
Propositions
P1 The use of a UML-based development method increases costs
P2 The use of a UML-based development method positively affects communication
P3 The use of a UML-based development method positively affects design
P4 The use of a UML-based development method positively affects documentation
P5 The use of a UML-based development method positively affects testability
P6 The positive effects of UML-based development are reduced if training is not sufficient and adapted
P7 The positive effects of UML-based development are reduced if there is insufficient coordination of modelling activities among distributed teams working on the same project
P8 The positive effects of UML-based development are reduced if the activity includes modification of legacy code
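The propositions relate constructs to each other, so they can be written down as a small data structure. The sketch below is one possible encoding and an assumption of this example: P1 to P5 are treated as direct effects of C1, and P6 to P8 as constructs that moderate those effects. Construct labels are shortened from the table.

```python
# Construct labels shortened from Table 3; the encoding itself is an assumption.
CONSTRUCTS = {
    "C1": "UML-based development method",
    "C2": "Costs",
    "C3": "Communication",
    "C4": "Design",
    "C5": "Documentation",
    "C6": "Testability",
    "C7": "Training",
    "C8": "Coordination",
    "C9": "Legacy code",
}

# P1-P5: direct effects of C1 on another construct.
DIRECT_EFFECTS = {
    "P1": ("C1", "increases", "C2"),
    "P2": ("C1", "positively affects", "C3"),
    "P3": ("C1", "positively affects", "C4"),
    "P4": ("C1", "positively affects", "C5"),
    "P5": ("C1", "positively affects", "C6"),
}

# P6-P8: constructs that moderate the positive effects of C1.
MODERATORS = {"P6": "C7", "P7": "C8", "P8": "C9"}

def describe(pid: str) -> str:
    """Render a proposition as a short sentence built from construct labels."""
    if pid in DIRECT_EFFECTS:
        cause, relation, effect = DIRECT_EFFECTS[pid]
        return f"{pid}: {CONSTRUCTS[cause]} {relation} {CONSTRUCTS[effect]}"
    moderator = CONSTRUCTS[MODERATORS[pid]]
    return f"{pid}: {moderator} moderates the positive effects of {CONSTRUCTS['C1']}"

def uncovered_constructs() -> list:
    """List constructs that no proposition refers to."""
    used = {"C1"} | {effect for _, _, effect in DIRECT_EFFECTS.values()}
    used |= set(MODERATORS.values())
    return sorted(set(CONSTRUCTS) - used)

for pid in list(DIRECT_EFFECTS) + list(MODERATORS):
    print(describe(pid))
print("Constructs without a proposition:", uncovered_constructs())
```

Checking that every construct appears in at least one proposition is a quick consistency check on a theory's parsimony.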
Generating a Theory: Example
• Step 3: Provide explanations
• Further analyse the interviews to understand the reasons behind
the propositions

• Perform further interviews and check project documents to make
sense of identified phenomena
Explanations
E4 The documentation is
– More complete
– More consistent due to traceability among models and between models and code
– More readable, and makes it easier to find specific information, due to a common format
– More understandable for non-technical people
– May be viewed from different perspectives due to different types of diagram
E5 Test cases based on UML models
– Are easier to develop
– Can be developed earlier
– Are more complete
– Have a more unified format
Moreover, traceability from requirements to code and test cases makes it easier to identify which test cases must be run after an update
Generating a Theory: Example
• Step 4: Identifying the scope of interest of the theory
• Technology: UML

• Actor: designers in distributed teams

• Software System: large, embedded software

• Activity: create and modify UML diagrams
Generating a Theory: Example
• Step 5: Testing the theory - Deductive Step
• Consider each proposition and perform a study for each one, or for a subset, e.g., “Use of
UML methods increases cost”, “Use of UML methods positively affects testability”

• I can use different methods to test the theory:

• Field studies: identify companies that are willing to introduce UML; establish a way to
evaluate cost (e.g., person-hours); consider a comparable company not using UML; check
the resulting cost.

• Experiment: two groups of subjects; give them a requirements document; ask group 1 to
implement the code; ask group 2 to design and then implement; evaluate and compare
cost.

• Survey/Questionnaire: contact multiple companies that have introduced UML and ask
them to state their agreement with the propositions and the explanations

• Step 6: based on the selected study, I identify the scope of validity (larger for surveys, narrower for
field studies)
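For the experiment option, "evaluate and compare cost" usually means a hypothesis test on the two groups. A minimal sketch follows, assuming cost is measured in person-hours per subject and using an exact two-sided permutation test on the difference of means. The data are invented for illustration, and the function name is hypothetical. Enumerating all splits is only feasible for small groups.

```python
from itertools import combinations

def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test on the difference of group means.

    Returns the fraction of all possible splits of the pooled data whose
    absolute mean difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed:
            extreme += 1
        splits += 1
    return extreme / splits

# Hypothetical person-hours per subject (illustrative data, not from a study).
code_only = [30, 34, 29, 32, 31, 33]          # group 1: implement only
design_then_code = [36, 40, 35, 38, 37, 39]   # group 2: design, then implement

p_value = exact_permutation_test(code_only, design_then_code)
print(f"p-value: {p_value:.4f}")
```

A small p-value would support the proposition that the design step increases cost in this sample; it says nothing yet about the scope of validity.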
Generating a Theory: Example
• Theory Evaluation
• Testability: constructs are not ambiguous, and propositions are clear,
furthermore protocols are shared for replication. Since some subjective data
collection was performed, replication may lead to different results.
• Empirical support: other studies seem to confirm some of the propositions
• Explanatory power: the motivations are derived from interviews, and not all
factors may have been considered. Hence the explanatory power is limited (did
not account for all possible reasons WHY).
• Parsimony: reduced number of constructs and relationships in the propositions
• Generality: scope is narrow, as I have performed a case study
• Utility: utility is high, as it can help decision making
These are all logical arguments
that have to be checked by peers!
The ABC of Software
Engineering Research
cf. Stol and Fitzgerald, 2018 https://doi.org/10.1145/3241743
The Need for a Taxonomy
of Methods Strategies
• As we said, there is no universally accepted taxonomy for
research methods in SE
The ABC of Software Engineering Research 11:3
Table 1. A “Mixed Bag”: Alternative Research Methods in Software Engineering According to a Selection of Sources
• Glass et al. [63]: Action research, Conceptual analysis, Concept implementation, Case study, Data analysis, Discourse analysis, Ethnography, Field experiment, Field study, Grounded theory, Hermeneutics, Instrument development, Laboratory experiment (human/software), Literature review, Meta-analysis, Mathematical proof, Protocol analysis, Phenomenology, Simulation, Descriptive/expl. survey
• Zannier et al. [230]: Controlled experiment, Quasi experiment, Case study, Exploratory case study, Experience report, Meta-analysis, Example application, Survey, Discussion
• Sjøberg et al. [190]: Controlled experiment, Surveys, Case studies, Action research
• Höfer and Tichy [75]: Case study, Correlational study, Ethnography, Ex post facto study, Experiment, Meta-analysis, Phenomenology, Survey
• Easterbrook et al. [48]: Experimentation, Case study, Survey, Ethnography, Action research
Each author uses different terms to refer to
research methods, and there is no agreement
Let us talk about STRATEGIES,
which can adopt specific METHODS…
A Unifying Framework: ABC of SE Research
• Actors: human and technical, i.e., managers, software engineers, users,
software systems, software development artifacts incl. defects, tools,
techniques, prototypes 

• Behaviour: of all actors, i.e., system behavior (e.g., reliability, performance,
and other quality attributes), software engineers’ behavior and antecedents
such as productivity, motivation, and intention 

• Context: of all actors, i.e., industrial settings, organizations, software
projects, development teams, software laboratory, classroom, meeting rooms
Optimizing a study to achieve generalizability over actors (A) and precise
measurement of their behavior (B), in a realistic context (C), is impossible, and is
a “three-horned dilemma [since] there is no way—in principle—to maximize all
three (conflicting) desiderata of the research strategy domain” (McGrath,
1981 https://doi.org/10.1177/000276428102500205 )
Three main dimensions…
• Obtrusiveness: to what extent does a researcher
“intrude” on the research setting, rather than simply make
observations in an unobtrusive way (i.e., how much
control do I have over the empirical setting)

• Generalizability: to what extent the research
findings are generalizable (i.e., how much of the
scope of interest is covered, given the current
scope of validity)
And two other dimensions…
A Unifying Framework: ABC
of SE Research
ABC Framework
[Figure: two axes, one running from less obtrusive to more obtrusive and one from less general to more general, with three labels]
A: Generalizability over Actors is relevant
B: Precise characterisation of Behaviour is relevant
C: Precise characterisation of specific Context is relevant
Note: Actors can be People or Software; Behaviour of People or Software
The ABC of Software Engineering Research 11:11
Fig. 1. The ABC framework: eight research strategies as categories of research methods for software engi-
Jungle
Natural Reserve
Flight Simulator
In Vitro Experiment
Courtroom
Referendum
Mathematical Model
Forecasting System
Field Studies (Jungle)
• Purpose: To investigate the impact of distributed teams
in software development

• Setting: Natural, first author spent 7 months on-site at an
organization. 

• Procedure: Document study, observation, interviews. 

• Findings: Four major problems and 8 specific challenges
Example
Field Studies (Jungle)
• Setting: Natural setting that exists before the researcher enters it. Minimal intrusion into
the setting so as not to disturb realism, only to facilitate data collection. 

• Purpose:

• Exploratory, to understand what’s going on, how things work, or to generate
hypotheses. 

• Typical Methods and Data: Case study, ethnography, observational study; qualitative
data incl. interviews, field notes, archival documents, may include quantitative data. 

• Inherent Limitations:
• No statistical generalizability

• No control over events

• Low precision of measurement
Field Studies (Jungle)
• Essence: 

• Facilitates the study of real-world actors (people, systems) and their
behaviors in a natural setting that is not manipulated by the
researcher. 

• High potential to capture realistic settings and a high degree of detail
of a particular system and context. 

• Evaluation Considerations: 

• Not suitable for investigating statistical relationships or for
manipulating variables.

• Not suitable for producing findings that hold for larger populations.
Field Experiments
(Natural Reserve)
• Purpose: To identify a cost-effective way to avoid software
defects. 

• Setting: Natural, company staff and researcher collaborated
on-site, using real products to evaluate new approaches. 

• Procedure: Action research (improving case study, design
science), data include defect reports, time spent, usability
issues, timeliness of the project, product sales. 

• Findings: certain techniques are beneficial, while others are
time-consuming and do not avoid defects
Field Experiments
(Natural Reserve)
• Setting: Natural, pre-existing setting (in vivo), but some level of intrusion
due to the deliberate manipulation of aspects of the setting; study
affected by confounding factors. 

• Purpose: To investigate, evaluate, or compare techniques, practices,
processes, or approaches within a real-world and pre-existing setting. 

• Typical Methods and Data: case study, quasi-experiment, action
research; studies may use either quantitative data or qualitative data. 

• Inherent Limitations:
• No statistical generalizability

• Precision of measurement affected by confounding contextual factors
Field Experiments
(Natural Reserve)
• Essence: 

• Facilitates the study of the effects of modifying properties of a
studied entity or phenomenon that occurs in a natural setting, i.e.,
one that pre-exists independently of the researcher. 

• Potentially very costly to set up due to complexity of natural settings. 

• Evaluation Considerations: 

• Limited level of precision of measurement; 

• Results not generalizable, but strongly linked to the specific setting
due to confounding variables that are very difficult to isolate.
Experimental Simulations
(Flight Simulator)
• Purpose: To understand how developers perceive the testing
team

• Setting: Contrived, simulation environment with experimental
stimuli that were previously defined (e.g., the software to be
written by the developers, the types of checks performed by
testers)

• Procedure: developers develop code, testers test and give
feedback during a meeting, impressions of developers are
observed

• Findings: Insights into defensive reactions of the developers
Experimental Simulations
(Flight Simulator)
• Setting: Contrived setting (in virtuo) created specifically for a study to represent a
concrete type of setting. Environment is created by the researcher to study
behavior of actors. 

• Purpose: To study behavior of participants or systems in a controlled setting that
resembles a real-world, concrete class of settings as closely as possible. 

• Typical Methods and Data: Simulation/Role-playing games, management
games, instrumented multiplayer games; quantitative or qualitative data,
depending on the simulation instrument. 

• Inherent Limitations:
• Generalizability reduced as the setting is designed to mirror a specific type of
setting (e.g. I have specific subjects from a company)

• Realism reduced due to artificial setting
Similar to Lab experiment, but more context-specific
Experimental Simulations
(Flight Simulator)
• Essence: 

• A contrived setting that simulates a specific class of real-world systems
and that to some extent resembles reality.

• Temporal flow of events depends on the simulation environment and
actors’ behavior, which allows for observing more natural behavior than a
laboratory experiment. 

• Evaluation Considerations: 

• Reduced level of realism compared to field experiments due to the
contrived setting

• Behavior of actors may reflect that in natural settings, but consequences
for actors lack realism, which may affect their behavior.
Laboratory Experiments
(in Vitro Experiments)
• Purpose: To investigate the hypothesis that a certain
code inspection method A is more effective than another
method B

• Setting: Contrived, laboratory exercise with graduate
students 

• Procedure: Measurement of effect of inspection methods
on 4 dependent variables including fault detection rate

• Findings: inspection method A is more effective than
inspection method B
Laboratory Experiments
(in Vitro Experiments)
• Setting: Contrived setting (in vitro) created specifically for a study, with high degree of
control of all measured variables. 

• Purpose:
• to study with a high degree of precision relationships between variables, or comparisons
between techniques; 

• may allow establishment of causality between variables. 

• Typical Methods and Data: Randomized controlled experiments and quasi experiments,
comparative evaluations with benchmark studies; usually quantitative data exclusively. 

• Inherent Limitations:
• Abstract or unrealistic context due to highly artificial setting

• Scope of problem reduced to study the “essence”, optimizing internal validity at cost of
external validity
Laboratory Experiments
(in Vitro Experiments)
• Essence:
• A controlled setting where behavior of actors (humans or systems) is
carefully measured through a number of discrete trials to establish
effects or conduct comparative analyses. 

• Maximum potential to capture precise measurement of variables (high
internal validity) due to potential to isolate confounding factors. 

• Evaluation Considerations: 

• Studied relationships and variables are more abstract due to the
contrived and “sterile” nature of the research setting. 

• The setting is more artificial than for experimental simulations
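Randomized controlled experiments rely on random assignment of subjects to treatments, so that confounding factors are spread evenly across groups. A minimal sketch, assuming two inspection methods labelled "A" and "B" and hypothetical subject identifiers; the function name and seed are illustrative.

```python
import random

def assign_treatments(subjects, treatments=("A", "B"), seed=42):
    """Randomly assign subjects to treatments in near-equal group sizes."""
    rng = random.Random(seed)          # fixed seed keeps the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for position, subject in enumerate(shuffled):
        groups[treatments[position % len(treatments)]].append(subject)
    return groups

# Hypothetical subject identifiers (e.g., graduate students in a lab exercise).
subjects = [f"S{i:02d}" for i in range(1, 11)]
groups = assign_treatments(subjects)
for treatment, members in groups.items():
    print(treatment, members)
```

Recording the seed alongside the results lets others reproduce the assignment when replicating the experiment.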
Judgment Studies
(Courtrooms)
• Purpose: To evaluate a set of 12 practices based on
feedback by team managers. 

• Setting: Neutral, dedicated meeting room with seating
around a table. 

• Procedure: 10 managers from 7 companies, selected
based on their interest and expertise. 

• Findings: a framework of defects and benefits for the 12
practices
Judgment Studies
(Courtrooms)
• Setting: Neutral setting; may be actively designed to nullify the context, so that
“responses” are in relation to some stimulus (question or instructions), independent
of setting. 

• Purpose: To elicit information from subjects for purposes of evaluation or study of
some entities. 

• Typical Methods and Data: Delphi studies, interview studies, focus group,
evaluation studies; use of qualitative and/or quantitative data. 

• Inherent Limitations:
• Responses not related to any specific or realistic context

• Less generalizability than sample studies due to lack of representative sampling

• Less control and precision of measurement than a laboratory experiment
Judgment Studies
(Courtrooms)
• Essence: 

• Facilitates study of responses or behavior of actors that
bears no relation to the research setting, which is
neutral or actively “neutralized.” 

• Allows for more complex questions and interactions
between researcher and respondents. 

• Evaluation Considerations: No concrete or natural
setting, which prohibits capturing direct observations of
phenomena.
Sample Studies
(Referendum)
• Purpose: To investigate the state of practice of
requirements engineering in industry. 

• Setting: Neutral, web-based questionnaire. 

• Procedure: 22 questions; participants drawn from
the internet; 194 responses from a population of 1,519 

• Findings: organization and participant
characteristics (various domains; participants held a variety
of positions); software development life cycle model
(agile, waterfall, etc.); RE techniques.
Sample Studies
(Referendum)
• Setting: Neutral setting. Limited level of precision of measurement; no variables
are manipulated. The researcher must deal with whatever data is collected. 

• Purpose: To study the distribution of a particular characteristic in a population (of
people or systems), or the correlation between two or more characteristics in a
population.

• Typical Methods and Data: Software repository mining, surveys, questionnaires,
interviews; analysis includes correlational methods, e.g., regression. Typically,
quantitative data (e.g., Likert scales) but can include qualitative data. 

• Inherent Limitations:
• Reductionist—depth of and number of data points per participant limited

• Data collection not “interactive”: no option to clarify questions; repository data
comes as is, no opportunity to manipulate variables, only to correlate them
Sample Studies
(Referendum)
• Essence: 

• Facilitates data collection from a representative sample of a population (human
or nonhuman, such as systems or design artifacts). 

• Maximum potential to generalize findings to a wider population;

• Unobtrusive research strategy. 

• Evaluation Considerations: 

• Questions tend to be “simple”; 

• Limited opportunity for “complex” interaction between the researcher and
subjects. 

• Research setting offers no realistic context.
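Sample studies correlate characteristics rather than manipulate them. As a sketch of such a correlational analysis, the snippet below computes Pearson's correlation coefficient between two variables one might mine from a repository. The variable names and values are hypothetical, and Pearson's r is only one of several possible measures (for ordinal data such as Likert scales, a rank-based measure is usually a better fit).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical repository data: file size in LOC and defects reported per file.
file_loc = [100, 200, 300, 400, 500]
defects = [1, 3, 2, 5, 4]

r = pearson_r(file_loc, defects)
print(f"Pearson r = {r:.2f}")
```

A high r indicates association only. Because no variable was manipulated, it does not establish that larger files cause more defects.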
Formal Theory
(Mathematical Model)
• Purpose: To develop an understanding of the role of
creativity in RE. 

• Setting: no empirical observations, but derivation of a
conceptual framework from literature. 

• Procedure: check general creativity literature, check
requirements engineering creativity literature. 

• Findings: A theoretical framework that offers RE
researchers a basis to incorporate creativity in RE
methods and techniques.
Formal Theory
(Mathematical Model)
• Setting: Nonempirical setting; typically a research office or library. 

• Purpose: 

• To develop a conceptualization, framework, or theory on a topic. 

• Focus is on formulating relations among concepts, or explanations that hold for a
wide range of contexts. 

• Typical Methods and Data: literature reviews, Conceptual reasoning, concept
development, development of propositions and/or hypotheses; framework
development. 

• Inherent Limitations:
• Low on realism: does not consider a specific context but rather abstract concepts

• No manipulation of variables or measurement (no empirical information is gathered)
Formal Theory
(Mathematical Model)
• Essence:
• The careful and justified construction of a theoretical model that
represents one view of a phenomenon, which helps to analyze
or explain the real world. 

• Models generic behavior for a range of classes of populations
(humans or nonhuman artifacts), which serves to make
predictions or explanations about the real world. 

• Evaluation Considerations:
• Theoretical models do not generate new empirical observations,
though they may inform future empirical studies.
Computer Simulations
(Forecasting system)
• Purpose: To investigate bottlenecks and overload in the
testing processes. 

• Setting: Nonempirical, a discrete event simulator was
implemented. 

• Procedure: Four simulation scenarios with different
parameter values to model different circumstances. 

• Findings: Two ways were identified to avoid congestion:

(1) increase number of staff, (2) increase the number of
interactions with the development team.
Computer Simulations
(Forecasting system)
• Setting: Nonempirical setting (in silico); no recording of observations in the real world.
There are no actors (people, real-world systems) or real-world behavior: everything is
specified in the simulation. 

• Purpose: To model a particular system or phenomenon that facilitates evaluation of a
large number of complex scenarios that are captured in the preprogrammed model. 

• Typical Methods and Data: Development of software programs that contain symbolic
representations of all variables a researcher considers important; usually these variables
are derived and calibrated based on prior empirical studies. 

• Inherent Limitations:
• No empirical data is gathered

• Results will be as good as the accuracy of the model representing the simulated
system

• Low generalizability as it attempts to model a specific class of real-world systems
Computer Simulations
(Forecasting system)
• Essence:
• Represents a symbolic replica of a concrete real-world system
where all configurations and variables are preprogrammed. 

• Useful to run a large number of complex scenarios to explore a
solution space, which might not be feasible to do manually. 

• Evaluation Considerations: 

• All simulation rules are preprogrammed: no new empirical (i.e.,
real world, as opposed to simulated) behavior is observed.

• Due to concrete implementation, limited generalizability.
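To give a feel for what a discrete event simulation of a testing process might look like, here is a minimal sketch of a first-come, first-served queue of test jobs served by a pool of testers. It is not the simulator from the example study. The arrival pattern, the fixed service time and the function name are assumptions chosen so the result is easy to check by hand.

```python
import heapq

def mean_waiting_time(arrivals, service_time, n_testers):
    """Mean time a job waits before a tester picks it up (FIFO, identical testers)."""
    free_at = [0.0] * n_testers        # time at which each tester becomes free
    heapq.heapify(free_at)
    total_wait = 0.0
    for arrival in arrivals:           # arrivals must be sorted in time order
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)
        total_wait += start - arrival
        heapq.heappush(free_at, start + service_time)
    return total_wait / len(arrivals)

# Hypothetical scenario: one job arrives per time unit, each takes 3 units to test.
arrivals = [float(t) for t in range(10)]
for n_testers in (1, 2, 3):
    print(n_testers, "tester(s): mean wait =", mean_waiting_time(arrivals, 3.0, n_testers))
```

In this toy scenario the queue only stops growing once three testers are available, which mirrors the "increase number of staff" option. As the slide notes, such a result is only as good as the model behind it.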
[Figure: the ABC framework (Fig. 1) shown again, with annotations marking some strategies as frequent in SE and others as rare in SE]
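The eight strategies and their settings, as described on the preceding slides, can be collected in a small lookup structure. This is a convenience sketch only: the class and function names are hypothetical, and the setting labels follow the "Setting" line of each strategy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strategy:
    name: str
    metaphor: str
    setting: str  # natural, contrived, neutral, or nonempirical

STRATEGIES = [
    Strategy("Field study", "Jungle", "natural"),
    Strategy("Field experiment", "Natural reserve", "natural"),
    Strategy("Experimental simulation", "Flight simulator", "contrived"),
    Strategy("Laboratory experiment", "In vitro experiment", "contrived"),
    Strategy("Judgment study", "Courtroom", "neutral"),
    Strategy("Sample study", "Referendum", "neutral"),
    Strategy("Formal theory", "Mathematical model", "nonempirical"),
    Strategy("Computer simulation", "Forecasting system", "nonempirical"),
]

def by_setting(setting: str) -> list:
    """Names of the strategies that take place in the given kind of setting."""
    return [s.name for s in STRATEGIES if s.setting == setting]

for setting in ("natural", "contrived", "neutral", "nonempirical"):
    print(setting, "->", by_setting(setting))
```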
Conclusion
7 Commandments of the Empirical
Software Engineer (From Daniel Méndez)
1. No such thing as absolute and / or universal truth (truth is always
relative)



2. The value of scientific theories always depends on their 

• ability to stand criticism by the (research) community,

• robustness / our confidence (e.g. degree of corroboration),

• contribution to the body of knowledge (relation to existing evidence)

• ability to solve a problem

3. Theory building is a long endeavour where

• progress comes in an iterative, step-wise manner,

• empirical inquiries need to consider many non-trivial factors,

• we often need to rely on pragmatism and creativity

• we depend on acceptance by peers (research communities) 

https://www.slideshare.net/mendezfe/an-introduction-into-philosophy-of-science-for-software-engineers
4. Be sceptical and open at the same time

• no statement imposed by authorities shall be immune to criticism

• be open to existing evidence and arguments/explanations by others



5. Always be aware of

• strengths & limitations of single research methods

• validity and scope of observations and related theories

• relation to existing body of knowledge / existing evidence



6. Appreciate the value of

• all research processes and methods

• null results (one’s failure can be another one’s success) 

• replication studies (progress comes via repetitive steps)



7. Be an active part of something bigger (knowledge is built by
communities)

[2016/2017] RESEARCH in software engineering[2016/2017] RESEARCH in software engineering
[2016/2017] RESEARCH in software engineeringIvano Malavolta
 
Debugging microservices in production
Debugging microservices in productionDebugging microservices in production
Debugging microservices in productionbcantrill
 
CSE_2014 SE MODULE 1 V.10.pptx
CSE_2014 SE MODULE 1 V.10.pptxCSE_2014 SE MODULE 1 V.10.pptx
CSE_2014 SE MODULE 1 V.10.pptxAbdulMateen516672
 
Automatic for the People
Automatic for the PeopleAutomatic for the People
Automatic for the PeopleAndy Zaidman
 
CSE_2014 SE MODULE 1 V.10 (2).pptx
CSE_2014 SE MODULE 1 V.10 (2).pptxCSE_2014 SE MODULE 1 V.10 (2).pptx
CSE_2014 SE MODULE 1 V.10 (2).pptxMrSDeepakRajAssistan
 
Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!University of Córdoba
 
[2017/2018] RESEARCH in software engineering
[2017/2018] RESEARCH in software engineering[2017/2018] RESEARCH in software engineering
[2017/2018] RESEARCH in software engineeringIvano Malavolta
 
Debugging (Docker) containers in production
Debugging (Docker) containers in productionDebugging (Docker) containers in production
Debugging (Docker) containers in productionbcantrill
 
RESEARCH in software engineering
RESEARCH in software engineeringRESEARCH in software engineering
RESEARCH in software engineeringIvano Malavolta
 
Creating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your SystemCreating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your SystemGiovanni Asproni
 
Applying Systems Thinking to Solve Wicked Problems in Software Engineering
Applying Systems Thinking to Solve Wicked Problems in Software EngineeringApplying Systems Thinking to Solve Wicked Problems in Software Engineering
Applying Systems Thinking to Solve Wicked Problems in Software EngineeringMajed Ayyad
 
Exploratory Testing in a chaotic world to share
Exploratory Testing in a chaotic world   to shareExploratory Testing in a chaotic world   to share
Exploratory Testing in a chaotic world to shareDoron Bar
 
Markus Clermont - Surviving in an Agile Environment - Google - SoftTest Ireland
Markus Clermont - Surviving in an Agile Environment - Google - SoftTest IrelandMarkus Clermont - Surviving in an Agile Environment - Google - SoftTest Ireland
Markus Clermont - Surviving in an Agile Environment - Google - SoftTest IrelandDavid O'Dowd
 
Agile software development
Agile software developmentAgile software development
Agile software developmentHemangi Talele
 
Contemporary Software Engineering Practices Together With Enterprise
Contemporary Software Engineering Practices Together With EnterpriseContemporary Software Engineering Practices Together With Enterprise
Contemporary Software Engineering Practices Together With EnterpriseKenan Sevindik
 
Fact or Fiction? What Software Analytics Can Do For Us
Fact or Fiction? What Software Analytics Can Do For UsFact or Fiction? What Software Analytics Can Do For Us
Fact or Fiction? What Software Analytics Can Do For UsAndy Zaidman
 
Can we induce change with what we measure?
Can we induce change with what we measure?Can we induce change with what we measure?
Can we induce change with what we measure?Michaela Greiler
 
Lecture4 requirement engineering
Lecture4 requirement engineeringLecture4 requirement engineering
Lecture4 requirement engineeringShahid Riaz
 

Similaire à Empirical Methods in Software Engineering - an Overview (20)

[2016/2017] RESEARCH in software engineering
[2016/2017] RESEARCH in software engineering[2016/2017] RESEARCH in software engineering
[2016/2017] RESEARCH in software engineering
 
Debugging microservices in production
Debugging microservices in productionDebugging microservices in production
Debugging microservices in production
 
3.pptx
3.pptx3.pptx
3.pptx
 
CSE_2014 SE MODULE 1 V.10.pptx
CSE_2014 SE MODULE 1 V.10.pptxCSE_2014 SE MODULE 1 V.10.pptx
CSE_2014 SE MODULE 1 V.10.pptx
 
Automatic for the People
Automatic for the PeopleAutomatic for the People
Automatic for the People
 
CSE_2014 SE MODULE 1 V.10 (2).pptx
CSE_2014 SE MODULE 1 V.10 (2).pptxCSE_2014 SE MODULE 1 V.10 (2).pptx
CSE_2014 SE MODULE 1 V.10 (2).pptx
 
Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!
 
[2017/2018] RESEARCH in software engineering
[2017/2018] RESEARCH in software engineering[2017/2018] RESEARCH in software engineering
[2017/2018] RESEARCH in software engineering
 
Debugging (Docker) containers in production
Debugging (Docker) containers in productionDebugging (Docker) containers in production
Debugging (Docker) containers in production
 
RESEARCH in software engineering
RESEARCH in software engineeringRESEARCH in software engineering
RESEARCH in software engineering
 
Creating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your SystemCreating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your System
 
Applying Systems Thinking to Solve Wicked Problems in Software Engineering
Applying Systems Thinking to Solve Wicked Problems in Software EngineeringApplying Systems Thinking to Solve Wicked Problems in Software Engineering
Applying Systems Thinking to Solve Wicked Problems in Software Engineering
 
Exploratory Testing in a chaotic world to share
Exploratory Testing in a chaotic world   to shareExploratory Testing in a chaotic world   to share
Exploratory Testing in a chaotic world to share
 
Fundamentals of testing
Fundamentals of testingFundamentals of testing
Fundamentals of testing
 
Markus Clermont - Surviving in an Agile Environment - Google - SoftTest Ireland
Markus Clermont - Surviving in an Agile Environment - Google - SoftTest IrelandMarkus Clermont - Surviving in an Agile Environment - Google - SoftTest Ireland
Markus Clermont - Surviving in an Agile Environment - Google - SoftTest Ireland
 
Agile software development
Agile software developmentAgile software development
Agile software development
 
Contemporary Software Engineering Practices Together With Enterprise
Contemporary Software Engineering Practices Together With EnterpriseContemporary Software Engineering Practices Together With Enterprise
Contemporary Software Engineering Practices Together With Enterprise
 
Fact or Fiction? What Software Analytics Can Do For Us
Fact or Fiction? What Software Analytics Can Do For UsFact or Fiction? What Software Analytics Can Do For Us
Fact or Fiction? What Software Analytics Can Do For Us
 
Can we induce change with what we measure?
Can we induce change with what we measure?Can we induce change with what we measure?
Can we induce change with what we measure?
 
Lecture4 requirement engineering
Lecture4 requirement engineeringLecture4 requirement engineering
Lecture4 requirement engineering
 

Plus de alessio_ferrari

Systematic Literature Reviews and Systematic Mapping Studies
Systematic Literature Reviews and Systematic Mapping StudiesSystematic Literature Reviews and Systematic Mapping Studies
Systematic Literature Reviews and Systematic Mapping Studiesalessio_ferrari
 
Case Study Research in Software Engineering
Case Study Research in Software EngineeringCase Study Research in Software Engineering
Case Study Research in Software Engineeringalessio_ferrari
 
Survey Research In Empirical Software Engineering
Survey Research In Empirical Software EngineeringSurvey Research In Empirical Software Engineering
Survey Research In Empirical Software Engineeringalessio_ferrari
 
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...alessio_ferrari
 
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
Controlled experiments, Hypothesis Testing, Test Selection, Threats to ValidityControlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validityalessio_ferrari
 
Requirements Engineering: focus on Natural Language Processing, Lecture 2
Requirements Engineering: focus on Natural Language Processing, Lecture 2Requirements Engineering: focus on Natural Language Processing, Lecture 2
Requirements Engineering: focus on Natural Language Processing, Lecture 2alessio_ferrari
 
Requirements Engineering: focus on Natural Language Processing, Lecture 1
Requirements Engineering: focus on Natural Language Processing, Lecture 1Requirements Engineering: focus on Natural Language Processing, Lecture 1
Requirements Engineering: focus on Natural Language Processing, Lecture 1alessio_ferrari
 
Ambiguity in Software Engineering
Ambiguity in Software EngineeringAmbiguity in Software Engineering
Ambiguity in Software Engineeringalessio_ferrari
 
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an OverviewNatural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overviewalessio_ferrari
 

Plus de alessio_ferrari (9)

Systematic Literature Reviews and Systematic Mapping Studies
Systematic Literature Reviews and Systematic Mapping StudiesSystematic Literature Reviews and Systematic Mapping Studies
Systematic Literature Reviews and Systematic Mapping Studies
 
Case Study Research in Software Engineering
Case Study Research in Software EngineeringCase Study Research in Software Engineering
Case Study Research in Software Engineering
 
Survey Research In Empirical Software Engineering
Survey Research In Empirical Software EngineeringSurvey Research In Empirical Software Engineering
Survey Research In Empirical Software Engineering
 
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
 
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
Controlled experiments, Hypothesis Testing, Test Selection, Threats to ValidityControlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
 
Requirements Engineering: focus on Natural Language Processing, Lecture 2
Requirements Engineering: focus on Natural Language Processing, Lecture 2Requirements Engineering: focus on Natural Language Processing, Lecture 2
Requirements Engineering: focus on Natural Language Processing, Lecture 2
 
Requirements Engineering: focus on Natural Language Processing, Lecture 1
Requirements Engineering: focus on Natural Language Processing, Lecture 1Requirements Engineering: focus on Natural Language Processing, Lecture 1
Requirements Engineering: focus on Natural Language Processing, Lecture 1
 
Ambiguity in Software Engineering
Ambiguity in Software EngineeringAmbiguity in Software Engineering
Ambiguity in Software Engineering
 
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an OverviewNatural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
 

Dernier

Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Data modeling 101 - Basics - Software Domain
Data modeling 101 - Basics - Software DomainData modeling 101 - Basics - Software Domain
Data modeling 101 - Basics - Software DomainAbdul Ahad
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Copilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform CopilotCopilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform CopilotEdgard Alejos
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxRTS corp
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdfAndrey Devyatkin
 

Dernier (20)

Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Data modeling 101 - Basics - Software Domain
Data modeling 101 - Basics - Software DomainData modeling 101 - Basics - Software Domain
Data modeling 101 - Basics - Software Domain
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Copilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform CopilotCopilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform Copilot
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 

Empirical Methods in Software Engineering - an Overview

  • 1. Empirical Methods in Software Engineering Alessio Ferrari, ISTI-CNR, Pisa, Italy alessio.ferrari@isti.cnr.it April, 2020
  • 2. What is Software Engineering? • Software engineering is the systematic design and development of software products and the management of the software process. • Software engineering has as one of its primary objectives the production of programs that meet specifications, are demonstrably accurate, produced on time, and within budget. cf. D. O’Neil, 1980. https://doi.org/10.1147/sj.194.0421
  • 3. The Scope of Software Engineering (SE) customer / user analyst developer requirements software system designer tester apparently good software customer / user customer / user customer service / analyst requirements maintainer / system designer
  • 4. Typical SE Problems How can I find bugs in my code? How can I improve software development speed? How can I reduce the resources dedicated to testing? How can I improve my requirements?
  • 5. Typical SE Solutions (when we were not Empirical) How can I find bugs in my code? How can I improve software development speed? How can I reduce the resources dedicated to testing? How can I improve my requirements? This new testing environment will allow you to find all the bugs. Let us use this new prototypical programming language. We can use a controlled language. Let’s use model checking.
  • 6. Typical SE Failures How can I find bugs in my code? How can I improve software development speed? How can I reduce the resources dedicated to testing? How can I improve my requirements? This new testing environment will allow you to find all the bugs. Let us use this new prototypical programming language. We can use a controlled language. Let’s use model checking. It is very complex! I need to retrain all my team! The language does not cover my real cases! The language does not allow me to express what I want! I need training! It takes too long! Language is too strict!
  • 7. The Software Engineer’s Illusion We thought our tiny solutions would scale up to (ALL) real-world cases; we thought a successful simple example was sufficient to ensure that our idea was working… We were convinced that we could smoothly pass from theory to practice (Problem → Solution). We thought we could change the world without knowing the world.
  • 8. The Hard Truth • Of course, we were wrong… • Software development is a complex, context-dependent phenomenon involving multiple stakeholders, professionals, needs, technologies, domains (aircraft software, mobile app to track your diet, enterprise software to manage workflow, you name it…) • You rarely start the development from scratch (there may be legacy systems to refactor, or a need to interact with external systems and databases) • Even if the software to be developed is new, developers have specific backgrounds and skills that have an impact on the development • You rarely know how the project will go, as the context is surely going to change throughout the project • We thought it was too simple, and SE solutions rarely came from research; as SE researchers, we felt useless… We understood that, to change the world, we should first learn about the world
  • 9. Empirical Software Engineering Research • The use of a (not “the”) scientific method to investigate software engineering problems (there is no “official” scientific method) • Start from observation, formulate hypothesis, select methodology, validate hypothesis with respect to reality • The simple idea is that if I understand how things work in practice, I can find ways to improve them • Knowledge and understanding are not regarded as a final goal, but as a means to an end, where the end is solving real-world problems (within the scope of software engineering) • Scientifically evaluating whether my solution has solved the problem is also empirical software engineering
  • 10. Typical Empirical Software Engineering Cycle: Observe Reality → Formulate Problem Theory → Evaluate Theory Against Reality → Formulate Solution Theory → Evaluate Solution Against Reality (the steps span a Theory Space and a Reality Space)
  • 11. Typical Empirical Software Engineering Cycle: I have a lot of bugs in this software. People work hours and hours, but still, lots of bugs.
  • 12. Typical Empirical Software Engineering Cycle: I have a lot of bugs in this software. People work hours and hours, but still, lots of bugs. Bugs may be produced by too many work hours. Bugs may be associated with complex code.
  • 13. Typical Empirical Software Engineering Cycle: Bugs may be produced by too many work hours. I see that most bugs are introduced between 8pm and 8am. Bugs may be associated with complex code. No relation with code complexity.
  • 14. Typical Empirical Software Engineering Cycle: I see that most bugs are introduced between 8pm and 8am. No relation with code complexity. Install a system that prevents developers from working at night.
  • 15. Typical Empirical Software Engineering Cycle: Install a system that prevents developers from working at night. I reduced the number of bugs!
  • 16. Typical Empirical Software Engineering Cycle: I reduced the number of bugs! I cannot meet the delivery deadlines!
  • 17. Typical Empirical Software Engineering Cycle: I cannot meet the delivery deadlines! Development speed may be lower during the day.
  • 18. Typical Empirical Software Engineering Cycle: Development speed may be lower during the day. More correct code during the day, but slower speed. More bugs during the night, but faster speed.
  • 19. Typical Empirical Software Engineering Cycle: More correct code during the day, but slower speed. More bugs during the night, but faster speed. Dedicate more testing resources for code developed at night.
  • 20. Typical Empirical Software Engineering Cycle: Dedicate more testing resources for code developed at night. Fewer bugs. More software.
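The "Evaluate Theory Against Reality" step in the walk-through above (checking whether most bugs are introduced between 8pm and 8am) could be sketched roughly as follows. This is only an illustration: the `(hour, introduced_bug)` record format, the function names, and the sample data are assumptions made for this example, not material from the slides, and real commit data would have to be mined from a project's repository and issue tracker.

```python
from collections import Counter


def is_night(hour: int) -> bool:
    """Night is 20:00-07:59, mirroring the '8pm to 8am' window in the slides."""
    return hour >= 20 or hour < 8


def bug_rate_by_period(commits):
    """Return the share of bug-introducing commits per period.

    `commits` is an iterable of (hour_of_day, introduced_bug) pairs;
    this record format is a hypothetical one chosen for the sketch.
    """
    totals = Counter()
    buggy = Counter()
    for hour, introduced_bug in commits:
        period = "night" if is_night(hour) else "day"
        totals[period] += 1
        if introduced_bug:
            buggy[period] += 1
    return {period: buggy[period] / totals[period] for period in totals}


# Made-up sample data, for illustration only.
sample = [
    (9, False), (11, False), (14, True), (16, False),   # daytime commits
    (21, True), (23, True), (2, False), (6, True),      # night-time commits
]
rates = bug_rate_by_period(sample)
print(rates)
```

A difference in rates like this only supports the problem theory; as the later slides point out, the solution derived from it still has to be evaluated against reality.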
  • 21. Typical SE Problems How can I find bugs in my code? How can I improve software development speed? How can I reduce the resources dedicated to testing? How can I improve my requirements? Problems are the same…
  • 22. Typical SE Solutions (Today) The way of approaching problems is different… Solutions are more context-specific, start from reality, and use a scientific method. How can I improve my requirements? Let’s see how you write requirements now. Let’s see what the main quality problems of your requirements are. I see your requirements language is clear, but requirements are just incomplete! How many meetings do you normally do with your customer? … Ok, let’s try to schedule a meeting each week to revise the requirements with the customer. But WHICH method?
  • 23. Software Engineering as a Strange Creature It has a technical facet but also a human and social facet
  • 24. Empirical Methods in Software Engineering Come from hard sciences (mostly quantitative) …but also from social sciences (mostly qualitative) experiment with human subjects literature review / archival analysis interview and ethnography survey case study field study experiment with software subjects judgment study
  • 25. Goal and Scope of this Course • To learn a set of methods commonly used in empirical software engineering research • To learn when to use a certain scientific method • To learn how to combine different methods Remember that all methods are FLAWED!
  • 26. All Methods Are Flawed! 
 (from Steve Easterbrook) • Experiments • Real-world environment is simplified, as I have to focus only on a specific set of variables (independent, dependent, controlled variables, we will see them later) • Surveys • People tell you what they think, not what they do, and it is hard to be sure that they have correctly understood the questions • Interviews and Ethnography • Unavoidable researcher bias, as theories are derived from qualitative data • Case Studies (which could include (quasi-)experiments, surveys, interviews, etc.) • Hard to generalise and hard to separate environment from unit of analysis, as case studies are real-world experiences (several confounding variables) in one or a few companies (limited scope) http://www.cs.toronto.edu/~sme/CSC2130/ Never stick to methodological purity! WARNING: there is no acknowledged taxonomy for these methods!
  • 27. Course Outline • Overview of Empirical Methods • Interviews and Ethnography • Surveys • Systematic Literature Reviews • Qualitative Data Analysis Methods • Experiments, Quasi-experiments and Hypothesis Testing • Mining Software Repositories • Case Studies and Action Research
  • 29. Roles and Tasks in Software Engineering These are the things that you will study as an Empirical Software Engineer
  • 33. Roles in SE • Board of Directors: A group of people, elected by stockholders, to establish corporate policies and make management decisions (can also be a single person in case the co) • Managers: three different levels of management may be present in a large company (low, middle, top) • Top-level managers are responsible for controlling and overseeing the entire organization. • Middle-level managers are responsible for executing organizational plans which comply with the company’s policies. These managers act as an intermediary between top-level management and low-level management. • Low-level managers focus on controlling and directing. They serve as role models for the employees they supervise. • Customers: the ones who buy the system • Users: the ones who use the system • Requirements/Business Analysts: the ones who gather requirements from customers and users • Designers and Architects: the ones who design the system at a high level • Developers: the ones who code • Testers: the ones who test the code Companies may include only a subset of the roles. Some roles may be covered by the same person. The roles may depend on the adopted software process!
  • 34. (Main) Tasks in Software Engineering • Requirements Elicitation and Analysis • Software Architecture • Software Development • Software Testing • Software Documentation • Software Maintenance • Software Process Management
  • 35. Formulating Research Questions cf. R. Feldt http://www.robertfeldt.net/advice/guide_to_creating_research_questions.pdf
  • 38. Research Question(s) • Every research endeavour starts with a question about the world: a problem to solve, a curiosity about some observed fact (often subconsciously related to something relevant that you may not always be able to articulate, e.g., why do developers prefer to work at night? Why are you asking this question? Because it’s interesting, but why is it so?), or a curiosity about some unknown fact (which are the most frequent defects in open-source code?) • The research question is the inquiry that guides your research: • e.g., Which are the most frequent defects in code developed by people with less than 6 months of experience? Which are the most frequent defects in code developed by people with 6 months to 3 years of experience? […] • You normally structure your research and reporting according to one or more research questions: they help to clarify your GOAL to the reader but also TO YOU • If you have more than one research question, it is good to establish a general research question (or research objective): • e.g., (mainly considering HOW aspects) To what extent are certain defect types related to the degree of experience of the developer? • e.g., (a more general one, which may also include WHY aspects) What is the relationship between defect types and the degree of experience of the developer? Many times a clear formulation of the general research question comes AFTER the formulation of the more specific research questions TIP: sometimes you can formulate the general research question as a Research Objective, e.g.: Understanding to what extent certain defect types are related to the degree of experience of a developer
  • 39. Types of Research Questions (from Robert Feldt) Research Questions (RQs) Solution-focused Knowledge-focused Creating Refining Exploratory Base-rate Existence Descriptive Comparative Relationship Frequency Process Existence Causality Comparative Context http://www.robertfeldt.net/advice/guide_to_creating_research_questions.pdf
  • 40. Types of Research Questions (from Robert Feldt) Research Questions (RQs) Solution-focused Knowledge-focused Creating Refining Exploratory Base-rate Existence Descriptive Comparative Relationship Frequency Process Existence Causality Comparative Context if not much is known about the phenomenon under study, we want to create tentative theories, and give some evidence that a certain phenomenon can be measured (e.g., To which extent do developers get tired of coding?)
  • 41. Types of Research Questions (from Robert Feldt) Research Questions (RQs) Solution-focused Knowledge-focused Creating Refining Exploratory Base-rate Existence Descriptive Comparative Relationship Frequency Process Existence Causality Comparative Context describe when and how the phenomenon under study appears (normal patterns), when we already have a well-defined problem and context (e.g., When do developers get tired of coding?)
  • 42. Types of Research Questions (from Robert Feldt) Research Questions (RQs) Solution-focused Knowledge-focused Creating Refining Exploratory Base-rate Existence Descriptive Comparative Relationship Frequency Process Existence Causality Comparative Context describe how the phenomenon under study relates to other phenomena (e.g., Why do developers get tired of coding?)
  • 43. Types of Research Questions (from Robert Feldt) Research Questions (RQs) Solution-focused Knowledge-focused Creating Refining Exploratory Base-rate Existence Descriptive Comparative Relationship Frequency Process Existence Causality Comparative Context describe better ways to solve problem or situation Which strategies help to achieve X? How can we refine S to achieve X in a better way?
  • 44. Sub-Types of RQs and Examples
    Exploratory/Existence: “Does X exist?”, “Is Y something that software engineers really do?”
    Exploratory/Descriptive: “What is X like?”, “What are its properties/attributes?”, “How can we categorize/measure X?”, “What are the components of X?”
    Exploratory/Comparative: “How does X differ from Y?”
    Base-rate/Frequency: “How often does X occur?”, “What is an average amount of X?”
    Base-rate/Process: “How does X normally work?”, “What is the process by which X happens?”, “In what sequence do the events of X occur?”
    Relationship/Existence: “Are X and Y related?”, “Do occurrences of X correlate with Y?”, “What correlates with X?”
    Relationship/Causality: “What causes X?”, “Does X cause Y?”, “Does X prevent Y?”
    Causality/Comparative: “Does X cause more Y than Z does?”, “Is X better at preventing Y than Z is?”
    Causality/Context: “Does X cause more Y under one condition than others?”
  • 45. Creating Research Questions • Select an overarching research topic (e.g., software development speed) • Do you want to create more and better understanding (Knowledge-based), or are you seeking a solution to a problem (Solution-based)? • (Knowledge-based) e.g., what affects development speed? How can we measure development speed? • How much is known about the topic? • Not much (Exploratory): how can we measure development speed? • We know the phenomenon, but not how or when it occurs (Base-rate): what is the development speed of agile teams? • We know the phenomenon, but not its causes (Relationship): what affects development speed? • (Solution-based) e.g., how can I improve development speed? What is the easiest way to improve development speed? • One research question is usually not sufficient and you need a combination of them, so try to find a main research question or objective, and identify sub-questions, e.g., by checking the types of questions in the previous table and adapting them to your problem
  • 46. Data Types, Measure, Scales cf. Wohlin et al., 2012, https://doi.org/10.1007/978-3-642-29044-2 Alessio Ferrari, ISTI-CNR, Pisa, Italy alessio.ferrari@isti.cnr.it
  • 49. Empirical Methods in Software Engineering experiment with human subjects literature review / archival analysis interview and ethnography survey case study field study experiment with software subjects judgment study Empirical inquiries entail OBSERVATION Regardless of the research method you use, you will need to collect and analyse data
  • 50. Qualitative and Quantitative Types of Data • As you are doing empirical research, you will need to collect data, regardless of the method you use • Qualitative data (aka WORDS) come from interviews, surveys but also from other sources that may be relevant for SE, such as social-media opinions, code comments, app reviews • Quantitative data (aka NUMBERS) come from measurements (also done on qualitative data)
  • 51. Measure • A MEASURE is a mapping from the attribute of an entity to a measurement value, which can be numerical or categorical (a label) • Entities are objects we can observe in the real world, and have attributes • entity: source code; • attribute: complexity; • measure A: lines of code; value A: 1000 • measure B: evaluation made by user; value B: “very complex” • The purpose of mapping the attributes into a measurement value is to characterize and manipulate the attributes in a formal way. • To be valid, the measure must not violate any necessary properties of the attribute it measures and it must be a proper mathematical characterization of the attribute (if code X is more complex than code Y, this should be reflected in the measure)
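The mapping from an entity and its attribute to a measurement value can be sketched in a few lines of Python. This is only an illustration: the helper `count_loc` and the two code snippets are assumptions made for the example (they do not come from the slides), and counting non-blank lines is just one possible way to define "lines of code".

```python
def count_loc(source: str) -> int:
    """Measure A: map the entity (source code) to a numerical value.

    Assumption for this sketch: LOC = number of non-blank lines.
    """
    return sum(1 for line in source.splitlines() if line.strip())


# Two hypothetical entities (pieces of source code)
small = "x = 1\nprint(x)\n"
large = "x = 1\n\nfor i in range(3):\n    x += i\nprint(x)\n"

loc_small = count_loc(small)
loc_large = count_loc(large)

# If we regard `large` as more complex than `small`, a valid measure of
# complexity must preserve that ordering.
ordering_preserved = loc_large > loc_small
print(loc_small, loc_large, ordering_preserved)
```

Whether LOC is a valid measure of complexity depends on whether this ordering property holds for the code you actually care about, which is an empirical question in itself.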
  • 52. Scale • A mapping of an attribute to a measurement value can be done in different ways, and each way is a scale • Complexity can be measured in lines of code (LOC) or as an “evaluation made by user”; these are different scales [Figure: Entity (Source Code), Attribute (Complexity), Measure, Measurement Value (1000), Scale (LOC)]
  • 53. Scale Types (Level of Measurement) • Nominal (named values): maps the attribute of the entity into a name or symbol; can be seen as a form of classification of the attribute (e.g., types of code defects) • Ordinal (named and ordered values): the ordinal scale ranks the entities according to an ordering criterion (“greater than”, “better than”, “more complex”) (e.g., catastrophic, critical, marginal, negligible risk) • Interval (named, ordered and proportionate intervals): the interval scale is used when the difference between two measures is meaningful, but the value itself is not. • This scale type orders the values in the same way as the ordinal scale, but there is also a notion of “relative distance” between two entities • Rare in SE; temperature in Celsius is a typical interval scale, but you can also set up a scale like the IQ (Intelligence Quotient) in SE (e.g., a usability scale based on a test) • Ratio (named, ordered, proportionate intervals, with a meaningful zero): used when there exists a meaningful zero value (negative values do not exist) and the ratio between two measures is meaningful (e.g., lines of code is a ratio scale)
  • 54. Scales: Time and Duration • (Clock) Time and Duration: what types of scale are they? • Time is an interval scale • Duration is a ratio scale • Time is an interval measure when using any standard calendar and time measurement system, as there is no fixed start point • 2018/10/23:20:10 CE and 2018/10/23:20:20 CE: there is a 10-minute gap, but the latter is not twice the former and there is no meaningful 0 • Duration (the amount of time something takes) is a ratio measure, as it has a meaningful zero • 20 seconds is twice as long as 10 seconds and 10 days is twice as long as 5 days. cf. https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/
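The distinction between clock time (interval) and duration (ratio) happens to be mirrored by Python's standard `datetime` module, which makes for a compact illustration. The timestamps below are arbitrary values chosen for this sketch, not taken from the slide.

```python
from datetime import datetime, timedelta

# Two clock times (interval scale): arbitrary example values
start = datetime(2020, 4, 1, 9, 0)
end = datetime(2020, 4, 1, 9, 30)

# Differences between clock times are meaningful and yield a duration
gap = end - start


def ratio_of_clock_times_is_defined() -> bool:
    """Dividing one clock time by another is not supported: no meaningful zero."""
    try:
        end / start
    except TypeError:
        return False
    return True


# Durations (ratio scale) have a meaningful zero, so ratios make sense
duration_ratio = timedelta(seconds=20) / timedelta(seconds=10)

print(gap, ratio_of_clock_times_is_defined(), duration_ratio)
```

The library allows subtraction of two `datetime` values but not their division, while two `timedelta` values can be divided, which matches the operations the two scale types permit.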
  • 55. Scale Types and Power cf. https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/ Different scale types imply different allowed operations
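As a rough sketch of what "different allowed operations" means in practice, the snippet below picks a measure of central tendency according to the scale type. The function `central_tendency` and the sample data are hypothetical, and the mapping (mode for nominal, median for ordinal, mean for interval and ratio) follows the commonly cited convention for levels of measurement rather than anything stated on the slide.

```python
from statistics import mean, median_low, mode

SCALES = ("nominal", "ordinal", "interval", "ratio")  # weakest to strongest


def central_tendency(values, scale):
    """Return a central-tendency statistic that the given scale type permits."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale type: {scale}")
    if scale == "nominal":
        return mode(values)        # only equality/counting is meaningful
    if scale == "ordinal":
        return median_low(values)  # order is meaningful, distances are not
    return mean(values)            # interval and ratio: distances are meaningful


# Hypothetical data sets
defect_types = ["logic", "interface", "logic", "data"]  # nominal
risk_ranks = [1, 2, 2, 4, 3]  # ordinal, coded 1 = negligible .. 4 = catastrophic
lines_of_code = [100, 200, 600]  # ratio

print(central_tendency(defect_types, "nominal"))
print(central_tendency(risk_ranks, "ordinal"))
print(central_tendency(lines_of_code, "ratio"))
```

`median_low` is used for the ordinal case because it always returns one of the observed values and never averages two ranks, an operation the ordinal scale does not support.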
  • 56. Measure Types • Objective: an objective measure is a measure where there is no judgement in the measurement value and is therefore only dependent on the entity that is being measured. • An objective measure can be measured several times and by different researchers, and the same value can be obtained within the measurement error. • Subjective: a subjective measure is the opposite of the objective measure. The person making the measurement contributes by making some sort of judgement. The measure depends on both the entity and the viewpoint from which they are taken. • A subjective measure can be different if the entity is measured again. A subjective measure is mostly of nominal or ordinal scale type. • Direct: does not involve measurements on other attributes (e.g., LOC). • Indirect: is derived from the other measurements of other attributes, possibly involving more than one entity (e.g., defect density, productivity).
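To make the difference between direct and indirect measures concrete, here is a minimal sketch that derives defect density from two direct measures. The function name and the choice of "defects per thousand lines of code (KLOC)" as the definition are assumptions for illustration; the slide does not prescribe a formula.

```python
def defect_density(defects: int, loc: int) -> float:
    """Indirect measure derived from two direct ones: defects per KLOC."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    if defects < 0:
        raise ValueError("defects cannot be negative")
    return defects / (loc / 1000)


# Hypothetical direct measurements for one module
density = defect_density(defects=12, loc=4000)
print(density)
```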
  • 57. Measurements in SE • In SE we normally measure three classes of entities • PROCESS: The process describes which activities are needed to produce the software. • PRODUCT: The products are the artifacts, code, deliverables or documents that result from a process activity. • RESOURCES: Resources are the objects, such as personnel, hardware and budget, needed for a process activity.
  • 58. Measurements in SE • Relevant measures in SE are often indirect and subjective, and are normally expressed on a nominal or ordinal scale • Most of the time we want to link some internal attribute (e.g., code size, colours of the GUI) to an external one (e.g., perceived complexity, usability) • In principle we should not apply advanced statistical analysis when we deal with these measures… however, we do it anyway (but we should always reflect on the risks and on the value of our conclusions) cf. Briand et al. https://doi.org/10.1007/BF00125812
  • 59. Data Collection Techniques in SE cf. Lethbridge et al., 2005, https://doi.org/10.1007/s10664-005-1290-x
  • 60. Data Collection • Before measuring you need to collect data that can be relevant to your research questions • Depending on the question, you may need different data collection techniques • Normally, the data collection technique is also driven by the context that you CAN access: • are you in contact with a company and can you interview people? → interview • are you in contact with the company and can you set up meetings with their developers? → focus group • do you lack direct contact with companies, but can reach some people? → questionnaire • do you need to compare the performance of different tools? Which licenses do you have? Can you buy them? → static and dynamic analysis of a system Reaching the (RIGHT) source of information is one of the hardest parts…
  • 61. Data Collection Techniques cf. Lethbridge et al., 2005, https://doi.org/10.1007/s10664-005-1290-x
    Table 1. Data collection techniques suitable for field studies of software engineering.
    Category: Technique
    First Degree (direct involvement of software engineers), inquisitive techniques: Brainstorming and Focus Groups; Interviews; Questionnaires; Conceptual Modeling
    First Degree (direct involvement of software engineers), observational techniques: Work Diaries; Think-aloud Protocols; Shadowing and Observation Synchronized Shadowing; Participant Observation (Joining the Team)
    Second Degree (indirect involvement of software engineers): Instrumenting Systems; Fly on the Wall (Participants Taping Their Work)
    Third Degree (study of work artifacts only): Analysis of Electronic Databases of Work Performed; Analysis of Tool Use Logs; Documentation Analysis; Static and Dynamic Analysis of a System
  • 62. Data Collection Techniques Humans tend not to be reliable reporters, as they often do not remember past events with a high degree of accuracy. Records of activities, such as tapes, work products, and repositories, tend to be more reliable. However, care must be taken when interpreting these data sources as they may not be consistent, internally or with each other. Despite their drawbacks, first degree techniques are invaluable because of their flexibility and the phenomenon they can be used to study. Existing logs and repositories are easy to use but the data available is highly constrained. Software engineers, on the other hand, can be asked about a much wider range of topics. Second degree techniques […]
    Figure 1. Cost, reliability, flexibility, and phenomena addressed.
    From people intensive to data intensive: DIRECT involvement of people (e.g., brainstorming, questionnaires); INDIRECT involvement of people (e.g., instrumentation); NO INVOLVEMENT (e.g., analysis of data logs)
  • 63. 1st Degree Techniques: Direct Involvement • Inquisitive: • Brainstorming / Focus Groups • Interviews • Questionnaires / Surveys • Conceptual modelling • Observational: • Think Aloud • Shadowing / Observation • Participant Observation
  • 64. Brainstorming and Focus Groups • What are they: based on a simple trigger question, people are free to express whatever comes to their mind, initially on paper, and then take turns to speak. • Advantages: • useful when you are new to a domain and seeking ideas for further exploration • rapidly identify what is important to the participant population • give participants a sense of involvement in the research • Disadvantages: • can become unfocused • hard to schedule with busy developers (you need to stop the activity of many people) • Example: understanding factors leading to success and failure of software process improvement. Researchers involved 13 software companies and ran 49 focus groups. Each group comprised between 4 and 6 participants. Each session lasted 90 minutes. There were three types of groups: senior managers, project managers, and developers. The focus groups were moderated and tackled very specific questions aimed at understanding several factors leading to success and failure for software process improvement.
  • 65. Interviews • What are they: ask a series of questions to some relevant actor of the software process • Advantages: • People are familiar with question-answering • People tend to be happy when someone asks about them • Create rapport with people • Possibility to clarify • Disadvantages: • People are not always reliable, and this can bias the results • Difficulties in sampling (random sampling often not applicable) • Time consuming: scheduling, data transcription, etc. • Example: Study the design process used on 19 different projects at various organizations. They interviewed personnel from three different levels of the participating projects, systems engineers, senior software designers and project managers. The researchers conducted 97 interviews, which resulted in over 3000 pages of transcripts of the audio recordings.
  • 66. Questionnaires and Surveys • What are they: written pre-defined questions to be answered by people. • Advantages: • quick and easy to administer • reach more people • Disadvantages: • difficult to clarify questions and answers • return rates can be low (10% normally, 20% if you’re lucky) • Example: paper-based questionnaire to identify factors affecting the adoption of a certain tool in 52 organizations. The author contacted organizations that had purchased the tool and surveyed key information systems personnel about its use. Surveys and questionnaires are treated as synonyms here
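The low return rates mentioned above have a practical consequence when planning a survey: the number of people to contact grows quickly. A back-of-the-envelope sketch follows; the function `invitations_needed` is hypothetical, and it simply assumes that the expected return rate applies uniformly to everyone invited.

```python
def invitations_needed(target_responses: int, return_rate_percent: int) -> int:
    """Smallest number of invitations expected to yield the target responses.

    Uses integer arithmetic (ceiling division) to avoid rounding surprises.
    """
    if target_responses < 0:
        raise ValueError("target_responses cannot be negative")
    if not 0 < return_rate_percent <= 100:
        raise ValueError("return_rate_percent must be in (0, 100]")
    return (target_responses * 100 + return_rate_percent - 1) // return_rate_percent


# To collect 50 answers at the return rates quoted on the slide
typical = invitations_needed(50, 10)
lucky = invitations_needed(50, 20)
print(typical, lucky)
```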
  • 67. Conceptual Modelling • What is it: participants create a diagram of some aspect of their work, often a system architecture or organisational structure or process. The intent is to bring to light their mental models. • Advantages: • easy to collect (drawing) • can explain systems that are hard to understand otherwise • Disadvantages: • requires domain knowledge to be interpreted • can be hard to convince the engineers or other subjects to draw details • Example: • Identify the process, in terms of tools, actors and tasks, for performing reimbursement of expenses in a public administration office. The goal was to re-engineer the process. Interviews with personnel to gather information, graphical diagrams shown to the personnel, and validation of the diagrams. Conceptual modelling requires interviews or focus groups
  • 68. Work Diaries • What are they: require participants to record various events that occur during the day. This can mean filling out a form at the end of the day, recording specific activities as they occur, or noting whatever the current task is at a pre-selected time. • Advantages: • better self-reports of events, because participants record activities on an ongoing basis rather than in retrospect • you can randomly sample work diary moments • Disadvantages: • you need to convince people • can interfere with respondents as they work (recording can affect the work) • people could neglect to record • Example: I want to know what the communication patterns (who do you contact, and about what) in a company are. I ask developers to record their communication patterns for a period of one week. This allows identification of the interaction between the team members, and of the typical communication patterns of developers.
  • 69. Think Aloud • What is it: researchers ask participants to think out loud while performing a task. As software engineers sometimes forget to verbalize, experimenters may occasionally remind them to continue thinking out loud. Sessions usually last no more than 2 hours. • Advantages: • One of the few ways to test a cognitive model • Easy to implement • You can also ask participants to write their thoughts down • Disadvantages: • Difficult and time consuming to analyse the output • Example: I want to understand the strategy used by developers when debugging. I give them a certain piece of software and ask them to add some functions; the system will return an error, and then I ask them to debug the code and think aloud about what they do.
  • 70. Shadowing/Observation • What are they: with shadowing, the experimenter follows and observes the participant and records their activities. With observation, the experimenter follows and observes several participants (e.g., in meetings). • Advantages: • Easy to implement • No special equipment needed • Disadvantages: • You learn only the general, observable activity • Need to know the environment and domain very well • Can be annoying for people, and could bias their behaviour • Example: I want to monitor informal communication in the group, and I observe an open development space for a certain number of days.
  • 71. Participant Observation (Join the Team) • What is it: the researcher joins the development team and performs some activities like the others. • Advantages: • More acceptance by the participants • Deeper understanding of the dynamics • Create rapport with people • You can contribute to the team • Disadvantages: • Extremely time consuming (it’s an additional job) • May lose external perspective • Example: Over 17 months, a researcher participated in 23 code inspection meetings. From his participation, he developed a series of hypotheses on how factors such as familiarity, organizational distance, and physical distance are related to how much time is spent on discussion and tasks.
  • 72. 2nd Degree Techniques: Indirect Involvement • Instrumenting Systems • Fly-on-the-Wall: participants recording their own work The researcher needs to have contact with the research environment and with the participants, but: 1. does not need to interact with them during data collection, and 2. not much effort is required from participants
  • 73. Instrumenting Systems • What is it: monitor developer-system interaction during a certain task, e.g., with eye tracking, cameras, wristbands, or add-on tools for logging. • Advantages: • No time commitment for software engineers (unless you carry out an experiment) • Accurate information • Disadvantages: • Data are “raw” and do not have a clear meaning • Ethical concerns in monitoring users • Example: I want to monitor the degree of engagement of software developers in a company. I ask them to wear wristbands during their day to record their engagement (sensed engagement). I instrument their computers with a logger to check what they are doing. I ask them to write down their degree of engagement every 30 minutes (work diary, reported engagement). I check to what extent the two measures (sensed and reported engagement) are in agreement, and what the developers were doing.
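One possible way to quantify the agreement between sensed and reported engagement is Cohen's kappa, which corrects the raw agreement rate for agreement expected by chance. The slide does not name a specific statistic, so the choice of kappa, the function `cohen_kappa`, and the sample labels below are all assumptions made for this sketch.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of time slots where both sources agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from the marginal label frequencies
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both sources always use the same single label
    return (observed - expected) / (1 - expected)


# Hypothetical 30-minute slots for one developer
sensed = ["high", "high", "low", "low", "high", "low", "high", "low"]
reported = ["high", "low", "low", "low", "high", "low", "high", "high"]

kappa = cohen_kappa(sensed, reported)
print(round(kappa, 2))
```

Here the two sources agree in 6 of 8 slots, but half of that agreement would be expected by chance given the label frequencies, which is why kappa is lower than the raw agreement rate.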
  • 74. Fly-on-the-Wall • What is it: participants are asked to record or videotape themselves while they perform a specific task. • Advantages: • Little effort required by the participant • No direct interaction with the researcher • Disadvantages: • Large amount of data and high cost of analysing them • Videos are multi-modal data and analysing them is not straightforward • Not always easy to understand the content of videos • Example: I ask the team to videotape each meeting they hold during a certain period (e.g., an iteration). I review the recordings to identify specific patterns of interaction, and the roles of the people.
  • 76. 3rd Degree Techniques aka Mining Software Repositories • Analysis of electronic database of work performed / Analysis of tool logs • Document Analysis such as code documentation and other software related documents • Static and Dynamic analysis of a system (Software Analytics) Require access only to work artefacts, such as source code or documentation In recent years, with the development of shared repositories, such as GitHub, these data collection activities go under the name Mining Software Repositories
  • 77. Analysis of Electronic Databases of Work Performed and Tool Logs • What is it: access to the platforms for issue or bug reporting (e.g., Bugzilla), change requests, configuration management systems, version control systems (e.g., git) • Advantages: • Large amount of data • Stable and independent of the researcher • People do not need to do extra work • Disadvantages: • Too much data! • Limited knowledge of work environment • People do not necessarily fill in all the information needed (e.g., in commit messages) • Process management policies differ across companies, and this may affect the data • Example: I want to understand the typical patterns of software evolution. I analyse the change requests and commits in a certain software repository and check, e.g., when they are typically performed, by whom, and whether there is a typical sequence of actions.
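The example on this slide (when commits are performed and by whom) can be sketched with a few counters. This is a minimal illustration over hand-written records: the authors, timestamps and messages are hypothetical, and in a real study they would be exported from the version control system.

```python
from collections import Counter
from datetime import datetime

# Hypothetical commit records: (author, ISO timestamp, commit message).
commits = [
    ("alice", "2020-03-02T09:15:00", "Fix null check in parser"),
    ("bob", "2020-03-02T17:40:00", ""),
    ("alice", "2020-03-03T09:50:00", "Add regression test"),
    ("carol", "2020-03-06T23:05:00", "wip"),
]


def summarise(records):
    """Count commits per author and per hour of day, and count empty messages."""
    by_author = Counter(author for author, _, _ in records)
    by_hour = Counter(datetime.fromisoformat(ts).hour for _, ts, _ in records)
    empty_messages = sum(1 for _, _, msg in records if not msg.strip())
    return by_author, by_hour, empty_messages


by_author, by_hour, empty_messages = summarise(commits)
print("commits per author:", dict(by_author))
print("commits per hour of day:", dict(by_hour))
print("commits without a message:", empty_messages)
```

The count of empty messages makes one of the listed disadvantages concrete: people do not necessarily fill in all the information the researcher needs.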
  • 78. Document Analysis • What is it: analysis of documents related to the software process, such as code comments, e-mails, Stack Overflow posts, tweets, app reviews, developers’ documentation, users’ manuals, etc. • Advantages: • Large amount of data in natural language (English, Italian, German, etc.) • Written information can answer why questions • Independent of the researcher • Disadvantages: • Requires knowledge of the context • Natural language processing (NLP) techniques needed for large amounts of data • Data are often “dirty” • Example: I want to understand whether the app reviews on the Apple Store actually contain potential new requirements for the app. I ask some subjects to check a certain number of reviews and identify requirements, and I check their agreement (I can also decide to automatically predict whether a certain review includes a requirement or not, based on the manually checked reviews).
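Checking the agreement between two subjects who label the same reviews is commonly done with Cohen's kappa, which corrects the raw agreement for agreement expected by chance. The slide does not name a specific measure, so the choice of kappa here is an assumption, and the labels below are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight app reviews: "req" = contains a requirement.
rater_1 = ["req", "req", "other", "other", "req", "other", "other", "req"]
rater_2 = ["req", "other", "other", "other", "req", "other", "req", "req"]

kappa = cohen_kappa(rater_1, rater_2)
print(f"Cohen's kappa = {kappa:.2f}")
```

Here the raters agree on 6 of 8 reviews (0.75), but half of that agreement is expected by chance, so kappa is 0.5.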
  • 79. Static and Dynamic Analysis of a System (Software Analytics) • What is it: analyze the code (static analysis) or traces generated by running the code (dynamic analysis) to learn about the design, and indirectly about how software engineers think and work. One might compare the programming or architectural styles of several software engineers by analyzing their use of various constructs, or the values of various complexity metrics. • Advantages: • Large amount of data • Independent of the researcher • Analysis tools are emerging (https://github.com/ishepard/pydriller, https://github.com/uni-bremen-agst/libvcs4j, https://ghtorrent.org) • Disadvantages: • Source code is not always easy to understand • Dynamic behaviour is even more difficult to understand • Need to resort to automatic support • Example: I want to check which are the most frequent dynamic errors triggered by software in GitHub. I download a selection of representative projects, analyse them with an abstract interpretation tool, and see which are the typical errors.
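As a small illustration of the "values of various complexity metrics" mentioned above, the sketch below uses Python's standard `ast` module to extract two simple static measures from source code. The metrics (function length in lines, number of branching statements) and the sample code are assumptions chosen for brevity, not the measures used by the tools listed on the slide.

```python
import ast


def function_lengths(source):
    """Map each function name in `source` to its length in source lines."""
    lengths = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lengths[node.name] = node.end_lineno - node.lineno + 1
    return lengths


def branch_count(source):
    """Count if/for/while/try statements as a rough complexity indicator."""
    branching = (ast.If, ast.For, ast.While, ast.Try)
    return sum(isinstance(node, branching) for node in ast.walk(ast.parse(source)))


# Hypothetical code under analysis.
SAMPLE = '''
def short(x):
    return x + 1

def longer(items):
    total = 0
    for item in items:
        if item > 0:
            total += item
    return total
'''

lengths = function_lengths(SAMPLE)
branches = branch_count(SAMPLE)
print(lengths, branches)
```

Running the same functions over the files of several developers would give comparable numbers per author, which is the kind of comparison the slide describes.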
  • 80. Data Collection Techniques: Summary
Table 2. Questions asked by software engineering researchers (column 2) that can be answered by field study techniques.
Columns: Technique | Used by researchers when their goal is to understand | Volume of data | Also used by software engineers for
First Order Techniques
  • Brainstorming and Focus Groups | Ideas and general background about the process and product, general opinions (also useful to enhance participant rapport) | Small | Requirements gathering, project planning
  • Surveys | General information (including opinions) about process, product, personal knowledge etc. | Small to Large | Requirements and evaluation
  • Conceptual modeling | Mental models of product or process | Small | Requirements
  • Work Diaries | Time spent or frequency of certain tasks (rough approximation, over days or weeks) | Medium | (none listed)
  • Think-aloud sessions | Mental models, goals, rationale and patterns of activities | Medium to large | UI evaluation
  • Shadowing and Observation | Time spent or frequency of tasks (intermittent over relatively short periods), patterns of activities, some goals and rationale | Small | Advanced approaches to use case or task analysis
  • Participant observation (joining the team) | Deep understanding, goals and rationale for actions, time spent or frequency over a long period | Medium | (none listed)
Second Order Techniques
  • Instrumenting systems | Software usage over a long period, for many participants | Large | Software usage analysis
  • Fly on the wall | Time spent intermittently in one location, patterns of activities (particularly collaboration) | Medium | (none listed)
Third Order Techniques
  • Analysis of work databases | Long-term patterns relating to software evolution, faults etc. | Large | Metrics gathering
  • Analysis of tool use logs | Details of tool usage | Large | (none listed)
  • Documentation analysis | Design and documentation practices, general understanding | Medium | Reverse engineering
  • Static and dynamic analysis | Design and programming practices, general understanding | Large | Program comprehension, metrics, testing, etc.
NOTE: “first order” in this table means “first degree”. cf. Lethbridge et al., 2005, https://doi.org/10.1007/s10664-005-1290-x
  • 81. Building Theories in Software Engineering cf. Sjøberg et al., 2009 https://dx.doi.org/10.1007/978-1-84800-044-5_12 cf. Mendez Fernandez, https://www.slideshare.net/mendezfe/an-introduction-into-philosophy-of-science-for-software-engineers Alessio Ferrari, ISTI-CNR, Pisa, Italy alessio.ferrari@isti.cnr.it
  • 82. What is a Theory? • A statement about the existence of some pattern in the entities that belong to a certain context
 
 • The boundary of the context determines the scope of applicability of the theory (e.g., all the people in a certain company vs all the C developers of the world) A theory exists where some form of REGULARITY can be identified “entities” = “observable phenomena” = “events and objects”
  • 83. What is a Theory? • A theory can have different levels of sophistication, depending on how abstract the entities considered are (how far they are from direct, measurable observations): • Low: 90% of faults are found in functions that are longer than 1000 LOC (once a definition of fault is given, this can be verified quite precisely) • Medium: Requirements defects can be classified into unclarity, incompleteness and incorrectness (I need precise definitions for the three classes, I have to assess that all the existing defects can be linked to one of the classes, I have to check that every reader classifies in the same manner… verification of the theory is complicated) • High: If the team leader is not self-confident, developers lose trust (I need measures for self-confidence and trust; verification of the theory is VERY complicated) • In SE you will find all these different types of theory
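The low-level example ("90% of faults are found in functions longer than 1000 LOC") is easy to verify once faults are counted per function. The sketch below shows the check on a hypothetical data set; the function names, sizes and fault counts are invented for illustration.

```python
def share_of_faults_in_long_functions(functions, loc_threshold=1000):
    """Fraction of all recorded faults that fall in functions longer than the threshold."""
    total = sum(f["faults"] for f in functions)
    if total == 0:
        raise ValueError("no faults recorded, the share is undefined")
    in_long = sum(f["faults"] for f in functions if f["loc"] > loc_threshold)
    return in_long / total


# Hypothetical measurements: one record per function.
functions = [
    {"name": "parse", "loc": 1500, "faults": 12},
    {"name": "render", "loc": 1200, "faults": 6},
    {"name": "init", "loc": 80, "faults": 1},
    {"name": "log", "loc": 40, "faults": 1},
]

share = share_of_faults_in_long_functions(functions)
print(f"{share:.0%} of faults are in functions longer than 1000 LOC")
```

The medium and high levels have no such direct check, because their constructs (defect classes, self-confidence, trust) first need an agreed measurement procedure.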
  • 84. What Does a Theory Do? • Description: descriptions and conceptualisations (taxonomies, ontologies; e.g., the defect types example) • Explanation: identify the motivation (e.g., the team leader example) • Prediction: predict according to a model (e.g., the fault example) • Explanation and Prediction: find model and motivation • Design and Action: prescriptive (e.g., the testing resources example in the initial slides) cf. also https://www.quora.com/How-can-statistics-tell-us-about-causality
  • 85. What are the Elements of a Theory? • The elements of a theory can be framed according to 6 questions: What, How, Why, and Where, When, for Whom • What: What are the entities in terms of which a theory offers description, explanation, prediction or prescription? These are the constructs of a theory. • How: How are the constructs related? Relationships between constructs make up a theory’s propositions, and describe how the constructs interact. Can lead to predictions. • Why: Why do the relationships hold? Answers to this question are what give the theory explanatory power. • Where, When, for Whom (scope conditions): Identify the circumstances in which the theory is applicable (the context).
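The four elements can be written down as a small data structure, which makes it easy to see which questions a theory leaves open. The sketch below is one possible encoding, not a standard one: the class names, the `cause`/`effect` fields and the choice to key explanations by proposition id are all assumptions. The sample values are taken from the UML-based development theory used later in this deck.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proposition:
    """HOW: a relationship between two constructs."""
    pid: str
    cause: str      # id of the influencing construct
    effect: str     # id of the influenced construct
    statement: str


@dataclass
class Theory:
    constructs: dict                                   # WHAT: id -> description
    propositions: list                                 # HOW
    explanations: dict = field(default_factory=dict)   # WHY: proposition id -> rationale
    scope: str = ""                                    # WHERE, WHEN, for WHOM

    def unexplained(self):
        """Propositions for which the theory gives no WHY."""
        return [p.pid for p in self.propositions if p.pid not in self.explanations]

    def undefined_constructs(self):
        """Construct ids used in propositions but never defined."""
        used = {c for p in self.propositions for c in (p.cause, p.effect)}
        return sorted(used - set(self.constructs))


theory = Theory(
    constructs={
        "C1": "UML-based development method",
        "C2": "Costs (total number of person hours in the project)",
        "C5": "Documentation",
    },
    propositions=[
        Proposition("P1", "C1", "C2", "UML-based development increases costs"),
        Proposition("P4", "C1", "C5", "UML-based development positively affects documentation"),
    ],
    explanations={"P4": "More complete and more consistent documentation due to traceability"},
    scope="Distributed projects on large, embedded, safety-critical subsystems",
)

print("propositions without explanation:", theory.unexplained())
```

A theory that answers only a subset of the questions would simply leave `explanations` or `scope` partly empty.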
  • 86. Constructs (WHAT) • Table 3. Constructs, propositions, example explanations and scope of the theory of UML-based development (Constructs part) • C1 UML-based development method • C2 Costs (total number of person hours in the project) • C3 Communication (ease of discussing solutions within development teams and in reviews) • C4 Design (perceived structural properties of the code) • C5 Documentation (the documentation of the system for the purpose of passing reviews as well as for expected future maintainability) • C6 Testability (more efficient development of test cases and better quality, i.e., better coverage) • C7 Training (training in the UML-based method before the start of the project) • C8 Coordination (of requirements and teams) • C9 Legacy code (code that has not been reverse engineered to UML-models) • We will see more of this example later on
  • 87. Propositions (HOW) • Table 3 (Propositions part) • P1 The use of a UML-based development method increases costs • P2 The use of a UML-based development method positively affects communication • P3 The use of a UML-based development method positively affects design • P4 The use of a UML-based development method positively affects documentation • P5 The use of a UML-based development method positively affects testability • P6 The positive effects of UML-based development are reduced if training is not sufficient and adapted • P7 The positive effects of UML-based development are reduced if there is insufficient coordination of modelling activities among distributed teams working on the same project • P8 The positive effects of UML-based development are reduced if the activity includes modification of legacy code
  • 88. Explanation (WHY) • Table 3 (Explanations part) • E4 The documentation is: more complete; more consistent due to traceability among models and between models and code; more readable, and makes it easier to find specific information, due to a common format; more understandable for non-technical people; may be viewed from different perspectives due to different types of diagram • E5 Test cases based on UML models: are easier to develop; can be developed earlier; are more complete; have a more unified format. Moreover, traceability from requirements to code and test cases makes it easier to identify which test cases must be run after an update
  • 90. Scope Conditions (WHEN, WHERE, for WHOM…) • “[…] are specified further into propositions of the theory, as indicated in Fig. 3; the propositions P6–P8 are examples of moderators. The scope of the theory is also illustrated in the diagram. Scope conditions are typically modelled as subclasses or component classes. Figure 3 shows that our […]” • Table 3 (Scope part): The theory is supposed to be applicable for distributed projects creating and modifying large, embedded, safety-critical subsystems, based on legacy code or new code • This example theory answers all the questions, but the theories you develop may answer only a SUBSET of the questions (e.g., WHY is left to other researchers)
  • 91. How are Theories Formed? Induction, Deduction, Abduction • Induction: inference of a generalized conclusion from particular instances • Deduction: derive testable hypotheses from a theory • Abduction: generalize from theories • [Figure: cycle linking Theory, Hypothesis, Test and Observation through Induction, Deduction and Abduction]
  • 97. Criteria for Evaluating Theories • “[…] the principle of testability remains fundamental for empirically-based theories. […]” • Table 1. Criteria for evaluating theories • Testability: The degree to which a theory is constructed such that empirical refutation is possible • Empirical support: The degree to which a theory is supported by empirical studies that confirm its validity • Explanatory power: The degree to which a theory accounts for and predicts all known observations within its scope, is simple in that it has few ad hoc assumptions, and relates to that which is already well understood (To what extent does my theory explain WHY?) • Parsimony: The degree to which a theory is economically constructed with a minimum of concepts and propositions • Generality: The breadth of the scope of a theory and the degree to which the theory is independent of specific settings • Utility: The degree to which a theory supports the relevant areas of the software industry cf. Sjøberg et al., 2009 https://dx.doi.org/10.1007/978-1-84800-044-5_12
  • 99. Step-by-Step guide to Formulating Theories (Deductive) 1. Define constructs of the theory (can be novel constructs, existing ones, or refinements of existing ones) 2. Define propositions (novel, modifications/refinements of existing ones) 3. Provide explanations to justify the theory (explicit assumptions and logical justifications for the constructs and propositions of the theory, referring to existing theories, also from other disciplines) 4. Define the scope of interest (values of constructs, and combinations thereof, that the theory is oriented to explain) Every time you are applying an empirical method you are actually building theories
  • 100. Step-by-Step guide to Formulating Theories (Deductive) 5. Test the theory through empirical research (examination of the validity of the theory’s predictions through empirical studies): 1. Choosing an appropriate research setting and sample. The sample does not only include the actors, but also the sample of technologies, activities (tasks) and systems. 2. Operationalizing theoretical constructs into empirical variables (e.g., justify the connection between complexity of software and its measure in lines of code) 3. Operationalizing theoretical propositions into empirically testable hypotheses (definition of hypotheses in terms of empirical variables) 4. Application of qualitative or quantitative methods to test the hypotheses (when speaking about hypothesis testing, we normally refer to quantitative statistical tests, however the conceptual process is the same also for qualitative methods) 6. Define scope of validity (part of the scope of interest in which the theory has actually been validated)
  • 101. Step-by-Step Graphical Guide (Deductive) • Theory → Operationalisation (Variables, Hypothesis and Sample Definition) → Data Collection and Measurements → Data Analysis (Hypothesis Testing) → Confirm/Reject and Scope of Validity
  • 113. Scope of Validity • “[…] to the scope of interest. The first consideration to make in testing a theory is to make sure that the study fits the theory’s scope of interest. Otherwise, the results would be irrelevant to that theory. Moreover, in a given study, typically only a part […]” • [Figure: Fig. 4, Scope of interest versus scope of validity] cf. Sjøberg et al., 2009 https://dx.doi.org/10.1007/978-1-84800-044-5_12
  • 114. Threats to Validity • Agreement (or inconsistency) between theoretical propositions and empirical observations does not necessarily imply that the theory is validated (or disconfirmed) • Judgements regarding the validity of the theory require that the study is well conducted, and not encumbered with • Invalid operationalization of theoretical constructs and propositions • Inappropriate research design • Inaccuracy in data collection and data analysis • Misinterpretation of empirical findings
  • 115. Threats to Validity • Construct: have I operationalised all the constructs correctly? (e.g., is LOC a good measure of complexity?) • Internal: are there aspects that may have influenced my outcome and that I did not consider? (identify confounding variables, e.g., did the people already see the code they are evaluating?) • External: to what extent are my findings generalisable? (how much of the scope of interest is covered, e.g., which types of languages are considered?) • Each research method has its own classification of threats to validity, which we will see later; these are three general notions
  • 116. Step-by-Step Graphical Guide (Inductive) • Often I do not have the information to identify constructs, propositions, and explanations before data collection • Therefore I start with data collection! • Constructs, propositions and explanations are extracted from the data (normally QUALITATIVE) • [Figure elements: Sample Definition, Data Collection; Operationalisation; Theory]
  • 118. Step-by-Step Graphical Guide (Inductive to Deductive) • INDUCTIVE: Sample Definition, Data Collection → Data Analysis and Operationalisation → Theory • DEDUCTIVE: Theory → Operationalisation → Sample Definition, Data Collection → Data Analysis / Hypothesis Testing → Refutation/Confirmation → Scope of Validity
  • 119. Generating a Theory — Inductive to Deductive An Example • Field study in a company to investigate benefits and challenges of the use of a UML-based method in a large distributed development project • Goal of the project: new safety-critical process-control system based on several existing systems • Four sites in three countries, 230 people, 100 using UML • Data was collected through individual interviews, questionnaires and project documents. cf. Sjøberg et al., 2009 https://dx.doi.org/10.1007/978-1-84800-044-5_12
  • 121. Generating a Theory: Example • Step 1: Defining the constructs • Interviews are performed to identify which are the most significant concepts to consider. They applied the so-called “open coding” to the interview transcripts to identify the constructs • Resulting constructs (Table 3): • C1 UML-based development method • C2 Costs (total number of person hours in the project) • C3 Communication (ease of discussing solutions within development teams and in reviews) • C4 Design (perceived structural properties of the code) • C5 Documentation (the documentation of the system for the purpose of passing reviews as well as for expected future maintainability) • C6 Testability (more efficient development of test cases and better quality, i.e., better coverage) • C7 Training (training in the UML-based method before the start of the project) • C8 Coordination (of requirements and teams) • C9 Legacy code (code that has not been reverse engineered to UML-models)
  • 122. Generating a Theory: Example • Step 2: Defining the propositions • From the interviews, relationships are identified between constructs (e.g., relation between UML and cost), and these are translated into propositions • The resulting propositions are confirmed with questionnaires • Resulting propositions (Table 3): • P1 The use of a UML-based development method increases costs • P2 The use of a UML-based development method positively affects communication • P3 The use of a UML-based development method positively affects design • P4 The use of a UML-based development method positively affects documentation • P5 The use of a UML-based development method positively affects testability • P6 The positive effects of UML-based development are reduced if training is not sufficient and adapted • P7 The positive effects of UML-based development are reduced if there is insufficient coordination of modelling activities among distributed teams working on the same project • P8 The positive effects of UML-based development are reduced if the activity includes modification of legacy code
  • 123. Generating a Theory: Example • Step 3: Provide explanations • Further analyse the interviews to understand the reasons behind the propositions • Perform further interviews and check project documents to make sense of identified phenomena • Resulting explanations (Table 3): • E4 The documentation is: more complete; more consistent due to traceability among models and between models and code; more readable, and makes it easier to find specific information, due to a common format; more understandable for non-technical people; may be viewed from different perspectives due to different types of diagram • E5 Test cases based on UML models: are easier to develop; can be developed earlier; are more complete; have a more unified format. Moreover, traceability from requirements to code and test cases makes it easier to identify which test cases must be run after an update
  • 124. Generating a Theory: Example • Step 4: Identifying the scope of interest of the theory • Technology: UML • Actor: designers in distributed teams • Software System: large, embedded software • Activity: create and modify UML diagrams
  • 125. Generating a Theory: Example • Step 5: Testing the theory - Deductive Step • Consider each proposition and perform a study for each one, or for a subset, e.g., “Use of UML methods increases cost”, “Use of UML methods positively affects testability” • I can use different methods to test the theory: • Field studies: identify companies that are willing to introduce UML; establish a way to evaluate cost (e.g., man-hours); consider a comparable company not using UML; check the resulting cost. • Experiment: two groups of subjects; give them a requirements document; ask group 1 to implement the code directly; ask group 2 to design and then implement; evaluate and compare cost. • Survey/Questionnaire: contact multiple companies that have introduced UML and ask them to state their agreement with the propositions and the explanations • Step 6: Based on the selected study, I identify the scope of validity (larger for surveys, narrower for field studies)
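The experiment option above (group 1 implements directly, group 2 designs and then implements, and cost is compared) can be analysed in several ways. The following is a minimal sketch of one possibility, a two-sided permutation test on mean cost. The cost figures are hypothetical illustration data, not results from any study, and the function name `permutation_test` is an assumption introduced here.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean cost.

    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign subjects to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical cost per subject, in man-hours (illustration only).
code_only = [40, 42, 38, 45, 41, 39]          # group 1: implement directly
design_then_code = [48, 52, 47, 50, 55, 49]   # group 2: design, then implement

observed, p_value = permutation_test(code_only, design_then_code)
print(f"difference in mean cost: {observed:.2f} man-hours")
print(f"estimated p-value: {p_value:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions and no third-party library; a real study would choose the test according to its design and sample size.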
  • 126. Generating a Theory: Example • Theory Evaluation • Testability: constructs are not ambiguous and propositions are clear; furthermore, protocols are shared for replication. Since some subjective data collection was performed, replication may lead to different results. • Empirical support: other studies seem to confirm part of the propositions • Explanatory power: the motivations are derived from interviews, and not all factors may have been considered. Hence the explanatory power is limited (the theory does not account for all possible reasons WHY). • Parsimony: reduced number of constructs and relationships in the propositions • Generality: scope is narrow, as I have performed a case study • Utility: utility is high, as the theory can help decision making • These are all logical arguments that have to be checked by peers!
  • 127. The ABC of Software Engineering Research cf. Stol and Fitzgerald, 2018 https://doi.org/10.1145/3241743
  • 128. The Need for a Taxonomy of Methods Strategies • As we said, there is no universally accepted taxonomy for research methods in SE • The ABC of Software Engineering Research 11:3 Table 1. A “Mixed Bag”: Alternative Research Methods in Software Engineering According to a Selection of Sources
  Glass et al. [63]: Action research; Conceptual analysis; Concept implementation; Case study; Data analysis; Discourse analysis; Ethnography; Field experiment; Field study; Grounded theory; Hermeneutics; Instrument development; Laboratory experiment (human/software); Literature review; Meta-analysis; Mathematical proof; Protocol analysis; Phenomenology; Simulation; Descriptive/expl. survey
  Zannier et al. [230]: Controlled experiment; Quasi experiment; Case study; Exploratory case study; Experience report; Meta-analysis; Example application; Survey; Discussion
  Sjøberg et al. [190]: Controlled experiment; Surveys; Case studies; Action research
  Höfer and Tichy [75]: Case study; Correlational study; Ethnography; Ex post facto study; Experiment; Meta-analysis; Phenomenology; Survey
  Easterbrook et al. [48]: Experimentation; Case study; Survey; Ethnography; Action research
  • Each author uses different terms to refer to research methods, and there is no agreement • Let us talk about STRATEGIES, which can adopt specific METHODS…
  • 129. A Unifying Framework: ABC of SE Research • Actors: human and technical, i.e., managers, software engineers, users, software systems, software development artifacts incl. defects, tools, techniques, prototypes • Behaviour: of all actors, i.e., system behavior (e.g., reliability, performance, and other quality attributes), software engineers’ behavior and antecedents such as productivity, motivation, and intention • Context: of all actors, i.e., industrial settings, organizations, software projects, development teams, software laboratory, classroom, meeting rooms "Optimizing a study to achieve generalizability over actors (A) and precise measurement of their behavior (B), in a realistic context (C), is impossible, and is a “three-horned dilemma [since] there is no way—in principle—to maximize all three (conflicting) desiderata of the research strategy domain” (McGrath, 1981 https://doi.org/10.1177/000276428102500205 ) Three main dimensions…
  • 130. A Unifying Framework: ABC of SE Research • And two other dimensions… • Obtrusiveness: to what extent does a researcher “intrude” on the research setting, rather than simply making observations in an unobtrusive way (i.e., how much control do I have over the empirical setting) • Generalizability: to what extent the research findings are generalizable (i.e., how much of the scope of interest is covered, given the current scope of validity)
  • 131. ABC Framework • Figure: two axes, obtrusiveness (from more obtrusive to less obtrusive) and generalizability (from more general to less general); the markers A, B and C indicate where generalizability over Actors (A), precise characterisation of Behaviour (B), and precise characterisation of a specific Context (C) are relevant • Note: Actors can be People or Software; Behaviour is that of People or Software
  • 132. The ABC of Software Engineering Research 11:11 Fig. 1. The ABC framework: eight research strategies as categories of research methods for software engi-
  • 133. The ABC of Software Engineering Research 11:11 Fig. 1. The ABC framework: eight research strategies as categories of research methods for software engi- • Metaphors for the eight strategies: Jungle, Natural Reserve, Flight Simulator, In Vitro Experiment, Courtroom, Referendum, Mathematical Model, Forecasting System
  • 135. Field Studies (Jungle) • Purpose: To investigate the impact of distributed teams in software development • Setting: Natural, first author spent 7 months on-site at an organization. • Procedure: Document study, observation, interviews. • Findings: Four major problems and 8 specific challenges Example
  • 136. Field Studies (Jungle) • Setting: Natural setting that exists before the researcher enters it. Minimal intrusion of the setting so as not to disturb realism, only to facilitate data collection. • Purpose: • Exploratory, to understand what’s going on, how things work, or to generate hypotheses. • Typical Methods and Data: Case study, ethnography, observational study; qualitative data incl. interviews, field notes, archival documents, may include quantitative data. • Inherent Limitations: • No statistical generalizability • No control over events • Low precision of measurement
  • 137. Field Studies (Jungle) • Essence: • Facilitates the study of real-world actors (people, systems) and their behaviors in a natural setting that is not manipulated by the researcher. • High potential to capture realistic settings and a high degree of detail of a particular system and context. • Evaluation Considerations: • Not suitable to investigate statistical relationships, or to otherwise manipulate variables, • Not suitable for findings that hold for larger populations.
  • 138. Field Experiments (Natural Reserve) • Purpose: To identify a cost-effective way to avoid software defects. • Setting: Natural, company staff and researcher collaborated on-site, using real products to evaluate new approaches. • Procedure: Action research (improving case study, design science), data include defect reports, time spent, usability issues, timeliness of the project, product sales. • Findings: Certain techniques are beneficial, while others are time-consuming and do not avoid defects
  • 139. Field Experiments (Natural Reserve) • Setting: Natural, pre-existing setting (in vivo), but some level of intrusion due to the deliberate manipulation of aspects of the setting; study affected by confounding factors. • Purpose: To investigate, evaluate, or compare techniques, practices, processes, or approaches within a real-world and pre-existing setting. • Typical Methods and Data: case study, quasi-experiment, action research; studies may use either quantitative data or qualitative data. • Inherent Limitations: • No statistical generalizability • Precision of measurement affected by confounding contextual factors
  • 140. Field Experiments (Natural Reserve) • Essence: • Facilitates the study of the effects of modifying properties of a studied entity or phenomenon that occurs in a natural setting, i.e., one that pre-exists independently of the researcher. • Potentially very costly to set up due to the complexity of natural settings. • Evaluation Considerations: • Limited level of precision of measurement; • Results not generalizable, but strongly linked to the specific setting due to confounding variables that are very difficult to isolate.
  • 142. Experimental Simulations (Flight Simulator) • Purpose: To understand how developers perceive the testing team • Setting: Contrived, simulation environment with experimental stimuli that were previously defined (e.g., the software to be written by the developers, the types of checks performed by testers) • Procedure: developers develop code, testers test and give feedback during a meeting, impressions of developers are observed • Findings: Insights into defensive reactions of the developers
  • 143. Experimental Simulations (Flight Simulator) • Setting: Contrived setting (in virtuo) created specifically for a study to represent a concrete type of setting. Environment is created by the researcher to study behavior of actors. • Purpose: To study behavior of participants or systems in a controlled setting that resembles a real-world, concrete class of settings as closely as possible. • Typical Methods and Data: Simulation/Role-playing games, management games, instrumented multiplayer games; quantitative or qualitative data, depending on the simulation instrument. • Inherent Limitations: • Generalizability reduced as the setting is designed to mirror a specific type of setting (e.g. I have specific subjects from a company) • Realism reduced due to artificial setting Similar to Lab experiment, but more context-specific
  • 144. Experimental Simulations (Flight Simulator) • Essence: • A contrived setting that simulates a specific class of real-world systems that to some extent resembles reality. • Temporal flow of events depends on the simulation environment and actors’ behavior, which allows for observing more natural behavior than a laboratory experiment. • Evaluation Considerations: • Reduced level of realism compared to field experiments due to the contrived setting • Behavior of actors may reflect that in natural settings, but consequences for actors lack realism, which may affect their behavior.
  • 145. Laboratory Experiments (in Vitro Experiments) • Purpose: To investigate the hypothesis that a certain code inspection method A is more effective than another method B • Setting: Contrived, laboratory exercise with graduate students • Procedure: Measurement of effect of inspection methods on 4 dependent variables including fault detection rate • Findings: inspection method A is more effective than inspection method B
  • 146. Laboratory Experiments (in Vitro Experiments) • Setting: Contrived setting (in vitro) created specifically for a study, with high degree of control of all measured variables. • Purpose: • to study with a high degree of precision relationships between variables, or comparisons between techniques; • may allow establishment of causality between variables. • Typical Methods and Data: Randomized controlled experiments and quasi experiments, comparative evaluations with benchmark studies; usually quantitative data exclusively. • Inherent Limitations: • Abstract or unrealistic context due to highly artificial setting • Scope of problem reduced to study the “essence”, optimizing internal validity at cost of external validity
  • 147. Laboratory Experiments (in Vitro Experiments) • Essence: • A controlled setting where behavior of actors (humans or systems) is carefully measured through a number of discrete trials to establish effects or conduct comparative analyses. • Maximum potential to capture precise measurement of variables (high internal validity) due to potential to isolate confounding factors. • Evaluation Considerations: • Studied relationships and variables are more abstract due to the contrived and “sterile” nature of the research setting. • The setting is more artificial than for experimental simulations
  • 149. Judgment Studies (Courtrooms) • Purpose: To evaluate a set of 12 practices based on feedback by team managers. • Setting: Neutral, dedicated meeting room with seating around a table. • Procedure: 10 managers from 7 companies, selected based on their interest and expertise. • Findings: a framework of defect and benefits for the 12 practices
  • 150. Judgment Studies (Courtrooms) • Setting: Neutral setting; may be actively designed to nullify the context, so that “responses” are in relation to some stimulus (question or instructions), independent of setting. • Purpose: To elicit information from subjects for purposes of evaluation or study of some entities. • Typical Methods and Data: Delphi studies, interview studies, focus group, evaluation studies; use of qualitative and/or quantitative data. • Inherent Limitations: • Responses not related to any specific or realistic context • Less generalizability than sample studies due to lack of representative sampling • Less control and precision of measurement than a lab. exp.
  • 151. Judgment Studies (Courtrooms) • Essence: • Facilitates study of responses or behavior of actors that bears no relation to the research setting, which is neutral or actively “neutralized.” • Allows for more complex questions and interactions between researcher and respondents. • Evaluation Considerations: • No concrete or natural setting, which prohibits capturing direct observations of phenomena.
  • 152. Sample Studies (Referendum) • Purpose: To investigate the state of practice of requirements engineering in industry. • Setting: Neutral, web-based questionnaire. • Procedure: 22 questions; participants drawn from internet; 194 responses from a population of 1,519 • Findings: Findings include organization and participant characteristics (various domains; participants held variety of positions); software development life cycle model (agile, waterfall, etc.); RE techniques.
  • 153. Sample Studies (Referendum) • Setting: Neutral setting. Limited level of precision of measurement; no variables are manipulated. The researcher must deal with whatever data is collected. • Purpose: To study the distribution of a particular characteristic in a population (of people or systems), or the correlation between two or more characteristics in a population. • Typical Methods and Data: Software repository mining, surveys, questionnaires, interviews; analysis includes correlational methods, e.g., regression. Typically, quantitative data (e.g., Likert scales) but can include qualitative data. • Inherent Limitations: • Reductionist—depth of and number of data points per participant limited • Data collection not “interactive”: no option to clarify questions; repository data comes as is, no opportunity to manipulate variables, only to correlate them
  • 154. Sample Studies (Referendum) • Essence: • Facilitates data collection from a representative sample of a population (human or nonhuman, such as systems or design artifacts). • Maximum potential to generalize findings to a wider population; • Unobtrusive research strategy. • Evaluation Considerations: • Questions tend to be “simple”; • Limited opportunity for “complex” interaction between the researcher and subjects. • Research setting offers no realistic context.
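The sample study example reports 194 responses from a population of 1,519. A short sketch of the arithmetic a reader might do with those two figures follows: the response rate, and a worst-case 95% margin of error for an estimated proportion. The margin-of-error formula assumes simple random sampling with a finite population correction, which is an assumption made here for illustration and is not something the slides claim about the study; the function names are also introduced here.

```python
import math


def response_rate(responses, population):
    """Fraction of the contacted population that answered."""
    return responses / population


def margin_of_error(n, population, p=0.5, z=1.96):
    """Margin of error for a proportion at roughly 95% confidence.

    Assumes simple random sampling (an assumption of this sketch).
    p=0.5 gives the widest, i.e. worst-case, interval.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    # Finite population correction: the sample is a sizeable share of the population.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


rate = response_rate(194, 1519)
moe = margin_of_error(194, 1519)
print(f"response rate: {rate:.1%}")
print(f"worst-case 95% margin of error: +/-{moe:.1%}")
```

If respondents are self-selected rather than randomly sampled, the margin of error says nothing about selection bias, which is why representativeness matters for this strategy.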
  • 156. Formal Theory (Mathematical Model) • Purpose: To develop an understanding of the role of creativity in RE. • Setting: no empirical observations, but derivation of a conceptual framework from literature. • Procedure: check general creativity literature, check requirements engineering creativity literature. • Findings: A theoretical framework that offers RE researchers a basis to incorporate creativity in RE methods and techniques.
  • 157. Formal Theory (Mathematical Model) • Setting: Nonempirical setting; typically a research office or library. • Purpose: • To develop a conceptualization, framework, or theory on a topic. • Focus is on formulating relations among concepts, or explanations that hold for a wide range of contexts. • Typical Methods and Data: Literature reviews, conceptual reasoning, concept development, development of propositions and/or hypotheses, framework development. • Inherent Limitations: • Low on realism: does not consider a specific context but rather abstract concepts • No manipulation of variables or measurement (no empirical information is gathered)
  • 158. Formal Theory (Mathematical Model) • Essence: • The careful and justified construction of a theoretical model that represents one view of a phenomenon, which helps to analyze or explain the real world. • Models generic behavior for a range of classes of populations (humans or nonhuman artifacts), which serves to make predictions or explanations about the real world. • Evaluation Considerations: • Theoretical models do not generate new empirical observations, though they may inform future empirical studies.
  • 159. Computer Simulations (Forecasting system) • Purpose: To investigate bottlenecks and overload in the testing processes. • Setting: Nonempirical, a discrete event simulator was implemented. • Procedure: Four simulation scenarios with different parameter values to model different circumstances. • Findings: Two ways were identified to avoid congestion:
 (1) increase number of staff, (2) increase the number of interactions with the development team.
  • 160. Computer Simulations (Forecasting system) • Setting: Nonempirical setting (in silico); no recording of observations in the real world. There are no actors (people, real-world systems) or real-world behavior: everything is specified in the simulation. • Purpose: To model a particular system or phenomenon that facilitates evaluation of a large number of complex scenarios that are captured in the preprogrammed model. • Typical Methods and Data: Development of software programs that contain symbolic representations of all variables a researcher considers important; usually these variables are derived and calibrated based on prior empirical studies. • Inherent Limitations: • No empirical data is gathered • Results will be as good as the accuracy of the model representing the simulated system • Low generalizability as it attempts to model a specific class of real-world systems
  • 161. Computer Simulations (Forecasting system) • Essence: • Represents a symbolic replica of a concrete real-world system where all configurations and variables are preprogrammed. • Useful to run a large number of complex scenarios to explore a solution space, which might not be feasible to do manually. • Evaluation Considerations: • All simulation rules are preprogrammed: no new empirical (i.e., real world, as opposed to simulated) behavior is observed. • Due to concrete implementation, limited generalizability.
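To make the idea of a preprogrammed simulation concrete, here is a minimal sketch of a testing-process queue written in a discrete-event style. It is not the simulator from the example study. The function name, the arrival and service distributions, and all parameter values are assumptions chosen for illustration. It models only the first reported remedy (more staff), not the interaction with the development team.

```python
import heapq
import random


def simulate_testing_queue(n_testers, n_jobs=500, mean_interarrival=1.0,
                           mean_test_time=1.8, seed=42):
    """Return the mean time a build waits before a tester becomes free.

    Builds arrive at random times and are served first-come, first-served.
    """
    rng = random.Random(seed)  # fixed seed: every scenario sees the same random draws
    # Min-heap holding, for each tester, the time at which they become free.
    free_at = [0.0] * n_testers
    heapq.heapify(free_at)
    clock = 0.0
    total_wait = 0.0
    for _ in range(n_jobs):
        # Arrival event: advance the clock to the next build delivery.
        clock += rng.expovariate(1.0 / mean_interarrival)
        # The build goes to whichever tester is free first.
        earliest_free = heapq.heappop(free_at)
        start = max(clock, earliest_free)
        total_wait += start - clock
        # Completion event: that tester is busy until the test run finishes.
        service = rng.expovariate(1.0 / mean_test_time)
        heapq.heappush(free_at, start + service)
    return total_wait / n_jobs


# Three scenarios that differ only in the number of testers.
waits = {n: simulate_testing_queue(n) for n in (1, 2, 3)}
for n, wait in waits.items():
    print(f"{n} tester(s): mean wait {wait:.2f} time units")
```

With these assumed parameters a single tester receives work faster than it can be processed, so the queue grows, which is the kind of congestion such a simulation is meant to expose. As the limitations above state, the output is only as good as the model and its calibration.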
  • 162. The ABC of Software Engineering Research 11:11 Fig. 1. The ABC framework: eight research strategies as categories of research methods for software engi- Jungle Natural Reserve Flight SimulatorIn Vitro Experiment Courtroom Referendum Mathematical Model Forecasting System Frequent in SE Rare in SE
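The eight strategies walked through above can be kept as a small lookup structure, which is handy when classifying a planned study. This is a sketch of one possible encoding. The class and function names are assumptions introduced here, and the setting labels are the ones used on the individual strategy slides (natural, contrived, neutral, nonempirical).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    metaphor: str
    setting: str  # "natural", "contrived", "neutral", or "nonempirical"


# Names, metaphors, and settings as given on the strategy slides.
STRATEGIES = [
    Strategy("Field Studies", "Jungle", "natural"),
    Strategy("Field Experiments", "Natural Reserve", "natural"),
    Strategy("Experimental Simulations", "Flight Simulator", "contrived"),
    Strategy("Laboratory Experiments", "In Vitro Experiment", "contrived"),
    Strategy("Judgment Studies", "Courtroom", "neutral"),
    Strategy("Sample Studies", "Referendum", "neutral"),
    Strategy("Formal Theory", "Mathematical Model", "nonempirical"),
    Strategy("Computer Simulations", "Forecasting System", "nonempirical"),
]


def strategies_in(setting):
    """Names of the strategies that use the given kind of setting."""
    return [s.name for s in STRATEGIES if s.setting == setting]


for setting in ("natural", "contrived", "neutral", "nonempirical"):
    print(f"{setting}: {', '.join(strategies_in(setting))}")
```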
  • 164. 7 Commandments of the Empirical Software Engineer (From Daniel Méndez)
 https://www.slideshare.net/mendezfe/an-introduction-into-philosophy-of-science-for-software-engineers
 1. No such thing as absolute and / or universal truth (truth is always relative)
 2. The value of scientific theories always depends on their
 • ability to stand criticism by the (research) community,
 • robustness / our confidence (e.g. degree of corroboration),
 • contribution to the body of knowledge (relation to existing evidence)
 • ability to solve a problem
 3. Theory building is a long endeavour where
 • progress comes in an iterative, step-wise manner,
 • empirical inquiries need to consider many non-trivial factors,
 • we often need to rely on pragmatism and creativity
 • we depend on acceptance by peers (research communities)
  • 165. 4. Be sceptical and open at the same time
 • no statement imposed by authorities shall be immune to criticism
 • be open to existing evidence and arguments/explanations by others
 5. Always be aware of
 • strengths & limitations of single research methods
 • validity and scope of observations and related theories
 • relation to existing body of knowledge / existing evidence
 6. Appreciate the value of
 • all research processes and methods
 • null results (one’s failure can be another one’s success) 
 • replication studies (progress comes via repetitive steps)
 7. Be an active part of something bigger (knowledge is built by communities)