Systematic Mapping Study and
Systematic Literature Review in
Software Engineering
Alessio Ferrari, CNR-ISTI, Pisa, Italy

alessio.ferrari@isti.cnr.it
cf. Petersen et al., 2008 https://doi.org/10.14236/ewic/EASE2008.8
cf. Kitchenham, 2007, https://go.aws/2TK4SN3
May, 2020
SLR and SMS
• Systematic Literature Reviews and Systematic Mapping
Studies can be regarded as GLORIFIED versions of school
research assignments in the Internet era (e.g., search for and
report information about Napoleon)

• Instead of writing “Napoleon” in Google, you select
keywords (“artificial intelligence” AND “testing”), retrieve
papers from scientific databases, and then analyse them
one by one to produce some knowledge
SLR and SMS
[Diagram: Relevant Keywords ("artificial intelligence", "testing") → Scientific Databases → Retrieved Papers → Relevant Papers → Classifications, Statistics, Structured Knowledge]
[Figure: Fig. 1 from "The ABC of Software Engineering Research" — the ABC framework: eight research strategies as categories of research methods for software engineering: Jungle, Natural Reserve, Flight Simulator, In Vitro Experiment, Courtroom, Referendum, Mathematical Model, Forecasting System]
Motivation: Systematic
Mapping Studies
• Sometimes we do not know enough about a certain software engineering
field, and we want to know the state of the art and identify possible
research directions

• Sometimes we need to create a framework of concepts to understand an
emerging software engineering field, and identify limitations and
opportunities for research

• For example, I can consider the field “artificial intelligence (AI) for
software testing”, and I may see that certain AI technologies are never
applied, and this opens novel opportunities for research

• We can search the literature, but it is very hard to demonstrate that we
have considered ALL the possible scientific sources of information

• It is also very hard to replicate the study

Coverage and Reproducibility are the main concerns
Utility should drive the need for the study
Systematic Mapping Study (SMS)
• Goal: Build a classification scheme and structure a
software engineering field of interest

• Method: define a study protocol, collect papers from
databases, screen them, identify categories of papers,
present the results with plots

• Contribution:
• identify novel research directions

• identify areas that need more investigation

• provide a common terminology for the field
Motivation: Systematic
Literature Reviews
• Sometimes I need to search for a specific answer to a
research question, and I know that the literature includes
different studies regarding the topic

• Sometimes I know that there are different experimental
results concerning a topic, and I want to compare them

• Sometimes I wish to create a theory by abduction, by
generalising from different theories found in the literature
and related to different experimental results
Coverage, in terms of considered sources,
and Reproducibility are the main issues in this case too
Utility should drive the question
Systematic Literature
Reviews (SLR)
• Goal: Get an answer to a set of specific research questions
related to an area of research

• Method: define a study protocol, collect papers from
databases, screen them, identify categories, read the papers
in depth, extract data from them, and present plots of the results
and the abduced theory

• Contribution:
• identify answers to specific research questions 

• identify a general theory about the topic of research
SMS vs SLR
• Rule of Thumb: 

• If your RQs require a general classification of the papers, the
scope is large, and you are mostly focusing on the METHODS of
the studies, you are doing an SMS 

• If your RQs require extracting specific information from the
papers, and you are comparing the RESULTS of different
empirical studies, you are doing an SLR

• If your RQs require both, you are doing an SLR (e.g., classification
of the METHODS, and then analysis of the RESULTS by class)

• Why should I do an SLR or SMS? Because you will have to review
the literature anyway to learn about your research field, and
if you do not do it systematically you will not be able to publish it
Both are Secondary Studies, i.e., their sources of information are Primary Studies
Their method is very similar, but their goals are different!
Systematic Literature
Reviews (SLRs)
Alessio Ferrari, CNR-ISTI 

alessio.ferrari@isti.cnr.it
cf. Kitchenham, 2007, https://go.aws/2TK4SN3
SLR Steps (in Theory)
PREPARATION
CONDUCTING
REPORTING
Definition of research questions
and study protocol
Search papers and
synthesise data
Results are
documented,
validated and
reported
SLRs should appear as if they
were conducted in this way,
but in practice several iterations
and adjustments are required
Research questions may change based on the retrieved data
The theory is built incrementally, and may need to be changed
SLR Steps
(a More Realistic Scheme)
1. Need for SLR and Question(s)
2. Pilot Search
3. Protocol
4. Primary Search
5. Title and Abstract Screening
6. Full-text Download
7. Full-text Screening
8. Secondary Search
9. Secondary References Screening
10. Data Collection (Extraction)
11. Data Analysis (Synthesis)
12. Synthesis and Report

Each screening step applies inclusion/exclusion criteria; the full-text screening steps also apply quality criteria.
Tools: a Reference Management System (Mendeley, EndNote, Zotero) to store the papers;
Scientific Archives (IEEE Xplore, ACM, SpringerLink, Scopus, Web of Science) as sources.
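The screening steps of this pipeline can be sketched as plain filter functions. This is a minimal illustration, not any tool's actual API: the record fields and predicates are hypothetical examples.

```python
# Illustrative sketch of the screening steps in the SLR pipeline above.
# All names and record fields are hypothetical, not a real tool's API.

def title_abstract_screen(papers, is_relevant):
    """Screen title and abstract with inclusion/exclusion criteria."""
    return [p for p in papers if is_relevant(p["title"], p["abstract"])]

def full_text_screen(papers, passes_quality):
    """Screen full texts, additionally applying quality criteria."""
    return [p for p in papers if passes_quality(p)]

# Toy run: two retrieved papers, one survives both screening stages.
retrieved = [
    {"title": "AI for testing", "abstract": "We apply AI to test generation.", "quality": 8},
    {"title": "Hardware verification", "abstract": "FPGA synthesis.", "quality": 9},
]
relevant = title_abstract_screen(
    retrieved, lambda t, a: "test" in t.lower() or "test" in a.lower()
)
included = full_text_screen(relevant, lambda p: p["quality"] >= 6)
print(len(retrieved), len(relevant), len(included))  # 2 1 1
```

The point of expressing it this way is that every discarded paper is traceable to a specific step, which is exactly what the reporting phase needs.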
Need for the SLR
• Prior to undertaking a systematic review, researchers should ensure that a systematic review is
necessary
• If other (sufficiently recent) reviews exist, then your review is not needed

• If very few papers touch your topic, then your review may be needed, but no one will publish it
• Rule of Thumb: 

• Intensively search Google Scholar with terms related to your research (e.g., “artificial intelligence”
AND “testing”, and synonyms) and with the following terms: “literature”, “review”, “survey”,
“mapping study”, “state-of-the-art”

• Search the most recent (high-quality) papers about your topic of interest, check the related work
section, and see if they cite some survey or review (e.g., a specific paper about AI for testing)

• In your report, include a section in which you cite prior studies and explain what your review adds
to the body of knowledge (e.g., there are reviews on "testing techniques", but no review is focused
on "effectiveness of testing"; there are mapping studies on AI and testing, but none of them is
focused on "effectiveness of testing"; there is a review, but it is from 10 years ago)
Issues that Trigger
Research Questions (RQs)
• Types of issues for SLRs in SE (from Kitchenham, 2007):

• Assessing the effect of a software engineering technology.

• Assessing the frequency or rate of a project development factor such
as the adoption of a technology, or the frequency or rate of project
success or failure.

• Identifying cost and risk factors associated with a technology.

• Identifying the impact of technologies on reliability, performance and
cost models

• Cost benefit analysis of employing specific software development
technologies or software applications
cf. Kitchenham, 2007, https://go.aws/2TK4SN3
Research Questions (RQs)
• Is there any relationship between user involvement and
the success of software systems?
• What should be the degree/level of user involvement in
software development to achieve desired results?
• What is the effectiveness of artificial intelligence
techniques in predicting software bugs?
• What are the human factors that affect project failure?
General, you can define it before the pilot search
You would define it after some pilot searches (you know what’s in the papers)
You would define it after some classification of AI techniques for bug prediction (an SMS)
General, you can define it before the pilot search
For a single study you can define how many (related) RQs you need
The important thing is that you have some information
to write about each question when reporting!
Research Questions (RQs) and Elements of Interest
• It is very hard to have a list of well-defined RQs from the beginning of your study

• A possible strategy is to first identify these elements of interest 

(check the categories that are interesting for you):

• Human subjects: developers, users, testers, managers

• Artefact: code, requirements, test cases
• Methods or technologies: artificial intelligence, SCRUM development, questionnaires

• Software engineering task: testing, elicitation, negotiation, maintenance

• Focus: reliability, bugs, happiness, effect, effectiveness

• Spatial Scope: research, large companies, small companies

• Temporal Scope: short term, long term, from one phase/task to another
• And then formulate the main RQ (knowing that it will change and it will be refined):

• What is the effectiveness of artificial intelligence technologies for testing?
• Which are the effects of introducing SCRUM development in large companies?
granularity may vary
not all these elements
may be relevant to you
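The elements of interest can be captured as a structured record from which a draft RQ is composed. A minimal sketch: the field names mirror the slide's categories, and the template sentence is an illustrative assumption.

```python
# Sketch: record the elements of interest, then compose a draft RQ.
# Field names mirror the slide's categories; the template is illustrative.

elements = {
    "methods_or_technologies": "artificial intelligence",
    "focus": "effectiveness",
    "se_task": "testing",
}

draft_rq = (
    f"What is the {elements['focus']} of "
    f"{elements['methods_or_technologies']} technologies for "
    f"{elements['se_task']}?"
)
print(draft_rq)
# What is the effectiveness of artificial intelligence technologies for testing?
```

Keeping the elements separate from the sentence makes later refinement cheap: when the pilot search forces a change of focus, only one field changes.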
Pilot Search
• The Pilot Search is key to define and refine the RQs, as you do not really
know what you will find in the literature until you try to search the database

• The Pilot Search is key to define the keywords to be used

• The Pilot Search is key to understand whether it is worthwhile to pursue the
SLR (you may find other reviews!) 

• The Pilot Search is key to align the viewpoints of the different reviewers
(more than one person may need to be involved), and data checkers
(double checking is needed)

• If you do not find evidence of what you are searching for, the number of retrieved
papers is low, or they have poor quality… well, do another type of study,
e.g., an experiment
Pilot Search
• The Pilot Search consists of a set of unrestricted trials in which you first
define a draft of your protocol and then apply it

• The Pilot Search does not follow any specific guideline

• The goal is to INCREMENTALLY define the study protocol

• Therefore, you will first select some keywords and databases, retrieve
some studies, check a relevant part of them, (try to) discard those that
are not interesting, (try to) extract data of interest, refine/modify the RQs,
change the keywords, search for additional databases…

• …UNTIL you are confident that you can do the same thing with all
the studies, and you can report the whole study as if it was done
sequentially (but expect adjustments until the final reporting!)
SLR Protocol
Protocol
• Primary Search Strategy and Search String

• Secondary Search Strategy

• Inclusion/Exclusion Criteria

• Quality Criteria

• Data Collection (Extraction)

• Data Analysis (Synthesis)
Protocol - Primary Search Strategy
• The Primary Search Strategy is the retrieval of studies (a.k.a. papers,
articles, works) with a search string from a set of scientific databases
(a.k.a. digital libraries)

• Databases (DBs) for SE are: Scopus, IEEE Xplore, ACM Digital
Library, SpringerLink (optional: Web of Science; Google Scholar only for
complementary searches)

• Each DB's search engine has its own peculiarities, and you may
need to adapt your search string to the DB's input interface
• Search engines may change over time (replication of the study is not
always possible), and this can happen even between the start of your
study and its publication!

• The primary search strategy needs to be complemented with a
secondary search, as you are never sure that you collected ALL relevant
studies
Protocol - Primary Search
Strategy
<<software AND (product line OR product lines OR product family OR product families) AND
(variability OR variation OR variant)>>
Search String
Excerpts (Chen & Ali Babar, 2011):

"…syntactically identical search strings for all the searched databases. We performed several test searches with different search engines [and] digital libraries of software engineering literature. The results from the test searches were continuously discussed in order to refine the search string until we were fully satisfied with the capability of the search string to bring the required material."

"…an EndNote library where all duplicates were removed by using the duplicate removal feature of EndNote. This was followed by a series of manual checks to ensure no duplicates remained in the library. Each downloaded paper had an entry in the EndNote library containing the publication title, abstract, author(s), source and date stored. We did not restrict our search based on publication year."

Table 2. Data sources used and their respective search strings:
1. IEEE Xplore: (((software)<in>ti)<or>((software)<in>ab)) <and> (((product line<or>product family)<in>ti)<or>((product line<or>product family)<in>ab)) <and> (((variability<or>variant<or>variation)<in>ti)<or>((variability<or>variant<or>variation)<in>ab))
2. ACM Digital Library: String 1: +abstract:"product line" +abstract:vari*; String 2: +abstract:"product family" +abstract:vari*; String 3: title:"product family" title:"product line"
3. CiteSeer library (Google): Software ("product line" OR "product lines" OR "product family" OR "product families") (variability OR variation OR variant)
4. ScienceDirect: TITLE-ABSTR-KEY(software AND ("product line*" OR "product famil*")) and TITLE-ABSTR-KEY(variability OR variation OR variant)
5. EI Compendex/Inspec: (((software AND ("product line*" OR "product famil*")) WN KY) AND ((variability OR variation OR variant) WN KY)), English only
6. SpringerLink: su:(software) AND (su:("product line") OR su:("product family")) AND su:(variability OR variation OR variant)
7. Web of Science: TS=((software AND ("product line*" OR "product famil*")) AND (variability OR variation OR variant))
Search String formatted for different databases
cf. Chen & Ali Babar, 2011 https://doi.org/10.1016/j.infsof.2010.12.006
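The duplicate removal that a reference manager performs can be sketched as matching on normalised titles. A simplified assumption, not EndNote's actual algorithm:

```python
# Minimal sketch of duplicate removal across databases by normalised title.
# A reference manager (EndNote, Zotero) does this for you; the normalisation
# rule here is a simplifying assumption, not any tool's real algorithm.
import re

def normalise(title):
    """Lowercase and strip punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

def deduplicate(records):
    seen, unique = set(), []
    for rec in records:
        key = normalise(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

hits = [
    {"title": "Variability in Software Product Lines", "source": "IEEE Xplore"},
    {"title": "Variability in software product lines.", "source": "Scopus"},
    {"title": "A Survey of Product Families", "source": "ACM DL"},
]
unique_hits = deduplicate(hits)
print(len(unique_hits))  # 2
```

The manual checks mentioned in the excerpt remain necessary: normalisation catches case and punctuation variants, not retitled preprints.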
Protocol - Define Search String
• Derive main terms from the RQs based on the human subjects,
artefact, methods or technologies, software engineering task,
focus (NOTE: spatial scope and temporal scope are often not part of
the string)

• Determine and include synonyms, related terms, and alternative
spellings for the main terms;

• Check the keywords in any relevant papers you already know, and
in initial searches on the relevant databases;

• Incorporate alternative spellings and synonyms using Boolean "OR";

• Link main terms using Boolean "AND".
not rigorous, but common, acceptable approach
user, involvement, software development
e.g., user ~ customer, consumer, end-user, end user
e.g., involvement ~ involv*, participat*, contribut*, UX
(user OR customer OR …) AND (involv* OR participat* OR …) AND (…)
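The recipe above (synonyms joined with OR, main terms joined with AND) can be mechanised. A small sketch using the slide's running example; wildcard support (`involv*`) depends on the target database's search engine:

```python
# Sketch: compose a boolean search string from term groups, following the
# slide's recipe: synonyms joined with OR, main terms joined with AND.
# Multi-word terms are quoted; wildcard handling varies per database.

def build_search_string(term_groups):
    groups = [" OR ".join(f'"{t}"' if " " in t else t for t in terms)
              for terms in term_groups]
    return " AND ".join(f"({g})" for g in groups)

query = build_search_string([
    ["user", "customer", "consumer", "end-user"],
    ["involv*", "participat*", "contribut*"],
    ["software development"],
])
print(query)
# (user OR customer OR consumer OR end-user) AND
# (involv* OR participat* OR contribut*) AND ("software development")
```

Generating the string from term lists makes the per-database adaptations (Table 2 style) a formatting concern rather than a re-derivation.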
Protocol - Define Search String
cf. Zhang et al. https://doi.org/10.1016/j.infsof.2010.12.010
more rigorous approach
Excerpts (Zhang et al.), on a quasi-gold-standard based search approach:

• It is not possible to have a 'gold standard' for most SLRs in SE. A 'quasi-gold standard' (QGS) is a set of known studies from related publication venues on a research topic; it can be regarded as a 'gold standard' under two additional constraints: venues (where) and period (time span).

• The proposed systematic literature search process is composed of five steps, starting with the identification of the relevant publication venues and databases: a manual search of selected venues builds the QGS, and the automated search strings are then devised and evaluated against it, rather than relying on the searchers' subjective perceptions.

• Sensitivity and precision, two criteria borrowed from medicine, are used to evaluate the quality and efficiency of a search strategy. Sensitivity for a given topic is the proportion of relevant studies retrieved for that topic; precision is the proportion of retrieved studies that are relevant:

Sensitivity = (Number of relevant studies retrieved / Total number of relevant studies) × 100%   (1)
Precision = (Number of relevant studies retrieved / Number of studies retrieved) × 100%   (2)

• Decision rule (3): if quasi-sensitivity ≥ 80%, move forward; if < 80%, go back to Step 3 and refine the search string.

Table 1a. Search engines (digital libraries) used more than once in SLRs (# of SLRs, % of SLRs): IEEE Xplore (24, 92%); ACM Digital Library (21, 81%); ScienceDirect (15, 58%); ISI Web of Science (10, 38%); EI Compendex (9, 35%); SpringerLink (8, 31%); Wiley InterScience (8, 31%); Inspec (8, 31%); Google Scholar (6, 23%); SCOPUS (2, 8%); Kluwer (2, 8%).

Table 1b. Search venues used twice or more: IEEE Software (4, 27%); ESEM (4, 27%); ISESE (4, 27%); TSE (3, 20%); ICSE (3, 20%); JSS (3, 20%); IEEE Computer (3, 20%); Metrics (2, 13%); TOSEM (2, 13%); ESE (2, 13%); WWW (2, 13%); ICSM (2, 13%); MISQ (2, 13%).

Table 2. Search strategy scales (sensitivity %, precision %): High sensitivity: 85–90, 7–15 (max sensitivity despite low precision); High precision: 40–58, 25–60 (max precision despite low recall); Optimum: 80–99, 20–25 (maximize both sensitivity and precision); Acceptable: 72–80, 15–25 (fair sensitivity and precision).
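The sensitivity and precision formulas translate directly into code. In this sketch the quasi-gold standard is the set of known relevant studies, and the study identifiers are made up for illustration:

```python
# The sensitivity and precision formulas, as code. 'qgs' is the quasi-gold
# standard (known relevant studies); 'retrieved' is what the automated
# search returned. Study IDs are illustrative.

def sensitivity(retrieved, relevant):
    return 100.0 * len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    return 100.0 * len(retrieved & relevant) / len(retrieved)

qgs = {"s1", "s2", "s3", "s4", "s5"}               # 5 known relevant studies
retrieved = {"s1", "s2", "s3", "s4", "x1", "x2"}   # search returned 6 studies

sens = sensitivity(retrieved, qgs)  # 4/5 -> 80.0
prec = precision(retrieved, qgs)    # 4/6 -> ~66.7
# Zhang et al.'s rule: quasi-sensitivity >= 80% -> move forward,
# otherwise go back and refine the search string.
print(sens >= 80.0)  # True
```

Note the trade-off the scales table captures: widening the string raises sensitivity but drags precision down, so the 80% threshold is checked against the QGS rather than the whole (unknown) literature.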
Protocol - Primary Search
Strategy
1. Describe how you have defined the search string

2. Define the scope in terms of years (in some cases you may want only recent studies)

3. Describe the DBs (aka digital libraries) you have selected and WHY (normally
because they are common; refer to Kitchenham, 2007 https://go.aws/2TK4SN3 and
you're safe)

• Annotate the number of studies retrieved from each DB with the final search
string, you will need that for reporting!

4. Screen Title and Abstract and exclude clearly irrelevant studies

5. If a study seems relevant, download full text in your Reference Management
System (e.g., Zotero, EndNote)

6. If more than one person is involved, decide a way to resolve undecided cases
Always keep track of the number of papers you get,
and note the date on which you performed the search
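A search log that records, per database, the search string, the date, and the number of hits keeps these numbers ready for reporting. A minimal sketch with illustrative field names; the hit counts reuse the examples from the next slides, and the date is assumed:

```python
# Sketch of a search log: per database, record the search string, the date,
# and the number of hits, so the numbers are ready for the final report.
# Field names, hit counts, and dates are illustrative.
from datetime import date

search_log = []

def log_search(db, query, hits, when):
    search_log.append({
        "database": db,
        "query": query,
        "hits": hits,
        "date": when.isoformat(),
    })

log_search("IEEE Xplore", '"artificial intelligence" AND testing', 86,
           date(2020, 5, 4))
log_search("SpringerLink", '"artificial intelligence" AND testing', 4097,
           date(2020, 5, 4))

total = sum(entry["hits"] for entry in search_log)
print(total)  # 4183
```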
Example: SpringerLink
4,097 papers!
…let’s refine
and we did not use
synonyms!
Software Engineering,
Article and
Conference
…still a lot,
and we may
have lost some
papers!
from a certain page onwards
results are not relevant
I have to screen titles and identify the
page from which results are irrelevant;
I have to justify and document the
decision
Example: IEEE Xplore
Only 86 papers!
Here synonyms may help
Always remember to balance coverage with resources…
for any question, in the end, few papers are REALLY relevant
Protocol - Secondary
Search Strategy
From the Kitchenham (2007) guidelines, the search should be documented so that it is replicable (as far as possible):
• The review must be documented in sufficient detail for readers to be able to
assess the thoroughness of the search.
• The search should be documented as it occurs and changes noted and justified.
• The unfiltered search results should be saved and retained for possible reanalysis.
Procedures for documenting the search process are given in Table 2.
Table 2 Search process documentation
Data Source Documentation
Digital Library Name of database
Search strategy for the database
Date of search
Years covered by search
Journal Hand Searches Name of journal
Years searched
Any issues not searched
Conference proceedings Title of proceedings
Name of conference (if different)
Title translation (if necessary)
Journal name (if published as part of a journal)
Efforts to identify
unpublished studies
Research groups and researchers contacted (Names and contact details)
Research web sites searched (Date and URL)
Other sources Date Searched/Contacted
URL
Any specific conditions pertaining to the search
Researchers should specify their rationale for:
• The digital libraries to be searched.
• The journal and conference proceedings to be searched.
Primary
Secondary
You have to show that you have done your best to identify ALL the studies
Protocol - Inclusion/Exclusion Criteria
You DO NOT NEED to carefully inspect the full-text to apply them
Table II. Inclusion and Exclusion Criteria
Type Description
Inclusion I1 Study is internal to software domain. We are only interested in consistency
checking for software systems.
I2 Study is about consistency checking related to software behavioral
models/diagrams.
I3 Study comes from an acceptable source such as a peer-reviewed scientific journal,
conference, symposium, or workshop.
I4 Study reports issues, problems, or any type of experience concerning software
behavioral model consistency.
I5 Study describes solid evidence on software behavioral model consistency
checking, for instance, by using rigorous analysis, experiments, case studies,
experience reports, field studies, and simulation.
Exclusion E1 Study is about hardware or other fields not directly related to software.
E2 Study is not clearly related to at least one aspect of the specified research
questions.
E3 Study reports only syntactic or structural consistency checking of
models/diagrams.
E4 Secondary literature reviews.
E5 Study does not present sufficient technical details of consistency checking related
to software behavioral models (e.g., they have a different focus (i.e., version
control) and have insufficient detail)
E6 Study did not undergo a peer-review process, such as non-reviewed journal,
magazine, or conference papers, master theses, books, and doctoral dissertations
(in order to ensure a minimum level of quality).
E7 Study is not in English.
E8 Study is a shorter version of another study which appeared in a different source
(the longer version will be included).
Example from https://doi.org/10.1145/3037755
But the Pilot Search is crucial to define these criteria!
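Inclusion/exclusion criteria like those in the table can be applied mechanically as predicates over a paper's metadata. A sketch: the criterion IDs mirror the I1/E7-style numbering, but the predicates themselves are toy examples, not the actual criteria of the cited study:

```python
# Sketch: inclusion/exclusion criteria as predicates over paper metadata.
# Criterion IDs mirror the I1/E7-style numbering in the table above;
# the predicates are toy examples.

INCLUSION = {
    "I3": lambda p: p["peer_reviewed"],       # acceptable, peer-reviewed source
}
EXCLUSION = {
    "E7": lambda p: p["language"] != "English",  # study is not in English
}

def screen(paper):
    """Return (included?, list of failed criterion IDs)."""
    failed = [cid for cid, ok in INCLUSION.items() if not ok(paper)]
    failed += [cid for cid, bad in EXCLUSION.items() if bad(paper)]
    return (not failed, failed)

ok, why = screen({"peer_reviewed": True, "language": "English"})
rejected, reasons = screen({"peer_reviewed": False, "language": "German"})
print(ok, rejected, reasons)  # True False ['I3', 'E7']
```

Recording which criterion failed, not just the yes/no decision, is what lets two reviewers compare and resolve undecided cases.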
Protocol - Quality Criteria
• Quality assessment can be used to exclude papers, or just to
score them (some evidence may be taken only from a subset of
high-quality studies)

• You need to define a checklist of quality criteria and apply them

• Criteria to be applied depend on the type of study
considered 

• Quality of an Experiment is different than quality of a Case
Study

• Extensive guidelines in https://go.aws/2TK4SN3
You NEED to inspect the full-text
Protocol - Quality Criteria

"…in Appendix B of [20] for our study. A summary of the quality assessment criteria is presented in Table 3."

Table 3. Quality criteria used in this review (taken from [20] without any changes):
1. Is the paper based on research (or is it merely a ‘‘lessons learned’’ report
based on expert opinion)?
2. Is there a clear statement of the aims of the research?
3. Is there an adequate description of the context in which the research was
carried out?
4. Was the research design appropriate to address the aims of the research?
5. Was the recruitment strategy appropriate to the aims of the research?
6. Was there a control group with which to compare treatments?
7. Was the data collected in a way that addressed the research issue?
8. Was the data analysis sufficiently rigorous?
9. Has the relationship between researcher and participants been considered
to an adequate degree?
10. Is there a clear statement of findings?
11. Is the study of value for research or practice?
Give one point for each question
Exclude papers with fewer than X points (justify X;
the right X usually becomes clear
after scoring the papers)
Generic criteria for empirical studies
cf. https://doi.org/10.1016/j.infsof.2010.12.006
Tailor to your context!
Protocol - Quality Criteria
Generic and differentiated
5. Ethnographic studies
6. Action research
7. Experiments
The studies will be evaluated for their reporting, as that is the only means of quality assessment available to us. We
will use the quality checklist provided by Barbara Kitchenham in the guidelines [1] and also the one developed by
one of the team members (Muneera Bano) while conducting another SLR [2] [3].
Quality Checklist
Generic
Are the aims clearly stated? YES/NO
Are the study participants or observational units adequately described? YES/NO/PARTIAL
Was the study design appropriate with respect to the research aim? YES/NO/PARTIAL
Are the data collection methods adequately described? YES/NO/PARTIAL
Are the statistical methods justified by the author? YES/NO
Are the statistical methods used to analyze the data properly described and referenced? YES/NO
Are negative findings presented? YES/NO/PARTIAL
Are all the study questions answered? YES/NO
Do the researchers explain future implications? YES/NO
Survey
Was the denominator (i.e. the population size) reported? YES/NO
Did the author justify the sample size? YES/NO
Is the sample representative of the population to which the results will generalize? YES/NO
Have "drop outs" introduced bias into the results? YES/NO/NOT APPLICABLE
Experiment
Were treatments randomly allocated? YES/NO
If there is a control group, are participants similar to the treatment group participants in terms of variables that may affect study outcomes? YES/NO
Could lack of blinding introduce bias? YES/NO
Are the variables used in the study adequately measured (i.e. are the variables likely to be valid and reliable)? YES/NO
Case Study
Is the case study context defined? YES/NO
Are sufficient raw data presented to provide understanding of the case? YES/NO
Is the case study based on theory and linked to existing literature? YES/NO
Are ethical issues addressed properly (personal intentions, integrity issues, consent, review board approval)? YES/NO
Is a clear chain of evidence established from observations to conclusions? YES/NO/PARTIAL
Experience Report
Is the focus of the study reported? YES/NO
Does the author report personal observation? YES/NO
Is there a link between data, interpretation and conclusion? YES/NO/PARTIAL
Does the study report multiple experiences? YES/NO
Some checklist items are graded yes/no, and a few also allow a partial answer. Scores are assigned according to
the grades: 1 for Yes, 0 for No and 0.5 for Partial. The total sum of the scores will be used for the quality
assessment of the studies.
Appendix B. Quality assessment checklist
The scoring of the checklist was based on the three possible answers to the questions: yes = 1, partial = 0.5 and no = 0. If any of the criteria was not applicable to a study, it was excluded from the evaluation of that particular study. Studies that scored less than 50% in the quality assessment were excluded, as they did not provide basic information about their research methodology.
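The scoring rule above (yes = 1, partial = 0.5, no = 0, skip non-applicable criteria, exclude studies below 50%) can be sketched in a few lines of Python; the study IDs and answers below are invented for illustration:

```python
# Sketch of the quality-scoring scheme described above. Paper IDs and
# answers are hypothetical.
answers = {
    "S1": ["yes", "yes", "partial", "no", "yes", "n/a", "yes", "partial"],
    "S2": ["no", "partial", "no", "no", "n/a", "n/a", "yes", "no"],
}
SCORE = {"yes": 1.0, "partial": 0.5, "no": 0.0}

def quality(ans):
    rated = [a for a in ans if a != "n/a"]            # drop non-applicable criteria
    return sum(SCORE[a] for a in rated) / len(rated)  # fraction of max score

# Exclude studies scoring below 50%.
included = {sid for sid, ans in answers.items() if quality(ans) >= 0.5}
print(included)  # S1 passes (5.0/7), S2 is excluded (1.5/6)
```

In practice the answers would come from your extraction sheet, and the 50% threshold must be justified, as noted earlier.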
cf. Bano and Zowghi, 2015 https://doi.org/10.1016/j.infsof.2014.06.011
Protocol
• Primary Search Strategy and Search String

• Secondary Search Strategy

• Inclusion/Exclusion Criteria

• Quality Criteria

• Data Collection (Extraction)
• Data Analysis (Synthesis)
Protocol - Data Collection
• Extractor and Checker Name

• Demographic Data

• Evaluation Information

• Classification-related Data

• Content-related Data
If you have many papers
you may need to share the workload,
and to double check!
You can use an Excel/Google Sheet File
You can use TAGS in Zotero
Demographic Data
• Study ID (You SHALL assign a unique ID to each selected
study)

• Authors, Title, Year, Keywords

• Publication Type (Journal, Conference, Workshop)

• Publication Venue (IEEE Transactions on SE, EMSE, etc.)

• DOI (Digital Object Identifier)
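If you keep the extraction sheet in a script rather than Excel or Zotero tags, one row might look like this sketch; the field names follow the slides, while the example paper and names are invented:

```python
from dataclasses import dataclass, field

# Sketch of one data-extraction record, mirroring the demographic and
# collection fields listed above. All example values are hypothetical.
@dataclass
class ExtractionRecord:
    study_id: str                 # unique ID you SHALL assign (e.g., "S12")
    title: str
    authors: list
    year: int
    pub_type: str                 # Journal | Conference | Workshop
    venue: str
    doi: str
    extractor: str                # who extracted the data
    checker: str                  # who double-checked it
    tags: list = field(default_factory=list)  # classification-related data

rec = ExtractionRecord(
    study_id="S12",
    title="An AI planner for GUI test generation",   # invented paper
    authors=["A. Author"], year=2019,
    pub_type="Conference", venue="ICST", doi="10.0000/example",
    extractor="Alice", checker="Bob",
    tags=["AI planner", "GUI testing", "mobile app"],
)
print(rec.study_id, rec.tags)
```

The same fields map one-to-one onto spreadsheet columns or Zotero tags.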
Evaluation Information
[Excerpt from the cited review:] …the evaluation approaches used in the reviewed studies, based on the work of Glass [30], Shaw [28] and Carmen et al. [29]. The scheme for categorizing evaluation methods used in this review is shown in Table 8. Table 9 presents the kinds of approaches used to evaluate the VM approaches reported in the reviewed papers. It is evident that "example application" is the most frequently used means of evaluation, followed by "experience reports" and "case studies." Other evaluation approaches used are "laboratory experiment with software subjects", "laboratory experiment with human subjects", and "field experiment".
Table 8
The scheme for categorizing the evaluation approaches designed and used for this review:
RA – Rigorous analysis: rigorous derivation and proof, suited for formal models [28]
CS – Case study: an empirical inquiry that investigates a contemporary phenomenon within its real-life context, when the boundaries between phenomenon and context are not clearly evident, and in which multiple sources of evidence are used [31]
DC – Discussion: provides some qualitative, textual, opinion-oriented evaluation, e.g., compare and contrast, oral discussion of advantages and disadvantages [29]
EA – Example application: authors describe an application and provide an example to assist in the description, but the example is "used to validate" or "evaluate" as far as the authors suggest [28]
EP – Experience: the result has been used on real examples, but not in the form of case studies or controlled experiments; the evidence of its use is collected informally or formally [28]
FE – Field experiment: controlled experiment performed in industry settings [32]
LH – Laboratory experiment with human subjects: identification of precise relationships between variables in a designed controlled environment using human subjects and quantitative techniques [33]
LS – Laboratory experiment with software subjects: a laboratory experiment to compare the performance of a newly proposed system with other existing systems [30]
SI – Simulation: execution of a system with artificial data [33], using a model of the real world [34]
Type of Study (you can also use the ABC classification for the types of study)
cf. "A systematic review of evaluation of variability management approaches in software product lines", https://doi.org/10.1016/j.infsof.2010.12.006
Industrial Evaluation
• NO: Not evaluated in industrial settings

• LAB: Industrial problem treated in laboratory settings

• IND: Industrial problem validated with industrial experts

• DEV: Development of an industrial product

Authorship
• A: only academic authors

• I: only industrial authors

• AI: academic and industrial authors
Classification-related Data
• Normally related to the method and context of the paper (not the RESULTS)
• Classes may be related to the usual list:

• Human subjects: developers, users, testers, managers

• Artefact: code, requirements, test cases
• Methods or technologies: artificial intelligence, SCRUM development,
questionnaires

• Software engineering task: testing, elicitation, negotiation, maintenance

• Focus: reliability, bugs, happiness, effect, effectiveness

• Spatial Scope: research, large companies, small companies

• Temporal Scope: short term, long term, from one phase/task to another
These depend on your problem and your research questions
Classification-related Data
• Topic of the SLR: Artificial intelligence technologies for
test generation
• Classes of Technologies: AI Planner Approach, Simulated
Annealing, Tabu Searching, Genetic Algorithm, Ant Colony
Optimization (ACO)

• Classes of Systems: embedded, desktop software, website,
mobile app, internet of things

• Types of Testing: database testing, functional testing, GUI
testing, application testing, usability testing, security testing,
integration testing
Example
Already understanding what these
different terms mean is a learning experience
Content-related Data
• Normally related to the FINDINGS/RESULTS of the
reviewed paper

• I have to write down some information extracted directly
from the paper, depending on the RQs

• It can be quantitative (e.g., performance obtained with a
method) 

• It can be qualitative (e.g., extract excerpts of the text
about qualitative properties, such as the observed impact
of user involvement)
Protocol - Data Analysis
• In principle you should apply quantitative meta-analysis
methods to assess the quantitative data, however,
quantitative results are rarely presented uniformly across
papers in SE

• Therefore, data analysis for all quantitative data (both
content-related and others) is normally based on
descriptive statistics
• Data analysis for qualitative data is normally based on
qualitative methods (e.g., thematic analysis), oriented to
derive a theory
We’ll see this through an example case
in the reporting part of this presentation
Quantitative meta-analysis: combines the results
of a number of different reports into one report
to create a single, more precise estimate of an effect
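As a minimal sketch, descriptive statistics over classification-related data can be as simple as counting tag frequencies per paper; the study IDs and labels below are invented:

```python
from collections import Counter

# Sketch: descriptive statistics over one classification dimension
# (evaluation type) extracted from the selected papers. The per-paper
# labels are hypothetical.
evaluation_type = {
    "S1": "example application", "S2": "case study", "S3": "example application",
    "S4": "experience", "S5": "example application", "S6": "case study",
}
counts = Counter(evaluation_type.values())
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label:20s} {n:2d} ({100 * n / total:.0f}%)")
```

Tables and bar charts for the report are then produced directly from these counts.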
SLR Reporting
Structure of a SLR Report
Table 8 Structure and Contents of Reports of Systematic Reviews
Section Subsection Scope Comments
Title* The title should be short but informative. It should be based on the
question being asked. In journal papers, it should indicate that the study is
a systematic review.
Authorship* When research is done collaboratively, criteria for determining both who
should be credited as an author, and the order of author’s names should be
defined in advance. The contribution of workers not credited as authors
should be noted in the Acknowledgements section.
Context The importance of the research
questions addressed by the review.
Objectives The questions addressed by the
systematic review.
Methods Data Sources, Study selection, Quality
Assessment and Data extraction.
Results Main finding including any meta-
analysis results and sensitivity
analyses.
Executive summary
or Structured
Abstract*
Conclusions Implications for practice and future
research.
A structured summary or abstract allows readers to assess quickly the
relevance, quality and generality of a systematic review.
Background Justification of the need for the
review.
Summary of previous reviews.
Description of the software engineering technique being investigated and
its potential importance.
Review questions Each review question should be
specified.
Identify primary and secondary review questions. Note this section may be
included in the background section.
Data sources and search
strategy
Study selection
Study quality assessment
Data extraction
Review Methods
Data synthesis
This should be based on the research protocol. Any changes to the original
protocol should be reported.
Included and
excluded studies
Inclusion and exclusion criteria.
List of excluded studies with rationale
for exclusion.
Study inclusion and exclusion criteria can sometimes best be represented
as a flow diagram because studies will be excluded at different stages in
the review for different reasons.
42
cf. Kitchenham, 2007, https://go.aws/2TK4SN3
Structure of a SLR Report
Findings Description of primary studies.
Results of any quantitative summaries
Details of any meta-analysis.
Results
Sensitivity analysis
Non-quantitative summaries should be provided to summarise each of the
studies and presented in tabular form.
Quantitative summary results should be presented in tables and graphs.
Discussion Principal findings These must correspond to the findings discussed in the results section.
Strengths and Weaknesses Strengths and weaknesses of the
evidence included in the review.
Relation to other reviews, particularly
considering any differences in quality
and results.
A discussion of the validity of the evidence considering bias in the
systematic review allows a reader to assess the reliance that may be placed
on the collected evidence.
Meaning of findings Direction and magnitude of effect
observed in summarised studies.
Applicability (generalisability) of the
findings.
Make clear to what extent the results imply causality by discussing the
level of evidence.
Discuss all benefits, adverse effects and risks.
Discuss variations in effects and their reasons (for example are the
treatment effects larger on larger projects).
Practical implications for software
development.
What are the implications of the results for practitioners?
Conclusions Recommendations
Unanswered questions and
implications for future research.
Acknowledgements* All persons who contributed to the
research but did not fulfil authorship
criteria.
Conflict of Interest Any secondary interest on the part of the researchers (e.g. a financial
interest in the technology being evaluated) should be declared.
References and
Appendices
Appendices can be used to list studies included and excluded from the
study, to document search strategy details, and to list raw data from the
included studies.
cf. Kitchenham, 2007, https://go.aws/2TK4SN3
Never Forget this!
Reporting - Search
Summary of secondary searches, step 4 (PS: primary searches, SS: secondary searches):
[1] 22 studies, 1959–1981; missing in our results: S60; missing from their review: NA (citations prior to 1980); overlapping: PS → S26
[2] 19 studies, 1982–1992; missing in our results: S62, S65, S66; missing from their review: S26, S27, S28, S29, S30, S39, S46, S51, S52; overlapping: PS → S2, S5, S15, S31, S32; SS → S40, S43, S49
[3] 82 studies, 1974–2007; missing in our results: from S59 to S87; missing from their review: S1, S4, S7, S8, S9, S10, S11, S13, S14, S16, S17, S18, S19, S22, S27, S28, S29, S30, S32, S33, S34, S35, S38, S39, S40, S41, S42, S43, S44, S45, S46, S48, S49, S51, S53, S56, S58; overlapping: PS → S2, S3, S5, S12, S15, S26, S31, S36, S37; SS → S47, S50, S52, S57
Total: 29 missing in our results, 44 missing from their reviews, 19 overlapping
Fig. 1. SLR execution process.
Draw a diagram of your search process, WITH NUMBER of PAPERS
cf. Bano and Zowghi, 2015 https://doi.org/10.1016/j.infsof.2014.06.011
Reporting - Statistics
Fig. 4. Summary of characteristics of included studies (decade, research method, ERA rank).
Most-cited selected studies (citation count):
S39 J.D. Gould and C. Lewis, "Designing for usability: key principles and what designers think," Communications of the ACM, vol. 28, no. 3, pp. 300–311, 1985 (1417 citations)
S47 J. Hartwick and H. Barki, "Explaining the role of user participation in information system use," Management Science, pp. 440–465, 1994 (1191 citations)
S2 J.J. Baroudi, M.H. Olson, and B. Ives, "An empirical study of the impact of user involvement on system usage and information satisfaction," Communications of the ACM, vol. 29, no. 3, pp. 232–238, 1986 (783 citations)
S34 H. Barki and J. Hartwick, "Measuring user participation, user involvement, and user attitude," MIS Quarterly, pp. 59–82, 1994 (697 citations)
S65 S.L. Jarvenpaa and B. Ives, "Executive involvement and participation in the management of information technology," MIS Quarterly, 15, 2 (June 1991), 205–227 (589 citations)
Classification based on DEMOGRAPHIC and EVALUATION information
(year, type of study, and quality rank—A*, A, etc.)
cf. Bano and Zowghi, 2015 https://doi.org/10.1016/j.infsof.2014.06.011
Reporting - Statistics
Fig. 6. Relationship of user involvement and system success.
Table 5
Top ten journals in results of SLR.
156 M. Bano, D. Zowghi / Information and Software Technology 58 (2015) 148–169
Classification based on CONTENT-related data
(Positive, Negative or Uncertain impact)
Reporting - Theory
Table 6
Benefits of user involvement.
Benefits of user
involvement
Description Extracted from following studies Freq
(N = 87)
Benefits from psychological
perspective
User system satisfaction Users will favor a system more if they are
involved in its development and feel satisfied
with using it
S3, S13, S16, S20, S21, S27, S33, S34,
S35, S37, S38, S45, S46, S52, S59, S63,
S65, S67, S68, S71, S83, S84
23
User system acceptance Users approve that the system is developed
according to their workplace needs and
requirements
S4, S11, S13, S38, S40, S43, S46, S64,
S87
9
Facilitating change Involved users will not resist using a new
system in their work environment
S5, S12, S69, S71, S72 6
Better user’s attitude
towards system
Involved users will show positive attitude
when using the system
S5, S12, S69, S71, S72 5
Increasing perceived
relevance to the system by
users
Involved users considered themselves more
informed about the system and think that the
system is relevant
S12 1
Increasing user motivation Involved users will be more motivated to use
the system
S16 1
Increasing customer loyalty Involved users will have higher degree of trust
in the development team
S21 1
Assist in maintaining long
term relationship with
users
Involved users will have more interaction with
the development team. This helps maintain
long term relationships between users/
customers and development team
S21 1
Benefits from managerial
perspective
Better communication User involvement will lead to increase in
interaction between users and development
team and will facilitate more effective
communication
S10, S12, S25, S55, S58, S77 6
Improved Management
Practice
By involving the users in the development, the
management will face less resistance by giving
the users sense of dignity of knowing that they
are important for the system
S16, S29 2
Developing realistic
expectation
Users will have a more informed idea of the
features of the system being developed
S32, S56 2
Reducing cost of the system Decreasing the risk of too many changes after
implementation by involving users in the
project
S43, S52 2
Helping in conflict
resolution
User involvement can help resolve
disagreements that may arise between users
and developing teams
S32 1
Benefits from methodological
perspective
Better understanding of
user requirements
Eliciting more accurate requirements from the
users of the systems
S8, S10, S14, S16, S21, S22, S37, S38,
S41, S43, S45, D46, S50, S57, S64, S70,
S71, S75, S79, S83
20
Improving quality of
resultant application
By involving the users the non functional
aspects of the system such as functional
suitability, reliability, usability, performance,
efficiency, compatibility, security,
maintainability and portability can be elicited
which may not have been expressed explicitly
hence improving the quality of the system
S11, S26, S27, S36, S37, S38, S40, S52,
S57, S68, S70, S71, S77, S79, S83, S87
16
Improving quality of design
decisions
Based on the level of users understanding,
skills and their workplace environment the
decisions for the design of the system will be
better informed
S6, S9, S11, S40, S41, S46, S52, S64,
S65, S69, S77, S83
12
Helping in overcoming in
implementation failures
When users are part of the testing,
implementation and installation of the system,
this can reduce the number of failures
S31 1
Theory based on CONTENT-related data
Reporting - Discussion
• Your SLR must be COMPLETE (all papers have been considered)

• Provide arguments on the completeness of the study

• Your SLR must be INFORMATIVE and contribute to the body of knowledge:

• What do we know that we did not know before (for each RQ considered)? Compare with existing
studies and highlight what is the contribution in terms of knowledge

• Your SLR must be USEFUL (implications for practice and research)

• Who can profit from this SLR? 

• Is research in the field exhaustive, and can the results be transferred to practice?

• How can the results be used by researchers and practitioners? 

• e.g., now that I know that user involvement has certain benefits and certain drawbacks, when
should I involve users?; 

• now that I know that certain AI methods are never used for testing, what should I do? —
researchers should explore those methods, practitioners should apply more consolidated
ones
Reporting - Threats To Validity
• Validity of Literature Search: arguments about the effort that you
made to identify ALL the studies

• Validity of Study Selection: arguments on how you reduced the
human bias and you reached some form of objectivity in the initial
screening

• Validity of Data Collection and Analysis: arguments on the soundness
of your classification schemes (based on previous literature), how
objectivity was achieved on qualitative data extraction — reviewer and
checker roles, and how your data were stored and retrieved

• Validity of Data Synthesis and Visualisation: argument on the fact
that your statistics are correct, illustrative, appropriately respond to the
RQs, and correctly derived from the data
WARNING: There is a lot of confusion in the threats to validity for SLRs
This is a reasonable list, based on work from Liping Zhao, Manchester, UK
Systematic Mapping
Studies (SMS)
In ONE Slide
• Follow the same process as for SLRs, BUT…

• Quality Criteria are not strictly necessary, and do not need to be
too detailed (you can discard low quality papers based on CORE
Ranking of the venue, http://www.core.edu.au/conference-portal)

• Data Collection does not consider content-related data, only
classification, evaluation and demographic data

• Data Analysis may include a Theory, but just in terms of classes
of papers and relations (i.e., a Descriptive Theory, for example:
most of the papers on AI and testing are not evaluated in
industrial contexts—because few “case studies” are identified in
the literature)
Summary: SLR and SMS
• SLRs and SMSs are normally carried out by PhD students at the
beginning of their PhD career

• The goal is to learn a research field, learn to evaluate
publications, learn to be systematic, learn the pain of research,
learn that YOU KNOW NOTHING
• …and understand if you want to go ahead
• SLRs and SMS are often useful for other researchers to motivate
their studies (research gap), so they can lead to a lot of citations

• …but if you publish an SLR or an SMS, everyone will think that you
are just seeking success (citations) and not knowledge (but
you* will get both)
*and your supervisor(s)

Deep Learning for Domain-Specific Entity Extraction from Unstructured Text wi...
 
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve OmohundroOpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
 
Transactions and Concurrency Control Patterns
Transactions and Concurrency Control PatternsTransactions and Concurrency Control Patterns
Transactions and Concurrency Control Patterns
 

Similaire à Systematic Literature Reviews and Systematic Mapping Studies

empirical-SLR.pptx
empirical-SLR.pptxempirical-SLR.pptx
empirical-SLR.pptxJitha Kannan
 
How to conduct systematic literature review
How to conduct systematic literature reviewHow to conduct systematic literature review
How to conduct systematic literature reviewKashif Hussain
 
Case Study Research in Software Engineering
Case Study Research in Software EngineeringCase Study Research in Software Engineering
Case Study Research in Software Engineeringalessio_ferrari
 
Technical research writing
Technical research writing   Technical research writing
Technical research writing AJAL A J
 
Introduction to Systematic Literature Review method
Introduction to Systematic Literature Review methodIntroduction to Systematic Literature Review method
Introduction to Systematic Literature Review methodNorsaremah Salleh
 
Proposing a Scientific Paper Retrieval and Recommender Framework
Proposing a Scientific Paper Retrieval and Recommender FrameworkProposing a Scientific Paper Retrieval and Recommender Framework
Proposing a Scientific Paper Retrieval and Recommender FrameworkAravind Sesagiri Raamkumar
 
Survey Research In Empirical Software Engineering
Survey Research In Empirical Software EngineeringSurvey Research In Empirical Software Engineering
Survey Research In Empirical Software Engineeringalessio_ferrari
 
[2017/2018] RESEARCH in software engineering
[2017/2018] RESEARCH in software engineering[2017/2018] RESEARCH in software engineering
[2017/2018] RESEARCH in software engineeringIvano Malavolta
 
Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!University of Córdoba
 
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...alessio_ferrari
 
Empirical Software Engineering
Empirical Software EngineeringEmpirical Software Engineering
Empirical Software EngineeringRahimLotfi
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the WebRinke Hoekstra
 
Systematic literature review technique.pptx
Systematic literature review technique.pptxSystematic literature review technique.pptx
Systematic literature review technique.pptxTANMAY DAS GUPTA
 
Recognizing and Organizing Opinions Expressed in the World ...
Recognizing and Organizing Opinions Expressed in the World ...Recognizing and Organizing Opinions Expressed in the World ...
Recognizing and Organizing Opinions Expressed in the World ...butest
 
Rsdocument infosheet
Rsdocument infosheetRsdocument infosheet
Rsdocument infosheetjeanrummy
 

Similaire à Systematic Literature Reviews and Systematic Mapping Studies (20)

empirical-SLR.pptx
empirical-SLR.pptxempirical-SLR.pptx
empirical-SLR.pptx
 
How to conduct systematic literature review
How to conduct systematic literature reviewHow to conduct systematic literature review
How to conduct systematic literature review
 
Systematic Literature Review
Systematic Literature ReviewSystematic Literature Review
Systematic Literature Review
 
Case Study Research in Software Engineering
Case Study Research in Software EngineeringCase Study Research in Software Engineering
Case Study Research in Software Engineering
 
SLR.docx
SLR.docxSLR.docx
SLR.docx
 
Technical research writing
Technical research writing   Technical research writing
Technical research writing
 
Introduction to Systematic Literature Review method
Introduction to Systematic Literature Review methodIntroduction to Systematic Literature Review method
Introduction to Systematic Literature Review method
 
Proposing a Scientific Paper Retrieval and Recommender Framework
Proposing a Scientific Paper Retrieval and Recommender FrameworkProposing a Scientific Paper Retrieval and Recommender Framework
Proposing a Scientific Paper Retrieval and Recommender Framework
 
Survey Research In Empirical Software Engineering
Survey Research In Empirical Software EngineeringSurvey Research In Empirical Software Engineering
Survey Research In Empirical Software Engineering
 
[2017/2018] RESEARCH in software engineering
[2017/2018] RESEARCH in software engineering[2017/2018] RESEARCH in software engineering
[2017/2018] RESEARCH in software engineering
 
Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!
 
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
Qualitative Studies in Software Engineering - Interviews, Observation, Ground...
 
Grounded theory for geeks
Grounded theory for geeksGrounded theory for geeks
Grounded theory for geeks
 
Empirical Software Engineering
Empirical Software EngineeringEmpirical Software Engineering
Empirical Software Engineering
 
Lecture rm 2
Lecture rm 2Lecture rm 2
Lecture rm 2
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 
Systematic literature review technique.pptx
Systematic literature review technique.pptxSystematic literature review technique.pptx
Systematic literature review technique.pptx
 
Recognizing and Organizing Opinions Expressed in the World ...
Recognizing and Organizing Opinions Expressed in the World ...Recognizing and Organizing Opinions Expressed in the World ...
Recognizing and Organizing Opinions Expressed in the World ...
 
Rsdocument infosheet
Rsdocument infosheetRsdocument infosheet
Rsdocument infosheet
 
Part 1 Research workshop
Part 1 Research workshopPart 1 Research workshop
Part 1 Research workshop
 

Plus de alessio_ferrari

Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...alessio_ferrari
 
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
Controlled experiments, Hypothesis Testing, Test Selection, Threats to ValidityControlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validityalessio_ferrari
 
Requirements Engineering: focus on Natural Language Processing, Lecture 2
Requirements Engineering: focus on Natural Language Processing, Lecture 2Requirements Engineering: focus on Natural Language Processing, Lecture 2
Requirements Engineering: focus on Natural Language Processing, Lecture 2alessio_ferrari
 
Requirements Engineering: focus on Natural Language Processing, Lecture 1
Requirements Engineering: focus on Natural Language Processing, Lecture 1Requirements Engineering: focus on Natural Language Processing, Lecture 1
Requirements Engineering: focus on Natural Language Processing, Lecture 1alessio_ferrari
 
Ambiguity in Software Engineering
Ambiguity in Software EngineeringAmbiguity in Software Engineering
Ambiguity in Software Engineeringalessio_ferrari
 
Empirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an OverviewEmpirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an Overviewalessio_ferrari
 
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an OverviewNatural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overviewalessio_ferrari
 

Plus de alessio_ferrari (7)

Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
Controlled experiments, Hypothesis Testing, Test Selection, Threats to ValidityControlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validity
 
Requirements Engineering: focus on Natural Language Processing, Lecture 2
Requirements Engineering: focus on Natural Language Processing, Lecture 2Requirements Engineering: focus on Natural Language Processing, Lecture 2
Requirements Engineering: focus on Natural Language Processing, Lecture 2
 
Requirements Engineering: focus on Natural Language Processing, Lecture 1
Requirements Engineering: focus on Natural Language Processing, Lecture 1Requirements Engineering: focus on Natural Language Processing, Lecture 1
Requirements Engineering: focus on Natural Language Processing, Lecture 1
 
Ambiguity in Software Engineering
Ambiguity in Software EngineeringAmbiguity in Software Engineering
Ambiguity in Software Engineering
 
Empirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an OverviewEmpirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an Overview
 
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an OverviewNatural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
 

Dernier

Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 

Dernier (20)

Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 

Systematic Literature Reviews and Systematic Mapping Studies

  • 6. Systematic Mapping Study (SMS)
    • Goal: build a classification scheme and structure a software engineering field of interest
    • Method: define a study protocol, collect papers from databases, screen them, identify categories of papers, and present the results with plots
    • Contribution: identify novel research directions; identify areas that need more investigation; provide a common terminology for the field
  • 7. Motivation: Systematic Literature Reviews
    • Sometimes I need a specific answer to a research question, and I know that the literature includes different studies on the topic
    • Sometimes I know that there are different experimental results concerning a topic, and I want to compare them
    • Sometimes I wish to build a theory by abduction, generalising from different theories found in the literature and related to different experimental results
    • Coverage, in terms of the sources considered, and reproducibility are the main issues in this case as well; utility should drive the question
  • 8. Systematic Literature Review (SLR)
    • Goal: answer a set of specific research questions related to an area of research
    • Method: define a study protocol, collect papers from databases, screen them, identify categories, read the papers in depth, extract data from them, and present plots of the results and the abduced theory
    • Contribution: identify answers to specific research questions; identify a general theory about the topic of research
  • 9. SMS vs SLR
    • Both are secondary studies, i.e., their sources of information are primary studies. Their methods are very similar, but their goals are different!
    • Rule of thumb:
      • If your RQs require a general classification of the papers, the scope is large, and you are mostly focusing on the METHODS of the studies, you are doing an SMS
      • If your RQs require extracting specific information from the papers, and you are comparing the RESULTS of different empirical studies, you are doing an SLR
      • If your RQs require both, you are doing an SLR (e.g., classification of the METHODS, and then analysis of the RESULTS by class)
    • Why should I do an SLR or SMS? Because you will have to review the literature anyway to learn about your research field, and if you do not do it systematically you will not be able to publish it
  • 10. Systematic Literature Reviews (SLRs) — Alessio Ferrari, CNR-ISTI, alessio.ferrari@isti.cnr.it; cf. Kitchenham, 2007, https://go.aws/2TK4SN3
  • 11. SLR Steps (in Theory)
    • PREPARATION: definition of research questions and study protocol
    • CONDUCTING: search papers and synthesise data
    • REPORTING: results are documented, validated and reported
    • SLRs should appear as if they were conducted in this way, but in practice several iterations and adjustments are required: research questions may change based on the retrieved data, and the theory is built incrementally and may need to be changed
  • 12. SLR Steps (a More Realistic Scheme)
    • Need for SLR and question(s) → pilot search → protocol → primary search → title and abstract screening → full-text download → full-text screening → secondary search → secondary references screening → data collection (extraction) → data analysis (synthesis) → synthesis and report
    • Inclusion/exclusion and quality criteria are applied at each screening step
    • Supporting tools: a reference management system (Mendeley, EndNote, Zotero) and scientific archives (IEEE Xplore, ACM, SpringerLink, Scopus, Web of Science)
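The screening steps in the scheme above can be sketched as a small filtering pipeline. This is a minimal illustration, assuming papers have been exported from a scientific archive as dictionaries; the field names and the criteria (keyword matching, a page-count quality check) are illustrative choices, not part of any SLR standard.

```python
# Sketch of the screening steps: the record fields and criteria below
# are hypothetical examples, not a standard export format.

def title_abstract_screen(papers, include, exclude):
    """First, coarse filter: keep papers whose title/abstract contain
    an inclusion keyword and no exclusion keyword."""
    kept = []
    for p in papers:
        text = (p["title"] + " " + p["abstract"]).lower()
        if any(k in text for k in include) and not any(k in text for k in exclude):
            kept.append(p)
    return kept

def quality_screen(papers, min_pages=6):
    """Toy quality criterion: drop very short papers (e.g. demos, posters)."""
    return [p for p in papers if p.get("pages", 0) >= min_pages]

papers = [
    {"title": "AI for software testing", "abstract": "machine learning ...", "pages": 10},
    {"title": "A testing tool demo", "abstract": "demo paper", "pages": 2},
    {"title": "Cooking recipes", "abstract": "food", "pages": 12},
]
relevant = quality_screen(
    title_abstract_screen(papers, include=["testing"], exclude=["cooking"])
)
print([p["title"] for p in relevant])  # only the first paper survives
```

In a real SLR these decisions are made by human reviewers against a written protocol; automation like this only helps pre-filter obviously irrelevant records.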
  • 13. Need for the SLR
    • Prior to undertaking a systematic review, researchers should ensure that a systematic review is necessary
    • If other (sufficiently recent) reviews exist, then your review is not needed
    • If very few papers touch your topic, then your review may be needed, but no one will publish it
    • Rule of thumb:
      • Intensively search Google Scholar with terms related to your research (e.g., “artificial intelligence” AND “testing”, and synonyms) combined with the terms “literature”, “review”, “survey”, “mapping study”, “state-of-the-art”
      • Search the most recent (high-quality) papers about your topic of interest, check their related work sections, and see if they cite a survey or review (e.g., a specific paper about AI for testing)
    • In your report, include a section in which you cite prior studies and explain what your review adds to the body of knowledge (e.g., there are reviews on “testing techniques”, but no review is focused on “effectiveness of testing”; there are mapping studies on AI and testing, but none of them is focused on “effectiveness of testing”; there is a review, but it is from 10 years ago)
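Search strings like the ones suggested above are usually assembled by OR-ing synonyms within a concept and AND-ing the concepts together. The helper below is a hypothetical sketch of that pattern; the exact boolean syntax accepted by each archive (IEEE Xplore, Scopus, Web of Science, ...) differs, so treat the output as a template to adapt.

```python
# Hypothetical sketch: build a boolean query from groups of synonyms.
# Each group is OR-ed internally and the groups are AND-ed together.

def build_query(*synonym_groups):
    blocks = []
    for group in synonym_groups:
        blocks.append("(" + " OR ".join(f'"{term}"' for term in group) + ")")
    return " AND ".join(blocks)

query = build_query(
    ["artificial intelligence", "machine learning"],
    ["testing", "verification"],
    ["review", "survey", "mapping study"],  # to check for existing secondary studies
)
print(query)
# ("artificial intelligence" OR "machine learning") AND ("testing" OR "verification") AND ("review" OR "survey" OR "mapping study")
```

Keeping the synonym groups in code (or in a table of the protocol) makes the search reproducible, which is one of the main concerns raised earlier.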
  • 14. Issues that Trigger Research Questions (RQs) — types of issues for SLRs in SE (from Kitchenham, 2007):
    • Assessing the effect of a software engineering technology
    • Assessing the frequency or rate of a project development factor, such as the adoption of a technology, or the frequency or rate of project success or failure
    • Identifying cost and risk factors associated with a technology
    • Identifying the impact of technologies on reliability, performance and cost models
    • Cost-benefit analysis of employing specific software development technologies or software applications
    • cf. Kitchenham, 2007, https://go.aws/2TK4SN3
  • 15. Research Questions (RQs) • Is there any relationship between user involvement and the success of software systems? • What should be the degree/level of user involvement in software development to achieve desired results? • What is the effectiveness of artificial intelligence techniques in predicting software bugs? • What are the human factors that affect project failure?
  • 16. Research Questions (RQs) • Is there any relationship between user involvement and the success of software systems? • What should be the degree/level of user involvement in software development to achieve desired results? • What is the effectiveness of artificial intelligence techniques in predicting software bugs? • What are the human factors that affect project failure? General, you can define it before the pilot search
  • 17. Research Questions (RQs) • Is there any relationship between user involvement and the success of software systems? • What should be the degree/level of user involvement in software development to achieve desired results? • What is the effectiveness of artificial intelligence techniques in predicting software bugs? • What are the human factors that affect project failure? General, you can define it before the pilot search You would define it after some pilot searches (you know what’s in the papers)
  • 18. Research Questions (RQs) • Is there any relationship between user involvement and the success of software systems? • What should be the degree/level of user involvement in software development to achieve desired results? • What is the effectiveness of artificial intelligence techniques in predicting software bugs? • What are the human factors that affect project failure? General, you can define it before the pilot search You would define it after some pilot searches (you know what’s in the papers) You would define it after some classification of AI techniques for bug prediction (an SMS)
  • 19. Research Questions (RQs) • Is there any relationship between user involvement and the success of software systems? • What should be the degree/level of user involvement in software development to achieve desired results? • What is the effectiveness of artificial intelligence techniques in predicting software bugs? • What are the human factors that affect project failure? General, you can define it before the pilot search You would define it after some pilot searches (you know what’s in the papers) You would define it after some classification of AI techniques for bug prediction (an SMS) General, you can define it before the pilot search
  • 20. Research Questions (RQs)
 • Is there any relationship between user involvement and the success of software systems? (general: you can define it before the pilot search)
 • What should be the degree/level of user involvement in software development to achieve desired results? (you would define it after some pilot searches, once you know what’s in the papers)
 • What is the effectiveness of artificial intelligence techniques in predicting software bugs? (you would define it after some classification of AI techniques for bug prediction, i.e., an SMS)
 • What are the human factors that affect project failure? (general: you can define it before the pilot search)
 • For a single study you can define as many (related) RQs as you need. The important thing is that you have some information to write about each question when reporting!
  • 23. Research Questions (RQs) and Elements of Interest
 • It is very hard to have a list of well-defined RQs from the beginning of your study
 • A possible strategy is to first identify these elements of interest, checking the categories that are interesting for you (granularity may vary, and not all these elements may be relevant to you):
 • Human subjects: developers, users, testers, managers
 • Artefact: code, requirements, test cases
 • Methods or technologies: artificial intelligence, SCRUM development, questionnaires
 • Software engineering task: testing, elicitation, negotiation, maintenance
 • Focus: reliability, bugs, happiness, effect, effectiveness
 • Spatial scope: research, large companies, small companies
 • Temporal scope: short term, long term, from one phase/task to another
 • Then formulate the main RQ (knowing that it will change and will be refined):
 • What is the effectiveness of artificial intelligence technologies for testing?
 • Which are the effects of introducing SCRUM development in large companies?
  • 24. Pilot Search
 • The Pilot Search is key to defining and refining the RQs, as you do not really know what you will find in the literature until you try to search the databases
 • The Pilot Search is key to defining the keywords to be used
 • The Pilot Search is key to understanding whether it is worth pursuing the SLR (you may find other reviews!)
 • The Pilot Search is key to aligning the viewpoints of the different reviewers (more than one person may need to be involved) and data checkers (double checking is needed)
 • If you do not find evidence of what you are searching for, the number of retrieved papers is low, or they have poor quality… well, do another type of study, e.g., an experiment
  • 25. Pilot Search
 • The Pilot Search consists of a set of unrestricted trials in which you first define a draft of your protocol and then you apply it
 • The Pilot Search does not follow any specific guideline
 • The goal is to INCREMENTALLY define the study protocol
 • Therefore, you will first select some keywords and databases, retrieve some studies, check a relevant part of them, (try to) discard those that are not interesting, (try to) extract data of interest, refine/modify the RQs, change the keywords, search for additional databases…
 • …UNTIL you are confident that you can do the same thing with all the studies, and you can report the whole study as if it had been done sequentially (but expect adjustments until the final reporting!)
  • 27. Protocol • Primary Search Strategy and Search String • Secondary Search Strategy • Inclusion/Exclusion Criteria • Quality Criteria • Data Collection (Extraction) • Data Analysis (Synthesis)
  • 29. Protocol - Primary Search Strategy
 • The Primary Search Strategy is the retrieval of studies (a.k.a. papers, articles, works) with a search string from a set of scientific databases (a.k.a. digital libraries)
 • Databases (DBs) for SE are: Scopus, IEEE Xplore, ACM Digital Library, SpringerLink (optional: Web of Science; Google Scholar only for complementary searches)
 • Each search engine of each DB has specific peculiarities, and you may need to adapt your search string to the input interface of the DB
 • Search engines may vary over time (replication of the study is not always possible), and this can also happen between the start of your study and its publication!
 • The primary search strategy needs to be complemented with a secondary search, as you are never sure that you collected ALL relevant studies
  • 30. Protocol - Primary Search Strategy
 • Example search string (cf. Chen & Ali Babar, 2011, https://doi.org/10.1016/j.infsof.2010.12.006):
 <<software AND (product line OR product lines OR product family OR product families) AND (variability OR variation OR variant)>>
 • The authors aimed at syntactically identical search strings for all the searched databases; they performed several test searches with the different search engines, and continuously discussed the results to refine the search string until they were fully satisfied with its capability to bring the required material
 • All retrieved papers were stored in an EndNote library; duplicates were removed with EndNote's duplicate-removal feature, followed by a series of manual checks; each entry stored title, abstract, author(s), source and date; the search was not restricted by publication year
 • The search string then has to be formatted for each database's search engine (Table 2 of the paper):
 1. IEEE Xplore: (((software)<in>ti)<or>((software)<in>ab)) <and> (((product line<or>product family)<in>ti)<or>((product line<or>product family)<in>ab)) <and> (((variability<or>variant<or>variation)<in>ti)<or>((variability<or>variant<or>variation)<in>ab))
 2. ACM Digital Library: String 1: +abstract:"product line" +abstract:vari*; String 2: +abstract:"product family" +abstract:vari*; String 3: title:"product family" title:"product line"
 3. CiteSeer (Google): software ("product line" OR "product lines" OR "product family" OR "product families") (variability OR variation OR variant)
 4. ScienceDirect: TITLE-ABSTR-KEY(software AND ("product line*" OR "product famil*")) and TITLE-ABSTR-KEY(variability OR variation OR variant)
 5. EI Compendex/Inspec: (((software AND ("product line*" OR "product famil*")) WN KY) AND ((variability OR variation OR variant) WN KY)), English only
 6. SpringerLink: su:(software) AND (su:("product line") OR su:("product family")) AND su:(variability OR variation OR variant)
 7. Web of Science: TS=((software AND ("product line*" OR "product famil*")) AND (variability OR variation OR variant))
  • 31. Protocol - Define Search String (not rigorous, but a common, acceptable approach)
 • Derive main terms from the RQs based on the human subjects, artefact, methods or technologies, software engineering task, focus (NOTE: spatial scope and temporal scope are often not part of the string), e.g.: user, involvement, software development
 • Determine and include synonyms, related terms, and alternative spellings for main terms; check the keywords in any relevant papers you already know, and in initial searches on the relevant databases, e.g.: user ~ customer, consumer, end-user, end user; involvement ~ involv*, participat*, contribut*, UX
 • Incorporate alternative spellings and synonyms using Boolean "OR"
 • Link main terms using Boolean "AND": (user OR customer OR …) AND (involv* OR participat* OR …) AND (…)
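The OR-within-groups, AND-between-groups construction above can be sketched programmatically. This is a minimal illustration (not from the slides; the helper name and the example terms are mine), which can help keep the string consistent when reformatting it for several databases:

```python
# Sketch: assemble a boolean search string from groups of synonyms.
# Each group is OR-ed internally; groups are AND-ed together,
# following the construction described in the slide above.
def build_search_string(term_groups):
    clauses = []
    for synonyms in term_groups:
        # Quote multi-word terms so they are searched as phrases
        terms = " OR ".join(f'"{s}"' if " " in s else s for s in synonyms)
        clauses.append(f"({terms})")
    return " AND ".join(clauses)

query = build_search_string([
    ["user", "customer", "end-user"],          # human subjects
    ["involv*", "participat*", "contribut*"],  # focus
    ["software development"],                  # task
])
print(query)
# (user OR customer OR end-user) AND (involv* OR participat* OR contribut*) AND ("software development")
```

The resulting string still has to be adapted to each database's syntax (e.g., `<and>`/`<or>` operators in IEEE Xplore, `TS=` in Web of Science).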
  • 32. Protocol - Define Search String (a more rigorous approach)
 cf. Zhang et al., 2011, https://doi.org/10.1016/j.infsof.2010.12.010
 • It is not possible to have a 'gold standard' (the known, complete set of relevant primary studies) for most SLRs in SE; the paper therefore introduces the 'quasi-gold standard' (QGS): a set of known studies from related publication venues on a research topic, obtained via manual search within given venue and time-span constraints
 • The QGS enables an objective evaluation of the automated search strategy, relying on the available records rather than on the searchers' subjective perceptions, through two criteria borrowed from medicine:
 • Sensitivity = (number of relevant studies retrieved / total number of relevant studies) × 100%
 • Precision = (number of relevant studies retrieved / number of studies retrieved) × 100%
 • The proposed systematic search process: identify related venues and databases, perform the manual search to build the QGS, define the search string, run the automated search, and evaluate it against the QGS; if quasi-sensitivity ≥ 80%, move forward, otherwise go back and refine the search string
 • Indicative search strategy scales (Table 2 of the paper): high sensitivity (85–90% sensitivity, 7–15% precision), high precision (40–58%, 25–60%), optimum (80–99%, 20–25%), acceptable (72–80%, 15–25%)
 • In existing SLRs in SE, IEEE Xplore and ACM Digital Library are the main search portals; IEEE Software, ESEM, ISESE, TSE, ICSE, and JSS are among the venues most consulted for manual search
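The sensitivity and precision criteria above are straightforward to compute once you have a quasi-gold standard. A small sketch (study IDs are illustrative):

```python
# Sensitivity and precision of a search strategy, as defined by
# Zhang et al.: both are proportions of relevant studies, expressed
# as percentages against a (quasi-)gold standard.
def sensitivity(retrieved, relevant):
    """Proportion of all relevant studies that the search retrieved."""
    return 100.0 * len(set(retrieved) & set(relevant)) / len(set(relevant))

def precision(retrieved, relevant):
    """Proportion of retrieved studies that are relevant."""
    return 100.0 * len(set(retrieved) & set(relevant)) / len(set(retrieved))

# Quasi-gold standard of 5 known papers; the search retrieved 8,
# 4 of which are in the QGS
qgs = {"S1", "S2", "S3", "S4", "S5"}
hits = {"S1", "S2", "S3", "S4", "X1", "X2", "X3", "X4"}
print(sensitivity(hits, qgs))  # 80.0 -> meets the 80% quasi-sensitivity threshold
print(precision(hits, qgs))    # 50.0
```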
  • 33. Protocol - Primary Search Strategy
 1. Describe how you have defined the search string
 2. Define the scope in terms of years (in some cases you may want only recent studies)
 3. Describe the DBs (a.k.a. digital libraries) you have selected and WHY; normally because they are common: refer to Kitchenham, 2007, https://go.aws/2TK4SN3 and you're safe. Annotate the number of studies retrieved from each DB with the final search string, you will need that for reporting!
 4. Screen title and abstract and exclude clearly irrelevant studies
 5. If a study seems relevant, download the full text into your Reference Management System (e.g., Zotero, EndNote)
 6. If more than one person is involved, decide a way to resolve undecided cases
 • Always keep track of the numbers of papers you get, and annotate the date on which you performed the search
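The bookkeeping recommended above (per-database counts, query, date) can be kept in a simple log file. A sketch, with the file name and entries being illustrative (the counts echo the SpringerLink and IEEE Xplore examples on the following slides):

```python
# Sketch: record, for each database, the date of the search, the
# exact query, how many studies it retrieved, and how many survived
# title/abstract screening. These numbers go into the PRISMA-style
# reporting of the study.
import csv
from datetime import date

search_log = [
    # (database, query, retrieved, kept after title/abstract screening)
    ("IEEE Xplore",  '("artificial intelligence") AND (testing)', 86, 30),
    ("SpringerLink", '("artificial intelligence") AND (testing)', 4097, 120),
]

with open("search_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "database", "query", "retrieved", "screened_in"])
    for db, query, retrieved, kept in search_log:
        writer.writerow([date.today().isoformat(), db, query, retrieved, kept])
```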
  • 36. Example: SpringerLink
 • 4,097 papers, and we did not even use synonyms! Let's refine
 • Restricting to the Software Engineering discipline and to Article and Conference content types: still a lot, and we may have lost some papers!
 • From a certain page onwards, results are not relevant: I have to screen titles and identify the page from which results become irrelevant; I have to justify and document the decision
  • 37. Example: IEEE Xplore
 • Only 86 papers! Here synonyms may help
 • Always remember to balance coverage with resources: for any question, in the end, few papers are REALLY relevant
  • 38. Protocol - Secondary Search Strategy
 • You have to show that you have done your best to identify ALL the studies, and the search must be replicable (as far as possible):
 • The review must be documented in sufficient detail for readers to be able to assess the thoroughness of the search
 • The search should be documented as it occurs, and changes noted and justified
 • The unfiltered search results should be saved and retained for possible reanalysis
 • Kitchenham's guidelines (Table 2, search process documentation) list what to record for each data source:
 • Digital library: name of database, search strategy for the database, date of search, years covered by the search
 • Journal hand searches: name of journal, years searched, any issues not searched
 • Conference proceedings: title of proceedings, name of conference (if different), title translation (if necessary), journal name (if published as part of a journal)
 • Efforts to identify unpublished studies: research groups and researchers contacted (names and contact details), research web sites searched (date and URL)
 • Other sources: date searched/contacted, URL, any specific conditions pertaining to the search
 • Researchers should specify their rationale for: the digital libraries to be searched (primary), and the journal and conference proceedings to be searched (secondary)
  • 39. Protocol - Inclusion/Exclusion Criteria
 • You DO NOT NEED to carefully inspect the full text to apply them, but the Pilot Search is crucial to define these criteria!
 • Example from https://doi.org/10.1145/3037755 (Table II, inclusion and exclusion criteria):
 • Inclusion I1: Study is internal to the software domain (we are only interested in consistency checking for software systems)
 • Inclusion I2: Study is about consistency checking related to software behavioral models/diagrams
 • Inclusion I3: Study comes from an acceptable source, such as a peer-reviewed scientific journal, conference, symposium, or workshop
 • Inclusion I4: Study reports issues, problems, or any type of experience concerning software behavioral model consistency
 • Inclusion I5: Study describes solid evidence on software behavioral model consistency checking, for instance by using rigorous analysis, experiments, case studies, experience reports, field studies, or simulation
 • Exclusion E1: Study is about hardware or other fields not directly related to software
 • Exclusion E2: Study is not clearly related to at least one aspect of the specified research questions
 • Exclusion E3: Study reports only syntactic or structural consistency checking of models/diagrams
 • Exclusion E4: Secondary literature reviews
 • Exclusion E5: Study does not present sufficient technical details of consistency checking related to software behavioral models (e.g., it has a different focus, such as version control, and insufficient detail)
 • Exclusion E6: Study did not undergo a peer-review process, such as non-reviewed journal, magazine, or conference papers, master theses, books, and doctoral dissertations (in order to ensure a minimum level of quality)
 • Exclusion E7: Study is not in English
 • Exclusion E8: Study is a shorter version of another study that appeared in a different source (the longer version will be included)
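Criteria like these can be applied mechanically during screening once each paper's metadata has been recorded. A sketch (the predicates and field names are illustrative, loosely echoing criteria I1, I3, E4, and E7 above): a paper is kept only if every inclusion criterion holds and no exclusion criterion fires.

```python
# Sketch: inclusion/exclusion criteria as predicates over the
# metadata collected during title/abstract screening.
def screen(paper, inclusion, exclusion):
    return (all(c(paper) for c in inclusion)
            and not any(c(paper) for c in exclusion))

inclusion = [
    lambda p: p["domain"] == "software",   # cf. I1
    lambda p: p["peer_reviewed"],          # cf. I3
]
exclusion = [
    lambda p: p["secondary_review"],       # cf. E4
    lambda p: p["language"] != "English",  # cf. E7
]

paper = {"domain": "software", "peer_reviewed": True,
         "secondary_review": False, "language": "English"}
print(screen(paper, inclusion, exclusion))  # True
```

Borderline cases that the predicates cannot settle are exactly the "undecided cases" to be resolved by discussion among reviewers.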
  • 40. Protocol - Quality Criteria
 • Quality assessment can be used to exclude papers, or just to score them (some evidence may be taken only from a subset of high-quality studies)
 • You need to define a checklist of quality criteria and apply it
 • The criteria to apply depend on the type of study considered: the quality of an Experiment is assessed differently from the quality of a Case Study
 • Extensive guidelines in https://go.aws/2TK4SN3
 • You NEED to inspect the full text
  • 41. Protocol - Quality Criteria
 • Generic criteria for empirical studies, taken without changes from the guidelines (cf. https://doi.org/10.1016/j.infsof.2010.12.006, Table 3); tailor them to your context!
 1. Is the paper based on research (or is it merely a "lessons learned" report based on expert opinion)?
 2. Is there a clear statement of the aims of the research?
 3. Is there an adequate description of the context in which the research was carried out?
 4. Was the research design appropriate to address the aims of the research?
 5. Was the recruitment strategy appropriate to the aims of the research?
 6. Was there a control group with which to compare treatments?
 7. Was the data collected in a way that addressed the research issue?
 8. Was the data analysis sufficiently rigorous?
 9. Has the relationship between researcher and participants been considered to an adequate degree?
 10. Is there a clear statement of findings?
 11. Is the study of value for research or practice?
 • Give one point for each question; exclude papers with fewer than X points (justify X: you should understand which is the right X after scoring the papers)
  • 42. Protocol - Quality Criteria: generic and differentiated
 cf. Bano and Zowghi, 2015, https://doi.org/10.1016/j.infsof.2014.06.011
 • The studies are evaluated on their reporting, as that is the only means of quality assessment available; the checklists combine Kitchenham's guidelines [1] with one developed by a team member (Muneera Bano) in another SLR [2][3]
 • Items are graded YES (1 point), PARTIAL (0.5), or NO (0); if a criterion is not applicable to a study, it is excluded from the evaluation of that study only; studies scoring less than 50% are excluded, as they do not provide basic information about their research methodology
 • Generic: Are the aims clearly stated? Are the study participants or observational units adequately described? Was the study design appropriate with respect to the research aim? Are the data collection methods adequately described? Are the statistical methods justified by the author? Are the statistical methods used to analyse the data properly described and referenced? Are negative findings presented? Are all the study questions answered? Do the researchers explain future implications?
 • Survey: Was the denominator (i.e., the population size) reported? Did the authors justify the sample size? Is the sample representative of the population to which the results will generalise? Have "drop outs" introduced bias in the results?
 • Experiment: Were treatments randomly allocated? If there is a control group, are participants similar to the treatment group participants in terms of variables that may affect study outcomes? Could lack of blinding introduce bias? Are the variables used in the study adequately measured (i.e., are they likely to be valid and reliable)?
 • Case study: Is the case study context defined? Are sufficient raw data presented to provide understanding of the case? Is the case study based on theory and linked to existing literature? Are ethical issues addressed properly (personal intentions, integrity issues, consent, review board approval)? Is a clear chain of evidence established from observations to conclusions?
 • Experience report: Is the focus of the study reported? Does the author report personal observation? Is there a link between data, interpretation, and conclusion? Does the study report multiple experiences?
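The scoring scheme above (YES = 1, PARTIAL = 0.5, NO = 0, non-applicable items dropped, exclude below 50%) can be sketched as:

```python
# Sketch of the quality scoring described above. "NA" items are
# removed from the denominator, matching the rule that non-applicable
# criteria are excluded for that particular study.
SCORES = {"YES": 1.0, "PARTIAL": 0.5, "NO": 0.0}

def quality_score(answers):
    """answers: one of 'YES'/'PARTIAL'/'NO'/'NA' per checklist item."""
    applicable = [a for a in answers if a != "NA"]
    return sum(SCORES[a] for a in applicable) / len(applicable)

def keep(answers, threshold=0.5):
    """Exclude studies scoring less than 50% (the paper's cut-off)."""
    return quality_score(answers) >= threshold

print(keep(["YES", "PARTIAL", "NO", "YES"]))       # True  (2.5/4 = 0.625)
print(keep(["NO", "NO", "PARTIAL", "NA", "YES"]))  # False (1.5/4 = 0.375)
```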
  • 43. Protocol • Primary Search Strategy and Search String • Secondary Search Strategy • Inclusion/Exclusion Criteria • Quality Criteria • Data Collection (Extraction) • Data Analysis (Synthesis)
  • 44. Protocol - Data Collection
 • Extract for each study: extractor and checker name, demographic data, evaluation information, classification-related data, content-related data
 • If you have many papers you may need to share the workload, and to double check!
 • You can use an Excel/Google Sheets file, or TAGS in Zotero
  • 45. Demographic Data • Study ID (You SHALL assign a unique ID to each selected study) • Authors, Title, Year, Keywords • Publication Type (Journal, Conference, Workshop) • Publication Venue (IEEE Transactions on SE, EMSE, etc.) • DOI (Digital Object Identifier)
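One row of the extraction form can combine the demographic fields above with the extractor/checker bookkeeping from the previous slide. A sketch (the field names, and the example values, are illustrative):

```python
# Sketch of one data-extraction record: demographic data plus
# extractor/checker names and a free-form classification dict
# to be filled with the study's classes (technology, testing type, ...).
from dataclasses import dataclass, field

@dataclass
class ExtractionRecord:
    study_id: str           # unique ID assigned to each selected study
    title: str
    authors: list
    year: int
    publication_type: str   # Journal / Conference / Workshop
    venue: str
    doi: str
    extractor: str          # who extracted the data
    checker: str            # who double-checked it
    classification: dict = field(default_factory=dict)

rec = ExtractionRecord(
    study_id="S01",
    title="AI planning for GUI test generation",  # illustrative
    authors=["A. Author"], year=2019,
    publication_type="Conference", venue="ICSE", doi="10.0000/xxxx",
    extractor="Reviewer 1", checker="Reviewer 2",
    classification={"technology": "AI Planner Approach",
                    "testing_type": "GUI testing"},
)
print(rec.study_id, rec.classification["technology"])
```

A list of such records exports naturally to the Excel/Google Sheets file mentioned above, one record per row.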
  • 46. Evaluation Information
 • Type of Study (you can also use the ABC classification for the types of study)
 • Example scheme for categorizing evaluation approaches, based on the work of Glass, Shaw, and Carmen et al. (cf. https://doi.org/10.1016/j.infsof.2010.12.006); in that review, "example application" was the most frequently used means of evaluation, followed by "experience reports" and "case studies":
 • RA – Rigorous analysis: rigorous derivation and proof, suited for formal models
 • CS – Case study: an empirical inquiry that investigates a contemporary phenomenon within its real-life context, when the boundaries between phenomenon and context are not clearly evident, and in which multiple sources of evidence are used
 • DC – Discussion: some qualitative, textual, opinion-oriented evaluation, e.g., compare and contrast, oral discussion of advantages and disadvantages
 • EA – Example application: the authors describe an application and provide an example to assist in the description, but the example is "used to validate" or "evaluate" only as far as the authors suggest
 • EP – Experience: the result has been used on real examples, but not in the form of case studies or controlled experiments; the evidence of its use is collected informally or formally
 • FE – Field experiment: controlled experiment performed in industry settings
 • LH – Laboratory experiment with human subjects: identification of precise relationships between variables in a designed, controlled environment, using human subjects and quantitative techniques
 • LS – Laboratory experiment with software subjects: a laboratory experiment comparing the performance of a newly proposed system with other existing systems
 • SI – Simulation: execution of a system with artificial data, using a model of the real world
 • Industrial Evaluation:
 • NO: not evaluated in industrial settings
 • LAB: industrial problem treated in laboratory settings
 • IND: industrial problem validated with industrial experts
 • DEV: development of an industrial product
 • Authorship:
 • A: only academic authors
 • I: only industrial authors
 • AI: academic and industrial authors
• 47. Classification-related Data
 • Normally related to the method and context of the paper (not the RESULTS)
 • Classes may be related to the usual list:
  • Human subjects: developers, users, testers, managers
  • Artefact: code, requirements, test cases
  • Methods or technologies: artificial intelligence, SCRUM development, questionnaires
  • Software engineering task: testing, elicitation, negotiation, maintenance
  • Focus: reliability, bugs, happiness, effect, effectiveness
  • Spatial Scope: research, large companies, small companies
  • Temporal Scope: short term, long term, from one phase/task to another
 These depend on your problem and your research questions
• 48. Classification-related Data (Example)
 • Topic of the SLR: Artificial intelligence technologies for test generation
 • Classes of Technologies: AI Planner Approach, Simulated Annealing, Tabu Searching, Genetic Algorithm, Ant Colony Optimization (ACO)
 • Classes of Systems: embedded, desktop software, website, mobile app, internet of things
 • Types of Testing: database testing, functional testing, GUI testing, application testing, usability testing, security testing, integration testing
• 49. Classification-related Data (Example, continued)
 • Already understanding what these different terms mean is a learning experience
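In practice, one such classification scheme can be kept as one structured record per reviewed paper (a spreadsheet row, a database entry). A minimal sketch in Python (the class name, fields and values here are hypothetical, not taken from a real extraction sheet):

```python
from dataclasses import dataclass, field

@dataclass
class ClassificationRecord:
    """Classification-related data for one reviewed paper (hypothetical schema)."""
    study_id: str                                       # e.g., "S12"
    human_subjects: list = field(default_factory=list)  # developers, testers, ...
    methods: list = field(default_factory=list)         # e.g., "genetic algorithm"
    se_task: str = ""                                   # testing, elicitation, ...
    focus: str = ""                                     # reliability, effectiveness, ...

# One record per paper: the equivalent of one row in the extraction spreadsheet.
r = ClassificationRecord(
    study_id="S12",
    human_subjects=["testers"],
    methods=["genetic algorithm"],
    se_task="testing",
    focus="effectiveness",
)
print(r.study_id, r.se_task)
```

Keeping the schema explicit like this makes it easy to check that every reviewer fills in the same classes for every paper.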
• 50. Content-related Data
 • Normally related to the FINDINGS/RESULTS of the reviewed paper
 • I have to write down some information extracted directly from the paper, depending on the RQs
 • It can be quantitative (e.g., performance obtained with a method)
 • It can be qualitative (e.g., extract excerpts of the text about qualitative properties, such as the observed impact of user involvement)
• 51. Protocol - Data Analysis
 • In principle you should apply quantitative meta-analysis methods to assess the quantitative data; however, quantitative results are rarely presented uniformly across papers in SE
 • Therefore, data analysis for all quantitative data (both content-related and others) is normally based on descriptive statistics
 • Data analysis for qualitative data is normally based on qualitative methods (e.g., thematic analysis), oriented to derive a theory
 We'll see this through an example case in the reporting part of this presentation
 Quantitative meta-analysis: combines the results of a number of different reports into one report to create a single, more precise estimate of an effect
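As an illustration of the kind of descriptive statistics meant here, a sketch in Python; the papers, techniques and F1 values are invented:

```python
from collections import Counter
from statistics import mean, median

# Hypothetical extraction sheet: one dict per reviewed paper.
papers = [
    {"id": "S1", "technique": "Genetic Algorithm", "f1": 0.81},
    {"id": "S2", "technique": "AI Planner", "f1": 0.74},
    {"id": "S3", "technique": "Genetic Algorithm", "f1": 0.88},
    {"id": "S4", "technique": "Tabu Search", "f1": None},  # no quantitative result reported
]

# Classification-related data: how many papers use each technique.
per_technique = Counter(p["technique"] for p in papers)
print(per_technique.most_common(1))

# Content-related quantitative data: summarise only the papers that report a value,
# with descriptive statistics rather than a meta-analytic pooled estimate.
scores = [p["f1"] for p in papers if p["f1"] is not None]
print(round(mean(scores), 2), median(scores))
```

Note the `None` entry: because results are reported non-uniformly across SE papers, the summary has to tolerate missing values rather than assume every study contributes to every statistic.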
• 53. Structure of a SLR Report (Table 8: Structure and Contents of Reports of Systematic Reviews, cf. Kitchenham, 2007, https://go.aws/2TK4SN3)
 • Title*: should be short but informative, and based on the question being asked; in journal papers, it should indicate that the study is a systematic review
 • Authorship*: when research is done collaboratively, criteria for determining both who should be credited as an author and the order of authors' names should be defined in advance; the contribution of workers not credited as authors should be noted in the Acknowledgements section
 • Executive summary or Structured Abstract*: allows readers to assess quickly the relevance, quality and generality of a systematic review
  • Context: the importance of the research questions addressed by the review
  • Objectives: the questions addressed by the systematic review
  • Methods: data sources, study selection, quality assessment and data extraction
  • Results: main findings, including any meta-analysis results and sensitivity analyses
  • Conclusions: implications for practice and future research
 • Background: justification of the need for the review; summary of previous reviews; description of the software engineering technique being investigated and its potential importance
 • Review questions: each review question should be specified; identify primary and secondary review questions (note: this section may be included in the background section)
 • Review Methods: data sources and search strategy; study selection; study quality assessment; data extraction; data synthesis. This should be based on the research protocol, and any changes to the original protocol should be reported
 • Included and excluded studies: inclusion and exclusion criteria; list of excluded studies with rationale for exclusion. Study inclusion and exclusion criteria can sometimes best be represented as a flow diagram, because studies will be excluded at different stages in the review for different reasons
• 54. Structure of a SLR Report (continued, cf. Kitchenham, 2007, https://go.aws/2TK4SN3)
 • Results
  • Findings: description of primary studies; results of any quantitative summaries; details of any meta-analysis; sensitivity analysis
  • Non-quantitative summaries should be provided to summarise each of the studies and presented in tabular form; quantitative summary results should be presented in tables and graphs
 • Discussion: a discussion of the validity of the evidence, considering bias in the systematic review, allows a reader to assess the reliance that may be placed on the collected evidence
  • Principal findings: these must correspond to the findings discussed in the results section
  • Strengths and weaknesses: strengths and weaknesses of the evidence included in the review; relation to other reviews, particularly considering any differences in quality and results
  • Meaning of findings: direction and magnitude of effect observed in summarised studies; applicability (generalisability) of the findings; make clear to what extent the results imply causality by discussing the level of evidence; discuss all benefits, adverse effects and risks; discuss variations in effects and their reasons (for example, are the treatment effects larger on larger projects?)
 • Conclusions
  • Practical implications for software development: what are the implications of the results for practitioners?
  • Recommendations: unanswered questions and implications for future research
 • Acknowledgements*: all persons who contributed to the research but did not fulfil authorship criteria
 • Conflict of Interest: any secondary interest on the part of the researchers (e.g., a financial interest in the technology being evaluated) should be declared
 • References and Appendices: appendices can be used to list studies included in and excluded from the study, to document search strategy details, and to list raw data from the included studies
 Never Forget this!
• 55. Reporting - Search
 [Table: Summary of secondary searches, step 4 (PS: primary searches, SS: secondary searches), comparing three earlier reviews by number of studies, time span covered, studies missing in our results, studies missing from their review, and overlapping studies]
 [Fig. 1: SLR execution process]
 • Draw a diagram of your search process, WITH NUMBER of PAPERS
 cf. Bano and Zowghi, 2015 https://doi.org/10.1016/j.infsof.2014.06.011
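The counts for such a diagram can be computed directly from the selection log; a sketch with invented study IDs, showing how a secondary (snowballing) search adds papers that the primary search missed:

```python
# Hypothetical study IDs surviving each stage of the search and selection process.
retrieved = {f"S{i}" for i in range(1, 21)}                      # 20 papers from database search
after_screening = {s for s in retrieved if int(s[1:]) % 2 == 0}  # 10 pass title/abstract screening
snowballing = {"S2", "S4", "S21", "S22"}                         # secondary (reference) search

final = after_screening | snowballing                            # papers in the final set

# Papers found only by the secondary search: evidence about search completeness.
print(len(retrieved), len(after_screening), len(final))
print(sorted(snowballing - retrieved))
```

Using sets of IDs rather than hand-counted totals makes each box of the diagram, and the overlap between primary and secondary searches, reproducible from the raw log.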
• 56. Reporting - Statistics
 [Fig. 4: Summary of characteristics of included studies (decade, research method, ERA rank)]
 [Table: most-cited included studies with citation counts, e.g., S39 Gould and Lewis 1985 (1417), S47 Hartwick and Barki 1994 (1191), S2 Baroudi, Olson and Ives 1986 (783), S34 Barki and Hartwick 1994 (697), S65 Jarvenpaa and Ives 1991 (589)]
 • Classification based on DEMOGRAPHIC and EVALUATION information (year, type of study, and quality rank: A*, A, etc.)
 cf. Bano and Zowghi, 2015 https://doi.org/10.1016/j.infsof.2014.06.011
• 57. Reporting - Statistics
 [Fig. 6: Relationship of user involvement and system success]
 [Table 5: Top ten journals in results of SLR]
 • Classification based on CONTENT-related data (Positive, Negative or Uncertain impact)
 cf. Bano and Zowghi, 2015 https://doi.org/10.1016/j.infsof.2014.06.011
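Once every paper carries such a content-related label, the statistics are a simple tally; a sketch with invented labels:

```python
from collections import Counter

# Hypothetical content-related label per study: observed impact of user involvement.
impact = {
    "S1": "positive", "S2": "positive", "S3": "uncertain",
    "S4": "negative", "S5": "positive", "S6": "uncertain",
}

# Count how many studies fall in each class; these numbers feed the chart.
counts = Counter(impact.values())
print(counts["positive"], counts["negative"], counts["uncertain"])
```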
• 58. Reporting - Theory
 [Table 6: Benefits of user involvement; for each benefit: description, supporting studies, frequency (N = 87)]
 • Benefits from psychological perspective
  • User system satisfaction: users will favor a system more if they are involved in its development and feel satisfied with using it (S3, S13, S16, S20, S21, S27, S33, S34, S35, S37, S38, S45, S46, S52, S59, S63, S65, S67, S68, S71, S83, S84; freq 23)
  • User system acceptance: users approve that the system is developed according to their workplace needs and requirements (S4, S11, S13, S38, S40, S43, S46, S64, S87; freq 9)
  • Facilitating change: involved users will not resist using a new system in their work environment (S5, S12, S69, S71, S72; freq 6)
  • Better user's attitude towards system: involved users will show a positive attitude when using the system (S5, S12, S69, S71, S72; freq 5)
  • Increasing perceived relevance of the system to users: involved users consider themselves more informed about the system and think that the system is relevant (S12; freq 1)
  • Increasing user motivation: involved users will be more motivated to use the system (S16; freq 1)
  • Increasing customer loyalty: involved users will have a higher degree of trust in the development team (S21; freq 1)
  • Assisting in maintaining a long-term relationship with users: involved users will have more interaction with the development team, which helps maintain long-term relationships between users/customers and the development team (S21; freq 1)
 • Benefits from managerial perspective
  • Better communication: user involvement will lead to an increase in interaction between users and the development team and will facilitate more effective communication (S10, S12, S25, S55, S58, S77; freq 6)
  • Improved management practice: by involving the users in the development, the management will face less resistance, by giving the users the dignity of knowing that they are important for the system (S16, S29; freq 2)
  • Developing realistic expectations: users will have a more informed idea of the features of the system being developed (S32, S56; freq 2)
  • Reducing cost of the system: decreasing the risk of too many changes after implementation by involving users in the project (S43, S52; freq 2)
  • Helping in conflict resolution: user involvement can help resolve disagreements that may arise between users and development teams (S32; freq 1)
 • Benefits from methodological perspective
  • Better understanding of user requirements: eliciting more accurate requirements from the users of the systems (S8, S10, S14, S16, S21, S22, S37, S38, S41, S43, S45, D46, S50, S57, S64, S70, S71, S75, S79, S83; freq 20)
  • Improving quality of resultant application: by involving the users, non-functional aspects of the system (functional suitability, reliability, usability, performance, efficiency, compatibility, security, maintainability and portability) can be elicited which may not have been expressed explicitly, hence improving the quality of the system (S11, S26, S27, S36, S37, S38, S40, S52, S57, S68, S70, S71, S77, S79, S83, S87; freq 16)
  • Improving quality of design decisions: based on the users' level of understanding, skills and workplace environment, the decisions for the design of the system will be better informed (S6, S9, S11, S40, S41, S46, S52, S64, S65, S69, S77, S83; freq 12)
  • Helping in overcoming implementation failures: when users are part of the testing, implementation and installation of the system, this can reduce the number of failures (S31; freq 1)
 • Theory based on CONTENT-related data
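A table like this (theme, supporting studies, frequency) can be assembled mechanically from the qualitative codes produced during data extraction; a sketch with invented codes:

```python
from collections import defaultdict

# Hypothetical qualitative coding: (study_id, theme) pairs from data extraction.
codes = [
    ("S3", "user satisfaction"), ("S13", "user satisfaction"),
    ("S4", "system acceptance"), ("S13", "system acceptance"),
    ("S5", "facilitating change"), ("S13", "user satisfaction"),  # same study coded twice
]

# Group studies by theme; a set removes duplicate codings of the same study.
themes = defaultdict(set)
for study, theme in codes:
    themes[theme].add(study)

# Each line corresponds to one row of the theory table: theme, studies, frequency.
for theme, studies in sorted(themes.items()):
    print(f"{theme}: {sorted(studies)} (freq={len(studies)})")
```

Deriving the frequencies from the raw codes, rather than counting by hand, keeps the theory table consistent with the extracted data when codes are later revised.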
• 59. Reporting - Discussion
 • Your SLR must be COMPLETE (all papers have been considered)
  • Provide arguments on the completeness of the study
 • Your SLR must be INFORMATIVE and contribute to the body of knowledge:
  • What do we know that we did not know before (for each RQ considered)? Compare with existing studies and highlight what the contribution is in terms of knowledge
 • Your SLR must be USEFUL (implications for practice and research)
  • Who can profit from this SLR?
  • Is research in the field exhaustive, and can the results be transferred to practice?
  • How can the results be used by researchers and practitioners?
   • e.g., now that I know that user involvement has certain benefits and certain drawbacks, when should I involve users?
   • now that I know that certain AI methods are never used for testing, what should I do? Researchers should explore those methods; practitioners should apply more consolidated ones
• 60. Reporting - Threats To Validity
 • Validity of Literature Search: arguments about the effort that you made to identify ALL the studies
 • Validity of Study Selection: arguments on how you reduced human bias and reached some form of objectivity in the initial screening
 • Validity of Data Collection and Analysis: arguments on the soundness of your classification schemes (based on previous literature), how objectivity was achieved in qualitative data extraction (reviewer and checker roles), and how your data were stored and retrieved
 • Validity of Data Synthesis and Visualisation: arguments that your statistics are correct, illustrative, appropriately respond to the RQs, and are correctly derived from the data
 WARNING: There is a lot of confusion in the threats to validity for SLRs. This is a reasonable list, based on work from Liping Zhao, Manchester, UK
• 62. In ONE Slide
 • Follow the same process as for SLRs, BUT…
 • Quality criteria are not strictly necessary, and do not need to be too detailed (you can discard low-quality papers based on the CORE Ranking of the venue, http://www.core.edu.au/conference-portal)
 • Data collection does not consider content-related data, only classification, evaluation and demographic data
 • Data analysis may include a theory, but just in terms of classes of papers and relations (i.e., a Descriptive Theory; for example: most of the papers on AI and testing are not evaluated in industrial contexts, because few "case studies" are identified in the literature)
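The coarse quality screen by venue rank can be sketched as a simple filter; the ranks below are invented placeholders, not actual CORE data:

```python
# Hypothetical CORE-style ranks for the venues of the retrieved papers.
venue_rank = {"ICSE": "A*", "RE": "A", "WorkshopX": "C"}
accepted_ranks = {"A*", "A", "B"}

papers = [
    {"id": "S1", "venue": "ICSE"},
    {"id": "S2", "venue": "WorkshopX"},
    {"id": "S3", "venue": "RE"},
]

# Keep only papers published in venues of acceptable rank;
# unknown venues (not in the table) are discarded as well.
kept = [p["id"] for p in papers
        if venue_rank.get(p["venue"]) in accepted_ranks]
print(kept)
```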
• 63. Summary: SLR and SMS
 • SLRs and SMSs are normally carried out by PhD students at the beginning of their PhD career
 • The goal is to learn a research field, learn to evaluate publications, learn to be systematic, learn the pain of research, learn that YOU KNOW NOTHING
 • …and understand if you want to go ahead
 • SLRs and SMSs are often useful for other researchers to motivate their studies (research gap), so they can lead to a lot of citations
 • …but if you publish a SLR or a SMS, everyone will think that you are just searching for success (citations) and not knowledge (but you* will get both)
 *and your supervisor(s)