The FITTEST project is funded by the European Commission (FP7-ICT-257574) 	

Tanja E.J. Vos
Centro de Investigación en Métodos de Producción de Software
Universidad Politecnica de Valencia
Valencia, Spain
tvos@pros.upv.es
European initiatives where academia and industry get together.
Overview

• EU projects (with SBST) that I have been coordinating:
  – EvoTest (2006-2009)
  – FITTEST (2010-2013)
• What it means to coordinate them and how they are structured
• How we evaluate their results through academia-industry projects
EvoTest

• Evolutionary Testing for Complex Systems
• September 2006 – September 2009
• Total costs: 4.300.000 euros
• Partners:
  – Universidad Politecnica de Valencia (Spain)
  – University College London (United Kingdom)
  – Daimler Chrysler (Germany)
  – Berner & Mattner (Germany)
  – Fraunhofer FIRST (Germany)
  – Motorola (UK)
  – Rila Solutions (Bulgaria)
• Website does not work anymore
EvoTest objectives/results

• Apply Evolutionary Search Based Testing techniques to solve testing problems from a wide spectrum of complex real-world systems in an industrial context.
• Improve the power of evolutionary algorithms for searching important test scenarios, hybridising with other techniques:
  – other general-purpose search techniques,
  – other advanced software engineering techniques, such as slicing and program transformation.
• An extensible and open Automated Evolutionary Testing Architecture and Framework will be developed. This will provide general components and interfaces to facilitate the automatic generation, execution, monitoring and evaluation of effective test scenarios.
FITTEST

• Future Internet Testing
• September 2010 – December 2013
• Total costs: 5.845.000 euros
• Partners:
  – Universidad Politecnica de Valencia (Spain)
  – University College London (United Kingdom)
  – Berner & Mattner (Germany)
  – IBM (Israel)
  – Fondazione Bruno Kessler (Italy)
  – Universiteit Utrecht (The Netherlands)
  – Softeam (France)
• http://www.pros.upv.es/fittest/
FITTEST objectives/results

• Future Internet Applications
  – Characterized by an extremely high level of dynamism
  – Adaptation to usage context (context awareness)
  – Dynamic discovery and composition of services
  – Etc.
• Testing of these applications gets extremely important
  – Society depends more and more on them
  – Critical activities such as social services, learning, finance, business.
• Traditional testing is not enough
  – Testwares are fixed
• Continuous testing is needed
  – Testwares that automatically adapt to the dynamic behavior of the Future Internet application
  – This is the objective of FITTEST
How does it work?

[Diagram: the FITTEST continuous testing cycle. LOGGING: instrument the SUT, run the SUT, collect & preprocess logs. TEST-WARE GENERATION: analyse & infer models, analyse & infer oracles, generate test cases, automate test cases. TEST EXECUTION: execute test cases. TEST EVALUATION: evaluate test cases against the test results, using free oracles, model-based oracles, properties-based oracles, pattern-based oracles and human oracles. Optional manual extensions come from domain experts and end-users: manually defined oracles and Domain Input Specifications.]
LOGGING
1. Run the target System Under Test (SUT)
2. Collect the logs it generates
This can be done by:
• real usage by end users of the application in the production environment
• test case execution in the test environment.
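To make this step concrete, here is a minimal sketch of what event instrumentation could look like; the decorator name, the JSON-lines log format and the file path are illustrative assumptions, not the FITTEST logging infrastructure:

```python
import functools
import json
import time

LOG_PATH = "sut_events.log"  # hypothetical log file

def log_event(handler):
    """Append one JSON line per GUI/service event, then run the handler."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        record = {"ts": time.time(), "event": handler.__name__,
                  "args": repr(args), "kwargs": repr(kwargs)}
        with open(LOG_PATH, "a") as f:
            f.write(json.dumps(record) + "\n")
        return handler(*args, **kwargs)
    return wrapper

@log_event
def add_to_cart(item_id):
    ...  # application logic under test
```

The same log format can be produced both by real usage in production and by test case execution in the test environment.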
How does it work?

GENERATION
1. Analyse the logs
2. Generate different testwares:
   • Models
   • Domain Input Specification
   • Oracles
3. Use these to generate and automate a test suite consisting of:
   • Abstract test cases
   • Concrete test cases
   • Pass/Fail evaluation criteria
How does it work?

EXECUTION
Execute the test cases and start a new test cycle for continuous testing and adaptation of the testwares!
What does it mean to be an EU project coordinator

• Understand what the project is about, what needs to be done and what is most important.
• Do NOT be afraid to get your hands dirty.
• Do NOT assume people are working as hard on the project as you are (or are as enthusiastic as you ;-)
• Do NOT assume that the people that ARE responsible for some tasks TAKE this responsibility
• Get CC-ed in all emails and deal with it
• Stalk people (email, sms, whatsapp, skype, voicemail messages)
• Be patient when explaining the same things over and over again
EU project structures

• We have: Work Packages (WP)
• These are composed of: Tasks
• These result in: Deliverables

Typical WPs:
– WP: Project Management
– WP: Exploitation and Dissemination
– WP: Do some research
– WP: Do more research
– WP: And some more
– WP: And more……..
– WP: Integrate it all together in a superduper solution that industry needs
– WP: Evaluate Your Results through Case Studies
EU project
How to evaluate your results

• You need to do studies that evaluate the resulting testing tools/techniques within a real industrial environment
WE NEED MORE THAN …

Glenford Myers 1979!
The Triangle Problem: test a program which returns the type of a triangle based on the lengths of its 3 sides.
[Diagram: triangle with sides a, b, c and angles α, β, γ]
For evaluation of testing tools we need more!
WE NEED….

Empirical studies with:
– Real people
– Real faults
– Real systems
– Real Testing Environments
– Real Testing Processes
What are empirical studies

• [Wikipedia] Empirical research is a way of gaining knowledge by means of direct and indirect observation or experience.
• [PPV00] An empirical study is really just a test that compares what we believe to what we observe. Such tests, when wisely constructed and executed, play a fundamental role in software engineering, helping us understand how and why things work.
• Collecting data:
  – Quantitative data -> numeric data
  – Qualitative data -> observations, interviews, opinions, diaries, etc.
• Different kinds are distinguished in the literature [WRH+00, RH09]:
  – Controlled experiments
  – Surveys
  – Case Studies
  – Action research
Controlled experiments

What
Experimental investigation of a hypothesis in a laboratory setting, in which conditions are set up to isolate the variables of interest ("independent variables") and test how they affect certain measurable outcomes (the "dependent variables").

Good for
– Quantitative analysis of the benefits of a testing tool or technique
– We can use methods showing statistical significance
– We can demonstrate how scientific we are! [EA06]

Disadvantages
– Limited confidence that the laboratory set-up reflects the real situation
– Ignores contextual factors (e.g. social/organizational/political factors)
– Extremely time-consuming

See: [BSH86, CWH05, PPV00, Ple95]
Surveys

What
Collecting information to describe, compare or explain knowledge, attitudes and behaviour over large populations using interviews or questionnaires.

Good for
– Quantitative and qualitative data
– Investigating the nature of a large population
– Testing theories where there is little control over the variables

Disadvantages
– Difficulties of sampling and selection of participants
– Collected information tends to be subjective opinion

See: [PK01-03]
Case Studies

What
A technique for detailed exploratory investigations that attempt to understand and explain phenomena or test theories within their context.

Good for
– Quantitative and qualitative data
– Investigating the capability of a tool within a specific real context
– Gaining insights into chains of cause and effect
– Testing theories in complex settings where there is little control over the variables (Companies!)

Disadvantages
– Hard to find good appropriate case studies
– Hard to quantify findings
– Hard to build generalizations (only context)

See: [RH09, EA06, Fly06]
Action Research

What
Research initiated to solve an immediate problem (or a reflective process of progressive problem solving), involving actively participating in an organization's change situation whilst conducting research.

Good for
– Quantitative and qualitative data
– When the goal is solving a problem.
– When the goal of the study is change.

Disadvantages
– Hard to quantify findings
– Hard to build generalizations (only context)

See: [RH09, EA06, Fly06, Rob02]
EU project
How to evaluate your results

• You need to do empirical studies that evaluate the resulting testing tools/techniques within a real industrial environment
• The empirical study that best fits our purposes is the Case Study:
  – Evaluate the capability of our testing techniques/tools
  – In a real industrial context
  – Comparing to current testing practice
Case studies: not only for justifying research projects

• We need to apply our results in industry to understand the problems they have and whether we are going in the right direction to solve them.
• There is a real need in the software engineering industry for general guidelines on what testing techniques and tools to use for different testing objectives, and on how usable these techniques are.
• To date, these guidelines do not exist.
• We need a body of documented experiences and knowledge from which the needed guidelines can be extracted.
• With these guidelines, testing practitioners might make informed decisions about which techniques to use, and estimate the time/effort that is needed.
The challenge

TO DO THIS WE HAVE TO:
• Perform more evaluative empirical case studies in industrial environments.
• Carry out these studies by following the same methodology, to enhance the comparison among the testing techniques and tools.
• Involve realistic systems, environments and subjects (and not toy-programs and students, as is the case in most current work).
• Do the studies thoroughly, to ensure that any benefit identified during the evaluation study is clearly derived from the testing technique studied, and also to ensure that different studies can be compared.

FOR THIS WE NEED:
• A general methodological evaluation framework that can simplify the design of case studies for comparing software testing techniques and make the results more precise, reliable, and easy to compare.

GOAL: to create a body of evidence consisting of evaluative studies of testing techniques and tools that can be used to understand the needs of industry and derive general guidelines about their usability and applicability in industry.
The idea

[Diagram: a General Framework for Evaluating Testing Techniques and Tools is instantiated into primary case studies (CS1, CS2, …, CSn) for each testing technique or tool (TT1, TT2, …, TTn). Together these studies form a Body of Evidence, on which a secondary study (EBSE) can obtain answers to general questions about adopting different software testing techniques and/or tools.]
Very brief…what is EBSE

Evidence Based Software Engineering
• The essence of the evidence-based paradigm is that of systematically collecting and analysing all of the available empirical data about a given phenomenon in order to obtain a much wider and more complete perspective than would be obtained from an individual study, not least because each study takes place within a particular context and involves a specific set of participants.
• The core tool of the evidence-based paradigm is the Systematic Literature Review (SLR):
  – Secondary study
  – Gathering and analysing primary studies.
• See: http://www.dur.ac.uk/ebse/about.php
Empirical research with industry = difficult

• Not "rocket science"-difficult
Empirical research with industry = difficult

• But "communication science"-difficult
"communication science"-difficult

[Cartoon: a researcher ("Only takes 1 hour", "Not so long") and a practitioner in a company ("Not so long", "5 minutes"), each with their own idea of how long things take.]
Communication is extremely difficult
Examples…..

Academia:
• Wants to empirically evaluate T
• What techniques/tools can we compare with?
• Why don't you know that?
• That does not take so much time!
• Finding real faults would be great!
• Can we then inject faults?
• How many people can use it?
• Is there historical data?
• But you do have that information?

Industry:
• Wants to execute and use T to see what happens.
• We use intuition!
• You want me to know all that?
• That much time!?
• We cannot give this information.
• Not artificial ones, we really need to know if this would work for real faults.
• We can assign 1 person.
• That is confidential.
• Oh.., I thought you did not need that.
With the objective to improve and reduce some barriers

• Use a general methodological framework:
  – as a vehicle of communication
  – to simplify the design
  – to make sure that studies can be compared and aggregated
Existing work

• By Lott & Rombach, Eldh, Basili, Do et al., Kitchenham et al.
• They describe organizational frameworks, i.e.:
  – General steps
  – Warnings when designing
  – Confounding factors that should be minimized
• We intended to define a methodological framework that defines how to evaluate software testing techniques, i.e.:
  – The research questions that can be posed
  – The variables that can be measured
  – Etc.
The methodological framework

• Imagine a company C wants to evaluate a testing technique or tool T to see whether it is useful and worthwhile to incorporate.
• Components of the Framework (each case study will be an instantiation):
  – Objectives: effectiveness, efficiency, satisfaction
  – Cases or treatments (= the testing techniques/tools)
  – The subjects (= practitioners that will do the study)
  – The objects or pilot projects: selection criteria
  – The variables and metrics: which data to collect?
  – Protocol that defines how to execute and collect data
  – How to analyse the data
  – Threats to validity
  – Toolbox
Definition of the framework – research questions

• RQ1: How does T contribute to the effectiveness of testing when it is being used in real testing environments of C and compared to the current practices of C?
• RQ2: How does T contribute to the efficiency of testing when it is being used in real testing environments of C and compared to the current practices of C?
• RQ3: How satisfied (subjective satisfaction) are testing practitioners of C during the learning, installing, configuring and usage of T when used in real testing environments?
Definition of the framework – the cases

• Reused a taxonomy from Vegas and Basili, adapted to software testing tools and augmented with results from Tonella.
• The case, i.e. the testing technique or tool, should be described by:
  – Prerequisites: type, life-cycle, environment (platform and languages), scalability, input, knowledge needed, experience needed.
  – Results: output, completeness, effectiveness, defect types, number of generated test cases.
  – Operation: interaction modes, user guidance, maturity, etc.
  – Obtaining the tool: license, cost, support.
Definition of the framework – subjects

• Workers of C that normally use the techniques and tools with which T is being compared.
Definition of the framework – the objects/pilot projects

• In order to see how we can compare and what type of study we will do, we need answers to the following questions:
A. Will we have access to a system with known faults? What information is present about these faults?
B. Are we allowed/able to inject faults into the system?
C. Does your company gather data from projects as standard practice? What data is this? Can this data be made available for comparison? Do you have a company baseline? Do we have access to a sister project?
D. Does company C have enough time and resources to execute various rounds of tests? Or, more concretely:
   • Is company C willing to make a new test suite TSna with some technique/tool Ta already used in the company C?
   • Is company C willing to make a new test suite TSnn with some technique/tool Tn that is also new to company C?
   • Can we use an existing test suite TSe that we can use to compare? (Do we know the techniques that were used to create that test suite, and how much time it took?)
Definition of the framework – Protocol

[Decision-tree diagram. The questions — "Can we inject faults?" (if yes: training on T, inject faults; then do T to make TST and collect data), "Can we compare with an existing test suite from company C (i.e. TSC)?", "Can company C make another test suite TSN for comparison using Tknown or Tunknown?" (if yes: do Tknown or Tunknown to make TSN; collect data), "Do we have a company baseline?", and "Do we have a version with known faults?" — route each study to one of the scenarios 1-7.]
Definition of the framework – Scenarios

• Remember:
  – Does company C have enough time and resources to execute various rounds of tests? Or, more concretely:
    • Is company C willing to make a new test suite TSna with some technique/tool Ta already used in the company C?
    • Is company C willing to make a new test suite TSnn with some technique/tool Tn that is also new to company C?
    • Can we use an existing test suite TSe that we can use to compare? (Do we know the techniques that were used to create that test suite, and how much time it took?)
• Scenario 1 (qualitative assessment only) (Qualitative Effects Analysis)
• Scenario 2 (Scenario 1 / quantitative analysis based on company baseline)
• Scenario 3 ((Scenario 1 / Scenario 2) / quantitative analysis of FDR)
• Scenario 4 ((Scenario 1 / Scenario 2) / quantitative comparison of T and TSe)
• Scenario 5 (Scenario 4 / FDR of T and TSe)
• Scenario 6 ((Scenario 1 / Scenario 2) / quantitative comparison of T and (Ta or Tn))
• Scenario 7 (Scenario 6 / FDR of T and (Ta or Tn))
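Read as executable logic, the protocol's decision tree (previous slide) routes each study to one of these scenarios. The sketch below is an illustrative interpretation; the boolean names and the exact mapping are assumptions, not a definitive part of the framework:

```python
def select_scenario(can_inject_faults: bool,
                    existing_suite_tse: bool,
                    can_build_new_suite: bool,
                    has_baseline: bool) -> int:
    """Illustrative mapping from the protocol's answers to scenarios 1-7.

    FDR (fault detection rate) can only be measured when faults can be
    injected, which is what separates each pair of scenarios.
    """
    if existing_suite_tse:          # compare T against the existing TSe
        return 5 if can_inject_faults else 4
    if can_build_new_suite:         # compare T against a new suite (Ta or Tn)
        return 7 if can_inject_faults else 6
    if has_baseline:                # no comparison suite, only a baseline
        return 3 if can_inject_faults else 2
    return 1                        # qualitative assessment only
```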
If we can inject faults, take care that

1. The artificially seeded faults are similar to real faults that naturally occur in real programs due to mistakes made by developers.
   – To identify realistic fault types, a history-based approach can be used: "real" faults can be fetched from the bug tracking system, making sure that these reported faults are representative of the faults that are introduced by developers during implementation.
2. The faults should be injected in code that is covered by an adequate number of test cases.
   – E.g., they may be seeded in code that is executed by more than 20 percent and less than 80 percent of the test cases.
3. The faults should be injected "fairly," i.e., an adequate number of instances of each fault type is seeded.
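Criterion 2 is straightforward to operationalise once per-location coverage counts are available; a minimal sketch (the data layout is an assumption for illustration):

```python
def seeding_candidates(coverage, n_test_cases, lo=0.20, hi=0.80):
    """Return code locations covered by 20-80% of the test cases.

    coverage maps a location, e.g. (file, line), to the number of
    test cases that execute it.
    """
    return [loc for loc, hits in coverage.items()
            if lo * n_test_cases < hits < hi * n_test_cases]

# Example with 10 test cases: only the location hit by 5 of them qualifies.
cov = {("shop.py", 10): 1, ("shop.py", 42): 5, ("shop.py", 77): 9}
print(seeding_candidates(cov, 10))  # [('shop.py', 42)]
```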
Definition of the framework – data to collect

• Effectiveness
  – Number of test cases designed or generated.
  – How many invalid test cases are generated.
  – How many repeated test cases are generated.
  – Number of failures observed.
  – Number of faults found.
  – Number of false positives (the test is marked as Failed when the functionality is working).
  – Number of false negatives (the test is marked as Passed when the functionality is not working).
  – Type and cause of the faults that were found.
  – Estimated (or, when possible, measured) coverage reached.
• Efficiency
• Subjective satisfaction
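To make the list concrete, the effectiveness variables could be captured in a simple record; the field names in this sketch are illustrative, not prescribed by the framework:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EffectivenessData:
    """Illustrative record of the effectiveness variables for one study."""
    test_cases: int = 0            # designed or generated
    invalid_test_cases: int = 0
    repeated_test_cases: int = 0
    failures_observed: int = 0
    faults_found: int = 0
    false_positives: int = 0       # marked Failed, functionality works
    false_negatives: int = 0       # marked Passed, functionality broken
    fault_types: list = field(default_factory=list)
    coverage_reached: Optional[float] = None  # estimated or measured
```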
Definition of the framework – data to collect

• Effectiveness
• Efficiency
  – Time needed to learn the testing method.
  – Time needed to design or generate the test cases.
  – Time needed to set up the testing infrastructure (install, configure, develop test drivers, etc.) (quantitative). (Note: if software has to be developed, or other major configuration/installation efforts are needed, it might be a good idea to maintain working diaries.)
  – Time needed to test the system and observe the failures (i.e. planning, implementation and execution), in hours (quantitative).
  – Time needed to identify the fault type and cause for each observed failure (i.e. time to isolate) (quantitative).
• Subjective satisfaction
Definition of the framework – data to collect

• Effectiveness
• Efficiency
• Subjective satisfaction: a mixture of methods
  – SUS score (10-question questionnaire with a 5-point Likert scale and a total score)
  – 5 reactions (through reaction cards) that will be used to create a word cloud and Venn diagrams
  – Emotional face reactions during semi-structured interviews (faces will be evaluated on a Likert scale from "not at all like this" to "very much like this")
  – Subjective opinions about the tool.
System Usability Scale (SUS)
© Digital Equipment Corporation, 1986
www.usability.serco.com/trump/documents/Suschapt.doc

(Each item is rated from Strongly disagree (1) to Strongly agree (5).)
1. I think that I would like to use this system frequently
2. I found the system unnecessarily complex
3. I thought the system was easy to use
4. I think that I would need the support of a technical person to be able to use this system
5. I found the various functions in this system were well integrated
6. I thought there was too much inconsistency in this system
7. I would imagine that most people would learn to use this system very quickly
8. I found the system very cumbersome to use
9. I felt very confident using the system
10. I needed to learn a lot of things before I could get going with this system
Why SUS

• Studies (e.g. [TS04, BKM08]) have shown that this simple questionnaire gives the most reliable results.
• SUS is technology agnostic, making it flexible enough to assess a wide range of interface technologies.
• The survey is relatively quick and easy to use by both study participants and administrators.
• The survey provides a single score on a scale that is easily understood by the wide range of people (from project managers to computer programmers) who are typically involved in the development of products and services and who may have little or no experience in human factors and usability.
• The survey is nonproprietary, making it a cost effective tool as well.
SUS Scoring

• SUS yields a single number representing a composite measure of the overall usability of the system being studied. Note that scores for individual items are not meaningful on their own.
• To calculate the SUS score, first sum the score contributions from each item.
  – Each item's score contribution will range from 0 to 4.
  – For items 1, 3, 5, 7 and 9, the score contribution is the scale position minus 1.
  – For items 2, 4, 6, 8 and 10, the contribution is 5 minus the scale position.
  – Multiply the sum of the scores by 2.5 to obtain the overall value of SUS.
• SUS scores have a range of 0 to 100.
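The scoring rule translates directly into code; a minimal sketch:

```python
def sus_score(responses):
    """Compute the SUS score from ten scale positions (each 1-5).

    responses[0] is item 1, ..., responses[9] is item 10.
    """
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = 0
    for item, pos in enumerate(responses, start=1):
        # Odd items contribute (position - 1), even items (5 - position).
        total += (pos - 1) if item % 2 == 1 else (5 - pos)
    return total * 2.5

print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```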
SUS is not enough…

• In a literature review of 180 published usability studies, Hornbaek [Horn06] concludes that measures of satisfaction should be extended beyond questionnaires.
• So we add more….
Reaction Cards

Developed by and © 2002 Microsoft Corporation. All rights reserved.
Emotional face reactions

• The idea is to elicit feedback about the product, particularly emotions that arose for the participants while talking about the product (e.g. frustration, happiness).
• We will videotape the users when they respond to the following two questions during a semi-structured interview:
  – Would you recommend this tool to other colleagues?
    • If not, why?
    • If yes, what arguments would you use?
  – Do you think you can persuade your management to invest in a tool like this?
    • If not, why?
    • If yes, what arguments would you use?
[Rating scale: "Not at all like this" 1 2 3 4 5 6 7 "Very much like this"]
Analysing and interpreting the data

• Depends on the amount of data we have.
• If we only have 1 value for each variable, no analysis techniques are available and we just present and interpret the data.
• If we have sets of values for a variable, then we need to use statistical methods:
  – Descriptive statistics
  – Statistical tests (or significance testing). In statistics, a result is called statistically significant if it has been predicted as unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level.
• Evaluating SBST tools, we will always have sets of values for most of the variables, to deal with randomness.
Descriptive statistics

• Mean, median, mode, standard deviation, frequency, correlation, etc.
• Graphical visualisation: scatter plot, box plot, histogram, pie charts
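For example, given two sets of measurements (say, faults found over repeated runs of two tools), the descriptive side could look like this; the numbers are made-up placeholders, purely for illustration:

```python
import statistics as st

# Hypothetical fault counts over 10 runs of two tools (illustrative only).
tool_a = [3, 4, 2, 5, 3, 4, 4, 2, 3, 5]
tool_b = [5, 6, 4, 7, 5, 6, 5, 4, 6, 7]

for name, xs in [("A", tool_a), ("B", tool_b)]:
    print(name, "mean=%.1f" % st.mean(xs),
          "median=%.1f" % st.median(xs),
          "stdev=%.2f" % st.stdev(xs))
```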
Statistical test

http://science.leidenuniv.nl/index.php/ibl/pep/people/Tom_de_Jong/Teaching
Most important is finding the right method
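Because SBST tools are randomized, each variable yields a set of values per treatment, and a non-parametric test is a common way to compare two such samples. The sketch below uses SciPy's Mann-Whitney U test as one reasonable choice, not as the method the slides prescribe:

```python
from scipy.stats import mannwhitneyu

# Reusing the hypothetical fault counts from the previous sketch.
tool_a = [3, 4, 2, 5, 3, 4, 4, 2, 3, 5]
tool_b = [5, 6, 4, 7, 5, 6, 5, 4, 6, 7]

stat, p = mannwhitneyu(tool_a, tool_b, alternative="two-sided")
alpha = 0.05  # pre-determined significance level
print(f"U={stat}, p={p:.4f}",
      "-> significant" if p < alpha else "-> not significant")
```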
  
Definition of the framework – Threats

• Threats to validity (of confounding factors) have to be minimized.
• These are the effects or situations that might jeopardize the validity of your results….
• Those that cannot be prevented have to be reported.
• When working with people we have to consider many sociological effects. Let us just look at a couple of well-known ones to give you an idea….
  – The learning curve effect
  – The Hawthorne effect
  – The Placebo effect
  – Etc….
The learning curve effect

• When using new methods/tools, people gain familiarity with their application over time (= learning curve).
  – Initially they are likely to use them more ineffectively than they might after a period of familiarisation/learning.
• Thus the learning curve effect will tend to counteract any positive effects inherent in the new method/tool.
• In the context of evaluating methods/tools there are two basic strategies to minimise the learning curve effect (they are not mutually exclusive):
  1. Provide appropriate training before undertaking an evaluation exercise;
  2. Separate pilot projects aimed at gaining experience of using a method/tool from pilot projects that are part of an evaluation exercise.
The Hawthorne effect

• When an evaluation is performed, staff working on the pilot project(s) may have the perception that they are working under more management scrutiny than normal, and may therefore work more conscientiously.
• The name comes from the Hawthorne Works factory (lights low or high?).
• The Hawthorne effect would tend to exaggerate positive effects inherent in a new method/tool.
• A strategy to minimise the Hawthorne effect is to ensure that a similar level of management scrutiny is applied to control projects in your case study (i.e. project(s) using the current method/tool) as is applied to the projects that are using the new method/tool.
The placebo effect

• In medical research, patients who are deliberately given ineffectual treatments recover if they believe that the treatment will cure them.
• Likewise, a software engineer who believes that adopting some practice (e.g., wearing a pink t-shirt) will improve the reliability of his code may succeed in producing more reliable code.
• Such a result could not be generalised.
• In medicine, the placebo effect is minimized by not informing the subjects.
• This cannot be done in the context of testing tool evaluations.
• When evaluating methods and tools, the best you can do is to:
  – Assign staff to pilot projects using your normal project staffing methods, and hope that the actual selection of staff includes the normal mix of enthusiasts, cynics and no-hopers that normally comprise your project teams.
  – Make a special effort to avoid staffing pilot projects with staff who have a vested interest in the method/tool (i.e. staff who developed or championed it) or a vested interest in seeing it fail (i.e. staff who really hate change rather than just resent it).
• This is a bit like selecting a jury. Initially the selection of potential jurors is at random, but there is additional screening to avoid jurors with identifiable bias.
Definition of the framework – Threats

• There are many more factors that all need to be identified.
• Not only the people but also the technology:
  – Did the measurement tools (e.g. coverage) really measure what we thought?
  – Are the injected faults really representative?
  – Were the faults injected fairly?
  – Are the pilot project and software representative?
  – Were the oracles used reliable?
  – Etc….
Definition of the framework – Toolbox

• Toolbox
  – Demographic questionnaire: this questionnaire must be answered by the testers before performing the test. It aims to capture the characteristics of the testers: level of experience in using the tool, years, job, knowledge of similar tools.
  – Satisfaction questionnaire (SUS): questionnaire used to extract the testers' satisfaction when they perform the evaluation.
  – Reaction cards
  – Questions for (taped) semi-structured interviews
  – Process for investigating the face reactions in videos
  – A fault taxonomy to classify the found faults.
  – A software testing technique and tools classification/taxonomy
  – Working diaries
  – A fault template to classify each fault detected by means of the test. The template contains the following information:
    • Time spent to detect the fault
    • Test case that found the fault
    • Cause of the fault: mistake in the implementation, mistake in the design, mistake in the analysis.
    • Manifestation of the fault in the code
Applying or instantiating the framework

• Search Based Structural Testing tool [Vos et al. 2012]
• Search Based Functional Testing tool [Vos et al. 2013]
• Web testing techniques for AJAX applications [MRT08]
• Commercial combinatorial testing tool at Sulake (to be presented at the ISSTA workshop next week)
• Automated Test Case Generation at IBM (has been sent to ESEM)
• We have finished other instances:
  – Commercial combinatorial testing tool at SOFTEAM (has been sent to ESEM)
• Currently we are working on more instantiations:
  – Regression testing prioritization technique
  – Continuous testing tool at SOFTEAM
  – Rogue User Testing at SOFTEAM
  – …
Combinatorial Testing
Example case study Sulake: Context

• Sulake is a Finnish company
• Develops social entertainment games
• Main product: Habbo Hotel
  – World's largest virtual community for teenagers
  – Millions of teenagers a week all over the world (ages 13-18)
  – Access directly through the browser or Facebook
  – 11 languages available
  – 218.000.000 registered users
  – 11.000.000 visitors / month
• The system can be accessed through a wide variety of browsers and Flash players (and their versions) that run on different operating systems!
• Which combinations to use when testing the system?!
Can a tool help?

• What about the CTE XL Professional, or CTE for short?
• Classification Tree Editor:
  – Model your combinatorial problem in a tree
  – Indicate the priorities of the combinations
  – The tool automatically generates the best test cases!
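To give a feel for what such a tool automates, here is a brute-force sketch of pairwise (all-pairs) test selection; the parameter model is invented for illustration, and a real tool like CTE XL adds prioritisation and far more sophisticated generation:

```python
from itertools import combinations, product

# Illustrative parameter model (not Sulake's real configuration matrix).
params = {
    "browser": ["Firefox", "Chrome", "IE"],
    "os": ["Windows", "macOS", "Linux"],
    "flash": ["v10", "v11"],
}
names = list(params)

def pairs_of(combo):
    """All (parameter, value) pairs covered by one full combination."""
    return {((a, combo[i]), (b, combo[j]))
            for (i, a), (j, b) in combinations(enumerate(names), 2)}

all_pairs = set().union(*(pairs_of(c) for c in product(*params.values())))

tests, covered = [], set()
# Greedy: repeatedly pick the combination covering the most new pairs.
while covered != all_pairs:
    best = max(product(*params.values()),
               key=lambda c: len(pairs_of(c) - covered))
    covered |= pairs_of(best)
    tests.append(dict(zip(names, best)))

print(len(tests), "pairwise tests instead of",
      len(list(product(*params.values()))), "exhaustive combinations")
```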
Combinatorial Testing
Example case study Sulake: Research Questions

• What do we want to find out?
• Research questions:
  – RQ1: Compared to the current test suites used for testing in Sulake, can the test cases generated by the CTE contribute to the effectiveness of testing when it is used in real testing environments at Sulake?
  – RQ2: How much effort would be required to introduce the CTE into the testing processes currently implanted at Sulake?
  – RQ3: How much effort would be required to add the generated test cases into the testing infrastructure currently used at Sulake?
  – RQ4: How satisfied are Sulake testing practitioners during the learning, installing, configuring and usage of the CTE when it is used in a real testing environment?
The testing tools evaluated (the cases or treatments)

• Current combinatorial testing practice at Sulake
  – Exploratory testing with feature coverage as objective
  – Based on real user information (i.e. browsers, OS, Flash, etc.), combinatorial aspects are taken into account
COMPARED TO
• Classification Tree Editor (CTE)
  – Classify the combinatorial aspects as a classification tree
  – Generate prioritized test cases and select
Who is doing the study (the subjects)

• Subjects: 1 senior tester from Sulake (6 years software development, 8 years testing experience)
Systems Under Test (objects or pilot projects)

• Objects:
  – 2 nightly builds from Habbo
  – Existing test suite that Sulake uses (TSsulake) with 42 automated test cases
  – No known faults, no injection of faults
The protocol (scenario 4)

[Same decision-tree diagram as in the framework definition; for Sulake the answers (existing test suite, no fault injection) select scenario 4: quantitative comparison of T and TSe.]
Collected data

Measuring effectiveness:                TSSulake   TSCTE
– Number of test cases                  42         68 (selected 42 high priority)
– Number of invalid test cases          0          0
– Number of repeated test cases         0          26
– Number of failures observed           0          12
– Number of faults found                0          2
– Type and cause of the faults          N/A        1. critical, browser hang; 2. minor, broken UI element
– Feature coverage reached              100%       40%
– All-pairs coverage reached            N/A        80%

Measuring efficiency:
– Time needed to learn the CTE testing method:                      N/A       116 min
– Time needed to design and generate the test suite with the CTE:   N/A       95 min (62 for tree, 33 for removing duplicates)
– Time needed to set up testing infrastructure specific to CTE:     N/A       74 min
– Time needed to automate the test suite generated by the CTE:      N/A       357 min
– Time needed to execute the test suite:                            114 min   183 min (both builds)
– Time needed to identify fault types and causes:                   0 min     116 min

Measuring subjective satisfaction:
– SUS:                N/A   50
– Reaction cards:     N/A   Comprehensive, Dated, Old, Sterile, Unattractive
– Informal interview: N/A   video
Descriptive statistics for efficiency (1)
[Chart not preserved in the extracted text.]
Descriptive statistics for efficiency (2)
[Chart not preserved in the extracted text.]
Emotional face reactions

[Four video stills with face reactions on: 1. Duplicate test cases; 2. Readability of the CTE trees; 3. Technical support and user manuals; 4. Appearance of the tool.]
Combinatorial Testing
Example case study Sulake: Conclusions

• RQ1: Compared to the current test suites used for testing in Sulake, can the test cases generated by the CTE contribute to the effectiveness of testing when it is used in real testing environments at Sulake?
  – 2 new faults were found!!
  – The need for more structured combinatorial testing was confirmed by Sulake.
• RQ2: How much effort would be required to introduce the CTE into the testing processes currently implanted at Sulake?
  – The effort for learning and installing is medium, but can be justified within Sulake since it is a one-off cost.
  – Designing and generating test cases suffers from duplicates that cost time to remove. The total effort can be accepted within Sulake because critical faults were discovered.
  – Executing the test suite generated by the CTE takes 1 hour more than the Sulake test suite (due to more combinatorial aspects being tested). This means that Sulake cannot include these tests in the daily build, but will have to add them to nightly builds only.
• RQ3: How much effort would be required to add the generated test cases into the testing infrastructure currently used at Sulake?
  – There is effort for automating the generated test cases, but it can be justified within Sulake.
• RQ4: How satisfied are Sulake testing practitioners during the learning, installing, configuring and usage of the CTE when it is used in a real testing environment?
  – The tool seems to have everything that is needed, but looks unattractive.
Automated Test Case Generation
Example case study IBM: Context

• IBM Research Labs in Haifa, Israel
• Develops a system (denoted IMP ;-) for resource management in a networked environment (servers, virtual machines, switches, storage devices, etc.)
• The IBM Lab has a designated team that is responsible for testing new versions of the product.
• The testing is being done within a simulated testing environment developed by this testing team.
• The IBM Lab is interested in evaluating the Automated Test Case Generation tools developed in the FITTEST project!
Automated Test Case Generation
Example case study IBM: Research Questions

• What do we want to find out?
• Research questions:
  – RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the effectiveness of testing when used in real testing environments at IBM?
  – RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the efficiency of testing when used in real testing environments at IBM?
  – RQ3: How much effort would be required to deploy the FITTEST tools within the testing processes currently implanted at IBM Research?
The testing tools evaluated (the cases or treatments)

• Current test case design practice at IBM
  – Exploratory test case design
  – The objective of the test cases is to maximise the coverage of the system use-cases
COMPARED TO
• FITTEST Automated Test Case Generation tools
  – Only part of the whole continuous testing approach of FITTEST
FITTEST Automated Test Case Generation

Logs2FSM
• Infers FSM models from the logs
• Applies an event-based model inference approach (read more in [MTR08]).
• The model-based oracles that also result from this tool refer to the use of the paths generated from the inferred FSM as oracles. If these paths, when transformed to test cases, cannot be fully executed, then the tester needs to inspect the failing paths to see whether that is due to some faults, or whether the paths themselves are infeasible.

FSM2Tests
• Takes FSMs and a Domain Input Specification (DIS) file, created by a tester for the IBM Research SUT, to generate concrete test cases.
• This component implements a technique that combines model-based and combinatorial testing (see [NMT12]):
  1. generate test paths from the FSM (using various simple and advanced graph visit algorithms);
  2. transform these paths into classification trees using the CTE XL format, enriched with the DIS such as data types and partitions;
  3. generate test combinations from those trees using combinatorial criteria.
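To illustrate the flavour of the first step, here is a toy sketch that infers a transition graph from event logs; the real Logs2FSM component implements a far more sophisticated event-based inference approach [MTR08]:

```python
from collections import defaultdict

def infer_fsm(traces):
    """Toy FSM inference: a state is simply the last event seen.

    traces: list of event sequences, e.g. from the preprocessed logs.
    Returns {state: {event: next_state}}.
    """
    fsm = defaultdict(dict)
    for trace in traces:
        state = "<start>"
        for event in trace:
            fsm[state][event] = event  # next state named after the event
            state = event
    return dict(fsm)

# Hypothetical GUI event logs (illustrative only).
logs = [["login", "search", "addToCart", "checkout"],
        ["login", "search", "search", "logout"]]
print(infer_fsm(logs)["search"])
# {'addToCart': 'addToCart', 'search': 'search', 'logout': 'logout'}
```

Paths through such a graph become abstract test cases; FSM2Tests then concretises them with the DIS.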
Who is doing the study (the subjects)

• Subjects:
  – 1 senior tester from IBM (10 years software development, 5 years testing experience, of which 4 years with the IMP system)
  – 1 researcher from FBK (10 years of experience with software development, 5 years of experience with research in testing)
Pilot project

• Objects:
  – SUT: distributed application for managing system resources in a networked environment
    • A management server that communicates with multiple managed clients (= physical or virtual resources)
    • Important product for IBM with real customers
    • The case study will be performed on a new version of this system
  – Existing test suite that IBM uses (TSibm)
    • Selected from what they call System Validation Tests (SVT) -> tests for high-level complex customer use-cases
    • Manually designed
    • Automatically executed through activation scripts
  – 10 representative faults to inject into the system
The protocol (scenario 5)

[Same decision-tree diagram as in the framework definition; for IBM the answers (existing test suite, injected faults) select scenario 5: quantitative comparison of T and TSe, including FDR.]
More detailed steps

1. Configure the simulated environment and create the logs [IBM]
2. Select test suite TSibm [IBM]
3. Write activation scripts for each test case in TSibm [IBM]
4. Generate TSfittest [FBK subject]
   a. Instantiate FITTEST components for the IMP
   b. Generate the FSM with Logs2FSM
   c. Define the Domain Input Specification (DIS)
   d. Generate the concrete test data with FSM2Tests
5. Select and inject the faults [IBM]
6. Develop a tool that transforms the concrete test cases generated by the FITTEST tool FSM2Tests to an executable format [IBM]
7. Execute TSibm [IBM]
8. Execute TSfittest [IBM]
Collected Measures

TABLE IV. Descriptive measures for the test suites TSibm and TSfittest:

Size:
– number of abstract test cases: TSibm N/A; TSfittest 84
– number of concrete test cases: TSibm 4; TSfittest 3054
– number of commands (or events): TSibm 1814; TSfittest 18520

Construction:
– design of the test cases: TSibm manual (cf. Section III-B1); TSfittest automated (cf. Section III-B2)

Effort to create the test suite:
– TSibm: design 5 hours; activation scripts 4 hours
– TSfittest: set up FITTEST tools 8 hours; generate the FSM automated, less than 1 second CPU time; specify the DIS 2 hours; generate concrete tests automated, less than 1 minute CPU time; transform into executable format 20 hours
[Excerpt from the case-study paper:] ... what the researchers have in mind and what is investigated according to the research questions. This type of threat is mainly related to the use of injected faults to measure the fault-finding capability of our testing strategies. This is because the types of faults seeded may not be representative enough of real faults. In order to mitigate this threat, the IBM team identified representative faults that were based on real faults identified earlier in the development. Although this identification was realized by a senior tester, the list was revised by the whole IBM team that participated in this case study.

Moreover, from the FITTEST project's point of view we have the following results:
•  The FITTEST tools have shown to be useful within the context of a real industrial case.
•  The FITTEST tools have the ability to automate the testing process within a real industrial case.

ACKNOWLEDGMENT
This work was financed by the FITTEST project, ICT-257574.
[Figure: excerpt of the FSM inferred from the logs of the IBM SUT. Nodes are abstract states (S2, S15, S19, S20, S22, S26, S31, S33, S34, S36, S37, S38, S40, S41, S42, S48, ...) and transitions are REST calls such as GET /VMControl, GET /VMControl/virtualAppliances, GET /VMControl/virtualAppliances/{id}, GET /VMControl/virtualAppliances/{id}/targets, GET /VMControl/virtualAppliances/{id}/progress, GET /VMControl/workloads, GET /VMControl/workloads/{id}, GET /isdsettings/restcompatibilities, GET /resources/OperatingSystem?Props=IPv4Address, and GET /resources/System/{id}/access/a315017165998.]
TABLE II. DESCRIPTIVE MEASURES FOR THE FSM THAT IS USED TO GENERATE TSfittest

Variable                                  Value
Number of traces used to infer FSM        6
Average trace length                      100
Number of nodes in generated FSM          51
Number of transitions in generated FSM    134
Collected measures

87

Injected faults detected (1 = detected, 0 = missed):

             IF1  IF2  IF3  IF4  IF5  IF6  IF7  IF8  IF9  IF10
TSibm         1    0    0    0    1    1    0    0    1    1
TSfittest     1    1    1    0    0    1    0    1    1    1
As can be seen from Table IV, TSibm is substantially smaller than TSfittest in all size parameters; this is one of the evident results of automation. However, not only is TSfittest bigger in size, its effectiveness, measured by injected-fault coverage (see Figure 4), is also significantly higher (70% vs 50%). Moreover, if we combine the TSibm and TSfittest suites, the effectiveness increases to 80%. Therefore, within the context of the studied environment, the FITTEST technologies can contribute to the effectiveness of testing at IBM Research, and IBM Research has decided that, to optimize fault-finding capability, the two techniques can best be combined.
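To make these percentages concrete, the following minimal sketch (variable names invented; the 0/1 values are taken directly from the detection matrix above) reproduces the 50%, 70%, and 80% figures:

```python
# Minimal sketch reproducing the effectiveness numbers from the detection
# matrix above (1 = injected fault detected, 0 = missed).
ts_ibm     = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1]   # detects IF1, IF5, IF6, IF9, IF10
ts_fittest = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]   # detects IF1-IF3, IF6, IF8-IF10

def coverage(detected):
    """Percentage of the injected faults that the suite detects."""
    return 100 * sum(detected) / len(detected)

# Union of the two suites: a fault counts as found if either suite finds it.
combined = [max(a, b) for a, b in zip(ts_ibm, ts_fittest)]
print(coverage(ts_ibm), coverage(ts_fittest), coverage(combined))  # 50.0 70.0 80.0
```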
RQ2: Compared to the current test suite used for testing at
IBM Research, can the FITTEST technologies contribute to the
efficiency of testing when it is used in the testing environments
at IBM Research?
TABLE III. EFFICIENCY MEASURES FOR EXECUTION OF BOTH TEST SUITES

Variable                                 TSibm    normalized by size    TSfittest    normalized by size
Execution time with fault injection      36.75    9.18                  127.87       1.52
Execution time without fault injection   27.97    6.99                  50.72        0.60

(Normalization divides the execution time by the size of the suite: 4 test cases for TSibm and 84 abstract test cases for TSfittest.)

It can be seen from Table III that the time to execute TSibm is smaller than the time to execute TSfittest. This is due to the number of concrete tests in TSfittest. When we normalize the execution times by the sizes of the suites, however, the test cases of TSfittest execute considerably faster than those of TSibm.
Automated Test Case Generation example case study IBM: Conclusions

•  RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the effectiveness of testing when used in real testing environments at IBM?
–  TSibm finds 50% of the injected faults, TSfittest finds 70%
–  Together they find 80%! -> IBM will consider combining the techniques
•  RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the efficiency of testing when used in real testing environments at IBM?
–  FITTEST test cases execute faster because they are smaller
–  Shorter tests were good for IBM -> easier to identify faults
•  RQ3: How much effort would be required to deploy the FITTEST tools within the testing processes currently in place at IBM Research?
–  Found reasonable by IBM, considering that the manual tasks need to be done only once and more faults were found.
Threats to validity

•  The learning curve effect
•  (and basically all the other human factors ;-)
•  Missing information in the logs leads to weak FSMs.
•  Incomplete specification of the DIS leads to weak concrete test cases.
•  The representativeness of the injected faults with respect to real faults.

89
Final Things ……

•  As researchers, we should concentrate on the future problems that industry will face. Right?
•  How can we claim to know future needs without understanding current ones?
•  Go to industry and evaluate your results!

90
Need any help? More information? Want to do an instantiation?

Contact:

Tanja E. J. Vos

email: tvos@pros.upv.es
skype: tanja_vos
web: http://tanvopol.webs.upv.es/
project: http://www.facebook.com/FITTESTproject
telephone: +34 690 917 971
References
empirical research in software engineering

[BSH86] V. R. Basili, R. W. Selby, and D. H. Hutchens. Experimentation in software engineering. IEEE Trans. Softw. Eng., 12:733–743, July 1986.
[BSL99] Victor R. Basili, Forrest Shull, and Filippo Lanubile. Building knowledge through families of experiments. IEEE Trans. Softw. Eng., 25(4):456–473, 1999.
[CWH05] C. Wohlin, M. Host, and K. Henningsson. Empirical research methods in software and web engineering. In E. Mendes and N. Mosley, editors, Web Engineering - Theory and Practice of Metrics and Measurement for Web Development, pages 409–429, 2005.
[Fly04] Bent Flyvbjerg. Five misunderstandings about case-study research. In Clive Seale, Giampietro Gobo, Jaber F. Gubrium, and David Silverman, editors, Qualitative Research Practice, pages 420–434. London and Thousand Oaks, CA, 2004.
[KLL97] B. Kitchenham, S. Linkman, and D. Law. DESMET: a methodology for evaluating software engineering methods and tools. Computing & Control Engineering Journal, 8(3):120–126, 1997.
[KPP+02] Barbara A. Kitchenham, Shari Lawrence Pfleeger, Lesley M. Pickard, Peter W. Jones, David C. Hoaglin, Khaled El Emam, and Jarrett Rosenberg. Preliminary guidelines for empirical research in software engineering. IEEE Trans. Softw. Eng., 28(8):721–734, 2002.
[LSS05] Timothy C. Lethbridge, Susan Elliott Sim, and Janice Singer. Studying software engineers: Data collection techniques for software field studies. Empirical Software Engineering, 10:311–341, 2005. doi:10.1007/s10664-005-1290-x.
[TTDBS07] Paolo Tonella, Marco Torchiano, Bart Du Bois, and Tarja Systä. Empirical studies in reverse engineering: state of the art and future trends. Empirical Softw. Engg., 12(5):551–571, 2007.
[WRH+00] Claes Wohlin, Per Runeson, Martin Host, Magnus C. Ohlsson, Bjorn Regnell, and Anders Wesslen. Experimentation in software engineering: an introduction. Kluwer Academic Publishers, Norwell, MA, USA, 2000.
[Rob02] Colin Robson. Real World Research. Blackwell Publishing Limited, 2002.
[RH09] Per Runeson and Martin Host. Guidelines for conducting and reporting case study research in software engineering. Empirical Softw. Engg., 14(2):131–164, 2009.
[PPV00] Dewayne E. Perry, Adam A. Porter, and Lawrence G. Votta. Empirical studies of software engineering: a roadmap. In Proceedings of the Conference on The Future of Software Engineering (ICSE '00), pages 345–355. ACM, New York, NY, USA, 2000.
[EA06] http://www.cs.toronto.edu/~sme/case-studies/case_study_tutorial_slides.pdf
[Ple95] S. L. Pfleeger. Experimental design and analysis in software engineering. Annals of Software Engineering, 1:219–253, 1995.
[PK01-03] S. L. Pfleeger and B. A. Kitchenham. Principles of Survey Research. Software Engineering Notes, (6 parts), Nov 2001 – Mar 2003.
[Fly06] B. Flyvbjerg. Five Misunderstandings about Case Study Research. Qualitative Inquiry, 12(2):219–245, April 2006.
[KPP95] B. Kitchenham, L. Pickard, and S. L. Pfleeger. Case studies for method and tool evaluation. IEEE Software, 12(4):52–62, July 1995.

92
References
Empirical research in software testing

[BS87] Victor R. Basili and Richard W. Selby. Comparing the effectiveness of software testing strategies. IEEE Trans. Softw. Eng., 13(12):1278–1296, 1987.
[DRE04] Hyunsook Do, Gregg Rothermel, and Sebastian Elbaum. Infrastructure support for controlled experimentation with software testing and regression testing techniques. In Proc. Int. Symp. on Empirical Software Engineering, ISESE '04, 2004.
[EHP+06] Sigrid Eldh, Hans Hansson, Sasikumar Punnekkat, Anders Pettersson, and Daniel Sundmark. A framework for comparing efficiency, effectiveness and applicability of software testing techniques. Testing: Academic & Industrial Conference on Practice And Research Techniques (TAIC PART), 0:159–170, 2006.
[JMV04] Natalia Juristo, Ana M. Moreno, and Sira Vegas. Reviewing 25 years of testing technique experiments. Empirical Softw. Engg., 9(1-2):7–44, 2004.
[RAT+06] Per Runeson, Carina Andersson, Thomas Thelin, Anneliese Andrews, and Tomas Berling. What do we know about defect detection methods? IEEE Softw., 23(3):82–90, 2006.
[VB05] Sira Vegas and Victor Basili. A characterisation schema for software testing techniques. Empirical Softw. Engg., 10(4):437–466, 2005.
[LR96] Christopher M. Lott and H. Dieter Rombach. Repeatable software engineering experiments for comparing defect-detection techniques. Empirical Software Engineering, 1:241–277, 1996. doi:10.1007/BF00127447.

93
References
Descriptions of empirical studies done in software testing

[Vos et al. 2010] T. E. J. Vos, A. I. Baars, F. F. Lindlar, P. M. Kruse, A. Windisch, and J. Wegener. Industrial scaled automated structural testing with the evolutionary testing tool. In ICST, pages 175–184, 2010.
[Vos et al. 2012] Tanja E. J. Vos, Arthur I. Baars, Felix F. Lindlar, Andreas Windisch, Benjamin Wilmes, Hamilton Gross, Peter M. Kruse, and Joachim Wegener. Industrial case studies for evaluating search based structural testing. International Journal of Software Engineering and Knowledge Engineering, 22(8):1123–, 2012.
[Vos et al. 2013] T. E. J. Vos, F. Lindlar, B. Wilmes, A. Windisch, A. Baars, P. Kruse, H. Gross, and J. Wegener. Evolutionary functional black-box testing in an industrial setting. Software Quality Journal, 21(2):259–288, 2013.
[MRT08] A. Marchetto, F. Ricca, and P. Tonella. A case study-based comparison of web testing techniques applied to ajax web applications. International Journal on Software Tools for Technology Transfer (STTT), 10:477–492, 2008.

94
Other references

[TS04] T. S. Tullis and J. N. Stetson. A comparison of questionnaires for assessing website usability. In Proceedings of the Usability Professionals Association Conference, 2004.
[BKM08] Aaron Bangor, Philip T. Kortum, and James T. Miller. An empirical evaluation of the system usability scale. International Journal of Human-Computer Interaction, 24:574–594, 2008.
[Horn06] K. Hornbæk. Current practice in measuring usability: Challenges to usability studies and research. International Journal of Human-Computer Studies, 64(2):79–102, February 2006.
[MTR08] Alessandro Marchetto, Paolo Tonella, and Filippo Ricca. State-based testing of Ajax web applications. In Software Testing, Verification, and Validation, 2008 1st International Conference on, pages 121–130. IEEE, 2008.
[NMT12] C. D. Nguyen, A. Marchetto, and P. Tonella. Combining model-based and combinatorial testing for effective test case generation. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, pages 100–110. ACM, 2012.

95

Contenu connexe

Similaire à TAROT2013 Testing School - Tanja Vos presentation

TAROT summerschool slides 2013 - Italy
TAROT summerschool slides 2013 - ItalyTAROT summerschool slides 2013 - Italy
TAROT summerschool slides 2013 - ItalyTanja Vos
 
Julie Marguerite - Tefis open calls (fia dec 2010)
Julie Marguerite - Tefis open calls  (fia dec 2010)Julie Marguerite - Tefis open calls  (fia dec 2010)
Julie Marguerite - Tefis open calls (fia dec 2010)FIA2010
 
About IRT Nanoelec
About IRT NanoelecAbout IRT Nanoelec
About IRT NanoelecIRTNanoelec
 
ECoVEM DL IEEE EdSoc Tunis UNED 2022 07 final.pptx
ECoVEM DL IEEE EdSoc Tunis UNED 2022 07 final.pptxECoVEM DL IEEE EdSoc Tunis UNED 2022 07 final.pptx
ECoVEM DL IEEE EdSoc Tunis UNED 2022 07 final.pptxManuel Castro
 
IoT.est Project ID Card
IoT.est Project ID CardIoT.est Project ID Card
IoT.est Project ID Cardiotest
 
COMNET as a Capacity-Building Partner for Two African Institutions
COMNET as a Capacity-Building Partner for Two African InstitutionsCOMNET as a Capacity-Building Partner for Two African Institutions
COMNET as a Capacity-Building Partner for Two African InstitutionsProjectENhANCE
 
[OOFHEC2018] Manuel Castro: Identifying the best practices in e-engineering t...
[OOFHEC2018] Manuel Castro: Identifying the best practices in e-engineering t...[OOFHEC2018] Manuel Castro: Identifying the best practices in e-engineering t...
[OOFHEC2018] Manuel Castro: Identifying the best practices in e-engineering t...EADTU
 
Update on the ELIXIR UK node by Chris Ponting
Update on the ELIXIR UK node by Chris PontingUpdate on the ELIXIR UK node by Chris Ponting
Update on the ELIXIR UK node by Chris PontingELIXIR UK
 
20110712.we nmr.utrecht
20110712.we nmr.utrecht20110712.we nmr.utrecht
20110712.we nmr.utrechtNuno Ferreira
 
Testing Challenges and Approaches in Edge Computing
Testing Challenges and Approaches in Edge ComputingTesting Challenges and Approaches in Edge Computing
Testing Challenges and Approaches in Edge ComputingAxel Rennoch
 
EADTU 2018 conference e-LIVES project
EADTU 2018 conference e-LIVES project EADTU 2018 conference e-LIVES project
EADTU 2018 conference e-LIVES project Manuel Castro
 
Scape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsScape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsSCAPE Project
 
L4MS Webinar - All you need to know (25th Oct)
L4MS Webinar - All you need to know (25th Oct)L4MS Webinar - All you need to know (25th Oct)
L4MS Webinar - All you need to know (25th Oct)L4MS
 
Advancing the JISC Access & Identity Management Programme
Advancing the JISC Access & Identity Management ProgrammeAdvancing the JISC Access & Identity Management Programme
Advancing the JISC Access & Identity Management ProgrammeJISC Netskills
 
Process assessment for use in very small enterprises the noemi assessment met...
Process assessment for use in very small enterprises the noemi assessment met...Process assessment for use in very small enterprises the noemi assessment met...
Process assessment for use in very small enterprises the noemi assessment met...christophefeltus
 

Similaire à TAROT2013 Testing School - Tanja Vos presentation (20)

TAROT summerschool slides 2013 - Italy
TAROT summerschool slides 2013 - ItalyTAROT summerschool slides 2013 - Italy
TAROT summerschool slides 2013 - Italy
 
Julie Marguerite - Tefis open calls (fia dec 2010)
Julie Marguerite - Tefis open calls  (fia dec 2010)Julie Marguerite - Tefis open calls  (fia dec 2010)
Julie Marguerite - Tefis open calls (fia dec 2010)
 
Fire at Net Futures2015
Fire at Net Futures2015Fire at Net Futures2015
Fire at Net Futures2015
 
About IRT Nanoelec
About IRT NanoelecAbout IRT Nanoelec
About IRT Nanoelec
 
ECoVEM DL IEEE EdSoc Tunis UNED 2022 07 final.pptx
ECoVEM DL IEEE EdSoc Tunis UNED 2022 07 final.pptxECoVEM DL IEEE EdSoc Tunis UNED 2022 07 final.pptx
ECoVEM DL IEEE EdSoc Tunis UNED 2022 07 final.pptx
 
IoT.est Project ID Card
IoT.est Project ID CardIoT.est Project ID Card
IoT.est Project ID Card
 
Tien3
Tien3Tien3
Tien3
 
COMNET as a Capacity-Building Partner for Two African Institutions
COMNET as a Capacity-Building Partner for Two African InstitutionsCOMNET as a Capacity-Building Partner for Two African Institutions
COMNET as a Capacity-Building Partner for Two African Institutions
 
[OOFHEC2018] Manuel Castro: Identifying the best practices in e-engineering t...
[OOFHEC2018] Manuel Castro: Identifying the best practices in e-engineering t...[OOFHEC2018] Manuel Castro: Identifying the best practices in e-engineering t...
[OOFHEC2018] Manuel Castro: Identifying the best practices in e-engineering t...
 
Update on the ELIXIR UK node by Chris Ponting
Update on the ELIXIR UK node by Chris PontingUpdate on the ELIXIR UK node by Chris Ponting
Update on the ELIXIR UK node by Chris Ponting
 
20110712.we nmr.utrecht
20110712.we nmr.utrecht20110712.we nmr.utrecht
20110712.we nmr.utrecht
 
Testing Challenges and Approaches in Edge Computing
Testing Challenges and Approaches in Edge ComputingTesting Challenges and Approaches in Edge Computing
Testing Challenges and Approaches in Edge Computing
 
EDI - European Data Incubator - Public Presentation
EDI - European Data Incubator - Public PresentationEDI - European Data Incubator - Public Presentation
EDI - European Data Incubator - Public Presentation
 
EADTU 2018 conference e-LIVES project
EADTU 2018 conference e-LIVES project EADTU 2018 conference e-LIVES project
EADTU 2018 conference e-LIVES project
 
Enoll hannover-2013-anna
Enoll hannover-2013-annaEnoll hannover-2013-anna
Enoll hannover-2013-anna
 
Scape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsScape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation Environments
 
L4MS Webinar - All you need to know (25th Oct)
L4MS Webinar - All you need to know (25th Oct)L4MS Webinar - All you need to know (25th Oct)
L4MS Webinar - All you need to know (25th Oct)
 
iTEC widgetStore
iTEC widgetStoreiTEC widgetStore
iTEC widgetStore
 
Advancing the JISC Access & Identity Management Programme
Advancing the JISC Access & Identity Management ProgrammeAdvancing the JISC Access & Identity Management Programme
Advancing the JISC Access & Identity Management Programme
 
Process assessment for use in very small enterprises the noemi assessment met...
Process assessment for use in very small enterprises the noemi assessment met...Process assessment for use in very small enterprises the noemi assessment met...
Process assessment for use in very small enterprises the noemi assessment met...
 

Plus de Henry Muccini

Human Behaviour Centred Design
Human Behaviour Centred Design Human Behaviour Centred Design
Human Behaviour Centred Design Henry Muccini
 
How cultural heritage, cyber-physical spaces, and software engineering can wo...
How cultural heritage, cyber-physical spaces, and software engineering can wo...How cultural heritage, cyber-physical spaces, and software engineering can wo...
How cultural heritage, cyber-physical spaces, and software engineering can wo...Henry Muccini
 
La gestione dell’utenza numerosa - dalle Segreterie, ai Musei, alle Segreterie
La gestione dell’utenza numerosa - dalle Segreterie, ai Musei, alle SegreterieLa gestione dell’utenza numerosa - dalle Segreterie, ai Musei, alle Segreterie
La gestione dell’utenza numerosa - dalle Segreterie, ai Musei, alle SegreterieHenry Muccini
 
Turismo 4.0: l'ICT a supporto del turismo sostenibile
Turismo 4.0: l'ICT a supporto del turismo sostenibileTurismo 4.0: l'ICT a supporto del turismo sostenibile
Turismo 4.0: l'ICT a supporto del turismo sostenibileHenry Muccini
 
Sustainable Tourism - IoT and crowd management
Sustainable Tourism - IoT and crowd managementSustainable Tourism - IoT and crowd management
Sustainable Tourism - IoT and crowd managementHenry Muccini
 
Software Engineering at the age of the Internet of Things
Software Engineering at the age of the Internet of ThingsSoftware Engineering at the age of the Internet of Things
Software Engineering at the age of the Internet of ThingsHenry Muccini
 
The influence of Group Decision Making on Architecture Design Decisions
The influence of Group Decision Making on Architecture Design DecisionsThe influence of Group Decision Making on Architecture Design Decisions
The influence of Group Decision Making on Architecture Design DecisionsHenry Muccini
 
An IoT Software Architecture for an Evacuable Building Architecture
An IoT Software Architecture for an Evacuable Building ArchitectureAn IoT Software Architecture for an Evacuable Building Architecture
An IoT Software Architecture for an Evacuable Building ArchitectureHenry Muccini
 
Web Engineering L8: User-centered Design (8/8)
Web Engineering L8: User-centered Design (8/8)Web Engineering L8: User-centered Design (8/8)
Web Engineering L8: User-centered Design (8/8)Henry Muccini
 
Web Engineering L7: Sequence Diagrams and Design Decisions (7/8)
Web Engineering L7: Sequence Diagrams and Design Decisions (7/8)Web Engineering L7: Sequence Diagrams and Design Decisions (7/8)
Web Engineering L7: Sequence Diagrams and Design Decisions (7/8)Henry Muccini
 
Web Engineering L6: Software Architecture for the Web (6/8)
Web Engineering L6: Software Architecture for the Web (6/8)Web Engineering L6: Software Architecture for the Web (6/8)
Web Engineering L6: Software Architecture for the Web (6/8)Henry Muccini
 
Web Engineering L5: Content Model (5/8)
Web Engineering L5: Content Model (5/8)Web Engineering L5: Content Model (5/8)
Web Engineering L5: Content Model (5/8)Henry Muccini
 
Web Engineering L3: Project Planning (3/8)
Web Engineering L3: Project Planning (3/8)Web Engineering L3: Project Planning (3/8)
Web Engineering L3: Project Planning (3/8)Henry Muccini
 
Web Engineering L2: Requirements Elicitation for the Web (2/8)
Web Engineering L2: Requirements Elicitation for the Web (2/8)Web Engineering L2: Requirements Elicitation for the Web (2/8)
Web Engineering L2: Requirements Elicitation for the Web (2/8)Henry Muccini
 
Web Engineering L1: introduction to Web Engineering (1/8)
Web Engineering L1: introduction to Web Engineering (1/8)Web Engineering L1: introduction to Web Engineering (1/8)
Web Engineering L1: introduction to Web Engineering (1/8)Henry Muccini
 
Web Engineering L4: Requirements and Planning in concrete (4/8)
Web Engineering L4: Requirements and Planning in concrete (4/8)Web Engineering L4: Requirements and Planning in concrete (4/8)
Web Engineering L4: Requirements and Planning in concrete (4/8)Henry Muccini
 
Collaborative aspects of Decision Making and its impact on Sustainability
Collaborative aspects of Decision Making and its impact on SustainabilityCollaborative aspects of Decision Making and its impact on Sustainability
Collaborative aspects of Decision Making and its impact on SustainabilityHenry Muccini
 
Engineering Cyber Physical Spaces
Engineering Cyber Physical SpacesEngineering Cyber Physical Spaces
Engineering Cyber Physical SpacesHenry Muccini
 
I progetti UnivAq-UFFIZI, INCIPICT, e  CUSPIS
I progetti UnivAq-UFFIZI, INCIPICT, e  CUSPISI progetti UnivAq-UFFIZI, INCIPICT, e  CUSPIS
I progetti UnivAq-UFFIZI, INCIPICT, e  CUSPISHenry Muccini
 
Exploring the Temporal Aspects of Software Architecture
Exploring the Temporal Aspects of Software ArchitectureExploring the Temporal Aspects of Software Architecture
Exploring the Temporal Aspects of Software ArchitectureHenry Muccini
 

Plus de Henry Muccini (20)

Human Behaviour Centred Design
Human Behaviour Centred Design Human Behaviour Centred Design
Human Behaviour Centred Design
 
How cultural heritage, cyber-physical spaces, and software engineering can wo...
How cultural heritage, cyber-physical spaces, and software engineering can wo...How cultural heritage, cyber-physical spaces, and software engineering can wo...
How cultural heritage, cyber-physical spaces, and software engineering can wo...
 
La gestione dell’utenza numerosa - dalle Segreterie, ai Musei, alle Segreterie
La gestione dell’utenza numerosa - dalle Segreterie, ai Musei, alle SegreterieLa gestione dell’utenza numerosa - dalle Segreterie, ai Musei, alle Segreterie
La gestione dell’utenza numerosa - dalle Segreterie, ai Musei, alle Segreterie
 
Turismo 4.0: l'ICT a supporto del turismo sostenibile
Turismo 4.0: l'ICT a supporto del turismo sostenibileTurismo 4.0: l'ICT a supporto del turismo sostenibile
Turismo 4.0: l'ICT a supporto del turismo sostenibile
 
Sustainable Tourism - IoT and crowd management
Sustainable Tourism - IoT and crowd managementSustainable Tourism - IoT and crowd management
Sustainable Tourism - IoT and crowd management
 
Software Engineering at the age of the Internet of Things
Software Engineering at the age of the Internet of ThingsSoftware Engineering at the age of the Internet of Things
Software Engineering at the age of the Internet of Things
 
The influence of Group Decision Making on Architecture Design Decisions
The influence of Group Decision Making on Architecture Design DecisionsThe influence of Group Decision Making on Architecture Design Decisions
The influence of Group Decision Making on Architecture Design Decisions
 
An IoT Software Architecture for an Evacuable Building Architecture
An IoT Software Architecture for an Evacuable Building ArchitectureAn IoT Software Architecture for an Evacuable Building Architecture
An IoT Software Architecture for an Evacuable Building Architecture
 
Web Engineering L8: User-centered Design (8/8)
Web Engineering L8: User-centered Design (8/8)Web Engineering L8: User-centered Design (8/8)
Web Engineering L8: User-centered Design (8/8)
 
Web Engineering L7: Sequence Diagrams and Design Decisions (7/8)
Web Engineering L7: Sequence Diagrams and Design Decisions (7/8)Web Engineering L7: Sequence Diagrams and Design Decisions (7/8)
Web Engineering L7: Sequence Diagrams and Design Decisions (7/8)
 
Web Engineering L6: Software Architecture for the Web (6/8)
Web Engineering L6: Software Architecture for the Web (6/8)Web Engineering L6: Software Architecture for the Web (6/8)
Web Engineering L6: Software Architecture for the Web (6/8)
 
Web Engineering L5: Content Model (5/8)
Web Engineering L5: Content Model (5/8)Web Engineering L5: Content Model (5/8)
Web Engineering L5: Content Model (5/8)
 
Web Engineering L3: Project Planning (3/8)
Web Engineering L3: Project Planning (3/8)Web Engineering L3: Project Planning (3/8)
Web Engineering L3: Project Planning (3/8)
 
Web Engineering L2: Requirements Elicitation for the Web (2/8)
Web Engineering L2: Requirements Elicitation for the Web (2/8)Web Engineering L2: Requirements Elicitation for the Web (2/8)
Web Engineering L2: Requirements Elicitation for the Web (2/8)
 
Web Engineering L1: introduction to Web Engineering (1/8)
Web Engineering L1: introduction to Web Engineering (1/8)Web Engineering L1: introduction to Web Engineering (1/8)
Web Engineering L1: introduction to Web Engineering (1/8)
 
Web Engineering L4: Requirements and Planning in concrete (4/8)
Web Engineering L4: Requirements and Planning in concrete (4/8)Web Engineering L4: Requirements and Planning in concrete (4/8)
Web Engineering L4: Requirements and Planning in concrete (4/8)
 
Collaborative aspects of Decision Making and its impact on Sustainability
Collaborative aspects of Decision Making and its impact on SustainabilityCollaborative aspects of Decision Making and its impact on Sustainability
Collaborative aspects of Decision Making and its impact on Sustainability
 
Engineering Cyber Physical Spaces
Engineering Cyber Physical SpacesEngineering Cyber Physical Spaces
Engineering Cyber Physical Spaces
 
I progetti UnivAq-UFFIZI, INCIPICT, e  CUSPIS
I progetti UnivAq-UFFIZI, INCIPICT, e  CUSPISI progetti UnivAq-UFFIZI, INCIPICT, e  CUSPIS
I progetti UnivAq-UFFIZI, INCIPICT, e  CUSPIS
 
Exploring the Temporal Aspects of Software Architecture
Exploring the Temporal Aspects of Software ArchitectureExploring the Temporal Aspects of Software Architecture
Exploring the Temporal Aspects of Software Architecture
 

Dernier

Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 

Dernier (20)

Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 

TAROT2013 Testing School - Tanja Vos presentation

  • 1. The FITTEST project is funded by the European Commission (FP7-ICT-257574) Tanja E.J. Vos Centro de Investigación en Métodos de Producción de Software Universidad Politecnica de Valencia Valencia, Spain tvos@pros.upv.es 1 European initiatives where academia and industry get together.  
  • 2. The FITTEST project is funded by the European Commission (FP7-ICT-257574) Overview   •  EU  projects  (with  SBST)  that  I  have  been  coordina;ng:   –  EvoTest  (2006-­‐2009)   –  FITTEST  (2010-­‐2013)   •  What  is  means  to  coordinate  them  and  how  they  are  structured   •  How  do  we  evaluate  their  results  through  academia-­‐industry   projects.   2
  • 3. The FITTEST project is funded by the European Commission (FP7-ICT-257574) EvoTest   •  Evolutionary Testing for Complex Systems •  September 2006– September 2009 •  Total costs: 4.300.000 euros •  Partners:   –  Universidad  Politecnica  de  Valencia  (Spain)   –  University  College  London  (United  Kingdom)   –  Daimler  Chrysler  (Germany)   –  Berner  &  MaVner  (Germany)   –  Fraunhofer  FIRST  (Germany)   –  Motorola  (UK)   –  Rila  Solu;ons  (Bulgaria)   •  Website  does  not  work  anymore
  • 4. The FITTEST project is funded by the European Commission (FP7-ICT-257574) EvoTest  objec;ves/results   •  Apply  Evolu;onary  Search  Based  Tes;ng  techniques  to  solve  tes;ng  problems   from  a  wide  spectrum  of  complex  real  world  systems    in  an  industrial  context.     •  Improve  the  power  of  evolu;onary  algorithms  for  searching  important  test   scenarios,  hybridising  with  other  techniques:   –  other  general-­‐purpose  search  techniques,   –  other  advanced  so_ware  engineering  techniques,  such  as  slicing  and  program  transforma;on.     •  An  extensible  and  open  Automated  Evolu;onary  Tes;ng  Architecture  and   Framework  will  be  developed.  This  will  provide  general  components  and   interfaces  to  facilitate  the  automa;c  genera;on,  execu;on,  monitoring  and   evalua;on  of  effec;ve  test  scenarios.       4
  • 5. The FITTEST project is funded by the European Commission (FP7-ICT-257574) FITTEST   •  Future Internet Testing •  September 2010 – December 2013 •  Total costs: 5.845.000 euros •  Partners:   –  Universidad  Politecnica  de  Valencia  (Spain)   –  University  College  London  (United  Kingdom)   –  Berner  &  MaVner  (Germany)   –  IBM  (Israel)   –  Fondazione  Bruno  Kessler  (Italy)   –  Universiteit  Utrecht  (The  Netherlands)   –  So_team  (France)     •   hVp://www.pros.upv.es/fiVest/  
  • 6. The FITTEST project is funded by the European Commission (FP7-ICT-257574) •  Future Internet Applications –  Characterized by an extreme high level of dynamism –  Adaptation to usage context (context awareness) –  Dynamic discovery and composition of services –  Etc.. •  Testing of these applications gets extremely important •  Society depends more and more on them •  Critical activities such as social services, learning, finance, business. •  Traditional testing is not enough –  Testwares are fixed •  Continuous testing is needed –  Testwares that automatically adapt to the dynamic behavior of the Future Internet application –  This is the objective of FITTEST FITTEST  objec;ves/results  
  • 7. The FITTEST project is funded by the European Commission (FP7-ICT-257574) ORACLES RUN SUT GENERATE TEST CASES AUTOMATE TEST CASES SUT INSTRUMENT COLLECT & PREPROCESS LOGS LOGGING TEST-WARE GENERATION ANALYSE & INFER MODELS ANALYSE & INFER ORACLES EXECUTE TEST CASES TEST EVALUATION EVALUATE TEST CASES MODEL BASED ORACLES HUMAN ORACLES TEST RESULTS TEST EXECUTION FREE ORACLES Model based oracles PROPERTIES BASED ORACLES Log based oracles End-users PATTERN BASED ORACLES MANUALLY Domain experts Domain Input Specifications OPTIONAL MANUAL EXTENSIONS How  does  it  work?   LOGGING 1.  Run the target System that is Under Test (SUT) 2.  Collect the logs it generates This can be done by: •  real usage by end users of the application in the production environment •  test case execution in the test environment.
  • 8. The FITTEST project is funded by the European Commission (FP7-ICT-257574) How  does  it  work?     ORACLES RUN SUT GENERATE TEST CASES AUTOMATE TEST CASES UMENT COLLECT & PREPROCESS LOGS LOGGING TEST-WARE GENERATION ANALYSE & INFER MODELS ANALYSE & INFER ORACLES EXECUTE TEST CASES TEST EVALUATION EVALUATE TEST CASES MODEL BASED ORACLES HUMAN ORACLES TEST RESULTS TEST EXECUTION FREE ORACLES Model based oracles PROPERTIES BASED ORACLES Log based oracles sers PATTERN BASED ORACLES MANUALLY experts Domain Input Specifications OPTIONAL MANUAL EXTENSIONS GENERATION 1.  Analyse the logs 2.  Generate different testwares: •  Models •  Domain Input Specification •  Oracles 3.  Use these to generate and automate a test suite consisting off: •  Abstract test cases •  Concrete test cases •  Pass/Fail Evaluation criteria
  • 9. The FITTEST project is funded by the European Commission (FP7-ICT-257574) How  does  it  work?     ORACLES RUN SUT GENERATE TEST CASES AUTOMATE TEST CASES SUT INSTRUMENT COLLECT & PREPROCESS LOGS LOGGING TEST-WARE GENERATION ANALYSE & INFER MODELS ANALYSE & INFER ORACLES EXECUTE TEST CASES TEST EVALUATION EVALUATE TEST CASES MODEL BASED ORACLES HUMAN ORACLES TEST RESULTS TEST EXECUTION FREE ORACLES Model based oracles PROPERTIES BASED ORACLES Log based oracles End-users PATTERN BASED ORACLES MANUALLY Domain experts Domain Input Specifications OPTIONAL MANUAL EXTENSIONS Execute the test cases and start a new test cycle for continuous testing and adaptation of the test wares!
  • 10. The FITTEST project is funded by the European Commission (FP7-ICT-257574) What  does  it  mean  to  be  an  EU     project  coordinator   •  Understand what the project is about, what needs to be done and what is most important. •  Do NOT be afraid to get your hands dirty. •  Do NOT assume people are working as hard on the project as you are (or are as enthusiastic as you ;-) •  Do NOT assume that the people that ARE responsible for some tasks TAKE this responsibility •  Get CC-ed in all emails and deal with it •  Stalk people (email, sms, whatsapp, skype, voicemail messages) •  Be patient when explaining the same things over and over again 10
  • 11. The FITTEST project is funded by the European Commission (FP7-ICT-257574) EU  project  structures   •  We have: WorkPackages (WP) •  These are composed of: Tasks •  These result in: Deliverables WP   Project  Management   WP   Exploita;on  and  Dissemina;on   WP:  Integrated  it   all  together  in  a   superduper   solu;on  that   industry  needs   WP:  Do  some  research   WP:  And  some  more   WP:  Do  more  research   WP:  And  more……..   WP:  Evaluate  Your  Results  through  Case  Studies  
  • 12. The FITTEST project is funded by the European Commission (FP7-ICT-257574) EU  project   how  to  evaluate  your  results   •  You need to do studies that evaluate the resulting testing tools/techniques within a real industrial environment 12
  • 13. The FITTEST project is funded by the European Commission (FP7-ICT-257574) WE  NEED  MORE  THAN  …   Glenford Myers 1979! Triangle Problem Test a program which can return the type of a triangle based on the widths of the 3 sides For  evalua;on   of  tes;ng  tools   we  need  more!   b a c α β γ
  • 14. The FITTEST project is funded by the European Commission (FP7-ICT-257574) WE  NEED….   Real  people   Real  faults   Real  systems   Real  Tes;ng   Environments   Real  Tes;ng   Processes   Empirical  studies   with  
  • 15. The FITTEST project is funded by the European Commission (FP7-ICT-257574) What  are  empirical  studies   •  [Wikipedia]  Empirical  research  is  a  way  of  gaining  knowledge  by  means  of   direct  and  indirect  observa8on  or  experience.   •  [PPV00]  An  empirical  study  is  really  just  a  test  that  compares  what  we   believe  to  what  we  observe.  Such  tests  when  wisely  constructed  and   executed  play  a  fundamental  role  in  so?ware  engineering,  helping  us   understand  how  and  why  things  work.   •  Collec;ng  data:   –  Quan;ta;ve  data  -­‐>  numeric  data   –  Qualita;ve  data  -­‐>  observa;ons,  interviews,  opinions,  diaries,  etc.   •  Different  kinds  are  dis;nguished  in  literature  [WRH+00,  RH09]   –  Controlled  experiments   –  Surveys   –  Case  Studies   –  Ac;on  research   15
  • 16. The FITTEST project is funded by the European Commission (FP7-ICT-257574) Controlled  experiments   What   Experimental   inves;ga;on   of   hypothesis   in   a   laboratory   semng,   in   which   condi;ons  are  set  up  to  isolate  the  variables  of  interest  ("independent  variables")   and   test   how   they   affect   certain   measurable   outcomes   (the   "dependent   variables")   Good  for   –  Quan;ta;ve  analysis  of  benefits  of  a  tes;ng  tool  or  technique   –  We  can  use  methods  showing  sta;s;cal  significance   –  We  can  demonstrate  how  scien;fic  we  are!  [EA06]   Disadvantages   –  Limited  confidence  that  laboratory  set-­‐up  reflects  the  real  situa;on   –  ignores  contextual  factors  (e.g.  social/organiza;onal/poli;cal  factors)   –  extremely  ;me-­‐consuming   See:  [BSH86,  CWH05,  PPV00,  Ple95]     16
  • 17. The FITTEST project is funded by the European Commission (FP7-ICT-257574) Surveys   What   Collec;ng  informa;on  to  describe,  compare  or  explain  knowledge,  amtudes  and   behaviour  over  large  popula;ons  using  interviews  or  ques0onnaires.   Good  for   –  Quan;ta;ve  and  qualita;ve  data   –  Inves;ga;ng  the  nature  of  a  large  popula;on   –  Tes;ng  theories  where  there  is  liVle  control  over  the  variables   Disadvantages   –  Difficul;es  of  sampling  and  selec;on  of  par;cipants   –  Collected  informa;on  tends  to  subjec;ve  opinion   See:  [PK01-­‐03]     17
  • 18. The FITTEST project is funded by the European Commission (FP7-ICT-257574) Case  Studies   What   A  technique  for  detailed  exploratory  inves;ga;ons  that  aVempt  to  understand   and  explain  phenomenon  or  test  theories  within  their  context   Good  for   –  Quan;ta;ve  and  qualita;ve  data   –  Inves;ga;ng  capability  of  a  tool  within  a  specific  real  context   –  Gaining  insights  into  chains  of  cause  and  effect   –  Tes;ng  theories  in  complex  semngs  where  there  is  liVle  control  over  the   variables  (Companies!)   Disadvantages   –  Hard  to  find  good  appropriate  case  studies   –  Hard  to  quan;fy  findings   –  Hard  to  build  generaliza;ons  (only  context)   See:  [RH09,  EA06,  Fly06]       18
  • 19. The FITTEST project is funded by the European Commission (FP7-ICT-257574) Ac;on  Research   What   research   ini;ated   to   solve   an   immediate   problem   (or   a   reflec;ve   process   of   progressive   problem   solving)   involving   a   process   of   ac;vely   par;cipa;ng   in   an   organiza;on’s  change  situa;on  whilst  conduc;ng  research   Good  for   –  Quan;ta;ve  and  qualita;ve  data   –  When  the  goal  is  solving  a  problem.   –  When  the  goal  of  the  study  is  change.   Disadvantages   –  Hard  to  quan;fy  findings   –  Hard  to  build  generaliza;ons  (only  context)   See:  [RH09,  EA06,  Fly06,  Rob02]       19
  • 20. The FITTEST project is funded by the European Commission (FP7-ICT-257574) EU  project   how  to  evaluate  your  results   •  You need to do empirical studies that evaluate the resulting testing tools/techniques within a real industrial environment •  The empirical study that best fits our purposes is Case Study –  Evaluate the capability of our testing techiques/tools –  In a real industrial context –  Comparing to current testing practice 20
  • 21. The FITTEST project is funded by the European Commission (FP7-ICT-257574) Case  studies:  not  only  for  jus;fying   research  projects   •  We  need  to  apply  our  results  in  industry  to  understand  the  problems   they  have  and  if  we  are  going  the  right  direc;on  to  solve  them.   •  Real  need  in  so_ware  engineering  industry  to  have  general  guidelines   on   what   tes;ng   techniques   and   tools   to   use   for   different   tes;ng   objec;ves,  and  how  usable  these  techniques  are.   •  Up  to  date  these  guidelines  do  not  exist.   •  If  we  would  have  a  body  of  documented  experiences  and  knowledge   from  which  the  needed  guidelines  can  be  extracted   •  With   these   guidelines,   tes;ng   prac;cioners   might   make   informed   decisions  about  which  techniques  to  use  and  es;mate  the  ;me/effort   that  is  needed.   21
  • 22. The FITTEST project is funded by the European Commission (FP7-ICT-257574) The  challenge   TO DO THIS WE HAVE TO: •  Perform more evaluative empirical case studies in industrial environments •  Carry out these studies by following the same methodology, to enhance the comparison among the testing techniques and tools, •  Involve realistic systems, environments and subjects (and not toy-programs and students as is the case in most current work). •  Do the studies thoroughly to ensure that any benefit identified during the evaluation study is clearly derived from the testing technique studied, and also to ensure that different studies can be compared. FOR THIS WE NEED: •  A general methodological evaluation framework that can simplify the design of case studies for comparing software testing techniques and make the results more precise, reliable, and easy to compare. To create a body of evidence consisting of evaluative studies of testing techniques and tools that can be used to understand the needs of industry and derive general guidelines about their usability and applicability in industry.
• 23. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
The idea
[Diagram] A General Framework for Evaluating Testing Techniques and Tools is instantiated into primary case studies (CS1, CS2, ..., CSn) for each testing technique or tool (TT1, TT2, ..., TTn). Together these studies form a Body of Evidence, on which a secondary study (EBSE) can be performed to obtain answers to general questions about adopting different software testing techniques and/or tools.
• 24. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Very brief... what is EBSE
Evidence-Based Software Engineering
• The essence of the evidence-based paradigm is systematically collecting and analysing all of the available empirical data about a given phenomenon, in order to obtain a much wider and more complete perspective than would be obtained from an individual study, not least because each study takes place within a particular context and involves a specific set of participants.
• The core tool of the evidence-based paradigm is the Systematic Literature Review (SLR):
– A secondary study
– Gathering and analysing primary studies
• See: http://www.dur.ac.uk/ebse/about.php
• 25. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
The idea (recap)
[Diagram repeated] The General Framework for Evaluating Testing Techniques and Tools is instantiated into primary case studies per testing technique/tool (TT1...TTn); the resulting Body of Evidence feeds a secondary study (EBSE) that answers general questions about adopting different software testing techniques and/or tools.
  • 26. The FITTEST project is funded by the European Commission (FP7-ICT-257574) Empirical  research  with  industry  =  difficult   •  Not “rocket science”-difficult
  • 27. The FITTEST project is funded by the European Commission (FP7-ICT-257574) Empirical  research  with  industry  =  difficult   •  But “communication science”-difficult
• 28. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
"communication science"-difficult
[Cartoon] A researcher and a practitioner in a company talk past each other about effort estimates: "Only takes 1 hour", "Not so long", "5 minutes".
• 29. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Communication is extremely difficult
• 30. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
[image-only slide]
• 31. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Examples...
Academia:
• Wants to empirically evaluate T
• What techniques/tools can we compare with?
• Why don't you know that?
• That does not take so much time!
• Finding real faults would be great!
• Can we then inject faults?
• How many people can use it?
• Is there historical data?
• But you do have that information?
Industry:
• Wants to execute and use T to see what happens
• We use intuition!
• You want me to know all that?
• That much time!?
• We cannot give this information.
• Not artificial ones, we really need to know if this would work for real faults.
• We can assign 1 person.
• That is confidential.
• Oh.., I thought you did not need that.
• 32. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
With the objective of improving communication and reducing some of these barriers:
• Use a general methodological framework
– As a vehicle of communication
– To simplify the design
– To make sure that studies can be compared and aggregated
• 33. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Existing work
• By Lott & Rombach, Eldh, Basili, Do et al., Kitchenham et al.
• These describe organizational frameworks, i.e.:
– General steps
– Warnings when designing
– Confounding factors that should be minimized
• We aimed to define a methodological framework that specifies how to evaluate software testing techniques, i.e.:
– The research questions that can be posed
– The variables that can be measured
– Etc.
• 34. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
The methodological framework
• Imagine a company C wants to evaluate a testing technique/tool T to see whether it is useful and worthwhile to incorporate.
• Components of the framework (each case study will be an instantiation; a sketch of how such an instantiation could be recorded follows below):
– Objectives: effectiveness, efficiency, satisfaction
– Cases or treatments (= the testing techniques/tools)
– The subjects (= practitioners that will do the study)
– The objects or pilot projects: selection criteria
– The variables and metrics: which data to collect?
– Protocol that defines how to execute and collect data
– How to analyse the data
– Threats to validity
– Toolbox
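To make the instantiation step concrete, here is a minimal sketch (our own illustration, not a FITTEST artifact) of how the components above could be recorded per case study so that studies stay comparable; all field names and values are assumptions:

```python
# Hypothetical record of one framework instantiation; field names are ours.
from dataclasses import dataclass, field

@dataclass
class CaseStudyDesign:
    objectives: list       # e.g. ["effectiveness", "efficiency", "satisfaction"]
    treatments: list       # the testing techniques/tools under evaluation
    subjects: list         # practitioners carrying out the study
    pilot_projects: list   # objects selected against the framework's criteria
    variables: dict = field(default_factory=dict)  # metric name -> how measured
    protocol: str = ""     # which scenario of the protocol applies
    threats: list = field(default_factory=list)    # threats to validity to report

# Example: a design resembling the Sulake study shown later in this talk.
study = CaseStudyDesign(
    objectives=["effectiveness", "efficiency", "satisfaction"],
    treatments=["CTE XL Professional", "current exploratory practice"],
    subjects=["1 senior tester"],
    pilot_projects=["2 nightly builds of the product"],
    protocol="scenario 4",
)
print(study.protocol)
```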
• 35. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Definition of the framework – research questions
• RQ1: How does T contribute to the effectiveness of testing when it is used in real testing environments of C and compared to the current practices of C?
• RQ2: How does T contribute to the efficiency of testing when it is used in real testing environments of C and compared to the current practices of C?
• RQ3: How satisfied (subjective satisfaction) are testing practitioners of C during the learning, installing, configuring and usage of T when used in real testing environments?
• 36. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Definition of the framework – the cases
• We reused a taxonomy from Vegas and Basili, adapted it to software testing tools, and augmented it with results from Tonella.
• The case (the testing technique or tool) should be described by:
– Prerequisites: type, life-cycle, environment (platform and languages), scalability, input, knowledge needed, experience needed
– Results: output, completeness, effectiveness, defect types, number of generated test cases
– Operation: interaction modes, user guidance, maturity, etc.
– Obtaining the tool: license, cost, support
• 37. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Definition of the framework – subjects
• Workers of C that normally use the techniques and tools with which T is being compared.
• 38. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Definition of the framework – the objects/pilot projects
• To determine how we can compare and what type of study we will do, we need answers to the following questions:
A. Will we have access to a system with known faults? What information is available about these faults?
B. Are we allowed/able to inject faults into the system?
C. Does the company gather data from projects as standard practice? What data is this? Can this data be made available for comparison? Is there a company baseline? Do we have access to a sister project?
D. Does company C have enough time and resources to execute various rounds of tests? Or, more concretely:
• Is company C willing to make a new test suite TSna with some technique/tool Ta already used in company C?
• Is company C willing to make a new test suite TSnn with some technique/tool Tn that is also new to company C?
• Can we use an existing test suite TSe for comparison? (Do we know which techniques were used to create that test suite, and how much time it took?)
• 39. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Definition of the framework – protocol
[Flowchart] The protocol is a decision tree that leads to one of seven numbered scenarios (detailed on the next slide):
– Can we inject faults? If yes: training on T, inject faults, apply T to make TST and collect data.
– Can we compare with an existing test suite from company C (TSC)?
– Can company C make another test suite TSN for comparison, using Tknown or Tunknown? If so: apply Tknown or Tunknown to make TSN and collect data.
– Do we have a version with known faults? Do we have a company baseline?
The answers to these questions determine which of scenarios 1-7 applies.
• 40. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Definition of the framework – scenarios
• Remember:
– Does company C have enough time and resources to execute various rounds of tests? Or, more concretely:
• Is company C willing to make a new test suite TSna with some technique/tool Ta already used in company C?
• Is company C willing to make a new test suite TSnn with some technique/tool Tn that is also new to company C?
• Can we use an existing test suite TSe for comparison? (Do we know which techniques were used to create it, and how much time it took?)
• Scenario 1: qualitative assessment only (Qualitative Effects Analysis)
• Scenario 2: Scenario 1 + quantitative analysis based on a company baseline
• Scenario 3: (Scenario 1 or 2) + quantitative analysis of the fault detection ratio (FDR)
• Scenario 4: (Scenario 1 or 2) + quantitative comparison of T and TSe
• Scenario 5: Scenario 4 + FDR of T and TSe
• Scenario 6: (Scenario 1 or 2) + quantitative comparison of T and (Ta or Tn)
• Scenario 7: Scenario 6 + FDR of T and (Ta or Tn)
A sketch of this decision logic in code follows below.
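As announced above, a hedged sketch of the protocol's decision logic, read off the flowchart on slide 39 and the scenario list above; the exact branch ordering is our interpretation, not a FITTEST artifact:

```python
def select_scenario(has_existing_suite: bool,
                    can_build_new_suite: bool,
                    has_fault_data: bool,   # known faults exist, or faults may be injected
                    has_baseline: bool) -> int:
    """Return the evaluation scenario (1-7) implied by the protocol answers."""
    if has_existing_suite:                   # compare T against TSe
        return 5 if has_fault_data else 4
    if can_build_new_suite:                  # compare T against Ta or Tn
        return 7 if has_fault_data else 6
    if has_fault_data:                       # FDR analysis only
        return 3
    return 2 if has_baseline else 1          # baseline or purely qualitative

# Sanity check against the two studies later in this talk: Sulake (existing
# suite, no fault data) -> 4; IBM (existing suite, injected faults) -> 5.
print(select_scenario(True, False, False, False))  # 4
print(select_scenario(True, False, True, False))   # 5
```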
• 41. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
If we can inject faults, take care that:
1. The artificially seeded faults are similar to real faults that naturally occur in real programs due to mistakes made by developers.
– To identify realistic fault types, a history-based approach can be used: "real" faults can be fetched from the bug tracking system, making sure that these reported faults are a good representative of the faults introduced by developers during implementation.
2. The faults are injected in code that is covered by an adequate number of test cases
– e.g., they may be seeded in code that is executed by more than 20 percent and less than 80 percent of the test cases.
3. The faults are injected "fairly", i.e., an adequate number of instances of each fault type is seeded.
A sketch of guidelines 2 and 3 in code follows below.
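A minimal sketch of guidelines 2 and 3, assuming per-location coverage data is available as a mapping from code location to the fraction of test cases executing it; the function name, thresholds default, and data are illustrative:

```python
# Hypothetical helper: pick seeding sites covered by 20-80% of the test cases
# (guideline 2) and allocate the same number of sites per fault type (guideline 3).
import random

def pick_injection_sites(coverage, fault_types, per_type, lo=0.2, hi=0.8, seed=42):
    rng = random.Random(seed)                       # reproducible selection
    candidates = [loc for loc, frac in coverage.items() if lo < frac < hi]
    rng.shuffle(candidates)
    plan, i = {}, 0
    for ft in fault_types:                          # fair: same count per type
        plan[ft] = candidates[i:i + per_type]
        i += per_type
    return plan

coverage = {"mod_a:12": 0.55, "mod_a:40": 0.95, "mod_b:7": 0.35,
            "mod_c:3": 0.10, "mod_c:21": 0.60, "mod_d:5": 0.45}
print(pick_injection_sites(coverage, ["off-by-one", "wrong-operator"], per_type=2))
```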
• 42. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Definition of the framework – data to collect
• Effectiveness
– Number of test cases designed or generated
– How many invalid test cases are generated
– How many repeated test cases are generated
– Number of failures observed
– Number of faults found
– Number of false positives (the test is marked as Failed when the functionality is working)
– Number of false negatives (the test is marked as Passed when the functionality is not working)
– Type and cause of the faults that were found
– Estimation (or, when possible, measurement) of the coverage reached
• Efficiency
• Subjective satisfaction
• 43. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Definition of the framework – data to collect (cont.)
• Effectiveness
• Efficiency
– Time needed to learn the testing method
– Time needed to design or generate the test cases
– Time needed to set up the testing infrastructure (install, configure, develop test drivers, etc.) (quantitative). (Note: if software has to be developed, or other major configuration/installation efforts are needed, it may be a good idea to maintain working diaries.)
– Time needed to test the system and observe the failures (i.e. planning, implementation and execution), in hours (quantitative)
– Time needed to identify the fault type and cause for each observed failure (i.e. time to isolate) (quantitative)
• Subjective satisfaction
• 44. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Definition of the framework – data to collect (cont.)
• Effectiveness
• Efficiency
• Subjective satisfaction: a mixture of methods
– SUS score (10-question questionnaire with 5-point Likert scales and a total score)
– 5 reactions (through reaction cards) that will be used to create a word cloud and Venn diagrams
– Emotional face reactions during semi-structured interviews (faces will be evaluated on a 5-point Likert scale from "not at all like this" to "very much like this")
– Subjective opinions about the tool
• 45. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
System Usability Scale (SUS)
(Each item is rated from "Strongly disagree" to "Strongly agree" on a 5-point scale.)
1. I think that I would like to use this system frequently
2. I found the system unnecessarily complex
3. I thought the system was easy to use
4. I think that I would need the support of a technical person to be able to use this system
5. I found the various functions in this system were well integrated
6. I thought there was too much inconsistency in this system
7. I would imagine that most people would learn to use this system very quickly
8. I found the system very cumbersome to use
9. I felt very confident using the system
10. I needed to learn a lot of things before I could get going with this system
© Digital Equipment Corporation, 1986 — www.usability.serco.com/trump/documents/Suschapt.doc
• 46. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Why SUS
• Studies (e.g. [TS04, BKM08]) have shown that this simple questionnaire gives the most reliable results.
• SUS is technology agnostic, making it flexible enough to assess a wide range of interface technologies.
• The survey is relatively quick and easy to use by both study participants and administrators.
• The survey provides a single score on a scale that is easily understood by the wide range of people (from project managers to computer programmers) who are typically involved in the development of products and services and who may have little or no experience in human factors and usability.
• The survey is nonproprietary, making it a cost-effective tool as well.
• 47. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
SUS Scoring
• SUS yields a single number representing a composite measure of the overall usability of the system being studied. Note that scores for individual items are not meaningful on their own.
• To calculate the SUS score, first sum the score contributions from each item:
– Each item's score contribution ranges from 0 to 4.
– For items 1, 3, 5, 7 and 9, the score contribution is the scale position minus 1.
– For items 2, 4, 6, 8 and 10, the contribution is 5 minus the scale position.
– Multiply the sum of the scores by 2.5 to obtain the overall SUS value.
• SUS scores have a range of 0 to 100.
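The scoring rules above translate directly into code; this is a plain transcription of the slide, with an invented example response:

```python
def sus_score(responses):
    """responses: list of 10 scale positions, each 1-5, item 1 first."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs 10 answers on a 1-5 scale")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)  # odd: pos-1, even: 5-pos
    return total * 2.5                               # overall SUS, range 0-100

print(sus_score([4, 2, 4, 1, 4, 2, 5, 2, 4, 2]))     # invented answers -> 80.0
```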
• 48. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
SUS is not enough...
• In a literature review of 180 published usability studies, Hornbæk [Horn06] concludes that measures of satisfaction should be extended beyond questionnaires.
• So we add more...
• 49. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Reaction Cards
Developed by and © 2002 Microsoft Corporation. All rights reserved.
• 50. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Emotional face reactions
• The idea is to elicit feedback about the product, particularly emotions that arose for the participants while talking about the product (e.g. frustration, happiness).
• We will videotape the users while they respond to the following two questions during a semi-structured interview:
• Would you recommend this tool to other colleagues?
– If not, why?
– If yes, what arguments would you use?
• Do you think you can persuade your management to invest in a tool like this?
– If not, why?
– If yes, what arguments would you use?
(Responses are rated on a 7-point scale from "Not at all like this" to "Very much like this".)
• 51. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Analysing and interpreting the data
• Depends on the amount of data we have.
• If we only have 1 value for each variable, no analysis techniques are available and we just present and interpret the data.
• If we have sets of values for a variable, then we need to use statistical methods:
– Descriptive statistics
– Statistical tests (or significance testing). In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level.
• When evaluating SBST tools, we will always have sets of values for most of the variables, to deal with randomness.
• 52. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Descriptive statistics
• Mean, median, mode, standard deviation, frequency, correlation, etc.
• Graphical visualisation: scatter plot, box plot, histogram, pie charts
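For instance, a one-variable summary using only Python's standard library (the run data is invented):

```python
import statistics as st

runs = [3, 5, 4, 6, 4, 5, 3, 4]        # e.g. faults found per run of a tool
print("mean  :", st.mean(runs))        # 4.25
print("median:", st.median(runs))      # 4.0
print("mode  :", st.mode(runs))        # 4
print("stdev :", st.stdev(runs))       # sample standard deviation
```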
• 53. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Statistical tests
Most important is finding the right method.
http://science.leidenuniv.nl/index.php/ibl/pep/people/Tom_de_Jong/Teaching
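As an illustration of "finding the right method": data from randomized testing tools is rarely normally distributed, so a non-parametric test such as Mann-Whitney U is often a safer default than a t-test. A sketch with invented data, assuming SciPy is available:

```python
from scipy import stats

tool_a = [3, 5, 4, 6, 4, 5, 3, 4]   # faults found in 8 runs of tool A (invented)
tool_b = [5, 6, 7, 5, 6, 8, 6, 7]   # faults found in 8 runs of tool B (invented)

# Two-sided Mann-Whitney U test: is one tool's distribution shifted vs the other?
u, p = stats.mannwhitneyu(tool_a, tool_b, alternative="two-sided")
print(f"U={u}, p={p:.4f}")          # compare p against the chosen significance level
```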
• 54. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Definition of the framework – threats
• Threats to validity (confounding factors) have to be minimized.
• These are the effects or situations that might jeopardize the validity of your results.
• Those that cannot be prevented have to be reported.
• When working with people we have to consider many sociological effects.
• Let us look at a couple of well-known ones to give you an idea:
– The learning curve effect
– The Hawthorne effect
– The placebo effect
– Etc.
• 55. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
The learning curve effect
• When using new methods/tools, people gain familiarity with their application over time (= learning curve).
– Initially they are likely to use them more ineffectively than they might after a period of familiarisation/learning.
• Thus the learning curve effect will tend to counteract any positive effects inherent in the new method/tool.
• In the context of evaluating methods/tools there are two basic strategies to minimise the learning curve effect (not mutually exclusive):
1. Provide appropriate training before undertaking an evaluation exercise;
2. Separate pilot projects aimed at gaining experience with a method/tool from pilot projects that are part of an evaluation exercise.
• 56. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
The Hawthorne effect
• When an evaluation is performed, staff working on the pilot project(s) may have the perception that they are working under more management scrutiny than normal and may therefore work more conscientiously.
• The name comes from the Hawthorne factory studies (where productivity rose whether the lights were set low or high).
• The Hawthorne effect would tend to exaggerate positive effects inherent in a new method/tool.
• A strategy to minimise the Hawthorne effect is to ensure that a similar level of management scrutiny is applied to the control projects in your case study (i.e. project(s) using the current method/tool) as is applied to the projects using the new method/tool.
• 57. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
The placebo effect
• In medical research, patients who are deliberately given ineffectual treatments recover if they believe that the treatment will cure them.
• Likewise, a software engineer who believes that adopting some practice (e.g., wearing a pink t-shirt) will improve the reliability of his code may succeed in producing more reliable code.
• Such a result could not be generalised.
• In medicine, the placebo effect is minimized by not informing the subjects.
• This cannot be done in the context of testing tool evaluations.
• When evaluating methods and tools, the best you can do is to:
– Assign staff to pilot projects using your normal project staffing methods, and hope that the actual selection of staff includes the normal mix of enthusiasts, cynics and no-hopers that normally comprise your project teams.
– Make a special effort to avoid staffing pilot projects with staff who have a vested interest in the method/tool (i.e. staff who developed or championed it) or a vested interest in seeing it fail (i.e. staff who really hate change rather than just resent it).
• This is a bit like selecting a jury: initially the selection of potential jurors is random, but there is additional screening to avoid jurors with identifiable bias.
• 58. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Definition of the framework – threats (cont.)
• There are many more factors that all need to be identified.
• Not only the people but also the technology:
– Did the measurement tools (e.g. coverage) really measure what we thought?
– Are the injected faults really representative?
– Were the faults injected fairly?
– Are the pilot project and software representative?
– Were the oracles used reliable?
– Etc.
• 59. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Definition of the framework – toolbox
• Toolbox
– Demographic questionnaire: answered by the testers before performing the test; it aims to capture the testers' characteristics: level of experience in using the tool, years, job, knowledge of similar tools.
– Satisfaction questionnaire (SUS): extracts the testers' satisfaction when they perform the evaluation.
– Reaction cards
– Questions for (taped) semi-structured interviews
– Process for investigating the face reactions in videos
– A fault taxonomy to classify the found faults
– A software testing technique and tool classification/taxonomy
– Working diaries
– A fault template to document each fault detected by the tests. The template contains the following information:
• Time spent to detect the fault
• Test case that found the fault
• Cause of the fault: mistake in the implementation, mistake in the design, mistake in the analysis
• Manifestation of the fault in the code
• 60. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Applying or instantiating the framework
• Search-based structural testing tool [Vos et al. 2012]
• Search-based functional testing tool [Vos et al. 2013]
• Web testing techniques for AJAX applications [MRT08]
• Commercial combinatorial testing tool at Sulake (to be presented at the ISSTA workshop next week)
• Automated test case generation at IBM (has been sent to ESEM)
• We have finished other instances:
– Commercial combinatorial testing tool at SOFTEAM (has been sent to ESEM)
• Currently we are working on more instantiations:
– Regression testing prioritization technique
– Continuous testing tool at SOFTEAM
– Rogue User Testing at SOFTEAM
– ...
• 61. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Combinatorial Testing
example case study Sulake: Context
• Sulake is a Finnish company
• Develops social entertainment games
• Main product: Habbo Hotel
– World's largest virtual community for teenagers
– Millions of teenagers a week all over the world (ages 13-18)
– Access directly through the browser or Facebook
– Available in 11 languages
– 218,000,000 registered users
– 11,000,000 visitors / month
• The system can be accessed through a wide variety of browsers and Flash players (and their versions) that run on different operating systems!
• Which combinations to use when testing the system?!
• 62. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Can a tool help?
• What about the CTE XL Professional, or CTE for short?
• Classification Tree Editor:
– Model your combinatorial problem in a tree
– Indicate the priorities of the combinations
– The tool automatically generates the best test cases!
• 63. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Combinatorial Testing
example case study Sulake: Research Questions
• What do we want to find out?
• Research questions:
– RQ1: Compared to the current test suites used for testing at Sulake, can the test cases generated by the CTE contribute to the effectiveness of testing when used in real testing environments at Sulake?
– RQ2: How much effort would be required to introduce the CTE into the testing processes currently in place at Sulake?
– RQ3: How much effort would be required to add the generated test cases to the testing infrastructure currently used at Sulake?
– RQ4: How satisfied are Sulake testing practitioners during the learning, installing, configuring and usage of the CTE when it is used in a real testing environment?
• 64. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
The testing tools evaluated (the cases or treatments)
• Current combinatorial testing practice at Sulake:
– Exploratory testing with feature coverage as the objective
– Combinatorial aspects are taken into account based on real user information (i.e. browsers, OS, Flash, etc.)
COMPARED TO
• Classification Tree Editor (CTE):
– Classify the combinatorial aspects as a classification tree
– Generate prioritized test cases and select
A sketch of the "all pairs coverage" metric reported later follows below.
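As referenced above, a sketch of how an "all pairs coverage" figure like the one reported for this study can be computed: the fraction of all parameter-value pairs that a test suite exercises. The parameter domains are illustrative, not Sulake's real browser/OS/Flash matrix:

```python
from itertools import combinations, product

domains = {"browser": ["Firefox", "Chrome", "IE"],   # illustrative values only
           "os": ["Windows", "OSX"],
           "flash": ["10", "11"]}

def all_pairs(domains):
    """Every (param, value) pair of pairs across two distinct parameters."""
    pairs = set()
    for (p1, v1s), (p2, v2s) in combinations(sorted(domains.items()), 2):
        pairs |= {((p1, a), (p2, b)) for a, b in product(v1s, v2s)}
    return pairs

def pairwise_coverage(suite, domains):
    target = all_pairs(domains)
    covered = set()
    for tc in suite:                            # tc: dict parameter -> value
        for p1, p2 in combinations(sorted(tc), 2):
            covered.add(((p1, tc[p1]), (p2, tc[p2])))
    return len(covered & target) / len(target)

suite = [{"browser": "Firefox", "os": "Windows", "flash": "11"},
         {"browser": "Chrome", "os": "OSX", "flash": "10"}]
print(f"{pairwise_coverage(suite, domains):.0%}")   # 6 of 16 pairs -> 38%
```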
• 65. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Who is doing the study (the subjects)
• Subjects: 1 senior tester from Sulake (6 years software development, 8 years testing experience)
• 66. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Systems Under Test (objects or pilot projects)
• Objects:
– 2 nightly builds of Habbo
– The existing test suite that Sulake uses (TSsulake) with 42 automated test cases
– No known faults, no injection of faults
• 67. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
The protocol (scenario 4)
[Flowchart repeated] Following the protocol decision tree: faults cannot be injected, there is no version with known faults and no company baseline, but an existing test suite from the company (TSsulake) is available for comparison. This leads to scenario 4.
• 68. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Collected data
Variables | TSsulake | TScte
Measuring effectiveness:
Number of test cases | 42 | 68 (selected 42 high priority)
Number of invalid test cases | 0 | 0
Number of repeated test cases | 0 | 26
Number of failures observed | 0 | 12
Number of faults found | 0 | 2
Type and cause of the faults | N/A | 1. critical, browser hang; 2. minor, broken UI element
Feature coverage reached | 100% | 40%
All-pairs coverage reached | N/A | 80%
Measuring efficiency:
Time needed to learn the CTE testing method | N/A | 116 min
Time needed to design and generate the test suite with the CTE | N/A | 95 min (62 for tree, 33 for removing duplicates)
Time needed to set up testing infrastructure specific to the CTE | N/A | 74 min
Time needed to automate the test suite generated by the CTE | N/A | 357 min
Time needed to execute the test suite | 114 min | 183 min (both builds)
Time needed to identify fault types and causes | 0 min | 116 min
Measuring subjective satisfaction:
SUS | N/A | 50
Reaction cards | N/A | Comprehensive, Dated, Old, Sterile, Unattractive
Informal interview | N/A | video
• 69. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Descriptive statistics for efficiency (1)
[chart]
• 70. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Descriptive statistics for efficiency (2)
[chart]
• 71. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Emotional face reactions
[Video stills rated on the faces scale, covering four topics:]
1. Duplicate test cases
2. Readability of the CTE trees
3. Technical support and user manuals
4. Appearance of the tool
• 72. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Combinatorial Testing
example case study Sulake: Conclusions
• RQ1: Compared to the current test suites used for testing at Sulake, can the test cases generated by the CTE contribute to the effectiveness of testing when used in real testing environments at Sulake?
– 2 new faults were found!!
– Sulake confirmed the need for more structured combinatorial testing.
• RQ2: How much effort would be required to introduce the CTE into the testing processes currently in place at Sulake?
– The effort for learning and installing is medium, but can be justified within Sulake as it is a one-time cost.
– Designing and generating test cases suffers from duplicates that cost time to remove. The total effort is acceptable to Sulake because critical faults were discovered.
– Executing the test suite generated by the CTE takes 1 hour more than the Sulake test suite (due to more combinatorial aspects being tested). This means that Sulake cannot include these tests in the daily build, but will have to add them to nightly builds only.
• RQ3: How much effort would be required to add the generated test cases to the testing infrastructure currently used at Sulake?
– There is an effort for automating the generated test cases, but it can be justified within Sulake.
• RQ4: How satisfied are Sulake testing practitioners during the learning, installing, configuring and usage of the CTE when it is used in a real testing environment?
– The tool seems to have everything that is needed, but looks unattractive.
• 73. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Automated Test Case Generation
example case study IBM: Context
• IBM Research Labs in Haifa, Israel
• Develops a system (denoted IMP ;-) for resource management in a networked environment (servers, virtual machines, switches, storage devices, etc.)
• The IBM Lab has a designated team that is responsible for testing new versions of the product.
• The testing is done within a simulated testing environment developed by this testing team.
• The IBM Lab is interested in evaluating the automated test case generation tools developed in the FITTEST project!
• 74. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Automated Test Case Generation
example case study IBM: the research questions
• What do we want to find out?
• Research questions:
– RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the effectiveness of testing when used in real testing environments at IBM?
– RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the efficiency of testing when used in real testing environments at IBM?
– RQ3: How much effort would be required to deploy the FITTEST tools within the testing processes currently in place at IBM Research?
• 75. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
The testing tools evaluated (the cases or treatments)
• Current test case design practice at IBM:
– Exploratory test case design
– The objective of the test cases is to maximise coverage of the system use cases
COMPARED TO
• FITTEST automated test case generation tools:
– Only part of the whole continuous testing approach of FITTEST
• 76. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
FITTEST Automated Test Case Generation
Logs2FSM
• Infers FSM models from the logs
• Applies an event-based model inference approach (read more in [MTR08])
• The model-based oracles that also result from this tool refer to the use of the paths generated from the inferred FSM as oracles. If these paths, when transformed to test cases, cannot be fully executed, then the tester needs to inspect the failing paths to see whether that is due to some fault, or the paths themselves are infeasible.
FSM2Tests
• Takes FSMs and a Domain Input Specification (DIS) file created by a tester for the IBM Research SUT to generate concrete test cases.
• This component implements a technique that combines model-based and combinatorial testing (see [NMT12]):
1. generate test paths from the FSM (using various simple and advanced graph visit algorithms);
2. transform these paths into classification trees using the CTE XL format, enriched with the DIS, such as data types and partitions;
3. generate test combinations from those trees using combinatorial criteria.
A toy sketch of event-based model inference follows below.
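As referenced above, a toy illustration of event-based model inference in the spirit of Logs2FSM (emphatically not its implementation; see [MTR08] for the real approach): here states are simply named after the last observed event, and transitions are the consecutive event pairs seen in the logs:

```python
from collections import defaultdict

def infer_fsm(traces):
    """Build state -> {(event, next_state)} from event traces (toy k=1 model)."""
    transitions = defaultdict(set)
    for trace in traces:
        state = "START"
        for event in trace:
            transitions[state].add((event, event))  # next state named by event
            state = event
    return transitions

# Invented logs, standing in for the SUT's real event logs.
logs = [["login", "search", "buy", "logout"],
        ["login", "search", "search", "logout"]]
for state, outs in infer_fsm(logs).items():
    for event, nxt in sorted(outs):
        print(f"{state} --{event}--> {nxt}")
```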
• 77. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Who is doing the study (the subjects)
• Subjects:
– 1 senior tester from IBM (10 years software development, 5 years testing experience, of which 4 years with the IMP system)
– 1 researcher from FBK (10 years of experience with software development, 5 years of experience with research in testing)
• 78. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Pilot project
• Objects:
– SUT: distributed application for managing system resources in a networked environment
• A management server that communicates with multiple managed clients (= physical or virtual resources)
• Important product for IBM with real customers
• The case study will be performed on a new version of this system
– The existing test suite that IBM uses (TSibm)
• Selected from what they call System Validation Tests (SVT) -> tests for high-level, complex customer use cases
• Manually designed
• Automatically executed through activation scripts
– 10 representative faults to inject into the system
• 79. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
The protocol (scenario 5)
[Flowchart repeated] Following the protocol decision tree: faults can be injected, and an existing test suite from the company (TSibm) is available for comparison. This leads to scenario 5 (quantitative comparison of T and TSe, including fault detection ratios).
• 80. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
More detailed steps
1. Configure the simulated environment and create the logs [IBM]
2. Select test suite TSibm [IBM]
3. Write activation scripts for each test case in TSibm [IBM]
4. Generate TSfittest [FBK subject]
a. Instantiate the FITTEST components for the IMP
b. Generate the FSM with Logs2FSM
c. Define the Domain Input Specification (DIS)
d. Generate the concrete test data with FSM2Tests
5. Select and inject the faults [IBM]
6. Develop a tool that transforms the concrete test cases generated by the FITTEST tool FSM2Tests into an executable format [IBM]
7. Execute TSibm [IBM]
8. Execute TSfittest [IBM]
• 81. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Collected Measures
TABLE IV. Descriptive measures for the test suites TSibm and TSfittest:
Variable | TSibm | TSfittest
Number of abstract test cases | NA | 84
Number of concrete test cases | 4 | 3054
Number of commands (or events) | 1814 | 18520
Construction (design of the test cases) | manual | automated
Effort to create the test suite | design: 5 hours; activation scripts: 4 hours | set up FITTEST tools: 8 hours; generate the FSM: automated, less than 1 second CPU time; specify the DIS: 2 hours; generate concrete tests: automated, less than 1 minute CPU time; transform into executable format: 20 hours
TABLE II. Descriptive measures for the FSM that is used to generate TSfittest:
Number of traces used to infer the FSM | 6
Average trace length | 100
Number of nodes in the generated FSM | 51
Number of transitions in the generated FSM | 134
[The slide also shows an excerpt of the inferred FSM, with states connected by REST events such as GET/VMControl/virtualAppliances/{id}/targets.]
• 82. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Collected measures
Injected fault detection (1 = found, 0 = missed):
Suite | IF1 IF2 IF3 IF4 IF5 IF6 IF7 IF8 IF9 IF10
TS_ibm | 1 0 0 0 1 1 0 0 1 1
TS_fittest | 1 1 1 0 0 1 0 1 1 1
TSibm is substantially smaller than TSfittest in all size parameters; this is one of the evident results of automation. However, not only is TSfittest bigger, its effectiveness, measured by injected-fault coverage, is also significantly higher (50% vs 70%). Moreover, combining the TSibm and TSfittest suites increases the effectiveness to 80%. Therefore, within the context of the studied environment, the FITTEST technologies can contribute to the effectiveness of testing for IBM Research, and IBM Research has decided that, to optimize fault-finding capability, the two techniques can best be combined.
TABLE III. Efficiency measures for execution of both test suites (hours; "normalized" = divided by suite size):
Variable | TSibm | normalized | TSfittest | normalized
Execution time with fault injection | 36.75 | 9.18 | 127.87 | 1.52
Execution time without fault injection | 27.97 | 6.99 | 50.72 | 0.60
The total time to execute TSibm is smaller than for TSfittest, due to the number of concrete tests in TSfittest; normalized by size, TSfittest is cheaper per test case.
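Recomputing the fault-detection figures from the matrix above (IF4 and IF7 were found by neither suite):

```python
ts_ibm     = {"IF1", "IF5", "IF6", "IF9", "IF10"}                 # 5 of 10
ts_fittest = {"IF1", "IF2", "IF3", "IF6", "IF8", "IF9", "IF10"}   # 7 of 10

TOTAL = 10
for name, found in [("TSibm", ts_ibm), ("TSfittest", ts_fittest),
                    ("combined", ts_ibm | ts_fittest)]:
    print(f"{name}: {len(found) / TOTAL:.0%}")
# -> TSibm: 50%, TSfittest: 70%, combined: 80%
```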
• 83. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Automated Test Case Generation
example case study IBM: Conclusions
• RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the effectiveness of testing when used in real testing environments at IBM?
– TSibm finds 50% of the injected faults, TSfittest finds 70%
– Together they find 80%! -> IBM will consider combining the techniques
• RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the efficiency of testing when used in real testing environments at IBM?
– FITTEST test cases execute faster because they are smaller
– Shorter tests were good for IBM -> easier to identify faults
• RQ3: How much effort would be required to deploy the FITTEST tools within the testing processes currently in place at IBM Research?
– Found reasonable by IBM, considering that the manual tasks need to be done only once and more faults were found.
• 84. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Threats to validity
• The learning curve effect
• (and basically all the other human factors ;-)
• Missing information in the logs leads to weak FSMs.
• Incomplete specification of the DIS leads to weak concrete test cases.
• The representativeness of the injected faults with respect to real faults.
• 85. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Final things...
• As researchers, we should concentrate on the future problems industry will face, right?
• But how can we claim to know future needs without understanding current ones?
• Go to industry and evaluate your results!
• 86. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Need any help? More information? Want to do an instantiation?
Contact:
Tanja E. J. Vos
email: tvos@pros.upv.es
skype: tanja_vos
web: http://tanvopol.webs.upv.es/
project: http://www.facebook.com/FITTESTproject
telephone: +34 690 917 971
• 87. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
References
empirical research in software engineering
[BSH86] V. R. Basili, R. W. Selby, and D. H. Hutchens. Experimentation in software engineering. IEEE Trans. Softw. Eng., 12:733–743, July 1986.
[BSL99] Victor R. Basili, Forrest Shull, and Filippo Lanubile. Building knowledge through families of experiments. IEEE Trans. Softw. Eng., 25(4):456–473, 1999.
[CWH05] C. Wohlin, M. Höst, and K. Henningsson. Empirical research methods in software and web engineering. In E. Mendes and N. Mosley, editors, Web Engineering - Theory and Practice of Metrics and Measurement for Web Development, pages 409–429, 2005.
[Fly04] Bent Flyvbjerg. Five misunderstandings about case-study research. In Clive Seale, Giampietro Gobo, Jaber F. Gubrium, and David Silverman, editors, Qualitative Research Practice, pages 420–434. London and Thousand Oaks, CA, 2004.
[KLL97] B. Kitchenham, S. Linkman, and D. Law. DESMET: a methodology for evaluating software engineering methods and tools.
[KPP+02] Barbara A. Kitchenham, Shari Lawrence Pfleeger, Lesley M. Pickard, Peter W. Jones, David C. Hoaglin, Khaled El Emam, and Jarrett Rosenberg. Preliminary guidelines for empirical research in software engineering. IEEE Trans. Softw. Eng., 28(8):721–734, 2002.
[LSS05] Timothy C. Lethbridge, Susan Elliott Sim, and Janice Singer. Studying software engineers: Data collection techniques for software field studies. Empirical Software Engineering, 10:311–341, 2005. 10.1007/s10664-005-1290-x.
[TTDBS07] Paolo Tonella, Marco Torchiano, Bart Du Bois, and Tarja Systä. Empirical studies in reverse engineering: state of the art and future trends. Empirical Softw. Engg., 12(5):551–571, 2007.
[WRH+00] Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. Experimentation in software engineering: an introduction. Kluwer Academic Publishers, Norwell, MA, USA, 2000.
[Rob02] Colin Robson. Real World Research. Blackwell Publishing Limited, 2002.
[RH09] Per Runeson and Martin Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical Softw. Engg., 14(2):131–164, 2009.
[PPV00] Dewayne E. Perry, Adam A. Porter, and Lawrence G. Votta. Empirical studies of software engineering: a roadmap. In Proceedings of the Conference on The Future of Software Engineering (ICSE '00). ACM, New York, NY, USA, 345–355, 2000.
[EA06] http://www.cs.toronto.edu/~sme/case-studies/case_study_tutorial_slides.pdf
[Ple95] Pfleeger, S.L. Experimental design and analysis in software engineering. Annals of Software Engineering 1, 219–253, 1995.
[PK01-03] Pfleeger, S.L. and Kitchenham, B.A. Principles of Survey Research. Software Engineering Notes, (6 parts) Nov 2001 - Mar 2003.
[Fly06] Flyvbjerg, B. Five Misunderstandings about Case Study Research. Qualitative Inquiry 12(2):219–245, April 2006.
[KPP95] Kitchenham, B.; Pickard, L.; Pfleeger, Shari Lawrence. "Case studies for method and tool evaluation," IEEE Software, vol. 12, no. 4, pp. 52–62, Jul 1995.
• 88. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
References
empirical research in software testing
[BS87] Victor R. Basili and Richard W. Selby. Comparing the effectiveness of software testing strategies. IEEE Trans. Softw. Eng., 13(12):1278–1296, 1987.
[DRE04] Hyunsook Do, Gregg Rothermel, and Sebastian Elbaum. Infrastructure support for controlled experimentation with software testing and regression testing techniques. In Proc. Int. Symp. on Empirical Software Engineering, ISESE '04, 2004.
[EHP+06] Sigrid Eldh, Hans Hansson, Sasikumar Punnekkat, Anders Pettersson, and Daniel Sundmark. A framework for comparing efficiency, effectiveness and applicability of software testing techniques. Testing: Academic & Industrial Conference on Practice And Research Techniques (TAIC PART), 0:159–170, 2006.
[JMV04] Natalia Juristo, Ana M. Moreno, and Sira Vegas. Reviewing 25 years of testing technique experiments. Empirical Softw. Engg., 9(1-2):7–44, 2004.
[RAT+06] Per Runeson, Carina Andersson, Thomas Thelin, Anneliese Andrews, and Tomas Berling. What do we know about defect detection methods? IEEE Softw., 23(3):82–90, 2006.
[VB05] Sira Vegas and Victor Basili. A characterisation schema for software testing techniques. Empirical Softw. Engg., 10(4):437–466, 2005.
[LR96] Christopher M. Lott and H. Dieter Rombach. Repeatable software engineering experiments for comparing defect-detection techniques. Empirical Software Engineering, 1:241–277, 1996. 10.1007/BF00127447.
• 89. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
References
descriptions of empirical studies done in software testing
[Vos et al. 2010] T. E. J. Vos, A. I. Baars, F. F. Lindlar, P. M. Kruse, A. Windisch, and J. Wegener, "Industrial scaled automated structural testing with the evolutionary testing tool," in ICST, 2010, pp. 175–184.
[Vos et al. 2012] Tanja E. J. Vos, Arthur I. Baars, Felix F. Lindlar, Andreas Windisch, Benjamin Wilmes, Hamilton Gross, Peter M. Kruse, Joachim Wegener: Industrial Case Studies for Evaluating Search Based Structural Testing. International Journal of Software Engineering and Knowledge Engineering 22(8): 1123– (2012)
[Vos et al. 2013] T. E. J. Vos, F. Lindlar, B. Wilmes, A. Windisch, A. Baars, P. Kruse, H. Gross, and J. Wegener, "Evolutionary functional black-box testing in an industrial setting," Software Quality Journal, 21(2): 259–288 (2013)
[MRT08] A. Marchetto, F. Ricca, and P. Tonella, "A case study-based comparison of web testing techniques applied to AJAX web applications," International Journal on Software Tools for Technology Transfer (STTT), vol. 10, pp. 477–492, 2008.
• 90. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Other references
[TS04] T. S. Tullis and J. N. Stetson. A comparison of questionnaires for assessing website usability. In Proceedings of the Usability Professionals Association Conference, 2004.
[BKM08] Aaron Bangor, Philip T. Kortum, and James T. Miller. An empirical evaluation of the system usability scale. International Journal of Human-Computer Interaction, 24:574–594, 2008.
[Horn06] Hornbæk, K. (2006), "Current Practice in Measuring Usability: Challenges to Usability Studies and Research", International Journal of Human-Computer Studies, Volume 64, Issue 2, February 2006, Pages 79–102.
[MTR08] Marchetto, Alessandro; Tonella, Paolo; and Ricca, Filippo. State-based testing of Ajax web applications. Software Testing, Verification, and Validation, 2008 1st International Conference on, pp. 121–130, IEEE, 2008.
[NMT12] C. D. Nguyen, A. Marchetto, and P. Tonella. Combining model-based and combinatorial testing for effective test case generation. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, pages 100–110, ACM, 2012.