Trust-based Requirements Traceability
ICPC 2011, Kingston
Nasir Ali, Yann-Gaël Guéhéneuc, and Giuliano Antoniol
Requirements Traceability
• Requirements traceability is defined as “the
ability to describe and follow the life of a
requirement, in both a forwards and
backwards direction” (Gotel, 1994)
What’s Requirements Traceability Good For?
Program Comprehension
Discover what code needs to change to
handle a new requirement
Aid in determining whether a specification is
completely implemented
IR-based Approaches
• Vector Space Model (Antoniol et al., 2002)
• Latent Semantic Indexing (Marcus and Maletic, 2003)
• Jensen-Shannon Divergence (Abadi et al., 2008)
• Latent Dirichlet Allocation (Asuncion, 2010)
Problem!
[Figure: Requirements, Similarity: 0.000132%]
Goal
• Mining software repositories to improve the recovery of traceability links
• Using software repository links to improve an expert’s trust in an automatically recovered link
Inspiration
• Web trust model (Palmer, 2000; McKnight, 2002)
• Initial Trust
• Reputation Trust
How do we trust?
Other Approaches
Trustrace
Example: Requirements to Code Links
Example: Requirements to SVN Links (1)
Example: Requirements to SVN Links (2)
Example: Re-weighting All Links
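The re-weighting idea can be illustrated with a small sketch. This is not the Trustrace formula, which is not reproduced in this text. It assumes, purely for illustration, that the final score of a requirement-to-code link is a convex combination, controlled by a weight λ, of the IR similarity and the mean similarity of the same link recovered through SVN commits. The function name `reweight_link` and the sample values are hypothetical.

```python
def reweight_link(ir_similarity, repo_similarities, lam=0.5):
    """Blend an IR similarity with evidence from repository links.

    Assumption (not from the slides): a convex combination weighted by lam,
    where missing repository evidence counts as zero support.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if repo_similarities:
        repo_support = sum(repo_similarities) / len(repo_similarities)
    else:
        repo_support = 0.0
    return lam * ir_similarity + (1.0 - lam) * repo_support


# A link confirmed by two SVN commits gains weight; an unconfirmed one loses it.
confirmed = reweight_link(0.4, [0.6, 0.8], lam=0.5)
unconfirmed = reweight_link(0.4, [], lam=0.5)
print(confirmed, unconfirmed)
```

Under this assumption, a larger λ keeps the score closer to the plain IR similarity, and a smaller λ gives more weight to the repository evidence.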
Case Studies
                    Pooka          SIP Communicator
Version             2.0            1.0
Number of Classes   298            1,771
Number of Methods   20,868         31,502
LOC                 244K           487K
SVN History         2000-2010      2005-2010
Pooka: an email client
SIP Communicator: Voice over IP and instant messenger
Hypotheses
H01: There is no statistically significant difference in the precision of the recovered traceability links when using Trustrace or a VSM-based approach
H02: There is no statistically significant difference in the recall of the recovered traceability links when using Trustrace or a VSM-based approach
IR Quality Measures
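Precision and recall, the two measures named in the hypotheses, follow their standard IR definitions: precision is the fraction of recovered links that are correct, and recall is the fraction of correct links that were recovered. A minimal sketch, assuming links are represented as (requirement, class) pairs and the oracle is the set of correct links; the identifiers and sample data are hypothetical.

```python
def precision_recall(recovered, oracle):
    """Return (precision, recall) of recovered links against an oracle."""
    recovered, oracle = set(recovered), set(oracle)
    correct = recovered & oracle
    precision = len(correct) / len(recovered) if recovered else 0.0
    recall = len(correct) / len(oracle) if oracle else 0.0
    return precision, recall


# Hypothetical links: (requirement id, class name)
recovered = {("R1", "A"), ("R1", "B"), ("R2", "C"), ("R3", "D")}
oracle = {("R1", "A"), ("R2", "C"), ("R2", "E")}
print(precision_recall(recovered, oracle))
```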
Identifiers / Commit Messages Extraction
Extraction
• Class Names
• Method Names
• Variable Names
• Comments
• SVN Commit Messages
• SVN Commit File Names
SVN Logs Preprocessing
We extract CVS/SVN commits and discard those that:
1. Are tagged as “delete”
2. Do not concern source code (e.g., changes to manual pages or documentation only)
3. Have messages of two words or fewer
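The three discard rules above can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' tool: the `Commit` record, its `action` field, and the use of the `.java` extension to recognize source files are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: source files are recognized by extension.
SOURCE_EXTENSIONS = (".java",)


@dataclass
class Commit:
    message: str
    action: str   # hypothetical tag, e.g. "add", "modify", "delete"
    files: list   # paths touched by the commit


def keep_commit(commit):
    """Apply the three discard rules; return True if the commit is kept."""
    if commit.action == "delete":
        return False  # rule 1: tagged as "delete"
    if not any(path.endswith(SOURCE_EXTENSIONS) for path in commit.files):
        return False  # rule 2: does not concern source code
    if len(commit.message.split()) <= 2:
        return False  # rule 3: message of two words or fewer
    return True


commits = [
    Commit("Fix attachment handling in message view", "modify", ["src/Message.java"]),
    Commit("Remove old parser class entirely", "delete", ["src/Old.java"]),
    Commit("Update the user manual pages", "modify", ["docs/manual.txt"]),
    Commit("typo fix", "modify", ["src/A.java"]),
]
kept = [c for c in commits if keep_commit(c)]
print([c.message for c in kept])
```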
Text Preprocessing
• Filter out special characters and digits (#43@$)
• Remove stop words (the, is, an, ...)
• Apply a stemmer (attachment -> attach)
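The three preprocessing steps can be sketched as one short pipeline. The stop-word list and the suffix-stripping stemmer below are toy stand-ins chosen for illustration (the slides do not say which stop-word list or stemmer was used), and the function names are hypothetical.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the one used in the study.
STOP_WORDS = {"the", "is", "an", "a", "of", "to"}

# Assumption: a toy suffix stripper, not a real stemming algorithm.
SUFFIXES = ("ment", "ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Filter non-letters, drop stop words, and stem the remaining terms."""
    tokens = re.findall(r"[a-z]+", text.lower())  # drops '#43@$' and the like
    return [stem(token) for token in tokens if token not in STOP_WORDS]


print(preprocess("The attachment #43@$ is an email"))
```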
Information Retrieval (IR) Methods
• Vector Space Model (VSM) (Salton et al., 1975)
– Each document, d, is represented by a vector of ranks of
the terms in the vocabulary:
$v_d = [r_d(w_1), r_d(w_2), \ldots, r_d(w_{|V|})]$
– The query is similarly represented by a vector
– The similarity between the query and document is the
cosine of the angle between their respective vectors
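The cosine computation can be sketched in a few lines. The slide does not define the term weight r_d(w), so this sketch assumes raw term frequency as the weight; a weighting such as TF-IDF would slot into the same place. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def term_vector(tokens):
    """Represent a document as a sparse vector of term frequencies (assumed weight)."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# A requirement (the query) compared against two preprocessed classes.
requirement = term_vector(["send", "email", "attach"])
mail_class = term_vector(["email", "attach", "file"])
voip_class = term_vector(["voice", "call"])
print(cosine_similarity(requirement, mail_class))
print(cosine_similarity(requirement, voip_class))
```

Documents that share no term get a similarity of zero, which is why a requirement and the class implementing it can receive a very low score when their vocabularies differ.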
Pooka’s Results
SIP Results
Statistical Tests
Precision (%)
            VSM      Trustrace   p-value
Pooka       42.28    54.35       p < 0.01
SIP Com.    14.23    25.13       p < 0.01
Pooka Results
SIP Results
Statistical Tests
Recall (%)
            VSM      Trustrace   p-value
Pooka       11.14    12.6        p > 0.7
SIP Com.    13.42    16.63       p > 0.5
Discussion
• Using different sources of information reduces an expert’s effort by up to 50%
• Using temporal information with IR-based approaches yields better results
• The results tend to improve as the size of the SVN commit log increases
• Trustrace also improves LSI results, at k = 50 for Pooka and k = 200 for SIP Communicator
Threats to Validity
• External validity:
– We analyzed only two systems
• Construct validity:
– Two researchers built both oracles
– The oracles were validated by two other experts
• Internal validity: different λ values may lead to different results
• Reliability validity: a replication package is available online at www.ptidej.net
• The tool is online at www.factrace.net
Ongoing work
More IR approaches and datasets
Empirical study
Including other friends (bug reports, etc.)
Determine heuristics to identify the best λ
Summary
• A similarity value alone is not enough to trust a link
• Other sources of information are required to increase trust in a link
