PREREQIR: Recovering Pre-Requirements
                       via Cluster Analysis

              Jane Huffman Hayes, Giuliano (Giulio) Antoniol and
                           Yann-Gaël Guéhéneuc




WCRE 2008,
 Antwerp

Content

             Problem Statement

             PREREQIR Idea

             PREREQIR Process

             Technologies

             Web Browser Requirements

             Case Study Results

             Conclusions

The Challenge

             A few years after deployment, the requirements
             specification (RS) may no longer exist.

             If it exists, it is almost surely outdated.

             My customers may desire new functionalities or
             technologies that my system may or may not
             implement.

             I poll my stakeholders:
                programmers, managers, testing team members,
                marketing personnel, and end users;
                find out what they believe the system should do.
PREREQIR in Essence

             We need a pre-requirement document:
                what the competitor systems do;
                what our customer base needs.


             Obtain and vet a list of requirements from diverse
             stakeholders.

             Structure requirements by mapping them into a
             representation suitable for grouping via
             pattern-recognition and similarity-based clustering.

             Analyze clustered requirements to divide them
             into a set of essential and a set of optional
             requirements.
The PREREQIR Process




             [Figure: the four steps of the process.
                1. Collect the textual requirements (r_i … r_p) from the stakeholders.
                2. Split, tokenize, remove stop words, and stem each requirement
                   (e.g. "browser support zoom unzoom page detail"), then map it
                   to a TF-IDF vector.
                3. Cluster the vectors with PAM/AGNES.
                4. Label the clusters to obtain the recovered PRI r_i and o_j.]
PREREQIR Technology

             Standard information retrieval vector space
             model.
             Indexing process:

                Stopper;
                Stemmer;
                Thesaurus (not vital but helps);
                TF-IDF indexing.


             Clustering: PAM and AGNES.

             Labeling: still an open question.
Step 1 – Collect Stakeholders' RS

             By means of questionnaires, collect the stakeholders'
             requirements.

             We favor a non-intrusive, lightweight approach such as a
             Web-based questionnaire.

             Minimize the risk of influencing the stakeholders.

             There is a risk that:
                 a respondent did not really understand the task;
                 the granularity and level of detail differ greatly between
                respondents;
                 the respondent population is not heterogeneous enough;
                 the sample size is small.
Step 2 – Vector Space Mapping

             The goal is to group individual requirements from
             different users into clusters representing the
             same functionality/concept.

             By means of standard IR tools, map the collected
             requirements into a vector space.

             Stopper, stemmer, and TF-IDF plus thesaurus
             expansions:
                certain stakeholders may use cryptic terms such as
                RFC or test/benchmark acronyms.
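
             The mapping above can be sketched as follows. This is a minimal illustration, not the authors' toolchain: the stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), and the tf × log(N/df) weighting variant are all assumptions, and thesaurus expansion is left out.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the one used in the study.
STOP_WORDS = {"the", "a", "an", "should", "to", "of", "and", "it", "be", "must"}


def crude_stem(token):
    """Strip a few common suffixes; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, and stem one requirement."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(documents):
    """Map each requirement to a sparse {stem: weight} TF-IDF vector."""
    docs = [preprocess(d) for d in documents]
    n = len(docs)
    # Document frequency: in how many requirements each stem occurs.
    df = Counter(stem for doc in docs for stem in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {s: (c / len(doc)) * math.log(n / df[s]) for s, c in tf.items()}
            if doc
            else {}
        )
    return vectors


requirements = [
    "The browser should support zoom",
    "The browser should support printing",
    "It must be fast",
]
for vector in tfidf_vectors(requirements):
    print(vector)
```

             Stems shared by many requirements (here "browser" and "support") receive lower weights than stems specific to one requirement (here "zoom").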

Step 3 – Clustering

             Transform similarity into a distance.

             Apply robust partitioning around medoids (PAM).

             Estimate the number of clusters (distinct
             requirements) with the silhouette:

                $$ s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} $$

                a(i) is the average distance to the other PRI in the cluster;
                b(i) is the average distance to the PRI in the nearest cluster.

                > 0.70: very strong structure;
                0.50 … 0.70: reasonable structure;
                0.25 … 0.50: weak structure;
                < 0.25: no structure.

             Take the inflection point close to the maximum value of the
             average silhouette.
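
             As a rough sketch of the silhouette computation, assuming the distance is taken as 1 minus the similarity (the slides do not say which transformation is used) and that a clustering is given as one label per item. The function names and the toy data are illustrative only.

```python
def similarity_to_distance(similarity):
    """Assumption: distance = 1 - similarity, for similarities in [0, 1]."""
    return 1.0 - similarity


def silhouette_scores(dist, labels):
    """Return s(i) for every item, given a distance matrix and cluster labels.

    Requires at least two clusters. Singleton clusters get s(i) = 0,
    which is a common convention and an assumption here.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        # a(i): average distance to the other items of the same cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b(i): average distance to the items of the nearest other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


# Toy data: four points on a line forming two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
scores = silhouette_scores(dist, [0, 0, 1, 1])
average = sum(scores) / len(scores)
print(round(average, 3))
```

             Repeating this for several candidate numbers of clusters and comparing the average silhouette is one way to pick the number of clusters.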
Step 3 Bis – Tree Structure

             If there is a weak structure, check for a
             requirement tree organization.

             Re-cluster with AGNES.

             Compute the Agglomerative Coefficient (AC).

             AC measures the strength of the hierarchical
             structure discovered.

             AC > 0.9 indicates a very strong hierarchical structure.

             Impose a threshold on the average similarity to
             avoid grouping “too different” things.
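
             A minimal sketch of agglomerative nesting and of the AC, assuming average linkage (the slides do not state the linkage) and the usual definition of the coefficient: for each item, take the height of its first merge divided by the height of the final merge, and average 1 minus that ratio. The naive search below is only meant for small inputs.

```python
def agnes_average_linkage(dist):
    """Naive average-linkage agglomeration.

    Returns the list of merge heights and the agglomerative coefficient.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    first_merge_height = [0.0] * n
    heights = []

    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        # Find the pair of clusters with the smallest average distance.
        for x in range(len(keys)):
            for y in range(x + 1, len(keys)):
                a, b = clusters[keys[x]], clusters[keys[y]]
                d = sum(dist[p][q] for p in a for q in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, keys[x], keys[y])
        height, ka, kb = best
        # Record the height at which each single item is first merged.
        for members in (clusters[ka], clusters[kb]):
            if len(members) == 1:
                first_merge_height[members[0]] = height
        clusters[ka] = clusters[ka] + clusters.pop(kb)
        heights.append(height)

    final = heights[-1]
    if final == 0:
        return heights, 0.0
    ac = sum(1.0 - h / final for h in first_merge_height) / n
    return heights, ac


# Toy data: two tight pairs far apart give a clear tree structure.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
heights, ac = agnes_average_linkage(dist)
print(heights, round(ac, 3))
```

             Cutting the resulting tree at a chosen height is the distance-side counterpart of the similarity threshold mentioned above.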
Step 4 – Label Clusters

             Process each PRI of a cluster:
                stopping, stemming;
                build cluster-specific dictionary;
                weight each word by its frequency in the cluster:
                    If a word is in all the PRI in a cluster, its weight is 1.00. If a word
                    appears in half of the PRI, its weight is 0.50.
             For a given stemmed PRI, calculate a score:
                sum up the weights of the stems present in the cluster
                dictionary to obtain a positive weight;
                count the number of words in the cluster-specific dictionary
                that are absent in the current PRI:
                    obtain a negative weight.
             Assign a score to the PRI computed as:
                 the ratio positive weight / negative weight.
             Label the cluster:
                 take the PRI with the highest score.
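
             The labeling procedure above can be sketched as follows, assuming every PRI has already been stopped and stemmed into a list of stems. One case is not covered by the slides and is an assumption here: a PRI that contains every word of the cluster dictionary has a negative weight of 0, and is given an infinite score.

```python
from collections import Counter


def label_cluster(stemmed_pris):
    """Pick the PRI that best represents a cluster.

    Returns (index of the label PRI, its score, the cluster dictionary).
    """
    n = len(stemmed_pris)
    # Cluster dictionary: weight = share of the PRI that contain the stem.
    doc_freq = Counter(stem for pri in stemmed_pris for stem in set(pri))
    weights = {stem: count / n for stem, count in doc_freq.items()}

    best_index, best_score = None, float("-inf")
    for index, pri in enumerate(stemmed_pris):
        present = set(pri)
        # Positive weight: sum of the weights of the stems in this PRI.
        positive = sum(weights[stem] for stem in present)
        # Negative weight: number of dictionary stems absent from this PRI.
        negative = sum(1 for stem in weights if stem not in present)
        # Assumption: no missing stem means the best possible score.
        score = positive / negative if negative else float("inf")
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score, weights


# Hypothetical cluster of three stemmed PRI.
cluster = [
    ["browser", "support", "zoom"],
    ["browser", "zoom", "unzoom", "page"],
    ["zoom", "page", "detail"],
]
index, score, weights = label_cluster(cluster)
print(index, round(score, 3), weights["zoom"])
```

             In this toy cluster "zoom" appears in all three PRI and has weight 1.00, and the second PRI wins because it covers the most heavily weighted stems while missing only two dictionary words.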
Case Study

             Mimic the recovery process for a Web browser.

             Poll via e-mail of a set of users (about 200).

             25 answers, of which we kept 22, for 433
             user needs overall:
                mostly male (20), age varies, average 36, standard
                deviation 9.5;
                respondents: 10 researchers, five lecturers/professors,
                four students, one programmer, and two project
                managers.


PAM - AGNES

             We did not find a strong or evident cluster
             structure:
                silhouette about 0.26;
                region between 167 and 170 clusters:
                   say 170 clusters or fewer.


             AGNES reports a strong structure:
                AC above 0.9.


             Grouping via AGNES
                grows a tree starting from leaves

Outliers

             Setting a cluster-internal similarity threshold
             determines:
                top-level clusters;
                singleton clusters (outliers);
                inner nodes.


             The “non kept” are also important:
                single user needs;
                more expert users may use acronyms:
                    must comply with ACID2;
                “too generic” sentences:
                   it should be fast.

AGNES Clusters

             [Figure: number of AGNES clusters (0 to 1000) as a function of
             the similarity threshold (0.035 to 0.915); series: Overall, Tops,
             Leaves, Intermediate, Outliers.]
Manual Verification

             Two people reviewed the clusters and the cluster labels.

             IR measures: precision and recall.

             Precision measures the quality of the clusters.

             A conservative approach:
                “Yes” was assigned if both authors said “Yes”;
                “No” was assigned if one of the authors said “No”;
                “Maybe” was assigned in the other cases.
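
             The conservative rule above amounts to a small decision function. The sketch below assumes each reviewer answers with one of the strings "Yes", "No", or "Maybe"; the function name is illustrative.

```python
def combine_verdicts(first, second):
    """Combine two reviewers' verdicts conservatively."""
    if first == "Yes" and second == "Yes":
        return "Yes"
    if "No" in (first, second):
        return "No"
    # Remaining cases: Yes/Maybe, Maybe/Yes, Maybe/Maybe.
    return "Maybe"


for pair in [("Yes", "Yes"), ("Yes", "No"), ("Yes", "Maybe"), ("Maybe", "Maybe")]:
    print(pair, "->", combine_verdicts(*pair))
```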


Precision and Recall – Similarity Threshold 0.36
             128 Common User Needs, 181 Outliers


             [Figure: precision, recall, and percentage of outliers (0 to 1)
             as a function of the similarity threshold (0.225 to 0.885).]
Traceability Task

             PRI for a Web browser provided by the Web site
             www.learnthenet.com (LtN).


             There are 20 LtN PRI:
                textual PRI ranging from 5 to 73 words, having on
                average 23.5 words.


             LtN10: “The toolbar should include a Reload or
             Refresh button to load the web page again.”

             Trace via vector space retrieval with TF-IDF.

             Similarity threshold of 0.20.
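
             A minimal sketch of the tracing step, assuming every PRI has already been mapped to a sparse TF-IDF vector (a dict from stem to weight) and that similarity is the cosine between vectors. The vectors and identifiers in the example are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {stem: weight} vectors."""
    dot = sum(weight * v.get(stem, 0.0) for stem, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def trace(query, candidates, threshold=0.20):
    """Return (identifier, similarity) pairs at or above the threshold."""
    links = []
    for identifier, vector in candidates.items():
        similarity = cosine(query, vector)
        if similarity >= threshold:
            links.append((identifier, similarity))
    return sorted(links, key=lambda link: -link[1])


# Hypothetical vectors: one LtN PRI and three respondent PRI.
ltn_pri = {"reload": 0.8, "button": 0.6}
respondent_pris = {
    "r1": {"reload": 1.0},
    "r2": {"zoom": 1.0},
    "r3": {"button": 0.1, "toolbar": 0.99},
}
links = trace(ltn_pri, respondent_pris)
print(links)
```

             Only "r1" is traced in this example: "r2" shares no stem with the query, and "r3" shares one stem but falls below the 0.20 threshold.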
Manual Evaluation by Two Authors

             14 of the 20 LtN PRI are traced:
               the 14 PRI were all marked as “Yes” by both authors.


             If we also include the two marked as “Maybe”
             there are 16 LtN PRI out of 20 traced.

             Overall, between 70% (“Yes” only) and 80%
             (“Yes” and “Maybe”) of the LtN PRI are also
             found in the PRI obtained from the respondents.



Threats to Validity

             External validity: only one system and 22
             answers out of about 200; the impact of vocabulary is
             not known.

             Construct validity: computation performed using
             widely adopted toolsets; other tools could produce
             different results.

             Reliability validity: material will be made
             available.

             Internal validity: subjectivity introduced by
             experts; “Yes” if and only if both agree.
Conclusion

             AGNES clusters PRI with an accuracy of 70%.

             At a similarity threshold of about 0.36, about 55% of
             the PRI were common to two or more
             stakeholders and 42% were outliers:
                128 common user needs, 181 outliers.


             We automatically label the common and outlier
             PRI with 82% of the labels being correct.

             The method achieves roughly 70% recall and
             70% precision when compared to a ground truth.

Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Dernier (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

WCRE08.ppt

  • 1. Prereqir: Recovering Pre-Requirements via Cluster Analysis. Jane Huffman Hayes, Giuliano (Giulio) Antoniol, and Yann-Gaël Guéhéneuc. WCRE 2008, Antwerp.
  • 2. Content: Problem Statement; PREREQIR Idea; PREREQIR Process; Technologies; Web Browser Requirements; Case Study Results; Conclusions.
  • 3. The Challenge: A few years after deployment, the requirements specification (RS) may no longer exist. If it exists, it will almost surely be outdated. My customers may desire new functionalities or technologies that my system may or may not implement. I poll my stakeholders (programmers, managers, testing team members, marketing personnel, and end users) to find out what they believe the system should do.
  • 4. PREREQIR in Essence: We need a pre-requirement document: what the competitor systems do; what our customer base needs. Obtain and vet a list of requirements from diverse stakeholders. Structure the requirements by mapping them into a representation suitable for grouping via pattern recognition and similarity-based clustering. Analyze the clustered requirements to divide them into a set of essential and a set of optional requirements.
  • 5. The PREREQIR Process [Diagram: textual requirements (for example “browser support zoom unzoom page detail” and “Browser support print”) are mapped into a vector space and clustered to obtain recovered PRI. Stages shown: Split/Tokenization; Stop-word Removal, Stemming, TF-IDF; Clustering (PAM/AGNES); Labelling.]
  • 6. PREREQIR Technology: Standard information retrieval vector space model. Indexing process: stopper; stemmer; thesaurus (not vital but helps); TF-IDF indexing. Clustering: PAM and AGNES. Labeling: still an open question.
  • 7. Step 1 – Collect Stakeholders' RS: By means of questionnaires, collect the stakeholders' requirements. We favor a non-intrusive, lightweight approach such as a Web-based questionnaire. Minimize the risk of influencing the stakeholders. There is a risk that: a respondent did not really understand the task; the granularity and level differ greatly between respondents; the respondent population is not heterogeneous enough; the sample size is small.
  • 8. Step 2 – Vector Space Mapping: The goal is to group single requirements by different users into clusters representing the same functionality/concept. By means of standard IR tools, map the collected requirements into a vector space. Use a stopper, a stemmer, and TF-IDF, plus thesaurus expansions: certain stakeholders may use cryptic terms such as RFC or test/benchmark acronyms.
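The mapping in Step 2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the stop-word list, the `crude_stem` suffix stripper, and the particular TF-IDF weighting (relative term frequency times log inverse document frequency) are placeholders chosen here, not the tools or the exact weighting used in the talk, and the thesaurus expansion is left out.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption).
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "should", "be", "it", "is"}

def crude_stem(word):
    """Rough suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lower-case, split on non-alphanumerics, drop stop words, then stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]

def tfidf_vectors(documents):
    """Map each document to a sparse {stem: weight} TF-IDF vector."""
    token_lists = [tokenize(d) for d in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each stem occurs.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Made-up user needs, for illustration only.
needs = [
    "The browser should support zoom",
    "Browser support printing",
    "It should be fast",
]
for need, vector in zip(needs, tfidf_vectors(needs)):
    print(need, "->", vector)
```

Stems that occur in every document get weight zero under this weighting, which is the intended effect of the IDF factor: they do not help tell requirements apart.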
  • 9. Step 3 – Clustering: Transform similarity into a distance. Apply robust partitioning around medoids (PAM). Estimate the number of clusters (different requirements) with the silhouette $s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$, where a(i) is the average distance to the other PRI in the cluster and b(i) is the average distance to the PRI in the nearest cluster. Interpretation of the average silhouette: > 0.70 very strong structure; 0.50 … 0.70 reasonable structure; 0.25 … 0.50 weak structure; < 0.25 no structure. Take the inflection point (flex) close to the maximum value of the average silhouette.
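The silhouette of Step 3 can be computed directly from a distance matrix and a cluster assignment. The sketch below is an illustration, not the tooling used in the talk. It assumes that similarity is turned into a distance as 1 − similarity (the slide does not say which transformation was used), and it gives singleton clusters a silhouette of 0, the usual convention. The function names are chosen here for illustration.

```python
def similarity_to_distance(sim):
    """Assumed transformation: distance = 1 - similarity."""
    return [[1.0 - s for s in row] for row in sim]

def silhouette_widths(dist, labels):
    """Return s(i) for every item, given a square distance matrix and labels."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # convention for singleton clusters
            continue
        # a(i): average distance to the other members of the own cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b(i): smallest average distance to the members of another cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return widths

def interpret(average):
    """Map an average silhouette to the categories listed on the slide."""
    if average > 0.70:
        return "very strong structure"
    if average >= 0.50:
        return "reasonable structure"
    if average >= 0.25:
        return "weak structure"
    return "no structure"

# Toy data: four points on a line forming two tight groups.
positions = [0, 1, 10, 11]
dist = [[abs(p - q) for q in positions] for p in positions]
widths = silhouette_widths(dist, [0, 0, 1, 1])
average = sum(widths) / len(widths)
print(round(average, 3), interpret(average))
```

To estimate the number of clusters, one would run the clustering for a range of cluster counts and compare the resulting average silhouettes.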
  • 10. Step 3 Bis – Tree Structure: If there is only a weak structure, check for a requirement tree organization. Re-cluster with AGNES. Compute the Agglomerative Coefficient (AC), which measures the strength of the hierarchical structure discovered; AC > 0.9 indicates a very strong hierarchical structure. Impose a threshold on the average similarity to avoid grouping “too different” things.
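Step 3 Bis can be illustrated with a naive agglomerative clustering. The sketch below assumes average linkage and works on distances, so a similarity threshold t would correspond to a maximum merge height of 1 − t under the assumed 1 − similarity transformation. The agglomerative coefficient follows the standard definition: for each item, take the height of its first merge divided by the height of the final merge, and average one minus that ratio. Function names are hypothetical, and the code expects at least two items that are not all identical.

```python
def agnes(dist):
    """Naive average-linkage agglomerative clustering.

    Returns the merges in order as (members_a, members_b, height).
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                a, b = clusters[x], clusters[y]
                height = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or height < best[0]:
                    best = (height, x, y)
        height, x, y = best
        merges.append((clusters[x], clusters[y], height))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

def agglomerative_coefficient(merges, n_items):
    """Average of 1 - (first merge height of item / final merge height)."""
    final_height = merges[-1][2]
    first_height = {}
    for a, b, height in merges:
        for side in (a, b):
            if len(side) == 1:  # a singleton side is that item's first merge
                first_height[side[0]] = height
    return sum(1 - first_height[i] / final_height for i in range(n_items)) / n_items

def clusters_below(merges, n_items, max_height):
    """Replay only the merges whose height does not exceed max_height."""
    groups = {(i,): [i] for i in range(n_items)}
    for a, b, height in merges:
        if height > max_height:
            break  # average linkage heights never decrease, so stop here
        del groups[tuple(a)], groups[tuple(b)]
        groups[tuple(a + b)] = a + b
    return sorted(groups.values())

# Toy data: two tight pairs and one far-away item.
positions = [0, 1, 10, 11, 50]
dist = [[abs(p - q) for q in positions] for p in positions]
merges = agnes(dist)
print(agglomerative_coefficient(merges, len(positions)))
print(clusters_below(merges, len(positions), max_height=5))
```

Items left alone after the cut are singleton clusters, which is how outliers show up once a threshold is imposed.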
  • 11. Step 4 – Label Clusters: Process each PRI of a cluster: stopping, stemming; build a cluster-specific dictionary; weight each word by its frequency in the cluster: if a word is in all the PRI of a cluster, its weight is 1.00; if a word appears in half of the PRI, its weight is 0.50. For a given stemmed PRI, calculate a score: sum up the weights of the stems present in the cluster dictionary to obtain a positive weight; count the number of words in the cluster-specific dictionary that are absent from the current PRI to obtain a negative weight. Assign to the PRI a score computed as the ratio positive weight / negative weight. Label the cluster: take the PRI with the highest score.
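The labeling rule of Step 4 translates almost directly into code. The sketch below assumes each PRI has already been stopped and stemmed into a list of stems. One case is not covered by the slide: a PRI that contains every dictionary word has a negative weight of zero; here it is given an infinite score, which is an assumption made for this illustration.

```python
from collections import Counter

def label_cluster(cluster):
    """Pick the PRI that best represents a cluster.

    cluster: list of PRI, each a list of stems.
    Returns (index of the labeling PRI, list of scores).
    """
    n = len(cluster)
    stem_sets = [set(pri) for pri in cluster]
    # Cluster-specific dictionary: stem -> fraction of PRI that contain it.
    counts = Counter(stem for stems in stem_sets for stem in stems)
    weights = {stem: count / n for stem, count in counts.items()}

    def score(stems):
        positive = sum(weights[s] for s in stems)
        negative = sum(1 for s in weights if s not in stems)
        # Assumption: no missing dictionary word means an infinite score.
        return positive / negative if negative else float("inf")

    scores = [score(stems) for stems in stem_sets]
    best = max(range(n), key=scores.__getitem__)
    return best, scores

# Made-up cluster of three stemmed PRI.
cluster = [
    ["browser", "support", "zoom"],
    ["browser", "zoom", "unzoom", "page"],
    ["zoom", "page"],
]
best, scores = label_cluster(cluster)
print("label:", " ".join(cluster[best]), scores)
```

The score rewards PRI that use the words shared across the cluster and penalizes those that miss many dictionary words, so short, generic PRI tend to lose against more complete ones.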
  • 12. Case Study: Mimic the recovery process for a Web browser. Poll a set of users (about 200) via e-mail. 25 answers, of which we kept 22; overall 433 user needs. Respondents: mostly male (20); age varies, average 36, standard deviation 9.5; 10 researchers, five lecturers/professors, four students, one programmer, and two project managers.
  • 13. PAM - AGNES: We did not find a strong or evident cluster structure: silhouette about 0.26; region between 167 and 170 clusters: say 170 clusters or fewer. AGNES reports a strong structure: AC above 0.9. Grouping via AGNES grows a tree starting from the leaves.
  • 14. Outliers: Setting a cluster internal similarity threshold decides the top-level clusters, the singleton clusters (outliers), and the inner nodes. The “non kept” are also important: single user needs; more expert users may use acronyms (“must comply with ACID2”); too generic sentences (“it should be fast”).
  • 15. AGNES Clusters [Figure: number of clusters (0 to 1000) as a function of the similarity threshold (0.035 to 0.915); series: Tops, Overall, Leaves, Outliers, Intermediate.]
  • 16. Manual Verification: Two people reviewed the clusters and the cluster labeling. IR measures: precision and recall. Precision measures the quality of the clusters. A conservative approach: “Yes” was assigned if both authors said “Yes”; “No” was assigned if one of the authors said “No”; “Maybe” was assigned in the other cases.
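The conservative rule for combining the two reviewers' verdicts is small enough to state as code. The function name and the sample verdict pairs below are hypothetical and serve only to illustrate the rule.

```python
from collections import Counter

def combine(verdict_a, verdict_b):
    """Combine two reviewers' verdicts with the conservative rule."""
    if verdict_a == "No" or verdict_b == "No":
        return "No"      # a single "No" is enough
    if verdict_a == "Yes" and verdict_b == "Yes":
        return "Yes"     # both reviewers must agree on "Yes"
    return "Maybe"       # every other combination

# Made-up verdict pairs for four clusters.
pairs = [("Yes", "Yes"), ("Yes", "Maybe"), ("Maybe", "No"), ("Yes", "Yes")]
tally = Counter(combine(a, b) for a, b in pairs)
print(tally)
```

Counting only the “Yes” outcomes gives a lower bound on quality, while counting “Yes” and “Maybe” together gives an upper bound.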
  • 17. Precision Recall – 0.36: 128 Common User Needs, 181 Outliers [Figure: precision, recall, and percentage of outliers (0 to 1) as a function of the threshold (0.225 to 0.885).]
  • 18. Traceability Task: PRI for a Web browser provided by the Web site www.learnthenet.com (LtN). There are 20 LtN PRI: textual PRI ranging from 5 to 73 words, having on average 23.5 words. LtN10: “The toolbar should include a Reload or Refresh button to load the web page again.” Trace via vector space retrieval with TF-IDF. Similarity threshold of 0.20.
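Tracing by vector space retrieval amounts to computing the cosine similarity between each reference PRI and each recovered PRI and keeping the pairs at or above the threshold. The sketch below assumes the TF-IDF vectors are already available as sparse dictionaries. The weights, the identifiers `R1` to `R3`, and the choice of “greater than or equal” for the threshold test are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def trace(queries, candidates, threshold=0.20):
    """Return {query id: [(candidate id, similarity), ...]}, best match first."""
    links = {}
    for qid, qvec in queries.items():
        hits = [(cid, cosine(qvec, cvec)) for cid, cvec in candidates.items()]
        hits = [hit for hit in hits if hit[1] >= threshold]
        links[qid] = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return links

# Hand-made weights, for illustration only.
ltn = {"LtN10": {"toolbar": 0.5, "reload": 0.8, "button": 0.3}}
recovered = {
    "R1": {"reload": 0.9, "button": 0.4},
    "R2": {"zoom": 1.0},
    "R3": {"toolbar": 0.1, "bookmark": 0.9},
}
print(trace(ltn, recovered))
```

A reference PRI whose list comes back empty is one that has no counterpart among the recovered PRI at the chosen threshold.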
  • 19. Manual Evaluation by Two Authors: 14 of the 20 LtN PRI are traced: the 14 PRI were all marked as “Yes” by both authors. If we also include the two marked as “Maybe”, 16 of the 20 LtN PRI are traced. Overall, between 70% (“Yes” only) and 80% (“Yes” and “Maybe”) of the LtN PRI are also found in the PRI obtained from the respondents.
  • 20. Threats to Validity: External validity: only one system and 22 answers out of 200; the impact of vocabulary is not known. Construct validity: computation performed using widely adopted toolsets; other tools can produce different results. Reliability validity: material will be made available. Internal validity: subjectivity introduced by the experts; “Yes” if and only if both agree.
  • 21. Conclusion: AGNES clusters PRI with an accuracy of 70%. At a similarity threshold of about 0.36, about 55% of the PRI were common to two or more stakeholders and 42% were outliers (128 common user needs, 181 outliers). We automatically label the common and outlier PRI, with 82% of the labels being correct. The method achieves roughly 70% recall and 70% precision when compared to a ground truth.