CAMBRIDGE, UK, 10 DEC 2008

Code-tagging and similarity-based retrieval with myCBR
Thomas Roth-Berghofer & Daniel Bahls
Senior researcher, trb@dfki.de
German Research Centre for Artificial Intelligence DFKI GmbH

Saturday, 18 July 2009

Programmer's dilemma

• Where is the code fragment I used to solve a similar problem in the past?
• Is this piece of code still available?
• Is it worth the effort to search for it?
• If so, what would be the right search term?



Personalised approach

• Personal vocabulary: tags
• Linking tags
• Case-based retrieval
• Work context
• Social dimension: tag exchange


CBR cycle

[Figure: the CBR cycle, with labels "myCBR" and "CBR"]

Agnar Aamodt and Enric Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1):39–59, 1994.


Code snippet & context

[Figure: Java code snippet]

Work context:
• java.net.URL
• java.net.URLConnection
• java.io.InputStream
• java.lang.StringBuffer
• java.io.BufferedReader
• java.lang.String
• java.lang.Exception

Case structure

Attribute       Value type          Category
Tags            String (multiple)   Problem description
Context items   String (multiple)   Problem description
Code snippet    String              Solution
Document type   String              Provenance
Project name    String              Provenance
File path       String              Provenance
Author ID       String              Provenance
Creation date   Long                Provenance
Rating          Float               Maintenance
Rating count    Integer             Maintenance

Legend: each attribute is marked on the slide as either "Set by user" or "Set by coTag".
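The attribute table can be read as a record type. The sketch below is only an illustration of that reading, not coTag's actual data model: the class name `Case`, the snake_case field names, the default values, the interpretation of the `Long` creation date as epoch milliseconds, and the running-average rule in `add_rating` are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    """One case; fields mirror the attribute table (names are hypothetical)."""
    # Problem description
    tags: List[str]
    context_items: List[str]
    # Solution
    code_snippet: str
    # Provenance
    document_type: str = ""
    project_name: str = ""
    file_path: str = ""
    author_id: str = ""
    creation_date: int = 0  # "Long" in the table; assumed to be epoch milliseconds
    # Maintenance
    rating: float = 0.0
    rating_count: int = 0

    def add_rating(self, value: float) -> None:
        """Fold a new rating into the running average (assumed update rule)."""
        total = self.rating * self.rating_count + value
        self.rating_count += 1
        self.rating = total / self.rating_count


example = Case(
    tags=["init", "Logger"],
    context_items=["java.net.URL", "java.io.BufferedReader"],
    code_snippet="/* Java source of the snippet */",
    document_type="java",
)
example.add_rating(4.0)
example.add_rating(2.0)
print(example.rating, example.rating_count)
```

Keeping `rating` and `rating_count` together lets the average be updated without storing every individual rating.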



Acquiring case




Query view

• Search for tags: init, logging, config
• Include context: take the currently selected code into account



Retrieval

• Result for: init, logging, config
• Ranked list of code snippets
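How the ranked list could come about can be sketched as follows. The slides do not give the scoring formula, so everything here is an assumption: each query tag is matched against its most similar case tag, the best matches are averaged, and cases are sorted by that score. The function names, the sample case ids, and the `exact_match` placeholder similarity are hypothetical, and the work context is left out for brevity.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Similarity = Callable[[str, str], float]


def exact_match(a: str, b: str) -> float:
    """Placeholder local similarity: 1.0 for equal tags (ignoring case), else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def tag_score(query_tags: Sequence[str], case_tags: Sequence[str],
              sim: Similarity = exact_match) -> float:
    """Average, over the query tags, of the best similarity to any case tag."""
    if not query_tags or not case_tags:
        return 0.0
    best = [max(sim(q, c) for c in case_tags) for q in query_tags]
    return sum(best) / len(best)


def rank(query_tags: Sequence[str], cases: Dict[str, List[str]],
         sim: Similarity = exact_match) -> List[Tuple[str, float]]:
    """Return (case id, score) pairs, best first; ties keep insertion order."""
    scored = [(case_id, tag_score(query_tags, tags, sim))
              for case_id, tags in cases.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


cases = {
    "snippet-a": ["init", "logging", "config"],
    "snippet-b": ["init", "Logger"],
    "snippet-c": ["url", "download"],
}
ranking = rank(["init", "logging", "config"], cases)
for case_id, score in ranking:
    print(f"{case_id}: {score:.2f}")
```

Passing a different `sim` function, for example a trigram-based one, changes how near-matches such as "logging" and "Logger" are scored without touching the ranking logic.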




Presentation of cases




Situations in which explanations play a role

• Instructing explanations:
    • Novice users want to know how tagging and (similarity-based) retrieval work.
• Convincing explanations:
    • Regular users want to check the retrieval when it does not meet their expectations.
• Improving explanations:
    • Regular users want to correct coTag's behaviour.




Explanation of matching

• Search terms:
    • init, logging, config
• Case tags:
    • init, Logger




Graphical explanation of trigram matching

• Syntactical similarity
    • Typos
    • Stemming
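A small sketch of trigram matching shows why it tolerates typos and word-form variants. The slides do not state the exact formula, so the Jaccard overlap of trigram sets used here is an assumption, as are the function names and the textual `explain` output (the slide shows a graphical explanation instead).

```python
from typing import Set


def trigrams(text: str) -> Set[str]:
    """All overlapping three-character substrings of the lower-cased text."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets, in the range 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        # Strings shorter than three characters have no trigrams; fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    return len(ta & tb) / len(ta | tb)


def explain(a: str, b: str) -> str:
    """One-line textual explanation listing the shared trigrams."""
    shared = sorted(trigrams(a) & trigrams(b))
    listing = ", ".join(shared) if shared else "none"
    return f"{a} ~ {b}: {trigram_similarity(a, b):.2f} (shared: {listing})"


print(explain("logging", "Logger"))  # word-form variant
print(explain("config", "confg"))    # typo
print(explain("config", "init"))     # unrelated tags
```

Because the score depends only on shared character sequences, "logging" and "Logger" overlap in "log" and "ogg" without any stemming dictionary.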




Similarity customisation

• Tag similarities:
      dissimilar         0%
      partly similar    25%
      similar           50%
      very similar      75%
      identical        100%
• Updates personal and community similarity measure
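One way to picture the customisation step is a symmetric lookup table in which a user judgement is stored at one of the five levels. This is a sketch under assumptions, not coTag's implementation: the `Level` enum, the `TagSimilarityTable` class, its `rate` and `lookup` methods, and the idea of keeping the personal and community measures as two separate tables are all hypothetical.

```python
from enum import Enum
from typing import Dict, FrozenSet, Optional


class Level(Enum):
    """The five choices from the slide, expressed as fractions."""
    DISSIMILAR = 0.0
    PARTLY_SIMILAR = 0.25
    SIMILAR = 0.5
    VERY_SIMILAR = 0.75
    IDENTICAL = 1.0


class TagSimilarityTable:
    """Symmetric table of user-assigned tag similarities."""

    def __init__(self) -> None:
        self._pairs: Dict[FrozenSet[str], float] = {}

    @staticmethod
    def _key(a: str, b: str) -> FrozenSet[str]:
        # A frozenset makes (a, b) and (b, a) the same key; case is ignored.
        return frozenset((a.lower(), b.lower()))

    def rate(self, a: str, b: str, level: Level) -> None:
        """Store the chosen level for the pair, replacing any earlier rating."""
        self._pairs[self._key(a, b)] = level.value

    def lookup(self, a: str, b: str) -> Optional[float]:
        """Return the stored similarity, or None if the pair was never rated."""
        return self._pairs.get(self._key(a, b))


personal = TagSimilarityTable()
community = TagSimilarityTable()

# One user judgement updates both measures, as the slide states.
for table in (personal, community):
    table.rate("logging", "Logger", Level.VERY_SIMILAR)

print(personal.lookup("Logger", "logging"))
```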


Three levels of similarity calculation

• Personal
• Imported
• Trigram
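The three levels can be read as a fallback chain. The sketch below assumes, without confirmation from the slides, that a personal entry takes precedence over an imported one and that trigram matching is used only when neither table knows the pair. The function names, the dictionary representation, the Jaccard trigram formula, and the sample values are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def pair_key(a: str, b: str) -> Pair:
    """Order-independent, case-insensitive key for a pair of tags."""
    x, y = sorted((a.lower(), b.lower()))
    return (x, y)


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of character trigrams (assumed formula)."""
    la, lb = a.lower(), b.lower()
    ta = {la[i:i + 3] for i in range(len(la) - 2)}
    tb = {lb[i:i + 3] for i in range(len(lb) - 2)}
    if not ta or not tb:
        return 1.0 if la == lb else 0.0
    return len(ta & tb) / len(ta | tb)


def resolve(a: str, b: str, personal: Dict[Pair, float],
            imported: Dict[Pair, float]) -> Tuple[float, str]:
    """Return (similarity, name of the level that supplied it)."""
    key = pair_key(a, b)
    if key in personal:
        return personal[key], "personal"
    if key in imported:
        return imported[key], "imported"
    return trigram_similarity(a, b), "trigram"


personal = {pair_key("logging", "Logger"): 0.75}
imported = {
    pair_key("config", "settings"): 0.5,
    pair_key("logging", "Logger"): 0.25,  # shadowed by the personal entry
}

for a, b in [("logging", "Logger"), ("config", "settings"), ("init", "initialise")]:
    value, level = resolve(a, b, personal, imported)
    print(f"{a} / {b}: {value:.2f} from {level} level")
```

Returning the level name alongside the value gives an explanation component something to report about where a similarity came from.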




Customised (personal) and imported similarity




Client-side architecture




Tag and exchange code snippets




Take home messages

• Re-finding information is a typical task in knowledge work.
• Tagging is a helpful and well-known technique.
• Similarity-based retrieval can improve searches.
• Explanation-aware development of applications helps you deal with the increased complexity of similarity-based retrieval.




Thank you!

