3. General format: Hybrid Mode
Participants: Live + Remote participants
Remote participants:
- Check your email for your Zoom invitation
- Please remain muted until it is your turn to speak!
Presentations:
Live presentations
In-person: Room 103
Questions:
After each presentation we have allocated time for Q & A.
Remote participants:
- We encourage you to use the “Raise your hand” option to ask questions.
- Organizers will invite you to turn on your mic to ask (voice) questions.
Everything on Twitch will be recorded and uploaded to YouTube!
Sebastiano
6. Thanks to our sponsors!
IEEE Technical Community on Software Engineering
(TCSE)
ACM Special Interest Group on Software Engineering
(SIGSOFT)
7. Schedule (UTC+10)
09:10 → Opening & Awards
09:30 → Live Keynote: Automated Bug Management: Reflections and
the Road Ahead by David Lo
10:30 → Break
11:00 → Live Keynote: Trends and Opportunities in the Application of
Large Language Models: the Quest for Maximum Effect by
Albert Ziegler
12:00 → Session: Position Papers
12:30 → Lunch
13:45 → Tool Competition (overview + 5 live presentations + closing)
→ Tool Award session
15:15 → Break
15:45 → Session: Research Papers
17:15 → Closing and Award session
https://conf.researchr.org/program/icse-2023/program-icse-2023/
https://nlbse2023.github.io/
8. Keynote (09:30)
Automated Bug Management: Reflections and the
Road Ahead
For many projects, bug reports, predominantly written in natural
language, are submitted daily to issue tracking systems. The
number of such reports is often too large for busy software
engineers to manually handle and eventually resolve in a timely
fashion. Also, the resolution of each report often requires many
steps, e.g., detecting invalid reports, assigning the reports to
engineers with the right expertise, finding the buggy files requiring
changes, fixing the buggy files, etc. Incorrect decisions made for
any of these steps can slow down the resolution of the bug report.
To help reduce engineers’ workload and improve the reliability of
systems, in the last decade, many automated solutions have been
proposed for various steps in the bug management and resolution
process. This talk will first reflect on the hundreds of studies
done in this popular area of Natural-Language Based Software
Engineering (NLBSE), highlighting success cases and the explored
directions. It will then highlight interesting future work in the road
ahead, describing important unsolved problems and untapped
opportunities.
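The bug management steps the abstract lists (filtering invalid reports, assigning by expertise, locating buggy files) can be pictured as a toy pipeline. The sketch below is purely illustrative and not from the talk: the function names and keyword heuristics are assumptions standing in for the learned models real tools use.

```python
# Toy bug triage pipeline (illustrative only; real tools use learned models).

def is_valid_report(report: str) -> bool:
    """Step 1: filter out reports that are too short or mention no failure."""
    text = report.lower()
    return len(text.strip()) > 20 and ("bug" in text or "error" in text)

def assign_engineer(report: str, expertise: dict[str, list[str]]) -> str:
    """Step 2: route the report to the engineer whose expertise keywords match best."""
    text = report.lower()
    return max(expertise, key=lambda eng: sum(kw in text for kw in expertise[eng]))

def locate_buggy_files(report: str, files: list[str]) -> list[str]:
    """Step 3: rank candidate files whose names appear in the report text."""
    text = report.lower()
    return [f for f in files if f.split("/")[-1].split(".")[0] in text]

report = "Error: login crashes when the password field is empty (see auth.py)"
expertise = {"alice": ["login", "auth"], "bob": ["ui", "render"]}
files = ["src/auth.py", "src/render.py"]
```

A wrong decision at any step (e.g., assigning to `bob` here) would delay every step downstream, which is the failure mode the abstract highlights.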
David Lo
Bio:
Professor of Computer Science at the School of
Computing and Information Systems, Singapore
Management University. He leads the SOftware
Analytics Research (SOAR) group. His research
interest is in the intersection of software
engineering, cybersecurity, and data science,
encompassing socio-technical aspects and
analysis of different kinds of software artifacts.
He has won more than 15 international research
and service awards, including 2 Most Influential
Paper Awards and 6 ACM SIGSOFT
Distinguished Paper Awards. He is currently
serving on the SIGSOFT Executive Committee,
Editorial Boards of TSE, TRel, and EMSE, and as a
PC Co-Chair of ESEC/FSE 2024 and ICSE 2025.
10. Keynote (11:00)
Trends and Opportunities in the Application of Large
Language Models: the Quest for Maximum Effect
As large language models become more and more sophisticated,
the machine learning problem "How to train a great new model so it
best solves my task" increasingly pivots to "How to run a great
existing model so it best solves my task". This is easier said than
done and requires reconciliation of four goals:
1. How to communicate the problem and the format in which
you expect your answer;
2. How to communicate all background information the
model might need to arrive at that answer;
3. How to communicate with the model robustly, in particular
in a way that it is used to from its training set;
4. How to keep the question short in order to adhere to the
context window and save computing time and cost.
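The four goals above can be made concrete with a small prompt-construction sketch. Everything here is a hypothetical illustration, not a Copilot internal: `build_prompt`, `MAX_CHARS`, and the budget value are assumptions.

```python
# Hypothetical prompt builder addressing the four goals (illustrative only).

MAX_CHARS = 2000  # Goal 4: respect the context window and cost budget.

def build_prompt(task: str, context_snippets: list[str]) -> str:
    # Goal 1: state the problem and the expected answer format explicitly.
    header = f"Task: {task}\nAnswer with a single Python function, no prose.\n"
    # Goal 3: present background in a shape the model saw during training
    # (commented code blocks are a common pattern in code corpora).
    body = ""
    # Goal 2: include relevant background, most relevant first ...
    for snippet in context_snippets:
        block = f"# Relevant context:\n{snippet}\n\n"
        # Goal 4: ... but drop whatever no longer fits the budget.
        if len(header) + len(body) + len(block) > MAX_CHARS:
            break
        body += block
    return header + body

prompt = build_prompt("Parse ISO dates", ["def parse(s): ...", "x" * 5000])
```

Note how the goals trade off: the oversized second snippet is dropped entirely (goal 4) even though including it would add background (goal 2), which is the kind of balancing the talk describes.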
The talk discusses strategies for addressing each of these goals in
the code domain, as well as methods for balancing them against
each other. The keynote will in particular focus on the example of
GitHub Copilot and related AI for software development projects.
Albert Ziegler
Bio:
Principal machine learning engineer with a
background in Mathematics and a home at
GitHub Next, GitHub's innovation and
future group. His main interests are
combinations of deductive and intuitive
reasoning to improve the software
development experience. He's previously
worked on developer productivity, ML
guided CodeQL, and he was part of the trio
that conceived and then implemented the
GitHub Copilot project. His most recent
projects include Copilot Radar and AI for
Pull Requests.
11. Research papers (2 sessions)
Full papers (20 minutes):
- 15 minutes for talk
- 5 minutes for questions
Short papers (15 minutes):
- 10 minutes for talk
- 5 minutes for questions
Position papers (15 minutes):
- 8 minutes for talk
- 7 minutes for questions
12. Position papers: schedule (12:00)
The (Ab)use of Open Source Code to Train Language Models (position)
Ali Al-Kaswan and Maliheh Izadi (Delft University of Technology)
Exploring Generalizability of NLP-based Models for Modern Software Development Cross-Domain Environments (position - online)
Rrezarta Krasniqi and Hyunsook Do (University of North Texas)
15. Tool Competition schedule
Opening & Issue Report Classification Competition
Rafael Kallis [1], Maliheh Izadi [2], Pooja Rani [3], Luca Pascarella [4], Oscar Chaparro [5]
[1] Rafael Kallis Consulting, [2] Delft University of Technology, [3] University of Zurich, [4] ETH Zurich, [5] College of William and Mary
An Intelligent Tool for Classifying Issue Reports
Muhammad Laiq (Blekinge Institute of Technology)
Few-Shot Learning for Issue Report Classification
Giuseppe Colavito, Filippo Lanubile, Nicole Novielli (University of Bari)
Code Comment Classification Competition
Pooja Rani [1], Luca Pascarella [2], Oscar Chaparro [3]
[1] University of Zurich, [2] ETH Zurich, [3] College of William and Mary
Performance Comparison of Binary Machine Learning Classifiers in Identifying Code Comment Types: An Exploratory Study
Amila Indika, Peter Y. Washington and Anthony Peruma (University of Hawaiʻi at Mānoa)
Classifying Code Comments via Pre-trained Programming Language Model
Ying Li, Haibo Wang, Huaien Zhang and Shin Hwei Tan (Southern University of Science and Technology)
STACC: Code Comment Classification using Sentence Transformers
Ali Al-Kaswan, Maliheh Izadi and Arie van Deursen (Delft University of Technology)
Closing
Rafael Kallis [1], Maliheh Izadi [2], Pooja Rani [3], Luca Pascarella [4], Oscar Chaparro [5]
[1] Rafael Kallis Consulting, [2] Delft University of Technology, [3] University of Zurich, [4] ETH Zurich, [5] College of William and Mary
Tool Chairs
17. Research papers: schedule (15:45)
An Exploratory Study on the Usage and Readability of Messages within Assertion Methods of Test Cases (full - online)
Taryn Takebayashi [1], Anthony Peruma [1], Mohamed Wiem Mkaouer [2] and Christian Newman [2]
[1] University of Hawai‘i at Mānoa, [2] Rochester Institute of Technology
Stop Words for Processing Software Engineering Documents: Do they Matter? (full)
Yaohou Fan [1], Chetan Arora [2] and Christoph Treude [1]
[1] University of Melbourne, [2] Monash University
Applying Information Theory to Software Evolution (full)
Adriano Torres [1], Sebastian Baltes [1], Christoph Treude [2] and Markus Wagner [3]
[1] University of Adelaide, [2] University of Melbourne, [3] Monash University
Zero-shot Prompting for Code Complexity Prediction Using GitHub Copilot (short - online)
Mohammed Latif Siddiq [1], Abdus Samee [2], Sk Ruhul Azgor [2], Md. Asif Haider [2], Shehabul Islam Sawraz [2] and Joanna Cecilia da Silva Santos [1]
[1] University of Notre Dame, [2] Bangladesh University of Engineering and Technology
Evaluating Code Comment Generation with Summarized API Docs (short - online)
Bilel Matmti and Fatemeh Fard (University of British Columbia)
23. Thanks to: the Tool Competition Co-chairs
Rafael Kallis Maliheh Izadi Pooja Rani Luca Pascarella Oscar Chaparro
for organizing two exciting and relevant tool competitions!
26. Thanks to: our Web Chair
Arnaldo Sgueglia
for his support with the website and virtualization!
27. Thanks to: the Program Committee members
for their support in reviewing papers!
28. Thanks to: Student Volunteers
Christian Birchler Sajad Khatiri
for their help with technical duties and virtualization!
29. What’s Next?
Special issue at Science of Computer Programming 2023:
“NLBSE’23: Natural Language-based Software to Support Software Engineering Processes”
Open Call!
Short papers with a strong focus on software and replication packages
Submission Date: November 1st, 2023
Recordings of our Workshop will be made available on the webpage.
30. What’s Next?
• Coordinate with similar workshops (e.g., NLP-SEA, NLP4RE) in other SE venues
to continuously foster research in the field.
• Involve more industrial subjects and practitioners.
• Promote discussion around current and relevant themes (e.g., AI language
models) and new competitions in other relevant NLBSE areas.
• Encourage the design, implementation, and public availability of usable and
high-quality tools to deal with NLBSE-related challenges.
• We are generally open to ideas or new NLBSE tool competition/challenges
(contact us)!
31. Thank you all for participating!
See you next year in Lisbon
at NLBSE 2024!