9. Domain Analysis
address bar
tabbed browsing
HTML rendering engine
JavaScript engine
GUI components
browser extensions
bookmarks
history
supporting libraries
CSS
XML
A manual, expensive process for large domains!
10. Topic Modeling
Document 1: html css xml javascript png jpeg gif sqlite bmp
Document 2: gtk event image html button frame window
12. Topic Modeling
Document 1: html css xml javascript png jpeg gif sqlite bmp
Document 2: gtk event image html button frame window

Latent Dirichlet Allocation

Topic   Keywords
A       html css xml javascript
B       png jpeg gif bmp image
C       sqlite
D       gtk event button frame window
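To make the step from keyword documents to topics concrete, here is a minimal sketch of Latent Dirichlet Allocation fitted with collapsed Gibbs sampling, using only the Python standard library. The sampler choice, the function name `lda_gibbs`, and the hyperparameters `alpha`, `beta`, `iterations`, and `seed` are assumptions made for illustration; the slides do not say which LDA implementation or settings were used. With only two tiny documents the inferred topics are not meaningful, so the sketch shows the mechanics rather than reproducing the topics A to D above.

```python
import random
from collections import defaultdict

# The two keyword documents from the slide.
DOCS = [
    "html css xml javascript png jpeg gif sqlite bmp".split(),
    "gtk event image html button frame window".split(),
]

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, hypothetical settings).

    Returns (theta, topic_word): theta[d][k] is the membership of document d
    in topic k, and topic_word[k][w] counts how often word w is assigned to k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]           # n(d, k)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                         # n(k)
    assignments = []

    # Start from a random topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-document topic memberships; each row sums to 1.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word

theta, topic_word = lda_gibbs(DOCS, num_topics=4)
for d, row in enumerate(theta, start=1):
    print(f"Document {d}:", [round(p, 2) for p in row])
```

On a realistic corpus one would more likely reach for an established library; the hand-written sampler is used here only to keep the sketch self-contained.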
13. Topic Modeling
Document 1: html css xml javascript png jpeg gif sqlite bmp
Document 2: gtk event image html button frame window

Latent Dirichlet Allocation

Topic   Keywords
A       html css xml javascript
B       png jpeg gif bmp image
C       sqlite
D       gtk event button frame window

Topic memberships
Document 1: A: 0.4, B: 0.4, C: 0.2, D: 0.0
Document 2: A: 0.2, B: 0.2, C: 0.0, D: 0.6
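One rough way to read the membership numbers is as the share of a document's keywords that fall under each topic. The sketch below computes that share by plain counting against the keyword table. This is a simplification assumed for illustration, not LDA inference itself (LDA assigns words to topics probabilistically), and the function name `membership_by_counting` is hypothetical. The counts come out close to, but not equal to, the rounded figures on the slide.

```python
# Topic keyword table as shown on the slide.
TOPIC_KEYWORDS = {
    "A": {"html", "css", "xml", "javascript"},
    "B": {"png", "jpeg", "gif", "bmp", "image"},
    "C": {"sqlite"},
    "D": {"gtk", "event", "button", "frame", "window"},
}

DOC_1 = "html css xml javascript png jpeg gif sqlite bmp".split()
DOC_2 = "gtk event image html button frame window".split()

def membership_by_counting(words, topic_keywords):
    """Fraction of a document's matched keywords that belong to each topic."""
    counts = {
        topic: sum(1 for w in words if w in keywords)
        for topic, keywords in topic_keywords.items()
    }
    total = sum(counts.values())
    if total == 0:
        # No keyword matched any topic: report zero membership everywhere.
        return {topic: 0.0 for topic in topic_keywords}
    return {topic: count / total for topic, count in counts.items()}

for name, doc in (("Document 1", DOC_1), ("Document 2", DOC_2)):
    shares = membership_by_counting(doc, TOPIC_KEYWORDS)
    print(name, {topic: round(share, 2) for topic, share in shares.items()})
```

For Document 1 the counting gives 4/9 for A, 4/9 for B, and 1/9 for C; for Document 2 it gives 1/7 for A, 1/7 for B, and 5/7 for D.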
14. Topic Modeling
Document 1: html css xml javascript png jpeg gif sqlite bmp
Document 2: gtk event image html button frame window

Latent Dirichlet Allocation

Topic   Keywords
A       html css xml javascript
B       png jpeg gif bmp image
C       sqlite
D       gtk event button frame window

Topic memberships
Document 1: A: 0.4, B: 0.4, C: 0.2, D: 0.0
Document 2: A: 0.2, B: 0.2, C: 0.0, D: 0.6

Human analysis
32. Successful Identification of Commonalities & Variability in Domain
[Chart: Top 10 Topic Clusters; vertical axis: Percent of Systems contained in cluster (spread), 0 to 100; labels: Text Based, GUI Based, Web Browsers]
layout: view, window, event, widget, terminal, tab
33. Successful Identification of Commonalities & Variability in Domain
[Chart: Top 10 Topic Clusters; vertical axis: Percent of Systems contained in cluster (spread), 0 to 100; labels: Text Based, GUI Based, Web Browsers]
layout: view, window, event, widget, terminal, tab
text-based components: text, items, line, term, htext
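The spread on the chart's vertical axis, the percent of systems contained in a cluster, can be sketched as follows. The sketch assumes that a system counts as containing a cluster when its membership in that cluster reaches a threshold; the function name `cluster_spread`, the threshold value 0.1, and the treatment of each topic as its own cluster are assumptions for illustration. The input reuses the two example membership vectors from the topic modeling slides, not the data behind the chart.

```python
def cluster_spread(memberships, threshold=0.1):
    """Percent of systems whose membership in each cluster is >= threshold.

    memberships maps a system name to a dict of cluster -> membership.
    The threshold is a hypothetical value, not one taken from the study.
    """
    if not memberships:
        raise ValueError("at least one system is required")
    clusters = sorted({c for m in memberships.values() for c in m})
    spread = {}
    for cluster in clusters:
        containing = sum(
            1 for m in memberships.values() if m.get(cluster, 0.0) >= threshold
        )
        spread[cluster] = 100.0 * containing / len(memberships)
    return spread

# Illustrative input: the two example documents, treated here as two systems.
EXAMPLE_MEMBERSHIPS = {
    "system 1": {"A": 0.4, "B": 0.4, "C": 0.2, "D": 0.0},
    "system 2": {"A": 0.2, "B": 0.2, "C": 0.0, "D": 0.6},
}

for cluster, percent in cluster_spread(EXAMPLE_MEMBERSHIPS).items():
    print(f"{cluster}: {percent:.0f}% of systems")
```

In this reading, a cluster with high spread points to a commonality across the domain, while a cluster confined to a few systems points to variability. Changing the threshold changes which systems count as containing a cluster, so the result depends on that choice.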
38. Threats
• Comments: more semantic information than code, but often missing
• Only open source systems studied
• Other documentation
• One set of thresholds and cluster size tested