Dongmei Zhang and Tao Xie. Pathways to Technology Transfer and Adoption: Achievements and Challenges. In Proceedings of the 35th International Conference on Software Engineering (ICSE 2013), Software Engineering in Practice (SEIP), Mini-Tutorial, San Francisco, CA, May 2013. http://people.engr.ncsu.edu/txie/publications/icse13seip-techtransfer.pdf
1. Pathways to Technology Transfer and Adoption: Achievements and Challenges
Dongmei Zhang, Microsoft Research Asia (dongmeiz@microsoft.com)
Tao Xie, North Carolina State University (taoxie@gmail.com)
ICSE 2013 SEIP Mini-Tutorial, May 23, 2013
10. Mindset Changing is Needed for Our Community
• Need to get out of comfort zone
• Need to value (and pursue) "realness"
• Need to aim for ultimate tasks
• Need to value (and pursue) tech readiness
15. NSF Workshop on Formal Methods (December 2012)
• Goal: to identify future research directions in formal methods and its transition to industrial practice
• The workshop brought together researchers and identified the primary challenges in the field: foundational, infrastructural, and in transitioning ideas from research labs to developer tools
http://goto.ucsd.edu/~rjhala/NSFWorkshop/
Related fields (e.g., formal methods) have recently looked into transitioning research to industrial practice. Time for us to do so too!
16. Mindset Changing is Needed for Our Community
• Need to get out of comfort zone
• Need to value (and pursue) "realness"
• Need to aim for ultimate tasks
• Need to value (and pursue) tech readiness
24. Industrial Evaluations != Real Adoption
• In papers reporting industrial studies/evaluations that apply tools to industrial code, who applies the tools?
  • The authors themselves instead of third parties
  • Non-target users (such as students)
  • Target users, but not the developers of the industrial code
  • The developers of the industrial code
• Is the tool applied one time (hit & run), or continuously adopted?
Need to value real adoption (e.g., in reviewing papers)
25. Mindset Changing is Needed for Our Community
• Need to get out of comfort zone
• Need to value (and pursue) "realness"
• Need to aim for ultimate tasks
• Need to value (and pursue) tech readiness
29. MS Academic Search: "Clone Detection"
Research typically focuses on and evaluates intermediate steps (e.g., clone detection) instead of ultimate tasks (e.g., bug detection or refactoring), even when the field has already grown mature with n years of effort on intermediate steps.
30. Some Success Stories of Applying Clone Detection [Focus on Ultimate Tasks]
• CP-Miner: Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-Paste and Related Bugs in Operating System Code. In Proc. OSDI 2004.
• MSRA XIAO: Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning Code Clones at Hands of Engineers in Practice. In Proc. ACSAC 2012.
http://patterninsight.com/
http://www.blackducksoftware.com/
http://research.microsoft.com/en-us/groups/sa/
31. Mindset Changing is Needed for Our Community
• Need to get out of comfort zone
• Need to value (and pursue) "realness"
• Need to aim for ultimate tasks
• Need to value (and pursue) tech readiness
32. Example Dimensions of Tech Readiness
• Scalability
• Complexity
• Applicability
• Usability (human in the loop)
• Cost-Benefit Analysis
33. Scalability
• Academia
  • Rarely asks "When scale is up, will my solution still work?"
  • Tends to focus on small- or toy-scale problems
• Real world (e.g., search engines, code analysis, …)
  • Often demands a scalable solution
  • Ideal: a sophisticated and scalable solution
  • But in practice, a simple solution tends to be more scalable (performance, maintenance, …)
  • Academia tends to value sophistication > simplicity
  • Ex: Echelon@MS [Srivastava & Thiagarajan ISSTA'02], KLEE [Cadar et al. OSDI'08]
http://dl.acm.org/citation.cfm?id=566187
http://dl.acm.org/citation.cfm?id=1855756
34. Complexity
• Academia
  • Tends to make simplifying assumptions, relaxing them one at a time over the years
  • May not be able to assess the relevance/feasibility of those assumptions in practice; may not consult or work with industry
• Real world
  • Often has high complexity, violating these assumptions
  • Example: OO unit test generation
  • Isolated simple classes → isolated complex data structures → real-world classes, as targeted by our recent work [Thummalapenta et al. ESEC/FSE'09, OOPSLA'11]
http://dl.acm.org/citation.cfm?id=2048083
http://dl.acm.org/citation.cfm?id=1595725
35. Applicability
• Academia
  • Tends to focus on a solution optimized for one of many situations (likely worse for the others) rather than a comprehensive solution
  • May not be able to tell ahead of time whether a given case falls into the applicable scope of the solution
• Real world
  • Needs a comprehensive solution that works generally (or at least does not compromise other situations too much)
• Examples
  • Integration of our Fitnex in Pex [Xie et al. DSN'09]
  • Coverity [Bessey et al. CACM'10] vs. MSRA XIAO [Dang et al. ACSAC'12]/PatternInsight
  • Industry adoption of open-source tools
http://dl.acm.org/citation.cfm?id=1646374
http://research.microsoft.com/pubs/81089/dsn09-fitnex%5B1%5D.pdf
http://research.microsoft.com/jump/175199
36. Usability
• Academia
  • Tends to leave the human out of the loop (involving humans makes evaluations difficult to conduct or write up)
  • Tends not to spend effort on improving tool usability
    • Tool usability is valued more in HCI than in SE
    • Too much to include both the approach/tool itself and its usability evaluation in a single paper
• Real world
  • Often has a human in the loop (familiar IDE integration, social effects, lack of expertise/willingness to write specs, …)
• Examples
  • Agitar [Boshernitsan et al. ISSTA'06] vs. Daikon [Ernst et al. ICSE'99]
  • Debugging user study [Parnin & Orso ISSTA'11]
http://dl.acm.org/citation.cfm?id=302467
http://dl.acm.org/citation.cfm?id=1146258
http://dl.acm.org/citation.cfm?id=2001445
37. "Are Automated Debugging [Research]
Techniques Actually Helping Programmers?"
• 50 years of automated debugging research
• N papers only 5 evaluated with actual programmers
“
” [Parnin&Orso ISSTA’11]
http://dl.acm.org/citation.cfm?id=2001445
38. Cost-Benefit Analysis
• Academia
  • Tends to focus on one or a few dimensions of measurement (e.g., analysis cost, precision, and/or recall)
• Real world
  • Considers many dimensions of measurement
  • Cost, e.g., human cost (inspecting false positives)
  • Benefit, e.g., bug severity
• Killer apps, e.g.,
  • MSR SLAM: device driver verification
  • MSR SAGE: security testing of binaries [Godefroid et al. NDSS'08]
  • PatternInsight/MSRA XIAO: known-bug detection
• Example: Google FindBugs Fixit [Ayewah & Pugh ISSTA'09]; a toy cost-benefit model is sketched below
http://research.microsoft.com/en-us/projects/slam/
http://research.microsoft.com/en-us/um/people/pg/public_psfiles/ndss2008.pdf
http://dl.acm.org/citation.cfm?id=1831738
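To make the multi-dimensional view concrete, here is a toy cost-benefit model (our own illustration in Python, not from the talk): a high-precision tool can beat a high-volume one once the human cost of triaging every report is counted.

# Toy cost-benefit model for adopting an analysis tool (illustrative only;
# a real analysis would also weigh bug severity, fix cost, schedule risk, etc.).

def net_value(reports, precision, avg_bug_value, triage_cost):
    true_bugs = reports * precision          # reports that are real bugs
    benefit = true_bugs * avg_bug_value      # e.g., weighted by severity
    cost = reports * triage_cost             # humans inspect every report,
    return benefit - cost                    # including false positives

# Many noisy reports vs. fewer precise ones (numbers are made up):
print(net_value(reports=1000, precision=0.1, avg_bug_value=500, triage_cost=60))  # -10000.0
print(net_value(reports=100,  precision=0.6, avg_bug_value=500, triage_cost=60))  # 24000.0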
39. Industry-Academia Collaboration
• Academia (research recognition, e.g., papers) vs. industry (company revenue)
• Academia (research innovations) vs. industry (likely involving engineering efforts)
• Academia (long-term/fundamental research, out-of-the-box thinking) vs. industry (short-term research or work)
• Industry offers: problems, infrastructures, data, evaluation testbeds, …
• Academia offers: educating students, …
40. MSRA Software Analytics Group
Mission: utilize a data-driven approach to help create high-performing, user-friendly, and efficiently developed and operated software and services.
Founded: May 2009
Group members: 12
http://research.microsoft.com/en-us/groups/sa/
http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx
41. Software Analytics
Software analytics enables software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services.
Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences. In Proc. MALETS 2011.
43-46. Research topics & technology pillars
Research topics:
• Software Development Process
• Software Systems
• Software Users
Technology pillars:
• Information Visualization
• Analysis Algorithms
• Large-scale Computing
The research topics run vertically across the horizontal technology pillars.
47-48. Connection to practice
• Software analytics is naturally tied with software development practice
• Getting real: real data, real problems, real users, real tools
49. Creating real impact
• Code clone analysis (XIAO) [Dang et al. ACSAC'12]
  • Detecting near-duplicated code
  • Released with Visual Studio 2012
• StackMine [Han et al. ICSE'12]
  • Performance debugging in the large via mining millions of stack traces
  • Helping improve Windows performance
http://research.microsoft.com/jump/175199
http://dl.acm.org/citation.cfm?id=2337241
52. Real world is not that pretty…
• Data is incomplete and noisy…
• The scale of data is huge…
• We do not have all the time in the world to compute…
• The machines are not powerful enough…
• End users are “impatient”…
• Product teams are always busy…
• Product teams do not commit before seeing everything
working…
• Product teams change plans and priorities…
• Product teams speak “different languages”…
• More …
53. What does "getting real" mean?
• Solving real problems
• Building real technologies
• Making real impact
Software engineering is naturally tied with software development practice.
55. Example project – XIAO
• Token-based code clone analysis technique
• Characteristics: high tunability, high scalability, high compatibility, high explorability
• Three-year journey from research to impact: prototype development → early adoption → technology transfer
• Technology transfers
  • Visual Studio 2012
  • Code clone search service within Microsoft
Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning Code Clones at Hands of Engineers in Practice. In Proc. ACSAC 2012.
http://research.microsoft.com/jump/175199
56. Scalability
• Four-step analysis process: pre-processing → coarse matching → fine matching → pruning
• Easily parallelizable based on source code partitions (see the sketch below)
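A minimal sketch of such a partition-parallel pipeline (in Python; the helper heuristics below are illustrative stand-ins, not XIAO's actual algorithms):

# Minimal sketch of a partition-parallel, four-step clone analysis
# pipeline: pre-processing -> coarse matching -> fine matching -> pruning.
# All heuristics are illustrative stand-ins, not XIAO's actual algorithms.
from difflib import SequenceMatcher
from multiprocessing import Pool

def preprocess(partition):
    # partition: dict {path: source text} -> list of (path, statements)
    return [(path, [s.strip() for s in src.splitlines() if s.strip()])
            for path, src in partition.items()]

def coarse_match(units):
    # Cheap filter: only pair units that share at least one statement.
    return [(a, b) for i, a in enumerate(units) for b in units[i + 1:]
            if set(a[1]) & set(b[1])]

def fine_match(pairs, threshold=0.8):
    # Expensive statement-level comparison on surviving candidates.
    return [(a, b) for a, b in pairs
            if SequenceMatcher(None, a[1], b[1]).ratio() >= threshold]

def prune(clones, min_statements=5):
    # Drop trivial clone pairs.
    return [(a, b) for a, b in clones if len(a[1]) >= min_statements]

def analyze_partition(partition):
    return prune(fine_match(coarse_match(preprocess(partition))))

def analyze(partitions, workers=8):
    # Each source partition is analyzed independently, so the pipeline
    # parallelizes naturally across processes (or machines).
    with Pool(workers) as pool:
        results = pool.map(analyze_partition, partitions)
    return [clone for part in results for clone in part]

Note that naive per-partition analysis misses clone pairs spanning partitions; handling those (e.g., by sharing the coarse-matching index across partitions) is part of the engineering work that tech readiness demands.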
57. What you tune is what you get
• Intuitive similarity metric
  • Effective control of the degree of syntactic difference between two code snippets
  • Tunable at fine granularity
• Statement similarity
  • % of inserted/deleted/modified statements
  • Balance between code structure and disordered statements
Snippet 1:
for (i = 0; i < n; i ++) {
  a ++;
  b ++;
  c = foo(a, b);
  d = bar(a, b, c);
  e = a + c; }
Snippet 2:
for (i = 0; i < n; i ++) {
  c = foo(a, b);
  a ++;
  b ++;
  d = bar(a, b, c);
  e = a + d;
  e ++; }
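As a rough illustration of how such a statement-level metric behaves on the two snippets above (a sketch of the idea, not XIAO's actual formula), one can diff the statement sequences and derive a similarity from the fraction of matched vs. inserted/deleted/modified statements:

# Sketch of a tunable statement-level similarity metric (illustrative;
# XIAO's actual metric and tuning knobs are more refined).
from difflib import SequenceMatcher

def statement_similarity(stmts_a, stmts_b):
    # Diff the two statement sequences; ratio() is 2*matches/total, so
    # inserted/deleted/modified statements all lower the score.
    return SequenceMatcher(None, stmts_a, stmts_b).ratio()

snippet_1 = ["for (i = 0; i < n; i ++) {", "a ++;", "b ++;",
             "c = foo(a, b);", "d = bar(a, b, c);", "e = a + c; }"]
snippet_2 = ["for (i = 0; i < n; i ++) {", "c = foo(a, b);", "a ++;",
             "b ++;", "d = bar(a, b, c);", "e = a + d;", "e ++; }"]

sim = statement_similarity(snippet_1, snippet_2)
print(f"similarity = {sim:.2f}")   # ~0.62 for the snippets above
if sim >= 0.6:                     # the threshold is what the engineer tunes
    print("report as clone pair")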
61. Explorability
How to navigate through the large number of detected clones?
1. Clone navigation based on source tree hierarchy
2. Pivoting of folder-level statistics
3. Folder-level statistics
4. Clone function list in selected folder
5. Clone function filters
6. Sorting by bug or refactoring potential
7. Tagging
How to quickly review a pair of clones?
1. Block correspondence
2. Block types
3. Block navigation
4. Copying
5. Bug filing
6. Tagging
64. Communication – getting connected
• Reaching out to practitioners
• Understanding their business
• Speaking practitioners' languages
• Finding out their pain points
• Understanding their scenarios
• Experiencing their pain
• Articulating their problems
65. Communication – forming partnership
• Finding and defining shared goals
• Setting the right expectations
• Building a roadmap
• Forming a virtual team (creating an email alias)
• Adopting a milestone approach
• Conducting regular sync-ups
66. Example project – XIAO
• Tons of papers published in the past 10 years
• 6 years of the International Workshop on Software Clones (IWSC) since 2006
• Dagstuhl seminars
  • Software Clone Management towards Industrial Application (2012)
  • Duplication, Redundancy, and Similarity in Software (2006)
• Yet no code clone analysis tool in MS, and no product offering
Source: http://www.dagstuhl.de/12071
Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning Code Clones at Hands of Engineers in Practice. In Proc. ACSAC 2012.
http://research.microsoft.com/jump/175199
67. Motivation
• Copy-and-paste is a common developer behavior
• A real tool widely adopted internally and externally
68. Reaching out (1)
• Demonstrating XIAO at TechFest
• Posting XIAO on an internal website
• Actively "selling" to various teams
• What we gained
  • Opportunities to run XIAO on different codebases and produce rich results
  • Feedback to improve both the algorithm and the system
  • An expanded network
69. Reaching out (2)
• What did not land well internally
  • Wide interest, but no concrete takers
• Why no takers?
  • What exactly is the value proposition?
  • Long way to go from code clones to bugs
  • High cost of code refactoring
  • Product prioritization
• Lessons learned
  • Killer scenarios needed for the value proposition
  • Security is a big stick
70. Potential 0day vulnerability disclosure
Initial vulnerability reported in product A → security bulletin released → similar vulnerability found in product B by attackers → potential 0day attack! → patch release of product B
71. Tech transfer to MSRC*
• Search scenario vs. detection scenario
  • Code snippet as input
  • Much larger scale of codebases
  • Near-real-time response
• Code clone search service (see the index sketch below)
  • Indexed ~600 million LOC across multiple codebases
  • Deployed in, used by, and transferred to MSRC
• A champion in MSRC worked with us all the way
  • Providing feedback and updates
  • Promoting within MSRC
* Microsoft Security Response Center
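The search scenario (a snippet in, clones out, in near real time over hundreds of millions of LOC) suggests building an inverted index offline and querying it at lookup time. A minimal sketch under that assumption (the n-gram index design below is our guess for illustration, not the deployed service):

# Sketch of snippet-to-clone search via an inverted n-gram index
# (index structure is an illustrative assumption, not the real service).
from collections import defaultdict

N = 4  # n-gram size over normalized statements

def ngrams(statements, n=N):
    return [tuple(statements[i:i + n]) for i in range(len(statements) - n + 1)]

class CloneSearchIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # n-gram -> set of file ids

    def add(self, file_id, statements):
        # Index once, offline, over the whole codebase.
        for g in ngrams(statements):
            self.postings[g].add(file_id)

    def search(self, snippet_statements, min_hits=2):
        # Near-real-time lookup: count how many of the snippet's n-grams
        # each indexed file shares, then rank the candidates.
        hits = defaultdict(int)
        for g in ngrams(snippet_statements):
            for file_id in self.postings.get(g, ()):
                hits[file_id] += 1
        return sorted((f for f, h in hits.items() if h >= min_hits),
                      key=lambda f: -hits[f])

Candidates returned this way would still pass through fine matching before being reported.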
72. Vulnerability investigation workflow
Manual & ad hoc investigation: given a vulnerable code snippet from MSRC, Teams A, B, and C each hunt for code clones by hand, then work through issue reproducing → root cause investigation & source location → variants finding → design/implement/test fix.
73. Vulnerability investigation workflow
With the clone search service, a code snippet goes in and code clones come out; completeness is the key. A Web service API enables automated investigation feeding the same steps: issue reproducing → root cause investigation & source location → variants finding → design/implement/test fix.
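A hypothetical client for that automation step (the endpoint URL and JSON shape are illustrative assumptions, not the actual MSRC interface):

# Hypothetical client for automated variant finding; the endpoint and
# payload format below are illustrative assumptions, not the real API.
import json
from urllib import request

def find_variants(snippet, url="http://clonesearch.example/api/search"):
    payload = json.dumps({"snippet": snippet}).encode("utf-8")
    req = request.Request(url, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)  # e.g., a list of {file, span, similarity}

# An investigation script can then follow up on every hit:
# for clone in find_variants(vulnerable_snippet):
#     file_bug(clone)  # hypothetical issue-tracker call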
74. More secure Microsoft products
• Code clone search service integrated into the vulnerability investigation process of MSRC
• Automated previously laborious manual efforts
• Faster response time, critical in a security context
• Real security issues proactively identified and addressed
75. Example – MS security bulletin MS12-034
Combined security update for Microsoft Office, Windows, .NET Framework, and Silverlight, published Tuesday, May 08, 2012.
Three publicly disclosed vulnerabilities and seven privately reported ones were involved. Specifically, one was exploited by the Duqu malware to execute arbitrary code when a user opened a malicious Office document: an insufficient bounds check within the font-parsing subsystem of win32k.sys, with cloned copies in gdiplus.dll, ogl.dll (Office), Silverlight, and the Windows Journal viewer.
From the Microsoft Security Research & Defense blog about this bulletin:
"However, we wanted to be sure to address the vulnerable code wherever it appeared across the Microsoft code base. To that end, we have been working with Microsoft Research to develop a "Cloned Code Detection" system that we can run for every MSRC case to find any instance of the vulnerable code in any shipping product. This system is the one that found several of the copies of CVE-2011-3402 that we are now addressing with MS12-034."
http://blogs.technet.com/b/srd/archive/2012/05/08/ms12-034-duqu-ten-cve-s-and-removing-keyboard-layout-file-attack-surface.aspx
76. Transfer to Visual Studio (1)
• Unsuccessful efforts
  • Out-of-band (OOB) release
  • Power Tool
  • Two reorgs in Visual Studio
• Lessons learned
  • No integration story; felt like a "separate" tool
  • Not on the release path of VS
• Accumulated assets
  • Solidified algorithm and system
  • Trusted partners: one program manager in VS; MSRA Innovation Engineering Group
77. Transfer to Visual Studio (2)
• Third time's the charm
  • Strong support from the general manager of VSU
  • Concrete scenarios identified
  • An easy sell at the VS 2012 planning meeting
• Virtual team
  • Researchers (MSRA SA), developers (MSRA IEG, VS), program manager (VS), tester (VS)
• Active planning as part of the VS 2012 release
  • Weekly sync-ups
  • Timely feedback from VS partners
79. Summary
• Mindset changes needed for our community
  • Get out of the comfort zone
  • Value (and pursue) "realness"
  • Aim for ultimate tasks
  • Value (and pursue) tech readiness
• Experience sharing of successful tech transfer in Software Analytics
  • Getting-real mindset
  • Technical readiness
  • Collaboration