Can we induce change with what we measure?



Tom DeMarco states that “You can’t control what you can’t measure”, but how much can we change and control (with) what we measure? This talk investigates the opportunities and limits of data-driven software engineering, shows which opportunities lie ahead of us when we engage in mining and analyzing software engineering process data, but also highlights important factors that influence the success and adaptability of data-based improvement approaches.



  1. Data-driven software engineering @Microsoft. Michaela Greiler
  2. Data-driven software engineering @Microsoft •How can we optimize the testing process? •Do code reviews make a difference? •Are coding velocity and quality always a tradeoff? •What’s the optimal way to organize work on a large team? MSR Redmond/TSE: Michaela Greiler, Jacek Czerwonka, Wolfram Schulte, Suresh Thummalapenta. MSR Redmond: Christian Bird, Kathryn McKinley, Nachi Nagappan, Thomas Zimmermann. MSR Cambridge: Brendan Murphy, Kim Herzig
  3. [Chart: code coverage of check-ins, monthly, Nov 2010 to Oct 2013; series: % completely covered, % somewhat covered, % not covered]
  4. Reviewer recommendation: Does experience matter?
  5. Can we change with what we can measure? Michaela Greiler
  6. YES
  7. YES, that’s the danger!
  8. What is measured? [Bar chart: number of bugs per engineer (Carl, Lisa, Rob, Danny)] What is changed? [Bar chart: number of bugs and code quality per engineer]
  9. What is measured? [Bar chart: number of bugs per engineer (Carl, Lisa, Rob, Danny)] What is changed? [Bar chart: number of bugs and code quality per engineer]
  10. SOCIO-TECHNICAL CONGRUENCE. “Design and programming are human activities; forget that and all is lost” (Bjarne Stroustrup)
  11. So should we go without any measurements?
  12. No. [Diagram: Data Collection, Interpretation, Usage, Lessons learned] Garbage!
  13. •What is CodeMine? What data does CodeMine have?
  14. GQM vs. Opportunistic data collection •Easily available ≠ what’s needed •Determine the needed data •Find proxy measures if needed •Know the analysis before collecting the data; otherwise, the data is not usable for the intended purpose •Goal, Question, Metric •Check for completeness, cleanness/noise and usefulness •Data background: How was the data generated? Why was it generated? Who consumes the data? What about outliers? How was the data processed?
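The GQM discipline on this slide (fix the goal, questions, and metrics before collecting anything) can be sketched as data. The goal, questions, and measure names below are invented examples for illustration, not taken from the talk:

```python
# Hypothetical Goal-Question-Metric breakdown: the analysis is fixed first,
# and only then do we ask which raw data must be collected.
gqm = {
    "goal": "Reduce the cost of the testing process without hurting quality",
    "questions": {
        "How often do test executions fail?": [
            "failed executions / total executions",
        ],
        "How many failures are false alarms?": [
            "failures without a linked bug report / total failures",
        ],
    },
}

def needed_data(model):
    """List the raw measures the metrics require, before any collection starts."""
    return sorted({m for metrics in model["questions"].values() for m in metrics})

for measure in needed_data(gqm):
    print(measure)
```

Walking the model this way makes "easily available ≠ what's needed" concrete: anything in `needed_data` that no existing telemetry provides calls for a proxy measure.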
  15. Interpretation needs domain knowledge
  16. Tools, processes, practices and policies. Release schedule (M1, M2, Beta) over time. Engineers: What roles exist? Who does what? Responsibilities? Organization of code bases. Team structure and culture.
  17. You cannot compare 1:1
  18. Engineers want to understand the nitty-gritty •How do you calculate the recommended reviewers? •Why was that person recommended? •Why is Lisa not recommended?
  19. Simplicity first. Files without bugs / files with bugs. Files without bugs: main contributor made > 50% of all edits. Files with bugs: main contributor made < 60% of all edits. Ownership metric: proportion of edits, of all edits, for the contributor with the most edits. Reporting vs. prediction. Comprehension vs. automation. If you can do it with a decision tree… do it…
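A minimal sketch of the ownership metric described on this slide, assuming each file's edit history is a list of author names; the helper functions and sample data are hypothetical, and the 50% threshold mirrors the slide:

```python
# Ownership metric: the top contributor's share of all edits to a file.
from collections import Counter

def ownership(edits_by_author):
    """Proportion of all edits made by the contributor with the most edits."""
    counts = Counter(edits_by_author)
    return max(counts.values()) / sum(counts.values())

# Decision-tree-style rule in the spirit of the slide: files whose main
# contributor made more than 50% of the edits tended to be bug-free.
def likely_buggy(edits_by_author, threshold=0.5):
    return ownership(edits_by_author) <= threshold

edits = ["carl", "carl", "carl", "lisa"]   # carl made 3 of 4 edits
print(ownership(edits))                    # 0.75
print(likely_buggy(edits))                 # False
```

A one-split rule like this is exactly the kind of model engineers can interrogate ("why was this file flagged?"), which is the slide's point about comprehension vs. automation.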
  20. Iterative process with very close involvement of product teams and domain experts. It’s a dialog. It’s a back and forth.
  21. Mixed Method Research. Is a research approach or methodology •for questions that call for real-life contextual understandings; •employing rigorous quantitative research assessing magnitude and frequency of constructs and •rigorous qualitative research exploring the meaning and understanding of constructs. Dr. Margaret-Anne Storey, Professor of Computer Science, University of Victoria. All methods are inherently flawed! Generalizability, precision, realism. Dr. Arie van Deursen, Professor of Software Engineering, Delft University of Technology
  22. Foundations of Mixed Methods Research. Designing Social Inquiry. Qualitative Research: Mixed Method Research •Interviews •Observations •Focus groups •Contextual Inquiry •Grounded Theory •…
  23. A Grounded Theory Study. Systematic procedure to discover a theory from (qualitative) data. S. Adolph, W. Hall, Ph. Kruchten. Using grounded theory to study the experience of software development. Empirical Software Engineering, 2011. B. Glaser and J. Holton. Remodeling grounded theory. Forum Qualitative Res., 2004. Glaser and Strauss
  24. Deductive versus inductive. A deductive approach is concerned with developing a hypothesis (or hypotheses) based on existing theory, and then designing a research strategy to test the hypothesis (Wilson, 2010, p. 7). An inductive approach starts with observations; theories emerge towards the end of the research, as a result of careful examination of patterns in the observations (Goddard and Melville, 2004). Deductive: Theory → Hypotheses → Observation → Confirm/Reject. Inductive: Observation → Patterns → Theory.
  25. All models are wrong but some are useful (George E. P. Box)
  26. Theo: Test Effectiveness Optimization from History. Kim Herzig*, Michaela Greiler+, Jacek Czerwonka+, Brendan Murphy*. *Microsoft Research, Cambridge; +Microsoft Corporation, US
  27. Improving Development Processes. Product / Service: legacy changes, new product features, technology changes, development environment. [Diagram: speed, cost, and quality/risk should be well balanced] Microsoft aims for shorter release cycles. Empirical data to support & drive decisions • Speed up development processes (e.g. code velocity) • More frequent releases • Maintaining / increasing product quality. Joint effort by MSR & product teams • MSR Cambridge: Brendan Murphy, Kim Herzig • TSE Redmond: Jacek Czerwonka, Michaela Greiler • MSR Redmond: Tom Zimmermann, Chris Bird, Nachi Nagappan • Windows, Windows Phone, Office, Dynamics product teams
  28. Software Testing for Windows. [Diagram, simplified illustration: multiple component branches feed multiple area branches, which feed a development branch and winmain (the main branch) over time, with quality gates (component testing; system & component testing; system testing) between levels] Software testing is very expensive • Thousands of test suites, millions of test cases executed • On different branches, architectures, languages, etc. • We tend to repeat the same tests over and over again • Too many false alarms (failures due to test and infrastructure issues) • Each test failure slows down product development. The actual problem: the current process aims for maximal protection; it aims to find code issues as early as possible, at the cost of slower product development.
  29. Software Testing for Office. Software testing is very expensive • Thousands of test suites, millions of test cases executed • On different branches, architectures, languages, etc. • We tend to repeat the same tests over and over again • Too many false alarms (failures due to test and infrastructure issues) • Each test failure slows down product development. The actual problem: the current process aims for maximal protection; it aims to find code issues as early as possible, at the cost of slower product development. [Diagram, simplified illustration: Dev inner loop; BVT and CVT on main; dog food] Office differs in • Branching structure • Development process • Testing process • Release schedules • …
  30. Goal. Reduce the number of test executions… …without sacrificing code quality. Dynamic, self-adaptive optimization model
  31. Solution. Reduce the number of test executions… •Run every test at least once before integrating a code change into the main branch (e.g., winmain). •We eventually find all code issues but take the risk of finding them later (on higher-level branches). …without sacrificing code quality. How likely is a test causing: 1) false positives or 2) finding code issues? [Quadrants: high cost, unknown value $$$$$; high cost, low value $$$$; low cost, low value $; low cost, good value $$] Analyze historic data: test events, builds, code integrations. Analyze past test results: passing tests, false alarms, detected code issues.
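The historic analysis this slide calls for (passing tests, false alarms, detected code issues) could be sketched as follows; the record format, outcome labels, and sample data are assumptions for illustration, not the actual THEO pipeline:

```python
# Estimate, per test, how likely a failure is a false alarm vs. a real code
# issue, from a history of past executions.
from collections import defaultdict

def failure_stats(history):
    """history: iterable of (test, outcome), outcome in {'pass', 'false_alarm', 'code_issue'}."""
    stats = defaultdict(lambda: {"pass": 0, "false_alarm": 0, "code_issue": 0})
    for test, outcome in history:
        stats[test][outcome] += 1
    probs = {}
    for test, s in stats.items():
        total = sum(s.values())
        probs[test] = {
            "prob_fp": s["false_alarm"] / total,  # execution ends in a test/infra failure
            "prob_tp": s["code_issue"] / total,   # execution finds a real defect
        }
    return probs

# Invented sample: 8 passes and 2 false alarms for one (hypothetical) suite.
history = [("bvt_net", "pass")] * 8 + [("bvt_net", "false_alarm")] * 2
print(failure_stats(history)["bvt_net"])   # {'prob_fp': 0.2, 'prob_tp': 0.0}
```

These two per-test probabilities are exactly the inputs the cost model on the next slide consumes.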
  32. Bug finding capabilities change with context
  33. Solution. Using a cost function to model risk.
      Cost_Execution > Cost_Skip ? suspend : execute test
      Cost_Execution = Cost_Machine/Time * Time_Execution + "cost of a potential false alarm"
                     = Cost_Machine/Time * Time_Execution + (Prob_FP * Cost_Developer/Time * Time_Triage)
      Cost_Skip = "potential cost of finding a defect later"
                = Prob_TP * Cost_Developer/Time * Time_Freeze_Branch * #Developers_Branch
      Test: cost to run a test vs. value of its output.
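The skip-or-execute decision above can be sketched in code; all machine/developer rates and times below are invented placeholder values, not figures from the talk:

```python
# Cost of executing a test: machine time plus the expected cost of a
# developer triaging a false alarm.
def cost_execution(machine_rate, exec_time, prob_fp, dev_rate, triage_time):
    return machine_rate * exec_time + prob_fp * dev_rate * triage_time

# Cost of skipping: expected cost of finding the defect later, while the
# whole branch is frozen for every developer on it.
def cost_skip(prob_tp, dev_rate, freeze_time, devs_on_branch):
    return prob_tp * dev_rate * freeze_time * devs_on_branch

def decide(exec_cost, skip_cost):
    return "suspend" if exec_cost > skip_cost else "execute"

c_exec = cost_execution(machine_rate=1.0, exec_time=2.0,
                        prob_fp=0.3, dev_rate=50.0, triage_time=0.5)
c_skip = cost_skip(prob_tp=0.01, dev_rate=50.0,
                   freeze_time=4.0, devs_on_branch=20)
print(decide(c_exec, c_skip))   # execute
```

With these placeholder numbers the expected cost of skipping (a rare but branch-wide freeze) outweighs the execution cost, so the test runs; a flaky, never-failing test would tip the other way.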
  34. Current Results. Simulated on the Windows 8.1 development period (BVT only)
  35. Dynamic, Self-Adaptive. Decision points are connected to each other. Skipping tests influences the risk factors of higher-level branches. We re-enable tests if code quality drops (e.g. different milestone). [Chart: relative test reduction rate (0–70%) over time during Windows 8.1 development, with the training period marked]
  36. Bug Finding Performance of Tests. [Charts by branch level: how many test executions fail (number of test executions vs. #failed test executions); how many of the failed test executions result in bug reports (FP, test-unspecific TP, test-specific TP)]
  37. Impact on Development Process. Secondary improvements •Machine setup: we may lower the number of machines allocated to the testing process •Developer satisfaction: removing false test failures increases confidence in the testing process. …hard to estimate speed improvement through simulation. “We used the data […] to cut a bunch of bad content and are running a much leaner BVT system […] we’re panning out to scale about 4x and run in well under 2 hours” (Jason Means, Windows BVT PM)
  38. Michaela Greiler, @mgreiler, www.michaelagreiler.com, http://research.microsoft.com/en-us/projects/tse/
