Slides from my talk at the IWSM Mensura Conference in Gothenburg, October 24-26, 2017. The talk covers today's agile, dynamic software development environment and the need for automated estimation decision-support tools based on machine learning algorithms, which would enable not only predictions (effort, duration) but also simulations and better adaptation to changing factors.
2. Bio
Research Scientist, Data Science Manager, Project Manager
• MSc in Business Computing (2006) – Poznan University of Economics
• PgD in Knowledge Management (2007) – Dublin Institute of Technology
• Visiting Scholar (2014) – Florida Atlantic University
• PhD in Economics (2016) – Warsaw School of Economics
• PMP Certified (2011)
10 years of professional experience in business analysis, project management and
business intelligence – AIB, Aviva, PKO, DNB, Raiffeisen
Currently establishing Data Lab for Jones Lang LaSalle (JLL) - EMEA
Research interests: Applicability of machine learning algorithms, in particular for
software estimation and smart cities
4. Challenges for software estimation
Agile/ hybrid methodologies
Rapid, continuous delivery
Vague, changing requirements
High uncertainty:
• Product features
• Budget
• Timeframe
• Quality
Need for techniques and tools that enable scenario planning
and dynamic adaptation to a changing environment!
Leaner, faster and more dynamic!
6. Limitations of existing estimation techniques
Techniques:
• Expert estimation (PERT, Delphi, Planning Poker)
• Estimation by analogy
• Parametric models (COCOMO II, SLIM, SEER-SEM)
• Size-based estimation models (FPA, Use Case)
• Decomposition and bottom-up (WBS-based, User stories)
Limitations:
• Expert knowledge
• Subjective choice of comparison criterion
• Difficult to perform in a changing environment, limited information
• Code reuse, libraries, codeless programming and agile development
• Requires training, may not be applicable for baseline estimation
8. Data Science for software estimation
• Researched for the last two decades, primarily for:
- Effort and duration estimation
- Monitoring
- Quality
- Maintenance
• Applied various techniques and algorithms – classification, regression, ensembling, data
preparation, machine learning algorithms
• Datasets – ISBSG, COCOMO, NASA, SourceForge, PROMISE Software Engineering
Repository
• Emphasis on prediction accuracy of Effort and Duration models
• Exceptional reported results, but few to no implementations within organisations
Wen, J., Li, S., Lin, Z., Hu, Y. and Huang, C. (2012). Systematic literature review of machine learning based software development effort estimation models.
Information and Software Technology. 54.
9. Effort and duration estimation model - example
[Figure: model-building workflow. Stages shown: ISBSG dataset; data understanding; data preparation; feature selection; input dataset; 3-fold cross-validation; SVM, MLP and GLM models; ensemble aggregation; evaluation (MMRE, PRED, MMER, MBRE). Techniques labelled in the diagram: Pearson correlation; data selection & cleaning; transformation, normalization; LASSO, stepwise regression.]
Pospieszny, P., Czarnacka-Chrobot, B. and Kobyliński, A. (2017). An effective approach for software project effort and duration estimation with machine learning algorithms – working paper
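The evaluation step names four accuracy measures: MMRE, PRED, MMER and MBRE. The slide does not define them, so the sketch below assumes the definitions commonly used in the effort estimation literature: error relative to the actual value (MRE), to the estimate (MER), and to the smaller of the two (BRE), with PRED taken at the 0.25 level. The effort values are hypothetical and serve only as an illustration.

```python
from statistics import mean


def mre(actual, predicted):
    """Magnitude of relative error, relative to the actual value."""
    return abs(actual - predicted) / actual


def mer(actual, predicted):
    """Magnitude of error relative to the estimate."""
    return abs(actual - predicted) / predicted


def bre(actual, predicted):
    """Balanced relative error: relative to the smaller of the two values."""
    return abs(actual - predicted) / min(actual, predicted)


def evaluate(actuals, predictions, level=0.25):
    """Return the four measures for paired actual and predicted values."""
    pairs = list(zip(actuals, predictions))
    mres = [mre(a, p) for a, p in pairs]
    return {
        "MMRE": mean(mres),
        # Share of predictions whose MRE does not exceed the chosen level.
        "PRED": sum(e <= level for e in mres) / len(mres),
        "MMER": mean(mer(a, p) for a, p in pairs),
        "MBRE": mean(bre(a, p) for a, p in pairs),
    }


# Hypothetical actual vs. predicted effort (person-hours), illustration only.
actual_effort = [100.0, 200.0, 400.0]
predicted_effort = [80.0, 250.0, 400.0]
metrics = evaluate(actual_effort, predicted_effort)
```

Lower MMRE, MMER and MBRE and higher PRED indicate a more accurate model. Reporting several measures together guards against the known bias of MMRE towards underestimates.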
11. Prescriptive Analytics
• Final frontier of business analytics – IBM, 2010
• Scenario planning and smart foresight
• Performed with a given set of goals, limitations and constraints that define different
scenarios which provide foresight as to the best outputs (set of alternative actions or
decisions)
• Applies combination of different techniques and approaches with emphasis on machine
learning algorithms
• Software Process Simulation – similar principle, different approach, techniques and
algorithms (system dynamics, discrete event simulation, Petri nets and Monte Carlo)
• Enhanced planning, optimization of resources and increasing project success rate
Lustig, I., Dietrich, B., Johnson, C. and Dziekan, C. (2010). The analytics journey. Analytics Magazine. 3, 11–13.
12. Five pillars
Basu, A. (2013). Five pillars of prescriptive analytics success. Executive Edge. 8–11.
• Integrated predictions & prescriptions: both must work synergistically for prescriptive analytics to deliver accurate results.
• Prescriptions & side effects: prescriptions based on advanced analytic approaches that include a process of generating the best course of action for defined goals, constraints and decision variables.
• Adaptive algorithms: flexible algorithms that enable re-predictions and re-prescriptions and can handle noisy input data, preferably machine learning algorithms.
• Feedback mechanism: any actions based on prescriptions should be recorded for further use to deliver better actions (automated learning process).
• Hybrid data: structured & unstructured data, including feeds from external sources (environmental, economic data etc.).
13. Machine learning
• Well suited to handling varied and noisy data in uncertain environments
• Unsupervised vs. Supervised learning (+ Reinforcement)
• Types of problems – association rules, clustering, regression, classification
• Algorithms – Neural Networks, Deep learning, Support Vector Machines,
Decision Trees, Generalized Linear Models
• Optimization and learning mechanism
14. Areas of software estimation – applicability
Baseline estimation
Description: Scenario planning based on fixed effort and/or duration. The aim is to define project and product characteristics that will ensure the project/phase/sprint’s completion within the determined effort and timeframe.
Sample questions:
• Which resources, based on their skillset, should be allocated?
• What size and quality will the product have?
• Which development methodology should be used?
Monitoring
Description: Identify any deviations from baselines that may impact successful completion of a project/phase/sprint or even a task, and propose corrective actions by adjusting project or product characteristics (fixed effort and/or duration).
Sample questions:
• Which additional resources should be allocated?
• Which product features need to be sacrificed?
• How will activities to reduce effort or duration overrun impact product quality?
Quality
Description: Define project and product features that will ensure delivery of a product with the determined baseline quality.
Sample questions:
• Which resources will ensure delivery of a high-quality product?
• What architecture, development platform or programming language should be applied?
• Which software development and testing methodologies should be used?
Maintenance
Description: Define project and product characteristics by determining the maintenance effort of the product to be developed.
Sample questions:
• What quality of product should be delivered?
• Which skilled resources should be allocated?
• What development and testing methodology should be applied?
15. Use Case#1 - Baseline estimation
Objective: Complete project within 12 months
Question:
• What effort is involved?
• Which resources should be applied?
Key metrics:
• Decision variables: effort, resource types & volume
• Constraints: duration, product characteristics, development methodology
Approach:
Run multiple ML predictive models with different scenarios in order to obtain the best
solution:
1. Vary resource types & volume, and also effort, in order to achieve a predicted
duration of ~12 months, OR
2. Define multiple resource scenarios and predict effort (remaining variables as constraints)
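The first option above can be sketched as a simple scenario search: enumerate combinations of effort and team size, predict the duration of each, and keep the scenario closest to the 12-month target. In this minimal sketch `predict_duration_months` is a hypothetical toy formula standing in for a trained ML model, and the productivity and overhead figures are assumptions, not values from the talk.

```python
from itertools import product

TARGET_MONTHS = 12.0


def predict_duration_months(effort_hours, team_size):
    """Toy stand-in for a trained duration model (hypothetical formula).

    Assumes 140 productive hours per person-month and a 5% coordination
    overhead for each team member beyond the first.
    """
    overhead = 1.0 + 0.05 * (team_size - 1)
    return effort_hours * overhead / (team_size * 140.0)


def best_scenario(efforts, team_sizes, target=TARGET_MONTHS):
    """Return the (effort, team_size, duration) scenario closest to the target."""
    scenarios = [
        (effort, size, predict_duration_months(effort, size))
        for effort, size in product(efforts, team_sizes)
    ]
    return min(scenarios, key=lambda s: abs(s[2] - target))


# Hypothetical scenario grid: effort in person-hours, team size in people.
effort, team_size, duration = best_scenario(
    efforts=[6000, 8000, 10000], team_sizes=[3, 4, 5, 6]
)
```

In practice the toy function would be replaced by the prediction call of a fitted model, and the grid would also cover resource types and the remaining constrained variables.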
16. UC#2 - Task Estimation
Objective: Complete tasks within a Scrum Sprint (2-4 weeks) or Release (1-3 months)
Question:
• How many story points/size?
• Which resources should be applied?
• Which development methodology should be used?
Key metrics:
• Decision variables: Story points/ size, resource types & volume
• Constraints: duration, functionalities to be delivered, task characteristics
Approach:
Define multiple scenarios for resources, dev methodologies (for releases) and predict story
points/size for each task or sprint
17. UC#3 – Change request (release, project)
Objective: Absorb changes in release/ project scope and deliver functionalities/ product
within baseline duration
Question:
• How many sprints/ iterations and at what duration?
• How many story points/size?
• Which additional resources should be applied?
• Which functionalities/ user stories should be dropped?
Key metrics:
• Decision variables: Resource types & volume, story points/ size, user stories/ functionalities
• Constraints: Duration, quality metrics, development methodology and framework, architecture,
programming languages etc.
Approach:
Define multiple scenarios for sprints/ iterations and resources and predict story points/size for release
or project
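One way to picture this approach: for each sprint and resource scenario, predict how many story points the release can absorb, then see which user stories no longer fit. The sketch below is only an illustration under stated assumptions. `predict_capacity` is a hypothetical toy formula standing in for a trained model, the backlog is invented, and the greedy selection by priority is a simplification rather than a method proposed in the talk.

```python
def predict_capacity(sprints, developers, points_per_dev_sprint=8):
    """Toy stand-in for a model predicting deliverable story points."""
    return sprints * developers * points_per_dev_sprint


def stories_to_drop(backlog, capacity):
    """Keep stories in priority order while they fit; return the rest.

    backlog: list of (name, story_points, priority), 1 = highest priority.
    """
    kept_points = 0
    dropped = []
    for name, points, _priority in sorted(backlog, key=lambda s: s[2]):
        if kept_points + points <= capacity:
            kept_points += points
        else:
            dropped.append(name)
    return dropped


# Hypothetical release backlog after a change request added "payments".
backlog = [
    ("login", 20, 1),
    ("reports", 40, 2),
    ("payments", 35, 3),
    ("export", 30, 4),
]

# Hypothetical scenarios: (number of sprints, number of developers).
scenarios = [(3, 4), (3, 5), (4, 4)]
outcome = {
    (sprints, devs): stories_to_drop(backlog, predict_capacity(sprints, devs))
    for sprints, devs in scenarios
}
```

Comparing the scenarios shows the trade-off the slide describes: adding a sprint absorbs the change in full, while keeping three sprints means a lower-priority story has to be dropped.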
18. Open questions
• Data
- Availability – more granular data required than for predictive models
- Volume, quality and completeness
- Data preparation approach!
- External data – working days, planned vacations, flu index etc.
• Cost vs benefits of implementation
• Implementation guidelines – simplicity!
• Traditional simulation techniques vs. machine learning
• Need for hybrid approach?
19. Further research
• Simulation using machine learning algorithms – approach & algorithms
• Integration with traditional simulation techniques – probabilistic & rule-based
• Proof of concept within chosen organisations
• Integration with existing project management, issue and bug tracking software
or development of standalone tool
Towards dynamic planning, optimal utilization of resources and ultimately
increasing project success rate!