2. From Search to Recommendation
} “Search is what you do when you’re looking for something. Discovery is when something wonderful that you didn’t know existed, or didn’t know how to ask for, finds you.” – CNN Money, “The race to create ‘smart’ Google.”
3. Recommender problem
} Estimate a utility function to predict how a user will like an item
} The user is a
} Consumer
} Subscriber
} Member
} The item is a
} Movie
} App
} Travel destination
4. Recommender
} A good recommendation
} Relevant to the user
} Personalized
} Diverse
} Expands the user's taste into neighboring
areas (serendipity – unsought finding)
5. Paradigm of Recommender Systems
} Recommender systems reduce information overload by
estimating relevance
} Collaborative filtering: What is popular in a community
} User profile & Community information
} Content based: Provides more of what the user liked before
} User profile & Item profile
} Knowledge based: What is best based on the user's needs
} User profile & Item profile & Knowledge model
} Hybrid method: Combination of inputs and/or composition of different methods
} User profile & Item profile & Knowledge model & Community information
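As a rough illustration of the collaborative filtering paradigm, the sketch below estimates the utility of an unseen item for one user as a similarity-weighted average of other users' ratings. The ratings, user names, and item names are hypothetical, and user-based cosine similarity is only one of many possible choices; the slides do not prescribe a particular algorithm.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not data from the slides.
RATINGS = {
    "alice": {"movie_a": 5, "movie_b": 3, "movie_c": 4},
    "bob": {"movie_a": 4, "movie_b": 2, "movie_c": 5, "movie_d": 4},
    "carol": {"movie_a": 1, "movie_b": 5, "movie_d": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_utility(ratings, user, item):
    """Similarity-weighted average of the neighbours' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue  # skip the user themselves and users who never rated the item
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None  # None: no community information available


if __name__ == "__main__":
    print(predict_utility(RATINGS, "alice", "movie_d"))
```

The `None` return for an item nobody has rated is a small instance of the cold start problem: without community information, this paradigm has nothing to estimate from.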
6. Recommender Systems Challenges
} Dealing with Big Data problems
} Lack of Useful Data
} Unstructured data
} Missing Data
} New user and New Item
} Cold Start problem
} Temporality
} Changing Data
} Changing user preferences and biases
} Negative choices
} Evaluating Recommenders
7. Main Research Issues
} Understanding the context and modeling context
} Algorithms
} Evaluation
} Engineering
8. Bayesian Networks For Evidence-Based Decision-Making in Software Engineering
Ayse Tosun Misirli and Ayse Bener, IEEE Transactions on Software Engineering, vol. 40, no. 6, June 2014
9. Recommendation systems
for software engineering (RSSE)
} Recommendation systems/ prediction models should be
designed in a way that they are capable of integrating
evidence, i.e., facts and probabilities systematically
collected or measured from real data and observations,
into practitioners’ experience.
} In this study, we follow the lead of computational biology
and healthcare decision-making, and investigate the
applications of BNs in SE
10. The Bayesian Approach
} Provides a natural statistical framework for evidence-based
decision-making by incorporating an integrated summary of
the available evidence and associated uncertainty (of
consequences)
} Maintaining observations, statistical distributions, prior
assumptions, and expert judgment in a single model
} Encoding causal relationships among variables for predicting
future actions
} “information propagation through the network”, i.e., gaming
over the network to see all possible scenarios and their
outcomes to give the best action
} imitating the process of human thinking, while going beyond the
capabilities of human reasoning with a fact-based, error-free
intelligence through the usage of enormous amounts of
historical data
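To make "information propagation through the network" concrete, here is a minimal sketch of exact inference by enumeration on a three-node chain (complex code → defect → test failure). The network structure, variable names, and all probabilities are hypothetical illustrations, not values from the study.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
P_C = {True: 0.3, False: 0.7}            # P(complex)
P_D_GIVEN_C = {True: 0.4, False: 0.1}    # P(defect=True | complex)
P_F_GIVEN_D = {True: 0.8, False: 0.05}   # P(test_failure=True | defect)


def joint(c, d, f):
    """Joint probability of one full assignment, factorised along the chain."""
    p_d = P_D_GIVEN_C[c] if d else 1 - P_D_GIVEN_C[c]
    p_f = P_F_GIVEN_D[d] if f else 1 - P_F_GIVEN_D[d]
    return P_C[c] * p_d * p_f


def posterior(query, evidence):
    """P(query=True | evidence) by summing the joint over consistent worlds.

    Variables are named 'c', 'd', 'f'; `evidence` maps names to observed values.
    """
    num = den = 0.0
    for c, d, f in product([True, False], repeat=3):
        world = {"c": c, "d": d, "f": f}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # world contradicts the evidence
        p = joint(c, d, f)
        den += p
        if world[query]:
            num += p
    return num / den


if __name__ == "__main__":
    print(posterior("d", {}))            # prior belief in a defect
    print(posterior("d", {"f": True}))   # belief after observing a test failure
    print(posterior("c", {"f": True}))   # evidence also propagates back to the cause
```

Observing a test failure raises the belief in a defect and, through the network, the belief that the code is complex. Enumeration is only practical for tiny networks; larger models need approximate methods.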
11. Example of a simple BN with different
variable types
12. Systematic Mapping of BNs in SWE
} To investigate the applications of BNs in SE
} Main software engineering challenges addressed
} Techniques used to learn causal relationships among variables
} Techniques used to infer the parameters
} Variable types used as BN nodes
13. Empirical Analysis on Bayesian Decision-Making
} Hybrid Bayesian Network that would solve a specific
software engineering challenge
} predicting software reliability in terms of post-release defects
} A ‘mixed data’ model to represent software life cycle phases by incorporating expert judgment (qualitative data through surveys) into quantitative data collected from software repositories
} A ‘hybrid’ BN that incorporates both continuous and categorical variables
19. Setting Prior Distributions
} Model #1
} expert knowledge
} Model #2
} Lilliefors significance test on all variables and on post-release defects
} normal probability plots
} Model #3
} The requirements specification subnet, whose distributions were set based on expert knowledge, is combined with the development and testing subnet of Model #2, whose variables are assigned distributions based on the significance tests
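A minimal sketch of the statistic behind a Lilliefors normality check: the Kolmogorov-Smirnov distance between the empirical CDF and a normal distribution whose mean and standard deviation are estimated from the same sample. The function names and the synthetic samples are assumptions for illustration; the significance decision itself requires Lilliefors' tabulated critical values, which are not reproduced here.

```python
import random
from math import erf, sqrt
from statistics import mean, stdev


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def lilliefors_statistic(sample):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)  # parameters estimated from the data
    d = 0.0
    for i, x in enumerate(xs):
        cdf = normal_cdf((x - mu) / sigma)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical metrics: one roughly normal, one strongly right-skewed.
    normal_like = [rng.gauss(50.0, 10.0) for _ in range(200)]
    skewed = [rng.expovariate(1.0) for _ in range(200)]
    print(lilliefors_statistic(normal_like))
    print(lilliefors_statistic(skewed))
```

A larger statistic indicates a poorer fit to the normal distribution, which would argue for assigning that variable a different prior family.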
20. Structure Learning
} Expert Judgement
} Chi-plot
} Independence between two variables
} Copula models: a transformation of data with marginal distributions
} Prior to modeling, it is necessary to check for the presence of dependence
} There is a positive monotone dependence between test cases and post-release defects, as data pairs are shifted towards the right from the center
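The chi-plot coordinates can be sketched as follows, assuming the usual Fisher and Switzer definition (the slides do not give the formula, so treat this as an assumption). Each data pair yields a point (lambda, chi): chi measures local departure from independence, and lambda measures the pair's distance from the centre of the data. The input values are hypothetical.

```python
from math import sqrt


def chi_plot_points(xs, ys):
    """Return (lambda_i, chi_i) pairs for a chi-plot of the paired data."""
    n = len(xs)
    points = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Empirical joint and marginal distribution functions at pair i.
        h = sum(xs[j] <= xs[i] and ys[j] <= ys[i] for j in others) / (n - 1)
        f = sum(xs[j] <= xs[i] for j in others) / (n - 1)
        g = sum(ys[j] <= ys[i] for j in others) / (n - 1)
        denom = f * (1 - f) * g * (1 - g)
        if denom <= 0:
            continue  # chi is undefined at the extremes of either margin
        chi = (h - f * g) / sqrt(denom)
        sign = 1 if (f - 0.5) * (g - 0.5) >= 0 else -1
        lam = 4 * sign * max((f - 0.5) ** 2, (g - 0.5) ** 2)
        points.append((lam, chi))
    return points


if __name__ == "__main__":
    # Hypothetical counts: test cases vs. post-release defects per module.
    test_cases = [3, 8, 12, 15, 21, 26, 30, 41]
    defects = [1, 2, 2, 4, 5, 7, 6, 9]
    for lam, chi in chi_plot_points(test_cases, defects):
        print(round(lam, 3), round(chi, 3))
```

Under independence the chi values scatter around zero; values consistently above zero suggest positive dependence, and values below zero suggest negative dependence.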
21. Inference
} Bayesian learning for complex models using Monte Carlo
methods, especially Gibbs sampling
} insufficient statistics
} incomplete data
} successively sample from posterior distribution of each
node in a Bayesian model given all the others as full
conditionals
} successful when estimating the unknown parameters of
probability distributions or when conducting empirical
analysis to infer true values of a given sample
} enables predictions for future scenarios even when some of the input variables are missing
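The idea of sampling each node from its full conditional can be sketched on a deliberately simple target: a standard bivariate normal with correlation `rho`, where both full conditionals are known in closed form. This toy target is an assumption chosen for clarity; it is not the model used in the study.

```python
import random
from math import sqrt


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Full conditionals: x | y ~ N(rho * y, 1 - rho^2) and symmetrically for y.
    """
    rng = random.Random(seed)
    cond_sd = sqrt(1.0 - rho ** 2)
    x = y = 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # sample x given the current y
        y = rng.gauss(rho * x, cond_sd)  # sample y given the new x
        if step >= burn_in:              # discard the warm-up draws
            samples.append((x, y))
    return samples


def correlation(samples):
    """Sample Pearson correlation of the (x, y) draws."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxy = sum((x - mx) * (y - my) for x, y in samples)
    sxx = sum((x - mx) ** 2 for x, _ in samples)
    syy = sum((y - my) ** 2 for _, y in samples)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(rho=0.6, n_samples=20000)
    print(correlation(draws))
```

In a Bayesian network the same loop applies with one update per unobserved node, which is how missing inputs can be treated as additional unknowns to sample rather than as a reason to discard a record.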
23. Threats to Validity
} Internal validity
} biases during data collection
} Used scripts to extract data
} Eliminated outliers
} BNs for causality and to avoid over-fitting
} Construct validity
} A large set of metrics was chosen
} Well-known performance measures are used
} Conclusion validity
} A non-parametric test (Mann-Whitney U-test), ANOVA, and t-test were used
} External validity
} we aim to transfer the methodology behind BN construction to enhance the
usage of these graphical, probabilistic models in software engineering
24. Conclusions
} Similar to computational biology and healthcare, we need
to make decisions under uncertainty using multiple data
sources
} As we understand the dynamics of BNs and the
techniques used for model learning, these models would
enable us to uncover hidden relationships between
variables, which cannot be easily identified by experts
} Understanding the theory behind BNs also gives us the
opportunity to adapt these models to different industrial
settings by changing the set of metrics, their distributions,
and causal relationships among variables
25. Conclusions
} Integrated tool support (intelligent software delivery platform)
} Dione – to be integrated into IBM Rational