1. Learning the Structure of Related Tasks. A. Niculescu-Mizil, R. Caruana. Presented by Lihan He, Machine Learning Reading Group, Duke University, 02/03/2006.
2.
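3. Introduction. Graphical model: nodes represent random variables; edges represent dependencies. Undirected graphical model: Markov network. Directed graphical model: Bayesian network, in which edges express causal relationships between nodes and the graph is a directed acyclic graph (DAG), i.e. no directed cycles are allowed. A Bayesian network is written B = {G, θ}, with structure G and parameters θ. (Figure: example network over nodes x1, x2, x3, x4.)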
4. Introduction. Goal: simultaneously learn Bayes net structures for multiple tasks. The tasks are related; their structures might be similar, but not identical. Example: gene expression data. Approach: 1) learn a single structure from data; 2) generalize to multi-task learning by placing a joint prior on the structures.
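5. Single Bayesian network learning from data. A Bayes network $B = \{G, \theta\}$ is defined over a set of $n$ random variables $X = \{X_1, X_2, \ldots, X_n\}$. The joint probability $P(X)$ factorizes as $P(X) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$, where $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$ in $G$. Given a dataset $D = \{x_1, x_2, \ldots, x_m\}$, where each $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ assigns a value to every variable, we can learn the structure $G$ and the parameters $\theta$ from $D$.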
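To make the factorization concrete, here is a minimal Python sketch that evaluates the joint probability of a full assignment as a product of per-node conditionals, one factor per node given its parents. The three-node network, its variable names, and all probability values are hypothetical and chosen only for illustration; variables are assumed binary.

```python
from itertools import product

# Hypothetical structure G: Rain -> WetGrass <- Sprinkler.
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}

# Hypothetical parameters theta:
# cpt[node][parent_values] = P(node = 1 | parents = parent_values).
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {(0, 0): 0.01, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99},
}


def joint_probability(assignment, parents, cpt):
    """Product over nodes of P(node | its parents), for binary variables."""
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        p_one = cpt[node][pa_values]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


# Sanity check: the joint distribution sums to 1 over all assignments.
nodes = list(parents)
total = sum(
    joint_probability(dict(zip(nodes, values)), parents, cpt)
    for values in product((0, 1), repeat=len(nodes))
)
```

Because each factor depends only on a node and its parents, changing one edge of the structure changes only the factors of the nodes whose parent sets change.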
6.
7. Single Bayesian network learning from data. Algorithm:
   1) Randomly generate an initial DAG and evaluate its score;
   2) Evaluate the scores of all the neighbors of the current DAG;
   3) While some neighbor has a higher score than the current DAG: move to the neighbor with the highest score, then evaluate the scores of all the neighbors of the new DAG;
   4) Repeat (1) - (3) a number of times, starting from a different DAG every time.
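The greedy search with random restarts described above can be sketched in Python as follows. This is an illustration under assumptions, not the presenters' implementation: a "neighbor" is taken to be any DAG reachable by adding, deleting, or reversing a single edge (a common choice that the slide does not spell out), the scoring function is passed in by the caller, and the toy score in the usage lines (distance to a hypothetical target graph) merely stands in for a data-driven score.

```python
import random
from itertools import permutations


def is_acyclic(edges, n):
    """Kahn-style check that the directed graph on nodes 0..n-1 has no cycle."""
    indeg = [0] * n
    for _, v in edges:
        indeg[v] += 1
    stack = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    stack.append(b)
    return seen == n


def neighbors(edges, n):
    """Yield every DAG one edge addition, deletion, or reversal away (assumed move set)."""
    for u, v in permutations(range(n), 2):
        if (u, v) in edges:
            yield edges - {(u, v)}                     # deletion never creates a cycle
            reversed_graph = (edges - {(u, v)}) | {(v, u)}
            if is_acyclic(reversed_graph, n):
                yield reversed_graph
        elif (v, u) not in edges:
            added = edges | {(u, v)}
            if is_acyclic(added, n):
                yield added


def random_dag(n, rng, p=0.3):
    """Sample a DAG by drawing edges that respect a random node ordering."""
    order = list(range(n))
    rng.shuffle(order)
    return frozenset(
        (order[i], order[j])
        for i in range(n) for j in range(i + 1, n)
        if rng.random() < p
    )


def hill_climb(score, n, restarts=5, seed=0):
    """Steps 1-4: greedy ascent from several random initial DAGs; keep the best."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        current = random_dag(n, rng)
        current_score = score(current)
        while True:
            candidates = [(score(g), g) for g in neighbors(current, n)]
            if not candidates:
                break
            top_score, top = max(candidates, key=lambda c: c[0])
            if top_score <= current_score:
                break                                  # no neighbor improves: local optimum
            current, current_score = top, top_score
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score


# Toy usage: the score is the negated number of edges that differ from a hypothetical target DAG.
target = frozenset({(0, 1), (1, 2), (0, 3)})
best, best_score = hill_climb(lambda g: -len(g ^ target), n=4)
```

With a real score, the restarts in step 4 matter because greedy ascent can stop at a local optimum; the toy score here has a single optimum, so every restart reaches the target.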
8.
9. Learning from related tasks. Given iid datasets $D_1, D_2, \ldots, D_k$, simultaneously learn the networks $B_1 = \{G_1, \theta_1\}, B_2 = \{G_2, \theta_2\}, \ldots, B_k = \{G_k, \theta_k\}$. The structures $(G_1, G_2, \ldots, G_k)$ are similar, but not identical.
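10. Learning from related tasks. One more assumption: the parameters of different networks are independent given the structures: $P(\theta_1, \ldots, \theta_k \mid G_1, \ldots, G_k) = \prod_{a=1}^{k} P(\theta_a \mid G_a)$. This is not strictly true, but it makes structure learning more efficient. Since the focus is on structure learning rather than parameter learning, the approximation is acceptable.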
11.
12.
13.
14. Learning from related tasks. Acceleration: at each iteration, the algorithm must find the best score within a set of neighbors. It is not necessary to search all the elements of that set. Consider a partial configuration in which the first i tasks are specified and the remaining k-i tasks are not specified; it defines a subset of neighbors, and an upper bound for that neighbor subset is available.
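One way to exploit such a bound is a branch-and-bound search over partial configurations, sketched below in Python. Everything specific here is an assumption made for illustration and is not taken from the slides: the joint score is taken to be a sum of per-task local scores minus non-negative pairwise penalties for disagreement between tasks, the integer "choices" are hypothetical stand-ins for candidate neighbor structures, and the bound simply lets every unspecified task take its best local score with no further penalty.

```python
def best_configuration(options, local_score, penalty):
    """Branch and bound over one choice per task.

    options[a]  : candidate choices for task a (stand-ins for neighbor structures)
    local_score : local_score(a, choice) -> float, fit of the choice to task a
    penalty     : penalty(c1, c2) -> float >= 0, disagreement between two tasks
    Assumed joint score: sum of local scores minus sum of pairwise penalties.
    """
    k = len(options)
    # Best local score of each task, used for the optimistic bound.
    best_local = [max(local_score(a, c) for c in options[a]) for a in range(k)]
    # suffix[i] = sum of best_local[i:], the most the unspecified tasks can add.
    suffix = [0.0] * (k + 1)
    for a in range(k - 1, -1, -1):
        suffix[a] = suffix[a + 1] + best_local[a]

    best = {"score": float("-inf"), "config": None, "visited": 0}

    def expand(partial, partial_score):
        i = len(partial)              # the first i tasks are specified
        best["visited"] += 1
        if i == k:
            if partial_score > best["score"]:
                best["score"], best["config"] = partial_score, tuple(partial)
            return
        for c in options[i]:
            s = partial_score + local_score(i, c) - sum(penalty(c, p) for p in partial)
            # Upper bound on every completion of this partial configuration.
            if s + suffix[i + 1] > best["score"]:
                expand(partial + [c], s)

    expand([], 0.0)
    return best["config"], best["score"], best["visited"]


# Toy usage with hypothetical scores for 3 tasks and 3 choices each.
table = [[1.0, 0.2, 0.0], [0.1, 0.9, 0.3], [0.8, 0.7, 0.1]]
config, score, visited = best_configuration(
    options=[[0, 1, 2]] * 3,
    local_score=lambda a, c: table[a][c],
    penalty=lambda c1, c2: 0.5 * (c1 != c2),
)
```

The bound is valid under these assumptions because penalties can only lower the score as more tasks are specified, so a pruned subset cannot contain a configuration better than the best one already found.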