Machine Learning at PeerIndex


@fhuszar

Ferenc Huszár
Wednesday, 16 May 12

PeerIndex.com: understand your influence


PeerPerks.com: free stuff for influencers


Machine Learning @ PeerIndex



• The usual stuff
  • topic modelling/classification of tweets/statuses/URLs
  • identity resolution across Twitter, Facebook, LinkedIn
  • spambot/fraud detection: identify people gaming the system
  • sentiment classification: happy/sad/neutral

• The really exciting stuff
  • inferring networks of influence - more about this later
  • visualise different aspects of influence, in an engaging way
  • influence maximisation - submodular optimisation


Inferring networks of influence

Social network
Propagation probabilities $p_{i,j}$

Information cascade logs (user ID, time of share), one log per URL

http://www.pcworld.com/article/239719

1079306 2011-08-25T00:03:06+01:00
4549198 2011-08-25T04:32:25+01:00
2662975 2011-08-25T00:35:11+01:00
2333224 2011-08-25T01:43:18+01:00
3141371 2011-08-25T01:52:06+01:00
3482720 2011-08-25T07:18:24+01:00
1403682 2011-08-25T03:52:58+01:00
4679657 2011-08-25T01:07:48+01:00
32460 2011-08-25T01:11:43+01:00

http://techcrunch.com/2011/11/21/...

259725 2011-10-24T03:32:19+01:00
76539 2011-10-24T03:32:23+01:00
1922351 2011-10-24T04:28:47+01:00
9183 2011-10-24T03:30:57+01:00
3335398 2011-10-24T03:34:01+01:00
1616885 2011-10-24T03:48:16+01:00
82198 2011-10-24T03:48:29+01:00
906390 2011-10-24T23:13:51+01:00
1051322 2011-10-24T03:40:02+01:00
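Each log line pairs a user ID with an ISO 8601 timestamp, and the lines are not in chronological order. A minimal sketch of turning one log into an ordered cascade, assuming whitespace-separated fields as shown on the slide (the function name `parse_cascade` is hypothetical):

```python
from datetime import datetime


def parse_cascade(lines):
    """Parse 'user_id timestamp' lines into (user_id, datetime) pairs sorted by share time."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, stamp = line.split()
        # fromisoformat understands the "+01:00" UTC offset used in the logs
        events.append((int(user_id), datetime.fromisoformat(stamp)))
    events.sort(key=lambda event: event[1])
    return events


# Three lines taken from the pcworld.com log above
sample = [
    "1079306 2011-08-25T00:03:06+01:00",
    "4549198 2011-08-25T04:32:25+01:00",
    "2662975 2011-08-25T00:35:11+01:00",
]
cascade = parse_cascade(sample)
order = [user for user, _ in cascade]
```

Sorting by timestamp gives the sharing order that the later slides write as u1, u2, ..., un.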


Heuristic approaches to estimate $p_{i,j}$



• purely based on local network structure
\[ p_{i,j} \approx \frac{1}{d_{\text{in}}(j)} \]

• trivalency “model”, my personal favourite :)
\[ p_{i,j} \in \{0.1, 0.01, 0.001\} \text{ randomly} \]

• data-driven heuristics
\[ p_{i,j} \approx \frac{\text{number of items shared by } j \text{ after } i \text{ shared it}}{\text{number of items shared by } i} \]
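The three heuristics can be sketched in a few lines. This is an illustration under assumptions, not the PeerIndex implementation: edges are `(i, j)` pairs meaning j can see what i shares, each cascade is a list of users in sharing order, the estimates are taken as exact equalities, and the three trivalency levels are passed in as a parameter (the default uses 0.1, 0.01 and 0.001, the values usually quoted for that model). All function names are hypothetical.

```python
import random
from collections import defaultdict


def weighted_cascade(edges):
    """Local structure only: p[i, j] = 1 / in-degree of j."""
    in_degree = defaultdict(int)
    for _, j in edges:
        in_degree[j] += 1
    return {(i, j): 1.0 / in_degree[j] for i, j in edges}


def trivalency(edges, values=(0.1, 0.01, 0.001), seed=0):
    """Each edge gets one of three fixed values, chosen uniformly at random."""
    rng = random.Random(seed)
    return {edge: rng.choice(values) for edge in edges}


def data_driven(edges, cascades):
    """p[i, j] = (# items j shared after i shared them) / (# items i shared)."""
    edge_set = set(edges)
    shared_by = defaultdict(int)
    followed = defaultdict(int)
    for users in cascades.values():
        for pos, i in enumerate(users):
            shared_by[i] += 1
            for j in users[pos + 1:]:  # everyone who shared after i
                if (i, j) in edge_set:
                    followed[(i, j)] += 1
    return {
        (i, j): followed[(i, j)] / shared_by[i] if shared_by[i] else 0.0
        for i, j in edges
    }


# Toy data, made up for illustration
edges = [("a", "c"), ("b", "c"), ("a", "b")]
cascades = {"url1": ["a", "b", "c"], "url2": ["a", "c"], "url3": ["b"]}
wc = weighted_cascade(edges)
tri = trivalency(edges)
dd = data_driven(edges, cascades)
```

The first two need only the graph, while the third needs the cascade logs but still involves no optimisation, which is what keeps all three cheap at scale.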

How do you solve this with machine learning?


The likelihood

$P(D \mid \theta)$, where the data $D$ are the cascade logs and the parameters $\theta$ are the propagation probabilities $p_{i,j}$.

http://www.pcworld.com/article/239719

1079306 2011-08-25T00:03:06+01:00
4549198 2011-08-25T04:32:25+01:00
2662975 2011-08-25T00:35:11+01:00
2333224 2011-08-25T01:43:18+01:00
3141371 2011-08-25T01:52:06+01:00
3482720 2011-08-25T07:18:24+01:00
1403682 2011-08-25T03:52:58+01:00
4679657 2011-08-25T01:07:48+01:00
32460 2011-08-25T01:11:43+01:00

what’s the probability of the cascade $u_1, u_2, u_3, \ldots, u_n$?

for subsequent users in cascade (with $u_0 = 0$, the index used in $p_{0,u_1}$):

\[
p_{0,u_1} \bigl(1 - (1 - p_{0,u_2})(1 - p_{u_1,u_2})\bigr) \cdots
= \prod_{i=1}^{n} \Bigl(1 - \prod_{j=0}^{i-1} (1 - p_{u_j,u_i})\Bigr)
\]

for users that are not in cascade:

\[
\prod_{u \in \{u_1 \ldots u_n\}} \; \prod_{v \in \text{users},\, v \notin \{u_1 \ldots u_n\}} (1 - p_{u,v})
\]

$P(D \mid \theta)$ is the product of these two factors.
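To make the two factors concrete, here is a minimal sketch of the log-likelihood of a single cascade. It assumes that index 0 stands for an external source that can reach every user, that missing edges have probability 0, and that `p` is a plain dictionary keyed by `(i, j)`; the name `cascade_log_likelihood` and the toy numbers are hypothetical.

```python
import math

SOURCE = 0  # assumption: user 0 on the slides is an external source


def cascade_log_likelihood(cascade, all_users, p):
    """Log-likelihood of one cascade (user ids in sharing order) given edge probabilities p."""

    def prob(i, j):
        return p.get((i, j), 0.0)  # missing edge: no chance of propagation

    log_lik = 0.0

    # Factor 1: every user in the cascade was reached by at least one earlier sharer.
    earlier = [SOURCE]
    for u in cascade:
        nobody_succeeded = math.prod(1.0 - prob(j, u) for j in earlier)
        reached = 1.0 - nobody_succeeded
        if reached <= 0.0:
            return float("-inf")  # cascade impossible under p
        log_lik += math.log(reached)
        earlier.append(u)

    # Factor 2: users outside the cascade were reached by nobody in it.
    in_cascade = set(cascade)
    for u in cascade:
        for v in all_users:
            if v not in in_cascade:
                missed = 1.0 - prob(u, v)
                if missed <= 0.0:
                    return float("-inf")
                log_lik += math.log(missed)
    return log_lik


# Toy example: users 1 and 2 share in that order, user 3 never shares
p = {(0, 1): 0.5, (0, 2): 0.1, (1, 2): 0.5, (1, 3): 0.2, (2, 3): 0.5}
ll = cascade_log_likelihood([1, 2], all_users=[1, 2, 3], p=p)
likelihood = math.exp(ll)  # 0.5 * (1 - 0.9 * 0.5) * (0.8 * 0.5)
```

Working in log space keeps the product of many small factors from underflowing; the total over all logged URLs would be the sum of these per-cascade values.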


Maximum likelihood at scale



• data too sparse to learn one parameter per edge

• large scale gradient-based optimisation is costly

• Solution: combine ensemble of heuristics with ML
  • use heuristics to compute probabilities at scale
  • use ML to tune parameters on small-scale data
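One possible reading of the last two bullets, offered as an assumption since the slides do not spell out the parameterisation: each heuristic $k$ supplies a cheap per-edge score $h_k(i,j)$, a handful of shared weights $w$ combine the scores, and only $w$ is fitted by maximum likelihood on a small sample of cascades $\mathcal{C}_{\text{small}}$. The logistic link $\sigma$ and the symbols $h_k$, $w$, $K$ are all hypothetical.

```latex
p_{i,j}(w) = \sigma\Bigl(w_0 + \sum_{k=1}^{K} w_k \, h_k(i,j)\Bigr),
\qquad
\hat{w} = \arg\max_{w} \sum_{c \in \mathcal{C}_{\text{small}}} \log P\bigl(D_c \mid p(w)\bigr)
```

Under this reading there are $K + 1$ parameters to learn rather than one per edge, which would address both the sparsity and the cost concerns above.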


Influence maximisation

• Select a set of users to maximise outreach

• Influence of people combines non-linearly

• In many models it combines sub-modularly

\[
A \subseteq B \implies f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)
\]

• these functions are fun to optimise
• pops up many times in machine learning
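Part of what makes such functions pleasant to optimise is that plain greedy selection does well: for a monotone submodular f under a size constraint, it is guaranteed to reach at least a (1 - 1/e) fraction of the optimum. A minimal sketch, using a toy set-coverage objective as a stand-in for outreach (the users and reach sets are made up, and a real influence model would replace `outreach`):

```python
def greedy_maximise(f, candidates, k):
    """Pick up to k elements, each time adding the one with the largest marginal gain."""
    chosen = set()
    for _ in range(k):
        remaining = [x for x in candidates if x not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda x: f(chosen | {x}) - f(chosen))
        chosen.add(best)
    return chosen


# Hypothetical data: who each candidate seed user can reach
reach = {
    "alice": {1, 2, 3, 4},
    "bob": {3, 4, 5},
    "carol": {5, 6},
    "dave": {1, 2},
}


def outreach(seeds):
    """Number of distinct users reached by a seed set (a coverage function, hence submodular)."""
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)


seeds = greedy_maximise(outreach, sorted(reach), k=2)

# Diminishing returns: carol adds more to the smaller set than to the larger one
gain_small = outreach({"alice", "carol"}) - outreach({"alice"})
gain_large = outreach({"alice", "bob", "carol"}) - outreach({"alice", "bob"})
```

Here greedy picks alice first and then carol rather than bob, because bob mostly reaches people alice already covers; that overlap is the non-linear combination the bullets refer to.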


Wrap up

• two lines of ‘data’ products: PeerIndex, PeerPerks

• lots of ‘standard’ machine learning tasks

• some uniquely exciting problems
  • inferring propagation probabilities
  • compute expected number of users one reaches out to
  • putting all aspects together into a single number, and visualising it
  • influence maximisation


Thanks

We’re hiring ML scientists, interns and engineers...
@fhuszar
fh@peerindex.com

