Social-media platforms have created new ways for citizens to stay informed and participate in public debates. However, to enable a healthy environment for information sharing, social deliberation, and opinion formation, citizens need to be exposed to sufficiently diverse viewpoints that challenge their assumptions, instead of being trapped inside filter bubbles.
In this paper, we take a step in this direction and propose a novel approach to maximize the diversity of exposure in a social network. We formulate the problem in the context of information propagation, as a task of recommending a small number of news articles to selected users.
We propose a realistic setting where we take into account content and user leanings, and the probability of further sharing an article. This setting allows us to capture the balance between maximizing the spread of information and ensuring the exposure of users to diverse viewpoints.
The resulting problem can be cast as maximizing a monotone and submodular function subject to a matroid constraint on the allocation of articles to users. It is a challenging generalization of the influence maximization problem. Yet, we are able to devise scalable approximation algorithms by introducing a novel extension to the notion of random reverse-reachable sets. We experimentally demonstrate the efficiency and scalability of our algorithm on several real-world datasets.
Maximizing the Diversity of Exposure in a Social Network
1. Maximizing the Diversity of Exposure in a Social Network
Cigdem Aslay
Helsinki Algorithms Seminar
October 4, 2018
2. Maximizing the Diversity of Exposure in a Social Network
C. Aslay, A. Matakos, E. Galbrun, and A. Gionis. IEEE ICDM 2018.
https://arxiv.org/pdf/1809.04393.pdf
3. Outline
• Motivation
• Algorithmic Personalization and Filter Bubbles
• Information Propagation in Online Social Networks
• Diversity Exposure Maximization Problem
• Scalable Approximation Algorithm
• Experimental Results
• Future Work and Open Problems
4. Selective Exposure in Online Social Networks
• Online social networking platforms are “relevance maximizers”
• Relevant (=biased) content recommendation
• Relevant (=biased) posts from friends in social feed
• Content different from your viewpoint is less likely to reach you
Filter bubble*: lack of exposure to diverse viewpoints resulting from algorithmic personalisation
*The term was coined by internet activist Eli Pariser in 2010.
5. Image from Garimella et al., KDD 2018 Tutorial on Polarization
“Filter bubbles are a serious problem with news.”
Bill Gates, 21 February 2017
“The internet has exacerbated phenomenon of
people having conversations in their own silos.”
“If you’re liberal, then you’re on MSNBC. If you’re a
conservative, you’re on Fox News.”
Barack Obama, 24 April 2017
“The two most discussed concerns this past year
were about diversity of viewpoints we see (filter
bubbles) and accuracy of information (fake news).”
Mark Zuckerberg, 16 February 2017
6. People are connected, they perform actions, actions propagate
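• Connections: friends, fans, followers, …
• Actions: post, like, retweet, …
• Propagation: like a virus
[Figure: example feed in which a "nice read" post is followed by an "indeed!" reaction]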
Information Propagation in Online Social Networks
8. Bursting Filter Bubbles
• Goal : We want users to be exposed to diverse content
• A user’s diversity exposure level depends on her political
leaning and the political leaning of the articles she consumes
• How : Recommend articles to users of a social network
• Articles may be shared among users, creating possible
information cascades
Information propagation model
defined on the social graph
9. Bursting Filter Bubbles
Independent Cascade (IC) Model
• For each article i, each arc (u,v) is associated
with a propagation probability $p^i_{u,v}$
• A node u activated at time t on article i tries to
activate each inactive neighbour v, succeeding
with probability $p^i_{u,v}$
‣ Recommend articles matching users’ predisposition?
• Ensures higher spread but yields minimal increase of diversity
‣ Recommend articles radically opposed to users’ predisposition?
• High local diversity but hinders the spread of the articles
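The Independent Cascade step described on this slide can be sketched as follows. This is a minimal illustration for a single article, not the authors' implementation; the function name `simulate_ic` and the adjacency format (`out_edges` mapping a node to `(neighbour, probability)` pairs) are assumptions made for the example.

```python
import random
from collections import deque


def simulate_ic(out_edges, seeds, rng):
    """Run one Independent Cascade for a single article.

    out_edges: dict node -> list of (neighbour, probability) for this article
               (hypothetical input format chosen for this sketch)
    seeds:     users the article is recommended to
    Returns the set of users exposed to the article.
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v, p in out_edges.get(u, []):
            # A newly active u gets exactly one attempt on each inactive v.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


edges = {"a": [("b", 1.0)], "b": [("c", 0.0)]}
exposed = simulate_ic(edges, ["a"], random.Random(0))
```

With probabilities of 1 and 0 the run is deterministic: the article reaches b through a but never reaches c.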
10. Diversity Exposure Maximization
• Given
• directed social graph G = (V,E)
• users’ leaning scores s(v), defined in [-1,1]
• set I of articles, each with leaning score s(i), defined in [-1,1]
• IC propagation parameters for each article
• users’ attention bound $k_v > 0$
• total assignment size constraint k > 0
• Find a feasible assignment A of items to users that has the maximum
expected diversity exposure score E[F(A)]:
$$\sum_{v \in V} \left( \max_{i \in E(v)} \{s(i), s(v)\} - \min_{i \in E(v)} \{s(i), s(v)\} \right)$$
E(v) : expected set of items that v is exposed to resulting from assignment A
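For one fixed outcome of the propagation, the score above can be evaluated directly. The sketch below assumes the exposure sets are already known (for example from a simulated cascade); the function and argument names are illustrative and do not come from the slides.

```python
def diversity_exposure(s_user, s_item, exposed):
    """Sum over users of (max - min) over the user's own leaning and the
    leanings of the items the user was exposed to.

    s_user:  dict user -> leaning in [-1, 1]
    s_item:  dict item -> leaning in [-1, 1]
    exposed: dict user -> set of items seen (assumed given for this sketch)
    """
    total = 0.0
    for v, leaning in s_user.items():
        values = [leaning] + [s_item[i] for i in exposed.get(v, ())]
        total += max(values) - min(values)
    return total


s_user = {"u": -0.5, "w": 0.2}
s_item = {"a": 1.0, "b": -1.0}
score = diversity_exposure(s_user, s_item, {"u": {"a"}, "w": {"a", "b"}})
```

Here u contributes 1.0 - (-0.5) = 1.5 and w contributes 1.0 - (-1.0) = 2.0; a user who sees nothing contributes 0.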
11. Theoretical Analysis
• Diversity exposure score is monotone and submodular
• Monotonicity: expected diversity exposure score cannot decrease as
the assignment size increases
• Submodularity: marginal increase in expected diversity exposure score
shrinks as the assignment size increases
• Diversity exposure maximization is NP-hard
• Reduction from the NP-hard influence maximization problem (= select
k nodes that maximize expected spread)
• Restricted special case with one article i s.t. |s(i) - s(v)| = 1
12. Theoretical Analysis
• Family of feasible solutions forms a matroid defined on the ground set
of (user,article) pairs
• (Matroid: structure that abstracts and generalizes the notion of
linear independence in vector spaces)
• Assignment size constraint: uniform matroid
• User attention bound constraint: partition matroid
• Intersection of these two matroids: still a matroid (a partition matroid truncated to rank k)
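The two constraints translate into a simple independence check on a set of (user,article) pairs. This is a sketch with assumed names (`is_feasible`, `attention`), not code from the talk.

```python
from collections import Counter


def is_feasible(assignment, k, attention):
    """Check both constraints on a set of (user, article) pairs.

    k:         total assignment size bound (uniform matroid)
    attention: dict user -> attention bound k_v (partition matroid)
    """
    pairs = set(assignment)
    if len(pairs) > k:
        return False
    per_user = Counter(user for user, _ in pairs)
    return all(count <= attention.get(user, 0) for user, count in per_user.items())


ok = is_feasible({("u", "a"), ("v", "a")}, k=2, attention={"u": 1, "v": 1})
too_many_for_u = is_feasible({("u", "a"), ("u", "b")}, k=3, attention={"u": 1})
```

Any subset of a feasible assignment is again feasible, which is the downward-closure property a matroid requires.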
13. Theoretical Analysis
• Monotone submodular function maximization subject to a matroid
constraint
• Greedy algorithm provides 1/2 approximation*
• Select the feasible (user,article) pair giving the highest increase in
overall diversity-exposure score at each iteration
• Requires checking reachability for each article (#P-hard!)
• Use r MC simulations at each iteration: O(n * m * k * |I|^2 * r)
• Extend recently developed techniques for scalable influence maximization
to solve a more general problem
* Fisher et al., "An analysis of approximations for maximizing submodular set functions", Polyhedral combinatorics 1978.
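The greedy baseline with Monte Carlo estimation can be sketched as below. It is deliberately naive, re-simulating every candidate pair at every iteration, which is the cost the slide warns about. All names, the input format (`edges[item][u]` as a list of `(v, probability)`), and the tie-breaking rule are assumptions of this sketch.

```python
import random
from collections import Counter, deque


def cascade(out_edges, seeds, rng):
    """One IC run for a single article; returns the exposed users."""
    active, frontier = set(seeds), deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v, p in out_edges.get(u, []):
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def mc_score(assignment, edges, s_user, s_item, runs, seed=0):
    """Monte Carlo estimate of E[F(A)] from `runs` simulated cascades."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        lo, hi = dict(s_user), dict(s_user)  # running min / max per user
        for item, leaning in s_item.items():
            seeds = [u for u, i in assignment if i == item]
            if not seeds:
                continue
            for v in cascade(edges.get(item, {}), seeds, rng):
                lo[v] = min(lo[v], leaning)
                hi[v] = max(hi[v], leaning)
        total += sum(hi[v] - lo[v] for v in s_user)
    return total / runs


def greedy(edges, s_user, s_item, k, attention, runs=100):
    """Add the feasible pair with the best estimated score, k times."""
    chosen, load = [], Counter()
    candidates = sorted((u, i) for u in s_user for i in s_item)
    while len(chosen) < k:
        best, best_score = None, float("-inf")
        for pair in candidates:
            if pair in chosen or load[pair[0]] >= attention.get(pair[0], 0):
                continue  # would violate the attention bound
            score = mc_score(chosen + [pair], edges, s_user, s_item, runs)
            if score > best_score:  # ties keep the first pair in sorted order
                best, best_score = pair, score
        if best is None:
            break
        chosen.append(best)
        load[best[0]] += 1
    return chosen
```

Every candidate is scored with the same random seed, so comparisons within an iteration share the same simulated randomness; this is a choice made for the sketch, not something stated on the slide.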
14. Scalable Approximation
• Possible worlds model: G as a random directed edge-coloured multigraph
• Multiplicity of each edge : |I|
• Color-reachability: reachability only over edges of same colour
$G = (V, E, p)$, $g \sim G$
$$\Pr(g) = \prod_{i \in I} \; \prod_{(u,v)_i \in g} p^i_{uv} \prod_{(u,v)_i \in E \setminus g} (1 - p^i_{uv})$$
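Sampling a possible world amounts to one independent coin flip per coloured edge. The sketch below also accumulates log Pr(g) from the product above; the edge encoding `(u, v, item)` and the function name are assumptions.

```python
import math
import random


def sample_world(prob, rng):
    """Keep each coloured edge independently with its own probability.

    prob: dict (u, v, item) -> propagation probability of edge (u, v)_item
    Returns (set of kept edges, log Pr(g)).
    """
    kept, log_pr = set(), 0.0
    for edge, p in prob.items():
        if rng.random() < p:
            kept.add(edge)
            log_pr += math.log(p)        # factor p for a kept edge
        else:
            log_pr += math.log(1.0 - p)  # factor (1 - p) for a removed edge
    return kept, log_pr


world, log_pr = sample_world(
    {("u", "v", "red"): 1.0, ("u", "v", "blue"): 0.0}, random.Random(0)
)
```

In this example the red copy of (u, v) always survives and the blue copy never does, so the sampled world has probability 1.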
15. Scalable Approximation
• Generalize the reverse-reachability notion of influence maximization*
* Borgs et al., "Maximizing social influence in nearly optimal time.", SODA 2014.
Random Reverse Co-exposure Sets:
• Sample a possible world g from G: remove every edge $(u,v)_i$ with
probability $1 - p^i_{uv}$
• Pick a target node v from G uniformly at random
• RC-set of v, $R_v$ = {(user,article) pairs that can color-reach v via out-links in g}
[Figure: example possible world on nodes v, u1, …, u5 with blue and red edges]
$R_v$ = {(u1, blue), (u1, red), (u2, blue), (u2, red), (u3, blue), (u4, red), (u5, red)}
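An RC-set can be generated without materialising the whole possible world: run a reverse breadth-first search from the target once per colour and flip each edge's coin only when the search examines it. This is a sketch under assumptions; the input format `in_edges[item][v]` is hypothetical, and the sketch includes the target's own pairs (v, item), whereas the slide's example lists only other users.

```python
import random
from collections import deque


def random_rc_set(nodes, in_edges, items, rng):
    """Draw one random RC-set.

    nodes:    list of users
    in_edges: dict item -> dict v -> list of (u, p) for edge (u, v)_item
    Returns (target, set of (user, item) pairs that colour-reach the target).
    """
    target = rng.choice(nodes)
    rc = set()
    for item in items:
        reached, queue = {target}, deque([target])
        while queue:
            v = queue.popleft()
            for u, p in in_edges.get(item, {}).get(v, []):
                # Each node is dequeued once per colour, so each coloured
                # edge is sampled at most once.
                if u not in reached and rng.random() < p:
                    reached.add(u)
                    queue.append(u)
        # Assumption: (target, item) is included in the RC-set.
        rc.update((u, item) for u in reached)
    return target, rc


in_edges = {
    "red": {"v": [("u1", 1.0)], "u1": [("u2", 1.0)]},
    "blue": {"v": [("u1", 0.0)]},
}
target, rc = random_rc_set(["v"], in_edges, ["red", "blue"], random.Random(0))
```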
16. Random Reverse Co-exposure Sets
• Unbiased estimation from the weighted frequency of pairs appearing in
a sample of random RC-sets:
• Weight of A on a random RC-set Rv = diversity exposure level of v
resulting from the pairs in A ∩ Rv
• Expected diversity exposure score E[F(A)] of A = n * expected
weight of A on a random Rv
• Estimate E[F(A)] by estimating the total weight w(A) of A on a
random sample of RC-sets
• A (user,article) pair that has high weight in a sample of random RC-sets
would provide high diversity exposure
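The estimator described on this slide can be sketched as follows: the weight of A on one RC-set is the diversity exposure level of the target restricted to the pairs in A ∩ R_v, and the estimate of E[F(A)] is n times the average weight over the sample. Function names and the `(target, rc_set)` sample format are assumptions.

```python
def weight_on_rc_set(assignment, target, rc_set, s_user, s_item):
    """Diversity exposure level of `target` from the pairs in A ∩ R_v."""
    leanings = [s_user[target]]
    leanings += [s_item[i] for (u, i) in assignment if (u, i) in rc_set]
    return max(leanings) - min(leanings)


def estimate_score(assignment, samples, s_user, s_item):
    """n times the mean weight of A over a sample of (target, RC-set) pairs."""
    n = len(s_user)
    total = sum(
        weight_on_rc_set(assignment, t, rc, s_user, s_item) for t, rc in samples
    )
    return n * total / len(samples)


s_user = {"v": 0.0, "u": -1.0}
s_item = {"red": 1.0, "blue": -0.5}
samples = [
    ("v", {("u", "red"), ("v", "red")}),
    ("u", {("u", "red"), ("u", "blue")}),
]
estimate = estimate_score([("u", "red")], samples, s_user, s_item)
```

In this toy sample the pair (u, red) has weight 1.0 on the first RC-set and 2.0 on the second, so the estimate is 2 * 1.5 = 3.0.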
17. Two-Phase Iterative Diversity-Exposure Maximization (TDEM)
• Generate a sample of random RC-sets
• Apply greedy to find an assignment of size k that has the maximum
estimated weight on a random sample of RC-sets
• How many random RC-sets are enough?
• We want an approximate greedy solution $\tilde{A}_G$ s.t. w.p. at least $1 - \delta$
$$E[F(\tilde{A}_G)] \ge \left(\frac{1}{2} - \epsilon\right) \cdot OPT$$
• So we want to have
$$\Pr\left[\, \left| E[F(A)] - n \cdot w(A) \right| \ge \frac{\epsilon}{2} \cdot OPT \,\right] \le \frac{\delta}{\binom{nh}{k}}$$
Sample size is a function of OPT!
18. Two-Phase Iterative Diversity-Exposure Maximization (TDEM)
Phase 1: Parameter Estimation
Determination of Sample Size
• Requires the value of OPT, which is unknown and NP-hard to compute
• Estimate a tight lower bound LB on OPT
• Perform a statistical test* B(x) on $O(\log_2 n + 1)$ values of x = n, n/2, …, 2
• If OPT < x, B(x) = false w.h.p.
• Adaptively sample $\theta_x$ random RC-sets until the stopping condition, i.e.,
B(x) = true, is satisfied
• Compute the lower bound on the sample size using LB = x
* Tang et al., "Influence maximization in near-linear time: A martingale approach.", SIGMOD 2015.
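The halving search for LB can be sketched generically. The sample-size rule, the RC-set generator and the statistical test B(x) are passed in as callables because the slide does not spell them out; all three parameters, and the choice to return `None` when no x passes, are placeholders of this sketch rather than the authors' definitions.

```python
def estimate_lower_bound(n, sample_size, draw_rc_set, passes_test):
    """Try x = n, n/2, ..., 2 and stop at the first x whose test succeeds.

    sample_size(x):       number of RC-sets required for guess x (hypothetical)
    draw_rc_set():        generates one random RC-set (hypothetical)
    passes_test(pool, x): the statistical test B(x) (hypothetical)
    Returns (LB, pool); LB is None if no guess passes.
    """
    pool = []  # RC-sets are kept across rounds instead of being redrawn
    x = float(n)
    while x >= 2:
        while len(pool) < sample_size(x):
            pool.append(draw_rc_set())
        if passes_test(pool, x):
            return x, pool
        x /= 2
    return None, pool


lb, pool = estimate_lower_bound(
    16,
    sample_size=lambda x: int(64 / x),
    draw_rc_set=lambda: "rc",
    passes_test=lambda pool, x: x <= 4,
)
```

With these stub callables the guesses 16 and 8 fail, the guess 4 passes, and the pool has grown to 16 RC-sets by that point.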
19. Two-Phase Iterative Diversity-Exposure Maximization (TDEM)
Phase 1: Parameter Estimation
• Derive the lower bound on the sample size replacing OPT with LB
• Discard the previously generated $\theta_x$ RC-sets? No!
• For each possible assignment, and a sequence of random RC-sets
$R_1, R_2, \ldots$, define $M_1, M_2, \ldots$ where
$$M_j = \sum_{z=1}^{j} (w_z - w)$$
• Show that $M_1, M_2, \ldots$ is a martingale, i.e., $E[M_j \mid M_1, \ldots, M_{j-1}] = M_{j-1}$
• Use martingale inequalities to find a lower bound on the sample size
• No independence assumption, which allows re-using RC-sets and
improves run-time
20. Two-Phase Iterative Diversity-Exposure Maximization (TDEM)
Phase 1: Parameter Estimation → Phase 2: Pair Selection → $\tilde{A}_G$
Running time linear in the total size of the RC-sets sample!
• Run-time analysis based on “almost” weighted maximum
coverage problem
• Competitive with running IMM* for the restricted special case
where |I| = 1 and |s(i) - s(v)| = 1
* Tang et al., "Influence maximization in near-linear time: A martingale approach.", SIGMOD 2015.
21. Experiments
• Twitter Datasets*
* Garimella et al., "Balancing information exposure in social networks", NIPS 2017.
• Node leanings via estimated probabilities of users to retweet content from
either of the opposing sides
• Leaning-aware influence parameters
• Leanings of 25 items distributed between -1 and 1
22. Algorithms tested
Experiments
https://github.com/aslayci/TDEM
• TDEM : our algorithm
• FAR : recommends articles to high-degree nodes opposing
their predisposition
• CLOSE : recommends articles to high-degree nodes
matching their predisposition
• WEIGHT : recommends articles based on highest
degree(u) × |s(u) − s(i)|
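The WEIGHT baseline can be sketched as a ranking of (user,article) pairs by degree(u) × |s(u) − s(i)|. Only the ranking formula comes from the slide; the tie-breaking order and the way the size and attention bounds are enforced are assumptions of this sketch.

```python
def weight_heuristic(degree, s_user, s_item, k, attention):
    """Pick up to k pairs with the highest degree(u) * |s(u) - s(i)|."""
    ranked = sorted(
        ((degree[u] * abs(s_user[u] - s_item[i]), u, i)
         for u in s_user for i in s_item),
        key=lambda t: (-t[0], t[1], t[2]),  # assumed tie-break: by name
    )
    chosen, load = [], {}
    for _, u, i in ranked:
        if len(chosen) == k:
            break
        if load.get(u, 0) < attention.get(u, 0):  # respect attention bound
            chosen.append((u, i))
            load[u] = load.get(u, 0) + 1
    return chosen


picked = weight_heuristic(
    degree={"a": 3, "b": 1},
    s_user={"a": -1.0, "b": 1.0},
    s_item={"x": 1.0, "y": 0.0},
    k=2,
    attention={"a": 1, "b": 1},
)
```

Here (a, x) scores 6 and is taken first; (a, y) scores 3 but a's attention bound is already used up, so the second pick is (b, y) with score 1.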
23. Results
Experiments
• At least 50% gain in expected diversity exposure over the best-
performing degree heuristic, sometimes reaching up to 90% gain!
24. Future Work
• Leaning-aware information propagation models defined over
multi-dimensional political spectrum
• Diversity-exposure measures defined on refined leaning modelling
• Adaptation of scalable approximation algorithms to new scoring
functions (possibly non-monotone and not submodular)
• Objective political advertising mechanisms