Knowledge in the world continuously evolves, and ontologies are largely incomplete, especially regarding data belonging to the so-called long tail. We propose a method for discovering emerging knowledge by extracting it from social content. Once initialized by domain experts, the method finds relevant entities through a mixed syntactic-semantic approach. It uses seeds, i.e. prototypes of emerging entities provided by experts, to generate candidates; it then associates each candidate with a feature vector built from the terms occurring in its social content, ranks the candidates by their distance from the centroid of the seeds, and returns the top candidates. The method can run iteratively, using its results as new seeds.
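The ranking step can be illustrated with a minimal Python sketch. It assumes each account is represented by the list of terms extracted from its social content, builds the vocabulary from the seeds' terms, and uses Euclidean distance to the seed centroid; the vocabulary choice, the distance measure, the toy data, and the names `feature_vector`, `centroid`, and `rank_candidates` are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def feature_vector(terms: list[str], vocabulary: list[str]) -> list[float]:
    """Relative frequency of each vocabulary term in one account's content."""
    counts = Counter(terms)
    total = len(terms) or 1  # guard against accounts with no terms
    return [counts[t] / total for t in vocabulary]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of the seed vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rank_candidates(seeds: dict[str, list[str]],
                    candidates: dict[str, list[str]],
                    top_k: int) -> list[str]:
    """Return the top_k candidates closest to the centroid of the seeds."""
    # Assumption: the vocabulary is the set of terms used by the seeds.
    vocabulary = sorted({t for terms in seeds.values() for t in terms})
    center = centroid([feature_vector(t, vocabulary) for t in seeds.values()])
    # Assumption: Euclidean distance; another distance could be swapped in.
    distances = {
        name: math.dist(feature_vector(terms, vocabulary), center)
        for name, terms in candidates.items()
    }
    return sorted(distances, key=distances.get)[:top_k]


# Hypothetical toy data: terms extracted from each account's posts.
seeds = {
    "seed_a": ["chess", "opening", "tournament"],
    "seed_b": ["chess", "tournament", "blitz"],
}
candidates = {
    "cand_x": ["chess", "blitz", "tournament"],
    "cand_y": ["recipe", "pasta", "chess"],
    "cand_z": ["fashion", "runway", "style"],
}
top = rank_candidates(seeds, candidates, top_k=2)
print(top)  # ['cand_x', 'cand_y']
```

With this toy data, `cand_x` shares the seeds' vocabulary almost entirely and ranks first, while `cand_z` shares none of it and falls outside the top 2.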
In this paper we address the following research questions: (1) How does the reconstructed domain knowledge evolve if the candidates of one extraction are recursively used as seeds? (2) How does the reconstructed domain knowledge spread geographically? (3) Can the method be used to inspect the past, present, and future of knowledge? (4) Can the method be used to find emerging knowledge?
This work was presented at The Web Conference 2018, MSM workshop.
Iterative knowledge extraction from social networks. The Web Conference 2018
Iterative Knowledge Extraction from Social Networks
Marco Brambilla, Stefano Ceri, Florian Daniel,
Marco Di Giovanni, Andrea Mauri, Giorgia Ramponi
The Web Conference, WWW 2018, Lyon, France
MSM'2018
Extracting Emerging Knowledge from Social Media
Marco Brambilla, et al. 2017. Extracting Emerging Knowledge from Social Media.
In Proceedings of the 26th International Conference on World Wide Web (WWW '17).
DOI: https://doi.org/10.1145/3038912.3052697
1. How does reconstructed domain knowledge evolve with an iterative process?
2. How does the reconstructed domain knowledge spread geographically?
3. Can the method be used to inspect the past, present, and future of knowledge?
4. Can the method be used to find emerging knowledge?
1. How does reconstructed domain knowledge evolve with an iterative process?
Extremely domain dependent
Precision remains rather stable across iterations
In some domains, precision even increases
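The iterative process can be sketched as a loop that feeds each round's output back in as seeds. In the minimal Python sketch below, `extract` stands for one extraction step (for instance a seed-centroid ranking); the function names, the toy `suggestions` table, the choice to report only never-seen entities, and stopping early when a round finds nothing new are all assumptions for illustration. Measuring precision per round would additionally require labelled ground truth, which is not modelled here.

```python
from typing import Callable


def iterate_extraction(initial_seeds: set[str],
                       extract: Callable[[set[str]], list[str]],
                       iterations: int) -> list[list[str]]:
    """Run `extract` repeatedly, using each round's results as the next seeds.

    Returns the entities newly found in each round.
    """
    seen = set(initial_seeds)   # everything found so far, seeds included
    seeds = set(initial_seeds)
    rounds: list[list[str]] = []
    for _ in range(iterations):
        found = [c for c in extract(seeds) if c not in seen]
        if not found:           # nothing new: stop early
            break
        rounds.append(found)
        seen.update(found)
        seeds = set(found)      # assumption: results replace the previous seeds
    return rounds


# Hypothetical stand-in for one extraction step: each entity "suggests" others.
suggestions = {
    "a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": ["f"], "e": [], "f": [],
}


def toy_extract(seeds: set[str]) -> list[str]:
    return sorted({n for s in seeds for n in suggestions.get(s, [])})


rounds = iterate_extraction({"a"}, toy_extract, iterations=3)
print(rounds)  # [['b', 'c'], ['d', 'e'], ['f']]
```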
2. How does the reconstructed domain knowledge spread geographically?
[Map: USA chess players]
Iteratively found knowledge spans large geographical areas very fast
3. Can the method be used to inspect the past, present, and future of knowledge?
Fashion designer experiment (timeline: 2016.01, 2016.03, 2016.06, 2016.09, 2016.12)
27 candidates, then 34 new candidates, then 18 new candidates, then 16 new candidates
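Counts of new candidates along a timeline can be obtained by running the extraction on successive time windows and keeping only the entities not returned before. The Python sketch below assumes that reading of "new"; the snapshot labels reuse the timeline's date format, while `new_per_snapshot`, the designer names, and the set contents are hypothetical. Under the same reading, the four counts above (27 + 34 + 18 + 16) would amount to 95 distinct candidates.

```python
def new_per_snapshot(snapshots: list[tuple[str, set[str]]]) -> list[tuple[str, int]]:
    """For each (label, candidates) snapshot, count candidates not seen earlier."""
    seen: set[str] = set()
    counts: list[tuple[str, int]] = []
    for label, found in snapshots:
        counts.append((label, len(found - seen)))  # only never-seen candidates
        seen |= found
    return counts


# Hypothetical snapshots of the candidates returned at different dates.
snapshots = [
    ("2016.01", {"designer_1", "designer_2", "designer_3"}),
    ("2016.06", {"designer_2", "designer_4"}),
    ("2016.12", {"designer_4", "designer_5", "designer_6"}),
]
print(new_per_snapshot(snapshots))  # [('2016.01', 3), ('2016.06', 1), ('2016.12', 2)]
```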
4. Can the method be used to find emerging knowledge?
[Figure: emergent vs. famous entities found for fashion designers, finance influencers, fiction writers, and chess players]
Some updates
Syntactic features: verbs, nouns and proper nouns
Feature vector of user u: $[n_1, n_2, n_3, \dots]$, where:
• $n_i$ is the frequency of the i-th noun, verb, or proper noun in the user's tweets
We consider $n_i$ as the probability that u uses the i-th word in their tweets.
Distribution of words: entropy as metric
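A minimal Python sketch of these features follows. It assumes tweets have already been tokenized and POS-tagged with Universal POS tags (`NOUN`, `VERB`, `PROPN`), turns the kept words into a probability distribution $n_i$, and scores it with Shannon entropy $H(u) = -\sum_i n_i \log_2 n_i$. The slides only state that entropy is used as a metric, so the base-2 Shannon form, the lowercasing, the toy tokens, and the names `word_distribution` and `entropy` are assumptions.

```python
import math
from collections import Counter

# Assumption: tokens come pre-tagged with Universal POS tags.
KEPT_TAGS = {"NOUN", "VERB", "PROPN"}


def word_distribution(tagged_tokens: list[tuple[str, str]]) -> dict[str, float]:
    """Probability that the user uses each kept word (relative frequency)."""
    kept = [word.lower() for word, tag in tagged_tokens if tag in KEPT_TAGS]
    counts = Counter(kept)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def entropy(distribution: dict[str, float]) -> float:
    """Shannon entropy, in bits, of a word distribution."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


# Hypothetical tagged tweet tokens for one user.
tokens = [("Carlsen", "PROPN"), ("wins", "VERB"), ("the", "DET"),
          ("tournament", "NOUN"), ("tournament", "NOUN")]
dist = word_distribution(tokens)
print(dist, entropy(dist))  # {'carlsen': 0.25, 'wins': 0.25, 'tournament': 0.5} 1.5
```

A user who concentrates on a few words gets low entropy, while a user with a broad, evenly used vocabulary gets high entropy.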
Evaluation in the Finance domain with 10 seeds, 10 good candidates, and 600 random accounts
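One way to read this setup: the 10 good candidates are mixed with the 600 random accounts, the resulting 610 accounts are ranked against the 10 seeds, and the ranking is checked for how many good candidates appear near the top. The Python sketch below computes precision at k for such a ranking; the metric, the name `precision_at_k`, and the ranking shown are assumptions for illustration, not reported results.

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k ranked accounts that are known-good candidates."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for account in ranking[:k] if account in relevant) / k


# Hypothetical pool mirroring the setup: 10 good candidates + 600 random accounts.
good = {f"good_{i}" for i in range(10)}
pool = sorted(good) + [f"random_{i}" for i in range(600)]
print(len(pool))  # 610

# Hypothetical ranking produced by some scoring function.
ranking = ["good_0", "random_5", "good_1", "good_2", "random_9"]
print(precision_at_k(ranking, good, k=4))  # 0.75
```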
Conclusion
• We show the geographic and temporal spreading of entities extracted by the method.
• The method maintains good precision even after several iterations in many domains.
• Future work includes the semi-automatic building of a richer domain model by studying other Twitter features (such as the verbs and nouns that appear in tweet texts).