EKAW2014 - A Hybrid Semantic Approach to Building 
Dynamic Maps of Research Communities

A Hybrid Semantic Approach to Building
 Dynamic Maps of Research Communities
by F. Osborne, G. Scavo, E. Motta

URL: http://oro.open.ac.uk/41083/

In earlier papers we characterised the notion of diachronic topic-based communities, i.e., communities of people who work on semantically related topics at the same time. These communities are important to enable topic-centred analyses of the dynamics of the research world. In this paper we present an innovative algorithm, called Research Communities Map Builder (RCMB), which automatically links diachronic topic-based communities over subsequent time intervals to identify significant events. These include topic shifts within a research community; the appearance and fading of a community; communities splitting, merging, or spawning other communities; and others. The output of our algorithm is a map of research communities, annotated with the detected events, which provides a concise visual representation of the dynamics of a research area. In contrast with existing approaches, RCMB enables a much more fine-grained understanding of the evolution of research communities, with respect to both the granularity of the events and the granularity of the topics. This improved understanding can, for example, inform the research strategies of funders and researchers alike. We illustrate our approach with two case studies, highlighting the main communities and events that characterised the World Wide Web and Semantic Web areas in the 2000-2010 decade.

Published in: Science

  1. 1. A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities Francesco Osborne, Beppe Scavo, Enrico Motta KMi, The Open University, United Kingdom November 27th 2014
  2. 2. Research communities The engine of research.
  3. 3. We need to understand how scientific communities adapt and cooperate to implement visions into concrete technologies.
  4. 4. Research communities Communities of academic authors are usually identified by using standard community detection algorithms, which typically exploit co-authorship or citation graphs.
  5. 5. Temporal topic-based communities (TTC) A different type of community we investigated is formed by the set of researchers who, at a given time, are following a shared research trajectory, i.e., they are working on the same topics at the same time. Osborne, F., Scavo, G., & Motta, E. (2014). Identifying diachronic topic-based research communities by clustering shared research trajectories. In The Semantic Web: Trends and Challenges (pp. 114-129). Springer International Publishing.
  6. 6. Research Communities Map Builder • RCMB is able to automatically link diachronic topic-based communities over subsequent time intervals to identify significant events. • These include topic shifts within a community; the appearance and fading of a community; communities splitting, merging, spawning other communities; etc. • The output of RCMB is a map of research communities, annotated with the detected events, which provides a concise visual representation of the dynamics of a research area.
  7. 7. RCMB steps: 1. Applies the Temporal Semantic Topic-Based Clustering (TST) algorithm to find temporal topic-based communities in different time intervals. 2. Detects topic shifts. 3. Links communities in different years. 4. Detects key events.
  8. 8. RCMB steps: 1. Applies the Temporal Semantic Topic-Based Clustering (TST) algorithm to find temporal topic-based communities in different time intervals. 2. Detects topic shifts in subsequent years. 3. Links communities in different years. 4. Detects key events. Reference for Temporal Semantic Topic-Based Clustering: Osborne, F., Scavo, G., & Motta, E. (2014). Identifying diachronic topic-based research communities by clustering shared research trajectories. In The Semantic Web: Trends and Challenges (pp. 114-129). Springer International Publishing.
  9. 9. TST in short 1. It augments the topics semantically using an automatically generated OWL ontology and represents each author as a semantic topic distribution over subsequent years. 2. It weighs each topic according to its relationship with the main topic, to highlight the communities strongly related to the main topic. 3. It clusters authors using the ATTS (Adjusted Temporal Topic Similarity), which is computed by averaging the cosine similarities of the topic vectors over progressively smaller intervals of time.
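The averaging idea behind step 3 can be illustrated with a minimal sketch. The slide does not give the exact interval scheme or the adjustment term of ATTS, so everything below is an assumption: each author is a list of yearly topic vectors (dicts from topic to weight), the "progressively smaller intervals" are taken to be the suffixes of the time span, and the function name `temporal_topic_similarity` is hypothetical. It is not the authors' ATTS implementation.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse topic vectors (topic -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def aggregate(yearly_vectors):
    """Sum the yearly topic vectors of one interval into a single vector."""
    total = Counter()
    for vec in yearly_vectors:
        total.update(vec)  # Counter.update adds weights topic by topic
    return total


def temporal_topic_similarity(author_a, author_b):
    """Average cosine similarity over shrinking intervals (ASSUMED scheme).

    Interval s covers years s..n-1, so later years are counted in more
    intervals than earlier ones.
    """
    n = len(author_a)
    if n == 0 or n != len(author_b):
        raise ValueError("both authors need the same, non-empty list of years")
    sims = [
        cosine(aggregate(author_a[s:]), aggregate(author_b[s:]))
        for s in range(n)
    ]
    return sum(sims) / n


# Two hypothetical authors who share a topic in the first year only.
alice = [{"x": 1.0}, {"y": 1.0}]
bob = [{"x": 1.0}, {"z": 1.0}]
score = temporal_topic_similarity(alice, bob)
```

Under this assumed scheme, two authors who diverge in recent years score lower than two authors who diverged early and then converged, because the most recent years appear in every interval.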
  10. 10. Detecting Topic Shifts We use a sliding window algorithm that checks for a topic shift by comparing the initial topic distribution at time t with the topic distributions at times t+1, t+2, …, t+n. Example: Information Extraction/Semantic Annotation community. 2002: Information Extraction 26%, Natural Language 17%, Named Entity 12%, Machine Learning 9%, Knowledge Base 9%. 2006: Semantic Annotation 25%, Knowledge Base 15%, Semantic Wiki 11%, Information Extraction 10%, Semantic Information 8%, Natural Language 6%, Information Retrieval 6%. 2010: Linked Data 16%, Natural Language 15%, Semantic Annotation 15%, SW Technology 10%, Information Retrieval 10%, Knowledge Base 9%, Semantic Wiki 9%.
  11. 11. Detecting Topic Shifts We define a topic shift as a statistically significant change (detected via a chi-square test) in the topic distribution of a community that occurred in a certain time interval. To detect which topics were the main protagonists of this shift, we apply the same test again, each time excluding a different topic, and select the topic whose absence yields the largest increase in the p-value.
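The two slides above can be sketched in plain Python. The slides do not say which form of the chi-square test is used or what the raw counts are, so this sketch assumes a test of homogeneity on per-topic paper counts for two years; the function names, the 0.05 threshold, and the counts are all hypothetical. The p-value uses the closed-form chi-square survival function for integer degrees of freedom, so no external library is needed.

```python
import math


def chi2_sf(x, dof):
    """P(X > x) for a chi-square variable with integer dof (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (i + 0.5)
    return total


def chi_square_p_value(counts_a, counts_b):
    """p-value of a chi-square homogeneity test on two topic-count dicts."""
    topics = [t for t in set(counts_a) | set(counts_b)
              if counts_a.get(t, 0) + counts_b.get(t, 0) > 0]
    total_a = sum(counts_a.get(t, 0) for t in topics)
    total_b = sum(counts_b.get(t, 0) for t in topics)
    if len(topics) < 2 or total_a == 0 or total_b == 0:
        return 1.0  # nothing to compare, so no evidence of a shift
    grand = total_a + total_b
    stat = 0.0
    for t in topics:
        column = counts_a.get(t, 0) + counts_b.get(t, 0)
        for observed, row_total in ((counts_a.get(t, 0), total_a),
                                    (counts_b.get(t, 0), total_b)):
            expected = row_total * column / grand
            stat += (observed - expected) ** 2 / expected
    return chi2_sf(stat, len(topics) - 1)


def first_shift(yearly_counts, alpha=0.05):
    """Sliding comparison: index of the first year that differs from year 0."""
    for k in range(1, len(yearly_counts)):
        if chi_square_p_value(yearly_counts[0], yearly_counts[k]) < alpha:
            return k
    return None


def main_shift_topic(counts_a, counts_b):
    """Topic whose exclusion raises the p-value the most."""
    base = chi_square_p_value(counts_a, counts_b)

    def gain(topic):
        a = {t: c for t, c in counts_a.items() if t != topic}
        b = {t: c for t, c in counts_b.items() if t != topic}
        return chi_square_p_value(a, b) - base

    return max(sorted(set(counts_a) | set(counts_b)), key=gain)


# Hypothetical paper counts per topic, not taken from the slides.
early = {"information extraction": 40, "natural language": 30, "knowledge base": 30}
late = {"information extraction": 5, "natural language": 32, "knowledge base": 28}
shift_index = first_shift([early, early, late])
protagonist = main_shift_topic(early, late)
```

With these made-up counts, dropping "information extraction" leaves two nearly identical distributions, which is why it is singled out as the protagonist of the shift.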
  12. 12. Community linking We are interested in two different kinds of links between communities: • The strong link is defined as a link that connects the same community in subsequent timeframes. • The weak link is defined as a link that connects community C1 with community C2 in a subsequent timeframe, if C1 has an impact on C2 in terms of migrating authors and/or topics.
  13. 13. Community linking
  14. 14. Community linking We take the minimum values of ts and tw that minimize the MEF using the Nelder-Mead algorithm.
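The linking step can be pictured with a small sketch. The scoring formula on the slides is not preserved in this transcript, so the score used here (the fraction of a community's authors who reappear in a later community) is an assumption, as is the reading of ts and tw as thresholds for strong and weak links with ts greater than tw. The function names are hypothetical, and the thresholds are passed in by hand rather than tuned.

```python
def author_overlap(earlier_authors, later_authors):
    """Fraction of the earlier community's authors found in the later one."""
    if not earlier_authors:
        return 0.0
    return len(earlier_authors & later_authors) / len(earlier_authors)


def link_communities(earlier, later, ts, tw):
    """Label each (earlier, later) pair as a strong or weak link.

    earlier, later: dicts mapping a community name to its set of authors.
    ts, tw: ASSUMED thresholds for strong and weak links (ts > tw).
    Returns a list of (earlier_name, later_name, kind) tuples.
    """
    links = []
    for name_1, authors_1 in earlier.items():
        for name_2, authors_2 in later.items():
            score = author_overlap(authors_1, authors_2)
            if score >= ts:
                links.append((name_1, name_2, "strong"))
            elif score >= tw:
                links.append((name_1, name_2, "weak"))
    return links


# Hypothetical communities in two consecutive intervals.
communities_2006 = {"C1": {"ann", "bob", "cat", "dan"}}
communities_2007 = {"C1": {"ann", "bob", "cat"}, "C2": {"dan", "eve"}}
links = link_communities(communities_2006, communities_2007, ts=0.6, tw=0.2)
```

Here three of the four authors stay in C1 (score 0.75, a strong link), while one moves to C2 (score 0.25, a weak link). A fuller version would also account for migrating topics, which the slide mentions alongside authors.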
  15. 15. Key Events detection If a community has no strong links with any community in the preceding interval, we detect the appearance of a community. [Diagram: communities C1, C2 and C3, 2006 to 2007]
  16. 16. Key Events detection If a community has no strong links with any community in the subsequent interval, we detect the fading of a community. [Diagram: communities C1, C2 and C3, 2006 to 2007]
  17. 17. Key Events detection If a community is linked to more than one community in the subsequent interval and one of the links is a strong one, we detect the forking of one or more communities out of the community characterized by the strong link. [Diagram: communities C1 and C2, 2006 to 2007]
  18. 18. Key Events detection If a community is linked to more than one community in the subsequent interval and none of the links is a strong one, we detect the splitting of a community into multiple communities. [Diagram: communities C1, C2 and C3, 2006 to 2007]
  19. 19. Key Events detection If two or more communities are linked to one community in the subsequent interval and one of the inlinks is a strong link, we detect the assimilation of one or more communities into the community C characterized by the strong link. If the communities fade after the event, they are labelled as absorbed into C. [Diagram: communities C1 and C2, 2006 to 2007]
  20. 20. Key Events detection If two or more communities are linked to one community in the subsequent interval and none of the inlinks is a strong link, we detect the merging of two or more communities into a new community C. If the communities fade after the event, they are labelled as merged into C. [Diagram: communities C1, C2 and C3, 2006 to 2007]
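The six rules above can be read as a small classifier over the links between two consecutive intervals. The sketch below applies them literally, so one community can trigger more than one event (a merged community also counts as an appearance, for instance). The input format and the name `detect_key_events` are hypothetical; the slides do not describe the authors' data structures.

```python
from collections import defaultdict


def detect_key_events(earlier, later, links):
    """Apply the six key-event rules to one pair of consecutive intervals.

    earlier, later: lists of community names in each interval.
    links: (earlier_name, later_name, kind) tuples, kind "strong" or "weak".
    Returns a list of (event, community) tuples.
    """
    out_links = defaultdict(list)  # earlier community -> kinds of outgoing links
    in_links = defaultdict(list)   # later community -> kinds of incoming links
    for source, target, kind in links:
        out_links[source].append(kind)
        in_links[target].append(kind)

    events = []
    for community in later:
        kinds = in_links[community]
        if "strong" not in kinds:
            events.append(("appearance", community))
        if len(kinds) > 1:
            event = "assimilation" if "strong" in kinds else "merging"
            events.append((event, community))
    for community in earlier:
        kinds = out_links[community]
        if "strong" not in kinds:
            events.append(("fading", community))
        if len(kinds) > 1:
            event = "forking" if "strong" in kinds else "splitting"
            events.append((event, community))
    return events


# Hypothetical example: C1 and C2 persist, and C1 also feeds a new C3.
fork_events = detect_key_events(
    earlier=["C1", "C2"],
    later=["C1", "C2", "C3"],
    links=[("C1", "C1", "strong"), ("C2", "C2", "strong"), ("C1", "C3", "weak")],
)

# Hypothetical example: C1 and C2 both flow into a new community C3.
merge_events = detect_key_events(
    earlier=["C1", "C2"],
    later=["C3"],
    links=[("C1", "C3", "weak"), ("C2", "C3", "weak")],
)
```

In the second example C1 and C2 are reported as fading in addition to the merge, which matches the slide's note that communities fading after the event are labelled as merged into C.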
  21. 21. Evaluation: Cluster Compactness
  22. 22. Case study We applied RCMB to two research areas: World Wide Web (WWW) and Semantic Web (SW). Our study was based on a dataset built from data retrieved through the API provided by Microsoft Academic Search. We first retrieved authors and papers labelled with WWW and SW or with their first 150 co-occurring topics. We then ran RCMB on WWW and SW in the 2000-2010 time interval with a granularity of 3. The average number of authors selected in each year was 932 for WWW and 646 for SW.
  23. 23. Semantic Web
  24. 24. WWW
  25. 25. Future Work • Automatically generate comprehensive explanations for the identified dynamics. • Forecast topic shifts and key events, e.g., estimate the probability that a new topic will emerge in a certain community or that two communities will merge in the coming years.
  26. 26. Questions? Interested in scholarly data? SAVE-SD 2015 Semantics, Analytics, Visualisation: Enhancing Scholarly Data Workshop at 24th International World Wide Web Conference May 19, 2015 - Florence, Italy Site: cs.unibo.it/save-sd