We propose a framework to address an important challenge in the context of the ongoing adoption of the “Web 2.0” in science and research, often referred to as “Research 2.0”. Microblogging is one of the trends with increasing leverage. The challenge in this thesis is to connect users of microblogging services such as Twitter based on specific common entities that are representative of and truly matter to them. We investigated the possibilities of using social data for locating an expert who shares a very specific research topic. To enrich and verify this social data we link such content to existing open data provided by the online community. We use semantic technologies (RDF, SPARQL), common ontologies (SIOC, FOAF, Dublin Core, SWRC) and Linked Data (DBpedia, GeoNames, CoLinDa) to extract and mine data about scientific conferences out of the context of microblogs. We identify users related to each other based on entities such as topics (tags), events, time, locations and persons (mentions). As a proof of concept we explain, implement and evaluate such a researcher profiling use case. It involves the development of a framework that focuses on suggesting researchers based on topics and conferences they have in common. This framework provides an API that allows quick access to the analyzed information. A demonstration application, the “Researcher Affinity Browser”, shows how the API supports developers in building rich internet applications for Research 2.0. This application also introduces the concept of “affinity”, which exposes the implicit proximity between entities and users based on the content users produced. The usability of the demonstration application and the usefulness of the framework itself are investigated with an explicit evaluation questionnaire. This user feedback led to important conclusions about successful achievements and opportunities to further improve this effort.
Researcher Profiling based on Semantic Analysis in Social Networks
1. Researcher Profiling based on
Semantic Analysis in Social Networks
Laurens De Vocht
Supervisors: Gonzalo Parra, Selver Softic
Promotors: Erik Duval, Martin Ebner
July 1, 2011
4. Definitions
Profiling
“Inferring unobservable information about users from observable
information about them, that is their actions or their utterances.”
(Zukerman and Albrecht, 2001)
Semantic Analysis
“A technique using semantic-based tools and ontologies in
order to gain a deeper understanding of the information being
stored and manipulated in an existing system” (McComb, 2004)
5. Problem Statement
Web users generate a massive unstructured information flow.
Who has scientific information relevant for me?
6. Problem Statement
Connecting researchers based on shared scientific events (conferences)
[Diagram: Scientific Profiling — a Profiler/Analyzer links researchers (users), modeled with a user model and an event model, to scientific conferences and related resources.]
7. The Social Semantic Web
[Diagram: the Social Web (human process) meets the Semantic Web (machine process). A community of researchers with conference experience produces semi-structured information through (micro)blogging, sharing, tagging and discussion. A system turns this into clustered and analyzed data, which a (faceted) search engine and a recommendation engine expose to the larger population of people interested in scientific conferences.]
(Gruber, 2007)
12. The Social Semantic Web
‣Hashtags as Identifiers
‣not always strong or consistent enough
‣properties of good hashtags formalized
‣helpful in assessment of valuable identifiers
(Laniado and Mika, 2010)
‣Expert Search/Profiling with Linked Data
‣aggregate and analyze certain types of data
‣need to surpass limits of closed data sets
‣LOD delivers multi-purpose data
(Stankovic et al., 2010)
13. Scope & Value of the Study
‣Bridging research areas
Human-Computer Interaction & Semantic Analysis
‣Mining usable data
out of social networks (microblogs)
‣Integration
Social network data and linked open data
‣Framework driven methodology
based upon current state-of-the-art semantic tools
‣Evaluation
proof-of-concept Research 2.0 application
14. Solution
[Diagram: annotate data from social networks with community-approved ontologies (FOAF, SIOC), interlink it with Linked Open Data, and let applications on top of the Scientific Profiling Framework connect people and resources that share scientific affinities.]
16. Framework: Overview
[Diagram: the framework aggregates data from social networks (Twitter, via Grabeeter) and archived/cached linked data, annotates it, interlinks it with the Linked Open Data Cloud (DBpedia, CoLinDa, GeoNames), analyses the resulting Semantic Profiling Network, and publishes scientific information through the Scientific Profiling API in JSON and RDF (XML) output formats.]
32. Evaluation: Usability
‣Definitely useful application
‣Use of the map view makes sense
‣People - Event split confusing
‣View of own profile
‣not a suitable starting point
‣only useful in comparison
‣shouldn’t be always visible
‣Person-specific affinities
‣too much hidden
35. Evaluation: Usefulness
‣Relevance
Test users rate their search results
‣Satisfaction questionnaire
Targeted questions about usefulness
Allow comments on user interface
36. Evaluation: Usefulness
[Bar chart: number of users (0–4) reporting each share of relevant results, from 0% (None), 1–20% (A few), 21–40% (Less than one half), 41–60% (About one half), 61–80% (More than one half), 81–99% (Almost all), to 100% (All).]
37. Evaluation: Usefulness — Usefulness Questionnaire Results
[Chart: agreement ratings (1–5) for thirteen statements:]
‣ Concept affinity
‣ Clear view of affinities between people
‣ Map & plot combination understood
‣ Deactivating filter fast enough
‣ Activating filter fast enough
‣ Never usability glitches
‣ Convention between views understood
‣ Information display not overwhelming (confusing)
‣ Relevant detailed person info
‣ Shown details correspond with ‘real life’ activities
‣ Enough relevant (new) persons
‣ Daily updating of information obvious
‣ Twitter data made more useful for researchers
38. Evaluation: Discussion
‣ Affinities exposed in an engaging way
‣ Relevant users rating: either many common entities trigger a positive rating, or common entities start a deeper investigation
‣ Reliability of person details hard to verify
‣ UI satisfaction user dependent
‣ What does the user expect from “Affinity Browser”?
‣ Test different scenarios to identify usage types?
39. Future work
‣ Rank tags
by importance, not just frequency of use
‣ Visualization
improve viewing of links between users and entities
‣ Multiple Resources
better reliability and more verification of data
40. Conclusion
‣ Framework could support many social semantic-based applications
‣ Realized with current state-of-the-art technologies
‣ Interlinking with Linked Open Data Cloud enriches social network
data
‣ Researcher Affinity Browser
‣ Exposes affinities between users
‣ User feedback positively affirms the new view on social data
‣ Hashtags identified as conferences provide consistent links
Editor's notes
To make progress in research it is important to get in touch and share ideas with people who share affinities. One of the most visible trends on the internet is the emergence of “Social Web” sites. Current online community sites are isolated from one another. The main reason for this lack of interoperability is the fact that common standards for data interchange still have to arise.
We propose a framework to address an important issue in the context of the ongoing adoption of the “Web 2.0” in science and research, often referred to as “Science 2.0” or “Research 2.0”. A growing number of people are linked via acquaintances, and online social networks such as Twitter allow indirect access to a huge amount of ideas. These ideas are contained in a massive human information flow. That users of these networks produce relevant data has been shown in many studies. The problem, however, lies in discovering and verifying such a stream of unstructured data items. Another related problem is locating an expert who could provide an answer to a very specific research question.
The goal is to build a semantic profiling framework that can support applications and services that try to improve the connecting of researchers.
The main use case and application that the framework has to support is illustrated by what could be called “the conference case”. Scientists and researchers are interested in very specific topics; this is best verified by the conferences they are attending. Another trend is that they all blog and tweet about these events [14][10]. This creates huge opportunities for profiling. The attendees tweet about what they notice and what they remark as interesting for their own projects. What if we could connect these users using this information? We could call an application that does just that “Scientific Profiling”. This approach comes from the concept that the data produced in social networks can have true value if properly annotated and interlinked [5]. A second requirement is to create a suitable context in which this information can get meaning. This is very important to identify which ontologies should be used.
Social Semantic Web Application - A Collective Knowledge System.
The essential difference between the classic Web and the Semantic Web is that structured data is exposed in a structured way. For example, the classic Web might have a document that mentions a place, "Paris". The conventional way to find this document on the Web is to search for the term "Paris" in a search engine. Similarly, to find out more about the place one would plow through the search results on the term "Paris" and manually pick out the pages that seem to have something to do with the place. The heuristics employed by today's search engines for inferring what one means by the string "Paris" are biased by popularity, which means that one will encounter many pages about a celebrity heiress en route to the French capital.
The Semantic Web vision is to point to a representation of the entity, in this case a city, rather than its surface manifestation. Thus to find the city Paris, one would search for things known to be cities for entities whose names match "Paris", possibly limiting the results to cities of a certain size or in a particular country. Then one might look for information of the desired type about the city, such as maps, travel guides, restaurants, or famous people who lived in Paris during some period of history. The heuristics for searching the Semantic Web depend on conventions about how to represent things like cities (such as those specified in ontologies), and the availability of data which use these conventions. Such data is not available for most user contributions in the Social Web. To move to the next level of collective knowledge systems, it would be nice to get the benefits of structured data from the systems that give rise to the Social Web.
Gruber argues that the Social Web and the Semantic Web should be combined, and that collective knowledge systems are the "killer applications" of this integration. The keys to getting the most from collective knowledge systems, toward true collective intelligence, are tightly integrating user-contributed content and machine-gathered data, and harvesting the knowledge from this combination of unstructured and structured information.
Laniado and Mika found that not all hashtags are used in the same way: not all of them aggregate messages around a community or a topic, not all of them endure in time, and not all of them have an actual meaning. In this work they addressed the issue of evaluating Twitter hashtags as strong identifiers, as a first step towards bridging the gap between Twitter and the Semantic Web. The first contribution of this paper stands in the formalization of the problem, and in the elaboration of a number of desired properties for a good hashtag to serve as a URI: frequency, specificity, consistency in usage, and stability over time. Based on these data, they tested the results obtained with the algorithms described in their paper, showing how a combination of the proposed measures can help in the task of assessing which tags are more likely to represent valuable identifiers. These results are promising with respect to the perspective of anchoring Twitter hashtags to Semantic Web URIs, and of detecting concepts and entities worth treating as new identifiers.
The authors concluded that expert search and profiling systems aggregate and analyze certain types of data depending on the types of expertise hypotheses they use. Traditional approaches tend to retrieve their data from closed or limited data corpuses. LOD on the other hand allows querying the whole Web like a huge database, thus surpassing the limits of closed data sets and closed online communities. They believe that this opens new possibilities for traditional expert search and profiling systems, which usually rely only on data from their local and limited databases or on unstructured data gathered from the Web. LOD also holds great promise to deliver multi-purpose data that can be used to find experts in many domains and with many different expertise hypotheses.
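Two of the desired hashtag properties, frequency and stability over time, can be sketched as simple measures over a tweet stream. The following is an illustrative sketch only, not Laniado and Mika's actual algorithm; the function name and the equal weighting of the two measures are assumptions.

```python
from collections import Counter

def hashtag_scores(tweets):
    """Score hashtags on frequency (share of all tag uses) and stability
    (share of distinct days in which the tag appears).
    `tweets` is a list of (day, set_of_hashtags) pairs.
    The 50/50 weighting is an illustrative assumption."""
    freq = Counter()
    days = {}
    for day, tags in tweets:
        for tag in tags:
            freq[tag] += 1
            days.setdefault(tag, set()).add(day)
    total_uses = sum(freq.values()) or 1
    total_days = len({d for d, _ in tweets}) or 1
    return {
        tag: 0.5 * (freq[tag] / total_uses) + 0.5 * (len(days[tag]) / total_days)
        for tag in freq
    }

tweets = [
    (1, {"eswc2011", "linkeddata"}),
    (2, {"eswc2011"}),
    (3, {"eswc2011", "coffee"}),
]
scores = hashtag_scores(tweets)
# A tag that is both frequent and stable scores highest
best = max(scores, key=scores.get)
```

A conference tag used across all days of an event would thus outrank a tag that only spiked once, matching the intuition that good identifiers endure in time.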
In this paper they have explored the potentials and drawbacks of LOD in comparison to traditional data sources used for expert search. They have not only asked what LOD can do, but also what one can do for LOD to make it an even better source of expertise evidence.
The study spans two main areas: semantic analysis and usability. The current state-of-the-art Semantic Web standards and processes are used as a foundation for this study. Researcher profiling applications integrate human-computer interaction (HCI) and expert finding. Everybody who is interested in the Semantic Web, microblogging and profiling might find some parts of this thesis relevant.
The approach presented aims at gaining more knowledge and mining usable data out of social networks, especially microblogs, with a framework-driven methodology based upon Semantic Web standards and tools. Introducing the interesting aspects of microblogs, this thesis tries to answer how far they correspond with ideas from other research areas like Science 2.0, Research 2.0, the Semantic Web or Linked Data, and to outline the importance and relevance of such or similar efforts with examples and arguments from current research and current work.
It is to be noted that neither the literature study nor the software architecture aims to give a broad overview of current Semantic Web and microblogging services. It is targeted as a carefully considered selection of articles that allows the development of researcher profiling applications. The architecture of the framework is designed only with the problem statement in mind. At this time it is not part of the research to find out how this could be extended to other resources or targets (e.g. mobile applications). It focuses on the integration of user data from a microblogging service and domain knowledge from scientific conferences.
The Semantic Web technology stack is well defined, and applying frameworks such as SIOC (Semantically Interlinked Online Communities) [4] and FOAF (Friend-Of-A-Friend) [2] can lead to an interlinked and semantically rich knowledge source. This knowledge source will be built with user profiles and the content they produce on various social networks as a basis.
Twitter contains information on: people, organisations, locations, trends, …
The LOD Cloud contains billions of triples about: geolocations, data about science, government, common knowledge, persons, news, …
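The annotation step that SIOC and FOAF enable can be sketched with plain N-Triples output using only the standard library. The base URIs and the exact property mapping (sioc:Post, sioc:has_creator, foaf:Agent) below are illustrative assumptions, not the framework's actual mapping.

```python
def tweet_to_ntriples(user, tweet_id, text):
    """Annotate a single tweet as a sioc:Post created by a foaf:Agent.
    The example.org base URIs are hypothetical placeholders."""
    post = f"<http://example.org/tweet/{tweet_id}>"
    agent = f"<http://example.org/user/{user}>"
    # Minimal N-Triples string escaping for the literal content
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"{post} <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/sioc/ns#Post> .",
        f'{post} <http://rdfs.org/sioc/ns#content> "{escaped}" .',
        f"{post} <http://rdfs.org/sioc/ns#has_creator> {agent} .",
        f"{agent} <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Agent> .",
    ]

triples = tweet_to_ntriples("laurens", "42", "Great keynote at #eswc2011")
```

Once many tweets are triplified this way, the result can be loaded into any triple store and queried together with LOD Cloud data.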
The idea is to design, develop and implement a framework that collects data from social networks and uses community-approved ontologies and linked open data to analyze and verify the data.
Aggregate your tweets and search in your tweets offline using the Grabeeter client. Grabeeter [45] is an application that allows you to search the tweets of a single Twitter user online and offline. In contrast to the Twitter API, Grabeeter provides all stored tweets and imposes no restriction over time. The Grabeeter web application uses the Twitter API to retrieve tweets of predefined users. Tweets are stored in the Grabeeter database and on the file system as an Apache Lucene [2] index. In order to ensure an efficient search, tweets must be indexed.
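Grabeeter's offline search rests on an inverted index (Apache Lucene). A minimal in-memory sketch of that idea, purely for illustration and in no way Grabeeter's actual code, could look like:

```python
from collections import defaultdict

class TweetIndex:
    """A tiny inverted index: every stored tweet stays searchable
    with no time restriction, mirroring the Grabeeter idea."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of tweet ids
        self._tweets = {}                  # tweet id -> original text

    def add(self, tweet_id, text):
        self._tweets[tweet_id] = text
        for term in text.lower().split():
            self._postings[term].add(tweet_id)

    def search(self, term):
        ids = sorted(self._postings.get(term.lower(), set()))
        return [self._tweets[i] for i in ids]

idx = TweetIndex()
idx.add(1, "Attending ESWC2011 in Crete")
idx.add(2, "Linked Data tutorial was great")
hits = idx.search("eswc2011")
```

A real Lucene index adds tokenization, ranking and on-disk persistence, but the lookup structure is the same.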
The semantic profiling framework has to support a Scientific Profiling application as was explained in the problem statement in chapter 1. The framework architecture consists of three layers:
1. Extraction layer: extracts data from various resources and annotates it using relevant ontologies for that specific data context.
2. Interlinking layer: is fed with annotated data (triples) and creates a SPARQL endpoint for it. It is responsible for requesting more data if needed for a certain information query. It parses high-level queries and translates them to SPARQL queries. The results are then returned.
3. Analysis layer: here user information needs are translated into high-level queries that the interlinking layer understands. It also contains some metrics to rank and evaluate the returned results.
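The translation the Interlinking layer performs, from a high-level request to a SPARQL query, can be sketched as follows. The property URI and the query shape are illustrative assumptions, not the framework's actual queries.

```python
def related_persons_query(user_uri, limit=10):
    """Translate the high-level request 'who shares tags with this user?'
    into SPARQL. The http://example.org/hasTag property is a hypothetical
    placeholder for whatever tagging property the triple store uses."""
    return (
        "SELECT DISTINCT ?other WHERE {\n"
        f"  <{user_uri}> <http://example.org/hasTag> ?tag .\n"
        "  ?other <http://example.org/hasTag> ?tag .\n"
        f"  FILTER (?other != <{user_uri}>)\n"
        f"}} LIMIT {limit}"
    )

q = related_persons_query("http://example.org/user/laurens", limit=5)
```

The Analysis layer would send such a query to the SPARQL endpoint and rank the returned bindings with its metrics.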
The user profile
Related entities for a user
Suggested conferences for a user
Suggested users & info for a specific event
The test application is deployed on the Google App Engine server. This makes the maintenance straightforward and the deployment simple.
Grabeeter consists of several scripts that crawl the registered users' Twitter accounts. Everything is stored in a MySQL database. Requests from the Semantic Profiling network query the Grabeeter MySQL database directly.
The semantic profiling server has several scripts that maintain the high-level functionality. Two scripts are run periodically to keep the linked data network up to date. Other scripts realize the API functionality.
The “provider” script checks the Grabeeter database for new users. If there are new users, their data is passed on to the Extraction module for annotation and triplification. For existing users, new tweets are fetched and triplified.
The “interlinking” script goes through all tags and first compares them against the CoLinDa repository. Any found conference tags are annotated appropriately. Secondly, the script checks whether tags represent a location or a common knowledge entity.
The scripts “person”, “event” and “discovery” implement the API functionality. They use the arguments given by the REST call. Each call returns a JSON object containing the result. The script “allusers” returns a JSON array that contains all users currently in the system.
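A client of these API scripts would parse the returned JSON object. The payload below is a made-up example; its field names ("user", "affinities", "score") are assumptions for illustration and the real API may differ.

```python
import json

# Hypothetical response of the "person" API script
raw = json.dumps({
    "user": "laurens",
    "affinities": [
        {"entity": "eswc2011", "type": "conference", "score": 0.8},
        {"entity": "ghent", "type": "location", "score": 0.4},
    ],
})

def top_affinity(payload):
    """Pick the strongest affinity out of a person-call JSON object."""
    data = json.loads(payload)
    return max(data["affinities"], key=lambda a: a["score"])["entity"]

best = top_affinity(raw)
```

An application such as the Researcher Affinity Browser would use calls like this to decide which entities to show most prominently for a person.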
Avoid the user interface becoming an issue (hide it). The focus should be on the data.
Fixed data: users are presented a role and have to find a good matching conference and expert.
Affinities as a starting point. Affinities are now facets to filter the result list of people. Instead of popup windows, tabs with details about each user appear at the bottom.
Positive agreement among users:
1: Concept affinity
3: Understandable combination with affinity plot
7: Convention between views understood
13: Twitter data is made more useful for researchers

No agreement among users:
2: Clear view of affinities between people
4, 5: Filter (de)activation
6: Never usability glitches
8: Information display not overwhelming/confusing
12: Daily updates of information obvious
The more resources, the more types of entities can be interlinked to improve the verifiability of the results. The framework can easily be enriched with additional RDF resources; a new handle in the Interlinking module suffices. Somewhat more effort is needed to add data from a source that is not yet available as RDF. In that case it is necessary to write an additional Model class for the Extraction module and a handle in the Annotator class that includes data from that module by annotating it appropriately. This process is completely comparable to the extraction of Twitter data presented in this thesis. On the high level, new functionality can easily be added by proper translation into SPARQL queries. As more different data models and resources become available, it might be of interest to extend the API as such. Again the same approach can be used as for the discovery and presentation of persons and scientific events.
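The extension point described above, one handle per resource type, can be sketched as a small registry. The class and handle names and the triple shape are illustrative assumptions, not the framework's actual Annotator.

```python
class Annotator:
    """Sketch of a pluggable annotator: each data source registers a
    handle that turns its raw records into triples. Adding a new
    (non-RDF) source then only requires registering one extra handle."""

    def __init__(self):
        self._handles = {}

    def register(self, source, handle):
        self._handles[source] = handle

    def annotate(self, source, record):
        return self._handles[source](record)

annotator = Annotator()
# Hypothetical Twitter handle: one sioc:content triple per tweet record
annotator.register(
    "twitter",
    lambda r: [("tweet:" + r["id"], "sioc:content", r["text"])],
)
triples = annotator.annotate("twitter", {"id": "42", "text": "hello"})
```

A handle for, say, a publication database would be registered the same way, which is what makes the process "completely comparable" across sources.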
The framework serves as a powerful backend for a web service. In the requirements of the current framework we focused on the ability to extract, annotate and interlink data from Twitter and to make the linked data available as a SPARQL endpoint and a web service that allows high-level requests. The architecture is based on state-of-the-art technologies and brings in a novel approach to the usage and dissemination of knowledge accumulated in social networks. It uses semantic tools and techniques for application domains like Research 2.0.
The web service behaves as a REST API and can support applications that want to propose interesting people or interesting scientific events to their users. It is possible to create an application that connects people who attend or mention the same scientific conference, as soon as they both have made their social data available to the system. We have shown that the enrichment of social network data with linked data leads to a verifiable user profile that allows comparison with others alike.
The demonstration application introduces the concept “affinity”. The concept has only been used a few times before, but for a similar purpose: to expose an otherwise hidden proximity to or liking for specific aspects. The usefulness of this approach and its presentation has been reviewed positively by test users from the target group, researchers. They appreciated the use of affinities. Their feedback exposed what we learned in theory from the literature study: the use of linked data shapes a whole new view on existing social data. By interlinking tags to scientific conferences we are able to display verified entities. We noted for example that the choice for hash tags lead to enough identified