This document provides a literature survey of approaches for ranking tagged web documents in social bookmarking systems. It begins with background on social information systems, folksonomy, and social bookmarking systems. The main body of the document reviews six categories of approaches for ranking documents: personomy based techniques, frequency and similarity based techniques, structure-based techniques, semantics-based techniques, cluster based techniques, and probability based techniques. Each category discusses several specific approaches that have been proposed, outlining the techniques and algorithms used. The document concludes by stating that while many techniques have been proposed, open questions still remain around improving search effectiveness in social bookmarking systems.
This capstone report analyzes how user-generated metadata can enhance findability in social software applications like content tagging and recommender systems. The report examines these systems' strengths and weaknesses for information classification, retrieval, and discovery. Based on an analysis of six system-factor combinations, the report finds that content tagging systems have stronger overall findability than recommender systems, particularly for information classification. However, recommender systems exhibit strengths for information discovery. The report provides examples of content tagging and recommender systems to illustrate different design approaches.
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL... - ijwscjournal
Social networks have widened the gap between the view of the WWW traditionally stored in search engine repositories and the actual, ever-changing face of the Web. The exponential growth of web users and the ease with which they can upload content highlight the need for content controls on material published on the web. As the definition of search is changing, socially enhanced interactive search methodologies are the need of the hour. Ranking is pivotal for efficient web search, as search performance depends mainly on the ranking results. This paper proposes a new integrated ranking model based on the fused rank of a web object, derived from the popularity factor it earns over only valid interlinks from multiple social forums. The model identifies relationships between web objects in separate social networks based on the object inheritance graph. An experimental study indicates the effectiveness of the proposed fusion-based ranking algorithm in terms of better search results.
Fuzzy And ANN Based Mining Approach Testing For Social Network Analysis - IJERA Editor
Fast and appropriate Social Network Analysis (SNA) tools and techniques are required to collect and classify opinion scores on social network sites, as grouping around a wrong opinion may create problems for a society or country. SNA is a popular means for researchers as the number of users and groups on social sites increases day by day, and a large group may influence others. In this paper, we recommend a hybrid model of opinion recommendation systems, for a single user and for a collective community respectively, formed on social liking and influence network theory. By collecting data on user social networks and preferences such as likes, we designed an improved hybrid prototype to imitate social influence through liking and sharing information among groups. The significance of this paper is to analyze the suitability of ANN and fuzzy set methods, used in a hybrid manner, for classification of social web sites. First, we intend to use Artificial Neural Network (ANN) techniques in social media data classification, using contemporary methods different from the conventional methods of statistics and data analysis. Next, we want to propagate the fuzzy approach as a way to overcome the uncertainty that is always present in social media analysis. We give a brief overview of the main ideas and recent results of social network analysis, and we point to relationships between the two social network analysis and classification approaches. This research suggests a hybrid classification model built on fuzzy and artificial neural networks (HFANN). Information Gain and three popular social sites are used to collect information depicting features that are then used to train and test the proposed methods. This neoteric approach combines the advantages of ANN and fuzzy sets in classification accuracy with the use of social data and the knowledge base available in the hate lexicons.
A Review: Text Classification on Social Media Data - IOSR Journals
This document provides a review of different classifiers used for text classification on social media data. It discusses how social media data is often unstructured and contains users' opinions and sentiments. Various machine learning algorithms can be used to classify this social media text data, extracting meaningful information. The document focuses on describing Naive Bayes classifiers, which are commonly used for text classification tasks. It explains how Naive Bayes classifiers work by calculating the posterior probability that a document belongs to a certain class, based on applying Bayes' theorem with an independence assumption between features.
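The posterior calculation described in this summary can be sketched as follows. This is a minimal illustration, not the reviewed paper's implementation: the function names, the add-one (Laplace) smoothing, and the toy training data are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count classes and per-class words from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(tokens, model):
    """Return the label with the highest log-posterior under Bayes' theorem,
    treating words as independent given the class."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # log prior: log P(class)
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # assumption: ignore words never seen in training
            # log likelihood with add-one smoothing (an assumption)
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
training = [
    (["great", "love", "happy"], "pos"),
    (["bad", "hate", "sad"], "neg"),
    (["love", "great"], "pos"),
    (["hate", "awful"], "neg"),
]
model = train_naive_bayes(training)
print(classify(["love", "happy"], model))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.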
1. The document proposes techniques to improve search performance by matching schemas between structured and unstructured data sources.
2. It involves constructing schema mappings using named entities and schema structures. It also uses strategies to narrow the search space to relevant documents.
3. The techniques were shown to improve search accuracy and reduce time/space complexity compared to existing methods.
This document discusses tag-based information retrieval using folksonomies. It describes how folksonomies allow for tag-based browsing and retrieval of user-generated content. Several algorithms for folksonomy-based information retrieval are summarized, including FolkRank, which adapts PageRank to folksonomies, and GFolkRank, which extends FolkRank to account for grouping of resources. The document also discusses personalized variants of these algorithms and evaluates their performance on real datasets.
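To make the idea of adapting PageRank to a folksonomy concrete, here is a simplified sketch in the spirit of that approach. It is not the published FolkRank algorithm: the graph construction, the damping value, the size of the preference boost, and all names are assumptions chosen for illustration. Tag assignments are (user, tag, resource) triples, weight is propagated over the resulting graph, and resources are ordered by how much a preference for the query tag raises their score over a uniform baseline.

```python
from collections import defaultdict


def build_graph(assignments):
    """Undirected weighted graph over users, tags and resources."""
    weights = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in assignments:
        nodes = (("u", user), ("t", tag), ("r", resource))
        for a in nodes:
            for b in nodes:
                if a != b:
                    weights[a][b] += 1.0
    return weights


def propagate(weights, preference, damping=0.7, iterations=50):
    """PageRank-style iteration with a (normalized) preference vector."""
    nodes = list(weights)
    total_pref = sum(preference[n] for n in nodes)
    pref = {n: preference[n] / total_pref for n in nodes}
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) * pref[n] for n in nodes}
        for a in nodes:
            out = sum(weights[a].values())
            for b, w in weights[a].items():
                new[b] += damping * score[a] * w / out
        score = new
    return score


def rank_resources(assignments, query_tag):
    """Order resources by score gain when the query tag is preferred."""
    weights = build_graph(assignments)
    uniform = {n: 1.0 for n in weights}
    boosted = dict(uniform)
    boosted[("t", query_tag)] += len(weights)  # assumed boost size
    base = propagate(weights, uniform)
    biased = propagate(weights, boosted)
    gain = {n[1]: biased[n] - base[n] for n in weights if n[0] == "r"}
    return sorted(gain, key=gain.get, reverse=True)


# Hypothetical tag assignments, for illustration only.
assignments = [
    ("alice", "python", "r1"),
    ("bob", "python", "r1"),
    ("bob", "cooking", "r2"),
    ("carol", "cooking", "r2"),
    ("carol", "cooking", "r3"),
]
print(rank_resources(assignments, "python"))
```

Subtracting the baseline keeps globally popular resources from dominating every query; only resources that benefit from the query tag's neighbourhood move up.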
A Proposal on Social Tagging Systems Using Tensor Reduction and Controlling R... - ijcsa
A Social Tagging System (STS) lets users express their interests by tagging particular items. STSs are associated with Web 2.0 and offer users a rich source of information along with recommendations. The different types of recommendations are modeled by a 3-order tensor, on which multiway latent semantic analysis and dimensionality reduction are performed using both the Higher Order Singular Value Decomposition (HOSVD) method and the KernelSVD smoothing technique. We now propose a 4-order tensor approach, which we call Tensor Reduction. Here, tagged items can be viewed by users who were recommended the same item and tagged it. This can improve the efficiency of social tagging recommendations and control unwanted requests. The results show significant improvements in effectiveness.
Avoiding Anonymous Users in Multiple Social Media Networks (SMN) - paperpublications3
Abstract: The main aim of this project is to secure user login and data sharing on social networks such as Gmail and Facebook, and to detect anonymous users of these networks. If the original user is not present on the network but friends or an anonymous user know their login details, the account's chats can be misused. This project addresses anonymous users who use the network without the original user's knowledge: an unauthorized user logging in to chat or share images, videos, etc. To counter this, the user first registers their details with one secured question and answer. Because the anonymous user can delete their chat or data, the secured question is used to recover the unauthorized user's chat history or sharing details along with their IP address or MAC address. In this way the project offers a way to prevent anonymous users from misusing the original user's login details.
How to social scientists use link data (11 june2010) - Han Woo PARK
The author would like to thank Bernie Horgan, Rob Ackland, Jeong-Soo Seo, and Yeon-ok Lee for their helpful comments on an earlier draft. Part of this research was carried out during the author’s stay at the Oxford Internet Institute. During the preparation of the final manuscript, this research was supported by the WCU project funded by the South Korean Government. This paper was presented at the 2010 International Communication Association conference held in Singapore. http://www.icahdq.org/conferences/2010/
This document discusses methods for collecting reliable data on user behavior from online social networks. It reviews previous research that has collected social network data through OSN APIs/crawling, surveys/questionnaires, and custom application deployment. The authors argue that these methods can produce biased data as they often only collect publicly shared content or rely on self-reported information. The authors propose combining existing methods, specifically the Experience Sampling Method, to collect more reliable data by capturing both shared and non-shared user behaviors and contexts through in situ experiments.
This document summarizes a research paper that analyzed social subgroups and community structure on social networking websites. The paper used the NodeXL tool to analyze Twitter data and identify the most influential group discussing "foreign affairs". It found that 232 users tweeted about foreign affairs, forming 30 groups. The largest group had 71 users and 93 unique connections. Network analysis metrics like in-degree, betweenness centrality, and eigenvector centrality identified the most influential users within the network discussing foreign affairs. This analysis can help organizations understand influential users and groups discussing certain topics on social media.
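Of the metrics named in this summary, in-degree is the simplest to illustrate. The sketch below is not the NodeXL procedure used in the paper; the edge list and the reading of an edge as "source mentions or replies to target" are assumptions for the example.

```python
from collections import Counter


def in_degree(edges):
    """Count incoming edges per user from (source, target) pairs."""
    counts = Counter()
    for source, target in edges:
        counts[target] += 1
        counts[source] += 0  # make sure users with no incoming edges appear
    return counts


# Hypothetical edges: (source, target) means source mentions target.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("b", "a")]
degrees = in_degree(edges)
for user in sorted(degrees):
    print(user, degrees[user])
```

Users with a high in-degree are the ones most often addressed by others, which is one common proxy for influence within a discussion network.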
1) Social media generates huge amounts of data every minute, including hundreds of thousands of tweets, Facebook posts, and Instagram likes.
2) Data mining involves analyzing massive amounts of information extracted from social media to gain insights about human behavior and interactions at a large scale.
3) Proper analysis of social media data requires a multidisciplinary approach drawing from fields like sociology, computer science, statistics, and neuroscience given the different forms social data can take and inherent challenges.
Classification-based Retrieval Methods to Enhance Information Discovery on th... - IJMIT JOURNAL
The widespread adoption of the World-Wide Web (the Web) has created challenges both for society as a whole and for the technology used to build and maintain the Web. The ongoing struggle of information retrieval systems is to wade through this vast pile of data and satisfy users by presenting them with information that most adequately fits their needs. On a societal level, the Web is expanding faster than we can comprehend its implications or develop rules for its use. The ubiquitous use of the Web has raised important social concerns in the areas of privacy, censorship, and access to information. On a technical level, the novelty of the Web and the pace of its growth have created challenges not only in the development of new applications that realize the power of the Web, but also in the technology needed to scale applications to accommodate the resulting large data sets and heavy loads. This thesis presents searching algorithms and hierarchical classification techniques for increasing a search service's understanding of web queries. Existing search services rely solely on a query's occurrence in the document collection to locate relevant documents. They typically do not perform any task or topic-based analysis of queries using other available resources, and do not leverage changes in user query patterns over time. Provided within is a set of techniques and metrics for performing temporal analysis on query logs. Our log analyses are shown to be reasonable and informative, and can be used to detect changing trends and patterns in the query stream, thus providing valuable data to a search service.
This document discusses data mining techniques for social media. It explains that graph mining is used to cluster similar data together based on relationships and connections between nodes in a graph. Graph mining on Facebook can be used to search for friends, places, and interests while ranking results by strength of connections. Text mining extracts meaningful information from unstructured text data on social networks and can automatically process emails, classify texts, and potentially extract full information from websites.
This document discusses analyzing social bookmarking data from Delicious to study the network of tags, URLs, and users around the topic of globalization of agriculture. The methodology involved collecting data on tags related to globalization and agriculture from Delicious over a one-month period in 2011. Over 60,000 taggings from nearly 4,000 users and 5,000 URLs were analyzed using social network analysis software. Key findings included the network being highly centralized around a few influential nodes, with the top URLs coming from major news sites and the top users being academics and activists. Tag networks were much denser than user or URL networks.
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M... - IRJET Journal
1. The document proposes a framework called SociRank to identify and rank prevalent news topics using social media factors.
2. SociRank identifies topics prevalent in both social media and news media, and then ranks them based on their media focus in news, user attention in social media, and user interaction regarding the topic.
3. The experiments show that SociRank improves the quality and variety of automatically identified news topics compared to other topic identification and ranking methods.
A topology based approach twitter - Kunal Mittal
This document presents a topology-based approach for recommending followees (users to follow) on Twitter. The algorithm explores the graph of connections starting from a target user, selects candidate followees, and ranks them based on factors like the number of followers, number of common friends with the target user, and how often the candidate appears in the network. The approach was evaluated in an experiment with 14 real Twitter users to test how well it identified potentially interesting users to follow. Results showed the algorithm's potential for followee recommendation on Twitter by exploiting the social network structure rather than tweet content.
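A topology-only recommendation of this kind can be sketched as below. This is not the paper's algorithm: the two-hop exploration ("followees of my followees"), the use of occurrence counts as the sole ranking signal, and all names are assumptions made for illustration. The summary also mentions follower counts and common friends as ranking factors, which this sketch leaves out.

```python
from collections import Counter


def recommend_followees(follows, target, top_n=3):
    """Rank candidates by how often they appear among the followees
    of the users that `target` already follows."""
    already = follows.get(target, set())
    counts = Counter()
    for friend in already:
        for candidate in follows.get(friend, set()):
            if candidate != target and candidate not in already:
                counts[candidate] += 1
    return [user for user, _ in counts.most_common(top_n)]


# Hypothetical follow graph: user -> set of accounts they follow.
follows = {
    "t": {"a", "b", "c"},
    "a": {"x", "y"},
    "b": {"x"},
    "c": {"x", "y", "t"},
}
print(recommend_followees(follows, "t"))
```

Because only the follow graph is consulted, no tweet content has to be collected or analyzed to produce the ranking.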
This document discusses data mining techniques for social media. It begins by reviewing the growth of popular social media sites like Facebook, YouTube, and Twitter. It then discusses how social media generates huge amounts of user data through interactions and content sharing. The document outlines opportunities to use data mining on social networks to gain insights into human behavior, marketing analytics, and more. It reviews common problems studied, like community detection, node classification, and modeling information flow. The conclusion emphasizes that social media provides a massive, open dataset for developing recommender systems and targeting marketing through predictive analysis of user interests and trends.
Survey of data mining techniques for social - Firas Husseini
This document summarizes data mining techniques that have been used for social network analysis. It discusses how social networks generate massive amounts of data that present computational challenges due to their size, noise, and dynamism. It then reviews both traditional and recent unsupervised, semi-supervised, and supervised data mining techniques that have been applied to social network analysis to handle these challenges and discover useful knowledge from social network data, including graph theoretic techniques, tools for analyzing opinions and sentiment, and techniques for topic detection and tracking.
Mining and Analyzing Academic Social Networks - Editor IJCATR
Academics establish relationships by way of various interactions like jointly authoring a research paper or report, jointly supervising a thesis, working jointly on a project, etc. Some of these relationships are ubiquitous whereas others are hard to keep track of. Of all types of possible academic and research collaborations, co-authorship is best documented. In this paper we analyze the co-authorship based academic social networks of computer science engineering departments of Indian Institutes of Technology (IITs) as evidenced from their research publications produced during 2011 and 2015. We use social network analysis metrics to study the collaboration networks in four leading IITs. From experimental results it can be concluded that IIT Delhi and IIT Kharagpur have a close-knit collaboration network whereas the collaboration network of IIT Kanpur and IIT Madras is fragmented. However, the collaboration networks of all four IITs exhibit similar network properties, as expected from any other collaboration network.
Annotation Approach for Document with Recommendation - ijmpict
An enormous number of organizations generate and share textual descriptions of their products, facilities, and activities. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction systems simplify the extraction of structured associations, they are frequently expensive and inaccurate, particularly when working on text that does not contain any examples of the targeted structured data. We propose an alternative methodology that simplifies structured metadata generation by recognizing documents that are likely to contain information of interest, where this data will be beneficial for querying the database. Moreover, we propose algorithms to extract attribute-value pairs, and devise new mechanisms to map such pairs to manually created schemas. We apply a clustering technique to the item content information to complement the user rating information, which improves the correctness of collaborative similarity and solves the cold start problem.
Identification of inference attacks on private Information from Social Networks - editorjournal
Online social networks such as Facebook and Twitter are increasingly used by many people. These networks permit users to publish details about themselves and to connect to their friends. Some of the details revealed inside these networks are meant to be kept private. Yet it is possible to use learning algorithms and methods on released data to predict private information, which leads to inference attacks. This paper explores how to launch inference attacks using released social networking details to predict private information. It then describes three possible sanitization algorithms that could be used in various situations. Then, it investigates the effectiveness of these techniques and tries to use methods of collective inference to determine sensitive attributes of the user data set. It shows that the effectiveness of both the local and relational classification algorithms can be decreased by using the sanitization methods described.
Adaptive Search Based On User Tags in Social Networking - IOSR Journals
This document summarizes an article about adaptive search based on user tags in social networking. It discusses using tags that users apply to images in social media sites like Flickr to improve image search and personalize results. It proposes using topic models to identify different meanings of ambiguous tags and a user's interests to display more relevant images. The framework involves reranking images based on aesthetics scores predicted from user comments, and using tag-based and group-based metadata to discover topics and personalize search results. Future work could further analyze community-generated metadata to identify interests and refine search algorithms.
Integrating content search with structure analysis for hypermedia retrieval a... - unyil96
This document summarizes research on integrating content search and structure analysis for hypermedia retrieval and management. It discusses how link analysis and topic distillation techniques can organize query results and identify authoritative pages. Database approaches aim to facilitate search, navigation and associating web pages through extended query languages and logical document representations. Overall the paper outlines the state-of-the-art in utilizing both content and link structure to improve hypermedia search and organization.
This document discusses social data mining. It begins by defining data, information, and knowledge. It then defines data mining as extracting useful unknown information from large datasets. Social data mining is defined as systematically analyzing valuable information from social media, which is vast, noisy, distributed, unstructured, and dynamic. Common social media platforms are described. Graph mining and text mining are discussed as important techniques for social data mining. The generic social data mining process of data collection, modeling, and various mining methods is outlined. OAuth 2.0 authorization is also summarized as an important process for applications to access each other's data.
Social network analysis (SNA), a branch of complex systems, can be used in the construction of multi-agent systems. This paper proposes a study of how social network analysis can assist in modeling multi-agent systems, while addressing similarities and differences between the two theories. We built a prototype multi-agent system for the resolution of tasks through the formation of teams of agents, formed on the basis of the social network established between agents. Agents make use of performance indicators to assess when they should change their social network to maximize their participation in teams.
Social media-based systems: an emerging area of information systems research ... - Nurhazman Abdul Aziz
This article presents a review of social media-based systems, an emerging area of information systems research, design, and practice shaped by the social media phenomenon. A social media-based system (SMS) is the application of a wider range of social software and social media phenomena in organizational and non-organizational contexts to facilitate everyday interactions. To characterize SMS, a total of 274 articles (published during 2003–2011) were analyzed that were classified as computer science information system related in the Web of Science database and had at least one social media phenomenon related keyword: social media; social network analysis; social network; social network site; and social network system. As a result, we found four main research streams in SMS research dealing with: (1) organizational aspect of SMS, (2) non-organizational aspect of SMS, (3) technical aspect of SMS, and (4) social as a tool. The results indicate that SMS research is fragmented and has not yet found its way into the core IS journals; however, it is diverse and interdisciplinary in nature. We also propose that, unlike conventional and socio-technical IS, where information is bureaucratic, formal, bounded within the intranet, and tightly controlled by organizations, in the SMS context information is social, informal, boundary-less (i.e. the boundary is within the internet), and less controlled, and more sharing of information may lead to higher value/impact.
The document discusses a review process for analyzing contextual human information behavior factors in web usage mining. It first searches journals and search engines to find empirical studies related to gender differences, prior knowledge and cognitive styles. These studies are then examined to analyze how these three human factors impact web-based interactions. While some commercial analysis applications exist, more work still needs to be done by researchers and developers to build efficient and powerful tools for studying human information behavior.
Comparison of the Formal Specification Languages Based Upon Various Parameters - IOSR Journals
This document compares various formal specification languages based on different parameters. It describes Z notation, OCL, VDM, SDL and Larch languages. Z notation uses set theory and logic to model state using schemas. OCL uses constraints to describe UML models. VDM uses basic types and functions to formally specify models. SDL specifies systems as communicating finite state machines. Larch uses an interface language and shared language to specify behaviors. The languages differ based on whether they are process-oriented, sequential-oriented, model-oriented or property-oriented and the underlying mathematics used like set theory, logic or algebra.
This document studies the energy absorption buildup factor (EABF) in some soils. It calculates the EABF for various soil samples from India in the energy range of 0.015-15 MeV and penetration depths up to 40 mean free paths. The EABF is calculated using the five parameter geometrical progression fitting approximation. The results show that the EABF increases with penetration depth and peaks at intermediate energies from 0.15-0.8 MeV due to the dominance of the Compton effect. The EABF then decreases at higher energies above 2 MeV due to the increased effects of pair production. The study provides insights into how the EABF of soils varies with photon energy and penetration depth.
How to social scientists use link data (11 june2010)Han Woo PARK
The author would like to thank Bernie Horgan, Rob Ackland, Jeong-Soo Seo, and Yeon-ok Lee for their helpful comments on an earlier draft. Part of this research was carried out during the author’s stay at the Oxford Internet Institute. During the preparation of final manuscript, this research is supported from the WCU project granted from South Korean Government. This paper has been presented at the 2010 International Communication Association conference held in Singapore. http://www.icahdq.org/conferences/2010/
This document discusses methods for collecting reliable data on user behavior from online social networks. It reviews previous research that has collected social network data through OSN APIs/crawling, surveys/questionnaires, and custom application deployment. The authors argue that these methods can produce biased data as they often only collect publicly shared content or rely on self-reported information. The authors propose combining existing methods, specifically the Experience Sampling Method, to collect more reliable data by capturing both shared and non-shared user behaviors and contexts through in situ experiments.
This document summarizes a research paper that analyzed social subgroups and community structure on social networking websites. The paper used the NodeXL tool to analyze Twitter data and identify the most influential group discussing "foreign affairs". It found that 232 users tweeted about foreign affairs, forming 30 groups. The largest group had 71 users and 93 unique connections. Network analysis metrics like in-degree, betweenness centrality, and eigenvector centrality identified the most influential users within the network discussing foreign affairs. This analysis can help organizations understand influential users and groups discussing certain topics on social media.
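In-degree, the simplest of the metrics named above, can be illustrated with a short sketch. This is not the NodeXL workflow used in the paper; the edge list and user names below are hypothetical, and an edge `(source, target)` is assumed to mean that `source` mentions or replies to `target`.

```python
from collections import Counter


def in_degree(edges):
    """Count incoming edges per node for a directed (source, target) edge list."""
    counts = Counter()
    for source, target in edges:
        counts[target] += 1
        counts[source] += 0  # make sure nodes with no incoming edges appear too
    return counts


# Hypothetical mention network: each pair means "source mentions target".
edges = [("ann", "bob"), ("cat", "bob"), ("bob", "ann"), ("dan", "bob")]
degrees = in_degree(edges)
most_mentioned = max(degrees, key=degrees.get)
```

A user with a high in-degree is mentioned by many others, which is one rough signal of influence; betweenness and eigenvector centrality need the full graph structure and are usually computed with a graph library.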
1) Social media generates huge amounts of data every minute, including hundreds of thousands of tweets, Facebook posts, and Instagram likes.
2) Data mining involves analyzing massive amounts of information extracted from social media to gain insights about human behavior and interactions at a large scale.
3) Proper analysis of social media data requires a multidisciplinary approach drawing from fields like sociology, computer science, statistics, and neuroscience given the different forms social data can take and inherent challenges.
Classification-based Retrieval Methods to Enhance Information Discovery on th...IJMIT JOURNAL
The widespread adoption of the World-Wide Web (the Web) has created challenges both for society as a whole and for the technology used to build and maintain the Web. The ongoing struggle of information retrieval systems is to wade through this vast pile of data and satisfy users by presenting them with information that most adequately fits their needs. On a societal level, the Web is expanding faster than we can comprehend its implications or develop rules for its use. The ubiquitous use of the Web has raised important social concerns in the areas of privacy, censorship, and access to information. On a technical level, the novelty of the Web and the pace of its growth have created challenges not only in the development of new applications that realize the power of the Web, but also in the technology needed to scale applications to accommodate the resulting large data sets and heavy loads. This thesis presents searching algorithms and hierarchical classification techniques for increasing a search service's understanding of web queries. Existing search services rely solely on a query's occurrence in the document collection to locate relevant documents. They typically do not perform any task or topic-based analysis of queries using other available resources, and do not leverage changes in user query patterns over time. Provided within is a set of techniques and metrics for performing temporal analysis on query logs. Our log analyses are shown to be reasonable and informative, and can be used to detect changing trends and patterns in the query stream, thus providing valuable data to a search service.
This document discusses data mining techniques for social media. It explains that graph mining is used to cluster similar data together based on relationships and connections between nodes in a graph. Graph mining on Facebook can be used to search for friends, places, and interests while ranking results by strength of connections. Text mining extracts meaningful information from unstructured text data on social networks and can automatically process emails, classify texts, and potentially extract full information from websites.
This document discusses analyzing social bookmarking data from Delicious to study the network of tags, URLs, and users around the topic of globalization of agriculture. The methodology involved collecting data on tags related to globalization and agriculture from Delicious over a month period in 2011. Over 60,000 taggings from nearly 4,000 users and 5,000 URLs were analyzed using social network analysis software. Key findings included the network being highly centralized around a few influential nodes, with the top URLs coming from major news sites and the top users being academics and activists. Tag networks were much denser than user or URL networks.
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...IRJET Journal
1. The document proposes a framework called SociRank to identify and rank prevalent news topics using social media factors.
2. SociRank identifies topics prevalent in both social media and news media, and then ranks them based on their media focus in news, user attention in social media, and user interaction regarding the topic.
3. The experiments show that SociRank improves the quality and variety of automatically identified news topics compared to other topic identification and ranking methods.
A topology based approach twittersdlfkjsdlkfjKunal Mittal
This document presents a topology-based approach for recommending followees (users to follow) on Twitter. The algorithm explores the graph of connections starting from a target user, selects candidate followees, and ranks them based on factors like the number of followers, number of common friends with the target user, and how often the candidate appears in the network. The approach was evaluated in an experiment with 14 real Twitter users to test how well it identified potentially interesting users to follow. Results showed the algorithm's potential for followee recommendation on Twitter by exploiting the social network structure rather than tweet content.
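The ranking idea can be sketched in a few lines. This is a minimal illustration, not the paper's actual weighting: it assumes candidates are the accounts followed by the target's own followees, ranks them by how often they appear, and breaks ties by follower count. The `follows` mapping and the user names are hypothetical.

```python
from collections import Counter


def recommend_followees(follows, target, top_n=3):
    """Rank accounts two hops away from `target` in a follow graph.

    `follows` maps each user to the set of users they follow.
    """
    already = follows.get(target, set())

    # How often each candidate appears among the followees of target's followees.
    appearances = Counter()
    for friend in already:
        for candidate in follows.get(friend, set()):
            if candidate != target and candidate not in already:
                appearances[candidate] += 1

    # Follower counts over the whole graph, used here only as a tie-breaker.
    followers = Counter()
    for followees in follows.values():
        for user in followees:
            followers[user] += 1

    ranked = sorted(appearances, key=lambda c: (-appearances[c], -followers[c], c))
    return ranked[:top_n]


follows = {"u": {"a", "b"}, "a": {"x", "y"}, "b": {"x", "z"}, "c": {"z"}}
suggestions = recommend_followees(follows, "u")
```

Here `x` ranks first because both of the target's followees follow it; `z` comes before `y` because it has more followers overall.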
This document discusses data mining techniques for social media. It begins by reviewing the growth of popular social media sites like Facebook, YouTube, and Twitter. It then discusses how social media generates huge amounts of user data through interactions and content sharing. The document outlines opportunities to use data mining on social networks to gain insights into human behavior, marketing analytics, and more. It reviews common problems studied, like community detection, node classification, and modeling information flow. The conclusion emphasizes that social media provides a massive, open dataset for developing recommender systems and targeting marketing through predictive analysis of user interests and trends.
Survey of data mining techniques for socialFiras Husseini
This document summarizes data mining techniques that have been used for social network analysis. It discusses how social networks generate massive amounts of data that present computational challenges due to their size, noise, and dynamism. It then reviews both traditional and recent unsupervised, semi-supervised, and supervised data mining techniques that have been applied to social network analysis to handle these challenges and discover useful knowledge from social network data, including graph theoretic techniques, tools for analyzing opinions and sentiment, and techniques for topic detection and tracking.
Mining and Analyzing Academic Social NetworksEditor IJCATR
Academics establish relationships by way of various interactions like jointly authoring a research paper or report, jointly supervising a thesis, working jointly on a project, etc. Some of these relationships are ubiquitous whereas others are hard to keep track of. Of all types of possible academic and research collaborations, co-authorship is best documented. In this paper we analyze the co-authorship-based academic social networks of computer science engineering departments of Indian Institutes of Technology (IITs) as evidenced from their research publications produced during 2011 and 2015. We use social network analysis metrics to study the collaboration networks in four leading IITs. From experimental results it can be concluded that IIT Delhi and IIT Kharagpur have a close-knit collaboration network whereas the collaboration network of IIT Kanpur and IIT Madras is fragmented. However, the collaboration networks of all four IITs exhibit similar network properties, as expected from any other collaboration network.
Annotation Approach for Document with Recommendation ijmpict
An enormous number of organizations generate and share textual descriptions of their products, facilities, and activities. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. Although information extraction systems simplify the extraction of structured associations, they are frequently expensive and inaccurate, particularly when working on text that does not contain any examples of the targeted structured data. We propose an alternative methodology that simplifies structured metadata generation by recognizing documents that are likely to contain information of interest, information that will later be useful for querying the database. Moreover, we present algorithms to extract attribute-value pairs, and devise new mechanisms to map such pairs to manually created schemas. We apply a clustering technique to the item content information to complement the user rating information, which improves the accuracy of collaborative similarity and addresses the cold start problem.
Identification of inference attacks on private Information from Social Networkseditorjournal
Online social networks, like Facebook and Twitter, are increasingly utilized by many people. These networks permit users to publish details about themselves and to connect to their friends. Some of the details revealed inside these networks are meant to be kept private. Yet it is possible to use learning algorithms and methods on released data to predict private information, which enables inference attacks. This paper explores how to launch inference attacks using released social networking details to predict private information. It then describes three possible sanitization algorithms that could be used in various situations. Then, it investigates the effectiveness of these techniques and tries to use methods of collective inference to determine sensitive attributes of the user data set. It shows that the effectiveness of both the local and relational classification algorithms can be reduced by using the sanitization methods described.
Adaptive Search Based On User Tags in Social NetworkingIOSR Journals
This document summarizes an article about adaptive search based on user tags in social networking. It discusses using tags that users apply to images in social media sites like Flickr to improve image search and personalize results. It proposes using topic models to identify different meanings of ambiguous tags and a user's interests to display more relevant images. The framework involves reranking images based on aesthetics scores predicted from user comments, and using tag-based and group-based metadata to discover topics and personalize search results. Future work could further analyze community-generated metadata to identify interests and refine search algorithms.
Integrating content search with structure analysis for hypermedia retrieval a...unyil96
This document summarizes research on integrating content search and structure analysis for hypermedia retrieval and management. It discusses how link analysis and topic distillation techniques can organize query results and identify authoritative pages. Database approaches aim to facilitate search, navigation and associating web pages through extended query languages and logical document representations. Overall the paper outlines the state-of-the-art in utilizing both content and link structure to improve hypermedia search and organization.
This document discusses social data mining. It begins by defining data, information, and knowledge. It then defines data mining as extracting useful unknown information from large datasets. Social data mining is defined as systematically analyzing valuable information from social media, which is vast, noisy, distributed, unstructured, and dynamic. Common social media platforms are described. Graph mining and text mining are discussed as important techniques for social data mining. The generic social data mining process of data collection, modeling, and various mining methods is outlined. OAuth 2.0 authorization is also summarized as an important process for applications to access each other's data.
Social network analysis (SNA), a branch of complex systems, can be used in the construction of multi-agent systems. This paper proposes a study of how social network analysis can assist in modeling multi-agent systems, while addressing similarities and differences between the two theories. We built a prototype of a multi-agent system for the resolution of tasks through the formation of teams of agents that are formed on the basis of the social network established between agents. Agents make use of performance indicators to assess when they should change their social network to maximize their participation in teams.
Image Steganography Based On Hill Cipher with Key Hiding TechniqueIOSR Journals
This document presents a method for image steganography using Hill cipher encryption with a hidden key. It begins with background on steganography, cryptography, and the Hill cipher algorithm. The proposed method hides an encrypted ciphertext and encrypted key within the pixel values of a cover image. To encrypt, it applies Hill cipher to the plaintext, hides the ciphertext in the cover image pixels, and encrypts and embeds the key. It transmits the steganographic image. To decrypt, it extracts the key from the pixel values, decrypts the key, and uses the inverse Hill cipher to recover the original plaintext. The key is transformed and diffused within the pixel values, hiding it and making the system more secure for network transmission without
Comparison Study of Lossless Data Compression Algorithms for Text DataIOSR Journals
This document compares the performance of three lossless data compression algorithms (Huffman, Shannon-Fano, and LZW) on text data files of various types and sizes. Ten text files ranging from 5KB to 528KB were used as a test bed. The algorithms were implemented in Java and evaluated based on compression ratio, compression/decompression time, and file size savings. Results showed that LZW generally provided better compression ratios and file size reductions than the other two algorithms, though it took longer for compression. Shannon-Fano compression times were generally faster than Huffman's. The best algorithm to use depends on the file type and priorities around compression speed versus ratio.
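To make one of the compared algorithms concrete, here is a minimal sketch of Huffman code construction. It is written in Python for brevity, whereas the study's implementations were in Java; the function name and the sample string are illustrative assumptions, not taken from the paper.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a prefix-free binary code in which frequent symbols get shorter codes."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}

    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(weight, i, {symbol: ""}) for i, (symbol, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1

    return heap[0][2]


sample = "aaaabbc"
codes = huffman_codes(sample)
encoded_bits = sum(len(codes[ch]) for ch in sample)
```

For the sample string, the most frequent symbol receives a one-bit code and the other two receive two-bit codes, so the encoded length is far below the eight bits per character of a plain byte encoding. The code table itself must also be stored, which this sketch ignores.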
IOSR Journal of Applied Chemistry (IOSR-JAC) is an open access international journal that provides rapid publication (within a month) of articles in all areas of applied chemistry and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in Chemical Science. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
The document is a receipt for dental supplies received from the supplier DENTSPLY INTERNATIONAL MAILLEFER on January 24, 2013. It lists over 100 items received including files, burs, broaches, reamers, and other dental instruments. The items are coded and include descriptions and catalog numbers. A transport company delivered the supplies, which were received by the logistics coordinator.
IOSR Journal of Electrical and Electronics Engineering(IOSR-JEEE) is an open access international journal that provides rapid publication (within a month) of articles in all areas of electrical and electronics engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in electrical and electronics engineering. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
An Explanatory Analysis of the Economic and Social Impact of Corruption in Zi...IOSR Journals
Corruption has significant social and economic impacts in Zimbabwe. It is prevalent across both public and private sectors. Corruption diverts public funds from important services like education, healthcare and infrastructure development. It undermines good governance and rule of law. Studies estimate Zimbabwe has lost over $12 billion to illicit financial flows between 1980-2010 due to corruption. This has detrimental effects by exacerbating poverty and inequality. Corruption also increases the costs of doing business. Reducing corruption is important for Zimbabwe's economic recovery and achieving sustainable development goals.
A New Theoretical Approach to Location Based Power Aware RoutingIOSR Journals
This document proposes a new theoretical approach to location based power aware routing in mobile ad hoc networks (MANETs). It aims to extend the network lifetime by improving power utilization during routing. The approach uses nodes' location information, remaining battery power, and bandwidth status to assign link stability and select routes with lower "uptime values" and minimum bandwidth over time. This is hypothesized to better utilize nodes' power sources and bandwidth. The document outlines calculating a root up time factor for each node based on its power backup and required power, and only using nodes with maximum backup. It concludes future work will design and validate a new protocol based on this approach.
Mobile Networking and Ad hoc routing protocols validationIOSR Journals
This document discusses mobile networking and ad hoc routing protocols. It begins with an overview of cellular phone networks and their growth in usage. It then describes mobile ad hoc networks and some of the challenges in designing routing protocols for them. The document evaluates two model checking tools, SPIN and UPPAAL, and discusses their ability to verify properties of ad hoc routing protocols through formal validation methods.
Implementation of Secure Cloud Storage Gateway using Symmetric Key AlgorithmIOSR Journals
This document presents a mechanism for securely outsourcing linear programming (LP) computations to the cloud. It aims to achieve input/output privacy, correctness of results, and efficiency. The mechanism uses problem transformation techniques that encrypt the LP problem submitted by the customer and map it to an arbitrary encrypted form. It also develops an affine mapping of the decision variables to encrypt the feasible solution space. To verify results, it utilizes the duality theorem of LP to generate necessary and sufficient conditions for a correct solution. Extensive security analysis and experiments demonstrate the practicality of the approach.
Synchronization of Photo-voltaic system with a GridIOSR Journals
The document summarizes a study on synchronizing a photovoltaic (PV) system with a single phase grid. A digital phase locked loop (DPLL) technique is used for synchronization. An optocoupler detects the grid voltage zero crossings and a microcontroller using pulse width modulation controls the inverter to match the grid frequency and phase. Simulation results show the PV system output is synchronized with the grid with low harmonics. Hardware implementation involves using an optocoupler, microcontroller and inverter to synchronize the PV system voltage, frequency and phase to the grid.
This document describes a study on using a supercapacitor to power small electronic appliances. Key points:
- Researchers designed a buck converter to charge a supercapacitor module from a photovoltaic (solar) module. This allows fast charging of the supercapacitor in rural areas without grid power.
- The charged supercapacitor can then power electronic devices like mobile phones. Supercapacitors have a much longer cycle life than lithium-ion batteries commonly used in such devices.
- Experimental results showed the supercapacitor module could be charged to 10V in around 800 seconds using a constant current, constant voltage charging method from the solar panel and buck converter. This level of charging could then power small
Advance Frameworks for Hidden Web Retrieval Using Innovative Vision-Based Pag...IOSR Journals
The document proposes an innovative vision-based page segmentation (IVBPS) algorithm to improve hidden web content extraction. It aims to overcome limitations of existing approaches that rely heavily on HTML structure. IVBPS extracts blocks from the visual representation of a page and clusters them to segment the page semantically. It uses layout features like position and appearance to locate data regions and extract records. The algorithm analyzes the entire page structure rather than local regions, allowing it to retain content DOM tree methods may discard. This is expected to significantly improve hidden web extraction performance.
Heart Attack Prediction System Using Fuzzy C Means ClassifierIOSR Journals
This document presents a heart attack prediction system using a fuzzy C-means classifier. The system utilizes 13 patient attributes as inputs to the fuzzy C-means classifier to determine the risk of a heart attack. The classifier was tested on medical records from 270 patients and achieved a classification accuracy of 92%. Fuzzy C-means clustering allows data points to belong to multiple clusters, providing a more efficient and cost-effective way to predict the likelihood of patients experiencing a heart attack compared to other algorithms.
The document describes a molecular docking study of aspirin and aspirin derivatives using the HVR protein (HIV protease receptor). The study found that compounds SR-03, SR-02, and SR-04 showed the best docking scores and interactions with the HVR protein, indicating they may have potential anti-HIV activity. It was concluded that electron withdrawing groups attached to the aryl substituent of the carboxylic acid group in aspirin increase affinity for the HIV protein, while electron donating groups decrease affinity. Further studies are needed to determine the exact mechanisms of action.
7 Tips for Selling Expensive Collectibles On eBaybelieve52
The document provides 7 tips for selling expensive collectibles on eBay:
1. Set a reserve price that is the minimum you will accept and start the bidding very low to attract buyers.
2. Provide detailed descriptions of how the item will be carefully packed to prevent damage.
3. Require buyers to pay for insurance to protect both parties from liability if the collectible is broken.
4. Verify the authenticity of the collectible by providing details, photos or certificates of authenticity.
5. Do not offer returns but suggest using an escrow service for high value items to assure buyers.
6. Take extra precautions like verifying funds if shipping internationally due to higher fraud risks overseas.
Development and Validation of prediction for estimating resting energy expend...IOSR Journals
This document describes a study conducted to develop prediction equations for estimating resting energy expenditure (REE) in Indian subjects. Researchers measured body composition parameters of 100 Indian subjects using bioelectrical impedance analysis at frequencies of 5 kHz, 50 kHz, 100 kHz, and 200 kHz. Multiple regression analysis was used to develop two sets of REE prediction equations: 1) equations estimating REE at each frequency based on sex, age, weight, and impedance index, and 2) an equation estimating overall REE based on sex, age, fat-free mass, and fat mass. The predicted REE values from the equations closely matched measured REE values from the instrument, validating the developed prediction equations as the first such equations for Indian subjects
The document discusses analyzing political trends on social networks using the Hidden Markov Model. It begins by introducing how social network data can be analyzed to observe user behaviors and interests. It then discusses using NodeXL to gather Twitter data based on political keywords and applying the Hidden Markov Model to statistically analyze the data and determine what political topics people focus on most. Finally, it reviews related work where other researchers have used techniques like the Hidden Markov Model and social network analysis to gather and analyze data from social media platforms.
This document provides a review of different classifiers used for text classification on social media data. It discusses how social media data is often unstructured and contains users' opinions and sentiments. Various machine learning algorithms can be used to classify this social media text data, extracting meaningful information. The document focuses on describing Naive Bayes classifiers, which are commonly used for text classification tasks. It explains how Naive Bayes classifiers work by calculating the posterior probability that a document belongs to a certain class, based on applying Bayes' theorem with an independence assumption between features.
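The posterior described above can be written out explicitly. The notation is an assumption made here for illustration: $c$ is a class, $d$ is a document, and $w_1, \dots, w_n$ are the features (for example, words) of $d$.

```latex
% Bayes' theorem applied to a document d and a class c
P(c \mid d) = \frac{P(c)\, P(d \mid c)}{P(d)}

% Independence assumption: features are conditionally independent given the class
P(d \mid c) = \prod_{i=1}^{n} P(w_i \mid c)

% P(d) is the same for every class, so the predicted class is
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

In practice the product is usually computed as a sum of logarithms to avoid numerical underflow on long documents.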
Integrated expert recommendation model for online communitiesst02IJwest
Online communities have become vital places for Web 2.0 users to share knowledge and experiences. Recently, finding expert users in a community has become an important research issue. This paper proposes a novel cascaded model for expert recommendation using aggregated knowledge extracted from enormous contents and social network features. A vector space model is used to compute the relevance of published content with respect to a specific query, while the PageRank algorithm is applied to rank candidate experts. The experimental results show that the proposed model is an effective recommendation approach, which can ensure that the top candidate experts are both highly relevant to the specific queries and highly influential in the corresponding areas.
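The PageRank step can be sketched with a plain power iteration. This is a generic illustration, not the cascaded model from the paper: the `links` graph, the damping factor of 0.85, and the iteration count are assumptions chosen for the example, and an edge from one user to another is assumed to mean something like "replied to" or "endorsed".

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping node -> list of linked nodes."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives the same baseline share from random jumps.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank

    return rank


links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
top_candidate = max(ranks, key=ranks.get)
```

In a cascaded setup like the one described, such scores would be computed only over candidates that the content-relevance stage has already judged relevant to the query.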
2009-Social computing-Analyzing social media networksMarc Smith
This document summarizes research on analyzing social networks within enterprises that have adopted social media applications. Key points:
- Social media applications generate social networks as employees interact by creating connections, replying to messages, collaborating on documents, and mentioning common topics. These networks reveal insights into an organization's structure and dynamics.
- Network analysis uses metrics from graph theory to describe network properties like individual roles (e.g. discussion starters), overall shape and size, and each individual's connections. Visualizations can highlight important people, events, and subgroups.
- Early social network analysis relied on manually collected data, limiting its use. Now, automatically captured social media data creates networks without explicit surveys, providing rich new data
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...inventionjournals
The problem of web search time complexity and accuracy has been addressed in many research papers, whose authors discuss many approaches to improving search performance. Still, these approaches do not produce any noticeable improvement and struggle with high time complexity as well. To overcome the issues identified, this paper discusses an efficient multi-mode conceptual clustering algorithm, which identifies user groups with similar interests by clustering their search context according to different conceptual queries. The related conceptual queries and their results are shared with the identified user groups to reduce time complexity. The multi-mode conceptual clustering groups search queries and users according to the number of users and their search patterns. The concept of a search is identified using natural language processing methods and the web logs produced by default web search engines. The authors designed a dedicated web interface to collect web logs of user searches, and the same data was used to cluster the social groups according to the number of conceptual queries. The search results are shared between the users of the identified social groups, which reduces search time complexity and improves the efficiency of web search.
Social search interfaces aim to improve information seeking through collaboration. Three studies evaluated FeedMe, SearchTogether, and Coagmento. FeedMe allowed one-click link sharing but raised privacy concerns without public knowledge triggers. SearchTogether supported large group collaboration but users wanted more real-time features. Coagmento effectively supported group awareness but received low marks for personal awareness. Overall, more research is needed on privacy protections within social search interfaces to fully realize their benefits while respecting user privacy.
Scraping and Clustering Techniques for the Characterization of Linkedin Profilescsandit
The socialization of the web has taken on a new dimension after the emergence of the Online
Social Networks (OSN) concept. The fact that each Internet user becomes a potential content
creator entails managing a large amount of data. This paper explores the most popular
professional OSN: LinkedIn. A scraping technique was implemented to collect around 5 million
public profiles. The application of natural language processing (NLP) techniques to classify the
educational background and to cluster the professional background of the collected profiles led
us to provide some insights about this OSN’s users and to evaluate the relationships between
educational degrees and professional careers.
Social Media Influence Analysis using Data Science TechniquesMuhammad Bilal
The major purpose of this literature search report is to demonstrate the use of different data science techniques to investigate the impact of social media, considering the interaction between influencers and their followers.
The technical report presents two social recommendation methods that incorporate semantics from tags: a user-based semantic collaborative filtering and an item-based semantic collaborative filtering. The methods aim to find semantically similar users/items and recommend relevant social items. Experimental results show the methods improve recommendation quality and address issues like polysemy, synonymy, and semantic interoperability compared to methods without semantics.
Recommending Communities at the Regional and City LevelJohn Verostek
An exploratory study analyzed Meetup groups in Boston, Silicon Valley, and New York City to understand differences in their technology and business community ecosystems. Machine learning was used to categorize groups, which found that Boston had more programming language groups while New York had stronger business networking groups. Social network analysis identified central groups in each city. Comparisons across cities found similar but not identical group types. Member interests were also analyzed to identify recommendations for new groups. The goal was to develop a system to recommend new or related groups to improve regional community landscapes.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
A Social Network-Empowered Research Analytics Framework For Project SelectionNat Rice
This document proposes a social network-empowered research analytics framework to assist government funding agencies in selecting research projects. It builds researcher profiles using data from research proposals, publications, citations, and a research social network to capture relevance, productivity, and connectivity. An algorithm then matches proposals and reviewers based on these profiles to optimize reviewer assignments. The framework was implemented and tested by China's largest funding agency, generating cost savings and improved proposal evaluation.
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...ijcsit
Social networking sites are a significant source of information about user behavior and about what
occupies society across all ages; accordingly, helpful information can be provided to specialists
and decision-makers. According to official sources, 98.43% of Saudi youth use social networking sites.
Social media data are studied and analyzed to provide the information needed to increase
investment opportunities within the Kingdom of Saudi Arabia, by examining what occupies people
on these sites through their tweets about the labor market and investment. Given the
huge volume of the data and its randomness, the data will be surveyed and collected through
keywords, prioritized, and recorded as (positive - negative - mixed). The study's
analysis and conclusions will be based on data mining and its techniques of analysis and deduction.
Literature Review of Information Behaviour on Social MediaDavid Thompson
Using your knowledge about information resources and the skills in searching and evaluating information gained in the first half of the semester, you are now required to choose a specific topic in the area of information research, explore the existing literature within this domain and write a literature review.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
Social networks have become one of the most popular platforms for users to communicate and share their interests without being at the same geographical location. The great and rapid growth of social media sites such as Facebook, LinkedIn, Twitter, etc. generates a huge amount of user-generated content. Thus, improving information quality and integrity becomes a great challenge for all social media sites, which should allow users to get the desired content or be linked to the best link relation using an improved search / link technique. Introducing semantics to social networks will therefore widen the representation of the social networks. In this paper, a new model of social networks based on semantic tag ranking is introduced. This model is based on the concept of multi-agent systems. In the proposed model, the representation of social links is extended by the semantic relationships found in the vocabularies, which are known as tags in most social networks. The proposed model for the social media engine is based on enhanced Latent Dirichlet Allocation (E-LDA) as a semantic indexing algorithm, combined with Tag Rank as a social network ranking algorithm. The improvement in the E-LDA phase is achieved by optimizing the LDA algorithm using the optimal parameters. Then a filter is introduced to enhance the final indexing output. In the ranking phase, using Tag Rank based on the indexing phase has improved the output of the ranking. Simulation results of the proposed model have shown improvements in indexing and ranking output.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGIJwest
The document presents a new model for intelligent social networks based on semantic tag ranking. It uses a multi-agent system approach with agents performing indexing and ranking. For indexing, it uses an enhanced Latent Dirichlet Allocation (E-LDA) model that optimizes LDA parameters. Tags above a threshold from E-LDA output are ranked using Tag Rank. Simulation results showed improvements in indexing and ranking over conventional methods. The model introduces semantics to social networks to improve search and link recommendation.
Similar to A Literature Survey on Ranking Tagged Web Documents in Social Bookmarking Systems (20)
This document provides a technical review of secure banking using RSA and AES encryption methodologies. It discusses how RSA and AES are commonly used encryption standards for secure data transmission between ATMs and bank servers. The document first provides background on ATM security measures and risks of attacks. It then reviews related work analyzing encryption techniques. The document proposes using a one-time password in addition to a PIN for ATM authentication. It concludes that implementing encryption standards like RSA and AES can make transactions more secure and build trust in online banking.
This document analyzes the performance of various modulation schemes for achieving energy efficient communication over fading channels in wireless sensor networks. It finds that for long transmission distances, low-order modulations like BPSK are optimal due to their lower SNR requirements. However, as transmission distance decreases, higher-order modulations like 16-QAM and 64-QAM become more optimal since they can transmit more bits per symbol, outweighing their higher SNR needs. Simulations show lifetime extensions up to 550% are possible in short-range networks by using higher-order modulations instead of just BPSK. The optimal modulation depends on transmission distance and balancing the energy used by electronic components versus power amplifiers.
This document provides a review of mobility management techniques in vehicular ad hoc networks (VANETs). It discusses three modes of communication in VANETs: vehicle-to-infrastructure (V2I), vehicle-to-vehicle (V2V), and hybrid vehicle (HV) communication. For each communication mode, different mobility management schemes are required due to their unique characteristics. The document also discusses mobility management challenges in VANETs and outlines some open research issues in improving mobility management for seamless communication in these dynamic networks.
This document provides a review of different techniques for segmenting brain MRI images to detect tumors. It compares the K-means and Fuzzy C-means clustering algorithms. K-means is an exclusive clustering algorithm that groups data points into distinct clusters, while Fuzzy C-means is an overlapping clustering algorithm that allows data points to belong to multiple clusters. The document finds that Fuzzy C-means requires more time for brain tumor detection compared to other methods like hierarchical clustering or K-means. It also reviews related work applying these clustering algorithms to segment brain MRI images.
1) The document simulates and compares the performance of AODV and DSDV routing protocols in a mobile ad hoc network under three conditions: when users are fixed, when users move towards the base station, and when users move away from the base station.
2) The results show that both protocols have higher packet delivery and lower packet loss when users are either fixed or moving towards the base station, since signal strength is better in those scenarios. Performance degrades when users move away from the base station due to weaker signals.
3) AODV generally has better performance than DSDV, with higher throughput and packet delivery rates observed across the different user mobility conditions.
This document describes the design and implementation of 4-bit QPSK and 256-bit QAM modulation techniques using MATLAB. It compares the two techniques based on SNR, BER, and efficiency. The key steps of implementing each technique in MATLAB are outlined, including generating random bits, modulation, adding noise, and measuring BER. Simulation results show scatter plots and eye diagrams of the modulated signals. A table compares the results, showing that 256-bit QAM provides better performance than 4-bit QPSK. The document concludes that QAM modulation is more effective for digital transmission systems.
The document proposes a hybrid technique using Anisotropic Scale Invariant Feature Transform (A-SIFT) and Robust Ensemble Support Vector Machine (RESVM) to accurately identify faces in images. A-SIFT improves upon traditional SIFT by applying anisotropic scaling to extract richer directional keypoints. Keypoints are processed with RESVM and hypothesis testing to increase accuracy above 95% by repeatedly reprocessing images until the threshold is met. The technique was tested on similar and different facial images and achieved better results than SIFT in retrieval time and reduced keypoints.
This document studies the effects of dielectric superstrate thickness on microstrip patch antenna parameters. Three types of probe-fed patch antennas (rectangular, circular, and square) were designed to operate at 2.4 GHz using an Arlon DiClad 880 substrate. The antennas were tested with and without an Arlon DiClad 880 superstrate of varying thicknesses. It was found that adding a superstrate slightly degraded performance by lowering the resonant frequency and increasing return loss and VSWR, while decreasing bandwidth and gain. Specifically, increasing the superstrate thickness or dielectric constant resulted in greater changes to the antenna parameters.
This document describes a wireless environment monitoring system that utilizes soil energy as a sustainable power source for wireless sensors. The system uses a microbial fuel cell to generate electricity from the microbial activity in soil. Two microbial fuel cells were created using different soil types and various additives to produce different current and voltage outputs. An electronic circuit was designed on a printed circuit board with components like a microcontroller and ZigBee transceiver. Sensors for temperature and humidity were connected to the circuit to monitor the environment wirelessly. The system provides a low-cost way to power remote sensors without needing battery replacement and avoids the high costs of wiring a power source.
1) The document proposes a model for a frequency tunable inverted-F antenna that uses ferrite material.
2) The resonant frequency of the antenna can be significantly shifted from 2.41GHz to 3.15GHz, a 31% shift, by increasing the static magnetic field placed on the ferrite material.
3) Altering the permeability of the ferrite allows tuning of the antenna's resonant frequency without changing the physical dimensions, providing flexibility to operate over a wide frequency range.
This document summarizes a research paper that presents a speech enhancement method using stationary wavelet transform. The method first classifies speech into voiced, unvoiced, and silence regions based on short-time energy. It then applies different thresholding techniques to the wavelet coefficients of each region - modified hard thresholding for voiced speech, semi-soft thresholding for unvoiced speech, and setting coefficients to zero for silence. Experimental results using speech from the TIMIT database corrupted with white Gaussian noise at various SNR levels show improved performance over other popular denoising methods.
This document reviews the design of an energy-optimized wireless sensor node that encrypts data for transmission. It discusses how sensing schemes that group nodes into clusters and transmit aggregated data can reduce energy consumption compared to individual node transmissions. The proposed node design calculates the minimum transmission power needed based on received signal strength and uses a periodic sleep/wake cycle to optimize energy when not sensing or transmitting. It aims to encrypt data at both the node and network level to further optimize energy usage for wireless communication.
This document discusses group consumption modes. It analyzes factors that impact group consumption, including external environmental factors like technological developments enabling new forms of online and offline interactions, as well as internal motivational factors at both the group and individual level. The document then proposes that group consumption modes can be divided into four types based on two dimensions: vertical (group relationship intensity) and horizontal (consumption action period). These four types are instrument-oriented, information-oriented, enjoyment-oriented, and relationship-oriented consumption modes. Finally, the document notes that consumption modes are dynamic and can evolve over time.
The document summarizes a study of different microstrip patch antenna configurations with slotted ground planes. Three antenna designs were proposed and their performance evaluated through simulation: a conventional square patch, an elliptical patch, and a star-shaped patch. All antennas were mounted on an FR4 substrate. The effects of adding different slot patterns to the ground plane on resonance frequency, bandwidth, gain and efficiency were analyzed parametrically. Key findings were that reshaping the patch and adding slots increased bandwidth and shifted resonance frequency. The elliptical and star patches in particular performed better than the conventional design. Three antenna configurations were selected for fabrication and measurement based on the simulations: a conventional patch with a slot under the patch, an elliptical patch with slots
1) The document describes a study conducted to improve call drop rates in a GSM network through RF optimization.
2) Drive testing was performed before and after optimization using TEMS software to record network parameters like RxLevel, RxQuality, and events.
3) Analysis found call drops were occurring due to issues like handover failures between sectors, interference from adjacent channels, and overshooting due to antenna tilt.
4) Corrective actions taken included defining neighbors between sectors, adjusting frequencies to reduce interference, and lowering the mechanical tilt of an antenna.
5) Post-optimization drive testing showed improvements in RxLevel, RxQuality, and a reduction in dropped calls.
This document describes the design of an intelligent autonomous wheeled robot that uses RF transmission for communication. The robot has two modes - automatic mode where it can make its own decisions, and user control mode where a user can control it remotely. It is designed using a microcontroller and can perform tasks like object recognition using computer vision and color detection in MATLAB, as well as wall painting using pneumatic systems. The robot's movement is controlled by DC motors and it uses sensors like ultrasonic sensors and gas sensors to navigate autonomously. RF transmission allows communication between the robot and a remote control unit. The overall aim is to develop a low-cost robotic system for industrial applications like material handling.
This document reviews cryptography techniques to secure the Ad-hoc On-Demand Distance Vector (AODV) routing protocol in mobile ad-hoc networks. It discusses various types of attacks on AODV like impersonation, denial of service, eavesdropping, black hole attacks, wormhole attacks, and Sybil attacks. It then proposes using the RC6 cryptography algorithm to secure AODV by encrypting data packets and detecting and removing malicious nodes launching black hole attacks. Simulation results show that after applying RC6, the packet delivery ratio and throughput of AODV increase while delay decreases, improving the security and performance of the network under attack.
The document describes a proposed modification to the conventional Booth multiplier that aims to increase its speed by applying concepts from Vedic mathematics. Specifically, it utilizes the Urdhva Tiryakbhyam formula to generate all partial products concurrently rather than sequentially. The proposed 8x8 bit multiplier was coded in VHDL, simulated, and found to have a path delay 44.35% lower than a conventional Booth multiplier, demonstrating its potential for higher speed.
This document discusses image deblurring techniques. It begins by introducing image restoration and focusing on image deblurring. It then discusses challenges with image deblurring being an ill-posed problem. It reviews existing approaches to screen image deconvolution including estimating point spread functions and iteratively estimating blur kernels and sharp images. The document also discusses handling spatially variant blur and summarizes the relationship between the proposed method and previous work for different blur types. It proposes using color filters in the aperture to exploit parallax cues for segmentation and blur estimation. Finally, it proposes moving the image sensor circularly during exposure to prevent high frequency attenuation from motion blur.
This document describes modeling an adaptive controller for an aircraft roll control system using PID, fuzzy-PID, and genetic algorithm. It begins by introducing the aircraft roll control system and motivation for developing an adaptive controller to minimize errors from noisy analog sensor signals. It then provides the mathematical model of aircraft roll dynamics and describes modeling the real-time flight control system in MATLAB/Simulink. The document evaluates PID, fuzzy-PID, and PID-GA (genetic algorithm) controllers for aircraft roll control and finds that the PID-GA controller delivers the best performance.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman's Rimland, and Hegemonic Stability theories, the study examines
China's role in Central Asia. It adheres to the empirical epistemological method and takes care to
maintain objectivity. It critically analyzes primary and secondary research documents to elaborate the
role of China's geo-economic outreach in Central Asian countries and its future prospects. According to
this study, China is seeing significant success in trade, pipeline politics, and gaining influence over
other governments, which may be attributed to the effective use of key instruments such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
New techniques for characterising damage in rock slopes.pdf
A Literature Survey on Ranking Tagged Web Documents in Social Bookmarking Systems
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 11, Issue 5 (May - Jun. 2013), PP 56-69
www.iosrjournals.org
Nisar Muhammad, Shah Khusro, Saeed Mehfooz, Azhar Rauf
(Department of Computer Science, University of Peshawar, Pakistan)
ABSTRACT: Social web applications such as Facebook, YouTube, Delicious and Twitter have gained popularity among the masses due to their versatility and their potential for accommodating cultural perspectives in the social web paradigm. Social bookmarking systems enable users to store, manage and share tagged web documents through a folk classification system. These social tools allow their users to associate freely chosen keywords (tags) with documents for future consideration. Social tags not only reflect human cognition of the contents of a document but are also used as index terms in social searching. However, search results associated with query-tags are ordered either by popularity, by interestingness or in reverse chronological order with the most recent bookmarks on top, which limits the effectiveness of information searching in social bookmarking systems. Much research has already been published to tackle the problem by exploiting different features of the folksonomy structure. This survey provides a brief review of the state of the art, challenges and solutions in recommending and ranking tagged web documents in Social Bookmarking Systems (SBS).
Keywords: Information Searching, Social bookmarking, Social Search, Ranking, bookmarks, Web 2.0
I. INTRODUCTION
Because the storage capacity of the web is effectively unbounded, information searching and ranking have gained popularity in research communities. Search engine technologies and directories were implemented in the 1990s to address the problem of finding relevant information in search results. With the advent of Web 2.0 early in 2002, social indexing has contributed greatly to information management through its informal organizational structure, powered by the online users of the web, in which resources are associated with freely chosen terms instead of machine-oriented controlled vocabularies.
Social bookmarking systems like Delicious and BibSonomy hold large-scale shared repositories of public bookmarks enriched with social tags, and provide a tag-based search mechanism that helps users find the information they are interested in. However, search results returned by social searching systems are ordered either by popularity, by interestingness or in reverse chronological order with recent bookmarks on top [1], [2]. The research problem is that bookmarked documents are not ranked according to their relevance and importance to the query-tag, which limits the effectiveness of searching for information in social bookmarking systems.
This survey reviews the approaches proposed so far for re-ranking social search results against query-tags. The survey is organized as follows: Section 2 provides a brief overview of information systems, folksonomy and social bookmarking systems. Section 3 reviews the state of the art, while Section 4 concludes the survey and proposes some questions and suggestions.
II. BACKGROUND
This section provides a brief overview about the background and historical development of information
systems, Folksonomy and Social Bookmarking systems.
1.1 Social Information Systems
The basic objective of an information system is to present users with results in which relevant documents appear at the top, in decreasing order of relevance. Before the arrival of the Internet and search engine technologies, most research was dedicated to centralized information retrieval systems, physically located at one site. Information retrieval has been widely regarded as a prominent research area since the 1990s, particularly after the technological development of search engines. Since then, a great deal of work has been done to improve IR systems, for example PageRank [3], HITS [4], SimRank [5] and SALSA [6]. Similar approaches have also been adopted in social IR systems to enhance the search results associated with query tags. Social bookmarking systems, which hold large-scale shared repositories of public bookmarks, provide a tag-based search mechanism that helps users find the information they are interested in within public bookmarks. However, search results returned by these systems are ordered either by popularity, by interestingness or in reverse chronological order with recent bookmarks on top. Information searching and ranking therefore require quite sophisticated techniques in order to retrieve what is required. Section 3 reviews these approaches in the context of the social web.
1.2 Folksonomy
The term folksonomy was coined by Thomas Vander Wal [7] and denotes the practice of collaboratively managing tagged web resources. Freely chosen terms are used for annotation instead of controlled vocabularies. According to A. Hotho [8], a folksonomy is a quadruple $F = (U, T, R, Y)$ where $U$, $T$ and $R$ are the sets of users, tags and resources respectively, and $Y \subseteq U \times T \times R$ is the tag-assignment relation; the collection of all tag assignments is what is defined as the folksonomy. This conceptual space is used for sharing, organizing and searching web resources in social web applications. Tags serve two purposes: locating web documents and providing qualitative data about their contents. They represent the contents of resources very well [9] [10], reflect human judgment on contents even without a controlled vocabulary [11], and hence are considered valuable for information searching, indexing and ranking. The structure of folksonomy, also treated in terms of Formal Concept Analysis [12], has been widely discussed in research articles and is shown in Figure 1. S. Golden et al [13] studied the structure of social tagging as well as its dynamic aspects. B. Lund and T. Hammond et al [14], [15] investigated the structure of collaborative tagging systems and the architecture of participation. D. R. Millen et al [16] proposed an architectural design of social bookmarking tools for a large-scale enterprise. R. Wetzker and N. Deka et al [17], [18] reviewed the main features, dynamics, patterns, tag spamming and implications of tagging systems.
Figure 1: Structure of Folksonomy
1.3 Social Bookmarking Systems
A Social Bookmarking System (SBS) is a Web 2.0 service that uses folksonomy to store web documents online, annotate them with freely chosen terms, and mark them as private or public for sharing and future reference. Some of these systems, for example Delicious, Connotea, CiteULike and BibSonomy, have been reviewed by L. L. Barnes and F. Cevasco et al [19], [20].
Social bookmarking services are of great significance: besides the informal organization of knowledge, they create communities of like-minded people who share interests, and they improve the user's experience by exposing different perspectives and giving insight into other users' public resources [18][21]. Educational institutions and research communities also use them for online and offline learning and sharing [22][23] and for knowledge management, building library services for the digital era on their democratic nature, which allows users to openly access and contribute [21][24][25].
III. SURVEY
The social web has changed the way information is searched by combining human reasoning power with well-defined machine algorithms in social bookmarking systems. The potential of social search for enhancing web search has been studied in [26], [27]. A great deal of work has been published on recommending and ranking tagged web documents, with the aim of improving web search by placing relevant documents at the top of the search results for a query-tag. Most of these techniques follow hybrid approaches, but for reasonable organization and understanding they are grouped here into six categories: personomy based techniques, frequency and similarity based techniques, structure-based techniques, semantics-based techniques, cluster based techniques and probability based techniques.
3.1 Personomy Based Technique
M. G. Noll et al [28] proposed ranking documents by exploiting the similarity between a user's profile and the documents' profiles. A scalar, frequency-based similarity is calculated among users, tags and documents using equation (1).

$$\mathrm{Similarity}(u, d) = p_u^{T} \cdot p_d = \frac{\sum_{l} tf_u(t_l) \cdot tf_d(t_l)}{\sqrt{\sum_{l} \big(tf_u(t_l)\big)^2} \cdot \sqrt{\sum_{l} \big(tf_d(t_l)\big)^2}} \cdot \lVert p_d \rVert \tag{1}$$

where $\lVert p_d \rVert$ dampens all non-zero values of $p_d$ to 1, the so-called normalization factor. The user profile is modeled from the user's bookmark collection as a tag-document $(m \times n)$ matrix $M_d$ with $m$ tags and $n$ documents.

$$M_d = \begin{pmatrix} C_{11} & \cdots & C_{1n} \\ \vdots & \ddots & \vdots \\ C_{m1} & \cdots & C_{mn} \end{pmatrix}, \quad C_{ij} \in \{0, 1\}$$

The value of $C_{ij}$ is set to 1 if tag $t_i$ is associated with document $d_j$ and to 0 otherwise. Each user profile is thus formed by equation (2).
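$$P_u = M_d \cdot \omega_d = \begin{pmatrix} C_{11} & \cdots & C_{1n} \\ \vdots & \ddots & \vdots \\ C_{m1} & \cdots & C_{mn} \end{pmatrix} \cdot \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} C_1^{*} \\ \vdots \\ C_m^{*} \end{pmatrix}, \quad C_i^{*} \in \mathbb{N}_0 \tag{2}$$

Here $\omega_d = \mathbf{1}^{T}$ is the all-ones vector $[1 \ldots 1]$, and $C_i^{*}$ is the total count for tag $t_i$ in the user's bookmark collection. Similarly, each document profile is $P_d = M_u \cdot \omega_u$, where $M_u$ is the $(m \times u)$ tag-user matrix.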
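The profile construction of equation (2) can be illustrated with a small Python sketch. It assumes the simplest reading of the similarity, in which the user profile is multiplied with the dampened document profile and the length normalization of equation (1) is left out; the function names and the sample bookmarks are hypothetical.

```python
def tag_profile(pairs, tags):
    """Row sums of the binary tag matrix: how many items carry each tag."""
    counts = {t: 0 for t in tags}
    for tag, _item in set(pairs):  # set(): every C_ij is either 0 or 1
        counts[tag] += 1
    return [counts[t] for t in tags]


def dampen(profile):
    """Set every non-zero entry to 1 (the normalization of the document profile)."""
    return [1 if c > 0 else 0 for c in profile]


def similarity(p_u, p_d):
    """Dot product of the user profile with the dampened document profile."""
    return sum(a * b for a, b in zip(p_u, dampen(p_d)))


tags = ["python", "web", "search"]
# Hypothetical bookmarks of one user, as (tag, document) pairs.
user_pairs = [("python", "d1"), ("python", "d2"), ("web", "d2")]
# Hypothetical annotations of one document, as (tag, user) pairs.
doc_pairs = [("python", "u1"), ("python", "u2"), ("python", "u3"), ("search", "u1")]

p_u = tag_profile(user_pairs, tags)  # [2, 1, 0]
p_d = tag_profile(doc_pairs, tags)   # [3, 0, 1]
print(similarity(p_u, p_d))          # 2
```

Dampening means a document tagged "python" by three users counts no more than one tagged once, so the score is driven by how strongly the user prefers the matching tags.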
A similar effort was made by S. Xu et al [29], where the normalized $tf \cdot idf$ based angular distance between user and document profiles is computed by equation (3):

$$\mathrm{Similarity} = \cos_{tf\text{-}idf}(u_m, d_n) = \frac{\sum_{l} tf_{u_m}(t_l) \cdot iuf(t_l) \cdot tf_{d_n}(t_l) \cdot idf(t_l)}{\sqrt{\sum_{l} \big(tf_{u_m}(t_l) \cdot iuf(t_l)\big)^2} \cdot \sqrt{\sum_{l} \big(tf_{d_n}(t_l) \cdot idf(t_l)\big)^2}} \tag{3}$$

This combines term matching between the query and the web page with topic matching between users and web pages. Here $tf_{u_m}(t_l)$ and $tf_{d_n}(t_l)$ are the user-based and document-based tag frequencies respectively, $iuf(t_l)$ and $idf(t_l)$ are the inverse user frequency and inverse document frequency of tag $t_l$, and the two factors in the denominator are the magnitudes $\lVert u_m \rVert$ and $\lVert d_n \rVert$ of the user and document vectors.
D. Vallet et al [30] modified the above scheme by eliminating the user and document length normalization factors, as in equation (4).

$$tf\text{-}idf(u_m, d_n) = \sum_{l} tf_{u_m}(t_l) \cdot iuf(t_l) \cdot tf_{d_n}(t_l) \cdot idf(t_l) \tag{4}$$

This work differs from that of [28] in that it uses $iuf$ and $idf$ in combination to capture the global distribution of tags over users and documents. These personalized approaches have been summarized and compared by I. Cantador et al [31].
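The difference between equations (3) and (4) is only the division by the two vector magnitudes, which the following Python sketch makes explicit. The dictionaries and their values are hypothetical sample data, and the helper names are not taken from the cited papers.

```python
import math


def weighted(tf, inverse_freq):
    """Scale each tag frequency by its inverse frequency (iuf or idf)."""
    return {t: f * inverse_freq.get(t, 0.0) for t, f in tf.items()}


def dot(a, b):
    """Sum of products over the tags that both vectors share."""
    return sum(w * b[t] for t, w in a.items() if t in b)


def tfidf_score(tf_u, iuf, tf_d, idf):
    """Equation (4): unnormalized sum over shared tags."""
    return dot(weighted(tf_u, iuf), weighted(tf_d, idf))


def cos_tfidf(tf_u, iuf, tf_d, idf):
    """Equation (3): the same sum divided by both vector magnitudes."""
    u, d = weighted(tf_u, iuf), weighted(tf_d, idf)
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(d, d))
    return dot(u, d) / norm if norm else 0.0


# Hypothetical user and document tag frequencies with inverse frequencies.
tf_u, iuf = {"python": 2, "web": 1}, {"python": 1.0, "web": 2.0}
tf_d, idf = {"python": 3, "search": 1}, {"python": 0.5, "search": 1.0}

print(tfidf_score(tf_u, iuf, tf_d, idf))  # 3.0
print(cos_tfidf(tf_u, iuf, tf_d, idf))
```

Without normalization, users or documents with many tags accumulate larger scores; the cosine form removes that length effect.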
Y. Cai et al [32] defined the user profile and the document profile as:

$$U_i = (t_{i,1} : v_{i,1},\ t_{i,2} : v_{i,2},\ \ldots,\ t_{i,n} : v_{i,n})$$
$$D_c = (t_{c,1} : \omega_{c,1},\ t_{c,2} : \omega_{c,2},\ \ldots,\ t_{c,n} : \omega_{c,n})$$

where $v_{i,x} = \frac{N_{i,x}}{N_i}$ is the ratio of the number of resources that user $i$ tagged with tag $x$ to the total number of resources tagged by user $i$, that is, the normalized tag frequency (NTF) of tag $x$ for user $i$, and $\omega_{c,x} = \frac{M_{c,x}}{M_c}$ is the number of users who annotated resource $c$ with tag $x$ divided by the total number of users who ever annotated resource $c$ with any tag. The personalized ranking function in equation (5) is based on these user and document profiles.

$$RScore(q_i, U_i, D_c) = \frac{\gamma(q_i, D_c) + \theta(U_i, D_c)}{2} \tag{5}$$

The first part is

$$\gamma(q_i, D_c) = \frac{\sum_{t_{c,x} \in q_i} \omega_{c,x}}{m} \cdot \left(\frac{k}{m}\right)^{\alpha}$$

where $k$ is the number of query terms satisfied by the resource profile, $m$ is the total number of terms in the query, and $\alpha$ is a constant used to adjust the effect of relevant tags in a resource profile for the query. The second part is

$$\theta(U_i, D_c) = \frac{\sum_{x} l_x \, v_{i,x}}{m}$$

where $m$ is again the total number of terms in the query, and

$$l_x = \begin{cases} \omega_{c,x} + (1 - v_{i,x})(1 - \omega_{c,x}) & 1 > \omega_{c,x} > 0,\ v_{i,x} > 0 \\ 1 & \omega_{c,x} = 1,\ v_{i,x} > 0 \\ 0 & \omega_{c,x} = 0,\ v_{i,x} > 0 \end{cases}$$
P. Wu et al [33] proposed personalized recommendation that uses the $tf \cdot idf$ weight as a variable value in a diffusion-based algorithm. The recommendation and ranking score of document $j$ is given by equation (6).

$$r_{k,j} = \sum_{l \in \Gamma(i_j)} p_l \cdot w_{l,j} \tag{6}$$

The weight factor represents $tf \cdot idf$ as $w_{k,j} = w_{k,j}^{(T)} + \epsilon$, and the other factor is the preference factor carrying the weight of the edge between $u_k$ and $i_j$. The weight of item $i_j$ with respect to user $u_k$ is defined by equation (7):

$$w_{k,j} = w_{k,j}^{(T)} + \epsilon = \left( \sum_{t \in \Gamma(k,j)} freq_{kt} \cdot \log \frac{|U|}{|\{u : t \in \Gamma(u)\}|} \right) + \epsilon \tag{7}$$

$\Gamma(k, j)$ is the personal tag space of user $u_k$ $(k = 1, 2, 3, \ldots, n)$, $freq_{kt}$ is the frequency with which tag $t$ is used by $u_k$, $|U|$ is the total number of users in the system, $\Gamma(u)$ is the set of tags used by user $u$, and $|\{u : t \in \Gamma(u)\}|$ counts the number of users who used tag $t$. The diffusion process first, as defined in equation (8), distributes the value calculated for each item evenly among the users who have collected it; each user $u_l$ receives the value:

$$p_l = \sum_{j \in \Gamma(u_k)} \frac{w_{k,j}}{d(i_j)} = \sum_{j \in \Gamma(u_k)} \frac{\text{weight of document } j}{\text{count of users who have collected document } j} \tag{8}$$

$\Gamma(u_k)$ is the set of items that have been collected by $u_k$, and $d(i_j)$ is the degree of item $i_j$ in the user-item bipartite network. In the second step, the value of each user is redistributed among the items in that user's collection according to $w_{k,j} = w_{k,j}^{(T)} + \epsilon$.
H. N. Kim et al. [34] proposed Folksonomy-boosted ranking (FBR), which follows a collaborative filtering mechanism for personalized ranking. The hybrid approach is given by equation (9):

$$FBR_u(i, q) = \sum_{t \in q} p_{u,t} \cdot w_{t,i} \tag{9}$$

The latent tag preference $P$ is the product of the user-tag matrix $A$ with the tag-tag similarity matrix $E$, as in equation (10).

$$P = A \cdot E_k \quad \text{or} \quad p_{u,t} = a_u^{T} \cdot e_t \quad \text{or} \quad p_{u,t} = \sum_{j=1}^{|T|} a_{u,j} \times e_{j,t} \tag{10}$$

where $A = (a_{u,t})$ with $a_{u,t} = \frac{a_{u,t}}{\sqrt{\sum_{j=1}^{|U|} (a_{j,t})^2}}$ is the normalized matrix of $A$. The second part, $E_k$, holds the $k$ most similar tags $e_{j,t}$, computed as $e_{x,y} = \cos(t_x, t_y) = \frac{t_x \cdot t_y}{\lVert t_x \rVert \, \lVert t_y \rVert}$.
The latent tag annotation model is given by equation (11).

$$W = N \cdot H_k \quad \text{or} \quad w_{t,i} = n_t^{T} \cdot h_i \quad \text{or} \quad w_{t,i} = \sum_{j} n_{t,j} \times h_{j,i} \tag{11}$$

$N$ is the tag-item matrix whose values $n_{t,i}$ represent the number of users who have annotated item $i$ with tag $t$, normalized as $n_{t,i} = \frac{n_{t,i}}{\sqrt{\sum_{j=1}^{|T|} (n_{j,i})^2}}$. $H$ is the item × item matrix and $H_k$ contains the $k$ most similar items. The value in $H_k$ for documents $x$ and $y$ is calculated as $h_{x,y} = \cos(i_x, i_y) = \frac{i_x \cdot i_y}{\lVert i_x \rVert \, \lVert i_y \rVert}$.
3.2 Tag Frequency and Similarity Based Techniques
Ranking techniques that follow similarity and frequency based approaches are reviewed in this section. Most of them use the Vector Space Model to calculate similarity measures between web pages and the query.
3.2.1 CoolRanking
H. S. Khalifa et al [35] proposed CoolRank to rank tagged documents by exploiting the relationships among tags, resources and users. The CoolRank algorithm is based on two components, resource popularity $P(R)$ and tag subjectivity $S(R)$, combined as in equation (1).

$$CoolRank = S(R) + P(R) = \frac{\sum_{T \in Tags} f_t(T)}{U(R)} + \log(\text{number of people who bookmarked resource } R) \tag{1}$$

$f_t(T)$ is the number of occurrences of the query-tag $T$ and $U(R)$ is the total number of bookmarks of user $U$.
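A minimal Python sketch of the CoolRank sum follows. It rests on assumptions not fixed by the text above: the denominator $U(R)$ is passed in as a plain bookmark total, the logarithm is taken as the natural logarithm, and the function and parameter names are hypothetical.

```python
import math


def cool_rank(tag_counts, query_tags, total_bookmarks, bookmarkers):
    """CoolRank = S(R) + P(R) for a single resource R."""
    # S(R): occurrences of the query tags, divided by the bookmark total U(R).
    matched = sum(tag_counts.get(t, 0) for t in query_tags)
    subjectivity = matched / total_bookmarks if total_bookmarks else 0.0
    # P(R): logarithm of the number of people who bookmarked R.
    popularity = math.log(bookmarkers) if bookmarkers > 0 else 0.0
    return subjectivity + popularity


# Hypothetical resource: tagged "python" four times and "web" once.
score = cool_rank({"python": 4, "web": 1}, ["python"], total_bookmarks=5, bookmarkers=1)
print(score)  # 0.8, since log(1) contributes nothing
```

The logarithm keeps very popular resources from swamping the tag-match term, which is bounded by the bookmark total.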
3.2.2 Social Ranking
V. Zanardi et al [36] proposed Social Ranking, which exploits cosine similarity relationships among tags and among users. The social rank of a document combines tag similarity with user similarity, as in equation (1):

$$R(d) = \sum_{u_i} \mathit{SimTags} \times \big(sim(u, u_i) + 1\big)$$

or, written out,

$$R(d) = \sum_{u_i} \left( \sum_{t_x \,\mid\, u_i \text{ tagged } d \text{ with } t_x} \ \sum_{t_j \in q^{*}} sim(t_x, t_j) \right) \times \big(sim(u, u_i) + 1\big) \tag{1}$$

The technique first expands the query by considering tags that are similar to those in query $q$, that is, tags for which $0 < sim(t_i, t_j) < 1$ where $t_i \in q$ and $t_j \in q'$, with $q'$ the set of similar tags. The social rank score $R(d)$ of a document combines the relevance of the tags associated with the document with respect to the tags in the extended query set $q^{*}$ and the similarity of the tagging users to the querying user, accumulated over the similar users.
3.2.3 Normalized Match Tag Count (NMTC)
W. Choochaiwattana et al [37] proposed MTC and NMTC, expressed by equations (1) and (2). MTC (Match Tag Count) counts the users who used tags that match the query terms. The indicator $M_{rua}(R_x, U_y, A_z)$ equals 1 when user $U_y$ uses tag $A_z$ to annotate resource $R_x$ and 0 otherwise. The set $q = \{q_1, q_2, q_3, q_4, \ldots, q_n\}$ represents the $n$ query terms and $a(r_x) = \{a_1, a_2, a_3, a_4, \ldots, a_n\}$ is the annotation set of web resource $x$. $NMTC_x$ (Normalized Match Tag Count) is the normalized form of $MTC_x$, dividing by the count of all tags assigned to the resource.

$$MTC_x = \sum_{y=0}^{N_u} \sum_{z=0}^{N_a} M_{rua}(r_x, u_y, a_z) \quad \text{if } a_z \in q \tag{1}$$

$$NMTC_x = \frac{\sum_{y=0}^{N_u} \sum_{z=0}^{N_a} M_{rua}(r_x, u_y, a_z) \ \text{ if } a_z \in q}{\sum_{y=0}^{N_u} \sum_{z=0}^{N_a} M_{rua}(r_x, u_y, a_z)} \tag{2}$$
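Equations (1) and (2) amount to counting annotation triples, as the following Python sketch shows. The set-of-triples representation and the sample data are assumptions made for illustration.

```python
def mtc(annotations, resource, query):
    """Count (user, tag) assignments on `resource` whose tag is a query term."""
    terms = set(query)
    return sum(1 for r, _u, a in annotations if r == resource and a in terms)


def nmtc(annotations, resource, query):
    """MTC divided by the number of all tag assignments on the resource."""
    total = sum(1 for r, _u, _a in annotations if r == resource)
    return mtc(annotations, resource, query) / total if total else 0.0


# Hypothetical (resource, user, tag) assignments; each triple has M_rua = 1.
annotations = {
    ("r1", "u1", "python"), ("r1", "u2", "python"),
    ("r1", "u2", "tutorial"), ("r1", "u3", "web"),
    ("r2", "u1", "python"),
}
print(mtc(annotations, "r1", ["python", "web"]))   # 3
print(nmtc(annotations, "r1", ["python", "web"]))  # 0.75
```

Normalizing prevents heavily tagged resources from ranking high merely because they have many annotations overall.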
3.2.4 Social Ranking Based on Reputation
E. M. Daly et al [38] exploited the wisdom of the crowds in the form of reputation, which combines the number of bookmarkers, the reputation of the bookmarkers and the time dynamics of documents in order to rank web documents. User reputation reflects the number of users (consumers) consuming the content of a user (contributor). The factor $R_{reward}$ depends on the consumption rate of other users, as in equation (1).

$$\text{User Reputation:} \quad R_{new} = R_{old} + (1 - R_{old}) \times R_{reward} \tag{1}$$

The document reputation reflects the number of users who add the document to their collection and is given by equation (2); the time dynamics are given by equation (3).

$$\text{Document Reputation:} \quad R_{new} = R_{old} + (1 - R_{old}) \times R_{reward} \tag{2}$$

$$\text{Time Dynamics:} \quad R_{new} = R_{old} \times \gamma^{k} \tag{3}$$

The third component is the time dynamics, where $\gamma$ is the time decay coefficient and $k$ is the number of time units since the reputation value of an item was last updated. The reputation ranking is given in equation (4).

$$\text{Reputation Ranking:} \quad R_{new} = R_{old} \times R_{bookmarker} \times \beta \tag{4}$$
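The three update rules can be written as one-line Python functions. This is a sketch only: the parameter values in the example are hypothetical, and reputation values are assumed to lie in the interval [0, 1].

```python
def reward(r_old, r_reward):
    """Equations (1) and (2): move the reputation towards 1 by a share of the gap."""
    return r_old + (1.0 - r_old) * r_reward


def decay(r_old, gamma, k):
    """Equation (3): shrink the reputation after k time units without updates."""
    return r_old * gamma ** k


def reputation_rank(r_old, r_bookmarker, beta):
    """Equation (4): weight a document by the reputation of its bookmarker."""
    return r_old * r_bookmarker * beta


r = reward(0.5, 0.2)    # 0.6: a fifth of the remaining gap to 1 is closed
r = decay(r, 0.5, 2)    # 0.15: two time units at decay coefficient 0.5
print(r, reputation_rank(r, 0.8, 1.0))
```

Because the reward is applied to the remaining gap $(1 - R_{old})$, repeated rewards approach 1 without ever exceeding it.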
3.2.5 Tag-Similarity Based Ranking
F. Durao et al [39] proposed a tag-based system that suggests similar web pages based on the similarity of their tags, together with a method for reordering the original recommendation ranking. Three quantities are combined to evaluate the personalized recommendation ranking: tag popularity, tag representativeness and the affinity between tags and users. The document score is given by equation (1),

$$D_s = \sum_{i=1}^{n} weight(Tag_i) * \sum_{i=1}^{n} representativeness(Tag_i) \tag{1}$$

where $n$ is the total number of existing tags in the repository. The affinity between a user and a tag is calculated with equation (2):

$$Affinity(u, t) = \frac{card\{\, r \in Documents \mid (u, t, r) \in R,\ R \subseteq U \times T \times D \,\}}{card\{\, t \in T \mid (t, u) \in R_u,\ R_u \subseteq U \times T \,\}} \tag{2}$$

where $t$ is a particular tag, $u$ a particular user, $U$ is the set of users, $D$ is the set of resources and $T$ is the set of tags. Combining the above three parameters gives the following equation:

$$Similarity(D_i, D_j) = \big[ D_s(D_i) + D_s(D_j) \big] * cosine\_similarity(T_{D_i}, T_{D_j}) * Affinity(u, t)$$

where $D_s$ is the document score and $T_{D}$ is the set of tags of a particular document $D$.
3.2.6 Tensor Based Recommendation and Ranking
R. Wetzker et al [40] proposed the user-centric tag model (UCTM), which maps personomies into folksonomies. The ternary relation among items, users and tags is captured by the folksonomy tensor $Y = (Y_{itu}) \subseteq I \times T \times U$ with $Y_{itu} = 1$ if $(i, t, u) \in Y$ and 0 otherwise. Ranking takes place in two steps: first the query tag is translated to the global vocabulary, and then the items with the highest weights are recommended to the user. The translation of a single tag $t$ is given by its previous co-occurrences with other tags, represented by the vector $\tau_{Ttu}$ within the translation tensor $\tau_{TTu}$. Tags from a user's personomy may be translated to the folksonomy vocabulary by simple vector multiplication using equation (1).

$$t(t, u) = \frac{\tau_{TTu} \times_T t}{\lVert \tau_{TTu} \times_T t \rVert} \tag{1}$$

In the second step, the items associated with these community tags are ranked by calculating the weight vector with equation (2).

$$i(t, u) = \frac{\tau_{Ttu} \times_T A^{*}_{IT}}{\lVert \tau_{Ttu} \times_T A^{*}_{IT} \rVert} \tag{2}$$

where $A^{*}_{IT}$ is the tag-normalized stochastic version of the matrix $A_{IT}$.
3.2.7 Neighbor Weight Collaborative Filtering (NwCF)
D. Parra-Santander [41] proposed collaborative filtering in social tagging systems, building on the collaborative filtering recommender system of [42]. The technique works in two steps. First, users with similar interests are found using equation (1).

$$userSim(u, n) = \frac{\sum_{i \in CR_{u,n}} (r_{ui} - \bar{r}_u)(r_{ni} - \bar{r}_n)}{\sqrt{\sum_{i \in CR_{u,n}} (r_{ui} - \bar{r}_u)^2} \ \sqrt{\sum_{i \in CR_{u,n}} (r_{ni} - \bar{r}_n)^2}} \tag{1}$$

This equation expresses user similarity as the Pearson correlation coefficient between the target user $u$ and a neighbor $n$, where $CR_{u,n}$ is the set of all co-rated items of $u$ and $n$, and $r_{ni}$ is the rating of neighbor $n$ for item $i \in CR_{u,n}$. The top $k$ neighbors are considered, and items of those neighbors are recommended to the target user.

$$pred(u, i) = \bar{r}_u + \frac{\sum_{n \in neighbors(u)} userSim(u, n) \cdot (r_{ni} - \bar{r}_n)}{\sum_{n \in neighbors(u)} userSim(u, n)} \tag{2}$$

Equation (2) is then enhanced by the factor $nbr(i)$, as described in equation (3).

$$pred'(u, i) = \log_{10}\big(1 + nbr(i)\big) \cdot pred(u, i) \tag{3}$$

The top $k$ neighbors are selected using equation (1), and $nbr(i)$ is the number of raters of publication $i$ that enter the overall calculation.
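The three equations of NwCF can be traced in a short Python sketch. Several choices here are assumptions rather than details from [41]: user means are taken over all of a user's ratings, only neighbors with positive similarity are kept, $nbr(i)$ is counted among the selected neighbors, and the ratings are hypothetical.

```python
import math


def mean(ratings):
    return sum(ratings.values()) / len(ratings)


def user_sim(ru, rn):
    """Equation (1): Pearson correlation over the co-rated items."""
    common = set(ru) & set(rn)
    mu, mn = mean(ru), mean(rn)
    num = sum((ru[i] - mu) * (rn[i] - mn) for i in common)
    den = math.sqrt(sum((ru[i] - mu) ** 2 for i in common)) * \
        math.sqrt(sum((rn[i] - mn) ** 2 for i in common))
    return num / den if den else 0.0


def predict(user, item, ratings, k=2):
    """Equation (2): returns the prediction and the number of neighbors used."""
    candidates = []
    for n, rn in ratings.items():
        if n != user and item in rn:
            s = user_sim(ratings[user], rn)
            if s > 0:  # assumption: ignore dissimilar users
                candidates.append((s, n))
    top = sorted(candidates, reverse=True)[:k]
    base = mean(ratings[user])
    if not top:
        return base, 0
    total = sum(s for s, _ in top)
    offset = sum(s * (ratings[n][item] - mean(ratings[n])) for s, n in top)
    return base + offset / total, len(top)


def predict_nw(user, item, ratings, k=2):
    """Equation (3): scale the prediction by log10(1 + nbr(i))."""
    pred, nbr = predict(user, item, ratings, k)
    return math.log10(1 + nbr) * pred


ratings = {
    "u":  {"a": 5, "b": 3, "c": 4},
    "n1": {"a": 4, "b": 2, "c": 3, "d": 5},
    "n2": {"a": 1, "b": 5, "d": 2},
}
print(predict("u", "d", ratings))     # close to (5.5, 1): only n1 is similar
print(predict_nw("u", "d", ratings))  # log10(2) * 5.5
```

The logarithmic factor lowers predictions that rest on few raters, so an item endorsed by a single neighbor ranks below one endorsed by many.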
3.2.8 Linear Weighted Hybrid Approaches
J. Gemmell et al [43] proposed different hybrid approaches based on user-based and item-based collaborative filtering, tag model similarity, structure and popularity.
a) Linear hybrid recommendation, using equation (1):

$$\phi(u, q, r) = \sum_{i=1}^{k} \alpha_i \, \phi_i(u, q, r) \tag{1}$$

The components that exploit the two-dimensional projections of the $URT$ model include $RT$, $RU$, $TU$, $TR$, $UR$ and $UT$; for example, $RT(r, t) = \sum_{\forall u \in U} URT(u, r, t)$ is the number of users who have annotated $r$ with $t$.
b) Collaborative filtering:
User-based collaboration relies on kNN collaborative filtering algorithms ($KNN_{ru}$ and $KNN_{rt}$). The user-based, tag-specific approach is defined by equation (2):

$$\phi(u, \{t\}, r) = \sum_{v \in N_u^{t}} \sigma(u, v) \, \theta(v, t, r), \quad \text{where } \theta(v, t, r) = 1 \text{ if } v \text{ annotated } r \text{ with } t \text{ and } 0 \text{ otherwise} \tag{2}$$

The score is calculated from the similarity $\sigma(u, v)$ between users (target user $u$ and neighbor $v$), where user $v$ must have labeled resource $r$ with tag $t$ at least once. For item-based collaborative filtering we have equation (3):

$$\phi(u, \{t\}, r) = \sum_{s \in N_r^{t}} \sigma(r, s) \, \theta(u, t, s) \tag{3}$$

Similarities are computed between the given resource $r$ and the neighboring resources $s \in N_r$. A resource is considered when finding the $k$ neighbors of a resource only if it is tagged with tag $t$.
c) Tag-based similarity model:
The tag-based similarity model is defined by equation (4).

$$\omega(u, \emptyset, r) = \frac{\sum_{t \in T} RT(r, t) \times UT(u, t)}{\lVert RT(r, t) \rVert \cdot \lVert UT(u, t) \rVert} \tag{4}$$

d) Popularity model: the tag-specific popularity model is defined by equation (5).

$$\omega(u, \{t\}, r) = \sum_{v \in U} \chi(v, t, r) \tag{5}$$

The value of $\chi(v, t, r)$ is 1 only if user $v$ tagged resource $r$ with $t$, and zero otherwise.
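The linear combination of equation (1) and the popularity component of equation (5) can be sketched in Python as follows. The second component, the weights and the sample annotations are hypothetical and stand in for the models described above.

```python
def popularity(annotations, tag, resource):
    """Equation (5): number of users who tagged `resource` with `tag`."""
    return sum(1 for (_v, t, r) in annotations if t == tag and r == resource)


def uses_tag(annotations, user, tag):
    """Hypothetical second component: 1 if the user has ever used the tag."""
    return 1.0 if any(v == user and t == tag for (v, t, _r) in annotations) else 0.0


def linear_hybrid(components, alphas, user, tag, resource):
    """Equation (1): weighted sum of the component scores."""
    return sum(a * phi(user, tag, resource) for a, phi in zip(alphas, components))


# Hypothetical (user, tag, resource) annotations.
annotations = {
    ("alice", "python", "r1"), ("bob", "python", "r1"), ("bob", "web", "r2"),
}
components = [
    lambda u, t, r: popularity(annotations, t, r),
    lambda u, t, r: uses_tag(annotations, u, t),
]
score = linear_hybrid(components, [0.7, 0.3], "alice", "python", "r1")
print(score)  # 0.7 * 2 + 0.3 * 1, about 1.7
```

Keeping each component behind the same call signature lets further models, such as the kNN variants, be added to the list without changing the combination step.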
3.3 Structured Based Approaches
Different mathematical models such as matrices, functions, vectors, probability and graphs are used in computing to turn research concepts into real-world practice. As far as search engine technology is concerned, the graph model has been adopted in PageRank, HITS and SALSA. In the same way, graph models are used to associate the different objects of social bookmarking systems for the purpose of organizing, sharing and ranking e-resources. Some of the contributions that exploit the graph structure of folksonomy are analyzed in this section.
3.3.1 FolkRank
FolkRank is an adaptation of the PageRank algorithm $[\omega = dA\omega + (1 - d)p]$ for folksonomy-based information ranking, proposed by A. Hotho and his team [1]. The folksonomy structure $F = (U, T, R, Y)$ is converted into an undirected tri-partite graph $G = (V, E)$ as shown in Figure 2. The edge weight $W(t, r) = |\{u \in U\}|$ is the number of users who annotated resource $r$ with tag $t$, $W(t, u) = |\{r \in R\}|$ is the number of resources tagged with term $t$ by user $u$, and $W(u, r) = |\{t \in T\}|$ is the number of tags assigned to resource $r$ by user $u$. All these weights mutually reinforce each other by spreading their weights according to equation (1).

$$\omega = dA\omega + (1 - d)p \tag{1}$$

where $A$ is the adjacency matrix representing the edge weights between nodes and is a model of the folksonomy graph; all in-links contribute an input value to the overall FolkRank of a node. $p$ is the preference vector and $d$ is the damping factor. FolkRank is calculated as the difference between two PageRank computations, one with and one without a preference vector: $\omega = \omega_1 - \omega_0$, where $\omega_0$ is the result with $d = 1$ and $\omega_1$ is the result with $d < 1$. The final differential vector contains the FolkRank of each node: tag, resource and user.
Figure 2: Tri-partite graph of tags, users and URLs
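The differential computation described for FolkRank can be sketched in plain Python. This is an illustration under stated assumptions, not the reference implementation: the adjacency weights are row-normalized, the preference vector gives the query tag a hypothetical extra weight, a fixed number of iterations replaces a convergence test, and the sample triples are invented.

```python
from collections import defaultdict


def build_graph(triples):
    """Undirected weighted graph over user, tag and resource nodes."""
    edges = defaultdict(float)
    for u, t, r in set(triples):
        pairs = ((("u", u), ("t", t)), (("t", t), ("r", r)), (("u", u), ("r", r)))
        for a, b in pairs:
            edges[(a, b)] += 1.0  # co-occurrence counts become edge weights
            edges[(b, a)] += 1.0
    return edges


def weight_spread(edges, pref, d, iters=100):
    """Iterate w <- d * A * w + (1 - d) * p with row-normalized weights."""
    nodes = sorted({a for a, _ in edges})
    out = defaultdict(float)
    for (a, _b), x in edges.items():
        out[a] += x
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1.0 - d) * pref.get(n, 0.0) for n in nodes}
        for (a, b), x in edges.items():
            nxt[b] += d * rank[a] * x / out[a]
        rank = nxt
    return rank


def folkrank(triples, query_tag, d=0.7):
    edges = build_graph(triples)
    nodes = {a for a, _ in edges}
    pref = {n: 1.0 for n in nodes}
    pref[("t", query_tag)] += len(nodes)  # assumed boost for the query tag
    total = sum(pref.values())
    pref = {n: v / total for n, v in pref.items()}
    w0 = weight_spread(edges, {}, d=1.0)  # without preference vector
    w1 = weight_spread(edges, pref, d=d)  # with preference vector
    return {n: w1[n] - w0[n] for n in nodes}


triples = [
    ("alice", "python", "r1"), ("bob", "python", "r1"),
    ("alice", "web", "r2"), ("bob", "web", "r2"), ("carol", "web", "r2"),
]
scores = folkrank(triples, "python")
for node, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(value, 4))
```

Subtracting the baseline run removes the global popularity of a node, so r2, which has more annotations overall, does not outrank r1 for the query tag "python".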
3.3.2 SocialSimRank (SSR) and SocialPageRank (SPR)
S. Bao et al [44] proposed the SSR and SPR algorithms for measuring the popularity of web resources from the users' perspective. SSR is computed from the set of tags associated with a page and the query terms by calculating the SimRank [5] similarity measure between query and tags, as in equation (1).

$$Sim_{SSR}(q, p) = \sum_{i=1}^{n} \sum_{j=1}^{m} S_A(q_i, q_j) \tag{1}$$

Here $q_i$ ranges over the $n$ terms of the query and $q_j$ over the $m$ annotations associated with web page $p$. SPR calculates the popularity of web pages in the context of social bookmarking by executing the following steps (a) to (f):
a) $U_i = A_{PU}^{T} \cdot P_i$
b) $A_i = A_{UA}^{T} \cdot U_i$
c) $P_i' = A_{AP}^{T} \cdot A_i$
d) $A_i' = A_{AP} \cdot P_i'$
e) $U_i' = A_{UA} \cdot A_i'$
f) $P_{i+1} = A_{PU} \cdot U_i'$
The steps are repeated until $P_i$ converges; the result is the SPR score of the resources.
The popularity vectors $P_i$, $U_i$ and $A_i$ represent pages, users and annotations in the $i$-th iteration. User popularity is derived from the pages the users annotate in step (a), annotation popularity is derived from the popularity of users in step (b), and the popularity of web pages is derived from annotations in step (c). The remaining steps propagate in the opposite direction: from web pages to annotations in step (d), from annotations to users in step (e), and from users to web pages in step (f).
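Steps (a) to (f) can be followed literally in a small Python sketch. The normalization after step (f) and the fixed iteration count are assumptions added here to keep the values bounded; the three association matrices are hypothetical toy data.

```python
def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def normalize(v):
    s = sum(v)
    return [x / s for x in v] if s else v


def social_page_rank(a_pu, a_ua, a_ap, iters=50):
    """a_pu: pages x users, a_ua: users x annotations, a_ap: annotations x pages."""
    p = normalize([1.0] * len(a_pu))
    for _ in range(iters):
        u = matvec(transpose(a_pu), p)    # (a) pages -> users
        a = matvec(transpose(a_ua), u)    # (b) users -> annotations
        p2 = matvec(transpose(a_ap), a)   # (c) annotations -> pages
        a2 = matvec(a_ap, p2)             # (d) pages -> annotations
        u2 = matvec(a_ua, a2)             # (e) annotations -> users
        p = normalize(matvec(a_pu, u2))   # (f) users -> pages, rescaled
    return p


# Hypothetical association counts for 2 pages, 2 users and 2 annotations.
a_pu = [[1, 1], [0, 1]]
a_ua = [[1, 0], [1, 1]]
a_ap = [[1, 0], [1, 1]]
spr = social_page_rank(a_pu, a_ua, a_ap)
print(spr)  # the first page receives the larger share
```

Each round is a multiplication by a fixed non-negative matrix, so with rescaling the iteration behaves like a power method and settles on the relative popularity of the pages.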
3.3.3 Group Sensitive Ranking
The group-sensitive ranking algorithms are based on the GroupMe folksonomy [45]. A GroupMe folksonomy is defined as a 5-tuple $F = (U, T, R, G, Y)$ where $U$, $T$, $R$ and $Y$ represent users, tags, resources and tag assignments respectively, while $G = \{g_1, g_2, g_3, \ldots, g_n\}$ is the set of groups, each group being a set of resources. The factor $Y$ represents the tag assignment in the context of GroupMe, with $Y \subseteq (U \times T \times R \times G)$. GroupMe users can create groups of topic-specific resources that are related to each other. It has been observed that 50% of resources do not have even a single tag, which makes it hard to find information by folksonomy-based searching. The GroupMe approach offers free-for-all tagging, which enables users to annotate not only their own resources but also the resources of others, by annotating groups that contain many resources [46].
3.3.3.1 GRank
The technique proposed by F. Abel et al [47] is a group-sensitive ranking method that operates on the GroupMe folksonomy, where resources are grouped on the basis of similarity and user interests. For a query tag $t_q$ and groups $g \in G$, the ranking vector $w_{R_q}(r)$ holds the GRank weight of each resource $r \in R_q$, accumulated from the following four factors:
a. $w_{R_q}(r) = w(t_q, r) \cdot d_a$ [weight of direct annotations]
b. For each group $g \in G$ that contains $r$, compute: $w_{R_q}(r) \mathrel{+}= w(t_q, g) \cdot d_b$ [weight gained from groups]
c. For each $r' \in R_a$ where $r'$ is contained in the same group as $r$ and $r \neq r'$, do: $w_{R_q}(r) \mathrel{+}= w(t_q, r') \cdot d_c$ [weight from neighbor resources]
d. If $r \in G$ then, for each $r' \in R_a$ where $r'$ is contained in $r$, do: $w_{R_q}(r) \mathrel{+}= w(t_q, r') \cdot d_d$ [weight of resources in the group]
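The four factors can be accumulated as in the following Python sketch. The data layout, the reading of $R_a$ as the set of entities directly annotated with the query tag, and the values chosen for $d_a$ to $d_d$ are assumptions made for illustration.

```python
def grank(r, tq, w, groups, d=(10.0, 2.0, 1.0, 2.0)):
    """GRank weight of resource (or group) r for query tag tq.

    w maps (tag, entity) to an annotation weight; groups maps a group
    name to the set of resources it contains.
    """
    d_a, d_b, d_c, d_d = d

    def wt(x):
        return w.get((tq, x), 0.0)

    # R_a: everything directly annotated with the query tag.
    r_a = {x for (t, x), v in w.items() if t == tq and v > 0}

    score = wt(r) * d_a                                   # (a) direct annotation
    containing = [g for g, members in groups.items() if r in members]
    for g in containing:                                  # (b) from groups of r
        score += wt(g) * d_b
    neighbours = {x for g in containing for x in groups[g]} - {r}
    for other in neighbours & r_a:                        # (c) from group neighbors
        score += wt(other) * d_c
    if r in groups:                                       # (d) r is itself a group
        for member in groups[r] & r_a:
            score += wt(member) * d_d
    return score


groups = {"g1": {"r1", "r2"}}
w = {("jazz", "r1"): 0.5, ("jazz", "g1"): 1.0}
print(grank("r1", "jazz", w, groups))  # 7.0
print(grank("r2", "jazz", w, groups))  # 2.5, although r2 itself is untagged
print(grank("g1", "jazz", w, groups))  # 11.0
```

The untagged resource r2 still receives a score through its group and its tagged neighbor, which is the point of using group context when many resources carry no tags.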
3.3.3.2 GFolkRank: Group Sensitive FolkRank
GFolkRank [48] is an adaptation of FolkRank: a context-sensitive ranking algorithm that operates on a graph $G_G = (V_G, E_G)$ which models $F = (U, T, R, G, Y)$. First, $F$ is transformed to $G_G$, where each node contributes to every other node recursively. When a user $u$ adds a resource $r$ to a group $g$, the tag assignment $Y$ is formulated as $(u, t_g, r)$, where $t_g$ belongs to $T_G$. These tags are called artificial tags and are assigned to all resources contained in a group $g$. In this way the vertex set of the hypergraph is extended by $T_G$, giving a total of $|V_F| + |T_G|$ vertices.
3.3.3.3 Social HITS: Social Hyperlink Induced Topic Search
F. Abel [49] proposed Social HITS, which operates on group-sensitive folksonomies. In the GroupMe context, hubs and authorities are not constrained to particular entity types: for example, a user's authority may be taken to come from annotating high-quality resources; similarly, tag authority may be contributed by high-quality users, and so on. The algorithm executes as follows:
a) Take as input the folksonomy model $F$, the query $t$, the searching strategy $S_t$, the graph construction strategy $S_g$ and the number of iterations $k$.
b) Search the base set $F_t$ of tag assignments relevant to $t$.
c) Construct the graph $G_D$ from the base set $F_t$ using the graph construction strategy $S_g$.
d) Iterate over the graph $G_D$ $k$ times to calculate the hub and authority scores of the resources in $F_t$.
3.3.4 ExportVoteRank and RecommendationPageRank
C. H. Lo et al [50] proposed ExportVoteRank (EVR) and RecommendationPageRank (RPR) for ranking resources in social tagging environments. The EVR of a web resource reflects the importance of the resource by taking into account the authority of the users who annotate it, using equation (1).

$$evr(r) \leftarrow evr(r) + \frac{pr(u)}{count(u)} \tag{1}$$

$\forall u \in U$, $\forall (u, r) \in UR$, where $UR = \{(u, r) \mid \exists t \in T \text{ such that } (u, t, r) \in UTR\}$. $pr(u)$ is the rank score of user $u$, initially set to 1, and $count(u)$, initially set to zero, is the number of bookmarks by $u$.
RecommendationPageRank is based on association rule mining and is implemented as a directed graph, RecGraph $G_t = (V_t, E_t)$, among users, tags and resources, built over $UR_t$, the set of resources $r \in R$ related to query-tag $t$. The PageRank $pr_t(r_i)$ for $r_i \in V_t$ is computed as in equation (2):

$$pr_t(r_i) \leftarrow \frac{1 - d}{|V_t|} + d \cdot \sum_{r_j \in E_t(r_i)} \omega_t(r_j, r_i) \cdot pr_t(r_j)$$

with

$$rpr(r) \leftarrow rpr(r) + pr_t(r) \quad \forall t \in T,\ \forall r \in V_t \tag{2}$$
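The EVR accumulation of equation (1) can be sketched in Python as below. The function name, the default user rank of 1 and the sample triples are assumptions; how user rank scores are obtained in [50] is not reproduced here.

```python
def export_vote_rank(triples, user_rank=None):
    """Accumulate pr(u) / count(u) over every (user, resource) pair in UR."""
    # UR: a user and a resource are paired if some tag links them.
    ur = {(u, r) for u, _t, r in triples}
    count = {}
    for u, _r in ur:
        count[u] = count.get(u, 0) + 1  # number of bookmarks by u
    pr = user_rank or {}
    evr = {}
    for u, r in ur:
        evr[r] = evr.get(r, 0.0) + pr.get(u, 1.0) / count[u]
    return evr


# Hypothetical (user, tag, resource) assignments.
triples = [
    ("alice", "x", "r1"), ("alice", "y", "r1"), ("alice", "x", "r2"),
    ("bob", "x", "r1"),
]
evr = export_vote_rank(triples)
print(evr["r1"], evr["r2"])  # 1.5 0.5
```

Dividing by the number of bookmarks spreads each user's vote, so a user who bookmarks everything contributes less to any single resource.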
3.3.5 S-BITS: Social Bookmarking Induced Topic Search
The S-BITS [51] approach follows the idea of HITS (Hypertext Induced Topic Search) for ranking
search results in social bookmarking systems. The basic assumptions that drive the technique are:
a) A resource tagged by many good users is a good resource, and
b) A user who has annotated many good pages is a good user.
The folksonomy model is exploited to exhibit the hub and authority phenomenon on bookmarked documents.
The algorithm works as follows:
a) The tag-query terms $k$ are submitted to a text-based IR system and the root set $R$ is obtained using some
similarity measure; the web pages collected in the root set $R$ are related to the query $k$. Additionally, the
following sets are formed:
    a) All users associated with the root set $R$ form the user set $U$, where each user $u \in U$ has at least
    one annotation on a page in $R$.
    b) All tags associated with $R$ form the bookmarked-tag set $BT$ for $R$.
b) The root set $R$ is expanded using association rules to obtain the base set
$B = R \cup \{p_j \mid u_i \xrightarrow{BT_{ij}} p_j \wedge u_i \in U \wedge FT \subseteq BT_{ij} \wedge FT \in T'\}$,
where $FT \in T'$ is the frequent tag set and each tag set $BT_{ij} \in T'$ is taken as a transaction with
association rule $u_i \xrightarrow{BT_{ij}} p_j$.
c) The authority and hub scores are calculated over the nested graph formed by the web pages in $B$,
the users in $U$ and the annotations as edges $E$.
d) The top-ranking authorities and hubs are reported.
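Step c) can be illustrated with a HITS-style iteration on the bipartite user-page graph. The sketch below treats users as hubs and pages as authorities, which matches the two assumptions listed above; it skips the root-set retrieval and association-rule expansion, and the function name and L2 normalisation are assumptions of this example rather than details confirmed by the survey.

```python
import math

def sbits_scores(annotations, iterations=30):
    """Return (authority, hub) score dicts for (user, page) annotation edges."""
    edges = set(annotations)
    users = {u for u, _ in edges}
    pages = {p for _, p in edges}
    hub = {u: 1.0 for u in users}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page is good if it is annotated by good users.
        auth = {p: 0.0 for p in pages}
        for u, p in edges:
            auth[p] += hub[u]
        # A user is good if they annotated good pages.
        hub = {u: 0.0 for u in users}
        for u, p in edges:
            hub[u] += auth[p]
        # L2-normalise both vectors so the scores stay bounded.
        for vec in (auth, hub):
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            for key in vec:
                vec[key] /= norm
    return auth, hub
```

Sorting the two dictionaries by value gives the top-ranking authorities (pages) and hubs (users) reported in step d).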
3.4 Semantic Based Approaches
The Semantic Web, also called Web 3.0, is defined as a web with meaning [W3Schools]. It is not yet fully
implemented, and is a web of things described syntactically in a way that computers can understand [52]. It refers
to methods and technologies such as the Resource Description Framework (RDF) [53], data interchange formats
such as RDF/XML and Triple, and notations such as RDF Schema and the Web Ontology Language (OWL) [54], which
make the Web more intelligent by providing formal descriptions (via description logic and other knowledge
representation techniques) of concepts and relationships within a given knowledge domain.
3.4.1 GroupMe With Semantics Approach
F. Abel et al [55] proposed a ranking approach based on the social semantic web that considers not only tag
frequency but also contextual information in the form of groups enriched with RDF semantics. The
operations are executed on the group folksonomy in the following sequence.
a) All resources annotated with $t_i \in q = \{t_1, t_2, \ldots, t_n\}$ are retrieved and the weight of each
resource is calculated by equation (1):
$$resourceWeight(q, r) = \sum_{t \in q} resourceWeight(t, r), \qquad resourceWeight(t, r) = \frac{\text{number of users who tagged resource } r \text{ with } t}{\text{number of users who tagged resource } r} \qquad (1)$$
b) All groups tagged with $t_i \in q = \{t_1, t_2, \ldots, t_n\}$ are considered, and group weights are
calculated for each page in a group using equation (2):
$$groupWeight(t, g) = \frac{\text{number of resources in } g \text{ that are tagged with } t}{\text{number of resources in } g} \qquad (2)$$
c) The context weight of a document is calculated from its appearances in groups, based on equations (1)
and (2), using equation (3):
$$ContextWeight(q, r, g) = \sum_{t \in q} resourceWeight(t, r) \cdot groupWeight(t, g) \qquad (3)$$
d) GroupMe is a semantic application that extracts concepts such as groups and the relations of tags inside and
outside groups from IR systems, transforms them into RDF descriptions with the help of the DC, FOAF and GroupMe
vocabularies, and provides a semantically rich description of the group folksonomy.
e) The retrieved resources are ranked according to their weights.
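Equations (1) to (3) can be made concrete with a short sketch. The data layout is an assumption for illustration: `tas` is a list of `(user, tag, resource)` tag assignments and `groups` maps a hypothetical group identifier to the set of resources it contains. The RDF enrichment step is not modelled.

```python
def resource_weight(t, r, tas):
    """Share of the users who tagged r that used tag t (equation 1, per tag)."""
    users_r = {u for (u, tag, res) in tas if res == r}
    users_rt = {u for (u, tag, res) in tas if res == r and tag == t}
    return len(users_rt) / len(users_r) if users_r else 0.0

def group_weight(t, g, groups, tas):
    """Share of the resources in group g that carry tag t (equation 2)."""
    members = groups[g]
    tagged = {res for (_u, tag, res) in tas if tag == t and res in members}
    return len(tagged) / len(members) if members else 0.0

def context_weight(q, r, g, groups, tas):
    """Sum over query tags of resourceWeight * groupWeight (equation 3)."""
    return sum(
        resource_weight(t, r, tas) * group_weight(t, g, groups, tas) for t in q
    )
```

Ranking then amounts to sorting the retrieved resources by their context weight within the groups they belong to.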
3.4.2 Semantically Relevant Resource Retrieval and Ranking (SR3)
P. Bedi et al [56], [57] proposed SR3, which exploits the advantages of social bookmarking services and
semantic ontologies. The query is expanded using domain ontologies, so that the query tag $Q_o$ becomes
$Q = \{Q_o, Q_p, Q_c\}$. SR3 computes the similarity weights of all query terms with respect to all tags
associated with a resource, using the ranking function in equation (1).
$$\theta_{Q_o, r_i} = \frac{\sum_{q_t = t_k} \left( Wt_{Q_o, q_t} \cdot Wt_{t_k, r_i} \right)}{\sqrt{\sum_{q_t} Wt_{Q_o, q_t}^2} \cdot \sqrt{\sum_{k} Wt_{t_k, r_i}^2}} \qquad (1)$$
The query vector is computed using semantic distance, as in equation (2):
$$Wt_{Q_o, q_t} = \frac{SR_{Q_o, q_t}}{\sum_{t=0}^{n} SR_{Q_o, q_t}} = \frac{\text{semantic weight of } q_t \in Q}{\text{semantic weights of all terms in } Q \text{ (say } n \text{ terms)}} \qquad (2)$$
The vector length (magnitude) of the expanded query $Q$ is given by equation (3):
$$|Q| = \sqrt{\sum_{q_t} Wt_{Q_o, q_t}^2} \qquad (3)$$
The resource vector is computed as the semantic relevance between the query term $Q_o$ and each tag $t_k$
associated with resource $r_i$, with equation (4), by utilizing equation (2):
$$SR_{Q_o, t_k} = \frac{Count_{t_k}}{d(Q_o, t_k) + 1} = \frac{\text{tag frequency}}{\text{semantic weights of tags and query terms}}$$
The normalized form is:
$$Wt_{t_k, r_i} = \frac{SR_{Q_o, t_k}}{\sum_{k=1}^{m} SR_{Q_o, t_k}} = \frac{\text{semantic weight of } t_k \text{ associated with } r_i}{\text{semantic weights of all tags (say } m \text{) associated with } r_i} \qquad (4)$$
Equation (5) defines the length of the resource vector $r_i \in R$ used in equation (1):
$$|r_i| = \sqrt{\sum_{k} Wt_{t_k, r_i}^2} \qquad (5)$$
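The ranking function is a cosine similarity between a normalised query vector and a normalised resource tag vector. The sketch below assumes the semantic relevance values of the expanded query terms and the semantic distances $d(Q_o, t_k)$ have already been obtained from an ontology; the function names and dictionary inputs are hypothetical and only illustrate how equations (1) to (5) fit together.

```python
import math

def normalise(sr):
    """Divide each semantic relevance value by the total (equations 2 and 4)."""
    total = sum(sr.values())
    return {k: v / total for k, v in sr.items()} if total else {}

def tag_relevance(count, distance):
    """SR = tag frequency / (semantic distance to the query term + 1)."""
    return count / (distance + 1)

def sr3_score(query_sr, tag_counts, distances):
    """Cosine similarity between query and resource weight vectors (equation 1)."""
    q = normalise(query_sr)
    r = normalise(
        {t: tag_relevance(c, distances[t]) for t, c in tag_counts.items()}
    )
    dot = sum(q[t] * r[t] for t in q.keys() & r.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in r.values())
    )
    return dot / norm if norm else 0.0
```

Only terms that occur both in the expanded query and among the resource's tags contribute to the numerator, mirroring the $q_t = t_k$ condition in equation (1).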
In a similar way, A. Thukral et al [58] proposed the Social Semantic Relevance (S2R) approach, based on the
Vector Space Model, which exploits the relationship between tags and the pre-existing semantic knowledge
associated with tagged resources.
3.4.3 Agent Assisted Based Approach
G. Fenza et al [59] proposed agent-assisted tagging, which aims to assist users not only in the
tagging activity but also by suggesting relevant resources. A browser plug-in application has been programmed
using the Delicious APIs on the JADE platform, enriched with three types of agents:
a) The user agent parses the resources a user currently explores. After analyzing the tf ∗ idf values, with a
confidence threshold of 0.6, candidate words are presented to the user for the tagging activity.
b) The FFCA (Fuzzy Formal Concept Analysis) agent maintains a cross-table matrix (resources × tags),
each cell of which contains the relevance of the tag to the associated resource as a value
between 0 and 1.
c) The lattice agent builds a formal concept lattice from the fuzzy formal context matrix for the
corresponding resources, with edges carrying the weighting scores used for recommending and ranking web
resources.
3.5 Cluster Based Approaches
Clustering (unsupervised learning) is the process of organizing objects into groups that are similar
with respect to some common properties, while classification is supervised learning: the task of identifying,
based on a training set, which group a new object belongs to. Clustering has been adopted
to exploit different aspects of the social web in order to re-order search results, so that users are presented
with the most relevant documents at the top, in decreasing order of relevance. Some cluster-based techniques
are briefly discussed below.
3.5.1 Tag Clustering Through Association Rules Approach
Y. Zhou et al [60] proposed a tag-clustering approach based on association rule mining that operates in
the following steps.
a) Tag clustering: Tags are clustered into different concepts using a tag graph based on association rule
mining, where the edge between two tags carries the similarity weight $W_{t_i t_j} = Conf(t_i \rightarrow t_j)$.
The similarity between two clusters $A$ and $B$ is calculated by equation (1).
$$sim(A, B) = \frac{cut(A, B)}{|A|} + \frac{cut(A, B)}{|B|} \qquad (1)$$
where $|A|$ and $|B|$ are the numbers of nodes in clusters $A$ and $B$, and $cut(A, B)$ is the sum of the weights
of the cross (cut) edges between tags of clusters $A$ and $B$, given by equation (2):
$$cut(A, B) = \sum_{t_i \in A,\ t_j \in B} W_{t_i t_j} \qquad (2)$$
b) Similarity function: the similarity between a resource $r$ and a concept $C$ is given by equation (3):
$$sim(r, C) = \frac{\left( \sum_{t \in (r \cap C)} W(t, C) \right)^2}{\sum_{t \in C} W(t, C) \cdot \sum_{t \in r} W(t, r)} = \frac{(\text{sum of weights of overlapping tags in concept } C)^2}{\text{sum of weights in } C \times \text{sum of weights of tags in } r} \qquad (3)$$
Equation (3) defines the similarity between resource $r$ and concept $C$ with respect to the ranked tag weights,
where $W(t, r)$ is the weight of a tag associated with resource $r$: $W(t, r) = W(t, C)$ if $t \in C$, and 0 if
$t \notin C$. Equation (4) calculates the weight of a tag with respect to a concept $C$. The cohesion of a tag is
the number of links (edges) it has with other tags in the same concept, while Inv. Coup is based on the number of
links a tag $t$ has with tags that are not in the same concept (external tags).
$$weight(t, C) = \begin{cases} 0 & \text{if } t \notin C \\ \underbrace{\sum_{v \in C} W_{t,v}}_{\text{cohesion}} \ast \underbrace{\Bigl( 2 - \sum_{u \notin C} W_{t,u} \Bigr)}_{\text{Inv. Coup}} & \text{if } t \in C \end{cases} \qquad (4)$$
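The cluster similarity of equations (1) and (2) is simple to compute once the rule confidences are known. In this sketch `weights` is an assumed dictionary mapping an ordered tag pair `(t_i, t_j)` to $Conf(t_i \rightarrow t_j)$; the rule mining itself and the merge strategy are outside the example, and both clusters are assumed to be non-empty.

```python
def cut(a, b, weights):
    """Sum of the weights of edges running from tags in cluster a to tags in b."""
    return sum(weights.get((ti, tj), 0.0) for ti in a for tj in b)

def cluster_similarity(a, b, weights):
    """sim(A, B) = cut(A, B)/|A| + cut(A, B)/|B| for non-empty clusters."""
    c = cut(a, b, weights)
    return c / len(a) + c / len(b)
```

Dividing by the cluster sizes keeps large clusters from looking similar to everything merely because they have many cross edges.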
3.5.2 Web Pages and Tag Clustering (WTC)
C. Zhao [61] proposed the Web pages and Tag Clustering algorithm (WTC), which is computed in two steps. First,
web pages and tags are clustered using a hypergraph spectral clustering algorithm. Second, the coverage rate
between resources and tags is used for ranking web resources.
WordNet is used to transform all words of the retrieved documents into noun form, to create a set of words
that expresses each web page more precisely. Clustering tags and web pages with hypergraph spectral
clustering results in (a) tag clusters (TCs) and (b) web page clusters (PCs). Each TC represents a web page
set (PS) containing all pages that have at least one tag from the corresponding TC as a content
word; in the same way, every page cluster (PC) represents a tag set (TS) that includes all tags
associated with at least one page in the corresponding PC. The similarity between documents is computed by the
following equation (1).
$$Sim(d_i, d_j) = \frac{1}{2} \cdot \frac{|d_i \cap d_j|}{|d_i \cup d_j|} + \frac{1}{2} \cdot \frac{|d_i \cup d_j|}{|d_i| + |d_j|} \qquad (1)$$
After this, the ten largest couples of TC and TS are considered for the retrieval of web page sets (PSs) and
web page clusters (PCs). The coverage rate between a web page cluster and a web page set defines the ranking
measure, using equation (2).
$$Cov(C_i, D_j) = \frac{|C_i \cap D_j|}{|C_i \cup D_j|} \qquad (2)$$
where $C_i$ and $D_j$ represent a web page cluster and a web page set respectively. The query is matched against
the TSs and TCs, and those that contain the query tag are picked. The similarities among the TSs and TCs are
computed and the ten largest TSs and TCs are considered, for which the corresponding PCs and PSs are selected.
The coverage between each PS and PC is computed, and the couples of PS and PC with a high
coverage rate are returned to the user.
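The coverage rate in equation (2) is a set-overlap ratio, so the final ranking step can be sketched directly on Python sets. The clustering that produces the PCs and PSs is assumed to have been done already; the names `coverage` and `rank_couples` and the dictionary inputs are illustrative.

```python
def coverage(cluster, page_set):
    """Cov(C_i, D_j) = |C_i ∩ D_j| / |C_i ∪ D_j| for two sets of pages."""
    union = cluster | page_set
    return len(cluster & page_set) / len(union) if union else 0.0

def rank_couples(page_clusters, page_sets):
    """Return (coverage, pc_id, ps_id) triples, highest coverage first."""
    couples = [
        (coverage(pc, ps), i, j)
        for i, pc in page_clusters.items()
        for j, ps in page_sets.items()
    ]
    return sorted(couples, key=lambda c: c[0], reverse=True)
```

The couple at the head of the returned list is the one whose page cluster and page set agree most, which is the couple returned to the user in the description above.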
3.5.3 Similarity based Tag Clustering with Tag Frequency
S. Niwa et al [62] proposed tag clustering based on the cosine similarity among tags. For every tag, a parent
tag that is highly related to it is chosen, and in this way each cluster represents a particular topic. The
personalization factor combines the recommendation points of each page within a cluster with the affinity score
between users and tag clusters, by equation (1).
$$point(U, D) = \sum_{C_i \in CLUSTERS} rel(U, C_i) \times point(C_i, D) \qquad (1)$$
The first part of equation (1) defines the relation between users $U$ and tags $T$ as
$$rel(U, C) = \sum_{T_i \in C} rel(U, T_i), \quad \text{where} \quad rel(U, T) = \sum_{D_i \in bookmarks(U)} rel(D_i, T) \quad \text{with} \quad rel(D_i, T) = NTF(D_i, T) \times IDF(T)$$
The second part is the recommendation value of each page for a cluster. The recommendation point between
a web page and a cluster is calculated by summing the weighted frequency scores of the tags within the
cluster, using equation (2).
$$point(C, D) = \sum_{T_i \in C} w(D, T_i) \qquad (2)$$
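The two parts of equation (1) can be sketched as follows. The example assumes a single precomputed weight table `w[(page, tag)]` holding NTF × IDF values and reuses it for $w(D, T_i)$ in equation (2); that reuse, like the function names, is a simplifying assumption rather than something stated in the survey.

```python
def rel_user_tag(bookmarks, tag, w):
    """rel(U, T): sum of page-tag weights over the user's bookmarked pages."""
    return sum(w.get((d, tag), 0.0) for d in bookmarks)

def rel_user_cluster(bookmarks, cluster, w):
    """rel(U, C): affinity between a user and a tag cluster."""
    return sum(rel_user_tag(bookmarks, t, w) for t in cluster)

def point_cluster_page(cluster, page, w):
    """point(C, D): recommendation value of a page for a tag cluster."""
    return sum(w.get((page, t), 0.0) for t in cluster)

def point_user_page(bookmarks, page, clusters, w):
    """point(U, D): sum over clusters of rel(U, C) * point(C, D)."""
    return sum(
        rel_user_cluster(bookmarks, c, w) * point_cluster_page(c, page, w)
        for c in clusters
    )
```

A page scores highly for a user only when it is strong in a cluster the user's own bookmarks are also strong in.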
3.5.4 Hierarchical Agglomerative Clustering Approach
A. Shepitsen et al [63] proposed a hybrid approach to personalized information recommendation that
combines the Vector Space Model (VSM) with the Hierarchical Agglomerative Clustering (HAC) algorithm, by
equation (1).
$$\theta(u, q, r) = Sim(q, r) \ast I(u, r) = (\text{VSM} \ast \text{Personalization Factor}) \qquad (1)$$
The first factor, $Sim(q, r)$, is calculated for each document $r$ with respect to query $q$ by equation (2).
$$\cos(q, r) = \frac{tf(q, r)}{\sqrt{\sum_{t \in T} tf(t, r)^2}} \qquad (2)$$
Furthermore, hierarchical agglomerative clustering is used to cluster the tags being examined. The distance
between two clusters is the distance between their centroids. The division coefficient is set high so as to
produce small clusters of similar tags. The user's interest, which is the second factor and provides the
personalization, is given in equation (3).
$$I(u, r) = \sum_{c \in C} uc\_w(u, c) \ast rc\_w(r, c) \qquad (3)$$
where
$$uc\_w(u, c) = \frac{\text{number of times user } u \text{ annotated a resource with a tag from cluster } c}{\text{total number of annotations by user } u}$$
$$rc\_w(r, c) = \frac{\text{number of times the resource } r \text{ is annotated with a tag from cluster } c}{\text{total number of times the resource } r \text{ is annotated}}$$
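Equations (1) to (3) combine into a single score per user, query tag and resource. The sketch below assumes the tag clusters are already given (the agglomerative clustering step is not performed) and that `annotations` is a list of `(user, tag, resource)` triples; these names and the input layout are assumptions for illustration.

```python
import math
from collections import Counter

def cos_query(q, r, annotations):
    """tf(q, r) divided by the length of r's tag-frequency vector (equation 2)."""
    tf = Counter(t for (_u, t, res) in annotations if res == r)
    norm = math.sqrt(sum(c * c for c in tf.values()))
    return tf[q] / norm if norm else 0.0

def uc_w(u, c, annotations):
    """Fraction of user u's annotations whose tag lies in cluster c."""
    mine = [t for (user, t, _r) in annotations if user == u]
    return sum(t in c for t in mine) / len(mine) if mine else 0.0

def rc_w(r, c, annotations):
    """Fraction of resource r's annotations whose tag lies in cluster c."""
    on_r = [t for (_u, t, res) in annotations if res == r]
    return sum(t in c for t in on_r) / len(on_r) if on_r else 0.0

def personalised_score(u, q, r, clusters, annotations):
    """theta(u, q, r) = cos(q, r) * sum_c uc_w(u, c) * rc_w(r, c)."""
    interest = sum(
        uc_w(u, c, annotations) * rc_w(r, c, annotations) for c in clusters
    )
    return cos_query(q, r, annotations) * interest
```

Two users issuing the same query tag can therefore see the same resource at different ranks, depending on how much of their own tagging falls into the clusters that describe the resource.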
3.5.5 Semantic Tag Clustering Search
D. Vandic et al [64] proposed Semantic Tag Clustering Search (STCS) for sorting web
documents, which is based on [65], which is in turn based on [66]. The sorting formula is based on cosine
similarity, using equation (1).
$$g(q, r) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{m} \sum_{i=1}^{m} \cos(q_i, r_j) \qquad (1)$$
The returned resources are ordered by the similarity between the query tags and the resource tags, where
$q_i \in q = \{q_1, q_2, \ldots, q_m\}$ are the query tags and $r_j \in r = \{r_1, r_2, \ldots, r_n\}$ are the tags
associated with the resource. A non-hierarchical clustering technique is used, with angular similarity based on
tag co-occurrence for semantic relatedness and Levenshtein similarity for string similarity, to handle syntactic
variations, using equation (2).
$$\omega_{ij} = z_{ij} \times (1 - lev_{ij}) + (1 - z_{ij}) \times \cos(vector(i), vector(j))$$
where
$$z_{ij} = \frac{\max(length(t_i), length(t_j))}{\max_{k} length(t_k)} \in [0, 1] \qquad (2)$$
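A compact sketch of the two formulas follows. It assumes each tag is represented by a sparse co-occurrence vector stored as a dictionary, and that the normalised Levenshtein value $lev_{ij}$ and the cosine value are supplied to `omega` by the caller; the function names and this input layout are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def g(query_tags, resource_tags, vectors):
    """Average pairwise cosine between query tags and resource tags (equation 1)."""
    m, n = len(query_tags), len(resource_tags)
    if not m or not n:
        return 0.0
    total = sum(
        cosine(vectors[qi], vectors[rj])
        for rj in resource_tags
        for qi in query_tags
    )
    return total / (m * n)

def omega(lev, cos_sim, len_i, len_j, max_len):
    """Blend string and co-occurrence similarity, weighted by z_ij (equation 2)."""
    z = max(len_i, len_j) / max_len
    return z * (1 - lev) + (1 - z) * cos_sim
```

Because $z_{ij}$ grows with tag length, longer tags rely more on the string comparison, while short tags rely more on co-occurrence.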
3.6 Language Model Based Approaches
M. Ponte et al [67] proposed the language model for information retrieval, which is also used for social
information retrieval systems. Each document is considered as a language model, viewed as a topic model; the
probability of generating the query terms from each language model is calculated, and documents are ranked
with respect to those probabilities, as shown in equation (1).
$$P(q \mid D) = \prod_{i=1}^{k} P(q_i \mid D)$$
Using Bayes' rule:
$$P(D \mid q, u) = \frac{P(q \mid D, u) \, P(D \mid u)}{P(q \mid u)} \qquad (1)$$
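The query likelihood $P(q \mid D)$ can be illustrated with an unsmoothed unigram model. This is a deliberately minimal sketch: practical systems smooth the estimates so that a single missing term does not zero out the whole product, and the function name and token-list input are assumptions of the example.

```python
from collections import Counter

def query_likelihood(query, doc_tokens):
    """P(q|D) as a product of maximum-likelihood unigram probabilities."""
    tf = Counter(doc_tokens)
    n = len(doc_tokens)
    p = 1.0
    for term in query:
        # Relative frequency of the term in the document; 0 if absent.
        p *= tf[term] / n if n else 0.0
    return p
```

Documents are then ranked by this value, optionally multiplied by a document prior as in the Bayes' rule form above.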
S. Xu et al [68] proposed the Language Annotation Model (LAM), which exploits the content (content model)
and the annotation set (annotation model) of a resource. The content model is further divided into a topic
cluster model and a content unigram model, using the relationship among annotations and documents. The
annotation model is divided into an annotation unigram model, a word-similarity independency model, and an
annotation dependency model, a word-similarity dependency model. The Language Annotation Model function is
given in equation (2):
$$P(q \mid d) = \{\text{Content Unigram Model} + \text{Topic Cluster Model} + \text{Annotation Unigram Model} + \text{Annotation Dependency Model}\}$$
$$P(q \mid d) = \prod_{i=1}^{m} \Bigl( \lambda_{cum} P_{cum}(q_i \mid d) + \lambda_{tcm} P_{tcm}(q_i \mid d) + \lambda_{aum} P_{aum}(q_i \mid d) + \lambda_{adm} P_{adm}(q_i \mid d) \Bigr) \qquad (2)$$
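Equation (2) is a linear mixture of four component models, multiplied across query terms. The sketch below assumes each component has already produced a term-probability table for the document; how those tables are estimated is not shown, and the names `lam_score`, `component_probs` and `lambdas` are hypothetical.

```python
def lam_score(query, component_probs, lambdas):
    """Product over query terms of the lambda-weighted sum of component models.

    component_probs: dict model name -> dict term -> P_model(term | d).
    lambdas:         dict model name -> mixture weight (assumed to sum to 1).
    """
    score = 1.0
    for term in query:
        score *= sum(
            lambdas[name] * probs.get(term, 0.0)
            for name, probs in component_probs.items()
        )
    return score
```

Setting one weight to zero removes that component, which makes it easy to compare, for example, content-only scoring against scoring that also uses annotations.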
X. Wu et al [69] proposed a language model that uses tag assignments as a conceptual space among tags,
resources and users, using equation (3).
$$P(d \mid t) = \sum_{\mu = 1}^{D} P(d \mid c_\mu) \, P(c_\mu \mid d) \qquad (3)$$
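M. Harvey et al [70] proposed a ranking model based on the Latent Dirichlet Allocation (LDA) model for social
tagging. The LDA model is modified into a tagging topic model (TTM) to incorporate the folksonomy
structure. The ranking functions for LDA, TTM1 and TTM2 are defined by equations (4), (5),
(6) and (7).
$$P(d \mid q) \propto P(d) \cdot P(q \mid d) = \frac{N_d}{N} \prod_{w \in q} P(w \mid d) = \frac{N_d}{N} \prod_{w \in q} \sum_{z} P(w \mid z) \, P(z \mid d) \qquad (4)$$
$$P(d \mid q, u) \propto P(d \mid u) \cdot P(q \mid d, u) = P(d \mid u) \prod_{w \in q} P(w \mid d, u) \qquad (5)$$
$$P(d \mid u) = P(d) \sum_{z} \frac{P(z \mid d) \, P(z \mid u) \, \pi_u}{P(z)} \quad \text{and} \quad P(w \mid d, u) = \frac{\sum_{z} P(w \mid z) \, P(z \mid d) \, P(z \mid u) \, \pi_u \, P(z)^{-1}}{\sum_{z} P(z \mid d) \, P(z \mid u) \, \pi_u \, P(z)^{-1}} \qquad (6)$$
$$P(d \mid u) = P(d) \sum_{z} \frac{P(d \mid z) \, P(z \mid u) \, \pi_u}{P(z)} \quad \text{and} \quad P(w \mid d, u) = \frac{\sum_{z} P(w \mid z) \, P(d \mid z) \, P(z \mid u) \, \pi_u}{\sum_{z} P(d \mid z) \, P(z \mid u) \, \pi_u} \qquad (7)$$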
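The non-personalised LDA ranking in equation (4) can be sketched once the topic distributions are available. The example assumes a topic model has already been fitted elsewhere and its outputs are passed in as plain dictionaries; the names `lda_score`, `p_w_z` and `p_z_d` are hypothetical, and the personalised variants in equations (5) to (7) are not covered.

```python
def lda_score(query, doc, n_d, n_total, p_w_z, p_z_d):
    """P(d|q) up to a constant: (N_d / N) * prod_w sum_z P(w|z) * P(z|d).

    n_d:     dict doc -> N_d, the count used for the document prior.
    n_total: N, the corresponding total over all documents.
    p_w_z:   dict topic -> dict word -> P(w|z).
    p_z_d:   dict doc -> dict topic -> P(z|d).
    """
    score = n_d[doc] / n_total
    for w in query:
        # Marginalise the word probability over the document's topics.
        score *= sum(p_w_z[z].get(w, 0.0) * pz for z, pz in p_z_d[doc].items())
    return score
```

Because words are generated through topics, a document can receive a non-zero score for a query tag it was never annotated with, provided its topics make that tag likely.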
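Z. Zhou et al [71] proposed a language model based on the risk minimization retrieval model in the context of
tagged web documents, in the following two ways:
a) Language model expanded with user interests, where content-based topics are similar to tag-based
categories:
$$\lambda_1 \cdot \Delta\bigl(P(q \mid D), P(w \mid q)\bigr) + \lambda_2 \cdot \Delta\bigl(P(z_w \mid D), P(z_w \mid q)\bigr) + (1 - \lambda_1 - \lambda_2) \cdot \Delta\bigl(P(i_U \mid D), P(i_U \mid q)\bigr)$$
b) Language model with user interests:
$$\lambda_1 \cdot \Delta\bigl(P(w \mid D), P(w \mid q)\bigr) + \lambda_2 \cdot \Delta\bigl(P(z_w \mid D), P(z_w \mid q)\bigr) + (1 - \lambda_1 - \lambda_2) \cdot \Delta\bigl(P(i_U \mid D), P(i_U \mid q)\bigr)$$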
2. CONCLUSION AND FUTURE SUGGESTIONS
The research problems raised by collaborative tagging have been gaining popularity
among research communities and academic institutions. Many research publications have contributed
to the area, and the increasing volume and relational complexity of the social web make it an attractive
field for scientists to work in. This survey has presented the major contributions toward search result ranking
of bookmarked resources in social bookmarking systems. Different properties of the folksonomy model have been
exploited for ranking tagged web documents against a query tag. The PageRank algorithm has been adapted to
work with the structure of folksonomy systems, constructing graphs on which a random surfer reinforces
weights from node to node for ranking purposes. FolkRank, SocialPageRank, GRank, GFolkRank,
RecommendationPageRank and Social HITS are well-known examples. Social Semantic Web
technologies are also being used in combination with graph and textual techniques to propose solutions to the
research problem.
In future, extensive work should be dedicated to developing a SocialSearchEngine, comparable to the Google
search engine, which will not only use social ranking algorithms and traverse tagged resources
but also perform indexing and consider social factors and relations among the entities, in order to recommend
and rank resources in social bookmarking environments.
REFERENCES
[1]. A. Hotho, R. Jaschke, C. Schmitz, G. Stumme, Information Retrieval in Folksonomies: Searching and Ranking. Proc. of the 3rd
European Semantic Web Conference. Montenegro, 2007, 411-426.
[2]. Http://www.delicious.com/about.
[3]. S. Brin, L. Page, (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proceedings of the 7th
International Conference on World Wide Web. Brisbane, Australia. 107-117.
[4]. J. Kleinberg, (1999). Authoritative Sources in Hyperlinked Environments. Journal of the ACM. 604-32.
[5]. G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In KDD'02: Proceedings of the eighth ACM SIGKDD
international conference on Knowledge discovery and data mining, pages 538-543. ACM Press, 2002
[6]. R.Lempel, S.Moran, The stochastic approach for link-structure analysys (SALSA) and the TKC effect,
[7]. Proceedings of the 9th International World Wide web Conference, 2000
[8]. Thomas Vander Wal, (2007), Folksonomy Coinage and Definition, 604-611.
[9]. A. Hotho, R. Jaschke, C. Schmitz and G. Stumme, (2006), Bibsonomy: A Social Bookmark and Publication Sharing System.
Proceeding of the First Conceptual Sturctures Tool Interoperability Workshop, pages 87-102.
[10]. A. Mathes (2004), Folksonomies - cooperative classification and communication through shared metadata, Graduate School of
Library and Information Science
[11]. H. Halpin, V. Robu, H. Shepherd (2007), The Complex Dynamics of Collaborative Tagging, Proc. International Conference on
World Wide Web, ACM Press
[12]. X. Li, X. Guo and Y. Zhao (2008), Tag-based Social Interest Discovery, Proceeding of the 17th International Conference on WWW,
China, ACM press, P 675-68
[13]. A. Hotho, R. Jaschke, C. Schmitz and G. Stumme (2006), Emergent Semantics in BibSonomy, Informatics vol 94(2).
[14]. S. Golden and B. A. Hubernam (2006). The Structure of Collaborative Tagging Systems. Journal of Information Sciences (accepted).
[15]. T. Hammond, T. Hannay, B. Lund, J. Scott (2005). Social Bookmarking Tools (I): A General Review. D-Lib Magazine, Vol 11.
[16]. B. Lund, T. Hammond, M. Flack, T. Hannay, (2005). Social Bookmarking Tools (II): A Case Study- Connotea. D-Lib Magazine, Vol
11.
[17]. D. Millen, J. Feinberg, and B. Kerr (2005). Social bookmarking in the enterprise. The ACM Queue, 3(9). 28-35.
[18]. R. Wetzder, C. Zimmermann, and C. Bauckhage, (2008). Analyzing Social Bookmarking Systems: A Delicious Cookbook. In
Mining Social Data (MSoDa) Workshop (ECAI 08)
[19]. N. J. Deka, D. Deka (2012), Tagging and Social Bookmarking: Tools of Library Services in the Digital Ear, In 8th Convention
PLANNER-2012. P 102-108
[20]. L. L. Barners (2011), Social Bookmarking Sites: A Review. Journal of Collaborative Librarianship Vol 3(3), P 180-182
[21]. Fabio Cevasco (2006), Review of ten popular social bookmarking services
[22]. C. P. Lomas (2005), 7 Things You Should Know About Social Bookmarking, Education Learning Initiative, Advance Learning
through IT Innovation (NLII).
[23]. T. M. Farwell, R. D. Waters (2010), Exploring the use of Social Bookmarking Technology in Education: an Analysis of Student
Experience using a course Specific Delicious.com Account, Journal of Online Learning and Teaching, Vol: 6(2), P 398-408
[24]. E. Novak, R. Razzouk, T. E. Johnson (2012). The Education Role of Social Annotation Tools in Higher Education: A Literature
Review, Vol 15(1). P 39-49
[25]. C. S. Redden (2010), Social Bookmarking in Academic Libraries: Trends and Applications, The Journal of Academic Liberian ship,
Vol 36(3), P 219-227
[26]. M. L. Rethleftsen (2007). Tags help make libraries Del.icio.us. Library Journal, 132 (Sept. 2007), P 26-28
[27]. P. Heymann, G. Koutrika, and H. Garcia-Mlina (2008). Can social Bookmarking Improve Web Search? In WSDM ’08: Preceding of
the international Conference on Web search and web data mining, pages 195-206
[28]. B. Krause, A. Hotho, G. Stumme, (2008). A Comparison of Social Bookmarking with Traditional Search. ECIR’2008. 101-113
[29]. M. G. Noll, C. Meinel (2007). Web Search Personalization via Social Bookmarking and Tagging. In proceeding of ISWC 2007.
[30]. S. Xui, S. Bao, B. Fei, Z. Su, Y. Yu (2008). Exploring Folksonomy for Personalized Search. SIGIR 08 Singapore. P 155-162
[31]. D. Vallet, I. Cantador, J. M. Mose (2008), Personalized Web Search With Folksonomy based User and Document Profiles.
[32]. I. Cantador et al (2010). Content based Recommendation in Social Tagging Systems. RecSys 2010, Barcelona Spain.
[33]. Y. Cai, Q. Li (2010), Personalized Search by Tag-based User Profile and Resources Profile in Collaborative Tagging Systems. CIKM
2010, Canada, P 969-978
[34]. P. Wu, Zi-Ke Zhang (2010). Enhancing Personalized Recommendations on weighted Social Tagging Networks, Physics Procedia
3(5): P 1877-1885.
[35]. H. N. Kim, M. Rawashdeh, A. Alghamdi, A. El Saddik (2012), Folksonomy-based Personalized Search and Ranking in Social Media
Services. Journal of Information System vol 37, P 61-76
[36]. H. S. Khalifa (2008). CoolRank: A Social Solution for Ranking Bookmarked Web Resources. IEEE 2008, P 208-210
[37]. V. Zanardi and L. Capra (2008). Social Ranking: Uncovering Relevant content Using Tag-based Recommender Systems. In RecSys,
Switzerland
[38]. Worasit Choochaiwattana, Michael B. Spring (2009). Applying Social Annotations to Retrieve and Re-rank Web Resources.
International Conference on Information Management and Engineering. pp.215-219
[39]. Daly E. M (2009), Harnessing Wisdom of The Crowds Dynamics for Time-Dependent Reputation and Ranking, International
Conference on Social Network Analysis and Mining, P: 262-275
[40]. F. Durao, P. Dolog (2009). A Personalized Tag-based Recommendation in Social Web Systems. Workshop on Adaptation and
Personalization for Web 2.0. P 40-49
[41]. R. Wetztker, C. Zimmermann, C. Bauckhage, S. Albayrak, (2010). I Tag, You Tag, Translating Tags for Advanced User Models
[42]. J. B. Schafer, D. Frankowski, J. Herlocker and S. Sen (2007). Collaborative Filtering Recommender Systems. The Adoptive Web. P
291-324
[43]. D. Parra-Santander, P. Brusilovsky (2009). Collaborative Filtering for Social Tagging Systems: An Experiment with Citeulike.
RecSys, New York (USA).
[44]. J. Germmell, T. Schimoler, B. Mobasher, R. Burke (2012), Resource Recommendation in Social Annotation: A Linear-weighted
Hybrid Approach. Journal of Computer and System Sciences. P 1160-1174.
[45]. S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su, (2007). Optimizing Web Search Using Social Annotations. In Proceeding of 16th
International World Wide Web Conference (ACM Press '07). 501-510.
[46]. F. Abel, N. Henze, D. Krause and M. Kriesell (2008). A Novel Approach to Social Tagging: GroupMe. 14th International
Conference on Web Information Systems and Technologies.
[47]. F. Abel, N. Henze, D. Krause and M. Kriesell (2008). On the Effect of Group Structures on Ranking Strategies in Folksonomies. In
Workshop on Social Web Search and Mining at 17th Int. World Wide Web Conference.
[48]. F. Abel, M. Baldoni, C. Barogolio, N. Henze, R. Kawase, V. Patti (2010). Research Article: Leveraging Search and Content
Exploration by Exploiting Context in Folksonomy Systems. Hypermedia and Multimedia, P 1-31
[49]. F. Abel, N. Henze and D. Krause (2008). Analyzing Ranking Algorithms in Folksonomy Systems. L3S Research Center, Germany.
[50]. F. Abel, M. Baldoni, C. Baroglio, N. Henze, D. Krause and V. Patti (2009). Context Based Ranking in Folksonomies. Hypertext
ACM, P 209-218
[51]. P. Wen-Chih, L. Chia-Hao (2008), Ranking Web Pages From User Perspectives of Social Bookmarking Sites, International
Conference on Web Intelligence and Intelligent Agent Technology, vol 1 P: 155-161
[52]. Takahashi T., Kitagawa, H. (2008), S-BITS: Social-Bookmarking Induced Topic Search, The 9th International Conference on Web-
Age Information Management, P: 52-30
[53]. Tim Berners-Lee Tim, James Hendler and Ora Lassila (2001). "The Semantic Web" . Scientific American Magazine
[54]. F. Manola, E. Miller (2004). RDF primer, W3C Recommendation
[55]. D. L. McGuiness, F. Harmelen (2004), OWL Web Ontology Language Overview. W3C Recommendation.
[56]. F. Abel, M. Frank, N. Henza, D. Krause, D. Plappert and P. Siehndel (2010), GroupMe: Where Semantic Web meets Web 2.0.
[57]. B. pedi, H. Banati, A. Thukral, (2010), Social Semantic Retrieval and Ranking of eResources, Preceding of the 2010 International
Conference on Advances in Recent Technologies in Communication and Computing (ARTCom), p: 343-345
[58]. P. Bedi, A. Thukral, H. Banati (2012), Focused Crawling of tagged Web Resources Using Ontology, Journal of Computers and
Electrical Engineering,
[59]. A. Thukral, H. Banati, P. Bedi (2011). Ranking Tagged Resources Using Social Semantic Relevance. International Journal of
Information Retrieval, Vol 1(3). P 15-34
[60]. G. Fenza, V. Loia, S. Senatore (2011), Agent-assisted Tagging aimed at Folksonomy-Based Information Retrieval, Intelligent Agent
2011 IEEE , Italy, p: 1-8
[61]. Y. Zhou (2009), Searching and Clustering on Social Tagging Sites, International Conference on Semantics, Knowledge and Grids. P:
99-105.
[62]. C. Zhao, Z. Zhang, H. Li, X. Xie, (2011), A Search Result Ranking Algorithm Based on Web Pages and Tags Clustering,
International Conference on Computer Science and Automation Engineering (CSAE), Vol 4, P: 609-614
[63]. S. Niwa, T. Doi, S. Honiden (2006). Web page recommender system based on folksonomy mining. Information Processing Society
of Japan (IPSJ) Journal, 47(5): 1382-1392.
[64]. A. Shepitsen, J. Germmell, B. Mobasher (2008). Personalized Recommendation in Social Tagging Systems using Hierarchical
Clustering, In Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys 2008) (October 2008), pp. 259-266,
[65]. D. Vandic, J. Van Dam, F. Hogenboom (2011). A Semantic Clustering Based Approach for Searching and Browsing Tag Spaces.
ACM SAC’11. Taiwan
[66]. J. W. Van Dam, D. Vandir, F. Hogenboom, F. Fransincar (2010). Searching and Browsing Tag Spaces Using the Semantic Tag
Clustering Search Framework. 4th IEEE International Conference on Semantic Computing (ICSC 2010), p 436-439
[67]. L. Specia, E. Motta (2007). Integrating Folksonomies with the Semantic Web. 4th European Semantic Web Conference (ESWC
2007). LNCS P 503-517
[68]. M. Ponte and W.B. Croft (1998). A Language Model Approach to Information Retrieval. Proceeding of the 1st ACM SIGIR 1998. P:
275-281.
[69]. S. Xu, S. Bao, Y. Yo (2007). Using Social Annotations to Improve Language Model for Information Retrieval. In CIKM’07 ACM. P
1003-1006.
[70]. X. Wu, L. Zhong and Y. Yo (2006). Exploring Social Annotations for the Semantic Web. In WWW ACM Scotland.
[71]. M. Harvey, I. Rthven, M. J. Carman (2011). Improving Social Bookmarking Search Using Personalized Latent Variable Language
Models. WSDM’11 ACM, China. P 485-494
[72]. Z. Zhou (2011), Social Information Retrieval Based on User Interesting Mining, Journal of Computational Information Systems 7:4
(2011) 1373-1379