Dissertation Report on
COMMUNITY DETECTION
IN SOCIAL MEDIA
AS PARTIAL FULFILMENT OF
MASTER OF COMPUTER APPLICATION
SEMESTER-5
BY
BHAGYASHRI MANANI (Enroll# 105090693093)
TANVI SHARMA (Enroll# 105090693070)
UNDER GUIDANCE OF
Dr. SONAL JAIN
SUBMITTED TO
GLS INSTITUTE OF COMPUTER TECHNOLOGY,
GUJARAT TECHNOLOGICAL UNIVERSITY
Acknowledgment
First, we would like to thank Dr. Sonal Jain, our internal guide. She
introduced us to the field of community detection in social media and provided
guidance on every query throughout the dissertation. We would like to thank her for
her valuable suggestions and for showing us how to do research. Without her
guidance and support, this research could not have been completed.
We would also like to thank Dr. Harshal Arolkar, Dr. Devarshi Mehta and Dr.
Jyotika Doshi for their valuable suggestions, reviews and help.
Abstract
Social media provide many features, such as online chatting, online discussion,
online communities, advertising and marketing. They also raise issues such as
community detection, influence maximization, message propagation and social
media monitoring. In this research we focus on community detection, which is
one of the major issues in mining social media. We study and discuss the
existing algorithms used for community detection and present a comparative
analysis of them. These algorithms use graph theory concepts to detect
communities on the web. We also discuss the limitations of the existing
algorithms and propose a solution to them.
Contents
1. Introduction
    1.1 What is Data Mining?
    1.2 Need for Data Mining in Social Media
    1.3 Research Goal
2. Elements of Social Media
    2.1 Profile
    2.2 Member
    2.3 Group
    2.4 Discussion
    2.5 Blogs
    2.6 Widgets
3. Issues of mining Social Media
    3.1 Community Detection
    3.2 Influence Maximization
    3.3 Message Propagation
    3.4 Monitoring
    3.5 Social CRM (Customer Relationship Management)
4. Algorithms for Community Detection
    4.1 Vertex Base Community Detection
        4.1.1 Bron–Kerbosch Algorithm
        4.1.2 Clique-Percolation Method
    4.2 Edge Base Community Detection
        4.2.1 Girvan–Newman Algorithm
5. Conclusion and Proposed Solution
    5.1 Comparative Analysis of Algorithm
    5.2 Limitation of existing Algorithm
    5.3 Proposed Solution
    5.4 Future Work
References
List of Tables:
Table 5.1: Comparative Analysis of Algorithm
Table 5.2: Edge content Base Community Detection Example
List of Figures:
Figure 1.1: Social Media Websites
Figure 2.1: Profile of Social Media Website
Figure 2.2: Member of Social Media Website
Figure 2.3: Group of Social Media Website
Figure 2.4: Personal Blogs
Figure 2.5: Media Blogs
Figure 2.6: Widgets used in Social Media
Figure 3.1: Example of Community on Website
Figure 3.2: Twitter facilities for Monitoring
Figure 3.3: Traditional v/s social CRM System
Figure 4.1: Clique Graph
Figure 4.2: Undirected Graph G
Figure 4.3: CPM Graph (a)
Figure 4.4: CPM Graph (b)
Figure 5.1: Content of edge example
Figure 5.2: Edge content Base Community Detection
Chapter 1 Introduction
As Internet access improves, more and more people depend on web services.
They search for or publish information, download music and movies, play games,
use social networking websites to interact with friends and family members,
shop online, and even pay bills over the Internet. With the progress of World
Wide Web technologies, more and more data are available online to web users.
Web data cover a wide range of fields such as government, sports,
entertainment, commerce, and health and lifestyle. The availability of a vast
amount of web data does not mean that users can easily get whatever they want.
As more data become available on the web, it takes more time and effort to
find the desired information. It has been observed that 99% of the data
accessible on the web is not useful to 99% of the users [1]. This massive
amount of web data calls for techniques that find the useful knowledge hidden
in it.
1.1 What is Data Mining?
Data are facts about the world. Descriptions of a person (gender, height,
weight, color, name, age, education, etc.), an animal (category, size, name,
weight, age, etc.), a mobile phone (height, width, color, company, price) or a
country (name, population, area, number of states) can be stored and are known
as data. For example, a student database contains the following data:
• Name as “Janki”
• Gender as “girl”
• Result as 60%
• Attendance as 90%
• Year as “1st”
Information is filtered, meaningful and relevant data. For example:
• A student named Janki got 60% in B.C.A.
• 70% of the students in the 3rd year of B.C.A. got distinction.
• “Sachin” got the highest percentage in 3rd year B.C.A.
Knowledge is information processed in the mind of an individual. In other
words, knowledge is the state or fact of knowing; it is understanding gained
through experience or study. For example:
• From the monthly attendance report and the exam result report, a teacher
decides whether a student's performance is average, good or excellent by
applying their knowledge and experience.
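The step from stored data to information can be illustrated with a minimal Python sketch. The student records, the field names and the 70% distinction cut-off below are hypothetical and are chosen only for illustration; they are not taken from a real database.

```python
# Raw data: stored facts about students (hypothetical records).
students = [
    {"name": "Janki", "year": 1, "result": 60, "attendance": 90},
    {"name": "Sachin", "year": 3, "result": 85, "attendance": 95},
    {"name": "Meera", "year": 3, "result": 72, "attendance": 80},
]

DISTINCTION_THRESHOLD = 70  # assumed cut-off, not given in the text


def top_student(records, year):
    """Information: name of the student with the highest result in a year."""
    in_year = [r for r in records if r["year"] == year]
    return max(in_year, key=lambda r: r["result"])["name"]


def distinction_share(records, year):
    """Information: percentage of students in a year at or above the cut-off."""
    in_year = [r for r in records if r["year"] == year]
    passed = [r for r in in_year if r["result"] >= DISTINCTION_THRESHOLD]
    return 100 * len(passed) / len(in_year)
```

Filtering and aggregating the records yields statements of the kind listed above (who scored highest, what share got distinction); deciding what those numbers mean for a student remains the teacher's knowledge.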
Data mining is commonly defined as the process of discovering useful patterns
or knowledge from data sources such as databases, texts, images and the Web.
Web data mining is the application of data mining techniques to web data. It
bridges the gap between data and knowledge: it is designed to extract useful,
hidden knowledge from the massive amount of “garbage” data available on the
web [2, 3]. Data mining has many applications in market basket analysis, fraud
detection, profiling, risk management, e-commerce, web analysis and many other
fields.
1.2 Need for Data Mining in Social Media
What is Social Media?
Online social networking is usually done through websites known as social
sites. A social media website is like an online community of Internet users.
Members of an online community can share common interests in hobbies,
religion, politics, lifestyle, etc. Using social media websites, people can
share text, photos, audio, video and information. Platforms like Twitter,
Facebook and LinkedIn have created online communities where people can share
as much or as little personal information as they want with other members.
Once you are granted access to a social networking website, you are a
member of that site and can use it to interact with other members. Members may
share their views, thoughts, videos and images, update their status,
communicate with other members, comment on others' views and status, join
groups, invite other members to events, and read the profile pages of other
members.
Figure 1.1: Social Media Websites
Need for Data Mining in Social Media
Social media can be used to learn about current trends, opinions and
influencers. Information gathered from social networking sites can be used for
the following purposes:
• To improve content marketing by better understanding customers' opinions
• To learn what is most relevant regarding your products, brand or even
entire business area
• To know who the key influencers are
• To know who the intended customers for your product are
This allows you to identify people who are interested in your product or
content, find ways of reaching out to them, create content that attracts them,
and get back to them. For example, Facebook will be able to sell its data to
companies wanting to understand market data. Facebook has the demographic and
geographic data in place, and just needs to sell access to the data.
Why use Social Media for Marketing?
The number of social media users is increasing day by day, and users are
becoming part of more and more social media communities. Social media come
with many features and advantages, some of which are listed below.
• Communicate with customers
Social media allow product sellers to reach prospective customers, and
customers to reach particular advertisements, websites or business employees.
Social media are a two-way channel: they allow marketing or technical staff to
chat with customers and answer any questions customers might have. When it
comes time to buy the product, customers can feel like they have a friend in
the business.
• Word of Mouth
Social media take word-of-mouth marketing to a new level. When your fans
follow or interact with your page, all of their friends see those interactions
happen. With every interaction, comment and discussion you open up your brand
to hundreds or thousands of prospective community members. Happy customers can
also directly tell their friends on social media about their good experience
with you. They can amplify positive chatter about your business and create a
positive atmosphere for your brand.
• Customer Loyalty
By engaging your customers through social media you have the opportunity to
reward your loyal fans and generate repeat business. By building these
relationships and maintaining them you can build customer loyalty and
satisfaction that rewards your business further.
• Feedback
The value of knowing where you are succeeding and failing can mean everything
in business. Social media let you directly estimate what works with your fans
and what doesn't, and allow you to address negative feedback quickly.
Examples of Social Media used for Marketing
• Twitter
Twitter allows companies to promote products on an individual level. The use
of a product can be explained in short messages that followers are more likely
to read. These messages appear on followers' home pages. Messages can link to
the product's website, Facebook profile, photos, videos, etc. This link gives
followers the opportunity to spend more time interacting with the product
online. This interaction can create a loyal connection between product and
individual and can also lead to larger advertising opportunities. Twitter
promotes a product in real time and brings customers in.
• Facebook
Facebook profiles are more detailed than Twitter profiles. They allow a
product to provide videos, photos and longer descriptions. Videos can show
when a product can be used as well as how to use it. Profiles can also include
testimonials, as followers can comment on the product pages for others to see.
Facebook can link back to the product's Twitter page as well as send out event
reminders.
• Blogs
Every day there are more reasons for companies to add blogging platforms to
their social media repertoire. Platforms like LinkedIn create an environment
for companies and clients to connect online. Companies that recognize the need
for information, originality and accessibility employ blogs to make their
products popular and unique, and ultimately to reach out to consumers who are
privy to social media. Blogs allow a product or company to provide longer
descriptions of products or services. The longer description can include
reasoning and uses. It can include testimonials and can link to and from
Facebook, Twitter and many other social network and blog pages. Blogs can be
updated frequently and are a promotional technique for keeping customers.
Online communities benefit businesses because they enable them to reach
the clients of other businesses using the platform. These online environments can
be accessed by virtually anyone; therefore consumers are invited to be a part of
the creative process.
Issues in mining Social Media
Mining the content of social media and analyzing social networking data have
become a major part of online business. The issues of mining social media
include:
• Community detection: how to detect communities in a social network.
• Influence maximization: finding the people who act as influencers in a
large social network.
• Message propagation: analyzing the patterns or keywords of messages that
propagate in a very short time.
• Social customer relationship management: strengthening relationships with
customers through more meaningful interactions.
• Social media monitoring.
1.3 Research Goal
We have seen the need for mining social media. More and more businesses
run through websites, for example online selling of products, books, music
CDs, movie tickets, railway or airline tickets, and hotel bookings. More and
more people are now members of different social networking sites. By mining
social media, a business can learn current trends, customers' interests, and
customers' opinions of products and services. This information can be used to
reach interested customers more efficiently for advertising and marketing
purposes.
We focus on community detection, which is one of the issues of mining
social media. The goal of our research is to analyze the existing algorithms
for community detection. These algorithms use graph theory concepts to detect
communities: the entire social network is represented as a graph in which
nodes represent actors or members of the community and an edge between a pair
of nodes represents a connection between those members. We try to identify the
limitations of the existing algorithms and propose a solution.
Chapter 2 Elements of Social Media
Social networking is based on a certain structure that allows people to
communicate and share their information with each other. This structure
includes profiles, friends, blog posts, widgets, and usually something unique
to the particular social networking website, such as the ability to 'poke'
people on Facebook or high-five someone on Hi5. In the following sections we
discuss the elements of social media.
2.1 Profile
This is where you tell the world about yourself. Profiles contain basic information,
like where you live and how old you are, religious views, contact details,
educational background, job or business details, relationship status, profile
picture and personality questions, like who's your favorite actor or politician and
what's your favorite book.
Figure 2.1: Profile of Social Media Website
2.2 Members
Members are trusted people on the site who are allowed to view your profile
content (images, video, status), post comments on it, or send you private
messages. You can also see updates on how the members added to your account
are using the social networking site, such as when they post a new picture or
update their profile. Members are the heart and soul of social networking.
On Facebook they are known as 'friends'; LinkedIn refers to them as
'connections'; and Twitter refers to them as 'followers', who can reply to
your tweets. All social networks treat members as trusted people.
Figure 2.2: Member of Social Media Website
2.3 Groups
Most social networks use groups to help you find people with similar interests.
They are both a way to connect with like-minded people and a way to identify
your interests.
For example, students of HLICA College's 2010 batch can create a group and
discuss any topic, such as the exam syllabus, technical events or the exam
schedule, and can also discuss queries and their solutions.
2.4 Discussions
A primary focus of groups is to create interaction between users in the form of
discussions. Most social networking websites support discussion boards for the
groups, and many also allow members of the group to post pictures, music, video
clips, and other tidbits related to the group.
Figure 2.3: Group of Social Media Website
2.5 Blogs
Another feature of some social networks is the ability to create your own blog
entries. A blog is a discussion published on the World Wide Web and consisting
of entries ("posts") typically displayed in reverse chronological order (the
most recent post appears first). Good-quality blogs are interactive, allowing
visitors to leave comments and even message each other via GUI widgets on the
blogs, and this interactivity distinguishes them from other static websites.
In that sense, blogging can be seen as a form of social networking. A blog is
like an article, news item or view on some topic, and other members can
comment on it with their views and opinions.
• Personal Blogs
The personal blog, an ongoing diary or commentary by an individual, is the
most common blog. Some sites, such as Twitter, allow bloggers to share
thoughts and feelings instantaneously with friends and family, and are much
faster than emailing or writing. On Facebook this is known as a status update.
Figure 2.4: Personal Blogs
• Corporate and Organizational Blogs
A blog can be private, as in most cases, or it can be for business purposes.
Blogs used internally to enhance communication in a corporation, or externally
for marketing, branding or public relations purposes, are called corporate
blogs. Similar blogs for clubs and societies are called club blogs, group
blogs, or by similar names; their typical use is to inform members and other
interested parties of club and member activities. For example, a member of a
Facebook group can post views, news, updates and articles to that group, and
other members can reply to the post.
• Media Blogs
Blogs with shorter posts and mixed media types are called media blogs. For
example, Yahoo publishes articles on the latest news about celebrities,
lifestyle, business and technology, and people can comment on them.
Figure 2.5: Media Blogs
2.6 Widgets
A popular way of letting your personality shine through is by gracing your social
networking profile with web widgets. Many social networks allow a variety of
widgets, and you can usually find interesting widgets located on widget galleries.
Figure 2.6: Widgets used in Social Media
Basic Widgets for Social Website or Blog
• Photo Badge
This photo badge allows you to share your Facebook photos on websites and
blogs. Choose from a vertical, horizontal or two-column layout, and choose the
number of photos to be displayed.
• Profile Badge
Create a Facebook, Twitter or LinkedIn profile badge to share selected profile
information on your website. A profile badge allows your users to easily
connect with you and add you as a friend.
• Like Box
This allows your users to publish their content and activity.
• Share Button
This powerful widget allows your visitors to share your content (image, video,
article, etc.).
• Comments Box
This allows members to comment or post on website content.
Chapter 3 Issues of Mining Social Media
Social media provide very good services such as online chatting, sharing of
videos and images, online games and online communities, and also serve as an
effective tool for advertising and marketing. Despite these many features,
mining social media is essential yet not easy. It comes with issues such as
community detection, influence maximization, message propagation, social media
monitoring and mining customer relationships.
3.1 Community Detection
What is a community?
As we have seen, online social networks such as Twitter and Facebook are
rapidly gaining popularity. Social network analysis is therefore becoming a
very important research field. One major topic in social network analysis is
the study of communities in social networks, for example to identify target
groups for advertising and marketing.
Figure 3.1: Example of Community on Website
A virtual community is a social network of individuals who communicate
with each other through particular social media, crossing geographical and
political boundaries in order to pursue mutual interests or goals. It is a
large collection of individuals who interact unusually frequently with each
other and who share interesting properties, such as common hobbies or
occupations. The word “community” has been adopted by various social
networking sites. A social network community can answer, for instance, the
following questions:
• Who knows whom?
• Who knows what?
• Who can do what?
• Who looks for what?
• Who offers what?
It provides a wealth of information to its members about other people and
allows them to manage friends and business partners in an effective
environment.
What is Community Detection?
Having social media accounts for your business and creating posts for them
is not enough. You need to check whether your posts carry the right message
and address your target audience, and you need to find the right community for
effective advertising. Community detection is a field whose goal is to detect
communities within networks. It tries to answer the question: when should
people be considered close enough to be in the same community?
In the problem of community detection, the goal is to detect communities in
real-world graphs such as large social networks, web graphs and biological
networks by partitioning the network into dense regions of the graph. Such
dense regions typically correspond to entities which are closely related, and
can hence be said to belong to a community [8, 9].
The determination of such communities is useful in a variety of
applications in social network analysis, including customer segmentation,
recommendations and influence analysis. As a result, a considerable amount of
research has been devoted to algorithms for solving this problem.
Community Detection for Advertisement
Social media software enables anyone, without knowledge of coding, to post,
comment on, share or mash up content, and to form communities around shared
interests. Social media communities are growing at an exponential rate and
represent a huge potential market for advertising and marketing. The most
well-known social media communities are LinkedIn, Facebook, Twitter and
YouTube, along with blog sites.
Social Media Optimization
It refers to the use of a number of social media outlets and communities to
generate publicity and increase the awareness of a product, brand or event. An
important problem in the area of social networking is that of community
detection, so that content or posts reach the right audience.
3.2 Influence Maximization
Influence maximization is the problem of finding the people who act as
influencers [2].
For example, a small company develops a cool online application for an
online social network and wants to market it through the same network. It has
a limited budget, so it can only select a small number of initial users in the
network to use it (by giving them gifts or payments). The company hopes that
these initial users will love the application and start influencing their
friends on the social network to use it, that their friends will influence
their friends' friends and so on, and that through this word-of-mouth effect a
large population in the social network will adopt the application. The problem
is whom to select as the initial users so that they eventually influence the
largest number of people in the network. This problem, referred to as
influence maximization, is of interest to many companies as well as
individuals that want to promote their products, services and innovative ideas
through the powerful word-of-mouth effect (also called viral marketing).
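The seed-selection problem described above can be sketched in Python under a strongly simplified assumption: a user who adopts the application always convinces every friend they influence (real influence models are probabilistic). The `influences` graph, the function names and the greedy strategy are illustrative assumptions, not a method taken from this report.

```python
from collections import deque


def reached(graph, seeds):
    """Return every user reached from the seeds by following influence edges."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        user = queue.popleft()
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen


def greedy_seeds(graph, budget):
    """Pick up to `budget` seeds, each time adding the user who enlarges the reach most."""
    seeds = []
    for _ in range(budget):
        candidates = [u for u in graph if u not in seeds]
        if not candidates:
            break
        best = max(candidates, key=lambda u: len(reached(graph, seeds + [u])))
        seeds.append(best)
    return seeds


# Hypothetical network: influences[u] lists the users that u can influence.
influences = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": [],
    "e": ["f"],
    "f": [],
}
chosen = greedy_seeds(influences, 2)
```

With a budget of two, the sketch first picks the user with the widest reach and then the user who adds the most people not yet covered, which mirrors the intuition of spending a limited budget on the most influential initial users.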
As another example, Topsy analyzed the Twitter reaction to the bin Laden
raid last year [7]. The analysis began with one person tweeting from Pakistan
and looked at the exposure he received over time. Within the first eight hours
of the raid, the Pakistani Twitter user reached around 100,000 exposures. Then
someone in the U.S. media (the influencer in this case) found the initial
tweets and retweeted them and, less than one day later, the Pakistani Twitter
user had reached 90 million exposures. After the influencer retweeted the
message, large numbers of followers also retweeted it, further amplifying that
particular tweet.
Influence maximization involves mining and technical questions such as:
• Who served as an influencer and was able to amplify the message?
• How many followers do they have?
• Do they get responses?
• How many external links point to their blog?
• How many comments do their blog posts attract?
• How did the exposure increase with each amplification?
• How fast is the message trending?
• What is the positive and negative sentiment?
Once you are able to use this analysis to uncover the influencers, you want to
be able to reach out to those key experts, as well as to monitor them to find
out what they are saying, including whether they are saying good or bad things
about your brand. You even want to find out to whom they are talking.
3.3 Message Propagation
Social websites including Facebook, Twitter and LinkedIn allow users to
construct a personal profile, share interesting information with other people,
and build relationships within a community. The mode of interaction on social
websites is affecting people's social behaviors and consumer habits.
Although many marketing techniques may be used to spread information
over a social network, the target consumers should be defined, and relevant,
suitable messages should be broadcast to them within a certain time period.
Consequently, enterprises need a tool to analyze message propagation behavior
across different combinations of community and time dimensions. Message
propagation is the problem of finding messages, identified by particular
patterns or keywords, that spread quickly.
To illustrate message propagation, consider the research done by Shaozhi
and Felix on twitter.com [3]. They collected and analyzed a large data set
from the Twitter social network for the following event:
In June 2009, the news of Michael Jackson's death spread all over the world.
Many online social networks were flooded with messages related to this
breaking event. The researchers started collecting related messages from
Twitter.com on June 27th, 2009, two days after the tragedy. Among all the
crawled messages, the tweets containing “Michael Jackson” or “MJ” were
selected. After removing the noise, 549,667 MJ-related messages posted by
305,035 users remained; 548,102 of these messages were posted after June 25,
2009.
The following fields of each message need to be analyzed to understand how a
message propagates:
• User id: Unique identifier for the user who posted this message.
• Id: The message ID, which is unique among messages posted by the same user.
Two messages posted by different users may share the same message ID.
• Text: The content of the message.
• Created at: The creation time of this message.
• Source: The client software (for example Twitter, Facebook or Yahoo blogs)
that was used to post the message.
• In reply to status id: The message ID which this message replies to.
• In reply to user id: The user ID which this message replies to.
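The fields listed above can be represented as a small record type, sketched here in Python. The class name, the helper `reply_chain_length` and the sample messages are hypothetical; the sketch only assumes what the list states, namely that a message ID is unique per user, so a message is identified by the pair (user id, message id).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    user_id: int
    id: int  # unique only among messages of the same user
    text: str
    created_at: str
    source: str
    in_reply_to_status_id: Optional[int] = None
    in_reply_to_user_id: Optional[int] = None


def reply_chain_length(messages, message):
    """Count the replies leading from `message` back to an original post."""
    index = {(m.user_id, m.id): m for m in messages}
    depth = 0
    current = message
    # The depth bound guards against malformed data with circular replies.
    while current.in_reply_to_status_id is not None and depth < len(messages):
        key = (current.in_reply_to_user_id, current.in_reply_to_status_id)
        parent = index.get(key)
        if parent is None:  # the replied-to message was not crawled
            break
        depth += 1
        current = parent
    return depth


# Hypothetical sample: m1 and m2 share message ID 1 but belong to different users.
m1 = Message(1, 1, "Michael Jackson has died", "2009-06-25T22:00", "web")
m2 = Message(2, 1, "So sad to hear about MJ", "2009-06-25T22:05", "web", 1, 1)
m3 = Message(3, 7, "MJ will be missed", "2009-06-25T22:09", "web", 1, 2)
messages = [m1, m2, m3]
```

Following the two reply fields from message to message reconstructs how one post triggered a chain of responses, which is the basic ingredient for measuring how far and how fast a message spread.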
3.4 Social Media Monitoring
Social media monitoring is about listening to the discussions that take place
around your brand in order to find out people's different views. It is a very
important tool for a social media crisis plan as well as a marketing plan.
Here is a good example of social media monitoring. Last week I was
watching television and saw an interesting advertisement for something called
the Total Bib. It kind of made me chuckle, which caused me to tweet something
like “Total Bib reminds me of something out of a Saturday Night Live sketch”.
The tweet received a few laughs and comments from followers. A few hours
later, I received a reply tweet from TotalBib thanking me for the mention in
conversation. I was pretty amazed since I was not following them previously;
they were simply monitoring the stream. They took the time and made the effort
to do some simple monitoring of the Twitter stream to identify an opportunity.
Social media monitoring involves text mining specific keywords on social
networking websites, blogs, discussion forums and other social media. Essentially,
monitoring software transposes specific words or phrases in unstructured data
into numerical values. The numerical values are linked to structured data in a
database, allowing the data to be analyzed with traditional data mining
techniques.
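The idea of turning keywords in unstructured text into numerical values can be sketched as a simple keyword counter in Python. The function name, the tokenization rule and the sample posts are assumptions made for illustration; real monitoring software is considerably more elaborate, and this sketch handles single-word keywords only.

```python
import re


def keyword_counts(posts, keywords):
    """Count how often each single-word keyword occurs across all posts."""
    counts = {k.lower(): 0 for k in keywords}
    for post in posts:
        # Lower-case the post and split it into word tokens.
        for token in re.findall(r"[a-z0-9']+", post.lower()):
            if token in counts:
                counts[token] += 1
    return counts


# Hypothetical posts collected from a social media stream.
posts = [
    "Total Bib reminds me of a sketch",
    "I love my new bib!",
    "Great service",
]
counts = keyword_counts(posts, ["bib", "love", "refund"])
```

The resulting counts are ordinary numeric columns, so they can be stored next to structured data (for example per day or per user) and analyzed with traditional data mining techniques.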
Figure 3.2 Twitter facilities for Monitoring
What are the Needs for Social Media Monitoring?
• To learn of negative criticism about your brand, which you can then respond
to, turning an unhappy customer into a lifelong brand advocate.
• To learn of positive comments people are making about your brand, giving you
the opportunity to connect further with those individuals.
• To detect a social media crisis on the rise, before it builds up and begins
to spiral out of control.
3.5 Social CRM Systems
Social CRM is a strategy built around customer engagement and
interactions. It is an extension of CRM, a back-end process and system for
managing customer relationships, and it means different things to different
organizations. Social CRM is about trying to understand customers' problems
with a product or service and then solving them.
Traditional CRM was very much based around data and information that
brands could collect on their customers, all of which would go into a CRM
system that then allowed the company to better target various customers.
In social CRM, the customer is the focal point of how an organization
operates. Instead of marketing or pushing messages to customers, brands now
talk to and collaborate with customers to solve business problems, empower
customers to shape their own experiences, and build customer relationships,
hopefully turning customers into advocates. PR now has a very active role in
social CRM (in fact, PR typically owns budgetary control and authority over
social initiatives ahead of every other department). In most organizations, PR
departments manage the social presence of brands and handle customer
engagement.
Figure 3.3 Traditional v/s social CRM System
Chapter 4 Algorithms for Community Detection
An important problem in the area of social networking is community
detection. In the problem of community detection, the goal is to partition the
network into dense regions of the graph. Such dense regions typically
correspond to entities which are closely related, and can hence be said to
belong to a community. The problem of community detection in social networking
sites has been broadly studied because of its importance in social networking
applications.
Before discussing the algorithms in detail, we introduce some notation.
G = social network graph, where G = (V, E)
V = vertex set; each vertex in V corresponds to an actor in the network
E = edge set; each edge corresponds to a relationship between a pair of actors
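As a minimal sketch of this notation, an undirected graph G = (V, E) can be stored in Python as an adjacency mapping from each vertex to the set of its neighbors. The function name `build_graph` and the sample actors are hypothetical and serve only to illustrate the representation.

```python
def build_graph(vertices, edges):
    """Build an undirected graph as {vertex: set of neighboring vertices}."""
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        # A relationship between two actors is symmetric, so store both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


# Hypothetical network: three actors and two relationships.
V = ["Asha", "Bina", "Chirag"]
E = [("Asha", "Bina"), ("Bina", "Chirag")]
G = build_graph(V, E)
```

With this representation, the neighbors of an actor are available as `G[actor]`, and the degree of a vertex is simply the size of that set.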
We consider two kinds of methods for community detection:
• Node (vertex) base community detection
o Bron–Kerbosch algorithm
o Clique Percolation Method (CPM) algorithm
• Link (edge) base community detection
o Girvan–Newman algorithm
4.1 Vertex Base Community Detection
In node-based community detection, a group of nodes in which each node
satisfies some property forms a community.
In sociology, a clique describes a group of 2 to 12 (averaging 5 or 6) persons
who interact with each other more regularly and intensely than with others in
the same setting.
In graph terms, a clique is a subgraph in which all nodes are adjacent to each
other, and a maximal clique is a clique that cannot be extended by adding
another node. In Figure 4.1, nodes {5, 6, 7, 8} form a maximal clique.
In an overlapping community structure, a node can be a member of more than one
community.
4.1.1 Bron–Kerbosch Algorithm
The BK algorithm is used for non-overlapping community structure on
undirected graphs. It uses recursive backtracking to enumerate all maximal
cliques in the graph [6].
Figure 4.1: Clique Graph
Algorithm:
• Maximal cliques can be found using the Bron–Kerbosch algorithm.
• At any point in time it maintains three sets: R, P and X.
• The set R contains vertices that form a clique, which is either maximal or
can be extended to a maximal clique.
• The set P contains vertices that are connected to all vertices in R and can
still be added to R to make a larger clique.
• The set X contains vertices that are connected to all vertices in R but are
excluded from being added to R, because all cliques containing vertices in X
have already been enumerated in a different recursion cycle.
• N(v) is the set of neighbors of vertex v.
Pseudo Code:
BronKerbosch(R, P, X):
    if P and X are both empty {
        report R as a maximal clique
        return
    }
    choose a pivot vertex u in P ⋃ X
    for each vertex v in P \ N(u) {
        BronKerbosch(R ⋃ {v}, P ⋂ N(v), X ⋂ N(v))
        P := P \ {v}
        X := X ⋃ {v}
    }
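The pseudo code can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name bron_kerbosch, the helper build_adj and the pivot rule (the vertex of P ⋃ X with the most neighbors in P) are assumptions made here, and the edge list is inferred from the neighbor sets and cliques quoted in the worked example of this section rather than read from the figure.

```python
def build_adj(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bron_kerbosch(R, P, X, adj, out):
    """Append every maximal clique that extends R to the list `out`."""
    if not P and not X:
        out.append(sorted(R))  # R cannot be extended any further
        return
    # Pivot choice (an assumption): the vertex with the most neighbors in P.
    u = max(P | X, key=lambda w: len(P & adj[w]))
    for v in list(P - adj[u]):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}  # v has been fully explored
        X = X | {v}  # so exclude it from later cliques


# Edge list inferred from the worked example (an assumption).
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
adj = build_adj(edges)

cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
print(sorted(cliques))
```

Under these assumptions the sketch reports the same five maximal cliques as the worked example: {1,2,5}, {2,3}, {3,4}, {4,5} and {4,6}.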
Figure 4.2: Undirected Graph G
Example:
• Initially there are three sets: R = Ø, P = {1, 2, 3, 4, 5, 6}, and X = Ø.
o Select as pivot u a node with the maximum degree: u = 2 (this node has
degree 3).
o The neighbors of u are N(u) = {1, 3, 5}.
o P \ N(u) = {2, 4, 6} (the vertices that are elements of P but not
elements of N(u)).
• The iteration of the loop for v = 2 makes a recursive call to the algorithm
with R = {2}, P = {1, 3, 5}, and X = Ø. Within this recursive call v can be 1,
3 or 5:
o If v = 1, then R = {1, 2} and P = {5};
▪ then for v = 5, R = {1, 2, 5}, P = Ø, X = Ø.
o If v = 5, then R = {2, 5} and P = {1};
▪ then for v = 1, R = {1, 2, 5}, P = Ø, X = Ø.
o If v = 3, then R = {2, 3}, P = Ø, X = Ø.
Because 1 and 5 are neighbors, pivoting expands only one of them; either
choice reports the same clique {1, 2, 5}.
• Next, v = 4 (degree 3) makes a recursive call to the algorithm
with R = {4}, P = {3, 5, 6}, and X = Ø (although vertex 2 belongs to the set X
in the outer call to the algorithm, it is not a neighbor of node 4 and is
excluded from the subset of X passed to the recursive call).
o If v = 3, then R = {3, 4}, P = Ø, X = Ø.
o If v = 5, then R = {4, 5}, P = Ø, X = Ø.
o If v = 6, then R = {4, 6}, P = Ø, X = Ø.
• In the final iteration, for v = 6, there is a recursive call to the algorithm
with R = {6}, P = Ø and X = {4}: vertex 4 has already been processed, so it is
in X rather than in P, and no clique is reported.
BronKerbosch(Ø, {1,2,3,4,5,6}, Ø)
    BronKerbosch({2}, {1,3,5}, Ø)
        BronKerbosch({2,3}, Ø, Ø): output {2,3}
        BronKerbosch({2,5}, {1}, Ø)
            BronKerbosch({1,2,5}, Ø, Ø): output {1,2,5}
    BronKerbosch({4}, {3,5,6}, Ø)
        BronKerbosch({3,4}, Ø, Ø): output {3,4}
        BronKerbosch({4,5}, Ø, Ø): output {4,5}
        BronKerbosch({4,6}, Ø, Ø): output {4,6}
    BronKerbosch({6}, Ø, {4}): no output
• The maximal cliques found, and the overlaps between them, can be used to
define communities in several ways. The simplest is to consider only maximal
cliques bigger than a minimum size (here the minimum size is 2 nodes, so only
cliques with more than 2 nodes are kept).
o Community {1,2,5}
Drawbacks:
• The Bron–Kerbosch algorithm for finding cliques in a network is very costly:
its worst-case running time is O(3^(n/3)) for n nodes, which is prohibitive for
large-scale networks.
• Overlapping community structure, in which a node is part of more than one
community, is not supported.
Application:
• The union of these cliques defines a subgraph whose components
(disconnected parts) then define communities. Such approaches are often
implemented in social network analysis software. UCINET is a software
package for community detection in social networks that uses this
algorithm to detect communities. It was developed by Lin Freeman and
Martin Everett.
• URL of UCINET: https://sites.google.com/site/ucinetsoftware/home
Figure 4.3: CPM graph(a)
Figure 4.4: CPM graph(b)
4.1.2 Clique Percolation Method
Clique percolation is a community detection method developed by Gergely
Palla in 2005 [7]. The Clique Percolation Method is a popular approach for
analyzing the overlapping community structure of networks.
Algorithm:
• Find all cliques of size k (here k = 3) in the given network.
• Construct a clique graph, in which each node represents one of these cliques.
• Two cliques are adjacent if they share k-1 (here k-1 = 2) nodes.
• Each connected component in the clique graph forms a community.
Example:
• Find the cliques of size 3.
Here: {1,2,3}, {1,3,4}, {4,5,6}, {5,6,7}, {5,7,8}, {5,6,8}, {6,7,8}
• Construct a clique graph by connecting those cliques which are adjacent,
that is, which share k-1 = 2 nodes.
• Each connected component in the clique graph forms a community.
• Communities detected:
o {1,2,3,4}
o {4,5,6,7,8}
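The four steps can be sketched in Python as follows. This is an illustrative sketch rather than the CFinder implementation: the function names are hypothetical, the k-cliques are found by brute force (adequate only for small graphs), and the edge list is an assumption inferred from the seven size-3 cliques listed in the example.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def k_clique_communities(adj, k):
    """Return the k-clique communities as sorted lists of nodes."""
    # Step 1: enumerate all cliques of size k (brute force).
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]

    # Steps 2-3: cliques sharing k-1 nodes are adjacent; merge them
    # with a small union-find over clique indices.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # Step 4: each connected component of the clique graph is a community.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


# Edge list inferred from the listed triangles (an assumption).
edges = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6), (5, 6),
         (5, 7), (6, 7), (5, 8), (6, 8), (7, 8)]
communities = k_clique_communities(build_adj(edges), 3)
print(communities)
```

With this edge list the two communities share node 4, which illustrates the overlap that the method permits.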
Advantage:
• It is not too restrictive (unlike cliques, which require each node to be
connected to all other nodes).
• It allows overlaps: (a) a node can be a member of several different
communities at the same time, and (b) communities can overlap with each
other by sharing nodes.
Drawback:
• Not all the nodes of the graph can participate in a k-clique community. For
example, a leaf node always remains outside every community.
• It is difficult to determine a suitable value of k for the clique size.
Applications:
• CFinder is free software for finding communities in networks, based on the
Clique Percolation Method (CPM) developed by Palla.
• URL of CFinder: http://www.cfinder.org
4.2 Link-Base (edge) Community Detection
4.2.1 Girvan–Newman Algorithm
The Girvan–Newman algorithm (named after Michelle Girvan and Mark Newman)
is one of the methods used to detect communities in complex systems. The
algorithm is based on edge betweenness [5].
Betweenness is a centrality measure of a vertex or an edge within a graph;
here the betweenness of each edge is used as its weight. The communities are
detected by progressively removing edges from the original graph, rather than
by adding the strongest edges to an initially empty network.
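Algorithm:
The betweenness of a vertex v in a graph G := (V, E) is computed as follows:
1. For each pair of vertices (s, t), compute the shortest paths between
them.
2. For each pair of vertices (s, t), determine the fraction of shortest paths
that pass through the vertex in question (here, vertex v).
3. Sum this fraction over all pairs of vertices (s, t).
\[ C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}} \]
where \sigma_{st} is the total number of shortest paths from node s to node t,
and \sigma_{st}(v) is the number of those paths that pass through v. The
betweenness of an edge is defined in the same way, counting the shortest paths
that run along the edge.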
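As a small worked illustration (the two toy graphs are assumptions introduced here, not taken from this report), write \sigma_{st} for the number of shortest paths between s and t, and \sigma_{st}(v) for the number of those that pass through v. On the path 1–2–3, the only pair that does not involve vertex 2 is (1, 3), and its single shortest path runs through 2. On the 4-cycle 1–2–3–4–1, the pair (1, 3) has two shortest paths, only one of which uses vertex 2, while the pairs (1, 4) and (3, 4) are adjacent and contribute nothing:

```latex
% Path graph 1-2-3: one pair, one shortest path, and it passes through 2
C_B(2) = \frac{\sigma_{13}(2)}{\sigma_{13}} = \frac{1}{1} = 1

% 4-cycle 1-2-3-4-1: pairs (1,3), (1,4), (3,4), each counted once
C_B(2) = \frac{\sigma_{13}(2)}{\sigma_{13}}
       + \frac{\sigma_{14}(2)}{\sigma_{14}}
       + \frac{\sigma_{34}(2)}{\sigma_{34}}
       = \frac{1}{2} + \frac{0}{1} + \frac{0}{1} = \frac{1}{2}
```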
Procedure:
1. Calculate and assign betweenness:
   1. Calculate the betweenness (weight W) of every edge in graph G.
   2. Each edge (s1, e1), …, (sn, en), identified by its pair of end
      vertices, is assigned the associated weight W1 … Wn.
2. The edge with the highest weight Wh is removed.
3. The betweenness of all edges affected by the removal is recalculated.
4. Steps 2 and 3 are repeated until no edges remain.
5. The order in which edges are removed is noted, and communities are then
detected using hierarchical clustering based on reading the removed edges in
reverse order.
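The procedure can be sketched in plain Python as follows. This is a simplified illustration, not the SNAP implementation: the function names are hypothetical, edge betweenness is computed with a Brandes-style breadth-first search on an unweighted graph, betweenness is recomputed for all edges after every removal (not only the affected ones), and the example graph (two triangles joined by one bridge) is an assumption introduced here.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Push path fractions back from the farthest vertices towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair (s, t) was visited from both ends.
    return {e: b / 2 for e, b in score.items()}


def components(adj):
    """Connected components as a sorted list of sorted node lists."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return sorted(comps)


def girvan_newman(adj):
    """Return the partition after each split, until no edges remain."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    levels = [components(adj)]
    while any(adj.values()):
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)  # edge with the highest weight
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
        if len(parts) > len(levels[-1]):
            levels.append(parts)
    return levels


# Two triangles joined by the bridge 3-4 (an assumed toy graph).
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
         4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
levels = girvan_newman(graph)
print(levels[1])
```

The bridge carries every shortest path between the two triangles, so it has the highest betweenness and is removed first; the returned list of partitions is the hierarchy read from the coarsest level to single nodes.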
Application:
• The community detection module of the SNAP software uses this algorithm; it
is implemented in the file cmty.h.
• URL of SNAP: http://snap.stanford.edu/snap/description.html
Advantage:
• This algorithm is quite sensitive and gives accurate results.
• This algorithm is one of the few able to detect community structure at all
levels.
Drawback:
• Its major drawback is the computational cost, O(m^2 n) for m edges and n
vertices [4].
Chapter 5 Conclusion and Proposed Solution
5.1 Comparative Analysis of Algorithm
Criterion | Bron–Kerbosch algorithm | CPM algorithm | Girvan–Newman algorithm
----------|-------------------------|---------------|------------------------
Node overlapping | Does not allow | Allows | Allows
Computational time | O(3^(n/3)) (n = vertices) | High, as it tries to find all k-size cliques in the network | O(m^2 n) (m = edges, n = vertices)
Application (software) | UCINET | CFinder | SNAP
Edge content and node content | Does not consider | Does not consider | Does not consider
Based on | Vertex structure | Vertex structure | Edge structure
Can work efficiently at the given scale (number of nodes in graph) | Small | Large | Large
Table 5.1 Comparative Analysis of Algorithms
The Bron–Kerbosch algorithm has the limitation that it does not support
overlapping community structure. However, it is simple and its computational
time is less than that of the other two algorithms. It works efficiently on
small social networks.
The CPM algorithm, developed by Palla, finds all k-size cliques in the network
and rolls each clique onto adjacent cliques that share k-1 of its nodes. Though
its computational time is high, it allows one to find communities in graphs
with up to 10^5 nodes [4].
The Girvan–Newman algorithm is the first modern algorithm that is based
on edge structure. Links are iteratively removed based on the value of their
betweenness, which expresses the number of shortest paths between pairs of
nodes that pass through the link. Its time complexity is O(m^2 n) [4].
5.2 Limitations of Existing Algorithm
The existing algorithms for community detection use only information about
the linkage (edge) structure and node structure. However, in many recent
applications, edge content should be considered in order to provide better
supervision to the community detection process. That is, edge or node content
should also be considered while detecting communities.
While traditional community detection is designed with link and node
structure only, the addition of edge content gives more accurate and relevant
results, because it provides an understanding of how the cliques relate to the
content on the edges. It is possible that vertices which are poorly linked may
sometimes belong to the same community because of a very high amount of
similarity between the content itself. Thus, in cases in which link
connectivity and content-based similarity do not agree, it is important to set
up criteria to decide whether a node is part of a community or not.
For example,
• Two nodes might share audio, video, text, images etc. Edge content or
vertex content can help to detect communities more effectively.
• In email networks, a communication between two participants can be
considered as edge content. Clearly, participants with similar content of
communication are much more likely to belong to the same community
than those without.
• In social media networks such as Facebook, users may tag an image with
keywords. In such cases, it may be possible to construct a network of both
people and images in which the edge content corresponds to the keywords
which are used for tagging. Clearly such keywords provide important and
useful knowledge about the nature of the underlying community.
Figure 5.1: Edge content Example
5.3 Proposed Solution
Community detection with edge content and vertex content gives more
effective results. A vertex content algorithm works on the content of two
individual nodes, while an edge content algorithm works on pairwise content,
that is, the communication between two nodes. From the given example we can
clearly see how to detect communities using the edge content passing between
two actors (nodes) in a social media graph.
The graph forms two communities, named Fasttrack watch and Jet Airways. From
Figure 5.2 we have created the following table to detect the members of each
community:
Name of node (v) | Activity (edge content) | Keyword
-----------------|-------------------------|--------
Student_ABC | Shares the “Fasttrack” website link with Student_XYZ | Fasttrack
Student_XYZ | (1) Likes the link sent by Student_ABC of the “Fasttrack” watch; (2) Comments on the status of Traveler_MNO about “Jet Airways” | Fasttrack, Jet Airways
Student_PQR | Tags Student_ABC in a “Fasttrack” watch image | Fasttrack
Traveler_MNO | Updates status with the latest news of the launch of “Jet Airways” flight J530 | Jet Airways
Traveler_RST | Likes the page of “Jet Airways” | Jet Airways
Table 5.2 Community Detection Example
Community | Members
----------|--------
Fasttrack watch | Student_ABC; Student_XYZ; Student_PQR
Jet Airways | Student_XYZ; Traveler_MNO; Traveler_RST
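The grouping in the example table can be sketched in Python as follows. This is only a hedged illustration of the idea, not the proposed algorithm itself: the tuple layout (actor, other actor or None, keyword) and the function name communities_by_keyword are assumptions introduced here, and the keyword of each activity is taken as given rather than extracted from real content.

```python
from collections import defaultdict

# (actor, other actor or None, keyword) transcribed from the example table.
activities = [
    ("Student_ABC", "Student_XYZ", "Fasttrack"),     # shares a link
    ("Student_XYZ", "Student_ABC", "Fasttrack"),     # likes the link
    ("Student_XYZ", "Traveler_MNO", "Jet Airways"),  # comments on a status
    ("Student_PQR", "Student_ABC", "Fasttrack"),     # tags in an image
    ("Traveler_MNO", None, "Jet Airways"),           # updates own status
    ("Traveler_RST", None, "Jet Airways"),           # likes a page
]


def communities_by_keyword(activities):
    """Group actors by the keyword carried on their activities."""
    groups = defaultdict(set)
    for actor, _other, keyword in activities:
        groups[keyword].add(actor)
    return {keyword: sorted(members) for keyword, members in groups.items()}


communities = communities_by_keyword(activities)
for keyword, members in communities.items():
    print(keyword, members)
```

Student_XYZ appears under both keywords, so a content-based grouping of this kind naturally yields overlapping communities.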
Figure 5.2: Edge content Community Detection
5.4 Future Work
We can develop an edge-content-based algorithm using concepts from matrix
and graph theory, which considers one additional field, the edge content
passing from one node to another, to detect communities in a social media
graph.
References:
[1] J. Han and M. Kamber: “Data Mining Concepts and Techniques”, 2000.
[2] Wei Chen and Yajun Wang: “Efficient Influence Maximization in Social Networks”.
[3] Shaozhi Ye and Felix Wu: “Measuring Message Propagation and Social
Influence Maximization”.
[4] Andrea Lancichinetti and Santo Fortunato: “Community Detection Algorithms
Analysis”, 2010.
[5] M. E. J. Newman: “Detecting Community Structure in Networks”, 2003.
[6] C. Bron and J. Kerbosch: “Finding All Cliques of an Undirected Graph”, 1973.
[7] G. Palla: “Clique Percolation Method”, 2005.

Contenu connexe

Tendances

Can you trust everything?
Can you trust everything?Can you trust everything?
Can you trust everything?Colin Lieu
 
Trust influence and social media
Trust influence and social mediaTrust influence and social media
Trust influence and social mediaDawn Dawson
 
y.ac.uk_temp_turnitintool_1480009253._11470_1430518780_65564
y.ac.uk_temp_turnitintool_1480009253._11470_1430518780_65564y.ac.uk_temp_turnitintool_1480009253._11470_1430518780_65564
y.ac.uk_temp_turnitintool_1480009253._11470_1430518780_65564Rajwant Brar
 
Impact of social branding on purchase intention: An empirical study of social...
Impact of social branding on purchase intention: An empirical study of social...Impact of social branding on purchase intention: An empirical study of social...
Impact of social branding on purchase intention: An empirical study of social...Sparkles Soft
 
Literature Review of Information Behaviour on Social Media
Literature Review of Information Behaviour on Social MediaLiterature Review of Information Behaviour on Social Media
Literature Review of Information Behaviour on Social MediaDavid Thompson
 
IAMAI Factly Report: People below age 20 or above 50 more susceptible to fake...
IAMAI Factly Report: People below age 20 or above 50 more susceptible to fake...IAMAI Factly Report: People below age 20 or above 50 more susceptible to fake...
IAMAI Factly Report: People below age 20 or above 50 more susceptible to fake...Social Samosa
 
Social Media Marketing Strategy: Lessons from the Hospitality Industry
Social Media Marketing Strategy: Lessons from the Hospitality IndustrySocial Media Marketing Strategy: Lessons from the Hospitality Industry
Social Media Marketing Strategy: Lessons from the Hospitality IndustryAni Nacheva
 
Stakeholder Engagement and Public Information Through Social Media: A Study o...
Stakeholder Engagement and Public Information Through Social Media: A Study o...Stakeholder Engagement and Public Information Through Social Media: A Study o...
Stakeholder Engagement and Public Information Through Social Media: A Study o...Marco Bellucci
 
The Effects of Social Media Marketing In the Hotel Industry: Conceptual Model...
The Effects of Social Media Marketing In the Hotel Industry: Conceptual Model...The Effects of Social Media Marketing In the Hotel Industry: Conceptual Model...
The Effects of Social Media Marketing In the Hotel Industry: Conceptual Model...Samaan Al-Msallam
 
Corporate Communication & Social Media: A study of its usage pattern
Corporate Communication & Social Media: A study of its usage patternCorporate Communication & Social Media: A study of its usage pattern
Corporate Communication & Social Media: A study of its usage patterninventionjournals
 
Snss & PR Professionals: A Case Study of Facebook PR Groups as a Tool for Bui...
Snss & PR Professionals: A Case Study of Facebook PR Groups as a Tool for Bui...Snss & PR Professionals: A Case Study of Facebook PR Groups as a Tool for Bui...
Snss & PR Professionals: A Case Study of Facebook PR Groups as a Tool for Bui...inventionjournals
 
ANALYSIS OF GAMIFICATION ELEMENTS TO EXPLORE MISINFORMATION SHARING BASED ON ...
ANALYSIS OF GAMIFICATION ELEMENTS TO EXPLORE MISINFORMATION SHARING BASED ON ...ANALYSIS OF GAMIFICATION ELEMENTS TO EXPLORE MISINFORMATION SHARING BASED ON ...
ANALYSIS OF GAMIFICATION ELEMENTS TO EXPLORE MISINFORMATION SHARING BASED ON ...ijseajournal
 
Sohn&kim(2012) attributes of social network services a classification and com...
Sohn&kim(2012) attributes of social network services a classification and com...Sohn&kim(2012) attributes of social network services a classification and com...
Sohn&kim(2012) attributes of social network services a classification and com...Sohn Woong
 
Social Media Define the Era in Digital Media
Social Media Define the Era in Digital MediaSocial Media Define the Era in Digital Media
Social Media Define the Era in Digital Mediainventionjournals
 
Social network privacy guide
Social network privacy guideSocial network privacy guide
Social network privacy guideYury Chemerkin
 

Tendances (20)

Can you trust everything?
Can you trust everything?Can you trust everything?
Can you trust everything?
 
Trust influence and social media
Trust influence and social mediaTrust influence and social media
Trust influence and social media
 
y.ac.uk_temp_turnitintool_1480009253._11470_1430518780_65564
y.ac.uk_temp_turnitintool_1480009253._11470_1430518780_65564y.ac.uk_temp_turnitintool_1480009253._11470_1430518780_65564
y.ac.uk_temp_turnitintool_1480009253._11470_1430518780_65564
 
Hsci538 ppt
Hsci538 pptHsci538 ppt
Hsci538 ppt
 
Professionalism Online: The Effect of Resumes and Social Media on Perceived W...
Professionalism Online: The Effect of Resumes and Social Media on Perceived W...Professionalism Online: The Effect of Resumes and Social Media on Perceived W...
Professionalism Online: The Effect of Resumes and Social Media on Perceived W...
 
Impact of social branding on purchase intention: An empirical study of social...
Impact of social branding on purchase intention: An empirical study of social...Impact of social branding on purchase intention: An empirical study of social...
Impact of social branding on purchase intention: An empirical study of social...
 
Literature Review of Information Behaviour on Social Media
Literature Review of Information Behaviour on Social MediaLiterature Review of Information Behaviour on Social Media
Literature Review of Information Behaviour on Social Media
 
IAMAI Factly Report: People below age 20 or above 50 more susceptible to fake...
IAMAI Factly Report: People below age 20 or above 50 more susceptible to fake...IAMAI Factly Report: People below age 20 or above 50 more susceptible to fake...
IAMAI Factly Report: People below age 20 or above 50 more susceptible to fake...
 
Social Media Marketing Strategy: Lessons from the Hospitality Industry
Social Media Marketing Strategy: Lessons from the Hospitality IndustrySocial Media Marketing Strategy: Lessons from the Hospitality Industry
Social Media Marketing Strategy: Lessons from the Hospitality Industry
 
Stakeholder Engagement and Public Information Through Social Media: A Study o...
Stakeholder Engagement and Public Information Through Social Media: A Study o...Stakeholder Engagement and Public Information Through Social Media: A Study o...
Stakeholder Engagement and Public Information Through Social Media: A Study o...
 
Dissertation
DissertationDissertation
Dissertation
 
The Effects of Social Media Marketing In the Hotel Industry: Conceptual Model...
The Effects of Social Media Marketing In the Hotel Industry: Conceptual Model...The Effects of Social Media Marketing In the Hotel Industry: Conceptual Model...
The Effects of Social Media Marketing In the Hotel Industry: Conceptual Model...
 
Corporate Communication & Social Media: A study of its usage pattern
Corporate Communication & Social Media: A study of its usage patternCorporate Communication & Social Media: A study of its usage pattern
Corporate Communication & Social Media: A study of its usage pattern
 
Snss & PR Professionals: A Case Study of Facebook PR Groups as a Tool for Bui...
Snss & PR Professionals: A Case Study of Facebook PR Groups as a Tool for Bui...Snss & PR Professionals: A Case Study of Facebook PR Groups as a Tool for Bui...
Snss & PR Professionals: A Case Study of Facebook PR Groups as a Tool for Bui...
 
ANALYSIS OF GAMIFICATION ELEMENTS TO EXPLORE MISINFORMATION SHARING BASED ON ...
ANALYSIS OF GAMIFICATION ELEMENTS TO EXPLORE MISINFORMATION SHARING BASED ON ...ANALYSIS OF GAMIFICATION ELEMENTS TO EXPLORE MISINFORMATION SHARING BASED ON ...
ANALYSIS OF GAMIFICATION ELEMENTS TO EXPLORE MISINFORMATION SHARING BASED ON ...
 
Sohn&kim(2012) attributes of social network services a classification and com...
Sohn&kim(2012) attributes of social network services a classification and com...Sohn&kim(2012) attributes of social network services a classification and com...
Sohn&kim(2012) attributes of social network services a classification and com...
 
Social Media Define the Era in Digital Media
Social Media Define the Era in Digital MediaSocial Media Define the Era in Digital Media
Social Media Define the Era in Digital Media
 
Social network privacy guide
Social network privacy guideSocial network privacy guide
Social network privacy guide
 
Citizen 2.0
Citizen 2.0Citizen 2.0
Citizen 2.0
 
Twitter
TwitterTwitter
Twitter
 

En vedette

Las organizaciones sanitarias. El futuro que viene, Ignacio Riesgo
Las organizaciones sanitarias. El futuro que viene, Ignacio RiesgoLas organizaciones sanitarias. El futuro que viene, Ignacio Riesgo
Las organizaciones sanitarias. El futuro que viene, Ignacio RiesgoIgnacio Riesgo
 
Przemek Mitraszewski - Źródła kryzysów w firmie - 04.02.2015, Warszawa
Przemek Mitraszewski - Źródła kryzysów w firmie - 04.02.2015, WarszawaPrzemek Mitraszewski - Źródła kryzysów w firmie - 04.02.2015, Warszawa
Przemek Mitraszewski - Źródła kryzysów w firmie - 04.02.2015, WarszawaEBMASTERS - employer branding community
 
Gilles Thomas's Symposium: Gilles' biography and achievements
Gilles Thomas's Symposium: Gilles' biography and achievementsGilles Thomas's Symposium: Gilles' biography and achievements
Gilles Thomas's Symposium: Gilles' biography and achievementsSiRIC_Curie
 
Encouraging social skills in children
Encouraging social skills in childrenEncouraging social skills in children
Encouraging social skills in childrenHusnara Ansari
 
Reduce_Today_-_Enduser_Presentation
Reduce_Today_-_Enduser_PresentationReduce_Today_-_Enduser_Presentation
Reduce_Today_-_Enduser_PresentationAxel Jacquemein
 
Requisitos para el éxito de una colaboración público/privada en sanidad en Es...
Requisitos para el éxito de una colaboración público/privada en sanidad en Es...Requisitos para el éxito de una colaboración público/privada en sanidad en Es...
Requisitos para el éxito de una colaboración público/privada en sanidad en Es...Ignacio Riesgo
 
Congratulations – your contract signed! Are you ready?
Congratulations – your contract signed!  Are you ready? Congratulations – your contract signed!  Are you ready?
Congratulations – your contract signed! Are you ready? Steve Olson PMP, MSPM
 
Блокуванні навчальними закладами протиправного контенту в мережі Інтернет
Блокуванні навчальними закладами протиправного контенту в мережі ІнтернетБлокуванні навчальними закладами протиправного контенту в мережі Інтернет
Блокуванні навчальними закладами протиправного контенту в мережі ІнтернетАрина Стороженко
 
What You Had to Know To Land That Job
What You Had to Know To Land That JobWhat You Had to Know To Land That Job
What You Had to Know To Land That Jobancientcluster619
 
«Актуальність Батишевських ідей, щодо комп’ютеризації навчання»
«Актуальність Батишевських ідей, щодо комп’ютеризації навчання»«Актуальність Батишевських ідей, щодо комп’ютеризації навчання»
«Актуальність Батишевських ідей, щодо комп’ютеризації навчання» Арина Стороженко
 
Fast&food restaurant.finishodp
Fast&food restaurant.finishodpFast&food restaurant.finishodp
Fast&food restaurant.finishodpRafacmar
 

En vedette (20)

Train ex China
Train ex ChinaTrain ex China
Train ex China
 
Las organizaciones sanitarias. El futuro que viene, Ignacio Riesgo
Las organizaciones sanitarias. El futuro que viene, Ignacio RiesgoLas organizaciones sanitarias. El futuro que viene, Ignacio Riesgo
Las organizaciones sanitarias. El futuro que viene, Ignacio Riesgo
 
Przemek Mitraszewski - Źródła kryzysów w firmie - 04.02.2015, Warszawa
Przemek Mitraszewski - Źródła kryzysów w firmie - 04.02.2015, WarszawaPrzemek Mitraszewski - Źródła kryzysów w firmie - 04.02.2015, Warszawa
Przemek Mitraszewski - Źródła kryzysów w firmie - 04.02.2015, Warszawa
 
Gilles Thomas's Symposium: Gilles' biography and achievements
Gilles Thomas's Symposium: Gilles' biography and achievementsGilles Thomas's Symposium: Gilles' biography and achievements
Gilles Thomas's Symposium: Gilles' biography and achievements
 
Power of one school
Power of one schoolPower of one school
Power of one school
 
CSC Presentation
CSC PresentationCSC Presentation
CSC Presentation
 
part coverage
part coveragepart coverage
part coverage
 
Encouraging social skills in children
Encouraging social skills in childrenEncouraging social skills in children
Encouraging social skills in children
 
Reduce_Today_-_Enduser_Presentation
Reduce_Today_-_Enduser_PresentationReduce_Today_-_Enduser_Presentation
Reduce_Today_-_Enduser_Presentation
 
Requisitos para el éxito de una colaboración público/privada en sanidad en Es...
Requisitos para el éxito de una colaboración público/privada en sanidad en Es...Requisitos para el éxito de una colaboración público/privada en sanidad en Es...
Requisitos para el éxito de una colaboración público/privada en sanidad en Es...
 
Lake tana finance initiative
Lake tana finance initiativeLake tana finance initiative
Lake tana finance initiative
 
Luis m
Luis mLuis m
Luis m
 
M4 project
M4 projectM4 project
M4 project
 
Congratulations – your contract signed! Are you ready?
Congratulations – your contract signed!  Are you ready? Congratulations – your contract signed!  Are you ready?
Congratulations – your contract signed! Are you ready?
 
Блокуванні навчальними закладами протиправного контенту в мережі Інтернет
Блокуванні навчальними закладами протиправного контенту в мережі ІнтернетБлокуванні навчальними закладами протиправного контенту в мережі Інтернет
Блокуванні навчальними закладами протиправного контенту в мережі Інтернет
 
What You Had to Know To Land That Job
What You Had to Know To Land That JobWhat You Had to Know To Land That Job
What You Had to Know To Land That Job
 
Toscana Residence
Toscana ResidenceToscana Residence
Toscana Residence
 
«Актуальність Батишевських ідей, щодо комп’ютеризації навчання»
«Актуальність Батишевських ідей, щодо комп’ютеризації навчання»«Актуальність Батишевських ідей, щодо комп’ютеризації навчання»
«Актуальність Батишевських ідей, щодо комп’ютеризації навчання»
 
Fast&food restaurant.finishodp
Fast&food restaurant.finishodpFast&food restaurant.finishodp
Fast&food restaurant.finishodp
 
Th P3 09
Th P3 09Th P3 09
Th P3 09
 

Similaire à Mining Social Media Issues

The Importance Of Aggregation
The Importance Of AggregationThe Importance Of Aggregation
The Importance Of AggregationRikki Wright
 
Event Detectionification And Event Identification
Event Detectionification And Event IdentificationEvent Detectionification And Event Identification
Event Detectionification And Event IdentificationCarli Ferrante
 
Sm comms slides awaz
Sm comms slides awazSm comms slides awaz
Sm comms slides awazLasa UK
 
Social media's influence in purchase decisions
Social media's influence in purchase decisionsSocial media's influence in purchase decisions
Social media's influence in purchase decisionsAnup Nair
 
The Social Network Essay
The Social Network EssayThe Social Network Essay
The Social Network EssayChelsea Porter
 
Social media and small business
Social media and small businessSocial media and small business
Social media and small businessSmallBizUp
 
CUSTOMER PERCEPTION TOWARDS SOCIAL MEDIA ADVERTISING
CUSTOMER PERCEPTION TOWARDS SOCIAL MEDIA ADVERTISINGCUSTOMER PERCEPTION TOWARDS SOCIAL MEDIA ADVERTISING
CUSTOMER PERCEPTION TOWARDS SOCIAL MEDIA ADVERTISINGSuyash Jain
 
www.mra-net.orgalert .docx
 www.mra-net.orgalert                                        .docx www.mra-net.orgalert                                        .docx
www.mra-net.orgalert .docxMARRY7
 
Using Tumblr, The Radio, And The Papers App
Using Tumblr, The Radio, And The Papers AppUsing Tumblr, The Radio, And The Papers App
Using Tumblr, The Radio, And The Papers AppKris Cundiff
 
Sm comms slides tech try
Sm comms slides tech trySm comms slides tech try
Sm comms slides tech tryLasa UK
 
WHAT IS SOCIAL MEDIA.pdf
WHAT IS SOCIAL MEDIA.pdfWHAT IS SOCIAL MEDIA.pdf
WHAT IS SOCIAL MEDIA.pdfwasim792942
 
Social Network Research Proposal
Social Network Research ProposalSocial Network Research Proposal
Social Network Research ProposalHeather Edwards
 
Social Media: Strategic Shift or Tactical Tool?
Social Media: Strategic Shift or Tactical Tool?Social Media: Strategic Shift or Tactical Tool?
Social Media: Strategic Shift or Tactical Tool?craig lefebvre
 
Webinar: Building a Case for Social Media
Webinar: Building a Case for Social MediaWebinar: Building a Case for Social Media
Webinar: Building a Case for Social MediaHHS Digital
 
Social Media for Support Organisations
Social Media for Support OrganisationsSocial Media for Support Organisations
Social Media for Support OrganisationsLasa UK
 

Similaire à Mining Social Media Issues (20)

The Importance Of Aggregation
The Importance Of AggregationThe Importance Of Aggregation
The Importance Of Aggregation
 
Diveristy
DiveristyDiveristy
Diveristy
 
Event Detectionification And Event Identification
Event Detectionification And Event IdentificationEvent Detectionification And Event Identification
Event Detectionification And Event Identification
 
Dissertation Proposal MBA
Dissertation Proposal MBADissertation Proposal MBA
Dissertation Proposal MBA
 

Mining Social Media Issues

  • 1. 1 Dissertation Report on COMMUNITY DETECTION IN SOCIAL MEDIA AS PARTIAL FULFILMENT OF MASTER OF COMPUTER APPLICATION SEMESTER-5 BY BHAGYASHRI MANANI (Enroll# 105090693093) TANVI SHARMA (Enroll# 105090693070) UNDER GUIDANCE OF Dr. SONAL JAIN SUBMITTED TO GLS INSTITUTE OF COMPUTER TECHNOLOGY, GUJARAT TECHNOLOGICAL UNIVERSITY
  • 2. 2 Acknowledgment
    First, we would like to thank Dr. Sonal Jain, our internal guide. She introduced us to the field of community detection in social media and provided guidance on every query throughout the dissertation. We thank her for her valuable suggestions and for showing us how to do research. Without her guidance and support, this research could not have been completed. We would also like to thank Dr. Harshal Arolkar, Dr. Devarshi Mehta and Dr. Jyotika Doshi for their valuable suggestions, reviews and help.
    Abstract
    Social media provide many features, such as online chat, online discussion, online communities, advertising and marketing. They also raise problems such as community detection, influence maximization, message propagation and social media monitoring. In this research we focus on community detection, one of the major issues in mining social media. We study and discuss the existing algorithms used for community detection and compare them. These algorithms use graph theory concepts to detect communities on the web. We also discuss the limitations of the existing algorithms and propose a solution.
  • 3. 3 Contents
    1. Introduction (p. 5)
        1.1 What is Data Mining? (p. 5)
        1.2 Need for Data Mining in Social Media (p. 7)
        1.3 Research Goal (p. 11)
    2. Elements of Social Media (p. 13)
        2.1 Profile (p. 13)
        2.2 Member (p. 13)
        2.3 Group (p. 14)
        2.4 Discussion (p. 15)
        2.5 Blogs (p. 15)
        2.6 Widgets (p. 18)
    3. Issues of Mining Social Media (p. 20)
        3.1 Community Detection (p. 20)
        3.2 Influence Maximization (p. 23)
        3.3 Message Propagation (p. 24)
        3.4 Monitoring (p. 25)
        3.5 Social CRM (Customer Relationship Management) (p. 27)
    4. Algorithms for Community Detection (p. 29)
        4.1 Vertex Base Community Detection (p. 30)
            4.1.1 Bron–Kerbosch Algorithm (p. 30)
            4.1.2 Clique-Percolation Method (p. 34)
        4.2 Edge Base Community Detection (p. 36)
            4.2.1 Girvan–Newman Algorithm (p. 36)
    5. Conclusion and Proposed Solution (p. 38)
        5.1 Comparative Analysis of Algorithm (p. 38)
        5.2 Limitation of Existing Algorithm (p. 39)
        5.3 Proposed Solution (p. 41)
        5.4 Future Work (p. 43)
    References (p. 43)
  • 4. 4 List of Tables:
    Table 5.1: Comparative Analysis of Algorithm (p. 38)
    Table 5.2: Edge Content Base Community Detection Example (p. 41)
    List of Figures:
    Figure 1.1: Social Media Websites (p. 7)
    Figure 2.1: Profile of Social Media Website (p. 13)
    Figure 2.2: Member of Social Media Website (p. 14)
    Figure 2.3: Group of Social Media Website (p. 15)
    Figure 2.4: Personal Blogs (p. 16)
    Figure 2.5: Media Blogs (p. 17)
    Figure 2.6: Widgets used in Social Media (p. 18)
    Figure 3.1: Example of Community on Website (p. 20)
    Figure 3.2: Twitter Facilities for Monitoring (p. 26)
    Figure 3.3: Traditional v/s Social CRM System (p. 28)
    Figure 4.1: Clique Graph (p. 30)
    Figure 4.2: Undirected Graph G (p. 32)
    Figure 4.3: CPM Graph (a) (p. 34)
    Figure 4.4: CPM Graph (b) (p. 34)
    Figure 5.1: Content of Edge Example (p. 40)
    Figure 5.2: Edge Content Base Community Detection (p. 42)
  • 5. 5 Chapter 1 Introduction
    As Internet access has grown, more and more people depend on web services. They search for or publish information, download music and movies, play games, use social networking websites to interact with friends and family, shop online, and even pay bills over the Internet. With the progress of World Wide Web technologies, more and more data is available online. Web data covers a wide range of fields such as government, sports, entertainment, commerce, and health and lifestyle. The availability of a vast amount of web data does not mean that users can easily get whatever they want. As more data becomes available on the web, finding the desired information takes more time and effort. It has been observed that 99% of the data accessible on the web is not useful for 99% of users [1]. This massive volume of web data calls for techniques that find the useful knowledge hidden in it.
    1.1 What is Data Mining?
    Data is the facts of the world. Descriptions of a person (gender, height, weight, color, name, age, education), an animal (category, size, name, weight, age), a mobile phone (height, width, color, company, price) or a country (name, population, area, number of states) can be stored and are known as data. For example, a student database contains the following data:
    - Name as "Janki"
    - Gender as "girl"
    - Result as 60%
    - Attendance as 90%
    - Year as "1st"
  • 6. 6 Information is filtered, meaningful and relevant data. For example:
    - "The student named Janki got 60% in B.C.A." is information.
    - "70% of the 3rd-year B.C.A. students got distinction" is information.
    - "Sachin got the highest percentage in 3rd-year B.C.A." is information.
    Knowledge is information processed in the mind of an individual. In other words, knowledge is the state or fact of knowing; it is understanding gained through experience or study. For example:
    - From the monthly attendance report and the exam result report, a teacher applies knowledge and experience to decide whether a student's performance is average, good or excellent.
    Data mining is commonly defined as the process of discovering useful patterns or knowledge from data sources such as databases, texts, images and the Web. Web data mining is the application of data mining techniques to web data. It bridges the gap between data and knowledge and is designed to extract useful, hidden knowledge from the massive "garbage" data available on the web [2, 3]. Data mining has many applications in market basket analysis, fraud detection, profiling, risk management, e-commerce, web analysis and many other fields.
  • 7. 7 1.2 Need for Data Mining in Social Media
    What is Social Media?
    Online social networking commonly takes place on websites known as social sites. A social media website is like an online community of Internet users. Members of an online community can share common interests in hobbies, religion, politics, lifestyle and so on. Using social media websites, people can share text, photos, audio, video and information. Platforms like Twitter, Facebook and LinkedIn have created online communities where people can share as much or as little personal information as they want with other members. Once you are granted access to a social networking website, you are a member of that site and can use it to interact with other members. A member can share views, thoughts, videos and images, update their status, communicate with other members, comment on others' views and statuses, join groups, invite other members to events, and read the profile pages of other members.
    Figure 1.1: Social Media Websites
  • 8. 8 Need for Data Mining in Social Media
    Social media can be used to learn about current trends, opinions and influencers. Information gathered from social networking sites can be used for the following purposes:
    - To improve content marketing by better understanding customers' opinions
    - To learn what is most relevant regarding your products, your brand or even your entire business area
    - To know who the key influencers are
    - To know who the intended customers for your product are
    This allows you to identify people who are interested in your product or content, to find ways of reaching out to them, to create content that attracts them, and to follow up with them. For example, Facebook will be able to sell its data to companies that want to understand the market. Facebook has the demographic and geographic data in place and only needs to sell access to it.
    Why use Social Media for Marketing?
    The number of social media users is increasing day by day, and users are joining more and more social media communities. Social media come with many features and advantages, some of which are listed below.
  • 9. 9
    - Communicate with customers: Social media allow sellers to reach prospective customers, and allow customers to reach a particular advertisement, website or business employee. Social media is a two-way process: it lets marketing or technical staff chat with customers and answer any questions they might have. When the time comes to buy the product, customers can feel that they have a friend in the business.
    - Word of mouth: Social media take word-of-mouth marketing to a new level. When your fans follow or interact with your page, all of their friends see those interactions. With every interaction, comment and discussion you open up your brand to hundreds or thousands of prospective community members. Happy customers can also tell their friends on social media directly about their good experience with you. They can amplify positive chatter about your business and create a positive atmosphere for your brand.
    - Customer loyalty: By engaging your customers through social media, you have the opportunity to reward your loyal fans and generate repeat business. By building and maintaining these relationships, you can build customer loyalty and satisfaction that reward your business further.
    - Feedback: Knowing where you are succeeding and failing can mean everything in business. Social media let you directly estimate what works with your fans and what doesn't, and allow you to address negative feedback quickly.
  • 10. 10 Examples of Social Media used for Marketing
    - Twitter: Twitter allows companies to promote products on an individual level. The use of a product can be explained in short messages that followers are more likely to read. These messages appear on followers' home pages. Messages can link to the product's website, Facebook profile, photos, videos and so on. Such links give followers the opportunity to spend more time interacting with the product online. This interaction can create a loyal connection between product and individual and can also lead to larger advertising opportunities. Twitter promotes a product in real time and brings customers in.
    - Facebook: Facebook profiles are more detailed than Twitter profiles. They allow a product to provide videos, photos and longer descriptions. Videos can show when a product can be used as well as how to use it. Profiles can also include testimonials, since followers can comment on the product pages for others to see. Facebook can link back to the product's Twitter page and send out event reminders.
    - Blogs: Every day there are more reasons for companies to add blogging platforms to their social media repertoire. A platform like LinkedIn creates an environment in which companies and clients can connect online. Companies that recognize the need for information, originality and accessibility employ blogs to make their products popular and unique, and ultimately to reach out to consumers who are privy to social media. Blogs allow a product or company to provide longer descriptions of products or services. The longer description can include reasoning and uses. It can include testimonials and can link to and from Facebook, Twitter and many other social network and blog pages. Blogs can be updated frequently and are a promotional technique for keeping customers.
  • 11. 11 Online communities benefit businesses because they enable them to reach the clients of other businesses using the platform. These online environments can be accessed by virtually anyone; consumers are therefore invited to be a part of the creative process.
    Issues in mining Social Media
    Mining the content of social media, or analyzing social networking data, is becoming a major part of online business. The issues of mining social media are:
    - Community detection, which deals with how to detect communities in a social network
    - Influence maximization, the problem of finding the people who act as influencers in a large social network
    - Message propagation, which analyzes the patterns or keywords of messages that spread in a very short time
    - Social Customer Relationship Management, whose goal is to strengthen relationships with customers through more meaningful interactions
    - Social media monitoring
    1.3 Research Goal
    We have seen the need for mining social media. More and more businesses run through websites, whether for online selling of products, books, music CDs, movie tickets, railway or airline tickets, or hotel bookings. More and more people are now members of different social networking sites. By mining social media, a business can learn about current trends, customers' interests, and customers' opinions of products and services. This information can be used to reach interested customers more efficiently for advertising and marketing.
  • 12. 12 We focus on community detection, which is one of the issues of mining social media. The goal of our research is to analyze existing algorithms for community detection. These algorithms use graph theory concepts to detect communities: the entire social network is represented as a graph in which each node is an actor or member of the community and each edge between a pair of nodes is a connection between those members. We try to find the limitations of the existing algorithms and propose a solution.
  • 13. 13 Chapter 2 Elements of Social Media
    Social networking is based on a certain structure that allows people to communicate and share information with each other. This structure includes profiles, friends, blog posts, widgets, and usually something unique to the particular social networking website, such as the ability to 'poke' people on Facebook or high-five someone on Hi5. The following sections discuss the elements of social media.
    2.1 Profile
    This is where you tell the world about yourself. Profiles contain basic information, such as where you live, how old you are, religious views, contact details, educational background, job or business details, relationship status and profile picture, as well as personality questions, such as who your favorite actor or politician is and what your favorite book is.
    Figure 2.1: Profile of Social Media Website
  • 14. 14 2.2 Members
    Members are trusted people on the site who are allowed to view your profile content (images, video, status), post comments on it, or send you private messages. You can also see updates on how the members added to your account are using the social networking site, such as when they post a new picture or update their profile. Members are the heart and soul of social networking. Facebook calls them 'friends', LinkedIn refers to them as 'connections', and Twitter refers to them as 'followers', who can reply to your tweets. All social networks treat members as trusted people.
    Figure 2.2: Member of Social Media Website
  • 15. 15 2.3 Groups
    Most social networks use groups to help you find people with similar interests. Groups are both a way to connect with like-minded people and a way to identify your interests. For example, students of HLICA College's 2010 batch can create a group and discuss topics such as the exam syllabus, technical events and the exam schedule, as well as queries and their solutions.
    2.4 Discussions
    A primary purpose of groups is to create interaction between users in the form of discussions. Most social networking websites support discussion boards for groups, and many also allow members of the group to post pictures, music, video clips and other tidbits related to the group.
    Figure 2.3: Group of Social Media Website
  • 16. 16 2.5 Blogs
    Another feature of some social networks is the ability to create your own blog entries. A blog is a discussion published on the World Wide Web and consisting of entries ("posts") typically displayed in reverse chronological order (the most recent post appears first). Good-quality blogs are interactive, allowing visitors to leave comments and even message each other via GUI widgets on the blog, and this interactivity distinguishes them from static websites. In that sense, blogging can be seen as a form of social networking. A blog post is like an article, a news item or an opinion on some point, and other members can comment on it with their own views.
    - Personal Blogs: The personal blog, an ongoing diary or commentary by an individual, is the most common kind of blog. Some sites, such as Twitter, allow bloggers to share thoughts and feelings instantaneously with friends and family, much faster than emailing or writing. On Facebook this is known as a status update.
    Figure 2.4: Personal Blogs
  • 17. 17
    - Corporate and Organizational Blogs: A blog can be private, as in most cases, or it can be for business purposes. Blogs used internally to enhance communication in a corporation, or externally for marketing, branding or public relations, are called corporate blogs. Similar blogs for clubs and societies are called club blogs, group blogs, or similar names; they are typically used to inform members and other interested parties of club and member activities. For example, a member of a Facebook group can post views, news, updates and articles to the group, and other members can reply to the post.
    - Media Blogs: Blogs with shorter posts and mixed media types are called media blogs. For example, Yahoo publishes articles on the latest news about celebrities, lifestyle, business and technology, and people can comment on them.
    Figure 2.5: Media Blogs
  • 18. 18 2.6 Widgets A popular way of letting your personality shine through is by gracing your social networking profile with web widgets. Many social networks allow a variety of widgets, and you can usually find interesting widgets located on widget galleries. Figure 2.6: Widgets used in Social Media
  • 19. 19 Basic Widgets for Social Website or Blog
    - Photo Badge: Allows you to share your Facebook photos on websites and blogs. You can choose a vertical, horizontal or two-column layout and the number of photos to display.
    - Profile Badge: Lets you share selected Facebook, Twitter or LinkedIn profile information on your website. A profile badge allows your users to easily connect with you and add you as a friend.
    - Like Box: Allows your users to publish their content and activity.
    - Share Button: This powerful widget allows your visitors to share your content (image, video, article, etc.).
    - Comments Box: Allows members to comment or post on website content.
  • 20. 20 Chapter 3 Issues of Mining Social Media
    Social media provide useful services such as online chat, sharing of videos and images, online games and online communities, and also serve as an effective tool for advertising and marketing. Given these many features, mining social media is essential, but it is not easy work. It comes with issues such as community detection, influence maximization, message propagation, social media monitoring and mining customer relationships.
    3.1 Community Detection
    What is community?
    As we have seen, online social networks such as Twitter and Facebook are rapidly gaining popularity. Social network analysis is therefore becoming a very important research field. One major topic in social network analysis is the study of communities in social networks, which advertising and marketing use to identify target groups.
    Figure 3.1: Example of Community on Website
  • 21. 21 A virtual community is a social network of individuals who communicate with each other through particular social media, crossing geographical and political boundaries in pursuit of mutual interests or goals. A community is a large collection of individuals who interact unusually frequently with each other and share properties such as common hobbies or occupations. The word "community" has been adopted by various social networking sites. A social network community answers questions such as the following:
    - Who knows whom?
    - Who knows what?
    - Who can do what?
    - Who looks for what?
    - Who offers what?
    It provides a wealth of information to its members about other people and allows them to manage friends and business partners in an effective environment.
    What is Community Detection?
    Having social media accounts for your business and creating posts for them is not enough. You need to check whether your posts carry the right message and address your target audience. Effective advertising requires finding the right community. Community detection is a distinct field whose goal is to detect communities within networks. It tries to answer the question: when should people be considered close enough to be in the same community? In the community detection problem, the goal is to detect communities in real-world graphs such as large social networks, web graphs and biological networks by partitioning the network into dense regions of the graph. Such dense
  • 22. 22 regions typically correspond to entities that are closely related and can hence be said to belong to a community [8, 9]. Determining such communities is useful in a variety of applications in social network analysis, including customer segmentation, recommendations and influence analysis. As a result, a considerable amount of research has been devoted to algorithms for solving this problem.
    Community Detection for Advertisement
    Social media software enables anyone, without knowledge of coding, to post, comment on, share or mash up content, and to form communities around shared interests. Social media communities are growing at an exponential rate and represent a huge potential market for advertising and marketing. The most well-known social media communities are LinkedIn, Facebook, Twitter and YouTube, along with blog sites.
    Social Media Optimization
    This refers to the use of a number of social media outlets and communities to generate publicity and increase awareness of a product, brand or event. Community detection is an important problem in social networking because it makes the content or posts available to the right audience.
  • 23. 23 3.2 Influence Maximization
    Influence maximization is the problem of finding the people who act as influencers [2]. For example, a small company develops a cool online application for an online social network and wants to market it through the same network. It has a limited budget, so it can select only a small number of initial users in the network to use the application (by giving them gifts or payments). The company hopes that these initial users will love the application and start influencing their friends on the social network to use it, that those friends will influence their own friends, and so on, so that through the word-of-mouth effect a large population in the social network adopts the application. The problem is whom to select as the initial users so that they eventually influence the largest number of people in the network. This problem, referred to as influence maximization, is of interest to many companies and individuals that want to promote their products, services and innovative ideas through the powerful word-of-mouth effect (also called viral marketing).
    As another example, Topsy analyzed the Twitter reaction to the bin Laden raid last year [7]. The analysis began with one person tweeting from Pakistan and looked at the exposure he received over time. Within the first eight hours of the raid, the Pakistani Twitter user reached around 100,000 exposures. Then someone in the U.S. media, the influencer in this case, found the initial tweets and retweeted them, and less than one day later the Pakistani Twitter user had reached 90 million exposures. After the influencer retweeted the message, large numbers of followers also retweeted it, increasing the amplification of that particular tweet.
  • 24. 24 Influence maximization involves the following mining or technical questions:
    - Who served as an influencer and was able to amplify the message?
    - How many followers do they have?
    - Do they get responses?
    - How many external links point to their blog?
    - How many comments do their blog posts attract?
    - How did the exposure increase with each amplification?
    - How fast is the message trending?
    - What is the positive and negative sentiment?
    Once this analysis has uncovered the influencers, you want to be able to reach out to those key experts and to monitor them to find out what they are saying, including whether they are saying good or bad things about your brand. You even want to find out to whom they are talking.
    3.3 Message Propagation
    Social websites including Facebook, Twitter and LinkedIn allow users to construct a personal profile, share interesting information with other people, and build relationships within a community. The mode of interaction on social websites is affecting people's social behavior and consumer habits. Although many marketing techniques may be used to spread information over a social network, the target consumers should be defined, and suitable messages should be broadcast to them within a certain time period. Consequently, enterprises need a tool to analyze message propagation behavior across different combinations of community and time dimensions. Message propagation is the problem of finding messages, identified by some pattern or keywords, that spread quickly.
  • 25. 25 To make message propagation clearer, we describe research done by Shaozhi and Felix on twitter.com [3]. They collected and analyzed a large data set from the Twitter social network for the following event: in June 2009, the news of Michael Jackson's death spread all over the world, and many online social networks were flooded with messages related to this breaking event. They started collecting related messages from Twitter.com on June 27th, 2009, two days after the tragedy. Among all the crawled messages, the tweets containing "Michael Jackson" or "MJ" were selected. After removing the noise, 549,667 MJ-related messages posted by 305,035 users were found. Of these, 548,102 messages were posted after June 25, 2009. The following fields of each message need to be analyzed to understand how a message propagates:
    - User id: Unique identifier of the user who posted the message.
    - Id: The message ID, which is unique among messages posted by the same user. Two messages posted by different users may share the same message ID.
    - Text: The content of the message.
    - Created at: The creation time of the message.
    - Source: The client software used to post the message (for example Twitter, Facebook or Yahoo blogs).
    - In reply to status id: The ID of the message that this message replies to.
    - In reply to user id: The ID of the user that this message replies to.
    3.4 Social Media Monitoring
    Social media monitoring is about listening to the discussions that take place around your brand in order to find out people's different views. It is a very important tool for both a social media crisis plan and a marketing plan.
  • 26. 26 Here is a good example of social media monitoring: "Last week I was watching television and saw an interesting advertisement for something called the Total Bib. It kind of made me chuckle, which caused me to tweet something like 'Total Bib reminds me of something out of a Saturday Night Live sketch'. The tweet received a few laughs and comments from followers. A few hours later, I received a reply tweet from TotalBib thanking me for the mention. I was pretty amazed, since I was not following them previously; they were simply monitoring the stream. They took the time and made the effort to do some simple monitoring of the Twitter stream to identify an opportunity."
    Social media monitoring involves text mining for specific keywords on social networking websites, blogs, discussion forums and other social media. Essentially, monitoring software converts specific words or phrases in unstructured data into numerical values. The numerical values are linked to structured data in a database, allowing the data to be analyzed with traditional data mining techniques.
    Figure 3.2: Twitter facilities for Monitoring
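The conversion of keywords in unstructured text into numerical values can be sketched as a simple keyword count. This is only an illustrative sketch: the function name `keyword_counts`, the sample posts and the keyword list are assumptions made for this example and do not come from any particular monitoring product.

```python
import re

def keyword_counts(posts, keywords):
    """Count case-insensitive whole-word occurrences of each keyword across posts."""
    counts = {k: 0 for k in keywords}
    for text in posts:
        lowered = text.lower()
        for k in keywords:
            # \b anchors keep "sketch" from matching inside "sketchy".
            pattern = r"\b" + re.escape(k.lower()) + r"\b"
            counts[k] += len(re.findall(pattern, lowered))
    return counts

# Hypothetical sample posts, loosely based on the Total Bib anecdote.
posts = [
    "Total Bib reminds me of something out of a Saturday Night Live sketch",
    "Just saw the Total Bib ad again",
    "Nothing about the brand here",
]
result = keyword_counts(posts, ["total bib", "sketch"])
print(result)
```

The resulting counts are ordinary numbers, so they can be stored next to structured data (user, time, source) and analyzed with the usual data mining tools.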
  • 27. 27 What are the Needs for Social Media Monitoring?
    - To learn of negative criticism of your brand, which you can then respond to, turning an unhappy customer into a lifelong brand advocate.
    - To learn of the positive comments people make about your brand, giving you the opportunity to connect further with those individuals.
    - To detect a rising social media crisis before it builds up and begins to spiral out of control.
    3.5 Social CRM Systems
    Social CRM is a strategy based around customer engagement and interactions, with transactions being a by-product. Social CRM is an extension of CRM. It refers to a back-end process and system for managing customer relationships, and its meaning varies between organizations. Social CRM is about trying to understand customers' problems with a product or service and then solving them. Traditional CRM was very much based around the data and information that brands could collect on their customers, all of which would go into a CRM system that then allowed the company to better target various customers.
  • 28. 28 In social CRM, the customer is the focal point of how an organization operates. Instead of marketing or pushing messages to customers, brands now talk to and collaborate with customers to solve business problems, empower customers to shape their own experiences, and build customer relationships, in the hope of turning customers into advocates. PR now has a very active role in social CRM (in fact, PR typically owns budgetary control and authority over social initiatives ahead of every other department). In most organizations, PR departments manage the social presence of brands and handle customer engagement.
    Figure 3.3: Traditional v/s social CRM System
  • 29. 29 Chapter 4 Algorithms for Community Detection
    An important problem in the area of social networking is community detection. In the community detection problem, the goal is to partition the network into dense regions of the graph. Such dense regions typically correspond to entities that are closely related and can hence be said to belong to a community. Community detection in social networking sites has been studied broadly because of its importance in social networking applications. Before discussing the algorithms in detail, we introduce some notation.
    G = (V, E): the social network graph
    V: the vertex set; each vertex in V corresponds to an actor in the network
    E: the edge set; each edge corresponds to a relationship between a pair of actors
    We consider two kinds of methods for community detection:
    - Node (vertex) base community detection
        o Bron–Kerbosch algorithm
        o Clique Percolation Method (CPM) algorithm
    - Link (edge) base community detection
        o Girvan–Newman algorithm
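The notation G = (V, E) maps directly onto an adjacency-set structure. The following is a minimal sketch in Python; the helper name `build_graph` and the member names in the edge list are hypothetical and serve only to illustrate the representation.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected graph as a dict mapping each vertex to its neighbor set."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)  # a relationship is symmetric,
        adj[b].add(a)  # so store it in both directions
    return dict(adj)

# Hypothetical actors and relationships (E); V is implied by the edges.
edges = [("asha", "ravi"), ("ravi", "meena"), ("asha", "meena"), ("meena", "kiran")]
graph = build_graph(edges)

vertices = set(graph)                                      # V
edge_count = sum(len(n) for n in graph.values()) // 2      # |E|
print(sorted(vertices), edge_count)
```

Storing neighbors as sets keeps the intersection and difference operations used by clique-based algorithms cheap and readable.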
  • 30. 4.1 Vertex-Based Community Detection
    In node-based community detection, the nodes of a group that all satisfy some property form a community. In sociology, a clique describes a group of 2 to 12 (averaging 5 or 6) persons who interact with each other more regularly and intensely than with others in the same setting. In graph terms, a clique is a sub-graph in which all nodes are adjacent to each other, and a maximal clique is a clique that cannot be extended by adding another node. In Figure 4.1, nodes {5, 6, 7, 8} form a maximal clique. In an overlapping community structure, a node can be a member of more than one community.
    4.1.1 Bron–Kerbosch Algorithm
    The Bron–Kerbosch (BK) algorithm is used here for non-overlapping community structure on undirected graphs. It uses the recursive backtracking paradigm to enumerate all maximal cliques in the graph [6].
    Figure 4.1: Clique Graph
  • 31. Algorithm:
    - Maximal cliques can be found using the Bron–Kerbosch algorithm.
    - At any given point in time it maintains three sets, R, P and X.
    - The set R contains vertices that form a maximal clique or can be extended to a maximal clique.
    - The set P contains vertices that are connected to all vertices in R and can be added to R to make a larger clique.
    - The set X contains vertices that are connected to all vertices in R but are excluded from being added to R, because all cliques containing vertices in X have already been enumerated in a different recursion cycle.
    - N(v) is the set of neighbors of vertex v.
    Pseudo Code:
    BronKerbosch(R, P, X):
        if P and X are both empty:
            report R as a maximal clique
        choose a pivot vertex u in P ⋃ X
        for each vertex v in P \ N(u):
            BronKerbosch(R ⋃ {v}, P ⋂ N(v), X ⋂ N(v))
            P := P \ {v}
            X := X ⋃ {v}
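The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `bron_kerbosch`, `build_graph` and `EDGES` are hypothetical, and the edge list is an assumption chosen so that vertex 2 has neighbors {1, 3, 5} and vertex 4 has neighbors {3, 5, 6}.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Turn an undirected edge list into an adjacency-set dictionary."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bron_kerbosch(graph: Graph) -> List[Set[int]]:
    """Enumerate all maximal cliques using the pivoting variant."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append(set(r))  # R cannot be extended: maximal clique
            return
        # Pivot u: the vertex of P ∪ X with the most neighbors inside P.
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)  # v has been fully explored ...
            x.add(v)     # ... so later cliques must not contain it again

    expand(set(), set(graph), set())
    return cliques


# Assumed edge list for a small six-vertex example graph.
EDGES = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
maximal_cliques = bron_kerbosch(build_graph(EDGES))

# Keep only cliques with more than two nodes as communities.
communities = [c for c in maximal_cliques if len(c) > 2]
```

Sets are used for R, P and X so that the intersections with N(v) in the recursive call map directly onto Python's `&` operator.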
  • 32. Figure 4.2: Undirected Graph G
    Example:
    - Initially the three sets are R = Ø, P = {1, 2, 3, 4, 5, 6} and X = Ø.
      o Select as pivot u a node of maximum degree, here u = 2 (degree 3).
      o The neighbors of u are N(u) = {1, 3, 5}.
      o P \ N(u) = {2, 4, 6} (the vertices that are elements of P but not of N(u)), so the outer loop runs for v = 2, 4 and 6.
    - The iteration for v = 2 makes a recursive call with R = {2}, P = {1, 3, 5} and X = Ø. Within this call, vertex 1 or 5 is chosen as pivot, and the loop runs for vertex 3 and for whichever of 1 and 5 is not the pivot. Taking 1 as the pivot:
      o For v = 3: R = {2, 3}, P = Ø, X = Ø, so {2, 3} is reported.
      o For v = 5: R = {2, 5}, P = {1}, X = Ø; then for v = 1: R = {1, 2, 5}, P = Ø, X = Ø, so {1, 2, 5} is reported.
      Vertex 2 is then moved from P to X.
    - The iteration for v = 4 (degree 3) makes a recursive call with R = {4}, P = {3, 5, 6} and X = Ø (although vertex 2 belongs to X in the outer call, it is not a neighbor of vertex 4 and is excluded from the subset of X passed to the recursive call).
      o For v = 3: R = {3, 4}, P = Ø, X = Ø.
      o For v = 5: R = {4, 5}, P = Ø, X = Ø.
      o For v = 6: R = {4, 6}, P = Ø, X = Ø.
      Each of these is reported as a maximal clique, and vertex 4 is then moved from P to X.
    - In the final iteration, for v = 6, the recursive call has R = {6}, P = Ø and X = {4}: vertex 4 has already been processed, so it is in X rather than in P. Because P is empty and X is not, nothing is reported.
  • 33. BronKerbosch(Ø, {1,2,3,4,5,6}, Ø)
        BronKerbosch({2}, {1,3,5}, Ø)
            BronKerbosch({2,3}, Ø, Ø): output {2,3}
            BronKerbosch({2,5}, {1}, Ø)
                BronKerbosch({1,2,5}, Ø, Ø): output {1,2,5}
        BronKerbosch({4}, {3,5,6}, Ø)
            BronKerbosch({3,4}, Ø, Ø): output {3,4}
            BronKerbosch({4,5}, Ø, Ø): output {4,5}
            BronKerbosch({4,6}, Ø, Ø): output {4,6}
        BronKerbosch({6}, Ø, {4}): no output
    - The maximal cliques can be used to define communities in several ways. The simplest is to consider only maximal cliques bigger than a minimum size (here, more than 2 nodes).
      o Community: {1,2,5}
    Drawbacks:
    - The Bron–Kerbosch algorithm for finding cliques in a network is very costly: its worst-case running time is O(3^(n/3)) for n nodes, which is impractical for large-scale networks.
    - Overlapping community structure, that is, a node being part of more than one community, is not supported.
    Application:
    - The union of these cliques defines a sub-graph whose components (disconnected parts) then define communities. Such approaches are often implemented in social network analysis software. UCINET is a software
  • 34. package for community detection in social networks that uses this algorithm to detect communities. It was developed by Lin Freeman, Martin Everett and Steve Borgatti.
    - URL of UCINET: https://sites.google.com/site/ucinetsoftware/home
    4.1.2 Clique Percolation Method
    Clique percolation is a community detection method developed by Gergely Palla and co-authors in 2005 [7]. The Clique Percolation Method is a popular approach for analyzing the overlapping community structure of networks.
    Algorithm:
    - Find all cliques of size k (here k = 3) in the given network.
    - Construct a clique graph in which each clique is a node.
    - Two cliques are adjacent if they share k-1 (here k-1 = 2) nodes.
    - Each connected component in the clique graph forms a community.
    Example:
    - Find the cliques of size 3. Here they are {1,2,3}, {1,3,4}, {4,5,6}, {5,6,7}, {5,7,8}, {5,6,8}, {6,7,8}.
    - Construct a clique graph that links only those cliques which are adjacent, that is, which share k-1 = 2 nodes.
    Figure 4.3: CPM graph (a)
    Figure 4.4: CPM graph (b)
  • 35. - Each connected component in the clique graph forms a community.
    - Communities detected:
      o {1,2,3,4}
      o {4,5,6,7,8}
    Advantage:
    - It is not too restrictive (unlike cliques, which require each node to be connected to all other nodes).
    - It allows overlaps: (a) a node can be a member of several different communities at the same time, and (b) communities can overlap with each other by sharing nodes.
    Drawback:
    - Not all nodes of the graph can participate in a k-clique community. For example, a leaf node is always left out of every community.
    - A suitable clique size k has to be chosen in advance.
    Applications:
    - CFinder is free software for finding communities in networks, based on the Clique Percolation Method (CPM) developed by Palla.
    - URL of CFINDER: http://www.cfinder.org
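The clique percolation steps of Section 4.1.2 can be sketched in Python as follows. The input is the list of size-3 cliques from the example; the function name `clique_percolation` and the union-find bookkeeping are illustrative choices, not part of CFinder, and the sketch assumes the k-cliques have already been found.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set


def clique_percolation(cliques: Iterable[Iterable[int]], k: int) -> List[Set[int]]:
    """Merge k-cliques that share k-1 nodes; return one node set per community."""
    nodes = [frozenset(c) for c in cliques]
    parent = list(range(len(nodes)))  # union-find over clique indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Two cliques are adjacent in the clique graph if they share k-1 nodes.
    for i, j in combinations(range(len(nodes)), 2):
        if len(nodes[i] & nodes[j]) >= k - 1:
            parent[find(i)] = find(j)

    # Each connected component of the clique graph becomes one community.
    groups: Dict[int, Set[int]] = {}
    for i, clique in enumerate(nodes):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=min)


# Size-3 cliques taken from the example.
K3_CLIQUES = [{1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7}, {5, 7, 8}, {5, 6, 8}, {6, 7, 8}]
cpm_communities = clique_percolation(K3_CLIQUES, k=3)
```

Node 4 ends up in both communities, which illustrates the overlap that the method allows.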
  • 36. 4.2 Link-Based (Edge) Community Detection
    Girvan–Newman Algorithm
    The Girvan–Newman algorithm (named after Michelle Girvan and Mark Newman) is one of the methods used to detect communities in complex systems. The algorithm is based on the betweenness of edges [5]. Betweenness is a centrality measure of a vertex or an edge within a graph; here it is used as the weight of an edge. Communities are detected by progressively removing edges from the original graph, rather than by adding the strongest edges to an initially empty network.
    Algorithm: The betweenness of a vertex v in a graph G = (V, E) is computed as follows:
    1. For each pair of vertices (s, t), compute the shortest paths between them.
    2. For each pair of vertices (s, t), determine the fraction of shortest paths that pass through v.
    3. Sum this fraction over all pairs of vertices (s, t).
    $$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
    where $\sigma_{st}$ is the total number of shortest paths from node s to node t and $\sigma_{st}(v)$ is the number of those paths that pass through v. The betweenness of an edge is defined in the same way, counting the shortest paths that run along the edge.
  • 37. Procedure:
    1. Calculate and assign betweenness:
       a. Calculate the betweenness (weight W) of every edge in graph G.
       b. Each edge (s1, e1), …, (sn, en) is assigned its associated weight W1, …, Wn.
    2. The edge with the highest weight Wh is removed.
    3. The betweenness of all edges affected by the removal is recalculated.
    4. Steps 2 and 3 are repeated until no edges remain.
    5. The order in which edges are removed is noted, and communities are then detected by hierarchical clustering, reading the removed edges in reverse order.
    Application:
    - The community detection module of the SNAP software uses this algorithm; it is implemented in the <cmty.h> file.
    - URL of SNAP: http://snap.stanford.edu/snap/description.html
    Advantage:
    - The algorithm is quite sensitive and gives accurate results.
    - It is one of the few algorithms able to detect community structure at all levels.
    Drawback:
    - Its major drawback is the computational cost.
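Steps 1 to 3 of the procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SNAP implementation: the function names and the example graph (two triangles joined by the edge 3-4) are hypothetical, the graph is unweighted and has no self-loops, betweenness is recomputed for all edges after each removal, and the sketch stops at the first split instead of continuing until no edges remain.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set

Graph = Dict[int, Set[int]]


def edge_betweenness(graph: Graph) -> Dict[FrozenSet[int], float]:
    """Shortest-path betweenness of every edge (unweighted, undirected)."""
    score = {frozenset((u, v)): 0.0 for u in graph for v in graph[u]}
    for source in graph:
        # Breadth-first search counting shortest paths (sigma) from source.
        sigma = dict.fromkeys(graph, 0.0)
        sigma[source] = 1.0
        dist = {source: 0}
        preds: Dict[int, List[int]] = {v: [] for v in graph}
        order: List[int] = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, sharing path counts over edges.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every pair (s, t) was counted from both ends, so halve the totals.
    return {edge: total / 2.0 for edge, total in score.items()}


def components(graph: Graph) -> List[Set[int]]:
    """Connected components found by depth-first search."""
    seen: Set[int] = set()
    parts: List[Set[int]] = []
    for start in graph:
        if start in seen:
            continue
        part: Set[int] = set()
        stack = [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(graph[v] - part)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman_split(graph: Graph) -> List[Set[int]]:
    """Remove highest-betweenness edges until the graph gains a component."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}  # work on a copy
    start = len(components(g))
    while len(components(g)) == start and any(g.values()):
        scores = edge_betweenness(g)
        a, b = max(scores, key=scores.get)
        g[a].discard(b)
        g[b].discard(a)
    return components(g)


# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
GRAPH: Graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
split = girvan_newman_split(GRAPH)
```

Calling `girvan_newman_split` again on each resulting part would give the next level of the hierarchy described in step 5.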
  • 38. Chapter 5 Conclusion and Proposed Solution
    5.1 Comparative Analysis of Algorithms
    | Criterion | Bron–Kerbosch algorithm | CPM algorithm | Girvan–Newman algorithm |
    |---|---|---|---|
    | Node overlapping | Does not allow | Allows | Does not allow |
    | Computational time | O(3^(n/3)) (n = vertices) | High, as it tries to find all k-size cliques in the network | O(m^2 n) (m = edges, n = vertices) |
    | Application (software) | UCINET | CFinder | SNAP |
    | Edge content and node content | Does not consider | Does not consider | Does not consider |
    | Based on | Vertex structure | Vertex structure | Edge structure |
    | Scale at which it works efficiently (number of nodes in graph) | Small | Large | Large |
    Table 5.1: Comparative Analysis of Algorithms
  • 39. The Bron–Kerbosch algorithm has the limitation that it does not support overlapping community structure. It is, however, simple, and it works efficiently on small social networks.
    The CPM algorithm developed by Palla finds all k-size cliques in the network and builds communities by "rolling" a k-clique to an adjacent k-clique that shares k-1 of its nodes. Though its computational time is high, it allows one to find communities in graphs with up to 10^5 nodes [4].
    The Girvan–Newman algorithm is the first modern algorithm based on edge structure. Links are iteratively removed based on the value of their betweenness, which expresses the number of shortest paths between pairs of nodes that pass through the link. Its time complexity is O(m^2 n) [4].
    5.2 Limitations of Existing Algorithms
    The existing algorithms for community detection use only information about the linkage (edge) structure and the node structure. However, in many recent applications, edge content should be considered in order to provide better supervision to the community detection process. That is, edge or node content should also be taken into account while detecting communities.
    While traditional community detection is designed around link and node structure only, adding edge content can give more accurate and relevant results, because it provides an understanding of how the cliques relate to the content on the edges. Vertices that are poorly linked may sometimes belong to the same community because of a very high degree of similarity between their content. Thus, in cases where link connectivity and content-based similarity do not agree, it is important to set up criteria to decide whether a node is part of a community or not.
  • 40. For example:
    - Two nodes might share audio, video, text, images and so on. Edge content or vertex content can help to detect communities more effectively.
    - In email networks, a communication between two participants can be considered edge content. Participants whose communications have similar content are much more likely to belong to the same community than those whose communications do not.
    - In social media networks such as Facebook, users may tag an image with keywords. In such cases, it may be possible to construct a network of both people and images in which the edge content corresponds to the keywords used for tagging. Such keywords provide important and useful knowledge about the nature of the underlying community.
    Figure 5.1: Edge content Example
  • 41. 5.3 Proposed Solution
    Community detection that uses edge content and vertex content gives more effective results. A vertex-content algorithm works on the content of the two individual nodes, while an edge-content algorithm works on the pairwise content or communication between two nodes. The example in Figure 5.2 shows how communities can be detected using the edge content passing between two actors (nodes) in a social media graph. The graph forms two communities, named Fasttrack watch and Jet Airways. From Figure 5.2 we have created the following tables to detect the members of each community:
    | Name of node (v) | Activity (edge content) | Keyword |
    |---|---|---|
    | Student_ABC | Shares "Fasttrack" website link with Student_XYZ | Fasttrack |
    | Student_XYZ | (1) Likes the "Fasttrack" watch link sent by Student_ABC; (2) Comments on the status of Traveler_MNO about "Jet Airways" | Fasttrack, Jet Airways |
    | Student_PQR | Tags Student_ABC in "Fasttrack" watch image | Fasttrack |
    | Traveler_MNO | Updates status with the latest news about the launch of "Jet Airways" flight J530 | Jet Airways |
    | Traveler_RST | Likes the "Jet Airways" page | Jet Airways |

    | Community | Members |
    |---|---|
    | Fasttrack watch | Student_ABC; Student_XYZ; Student_PQR |
    | Jet Airways | Student_XYZ; Traveler_MNO; Traveler_RST |
    Table 5.2: Community Detection Example
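The grouping shown in Table 5.2 can be sketched in Python as follows. This is only an illustration of the idea, not a proposed algorithm: it assumes that a keyword has already been extracted from each activity, and the names `ACTIVITIES` and `communities_by_keyword` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (node, keyword) pairs taken from the activity rows of Table 5.2.
ACTIVITIES = [
    ("Student_ABC", "Fasttrack"),
    ("Student_XYZ", "Fasttrack"),
    ("Student_XYZ", "Jet Airways"),
    ("Student_PQR", "Fasttrack"),
    ("Traveler_MNO", "Jet Airways"),
    ("Traveler_RST", "Jet Airways"),
]


def communities_by_keyword(activities: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group nodes into one community per edge-content keyword."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for node, keyword in activities:
        groups[keyword].add(node)
    return dict(groups)


edge_content_communities = communities_by_keyword(ACTIVITIES)
```

Student_XYZ appears under both keywords, so the resulting communities overlap, which purely structural methods such as Bron–Kerbosch do not capture.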
  • 42. Figure 5.2: Edge content Community Detection
  • 43. Future Work:
    An edge-content-based algorithm can be developed using concepts from matrix and graph theory. It would consider one additional field, the edge content passing from one node to another, when detecting communities in a social media graph.
    References:
    [1] J. Han and M. Kamber: "Data Mining Concepts and Techniques", 2000.
    [2] Wei Chen and Yajun Wang: "Efficient Influence Maximization in Social Networks".
    [3] Shaozhi Ye and Felix Wu: "Measuring Message Propagation and Social Influence Maximization".
    [4] Andrea Lancichinetti and Santo Fortunato: "Community Detection Algorithms Analysis", 2010.
    [5] M. E. J. Newman: "Detecting Community Structure in Networks", 2003.
    [6] C. Bron and J. Kerbosch: "Finding All Cliques of an Undirected Graph", 1973.
    [7] G. Palla: "Clique Percolation Method", 2005.