Dissertation Report on
COMMUNITY DETECTION
IN SOCIAL MEDIA
AS PARTIAL FULFILMENT OF
MASTER OF COMPUTER APPLICATION
SEMESTER-5
BY
BHAGYASHRI MANANI (Enroll# 105090693093)
TANVI SHARMA (Enroll# 105090693070)
UNDER GUIDANCE OF
Dr. SONAL JAIN
SUBMITTED To
GLS INSTITUTE OF COMPUTER TECHNOLOGY,
GUJARAT TECHNOLOGICAL UNIVERSITY
Acknowledgment
First, we would like to thank Dr. Sonal Jain, our internal guide. She
introduced us to the field of community detection in social media and provided
guidance on every query throughout the dissertation. We thank her for
her valuable suggestions and for showing us how to do research. Without her
guidance and support this research could not have been completed.
We would also like to thank Dr. Harshal Arolkar, Dr. Devarshi Mehta and Dr.
Jyotika Doshi for giving us valuable suggestions and for their reviews and help.
Abstract
Social media provide many features, such as online chat, online discussion,
online communities, advertising and marketing. They also raise
issues such as community detection, influence maximization, message propagation
and social media monitoring. In this research we focus on community detection,
which is one of the major issues of social media. We have studied and discussed
different existing algorithms used for community detection and carried out a
comparative analysis of them. These algorithms use graph
theory concepts to detect communities on the web. We also discuss the
limitations of the existing algorithms and propose a solution to them.
Contents
1. Introduction
1.1 What is Data Mining?
1.2 Need for Data Mining in Social Media
1.3 Research Goal
2. Elements of Social Media
2.1 Profile
2.2 Member
2.3 Group
2.4 Discussion
2.5 Blogs
2.6 Widgets
3. Issues of mining Social Media
3.1 Community Detection
3.2 Influence Maximization
3.3 Message Propagation
3.4 Monitoring
3.5 Social CRM (Customer Relationship Management)
4. Algorithms for Community Detection
4.1 Vertex Base Community Detection
4.1.1 Bron–Kerbosch Algorithm
4.1.2 Clique-Percolation Method
4.2 Edge Base Community Detection
4.2.1 Girvan–Newman Algorithm
5. Conclusion and Proposed Solution
5.1 Comparative Analysis of Algorithm
5.2 Limitation of existing Algorithm
5.3 Proposed Solution
5.4 Future Work
References
List of Tables:
5.1 Comparative Analysis of Algorithm
5.2 Edge content Base Community Detection Example
List of Figures:
Figure 1.1: Social Media Websites
Figure 2.1: Profile of Social Media Website
Figure 2.2: Member of Social Media Website
Figure 2.3: Group of Social Media Website
Figure 2.4: Personal Blogs
Figure 2.5: Media Blogs
Figure 2.6: Widgets used in Social Media
Figure 3.1: Example of Community on Website
Figure 3.2: Twitter facilities for Monitoring
Figure 3.3: Traditional v/s social CRM System
Figure 4.1: Clique Graph
Figure 4.2: Undirected Graph G
Figure 4.3: CPM Graph (a)
Figure 4.4: CPM Graph (b)
Figure 5.1: Content of edge example
Figure 5.2: Edge content Base Community Detection
Chapter 1 Introduction
As Internet access improves, more and more people
depend on web services. They search for or publish information, download
music and movies, play games, use social networking websites to interact with
friends and family members, shop online and even pay bills
over the Internet. With the progress of World Wide Web technologies, more and
more data are available online for web users. Web data covers a wide range of
fields such as government, sports, entertainment, commerce, and health and lifestyle. The
availability of a vast amount of web data does not mean that users can easily get
whatever they want. As more and more data become available on the web,
it takes more time and effort to find the desired information. It has been
observed that 99% of the data accessible on the web is not useful for 99% of the
users [1]. This massive amount of web data calls for techniques
that find the useful knowledge hidden in it.
1.1 What is Data Mining?
Data are facts about the world. Descriptions of a person (gender, height, weight, color,
name, age, education etc.), an animal (category, size, name, weight, age etc.), a mobile phone
(height, width, color, company, price) or a country (name, population, area, number
of states) can be stored and are known as data. For example, a student database
contains the following data:
Name as “Janki”
Gender as “girl”
Result as 60%
Attendance as 90%
Year as “1st”
Information is filtered, meaningful and relevant data.
For example,
The student named Janki got 60% in B.C.A is information.
70% of the students of 3rd year B.C.A got distinction is information.
“Sachin” got the highest percentage in 3rd year B.C.A is information.
Knowledge is information processed in the mind of an individual. In other words,
knowledge is the state or fact of knowing; it is understanding gained through
experience or study. For example,
From the monthly attendance report and the exam result report, a teacher
decides whether a student’s performance is average, good or excellent by
applying their knowledge and experience.
Data mining is commonly defined as the process of discovering useful patterns
or knowledge from data sources such as databases, texts, images and the Web.
Web data mining is the application of data mining techniques to web data. It
bridges the gap between data and knowledge: it is designed to extract useful and
hidden knowledge from the massive “garbage” data available on the web [2, 3]. Data
mining has many applications in market basket analysis, fraud detection, profiling,
risk management, e-commerce, web analysis and many other fields.
1.2 Need for Data Mining in Social Media
What is Social Media?
Online social networking is usually done through websites, which
are known as social sites. A social media website is like an online
community of Internet users. Members of an online community can share common
interests in hobbies, religion, politics, lifestyle etc. Using social media websites,
people can share text, photos, audio, video and information. Platforms like
Twitter, Facebook and LinkedIn have created online communities where people can
share as much or as little personal information as they want with other members.
Once you are granted access to a social networking website, you are a
member of that site and you can use it to interact with other members.
A member may share views, thoughts, videos and images, update their status,
communicate with other members, comment on others’ views and statuses,
join groups, invite other members to events and read the profile
pages of other members.
Figure 1.1: Social Media Websites
Need for Data Mining in Social Media
Social media can be used to learn about current trends, opinions and influencers.
Information gathered from social networking sites can be used for the following
purposes:
To improve content marketing by better understanding customers’
opinions
To learn what is most relevant regarding your products, brand or even
entire business area
To know who the key influencers are
To know who the intended customers for your product are
This allows you to identify people who are interested in your product or content,
to find ways of reaching out to them, to create content that attracts them
and to follow up with them. For example,
Facebook could sell its data to companies that want to understand
the market: it already has the demographic and geographic data in place, and
only needs to sell access to it.
Why use Social Media for Marketing?
The number of social media users is increasing day by day, and users are joining more and more
social media communities. Social media come with many features and advantages,
some of which are listed below.
Communicate with customers
Social media allow product providers to reach prospective customers, and
customers to reach particular advertisements, websites or business
employees. Social media is a two-way process: it allows marketing or
technical staff to chat with customers and answer any questions they
might have. When the time comes to buy the product, customers can feel that they
have a friend in the business.
Word of Mouth
Social media takes word-of-mouth marketing to a new level. When your fans follow
or interact with your page, all of their friends see those interactions happen.
With every interaction, comment and discussion you open up your brand to
hundreds or thousands of prospective community members. Happy customers
can also directly tell their friends on social media about their good experience
with you. They can amplify positive chatter about your business and create a
positive atmosphere for your brand.
Customer Loyalty
By engaging your customers through social media you have the opportunity to
reward your loyal fans and generate repeat business. By building these
relationships and maintaining them you can build customer loyalty and
satisfaction that rewards your business further.
Feedback
The value of knowing where you are succeeding and failing can mean everything
in business. Social media lets you directly estimate what works with your fans and
what doesn’t, and allows you to address negative feedback quickly.
Example of Social Media used for Marketing
Twitter
Twitter allows companies to promote products on an individual level. The use of a
product can be explained in short messages that followers are more likely to read.
These messages appear on followers’ home pages. Messages can link to the
product’s website, Facebook profile, photos, videos, etc. This link provides
followers the opportunity to spend more time interacting with the product online.
This interaction can create a loyal connection between product and individual and
can also lead to larger advertising opportunities. Twitter promotes a product in
real-time and brings customers in.
Facebook
Facebook profiles are more detailed than Twitter. They allow a product to provide
videos, photos, and longer descriptions. Videos can show when a product can be
used as well as how to use it. These also can include testimonials as other
followers can comment on the product pages for others to see. Facebook can link
back to the product’s Twitter page as well as send out event reminders.
Blogs
Every day there are more reasons for companies to add blogging platforms to
their social media repertoire. Platforms like LinkedIn create an environment for
companies and clients to connect online. Companies that recognize the need for
information, originality and accessibility employ blogs to make their products
popular and unique, and ultimately to reach consumers who are familiar with
social media. Blogs allow a product or company to provide longer descriptions of
products or services. The longer description can include reasoning and uses. It can
include testimonials and can link to and from Facebook, Twitter and many other social
network and blog pages. Blogs can be updated frequently and are a promotional
technique for retaining customers.
Online communities benefit businesses because they enable them to reach
the clients of other businesses using the platform. These online environments can
be accessed by virtually anyone; therefore consumers are invited to be a part of
the creative process.
Issues in mining Social Media
Mining the content of social media and analyzing social
networking data have become a major part of online business. The issues of mining
social media are: community detection, which deals with how to detect communities
in a social network; influence maximization, the problem of finding the people who act
as influencers in a large social network; message propagation, which analyzes the
patterns or keywords of messages that spread in a very short time;
social customer relationship management, whose goal is to strengthen relationships
with customers through more meaningful interactions; and social media
monitoring.
1.3 Research Goal
We have seen the need for mining social media. More and more businesses
run through websites: online selling of products, books, music
CDs, movie tickets, railway or airline tickets, and hotel bookings. More and more
people are now members of different social networking sites. By mining social
media, a business can learn current trends, customers’ interests and customers’
opinions of its products and services. This information can be used
to reach interested customers more efficiently for advertising and marketing
purposes.
We have focused on community detection, which is one of the issues of
mining social media. The goal of our research is to analyze existing algorithms for
community detection. These algorithms use graph theory concepts to
detect communities: the entire social network is represented as a graph whose nodes
represent the actors or members of the community, while an edge between a pair of nodes
represents a connection between those members. We try to identify the limitations of the
existing algorithms and propose a solution.
Chapter 2 Elements of Social Media
Social networking is based on a certain structure that allows people to
communicate and share their information with each other. This structure includes
having profiles, friends, blog posts, widgets, and usually something unique to that
particular social networking website such as the ability to 'poke' people on
Facebook or high-five someone on Hi5. In the following sections we discuss
the elements of social media.
2.1 Profile
This is where you tell the world about yourself. Profiles contain basic information,
like where you live and how old you are, religious views, contact details,
educational background, job or business details, Relationship status, profile
picture and personality questions, like who's your favorite actor or politician and
what's your favorite book.
Figure 2.1: Profile of Social Media Website
2.2 Members
Members are trusted people of the site who are allowed to view your profile
content (images, video, status), post comments on it
and send you private messages. You can also see updates on how
the members added to your account are using the social networking site, such as when
they post a new picture or update their profile. Members are the heart and soul
of social networking.
On Facebook they are known as 'friends'; LinkedIn refers to them as
'connections'; Twitter refers to them as 'followers', who
can reply to your tweets. All of these social networks treat
members as trusted people.
Figure 2.2: Member of Social Media Website
2.3 Groups
Most social networks use groups to help you find people with similar interests.
They are both a way to connect with like-minded people and a way to identify your
interests.
For example, students of the 2010 batch of HLICA College can create a group and
discuss topics such as the exam syllabus, technical events and the exam schedule, as well as
queries and their solutions.
2.4 Discussions
A primary focus of groups is to create interaction between users in the form of
discussions. Most social networking websites support discussion boards for the
groups, and many also allow members of the group to post pictures, music, video
clips, and other tidbits related to the group.
Figure 2.3: Group of Social Media Website
2.5 Blogs
Another feature of some social networks is the ability to create your own blog
entries. A blog is a discussion published on the World Wide Web and consisting
of entries ("posts") typically displayed in reverse chronological order (the most recent post
appears first). Good quality blogs are interactive, allowing visitors to leave
comments and even message each other via GUI widgets on the blogs, and this
interactivity distinguishes them from other static websites. In that sense, blogging
can be seen as a form of social networking. A blog post is like an article, news item or
opinion on some topic, and other members can comment with their views and opinions
on it.
Personal Blogs:
The personal blog, an ongoing diary or commentary by an individual, is the most
common blog. Some sites, such as Twitter, allow bloggers to share thoughts and
feelings instantaneously with friends and family, much faster than
emailing or writing. On Facebook this is known as a status update.
Figure 2.4: Personal Blogs
Corporate and Organizational Blogs
A blog can be private, as in most cases, or it can be for business purposes. Blogs
used internally to enhance the communication in a corporation or externally for
marketing, branding or public relations purposes are called corporate blogs.
Similar blogs for clubs and societies are called club blogs, group blogs, or by
similar names; typical use is to inform members and other interested parties of
club and member activities. For example, members of a Facebook group can post
views, news, updates and articles in that group, and other members can reply to
those posts.
Media Blogs
Blogs with shorter posts and mixed media types are called media blogs. For example,
Yahoo publishes articles on the latest news about celebrities, lifestyle, business and
technology, and people can comment on them.
Figure 2.5: Media Blogs
2.6 Widgets
A popular way of letting your personality shine through is by gracing your social
networking profile with web widgets. Many social networks allow a variety of
widgets, and you can usually find interesting widgets located on widget galleries.
Figure 2.6: Widgets used in Social Media
Basic Widgets for Social Website or Blog
Photo Badge
This photo badge allows you to share your Facebook photos on websites and
blogs. Choose from a vertical, horizontal, or two-column layout and also choose
the number of photos to be displayed.
Profile Badge
Create a Facebook, Twitter or LinkedIn profile badge to share selected profile
information on your website. A profile badge allows your users to easily
connect with you and add you as a friend.
Like Box
This allows your users to publish their content and activity.
Share Button
This powerful widget allows your visitors to share your content (image, video,
article etc.).
Comments Box:
This allows members to comment or post on website content.
Chapter 3 Issues of Mining Social Media
Social media provide very good services such as online chat, sharing of
videos and images, online games and online communities, and also serve as an effective tool
for advertising and marketing. Given these many features, mining social
media is essential, but it is not easy work. It comes with issues such as
community detection, influence maximization, message propagation, social media
monitoring and social customer relationship management (CRM).
3.1 Community Detection
What is community?
As we have seen, online social networks such as Twitter and Facebook
are rapidly gaining popularity. Therefore, social network analysis is becoming a
very important research field. One major topic in social network analysis is the
study of communities in social networks, which advertising and marketing use to
identify target groups.
Figure 3.1: Example of Community on Website
A virtual community is a social network of individuals who communicate
with each other through particular social media, crossing geographical and
political boundaries in order to pursue mutual interests or goals. It is a large
collection of individuals who interact unusually frequently with each other and
share interesting properties, such as common hobbies or occupations.
The notion of community has been adopted by various social networking sites. A social
network community answers, for instance, the following questions:
Who knows whom?
Who knows what?
Who can do what?
Who looks for what?
Who offers what?
It provides a wealth of information to its members about other people and allows
them to manage friends and business partners in an effective environment.
What is Community Detection?
Having social media accounts for your business and creating posts for them
is not enough. You need to check whether your posts carry the right message and
address your target audience, and you need to find the right community for effective
advertising results. Community detection is the field whose goal is to
detect communities within networks. It tries to answer the question: when should people be
considered close enough to be in the same community?
In the problem of community detection, the goal is to detect communities in
real-world graphs such as large social networks, web graphs and biological
networks by partitioning the network into dense regions of the graph. Such dense
regions typically correspond to entities which are closely related, and can hence
be said to belong to a community [8, 9].
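To make the idea of a "dense region" concrete, the following is a minimal sketch in Python. It assumes one common notion of density (the fraction of possible edges inside a node set that are actually present); the cited works may use other measures. The graph `adj` and the function name `density` are hypothetical and chosen only for illustration.

```python
def density(adj, nodes):
    """Fraction of possible edges among `nodes` that exist in the graph."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each internal edge is seen from both endpoints, so halve the total.
    internal_edges = sum(len(adj[v] & nodes) for v in nodes) // 2
    return internal_edges / (n * (n - 1) / 2)

# Hypothetical network: two triangles {1, 2, 3} and {4, 5, 6} joined by edge 3-4.
adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}

print(density(adj, {1, 2, 3}))   # each triangle is fully connected
print(density(adj, adj))         # the whole network is much sparser
```

In this toy network each triangle has density 1.0 while the whole graph has density 7/15, which is why the two triangles would be natural candidates for communities.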
The determination of such communities is useful in the context of a variety
of applications in social-network analysis, including customer segmentation,
recommendations, and influence analysis. As a result, a considerable amount of research
has been devoted to algorithms for solving this problem.
Community Detection for Advertisement
Social media software enables anyone, without knowledge of coding, to post,
comment on, share or mash up content, and to form communities around shared
interests. Social media communities are growing at an exponential rate and
represent a huge potential market for advertising and marketing. The most well-known
social media communities are LinkedIn, Facebook, Twitter and YouTube,
along with blog sites.
Social Media Optimization
Social media optimization refers to the use of a number of social media outlets and
communities to generate publicity and increase the awareness of a product, brand or event. An
important problem in the area of social networking is that of community
detection, so that the content or posts reach the right audience.
3.2 Influence Maximization
Influence maximization is the problem of finding the people who act as
influencers [2].
For example, a small company develops a cool online application for an
online social network and wants to market it through the same network. It has a
limited budget, so it can only select a small number of initial users in the
network to use it (by giving them gifts or payments). The company hopes that
these initial users will love the application and start influencing their friends on
the social network to use it, that their friends will influence their friends’
friends and so on, and that through this word-of-mouth effect a large population
in the social network will adopt the application. The problem is whom to select
as the initial users so that they eventually influence the largest number of people
in the network. This problem, referred to as influence maximization, is of
interest to many companies as well as individuals who want to promote their
products, services and innovative ideas through the powerful word-of-mouth
effect (also called viral marketing).
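The seed-selection problem described above can be sketched with a simple greedy heuristic. This is only an illustration under a strong simplifying assumption: a user counts as "influenced" if they are within a fixed number of friendship hops of a seed. Real formulations usually rely on probabilistic diffusion models, which are not modeled here. The graph `adj` and the names `reach` and `greedy_seeds` are hypothetical.

```python
from collections import deque

def reach(adj, seeds, hops):
    """Users within `hops` friendship steps of any seed (breadth-first search)."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue
        for friend in adj[node]:
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, dist + 1))
    return seen

def greedy_seeds(adj, k, hops=2):
    """Pick k initial users one at a time, each maximizing the total reach."""
    seeds = []
    for _ in range(k):
        candidates = (v for v in sorted(adj) if v not in seeds)
        best = max(candidates, key=lambda v: len(reach(adj, seeds + [v], hops)))
        seeds.append(best)
    return seeds

# Hypothetical friendship network: user 1 has three friends, followed by a chain 4-5-6-7-8.
adj = {
    1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1, 5},
    5: {4, 6}, 6: {5, 7}, 7: {6, 8}, 8: {7},
}

seeds = greedy_seeds(adj, k=2, hops=1)
print(seeds, len(reach(adj, seeds, 1)))
```

With a budget of two initial users and one hop of influence, the sketch first picks the best-connected user and then a user whose friends are not already covered, rather than simply the two users with the most friends.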
As another example, Topsy analyzed the Twitter reaction to
the bin Laden raid last year [7]. The analysis began with one person tweeting from
Pakistan, and looked at the exposure he received over time. Within the first eight
hours of the raid, the Pakistani Twitter user reached around 100,000 exposures.
Then someone in the U.S. media, the influencer in this case, found the initial tweets
and retweeted them and, less than one day later, the Pakistani Twitter user had
reached 90 million exposures. After the influencer retweeted the message, large
numbers of followers also retweeted it, increasing the
amplification of that particular tweet.
Influence maximization involves mining and technical issues such as:
To find out who served as an influencer and was able to amplify the
message
How many followers do they have?
Do they get responses?
How many external links point to their blog?
How many comments do their blog posts attract?
See how the exposure increased with each amplification
Track how fast the message is trending
Learn the positive and negative sentiment
Once you are able to use this analysis to uncover the influencers, you want to be
able to reach out to those key experts, as well as to monitor them to find out
what they are saying, including whether they are saying good or bad things
about your brand. You even want to find out to whom they are talking.
3.3 Message Propagation
Social websites including Facebook, Twitter and LinkedIn allow users to
construct a personal profile, share interesting information with other people, and
build relationships within a community. The mode of interaction on social
websites is affecting people’s social behaviors and consumer habits.
Although many marketing techniques may be used to spread information
over a social network, the target consumers should be defined, and
suitable messages should be broadcast to them in a certain time period.
Consequently, enterprises need a tool to analyze message propagation behavior
for different combinations of community and time dimensions. Message
propagation is the problem of finding messages with certain patterns or
keywords that spread quickly.
To make message propagation clearer, we present an example
from research done by Shaozhi and Felix on twitter.com [3]. They
collected and analyzed a large data set from the Twitter social network for
the following event:
In June 2009, the news of Michael Jackson's death spread all over the world.
Many online social networks were flooded with messages related to this breaking
event. They started collecting related messages from Twitter.com on June 27th,
2009, two days after the tragedy. Among all the messages crawled, the
tweets containing “Michael Jackson” or “MJ” were selected.
After removing the noise, 549,667 MJ-related messages posted
by 305,035 users were found; 548,102 of these messages were posted after June 25, 2009.
The following fields need to be analyzed to understand how a message propagates:
User id: Unique identifier for the user who posted this message.
Id: The message ID, which is unique for messages posted by the same user.
Two messages posted by different users may share the same message ID.
Text: The content of the message.
Created at: The creation time for this message.
Source: The client software (Twitter, Facebook, Yahoo blogs etc.) that was used to
post the message.
In reply to status id: The message ID which this message replies to.
In reply to user id: The user ID which this message replies to.
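The fields listed above can be sketched as a small record type, together with a helper that follows the reply fields back to the original message. This is an illustrative sketch only: the class `Message`, the function `reply_chain` and the sample messages are hypothetical and are not taken from the data set of [3]. Because a message ID is unique only per user, the lookup key is the pair (user id, message id).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Message:
    user_id: int
    id: int                      # unique only among messages of the same user
    text: str
    created_at: str
    source: str
    in_reply_to_status_id: Optional[int] = None
    in_reply_to_user_id: Optional[int] = None

def reply_chain(messages, start):
    """Follow the reply fields from `start` back to the earliest known message."""
    index = {(m.user_id, m.id): m for m in messages}
    chain = [start]
    current = start
    while current.in_reply_to_status_id is not None:
        key = (current.in_reply_to_user_id, current.in_reply_to_status_id)
        parent = index.get(key)
        if parent is None or parent in chain:   # missing parent or a loop in the data
            break
        chain.append(parent)
        current = parent
    return chain

# Made-up sample messages, for illustration only.
m1 = Message(10, 1, "Breaking news about MJ", "2009-06-25 22:00", "web")
m2 = Message(20, 1, "So sad", "2009-06-25 22:05", "web", 1, 10)
m3 = Message(30, 5, "Agreed", "2009-06-25 22:09", "web", 1, 20)

for m in reply_chain([m1, m2, m3], m3):
    print(m.user_id, m.text)
```

Note that `m1` and `m2` share the message ID 1 but belong to different users, which is exactly the situation the Id field description warns about.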
3.4 Social Media Monitoring
Social Media monitoring is about listening to the discussions that take place
around your brand in order to find out the different views of people. It is a very
important tool for a social media crisis plan as well as a marketing plan.
Here is a good example of social media monitoring. Last week I was watching
television and saw an interesting advertisement for something called the Total
Bib. It made me chuckle, which caused me to tweet something like “Total
Bib reminds me of something out of a Saturday Night Live sketch”. The tweet
received a few laughs and comments from followers. A few hours later, I received a
reply tweet from TotalBib thanking me for the mention.
I was pretty amazed, since I was not following them previously; they were simply
monitoring the stream. They took the time and made the effort to do
some simple monitoring of the Twitter stream to identify an opportunity.
Social media monitoring involves text mining specific keywords on social
networking websites, blogs, discussion forums and other social media. Essentially,
monitoring software transposes specific words or phrases in unstructured data
into numerical values. The numerical values are linked to structured data in a
database, allowing the data to be analyzed with traditional data mining
techniques.
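The step of turning words in unstructured text into numerical values can be sketched as follows. This is a minimal illustration that assumes single-word keywords and plain word counts; actual monitoring software may work quite differently. The function name `keyword_counts` and the sample posts are hypothetical.

```python
import re
from collections import Counter

def keyword_counts(posts, keywords):
    """Return one row per post: how often each monitored keyword occurs in it."""
    wanted = sorted(k.lower() for k in keywords)
    rows = []
    for post in posts:
        # Lower-case the post and split it into simple word tokens.
        tokens = Counter(re.findall(r"[a-z0-9']+", post.lower()))
        rows.append({k: tokens[k] for k in wanted})
    return rows

# Made-up posts about a product, for illustration only.
posts = [
    "Love the new phone, great camera",
    "Phone battery is bad, really bad",
]
rows = keyword_counts(posts, ["phone", "bad", "great"])
for row in rows:
    print(row)
```

Each row is now structured, numerical data that could be stored in a database table and analyzed with traditional data mining techniques.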
Figure 3.2 Twitter facilities for Monitoring
What are the Needs for Social Media Monitoring?
To know negative criticism about your brand, which you can then respond
to, turning that unhappy customer into a lifelong brand advocate.
To know the positive comments people are making about your brand, giving you
the opportunity to connect further with those individuals.
To detect a social media crisis on the rise, before it builds up and begins to
spiral out of control.
3.5 Social CRM Systems:
Social CRM is a strategy based around customer engagement and
interactions being a by-product. Social CRM is an extension of CRM. It means
different things to different organizations, but it is essentially a back-end
process and system for managing customer relationships. Social CRM is about trying
to understand customers’ problems with a product or service and then solving them.
Traditional CRM was very much based around data and information that
brands could collect on their customers, all of which would go into a CRM system
that then allowed the company to better target various customers.
In social CRM, the customer is actually the focal point of how an organization
operates. Instead of marketing or pushing messages to customers, brands now
talk to and collaborate with customers to solve business problems, empower
customers to shape their own experiences and build customer relationships,
which will hopefully turn customers into advocates. PR now has a very active role
in social CRM (in fact, PR typically owns budgetary control and authority of social
initiatives ahead of every other department). In most organizations, PR
departments manage the social presence of brands and handle customer
engagement.
Figure 3.3 Traditional v/s social CRM System
Chapter 4 Algorithms for Community Detection
An important problem in the area of social networking is community
detection. In the problem of community detection, the goal is to partition the
network into dense regions of the graph. Such dense regions typically correspond
to entities which are closely related, and can hence be said to belong to a
community. The problem of community detection in social networking sites has
been broadly studied because of its importance in social networking applications.
Before discussing the algorithms in detail, we introduce some notation.
G = (V, E) is the social network graph, where
V = vertex set; each vertex in V corresponds to an actor in the network,
E = edge set; an edge corresponds to a relationship between a pair of actors.
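As a concrete illustration of this notation, the sketch below stores an undirected graph G = (V, E) as an adjacency map from each actor to the set of actors it is connected to. The member names and the helper `build_graph` are hypothetical and used only for illustration.

```python
from collections import defaultdict

def build_graph(edges):
    """Return {vertex: set of neighbours} for an undirected edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)   # a relationship is symmetric,
        adj[b].add(a)   # so record it in both directions
    return dict(adj)

# Hypothetical relationships; each pair is one edge in E.
E = [("asha", "ravi"), ("ravi", "meena"), ("asha", "meena"), ("meena", "john")]
G = build_graph(E)
V = set(G)          # the vertex set: every actor that appears in some edge

print(sorted(V))
print(sorted(G["meena"]))   # N(meena), the neighbours of one actor
```

The neighbour set of a vertex, written N(v) later in this chapter, is a direct lookup in this representation.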
We have studied two kinds of methods for community detection:
Node (vertex) base community detection
o Bron–Kerbosch algorithm
o Clique percolation Method(CPM) algorithm
Link (edge) base community detection
o Girvan–Newman algorithm
4.1 Vertex Base Community Detection
In node based community detection, the nodes of a group that satisfy some property
form a community.
A clique describes a group of 2 to 12 (averaging 5 or 6) persons who
interact with each other more regularly and intensely than others in the same
setting.
A maximal clique is a complete subgraph, i.e. one in which all nodes are adjacent
to each other, that cannot be extended by adding another node. In this image,
nodes {5, 6, 7, 8} form a maximal clique.
In an overlapping community structure, a node
can be a member of more than one
community.
4.1.1 Bron–Kerbosch Algorithm
The BK algorithm is used for non-overlapping community structure on
undirected graphs. It uses the recursive backtracking paradigm to
enumerate all maximal cliques in the graph [6].
Figure 4.1: Clique Graph
Algorithm:
The Bron–Kerbosch algorithm finds all maximal cliques of a graph.
At any given point in time it maintains three sets, R, P and X.
The set R contains vertices that form a clique which is either maximal or can
be extended to a maximal clique.
The set P contains vertices that are connected to all vertices in R and can
be added to R to make a larger clique.
The set X contains vertices that are connected to all vertices in R but are
excluded from being added to R because all cliques containing vertices in X
have already been enumerated in a different recursion cycle.
N(v) is the set of neighbors of vertex v.
Pseudo Code:
BronKerbosch(R, P, X):
    if (P and X are both empty) {
        report R as a maximal clique
        return
    }
    choose a pivot vertex u in P ⋃ X
    for each vertex v in P \ N(u) {
        BronKerbosch(R ⋃ {v}, P ⋂ N(v), X ⋂ N(v))
        P := P \ {v}
        X := X ⋃ {v}
    }
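The pseudo code can be tried out with a short Python sketch. This is an illustrative version, not a reference implementation: the function name `bron_kerbosch`, the adjacency dictionary `adj` and the pivot rule (the vertex of P ⋃ X with the most neighbours in P) are our own choices. The edge list is an assumption inferred from the neighbour sets used in the example for graph G, since the figure itself is not reproduced here.

```python
def bron_kerbosch(adj, R=frozenset(), P=None, X=frozenset()):
    """Yield every maximal clique of the graph {vertex: set of neighbours}."""
    if P is None:
        P = frozenset(adj)          # initially every vertex is a candidate
    if not P and not X:
        yield set(R)                # R cannot be extended: maximal clique
        return
    # Pivot u: the vertex of P ∪ X with the most neighbours in P.
    u = max(P | X, key=lambda w: len(adj[w] & P))
    for v in P - adj[u]:            # only vertices in P \ N(u) are tried
        yield from bron_kerbosch(adj, R | {v}, P & adj[v], X & adj[v])
        P = P - {v}                 # v is done: move it from P to X
        X = X | {v}

# Assumed edge list for graph G (inferred from the example, not from the figure).
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
adj = {v: set() for v in range(1, 7)}
for s, t in edges:
    adj[s].add(t)
    adj[t].add(s)

cliques = sorted(sorted(c) for c in bron_kerbosch(adj))
print(cliques)
```

Under this assumed edge list the sketch reports the same five maximal cliques as the example: {1,2,5}, {2,3}, {3,4}, {4,5} and {4,6}.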
Figure 4.2: Undirected Graph G
Example:
Initially there are 3 sets: R = Ø, P = {1, 2, 3, 4, 5, 6} and X = Ø.
o Select as pivot u a node with the maximum degree (number of edges):
u = 2 (this node has degree 3).
o The neighbors of u are N(u) = {1, 3, 5}.
o P \ N(u) = {2, 4, 6} (the vertices that are elements of set P but not
elements of set N(u)), so the loop runs for v = 2, v = 4 and v = 6.
The iteration of the loop for v = 2 makes a recursive call to the algorithm
with R = {2}, P = {1, 3, 5} and X = Ø. Within this recursive call v can be
1, 3 or 5:
o If v = 1, then R = {1, 2} and P = {5}; then
for v = 5, R = {1, 2, 5}, P = Ø, X = Ø.
o If v = 5, then R = {2, 5} and P = {1}; then
for v = 1, R = {1, 2, 5}, P = Ø, X = Ø.
o If v = 3, then R = {2, 3}, P = Ø, X = Ø.
Both orders through 1 and 5 lead to the same clique {1, 2, 5}, which is
reported only once.
The iteration for v = 4 makes a recursive call to the algorithm
with R = {4}, P = {3, 5, 6} and X = Ø (although vertex 2 belongs to the set X in
the outer call to the algorithm, it is not a neighbor of vertex 4 and is
excluded from the subset of X passed to the recursive call).
o If v = 3, then R = {3, 4}, P = Ø, X = Ø.
o If v = 5, then R = {4, 5}, P = Ø, X = Ø.
o If v = 6, then R = {4, 6}, P = Ø, X = Ø.
In the final iteration, for v = 6, there is a recursive call to the algorithm
with R = {6}, P = Ø and X = {4}: vertex 4 is the only neighbor of 6, but it has
already been processed and is therefore in X. Since X is not empty, no clique
is reported.
BronKerbosch(Ø, {1,2,3,4,5,6}, Ø)
    BronKerbosch({2}, {1,3,5}, Ø)
        BronKerbosch({2,3}, Ø, Ø): output {2,3}
        BronKerbosch({2,5}, {1}, Ø)
            BronKerbosch({1,2,5}, Ø, Ø): output {1,2,5}
    BronKerbosch({4}, {3,5,6}, Ø)
        BronKerbosch({3,4}, Ø, Ø): output {3,4}
        BronKerbosch({4,5}, Ø, Ø): output {4,5}
        BronKerbosch({4,6}, Ø, Ø): output {4,6}
    BronKerbosch({6}, Ø, {4}): no output
The overlap of these cliques can be used to define communities in several ways.
The simplest is to consider only maximal cliques bigger than a minimum
size (here, cliques with more than 2 nodes).
o Community {1,2,5}
Drawbacks:
The Bron–Kerbosch algorithm for finding cliques in a network is very costly:
its worst-case running time is O(3^(n/3)) for n nodes, which is impractical
for large-scale networks (networks with a large number of nodes).
An overlapping community structure, in which a node is part of more than
one community, is not supported.
Application:
The union of these cliques defines a subgraph whose components
(disconnected parts) then define communities. Such approaches are often
implemented in social network analysis software. UCINET is a software
package for community detection in social networks which uses this
algorithm to detect communities. It was developed by Lin Freeman and
Martin Everett.
URL of UCINET: https://sites.google.com/site/ucinetsoftware/home
4.1.2 Clique Percolation Method
Clique percolation is a community detection method developed by Gergely
Palla in 2005 [7]. The Clique Percolation Method is a popular approach for
analyzing the overlapping community structure of networks.
Figure 4.3: CPM graph (a)
Figure 4.4: CPM graph (b)
Algorithm:
Find all cliques of size k (here k = 3) in the
given network.
Construct a clique graph, in which each clique of size k is a node.
Two cliques are adjacent if they share k-1 (here
k-1 = 2) nodes.
Each connected component in the clique graph
forms a community.
Example:
Find the cliques of size 3.
Here they are {1,2,3}, {1,3,4}, {4,5,6}, {5,6,7},
{5,7,8}, {5,6,8} and {6,7,8}.
Construct a clique graph that links only those cliques
which are adjacent, that is, which share k-1
= 2 nodes.
Each connected component in the clique graph forms a community.
Communities detected:
o {1,2,3,4}
o {4,5,6,7,8}
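The percolation step of this example can be sketched in Python as follows. The sketch starts from the seven cliques of size 3 listed in the example rather than from the graph itself, and the function name `clique_percolation` is our own, chosen only for illustration.

```python
from itertools import combinations

def clique_percolation(cliques, k):
    """Merge k-cliques that share k-1 nodes; return one node set per community."""
    cliques = [frozenset(c) for c in cliques]
    # Clique graph: every clique is a node, two cliques are adjacent
    # when they share k-1 members.
    neighbours = {i: set() for i in range(len(cliques))}
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            neighbours[i].add(j)
            neighbours[j].add(i)
    communities, seen = [], set()
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first search collects one connected component of the clique graph.
        stack, members = [start], set()
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            members |= cliques[i]
            stack.extend(neighbours[i] - seen)
        communities.append(members)
    return communities

# The cliques of size 3 from the example.
cliques = [{1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7},
           {5, 7, 8}, {5, 6, 8}, {6, 7, 8}]
communities = clique_percolation(cliques, k=3)
print(communities)
```

Node 4 appears in both resulting communities, because {1,3,4} and {4,5,6} share only one node and are therefore not adjacent in the clique graph.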
Advantages:
It is not too restrictive (unlike cliques, which require each node to be
connected to all other nodes).
It allows overlaps: a node can be a member of several different
communities at the same time, so communities can overlap with each
other by sharing nodes.
Drawbacks:
Not all the nodes of a graph can participate in a k-clique community; for
example, a leaf node is left out of every community.
A suitable value of k has to be chosen before cliques of size k can be found.
Application:
CFinder is free software for finding communities in networks, based on the
Clique Percolation Method (CPM) developed by Palla.
URL of CFinder: http://www.cfinder.org
4.2 Link-Based (Edge) Community Detection
Girvan–Newman Algorithm
The Girvan–Newman algorithm (named after Michelle Girvan and Mark Newman)
is one of the methods used to detect communities in complex systems. The
algorithm is based on the betweenness of edges [5].
Betweenness is a centrality measure of a vertex or an edge within a graph;
here the betweenness of an edge is used as its weight. The communities are
detected by progressively removing edges from the original graph, rather than
by adding the strongest edges to an initially empty network.
Algorithm:
The betweenness of a vertex v in a graph G = (V, E) is computed as follows:
1. For each pair of vertices (s, t), compute the shortest paths between
them.
2. For each pair of vertices (s, t), determine the fraction of shortest paths
that pass through v.
3. Sum this fraction over all pairs of vertices (s, t).
$$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$,
and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.
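The three steps can be followed literally on a small graph. The sketch below is a brute-force illustration, not an efficient method: it enumerates every shortest path for every pair (s, t) and sums the fraction that passes through the chosen vertex. The function names and the four-node ring used as input are hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_paths(adj, s, t):
    """Return every shortest path from s to t as a list of vertices."""
    dist, preds = {s: 0}, {s: []}
    queue = deque([s])
    while queue:                      # breadth-first search from s
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                preds[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:
                preds[w].append(v)    # v lies on a shortest path to w
    if t not in dist:
        return []
    paths = [[t]]
    while paths[0][0] != s:           # walk the predecessor lists back to s
        paths = [[p] + path for path in paths for p in preds[path[0]]]
    return paths

def vertex_betweenness(adj, v):
    """Sum, over all pairs (s, t) not containing v, the fraction of shortest paths through v."""
    total = 0.0
    others = [x for x in adj if x != v]
    for s, t in combinations(others, 2):
        paths = shortest_paths(adj, s, t)
        if paths:
            total += sum(1 for p in paths if v in p) / len(paths)
    return total

# Hypothetical ring of four nodes: 1-2-3-4-1.
ring = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(vertex_betweenness(ring, 1))
```

In this ring, only the pair (2, 4) has shortest paths through node 1, and one of its two shortest paths does, so the betweenness of node 1 is 0.5.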
Procedure:
1. Calculate the betweenness (weight W) of every edge in graph G, so that
the edges (s1, e1), …, (sn, en) are assigned the associated
weights W1, …, Wn.
2. The edge with the highest weight Wh is removed.
3. The betweenness of all edges affected by the removal is recalculated.
4. Steps 2 and 3 are repeated until no edges remain.
5. The order in which edges are removed is noted, and communities are then
detected using hierarchical clustering, by reading the removed edges in
reverse order.
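The procedure can be sketched in plain Python as follows. This is a simplified illustration under our own assumptions: the function names are hypothetical, edge betweenness is computed with a breadth-first accumulation of shortest-path counts on an unweighted graph, and after each removal the betweenness of all remaining edges is recomputed rather than only that of the affected ones. The input, two triangles joined by one edge, is a hypothetical test graph.

```python
from collections import deque

def edge_betweenness(adj):
    """Betweenness of every edge of an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0 for v in adj}            # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):              # farthest vertices first
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    return {e: b / 2 for e, b in score.items()}  # each pair was counted twice

def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj):
    """Return the edge removal order and the partition after each split."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    removed, levels = [], []
    count = len(components(adj))
    while any(adj.values()):                     # until no edges remain
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)             # edge with the highest weight
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append((u, v))
        comps = components(adj)
        if len(comps) > count:                   # the removal split a community
            count = len(comps)
            levels.append(comps)
    return removed, levels

# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by the edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
removed, levels = girvan_newman(graph)
print(removed[0], levels[0])
```

On this graph the joining edge 3-4 has the highest betweenness, so it is removed first and the two triangles appear as the first level of the hierarchy.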
Application:
The community detection module of the SNAP software uses this algorithm,
which is implemented in the <cmty.h> file.
URL of SNAP: http://snap.stanford.edu/snap/description.html
Advantages:
This algorithm is quite sensitive and gives accurate results.
This algorithm is one of the few able to detect community structure at all
levels.
Drawback:
Its major drawback is its computational cost.
Chapter 5 Conclusion and Proposed Solution
5.1 Comparative Analysis of Algorithm
Criterion                       | Bron–Kerbosch algorithm    | CPM algorithm                                                | Girvan–Newman algorithm
Node overlapping                | Does not allow             | Allows                                                       | Allows
Computational time              | O(3^(n/3)) (n = vertices)  | High, as it tries to find all k-size cliques in the network  | O(m^2 n) (m = edges, n = vertices)
Application (software)          | UCINET                     | CFinder                                                      | SNAP
Edge content and node content   | Does not consider          | Does not consider                                            | Does not consider
Based on                        | Vertex structure           | Vertex structure                                             | Edge structure
Scale at which it can work efficiently (number of nodes in graph) | Small | Large | Large
Table 5.1 Comparative Analysis of Algorithms
The Bron–Kerbosch algorithm has the limitation that it does not support
overlapping community structure. However, it is simple and its computational
time is less than that of the other two algorithms. It works efficiently in
small social networks.
The CPM algorithm, developed by Palla, finds all k-size cliques in the
network; a k-clique rolls through the network by rotating around any of its
(k-1)-node subsets. Though its computational time is high, it allows one
to find communities in graphs with up to 10^5 nodes [4].
The Girvan–Newman algorithm is the first modern algorithm which is based
on edge structure. Links are iteratively removed based on the value of their
betweenness, which expresses the number of shortest paths between pairs of
nodes that pass through the link. Its time complexity is O(m^2 n) [4].
5.2 Limitations of Existing Algorithms
The existing algorithms for community detection use only the information about
the linkage (edge) structure and node structure. However, in many recent
applications, edge content should be considered in order to provide better
supervision to the community detection process. That is, edge or node content
should also be considered while detecting communities.
While traditional community detection is designed with link and node
structure only, the addition of edge content gives more accurate and relevant
results, because it provides an understanding of how the cliques relate to the
content on the edges. It is possible that vertices which are poorly linked may
sometimes belong to the same community because of a very high similarity
between their content. Thus, in cases in which link connectivity and
content-based similarity do not agree, it is important to set up criteria to
decide whether a node is part of a community or not.
For example,
Two nodes might share audio, video, text, images etc. Edge content or
vertex content can help to detect communities more effectively.
In email networks, a communication between two participants can be
considered as edge content. Clearly, participants with similar content of
communication are much more likely to belong to the same community
than those which do not.
In social media networks such as Facebook, users may tag an image with
keywords. In such cases, it may be possible to construct a network of both
people and images in which the edge content corresponds to the keywords
which are used for tagging. Clearly such keywords provide important and
useful knowledge about the nature of the underlying community.
Figure 5.1: Edge content Example
5.3 Proposed Solution
Community detection with edge content and vertex content gives more
accurate results. A vertex content algorithm works on the content of individual
nodes, while an edge content algorithm works on the pairwise content or
communication between two nodes. From the given example we can clearly see how
communities can be detected using the edge content passing between two actors
or nodes in the graph of social media.
The graph forms two communities, named Fasttrack Watch and Jet Airways. From
Figure 5.2 we have created the following table to detect the members of each
community:
Fasttrack Watch community members | Student_ABC; Student_XYZ; Student_PQR
Jet Airways community members     | Student_XYZ; Traveler_MNO; Traveler_RST
Name of Node (v) | Activity (Edge Content) | Keyword
Student_ABC      | Shares “Fasttrack” website link with Student_XYZ | Fasttrack
Student_XYZ      | (1) Likes the link sent by Student_ABC of “Fasttrack” watch; (2) Comments on the status of Traveler_MNO about “Jet Airways” | Fasttrack, Jet Airways
Student_PQR      | Tags Student_ABC in “Fasttrack” watch image | Fasttrack
Traveler_MNO     | Updates status with the latest news of the launch of “Jet Airways” flight J530 | Jet Airways
Traveler_RST     | Likes the page of “Jet Airways” | Jet Airways
Table 5.2 Community Detection Example
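The grouping shown in Table 5.2 can be illustrated with a few lines of Python. This is only a sketch of the idea of using edge content keywords, not the proposed algorithm itself, which is left as future work. The node names and keywords are copied from the table; the function name `communities_by_keyword` is hypothetical.

```python
from collections import defaultdict

# Keywords found in the activity (edge content) of each node, from Table 5.2.
keywords = {
    "Student_ABC": ["Fasttrack"],
    "Student_XYZ": ["Fasttrack", "Jet Airways"],
    "Student_PQR": ["Fasttrack"],
    "Traveler_MNO": ["Jet Airways"],
    "Traveler_RST": ["Jet Airways"],
}

def communities_by_keyword(keywords):
    """Put every node into the community of each keyword in its edge content."""
    groups = defaultdict(set)
    for node, words in keywords.items():
        for word in words:
            groups[word].add(node)
    return dict(groups)

communities = communities_by_keyword(keywords)
for name, members in communities.items():
    print(name, sorted(members))
```

Student_XYZ ends up in both communities, which shows that grouping by edge content naturally allows overlapping communities.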
Future Work:
We can develop an edge-content-based algorithm using the concepts of matrix
and graph theory, which considers one additional field, the edge content passing
from one node to another, to detect communities in a social media graph.
References:
[1] J. Han and M. Kamber: “Data Mining Concepts and Techniques”, 2000.
[2] Wei Chen, Yajun Wang: “Efficient Influence Maximization in Social Networks”.
[3] Shaozhi Ye and Felix Wu: “Measuring Message Propagation and Social
Influence Maximization”.
[4] Andrea Lancichinetti and Santo Fortunato: “Community Detection Algorithms
Analysis”, 2010.
[5] M. E. J. Newman: “Detecting Community Structure in Networks”, 2003.
[6] C. Bron, J. Kerbosch: “Finding All Cliques of an Undirected Graph”, 1973.
[7] G. Palla: “Clique Percolation Method”, 2005.