Network of Movie Stars
                                          Social Network Analysis
                                                  Programming Project
                                                 moogway@outlook.com


Copyright: Copyrighted under a Creative Commons License (maybe; I am not sure exactly how that works). Please ask before you
use it. Even if you don’t ask, just attribute it. I know I won’t be able to do anything if you don’t attribute it, except that maybe you
                           will step in a puddle while heading to an important meeting. Think about that.
I – Purpose of The Study
• To develop an insight into the community of movie actors/directors by analyzing their collaborations and
  audiences’ reactions to these collaborations (by collaborations, I mean movies, but “collaborations” sounds
  more scientific).

II – Methodology
• Analyzing the data for all actors and directors would have been too heavy a computation. This research was
  therefore run as a proof of concept, and only the actors/directors who featured in the IMDB Top 1000 Movies list
  (sourced from http://www.icheckmovies.com/lists/imdb+top+1000/lampadatriste/) were kept as subjects.

• The aforementioned page was saved as an HTML file, and a Python parsing script then parsed the page
  and picked up the IMDB links to the top 1000 movie pages. The code is not included: I ran out of time
  trying to format/comment all the files, so I could not submit the project, which was depressing. Let me
  know if you need it and I will send it across. It’s 14 lines of code, including the empty lines.

• The links were then scraped with a Python scraping script (using BeautifulSoup) to collect the following data
  (code included)
            • Director(s)
            • 3 Main Stars (listed separately on the IMDB movie page)
            • Year of Release
            • Movie Rating

• The data was stored in a CSV file, which was cleaned up manually (a bit), then analyzed using R and written
  out as GML (source code included)

• The GML was then loaded up in Gephi to visualize and analyze the network (results included)
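The parsing script from the second step isn’t included. The sketch below shows one way that step could look, using only Python’s standard-library html.parser instead of the original code. It assumes the saved page links each movie as http://www.imdb.com/title/tt<digits>/. The regex, the class name and the function name are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: assumes movie pages are linked as .../title/tt<digits>/
IMDB_TITLE = re.compile(r"https?://(?:www\.)?imdb\.com/title/(tt\d+)/?")


class ImdbLinkCollector(HTMLParser):
    """Collect unique IMDb title links from <a href=...> tags, in page order."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""  # attributes without a value are None
        match = IMDB_TITLE.match(href)
        if match and match.group(1) not in self._seen:
            self._seen.add(match.group(1))
            # Normalise every link to one canonical form
            self.links.append(f"http://www.imdb.com/title/{match.group(1)}/")


def extract_imdb_links(html_text):
    """Return the de-duplicated IMDb title links found in an HTML string."""
    collector = ImdbLinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links
```

You would feed it the text of the saved page, for example the result of open(path, encoding="utf-8").read() with a path of your choosing. It returns one canonical link per movie, in the order the movies appear on the page.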
III – Defining Nodes, Edges, Edgeweights
Nodes:
• Each unique director or actor is a node (and shall be referred to as a node/nodes henceforth)
• Node attributes:
           • Id – Node Id
           • Name – Name of Actor/Director
           • Appcount – Number of movies in the list involving a particular node
           • Skillscore – Depends on the rating of each movie in which a node appears

Edges:
• Every pair of nodes that worked together in a movie gets an edge between them.
• Edge Attributes:
            • Source
            • Target
            • Count – Number of times each edge appears (i.e. number of movies the two nodes made together)
            • Edge weight – There are two ways to calculate edge weights, depending on what one wants to
              understand
                       • Weight depending on Skillscores of involved nodes
                       • Weight depending on the quality of the partnership between two nodes (comes from
                         the movie rating)
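The project built its tables in R. As an illustration only, here is a minimal Python sketch of how the node attributes Id, Name and Appcount and the edge attributes Source, Target and Count follow from per-movie credit lists. Each movie becomes a clique. The input format (one list of names per movie) and the function name are assumptions. Skillscore and edge weights are left out.

```python
from collections import Counter
from itertools import combinations


def build_collaboration_graph(movies):
    """Build node and edge tables from per-movie credit lists.

    movies: iterable of lists, each holding the director(s) and the three
    main stars of one movie (a hypothetical input format).
    """
    appcount = Counter()
    edge_count = Counter()
    for people in movies:
        crew = sorted(set(people))  # count each person once per movie
        appcount.update(crew)
        # Every pair who worked on the same movie is linked: one clique per movie
        edge_count.update(combinations(crew, 2))
    names = sorted(appcount)
    node_id = {name: i for i, name in enumerate(names)}
    nodes = [{"Id": node_id[n], "Name": n, "Appcount": appcount[n]} for n in names]
    edges = [
        {"Source": node_id[a], "Target": node_id[b], "Count": count}
        for (a, b), count in sorted(edge_count.items())
    ]
    return nodes, edges
```

The names are sorted within each movie, so a pair is always stored in the same order. This means repeat collaborations add to one Count instead of creating a second edge.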
III – Skillscore and Edge weights
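R code used to help design the rubric:
#INCLUDE THE REQUIRED LIBRARY
library(lattice)

#READ THE DATAFILE AND CONVERT THE RATINGS TO A NUMERIC VECTOR
csv_path <- "<CSV FILE PATH>"   # placeholder: path to the scraped CSV file
data <- read.csv(csv_path, colClasses = 'character')
ratings <- as.numeric(data$RATINGS)

plot(histogram(ratings))

#CALCULATE INTERVALS OF WIDTH 0.3 AND COUNT THE RATINGS IN EACH
#(include.lowest keeps the minimum rating; the extended upper end keeps the maximum)
span <- seq(min(ratings), max(ratings) + 0.3, by = 0.3)
span.cut <- cut(ratings, span, right = TRUE, include.lowest = TRUE)
span.freq <- table(span.cut)
print(span.freq)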




Movie Rating Range    X = Contribution to Skillscore
[9.1, 9.5]            6.0
[8.9, 9.0]            5.3
[8.7, 8.8]            4.5
[8.3, 8.6]            3.5
[7.9, 8.2]            3.0
[7.5, 7.8]            2.5
[7.0, 7.4]            2.0

How is Skillscore Calculated?
Skillscore is the sum of the contributions (X in the table above) corresponding to the rating of each movie a node
appears in. So if I appear in three movies with ratings 7.1, 9.2 and 8.2, my skillscore would be 2.0 + 6.0 + 3.0 = 11.0.
Simple. More could be done with it, but for now, this is it.
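Here is a minimal Python sketch of the rubric. It assumes ratings carry one decimal, as IMDB ratings do. Ratings the table does not cover (below 7.0 or above 9.5) raise an error rather than being guessed. The names RUBRIC, contribution and skillscore are hypothetical.

```python
# Rubric table: (lowest rating, highest rating, contribution X)
RUBRIC = [
    (9.1, 9.5, 6.0),
    (8.9, 9.0, 5.3),
    (8.7, 8.8, 4.5),
    (8.3, 8.6, 3.5),
    (7.9, 8.2, 3.0),
    (7.5, 7.8, 2.5),
    (7.0, 7.4, 2.0),
]


def contribution(rating):
    """Return X for one movie rating (rounded to one decimal, like IMDB)."""
    r = round(rating, 1)
    for low, high, x in RUBRIC:
        if low - 1e-9 <= r <= high + 1e-9:  # small tolerance for float noise
            return x
    raise ValueError(f"rating {rating} is outside the rubric")


def skillscore(ratings):
    """Skillscore = sum of X over every movie the node appears in."""
    return sum(contribution(r) for r in ratings)
```

skillscore([7.1, 9.2, 8.2]) reproduces the 11.0 from the worked example.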
III – Skillscore and Edge-weights (b)
Edge-weight:

Option 1: Focus on the quality of the partnership between the nodes. Here the edge weight is calculated from the
rating(s) of the movie(s) in which the two nodes appear together, scored with the same Skillscore rubric. For example,
nodes A and B might occur together in two movies rated 8.3 and 9.0.
The formula is:
                          Edge Weight = 1 + P*log(Q*skillscore)
                          Skillscore = sum of all X’s (using the rubric from the last slide) corresponding to each
                                      movie’s rating (in which the two nodes appear together)
                          P, Q are constants which can be varied to optimize the width of edges in Gephi

Therefore, for our example
                       Edge Weight = 1 + P*log(Q*(3.5+5.3))

This is a straightforward way to figure out which partnerships (between which nodes) work in the movie
business.
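Here is Option 1 as a short Python sketch. It takes the rubric X values of the movies two nodes share. The slides don’t state the log base, so the natural log (math.log) is an assumption. The defaults P = Q = 1 are placeholders, not the values tuned for Gephi.

```python
import math


def partnership_weight(shared_contributions, p=1.0, q=1.0):
    """Option 1: Edge Weight = 1 + P*log(Q*skillscore of the shared movies).

    shared_contributions: rubric X value of each movie the two nodes share.
    p, q: tuning constants (placeholder defaults); natural log is assumed.
    """
    score = sum(shared_contributions)
    return 1 + p * math.log(q * score)
```

With those defaults, the example above (X = 3.5 and 5.3) gives 1 + ln(8.8), roughly 3.17. Q needs to keep Q*skillscore above 1 if every weight should stay above 1.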

Option 2: Focus on the skills of the nodes. Here the edge weight is calculated from the skillscores of the two
individual nodes connected by the edge. For any nodes A and B, the formula is:

                       Edge Weight = 1 + P*log(Q*skillscore(A)*skillscore(B))
                       Skillscore(i) = sum of all X’s (using the rubric from the last slide) corresponding to each
                                    movie’s rating (in which the node i appears)
                       P, Q are constants which can be varied to optimize the width of edges in Gephi

This makes sense when we are calculating weighted degrees and want to find out whether a node is connected to
many averagely skilled nodes or to a few highly skilled nodes.
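Here are Option 2 and the weighted degree it feeds into, as a Python sketch. It makes the same assumptions as before: natural log and placeholder P = Q = 1. The function names and the input format (edge pairs plus a name-to-skillscore dict) are hypothetical.

```python
import math
from collections import defaultdict


def skill_weight(score_a, score_b, p=1.0, q=1.0):
    """Option 2: Edge Weight = 1 + P*log(Q*skillscore(A)*skillscore(B))."""
    return 1 + p * math.log(q * score_a * score_b)


def weighted_degrees(edges, skillscores, p=1.0, q=1.0):
    """Sum of Option-2 edge weights incident to each node.

    edges: iterable of (a, b) pairs; skillscores: dict name -> skillscore.
    """
    totals = defaultdict(float)
    for a, b in edges:
        w = skill_weight(skillscores[a], skillscores[b], p, q)
        totals[a] += w
        totals[b] += w
    return dict(totals)
```

The log of the product is a sum of logs, so each weight grows only logarithmically with a neighbour’s skill. A node’s weighted degree therefore depends strongly on how many neighbours it has, not only on how skilled they are.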
IV – What does the network look like?
                        Quantitative Details:
                        •   Number of Nodes: 2159
                        •   Number of Edges: 5813 (pretty sparse graph, evidently)
                        •   Number of Connected Components: 80
                        •   Size of Giant Component: 1782 nodes (essentially Hollywood)
                        •   Number of communities detected (Modularity Resolution = 2.0): ~90
                        •   Average Degree: 5.385
                        •   Minimum Degree: 3 (each movie has at least one director and 3
                            actors, so each movie forms a clique of at least 4 nodes in which
                            every node has degree of at least 3)
                        •   Maximum Degree: 53 (who could this be? Any guesses?)
                        •   Clustering Coefficient: 0.775 (4297 triangles. Understandable, as each
                            movie clique contains at least 4 nodes and hence at least 4 triangles;
                            1000 cliques give roughly 4000 triangles, a few fewer where movies
                            share the same director and stars)
                        •   Network Diameter: 11
                        •   Avg. Path Length: 4.782

                        We’ll talk more about the mathematics when we discuss edge weights later.
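Most of the quantitative details above come from Gephi. Several of them can be cross-checked from a plain edge list: node and edge counts, average degree (2E/N), connected components, triangles, and minimum/maximum degree. Below is a short Python sketch of that check. The function name is hypothetical, and diameter, path length and the clustering coefficient are left out.

```python
from collections import defaultdict, deque
from itertools import combinations


def network_summary(edges):
    """Basic statistics for an undirected simple graph given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; sets drop duplicate edges
            adj[u].add(v)
            adj[v].add(u)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    # Connected components by breadth-first search
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes.append(size)
    # Each triangle is found once from each of its 3 corners
    triangles = sum(
        1 for node in adj for a, b in combinations(adj[node], 2) if b in adj[a]
    ) // 3
    degrees = [len(nbrs) for nbrs in adj.values()]
    return {
        "nodes": n,
        "edges": m,
        "average_degree": 2 * m / n,
        "components": len(sizes),
        "giant_component": max(sizes),
        "triangles": triangles,
        "min_degree": min(degrees),
        "max_degree": max(degrees),
    }
```

To use it, you would feed it every pair from each movie’s credits, for example with itertools.combinations over one movie’s director(s) and stars. As a sanity check on the reported figures, 2 * 5813 / 2159 ≈ 5.385, which matches the average degree above.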


                        What is this image on the left?
                        This is the network, visualized using the Radial Axis layout in Gephi. Here
                        the nodes have been arranged in a circle, and each spar of the circle
                        belongs to a modularity class. The long spars are the communities with the most
                        members, and the shadow beneath each spar is a dense network of edges.
                        We’ll talk more about what modularity classes signify in the coming slides.
                        For now, we can see that the network is a collection of 10-14 big
                        communities and many small ones.
V – Observations and Inferences
Nodes (type director) of one of the communities identified (one of the 90 modularity classes)

Running the modularity calculations in Gephi, these directors were grouped together (by the algorithm, not me) as
members of the same community. There were actor members too, but for the sake of understanding, we are
taking up only the directors. This community makes a lot of sense, as these directors do, in fact, belong to the
community of outstanding directors who were very active in the first half and the early second half of the 20th
century. This is as far as my substantial but limited knowledge of movies has taken me. I am sure a movie expert
would be able to see more similarities.
V – Observations and Inferences
Nodes (type director) of one of the communities identified (one of the 90 modularity classes)

This is also an interesting community, because most of its directors are patrons of world cinema and, in fact,
come from the European continent. Juan Antonio Bayona (a fairly young director) has even cited the
much-acclaimed Guillermo Del Toro as his inspiration, and the two have been grouped together without having
worked together. That was quite amazing. Zach Braff is a bit of a difficult nut to crack here, but he is also a bit
quirky when it comes to directing. I don’t know for sure; expert movie opinion needed.

Note: Brown color means the personality is both an actor and a director (not necessarily in the same movie).
V – Observations and Inferences
Nodes (type director) of one of the communities identified (one of the 90 modularity classes)

Surprisingly, the directors in this community are similar in the sense that most of the names in the inner circle are
known for having their own signature style of directing. The styles may differ, but having a signature style is
common to all of them. They have probably been grouped together because most of these directors tend to work
with a fixed set of actors, so the connection patterns around them might be similar (which is what the grouping is
based on).

Also, Wes Anderson’s inspiration is Stanley Kubrick, and both Wes Anderson and Guy Ritchie make movies with an
emphasis on script and dialogue (as did Stanley Kubrick).
V – Observations and Inferences
Nodes (type director) of one of the communities identified (one of the 90 modularity classes)

The Method Directors, or the Serious Directors, or the Cult Directors. Well, mostly Method Directors. These are the
directors who are passionate about the movies they make, and it shows in their movies. They pay close attention to
detail and are comfortable with quite a few genres.
V – Observations and Inferences
Nodes (type director) of one of the communities identified (one of the 90 modularity classes)

When edge weights depended on the partnerships, the graph brought forward this community, in which Stanley
Kubrick is grouped with Alfred Hitchcock’s community. That wasn’t the case earlier, when edge weights depended on
the nodes rather than on the partnerships. This grouping is more accurate on an objective basis, because Stanley
Kubrick was most active in the same era as Alfred Hitchcock, though it can’t be said for sure that they had the same
style.

We can probably say, then, that partnership-based edge weights are more accurate for detecting communities
based on objective criteria such as active period, origin, etc. (since the nodes will have the same types of
neighbours).
V – Qualitative Analysis of The Network
Nodes (type director) of one of the communities identified (one of the 90 modularity classes)

Compared to the community we saw earlier with Juan Antonio Bayona as a member, this community is more focused
on directors based in mainland Europe (or who primarily work in languages other than English). Danny Boyle is
missing from this group, and other European directors join in. This is another instance where partnership-based
edge weights are more accurate in identifying communities on an objective basis.
V – Observations and Inferences




Tag Cloud Based on Weighted Degrees (Weight by Partnerships)
Understandably, Robert De Niro (with the maximum degree, 53, and 21 appearances), Steven Spielberg (18
appearances) and Clint Eastwood are the highlights. Please take note of Tom Cruise and Hayao Miyazaki (a
Japanese director acclaimed for his amazing animated movies), as this will be helpful on the next slide.
V – Observations and Inferences




Tag Cloud Based on Betweenness Centrality (Weight by Partnerships)
Tom Cruise is now in the middle and prominent because, over time, he has worked with many
directors/actors and is thus more important as a connector. The same goes for Matt Damon. Hayao Miyazaki is
bigger again because he is central as a bridge between Hollywood and the Japanese movie industry (even though
he did not have a high weighted-degree rank).
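Betweenness centrality sums, over every pair of other nodes, the fraction of their shortest paths that pass through a given node. This is why a bridge between two otherwise separate industries scores high. Below is a minimal Python sketch of Brandes’ algorithm for this measure. It ignores edge weights, whereas the tag cloud above was built with partnership weights, so it illustrates the measure rather than reproducing the figure. The function name and the adjacency-dict input are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj: dict node -> set of neighbours. Returns unnormalised scores in which
    each unordered pair of endpoints is counted once.
    """
    scores = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies on s
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                scores[w] += delta[w]
    # Every undirected path was counted once from each of its two ends
    return {v: c / 2 for v, c in scores.items()}
```

Consider two small cliques that meet at a single node. That node lies on every shortest path between the two sides, so it gets the highest score even if its degree is modest.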
V – Observations and Inferences




Statistically the Most Important Partnerships in the World Movie Industry
No node sits on the smooth part of an edge. This is important to specify because otherwise Orson Welles appears to be
a bridge between Akira Kurosawa and Toshiro Mifune (which is not true). Each node is either at the end of a
connection or at an inflection point in the edge curve. Also, this ranking is statistical, because from a purely artistic
perspective, Daniel Radcliffe, Emma Watson and Rupert Grint are not exactly the most important partnership.
V – Shortcomings of the Data and the Analysis
•   The data collected is purely indicative. A better study would include all the prominent actors of a movie,
    not just the main 3. The listed main actors might not even be the leads, as IMDB lists them based on
    popular votes rather than their roles in the story.

•   A better formula for the skillscore can (definitely) be thought of, as the rubric used here might be too
    naïve and simple.

•   The data is only indicative of the quality of actors. More data about the movies (such as the number of votes
    and weighted votes, which IMDB uses) could make the network more granular.

•   Constraints of subjective analysis: while working with any form of art/literature, it is highly uninteresting to
    work only with the mathematical data without delving into the qualitative side of the analysis. On the
    other hand, it is just as difficult to have a solid understanding of the qualitative nature of the data, because it
    is highly dependent on the perceiver. Thus, most of the subjective analysis in this project might or might
    not be true, depending on how informed an analyst is about movies.

•   I found myself susceptible to hindsight bias and confirmation bias while analyzing this network. At one point, I
    even thought I had apophenia (seeing meaningful patterns where none exist). Thus I am not too sure of the
    conclusions drawn (also because I don’t have very detailed knowledge of movies, directors and
    actors), and they should be taken with a pinch of salt. They might be wrong, but as of now, since no one has
    burst the bubble, I will take them to be true by intuition.

Book_National_Library_of_India_Exclusive Craqdi Library .pdf
 
Smudge Animated Short Thumbnails Version 2
Smudge Animated Short Thumbnails Version 2Smudge Animated Short Thumbnails Version 2
Smudge Animated Short Thumbnails Version 2
 
Portfolio
PortfolioPortfolio
Portfolio
 
A selection of short panel comics by Petra van Berkum
A selection of short panel comics by Petra van BerkumA selection of short panel comics by Petra van Berkum
A selection of short panel comics by Petra van Berkum
 
Recycle Ann Arbor Brand Guide Presentation
Recycle Ann Arbor Brand Guide PresentationRecycle Ann Arbor Brand Guide Presentation
Recycle Ann Arbor Brand Guide Presentation
 
Film Poster for a fictional movie La Mer
Film Poster for a fictional movie La MerFilm Poster for a fictional movie La Mer
Film Poster for a fictional movie La Mer
 
HOSPICE CARE DECISIONS—AND WHAT TO EXPECT
HOSPICE CARE DECISIONS—AND WHAT TO EXPECTHOSPICE CARE DECISIONS—AND WHAT TO EXPECT
HOSPICE CARE DECISIONS—AND WHAT TO EXPECT
 
Flowering lilacs for celebrate spring 17
Flowering lilacs for celebrate spring 17Flowering lilacs for celebrate spring 17
Flowering lilacs for celebrate spring 17
 

networkanalysis

III – Defining Nodes, Edges, Edge Weights
Nodes:
• Each unique director or actor is a node (and will be referred to as a node/nodes from here on)
• Node attributes:
            • Id – Node ID
            • Name – Name of the actor/director
            • Appcount – Number of movies in the list involving the node
            • Skillscore – Depends on the rating of each movie in which the node appears
Edges:
• Every pair of nodes who have worked together in a movie gets an edge between them.
• Edge attributes:
            • Source
            • Target
            • Count – Number of times each edge appears (i.e. the number of movies the pair shares)
            • Edge weight – There are two ways to calculate edge weights, depending on what one wants to understand:
                        • Weight depending on the Skillscores of the nodes involved
                        • Weight depending on the quality of the partnership between the two nodes (derived from the movie ratings)
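The node and edge definitions above can be illustrated with a short Python sketch (Python being the language of the project's parsing and scraping scripts; this is not the project's own R code). It assumes each movie is a record with hypothetical keys `directors` and `stars`, treats every movie as a clique over its director(s) and 3 main stars, and tallies Appcount per node and Count per edge.

```python
from collections import Counter
from itertools import combinations

def build_network(movies):
    """Return (appcount, edge_count) for a list of movie records.

    Each record is assumed to be a dict with hypothetical keys 'directors'
    and 'stars', each holding a list of names.
    """
    appcount = Counter()    # node name -> movies it appears in (Appcount)
    edge_count = Counter()  # (source, target) -> movies shared (Count)
    for movie in movies:
        # Someone who both directs and stars in a movie is a single node.
        people = sorted(set(movie["directors"]) | set(movie["stars"]))
        appcount.update(people)
        # Every pair in the same movie is linked: each movie is a clique.
        for source, target in combinations(people, 2):
            edge_count[(source, target)] += 1
    return appcount, edge_count

movies = [
    {"directors": ["D1"], "stars": ["A1", "A2", "A3"]},
    {"directors": ["D1"], "stars": ["A1", "A4", "A5"]},
]
appcount, edge_count = build_network(movies)
print(appcount["D1"], edge_count[("A1", "D1")])  # 2 2
print(len(appcount), len(edge_count))  # 6 11
```

Sorting the names gives each pair one canonical (Source, Target) order, so a partnership that recurs across movies raises its Count instead of being stored as a second edge.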
III – Skillscore and Edge Weights (a)
R code used to help build the rubric:

    # LOAD THE REQUIRED LIBRARY
    library(lattice)

    # READ THE DATA FILE AND CONVERT THE RATINGS TO A NUMERIC VECTOR
    data <- read.csv(<CSV FILE PATH>, colClasses = 'character')
    ratings <- as.numeric(data$RATINGS)
    plot(histogram(ratings))

    # CALCULATE 0.3-WIDE INTERVALS AND COUNT THE MOVIES IN EACH
    # (the upper end is padded by a full bin so max(ratings) is always covered,
    #  and include.lowest = TRUE keeps min(ratings) in the first bin)
    span <- seq(min(ratings), max(ratings) + .3, by = .3)
    span.cut <- cut(ratings, span, right = TRUE, include.lowest = TRUE)
    span.freq <- table(span.cut)
    print(span.freq)

Movie Rating Range        X = Contribution to Skillscore
[9.1, 9.5]                6.0
[8.9, 9.0]                5.3
[8.7, 8.8]                4.5
[8.3, 8.6]                3.5
[7.9, 8.2]                3.0
[7.5, 7.8]                2.5
[7.0, 7.4]                2.0

How is Skillscore calculated?
Skillscore is the sum of the contributions (X in the table above) corresponding to the rating of each movie a node appears in. So if I appear in three movies rated 7.1, 9.2 and 8.2, my skillscore is 2.0 + 6.0 + 3.0 = 11.0.
Simple. More could be done with it, but for now, this is it.
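The Skillscore rubric can be expressed as a small lookup function. This is a minimal Python sketch assuming one-decimal IMDB ratings; the names `RUBRIC`, `contribution` and `skillscore` are illustrative, and treating ratings above 9.5 as 6.0 while rejecting ratings below 7.0 are assumptions, since the table does not cover those values.

```python
# Skillscore rubric from the table above: (lowest rating in range, X).
RUBRIC = [
    (9.1, 6.0),
    (8.9, 5.3),
    (8.7, 4.5),
    (8.3, 3.5),
    (7.9, 3.0),
    (7.5, 2.5),
    (7.0, 2.0),
]

def contribution(rating):
    """Return X for a single movie rating (ratings above 9.5 count as 6.0)."""
    r = round(rating, 1)  # IMDB ratings have one decimal; strip float noise
    for lower, x in RUBRIC:
        if r >= lower:
            return x
    # Assumption: the table starts at 7.0, so lower ratings are rejected.
    raise ValueError(f"rating {rating} is below the rubric's range")

def skillscore(ratings):
    """Skillscore = sum of X over the ratings of all of a node's movies."""
    return sum(contribution(r) for r in ratings)

# Worked example from the text: movies rated 7.1, 9.2 and 8.2.
print(skillscore([7.1, 9.2, 8.2]))  # 11.0
```

Because the ranges are contiguous for one-decimal ratings, checking each range's lower bound from the top down is enough to place a rating in its row.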
III – Skillscore and Edge Weights (b)
Edge weight:
Option 1: Focus on the quality of the partnership between the nodes.
Here the edge weight is calculated from the rating(s) of the movie(s) in which the two nodes appear together, using the same Skillscore rubric. For example, nodes A and B might appear together in two movies rated 8.3 and 9.0. The formula is:
    Edge Weight = 1 + P*log(Q*Skillscore)
    Skillscore = sum of all X's (from the Skillscore rubric) corresponding to the rating of each movie in which the two nodes appear together
    P, Q are constants that can be varied to optimize the width of edges in Gephi
Therefore, for our example, Edge Weight = 1 + P*log(Q*(3.5 + 5.3)) = 1 + P*log(8.8*Q).
This is a straightforward way to figure out which partnerships (between which nodes) work in the movie business.
Option 2: Focus on the skills of the nodes.
Here the edge weight is calculated from the skillscores of the two individual nodes connected by the edge. For any nodes A and B, the formula is:
    Edge Weight = 1 + P*log(Q*Skillscore(A)*Skillscore(B))
    Skillscore(i) = sum of all X's (from the Skillscore rubric) corresponding to the rating of each movie in which node i appears
    P, Q are constants that can be varied to optimize the width of edges in Gephi
This makes sense when calculating weighted degrees, to find out whether a node is connected to many averagely skilled nodes or to a few highly skilled ones.
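Both edge-weight options can be sketched in a few lines of Python. This assumes natural logarithms (the slide only says "log"; a different base just rescales P, because log_b(x) = ln(x)/ln(b)) and defaults of P = Q = 1; the function names are hypothetical, and the rubric is repeated so the block runs on its own.

```python
import math

# Skillscore rubric: (lowest rating in the range, contribution X).
RUBRIC = [(9.1, 6.0), (8.9, 5.3), (8.7, 4.5), (8.3, 3.5),
          (7.9, 3.0), (7.5, 2.5), (7.0, 2.0)]

def skillscore(ratings):
    """Sum the rubric contribution X of each (one-decimal) rating."""
    total = 0.0
    for rating in ratings:
        r = round(rating, 1)
        x = next((c for lower, c in RUBRIC if r >= lower), None)
        if x is None:
            raise ValueError(f"rating {rating} is below the rubric's range")
        total += x
    return total

def partnership_weight(shared_ratings, p=1.0, q=1.0):
    """Option 1: 1 + P*log(Q*Skillscore of the movies A and B share).

    Only defined for pairs with at least one shared movie (log(0) fails).
    """
    return 1 + p * math.log(q * skillscore(shared_ratings))

def skill_weight(ratings_a, ratings_b, p=1.0, q=1.0):
    """Option 2: 1 + P*log(Q*Skillscore(A)*Skillscore(B))."""
    return 1 + p * math.log(q * skillscore(ratings_a) * skillscore(ratings_b))

# Example from the text: A and B appear together in movies rated 8.3 and 9.0,
# so Skillscore = 3.5 + 5.3 = 8.8 and the weight is 1 + ln(8.8).
print(round(partnership_weight([8.3, 9.0]), 4))  # 3.1748
```

The logarithm compresses the multiplicative spread of Option 2 in particular, so a few prolific nodes do not produce edges many times wider than the rest.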
IV – What Does the Network Look Like?
Quantitative details:
• Number of nodes: 2159
• Number of edges: 5813 (a pretty sparse graph, evidently)
• Number of connected components: 80
• Size of the giant component: 1782 nodes (mostly Hollywood)
• Number of communities detected (modularity resolution = 2.0): ~90
• Average degree: 5.385 (= 2 × 5813 / 2159)
• Minimum degree: 3 (each movie has at least one director and 3 actors, so each movie forms a clique of at least 4 nodes in which every node has degree at least 3)
• Maximum degree: 53 (who could this be? Any guesses?)
• Clustering coefficient: 0.775 (4297 triangles). This is understandable: each movie clique contains at least 4 nodes, a 4-node clique holds 4 triangles, and so the 1000 movie cliques alone account for roughly 4000 triangles.
• Network diameter: 11
• Average path length: 4.782
We'll talk more mathematics when we get to edge weights later.
What is the image on the left? This is the network, visualized with the Radial Axis layout in Gephi. The nodes are arranged in a circle, with each spar of the circle belonging to one modularity class. The long spars are the communities with the most members, and the shadow beneath each spar is a dense web of edges. We'll talk more about what modularity classes signify in the coming slides. For now, we can see that the network is a collection of 10-14 big communities and many small ones.
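Most of the quantitative details above can be recomputed from a plain edge list. This is a minimal Python sketch using only the standard library, not Gephi's implementation; in particular, averaging local clustering coefficients (with nodes of degree below 2 scored 0) is an assumed convention, though it makes no difference here because every node has degree at least 3.

```python
from collections import deque
from itertools import combinations

def summarize(edges):
    """Summary statistics for an undirected, unweighted edge list."""
    adj = {}
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    if not adj:
        raise ValueError("the edge list is empty")
    n = len(adj)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    m = sum(degree.values()) // 2  # every edge is counted at both ends

    # Connected components by breadth-first search.
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)

    # Triangles through each node = adjacent pairs among its neighbours.
    local = {v: sum(1 for u, w in combinations(adj[v], 2) if w in adj[u])
             for v in adj}
    triangles = sum(local.values()) // 3  # each triangle seen from 3 corners
    clustering = [2 * local[v] / (degree[v] * (degree[v] - 1))
                  if degree[v] > 1 else 0.0 for v in adj]

    return {
        "nodes": n,
        "edges": m,
        "average_degree": 2 * m / n,
        "min_degree": min(degree.values()),
        "max_degree": max(degree.values()),
        "components": len(sizes),
        "giant_component": max(sizes),
        "triangles": triangles,
        "average_clustering": sum(clustering) / n,
    }

# Two movies with no one in common: two disjoint 4-node cliques.
example = list(combinations("ABCD", 2)) + list(combinations("EFGH", 2))
print(summarize(example))
```

On the two disjoint cliques it reports 8 nodes, 12 edges, average degree 3.0, 8 triangles, 2 components and an average clustering of 1.0, which is the per-movie structure the slide's triangle estimate relies on.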
V – Observations and Inferences
Nodes (type director) of one of the communities identified (one of the 90 modularity classes)
Running the modularity calculations in Gephi, these directors were grouped together (by the algorithm, not by me) as members of the same community. There were actor members too, but for the sake of understanding, we consider only the directors. This community makes a lot of sense, as these directors do in fact belong to the group of outstanding directors who were most active in the first half and the early second half of the 20th century. That is as far as my substantial but limited knowledge of movies takes me; I am sure a movie expert would see more similarities.
V – Observations and Inferences
Nodes (type director) of one of the communities identified (one of the 90 modularity classes)
This is also an interesting community, because most of its directors are patrons of world cinema and in fact come from Europe. Juan Antonio Bayona (a fairly young director) has even cited the much-acclaimed Guillermo del Toro as his inspiration, and the two were grouped together without ever having worked together. That was quite amazing. Zach Braff is a difficult nut to crack here, but he is also a bit quirky when it comes to directing. I don't know for sure; expert movie opinion needed.
Note: Brown means the person is both an actor and a director (not necessarily in the same movie).
V – Observations and Inferences
Nodes (type director) of one of the communities identified (one of the 90 modularity classes)
Surprisingly, the directors in this community are similar in that most of the names in the inner circle are known for having their own signature directing style. The styles differ, but having a signature style is common to all of them. They have probably been grouped together because most of these directors tend to work with a fixed set of actors, so their connection patterns (degree etc.) may look alike, and the grouping is based on such structural attributes. There may be more to it, though: Wes Anderson's inspiration is Stanley Kubrick, and both Wes Anderson and Guy Ritchie make movies with an emphasis on script and dialogue (as did Stanley Kubrick).
V – Observations and Inferences
Nodes (type director) of one of the communities identified (one of the 90 modularity classes)
The Method Directors, or the Serious Directors, or the Cult Directors. Well, mostly Method Directors. These are directors who are passionate about the movies they make, and it shows in their work. They pay close attention to detail and are comfortable with quite a few genres.
V – Observations and Inferences
Nodes (type director) of one of the communities identified (one of the 90 modularity classes)
When the edge weights depended on the partnerships, the graph produced this community, in which Stanley Kubrick is grouped with Alfred Hitchcock's community (which was not the case earlier, when edge weights depended on the nodes rather than the partnerships). This is more accurate on an objective basis, because Stanley Kubrick was active in the same era as Alfred Hitchcock, though it can't be said for sure that they had the same style. We can probably say, then, that partnership-based edge weights are more accurate for detecting communities based on objective criteria such as active period and origin (since the nodes will have the same types of neighbours).
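One way to see why switching from skill-based to partnership-based edge weights can move a node such as Stanley Kubrick into a different community is to score a fixed partition under different weights. The Python sketch below computes the standard weighted modularity Q of a given partition at resolution 1; it only scores partitions and does not search for them the way Gephi's modularity algorithm does, and the example graph and node names are hypothetical.

```python
from collections import defaultdict

def modularity(weighted_edges, community):
    """Weighted modularity Q of a fixed partition (resolution 1).

    weighted_edges: (a, b, w) triples of an undirected graph, no self-loops.
    community: dict mapping every node to its community label.
    Q = sum over communities c of W_in(c)/W - (S(c)/(2W))**2, where W is the
    total edge weight, W_in(c) the weight of edges inside c, and S(c) the
    sum of the weighted degrees of c's members.
    """
    total = 0.0
    inside = defaultdict(float)
    strength = defaultdict(float)
    for a, b, w in weighted_edges:
        total += w
        strength[community[a]] += w
        strength[community[b]] += w
        if community[a] == community[b]:
            inside[community[a]] += w
    return sum(inside[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)

# Two triangles joined by one bridge edge c-d, split into the two triangles.
triangles = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
             ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0)]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(triangles + [("c", "d", 1.0)], split))  # 5/14, about 0.357
print(modularity(triangles + [("c", "d", 5.0)], split))  # 1/22, about 0.045
```

The same split scores 5/14 when all weights are 1 but only 1/22 when the bridge weighs 5, so reweighting cross-community edges changes which grouping a modularity maximizer prefers.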
V – Qualitative Analysis of The Network
Nodes (type director) of one of the communities identified (one of the 90 modularity classes)
Compared to the community we saw earlier with Juan Antonio Bayona as a member, this community is more focused on directors based in mainland Europe (or who primarily work in languages other than English). Danny Boyle is missing from this group, and other European directors join in. This is another instance where partnership-based edge weights are more accurate at identifying communities on an objective basis.
V – Observations and Inferences
Tag Cloud Based on Weighted Degrees (Weight by Partnerships)
Understandably, Robert De Niro (with the maximum degree of 53, and 21 appearances), Steven Spielberg (18 appearances) and Clint Eastwood are the highlights. Please take note of Tom Cruise and Hayao Miyazaki (a Japanese director acclaimed for his animated movies), as this will be helpful in the next slide.
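The weighted degree that sizes this tag cloud is the sum of the weights of a node's edges. A minimal Python sketch, with hypothetical node names and weights, shows how a node with a few heavy edges can outrank one with more ordinary edges:

```python
import heapq
from collections import defaultdict

def weighted_degree(weighted_edges):
    """Sum of incident edge weights for every node of an undirected graph."""
    strength = defaultdict(float)
    for a, b, w in weighted_edges:
        strength[a] += w
        strength[b] += w
    return dict(strength)

def top_nodes(strength, k):
    """The k nodes with the largest weighted degree, largest first."""
    return heapq.nlargest(k, strength.items(), key=lambda item: item[1])

# Hypothetical: A has three ordinary partnerships, E two strong ones.
edges = [("A", "B", 1.0), ("A", "C", 1.0), ("A", "D", 1.0),
         ("E", "F", 2.0), ("E", "G", 1.5)]
print(top_nodes(weighted_degree(edges), 2))  # [('E', 3.5), ('A', 3.0)]
```

Here E ranks first with two edges while A, with three, comes second; plain degree would order them the other way.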
V – Observations and Inferences
Tag Cloud Based on Betweenness Centrality (Weight by Partnerships)
Tom Cruise is now in the middle and prominent, because over time he has worked with many directors/actors and therefore lies on more of the paths connecting other nodes. The same goes for Matt Damon. Hayao Miyazaki is bigger again because he is a central bridge between Hollywood and the Japanese movie industry (even though he did not rank high by weighted degree).
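Betweenness centrality counts how many shortest paths between other pairs of nodes pass through a node, which is how a bridge can rank high despite a modest weighted degree. The Python sketch below implements Brandes' algorithm for the unweighted case only; it ignores edge weights (a weighted variant would replace the breadth-first search with Dijkstra's algorithm and treat each weight as a distance), and the example graph and node names are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Unweighted betweenness centrality (Brandes' algorithm).

    adj: dict mapping each node to a set of neighbours (undirected graph).
    Returns raw scores, counting each unordered pair of endpoints once.
    """
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: every pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

def adjacency(edges):
    """Build an undirected adjacency dict (node -> set of neighbours)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

# Hypothetical: two triangles (H*, J*) linked only through node M.
edges = [("H1", "H2"), ("H1", "H3"), ("H2", "H3"),
         ("J1", "J2"), ("J1", "J3"), ("J2", "J3"),
         ("H1", "M"), ("M", "J1")]
scores = betweenness(adjacency(edges))
print(scores["M"], scores["H1"], scores["H2"])  # 9.0 8.0 0.0
```

Node M has only two edges, yet it sits on all 9 shortest paths between the two triangles and so gets the highest score.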
V – Observations and Inferences
Statistically the Most Important Partnerships in the World Movie Industry
No node lies on the smooth part of an edge. This is important to state, because otherwise Orson Welles would appear to be a bridge between Akira Kurosawa and Toshiro Mifune (which is not true). Each node is either at the end of a connection or at an inflection point of the edge curve. Also, this is statistical: from an artistic perspective, Daniel Radcliffe, Emma Watson and Rupert Grint are not exactly the most important partnership. Purely from an artistic perspective.
V – Shortcomings of the Data and the Analysis
• The data collected is purely indicative. A better study would include all the prominent actors of a movie rather than just the main 3. The listed main actors might not even be the main actors, as they are chosen by popular vote rather than by their roles in the story.
• A better formula for the skillscore can (definitely) be devised, as the rubric used here might be too naïve and simple.
• The data is only indicative of the quality of the actors. More data about the movies (such as the number of votes and weighted votes, which IMDB uses) could make the network more granular.
• Constraints of subjective analysis: when working with any form of art/literature, it is highly uninteresting to work only with the mathematical data without delving into the qualitative side of the analysis. On the other hand, it is just as difficult to gain a solid understanding of that qualitative side, because it depends heavily on the perceiver. Thus, much of the subjective analysis in this project may or may not hold, depending on how informed the analyst is about movies.
• I found myself susceptible to hindsight bias and confirmation bias while analyzing this network. At one point, I even thought I had apophenia (seeing meaningful patterns where none exist). So I am not too sure of the conclusions drawn (also because I don't have in-depth knowledge of movies, directors and actors), and they should be taken with a pinch of salt. They might be wrong, but as of now, since no one has burst the bubble, I will take them to be true by intuition.