This document discusses projects that analyze and visualize large cinema datasets to investigate patterns in film distribution. It describes a project analyzing over 63 million records of film screenings from around the world to study how films circulate globally and how seasonality and location affect screenings. Various visualization techniques are used including maps showing changes over time in Australian cinema venues, global cartograms of cinema numbers and screens, and circular diagrams tracing film screening sequences between venues.
1. Visualising heterogeneous cinema data sets
Big, Open Data and the Practice of GIScience
RGS-IBG Annual Conference, London 29 August 2013
Colin Arrowsmith, School of Mathematical and Geospatial Sciences, RMIT University, Melbourne, Victoria, Australia
Deb Verhoeven and Alwyn Davidson, School of Communication and Creative Arts, Deakin University, Melbourne, Victoria, Australia
2. A big data project
“Only at the movies: Kinomatics”
School of Mathematical and Geospatial Sciences
3. Objective
To investigate spatial patterns of film diffusion across the world.
– How do films circulate around the world?
– Does spatial clustering affect film screening?
– How does seasonality affect screening?
4. Dimensions of “Big data”
• Variety
• Velocity
• Volume
IBM “Bringing big data to the Enterprise”
(http://www-01.ibm.com/software/au/data/bigdata/)
• Visualization
5. Working with “Big data”
• Database downloaded from a commercial film data collector
• 2 to 2.5 million showtime records per week
• 30,000 movies downloaded after seven months
• 28,000 cinema venues and 118,000 screens
• 63.5 million records, equating to 4.8 GB of data
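As a rough consistency check on these figures, the weekly rate can be compared with the total record count. This sketch assumes that seven months is about 30 weeks and that 1 GB means 10^9 bytes; neither assumption is stated on the slide.

```python
# Rough consistency check of the slide's figures (assumptions labelled below).
WEEKS_PER_MONTH = 52 / 12            # assumption: average month length
weeks = 7 * WEEKS_PER_MONTH          # "after seven months"
low, high = 2.0e6 * weeks, 2.5e6 * weeks  # 2 to 2.5 million showtimes per week

total_records = 63.5e6
total_bytes = 4.8e9                  # assumption: 1 GB = 10**9 bytes

print(f"expected records: {low / 1e6:.1f} to {high / 1e6:.1f} million")
print(f"observed records: {total_records / 1e6:.1f} million")
print(f"average size per record: {total_bytes / total_records:.0f} bytes")
```

The observed 63.5 million falls inside the expected range of roughly 61 to 76 million, so the figures are mutually consistent. The average of about 76 bytes per record is only indicative, since the slide does not say whether the 4.8 GB is compressed.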
7. Projects exploring approaches for visualising and analysing big film data
• Geographic methods
– Post-war cinema venues in Australia (change-over-time)
– Global cartograms for cinema (point-in-time)
– Global patterns of movement
• Non-geographic (conceptual)
– Multivariate visualisations (change-over-time)
– Film circulation (Markov-Chains)
8. Geographic examples
• Post-war cinema venues in Australia (change-over-time)
• Global cartograms for cinema (point-in-time)
• Global patterns
9. Static maps of post-war cinema venues in Australia
• Data based on scanned “Film Weekly” summaries
• Base year of 1948 derived
• New and closed cinemas determined
• Significant post-processing
11. Rural scale changes
Panels: 1948 to 1953, 1953 to 1958, 1958 to 1963, 1963 to 1968, 1968 to 1971
12. Rural scale changes
Panels: 1948 to 1953, 1953 to 1958, 1958 to 1963, 1963 to 1968, 1968 to 1971
13. Urban scale changes (Melbourne)
Panels: 1948 to 1953, 1953 to 1958, 1958 to 1963, 1963 to 1968, 1968 to 1971
14. Global cinema cartograms
• A cartogram is a map in which a thematic variable is substituted for area (or distance)
• Population substituted for area
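The substitution principle can be illustrated numerically: in a contiguous cartogram, each region's target area is its share of the thematic variable multiplied by the total map area. The sketch below computes only those targets; the region names and figures are hypothetical, and actually deforming the map towards the targets needs an algorithm such as Gastner-Newman diffusion.

```python
def cartogram_target_areas(areas, values):
    """Return the area each region should occupy when `values` replaces area.

    areas  -- mapping region -> current map area (any consistent unit)
    values -- mapping region -> thematic variable (e.g. population, screens)
    The total map area is preserved; only its distribution changes.
    """
    total_area = sum(areas.values())
    total_value = sum(values.values())
    return {region: total_area * values[region] / total_value for region in areas}


def scale_factors(areas, targets):
    """Ratio of target to current area: above 1 means the region must grow."""
    return {region: targets[region] / areas[region] for region in areas}


# Hypothetical three-region example (not project data).
areas = {"North": 50.0, "Centre": 30.0, "South": 20.0}
population = {"North": 10.0, "Centre": 60.0, "South": 30.0}

targets = cartogram_target_areas(areas, population)
print(targets)                        # {'North': 10.0, 'Centre': 60.0, 'South': 30.0}
print(scale_factors(areas, targets))  # {'North': 0.2, 'Centre': 2.0, 'South': 1.5}
```

Here the sparsely populated "North" shrinks to a fifth of its area while "Centre" doubles, which is the visual effect a population cartogram produces.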
19. Life of Pi
Panels: 30 November 2012, 7 December 2012, 14 December 2012, 21 December 2012
20. Life of Pi
Panels: 28 December 2012, 4 January 2013, 11 January 2013, 17 January 2013
21. Life of Pi (November 2012 to January 2013)
22. Life of Pi (November 2012 to January 2013)
23. Non-geographic examples
• Multivariate visualisations (change-over-time)
• Film circulation (Markov-Chains)
26. Movement approaches: The Greek cinema circuit
• Objective
– To explore historical changes in the diasporic Greek cinema distribution of Finos and Anzervos films during the period 1956 to 1963
• Rationale
– To demonstrate the role of geographic analysis in understanding cinema circuit behaviour
27. Data acquisition
• Archival newspaper and oral history research
• Government records
– censorship records
– theatre licence and company records
• Geolocation by street address or GPS
32. Key chains identified
No. of venues | Key chains (Anzervos, Finos)
1 | B, C, B, A
2 | BC, CB, BC, AD
3 | BCB, CBC, BCB, BCA
4 | BCBC, MGPC, BCBC, BCBA
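The slide does not say how key chains were identified. One plausible reading, sketched here as an assumption, is to count every contiguous run of venues of a given length across all films' venue sequences and report the most frequent chain for each length. "BCBCAB" is the venue sequence given for The Fort of Freedom; the other two sequences are hypothetical.

```python
from collections import Counter


def chains_of_length(sequence, n):
    """All contiguous venue chains of length n in one film's venue sequence."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def key_chains(sequences, max_len=4):
    """Most frequent chain for each chain length 1..max_len across all films.

    Ties go to the chain encountered first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    result = {}
    for n in range(1, max_len + 1):
        counts = Counter()
        for seq in sequences:
            counts.update(chains_of_length(seq, n))
        if counts:
            result[n] = counts.most_common(1)[0]
    return result


# First sequence from the slides; the other two are hypothetical.
films = ["BCBCAB", "BCBC", "CBCA"]
for length, (chain, count) in key_chains(films).items():
    print(length, chain, count)
```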
33. Circos – circular visualisations
• Film sequence (The Fort of Freedom):
– BCBBBBBCAABBBBBB by screening, or
– BCBCAB by venue
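The two strings for The Fort of Freedom appear to be related by collapsing consecutive screenings at the same venue into a single visit: doing this to BCBBBBBCAABBBBBB gives exactly BCBCAB. A minimal sketch of that step using `itertools.groupby`, which also keeps the number of screenings per visit:

```python
from itertools import groupby


def venue_sequence(screenings):
    """Collapse consecutive screenings at the same venue into one venue visit.

    Each character is a venue code (per the speaker notes, A = Melbourne
    Town Hall, B = Lawson Theatre, C = Doncaster Theatre).
    """
    return "".join(venue for venue, _run in groupby(screenings))


def visits_with_lengths(screenings):
    """Venue visits paired with the number of consecutive screenings."""
    return [(venue, len(list(run))) for venue, run in groupby(screenings)]


fort_of_freedom = "BCBBBBBCAABBBBBB"  # screening sequence from the slide
visits = visits_with_lengths(fort_of_freedom)
print(venue_sequence(fort_of_freedom))  # BCBCAB
print(visits)
```

Keeping the run lengths matters because the screening view and the venue view answer different questions: how long a film stayed at a venue versus where it went next.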
34. Change in sequence (Anzervos)
Ali Pasha and Mrs Frosini
The Fort of Freedom
35. Change in sequence (Finos)
Music, Poverty and Pride
Astero
36. Change of venue date
37. Change of venue date
[Charts: Ali Pasha and Mrs Frosini; The Fort of Freedom. Screenings labelled by venue code, with Days on the x-axis and Months on the y-axis]
38. Change of venue date
[Charts: Music, Poverty and Pride; Astero. Screenings labelled by venue code, with Days on the x-axis and Months on the y-axis]
39. Olive trees
• Olives mark where films finished: green = Sydney venue, purple = Melbourne venue
• Leaves are screenings: yellow = QLD, light green = NSW, darker green = VIC, dark brown = SA
• Distance represents the number of days between screenings, drawn to scale
Anzervos
Finos
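The distances in the olive trees are intervals between consecutive screenings. A minimal sketch of computing them, assuming each film's screenings are available as (venue code, date) pairs; the dates below are hypothetical and not taken from the archival data.

```python
from datetime import date


def days_between_screenings(screenings):
    """Return (from_venue, to_venue, gap_in_days) for consecutive screenings.

    screenings -- list of (venue_code, datetime.date) pairs for one film;
                  they are sorted by date first, so input order does not matter.
    """
    ordered = sorted(screenings, key=lambda s: s[1])
    return [
        (prev_venue, next_venue, (next_day - prev_day).days)
        for (prev_venue, prev_day), (next_venue, next_day) in zip(ordered, ordered[1:])
    ]


# Hypothetical screenings of one film (illustrative dates only).
film = [
    ("B", date(1959, 3, 1)),
    ("C", date(1959, 3, 8)),
    ("A", date(1959, 4, 12)),
]
for gap in days_between_screenings(film):
    print(gap)
```

Each gap would then set the length of a branch segment when the tree is drawn to scale.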
40. Issues working with “big” complex cinema data
• Multiple sources of data
• Working at multiple scales
• Working with historic data
• Multiple definitions
• Need for visualising both geographic and conceptual relationships
Speaker notes
Thanks, James. I’d like to start by acknowledging my co-authors of this presentation: Deb Verhoeven, a media and film expert and cinema historian, with whom I have worked for probably the past 6-7 years; and Alwyn Davidson, one of my former PhD students, who is working with us as a researcher on this project. The two disciplines are quite different, and the project blends “QUALITATIVE” with “QUANTITATIVE” approaches.
My presentation will take you through a project, funded by the Australian Research Council (ARC), that aims to understand the spatial patterns of film diffusion throughout the world. The project is still in its development phase, but what I want to run through today are some of the methods we’ve used to analyse and visualise film movement and cinema venue activity. These may prove useful in understanding film movements that have been collected and stored as “Big Data”.
We’ve called the project “Kinomatics”, from “kino”, the Russian pronunciation of “cine” (cinema), a term often used in cinema literature (e.g. the “Kino Cinema” in Melbourne). The name is also a play on kinetic energy (i.e. pertaining to movement).
We have a number of research questions of which these are but a few.
How has digitization affected film distribution, now that it is no longer a “physical” movement of film?
If we start by reviewing IBM’s dimensions of Big Data: VARIETY (data can be structured or unstructured, text, video or audio); VELOCITY (data are time-sensitive and can require streaming); and, of course, VOLUME, which comes in one size: LARGE.
We would also add VISUALIZATION, which is needed in order to analyse patterns.
We’re downloading (via daily streaming) screenings for films across 48 countries. The data are collected by a US company for commercial purposes such as advertising. Other large databases also hold some of this data (for example, the Internet Movie Database, or IMDb, available at www.imdb.com), but they only give films for specific regions.
Compressed data files are downloaded automatically via a Perl synchronisation service.
The project database is MySQL 5.1.67, the standard version on RHEL 6.
It is stored on a virtual server at Deakin running Red Hat Enterprise Linux (RHEL).
We currently have 63.5 million records and estimate about 100 million by the end of the project.
This is an outline of the database schema – the link between MOVIE and VENUE is the SHOWTIME (or screening date and time).
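A minimal sketch of that linking structure, using Python's built-in `sqlite3` in place of the project's MySQL database so that it runs self-contained. Table and column names, venues and showtimes are hypothetical; only the MOVIE, SHOWTIME and VENUE relationship comes from the notes.

```python
import sqlite3

# Hypothetical schema: SHOWTIME is the link table between MOVIE and VENUE.
SCHEMA = """
CREATE TABLE movie (
    movie_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL
);
CREATE TABLE venue (
    venue_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    country  TEXT NOT NULL
);
CREATE TABLE showtime (
    showtime_id INTEGER PRIMARY KEY,
    movie_id    INTEGER NOT NULL REFERENCES movie(movie_id),
    venue_id    INTEGER NOT NULL REFERENCES venue(venue_id),
    starts_at   TEXT NOT NULL  -- screening date and time (ISO 8601)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the MOVIE/VENUE links
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO movie VALUES (?, ?)", [(1, "Skyfall"), (2, "Life of Pi")])
conn.executemany(
    "INSERT INTO venue VALUES (?, ?, ?)",
    [(1, "Venue X", "AU"), (2, "Venue Y", "GB")],  # hypothetical venues
)
conn.executemany(
    "INSERT INTO showtime (movie_id, venue_id, starts_at) VALUES (?, ?, ?)",
    [
        (1, 1, "2013-01-10T19:00"),
        (1, 2, "2013-01-10T20:30"),
        (2, 1, "2013-01-10T18:00"),
        (1, 1, "2013-01-10T21:30"),
    ],
)

# Count screenings per movie and country by joining through SHOWTIME.
rows = conn.execute(
    """
    SELECT m.title, v.country, COUNT(*) AS screenings
    FROM showtime AS s
    JOIN movie AS m ON m.movie_id = s.movie_id
    JOIN venue AS v ON v.venue_id = s.venue_id
    GROUP BY m.title, v.country
    ORDER BY m.title, v.country
    """
).fetchall()
print(rows)
```

A per-country count like this is the kind of thematic variable a screening cartogram would substitute for area.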
Geographic methods preserve geographic location as true or near-true. They show geographic relationships between venues and the geographic movement of screenings.
Non-geographic methods show relationships between distributors and venues.
Much of the earlier project data came from project-based databases such as CAARP.
Other visualisations: the “Information is beautiful” website (www.informationisbeautiful.net), which includes “Hollywood Visualizations” (http://www.informationisbeautiful.net/2012/hollywood-visualizations/)
The first example came from an earlier ARC Discovery Grant.
Based on “Hot Spot” (Getis-Ord) analysis, which identifies statistically significant hot spots (high values, i.e. increases in cinema numbers) and cold spots (low values, i.e. losses in cinema numbers). One issue is the effect of small polygons.
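The notes name Getis-Ord hot spot analysis without giving the statistic. The sketch below computes the Getis-Ord Gi* z-score, the form commonly used by hot spot tools; that this exact variant was used is an assumption, as are the binary contiguity weights (each region plus its immediate neighbours) and the five hypothetical regions.

```python
import math


def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-score for each region.

    values  -- attribute x_j per region (e.g. change in cinema numbers)
    weights -- n x n spatial weights w_ij; for Gi* the diagonal w_ii is
               non-zero, so each region counts as part of its own neighbourhood
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    scores = []
    for row in weights:
        sum_w = sum(row)
        sum_w2 = sum(w * w for w in row)
        numerator = sum(w * x for w, x in zip(row, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


def chain_weights(n):
    """Hypothetical binary weights for n regions in a row: self plus neighbours."""
    return [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]


# Hypothetical change in cinema numbers for five adjacent regions.
change = [4, 3, 0, -3, -4]
for region, z in enumerate(getis_ord_gi_star(change, chain_weights(5))):
    print(region, round(z, 2))
```

With these weights the first two regions score about +1.81 and the last two about -1.81. Under the usual reading, |z| > 1.96 would mark a hot or cold spot significant at the 5% level, so none reaches that threshold in this tiny example. With very small polygons each neighbourhood contains few features, which is one reason the statistic can be unstable.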
This shows that state boundaries were not that significant, but topography was: hilly terrain in north-east Victoria versus flat areas in NSW and Queensland. Distance was not important, but travel time was.
Cars became influential in the late 1950s.
The cartogram was generated using the Gastner-Newman “diffusion-based” method, which equalises density across a set of polygons. It assigns the mean density to the area outside the polygons of interest so that their overall shape is maintained.
Screenings for “Skyfall” shown on 10 January 2013
Screenings of different films shown on 24 December 2012.
Approximately 1,500 different films were screened more than 300,000 times at 82,000 venues.
Screenings for “Life of Pi”
Radial axes relate to time: circles increase from 1948 in 5-year intervals.
Lines indicate the length of cinema venue operation.
Colours relate to venue operator.
Centre = Melbourne GPO
This could be used in a similar fashion to a weather map, with a petal diagram at each location.
Using “Tableau”
The Greek cinema circuit operated by staggering the release of films. The study period was one in which hundreds of thousands of Greeks migrated to Australia (250,000 Greeks came to Australia between 1952 and 1974).
This uncovers the relationships between the cinemas themselves. There was anecdotal evidence that films tended to follow a predictable pathway, and we wanted to test this: each film had a single release (one physical copy that moved from venue to venue).
Markov chains: a stochastic process in which the next state depends only on the current one, so that an initial condition can result in a number of alternative outcomes.
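To make the Markov-chain idea concrete, a first-order transition matrix can be estimated by counting how often a film moves from one venue to the next and normalising each row. This sketch assumes venue sequences with consecutive repeats already collapsed; apart from BCBCAB (The Fort of Freedom), the sequences are hypothetical, and the notes do not specify the order or estimation method the project used.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities P(next | current).

    sequences -- iterable of venue-code strings, one per film.
    Returns {current_venue: {next_venue: probability}}; each row sums to 1.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in sorted(row.items())}
        for current, row in sorted(counts.items())
    }


def most_likely_path(probs, start, steps):
    """Follow the highest-probability transition from `start` for `steps` moves."""
    path = start
    for _ in range(steps):
        row = probs.get(path[-1])
        if not row:
            break  # no observed move out of this venue
        path += max(row, key=row.get)
    return path


films = ["BCBCAB", "BCBC", "CBCA"]  # only the first is from the slides
probs = transition_probabilities(films)
print(probs)
print(most_likely_path(probs, "B", 3))
```

In this toy data every film leaving B goes to C, while C splits 60/40 between B and A; the "predictable pathway" hypothesis corresponds to rows dominated by a single high probability.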
CAARP = Cinema and Audience Research Project
Much of the data came from the Greek-language newspaper “Neos Kosmos”.
1 film went to “A”
15 films went to “B” (6 went only to B)
A = Melbourne Town Hall (Melbourne)
B = Lawson Theatre (Redfern)
C = Doncaster Theatre (Sydney)
D = Nicholas Hall (Melbourne)
Circos is used in genome sequencing (e.g. A, C, G and T are bases, and three of these code for one amino acid).
Produced using the “Circos” software, originally developed for identifying and analysing similarities and differences in genome structure and the sequencing of multiple genomes.
The similarities between visualising genome sequences and cinema venue sequences were evident. The circular approach to representing connections between venues proved easier to organise than a linear method.
Hence it could be concluded that Finos Films had a much broader, or more eclectic, venue repertoire than Anzervos, which was more constrained to venues A, B and C.
Acknowledge Michelle Mantsio, the research assistant who collected the data, entered it into CAARP and drew these diagrams. The olive tree is a metaphor for Greek film distribution.
Multiple sources and types of data: publications, third-party commercial data and external databases, often collected for differing purposes. We also need socio-demographic, meteorological/seasonal and other data.
Multiple scales: local and global, with differing levels of spatial and attribute accuracy, so triangulation is needed to confirm validity. Our big data project is not truly global (48 countries). Hollywood is in the process of signing agreements for digital distribution, but hard copies are still mailed out, and some countries will be left out because of internet restrictions.
Historic data: as above, there are often gaps in the data that cannot be filled.
Multiple definitions: for example, the meaning of a “venue”. In country Australia it may be a town hall or a travelling cinema (caravans).
Finally there is a need to visualise in different ways to build a collective story.