The data

The script

Your turn

Questions?

Hands-on-Workshop
Big (Twitter) Data
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Department of Communication Science
University of Amsterdam

30 January 2014
10.45
#bigdata

In this session (2/4):
1 The data
    Recording tweets with yourTwapperkeeper
    CSV-files
    Other ways to collect tweets
    Not that different: Facebook posts
2 The script
    Pseudo-code
    Python code
    The output
3 Your turn
4 Questions?

The data:
Recording tweets with yourTwapperkeeper
http://datacollection.followthenews-uva.cloudlet.sara.nl

yourTwapperkeeper

Storage
Continuously calls the Twitter API and saves all
tweets containing specific hashtags to a
MySQL database.
You tell it once which data to collect, then
wait a few months.

Retrieving the data
You could access the MySQL-database directly.
But yourTwapperkeeper has a nice interface
that allows you to export the data to a format
we can use for the analysis.

The data:
CSV-files

The format of our choice
• Virtually all programs can read it
• Even human-readable in a simple text editor:
• Plain text, with a comma (or a semicolon) denoting column breaks
• No limits regarding the size

text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,created_at,time
:-) #Lectrr #wereldleiders #uitspraken #Wikileaks #klimaattop http://t.co/Udjpk48EIB,,henklbr,407085917011079169,118374840,nl,web,http://pbs.twimg.com/profile_images/378800000673845195/b47785b1595e6a1c63b93e463f3d0ccc_normal.jpeg,,0,0,Sun Dec 01 09:57:00 +0000 2013,1385891820
Wat zijn de resulaten vd #klimaattop in #Warschau waard? @EP_Environment ontmoet voorzitter klimaattop @MarcinKorolec http://t.co/4Lmiaopf60,,Europarl_NL,406058792573730816,37623918,en,<a href="http://www.hootsuite.com" rel="nofollow">HootSuite</a>,http://pbs.twimg.com/profile_images/2943831271/b6631b23a86502fae808ca3efde23d0d_normal.png,,0,0,Thu Nov 28 13:55:35 +0000 2013,1385646935
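
A minimal sketch of how a file in this layout could be read, assuming Python 3 and its built-in csv module rather than the workshop's unicsv helper. The sample is abbreviated from the export above: the tweet texts are shortened, and the example.org image URLs and the plain "HootSuite" source value are placeholders, not real data. The function name read_tweets is hypothetical.

```python
import csv
import io

# Abbreviated sample in the export layout shown above (13 columns).
# Tweet texts are shortened; the example.org URLs are placeholders.
SAMPLE = (
    "text,to_user_id,from_user,id,from_user_id,iso_language_code,source,"
    "profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,"
    "created_at,time\n"
    ":-) #Lectrr #klimaattop,,henklbr,407085917011079169,118374840,nl,web,"
    "http://example.org/a.jpeg,,0,0,Sun Dec 01 09:57:00 +0000 2013,1385891820\n"
    "Wat zijn de resulaten vd #klimaattop in #Warschau waard?,,Europarl_NL,"
    "406058792573730816,37623918,en,HootSuite,"
    "http://example.org/b.png,,0,0,Thu Nov 28 13:55:35 +0000 2013,1385646935\n"
)


def read_tweets(handle):
    """Return (header, rows) from an open CSV file object."""
    reader = csv.reader(handle)
    header = next(reader)  # the first line holds the column names
    rows = list(reader)    # every further line is one tweet
    return header, rows


header, rows = read_tweets(io.StringIO(SAMPLE))
for row in rows:
    # Column 0 is the tweet text, column 2 the sender (counting from 0).
    print(row[2], "->", row[0])
```

With a real export, the io.StringIO object would be replaced by an opened file, for example open("mytweets.csv", newline="", encoding="utf-8"); the encoding is an assumption about how the file was saved.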

The data:
Other ways to collect tweets

Again, we want a CSV file...
• If you want tweets per person: www.allmytweets.net
• Up to six days back: www.scraperwiki.com
• Buy it from a commercial vendor
• TCAT (from the guys at DMI/mediastudies)
• For specific purposes, write your own Python script to access the Twitter API
(if you want to, I can show you more about this tomorrow)

The data:
Not that different: Facebook posts

Have a look at netvizz
• Gephi files for network analysis
• ... and a tab-separated file (essentially the same as CSV) with the content

An alternative: Facepager
• Tool to query different APIs (among others Twitter and Facebook) and to store the result in a CSV table
• http://www.ls1.ifkw.uni-muenchen.de/personen/wiss_ma/keyling_till/software.html
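
To illustrate why a tab-separated file is "essentially the same as CSV", here is a small sketch assuming Python 3's built-in csv module. The two-column sample and the name read_tsv are hypothetical; a real netvizz export has its own, larger set of columns.

```python
import csv
import io

# Hypothetical two-column sample; real exports contain more columns.
SAMPLE_TSV = "post_id\tmessage\n1\tFirst post, with a comma\n2\tSecond post\n"


def read_tsv(handle):
    """Read a tab-separated file into a list of rows (lists of strings)."""
    # The only difference from reading CSV is the delimiter argument.
    return list(csv.reader(handle, delimiter="\t"))


rows = read_tsv(io.StringIO(SAMPLE_TSV))
print(rows[1])
```

Because the tab is the column separator here, a comma inside a message does not split the field.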

The script:
Pseudo-code

Our task: Identify all tweets that include a reference to Poland
Let’s start with some pseudo-code!
open csv-table
for each line:
    append column 1 to a list of tweets
    append column 3 to a list of corresponding users
    look for searchstring in column 1
    append search result to a list of results
save lists to a new csv-file

The script:
Python code

#!/usr/bin/python
from unicsv import CsvUnicodeReader
from unicsv import CsvUnicodeWriter
import re
inputfilename="mytweets.csv"
outputfilename="myoutput.csv"
user_list=[]
tweet_list=[]
search_list=[]
searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')
print "Opening "+inputfilename
reader=CsvUnicodeReader(open(inputfilename,"r"))
for row in reader:
    tweet_list.append(row[0])
    user_list.append(row[2])
    matches1 = searchstring1.findall(row[0])
    matchcount1=0
    for word in matches1:
        matchcount1=matchcount1+1
    search_list.append(matchcount1)
print "Constructing data matrix"
outputdata=zip(tweet_list,user_list,search_list)
headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
print "Write data matrix to ",outputfilename
writer=CsvUnicodeWriter(open(outputfilename,"wb"))
writer.writerows(headers)
writer.writerows(outputdata)

#!/usr/bin/python
# We start with importing some modules:
from unicsv import CsvUnicodeReader
from unicsv import CsvUnicodeWriter
import re

# Let us define two variables that contain
# the names of the files we want to use
inputfilename="mytweets.csv"
outputfilename="myoutput.csv"

# We create some empty lists that we will use later on.
# A list can contain several variables
# and is denoted by square brackets.
user_list=[]
tweet_list=[]
search_list=[]

# What do we want to look for?
searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')

# Enough preparation, let the program begin!
# We tell the user what is going on...
print "Opening "+inputfilename

# ... and call the module that reads the input file.
reader=CsvUnicodeReader(open(inputfilename,"r"))

# Now we read the file line by line.
# The indented block is repeated for each row
# (thus, each tweet)
for row in reader:
    # append data from the current row to our lists.
    # Note that we start counting with 0.
    tweet_list.append(row[0])
    user_list.append(row[2])

    # Let us count how often our searchstring is used
    # in this tweet
    matches1 = searchstring1.findall(row[0])
    matchcount1=0
    for word in matches1:
        matchcount1=matchcount1+1
    search_list.append(matchcount1)
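
The counting step can be tried out on its own. The sketch below assumes Python 3 and uses the same pattern as the script; the helper name count_matches and the example strings are made up for illustration. Since findall returns a list of all non-overlapping matches, taking its length gives the same number as the counting loop above.

```python
import re

# Same pattern as in the workshop script: Dutch and Polish spellings.
searchstring1 = re.compile(r"[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa")


def count_matches(text):
    """Count non-overlapping matches of the pattern in one tweet."""
    # findall returns a list of matched strings; its length is the count.
    return len(searchstring1.findall(text))


# Illustrative inputs, not taken from the dataset.
print(count_matches("De #klimaattop in #Warschau"))
print(count_matches("Polen: top in Warschau, Warszawa"))
print(count_matches("geen verwijzing"))
```

Note that the pattern also matches inside longer words, so "Poolse" counts as a mention because it contains "Pool".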

# Time to put all the data in one container
# and save it:

print "Constructing data matrix"
outputdata=zip(tweet_list,user_list,search_list)
headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
print "Write data matrix to ",outputfilename
writer=CsvUnicodeWriter(open(outputfilename,"wb"))
writer.writerows(headers)
writer.writerows(outputdata)
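
For comparison, a rough Python 3 equivalent of this saving step, assuming the built-in csv module in place of unicsv. The three short lists hold made-up values, write_matrix is a hypothetical helper name, and the output goes to an in-memory buffer so the sketch runs without touching any file.

```python
import csv
import io

# Illustrative values standing in for the lists built by the script.
tweet_list = ["tweet one", "tweet two"]
user_list = ["user_a", "user_b"]
search_list = [0, 1]


def write_matrix(handle, tweets, users, counts):
    """Write a header line followed by one row per tweet."""
    writer = csv.writer(handle)
    writer.writerow(["tweet", "user", "how often is Poland mentioned?"])
    # zip() combines the i-th element of each list into one row.
    writer.writerows(zip(tweets, users, counts))


buffer = io.StringIO()
write_matrix(buffer, tweet_list, user_list, search_list)
print(buffer.getvalue())
```

To write to disk instead, the buffer could be replaced by a file opened with open("myoutput.csv", "w", newline="", encoding="utf-8"); passing newline="" is what the csv documentation recommends for files handed to csv.writer.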

The script:
myoutput.csv

tweet,user,how often is Poland mentioned?
:-) #Lectrr #wereldleiders #uitspraken #Wikileaks #klimaattop http://t.co/Udjpk48EIB,henklbr,0
Wat zijn de resulaten vd #klimaattop in #Warschau waard? @EP_Environment ontmoet voorzitter klimaattop @MarcinKorolec http://t.co/4Lmiaopf60,Europarl_NL,1
RT @greenami1: De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Kli...,LarsMoratis,1
De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Klimaschutz #FAZ,greenami1,1

Try it yourself!
We’ll help you get started. Please go to
http://beehub.nl/bigdata-cw/workshop and download
some files. Save the Python files unicsv.py and
myfirstscript.py as well as the dataset
mytweets.csv in a new folder called workshop on your
H-drive.
When you are done, start Python (GUI) from the
Windows Start Menu.

Recap
1 The data
    Recording tweets with yourTwapperkeeper
    CSV-files
    Other ways to collect tweets
    Not that different: Facebook posts
2 The script
    Pseudo-code
    Python code
    The output
3 Your turn
4 Questions?

This afternoon

Your own script

Questions or comments?

Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net

Analyzing social media with Python and other tools (2/4)

  • 1. The data The script Your turn Questions? Hands-on-Workshop Big (Twitter) Data Damian Trilling d.c.trilling@uva.nl @damian0604 www.damiantrilling.net Afdeling Communicatiewetenschap Universiteit van Amsterdam 30 January 2014 10.45 #bigdata Damian Trilling
  • 2. The data The script Your turn Questions? In this sesion (2/4): 1 The data Recording tweets with yourTwapperkeeper CSV-files Other ways to collect tweets Not that different: Facebook posts 2 The script Pseudo-code Python code The output 3 Your turn 4 Questions? #bigdata Damian Trilling
  • 3. The data The script Your turn Questions? Recording tweets with yourTwapperkeeper The data: Recording tweets with yourTwapperkeeper http://datacollection.followthenews-uva.cloudlet.sara.nl #bigdata Damian Trilling
  • 4. The data The script Your turn Questions? Recording tweets with yourTwapperkeeper yourTwapperkeeper #bigdata Damian Trilling
  • 5. The data The script Your turn Questions? Recording tweets with yourTwapperkeeper yourTwapperkeeper Storage Continuosly calls the Twitter-API and saves all tweets containing specific hashtags to a mySQL-database. You tell it once which data to collect – and wait some months. #bigdata Damian Trilling
  • 6. The data The script Your turn Questions? Recording tweets with yourTwapperkeeper yourTwapperkeeper #bigdata Damian Trilling
  • 7. The data The script Your turn Questions? Recording tweets with yourTwapperkeeper yourTwapperkeeper Retrieving the data You could access the MySQL-database directly. But yourTwapperkeeper has a nice interface that allows you to export the data to a format we can use for the analysis. #bigdata Damian Trilling
  • 8.
  • 9.
  • 10.
  • 11. The data The script Your turn Questions? CSV-files The data: CSV-files #bigdata Damian Trilling
  • 12. The data The script Your turn Questions? CSV-files CSV-files The format of our choice • All programs can read it • Even human-readable in a simple text editor: • Plain text, with a comma (or a semicolon) denoting column breaks • No limits regarging the size #bigdata Damian Trilling
  • 13. The data The script Your turn Questions? CSV-files 1 2 3 text,to_user_id,from_user,id,from_user_id, iso_language_code,source,profile_image_url,geo_type, geo_coordinates_0,geo_coordinates_1,created_at,time :-) #Lectrr #wereldleiders #uitspraken #Wikileaks # klimaattop http://t.co/Udjpk48EIB,,henklbr ,407085917011079169,118374840,nl,web,http://pbs.twimg. com/profile_images/378800000673845195/ b47785b1595e6a1c63b93e463f3d0ccc_normal.jpeg,,0,0,Sun Dec 01 09:57:00 +0000 2013,1385891820 Wat zijn de resulaten vd #klimaattop in #Warschau waard? @EP_Environment ontmoet voorzitter klimaattop @MarcinKorolec http://t.co/4Lmiaopf60,,Europarl_NL ,406058792573730816,37623918,en,<a href="http://www. hootsuite.com" rel="nofollow">HootSuite</a>,http://pbs .twimg.com/profile_images/2943831271/ b6631b23a86502fae808ca3efde23d0d_normal.png,,0,0,Thu Nov 28 13:55:35 +0000 2013,1385646935 #bigdata Damian Trilling
  • 14. The data: Other ways to collect tweets
  • 15. Other ways to collect tweets. Again, we want a CSV file... • If you want tweets per person: www.allmytweets.net • Up to six days backwards: www.scraperwiki.com • Buy it from a commercial vendor • TCAT (from the guys at DMI/mediastudies) • For specific purposes, write your own Python script to access the Twitter API (if you want to, I can show you more about this tomorrow)
  • 16. The data: Not that different: Facebook posts
  • 18. Not that different: Facebook posts. Have a look at netvizz • Gephi files for network analysis • ... and a tab-separated file (essentially the same as CSV) with the content. An alternative: Facepager • Tool to query different APIs (among others Twitter and Facebook) and to store the result in a CSV table • http://www.ls1.ifkw.uni-muenchen.de/personen/wiss_ma/keyling_till/software.html
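A tab-separated file can be read much like a CSV file; only the delimiter changes. The following is a minimal sketch assuming Python 3 and the standard `csv` module. The column names and the row are hypothetical and do not reflect the actual netvizz export.

```python
import csv
import io

# Hypothetical two-column export; names and content are made up for illustration.
tsv_data = "post_id\tmessage\n1\tHello, world\n"

# delimiter="\t" tells the reader that tabs, not commas, separate the columns
reader = csv.reader(io.StringIO(tsv_data), delimiter="\t")
posts = list(reader)

print(posts)
```

Because the tab is the separator here, the comma inside the message needs no quoting.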
  • 20. The script: Pseudo-code
  • 21. Pseudo-code. Our task: identify all tweets that include a reference to Poland. Let’s start with some pseudo-code!
      open csv-table
      for each line:
          append column 1 to a list of tweets
          append column 3 to a list of corresponding users
          look for searchstring in column 1
          append search result to a list of results
      save lists to a new csv-file
  • 22. The script: Python code
  • 23. The complete script (Python 2):
      #!/usr/bin/python
      from unicsv import CsvUnicodeReader
      from unicsv import CsvUnicodeWriter
      import re
      inputfilename="mytweets.csv"
      outputfilename="myoutput.csv"
      user_list=[]
      tweet_list=[]
      search_list=[]
      searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')
      print "Opening "+inputfilename
      reader=CsvUnicodeReader(open(inputfilename,"r"))
      for row in reader:
          tweet_list.append(row[0])
          user_list.append(row[2])
          matches1 = searchstring1.findall(row[0])
          matchcount1=0
          for word in matches1:
              matchcount1=matchcount1+1
          search_list.append(matchcount1)
      print "Constructing data matrix"
      outputdata=zip(tweet_list,user_list,search_list)
      headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
      print "Write data matrix to ",outputfilename
      writer=CsvUnicodeWriter(open(outputfilename,"wb"))
      writer.writerows(headers)
      writer.writerows(outputdata)
  • 24. Python code
      #!/usr/bin/python
      # We start with importing some modules:
      from unicsv import CsvUnicodeReader
      from unicsv import CsvUnicodeWriter
      import re
      # Let us define two variables that contain
      # the names of the files we want to use
      inputfilename="mytweets.csv"
      outputfilename="myoutput.csv"
  • 25. Python code
      # We create some empty lists that we will use later on.
      # A list can contain several variables
      # and is denoted by square brackets.
      user_list=[]
      tweet_list=[]
      search_list=[]
  • 26. Python code
      # What do we want to look for?
      searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')
      # Enough preparation, let the program begin!
      # We tell the user what is going on...
      print "Opening "+inputfilename
      # ... and call the module that reads the input file.
      reader=CsvUnicodeReader(open(inputfilename,"r"))
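To see what the search pattern on this slide actually matches, the sketch below (Python 3) applies it to the second tweet from the sample data and to two made-up phrases. The stricter pattern `strict` is not part of the original script; it is only an assumption about how one might restrict matches to whole words.

```python
import re

# The pattern from the script: a plain substring search with alternatives.
searchstring1 = re.compile(r"[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa")

# Hypothetical stricter variant (not in the original): whole words only.
strict = re.compile(r"\b(?:[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa)\b")

tweet = "Wat zijn de resulaten vd #klimaattop in #Warschau waard?"
print(searchstring1.findall(tweet))

# Made-up phrases to show the difference between the two patterns.
for phrase in ["We gaan carpoolen", "Poolse regering"]:
    print(phrase, searchstring1.findall(phrase), strict.findall(phrase))
```

The original pattern also finds "pool" inside "carpoolen", which is a false positive, but it catches "Poolse" as well. The whole-word variant avoids the false positive and misses "Poolse". Which trade-off is preferable depends on the data.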
  • 27. Python code
      # Now we read the file line by line.
      # The indented block is repeated for each row
      # (thus, each tweet)
      for row in reader:
          # append data from the current row to our lists.
          # Note that we start counting with 0.
          tweet_list.append(row[0])
          user_list.append(row[2])
          # Let us count how often our searchstring is used
          # in this tweet
          matches1 = searchstring1.findall(row[0])
          matchcount1=0
          for word in matches1:
              matchcount1=matchcount1+1
          search_list.append(matchcount1)
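The inner loop on this slide counts the matches one by one. Since `findall` returns a list, `len()` gives the same number directly. Below is a minimal Python 3 sketch with made-up tweets; it is an alternative formulation, not the workshop script itself.

```python
import re

searchstring1 = re.compile(r"[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa")

# Made-up tweets for illustration only.
tweets = [
    "#klimaattop in #Warschau",
    "Van Warschau naar Polen",
    "Geen verwijzing",
]

# len(findall(...)) is the number of matches, i.e. what matchcount1 holds
# after the inner loop in the original script.
search_list = [len(searchstring1.findall(tweet)) for tweet in tweets]

print(search_list)
```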
  • 28. Python code
      # Time to put all the data in one container
      # and save it:
      print "Constructing data matrix"
      outputdata=zip(tweet_list,user_list,search_list)
      headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
      print "Write data matrix to ",outputfilename
      writer=CsvUnicodeWriter(open(outputfilename,"wb"))
      writer.writerows(headers)
      writer.writerows(outputdata)
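The `zip` call turns the three parallel lists into rows of (tweet, user, count). A rough Python 3 equivalent of this saving step, assuming the standard `csv` module instead of `unicsv` and using made-up list contents, might look like this:

```python
import csv
import io

# Made-up contents standing in for the lists filled by the loop.
tweet_list = ["tweet one", "tweet two"]
user_list = ["user_a", "user_b"]
search_list = [0, 1]

# zip pairs up the i-th elements of each list: one tuple per tweet.
outputdata = zip(tweet_list, user_list, search_list)

# io.StringIO stands in for
# open(outputfilename, "w", newline="", encoding="utf-8")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["tweet", "user", "how often is Poland mentioned?"])
writer.writerows(outputdata)

result = buffer.getvalue()
print(result)
```

In Python 3, `zip` returns an iterator that can be consumed only once, so it is passed straight to `writerows` here.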
  • 30. The script: myoutput.csv
  • 31. The output
      tweet,user,how often is Poland mentioned?
      :-) #Lectrr #wereldleiders #uitspraken #Wikileaks #klimaattop http://t.co/Udjpk48EIB,henklbr,0
      Wat zijn de resulaten vd #klimaattop in #Warschau waard? @EP_Environment ontmoet voorzitter klimaattop @MarcinKorolec http://t.co/4Lmiaopf60,Europarl_NL,1
      RT @greenami1: De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Kli...,LarsMoratis,1
      De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Klimaschutz #FAZ,greenami1,1
  • 32. The output
  • 33. Try it yourself! We’ll help you get started. Please go to http://beehub.nl/bigdata-cw/workshop and download some files. Save the Python files unicsv.py and myfirstscript.py as well as the dataset mytweets.csv in a new folder called workshop on your H-drive. When you are done, start Python (GUI) from the Windows Start Menu.
  • 34. Recap: 1 The data (Recording tweets with yourTwapperkeeper; CSV-files; Other ways to collect tweets; Not that different: Facebook posts) 2 The script (Pseudo-code; Python code; The output) 3 Your turn 4 Questions?
  • 35. This afternoon: Your own script
  • 36. Questions or comments? Damian Trilling, d.c.trilling@uva.nl, @damian0604, www.damiantrilling.net