What’s big data?

Some examples

Problems

A glimpse in the kitchen

Questions?

#bigdata in Communication Science
Some examples from research
by me and my students
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Department of Communication Science
Universiteit van Amsterdam

October 2013
#bigdata

Damian Trilling

1 What’s big data?
2 Some examples
   • Rare events
   • Tone in tweets
   • Counting words and n-grams
   • Network analysis
3 Problems
4 A glimpse in the kitchen
5 Questions?

What’s big data?

No definition, but . . .
• Existing data
• Too big to code manually
• Sometimes also too big to handle with normal tools
• New research questions
• Call to revisit the relationship between theory and empirical research

Some sources
• Social Network Sites
• RSS-feeds
• Databases
• Scraping text from the web
• ...
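
As an illustration of how little code one of these sources needs, here is a minimal Python sketch that pulls headlines and links out of an RSS feed. The feed content, the example.org URLs and the function name are made up for illustration; a real feed would first have to be downloaded, and feeds in other formats (such as Atom) would need different element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, embedded so the sketch runs without network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example news site</title>
    <item>
      <title>First headline</title>
      <link>http://example.org/articles/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.org/articles/2</link>
    </item>
  </channel>
</rss>
"""


def extract_items(feed_xml):
    """Return a list of (title, link) tuples, one per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


items = extract_items(SAMPLE_FEED)
for title, link in items:
    print(title, link)
```

The collected links could then be stored in a text file and downloaded in bulk.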

It’s out there!
You only have to collect it.

Some examples

Rare events

A recent master’s thesis

Imagine you want to analyze some very rare content.
Normal sampling won’t work, that’s for sure.

So you’d better collect everything first

Getting all news coverage from Dutch news sites
We collected all articles from nine news sites over a period of
two months, resulting in a database of 74,000 articles.
In a second step, we selected the articles containing specific
keywords. The resulting 292 articles were then coded manually.

Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a
source for journalists in online news articles. Master’s thesis, Universiteit van Amsterdam.
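
The filtering step can be sketched in a few lines of Python. This is only an illustration under assumptions: the keyword list, the article texts and the function names are hypothetical, since the slides do not show the keywords or the script used in the thesis.

```python
import re

# Hypothetical keywords; the actual list used in the thesis is not given here.
KEYWORDS = ["twitter", "facebook", "tweet"]


def build_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a whole word."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def filter_articles(articles, keywords):
    """Return only the articles whose text mentions at least one keyword."""
    pattern = build_pattern(keywords)
    return [a for a in articles if pattern.search(a["text"])]


# Made-up articles standing in for the collected database.
articles = [
    {"id": 1, "text": "The minister announced the plan on Twitter."},
    {"id": 2, "text": "Heavy rain is expected in the south."},
    {"id": 3, "text": "In a tweet, the club confirmed the transfer."},
]
selected = filter_articles(articles, KEYWORDS)
print([a["id"] for a in selected])
```

Only the selected articles would then go to the human coders. Whole-word matching is a design choice here; matching word stems instead would catch more inflected forms but also more false positives.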

It’s just one line of code!

urls.txt
http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermannbittet-um-verzeihung
http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierungwill-zuruecktreten
http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klagegegen-republik
http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafewegen-oelpest
http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-keinbabybauch-nur-fast-food
...

wget command
wget -i urls.txt
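
If wget is not available, roughly the same thing can be done in Python. The sketch below is an assumption-laden equivalent, not the script used in the thesis: the function names, the numbered output file names and the "downloaded" folder are all hypothetical.

```python
import os
import urllib.request


def read_url_list(path):
    """Read one URL per line from a text file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def fetch(url):
    """Download one URL and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_all(urls, out_dir, fetch_func=fetch):
    """Save every URL into out_dir as 00001.html, 00002.html, ...

    Returns the list of file paths written. URLs that fail are reported
    and skipped, so one dead link does not stop the whole run.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for number, url in enumerate(urls, start=1):
        try:
            content = fetch_func(url)
        except OSError as error:  # urllib's URLError is a subclass of OSError
            print("skipped", url, error)
            continue
        path = os.path.join(out_dir, "%05d.html" % number)
        with open(path, "wb") as f:
            f.write(content)
        written.append(path)
    return written


# Usage (needs network access and an existing urls.txt):
# download_all(read_url_list("urls.txt"), "downloaded")
```

Unlike wget, this sketch does not retry failed downloads or derive file names from the URLs.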

Tone in tweets

A recent bachelor’s thesis

Imagine you want to know something about someone’s behavior on
Twitter, or how a specific topic is discussed on Twitter.
Do you really want to go through thousands of tweets by hand?

So you’d better think about automating your coding
Finding out how negative or positive politicians are towards
their opponents
We took lists of positive and negative words and of each
politician’s opponents.
We used a Python script to check which type of words were used
to refer to opponents.
For further analysis, the results were imported into SPSS.

Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende
factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en
politieke partijen op Twitter. Bachelor’s thesis, Universiteit van Amsterdam.
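
A dictionary-based coding step of this kind might look roughly like the Python sketch below. It is not the script from the thesis: the word lists, the opponent names, the example tweets and the four output categories are all assumptions made for illustration, and it simply checks whether tone words and an opponent's name occur in the same tweet.

```python
import re

# Hypothetical word lists and opponent names; the real lists are not shown here.
POSITIVE = {"great", "good", "strong"}
NEGATIVE = {"bad", "failed", "wrong"}
OPPONENTS = {"smith", "jones"}


def tokenize(text):
    """Lowercase a tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def code_tweet(text):
    """Code the tone of a tweet that mentions an opponent.

    Returns 'positive', 'negative', 'mixed' or 'neutral', or None if the
    tweet does not mention any opponent at all.
    """
    tokens = set(tokenize(text))
    if not tokens & OPPONENTS:
        return None
    has_positive = bool(tokens & POSITIVE)
    has_negative = bool(tokens & NEGATIVE)
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"


# Made-up example tweets.
tweets = [
    "Smith failed this country again.",
    "Great debate tonight with Jones!",
    "Join us at the rally on Saturday.",
]
codes = [code_tweet(t) for t in tweets]
print(codes)
```

The resulting codes could be written to a CSV file, one row per tweet, and opened in a statistics package. Co-occurrence within a tweet is a crude measure: it ignores negation and cannot tell whom a tone word refers to.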

Counting words and n-grams

How often are specific expressions used?

Imagine you want to know which words or expressions dominate a
discourse.
There are plenty of ways to get an answer within minutes;
here’s one:

Again, just one or two lines of code!

For example with Stata
• Install the package wordscore (net install
  http://www.tcd.ie/Political_Science/wordscores/wordscores)
• For word counts: wordfreq /home/dami/texts/lab92.txt /home/dami/texts/lab97.txt
• For n-grams (trigrams in this case): phrasefreq 3 lab92.txt lab97.txt
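
The same kind of count can also be sketched in Python with the standard library only. This is an illustration under assumptions, not a reimplementation of the Stata commands above: the function names and the example sentence are made up, and tokenization is reduced to lowercasing and splitting on non-word characters.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(text, n):
    """Count how often each n-gram occurs in the text (n=1 gives word counts)."""
    return Counter(ngrams(tokenize(text), n))


# Made-up example text.
text = "yes we can and yes we can do it"
word_counts = count_ngrams(text, 1)
trigram_counts = count_ngrams(text, 3)
print(word_counts.most_common(3))
print(trigram_counts.most_common(1))
```

For real texts, the string would be read from a file, and very frequent function words would usually be removed before interpreting the counts.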

[Figure: trigrams in Obama tweets]

Network analysis

Another approach

Imagine you want to know who talks to whom and how networks
are interconnected.
Use a tool like NodeXL or Gephi!
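
Such tools need the data as a list of connections. One possible way to prepare it, sketched in Python under assumptions: tweets are available as (author, text) pairs, an @mention counts as "talking to" someone, and the tool accepts a CSV edge list with Source, Target and Weight columns. The example tweets and function names are hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Matches @username; note that it would also match the part after "@" in an email address.
MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count directed edges (author -> mentioned user) over (author, text) pairs."""
    edges = Counter()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            edges[(author.lower(), mentioned.lower())] += 1
    return edges


def edges_to_csv(edges):
    """Serialize the edge counts as CSV text with Source, Target, Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Made-up tweets as (author, text) pairs.
tweets = [
    ("alice", "@bob did you watch the debate?"),
    ("bob", "@alice yes, and so did @carol"),
    ("alice", "@bob thanks!"),
]
edges = mention_edges(tweets)
csv_text = edges_to_csv(edges)
print(csv_text)
```

Saving the CSV text to a file gives an edge list that can be loaded into a network analysis tool for layout and centrality measures.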

Problems

You sometimes depend entirely on commercial parties
• Services can shut down (Google Reader) or change their API (Twitter)
• It’s rather easy to get (up to 3200) tweets from a specific user
  (e.g., allmytweets.net), but if you want to capture a
  #hashtag, you have to record it live
• Twitter doesn’t give you all tweets, but just about 1% (+ a
  bunch of other limits)

Not sure if this is a problem or a great opportunity...
You cannot rely (only) on ready-made software but should get ready
to use tools like bash scripts, grep, Python, ... (which can be fun!)

A glimpse in the kitchen

What I’m doing right now

Analyzing #tvduell
• 570,000 tweets
• Identifying clusters of nouns, verbs and adjectives
• Assigning positivity and negativity scores to tweets
• Seeing whether they can be interpreted as frames

⇒ How are Merkel and Steinbrück framed on the second screen
during the debate?

Something you can use?

Questions or comments?

Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Understanding Big Data with Examples

  • 1. What’s big data? Some examples Problems A glimpse in the kitchen Questions? #bigdata in Communication Science Some examples from research by me and my students Damian Trilling d.c.trilling@uva.nl @damian0604 www.damiantrilling.net Afdeling Communicatiewetenschap Universiteit van Amsterdam October 2013 #bigdata Damian Trilling
  • 2. Outline: 1 What’s big data? 2 Some examples (Rare events, Tone in tweets, Counting words and n-grams, Network analysis) 3 Problems 4 A glimpse in the kitchen 5 Questions?
  • 3. What’s big data?
  • 4. What’s big data? No definition, but . . . • Existing data • Too big to code manually • Sometimes also too big to handle with normal tools • New research questions • Call to revisit the relationship between theory and empirical research
  • 5. What’s big data? Some sources • Social Network Sites • RSS-feeds • Databases • Scraping text from the web • ...
  • 6. It’s out there! You only have to collect it.
  • 7. Some examples
  • 8. Rare events. A recent master’s thesis.
  • 9. Rare events. A recent master’s thesis. Imagine you want to analyze some very rare content.
  • 10. Rare events. A recent master’s thesis. Imagine you want to analyze some very rare content. Normal sampling won’t work, that’s for sure.
  • 11. Rare events. So you’d better collect everything first: getting all news coverage from Dutch news sites. Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
  • 12. Rare events. So you’d better collect everything first: getting all news coverage from Dutch news sites. We collected all articles from nine news sites during a period of two months, resulting in a database with 74,000 articles. Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
  • 13. Rare events. So you’d better collect everything first: getting all news coverage from Dutch news sites. We collected all articles from nine news sites during a period of two months, resulting in a database with 74,000 articles. In a second step, we filtered those articles containing specific keywords. Those 292 articles were then manually coded. Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
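
The second step described on slide 13 (filtering the collected articles by keyword before manual coding) could look roughly like the Python sketch below. This is only an illustration, not the code used in the thesis: the keyword list, the sample articles, and the names `mentions_keyword` and `filter_articles` are all hypothetical.

```python
import re

# Hypothetical keywords; the keywords used in the thesis are not given on the slides.
KEYWORDS = ["twitter", "facebook", "tweet"]

# One case-insensitive pattern that matches any keyword as a whole word.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE
)


def mentions_keyword(text):
    """Return True if the article text contains at least one keyword."""
    return PATTERN.search(text) is not None


def filter_articles(articles):
    """Keep only the articles that mention a keyword (candidates for manual coding)."""
    return [a for a in articles if mentions_keyword(a["text"])]


# Made-up sample data standing in for the scraped article database.
articles = [
    {"id": 1, "text": "The minister announced the plan on Twitter."},
    {"id": 2, "text": "Heavy rain is expected this weekend."},
    {"id": 3, "text": "In a tweet, the club confirmed the transfer."},
]

selected = filter_articles(articles)
print([a["id"] for a in selected])
```

Matching whole words keeps the filter from picking up unrelated strings that merely contain a keyword; whether that is the right trade-off depends on the language and the keywords chosen.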
  • 14. Rare events
  • 15. Rare events. It’s just one line of code!
      urls.txt:
        http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
        http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermannbittet-um-verzeihung
        http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierungwill-zuruecktreten
        http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klagegegen-republik
        http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafewegen-oelpest
        http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-keinbabybauch-nur-fast-food
        ...
      wget command:
        wget -i urls.txt
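
For readers who would rather stay in Python than call wget, the same idea (read a list of URLs from a file and save each page) can be sketched with the standard library. This is a rough equivalent under stated assumptions, not a replacement for wget: the function names and the `articles` target directory are hypothetical, and there is no retrying, rate limiting, or handling of duplicate file names.

```python
import os
import urllib.request
from urllib.parse import urlparse


def read_urls(path):
    """Read one URL per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def local_name(url):
    """Derive a local file name from the last path segment of the URL."""
    name = os.path.basename(urlparse(url).path)
    return name or "index.html"


def download_all(url_file, target_dir="."):
    """Fetch every URL listed in url_file and save it under target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    for url in read_urls(url_file):
        destination = os.path.join(target_dir, local_name(url))
        urllib.request.urlretrieve(url, destination)


# Usage (requires network access):
# download_all("urls.txt", "articles")
```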
  • 16. Tone in tweets. A recent bachelor’s thesis.
  • 17. Tone in tweets. A recent bachelor’s thesis. Imagine you want to know something about someone’s behavior on Twitter, or how a specific topic is discussed on Twitter.
  • 18. Tone in tweets. A recent bachelor’s thesis. Imagine you want to know something about someone’s behavior on Twitter, or how a specific topic is discussed on Twitter. Do you really want to go through thousands of tweets by hand?
  • 19. Tone in tweets. So you’d better think about automating your coding: finding out how negative or positive politicians are towards their opponents. Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
  • 20. Tone in tweets. So you’d better think about automating your coding: finding out how negative or positive politicians are towards their opponents. We took lists with positive and negative words and with a politician’s opponents. Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
  • 21. Tone in tweets. So you’d better think about automating your coding: finding out how negative or positive politicians are towards their opponents. We took lists with positive and negative words and with a politician’s opponents. We used a Python script to check which type of words were used to refer to opponents. Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
  • 22. Tone in tweets. So you’d better think about automating your coding: finding out how negative or positive politicians are towards their opponents. We took lists with positive and negative words and with a politician’s opponents. We used a Python script to check which type of words were used to refer to opponents. For further analysis, the results were imported into SPSS. Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
  • 23. Tone in tweets
  • 24. Tone in tweets
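
The slides do not show the Python script itself, so the following is only a guess at the general shape of such a dictionary-based approach: count positive and negative words in tweets that mention an opponent, then export the counts as CSV for a statistics package such as SPSS. The word lists, opponent names, and sample tweets are hypothetical placeholders, and the scoring rule (simple word counts per tweet) is an assumption.

```python
import csv
import io
import re

# All three lists are hypothetical placeholders, not the lists used in the thesis.
POSITIVE = {"good", "great", "strong"}
NEGATIVE = {"bad", "weak", "failed"}
OPPONENTS = {"smith", "jones"}


def tokenize(text):
    """Lowercase the tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def code_tweet(text):
    """Count positive and negative words, but only in tweets that mention an opponent."""
    tokens = tokenize(text)
    mentions = any(t in OPPONENTS for t in tokens)
    pos = sum(t in POSITIVE for t in tokens) if mentions else 0
    neg = sum(t in NEGATIVE for t in tokens) if mentions else 0
    return {"mentions_opponent": int(mentions), "positive": pos, "negative": neg}


# Made-up example tweets.
tweets = [
    "Smith has failed our schools. Weak leadership!",
    "Great turnout at the rally today.",
    "Jones gave a strong speech, to be fair.",
]

rows = [code_tweet(t) for t in tweets]

# Write the coded tweets as CSV text, a format that SPSS can import.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["mentions_opponent", "positive", "negative"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

A word-list approach like this ignores negation and irony, so a manual check on a sample of tweets is a sensible complement.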
  • 25. Counting words and n-grams. How often are specific expressions used?
  • 26. Counting words and n-grams. How often are specific expressions used? Imagine you want to know which words or expressions dominate a discourse.
  • 27. Counting words and n-grams. How often are specific expressions used? Imagine you want to know which words or expressions dominate a discourse. There are plenty of ways to get an answer within minutes; here’s one:
  • 28. Counting words and n-grams. Again, just one or two lines of code! For example with Stata: • Install the package wordscores (net install http://www.tcd.ie/Political_Science/wordscores/wordscores) • For word counts: wordfreq /home/dami/texts/lab92.txt /home/dami/texts/lab97.txt • For n-grams (trigrams in this case): phrasefreq 3 lab92.txt lab97.txt
  • 29. Counting words and n-grams. Trigrams in Obama tweets.
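
The same word and n-gram counts can also be obtained without Stata. The sketch below uses only Python's standard library; the example sentences are made up, and the tokenization (lowercasing and splitting on non-word characters) is an assumption that may not match what wordfreq and phrasefreq do.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, n):
    """Count n-grams over a collection of texts (n=1 gives plain word counts)."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts


# Made-up example texts.
texts = [
    "We need to create jobs and we need to act now.",
    "We need to invest in education.",
]

word_counts = count_ngrams(texts, 1)
trigram_counts = count_ngrams(texts, 3)
print(trigram_counts.most_common(1))
```

N-grams are counted within each text separately, so no trigram spans the boundary between two tweets or documents.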
  • 30. Network analysis. Another approach.
  • 31. Network analysis. Another approach. Imagine you want to know who talks to whom and how networks are interconnected.
  • 32. Network analysis. Another approach. Imagine you want to know who talks to whom and how networks are interconnected. Use a tool like NodeXL or Gephi!
  • 33. Network analysis
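
Before a tool like Gephi can draw a network, the tweets have to be turned into an edge list. One possible way to do that, assuming that an @-mention counts as "talking to" someone, is sketched below. The sample tweets and user names are made up, and the Source/Target/Weight column layout is the convention commonly used for edge tables imported into Gephi; check the import settings of whichever tool you use.

```python
import csv
import io
import re
from collections import Counter

# Matches @-mentions and captures the user name without the @ sign.
MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count directed edges (author -> mentioned user) over (author, text) pairs."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            edges[(author.lower(), target.lower())] += 1
    return edges


# Made-up example data: (author, tweet text).
tweets = [
    ("alice", "@bob what do you think of the debate?"),
    ("bob", "@alice @carol not convinced yet"),
    ("alice", "@bob fair enough"),
]

edges = mention_edges(tweets)

# Write a weighted edge list as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Source", "Target", "Weight"])
for (source, target), weight in sorted(edges.items()):
    writer.writerow([source, target, weight])
print(buffer.getvalue())
```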
  • 34. Problems
  • 35. Problems. You sometimes depend entirely on commercial parties.
  • 36. Problems. You sometimes depend entirely on commercial parties: • Services can shut down (Google Reader) or change their API (Twitter)
  • 37. Problems. You sometimes depend entirely on commercial parties: • Services can shut down (Google Reader) or change their API (Twitter) • It’s rather easy to get (up to 3200) tweets from a specific user (e.g., allmytweets.net), but if you want to capture a #hashtag, you have to record it live
  • 38. Problems. You sometimes depend entirely on commercial parties: • Services can shut down (Google Reader) or change their API (Twitter) • It’s rather easy to get (up to 3200) tweets from a specific user (e.g., allmytweets.net), but if you want to capture a #hashtag, you have to record it live • Twitter doesn’t give you all tweets, but just about 1% (+ a bunch of other limits)
  • 39. Problems. Not sure if this is a problem or a great opportunity . . . You cannot rely (only) on ready-made software but should get ready to use tools like bash scripts, grep, Python, . . . (Which can be fun!)
  • 40. A glimpse in the kitchen
  • 41. What I’m doing right now: analyzing #tvduell • 570,000 tweets • Identifying clusters of nouns, verbs and adjectives • Assigning positivity and negativity scores to tweets • Seeing if they can be interpreted as frames ⇒ How are Merkel and Steinbrück framed on the second screen during the debate?
  • 42. [Image slide; no text]
  • 43. Something you can use? 1 What’s big data? 2 Some examples (Rare events, Tone in tweets, Counting words and n-grams, Network analysis) 3 Problems 4 A glimpse in the kitchen 5 Questions?
  • 44. Questions or comments? Damian Trilling, d.c.trilling@uva.nl, @damian0604, www.damiantrilling.net