1. Relative Trends in Scientific Terms on Twitter
Victoria Uren, Aba‐Sah Dadzie
The OAK Group, Dept. of Computer Science, The University of Sheffield
2. Introduction
• scientific research traditionally disseminated via journals, books,
scientific conferences
• new form of discourse – online social media
– suitable forum for disseminating scientific research?
– do scientists engage with online social media?
– are there sufficient amounts of information on scientific topics?
• are there suitable metrics for measuring scientific impact online?
– between scientists?
– for public engagement?
• are these new measures comparable to formal metrics?
altmetrics11: Tracking scholarly impact on the social Web
3. Outline
• Aims/Introduction
• Related Work
• Experiment
– Data
– Analysis & Results
• Conclusions
• Next Steps
• Acknowledgements
5. Related Work
• Garfield, E. (from 1950s)
– father of scientometrics
• Priem et al. (2010)
– Scientometrics 2.0 as a new metric for measuring scholarly impact on social web
• Lane (2010)
– need to improve metrics used to measure scientific impact
• Michel et al. (2011)
– Google NGrams to analyse culture
– among other findings, recognised fame for scientists is low…
• Cheong et al. (2009)
– H1N1 spike (trend) detected on Twitter during flu pandemic (May 2009)
• Rowe et al. (2011)
– influence of content and author features on prediction of active, long-term
discussions on social web
• Kinsella et al. (2011)
– using hyperlinked metadata to aid categorisation of topics discussed in online
social media
7. Experiment
• exploratory experiment
– to determine how frequently scientific terms are used in
online social media
• data set
– three sets of (scientific) terms selected from UNESCO thesaurus
– Google Books NGrams corpus used as a baseline
– 300 tweets collected in each sample, using Twitter API, for selected
terms
• frequency/usage analysis
9. UNESCO Thesaurus 1Gram Terms
Topic               Terms
Physical Sciences   Ionization, Electromagnetism, Crystallography
Chemical Sciences   Phosphorus, Alkalinity, Microchemistry
Earth Sciences      Permafrost, Lithosphere, Glaciology
• selection criteria
– minimisation of noise due to polysemy
– avoidance of scientific terms with other common/colloquial usage
– terms unique to a particular topic
– words with a single stem
– 1Grams only
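The frequency analysis over these nine terms can be sketched roughly as below. This is a minimal illustration, not the authors' pipeline: the function names `count_terms` and `topic_shares` are hypothetical, matching is assumed to be case-insensitive on whole words, and the four sample tweets are shortened from examples shown later in the deck.

```python
import re
from collections import Counter

# Topic -> 1Gram terms, as listed on this slide.
TERMS_BY_TOPIC = {
    "Physical Sciences": ["Ionization", "Electromagnetism", "Crystallography"],
    "Chemical Sciences": ["Phosphorus", "Alkalinity", "Microchemistry"],
    "Earth Sciences": ["Permafrost", "Lithosphere", "Glaciology"],
}


def count_terms(tweets):
    """Count the tweets that mention each term (assumed: case-insensitive, whole word)."""
    counts = Counter()
    for terms in TERMS_BY_TOPIC.values():
        for term in terms:
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            counts[term] = sum(1 for tweet in tweets if pattern.search(tweet))
    return counts


def topic_shares(counts):
    """Share of all term mentions that falls in each topic."""
    total = sum(counts.values())
    if total == 0:
        return {topic: 0.0 for topic in TERMS_BY_TOPIC}
    return {
        topic: sum(counts[term] for term in terms) / total
        for topic, terms in TERMS_BY_TOPIC.items()
    }


# Shortened example tweets taken from later slides.
sample_tweets = [
    "Fire and Ice: Permafrost Melt Spews Combustible Methane",
    "@HDNinjacp go to Permafrost Its never full",
    "The proper total alkalinity for your pool is 100 ppm.",
    "Vitamin D acts as an hormone and plays a controlling role in the "
    "metabolism of calcium and phosphorus",
]

counts = count_terms(sample_tweets)
shares = topic_shares(counts)
print(counts)
print(shares)
```

Note that a count of this kind cannot tell the game-server use of "Permafrost" from the scientific one, which is why the later slides classify tweets by hand.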
10. Baseline Dataset – Google 1Grams
• obtained from Google Books NGrams corpus [1]
• total NGrams by year for three sets of terms
– 2006 – 116,029
– 2007 – 126,206
– 2008 – 111,417
• annual variation by topic (of total NGrams baseline dataset)
– Chemical Sciences 50-60%
– Physical Sciences 30-40%
– Earth Sciences ~10%
• [1] http://ngrams.googlelabs.com/datasets
12. Twitter Dataset
Sample ID   Collection Period                                             Elapsed Time (h)
T-300-1     Tue Mar 01 20:56:43 GMT 2011 – Thu Mar 03 14:22:18 GMT 2011   41
T-300-2     Fri Mar 04 02:35:55 GMT 2011 – Sun Mar 06 18:38:05 GMT 2011   64
T-300-3     Mon Mar 07 20:31:11 GMT 2011 – Wed Mar 09 16:21:36 GMT 2011   44
• three samples collected, containing 300 consecutive tweets each
• ~0.003% of total tweets over collection period
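The elapsed times in the table can be re-derived from the start and end timestamps. The following is a small sketch: the format string and the helper name `elapsed_hours` are assumptions made for illustration, and the literal "GMT" is matched as plain text rather than parsed as a time zone.

```python
from datetime import datetime

# Timestamp layout used on the slides, e.g. "Tue Mar 01 20:56:43 GMT 2011".
# "GMT" is treated as a literal; both ends of each period share the same zone.
FMT = "%a %b %d %H:%M:%S GMT %Y"

# Collection periods as given in the table on this slide.
PERIODS = {
    "T-300-1": ("Tue Mar 01 20:56:43 GMT 2011", "Thu Mar 03 14:22:18 GMT 2011"),
    "T-300-2": ("Fri Mar 04 02:35:55 GMT 2011", "Sun Mar 06 18:38:05 GMT 2011"),
    "T-300-3": ("Mon Mar 07 20:31:11 GMT 2011", "Wed Mar 09 16:21:36 GMT 2011"),
}


def elapsed_hours(start, end):
    """Hours between two slide-style timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 3600


for sample_id, (start, end) in PERIODS.items():
    print(sample_id, round(elapsed_hours(start, end)))
```

Rounded to whole hours, the results agree with the Elapsed Time column.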
15. Twitter cf. Google NGrams
• higher variation in distribution for Twitter sample
– however largely in line with Google NGrams
• can Google NGrams serve as a suitable baseline?
– need to more closely examine variation…
• notable peaks in Twitter sample for three terms
– Permafrost (Earth Sciences)
– Alkalinity (Chemical Sciences)
– Phosphorus (Chemical Sciences)
• are these potential trends?
17. Twitter cf. Google NGrams
• Permafrost
– 17% and 15% in Twitter samples (T-300-1 & 2) – cf. 5% in G-2006-2008
– 41 out of 113 tweets (36%) used in scientific context
– large number of tweets referred to
• online game server [1]
• designer case for iPhone
• Alkalinity
– none found to have scientific content
– mostly used in pseudo-scientific health advice
– peak in T-300-2 (31 out of 60 tweets – ~50%)
• dominated by pH measures in swimming pools & fish tanks
• influence probably due to collection period – weekend – engagement in
leisure activities
• [1] http://www.everquest2.com/Permafrost
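The comparison on this slide amounts to setting a term's share in a Twitter sample against its share in the baseline. A rough sketch follows, using only the percentages and counts quoted above. The threshold `PEAK_RATIO` is a hypothetical cut-off chosen for illustration and is not a value from the study; the helper names are likewise assumptions.

```python
def share(count, total):
    """Fraction of `total` accounted for by `count`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


def ratio_to_baseline(sample_share, baseline_share):
    """How many times larger the sample share is than the baseline share."""
    return sample_share / baseline_share


# Permafrost shares quoted on the slide.
BASELINE_SHARE = 0.05                               # G-2006-2008
SAMPLE_SHARES = {"T-300-1": 0.17, "T-300-2": 0.15}  # Twitter samples

PEAK_RATIO = 2.0  # hypothetical threshold, for illustration only

peaks = {
    sample_id: ratio_to_baseline(s, BASELINE_SHARE) >= PEAK_RATIO
    for sample_id, s in SAMPLE_SHARES.items()
}
print(peaks)

# Proportions quoted on the slide.
scientific_permafrost = share(41, 113)  # tweets used in a scientific context
alkalinity_peak = share(31, 60)         # Alkalinity tweets in T-300-2
print(f"{scientific_permafrost:.0%}, {alkalinity_peak:.0%}")
```

A ratio above the threshold only marks a peak in raw usage. As the Permafrost figures show, roughly a third of those tweets were scientific, so a peak is not by itself evidence of a scientific trend.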
18. Example Tweets – Permafrost
• advert/chat
– @HDNinjacp go to Permafrost Its never full : Fri Mar 04 05:21:02 GMT 2011
– @Riffy8888 hey Could you Come to my Party birthday Party on CP March 13 Server
Permafrost Dock 6:00PST : Sun Mar 06 04:37:13 GMT 2011
– Party Server Permafrost Dock Please Go It's An Early Birthday Party For me : Thu Mar 03
01:38:28 GMT 2011
• cold
– 36 inches of permafrost still, I want to stake my bird condo b4 the squirrals knock it
down again..bastids..all of'm : Sat Mar 05 01:51:48 GMT 2011
• science
– Fire and Ice: Permafrost Melt Spews Combustible Methane http://tiny.ly/be8q : Fri Mar
04 16:43:10 GMT 2011
– (retweeted) - Experts Monitor Methane Release from Permafrost: Over the past few
years, methane levels around the world have b... http://bit.ly/hvVEJX : Wed Mar 02
12:27:25 GMT 2011
– RT @NetNewsBuzz: Permafrost Melt Soon Irreversible Without Major Fossil Fuel Cuts
http://tinyurl.com/5w8w2oh #oil #climate #CO2 #fossilfuels : Thu Mar 03 02:57:48 GMT
2011
19. Example Tweets – Alkalinity T‐300‐2
• Chemistry Help Needed! pH, concentration of carbonate species and
alkalinity... just got published: http://bit.ly/hUCpz7
– URL points to the question on “My Chemistry Tutor” – homework?
• retweeted
– The proper total alkalinity for your pool is 100 ppm. http://su.pr/8hrxCE : Fri Mar
04 19:02:20 GMT 2011
– If the Total Alkalinity in your swimming pool is low, your pH will be low.
http://su.pr/8hrxCE : Fri Mar 04 20:34:11 GMT 2011
• spam/adverts (including retweets)
– @Poet_Carl_Watts: some foods create acidity or alkalinity after they're
metabolized...http://ping.fm/GQTvA #KnowledgeIsPower! : Sat Mar 05 02:38:55
GMT 2011
– RT @CourtneyPool: Green juice, oh Liquid Emerald Elixir of Life and Alkalinity!
Course through my BODY! #juicing : Sun Mar 06 18:34:29 GMT 2011
20. Twitter cf. Google NGrams: Phosphorus
Sample ID   Total   Legislation   Nutrition   Other Sciences   Industry   White Phosphorus
T-300-1     129     46            16          29               4          5
T-300-2     119     4             26          35               9          5
T-300-3     171     12            23          37               42         19
• Twitter trends for Phosphorus in sample T-300-3
– Industry
• takeover of a Brazilian company by the Indian firm United Phosphorus
– White Phosphorus
• 17 retweets of an emotive message (relation to Middle East wars)
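To make the T-300-3 shift easier to see, the category counts can be converted to shares of each sample's total. The sketch below reads the table columns in the order Legislation, Nutrition, Other Sciences, Industry, White Phosphorus, which is an assumption about the slide layout. The category counts do not add up to the totals, so the shares are taken against the stated total and no claim is made about the remainder.

```python
CATEGORIES = ["Legislation", "Nutrition", "Other Sciences", "Industry", "White Phosphorus"]

# Phosphorus tweet counts per sample, as read from the table on this slide
# (column order is an assumption, see the note above).
COUNTS = {
    "T-300-1": {"total": 129, "by_category": [46, 16, 29, 4, 5]},
    "T-300-2": {"total": 119, "by_category": [4, 26, 35, 9, 5]},
    "T-300-3": {"total": 171, "by_category": [12, 23, 37, 42, 19]},
}


def category_shares(sample):
    """Share of a sample's Phosphorus tweets that falls in each category."""
    total = sample["total"]
    return {cat: n / total for cat, n in zip(CATEGORIES, sample["by_category"])}


shares = {sample_id: category_shares(s) for sample_id, s in COUNTS.items()}
for sample_id, by_cat in shares.items():
    print(sample_id, {cat: f"{value:.1%}" for cat, value in by_cat.items()})
```

On this reading, Industry moves from a few percent in the first two samples to about a quarter of T-300-3, and White Phosphorus also rises, in line with the two trends listed above.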
21. Twitter cf. Google NGrams: Phosphorus
• usage largely with scientific content
– with relationships, among others, to legal, nutritional & economic context
– five main categories identified
• Legislation
– limits to use in fertiliser, soap
• Nutrition
– phosphorus content
• Other Science
– peak phosphorus, pollution
– discovery of arsenic replacing phosphorus in a microbe
– tweets about new paper on Redfield ratio in organisms
• Industry
– mergers, prices of phosphorus-containing goods
• White Phosphorus
– use in Middle East wars
22. Example Tweets – Phosphorus
• Legislation
– RT @YarnPlayCafe: The fact that he wants to repeal the phosphorus ban and kill the Madison
lakes is, by itself, enough to #killthisbill ... : Tue Mar 08 02:04:49 GMT 2011
• Nutrition
– Big, wet snowflakes drift over the farm. To warm up, I try some Horlicks, a wheat/barley/whey
drink with lots of calcium & phosphorus. Mmmm. : Tue Mar 01 20:56:43 GMT 2011
– Vitamin D acts as an hormone and plays a controlling role in the metabolism of calcium and
phosphorus : Sun Mar 06 12:12:36 GMT 2011
• Other Science
– [java] 129 : Greater Phosphorus Efficiency http://bit.ly/iehsmK #agriculture : Wed Mar 02
14:36:21 GMT 2011
• Industry
– #stocks #bse #nse Buy United Phosphorus - positive move to tap largest Latin American market;
Edelweiss http://dlvr.it/JdSpV : Tue Mar 08 17:22:55 GMT 2011
– Enshi : Wugang develops technique to handle high-phosphorus iron ore - Steel Business Briefing
(subscri http://uxp.in/30538045 : Tue Mar 08 09:33:05 GMT 2011
• White Phosphorus
– Dear America, your white phosphorus and depleted uranium can not stop the growth of Iraq's
future. Iraq Will Rise. : Wed Mar 02 07:49:21 GMT 2011
– @Remroum so first they steal our land, now they want our "tactics" i.e. poetry? i guess the white
phosphorus just isn't cutting it anymore. : Sat Mar 05 03:42:44 GMT 2011
• ???
– @p_kojo ‐ Phosphorus Potassium ‐ Pinocchio , I'm so glad we found each other nw we can hav
lots of fun :) : Sun Mar 06 13:43:10 GMT 2011
24. Conclusions – Experiment
• recognised challenges
– baseline corpus for online social media difficult to obtain
• very small (relatively) samples found in Twitter stream
• difficult to obtain representative samples
• more effective methods required to extract lower-frequency terms
– difficulty reproducing experiments
– reliability, ethical & privacy issues – due to user-created content
• what is a suitable, publicly available baseline corpus?
– Google NGrams?
• different information collection methods from online social media
– coverage of topics may see large variation between corpora
– any others?
• Wikipedia/DBpedia? TREC?
25. Engagement with the Web?
• why do scientists not tweet (or engage much in other social media)?
– is the web not seen to enforce sufficient scientific rigour?
– do scientists not view the web as a potential audience?
• is the web audience a suitable peer reviewer?
• why do scientists hesitate to disseminate information online?
– potential for ideas to be stolen?
– trust – how to differentiate between valid science and pseudo-science,
spam and adverts?
• social media largely driven by personal interest, sentiment, opinion
– may explain low scientific content
– more colloquial use of what is traditionally scientific terminology
26. Implications for Altmetrics
• however - some level of scientific discourse on Twitter
– e.g., Phosphorus identified as a potential Twitter trend
• online social media may still have potential to serve as an altmetric
for measuring impact of science
• starting from scientometrics - which looks at author features, e.g.,
– co-citation
– affiliation – relationship to reputation
• corresponding features in online social media
– followers
– retweets – relationship to trust?
28. Next Steps
• replicate experiments with larger samples over longer period
– more detailed analysis
• e.g., hashtag analysis; URLs within tweets
• focus on terms with more trending potential, e.g., nanostructures, nanosilver
• consider specific tweets
– from scientific media and journals
– posted during scientific conferences, congresses
• comparison with other independent baseline data sets
• compare Twitter use within different disciplines
– influence of interdisciplinary collaboration on use of online social media?
• create new benchmark data & experiments
• define altmetric for scientific term usage in online social media
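The hashtag and URL analysis proposed here could start from a simple feature extractor such as the one below. This is a sketch under stated assumptions: the regular expressions are simplified approximations rather than Twitter's own entity rules, the "RT @user" prefix is only one of several retweet conventions seen in the samples, and `tweet_features` is a hypothetical helper name. The example tweet is one of the Permafrost examples from earlier in the deck.

```python
import re

# Simplified patterns (assumptions; not Twitter's official entity extraction).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
RETWEET_RE = re.compile(r"^RT\s+@\w+")


def tweet_features(text):
    """Extract URLs, hashtags, mentions and a retweet flag from a tweet's text."""
    urls = URL_RE.findall(text)
    # Remove URLs first so that URL fragments such as ".../#top" are not
    # mistaken for hashtags.
    without_urls = URL_RE.sub(" ", text)
    return {
        "urls": urls,
        "hashtags": HASHTAG_RE.findall(without_urls),
        "mentions": MENTION_RE.findall(without_urls),
        "is_retweet": bool(RETWEET_RE.match(text)),
    }


example = (
    "RT @NetNewsBuzz: Permafrost Melt Soon Irreversible Without Major Fossil "
    "Fuel Cuts http://tinyurl.com/5w8w2oh #oil #climate #CO2 #fossilfuels"
)
features = tweet_features(example)
print(features)
```

Counts of retweets and linked URLs per term would give a first handle on the follower and retweet features mentioned under the implications for altmetrics.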
29. Acknowledgements
• Elizabeth Cano for discussions on collection and use of data from
Twitter streams
• V.S. Uren & A.‐S. Dadzie funded by:
– European Commission 7th Framework Programme project
SmartProducts (grant no. 231204)
30. References
• Garfield bibliography - http://garfield.library.upenn.edu/pub.html
• Matthew Rowe, Sofia Angeletou and Harith Alani. (2011) Predicting Discussions on
the Social Semantic Web, Proc., ESWC (2) 2011: 405-420
• Sheila Kinsella, Mengjiao Wang, John Breslin and Conor Hayes. (2011) Improving
Categorisation in Social Media using Hyperlinks to Structured Data Sources, Proc.,
ESWC (2) 2011: 390-404
• others in paper references – see http://altmetrics.org/altmetrics11/uren-v0