Kalev Leetaru is a Senior Fellow at the George Washington University Center for Cyber & Homeland Security and a member of its Counterterrorism and Intelligence Task Force. Leetaru was named one of Foreign Policy Magazine's Top 100 Global Thinkers of 2013 and was a 2015-2016 Google Developer Expert for Google Cloud Platform. Leetaru's work focuses on how innovative applications of the world's largest datasets, computing platforms, algorithms and mind-sets can reimagine the way we understand and interact with our world. The GDELT Project is a realtime open data global graph over human society as seen through the eyes of the world's news media. It reaches deeply into local events, reactions, discourse, and emotions in the most remote corners of the world in near-realtime, and makes all of this available as an open data firehose to enable research over human society.
5. Datasets
• NEWS: Worldwide local news coverage in 100 languages (65 live-translated), with online news preserved via the Internet Archive
• TELEVISION: Collaboration with the Internet Archive to process more than 100 television stations across the US, updated daily
• ACADEMIC LITERATURE: 21 billion words covering 70 years (JSTOR/DTIC/CORE/CITESEER/IA)
• BOOKS: Collaboration with the Internet Archive and HathiTrust to process 3.5 million books from 1800-2015
• HUMAN RIGHTS: Half a century of worldwide human rights reports
• IMAGERY: A large fraction of global news imagery processed via deep learning: objects/activities, OCR, logos, facial sentiment, geolocation
8. Preserving Online News
• World’s largest initiative to preserve online news
• Only program to focus on worldwide local news in local languages
• Partnership with the Internet Archive’s NO404 program; before this, IA’s news archiving was very limited and focused heavily on the Western world and major English-language sources
• Most web archiving efforts favor English-language and Western news outlets
• Working with IA to ensure preservation of mobile formats and enhanced preservation of embedded article imagery
9. Preserving Online News
• 1.5-2% of news articles disappear within 2 weeks
• 5% disappear within a month
• Up to 14% are gone after 2 months: half return a 404, and the other half range from sustained HTTP 500-series errors to domain removal (common in some parts of the world)
• Of GDELT-relevant coverage, 140,000 articles published today will be gone in 2 months
• 14 million GDELT-monitored articles disappeared over a 6-month period, twice the total output of the New York Times over the last half century
• Numbers are vastly higher in some countries
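The attrition figures on this slide can be turned into a rough calculator. This is a minimal sketch under stated assumptions: it linearly interpolates between the reported points, takes the 2% upper bound of the "1.5-2%" figure at 2 weeks, and treats "up to 14%" at 2 months (60 days) as exact; all helper names are hypothetical. It also makes one inference explicit: if 140,000 lost articles are 14% of a day's GDELT-relevant output, that output is roughly one million articles (at least that many if 14% is only an upper bound). The slide does not state that volume directly.

```python
# Hypothetical helpers for illustration; not part of any GDELT tooling.
# Share of monitored articles gone after N days, as reported on the slide.
# Assumptions: "1.5-2%" is taken at its 2% upper bound, and "up to 14%"
# at 2 months (60 days) is treated as exact.
ATTRITION_POINTS = [(0, 0.0), (14, 0.02), (30, 0.05), (60, 0.14)]


def attrition_share(days: float) -> float:
    """Linearly interpolate the fraction of articles lost after `days` days."""
    if days <= 0:
        return 0.0
    for (d0, s0), (d1, s1) in zip(ATTRITION_POINTS, ATTRITION_POINTS[1:]):
        if days <= d1:
            return s0 + (s1 - s0) * (days - d0) / (d1 - d0)
    # Past the last reported point, hold the share flat instead of extrapolating.
    return ATTRITION_POINTS[-1][1]


def expected_losses(daily_articles: int, days: float) -> int:
    """Articles from one day's output expected to have vanished after `days` days."""
    return round(daily_articles * attrition_share(days))


def implied_daily_volume(lost: int, days: float) -> int:
    """Daily output implied by a reported loss count at a given article age."""
    share = attrition_share(days)
    if share == 0:
        raise ValueError("no attrition reported at this age")
    return round(lost / share)


print(implied_daily_volume(140_000, 60))  # 1000000
print(expected_losses(1_000_000, 30))     # 50000
```

Linear interpolation is the simplest choice that reproduces the reported points exactly; a smooth decay curve would be a better model but would require assumptions the slide does not support.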
10. Preserving Online News
• Manual efforts like Archive-It don’t scale to sudden-onset events like natural disasters or terror attacks, which need “always on” archiving. Most coverage appears in the first 72 hours and levels off within 14 days.
• Nepal 2015 earthquake: Yale and Columbia preserved 107 URLs with Archive-It.
• Nepal 2015 earthquake: GDELT captured over 667,000 articles about the earthquake and the country’s recovery over the following year, including 225,000 in languages other than English, with Nepali the top language, capturing the local perspective
18. US Ebola News Coverage
• Chart: Number of American television news broadcasts per week mentioning “ebola”, annotated with the March 2014 WHO announcement, the first American infections, and Eric Duncan’s arrival in Dallas
• Chart: Average “tone” of English-language media coverage of “ebola”, showing a steady ascent toward more and more positive coverage as the “Western medicine miracles to the rescue” theme dominates
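A weekly mention count and a weekly average tone, like the two charts on this slide, can be sketched in a few lines. This is an illustration under assumptions, not GDELT's pipeline: the two word lists are tiny hypothetical lexicons, documents are (date, text) pairs standing in for broadcast transcripts or articles, and weeks are ISO calendar weeks. The tone formula (percentage of positive words minus percentage of negative words) follows how, as we understand it, the GDELT Global Knowledge Graph documentation describes its tone score, but the real dictionaries and tokenization differ.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical toy lexicons; real tone dictionaries are far larger.
POSITIVE = {"recover", "recovered", "cure", "success", "hope", "survive"}
NEGATIVE = {"death", "deaths", "fear", "outbreak", "panic", "crisis"}


def tone(text: str) -> float:
    """Percent of positive words minus percent of negative words (-100 to 100)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 100.0 * (pos - neg) / len(words)


def weekly_summary(docs, keyword="ebola"):
    """Map each ISO (year, week) to (documents mentioning keyword, mean tone)."""
    buckets = defaultdict(list)
    for day, text in docs:
        if keyword in text.lower():
            iso = day.isocalendar()
            buckets[(iso[0], iso[1])].append(tone(text))
    return {week: (len(t), sum(t) / len(t)) for week, t in sorted(buckets.items())}


# Hypothetical documents; the second one does not mention the keyword.
docs = [
    (date(2014, 10, 1), "Ebola outbreak fuels fear and panic"),
    (date(2014, 10, 2), "Weather is mild today"),
    (date(2014, 10, 20), "Ebola patient recovered after cure success"),
]
print(weekly_summary(docs))  # {(2014, 40): (1, -50.0), (2014, 43): (1, 50.0)}
```

Normalizing by word count keeps long and short documents comparable, which matters when averaging tone across a week of mixed coverage.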
22. Carbon Capture & Sequestration
• English coverage of CCS, 2010-2015
• 32,000 websites, 250,000 people, 140,000 organizations, 50,000 locations
• Green cluster (center): senior American policymakers
• Green cluster (lower): “cap and trade” politicians
• Red cluster (bottom): American lawmakers on congressional energy committees or sponsoring energy-related legislation
• Purple cluster (top right): climate skeptics
• Yellow cluster (upper left): Australian politicians
• Pink cluster (upper center): British politicians
• Periphery of all clusters: journalists and financial analysts who feature prominently in the coverage or write much of it. Karolin Schaps (Reuters) and Alex Morales (Bloomberg News London) are attached to the British political cluster; Tom Friedman is attached to the American political cluster
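The slide does not say how these clusters were computed. As a hedged sketch of the general idea, entities mentioned in the same article can be linked into a weighted co-occurrence graph and then grouped. Here the grouping is plain connected components (union-find) over edges seen in at least `min_weight` articles, a much simpler stand-in for community-detection methods such as modularity optimization. Entity names and helper functions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(articles):
    """Count how many articles mention each unordered pair of entities."""
    edges = Counter()
    for entities in articles:
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def clusters(edges, min_weight=2):
    """Connected components over edges seen in at least `min_weight` articles."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), weight in edges.items():
        if weight >= min_weight:
            parent[find(a)] = find(b)  # union the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=sorted)


# Hypothetical entities: two political groups plus reporters who cover them.
articles = [
    ["senator_a", "senator_b"],
    ["senator_a", "senator_b", "reporter_x"],
    ["mp_c", "mp_d"],
    ["mp_c", "mp_d", "reporter_y"],
    ["senator_a", "mp_c"],
]
edges = cooccurrence_edges(articles)
print(clusters(edges))  # two groups: {mp_c, mp_d} and {senator_a, senator_b}
```

With `min_weight=1` the single article mentioning both `senator_a` and `mp_c` merges everything into one component, and the reporters join too. That sensitivity to bridging mentions is why real network tools prefer community detection over raw connectivity, and why peripheral journalists end up "attached" to a cluster rather than inside it.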
24. • Red: Ukraine (actual)
• Green: Average of Turkey (2/19/1999-4/20/1999) and Lebanon (3/24/2007-5/23/2007)
• Correlation: r = 0.49
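The r value on this slide is presumably a Pearson correlation between the red and green series; the slide does not say, so treat that as an assumption. A minimal sketch: average the two historical windows day by day (each runs 60 days from start date to end date, so aligning them by day offset is plausible, though also an assumption) and correlate the result with the Ukraine series. The series below are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def average_series(*series):
    """Pointwise mean of several day-aligned series (truncated to the shortest)."""
    return [sum(vals) / len(vals) for vals in zip(*series)]


# Hypothetical daily values, aligned by days since the start of each window.
ukraine = [2, 3, 5, 4, 6, 7]
turkey = [1, 3, 4, 4, 5, 8]
lebanon = [3, 2, 5, 6, 4, 7]
analogue = average_series(turkey, lebanon)
print(round(pearson_r(ukraine, analogue), 2))
```

Averaging the analogues before correlating smooths out the idiosyncrasies of any single historical case, at the cost of assuming both cases unfold on the same timescale.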
Here’s a timeline of instability in Turkey from February to September of this year. The top timeline shows physical instability: the surge in violent clashes between the Turkish government and the Kurdistan Workers’ Party. The bottom timeline charts “anxiety,” showing that even as attacks decreased and the country became calmer, anxiety increased because people were unsure what might come next and whether the calm would last.
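The divergence described above, with violence trending down while anxiety trends up, can be flagged mechanically. A minimal sketch, assuming two equal-length daily series (hypothetical values below) and using the least-squares slope over a sliding window as the measure of trend. The source does not say how GDELT measures anxiety or instability, so the window size and slope test are illustrative choices.

```python
def slope(values):
    """Least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def diverging_windows(violence, anxiety, window=4):
    """Start indices of windows where violence trends down while anxiety trends up."""
    if len(violence) != len(anxiety) or window < 2:
        raise ValueError("need equal-length series and a window of at least 2")
    hits = []
    for start in range(len(violence) - window + 1):
        v = slope(violence[start:start + window])
        a = slope(anxiety[start:start + window])
        if v < 0 < a:
            hits.append(start)
    return hits


# Hypothetical daily series: clashes peak and fall off while anxiety keeps rising.
violence = [5, 6, 7, 6, 4, 3, 2, 1]
anxiety = [1, 1, 1, 2, 3, 4, 5, 6]
print(diverging_windows(violence, anxiety))  # [1, 2, 3, 4]
```

A fitted slope over a window is less sensitive to single-day spikes than comparing consecutive days, which is why it is used here as the trend test.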
Calculate the mathematical equations governing news coverage of natural disasters.
Yet timelines are just one way to visualize all of this data. Maps and network diagrams are especially powerful ways of understanding society. In clockwise order: 1) a map by BBVA of refugee inflows and outflows, 2) a map of global wildlife crime, 3) a network of the people most heavily associated with Russian economic sanctions, 4) a geographic network diagram of refugee flows, 5) a map of the countries discussing the Greek economic crisis, and 6) a map of global anti-tank and aircraft missile activity.
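Both refugee-flow views rest on origin-destination records. As a hedged sketch (country codes and counts are hypothetical, and neither the BBVA map's nor GDELT's method is described here), net inflow per country is simply what arrives minus what leaves:

```python
from collections import defaultdict


def net_flows(flows):
    """Net inflow per country (arrivals minus departures) from origin-destination records."""
    net = defaultdict(int)
    for origin, destination, count in flows:
        net[origin] -= count       # people leaving the origin country
        net[destination] += count  # people arriving in the destination country
    return dict(net)


# Hypothetical records: (origin, destination, number of people).
flows = [("A", "B", 100), ("A", "C", 50), ("C", "B", 20)]
print(net_flows(flows))  # {'A': -150, 'B': 120, 'C': 30}
```

Because every record subtracts and adds the same count, net flows always sum to zero across countries, a quick sanity check when aggregating real flow data.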
The single map on the bottom combined 860 billion emotional scores, 1.5 billion location mentions, 89 million events and 1.4 million photographs from 200 million news articles in 65 languages from every country on earth, covering 2015.