Evolution of the Humanitarian Data Ecosystem

Sara Terp, AAAI 2015

SJ’s Stages of Data Use
• Hand-scraping (including lists of where to look),
random categories, SMS, maps
• Standards and dataset visualisations
• Mashups and statistical analysis
• Stable datastores and local data scientists

2004-2009
• December 2004: Boxing Day Tsunami kills 230,000 people. Sri
Lankan techs create Sahana
• January 2008: Kenyan news blackout during post-election violence.
Bloggers create Ushahidi
• June 2009: CrisisCommons forms after a tweet-up
• October 2009: ICCM conference, Cleveland
• 2009: Ushahidi creates CrisisMappers
• 2009: First RHOK hackathon creates PeopleFinder
• 2009: CDAC forms after a discussion in a bar

Intelligence Systems
HUMANS
Good at: complex analysis, heuristics, pragmatic translations, creative data finding, sudden onset
Not so good at: high volume, repetitive, 24/7 accurate
BOTS
Good at: high volume, repetitive, complex pattern finding, long term
Not so good at: complexity, human foibles

Unmanned Vehicle Control
Columns: PACT locus of authority (computer autonomy); PACT level; Sheridan & Verplank description
• Computer monitored by human (Full)
– 5b: Computer does everything autonomously
– 5a: Computer chooses action, performs it & informs human
• Computer backed up by human (Action unless revoked)
– 4b: Computer chooses action & performs it unless human disapproves
– 4a: Computer chooses action & performs it if human approves
• Human backed up by computer (Advice, and if authorised, action)
– 3: Computer suggests options and proposes one of them
• Human assisted by computer (Advice)
– 2: Computer suggests options to human
• Human assisted by computer only when requested (Advice only if requested)
– 1: Human asks computer to suggest options and human selects
• Operator (None)
– 0: Whole task done by human except for actual operations
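The autonomy levels on this slide can be read as a decision rule for when the computer is allowed to act. The sketch below is one hedged reading of that rule, not part of the PACT framework itself: the enum member names, the function `computer_may_act` and its two flags are hypothetical, and only the level labels ("0" to "5b") come from the slide.

```python
from enum import Enum


class PactLevel(Enum):
    # Member names are illustrative; values are the level labels from the slide.
    HUMAN_ONLY = "0"
    ADVICE_ON_REQUEST = "1"
    ADVICE = "2"
    ADVICE_THEN_AUTHORISED_ACTION = "3"
    ACTION_IF_APPROVED = "4a"
    ACTION_UNLESS_DISAPPROVED = "4b"
    ACT_AND_INFORM = "5a"
    FULLY_AUTONOMOUS = "5b"


def computer_may_act(level, human_approved=False, human_disapproved=False):
    """Return True if the computer may carry out its chosen action at this level."""
    if level in (PactLevel.FULLY_AUTONOMOUS, PactLevel.ACT_AND_INFORM):
        return True  # no human sign-off needed
    if level is PactLevel.ACTION_UNLESS_DISAPPROVED:
        return not human_disapproved  # proceeds unless the human vetoes
    if level in (PactLevel.ACTION_IF_APPROVED,
                 PactLevel.ADVICE_THEN_AUTHORISED_ACTION):
        return human_approved  # needs explicit approval first
    return False  # levels 0-2: the human decides, the computer at most advises
```

For example, `computer_may_act(PactLevel("4b"))` is true until a human disapproves, while `computer_may_act(PactLevel("4a"))` stays false until a human approves.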

“Don’t be Imperial”
Pro: “Laboratory” = on behalf of
Per: “Community” = alongside
Para: “Grassroots” = by and within

Volunteer Skills Used
Programming
Telecommunications
Mapping
User Experience
IT project management
Data analysis
Relief work experience
Local knowledge
Translation
Communications & PR
Facilitation and admin
Making tea!

Data Process
Ask a good question…
Obtain datasets
Clean, combine, transform data
Explore the data
Try models (classification, machine learning etc)
Interpret and communicate your results

People started conversations…
• Twitter
• Facebook
• SMS
• Phones
• Photos
• News
• Sneakernet
[Diagram labels: Decisions; GAP; Overworked Field People]

@bodaceacat
http://blog.overcognition.com/

Creating Datasets
• People add features to OpenStreetMap
• Person sends SMS to 4636
• Message goes to CrowdFlower
• Person translates and geolocates message
• Message goes to Ushahidi display
• Message gets to responders, public, aunts, Sahana etc.

Building Technologies
Ongoing:
• CDAC website review
• Field Voices
• Haiti Amps Network
• Haitian Voices
• Machine Translation System
• Oil Spill Response
• PAP outskirts food relief
• Telecommunications technical project
• Low-bandwidth Ushahidi
• Kapab Medical Facility Capacity Finder
• Disaster Accountability Public Database
• Sync the Sheet
• Testing Crabgrass
Closed:
• Translators in Action - other translation tools were
developed
Proposed:
• Mining Relief Data
• Automating Aid Request via a Voice Phone Call
• Building A Refugee Camp Cell Phone Early
Warning System
• Community Tool Box
• CrisisCommons Roledex
• Facebook for ARC Safe and Well site
• Haitian Skilled Workforce Retention
• Post Disaster Child Protection
• CDAC Radio Website
Unknown:
• Disaster Accountability Hotline
• Incident visualisation
• Needs Categorization
• World Academic TeaCHing Hospitals disaster
relief

Improving Technologies
• ReliefWeb UX redesign
• Ushahidi UX redesign
• CDAC website review
• OpenStreetMap developers at one end of the table;
OpenStreetMap users at the other

What’s an appropriate crisis to help?
• Information
– Information deluge
– Knowledge drought
• Infrastructure
– Local infrastructure is overwhelmed
– Existing information channels
• Stages
– Mitigation
– Preparedness
– Response
– Recovery
– Sustainability

User Questions for Pkfloods
• Where can I find out who needs my help?
• Where can I find people to help me deliver aid?
• Where can I find out information?
• How do I find out if I'm about to be flooded?
• Who should I alert/give my information to?
• Where can I find out general information about #pkfloods?
• Where can I search for people? (I cannot find my grandmother/relative)
• I have been 'found' - who should I alert/give my status to?
• I need food/water/supplies, how can I tell people I need something?
• I have food/water/supplies, how can I find out where there's a need?
• I want to get to location x, where can I find out about the state of the roads?
• I am observing/know the state of the roads, who should I alert/give my
information to?
• How can I find out where there are information blackspots/there is no
telecoms coverage?
• I know where the telecoms/information blackspots are, who should I alert/give my
information to, and how?

Pkfloods Use Cases

What if the datapoints move?
• Ash cloud from Eyjafjallajökull left planes on the ground
and thousands of people stranded
• UK crisis mappers started news and twitter watches
• Needed a tool that let us track who was stranded
and ways for people to get home
• But all the methods we had were static

The 2010 Vision:
effective crisis information ecosystems

Task Types
• Message level:
– Media monitoring, source checking (e.g. SMS), summarisation, translation,
geolocation, cleaning (e.g. PII removal), categorising (e.g. grouping)
• Meta level:
– Analysis (producing graphs, explanations, connections)
– Verification
– Tasks / team control
– Communication
– After-action reporting (inc. evaluation)
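As an illustration of the "cleaning (e.g. PII removal)" task, here is a minimal sketch that masks email addresses and phone-like numbers in a message before it is shared. The patterns and the function name `redact_pii` are assumptions made for this example; real deployments would need locale-specific rules and human review, since names and addresses are not caught by simple patterns like these.

```python
import re

# Deliberately simple patterns, assumed for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A digit run of 8+ characters, allowing spaces, dots, dashes and brackets.
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def redact_pii(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


print(redact_pii("Call +254 700 123456 or mail jo@example.org"))
```

Short numbers such as "20 people" are left alone because the phone pattern requires at least eight characters.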

Sudden-Onset Crises
• Fire, flood, heat, cold, tsunami, earthquake, storm,
tornado, hurricane, cyclone, refugees, bombings,
election issues / violence etc

Slow-Burn Crises
Droughts, agriculture, food insecurity, conflict,
education, disease, employment, shelter, trade,
endemic violence, GBV etc.
“Human development is a process of enlarging people’s choices.
The most critical ones are to lead a long and healthy life, to be
educated and to enjoy a decent standard of living. Additional
choices include political freedom, guaranteed human rights and
self-respect – what Adam Smith called the ability to mix with
others without being ashamed to appear in public” – UNDP Human
Development Report

Crisismapping Early 2011: radiation

Data CrossWalks
DR Congo in Data.UN.Org:
“Congo, Democratic Republic of the”, “Congo Democratic”, “Democratic Republic of the
Congo”, “Congo (Democratic Republic of the)”, “Congo, Dem. Rep.”, “Congo Dem.
Rep.”, “Congo, Democratic Republic of”, “Dem. Rep. of Congo”, “Dem. Rep. of the
Congo”
DR Congo in common standards:
“Democratic Republic of the Congo” (UN Stats), “Congo, The Democratic Republic of
the” (ISO3166), “Congo, Democratic Republic of the” (FIPS10, Stanag), “180” (UN
Stats), “COD” (ISO3166, Stanag), “CG” (FIPS10)
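A crosswalk for the name variants above can be as simple as a lookup table keyed on a normalised spelling. The sketch below maps the variants listed on this slide to the ISO 3166 code "COD"; the helper names `normalise` and `to_iso3` are hypothetical, and the table covers only this one country.

```python
import re

# Name variants copied from the slide; all refer to DR Congo.
DRC_VARIANTS = [
    "Congo, Democratic Republic of the",
    "Congo Democratic",
    "Democratic Republic of the Congo",
    "Congo (Democratic Republic of the)",
    "Congo, Dem. Rep.",
    "Congo Dem. Rep.",
    "Congo, Democratic Republic of",
    "Dem. Rep. of Congo",
    "Dem. Rep. of the Congo",
    "Congo, The Democratic Republic of the",
]


def normalise(name):
    """Lower-case and drop punctuation so near-identical spellings collide."""
    return " ".join(re.sub(r"[^a-z ]", " ", name.lower()).split())


CROSSWALK = {normalise(v): "COD" for v in DRC_VARIANTS}


def to_iso3(name):
    """Return the ISO 3166 alpha-3 code, or None if the name is unknown."""
    return CROSSWALK.get(normalise(name))


print(to_iso3("Congo, Dem. Rep."))  # COD
print(to_iso3("Congo"))             # None: ambiguous, so left unmapped
```

Returning `None` for a bare "Congo" is a deliberate choice: an unmatched name is easier to fix by hand than a silently wrong match.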

Common Data Needs
• Rolodexes: which response groups to follow, and who’s
likely to bring what
• 3Ws: who’s doing what where
• GIS data: knowing where medical facilities, schools, roads,
bridges are
• Communications: cell tower locations and signal maps
• Demographics
• Technology and social media use by demographic group

Commonly Available Data
• Direct messages (SMS etc)
• Social media messages (tweets etc)
• Demographic data (e.g. surveys)
• News reports
• 3Ws, situation reports (both official, via news sources and on
social media), field notes
• Photos: ground, aerial, satellite, videos
• CSVs, webpages, PDFs, audio recordings (e.g. radio)

Common Issues
• Massively dispersed and unstructured data (still)
• Named entity and category mismatches between datasets
• Trust
• Personally Identifiable Information (and risk)
• Crisis response is time-limited
• Crisis data response is resource-limited
• Crisis preparation is attention-limited (if you want resilience,
either pay or lead)

(Some of) What’s Broken
• Crisis Data
– Remote vs Ground disconnect
– Crisis vs Development disconnect
– Deployment lead overload
• Development Data
– Broken data formats, access, coverage, standards
– Ignored data sources
– Human vs Data disconnect
• Communities
– Stovepipes, fiefdoms, imperialism, finding…

My Personal Three Vs
• Variety
– Data all over the place
– CSV, JSON, XML, Excel, PDF, text, webpages, RSS, scanned pages, images,
videos, audio files, maps, proprietary formats, etc.
• Velocity
– Streams updating too fast for a mapping team (100-200 people) to handle
– Pages updating too frequently to check by hand
• Volume
– Can’t open the data in a spreadsheet
– Can’t fit the data on my laptop
– Maxes out my credit card (thank you Amazon!)
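For the "can't open the data in a spreadsheet" case, one common workaround is to stream the file row by row instead of loading it whole. This is a minimal sketch using Python's standard `csv` module; the function name `count_by_column` and the column names in the sample are made up for illustration.

```python
import csv
from collections import Counter


def count_by_column(lines, column):
    """Count values in one column, reading a single row at a time."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


# Any iterable of lines works; an open file streams from disk the same way.
sample = [
    "category,text\n",
    "water,need water\n",
    "shelter,roof gone\n",
    "water,well dry\n",
]
print(count_by_column(sample, "category"))
```

With a real file this would be called as `count_by_column(open(path, newline=""), "category")`, where `path` points at the oversized CSV; memory use depends on the number of distinct values, not the number of rows.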

Mappers Needed More Data Science Literacy

Ushahidi Platform
PHOTOS, VIDEOS

Local wins. Local should
(almost) always win

Ushahidi Platforms as
Datasets

And are making it part of “normality”

Here are some missing
pieces
• Basic vocabularies, e.g. stopword lists for most languages
(including SMSspeak in different languages)
• Pre-crisis datasets for many crisis-prone countries
– Philippines: local response groups set up
– Missing Maps project for GIS data
– What about the rest?
• User datasets in existing tools
– E.g. adding own gazetteers into Ushahidi.
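To show why stopword lists and gazetteers matter, here is a small sketch that strips filler words from a message and then looks for a known place name. Everything in it is hypothetical: the stopword set (including the SMSspeak entries), the place names and their placeholder coordinates are invented for illustration and do not come from Ushahidi or any real gazetteer.

```python
import re

# Tiny illustrative stopword list, including some SMSspeak.
STOPWORDS = {"the", "in", "at", "we", "are", "is", "no", "near", "pls", "plz", "u"}

# Made-up gazetteer: place name -> (latitude, longitude) placeholders.
GAZETTEER = {
    "riverside": (1.0, 2.0),
    "northgate": (3.0, 4.0),
}


def keywords(message):
    """Lower-case the message, split it into words and drop stopwords."""
    tokens = re.findall(r"[a-z']+", message.lower())
    return [t for t in tokens if t not in STOPWORDS]


def geolocate(message):
    """Return (place, coordinates) for the first gazetteer hit, else None."""
    for token in keywords(message):
        if token in GAZETTEER:
            return token, GAZETTEER[token]
    return None


print(keywords("We are at Riverside, pls send water"))
print(geolocate("We are at Riverside, pls send water"))
```

Without a stopword list for the message's language, or a gazetteer that contains the local place names, both steps fall back to manual work by the mapping team.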
