Evolution of the
Humanitarian Data
Ecosystem
Sara Terp, AAAI 2015
SJ’s Stages of Data Use
• Hand-scraping (including lists of where to look),
random categories, SMS, maps
• Standards and dataset visualisations
• Mashups and statistical analysis
• Stable datastores and local data scientists
2004-2009
• December 2004: Boxing Day Tsunami kills 230,000 people. Sri
Lankan techs create Sahana
• January 2008: Kenyan news blackout during post-election violence.
Bloggers create Ushahidi
• June 2009: CrisisCommons forms after a tweet-up
• October 2009: ICCM conference, Cleveland
• 2009: Ushahidi creates CrisisMappers
• 2009: First RHOK hackathon creates PeopleFinder
• 2009: CDAC forms after a discussion in a bar
Intelligence Systems
HUMANS
• Good at: complex analysis, heuristics, pragmatic translations, creative data finding, sudden onset
• Not so good at: high volume, repetitive, 24/7 accurate
BOTS
• Good at: high volume, repetitive, complex pattern finding, long term
• Not so good at: complexity, human foibles
Unmanned Vehicle Control
PACT locus of authority / computer autonomy / PACT level: Sheridan & Verplank description
• Computer monitored by human / Full
– 5b: Computer does everything autonomously
– 5a: Computer chooses action, performs it & informs human
• Computer backed up by human / Action unless revoked
– 4b: Computer chooses action & performs it unless human disapproves
– 4a: Computer chooses action & performs it if human approves
• Human backed up by computer / Advice, and if authorised, action
– 3: Computer suggests options and proposes one of them
• Human assisted by computer / Advice
– 2: Computer suggests options to human
• Human assisted by computer only when requested / Advice only if requested
– 1: Human asks computer to suggest options and human selects
• Operator / None
– 0: Whole task done by human except for actual operations
2010: Haiti, VTCs
“Don’t be Imperial”
Pro: “Laboratory” =
on behalf of
Per: “Community” =
alongside
Para: “Grassroots” –
by and within
Volunteer Skills Used
Programming
Telecommunications
Mapping
User Experience
IT project management
Data analysis
Relief work experience
Local knowledge
Translation
Communications & PR
Facilitation and admin
Making tea!
Data Scientist Skills
Data Process
Ask a good question…
Obtain datasets
Clean, combine, transform data
Explore the data
Try models (classification, machine learning etc)
Interpret and communicate your results
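A minimal Python sketch of the obtain / clean / combine / explore steps above, assuming two small hypothetical CSV sources: a list of reported needs and a population table (the district names and numbers are invented). It normalises names before joining, because mismatched names are usually the first thing that breaks a combine step.

```python
import csv
import io
from collections import Counter

# Obtain: hypothetical inline CSVs standing in for downloaded files.
REPORTS_CSV = "district,need\nDistrict A,water\n district a ,food\nDistrict B,water\nDistrict B,\n"
POPULATION_CSV = "district,population\nDistrict A,1000\nDistrict B,500\n"


def clean_name(name: str) -> str:
    """Clean: collapse whitespace and fix case so names from both sources match."""
    return " ".join(name.split()).title()


def load_reports(text: str) -> list:
    """Read reports, dropping rows with no stated need."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"district": clean_name(r["district"]), "need": r["need"].strip()}
        for r in rows
        if r["need"].strip()
    ]


def load_population(text: str) -> dict:
    rows = csv.DictReader(io.StringIO(text))
    return {clean_name(r["district"]): int(r["population"]) for r in rows}


def reports_per_thousand(reports: list, population: dict) -> dict:
    """Combine and transform: join report counts to population, scaled per 1,000 people."""
    counts = Counter(r["district"] for r in reports)
    return {d: 1000 * counts[d] / pop for d, pop in population.items()}


if __name__ == "__main__":
    reports = load_reports(REPORTS_CSV)
    population = load_population(POPULATION_CSV)
    # Explore: look at the combined numbers before trying any models.
    print(Counter(r["need"] for r in reports))
    print(reports_per_thousand(reports, population))
```

Scaling by population is one simple transform; the point is that raw report counts from a small district and a large one are not comparable until they are combined with a second dataset.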
People started conversations…
• Twitter
• Facebook
• SMS
• Phones
• Photos
• News
• Sneakernet
Decisions
GAP
Overworked
Field People
SMS to Map
@bodaceacat
http://blog.overcognition.com/
Creating Datasets
• People add features to OpenStreetMap
• Person sends SMS to 4636
• Message goes to CrowdFlower
• Person translates and geolocates message
• Message goes to Ushahidi display
• Message gets to responders, public, aunts, Sahana etc.
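A rough Python sketch of the message flow above, assuming each SMS becomes a record that picks up a translation and a location before it is shown on a map. The lookup tables, function names and coordinates are hypothetical placeholders for the volunteer translators and geolocators; nothing here models the real 4636, CrowdFlower or Ushahidi interfaces.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Hypothetical lookup tables standing in for human volunteers.
TRANSLATIONS = {"nou bezwen dlo": "we need water"}  # example Haitian Creole -> English
GAZETTEER = {"district a": (0.0, 0.0)}  # placeholder coordinates


@dataclass
class Message:
    """One SMS as it moves through the pipeline (fields are illustrative)."""
    sender: str
    text: str
    translation: Optional[str] = None
    location: Optional[Tuple[float, float]] = None
    history: list = field(default_factory=list)


def receive(sender: str, text: str) -> Message:
    msg = Message(sender=sender, text=text)
    msg.history.append("received")
    return msg


def translate(msg: Message) -> Message:
    msg.translation = TRANSLATIONS.get(msg.text.strip().lower())
    msg.history.append("translated" if msg.translation else "needs human translator")
    return msg


def geolocate(msg: Message, place: str) -> Message:
    msg.location = GAZETTEER.get(place.strip().lower())
    msg.history.append("geolocated" if msg.location is not None else "needs human geolocator")
    return msg


def publish(msg: Message, display: list) -> bool:
    """Only messages with both a translation and a location reach the map display."""
    if msg.translation and msg.location is not None:
        display.append(msg)
        msg.history.append("published")
        return True
    return False


if __name__ == "__main__":
    display: list = []
    msg = geolocate(translate(receive("sender-1", "Nou bezwen dlo")), "District A")
    publish(msg, display)
    print(msg.history)
```

Keeping a history on each message makes it easy to see where things stall, e.g. a queue of messages still marked "needs human translator".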
Interpreting Aerial Images
Building Technologies
Ongoing:
• CDAC website review
• Field Voices
• Haiti Amps Network
• Haitian Voices
• Machine Translation System
• Oil Spill Response
• PAP outskirts food relief
• Telecommunications technical project
• Low-bandwidth Ushahidi
• Kapab Medical Facility Capacity Finder
• Disaster Accountability Public Database
• Sync the Sheet
• Testing Crabgrass
Closed:
• Translators in Action - other translation tools were
developed
Proposed
• Mining Relief Data
• Automating Aid Request via a Voice Phone Call
• Building A Refugee Camp Cell Phone Early
Warning System
• Community Tool Box
• CrisisCommons Rolodex
• Facebook for ARC Safe and Well site
• Haitian Skilled Workforce Retention
• Post Disaster Child Protection
• CDAC Radio Website
Unknown
• Disaster Accountability Hotline
• Incident visualisation
• Needs Categorization
• World Academic TeaCHing Hospitals disaster
relief
Improving Technologies
• ReliefWeb UX redesign
• Ushahidi UX redesign
• CDAC website review
• OpenStreetMap development at one end of the table; OpenStreetMap users at the other
Building Interfaces
Creating Community Sensors
What’s an appropriate crisis to help?
• Information
– Information deluge
– Knowledge drought
• Infrastructure
– Local infrastructure is overwhelmed
– Existing information channels
• Stages
– Mitigation
– Preparedness
– Response
– Recovery
– Sustainability
User questions for #pkfloods
• Where can I find out who needs my help?
• Where can I find people to help me deliver aid?
• Where can I find out information?
• How do I find out if I'm about to be flooded?
• Who should I alert/give my information to?
• Where can I find general information out about #pkfloods?
• Where can I search for people? (I cannot find my grandmother/relative)
• I have been 'found' - who should I alert/give my status to?
• I need food/water/supplies, how can I tell people I need something?
• I have food/water/supplies, how can I find out where there's a need?
• I want to get to location x, where can I find out about the state of the roads?
• I am observing/know the state of the roads, who should I alert/give my
information to?
• How can I find out where there are information blackspots/there is no
telecoms coverage?
• I know where the telecoms/information blackspots are, who should I give my
alert/information to and how?
Pkfloods Use Cases
What if the datapoints move?
• Ash cloud from Eyjafjallajökull left planes on the ground
and thousands of people stranded
• UK crisis mappers started news and twitter watches
• Needed a tool that let us track who was stranded
and ways for people to get home
• But all the methods we had were static
The 2010 Vision:
effective crisis information ecosystems
Responder-triggered VTCs
Task Types
• Message level:
• Media monitoring, source checking (e.g. SMS), summarisation, translation,
geolocation, cleaning (e.g. PII removal), categorising (e.g. grouping)
• Meta level:
• Analysis (producing graphs, explanations, connections),
• Verification
• Tasks / team control
• Communication
• After-action reporting (inc evaluation)
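One of the message-level tasks, PII removal, can be partly automated. A minimal sketch, assuming phone numbers and email addresses are the PII to strip; the regular expressions are illustrative and will miss names, addresses and ID numbers, so a human check is still needed.

```python
import re

# Illustrative patterns: they catch phone numbers and email addresses only.
# Names, street addresses and ID numbers need other methods or a human check.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s-]{6,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses, then phone numbers, with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


if __name__ == "__main__":
    print(redact_pii("Call me on +44 7700 900123 or mail jo@example.org"))
    # prints: Call me on [PHONE] or mail [EMAIL]
```

The phone pattern needs at least eight digit/space/hyphen characters, so short numbers such as "200 families" survive; long reference numbers will still be over-redacted, which is usually the safer failure.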
Sudden-Onset Crisis
• Fire, flood, heat, cold, tsunami, earthquake, storm,
tornado, hurricane, cyclone, refugees, bombings,
election issues / violence etc
2011: UN Data Science
Slow-Burn Crises
Droughts, agriculture, food insecurity, conflict,
education, disease, employment, shelter, trade,
endemic violence, GBV etc.
“Human development is a process of enlarging people’s choices.
The most critical ones are to lead a long and healthy life, to be
educated and to enjoy a decent standard of living. Additional
choices include political freedom, guaranteed human rights and
self-respect – what Adam Smith called the ability to mix with
others without being ashamed to appear in public” – UNDP Human
Development Report
Crisismapping Early 2011: radiation
Category Standards
Human/Machine Data Generation
Data CrossWalks
DR Congo in Data.UN.Org:
“Congo, Democratic Republic of the”, “Congo Democratic”, “Democratic Republic of the
Congo”, “Congo (Democratic Republic of the)”, “Congo, Dem. Rep.”, “Congo Dem.
Rep.”, “Congo, Democratic Republic of”, “Dem. Rep. of Congo”, “Dem. Rep. of the
Congo”
DR Congo in common standards:
“Democratic Republic of the Congo” (UN Stats), “Congo, The Democratic Republic of
the” (ISO3166), “Congo, Democratic Republic of the” (FIPS10, Stanag), “180” (UN
Stats), “COD” (ISO3166, Stanag), “CG” (FIPS10)
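A minimal crosswalk sketch in Python, assuming the aim is to map every variant above onto ISO 3166-1 alpha-3 (COD). The scheme labels (`fips10`, `un_m49` etc.) are made up for the example. Codes are keyed by scheme as well as value because the same string can mean different countries: FIPS 10-4 `CG` is DR Congo, but ISO 3166-1 alpha-2 `CG` is the Republic of the Congo. Unknown values return None so a person can review them instead of the script guessing.

```python
from typing import Optional

# DR Congo name variants found across datasets (as listed above).
NAME_VARIANTS = [
    "Congo, Democratic Republic of the",
    "Congo Democratic",
    "Democratic Republic of the Congo",
    "Congo (Democratic Republic of the)",
    "Congo, Dem. Rep.",
    "Congo Dem. Rep.",
    "Congo, Democratic Republic of",
    "Dem. Rep. of Congo",
    "Dem. Rep. of the Congo",
    "Congo, The Democratic Republic of the",
]


def normalise(name: str) -> str:
    """Case-fold and collapse whitespace so trivially different strings share a key."""
    return " ".join(name.casefold().split())


# Hypothetical crosswalk keyed by (scheme, value); scheme labels are made up.
CROSSWALK = {("name", normalise(v)): "COD" for v in NAME_VARIANTS}
CROSSWALK.update(
    {
        ("iso3166_alpha3", "COD"): "COD",
        ("iso3166_alpha2", "CD"): "COD",
        ("iso3166_alpha2", "CG"): "COG",  # ISO alpha-2 CG is the Republic of the Congo
        ("fips10", "CG"): "COD",  # FIPS 10-4 CG is DR Congo
        ("un_m49", "180"): "COD",  # UN Stats numeric code
    }
)


def to_iso3(scheme: str, value: str) -> Optional[str]:
    """Return the ISO 3166-1 alpha-3 code, or None to flag the value for human review."""
    key = normalise(value) if scheme == "name" else value.strip().upper()
    return CROSSWALK.get((scheme, key))
```

An explicit table is dull but auditable; fuzzy matching would catch more variants, and would also happily match "Congo" to the wrong country.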
2012: Partial Automation
ACAPS DNA
Data Finding
Common Data Needs
• Rolodexes: which response groups to follow, and who’s
likely to bring what
• 3Ws: who’s doing what where
• GIS data: knowing where medical facilities, schools, roads,
bridges are
• Communications: cell tower locations and signal maps
• Demographics
• Technology and social media use, mapped to demographics
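3W data is most useful for spotting gaps. A minimal sketch, assuming each 3W row is reduced to (who, what, where) and that the sectors and locations of interest are already known; the organisations and districts in the usage example are hypothetical.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class ThreeW(NamedTuple):
    """One 3W row: who (organisation) is doing what (sector) where (location)."""
    who: str
    what: str
    where: str


def coverage_gaps(
    rows: Iterable[ThreeW], sectors: Set[str], locations: Set[str]
) -> Set[Tuple[str, str]]:
    """Return (location, sector) pairs that no organisation reports covering."""
    covered = {(r.where, r.what) for r in rows}
    return {(loc, sec) for loc in locations for sec in sectors} - covered


if __name__ == "__main__":
    rows = [
        ThreeW("Org 1", "water", "District A"),
        ThreeW("Org 2", "shelter", "District A"),
        ThreeW("Org 1", "water", "District B"),
    ]
    print(coverage_gaps(rows, {"water", "shelter"}, {"District A", "District B"}))
    # prints {('District B', 'shelter')}
```

This only works if sector and location names already agree across organisations, which is the category-mismatch problem again.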
Commonly Available Data
• Direct messages (SMS etc)
• Social media messages (tweets etc)
• Demographic data (e.g. surveys)
• News reports
• 3Ws, situation reports (both official, via news sources and on
social media), field notes
• Photos: ground, aerial, satellite, videos
• CSVs, webpages, PDFs, audio recordings (e.g. radio)
Common Issues
• Massively dispersed and unstructured data (still)
• Named entity and category mismatches between datasets
• Trust
• Personally Identifiable Information (and risk)
• Crisis response is time-limited
• Crisis data response is resource-limited
• Crisis preparation is attention-limited (if you want resilience,
either pay or lead)
(Some of) What’s Broken
• Crisis Data
– Remote vs Ground disconnect
– Crisis vs Development disconnect
– Deployment lead overload
• Development Data
– Broken data formats, access, coverage, standards
– Ignored data sources
– Human vs Data disconnect
• Communities
– Stovepipes, fiefdoms, imperialism, finding…
2013: Data Overloads
Cleaner Workflows
More Maps
2013 Boston bombings
My Personal Three Vs
• Variety
– Data all over the place
– CSV, JSON, XML, Excel, PDF, text, webpages, RSS, scanned pages, images,
videos, audio files, maps, proprietary formats, etc.
• Velocity
– Streams updating too fast for a mapping team (100-200 people) to handle
– Pages updating too frequently to check by hand
• Volume
– Can’t open the data in a spreadsheet
– Can’t fit the data on my laptop
– Maxes out my credit card (thank you Amazon!)
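For the "can't open it in a spreadsheet" end of volume, one workable approach is to stream the file and keep only running totals. A minimal sketch, assuming a CSV with a header row; the file and column names are hypothetical. Memory use grows with the number of distinct values in the column, not the number of rows.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(lines: TextIO, column: str) -> Counter:
    """Stream a CSV one row at a time, keeping only running counts in memory."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


if __name__ == "__main__":
    # A hypothetical three-row file; on real data pass an open file instead,
    # e.g. open("messages.csv", newline="").
    sample = io.StringIO("category,text\nwater,a\nfood,b\nwater,c\n")
    print(count_by_column(sample, "category").most_common())
    # prints [('water', 2), ('food', 1)]
```

This handles "can't open it"; "can't fit it on my laptop" usually means the same streaming idea run somewhere else, or a database.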
The other Vs: Veracity
Mappers Needed More Data Science Literacy
Datastores
2014: Datastores
We Build Community Data
Tools
Ushahidi is a Dataset
Ushahidi Platform
PHOTOS, VIDEOS
Ushahidi Platform as Data
Non-Expert Visualisations
Word-level analysis
Typhoon Ruby, Dec 2014
Where to Map?
Stuff Happens
Lots of groups curate data
Including volunteer mappers
Ruby Datastores
Local wins. Local should
(almost) always win
2015: NGO Data Scientists
Ushahidi Platforms as
Datasets
Datastores and Viz
Resilience
And are making it part of “normality”
Here are some missing pieces
• Basic vocabularies, e.g. stopword lists for most languages
(including SMSspeak in different languages)
• Pre-crisis datasets for many crisis-prone countries
– Philippines: local response groups set up
– Missing Maps project for GIS data
– What about the rest?
• User datasets in existing tools
– E.g. adding own gazetteers into Ushahidi.
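A small sketch of why stopword lists matter for word-level analysis: a word count over messages, assuming a tiny hypothetical stopword list that mixes English function words with a few SMS shorthands. Using `\w+` with casefold keeps accented letters, but it is still a naive tokeniser; languages written without spaces between words would need a proper one.

```python
import re
from collections import Counter
from typing import FrozenSet, Iterable

# Hypothetical stopword list: a few English function words plus SMS shorthand.
# Real use needs a curated list per language (and per SMS dialect).
STOPWORDS: FrozenSet[str] = frozenset(
    {"the", "a", "and", "to", "of", "in", "is", "we", "at", "u", "r", "pls", "plz"}
)
TOKEN = re.compile(r"\w+")


def word_counts(messages: Iterable[str], stopwords: FrozenSet[str] = STOPWORDS) -> Counter:
    """Count words across messages, skipping stopwords."""
    counts: Counter = Counter()
    for msg in messages:
        counts.update(t for t in TOKEN.findall(msg.casefold()) if t not in stopwords)
    return counts


if __name__ == "__main__":
    msgs = ["Pls send water to the school", "we need water plz", "Water at school"]
    print(word_counts(msgs).most_common(2))
    # prints [('water', 3), ('school', 2)]
```

Without "pls" and "plz" in the list, SMS politeness markers would rank alongside the actual needs.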

 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Dernier (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Evolution of the Humanitarian Data Ecosystem

  • 1. Evolution of the Humanitarian Data Ecosystem Sara Terp, AAAI 2015
  • 2. SJ’s Stages of Data Use • Hand-scraping (including lists of where to look), random categories, SMS, maps • Standards and dataset visualisations • Mashups and statistical analysis • Stable datastores and local data scientists
  • 3. 2004-2009 • December 2004: Boxing Day Tsunami kills 230,000 people. Sri Lankan techs create Sahana • January 2008: Kenyan news blackout during post-election violence. Bloggers create Ushahidi • June 2009: CrisisCommons forms after a tweet-up • October 2009: ICCM conference, Cleveland • 2009: Ushahidi creates CrisisMappers • 2009: First RHOK hackathon creates PeopleFinder • 2009: CDAC forms after a discussion in a bar
  • 5. Intelligence Systems – Humans: Good at: complex analysis, heuristics, pragmatic translations, creative data finding, sudden onset; Not so good at: high volume, repetitive, 24/7 accurate – Bots: Good at: high volume, repetitive, complex pattern finding, long term; Not so good at: complexity, human foibles
  • 6. Unmanned Vehicle Control (PACT locus of authority | computer autonomy | PACT level | Sheridan & Verplank description):
      – Computer monitored by human | Full | 5b | Computer does everything autonomously
      – Computer monitored by human | Full | 5a | Computer chooses action, performs it & informs human
      – Computer backed up by human | Action unless revoked | 4b | Computer chooses action & performs it unless human disapproves
      – Computer backed up by human | Action unless revoked | 4a | Computer chooses action & performs it if human approves
      – Human backed up by computer | Advice, and if authorised, action | 3 | Computer suggests options and proposes one of them
      – Human assisted by computer | Advice | 2 | Computer suggests options to human
      – Human assisted by computer only when requested | Advice only if requested | 1 | Human asks computer to suggest options and human selects
      – Operator | None | 0 | Whole task done by human except for actual operations
  • 8. “Don’t be Imperial” Pro: “Laboratory” = on behalf of Per: “Community” = alongside Para: “Grassroots” – by and within
  • 9. Volunteer Skills Used Programming Telecommunications Mapping User Experience IT project management Data analysis Relief work experience Local knowledge Translation Communications & PR Facilitation and admin Making tea!
  • 11. Data Process Ask a good question… Obtain datasets Clean, combine, transform data Explore the data Try models (classification, machine learning etc) Interpret and communicate your results
  • 12. People started conversations… • Twitter • Facebook • SMS • Phones • Photos • News • Sneakernet DecisionsGAP Overworked Field People
  • 14. @bodaceacat http://blog.overcognition.com/ Creating Datasets • People add features to OpenStreetMap • Person sends SMS to 4636 • Message goes to CrowdFlower • Person translates and geolocates message • Message goes to Ushahidi display • Message gets to responders, public, aunts, Sahana etc.
  • 16. Building Technologies Ongoing: • CDAC website review • Field Voices • Haiti Amps Network • Haitian Voices • Machine Translation System • Oil Spill Response • PAP outskirts food relief • Telecommunications technical project • Low-bandwidth Ushahidi • Kapab Medical Facility Capacity Finder • Disaster Accountability Public Database • Sync the Sheet • Testing Crabgrass Closed: • Translators in Action - other translation tools were developed Proposed • Mining Relief Data • Automating Aid Request via a Voice Phone Call • Building A Refugee Camp Cell Phone Early Warning System • Community Tool Box • CrisisCommons Rolodex • Facebook for ARC Safe and Well site • Haitian Skilled Workforce Retention • Post Disaster Child Protection • CDAC Radio Website Unknown • Disaster Accountability Hotline • Incident visualisation • Needs Categorization • World Academic TeaCHing Hospitals disaster relief
  • 17. Improving Technologies • ReliefWeb UX redesign • Ushahidi UX redesign • CDAC website review • OpenStreetMap development (OpenStreetMap developers at one end of the table; OpenStreetMap users at the other)
  • 20. What’s an appropriate crisis to help? • Information – Information deluge – Knowledge drought • Infrastructure – Local infrastructure is overwhelmed – Existing information channels • Stages – Mitigation – Preparedness – Response – Recovery – Sustainability
  • 21. User questions for pkfloods • Where can I find out who needs my help? • Where can I find people to help me deliver aid? • Where can I find out information? • How do I find out if I'm about to be flooded? • Who should I alert/give my information to? • Where can I find general information out about #pkfloods? • Where can I search for people? (I cannot find my grandmother/relative) • I have been 'found' - who should I alert/give my status to? • I need food/water/supplies, how can I tell people I need something? • I have food/water/supplies, how can I find out where there's a need? • I want to get to location x, where can I find out about the state of the roads? • I am observing/know the state of the roads, who should I alert/give my information to? • How can I find out where there are information blackspots/there is no telecoms coverage? • I know where the telecoms/information blackspots are, who should I give my alert/information to and how?
  • 23. What if the datapoints move? • Ash cloud from Eyjafjallajökull grounded planes and left thousands of people stranded • UK crisis mappers started news and Twitter watches • We needed a tool that let us track who was stranded and find ways for people to get home • But all the methods we had were static
  • 26. Task Types • Message level: • Media monitoring, source checking (e.g. SMS), summarisation, translation, geolocation, cleaning (e.g. PII removal), categorising (e.g. grouping) • Meta level: • Analysis (producing graphs, explanations, connections), • Verification • Tasks / team control • Communication • After-action reporting (inc evaluation)
  • 27. Sudden-Onset Crisis • Fire, flood, heat, cold, tsunami, earthquake, storm, tornado, hurricane, cyclone, refugees, bombings, election issues / violence etc
  • 28. 2011: UN Data Science
  • 29. Slow-Burn Crises Droughts, agriculture, food insecurity, conflict, education, disease, employment, shelter, trade, endemic violence, GBV etc. “Human development is a process of enlarging people’s choices. The most critical ones are to lead a long and healthy life, to be educated and to enjoy a decent standard of living. Additional choices include political freedom, guaranteed human rights and self-respect – what Adam Smith called the ability to mix with others without being ashamed to appear in public” – UNDP Human Development Report
  • 33. Data CrossWalks DR Congo in Data.UN.Org: “Congo, Democratic Republic of the”, “Congo Democratic”, “Democratic Republic of the Congo”, “Congo (Democratic Republic of the)”, “Congo, Dem. Rep.”, “Congo Dem. Rep.”, “Congo, Democratic Republic of”, “Dem. Rep. of Congo”, “Dem. Rep. of the Congo” DR Congo in common standards: “Democratic Republic of the Congo” (UN Stats), “Congo, The Democratic Republic of the” (ISO3166), “Congo, Democratic Republic of the” (FIPS10, Stanag), “180” (UN Stats), “COD” (ISO3166, Stanag), “CG” (FIPS10)
  • 37. Common Data Needs • Rolodexes: which response groups to follow, and who’s likely to bring what • 3Ws: who’s doing what where • GIS data: knowing where medical facilities, schools, roads, bridges are • Communications: cell tower locations and signal maps • Demographics. • Technology and social media use to demographics
  • 38. Commonly Available Data • Direct messages (SMS etc) • Social media messages (tweets etc) • Demographic data (e.g. surveys) • News reports • 3Ws, situation reports (both official, via news sources and on social media), field notes • Photos: ground, aerial, satellite, videos • CSVs, webpages, PDFs, audio recordings (e.g. radio)
  • 39. Common Issues • Massively dispersed and unstructured data (still) • Named entity and category mismatches between datasets • Trust • Personally Identifiable Information (and risk) * Crisis response is time-limited * Crisis data response is resource-limited * Crisis preparation is attention-limited (if you want resilience, either pay or lead)
  • 40. (Some of) What’s Broken • Crisis Data – Remote vs Ground disconnect – Crisis vs Development disconnect – Deployment lead overload • Development Data – Broken data formats, access, coverage, standards – Ignored data sources – Human vs Data disconnect • Communities – Stovepipes, fiefdoms, imperialism, finding…
  • 45. My Personal Three Vs • Variety – Data all over the place – Csv, json, xml, excel, pdf, text, webpages, rss, scanned pages, images, videos, audiofiles, maps, proprietary. Etc. • Velocity – Streams updating too fast for a mapping team (100-200 people) to handle – Pages updating too frequently to check by hand • Volume – Can’t open the data in a spreadsheet – Can’t fit the data on my laptop – Maxes out my credit card (thank you Amazon!)
  • 46. The other Vs: Veracity
  • 47. Mappers Needed More Data Science Literacy
  • 50. We Build Community Data Tools
  • 52. Ushahidi is a Dataset
  • 61. Lots of groups curate data
  • 64. Local wins. Local should (almost) always win
  • 65. 2015: NGO Data Scientists
  • 69. And are making it part of “normality”
  • 70. Here are some missing pieces • Basic vocabularies, e.g. stopword lists for most languages (including SMSspeak in different languages) • Pre-crisis datasets for many crisis-prone countries • Philippines: local response groups set up • Missing Maps project for GIS data • What about the rest? • User datasets in existing tools • E.g. adding own gazetteers into Ushahidi.
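Slide 33 shows why joining datasets by country name breaks: one country appears under many spellings, and different standards use different names and codes. A minimal Python sketch of a crosswalk, assuming a hand-built lookup table; `CROSSWALK`, `normalise`, `build_index` and `to_iso3` are hypothetical names, not part of data.un.org or any library. It only normalises case and punctuation, then looks names up in an explicit table rather than fuzzy-matching, because "Congo" on its own is a different country (the Republic of the Congo, ISO 3166 code COG).

```python
from __future__ import annotations

import re

# Hypothetical crosswalk table: ISO 3166-1 alpha-3 code -> known name variants.
# The COD variants are the ones listed on slide 33; COG (Republic of the
# Congo) is included to show why "Congo" on its own must not match COD.
CROSSWALK: dict[str, list[str]] = {
    "COD": [
        "Congo, Democratic Republic of the",
        "Congo Democratic",
        "Democratic Republic of the Congo",
        "Congo (Democratic Republic of the)",
        "Congo, Dem. Rep.",
        "Congo Dem. Rep.",
        "Congo, Democratic Republic of",
        "Dem. Rep. of Congo",
        "Dem. Rep. of the Congo",
        "Congo, The Democratic Republic of the",
    ],
    "COG": ["Congo", "Republic of the Congo"],
}


def normalise(name: str) -> str:
    """Lower-case, turn punctuation into spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def build_index(crosswalk: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalised variant to its code, refusing ambiguous variants."""
    index: dict[str, str] = {}
    for code, variants in crosswalk.items():
        for variant in variants:
            key = normalise(variant)
            if index.get(key, code) != code:
                raise ValueError(f"{variant!r} maps to both {index[key]} and {code}")
            index[key] = code
    return index


INDEX = build_index(CROSSWALK)


def to_iso3(name: str) -> str | None:
    """Return the ISO alpha-3 code for a country name, or None if unknown."""
    return INDEX.get(normalise(name))


for raw in ["Congo, Dem. Rep.", "DEMOCRATIC REPUBLIC OF THE CONGO", "Congo", "Zaire"]:
    print(f"{raw!r} -> {to_iso3(raw)}")
```

Keeping the table explicit means an unseen variant (here "Zaire") comes back as `None` instead of being silently matched to the wrong country; those misses can be queued for a human to add, much as volunteers hand-coded the locations that machines could not geolocate.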

Editor's Notes

  1. I’m Sara Terp, Director of Data Projects at Ushahidi and a long-term crisis data nerd. I’ve been asked to talk about how this magic happened, and how we’re creeping up on crisis data science. Mappers each have our own stories: the story of an Information Management Officer inside UNOCHA is different to the story of an academic at Harvard, and both are different to mine. Five years ago, I saw a data system that was badly broken and decided to dedicate five years of my life to helping fix it. I’ve watched, and been part of, some of the evolution of humanitarian data management. I’ve been in one tornado, two hurricanes, snowstorms, floods, one nuclear alert, one conflict and one cold war, but I’ve never been an on-the-ground responder. This is what I’ve seen happening over the years, and where I think (or hope) we’re all going. Linked data is about the relationships between things, so let’s see where the relationships in crisis community data are.
  2. Let’s spoil the surprise a bit. This is what I’ve observed over years of handling crisis, development and community data.
  3. Let’s talk through some of what’s happened over the past few years, from the perspective of a volunteer data nerd. I won’t talk about every crisis and every deployment (most of the big ones are well covered already), but I will talk about some of the ways that we got to where we are today. This is development data science before I got involved. From 2004 to 2009, I was designing UAV and intelligence systems for the UK military and running an innovations group to combine new ideas, new technologies and business ideas. I saw tragedies like the Boxing Day Tsunami, but couldn’t see any way that I, as a technologist, could help. * Sahana is an information management product, designed to be used by disaster response groups * Ushahidi is a crowdsourcing tool, designed for easy reporting through SMS and emails, and easy summary through categorized datasets and maps.
  4. I was working on ways for humans and intelligent agents to work together on time-limited tasks. I could sum that up as: humans don’t have all the info, machines don’t have all the smarts, but sometimes you have to compromise to get stuff done in time.
  5. I also designed unmanned vehicle systems, where safety means you have to clearly define the control, responsibilities and interactions shared between a vehicle system and its human pilot. Variable autonomy is about sharing the load when needed: when should we use humans, and when should we use machines as “surge capacity”? How much can we trust the machine to do? I saw many of the overload problems in crisis data as places where we might be able to apply variable autonomy theory.
  6. I was also working on better ways to manage company knowledge and innovations, looking at things like DKCP, which codifies the interactions between concept space and knowledge space.
  7. And then the 2010 Haiti earthquake hit, and a call went out across London for anyone who’d recently run a Barcamp to organize one, not in the usual 3-4 months, but in 2-3 days. It all started here for many crisis data nerds. A bunch of us got together in London. My favorite quote was from one of the disaster relief people who worked with us: simply “I’ve always wanted to do this”. And that was really the point of the CrisisCamps: not just to produce data and technologies, but also, because we were outside the system, to do the things that NGO information people wanted to do but couldn’t get top-cover or resources for. Picture: CrisisCamp London during the Haiti response.
  8. And a bunch of people got together and dialed in from all over the world. The original VTCs were grassroots organizations - some still are. In Haiti and several crises that followed, local people and diaspora connected and were part of each response. And when you process crisis data, you have to remember that you only have the data that could be created or sent, and that local knowledge is often the thing that makes the difference. Picture: Haitian developers and data nerds, designing and building a data system for gender-based violence counsellors.
  9. This is the skills list we made at the first CrisisCamp London. We had some serious experts in the room, on specific subjects (e.g. UX), platforms (e.g. OpenStreetMap) and uses. CrisisCommons was talking to NGO information management officers (IMOs), governments, the military, NGO responders, community responders, digital responders and ronins. Ronins are important: they’re the people outside groups. They can bring new skills and knowledge if carefully connected, and chaos and divided attention if not.
  10. Or, if we think about that in terms of Drew Conway’s data science diagram, we had the hacking skills and substantive expertise, but we weren’t ready to use any of the stats knowledge in the room. Much of the story of development data science has been about pulling these skills together. Image: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  11. Or, if we look at it in terms of data processing, mappers were obtaining datasets, cleaning, interpreting and communicating them, but were missing the “explore” and “try models” skills. List: OSEMN: Obtain-Scrub-Explore-Model-Interpret model
  12. One of the changes in Web 2.0 was that people stopped being consumers of handed-out information and started producing information and having conversations about it. The Internet did what the Internet did; the large NGOs were still on the old broadcast model. This had the potential to either blow up or be a carefully targeted information source in Haiti.
  13. A team at Tufts and elsewhere set up an Ushahidi instance so that any SMS message sent to 4636 would end up on a categorized map. To get that map, volunteers around the world (e.g. the far table in the London photo) translated, categorized and geolocated those SMS messages. It also had to follow “First, do no harm” - we saw many personal details being sent into the map and assessing their potential risk was hard.
  14. Here’s the basic process. But you can’t geolocate without a map. This was a problem in Haiti, and it’s still a problem for every new crisis: we haven’t mapped every road, building, town or even region in the whole world, and most of those missing areas are in crisis zones. OpenStreetMap fixed that gap - volunteers traced aerial data and used paper maps and local memories to draw and tag Port-au-Prince in days. This triggered the formation of both Humanitarian OpenStreetMap and the Missing Maps project. When you design LOD for crises, you have to build in that some of your datasets will be created on-the-fly. * http://wiki.openstreetmap.org/wiki/Humanitarian_OSM_Team * http://wiki.openstreetmap.org/wiki/Missing_Maps_Project
  15. Mappers also generated data from images: here, they searched aerial imagery for the tarpaulins over an informal displaced persons’ camp, then marked it as a campsite on OpenStreetMap. Image: aerial image of Port-au-Prince with informal camp (tarpaulins).
  16. Mappers also had a lot of developer volunteers, and developers like to build stuff. November 2010 CrisisCommons projects list: http://wiki.crisiscommons.eu/wiki/Project_Statuses_November_2010. Mappers also worked on a lot of technologies with RHOK - and learnt that the successful projects were the ones with end-user buy-in. But more tech wasn’t the problem: the problem was getting better data and collaboration between groups. Other volunteer projects (NB lots of groups collaborated on projects) included * “We Have, We Need” Craigslist of self-identified needs and requests by non-profits assisting in Haiti relief operations. Built in days. Biggest moment: getting generator fuel to a hospital 20 minutes after they tweeted for help * Haiti Hospital Capacity Finder. Listed free beds in field hospitals
  17. The London camp missed the first few days of the response: by the time they pitched in, there were many development projects already happening. They looked at the projects list, and took a different approach: using their connections to help improve already-existing systems.
  18. Groups built interfaces from paper to GIS maps and back. They also built apps to move data from Google spreadsheets to APIs and back (it isn’t hard to do, just technical, and it needs a bit of thought: e.g. you can use the top left-hand cell of a spreadsheet as a flag for whether to update it or not). Image: walking papers map; these are used to convert OSM maps to paper and back (note the QR code in the corner of the map; this is used to position the uploaded paper map onto the OSM map).
  19. Public Laboratory (PLOTS) formed during the Gulf of Mexico oil spill, to create and use low-cost, community-based sensors and bring measurement science to new communities. They’ve been creating kite-based aerial data for years now, and have designed things like the CD-and-cardboard spectrometer.
  20. Mappers thought about what we meant by data needs: what was needed, and when. Pre-crisis resilience was important, and many efforts to produce pre-crisis datasets (including CrisisWiki) were tried, but they often failed through lack of enthusiasm between crises. 2010 CrisisCampLondon slide
  21. Public Lab started designing and using low-cost sensors in the BP oil spill. This was yet another potential data source.
  22. And the types of design question we needed to ask of the data.
  23. And the analysis needed to produce that. Mappers framed these needs in their working environments: * Physical environment - phone signal, internet * Political environment - e.g. data as power * Technical environment
  24. This was the first deployment that worried me. Despite all our good intentions, our ability to track social media and reach out to people, we didn’t have a good way to help thousands of people in trouble. I started wondering how we might start handling data points that moved in time and space. Eventually those thoughts made it into the W3C GIS standards.
  25. This was our vision of an effective crisis information ecosystem: * Established data gathering technologies (mostly) * Cooperation as standard * Open data systems = crisis data systems * We’ve thought about (almost) everything * People know where to get information * People know where to help out
  26. And this is what we lived by. In many ways, Haiti was the start of something very beautiful; in other ways, it was a hot mess of crowds of people all trying to help the same responders.
  27. At the same time, Project EPIC was quietly creating human-tagged Twitter feeds.
  28. SBTF launched in late 2010, with a different focus to the earlier groups; it had specialist teams, and would only activate if asked to by a responding partner (NGO, news agency, local group etc). Here’s an example of the type of workflow the volunteers followed; this workflow is from a later deployment (the 2012 Kenyan election violence monitoring), but pretty much covers the teams: SMS team handling SMS messages sent to a Ushahidi platform; media monitors searching online for related information; translation team converting to English; geolocation team finding lat/long from addresses; report team adding categories; and verification team making sense of groups of reports rather than single reports. Mappers also tested new technologies and listed found datasources.
  29. Here are some of the tasks that human crowdsourcers did. Some of these can be automated; some can be partially automated; others we might need to keep as human activities. How we do this depends partly on the point of the dataset: in 2010, the emphasis on reporting needs shifted to an emphasis on “tell us what you see”, which changed the pressure to catch every relevant message (e.g. cries for help) into a pressure to produce a timely summary of the situation.
  30. We worked on a lot of disasters that year, and in the years to follow. 2010 was special because existing crisis camp leads encouraged camps to form around the world, with local leaders working in local languages. The Chile community managed their own earthquake data with help from neighboring Spanish-speaking countries; the Thailand camp formed during the Pakistan floods then went on to handle floods in Thailand too (which was great because there are limits to Google Translate). This only stopped when the tensions between a distributed barcamp-style federation model and a centralized hierarchical control model became too strong.
  31. I joined the UN’s big data team (UN Global Pulse), in the hope it would fill in the missing pieces of that Drew Conway diagram. Mostly we concentrated on ways to reduce the time between a development crisis starting and data becoming available on it. At the time there were very few developers in the UN. I discovered that the UN is full of people who succeed despite its politics, and had many opportunities to meet with them and talk about humanitarian GIS and data science.
  32. Here, we started thinking about crises that weren’t going to be over in hours, days or (at a stretch) weeks.
  33. And then all heck broke loose. Two major crises happened at the same time, and mappers found themselves dealing with Fukushima’s radiation data (crowdsourced radiation monitoring) and trying to “do no harm” during a conflict (Libya crisis map).
  34. Here’s a snapshot of the Libya Crisis Map report form. Every report in a Ushahidi platform must have a category (unless we’ve disabled some code). Mappers started collecting and listing these categories, and thinking hard about what a standard set would be, before we started having problems comparing datasets. Internews wrote a lovely report on these category lists too: https://innovation.internews.org/sites/default/files/research/InternewsWPCrowdGlobe_Web.pdf
  35. This was the first cross-check of data generation by both machines and humans. Both were asked to tag buildings (shacks, houses and large buildings) as a proxy for population densities in the Afgooye region of Somalia. The machine (EU) and human results were comparable. This took over my Christmas, and the Christmas of many other volunteers. Tomnod has gone on to run many other satellite-tagging deployments, including tagging the seats of wildfires in Australia.
  36. At some point that year, I got annoyed that data.un.org didn’t have an API, so I created my own and started looking at the crosswalks between the datasets in it. I munched lists of CSV headers, and started looking at variations between the datasets under those headers. It didn’t take long to start running into problems combining datasets. The one I chose as an example was country names: this still causes problems, and will be an issue for anyone linking data today. The CrisisNet team has started working on auto-detecting data column types.
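The country-name problem is easy to show in a few lines. This is a minimal sketch, not the code I used: it assumes a hand-built alias table (the `COUNTRY_ALIASES` entries are illustrative, nowhere near complete) that maps the spellings found in different datasets onto ISO 3166-1 alpha-3 codes, after stripping accents, case and extra whitespace.

```python
import unicodedata
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical alias table: spellings seen in different datasets mapped to
# ISO 3166-1 alpha-3 codes. A real crosswalk would be far longer.
COUNTRY_ALIASES: Dict[str, str] = {
    "cote d'ivoire": "CIV",
    "ivory coast": "CIV",
    "democratic republic of the congo": "COD",
    "congo, dem. rep.": "COD",
    "drc": "COD",
    "libya": "LBY",
    "libyan arab jamahiriya": "LBY",
    "viet nam": "VNM",
    "vietnam": "VNM",
}


def normalise(name: str) -> str:
    """Lower-case, strip accents and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def to_iso3(name: str) -> Optional[str]:
    """Return the ISO3 code for a country name, or None if it is unknown."""
    return COUNTRY_ALIASES.get(normalise(name))


def crosswalk(names: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split names into matched (name -> code) and unmatched lists."""
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for name in names:
        code = to_iso3(name)
        if code is None:
            unmatched.append(name)
        else:
            matched[name] = code
    return matched, unmatched


matched, unmatched = crosswalk(["Côte d'Ivoire", "Congo, Dem. Rep.", "Viet Nam", "Burma"])
print(unmatched)  # ['Burma']
```

Anything that falls through to the unmatched list goes to a human, who adds a new alias; the table grows with every dataset you link.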
  37. By 2012, data science was well underway and the open data crowd had lots of experience using algorithms to clean datasets. The USAID dataset was pre-cleaned (automated geolocation) before volunteers coded the locations that couldn’t be found by machines. This wasn’t a sudden-onset deployment, but a useful test of people and machines sharing tasks.
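The machine-first, human-second split can be sketched as below. This is an illustration under stated assumptions, not the USAID pipeline: the `GAZETTEER` lookup table, its approximate coordinates and the `location` field name are all hypothetical, and a real deployment would query a proper gazetteer or geocoding service.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical gazetteer of place name -> (lat, lon); coordinates are
# approximate. A real pipeline would use a national gazetteer or geocoder.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "nairobi": (-1.286, 36.817),
    "mombasa": (-4.043, 39.668),
}


def geolocate(place: str) -> Optional[Tuple[float, float]]:
    """Exact-match lookup after trimming and lower-casing the place name."""
    return GAZETTEER.get(place.strip().lower())


def split_for_volunteers(records: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Machine-geocode what we can; queue everything else for humans."""
    machine_coded: List[dict] = []
    human_queue: List[dict] = []
    for rec in records:
        coords = geolocate(rec["location"])
        if coords is None:
            human_queue.append(rec)  # needs lateral thinking, not a lookup
        else:
            lat, lon = coords
            machine_coded.append({**rec, "lat": lat, "lon": lon})
    return machine_coded, human_queue
```

The point of the design is that volunteers only ever see the records the machine couldn’t place, which is where their time is best spent.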
  38. This was the ACAPS DNA deployment: a test to see if volunteers could help gather the standard data for their DNA product, without losing accuracy. Mappers built scrapers and designed an automated crisis data collection system, but we still needed humans to search obscure corners of the web for relevant information. The country name crosswalks were very useful in this.
  39. We still built maps. Lots of maps.
  40. Mappers still spent a lot of time searching the internet and archives for small pieces of data; in this case, a list of operational health facilities in Libya after the crisis there. Much of this data search is an exercise in lateral thinking and connecting to other groups (like the Libyan doctors’ Facebook group), and in the frustrations of geolocating buildings that had no street addresses (but were described by the journey to them, e.g. “turn left at this mosque”).
  41. We needed useful, actionable data.
  42. And here are some of the issues we saw. Note that almost nobody wanted to work on crisis data between crises: it’s not as sexy as “saving lives with data”.
  43. But the whole system (communities, processes, tech and innovation) still needed work. (Slide from a 2012 talk.)
  44. Mappers used some of the wells data from Sudan at the Guardian’s 2012 Development Data Hackathon (http://www.eventbrite.co.uk/e/development-data-challenge-london-tickets-3990385350). Mappers also had long conversations about linking funding and project data, given the variable level of representation in the IATI standard.
  45. Image: Micromappers using the PyBossa platform (a Python version of Berkeley’s Bossa crowdsourcing platform) for the Typhoon Yolanda deployment. One of the UN volunteers (Simon?) produced a beautiful choropleth mashup of poverty data and damage estimates; this was the first linked data visualization that I’d seen in this space.
  46. What Micromappers brought was control over the system inputs (e.g. code from people like Hemant could filter messages before passing them to volunteers) and cleaner workflows: 2 clicks to tag a message, instead of the 6-8 clicks in standard Ushahidi (Ushahidi had workflow code, but it wasn’t widely used).
  47. This still led to maps.
  48. We saw overload in a lot of crises, including the Boston bombing.
  49. This is a t-shirt printed after the 2010 Chile earthquake. The message on it reads “plz send help to 1712 estacion central, santiago chile. im stuck under a building with my child. #hitsunami #chile we have no supplies”. iHub data wrote a research report on the other 3Vs: viability, verification, validity: http://community.ihub.co.ke/blogs/15644/3vs-crowdsourcing-framework-for-elections-launched http://www.ihub.co.ke/ihubresearch/jb_VsReportpdf2013-8-29-07-38-56.pdf
  50. But… it’s hard to take people from 0 to data scientist. It’s easier to build tools that are easy to install and use.
  51. The OpenCrisis team tried building a humanitarian data and data links store (the Humanitarian Data Project); somewhere to collect all the datasets we’d found over the past few years, and make sure other people could find things like Karen Payne’s wonderful spreadsheet of crisis data links. Mappers searched old deployment folders for data source lists and wrote code to import data from sources including Google spreadsheets into CKAN instances, but it was painful, beyond painful, setting up and managing a CKAN instance for this, and difficult to sustain as a volunteer group without sysadmins.
  52. We were greatly relieved when UNOCHA’s Humanitarian Data Exchange appeared later that year doing the same thing, and pulled the plug on HDP.
  53. We build tools for democratizing information, increasing transparency and lowering the barriers for individuals to share their stories. We’re willing to take risks in the pursuit of changing the traditional way that information flows in the world.
  54. 40,000 Deployments; 49,000 Mobile Downloads.
  55. I joined Ushahidi to improve data literacy in one of the CrisisMappers’ most-used tools. I still don’t work on crisis data, but I have had long conversations about what a data scientist would need in the main platform. Ushahidi Platform is built on a database, so it has datasets embedded in it. In Platform V2 you can access the reports list (the dots on the map) through the API and CSV download, but there’s much, much more sitting in each platform, waiting to be claimed.
  56. Or alternatively: community reports in, stories out. And we can already do some basic reasoning with this: for instance, using point-in-polygon methods to check GIS labels like region and country against lat/longs.
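The lat/long check can be sketched with the standard ray-casting point-in-polygon test. This is a minimal, dependency-free sketch, assuming each region boundary is a single simple polygon of (longitude, latitude) pairs and that reports carry hypothetical `region`, `lat` and `lon` fields; in practice you would more likely use a GIS library with real boundary files.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: cast a ray from pt towards +x and count edge crossings;
    an odd number of crossings means the point is inside."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def check_region_label(report: dict, regions: Dict[str, List[Point]]) -> bool:
    """True if the report's lat/long falls inside the region it is labelled with."""
    polygon = regions.get(report["region"])
    if polygon is None:
        return False  # unknown region label: can't confirm, so flag it
    return point_in_polygon((report["lon"], report["lat"]), polygon)
```

This treats coordinates as flat, which is fine for region-sized polygons, and points lying exactly on a boundary may go either way; reports that fail the check are candidates for human review rather than automatic correction.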
  57. Ushahidi platform V2 has admin-defined forms. That means that users can create a set of report forms with different fields in them; that in turn makes for a more powerful set of tables. Ushahidi tried using forms to represent sensor data, but in practice it’s easier to add sensor data as attachments to existing reports. We went a little further with this, creating an Ushahidi plugin that connects Ushahidi instances together, to share data in common data fields and categories. This gives individual groups control over their own category lists, without disrupting the central view of all instances.
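A minimal sketch of the sharing idea, not the plugin’s actual code: each instance keeps its own category list, and a hypothetical per-instance mapping (`CATEGORY_MAPS`) translates local categories into a shared list before reports go to the central view. The set of common fields in `SHARED_FIELDS` is likewise an assumption.

```python
from typing import Dict, Optional

# Hypothetical per-instance maps from local category names to a shared list.
# None means "keep this category local; don't share it".
CATEGORY_MAPS: Dict[str, Dict[str, Optional[str]]] = {
    "instance_a": {"Water shortage": "Water", "Medical help": "Health"},
    "instance_b": {"Agua": "Water", "Salud": "Health", "Rumours": None},
}

# Assumed set of fields that all instances agree to share.
SHARED_FIELDS = ("title", "description", "lat", "lon")


def to_shared(instance: str, report: dict) -> dict:
    """Project a local report onto the shared fields and categories."""
    mapping = CATEGORY_MAPS[instance]
    shared = {f: report[f] for f in SHARED_FIELDS if f in report}
    categories = {mapping.get(c) for c in report.get("categories", [])}
    categories.discard(None)  # drop unmapped or deliberately local categories
    shared["categories"] = sorted(categories)
    shared["source"] = instance  # keep provenance for the central view
    return shared
```

Because each group owns its own mapping, it can rename or add local categories without touching the shared list, and any field not in the agreed set stays on the local instance.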
  58. But we still have the issue of a “dumb” input feed. This is where we need some filtering and intelligence.
  59. But we still have the issue of a “dumb” input feed. This is where we need some filtering and intelligence, before data gets to human processors and the map.
  60. We’ve also put in a lot of D3 code, to give non-specialist users access to visualisations of their datasets. And we’ve thought about other automations: named entity recognition using the Umati plugin, and geolocation from those named entities; autotranslation through the Google Translate API; auto-categorisation from text; auto-tagging and retweet removal using external programs; and, please please, spam removal.
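Retweet removal is one of the simpler automations on that list. This is a minimal sketch, not the external programs mentioned above: it assumes messages arrive as plain text and that a retweet either carries the classic “RT @user:” prefix or repeats an earlier message with a different shortened link.

```python
import re
from typing import Iterable, List

# Classic manual-retweet prefix, e.g. "RT @someone: ..."
RT_PREFIX = re.compile(r"^RT @\w+:\s*", re.IGNORECASE)
URL = re.compile(r"https?://\S+")


def canonical(text: str) -> str:
    """Reduce a message to a comparison key: no RT prefix, no links,
    lower-case, single spaces."""
    text = RT_PREFIX.sub("", text.strip())
    text = URL.sub("", text)
    return " ".join(text.lower().split())


def remove_retweets(messages: Iterable[str]) -> List[str]:
    """Keep the first message for each canonical key, in arrival order."""
    seen = set()
    kept: List[str] = []
    for msg in messages:
        key = canonical(msg)
        if key and key not in seen:  # empty keys (link-only messages) are dropped
            seen.add(key)
            kept.append(msg)
    return kept
```

Exact-key matching only catches verbatim repeats; edited retweets (“RT @x: so scary! Flooding on Main St”) would need fuzzy matching on top of this.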
  61. I’ve been working on the Pheme project, with a consortium working on veracity checking on social media, including detecting contradiction/controversy and tagging rumors as non-verified potential facts, misinformation and disinformation, in multiple languages, going from “about the same thing” to “confirms/contradicts”. Part of this is starting to look at Ushahidi platform data on the word level. Another part is treating Ushahidi reports as a dataset that can be tagged from outside (eg. using categories and the API) to highlight message components and influences.
  62. Typhoon Ruby, like Yolanda, started small - a footnote in an article about a previous supertyphoon. Text: from http://philnews.ph/2014/12/01/pagasa-forecasts-an-lpa-to-enter-philippines-on-friday-bagyong-ruby/
  63. Even once the storm had settled on a country, it was veering around so much that it was hard to tell where it would make landfall. This is important because none of the Filipino coasts are completely mapped, and anywhere the storm hit would need maps very quickly.
  64. In crises, government dataset pages fail (e.g. in Sandy and Ruby). They’re usually back up during the response, but it helps to be prepared for this. There are also still PDF datasets out there - Ruby produced a dataset that was a PDF image of a slightly-tilted spreadsheet that none of the tools I had (including OCR) could scrape. Often these things are better solved politically than technically.
  65. And this itself created an issue: the government groups were reluctant to share with the NGOs operating on their turf. Image: http://reliefweb.int/sites/reliefweb.int/files/resources/TC_Hagupit_Impacts_05DEC14_1500UTC..pdf
  66. http://www.gov.ph/crisis-response/typhoon-ruby/#section-1. Micromappers happened, classifying images just after a similar exercise classifying tree damage from Typhoon Yolanda. I started working out whether I could get away with filtering out “Pray” and “God” from the tweet dataset.
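The “Pray” and “God” filter is a simple keyword split. This is a minimal sketch; the stop-list pattern is an assumption built from just those two words (plus variants like “praying”). It keeps the dropped tweets rather than deleting them, because the real question is whether you can “get away with” it: a tweet like “pray for us, we’re trapped at …” is both prayer and a request for help.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical stop-list: "pray", "prays", "praying", "prayer(s)", "god",
# as whole words, any case.
PRAYER_TERMS = re.compile(r"\b(pray\w*|god)\b", re.IGNORECASE)


def split_prayer_tweets(tweets: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (kept, dropped); dropped tweets are kept aside for spot checks."""
    kept: List[str] = []
    dropped: List[str] = []
    for t in tweets:
        (dropped if PRAYER_TERMS.search(t) else kept).append(t)
    return kept, dropped
```

Spot-checking a sample of the dropped list is a cheap way to see whether the filter is throwing away anything actionable before trusting it on the full stream.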
  67. HDX started to be used.
  68. OSM map changes during Typhoon Ruby. This is important. Most people in the Philippines speak English, but Filipino (based on Tagalog) is the national language, and more than 100 regional languages are spoken (http://en.wikipedia.org/wiki/Languages_of_the_Philippines)
  69. Mappers already saw data science in 2013 with the choropleth mashups of poverty against typhoon track, but in 2015, data science and development data are really starting to converge, spurred by a combination of available tools, easily available data science training and curiosity. At the same time as crisis mapping took off, so did data science. It had enthusiastic commercial sponsors (e.g. O’Reilly Media), conferences (Strata), knowledge aggregators (Data Science Central), MOOCs, meetups and DataKind. Humanitarian data science has been a little slower to take off. Mappers have datastores, mashups and map projects (like Missing Maps). What are mappers missing?
  70. We still have Ushahidi maps, now acting as data sources for other applications. The Ebola crisis has been a long one, with many groups involved; that length has given them time to clean datasets and try new technologies.
  71. HDX page from the Ebola response. Simon Johnson of the Red Cross has been producing many Tableau visualizations and dashboards from these datasets. An Ebola GeoNode is also available, and the HDX standards group is slowly making headway. Humanitarians have been working with new technologies for a while now (e.g. UNOOSA’s work on UAV systems), including unmanned vehicles, low-cost sensors, wearable technology, data science and AI, but it’s also important to build for what people have available to them: mobile phones, USB sticks, Excel and Google Docs.
  72. But the most important thing that’s happening is that we’ve shifted back to improving crisis resilience (not the same thing as preparedness: resilience should be built into the normal operations of a community, not just remembered in a crisis) at the community level. And that’s where linked data can really help. Image: Rockefeller Resilience Initiative front page, http://www.100resilientcities.org/#/-_/
  73. Image: Taarifa map of water points in Tanzania. This helps deal with the issues that will always be there in a crisis.
  74. We have things like Karen Payne’s list of data sources, and ontologies (e.g. WWHGD) of basic data needs - why aren’t we automating putting these together?
  75. This lot have structure