SlideShare une entreprise Scribd logo
1  sur  39
Big Data and Me: experiences
from the front line
Sara-Jayne Farmer
Change Assembly
April 22nd 2013
ME
Me
• Data Scientist
• Using data to:
– connect communities
– improve access to information
– so people can make better decisions
– on both small and large scales
• It’s all about people:
– Local people: know their needs; need more information
– Local technologists: have skills; need connections
– Large organisations: have resources; need guidance
Some of those People
(smart, talented, dedicated hackers in Haiti, January 2013)
My Personal Three Vs
• Variety
– Data all over the place
– Csv, json, xml, excel, pdf, text, webpages, rss, scanned
pages, images, videos, audiofiles, maps, proprietary. Etc.
• Velocity
– Streams updating too fast for a mapping team (100-200 people)
to handle
– Pages updating too frequently to check by hand
• Volume
– Can’t open the data in a spreadsheet
– Can’t fit the data on my laptop
– Maxes out my credit card (thank you Amazon!)
VARIETY
“more people have mobile phones than toilets”
– UN, March 2013
But… but… there are always data issues…
• Datasets were difficult to find
• No data available after 2010
• Hard to track provenance – e.g. what decisions did
the people creating these datasets make? What
assumptions?
• Data was rounded up
• Countrynames didn’t match between sets
• Multiple charactersets (e.g. Å, A, Ԇ)
• Messy formatting (merges, ‘explanations’ etc)
e.g. Country Names
DR Congo in Data.UN.Org:
• “Congo, Democratic Republic of the”, “Congo
Democratic”, “Democratic Republic of the Congo”, “Congo
(Democratic Republic of the)”, “Congo, Dem. Rep.”, “Congo
Dem. Rep.”, “Congo, Democratic Republic of”, “Dem. Rep.
of Congo”, “Dem. Rep. of the Congo”
DR Congo in common standards:
• “Democratic Republic of the Congo” (UN
Stats), “Congo, The Democratic Republic of the”
(ISO3166), “Congo, Democratic Republic of the”
(FIPS10, Stanag), “180” (UN Stats), “COD”
(ISO3166, Stanag), “CG” (FIPS10)
And coding
And interpretation
• Hang on… don’t some people have more than one
phone?
• And how do you count the people without toilets?
• What if the cities have lots of phones and toilets, and
the rural areas don’t?
• Where does my composting toilet fit in this?
• How big were these surveys?
• What do we do with the zeros?
• Etc…
And purpose
And Communication
And Alternative Data Sources
And alternative alternatives…
• Social media proxies
• Grassroots maps
• Etc.
VELOCITY AND VOLUME
2013 Boston bombings
The Humans+Tools Solution: Crisismapping
Find…
Listen…
Estimate…
Geolocate…
Create maps…
Analyse
Explain
Use
BUT WE NEED MORE DATA
SCIENTISTS…
Build and Connect Communities
Train Non-Techies
Create Higher-level Tools
Big Data and Me: experiences
from the front line
Sara-Jayne Farmer
http://www.changeassembly.com/
@bodaceacat
MORE REFERENCES
strataconf.com
datasciencecentral.com
analytictalent.com
Tools
Formal (Free) Training
NYC Meetups (see meetup.com)
Volunteering: datakind.org

Contenu connexe

Tendances

Libraries in the Gigabit World
Libraries in the Gigabit WorldLibraries in the Gigabit World
Libraries in the Gigabit WorldNate Hill
 
Hyperlocal data journalism - Andy Dickinson
Hyperlocal data journalism - Andy DickinsonHyperlocal data journalism - Andy Dickinson
Hyperlocal data journalism - Andy DickinsonDataJournalismUK
 
Building Data-centric Media Organizations
Building Data-centric Media OrganizationsBuilding Data-centric Media Organizations
Building Data-centric Media OrganizationsJ T "Tom" Johnson
 
Towards a critical data journalism practice
Towards a critical data journalism practiceTowards a critical data journalism practice
Towards a critical data journalism practiceLiliana Bounegru
 
Mapping the Australian Twittersphere
Mapping the Australian TwittersphereMapping the Australian Twittersphere
Mapping the Australian TwittersphereAxel Bruns
 
Data! Action! Data journalism issues to watch in the next 10 years
Data! Action! Data journalism issues to watch in the next 10 yearsData! Action! Data journalism issues to watch in the next 10 years
Data! Action! Data journalism issues to watch in the next 10 yearsPaul Bradshaw
 

Tendances (7)

Libraries in the Gigabit World
Libraries in the Gigabit WorldLibraries in the Gigabit World
Libraries in the Gigabit World
 
Hyperlocal data journalism - Andy Dickinson
Hyperlocal data journalism - Andy DickinsonHyperlocal data journalism - Andy Dickinson
Hyperlocal data journalism - Andy Dickinson
 
Building Data-centric Media Organizations
Building Data-centric Media OrganizationsBuilding Data-centric Media Organizations
Building Data-centric Media Organizations
 
Open Data Journalism
Open Data JournalismOpen Data Journalism
Open Data Journalism
 
Towards a critical data journalism practice
Towards a critical data journalism practiceTowards a critical data journalism practice
Towards a critical data journalism practice
 
Mapping the Australian Twittersphere
Mapping the Australian TwittersphereMapping the Australian Twittersphere
Mapping the Australian Twittersphere
 
Data! Action! Data journalism issues to watch in the next 10 years
Data! Action! Data journalism issues to watch in the next 10 yearsData! Action! Data journalism issues to watch in the next 10 years
Data! Action! Data journalism issues to watch in the next 10 years
 

Similaire à Big Data and Me

Evolution of the Humanitarian Data Ecosystem
Evolution of the Humanitarian Data EcosystemEvolution of the Humanitarian Data Ecosystem
Evolution of the Humanitarian Data EcosystemSara-Jayne Terp
 
Data visualization for development
Data visualization for developmentData visualization for development
Data visualization for developmentSara-Jayne Terp
 
Open Data Islands and Communities
Open Data Islands and CommunitiesOpen Data Islands and Communities
Open Data Islands and CommunitiesAlan Dix
 
2013 10-22 humanitarian data talk to data kind
2013 10-22 humanitarian data talk to data kind2013 10-22 humanitarian data talk to data kind
2013 10-22 humanitarian data talk to data kindSara-Jayne Terp
 
Digital divide and computer assisted reporting
Digital divide and computer assisted reportingDigital divide and computer assisted reporting
Digital divide and computer assisted reportingAnna Polud
 
2013 01-21 open itp crisis and development data
2013 01-21 open itp crisis and development data2013 01-21 open itp crisis and development data
2013 01-21 open itp crisis and development dataSara-Jayne Terp
 
Icc2013 country names
Icc2013 country namesIcc2013 country names
Icc2013 country namessirf13
 
Getting comfortable with Data
Getting comfortable with DataGetting comfortable with Data
Getting comfortable with DataRitvvij Parrikh
 
open-data-presentation.pptx
open-data-presentation.pptxopen-data-presentation.pptx
open-data-presentation.pptxDennicaRivera
 
Getting started in Data Science (April 2017, Los Angeles)
Getting started in Data Science (April 2017, Los Angeles)Getting started in Data Science (April 2017, Los Angeles)
Getting started in Data Science (April 2017, Los Angeles)Thinkful
 
Big data and development
Big data and developmentBig data and development
Big data and developmentSimone Sala
 
Big Data Analytics and Open Data
Big Data Analytics and Open Data Big Data Analytics and Open Data
Big Data Analytics and Open Data Sharjeel Imtiaz
 
Big Data, Open Data, Big Costs - tim willoughby
Big Data, Open Data, Big Costs  - tim willoughbyBig Data, Open Data, Big Costs  - tim willoughby
Big Data, Open Data, Big Costs - tim willoughbyTim Willoughby
 
Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014
Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014
Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014StampedeCon
 
The big story of small data.
The big story of small data. The big story of small data.
The big story of small data. Alan Dix
 
Designing for the Prime Interface
Designing for the Prime InterfaceDesigning for the Prime Interface
Designing for the Prime InterfaceBen Taylor
 

Similaire à Big Data and Me (20)

Evolution of the Humanitarian Data Ecosystem
Evolution of the Humanitarian Data EcosystemEvolution of the Humanitarian Data Ecosystem
Evolution of the Humanitarian Data Ecosystem
 
Data visualization for development
Data visualization for developmentData visualization for development
Data visualization for development
 
Open Data Islands and Communities
Open Data Islands and CommunitiesOpen Data Islands and Communities
Open Data Islands and Communities
 
2013 10-22 humanitarian data talk to data kind
2013 10-22 humanitarian data talk to data kind2013 10-22 humanitarian data talk to data kind
2013 10-22 humanitarian data talk to data kind
 
Digital divide and computer assisted reporting
Digital divide and computer assisted reportingDigital divide and computer assisted reporting
Digital divide and computer assisted reporting
 
2013 01-21 open itp crisis and development data
2013 01-21 open itp crisis and development data2013 01-21 open itp crisis and development data
2013 01-21 open itp crisis and development data
 
Icc2013 country names
Icc2013 country namesIcc2013 country names
Icc2013 country names
 
Getting comfortable with Data
Getting comfortable with DataGetting comfortable with Data
Getting comfortable with Data
 
open-data-presentation.pptx
open-data-presentation.pptxopen-data-presentation.pptx
open-data-presentation.pptx
 
Getting started in Data Science (April 2017, Los Angeles)
Getting started in Data Science (April 2017, Los Angeles)Getting started in Data Science (April 2017, Los Angeles)
Getting started in Data Science (April 2017, Los Angeles)
 
Big Data World
Big Data WorldBig Data World
Big Data World
 
Gettind data used
Gettind data usedGettind data used
Gettind data used
 
Making data more human
Making data more humanMaking data more human
Making data more human
 
Big data and development
Big data and developmentBig data and development
Big data and development
 
Big Data Analytics and Open Data
Big Data Analytics and Open Data Big Data Analytics and Open Data
Big Data Analytics and Open Data
 
Big Data, Open Data, Big Costs - tim willoughby
Big Data, Open Data, Big Costs  - tim willoughbyBig Data, Open Data, Big Costs  - tim willoughby
Big Data, Open Data, Big Costs - tim willoughby
 
Open Data Journalism
Open Data JournalismOpen Data Journalism
Open Data Journalism
 
Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014
Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014
Big Data Past, Present and Future – Where are we Headed? - StampedeCon 2014
 
The big story of small data.
The big story of small data. The big story of small data.
The big story of small data.
 
Designing for the Prime Interface
Designing for the Prime InterfaceDesigning for the Prime Interface
Designing for the Prime Interface
 

Plus de Sara-Jayne Terp

Distributed defense against disinformation: disinformation risk management an...
Distributed defense against disinformation: disinformation risk management an...Distributed defense against disinformation: disinformation risk management an...
Distributed defense against disinformation: disinformation risk management an...Sara-Jayne Terp
 
Risk, SOCs, and mitigations: cognitive security is coming of age
Risk, SOCs, and mitigations: cognitive security is coming of ageRisk, SOCs, and mitigations: cognitive security is coming of age
Risk, SOCs, and mitigations: cognitive security is coming of ageSara-Jayne Terp
 
disinformation risk management: leveraging cyber security best practices to s...
disinformation risk management: leveraging cyber security best practices to s...disinformation risk management: leveraging cyber security best practices to s...
disinformation risk management: leveraging cyber security best practices to s...Sara-Jayne Terp
 
Cognitive security: all the other things
Cognitive security: all the other thingsCognitive security: all the other things
Cognitive security: all the other thingsSara-Jayne Terp
 
The Business(es) of Disinformation
The Business(es) of DisinformationThe Business(es) of Disinformation
The Business(es) of DisinformationSara-Jayne Terp
 
2021-05-SJTerp-AMITT_disinfoSoc-umaryland
2021-05-SJTerp-AMITT_disinfoSoc-umaryland2021-05-SJTerp-AMITT_disinfoSoc-umaryland
2021-05-SJTerp-AMITT_disinfoSoc-umarylandSara-Jayne Terp
 
2021 IWC presentation: Risk, SOCs and Mitigations: Cognitive Security is Comi...
2021 IWC presentation: Risk, SOCs and Mitigations: Cognitive Security is Comi...2021 IWC presentation: Risk, SOCs and Mitigations: Cognitive Security is Comi...
2021 IWC presentation: Risk, SOCs and Mitigations: Cognitive Security is Comi...Sara-Jayne Terp
 
2021-02-10_CogSecCollab_UBerkeley
2021-02-10_CogSecCollab_UBerkeley2021-02-10_CogSecCollab_UBerkeley
2021-02-10_CogSecCollab_UBerkeleySara-Jayne Terp
 
Using AMITT and ATT&CK frameworks
Using AMITT and ATT&CK frameworksUsing AMITT and ATT&CK frameworks
Using AMITT and ATT&CK frameworksSara-Jayne Terp
 
2020 12 nyu-workshop_cog_sec
2020 12 nyu-workshop_cog_sec2020 12 nyu-workshop_cog_sec
2020 12 nyu-workshop_cog_secSara-Jayne Terp
 
2019 11 terp_mansonbulletproof_master copy
2019 11 terp_mansonbulletproof_master copy2019 11 terp_mansonbulletproof_master copy
2019 11 terp_mansonbulletproof_master copySara-Jayne Terp
 
BSidesLV 2018 talk: social engineering at scale, a community guide
BSidesLV 2018 talk: social engineering at scale, a community guideBSidesLV 2018 talk: social engineering at scale, a community guide
BSidesLV 2018 talk: social engineering at scale, a community guideSara-Jayne Terp
 
Social engineering at scale
Social engineering at scaleSocial engineering at scale
Social engineering at scaleSara-Jayne Terp
 
engineering misinformation
engineering misinformationengineering misinformation
engineering misinformationSara-Jayne Terp
 
Online misinformation: they're coming for our brainz now
Online misinformation: they're coming for our brainz nowOnline misinformation: they're coming for our brainz now
Online misinformation: they're coming for our brainz nowSara-Jayne Terp
 
Sj terp ciwg_nyc2017_credibility_belief
Sj terp ciwg_nyc2017_credibility_beliefSj terp ciwg_nyc2017_credibility_belief
Sj terp ciwg_nyc2017_credibility_beliefSara-Jayne Terp
 
Belief: learning about new problems from old things
Belief: learning about new problems from old thingsBelief: learning about new problems from old things
Belief: learning about new problems from old thingsSara-Jayne Terp
 
Session 10 handling bigger data
Session 10 handling bigger dataSession 10 handling bigger data
Session 10 handling bigger dataSara-Jayne Terp
 
Session 09 learning relationships.pptx
Session 09 learning relationships.pptxSession 09 learning relationships.pptx
Session 09 learning relationships.pptxSara-Jayne Terp
 

Plus de Sara-Jayne Terp (20)

Distributed defense against disinformation: disinformation risk management an...
Distributed defense against disinformation: disinformation risk management an...Distributed defense against disinformation: disinformation risk management an...
Distributed defense against disinformation: disinformation risk management an...
 
Risk, SOCs, and mitigations: cognitive security is coming of age
Risk, SOCs, and mitigations: cognitive security is coming of ageRisk, SOCs, and mitigations: cognitive security is coming of age
Risk, SOCs, and mitigations: cognitive security is coming of age
 
disinformation risk management: leveraging cyber security best practices to s...
disinformation risk management: leveraging cyber security best practices to s...disinformation risk management: leveraging cyber security best practices to s...
disinformation risk management: leveraging cyber security best practices to s...
 
Cognitive security: all the other things
Cognitive security: all the other thingsCognitive security: all the other things
Cognitive security: all the other things
 
The Business(es) of Disinformation
The Business(es) of DisinformationThe Business(es) of Disinformation
The Business(es) of Disinformation
 
2021-05-SJTerp-AMITT_disinfoSoc-umaryland
2021-05-SJTerp-AMITT_disinfoSoc-umaryland2021-05-SJTerp-AMITT_disinfoSoc-umaryland
2021-05-SJTerp-AMITT_disinfoSoc-umaryland
 
2021 IWC presentation: Risk, SOCs and Mitigations: Cognitive Security is Comi...
2021 IWC presentation: Risk, SOCs and Mitigations: Cognitive Security is Comi...2021 IWC presentation: Risk, SOCs and Mitigations: Cognitive Security is Comi...
2021 IWC presentation: Risk, SOCs and Mitigations: Cognitive Security is Comi...
 
2021-02-10_CogSecCollab_UBerkeley
2021-02-10_CogSecCollab_UBerkeley2021-02-10_CogSecCollab_UBerkeley
2021-02-10_CogSecCollab_UBerkeley
 
Using AMITT and ATT&CK frameworks
Using AMITT and ATT&CK frameworksUsing AMITT and ATT&CK frameworks
Using AMITT and ATT&CK frameworks
 
2020 12 nyu-workshop_cog_sec
2020 12 nyu-workshop_cog_sec2020 12 nyu-workshop_cog_sec
2020 12 nyu-workshop_cog_sec
 
2020 09-01 disclosure
2020 09-01 disclosure2020 09-01 disclosure
2020 09-01 disclosure
 
2019 11 terp_mansonbulletproof_master copy
2019 11 terp_mansonbulletproof_master copy2019 11 terp_mansonbulletproof_master copy
2019 11 terp_mansonbulletproof_master copy
 
BSidesLV 2018 talk: social engineering at scale, a community guide
BSidesLV 2018 talk: social engineering at scale, a community guideBSidesLV 2018 talk: social engineering at scale, a community guide
BSidesLV 2018 talk: social engineering at scale, a community guide
 
Social engineering at scale
Social engineering at scaleSocial engineering at scale
Social engineering at scale
 
engineering misinformation
engineering misinformationengineering misinformation
engineering misinformation
 
Online misinformation: they're coming for our brainz now
Online misinformation: they're coming for our brainz nowOnline misinformation: they're coming for our brainz now
Online misinformation: they're coming for our brainz now
 
Sj terp ciwg_nyc2017_credibility_belief
Sj terp ciwg_nyc2017_credibility_beliefSj terp ciwg_nyc2017_credibility_belief
Sj terp ciwg_nyc2017_credibility_belief
 
Belief: learning about new problems from old things
Belief: learning about new problems from old thingsBelief: learning about new problems from old things
Belief: learning about new problems from old things
 
Session 10 handling bigger data
Session 10 handling bigger dataSession 10 handling bigger data
Session 10 handling bigger data
 
Session 09 learning relationships.pptx
Session 09 learning relationships.pptxSession 09 learning relationships.pptx
Session 09 learning relationships.pptx
 

Dernier

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 

Dernier (20)

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

Big Data and Me

  • 1. Big Data and Me: experiences from the front line Sara-Jayne Farmer Change Assembly April 22nd 2013
  • 2. ME
  • 3. Me • Data Scientist • Using data to: – connect communities – improve access to information – so people can make better decisions – on both small and large scales • It’s all about people: – Local people: know their needs; need more information – Local technologists: have skills; need connections – Large organisations: have resources; need guidance
  • 4. Some of those People (smart, talented, dedicated hackers in Haiti, January 2013)
  • 5. My Personal Three Vs • Variety – Data all over the place – Csv, json, xml, excel, pdf, text, webpages, rss, scanned pages, images, videos, audiofiles, maps, proprietary. Etc. • Velocity – Streams updating too fast for a mapping team (100-200 people) to handle – Pages updating too frequently to check by hand • Volume – Can’t open the data in a spreadsheet – Can’t fit the data on my laptop – Maxes out my credit card (thank you Amazon!)
  • 7. “more people have mobile phones than toilets” – UN, March 2013
  • 8. But… but… there are always data issues… • Datasets were difficult to find • No data available after 2010 • Hard to track provenance – e.g. what decisions did the people creating these datasets make? What assumptions? • Data was rounded up • Countrynames didn’t match between sets • Multiple charactersets (e.g. Å, A, Ԇ) • Messy formatting (merges, ‘explanations’ etc)
  • 9. e.g. Country Names DR Congo in Data.UN.Org: • “Congo, Democratic Republic of the”, “Congo Democratic”, “Democratic Republic of the Congo”, “Congo (Democratic Republic of the)”, “Congo, Dem. Rep.”, “Congo Dem. Rep.”, “Congo, Democratic Republic of”, “Dem. Rep. of Congo”, “Dem. Rep. of the Congo” DR Congo in common standards: • “Democratic Republic of the Congo” (UN Stats), “Congo, The Democratic Republic of the” (ISO3166), “Congo, Democratic Republic of the” (FIPS10, Stanag), “180” (UN Stats), “COD” (ISO3166, Stanag), “CG” (FIPS10)
  • 11. And interpretation • Hang on… don’t some people have more than one phone? • And how do you count the people without toilets? • What if the cities have lots of phones and toilets, and the rural areas don’t? • Where does my composting toilet fit in this? • How big were these surveys? • What do we do with the zeros? • Etc…
  • 15. And alternative alternatives… • Social media proxies • Grassroots maps • Etc.
  • 18. The Humans+Tools Solution: Crisismapping
  • 26. Use
  • 27. BUT WE NEED MORE DATA SCIENTISTS…
  • 28. Build and Connect Communities
  • 31. Big Data and Me: experiences from the front line Sara-Jayne Farmer http://www.changeassembly.com/ @bodaceacat
  • 36. Tools
  • 38. NYC Meetups (see meetup.com)