Join our #DataTalk on Thursdays at 5 p.m. ET. This week, we learned from DataKind (Harnessing the Power of Data Science in the Service of Humanity); Real Impact Analytics; Elissa Redmiles, a Data Science for Social Good Summer Fellow at the University of Chicago; Nick Eng, Data Scientist at the Center for Data Science and Public Policy at the University of Chicago; Kevin Chen, Chief Scientist at the Experian North America Data Lab; and others.
2. Join our #DataTalk on Thursdays at 5 p.m. ET
This week, we talked with DataKind, Real Impact Analytics, Elissa
Redmiles, a Data Science for Social Good Summer Fellow at the
University of Chicago, Nick Eng, Data Scientist at the Center for
Data Science and Public Policy at the University of Chicago, and
Kevin Chen, Chief Scientist at the Experian Data Lab.
Check out the resources and tweets from this chat:
ex.pn/dataforgood
4. Real Impact Analytics
@RIAnalytics
Data for good is the use of big data to help
policymakers and aid workers foster the
public good and maximize impact.
ex.pn/datatalk
#DataTalk
5. Elissa Redmiles
Data Science for Social Good Summer
Fellow at the University of Chicago
@eredmil1 ex.pn/datatalk
#DataTalk
Data for good projects use data and data
science to help nonprofits better achieve their
mission and assist their target audience.
6. Nick Eng
Data Scientist, Center for Data Science
at the University of Chicago
@nick_eng ex.pn/datatalk
#DataTalk
Projects that use data to improve society as
a whole, rather than any single individual.
To be more specific: communities/cities.
8. Kevin Chen
Chief Data Scientist, Experian Data Lab
@kevincchen
ex.pn/datatalk
#DataTalk
Projects that improve social equality by
leveraging public and private data.
9. Nick Eng
Data Scientist, Center for Data Science
at the University of Chicago
@nick_eng ex.pn/datatalk
#DataTalk
Using data to help the underprivileged,
especially for those who might not know
how data can be used as a tool.
10. What are some favorite examples of
how data has been used for good?
11. ex.pn/datatalk
#DataTalk
Elissa Redmiles
Data Science for Social Good Summer
Fellow at the University of Chicago
@eredmil1
One of the projects that drew me to
@DataSciFellows was the
#NurseFamilyPartnership project, which used
data science to predict people in need.
14. ex.pn/datatalk
#DataTalk
Our favorite D4G example is the use of telecom data with the
Global Pulse UN team in Uganda to detect and tackle food
crises and prioritize actions against poverty. More specifically,
we have developed a mapping of income inequality and income
shocks in Africa using changes in pre-paid patterns.
Real Impact Analytics
@RIAnalytics
16. ex.pn/datatalk
#DataTalk
A second powerful example of D4G is the use of telecom
mobility data to identify, prevent and treat contagious diseases
such as Ebola, malaria and cholera. We have been able to
identify micro-communities as well as mobility patterns. This
helps identify key routes to block and assess the
potential impact on the spread of a disease.
Real Impact Analytics
@RIAnalytics
17. ex.pn/datatalk
#DataTalk
Nick Eng
Data Scientist, Center for Data Science
at the University of Chicago
@nick_eng
Beyond predictive models and confidential
datasets, products like clearstreets.org simplify
our lives using open data.
22. ex.pn/datatalk
#DataTalk
Elissa Redmiles
Data Science for Social Good Summer
Fellow at the University of Chicago
@eredmil1
Organizational culture is very important.
Having good data to analyze and resources directed
toward analysis are key.
23. ex.pn/datatalk
#DataTalk
Finding balance between retaining proprietary
knowledge on either data or technology and applying
to data for good projects can be hard.
Kevin Chen
Chief Data Scientist, Experian Data Lab
@kevincchen
24. ex.pn/datatalk
#DataTalk
Nick Eng
Data Scientist, Center for Data Science
at the University of Chicago
@nick_eng
Implementation! Fancy models or cool
visualizations are only step one. Making these
tools part of the day-to-day is step two.
25. We can see 3 types of challenges: (i) design of the
tools/apps; (ii) access to data; (iii) alignment of the ecosystem.
ex.pn/datatalk
#DataTalk
Real Impact Analytics
@RIAnalytics
26. The operational challenge is mostly to repackage research
insights to generate real impact on the decisions of aid
workers in the field. Many tools are not simple enough for
daily field use, are not actionable enough, and usually have not
been designed around an actual worker’s needs.
ex.pn/datatalk
#DataTalk
Real Impact Analytics
@RIAnalytics
27. The technical challenge is to be able to connect to relevant
data sources, whether external data sources (e.g. WHO,
World Bank) or telecom data sources.
ex.pn/datatalk
#DataTalk
Real Impact Analytics
@RIAnalytics
28. The legal and regulatory challenge is to syndicate our
approach with local regulators and secure the data
handling process, in terms of privacy, anonymization of
data or remote access. All data must remain at the telecom
operator's premises within the country. This last challenge
can be partly addressed by securing a sustainable
ecosystem involving all parties.
ex.pn/datatalk
#DataTalk
Real Impact Analytics
@RIAnalytics
29. ex.pn/datatalk
#DataTalk
Nick Eng
Data Scientist, Center for Data Science
at the University of Chicago
@nick_eng
And figuring out what the problem exactly is, and
framing it. We don’t always know the domain.
We need your help and feedback.
34. ex.pn/datatalk
#DataTalk
Elissa Redmiles
Data Science for Social Good Summer
Fellow at the University of Chicago
@eredmil1
Many different formats are usable:
database data, Excel data, and CSV data are all
easily processable, but text and web data work, too.
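As a rough illustration of the point above, the following Python sketch reads the same small table from CSV text and from a database into one common shape (a list of dictionaries). The column names, sample values, and helper function names are hypothetical and were not part of the chat; only standard-library modules are used.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for a nonprofit's CSV export.
CSV_TEXT = "client_id,visits\n1,4\n2,7\n"


def rows_from_csv(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_sqlite(conn, query):
    """Run a query and return each row as a dict keyed by column name."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(query)]


# An in-memory database holding the same hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (client_id INTEGER, visits INTEGER)")
conn.executemany("INSERT INTO clients VALUES (?, ?)", [(1, 4), (2, 7)])

csv_rows = rows_from_csv(CSV_TEXT)
db_rows = rows_from_sqlite(conn, "SELECT client_id, visits FROM clients")
```

Note that the CSV reader returns every value as a string, while the database returns typed values, so some type conversion is usually the first cleaning step.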
35. ex.pn/datatalk
#DataTalk
Real Impact Analytics
@RIAnalytics
Telecom data are particularly unique in emerging markets, as they are
collected systematically, locally and in real time. These data can be
complemented by 2 data sources: (i) external or public databases, such
as occurrences of a specific disease in a specific location;
(ii) additional / ad-hoc data which are collected through a mobile
application. The most important limitation is the risk of re-identifying
individual people from the shared insights or tools. This
would dramatically undermine the scaling up of Data for Good.
39. ex.pn/datatalk
#DataTalk
Nick Eng
Data Scientist, Center for Data Science
at the University of Chicago
@nick_eng
And when structured data isn’t available,
you can get creative to make your own data
(e.g. scraping websites).
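A minimal sketch of what "making your own data" by scraping might look like, using only Python's standard library. The HTML snippet, the shelter names, and the `ListItemParser` class are invented for illustration; a real scraper would fetch the page over the network and should respect the site's terms of use.

```python
from html.parser import HTMLParser

# Made-up page fragment standing in for a downloaded web page.
SAMPLE_HTML = "<ul><li>Shelter A</li><li>Shelter B</li></ul>"


class ListItemParser(HTMLParser):
    """Collect the text found inside <li> elements."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Keep only non-empty text that appears inside a list item.
        if self.in_item and data.strip():
            self.items.append(data.strip())


parser = ListItemParser()
parser.feed(SAMPLE_HTML)
parser.close()
items = parser.items
```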
45. #DataTalk
Real Impact Analytics
@RIAnalytics
We need to understand the actual needs of the
potential users, assess the correlation between
available data and possible actions and outcomes,
and adapt apps and algorithms accordingly.
46. #DataTalk
Real Impact Analytics
@RIAnalytics
We need to be able to refresh and operationalize the tools offering
mobile access to insights; we need to technically secure the access
to the data and ensure privacy; and we need to be able to measure
impact and correct the algorithms accordingly. Overall, trust is one of
the key overarching success factors, as it enables a smooth
decision flow and maximizes impact. Therefore, we need to build
strong partnerships with international institutions to ensure global
impact and scalability of our actions.
49. #DataTalk
Nick Eng
Data Scientist, Center for Data Science
at the University of Chicago
@nick_eng
When doing a project, make sure it’s a
constant partnership with your other
stakeholders (e.g. nonprofits).
50. #DataTalk
Elissa Redmiles
Data Science for Social Good Summer
Fellow at the University of Chicago
@eredmil1
Talk to SMEs and find the domain
knowledge you don’t have. Data is only
half the puzzle.
51. What are ways to use data for good,
while protecting privacy?
53. #DataTalk
ex.pn/datatalk
Elissa Redmiles
Data Science for Social Good Summer
Fellow at the University of Chicago
@eredmil1
@DataSciFellows keeps data secure
while letting the code for processing the
data be open source.
54. #DataTalk
ex.pn/datatalk
Kevin Chen
Chief Data Scientist, Experian Data Lab
@kevincchen
Use the data in aggregates (e.g. finding
activity patterns of city dwellers using
aggregated mobile phone activity).
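The aggregation idea above can be sketched in a few lines of Python. The record layout (district, hour of day), the sample events, and the minimum group size of 3 are assumptions made for this example, not details from the chat. Dropping small groups is an extra safeguard added here, since a count of one can still point to a single person.

```python
from collections import Counter

# Hypothetical per-event records: (district, hour_of_day). No user IDs are kept.
events = [
    ("north", 8), ("north", 8), ("north", 8), ("north", 9),
    ("south", 8), ("south", 8), ("south", 8), ("south", 8),
]

MIN_GROUP_SIZE = 3  # assumed threshold; smaller groups are suppressed


def aggregate_activity(records, min_size=MIN_GROUP_SIZE):
    """Count events per (district, hour) and drop groups below min_size."""
    counts = Counter(records)
    return {key: n for key, n in counts.items() if n >= min_size}


activity = aggregate_activity(events)
```

Only the group-level counts in `activity` would be shared; the single event in ("north", 9) is suppressed.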
55. #DataTalk
ex.pn/datatalk
Elissa Redmiles
Data Science for Social Good Summer
Fellow at the University of Chicago
@eredmil1
We keep data science code open source so that
other nonprofit organizations can use these
resources to process their own data.
59. #DataTalk
ex.pn/datatalk
Kevin Chen
Chief Data Scientist, Experian Data Lab
@kevincchen
Adding noise to the data, bucketing the values (e.g. age),
or using a coarser level of info (e.g. zip3
vs. zip5) when possible are a few ways.
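The three techniques named above can be sketched as follows. The field names, the sample record, the 10-year bucket width, and the noise scale are all illustrative assumptions. Plain Gaussian noise as used here is a simple perturbation and does not by itself provide a formal privacy guarantee.

```python
import random


def bucket_age(age, width=10):
    """Map an exact age to a range label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def coarsen_zip(zip5):
    """Keep only the first three digits of a 5-digit ZIP code (zip3)."""
    return zip5[:3]


def add_noise(value, scale, rng):
    """Perturb a numeric value with zero-mean Gaussian noise."""
    return value + rng.gauss(0, scale)


rng = random.Random(0)  # seeded only so the sketch is repeatable
record = {"age": 34, "zip": "60637", "income": 52000.0}  # made-up record
masked = {
    "age_range": bucket_age(record["age"]),
    "zip3": coarsen_zip(record["zip"]),
    "income": add_noise(record["income"], scale=1000.0, rng=rng),
}
```

Each step trades some precision for privacy, so the bucket width and noise scale would need to be chosen with the analysis in mind.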
60. What type of data philanthropy
would you like to see happen?
63. #DataTalk
ex.pn/datatalk
I would really like to see projects that use
data to help understand, prevent, intervene,
and treat cancers.
Kevin Chen
Chief Data Scientist, Experian Data Lab
@kevincchen
65. #DataTalk
ex.pn/datatalk
Real Impact Analytics
@RIAnalytics
We would like to co-design a sustainable, scalable and open
ecosystem of mobile anti-poverty apps together with other
developers, NGOs, international agencies and philanthropists.
There will be different types of apps required, such as apps
supporting NGOs in disaster relief or sudden outbreaks of a
contagious disease, or apps supporting ministries or public
authorities in their decision-making, optimizing the targeting
and impact of public policies. Most emerging countries lack
data about their populations.
66. #DataTalk
ex.pn/datatalk
Elissa Redmiles
Data Science for Social Good Summer
Fellow at the University of Chicago
@eredmil1
I also think it’s important for more
corporations like Experian and IBM to raise
awareness of #Data4Good projects.
67. What are ways to support
organizations and data scientists
working in data philanthropy?
68. #DataTalk
ex.pn/datatalk
Real Impact Analytics
@RIAnalytics
Philanthropists can best support D4G by joining the
dialogue with app developers and end-users on the public
questions to address. There is a clear need to fund
specific apps and an operational platform to ensure that
Data for Good becomes not only a science but also fosters
operational impact. Securing such a platform with a first set
of apps will generate spillovers and a positive dynamic
among the communities of developers, NGOs and public
institutions.
70. #DataTalk
ex.pn/datatalk
Elissa Redmiles
Data Science for Social Good Summer
Fellow at the University of Chicago
@eredmil1
Agree with @PrarieScience.
@NSF funding for outcomes-based
#DataScience projects is key, especially for
training data scientists.
71. #DataTalk
ex.pn/datatalk
Corporations can provide funding and
recognition to their data scientists to
encourage participation in data
philanthropy projects.
Kevin Chen
Chief Data Scientist, Experian Data Lab
@kevincchen
74. Any final tips for data scientists who
want to use data for good?
75. ex.pn/datatalk
#DataTalk
Real Impact Analytics
@RIAnalytics
Our main tip is to collaborate, as an operational open ecosystem is
critical to realize our shared vision of a healthy and poverty-free
world. Data for Good is at the crossroads of multiple skill sets, such
as data science, software development, algorithm design,
epidemiology, traffic modelling, field work involving poor
communities in emerging markets, and telecom regulation. There is no
chance one organization could offer all of these internally. Data for Good
needs to offer both operational tools and scientific insights.