What Does Responsible Data Science Mean?

Presentation at the University of Virginia Biocomplexity Institute Symposium on Data Science for the Public Good, Arlington VA August 9, 2019

  1. What Does Responsible Data Science Mean? Philip E. Bourne PhD, FACMI; Stephenson Chair of Data Science; Director, Data Science Institute; Professor of Biomedical Engineering. peb6a@virginia.edu https://www.slideshare.net/pebourne @pebourne 08/09/19 Data Science for the Public Good. Thanks to Claudia Scholz for some slides.
  2. Context – Our new School of Data Science is intent on practicing responsible data science as our hallmark. From our draft strategic plan – The practice of data science through education, research and service whereby all aspects of these endeavors consider the ethical, legal and policy aspects of all we do such that the reputation and integrity of the SDS are never in question.
  3. Opportunity – In 40+ years in academia I have never seen anything as transformative as what is happening today. Data Science Initiatives Nationwide. Effect, Cause. The story of the trauma surgeon: https://surgery.duke.edu/divisions/trauma-and-critical-care-surgery
  4. Of course this was all predicted by smart people … https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist) https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Fourth_Paradigm.pdf https://twitter.com/aip_publishing/status/856825353645559808
  5. What is happening now is across all verticals, but there is a precedent we can learn from … https://avora.com/blog/rise-of-the-data-warehouse/ https://individualizedmedicineblog.mayoclinic.org/2013/04/16/celebrating-10th-anniversary-of-human-genome-project/ https://science.sciencemag.org/content/291/5507/1304
  6. What is happening now is across all verticals, but there is a precedent we can learn from … https://avora.com/blog/rise-of-the-data-warehouse/ DNA Sequence Data Since the Human Genome: http://synbio.info/display/synbio/Genetic+data+likely+to+become+the+biggest+big+data+in+2025
  7. What can we learn from what has come before? Lesson 1: Responsible data science means recognizing that exponential growth of data leads to unexpected consequences.
  8. Accuracy. Do you want to know? You can do it at home. What is ethical in the research lab is not when commercialized. https://www.montana.edu/news/17886/public-forum-exploring-the-science-and-ethics-of-gene-editing-set-for-aug-7 http://theconversation.com/five-things-to-consider-before-ordering-an-online-dna-test-92504 https://www.cnbc.com/2019/05/02/ubiome-what-really-happened-at-health-start-up-raided-by-fbi.html
  9. The 6 D's provide one description of the consequences …
  10. Lesson 1: Exponential growth of data leads to unexpected consequences. Responsible data science anticipates, or at least prepares to deal with, such consequences ahead of time.
  11. Lesson 2 – It's all too easy to forget the negative consequences when … [Courtesy Eric Green, NHGRI]
  12. Lesson 3 – Policies and laws lag … http://www.navajo-nsn.gov/News%20Releases/OPVP/2019/may/FOR%20IMMEDIATE%20RELEASE%20-%20Navajo%20Nation%20signs%20data%20sharing%20agreement%20to%20advance%20uranium%20exposure%20research%20efforts.pdf
  13. Lesson 4 – Data sharing is a double-edged sword …
  14. On the plus side, data sharing can save lives … Use case: Diffuse Intrinsic Pontine Gliomas (DIPG) • Occur in 1:100,000 individuals • Peak incidence at 6-8 years of age • Median survival 9-12 months • Surgery is not an option • Chemotherapy is ineffective and radiotherapy gives only transient benefit [From Adam Resnick]
  15. Timeline of genomic studies in DIPG • 2012: Landmark studies identify histone mutations as recurrent driver mutations in DIPG • The data were not shared for 3 years • In 2015, in largely the same datasets, others identify ACVR1 mutations as a secondary, co-occurring mutation • ACVR1 is targetable by a drug • 3 years = 180 lives [From Adam Resnick]
  16. NIH Strategic Plan for Data Science • Support a Highly Efficient and Effective Biomedical Research Data Infrastructure • Promote Modernization of the Data-Resources Ecosystem • Support the Development and Dissemination of Advanced Data Management, Analytics, and Visualization Tools • Enhance Workforce Development for Biomedical Data Science • Enact Appropriate Policies to Promote Stewardship and Sustainability https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf
  17. Lesson 4 – Data sharing is a double-edged sword …
  18. State Health Surveillance: Newborn Screening Case Study. From Bonnie R and Bernheim R, Public Health Law, Policy and Ethics, Foundation Press (2015).
      Category: Variables
      • Infant: patient ID, birth date, birth time, ethnicity, weight in grams, feeding type, transfusion status, zip code of mother
      • Sample: sample ID, collection date, received date, disposition code for sample (satisfactory/not satisfactory)
      • Submitter: submitter ID, submitter name
      • Test: 36 different tests
      • Diagnosis: diagnosis, diagnosis date, sample ID
      The final dataset contained more than 1.6 million sample records and nearly 29,000 diagnosis records.
  19. Zip Code Level Sickle Cell Prevalence
  20. Given these lessons (and there are many others) from just one vertical, what should we be doing as a School of Data Science to be responsible while undertaking data science for the public good?
  21. Guiding Principles … Be open, transparent & collaborative in all we do • Make ourselves known: use persistent identifiers, e.g., ORCID • Use preprints to accelerate progress • Only publish Open Access (OA) • Recognize openness, transparency & collaboration in hiring and P&T • Promote institutional openness: Open Data Lab, Wikimedian in residence • Support institutional open data governance
  22. Guiding Principles … Consider the ethical consequences across the complete data workflow
  23. Data Acquisition: Information → Data. Workflow stages: Acquisition, Engineering, Analysis, Communication, Dissemination, Ethics. ● Census, surveys ● Data mining, digitization ● Sensors, Internet of Things (IoT). Ethical Issues: ● Mass surveillance ● Privacy, terms of service ● Data sovereignty. Job titles: ● IoT engineer ● Chief privacy officer ● Survey designer. https://www.wired.com/story/all-of-us-launches/
  24. Data Engineering: Data → Value. ● Integration of data sources ● Data wrangling & cleaning ● Data structures ● Cloud & parallel computing. Ethical Issues: ● Intellectual property ● Consequences of integration. Job titles: ● Data engineer ● Information systems engineer
  25. Data Analysis: Data → Knowledge. ● Machine learning (supervised, unsupervised) ● Models & simulations. Ethical Issues: ● Algorithmic bias ● Accountability & transparency. Job titles: ● Data Scientist or Analyst ● Machine Learning Engineer
  26. Data Communication: Data → Insight. ● Visualization ● Storytelling. Ethical Issues: ● Confidentiality ● Distortion of facts. Job titles: ● Data Journalist ● Information Designer ● Dashboard Manager
  27. Data Dissemination: Data → Future Use. ● Data preservation ● Reproducibility of research ● F.A.I.R. & open. Ethical Issues: ● Cybersecurity ● Dual use. Job titles: ● Data Steward ● Repository manager ● Open Science advocate
  28. Take home • The fourth paradigm is upon us and will change society • Forming a new school is an opportunity to do it right, and we need help! • Look to fields like genomics that have been doing data science for some time and consider best (and worst) practices • Responsible data science involves working by a set of guiding principles and … • Considering the consequences of what we do across the complete data lifecycle. Only then will we truly be undertaking data science for the public good.
  29. Acknowledgements • The BD2K Team at NIH • The 150 folks who have passed through my laboratory: https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0
  30. Thank You. peb6a@virginia.edu
