Data, Code, and
Research at Scale
         Josh Greenberg
  The Alfred P. Sloan Foundation

      greenberg@sloan.org
       @epistemographer
Disclaimer
These statements do not necessarily reflect the
thoughts of the Alfred P. Sloan Foundation or my
        colleagues; they are mine alone.
Research at Scale
http://www.flickr.com/photos/ryanwick/3461850112/
http://www.flickr.com/photos/tncountryfan/5543540985/
Macroscope
http://pespmc1.vub.ac.be/macroscope/default.html
“My aim here is to inspire computer scientists to
implement software frameworks that empower
domain scientists to assemble their own continuously
evolving macroscopes, adding and upgrading existing
(and removing obsolete) plug-ins to arrive at a set
that is truly relevant for their work”

           Katy Börner, “Plug and Play Macroscopes”




            http://cacm.acm.org/magazines/2011/3/105316-plug-and-play-macroscopes/fulltext
http://ngrams.googlelabs.com/graph?content=science,+technology&year_start=1800&year_end=2000&corpus=0&smoothing=3
http://blog.okcupid.com/index.php/the-best-questions-for-first-dates/
Data
Big Data
SDSS



       http://www.sdss.org/includes/sideimages/sm_sdss_pie2.jpg
Census of Marine Life


                   http://comlmaps.org/oceanlifemap
http://www.flickr.com/photos/72427965@N00/3731550892/
http://www.flickr.com/photos/anders-vindegg/3369218571/
Code
http://www.flickr.com/photos/roidelapatate/4313265988/
http://ngrams.googlelabs.com/graph?content=science,+technology&year_start=1800&year_end=2000&corpus=0&smoothing=3
http://ngrams.googlelabs.com/graph?content=science,+technology&year_start=1800&year_end=2000&corpus=0&smoothing=3
http://www.sciencemag.org/content/331/6014/176.full#F1
Who does the work?
Data Science
Data Science
Engineering




                  Applied Math
              John Rauser @ http://www.youtube.com/watch?v=0tuEEnL61HM
Applied Math
Writing
Engineering
John Rauser @ http://www.youtube.com/watch?v=0tuEEnL61HM
Data Science
 (#alt-ac?)
All hands on deck
Galaxy Zoo
http://www.oldweather.org/
http://menus.nypl.org
Galaxy Zoo
Epistemology
Epistemology
          of Big Data?



(Flip Kromer)
Screwmeneutics?



http://www.playingwithhistory.com/wp-content/uploads/2010/04/hermeneutics.pdf
http://www.flickr.com/photos/amishsteve/98994505/
Trust
http://en.wikipedia.org/wiki/File:Library_of_Congress,_Rosenwald_4,_Bl._5r.jpg
Reproducibility
empirical falsifiability : methods
                 ::
hermeneutic inquiry : provenance
Citation
Our means of dissemination are
out of sync with the methods of
      scholarly production
http://www.sciencemag.org/content/331/6014/176.full#F1
A thought experiment:
  What if we wrote
scholarship like code?
Version Control
Tagged release
Bug Tracking
The very technology that
enables research at scale
 potentially enables new
 modes of dissemination
http://www.stodden.net/AMP2011/
http://en.wikipedia.org/wiki/File:Panopticon.jpg
On Exactitude in Science (Del Rigor en la Ciencia)
                         Jorge Luis Borges

“In that Empire, the Art of Cartography attained such Perfection
that the Map of a single Province occupied an entire City, and the
Map of the Empire, an entire Province. In time, these Excessive
Maps no longer satisfied, and the Colleges of Cartographers
raised a Map of the Empire that was the Size of the Empire and
coincided with it point for point. Less Devoted to the Study of
Cartography, the Following Generations understood that this
extended Map was Useless, and not without Impiety they abandoned
it to the Inclemencies of the Sun and the Winters. In the Deserts
of the West, tattered Ruins of the Map still endure, inhabited by
Animals and Beggars; in all the Land there is no other relic of the
Disciplines of Geography.”

Suárez Miranda: Viajes de varones prudentes,
Book Four, Ch. XLV, Lérida, 1658.




                        via http://elmundoenverso.blogspot.com/2007/12/del-rigor-en-la-ciencia-jorge-lus.html
Discuss...
One more thing...
Research at Scale
Disaggregation of
scholarly materials
Flourishing of new
 channels / genres
Humanities : blogs
                   ::
   Social Sciences : SSRN (preprint)
                   ::
Sciences : PLoS ONE (rapid publication)
Addition of data and
   code to pile
New macroscopic
methods of discovery,
  assessing impact
Why (digital) humanities?

Editor's notes

  7. Figure from Joël de Rosnay's 1979 book “The Macroscope”
  9. This is happening elsewhere, across other fields. Consider the impact of Google Books as a macroscope.
  11. 34,260 real-life couples: “I met someone on OkCupid”, give username, hundreds per day.
      - “Would you consider sleeping with someone on the first date?” :: “Do you like the taste of beer?”
      - Long-term compatibility :: “Do you like horror movies?”; “Have you ever traveled around another country alone?”; “Wouldn't it be fun to chuck it all and go live on a sailboat?”
  14. The Foundation makes grants to support original research and broad-based education related to science, technology, and economic performance, and to improve the quality of American life. One thing to know about Sloan: the Foundation likes data. A lot.
  15. The Sloan Digital Sky Survey (SDSS) is a major multi-filter imaging and spectroscopic redshift survey using a dedicated 2.5-m wide-angle optical telescope at Apache Point Observatory in New Mexico, United States. The survey began in 2000 and has mapped over 35% of the sky.
  16. Census of Marine Life: a “global network of researchers in more than 80 nations engaged in a 10-year scientific initiative to assess and explain the diversity, distribution, and abundance of life in the oceans.”
  17. Indoor Environment: in fact, virtually every science or social science program we have now involves a data infrastructure.
  18. Data deluge
  19. What to throw away?
  20. Code
  21. Data’s great, but to work with it at scale, you need code. (The coffee grinder analogy isn’t quite right, but be glad that you didn’t get a meat grinder instead.)
  22. The n-gram viewer is a big black box. We have no idea what’s happening inside.
  23. They do offer links to the data itself.
  24. Look at arrows, which mask some important transformations.
  25. A lot of my scholarly work was on “mediators”, the people between producers and consumers. Oriented in this direction. Handwork vs. work “at scale”
  26. NPR piece on data science
  27. John Rauser from Amazon at Strata NYC 2011
  28. John Rauser from Amazon at Strata NYC 2011: “Telling stories with Data”
  29. John Rauser from Amazon at Strata NYC 2011
  30. NPR piece on data science
  37. Beyond data cleanup, production of new knowledge. Communication between participants (channel Lintott)
  39. Two main modes of knowledge production: the scientific method, founded on empirical falsifiability, and the hermeneutic approaches that characterize much of the humanities and some social science.
  40. Get a big pile of stuff, look for patterns, and iteratively home in. Any economist will start shouting “correlation, not causation”.
  41. Nod to Dan Atkins for mentioning it yesterday: data mining
  42. Steve Ramsay on browsing a library: “Here, I don’t know what I’m looking for, really. I just have a bundle of ‘interests’ and proclivities. I’m not really trying to find ‘a path through culture.’ I’m really just screwing around.”
  43. Working at scale with data: a sense of play, fiddling with knobs. Exploration, visualization.
  45. Standing on the shoulders of giants
  48. Takes for granted a wide system of institutions, as well as platforms and genres. Cite a book, and you can trust a broad system of libraries as well as the consistency of individual manifestations of the same work.
  49. Chain of evidence
  50. Data, code, are all important
  51. Let’s imagine you publish an article. Many possible points of failure along the chain moving upstream. Sociologists of science describe the process of contestation as a sequential opening of black boxes...
  55. Dan Cohen talked about learning to live with imperfection: software is never perfect, it’s just shipped.
  56. Social features (GitHub)
  57. Not everyone gets commit access; bug tracking is a form of decentralized review.
  58. Forking
  60. Workshop hosted by Victoria Stodden and others that convened projects that leverage technology in the interest of reproducible research.
  61. Some problems: looked at through another lens, this is essentially a culture of surveillance where everything is visible at all times.
  62. Also, limited resources mean that the perfect capture of everything isn’t feasible, or useful downstream.
  63. Lots to decide on, and hopefully affirmatively address rather than simply allow technology to determine.
  65. Step back and look not at individual research projects, but the overall system. We’re seeing a lot of changes...
  70. Dan Cohen on PressForward yesterday (12/2/11): “if you don’t like our choices, you can check our work”
  71. Opportunities to innovate in humanities, given 1) low stakes in publishing industry, 2) close linkages with libraries, and 3) vibrant community discussion.