Scraping in 60 minutes (CIJ Summer School 2019)

Workshop at the Centre for Investigative Journalism Summer School, July 2019, introducing useful tools for scraping database search results and Twitter.

Published in: Education

  1. Scraping in 60 mins. Paul Bradshaw, Leanpub.com/scrapingforjournalists
  2. How do you scrape? Aron Pilhofer, News Rewired
  3. Scraping tools. WYSIWYG tools: OutWit Hub, Apify. Browser extensions: Web Scraper, Grepsr. Google Sheets’ =IMPORT functions. Workbench Data, IFTTT, OpenRefine. Morph.io
  4. OutWit Hub
  5. Chrome extensions:
  6. Edit column > Add column by fetching URLs…
  7. https://ifttt.com/channels
  8. https://apify.com/apify/google-search-scraper
  9. https://app.workbenchdata.com/workflows/
  10. app.workbenchdata.com/workflows/22852 /22850 /25739
  11. https://onlinejournalismblog.com/2013/09/18/ethics-in-data-journalism-mass-data-gathering-scraping-foi-and-deception/
  12. Robots.txt: http://www.tcij.org/robots.txt
  13. Legal considerations: database rights, data copyright, terms & conditions
  14. https://moveplanner.zoopla.co.uk/terms-and-conditions
  15. Treat like any source: build in TGTBT (too good to be true) checks; seek second sources; seek right of reply/confirmation; data is just a lead
  16. http://www.storybench.org/to-scrape-or-not-to-scrape-the-technical-and-ethical-challenges-of-collecting-data-off-the-web/
  17. Does it have an API? https://www.mediawiki.org/wiki/API:Main_page
  18. https://github.com/BBC-Data-Unit/music-festivals
  19. Thank you. Paul Bradshaw, Leanpub.com/scrapingforjournalists
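
The robots.txt slide points at a site's crawling rules without showing how a scraper might consult them. The following is a minimal sketch, not something taken from the workshop, using Python's standard-library `urllib.robotparser`. The sample rules, the `example.org` URLs, the `example-scraper` user agent, and the helper names `build_parser` and `is_allowed` are all assumptions made up for illustration; they are not the contents of tcij.org's robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Invented sample rules for illustration only (assumption: not any real site's file).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded or pasted in."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(robots_text: str, url: str, user_agent: str = "example-scraper") -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    return build_parser(robots_text).can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.org/public/page.html",
        "https://example.org/private/data.html",
    ):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS, url) else "disallowed"
        print(url, "->", verdict)

    # Crawl-delay, if the site declares one, suggests how long to pause between requests.
    print("crawl delay:", build_parser(SAMPLE_ROBOTS).crawl_delay("example-scraper"))
```

Against a live site, the same parser can load the file itself with `set_url(...)` followed by `read()` instead of `parse(...)`. A permissive robots.txt does not settle the legal and ethical questions raised in the slides; terms and conditions and database rights still apply.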