The ContentMine project (http://contentmine.org) will harvest 100 million facts from the literature. Here we summarise the technology stack we're building to enable the first step: collecting the literature. This presentation was given alongside a paper (https://github.com/Blahah/scraperJSON-demo-paper) at WOSP 2014.