Use of ContentMine tools on the Open Access subset of EuropePubMedCentral to discover new knowledge about the Zika virus.
Three slides have embedded movies, which do not play in SlideShare; a first-pass recording of them is available as a single file at https://vimeo.com/154705161
ContentMine + EPMC: Finding Zika!
1. ContentMine + Europe PMC
Peter Murray-Rust,
ContentMine.org and University of Cambridge
London, UK 2016-02-08
Finding Zika!
getpapers and AMI download and analyze papers from EuropePMC API:
F/OSS tools from contentmine.org
Images from Wikimedia CC-BY-SA
2. “The Right to Read is the Right to Mine” (Peter Murray-Rust, 2011)
http://contentmine.org
3. Semantic Fulltext
• EuropePMC coherent OpenAccess
• getpapers: query, download (through API).
• AMI filters, checks[1], transforms facts in papers.
• sequences, species, genera, genes,
dictionaries
[0] All operations shown run in a total of <3 minutes.
[1] Dictionaries and lookup.
[2] Usable from home by anyone
Zika endemic areas
Wikimedia CC-BY-SA
4. CONTENTMINE: Complete OPEN Platform for Mining Scientific Literature
[Diagram: the ContentMine pipeline, from sources to facts]
• Sources: daily crawl of EPMC, arXiv, CORE, HAL, (UNIV repos); ToC services; publisher sites (PLoS ONE, BMC, PeerJ… Nature, IEEE, Elsevier…)
• Acquisition: getpapers (query), quickscrape (crawl; URLs, DOIs), catalogue
• Input formats: PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV, XLS
• norma (Normalizer, Structurer, Semantic Tagger): text, data, figures → Semantic ScholarlyHTML (abstract, methods, references, captioned figures, HTML tables)
• ami (search, lookup) with plugins: Chem, Phylo, Trials, Crystal, Plants → facts; visualization and analysis
• COMMUNITY: scrapers, queries, taggers
• Throughput: 30,000 pages/day
5. Download all Open Access “Zika” from
EuropePMC in 10 seconds
(click below for movie)
Aedes aegypti, Wikimedia CC-BY-SA
Note: movies of this and other slides can be seen at https://vimeo.com/154705161
6. Downloaded all Open Access “Zika” from
EuropePMC in 10 seconds
Final download screen
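The slides do not show how getpapers phrases its request, so the following is only a minimal sketch of what an "all Open Access Zika papers" query against the Europe PMC REST search service might look like. The function name `build_epmc_query`, the `resultType` choice, and the page size are assumptions made for illustration; they are not taken from the getpapers source.

```python
from urllib.parse import urlencode

# Europe PMC RESTful search endpoint (public web service).
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_epmc_query(term, page_size=100):
    """Return a search URL for Open Access records matching `term`.

    Hypothetical helper: getpapers may build its requests differently.
    """
    params = {
        # Restrict the search to the Open Access subset.
        "query": f"{term} AND OPEN_ACCESS:y",
        "format": "json",
        "resultType": "lite",   # assumed: short records are enough for a listing
        "pageSize": page_size,  # assumed page size
    }
    return f"{EPMC_SEARCH}?{urlencode(params)}"


url = build_epmc_query("Zika")
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) would return one page of matching records; the full texts would still need to be downloaded separately, which is the part getpapers automates.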
7. Eyeballing 20/120 Zika papers,
click below for movie
Yellow Fever Virus
Wikimedia CC-BY-SA
8. Commonest words in 120 Zika papers
3011 virus
1939 Ae./Aedes
1212 dengue
901 mosquito/es
894 species
791 ZIKV
721 using
716 DENV
567 detection
513 aegypti
484 infection
442 RNA
428 protein
401 albopictus
360 viral
Mosquito spp.
Wikimedia CC-BY-SA
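A word-frequency table like the one on this slide can be approximated in a few lines. This is a rough sketch, not the AMI implementation: the stop list is a tiny assumed one, the tokenizer is a plain letters-only regex, and the two sample sentences are made up for the demonstration.

```python
import re
from collections import Counter

# Tiny illustrative stop list (assumed); a real run would use a fuller one.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "is", "by", "was", "were"}


def commonest_words(texts, n=15):
    """Count words across all texts, skipping stop words, and return the top n."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[A-Za-z]+", text):
            # Case is preserved in the counts so acronyms such as ZIKV stay recognisable.
            if word.lower() not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)


# Made-up sample sentences standing in for the downloaded papers.
sample = [
    "Zika virus is carried by the Aedes mosquito.",
    "Dengue virus and Zika virus were detected.",
]
top = commonest_words(sample, n=2)
print(top)
```

Merged entries such as "Ae./Aedes" or "mosquito/es" on the slide would need an extra normalisation step that this sketch leaves out.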
9. Filtering local files for sequence and viruses
AMI (part of ContentMine software)
(click below for movie)
10. DNA Primers in running text
…the sodium channel voltage dependent gene (Nav). Primers
used to amplify this fragment were AaNaA
5’-ACAATGTGGATCGCTTCCC-3’
and AaNaB 5’-TGGACAAAAGCAAGGCTAAG-3’(8).
The primers amplify a fragment of approximately 472…
Snippet (quotable under the 2014 UK Statutory Instrument, “Hargreaves”):
“~/PMC4654492/results/sequence/dnaprimer/results.xml”
W3C Annotation
[PREFIX]
[MATCH] (link to target)
[SUFFIX]
CMine structure
plugin
option
DNA double stranded fragment
Wikimedia CC-BY-SA
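The prefix / match / suffix layout shown on this slide can be illustrated with a regular expression over running text. The pattern below is an assumption for illustration only (AMI's actual primer pattern is not shown in the slides), as are the 10-base minimum and the 40-character context window.

```python
import re

# 5'-SEQUENCE-3' written with straight, curly, or prime apostrophes.
# The minimum of 10 bases is an assumed threshold, not AMI's.
PRIMER = re.compile(r"5['’′]-([ACGT]{10,})-3['’′]")


def find_primers(text, context=40):
    """Return each primer with the text immediately before and after it."""
    hits = []
    for m in PRIMER.finditer(text):
        hits.append({
            "pre": text[max(0, m.start() - context):m.start()],
            "exact": m.group(1),  # the bases only, without the 5'-/-3' markers
            "post": text[m.end():m.end() + context],
        })
    return hits


# Running text quoted from the slide.
text = ("Primers used to amplify this fragment were AaNaA "
        "5’-ACAATGTGGATCGCTTCCC-3’ and AaNaB 5’-TGGACAAAAGCAAGGCTAAG-3’(8).")
hits = find_primers(text)
for hit in hits:
    print(hit)
```

Keeping a short window of surrounding text with each match is what lets a reader, or an annotation tool, locate the fact in the original paper.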
11. Commonest species in 120 Zika papers
423 Ae./Aedes aegypti
333 Ae./Aedes albopictus
63 Ae. bromeliae
58 Ae. lilii
46 Ae. hensilli
42 Glossina pallidipes
40 Plasmodium vivax
35 Ae. luteocephalus
28 Ae. vittatus
25 Ae. furcifer
22 Plasmodium falciparum
21 Drosophila melanogaster
pre="fever (DHF), are caused by the world's most prevalent mosquito-borne virus.
37 DENV is carried by " exact="Aedes aegypti" post=" mosquito, which is strongly
affected by ecological and human drivers, but also influenced by clima" name="binomial"/>
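Counts such as "423 Ae./Aedes aegypti" imply that abbreviated and full genus names are merged before counting. A minimal sketch of that idea follows, assuming a small genus dictionary and a single abbreviation; the names `KNOWN_GENERA`, `ABBREVIATIONS`, and `count_species` are hypothetical, the sample sentence is made up, and the slides do not show how AMI itself resolves abbreviations.

```python
import re
from collections import Counter

# Assumed lookup tables, limited to names that appear on the slide.
KNOWN_GENERA = {"Aedes", "Plasmodium", "Glossina", "Drosophila"}
ABBREVIATIONS = {"Ae.": "Aedes"}

# A capitalised genus (or an abbreviation ending in ".") followed by a lowercase epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+|[A-Z][a-z]?\.)\s([a-z]{3,})\b")


def count_species(text):
    """Count binomials whose genus, after expanding abbreviations, is in the dictionary."""
    counts = Counter()
    for genus, epithet in BINOMIAL.findall(text):
        genus = ABBREVIATIONS.get(genus, genus)
        if genus in KNOWN_GENERA:  # dictionary check filters out ordinary word pairs
            counts[f"{genus} {epithet}"] += 1
    return counts


# Made-up sample text for the demonstration.
sample = ("DENV is carried by Aedes aegypti. "
          "Ae. aegypti and Ae. albopictus were collected.")
species = count_species(sample)
print(species.most_common())
```

The dictionary check matters because the regex alone would also accept any capitalised word followed by a lowercase one.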
12. Commonest genera in Zika papers
183 Wolbachia
70 Aedes
69 Flavivirus/Flaviviridae
30 Glossina
17 Culex
pre="…-negative endosymbiotic bacterium, is a promising tool against diseases
transmitted by mosquitoes. " exact="Wolbachia" post=" can be found worldwide in
numerous arthropod species. More than 65% of all insect species are natu…"
Wolbachia in insect cell
Wikimedia CC-BY-SA
13. Commonest genes in 120 Zika papers
38 ITS
20 MHC2TA
19 COI
14 CYPJ92
5 CYP6BB2
4 CYP9J28
3 MHC
15. CM Future
• Hypothes.is: use ContentMine results for annotation
• (with Cambridge Univ Library) extracting daily
scientific facts from open and closed literature.
• with EBI, Cochrane Collaborations, JISC, OKF, LIBER,
TGAC/JohnInnes, DNADigest.
• Running workshops, hackdays.
• Planned outreach: MEPs, EC, Slashdot, Reddit,
Kickstarter, geekdom
• http://contentmine.org (OpenLock non-profit)
16. “The Right to Read is the Right to Mine” (Peter Murray-Rust, 2011)
http://contentmine.org
Editor's notes
Hi, I’m here to talk about AMI, a data extraction framework and tool. First, I just want to highlight some of the key contributors to the project: Andy for his work on the ChemistryVisitor and Peter for the overall architecture.
In this talk, I’m going to stress the importance of data in a specific format and its utility for automated machine processing. Then I’m going to demonstrate AMI’s architecture and the transformation of data as it flows through the process. I’m going to dwell a little on a core format, Scalable Vector Graphics (SVG), before introducing the concept of visitors, which are pluggable, context-specific data extractors. Next, I’m going to introduce Andy’s ChemVisitor, for extracting semantic chemistry data, along with a few other visitors that can process non-chemistry data. Finally, I will demonstrate some uses of the ChemVisitor in validation and metabolism.