NCBI Entrez API Guide for Advanced Bioinformatics
1. Advanced NCBI.
The Entrez API
https://github.com/lindenb/courses
Pierre Lindenbaum
@yokofakun
pierre.lindenbaum@univ-nantes.fr
http://plindenbaum.blogspot.com
Institut du Thorax. Nantes. France
September 27, 2016
2. NCBI ? What about EBI, ENSEMBL, ...
4. What will be covered today?
File formats...
EInfo, GQuery, ESearch, ESummary, EFetch...
processing XML answers with XSLT: HTML, SVG, R...
generating a Java parser for dbSNP.
NCBI EBot
using standalone BLAST
5. CURL
curl "http://en.wikipedia.org/wiki/Main_page"
wget -O - "http://en.wikipedia.org/wiki/Main_page"
9. XSLTPROC
xsltproc stylesheet.xsl file.xml > result.xml
12. Formats
Genbank
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25&rettype=gb
LOCUS       X53813                   422 bp    DNA     linear   MAM 22-JUN-1992
DEFINITION  Blue Whale heavy satellite DNA.
ACCESSION   X53813 X17460
VERSION     X53813.1  GI:25
KEYWORDS    satellite DNA.
SOURCE      Balaenoptera musculus (Blue whale)
  ORGANISM  Balaenoptera musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
            Mysticeti; Balaenopteridae; Balaenoptera.
REFERENCE   1  (bases 1 to 422)
  AUTHORS   Arnason,U. and Widegren,B.
  TITLE     Composition and chromosomal localization of cetacean highly
            repetitive DNA with special reference to the blue whale,
            Balaenoptera musculus
  JOURNAL   Chromosoma 98 (5), 323-329 (1989)
   PUBMED   2612291
COMMENT     See also <X52700-2> for 1,760 bp common cetacean component clones
            and <X52703-6>,<X53811-4> for the 422 bp heavy satellite clones.
FEATURES             Location/Qualifiers
     source          1..422
                     /organism="Balaenoptera musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9771"
                     /clone="7"
     misc_feature    1..422
                     /note="heavy satellite DNA"
ORIGIN
        1 tagttattca acctatccca ctctctagat accccttagc acgtaaagga atattatttg
13. Formats
ASN.1
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25
Seq-entry ::= seq {
  id {
    embl {
      accession "X53813" ,
      version 1 } ,
    gi 25 } ,
  descr {
    title "Blue Whale heavy satellite DNA" ,
    source {
      org {
        taxname "Balaenoptera musculus" ,
        common "Blue whale" ,
        db {
          {
            db "taxon" ,
            tag
              id 9771 } } ,
        orgname {
          name
            binomial {
              genus "Balaenoptera" ,
              species "musculus" } ,
          lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
 Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
 Mysticeti; Balaenopteridae; Balaenoptera" ,
          gcode 1 ,
          mgcode 2 ,
          div "MAM" } } ,
      subtype {
14. Formats
ASN.1 (schema)
http://www.ncbi.nlm.nih.gov/data_specs/asn/insdseq.asn
INSDSeq ::= SEQUENCE {
  locus VisibleString ,
  length INTEGER ,
  strandedness VisibleString OPTIONAL ,
  moltype VisibleString ,
  topology VisibleString OPTIONAL ,
  division VisibleString ,
  update-date VisibleString ,
  create-date VisibleString OPTIONAL ,
  update-release VisibleString OPTIONAL ,
  create-release VisibleString OPTIONAL ,
  definition VisibleString ,
  primary-accession VisibleString OPTIONAL ,
  entry-version VisibleString OPTIONAL ,
  accession-version VisibleString OPTIONAL ,
  other-seqids SEQUENCE OF INSDSeqid OPTIONAL ,
  secondary-accessions SEQUENCE OF INSDSecondary-accn OPTIONAL ,
  project VisibleString OPTIONAL ,
  keywords SEQUENCE OF INSDKeyword OPTIONAL ,
  segment VisibleString OPTIONAL ,
  source VisibleString OPTIONAL ,
  organism VisibleString OPTIONAL ,
  taxonomy VisibleString OPTIONAL ,
  references SEQUENCE OF INSDReference OPTIONAL ,
  comment VisibleString OPTIONAL ,
  comment-set SEQUENCE OF INSDComment OPTIONAL ,
  struc-comments SEQUENCE OF INSDStrucComment OPTIONAL ,
  primary VisibleString OPTIONAL ,
  source-db VisibleString OPTIONAL ,
15. Formats
ASN.1 (tools)
DATATOOL
Generate C++ data storage classes based on ASN.1 serialization
streams.
Convert data between ASN.1, XML and JSON formats.
16. Formats
XML
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25&retmode=xml
<?xml version="1.0"?>
<!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_G
<GBSet>
  <GBSeq>
    <GBSeq_locus>X53813</GBSeq_locus>
    <GBSeq_length>422</GBSeq_length>
    <GBSeq_strandedness>double</GBSeq_strandedness>
    <GBSeq_moltype>DNA</GBSeq_moltype>
    <GBSeq_topology>linear</GBSeq_topology>
    <GBSeq_division>MAM</GBSeq_division>
    <GBSeq_update-date>22-JUN-1992</GBSeq_update-date>
    <GBSeq_create-date>13-JUL-1990</GBSeq_create-date>
    <GBSeq_definition>Blue Whale heavy satellite DNA</GBSeq_definition>
    <GBSeq_primary-accession>X53813</GBSeq_primary-accession>
    <GBSeq_accession-version>X53813.1</GBSeq_accession-version>
    <GBSeq_other-seqids>
      <GBSeqid>emb|X53813.1|</GBSeqid>
      <GBSeqid>gi|25</GBSeqid>
    </GBSeq_other-seqids>
    <GBSeq_secondary-accessions>
      <GBSecondary-accn>X17460</GBSecondary-accn>
    </GBSeq_secondary-accessions>
    <GBSeq_keywords>
      <GBKeyword>satellite DNA</GBKeyword>
    </GBSeq_keywords>
    <GBSeq_source>Balaenoptera musculus (Blue whale)</GBSeq_source>
    <GBSeq_organism>Balaenoptera musculus</GBSeq_organism>
    <GBSeq_taxonomy>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mam
actyla; Cetacea; Mysticeti; Balaenopteridae; Balaenoptera</GBSeq_taxonomy>
17. Formats
XML (DTD)
http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.mod.dtd
<!ELEMENT GBSeq (
  GBSeq_locus ,
  GBSeq_length ,
  GBSeq_strandedness? ,
  GBSeq_moltype ,
  GBSeq_topology? ,
  GBSeq_division ,
  GBSeq_update-date ,
  GBSeq_create-date? ,
  GBSeq_update-release? ,
  GBSeq_create-release? ,
  GBSeq_definition ,
  GBSeq_primary-accession? ,
  GBSeq_entry-version? ,
  GBSeq_accession-version? ,
  GBSeq_other-seqids? ,
  GBSeq_secondary-accessions? ,
  GBSeq_project? ,
  GBSeq_keywords? ,
  GBSeq_segment? ,
  GBSeq_source? ,
  GBSeq_organism? ,
  GBSeq_taxonomy? ,
  GBSeq_references? ,
  GBSeq_comment? ,
  GBSeq_comment-set? ,
  GBSeq_struc-comments? ,
  (...)
21. E-Utilities
Set of seven server-side programs that provide a stable interface to
the search, retrieval, and linking functions of the Entrez system,
using a fixed URL syntax.
The output provided by the E-Utilities is in XML format,
sometimes JSON, (...)
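The "fixed URL syntax" can be sketched in a few lines of Python: every call is the same base address, a program name ending in .fcgi, and a query string. This is only an illustration built from the URLs shown in this deck; the helper name eutils_url is hypothetical and is not part of any NCBI library.

```python
from urllib.parse import urlencode

# Base address shared by the E-utility URLs shown in this deck.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(tool, **params):
    """Build the URL for one E-utility call (hypothetical helper).

    tool   -- program name without the .fcgi suffix, e.g. "efetch"
    params -- query parameters such as db, id, term, retmode
    """
    url = f"{EUTILS_BASE}/{tool}.fcgi"
    query = urlencode(params)  # percent-encodes the values where needed
    return f"{url}?{query}" if query else url


print(eutils_url("efetch", db="nucleotide", id=25, rettype="gb"))
```

The printed URL is the GenBank request used on the "Formats" slide. Nothing is downloaded here; the string can be handed to curl, wget or any HTTP client.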
22. Entrez Direct
http://www.ncbi.nlm.nih.gov/books/NBK179288/
"Entrez Direct (EDirect) is an advanced method for accessing the NCBI's
set of interconnected databases (publication, sequence, structure,
gene, variation, expression, etc.) from a UNIX terminal window.
Functions take search terms from command-line arguments.
Individual operations are combined to build multi-step queries.
Record retrieval and formatting normally complete the process."
24. EInfo
Provides a list of the names of all valid Entrez databases.
Provides statistics for a single database, including lists of indexing
fields and available link names.
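A minimal sketch of reading the database list from an EInfo XML answer with Python's standard library. The embedded sample is a shortened stand-in for the real response (three of the database names listed in this deck), and the function name database_names is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the answer of einfo.fcgi (no network access needed).
SAMPLE = """<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>nuccore</DbName>
  </DbList>
</eInfoResult>"""


def database_names(xml_text):
    """Return the text of every DbName element under DbList."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./DbList/DbName")]


print(database_names(SAMPLE))
```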
26. EInfo
XML Output
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>nuccore</DbName>
    <DbName>nucleotide</DbName>
    <DbName>nucgss</DbName>
    <DbName>nucest</DbName>
    <DbName>structure</DbName>
    <DbName>genome</DbName>
    <DbName>assembly</DbName>
    <DbName>gcassembly</DbName>
    <DbName>genomeprj</DbName>
    <DbName>bioproject</DbName>
    <DbName>biosample</DbName>
    <DbName>biosystems</DbName>
    <DbName>blastdbinfo</DbName>
27. EInfo
JSON Output
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?retmode=json
{
  "header": {
    "type": "einfo",
    "version": "0.3"
  },
  "einforesult": {
    "dblist": [
      "pubmed",
      "protein",
      "nuccore",
      (...)
      "unigene",
      "gencoll",
      "gtr"
    ]
  }
}
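The JSON form needs no XML parser at all. The sketch below assumes the key names shown on this slide ("einforesult", "dblist") and uses a shortened sample string instead of a live request; the function name is hypothetical.

```python
import json

# Shortened stand-in for the answer of einfo.fcgi?retmode=json.
SAMPLE = """{
  "header": {"type": "einfo", "version": "0.3"},
  "einforesult": {"dblist": ["pubmed", "protein", "nuccore"]}
}"""


def database_names_json(text):
    """Return the list stored under einforesult/dblist."""
    return json.loads(text)["einforesult"]["dblist"]


print(database_names_json(SAMPLE))
```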
28. EInfo
Return statistics for a given Entrez database:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=DbName
29. EInfo
Statistics for Pubmed
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed
<?xml version="1.0"?>
<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <MenuName>PubMed</MenuName>
    <Description>PubMed bibliographic record</Description>
    <DbBuild>Build130805-2117m.4</DbBuild>
    <Count>22974581</Count>
    <LastUpdate>2013/08/06 08:33</LastUpdate>
    <FieldList>
      (...)
      <Field>
        <Name>UID</Name>
        <FullName>UID</FullName>
        <Description>Unique number assigned to publication</Description>
        <TermCount>0</TermCount>
        <IsDate>N</IsDate>
        <IsNumerical>Y</IsNumerical>
        <SingleToken>Y</SingleToken>
        <Hierarchy>N</Hierarchy>
        <IsHidden>Y</IsHidden>
      </Field>
      <Field>
      (...)
30. EInfo
Statistics for Pubmed
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed&retmode=json
{
  "header": {
    "type": "einfo",
    "version": "0.3"
  },
  "einforesult": {
    "dbinfo": {
      "dbname": "pubmed",
      "menuname": "PubMed",
      "description": "PubMed bibliographic record",
      "dbbuild": "Build160921-2207m.6",
      "count": "26470199",
      "lastupdate": "2016/09/22 16:32",
      "fieldlist": [
        {
          "name": "ALL",
          "fullname": "All Fields",
          "description": "All terms from all searchable fields",
          "termcount": "179424126",
          "isdate": "N",
          "isnumerical": "N",
          "singletoken": "N",
          "hierarchy": "N",
          "ishidden": "N"
        },
        {
          "name": "UID",
          "fullname": "UID",
          "description": "Unique number assigned to publication",
31. EInfo
With entrez-direct
$ einfo -dbs
$ einfo -db pubmed
33. GQuery
Provides the number of records retrieved in all Entrez databases by
a single text query.
34. GQuery
Example
$ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml"
<Result>
  <Term>tyrannosaurus rex</Term>
  <eGQueryResult>
    <ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>mesh</DbName><MenuName/><Count>15</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>books</DbName><MenuName/><Count>179</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>pubmedhealth</DbName><MenuName/><Count>21</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>omim</DbName><MenuName/><Count>10</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>omia</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
    <ResultItem><DbName>ncbisearch</DbName><MenuName/><Count>1</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>nuccore</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
    (...)
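As a rough illustration of consuming this answer without XSLT, the Python sketch below keeps only the databases whose Status is "Ok" and maps each database name to its hit count. The sample is a shortened copy of the output above; the function name hit_counts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the GQuery answer shown on this slide.
SAMPLE = """<Result>
  <Term>tyrannosaurus rex</Term>
  <eGQueryResult>
    <ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>omia</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
  </eGQueryResult>
</Result>"""


def hit_counts(xml_text):
    """Map database name -> record count for items whose Status is 'Ok'."""
    root = ET.fromstring(xml_text)
    counts = {}
    for item in root.findall("./eGQueryResult/ResultItem"):
        if item.findtext("Status") == "Ok":
            counts[item.findtext("DbName")] = int(item.findtext("Count"))
    return counts


print(hit_counts(SAMPLE))
```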
35. GQuery
Transforming to HTML using XSLT
The XSLT stylesheet: https://raw.githubusercontent.com/lindenb/courses/master/about.ncbi/gquery2html.xsl
<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="html"/>

<xsl:template match="/"><html><body>
<xsl:apply-templates select="Result"/>
</body></html></xsl:template>

<xsl:template match="Result">
<table><caption><xsl:value-of select="Term"/></caption>
<tr><th>Database</th><th>Count</th><th>Status</th></tr>
<xsl:apply-templates select="eGQueryResult/ResultItem"/>
</table>
</xsl:template>

<xsl:template match="ResultItem">
<tr>
<td><a>
<xsl:attribute name="href">http://www.ncbi.nlm.nih.gov/<xsl:value-of select="DbName"/>?cmd=search&amp;term=<xsl:value-of select="translate(/Result/Term,' ','+')"/></xsl:attribute>
<xsl:value-of select="DbName"/></a></td>
<td><xsl:value-of select="Count"/></td>
<td><xsl:value-of select="Status"/></td>
</tr>
</xsl:template>

</xsl:stylesheet>
36. GQuery
Transforming to HTML
$ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml" |
  xsltproc gquery2html.xsl -
<html>
<body>
<table>
<caption>tyrannosaurus rex</caption>
<tr>
<th>Database</th>
<th>Count</th>
<th>Status</th>
</tr>
<tr>
<td>
<a href="https://www.ncbi.nlm.nih.gov/pubmed?cmd=search&amp;term=tyrannosaurus
</td>
<td>41</td>
<td>Ok</td>
</tr>
<tr>
<td>
<a href="https://www.ncbi.nlm.nih.gov/pmc?cmd=search&amp;term=tyrannosaurus+re
</td>
<td>160</td>
<td>Ok</td>
</tr>
<tr>
<td>
<a href="https://www.ncbi.nlm.nih.gov/mesh?cmd=search&amp;term=tyrannosaurus+r
</td>
<td>15</td>
38. ESearch
Provides a list of UIDs matching a text query
Posts the results of a search on the History server
Downloads all UIDs from a dataset stored on the History server
Combines or limits UID datasets stored on the History server
Sorts sets of UIDs
40. ESearch
Searching for ’Mammuthus primigenius’
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D" |
  xmllint --format -
<eSearchResult>
  <Count>684</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>507866428</Id>
    <Id>124056416</Id>
    <Id>383843869</Id>
    <Id>383843867</Id>
    <Id>383843865</Id>
    <Id>383843863</Id>
    <Id>383843861</Id>
    <Id>383843859</Id>
    <Id>383843857</Id>
    <Id>383843855</Id>
    <Id>383843853</Id>
    <Id>383843851</Id>
    <Id>383843849</Id>
    <Id>383843847</Id>
    <Id>383843845</Id>
    <Id>157367690</Id>
    <Id>157367676</Id>
    <Id>157367662</Id>
    <Id>157367648</Id>
    <Id>157367634</Id>
  </IdList>
  <TranslationSet>
    <Translation>
41. ESearch
Searching for ’Mammuthus primigenius’ (JSON)
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmode=json"
{
  "header": {
    "type": "esearch",
    "version": "0.3"
  },
  "esearchresult": {
    "count": "811",
    "retmax": "20",
    "retstart": "0",
    "idlist": [
      "1059791223",
      "198241525",
      "198241523",
      "198241521",
      "198241519",
      "198241517",
      "198241515",
      "198241513",
      "198241511",
      "198241509",
      "198241507",
      "198241505",
      "198241503",
      "198241501",
      "198241499",
      "198241497",
      "198241495",
      "198241493",
      "198241491",
42. ESearch
the retmax parameter
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=2" |
  xmllint --format -
<eSearchResult>
  <Count>684</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>507866428</Id>
    <Id>124056416</Id>
  </IdList>
  <TranslationSet>
    <Translation>
      <From>"Mammuthus primigenius"[ORGN]</From>
      <To>"Mammuthus primigenius"[Organism]</To>
    </Translation>
  </TranslationSet>
  <TranslationStack>
    <TermSet>
      <Term>"Mammuthus primigenius"[Organism]</Term>
      <Field>Organism</Field>
      <Count>684</Count>
      <Explode>Y</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>
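A small Python sketch of pulling the total count and the UID list out of such an answer. It works on a shortened copy of the output above, so no request is sent; parse_esearch is a hypothetical name. Note that Count also occurs inside TranslationStack, so the code reads only the Count that is a direct child of the root element.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the ESearch answer above (retmax=2).
SAMPLE = """<eSearchResult>
  <Count>684</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>507866428</Id>
    <Id>124056416</Id>
  </IdList>
  <TranslationStack>
    <TermSet><Count>684</Count></TermSet>
  </TranslationStack>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Return (total hit count, list of UIDs on this page)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("Count"))  # direct child only, not TermSet/Count
    uids = [node.text for node in root.findall("./IdList/Id")]
    return total, uids


print(parse_esearch(SAMPLE))
```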
43. ESearch
the retstart parameter
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=3&retstart=100" |
  xmllint --format -
<eSearchResult>
  <Count>684</Count>
  <RetMax>3</RetMax>
  <RetStart>100</RetStart>
  <IdList>
    <Id>300810656</Id>
    <Id>300810655</Id>
    <Id>300810654</Id>
  </IdList>
  <TranslationSet>
    <Translation>
      <From>"Mammuthus primigenius"[ORGN]</From>
      <To>"Mammuthus primigenius"[Organism]</To>
    </Translation>
  </TranslationSet>
  <TranslationStack>
    <TermSet>
      <Term>"Mammuthus primigenius"[Organism]</Term>
      <Field>Organism</Field>
      <Count>684</Count>
      <Explode>Y</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>
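Together, retstart and retmax allow paging through a full result set. The sketch below only computes the (retstart, retmax) windows and the matching ESearch URLs; it sends nothing. The page size of 200 and both function names are assumptions chosen for this example.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def page_windows(count, page_size):
    """Yield (retstart, retmax) pairs that together cover `count` records."""
    for retstart in range(0, count, page_size):
        yield retstart, min(page_size, count - retstart)


def page_urls(db, term, count, page_size):
    """Yield one ESearch URL per page of results."""
    for retstart, retmax in page_windows(count, page_size):
        query = urlencode(
            {"db": db, "term": term, "retstart": retstart, "retmax": retmax}
        )
        yield f"{ESEARCH}?{query}"


# 684 is the Count reported in the answer above.
for url in page_urls("nucleotide", '"Mammuthus primigenius"[ORGN]', 684, 200):
    print(url)
```

In practice the count would come from a first ESearch call (for example with rettype=count) before the loop is started.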
44. ESearch
rettype=count
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&rettype=count" |
  xmllint --format -
<eSearchResult>
  <Count>684</Count>
</eSearchResult>
45. ESearch
sort=Date Released
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&sort=Date+Released" |
  xmllint --format -
<eSearchResult><Count>811</Count><RetMax>20</RetMax>
<Id>1033204644</Id>
<Id>1033204658</Id>
<Id>1033204672</Id>
<Id>1033204686</Id>
<Id>1033204729</Id>
<Id>1033204771</Id>
<Id>1033204785</Id>
<Id>1033204799</Id>
<Id>1033204813</Id>
<Id>1033204827</Id>
<Id>1033204871</Id>
<Id>1033205124</Id>
<Id>1033205194</Id>
47. ESummary
Syntax
Returns document summaries (DocSums) for a list of input
UIDs
Returns DocSums for a set of UIDs stored on the Entrez
History server
49. ESummary
Retrieve nucleotide gi=507866428
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428"
<eSummaryResult>
  <DocSum>
    <Id>507866428</Id>
    <Item Name="Caption" Type="String">KC524742</Item>
    <Item Name="Title" Type="String">Mammuthus primigenius isolate CME2005/915 myoglobin (Mb
    <Item Name="Extra" Type="String">gi|507866428|gb|KC524742.1|[507866428]</Item>
    <Item Name="Gi" Type="Integer">507866428</Item>
    <Item Name="CreateDate" Type="String">2013/06/15</Item>
    <Item Name="UpdateDate" Type="String">2013/06/21</Item>
    <Item Name="Flags" Type="Integer">0</Item>
    <Item Name="TaxId" Type="Integer">37349</Item>
    <Item Name="Length" Type="Integer">9042</Item>
    <Item Name="Status" Type="String">live</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
    <Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
  </DocSum>
</eSummaryResult>
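A DocSum is essentially a list of named items, so it maps naturally onto a dictionary. The Python sketch below does this for flat items only: nested Type="List" items, such as the PubMed author list, would need extra handling. The sample is a shortened copy of the answer above and the function name docsums is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the ESummary answer above.
SAMPLE = """<eSummaryResult>
  <DocSum>
    <Id>507866428</Id>
    <Item Name="Caption" Type="String">KC524742</Item>
    <Item Name="TaxId" Type="Integer">37349</Item>
    <Item Name="Length" Type="Integer">9042</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
  </DocSum>
</eSummaryResult>"""


def docsums(xml_text):
    """Map each DocSum Id to a {item name: item text} dictionary."""
    root = ET.fromstring(xml_text)
    result = {}
    for doc in root.findall("DocSum"):
        uid = doc.findtext("Id")
        # Only direct Item children are read; empty items become "".
        result[uid] = {
            item.get("Name"): (item.text or "") for item in doc.findall("Item")
        }
    return result


print(docsums(SAMPLE)["507866428"]["Caption"])
```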
50. ESummary
Retrieve nucleotide gi=507866428 in JSON
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428&retmode=json"
{
  "header": {
    "type": "esummary",
    "version": "0.3"
  },
  "result": {
    "uids": [
      "507866428"
    ],
    "507866428": {
      "uid": "507866428",
      "caption": "KC524742",
      "title": "Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, par
      "extra": "gi|507866428|gb|KC524742.1|",
      "gi": 507866428,
      "createdate": "2013/06/15",
      "updatedate": "2013/06/21",
      "flags": "",
      "taxid": 37349,
      "slen": 9042,
      "biomol": "genomic",
      "moltype": "dna",
      "topology": "linear",
      "sourcedb": "insd",
      "segsetsize": "",
      "projectid": "0",
      (...)
51. ESummary
Retrieve snp rs25
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=snp&id=25"
<eSummaryResult>
  <DocSum>
    <Id>25</Id>
    <Item Name="SNP_ID" Type="Integer">25</Item>
    <Item Name="Organism" Type="String"></Item>
    <Item Name="ALLELE_ORIGIN" Type="String"></Item>
    <Item Name="GLOBAL_MAF" Type="String">0.4913</Item>
    <Item Name="GLOBAL_POPULATION" Type="String"></Item>
    <Item Name="GLOBAL_SAMPLESIZE" Type="Integer">0</Item>
    <Item Name="SUSPECTED" Type="String"></Item>
    <Item Name="CLINICAL_SIGNIFICANCE" Type="String"></Item>
    <Item Name="GENE" Type="String">THSD7A</Item>
    <Item Name="LOCUS_ID" Type="Integer">221981</Item>
    <Item Name="ACC" Type="String">NM_015204.2,NT_007819.17</Item>
    <Item Name="CHR" Type="String">7</Item>
    <Item Name="WEIGHT" Type="Integer">1</Item>
    <Item Name="HANDLE" Type="String">1000GENOMES,BGI,BL,BUSHMAN,COMPLETE_GENOMICS,CSHL-HAPM
    <Item Name="FXN_CLASS" Type="String">intron-variant</Item>
    <Item Name="VALIDATED" Type="String">by-1000G,by-cluster,by-frequency,by-hapmap</Item>
    <Item Name="GTYPE" Type="String">true</Item>
    <Item Name="NONREF" Type="String">false</Item>
    <Item Name="DOCSUM" Type="String">HGVS=NC_000007.13:g.11584142T&gt;C,NG_027670.1:g.29268
    <Item Name="HET" Type="Integer">50</Item>
    <Item Name="SRATE" Type="Integer">0</Item>
    <Item Name="TAX_ID" Type="Integer">9606</Item>
    <Item Name="CHRRPT" Type="String">25|2|0|1|1|1|7|NT_007819.17|11574141|11584142|THSD7A|0
    <Item Name="ORIG_BUILD" Type="Integer">36</Item>
    <Item Name="UPD_BUILD" Type="Integer">138</Item>
    <Item Name="CREATEDATE" Type="String">2000-09-19 17:02</Item>
52. ESummary
Retrieve pubmed pmid=7939126
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=7939126"
<eSummaryResult>
  <DocSum>
    <Id>7939126</Id>
    <Item Name="PubDate" Type="Date">1994 Apr</Item>
    <Item Name="EPubDate" Type="Date"></Item>
    <Item Name="Source" Type="String">Sleep</Item>
    <Item Name="AuthorList" Type="List">
      <Item Name="Author" Type="String">Broughton R</Item>
      <Item Name="Author" Type="String">Billings R</Item>
      <Item Name="Author" Type="String">Cartwright R</Item>
      <Item Name="Author" Type="String">Doucette D</Item>
      <Item Name="Author" Type="String">Edmeads J</Item>
      <Item Name="Author" Type="String">Edwardh M</Item>
      <Item Name="Author" Type="String">Ervin F</Item>
      <Item Name="Author" Type="String">Orchard B</Item>
      <Item Name="Author" Type="String">Hill R</Item>
      <Item Name="Author" Type="String">Turrell G</Item>
    </Item>
    <Item Name="LastAuthor" Type="String">Turrell G</Item>
    <Item Name="Title" Type="String">Homicidal somnambulism: a case report.</Item>
    <Item Name="Volume" Type="String">17</Item>
    <Item Name="Issue" Type="String">3</Item>
    <Item Name="Pages" Type="String">253-64</Item>
    <Item Name="LangList" Type="List">
      <Item Name="Lang" Type="String">English</Item>
    </Item>
    <Item Name="NlmUniqueID" Type="String">7809084</Item>
    <Item Name="ISSN" Type="String">0161-8105</Item>
    <Item Name="ESSN" Type="String">1550-9109</Item>
55. EFetch
Retrieve nucleotide gi=507866428 as ASN.1
Default: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428
Seq-entry ::= set {
  class nuc-prot ,
  descr {
    source {
      genome genomic ,
      org {
        taxname "Mammuthus primigenius" ,
        common "woolly mammoth" ,
        db {
          {
            db "taxon" ,
            tag
              id 37349 } } ,
        orgname {
          name
            binomial {
              genus "Mammuthus" ,
              species "primigenius" } ,
          mod {
            {
56. EFetch
Retrieve nucleotide gi=507866428 as Fasta
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=fasta
>gi|507866428|gb|KC524742.1| Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds
GCACTTGCTTTTTTTGTCTTCTTCAGACCACGACATGGGACTCAGCGACGGGGAATGGGAGTTGGTGTTG
AAAACCTGGGGGAAAGTGGAGGCTGACATCCCGGGCCATGGGCTGGAAGTCTTCGTCAGGTAAAGGAAGA
AATCCTGTGGCCCCCATCACCCACCCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
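FASTA is simple enough to read by hand: a ">" line starts a record and the lines after it are joined into one sequence. The Python sketch below follows that rule on a tiny made-up sample (not the mammoth record); the function name read_fasta is hypothetical, and a dedicated parser would be preferable for real work.

```python
def read_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]  # drop the leading ">"
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence line of the current record
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up sample, for illustration only.
SAMPLE = ">seq1 demo\nACGT\nNNAC\n>seq2\nGG\n"

for name, sequence in read_fasta(SAMPLE):
    print(name, len(sequence), sequence.count("N"))
```

Counting N characters, as the loop does, is a quick way to see how much of a record is unresolved sequence.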
57. EFetch
Retrieve nucleotide gi=507866428 as TinySeq
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=fasta&retmode=xml
<?xml version="1.0"?>
<!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN"
<TSeqSet>
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_gi>507866428</TSeq_gi>
  <TSeq_accver>KC524742.1</TSeq_accver>
  <TSeq_taxid>37349</TSeq_taxid>
  <TSeq_orgname>Mammuthus primigenius</TSeq_orgname>
  <TSeq_defline>Mammuthus primigenius isolate CME2
  <TSeq_length>9042</TSeq_length>
  <TSeq_sequence>GCACTTGCTTTTTTTGTCTTCTTCAGACCACGA
</TSeq>
</TSeqSet>
58. EFetch
Retrieve nucleotide gi=507866428 as Genbank-xml
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&retmode=xml
<GBSeq>
  <GBSeq_locus>KC524742</GBSeq_locus>
  <GBSeq_length>9042</GBSeq_length>
  <GBSeq_strandedness>double</GBSeq_strandedness>
  <GBSeq_moltype>DNA</GBSeq_moltype>
  <GBSeq_topology>linear</GBSeq_topology>
  <GBSeq_division>MAM</GBSeq_division>
  <GBSeq_update-date>21-JUN-2013</GBSeq_update-date>
  <GBSeq_create-date>15-JUN-2013</GBSeq_create-date>
  <GBSeq_definition>Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, parti
  <GBSeq_primary-accession>KC524742</GBSeq_primary-accession>
  <GBSeq_accession-version>KC524742.1</GBSeq_accession-version>
  <GBSeq_other-seqids>
    <GBSeqid>gb|KC524742.1|</GBSeqid>
    <GBSeqid>gi|507866428</GBSeqid>
  </GBSeq_other-seqids>
  <GBSeq_source>Mammuthus primigenius (woolly mammoth)</GBSeq_source>
  <GBSeq_organism>Mammuthus primigenius</GBSeq_organism>
(...)
59. EFetch
Retrieve nucleotide gi=507866428 as Genbank
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=gb
LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene,
            partial cds.
ACCESSION   KC524742
VERSION     KC524742.1  GI:507866428
KEYWORDS    .
SOURCE      Mammuthus primigenius (woolly mammoth)
  ORGANISM  Mammuthus primigenius
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae;
            Mammuthus.
REFERENCE   1  (bases 1 to 9042)
  AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L.
            and Berenbrink,M.
  TITLE     Evolution of mammalian diving capacity traced by myoglobin net
            surface charge
  JOURNAL   Science 340 (6138), 1234192 (2013)
   PUBMED   23766330
REFERENCE   2  (bases 1 to 9042)
  AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of
            Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..9042
                     /organism="Mammuthus primigenius"
60. EFetch
EFetch also accepts accession numbers in place of the gi
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=KC524742&rettype=gb
LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene,
            partial cds.
ACCESSION   KC524742
VERSION     KC524742.1  GI:507866428
KEYWORDS    .
SOURCE      Mammuthus primigenius (woolly mammoth)
  ORGANISM  Mammuthus primigenius
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae;
            Mammuthus.
REFERENCE   1  (bases 1 to 9042)
  AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L.
            and Berenbrink,M.
  TITLE     Evolution of mammalian diving capacity traced by myoglobin net
            surface charge
  JOURNAL   Science 340 (6138), 1234192 (2013)
   PUBMED   23766330
REFERENCE   2  (bases 1 to 9042)
  AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of
            Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..9042
                     /organism="Mammuthus primigenius"
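The EFetch requests on the previous slides differ only in the id, rettype and retmode parameters. A minimal sketch of that pattern as a shell helper: `efetch_url` is a hypothetical name introduced here, not an NCBI tool, and the script only prints URLs without contacting the server.

```shell
#!/bin/sh
# Base URL of the E-utilities, copied from the slides.
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# efetch_url DB ID [RETTYPE] [RETMODE]
# Hypothetical helper (not part of E-utilities): prints an EFetch URL
# and leaves out rettype/retmode when they are empty or missing.
efetch_url() {
    url="${EUTILS}/efetch.fcgi?db=$1&id=$2"
    if [ -n "${3:-}" ]; then url="${url}&rettype=$3"; fi
    if [ -n "${4:-}" ]; then url="${url}&retmode=$4"; fi
    printf '%s\n' "$url"
}

efetch_url nucleotide 507866428              # default output (ASN.1)
efetch_url nucleotide 507866428 fasta        # FASTA
efetch_url nucleotide 507866428 fasta xml    # TinySeq XML
efetch_url nucleotide KC524742 gb            # GenBank, by accession number
```

Each printed URL can be passed to curl or wget exactly as on the slides.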
61. EFetch
Using the WebEnv parameter.
Web environment string returned from a previous ESearch, EPost
or ELink call. When provided, ESearch will post the results of the
search operation to this pre-existing WebEnv.
62. EFetch
Using the WebEnv parameter.
Searching extinct species in the NCBI taxonomy ('extinct[PROP]')
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?usehistory=y&db=taxonomy&term=extinct%5BPROP%5D"
<eSearchResult>
  <Count>145</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
  <IdList>
    <Id>1225531</Id>
    <Id>1225530</Id>
    <Id>1211276</Id>
    <Id>1211275</Id>
    <Id>1027716</Id>
    <Id>948961</Id>
    <Id>943952</Id>
    <Id>867394</Id>
    <Id>867393</Id>
    <Id>748142</Id>
    <Id>748141</Id>
    <Id>741158</Id>
    <Id>703576</Id>
    <Id>703571</Id>
    <Id>703559</Id>
    <Id>693865</Id>
    <Id>686441</Id>
    <Id>665113</Id>
    <Id>659069</Id>
    <Id>656807</Id>
63. EFetch
Using the WebEnv parameter.
Fetch the extinct species in the NCBI taxonomy ('extinct[PROP]') using the WebEnv parameter.
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&query_key=1&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&retmode=xml"
<TaxaSet><Taxon>
  <TaxId>1225531</TaxId>
  <ScientificName>Equus ovodovi</ScientificName>
  <OtherNames>
    <Synonym>Equus (Sussemionus) ovodovi</Synonym>
    <Name>
      <ClassCDE>authority</ClassCDE>
      <DispName>Equus ovodovi Eisenmann &amp; Sergej, 2011</DispName>
    </Name>
  </OtherNames>
  <ParentTaxId>1225530</ParentTaxId>
  <Rank>species</Rank>
  <Division>Mammals</Division>
  <GeneticCode>
    <GCId>1</GCId>
    <GCName>Standard</GCName>
  </GeneticCode>
  <MitoGeneticCode>
(...)
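The ESearch answer carries the two values EFetch needs: WebEnv and QueryKey. A minimal sketch of chaining the two calls, run here on a canned answer so it needs no network. `xml_value` and `history_efetch_url` are hypothetical helpers introduced for this example, and the sed expression assumes simple elements without attributes.

```shell
#!/bin/sh
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# xml_value TAG
# Prints the text of the first <TAG>...</TAG> element read from stdin.
# Assumes the element has no attributes and no nested tags.
xml_value() {
    sed -n "s:.*<$1>\([^<]*\)</$1>.*:\1:p" | head -n 1
}

# history_efetch_url DB WEBENV QUERY_KEY
# Hypothetical helper: prints the EFetch URL that reads from the History server.
history_efetch_url() {
    printf '%s/efetch.fcgi?db=%s&query_key=%s&WebEnv=%s&retmode=xml\n' \
        "$EUTILS" "$1" "$3" "$2"
}

# Canned ESearch answer, shortened from the slide.
answer='<eSearchResult><Count>145</Count><QueryKey>1</QueryKey>
<WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
</eSearchResult>'

webenv=$(printf '%s\n' "$answer" | xml_value WebEnv)
key=$(printf '%s\n' "$answer" | xml_value QueryKey)
history_efetch_url taxonomy "$webenv" "$key"
```

In a live session, `answer` would be the output of the curl ESearch call with usehistory=y.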
65. EPost
Uploads a list of UIDs to the Entrez History server
Appends a list of UIDs to an existing set of UID lists attached to a Web Environment
66. EPost
Post UIDs to EPost
Get a list of taxonomy IDs of extinct animals:
wget -O - 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=extinct[PROP]&retmax=1000' |
xmllint --format - |
grep '<Id>' |
cut -d '<' -f 2 |
cut -d '>' -f 2 |
tr "\n" ","
output:
1860150,1860149,1849957,1825730,1825729,1636722,1607772,1607771,1607767,1607757,1607756...
67. EPost
Post UIDs to EPost
wget -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=taxonomy&WebEnv=NCID_1_15435144_130.14.22.215_9001_1474637318_669113391_0MetA0_S_MegaStore_F_1&id=1860150,1860149,1849957,1825730,1825729,1636722,1607772..."
Output:
<?xml version="1.0"?>
<!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ePost_020511.dtd">
<ePostResult>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_15467192_130.14.22.215_9001_1474637456_570452194_0MetA0_S_MegaStore_F_1</WebEnv>
</ePostResult>
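Joining the IDs with tr turns the final newline into a comma too, so the list ends with a stray ",". A minimal sketch of an alternative that uses paste, tested on a canned IdList. `extract_ids`, `join_csv` and `epost_url` are hypothetical helper names introduced here.

```shell
#!/bin/sh
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# extract_ids: prints the content of every <Id> element on stdin, one per line.
# Tags are split onto their own lines first, so pretty-printed and
# single-line XML both work.
extract_ids() {
    tr '<' '\n' | sed -n 's:^Id>\([0-9][0-9]*\).*:\1:p'
}

# join_csv: joins the lines of stdin with commas, with no trailing comma.
join_csv() {
    paste -s -d, -
}

# epost_url DB CSV_IDS
# Hypothetical helper: prints the EPost URL for a comma-separated UID list.
epost_url() {
    printf '%s/epost.fcgi?db=%s&id=%s\n' "$EUTILS" "$1" "$2"
}

ids=$(printf '<IdList>\n<Id>1860150</Id>\n<Id>1860149</Id>\n<Id>1849957</Id>\n</IdList>\n' |
      extract_ids | join_csv)
epost_url taxonomy "$ids"
```

For long UID lists a POST request is the safer choice, since very long URLs may be rejected.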
68. EPost
Searching in the WebEnv
Search Homo Sapiens in WebEnv?
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Homo%20Sapiens&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"
<eSearchResult>
  <Count>0</Count>
  <RetMax>0</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>8</QueryKey>
  <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
  <IdList/>
  <TranslationSet/>
  <TranslationStack>
    <OP>GROUP</OP>
    <TermSet>
      <Term>homo sapiens[All Names]</Term>
      <Field>All Names</Field>
      <Count>0</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>AND</OP>
  </TranslationStack>
  <QueryTranslation>(#2) AND homo sapiens[All Names]</QueryTranslation>
</eSearchResult>
69. EPost
Searching in the WebEnv
Search Tyrannosaurus in WebEnv?
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Tyrannosaurus&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"
<eSearchResult>
  <Count>1</Count>
  <RetMax>1</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>9</QueryKey>
  <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
  <IdList>
    <Id>436494</Id>
  </IdList>
  <TranslationSet/>
  <TranslationStack>
    <OP>GROUP</OP>
    <TermSet>
      <Term>Tyrannosaurus[All Names]</Term>
      <Field>All Names</Field>
      <Count>1</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>AND</OP>
  </TranslationStack>
  <QueryTranslation>(#2) AND Tyrannosaurus[All Names]</QueryTranslation>
</eSearchResult>
71. Piping Edirect
esearch -db taxonomy -query "Tyrannosaurus" |
efetch -format xml
72. Piping Edirect
esearch -db pubmed -query "Tyrannosaurus" |
efilter -mindate 2005 |
efetch -format docsum |
xtract -pattern DocumentSummary \
  -element MedlineCitation/PMID \
  -element Id SortFirstAuthor
74. Elink
Returns UIDs linked to an input set of UIDs in either the same or a different Entrez database
Returns UIDs linked to other UIDs in the same Entrez database that match an Entrez query
Checks for the existence of Entrez links for a set of UIDs within the same database
Lists the available links for a UID
Lists LinkOut URLs and attributes for a set of UIDs
Lists hyperlinks to primary LinkOut providers for a set of UIDs
Creates hyperlinks to the primary LinkOut provider for a single UID
76. ELink
Searching the PubMed records associated with sequence gi:507866428
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=pubmed&id=507866428&cmd=neighbor_score
<eLinkResult>
  <LinkSet>
    <DbFrom>nuccore</DbFrom>
    <IdList>
      <Id>507866428</Id>
    </IdList>
    <LinkSetDb>
      <DbTo>pubmed</DbTo>
      <LinkName>nuccore_pubmed</LinkName>
      <Link>
        <Id>23766330</Id>
        <Score>0</Score>
      </Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23766330&rettype=medline&retmode=text"
PMID- 23766330
TI  - Evolution of mammalian diving capacity traced by myoglobin net surface
      charge.
PG  - 1234192
LID - 10.1126/science.1234192 [doi]
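The ELink answer contains two kinds of Id elements: the query UID inside IdList and the linked UIDs inside Link. A minimal sketch that keeps only the linked ones, tested on a canned copy of the answer above. `linked_ids` is a hypothetical helper name introduced here.

```shell
#!/bin/sh
# linked_ids: prints only the <Id> values found inside <Link> elements,
# so the query UID listed in <IdList> is not reported as a link.
linked_ids() {
    tr '<' '\n' | awk '
        /^Link>/         { inside = 1 }
        /^\/Link>/       { inside = 0 }
        inside && /^Id>/ { sub(/^Id>/, ""); print }
    '
}

# Canned ELink answer, condensed from the slide.
answer='<eLinkResult><LinkSet><DbFrom>nuccore</DbFrom>
<IdList><Id>507866428</Id></IdList>
<LinkSetDb><DbTo>pubmed</DbTo><LinkName>nuccore_pubmed</LinkName>
<Link><Id>23766330</Id><Score>0</Score></Link>
</LinkSetDb></LinkSet></eLinkResult>'

printf '%s\n' "$answer" | linked_ids
```

The printed PMID is the one passed to EFetch in the second command of the slide.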
78. Efetch
Transforming to SVG
Using the stylesheet
https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/gb2svg.xsl
xsltproc <(curl "https://raw.github.com/lindenb/xslt-sandbox/master/stylesheets/bio/ncbi/gb2svg.xsl") \
  "https://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=14971102&retmode=xml&rettype=gbc"
79. Efetch
Transforming to SVG
<?xml version="1.0" encoding="UTF-8"?>
<svg:svg xmlns:svg="http://www.w3.org/2000/svg" height="121" width="920" style="stroke-width:1px;">
  <svg:title>Human rotavirus segment 7 NSP3 gene, complete cds</svg:title>
  <svg:defs>
    <svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="grad">
      <svg:stop offset="5%" stop-color="black"/>
      <svg:stop offset="50%" stop-color="whitesmoke"/>
      <svg:stop offset="95%" stop-color="black"/>
    </svg:linearGradient>
    <svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="vertical_body_gradient">
      <svg:stop offset="5%" stop-color="white"/>
      <svg:stop offset="95%" stop-color="lightgray"/>
    </svg:linearGradient>
  </svg:defs>
  <svg:style type="text/css"/>
  <svg:g>
    <svg:g transform="translate(0,0)">
      <svg:rect x="0" y="0" width="920" height="120" fill="url(#vertical_body_gradient)" stroke="black"/>
      <svg:text style="color:red;font-size:35px;" x="10" y="35">Human rotavirus segment 7 NSP3 gene, complete cds</svg:text>
      <svg:g>
        <svg:rect x="10" y="40" width="900" height="18" style="fill:url(#grad);stroke:black;" title="1..1074"/>
        <svg:text y="54" x="460" text-anchor="middle"><svg:tspan style="font-weight:bold;">source</svg:tspan><svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">organism</svg:tspan>:Human rotavirus A <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">mol_type</svg:tspan>:genomic RNA <svg:tspan xmlns:xsi="http://www.
80. Efetch
Transforming to SVG
81. Efetch
Transforming to R
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Tyrannosaurus&usehistory=true" | xmllint --format -
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml"
82. Efetch
Transforming to R
<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="text"/>


<xsl:template match="/">
date2count &lt;- list()
<xsl:apply-templates select="/PubmedArticleSet/PubmedArticle[MedlineCitation/DateCreated/Year]"/>
df &lt;- data.frame(
  Year=as.integer(names(date2count)),
  Count=unlist(date2count)
  )
png('jeterpubmed.png')
plot(df)
title('pubmed: count(articles)=f(year)')
dev.off()
</xsl:template>

<xsl:template match="PubmedArticle">
<xsl:variable name="year" select="MedlineCitation/DateCreated/Year"/>
date2count[["<xsl:value-of select="$year"/>"]] &lt;- ifelse(is.null(date2count[["<xsl:value-of select="$year"/>"]]),1,1+date2count[["<xsl:value-of select="$year"/>"]])
</xsl:template>

</xsl:stylesheet>
83. Efetch
Transforming to R
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml" |
xsltproc pubmed2rstats.xsl -
date2count <- list()
date2count[["2013"]] <- ifelse(is.null(date2count[["2013"]]),1,1+date2count[["2013"]])
date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
(..)
df <- data.frame(
  Year=as.integer(names(date2count)),
  Count=unlist(date2count)
  )
png('jeterpubmed.png')
plot(df)
title('pubmed: count(articles)=f(year)')
dev.off()
84. Efetch
Transforming to R
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml" |
xsltproc pubmed2rstats.xsl - |
R --no-save
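The per-year table that the stylesheet builds for R can also be sketched with standard shell tools, which is handy for checking the counts before plotting. This is an alternative to the XSLT route, not the method of the slides: `count_by_year` is a hypothetical helper, and it assumes the EFetch XML still carries MedlineCitation/DateCreated/Year, the element the stylesheet selects.

```shell
#!/bin/sh
# count_by_year: prints "year<TAB>count" for every DateCreated/Year on stdin.
# Years that appear outside <DateCreated> (for example in PubDate) are ignored.
count_by_year() {
    tr '<' '\n' | awk -F'>' '
        $1 == "DateCreated"    { inside = 1 }
        $1 == "/DateCreated"   { inside = 0 }
        inside && $1 == "Year" { print $2 }
    ' | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Three canned records standing in for the EFetch answer.
printf '%s\n' \
    '<PubmedArticle><MedlineCitation><DateCreated><Year>2013</Year></DateCreated></MedlineCitation></PubmedArticle>' \
    '<PubmedArticle><MedlineCitation><DateCreated><Year>2012</Year></DateCreated></MedlineCitation></PubmedArticle>' \
    '<PubmedArticle><MedlineCitation><DateCreated><Year>2012</Year></DateCreated></MedlineCitation></PubmedArticle>' |
    count_by_year
```

With a live answer, the curl command of the slide would be piped into `count_by_year` in place of the canned records.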
85. Generating a JAVA parser
86. Using the XML schema
XML Schema for dbSNP
ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.ncbi.nlm.nih.
ementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:element name="ExchangeSet">
    <xsd:annotation>
      <xsd:documentation>Set of dbSNP refSNP docsums, version 3.4</xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="SourceDatabase" minOccurs="0">
          <xsd:complexType>
            <xsd:attribute name="taxId" type="xsd:int" use="required">
              <xsd:annotation>
                <xsd:documentation>NCBI taxonomy ID for variation</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="organism" type="xsd:string" use="required">
              <xsd:annotation>
                <xsd:documentation>common name for species used as part of database name
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="dbSnpOrgAbbr" type="xsd:string">
              <xsd:annotation>
                <xsd:documentation>organism abbreviation used in dbSNP. </xsd:documentat
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="gpipeOrgAbbr" type="xsd:string">
              <xsd:annotation>
                <xsd:documentation>organism abbreviation used within NCBI genome pipelin
87. Using the XML schema
Compiling the XML Schema for dbSNP with XJC
$ xjc -d . "ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd"
parsing a schema...
compiling a schema...
https/www_ncbi_nlm_nih_gov/snp/docsum/Assay.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Assembly.java
https/www_ncbi_nlm_nih_gov/snp/docsum/BaseURL.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Component.java
https/www_ncbi_nlm_nih_gov/snp/docsum/ExchangeSet.java
https/www_ncbi_nlm_nih_gov/snp/docsum/FxnSet.java
https/www_ncbi_nlm_nih_gov/snp/docsum/MapLoc.java
https/www_ncbi_nlm_nih_gov/snp/docsum/ObjectFactory.java
https/www_ncbi_nlm_nih_gov/snp/docsum/PrimarySequence.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Rs.java
https/www_ncbi_nlm_nih_gov/snp/docsum/RsLinkout.java
https/www_ncbi_nlm_nih_gov/snp/docsum/RsStruct.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Ss.java
https/www_ncbi_nlm_nih_gov/snp/docsum/package-info.java
88. Using the XML schema
Compiling the XML Schema for dbSNP with XJC
Search the non-genomic rs# in dbSNP.
import https.www_ncbi_nlm_nih_gov.snp.docsum.*;
import javax.xml.bind.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;

class ParseDbSnp
{
    public static void main(String[] args) throws Exception
    {
        // JAXB context for the package generated by xjc
        JAXBContext jaxbCtxt = JAXBContext.newInstance("https.www_ncbi_nlm_nih_gov.snp.docsum");
        Unmarshaller unmarshaller = jaxbCtxt.createUnmarshaller();
        // Stream the XML from stdin so the whole file is never held in memory
        XMLInputFactory ifactory = XMLInputFactory.newInstance();
        XMLEventReader r = ifactory.createXMLEventReader(System.in);
        while (r.hasNext())
        {
            XMLEvent evt = r.peek();
            // Skip every event until the next <Rs> start element
            if (!(evt.isStartElement() && evt.asStartElement().getName().getLocalPart().equals("Rs")))
            {
                evt = r.nextEvent();
                continue;
            }

            // Unmarshal one <Rs> record and report only the non-genomic ones
            Rs rs = unmarshaller.unmarshal(r, Rs.class).getValue();
            if ("genomic".equals(rs.getMolType())) continue;
            System.out.println("rs" + rs.getRsId() + " " + rs.getMolType());
        }
        r.close();
    }
}
89. Using the XML schema
Compiling the XML Schema for dbSNP with XJC
compile...
$ javac ParseDbSnp.java https/www_ncbi_nlm_nih_gov/snp/docsum/*.java
and run...
$ curl -s "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/ds_ch1.xml.gz" |
gunzip -c |
java ParseDbSnp
rs701 cDNA
rs860 cDNA
rs861 cDNA
rs862 cDNA
rs863 cDNA
rs864 cDNA
rs865 cDNA
rs866 cDNA
rs877 cDNA
rs878 cDNA
rs879 cDNA
rs880 cDNA
rs882 cDNA
rs883 cDNA
rs884 cDNA
rs885 cDNA
rs886 cDNA
rs913 cDNA
rs945 cDNA
rs946 cDNA
(...)
90. NCBI EBot
92. NCBI EBot
Sample output
#!/usr/bin/perl
(...)
# PUBLIC DOMAIN NOTICE
# National Center for Biotechnology Information
use LWP::Simple;
use LWP::UserAgent;
use Net::FTP;
my $delay = 0;
my $maxdelay = 3;
my $base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";
$params{email} = "nobody@nowhere.com";
$params{db} = "nuccore";
$params{tool} = "ebot";
$params{term} = "Mammuthus+primigenius[ORGN]";
%params = esearch(%params);
$params{retmode} = "xml";
$params{outfile} = "result.xml";
$params{rettype} = "native";
efetch_batch(%params);
94. Standalone Blast
Downloading
Standalone tools are available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
#add BLAST to your path
export PATH=${PATH}:/path/to/ncbi-blast-2.2.28+/bin
95. Standalone Blast
Download a sample
Apis mellifera proteins
curl -o protein.fa.gz "ftp://ftp.ncbi.nih.gov/genomes/Apis_mellifera/protein/protein.fa.gz"
gunzip protein.fa.gz
96. Standalone Blast
Create a Blast database with makeblastdb
Getting help...
$ makeblastdb -help
(...)
 -dbtype <String, `nucl', `prot'>
   Molecule type of target db
 -in <File_In>
   Input file/database name
   Default = `-'
 -input_type <String, `asn1_bin', `asn1_txt', `blast
   Type of the data specified in input_file
   Default = `fasta'
(...)
97. Standalone Blast
Create a Blast database with makeblastdb
Create the BLAST database:
$ makeblastdb -in protein.fa -dbtype prot
Building a new DB, current time: 09/02/2013 18:29:38
New DB name: protein.fa
New DB title: protein.fa
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 10570 sequences
98. Standalone Blast
Query a Blast database with blastp
Get help:
$ blastp -help
(...)
 -query <File_In>
   Input file name
   Default = `-'
 -db <String>
   BLAST database name
(...)
99. Standalone Blast
Blast human EIF4G1 gi:187956781
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" |
blastp -db protein.fa
Query= gi|187956781|gb|AAI40897.1| EIF4G1 protein [Homo sapiens]
(...)
                                                                    Score     E
Sequences producing significant alignments:                         (Bits)  Value
gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation...  189     4e-49
gi|328779480|ref|XP_003249661.1| PREDICTED: hypothetical protei...  38.1    0.017
gi|110762568|ref|XP_001121713.1| PREDICTED: hypothetical protei...  38.1    0.018
(...)
> gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation
initiation factor 4 gamma 2-like [Apis mellifera]
Length=899
 Score = 189 bits (479), Expect = 4e-49, Method: Compositional matrix adjust.
 Identities = 115/319 (36%), Positives = 175/319 (55%), Gaps = 39/319 (12%)
Query  717  KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSI  774
            ++P +    +++ +DI+    E+ W P S  +R A   +        S+   +FR+VR I
Sbjct  22   RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGI  73
Query  775  LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCL-----  829
            LNKLTP+ F +L   +  + ++++  LKGVI LIFEKA+ EP +S  YA +C+ L
Sbjct  74   LNKLTPEKFAKLSNDLLNVELNSDVILKGVIFLIFEKALDEPKYSSMYAQLCKRLSDEAA  133
Query  830  -MALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLK  888
                K    E       F  LLL++C+ EFE      E FE +    DE    EE
Sbjct  134  NFEPKKALIESQKGQSTFTFLLLSKCRDEFENRSKASEAFENQ----DELGPEEE-----  184
100. Standalone Blast
Blast human EIF4G1 gi:187956781, output XML
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" |
blastp -db protein.fa -outfmt 5
(...)
<Hit_hsps>
  <Hsp>
    <Hsp_num>1</Hsp_num>
    <Hsp_bit-score>189.119</Hsp_bit-score>
    <Hsp_score>479</Hsp_score>
    <Hsp_evalue>3.78314e-49</Hsp_evalue>
    <Hsp_query-from>717</Hsp_query-from>
    <Hsp_query-to>1017</Hsp_query-to>
    <Hsp_hit-from>22</Hsp_hit-from>
    <Hsp_hit-to>319</Hsp_hit-to>
    <Hsp_query-frame>0</Hsp_query-frame>
    <Hsp_hit-frame>0</Hsp_hit-frame>
    <Hsp_identity>115</Hsp_identity>
    <Hsp_positive>175</Hsp_positive>
    <Hsp_gaps>39</Hsp_gaps>
    <Hsp_align-len>319</Hsp_align-len>
    <Hsp_qseq>KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQ
IARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLL--------KNHDEESLECLCRLLTTIGKDLDFEKAKPRMDQYFNQMEKIIKEKK
    <Hsp_hseq>RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGILNKLTPEKFAKLS
VAKRKMLGNIKFIGELGKLGIVSETILHRCILQLLEKKRRRRSRGDTAEDIECLCQIMRTCGRILDSDKGRGLMDQYFKRMNSLAESRD
    <Hsp_midline>++P + +++ +DI+ E+ W P S +R A + S+ +FR+VR ILNKLTP+ F
+ + ++++ LKGVI LIFEKA+ EP +S YA +C+ L K E F LLL++C+ EFE
E FE + DE EE E
R +A+R+ LGNIKFIGEL KL +++E I+H C+++LL + E +ECLC+++ T G+ LD +K + MDQYF +M
+ + + RI+FML+DV++LR WVPR+ +GP I+QI + E</Hsp_midline>
  </Hsp>
(...)
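The XML report exposes the numbers behind the "Identities = 115/319 (36%)" line of the text report. A minimal sketch that recomputes the percentage from Hsp_identity and Hsp_align-len, tested on the values above. `hsp_percent_identity` is a hypothetical helper, and it assumes Hsp_identity precedes Hsp_align-len inside each Hsp, as in the listing.

```shell
#!/bin/sh
# hsp_percent_identity: for each HSP on stdin, prints
# 100 * Hsp_identity / Hsp_align-len with two decimals.
hsp_percent_identity() {
    tr '<' '\n' | LC_ALL=C awk -F'>' '
        $1 == "Hsp_identity"  { ident = $2 }
        $1 == "Hsp_align-len" { printf "%.2f\n", 100 * ident / $2 }
    '
}

# Values copied from the HSP on the slide.
printf '%s\n' '<Hsp_identity>115</Hsp_identity>' \
              '<Hsp_positive>175</Hsp_positive>' \
              '<Hsp_gaps>39</Hsp_gaps>' \
              '<Hsp_align-len>319</Hsp_align-len>' |
    hsp_percent_identity
```

With real data, the output of `blastp -outfmt 5` would be piped into the function in place of the canned elements.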
102. NCBI URL-API Blast
https://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html
$ curl "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=PAERLMERKADIE&DATABASE=nr&PROGRAM=blastp&FILTER=L&HITLIST_SIZE=500"
(...)
<!--QBlastInfoBegin
    RID = 1NRYGX9K014
    RTOE = 29
QBlastInfoEnd
-->
(...)
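The Put answer only returns a request ID (RID) and an estimated waiting time in seconds (RTOE); the report itself is fetched with a second request. A minimal sketch of that second step, run on a canned answer. `qblast_field` and `blast_get_url` are hypothetical helpers; CMD=Get, RID and FORMAT_TYPE are parameters of the URL API.

```shell
#!/bin/sh
BLAST_CGI="https://www.ncbi.nlm.nih.gov/blast/Blast.cgi"

# qblast_field NAME: prints the value of the first "NAME = value" line on stdin.
qblast_field() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p" |
        head -n 1
}

# blast_get_url RID
# Hypothetical helper: prints the URL that asks for the XML report of a request.
blast_get_url() {
    printf '%s?CMD=Get&RID=%s&FORMAT_TYPE=XML\n' "$BLAST_CGI" "$1"
}

# Canned Put answer, copied from the slide.
answer='<!--QBlastInfoBegin
    RID = 1NRYGX9K014
    RTOE = 29
QBlastInfoEnd
-->'

rid=$(printf '%s\n' "$answer" | qblast_field RID)
rtoe=$(printf '%s\n' "$answer" | qblast_field RTOE)
echo "wait about ${rtoe}s, then fetch: $(blast_get_url "$rid")"
```

A real client would sleep for RTOE seconds and poll until the search has finished before downloading the report.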
103. The End