Advanced NCBI.
The Entrez API
Pierre Lindenbaum
Institut du Thorax. Nantes. France
September 27, 2016
Pierre Lindenbaum @yokofakun http://plindenbaum.blogspot.com Advanced NCBI. The Entrez API
NCBI? What about EBI, ENSEMBL, ...?
What will be covered today?
File formats...
EInfo, GQuery, ESearch, ESummary, EFetch...
Processing XML answers with XSLT: HTML, SVG, R...
Generating a Java parser for dbSNP.
Using standalone BLAST.
curl "http://en.wikipedia.org/wiki/Main_page"
wget -O - "http://en.wikipedia.org/wiki/Main_page"
xsltproc stylesheet.xsl file.xml > result.xml
LOCUS       X53813                   422 bp    DNA     linear   MAM 22-JUN-1992
DEFINITION  Blue Whale heavy satellite DNA.
ACCESSION   X53813 X17460
VERSION     X53813.1  GI:25
KEYWORDS    satellite DNA.
SOURCE      Balaenoptera musculus (Blue whale)
  ORGANISM  Balaenoptera musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
            Mysticeti; Balaenopteridae; Balaenoptera.
REFERENCE   1  (bases 1 to 422)
  AUTHORS   Arnason,U. and Widegren,B.
  TITLE     Composition and chromosomal localization of cetacean highly
            repetitive DNA with special reference to the blue whale,
            Balaenoptera musculus
  JOURNAL   Chromosoma 98 (5), 323-329 (1989)
   PUBMED   2612291
COMMENT     See also <X52700-2> for 1,760 bp common cetacean component clones
            and <X52703-6>,<X53811-4> for the 422 bp heavy satellite clones.
FEATURES             Location/Qualifiers
     source          1..422
                     /organism="Balaenoptera musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9771"
                     /clone="7"
     misc_feature    1..422
                     /note="heavy satellite DNA"
        1 tagttattca acctatccca ctctctagat accccttagc acgtaaagga atattatttg
Seq-entry ::= seq {
  id {
    embl {
      accession "X53813",
      version 1 },
    gi 25 },
  descr {
    title "Blue Whale heavy satellite DNA",
    source {
      org {
        taxname "Balaenoptera musculus",
        common "Blue whale",
        db {
          db "taxon",
          id 9771 } },
        orgname {
          binomial {
            genus "Balaenoptera",
            species "musculus" },
          lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
 Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
 Mysticeti; Balaenopteridae; Balaenoptera",
          gcode 1,
          mgcode 2,
          div "MAM" } },
      subtype {
ASN.1 (schema)
locus VisibleString,
length INTEGER,
strandedness VisibleString OPTIONAL,
moltype VisibleString,
topology VisibleString OPTIONAL,
division VisibleString,
update-date VisibleString,
create-date VisibleString OPTIONAL,
update-release VisibleString OPTIONAL,
create-release VisibleString OPTIONAL,
definition VisibleString,
primary-accession VisibleString OPTIONAL,
entry-version VisibleString OPTIONAL,
accession-version VisibleString OPTIONAL,
other-seqids SEQUENCE OF INSDSeqid OPTIONAL,
secondary-accessions SEQUENCE OF INSDSecondary-accn OPTIONAL,
project VisibleString OPTIONAL,
segment VisibleString OPTIONAL,
source VisibleString OPTIONAL,
organism VisibleString OPTIONAL,
taxonomy VisibleString OPTIONAL,
references SEQUENCE OF INSDReference OPTIONAL,
comment VisibleString OPTIONAL,
comment-set SEQUENCE OF INSDComment OPTIONAL,
struc-comments SEQUENCE OF INSDStrucComment OPTIONAL,
primary VisibleString OPTIONAL,
source-db VisibleString OPTIONAL,
ASN.1 (tools)
Generate C++ data storage classes based on ASN.1 serialization
Convert data between ASN.1, XML and JSON formats.
<?xml version="1.0"?>
<!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_G
<GBSeq_locus>X53813</GBSeq_locus>
<GBSeq_length>422</GBSeq_length>
<GBSeq_strandedness>double</GBSeq_strandedness>
<GBSeq_moltype>DNA</GBSeq_moltype>
<GBSeq_topology>linear</GBSeq_topology>
<GBSeq_division>MAM</GBSeq_division>
<GBSeq_update-date>22-JUN-1992</GBSeq_update-date>
<GBSeq_create-date>13-JUL-1990</GBSeq_create-date>
<GBSeq_definition>Blue Whale heavy satellite DNA</GBSeq_definition>
<GBSeq_primary-accession>X53813</GBSeq_primary-accession>
<GBSeq_accession-version>X53813.1</GBSeq_accession-version>
<GBSeq_other-seqids>
  <GBSeqid>emb|X53813.1|</GBSeqid>
  <GBSeqid>gi|25</GBSeqid>
</GBSeq_other-seqids>
<GBSeq_secondary-accessions>
</GBSeq_secondary-accessions>
<GBSeq_keywords>
  <GBKeyword>satellite DNA</GBKeyword>
</GBSeq_keywords>
<GBSeq_source>Balaenoptera musculus (Blue whale)</GBSeq_source>
<GBSeq_organism>Balaenoptera musculus</GBSeq_organism>
<GBSeq_taxonomy>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mam
actyla; Cetacea; Mysticeti; Balaenopteridae; Balaenoptera</GBSeq_taxonomy>
GBSeq_locus,
GBSeq_length,
GBSeq_strandedness?,
GBSeq_moltype,
GBSeq_topology?,
GBSeq_division,
GBSeq_update-date,
GBSeq_create-date?,
GBSeq_update-release?,
GBSeq_create-release?,
GBSeq_definition,
GBSeq_primary-accession?,
GBSeq_entry-version?,
GBSeq_accession-version?,
GBSeq_other-seqids?,
GBSeq_secondary-accessions?,
GBSeq_project?,
GBSeq_keywords?,
GBSeq_segment?,
GBSeq_source?,
GBSeq_organism?,
GBSeq_taxonomy?,
GBSeq_references?,
GBSeq_comment?,
GBSeq_comment-set?,
GBSeq_struc-comments?,
(...)
03-02-2016-phase-out-of-GI-numbers/ : ”NCBI is phasing
out sequence GIs - use Accession.Version instead!”
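As a small illustration of the advice above, the sketch below pulls the Accession.Version out of an old-style FASTA header that still carries a GI. The header is the one shown later in these slides for the mammoth myoglobin record; the helper name `accession_version` is hypothetical, and the sketch assumes the `gi|...|db|accession.version|` layout of the header.

```shell
# Old-style FASTA header taken from the EFetch example in these slides.
defline='>gi|507866428|gb|KC524742.1| Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds'

# Hypothetical helper: print the Accession.Version, assumed to be the
# 4th '|'-separated field of a 'gi|<gi>|<db>|<acc.ver>|' header.
accession_version() {
  printf '%s\n' "$1" | cut -d '|' -f 4
}

accession_version "$defline"
```

Headers that do not follow this layout would need a different field number.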
Set of seven server-side programs that provide a stable interface to
the search, retrieval, and linking functions of the Entrez system,
using a fixed URL syntax.
The output provided by the E-Utilities is in XML format,
sometimes JSON, (...)
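The "fixed URL syntax" can be sketched as a tiny shell helper: every program lives under the same base URL (the one used by the curl examples in these slides) and takes its parameters as a query string. The function name `eutils_url` is an assumption made for this sketch, not part of any NCBI tool.

```shell
# Base URL shared by the E-utilities calls shown in these slides.
EUTILS_BASE="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical helper: eutils_url <program> <query-string>
# Prints <base>/<program>.fcgi?<query-string>
eutils_url() {
  printf '%s/%s.fcgi?%s\n' "$EUTILS_BASE" "$1" "$2"
}

eutils_url einfo "retmode=json"
eutils_url esearch "db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D"

# To actually send a request (needs network access):
# curl -s "$(eutils_url einfo 'retmode=json')"
```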
Entrez Direct: "Entrez Direct (EDirect) is an advanced method for
accessing the NCBI's set of interconnected databases (publication,
sequence, structure, gene, variation, expression, etc.) from a UNIX
terminal window. Functions take search terms from command-line
arguments. Individual operations are combined to build multi-step
queries. Record retrieval and formatting normally complete the
process."
Provides a list of the names of all valid Entrez databases.
Provides statistics for a single database, including lists of indexing
fields and available link names.
Base URL:
XML Output
<eInfoResult>
<DbName>protein</DbName>
<DbName>nucleotide</DbName>
<DbName>structure</DbName>
<DbName>bioproject</DbName>
<DbName>blastdbinfo</DbName>
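When only the database names are needed, the XML answer can be reduced to one name per line with standard tools. The sketch below assumes each `<DbName>` element sits on its own line, as in the excerpt above; the sample document is abridged from that excerpt and the helper name `list_dbs` is hypothetical.

```shell
# Sample EInfo answer, abridged from the XML output shown in the slides.
einfo_xml='<eInfoResult>
<DbName>protein</DbName>
<DbName>nucleotide</DbName>
<DbName>structure</DbName>
</eInfoResult>'

# Hypothetical helper: print the text of every <DbName> element,
# assuming one element per input line.
list_dbs() {
  sed -n 's:.*<DbName>\([^<]*\)</DbName>.*:\1:p'
}

printf '%s\n' "$einfo_xml" | list_dbs
```

A real XML parser (xsltproc, xmllint) is the more robust choice; the sed version is only convenient for quick inspection.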
JSON Output
"header": {
  "type": "einfo",
  "version": "0.3"
},
"einforesult": {
  "dblist": [
    "pubmed",
    "protein",
    "nuccore",
    (...)
    "unigene",
    "gencoll",
    "gtr"
}
Return statistics for a given Entrez database:
Statistics for PubMed
<?xml version="1.0"?>
<eInfoResult>
<Description>PubMed bibliographic record</Description>
<DbBuild>Build130805-2117m.4</DbBuild>
<LastUpdate>2013/08/06 08:33</LastUpdate>
<FieldList>
(...)
<Field>
  <Description>Unique number assigned to publication</Description>
  <IsDate>N</IsDate>
  <IsNumerical>Y</IsNumerical>
  <SingleToken>Y</SingleToken>
  <Hierarchy>N</Hierarchy>
  <IsHidden>Y</IsHidden>
</Field>
<Field>
(...)
Statistics for PubMed (JSON)
"header": {
  "type": "einfo",
  "version": "0.3"
},
"einforesult": {
  "dbinfo": {
    "dbname": "pubmed",
    "menuname": "PubMed",
    "description": "PubMed bibliographic record",
    "dbbuild": "Build160921-2207m.6",
    "count": "26470199",
    "lastupdate": "2016/09/22 16:32",
    "fieldlist": [
      "name": "ALL",
      "fullname": "All Fields",
      "description": "All terms from all searchable fields",
      "termcount": "179424126",
      "isdate": "N",
      "isnumerical": "N",
      "singletoken": "N",
      "hierarchy": "N",
      "ishidden": "N"
      },
      "name": "UID",
      "fullname": "UID",
      "description": "Unique number assigned to publication",
With entrez-direct
$ einfo -dbs
$ einfo -db pubmed
Provides the number of records retrieved in all Entrez databases by
a single text query.
$ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode
<Result>
<Term>tyrannosaurus rex</Term>
Ok</Status></ResultItem>
/Status></ResultItem>
/Status></ResultItem>
Ok</Status></ResultItem>
Status>Ok</Status></ResultItem>
/Status></ResultItem>
or Database is not found</Status></ResultItem>
<ResultItem><DbName>ncbisearch</DbName><MenuName/><Count>1</Count><
Status>Ok</Status></ResultItem>
Term or Database is not found</Status></ResultItem>
(...)
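For a quick look at the counts without a stylesheet, each `ResultItem` can be flattened to a tab-separated "database, count" pair. This is only a sketch: it assumes every `ResultItem` arrives on a single line, as in the raw answer above, and the helper name `gquery_counts` is hypothetical. The sample line is the one complete `ResultItem` visible in the excerpt.

```shell
# A literal tab character, used as the output separator.
TAB=$(printf '\t')

# Hypothetical helper: print "<DbName><TAB><Count>" for each ResultItem,
# assuming one ResultItem per input line.
gquery_counts() {
  sed -n "s:.*<DbName>\([^<]*\)</DbName>.*<Count>\([^<]*\)</Count>.*:\1${TAB}\2:p"
}

# The one complete ResultItem visible in the excerpt above.
sample='<ResultItem><DbName>ncbisearch</DbName><MenuName/><Count>1</Count><Status>Ok</Status></ResultItem>'

printf '%s\n' "$sample" | gquery_counts
```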
Transforming to HTML using XSLT
The XSLT stylesheet.
<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="html"/>

<xsl:template match="/"><html><body>
<xsl:apply-templates select="Result"/>
</body></html></xsl:template>

<xsl:template match="Result">
<table><caption><xsl:value-of select="Term"/></caption>
<tr><th>Database</th><th>Count</th><th>Status</th></tr>
<xsl:apply-templates select="eGQueryResult/ResultItem"/>
</table>
</xsl:template>

<xsl:template match="ResultItem">
<tr>
<td><a>
<xsl:attribute name="href">http://www.ncbi.nlm.nih.gov/<xsl:value-of select="DbName"/>?cmd=search&amp;term=<xsl:value-of select="translate(/Result/Term,' ','+')"/></xsl:attribute>
<xsl:value-of select="DbName"/></a></td>
<td><xsl:value-of select="Count"/></td>
<td><xsl:value-of select="Status"/></td>
</tr>
</xsl:template>

</xsl:stylesheet>
Transforming to HTML
$ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml" |
xsltproc gquery2html.xsl -
<table>
<caption>tyrannosaurus rex</caption>
<tr>
<th>Database</th>
<th>Count</th>
<th>Status</th>
</tr>
<tr>
<a href="https://www.ncbi.nlm.nih.gov/pubmed?cmd=search&amp;term=tyrannosaurus
</td>
<td>41</td>
<td>Ok</td>
</tr>
<tr>
<a href="https://www.ncbi.nlm.nih.gov/pmc?cmd=search&amp;term=tyrannosaurus+re
</td>
<td>160</td>
<td>Ok</td>
</tr>
<tr>
<a href="https://www.ncbi.nlm.nih.gov/mesh?cmd=search&amp;term=tyrannosaurus+r
</td>
<td>15</td>
Provides a list of UIDs matching a text query
Posts the results of a search on the History server
Downloads all UIDs from a dataset stored on the History server
Combines or limits UID datasets stored on the History server
Sorts sets of UIDs
Base URL https:
Searching for 'Mammuthus primigenius'
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D" |
xmllint --format -
<eSearchResult>
<RetStart>0</RetStart>
<IdList>
  <Id>507866428</Id>
  <Id>124056416</Id>
  <Id>383843869</Id>
  <Id>383843867</Id>
  <Id>383843865</Id>
  <Id>383843863</Id>
  <Id>383843861</Id>
  <Id>383843859</Id>
  <Id>383843857</Id>
  <Id>383843855</Id>
  <Id>383843853</Id>
  <Id>383843851</Id>
  <Id>383843849</Id>
  <Id>383843847</Id>
  <Id>383843845</Id>
  <Id>157367690</Id>
  <Id>157367676</Id>
  <Id>157367662</Id>
  <Id>157367648</Id>
  <Id>157367634</Id>
</IdList>
<TranslationSet>
<Translation>
Searching for 'Mammuthus primigenius' (JSON)
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmode=json"
"header": {
  "type": "esearch",
  "version": "0.3"
},
"esearchresult": {
  "count": "811",
  "retmax": "20",
  "retstart": "0",
  "idlist": [
    "1059791223",
    "198241525",
    "198241523",
    "198241521",
    "198241519",
    "198241517",
    "198241515",
    "198241513",
    "198241511",
    "198241509",
    "198241507",
    "198241505",
    "198241503",
    "198241501",
    "198241499",
    "198241497",
    "198241495",
    "198241493",
    "198241491",
The retmax parameter
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=2" |
xmllint --format -
<eSearchResult>
<RetStart>0</RetStart>
<IdList>
  <Id>507866428</Id>
  <Id>124056416</Id>
</IdList>
<TranslationSet>
  <Translation>
    <From>"Mammuthus primigenius"[ORGN]</From>
    <To>"Mammuthus primigenius"[Organism]</To>
  </Translation>
</TranslationSet>
<TranslationStack>
  <Term>"Mammuthus primigenius"[Organism]</Term>
  <Field>Organism</Field>
  <Explode>Y</Explode>
</TranslationStack>
<QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>
The retstart parameter
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=3&retstart=100" |
xmllint --format -
<eSearchResult>
<RetStart>100</RetStart>
<IdList>
  <Id>300810656</Id>
  <Id>300810655</Id>
  <Id>300810654</Id>
</IdList>
<TranslationSet>
  <Translation>
    <From>"Mammuthus primigenius"[ORGN]</From>
    <To>"Mammuthus primigenius"[Organism]</To>
  </Translation>
</TranslationSet>
<TranslationStack>
  <Term>"Mammuthus primigenius"[Organism]</Term>
  <Field>Organism</Field>
  <Explode>Y</Explode>
</TranslationStack>
<QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>
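Together, retmax and retstart allow a large result set to be walked page by page. The sketch below only prints the URLs it would request; the count of 811 is the value reported in the JSON answer earlier in these slides, while the page size of 200 and the helper name `page_offsets` are arbitrary choices made for the illustration.

```shell
BASE="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
QUERY="db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D"

# Hypothetical helper: page_offsets <total-count> <page-size>
# Prints every retstart value needed to cover <total-count> records.
page_offsets() {
  total=$1; page=$2; offset=0
  while [ "$offset" -lt "$total" ]; do
    echo "$offset"
    offset=$((offset + page))
  done
}

# 811 records, 200 per page: offsets 0, 200, 400, 600 and 800.
for start in $(page_offsets 811 200); do
  echo "${BASE}?${QUERY}&retmax=200&retstart=${start}"
  # To fetch the page (needs network access): curl -s "<the URL above>"
done
```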
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&rettype=count" |
xmllint --format -
</eSearchResult>
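With rettype=count the answer carries only the number of hits, which is handy as a first step before paging. The sketch below extracts that number from a hand-written sample; the sample answer is an assumption about the shape of the reply (a single `Count` element inside `eSearchResult`), its value 811 is the count reported in the JSON example earlier, and `esearch_count` is a hypothetical helper name.

```shell
# Hand-written sample in the assumed shape of a rettype=count answer.
answer='<eSearchResult><Count>811</Count></eSearchResult>'

# Hypothetical helper: print the first <Count> value found on stdin.
esearch_count() {
  sed -n 's:.*<Count>\([0-9][0-9]*\)</Count>.*:\1:p' | head -n 1
}

printf '%s\n' "$answer" | esearch_count
```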
sort=Date Released
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&sort=Date+Released" |
xmllint --format -
<Id>1033204644</Id>
<Id>1033204658</Id>
<Id>1033204672</Id>
<Id>1033204686</Id>
<Id>1033204729</Id>
<Id>1033204771</Id>
<Id>1033204785</Id>
<Id>1033204799</Id>
<Id>1033204813</Id>
<Id>1033204827</Id>
<Id>1033204871</Id>
<Id>1033205124</Id>
<Id>1033205194</Id>
Returns document summaries (DocSums) for a list of input UIDs
Returns DocSums for a set of UIDs stored on the Entrez
History server
Base URL:
Retrieve nucleotide gi=507866428
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428"
<Id>507866428</Id>
<Item Name="Caption" Type="String">KC524742</Item>
<Item Name="Title" Type="String">Mammuthus primigenius isolate CME2005/915 myoglobin (Mb
<Item Name="Extra" Type="String">gi|507866428|gb|KC524742.1|[507866428]</Item>
<Item Name="Gi" Type="Integer">507866428</Item>
<Item Name="CreateDate" Type="String">2013/06/15</Item>
<Item Name="UpdateDate" Type="String">2013/06/21</Item>
<Item Name="Flags" Type="Integer">0</Item>
<Item Name="TaxId" Type="Integer">37349</Item>
<Item Name="Length" Type="Integer">9042</Item>
<Item Name="Status" Type="String">live</Item>
<Item Name="ReplacedBy" Type="String"></Item>
<Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
</eSummaryResult>
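Because every DocSum field is an `Item` element identified by its `Name` attribute, a single field can be picked out by name. The sketch below does this with sed on a few lines copied from the answer above; the helper name `docsum_item` is hypothetical, and the approach assumes one `Item` per line with no nested markup in its value.

```shell
# A few DocSum lines copied from the ESummary answer shown above.
docsum='<Id>507866428</Id>
<Item Name="Caption" Type="String">KC524742</Item>
<Item Name="TaxId" Type="Integer">37349</Item>
<Item Name="Length" Type="Integer">9042</Item>
<Item Name="Status" Type="String">live</Item>'

# Hypothetical helper: docsum_item <Name>
# Prints the value of the <Item> whose Name attribute equals <Name>.
docsum_item() {
  sed -n "s:.*<Item Name=\"$1\"[^>]*>\([^<]*\)</Item>.*:\1:p"
}

printf '%s\n' "$docsum" | docsum_item Caption
printf '%s\n' "$docsum" | docsum_item Length
```

For list-valued items such as the PubMed AuthorList, an XSLT stylesheet remains the better tool.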
Retrieve nucleotide gi=507866428 in JSON
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428&retmode=json"
"header": {
  "type": "esummary",
  "version": "0.3"
},
"result": {
  "uids": [
  ],
  "507866428": {
    "uid": "507866428",
    "caption": "KC524742",
    "title": "Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, par
    "extra": "gi|507866428|gb|KC524742.1|",
    "gi": 507866428,
    "createdate": "2013/06/15",
    "updatedate": "2013/06/21",
    "flags": "",
    "taxid": 37349,
    "slen": 9042,
    "biomol": "genomic",
    "moltype": "dna",
    "topology": "linear",
    "sourcedb": "insd",
    "segsetsize": "",
    "projectid": "0",
    (...)
Retrieve snp rs25
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=snp&id=25
<Id>25</Id>
<Item Name="SNP_ID" Type="Integer">25</Item>
<Item Name="Organism" Type="String"></Item>
<Item Name="ALLELE_ORIGIN" Type="String"></Item>
<Item Name="GLOBAL_MAF" Type="String">0.4913</Item>
<Item Name="GLOBAL_POPULATION" Type="String"></Item>
<Item Name="GLOBAL_SAMPLESIZE" Type="Integer">0</Item>
<Item Name="SUSPECTED" Type="String"></Item>
<Item Name="CLINICAL_SIGNIFICANCE" Type="String"></Item>
<Item Name="GENE" Type="String">THSD7A</Item>
<Item Name="LOCUS_ID" Type="Integer">221981</Item>
<Item Name="ACC" Type="String">NM_015204.2,NT_007819.17</Item>
<Item Name="CHR" Type="String">7</Item>
<Item Name="WEIGHT" Type="Integer">1</Item>
<Item Name="FXN_CLASS" Type="String">intron-variant</Item>
<Item Name="VALIDATED" Type="String">by-1000G,by-cluster,by-frequency,by-hapmap</Item>
<Item Name="GTYPE" Type="String">true</Item>
<Item Name="NONREF" Type="String">false</Item>
<Item Name="DOCSUM" Type="String">HGVS=NC_000007.13:g.11584142T&gt;C,NG_027670.1:g.29268
<Item Name="HET" Type="Integer">50</Item>
<Item Name="SRATE" Type="Integer">0</Item>
<Item Name="TAX_ID" Type="Integer">9606</Item>
<Item Name="CHRRPT" Type="String">25|2|0|1|1|1|7|NT_007819.17|11574141|11584142|THSD7A|0
<Item Name="ORIG_BUILD" Type="Integer">36</Item>
<Item Name="UPD_BUILD" Type="Integer">138</Item>
<Item Name="CREATEDATE" Type="String">2000-09-19 17:02</Item>
Retrieve pubmed pmid=7939126
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=7939126"
<Id>7939126</Id>
<Item Name="PubDate" Type="Date">1994 Apr</Item>
<Item Name="EPubDate" Type="Date"></Item>
<Item Name="Source" Type="String">Sleep</Item>
<Item Name="AuthorList" Type="List">
  <Item Name="Author" Type="String">Broughton R</Item>
  <Item Name="Author" Type="String">Billings R</Item>
  <Item Name="Author" Type="String">Cartwright R</Item>
  <Item Name="Author" Type="String">Doucette D</Item>
  <Item Name="Author" Type="String">Edmeads J</Item>
  <Item Name="Author" Type="String">Edwardh M</Item>
  <Item Name="Author" Type="String">Ervin F</Item>
  <Item Name="Author" Type="String">Orchard B</Item>
  <Item Name="Author" Type="String">Hill R</Item>
  <Item Name="Author" Type="String">Turrell G</Item>
</Item>
<Item Name="LastAuthor" Type="String">Turrell G</Item>
<Item Name="Title" Type="String">Homicidal somnambulism: a case report.</Item>
<Item Name="Volume" Type="String">17</Item>
<Item Name="Issue" Type="String">3</Item>
<Item Name="Pages" Type="String">253-64</Item>
<Item Name="LangList" Type="List">
  <Item Name="Lang" Type="String">English</Item>
</Item>
<Item Name="NlmUniqueID" Type="String">7809084</Item>
<Item Name="ISSN" Type="String">0161-8105</Item>
<Item Name="ESSN" Type="String">1550-9109</Item>
Base URL:
Retrieve nucleotide gi=507866428 as ASN.1
Seq-entry ::= set {
  class nuc-prot,
  descr {
    source {
      genome genomic,
      org {
        taxname "Mammuthus primigenius",
        common "woolly mammoth",
        db {
          db "taxon",
          id 37349 } },
        orgname {
          binomial {
            genus "Mammuthus",
            species "primigenius" },
          mod {
Retrieve nucleotide gi=507866428 as FASTA
>gi|507866428|gb|KC524742.1| Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds
Retrieve nucleotide gi=507866428 as TinySeq
<?xml version="1.0"?>
<TSeq_seqtype value="nucleotide"/>
<TSeq_gi>507866428</TSeq_gi>
<TSeq_accver>KC524742.1</TSeq_accver>
<TSeq_taxid>37349</TSeq_taxid>
<TSeq_orgname>Mammuthus primigenius</TSeq_orgnam
<TSeq_defline>Mammuthus primigenius isolate CME2
<TSeq_length>9042</TSeq_length>
Retrieve nucleotide gi=507866428 as GenBank XML
<GBSeq_locus>KC524742</GBSeq_locus>
<GBSeq_length>9042</GBSeq_length>
<GBSeq_strandedness>double</GBSeq_strandedness>
<GBSeq_moltype>DNA</GBSeq_moltype>
<GBSeq_topology>linear</GBSeq_topology>
<GBSeq_division>MAM</GBSeq_division>
<GBSeq_update-date>21-JUN-2013</GBSeq_update-date>
<GBSeq_create-date>15-JUN-2013</GBSeq_create-date>
<GBSeq_definition>Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, parti
<GBSeq_primary-accession>KC524742</GBSeq_primary-accession>
<GBSeq_accession-version>KC524742.1</GBSeq_accession-version>
<GBSeq_other-seqids>
  <GBSeqid>gb|KC524742.1|</GBSeqid>
  <GBSeqid>gi|507866428</GBSeqid>
</GBSeq_other-seqids>
<GBSeq_source>Mammuthus primigenius (woolly mammoth)</GBSeq_source>
<GBSeq_organism>Mammuthus primigenius</GBSeq_organism>
(...)
Retrieve nucleotide gi=507866428 as GenBank
LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene,
            partial cds.
VERSION     KC524742.1  GI:507866428
SOURCE      Mammuthus primigenius (woolly mammoth)
  ORGANISM  Mammuthus primigenius
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae;
            Mammuthus.
REFERENCE   1  (bases 1 to 9042)
  AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L.
            and Berenbrink,M.
  TITLE     Evolution of mammalian diving capacity traced by myoglobin net
            surface charge
  JOURNAL   Science 340 (6138), 1234192 (2013)
   PUBMED   23766330
REFERENCE   2  (bases 1 to 9042)
  AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of
            Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
FEATURES             Location/Qualifiers
     source          1..9042
                     /organism="Mammuthus primigenius"
EFetch also works with accession numbers
LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene,
            partial cds.
VERSION     KC524742.1  GI:507866428
SOURCE      Mammuthus primigenius (woolly mammoth)
  ORGANISM  Mammuthus primigenius
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae;
            Mammuthus.
REFERENCE   1  (bases 1 to 9042)
  AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L.
            and Berenbrink,M.
  TITLE     Evolution of mammalian diving capacity traced by myoglobin net
            surface charge
  JOURNAL   Science 340 (6138), 1234192 (2013)
   PUBMED   23766330
REFERENCE   2  (bases 1 to 9042)
  AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of
            Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
FEATURES             Location/Qualifiers
     source          1..9042
                     /organism="Mammuthus primigenius"
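The formats shown on the preceding slides differ only in the rettype and retmode parameters of the EFetch URL. The sketch below prints the four URLs for the mammoth record, addressed by its Accession.Version; the helper name `efetch_url` is hypothetical, and the rettype/retmode pairs are the combinations commonly documented for the nucleotide database.

```shell
EFETCH="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Hypothetical helper: efetch_url <id> <rettype> <retmode>
# Builds an EFetch URL for the nucleotide database.
efetch_url() {
  printf '%s?db=nucleotide&id=%s&rettype=%s&retmode=%s\n' "$EFETCH" "$1" "$2" "$3"
}

efetch_url KC524742.1 fasta text   # FASTA
efetch_url KC524742.1 fasta xml    # TinySeq XML
efetch_url KC524742.1 gb text      # GenBank flat file
efetch_url KC524742.1 gb xml       # GenBank XML (GBSeq)

# To download a record (needs network access):
# curl -s "$(efetch_url KC524742.1 gb text)"
```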
Using the WebEnv parameter.
Web environment string returned from a previous ESearch, EPost
or ELink call. When provided, ESearch will post the results of the
search operation to this pre-existing WebEnv.
Using the WebEnv parameter.
Searching extinct species in the NCBI taxonomy ('extinct[PROP]')
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?usehistory=y&db=taxonomy&term=extinct%5BPROP%5D"
<eSearchResult>
<RetStart>0</RetStart>
<WebEnv>NCID_1_75550312_9001_1375948145_325582538</WebEnv>
<IdList>
  <Id>1225531</Id>
  <Id>1225530</Id>
  <Id>1211276</Id>
  <Id>1211275</Id>
  <Id>1027716</Id>
  <Id>948961</Id>
  <Id>943952</Id>
  <Id>867394</Id>
  <Id>867393</Id>
  <Id>748142</Id>
  <Id>748141</Id>
  <Id>741158</Id>
  <Id>703576</Id>
  <Id>703571</Id>
  <Id>703559</Id>
  <Id>693865</Id>
  <Id>686441</Id>
  <Id>665113</Id>
  <Id>659069</Id>
  <Id>656807</Id>
Using the WebEnv parameter.
Fetch the extinct species in the NCBI taxonomy ('extinct[PROP]') using the WebEnv parameter.
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&query_key=1&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&retmode=xml"
<TaxId>1225531</TaxId>
<ScientificName>Equus ovodovi</ScientificName>
<Synonym>Equus (Sussemionus) ovodovi</Synonym>
<ClassCDE>authority</ClassCDE>
<DispName>Equus ovodovi Eisenmann &amp; Sergej, 2011</DispName>
<ParentTaxId>1225530</ParentTaxId>
<Rank>species</Rank>
<Division>Mammals</Division>
</GeneticCode>
(...)
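
The ESearch then EFetch round trip through the History server can be sketched in Python as follows. This is a minimal illustration, not the author's code: the helper names `eutils_url` and `parse_history` are made up for this example, and the sample reply is a shortened, hypothetical one. It assumes the ESearch reply carries a `QueryKey` element next to `WebEnv`, which is what the `query_key=1` parameter in the slides refers to. Nothing is sent over the network; the code only builds the URLs and parses a reply.

```python
"""Sketch: reuse an ESearch result through the Entrez History server."""
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(tool, **params):
    """Build an E-utilities URL such as .../esearch.fcgi?db=...&term=..."""
    return EUTILS + tool + ".fcgi?" + urlencode(params)


def parse_history(esearch_xml):
    """Return (WebEnv, QueryKey) from an ESearch reply made with usehistory=y."""
    root = ET.fromstring(esearch_xml)
    return root.findtext("WebEnv"), root.findtext("QueryKey")


# Hypothetical, shortened reply; a real one also carries Count, IdList, etc.
SAMPLE_REPLY = """<eSearchResult>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_EXAMPLE</WebEnv>
</eSearchResult>"""

# Step 1: the search URL (usehistory=y asks NCBI to keep the result set).
search_url = eutils_url("esearch", db="taxonomy", term="extinct[PROP]", usehistory="y")

# Step 2: pull WebEnv and QueryKey out of the reply.
web_env, query_key = parse_history(SAMPLE_REPLY)

# Step 3: fetch the stored set by reference instead of listing every UID.
fetch_url = eutils_url("efetch", db="taxonomy", query_key=query_key,
                       WebEnv=web_env, retmode="xml")
print(search_url)
print(fetch_url)
```

In a real session the reply would come from `urllib.request.urlopen(search_url).read()` instead of `SAMPLE_REPLY`.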
Uploads a list of UIDs to the Entrez History server
Appends a list of UIDs to an existing set of UID lists attached to a Web Environment

Post gi to epost
Get a list of gis (here, taxonomy UIDs) of extinct animals:
wget -O - 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=extinct[PROP]&retmax=1000' |
  xmllint --format - |
  grep '<Id>' |
  cut -d '<' -f 2 |
  cut -d '>' -f 2 |
  tr "\n" ","
1860150,1860149,1849957,1825730,1825729,1636722,1607772,1607771,1607767,1607757,1607756

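The grep/cut/tr pipeline above depends on xmllint putting each `<Id>` on its own line. As an alternative sketch (a swapped-in technique, not what the slide uses), the same comma-separated list can be built with Python's standard XML parser. The function name `id_list` and the sample reply are hypothetical.

```python
import xml.etree.ElementTree as ET


def id_list(esearch_xml):
    """Return the UIDs found under <IdList> in an ESearch reply, in order."""
    root = ET.fromstring(esearch_xml)
    return [elem.text for elem in root.findall("./IdList/Id")]


# Hypothetical, shortened ESearch reply.
SAMPLE = """<eSearchResult><Count>3</Count><IdList>
<Id>1860150</Id><Id>1860149</Id><Id>1849957</Id>
</IdList></eSearchResult>"""

ids = id_list(SAMPLE)
csv_ids = ",".join(ids)  # the form expected by the id= parameter
print(csv_ids)
```

Unlike the text pipeline, this version does not leave a trailing comma and does not care how the XML is laid out.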
Post gi to epost
wget -O - 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=taxonomy&WebEnv=NCID_1_15435144_130.14.22.215_9001_1474637318_669113391_0MetA0_S_MegaStore_F_1&id=1860150,1860149,1849957,1825730,1825729,1636722,1607772...'
<?xml version="1.0"?>
<!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ePost_020511.dtd">
<ePostResult>
<WebEnv>NCID_1_15467192_130.14.22.215_9001_1474637456_570452194_0MetA0_S_MegaStore_F_1</WebEnv>
</ePostResult>

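With a long UID list the GET URL above becomes very large, so an HTTP POST is usually the safer choice for EPost. The sketch below only builds the request object and sends nothing. The function name `epost_request` and the placeholder WebEnv value used in testing are assumptions made for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

EPOST = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"


def epost_request(db, ids, web_env=None):
    """Build (but do not send) a POST request uploading `ids` to the History server.

    If `web_env` is given, the UIDs are appended to that existing Web Environment.
    """
    params = {"db": db, "id": ",".join(str(i) for i in ids)}
    if web_env:
        params["WebEnv"] = web_env
    # The parameters travel in the request body, not in the URL.
    return Request(EPOST, data=urlencode(params).encode("ascii"), method="POST")


req = epost_request("taxonomy", [1860150, 1860149, 1849957])
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Sending it would be `urllib.request.urlopen(req).read()`, which returns the `ePostResult` XML shown on the slide.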
Searching in the WebEnv
Search Homo Sapiens in WebEnv?
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Homo%20Sapiens&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"
<eSearchResult>
  <RetStart>0</RetStart>
  <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
  <IdList/>
  <TranslationSet/>
  <TranslationStack>
    <Term>homo sapiens[All Names]</Term>
    <Field>All Names</Field>
    <Explode>N</Explode>
  </TranslationStack>
  <QueryTranslation>(#2) AND homo sapiens[All Names]</QueryTranslation>
</eSearchResult>

Searching in the WebEnv
Search Tyrannosaurus in WebEnv?
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Tyrannosaurus&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"
<eSearchResult>
  <RetStart>0</RetStart>
  <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
  <IdList>
    <Id>436494</Id>
  </IdList>
  <TranslationSet/>
  <TranslationStack>
    <Term>Tyrannosaurus[All Names]</Term>
    <Field>All Names</Field>
    <Explode>N</Explode>
  </TranslationStack>
  <QueryTranslation>(#2) AND Tyrannosaurus[All Names]</QueryTranslation>
</eSearchResult>

EDirect: combining tools
Piping EDirect
esearch -db taxonomy -query "Tyrannosaurus" |
  efetch -format xml

Piping EDirect
esearch -db pubmed -query "Tyrannosaurus" |
  efilter -mindate 2005 |
  efetch -format docsum |
  xtract -pattern DocumentSummary \
    -element MedlineCitation/PMID \
    -element Id SortFirstAuthor

Returns UIDs linked to an input set of UIDs in either the same or a different Entrez database
Returns UIDs linked to other UIDs in the same Entrez database that match an Entrez query
Checks for the existence of Entrez links for a set of UIDs within the same database
Lists the available links for a UID
Lists LinkOut URLs and attributes for a set of UIDs
Lists hyperlinks to primary LinkOut providers for a set of UIDs
Creates hyperlinks to the primary LinkOut provider for a single UID

Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi

Searching the PubMed records associated with sequence gi:507866428
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=pubmed&id=507866428&cmd=neighbor_score
<eLinkResult>
<LinkSet>
<IdList>
<Id>507866428</Id>
</IdList>
<LinkSetDb>
<LinkName>nuccore_pubmed</LinkName>
<Link>
<Id>23766330</Id>
<Score>0</Score>
</Link>
</LinkSetDb>
</LinkSet>
</eLinkResult>
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23766330&rettype=medline&retmode=text"
PMID- 23766330
TI  - Evolution of mammalian diving capacity traced by myoglobin net surface
      charge.
PG  - 1234192
LID - 10.1126/science.1234192 [doi]

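To chain the two calls above in a script, the linked PMIDs have to be pulled out of the `eLinkResult` document first. The following is a minimal sketch under the assumption that the reply follows the usual `LinkSet` / `LinkSetDb` / `Link` nesting. The function name `linked_ids` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def linked_ids(elink_xml, link_name):
    """Return the target UIDs of every <LinkSetDb> whose <LinkName> matches."""
    root = ET.fromstring(elink_xml)
    ids = []
    for linkset_db in root.iter("LinkSetDb"):
        if linkset_db.findtext("LinkName") == link_name:
            ids.extend(link.findtext("Id") for link in linkset_db.findall("Link"))
    return ids


# Hypothetical reply modelled on the slide (sequence gi:507866428).
SAMPLE = """<eLinkResult><LinkSet>
  <DbFrom>nuccore</DbFrom>
  <IdList><Id>507866428</Id></IdList>
  <LinkSetDb>
    <DbTo>pubmed</DbTo>
    <LinkName>nuccore_pubmed</LinkName>
    <Link><Id>23766330</Id><Score>0</Score></Link>
  </LinkSetDb>
</LinkSet></eLinkResult>"""

pmids = linked_ids(SAMPLE, "nuccore_pubmed")
print(",".join(pmids))  # value for the id= parameter of the EFetch call
```

Filtering on `LinkName` matters because a single reply can contain several `LinkSetDb` blocks, one per link type.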
Transforming to SVG
Using the stylesheet
xsltproc <(curl "https://raw.github.com/lindenb/xslt-sandbox/master/stylesheets/bio/ncbi/gb2svg.xsl") \
  "https://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=14971102&retmode=xml&rettype=gbc"

Transforming to SVG
<?xml version="1.0" encoding="UTF-8"?>
<svg:svg xmlns:svg="http://www.w3.org/2000/svg" height="121" width="920" style="stroke-width:1px;">
  <svg:title>Human rotavirus segment 7 NSP3 gene, complete cds</svg:title>
  <svg:defs>
    <svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="grad">
      <svg:stop offset="5%" stop-color="black"/>
      <svg:stop offset="50%" stop-color="whitesmoke"/>
      <svg:stop offset="95%" stop-color="black"/>
    </svg:linearGradient>
    <svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="vertical_body_gradient">
      <svg:stop offset="5%" stop-color="white"/>
      <svg:stop offset="95%" stop-color="lightgray"/>
    </svg:linearGradient>
  </svg:defs>
  <svg:style type="text/css"/>
  <svg:g>
    <svg:g transform="translate(0,0)">
      <svg:rect x="0" y="0" width="920" height="120" fill="url(#vertical_body_gradient)" stroke="black"/>
      <svg:text style="color:red;font-size:35px;" x="10" y="35">Human rotavirus segment 7 NSP3 gene, complete cds</svg:text>
      <svg:g>
        <svg:rect x="10" y="40" width="900" height="18" style="fill:url(#grad);stroke:black;" title="1..1074"/>
        <svg:text y="54" x="460" text-anchor="middle"><svg:tspan style="font-weight:bold;">source</svg:tspan><svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">organism</svg:tspan>:Human rotavirus A <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">mol_type</svg:tspan>:genomic RNA <svg:tspan xmlns:xsi="http://www.(...)

Transforming to SVG
Transforming to R
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Tyrannosaurus&usehistory=true" | xmllint --format -
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml"

Transforming to R
<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="text"/>

<xsl:template match="/">
date2count &lt;- list()
<xsl:apply-templates select="/PubmedArticleSet/PubmedArticle[MedlineCitation/DateCreated/Year]"/>
df &lt;- data.frame(
  Year=as.integer(names(date2count)),
  Count=unlist(date2count)
  )
png('jeterpubmed.png')
plot(df)
title('pubmed: count(articles)=f(year)')
dev.off()
</xsl:template>

<xsl:template match="PubmedArticle">
<xsl:variable name="year" select="MedlineCitation/DateCreated/Year"/>
date2count[["<xsl:value-of select="$year"/>"]] &lt;- ifelse(is.null(date2count[["<xsl:value-of select="$year"/>"]]),1,1+date2count[["<xsl:value-of select="$year"/>"]])
</xsl:template>

</xsl:stylesheet>

Transforming to R
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml" |
  xsltproc pubmed2rstats.xsl -
date2count <- list()
date2count[["2013"]] <- ifelse(is.null(date2count[["2013"]]),1,1+date2count[["2013"]])
date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
(..)
df <- data.frame(
  Year=as.integer(names(date2count)),
  Count=unlist(date2count)
  )
png('jeterpubmed.png')
plot(df)
title('pubmed: count(articles)=f(year)')
dev.off()

Transforming to R
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml" |
  xsltproc pubmed2rstats.xsl - |
  R --no-save

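The stylesheet above generates R code that tallies articles per year. For comparison, the same tally can be computed directly from the XML. This is a different technique from the XSLT-to-R approach of the slides, shown only as a sketch. It reads the same `MedlineCitation/DateCreated/Year` path as the stylesheet, and articles without that element are skipped. The function name `articles_per_year` and the sample document are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def articles_per_year(pubmed_xml):
    """Count PubmedArticle records per MedlineCitation/DateCreated/Year."""
    root = ET.fromstring(pubmed_xml)
    years = (article.findtext("MedlineCitation/DateCreated/Year")
             for article in root.iter("PubmedArticle"))
    # findtext returns None when the element is missing; skip those records.
    return Counter(int(year) for year in years if year)


# Hypothetical, heavily shortened PubmedArticleSet.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><DateCreated><Year>2013</Year></DateCreated></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><DateCreated><Year>2012</Year></DateCreated></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><DateCreated><Year>2012</Year></DateCreated></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>4</PMID></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

counts = articles_per_year(SAMPLE)
for year in sorted(counts):
    print(year, counts[year])
```

The resulting year/count pairs are the same table that the generated R script stores in `df` before plotting.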
Generating a Java parser

Using the XML schema
XML Schema for dbSNP
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.ncbi.nlm.nih.(...)
    (...)elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:element name="ExchangeSet">
    <xsd:annotation>
      <xsd:documentation>Set of dbSNP refSNP docsums, version 3.4</xsd:documentation>
    </xsd:annotation>
    <xsd:sequence>
      <xsd:element name="SourceDatabase" minOccurs="0">
        <xsd:attribute name="taxId" type="xsd:int" use="required">
          <xsd:annotation>
            <xsd:documentation>NCBI taxonomy ID for variation</xsd:documentation>
          </xsd:annotation>
        </xsd:attribute>
        <xsd:attribute name="organism" type="xsd:string" use="required">
          <xsd:annotation>
            <xsd:documentation>common name for species used as part of database name(...)
          </xsd:annotation>
        </xsd:attribute>
        <xsd:attribute name="dbSnpOrgAbbr" type="xsd:string">
          <xsd:annotation>
            <xsd:documentation>organism abbreviation used in dbSNP.</xsd:documentation>
          </xsd:annotation>
        </xsd:attribute>
        <xsd:attribute name="gpipeOrgAbbr" type="xsd:string">
          <xsd:annotation>
            <xsd:documentation>organism abbreviation used within NCBI genome pipelin(...)

Using the XML schema
Compiling the XML Schema for dbSNP with XJC
$ xjc -d . "ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd"
parsing a schema...
compiling a schema...
https/www_ncbi_nlm_nih_gov/snp/docsum/Assay.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Assembly.java
https/www_ncbi_nlm_nih_gov/snp/docsum/BaseURL.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Component.java
https/www_ncbi_nlm_nih_gov/snp/docsum/ExchangeSet.java
https/www_ncbi_nlm_nih_gov/snp/docsum/FxnSet.java
https/www_ncbi_nlm_nih_gov/snp/docsum/MapLoc.java
https/www_ncbi_nlm_nih_gov/snp/docsum/ObjectFactory.java
https/www_ncbi_nlm_nih_gov/snp/docsum/PrimarySequence.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Rs.java
https/www_ncbi_nlm_nih_gov/snp/docsum/RsLinkout.java
https/www_ncbi_nlm_nih_gov/snp/docsum/RsStruct.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Ss.java
https/www_ncbi_nlm_nih_gov/snp/docsum/package-info.java

Using the XML schema
Compiling the XML Schema for dbSNP with XJC
Search the non-genomic rs# in dbSNP.
import https.www_ncbi_nlm_nih_gov.snp.docsum.*;
import javax.xml.bind.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;

class ParseDbSnp
{
    public static void main(String[] args) throws Exception
    {
        // JAXB context for the package generated by xjc
        JAXBContext jaxbCtxt = JAXBContext.newInstance("https.www_ncbi_nlm_nih_gov.snp.docsum");
        Unmarshaller unmarshaller = jaxbCtxt.createUnmarshaller();
        // stream the XML from stdin instead of loading the whole file
        XMLInputFactory ifactory = XMLInputFactory.newInstance();
        XMLEventReader r = ifactory.createXMLEventReader(System.in);
        while (r.hasNext())
        {
            // peek, so that the <Rs> start tag is still available to JAXB
            XMLEvent evt = r.peek();
            if (!(evt.isStartElement() && evt.asStartElement().getName().getLocalPart().equals("Rs")))
            {
                r.nextEvent(); // not an <Rs> start tag: consume and skip
                continue;
            }
            // unmarshal one <Rs> element; the reader ends up just after it
            Rs rs = unmarshaller.unmarshal(r, Rs.class).getValue();
            if ("genomic".equals(rs.getMolType())) continue;
            System.out.println("rs" + rs.getRsId() + " " + rs.getMolType());
        }
        r.close();
    }
}

Using the XML schema
Compiling the XML Schema for dbSNP with XJC
$ javac ParseDbSnp.java https/www_ncbi_nlm_nih_gov/snp/docsum/*.java
and run...
$ curl -s "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/ds_ch1.xml.gz" |
  gunzip -c |
  java ParseDbSnp
rs701 cDNA
rs860 cDNA
rs861 cDNA
rs862 cDNA
rs863 cDNA
rs864 cDNA
rs865 cDNA
rs866 cDNA
rs877 cDNA
rs878 cDNA
rs879 cDNA
rs880 cDNA
rs882 cDNA
rs883 cDNA
rs884 cDNA
rs885 cDNA
rs886 cDNA
rs913 cDNA
rs945 cDNA
rs946 cDNA
(...)

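The streaming idea behind the Java parser (walk the events, stop at each `Rs` element, handle it, move on) does not depend on JAXB. The sketch below shows the same pattern with Python's `iterparse`, without any generated classes. The attribute names `rsId` and `molType` are inferred from the `getRsId()` and `getMolType()` getters above, and the namespace in the sample document is an assumption. Elements are matched on their local name so the code does not rely on that namespace.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def non_genomic_rs(stream):
    """Yield (rs name, molType) for each <Rs> element whose molType is not 'genomic'."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "Rs":
            continue
        mol_type = elem.get("molType")
        if mol_type != "genomic":
            yield "rs" + elem.get("rsId"), mol_type
        elem.clear()  # drop the children of the processed record


# Hypothetical miniature ExchangeSet; real files are far larger.
SAMPLE = b"""<ExchangeSet xmlns="https://www.ncbi.nlm.nih.gov/SNP/docsum">
  <Rs rsId="701" molType="cDNA"/>
  <Rs rsId="702" molType="genomic"/>
  <Rs rsId="860" molType="cDNA"/>
</ExchangeSet>"""

hits = list(non_genomic_rs(io.BytesIO(SAMPLE)))
for name, mol_type in hits:
    print(name, mol_type)
```

To use it in the pipeline above, `sys.stdin.buffer` would be passed instead of the in-memory sample.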
Sample output
#!/usr/bin/perl
(...)
# National Center for Biotechnology Information
use LWP::Simple;
use LWP::UserAgent;
use Net::FTP;
my $delay = 0;
my $maxdelay = 3;
my $base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";
$params{email} = "nobody@nowhere.com";
$params{db} = "nuccore";
$params{tool} = "ebot";
$params{term} = "Mammuthus+primigenius[ORGN]";
%params = esearch(%params);
$params{retmode} = "xml";
$params{outfile} = "result.xml";
$params{rettype} = "native";
efetch_batch(%params);

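The body of `efetch_batch` is elided on the slide, so the following is only a guess at the general idea: a large result set is downloaded in slices by walking the `retstart` and `retmax` parameters of EFetch. The function names, the default batch size of 500 and the placeholder WebEnv are all assumptions made for this sketch.

```python
def batch_windows(count, batch_size=500):
    """Yield (retstart, retmax) pairs that together cover `count` records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for retstart in range(0, count, batch_size):
        yield retstart, min(batch_size, count - retstart)


def efetch_params(web_env, query_key, retstart, retmax, db="nuccore"):
    """Parameters for one EFetch call against a stored History result."""
    return {"db": db, "WebEnv": web_env, "query_key": query_key,
            "retstart": retstart, "retmax": retmax, "retmode": "xml"}


# Example: a search that reported 1200 hits, fetched 500 at a time.
windows = list(batch_windows(1200, 500))
requests = [efetch_params("NCID_1_EXAMPLE", 1, start, size) for start, size in windows]
for params in requests:
    print(params["retstart"], params["retmax"])
```

Each parameter set would then be sent as a separate request, with a pause in between to respect the NCBI usage limits.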
Standalone Blast
Standalone tools are available at
# add BLAST to your path
export PATH=${PATH}:/path/to/ncbi-blast-2.2.28+/bin

Standalone Blast
Download a sample
Apis mellifera proteins
curl -o protein.fa.gz \
  "ftp://ftp.ncbi.nih.gov/genomes/Apis_mellifera/protein/protein.fa.gz"
gunzip protein.fa.gz

Standalone Blast
Create a Blast database with makeblastdb
Getting help...
$ makeblastdb -help
(...)
 -dbtype <String, `nucl', `prot'>
   Molecule type of target db
 -in <File_In>
   Input file/database name
   Default = `-'
 -input_type <String, `asn1_bin', `asn1_txt', `blast(...)
   Type of the data specified in input_file
   Default = `fasta'
(...)

Standalone Blast
Create a Blast database with makeblastdb
Create the BLAST database:
$ makeblastdb -in protein.fa -dbtype prot
Building a new DB, current time: 09/02/2013 18:29:38
New DB name: protein.fa
New DB title: protein.fa
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 10570 sequences

Standalone Blast
Query a Blast database with blastp
Get help:
$ blastp -help
(...)
 -query <File_In>
   Input file name
   Default = `-'
 -db <String>
   BLAST database name
(...)

Standalone Blast
Blast human EIF4G1 gi:187956781
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" |
  blastp -db protein.fa
Query= gi|187956781|gb|AAI40897.1| EIF4G1 protein [Homo sapiens]
(...)
                                                                    Score     E
Sequences producing significant alignments:                        (Bits)  Value
gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation...    189  4e-49
gi|328779480|ref|XP_003249661.1| PREDICTED: hypothetical protei...   38.1  0.017
gi|110762568|ref|XP_001121713.1| PREDICTED: hypothetical protei...   38.1  0.018
(...)
> gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation
initiation factor 4 gamma 2-like [Apis mellifera]
 Score = 189 bits (479), Expect = 4e-49, Method: Compositional matrix adjust.
 Identities = 115/319 (36%), Positives = 175/319 (55%), Gaps = 39/319 (12%)
           ++P + +++ +DI+ E+ W P S +R A + S+ +FR+VR I
Sbjct 134  NFEPKKALIESQKGQSTFTFLLLSKCRDEFENRSKASEAFENQ----DELGPEEE----- 184

Standalone Blast
Blast human EIF4G1 gi:187956781, output XML
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" |
  blastp -db protein.fa -outfmt 5
(...)
<Hit_hsps>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>189.119</Hsp_bit-score>
<Hsp_score>479</Hsp_score>
<Hsp_evalue>3.78314e-49</Hsp_evalue>
<Hsp_query-from>717</Hsp_query-from>
<Hsp_query-to>1017</Hsp_query-to>
<Hsp_hit-from>22</Hsp_hit-from>
<Hsp_hit-to>319</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>115</Hsp_identity>
<Hsp_positive>175</Hsp_positive>
<Hsp_gaps>39</Hsp_gaps>
<Hsp_align-len>319</Hsp_align-len>
<Hsp_midline>++P + +++ +DI+ E+ W P S +R A + S+ +FR+VR ILNKLTP+ F + + + RI+FML+DV++LR WVPR+ +GP I+QI + E</Hsp_midline>
(...)

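The XML output (`-outfmt 5`) is easier to post-process than the text report. The sketch below pulls out a few fields from each HSP and derives the percent identity. It assumes each HSP is wrapped in an `<Hsp>` element inside `<Hit_hsps>`, a wrapper the excerpt above does not show. The function name `hsp_summaries` and the shortened sample are hypothetical, with values copied from the slide.

```python
import xml.etree.ElementTree as ET


def hsp_summaries(blast_xml):
    """Yield e-value, bit score and percent identity for every <Hsp> element."""
    root = ET.fromstring(blast_xml)
    for hsp in root.iter("Hsp"):
        identity = int(hsp.findtext("Hsp_identity"))
        align_len = int(hsp.findtext("Hsp_align-len"))
        yield {
            "evalue": float(hsp.findtext("Hsp_evalue")),
            "bit_score": float(hsp.findtext("Hsp_bit-score")),
            "percent_identity": 100.0 * identity / align_len,
        }


# Shortened sample built from the values shown on the slide.
SAMPLE = """<Hit_hsps><Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>189.119</Hsp_bit-score>
<Hsp_evalue>3.78314e-49</Hsp_evalue>
<Hsp_identity>115</Hsp_identity>
<Hsp_align-len>319</Hsp_align-len>
</Hsp></Hit_hsps>"""

best = next(hsp_summaries(SAMPLE))
print(best["evalue"], best["bit_score"], round(best["percent_identity"], 1))
```

The computed percent identity corresponds to the `Identities = 115/319 (36%)` line of the text report.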
$ curl "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=PAERLMERKADIE(...)
(...)
RTOE = 29
(...)

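The reply to a `CMD=Put` request embeds a small block of `key = value` lines. Only `RTOE` is visible on the slide; the sketch below assumes the block also contains a request ID line (`RID = ...`), which is needed to retrieve the results later. The function name `parse_qblast_info` and the sample reply, including its RID value, are made up for illustration.

```python
import re


def parse_qblast_info(reply):
    """Return (RID, RTOE) from a Put reply; either value is None if absent.

    RTOE is the estimated number of seconds before the results are ready.
    """
    fields = dict(re.findall(r"^\s*(RID|RTOE)\s*=\s*(\S+)", reply, flags=re.MULTILINE))
    rtoe = fields.get("RTOE")
    return fields.get("RID"), (int(rtoe) if rtoe is not None else None)


# Hypothetical reply fragment; the RID is a placeholder, not a real request.
SAMPLE = """<!--QBlastInfoBegin
    RID = EXAMPLE0RID
    RTOE = 29
QBlastInfoEnd
-->"""

rid, rtoe = parse_qblast_info(SAMPLE)
print(rid, rtoe)
```

A client would typically wait about `rtoe` seconds and then poll with the returned request ID before downloading the results.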
The End
Pierre Lindenbaum@yokofakun http://plindenbaum.blogspot.comAdvanced NCBI.The Entrez API

More Related Content

What's hot

Cool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchCool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchDavid Ruau
Biological databases
Biological databasesBiological databases
Biological databasesQamar iqbal
Biological databases: Challenges in organization and usability
Biological databases: Challenges in organization and usabilityBiological databases: Challenges in organization and usability
Biological databases: Challenges in organization and usabilityLars Juhl Jensen
BITs: Genome browsers and interpretation of gene lists.
BITs: Genome browsers and interpretation of gene lists.BITs: Genome browsers and interpretation of gene lists.
BITs: Genome browsers and interpretation of gene lists.BITS
140127 rtg phased pedigree analyses
140127 rtg phased pedigree analyses140127 rtg phased pedigree analyses
140127 rtg phased pedigree analysesGenomeInABottle
UNL UCARE Summer Symposium Poster
UNL UCARE Summer Symposium PosterUNL UCARE Summer Symposium Poster
UNL UCARE Summer Symposium PosterNichole Leacock
100505 koenig biological_databases
100505 koenig biological_databases100505 koenig biological_databases
100505 koenig biological_databasesMeetika Gupta
The Clinical Significance of Transcript Alignment Discrepancies
The Clinical Significance of Transcript Alignment DiscrepanciesThe Clinical Significance of Transcript Alignment Discrepancies
The Clinical Significance of Transcript Alignment DiscrepanciesReece Hart
Kim Pruitt biocuration2015
Kim Pruitt biocuration2015Kim Pruitt biocuration2015
Kim Pruitt biocuration2015Kim D. Pruitt
Computational Resources In Infectious Disease
Computational Resources In Infectious DiseaseComputational Resources In Infectious Disease
Computational Resources In Infectious DiseaseJoão André Carriço
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GenomeInABottle
Standarization in Proteomics: From raw data to metadata files
Standarization in Proteomics: From raw data to metadata filesStandarization in Proteomics: From raw data to metadata files
Standarization in Proteomics: From raw data to metadata filesYasset Perez-Riverol
Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim D. Pruitt
Ruby on bioinformatics
Ruby on bioinformaticsRuby on bioinformatics
Ruby on bioinformaticsTse-Ching Ho
Biological Database Systems
Biological Database SystemsBiological Database Systems
Biological Database SystemsDenis Shestakov

What's hot (20)

Cool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchCool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical Research
Biological databases
Biological databasesBiological databases
Biological databases
Biological databases: Challenges in organization and usability
Biological databases: Challenges in organization and usabilityBiological databases: Challenges in organization and usability
Biological databases: Challenges in organization and usability
BITs: Genome browsers and interpretation of gene lists.
BITs: Genome browsers and interpretation of gene lists.BITs: Genome browsers and interpretation of gene lists.
BITs: Genome browsers and interpretation of gene lists.
Gen bank
Gen bankGen bank
Gen bank
140127 rtg phased pedigree analyses
140127 rtg phased pedigree analyses140127 rtg phased pedigree analyses
140127 rtg phased pedigree analyses
Biological databases
Biological databasesBiological databases
Biological databases
Bioinformatica 06-10-2011-t2-databases
Bioinformatica 06-10-2011-t2-databasesBioinformatica 06-10-2011-t2-databases
Bioinformatica 06-10-2011-t2-databases
UNL UCARE Summer Symposium Poster
UNL UCARE Summer Symposium PosterUNL UCARE Summer Symposium Poster
UNL UCARE Summer Symposium Poster
100505 koenig biological_databases
100505 koenig biological_databases100505 koenig biological_databases
100505 koenig biological_databases
The Clinical Significance of Transcript Alignment Discrepancies
The Clinical Significance of Transcript Alignment DiscrepanciesThe Clinical Significance of Transcript Alignment Discrepancies
The Clinical Significance of Transcript Alignment Discrepancies
Kim Pruitt biocuration2015
Kim Pruitt biocuration2015Kim Pruitt biocuration2015
Kim Pruitt biocuration2015
Computational Resources In Infectious Disease
Computational Resources In Infectious DiseaseComputational Resources In Infectious Disease
Computational Resources In Infectious Disease
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005
Standarization in Proteomics: From raw data to metadata files
Standarization in Proteomics: From raw data to metadata filesStandarization in Proteomics: From raw data to metadata files
Standarization in Proteomics: From raw data to metadata files
Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015
Ruby on bioinformatics
Ruby on bioinformaticsRuby on bioinformatics
Ruby on bioinformatics
Whole exome sequencing(wes)
Whole exome sequencing(wes)Whole exome sequencing(wes)
Whole exome sequencing(wes)
Biological Database Systems
Biological Database SystemsBiological Database Systems
Biological Database Systems
Ensembl genome
Ensembl genomeEnsembl genome
Ensembl genome

Viewers also liked

File formats for Next Generation Sequencing
File formats for Next Generation SequencingFile formats for Next Generation Sequencing
File formats for Next Generation SequencingPierre Lindenbaum
Building a Simple LIMS with the Eclipse Modeling Framework (EMF) ,my notebook
Building a Simple LIMS with the Eclipse Modeling Framework (EMF) ,my notebookBuilding a Simple LIMS with the Eclipse Modeling Framework (EMF) ,my notebook
Building a Simple LIMS with the Eclipse Modeling Framework (EMF) ,my notebookPierre Lindenbaum
"Mon make à moi", (tout sauf Galaxy)
"Mon make à moi", (tout sauf Galaxy)"Mon make à moi", (tout sauf Galaxy)
"Mon make à moi", (tout sauf Galaxy)Pierre Lindenbaum
How to make a monkey: functional adaptation in the primate genome
How to make a monkey: functional adaptation in the primate genomeHow to make a monkey: functional adaptation in the primate genome
How to make a monkey: functional adaptation in the primate genomeRutger Vos
AM Career Marketing OHSU RIPSS 2014
AM Career Marketing OHSU RIPSS 2014AM Career Marketing OHSU RIPSS 2014
AM Career Marketing OHSU RIPSS 2014Jackie Wirz, PhD
NGP Retreat Open Science 2015
NGP Retreat Open Science 2015NGP Retreat Open Science 2015
NGP Retreat Open Science 2015Jackie Wirz, PhD
Biodatabases 101220022654-phpapp02
Biodatabases 101220022654-phpapp02Biodatabases 101220022654-phpapp02
Biodatabases 101220022654-phpapp02Sreekanth Gali
Bioinformatics issues and challanges presentation at s p college
Bioinformatics  issues and challanges  presentation at s p collegeBioinformatics  issues and challanges  presentation at s p college
Bioinformatics issues and challanges presentation at s p collegeSKUASTKashmir
Introduction to NCBI
Introduction to NCBIIntroduction to NCBI
Introduction to NCBIgeetikaJethra
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...VHIR Vall d’Hebron Institut de Recerca

NCBI Entrez API Guide for Advanced Bioinformatics

  • 1. Advanced NCBI. The Entrez API. Pierre Lindenbaum (@yokofakun), Institut du Thorax, Nantes, France. September 27, 2016. http://plindenbaum.blogspot.com
  • 2. NCBI? What about EBI, ENSEMBL, ...
  • 3.
  • 4. What will be covered today? File formats; EInfo, GQuery, ESearch, ESummary, EFetch; processing XML answers with XSLT (HTML, SVG, R); generating a Java parser for dbSNP; NCBI EBot; using standalone BLAST.
  • 5. CURL
      curl "http://en.wikipedia.org/wiki/Main_page"
      wget -O - "http://en.wikipedia.org/wiki/Main_page"
  • 6. XML
  • 7. XSLT
  • 8. XSLT
  • 9. XSLTPROC
      xsltproc stylesheet.xsl file.xml > result.xml
  • 10. JSON
  • 11. Formats
  • 12. Formats: GenBank (fcgi?db=nucleotide&id=25&rettype=gb)
      LOCUS       X53813                   422 bp    DNA     linear   MAM 22-JUN-1992
      DEFINITION  Blue Whale heavy satellite DNA.
      ACCESSION   X53813 X17460
      VERSION     X53813.1  GI:25
      KEYWORDS    satellite DNA.
      SOURCE      Balaenoptera musculus (Blue whale)
        ORGANISM  Balaenoptera musculus
                  Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                  Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
                  Mysticeti; Balaenopteridae; Balaenoptera.
      REFERENCE   1  (bases 1 to 422)
        AUTHORS   Arnason,U. and Widegren,B.
        TITLE     Composition and chromosomal localization of cetacean highly
                  repetitive DNA with special reference to the blue whale,
                  Balaenoptera musculus
        JOURNAL   Chromosoma 98 (5), 323-329 (1989)
         PUBMED   2612291
      COMMENT     See also <X52700-2> for 1,760 bp common cetacean component
                  clones and <X52703-6>,<X53811-4> for the 422 bp heavy satellite
                  clones.
      FEATURES             Location/Qualifiers
           source          1..422
                           /organism="Balaenoptera musculus"
                           /mol_type="genomic DNA"
                           /db_xref="taxon:9771"
                           /clone="7"
           misc_feature    1..422
                           /note="heavy satellite DNA"
      ORIGIN
              1 tagttattca acctatccca ctctctagat accccttagc acgtaaagga atattatttg
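As a rough illustration of how the flat-file layout above can be read by a program, the sketch below splits the LOCUS line into named fields. It assumes the eight whitespace-separated tokens seen in this particular record; `parse_locus` is a hypothetical helper, not part of any NCBI tool, and real GenBank files may need a more tolerant parser.

```python
# The LOCUS line of record X53813, as shown on the slide.
LOCUS_LINE = "LOCUS       X53813                   422 bp    DNA     linear   MAM 22-JUN-1992"


def parse_locus(line):
    """Split a GenBank LOCUS line into named fields (hypothetical helper).

    Assumes exactly eight whitespace-separated tokens, as in the slide's record.
    """
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError("unexpected LOCUS layout: %r" % line)
    _, name, length, _, moltype, topology, division, date = tokens
    return {
        "name": name,
        "length": int(length),  # sequence length in base pairs
        "moltype": moltype,
        "topology": topology,
        "division": division,
        "date": date,
    }


if __name__ == "__main__":
    print(parse_locus(LOCUS_LINE))
```

The same record is easier to process in its XML form, which avoids column-based parsing altogether.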
  • 13. Formats: ASN.1 (fcgi?db=nucleotide&id=25)
      Seq-entry ::= seq {
        id {
          embl {
            accession "X53813",
            version 1
          },
          gi 25
        },
        descr {
          title "Blue Whale heavy satellite DNA",
          source {
            org {
              taxname "Balaenoptera musculus",
              common "Blue whale",
              db {
                {
                  db "taxon",
                  tag id 9771
                }
              },
              orgname {
                name binomial {
                  genus "Balaenoptera",
                  species "musculus"
                },
                lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea; Mysticeti; Balaenopteridae; Balaenoptera",
                gcode 1,
                mgcode 2,
                div "MAM"
              }
            },
            subtype {
  • 14. Formats: ASN.1 (schema) http://
      INSDSeq ::= SEQUENCE {
        locus VisibleString ,
        length INTEGER ,
        strandedness VisibleString OPTIONAL ,
        moltype VisibleString ,
        topology VisibleString OPTIONAL ,
        division VisibleString ,
        update-date VisibleString ,
        create-date VisibleString OPTIONAL ,
        update-release VisibleString OPTIONAL ,
        create-release VisibleString OPTIONAL ,
        definition VisibleString ,
        primary-accession VisibleString OPTIONAL ,
        entry-version VisibleString OPTIONAL ,
        accession-version VisibleString OPTIONAL ,
        other-seqids SEQUENCE OF INSDSeqid OPTIONAL ,
        secondary-accessions SEQUENCE OF INSDSecondary-accn OPTIONAL ,
        project VisibleString OPTIONAL ,
        keywords SEQUENCE OF INSDKeyword OPTIONAL ,
        segment VisibleString OPTIONAL ,
        source VisibleString OPTIONAL ,
        organism VisibleString OPTIONAL ,
        taxonomy VisibleString OPTIONAL ,
        references SEQUENCE OF INSDReference OPTIONAL ,
        comment VisibleString OPTIONAL ,
        comment-set SEQUENCE OF INSDComment OPTIONAL ,
        struc-comments SEQUENCE OF INSDStrucComment OPTIONAL ,
        primary VisibleString OPTIONAL ,
        source-db VisibleString OPTIONAL ,
  • 15. Formats: ASN.1 (tools). DATATOOL: generate C++ data storage classes based on ASN.1 serialization streams; convert data between ASN.1, XML and JSON formats.
  • 16. Formats: XML (fcgi?db=nucleotide&id=25&retmode=xml)
      <?xml version="1.0"?>
      <!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_G
      <GBSet>
        <GBSeq>
          <GBSeq_locus>X53813</GBSeq_locus>
          <GBSeq_length>422</GBSeq_length>
          <GBSeq_strandedness>double</GBSeq_strandedness>
          <GBSeq_moltype>DNA</GBSeq_moltype>
          <GBSeq_topology>linear</GBSeq_topology>
          <GBSeq_division>MAM</GBSeq_division>
          <GBSeq_update-date>22-JUN-1992</GBSeq_update-date>
          <GBSeq_create-date>13-JUL-1990</GBSeq_create-date>
          <GBSeq_definition>Blue Whale heavy satellite DNA</GBSeq_definition>
          <GBSeq_primary-accession>X53813</GBSeq_primary-accession>
          <GBSeq_accession-version>X53813.1</GBSeq_accession-version>
          <GBSeq_other-seqids>
            <GBSeqid>emb|X53813.1|</GBSeqid>
            <GBSeqid>gi|25</GBSeqid>
          </GBSeq_other-seqids>
          <GBSeq_secondary-accessions>
            <GBSecondary-accn>X17460</GBSecondary-accn>
          </GBSeq_secondary-accessions>
          <GBSeq_keywords>
            <GBKeyword>satellite DNA</GBKeyword>
          </GBSeq_keywords>
          <GBSeq_source>Balaenoptera musculus (Blue whale)</GBSeq_source>
          <GBSeq_organism>Balaenoptera musculus</GBSeq_organism>
          <GBSeq_taxonomy>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea; Mysticeti; Balaenopteridae; Balaenoptera</GBSeq_taxonomy>
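To make the structure of the GBSeq XML more concrete, here is a minimal sketch that reads a few elements with Python's standard `xml.etree.ElementTree`. The sample string is an abridged copy of the slide's record with the DOCTYPE left out, and `summarize_gbset` is a hypothetical helper introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the GBSet document from the slide (DOCTYPE omitted).
SAMPLE_GBSET = """<GBSet>
  <GBSeq>
    <GBSeq_locus>X53813</GBSeq_locus>
    <GBSeq_length>422</GBSeq_length>
    <GBSeq_moltype>DNA</GBSeq_moltype>
    <GBSeq_definition>Blue Whale heavy satellite DNA</GBSeq_definition>
    <GBSeq_accession-version>X53813.1</GBSeq_accession-version>
    <GBSeq_organism>Balaenoptera musculus</GBSeq_organism>
  </GBSeq>
</GBSet>"""


def summarize_gbset(xml_text):
    """Return one small dict per <GBSeq> record (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    records = []
    for seq in root.iter("GBSeq"):
        records.append({
            "locus": seq.findtext("GBSeq_locus"),
            "length": int(seq.findtext("GBSeq_length")),
            "accession_version": seq.findtext("GBSeq_accession-version"),
            "organism": seq.findtext("GBSeq_organism"),
        })
    return records


if __name__ == "__main__":
    for record in summarize_gbset(SAMPLE_GBSET):
        print(record)
```

Because the element order is fixed by the DTD, looking children up by name is sufficient here; no positional logic is needed.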
  • 17. Formats: XML (DTD)
      <!ELEMENT GBSeq (
        GBSeq_locus ,
        GBSeq_length ,
        GBSeq_strandedness? ,
        GBSeq_moltype ,
        GBSeq_topology? ,
        GBSeq_division ,
        GBSeq_update-date ,
        GBSeq_create-date? ,
        GBSeq_update-release? ,
        GBSeq_create-release? ,
        GBSeq_definition ,
        GBSeq_primary-accession? ,
        GBSeq_entry-version? ,
        GBSeq_accession-version? ,
        GBSeq_other-seqids? ,
        GBSeq_secondary-accessions? ,
        GBSeq_project? ,
        GBSeq_keywords? ,
        GBSeq_segment? ,
        GBSeq_source? ,
        GBSeq_organism? ,
        GBSeq_taxonomy? ,
        GBSeq_references? ,
        GBSeq_comment? ,
        GBSeq_comment-set? ,
        GBSeq_struc-comments? ,
        (...)
  • 18. E-Utilities
  • 19. GI
  • 20. GI. 03-02-2016-phase-out-of-GI-numbers/ : "NCBI is phasing out sequence GIs - use Accession.Version instead!"
  • 21. E-Utilities: a set of seven server-side programs that provide a stable interface to the search, retrieval, and linking functions of the Entrez system, using a fixed URL syntax. The output of the E-Utilities is in XML format, sometimes JSON, (...)
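The "fixed URL syntax" can be sketched as a small URL builder. This is only an illustration: the base URL below is NCBI's public E-utilities endpoint as I understand it (the transcript does not spell it out), and `eutils_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: NCBI's public E-utilities endpoint; the transcript elides the base URL.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(tool, **params):
    """Build a URL such as .../einfo.fcgi?db=pubmed (hypothetical helper).

    `tool` is the lowercase utility name (einfo, esearch, esummary, efetch...).
    Parameters are sorted so that the resulting URL is deterministic.
    """
    query = urlencode(sorted(params.items()))
    url = EUTILS_BASE + tool + ".fcgi"
    return url + "?" + query if query else url


if __name__ == "__main__":
    # Same parameters as the GenBank example on slide 12.
    print(eutils_url("efetch", db="nucleotide", id=25, rettype="gb"))
    print(eutils_url("einfo", db="pubmed", retmode="json"))
```

Only the query parameters change from one utility to the next, which is what makes the interface easy to script.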
  • 22. Entrez Direct: "Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command-line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process."
  • 23. EInfo
  • 24. EInfo: provides a list of the names of all valid Entrez databases. Provides statistics for a single database, including lists of indexing fields and available link names.
  • 25. EInfo: Base URL:
  • 26. EInfo: XML output https://
      <eInfoResult>
        <DbList>
          <DbName>pubmed</DbName>
          <DbName>protein</DbName>
          <DbName>nuccore</DbName>
          <DbName>nucleotide</DbName>
          <DbName>nucgss</DbName>
          <DbName>nucest</DbName>
          <DbName>structure</DbName>
          <DbName>genome</DbName>
          <DbName>assembly</DbName>
          <DbName>gcassembly</DbName>
          <DbName>genomeprj</DbName>
          <DbName>bioproject</DbName>
          <DbName>biosample</DbName>
          <DbName>biosystems</DbName>
          <DbName>blastdbinfo</DbName>
  • 27. EInfo: JSON output (fcgi?retmode=json)
      {
        "header": {
          "type": "einfo",
          "version": "0.3"
        },
        "einforesult": {
          "dblist": [
            "pubmed",
            "protein",
            "nuccore",
            (...)
            "unigene",
            "gencoll",
            "gtr"
          ]
        }
      }
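A short sketch of how the JSON answer above might be consumed, using only the standard `json` module. The sample string keeps the key names from the slide but drops the elided databases, and `list_databases` is a hypothetical helper.

```python
import json

# Abridged copy of the slide's JSON answer; the "(...)" entries are left out.
SAMPLE_EINFO_JSON = """{
  "header": {"type": "einfo", "version": "0.3"},
  "einforesult": {
    "dblist": ["pubmed", "protein", "nuccore", "unigene", "gencoll", "gtr"]
  }
}"""


def list_databases(json_text):
    """Return the database names from an EInfo JSON answer (hypothetical helper)."""
    payload = json.loads(json_text)
    # Check the header first so that an unrelated document fails loudly.
    if payload.get("header", {}).get("type") != "einfo":
        raise ValueError("not an EInfo response")
    return payload["einforesult"]["dblist"]


if __name__ == "__main__":
    for name in list_databases(SAMPLE_EINFO_JSON):
        print(name)
```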
  • 28. EInfo: return statistics for a given Entrez database: db=DbName
  • 29. EInfo: statistics for PubMed (fcgi?db=pubmed)
      <?xml version="1.0"?>
      <eInfoResult>
        <DbInfo>
          <DbName>pubmed</DbName>
          <MenuName>PubMed</MenuName>
          <Description>PubMed bibliographic record</Description>
          <DbBuild>Build130805-2117m.4</DbBuild>
          <Count>22974581</Count>
          <LastUpdate>2013/08/06 08:33</LastUpdate>
          <FieldList>
            (...)
            <Field>
              <Name>UID</Name>
              <FullName>UID</FullName>
              <Description>Unique number assigned to publication</Description>
              <TermCount>0</TermCount>
              <IsDate>N</IsDate>
              <IsNumerical>Y</IsNumerical>
              <SingleToken>Y</SingleToken>
              <Hierarchy>N</Hierarchy>
              <IsHidden>Y</IsHidden>
            </Field>
            <Field>
              (...)
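The per-database statistics can be pulled out of this XML in a few lines. The sketch below works on an abridged copy of the slide's answer and converts the Y/N flags to booleans; `database_stats` is a hypothetical helper, and a real response carries many more fields.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the EInfo answer for db=pubmed shown on the slide.
SAMPLE_EINFO_XML = """<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <Count>22974581</Count>
    <FieldList>
      <Field>
        <Name>UID</Name>
        <FullName>UID</FullName>
        <IsNumerical>Y</IsNumerical>
        <IsHidden>Y</IsHidden>
      </Field>
    </FieldList>
  </DbInfo>
</eInfoResult>"""


def database_stats(xml_text):
    """Extract the record count and the indexing fields (hypothetical helper)."""
    info = ET.fromstring(xml_text).find("DbInfo")
    fields = [
        {
            "name": field.findtext("Name"),
            "numerical": field.findtext("IsNumerical") == "Y",
            "hidden": field.findtext("IsHidden") == "Y",
        }
        for field in info.iter("Field")
    ]
    return {
        "db": info.findtext("DbName"),
        "count": int(info.findtext("Count")),
        "fields": fields,
    }


if __name__ == "__main__":
    print(database_stats(SAMPLE_EINFO_XML))
```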
  • 30. EInfo Statistics for PubMed fcgi?db=pubmed&retmode=json
    { "header": { "type": "einfo", "version": "0.3" },
      "einforesult": {
        "dbinfo": {
          "dbname": "pubmed",
          "menuname": "PubMed",
          "description": "PubMed bibliographic record",
          "dbbuild": "Build160921-2207m.6",
          "count": "26470199",
          "lastupdate": "2016/09/22 16:32",
          "fieldlist": [
            { "name": "ALL",
              "fullname": "All Fields",
              "description": "All terms from all searchable fields",
              "termcount": "179424126",
              "isdate": "N",
              "isnumerical": "N",
              "singletoken": "N",
              "hierarchy": "N",
              "ishidden": "N" },
            { "name": "UID",
              "fullname": "UID",
              "description": "Unique number assigned to publication",
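The two EInfo modes above (no `db` lists the databases, `db=DbName` returns statistics) can be sketched offline in Python. This is only an illustration: the helper names `einfo_url` and `field_names` are made up here, the `einfo.fcgi` endpoint is assumed to sit under the same base path as the `esearch.fcgi` URLs used later in the slides, and the sample reply is a trimmed copy of the JSON shown on the slide.

```python
import json
from urllib.parse import urlencode

# Assumed base path, taken from the esearch/efetch URLs used elsewhere in the slides.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def einfo_url(db=None, retmode=None):
    """Build an EInfo URL; without db the service lists all databases."""
    params = {}
    if db:
        params["db"] = db
    if retmode:
        params["retmode"] = retmode
    query = urlencode(params)
    return EUTILS + "/einfo.fcgi" + ("?" + query if query else "")


def field_names(einfo_json):
    """Return the search-field names from an EInfo JSON reply for one database."""
    info = json.loads(einfo_json)["einforesult"]["dbinfo"]
    # The slide shows "dbinfo" as an object; accept a one-element list as well,
    # in case the service wraps it that way (an assumption, not from the slide).
    if isinstance(info, list):
        info = info[0]
    return [field["name"] for field in info["fieldlist"]]


# Trimmed copy of the reply on the slide: only two fields and a few keys kept.
SAMPLE = """{"header": {"type": "einfo", "version": "0.3"},
 "einforesult": {"dbinfo": {"dbname": "pubmed", "count": "26470199",
   "fieldlist": [{"name": "ALL", "fullname": "All Fields"},
                 {"name": "UID", "fullname": "UID"}]}}}"""

print(einfo_url())
print(einfo_url("pubmed", "json"))
print(field_names(SAMPLE))
```

Nothing here touches the network; fetching the URL is left to whatever HTTP client is at hand.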
  • 31. EInfo With entrez-direct
    $ einfo -dbs
    $ einfo -db pubmed
  • 32. GQuery
  • 33. GQuery Provides the number of records retrieved in all Entrez databases by a single text query.
  • 34. GQuery Example
    $ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml"
    <Result>
      <Term>tyrannosaurus rex</Term>
      <eGQueryResult>
        <ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>mesh</DbName><MenuName/><Count>15</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>books</DbName><MenuName/><Count>179</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>pubmedhealth</DbName><MenuName/><Count>21</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>omim</DbName><MenuName/><Count>10</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>omia</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
        <ResultItem><DbName>ncbisearch</DbName><MenuName/><Count>1</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>nuccore</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
        (...)
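As a rough companion to the GQuery reply above, the per-database counts can be pulled out with Python's standard XML parser. The function name `gquery_hits` is hypothetical, and the sample is a three-item excerpt of the slide's output rather than a live response.

```python
import xml.etree.ElementTree as ET

# Excerpt of the reply shown on the slide (three ResultItem elements kept).
SAMPLE = """<Result>
  <Term>tyrannosaurus rex</Term>
  <eGQueryResult>
    <ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>omia</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
  </eGQueryResult>
</Result>"""


def gquery_hits(xml_text):
    """Map database name to record count, keeping only items whose Status is Ok."""
    hits = {}
    for item in ET.fromstring(xml_text).iter("ResultItem"):
        if item.findtext("Status") == "Ok":
            hits[item.findtext("DbName")] = int(item.findtext("Count"))
    return hits


print(gquery_hits(SAMPLE))
```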
  • 35. GQuery Transforming to HTML using XSLT
    The XSLT stylesheet. lindenb/courses/master/about.ncbi/gquery2html.xsl
    <?xml version='1.0' encoding="UTF-8" ?>
    <xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
    <xsl:output method="html"/>
    <xsl:template match="/"><html><body>
    <xsl:apply-templates select="Result"/>
    </body></html></xsl:template>
    <xsl:template match="Result">
    <table><caption><xsl:value-of select="Term"/></caption>
    <tr><th>Database</th><th>Count</th><th>Status</th></tr>
    <xsl:apply-templates select="eGQueryResult/ResultItem"/>
    </table>
    </xsl:template>
    <xsl:template match="ResultItem">
    <tr>
    <td><a>
    <xsl:attribute name="href">http://www.ncbi.nlm.nih.gov/<xsl:value-of select="DbName"/>?cmd=search&amp;term=<xsl:value-of select="translate(/Result/Term,' ','+')"/></xsl:attribute>
    <xsl:value-of select="DbName"/></a></td>
    <td><xsl:value-of select="Count"/></td>
    <td><xsl:value-of select="Status"/></td>
    </tr>
    </xsl:template>
    </xsl:stylesheet>
  • 36. GQuery Transforming to HTML
    $ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml" | xsltproc gquery2html.xsl -
    <html>
    <body>
    <table>
    <caption>tyrannosaurus rex</caption>
    <tr>
    <th>Database</th>
    <th>Count</th>
    <th>Status</th>
    </tr>
    <tr>
    <td>
    <a href="https://www.ncbi.nlm.nih.gov/pubmed?cmd=search&amp;term=tyrannosaurus
    </td>
    <td>41</td>
    <td>Ok</td>
    </tr>
    <tr>
    <td>
    <a href="https://www.ncbi.nlm.nih.gov/pmc?cmd=search&amp;term=tyrannosaurus+re
    </td>
    <td>160</td>
    <td>Ok</td>
    </tr>
    <tr>
    <td>
    <a href="https://www.ncbi.nlm.nih.gov/mesh?cmd=search&amp;term=tyrannosaurus+r
    </td>
    <td>15</td>
  • 37. ESearch
  • 38. ESearch
    Provides a list of UIDs matching a text query
    Posts the results of a search on the History server
    Downloads all UIDs from a dataset stored on the History server
    Combines or limits UID datasets stored on the History server
    Sorts sets of UIDs
  • 39. ESearch Syntax Base URL https://
  • 40. ESearch Searching for 'Mammuthus primigenius'
    curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D" | xmllint --format -
    <eSearchResult>
      <Count>684</Count>
      <RetMax>20</RetMax>
      <RetStart>0</RetStart>
      <IdList>
        <Id>507866428</Id>
        <Id>124056416</Id>
        <Id>383843869</Id>
        <Id>383843867</Id>
        <Id>383843865</Id>
        <Id>383843863</Id>
        <Id>383843861</Id>
        <Id>383843859</Id>
        <Id>383843857</Id>
        <Id>383843855</Id>
        <Id>383843853</Id>
        <Id>383843851</Id>
        <Id>383843849</Id>
        <Id>383843847</Id>
        <Id>383843845</Id>
        <Id>157367690</Id>
        <Id>157367676</Id>
        <Id>157367662</Id>
        <Id>157367648</Id>
        <Id>157367634</Id>
      </IdList>
      <TranslationSet>
        <Translation>
  • 41. ESearch Searching for 'Mammuthus primigenius' (JSON)
    curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmode=json"
    { "header": { "type": "esearch", "version": "0.3" },
      "esearchresult": {
        "count": "811",
        "retmax": "20",
        "retstart": "0",
        "idlist": [
          "1059791223", "198241525", "198241523", "198241521", "198241519",
          "198241517", "198241515", "198241513", "198241511", "198241509",
          "198241507", "198241505", "198241503", "198241501", "198241499",
          "198241497", "198241495", "198241493", "198241491",
  • 42. ESearch the retmax parameter
    curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=2" | xmllint --format -
    <eSearchResult>
      <Count>684</Count>
      <RetMax>2</RetMax>
      <RetStart>0</RetStart>
      <IdList>
        <Id>507866428</Id>
        <Id>124056416</Id>
      </IdList>
      <TranslationSet>
        <Translation>
          <From>"Mammuthus primigenius"[ORGN]</From>
          <To>"Mammuthus primigenius"[Organism]</To>
        </Translation>
      </TranslationSet>
      <TranslationStack>
        <TermSet>
          <Term>"Mammuthus primigenius"[Organism]</Term>
          <Field>Organism</Field>
          <Count>684</Count>
          <Explode>Y</Explode>
        </TermSet>
        <OP>GROUP</OP>
      </TranslationStack>
      <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
    </eSearchResult>
  • 43. ESearch the retstart parameter
    curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=3&retstart=100" | xmllint --format -
    <eSearchResult>
      <Count>684</Count>
      <RetMax>3</RetMax>
      <RetStart>100</RetStart>
      <IdList>
        <Id>300810656</Id>
        <Id>300810655</Id>
        <Id>300810654</Id>
      </IdList>
      <TranslationSet>
        <Translation>
          <From>"Mammuthus primigenius"[ORGN]</From>
          <To>"Mammuthus primigenius"[Organism]</To>
        </Translation>
      </TranslationSet>
      <TranslationStack>
        <TermSet>
          <Term>"Mammuthus primigenius"[Organism]</Term>
          <Field>Organism</Field>
          <Count>684</Count>
          <Explode>Y</Explode>
        </TermSet>
        <OP>GROUP</OP>
      </TranslationStack>
      <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
    </eSearchResult>
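Taken together, retmax and retstart allow a result set to be walked page by page. The sketch below only builds the URLs, one per page; `esearch_pages` is a hypothetical helper, the page size of 300 is arbitrary, and the total of 684 is the Count reported on the slides.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_pages(db, term, total, page_size=20):
    """Yield one ESearch URL per window of page_size UIDs, covering total records."""
    for start in range(0, total, page_size):
        params = {"db": db, "term": term, "retstart": start, "retmax": page_size}
        # urlencode percent-encodes the quotes and brackets and turns spaces into '+'.
        yield ESEARCH + "?" + urlencode(params)


urls = list(esearch_pages("nucleotide", '"Mammuthus primigenius"[ORGN]',
                          total=684, page_size=300))
for url in urls:
    print(url)
```

In practice the total would come from a first call (for example the Count element of any ESearch reply) before the remaining pages are requested.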
  • 44. ESearch rettype=count
    curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&rettype=count" | xmllint --format -
    <eSearchResult>
      <Count>684</Count>
    </eSearchResult>
  • 45. ESearch sort=Date Released
    curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&sort=Date+Released" | xmllint --format -
    <eSearchResult><Count>811</Count><RetMax>20</RetMax>
    <Id>1033204644</Id>
    <Id>1033204658</Id>
    <Id>1033204672</Id>
    <Id>1033204686</Id>
    <Id>1033204729</Id>
    <Id>1033204771</Id>
    <Id>1033204785</Id>
    <Id>1033204799</Id>
    <Id>1033204813</Id>
    <Id>1033204827</Id>
    <Id>1033204871</Id>
    <Id>1033205124</Id>
    <Id>1033205194</Id>
  • 46. ESummary
  • 47. ESummary Syntax
    Returns document summaries (DocSums) for a list of input UIDs
    Returns DocSums for a set of UIDs stored on the Entrez History server
  • 48. ESummary Syntax Base URL: eutils/esummary.fcgi?db=(DB)&id=(ID)
  • 49. ESummary Retrieve nucleotide gi=507866428
    $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428"
    <eSummaryResult>
      <DocSum>
        <Id>507866428</Id>
        <Item Name="Caption" Type="String">KC524742</Item>
        <Item Name="Title" Type="String">Mammuthus primigenius isolate CME2005/915 myoglobin (Mb
        <Item Name="Extra" Type="String">gi|507866428|gb|KC524742.1|[507866428]</Item>
        <Item Name="Gi" Type="Integer">507866428</Item>
        <Item Name="CreateDate" Type="String">2013/06/15</Item>
        <Item Name="UpdateDate" Type="String">2013/06/21</Item>
        <Item Name="Flags" Type="Integer">0</Item>
        <Item Name="TaxId" Type="Integer">37349</Item>
        <Item Name="Length" Type="Integer">9042</Item>
        <Item Name="Status" Type="String">live</Item>
        <Item Name="ReplacedBy" Type="String"></Item>
        <Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
      </DocSum>
    </eSummaryResult>
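A DocSum is essentially a list of typed name/value pairs, so it maps naturally onto a dictionary. The following is a minimal sketch under that reading: `docsum_to_dict` and `parse_esummary` are illustrative names, only top-level Item elements are handled (List-typed items such as a PubMed AuthorList would need recursion), and the sample is a shortened copy of the reply above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the DocSum shown on the slide.
SAMPLE = """<eSummaryResult><DocSum>
  <Id>507866428</Id>
  <Item Name="Caption" Type="String">KC524742</Item>
  <Item Name="Gi" Type="Integer">507866428</Item>
  <Item Name="TaxId" Type="Integer">37349</Item>
  <Item Name="Length" Type="Integer">9042</Item>
  <Item Name="Status" Type="String">live</Item>
  <Item Name="ReplacedBy" Type="String"></Item>
</DocSum></eSummaryResult>"""


def docsum_to_dict(docsum):
    """Flatten the direct Item children of one DocSum into a dict."""
    record = {"Id": docsum.findtext("Id")}
    for item in docsum.findall("Item"):
        text = item.text or ""
        # Convert Integer items when a value is present; keep everything else as text.
        if item.get("Type") == "Integer" and text.strip():
            record[item.get("Name")] = int(text)
        else:
            record[item.get("Name")] = text
    return record


def parse_esummary(xml_text):
    """Return one dict per DocSum in an ESummary XML reply."""
    return [docsum_to_dict(d) for d in ET.fromstring(xml_text).iter("DocSum")]


record = parse_esummary(SAMPLE)[0]
print(record["Caption"], record["TaxId"], record["Length"])
```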
  • 50. ESummary Retrieve nucleotide gi=507866428 in JSON
    $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428&retmode=json"
    { "header": { "type": "esummary", "version": "0.3" },
      "result": {
        "uids": [ "507866428" ],
        "507866428": {
          "uid": "507866428",
          "caption": "KC524742",
          "title": "Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, par
          "extra": "gi|507866428|gb|KC524742.1|",
          "gi": 507866428,
          "createdate": "2013/06/15",
          "updatedate": "2013/06/21",
          "flags": "",
          "taxid": 37349,
          "slen": 9042,
          "biomol": "genomic",
          "moltype": "dna",
          "topology": "linear",
          "sourcedb": "insd",
          "segsetsize": "",
          "projectid": "0",
          (...)
  • 51. ESummary Retrieve snp rs25
    $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=snp&id=25"
    <eSummaryResult>
      <DocSum>
        <Id>25</Id>
        <Item Name="SNP_ID" Type="Integer">25</Item>
        <Item Name="Organism" Type="String"></Item>
        <Item Name="ALLELE_ORIGIN" Type="String"></Item>
        <Item Name="GLOBAL_MAF" Type="String">0.4913</Item>
        <Item Name="GLOBAL_POPULATION" Type="String"></Item>
        <Item Name="GLOBAL_SAMPLESIZE" Type="Integer">0</Item>
        <Item Name="SUSPECTED" Type="String"></Item>
        <Item Name="CLINICAL_SIGNIFICANCE" Type="String"></Item>
        <Item Name="GENE" Type="String">THSD7A</Item>
        <Item Name="LOCUS_ID" Type="Integer">221981</Item>
        <Item Name="ACC" Type="String">NM_015204.2,NT_007819.17</Item>
        <Item Name="CHR" Type="String">7</Item>
        <Item Name="WEIGHT" Type="Integer">1</Item>
        <Item Name="HANDLE" Type="String">1000GENOMES,BGI,BL,BUSHMAN,COMPLETE_GENOMICS,CSHL-HAPM
        <Item Name="FXN_CLASS" Type="String">intron-variant</Item>
        <Item Name="VALIDATED" Type="String">by-1000G,by-cluster,by-frequency,by-hapmap</Item>
        <Item Name="GTYPE" Type="String">true</Item>
        <Item Name="NONREF" Type="String">false</Item>
        <Item Name="DOCSUM" Type="String">HGVS=NC_000007.13:g.11584142T&gt;C,NG_027670.1:g.29268
        <Item Name="HET" Type="Integer">50</Item>
        <Item Name="SRATE" Type="Integer">0</Item>
        <Item Name="TAX_ID" Type="Integer">9606</Item>
        <Item Name="CHRRPT" Type="String">25|2|0|1|1|1|7|NT_007819.17|11574141|11584142|THSD7A|0
        <Item Name="ORIG_BUILD" Type="Integer">36</Item>
        <Item Name="UPD_BUILD" Type="Integer">138</Item>
        <Item Name="CREATEDATE" Type="String">2000-09-19 17:02</Item>
  • 52. ESummary Retrieve pubmed pmid=7939126
    $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=7939126"
    <eSummaryResult>
      <DocSum>
        <Id>7939126</Id>
        <Item Name="PubDate" Type="Date">1994 Apr</Item>
        <Item Name="EPubDate" Type="Date"></Item>
        <Item Name="Source" Type="String">Sleep</Item>
        <Item Name="AuthorList" Type="List">
          <Item Name="Author" Type="String">Broughton R</Item>
          <Item Name="Author" Type="String">Billings R</Item>
          <Item Name="Author" Type="String">Cartwright R</Item>
          <Item Name="Author" Type="String">Doucette D</Item>
          <Item Name="Author" Type="String">Edmeads J</Item>
          <Item Name="Author" Type="String">Edwardh M</Item>
          <Item Name="Author" Type="String">Ervin F</Item>
          <Item Name="Author" Type="String">Orchard B</Item>
          <Item Name="Author" Type="String">Hill R</Item>
          <Item Name="Author" Type="String">Turrell G</Item>
        </Item>
        <Item Name="LastAuthor" Type="String">Turrell G</Item>
        <Item Name="Title" Type="String">Homicidal somnambulism: a case report.</Item>
        <Item Name="Volume" Type="String">17</Item>
        <Item Name="Issue" Type="String">3</Item>
        <Item Name="Pages" Type="String">253-64</Item>
        <Item Name="LangList" Type="List">
          <Item Name="Lang" Type="String">English</Item>
        </Item>
        <Item Name="NlmUniqueID" Type="String">7809084</Item>
        <Item Name="ISSN" Type="String">0161-8105</Item>
        <Item Name="ESSN" Type="String">1550-9109</Item>
  • 53. EFetch
  • 54. EFetch Syntax Base URL: eutils/efetch.fcgi?db=(db)&id=(ID)
  • 55. EFetch Retrieve nucleotide gi=507866428 as ASN.1 (default)
    efetch.fcgi?db=nucleotide&id=507866428
    Seq-entry ::= set {
      class nuc-prot ,
      descr {
        source {
          genome genomic ,
          org {
            taxname "Mammuthus primigenius" ,
            common "woolly mammoth" ,
            db {
              {
                db "taxon" ,
                tag id 37349 } } ,
            orgname {
              name binomial {
                genus "Mammuthus" ,
                species "primigenius" } ,
              mod {
                {
  • 57. EFetch Retrieve nucleotide gi=507866428 as TinySeq
    https:// db=nucleotide&id=507866428&rettype=fasta&retmode=xml
    <?xml version="1.0"?>
    <!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN"
    <TSeqSet>
      <TSeq>
        <TSeq_seqtype value="nucleotide"/>
        <TSeq_gi>507866428</TSeq_gi>
        <TSeq_accver>KC524742.1</TSeq_accver>
        <TSeq_taxid>37349</TSeq_taxid>
        <TSeq_orgname>Mammuthus primigenius</TSeq_orgnam
        <TSeq_defline>Mammuthus primigenius isolate CME2
        <TSeq_length>9042</TSeq_length>
        <TSeq_sequence>GCACTTGCTTTTTTTGTCTTCTTCAGACCACGA
      </TSeq>
    </TSeqSet>
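TinySeq XML is convenient precisely because it is easy to turn back into FASTA. A small sketch of that conversion follows; `tinyseq_to_fasta` is a hypothetical name, the element names assume the underscores that the slide extraction dropped (`TSeq_accver`, `TSeq_defline`, `TSeq_sequence`), and the sample record keeps only the truncated definition line and the first bases visible on the slide, so it is not the full 9042 bp sequence.

```python
import xml.etree.ElementTree as ET

# First bases visible on the slide; the real record is 9042 bp long.
SEQ = "GCACTTGCTTTTTTTGTCTTCTTCAGACCACGA"

# Toy record: defline and sequence are the truncated fragments from the slide.
SAMPLE = f"""<TSeqSet><TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_gi>507866428</TSeq_gi>
  <TSeq_accver>KC524742.1</TSeq_accver>
  <TSeq_taxid>37349</TSeq_taxid>
  <TSeq_orgname>Mammuthus primigenius</TSeq_orgname>
  <TSeq_defline>Mammuthus primigenius isolate CME2</TSeq_defline>
  <TSeq_sequence>{SEQ}</TSeq_sequence>
</TSeq></TSeqSet>"""


def tinyseq_to_fasta(xml_text, width=60):
    """Convert every TSeq record into FASTA text, wrapping residues at width."""
    lines = []
    for seq in ET.fromstring(xml_text).iter("TSeq"):
        lines.append(">" + seq.findtext("TSeq_accver") + " " + seq.findtext("TSeq_defline"))
        residues = seq.findtext("TSeq_sequence") or ""
        lines.extend(residues[i:i + width] for i in range(0, len(residues), width))
    return "\n".join(lines) + "\n"


fasta = tinyseq_to_fasta(SAMPLE, width=20)
print(fasta, end="")
```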
  • 58. EFetch Retrieve nucleotide gi=507866428 as GenBank XML
    fcgi?db=nucleotide&id=507866428&retmode=xml
    <GBSeq>
      <GBSeq_locus>KC524742</GBSeq_locus>
      <GBSeq_length>9042</GBSeq_length>
      <GBSeq_strandedness>double</GBSeq_strandedness>
      <GBSeq_moltype>DNA</GBSeq_moltype>
      <GBSeq_topology>linear</GBSeq_topology>
      <GBSeq_division>MAM</GBSeq_division>
      <GBSeq_update-date>21-JUN-2013</GBSeq_update-date>
      <GBSeq_create-date>15-JUN-2013</GBSeq_create-date>
      <GBSeq_definition>Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, parti
      <GBSeq_primary-accession>KC524742</GBSeq_primary-accession>
      <GBSeq_accession-version>KC524742.1</GBSeq_accession-version>
      <GBSeq_other-seqids>
        <GBSeqid>gb|KC524742.1|</GBSeqid>
        <GBSeqid>gi|507866428</GBSeqid>
      </GBSeq_other-seqids>
      <GBSeq_source>Mammuthus primigenius (woolly mammoth)</GBSeq_source>
      <GBSeq_organism>Mammuthus primigenius</GBSeq_organism>
      (...)
  • 59. EFetch Retrieve nucleotide gi=507866428 as GenBank
    fcgi?db=nucleotide&id=507866428&rettype=gb
    LOCUS       KC524742  9042 bp  DNA  linear  MAM  21-JUN-2013
    DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds.
    ACCESSION   KC524742
    VERSION     KC524742.1  GI:507866428
    KEYWORDS    .
    SOURCE      Mammuthus primigenius (woolly mammoth)
      ORGANISM  Mammuthus primigenius
                Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae; Mammuthus.
    REFERENCE   1 (bases 1 to 9042)
      AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L. and Berenbrink,M.
      TITLE     Evolution of mammalian diving capacity traced by myoglobin net surface charge
      JOURNAL   Science 340 (6138), 1234192 (2013)
      PUBMED    23766330
    REFERENCE   2 (bases 1 to 9042)
      AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
      TITLE     Direct Submission
      JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
    COMMENT     ##Assembly-Data-START##
                Sequencing Technology :: Sanger dideoxy sequencing
                ##Assembly-Data-END##
    FEATURES    Location/Qualifiers
         source 1..9042
                /organism="Mammuthus primigenius"
  • 60. EFetch also works with ACCESSION NUMBERS
    fcgi?db=nucleotide&id=KC524742&rettype=gb
    LOCUS       KC524742  9042 bp  DNA  linear  MAM  21-JUN-2013
    DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds.
    ACCESSION   KC524742
    VERSION     KC524742.1  GI:507866428
    KEYWORDS    .
    SOURCE      Mammuthus primigenius (woolly mammoth)
      ORGANISM  Mammuthus primigenius
                Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae; Mammuthus.
    REFERENCE   1 (bases 1 to 9042)
      AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L. and Berenbrink,M.
      TITLE     Evolution of mammalian diving capacity traced by myoglobin net surface charge
      JOURNAL   Science 340 (6138), 1234192 (2013)
      PUBMED    23766330
    REFERENCE   2 (bases 1 to 9042)
      AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
      TITLE     Direct Submission
      JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
    COMMENT     ##Assembly-Data-START##
                Sequencing Technology :: Sanger dideoxy sequencing
                ##Assembly-Data-END##
    FEATURES    Location/Qualifiers
         source 1..9042
                /organism="Mammuthus primigenius"
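The EFetch slides differ only in their rettype/retmode pair, which suggests a small lookup table. The sketch below collects the combinations shown so far; the format labels (`asn1`, `tinyseq`, `genbank-xml`, `genbank`) and the helper `efetch_url` are names invented for this example, not part of the E-utilities.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# rettype/retmode pairs as used on the slides; the keys are illustrative labels.
FORMATS = {
    "asn1": {},                                          # default output on the slide
    "tinyseq": {"rettype": "fasta", "retmode": "xml"},   # TinySeq XML
    "genbank-xml": {"retmode": "xml"},                   # GBSeq XML
    "genbank": {"rettype": "gb"},                        # GenBank flat file
}


def efetch_url(db, uid, fmt="asn1"):
    """Build an EFetch URL for a GI or an accession number in the chosen format."""
    params = {"db": db, "id": uid}
    params.update(FORMATS[fmt])
    return EFETCH + "?" + urlencode(params)


print(efetch_url("nucleotide", 507866428))
print(efetch_url("nucleotide", "KC524742", "genbank"))
```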
  • 61. EFetch Using the WebEnv parameter. Web environment string returned from a previous ESearch, EPost or ELink call. When provided, ESearch will post the results of the search operation to this pre-existing WebEnv.
  • 62. EFetch Using the WebEnv parameter. Searching extinct species in the NCBI taxonomy ('extinct[PROP]')
    curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?usehistory=y&db=taxonomy&term=extinct%5BPROP%5D"
    <eSearchResult>
      <Count>145</Count>
      <RetMax>20</RetMax>
      <RetStart>0</RetStart>
      <QueryKey>1</QueryKey>
      <WebEnv>NCID_1_75550312_9001_1375948145_325582538</WebEnv>
      <IdList>
        <Id>1225531</Id>
        <Id>1225530</Id>
        <Id>1211276</Id>
        <Id>1211275</Id>
        <Id>1027716</Id>
        <Id>948961</Id>
        <Id>943952</Id>
        <Id>867394</Id>
        <Id>867393</Id>
        <Id>748142</Id>
        <Id>748141</Id>
        <Id>741158</Id>
        <Id>703576</Id>
        <Id>703571</Id>
        <Id>703559</Id>
        <Id>693865</Id>
        <Id>686441</Id>
        <Id>665113</Id>
        <Id>659069</Id>
        <Id>656807</Id>
  • 63. EFetch Using the WebEnv parameter. Fetch the extinct species in the NCBI taxonomy ('extinct[PROP]') using the WebEnv parameter.
    $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&query_key=1&WebEnv=NCID_1_75550312_9001_1375948145_325582538&retmode=xml"
    <TaxaSet><Taxon>
      <TaxId>1225531</TaxId>
      <ScientificName>Equus ovodovi</ScientificName>
      <OtherNames>
        <Synonym>Equus (Sussemionus) ovodovi</Synonym>
        <Name>
          <ClassCDE>authority</ClassCDE>
          <DispName>Equus ovodovi Eisenmann &amp; Sergej, 2011</DispName>
        </Name>
      </OtherNames>
      <ParentTaxId>1225530</ParentTaxId>
      <Rank>species</Rank>
      <Division>Mammals</Division>
      <GeneticCode>
        <GCId>1</GCId>
        <GCName>Standard</GCName>
      </GeneticCode>
      <MitoGeneticCode>
      (....)
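The two slides above form a search-then-fetch pair: ESearch with usehistory=y returns a QueryKey and a WebEnv, and EFetch reuses them instead of a list of ids. A sketch of the hand-over, with hypothetical helper names (`history_handle`, `efetch_from_history`) and a trimmed copy of the ESearch reply; the underscores in the WebEnv string are assumed, since the slide extraction replaced them with spaces.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Trimmed ESearch reply from the slide (WebEnv underscores are an assumption).
SAMPLE = """<eSearchResult>
  <Count>145</Count><RetMax>20</RetMax><RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_75550312_9001_1375948145_325582538</WebEnv>
  <IdList><Id>1225531</Id><Id>1225530</Id></IdList>
</eSearchResult>"""


def history_handle(esearch_xml):
    """Return (query_key, webenv) from an ESearch reply made with usehistory=y."""
    root = ET.fromstring(esearch_xml)
    return root.findtext("QueryKey"), root.findtext("WebEnv")


def efetch_from_history(db, query_key, webenv, **extra):
    """Build an EFetch URL that reads its UIDs from the History server."""
    params = {"db": db, "query_key": query_key, "WebEnv": webenv}
    params.update(extra)
    return EFETCH + "?" + urlencode(params)


key, env = history_handle(SAMPLE)
url = efetch_from_history("taxonomy", key, env, retmode="xml")
print(url)
```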
  • 64. EPost
  • 65. EPost
    Uploads a list of UIDs to the Entrez History server
    Appends a list of UIDs to an existing set of UID lists attached to a Web Environment
  • 66. EPost Post UIDs to EPost. Get the list of taxonomy UIDs of extinct taxa:
    wget -O - 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=extinct[PROP]&retmax=1000' | xmllint --format - | grep '<Id>' | cut -d '<' -f 2 | cut -d '>' -f 2 | tr "\n" ","
    output:
    1860150,1860149,1849957,1825730,1825729,1636722,1607772,1607771,1607767,1607757,1607756
  • 67. EPost Post UIDs to EPost
    wget -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=taxonomy&WebEnv=NCID_1_15435144_130.14.22.215_9001_1474637318_669113391_0MetA0_S_MegaStore_F_1&id=1860150,1860149,1849957,1825730,1825729,1636722,1607772..."
    Output:
    <?xml version="1.0"?>
    <!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ePost_020511.dtd">
    <ePostResult>
      <QueryKey>1</QueryKey>
      <WebEnv>NCID_1_15467192_130.14.22.215_9001_1474637456_570452194_0MetA0_S_MegaStore_F_1</WebEnv>
    </ePostResult>
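Long UID lists quickly make a GET URL unwieldy, so a common approach is to send the same parameters in the body of an HTTP POST. The sketch below prepares such a request and reads the reply; `epost_request` and `parse_epost` are hypothetical helpers, nothing is actually sent, and the sample reply is the slide's output without its XML declaration and DOCTYPE.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EPOST = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"

# Reply from the slide, trimmed of its XML declaration and DOCTYPE.
SAMPLE = """<ePostResult>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_15467192_130.14.22.215_9001_1474637456_570452194_0MetA0_S_MegaStore_F_1</WebEnv>
</ePostResult>"""


def epost_request(db, uids, webenv=None):
    """Return (url, body) for an HTTP POST that uploads uids to the History server."""
    params = {"db": db, "id": ",".join(str(u) for u in uids)}
    if webenv:
        # Append to an existing Web Environment instead of creating a new one.
        params["WebEnv"] = webenv
    return EPOST, urlencode(params).encode("ascii")


def parse_epost(xml_text):
    """Return (query_key, webenv) from an ePostResult document."""
    root = ET.fromstring(xml_text)
    return root.findtext("QueryKey"), root.findtext("WebEnv")


url, body = epost_request("taxonomy", [1860150, 1860149])
print(url, body)
print(parse_epost(SAMPLE))
```

The returned pair could then be passed to an HTTP client of choice, for instance `urllib.request.urlopen(url, data=body)`.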
  • 68. EPost Searching in the WebEnv. Search Homo sapiens in WebEnv?
    curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Homo%20Sapiens&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"
    <eSearchResult>
      <Count>0</Count>
      <RetMax>0</RetMax>
      <RetStart>0</RetStart>
      <QueryKey>8</QueryKey>
      <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
      <IdList/>
      <TranslationSet/>
      <TranslationStack>
        <OP>GROUP</OP>
        <TermSet>
          <Term>homo sapiens[All Names]</Term>
          <Field>All Names</Field>
          <Count>0</Count>
          <Explode>N</Explode>
        </TermSet>
        <OP>AND</OP>
      </TranslationStack>
      <QueryTranslation>(#2) AND homo sapiens[All Names]</QueryTranslation>
    </eSearchResult>
  • 69. EPost Searching in the WebEnv. Search Tyrannosaurus in WebEnv?
    $ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Tyrannosaurus&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"
    <eSearchResult>
      <Count>1</Count>
      <RetMax>1</RetMax>
      <RetStart>0</RetStart>
      <QueryKey>9</QueryKey>
      <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
      <IdList>
        <Id>436494</Id>
      </IdList>
      <TranslationSet/>
      <TranslationStack>
        <OP>GROUP</OP>
        <TermSet>
          <Term>Tyrannosaurus[All Names]</Term>
          <Field>All Names</Field>
          <Count>1</Count>
          <Explode>N</Explode>
        </TermSet>
        <OP>AND</OP>
      </TranslationStack>
      <QueryTranslation>(#2) AND Tyrannosaurus[All Names]</QueryTranslation>
    </eSearchResult>
  • 70. EDirect: combining tools
  • 71. Piping EDirect
    esearch -db taxonomy -query "Tyrannosaurus" | efetch -format xml
  • 72. Piping EDirect
    esearch -db pubmed -query "Tyrannosaurus" | efilter -mindate 2005 | efetch -format docsum | xtract -pattern DocumentSummary -element MedlineCitation/PMID -element Id SortFirstAuthor
  • 73. ELink
  • 74. ELink
    Returns UIDs linked to an input set of UIDs in either the same or a different Entrez database
    Returns UIDs linked to other UIDs in the same Entrez database that match an Entrez query
    Checks for the existence of Entrez links for a set of UIDs within the same database
    Lists the available links for a UID
    Lists LinkOut URLs and attributes for a set of UIDs
    Lists hyperlinks to primary LinkOut providers for a set of UIDs
    Creates hyperlinks to the primary LinkOut provider for a single UID
  • 75. ELink Base URL:
  • 76. ELink Searching the PubMed records associated with sequence gi:507866428
    https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=pubmed&id=507866428&cmd=neighbor_score
    <eLinkResult>
      <LinkSet>
        <DbFrom>nuccore</DbFrom>
        <IdList>
          <Id>507866428</Id>
        </IdList>
        <LinkSetDb>
          <DbTo>pubmed</DbTo>
          <LinkName>nuccore_pubmed</LinkName>
          <Link>
            <Id>23766330</Id>
            <Score>0</Score>
          </Link>
        </LinkSetDb>
      </LinkSet>
    </eLinkResult>
    $ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23766330&rettype=medline&retmode=text"
    PMID- 23766330
    TI  - Evolution of mammalian diving capacity traced by myoglobin net surface charge.
    PG  - 1234192
    LID - 10.1126/science.1234192 [doi]
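The ELink reply nests the linked UIDs under LinkSetDb/Link, so chaining ELink into EFetch mostly comes down to collecting those Id elements. A minimal sketch, with `linked_ids` as a made-up helper name and the slide's reply as sample data (the `nuccore_pubmed` link name assumes the underscore that the extraction dropped):

```python
import xml.etree.ElementTree as ET

# ELink reply from the slide.
SAMPLE = """<eLinkResult>
  <LinkSet>
    <DbFrom>nuccore</DbFrom>
    <IdList><Id>507866428</Id></IdList>
    <LinkSetDb>
      <DbTo>pubmed</DbTo>
      <LinkName>nuccore_pubmed</LinkName>
      <Link><Id>23766330</Id><Score>0</Score></Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>"""


def linked_ids(elink_xml, linkname=None):
    """Collect target UIDs; optionally keep only one LinkName."""
    ids = []
    for linkset_db in ET.fromstring(elink_xml).iter("LinkSetDb"):
        if linkname and linkset_db.findtext("LinkName") != linkname:
            continue
        # Only Link children carry target UIDs; the source UID sits in IdList.
        ids.extend(link.findtext("Id") for link in linkset_db.findall("Link"))
    return ids


print(linked_ids(SAMPLE))
```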
  • 77. Transformations
  • 78. EFetch Transforming to SVG
    Using the stylesheet stylesheets/bio/ncbi/gb2svg.xsl
    xsltproc <(curl "https://raw.github.com/lindenb/xslt-sandbox/master/stylesheets/bio/ncbi/gb2svg.xsl") "https://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=14971102&retmode=xml&rettype=gbc"
  • 79. EFetch Transforming to SVG
    <?xml version="1.0" encoding="UTF-8"?>
    <svg:svg xmlns:svg="http://www.w3.org/2000/svg" height="121" width="920" style="stroke-width:1px;">
    <svg:title>Human rotavirus segment 7 NSP3 gene, complete cds</svg:title>
    <svg:defs>
    <svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="grad">
    <svg:stop offset="5%" stop-color="black"/>
    <svg:stop offset="50%" stop-color="whitesmoke"/>
    <svg:stop offset="95%" stop-color="black"/>
    </svg:linearGradient>
    <svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="vertical_body_gradient">
    <svg:stop offset="5%" stop-color="white"/>
    <svg:stop offset="95%" stop-color="lightgray"/>
    </svg:linearGradient>
    </svg:defs>
    <svg:style type="text/css"/>
    <svg:g>
    <svg:g transform="translate(0,0)">
    <svg:rect x="0" y="0" width="920" height="120" fill="url(#vertical_body_gradient)" stroke="black"/>
    <svg:text style="color:red;font-size:35px;" x="10" y="35">Human rotavirus segment 7 NSP3 gene, complete cds</svg:text>
    <svg:g>
    <svg:rect x="10" y="40" width="900" height="18" style="fill:url(#grad);stroke:black;" title="1..1074"/>
    <svg:text y="54" x="460" text-anchor="middle"><svg:tspan style="font-weight:bold;">source</svg:tspan><svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">organism</svg:tspan>:Human rotavirus A <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">mol_type</svg:tspan>:genomic RNA <svg:tspan xmlns:xsi="http://www.
  • 80. Efetch Transforming to SVG
  • 81. Efetch Transforming to R
    $ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Tyrannosaurus&usehistory=true" | xmllint --format -
    $ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml"
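The two calls above can be chained without copying the WebEnv by hand. The following is a minimal sketch, assuming the ESearch reply carries `<QueryKey>` and `<WebEnv>` elements, as it does when history is requested. The helper names `xml_first` and `efetch_url_from_history` are made up for this example, and the sed-based extraction is a shortcut rather than a real XML parser. The commented usage passes `usehistory=y`, the value given in the E-utilities documentation.

```shell
#!/usr/bin/env bash
# Sketch: pull QueryKey and WebEnv out of an ESearch XML reply and
# build the matching EFetch URL. Helper names are hypothetical.

EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# xml_first TAG: print the text of the first <TAG>...</TAG> found on stdin.
xml_first() {
    sed -n "s:.*<$1>\([^<]*\)</$1>.*:\1:p" | head -n 1
}

# efetch_url_from_history DB WEBENV QUERY_KEY: print the EFetch URL
# that retrieves the records stored on the history server.
efetch_url_from_history() {
    printf '%s/efetch.fcgi?db=%s&WebEnv=%s&query_key=%s&retmode=xml\n' \
        "$EUTILS" "$1" "$2" "$3"
}

# Usage (needs network access, so it is left commented out):
# reply=$(curl -s "$EUTILS/esearch.fcgi?db=pubmed&term=Tyrannosaurus&usehistory=y")
# webenv=$(printf '%s\n' "$reply" | xml_first WebEnv)
# key=$(printf '%s\n' "$reply" | xml_first QueryKey)
# curl -s "$(efetch_url_from_history pubmed "$webenv" "$key")"
```

Keeping the URL construction in a function makes it easy to reuse the same history entry for several EFetch calls.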
  • 82. Efetch Transforming to R
    <?xml version='1.0' encoding="UTF-8" ?>
    <xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
    <xsl:output method="text"/>

    <xsl:template match="/">
    date2count &lt;- list()
    <xsl:apply-templates select="/PubmedArticleSet/PubmedArticle[MedlineCitation/DateCreated/Year]"/>
    df &lt;- data.frame(
      Year=as.integer(names(date2count)),
      Count=unlist(date2count)
      )
    png('jeterpubmed.png')
    plot(df)
    title('pubmed: count(articles)=f(year)')
    dev.off()
    </xsl:template>

    <xsl:template match="PubmedArticle">
    <xsl:variable name="year" select="MedlineCitation/DateCreated/Year"/>
    date2count[["<xsl:value-of select="$year"/>"]] &lt;- ifelse(is.null(date2count[["<xsl:value-of select="$year"/>"]]),1,1+date2count[["<xsl:value-of select="$year"/>"]])
    </xsl:template>

    </xsl:stylesheet>
  • 83. Efetch Transforming to R
    $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml" | xsltproc pubmed2rstats.xsl -
    date2count <- list()
    date2count[["2013"]] <- ifelse(is.null(date2count[["2013"]]),1,1+date2count[["2013"]])
    date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
    date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
    date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
    date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
    (..)
    df <- data.frame(
      Year=as.integer(names(date2count)),
      Count=unlist(date2count)
      )
    png('jeterpubmed.png')
    plot(df)
    title('pubmed: count(articles)=f(year)')
    dev.off()
  • 84. Efetch Transforming to R
    $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml" | xsltproc pubmed2rstats.xsl - | R --no-save
  • 85. Generating a JAVA parser
  • 86. Using the XML schema
    XML Schema for dbSNP
    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.ncbi.nlm.nih.(...) elementFormDefault="qualified" attributeFormDefault="unqualified">
      <xsd:element name="ExchangeSet">
        <xsd:annotation>
          <xsd:documentation>Set of dbSNP refSNP docsums, version 3.4</xsd:documentation>
        </xsd:annotation>
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="SourceDatabase" minOccurs="0">
              <xsd:complexType>
                <xsd:attribute name="taxId" type="xsd:int" use="required">
                  <xsd:annotation>
                    <xsd:documentation>NCBI taxonomy ID for variation</xsd:documentation>
                  </xsd:annotation>
                </xsd:attribute>
                <xsd:attribute name="organism" type="xsd:string" use="required">
                  <xsd:annotation>
                    <xsd:documentation>common name for species used as part of database name (...)
                  </xsd:annotation>
                </xsd:attribute>
                <xsd:attribute name="dbSnpOrgAbbr" type="xsd:string">
                  <xsd:annotation>
                    <xsd:documentation>organism abbreviation used in dbSNP. </xsd:documentat(...)
                  </xsd:annotation>
                </xsd:attribute>
                <xsd:attribute name="gpipeOrgAbbr" type="xsd:string">
                  <xsd:annotation>
                    <xsd:documentation>organism abbreviation used within NCBI genome pipelin(...)
  • 87. Using the XML schema
    Compiling the XML Schema for dbSNP with XJC
    $ xjc -d . "ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd"
    parsing a schema...
    compiling a schema...
    https/www_ncbi_nlm_nih_gov/snp/docsum/Assay.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/Assembly.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/BaseURL.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/Component.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/ExchangeSet.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/FxnSet.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/MapLoc.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/ObjectFactory.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/PrimarySequence.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/Rs.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/RsLinkout.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/RsStruct.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/Ss.java
    https/www_ncbi_nlm_nih_gov/snp/docsum/package-info.java
  • 88. Using the XML schema
    Compiling the XML Schema for dbSNP with XJC
    Search the non-genomic rs# in dbSNP.
    import https.www_ncbi_nlm_nih_gov.snp.docsum.*;
    import javax.xml.bind.*;
    import javax.xml.stream.*;
    import javax.xml.stream.events.*;
    class ParseDbSnp
    {
        public static void main(String[] args) throws Exception
        {
            // JAXB context for the package generated by xjc
            JAXBContext jaxbCtxt = JAXBContext.newInstance("https.www_ncbi_nlm_nih_gov.snp.docsum");
            Unmarshaller unmarshaller = jaxbCtxt.createUnmarshaller();
            // Stream the XML from stdin instead of loading the whole file
            XMLInputFactory ifactory = XMLInputFactory.newInstance();
            XMLEventReader r = ifactory.createXMLEventReader(System.in);
            while (r.hasNext())
            {
                // Peek so that the <Rs> start tag is left for the unmarshaller
                XMLEvent evt = r.peek();
                if (!(evt.isStartElement() && evt.asStartElement().getName().getLocalPart().equals("Rs")))
                {
                    evt = r.nextEvent();
                    continue;
                }

                // Unmarshal one <Rs> record and skip the genomic ones
                Rs rs = unmarshaller.unmarshal(r, Rs.class).getValue();
                if ("genomic".equals(rs.getMolType())) continue;
                System.out.println("rs" + rs.getRsId() + " " + rs.getMolType());
            }
            r.close();
        }
    }
  • 89. Using the XML schema
    Compiling the XML Schema for dbSNP with XJC
    compile...
    $ javac ParseDbSnp.java https/www_ncbi_nlm_nih_gov/snp/docsum/*.java
    and run...
    $ curl -s "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/ds_ch1.xml.gz" | gunzip -c | java ParseDbSnp
    rs701 cDNA
    rs860 cDNA
    rs861 cDNA
    rs862 cDNA
    rs863 cDNA
    rs864 cDNA
    rs865 cDNA
    rs866 cDNA
    rs877 cDNA
    rs878 cDNA
    rs879 cDNA
    rs880 cDNA
    rs882 cDNA
    rs883 cDNA
    rs884 cDNA
    rs885 cDNA
    rs886 cDNA
    rs913 cDNA
    rs945 cDNA
    rs946 cDNA
    (...)
  • 90. NCBI EBot
  • 91. NCBI EBot URL ebot/ebot.cgi
  • 92. NCBI EBot
    Sample output
    #!/usr/bin/perl
    (...)
    # PUBLIC DOMAIN NOTICE
    # National Center for Biotechnology Information
    use LWP::Simple;
    use LWP::UserAgent;
    use Net::FTP;
    my $delay = 0;
    my $maxdelay = 3;
    my $base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";
    $params{email} = "nobody@nowhere.com";
    $params{db} = "nuccore";
    $params{tool} = "ebot";
    $params{term} = "Mammuthus+primigenius[ORGN]";
    %params = esearch(%params);
    $params{retmode} = "xml";
    $params{outfile} = "result.xml";
    $params{rettype} = "native";
    efetch_batch(%params);
  • 93. BLAST
  • 94. Standalone Blast
    Downloading
    Standalone tools are available at blast/executables/blast+/LATEST/
    #add BLAST to your path
    export PATH=${PATH}:/path/to/ncbi-blast-2.2.28+/bin
  • 95. Standalone Blast
    Download sample Apis mellifera proteins
    curl -o protein.fa.gz "ftp://ftp.ncbi.nih.gov/genomes/Apis_mellifera/protein/protein.fa.gz"
    gunzip protein.fa.gz
  • 96. Standalone Blast
    Create a Blast database with makeblastdb
    Getting help...
    $ makeblastdb -help
    (...)
     -dbtype <String, `nucl', `prot'>
       Molecule type of target db
     -in <File_In>
       Input file/database name
       Default = `-'
     -input_type <String, `asn1_bin', `asn1_txt', `blast(...)
       Type of the data specified in input_file
       Default = `fasta'
    (..)
  • 97. Standalone Blast
    Create a Blast database with makeblastdb
    Create the BLAST database:
    $ makeblastdb -in protein.fa -dbtype prot
    Building a new DB, current time: 09/02/2013 18:29:38
    New DB name: protein.fa
    New DB title: protein.fa
    Sequence type: Protein
    Keep Linkouts: T
    Keep MBits: T
    Maximum file size: 1000000000B
    Adding sequences from FASTA; added 10570 sequences
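A quick sanity check before or after building the database is to count the FASTA records and compare the number with the "added ... sequences" line that makeblastdb prints. This is only a sketch: the function name `count_fasta` is made up for this example, and it assumes every record starts with a `>` header line.

```shell
#!/usr/bin/env bash
# count_fasta FILE: print the number of FASTA records in FILE,
# i.e. the number of lines starting with '>'.
count_fasta() {
    grep -c '^>' "$1"
}

# Usage, with the file downloaded in the previous step:
# count_fasta protein.fa
```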
  • 98. Standalone Blast
    Query a Blast database with blastp
    Get help:
    $ blastp -help
    (...)
     -query <File_In>
       Input file name
       Default = `-'
     -db <String>
       BLAST database name
    (...)
  • 99. Standalone Blast
    Blast human EIF4G1 gi:187956781
    $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" | blastp -db protein.fa
    Query= gi|187956781|gb|AAI40897.1| EIF4G1 protein [Homo sapiens]
    (...)
                                                                          Score     E
    Sequences producing significant alignments:                          (Bits)  Value
    gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation...  189     4e-49
    gi|328779480|ref|XP_003249661.1| PREDICTED: hypothetical protei...  38.1    0.017
    gi|110762568|ref|XP_001121713.1| PREDICTED: hypothetical protei...  38.1    0.018
    (...)
    > gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation initiation factor 4 gamma 2-like [Apis mellifera]
    Length=899
    Score = 189 bits (479), Expect = 4e-49, Method: Compositional matrix adjust.
    Identities = 115/319 (36%), Positives = 175/319 (55%), Gaps = 39/319 (12%)
    Query 717 KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSI 774
              ++P + +++ +DI+ E+ W P S +R A + S+ +FR+VR I
    Sbjct 22  RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGI 73
    Query 775 LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCL----- 829
              LNKLTP+ F +L + + ++++ LKGVI LIFEKA+ EP +S YA +C+ L
    Sbjct 74  LNKLTPEKFAKLSNDLLNVELNSDVILKGVIFLIFEKALDEPKYSSMYAQLCKRLSDEAA 133
    Query 830 -MALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLK 888
              K E F LLL++C+ EFE E FE + DE EE
    Sbjct 134 NFEPKKALIESQKGQSTFTFLLLSKCRDEFENRSKASEAFENQ----DELGPEEE----- 184
  • 100. Standalone Blast
    Blast human EIF4G1 gi:187956781, output XML
    $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" | blastp -db protein.fa -outfmt 5
    (...)
    <Hit_hsps>
      <Hsp>
        <Hsp_num>1</Hsp_num>
        <Hsp_bit-score>189.119</Hsp_bit-score>
        <Hsp_score>479</Hsp_score>
        <Hsp_evalue>3.78314e-49</Hsp_evalue>
        <Hsp_query-from>717</Hsp_query-from>
        <Hsp_query-to>1017</Hsp_query-to>
        <Hsp_hit-from>22</Hsp_hit-from>
        <Hsp_hit-to>319</Hsp_hit-to>
        <Hsp_query-frame>0</Hsp_query-frame>
        <Hsp_hit-frame>0</Hsp_hit-frame>
        <Hsp_identity>115</Hsp_identity>
        <Hsp_positive>175</Hsp_positive>
        <Hsp_gaps>39</Hsp_gaps>
        <Hsp_align-len>319</Hsp_align-len>
        <Hsp_qseq>KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQ(...)
        IARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLL--------KNHDEESLECLCRLLTTIGKDLDFEKAKPRMDQYFNQMEKIIKEKK(...)
        <Hsp_hseq>RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGILNKLTPEKFAKLS(...)
        VAKRKMLGNIKFIGELGKLGIVSETILHRCILQLLEKKRRRRSRGDTAEDIECLCQIMRTCGRILDSDKGRGLMDQYFKRMNSLAESRD(...)
        <Hsp_midline>++P + +++ +DI+ E+ W P S +R A + S+ +FR+VR ILNKLTP+ F + + ++++ LKGVI LIFEKA+ EP +S YA +C+ L K E F LLL++C+ EFE E FE + DE EE E R +A+R+ LGNIKFIGEL KL +++E I+H C+++LL + E +ECLC+++ T G+ LD +K + MDQYF +M + + + RI+FML+DV++LR WVPR+ +GP I+QI + E</Hsp_midline>
      </Hsp>
    (...)
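XML output suits XSLT processing. For quick filtering in the shell, blastp can also write tab-separated output with `-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below keeps hits at or below an e-value cutoff. The function name `filter_hits` and the cutoff in the usage line are choices made for this example.

```shell
#!/usr/bin/env bash
# filter_hits MAX_EVALUE: read default `-outfmt 6` rows on stdin and keep
# those whose e-value (column 11) is at most MAX_EVALUE.
# Prints subject id, e-value and bit score, tab-separated.
filter_hits() {
    awk -F '\t' -v max="$1" \
        '($11 + 0) <= (max + 0) { print $2 "\t" $11 "\t" $12 }'
}

# Usage (query.fa is a placeholder for a local FASTA query file):
# blastp -query query.fa -db protein.fa -outfmt 6 | filter_hits 1e-10
```

Adding 0 to both sides forces awk to compare the values as numbers, so scientific notation such as 4e-49 is handled.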
  • 101. NCBI URL-API Blast
  • 102. NCBI URL-API Blast
    $ curl "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=PAERLMERKADIE&DATABASE=nr&PROGRAM=blastp&FILTER=L&HITLIST_SIZE=500"
    (...)
    <!--QBlastInfoBegin
        RID = 1NRYGX9K014
        RTOE = 29
    QBlastInfoEnd
    -->
    (...)
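The reply to `CMD=Put` carries a request ID (RID) and an estimated wait in seconds (RTOE) inside the QBlastInfo comment. The results are then requested with `CMD=Get` and that RID. The following is a minimal sketch of the extraction step, assuming each field sits on its own `NAME = value` line inside the block. The helper names `qblast_field` and `qblast_get_url` are made up for this example.

```shell
#!/usr/bin/env bash
# qblast_field NAME: read a CMD=Put reply on stdin and print the value of
# the first "NAME = value" line found between QBlastInfoBegin and
# QBlastInfoEnd (for example RID or RTOE).
qblast_field() {
    sed -n '/QBlastInfoBegin/,/QBlastInfoEnd/p' |
        sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p" |
        head -n 1
}

# qblast_get_url RID: print the URL that asks for the results of RID as XML.
qblast_get_url() {
    printf 'https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&FORMAT_TYPE=XML&RID=%s\n' "$1"
}

# Usage (needs network access, so it is left commented out):
# reply=$(curl -s "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=PAERLMERKADIE&DATABASE=nr&PROGRAM=blastp")
# rid=$(printf '%s\n' "$reply" | qblast_field RID)
# sleep "$(printf '%s\n' "$reply" | qblast_field RTOE)"
# curl -s "$(qblast_get_url "$rid")"
```

Waiting for RTOE seconds before the first `CMD=Get` avoids polling the server while the search is still queued.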
  • 103. The End