2. NCBI Taxonomy Database
[Figure: databases covered in 1994 versus 2012]
1994: 4 DBs (figure from Nucleic Acids Research, 1994, Vol. 22, No. 17, p. 3442)
Labels in the 1994 figure: 4,000 biomedical journals indexed at NLM; GenBank, EMBL, DDBJ, SWISSPROT, PIR, PRF, PDB, dbEST, dbSTS, LANL, Patent
2012: 35 DBs
http://www.ncbi.nlm.nih.gov/sites/gquery
Database Center for Life Science
3. NAR database issue
[Figure: bar chart of the number of databases per year. Source: Oxford University Press]
2008: 1078
2009: 1170
2010: 1230
2011: 1330
2012: 1380
92 databases added every year
93
Photo: dullhunk
4. Finding a relevant database is an important topic;
at the same time,
discussing what makes a database “good” is also significant.
5. Data before applications / services
Photo: NASA Goddard Photo and Video
6. Good fishes first
Yummy!
7. Nature provides good fishes
Chef mashes up good materials
Photos: Aziz T. Saltik, mrjorgen
8. What should be considered,
and how can it be assessed?
Interesting, useful & reliable
Reliable in terms of content and structure
Peer-reviewed
→ Published in the NAR database issue or another scientific journal.
Sustainable, reusable & discoverable
Appropriate licenses, bulk downloadable via the Internet, Linked Data...
Fresh & stable
Frequent updates with minimal downtime.
9. We should focus on building “good” data or on developing tools that help build it.
10. Allie
Abbreviation / long form pairs in life sciences
Japanese translation
License: CC 2.1 (Japan)
Monthly update
http://allie.dbcls.jp/
SPARQL endpoint / bulk downloadable
(N-triples or tab delimited plain text)
Links to PubMed and DBpedia (currently, RDF data only)
Web search service
7000+ unique visits / mo to the search service
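As a rough illustration of working with the bulk download in N-Triples form, the sketch below reads simple N-Triples lines with a regular expression. It is a minimal sketch under assumptions, not Allie's own tooling: the two sample lines are assembled from the URI and labels shown in the RDF excerpt of this deck, `parse_line` and `parse_ntriples` are hypothetical helper names, and the parser ignores blank nodes, datatyped literals and escape sequences.

```python
import re

# One N-Triples statement: <subject> <predicate> <object-IRI or "literal"@lang> .
LINE = re.compile(
    r'^<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?:<(?P<o_iri>[^>]*)>|"(?P<o_lit>(?:[^"\\]|\\.)*)"(?:@(?P<lang>[A-Za-z0-9-]+))?)'
    r'\s*\.\s*$'
)

def parse_line(line):
    """Return (subject, predicate, object) or None if the line is not a simple triple."""
    m = LINE.match(line.strip())
    if m is None:
        return None  # comments, blank lines, and unsupported syntax
    if m.group("o_iri") is not None:
        obj = m.group("o_iri")
    else:
        obj = (m.group("o_lit"), m.group("lang"))  # (text, language tag or None)
    return m.group("s"), m.group("p"), obj

def parse_ntriples(text):
    """Parse every recognizable line of an N-Triples document."""
    triples = []
    for line in text.splitlines():
        triple = parse_line(line)
        if triple is not None:
            triples.append(triple)
    return triples

# Illustrative sample only; not copied from an actual Allie dump file.
SAMPLE = """\
# long form labels
<http://purl.org/allie/id/longform/1528191> <http://www.w3.org/2000/01/rdf-schema#label> "specific pathogen-free"@en .
<http://purl.org/allie/id/longform/1528191> <http://www.w3.org/2000/01/rdf-schema#label> "特定病原体除去の"@ja .
"""

for subject, predicate, obj in parse_ntriples(SAMPLE):
    print(subject, predicate, obj)
```

For real dumps a full RDF library would be the safer choice; the regular expression only shows how little structure a line of N-Triples carries.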
11. Allie data model: absorption of lexical variants
[Figure: class diagram of the Allie data model. Recoverable labels:]
PairCluster (ShortForm: SPF, LongForm: specific pathogen-free)
Pair (ShortForm: SPF, LongForm: specific pathogen-free)
Pair (ShortForm: spf, LongForm: specified pathogen free)
Other classes: PairList, FormList, PubMedIDList, CoocurringShort, ResearchArea
Properties: contains, appearsIn, cooccursWith, inResearchAreaOf, frequency
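The idea of a cluster absorbing lexical variants can be sketched as a small data structure. This is a minimal sketch under assumptions: the class names follow the labels in the diagram, but the field names, the made-up frequencies, and the rule of picking the most frequent pair as the representative are illustrative choices, not Allie's documented behavior. The grouping itself is supplied by hand here rather than computed.

```python
from dataclasses import dataclass, field

@dataclass
class Pair:
    """One abbreviation / long form pair as it appears in the literature."""
    short_form: str
    long_form: str
    frequency: int  # number of occurrences (toy values below)
    pubmed_ids: list = field(default_factory=list)

@dataclass
class PairCluster:
    """A group of pairs treated as lexical variants of one another."""
    pairs: list = field(default_factory=list)

    def representative(self):
        # Assumption: the most frequent variant stands for the cluster.
        return max(self.pairs, key=lambda p: p.frequency)

    def total_frequency(self):
        return sum(p.frequency for p in self.pairs)

# The two variants shown in the diagram; the frequencies are invented.
cluster = PairCluster(pairs=[
    Pair("SPF", "specific pathogen-free", frequency=120),
    Pair("spf", "specified pathogen free", frequency=3),
])

rep = cluster.representative()
print(rep.short_form, "=", rep.long_form)
print("total frequency:", cluster.total_frequency())
```

A search for either variant can then be answered with the cluster's representative, which is what lets the variants be absorbed rather than listed separately.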
12. Allie class hierarchy
http://purl.org/allie/ontology/201108
13. Allie RDF data excerpted
[Figure: RDF graph for the abbreviation "SPF" and its long form. Recoverable statements:]
Pair: http://purl.org/allie/id/pair/1547869
  rdf:type allie:EachPair
  allie:hasShortFormOf → short form node (URI not recoverable from the excerpt)
  allie:hasLongFormOf → http://purl.org/allie/id/longform/1528191
Short form node (abbreviation):
  rdf:type allie:ShortForm
  rdfs:label "SPF"@en
Long form node: http://purl.org/allie/id/longform/1528191
  rdf:type allie:LongForm
  rdfs:label "specific pathogen-free"@en (English)
  rdfs:label "特定病原体除去の"@ja (Japanese)
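To show how the RDF excerpt for "SPF" can be traversed, the sketch below stores its statements as plain Python tuples and walks from an abbreviation label to the long form labels in a chosen language. It is a minimal sketch under assumptions: prefixed names are kept as plain strings, `SHORT_NODE` is a hypothetical placeholder because the short form URI is not legible in the excerpt, and the helper functions are illustrative rather than part of any Allie API.

```python
PAIR = "http://purl.org/allie/id/pair/1547869"
LONG_NODE = "http://purl.org/allie/id/longform/1528191"
SHORT_NODE = "urn:example:shortform/SPF"  # hypothetical placeholder URI

# (subject, predicate, object); literals are (text, language tag) tuples.
TRIPLES = [
    (PAIR, "rdf:type", "allie:EachPair"),
    (PAIR, "allie:hasShortFormOf", SHORT_NODE),
    (PAIR, "allie:hasLongFormOf", LONG_NODE),
    (SHORT_NODE, "rdf:type", "allie:ShortForm"),
    (SHORT_NODE, "rdfs:label", ("SPF", "en")),
    (LONG_NODE, "rdf:type", "allie:LongForm"),
    (LONG_NODE, "rdfs:label", ("specific pathogen-free", "en")),
    (LONG_NODE, "rdfs:label", ("特定病原体除去の", "ja")),
]

def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def subjects(predicate, obj):
    """All subjects of triples with the given predicate and object."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]

def long_forms(abbreviation, lang):
    """Follow label -> short form -> pair -> long form -> label in `lang`."""
    results = []
    for short_node in subjects("rdfs:label", (abbreviation, "en")):
        for pair in subjects("allie:hasShortFormOf", short_node):
            for long_node in objects(pair, "allie:hasLongFormOf"):
                for text, tag in objects(long_node, "rdfs:label"):
                    if tag == lang:
                        results.append(text)
    return results

print(long_forms("SPF", "en"))
print(long_forms("SPF", "ja"))
```

The same four-step path is what a SPARQL query against the endpoint would express as a graph pattern; the tuple version only makes the joins explicit.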
14. Useful / reliable?
Database, Vol. 2011, Article ID bar013, doi:10.1093/database/bar013
Original article
Allie: a database and a search service of abbreviations and long forms
Yasunori Yamamoto (1,*), Atsuko Yamaguchi (1), Hidemasa Bono (1) and Toshihisa Takagi (2)
(1) Database Center for Life Science, Bunkyo-ku, Tokyo and (2) Department of Computational Biology, University of Tokyo, Kashiwa, Chiba, Japan
*Corresponding author: Tel: +81 (0)3 5841 0251; Fax: +81 (0)3 5841 8090; Email: yy@dbcls.rois.ac.jp
Submitted 25 November 2010; Revised 25 March 2011; Accepted 28 March 2011
Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader’s expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly.
Database URL: The Allie service is available at http://allie.dbcls.jp/.
17. Reliable/stable?
http://stats.lod2.eu/rdfdocs
18. Stable?
http://labs.mondeca.com/sparqlEndpointsStatus/
http://labs.mondeca.com/sparqlEndpointsStatus/details/allie-abbreviation-and-long-form-database-in-life-science.html
19. We consider Allie to be on the right track.
21. RDFization of Life Science Dictionary
Life Science Dictionary
English-Japanese / Japanese-English dictionary for the life sciences
Thesaurus and concordance
Project started in 1993.
110k English words and 120k Japanese words (as of Mar. 2011)
Can be used to inter- or intra-connect life science databases
Bridge English-Japanese resources in life sciences
Prefix would be http://purl.org/lsd/
23. RDFization of Colil
Comments on Literature in Literature (Colil)
Citation data extracted from PMC OA subset
Citing comments on each cited literature (Citation context)
Relevant literature based on co-citation data
Similar to the Microsoft Academic Search service
Can be used for a literature recommendation service
Curation/annotation assistance services
Bulk downloadable
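Finding relevant literature from co-citation data can be sketched in a few lines. This is a generic co-citation count under assumptions, not Colil's actual implementation: two papers count as related each time a third paper cites both, the paper identifiers in `TOY` are invented, and `cocitation_counts` and `related` are hypothetical helper names.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citations):
    """citations maps a citing paper id to the ids of the papers it cites.

    Returns a Counter keyed by (a, b) with a < b, counting how many
    citing papers cite both a and b.
    """
    counts = Counter()
    for cited in citations.values():
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts

def related(paper, counts, top_n=3):
    """Papers most often co-cited with `paper`, strongest first."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == paper:
            scores[b] += n
        elif b == paper:
            scores[a] += n
    # Sort by count (descending), then by id so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Toy citation data with invented identifiers.
TOY = {
    "citing1": ["A", "B", "C"],
    "citing2": ["A", "B"],
    "citing3": ["B", "C"],
}

counts = cocitation_counts(TOY)
print(related("A", counts))  # [('B', 2), ('C', 1)]
```

A recommendation service could combine such counts with the citation contexts, so that a reader sees not only which papers are co-cited but also what the citing authors said about them.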