Building Data
Yasunori Yamamoto
1994: 4 DBs
   (figure from Nucleic Acids Research, 1994, Vol. 22, No. 17, p. 3442)

   NCBI Taxonomy Database
   4,000 biomedical journals indexed at NLM
   GenBank, EMBL, DDBJ, dbEST, dbSTS, LANL, Patent
   SWISSPROT, PIR, PRF, PDB, GenBank, EMBL, DDBJ, LANL, Patent

2012: 35 DBs
   http://www.ncbi.nlm.nih.gov/sites/gquery
                                                                       Database Center for Life Science
NAR database issue

   Year   Databases
   2008   1078
   2009   1170
   2010   1230
   2011   1330
   2012   1380

   Source: Oxford University Press

   92 databases added every year




   93
                                          dullhunk




Finding a relevant database is an important topic; at the same time, discussing what makes a database “good” is just as significant.




Data before applications / services




         NASA Goddard Photo and Video




Good fish first




   Yummy!   Yummy!



Aziz T. Saltik



Nature provides good fish



  Chef mashes up good materials


                                          mrjorgen



What should be considered?
  and how can these be assessed?
Interesting, useful & reliable

   Reliable in terms of content and structure

   Peer-reviewed
   → Published in the NAR database issue or another scientific journal.

Sustainable, reusable & discoverable

   Appropriate licenses, bulk downloadable via the Internet, Linked Data...

Fresh & stable

   Frequent updates with minimal downtime.


We should focus on building “good” data, or on developing tools that help build it.




Allie
Abbreviation / long form pairs in life sciences

  Japanese translation

  CC 2.1 (Japan)

  Monthly update

  Allie: http://allie.dbcls.jp/

  SPARQL endpoint / bulk downloadable
  (N-triples or tab delimited plain text)

  Links to PubMed and DBpedia (currently, RDF data only)

Web search service

  7000+ unique visits / mo to the search service

Allie data model

PairCluster (absorption of lexical variants)
   ShortForm: SPF     LongForm: specific pathogen-free

   contains → PairList
      Pair   ShortForm: SPF     LongForm: specific pathogen-free
      Pair   ShortForm: spf     LongForm: specified pathogen free

Properties shown in the diagram:
   appearsIn → PubMedIDList
   cooccursWith → CoocurringShortFormList
   inResearchAreaOf → ResearchArea
   frequency
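
The PairCluster / PairList / Pair structure in the data model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, and the normalization key (lowercase, split on spaces and hyphens, keep the first six characters of each word) is a toy stand-in chosen so that the two variants on the slide fall together. It is not Allie's actual clustering method, and the third pair in the usage note is extra illustrative input.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Pair:
    """One abbreviation / long form pair as it appears in the literature."""
    short_form: str
    long_form: str


@dataclass
class PairCluster:
    """A representative pair plus the list of lexical variants it absorbs."""
    short_form: str
    long_form: str
    pair_list: List[Pair] = field(default_factory=list)


def variant_key(pair: Pair, prefix_len: int = 6) -> Tuple[str, Tuple[str, ...]]:
    """Toy normalization key (an assumption, not Allie's real method)."""
    tokens = re.split(r"[\s\-]+", pair.long_form.lower().strip())
    return (pair.short_form.lower(), tuple(t[:prefix_len] for t in tokens if t))


def cluster_pairs(pairs: List[Pair]) -> List[PairCluster]:
    """Group pairs that share a key; the first pair seen is the representative."""
    clusters: Dict[Tuple[str, Tuple[str, ...]], PairCluster] = {}
    for pair in pairs:
        key = variant_key(pair)
        if key not in clusters:
            clusters[key] = PairCluster(pair.short_form, pair.long_form)
        clusters[key].pair_list.append(pair)
    return list(clusters.values())
```

With this toy key, `Pair("SPF", "specific pathogen-free")` and `Pair("spf", "specified pathogen free")` end up in the same cluster, while an unrelated expansion such as `Pair("SPF", "sun protection factor")` forms a cluster of its own.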




Allie class hierarchy
          http://purl.org/allie/ontology/201108




Allie RDF data excerpted

Abbreviation: SPF
Long form: specific pathogen-free (English), 特定病原体除去の (Japanese)

http://purl.org/allie/id/pair/1547869
   rdf:type               allie:EachPair
   allie:hasLongFormOf    http://purl.org/allie/id/longform/1528191
   allie:hasShortFormOf   (short form node)

http://purl.org/allie/id/longform/1528191
   rdf:type               allie:LongForm
   rdfs:label             "specific pathogen-free"@en
   rdfs:label             "特定病原体除去の"@ja

Short form node (labeled http://purl.org/allie/id/pair/1547869 on the slide)
   rdf:type               allie:ShortForm
   rdfs:label             "SPF"@en
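
The RDF excerpt for the SPF pair can be written out as plain triples and serialized to N-Triples, one of the bulk download formats mentioned for Allie. The sketch below uses only the Python standard library. Two things in it are assumptions: the expansion of the allie: prefix (guessed from the ontology URL http://purl.org/allie/ontology/201108) and the short form node's URI, which cannot be recovered from the excerpt and is replaced by a hypothetical placeholder.

```python
# Standard W3C namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Assumption: the allie: prefix expands to the ontology URL plus "#".
ALLIE = "http://purl.org/allie/ontology/201108#"

PAIR = "http://purl.org/allie/id/pair/1547869"
LONG_FORM = "http://purl.org/allie/id/longform/1528191"
# Hypothetical placeholder: the real short form URI is not in the excerpt.
SHORT_FORM = "http://example.org/allie/shortform/SPF"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text, lang):
    """Build a language-tagged literal with minimal N-Triples escaping."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"@' + lang


TRIPLES = [
    (iri(PAIR), iri(RDF + "type"), iri(ALLIE + "EachPair")),
    (iri(PAIR), iri(ALLIE + "hasLongFormOf"), iri(LONG_FORM)),
    (iri(PAIR), iri(ALLIE + "hasShortFormOf"), iri(SHORT_FORM)),
    (iri(LONG_FORM), iri(RDF + "type"), iri(ALLIE + "LongForm")),
    (iri(LONG_FORM), iri(RDFS + "label"), literal("specific pathogen-free", "en")),
    (iri(LONG_FORM), iri(RDFS + "label"), literal("特定病原体除去の", "ja")),
    (iri(SHORT_FORM), iri(RDF + "type"), iri(ALLIE + "ShortForm")),
    (iri(SHORT_FORM), iri(RDFS + "label"), literal("SPF", "en")),
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one triple per line."""
    return "\n".join(" ".join(t) + " ." for t in triples) + "\n"


def labels_of(subject, triples):
    """Return the rdfs:label literals attached to a subject IRI."""
    label = iri(RDFS + "label")
    return [o for s, p, o in triples if s == iri(subject) and p == label]
```

Calling `labels_of(LONG_FORM, TRIPLES)` returns the English and Japanese labels of the long form, which is the lookup a bilingual client of the data would need.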
Useful / reliable?
Database, Vol. 2011, Article ID bar013, doi:10.1093/database/bar013

Original article
Allie: a database and a search service of abbreviations and long forms
Yasunori Yamamoto (1,*), Atsuko Yamaguchi (1), Hidemasa Bono (1) and Toshihisa Takagi (2)
(1) Database Center for Life Science, Bunkyo-ku, Tokyo and (2) Department of Computational Biology, University of Tokyo, Kashiwa, Chiba, Japan

*Corresponding author: Tel: +81 (0)3 5841 0251; Fax: +81 (0)3 5841 8090; Email: yy@dbcls.rois.ac.jp

Submitted 25 November 2010; Revised 25 March 2011; Accepted 28 March 2011

Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear
frequently, making it difficult to read and understand scientific papers that are outside of a reader’s expertise. Thus, we
have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions).
Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all
titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the
query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring
abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps
users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary
called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic
information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical
abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new
abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations
and their long forms with their corresponding PubMed IDs is constructed and updated weekly.
Database URL: The Allie service is available at http://allie.dbcls.jp/.

Discoverable?




http://thedatahub.org/dataset/allie-abbreviation-and-long-form-database-in-life-science
Reliable?




http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/validate.php
Reliable/stable?




            http://stats.lod2.eu/rdfdocs
Stable?




                     http://labs.mondeca.com/sparqlEndpointsStatus/
http://labs.mondeca.com/sparqlEndpointsStatus/details/allie-abbreviation-and-long-form-database-in-life-science.html
We consider Allie to be on the right track.




Projects in this hackathon



RDFization of Life Science Dictionary

Life Science Dictionary

  English - Japanese / Japanese - English dictionary in life sciences

  Thesaurus and concordance

  Project started in 1993.

  110k English words and 120k Japanese words (as of Mar. 2011)

Can be used to inter- or intra-connect life science databases

  Bridge English-Japanese resources in life sciences

Prefix would be http://purl.org/lsd/


http://lsd.pharm.kyoto-u.ac.jp/en/service/weblsd/index.html



RDFization of Colil

Comments on Literature in Literature (Colil)

  Citation data extracted from PMC OA subset

  Citing comments on each cited literature (Citation context)

  Relevant literature based on co-citation data

  Similar to the MS academic search service

Can be used for a literature recommendation service

  Curation/annotation assistance services

Bulk downloadable
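
How co-citation data can surface relevant literature is easy to show on toy input. Two papers are co-cited when the same citing paper lists both of them; the more often that happens, the more related they are assumed to be. The sketch below is a generic illustration of that idea, not Colil's implementation. The function names and the paper IDs in the usage note are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citations):
    """Count how often each pair of papers is cited together.

    `citations` maps a citing paper ID to the IDs of the papers it cites.
    """
    counts = Counter()
    for cited in citations.values():
        # Each unordered pair in one reference list is one co-citation.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


def related_to(paper, counts, top_n=5):
    """Rank the papers most often co-cited with `paper`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == paper:
            scores[b] += n
        elif b == paper:
            scores[a] += n
    return scores.most_common(top_n)
```

For example, with `{"P1": ["A", "B", "C"], "P2": ["A", "B"], "P3": ["B", "C"]}` as input, paper A is co-cited twice with B and once with C, so `related_to("A", ...)` ranks B ahead of C.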


Colil




Enjoy hack & Toyama!


                 digicacy

Arihant handbook biology for class 11 .pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 

Building Data

  • 2. 1994: 4 DBs (NCBI Taxonomy Database; 4,000 biomedical journals indexed at NLM; GenBank, EMBL, DDBJ, dbEST, dbSTS, LANL, Patent; SWISSPROT, PIR, PRF, PDB, GenBank, EMBL, DDBJ, LANL, Patent). Source: Nucleic Acids Research, 1994, Vol. 22, No. 17, p. 3442. 2012: 35 DBs (http://www.ncbi.nlm.nih.gov/sites/gquery). Database Center for Life Science
  • 3. NAR database issue (chart, by year): 2008: 1078; 2009: 1170; 2010: 1230; 2011: 1330; 2012: 1380. Source: Oxford University Press. 92 databases added every year. (Photo: dullhunk)
  • 4. Finding a relevant database is an important topic; at the same time, discussing what makes a database “good” is also significant.
  • 5. Data before applications / services. (Photo: NASA Goddard Photo and Video)
  • 6. Good fishes first. Yummy!
  • 7. Nature provides good fishes. (Photo: Aziz T. Saltik) Chef mashes up good materials. (Photo: mrjorgen)
  • 8. What should be considered, and how can these be assessed? Interesting, useful & reliable: reliable in terms of content and structure; peer-reviewed → published in the NAR database issue or another scientific journal. Sustainable, reusable & discoverable: appropriate licenses, bulk downloadable via the Internet, Linked Data... Fresh & stable: frequent updates with the least amount of downtime.
  • 9. We should focus on building “good” data or on developing tools that help build it.
  • 10. Allie (http://allie.dbcls.jp/): abbreviation / long form pairs in life sciences, with Japanese translation. CC 2.1 (Japan). Monthly update. SPARQL endpoint / bulk downloadable (N-Triples or tab-delimited plain text). Links to PubMed and DBpedia (currently, RDF data only). Web search service: 7000+ unique visits / mo to the search service.
  • 11. Allie data model (diagram; absorption of lexical variants). Nodes: PairCluster (ShortForm: SPF; LongForm: specific pathogen-free), PubMedIDList, CoocurringShort, PairList, FormList, ResearchArea, and Pair with a frequency (ShortForm: SPF, LongForm: specific pathogen-free; ShortForm: spf, LongForm: specified pathogen free). Relations: appearsIn, contains, cooccursWith, inResearchAreaOf.
  • 12. Allie class hierarchy: http://purl.org/allie/ontology/201108
  • 13. Allie RDF data (excerpt, from a diagram). Pair http://purl.org/allie/id/pair/1547869: rdf:type allie:EachPair; allie:hasShortFormOf a short form (rdf:type allie:ShortForm; rdfs:label "SPF"@en); allie:hasLongFormOf http://purl.org/allie/id/longform/1528191 (rdf:type allie:LongForm; rdfs:label "specific pathogen-free"@en and "特定病原体除去の"@ja). Search-result view: Abbreviation: SPF; Long form: specific pathogen-free (English), 特定病原体除去の (Japanese).
  • 14. Useful / reliable? Database, Vol. 2011, Article ID bar013, doi:10.1093/database/bar013. Original article. Allie: a database and a search service of abbreviations and long forms. Yasunori Yamamoto (1,*), Atsuko Yamaguchi (1), Hidemasa Bono (1) and Toshihisa Takagi (2). (1) Database Center for Life Science, Bunkyo-ku, Tokyo and (2) Department of Computational Biology, University of Tokyo, Kashiwa, Chiba, Japan. *Corresponding author: Tel: +81 (0)3 5841 0251; Fax: +81 (0)3 5841 8090; Email: yy@dbcls.rois.ac.jp. Submitted 25 November 2010; Revised 25 March 2011; Accepted 28 March 2011. Abstract: Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader’s expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/.
  • 17. Reliable/stable? http://stats.lod2.eu/rdfdocs
  • 18. Stable? http://labs.mondeca.com/sparqlEndpointsStatus/ http://labs.mondeca.com/sparqlEndpointsStatus/details/allie-abbreviation-and-long-form-database-in-life-science.html
  • 19. consider to be on the right track.
  • 20. Projects in this hackathon
  • 21. RDFization of Life Science Dictionary. Life Science Dictionary: English-Japanese / Japanese-English dictionary in life sciences; thesaurus and concordance. Project started in 1993. 110k English words and 120k Japanese words (as of Mar. 2011). Can be used to inter- or intra-connect life science databases and to bridge English-Japanese resources in life sciences. Prefix would be http://purl.org/lsd/
  • 23. RDFization of Colil. Comments on Literature in Literature (Colil): citation data extracted from the PMC OA subset; citing comments on each cited literature (citation context); relevant literature based on co-citation data. Similar to the MS academic search service. Can be used for a literature recommendation service and for curation/annotation assistance services. Bulk downloadable.
  • 24. Colil
  • 25. Enjoy hack & Toyama! (Photo: digicacy)
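
The RDF excerpt on slide 13 can be written out as N-Triples, one of the bulk-download formats mentioned on slide 10. The following is only a minimal sketch of one plausible reading of that flattened diagram, not Allie's actual export code. Two things are assumptions: the `allie:` prefix is expanded to `http://purl.org/allie/ontology/201108#` (the slides give only the ontology URL, without the separator), and the short-form URI is a hypothetical placeholder because the slide does not show one.

```python
# Sketch: the slide-13 excerpt serialized as N-Triples.
# ASSUMPTIONS: the "#" namespace separator and the SHORT_FORM URI are not in the slides.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
ALLIE = "http://purl.org/allie/ontology/201108#"  # assumed prefix expansion

PAIR = "http://purl.org/allie/id/pair/1547869"            # from slide 13
LONG_FORM = "http://purl.org/allie/id/longform/1528191"   # from slide 13
SHORT_FORM = "http://example.org/allie/shortform/SPF"     # hypothetical placeholder


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang):
    """Build a language-tagged literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


# (subject IRI, predicate IRI, already-formatted object term)
TRIPLES = [
    (PAIR, RDF_TYPE, iri(ALLIE + "EachPair")),
    (PAIR, ALLIE + "hasShortFormOf", iri(SHORT_FORM)),
    (PAIR, ALLIE + "hasLongFormOf", iri(LONG_FORM)),
    (SHORT_FORM, RDF_TYPE, iri(ALLIE + "ShortForm")),
    (SHORT_FORM, RDFS_LABEL, literal("SPF", "en")),
    (LONG_FORM, RDF_TYPE, iri(ALLIE + "LongForm")),
    (LONG_FORM, RDFS_LABEL, literal("specific pathogen-free", "en")),
    (LONG_FORM, RDFS_LABEL, literal("特定病原体除去の", "ja")),
]


def to_ntriples(triples):
    """Serialize triples as one 'subject predicate object .' statement per line."""
    return "".join(f"{iri(s)} {iri(p)} {o} .\n" for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES), end="")
```

Keeping one statement per line is what makes this format convenient for bulk download: the file can be split, filtered, or concatenated with ordinary line-oriented tools.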
