Databasing the World: Biodiversity and the 2000s. Written by Bowker, G. C. Presented by Chen Zhang (Mike).
Four Key Aspects: (1) Database Infrastructure: Standards (flexible, stable); Technology (stable); Communication. (2) Data Sharing: Ownership; Disarticulation; Data Collection.
Four Key Aspects (continued): (3) Distributed Collective Practice: Collaborative Work; New Knowledge Economy. (4) Accounting for Life: Development of Classification; Cladistics; The Future.
Database Infrastructure
Standards: why do we need standards? Example from the air-conditioner industry: the diameter of a screw must match the hole in the panel. Reasons for databases: various media need a ‘handshake’ among them, as in the MIME (Multipurpose Internet Mail Extensions) protocol; each layer of infrastructure requires its own set of standards; and data need standardized categories.
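The MIME ‘handshake’ can be illustrated with Python's standard `email` package: the sender labels every part of a message with a `Content-Type` header, and the receiver parses those labels back before deciding what to do with each part. This is a minimal sketch; the subject line, body text and `record.xml` attachment are hypothetical placeholders.

```python
import email
import email.policy
from email.message import EmailMessage


def build_message() -> EmailMessage:
    """Sender side: label each part with its media type (hypothetical content)."""
    msg = EmailMessage()
    msg["Subject"] = "Specimen record"
    msg.set_content("Plain-text summary of the record.")
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(b"<record/>", maintype="application",
                       subtype="xml", filename="record.xml")
    return msg


def content_types(msg) -> list:
    """List the declared media type of the message and of each of its parts."""
    return [part.get_content_type() for part in msg.walk()]


# Receiver side: parse the raw bytes and read the labels back.
raw = build_message().as_bytes()
received = email.message_from_bytes(raw, policy=email.policy.default)
print(content_types(received))
# ['multipart/mixed', 'text/plain', 'application/xml']
```

Neither side needs to know the other's software; both only need the shared header vocabulary.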
Standards: the best standard will not always win. A well-known example: the QWERTY keyboard.
Standards: the best standard will not always win. A well-known example: the VHS (Video Home System) standard.
Standards: the best standard will not always win. A well-known example: the DOS operating system.
Standards: the best standard will not always win. Why? The best standard may not have the best market. Standards setting is a key site of political work: an inferior standard may be endorsed by a political agency (such as a standards-setting body).
Standards: interoperability. Strategies for standards setting lie on a continuum, from "one standard fits all" to "let a thousand standards bloom".
Standards: interoperability. Some related standards: 1. ANSI/NISO Z39.50. ANSI/NISO Z39.50 is the American National Standard Information Retrieval Application Service Definition and Protocol Specification for Open Systems Interconnection. It makes large information databases easier to use by standardizing the procedures and features for searching and retrieving information.
Standards: interoperability. Some related standards: ANSI/NISO Z39.50.
Standards: interoperability. Some related standards: 1. ANSI/NISO Z39.50. A single enquiry can be run over multiple databases. Widely adopted in the library world.
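The idea of a single enquiry over multiple databases can be sketched as a fan-out behind one common search interface. This is not the Z39.50 protocol itself (there are no sessions, protocol messages or attribute sets here); the two in-memory databases, their field names and the `federated_search` helper are hypothetical, and each adapter plays the role of a server translating the shared query into its own local layout.

```python
from typing import Callable, Dict, List, Tuple

# Two hypothetical local databases that store titles under different field names.
DB_A = [{"ttl": "Insects of Kenya"}, {"ttl": "Flora of Britain"}]
DB_B = [{"name": "Birds of Kenya"}, {"name": "Mosses of Chile"}]


def search_a(term: str) -> List[str]:
    """Adapter for DB_A: translate the shared query into DB_A's own layout."""
    return [r["ttl"] for r in DB_A if term.lower() in r["ttl"].lower()]


def search_b(term: str) -> List[str]:
    """Adapter for DB_B: same query, different internal field name."""
    return [r["name"] for r in DB_B if term.lower() in r["name"].lower()]


TARGETS: Dict[str, Callable[[str], List[str]]] = {"A": search_a, "B": search_b}


def federated_search(term: str) -> List[Tuple[str, str]]:
    """Send one enquiry to every target and merge the hits, tagged by source."""
    hits: List[Tuple[str, str]] = []
    for source, search in TARGETS.items():
        hits.extend((source, title) for title in search(term))
    return hits


print(federated_search("kenya"))  # [('A', 'Insects of Kenya'), ('B', 'Birds of Kenya')]
```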
Standards: interoperability. Some related standards: 2. XML. Extensible Markup Language (XML) is a set of rules for encoding documents in machine-readable form. Two extremes: (a) a colonial model; (b) a democratic model (which wins out), built on people's established computing environments.
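As a small illustration of XML as a machine-readable encoding, the sketch below writes a record to XML with Python's standard `xml.etree.ElementTree` and reads it back. The `specimen` element and its field names are hypothetical, not drawn from any published biodiversity schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict) -> str:
    """Encode a flat record as XML: one child element per field."""
    root = ET.Element("specimen")
    for field, value in record.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_record(text: str) -> dict:
    """Decode the XML back into a dict, relying only on the element names."""
    return {child.tag: child.text for child in ET.fromstring(text)}


record = {"genus": "Quercus", "species": "robur", "collector": "J. Smith"}
text = record_to_xml(record)
print(text)
# <specimen><genus>Quercus</genus><species>robur</species><collector>J. Smith</collector></specimen>
```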
Technology: technology must be stable. Nothing guarantees the stability of vast data sets. Examples: the failure of Paul Otlet's well-catalogued microfiches; the development of computer memory. In both cases, information becomes hard to retrieve.
Technology: technology must be stable so that data remain accessible and usable. Infrastructure will require a continued maintenance effort because (a) data are passed from one medium to another, and (b) data are carried over from one generation of database technology to the next.
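The maintenance burden of passing data from one medium to another can be sketched as a migration step that checks its own output. The example moves hypothetical records from CSV to JSON with Python's standard library and compares a SHA-256 fingerprint of the data before and after; the record layout and the `migrate` function are illustrative assumptions, not a description of any particular archive's procedure.

```python
import csv
import hashlib
import io
import json

# Hypothetical legacy export: a CSV file from an older system.
LEGACY_CSV = "genus,species\nQuercus,robur\nFagus,sylvatica\n"


def fingerprint(rows: list) -> str:
    """Hash a canonical JSON rendering so two copies can be compared cheaply."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def migrate(csv_text: str) -> str:
    """Move records from the CSV medium to JSON, refusing to return a lossy copy."""
    old_rows = list(csv.DictReader(io.StringIO(csv_text)))
    new_text = json.dumps(old_rows, indent=2)
    if fingerprint(json.loads(new_text)) != fingerprint(old_rows):
        raise RuntimeError("migration changed the data")
    return new_text


print(migrate(LEGACY_CSV))
```

Each future migration (to a new format or storage system) would need the same kind of check, which is one concrete form of the continued maintenance effort.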
Issues of Communication: the problem of reliable metadata. Metadata are data about data; on this slide, the blue lines are metadata.
Issues of Communication: the problem of reliable metadata. Metadata give standard names to certain kinds of data, which makes the data searchable across multiple databases. The issue is how detailed the metadata should be: with too little detail, the description of the data is useless; with too much, describing the data takes longer and creates more work.
Issues of Communication: Dublin Core. The Dublin Core set of metadata elements provides a small, fundamental group of text elements through which most resources can be described and cataloged. The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights.
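A minimal sketch of a Simple Dublin Core record, using the lowercase DCMES element names in the `http://purl.org/dc/elements/1.1/` namespace. The `metadata` wrapper element and the `dublin_core_xml` helper are hypothetical (simple Dublin Core does not itself fix a container element); the helper rejects any name outside the 15 elements, so every record stays searchable against the same fields.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DCMES = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_xml(fields: dict) -> str:
    """Serialize a record, allowing only the 15 DCMES element names."""
    unknown = set(fields) - DCMES
    if unknown:
        raise ValueError(f"not DCMES elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_xml({
    "title": "Databasing the World",
    "creator": "Bowker, G. C.",
    "type": "Text",
}))
```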
Data Sharing
Ownership: control of knowledge. Mid-nineteenth century: knowledge was controlled only by professionally trained scientists and doctors. New information economy: knowledge comes from many people. Example: patient groups.
Ownership: privacy. Keeping data private is difficult. Example: data are compiled by a third-party company to generate a new, marketable form of knowledge. New patterns of ownership: science has frequently been analyzed as a “public good”, but knowledge is increasingly privatized, and it is unclear to what extent the vaunted openness of the scientific community will last.
Disarticulation: the ideal database. According to most practitioners, a database should be theory-neutral, yet serve as a common basis on which a number of scientific disciplines can progress. Example: the genome databank as a new kind of science. Constructing arguments about genetic causation is distinct from the process of mapping the genome.
The data in a database should be easy for other scientists to manipulate.
Data Collection: dealing with old data. Difficulties: scientific papers do not in general offer enough information to allow an experiment or procedure to be repeated. The distributed database is becoming a new form of scientific publication in its own right. Issues of updating: there is no automatic update from one field to a cognate one, so scientists are not able to share information across disciplinary divides.
Data Collection: international technoscience. Purpose: to narrow the gaps between countries. Issues: people do not have equal knowledge; access is never really equal; governments have doubts about the usefulness of opening databases onto the Internet.
Distributed Collective Practice
Collaborative Work: management structures in universities and industry still tend to support the heroic myth of the individual researcher. What kind of value do the large publishing houses add to journal production? Great attention must be paid to the social and organizational setting of technoscientific work.
New Knowledge Economy: three central issues. (1) The development of flexible, stable data standards; (2) the generation of protocols for data sharing; (3) the restructuring of scientific careers.
Accounting For Life
Development of Classification: introduction to the PANDORA taxonomic database.
Development of Classification: the importance of classification. 18th-19th centuries: a botanist must know all genera and commit their names to memory, but cannot be expected to remember all specific names (A. J. Cain, 1958). Later 19th century: new information technologies were developed that permitted the easy storage and coding of larger amounts of data than could previously be easily manipulated (Chandler, 1977; Yates, 1989).
Development of Classification. Example: paper-based archival practice. Issues: it is hard to reclassify; type specimens had to be relocated physically, and so did series of articles or books.
Development of Classification. Example: a multifaceted classification system. Improvement: classifications can be ordered in multiple ways rather than in a single way. Example: a collection of books might be classified using an author facet, a subject facet and a date facet.
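A faceted classification can be sketched as one independent index per facet over the same records, so the collection can be ordered by author, subject or date without relocating anything. The book records and helper names below are illustrative.

```python
from collections import defaultdict

# Illustrative book records: every record carries a value for each facet.
BOOKS = [
    {"title": "On the Origin of Species", "author": "Darwin", "subject": "evolution", "date": 1859},
    {"title": "Systema Naturae", "author": "Linnaeus", "subject": "taxonomy", "date": 1735},
    {"title": "The Descent of Man", "author": "Darwin", "subject": "evolution", "date": 1871},
]


def build_facet_index(records, facets):
    """Map each facet to {facet value: [titles]}, one index per facet."""
    index = {facet: defaultdict(list) for facet in facets}
    for record in records:
        for facet in facets:
            index[facet][record[facet]].append(record["title"])
    return index


def order_by(records, facet):
    """Order the same collection along any one facet (stable sort)."""
    return [r["title"] for r in sorted(records, key=lambda r: r[facet])]


index = build_facet_index(BOOKS, ["author", "subject", "date"])
print(index["author"]["Darwin"])  # ['On the Origin of Species', 'The Descent of Man']
print(order_by(BOOKS, "date"))    # ['Systema Naturae', 'On the Origin of Species', 'The Descent of Man']
```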
Development of Classification. Example: hierarchical classification (for reading the past). In the early 1970s, E. F. Codd split the physical storage of data in the computer from the representation of that data. Disadvantage of a hierarchy: it becomes awkward to introduce other levels of taxonomic category as an afterthought. Improved method: one record for every name, regardless of its taxonomic level.
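The "one record for every name" design can be sketched as a single self-referencing table, using Python's built-in `sqlite3`. The table and column names are hypothetical and this is not the PANDORA schema; the point is that introducing an intermediate rank afterwards (here a subgenus) needs one new row and one updated parent link, not a schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table holds every name; the rank is just a column value (hypothetical schema).
conn.execute(
    """CREATE TABLE taxon (
           id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           taxon_rank TEXT NOT NULL,
           parent_id INTEGER REFERENCES taxon(id)
       )"""
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?)",
    [
        (1, "Fagaceae", "family", None),
        (2, "Quercus", "genus", 1),
        (3, "Quercus robur", "species", 2),
    ],
)

# Introduce a new level as an afterthought: one new row, one re-pointed parent link.
conn.execute("INSERT INTO taxon VALUES (4, 'Quercus subg. Quercus', 'subgenus', 2)")
conn.execute("UPDATE taxon SET parent_id = 4 WHERE id = 3")


def lineage(conn, taxon_id):
    """Walk parent links from a name up to the root with a recursive query."""
    query = """
        WITH RECURSIVE up(id, name, taxon_rank, parent_id, depth) AS (
            SELECT id, name, taxon_rank, parent_id, 0 FROM taxon WHERE id = ?
            UNION ALL
            SELECT t.id, t.name, t.taxon_rank, t.parent_id, up.depth + 1
            FROM taxon AS t JOIN up ON t.id = up.parent_id
        )
        SELECT taxon_rank, name FROM up ORDER BY depth
    """
    return conn.execute(query, (taxon_id,)).fetchall()


for rank, name in lineage(conn, 3):
    print(rank, name)
```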
Cladistics: definition. Cladistics is a method of classifying species of organisms into groups called clades, each consisting of (1) an ancestral organism and (2) all of its descendants. Features: it gives a more regular algorithm for determining phylogeny; it focuses attention on shared, derived characteristics of sets of organisms; it uses ‘outgroup’ comparisons to develop the classification.
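The definition above translates directly into code: a clade is an ancestor plus everything descended from it, and a group is a clade only if it equals some node's clade. The small tree and node names are hypothetical, and internal nodes are treated as taxa to match the slide's wording that the ancestor itself belongs to the clade.

```python
# Hypothetical rooted tree, stored as child -> parent (the root has no parent).
PARENT = {
    "A": None,
    "B": "A", "C": "A",
    "D": "B", "E": "B",
    "F": "C",
}


def children_map(parent):
    """Invert the child -> parent map into parent -> [children]."""
    kids = {node: [] for node in parent}
    for node, up in parent.items():
        if up is not None:
            kids[up].append(node)
    return kids


def clade(ancestor, parent):
    """The ancestor itself plus all of its descendants."""
    kids = children_map(parent)
    members, stack = set(), [ancestor]
    while stack:
        node = stack.pop()
        members.add(node)
        stack.extend(kids[node])
    return members


def is_clade(group, parent):
    """A group is a clade exactly when it equals the clade of some node."""
    return any(clade(node, parent) == set(group) for node in parent)


print(sorted(clade("B", PARENT)))         # ['B', 'D', 'E']
print(is_clade({"B", "D", "E"}, PARENT))  # True
print(is_clade({"D", "E"}, PARENT))       # False: the ancestor B is missing
```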
Cladistics: the tree of life. Cladists use cladograms, diagrams that show ancestral relations between taxa, to represent the evolutionary tree of life. Charles Darwin (1809–1882) is often credited with producing the first evolutionary tree of life.
Cladistics: the tree of life.
Cladistics: computer programs in cladistics. Analyses were undertaken using Swofford's (1985) package PAUP, version 2.4 installed on a Cyber mainframe computer and version 2.4.1 on an Amstrad 1512 PC. David Swofford's PAUP (Phylogenetic Analysis Using Parsimony) is a software package for inferring evolutionary trees. Purpose: to follow a given algorithm for generating and testing cladograms.
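PAUP scores candidate cladograms by parsimony. As a textbook illustration of what "testing" a cladogram means, the sketch below implements Fitch's small-parsimony count for unordered characters on a rooted binary tree. It is not PAUP's code, and the four-taxon character matrix and both trees are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a taxon name (leaf) or a pair of subtrees.
Tree = Union[str, tuple]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small parsimony for one character: (candidate states, min changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Children can agree: no extra change needed at this node.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Sum Fitch counts over every character (column) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )


# Hypothetical data: four taxa scored for two binary characters.
MATRIX = {"w": "00", "x": "01", "y": "11", "z": "11"}
T1 = (("w", "x"), ("y", "z"))
T2 = (("w", "y"), ("x", "z"))
print(tree_length(T1, MATRIX), tree_length(T2, MATRIX))  # 2 3
```

T1 needs fewer character changes, so parsimony prefers it. Scoring one tree is cheap; a full program must also search the space of possible trees, which is where the algorithmic difficulties arise.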
Cladistics Computer programs in cladistics
Cladistics: computer programs in cladistics. Issues: the packages produce variable results and cannot possibly examine all the possibilities, since finding the best tree is an NP-complete problem. These are algorithmic issues.
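The scale of the search follows from a standard counting result: n labelled taxa admit (2n-5)!! distinct unrooted binary trees (for n >= 3). The sketch computes that count; it illustrates why exhaustive search quickly becomes impossible, rather than proving NP-completeness, which is a separate result.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 (or 2)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_binary_trees(n: int) -> int:
    """Number of distinct unrooted binary trees on n labelled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n - 5)


for n in (4, 10, 20):
    print(n, unrooted_binary_trees(n))
```

Ten taxa already give 2,027,025 trees, and the count grows faster than exponentially, so practical packages fall back on heuristic searches that may return different trees on different runs.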
The Future: storing life. Life is described as itself a program, with DNA as the code. If everything is information, then life can equally well be “stored”.
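As a toy illustration of life "stored" as information, a DNA sequence over A, C, G and T fits in two bits per base. The encoding below (A=0, C=1, G=2, T=3) is an arbitrary choice for this sketch, and it ignores ambiguity codes such as N that real sequence files must handle.

```python
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}


def pack(seq: str) -> bytes:
    """Pack a DNA string into 2 bits per base (4 bases per byte)."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    n_bytes = (2 * len(seq) + 7) // 8
    return value.to_bytes(n_bytes, "big")


def unpack(data: bytes, length: int) -> str:
    """Recover `length` bases; the length is needed because leading A's pack to zero bits."""
    value = int.from_bytes(data, "big")
    out = []
    for _ in range(length):
        out.append(BASES[value & 0b11])
        value >>= 2
    return "".join(reversed(out))


packed = pack("GATTACA")
print(len(packed), unpack(packed, 7))  # 2 GATTACA
```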
THANK YOU!

The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Databasing the World: Key Aspects of Biodiversity Databases

  • 1. Databasing the World: Biodiversity and the 2000s. Written by Bowker, G. C. Presented by Chen Zhang (Mike).
  • 2. Four Key Aspects: (1) Database Infrastructure: standards (flexible, stable), technology (stable), communication. (2) Data Sharing: ownership, disarticulation, data collection.
  • 3. Four Key Aspects (continued): (3) Distributed Collective Practice: collaborative work, the new knowledge economy. (4) Accounting for Life: development of classification, cladistics, the future.
  • 4. Database Infrastructure
  • 5. Standards. Why do we need standards? Example from the air-conditioner industry: the diameter of a screw must match the hole in the panel. Reasons for databases: various media need a "handshake" with one another, for example the MIME (Multipurpose Internet Mail Extensions) protocol. Each layer of infrastructure requires its own set of standards, and databases need standardized categories.
  • 6. Standards. The best standard will not always win. Well-known example: the QWERTY keyboard.
  • 7. Standards. The best standard will not always win. Well-known example: the VHS (Video Home System) standard.
  • 8. Standards. The best standard will not always win. Well-known example: the DOS computing system.
  • 9. Standards. Why does the best standard not always win? The best standard may not have the best market position. Standards setting is a key site of political work: an inferior standard may be favored by political agencies (such as standards-setting bodies).
  • 10. Standards: Interoperability. There is a continuum of strategies for standards setting, from "one standard fits all" to "let a thousand standards bloom".
  • 11. Standards: Interoperability. Some related standards: (1) ANSI/NISO Z39.50. ANSI/NISO Z39.50 is the American National Standard Information Retrieval Application Service Definition and Protocol Specification for Open Systems Interconnection. It makes large information databases easier to use by standardizing the procedures and features for searching and retrieving information.
  • 12. Standards: Interoperability. Some related standards: ANSI/NISO Z39.50.
  • 13. Standards: Interoperability. Some related standards: (1) ANSI/NISO Z39.50 allows a single query to run over multiple databases. It is widely adopted in the library world.
  • 14. Standards: Interoperability. Some related standards: (2) XML. Extensible Markup Language (XML) is a set of rules for encoding documents in machine-readable form. Two extremes: (a) a colonial model and (b) a democratic model, which wins out by building on people's established computing environments.
  • 15. Technology. Technology must be stable, yet nothing guarantees the stability of vast data sets. Example: the failure of Paul Otlet's well-catalogued microfiches. As computer memory develops, information stored on older media becomes hard to retrieve.
  • 16. Technology. Technology must be stable so that data remain accessible and usable. The infrastructure will require a continued maintenance effort because (a) data are passed from one medium to another and (b) data are carried over from one generation of database technology to the next.
  • 17. Issues of Communication. The problem of reliable metadata. Metadata are data about data; in the slide's figure, the blue lines are metadata.
  • 18. Issues of Communication. The problem of reliable metadata: data need standard names so that they are searchable across multiple databases. Issue: how detailed should the name of the data be? With too few details, the information about the data is useless; with too many, describing it takes longer and means more work.
  • 19. Issues of Communication. Dublin Core. The Dublin Core set of metadata elements provides a small and fundamental group of text elements through which most resources can be described and cataloged. The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights.
  • 21. Ownership. Control of knowledge. Mid-nineteenth century: knowledge came only from professionally trained scientists and doctors. New information economy: knowledge comes from many people. Example: patient groups.
  • 22. Ownership. Privacy: keeping data private is difficult. Example: data compiled by a third-party company to generate a new, marketable form of knowledge. New patterns of ownership: science has frequently been analyzed as a "public good", but with the increasing privatization of knowledge it is unclear to what extent the vaunted openness of the scientific community will last.
  • 25. Data Collection. Dealing with old data. Difficulties: scientific papers do not in general offer enough information to allow an experiment or procedure to be repeated. The distributed database is becoming a new form of scientific publication in its own right. Issues of update: there is no automatic update from one field to a cognate one, so scientists are not able to share information across disciplinary divides.
  • 26. Data Collection. International technoscience. Purpose: narrow the gaps between countries. Issues: people do not have equal knowledge, access is never really equal, and governments have doubts about the usefulness of opening databases onto the Internet.
  • 28. Collaborative Work. Management structures in universities and industry still tend to support the heroic myth of the individual researcher. What kind of value do the large publishing houses add to journal production? Great attention must be paid to the social and organizational setting of technoscientific work.
  • 29. New Knowledge Economy. Three central issues: (1) the development of flexible, stable data standards; (2) the generation of protocols for data sharing; (3) the restructuring of scientific careers.
  • 31. Development of Classification. Introduction: the PANDORA taxonomic database.
  • 32. Development of Classification. Importance of classification. 18th-19th centuries: a botanist must know all genera and commit their names to memory, but cannot be expected to remember all specific names (A. J. Cain, 1958). Later part of the 19th century: new information technologies developed which permitted the easy storage and coding of larger amounts of data than could previously be easily manipulated (Chandler, 1977; Yates, 1989).
  • 33. Development of Classification. Example of classification: paper-based archival practice. Issues: records are hard to reclassify, because type specimens had to be relocated physically, and so did series of articles or books.
  • 34. Development of Classification. Example of classification: a multifaceted classification system. Improvement: the classification can be ordered in multiple ways rather than in a single way. Example: a collection of books might be classified using an author facet, a subject facet, and a date facet.
  • 35. Development of Classification. Example of classification: hierarchical classification (for reading the past). In the early 1970s, E. F. Codd split the physical storage of data in the computer from the representation of that data. Disadvantage: it becomes awkward to introduce other levels of taxonomic category as an afterthought. Improved method: one record for every name, regardless of its taxonomic level.
  • 36. Cladistics. Definition: cladistics is a method of classifying species of organisms into groups called clades, each consisting of (1) an ancestral organism and (2) all of its descendants. Features: it gives a more regular algorithm for determining phylogeny, focuses attention on the shared, derived characteristics of a set of organisms, and uses "outgroup" comparisons to develop the classification system.
  • 37. Cladistics. Tree of life. Cladists use cladograms, diagrams that show ancestral relations between taxa, to represent the evolutionary tree of life. Charles Darwin (1809–1882) was among the first to produce an evolutionary tree of life.
  • 39. Cladistics. Computer programs in cladistics. Analyses were undertaken using Swofford's (1985) package PAUP, version 2.4 installed on a Cyber mainframe computer and version 2.4.1 on an Amstrad 1512 PC. David Swofford's PAUP is a software package for the inference of evolutionary trees. Purpose: to follow a given algorithm for generating and testing cladograms.
  • 41. Cladistics. Computer programs in cladistics. Algorithmic issues: the packages produce variable results and cannot possibly look at all the possibilities, since finding the best tree is an NP-complete problem.
  • 42. The Future. Storing life. Life is described as itself a program, with DNA as its code. If everything is information, then life can equally well be "stored".
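Slide 5 uses MIME as its example of a "handshake" between media. As a minimal sketch, assuming Python's standard `email` package (the message text is a made-up example), the sender declares the content type and character set in MIME headers, and the receiving program reads those declarations back instead of guessing how to interpret the bytes:

```python
from email import message_from_string
from email.mime.text import MIMEText

# Sender side: the MIME headers declare what the payload is and how it is encoded.
outgoing = MIMEText("Specimen note: hypothetical example text", "plain", "utf-8")
outgoing["Subject"] = "Specimen note"

# The whole message travels as plain text "on the wire".
wire_text = outgoing.as_string()

# Receiver side: parse the raw text and read the declared type back.
incoming = message_from_string(wire_text)
content_type = incoming.get_content_type()
charset = incoming.get_content_charset()
body = incoming.get_payload(decode=True).decode(charset)

print(content_type, charset, body)
```

The receiver recovers the original text only because both sides follow the same MIME conventions, which is the point the slide makes about standards at each layer of infrastructure.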
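Slides 11 to 13 describe the main benefit of Z39.50: a single query run over multiple databases. The sketch below does not implement the Z39.50 protocol; it only illustrates the idea under simple assumptions. Each hypothetical `LocalDatabase` stores titles under its own field name, and the hypothetical `federated_search` translates one shared "title" query into each local convention, loosely analogous to the shared search attributes that Z39.50 defines:

```python
from typing import Dict, List

Record = Dict[str, str]


class LocalDatabase:
    """A hypothetical catalogue that stores titles under its own field name."""

    def __init__(self, name: str, title_field: str, records: List[Record]):
        self.name = name
        self.title_field = title_field
        self.records = records

    def search_title(self, term: str) -> List[str]:
        # Translate the shared "title" access point into this database's field.
        term = term.lower()
        return [r[self.title_field] for r in self.records
                if term in r[self.title_field].lower()]


def federated_search(term: str, databases: List[LocalDatabase]) -> Dict[str, List[str]]:
    """Send one query to every database and collect the hits per source."""
    return {db.name: db.search_title(term) for db in databases}


catalogues = [
    LocalDatabase("library_a", "ttl",
                  [{"ttl": "Flora of the Alps"}, {"ttl": "Birds of Kenya"}]),
    LocalDatabase("library_b", "title",
                  [{"title": "Alpine flora revisited"}]),
]
results = federated_search("flora", catalogues)
print(results)  # {'library_a': ['Flora of the Alps'], 'library_b': ['Alpine flora revisited']}
```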
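Slide 14 defines XML as a set of rules for encoding documents in machine-readable form. A minimal sketch using Python's standard `xml.etree.ElementTree`: one program encodes a specimen record, and any other program can parse it back without knowing how it was produced. The element names (`specimen`, `scientificName`, `locality`, `year`) and the record itself are illustrative assumptions, not a published biodiversity schema:

```python
import xml.etree.ElementTree as ET

# Encode a hypothetical specimen record as XML.
record = ET.Element("specimen", attrib={"id": "S-001"})
ET.SubElement(record, "scientificName").text = "Bellis perennis"
ET.SubElement(record, "locality").text = "Oxford"
ET.SubElement(record, "year").text = "1998"

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# A separate consumer only needs the XML rules, not the producer's code.
parsed = ET.fromstring(xml_text)
name = parsed.findtext("scientificName")
year = int(parsed.findtext("year"))
print(parsed.get("id"), name, year)
```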
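Slide 16 argues that infrastructure needs continued maintenance because data must be carried from one medium, and one generation of database technology, to the next. A minimal sketch of one such migration step, assuming a hypothetical legacy CSV export and a hypothetical JSON target format with a `format_version` field: the records are converted, then checked against the source so that nothing is silently lost in the move.

```python
import csv
import io
import json

# Hypothetical export from an older system: one CSV row per specimen.
LEGACY_CSV = """id,name,year
1,Bellis perennis,1998
2,Quercus robur,2001
"""


def migrate_csv_to_json(csv_text: str) -> str:
    """Carry every legacy row into a (hypothetical) versioned JSON format."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["year"] = int(row["year"])  # make the type explicit in the new format
    return json.dumps({"format_version": 2, "records": rows})


def verify_migration(csv_text: str, json_text: str) -> bool:
    """Check that no record was lost or altered in the move."""
    old = list(csv.DictReader(io.StringIO(csv_text)))
    new = json.loads(json_text)["records"]
    if len(old) != len(new):
        return False
    return all(
        o["id"] == n["id"] and o["name"] == n["name"] and int(o["year"]) == n["year"]
        for o, n in zip(old, new)
    )


migrated = migrate_csv_to_json(LEGACY_CSV)
print(verify_migration(LEGACY_CSV, migrated))  # True
```

Each future change of format would need its own converter and check, which is the ongoing maintenance effort the slide refers to.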
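Slides 18 and 19 explain why shared metadata names make data searchable across databases. A minimal sketch, assuming records are plain Python dictionaries keyed by the 15 Simple Dublin Core element names: the hypothetical `validate_dc` flags fields outside the element set, and the hypothetical `search` queries records from two different databases through one shared element. The functions and sample records are illustrative, not part of any Dublin Core library.

```python
from typing import Dict, List

# The 15 elements of the Simple Dublin Core Metadata Element Set (slide 19).
DCMES = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

Record = Dict[str, str]


def validate_dc(record: Record) -> List[str]:
    """Return the sorted field names that are not Dublin Core elements."""
    return sorted(field for field in record if field not in DCMES)


def search(records: List[Record], element: str, term: str) -> List[str]:
    """Search records from any source through one shared metadata element."""
    if element not in DCMES:
        raise ValueError(f"{element!r} is not a Dublin Core element")
    term = term.lower()
    return [r["identifier"] for r in records if term in r.get(element, "").lower()]


# Hypothetical records from two different databases, described with the same elements.
records = [
    {"identifier": "db1:42", "title": "Survey of alpine plants", "coverage": "Switzerland"},
    {"identifier": "db2:7", "title": "Beetle census", "coverage": "Swiss Alps"},
]

print(search(records, "coverage", "swi"))           # ['db1:42', 'db2:7']
print(validate_dc({"title": "x", "habitat": "y"}))  # ['habitat']
```

A small, fixed element set keeps records cheap to describe; anything finer, such as a habitat field, falls outside it, which mirrors the trade-off on slide 18 between too few and too many details.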
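Slide 34 describes a multifaceted classification in which the same collection can be ordered in several ways. A minimal sketch, assuming hypothetical book records with author, subject, and date facets: `build_facet_index` maps each facet value to item ids, `ordered_by` orders the collection by any one facet, and facets combine by intersecting id sets. Both function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical book records; each facet is an independent way to classify them.
BOOKS = [
    {"id": "b1", "author": "Smith", "subject": "taxonomy", "date": 2005},
    {"id": "b2", "author": "Jones", "subject": "ecology", "date": 1970},
    {"id": "b3", "author": "Smith", "subject": "botany", "date": 1975},
]
FACETS = ["author", "subject", "date"]


def build_facet_index(items, facets) -> Dict[str, Dict[object, Set[str]]]:
    """Map each facet value to the ids of the items that carry it."""
    index: Dict[str, Dict[object, Set[str]]] = {f: defaultdict(set) for f in facets}
    for item in items:
        for f in facets:
            index[f][item[f]].add(item["id"])
    return index


def ordered_by(items, facet: str) -> List[str]:
    """Order the same collection by any one facet (ties broken by id)."""
    return [it["id"] for it in sorted(items, key=lambda it: (it[facet], it["id"]))]


index = build_facet_index(BOOKS, FACETS)
print(ordered_by(BOOKS, "author"))   # ['b2', 'b1', 'b3']
print(ordered_by(BOOKS, "subject"))  # ['b3', 'b2', 'b1']
print(ordered_by(BOOKS, "date"))     # ['b2', 'b3', 'b1']

# Facets combine: books by Smith about botany.
print(index["author"]["Smith"] & index["subject"]["botany"])  # {'b3'}
```

Unlike the single shelf order of paper-based practice (slide 33), no record has to move when a new way of reading the collection is needed.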
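Slide 35 recommends "one record for every name, regardless of its taxonomic level". A minimal sketch of that idea in a relational store, using Python's built-in `sqlite3` as an illustrative database (this is not PANDORA's actual schema): every name is a row, its rank is data rather than schema, and a parent pointer records the hierarchy. Adding a new level as an afterthought then means inserting one row and re-pointing one parent, with no new tables or columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table holds every name; the rank is a value, not a separate table per level.
conn.execute("""
    CREATE TABLE taxon (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        rank      TEXT NOT NULL,
        parent_id INTEGER REFERENCES taxon(id)
    )""")
conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)", [
    (1, "Asteraceae", "family", None),
    (2, "Bellis", "genus", 1),
    (3, "Bellis perennis", "species", 2),
])

# Introduce an extra level (a tribe) as an afterthought: one insert, one update.
conn.execute("INSERT INTO taxon VALUES (4, 'Astereae', 'tribe', 1)")
conn.execute("UPDATE taxon SET parent_id = 4 WHERE id = 2")

# Walk up the hierarchy from the species with a recursive query.
lineage = conn.execute("""
    WITH RECURSIVE up(id, name, rank, parent_id, depth) AS (
        SELECT id, name, rank, parent_id, 0 FROM taxon WHERE name = 'Bellis perennis'
        UNION ALL
        SELECT t.id, t.name, t.rank, t.parent_id, up.depth + 1
        FROM taxon AS t JOIN up ON t.id = up.parent_id
    )
    SELECT rank, name FROM up ORDER BY depth""").fetchall()
print(lineage)
```

The query never mentions specific ranks, so it keeps working however many levels are later inserted between species and family.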
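Slide 36 defines a clade as an ancestor together with all of its descendants. A minimal sketch, assuming a hypothetical rooted tree stored as a dictionary from each node to its direct descendants: `clade` collects an ancestor and everything below it, and `is_clade` tests whether a group of nodes is exactly such a set. Both names are illustrative.

```python
from typing import Dict, Iterable, List, Set

# A hypothetical rooted tree: each node maps to its direct descendants.
TREE: Dict[str, List[str]] = {
    "root": ["A", "X"],
    "X": ["B", "Y"],
    "Y": ["C", "D"],
    "A": [], "B": [], "C": [], "D": [],
}


def clade(tree: Dict[str, List[str]], ancestor: str) -> Set[str]:
    """Return the ancestor together with all of its descendants."""
    members = {ancestor}
    stack = [ancestor]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            members.add(child)
            stack.append(child)
    return members


def is_clade(tree: Dict[str, List[str]], group: Iterable[str]) -> bool:
    """A group is a clade if it equals clade(n) for some node n in the tree."""
    target = set(group)
    return any(clade(tree, node) == target for node in tree)


print(sorted(clade(TREE, "Y")))      # ['C', 'D', 'Y']
print(is_clade(TREE, {"B", "C"}))    # False: B and C share no ancestor-plus-all-descendants set
```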
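Slide 39 notes that PAUP follows a given algorithm for generating and testing cladograms. The slides do not show PAUP's internals, so as a stand-in this sketch implements Fitch's small-parsimony algorithm, a standard way of testing one candidate cladogram: it counts the minimum number of character-state changes a tree requires, and a lower total means a more parsimonious tree. The taxa W to Z, their character states, and both trees are hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree as nested pairs; leaves are taxon names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one character: (candidate states at this node, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed somewhere below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, characters: List[Dict[str, str]]) -> int:
    """Total minimum number of changes over all characters."""
    return sum(fitch(tree, column)[1] for column in characters)


# Two hypothetical binary characters scored for four taxa.
characters = [
    {"W": "0", "X": "0", "Y": "1", "Z": "1"},
    {"W": "0", "X": "0", "Y": "1", "Z": "1"},
]
tree_1 = (("W", "X"), ("Y", "Z"))
tree_2 = (("W", "Y"), ("X", "Z"))

print(parsimony_score(tree_1, characters))  # 2
print(parsimony_score(tree_2, characters))  # 4
```

Scoring one tree is fast; the hard part, taken up on slide 41, is choosing which of the enormous number of possible trees to score.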
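Slide 41 states that the packages cannot possibly look at all the possibilities. One way to see why: the number of distinct rooted, bifurcating trees for n labelled taxa is (2n - 3)!!, the product 1 × 3 × 5 × ... × (2n - 3). A minimal sketch that computes this count (the function name is illustrative):

```python
def rooted_binary_trees(n_taxa: int) -> int:
    """Number of distinct rooted, bifurcating trees with n labelled leaves: (2n - 3)!!."""
    if n_taxa < 2:
        return 1
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


for n in (3, 4, 10, 20):
    print(n, rooted_binary_trees(n))
```

Ten taxa already allow 34,459,425 rooted trees, and the count grows faster than exponentially, which is why tree-search programs generally rely on heuristics rather than checking every tree.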