The Protein Data Bank (PDB) is a database for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids. This presentation covers the what, why, how, where, and who of the PDB. It also gives a brief overview of the file formats available in the PDB, with emphasis on the PDB file format.
3. • The Protein Data Bank (PDB) is a database for the three-dimensional structural data
of large biological molecules, such as proteins and nucleic acids
• The data are typically obtained by X-ray crystallography, NMR spectroscopy, or,
increasingly, cryo-electron microscopy
• The data is freely accessible on the Internet via the websites of its member
organizations (PDBe, PDBj, RCSB, and BMRB)
• The PDB is overseen by an organization called the Worldwide Protein Data Bank,
wwPDB
What is PDB ?
4. Why did it start ?
Growing
crystallographic
data
Development of
BRAD in 1968
5. • In 1969, Dr. Edgar Meyer began to write software to store atomic coordinate files in
a common format to make them available for geometric and graphical evaluation
(with the sponsorship of Dr. Walter Hamilton at Brookhaven National Laboratory)
• In 1971, one of Dr. Meyer’s programs, SEARCH, enabled networking, which allowed
researchers to access information from the database to study protein structures offline
• In 1973, upon Hamilton’s death, Dr. Tom Koetzle took over direction of the PDB for
20 years
How did it start ?
6. • In the 1980s, IUCr guidelines were established, the number of deposited structures
increased, and independent biological databases such as the NDB were established
• In October 1998, the PDB was transferred to the Research Collaboratory for Structural
Bioinformatics (RCSB); the transfer was completed in 1999. Dr. Helen M. Berman of
Rutgers University became the new director
• In 2003, with the formation of the wwPDB, the PDB became an international
organization with three member organizations
• In 2006, the BMRB joined the wwPDB
How did it start ?
7. Who runs it ?
The Worldwide PDB
(wwPDB) organization
manages the PDB archive and
ensures that the PDB is freely
and publicly available to the
global community
Protein Data Bank
in Europe
Protein Data Bank
Japan
Research Collaboratory for Structural
Bioinformatics Protein Data Bank
Biological Magnetic Resonance
Data Bank
8. Who runs it ?
Rich information about all PDB entries,
multiple search and browse facilities,
advanced services including PDBePISA,
PDBeFold and PDBeMotif, advanced
visualisation and validation of NMR and EM
structures, tools for bioinformaticians
9. Who runs it ?
Supports browsing in multiple languages
such as Japanese, Chinese, and Korean;
SeSAW identifies functionally or
evolutionarily conserved motifs by
locating and annotating sequence and
structural similarities, tools for
bioinformaticians, and more
10. Who runs it ?
Simple and advanced searching for
macromolecules and ligands, tabular
reports, specialized visualization tools,
sequence-structure comparisons,
RCSB PDB Mobile, Molecule of the
Month and other educational resources
at PDB-101, and more
11. Who runs it ?
Collects NMR data from any experiment and
captures assigned chemical shifts, coupling
constants, and peak lists for a variety of
macromolecules; contains derived annotations
such as hydrogen exchange rates, pKa values,
and relaxation parameters
17. • The PDB is a repository of atomic coordinates and other information describing
proteins and other important biological macromolecules
• Structural biologists use methods such as X-ray crystallography, NMR spectroscopy,
and cryo-electron microscopy to determine the location of each atom relative to the
others in the molecule
• They then deposit this information, which is then annotated and publicly released into
the archive by the wwPDB
How data is collected?
18. • The RCSB PDB website allows you to search and explore the information under the PDB
header, including information on experimental methods and the chemistry and
biology of the protein
• Once you have found the PDB entries that you are interested in, you may download
them and use visualization programs to read in the PDB file, display the protein
structure on your computer, and create custom pictures of it
• These programs also often include analysis tools that allow you to measure distances
and bond angles, and identify interesting structural features
How to retrieve the data ?
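The retrieval and measurement steps described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the download URL pattern (`files.rcsb.org/download/<ID>.pdb`) and the coordinate columns of ATOM/HETATM lines are taken from public RCSB and wwPDB documentation, not from this presentation, and the function names are hypothetical. Some very large entries may not be offered in the PDB file format at all.

```python
import math
from urllib.request import urlopen

# Assumption: RCSB serves PDB-format files under this URL pattern.
DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_file_url(pdb_id):
    """Build the assumed download URL for a PDB ID such as '4HHB'."""
    return DOWNLOAD_URL.format(pdb_id=pdb_id.strip().upper())


def fetch_pdb_lines(pdb_id, timeout=30.0):
    """Download one entry and return its lines (needs network access)."""
    with urlopen(pdb_file_url(pdb_id), timeout=timeout) as response:
        return response.read().decode("utf-8").splitlines()


def atom_xyz(line):
    """Read x, y, z (in angstroms) from an ATOM or HETATM line.

    Assumption: coordinates occupy columns 31-38, 39-46 and 47-54.
    """
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not a coordinate record")
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


def distance(line_a, line_b):
    """Euclidean distance between the atoms on two coordinate lines."""
    return math.dist(atom_xyz(line_a), atom_xyz(line_b))
```

A typical use would be to call `fetch_pdb_lines("4HHB")`, keep the lines that start with `ATOM`, and pass any two of them to `distance`. Dedicated visualization programs do the same kind of arithmetic behind their measurement tools.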
19. • One can search for their protein of interest by using the search bar in the RCSB PDB
website
• It allows one to search either by typing the PDB ID, name of the author (who has
deposited the structure), or the sequence of the protein or any particular ligand of
interest
How to search ?
20. • The PDB ID is the 4-character unique identifier of every entry in the Protein Data Bank
• A 4-character PDB ID is assigned to each new structure at the time of deposition
• The first character is a numeral in the range 1-9, while the last three characters can be
either numerals (in the range 0-9) or letters (in the range A-Z)
• If the PDB ID of an entry in the Protein Data Bank is known, it is the most direct way
to retrieve it from the database
• However, it cannot be used as an identifier for biomolecules, because the PDB
contains several structures of the same molecule, in different environments or
different conformations, under different PDB IDs
PDB ID
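The format rule above (one digit from 1 to 9 followed by three characters that are digits or letters) translates directly into a regular expression. The sketch below is a format check only; it cannot tell whether an ID actually exists in the archive. Accepting lowercase input is an assumption made here for convenience, and the function name is hypothetical.

```python
import re

# One digit 1-9, then three characters that are each 0-9 or A-Z,
# following the rule stated on the slide.
PDB_ID_PATTERN = re.compile(r"[1-9][0-9A-Z]{3}")


def is_valid_pdb_id(text):
    """Return True if text has the shape of a 4-character PDB ID.

    Assumption: surrounding whitespace and lowercase letters are tolerated.
    """
    return PDB_ID_PATTERN.fullmatch(text.strip().upper()) is not None
```

For example, `is_valid_pdb_id("4HHB")` passes the check, while `"0ABC"` (leading zero) and `"4HHBX"` (five characters) do not.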
21. • One or more PDB IDs can be typed or copied and pasted in the search box. Multiple
IDs can be separated by commas or white space, including line breaks.
• Example:
Enter 4HHB into the text box next to "PDB ID(s)" and press "Submit Query". The Structure Summary page
for 4HHB will load
Enter 2HHB, 3HHB, 4HHB into the text box and press "Submit Query". A Query Results Browser page with
a brief summary of the three structures will load. From there, clicking a PDB ID, thumbnail image, or
structure title will load the Structure Summary page for the respective ID
PDB ID
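The separator rule for multi-ID queries (commas or white space, including line breaks) can be mimicked locally, for instance to clean up a pasted list before submitting it. This is a minimal sketch with a hypothetical function name; it does not reproduce the website's own parsing, and upper-casing the result is an assumption.

```python
import re


def split_pdb_ids(query):
    """Split a pasted query into individual IDs.

    Separators are commas and any white space (spaces, tabs, line breaks),
    as described on the slide. Empty pieces are dropped.
    """
    parts = re.split(r"[,\s]+", query.strip())
    return [part.upper() for part in parts if part]
```

With this helper, `"2HHB, 3HHB, 4HHB"` and a three-line paste of the same IDs both produce the same list of three identifiers.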
23. • The data in the PDB are usually available in three file formats:
PDB file format
mmCIF format
PDBML
File formats
24. • mmCIF is the acronym for the macromolecular Crystallographic Information File
• mmCIF is based on a subset of the syntax rules for the Self Defining Text Archive
(STAR) file
• A Dictionary Description Language (DDL) defines the structure of mmCIF
dictionaries
• Dictionaries provide the metadata which define the content of mmCIF data files
• mmCIF data files, dictionaries and DDLs all are expressed in a common syntax
mmCIF
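To give a feel for the common syntax mentioned above, the sketch below reads the simplest mmCIF constructs: a `data_` block name and single-line `_category.item value` pairs. It is deliberately not a full parser; `loop_` tables, multi-line text fields, and dictionary validation are ignored, and the function name and sample content are hypothetical.

```python
def parse_simple_mmcif(text):
    """Collect single-line `_category.item value` pairs from mmCIF text.

    Returns (block_name, {category: {item: value}}).
    Assumption: loop_ constructs and multi-line values are not present.
    """
    block_name = None
    items = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines carry no data
        if line.startswith("data_"):
            block_name = line[len("data_"):]
        elif line.startswith("_"):
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue  # value sits on a later line; out of scope here
            name, value = parts
            category, _, item = name[1:].partition(".")
            # Drop the quotes that wrap values containing spaces.
            items.setdefault(category, {})[item] = value.strip("'\"")
    return block_name, items
```

For real work, an established mmCIF library is a better choice, because the dictionaries described above define many categories that this sketch does not handle.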
25. • The Protein Data Bank Markup Language (PDBML) provides a representation of
PDB data in XML format
• The description of this format is provided in XML schema of the PDB Exchange
Data Dictionary
• This schema is produced by direct translation of the PDBx/mmCIF Exchange Data
Dictionary. Other data dictionaries used by the PDB have been electronically
translated into XML/XSD schemas
PDBML
26.
PDB file format
27. How to read PDB file ?
• Sections of an Entry
The following table lists the various sections of a PDB coordinate entry and the
records comprising them:
28. How to read PDB file ?
• Types of Records
It is possible to group records into categories based upon how often the record type
appears in an entry.
Single:
There are records that may only appear one time (without continuations) in
a file. It is an error for a duplicate of any of these records to appear in an
entry.
Once in an entry but exceed the number of columns available:
There are records that conceptually exist only once in an entry, but the
information content may exceed the number of columns available. These
records are therefore continued on subsequent lines.
29. How to read PDB file ?
• Types of Records
Multiple:
Most record types appear multiple times, often in groups where the
information is not logically concatenated but is presented in the form of a list.
Many of these record types have a custom serialization that may be used not
only to order the records, but also to connect to other record types.
Multiple in an entry but exceed the number of columns available:
There are records that conceptually exist multiple times in an entry, but the
information content may exceed the number of columns available. These
records are therefore continued on subsequent lines. The second and
subsequent lines contain a continuation field, which is a right-justified integer.
This number increments by one for each additional line of the record, and is
followed by a blank character.
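The continuation mechanism can be illustrated by reassembling a record that is spread over several lines. The sketch assumes the layout used by records such as TITLE in the wwPDB format description: record name in columns 1-6, continuation number in columns 9-10 (blank on the first line), text in columns 11-80. It handles a record that conceptually occurs once; records that occur several times restart their numbering for each item and would need to be grouped first. The function name is hypothetical.

```python
def join_continued_record(lines, record_name):
    """Concatenate the text of a record continued over several lines.

    Assumptions: name in columns 1-6, continuation number in columns 9-10,
    text in columns 11-80, and the record occurs once in the entry.
    """
    pieces = []
    expected = 1
    for line in lines:
        if line[0:6].strip() != record_name:
            continue
        field = line[8:10].strip()
        number = int(field) if field else 1  # first line has a blank field
        if number != expected:
            raise ValueError(
                "continuation %d found where %d was expected" % (number, expected)
            )
        pieces.append(line[10:80].strip())
        expected += 1
    return " ".join(pieces)
```

Joining the pieces with a single space is a simplification; it is adequate for free text such as a title, but not for fields whose values are split at other delimiters.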
30. How to read PDB file ?
• Types of Records
Grouping:
There are three record types used to group other records.
Other:
The remaining record types have a detailed inner structure.
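One quick way to see these categories in a real file is to count how many lines carry each record name. The sketch below assumes the record name sits in columns 1-6 of every line, as in the wwPDB format description. It counts physical lines, so a record continued over two lines is counted twice. The function name is hypothetical.

```python
from collections import Counter


def count_record_types(lines):
    """Count how many lines start with each record name (columns 1-6)."""
    counts = Counter()
    for line in lines:
        name = line[0:6].strip()
        if name:  # skip blank lines
            counts[name] += 1
    return counts
```

In a typical coordinate entry, records such as HEADER and CRYST1 would be expected once, while ATOM would appear thousands of times.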
31. How to read PDB file ?
• Types of Records
Single:
32. How to read PDB file ?
• Types of Records
Once in an entry but exceed the number of columns available :
33. How to read PDB file ?
• Types of Records
Multiple :
34. How to read PDB file ?
• Types of Records
Multiple in an entry but exceed the number of columns available :
35. How to read PDB file ?
• Types of Records
Grouping :
36. How to read PDB file ?
• Types of Records
Other :
JRNL - Literature citation that defines the coordinate set
REMARK - General remarks, some are structured and some are
free form
37. How to read PDB file ?
• Order of Records:
All records in a PDB coordinate entry must appear in a defined order. Mandatory
record types are present in all entries. When mandatory data are not provided, the
record name must appear in the entry with a NULL indicator. Optional items become
mandatory when certain conditions exist.
41. Want to learn further ?
PDB-101 is an online portal for teachers, students, and the general public to
promote exploration in the world of proteins and nucleic acids. Learning
about the diverse shapes and functions of these biological macromolecules
helps to understand all aspects of biomedicine and agriculture, from protein
synthesis to health and disease to biological energy.
( http://pdb101.rcsb.org/ )