The document describes the Sanger Mouse Resources Portal, an attempt at a federated approach to building a collaborative data portal for mouse genomic data. The portal aggregates data from five sources by combining a Solr search index with Biomart data services, so that each group hosts its own data and exposes it via defined interfaces. No single group has total control, and new data can be added easily. However, the approach also risks redundancy and lacks centralized curation of the whole collection.
7. Why This Works
• Clearly defined centre
• Central curation for all data
8. Mouse Informatics
• Genes
• Mutants (ES Cells, Mice)
• Phenotypes
• In mouse informatics, the traditional
Borg is MGI - this has worked nicely
for many years: http://informatics.jax.org
9. Mouse Informatics
• Times are changing...
• Other informatics groups are providing
high volume data and want in on the
portal game
12. “Hand over your data,
prepare to be assimilated”
“No, YOU hand over your data and
prepare to be assimilated”
… which of you is the real Borg?
“Ahem, both of you, prepare to be assimilated!”
13. ‘Federation’ Approach
• Each group hosts
their own data and
exposes it via defined
services
• Make a ‘clever’ portal
that pulls all of these
resources together
• No single group is
totally in charge
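The federation idea can be sketched in a few lines, assuming every group exposes the same small search interface. This is a minimal illustration, not the portal's implementation: `DataSource`, `InMemorySource`, `Portal`, and the example records are hypothetical names chosen for this sketch.

```python
from typing import Dict, List, Protocol


class DataSource(Protocol):
    """The 'defined service' each group agrees to expose (hypothetical)."""

    name: str

    def search(self, term: str) -> List[Dict[str, str]]:
        ...


class InMemorySource:
    """Stand-in for one group's independently hosted service."""

    def __init__(self, name: str, records: List[Dict[str, str]]) -> None:
        self.name = name
        self._records = records  # the group keeps ownership of its data

    def search(self, term: str) -> List[Dict[str, str]]:
        return [r for r in self._records if r.get("symbol") == term]


class Portal:
    """The 'clever' portal: it owns no data, it only queries registered sources."""

    def __init__(self) -> None:
        self._sources: List[DataSource] = []

    def register(self, source: DataSource) -> None:
        # A group plugs in its service without handing its data over.
        self._sources.append(source)

    def search(self, term: str) -> Dict[str, List[Dict[str, str]]]:
        # Collect each source's answer, keyed by the source that supplied it.
        return {s.name: s.search(term) for s in self._sources}


portal = Portal()
# Made-up example records for illustration only.
portal.register(InMemorySource("genes", [{"symbol": "Cbx7", "type": "gene"}]))
portal.register(InMemorySource("es_cells", [{"symbol": "Cbx7", "clone": "example-clone-1"}]))
print(portal.search("Cbx7"))
```

Because the portal only iterates over whatever sources are registered, no single group is in charge of the whole collection; each source stays responsible for its own records.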
21. The Sanger Mouse
Resources Portal
http://www.sanger.ac.uk/mouseportal
(Our Attempt at the Federation Approach...)
23. Distributed Data
• Currently 5 distinct but related sets of
mouse data:
• Gene Information
• Phenotyping
• Mutant Mouse Breeding
• Mutant ES Cell / Vector Production
• Other DNA Resources
40. MartSearch / Portal
• Searchable terms are indexed in Solr ahead of time
• Send the user's search term to Solr
• Solr returns groups of terms to query the Biomarts with
• Send asynchronous requests to each of the Biomarts for the data the user is interested in
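The MartSearch flow can be sketched with an in-memory dictionary standing in for the Solr index and `asyncio.gather` for the concurrent Biomart requests. This is a hedged illustration, not MartSearch's code: the mart names, index layout, and the `query_mart` placeholder (which makes no network call) are all hypothetical.

```python
import asyncio
from typing import Dict, List, Tuple

# Hypothetical index documents standing in for what Solr would hold:
# (searchable term, Biomart name, filter value to query that mart with).
DOCS: List[Tuple[str, str, str]] = [
    ("cbx7", "gene_mart", "Cbx7"),
    ("cbx7", "es_cell_mart", "Cbx7"),
    ("cbx7", "phenotype_mart", "Cbx7"),
]

Index = Dict[str, Dict[str, List[str]]]


def build_index(docs: List[Tuple[str, str, str]]) -> Index:
    """Group filter values by term, then by the mart they belong to."""
    index: Index = {}
    for term, mart, value in docs:
        index.setdefault(term.lower(), {}).setdefault(mart, []).append(value)
    return index


def lookup(index: Index, user_term: str) -> Dict[str, List[str]]:
    """Solr's role here: map the user's term to groups of terms per mart."""
    return index.get(user_term.lower(), {})


async def query_mart(mart: str, values: List[str]) -> List[Dict[str, str]]:
    """Placeholder for one request to a Biomart; no network call is made."""
    await asyncio.sleep(0)  # a real client would await HTTP I/O here
    return [{"mart": mart, "value": v} for v in values]


async def federated_search(index: Index, user_term: str) -> Dict[str, List[Dict[str, str]]]:
    groups = lookup(index, user_term)
    marts = list(groups)
    # Issue every mart request concurrently rather than one after another.
    results = await asyncio.gather(*(query_mart(m, groups[m]) for m in marts))
    return dict(zip(marts, results))


index = build_index(DOCS)
print(asyncio.run(federated_search(index, "Cbx7")))
```

Sending the requests concurrently means the wait is bounded by the slowest Biomart rather than the sum of all of them, which matters when each group's service runs on separate infrastructure.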
46. User searches for ‘Cbx7’
• Search Solr for ‘Cbx7’
• Solr returns JSON data containing information on what to search each Biomart by
• Search each Biomart using the query parameters defined by the Solr response
• Render the search results using templates
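The ‘Cbx7’ walkthrough can be sketched as two small steps: group the JSON reply into per-mart query parameters, then render rows a mart returns through a per-mart template. The reply below uses the `response`/`docs` wrapper that Solr's JSON output uses, but the document fields (`mart`, `filter`, `value`), the mart names, and the templates are hypothetical; Python's `string.Template` stands in for whatever templating the portal actually uses.

```python
import json
from collections import defaultdict
from string import Template
from typing import Dict, List

# Hypothetical Solr-style reply; the fields inside "docs" are illustrative only.
SOLR_REPLY = """
{"response": {"numFound": 2, "docs": [
  {"mart": "gene_mart", "filter": "marker_symbol", "value": "Cbx7"},
  {"mart": "es_cell_mart", "filter": "marker_symbol", "value": "Cbx7"}
]}}
"""


def params_per_mart(reply: str) -> Dict[str, Dict[str, List[str]]]:
    """Group the indexed filter values by the Biomart they should be sent to."""
    params: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for doc in json.loads(reply)["response"]["docs"]:
        params[doc["mart"]][doc["filter"]].append(doc["value"])
    # Convert to plain dicts so the result is easy to print and compare.
    return {mart: dict(filters) for mart, filters in params.items()}


# One display template per mart (hypothetical); real templates would emit HTML.
TEMPLATES = {
    "gene_mart": Template("Gene: $marker_symbol"),
    "es_cell_mart": Template("ES cells available for $marker_symbol"),
}


def render(mart: str, row: Dict[str, str]) -> str:
    """Fill the mart's template with one result row returned by that mart."""
    return TEMPLATES[mart].substitute(row)


params = params_per_mart(SOLR_REPLY)
print(params)
print(render("gene_mart", {"marker_symbol": "Cbx7"}))
```

Keeping one template per mart lets each group's results be displayed in its own way without the portal needing to understand the data beyond the fields the template names.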
47. Extending The Portal
• Put new data into a Biomart
• Write JSON config file for MartSearch
(defining filters to index and use)
• Rebuild the index
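The three extension steps can be sketched as loading a JSON config for the new Biomart and then rebuilding the term index from scratch. This assumes a config shape of our own invention: the keys `name`, `url`, and `index`, the example URL, and the sample rows are hypothetical, and the real MartSearch config format may differ.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical config for a newly added Biomart; the real MartSearch
# file format may differ. "index" names the filters whose values are indexed.
NEW_MART_CONFIG = """
{
  "name": "new_dna_resource_mart",
  "url": "http://example.org/biomart/martservice",
  "index": ["marker_symbol", "clone_name"]
}
"""

REQUIRED_KEYS = {"name", "url", "index"}


def load_config(text: str) -> Dict:
    """Parse a mart config and fail early if a required key is missing."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise ValueError(f"config is missing keys: {sorted(missing)}")
    return config


def rebuild_index(
    configs: List[Dict], rows_by_mart: Dict[str, List[Dict[str, str]]]
) -> Dict[str, List[Tuple[str, str, str]]]:
    """Rebuild term -> [(mart, filter, value)] from scratch for every configured mart."""
    index: Dict[str, List[Tuple[str, str, str]]] = {}
    for cfg in configs:
        for row in rows_by_mart.get(cfg["name"], []):
            for field in cfg["index"]:
                if field in row:
                    index.setdefault(row[field].lower(), []).append(
                        (cfg["name"], field, row[field])
                    )
    return index


cfg = load_config(NEW_MART_CONFIG)
# Rows the new mart would hand over for indexing (made-up example data).
rows = {"new_dna_resource_mart": [{"marker_symbol": "Cbx7", "clone_name": "clone-A"}]}
index = rebuild_index([cfg], rows)
print(index["cbx7"])
```

Rebuilding the whole index, matching the "Rebuild the index" step, keeps it consistent with every config at the cost of re-reading all marts; adding a data source then needs only a new config entry, not changes to the portal code.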
48. Advantages
• Easily extensible
• Data responsibility shared
49. Disadvantages
• Hard to avoid redundancy
• Sometimes needed for data linking
• Un-curated
• Each group can curate its own data
• No curation as a whole
50. Disclaimer
• Windows users...
• If you use IE - it will eat your browser
• Use Firefox/Chrome/Safari/Opera for
a more pleasant internet experience
• We are working on it - IE 8 gives an ok
experience...
51. Acknowledgments
• Funding: I-DCC grant (EU FP7)
• Coordination of informatic resources
from high-throughput mouse ES cell
mutagenesis programs
• Wellcome Trust Sanger Institute
• T87 - ES Cell Mutagenesis
• MIG - Mouse Informatics Group