Overview of InterMine infrastructure, ability to interoperate with other InterMine instances via IM 2.0 StairCase
Presented at the LF Project Kickoff Meeting, 2015/06/22
2. InterMine in a nutshell
• Open-source data warehouse software
• Integration of complex biological data
• Parsers for common biological data formats
• Extensible framework for custom data
• Cookie-cutter interface, highly customizable
• Interact using sophisticated web query tools
• Programmatic access using web-service API
3. Open-source Project
• Source code available online
• Distributed under the GNU LGPL license
• GitHub Repo:
https://github.com/intermine/intermine
• GitHub Organization:
https://github.com/intermine
intermine / intermine
> bio
> biotestmine
> config
> flymine
> humanmine
> imbuild
> intermine
> testmodel
.gitignore
.travis.yml
LICENSE
LICENSE.LIBS
README.md
RELEASE_NOTES
4. Richard N. Smith et al. Bioinformatics 2012;28:3163-3165
InterMine system architecture
5. InterMine system architecture
Web Application
• Java Server Pages (JSP), HTML, JS, CSS
• Interfaces with Java Servlets and IM web-services
Web Server
• Tomcat 7.0.x, serves the Web application ARchive (WAR) file
• Ant-based build system using the Java SDK
Database Server
• PostgreSQL 9.2 or above
• range query, btree, gist enabled (refer docs here)
http://intermine.readthedocs.org/en/latest/system-requirements/
6. Alex Kalderimis et al. Nucl. Acids Res. 2014;42:W468-W472
InterMine web services
http://iodocs.labs.intermine.org
JBrowse
7. Federated Authentication
• Apart from the standard login scheme (username/password), InterMine supports industry-standard OAuth2-based login flows, as implemented by Google, GitHub, Agave, etc.
• ThaleMine (Arabidopsis) relies on this infrastructure to authenticate users against the araport.org tenant registered within the Agave infrastructure
• Documentation available here:
http://intermine.readthedocs.org/en/latest/webapp/properties/web-properties/#openauth2-settings-aka-openid-connect
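As a rough illustration of the first step of such a login flow, the sketch below builds a generic OAuth2 authorization-code request URL. The endpoint, client ID, and callback URL are hypothetical placeholders, not actual InterMine, Araport, or Agave settings; the real configuration is covered by the documentation linked above.

```python
import secrets
from urllib.parse import urlencode


def authorization_url(authorize_endpoint, client_id, redirect_uri, scope="openid email"):
    """Return (url, state) for the first leg of an OAuth2 authorization-code flow.

    All argument values are supplied by the caller; nothing here is specific
    to InterMine or to any particular identity provider.
    """
    # Random value that the provider echoes back, used to guard against CSRF.
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return authorize_endpoint + "?" + urlencode(params), state


if __name__ == "__main__":
    # Hypothetical provider and callback, for illustration only.
    url, state = authorization_url(
        "https://provider.example/authorize",
        "my-client-id",
        "https://mine.example/callback",
    )
    print(url)
```

The user's browser is redirected to the returned URL; after login, the provider redirects back to the callback with a code that the web application exchanges for a token.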
8. Interoperability?
• Ability of InterMine instances to
communicate ‘automatically’ with each
other
• By way of leveraging web services
• Questions to be answered:
What do they say to each other?
How do they say it?
What mechanisms are used?
Enabling these mechanisms…
9. Data Model
• Data Model === Schema of InterMine
instance
• Defined in XML format
• Core data model (based on the Sequence Ontology, SO) can be extended to suit requirements
• Access a mine's data model in JSON format
http://MINE_URL/service/model/?format=json
• Compatibility of data models across mines
ensures interoperability
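A minimal sketch of how model compatibility between two mines might be checked, using the model URL pattern shown above. It assumes the JSON document has the layout {"model": {"classes": {ClassName: {"attributes": {...}}}}}; the function names and the toy models are hypothetical.

```python
import json
from urllib.request import urlopen


def model_url(mine_url):
    """Build the JSON data-model URL for a mine, following the slide's pattern."""
    return mine_url.rstrip("/") + "/service/model/?format=json"


def fetch_model(mine_url, timeout=30):
    """Download and parse a mine's data model (requires network access)."""
    with urlopen(model_url(mine_url), timeout=timeout) as response:
        return json.load(response)


def shared_schema(model_a, model_b):
    """Map each class found in both models to the attribute names they share.

    Assumes the layout {"model": {"classes": {name: {"attributes": {...}}}}}.
    """
    classes_a = model_a["model"]["classes"]
    classes_b = model_b["model"]["classes"]
    shared = {}
    for name in sorted(set(classes_a) & set(classes_b)):
        attrs_a = set(classes_a[name].get("attributes", {}))
        attrs_b = set(classes_b[name].get("attributes", {}))
        shared[name] = sorted(attrs_a & attrs_b)
    return shared


if __name__ == "__main__":
    # Toy models standing in for two mines; real ones would come from fetch_model().
    mine_a = {"model": {"classes": {
        "Gene": {"attributes": {"symbol": {}, "primaryIdentifier": {}}},
        "Protein": {"attributes": {"name": {}}},
    }}}
    mine_b = {"model": {"classes": {
        "Gene": {"attributes": {"symbol": {}, "briefDescription": {}}},
        "Allele": {"attributes": {"name": {}}},
    }}}
    print(shared_schema(mine_a, mine_b))
```

Classes and attributes that appear in the result are the ones a script could rely on in both mines.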
10. Advantages of common data model
• Data mining scripts developed for one mine are immediately compatible with others
• Promotes crowdsourcing: tools, widgets, and parsers written by one or more groups can be easily reused by others
• Enables cross-species analysis
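To illustrate the first point, the sketch below prepares one query and addresses it to several mines that share a data model. It assumes that each mine exposes a query results endpoint at /service/query/results accepting "query" (path-query XML) and "format" parameters, and that the model is named "genomic"; the mine URLs, the gene symbol, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def path_query_xml(view, path, value, model="genomic"):
    """Build a single-constraint path query as an XML string.

    quoteattr() adds the surrounding quotes and escapes special characters.
    """
    return (
        "<query model=%s view=%s>" % (quoteattr(model), quoteattr(" ".join(view)))
        + "<constraint path=%s op=\"=\" value=%s/>" % (quoteattr(path), quoteattr(value))
        + "</query>"
    )


def query_urls(mine_urls, query_xml):
    """Return {mine name: request URL} for the same query against every mine.

    The endpoint path and parameter names are assumptions, as noted in the lead-in.
    """
    params = urlencode({"query": query_xml, "format": "json"})
    return {
        name: base.rstrip("/") + "/service/query/results?" + params
        for name, base in mine_urls.items()
    }


if __name__ == "__main__":
    # Hypothetical mines that share the Gene class and its attributes.
    mines = {
        "mineA": "http://a.example/minea",
        "mineB": "http://b.example/mineb/",
    }
    xml = path_query_xml(["Gene.symbol", "Gene.primaryIdentifier"], "Gene.symbol", "PHYB")
    for name, url in query_urls(mines, xml).items():
        print(name, url)
```

Because the query refers only to classes and attributes of the shared model, the same query string is reused unchanged for every mine; only the base URL differs.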
11. Available tools
• Multi-mine search tool
https://github.com/alexkalderimis/multimine-search-tool
Based on InterMine Lucene-based search index
Allows for interoperation when data models are different
• Integration based on Homologs:
Ontology integration using `dagify`
https://github.com/intermine/dagify
Pathway Integration by way of collating shared pathways
• InterMine Staircase
Powerful client-side interface enabling data analysis
workflows and cross-mine integration via web services
http://staircase.herokuapp.com
20. Available Reference Mines
• ThaleMine: https://github.com/Arabidopsis-Information-Portal/intermine/
Integrates a variety of genomic datasets pertaining to Arabidopsis thaliana Col-0
Leverages both data warehousing and federation methods
Represents a wide variety of data: genes, proteins, function, expression, co-expression, interactions, pathways, homologs, alleles, polymorphism, stocks, germplasm, phenotypes
• MedicMine: https://github.com/jcvi-plant-genomics/intermine/
Warehouse for Medicago truncatula A17 genomic data
Houses a variety of data: genes, proteins, function, expression
• PhytoMine: https://github.com/JoeCarlson/intermine/
Warehouse for 47 different angiosperm genomes
Developed on a Chado-to-InterMine migration path
Houses a variety of data: genes, proteins, expression, homologs, protein families, variation
• FlyMine: https://github.com/intermine/intermine/
21. Recommendations and Challenges
• Recommendations:
Develop core plant InterMine model
Follow InterMine guidelines
Learn from prior initiatives - InterMOD
• Challenges
Users/developers are used to current way of
doing things
Time taken to adapt to common data model
and/or software stack
Difficult to arrive at consensus with diverse group
22. Acknowledgments
• InterMine Team
Gos Micklem
Julie Sullivan
Alex Kalderimis
Richard Smith
Sergio Contrino
Josh Heimbach
et al.
• Araport Team
Chris Town
Jason Miller
Matt Vaughn
Maria Kim
Svetlana Karamycheva
Erik Ferlanti
Chia-Yi Cheng
Benjamin Rosen
Irina Belyaeva
Editor's Notes
bio: code to deal with biological data, including data sources
flymine: config used to create FlyMine
testmodel: non-biological test data model used for testing core InterMine
imbuild: ant-based build system, do not edit anything
intermine: the core (generic) InterMine code to work with any data model
ObjectStore: custom Java object/relational mapping system, optimized for read-only database performance
Query optimizer: pre-computed tables joining connected data from different tables, improves PostgreSQL performance
Summary of web services available through InterMine.