Towards a (united) federation of Bioinformatics resources
1. TOWARDS A (UNITED) FEDERATION OF BIOINFORMATICS RESOURCES
Matthew Vaughn @mattdotvaughn
Director, Life Sciences Computing, TACC | Co-PI CyVerse, Araport, Jetstream
1/14/2017 1
Interoperability and Federation Across Bioinformatic Platforms and Resources
Jan 14, 2017
3. WHY FEDERATE?
There’s always some existing or emergent
• Data Set
• Database
• Visualization Technology
• Software Algorithm or Library
• Physical Capacity or Capability
• Source of funding and support
not in scope for you to directly provide or avail yourself of
Federated infrastructures are TEAM-BUILT
Increase the resiliency of your informatics ecosystem
Leverage all the other brains who have different views of your problem
4. WHY DON’T WE FEDERATE BY DEFAULT?
Federation requires three things:
• Components conforming to “standardized” schemas and protocols for interaction and usage
• Stably operated frameworks to handle the yeoman’s work of integrating components
7. WHY DON’T WE FEDERATE BY DEFAULT?
Hey, wait... I said there were three things we needed for federation:
• Planning & Specific Effort
8. WHY DON’T WE FEDERATE BY DEFAULT?
Lab-Born Software
• Immediately responsive
• Limited R&D
• Resources on hand
• Sustainability? What’s that?
Centrally-Planned Software
• Mindfully built
• Better chance for R&D
• Dedicated resources
• Sustainability? What’s that?
Some of the most interesting work is done at the edges of our infrastructure. When those projects adopt federated access patterns post hoc, they take on substantial technical debt.
10. HOW CAN WE MAKE FEDERATION EASIER?
Deeply understand the capabilities of existing integration platforms
• Avoid Not-Invented-Here by adopting the 80% rule
• Contribute enhancements, either via active feedback or by coding them
• Build on our platforms and make sure they get credit for their role
Identify and adopt existing standards. Contribute where they fall short of our needs
• OpenAPI for web service definitions
• ISA Framework for experimental metadata
• OAuth2 for authorization
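As a rough illustration of how two of these standards fit together, here is a minimal sketch in Python. It builds an OpenAPI 3.0 description for one web service endpoint and declares that the endpoint is protected by OAuth2. Everything specific in it is an assumption made for illustration: the `/genes/{gene_id}` path, the `genes:read` scope, the `auth.example.org` token URL, and both function names are hypothetical and do not come from CyVerse, Araport, or any other platform named in this talk.

```python
import json


def build_openapi_description(title, version):
    """Return a minimal OpenAPI 3.0 document as a plain dict.

    The path, scope, and token URL are hypothetical placeholders.
    """
    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": version},
        "paths": {
            "/genes/{gene_id}": {
                "get": {
                    "summary": "Fetch one gene record",
                    "parameters": [
                        {
                            "name": "gene_id",
                            "in": "path",
                            "required": True,
                            "schema": {"type": "string"},
                        }
                    ],
                    # This operation requires the OAuth2 scheme declared below.
                    "security": [{"oauth2": ["genes:read"]}],
                    "responses": {"200": {"description": "A gene record"}},
                }
            }
        },
        "components": {
            "securitySchemes": {
                "oauth2": {
                    "type": "oauth2",
                    "flows": {
                        "clientCredentials": {
                            "tokenUrl": "https://auth.example.org/token",
                            "scopes": {"genes:read": "Read gene records"},
                        }
                    },
                }
            }
        },
    }


def bearer_header(access_token):
    """Build the HTTP header a client sends once it holds an OAuth2 token."""
    return {"Authorization": f"Bearer {access_token}"}


if __name__ == "__main__":
    spec = build_openapi_description("Example gene service", "0.1.0")
    print(json.dumps(spec, indent=2))
```

The point of the sketch is that a machine-readable description like this lets other platforms discover and call a service without reading its source code, which is what makes it a candidate component for federation.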
Provide tooling and documentation for users with diverse technical backgrounds
• GUI, Forms, Web Services. But also language libraries and SDKs.
• Make sure we understand the motivations and constraints of those users
• Write cookbooks, not just shopping lists
12. MAKING FEDERATION WORK REQUIRES THAT WE INCREASE EVERYONE’S PRODUCTIVITY
@mattdotvaughn www.slideshare.net/mattdotvaughn vaughn@tacc.utexas.edu
Me. Background in molecular genetics and physiology before moving into bioinformatics and infrastructure.
Talk about the holistic approach taken by the CyVerse project over the last few years and how I think it’s been transformational.
Walmart is an easy target, but think about other monoliths.
We also want to make our ecosystem RESILIENT and USE OUR DIVERSITY
This is beautiful. One of the great things about scientists is that we build our own tools with the materials we have at hand.
Let’s talk about Hipmunk. It’s actually a good analog for Bioinformatics portals.
Hipmunk’s value prop is predictive analytics to optimize customer purchasing decisions for travel. It takes a small slice, which is valuable to offerors because it helps them match unsold inventory with flexible, price-conscious customers.
To accomplish this, Hipmunk has to CAPTURE and PRESENT diverse data:
Current and real-time pricing data from multiple lodging aggregators
Maps and transit
Review system(s)
Identity and access
Its own, proprietary data stream
It could not exist on its own if these resources were not available because its costs would exceed the value of its improved efficiency.
It’s a PLATFORM. So are some (but not all) of its data sources.
I don’t want to steal Chris’ thunder so I won’t go into GREAT detail here
To accomplish this browser view:
Araport InterMine, JBrowse, and Adama services
CyVerse Auth & Data Store
GitHub & PyPI
TACC’s Agave API
JGI InterMine Phytozome
TACC OpenStack Cloud + Amazon Web Services
New services have arisen under the Araport model with very little NEW code or resource allocation.
Research requirements and build federation into the design
Dedicate effort to it, even if it’s cheaper in the short term to NOT FEDERATE
Built in immediate response to research needs
Limited or no research
Technology and resources on hand
Programming Language/Framework
Developer skill and commitment
Perspective
Possibly no sustainability or maintenance plan
Built mindfully, usually to fulfill a funded research mandate
Design and implementation research
During proposal
In early stage of development
May be able to dedicate resources, adopt new tech, acquire broader perspective
Possibly still no sustainability or maintenance plan ;-)
So, we need to make it EASIER to start, easier to comply, easier to maintain federated resources
I want to stop and point out a specific example:
It is possible to extend CyVerse at ANY level of the infrastructure.
People can build against CyVerse by DEFAULT now and it’s a net positive.
This design pattern is now a standard referenced by non-BIO programs
Jetstream, NSF’s new cloud system, builds on CyVerse Atmo. It went to production 30 days after receipt of hardware.
DesignSafe.CI leveraged its own copy of the CyVerse API to onboard a 25,000-person user community in 6 months
Standards are examples only
DE and Atmo and Web Apps / Web Service Catalog
INCREASE EVERYONE’S PRODUCTIVITY
How do these offerings meet those previous criteria?
MAKING FEDERATION WORK REQUIRES THAT WE INCREASE EVERYONE’S PRODUCTIVITY