@yannick__ http://wurmlab.github.io
Opportunities to overcome human & computational challenges in biology.
2015-03-25-CW15
This changes everything.
454
Illumina
SOLiD...
Any lab can sequence anything!
Genomics is hard.
• Biology/life is complex
• Field is young.
• Biologists lack computational training.
• Generally, analysis tools suck.
    • badly written
    • badly tested
    • hard to install
    • output quality… often questionable.
• Understanding/visualizing/massaging data is hard.
• Datasets continue to grow!
Four tools that suck less (hopefully).
1. SequenceServer
“Can you BLAST this for me?”
BLAST is the most commonly used tool: >100,000 citations
But:
• convoluted interface
• challenging on custom data
http://www.sequenceserver.com/
1. Installing
gem install sequenceserver
2. Launch
sequenceserver
If no config file: asks interactive setup questions.
If needed: downloads BLAST binaries.
If needed: formats FASTA into BLAST database.
### Launched SequenceServer at: http://0.0.0.0:4567
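For readers curious about the "formats FASTA into BLAST database" step, here is a minimal sketch of the equivalent manual step with BLAST+'s `makeblastdb`. It only builds the argument list and does not run anything. Whether SequenceServer passes exactly these options is an assumption; the function name `makeblastdb_command` and the file path are hypothetical.

```python
from pathlib import Path


def makeblastdb_command(fasta, dbtype="nucl", title=None):
    """Build the argument list for BLAST+'s makeblastdb (nothing is executed)."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    fasta = Path(fasta)
    return [
        "makeblastdb",
        "-in", str(fasta),               # input FASTA file
        "-dbtype", dbtype,               # nucleotide or protein database
        "-title", title or fasta.stem,   # human-readable database name
        "-parse_seqids",                 # keep sequence ids retrievable
    ]


# Hypothetical input file, for illustration only.
cmd = makeblastdb_command("genomes/my_species.fasta")
print(" ".join(cmd))
```

The list could be handed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.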
Demo
Anurag Priyam - @yeban
2. oswitch
Bioinformatics software challenges
• Software hard to install 😖
• Central versions on cluster
• “I need it now” - slow system administrators? → delays
• Impossible to install?
• Amazon virtual machine: copy there, run analysis, copy back → 😕
• Consistent version for specific project?
One project dozens of tools?
Miracle solution?
Too complicated & too much copying back & forth
mymacbook:~/2015-02-01-myproject> abyss-pe k=25 reads.fastq.gz
    zsh: command not found: abyss-pe
mymacbook:~/2015-02-01-myproject> oswitch -l
    yeban/biolinux:8
    ubuntu:14.04
    ontouchstart/texlive-full
    ipython/ipython
mymacbook:~/2015-02-01-myproject> oswitch yeban/biolinux
    ###### You are now running: biolinux in container biolinux-7187. ######
biolinux-7187:~/2015-02-01-myproject> abyss-pe k=25 reads.fastq.gz
    [... just works on your files where they are…]
biolinux-7187:~/2015-02-01-myproject> exit
mymacbook:~/2015-02-01-myproject>
    [... output is where you expect it to be ...]
https://github.com/yeban/oswitch
oSwitch
One-line access to other operating systems.
Things feel (largely) unchanged:
• Current working directory is maintained.
• User name, uid and gid are maintained.
• Login shell (bash/zsh/fish) is maintained.
• Home directory is maintained (thus all .dotfiles and config files are maintained).
• read/write permissions are maintained.
• Paths are maintained whenever possible. Thus volumes (external drives, NAS, USB) mounted on the host are available in the container at the same path.
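To make the list above concrete, here is a rough sketch of how a container could be launched so that the working directory, home directory, uid/gid and mounted paths carry over. It assumes a Docker-style runtime and only builds a `docker run` argument list. This is not oswitch's actual implementation: the helper name `container_command` and the example paths are hypothetical, and preserving the user name and login shell needs additional setup inside the image that is not covered here.

```python
def container_command(image, cwd, home, uid, gid,
                      shell="/bin/bash", extra_volumes=()):
    """Build a `docker run` argument list that mirrors the host environment."""
    args = ["docker", "run", "--rm", "-it"]

    # Mount home, the working directory and any extra volumes at the
    # same path inside the container, skipping repeated paths.
    mounts = []
    for path in (home, cwd, *extra_volumes):
        if path not in mounts:
            mounts.append(path)
    for path in mounts:
        args += ["-v", f"{path}:{path}"]

    args += [
        "-w", cwd,               # keep the current working directory
        "-u", f"{uid}:{gid}",    # keep numeric user and group ids
        "-e", f"HOME={home}",    # keep the home directory (dotfiles, config)
        image,
        shell,
    ]
    return args


# Hypothetical values, for illustration only.
cmd = container_command(
    "yeban/biolinux",
    cwd="/home/me/2015-02-01-myproject",
    home="/home/me",
    uid=1000,
    gid=1000,
)
print(" ".join(cmd))
```

Mounting each path at the same location on both sides is what lets output files appear where they are expected on the host.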
Working with Gene predictions
Gene prediction
Dozens of software algorithms: dozens of predictions
20% failure rate:
•missing pieces
•extra pieces
•incorrect merging
•incorrect splitting
Visual inspection... and
manual fixing required.
1 gene = 5 minutes to 3 days
Yandell & Ence 2013 NRG
[Figure: DNA sequence with evidence and consensus tracks]
3. GeneValidator
Monica Dragan
Ismail Moghul
https://github.com/monicadragan/GeneValidator
http://genevalidator.sbcs.qmul.ac.uk
GeneValidator
Run on:
★whole geneset: identify most problematic predictions
★alternative models for a gene (choose best)
★individual genes (while manually curating)
4. Afra: Crowdsourcing gene model curation
Gene prediction
Dozens of software algorithms: dozens of predictions
20% failure rate:
•missing pieces
•extra pieces
•incorrect merging
•incorrect splitting
Visual inspection... and
manual fixing required.
1 gene = 20 minutes to 3 days
15,000 genes * 20 species =
impossible.
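A back-of-the-envelope check of why this is called impossible, using the figures on the slide and the most optimistic 20 minutes per gene. The working-hours figure (40 hours a week, 48 weeks a year) is an assumption added here for illustration.

```python
GENES_PER_SPECIES = 15_000
SPECIES = 20
MINUTES_PER_GENE = 20          # optimistic end of "20 minutes to 3 days"
WORK_HOURS_PER_YEAR = 40 * 48  # assumption: 40 h/week, 48 weeks/year

total_genes = GENES_PER_SPECIES * SPECIES
total_hours = total_genes * MINUTES_PER_GENE / 60
person_years = total_hours / WORK_HOURS_PER_YEAR

print(f"{total_genes:,} genes, {total_hours:,.0f} hours, "
      f"about {person_years:.0f} person-years")
# prints: 300,000 genes, 100,000 hours, about 52 person-years
```

Even at the fastest rate, a single curator would need decades, which is the motivation for spreading the work across many contributors.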
Yandell & Ence 2013 NRG
[Figure: DNA sequence with evidence and consensus tracks]
Algorithm discovery by protein folding game players
Firas Khatib (a), Seth Cooper (b), Michael D. Tyka (a), Kefan Xu (b), Ilya Makedon (b), Zoran Popović (b), David Baker (a,c,1), and Foldit Players
(a) Department of Biochemistry; (b) Department of Computer Science and Engineering; and (c) Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195
Contributed by David Baker, October 5, 2011 (sent for review June 29, 2011)
Foldit is a multiplayer online game in which players collaborate and compete to create accurate protein structure models. For specific hard problems, Foldit player solutions can in some cases outperform state-of-the-art computational methods. However, very little is known about how collaborative gameplay produces these results and whether Foldit player strategies can be formalized and structured so that they can be used by computers. To determine whether high performing player strategies could be collectively codified, we augmented the Foldit gameplay mechanics with tools for players to encode their folding strategies as “recipes” and to share their recipes with other players, who are able to further modify and redistribute them. Here we describe the rapid social evolution of player-developed folding algorithms that took place in the year following the introduction of these tools. Players developed over 5,400 different recipes, both by creating new algorithms and by modifying and recombining successful recipes developed by other players. The most successful recipes rapidly spread through the Foldit player population, and two of the recipes became particularly dominant. Examination of the algorithms encoded in these two recipes revealed a striking similarity to an unpublished algorithm developed by scientists over the same period. Benchmark calculations show that the new algorithm independently discovered by scientists and by Foldit players outperforms previously published methods. Thus, online scientific game frameworks have the potential not only to solve hard scientific problems, but also to discover and formalize effective new strategies and algorithms.
citizen science ∣ crowd-sourcing ∣ optimization ∣ structure prediction ∣ strategy
Citizen science is an approach to leveraging natural human abilities for scientific purposes. Most such efforts involve visual tasks such as tagging images or locating image features (1–3). In contrast, Foldit is a multiplayer online scientific discovery game, in which players become highly skilled at creating accurate protein structure models through extended game play (4, 5). Foldit recruits online gamers to optimize the computed Rosetta energy using human spatial problem-solving skills. Players manipulate protein structures with a palette of interactive tools and manipulations. Through their interactive exploration Foldit players also utilize user-friendly versions of algorithms from the Rosetta structure prediction methodology (6) such as wiggle (gradient-based energy minimization) and shake (combinatorial side chain rotamer packing). The potential of gamers to solve more complex scientific problems was recently highlighted by the solution of a long-standing protein structure determination problem by Foldit players (7).
One of the key strengths of game-based human problem exploration is the human ability to search over the space of possible strategies and adapt those strategies to the type of problem and stage of problem solving (5). The variability of tactics and strategies stems from the individuality of each player as well as multiple methods of sharing and evolution within the game (group play, game chat), and outside of the game [wiki pages (8)].
One way to arrive at algorithmic methods underlying successful human Foldit play would be to apply machine learning techniques to the detailed logs of expert Foldit players (9). We chose instead to rely on a superior learning machine: Foldit players themselves. As the players themselves understand their strategies better than anyone, we decided to allow them to codify their algorithms directly, rather than attempting to automatically learn approximations. We augmented standard Foldit play with the ability to create, edit, share, and rate gameplay macros, referred to as “recipes” within the Foldit game (10). In the game each player has their own “cookbook” of such recipes, from which they can invoke a variety of interactive automated strategies. Players can share recipes they write with the rest of the Foldit community or they can choose to keep their creations to themselves.
In this paper we describe the quite unexpected evolution of recipes in the year after they were released, and the striking convergence of this very short evolution on an algorithm very similar to an unpublished algorithm recently developed independently by scientific experts that improves over previous methods.
Results
In the social development environment provided by Foldit, players evolved a wide variety of recipes to codify their diverse strategies to problem solving. During the three and a half month study period (see Materials and Methods), 721 Foldit players ran 5,488 unique recipes 158,682 times and 568 players wrote 5,202 recipes. We studied these algorithms and found that they fell into four main categories: (i) perturb and minimize, (ii) aggressive rebuilding, (iii) local optimize, and (iv) set constraints. The first category goes beyond the deterministic minimize function provided to Foldit players, which has the disadvantage of readily being trapped in local minima, by adding in perturbations to lead the minimizer in different directions (11). The second category uses the rebuild tool, which performs fragment insertion with loop closure, to search different areas of conformation space; these recipes are often run for long periods of time as they are designed to rebuild entire regions of a protein rather than just refining them (Fig. S1). The third category of recipes performs local minimizations along the protein backbone in order to improve the Rosetta energy for every segment of a protein. The final category of recipes assigns constraints between beta strands or pairs of residues (rubber bands), or changes the secondary structure assignment to guide subsequent optimization.
Different algorithms were used with very different frequencies during the experiment. Some are designated by the authors as public and are available for use by all Foldit players, whereas others are private and available only to their creator or their Foldit team. The distribution of recipe usage among different players is shown in Fig. 1 for the 26 recipes that were run over 1,000 times. Some recipes, such as the one represented by the leftmost bar, were used many times by many different players, while others, such as the one represented by the pink bar in the
Author contributions: F.K., S.C., Z.P., and D.B. designed research; F.K., S.C., M.D.T., and F.P. performed research; F.K., S.C., M.D.T., K.X., and I.M. analyzed data; and F.K., S.C., Z.P., and D.B. wrote the paper.
The authors declare no conflict of interest.
Freely available online through the PNAS open access option.
(1) To whom correspondence should be addressed. E-mail: dabaker@u.washington.edu.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1115898108/-/DCSupplemental.
http://Fold.it
Crowd-sourcing the visual inspection + correction of gene models.
Challenges
• Recruiting & retaining contributors
Challenges
Recruiting & retaining contributors
Plan A: get students.
• Increase accessibility:
    • Make tasks small & simple
    • Need excellent tutorials & training
    • Need an intelligent “mothering” user interface.
• Provide rewards:
    • Better grades
    • Learning experience
    • Good karma (helping science)
    • Prestige & pride (on Facebook; points & badges “leaderboard”, with certificates, in publications)
    • Opportunities to develop expertise & responsibilities
Crowd-sourcing the visual inspection + correction
of gene models.
Challenges
• Recruiting & retaining contributors
• Ensuring quality
Ensuring quality
• Excellent tutorials/training
• Make tasks small & simple
• Redundancy
• Review of conflicts by senior users.
[Flowchart: curation workflow]
Begin → Create initial tasks → Needs curation
Needs curation → Curate (three in parallel, each task “Being curated”) → Submit
Submit → Auto-check
Auto-check, consistent: create next required task
Auto-check, inconsistent: create “review” task
End state: Done
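The flow above can be sketched as a small state machine: each gene model is curated redundantly, and an automatic check either creates the next task or escalates to review. This is an illustrative sketch, not Afra's code. The class name `CurationTask`, the redundancy of three, the intermediate "submitted" state and the string encoding of gene models are all assumptions.

```python
from dataclasses import dataclass, field

REDUNDANCY = 3  # assumption: three independent curations per gene model


def auto_check(submissions):
    """Return the follow-up action for a complete set of submissions."""
    if len(set(submissions)) == 1:
        return "create next required task"   # all curators agree
    return "create review task"              # conflict: senior user reviews


@dataclass
class CurationTask:
    gene_id: str
    submissions: list = field(default_factory=list)
    state: str = "needs curation"

    def submit(self, curated_model):
        """Record one curated model; run the auto-check once all are in."""
        self.submissions.append(curated_model)
        if len(self.submissions) < REDUNDANCY:
            self.state = "being curated"
            return None
        self.state = "submitted"
        return auto_check(self.submissions)


# Hypothetical gene models encoded as exon-coordinate strings.
task = CurationTask("gene_0001")
for model in ["exons:1-90,150-300", "exons:1-90,150-300", "exons:1-90,160-300"]:
    action = task.submit(model)
print(action)  # prints: create review task
```

Comparing submissions for exact equality is the simplest possible consistency rule; a real system would likely tolerate small differences.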
Crowd-sourcing the visual inspection + correction.
Challenges
• Recruiting & retaining contributors
• Ensuring quality
http://afra.sbcs.qmul.ac.uk
Anurag Priyam
http://github.com/yeban/afra
Warning: Work in progress
Timelines
• Rolled out to:
• 8 MSc students
• 20 3rd year students
• Need to improve tutorials/guidance/documentation/outputs
• Roll out to 200 first years (autumn)
• Expand
5. Bionode… stay tuned.
Thanks!
y.wurm@qmul.ac.uk
@yannick__
http://wurmlab.github.io
Colleagues & Collaborators
@ QMUL & UNIL
Anurag Priyam @yeban
Monica Dragan
Ismail Moghul
Vivek Rai
Bruno Vieira @bmpvieira
Minimally guided demo :)
1. Set up a custom BLAST server (Sequenceserver)
2. Set up oswitch to rapidly switch to yeban/biolinux

Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 

Software Sustainability Institute Collaborations Workshop

  • 1. @yannick__ http://wurmlab.github.io Opportunities to overcome human & computational challenges in biology. 2015-03-25-CW15
  • 3. • Biology/life is complex • Field is young. • Biologists lack computational training. • Generally, analysis tools suck. • badly written • badly tested • hard to install • output quality… often questionable. • Understanding/visualizing/massaging data is hard. • Datasets continue to grow! Genomics is hard.
  • 8. “Can you BLAST this for me?” BLAST is the most commonly used tool: >100,000 citations. But: • convoluted interface • challenging on custom data
  • 9. http://www.sequenceserver.com/ 1. Installing: gem install sequenceserver 2. Launch: sequenceserver ### Launched SequenceServer at: http://0.0.0.0:4567 If no config file: asks interactive setup questions. If needed: downloads BLAST binaries. If needed: formats FASTA into BLAST database. Demo. Anurag Priyam - @yeban
  • 10.
  • 11.
  • 13. Bioinformatics software challenges • Software hard to install 😖 • Central versions on cluster • “I need it now” - slow system administrators? → delays • Impossible to install? • Amazon Virtual Machine - copy there - run analysis - copy back → 😕 • Consistent version for specific project? One project, dozens of tools?
  • 14. Miracle solution? Too complicated & too much copying back & forth
  • 15. oSwitch: One-line access to other operating systems. https://github.com/yeban/oswitch
      mymacbook:~/2015-02-01-myproject> abyss-pe k=25 reads.fastq.gz
          zsh: command not found: abyss-pe
      mymacbook:~/2015-02-01-myproject> oswitch -l
          yeban/biolinux:8
          ubuntu:14.04
          ontouchstart/texlive-full
          ipython/ipython
      mymacbook:~/2015-02-01-myproject> oswitch yeban/biolinux
          ###### You are now running: biolinux in container biolinux-7187. ######
      biolinux-7187:~/2015-02-01-myproject> abyss-pe k=25 reads.fastq.gz
          [... just works on your files where they are…]
      biolinux-7187:~/2015-02-01-myproject> exit
      mymacbook:~/2015-02-01-myproject>
          [... output is where you expect it to be ...]
  • 16. oSwitch: One-line access to other operating systems. Things feel (largely) unchanged: • Current working directory is maintained. • User name, uid and gid are maintained. • Login shell (bash/zsh/fish) is maintained. • Home directory is maintained (thus all .dotfiles and config files are maintained). • Read/write permissions are maintained. • Paths are maintained whenever possible. Thus volumes (external drives, NAS, USB) mounted on the host are available in the container at the same path. https://github.com/yeban/oswitch
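The "things feel unchanged" list above can be illustrated with a minimal Python sketch that assembles a plain `docker run` command line preserving the working directory, uid/gid and home directory. This is not oswitch's actual implementation; the function name `docker_run_args` and its parameters are hypothetical, and only standard `docker run` flags (`--rm`, `-it`, `-u`, `-v`, `-w`, `-e`) are used. A real tool has to do more, for example making the user name and login shell known inside the container.

```python
import posixpath


def docker_run_args(image, cwd, home, uid, gid, shell="/bin/bash"):
    """Build a `docker run` argument list that keeps the host's
    working directory, uid/gid and home directory inside the container.

    Hypothetical helper for illustration only; not part of oswitch.
    """
    args = ["docker", "run", "--rm", "-it"]
    # Run as the same numeric user and group as on the host,
    # so files written in mounted directories keep their ownership.
    args += ["-u", f"{uid}:{gid}"]
    # Mount the home directory at the same path (dotfiles, configs).
    args += ["-v", f"{home}:{home}"]
    # Mount the working directory too if it lives outside home
    # (e.g. an external drive), again at the same path.
    if posixpath.commonpath([home, cwd]) != home:
        args += ["-v", f"{cwd}:{cwd}"]
    # Start in the same directory the user was in on the host.
    args += ["-w", cwd, "-e", f"HOME={home}"]
    args += [image, shell]
    return args


if __name__ == "__main__":
    import shlex

    cmd = docker_run_args(
        "yeban/biolinux", "/home/me/2015-02-01-myproject", "/home/me", 1000, 1000
    )
    print(shlex.join(cmd))
```

Mounting each host directory at the identical path inside the container is what lets a command typed in the container refer to the same file names as on the host.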
  • 17. Working with Gene predictions
  • 18. Gene prediction. Dozens of software algorithms: dozens of predictions. 20% failure rate: • missing pieces • extra pieces • incorrect merging • incorrect splitting. Visual inspection... and manual fixing required. 1 gene = 5 minutes to 3 days. (Yandell & Ence 2013, NRG) [Figure: DNA sequence with “Evidence” and “Consensus” tracks]
  • 19.
  • 23. GeneValidator. Run on: ★ whole geneset: identify most problematic predictions ★ alternative models for a gene (choose best) ★ individual genes (while manually curating)
  • 25. Gene prediction. Dozens of software algorithms: dozens of predictions. 20% failure rate: • missing pieces • extra pieces • incorrect merging • incorrect splitting. Visual inspection... and manual fixing required. 1 gene = 20 minutes to 3 days. 15,000 genes * 20 species = impossible. (Yandell & Ence 2013, NRG) [Figure: DNA sequence with “Evidence” and “Consensus” tracks]
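To make the "15,000 genes * 20 species = impossible" estimate concrete, here is a small Python sketch of the arithmetic using the slide's best case of 20 minutes per gene. The figure of 1,840 working hours per person-year is an assumption added for illustration and does not come from the talk.

```python
GENES_PER_SPECIES = 15_000      # from the slide
SPECIES = 20                    # from the slide
BEST_CASE_MINUTES = 20          # from the slide: "20 minutes to 3 days"
WORK_HOURS_PER_YEAR = 1_840     # assumption: 8 h/day * 230 working days


def total_genes(genes_per_species, species):
    """Number of gene models that would need visual inspection."""
    return genes_per_species * species


def curation_hours(genes, minutes_per_gene):
    """Total curation time in hours at a fixed time per gene."""
    return genes * minutes_per_gene / 60


def person_years(hours, hours_per_year=WORK_HOURS_PER_YEAR):
    """Convert hours of work into full-time person-years."""
    return hours / hours_per_year


if __name__ == "__main__":
    genes = total_genes(GENES_PER_SPECIES, SPECIES)
    hours = curation_hours(genes, BEST_CASE_MINUTES)
    print(f"{genes} genes, {hours:.0f} hours, "
          f"{person_years(hours):.1f} person-years (best case)")
```

Even if every gene took only the best-case 20 minutes, this comes to 300,000 genes and 100,000 hours of inspection, which is the motivation for crowd-sourcing on the following slides.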
  • 26.
  • 27. Algorithm discovery by protein folding game players. Firas Khatib^a, Seth Cooper^b, Michael D. Tyka^a, Kefan Xu^b, Ilya Makedon^b, Zoran Popović^b, David Baker^a,c,1, and Foldit Players. ^a Department of Biochemistry; ^b Department of Computer Science and Engineering; and ^c Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195. Contributed by David Baker, October 5, 2011 (sent for review June 29, 2011). Foldit is a multiplayer online game in which players collaborate and compete to create accurate protein structure models. For specific hard problems, Foldit player solutions can in some cases outperform state-of-the-art computational methods. However, very little is known about how collaborative gameplay produces these results and whether Foldit player strategies can be formalized and structured so that they can be used by computers. To determine whether high performing player strategies could be collectively codified, we augmented the Foldit gameplay mechanics with tools for players to encode their folding strategies as “recipes” and to share their recipes with other players, who are able to further modify and redistribute them. Here we describe the rapid social evolution of player-developed folding algorithms that took place in the year following the introduction of these tools. Players developed over 5,400 different recipes, both by creating new algorithms and by modifying and recombining successful recipes developed by other players. The most successful recipes rapidly spread through the Foldit player population, and two of the recipes became particularly dominant. Examination of the algorithms encoded in these two recipes revealed a striking similarity to an unpublished algorithm developed by scientists over the same period. Benchmark calculations show that the new algorithm independently discovered by scientists and by Foldit players outperforms previously published methods.
Thus, online scientific game frameworks have the potential not only to solve hard scientific problems, but also to discover and formalize effective new strategies and algorithms. Keywords: citizen science | crowd-sourcing | optimization | structure prediction | strategy. Citizen science is an approach to leveraging natural human abilities for scientific purposes. Most such efforts involve visual tasks such as tagging images or locating image features (1–3). In contrast, Foldit is a multiplayer online scientific discovery game, in which players become highly skilled at creating accurate protein structure models through extended game play (4, 5). Foldit recruits online gamers to optimize the computed Rosetta energy using human spatial problem-solving skills. Players manipulate protein structures with a palette of interactive tools and manipulations. Through their interactive exploration Foldit players also utilize user-friendly versions of algorithms from the Rosetta structure prediction methodology (6) such as wiggle (gradient-based energy minimization) and shake (combinatorial side chain rotamer packing). The potential of gamers to solve more complex scientific problems was recently highlighted by the solution of a long-standing protein structure determination problem by Foldit players (7). One of the key strengths of game-based human problem exploration is the human ability to search over the space of possible strategies and adapt those strategies to the type of problem and stage of problem solving (5). The variability of tactics and strategies stems from the individuality of each player as well as multiple methods of sharing and evolution within the game (group play, game chat), and outside of the game [wiki pages (8)]. One way to arrive at algorithmic methods underlying successful human Foldit play would be to apply machine learning techniques to the detailed logs of expert Foldit players (9).
We chose instead to rely on a superior learning machine: Foldit players themselves. As the players themselves understand their strategies better than anyone, we decided to allow them to codify their algorithms directly, rather than attempting to automatically learn approximations. We augmented standard Foldit play with the ability to create, edit, share, and rate gameplay macros, referred to as “recipes” within the Foldit game (10). In the game each player has their own “cookbook” of such recipes, from which they can invoke a variety of interactive automated strategies. Players can share recipes they write with the rest of the Foldit community or they can choose to keep their creations to themselves. In this paper we describe the quite unexpected evolution of recipes in the year after they were released, and the striking convergence of this very short evolution on an algorithm very similar to an unpublished algorithm recently developed independently by scientific experts that improves over previous methods. Results: In the social development environment provided by Foldit, players evolved a wide variety of recipes to codify their diverse strategies to problem solving. During the three and a half month study period (see Materials and Methods), 721 Foldit players ran 5,488 unique recipes 158,682 times and 568 players wrote 5,202 recipes. We studied these algorithms and found that they fell into four main categories: (i) perturb and minimize, (ii) aggressive rebuilding, (iii) local optimize, and (iv) set constraints. The first category goes beyond the deterministic minimize function provided to Foldit players, which has the disadvantage of readily being trapped in local minima, by adding in perturbations to lead the minimizer in different directions (11).
The second category uses the rebuild tool, which performs fragment insertion with loop closure, to search different areas of conformation space; these recipes are often run for long periods of time as they are designed to rebuild entire regions of a protein rather than just refining them (Fig. S1). The third category of recipes performs local minimizations along the protein backbone in order to improve the Rosetta energy for every segment of a protein. The final category of recipes assigns constraints between beta strands or pairs of residues (rubber bands), or changes the secondary structure assignment to guide subsequent optimization. Different algorithms were used with very different frequencies during the experiment. Some are designated by the authors as public and are available for use by all Foldit players, whereas others are private and available only to their creator or their Foldit team. The distribution of recipe usage among different players is shown in Fig. 1 for the 26 recipes that were run over 1,000 times. Some recipes, such as the one represented by the leftmost bar, were used many times by many different players, while others, such as the one represented by the pink bar in the
Author contributions: F.K., S.C., Z.P., and D.B. designed research; F.K., S.C., M.D.T., and F.P. performed research; F.K., S.C., M.D.T., K.X., and I.M. analyzed data; and F.K., S.C., Z.P., and D.B. wrote the paper. The authors declare no conflict of interest. Freely available online through the PNAS open access option. 1 To whom correspondence should be addressed. E-mail: dabaker@u.washington.edu. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1115898108/-/DCSupplemental. Section: BIOPHYSICS AND COMPUTATIONAL BIOLOGY; PSYCHOLOGICAL AND COGNITIVE SCIENCES. http://Fold.it
  • 28. Crowd-sourcing the visual inspection + correction of gene models. Challenges • Recruiting & retaining contributors
  • 29. Recruiting & retaining contributors. Plan A: get students. • Increase accessibility: • Make tasks small & simple • Need excellent tutorials & training • Need an intelligent “mothering” user interface. • Provide rewards: • Better grades • Learning experience • Good karma (helping science) • Prestige & pride (on Facebook; points & badges “leaderboard”, with certificates, in publications) • Opportunities to develop expertise & responsibilities
  • 30. Crowd-sourcing the visual inspection + correction of gene models. Challenges • Recruiting & retaining contributors • Ensuring quality
  • 31. Ensuring quality • Excellent tutorials/training • Make tasks small & simple • Redundancy • Review of conflicts by senior users. [Workflow diagram: Begin → Create initial tasks → Needs curation → Curate (three redundant “Being curated” copies) → Submit → Auto-check → Consistent: create next required task, or Done / Inconsistent: create “review” task]
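The redundancy and auto-check steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the logic of the actual curation platform: the function name `next_task`, the task dictionaries, the redundancy of three, and the rule that "consistent" means all redundant submissions are identical are assumptions. Gene models are represented as plain strings for simplicity, and the "create next required task" branch is collapsed into a single "done" outcome.

```python
def next_task(gene_id, submissions, redundancy=3):
    """Decide what happens to a gene after curators submit their models.

    submissions: list of gene models (here plain strings), one per curator.
    Returns a dict describing the next task for this gene.
    Hypothetical helper for illustration only.
    """
    # Redundancy: keep handing the gene out until enough independent
    # curators have submitted a model.
    if len(submissions) < redundancy:
        return {"gene": gene_id, "type": "curate"}

    distinct = sorted(set(submissions))
    # Auto-check: all redundant submissions agree, so accept the model.
    if len(distinct) == 1:
        return {"gene": gene_id, "type": "done", "model": distinct[0]}

    # Conflict: create a "review" task for a senior user,
    # listing the competing models.
    return {"gene": gene_id, "type": "review", "candidates": distinct}


if __name__ == "__main__":
    print(next_task("gene1", ["exons:10-50,80-120"]))
    print(next_task("gene1", ["exons:10-50,80-120"] * 3))
    print(next_task("gene2", ["exons:10-50", "exons:10-50", "exons:10-60"]))
```

Requiring full agreement is the strictest possible rule; a majority threshold would send fewer genes to review at the cost of accepting some models that one curator disagreed with.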
  • 32. Crowd-sourcing the visual inspection + correction. Challenges • Recruiting & retaining contributors • Ensuring quality. http://afra.sbcs.qmul.ac.uk Anurag Priyam http://github.com/yeban/afra
  • 34.
  • 35.
  • 36.
  • 37. Timelines • Rolled out to: • 8 MSc students • 20 3rd year students • Need to improve tutorials/guidance/documentation/outputs • Roll out to 200 first years (autumn) • Expand
  • 39. Thanks! y.wurm@qmul.ac.uk @yannick__ http://wurmlab.github.io Colleagues & Collaborators @ QMUL & UNIL Anurag Priyam @yeban Monica Dragan Ismail Moghul Vivek Rai Bruno Vieira @bmpvieira
  • 40. Minimally guided demo :) 1. Set up a custom BLAST server (SequenceServer) 2. Set up oswitch to rapidly switch to yeban/biolinux @yannick__