7. First “fully Aussie” bacterial genome
● Leptospira hardjobovis str. L550
● 2 chromosomes
● 4 Mbp
● $1M project
● Sanger sequencing
● Led by Dieter Bulach
8. First Illumina instrument in Australia
● Dept Microbiology, Monash University, 2008
● 36 bp single end reads
● 2 weeks to run
● 2 lanes for 1.6 Mbp genome
11. Bioinformatics software and me
Installed >1000 packages manually
Authored >100 packages into Brew
Written and maintain >10 packages
12. How to get a bioinformatics headache
1. See tweet about new published tool
2. Read abstract - sounds awesome!
3. Fail to find link to source code - eventually Google it
4. Attempt to compile and install it
5. Google for 30 min for fixes
6. Finally get it built
7. Run it on tiny data set
8. Get a vague error
9. Delete and never revisit it again
15. Should I stay for this talk?
YES
It will help you write good tools
YES
It will help you identify bad tools
17. Should you write a new tool?
● NO
○ It already exists
○ You are unable to maintain it
○ You won’t really use it
● YES
○ YOU need the tool
○ YOU will use the tool
○ YOU want others to use the tool
○ Desire to give back to the community
19. Lessons from the Prokka experience
● Nearly all feedback is positive
● People all over the world are grateful
● Warm fuzzy feeling inside
● Increase your public profile
● But maintenance burden and guilt
22. Naming
● Try to be unique
○ Google to check for conflicts
○ Consider how non-native English speakers will pronounce it
○ Be creative!
● Avoid dodgy acronyms
○ Try not to win a JABBA Award
○ “Just Another Bogus Bioinformatics Acronym”
24. First impressions count
● Keep It Simple, Stupid
● First page of documentation
○ What does it do?
○ How do I install it?
○ How do I run it?
● Try to keep documentation in one place
○ Otherwise it becomes inconsistent or gets missed
28. Always have a --help flag
% biotool -h
% biotool --help
Usage: biotool [options] seq.fa
--help Show this help
--version Print version and exit
--top N Keep top N sequences
29. Always have a --version flag
% biotool -v
% biotool -V
% biotool --version
biotool 1.3
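A minimal sketch of how the --help and --version flags from the two slides above might be wired up with Python's argparse. The tool name and version string come from the slides; the helper name build_biotool_parser, the description text and the default of 10 for --top are assumptions made for illustration.

```python
import argparse

VERSION = "1.3"  # version string taken from the slide example


def build_biotool_parser():
    """Build a parser for the hypothetical 'biotool' command."""
    parser = argparse.ArgumentParser(
        prog="biotool",
        description="Keep the top N sequences from a FASTA file.",  # assumed
    )
    # argparse adds -h/--help automatically, so only --version is explicit.
    parser.add_argument(
        "-v", "-V", "--version",
        action="version",
        version=f"%(prog)s {VERSION}",
        help="Print version and exit",
    )
    parser.add_argument(
        "--top", type=int, metavar="N", default=10,  # default is an assumption
        help="Keep top N sequences",
    )
    parser.add_argument("fasta", metavar="seq.fa", help="Input FASTA file")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_biotool_parser().parse_args(["--top", "5", "seq.fa"])
print(args.top, args.fasta)
```

In a real script the parser would be given sys.argv[1:] (the default when parse_args() is called with no arguments), and "biotool --version" would print "biotool 1.3" and exit with status 0.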
30. Always raise an error when things go wrong
% biotool seq.fa
ERROR: cannot open file ‘seq.fa’
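One way the behaviour above could look in Python, as a sketch only: the function name read_fasta_text is hypothetical, and exiting with status 1 is an assumption in line with the "general failure" convention. The message goes to stderr so it does not pollute any output being piped elsewhere.

```python
import sys


def read_fasta_text(path):
    """Return the contents of 'path', or exit with a clear error message."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError as err:
        # Say what failed and why, on stderr, then exit non-zero so that
        # shell scripts and pipelines can detect the failure.
        print(f"ERROR: cannot open file '{path}': {err.strerror}",
              file=sys.stderr)
        sys.exit(1)
```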
31. Check that dependencies are installed
% biotool seq.fa
Checking BLAST... ok
Checking SAMtools... NOT FOUND!
Please install ‘samtools’ and add it to your PATH.
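A dependency check like the one above can be sketched with shutil.which from the Python standard library. The function names are hypothetical, and the executable names a real tool would look for (for example "blastn" and "samtools") are assumptions that depend on what the tool actually calls.

```python
import shutil
import sys


def find_missing_tools(tools):
    """Report each tool on stderr and return those not found on the PATH."""
    missing = []
    for tool in tools:
        found = shutil.which(tool) is not None
        status = "ok" if found else "NOT FOUND!"
        print(f"Checking {tool}... {status}", file=sys.stderr)
        if not found:
            missing.append(tool)
    return missing


def require_tools(tools):
    """Exit with a helpful message if any required tool is missing."""
    missing = find_missing_tools(tools)
    if missing:
        names = ", ".join(f"'{name}'" for name in missing)
        print(f"Please install {names} and add to your PATH.",
              file=sys.stderr)
        sys.exit(1)
```

Calling require_tools(["blastn", "samtools"]) at start-up means the user hears about a missing dependency immediately, rather than halfway through a long run.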
32. Always let users control output filenames
% biotool seq.fa
Processing ‘seq.fa’
Wrote result to ‘filt.seq.fa.out’
# ARGH!
% biotool --out seq.filt.fa
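A sketch of an --out option in Python's argparse, so the user rather than the tool decides where results go. Defaulting to standard output when --out is omitted is an assumption, not something the slide specifies; the function names are hypothetical.

```python
import argparse
import sys


def parse_output_args(argv):
    """Parse a hypothetical biotool command line with an --out option."""
    parser = argparse.ArgumentParser(prog="biotool")
    parser.add_argument(
        "--out", metavar="FILE", default="-",
        help="Write results to FILE (default: standard output)",
    )
    parser.add_argument("fasta", metavar="seq.fa", help="Input FASTA file")
    return parser.parse_args(argv)


def open_output(path):
    """Return a writable handle: stdout for '-', otherwise the named file."""
    return sys.stdout if path == "-" else open(path, "w")


args = parse_output_args(["--out", "seq.filt.fa", "seq.fa"])
print(args.out)
```

Writing to stdout by default also lets the tool be used in a pipeline without creating files the user never asked for.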
33. KISS - run with minimum parameters
% biotool seq.fa
ERROR: missing -x parameter
% biotool -x 3 seq.fa
ERROR: missing -y parameter
% biotool -x 3 -y 7 seq.fa
ERROR: need -n name
# ARGH!
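The frustration above can be avoided by giving every option a sensible default, so that "biotool seq.fa" works on its own. In this sketch the meanings of -x, -y and -n are unknown, so the defaults 3 and 7 are placeholders borrowed from the slide's example command, and deriving the name from the input file is an assumption.

```python
import argparse
import os


def parse_with_defaults(argv):
    """Parse a hypothetical biotool command line where only the input is required."""
    parser = argparse.ArgumentParser(prog="biotool")
    # Placeholder defaults: the slide does not say what -x and -y control.
    parser.add_argument("-x", type=int, default=3, help="(default: 3)")
    parser.add_argument("-y", type=int, default=7, help="(default: 7)")
    parser.add_argument("-n", metavar="NAME", default=None,
                        help="Run name (default: taken from input file name)")
    parser.add_argument("fasta", metavar="seq.fa", help="Input FASTA file")
    args = parser.parse_args(argv)
    if args.n is None:
        # Derive a name from the input, e.g. 'data/seq.fa' gives 'seq'.
        args.n = os.path.splitext(os.path.basename(args.fasta))[0]
    return args


args = parse_with_defaults(["seq.fa"])
print(args.x, args.y, args.n)
```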
36. Use the standard getopt interface
Short options ( -h ) and long options ( --help )
● C #include <getopt.h>
● C++ boost::program_options
● Python import argparse
● Perl use Getopt::Long
● R library(argparse)
Command line interface
37. Unix exit codes
● A small non-negative integer (0 to 255)
● Loose standards
○ 0 = success
○ 1 = general failure
○ 2 = error with command line
○ 3..127 = user defined specific failures
● Result is in the shell $? variable
38. Accessing exit codes in the shell
% ls /tmp/fake
ls: cannot access /tmp/fake
% echo $?
1
% ls /proc/cpuinfo
/proc/cpuinfo
% echo $?
0
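The same exit codes are what a calling program sees. A minimal Python sketch, assuming a hypothetical helper named exit_code_of, that runs a command and returns the value the shell would put in $?; the child processes here are tiny Python one-liners so the example does not depend on any particular tool being installed.

```python
import subprocess
import sys


def exit_code_of(command):
    """Run a command (a list of arguments) and return its exit code."""
    completed = subprocess.run(command)
    return completed.returncode


# A child that succeeds, and one that reports a command line error (2).
ok = exit_code_of([sys.executable, "-c", "import sys; sys.exit(0)"])
bad = exit_code_of([sys.executable, "-c", "import sys; sys.exit(2)"])
print(ok, bad)  # prints: 0 2
```

A tool written in Python sets its own exit code by calling sys.exit(n), which is how pipelines and workflow managers find out whether a step failed.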
43. Keeping your audience
“Each equation in a book
will halve your audience”
“Each difficulty encountered in installation
will halve your number of users”
44. Traditional systems level packaging
● Debian / DEB
apt-get install blast
dpkg -i blast-2.2.5-amd64.deb
● Red Hat / RPM
yum install blast
rpm -i blast-2.2.5-x86_64.rpm
● Various others
48. Publish it
● Preprint archive
○ PeerJ, bioRxiv
● Method focussed journal
○ Bioinformatics, BMC Bioinformatics
● Software focussed journal
○ Journal of Open Source Software
49. Plug it
● Twitter
○ Ask someone popular you know to retweet it
● Blog
○ Start a general blog and slot it in
● Conferences
○ Tell people about it
50. Support your users
● Reply to emails
● Monitor your “Issues” web site
● Monitor Biostars and SEQanswers
● Have a mailing list
● Update your documentation
● Fix bugs
52. Take home messages
● Make it as painless as possible to install
● Keep documentation clear and simple
● Get people to use it before you publish
● People are not judging your coding skills
● But they will curse you if you waste their time
● Most users are grateful - leads to free beer
● A good tool is worth much more than a paper
53. Acknowledgments
● Gary Glonek
● David Adelson
● Bernard Pope - VLSCI
● Dieter Bulach - VLSCI
● Anna Syme - VLSCI
● David Powell - Monash University
● Anders Goncalves da Silva - University of Melbourne