Programming in R
Quick refresher
• creating a vector
• three synonyms:
> myvector <- 5:11
> myvector <- seq(from=5, to=11, by=1)
> myvector <- c(5, 6, 7, 8, 9, 10, 11)
> myvector
[1]  5  6  7  8  9 10 11

• accessing a subset of a vector

> bigvector <- 150:100
> bigvector
[1] 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 1
[20] 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 1
[39] 112 111 110 109 108 107 106 105 104 103 102 101 100
> mysubset <- bigvector[myvector]
> mysubset
[1] 146 145 144 143 142 141 140

> subset(bigvector, bigvector > 120)
[1] 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 1
[20] 131 130 129 128 127 126 125 124 123 122 121
Regular expressions:
Text search on steroids.

Regular expression                     Finds
David                                  David
Dav(e|id)                              David, Dave
Dav(e|id|ide|o)                        David, Dave, Davide, Davo
At{1,2}enborough                       Attenborough, Atenborough
Atte[nm]borough                        Attenborough, Attemborough
At{1,2}[ei][nm]bo{0,1}ro(ugh){0,1}     Atimbro, Attenbrough, etc.

Easy counting, replacing all with “Sir David Attenborough”
• for subsetting/counting: grep()
• for replacing: gsub()
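
As a rough sketch of the counting and replacing described above, here is the same idea in Ruby (the language this deck recommends later on), where Array#grep and String#gsub play roles similar to R's grep() and gsub(). The list of spellings is made up for illustration, and the helper names count_matches and normalise are hypothetical.

```ruby
# Pattern taken from the table above; it tolerates several misspellings.
ATTENBOROUGH = /At{1,2}[ei][nm]bo{0,1}ro(ugh){0,1}/

# Hypothetical sample data, invented for this illustration.
SPELLINGS = ["Attenborough", "Atenborough", "Attemborough", "Atimbro", "Darwin"]

# Counting: keep only the entries that match the pattern, then count them.
def count_matches(strings, pattern)
  strings.grep(pattern).length
end

# Replacing: substitute every match with the canonical name.
def normalise(strings, pattern, replacement)
  strings.map { |s| s.gsub(pattern, replacement) }
end

puts count_matches(SPELLINGS, ATTENBOROUGH)
puts normalise(SPELLINGS, ATTENBOROUGH, "Sir David Attenborough").uniq.inspect
```

Note that the pattern is case-sensitive: a lower-case "attenborough" would only match with a case-insensitive pattern (the /i flag in Ruby, ignore.case=TRUE in R).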
Functions
• R has many. e.g.: plot(), t.test()
• Making your own:

tree_age_estimate <- function(diameter, species) {
  # [...do the magic...]
  # maybe something like this, where growth.rates is a named vector
  # of growth rates per species, defined elsewhere:
  growth.rate <- growth.rates[[species]]
  age.estimate <- diameter / growth.rate
  return(age.estimate)
}

> tree_age_estimate(25, "White Oak")
[1] 66
> tree_age_estimate(60, "Carya ovata")
[1] 190
“for” Loop

> possible_colours <- c('blue', 'cyan', 'sky-blue', 'navy blue',
+                       'steel blue', 'royal blue', 'slate blue', 'light blue',
+                       'dark blue', 'prussian blue', 'indigo', 'baby blue',
+                       'electric blue')
> possible_colours
 [1] "blue"          "cyan"          "sky-blue"      "navy blue"
 [5] "steel blue"    "royal blue"    "slate blue"    "light blue"
 [9] "dark blue"     "prussian blue" "indigo"        "baby blue"
[13] "electric blue"
> for (colour in possible_colours) {
+   print(paste("The sky is oh so, so", colour))
+ }
[1] "The sky is oh so, so blue"
[1] "The sky is oh so, so cyan"
[1] "The sky is oh so, so sky-blue"
[1] "The sky is oh so, so navy blue"
[1] "The sky is oh so, so steel blue"
[1] "The sky is oh so, so royal blue"
[1] "The sky is oh so, so slate blue"
[1] "The sky is oh so, so light blue"
[1] "The sky is oh so, so dark blue"
[1] "The sky is oh so, so prussian blue"
[1] "The sky is oh so, so indigo"
[1] "The sky is oh so, so baby blue"
[1] "The sky is oh so, so electric blue"
Experimental design
Reproducible research &
Scientific computing.
Why consider experimental design?
• If you’re performing experiments
• Cost
• Time
• for experiment
• for analysis
• Ethics
• If you’re deciding to fund? to buy? to approve? to compete?
• are the results real?
• can you trust the data?
Main potential problems
• Insufficient data/power
• Inappropriate statistics
• Pseudoreplication
• Confounding factors

Consequences: results that are inaccurate & misleading, or wrong.
Example: deer parasites
• Do red deer that feed in woodland have more parasites than deer that feed on moorland?
• Find a woodland + a highland; collect faecal samples from 20 deer in each.
• Conclusion?
• But:
  • pseudoreplication (n = 1, not 20!):
    • shared environment (influence each other)
    • relatedness
  • many confounding factors (e.g. altitude...)
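
To make the pseudoreplication point concrete, here is a minimal Ruby sketch that contrasts the number of samples with the number of independent sites. The Sample record, the site labels, and the helper names are all hypothetical; only the design itself (20 deer at one woodland site, 20 at one highland site) comes from the slide.

```ruby
# One record per faecal sample. All values here are hypothetical.
Sample = Struct.new(:habitat, :site, :deer_id)

# 20 deer sampled at a single woodland site and 20 at a single highland site.
DEER_SAMPLES =
  (1..20).map { |i| Sample.new("woodland", "woodland site A", i) } +
  (1..20).map { |i| Sample.new("moorland", "highland site B", i) }

# Number of samples per habitat (what a naive analysis would call n).
def samples_per_habitat(samples)
  samples.group_by(&:habitat).transform_values(&:length)
end

# Number of independent sites per habitat (the n that matters here).
def sites_per_habitat(samples)
  samples.group_by(&:habitat).transform_values { |group| group.map(&:site).uniq.length }
end

puts samples_per_habitat(DEER_SAMPLES).inspect
puts sites_per_habitat(DEER_SAMPLES).inspect
```

The first count gives 20 per habitat, the second gives 1 per habitat: deer from the same site share an environment, so the comparison between habitats rests on one site each.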
Your turn: small & big Pheidole workers.
• Is there a genetic predisposition for becoming a larger worker?
• Design an experiment alone.
• Exchange ideas with your neighbor.
e.g.: John.
Your turn again: protein production
• Large amounts of potential superdrug takeItEasyProtein™ required for Phase II trials.
• 10 cell lines can produce takeItEasyProtein™.
• You have 5 possible growth media.
• Optimization question: Which combination of temperature, cell
line, and growth medium will perform best?
• Constraints:
• each assay takes 4 days.
• access to 2 incubators (each can contain 1-100 growth tubes).
• large scale production starts in 2 weeks
• Design an experiment alone.
• Exchange ideas with your neighbor.
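
Before designing the experiment, it can help to do the arithmetic on the constraints. The Ruby sketch below is one possible way to lay it out, not a model answer. It assumes that "2 weeks" means 14 days and that each incubator runs at a single temperature per assay (neither is stated on the slide); the labels and helper names are hypothetical.

```ruby
CELL_LINES = (1..10).map { |i| "cell line #{i}" }   # 10 cell lines (from the slide)
GROWTH_MEDIA = (1..5).map { |i| "medium #{i}" }     # 5 growth media (from the slide)
ASSAY_DAYS = 4                                      # each assay takes 4 days
DAYS_UNTIL_PRODUCTION = 14                          # assumption: 2 weeks = 14 days
INCUBATORS = 2
TUBES_PER_INCUBATOR = 100

# Every cell line paired with every growth medium.
def combinations(cell_lines, media)
  cell_lines.product(media)
end

# How many assays can be run one after another in the time available.
def sequential_rounds(days_available, assay_days)
  days_available / assay_days   # integer division
end

# How many replicate tubes of each combination fit in one incubator.
def replicates_per_combination(tubes, n_combinations)
  tubes / n_combinations
end

combos = combinations(CELL_LINES, GROWTH_MEDIA)
rounds = sequential_rounds(DAYS_UNTIL_PRODUCTION, ASSAY_DAYS)
puts "#{combos.length} cell line x medium combinations"
puts "#{rounds} sequential assay rounds fit before production starts"
puts "#{rounds * INCUBATORS} incubator runs in total (assumed: one temperature per run)"
puts "#{replicates_per_combination(TUBES_PER_INCUBATOR, combos.length)} replicate tubes per combination per run"
```

Under these assumptions the numbers bound how many temperatures and replicates are feasible, and they leave open whether to spend later rounds on new temperatures or on confirming earlier results.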
Reproducible Research &
Scientific Computing
Why care?
Some sources of inspiration
Best Practices for Scientific Computing

Greg Wilson, D.A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H.D. Haddock, Katy Huff, Ian M. Mitchell, Mark D. Plumbley, Ben Waugh, Ethan P. White, Paul Wilson

arXiv:1210.0530v3 [cs.MS] 29 Nov 2012

Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists’ productivity and the reliability of their software.

Software is as important to modern scientific research as telescopes and test tubes. From groups that work exclusively on computational problems, to traditional laboratory and field scientists, more and more of the daily operation of science revolves around computers. This includes the development of new algorithms, managing and analyzing the large amounts of data that are generated in single research projects, and combining disparate datasets to assess synthetic problems. [...]

1. Write programs for people, not computers.
2. Automate repetitive tasks.
3. Use the computer to record history.
4. Make incremental changes.
5. Use version control.
6. Don’t repeat yourself (or others).
7. Plan for mistakes.
8. Optimize software only after it works correctly.
9. Document the design and purpose of code rather than its mechanics.
10. Conduct code reviews.
Education

A Quick Guide to Organizing Computational Biology Projects
William Stafford Noble
Department of Genome Sciences, School of Medicine, and Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America

Introduction

Most bioinformatics coursework focuses on algorithms, with perhaps some components devoted to learning programming skills and learning how to use existing bioinformatics software. Unfortunately, for students who are preparing for a research career, this type of curriculum fails to address many of the day-to-day organizational challenges associated with performing computational experiments. In practice, the principles behind organizing and documenting computational experiments are often learned on the fly, and this learning is strongly influenced by personal predilections as well as by chance interactions with collaborators or colleagues.

The purpose of this article is to describe one good strategy for carrying out computational experiments. I will not describe profound issues such as how to formulate hypotheses, design experiments, or draw conclusions. Rather, I will focus on relatively mundane issues such as organizing files and directories and documenting [...]

[...] understanding your work or who may be evaluating your research skills. Most commonly, however, that “someone” is you. A few months from now, you may not remember what you were up to when you created a particular set of files, or you may not remember what conclusions you drew. You will either have to then spend time reconstructing your previous experiments or lose whatever insights you gained from those experiments.

This leads to the second principle, which is actually more like a version of Murphy’s Law: Everything you do, you will probably have to do over again. Inevitably, you will discover some flaw in your initial preparation of the data being analyzed, or you will get access to new data, or you will decide that your parameterization of a particular model was not broad enough. This means that the experiment you did last week, or even the set of experiments you’ve been working on over the past month, will probably need to be redone. If you have organized and documented your work clearly, then repeating [...]

[...] under a common root directory. The exception to this rule is source code or scripts that are used in multiple projects. Each such program might have a project directory of its own.

Within a given project, I use a top-level organization that is logical, with chronological organization at the next level, and logical organization below that. A sample project, called msms, is shown in Figure 1. At the root of most of my projects, I have a data directory for storing fixed data sets, a results directory for tracking computational experiments performed on that data, a doc directory with one subdirectory per manuscript, and directories such as src for source code and bin for compiled binaries or scripts.

Within the data and results directories, it is often tempting to apply a similar, logical organization. For example, you may have two or three data sets against which you plan to benchmark your algorithms, so you could create one directory for each of them under data. In my experience, this approach is risky, because the logical structure of your final [...]

Figure 1. Directory structure for a sample project. Directory names are in large typeface, and filenames are in smaller typeface. Only a subset of the files are shown here. Note that the dates are formatted <year>-<month>-<day> so that they can be sorted in chronological order. The source code src/ms-analysis.c is compiled to create bin/ms-analysis and is documented in doc/ms-analysis.html. The README files in the data directories specify who downloaded the data files from what URL on what date. The driver script results/2009-01-15/runall automatically generates the three subdirectories split1, split2, and split3, corresponding to three cross-validation splits. The bin/parse-sqt.py script is called by both of the runall driver scripts.
doi:10.1371/journal.pcbi.1000424.g001

[...] with this approach, the distinction between data and results may not be useful. Instead, one could imagine a top-level directory called something like experiments, with subdirectories with names like 2008-12-19. Optionally, the directory name might also include a word or two indicating the topic of the experiment therein. In practice, a single experiment will often require more than one day of work, and so you may end up working a few days [...]

The Lab Notebook

In parallel with this chronological directory structure, I find it useful to maintain a chronologically organized lab notebook. This is a document that resides in the root of the results directory and that records your progress in detail. Entries in the notebook should be dated, and they should be relatively verbose, with links or embedded images or tables displaying the results of the experiments [...] These types of entries provide a complete picture of the development of the project over time.

In practice, I ask members of my research group to put their lab notebooks online, behind password protection if necessary. When I meet with a member of my lab or a project team, we can refer to the online lab notebook, focusing on the current entry but scrolling up to previous entries as necessary. The URL can also be provided to remote [...]

In each results folder:
• script: getResults.rb or WHATIDID.txt or MyAnalysis.Rnw
• intermediates
• output
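
The layout described on these slides could be set up by a small script. Here is a minimal Ruby sketch, assuming the top-level directory names from the paper excerpt (data, results, doc, src, bin) and one dated folder per experiment under results. The helper names and the contents written to WHATIDID.txt are hypothetical.

```ruby
require "date"
require "fileutils"
require "tmpdir"

# Top-level directories named in the paper excerpt on the earlier slide.
TOP_LEVEL_DIRS = %w[data results doc src bin]

# Create the top-level project directories (no error if they already exist).
def create_project_skeleton(project_root)
  TOP_LEVEL_DIRS.each { |dir| FileUtils.mkdir_p(File.join(project_root, dir)) }
end

# Create results/<year>-<month>-<day>/ with intermediates/, output/ and a notes file.
# Returns the path of the dated folder.
def create_results_folder(project_root, date = Date.today)
  folder = File.join(project_root, "results", date.strftime("%Y-%m-%d"))
  FileUtils.mkdir_p(File.join(folder, "intermediates"))
  FileUtils.mkdir_p(File.join(folder, "output"))
  notes = File.join(folder, "WHATIDID.txt")
  File.write(notes, "Notes for #{date}\n") unless File.exist?(notes)
  folder
end

# Demonstration in a throwaway temporary directory.
Dir.mktmpdir do |root|
  create_project_skeleton(root)
  puts create_results_folder(root)
end
```

Formatting the date as year-month-day means that an alphabetical listing of the results directory is also a chronological one.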
Take notes in Markdown

“compile”
to html, pdf,
knitr (sweave)
Analyzing & Reporting in a single file.

Also works with Markdown instead of LaTeX!

MyFile.Rnw:

\documentclass{article}
\usepackage[sc]{mathpazo}
\usepackage[T1]{fontenc}
\usepackage{url}
\begin{document}

<<setup, include=FALSE, cache=FALSE, echo=FALSE>>=
# this is equivalent to \SweaveOpts{...}
opts_chunk$set(fig.path='figure/minimal-', fig.align='center', fig.show='hold')
options(replace.assign=TRUE,width=90)
@

\title{A Minimal Demo of knitr}

\author{Yihui Xie}

\maketitle
You can test if \textbf{knitr} works with this minimal demo. OK, let's
get started with some boring random numbers:

<<boring-random,echo=TRUE,cache=TRUE>>=
set.seed(1121)
(x=rnorm(20))
mean(x);var(x)
@

The first element of \texttt{x} is \Sexpr{x[1]}. Boring boxplots
and histograms recorded by the PDF device:

<<boring-plots,cache=TRUE,echo=TRUE>>=
## two plots side by side
par(mar=c(4,4,.1,.1),cex.lab=.95,cex.axis=.9,mgp=c(2,.7,0),tcl=-.3,las=1)
boxplot(x)
hist(x,main='')
@
Do the above chunks work? You should be able to compile the \TeX{} [...]

### in R:
library(knitr)
knit("MyFile.Rnw")
# --> creates MyFile.tex

### in shell:
pdflatex MyFile.tex
# --> creates MyFile.pdf

[Screenshot of the resulting MyFile.pdf: "A Minimal Demo of knitr", Yihui Xie, February 26, 2012. It shows the introductory text, then the code chunks with their printed output (the 20 random numbers; mean(x) gives 0.3217), followed by the boxplot and histogram.]
Choosing a programming language
Excel
R
Unix command-line (i.e., shell, i.e., bash)
Perl
Java
Python
Ruby
Javascript
Ruby.

“Friends don’t let friends do Perl” - reddit user
example: reverse the contents of each line in a file
### in PERL:
open INFILE, "my_file.txt";
while (defined ($line = <INFILE>)) {
    chomp($line);                                  # strip the trailing newline
    @letters = split(//, $line);                   # one element per character
    @reverse_letters = reverse(@letters);
    $reverse_string = join("", @reverse_letters);
    print $reverse_string, "\n";
}
### in Ruby:
File.open("my_file.txt").each do |line|
  puts line.chomp.reverse
end
More ruby examples.

5.times do
  puts "Hello world"
end
# Sorting people
people_sorted_by_age = people.sort_by { |person| person.age }
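
The sorting line above assumes a people collection whose elements respond to age. A self-contained sketch might look like the following; the Person struct, the names, and the ages are invented for illustration.

```ruby
# A minimal record type; any object with an age method would work.
Person = Struct.new(:name, :age)

# Hypothetical data, invented for this example.
PEOPLE = [
  Person.new("Ada", 36),
  Person.new("Grace", 85),
  Person.new("Linus", 21)
]

# Returns a new array ordered from youngest to oldest; the input is unchanged.
def sort_people_by_age(people)
  people.sort_by { |person| person.age }
end

people_sorted_by_age = sort_people_by_age(PEOPLE)
puts people_sorted_by_age.map(&:name).join(", ")
```

sort_by returns a new array rather than sorting in place, so the original list keeps its order.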
Getting help.
• In real life: Make friends with people. Talk to them.

• Online:
• Specific discussion mailing lists (e.g.: R, Stacks, bioruby, MAKER...)
• Programming: http://stackoverflow.com
• Bioinformatics: http://www.biostars.org
• Sequencing-related: http://seqanswers.com
• Stats: http://stats.stackexchange.com
• Online reputation is good:
  • forums
  • “citizen science”

 

Dernier

Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 

Dernier (20)

Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 

2013-10-30 SBC361: Reproducible designs and sustainable software

  • 2.
      • creating a vector (three synonyms):
          > myvector <- 5:11
          > myvector <- seq(from=5, to=11, by=1)
          > myvector <- c(5, 6, 7, 8, 9, 10, 11)
          > myvector
          [1]  5  6  7  8  9 10 11
      • accessing a subset of a vector:
          > bigvector <- 150:100
          > bigvector
           [1] 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132
          [20] 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113
          [39] 112 111 110 109 108 107 106 105 104 103 102 101 100
          > mysubset <- bigvector[myvector]
          > mysubset
          [1] 146 145 144 143 142 141 140
          > subset(bigvector, bigvector > 120)
           [1] 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132
          [20] 131 130 129 128 127 126 125 124 123 122 121
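The two kinds of subsetting on this slide can be compared side by side. The sketch below reuses the slide's own vectors; the variable names by_position, by_condition and same_as_subset are introduced here for illustration only.

```r
bigvector <- 150:100
myvector  <- 5:11

# Positional indexing: take the 5th to 11th elements
by_position <- bigvector[myvector]

# Logical indexing: keep every element for which the condition is TRUE
by_condition <- bigvector[bigvector > 120]

# For a plain vector, subset() gives the same result as logical indexing
same_as_subset <- identical(by_condition, subset(bigvector, bigvector > 120))

print(by_position)
print(length(by_condition))
print(same_as_subset)
```

Positional indexing answers "which places?", while logical indexing answers "which values?"; subset() is a more readable spelling of the second.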
  • 3. Regular expressions: Text search on steroids.

      Regular expression                    Finds
      David                                 David
      Dav(e|id)                             David, Dave
      Dav(e|id|ide|o)                       David, Dave, Davide, Davo
      At{1,2}enborough                      Attenborough, Atenborough
      Atte[nm]borough                       Attenborough, Attemborough
      At{1,2}[ei][nm]bo{0,1}ro(ugh){0,1}    Atimbro, Attenbrough, etc.

      Easy counting, replacing all with “Sir David Attenborough”
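As a minimal sketch of the counting and replacing mentioned above, the last pattern in the table can be passed to R's grepl(), grep() and gsub(). The input vector names_found is made up for illustration; only the pattern comes from the slide.

```r
# Hypothetical input: spellings invented for this example
names_found <- c("Attenborough", "Atenborough", "Attemborough",
                 "Atimbro", "Attenbrough", "Darwin")

# Pattern taken from the slide
pattern <- "At{1,2}[ei][nm]bo{0,1}ro(ugh){0,1}"

# Counting: grepl() returns TRUE/FALSE for each element
n_matches <- sum(grepl(pattern, names_found))

# Subsetting: grep(value = TRUE) returns the matching elements themselves
matching <- grep(pattern, names_found, value = TRUE)

# Replacing: gsub() substitutes every match with the corrected name
corrected <- gsub(pattern, "Sir David Attenborough", names_found)

print(n_matches)
print(corrected)
```

Note that the pattern is case-sensitive by default; passing ignore.case = TRUE to grepl(), grep() or gsub() would also catch lowercase spellings.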
  • 5. Functions
      • R has many, e.g.: plot(), t.test()
      • Making your own:
          tree_age_estimate <- function(diameter, species) {
            # [...do the magic...
            # maybe something like:
            growth.rate  <- growth.rates[ species ]
            age.estimate <- diameter / growth.rate
            # ...]
            return(age.estimate)
          }

          > tree_age_estimate(25, "White Oak")
          + 66
          > tree_age_estimate(60, "Carya ovata")
          + 190
  • 6. “for” Loop
          > possible_colours <- c('blue', 'cyan', 'sky-blue', 'navy blue',
          'steel blue', 'royal blue', 'slate blue', 'light blue', 'dark
          blue', 'prussian blue', 'indigo', 'baby blue', 'electric blue')
          > possible_colours
           [1] "blue"          "cyan"          "sky-blue"      "navy blue"
           [5] "steel blue"    "royal blue"    "slate blue"    "light blue"
           [9] "dark blue"     "prussian blue" "indigo"        "baby blue"
          [13] "electric blue"
          > for (colour in possible_colours) {
          +   print(paste("The sky is oh so, so", colour))
          + }
          [1] "The sky is oh so, so blue"
          [1] "The sky is oh so, so cyan"
          [1] "The sky is oh so, so sky-blue"
          [1] "The sky is oh so, so navy blue"
          [1] "The sky is oh so, so steel blue"
          [1] "The sky is oh so, so royal blue"
          [1] "The sky is oh so, so slate blue"
          [1] "The sky is oh so, so light blue"
          [1] "The sky is oh so, so dark blue"
          [1] "The sky is oh so, so prussian blue"
          [1] "The sky is oh so, so indigo"
          [1] "The sky is oh so, so baby blue"
          [1] "The sky is oh so, so electric blue"
  • 8. Experimental design Reproducible research & Scientific computing.
  • 9. Why consider experimental design?
      • If you’re performing experiments:
          • Cost
          • Time
              • for experiment
              • for analysis
          • Ethics
      • If you’re deciding to fund? to buy? to approve? to compete?
          • are the results real?
          • can you trust the data?
  • 10. Main potential problems
      • Insufficient data/power
      • Inappropriate statistics
      • Pseudoreplication
      • Confounding factors
      Slide labels: Inaccurate & Misleading; Wrong
  • 11. Example: deer parasites
      • Do red deer that feed in woodland have more parasites than deer that feed on moorland?
      • Find a woodland + a highland; collect faecal samples from 20 deer in each.
      • Conclusion?
      • But:
          • pseudoreplication (n = 1, not 20!):
              • shared environment (influence each other)
              • relatedness
          • many confounding factors (e.g. altitude...)
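To illustrate why 20 deer sampled at a single site per habitat behave like n = 1, here is a minimal simulation sketch in R. All numbers (a baseline parasite load of 10, standard deviations of 1, 1000 repeats) are invented for illustration and do not come from the slides. The simulation assumes there is no true habitat effect and that every deer at a site shares one site-level shift.

```r
set.seed(42)  # arbitrary seed, only so that the run is repeatable

# One simulated study: 2 sites, 20 deer each, NO true habitat effect.
# Deer at the same site share a random site-level shift (shared environment).
simulate_study <- function(n_deer = 20, site_sd = 1, deer_sd = 1) {
  site_effect <- rnorm(2, mean = 0, sd = site_sd)
  woodland <- rnorm(n_deer, mean = 10 + site_effect[1], sd = deer_sd)
  moorland <- rnorm(n_deer, mean = 10 + site_effect[2], sd = deer_sd)
  # Naive analysis: treat each deer as an independent replicate
  t.test(woodland, moorland)$p.value
}

p_values <- replicate(1000, simulate_study())

# Fraction of studies that "find" a habitat difference that does not exist
false_positive_rate <- mean(p_values < 0.05)
print(false_positive_rate)
```

Under these assumptions the printed rate should come out far above the nominal 0.05, because the test mistakes a difference between two sites for a difference between two habitats. Sampling several independent woodland and moorland sites would make the site, not the deer, the unit of replication.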
  • 12. Your turn: small & big Pheidole workers.
      • Is there a genetic predisposition for becoming a larger worker?
      • Design an experiment alone.
      • Exchange ideas with your neighbor.
  • 14. Your turn again: protein production
      • Large amounts of potential superdrug takeItEasyProtein™ required for Phase II trials.
      • 10 cell lines can produce takeItEasyProtein™.
      • You have 5 possible growth media.
      • Optimization question: Which combination of temperature, cell line, and growth medium will perform best?
      • Constraints:
          • each assay takes 4 days.
          • access to 2 incubators (each can contain 1-100 growth tubes).
          • large scale production starts in 2 weeks
      • Design an experiment alone.
      • Exchange ideas with your neighbor.
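One way to start on this exercise is to count what the constraints allow. The sketch below is a rough budget, not a model answer: it assumes that an incubator holds a single temperature during an assay and that assay rounds run back to back. The cell line and medium names are placeholders.

```r
cell_lines <- paste0("line", 1:10)    # 10 cell lines (placeholder names)
media      <- paste0("medium", 1:5)   # 5 growth media (placeholder names)

# Every cell line x medium combination
combos   <- expand.grid(cell_line = cell_lines, medium = media,
                        stringsAsFactors = FALSE)
n_combos <- nrow(combos)

# Capacity of one incubator run (assumed: one temperature per run)
tubes_per_incubator  <- 100
replicates_per_combo <- tubes_per_incubator %/% n_combos

# Time budget: 4-day assays within 2 weeks, on 2 incubators
days_available <- 14
days_per_assay <- 4
rounds         <- days_available %/% days_per_assay
n_incubators   <- 2
incubator_runs <- rounds * n_incubators

print(c(combinations = n_combos,
        replicates_per_run = replicates_per_combo,
        rounds = rounds,
        incubator_runs = incubator_runs))
```

Under these assumptions there is room for 6 incubator runs in total. How to split them between different temperatures and repeated runs of the same temperature (to avoid confounding temperature with incubator or round) is the actual design question.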
  • 19. Some sources of inspiration
  • 20. Best Practices for Scientific Computing
      Greg Wilson, D.A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H.D. Haddock, Katy Huff, Ian M. Mitchell, Mark D. Plumbley, Ben Waugh, Ethan P. White, Paul Wilson
      arXiv:1210.0530v3 [cs.MS] 29 Nov 2012

      Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists’ productivity and the reliability of their software.

      Software is as important to modern scientific research as telescopes and test tubes. From groups that work exclusively on computational problems, to traditional laboratory and field scientists, more and more of the daily operation of science revolves around computers. This includes the development of new algorithms, managing and analyzing the large amounts of data that are generated in single research projects, and combining disparate datasets to assess synthetic problems.

      1. Write programs for people, not computers.
      2. Automate repetitive tasks.
      3. Use the computer to record history.
      4. Make incremental changes.
      5. Use version control.
      6. Don’t repeat yourself (or others).
      7. Plan for mistakes.
      8. Optimize software only after it works correctly.
      9. Document the design and purpose of code rather than its mechanics.
      10. Conduct code reviews.
  • 22. Education: A Quick Guide to Organizing Computational Biology Projects
      William Stafford Noble
      1 Department of Genome Sciences, School of Medicine, University of Washington, Seattle, Washington, United States of America, 2 Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America

      Introduction: Most bioinformatics coursework focuses on algorithms, with perhaps some components devoted to learning programming skills and learning how to use existing bioinformatics software. Unfortunately, for students who are preparing for a research career, this type of curriculum fails to address many of the day-to-day organizational challenges associated with performing computational experiments. In practice, the principles behind organizing and documenting computational experiments are often learned on the fly, and this learning is strongly influenced by personal predilections as well as by chance interactions with collaborators or colleagues.

      [...] understanding your work or who may be evaluating your research skills. Most commonly, however, that “someone” is you. A few months from now, you may not remember what you were up to when you created a particular set of files, or you may not remember what conclusions you drew. You will either have to then spend time reconstructing your previous experiments or lose whatever insights you gained from those experiments.

      This leads to the second principle, which is actually more like a version of Murphy’s Law: Everything you do, you will probably have to do over again. Inevitably, you will discover some flaw in your initial preparation of the data being analyzed, or you will get access to new data, or you will decide that your parameterization of a particular model was not broad enough.

      [...] under a common root directory. The exception to this rule is source code or scripts that are used in multiple projects. Each such program might have a project directory of its own. Within a given project, I use a top-level organization that is logical, with chronological organization at the next level, and logical organization below that. A sample project, called msms, is shown in Figure 1. At the root of most of my projects, I have a data directory for storing fixed data sets, a results directory for tracking computational experiments performed on that data, a doc directory with one subdirectory per manuscript, and directories such as src for source code and bin for compiled binaries or scripts.

      Lab Notebook: In parallel with this chronological directory structure, I find it useful to maintain a chronologically organized lab notebook. This is a document that resides in the root of the results directory and that records your progress in detail. Entries in the notebook should be dated, and they should be relatively verbose, with links or embedded images or tables displaying the results of the experiments [...]

      Figure 1. Directory structure for a sample project. Directory names are in large typeface, and filenames are in smaller typeface. Only a subset of the files are shown here. Note that the dates are formatted <year>-<month>-<day> so that they can be sorted in chronological order. The source code src/ms-analysis.c is compiled to create bin/ms-analysis and is documented in doc/ms-analysis.html. The README files in the data directories specify who downloaded the data files from what URL on what date. The driver script results/2009-01-15/runall automatically generates the three subdirectories split1, split2, and split3, corresponding to three cross-validation splits. The bin/parse-sqt.py script is called by both of the runall driver scripts. doi:10.1371/journal.pcbi.1000424.g001

      In each results folder:
      • script getResults.rb or WHATIDID.txt or MyAnalysis.Rnw
      • intermediates
      • output
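The layout described above (logical top-level directories, then dated results folders, each holding a script or notes file plus intermediates and output) can be set up in a few lines of R. This is a sketch under assumptions: it builds the skeleton under tempdir() so that it is safe to run anywhere, borrows the project name msms from Figure 1, and picks WHATIDID.txt as the notes file from the options listed on the slide.

```r
# Throwaway location for the demo; in practice this would be your project path
project_root <- file.path(tempdir(), "msms")

# Logical organization at the top level
top_level <- c("data", "results", "doc", "src", "bin")
for (d in top_level) {
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# Chronological organization below results/: year-month-day sorts correctly
today         <- format(Sys.Date(), "%Y-%m-%d")
results_today <- file.path(project_root, "results", today)
dir.create(results_today, showWarnings = FALSE)

# In each results folder: notes or script, intermediates, output
for (d in c("intermediates", "output")) {
  dir.create(file.path(results_today, d), showWarnings = FALSE)
}
file.create(file.path(results_today, "WHATIDID.txt"))

print(list.files(project_root, recursive = TRUE, include.dirs = TRUE))
```

Running the same lines on another day adds a new dated folder next to the old one, which keeps earlier results untouched.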
  • 23. Take notes in Markdown “compile” to html, pdf,
  • 24. knitr (sweave): Analyzing & Reporting in a single file.
      Also works with Markdown instead of LaTeX!

      MyFile.Rnw:
          \documentclass{article}
          \usepackage[sc]{mathpazo}
          \usepackage[T1]{fontenc}
          \usepackage{url}
          \begin{document}

          <<setup, include=FALSE, cache=FALSE, echo=FALSE>>=
          # this is equivalent to \SweaveOpts{...}
          opts_chunk$set(fig.path='figure/minimal-', fig.align='center', fig.show='hold')
          options(replace.assign=TRUE,width=90)
          @

          \title{A Minimal Demo of knitr}
          \author{Yihui Xie}
          \maketitle

          You can test if \textbf{knitr} works with this minimal demo. OK, let's
          get started with some boring random numbers:

          <<boring-random,echo=TRUE,cache=TRUE>>=
          set.seed(1121)
          (x=rnorm(20))
          mean(x);var(x)
          @

          The first element of \texttt{x} is \Sexpr{x[1]}. Boring boxplots
          and histograms recorded by the PDF device:

          <<boring-plots,cache=TRUE,echo=TRUE>>=
          ## two plots side by side
          par(mar=c(4,4,.1,.1),cex.lab=.95,cex.axis=.9,mgp=c(2,.7,0),tcl=-.3,las=1)
          boxplot(x)
          hist(x,main='')
          @

          Do the above chunks work? You should be able to compile the \TeX{}

      ### in R:
          library(knitr)
          knit("MyFile.Rnw")    # --> creates MyFile.tex

      ### in shell:
          pdflatex MyFile.tex   # --> creates MyFile.pdf

      Rendered PDF (excerpt): “A Minimal Demo of knitr”, Yihui Xie, February 26, 2012. It shows the echoed code followed by the printed values of x, mean(x) (## [1] 0.3217) and var(x).
  • 25. Choosing a programming language
      • Excel
      • R
      • Unix command-line (i.e., shell, i.e., bash)
      • Perl
      • Java
      • Python
      • Ruby
      • Javascript
  • 26. Ruby. “Friends don’t let friends do Perl” - reddit user
      example: reverse the contents of each line in a file

      ### in PERL:
          open INFILE, "my_file.txt";
          while (defined ($line = <INFILE>)) {
            chomp($line);
            @letters = split(//, $line);
            @reverse_letters = reverse(@letters);
            $reverse_string = join("", @reverse_letters);
            print $reverse_string, "\n";
          }

      ### in Ruby:
          File.open("my_file.txt").each do |line|
            puts line.chomp.reverse
          end
  • 27. More ruby examples.
          5.times do
            puts "Hello world"
          end

          # Sorting people
          # (assumes `people` is a collection of objects that respond to `age`)
          people_sorted_by_age = people.sort_by{ |person| person.age}
  • 28. Getting help.
      • In real life: Make friends with people. Talk to them.
      • Online:
          • Specific discussion mailing lists (e.g.: R, Stacks, bioruby, MAKER...)
          • Programming: http://stackoverflow.com
          • Bioinformatics: http://www.biostars.org
          • Sequencing-related: http://seqanswers.com
          • Stats: http://stats.stackexchange.com
  • 31.
      • Online reputation is good:
          • forums
          • “citizen science”