2. some words about me
• BSc in Life Sciences (2010)
• Working at the Biodiversity & Climate
Research Center (since 2010)
• MSc studies at the Goethe University in
Frankfurt/Main (since 2011)
• Not exactly a biologist with much
professional background in human
genetics, but...
3. some words about me
• some background in data mining (mainly
transcriptomics)
• some experience with web applications
• interest in social media & crowd-sourcing
• customer of DTC genetic testing myself
5. mining DTC genetic tests
• results are hidden somewhere
on the web
• often no phenotypic annotation
• not easily re-usable
6. let’s code it: openSNP
• wants to be a central repository for sharing DTC results
• enables users to share phenotypes as well
• lowers barrier to participate
• motivation to share through benefits for users
• can we take it a step further and provide data for GWAS?
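To make the GWAS question concrete, here is a minimal sketch of the kind of per-SNP test such data could feed: a Pearson chi-square test on allele counts in cases versus controls. The function names and the allele counts are made up for illustration and do not come from the talk; a real GWAS would add quality control, population-structure correction, and multiple-testing correction.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square statistic for a 2x2 table of allele counts.

    case_counts and control_counts are (count of allele A, count of allele B).
    Assumes every row and column total is non-zero.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Invented example counts: allele A is more common among cases than controls.
chi2 = allelic_chi_square(case_counts=(60, 40), control_counts=(40, 60))
p = p_value_1df(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Running one such test per SNP across all shared genotypes, with phenotypes taken from the user-entered annotations, is the basic shape of the analysis the last bullet asks about.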
7. mining DTC genetic tests
• lots of potential for open data (100k+ customers)
• cheap data source for scientists
[Chart: Would you share DTC test results? (n=226). Answer options: Yes; Only with DTC company; No. Segment sizes: 6 %, 26 %, 68 %.]
12. technical implementation
• framework: Ruby on Rails
• database: PostgreSQL
• background job management via Resque (known from GitHub)
• basic API via JSON-queries
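As a rough illustration of what querying such a JSON API could look like from the client side, the sketch below builds a query URL and tallies genotypes from a JSON response. The base URL, the `snp` parameter, and the `user`/`genotype` field names are all hypothetical placeholders, not the project's actual API; the example runs offline on a hand-written response.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; the slides do not give the real URL layout.
BASE_URL = "https://example.org/snps.json"


def build_query_url(snp_id, base_url=BASE_URL):
    """Build a GET URL asking for all shared genotypes at one SNP."""
    return f"{base_url}?{urlencode({'snp': snp_id})}"


def count_genotypes(json_text):
    """Tally genotype calls from a JSON array of records.

    Assumed record shape: {"user": <id>, "genotype": "<two-letter call>"}.
    """
    counts = {}
    for record in json.loads(json_text):
        genotype = record["genotype"]
        counts[genotype] = counts.get(genotype, 0) + 1
    return counts


# Hand-written stand-in for a server response, so no network access is needed.
sample_response = (
    '[{"user": 1, "genotype": "AA"},'
    ' {"user": 2, "genotype": "AG"},'
    ' {"user": 3, "genotype": "AA"}]'
)

url = build_query_url("rs4988235")
counts = count_genotypes(sample_response)
print(url)
print(counts)
```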
16. other resources
• Personal Genome Project
  • data is open
  • participation is not
  • no easy way to download data, no API etc.
• Genomera
  • participation will be open (currently invited beta)
  • focus on small scale studies/experiments
18. problems & potential of patient-driven/crowd-sourced research
• problems
  • sample sizes
  • bias in participants
  • motivation of participants
  • accuracy of data
• potential
  • possible sample sizes
  • low costs
  • "warm fuzzy feeling inside" for patients
19. positive examples: PatientsLikeMe
• around since ~2006
• published a dozen studies since then
• famous example: ALS research on lithium carbonate
intake (149 patients, 447 controls)
Paul Wicks et al. (2011) Accelerated clinical discovery using self-reported patient data
collected online and a patient-matching algorithm, Nature Biotechnology 29, 411–414
20. positive examples: 23andMe
• published some studies in 2010/2011
• done with self-reported data
• studies include 10,000+ to 30,000+ participants
21. positive examples: 23andMe – general traits
“Replications of associations [...] for hair color, eye color,
and freckling validate the Web-based, self-reporting
paradigm. The identification of novel associations for hair
morphology [...], freckling [...], the ability to smell the
methanethiol produced after eating asparagus [...], and
photic sneeze reflex [...] illustrates the power of the
approach.”
Nicolas Eriksson et al. (2010) Web-Based, Participant-Driven Studies Yield Novel
Genetic Associations for Common Traits. PLoS Genet 6(6): e1000993.
doi:10.1371/journal.pgen.1000993
22. positive examples: 23andMe – Parkinson’s Disease
“We discovered two novel, genome-wide significant
associations with [Parkinson’s Disease]—both replicated
in an independent cohort. We also replicated 20
previously discovered genetic associations (including
LRRK2, GBA, SNCA, MAPT, GAK, and the HLA region),
providing support for our novel study design.”
Chuong B. Do et al. (2011) Web-Based Genome-Wide Association Study Identifies
Two Novel Loci and a Substantial Genetic Component for Parkinson's Disease.
PLoS Genet 7(6): e1002141. doi:10.1371/journal.pgen.1002141
25. QS projects
• tracking health in response to workouts (minimizing
impacts of disease/genetic predisposition)
• tracking response to different drugs
• tracking well-being in response to eating habits (butter vs.
arithmetic)
27. my conclusions
• technology enables new kinds of research
• DTC results and patient-driven research can lead to new
scientific knowledge
• can be a valuable addition to traditional research
28. openSNP: now & future
• won the Mendeley/PLoS Binary Battle in 2011
• received some funding from the German Wikimedia foundation to
get more people genotyped
• collaborating with Consent to Research to get an IRB-approved
consent process
• working on implementing the Distributed Annotation
System