Call for non-coding mRNA resource

A call for the creation of a

Human long ncRNA Clone Set

Matthias Harbers - January 2012

Long non-coding mRNAs
 The dogma that an “mRNA has to encode a protein” directed the creation
of protein-coding clone collections

 The Mammalian Gene Collection (MGC) and ORFeome Collaboration
(OC) focused on the cloning of protein-coding transcripts

 However, it is now generally accepted that many mRNAs are non-coding
transcripts that exert their functions by other, mostly unknown,
mechanisms

 All clone collections are incomplete because of the use of oligo-dT priming

 Up to 40% or more of the mRNAs may lack polyadenylation and
hence are not covered by classical cDNA libraries

 Many non-polyadenylated mRNAs could be non-coding mRNAs

Research is driven by sequencing
 High-speed sequencing changed the way genomic research is done

 High-speed sequencing has an unmatched power in “discovery”

 However, expressed sequences need confirmation by other means

 New transcripts have to be analyzed for their functions

 Loss-of-function studies became easier with RNAi knock-down
experiments

 However, gain-of-function experiments are essential to understand
mechanisms and functions

 We should have the resources needed to study the functions of
ncRNAs!

“Real” versus “predicted”
 Public databases like NCBI and others use “reference sequences” for
transcripts

 This implies that there is only one transcript per gene!

 Reference sequences ignore splicing!

 Actual cDNA clones in the public domain often do not match reference
sequences

 High-speed sequencing data provide more “predicted sequences” based on
the assembly of short reads into contigs or the like

 Predicted sequences are not experimentally verified!

 Cloned cDNAs have at least an “experimental origin”!

The “Knowledge Cycle”
Creation of genomic resources and data:
Public databases

Functional Studies:
Gene annotation

Find the clones you need:
Clone Distribution Services

 Physical resources are needed to bring “life” to in silico data!

What is needed?
 Define what long non-coding mRNAs are (ongoing in the community)

 Description of the human non-coding mRNA set

 Consensus on the features of the human non-coding mRNA set

 Starting materials available in the community?

 New starting materials required?

 Form a consortium to build the human non-coding mRNA clone set

 Consider non-coding mRNA collections from other organisms

 Small non-coding RNAs are not considered here because they are in part
already covered by some public collections (e.g. Netherlands Cancer
Institute)

Features of non-coding mRNA set
 ORFeome Collaboration committed to Invitrogen Gateway system

 Broad Institute also uses Invitrogen Gateway system

 Suggestion to stay with Invitrogen Gateway system for ncRNA set?

 However, many clone customers do not want Invitrogen Gateway clones!

 Addition of restriction sites could enable sub-cloning without use of the
Invitrogen Gateway system

 For example Promega offers Flexi® Vectors using SgfI and PmeI

 Should the parental clones from cDNA libraries be made available?

 Should the collection include splice variants?

 Are there special requirements we do not know of?
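
The restriction-site option above only works for inserts that do not themselves contain the flanking sites. A minimal sketch of such a check, assuming the SgfI/PmeI pair mentioned for Flexi® Vectors; the function names `internal_sites` and `is_subclonable` are hypothetical and only for illustration, not part of any existing tool:

```python
# Recognition sequences of the two rare-cutting enzymes named in the slide.
# Both sites are palindromic, so scanning one strand is sufficient.
SITES = {"SgfI": "GCGATCGC", "PmeI": "GTTTAAAC"}


def internal_sites(insert):
    """Return the 0-based start positions of each recognition site in an insert."""
    seq = insert.upper()
    hits = {}
    for enzyme, site in SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            # Continue one base further so overlapping sites are not missed.
            start = seq.find(site, start + 1)
        hits[enzyme] = positions
    return hits


def is_subclonable(insert):
    """True if the insert carries no internal SgfI or PmeI site (hypothetical criterion)."""
    return not any(internal_sites(insert).values())
```

Clones flagged by such a check would need a different enzyme pair or a recombination-based transfer; how often this happens in a real ncRNA set is an open question, not something estimated here.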

Available starting materials
 Want to use high-quality full-length cDNA clones and libraries where
possible!

 Human cDNA collections:
• RIKEN: ~311,000 end-sequenced human cDNA clones in NCBI?
• Other human cDNA clones: e.g. OriGene?

 Human full-length cDNA libraries in the public domain?

 I do not see gene synthesis based on predicted sequences as a “general
option”

 I prefer starting from cDNA libraries using “real transcripts”

 Classical cDNA libraries have not been sequenced deeply enough to cover
rare genes! There is an option to find more unique clones in old libraries

 Need new cDNA libraries/pools to cover important biological samples?

New technologies will help
 In the past, sequencing cost was the limiting factor for building clone collections

 Many clones in the public domain are not full-length sequenced

 Lack of sequence information limits clone annotation

 New high-speed sequencing methods can overcome limitation on
sequencing cost

 Use high-speed sequencing instead of end-sequencing of individual clones
to screen cDNA libraries more deeply

 Use high-speed sequencing to obtain full-length sequences of all clones
within ncRNA collection

 Use high-speed sequencing to assure high quality standards of entire
collection!

Classical approach:
RNA → cDNA Library → Clone Picking → End Sequencing → Annotated Clone Collection
Limited by sequencing cost; redundancy in clone collection

New approach:
RNA → cDNA Library/cDNA Pool → Shotgun sequencing → Library Screening → Clones for Targets
Much higher coverage; focus on new targets

New starting materials required
 Many mRNAs lack polyadenylation and require a new cloning method:
Total RNA from cell

Removal of polyA mRNA

Ligation of 3’ adapter to mRNA

cDNA synthesis using 3’ adapter

Cap selection using Cap Trapper

Cloning into cDNA library

 Size does matter: Classical cDNA projects had a size cutoff

Build internet presence
 Any clone collection requires an internet presence with a database!

 Clone related information can only be provided by a database

 Annotation of the clones by reference to other resources is important

 Application notes and references could be a great draw for users

 Good documentation of the project needed

 Provide all clones to community without limitations on the rights to use
(follow example of “Good Faith Agreement” of ORFeome Collaboration)

 ncRNAs may require “more” for a better understanding of how to
study new mechanisms and functions!

 Become the “home” for research on ncRNAs!

Conclusion
 MGC and OC set standards for human clone resources!

 We should build on the great record of MGC and OC to move from
coding mRNAs to long non-coding mRNAs

 After most coding genes have been covered by at least one cDNA clone, we
need to work on the non-coding transcripts to move forward

 Non-coding genes are essential players in life and we want to provide
comprehensive resources for their study and analysis

 Starting with human ncRNAs will greatly benefit medical research

 Including ncRNAs from other organisms could be an option where those
are key model systems to study ncRNA functions (RIKEN FANTOM clone
set from mouse includes many long ncRNAs)
