First published online April 20, 2007
Journal of Experimental Biology 210, 1497-1506 (2007)
Published by The Company of Biologists 2007
doi: 10.1242/jeb.000406
Review Article
Constructing the landscape of the mammalian transcriptome
Genome Science Laboratory, Discovery and Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan and Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
e-mail: rgscerg{at}gsc.riken.jp
Accepted 16 January 2007
| Summary |
|---|
|
|
|---|
Key words: transcriptome, full-length cDNA, mRNA, non-coding RNA, genomics
| Influence of old assumptions |
|---|
|
|
|---|
Early estimates in the 1970s of the number of such cellular RNA species
were 70 000 to 100 000, based on the kinetics of in vitro
renaturation of mRNA/cDNA. All of these RNAs, whose number is substantially
larger than the number of known protein-coding genes within the genome (25 000),
were thought to encode proteins. Although the research community has long
been aware of the existence of some hundreds of non-protein-coding RNAs, these
have generally been dismissed as exceptions to the widespread belief that
all non-structural RNAs are protein-coding and can be recognized by the
polyadenosine (polyA) tail at the 3' end of the mRNA
molecule. Some mRNAs lacking polyA tails were described in the late
1970s, but these too were treated as exceptions and not pursued.
Clarifying the number of protein-coding genes, and the identification and
meaning of non-protein coding RNAs, has required the development of novel
technologies, starting with cloning methods that crucially incorporate the
full length of the mRNA molecule, rather than a shorter, artefactual fragment.
Despite an earlier report of a full-length cDNA library prepared by selecting
full-length cDNA/mRNA hybrids through the modifications made on the 5'
end of the mRNA (Theissen et al., 1986), it was not until the middle of the
1990s that the need to systematically prepare full-length cDNA libraries was
recognized. These libraries allowed the systematic discovery of the entire
length of the coding mRNA, including its non-protein-coding ends (see
Glossary). Conventional cDNA libraries, not enriched for full-length clones,
have an average full-length cDNA content of 20–30% (Marra et al., 1999),
while in high-quality, full-length cDNA libraries, the proportion of
full-length clones can exceed 90%. These libraries have thus become very
attractive for large-scale sequencing projects, because they yield the full
sequence data at a fraction of the cost of sequencing the entire genomic DNA,
and because the greater consistency of the full-length sequence greatly aids
data analysis, clone management and full-insert sequencing.
|
| Full-length cDNA cloning for gene discovery |
|---|
|
|
|---|
The widely used SMART method of cDNA production is based on the addition by
MMLV reverse transcriptase (RT) of a CCC trinucleotide when the enzyme
reaches the cap structure; an oligonucleotide with a GGG-3' end anneals to
this overhang. Extension by the reverse transcriptase on this cap-switch
oligonucleotide provides the means of priming the second strand of
full-length cDNAs only (Zhu et al., 2001). Because the efficiency is
relatively low, amplification by the polymerase chain reaction (PCR) is
required. However, although these libraries are efficiently enriched for
full-length cDNAs, they show a dramatically reduced variety of transcripts
(less than half) when used for large-scale EST projects (Sasaki et al., 1998),
compared with non-PCR-amplified full-length cDNA libraries prepared from the
same tissue (Carninci et al., 2003). The oligo-capping procedure (Maruyama
and Sugano, 1994; Kato et al., 1994) is more sophisticated. Uncapped RNA
molecules, such as truncated RNAs, ribosomal and other structural RNAs, are
dephosphorylated by a phosphatase. Next, the removal of the cap structure by
tobacco acid pyrophosphatase leaves a phosphate group at the 5' end of
full-length mRNAs only, to which an oligonucleotide is added by RNA ligase,
followed by library preparation by reverse transcription and PCR. Despite the
requirement for PCR, this method has been widely used for the production of
various cDNA collections, including the full-length Japan (FLJ) human cDNA
collection (Ota et al., 2004).
To clone a large variety of mRNAs efficiently without PCR, new full-length
cDNA cloning approaches have been developed that separate full-length cDNAs
from artefactual truncated cDNAs by selecting cDNA/mRNA hybrids through the
cap site. RNase digestion cleaves the single-stranded portion of the mRNA,
which is exposed wherever the RNA is not protected by a cDNA extending to the
cap site (see Fig. 1), and thereby removes the cap site from truncated
cDNA/RNA hybrids. Full-length cDNA–RNA hybrids can then be physically
selected using techniques based on retention of the cap structure. This can
be achieved by direct binding of the cap with a cap-binding protein (Edery et
al., 1995) (see Fig. 1A), which, however, requires tedious coupling of a
mammalian cap-binding protein to a matrix and a substantial amount of
starting mRNA. Alternatively, the cap can be selected after its chemical
modification by the addition of a biotin, followed by selection with
streptavidin-coated magnetic beads (Carninci et al., 1996; Carninci et al.,
1997; Carninci and Hayashizaki, 1999) (see Fig. 1B). This technology, called
`cap-trapper', makes use of commercially available reagents to oxidize the
diol group at the cap site with NaIO4, followed by biotinylation with a
long-arm biotin hydrazide. The procedure is very efficient and allows further
manipulations downstream without using PCR, even when starting with as little
as 1.5 µg of total RNA (Carninci et al., 2003).
Comprehensive genome annotation requires unbiased cDNA cloning
Development of full-length cDNA isolation technologies was only the first
step. Although full-length libraries proved satisfactory in terms of
full-length rate (95%) (Carninci et al., 1996), they were not ideal for
efficient isolation of difficult RNAs. In fact, the efficiency of conversion
of mRNA to full-length cDNA, and of subsequent cloning, was inversely
proportional to the length of the original RNA, with clear
under-representation of cDNAs deriving from long mRNAs. This problem can be
partially overcome by the use of engineered reverse transcriptases (RTs) in
which the RNase H domain has been mutated (for instance, Superscript II and
III from Invitrogen). Together with the use of these enzymes, we have found
that some small molecules, also called osmolytes, which are synthesized by a
multitude of organisms including yeast under conditions of stress (De
Virgilio et al., 1994; Hottiger et al., 1994), effectively activate RTs at a
high temperature (60°C) that would normally be inactivating. This enzyme
`thermoactivation' is promoted by the addition of trehalose and sorbitol to
the reaction mixtures (Carninci et al., 1998; Carninci et al., 2002),
enabling the preparation of cDNAs that exceed 15 kb in length.
Conventional plasmid vectors are strongly biased towards cloning the short
cDNAs present in cDNA ligation mixtures. This generates libraries with short
inserts on average [1–1.5 kilobases (kb)], even when the input cDNA molecules
are of a larger average size (>2.5 kb). To overcome this problem, cDNA
mixtures containing such long cDNAs can be cloned into lambda vectors
specifically designed for long cDNA cloning. These lambda FLC (full-length
cDNA) vectors were derived by adjusting the size of the vector to just below
the nominal cloning capacity (37.5 kb): the lambda phage most efficiently
packages DNA of lengths close to the wild-type size (48.5 kb), so large cDNAs
that traditionally were unclonable can now be packaged and cloned more
efficiently than shorter cDNAs (Carninci et al., 2001). This has enabled the
preparation of comprehensive cDNA libraries with insert sizes of 2.5–3 kb.
Such libraries yield up to twofold greater diversity of cDNAs by random
sequencing compared with libraries of shorter insert size.
Targeting rare RNAs
The ultimate tool available for maximizing gene discovery by sequencing of
randomly selected cDNA clones is the removal of undesired cDNA sequences
through normalization and subtraction by hybridization (Bonaldo et al.,
1996). In mammalian cells and tissues, the RNAs can be divided into classes
of expression. Relatively few genes may account for up to 20–30% of the total
mass of the mRNAs, whereas the intermediately expressed (1000–2000 different
RNAs) and rarely expressed (>10 000 different RNAs) gene classes account for
the remaining 30–50% and 30–40% of the cellular RNAs, respectively. Although
the proportions of these RNA classes vary in different tissues and cell
types, in order to avoid prohibitive scaling up of sequencing operations it
is mandatory to reduce the frequency of the highly and intermediately
expressed RNAs and increase that of the rarely expressed sequences. Since the
cap-trapper protocol is efficient, we developed methods first to rebalance
the frequency of transcripts representing different genes (normalization)
and, secondly, to remove from the library those cDNAs already collected
(subtraction). Indeed, use of cap-trapped, normalized/subtracted cDNA
libraries is much more efficient for the discovery of novel cDNAs (Carninci
et al., 2000; Hirozane-Kishikawa et al., 2003).
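The benefit of normalization can be illustrated with a small calculation. The sketch below is not the procedure used in the cited work; it only computes the expected number of distinct genes seen after sequencing a given number of random clones, before and after an idealized normalization in which every gene becomes equally frequent. The class sizes and mass fractions are hypothetical values picked inside the ranges quoted above (the count of 10 highly expressed genes is an assumption, as is equal abundance of all genes within a class).

```python
# Hypothetical abundance classes: gene counts and mass fractions are
# illustrative picks inside the ranges quoted in the text, not measured data.
CLASSES = {
    "high": {"genes": 10, "mass": 0.25},
    "intermediate": {"genes": 1500, "mass": 0.40},
    "rare": {"genes": 15000, "mass": 0.35},
}


def per_gene_frequencies(classes):
    """Frequency of a single gene's clones, assuming equal abundance within a class."""
    return {name: c["mass"] / c["genes"] for name, c in classes.items()}


def normalized_frequencies(classes):
    """Idealized normalization: every gene is equally frequent in the library."""
    n_genes = sum(c["genes"] for c in classes.values())
    return {name: 1.0 / n_genes for name in classes}


def expected_distinct(classes, freqs, n_clones):
    """Expected number of distinct genes observed among n_clones random clones."""
    total = 0.0
    for name, c in classes.items():
        p = freqs[name]
        # Probability that a given gene is seen at least once in n_clones draws.
        total += c["genes"] * (1.0 - (1.0 - p) ** n_clones)
    return total


raw = per_gene_frequencies(CLASSES)
norm = normalized_frequencies(CLASSES)
for n in (5_000, 20_000, 50_000):
    print(n,
          round(expected_distinct(CLASSES, raw, n)),
          round(expected_distinct(CLASSES, norm, n)))
```

Under these assumptions the normalized library is expected to yield more distinct genes at every sequencing depth, because clones from abundant genes no longer crowd out the rare class. Real normalization is only partial, so the gain in practice would be smaller than this idealized bound.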
Subtraction and normalization have been widely used to produce diverse EST
libraries rich in novel transcripts, and also for gene discovery in many
organisms including human (Hillier et al., 1996; Marra et al., 1999) and rat
(Scheetz et al., 2004). These libraries have contributed substantially to our
current knowledge of gene structure and its many variations in mRNAs, and to
full-length cDNA-based EST collections (Carninci et al., 2003).
Significantly, normalization and subtraction protocols tend to select
against alternative splicing variants (different mRNAs generated from the same
coding sequence by alternative selection of coding modules contained within
it), and these have been discovered mainly by accident as hybridization
leftovers. Although in the mouse transcriptome we have already identified more
than 78 000 different splicing variants out of 44 000 transcriptional units
(TUs; a TU groups together all of the mRNA sequences that show transcription
overlap, see Glossary) (Carninci et al., 2005), splicing diversity is expected
to be much larger. The comprehensive discovery of splicing variants
necessitates different approaches, some of which may take advantage of
selection of mis-paired nucleic acid hybrids (Watahiki et al., 2004; Thill et
al., 2006). Besides displaying alternative exons, however, new methods will
have to include full-length cDNA cloning, because it is not possible to
reconstruct the structure of full-length mRNA transcripts without full-length
cDNAs.
| Coverage is far from complete |
|---|
|
|
|---|
90% of the RIKEN ESTs), when one
would assume that most genes should have been already discovered
(Carninci et al., 2003).
|
| Tiling arrays identify large RNA complexity |
|---|
|
|
|---|
25%) and that a large part of the
transcript is cell-specific, as almost half of the novel transcripts (and 20%
of the known transcripts) are specific for only one cell line out of eleven
tested (Kampa et al., 2004).
Even more surprisingly, the number of mRNAs that lack polyadenylation is
as large as the number of polyadenylated RNAs (Cheng et al., 2005), and more
than 41.5% of the RNAs are confined to the nuclear regions. As such RNAs were
never considered for gene discovery and there are no ad hoc
technologies for cloning them, we can assume that transcriptome complexity is
at least some fourfold larger than our current description based upon
full-length cDNAs and ESTs (Table 1), which were derived from polyA-plus RNA
isolated from whole RNA enriched for cytosolic RNAs.
|
| CAGE tags suggest a large number of transcripts and their variants |
|---|
|
|
|---|
Tagging technologies (Harbers and Carninci, 2005) have been developed with a
sensitivity at least one order of magnitude greater than that of EST
sequencing. They are used to identify transcripts exhaustively (Ng et al.,
2005), to identify their promoters, and to correlate them with expression
profiling by counting the tags as a digital measure of gene expression
(Harbers and Carninci, 2005; Nilsson et al., 2006). These technologies have
also revealed a surprisingly large degree of fine variability of
transcription start sites (TSSs) and transcription termination sites (TTSs)
(Carninci et al., 2005). In the mouse, we have grouped all the transcripts
into 44 000 transcriptional units (of which fewer than 21 000 are protein
coding). Taking the conservative approach of requiring independent evidence
for both the TSS and the TTS, more than 181 000 independent transcripts were
identified in mouse, whereas there are at least 238 000
independent TSSs and 153 000 TTSs.
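The idea of counting tags as a digital measure of expression can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the cited studies: tags are assumed to be already mapped to the genome, each tag is represented by a hypothetical `(chromosome, strand, position)` tuple for its 5' end, and counts are scaled to tags per million so that libraries of different depth can be compared. The toy library is made up.

```python
from collections import Counter


def count_tags(mapped_tags):
    """Count tags per (chromosome, strand, position) of the tag's 5' end."""
    return Counter(mapped_tags)


def to_tpm(counts):
    """Scale raw counts to tags per million (TPM)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {site: n * 1_000_000 / total for site, n in counts.items()}


# Made-up toy library of mapped tag 5' ends (10 tags in total).
library = (
    [("chr1", "+", 1000)] * 6
    + [("chr1", "+", 1003)] * 3
    + [("chr2", "-", 500)] * 1
)

counts = count_tags(library)
tpm = to_tpm(counts)
for site, value in sorted(tpm.items()):
    print(site, counts[site], value)
```

Because each tag marks one transcript 5' end, the same table gives both the position of candidate start sites and a count-based estimate of how often each is used.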
This variability in TSSs highlights biologically significant differences
between TSSs contained within a single TU, and indicates enormous complexity
in the mechanisms mediating their regulated expression. For instance, CAGE
analysis has identified promoters in the 3' UTRs of many genes (Carninci et
al., 2005). When two genes map tail-to-tail on the genome (i.e. the 3' ends of
genes mapping on opposite genomic strands terminate towards each other),
3' UTR transcription is higher when the two genes map closer to each other:
tail-to-tail pairs with high 3' UTR transcription have an average gap of 2 kb,
compared with 5 kb for pairs with low 3' UTR transcription. Other genes, which
do not map tail-to-tail, also show 3' UTR transcription, but no clear patterns
are evident. In all cases, such 3' UTR transcripts have true, conserved
promoters that can activate transcription of a reporter gene (Carninci et al.,
2006).
CAGE tags allow the classification of the TSS clusters into two main
categories, based on the shape of the TSS. Surprisingly, the largest category
of mammalian promoters does not show a precise TSS, but instead a broad TSS
(spread on average over up to 100 bp), generally associated with promoters
constituted by CpG islands (see Glossary). Within such CpG islands,
transcription starts mostly from pyrimidine/purine dinucleotides, a simplified
consensus of the `initiator' element, and these promoters are generally devoid
of TATA-boxes (see Glossary). A much smaller fraction of promoters show
well-defined, sharp peak TSSs, which are located 29–32 nt downstream of
a classic TATA-box. Genes having TATA-box promoters are also preferentially
associated with the presence of unusual transcripts originating from exons
(Carninci et al., 2006) (reviewed by Sandelin et al., in press). These exonic
transcripts might consist of non-protein-coding regulatory RNAs, which are
speculated to influence the chromatin status. TATA-box-promoted transcripts
tend to be tissue-specific (Gustincich et al., 2006). The brain is an
exception: there, broad CpG promoters seem to be involved in tissue-specific
transcription, suggesting in turn that epigenetic features are particularly
relevant for brain transcriptional control. Elsewhere, CpG promoters generally
promote the transcription of housekeeping genes. The promoter shape can be
defined only when many CAGE tags are identified (>100 per cluster), which
happens in cases of highly and broadly expressed transcripts (8100 mouse and
6900 human promoters); however, all datasets described above have pointed at
the existence of RNAs that are rare and specifically expressed, for which such
general analyses of promoter properties
will require larger CAGE datasets.
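The distinction between sharp and broad TSS clusters can be made concrete with a simplified sketch. It is not the classification scheme of the cited promoter analysis: here a cluster is a hypothetical mapping from genomic position to tag count, its shape is judged from the width of the region holding the central 80% of tags, and the 4 bp cutoff is an arbitrary assumption chosen for illustration. Only the requirement of more than 100 tags per cluster comes from the text.

```python
MIN_TAGS = 100        # from the text: shape is defined only with >100 tags per cluster
SHARP_MAX_WIDTH = 4   # hypothetical cutoff in bp, chosen only for illustration


def central_width(position_counts, lower=0.1, upper=0.9):
    """Width in bp of the region between the lower and upper tag quantiles."""
    total = sum(position_counts.values())
    running = 0
    start = end = None
    for pos in sorted(position_counts):
        running += position_counts[pos]
        if start is None and running >= lower * total:
            start = pos
        if running >= upper * total:
            end = pos
            break
    return end - start + 1


def classify_cluster(position_counts):
    """Label a TSS cluster as 'sharp', 'broad' or 'undetermined' (too few tags)."""
    total = sum(position_counts.values())
    if total <= MIN_TAGS:
        return "undetermined"
    if central_width(position_counts) <= SHARP_MAX_WIDTH:
        return "sharp"
    return "broad"


# Made-up example clusters: position -> number of tags starting there.
sharp_cluster = {100: 5, 101: 140, 102: 5}
broad_cluster = {pos: 3 for pos in range(200, 280)}
sparse_cluster = {10: 20, 12: 30}

for name, cluster in [("sharp_cluster", sharp_cluster),
                      ("broad_cluster", broad_cluster),
                      ("sparse_cluster", sparse_cluster)]:
    print(name, classify_cluster(cluster))
```

Using quantiles rather than the outermost tags keeps a few stray tags from turning a sharp peak into an apparently broad cluster; the sparse example shows why rare transcripts cannot be classified without deeper tag sampling.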
| Full-length cDNAs have been instrumental in the discovery of non-coding RNAs |
|---|
|
|
|---|
Known non-capped RNAs appear to be strongly selected against in the
cap-trapped libraries. Enrichment for capped RNAs during the cap-trapping
selection was calculated to be at least 330-fold (Carninci et al., 2006).
Indeed, although structural RNAs comprise more than 90% of mammalian RNA,
examination of the raw data obtained from RIKEN 3' ESTs (1 512 533
sequences) reveals only 758 ribosomal cDNAs and 6516 mitochondrially derived
cDNAs (of which 3842 were derived from only 12 problematic libraries out of
249). This proportion of cDNAs deriving from non-capped RNA is much lower than
the frequency of these RNAs in cells, suggesting that these novel cDNAs,
lacking coding potential, were unlikely to be genomic cDNA contamination. We
further analyzed these cDNAs computationally and identified a set of 4280
cDNAs that mapped far from existing loci, with multiple lines of evidence for
their existence as bona fide non-coding RNAs (ncRNAs) (Numata et al., 2003).
Experimental validation of novel ncRNAs that map to the mouse Gnas locus
demonstrated the existence of eight new imprinted transcripts (Holmes et al.,
2003). Further large-scale validation showed that ncRNAs are dynamically
regulated in macrophages upon induction with lipopolysaccharides, further
confirming that they are real RNA transcripts (Ravasi et al., 2006).
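The scale of this depletion is easier to judge as a fraction. The short calculation below uses only the EST counts quoted above; treating the ribosomal and mitochondrial cDNAs together as the "non-capped" set follows the wording of the text and is otherwise an assumption of this sketch.

```python
# Counts quoted in the text for the RIKEN 3' EST dataset.
TOTAL_3P_ESTS = 1_512_533
RIBOSOMAL = 758
MITOCHONDRIAL = 6_516
MITO_FROM_PROBLEM_LIBS = 3_842  # from 12 problematic libraries out of 249

# cDNAs derived from non-capped RNA, as grouped in the text.
noncapped = RIBOSOMAL + MITOCHONDRIAL
noncapped_fraction = noncapped / TOTAL_3P_ESTS

# Share of the mitochondrial cDNAs contributed by the problematic libraries.
problem_share = MITO_FROM_PROBLEM_LIBS / MITOCHONDRIAL

print(f"non-capped cDNAs: {noncapped} ({noncapped_fraction:.2%} of all 3' ESTs)")
print(f"mitochondrial cDNAs from problematic libraries: {problem_share:.0%}")
```

The non-capped cDNAs therefore make up roughly half a percent of the dataset, and more than half of the mitochondrial ones trace back to a small number of libraries.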
Further insights into the function of the ncRNAs derive from the observation
that a large fraction of RNAs are transcribed from both strands of the
genome, thus forming sense–antisense (S/AS) transcript pairs, in which
ncRNAs are often involved. These were first identified in the mouse (Okazaki
et al., 2002; Kiyosawa et al., 2003) and later in human (Yelin et al., 2003).
Further analysis showed that antisense ncRNAs are dynamically regulated and
tend to be nuclear (Kiyosawa et al., 2005). CAGE tag data suggested that the
extent of S/AS transcription is much larger than previously estimated, by
identification of bidirectional transcription for 72% of the TUs, and in
particular for 86% of the TUs that map in genomic imprinted regions (loci
containing genes that are expressed either paternally or maternally),
suggesting that these transcripts may be involved in regulating entire complex
loci (Katayama et al., 2005). The S/AS rate was further supported by an
estimate of 50% from mouse Serial Analysis of Gene Expression (SAGE) data
(Siddiqui et al., 2005).
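Detecting S/AS pairs from mapped transcripts amounts to finding overlapping intervals on opposite strands. The sketch below is a minimal illustration, not the method of the cited studies: the `Transcript` record, the 0-based half-open coordinates and the toy transcripts are all assumptions, and a real analysis would work on exon-level overlaps and use an indexed search rather than comparing every pair.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # exclusive


def overlaps(a, b):
    """True if two transcripts share at least one genomic position."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def sense_antisense_pairs(transcripts):
    """Return (plus-strand, minus-strand) name pairs whose intervals overlap."""
    plus = [t for t in transcripts if t.strand == "+"]
    minus = [t for t in transcripts if t.strand == "-"]
    return [(p.name, m.name) for p in plus for m in minus if overlaps(p, m)]


# Made-up toy transcripts.
toy = [
    Transcript("A", "chr1", "+", 100, 500),
    Transcript("B", "chr1", "-", 400, 900),   # overlaps A on the opposite strand
    Transcript("C", "chr1", "+", 1000, 1500),
    Transcript("D", "chr2", "-", 100, 500),   # same coordinates as A, other chromosome
]

pairs = sense_antisense_pairs(toy)
involved = {name for pair in pairs for name in pair}
print(pairs)
print(f"{len(involved)} of {len(toy)} transcripts are in an S/AS pair")
```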
Further evidence of the regulation logic derives from the identification of
over 2000 `chains', or groups of transcriptional units that are overlapping or
share a bidirectional promoter. These chains are to some extent conserved
between mouse and human and are hypothesized to group genes under the same
epigenetic regulation (Engstrom et al., 2006).
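Grouping TUs into chains is a connected-components problem: two TUs are linked if they overlap or share a bidirectional promoter, and a chain is any set of TUs connected through such links. The sketch below illustrates this with a small union-find; it is not the procedure of the cited study. The `TU` record, the toy coordinates and the 1000 bp limit used to call a promoter bidirectional (a minus-strand TU ending just upstream of a plus-strand TU start) are hypothetical choices.

```python
from typing import NamedTuple


class TU(NamedTuple):
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # exclusive


MAX_PROMOTER_GAP = 1000  # hypothetical distance limit for a shared promoter, in bp


def linked(a, b):
    """True if two TUs overlap or are divergent with closely spaced 5' ends."""
    if a.chrom != b.chrom:
        return False
    if a.start < b.end and b.start < a.end:
        return True  # overlapping, on either strand
    if a.strand == b.strand:
        return False
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    gap = plus.start - minus.end  # distance between the two 5' ends
    return 0 <= gap <= MAX_PROMOTER_GAP


def chains(tus):
    """Group linked TUs into chains (connected components with >1 member)."""
    parent = list(range(len(tus)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(tus)):
        for j in range(i + 1, len(tus)):
            if linked(tus[i], tus[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, tu in enumerate(tus):
        groups.setdefault(find(i), []).append(tu.name)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Made-up toy TUs.
toy_tus = [
    TU("T1", "chr1", "+", 1000, 5000),
    TU("T2", "chr1", "-", 4000, 8000),    # overlaps T1
    TU("T3", "chr1", "-", 200, 800),      # divergent from T1, 200 bp apart
    TU("T4", "chr1", "+", 50000, 60000),  # isolated
    TU("T5", "chr2", "+", 1000, 5000),    # other chromosome
]

print(chains(toy_tus))
```

Note that T2 and T3 are not directly linked to each other; they end up in the same chain only through T1, which is the behaviour the word `chain' implies.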
The enormous transcripts (ENEOR) consist of a group of at least 66 very
large (92 kb average) non-polyadenylated noncoding RNAs, which have not
been clonable with standard techniques due to the size limitation of cloning
vectors. These were identified by observing the presence of 3'-truncated
cDNA clones primed in A-rich stretches, and reconstructing their structure by
multiple RT–PCR. These ENEOR span very large regions, including various
TUs, identify imprinted and micro-RNA (miRNA) genes, and may have a regulatory
effect on the chromatin, as in the case of the AIR gene (Furuno et al., 2006).
The observation of ENEOR is in line with the initial analysis of
5'–3' ditags. In fact, a large part of the cDNA population
of primary lambda libraries, constituted by cDNAs longer than 6–7 kb
(Carninci et al., 2002), usually does not survive large-scale
propagation/sequencing operations. To overcome this, we prepared libraries
containing only tags from the 5' and 3' ends of transcripts (Carninci et al.,
2005) that were derived from large insert size cDNAs cloned in lambda FLC
vectors (Carninci et al., 2001), which allow cloning of cDNAs without size
bias as long as the cDNAs do not exceed 15 kbp. Large-scale sequencing of
these ditag libraries suggests not only that the total number of independent
transcripts is larger than that identified with full-length cDNAs, but also
that there are very large transcribed genomic regions called gene forests (see
Glossary). Large RNAs identified by ditags span regions as large as 2 Mbp and
group the TUs identified by cDNA into very large transcribed forests (Carninci
et al., 2005). These 5'–3' ditags represent the borders of a part of the
missing transcriptome.
The identification of non-coding RNA was initially met with scepticism,
mainly because they are relatively poorly conserved between species (Wang et
al., 2004; Pang et al., 2006). Despite this, their putative promoters are well
conserved (Carninci et al., 2005), suggesting that their expression rather
than their sequence may be biologically more important. As they may be
involved in S/AS, or produce shorter RNAs (such as miRNAs), their full-length
sequence conservation might indeed not be biologically relevant. For a more
dedicated discussion on the function of these non-protein-coding RNAs, see
Mattick, 2003; Mehler and Mattick, 2006; Mattick and Makunin, 2006; Mattick,
2007; Carninci, 2006.
| Missing transcriptome |
|---|
|
|
|---|
Human and rat transcriptomes have also been extensively sampled using
subtracted/normalized ESTs from non-full-length libraries. The main difference
from the RIKEN project is that the other widespread normalization/subtraction
technology (Bonaldo et al., 1996) uses double-stranded cDNA drivers. This is
likely to remove antisense as well as sense cDNAs, thereby rendering
comparisons of S/AS across different transcriptome datasets irrelevant.
The widespread existence of non-coding human RNA transcription was recently
vindicated by work with whole-genome tiling arrays: upon experimental
validation, an S/AS transcription rate of some 60% was confirmed in the human
genome (Cheng et al., 2005).
Different methodologies give rise to very great differences among datasets.
In contrast with genome sequencing, where shotgun strategies are well
established, it is clear that we have not yet established a universal strategy
for analyzing the transcriptome, which differs from the genome in its inherent
complexity. Genome sequencing alone is insufficient to compare biological
phenomena because (1) comparative analysis cannot interpret a large fraction
of conserved but not expressed genomic regions, (2) expressed RNAs and
regulatory elements, including promoters, show different levels of
conservation, and (3) low or absent conservation may be important for
species-specific structural and regulatory functions. For example, broad,
CpG-type human promoters are evolutionarily more plastic and, in the recent
human lineage compared with the chimpanzee, have mutated faster than average
genomic regions, in contrast to sharp, TATA-box promoters, which tend
generally to be more conserved (Taylor et al., 2006; Carninci et al., 2006).
Because there is such a variable degree of conservation of RNAs and regulatory
elements, strategies based on genome conservation to identify genes and
expressed transcripts are unacceptably hypothesis-bound.
Conversely, transcriptomics datasets are still very far from being comprehensive and comparable, due to lack of sampling, shallow sequencing, subtraction and normalization and diversification of libraries. Transcriptome analysis takes advantage of the specific interest of scientists in particular sets of expressed genes in particular tissues, but data is not systematically collected, and consequently comparison of transcriptome datasets between different organisms is inconclusive.
| Even more among short RNA |
|---|
|
|
|---|
23 nt)
control transcript levels in mammalian cells
(Elbashir et al., 2001
100 gene-poor regions of the
genome (reviewed in Kim, 2006).
| How many RNAs are there in a mammal? |
|---|
|
|
|---|
The task of identifying all of these different RNAs remains a substantial
challenge that requires us to develop novel methodologies beyond the
whole-genome tiling arrays (which cannot distinguish different overlapping
transcripts and their splicing variants), the tagging technologies and
individual cDNA clone analysis. Although sequencing short RNAs would fit the
novel generation of sequencers developed for the $1000 genome project
perfectly (Bennett et al., 2005; Margulies et al., 2005), this would not lend
itself to the discovery of large (m)RNAs, because determining the physical
combination of exons in each splicing variant requires sequencing of
individual full-length cDNAs. Additionally, novel technologies would need to
collect full-length cDNAs from many more different and rare cell types from
mammalian organs, and eventually from the unexplored RNomics regions
(polyA-minus and nuclear RNAs). Although the $1000 genome project might become
feasible in a few years, a $1000 high-resolution transcriptome is well beyond
our cloning technologies due to the elusive nature of different RNA classes.
Despite these difficulties, and because comprehensive transcriptome analysis adds so much value to genome sequencing, I argue for the strategic need to standardize transcript collection methods based on comprehensive cell and condition sampling with multiple types of transcriptome libraries, combined with novel high-throughput sequencing systems. Expanding this in the comparative direction by addressing the transcriptomes of as yet unexplored organisms will surely yield biological surprises and even more novelty.
| Acknowledgments |
|---|
| Footnotes |
|---|
| References |
|---|
|
|
|---|
Banerjee, A. K. (1980). 5'-terminal cap structure in eukaryotic messenger ribonucleic acids. Microbiol. Rev. 44, 175-205.
Bennett, S. T., Barnes, C., Cox, A., Davies, L. and Brown, C. (2005). Toward the 1,000 dollars human genome. Pharmacogenomics 6, 373-382.
Bertone, P., Stolc, V., Royce, T. E., Rozowsky, J. S., Urban, A. E., Zhu, X., Rinn, J. L., Tongprasit, W., Samanta, M., Weissman, S. et al. (2004). Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242-2246.
Bonaldo, M. F., Lennon, G. and Soares, M. B. (1996). Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res. 6, 791-806.
Carninci, P. (2006). Tagging mammalian transcription complexity. Trends Genet. 22, 501-510.
Carninci, P. and Hayashizaki, Y. (1999). High-efficiency full-length cDNA cloning. Meth. Enzymol. 303, 19-44.
Carninci, P., Kvam, C., Kitamura, A., Ohsumi, T., Okazaki, Y., Itoh, M., Kamiya, M., Shibata, K., Sasaki, N., Izawa, M. et al. (1996). High-efficiency full-length cDNA cloning by biotinylated CAP trapper. Genomics 37, 327-336.
Carninci, P., Westover, A., Nishiyama, Y., Ohsumi, T., Itoh, M., Nagaoka, S., Sasaki, N., Okazaki, Y., Muramatsu, M., Schneider, C. et al. (1997). High efficiency selection of full-length cDNA by improved biotinylated cap trapper. DNA Res. 4, 61-66.
Carninci, P., Nishiyama, Y., Westover, A., Itoh, M., Nagaoka, S., Sasaki, N., Okazaki, Y., Muramatsu, M. and Hayashizaki, Y. (1998). Thermostabilization and thermoactivation of thermolabile enzymes by trehalose and its application for the synthesis of full length cDNA. Proc. Natl. Acad. Sci. USA 95, 520-524.
Carninci, P., Shibata, Y., Hayatsu, N., Sugahara, Y., Shibata, K., Itoh, M., Konno, H., Okazaki, Y., Muramatsu, M. and Hayashizaki, Y. (2000). Normalization and subtraction of cap-trapper-selected cDNAs to prepare full-length cDNA libraries for rapid discovery of new genes. Genome Res. 10, 1617-1630.
Carninci, P., Shibata, Y., Hayatsu, N., Itoh, M., Shiraki, T., Hirozane, T., Watahiki, A., Shibata, K., Konno, H., Muramatsu, M. et al. (2001). Balanced-size and long-size cloning of full-length, cap-trapped cDNAs into vectors of the novel lambda-FLC family allows enhanced gene discovery rate and functional analysis. Genomics 77, 79-90.
Carninci, P., Shiraki, T., Mizuno, Y., Muramatsu, M. and Hayashizaki, Y. (2002). Extra-long first-strand cDNA synthesis. Biotechniques 32, 984-985.
Carninci, P., Waki, K., Shiraki, T., Konno, H., Shibata, K., Itoh, M., Aizawa, K., Arakawa, T., Ishii, Y., Sasaki, D. et al. (2003). Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia. Genome Res. 13, 1273-1289.
Carninci, P., Kasukawa, T., Katayama, S., Gough, J., Frith, M. C., Maeda, N., Oyama, R., Ravasi, T., Lenhard, B., Wells, C. et al. (2005). The transcriptional landscape of the mammalian genome. Science 309, 1559-1563.
Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., Semple, C. A., Taylor, M. S., Engstrom, P. G., Frith, M. C. et al. (2006). Genome-wide analysis of mammalian promoter architecture and evolution. Nat. Genet. 38, 626-635.
Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H., Helt, G. et al. (2005). Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308, 1149-1154.
Das, M., Harvey, I., Chu, L. L., Sinha, M. and Pelletier, J. (2001). Full-length cDNAs: more than just reaching the ends. Physiol. Genomics 6, 57-80.
De Virgilio, C., Hottiger, T., Dominguez, J., Boller, T. and Wiemken, A. (1994). The role of trehalose synthesis for the acquisition of thermotolerance in yeast. I. Genetic evidence that trehalose is a thermoprotectant. Eur. J. Biochem. 219, 179-186.
Edery, I., Chu, L. L., Sonenberg, N. and Pelletier, J. (1995). An efficient strategy to isolate full-length cDNAs based on an mRNA cap retention procedure (CAPture). Mol. Cell. Biol. 15, 3363-3371.
Elbashir, S. M., Harborth, J., Lendeckel, W., Yalcin, A., Weber, K. and Tuschl, T. (2001). Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature 411, 494-498.
Engstrom, P. G., Suzuki, H., Ninomiya, N., Akalin, A., Sessa, L., Lavorgna, G., Brozzi, A., Luzi, L., Tan, S. L., Yang, L. et al. (2006). Complex Loci in human and mouse genomes. PLoS Genet. 2, e47.
Fire, A., Xu, S., Montgomery, M. K., Kostas, S. A., Driver, S. E. and Mello, C. C. (1998). Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806-811.
Frith, M. C., Wilming, L. G., Forrest, A., Kawaji, H., Tan, S. L., Wahlestedt, C., Bajic, V. B., Kai, C., Kawai, J., Carninci, P. et al. (2006). Pseudo-messenger RNA: phantoms of the transcriptome. PLoS Genet. 2, e23.
Furuno, M., Pang, K. C., Ninomiya, N., Fukuda, S., Frith, M. C., Bult, C., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y. et al. (2006). Clusters of internally-primed transcripts reveal novel long noncoding RNAs. PLoS Genet. 2, e37.
Gerhard, D. S., Wagner, L., Feingold, E. A., Shenmen, C. M., Grouse, L. H., Schuler, G., Klein, S. L., Old, S., Rasooly, R., Good, P. et al. (2004). The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC). Genome Res. 14, 2121-2127.
Gustincich, S., Sandelin, A., Plessy, C., Katayama, S., Simone, R., Lazarevic, D., Hayashizaki, Y. and Carninci, P. (2006). The complexity of the mammalian transcriptome. J. Physiol. 575, 321-332.
Harbers, M. and Carninci, P. (2005). Tag-based approaches for transcriptome research and genome annotation. Nat. Methods 2, 495-502.
Hayashizaki, Y. and Carninci, P. (2006). Genome Network and FANTOM3: assessing the complexity of the transcriptome. PLoS Genet. 2, e63.
Hillier, L. D., Lennon, G., Becker, M., Bonaldo, M. F.,
Chiapelli, B., Chissoe, S., Dietrich, N., DuBuque, T., Favello, A., Gish, W.
et al. (1996). Generation and analysis of 280,000 human
expressed sequence tags. Genome Res.
6, 807-828.
Hirozane-Kishikawa, T., Shiraki, T., Waki, K., Nakamura, M., Arakawa, T., Kawai, J., Fagiolini, M., Hensch, T. K., Hayashizaki, Y. and Carninci, P. (2003). Subtraction of cap-trapped full-length cDNA libraries to select rare transcripts. Biotechniques 35,510 -516, 518.[Medline]
Holmes, R., Williamson, C., Peters, J., Denny, P. and Wells,
C. (2003). A comprehensive transcript map of the mouse Gnas
imprinted complex. Genome Res.
13,1410
-1415.
Hottiger, T., De Virgilio, C., Hall, M. N., Boller, T. and Wiemken, A. (1994). The role of trehalose synthesis for the acquisition of thermotolerance in yeast. II. Physiological concentrations of trehalose increase the thermal stability of proteins in vitro. Eur. J. Biochem. 219, 187-193.
Imanishi, T., Itoh, T., Suzuki, Y., O'Donovan, C., Fukuchi, S., Koyanagi, K. O., Barrero, R. A., Tamura, T., Yamaguchi-Kabata, Y., Tanino, M. et al. (2004). Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol. 2, e162.
Jackson, D. A., Pombo, A. and Iborra, F. (2000). The balance sheet for transcription: an analysis of nuclear RNA metabolism in mammalian cells. FASEB J. 14, 242-254.
Kampa, D., Cheng, J., Kapranov, P., Yamanaka, M., Brubaker, S., Cawley, S., Drenkow, J., Piccolboni, A., Bekiranov, S., Helt, G. et al. (2004). Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Res. 14, 331-342.
Katayama, S., Tomaru, Y., Kasukawa, T., Waki, K., Nakanishi, M., Nakamura, M., Nishida, H., Yap, C. C., Suzuki, M., Kawai, J. et al. (2005). Antisense transcription in the mammalian transcriptome. Science 309, 1564-1566.
Kato, S., Sekine, S., Oh, S. W., Kim, N. S., Umezawa, Y., Abe, N., Yokoyama-Kobayashi, M. and Aoki, T. (1994). Construction of a human full-length cDNA bank. Gene 150, 243-250.
Kawai, J., Shinagawa, A., Shibata, K., Yoshino, M., Itoh, M., Ishii, Y., Arakawa, T., Hara, A., Fukunishi, Y., Konno, H. et al. (2001). Functional annotation of a full-length mouse cDNA collection. Nature 409, 685-690.
Kim, V. N. (2006). Small RNAs just got bigger: Piwi-interacting RNAs (piRNAs) in mammalian testes. Genes Dev. 20, 1993-1997.
Kiyosawa, H., Yamanaka, I., Osato, N., Kondo, S. and Hayashizaki, Y. (2003). Antisense transcripts with FANTOM2 clone set and their implications for gene regulation. Genome Res. 13, 1324-1334.
Kiyosawa, H., Mise, N., Iwase, S., Hayashizaki, Y. and Abe, K. (2005). Disclosing hidden transcripts: mouse natural sense-antisense transcripts tend to be poly(A) negative and nuclear localized. Genome Res. 15, 463-474.
Kodzius, R., Kojima, M., Nishiyori, H., Nakamura, M., Fukuda, S., Tagami, M., Sasaki, D., Imamura, K., Kai, C., Harbers, M. et al. (2006). CAGE: cap analysis of gene expression. Nat. Methods 3, 211-222.
Lau, N. C., Lim, L. P., Weinstein, E. G. and Bartel, D. P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.
Liang, F., Holt, I., Pertea, G., Karamycheva, S., Salzberg, S. L. and Quackenbush, J. (2000). Gene index analysis of the human genome estimates approximately 120,000 genes. Nat. Genet. 25, 239-240.
Lu, C., Tej, S. S., Luo, S., Haudenschild, C. D., Meyers, B. C. and Green, P. J. (2005). Elucidation of the small RNA component of the transcriptome. Science 309, 1567-1569.
Maeda, N., Kasukawa, T., Oyama, R., Gough, J., Frith, M., Engstrom, P. G., Lenhard, B., Aturaliya, R. N., Batalov, S., Beisel, K. W. et al. (2006). Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs. PLoS Genet. 2, e62.
Margulies, M., Egholm, M., Altman, W. E., Attiya, S., Bader, J. S., Bemben, L. A., Berka, J., Braverman, M. S., Chen, Y. J., Chen, Z. et al. (2005). Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376-380.
Marra, M., Hillier, L., Kucaba, T., Allen, M., Barstead, R., Beck, C., Blistain, A., Bonaldo, M., Bowers, Y., Bowles, L. et al. (1999). An encyclopedia of mouse genes. Nat. Genet. 21, 191-194.
Maruyama, K. and Sugano, S. (1994). Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides. Gene 138, 171-174.
Mattick, J. S. (2003). Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms. BioEssays 25, 930-939.
Mattick, J. S. (2007). A new paradigm for developmental biology. J. Exp. Biol. 210, 1526-1547.
Mattick, J. S. and Makunin, I. V. (2006). Non-coding RNA. Hum. Mol. Genet. 15 Suppl. 1, R17-R29.
Mehler, M. F. and Mattick, J. S. (2006). Non-coding RNAs in the nervous system. J. Physiol. 575, 333-341.
Mineno, J., Okamoto, S., Ando, T., Sato, M., Chono, H., Izu, H., Takayama, M., Asada, K., Mirochnitchenko, O., Inouye, M. et al. (2006). The expression profile of microRNAs in mouse embryos. Nucleic Acids Res. 34, 1765-1771.
Miura, K. (1981). The cap structure in eukaryotic messenger RNA as a mark of a strand carrying protein information. Adv. Biophys. 14, 205-238.
Mockler, T. C., Chan, S., Sundaresan, A., Chen, H., Jacobsen, S. E. and Ecker, J. R. (2005). Applications of DNA tiling arrays for whole-genome analysis. Genomics 85, 1-15.
Ng, P., Wei, C. L., Sung, W. K., Chiu, K. P., Lipovich, L., Ang, C. C., Gupta, S., Shahab, A., Ridwan, A., Wong, C. H. et al. (2005). Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat. Methods 2, 105-111.
Nilsson, R., Bajic, V. B., Suzuki, H., di Bernardo, D., Bjorkegren, J., Katayama, S., Reid, J. F., Sweet, M. J., Gariboldi, M., Carninci, P. et al. (2006). Transcriptional network dynamics in macrophage activation. Genomics 88, 133-142.
Numata, K., Kanai, A., Saito, R., Kondo, S., Adachi, J., Wilming, L. G., Hume, D. A., Hayashizaki, Y. and Tomita, M. (2003). Identification of putative noncoding RNAs among the RIKEN mouse full-length cDNA collection. Genome Res. 13, 1301-1306.
Okazaki, Y., Furuno, M., Kasukawa, T., Adachi, J., Bono, H., Kondo, S., Nikaido, I., Osato, N., Saito, R., Suzuki, H. et al. (2002). Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420, 563-573.
Ota, T., Suzuki, Y., Nishikawa, T., Otsuki, T., Sugiyama, T., Irie, R., Wakamatsu, A., Hayashi, K., Sato, H., Nagai, K. et al. (2004). Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat. Genet. 36, 40-45.
Pang, K. C., Frith, M. C. and Mattick, J. S. (2006). Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends Genet. 22, 1-5.
Ravasi, T., Suzuki, H., Pang, K. C., Katayama, S., Furuno, M., Okunishi, R., Fukuda, S., Ru, K., Frith, M. C., Gongora, M. M. et al. (2006). Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome. Genome Res. 16, 11-19.
Sandelin, A., Carninci, P., Lenhard, B., Ponjavic, J., Hayashizaki, Y. and Hume, D. (in press). Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat. Rev. Genet.
Sasaki, N., Nagaoka, S., Itoh, M., Izawa, M., Konno, H., Carninci, P., Yoshiki, A., Kusakabe, M., Moriuchi, T., Muramatsu, M. et al. (1998). Characterization of gene expression in mouse blastocyst using single-pass sequencing of 3995 clones. Genomics 49, 167-179.
Scheetz, T. E., Laffin, J. J., Berger, B., Holte, S., Baumes, S. A., Brown, R., 2nd, Chang, S., Coco, J., Conklin, J., Crouch, K. et al. (2004). High-throughput gene discovery in the rat. Genome Res. 14, 733-741.
Shiraki, T., Kondo, S., Katayama, S., Waki, K., Kasukawa, T., Kawaji, H., Kodzius, R., Watahiki, A., Nakamura, M., Arakawa, T. et al. (2003). Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc. Natl. Acad. Sci. USA 100, 15776-15781.
Siddiqui, A. S., Khattra, J., Delaney, A. D., Zhao, Y., Astell, C., Asano, J., Babakaiff, R., Barber, S., Beland, J., Bohacec, S. et al. (2005). A mouse atlas of gene expression: large-scale digital gene-expression profiles from precisely defined developing C57BL/6J mouse tissues and cells. Proc. Natl. Acad. Sci. USA 102, 18485-18490.
Strausberg, R. L., Feingold, E. A., Grouse, L. H., Derge, J. G., Klausner, R. D., Collins, F. S., Wagner, L., Shenmen, C. M., Schuler, G. D., Altschul, S. F. et al. (2002). Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc. Natl. Acad. Sci. USA 99, 16899-16903.
Taylor, M. S., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y. and Semple, C. A. (2006). Heterotachy in mammalian promoter evolution. PLoS Genet. 2, e30.
Theissen, H., Etzerodt, M., Reuter, R., Schneider, C., Lottspeich, F., Argos, P., Luhrmann, R. and Philipson, L. (1986). Cloning of the human cDNA for the U1 RNA-associated 70K protein. EMBO J. 5, 3209-3217.
Thill, G., Castelli, V., Pallud, S., Salanoubat, M., Wincker, P., de la Grange, P., Auboeuf, D., Schachter, V. and Weissenbach, J. (2006). ASEtrap: a biological method for speeding up the exploration of spliceomes. Genome Res. 16, 776-786.
Wang, J., Zhang, J., Zheng, H., Li, J., Liu, D., Li, H., Samudrala, R., Yu, J. and Wong, G. K. (2004). Mouse transcriptome: neutral evolution of 'non-coding' complementary DNAs. Nature 431, 1 p following 757; discussion following 757.
Watahiki, A., Waki, K., Hayatsu, N., Shiraki, T., Kondo, S., Nakamura, M., Sasaki, D., Arakawa, T., Kawai, J., Harbers, M. et al. (2004). Libraries enriched for alternatively spliced exons reveal splicing patterns in melanocytes and melanomas. Nat. Methods 1, 233-239.
Yelin, R., Dahary, D., Sorek, R., Levanon, E. Y., Goldstein, O., Shoshan, A., Diber, A., Biton, S., Tamir, Y., Khosravi, R. et al. (2003). Widespread occurrence of antisense transcription in the human genome. Nat. Biotechnol. 21, 379-386.
Zhu, Y. Y., Machleder, E. M., Chenchik, A., Li, R. and Siebert, P. D. (2001). Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. Biotechniques 30, 892-897.