First published online April 20, 2007
Journal of Experimental Biology 210, 1492-1496 (2007)
Published by The Company of Biologists 2007
doi: 10.1242/jeb.02783
Glossary of terms
This section is designed to help readers adapt to the complex terminology
associated with contemporary molecular genetics, genomics and systems biology.
Fuller descriptions of these terms are available at
http://www.wikipedia.org/
- Ab initio prediction
- computational methods for predicting the genes potentially encoded in a
genome. They are trained on datasets of known genes and then used to predict
coding regions in genomic sequence without the aid of cDNA sequence.
Although their performance is improving, these algorithms perform very poorly
on non-protein-coding genes.
- Annotation
- as applied to proteins, DNA sequences or genes. The storage of data
describing these entities (protein/gene identities, DNA motifs, gene ontology
categorisation, etc.) within a biological database. Active projects include
FlyBase and WormBase. See Gene ontology.
- Assembly
- the process of aligning sequenced fragments of DNA into their correct
positions within the chromosome or transcript.
- cDNA
- complementary DNA. This is DNA synthesised from a mature mRNA template
by the enzyme reverse transcriptase. cDNA is frequently used as an early part
of gene cloning procedures, since it is more robust and less subject to
degradation than the mRNA itself.
- ChIP
- chromatin immunoprecipitation. An assay used to determine which segments
of genomic DNA are bound by chromatin proteins, mainly transcription
factors.
- Chip
- see Microarray.
- ChIP-on-chip
- use of a DNA microarray to analyse the DNA generated from chromatin
immunoprecipitation experiments (see ChIP).
- cis-acting
- a molecule is described as cis-acting when it affects other
genes that are physically adjacent, on the same chromosome, or are genetically
linked or in close proximity (for mRNA expression, typically a
promoter).
- Collision-induced dissociation
- a mechanism by which molecular ions (e.g. of proteins) are fragmented in
the gas phase. These fragments are then analysed within a mass spectrometer
to provide mass determination.
- Connectivity
- a term from graph theory, which indicates the number of connections
between nodes or vertices in a network. Greater connectedness between nodes is
generally used as a measure of robustness of a network.
- CpG islands
- regions that show high density of `C followed by G' dinucleotides and
are generally associated with promoter elements; in particular, stretches of
DNA of at least 200 bp with a GC content of at least 50% and an observed
CpG/expected CpG ratio in excess of 0.6. The cytosine residues can be methylated,
generally to repress transcription, while demethylated CpGs are a hallmark of
transcription. CpG dinucleotides are under-represented outside regulatory
regions, such as promoters, because methylated C mutates into T by
deamination.
- Edge
- as in networks. Connects two nodes (or vertices) within a system. These
concepts arise from graph theory.
- Enhancer
- a short segment of genomic DNA that may be located remotely and that,
on binding particular proteins (trans-acting factors), increases the
rate of transcription of a specific gene or gene cluster.
- Epistasis
- a phenomenon in which the properties of one gene are modified by one or
more genes at other loci. Otherwise known as a genetic interaction, although
epistasis refers to the statistical properties of the phenomenon.
- eQTL
- expression quantitative trait locus. Identified by combining conventional
QTL analysis with gene expression profiling, typically using microarrays.
eQTLs describe regulatory elements controlling the expression of genes
involved in specific traits.
- EST
- expressed sequence tag. A short DNA sequence determined for a cloned
cDNA representing portions of an expressed gene. The sequence is generally
several hundred base pairs from one or both ends of the cloned
insert.
- Exaptation
- a biological adaptation where the current function is not that which
was originally evolved. Thus, the defining (derived) function might replace or
persist with the earlier, evolved adaptation.
- Exon
- any region of DNA that is transcribed to the final (spliced) mRNA
molecule. Exons interleave with segments of non-coding DNA (introns) that are
removed (spliced out) during processing after transcription.
- Gene forests
- genomic regions for which RNA transcripts, produced from either DNA
strand, have been identified without gaps (non-transcribed genomic regions).
Conversely, regions in which no transcripts have ever been detected are called
`gene deserts'.
- Gene interaction network
- a network of functional interactions between genes. Functional
interactions can be inferred from many different data types, including
protein-protein interactions, genetic interactions, co-expression
relationships, the co-inheritance of genes across genomes and the arrangement
of genes in bacterial genomes. The interactions can be represented using
network diagrams, with lines connecting the interacting elements, and can be
modelled using differential equations.
- Gene ontology (GO)
- an ontology is a controlled vocabulary of terms that have logical
relationships with each other and that are amenable to computerised
manipulation. The Gene Ontology project has devised terms in three domains:
biological process, molecular function and cellular component. Each gene or DNA
sequence can be associated with these annotation terms from each domain, and
this enables analysis of microarray data on groups of genes based on
descriptive terms so provided. See
http://www.geneontology.org
- Gene set enrichment analysis
- a computational method that determines whether a defined set of genes,
usually based on their common involvement in a biological process, shows
statistically significant differences in transcript expression between two
biological states.
- Gene silencing
- the switching-off of a gene by an epigenetic mechanism at the
transcriptional or post-transcriptional levels. Includes the mechanism of
RNAi.
- Genetic interaction (network)
- a genetic interaction between two genes occurs when the phenotypic
consequences of a mutation in one gene are modified by the mutational status
at a second locus. Genetic interactions can be aggravating (enhancing) or
alleviating (suppressing). To date, most high-throughput studies have focussed
on systematically identifying synthetic lethal or sick (aggravating)
interactions, which can then be visualised as a network of functional
interactions (edges) between genes (nodes).
- Genome
- a portmanteau of gene and chromosome, the entire
hereditary information for an organism that is embedded in the DNA (or, for
some viruses, in RNA). Includes protein-coding and non-coding
sequences.
- Heritability
- phenotypic variation within a population is attributable to the genetic
variation between individuals and to environmental factors. Heritability is
the proportion of that phenotypic variation due to genetic variation, usually
expressed as a percentage.
- Heterologous hybridization
- the use of a cDNA or oligonucleotide microarray of probes designed for
one species with target cRNA/cDNAs from a different species.
- Homeotic
- the transformation of one body part to another due to mutation of
specific developmentally related genes, notably the Hox genes in
animals and MADS-box genes in plants.
- Hub
- as in networks. A node with high connectivity, and thus which interacts
with many other nodes in the network. A hub protein interacts with many other
proteins in a cell.
- Hybridisation
- the process of joining (annealing) two complementary single-stranded
DNAs into a single double-stranded molecule. In microarray analysis, the
target RNA/DNA from the subject under investigation is denatured and
hybridised to probes that are immobilised on a solid phase (e.g. a glass
microscope slide).
- Hypomorph
- in genetics, a loss-of-function mutation in a gene that causes only a
partial reduction in the activity it influences, rather than a complete
loss (cf. hypermorph, antimorph, neomorph, etc).
- Imprinting
- a phenomenon where two inherited copies of a gene are regulated in
opposite ways, one being expressed and the other being repressed.
- Indel
- insertion and deletion of DNA, referring to two
types of genetic mutation. To be distinguished from a `point mutation', which
refers to the substitution of a single base.
- Interactome
- a more or less comprehensive set of interactions between elements
within cells. Usually applied to genes or proteins as defined by
transcriptomic, proteomic or protein-protein interaction data.
- Intron
- see Exon.
- KEGG
- The Kyoto Encyclopedia of Genes and
Genomes is a database of metabolic and other pathways collected
from a variety of organisms. See
http://www.genome.jp/kegg
- Metabolomics
- the systematic qualitative and quantitative analysis of small chemical
metabolite profiles. The metabolome represents the collection of metabolites
within a biological sample.
- Metagenomics
- the application of genomic techniques to characterise complex
communities of microbial organisms obtained directly from environmental
samples. Typically, genomic tags are sequenced and characterised as markers of each
species to inform on the range and abundance of species in the
community.
- Microarray
- an arrayed set of probes for detecting molecularly specific analytes or
targets. Typically, the probes are composed of DNA segments that are
immobilised onto the solid surface, each of which can hybridise with a
specific DNA present in the target preparation. DNA microarrays are used for
profiling of gene transcripts.
- Model species
- a species used to study particular biological phenomena, the outcome
offering insights into the workings of other species. Usually, the selection
is based on experimental tractability, particularly ease of genetic
manipulation. For the geneticist, it is an organism with inbred lines where
sibs will be >98% identical (e.g. Drosophila, Caenorhabditis
elegans and mice). For genomic science, it refers to a species for which
the genomic DNA has been sequenced.
- miRNA
- a category of novel, very short, non-coding RNAs, generated by the
cleavage of larger precursors (pri-miRNA). These short RNAs are included in
the RNA-induced silencing complex (RISC) and pair to the 3' ends of
target RNA, blocking its translation into proteins (in animals) or promoting
RNA cleavage and degradation (in plants).
- mRNA
- messenger RNA. A protein-coding mRNA contains a protein-coding region
(CDS), preceded by a 5' and followed by a 3' untranslated region
(5' UTR and 3' UTR). The UTRs contain regulatory elements. A
full-length cDNA contains the complete sequence of the original mRNA,
including both UTRs. However, it is often difficult to assign the
start and termination positions for protein synthesis unambiguously. A
cDNA containing the entire CDS is often considered acceptable for
bioinformatic and experimental studies requiring full-length cDNAs.
- ncRNA
- non-coding RNA is any RNA molecule with no obvious protein-coding
potential for at least 80 or 100 amino acids, as determined by scanning
full-length cDNA sequences. It includes ribosomal (rRNA) and transfer RNAs
(tRNA) and is now known to include various sub-classes of RNA, including
snoRNA, siRNA and piRNA. Just like the coding mRNAs, a large proportion of
ncRNAs are transcribed by RNA polymerase II and are large transcripts. A
description of the many forms of ncRNA can be found at
http://en.wikipedia.org/wiki/Non-coding_RNA.
- Node
- as in networks. Objects linked by edges to create a network.
- PCR
- polymerase chain reaction. A molecular biology technique for
replicating DNA in vitro. The DNA is thus amplified, sometimes from
very small amounts. PCR can be adapted to perform a wide variety of genetic
manipulations.
- piRNA
- Piwi-interacting RNA. A class of RNA molecules (29-30 nt long)
that complex with Piwi proteins (a class of the Argonaute family of proteins)
and are involved in transcriptional gene silencing.
- PMF
- peptide mass fingerprinting. An analytical technique for protein
identification in which a protein is fragmented using proteases. The resulting
peptides are analysed by mass spectrometry and these masses compared against a
database of predicted or measured masses to generate a protein
identity.
- Polyadenylation
- the covalent addition of multiple A bases to the 3' tail of an
mRNA molecule. This occurs during the processing of transcripts to form the
mature, spliced molecule and is important for regulation of turnover,
trafficking and translation.
- Post-source decay
- in mass spectrometry. The fragmentation of precursor molecular ions as
they accelerate away from the ionisation source of the mass spectrometer. All
precursor ions leaving the ion source have approximately the same kinetic
energy, but fragmentation results in smaller product ions that can be
distinguished from precursor ions using a `reflectron' by virtue of their
lower kinetic energies.
- Post-translational modification
- the chemical modification of a protein after synthesis through
translation. Some modifications, notably phosphorylation, affect the
properties of the protein, offering a means of regulating function.
- Principal component analysis (PCA)
- a technique for simplifying complex, multi-dimensional datasets to a
reduced number of dimensions, the principal components. This procedure retains
those characteristics of the data that relate to its variance.
- Promoter
- a regulatory DNA sequence, generally lying upstream of an expressed
gene, which in concert with other often distant regulatory elements directs
the transcription of a given gene.
- Proteome
- the entire protein complement of an organism, tissue or cell culture at
a given time.
- Quantitative trait
- inheritance of a phenotypic property or characteristic that varies
continuously between extreme states and can be attributed to interactions
between multiple genes and their environment.
- qPCR
- quantitative real-time PCR, sometimes called real-time PCR. A more
quantitative form of RT-PCR in which the quantity of amplified product is
estimated after each round of amplification.
- QTL
- quantitative trait locus (plural: loci). A region of DNA that contains
those genes contributing to the trait under study.
- RISC
- RNA-induced silencing complex. A protein complex that mediates the
double-stranded RNA-induced destruction of homologous mRNA.
- RNAi
- RNA interference or RNA-mediated interference. The process by which
double-stranded RNA triggers the destruction of homologous mRNA in eukaryotic
cells by the RISC.
- RT-PCR
- reverse transcription-polymerase chain reaction. A technique for
amplifying a defined piece of RNA that has been converted to its complementary
DNA form by the enzyme reverse transcriptase. See qPCR.
- siRNA
- small interfering RNA, or silencing RNA. A class of short (20-25
nt), double-stranded RNA molecules. They are involved in the RNA interference
pathway, which alters RNA stability and thus affects RNA concentration,
thereby suppressing the normal expression of specific genes. Widely used in
biomedical research to ablate specific genes.
- snoRNA
- small nucleolar RNA. A sub-class of RNA molecules involved in guiding
chemical modification of ribosomal RNA and other RNA genes as part of the
regulation of gene expression.
- SNP
- single nucleotide polymorphism. A single base-pair mutation at a
specific locus, usually consisting of two alleles. Because SNPs are conserved
over evolution, they are frequently used in QTL analysis and in association
studies in place of microsatellites, and in genetic fingerprinting
analyses.
- SSH
- suppressive subtractive hybridisation. A powerful protocol for
enriching cDNA libraries for genes that differ in representation between two
or more conditions. It combines normalisation and subtraction in a single
procedure and allows the detection of low-abundance, differentially expressed
transcripts, such as those involved in signalling and signal
transduction.
- Structural RNAs
- a class of non-coding RNA, long known to have a structural role (for
instance, the ribosomal RNAs), transcribed by RNA polymerase I or
III.
- Systems biology
- treatment of biological entities as systems composed of defined
elements interacting in defined ways to enable the observed function and
behaviour of that system. The properties of the systems are embedded in a
quantitative model that guides further tests of systems behaviour.
- TATA-boxes
- sequences in promoter regions constituted by TATAAA, or similar
variants, which were considered the hallmark of promoters. Recent data show
that they are present only in the minority of promoters, where they direct
transcription at a single well-defined location some 30 bp downstream of this
element.
- trans-acting
- a factor or gene that acts on another gene that lies on a separate
chromosome or is otherwise genetically unlinked, usually through some
diffusible protein product (for mRNA expression, typically a transcription
factor).
- Transcript
- an RNA product produced by the action of RNA polymerase reading the
sequence of bases in the genomic DNA. Originally limited to protein-coding
sequences with flanking UTRs but now known to include large numbers of
products that do not code for a protein product.
- Transcriptome
- the full set of mRNA molecules (transcripts) produced by the system
under observation. Whilst the genome is fixed for a given organism, the
transcriptome varies with context (i.e. tissue source, ontogeny, external
conditions or experimental treatment).
- Transgene
- a gene or genetic material that has been transferred between species or
between organisms using one of several genetic engineering
techniques.
- Transinduction
- generation of transcripts from intergenic regions. At least some such
products do not relate to a definable promoter or transcriptional start
site.
- Transposon
- sequences of DNA able to move to new positions within the genome of a
single cell. This event might cause mutation at the site of insertion. Also
called `mobile genetic elements' or `jumping genes'.
- Transvection
- an epigenetic phenomenon arising from the interaction between one
allele and the corresponding allele on the homologous chromosome, leading to
gene regulation.
- TUs
- transcriptional units. Used to group all of the overlapping RNA
transcripts that are transcribed from the same genomic strand and share exonic
sequences.
- UTR
- untranslated region. Regions of the mRNA that lie at either the
3' or 5' flanking ends of the molecule (i.e. 3' UTR and
5' UTR). They bracket the protein-coding region and contain signals and
binding sites that are important for the regulation of both protein
translation and RNA degradation.
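
As an illustration of the numerical criteria quoted under CpG islands (minimum length of 200 bp, a GC content threshold of 50% and an observed/expected CpG ratio above 0.6), the short Python sketch below tests a single DNA sequence against them. It is a minimal sketch, not a published island-finding algorithm: the function names are hypothetical, the expected CpG count is assumed to be (number of C x number of G) / sequence length, which is a common convention that the glossary itself does not specify, and real tools scan a genome with sliding windows rather than testing one fixed stretch.

```python
def cpg_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    observed = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_fraction = (c + g) / n if n else 0.0
    # Assumption: expected CpG count = (C count * G count) / length
    expected = (c * g) / n if n else 0.0
    obs_exp = observed / expected if expected else 0.0
    return n, gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three glossary thresholds to one candidate stretch of DNA."""
    n, gc_fraction, obs_exp = cpg_stats(seq)
    return n >= min_len and gc_fraction >= min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    candidate = "CG" * 100            # 200 bp, CpG-rich toy sequence
    depleted = "C" * 100 + "G" * 100  # GC-rich but with a single CpG
    print(looks_like_cpg_island(candidate))
    print(looks_like_cpg_island(depleted))
```

The thresholds are exposed as keyword arguments because different studies apply stricter or looser cut-offs; the defaults simply mirror the values given in the glossary entry.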
Related articles in JEB:
- Editor-in-Chief's introduction
- Hans Hoppeler
JEB 2007 210: 1490.