Changes in gene expression underlie phenotypic plasticity, variation within species, and phenotypic divergence between species. These expression differences arise from modulation of regulatory networks. To understand the source of expression differences, networks of interactions among genes and gene products that orchestrate gene expression must be considered. Here I review the basic structure of eukaryotic regulatory networks and discuss selected case studies that provide insight into how these networks are altered to create expression differences within and between species.
Comparative physiologists strive to understand how and why cellular and organismal functions differ among environments, among individuals, and among species. These issues can be addressed by comparing complex organismal phenotypes, comparing enzyme activity and flux through biochemical pathways, or comparing the molecular mechanisms that produce these higher-order phenotypes. Recently, many comparative studies have focused on differences in gene expression as a source of organismal diversity. Although the importance of gene regulation for controlling biological processes has been recognized for over 30 years (Britten and Davidson, 1969; King and Wilson, 1975), it is only in the last decade that the tools necessary to study changes in gene expression on a large scale have become available (Schena et al., 1995). These tools, particularly DNA microarrays, have opened new areas of exploration for comparative physiologists studying both model and non-model systems (reviewed in Cossins et al., 2006).
With the rapid accumulation of studies analyzing transcript levels, it is important to remember that transcription is only one step (albeit a critical one) in converting genotypes into phenotypes; changes in transcript levels do not always affect phenotypes and vice versa. The limitations of transcript analysis have been discussed in detail (Feder and Walser, 2005). Despite these limitations, comparative studies of gene expression have provided insight into the molecular mechanisms of ecological responses and phenotypic evolution. These comparative studies can be divided into three groups: comparisons within genotypes, within species and between species.
(1) Differences in gene expression can be created by environmental cues without any genetic differences. Comparative studies of individuals exposed to different environments reveal changes in gene expression associated with physiological responses to external stimuli. The utility of this approach is illustrated by a recent study of the eurythermic goby fish (Gillichthys mirabilis) reared under multiple temperature regimes mimicking wild conditions (Buckley et al., 2006). Approximately 2% of the genes surveyed showed a change in gene expression between treatments, with the specific genes showing altered expression varying among tissues. Individual expression changes observed in this study (and in similar studies) may be associated with positive physiological adjustments that help an organism cope with its surroundings, may reflect a stress response to adverse conditions, or may be consequences of pleiotropy that have no direct role in adjusting to different temperatures. Additional experiments are required to determine which expression changes fall into which class.
(2) Under standardized environmental conditions, genetic polymorphisms can cause expression differences within a species. Studies comparing genetically distinct samples of yeast, fruit flies and fish reared under similar laboratory conditions found that up to 25% of genes vary in their expression level between individuals of the same species (Brem et al., 2002; Jin et al., 2001; Oleksiak et al., 2002). These expression differences include both neutral polymorphisms and variation that contributes to phenotypic variation. For example, polymorphic gene expression explains variation in cardiac metabolism of the teleost fish Fundulus heteroclitus, demonstrating the adaptive potential of regulatory variation (Oleksiak et al., 2005). Putative adaptive changes in gene expression have also been observed in experimental populations of microorganisms (e.g. Cooper et al., 2003; Ferea et al., 1999; Riehle et al., 2003). Although the number of case studies demonstrating phenotypic consequences for regulatory changes is growing, the proportion of regulatory variation that is advantageous, neutral and deleterious remains subject to debate (e.g. Fay and Wittkopp, in press; Gilad et al., 2006; Lemos et al., 2005; Ranz and Machado, 2006).
(3) Genetic divergence between species also creates differences in gene expression. Regulatory evolution has been shown to contribute to divergent traits such as body armor in stickleback fishes (Colosimo et al., 2005), ear shape in maize (Hubbard et al., 2002) and pigmentation in fruit flies (Wittkopp et al., 2003). Up to 25% of genes differentially expressed between closely related Drosophila species show patterns of expression variation consistent with lineage-specific selection (Rifkin et al., 2003). Meta-analysis of comparative genomic expression data concludes that regulatory evolution is characterized by strong stabilizing selection with directional selection on some genes (Lemos et al., 2005).
All changes in gene expression result from modifications to regulatory networks. To understand the genetic and molecular mechanisms responsible for expression differences, we must examine the structure of regulatory networks and investigate how changes in these networks alter gene expression. Elucidating the architecture of regulatory networks will reveal connections among genes and is expected to uncover properties of regulatory networks that make certain types of changes more or less likely to occur (Wittkopp, 2005). Here, I review basic structural features of eukaryotic regulatory networks and use selected case studies to examine how networks vary between environments, between genotypes, and between species.
The structure of regulatory networks
In isolation, a gene is an inert object – nothing more than a string of nucleotide bases. Only when placed in a cellular environment and interacting with the products of other genes can a gene become active. The first step in this activation is for the gene to be transcribed, or 'expressed'. Comparative studies of gene expression on a genomic scale typically use standing transcript levels as a measure of gene expression. Although post-transcriptional regulation is expected to be equally important, techniques needed to study this process in a high-throughput manner remain limited. Biochemical interactions among proteins, RNA and DNA that control transcript levels are summarized below. These interactions form a complex network that coordinates gene expression and integrates the genome (Babu et al., 2004).
Molecular interactions between genes and gene products form the connections that make up a regulatory network (Fig. 1A). Sequence-specific interactions between transcription factor proteins and cis-regulatory DNA sequences provide the basic network structure (Blais and Dynlacht, 2005). Binding sites for 106 transcription factors have been mapped genome-wide in the baker's yeast Saccharomyces cerevisiae (Lee et al., 2002). The number of target genes for a given transcription factor ranged from 0 to 181 in this experiment, with an average of 38 putative cis-regulatory targets per transcription factor. Two-thirds of the transcription factors surveyed had fewer than 40 targets each. Although full genomic surveys have not yet been completed in metazoans, smaller scale studies have been conducted in Drosophila melanogaster (Moorman et al., 2006) and Caenorhabditis elegans (Deplancke et al., 2006). These studies suggest that the number of regulatory factors per gene may be higher in multicellular eukaryotes than in yeast.
Protein–protein interactions affecting co-regulators, signaling pathways, and other control systems also influence transcriptional regulation and must be incorporated into transcriptional regulatory networks. Genomic surveys of protein–protein interactions in yeast, flies and worms (Giot et al., 2003; Li et al., 2004; Uetz et al., 2000) reveal a scale-free network structure (i.e. few genes with many connections and many genes with few connections) similar to that seen for transcription factor targets (Albert, 2005). In animals, an additional layer of regulatory control is provided by small, noncoding, microRNA molecules that affect transcript levels by altering chromatin state or modulating mRNA stability (Ke et al., 2003; Stark et al., 2005).
Despite the many possible arrangements of regulatory factors and their target genes, five common motifs (Fig. 1B) have emerged from analyses of transcriptional regulatory networks in yeast (Lee et al., 2002): (1) Feed-forward loop, which involves three genes. Gene A regulates gene B, and together they regulate gene C. This motif is over-represented in transcriptional networks (Milo et al., 2002), and has properties well-suited to transcriptional regulation (Mangan and Alon, 2003). (2) Single input module, which features a single transcription factor that activates expression of a group of target genes. This type of motif is often associated with genes that respond to exogenous signals (Luscombe et al., 2004). (3) Multiple input module, which describes cases where the same group of regulatory factors controls expression of a battery of target genes. Genes that regulate embryonic development in D. melanogaster display this type of regulation (Erives and Levine, 2004). (4) Autoregulatory and feedback (multi-component) loops; these describe cases in which a gene product regulates expression of the gene encoding it, either directly (autoregulation) or through interaction with other genes (feedback loops). These motifs provide stability to patterns of gene expression (Becskei and Serrano, 2000). (5) Regulatory chain: cascades of regulatory interactions in which gene A regulates gene B, which regulates gene C, which regulates gene D, and so on. This motif contributes to the hierarchical structure of regulatory networks. All five of these regulatory motifs are also found in the regulatory networks of metazoans (e.g. Davidson et al., 2003; Levine and Davidson, 2005; Stathopoulos and Levine, 2005).
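As a minimal sketch of how two of these motifs can be recognized in a directed graph, the example below searches a small, entirely hypothetical network (genes A to D and their edges are invented for illustration, not taken from Lee et al., 2002) for feed-forward loops and autoregulation. The function names are assumptions of this sketch, and the brute-force search is only practical for small networks.

```python
from itertools import permutations

# Hypothetical toy network: each key regulates the genes in its set.
edges = {
    "A": {"B", "C"},  # A regulates B and C
    "B": {"C"},       # B also regulates C, closing a feed-forward loop
    "C": {"D"},       # A -> B -> C -> D also forms a regulatory chain
    "D": {"D"},       # D regulates itself (autoregulation)
}


def regulates(network, source, target):
    """Return True if `source` directly regulates `target`."""
    return target in network.get(source, set())


def feed_forward_loops(network):
    """List (a, b, c) triples where a regulates b, and a and b both regulate c."""
    genes = sorted(set(network) | {t for targets in network.values() for t in targets})
    return [
        (a, b, c)
        for a, b, c in permutations(genes, 3)
        if regulates(network, a, b)
        and regulates(network, b, c)
        and regulates(network, a, c)
    ]


def autoregulated(network):
    """List genes whose product regulates the gene encoding it."""
    return sorted(g for g in network if regulates(network, g, g))


ffl = feed_forward_loops(edges)
auto = autoregulated(edges)
print(ffl, auto)
```

In this toy network the only feed-forward loop is the triple (A, B, C) and the only autoregulated gene is D.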
Modules: trait-specific pathways
Groups of genes regulating the same phenotype can be placed together into a pathway (Fig. 1C). Historically, pathways have been defined genetically by ordering the action of mutations that disrupt the same phenotype. When biochemical interactions responsible for these genetic effects are identified, the pathways can be integrated with transcriptional regulatory networks.
In multicellular animals, the two best-understood regulatory networks – at both the genetic and biochemical levels – control mesoderm development in sea urchins and embryonic patterning in D. melanogaster embryos (Levine and Davidson, 2005). These pathways share features thought to be representative of developmental regulatory systems in general (Davidson et al., 2003; Stathopoulos and Levine, 2005). For example, developmental pathways have a hierarchical structure with genes controlling the earliest regulatory events at the top and genes controlling the final differentiation at the bottom. Different functional classes of proteins act at different levels in these hierarchies, with genes encoding transcription factors and signaling molecules near the top and genes encoding enzymes and structural proteins at the bottom. Often, regulatory interactions that initiate a developmental program are followed by multi-gene feedback loops that maintain differentiated states. Both positive and negative regulators operate in these pathways and contribute to the robustness of developmental pathways. The wing development pathway of D. melanogaster, depicted in Fig. 1C, illustrates all of these features.
Regulatory factors that function in multiple pathways link trait-specific pathways together to form a complex genomic regulatory network (Fig. 1D). These common regulators can create pleiotropy within the network. However, independent control of gene expression in different pathways can minimize pleiotropic effects and generate modularity. The modularity of regulatory networks is a critical property that facilitates evolutionary change (Carroll et al., 2001).
Like developmental pathways, genomic networks also have properties that appear to be shared among eukaryotes. For example, all known regulatory networks share a scale-free distribution with a small number of highly connected genes (i.e. 'hubs') and many genes with few connections (Albert, 2005). Genomic networks also have a hierarchical structure similar to individual developmental pathways (Yu and Gerstein, 2006) (Fig. 1E). Highly connected nodes occur at the top and middle of the hierarchy with minimally connected 'terminal nodes', which do not directly impact regulation of other genes, at the bottom. The similarity of genetic architecture among species may result from shared ancestry, selection for an optimal design, or (most likely) both. Simulation studies have shown that the structure of regulatory networks confers robustness and stability in the face of genetic and environmental perturbations, a property known as 'canalization' (Hornstein and Shomron, 2006; Siegal and Bergman, 2002).
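The quantities behind these descriptions (connections per gene, hubs and terminal nodes) can be computed directly from a list of regulatory edges. The sketch below uses a hypothetical eight-gene network with invented names; it is far too small to test for a scale-free distribution and is only meant to show what is being counted.

```python
from collections import Counter

# Hypothetical (regulator, target) edges; gene names are invented.
edges = [
    ("hub", "g1"), ("hub", "g2"), ("hub", "g3"), ("hub", "g4"), ("hub", "g5"),
    ("g1", "g6"), ("g2", "g7"),
]


def degree_counts(edge_list):
    """Count connections per gene, treating incoming and outgoing edges alike."""
    degree = Counter()
    for regulator, target in edge_list:
        degree[regulator] += 1
        degree[target] += 1
    return degree


def degree_distribution(degree):
    """Map each degree value to the number of genes that have it."""
    return dict(sorted(Counter(degree.values()).items()))


def terminal_nodes(edge_list):
    """Genes that are regulated but do not regulate any other gene."""
    regulators = {r for r, _ in edge_list}
    targets = {t for _, t in edge_list}
    return sorted(targets - regulators)


degree = degree_counts(edges)
distribution = degree_distribution(degree)
terminals = terminal_nodes(edges)
most_connected = max(degree, key=degree.get)
print(distribution, terminals, most_connected)
```

Here one gene has five connections while five genes have only one, the qualitative pattern (few highly connected genes, many sparsely connected ones) that the text describes.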
Regulatory variation in a network context
Comparisons of gene expression alone, whether measured by microarrays or in situ hybridization, reveal little about the underlying molecular mechanisms that cause divergent gene expression. Integrating this comparative work with knowledge of regulatory networks based on genetic analysis and biochemistry will provide a more complete understanding of developmental, physiological and evolutionary processes. Here I use selected case studies to illustrate how comparative studies of gene expression can be used to examine the structure of regulatory variation within regulatory networks.
Environmental effects on gene regulation
A recent study investigating the effects of alcohol exposure on gene expression in the fruit fly D. melanogaster illustrates the power of using microarrays to understand physiological changes (Morozova et al., 2006). Three percent of transcripts were found to have significant expression differences between flies that were and were not exposed to ethanol. Genes involved in olfaction, signal transduction, metabolism, transcription regulation, circadian rhythm and pigmentation changed expression more often than expected by chance. Some of these categories, such as olfaction and metabolism, fit prior expectations for the types of physiological changes induced by ethanol. Other classes, such as pigmentation genes, may represent secondary pleiotropic consequences of the network structure or genes whose functions are incompletely characterized.
To determine the fraction of genes with altered expression that functionally mediate the response to ethanol, mutant strains for 20 of the affected genes were tested for ethanol tolerance (Morozova et al., 2006). Fifteen of the 20 mutants (75%) had an ethanol sensitivity that differed from that of a control, non-mutant strain. These data suggest that the majority of genes whose expression is affected by ethanol exposure contribute to ethanol tolerance. The remaining mutants imply that up to 25% of the genes with altered expression, however, are side effects resulting from pleiotropic connections in the underlying regulatory network. Pathways controlling expression of genes induced by external stimuli (e.g. ethanol) have been shown to contain a small number of transcription factors directly regulating expression of a collection of functionally related genes (Luscombe et al., 2004). This structure may contribute to the high specificity of expression changes induced in response to ethanol.
Compared to networks regulated by exogenous cues, networks controlling development and basic cellular processes tend to be controlled by more regulators, with extensive interactions and feedback loops among the regulators (Luscombe et al., 2004). This structure provides a variety of mechanisms for altering the output of regulatory systems. The developmental basis of polyphenism in ants illustrates this point (Abouheif and Wray, 2002). Exposure to juvenile hormone causes genetically identical individuals to develop into any one of three castes (i.e. reproductive, worker, soldier). Of these, only the reproductive caste has wings. Using the regulatory networks controlling wing development in D. melanogaster as a guide, patterns of gene expression for wing developmental genes were compared between winged and wingless castes. The point at which wing development is disrupted was found to differ between the two wingless castes as well as between the two developing wing discs within one caste. In one case, expression was disrupted only in a gene located at the bottom of the network, whereas in the other case, the developmental pathway was blocked at a much higher step in the pathway (Fig. 2). Expression changes affecting wing development do not interfere with other functions of these pleiotropic proteins because of the modularity in regulatory networks.
Variable expression within species
One way to understand how regulatory networks vary is to identify the genetic changes affecting gene expression, determine which genes they affect, and examine their placement in the network. Quantitative trait locus mapping can be used to identify regions of the genome responsible for expression differences (eQTL) between strains or individuals of the same species (Brem et al., 2002). To determine the molecular mechanisms by which genetic variants affect gene expression, the location of an eQTL is compared to the location of the affected gene. If the eQTL is located near the affected gene, the change is inferred to act by modifying cis-regulation. If the eQTL is located elsewhere, the genetic change is inferred to act by modifying trans-regulation. Although discriminating between cis- and trans-acting changes does not identify the precise change within the regulatory network, it does separate changes at the gene itself from those residing in other factors.
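The location-based inference described above can be sketched as a simple rule. In the example below, the `Locus` type, the function name and especially the 10 kb window are assumptions made for illustration; the text says only that the eQTL is 'near' the affected gene, and the appropriate distance depends on the organism and on mapping resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chromosome: str
    position: int  # coordinate in base pairs


# Assumed cutoff for "near the affected gene"; not a value from the cited studies.
CIS_WINDOW_BP = 10_000


def classify_eqtl(eqtl: Locus, gene: Locus, window: int = CIS_WINDOW_BP) -> str:
    """Infer 'cis' if the eQTL lies within `window` bp of the gene, else 'trans'."""
    same_chromosome = eqtl.chromosome == gene.chromosome
    if same_chromosome and abs(eqtl.position - gene.position) <= window:
        return "cis"
    return "trans"


gene = Locus("II", 100_000)
print(classify_eqtl(Locus("II", 105_000), gene))  # linked, within the window
print(classify_eqtl(Locus("II", 400_000), gene))  # same chromosome, far away
print(classify_eqtl(Locus("IV", 100_000), gene))  # different chromosome
```

A rule of this kind cannot distinguish a true cis-regulatory change from a trans-acting variant that happens to lie close to the affected gene, which is one reason the inference is described as separating classes of change rather than identifying the precise one.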
The majority of eQTL identified in studies ranging from yeast to humans had trans-acting effects on gene expression (e.g. Brem et al., 2002; Cheung et al., 2005; Monks et al., 2004; Morley et al., 2004; Schadt et al., 2003). Yvert et al. characterized these trans-acting eQTL in yeast and found that eQTL for different genes often map to the same genomic region (Yvert et al., 2003). Assuming the causative sites responsible for the coincident eQTL are all located within the same gene, regulatory variation appears to be concentrated at highly connected hubs in the network. Indeed, only 100–200 distinct genes are estimated to account for all of the 1716 trans-acting eQTL identified in this study. The authors hypothesized that transcription factors with many target genes were responsible for these widespread effects; however, this class of proteins was found not to be over-represented near clustered eQTL. This is perhaps not surprising because a variety of biochemical classes can serve as hubs in regulatory networks. For example, a mutation in a receptor-associated G protein was shown to be responsible for one of the eQTL 'hotspots'.
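One simple way to look for coincident eQTL of this kind is to divide the genome into fixed-width bins and count how many different genes have a trans-acting eQTL in each bin. The sketch below is a hypothetical illustration, not the procedure used by Yvert et al.; the records, the 20 kb bin size and the three-gene threshold are all assumptions.

```python
from collections import defaultdict

BIN_SIZE_BP = 20_000  # assumed bin width

# Hypothetical records: (chromosome, eQTL position in bp, affected gene).
trans_eqtl = [
    ("III", 91_000, "geneA"),
    ("III", 95_500, "geneB"),
    ("III", 99_000, "geneC"),
    ("VIII", 12_000, "geneD"),
]


def hotspot_bins(records, bin_size=BIN_SIZE_BP, min_genes=3):
    """Return bins in which eQTL for at least `min_genes` distinct genes coincide."""
    genes_per_bin = defaultdict(set)
    for chromosome, position, gene in records:
        genes_per_bin[(chromosome, position // bin_size)].add(gene)
    return {
        bin_id: sorted(genes)
        for bin_id, genes in genes_per_bin.items()
        if len(genes) >= min_genes
    }


hotspots = hotspot_bins(trans_eqtl)
print(hotspots)
```

With these invented records, the three eQTL on chromosome III fall into the same bin and are reported as a single candidate hotspot, whereas the isolated eQTL on chromosome VIII is not. Fixed bins can split a cluster that straddles a bin boundary, so a real analysis would likely need a more careful definition of coincidence.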
The high frequency of trans-acting eQTL with widespread effects could be due to a biased mutational process, or to a higher fitness for changes in hubs relative to other types of genes. Studies of mutation accumulation (MA) lines in the nematode Caenorhabditis elegans suggest that the mutational process itself may produce many variants with pleiotropic, trans-acting effects on gene expression (Denver et al., 2005). MA lines were created using single hermaphrodites to propagate independent lines derived from the same starting genotype for many generations. This procedure maintains all but the most severe mutations. After 280 generations, expression was compared among four MA lines. Nine percent of the genes were found to have evolved expression differences in at least one line. By contrast, only 2% of genes were found to have expression differences among distantly related C. elegans strains isolated from the wild, demonstrating that natural selection eliminates many regulatory mutations. In the MA lines, but not in natural isolates, co-expressed genes were over-represented among genes with altered expression. Expression differences for groups of co-expressed genes were also observed in a similar mutation accumulation study of D. melanogaster (Rifkin et al., 2005). These findings suggest that only a few regulatory mutations with effects on multiple downstream genes are responsible for the extensive expression divergence observed. Some groups of coregulated genes changed expression in multiple mutation accumulation lines, suggesting that the portions of the network controlling expression of these genes are particularly susceptible to regulatory mutations. Because selection is minimal in mutation accumulation lines, the structure of the regulatory network is expected to control the distribution of regulatory mutations within the genome.
Expression divergence between species
Regulatory networks evolve by changing which genes interact as well as by changing how these genes interact (Babu et al., 2004). For example, cis-regulatory sequences may switch binding sites from one transcription factor to another, mutations in protein–protein interaction domains or within the DNA binding regions may abolish a connection in the network, or evolution of microRNA sequences may generate new target genes. Gene duplications and deletions can also add and subtract entire regulatory modules. Comparative studies of regulatory networks controlling development have identified divergent steps as well as sets of highly conserved regulatory interactions (reviewed in Davidson and Erwin, 2006). Differences in network structure between species may cause expression divergence, but may also reflect silent changes characteristic of developmental system drift (True and Haag, 2001). The presence of similar regulatory motifs and scale-free properties in regulatory networks from diverse eukaryotes suggests that regulatory connections are rearranged in a manner that has minimal effect on the overall network architecture. Mutations affecting the kinetics of individual regulatory interactions are expected to impact network output (i.e. gene expression) without altering its structure.
Interspecific comparisons of gene expression can be used to identify regulatory changes contributing to phenotypic divergence. Expression differences that correlate with phenotypic diversity have been observed for genes encoding transcription factors at the top of hierarchical pathways (e.g. Abzhanov et al., 2004; Averof and Patel, 1997; Gompel and Carroll, 2003; Sucena et al., 2003) as well as genes encoding enzymes at the bottom of pathways (e.g. Dickinson et al., 1984; Wittkopp et al., 2002). For all divergent phenotypes analyzed to date, only a subset of genes (often only one gene) in the developmental pathway is compared between species. Consequently, it remains unknown whether regulatory changes tend to cluster at the top, bottom or middle of a pathway.
Distinguishing between cis- and trans-acting variants is the first step for locating regulatory changes within a network. For a given gene, a cis-regulatory change indicates that the variant is associated with the gene surveyed. A trans-regulatory change indicates that the primary difference is located in a gene functioning upstream in the pathway. Transgenic and genetic studies comparing interspecific expression differences have shown that both cis- and trans-regulatory differences are common, regardless of where the gene fits within the regulatory pathway (Table 1). Consistent with these data, allele-specific analysis of 29 genes with expression differences between two Drosophila species found that 97% of genes with expression differences were affected by cis-regulatory divergence and approximately half showed evidence of trans-regulatory changes (Wittkopp et al., 2004).
Relating expression differences to underlying regulatory networks
The recent explosion of genomic resources and high throughput techniques is accelerating the elucidation of regulatory networks in model systems. Comparative studies of gene expression are an important tool for interpreting the function of network components and for understanding their stability and susceptibility to change. To understand how changes in regulatory networks contribute to phenotypic diversity within and between species, the following issues should be addressed.
(1) An environmental change often induces many changes in gene expression. Some of the changes may be directly involved in the physiological adjustment while others may be secondary consequences of the regulatory network. Comprehensive descriptions of regulatory connections will help disentangle these types of changes by revealing connections between functional modules. However, genetic mapping and functional tests will ultimately be needed to identify the subset of genes for which expression changes impact the phenotype.
(2) Regulatory changes may be most stable when located in particular parts of a pathway. To test this hypothesis, complete pathways controlling divergent traits should be surveyed to locate all independent regulatory changes within the network. Locating regulatory variants will also make it possible to determine whether the connectivity of a gene within the network influences its propensity for change.
(3) The distribution of new regulatory mutations within a network appears to differ from the distribution of regulatory variants in the wild (Denver et al., 2005). Network architecture is expected to influence how regulatory variation arises, while the pleiotropic side effects of individual regulatory mutations are expected to influence which changes survive the test of time. To fully appreciate the impact of network architecture on evolutionary trajectories, properties that promote particular changes within regulatory networks must be identified.
(4) Some functional classes of genes may be more susceptible than others to regulatory mutations affecting their expression. Analyzing the distribution of regulatory variants among genes with different gene ontology designations will test this hypothesis. Such an analysis may also identify specific biological functions with a propensity for regulatory changes (e.g. sperm expressed genes in C. elegans) (Denver et al., 2005). However, any analyses using current gene ontology designations should be interpreted cautiously. At present, for most genes, gene ontology assignments of functional classes and biological processes are predicted solely based on sequence similarity and are awaiting genetic and/or biochemical verification.
As discussed in this review, existing case studies provide some insight into these issues. However, we have a long way to go toward understanding how regulatory variation is distributed within genomic regulatory networks and how network structure influences patterns of variable gene expression. A combination of genetic and biochemical dissection of regulatory networks in model systems, computational analyses of network properties, and comparative studies of gene expression among non-model species will be needed to resolve these issues. Given the recent growth in these research areas, a comprehensive understanding of regulatory variation in the context of regulatory networks may soon be achieved.
I would like to thank G. Kalay for helpful suggestions and careful reading of the manuscript. I apologize to authors whose work was not cited due to space constraints. This work was supported using funds provided by the University of Michigan.
Glossary available online at http://jeb.biologists.org/cgi/content/full/210/9/1567/DC1
© The Company of Biologists Limited 2007