Functional genomics research using Fundulus heteroclitus has focused on variation among individuals because of the evolutionary importance and value of Fundulus in explaining the human condition (why individual humans are different and are affected differently by stress, disease and drugs). Among different populations and species of Fundulus, there are evolutionarily adaptive differences in gene expression. This natural variation in gene expression seems to affect cardiac metabolism because up to 81% of the variation in glucose utilization observed in isolated heart ventricles is related to specific patterns of gene expression. The surprising result from this research is that among different groups of individuals, the expression of mRNA from different metabolic pathways explains substrate-specific metabolism. For example, variation in oxidative phosphorylation mRNAs explains glucose metabolism for one group of individuals but expression of glucose metabolism genes explains this metabolism in a different group of individuals. This variation among individuals has important implications for studies using inbred strains: conclusions based on one individual or one strain will not necessarily reflect a generalized conclusion for a population or species. Finally, there are surprisingly strong positive and negative correlations among metabolic genes, both within and between pathways. These data suggest that measures of mRNA expression are meaningful, yet there is a complexity in how gene expression is related to physiological processes.
- functional genomics
- cardiac metabolism
- evolutionary selection
- Fundulus heteroclitus
- mRNA expression
- gene expression
- phenotypic variation
Microarrays measure the expression of hundreds or thousands of genes at a time, offering near global measurements at the level of the transcript. The importance of the patterns of expression revealed by this technology has been questioned (Feder and Walser, 2005). There are good reasons to wonder about the utility of quantifying the mRNA expression of specific genes, and most of these concerns have to do with the multiple biochemical and physiological steps that modulate gene expression other than at the level of the transcript (e.g. micro-RNAs, translation, protein turnover, covalent modifications of enzymes and protein–protein interactions). To better understand the validity of transcriptomic data and whether mRNA expression underpins phenotypic differences, we need to understand how gene expression varies within and among taxa, whether the variation in mRNA co-varies with a specific phenotypic character or biological process and whether these patterns of mRNA expression are adaptively significant. All three are necessary for defining the biological importance of gene expression because they each provide separate information to confirm or reject the importance of gene expression.
If mRNA expression regulates phenotypic variation, then we would expect the variation in gene expression to reflect the differences among populations and species. For example, one would expect that outbred, highly polymorphic species would have greater variance than inbred species. One would expect greater variance among taxa as the genetic distance is increased. This variation should also correlate with appropriate phenotypic variation. That is, with an increase in physiological performance, one would expect a change in the mRNA that affects this performance. Yet, and this is the crux of the problem, does the lack of correlation between mRNA and a physiological measure arise because one does not have the correct gene or because there are epistatic interactions, both of which make the patterns difficult to discern? Finally, if the pattern of mRNA variation is to be shown as adaptive, then it must affect a phenotype that is selectively (biologically) important.
Evolutionary variance in gene expression
Much progress has been made in understanding the variation in gene expression using microarrays in yeast, worms, fish, mice and humans (Cavalieri et al., 2000; Cheung et al., 2003a; Denver et al., 2005; Gibson and Weir, 2005; Jin et al., 2001; Oleksiak et al., 2002; Oleksiak et al., 2005; Pritchard et al., 2001; Schadt et al., 2003; Townsend et al., 2003; Whitehead and Crawford, 2005; Whitehead and Crawford, 2006a; Whitehead and Crawford, 2006b). Evolutionary analyses indicate that stabilizing selection affects much of gene expression (Denver et al., 2005; Lemos et al., 2005). Stabilizing selection distinguishes between individuals with phenotypes closer to the mean versus those that deviate from the mean and selects against individuals that deviate. Thus, the observation that stabilizing selection affects a majority of mRNA indicates that small changes in gene expression have biological effects.
Although many genes have significant differences in expression among individual F. heteroclitus within a population, the magnitude of these differences is small, typically less than 1.5-fold (Oleksiak et al., 2002; Oleksiak et al., 2005; Whitehead and Crawford, 2005; Whitehead and Crawford, 2006a; Whitehead and Crawford, 2006b). For Fundulus, it is the many small changes in metabolic gene expression that together appear to be responsible for the phenotypic variation in cardiac metabolism (Oleksiak et al., 2005). Additionally, although stabilizing selection eliminates many mutations affecting gene expression because this variation is slightly deleterious, there is significant additive genetic variation affecting gene expression that can be the source of adaptive change. In Drosophila simulans (Wayne et al., 2004), the additive heritable variation is distributed among chromosomes, with much of the effect acting in trans (Wayne et al., 2004). Among 14 large human families, approximately 29% of genes have significant additive variation due to both cis and trans loci (Morley et al., 2004). Among mice, 19% of all loci have significant additive variation (Cui et al., 2006). Among Drosophila (Nuzhdin et al., 2004) and primates (Caceres, 2003; Enard, 2002; Gilad et al., 2005; Gilad et al., 2006; Khaitovich, 2004a; Khaitovich, 2004b) much of this additive genetic variation for expression is evolving by natural selection.
In Fundulus, by applying evolutionary analyses to natural populations that have experienced the effects of selection, we were able to document patterns of expression affected by directional, stabilizing and balancing selection (Crawford et al., 1999a; Crawford et al., 1999b; Oleksiak et al., 2002; Pierce and Crawford, 1997a; Whitehead and Crawford, 2006a; Whitehead and Crawford, 2006b). Our data on Fundulus, as well as other investigators' data on Caenorhabditis elegans (Denver et al., 2005), Drosophila (Lemos et al., 2005; Nuzhdin et al., 2004) and humans (Caceres, 2003; Enard, 2002; Gilad et al., 2006; Gilad et al., 2005; Khaitovich, 2004a; Khaitovich, 2004b), suggest that the variation in gene expression is selectively important; thus, this variation is biologically important. This is only one, albeit an important, criterion for establishing the importance of gene expression. It is also important to define the heritability of gene expression and how it relates to important phenotypic differences.
Genetics of gene expression
Much of gene expression measured by microarrays is genetic; it differs between inbred lines, is associated with quantitative trait loci (QTLs) and has narrow sense heritability (h²) greater than 30% [narrow sense heritability is due only to the additive genetic variation (Va), or h²=Va/Vp, where Vp is phenotypic variation] (Cheung et al., 2003b; Gibson and Weir, 2005; Sharma et al., 2005; Tan et al., 2005). Among F1 generations from two inbred mouse strains, approximately two-thirds of all loci have measurable h² with a quarter having an h² of >50% (Cui et al., 2006). In both humans and mice, the median h² is 34% among loci with measurable h² (Cui et al., 2006). The variation in regulatory processes affecting gene expression has been inferred by combining microarray and QTL studies [expression QTLs (eQTLs)]. These studies identify both cis- and trans-acting loci that are related to differences in gene expression in Drosophila (Wang et al., 2004; Wayne and McIntyre, 2002), yeast (Brem and Kruglyak, 2005; Brem et al., 2005; Ronald et al., 2005; Yvert et al., 2003), mice (Chesler et al., 2005; Doss et al., 2005; Ghazalpour et al., 2005; Schadt et al., 2003) and humans (Monks et al., 2004; Morley et al., 2004; Schadt et al., 2005). In general, 20–30% of differences in expression are due to a cis-eQTL (Doss et al., 2005; Ronald et al., 2005). Yet with more powerful analyses, the regulation of gene expression appears more complex, involving many loci with a few loci that affect the expression of many genes (Brem and Kruglyak, 2005; Brem et al., 2005; Gibson and Weir, 2005; Schadt et al., 2005; Stamatoyannopoulos, 2004). These data suggest a complex regulation of gene expression in which polymorphisms among several loci affect the variation in gene expression of a particular gene. It is important to realize that heritability in gene expression indicates that gene expression is stable between generations. This stability suggests that random biological variation or 'noise' is not the principal cause of variation in gene expression.
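The definition of narrow sense heritability given above (additive genetic variance divided by phenotypic variance) can be written as a short Python sketch. The function name and the variance components are hypothetical and serve only as an illustration; the values were chosen so that the result equals the median of 34% quoted in the text, and they are not taken from any of the cited data sets.

```python
def narrow_sense_heritability(additive_variance: float,
                              phenotypic_variance: float) -> float:
    """Return Va / Vp, the fraction of phenotypic variance that is additive genetic."""
    if phenotypic_variance <= 0:
        raise ValueError("phenotypic variance must be positive")
    if not 0 <= additive_variance <= phenotypic_variance:
        raise ValueError("additive variance must lie between 0 and Vp")
    return additive_variance / phenotypic_variance


# Hypothetical variance components (arbitrary units), for illustration only.
va = 0.17  # additive genetic variance, Va
vp = 0.50  # total phenotypic variance, Vp
h2 = narrow_sense_heritability(va, vp)
print(f"narrow sense heritability = {h2:.2f}")
```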
Although much progress has been made in understanding the variation in gene expression, we are unsure of its importance in affecting phenotypic variation. It is the phenotypic variation on which natural selection acts that defines human populations and humans' susceptibility to disease, drugs and stress, and thus is of scientific importance. To understand the importance of gene expression and its effect on phenotypic variation, more attention needs to be paid to the variation among individuals and whether there is variation in which genes are 'important' in effecting a change in phenotype. We present here a summary of microarray data from Fundulus that supports our contention that mRNA expression affects physiological performance and is thus evolutionarily important.
Cardiac metabolism in Fundulus was measured using isolated heart ventricles (Oleksiak et al., 2005; Podrabsky et al., 2000), and individual determinations were made by alternating between populations. Metabolic rates were measured in triplicate by determining oxygen consumption in a well-mixed chamber for each of three different substrates: 5 mmol l⁻¹ glucose, 1 mmol l⁻¹ fatty acid (FA; palmitic acid–bovine serum albumin) or LKA [5 mmol l⁻¹ lactate, ketones (5 mmol l⁻¹ each hydroxybutyrate and acetoacetate) and 0.1% ethanol]. Two inhibitors of glucose metabolism (20 mmol l⁻¹ 2-deoxy-glucose and 10 mmol l⁻¹ iodoacetic acid) were used when measuring FA and LKA metabolism. The addition of inhibitors reduced metabolism to less than 15% of the rate with glucose only. Adding FA or LKA caused a significant increase in metabolism compared with metabolic rates of hearts with inhibitors and glucose, indicating that much of glucose metabolism was inhibited when using both inhibitors. Although determination of metabolism for all three substrates took approximately 15 min, isolated heart ventricles from Fundulus are able to maintain stable metabolic rates for more than 45 min. All metabolic rates from isolated heart ventricles were a function of body mass, which is highly correlated (r=0.7–0.85) with heart mass (see below). These data indicate that our measures of metabolism are substrate dependent.
Individual ventricles were splayed open for all metabolic measures. Fundulus hearts, like those of most small fish, lack coronary circulation, and thus oxygen is supplied by diffusion from the internal blood flow (Farrell, 1993). Dissecting or splaying open the heart provided greater access and more uniform interfaces for the Ringer solution and the internal surfaces of the heart.
When measuring oxygen consumption in isolated ventricles, we were primarily concerned with the variation between individuals that can arise from genetic, developmental or physiological mechanisms (e.g. due to acclimation to different temperatures). All individuals were acclimated to a common laboratory environment (temperature, salinity, light:dark cycle, feeding regime, etc.) and were assayed at similar times of day. Despite minimizing physiological sources of variation, there can be other biological differences. For example, a fish could be sick or stressed due to social interactions or 'unhappy' for unknown reasons. These sources of variation are difficult to ascertain, and their importance remains unknown. Technical variation can arise due to poor heart preparation, electrical interference with the oxygen electrode, poor mixing of Ringer solution or poor maintenance of the oxygen electrode. Multiple determinations provide some measure of consistency and thus estimate technical variance. However, because a single heart cannot be measured on separate days, the replicate measures are dependent on the status of the electrode, heart preparation and electronic noise during the 15–20 min measurement period. One can measure 'day effects' among individuals, and these were not significant (P>0.5). That is, among the 3 days that F. heteroclitus cardiac ventricles were measured, hearts had approximately the same mean. However, because there is much variation among individuals it would be difficult to detect a day effect. Thus, the lack of day effect does not eliminate this as a source of variation but only makes it less probable.
Among 16 hearts measured (eight from a southern, Georgia population and eight from a northern, Maine population), the amount of oxygen consumed increased with time, with an average r² (explained variance) of 0.94, 0.86 and 0.84 for glucose, FA and LKA, respectively. That is, there was little variance within a specific measure of oxygen consumption. Among the triplicate measures of oxygen consumption for each substrate, the coefficient of variation (CV; standard deviation as a percentage of the mean) was 12%, 31% and 21% for glucose, FA and LKA, respectively. Thus, even for the most variable substrate (FA), approximately 95% of all measures of an individual heart will fall within about 60% of the mean (roughly two standard deviations).
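The arithmetic behind the coefficient of variation can be sketched as follows. The triplicate readings are hypothetical and are not data from this study; only the three CV values (12%, 31% and 21%) come from the text. The two-standard-deviation range assumes approximately normally distributed measurement error, which is an assumption of this sketch.

```python
import statistics


def coefficient_of_variation(values):
    """CV as a percentage: sample standard deviation relative to the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical triplicate oxygen-consumption rates for one heart (arbitrary units).
triplicate = [9.0, 10.0, 11.0]
cv = coefficient_of_variation(triplicate)
print(f"CV of the hypothetical triplicate: {cv:.1f}%")

# CVs reported in the text for each substrate (percent).
reported_cv = {"glucose": 12, "FA": 31, "LKA": 21}

# Under a normal approximation, about 95% of measures lie within 2 SD of the mean.
within_two_sd = {substrate: 2 * value for substrate, value in reported_cv.items()}
for substrate, percent in within_two_sd.items():
    print(f"{substrate}: ~95% of measures within {percent}% of the mean")
```

The largest of the three ranges, that for FA, corresponds to the figure of about 60% given above.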
Among the 16 individuals, there were 4.9-, 15.9- and 4.8-fold differences between the highest and lowest metabolic rates for glucose, FA and LKA, respectively. These individuals differed in body mass, and body mass affected all three measures of metabolism (Fig. 1). Thus, this range of metabolism was due both to differences in body mass and to individual variation. The effect of body mass was removed by using the residual from the log regression, resulting in 2.0-, 11.2- and 2.5-fold differences between the highest and lowest metabolic rates for glucose, FA and LKA, respectively. With the caveats described above concerning technical variation, replicate measures enabled the statistical testing of inter-individual differences in substrate-specific metabolism among 16 individuals: P<0.0001, P<0.005 and P<0.02 for glucose-, FA- and LKA-dependent metabolism, respectively (Fig. 2). These data suggest that there was significant variation in metabolism among individuals. These differences also exist if one compares just the eight individuals within the Maine population (P<0.05 for all three measures) and glucose-specific metabolism in the Georgia population (P<0.005). Fig. 2 illustrates these differences, but more important is the variation in the relative use of each substrate within an individual (Fig. 2B). For most individuals, metabolism fuelled by FA was greater than with either glucose or LKA (Fig. 2). For example, for individual ME06, FA metabolism was more than 10-fold greater than the other two substrates. Yet, in other individuals (ME10 and ME03), glucose metabolism was greater than FA metabolism. These measures of substrate-specific metabolism within an individual are unlikely to be due to technical variation because they are measured within a short time (less than 20 min) on the same heart preparation.
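Removing the effect of body mass with residuals from a log regression can be illustrated with the sketch below. It assumes an ordinary least-squares fit of log10(metabolic rate) on log10(body mass); the base of the logarithm, the function names and the example masses and rates are all assumptions made for illustration and are not taken from the study.

```python
import math


def log_log_residuals(body_mass, metabolic_rate):
    """Fit log10(rate) = intercept + slope * log10(mass); return slope, intercept, residuals."""
    x = [math.log10(m) for m in body_mass]
    y = [math.log10(r) for r in metabolic_rate]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, intercept, residuals


def fold_range(residuals):
    """Fold difference between the highest and lowest mass-corrected rates."""
    return 10 ** (max(residuals) - min(residuals))


# Hypothetical body masses (g) and metabolic rates (arbitrary units).
masses = [2.0, 4.0, 8.0, 16.0]
rates = [m ** 0.75 * f for m, f in zip(masses, [1.0, 1.5, 1.0, 1.5])]

slope, intercept, residuals = log_log_residuals(masses, rates)
raw_fold = max(rates) / min(rates)
print(f"fold range before mass correction: {raw_fold:.2f}")
print(f"fold range after mass correction:  {fold_range(residuals):.2f}")
```

Because the residuals are on a log scale, the difference between the largest and smallest residual converts directly into a fold difference, which is the form in which the mass-corrected ranges are reported above.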
Fundulus gene expression and metabolism
For F. heteroclitus, we have demonstrated that gene expression explained the variation in cardiac metabolism among 16 male individuals from natural outbred populations raised in a common environment (Fig. 3) (r²=0.65–0.81) (Oleksiak et al., 2005). But the relationship was complex: a vast majority of genes differed in expression between individuals, and these expression patterns cluster the 16 individuals into three groups. It is within these three groups that one can show an association between gene expression and metabolic rates.
For mRNA expression, 94% of the genes had expression levels that were significantly different among individuals within a population (P<0.01) (Oleksiak et al., 2005). Using a very conservative multiple correction (Bonferroni's or Fmax permutations), 84% were significantly different. These differences were not due to one or a few individuals: all possible permutations of six out of eight individuals within each population had on average 78.9% of genes that were significantly different among individuals (P<0.01). This high frequency of differences was among male individuals raised in a common environment and was unrelated to body mass or any other obvious physiological or experimental condition (Oleksiak et al., 2005). What is apparent is that the three groups of individuals share different patterns of expression (Oleksiak et al., 2005).
Shared gene expression patterns cluster the 16 individuals into three groups of 5, 5 and 6 individuals. However, there was no difference in the mean or variation in substrate-specific metabolism among these three groups (P>0.4 for glucose and FA metabolism, and P>0.05 for LKA metabolism) (Fig. 2). Yet, gene expression differences among the three groups were statistically robust. Differences in gene expression within each of the three groups were less than those found within a single population (62–73% vs 94% of genes differed significantly) and were significantly less (P<0.01) than those found in the 2052 random combinations of five individuals (average 85%). The number of genes that were significantly different among the three groups (50 genes) was considerably more than the 12 genes that were different between populations and significantly greater than the average of one significant difference found among three random groups formed by 1000 random permutations. These data indicate that the three groups were functionally distinctive and the differences were robust.
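The logic of comparing observed groups with randomly formed groups can be sketched as a permutation test on a single gene. The test statistic used in the study is not stated here, so this sketch substitutes the between-group sum of squares; the function names, the seed and the expression values are hypothetical, while the group sizes (5, 5 and 6) and the 1000 permutations follow the text.

```python
import random


def between_group_ss(values, labels):
    """Between-group sum of squares: how far group means sit from the grand mean."""
    grand_mean = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return sum(len(members) * (sum(members) / len(members) - grand_mean) ** 2
               for members in groups.values())


def permutation_p_value(values, labels, n_permutations=1000, seed=0):
    """Fraction of random regroupings that separate the groups at least as well as observed."""
    rng = random.Random(seed)
    observed = between_group_ss(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # random assignment that keeps the group sizes
        if between_group_ss(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression of one gene in 16 individuals, grouped 5, 5 and 6.
expression = [1.0, 1.1, 0.9, 1.2, 0.8,
              2.0, 2.1, 1.9, 2.2, 1.8,
              3.0, 3.1, 2.9, 3.2, 2.8, 3.0]
groups = [1] * 5 + [2] * 5 + [3] * 6

p_value = permutation_p_value(expression, groups)
print(f"permutation P = {p_value:.4f}")
```

Repeating such a test for every gene, and counting how many genes differ among random groups versus the observed groups, gives the kind of comparison summarized above.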
To explore the relationship between metabolism and gene expression within the three groups, the variation in gene expression was reduced to the principal components (PCs) for glycolytic, tricarboxylic acid (TCA) and oxidative-phosphorylation metabolic pathways (Table 1) (see Table S1 in supplementary material for a list of genes). A PC is a linear equation that sums the measures of gene expression where each gene is multiplied by a coefficient. The weight of the coefficient is chosen to maximize the explained variation among individuals without, in this case, reference to metabolism. For example, the first PC for glucose-related enzymes explains 54% of the variation among these 15 genes. Additionally, there were both strong positive and negative weighting factors [e.g. glyceraldehyde 3-phosphate dehydrogenase (GAPDH) expression is multiplied by –0.63, and aldehyde dehydrogenase (ALDH) by 0.37] (Table 1).
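A first principal component of the kind described above can be computed without reference to metabolism, as in the following sketch. It uses power iteration on the covariance matrix, which is a choice made here for a dependency-free illustration and is not necessarily the procedure used in the study. The three genes and their expression values are hypothetical, so the output does not reproduce the weights in Table 1.

```python
import math


def first_principal_component(data, n_iter=200):
    """data: rows are individuals, columns are genes.

    Returns (loadings, scores, fraction of total variance explained).
    """
    n_ind, n_genes = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_ind for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n_ind - 1)
            for j in range(n_genes)] for i in range(n_genes)]
    total_variance = sum(cov[j][j] for j in range(n_genes))
    if total_variance == 0:
        raise ValueError("expression does not vary among individuals")
    # Start from the most variable gene, then repeatedly multiply by the covariance.
    start = max(range(n_genes), key=lambda j: cov[j][j])
    loadings = [1.0 if j == start else 0.0 for j in range(n_genes)]
    for _ in range(n_iter):
        product = [sum(cov[i][j] * loadings[j] for j in range(n_genes))
                   for i in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in product))
        loadings = [x / norm for x in product]
    eigenvalue = sum(loadings[i] * sum(cov[i][j] * loadings[j] for j in range(n_genes))
                     for i in range(n_genes))
    # Each score is the weighted sum of an individual's centered expression values.
    scores = [sum(r[j] * loadings[j] for j in range(n_genes)) for r in centered]
    return loadings, scores, eigenvalue / total_variance


# Hypothetical expression (e.g. log ratios) for five individuals and three genes.
expression = [[-2.0, 1.0, 0.1],
              [-1.0, 0.5, -0.1],
              [0.0, 0.0, 0.0],
              [1.0, -0.5, 0.1],
              [2.0, -1.0, -0.1]]

loadings, scores, explained = first_principal_component(expression)
print("loadings:", [round(w, 2) for w in loadings])
print(f"variance explained by PC1: {explained:.1%}")
```

In this toy example the first two genes receive weights of opposite sign because their expression is negatively correlated, which parallels the mixture of positive and negative weighting factors reported for the metabolic genes.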
These PCs summarize the metabolic pathway-dependent RNA expression and statistically explain significant proportions of the variation in cardiac metabolism, but only within the three defined groups of fish (Fig. 3). For glucose metabolism, 81% of the variation in group 1 individuals was explained by changes in expression of genes involved in glucose metabolism. However, in groups 2 and 3, these glycolytic genes had little power to explain the differences among individual metabolic rates. Instead, genes of the oxidative-phosphorylation pathway explained the variation in glucose-specific metabolic rates. Similarly, gene expression in different pathways explained FA- and LKA-specific metabolism. These patterns, where gene expression from a pathway explains substrate-specific metabolism, only occur if one examines these groups of individuals. Permutation analyses indicate that few other random sets of five individuals share a common relationship between gene expression and phenotype. There is no relationship of one or more genes to metabolism among all 16 individuals. Nor do any of the PCs explain metabolism among all 16 individuals.
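How a relationship can be strong within groups yet absent when all individuals are pooled is shown in the sketch below. The data are hypothetical and deliberately simplified: the two groups have opposite relationships between a PC score and metabolic rate, whereas in the study different pathways explain metabolism in different groups. The function name is also an assumption of this sketch.

```python
def r_squared(x, y):
    """Proportion of variance in y explained by a linear regression on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


# Hypothetical PC scores and metabolic rates for two groups of five individuals.
pc_group1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
rate_group1 = [1.0, 2.1, 2.9, 4.2, 5.0]   # rate rises with the PC score
pc_group2 = [-2.0, -1.0, 0.0, 1.0, 2.0]
rate_group2 = [5.1, 3.9, 3.0, 2.0, 1.1]   # rate falls with the PC score

r2_group1 = r_squared(pc_group1, rate_group1)
r2_group2 = r_squared(pc_group2, rate_group2)
r2_pooled = r_squared(pc_group1 + pc_group2, rate_group1 + rate_group2)

print(f"group 1 r2 = {r2_group1:.2f}")
print(f"group 2 r2 = {r2_group2:.2f}")
print(f"pooled  r2 = {r2_pooled:.4f}")
```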
What these data suggest is that the genes that are important, which explain the variation in cardiac metabolism, differ among individuals. So, for example, altering glycolysis can affect glucose utilization but it will not do so in all individuals.
Correlated patterns of gene expression
Among the PCs, some genes have greater importance, as indicated by their larger coefficients. These coefficients are both negative and positive (Table 1). These coefficients reflect the strong correlation in expression among metabolic genes within and between pathways. In Fig. 4, the correlations between two genes among the 16 individuals are displayed as green and blue boxes in an all-against-all matrix. A significant positive correlation (a green cell) indicates that the expression for these two genes would be similar among all individuals (i.e. both would be high or low in the same individuals). Surprisingly, the expressions of most metabolic genes (Fig. 4) are either negatively or positively correlated. For example, an individual with a high expression of phosphofructokinase (PFK) also has high expression of GAPDH but low expression for phosphoglucoisomerase B (PGI-B) (Fig. 4C). These correlation patterns repeat themselves in each of the three pathways (Fig. 4B–D). In the linear pathway of glycolysis, an increase in aldolase, Ldh-B and enolase among 16 individuals is matched with a decrease in PFK, GAPDH and pyruvate kinase. These enzymes are interspersed along the pathway. What is more difficult to see are the correlations among proteins that form enzyme complexes or pathways. In the first enzyme complex of oxidative phosphorylation, NADH dehydrogenase, the subunits have significant positive or negative correlations for expression with each other (correlations are significant for 11 of the 20 NADH subunits). This pattern is repeated for complex 4 (cytochrome c oxidase) and complex 5 (ATP synthetase). These subunits have to form at a stoichiometric ratio for each enzyme, yet they have negative correlations for expression! Although these opposite patterns of gene expression present a biochemical conundrum, an obvious molecular reason is that many of these enzymes share common transcription factors.
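An all-against-all correlation matrix of the kind summarized in Fig. 4 can be sketched as follows. The gene names are taken from the example in the text, but the expression values are hypothetical and were chosen only to mimic the described signs (PFK and GAPDH positive, PFK and PGI-B negative). No significance threshold is applied, because the test used for Fig. 4 is not described here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(expression):
    """All-against-all correlations, keyed by (gene, gene) pairs."""
    genes = list(expression)
    return {(g1, g2): pearson_r(expression[g1], expression[g2])
            for g1 in genes for g2 in genes}


# Hypothetical relative expression of three genes in six individuals.
expression = {
    "PFK":   [1.0, 1.2, 0.8, 1.5, 0.9, 1.3],
    "GAPDH": [2.0, 2.3, 1.7, 2.9, 1.9, 2.6],
    "PGI-B": [1.4, 1.1, 1.6, 0.8, 1.5, 1.0],
}

matrix = correlation_matrix(expression)
for (gene_a, gene_b), r in matrix.items():
    if gene_a < gene_b:  # print each pair once
        sign = "positive" if r > 0 else "negative"
        print(f"{gene_a} vs {gene_b}: r = {r:+.2f} ({sign})")
```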
Our microarray analyses of cardiac physiology demonstrated that >80% of the variation in substrate metabolism can be explained by variation in metabolic gene expression (Fig. 3) (Oleksiak et al., 2005). The strength of this study and our other cardiac microarray studies (Oleksiak et al., 2002; Oleksiak et al., 2005; Whitehead and Crawford, 2005; Whitehead and Crawford, 2006a; Whitehead and Crawford, 2006b) is that they used a vertebrate population that is outbred (as human populations are), in which the variation among individuals was substantial enough to statistically define meaningful patterns. The weakness of these studies is that they used outbred populations, which left unresolved the genetic contributions to the variation in gene expression that explain cardiac metabolism. For the studies provided here, only males were used, and these individuals were acclimated to the same environment. Thus, physiologically induced differences due to temperature, hypoxia or any other environmental parameter do not explain the observed differences.
Our conclusions that mRNA expression was statistically related to cardiac performance, that it is evolving by natural selection and that it varies in a predictable manner depending on genetic distance (Oleksiak et al., 2002; Oleksiak et al., 2005; Whitehead and Crawford, 2006a; Whitehead and Crawford, 2006b) suggest that microarray experiments can be meaningful. Thus, in contrast to Feder (Feder and Walser, 2005), we argue that there is much utility in measuring genome-wide patterns of gene expression. We suggest that these microarrays present patterns of mRNA expression that are both informative and provide unexpected relationships. However, they are not simple and thus are subject to misinterpretation.
We present several observations concerning individual variation: (1) there are significant differences in cardiac metabolism among individuals; (2) there are large differences in which substrate is preferentially used; (3) a vast majority of genes have significant differences in expression; and (4) the genes that statistically explain the variation in substrate utilization differ among individuals. Our ability to explain these patterns depends on the delineation of three distinct groups among the 16 individuals, each group exhibiting a distinctive but consistent pattern of gene expression. The biological importance of these three groups is related to the correlated patterns of gene expression among all 16 individuals (Fig. 4). Most genes were positively or negatively correlated. Thus, individuals with high expression of genes 'A' to 'E' also had low expression of genes 'W' to 'Z', and the opposite pattern happened for other individuals (high expression of W–Z and low expression of A–E). Thus, the groups (especially groups 2 and 3) represent individuals with opposite patterns of gene expression. The surprising observation is that these patterns occur within pathways. Thus, individuals in any one group share patterns of expression within a pathway that are different (opposite) from those in other groups. For example, among individuals in group 2, relative to group 3, there was greater expression of GAPDH and PFK but lower expression of aldolase, aldehyde dehydrogenase, pyruvate kinase and PGI.
What is the meaning of these large differences among individuals?
There were significant differences among individuals in (1) metabolism, (2) which metabolic substrate was preferentially used, (3) the pattern of metabolic gene expression and, consequently, (4) the relationship between gene expression and metabolism. The greater metabolic rate among some individuals was due to a greater utilization of glucose whilst in others it was the utilization of FA. The variation in specific substrates was explained by the expression of different metabolic pathways, but these explanations differ among groups of individuals. For example, fatty acid utilization can be explained by changes in oxidative, TCA or glycolytic enzyme expression depending on which group of individuals is examined.
These data have several implications. The first is that to examine one or a few inbred individuals could lead to misleading conclusions. Imagine having an inbred line from one F. heteroclitus individual. Cardiac metabolism in this imaginary strain could be primarily dependent on glucose and not on fatty acid utilization. Thus, one would conclude that glucose was the primary energy source for the heart. Yet, this is only correct for a subset of individuals. Similarly, the investigation of fatty acid utilization in cardiac tissue using one individual would reveal that gene expression in glycolytic enzymes regulates FA utilization as opposed to the TCA pathway, which would be found using another individual. Neither conclusion is incorrect, just misleading because it makes conclusions based on too few individuals.
The second implication from these data is that many genes or different pathways can affect substrate-specific metabolism. It is not surprising that the expression of glycolytic enzymes explains glucose metabolism (Fig. 3, group 1). However, the observation that the relative expression of genes in the oxidative phosphorylation pathway explained glucose metabolism among some individuals is somewhat surprising. More surprising was that glycolytic enzyme expression explained 75% of the variation in fatty acid utilization in group 3 individuals (Fig. 3). If these data are correct, it suggests that the activity of one pathway affects the flux through other pathways. Thus, we cannot measure one or a few enzymes and expect it to always explain metabolic variation among many different individuals.
The third implication is that it is unlikely that one gene, or a set of genes, is responsible for the phenotypic variation among all individuals. For example, if cardiac metabolism is related to health or fitness, then which genes affect the health of an individual is dependent on the status of other genes and pathways, an outcome that is entirely consistent with studies of metabolic epistasis, where metabolism is dependent on variation at other loci (Clark and Wang, 1997; Segrè et al., 2005). Thus, we should not expect a 'magic bullet' that will cure everyone of a specific disease. Instead, there will need to be different cures for different individuals, supporting the concept of personalized medicine that is currently driving many pharmaceutical research programs (Nadeau and Topol, 2006).
Finally, the lack of correlation between gene expression and phenotype is not necessarily due to the lack of importance of gene expression. Instead, if the importance of gene expression is context dependent then a significant relationship will only be discernible from within a specific context. Thus, the inability to accept an alternative hypothesis that there is a relationship between gene expression and phenotype does not support the null hypothesis. Instead, one has failed to address the proper hypothesis.
Glossary available online at http://jeb.biologists.org/cgi/content/full/210/9/1613/DC1
Supplementary material available online at http://jeb.biologists.org/cgi/content/full/210/9/1613/DC2
- © The Company of Biologists Limited 2007