Comparative environmental genomics in non-model species: using heterologous hybridization to DNA-based microarrays.

SUMMARY The emerging field of comparative environmental genomics involves the cross-species comparison of broad-scale patterns of gene expression. Often, the goal is to elucidate the evolutionary basis or ecological implications of genomic responses to environmental stimuli. DNA-based microarrays are a powerful means of investigating gene expression, and the application of genomic tools to non-model species is becoming increasingly feasible. The use of a microarray generated from one species to probe gene expression in another, a method termed 'heterologous hybridization', eliminates the need to fabricate a new microarray platform for every new species of interest. This review summarizes recent advances in heterologous hybridization and discusses the technical caveats of this approach.


Introduction
Understanding how evolutionary history and genotype are linked to phenotypic responses to the environment is a central goal of modern comparative physiology. In the post-genomic era, this goal is being approached in novel ways through the application of resources such as DNA-based microarrays to the investigation of environmental regulation of gene expression. Of the growing number of applications of such resources, the use of transcriptomic profiling to characterize gene expression in organisms undergoing environmental perturbation is perhaps the most mature (Hoheisel, 2006). Transcriptional profiling has been used widely in both genetic model species [e.g. in yeast (Gasch et al., 2000; Gasch et al., 2001; Chen et al., 2003), in Drosophila (Girardot et al., 2004), in human cells (Murray et al., 2004) and in E. coli (Riehle et al., 2005)] and non-model species, with studies on fishes representing a dominant part of the literature in this field (Gracey et al., 2001; Oleksiak et al., 2002; Feder and Mitchell-Olds, 2003; Gracey et al., 2004; Podrabsky and Somero, 2004; Krasnov et al., 2005; Sneddon et al., 2005; Vornanen et al., 2005; Buckley et al., 2006). Such studies can identify both phylogenetically conserved and taxon-specific patterns of environmentally responsive gene expression. In a similar way, general stress responses can be distinguished from stressor-specific ones. While a detailed treatment of the myriad applications of DNA-based microarrays is outside the scope of the current review, several recent syntheses examine the revolutionary advances that these new genomic tools have enabled in various fields, including medicine, ecology, evolution, ecophysiology and ecotoxicology (Gracey and Cossins, 2003; Hoheisel, 2006; Lettieri, 2006).
The lack of available genomic sequence information for species outside the traditional genetic models is no longer an impediment to using genomic tools to investigate patterns of gene expression in these organisms. It is becoming increasingly clear that within related phylogenetic groups, adequate sequence identity exists for many genes to allow for a genomic platform developed for one species in the group to be applied to its other members. In the case of cross-species comparisons of gene expression using DNA-based microarrays designed for a single species, this approach has been termed 'heterologous' hybridization (Renn et al., 2004). In this review, I discuss the recent applications of heterologous hybridization to DNA microarrays, highlighting its strengths and weaknesses. Various factors are considered that may affect the efficacy of this approach, including such variables as the phylogenetic distance between the species involved, the nature and length of the DNA probes affixed to the microarray platform and the experimental design employed.

Measuring gene expression with DNA-based microarrays
In the following discussion, a 'microarray' refers to any platform (usually glass microscope slides or nylon membranes) to which oligonucleotide or cDNA probes are permanently affixed and to which samples of fluorescently labeled target cDNAs, generated through reverse transcription of expressed mRNA, are hybridized. The competitive hybridization of two samples, each labeled with a unique fluorescent dye, allows for the direct comparison of the mRNA levels from each sample. The results are generally reported as a set of ratios representing the fold-difference in expression between the two samples for each spot or 'feature' on the microarray. In the majority of microarray studies, the DNA probes affixed to the platform are generated from the same species that provide the fluorescently labeled samples (i.e. 'single-species' hybridization). A growing number of studies are utilizing probes generated from one species to examine gene expression in other species. The obvious advantage of this heterologous hybridization is that it avoids the expense in time and money associated with the production of cDNA clones and expressed sequence tag (EST) data when fabricating novel microarrays for every new species of interest, provided a platform for a related extant species is accessible. Often, the primary question then becomes: to what degree must one species be related to another for microarray technology to be transferable across taxa?
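To make the ratio calculation concrete, the following Python sketch computes per-feature log2 expression ratios from the two fluorescence channels of a competitive hybridization. The input format, the function name `log2_ratios` and the use of a single background value per channel are illustrative assumptions; real pipelines typically subtract local, per-spot background and apply intensity-dependent normalization before computing ratios.

```python
import math

def log2_ratios(channel_a, channel_b, bg_a=0.0, bg_b=0.0):
    """Return {feature: log2(A/B)} for features with positive signal in both channels.

    channel_a, channel_b: dicts mapping feature IDs to raw fluorescence intensities
    for the two dye-labeled samples (hypothetical input format).
    bg_a, bg_b: one background estimate per channel (a simplification).
    """
    ratios = {}
    for feature, raw_a in channel_a.items():
        if feature not in channel_b:
            continue
        a = raw_a - bg_a
        b = channel_b[feature] - bg_b
        if a > 0 and b > 0:  # a ratio is undefined without signal in both channels
            ratios[feature] = math.log2(a / b)
    return ratios

# "g1" is 4-fold higher in sample A after background subtraction;
# "g3" has no signal above background in sample B and is dropped.
a = {"g1": 4100.0, "g2": 600.0, "g3": 900.0}
b = {"g1": 1100.0, "g2": 1300.0, "g3": 80.0}
print(log2_ratios(a, b, bg_a=100.0, bg_b=100.0))
```

Log2 ratios are used here because they make up- and down-regulation symmetric: a 4-fold increase becomes +2 and a 4-fold decrease becomes -2.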

The effect of sequence divergence on microarray analyses
The primary technical challenge presented by heterologous hybridization is the problem of sequence divergence between the species for which the microarray was constructed and the species providing the sample to which it will be hybridized. Sequence divergence influences hybridization kinetics and therefore it is important to differentiate between differences in detection intensity that are due to actual differential gene expression and those that may be due to sequence mismatches. The endeavor is complicated by the fact that gene-by-gene divergence rates will differ from other metrics of phylogenetic distance such as species' evolutionary divergence time or average genome-wide divergence rates.
The competitive hybridization of genomic DNA from multiple species to a single-species array can provide a quantitative assessment of the impact of sequence divergence on overall hybridization efficiency. For example, in a study on different species of Drosophila, genomic DNA from D. melanogaster hybridized, on average, 4.2% more strongly to a D. melanogaster array than did genomic DNA from D. simulans. This disparity was in broad agreement with the known sequence divergence between the two species (3.8% at the nucleotide level). Especially where the degree of sequence divergence between two species is not known, the relative binding of genomic DNA from the two species provides an estimate of the effect of evolutionary distance on hybridization efficiency.
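A genomic DNA comparison of this kind might be summarized as in the sketch below, which reports the mean per-feature percentage by which reference-species genomic DNA out-hybridizes heterologous genomic DNA, and flags features with unusually large deficits as candidates for strong local sequence divergence. The metric, the 25% flagging cutoff and the function name are assumptions for illustration; the cited Drosophila study may have quantified hybridization differently, and the sketch presumes both samples were labeled and normalized comparably.

```python
from statistics import mean

def gdna_hybridization_excess(ref_gdna, het_gdna, flag_above=25.0):
    """Compare genomic DNA (gDNA) hybridization of two species on one array.

    ref_gdna: {feature_id: intensity} for gDNA from the species the array was built from.
    het_gdna: {feature_id: intensity} for gDNA from the heterologous species.
    Returns the mean per-feature percent excess of reference over heterologous gDNA,
    and the sorted IDs of features whose excess exceeds `flag_above` percent
    (hypothetical cutoff).
    """
    excess = {
        f: 100.0 * (ref_gdna[f] - het_gdna[f]) / het_gdna[f]
        for f in ref_gdna
        if f in het_gdna and het_gdna[f] > 0
    }
    flagged = sorted(f for f, pct in excess.items() if pct > flag_above)
    return mean(excess.values()), flagged

ref = {"f1": 1050.0, "f2": 2080.0, "f3": 1500.0}
het = {"f1": 1000.0, "f2": 2000.0, "f3": 1000.0}
avg, flagged = gdna_hybridization_excess(ref, het)
print(round(avg, 2), flagged)  # 19.67 ['f3']
```

Reporting the flagged features alongside the mean matters because, as noted below, gene-by-gene divergence can depart from genome-wide averages.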
False positives arising from differences in the isoforms expressed by each species are also likely to become more frequent with evolutionary distance. This is of particular concern for members of large gene families with many isoforms and/or variants of ancestral genes. As species diverge, it becomes increasingly difficult to discern specific patterns of expression within such families, in which multiple cDNAs may bind to a single probe that carries a conserved region shared by all isoforms or variants.
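When transcript sequences are available, probes at risk of this kind of isoform cross-hybridization can be flagged in advance. The sketch below marks any probe that shares an exact subsequence of length k with two or more distinct transcripts. Exact k-mer sharing is only a crude proxy, since real cross-hybridization also occurs through imperfect matches, and the function names, input format and choice of k are assumptions for illustration.

```python
def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def ambiguous_probes(probes, transcripts, k=20):
    """Return {probe_id: sorted transcript IDs} for probes sharing at least one
    exact k-mer with two or more transcripts (possible isoform cross-hybridization)."""
    transcript_kmers = {tid: kmers(seq.upper(), k) for tid, seq in transcripts.items()}
    flagged = {}
    for probe_id, probe_seq in probes.items():
        probe_kmers = kmers(probe_seq.upper(), k)
        hits = sorted(tid for tid, tk in transcript_kmers.items() if probe_kmers & tk)
        if len(hits) > 1:
            flagged[probe_id] = hits
    return flagged

# Toy sequences with a small k so the overlap is easy to see.
probes = {"p1": "AAAACCCCGGGG", "p2": "TTTTAAAA"}
transcripts = {"isoA": "GTAAAACCCCTT", "isoB": "CCCCGGGGTA", "other": "TTTTTTTT"}
print(ambiguous_probes(probes, transcripts, k=6))  # {'p1': ['isoA', 'isoB']}
```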
Owing solely to sequence divergence, the number of features that a single-species microarray can detect in targets from another species is expected to decrease with increasing phylogenetic divergence. This appears generally to hold true, although not to the extent that one might initially suppose. In a study employing a 16,006-gene salmonid microarray, features generated from Atlantic salmon (Salmo salar) or rainbow trout (Oncorhynchus mykiss) were equally able to detect target cDNAs from either species, despite the 8-20 million years of divergence time between these two species (von Schalburg et al., 2005). In another study, Rise et al. (Rise et al., 2004) tested the ability of a 7356-feature cDNA microarray, generated from rainbow trout and Atlantic salmon ESTs, to detect target cDNAs from lake whitefish (Coregonus clupeaformis) and smelt (Osmerus mordax). As expected, hybridization performance ranked according to evolutionary relationships, with the fewest features detected in the most diverged species (smelt). Nevertheless, 38% of the Atlantic salmon features on the microarray detected smelt target cDNAs, compared with 70% for Atlantic salmon targets. Although detection fell by roughly half in smelt, nearly 2500 features were still successfully detected. In a sense, then, heterologous hybridization merely reduces the effective size of a given microarray. With current technology allowing many thousands of features to be densely spotted onto glass slides, the detecting power of even a numerically diminished microarray remains considerable.
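One way to express this 'effective size' is to count the features that pass a detection filter in a given hybridization. The sketch below uses one common style of criterion, signal exceeding local background by k standard deviations; the data format, function names and the threshold k = 2 are assumptions, and the studies cited above may have used different detection rules.

```python
def detected_features(spots, k=2.0):
    """Return IDs of features whose signal exceeds local background by k SDs.

    spots: {feature_id: (signal, background_mean, background_sd)} (hypothetical format).
    The k = 2 default is an illustrative choice, not a published standard.
    """
    return {f for f, (signal, bg_mean, bg_sd) in spots.items()
            if signal > bg_mean + k * bg_sd}

def effective_array_size(spots, k=2.0):
    """Number and fraction of features that pass the detection filter."""
    n_detected = len(detected_features(spots, k))
    return n_detected, n_detected / len(spots)

spots = {
    "s1": (900.0, 200.0, 50.0),    # well above background: detected
    "s2": (260.0, 200.0, 50.0),    # within 2 SD of background: not detected
    "s3": (5000.0, 300.0, 100.0),  # detected
    "s4": (150.0, 200.0, 50.0),    # below background: not detected
}
print(effective_array_size(spots))  # (2, 0.5)
```

Applying the same filter to homologous and heterologous hybridizations and comparing the resulting fractions gives the kind of detection comparison summarized above.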
It is important to bear in mind, however, that the number of detected spots provides no information on changes in expression level. The ability of a given platform to detect changes in gene expression, particularly in poorly detected features, would be expected to diminish with increasing phylogenetic distance as sequence mismatches begin to create variation in hybridization strength, even for features that pass the detection threshold. However, these challenges may be mitigated by the choice of experimental design, as discussed in a later section.

Short oligonucleotides versus full-length cDNAs
Heterologous hybridization efficiency may also depend upon the nature and length of the probes employed. Short oligonucleotide probes, such as those on Affymetrix GeneChips® (Affymetrix, Inc., Santa Clara, CA, USA), are likely to be more sensitive to sequence mismatches than longer probes such as full-length cDNAs. In one of the first studies to employ heterologous hybridization (Enard et al., 2002), arrays bearing either short human oligonucleotide probes (Affymetrix U95A arrays) or longer cDNAs (~1000 bp) were used to characterize quantitative differences in gene expression among several primate species. The authors acknowledged that sequence differences between species may have affected the results obtained with the short oligonucleotide probes. They argued, however, that with the longer cDNA probes the 0.8% nucleotide sequence difference between human and chimpanzee was not expected to affect the results significantly, and that variation in the data due to sequence divergence was smaller than that due to experimental error. By using longer probes and maintaining high stringency in hybridization conditions (e.g. keeping the hybridization temperature at or close to 65°C for all hybridizations and using high-stringency washing procedures), non-specific binding of mismatched targets can be kept to a minimum.
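A back-of-the-envelope calculation shows why probe length matters. The sketch below assumes, as a deliberate simplification, that substitutions occur independently and uniformly at the cited 0.8% human-chimpanzee rate, and it ignores the position and identity of mismatches, both of which affect duplex stability in practice. The 25-nt length corresponds to the probes on Affymetrix GeneChips; the function names are hypothetical.

```python
def p_perfect_match(probe_len, divergence):
    """Probability that a probe has no mismatches to the target sequence,
    assuming independent substitutions at rate `divergence` per site."""
    return (1.0 - divergence) ** probe_len

def identity_after_mismatches(probe_len, n_mismatches):
    """Fraction of probe positions still matched after n_mismatches substitutions."""
    return 1.0 - n_mismatches / probe_len

d = 0.008  # 0.8% human-chimpanzee nucleotide divergence, as cited above
for length in (25, 1000):
    print(length,
          round(p_perfect_match(length, d), 3),
          round(identity_after_mismatches(length, 1), 3))
```

Under these assumptions, about 82% of 25-nt probes match perfectly, but each mismatch that does occur removes 4% of the duplex. A ~1000 bp cDNA almost always carries mismatches (about 8 expected), yet each removes only 0.1% of the duplex, which is one reason long probes tolerate divergence better.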
Where short oligonucleotides are used, one can take advantage of the fact that the complete sequences of the probes, and often those of the target transcripts, are known, so the effect of sequence identity on detection strength can be determined directly. Poorly hybridizing probes can then be masked in silico, restricting further analysis to probes with sufficient sequence identity to detect target cDNAs across taxa. This approach has been used to compare gene expression among mammals as evolutionarily diverged as dogs, pigs, cows and humans (Ji et al., 2004). As demonstrated in a recent study on Xenopus, it is also possible in such circumstances to remove the hybridization bias from interspecific comparisons by calculating and applying correction factors to expression data (Sartor et al., 2006).
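A minimal version of such in silico masking is sketched below. It assumes that each probe sequence has already been aligned to the corresponding sequence from the target species (for example, by a prior BLAST search) and keeps only probes with at most a chosen number of mismatches. The function names, input format and default threshold are hypothetical; this is a generic filter, not the specific procedure of Ji et al. (2004), and it does not implement the correction-factor approach of Sartor et al. (2006).

```python
def count_mismatches(probe, target):
    """Number of differing positions between two pre-aligned, equal-length sequences."""
    if len(probe) != len(target):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for p, t in zip(probe.upper(), target.upper()) if p != t)

def mask_probes(probes, target_seqs, max_mismatches=0):
    """Return the IDs of probes retained for analysis.

    probes: {probe_id: probe sequence}
    target_seqs: {probe_id: target-species sequence aligned to that probe}
    Probes with no target sequence, or with too many mismatches, are masked.
    """
    kept = set()
    for probe_id, probe_seq in probes.items():
        target_seq = target_seqs.get(probe_id)
        if target_seq is not None and count_mismatches(probe_seq, target_seq) <= max_mismatches:
            kept.add(probe_id)
    return kept

probes = {"p1": "ACGTACGTAC", "p2": "ACGTACGTAC", "p3": "GGGGCCCCAA"}
targets = {"p1": "ACGTACGTAC", "p2": "ACGTTCGTAC"}  # p3 has no target sequence
print(sorted(mask_probes(probes, targets)))                    # ['p1']
print(sorted(mask_probes(probes, targets, max_mismatches=1)))  # ['p1', 'p2']
```

The threshold trades coverage against accuracy: a stricter `max_mismatches` leaves fewer probes, but those that remain should hybridize with near-homologous efficiency.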

'Apples to apples': the importance of experimental design
The choice of experimental design can also help mitigate the challenges presented by between-species sequence divergence. Heterologous hybridization studies have generally employed one of two basic experimental designs (Fig. 1). In some cases, a microarray containing probes directed against species 1 is used to compare gene expression profiles in two different species (species 2 and 3; Fig. 1A). With this design, there are two divergence factors to consider: the percentage of sequence identity between species 1 and 2, and that between species 1 and 3 (note that in some cases species 1 may be the same as 2 or 3, in which case one of the percentages of sequence identity will be 100%).
For very closely related species, this design may nevertheless be effective and has been used, for example, to explore patterns of gender-biased gene expression in different species of Drosophila (Meiklejohn et al., 2003). However, a study on primates that employed both single- and multi-species microarrays to directly test the limits of interspecific competitive hybridization (Gilad et al., 2006) demonstrated that the sequence divergence between humans and chimpanzees was sufficient to affect the resulting gene expression values when target cDNAs from each species were directly compared, first on a human-based microarray and then on a chimpanzee microarray. Even the use of relatively long cDNA probes apparently did not eliminate the problem, which was especially significant when the differences in gene expression between species were subtle (e.g. ~1-2 fold).
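The problem with the first design can be stated compactly under a simple multiplicative model, which is an assumption made here for illustration rather than one taken from the cited studies. Suppose the measured intensity of gene g in species s, on an array built from species 1, is the product of its true expression E and a species- and gene-specific hybridization efficiency h that declines as the gene's sequence diverges from species 1. The log ratio from a direct species 2 versus species 3 comparison then splits into two terms:

```latex
I_{s,g} = h_{s,g}\,E_{s,g}, \qquad s \in \{2, 3\}

\log_2 \frac{I_{2,g}}{I_{3,g}}
  = \underbrace{\log_2 \frac{E_{2,g}}{E_{3,g}}}_{\text{true difference}}
  + \underbrace{\log_2 \frac{h_{2,g}}{h_{3,g}}}_{\text{hybridization bias}}
```

The bias term vanishes only when h_{2,g} = h_{3,g}. If species 2 is species 1 itself and its efficiency is taken as h_{2,g} = 1, the bias equals -log2 h_{3,g}, which is positive whenever species 3 hybridizes less efficiently, so expression in species 3 is underestimated. A bias of a fraction of a log2 unit can be as large as a true ~1-2 fold difference, which is consistent with subtle differences being the most affected.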
The use of an alternative experimental design (Fig. 1B) avoids the problem of phylogenetic distance between the two samples being competitively hybridized by always comparing two different samples from the same species (i.e. comparing 'apples to apples' rather than 'apples to oranges'). With this design, there is only a single divergence factor to consider (species 1 vs 2; Fig. 1B) and it applies equally to both samples, allowing for accurate measurements of their relative levels of specific mRNAs. The two samples could differ in any experimental variable, such as treatment, time point, collection site, developmental stage or tissue.
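The cancellation in this second design can be checked numerically. In the sketch below (hypothetical function name; the efficiencies and expression values are invented for illustration), each intensity is modeled as true expression multiplied by a hybridization efficiency. When both samples come from one species, they share the efficiency for a given feature and it cancels from the ratio; when they come from two species, it generally does not.

```python
import math

def observed_log2_ratio(expr_x, expr_y, eff_x, eff_y):
    """Observed log2(x/y) when each intensity = true expression x hybridization efficiency."""
    return math.log2((expr_x * eff_x) / (expr_y * eff_y))

# A true 2-fold difference between samples x and y (true log2 ratio = 1).
# Same-species design: both samples from species 2 share one efficiency (0.6).
same_species = observed_log2_ratio(2.0, 1.0, eff_x=0.6, eff_y=0.6)

# Two-species design: x from species 2 (efficiency 0.6), y from species 3 (0.9).
two_species = observed_log2_ratio(2.0, 1.0, eff_x=0.6, eff_y=0.9)

print(f"same species: {same_species:.3f}  two species: {two_species:.3f}")
# same species: 1.000  two species: 0.415
```

In the same-species comparison the 2-fold difference is recovered whatever the shared efficiency is, whereas in the two-species comparison it reads as only about 1.33-fold.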
The second design (Fig. 1B) was recently used to demonstrate the efficacy of heterologous hybridization in measuring biologically meaningful differences in gene expression for several species of fish, using a ~4500-feature cDNA microarray that was generated from brain tissue of an African cichlid, Astatotilapia burtoni (Renn et al., 2004). Target cDNA samples from brain and mixed muscle from this species were competitively hybridized against one another on the microarray to establish a set of 804 'reference' genes that were expressed differentially between these two tissues. Subsequently, similar hybridizations were performed comparing muscle and brain samples from seven other fish species. These species included three other members of the order Perciformes, as well as more distantly related species, such as the zebrafish Danio rerio (diverged from A. burtoni by ~200 million years). As expected, the total number of features detected decreased with phylogenetic distance, although the decrease was surprisingly moderate. Even in the most diverged species, Renn et al. found that 3000-4000 of the 4500 spots were detected by the A. burtoni microarray. Hybridization efficiency was particularly high among the perciform fishes, even though this order spans over 65 million years of divergence time.
Fig. 1. Two general experimental designs often employed by studies using heterologous hybridization to microarrays. In the first design (A), samples from two different species (species 2 and 3) are competitively hybridized against one another on a microarray generated from oligonucleotides or cDNAs from a single species (species 1). Note that, in some cases, species 1 may be the same as either 2 or 3. In this design, the sequence distance between species 1 and 2 will differ, to some degree, from that between 1 and 3; if this difference is too great, it may affect hybridization kinetics, which may in turn artificially affect the resulting gene expression values. Under the second design (B), the two hybridized samples are always from the same species, and they generally differ in another variable, e.g. treatment, time point or tissue. With this design, the only sequence divergence factor is that between species 1 and 2, and this factor should affect both hybridized samples equally.
Another important finding of this study was that, of the 804 reference spots whose expression differed between tissues in A. burtoni, nearly 80% also differed in the other perciform species. This number did decrease significantly, however, in comparisons of more highly diverged species. For instance, only ~20% of the reference spots displayed changes in expression in zebrafish, the most phylogenetically distant species examined. This underscores the inverse relationship between sequence divergence and the conservation of gene regulatory patterns, even for features that are well detected by a given array. Nevertheless, these results support the ability of heterologous hybridization to reveal conserved patterns of biologically relevant gene expression across considerable taxonomic spans such as those encompassing the perciform fishes.
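The cross-species comparison of reference genes reduces to a set overlap. The sketch below computes the fraction of reference features that are also called differentially expressed in a second species, together with the fraction of those shared features that change in the same direction. The direction check is an addition made here for illustration, not a statistic reported by Renn et al., and the function name, input format and toy values are hypothetical.

```python
def conserved_fraction(reference_changes, other_changes):
    """Compare reference features with features called differential in another species.

    reference_changes: {feature_id: log2 ratio} for reference features in the array species.
    other_changes: {feature_id: log2 ratio} for features called differential elsewhere.
    Returns (fraction of reference features also differential,
             fraction of those shared features changing in the same direction).
    """
    shared = [f for f in reference_changes if f in other_changes]
    fraction_shared = len(shared) / len(reference_changes)
    same_direction = sum(1 for f in shared
                         if (reference_changes[f] > 0) == (other_changes[f] > 0))
    fraction_concordant = same_direction / len(shared) if shared else 0.0
    return fraction_shared, fraction_concordant

reference = {"a": 1.5, "b": -2.0, "c": 0.8, "d": -1.1, "e": 2.2}
other = {"a": 1.0, "b": -0.7, "c": -0.5, "x": 3.0}
print(conserved_fraction(reference, other))
```

Here three of the five reference features are shared (0.6), and two of those three change in the same direction.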
In my laboratory, similar success has been achieved using a 9200-feature cDNA microarray generated from ESTs from the eurythermal goby Gillichthys mirabilis to characterize the responses to heat stress in cold-adapted (and evolutionarily distant) Antarctic fishes (B.A.B., unpublished data). In keeping with the findings above, the number of spots detected using heterologous targets tends to decline with evolutionary distance, but not significantly. Interestingly, the fold-changes in expression measured in the heterologous hybridizations were lower than those measured in hybridizations using homologous targets [a similar phenomenon was observed among fish species by Renn et al. (Renn et al., 2004)]. Whether this represents a reduced ability of the Antarctic fishes to up- and down-regulate gene expression or is an artifact of heterologous hybridization remains to be determined.
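One simple way to quantify such compression is the least-squares slope, fitted through the origin, of heterologous log2 ratios regressed on homologous log2 ratios for the same features; a slope below 1 indicates smaller fold-changes in the heterologous hybridizations. This summary statistic is an illustrative choice, not the analysis used in the unpublished work described above, and by itself it cannot distinguish a biological difference from a hybridization artifact. The feature IDs and values are toy data.

```python
def compression_slope(homologous, heterologous):
    """Least-squares slope through the origin of heterologous on homologous log2 ratios.

    Both arguments map feature IDs to log2 ratios; only shared features are used.
    """
    shared = [f for f in homologous if f in heterologous]
    sxy = sum(homologous[f] * heterologous[f] for f in shared)
    sxx = sum(homologous[f] ** 2 for f in shared)
    if sxx == 0:
        raise ValueError("need a shared feature with a non-zero homologous ratio")
    return sxy / sxx

homologous = {"f1": 3.0, "f2": 2.0, "f3": -1.0}
heterologous = {"f1": 1.5, "f2": 1.0, "f3": -0.5, "f4": 2.0}  # f4 is not shared
print(compression_slope(homologous, heterologous))  # 0.5
```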

Conclusions
Based on the first round of studies exploring cross-species microarray analysis, it appears that single-species platforms present a promising means by which to explore genomic responses to the environment across related species, even in non-model organisms. Prudent measures should always be employed to ensure that poorly detected features are excluded from analysis, even though this may reduce the effective size of a given microarray. While successful detection of numerous features in cross-species analyses is encouraging, the impact of sequence divergence on the conservation of gene regulatory patterns is significant. As with any microarray experiment, 'spot-checking' of selected expression data with routine methods of mRNA quantification such as quantitative real-time PCR (qPCR) and northern blotting can also provide another layer of quality control and help strengthen the results obtained by heterologous hybridization to DNA-based microarrays.
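For the qPCR side of such a spot-check, relative expression is often computed with the comparative Ct (2^-ΔΔCt) method. The sketch below returns the corresponding log2 fold change and applies a simple agreement rule against the microarray log2 ratio. The agreement rule, its 1 log2-unit tolerance and the function names are assumptions for illustration, and the 2^-ΔΔCt method itself assumes close to 100%, and roughly equal, amplification efficiencies for the target and reference genes.

```python
def ddct_log2_fold_change(ct_target_trt, ct_ref_trt, ct_target_ctl, ct_ref_ctl):
    """log2 fold change (treatment vs control) by the 2^-ddCt method.

    Assumes near-100% amplification efficiency for target and reference genes.
    """
    dct_trt = ct_target_trt - ct_ref_trt  # normalize target to reference gene
    dct_ctl = ct_target_ctl - ct_ref_ctl
    return -(dct_trt - dct_ctl)           # log2 of 2**(-ddCt)

def spot_check_agrees(array_log2, qpcr_log2, tol=1.0):
    """Hypothetical criterion: same direction of change and within `tol` log2 units."""
    same_direction = (array_log2 > 0) == (qpcr_log2 > 0)
    return same_direction and abs(array_log2 - qpcr_log2) <= tol

qpcr = ddct_log2_fold_change(22.0, 18.0, 25.0, 18.0)  # 3.0, i.e. 8-fold up
print(qpcr, spot_check_agrees(2.4, qpcr))              # 3.0 True
```

Requiring agreement in direction as well as magnitude guards against the compressed but correctly signed fold-changes that heterologous hybridization may produce being mistaken for failures.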
This manuscript was much improved by the comments of Dr George Somero and those of two anonymous reviewers.