Many complex systems can be represented and analyzed as networks, and examples that have benefited from this approach span the natural sciences. For instance, we now know that systems as disparate as the World Wide Web, the Internet, scientific collaborations, food webs, protein interactions and metabolism all have common features in their organization, the most salient of which are their scale-free connectivity distributions and their small-world behavior. The recent availability of large-scale datasets that span the proteome or metabolome of an organism have made it possible to elucidate some of the organizational principles and rules that govern their function, robustness and evolution. We expect that combining the currently separate layers of information from gene regulatory networks, signal transduction networks, protein interaction networks and metabolic networks will dramatically enhance our understanding of cellular function and dynamics.
- systems biology
- complex networks
- computational biology
- protein interaction networks
- metabolic networks
The post-genomic era has brought unprecedented opportunities to bridge and bring together traditionally separate disciplines in the natural sciences. The development of high-throughput techniques and the wide availability of large biological datasets, ranging from annotated genomes to organism-level maps of protein interactions and cellular metabolism, have made it possible simultaneously to probe cellular function at multiple levels. Most of the dramatic progress in the natural sciences during the last century can be directly related to the reductionist approach; the behavior of a system can be predicted and understood solely from the detailed knowledge of the system's elementary constituents.
However, it is by now clear that our ability to understand simple fundamental laws governing the individual building blocks is a far cry from being able to predict the overall behavior of a complex system (Anderson, 1972). Since evolutionary forces have shaped the complex and highly non-linear interactions between genes, proteins and metabolites, there exists considerable variation in the nature of both the elementary building blocks and their interactions, requiring the development of novel methods capable of uncovering cellular organization and functional principles at the systems level.
In this review, I aim to show how computational systems biology (Kitano, 2002), and more particularly network theory as applied to biological systems, offers quantifiable tools to uncover organizational principles of biological systems at the cellular level.
Network analysis of protein interaction systems
In building a network from physical protein binding data, it is customary to consider individual proteins as the nodes, and the existence of a physical interaction between a pair of proteins, e.g. as measured by high-throughput experiments, as a link between two corresponding nodes. Fig. 1A shows the protein interaction network (PIN) for the yeast C. elegans using data from various high-throughput experiments available from The BioGRID (version 2.0.20; http://www.thebiogrid.org/). The lowest connectivity nodes (only a single neighbor) are colored blue, nodes with an intermediate connectivity (two to nine) are green, while the highly connected nodes (⩾10 neighbors) are colored red. This figure suggests that the network is somewhat organized in a layer structure, with the majority of the singly connected nodes at the periphery and the highly connected nodes in the center. However, we need to introduce quantitative, statistical measures to systematically probe the properties of this PIN.
In this example, links represented experimentally measured binding. However, links may represent more general relationships between proteins than just physical binding. For instance, correlations between mRNA expression profiles in microarray data can be used as a basis for the determination of a direct link between two nodes. In this situation, one may define interactions between proteins whose mRNA expression profiles have a correlation value above an appropriately chosen cut-off, say κ, while no links are introduced when the pair-wise correlation values are less than κ.
With the availability of large-scale experimental data on PINs, such as those for Saccharomyces cerevisiae (Uetz et al., 2000; Ito et al., 2001; Gavin et al., 2002; Ho et al., 2002) and Drosophila melanogaster (Giot et al., 2003), network approaches have become crucial for developing a comprehensive understanding at the organism level. There exist many methods to dissect and analyze networks in general, and the PIN in particular. In the following, I will discuss the most common of these methods, while highlighting the biologically relevant information that can be gleaned.
Applying the tools of network analysis, a system's interacting elements (e.g. genes, proteins or metabolites) are represented as nodes, and the existence of an interaction between two elements as a link between the respective nodes. Links may carry information about the interaction, either as a link weight (interaction strength) or by specifying an interaction asymmetry (link direction). In general, a network consists of N nodes and M links and is represented mathematically as a binary matrix frequently called the `adjacency matrix' [aij]. An interaction between the nodes i and j is present when the matrix element aij=1 and absent if aij=0.
The modeling and analysis of systems as disparate as the World Wide Web and PINs has revealed surprising similarities in their structural organization. Possibly the simplest measure to characterize the role that a node (in this section, a protein) plays in the network is the `node connectivity', or degree, ki=Σjaij. We can also define the `average node degree' in the network, corresponding to an average protein's number of interaction partners,<k>=(1/N)Σiki. However, these measures do not provide a detailed insight into the organization of PINs.
To gain a more detailed insight into the structure of the PIN, we study the `connectivity distribution' given by P(k)=Nk/N, where Nk is the number of nodes with k neighbors. From this measure, we may determine the variation in connectivities in the network. Such distributions were studied by Erdös and Rényi (e.g. Bollobás, 2001) and they showed that simple random graphs lead to a Poisson connectivity distribution. However, for many real networks, including the PINs, P(k) does not have a Poisson-type behavior or, even more generally, a unimodal behavior as predicted by the Erdös–Rényi random graph theory. Instead, P(k) is frequently found to adhere to a heavy-tailed distribution that is often modeled as a power-law P(k)∼k–γ (Albert and Barabási, 2002). Fig. 1 shows a side-by-side comparison of a generic Poisson and power-law distribution using linear (Fig. 1B) and logarithmic scales (Fig. 1C). Notably, the logarithmic scale represents the power-law distribution as a straight line, and its decay is clearly seen to be significantly slower than that of the Poisson. Consequently, slowly decaying distributions such as the power-law are described as being heavy-tailed. Fig. 2 shows the connectivity distribution of the PINs of the yeast S. cerevisiae, the nematode Caenorhabditis elegans and the fruit fly D. melanogaster (see also Table 1).
It is interesting to note that if the connectivity distribution had instead been single-peaked, such as a Poisson or a Gaussian, the notion of a typical node, as described by the average connectivity <k>, would be valid. Since the PINs are networks with a heavy-tailed connectivity distribution, the majority of the nodes only have a few interaction partners while they coexist with nodes that participate in hundreds of interactions. Consequently, there exists no typical node in the PINs, and they are frequently described as being `scale-free'. The class of nodes with a very large number of interaction partners is called a network `hub'. These hub proteins often have biological properties that are significantly different from those of proteins participating in only a few interactions. Note that no formal definition exists to separate a hub protein from non-hub proteins.
One of the most popular network models that captures the observed heterogeneity of the connectivity distribution was proposed by Barabási and Albert (Barabási and Albert, 1999). It is similar to a model by Price (Price, 1965) [see Newman (Newman, 2003b), for a detailed discussion and comparison of the models]. These models are based on the notion that in a growing and evolving network, new nodes are not connected with uniform probability to already existing nodes. Instead, new nodes have a higher chance of connecting to those with many neighbors than to nodes with few. This is often called the rich-gets-richer effect or `preferential attachment'. If the chance Πi of connecting to an already existing node i is linearly proportional to the node degree ki, i.e.Π i=ki/Σjkj, the resulting connectivity distribution is a power-law with an exponent ofγ =3 (Albert and Barabási, 2002; Newman, 2003b).
Note that, if the effective preferential attachment rule is a non-linear function of the degree k, we can no longer expect the resulting connectivity distribution to be scale-free (Krapivsky et al., 2000; Krapivsky and Redner, 2001). In particular, if the preferential attachment rule is slower than linear in k, the connectivity distribution is a stretched exponential. For the case of a preferential attachment rule that is faster than linear in k, the resulting network is of a star type, where the majority of the nodes are connected to a single `super-hub' (Krapivsky et al., 2000; Krapivsky and Redner, 2001).
Protein interaction networks and evolution
Although the connectivity distribution of PINs is heavy-tailed, which is consistent with the preferential attachment prediction, it is far from clear that the actual evolutionary mechanisms responsible for the current structure of these networks are related to preferential attachment. It appears unlikely that an evolutionary process directly measures the size of a protein's network neighborhood. In fact, multiple alternative processes exist that may give rise to a scale-free connectivity distribution (Newman, 2005). These include local network growth rules, such as gene duplication (addition of nodes) and gene diversification (loss and/or addition of links) (for a review, see Albert and Barabási, 2002), all giving rise to scale-free connectivity distributions. Consequently, models based on local growth mechanisms demonstrate that there are many possible network expansion rules that have an effective linear preferential attachment as a result. Nevertheless, it is possible to directly estimate the evolutionary rates of link addition or removal, as well as those of node duplication from empirical data (Wagner, 2001). Focusing on the yeast PIN, two empirical studies (Eisenberg and Levanon, 2003; Wagner, 2003) clearly support the hypothesis that local network-growth rules give rise to linear preferential attachment, where highly connected proteins display an elevated rate of interaction turnover.
It has long been argued that biological systems are `functionally modular' (e.g. Hartwell et al., 1999), and it has been a much sought-after goal to understand how this modularity is reflected in the structure of the networks. The `clustering coefficient' (c) of a node (Watts and Strogatz, 1998): measures the degree to which the neighborhood of a node resembles a complete subgraph built from triangles and is the ratio of the actual number of triangles to all possible triangles, for which node i is a member. Consequently, ci is a measure of the cliquishness, or transitivity, of the local neighborhood. Take Fig. 4C as a network example. Nodes D–B–E are connected in a triangle, while nodes A–C–B–D are connected in a square. The clustering of node A is cA=0, since there is no direct link between its nearest neighbors, nodes C and D. However, the clustering of node E is cE=1, since its two (only) neighbors are connected. Finally, the clustering of node B is cB=1/6, since some of its neighbors (namely nodes D and E) are directly connected.
The average clustering coefficient<C>=(1/N)Σici provides information on the global distribution of links. A value of<C> close to unity indicates a high level of modularity, or cohesiveness of triangles, in the network, while a value close to zero indicates a lack of modularity. It is customary to test the significance of a particular <C> value by comparison with a suitable random-network model consisting of the same number of nodes and links (Albert and Barabási, 2002). For most such null models, we would find a reference clustering of <C>rand=<k>/N, where <k>=2M/N is the average node degree.
Assuming that a network has a non-zero <C>, we may further investigate the network's large-scale modularity structure by studying the average clustering as a function of node degree, C(k) (Dorogovtsev et al., 2002). If the network shows a hierarchical modularity (Ravasz et al., 2002), then the clustering C(k)∼1/k. In this case, nodes with few neighbors tend to have network neighborhoods with high clustering, while the highly connected nodes act as bridges tying different parts of the network together. However, the network modules are not clearly discernible, being interwoven on all levels.
In many real networks, there exist correlations between the properties of neighboring nodes. In particular, it is often the case that the connectivity of neighboring nodes is correlated. When these correlations are absent, we can expect that the joint probability of two randomly selected nodes i and j having ki and kj neighbors, respectively, is P(ki,kj)=P(ki)P(kj). However, in the presence of such node–node correlations, knowing the connectivity ki of node i, we have received information about the connectivity of any node j directly connected to node i with a link. Several methods have been developed to measure these connectivity correlations, and we will highlight two of them (Maslov and Sneppen, 2002; Pastor-Satorras et al., 2001; Newman, 2002; Newman, 2003a).
The first method to measure correlations between neighboring nodes was suggested by Vespignani and co-workers (Pastor-Satorras et al., 2001). It measures connectivity correlations by calculating the `neighborhood connectivity' of a node knn,i=(1/ki)Σjkjaij, where index nn denotes `nearest neighbor'. Consequently, knn,i measures the affinity with which a node i connects to other nodes of either high or low degrees. In Fig. 3, we have plotted the function knn(k), which is the average neighborhood degree for nodes with connectivity k. Note that if knn(k) is an increasing function of k, the network shows an `assortative' mixing, and high-degree nodes preferentially tend to be connected to other high-degree nodes. For the opposite situation, where knn(k) is a decreasing function of k (as in Fig. 3B), low-degree nodes tend to be connected to high-degree nodes, and the network is `disassortative'. This is also the typical case for computer networks, where a limited number of servers each are connected to a large number of individual computers (Pastor-Satorras et al., 2001).
The second method of measuring degree–degree correlations in a network is the Pearson correlation, ρ, in nearest neighbor degrees, called the `assortativity' (Newman, 2002). A Pearson correlation is often interpreted as a measure of a linear relationship between two variables, in this case the connectivity of node pairs joined by a link. The degree–degree correlation ranges fromρ =1 to ρ=–1. The distribution knn(k) and the assortativity ρ are related as follows. If knn(k) is uniform, then ρ=0. However, if knn(k) is increasing or decreasing, then ρ is positive or negative, respectively. The magnitude of ρ indicates the strength of the correlation. It is straightforward to develop similar expressions for directed networks (Newman, 2003a).
The last column of Table 1 shows the assortativity ρ for three whole-organism-level PINs. As expected, the trends displayed in Fig. 3 agree with the assortativity correlations calculated using ρ (Newman, 2002). In particular, Fig. 3A and Fig. 3C show no clear increasing or decreasing trend in knn(k), which agrees well with the calculated assortativity values being close to zero. Taken together, these two methods offer detailed insights into the connectivity correlations of a network.
Protein interaction networks and essentiality
So far we have discussed topological properties of PINs without emphasizing the close relationship between network representations and biological information. The first indication that the large-scale structure of a PIN network might carry biological information arose from investigations of network robustness (Albert et al., 2000). This work demonstrated that networks with heavy-tailed connectivity distributions were robust against random failures, yet fragile when an attack occurred at a highly connected node. The robustness of a network was evaluated in terms of a network topology measure, the `giant component'. A connected component consists of all nodes between which there exists a path, and the giant component is the largest among the connected components. The third column of Table 1 lists the giant component of the three PINs. We can study the resilience of a network to node removal by monitoring the size S of the giant component while randomly deleting nodes from the network (corresponding to failure) or iteratively removing the largest hubs (corresponding to attack). Through such a node-removal analysis, it was discovered that networks with a scale-free connectivity distribution retain a giant component while subject to random failures (Albert et al., 2000). On the other hand, when the scale-free networks are subject to attack, they fragment very quickly. Consequently, these networks are extremely robust against random perturbations, yet highly susceptible to targeted attacks.
Several molecular biology techniques are now available for the experimental perturbation and disruption of PINs. In fact, a large-scale experimental study in S. cerevisiae shows that only 18.7% of the total number of genes are essential on disruption or removal (Giaever et al., 2002), while a study on Escherichia coli found 13.7% of the genes to be essential (Gerdes et al., 2003). Motivated by the above theoretical and experimental observations on network fragility, Barabási and co-workers investigated the possibility of correlations between a protein's connectivity and its phenotypic essentiality, discovering an increasing likelihood for highly connected proteins to be essential (Jeong et al., 2001). In other words, the more interaction partners a protein has, the more likely it is to be involved in an essential cellular function. This result is often called the `centrality-lethality' rule. Although recently debated (Coulomb et al., 2005), careful analyses strongly support the centrality-lethality rule (Batada et al., 2006).
A recent study suggests that the increased lethality of highly connected proteins can be explained by a simple mechanism (He and Zhang, 2006). The idea is to explain the centrality-lethality rule by assuming that essential nodes and `links' are randomly distributed on the network. The function of an essential link is carried out by the interaction of the incident proteins, and both nodes are essential. This model generates the centrality-lethality rule through the simple fact that it is more likely for a hub to partake in an essential link than for a low-degree node. By choosing the essential link and node fractions appropriately, it is possible to fit the observed centrality-lethality rule within experimental error bars (He and Zhang, 2006).
Since highly connected proteins occupy a special role in the network, it is interesting to study if hub proteins should evolve at a different pace from proteins with only a few interaction partners (Batada et al., 2006). Indeed, because highly connected proteins do not have a higher density of active domains, they do not show any significant difference in mean rate of protein evolution. However, the hub proteins of S. cerevisiae contain a higher number of phosphorylation sites than do non-hub proteins and show a marked trend of being encoded by mRNAs with short half-lives. This indicates that highly connected proteins are subject to much tighter control, being part of dynamic short-lived protein complexes (Batada et al., 2006).
Protein interaction networks and dynamics
We have focused on the static aspects of a PIN, but proteins are constantly being degraded and produced and many carry out their functions in specific cellular locations such as a cellular membrane. A more realistic depiction would address the temporal and spatial aspects of the situation. Whole-organism protein-expression arrays are currently unavailable, and the chosen substitute has been the mRNA expression array. A recent analysis (Han et al., 2004) indicates that highly connected nodes in the S. cerevisiae PIN are either `date-hubs', binding to their partners at different times or locations, or `party-hubs', which interact with most of their network neighbors simultaneously. Including temporal aspects in the PIN analysis allows for the investigation of information flow, since the temporal activation of protein transcription is reflective of evolved regulatory mechanisms to ensure proper cellular responses to external stimuli.
Network analysis of metabolism
Cellular metabolism depends on enzymatic reactions where substrates, such as glucose or acetate, are converted into products by enzymes. However, the set of metabolic reactions can be translated into a network representation in many different ways. Fig. 4 demonstrates several possible network representations of a simple metabolic reaction set. Fig. 4A describes the relationship between the metabolites A–F. In the first reaction, A+B→C+D, we say that A and B are educts and C and D are products. A common network representation is displayed in Fig. 4C, where metabolites are nodes, and two metabolites are connected with an undirected link if they participate as an educt and a product, respectively, in the same reaction. Note that a link does not represent a single reaction, or enzyme, as two metabolites may appear in multiple reactions. An example of this possibility is shown in Fig. 4A, where metabolites A and D co-occur in reactions R1 and R3, and the link between A and D in Fig. 4C corresponds to both reactions. To further complicate the mapping, one reaction may also appear as multiple links (see Fig. 4). An alternative representation is that of a bipartite network (Fig. 4E), where the two kinds of nodes are metabolites or enzymes. For this case, a directed link from (to) a metabolite to (from) an enzyme indicates that the metabolite acts as an educt (product) in that reaction. Finally, a metabolic reaction set may also be represented as a reaction–reaction network (Fig. 4F). Here, the nodes are reactions and a (possibly directed) link is included between two nodes (reactions) i and j if a metabolite is used as an educt (product) in reaction i and as a product (educt) in reaction j.
Metabolic network structure
The various network representations of Fig. 4 have different statistical properties. Using the bacterial metabolism in E. coli as an example, Fig. 5 shows the differences in the connectivity distribution, P(k), implied by the three network representations detailed in Fig. 4B–D. Note that P(k) is heavy-tailed in all panels of Fig. 5; however, the result is not as simple for a bipartite network representation (Fig. 4E). In this case, it is possible to distinguish between metabolites and enzymes; for the metabolites, the connectivity distribution is heavy tailed, while the enzyme distribution is best fit by an exponential. This is not surprising, as cofactors such as ATP or NADP may contribute to hundreds of reactions while an enzyme has a limited number of active domains. To further contrast and compare potential biases of various network representations, Table 2 shows the clustering<C> and the assortativity ρ for three organisms using the network representations of Fig. 4B,C. As expected, the clustering and assortativity corresponding to Fig. 4B is significantly higher than that of Fig. 4C, since the network representation in the former implies a fully connected subgraph for each reaction.
Weighted metabolic networks
The majority of network studies have focused on topological properties and not on the rate of metabolic activity, which can vary significantly from reaction to reaction. This important function is not captured by standard topological approaches. It is necessary to include this information in the network description to develop an understanding of how the structure of a metabolic network affects metabolic activity. A meaningful understanding requires us to consider the intensity (i.e. strength), the direction (when applicable) and the temporal aspects of the interactions. Although much is still unknown about the temporal aspects of metabolic activity inside a cell, recent results have provided information about the relative intensities of the interactions in single-cell metabolism (Sauer et al., 1999; Canonaco et al., 2001; Gombert et al., 2001; Emmerling et al., 2002; Fischer and Sauer, 2003; Cannizzaro et al., 2004; Blank et al., 2005; Fischer and Sauer, 2005). We may incorporate these results into the network analysis by considering links not only to be present or absent, but additionally to carry a `link weight' that reflects the non-uniform interaction strength between two nodes. A natural, although not unique, measurement of interaction strength for a metabolic network is the amount of substrate being converted to a product per unit time, called the `flux' of the reaction.
A simple linear optimization approach, called `flux-balance analysis' (FBA), enables us to calculate the flux rate for each reaction in a whole-cell metabolic network. The FBA method is based on the assumption that the concentration of all cellular metabolites, [Ai], not subject to transport across the cell membrane must satisfy the steady-state constraint of d[Ai]/dt=ΣjSijνj=0, where Sij is the stoichiometric coefficient of metabolite Ai in reaction j, t is time, and νj is the steady-state flux of reaction j. We follow the convention that Sij<0 (Sij>0) if metabolite i is a substrate (product) in reaction j. Take Fig. 4A as an example. The stoichiometric coefficients of reaction j=R3 are then SA,R3=–2, SE,R3=–1, SD,R3=1, while SB,R3=SC,R3=SF,R3=0. Note that any flux value νi satisfying the steady-state constraint corresponds to a stoichiometrically allowed state of the cell. To select flux values that are biologically relevant, we optimize for cellular growth. Experiments support this hypothesis in several conditions, but there are also other meaningful objectives. See Bonarius et al. (Bonarius et al., 1997) and Kauffman et al. (Kauffman et al., 2003) for a more detailed discussion of FBA.
The recent advances in whole-genome annotation has made it possible to generate high-fidelity whole-cell level metabolic networks. Metabolic models of the prokaryotic Helicobacter pylori and E. coli, as well as the eukaryote S. cerevisiae, have been used to predict `essential genes' (Edwards and Palsson, 2000; Schilling et al., 2002; Duarte et al., 2004; Papp et al., 2004), `epistatic interactions' where the action of one gene is modified by one or multiple genes at different loci (Segre et al., 2005), and possible `minimal microbial genomes' (Burgard et al., 2001; Pal et al., 2006). The resulting fluxes from FBA measure each reaction's relative activity. In particular, Almaas et al. demonstrate that, similar to the degree distribution, the flux distribution of E. coli displays a strong overall inhomogeneity: reactions with fluxes spanning several orders of magnitude coexist in the same environment (Almaas et al., 2004). Applying the FBA computational approach, the flux distribution for S. cerevisiae (Fig. 6) is heavy-tailed, indicating that P(ν)∼ν–α with a flux exponent ofα =1.5. In a recent experiment, the strength of the various fluxes of the central metabolism of E. coli was measured using nuclear magnetic resonance (NMR) methods (Emmerling et al., 2002), revealing a power-law flux dependence P(ν)∼ν–1 (Almaas et al., 2004). This power-law behavior indicates that a vast majority of reactions with small fluxes coexists with a few reactions that have large fluxes.
The FBA approach allows us to analyze the metabolic network as a weighted network since each reaction is assigned a flux value. Such a generalization of non-weighted network measures was originally introduced in the context of the airline transportation and co-authorship networks (Barrat et al., 2004). The first of the generalized network measures is called the `node strength', si, of a node i, defined as si=Σjwijaij, where wij is the weight of the link connecting nodes i and j, and aij is the adjacency matrix as before. The node strength acts as a generalization of the node degree to weighted networks and sums the total weight on the links connected to a node. Fig. 7 shows the distribution of node strengths, P(s), for E. coli metabolism with glucose as the single carbon source.
We continue by generalizing the clustering coefficient to weighted networks. Since ci indicates the local density of triangles, a similar definition with link-weights should make it possible to discern if large or small weights are more or less likely to be found clustered together. We denote one possible definition given by Barrat et al. (Barrat et al., 2004) as cw,i, and the average weighted clustering is<Cw>=(1/N)Σicw,i. If no correlations exist between weights and topology, this new definition of clustering coefficient is equal to that of the unweighted network. Furthermore, we may identify two possible scenarios. If <Cw> is greater than <C>, large weights are predominantly distributed in local clusters, whereas if <Cw> is less than <C>, triangles are built using mostly low-weight links. Other possible definitions of a weighted clustering coefficient with somewhat different properties have been proposed (Onnela et al., 2005; Zhang and Horvath, 2005; Holme et al., 2007).
Fluxes and metabolic network structure
The flux distributions of a metabolic network rely on the network topology. Some of this dependence is understood by studying the correlation between wij, the strength of the link connecting nodes i and j and their respective connectivities, ki and kj. The metabolic fluxes scale as<wij>∼(kikj)θ, where θ=0.5 under glucose-limited conditions in S. cerevisiae (Fig. 8A) and E. coli (Macdonald et al., 2005), as well as the World-Air-Transportation network (Barrat et al., 2004). We may also find similar behavior in network models. As an example, the betweenness-centrality [a measure of how many shortest paths utilize a given node or link (see Brandes, 2001; Freeman, 1977; Newman, 2001; Wasserman and Faust, 1994) on the Barabási–Albert network model (Fig. 8C)]. However, other values for θ are possible, as demonstrated in Fig. 8B, where we findθ =0.7 for metabolic fluxes under acetate-limited conditions.
How does the network structure influence flux patterns on the level of single metabolites? The observed scale-free flux distribution is compatible with two quite different potential local flux structures. A homogeneous local organization would imply that all reactions producing (consuming) a given metabolite have comparable flux values. On the other hand, a more de-localized, or `hot backbone', is expected if the local flux organization is heterogeneous, such that each metabolite has a dominant source (consuming) reaction. To distinguish between these two scenarios, we define the measure Y(k,i) (Barthelemy et al., 2003; Almaas et al., 2004) for each metabolite produced or consumed by k reactions, with the following characteristics. If all reactions producing (consuming) metabolite i have comparable values, Y(k,i)≈1/k. If, however, the activity of a single reaction dominates, then Y(k,i)≈1, i.e. Y(k,i) is independent of k. For the two cases where the E. coli metabolic performance is optimized with glucose and succinate as the only available carbon sources, Y(k)∼k–0.27. This is an intermediate behavior between the two extreme cases described above. However, the exponent value of β=–0.27 indicates that the large-scale inhomogeneity observed in the overall flux distribution is increasingly valid at the level of the individual metabolites as well.
Consequently, for most metabolites, a single reaction can be identified that dominates its production or consumption. A simple algorithm is capable of extracting the sub-network solely consisting of these dominating reactions, called the `high-flux backbone' (HFB) (Almaas et al., 2004). This algorithm has the following two steps: (1) for each metabolite, discard all incoming and outgoing links except the two links that dominate mass production; and (2) from the resulting set of reactions, keep only those reactions that appear as both a maximal producer and a maximal consumer.
Note that the resulting HFB is specific to the particular choice of system boundary conditions (i.e. environment). Interestingly, the HFB mostly consists of reactions linked together, forming a giant component with a star-like topology that includes almost all metabolites produced in a specific growth environment. Only a few pathways are disconnected; while these pathways are members of the HFB, their end-products serve only as the second most important source for some other HFB metabolite. One may further analyze the properties of the HFB (Almaas et al., 2004); however, we limit our discussion and simply mention that groups of individual HFB reactions largely agree with the traditional, biochemistry-based partitioning of cellular metabolism into pathways. For example, in the E. coli metabolic model, all metabolites of the citric acid cycle are recovered, and so are a considerable fraction of other important pathways, such as those being involved in histidine, murein and purine biosynthesis, to mention a few. While the detailed nature of the HFB depends on the particular growth conditions, the HFB captures the reactions that dominate the metabolic activity for this condition. As such, it offers a complementary approach to elementary flux mode and extreme pathway analyses (Schuster and Hilgetag, 1994; Schilling et al., 2000; Papin et al., 2004), which successfully determine the available modes of operation for smaller metabolic sub-networks.
Metabolic core reactions
Any whole-cell metabolic model contains a number of transport reactions for the uptake of nutrients and excretion of byproducts. Consequently, we may systematically sample among all possible environments captured by the model through varying the constraints on uptake reactions. This analysis suggests that optimal metabolic flows are adjusted to environmental changes through two distinct mechanisms (Almaas et al., 2004). The more common mechanism is `flux plasticity', involving changes in the fluxes of already active reactions when the organism is shifted from one growth condition to another. For example, changing from glucose- to succinate-rich media altered the flux of 264 E. coli reactions by more than 20%. Less commonly, environmental changes may induce `structural plasticity', resulting in changes to the metabolism's active wiring diagram, turning on previously zero-flux reactions and inhibiting previously active pathways. For example, when shifting E. coli cells from glucose- to succinate-rich media, 11 previously active reactions were turned off completely, while nine previously inactive reactions were turned on.
The `metabolic core' is the set of reactions found to be active (carrying a non-zero metabolic flux) in all tested environments. In recent computational experiments where more than 30 000 possible environments were sampled, the metabolic core contained 138 of the 381 metabolic reactions in the model of H. pylori (36.2%), 90 of 758 in E. coli (11.9%) and 33 of 1172 in S. cerevisiae (2.8%) (Almaas et al., 2005). While these reactions respond to environmental changes only through flux-based plasticity, the remaining reactions are conditionally active, being turned on only in specific growth conditions.
The metabolic core can be further partitioned into two types of reactions. The first type consists of those that are essential for biomass formation under all environmental conditions (81 out of 90 reactions in E. coli), while the second type of reaction is required only to assure optimal metabolic performance. In case of the inactivation of the second type, alternative sub-optimal pathways can be used to ensure cellular survival. However, the compact core of S. cerevisiae only contains reactions predicted by FBA to be indispensable for biomass formation under all growth conditions. A similar selection of metabolic reactions was suggested by Burgard et al. (Burgard et al., 2001). Their `minimal reaction' contains the metabolic core as well as all reactions necessary for the sustained growth on any chosen substrate. A different definition of a minimal reaction set was proposed by Reed and Palsson (Reed and Palsson, 2004), which consists of the 201 reactions that are always active in E. coli for all 136 aerobic and anaerobic single-carbon-source `minimal environments' capable of sustaining optimal growth.
A reasonable speculation is that the reactions in the metabolic core play an important role in the maintenance of crucial metabolic functions since they are active under all environmental conditions. Consequently, the absence of individual core reactions may lead to significant metabolic disruptions. This hypothesis is strengthened through cross-correlation with genome-scale gene-deletion data (Gerdes et al., 2003): 74.7% of those E. coli enzymes that catalyze core metabolic reactions (i.e. core enzymes) are essential, compared with a 19.6% lethality fraction for the non-core enzymes. A similar pattern of elevated essentiality is also present when analyzing large-scale deletion data for S. cerevisiae (Giaever et al., 2002). Here, essential enzymes catalyze 84% of the core reactions, whereas the conditionally active enzymes have an average essentiality of only 15.6% (Almaas et al., 2005). The likelihood that the cores contain such a large concentration of essential enzymes by chance is minuscule, with P-values of 3.3×10–23 and 9.0×10–13 for E. coli and yeast, respectively.
Metabolic core reactions also stand apart from the conditionally active ones when comparing their evolutionary conservation. In comparing the core enzymes of E. coli with a reference set of 32 bacteria, the average core conservation rate is 71.1% (P<10–6) while the non-core enzymes have a homology matching of only 47.7%. Taking into account correlations between essentiality and evolutionary conservation, one would expect the core enzymes to show a conservation level of 63.4% (Almaas et al., 2005).
These results indicate that an organism's ability to adapt to changing environmental conditions rests largely on the continuous activity of the metabolic core, regardless of the environmental conditions, while the conditionally active metabolic reactions represent the different ways in which a cell is capable of utilizing substrates from its environment. This suggests that the core enzymes that are essential for biomass formation, both for optimal and suboptimal growth, may provide effective antibiotic targets, given the cell's need to maintain the activity of these enzymes in all conditions.
Network approaches provide an important set of tools to analyze and dissect complex systems spanning from biology to the social sciences. Their generic applicability has successfully been exploited by bringing measures to bear on biological problems that, for example, were originally developed for transportation systems (Albert and Barabási, 2002). As the focus of this review has been PINs and metabolism, a variety of network approaches have given us the opportunity to interrogate the interconnected nature of cellular networks. It is, however, important to remember that the cell is far from a static environment, and it is absolutely necessary to develop new approaches to incorporate both the temporally and spatially dynamic nature of biological systems.
To achieve an accurate description of cellular networks, we also need to couple the available information on gene regulatory, signal transduction, protein interaction and metabolic networks. So far, the majority of research has been focused on studying these networks as separate entities. In particular, the study of metabolism has already shown great promise for coupling to transcription regulatory networks (Covert et al., 2004). Although our current knowledge of kinetic parameters is severely limited, making the development of detailed kinetic models largely intractable, approaches such as FBA married with network methods have opened the door for organism-level investigations of quasi-dynamic cellular response to external and internal perturbation.
The author wishes to thank Profs Holder and Livingstone at Trinity University for helpful discussions and input. This work was performed under the auspices of the U.S. Department of Energy by University of California, Lawrence Livermore National Laboratory under Contract W-7405-Eng-48, and supported by LDRD Grant 06-ERD-061.
Glossary available online at http://jeb.biologists.org/cgi/content/full/210/9/1548/DC1
- © The Company of Biologists Limited 2007