Variation in limb loading magnitude and timing in tetrapods

ABSTRACT Comparative analyses of locomotion in tetrapods reveal two patterns of stride cycle variability. Tachymetabolic tetrapods (birds and mammals) have lower inter-cycle variation in stride duration than bradymetabolic tetrapods (amphibians, lizards, turtles and crocodilians). This pattern has been linked to the fact that birds and mammals share enlarged cerebella, relatively enlarged and heavily myelinated Ia afferents, and γ-motoneurons to their muscle spindles. Both tachymetabolic tetrapod lineages also possess an encapsulated Golgi tendon morphology, thought to provide more spatially precise information on muscle tension. The functional consequence of this derived Golgi tendon morphology has never been tested. We hypothesized that one advantage of precise information on muscle tension would be lower and more predictable limb bone stresses, achieved in tachymetabolic tetrapods by having less variable substrate reaction forces than bradymetabolic tetrapods. To test this hypothesis, we analyzed hindlimb substrate reaction forces during locomotion of 55 tetrapod species in a phylogenetic comparative framework. Variation in species means of limb loading magnitude and timing confirm that, for most of the variables analyzed, variance in hindlimb loading and timing is significantly lower in species with encapsulated versus unencapsulated Golgi tendon organs. These findings suggest that maintaining predictable limb loading provides a selective advantage for birds and mammals by allowing energy savings during locomotion, lower limb bone safety factors and quicker recovery from perturbations. The importance of variation in other biomechanical variables in explaining these patterns, such as posture, effective mechanical advantage and center-of-mass mechanics, remains to be clarified. Summary: Variation in proprioceptive sensory systems across tetrapods influences loading behaviors of the limbs.

The neuromuscular basis for high rhythmicity of the cyclic movements of birds and mammals is hypothesized to lie with an enlarged cerebella, relatively enlarged and heavily myelinated Ia afferents, and γ-motoneurons to their muscle spindles (Ross et al., 2013). The cerebellum is an important regulator of predictive and responsive correction of external perturbations (Aoi et al., 2013;Butler and Hodos, 2005;Ross et al., 2013). Selective damage or degeneration of the cerebellum or its afferent and efferent neural pathways results in impaired interlimb coordination (Aoi et al., 2013;English, 1989;Fortier et al., 1987;Ichise et al., 2000;Morton and Bastian, 2006;Yanagihara et al., 1993). Birds and mammals have convergently evolved relatively enlarged lateral cerebella (Butler and Hodos, 2005), along with larger and more complex input and output nuclei (Appelberg et al., 1975;Johansson, 1988;ten Donkelaar, 1988;Wild and Williams, 2000).
Muscle spindle primary afferentstype Ia nerve fibersconvey information from muscle spindles to the central nervous system (CNS) about the rate of change in the length of fibers within a muscle fascicle (Purves and Fitzpatrick, 2001). Afferent information about velocity changes in limb muscles is necessary for coupling limb movements to alternating bursts of motor activity from spinal central pattern generators (Verdaasdonk et al., 2006). Furthermore, stronger afferent proprioceptive signals are associated with less variable cycle frequency (Ausborn et al., 2007). Deafferentation of spinal cord central pattern generators renders them incapable of compensating for variation in external forces and displacements associated with variably disrupted coordination (Allum et al., 1998;Grillner andZangger, 1979, 1984;Wetzel et al., 1976). Bird and mammal type Ia afferents are myelinated and larger than those of other tetrapods, facilitating rapid conduction of where muscle fibers insert into collagen bundles lying within the receptor capsule. This anatomical arrangement enables fine-scale signaling of tension in discrete portions of muscles (Mileusnic and Loeb, 2009), allowing more precise CNS control and predictability of forces generated by the muscles (Alneas, 1967;Crago et al., 1982;Houk and Henneman, 1967;Mileusnic and Loeb, 2009). Interestingly, the GTOs of turtles exhibit features resembling both bradymetabolic and tachymetabolic tetrapods, where some encapsulation of the GTOs is visible near the muscle-tendon junction, but non-encapsulated or free-endings are also present deeper in the tendon (Huber and Dewitt, 1900). Currently, we know little about the GTO morphology of crocodilians and monotremes.
Differences in rhythmicity between tachymetabolic and bradymetabolic tetrapods have been identified in limb step cycle duration (Granatosky et al., 2018a;Ross et al., 2013), but these data do not directly refer to variability in the locomotor forces. One important question is whether substrate reaction forces are also less variable in taxa with low variation in step cycle duration. Maintaining a predictable limb loading environment may have important consequences for overall cost of locomotion (O'Connor et al., 2012;Verdaasdonk et al., 2006), limb bone safety factors (Bertram and Biewener, 1988;Blob et al., 2014;Lowell, 1985) and the ability to recover from unexpected obstacles or perturbations to locomotion (Daley et al., 2006). These factors may be especially important for birds and mammals, which have greater daily travel distances (Daley et al., 2016;Jedrzejewski et al., 2001;Klaassen et al., 2008;Marcus Rowcliffe et al., 2012;Stark et al., 2005;Thompson, 1992;Thompson et al., 1999) and higher metabolic costs than bradymetabolic tetrapods (Nagy, 1987(Nagy, , 2005Nagy et al., 1999). In this study, we used hindlimb substrate reaction forces collected during locomotion of 55 tetrapod species to test the following hypothesis: tetrapods with encapsulated GTOs have less-variable substrate reaction forces than species with unencapsulated GTOs. Corroboration of this hypothesis would support links between the degree of rhythmicity in cycle duration and predictability in the forces acting on the hindlimbs during locomotion.   Huber and Dewitt (1900) with permission. All information about histological preparation and imaging is available in Huber and Dewitt (1900).
During all new trials, animals were video-recorded from a lateral view at 60-125 Hz. Only strides in which the animal was traveling in a straight path and not accelerating or decelerating (i.e. steadystate locomotion) were selected for analysis. Steady-state locomotion was determined by calculating the instantaneous velocity of a digitized point on the head between subsequent video frames throughout the entire stride, and then using regression analysis to determine whether velocity changed during the stride (Granatosky, 2015;Granatosky and Schmitt, 2019). Only strides in which no change in speed (i.e. slope not significantly different from zero) was detected were analyzed.
From these data, five variables were calculated for each single hindlimb substrate reaction force: (1) braking peak (Bpk) force; (2) propulsive peak (Ppk) force; (3) medial peak (Mpk) force; (4) lateral peak (Lpk) force; and (5) vertical peak (Vpk) force. Additionally, the times at which Bpk, Ppk, braking to propulsive transition (B/P), Mpk, Lpk and Vpk occurred during stance phase were also recorded. All force data were normalized for the direction of travel, differing body mass, and whether the limb that touched the instrumented portion of the runway was left or right. This resulted in comparable force curves that all displayed vertical force as a positive value on the vertical axis, braking force as a negative value on the fore-aft axis, propulsive force as a positive value on the fore-aft axis, medially oriented substrate reaction force as a negative value on the mediolateral axis, and laterally oriented substrate reaction force as a positive value on the mediolateral axis.
Inter-cycle variation in limb loading was assessed using the coefficient of variation (CV*) of peak forces and of the timing of these peaks within each stance phase. Coefficients of variation were calculated within individuals for each species using CV*=(1+1/ 4n)CV, where n is equal to the number of strides. The CV* of stride cycle duration was also calculated for each individual. The inclusion of n in the calculation of CV* provides an unbiased approximation of relative variance when sample size is low (Sokal and Rohlf, 2012). Because of the limited number of isolated hindlimb substrate reaction forces available for Pleurodeles waltl (i.e. one hindlimb substrate reaction force per individual), data for this species were combined for all statistical analyses. The CV* of stride cycle duration for P. waltl was calculated from data in Karakasiliotis et al. (2016) and for Recurvirostra avosetta, Haematopus ostralegus and Vanellus vanellus from data in Kilbourne et al. (2016).
For all analyses, variables were log 10 -transformed to more closely approximate normality and reduce the potentially confounding effects of extreme values (Keene, 1995;Sokal and Rohlf, 2012). The species-mean CV* of all limb loading variables and stride cycle durations were compared between species with unencapsulated versus encapsulated GTOs using a series of Mann-Whitney U-tests.
Despite an attempt to approximate normality in the dataset via log 10 transformation, the Mann-Whitney U-test remained the preferred conservative method of analyses because of the small sample sizes (e.g. 55 species) (Sokal and Rohlf, 2012). Mann-Whitney U-tests were conducted in MATLAB (v.2017b;MathWorks). Although information about GTO morphology is lacking for crocodilians, data collected from Caiman crocodilus were analyzed along with those for other bradymetabolic tetrapods following Ross et al. (2013).
It is important to note that several variables are thought to affect variation in force magnitude and timing (see Table S1). Consequently, it may be the case that statistical differences observed via Mann-Whitney U-tests described above do not effectively address the potentially influential effects of these confounding variables. As such, we conducted a series of linear mixed-effects models to assess the relationship between the variables of interest with species nested within GTO morphology as a random effect, and GTO morphology (i.e. encapsulated versus unencapsulated), substrate, number of hindlimb substrate reaction forces analyzed, body mass and contact time as fixed effects. As it is well known that speed has a substantial effect on both force magnitude and the shape of force profiles (but see Figs S1 and S2), it is important to consider speed and variation in speed as additional fixed effects. However, because of the large disparity of body sizes analyzed within this study, considering speed and variation in speed without considering potential scaling effects is untenable. As such, dimensionless speed (i.e. speed divided by the square root of acceleration due to gravity multiplied by leg length: s/ √(gL)] and variation in dimensionless speed were utilized instead and included in the model as additional fixed effects. Hindlimb length for each individual was determined from direct measurements from the animals, calibrated space in video recordings, the literature (Karakasiliotis et al., 2016), or based on a closely related taxon (hindlimb length for Ambystoma mexicanum was based on data from A. tigrinum). Preliminary model runs included the interaction between GTO morphology and mass, dimensionless speed, dimensionless speed CV* and contact time; however, these interactions were only rarely significant (3 out of 44). This indicates that the slope of relationships between limb loading/timing CV* and mass, dimensionless speed, dimensionless speed CV* and contact time does not differ between GTO morphologies. Thus, none of these interactions were included in the full models. As the goal of this study was to investigate the influence that GTO morphology has on limb loading magnitude and timing, we constrained comparison of our full model to a single null that did not include GTO morphology as a fixed effect, nor did it include the GTO nesting (i.e. species was an unnested random effect in the null model). The Burnham and Anderson (2001) approach for model comparison was used and Akaike's information criterion (AIC) generated for each model. AIC provides a measure of the goodness of fit of an estimated model and an operational way of trading off the complexity of an estimated model against how well the model fits the data. The best model has the lowest AIC and the significance of full models versus the null models was tested using likelihood ratio tests. Linear mixed-effects models were constructed and analyzed in R using 'lme4' (Bates et al., 2014) following Winter (2013 preprint). Individual CV* for each of the variables of interest were used to construct linear mixed-effects models. Mass, dimensionless speed, dimensionless speed CV*, contact time and number of trials were centered and scaled prior to analysis.
Phylogenetic relatedness between sample taxa may influence these statistical analyses (Felsenstein, 1985;Garland et al., 1992), so we took the following steps to account for these effects in our comparisons. First, we generated a sample of 100 phylogenetic trees to account for phylogenetic uncertainty using the template of a recently published study on European tetrapods (Roquet et al., 2014). To do this, we first built the trunk of the phylogenetic tree to include the most recent common ancestor (mrca) of each of the following crown groups: Amphibia, Mammalia, Lepidosauria, Testudines, Crocodylia and Aves. Tree topology was fixed to widely accepted relationships among these major groups and the depth of each mrca node was fixed to the mean value reported at www.timetree.org (Hedges et al., 2006(Hedges et al., , 2015Kumar and Hedges, 2011;Kumar et al., 2017). Next, we grafted samples of trees for each crown group onto this trunk. To do this, we retrieved 1000 posterior samples of trees from www.vertlife.org/phylosubsets that were generated from phylogenetic analyses of squamates (Tonini et al., 2016), birds (Jetz et al., 2014) and amphibians (Jetz and Pyron, 2018). We used a posterior sample of 100 trees for mammals (Kuhn et al., 2011), which are based on a recent supertree analysis (Hedges et al., 2006(Hedges et al., , 2015Kumar and Hedges, 2011;Kumar et al., 2017). Our dataset had three turtle species; therefore, we set the branching time between these taxa using values from www.timetree.org (Hedges et al., 2006(Hedges et al., , 2015Kumar and Hedges, 2011;Kumar et al., 2017). We then randomly chose one sample of each of these trees, then grafted them onto the appropriate node. We repeated this procedure 100 times to produce a posterior sample of 100 trees that accounted for uncertainty in branch length and topology. These trees were not ultrametric because of the decimal precision of the branch length estimates in the grafted trees; therefore, we forced them to be ultrametric by adding small amounts of branch lengths as needed (see http://blog.phytools.org/2017/03/forceultrametricmethod-for-ultrametric.html for additional explanation). The final sample of 100 ultrametric, dated phylogenetic trees was used in all subsequent analyses. The maximum clade credibility tree from this sample had 100% nodal support for all nodes except for: (1) the node connecting Varecia variegata and Lemur catta, which had 60% support and (2) the node connecting Meleagris gallopavo and Gallus gallus, which had 52% support. The results of subsequent comparative analyses are presented as the mean±s.d. of the test statistic as computed from the sample of 100 trees. R-packages used to construct the trees included 'ape' (Paradis et al., 2004) and 'phangorn' (Schliep, 2011).
We tested whether species-mean CV* of limb loading variables and stride cycle duration differed between tetrapods with encapsulated versus unencapsulated Golgi tendon morphology by fitting four different evolutionary models to our data given our sample of phylogenetic trees. The first two models were a single rate Brownian motion model (BM-1) and a single optimum Ornstein-Uhlenbeck model (OU-1) (Hansen, 1997). The BM-1 model assumed that the CV* of all limb loading variables and stride cycle duration evolved under a single evolutionary rate. The OU-1 model assumed that only a single evolutionary trait optimum (i.e. one type of Golgi tendon morphology) was present with a parameter α pulling trait evolution towards that optimum. The other two models we fitted were a two-rate Brownian motion model (BM-M) and a two-optimum Ornstein-Uhlenbeck model (OU-M). To fit the BM-M and OU-M models, we assumed that the ancestral condition for tetrapods was unencapsulated GTOs and that the mammalian and avian lineages independently evolved encapsulated GTOs, and then 'painted' the internal branches of the phylogeny accordingly (Figs 2-5). We fitted these models over the sample of 100 trees and then computed the mean and standard deviation of parameter estimates across the 100 model fits. To determine which model (BM-1, BM-M, OU-1 or OU-M) was the 'best' fit to the data, we computed the small-sample size AIC for each model and computed Akaike weights from the AIC scores (Burnham and Anderson, 2001). We note that majority support for either the OU-M or BM-M model(s) would indicate that GTO morphology was an important predictor of the evolution of CV* of limb loading variables and/or stride cycle duration.
We ran these evolutionary models using two different inputs. First, we used the function phyl.resid in phytools (Revell, 2012) to fit a phylogenetic, multiple least squares regression with log 10 -transformed species mean values for CV* of limb loading variables and stride cycle duration as the responses (separate regression for each response), and with log 10 mass, dimensionless speed and dimensionless speed CV* as predictors, all whilst accounting for phylogeny and assuming a Brownian motion model of trait covariance. This function returned a vector of species residuals, which can be interpreted as mass, dimensionless speed and dimensionless speed CV* 'corrected' values. These residuals were then used as input for the first set of evolutionary models listed above. For the second set of evolutionary models, we incorporated sampling error because it can have an important impact on analysis (Ives et al., 2007). To do this, we fitted models to the log 10 -transformed species mean values for CV* of limb loading variables and stride cycle duration. We used squared standard errors as our estimate of sampling error. Standard errors were computed per species by first computing the mean CV* per variable within each individual sampled, then computing the per-species standard deviation and dividing that standard deviation by the square root of the number of individuals sampled within that species. Some species had only one sampled individual, and thus their standard error could not be computed using this method. For these species, we assumed a standard error that was the arithmetic mean of all other species standard errors. Unfortunately, neither set of models is 'ideal'. The first set of models accounts for covariates that may influence force or cycle duration variables, but we are unaware of a method to account for species level 'error' in the residuals used as input for the first set of models. The second set of models can account for 'error' but does not adjust for covariates. In the context of these caveats, we fitted the evolutionary models using the mvMORPH package (Clavel et al., 2015).
We computed type I error rates and statistical power for the OU-M models using a simulation approach (Boettiger et al., 2012;Cooper et al., 2016;Schmitz and Higham, 2018). We did this by simulating 100 datasets under a BM-1 model of evolution and an additional 100 datasets under an OU-M model. Starting values for each model were derived from the fit of the first model from our analyses done over the sample of 100 trees, and performed separately for mass/ dimensionless speed/dimensionless speed CV corrected limb loading variables, and for raw variables that accounted for intraspecific sampling error. We then fitted the simulated datasets using BM-1 and OU-M and used the results of these fits to compute: (1) the proportion of BM-1 datasets fitted with OU-M models that had lower AIC than BM-1 datasets fitted with BM-1 models (type I error rate) and (2) the proportion of OU-M datasets fitted with OU-M models that had lower AIC than OU-M datasets fitted with BM-1 models (statistical power). We also computed selection opportunity (η), the discrimination ratio (φ) and the signal to noise ratio. These three variables are dimensionless quantities that can provide insight into statistical power when using OU-M models (Cressler et al., 2015). We compared our computed values for η, φ and the signal to noise ratio with those from a previous simulation study to help better understand our statistical power, given our relatively low sample size (Cressler et al., 2015).
To test whether variation in single limb loading affects overall system rhythmicity, we conducted a series of regression analyses to assess the relationship between species-mean log 10 CV* for each of the limb loading variables and CV* of stride cycle duration. A series of phylogenetic least squares regression (PGLS) analyses was also conducted to account for the effect of phylogeny on these relationships using the R-package phylolm (Ho and Ané, 2014). Covariance in the PGLS was modeled using Pagel's λ and using a single-optimum Ornstein-Uhlenbeck model, so two PGLS models were fitted for each limb loading variable. For the Pagel's λ model, λ can vary between 0 and 1, with 0 being a branch length transformation resulting in a star phylogeny and a λ of 1 resulting in the original phylogeny. Thus, a model fitted using Pagel's λ estimates the phylogenetic signal in the regression and transforms branch lengths accordingly. We checked for * Fig. 3. Phylogeny of species used in this study and bar graphs of logtransformed mean coefficients of variation (CV*) of stride cycle duration for each species. See Fig. 2 for species names. Coefficients of variation were calculated within individuals for each species using CV*=(1+1/4n)CV, where n is the number of strides. Species with encapsulated Golgi tendon organs (GTO) are illustrated in black and species with unencapsulated GTOs are in red. Branch colors on phylogeny correspond to hypothesized ancestral GTO morphology (encapsulated: black, unencapsulated: red). For scale, use Pleurodeles waltl (marked with an asterisk) at 1.32. normality and homoscedasticity of the residuals using diagnostic plots, and no issues were detected. Model fit for each variable was compared using Akaike weights. The P-values of the slope estimates for the best fitting models were corrected for multiple comparisons using the false discovery rate (Benjamini and Hochberg, 1995).

RESULTS
We analyzed 1930 hindlimb substrate reaction forces collected from 150 individuals. As found previously, CV* of stride cycle duration is lower in animals with encapsulated GTO morphology (Fig. 3). On average, tachymetabolic tetrapods with encapsulated GTO morphology (i.e. mammals and birds) also experience lower variation in peak force magnitude and the timing at which those peak forces occur compared with bradymetabolic tetrapods with unencapsulated GTOs (i.e. amphibians, lizards, turtles and crocodilians) (Tables 1-3, Figs 4 and 5).
Results from Mann-Whitney U-tests revealed significant differences (all P≤0.044) based on GTO morphology for CV* of stride cycle duration and all limb loading variables, except Lpk CV* and the timing of Lpk CV*. Linear mixed-effects models did reveal the importance of considering other variables in addition to GTO morphology when exploring the causes of variability in limb loading and timing, such as speed, variation in speed, contact time, number of strides, substrate and body mass (Table S1). However, in all cases, except in regards to the timing of Lpk and timing of Vpk CV*, the inclusion of information about GTO morphology in the linear mixedeffects models resulted in significantly lower AICs (Table 3). Lower AICs indicate that consideration of GTO morphology results in more parsimonious explanations for variability in limb loading and timing than a model that does not include GTO morphology. The OU-M models were the best fit for six out of 12 limb loading and cycle duration CV* variables when they were corrected for size, speed and speed CV* (Table 4). OU-1 models were the best fit for the other six limb loading and cycle duration CV* variables, and in all of these cases OU-M models were the second best fit (Table 4). In the second set of models, which accounted for intraspecific sampling error but did not correct for body mass, speed and speed CV*, the OU-M models were the best fit for the timing of Bpk CV*, timing of Ppk CV*, timing of Lpk CV* and timing of B/P CV*, while BM-M models were the best fit for timing of Vpk CV* and Bpk CV*. OU-1 and BM-1 models were the best fit for the other six variables (Table S2). On average, the OU-M/BM-M models were favored in 50% of the cases, suggesting GTO morphology has evolved towards distinct optima and/or at distinct rates for some limb loading variables but not others. Simulations and computed values of η, φ and the signal to noise ratio all suggest moderate to high statistical power for most variables [Table S3; but see Cressler et al. (2015) for a cautious note on interpreting these values], meaning that if an OU-M process generated the observed limb loading CV* patterns, then we were likely to detect that process. However, simulations also found inflated type I error rates (mean 0.17, range 0.06-0.25), suggesting that we too often reject a BM-1 model when it might be the 'correct' evolutionary model. Phylogenetic half-life is reasonable for most of the OU-M and OU-1 models (i.e. in the range of the length of the tree, <352 mya; Table 4), although some models that include standard error have a very large half-life, suggesting that traits will never reach their optima (Table S2). There was a significant relationship between CV* of stride cycle duration and Ppk force CV* ( y=0.29x+0.81; P=0.009), Vpk force CV* ( y=0.35x+0.84; P=0.005) and the timing of Lpk force CV* ( y=−0.29x+1.74; P=0.016) (Fig. S3). PGLS models using Pagel's λ had the highest Akaike weight for all limb loading variables (Table 5). Pagel's λ was ∼0.3-0.4, suggesting relatively weak phylogenetic signal in the relationships between CV* of stride cycle duration and all limb loading variables. PGLS found no significant relationships between CV* of stride cycle duration and limb loading variables after accounting for phylogenetic relatedness of sample taxa (Table 5).

DISCUSSION
In general, variance in peak force magnitude and the timing at which those peak forces occur was found to be lower in tachymetabolic tetrapods with encapsulated GTOs (i.e. mammals and birds) than in bradymetabolic tetrapods with unencapsulated GTOs. This is consistent with the hypothesis that birds and mammals have convergently evolved the ability to perceive precise information on muscle tension and as such can maintain a more predictable limb loading environment. That being stated, it is important to recognize several constraints on our experimental design that may limit the scope of its applicability. First, as with many studies that analyze force profiles, variation in locomotor speed across species, individuals and trials can have substantial effects on the interpretation of results (e.g. Bishop et al., 2018;Demes et al., 1994;Granatosky and Schmitt, 2019;Granatosky et al., 2018b). Despite our use of dimensionless speed as a means to address this issue, it remains the case that one cannot discount speed and variation in speed entirely as an explanatory factor when exploring variability in limb loading and timing (Table S1). However, in almost all cases, the inclusion of information about GTO morphology in the linear mixed-effects models results in a more parsimonious explanation of the observed patterns in limb loading variation and timing across the species sampled. As such, we have observed no evidence suggesting that variation in locomotor speed across species, individuals and trials in some way negates the major conclusions of this study. Though we addressed potentially confounding associations with dimensionless speed and dimensionless speed variation through statistical analyses, a more appropriate means of addressing this issue would have been Mann-Whitney U-tests were used to compare species-mean CV* of limb loading variables between tetrapods with encapsulated versus unencapsulated GTO morphology. Linear mixed-effects models were used to assess the relationship between the variables of interest with species and subject as random effects, and GTO morphology (i.e. encapsulated versus unencapsulated), substrate, number of hindlimb substrate reaction forces analyzed, body mass, dimensionless speed, variation in dimensionless speed and contact time as fixed effects. Model (d.f.=11) comparison was constrained to a single null (d.f.=10) that did not include GTO morphology as a fixed effect. GTO, Golgi tendon organ; Bpk, braking peak; Ppk, propulsive peak; Mpk, medial peak; Lpk, lateral peak; Vpk, vertical peak; B/P, braking to propulsive transition; and AIC, Akaike's information criterion.
more rigorous sampling at the initial experimental stages. Because this study used a combined dataset originating from multiple independent studies of freely moving animals, this was not possible. Future testing of the hypotheses presented here should take all possible precautions to ensure similar speeds, gait types and preferably Froude numbers between individuals, though this may be difficult to achieve across the full diversity of tetrapod species and body designs. Another potential limitation was based on our goal to use data collected from animals moving on their preferred substrate. No data from arboreal bradymetabolic animals were available, raising the possibility that the observed differences are simply the result of locomotion on different substrates (i.e. arboreal versus terrestrial). Our statistical analyses that account for differences in substrate use suggest no such conclusion, but data on the limb loading behavior of arboreal lizards currently being collected by Knight and Lee (2019) and Munteanu et al. (2019) will help to address this issue. Related to this, postural differences among tetrapods have clear effects on the limb effective mechanical advantage, center-of-mass mechanics, limb kinematics, energetic savings from spring or pendular mechanisms, gait and ecological use of locomotor behaviors (reviewed by Reilly et al., 2007). Any or all of these factors may explain differences among these taxa in the variation observed in substrate reaction forces, and their covariation makes disentangling their individual effects challenging. That being said, the sprawling locomotion of the common vampire bat (Desmodus rotundus) does not appear to influence inter-cycle loading variability compared with the other mammals sampled. Similarly, the 'intermediate' postures used by C. crocodilus (Nyakatura et al., 2019) appear to do little to differentiate limb loading variation and timing of this taxa from other bradymetabolic tetrapods. Even though these are only two species, these data suggest that posture is less important in driving patterns in force variability than factors related to GTO morphology.
Finally, we acknowledge that our data underrepresent total tetrapod diversity and are skewed towards primates (14/55 species sampled) and tachymetabolic species broadly (39/55 species sampled). Data on forces and GTO anatomy are needed from a greater diversity of species, especially basal mammals, crocodilians and salamanders. Moreover, sampling more species may help to reduce the inflated type I error rates we found associated with the OU-M models (Cooper Coefficients of variation were calculated within individuals for each species using CV*=(1+1/4n)CV, where n is the number of strides. Species with encapsulated GTOs are illustrated in black and species with unencapsulated GTOs are in red. Branch colors on phylogeny correspond to hypothesized ancestral GTO morphology (encapsulated: black, unencapsulated: red). For scale, use Bpk CV* for Pleurodeles waltl (marked with an asterisk) at 1.92. et al., 2016). Sampling GTO morphology and forces within a greater diversity of turtles would serve as a powerful test of the link between GTO morphology and force CV* because it would control for metabolic type (i.e. all turtles are bradymetabolic) and it would limit the myriad of confounding variables inherent in sampling at a broad phylogenetic scope, such as all of Tetrapoda.
These concerns notwithstanding and as stated above, analyses of species-mean variation in limb loading magnitude and timing confirm that, for most of the variables analyzed, variance in hindlimb loading is significantly lower in animals with encapsulated versus unencapsulated GTOs. This difference is significant regardless of speed, variation in speed, contact time, number of individuals, number of strides, substrate and body mass. This result has mixed support from the evolutionary analyses; the OU-M models that assume distinct evolutionary trait optima for animals with encapsulated versus unencapsulated GTOs are the best fit for ∼50% of the limb loading variables. The large magnitude of the differences in variance between animals with encapsulated versus unencapsulated GTOs in both peak hindlimb forces and the timing of those forces, as well as the persistence of these differences across multiple lineages of birds and mammals, suggest that these clade-specific differences in limb loading provide insight into the functional significance of differences in rhythmicity. Specifically, maintaining a predictable limb loading environment may have important consequences for overall costs of locomotion (O'Connor et al., 2012;Verdaasdonk et al., 2006), bone safety factors (Bertram and Biewener, 1988;Blob et al., 2014;Lowell, 1985) and the ability to recover from unexpected falls (Daley et al., 2006). We address each of these in turn below.
While the substrate reaction forces examined in this study do not provide a direct measure of force generation by the muscles, the external forces acting on the body during locomotion must be resisted by muscular activity (Beck, 2009;Gray, 1944Gray, , 1968. As such, variation in hindlimb substrate reaction forces provides insight into variation in muscle force production during locomotion. The energetic costs of moving the body constitute a high proportion of the overall metabolic budget of an animal (Kram and Taylor, 1990;Pontzer, 2016;Reilly et al., 2007) and the predominant energy-consuming process in locomotion is muscle force production (Kram and Taylor, 1990;Pontzer, 2016). During locomotion on level substrates, muscle forces produced by limb muscles must support body weight and propel the animal forward. To optimize energy expenditure, animals should only apply the amount of force necessary to achieve support, balance and propulsion (O'Connor et al., 2012;Taylor et al., 1980Taylor et al., , 1982 as increased variability in muscle force magnitude wastes considerable amounts of energy (Agiovlasitis et al., 2015;Granatosky et al., 2018a;O'Connor et al., 2012;Verdaasdonk et al., 2006). Hence, minimizing variability in muscle force generation contributes to energetic efficiency during steady-state locomotion. Minimizing variation in substrate reaction forces also reduces the likelihood that oscillations of the center of mass and limbs will produce unstable dynamic states (Full et al., 2002;Jordan et al., 2007;O'Connor et al., 2012). In such states, avoiding falling and interlimb interference likely necessitates more muscle recruitment and more work by the limbs and their muscles. For birds and mammals, which have greater daily travel distances (Daley et al., 2016;Jedrzejewski et al., 2001;Klaassen et al., 2008;Marcus Rowcliffe et al., 2012;Stark et al., 2005;Thompson, 1992;Thompson et al., 1999) and higher metabolic costs than bradymetabolic tetrapods (Nagy, 1987(Nagy, , 2005Nagy et al., 1999), minimizing unnecessary energetic expenditure by maintaining a predictable limb loading environment is likely to have had an important selective benefit.
During locomotion over land, limb bones are exposed to loads and, like most biological structures, they can withstand greater loads than they usually experience, as estimated by their safety factor (Alexander, 1981(Alexander, , 1988Blob et al., 2014;Lowell, 1985). Among tetrapods, birds and eutherian mammals (opossums have safety factors consistent with bradymetabolic tetrapods; Butcher et al., 2011;Gosnell et al., 2011) have lower limb-bone safety factors than Residuals are from regressions of log 10 limb loading and stride cycle duration CV* on log 10 mass, log 10 speed and log 10 speed CV*. do other tetrapod lineages (Blob et al., 2014), possibly as a result of the greater predictability of the loads (Bertram and Biewener, 1988;Blob et al., 2014;Lowell, 1985). We hypothesize that improved predictability of dynamic loading facilitates the capacity of birds and mammals to operate successfully with lower limb bone safety factors, making it possible to reduce energetic costs as well (Alexander, 1997;Lowell, 1985). The data presented here suggest that the limbs of birds and mammals experience reduced variability in external forces compared with other tetrapod lineages. We speculate that this is in part due to anticipatory modulation of reflexes through γ-motoneurons and enlarged cerebella, as well as to enhanced precision of their GTO system compared with other tetrapods (Gregory and Proske, 1975;Gregory et al., 2002;Haiden and Awad, 1981;Huber and Dewitt, 1900;Proske, 1979). At present, we know little about the control strategies that tetrapods use to maintain stability in the face of the unexpected obstacles they experience in their natural environment. Daley and colleagues (2006) addressed this question by perturbing the running of guinea fowl with an unexpected drop in substrate height. To avoid instability upon encountering a sudden drop, the bird must dissipate energy, convert it to another form, or perform both in combination (Biewener and Daley, 2007;Daley et al., 2006). Interestingly, guinea fowl adopt a range of these strategies across a continuum that relates to magnitude and direction of the substrate reaction force. When animals experience an unexpected perturbation, limb muscles must activate with the appropriate timing and intensity to resist substrate reaction forces and provide the appropriate leg stiffness (Daley et al., 2006). The activation level of the limb muscles depends on a combination of feed-forward, rhythmic motor control and proprioceptive feedback, including muscle stretch (spindle organs) and GTOs (Grillner, 1975;Pearson et al., 1998). The derived GTO morphology of birds and mammals and the increased predictability of rhythmic movements may allow birds and mammals to return to a state of dynamic stability after an unexpected fall quicker than animals with unencapsulated GTOs. Future work in this area is required to test this hypothesis.
While variation in limb loading does appear to be largely driven by differences in GTO morphology, the magnitude of this variation is largely variable dependent. Namely, propulsive and braking forces show the greatest disparity between species with encapsulated versus unencapsulated GTOs. This is followed by vertical forces, and much smaller differences are observed in mediolateral forces, which tend to be highly variable across strides for all species. Arguably, there are functional reasons and consequences associated with these findings. As articulated by Bishop et al. (2018), mediolateral forces are probably only (or at least predominantly) exerted for stabilization purposes. That is, they reflect small-scale, step-to-step adjustments made by the animal in order to maintain dynamic stability. Therefore, rather than being an important motor goal to achieve straight-line locomotion, mediolateral forces may be viewed as a constraint: simply apply whatever mediolateral force is necessary at each instant in time to maintain dynamic stability. Furthermore, because mediolateral forces tend to be relatively small compared with vertical and fore-aft force components, even in sprawling taxa, small fluctuations about the mean result in substantially greater variance (Sokal and Rohlf, 2012). Vertical forces are usually the largest that an animal exerts and primarily serve to support the body against gravity (Gray, 1944). As such, maintaining appropriate vertical forces is essential to preventing an animal from collapsing. As a result, there is likely less room for variance in this loading parameter compared with the other force components. In terms of both timing and magnitude, variation in propulsive and braking forces is greatest between sample taxa. These fore-aft forces functionally serve to keep the animal moving forward and inhibit out-of-control momentum of the center of mass (Granatosky et al., 2018b;Gray, 1944). Thus, propulsive and braking forces likely most influence overall system Table 5. Phylogenetic generalized least squares (PGLS) models of the relationships between coefficient of variation (CV*) of stride cycle duration (y) and CV* of all limb loading variables (x) Bold indicates models with the most support. Values presented are means based on running the analysis on 100 trees to account for phylogenetic uncertainty (standard deviations not shown, but were at least an order of magnitude smaller than the mean for all parameters). Variables defined as follows: σ 2 , Brownian motion rate parameter; α, strength of pull towards trait optimum under OU-1 model; λ, Pagel's lambda. Models defined as follows: λ, Pagel's lambda model; OU-1, single optimum Ornstein-Uhlenbeck. Bpk, braking peak; Ppk, propulsive peak; Mpk, medial peak; Lpk, lateral peak; Vpk, vertical peak; B/P, braking to propulsive transition; and AIC, Akaike's information criterion.
rhythmicity, which, as discussed above, has clear selective advantages for birds and mammals. It is also the case that fore-aft forces most strongly correlate with overall external morphology of bony structures (Fabre et al., 2016). This relationship may explain the overlapping patterns in bone safety factors observed by Blob et al. (2014) and the findings of this study.

Conclusions
This study demonstrates that, in addition to having less-variable cycle durations, tachymetabolic tetrapods (i.e. birds and mammals) also exhibit lower variation in limb loading magnitude and timing during locomotion compared with bradymetabolic tetrapods (i.e. amphibians and reptiles). The ability of birds and mammals to monitor and correct force variability could be linked to neural specializations such as encapsulated GTOs positioned near the muscle-tendon junction, along with the presence of γ-motoneurons and enlarged afferents and cerebella. We hypothesize that a predictable limb loading environment is advantageous for birds and mammals by allowing energy savings during locomotion, lower safety factors in limb bones and quicker recovery from perturbations.