
Elasticity and stress relaxation of rhesus monkey (Macaca mulatta) vocal folds
Tobias Riede


Fundamental frequency is an important perceptual parameter for acoustic communication in mammals. It is determined by vocal fold oscillation, which depends on the morphology and viscoelastic properties of the oscillating tissue. In this study, I tested if stress–strain and stress–relaxation behavior of rhesus monkey (Macaca mulatta) vocal folds allows the prediction of a species' natural fundamental frequency range across its entire vocal repertoire as well as of frequency contours within a single call type. In tensile tests, the load–strain and stress–relaxation behavior of rhesus monkey vocal folds and ventricular folds has been examined. Using the string model, predictions about the species' fundamental frequency range, individual variability, as well as the frequency contour of ‘coo’ calls were made. The low- and mid-frequency range (up to 2 kHz) of rhesus monkeys can be predicted relatively well with the string model. The discrepancy between predicted maximum fundamental frequency and what has been recorded in rhesus monkeys is currently ascribed to the difficulty in predicting the behavior of the lamina propria at very high strain. Histological sections of the vocal fold and different staining techniques identified collagen, elastin, hyaluronan and, surprisingly, fat cells as components of the lamina propria. The distribution of all four components is not uniform, suggesting that different aspects of the lamina propria are drawn into oscillation depending on vocal fold tension. A differentiated recruitment of tissue into oscillation could extend the frequency range specifically at the upper end of the frequency scale.


The elongation of mammalian vocal folds causes the tissue to stiffen and therefore to vibrate at higher rates, directly affecting the fundamental frequency (f0) of phonation (Hollien, 1960; Hirano et al., 1969). At strains larger than 20–30%, the stress response is highly nonlinear and demonstrates quantitative differences between individuals, sexes, as well as species (Haji, 1990; Min et al., 1995; Chan et al., 2007; Hunter and Titze, 2007; Zhang et al., 2009; Riede et al., 2010). When vocal folds are strained, some stress is released (Zhang et al., 2009) and would cause a drop in f0 if not compensated, for example, by further stretching. In a first approximation, a vocal fold behaves like a string. Stiffening increases the oscillation rate and, over time, the oscillation rate decreases due to stress relaxation. The string (vocal fold) has to be tuned again. A string can be operated within a certain stress range. A minimum positive stress prevents the string from sagging, and a maximum stress indicates the point before the material gets irreversibly damaged. I tested the hypothesis that the string model allows the prediction of the f0 range of a species' entire vocal repertoire as well as of the f0 contour within a specific call. The rhesus monkey (Macaca mulatta) is an excellent model to study this hypothesis because its vocal repertoire is well known, and a vocalization such as the ‘coo’ call (Fig. 1) shows a very specific rising-and-falling f0 contour that can only be achieved if laryngeal intrinsic muscles stiffen the vocal fold tissue to the correct stress level and simultaneously compensate for stress relaxation if necessary.

Most mammals produce their calls by vocal fold oscillations, which are caused by airflow forced through the larynx (e.g. Hast et al., 1974; Jürgens, 2002; Brown et al., 2003). The oscillation rate determines the f0 of the sound that a con-specific can hear and interpret. f0 is an important perceptual parameter for acoustic communication. It is controlled actively by the aerodynamic driving force (lung pressure), vocal fold positioning and nonlinear source–filter interactions, but, most importantly, it is determined passively by viscoelastic properties of the oscillating tissue (Titze, 1988; Titze, 2008; Titze et al., 1997; Titze et al., 2008; Chan et al., 2009). The relationship between vocal fold viscoelastic properties and f0 of the human voice has been established via the string model, in which string (or vocal fold) length, material stress and material density determine the rate of oscillation (e.g. Titze, 2000; Chan et al., 2009). In other species, it is unknown if and how the string model holds and how viscoelastic data relate to a species' f0 range. In the present study, I investigated quality and magnitude of the vocal fold stress response in rhesus monkeys and made predictions about the species' expected f0 range, specifically discussing the relevance of vocal fold viscoelastic properties for coo call production.

The coo call is a typical vocalization of all species in the genus Macaca (Green, 1975; Grimm, 1967; Itani, 1963; Lillehei and Snowdon, 1978; Hohmann and Herzog, 1985; Hauser, 1991; Hauser and Fowler, 1992; Owren et al., 1992; Owren and Rendall, 2003; Rendall et al., 1996). Coo calls are approximately 500 ms in duration. f0 increases variably during the first part of the call. The f0 contour varies at species, sexual and individual levels and is associated with certain context (Green, 1975; Hauser, 1991), and the laryngeal muscle activity demonstrates a f0-dependent pattern (Hast et al., 1974; West and Larson, 1993).

The goal of the present study is to quantify the tensile stress response of male and female rhesus monkey vocal folds, including its time dependence. I have also collected histological data because the composition of the vocal fold, specifically of the lamina propria, is critical for its viscoelastic behavior. Additionally, I have investigated the ventricular folds. They mark the cranial edge of the entrance to the lateral laryngeal ventricles and therefore are very closely located to the vocal folds. In humans, the ventricular folds (or ‘false folds’) are involved in some types of phonation (e.g. Tsai et al., 2008). It is likely that they are also involved in nonhuman primate phonation, in particular in high-amplitude screaming (van den Berg, 1955; Agarwal et al., 2003; Alipour et al., 2007).

Fig. 1.

Time signal and spectrogram of a rhesus macaque coo call (courtesy of Drew Rendall, University of Lethbridge). Note that the fundamental frequency in the coo call increases during the first 100 ms before it remains constant for a while and decreases towards the end. Stress relaxation in the monkey's vocal fold would likely cause the fundamental frequency to drop if no compensation for the tissue relaxation occurs.


Twelve rhesus monkey (Macaca mulatta, Zimmermann 1780) larynges (six males, six females) became available through the Wisconsin National Primate Research Center (WNPRC, Nonhuman Primate Tissue Distribution Program, Madison, WI, USA). Tissue was collected immediately after the animal was sacrificed. The tissue was quickly frozen in saline solution in liquid nitrogen and kept at –30°C until the experiment. Age, sex and body mass were available for all specimens (Table 1).


Six vocal and ventricular folds (three males, three females) were subjected to histology. The isolated tissue was stored in 10% neutral buffered formalin for 2 weeks before further processing. The tissue was then embedded in paraffin and 5 μm cross-sections were made.

Adjacent sections were exposed to one of the following stains: hematoxylin and eosin (H&E) for a general histological evaluation; elastica van Gieson stain (EVG) to identify elastic fibers; trichrome stain (TRI) to demonstrate collagen fibers; Alcian Blue stain (AB) (pH 2.5) to determine mucopolysaccharides and glycosaminoglycans. We also performed a hyaluronidase digestion (with bovine testicular hyaluronidase for 2 h at 37°C) in combination with a subsequent Alcian Blue stain to increase specificity for various acid mucosubstances in the Alcian Blue stain. Alcian Blue positivity is destroyed following prior incubation with hyaluronidase if hyaluronan is a major component of the mucosubstances. All stains were also performed on other tissues as positive control stains (artery for EVG; liver for TRI; small intestine for AB).

Table 1.

Summary of body mass (mb) and vocal fold length (L) for all specimens tested

Stress–strain measurements

The force–elongation data were obtained by (a) a 1 Hz sinusoidal cyclic loading and (b) a stepwise loading procedure, both performed by an automated electromechanical system, recording force and distance.

Before dissection, the in situ vocal fold length was measured with an accuracy of 0.1 mm. One vocal fold and one ventricular fold from each of six males and six females were dissected using microsurgery instruments and were tested in their complete length. Vocal folds and ventricular folds remained attached to a small portion of the arytenoid cartilage dorsally and to the thyroid cartilage ventrally. The thyroarytenoid muscle was removed while the lamina propria remained intact. The epithelium was also carefully removed. Great care was taken to make sure that the remaining fibers were not damaged. One suture connected the arytenoid cartilage to the lever arm of the servo-control lever system, and a second connected the thyroid cartilage to a fixation point below the lever arm.

The tissue was vertically mounted in a water-surrounded chamber containing saline solution (Ringer solution) maintained at 38°C (Fig. 2). The exact length between cartilage and fixation point in the clamp was measured with a caliper (±0.1 mm accuracy).

The force–elongation data were obtained by 1 Hz sinusoidal stretch and release of the vocal fold by means of a dual-mode servo-control lever system (Model 305B; Aurora Scientific, Aurora, ON, Canada). Displacement of and force on the lever arm (resolution 1 μm and 0.3 mN) were recorded. Elongation was applied in a longitudinal direction (dorso-ventral), followed by a shortening to the original length. The present set of experiments was conducted with a system under displacement control. A controlled sinusoidal displacement was applied to the lever arm so that the vocal fold was lengthened and shortened (loading–unloading condition) 15 times at a frequency of 1 Hz. The force and elongation signals were then transmitted via a 16-bit analog-to-digital acquisition board (Windaq Model DI722; DATAQ Instruments, Akron, OH, USA) at 6 kHz sampling frequency to a PC.

After two minutes of rest, in a second set of experiments, stress relaxation under fixed strain was measured. For stress relaxation estimations, the tissue was strained to 50% (ventricular folds) and 60% (vocal folds), respectively, using 500 ms ramp time and a 20 s holding period.

A pre-strain of 20% was applied to each specimen. Pre-strain is the elongation (relative to total specimen length) imposed on the specimen prior to each lengthening–shortening test. Vocal fold length differed between in situ (intact larynx) and ex situ (vocal fold excised from the cartilage framework of the larynx), most likely due to suspension of the vocal folds in the laryngeal cartilage framework. A pre-strain of 20% compensated for length changes due to isolating the tissue and was based on measurements of vocal fold length before and after isolation from the cartilage framework. The tension rise during the 20% extension is negligible. The distance between insertion points at the thyroid cartilage and the arytenoid cartilage was 19.8±2.1% shorter (mean ± s.d., N=5) after isolating the vocal fold.

Mass measurements were affected by hydration (tissue bath, see above) of the isolated tissue. We corrected all mass measurements by 10%, accounting for mass gain, following earlier studies (Riede and Titze, 2008; Riede et al., 2010) as well as a measurement in one rhesus monkey vocal fold. We adapted density values from an earlier study (Min et al., 1995). Hunter et al. demonstrated that small variations in density have a negligible effect on stress calculations (Hunter et al., 2007).

Fig. 2.

The tissue sample was mounted by suturing the arytenoid cartilage to the ergometer arm. The other end of the tissue was a piece of thyroid cartilage. The thyroid cartilage was sutured to the metal frame that could also hold the clamp.

Data analysis

True tensile strain (ϵ) was calculated as the natural logarithm of specimen length divided by its original mounting length: $\epsilon = \ln\left(\frac{l}{l_0}\right)$ (1) where l0 is the mounting length (including pre-strain) and l is the actual length of the specimen during stretching. Note that when strain is given as percentage in this paper, the term on the right side of Eqn 1 is multiplied by 100.

True tensile stress (σ) is defined as the ratio between force (F; in N) and cross-sectional area (A; in m2) of the specimen. Assuming tissue incompressibility, a uniform specimen cross-sectional area with roughly cylindrical geometry, and tissue isotropy, the average cross-sectional area can be calculated. The mean cross-sectional area ($\bar{A}$) is then: $\bar{A} = \frac{m}{\rho l}$ (2) where m is the specimen mass, ρ is the tissue density and l is the actual specimen length (Eqn 1). This equation considers the strain-dependent cross-sectional area A, which decreases as the specimen is elongated and increases as the specimen returns to its initial mounting length. Usually, ‘engineering’ or ‘nominal’ stress is used (where the cross-sectional area, A, is assumed constant). However, when the changes in cross-sectional area are significant, stress must be calculated using the strain-dependent cross-sectional area; the result is called ‘true stress’. Considering the large strains applied to the tissue, the use of true stress is critical. With the varying cross-sectional area, the tensile stress (in Pa) can be calculated as: $\sigma = \frac{F}{\bar{A}} = \frac{F \rho l}{m}$ (3) where F is the applied force.
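The strain and stress definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names and the specimen values are hypothetical, and the cross-sectional area is assumed to follow from incompressibility as volume divided by current length, i.e. m/(ρl).

```python
import math

def true_strain(length_m, mounting_length_m):
    """True (logarithmic) strain: ln(l / l0)."""
    return math.log(length_m / mounting_length_m)

def mean_cross_section_m2(mass_kg, density_kg_m3, length_m):
    """Mean area of an incompressible specimen: volume (m / rho) over length."""
    return mass_kg / (density_kg_m3 * length_m)

def true_stress_pa(force_n, mass_kg, density_kg_m3, length_m):
    """True stress: force divided by the strain-dependent mean area."""
    return force_n / mean_cross_section_m2(mass_kg, density_kg_m3, length_m)

# Hypothetical specimen (illustrative values, not measured data).
l0 = 0.010    # mounting length including pre-strain, m
l = 0.015     # stretched length, m
m = 2.0e-5    # specimen mass, kg
rho = 1020.0  # tissue density, kg/m^3 (1.02 g/cm^3)
F = 0.05      # measured force, N

strain = true_strain(l, l0)                       # ln(1.5), about 0.405
stress = true_stress_pa(F, m, rho, l)             # uses the area at length l
nominal = F / mean_cross_section_m2(m, rho, l0)   # area frozen at l0

print(f"true strain  = {strain:.3f}")
print(f"true stress  = {stress:.0f} Pa")
print(f"nominal      = {nominal:.0f} Pa")
```

With these numbers the true stress exceeds the nominal stress by the stretch ratio l/l0 = 1.5, which illustrates why the distinction matters at the large strains applied here.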

The ratio of stress and strain tells us how much the material stretches for a given load. It is called variously ‘stiffness’, ‘elastic modulus’ or ‘Young's modulus of elasticity’.

Cyclic loading

The overall stress–strain response of vocal fold tissue has been differentiated into a linear low-strain and a nonlinear high-strain region. The low-strain region is modeled with a linear function: $\sigma = a\epsilon + b$ (4) where a is the slope of the curve, and b is the y-axis intercept. The high-strain region is best approximated with an exponential equation: $\sigma = A e^{B\epsilon}$ (5) where A and B are constants to be determined empirically.

The amplitude of the cyclic strain application remains constant during 15 cycles, and the amplitude of the resulting stress response decreases (Fig. 3A). The combined stress–strain response results in a ‘banana-shaped’ curve (Fig. 3B), with a loading part (upper curve) higher than the unloading part (lower curve). The constants in Eqns 4 and 5, as well as the upper limit of the linear strain region (‘linear strain limit’, ϵ1), are derived by fitting a linear and an exponential regression line, respectively, to the empirical data, while maximizing the sum of both regression coefficients. The maximization process is performed by conducting a step-wise movement of the linear strain limit and performing a linear and exponential regression, respectively, on the two data sets at each step. The two resulting regression coefficients are added at each step [see Riede and Titze (Riede and Titze, 2008) for more detail]. The maximum of the sum of both regression coefficients is considered the linear strain limit, ϵ1, in Eqns 4–6.
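The step-wise search for the linear strain limit can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Riede and Titze (2008): the "regression coefficient" is taken to be the coefficient of determination (R²), the exponential model is fitted by ordinary least squares on ln(σ) (which requires positive stress values), and the function names and synthetic data are hypothetical.

```python
import math

def r_squared(xs, ys):
    """R^2 of an ordinary least-squares straight line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def find_linear_strain_limit(strain, stress, min_points=3):
    """Move the split point step by step; keep the split with the largest
    sum of the linear-region R^2 and the exponential-region R^2."""
    best_limit, best_score = None, -1.0
    for k in range(min_points, len(strain) - min_points + 1):
        # Low-strain region: first k points, straight line in stress.
        low = r_squared(strain[:k], stress[:k])
        # High-strain region: remaining points, straight line in ln(stress).
        high = r_squared(strain[k:], [math.log(s) for s in stress[k:]])
        if low + high > best_score:
            best_limit, best_score = strain[k - 1], low + high
    return best_limit, best_score

# Synthetic loading curve (hypothetical constants, stress in kPa):
# linear up to a strain of 0.16, exponential above it.
strain = [i / 50 for i in range(26)]
stress = [20.0 * e + 1.0 if i <= 8 else 1.5 * math.exp(8.0 * e)
          for i, e in enumerate(strain)]

limit, score = find_linear_strain_limit(strain, stress)
print(f"linear strain limit = {limit:.2f}, summed R^2 = {score:.3f}")
```

The returned limit is the largest strain still assigned to the linear region. On noisy data the log-linear fit weights low stresses more heavily than a direct nonlinear fit would, so the two approaches need not give identical limits.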

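The range of Young's modulus was described by the first derivative of the respective stress–strain function: $\frac{d\sigma}{d\epsilon} = \begin{cases} a, & \epsilon \le \epsilon_1 \\ A B e^{B\epsilon}, & \epsilon > \epsilon_1 \end{cases}$ (6) where A, B and a are constants.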

In a 1 Hz cyclic tensile test, an energy loss in the unloading phase relative to the loading phase was observed in all specimens (i.e. difference between upper and lower curve in Fig. 3B). This energy loss, or hysteresis, was estimated as the difference between the area under the curves of the loading and the unloading phases of the 1Hz-sinusoidal stress–strain response. Hysteresis is frequency dependent because it involves the strain rate with respect to time. We provide here only one estimate for a 1 Hz loading–unloading regime. The stress–strain responses (loading phase and unloading phase separately) were fitted with an exponential curve according to Eqn 5. The area under the curve between ϵ1 and maximum applied strain was estimated by integrating both exponential equations. The difference in the area under the curve between loading and unloading stress–strain response was considered hysteresis and expressed as a percentage.
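Because both phases are fitted with an exponential of the form A·exp(Bϵ), the areas can be integrated in closed form: the integral from ϵ1 to ϵmax is (A/B)·(exp(B·ϵmax) − exp(B·ϵ1)). The sketch below applies this. It assumes that the percentage is taken relative to the loading-phase area, which the text does not state explicitly, and the constants are hypothetical.

```python
import math

def area_under_exponential(A, B, eps_lo, eps_hi):
    """Closed-form integral of A * exp(B * eps) from eps_lo to eps_hi."""
    return (A / B) * (math.exp(B * eps_hi) - math.exp(B * eps_lo))

def hysteresis_percent(load_fit, unload_fit, eps_lo, eps_hi):
    """Energy lost between loading and unloading, as a percentage of the
    loading-phase area (assumed normalization)."""
    area_load = area_under_exponential(*load_fit, eps_lo, eps_hi)
    area_unload = area_under_exponential(*unload_fit, eps_lo, eps_hi)
    return 100.0 * (area_load - area_unload) / area_load

# Hypothetical (A, B) pairs for the loading and unloading curves.
load_fit = (2.0, 8.0)
unload_fit = (1.4, 8.0)

h = hysteresis_percent(load_fit, unload_fit, eps_lo=0.16, eps_hi=0.50)
print(f"hysteresis = {h:.1f}%")
```

In this example the two curves share the same B, so the result reduces to 1 − 1.4/2.0, i.e. 30%, which lies within the range reported for the specimens.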

Stress relaxation

Vocal and ventricular fold tissue is stretched at a predetermined rate (here 0.5 Hz) to the desired strain (50% for ventricular folds and 60% for vocal folds). The tissue is maintained at the respective strain for 20 s, thus producing a stress relaxation curve (Fig. 4). The curve was modeled with an exponential decay function between the time point when the maximum stress was achieved and one second thereafter: $\sigma(t) = (\sigma_0 - P)e^{-Kt} + P$ (7) where P is the estimated plateau, K is the rate constant and σ0 is the peak stress at time point zero (t0). Stress half-life is computed as ln(2)/K.
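A short sketch of the decay model and the quantities derived from it, assuming the one-phase form σ(t) = (σ0 − P)·exp(−K·t) + P. The parameter values are hypothetical and chosen only so that the half-life falls within the reported range.

```python
import math

def relaxation_stress(t, sigma0, plateau, k):
    """Stress at time t (s) after peak: decays from sigma0 toward the plateau."""
    return (sigma0 - plateau) * math.exp(-k * t) + plateau

def half_life(k):
    """Time for the decaying part (sigma - plateau) to halve: ln(2) / K."""
    return math.log(2) / k

def relative_relaxation(t, sigma0, plateau, k):
    """Fraction of the peak stress that has been lost by time t."""
    return 1.0 - relaxation_stress(t, sigma0, plateau, k) / sigma0

# Hypothetical parameters: 100 kPa peak, 70 kPa plateau, 200 ms half-life.
sigma0, plateau = 100.0, 70.0
k = math.log(2) / 0.2   # rate constant in 1/s

print(f"half-life           = {half_life(k) * 1000:.0f} ms")
print(f"stress at 200 ms    = {relaxation_stress(0.2, sigma0, plateau, k):.1f} kPa")
print(f"relaxation after 1 s = {100 * relative_relaxation(1.0, sigma0, plateau, k):.1f}%")
```

Note that the half-life describes the decay of the excess stress above the plateau, not the time for the total stress to fall to half its peak value.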

Fig. 3.

Stress–strain response in time from a 1 Hz sinusoidal elongation of lamina propria. (A) Note that the amplitude of strain remains constant while stress decreases over time. The decrease in stress is a result of stress relaxation. (B) Stress–strain relationship for a single cycle from the same data set. The upper part of the ‘banana-shaped’ curve is the loading phase (stretching). The lower part is the unloading phase (return to the original mounting length). The difference between both curves is due to hysteresis of the tissue, i.e. lower stress in the tissue during the unloading phase. The low strain regions were fitted with a linear regression line, while the high-strain regions of both curves were modeled with exponential functions. In the loading curve, the limit of the linear region is the ‘linear strain limit’ (ϵ1), here approximately 0.16.

Viscoelastic properties were tested for differences between males and females as well as between vocal folds and ventricular folds using t-tests.

Fundamental frequency predictions

The anatomical arrangement of the vocal fold inside the larynx invites comparison of vocal fold oscillation with that of a string. The vocal fold is positioned between the thyroid cartilage and the arytenoid cartilage, and dorso-ventral elongation increases its tension. The airflow provides the energy source to set vocal folds into vibration. The rate of vibration determines f0 and depends on vocal fold length and viscoelastic properties (van den Berg, 1958; Titze, 2006).

Fig. 4.

Load-strain and stress relaxation curve. The vocal fold tissue is stretched at a predetermined rate (here to 50% and 60%, respectively, within 500 ms) to the desired strain, at which it is held for 20 s.

According to the string model, f0 is determined by: $f_0 = \frac{1}{2L}\sqrt{\frac{\sigma}{\rho}}$ (8) where L is the string (or vocal fold) length, σ is the tissue stress (force per unit area), and ρ is the tissue density (1.02 g cm–3). The stress response during longitudinal straining was tested here.
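The string model, f0 = (1/(2L))·sqrt(σ/ρ), is simple enough to evaluate directly. The sketch below uses SI units throughout and the density of 1.02 g cm–3 given above; the stress value and the function names are hypothetical, and the 8.3 mm length is the estimated male vocal fold length.

```python
import math

TISSUE_DENSITY = 1020.0  # kg/m^3, i.e. 1.02 g/cm^3

def string_f0_hz(length_m, stress_pa, density=TISSUE_DENSITY):
    """Fundamental frequency of an ideal string of the given length and stress."""
    return math.sqrt(stress_pa / density) / (2.0 * length_m)

def stress_for_f0_pa(length_m, f0_hz, density=TISSUE_DENSITY):
    """Inverse of the string model: stress needed to reach a target f0."""
    return density * (2.0 * length_m * f0_hz) ** 2

L = 0.0083  # vocal fold length, m

f0 = string_f0_hz(L, 10_000.0)             # hypothetical 10 kPa stress, about 189 Hz
needed = stress_for_f0_pa(L, 80.0)         # stress for an 80 Hz fundamental

print(f"f0 at 10 kPa      = {f0:.0f} Hz")
print(f"stress for 80 Hz  = {needed:.0f} Pa")
```

Because f0 scales with the square root of stress, quadrupling the stress at a fixed length only doubles the predicted frequency.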

The stress–strain data collected here, together with the string model, were used to make inferences about f0. The cyclic loading data were used to estimate a f0 range. The f0 curves for all individuals and the mean male and female vocal fold were calculated.

The peak stress data from each of the 15 cycles in the cyclic loading experiment were used to estimate f0 relaxation due to repeated straining.

The stress relaxation data were used to estimate the acoustic effect of tissue relaxation on f0. f0 curves for all individuals were calculated implementing the stress relaxation curve in Eqn 8. Individual vocal fold lengths were measured before dissections (Table 1).
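At a fixed vocal fold length the string model gives f0 proportional to the square root of stress, so a fractional stress relaxation r lowers f0 by 1 − sqrt(1 − r). The following sketch combines this with an exponential relaxation curve of the form (σ0 − P)·exp(−K·t) + P. The function names and parameter values are hypothetical, and the fold length is assumed constant during the hold.

```python
import math

def f0_drop_fraction(stress_relaxation_fraction):
    """Relative f0 drop for a given relative stress drop at constant length."""
    return 1.0 - math.sqrt(1.0 - stress_relaxation_fraction)

def f0_over_time(times_s, f0_start_hz, sigma0, plateau, k):
    """f0 contour implied by a relaxing stress, scaled from the starting f0."""
    curve = []
    for t in times_s:
        sigma = (sigma0 - plateau) * math.exp(-k * t) + plateau
        curve.append(f0_start_hz * math.sqrt(sigma / sigma0))
    return curve

# Stress relaxation between 17% and 46% within the first second
# (the range reported for female vocal folds).
for r in (0.17, 0.46):
    print(f"stress drop {100 * r:.0f}% -> f0 drop {100 * f0_drop_fraction(r):.1f}%")

# Hypothetical contour: 400 Hz start, stress relaxing from 100 to 70 (arbitrary units).
contour = f0_over_time([0.0, 0.1, 0.5, 1.0], 400.0, 100.0, 70.0, 3.0)
print([round(f) for f in contour])
```

The square-root relationship means the f0 drop is always smaller than the underlying stress drop, which is why stress relaxation of up to 46% translates into an f0 decrease of less than 30%.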



Vocal folds consist of thyroarytenoid muscle, lamina propria and epithelium. Ventricular folds consist of lamina propria and epithelium. The extracellular matrix of the vocal and ventricular folds consists of the fibrillar proteins collagen and elastin, and of hyaluronan (Fig. 5). The removal of hyaluronan by hyaluronidase digestion, with subsequent Alcian Blue staining, indicated that much of the positive stain in the vocal fold (Fig. 5E) and the ventricular fold (Fig. 5I) can be attributed to hyaluronan (Fig. 5F,J). Laryngeal glands, located in the ventricular fold, remained intensely positively stained after hyaluronidase treatment. Although the glycosaminoglycans of the vocal and ventricular folds consisted mainly of hyaluronan, which was digested away by hyaluronidase, the laryngeal glands contained a mixture of various mucopolysaccharides and glycosaminoglycans. This pattern was found in all six specimens. In all six investigated specimens, fat was found in the deep portion of the lamina propria of the vocal fold and throughout that of the ventricular fold (Fig. 5E,I).

Fig. 5.

Histological sections of the mid-membranous part of the vocal fold of a 6-year-old male rhesus monkey. (A) Schematic horizontal section of a larynx, indicating where the histological sections were taken from (gray box). (B) Hematoxylin and eosin stain of vocal fold (VF) and ventricular fold (FF); the two boxes indicate the position of vocal fold (C–F) and ventricular fold sections (G–J). C and G are trichrome stains, indicating collagen fibers in blue; D and H are elastica van Gieson stains, indicating elastic fibers in black stain; E and I are Alcian Blue stains; F and J are Alcian Blue stains after hyaluronidase digestion. Scale bar in B, 2 mm; scale bars in C–J, 200 μm. TA, thyroarytenoid muscle; CT, cricothyroid muscle; E, epiglottis; Th, thyroid cartilage; Cr, cricoid cartilage; TR, tracheal ring; LV, laryngeal ventricle; LG, laryngeal gland.

Dorso-ventral longitudinal stress–strain relationship

Rhesus monkey vocal folds and ventricular folds showed the typical linear stress–strain response in the low-strain region and a nonlinear relationship in the high-strain region (Fig. 3B). Linear and nonlinear models (Eqns 4 and 5) reached regression coefficients of 0.97 and higher (Table 2).

The linear strain limit (ϵ1) was not significantly different between males and females, for either vocal folds or ventricular folds (Table 2). The slope of the linear model (constant a), the y-intercept (constant b) (Eqn 4) and the A constant in the exponential model (Eqn 5) were also not significantly different between males and females (Table 2). The B constant of the exponential model (Eqn 5) was significantly larger in males (Table 2), leading to a higher modulus at strains larger than ϵ1; that is, the vocal folds of the six male rhesus monkeys were, on average, stiffer than those of the six females at strains above the linear limit.

The linear and exponential models of the stress–strain response of vocal folds and ventricular folds were compared pair-wise in order to test for differences between those two structures within individuals. Only the B constant of the exponential model in females showed a significant difference (Table 3). In females, the ventricular fold reaches systematically higher moduli at strains larger than the linear strain limit.

The between-individual coefficients of variation (CV) were calculated (Table 2). For vocal folds, the CV values tended to be larger in females than in males, although the difference was not significant (t-test; t=-2.18, P=0.081); for ventricular folds, the CV values are not significantly different (t-test; t=0.07, P=0.94).

Hysteresis ranged from 27 to 36% for male vocal folds, from 27 to 40% for female vocal folds, from 27 to 40% for male ventricular folds and from 27 to 44% for female ventricular folds (Table 2). Differences between male and female tissue (Table 2), as well as between vocal and ventricular fold within each sex (Table 3), were not significant.

Stress relaxation

The stress relaxation in the stepwise procedure (Fig. 4) within the first second after reaching peak stress ranged from 22 to 35% for male vocal folds, from 17 to 46% for female vocal folds, from 16 to 29% for male ventricular folds and from 15 to 29% for female ventricular folds (Table 4). The stress half-life ranged from 147 to 263 ms for male vocal folds, from 94 to 265 ms for female vocal folds, from 193 to 236 ms for male ventricular folds and from 188 to 236 ms for female ventricular folds (Table 4). Differences between male and female tissue were not significant (Table 4).

Fundamental frequency predicted by biomechanical model

Data were implemented in a combination of the linear and exponential models (Eqns 4 and 5) using individual stress responses and vocal fold length (Table 1). For the average stress–strain curve, between the original vocal fold length (which was estimated at 7.8 mm for female and 8.3 mm for male vocal folds, according to data in Table 1) and 50% elongation, the string model suggests a maximum f0 range between ∼80 Hz and 1.6 kHz (Fig. 6).

Table 2.

Parameters of the linear (Eqn 4) and the exponential (Eqn 5) model for curve-fitting the empirical stress–strain response of the vocal fold

The relaxation of peak stress in 15 cycles can lead to a 10 to 30% drop in fundamental frequency (Fig. 7).

The stress relaxation within 1 s after reaching peak stress in a stepwise procedure to 50% strain can lead to a 10 to 30% drop in fundamental frequency (Fig. 8).


Rhesus monkey vocal folds and ventricular folds respond almost linearly with stress at strains up to ∼15%, i.e. elongation leads to a proportional increase in stiffness and in f0. Beyond ∼15% strain, the stress response is nonlinear. Small length changes result in large stiffness changes and therefore would result in large f0 changes. The stress response is different in male and female vocal folds. This parallels similar sex-specific differences in human (Chan et al., 2007) and Rocky Mountain elk (Riede and Titze, 2008) vocal folds. In all three species, it is the male vocal fold that is somewhat stiffer, suggesting that testosterone mediates higher tissue stiffness in vocal folds. Ventricular folds of rhesus monkeys demonstrate no sex differences, unlike human ventricular folds (Chan et al., 2006). This difference between human and rhesus monkey tissue could be based on their different morphology. The extracellular matrix of the former consists of collagen, elastic fibers and hyaluronan in which gland tissue is embedded, and the latter contains additionally large amounts of fat cells (Fig. 5). Fat in vocal folds has been interpreted as an adaptation to produce low f0 by enlarging the oscillating portion of the vocal fold (Hast, 1989). It is also injected as graft to substitute tissue in a damaged human vocal fold (Mikaelian et al., 1991). Its function in rhesus monkeys is not clear.

Table 3.

Test results of a pair-wise comparison of the linear (constants a and b) and exponential (constants A and B) model parameters between vocal folds and ventricular folds

How do tissue properties affect vocalization; for example, the coo call production? Its f0 contour is individual specific (Hauser, 1991; Owren and Rendall, 2003). Sources of f0 individual specificity include vocal fold morphology (size, shape and viscoelastic properties) and central control of laryngeal and respiratory muscles. The lamina propria stiffness shows individual differences, particularly at high strains, which could result in a f0 difference on the magnitude of 2 at 50% strain (Fig. 6). Human vocal folds also show a large inter-subject variability in their stress response in tensile tests (Zhang et al., 2009). However, certain call types, such as the coo call, are produced within a narrow f0 range. In order to achieve an across-individual similarity in the f0 range, vocal folds must be operated taking their viscoelastic properties into account. Data by Nishizawa et al. suggest that, in humans, length changes of the vocal fold that accompany the change of f0 are highly individual-specific (Nishizawa et al., 1988). These authors report that, in human subjects, vocal fold length may vary as little as 21% and as much as 111% over an individual's entire f0 range (Nishizawa et al., 1988). Their data also suggest that the length change required for producing a target f0 change differs significantly between subjects.

The knowledge about the viscoelastic properties of vocal folds allows us to make predictions about not only constraints but also the costs of sound production. Coo calls are produced within a relatively narrow f0 range. If the species' predicted total f0 range is between 80 and about 2000 Hz (Fig. 6), then coo calls with a f0 ranging between 200 and 600 Hz (Hauser, 1991; Owren and Rendall, 2003) must be produced with vocal folds strained to about 15–35% (Fig. 6). This is close to the linear strain limit (Table 2), suggesting that coo call production requires only a small to medium effort by intrinsic laryngeal muscles.

Table 4.

Parameters of exponential decay model (Eqn 7) for curve-fitting the empirical stress relaxation curve of the vocal fold, as well as stress half life and relative energy loss after one second

How well do the predicted and observed f0 ranges overlap? The lowest f0 of 80–100 Hz corresponds to what has been reported in the literature (Green, 1975; Grimm, 1967; Lillehei and Snowdon, 1978; Hohmann and Herzog, 1985; Hauser, 1991; Hauser and Fowler, 1992; Owren et al., 1992; Owren and Rendall, 2003; Patel and Owren, 2007). The estimated maximum f0 of ∼2 kHz fits many harmonic call types reported for rhesus monkeys, such as coo calls (Rowell and Hinde, 1962; Hauser, 1991; Owren and Rendall, 2003) or the girney call (Rowell and Hinde, 1962; Hauser and Fowler, 1992). But the highest ever reported f0 is close to 8 kHz for screams and squeaks (e.g. Rowell and Hinde, 1962; Gouzoules et al., 1984). These are high-amplitude calls that often contain both noisy and harmonic components. At this stage, it is hard to explain the discrepancy between expected and reported maximum f0. Possible explanations include the recruitment of additional tissue into oscillation during screaming, such as the ventricular folds, as reported in humans (Ufema and Montequin, 2001), which might be able to oscillate at higher frequencies or be involved in nonlinear interactions (Titze et al., 2008). A very important alternative explanation is the reduction of the amount of oscillating tissue at the highest frequencies. Among mammals, the upper oscillation rate limit is difficult to predict because the vocal fold is a layered structure (Hirano, 1975), and each layer has its own specific viscoelastic properties (Zhang et al., 2009). Therefore, it is hard to predict how much tissue will be drawn into oscillation and what effective mass and effective stress will be present at these highest strains. Here, the stress response of the complete lamina propria was tested, but during phonation epithelium and variable amounts of lamina propria are drawn into oscillation (Hirano, 1975). Rhesus monkey vocal folds consist of thyroarytenoid muscle, lamina propria and epithelium (Fig. 5). The lamina propria consists of elastin, collagen and hyaluronan (Fig. 5) and, unlike human (e.g. Hirano 1974; Hahn et al., 2006a; Hahn et al., 2006b) and many other mammal vocal folds (e.g. Kurita et al., 1983; Riede et al., 2010), fat cells (Fig. 5E). The four components are not evenly distributed and likely contribute to differentiated viscoelastic properties throughout the lamina propria, giving rise to functionally different layers in the lamina propria.

Rhesus monkey vocal fold tissue relaxes over time at constant strain (Fig. 4). The short- and long-term tissue time-dependence would affect f0 by lowering the tensile stress, accounting for a significant portion of f0 drop. In rhesus monkeys, this could lower f0 by 20–30% within a second (Fig. 8). However, a typical coo call shows a f0 rise from the beginning up to a certain time point within the call (Fig. 1). The modulation during the rise and the position of the peak f0 are reportedly associated with context (Hauser, 1991; Owren and Rendall, 2003). Muscle activity must counter the tissue's tendency to relax in order to achieve a f0 rise. Electromyography shows that the intrinsic laryngeal muscles responsible for vocal fold elongation in rhesus monkeys are typically activated about 100–200 ms before the coo call onset (Hast et al., 1974; West and Larson, 1993; Jürgens, 2002). The prephonatory muscle activity is of the same temporal magnitude as the stress half-life of the lamina propria (Table 4). It is tempting to conclude that prephonatory muscle activity is not only responsible for positioning vocal folds but also serves to precondition the vocal fold tissue and thereby account for the initial very dramatic tissue relaxation after reaching the peak stress.

Fig. 6.

Fundamental frequency (f0) predicted by the string model (Eqn 8). The linear and nonlinear stress responses for female (A) and male (B) vocal folds were implemented in Eqn 8 for strains between 0 and ∼50%. The red and blue lines, respectively, indicate the mean relationship between vocal fold length (L) and f0.

Fig. 7.

Fundamental frequency at peak stress in 15 loading cycles, predicted by the string model (Eqn 8) in six females (A) and six males (B).

Rhesus monkey vocal fold tissue also relaxes from loading cycle to loading cycle (Fig. 3A and Fig. 7). After 15 cycles, the simulated f0 has dropped by 10–20%, which is comparable to data from humans (Chan et al., 2009). In human speech, a long-term f0 decrease is called f0 declination: in many languages, f0 declines over the course of a phrase or utterance (e.g. Bolinger, 1978). Hauser and Fowler (Hauser and Fowler, 1992) found that f0 declination also occurs in call sequences of rhesus monkeys and vervet monkeys (Cercopithecus aethiops). One suggested function of this phenomenon is that f0 declination serves as a temporal cue indicating the end of an utterance (Breckenridge, 1977); a listener could use the f0 trajectory as an indicator of boundaries. Hauser and Fowler suggested that constraints of the sound production apparatus might account for the occurrence of this phenomenon in humans and nonhuman primates (Hauser and Fowler, 1992). Because f0 and the viscoelastic properties of the vocal fold are tightly connected, stress relaxation in vocal folds could be an important factor contributing to f0 declination in humans (Chan et al., 2009) as well as in nonhuman primates (present study). The fact that viscoelastic properties are individual-specific could also explain why results on f0 declination are mixed (for a review, see Hauser and Fowler, 1992). Of course, f0 regulation is more complex because it is also affected by other factors, such as the aerodynamic driving pressure (lung pressure). For example, 't Hart et al. demonstrated that lung pressure can decrease over the course of an utterance and could therefore contribute to f0 declination ('t Hart et al., 1990).

Fig. 8.

Fundamental frequency development after stretching the vocal fold to 40% strain (at time point zero), predicted by the string model (Eqn 8) in four females (A) and six males (B). The mean stress relaxation curve is indicated in red and blue, respectively.

Sound production is complex, involving the coordination of different motor patterns and physical properties. Among the passive physical properties, the vocal fold stress response and its time dependence represent an important constraint on laryngeal sound production in human and nonhuman primates, affecting the short- and long-term f0 contour of vocalization. This constraint therefore appears to be relevant across many primates and is not unique to the human language faculty.


  • Funding for this work was provided in part by NIH Grants R01 DC008612 and R01 DC04390. This publication was also made in part possible by Grant Number P51 RR000167 from the National Center for Research Resources, a component of the National Institutes of Health, to the Wisconsin National Primate Research Center, University of Wisconsin-Madison. Deposited in PMC for release after 12 months.


cross-sectional area
mean cross-sectional area
Alcian Blue stain
coefficient of variation
elastica van Gieson stain
fundamental frequency
hematoxylin and eosin stains
string (or vocal fold) length
specimen length
original mounting length
body mass
trichrome stain
tensile strain
linear strain limit
tensile stress
peak stress at time point zero

