Determining the information content of vocal signals and understanding morphological modifications of vocal anatomy are key steps towards revealing the selection pressures acting on a given species' vocal communication system. Here, we used a combination of acoustic and anatomical data to investigate whether male koala bellows provide reliable information on the caller's body size, and to confirm whether male koalas have a permanently descended larynx. Our results indicate that the spectral prominences of male koala bellows are formants (vocal tract resonances), and show that larger males have lower formant spacing. In contrast, no relationship between body size and the fundamental frequency was found. Anatomical investigations revealed that male koalas have a permanently descended larynx: the first example of this in a marsupial. Furthermore, we found a deeply anchored sternothyroid muscle that could allow male koalas to retract their larynx into the thorax. While this would explain the low formant spacing of the exhalation and initial inhalation phases of male bellows, further research will be required to reveal the anatomical basis for the formant spacing of the later inhalation phases, which is predictive of vocal tract lengths of around 50 cm (nearly the length of an adult koala's body). Taken together, these findings show that the formant spacing of male koala bellows has the potential to provide receivers with reliable information on the caller's body size, and reveal that vocal adaptations allowing callers to exaggerate (or maximise) the acoustic impression of their size have evolved independently in marsupials and placental mammals.
Vocal signals are often found to convey reliable or ‘honest’ information about the caller (Davies and Halliday, 1978; Clutton-Brock and Albon, 1979; Reby and McComb, 2003). The reliability of this information can be maintained by selection either because calls are more costly to produce for low quality individuals (‘handicaps’) (Zahavi and Zahavi, 1997) or because acoustic features of these calls are directly linked to the caller's phenotype and cannot easily be faked (‘indexical cues’) (Maynard Smith and Harper, 2003). Although the production of male sexual calls in mammals is likely to result in direct fitness costs to the caller, either by inviting competition from rivals or through increased metabolic rate (Gillooly and Ophir, 2010), such costs are difficult to quantify. As a consequence, recent studies of mammalian vocal communication have focused on the importance of indexical cues to the caller's quality (Briefer et al., 2010). In particular, because body size is a key determinant of resource holding potential and fighting ability in mammals (Clutton-Brock et al., 1979; McElligott et al., 2001; Sanvito et al., 2007), several studies have investigated whether acoustic features have the potential to act as indices of body size [for an overview see Taylor and Reby (Taylor and Reby, 2010)].
Two key acoustic features of mammalian calls are the fundamental frequency (which is the primary determinant of the perceived pitch, hereafter F0) and the vocal tract resonances or formants (Fitch and Hauser, 1995; Fitch, 1997). F0 corresponds to the rate of vocal fold vibration in the larynx and is affected by factors such as the length, mass and tension of the vocal folds (Titze, 1994). In particular, the lowest producible F0 is determined by the length and mass of the vocal folds, with larger vocal folds able to produce lower F0 values. Consequently, if vocal fold length and/or mass are correlated with body size then F0 could provide reliable information on the caller's size (Morton, 1977). However, while the prediction that larger animals should have lower F0 is verified across age classes (Reby and McComb, 2003; Ey et al., 2007) and may hold across species of very different sizes (Hauser, 1993), F0 is typically a poor cue to adult body size within a given mammalian species (Lass and Brown, 1978; Masataka, 1994; Reby and McComb, 2003; Rendall et al., 2005; Sanvito et al., 2007; Vannoni and McElligott, 2008). This lack of a correlation between F0 and overall body size within species is less surprising when we consider that mammalian larynges are relatively unconfined by surrounding bony structures and can vary in size independently of the rest of the body (for a review, see Fitch and Hauser, 2002). Moreover, F0 is especially likely to be a poor cue to male body size because testosterone affects laryngeal growth (Beckford et al., 1985), and males with higher testosterone levels may have longer, heavier vocal folds and concomitantly low F0 irrespective of their overall adult body size.
In contrast, a number of recent studies have confirmed that formants provide reliable information on the caller's body size in mammals (Fitch, 1997; Riede and Fitch, 1999; Reby and McComb, 2003; Harris et al., 2006; Sanvito et al., 2007; Vannoni and McElligott, 2008; Charlton et al., 2009). This relationship exists because longer vocal tracts produce lower, more closely spaced formants, and vocal tract length is positively correlated to skull and body size in a wide range of mammals (Fitch, 2000b). However, because the primary determinant of vocal tract length in most mammals is skull size, formants are only likely to provide reliable size-related information when skull and body size are correlated. Indeed, a dissociation between female skull and body size appears to render formants poor cues to size in female giant pandas (Charlton et al., 2009), and may provide the causal basis for similar findings in other female mammals (Rendall et al., 2005; Pfefferle and Fischer, 2006). Furthermore, the presence of anatomical adaptations that allow callers to elongate their vocal tracts and lower formants (by lengthening the supra-laryngeal or nasal vocal tract) (Fitch and Reby, 2001; Frey and Gebler, 2003; McElligott et al., 2006; Frey et al., 2007; Sanvito et al., 2007; Frey et al., 2011) could also make formants unreliable cues to body size, unless physiological constraints can act as a barrier to further formant lowering (Fitch and Reby, 2001; Reby and McComb, 2003). Consequently, it cannot be assumed a priori that formants will provide reliable information on body size in a given species and, hence, it remains important to determine whether they do and, more generally, to test the prediction that formants are reliable cues to body size in a wider range of taxa.
Male koalas (Phascolarctos cinereus) (Goldfuss 1817) produce low-pitched bellow vocalisations at high rates during the breeding season (Smith, 1980; Mitchell, 1990). In addition, male bellowing activity peaks during the period when most copulations are predicted to occur (Ellis et al., 2011). Taken together, these observations suggest that male koala bellows convey important information in reproductive contexts. Previous studies have suggested that males use bellowing to demarcate territories and repel rivals (Mitchell, 1990), whereas more recent work indicates that male bellowing functions to attract females directly, and may even induce female oestrus (Ellis et al., 2011). However, despite speculation about the possible functions of male koala bellows, it is not known whether male bellows have the potential to provide receivers with reliable information on the caller's physical attributes that are of potential importance in reproductive contexts. Koalas can distinguish between the bellows of different male callers (Charlton et al., 2011) and, thus, they attend to variation in the acoustic structure of these calls. In addition, male koalas are larger than females (Martin and Handasyde, 1999) and recent evidence shows that male reproductive output correlates with body mass (Ellis and Bercovitch, 2011). Interestingly though, males rarely engage in direct physical confrontations for access to females (Mitchell, 1990). Accordingly, if size-related information is present in male koala bellows then it could be a key mediator of male reproductive success, allowing males to assess rivals and females to select larger males as mating partners (Reby et al., 2005; Charlton et al., 2007b).
In a previous study we showed that clear spectral prominences likely to represent formants are present in male koala bellows (Charlton et al., 2011). If these pronounced frequency bands in male bellows are indeed formants, i.e. produced by the supra-laryngeal vocal tract, then their very low frequency values are surprising and intriguing. In addition, both male and female koalas appear to have a larynx that is located very low in the throat (Young, 1881; Sonntag, 1921; Lee and Carrick, 1989). However, to our knowledge, no clear evidence of a permanently descended male larynx has been reported in this species. Whilst a low larynx position has been suggested to be an adaptation for human speech production (Lieberman et al., 1969), its convergent evolution outside of the primate lineage (Fitch and Reby, 2001; Weissengruber et al., 2002; Frey and Gebler, 2003; Frey et al., 2011) indicates that this trait has alternative, nonlinguistic functions. It has been suggested that a descended larynx may represent a mechanism for callers to lower formants in order to increase the impression of their body size to receivers (Ohala, 1984; Fitch, 1997; Fitch, 2002; Fitch, 2010). A permanently descended larynx in the koala may, therefore, indicate strong selection pressures to broadcast and possibly to exaggerate body size using formants in this species.
The goals of this study were to: (1) investigate whether the spectral prominences in male koala bellows represent formants; (2) determine whether the formant frequency spacing and/or F0 provides reliable information about the caller's body size; (3) use the formant frequency spacing to estimate male vocal tract lengths (based on an idealised uniform-tube model) for comparison with anatomically verified measures of this species' vocal tract; and (4) determine whether male koalas have a permanently descended larynx and/or other adaptations that could allow callers to elongate their vocal tracts and lower formants. Our findings will allow us to assess the potential of male koala bellows to convey ‘honest’ information to receivers about the caller's body size, and reveal whether vocal adaptations allowing callers to maximise the acoustic impression of their size have evolved independently in a marsupial species.
MATERIALS AND METHODS
Study site and animals
This work followed the Association for the Study of Animal Behaviour/Animal Behaviour Society guidelines for the use of animals in research, and was approved by the University of Queensland Animal Ethics Committee (approval number SAS/227/10).
The data for this study were collected during the 2010 koala breeding season (November–December) at Lone Pine Koala Sanctuary (LPKS), the Queensland Parks and Wildlife Service Moggill Koala Hospital (MKH), and the Centre for Advanced Imaging at the University of Queensland (CAI), all located in or near Brisbane, Australia. The subjects at LPKS were 20 male koalas between 3 and 15 years of age (mean, 8 years). All the animals at LPKS were individually recognisable. Two adult male koalas at MKH served as specimens for a post-mortem examination and for magnetic resonance imaging (MRI) scans at the CAI.
The animals at LPKS are habituated to the presence of humans and used to being handled. Consequently, we were able to measure male head length without the need for sedation or anaesthesia using Vernier callipers (±1 mm). In each case, male head length in centimetres was taken from the occipital protuberance to the tip of the nose (at a point equating to the rostral tip of the nasal bone). We were not able to obtain accurate measurements of body length from our study population without anaesthetising animals; however, data from wild populations indicate that male koala head length is tightly correlated to body length (N=50, R=0.64, P<0.001: data provided by the Queensland Parks and Wildlife Services) and body mass (Ellis and Bercovitch, 2011) and, hence, provides a good proxy for overall skeletal body size (head length measurements for all the subjects are presented in Table 1).
MRI scans and post-mortem investigations
MRI scans were conducted at the CAI on an adult male koala that had previously been euthanised and stored deep-frozen at –40°C. The head (occipital protuberance to the tip of the nose) and body length (occipital protuberance to the last caudal vertebra) of the male specimen were 13.9 and 57.6 cm, respectively, indicating that this male was representative of an average-sized adult male koala (N=50 males; occipital protuberance to the tip of the nose, mean ± s.d., 14.1±0.8 cm; occipital protuberance to the last caudal vertebra, mean ± s.d., 55.3±2.9 cm: data provided by the Queensland Parks and Wildlife Services).
The male specimen was allowed to fully thaw over a 48 h period before images were taken with a Siemens (Erlangen, Germany) 3T TRIO clinical MRI system using the following parameters: TR 4000 ms, TE 54 ms, turbo factor 11, field of view 400×241 mm, matrix size 448×216, image resolution 0.9×1.1 mm, slice thickness 6 mm, slice gap 1.2 mm, number of slices 28. Oral and nasal vocal tract lengths were then measured in OsiriX (v. 3.8.1; Geneva, Switzerland) using the regions of interest (ROI) facility. To do this a curvilinear line equidistant from the external walls of the oral or nasal vocal tract (viewed in a sagittal plane) was drawn from the middle of the glottis to the opening of the lips or nostrils, respectively [following Fitch (Fitch, 1997)]. We used the work of Kratzing (Kratzing, 1984) to guide our measurement of nasal vocal tract length. In addition, a post-mortem investigation was conducted on a freshly euthanised adult male koala at MKH. Again, the head (14.0 cm) and body length (53.4 cm) of this animal indicate that it was a medium-sized adult male koala. During the post-mortem dissection we measured the distance from the ventral prominence of the thyroid cartilage (that roughly corresponds to the position of the vocal folds in the larynx) to the lips, and the top of the sternum to the lips. In both cases the male's head was maximally stretched upwards. In addition, we opened the thoracic cavity to document where the sternothyroid muscle originates from on the sternum.
Recording and selection of male bellows
Male bellows were recorded using a Sennheiser ME67 (Sennheiser Electronics, Wedemark, Germany) directional microphone and a Zoom H4N portable solid-state digital recorder (Tokyo, Japan; sampling rate 44.1 kHz, amplitude resolution 16 bits) at distances ranging from 1 to 10 m. The recordings were transferred to an Apple Macintosh Macbook Pro computer, normalised to 100% peak amplitude, and saved as WAV files (44.1 kHz sampling rate and 16 bit amplitude resolution). The overall spectral structure of each bellow was initially investigated using narrow-band spectrograms generated using Praat DSP package version 5.0.29 (www.praat.org) (see Fig. 1 for spectrogram settings) and recordings with high levels of background noise were discarded.
Koala bellows consist of a continuous series of inhalations and shorter exhalations, and often have an introductory phase that is produced on exhalation and consists of abrupt amplitude onsets and offsets (Smith, 1980; Charlton et al., 2011) (see Fig. 1). The exhalation and initial inhalation phases of male bellows have five clear spectral prominences below 4000 and 3500 Hz, respectively. In addition, these sections are typically characterised by deterministic chaos (episodes of non-random noise indicating chaotic vibration of the vocal folds) with no clear F0 or harmonic structure (see Fig. 1). The later inhalation phases have six stable spectral prominences below 2500 Hz and a very low F0 (see Fig. 1). Because the frequency values and overall pattern of the spectral prominences varied between exhalation, initial inhalation and later inhalation phases (and were considerably lower in frequency for the later inhalation phases: see Fig. 2) we wanted to include measures of these frequency components from all three phases, in order to generate estimated vocal tract lengths from these three sections (exhalation, initial inhalation and later inhalation) for comparison with our anatomical data. However, not all male bellows contained the initial inhalation phases. Consequently, we used 276 bellows from 20 males (range 10–20 per male, mean 13.8) for analysis of the exhalation and later inhalation phases, and 102 bellows from 20 males (range 1–12 per male, mean 4.7) for analysis of the initial inhalation phases. It is worth noting that the inhalation phases of male bellows are much longer than adjacent exhalation phases and sound quite different because of their lower frequency components. As a result, we were able to clearly distinguish these phases both using spectrograms and when listening to bellows.
We extracted the exhalation and inhalation phases of male bellows and saved them as separate WAV files, before using purpose-built scripts in Praat DSP package version 5.0.29 (www.praat.org) to objectively measure the acoustic features in these sections.
Because the exhalation and initial inhalation sections of male bellows are typically characterised by broadband frequency noise in which little or no harmonic structure is visible, we only considered later inhalation sections for the F0 analysis. The F0 contour of the inhalation phases was extracted using the ‘To Pitch (ac)’ command in Praat (time step 0.01 s; voice threshold 0.3; silence threshold 0.2; minimum and maximum F0 10 Hz and 100 Hz, respectively) and the minimum F0 (F0,min), mean F0 (F0,mean) and maximum F0 (F0,max) values (in Hz) were measured. To ensure that the autocorrelation algorithm was tracking the F0 contour, time-varying visual representations of the F0 contour were compared with narrow-band spectrograms, and any incorrect values were manually ‘unvoiced’ in the Pitch edit window. Finally, the extracted F0 contour was played back (as a pulse train) for auditory comparison with the original recording.
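The F0 extraction step can be illustrated with a minimal sketch. It assumes a raw, unnormalised autocorrelation over a whole signal, which is a simplification and not Praat's algorithm; the function names, the synthetic test tone and the 2000 Hz sampling rate are hypothetical and chosen only for illustration. Only the 10–100 Hz search range is taken from the text.

```python
import math


def autocorrelation_f0(samples, sample_rate, f0_min=10.0, f0_max=100.0):
    """Return the F0 (Hz) whose lag maximises the raw autocorrelation.

    Only lags corresponding to f0_min..f0_max are searched, mirroring the
    10-100 Hz search range quoted in the text. Simplified illustration only.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_value = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


def summarise_contour(f0_values):
    """Minimum, mean and maximum F0 of the voiced frames (None = unvoiced)."""
    voiced = [f for f in f0_values if f is not None]
    return min(voiced), sum(voiced) / len(voiced), max(voiced)


# Hypothetical test tone: 25 Hz fundamental plus one harmonic, 1 s at 2000 Hz.
SAMPLE_RATE = 2000
tone = [math.sin(2 * math.pi * 25 * n / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 50 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
estimated_f0 = autocorrelation_f0(tone, SAMPLE_RATE)
```

Frames that were manually ‘unvoiced’ would be passed to `summarise_contour` as `None`, so that they do not contribute to the minimum, mean or maximum.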
Visual inspection of spectrograms allowed us to identify regions encompassing several harmonics that may represent vocal tract resonances in the exhalation and inhalation phases of bellows (see Fig. 1). However, without a more complete knowledge of the functional anatomy of the koala vocal tract it was difficult to predict the number of vocal tract resonances to expect in a given frequency range. Furthermore, the spectral prominences of the later inhalation phases are much lower than those in the initial inhalation and exhalation phases (see Figs 1 and 2), indicating that males reconfigure their supra-laryngeal vocal tract during call production. Consequently, to standardise the analysis we selected exhalation and initial inhalation phases with stable spectral prominences, and later inhalation phases from the end of bellows where the spectral prominences had already reached well-defined minima that are likely to reflect anatomical constraints (see Fig. 1).
The frequency values (Hz) of the first five spectral prominences (F1–F5) of the exhalation and initial inhalation phases of male bellows and the first six spectral prominences (F1–F6) of the later inhalation phases were measured using Linear Predictive Coding [‘To Formants (Burg)’ command in Praat]. We did not measure spectral prominences above these values because they were often poorly defined and less stable across inhalation and exhalation sections. To measure the spectral prominences of the exhalation and initial inhalation phases we used the following analysis parameters: time step 0.01 s; analysis window 0.03 s; pre-emphasis 50 Hz; maximum number of formants 5; maximum formant value 3500–4000 Hz. For analysis of the later inhalation phases the maximum formant value was set to 2100–2500 Hz and the maximum number of formants to six; otherwise, the same analysis parameters were used. To check whether Praat was accurately tracking these frequency components we compared the outputs with visual inspections of each call's spectrogram and power spectrum (using cepstral smoothing at 200 Hz). The average frequency spacing of the spectral prominences achieved during exhalation phases (ΔFexhale), initial inhalation phases (ΔFinhale1) and later inhalation phases (ΔFinhale2) was then estimated using the regression method of Reby and McComb (Reby and McComb, 2003), in which the observed frequency values are plotted against those that would be expected if the vocal tract was a straight uniform tube, closed at one end and open at the other. Finally, the estimated apparent vocal tract length (VTL) was deduced using the equation:
$$\mathrm{VTL} = \frac{c}{2\,\Delta F} \qquad (1)$$
where c is the approximate speed of sound in a mammalian vocal tract (350 m s–1) and ΔF is the average frequency spacing of the spectral prominences.
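The regression and length estimate described above can be sketched as follows. The sketch assumes the standard uniform-tube model, in which the i-th resonance of a tube closed at one end is F_i = ((2i – 1)/2)ΔF, so that ΔF is the slope of a least-squares line forced through the origin, and VTL = c/(2ΔF). The function names are hypothetical, and the example formant values are idealised rather than measured.

```python
# Speed of sound in a mammalian vocal tract in cm/s (350 m/s, as in the text).
SPEED_OF_SOUND_CM_S = 35000.0


def formant_spacing(formants_hz):
    """Estimate the formant spacing (Hz) by regression through the origin.

    Observed F_i are regressed on x_i = (2i - 1) / 2, the values expected
    for a uniform tube closed at one end; the slope is the spacing.
    """
    xs = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    numerator = sum(x * f for x, f in zip(xs, formants_hz))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def apparent_vtl_cm(delta_f_hz, c=SPEED_OF_SOUND_CM_S):
    """Apparent vocal tract length (cm) from the formant spacing (Hz)."""
    return c / (2.0 * delta_f_hz)


# Idealised (not measured) resonances of a tube with a spacing of 800 Hz.
ideal_formants = [400.0, 1200.0, 2000.0, 2800.0, 3600.0]
delta_f = formant_spacing(ideal_formants)
vtl = apparent_vtl_cm(delta_f)
```

Using all measured prominences in a single regression, rather than differencing adjacent pairs, makes the estimate less sensitive to an error in any one formant value.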
To exclude the possibility that the spectral prominences we measured in the later inhalation phases represented harmonics (integer multiples of F0), we used Pearson correlation coefficients to investigate the relationship between F0,mean and the spectral prominence frequency values (F1–F6) in the sections we selected for analysis. If any of the spectral prominences were harmonics then they should significantly co-vary with F0,mean. In addition, although we did not routinely measure F0 in the exhalation and initial inhalation sections of male bellows (because of the deterministic chaos often present in these sections), we were able to accurately measure F0 in one exhalation and one initial inhalation section with a clear harmonic structure from each of the 20 males (see Fig. 2 for examples) using the ‘To Pitch (ac)’ command in Praat (time step 0.01 s; voice threshold 0.3; silence threshold 0.2; F0,min, 30 Hz and F0,max, 100 Hz). Extracted F0 contours were checked against the relevant spectrograms and played back for auditory comparison as before. As we already had frequency values for the spectral prominences in these sections we could then test for independence of F0 and F1–F5 using Pearson correlation analysis.
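The logic of this test can be shown with a short sketch. The per-male values below are hypothetical and are not data from this study; `pearson_r` is an illustrative helper, not the SPSS routine used for the analyses. A prominence that is a harmonic (here, the tenth) is an exact multiple of F0 and therefore correlates perfectly with it, whereas a vocal tract resonance need not track F0 at all.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-male values, for illustration only.
f0_mean = [24.0, 27.5, 31.0, 26.0, 29.5]
tenth_harmonic = [10 * f for f in f0_mean]            # tracks F0 exactly
fixed_resonance = [352.0, 349.0, 355.0, 351.0, 350.0]  # set by the vocal tract

r_harmonic = pearson_r(f0_mean, tenth_harmonic)
r_resonance = pearson_r(f0_mean, fixed_resonance)
```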
Log10 transformations were used to normalise the data distribution for head length, F0,min and ΔFinhale1. All the other variables were normally distributed (Shapiro–Wilk: P>0.05). We used a multivariate general linear model (GLM) to examine relationships between our acoustic measures (F0,mean, log10F0,min, F0,max, ΔFexhale, log10ΔFinhale1 and ΔFinhale2) and log10 head length: the direction of any effects on the acoustic features was determined using the slope of standardised beta coefficients. By analysing all the dependent variables together, these tests reduce Type 1 errors and take into account the correlation between dependent variables (Field, 2000). Male age was initially entered as a covariate but as it never had a significant effect on any of the acoustic measures we did not consider it further in the analyses. SPSS version 16 for Mac OS X was used for all the analyses; significance levels were set at 0.05, and two-tailed probability values are quoted.
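To illustrate how a slope (b) and an F statistic with (1, N – 2) degrees of freedom arise for a single acoustic measure, the following is a minimal univariate sketch. It is not the multivariate GLM run in SPSS, it reports an unstandardised slope, and the head length and formant spacing values are hypothetical; `simple_regression` is an illustrative helper.

```python
import math


def simple_regression(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, F statistic, residual degrees of freedom).
    Assumes the fit is not perfect (non-zero residual sum of squares).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_reg = slope * sxy  # regression sum of squares (1 degree of freedom)
    df_res = n - 2
    f_stat = ss_reg / (ss_res / df_res)
    return slope, intercept, f_stat, df_res


# Hypothetical values, for illustration only (not data from the study).
head_length_cm = [13.2, 13.8, 14.1, 14.6, 15.0]
delta_f_hz = [830.0, 812.0, 790.0, 775.0, 758.0]

log_head = [math.log10(h) for h in head_length_cm]
slope, intercept, f_stat, df_res = simple_regression(log_head, delta_f_hz)
```

A negative slope in this setting corresponds to larger males having lower formant spacing.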
RESULTS
Independence of F0 and spectral prominence frequency values
The spectrograms presented in Fig. 2 clearly show that the spectral prominences in the exhalation (Fig. 2A), initial inhalation (Fig. 2B) and later inhalation (Fig. 2C) sections of male bellows are independent from the F0 and related harmonics. In addition, values of F0,mean and the frequency values of the spectral prominences were not significantly correlated across males (all P>0.05: exhalation phases, N=20: F1 R=0.202; F2 R=–0.354; F3 R=0.001; F4 R=0.089; F5 R=–0.083; initial inhalation phases, N=20: F1 R=–0.231; F2 R=0.083; F3 R=–0.257; F4 R=0.182; F5 R=0.034; later inhalation phases, N=20: F1 R=–0.102; F2 R=–0.184; F3 R=–0.048; F4 R=–0.296; F5 R=–0.130; F6 R=–0.399). Taken together, these data indicate that the spectral prominences of male koala bellows are produced independently of F0, making it likely that they represent supra-laryngeal resonances, i.e. formants.
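As a rough consistency check on the reported coefficients, the critical value of R for N=20 can be derived from the t distribution. The sketch below assumes a two-tailed test at the 0.05 level and uses the standard tabulated value of t for 18 degrees of freedom (2.101); `critical_r` is a hypothetical helper name. The R values are those quoted in the text.

```python
import math

# Two-tailed 5% critical value of Student's t for df = N - 2 = 18
# (standard table value, hard-coded here as an assumption of the sketch).
T_CRIT_DF18 = 2.101


def critical_r(t_crit, df):
    """Smallest |R| that reaches significance for the given t and df."""
    return t_crit / math.sqrt(df + t_crit ** 2)


# Correlation coefficients as reported in the text (N = 20 in each phase).
reported_r = {
    "exhalation": [0.202, -0.354, 0.001, 0.089, -0.083],
    "initial inhalation": [-0.231, 0.083, -0.257, 0.182, 0.034],
    "later inhalation": [-0.102, -0.184, -0.048, -0.296, -0.130, -0.399],
}

r_crit = critical_r(T_CRIT_DF18, 18)
largest_abs_r = max(abs(r) for values in reported_r.values() for r in values)
```

The largest reported coefficient in absolute terms (0.399) falls below the critical value of roughly 0.44, in line with the statement that none of the correlations reached significance.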
Relationships between log10 head length and acoustic variables
The multivariate tests revealed that log10 head length (our proxy for body size) was a significant predictor of acoustic variation in male bellows (Wilks' lambda: F6,13=3.724, P=0.022). Univariate tests showed that larger males had lower ΔFexhale (b=–1314.149, F1,18=11.112, P=0.004), log10ΔFinhale1 (b=–0.826, F1,18=9.256, P=0.007) and ΔFinhale2 (b=–1169.541, F1,18=15.520, P=0.001). In contrast, log10 head length was not related to any of the F0 measures (F0,mean: b=68.193, F1,18=0.469, P=0.502, log10F0,min: b=–1.057, F1,18=1.107, P=0.307, or F0,max: b=–235.115, F1,18=1.706, P=0.208) (acoustic measures for each subject are given in Table 1).
Acoustically estimated vocal tract lengths
The mean formant spacing of male koala bellows was 795.8, 708.3 and 353.7 Hz for the exhalation, initial inhalation and later inhalation phases of bellows, respectively. When these values are substituted for ΔF in Eqn 1 this predicts a VTL of 22.0 cm for the exhalation phases, a VTL of 24.7 cm for the initial inhalation phases, and a VTL of 49.5 cm for the later inhalation phases.
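As a check on the arithmetic, and assuming Eqn 1 takes the standard uniform-tube form VTL = c/(2ΔF) with c = 350 m s–1 = 35,000 cm s–1 (this form is an assumption here, adopted because it reproduces the reported values), the substitution works out as:

```latex
\begin{align*}
\mathrm{VTL}_{\text{exhale}}  &= \frac{35\,000}{2 \times 795.8} \approx 22.0\ \text{cm} \\
\mathrm{VTL}_{\text{inhale1}} &= \frac{35\,000}{2 \times 708.3} \approx 24.7\ \text{cm} \\
\mathrm{VTL}_{\text{inhale2}} &= \frac{35\,000}{2 \times 353.7} \approx 49.5\ \text{cm}
\end{align*}
```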
MRI and post-mortem data
The MRI scans and post-mortem examination revealed a permanently descended male larynx positioned at the level of the 3rd and 4th cervical vertebrae, and a well-developed sternothyroid muscle anchored very deep in the thoracic cavity, on the 4th sternebra (Fig. 3A). The oral VTL was measured at 13.5 cm on the MRI scan (Fig. 3B) and 13 cm during the post-mortem examination. The MRI scans also revealed a nasal tract length of 16.6 cm, a total distance from lips to manubrium (the thoracic inlet at the top of the sternum) of 16.9 cm, and a distance from the manubrium to the origin of the sternothyroid muscle (the attachment point on the 4th sternebra) of 6.8 cm (Fig. 3B). The distance from lips to manubrium measured during the post-mortem examination was 18 cm, which is likely to reflect the maximally extended neck of the post-mortem specimen. Thus, a vocal tract configuration capable of producing the acoustically estimated vocal tract lengths would require that the larynx be retracted beyond the manubrium.
DISCUSSION
Although the best way to demonstrate formants in animal calls is to place vocalising animals in heliox (Rand and Dudley, 1993), which leads to an upward shift in formants, this is often impractical. Alternative approaches involve showing asynchronous movement of the fundamental frequency and spectral prominences presumed to represent formants, and revealing correlations between the frequency spacing of these spectral prominences and the caller's VTL, head length or body size (Fitch, 2000b). In the current study we were unable to obtain the actual VTLs for each of our subjects, but we provide strong evidence that the spectral prominences in male koala bellows are independent of the F0 and its related harmonics, and that the frequency spacing of these spectral prominences is negatively correlated with the caller's head length. Consequently, these findings support the hypothesis that the spectral prominences in male koala bellows are indeed formants.
Larger male koalas were found to have lower formant frequency spacing, but no relationship between our proxy of male body size and F0 was found. These findings accord with considerable research on other mammals, in which formant frequency spacing was found to be a reliable cue to male body size (Fitch, 1997; Riede and Fitch, 1999; Reby and McComb, 2003; Harris et al., 2006; Sanvito et al., 2007; Vannoni and McElligott, 2008; Charlton et al., 2009), whereas fundamental frequency is not (Lass and Brown, 1978; Masataka, 1994; Reby and McComb, 2003; Sanvito et al., 2007; Vannoni and McElligott, 2008). Whether koalas attend to size-related formant information in male bellows remains to be demonstrated. However, in a previous study we showed that male and female koalas can discriminate between the bellows of different male callers and that formant-related spectral prominences contributed disproportionately to vocal identity (Charlton et al., 2011). Consequently, it seems likely that koalas perceive formant frequency-related variation in male bellows, and attend to this information in the context of identity cueing.
Indeed, numerous other non-human animals spontaneously respond to formant shifts in their own species-typical vocalisations (Fitch and Kelley, 2000; Reby et al., 2005; Fitch and Fritz, 2006; Charlton et al., 2007a; Charlton et al., 2008; Charlton et al., 2010; Taylor et al., 2010) and are capable of perceiving formant shifts in human speech sounds with a high degree of accuracy (Baru, 1975; Burdick and Miller, 1975; Hienz et al., 1981; Hienz and Brady, 1988; Sinnott, 1989; Dooling and Brown, 1990; Sinnott and Kreiter, 1991; Sommers et al., 1992; Hienz et al., 1996). It is also noteworthy that the inter-individual variation in formant spacing we report is high: the minimum and maximum formant frequency spacing values for the exhalation, initial inhalation and later inhalation phases of bellows corresponded to a 16%, 25% and 31% variation around the mean values of 796, 708 and 354 Hz, respectively (see Table 1). Humans can perceive shifts in formant frequency spacing or apparent vocal tract length as low as 4% (Smith et al., 2005; Puts et al., 2007) and non-human animals have been shown to respond to shifts in formant frequency spacing of 8–10% in species-specific calls (Charlton et al., 2007a; Charlton et al., 2010). Consequently, we suggest that koalas should be able to use the formant frequency spacing in male bellows to detect a meaningful difference between a large, medium and small male representative of the population. Moreover, vocalisations like the koala bellow with a very low F0 and/or broadband noise are particularly well suited for highlighting formants, potentially increasing the salience of this size-related information (Ryalls and Lieberman, 1982; Fitch and Hauser, 1995). Future studies designed to test whether male and female koalas perceive and attend to size-related formant information in male bellows are certainly warranted.
Although we have shown that the formant frequency spacing of male koala bellows is a reliable acoustic cue to body size relative to other conspecifics, the VTLs we derived from the formant spacing are unusually long for a koala. Observations of bellowing males indicate that they typically extend their necks during the exhalation and initial inhalation phases of bellows, by raising the head so that the lower jaw is aligned with the lower part of the neck (see Fig. 2D,E), and the distance from the top of the sternum to the lips is around 18 cm when the head is raised. However, the vocal tract lengths we derived from the formant spacing of the exhalation and initial inhalation sections of bellows (of 22.0 and 24.7 cm, respectively) are several centimetres longer than this, suggesting that male koalas are using their deeply anchored sternothyroid muscle to retract the larynx inside the thoracic cavity during call production. Other mammals possess sternothyroid muscles that originate deep in the thorax [e.g. tree sloths (Naples, 1986), spiny anteaters (Chan, 1995), giant anteaters (Naples, 1999), lions (Weissengruber et al., 2008)] and, hence, whilst unusual, this anatomical feature is not unique to koalas or marsupials. However, this is, to our knowledge, the first indication of such intra-thoracic laryngeal retraction in any mammal. Future studies should investigate whether deeply anchored sternothyroid muscles are present in other marsupial species, and use x-ray visualisation of bellowing male koalas to provide incontrovertible evidence that they retract their larynges inside the thoracic cavity during call production.
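The size of the implied retraction can be made explicit with a simple subtraction. This is a rough sketch only: it treats the 18 cm lips-to-sternum distance as the longest vocal tract available without retraction, and compares a curvilinear acoustic length with a straight-line anatomical distance.

```latex
\begin{align*}
\text{exhalation:}         &\quad 22.0 - 18.0 = 4.0\ \text{cm} \\
\text{initial inhalation:} &\quad 24.7 - 18.0 = 6.7\ \text{cm}
\end{align*}
```

Both differences are smaller than the 6.8 cm measured on the MRI scan from the manubrium to the origin of the sternothyroid muscle. Treating that distance as an upper bound on retraction is an assumption of this sketch and was not tested directly.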
We are aware that our acoustic findings appear to refute Smith's (Smith, 1980) observation that ‘the larynx is abruptly raised to the top of the throat’ during the exhalation phases of bellows. However, we suggest that Smith's original contention is not based on actual observations of laryngeal movement, firstly, because male koala larynges are not especially large and, therefore, do not produce an obvious external protuberance (B.D.C. and A.J.McK., personal observation), and secondly, because our own close-range observations of male neck regions during bellowing have failed to detect any signs of laryngeal movement (including careful observations of slowed-down video footage). Instead, our acoustic findings suggest that the larynx is retracted at the beginning of the call, and kept within the thorax during the exhalation and inhalation phases.
Indeed, laryngeal retraction just prior to calling is documented in other mammalian species (Fitch, 2000a; Reby and McComb, 2003) and is probably common throughout mammals [see pp. 315–318 of Fitch (Fitch, 2010)]. The permanently descended larynx and well-developed sternothyroid muscle revealed by our anatomical investigations certainly suggest strong selection pressures for male koalas to elongate their vocal tracts through laryngeal retraction (Fitch and Reby, 2001; Frey and Gebler, 2003) and maximise the acoustic impression of their body size conveyed to receivers. In addition, laryngeal retraction would be ultimately constrained by the structures of the thorax and sternum (Fitch and Reby, 2001), thereby maintaining the ‘relative’ honesty of the size-related formant information in male bellows. As a result, the formant spacing remains an honest cue to body size within male koalas, even if it exaggerates body size relative to other species [as in colobus monkeys (Harris et al., 2006) and red deer (Reby and McComb, 2003)]. The formant spacing of the later inhalation phases of male bellows, however, predicts a VTL of around 50 cm, which is nearly the entire length of an adult male koala's body. Although the anatomical basis for this finding remains unclear, we offer two potential explanations that may guide future research.
The first explanation is that the vocal tract configuration of the later inhalation phases of male bellows deviates greatly from a simple tube, and is augmented by a large cavity resonator: for example, by the voluminous nasal sinuses documented in this species (Kratzing, 1984) or by expansion of the supra-laryngeal pharynx into a temporary air sac (de Boer, 2009). A sub-hyoid air sac inflated just prior to vocalisation is hypothesised to cause the extremely low formants of colobus monkey (Colobus guereza) roars (Harris et al., 2006), and observations of bellowing male koalas show that the neck inflates during the exhalation phase, and then starts to deflate across the inhalation phases that are characterised by extremely low formant frequency spacing. Despite these observations, it is unclear whether any specific anatomical constraint could tie the inflation of an extendable pouch to overall male body size and, hence, maintain the reliability of the size-related information.
Another potential explanation is that the larynx is fully retracted by the sternothyroid muscle, and when disengaged from the nasopharynx it provides acoustic energy to simultaneously excite resonances in the oral and nasal cavities, or allows oscillations of this species' elongated velum (Young, 1881) to act as a source of acoustic energy [as in human snoring (Pevernagie et al., 2010)]. The different dimensions of the oral and nasal vocal tract (the nasal tract measured on the MRI scan was around 3 cm longer than the oral vocal tract) could then produce formants that do not overlap but, rather, augment the acoustic impression of the caller's body size by providing more formants within a given frequency range. Consistent with this notion, we have felt air being exhaled through the nostrils during the inhalation phases of bellows with extremely low formants (K.N., personal observation) indicating that the intrapharyngeal ostium (the connection between the oro- and naso-pharynx) is open, and that air is circulating in the nasal cavities and oral vocal tract at the same time. Nevertheless, because the nasal tract departs significantly from a linear tube, it is difficult to predict exactly where any additional formants would appear on the call spectra. Future studies should use data generated from computed tomography scans to build 3D models of the koala's oral and nasal tract, in order to predict the centre frequencies of this species' oral and nasal tract resonances. These 3D models could also be used to construct plastic moulds of the koala's oral and nasal tract, so that resonances could be excited and their frequencies documented using sweep-tone experiments (Fujimura and Lindqvist, 1971).
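The idea that two simultaneously excited tracts of slightly different length yield more formants within a given frequency range can be illustrated with a deliberately simplified sketch. It treats both tracts as uniform tubes closed at one end, which the nasal tract in particular is not, so the output is illustrative only. The oral tract length, the frequency limit, the speed of sound and the function name are hypothetical values chosen for the example; only the 3 cm difference between the tracts comes from the text.

```python
# Assumed speed of sound in the vocal tract (350 m/s), in cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def tube_resonances(length_cm, max_hz, c=SPEED_OF_SOUND_CM_S):
    """Quarter-wave resonances (Hz) of a uniform tube closed at one end,
    up to and including max_hz."""
    resonances = []
    i = 1
    while True:
        freq_hz = (2 * i - 1) * c / (4.0 * length_cm)
        if freq_hz > max_hz:
            break
        resonances.append(freq_hz)
        i += 1
    return resonances


ORAL_CM = 25.0             # hypothetical oral tract length
NASAL_CM = ORAL_CM + 3.0   # nasal tract around 3 cm longer, as in the text
MAX_HZ = 3000.0            # hypothetical upper frequency limit

oral = tube_resonances(ORAL_CM, MAX_HZ)
nasal = tube_resonances(NASAL_CM, MAX_HZ)
combined = sorted(oral + nasal)

print("oral resonances:", len(oral))
print("nasal resonances:", len(nasal))
print("combined resonances below limit:", len(combined))
```

Because the two series interleave rather than coincide, the combined spectrum contains more peaks below the limit than either tract alone, which a listener or an analysis could read as a narrower formant spacing.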
To conclude, our findings indicate that the spectral prominences of male koala bellows are formants, and show that the formant spacing provides honest information on male body size relative to other conspecifics. The accurate assessment of body size is important in many animal species during agonistic interactions between males and when females assess potential mates (Andersson, 1994; Owings and Morton, 1998). Consequently, male and female koalas may adjust their behavioural responses according to the size-related information broadcast in male bellows. For instance, because of the genetic benefits of larger, more competitive offspring, female koalas might prefer bellows signalling larger males in mate choice contexts (Charlton et al., 2007b; Charlton et al., 2008). In addition, male koalas could use this information on male body size to help them avoid risky encounters with larger rivals (Reby et al., 2005). Furthermore, the independently evolved low larynx position and well-developed sternothyroid muscle of male koalas suggest strong selection pressures for callers to elongate their vocal tracts, and maximise the acoustic impression of their body size conveyed to receivers (Fitch and Reby, 2001; Frey and Gebler, 2003). Indeed, the ‘size exaggeration’ hypothesis (Ohala, 1984; Fitch, 1997; Fitch, 2002; Fitch, 2010) has been proposed to explain the permanently descended larynges found in other mammals (Fitch and Reby, 2001; Weissengruber et al., 2002; Frey and Gebler, 2003; Frey et al., 2011) and the elongated trachea present in some bird species (Fitch, 1999).
More generally, the observation that several male mammals actively lower formants in their sexual calls (Fitch and Reby, 2001; Frey and Gebler, 2003; McElligott et al., 2006; Frey et al., 2007; Sanvito et al., 2007; Frey et al., 2011) suggests that selection for adaptations to exaggerate (or maximise) the acoustic impression of body size in reproductive contexts could be widespread in non-human animals (see Fitch and Hauser, 2002). Whether this is predominantly for deterring rivals and/or attracting females remains a key question for future research.
We wish to thank all the staff at Lone Pine Koala Sanctuary for helping to identify the koalas and two anonymous reviewers for their comments on the manuscript. We also want to thank Martha Tate for her invaluable help recording male bellows and the Queensland Wildlife and Parks Services for access to unpublished data. B.D.C. was partially supported by a European Research Council Advanced Grant SOMACCA (no. 230604) awarded to W. Tecumseh Fitch. This work follows the Association for the Study of Animal Behaviour/Animal Behaviour Society guidelines for the use of animals in research, and was approved by the University of Queensland Animal Ethics Committee (approval number SAS/227/10).
© 2011.