The evolutionary origins of human language are obscured by the scarcity of essential linguistic characteristics in non-human primate communication systems. Volitional control of vocal utterances is one such indispensable feature of language. We investigated the ability of two monkeys to volitionally utter species-specific calls over many years. Both monkeys reliably vocalized on command during juvenile periods, but discontinued this controlled vocal behavior in adulthood. This emerging disability was confined to volitional vocal production, as the monkeys continued to vocalize spontaneously. In addition, they continued to use hand movements as instructed responses during adulthood. This greater vocal flexibility of monkeys early in ontogeny supports the neoteny hypothesis in human evolution. This suggests that linguistic capabilities were enabled via an expansion of the juvenile period during the development of humans.
The human language faculty vastly outperforms primate vocal communication systems in scope and flexibility (Balter, 2010; Ghazanfar, 2008; Hammerschmidt and Fischer, 2008). This lack of essential linguistic characteristics in extant non-human primate communication systems hampers insights into the evolutionary origins of speech and language (Arnold and Zuberbühler, 2006; Seyfarth and Cheney, 2010). Volitional control of vocal utterances is deemed a critical, albeit insufficient, precursor for the development of a flexible communicative system (Balter, 2010; Ghazanfar, 2008; Ackermann et al., 2014). However, primate communication systems consist of stereotyped and innate calls that are almost exclusively uttered affectively (Ackermann et al., 2014; Deacon, 2010; Jürgens, 2002). Non-human primates lack the neural machinery that endows modern humans with outstanding cognitive abilities such as language. The ‘neoteny hypothesis of human evolution’ (Gould, 1977) posits the expansion of the childhood period with refined synaptic development in modern humans to facilitate larger and more powerful neural systems. Specifically, the prefrontal cortex, which is associated with the highest levels of cognition in addition to being the site of Broca's language production area, experiences extraordinarily long phases of developmental reorganization of neuronal circuits (Petanjek et al., 2011). Genes related to the development of the prefrontal cortex show excessive, neotenic expression in humans relative to chimpanzees and rhesus macaques (Somel et al., 2013).
The neoteny hypothesis suggests an exploitation of greater neural plasticity early in ontogeny to foster the neural underpinnings of high-level communication systems like language (Carroll, 2003; Oller, 2000). Interestingly, primate vocalizations experience ontogenetic changes. In infant and juvenile simian monkeys, calls are more variable (Hammerschmidt et al., 2001; Pistorio et al., 2006; Takahashi et al., 2015) and vocal-related learning, such as call usage and comprehension, is facilitated (Seyfarth and Cheney, 1986, 2010). Therefore, infant and juvenile monkeys seem to have an advantage and can use vocal communication signals more flexibly.
Earlier studies revealed that monkeys and apes can be trained to vocalize in operant conditioning tasks (Sutton et al., 1973, 1974, 1985; Trachy et al., 1981; Coudé et al., 2011; Koda et al., 2007). We recently reported that two juvenile rhesus monkeys can be trained with effort to instrumentalize their calls as a conditioned response in a simple detection task (Hage et al., 2013). All but one study, including our own, that indicated the age of the monkeys and apes were performed with juvenile animals (Sutton et al., 1973, 1974; Trachy et al., 1981; Koda et al., 2007; Hage et al., 2013). Based on the neoteny hypothesis, we hypothesized that juvenile monkeys with a more plastic brain would be better suited for volitional call production than adult monkeys. We here present a longitudinal study based on data collected from two monkeys over several years, investigating potential developmental trends of vocal behavior from the juvenile to the adult period.
MATERIALS AND METHODS
We used two male rhesus monkeys, Macaca mulatta (Zimmermann 1780), aged 4.8 and 4.9 years and weighing 4.2 and 4.5 kg at the beginning of this long-term study, and aged 9.5 and 9.7 years and weighing 8.6 and 9 kg, respectively, at the end. All procedures were authorized by the national authority, the Regierungspräsidium Tübingen, Germany.
Both monkeys were first trained to perform a vocal response task (Fig. 1A), i.e. a visual go/no-go detection task using their vocalizations as a response (Hage et al., 2013; Hage and Nieder, 2013, 2015). Briefly, the monkeys were required to vocalize cued by arbitrary visual stimuli (red or blue squares) to receive a reward. Monkey T was trained to utter ‘coo’ vocalizations; monkey C was taught to emit ‘grunts’. The two colors appeared with equal probability (P=0.5) and had no significant influence on call probability (Wilcoxon signed rank test, P>0.1 for both monkeys). Trials began when the monkey initiated a ‘ready’ response by grasping a bar. Then, a visual cue, indicating the ‘no-go’ signal (‘pre-cue’; white square, diameter 0.5 deg of visual angle) appeared for a randomized time of 1–5 s (time epoch 1 of monkey C with times between 0.5 and 5 s). During this period, vocal output had to be withheld. Next, in 80% of the trials, the visual cue was changed to a colored ‘go’ signal (red or blue square; diameter 0.5 deg of visual angle) lasting for 3000 ms (for monkey C, the duration of the go signal was extended to 3500 ms from the 19th session of epoch 6 until the end of epoch 8). During this time, the monkeys had to emit a vocalization to receive a reward. In 20% of the trials, the cue remained unchanged for another 3000 ms (‘catch’ trial). During this period, the monkey had to withhold calls. Catch trials were not rewarded. ‘False alarms’ were indicated by visual feedback (blue screen) and by trial abortion. To demonstrate its readiness to work, the monkey had to grab the bar throughout the pre-cue as well as the go phases. Bar release aborted the trials instantaneously, followed by visual feedback (red screen). In accordance with the go/no-go detection protocol, successful go trials were defined as ‘hits’, and unsuccessful catch trials as false alarms. One session was recorded per individual per day.
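The trial logic described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the CORTEX protocol used in the study: the names `Trial`, `make_trial` and `classify` are hypothetical, calls during the pre-cue period are not modeled, and the labels ‘miss’ and ‘correct rejection’ are standard signal detection terms added here for completeness (the text itself defines only hits and false alarms).

```python
import random
from dataclasses import dataclass
from typing import Optional

GO_PROBABILITY = 0.8          # 80% go trials, 20% catch trials (from the text)
PRE_CUE_RANGE_S = (1.0, 5.0)  # randomized pre-cue duration in seconds
GO_COLORS = ("red", "blue")   # go colors, shown with equal probability


@dataclass
class Trial:
    pre_cue_s: float
    is_go: bool
    go_color: Optional[str]  # None on catch trials


def make_trial(rng: random.Random) -> Trial:
    """Draw one trial: go (80%) or catch (20%), with a random pre-cue time."""
    is_go = rng.random() < GO_PROBABILITY
    color = rng.choice(GO_COLORS) if is_go else None
    return Trial(rng.uniform(*PRE_CUE_RANGE_S), is_go, color)


def classify(trial: Trial, called: bool, bar_released: bool = False) -> str:
    """Label a trial by whether a call occurred in the response window."""
    if bar_released:
        return "aborted"  # bar release ends the trial immediately
    if trial.is_go:
        return "hit" if called else "miss"
    return "false_alarm" if called else "correct_rejection"
```

For example, `classify(make_trial(random.Random(1)), called=True)` returns either `"hit"` or `"false_alarm"`, depending on whether the drawn trial was a go or a catch trial.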
Vocal recording sessions comprised eight contiguous epochs in monkey C (epoch 1: median age 4.9 years with N=15 daily sessions; epoch 2: 5.4 years, N=27; epoch 3: 6.2 years, N=25; epoch 4: 6.8 years, N=29; epoch 5: 7.1 years, N=47; epoch 6: 7.7 years, N=28; epoch 7: 7.8 years, N=13; epoch 8: 8.0 years, N=8) and seven epochs in monkey T (epoch 1: 4.9 years, N=15; epoch 2: 5.0 years, N=33; epoch 3: 5.8 years, N=20; epoch 4: 6.4 years, N=52; epoch 5: 6.7 years, N=33; epoch 6: 7.0 years, N=12; epoch 7: 7.8 years, N=53). Monkey T was head-fixed during all sessions; monkey C was head-fixed in all sessions during epochs 1–5. In both monkeys, epochs 2 and 5 include neuronal recording sessions, while all other sessions in the remaining epochs were behavioral sessions.
After the monkeys ceased to produce conditioned calls as a response, they were re-trained to perform a manual response task (Fig. 1B). They were trained to perform a standard visual delayed match-to-sample (DMS) task with colors and were required to respond to matching colors by hand movements. A trial started when the monkey grasped a lever. A sample display showing a color square (2 deg visual angle) was presented on a black background in the center of a computer screen for 800 ms. A constant 1000 ms memory delay followed. Next, a test display appeared which in 50% of the cases was a match showing the same color as the sample period (‘match’ trials). In the other 50% of cases (‘non-match’ trials), the first test display after the delay period was a non-match, showing a different color, followed by a second test display, which always displayed a match color. If a match appeared, monkeys released the lever (within 1.2 s) to receive a fluid reward. If a non-match was shown, they held the lever until the second test display appeared (which in these trials was always a match), requiring a lever release for a reward. Trials were randomized and balanced across all relevant features (e.g. match versus non-match, colors). Monkey C performed the task with red and blue colors, monkey T with red, blue and green colors.
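The structure of a DMS trial can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function names are hypothetical, and each trial is drawn independently with a 50% match probability, whereas the actual sessions were randomized and balanced across trial features.

```python
import random
from typing import List, Sequence, Tuple


def make_dms_trial(colors: Sequence[str],
                   rng: random.Random) -> Tuple[str, List[str]]:
    """Return the sample color and the ordered list of test displays."""
    sample = rng.choice(list(colors))
    if rng.random() < 0.5:
        # Match trial: the first test display repeats the sample color.
        tests = [sample]
    else:
        # Non-match trial: a different color first, then always a match.
        nonmatch = rng.choice([c for c in colors if c != sample])
        tests = [nonmatch, sample]
    return sample, tests


def release_index(sample: str, tests: Sequence[str]) -> int:
    """Index of the test display at which the lever must be released."""
    return list(tests).index(sample)
```

With `colors=("red", "blue")` this corresponds to the color set of monkey C, and with `("red", "blue", "green")` to that of monkey T.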
As in our previous studies (Hage et al., 2013; Hage and Nieder, 2013, 2015), stimulus presentation and behavioral monitoring were automated on PCs running the CORTEX program (National Institutes of Health) and recorded by a Plexon Multi-Acquisition system. Vocalizations were recorded by the same system with a sampling rate of 40,000 Hz via an A/D converter. A custom-written MATLAB program running on another PC monitored the vocal behavior in real time and detected the vocalizations. Vocal onset times were detected offline by a custom-written MATLAB program to ensure precise timing for data analysis in all but two sessions of monkey C (epoch 3 and epoch 4), as these behavioral sessions were recorded by the CORTEX program only.
Recording of spontaneous vocalizations
The spontaneous vocalizations of the two monkeys in their housing environment were measured during their juvenile and adult periods as part of ‘ethograms’ for which a range of behaviors was recorded (Hage et al., 2014). To that aim, we focally sampled the call behavior of the monkeys in 1 min intervals over a duration of 10 min, during two periods of five consecutive days (‘continuous sampling’; Altmann, 1974; Martin and Bateson, 1993). Call occurrence (%) could range from 0% (no calls during the 10 min observation window) to 100% (calls every minute during the 10 min observation) and was averaged for the juvenile and adult test periods. The data for the juvenile phase were collected when monkey C was 5.4 and 5.6 years old, and when monkey T was 5.0 and 6.1 years old. Spontaneous call behavior for the adult phase was recorded when monkey C was 9.5 years old, and when monkey T was 9.7 years old. Wilcoxon rank sum tests were used to test for significant differences in spontaneous vocal behavior between the juvenile phase and adulthood.
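The call occurrence measure described above can be sketched as follows. This is a minimal illustration that assumes each 10 min observation is stored as ten yes/no entries, one per 1 min interval; the function names and this data layout are hypothetical and not taken from the original analysis.

```python
from typing import Sequence


def call_occurrence(intervals_with_call: Sequence[bool]) -> float:
    """Percentage of 1 min intervals in which at least one call occurred."""
    if not intervals_with_call:
        raise ValueError("at least one observation interval is required")
    return 100.0 * sum(intervals_with_call) / len(intervals_with_call)


def mean_call_occurrence(observations: Sequence[Sequence[bool]]) -> float:
    """Average call occurrence (%) across several 10 min observations."""
    values = [call_occurrence(obs) for obs in observations]
    return sum(values) / len(values)
```

An observation with calls in three of the ten intervals yields a call occurrence of 30%.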
We computed d-prime (d′) sensitivity values derived from signal detection theory (Green and Swets, 1966) by subtracting z-scores (normal deviates) of median false alarm rates from z-scores of median hit rates. The detection threshold for d′ values was set to 1.8, which corresponds to a hit rate of 56% at a false alarm rate of 5% in this go/no-go task (Green and Swets, 1966).
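As an illustration, d′ can be computed with Python's standard library, following the standard signal detection definition d′ = z(hit rate) − z(false alarm rate). This is a sketch, not the original analysis code. In particular, the `floor` parameter is an assumption introduced here: rates of exactly 0 or 1 have no finite z-score, and the text does not state how such rates (e.g. sessions without any false alarms) were handled.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float,
            floor: float = 1e-4) -> float:
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    Rates are clipped to [floor, 1 - floor] so that the inverse normal
    CDF stays finite; this clipping is an assumption of the sketch.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    hit = min(max(hit_rate, floor), 1.0 - floor)
    fa = min(max(false_alarm_rate, floor), 1.0 - floor)
    return std_normal.inv_cdf(hit) - std_normal.inv_cdf(fa)


# A hit rate of 56% at a false alarm rate of 5% gives d' close to 1.8.
threshold_example = d_prime(0.56, 0.05)
```

The clipped value for a false alarm rate of zero depends directly on the chosen `floor`, so d′ values for such sessions should be compared only under the same convention.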
Kruskal–Wallis tests (with post hoc Wilcoxon rank sum tests) were performed to test for significant differences in call performance, hit rate, false alarm rate, d′ value and call latency during the detection task over time. We used Pearson's correlations to test for possible correlations between these parameters characterizing vocal behavior and the monkeys’ age in the appropriate sessions.
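For readers unfamiliar with these statistics, the following sketch shows how the Kruskal–Wallis H statistic and Pearson's correlation coefficient are computed. It is a plain-Python illustration rather than the analysis code used for the study; the function names are hypothetical, the H statistic omits the correction for ties, and P-values (which require the χ2 and t distributions) are not computed.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to N, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic (no tie correction) across groups."""
    pooled = [v for group in groups for v in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

In this setting, each group passed to `kruskal_wallis_h` would hold the per-session values of one epoch, and `pearson_r` would relate per-session values to the monkey's age at each session.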
We measured vocal behavior over a period of about 4 years, when monkey C's age ranged from 4.8 to 8.1 years and monkey T's age spanned from 5.1 to 7.9 years. During this time, we recorded 12,769 vocalizations in monkey C and 21,029 vocalizations in monkey T, which were uttered as obligatory responses in the vocal response task (Fig. 1A). In total, this corresponded to 192 daily sessions in monkey C, and 218 sessions in monkey T. Vocal recording sessions comprised eight contiguous epochs in monkey C and seven epochs in monkey T (see Materials and methods for details). Fig. 2 shows the vocalization behavior of both monkeys over this time in relation to the timing of life history events in macaques (Fleagle, 2013). We measured several behavioral parameters characterizing call behavior: the total number of volitional calls per session, the hit rate (percentage correct responses) and the false alarm rate (vocalizations during catch trials without a go stimulus). The hit rate and false alarm rate were used to calculate the sensitivity index, or d′, from signal detection theory (Green and Swets, 1966). During the first epoch, and at an age of 4.8 and 5.1 years for monkey C and monkey T, respectively, both monkeys showed superior vocalization behavior. This was evidenced by high call rates (monkey C: median 90 calls per session, Fig. 2A; monkey T: median 181 calls per session, Fig. 2B), high hit rates (monkey C: median 62.7%, Fig. 2C; monkey T: median 56.2%, Fig. 2D) and no false alarms at all in both monkeys (Fig. 2E,F). As a result of this high performance, the d′ value was 4.0 in monkey C (Fig. 2G) and 3.9 in monkey T (Fig. 2H), and thus well above chance.
However, vocal performance progressively declined with increasing age of the monkeys. The number of calls per session decreased systematically over the epochs until both monkeys stopped uttering vocalizations completely (Fig. 2A,B; Kruskal–Wallis test; monkey C: P<0.001, N=192, d.f.=7, χ2=109.1; monkey T: P<0.001, N=218, d.f.=6, χ2=138.1) and was significantly correlated with age (Pearson's correlation: monkey C: P<0.001, N=192, R=−0.63; monkey T: P<0.001, N=218, R=−0.63; Fig. 2A,B). A similar decline of hit rates was observed for monkey C (Fig. 2C; Kruskal–Wallis test, P<0.001, N=192, d.f.=7, χ2=125.1) and monkey T (Fig. 2D; Kruskal–Wallis test, P<0.001, N=218, d.f.=6, χ2=151.4), which was also significantly correlated with age in both monkeys (Pearson's correlation: monkey C: P<0.001, N=192, R=−0.80; monkey T: P<0.001, N=218, R=−0.73; Fig. 2C,D). Importantly, however, the false alarm rate stayed at low levels for all epochs in both monkeys (Fig. 2E,F), indicating that the monkeys did not develop arbitrary calling behavior. Therefore, the accompanying significant change of d′ values (Fig. 2G,H; Kruskal–Wallis test; monkey C: P<0.001, N=192, d.f.=7, χ2=118.3; monkey T: P<0.001, N=165, d.f.=5, χ2=42.6), as well as the correlation of d′ values with age (Pearson's correlation: monkey C: P<0.001, N=190, R=−0.68; monkey T: P<0.001, N=165, R=−0.39; Fig. 2G,H), was caused by the decrease in overall vocalizations until extinction. However, median d′ values were well above detection threshold until the end of the recordings.
In parallel with the decline in performance, call latency increased significantly. In monkey C, call latency rose from a median of 1.64 s in epoch 1 to 2.63 s in epoch 7 (Fig. 3A,B; Kruskal–Wallis test, P<0.001, N=130, d.f.=4, χ2=91.0, post hoc Wilcoxon rank sum test, P<0.001, N=28). In addition, median call latency was significantly correlated with the age of monkey C (Pearson's correlation, P<0.001, N=130, R=0.77). A less pronounced but still significant increase in call latency was observed between the first and last time epoch in monkey T (Fig. 3C; Kruskal–Wallis test, P<0.001, N=165, d.f.=5, χ2=32.4, post hoc Wilcoxon rank sum test, P<0.02, N=68). In monkey T, however, call latency did not increase steadily from epoch to epoch as it did in monkey C, and it showed only a weak, yet significant, correlation with the animal's age (Pearson's correlation, P<0.01, N=165, R=0.21).
To see whether the absence of vocalizations within the vocal response task was due to a general loss of vocal behavior, we investigated the spontaneous vocal behavior of both monkeys in their housing environment during their juvenile phase and adulthood. Fig. 4 depicts the mean occurrence of the monkeys’ vocal behavior during focal animal scanning (10 min ethogram). Spontaneous calling behavior remained stable in monkey C (Wilcoxon rank sum test, P>0.1, N=20). Monkey T showed reduced spontaneous calling behavior during adulthood (Wilcoxon rank sum test, P<0.01, N=20), but never stopped vocalizing spontaneously. Thus, the ongoing spontaneous call behavior of both monkeys was in stark contrast to the complete halt of volitional vocalizations with age. Therefore, the reported decline of volitional vocalizations cannot be explained by a general lapse of calling behavior, because the monkeys continued to vocalize spontaneously in their housing environment, i.e. outside of the behavioral protocol.
Moreover, the discontinuation of volitional vocal behavior could also not be accounted for by major environmental changes. Throughout these years of training, the monkeys maintained continuous good health (also verified by regular blood tests) and gained normal weight. Moreover, the same behavioral protocol was presented, the same controlled fluid intake protocol for motivation was applied, the same housing of the monkeys in small social groups was carried out, and the same scientific trainers (S.R.H. and N.G.) worked with the monkeys throughout this 4 year period.
Finally, we wondered whether the extinction of volitional calling could be explained by a general loss of volitional responses, a lack of motivation, or some general resistance to respond in a conditioned task. To test this possibility, we re-trained both monkeys after they stopped vocalizing in the vocal response task on a manual response task. To remain within the same sensory modality, we trained them to perform a DMS task with color stimuli (Fig. 1B). Monkeys were required to use a manual bar release instead of a vocalization as a response. Even though a DMS discrimination task is more demanding in comparison to the previous simple detection task, the monkeys, which were now 9.0 years old (monkey C) and 8.2 years old (monkey T), showed full recovery of the volitional response. Monkey C performed, on average, 534 trials (8 sessions) and monkey T performed 526 trials (7 sessions; both medians; Fig. 2A,B). They also showed a high median percentage of correct responses (Fig. 2C,D; monkey C: 80.2%, monkey T: 79.7%). Both monkeys continued to work at this high performance level.
We report a systematic decline of volitional vocalizations in rhesus monkeys that was not explained by (a) a general lapse of calling behavior, (b) environmental changes or (c) a general loss of voluntary responses or lack of motivation. During this longitudinal investigation, we also performed unilateral single-unit recordings with microelectrodes in the prefrontal cortex (PFC) of both monkeys (Hage and Nieder, 2013, 2015), but we exclude the possibility that recordings caused damage that would have left the monkeys unable to vocalize on command. We have never witnessed a decline of any cognitive function as a result of PFC recordings, and post-mortem histological examination of other monkey brains has never shown damage to the tissue resulting from recordings. Furthermore, both monkeys have successfully been re-trained on other demanding tasks, and there was no indication whatsoever that the monkeys had suffered from disturbance of cognitive control functions. In fact, we argue that the visual DMS task that both monkeys successfully performed after they ceased to vocalize volitionally is more demanding than the cued vocalization (CV) task. In contrast to the CV task, the DMS task required discrimination of both sample and test stimuli (not just simple detection of a go stimulus) and memorization of a sample image over a delay period (which was entirely missing in the CV task). This is another indication that the monkeys were fully intact. Finally, we think it is highly unlikely that a putative worsened coordination between the manual and oral domains over development (the monkeys needed to grab a bar while vocalizing) might have caused the observed effects, given that hand movements and vocalizations were temporally disparate. Because the observed decline in volitional call behavior correlated with the transition of the monkeys from juvenile phases to adulthood, our findings can therefore best be reconciled with a maturation process.
We suspect that early in ontogeny, the monkeys’ neural central executive was still connected with the vocal motor network, thus allowing rudimentary cognitive control over call behavior. This cognitive control of vocal behavior was lost when the monkeys reached adulthood, pointing to developmental reorganization in the brain of these monkeys.
Using the identical task protocol, we previously reported a neuronal correlate of the monkeys’ ability to initiate calls in response to the detection of an arbitrary visual stimulus (Hage and Nieder, 2013). Single neurons in the monkey homolog of Broca's area (Brodmann area 44 and 45) in the lateral PFC specifically signaled the preparation of instructed vocalizations, but not of spontaneous calls (Hage and Nieder, 2013). We hypothesize that these neurons of the PFC (which is generally associated with the brain's cognitive control center) connect the brain's executive with the vocal motor network early in primate ontogeny (Ackermann et al., 2014) as an obligate network for executive control on vocal output (Miller and Cohen, 2001). The anatomical substrate of this juvenile capability might be found in the excessive synaptic connections and dendritic spines particularly found in the PFC of human and non-human primates that are initially overproduced to about two times the adult number before being pruned during puberty to reach the adult level at the onset of adolescence (Petanjek et al., 2011; Bourgeois et al., 1994; Huttenlocher and Dabholkar, 1997; Dehaene and Cohen, 2007). This neoteny of brain structures in the PFC could be mediated by genes related to the development of the prefrontal cortex that show a correspondingly excessive, neotenic expression in humans relative to chimpanzees and rhesus macaques (Somel et al., 2013). Our hypothesis predicts that neural connections between the executive functioning networks in PFC and the brain's vocal motor network, which exist in juvenile monkeys, are decoupled during adolescence and are lost in adult monkeys. If true, such a finding would strengthen the neoteny hypothesis of human evolution (Gould, 1977) and explain aspects of human language evolution.
It is widely acknowledged that adolescence is associated with considerable reorganization of the brain. But what could cause the loss of volitional vocalizations? Activity-dependent pruning of connections via elimination of excessive synapses is thought to play a major role in sculpting circuits and connections during ontogeny. However, because the brain networks to produce vocalizations were in use and of considerable behavioral relevance for our monkeys, the loss of this function would be difficult to reconcile with activity-dependent elimination of synapses. Yet even without activity-dependent plasticity, the brain undergoes considerable reorganization during adolescence that serves a variety of other, possibly competing functions. For instance, hormonal changes associated with sexual maturation contribute to adolescent-typical behavioral changes that necessarily have an impact on large-scale networks. Functions beneficial during childhood may become inhibited during adulthood. In addition, changes of the highly interconnected brain in one area may in turn constrain the maintenance of other functions. Moreover, synaptic elimination during adolescence probably involves adjustment of the excitatory/inhibitory balance on individual neurons and within networks, given that excitatory synapses are selectively degenerated whereas inhibitory synapses are spared (Rakic et al., 1986). We speculate that the causes of the loss of brain circuits and networks for voluntary vocalizations are related to one (or several) of the non-activity-related elimination processes occurring in the maturing brain.
Our study emphasizes one of the rare cases of commonality between the human language system and non-human primate communication systems, namely the (developmentally restricted) ability to cognitively control vocalizations. It suggests that one important aspect of flexible communication is grounded in the primate lineage and could be exploited during the emergence of functional flexibility of prelinguistic vocalizations of human infants (Oller et al., 2013). As a phylogenetic pre-adaptation, volitional control of vocal utterances would be a crucial subcomponent in the complex multi-component system ‘human language’ and instrumental for all higher level linguistic characteristics emerging in human development, such as semantic compositionality or the grasp and mastering of a symbol system (Deacon, 1997; Nieder, 2009). Our behavioral study suggests an expansion of the juvenile period during ontogeny as one of the key evolutionary events in the evolution of language.
We thank two anonymous reviewers for helpful comments on a previous version of the manuscript.
The authors declare no competing or financial interests.
S.R.H. and A.N. designed the study, interpreted the data and wrote the manuscript. S.R.H. and N.G. performed experiments and analyzed the data.
This work was supported by the Werner Reichardt Centre for Integrative Neuroscience (CIN) at the Eberhard Karls University of Tübingen (CIN is an Excellence Cluster funded by the Deutsche Forschungsgemeinschaft within the framework of the Excellence Initiative EXC 307).
- Received January 20, 2016.
- Accepted March 15, 2016.
- © 2016. Published by The Company of Biologists Ltd