Toothed whales (Cetacea, Odontoceti) emit sound pulses to probe their surroundings by active echolocation. Non-invasive, acoustic Dtags were placed on deep-diving Blainville's beaked whales (Mesoplodon densirostris) to record their ultrasonic clicks and the returning echoes from prey items, providing a unique view of how a whale operates its biosonar during foraging in the wild. The process of echolocation during prey capture in this species can be divided into search, approach and terminal phases, as in echolocating bats. The approach phase, defined by the onset of detectable echoes recorded on the tag for click sequences terminated by a buzz, has interclick intervals (ICI) of 300-400 ms. These ICIs are more than an order of magnitude longer than the decreasing two-way travel time to the targets, showing that ICIs are not given by the two-way travel time plus a fixed, short lag time. During the approach phase, the received echo energy increases by 10.4 (±2) dB when the target range is halved, demonstrating that the whales do not employ range-compensating gain control of the transmitter, as has been implicated for some bats and dolphins. The terminal/buzz phase with ICIs of around 10 ms is initiated when one or more targets are within approximately a body length of the whale (2-5 m), so that strong echo returns in the approach phase are traded for rapid updates in the terminal phase. It is suggested that stable ICIs in the search and approach phases facilitate auditory scene analysis in a complex multi-target environment, and that a concomitant low click rate allows the whales to maintain high sound pressure outputs for prey detection and discrimination with a pneumatically driven, bi-modal sound generator.
- beaked whale
- Mesoplodon densirostris
- automatic gain control
- click interval
- sound production
Species belonging to the suborders Odontoceti (toothed whales) and Microchiroptera (bats), encompassing one in five of all known mammalian species, explore their environment by listening for echoes from emitted ultrasonic sound pulses. This process was coined echolocation by Griffin (1958) after a series of seminal studies showing that bats orient themselves and acquire aerial prey by the aid of biosonar. The 800 species of microchiropteran bats inhabit a suite of diverse niches, and numerous studies have demonstrated that different bat species have adapted their sonar systems to the differing acoustic characteristics of these habitats (Neuweiler, 1990; Schnitzler and Kalko, 2001; Denzinger et al., 2004) and to the size and behavior of their prey (Fenton, 1984; Barcley and Brigham, 1991; Surlykke et al., 1993; Houston et al., 2004). Toothed whales encompassing some 70 species also inhabit very different aquatic environments from shallow freshwater rivers to mesopelagic depths where they target diverse groups of prey. But little is known about how toothed whales may have adapted their sonar systems to different habitats, foraging niches and prey types, and how they use echolocation in the process of orientation and foraging.
Vespertilionid bats targeting aerial prey employ a stereotypical pattern of vocal behavior as they detect, locate and capture prey items. Their acoustic behavior during foraging can generally be divided into search, approach and terminal phases (Griffin, 1958; Griffin et al., 1960). The search phase in FM bats involves emission of 2-10 ms, frequency-modulated (FM) cries with stable interpulse (or interclick) intervals (IPI or ICI) (Kalko and Schnitzler, 1998). The approach phase starts when a prey is detected at maximum ranges of 2-4 m (Kalko, 1995). The ICI decreases during the approach phase as a function of reducing range between the ensonified prey and the approaching bat (Kalko, 1995; Wilson and Moss, 2004), and the pulse duration is often reduced to avoid overlap between emitted pulses and returning echoes (Cahlander et al., 1964). In the terminal phase just prior to capture, the repetition rate rapidly increases to some 200 Hz and pulse durations are reduced along with amplitudes and in some cases the pulse frequencies (Griffin et al., 1960; Simmons et al., 1979; Schnitzler and Kalko, 1998). A complete capture event of aerial prey by Vespertilionids normally lasts less than 500 ms, and it is repeated many hundreds of times during a nocturnal foraging bout (Kalko, 1995). Other bat families make constant frequency (CF) or short-CF, Doppler-sensitive cries for prey finding and orientation (Schnitzler and Kalko, 2001), underlining the diversity of biosonar signals in the microchiropteran suborder.
In addition to the gradual reduction in ICI with reducing two-way travel time (TWT) during the approach phase, the bat makes sensori-motor adjustments of its vocal apparatus, and auditory and neural systems in response to the incoming echoes (Simmons, 1989; Wadsworth and Moss, 2000). In situations where both the vocalizing bat (if using a constant output) and its prey target can be modeled as point sources, the received echo levels will increase by a factor of four when the target range is reduced by a factor of two, meaning that the echo level increases 12 dB when the target range is halved (plus gain from reduced frequency-dependent absorption). To avoid high sensation levels of echoes from targets at close quarters and the possible deafening effects of their own vocalizations, at least some bat species have automatic gain control (AGC) in their auditory system (Henson, 1965; Suga and Jen, 1975). They reduce the sensitivity of the ear just prior to a vocalization by tightening their stapedial muscles in the middle ear, and then gradually increase the sensitivity during the next 6.4 ms corresponding to a target range of 1.4 m (Suga and Jen, 1975). This gain control on the receiving side is further augmented by neural attenuation in the midbrain operating synchronously with the vocalizations (Suga and Schlegel, 1973). Kick and Simmons (1984) reported that AGC stabilizes the echo sensation level of the ear with an 11 dB attenuation per distance halved (dh) that almost compensates for the 12 dB dh-1 increase in echo level. They argued that stabilization of the echo sensation levels renders target-specific variations in target strength (such as wing fluttering) more detectable to the bats (Kick and Simmons, 1984), and that stable echo levels may serve to minimize amplitude-induced latency shifts that disrupt accurate ranging (Simmons and Kick, 1984).
Using a different experimental setup, Hartley (1991) concluded that the AGC only reduces the received level by 6 dB dh-1, but that the bats concomitantly lower the source levels by 6 dB dh-1 to achieve a similar stabilizing effect on echo-sensation levels. An AGC of 6 dB dh-1 on the transmitting side was also reported by Kobler et al. (1985), but both studies used targets with much higher target strengths than natural prey, so it remains at present unresolved if AGC of the transmitter also applies for free-ranging bats in foraging situations. In summary, bats approaching a target in the lab reduce their ICIs and use AGC in the auditory system, and maybe also in their vocal apparatus, in adaptation to the temporal and energetic changes in the returning echoes.
With the exception of visual observations of bottom-foraging dolphins (Herzing and Santos, 2004), there is little if any information about how free-ranging toothed whales use echolocation to find and collect their prey. Studies in captivity have shown that harbor porpoises terminate a prey pursuit with a buzz similar to that reported for bats (Verfuss et al., 2000). Recordings from narwhals (Miller et al., 1995) and sperm whales (Madsen et al., 2002a; Miller et al., 2004) during foraging show that they too terminate capture with fast click trains, but the biosonar analogy to bats remains conjectural. There is no information about the ranges at which prey targets are detected or how echolocating toothed whales respond and adapt to incoming prey echoes.
However, elaborate studies on trained dolphins have provided great insight into the detection capabilities of echolocation for artificial targets in different experimental settings (Au, 1993). Delphinids have a dynamic sound production apparatus capable of varying the frequency peaks of clicks by more than an octave (Moore and Pawloski, 1990). In addition, dolphins can modify click source levels by 60 dB or more depending on the acoustic environment and the detection task (Au, 1993). Bottlenose dolphins can detect steel targets at ranges in excess of 100 m in high background noise levels by producing clicks with source levels up to 228 dB relative to (re.) 1 μPa (peak-to-peak; pp) (Au et al., 1974). During target detection experiments, the bottlenose dolphin waits 19-45 ms after the return of the echo before emitting a new click. This additional lag time (after the round trip travel time of the sonar signal) has been interpreted as a delay for echo reception, processing and activation of motor-systems (Au, 1993). Thus, most delphinids in target-detection experiments use ICIs given by the TWT plus a short, fixed lag time (Au, 1993).
When delphinids are in small tanks or are faced with easy detection tasks at close range, they produce clicks with source levels (SL) below 200 dB re. 1 μPa (pp) (Au, 1993). Conversely, if the echo-to-noise ratio (ENR) is reduced, the dolphins will often increase their SLs to improve the ENR (Au, 1993). This picture has recently been supported by data from several species of free-ranging delphinids echolocating on a hydrophone array. Au and Benoit-Bird (2003) reported that the source levels of dolphins' clicks are range dependent with a reduction of 6 dB dh-1 as the dolphins approach the array. Au and Benoit-Bird (2003) argued that this is the result of an AGC built into the sound production apparatus where ICI adjustments to the reduction in TWT cause a reduction in the acoustic output. Thus, echolocating dolphins have a dynamic vocal-motor apparatus in which source level increases with ICI. If the dolphin adjusts to reducing target range by reducing ICI with TWT, this relation between SL and ICI may function as an AGC that stabilizes echo levels. Still, it remains unknown how toothed whale biosonar operates when free-ranging animals echolocate for prey.
In a recent brief communication, we reported acoustic data collected with archival Dtags, which store acoustic and diving data (Johnson and Tyack, 2003), on two elusive deep-diving beaked whale species, Ziphius cavirostris and Mesoplodon densirostris (Johnson et al., 2004). On the basis of detectable prey echoes, we showed that beaked whales echolocate for food during deep foraging dives by using ultrasonic clicks to ensonify their prey. Foraging events were terminated by a rapid click train, coined a buzz in analogy with bats, and impact sounds could often be heard when the prey was caught during increased dynamic acceleration by the foraging whale (Johnson et al., 2004).
In the present study, we explore the biosonar performance of one of the beaked whale species, Mesoplodon densirostris, in the context of the dynamic auditory scene (Moss and Surlykke, 2001) it encounters during foraging. We quantify how an echolocating toothed whale responds to information in incoming prey echoes, and we discuss the results in the light of reported biosonar performance and dynamics of bats and dolphins. We demonstrate that beaked whales do not employ AGC of their transmitter as they close in on a target, and that the ICIs are not given by TWTs plus a fixed, short lag time in the approach phase of prey pursuits. We suggest that stable ICIs in the search and approach phases facilitate auditory scene analysis in a multi-target environment, and that the long durations of these ICIs allow the whale to maintain high sound-pressure outputs for prey detection and selection with a pneumatically driven sound generator.
Materials and methods
Habitat, animals and tag deployment
Field work was performed off El Hierro in the Canary Islands in October 2003. At the El Hierro field site, foraging Blainville's beaked whales (Mesoplodon densirostris L.) can be found less than 4 km off shore. Blainville's beaked whales (from here on Mesoplodon) are among the smaller beaked whales, with an adult body length of around 4.5 m and a weight of ∼600 kg (Mead, 1989). They are found in tropical and temperate waters, and stomach contents from stranded animals have shown that they forage on mesopelagic squid and fish (Mead, 1998).
For tagging, surfacing whales were slowly approached in a small inflatable boat. The tags were deployed by a handheld pole and attached with suction cups. Due to positive buoyancy, the tags floated to the surface after a maximum programmed release time of 16 h after which they were recovered by taking bearings to built-in radio transmitters. Two adult Blainville's beaked whales were tagged for 15.4 h (male, eight deep foraging dives) and 3 h (female or juvenile, two deep foraging dives), respectively. The 3 h tag was placed behind the head (Fig. 1), whereas the tag on the second animal was attached closer to the dorsal fin. The whales were foraging on the slope of an underwater ridge with variable water depths between 500 and 1500 m.
Data were collected with Dtags that recorded sound and orientation of the tagged animal (Johnson and Tyack, 2003). Sounds were recorded with 16 bit resolution and 96 kHz sampling rate, providing an overall flat (±1 dB) frequency response of the recording system from 0.6 to 45 kHz. Low frequency flow noise was reduced by a built-in 1-pole (6 dB octave-1) high-pass filter (-3 dB at 400 Hz), and aliasing was avoided by use of sigma-delta conversion. The tags store 3 GByte of data corresponding to 16 h of sound recordings when using a loss-less audio-compression algorithm. No sounds saturated the recorder, which clips at 181 dB re. 1 μPa (peak).
The whales were not recorded making sounds at depths shallower than 200 m. However, they clicked almost continuously during foraging dives at depth (Johnson et al., 2004). Echoes from incoming prey could be detected in recordings from both tag deployments, but the 3 h tag, placed in the most favorable position behind the head (Fig. 1), rendered the only echo trains with sufficient signal-to-noise ratios (SNR) for quantification of the echoes. We cannot prove that the incoming echoes are from prey items (Johnson et al., 2004) but, as demonstrated here, there is significant circumstantial evidence to support that parsimonious contention. However, the results and the discussion should be read with this inference in mind. The quantitative data on the echoes are derived from two dives of a single individual, whereas the general acoustic performance is based on both tag deployments. A large number of echoes were recorded in longer or shorter trains. It is assumed that the changes in recorded echo properties reflect the echo changes received by the auditory system of the whale, with the exception that the tag recordings were limited to 48 kHz, excluding click and echo energy above the Nyquist frequency of the tag (Johnson et al., 2004). This, however, is not likely to affect the relative energetic changes in the recorded echoes.
Targets may be ensonified during several clicks, but then suddenly disappear either because the prey item moved out of the beam as the whale ensonified a different target or because the prey successfully eluded the predator. To maximize the probability that we analyzed echo trains from targets the whale actually intended to capture and therefore tried hard to keep within the sonar beam, we selected sequences containing echoes terminated within 5 s before a buzz, strongly suggesting capture of the ensonified prey (Johnson et al., 2004). Secondly, we only included echolocation runs during which the echo delay was halved to ensure enough data points for evaluation of possible range effects on ICI and AGC. These criteria restricted the number of echolocation runs to 11 out of the total of 48 foraging buzzes made by the favorably tagged whale during two foraging dives.
Analysis and signal processing were performed with custom-written software in Matlab 6.0 (Mathworks; Natick, MA, USA). Click rates were derived with a click-detecting routine measuring the time differences between the peaks of the envelopes generated from consecutive clicks in a train. The relative acoustic output of the echolocating whale was estimated by quantifying the relative peak-peak amplitudes on a dB scale. Because the tag was placed behind the sound generator and out of the forward directed acoustic beam of the animals (Johnson et al., 2004), these measures do not reflect source levels, but we argue in line with Madsen et al. (2002b) that changes up or down in source level may also be seen as increases or reductions in the apparent output (AO) of clicks as measured by the tags (Fig. 1). Changing the shape or directivity of the acoustic beam could partly invalidate such a conjecture, but we have no means of assessing if that is taking place or not.
Echolocation sequences were identified by scrolling through the click trains with a Matlab script presenting sound power on a color scale in a click versus time plot, with a 25 s window. Returning echoes have a frequency content similar to on-axis clicks measured from clicking conspecifics that ensonify the tagged animal (Johnson et al., 2004). The clicks recorded from the tagged animal contain both weak high-frequency components and more-powerful low-frequency components generated by recording on, or close to, the sound generator. To measure the delay between the emitted click and the returning echo (Fig. 2B), we cross-correlated a window containing the outgoing click and the returning echo with an on-axis click recorded in the far field from an echolocating conspecific (Fig. 2C). The delay was subsequently determined by the time difference between peaks of the envelopes of the Hilbert-transformed cross-correlator output (Fig. 2C). The delay equals the TWT of the sound pulse to and from the ensonified target, and the delay can thus be converted to target range if the sound speed is known. Using the Leroy equation (Urick, 1983), the sound speed at 400-800 m depth was calculated to be 1485 m s-1 based on a temperature of 9°C and a salinity of 38‰ measured with a CTD (conductivity-temperature-depth) probe at 800 m depth on location.
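The delay measurement described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' Matlab routine: the function names, the 1 ms guard interval used to separate click from echo, and the synthetic 35 kHz test pulse are all hypothetical. Only the 96 kHz sampling rate and the 1485 m s-1 sound speed come from the text.

```python
import numpy as np

FS = 96_000            # tag sampling rate stated in the text (Hz)
SOUND_SPEED = 1485.0   # sound speed stated in the text (m/s)


def envelope(x):
    """Magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def echo_delay(window, template, fs=FS, guard_s=1e-3):
    """Delay (s) between the outgoing click and the strongest later echo."""
    matched = np.correlate(window, template, mode="same")
    env = envelope(matched)
    click_idx = int(np.argmax(env))                # outgoing click dominates
    later = env.copy()
    later[: click_idx + int(guard_s * fs)] = 0.0   # assumed 1 ms guard interval
    echo_idx = int(np.argmax(later))
    return (echo_idx - click_idx) / fs


def range_from_delay(delay_s, c=SOUND_SPEED):
    """One-way target range (m) from the two-way travel time."""
    return c * delay_s / 2.0


def synthetic_example():
    """Made-up click plus a weak echo 20 ms later; not tag data."""
    t = np.arange(24) / FS                         # about 250 microseconds
    template = np.hanning(24) * np.sin(2 * np.pi * 35_000 * t)
    window = np.zeros(4096)
    window[200:224] += template                    # outgoing click
    lag = 1920                                     # 20 ms at 96 kHz
    window[200 + lag:224 + lag] += 0.05 * template # attenuated echo
    return echo_delay(window, template)
```

Passing the result of `synthetic_example()` to `range_from_delay` converts the recovered two-way delay into a one-way range. On real tag data the outgoing click and the on-axis template differ in spectrum, so peak picking would need more care than this sketch takes.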
The mammalian ear operates as an energy detector that integrates intensity over a time window τ (Green and Swets, 1966). When evaluating the echo levels received by the tagged whale, the echo return should therefore be quantified by energy flux density and not sound pressure (Au, 1993, 2004). Energy flux density (dB re. 1 μPa2 s) is given by the RMS intensity over an integration window T:

$$E = 10\log_{10}\left(\int_{0}^{T} p^{2}(t)\,dt\right) \qquad (1)$$

RMS sound pressure of returning echoes was calculated by integrating the square of the instantaneous pressure, p(t), as a function of time over the echo signal duration (T) (Equation 1) relative to the same integral over the same time, T, of a calibration signal based on the sensitivity of the tag. Echo duration (T) was determined from the relative signal energy derived by integrating the squared pressure over a 1 ms time window symmetrical around the peak of the echo envelope. Onset of the signal was defined as the point at which 5% of the relative signal energy was reached, and the termination of the signal was defined as the point at which 95% of the relative signal energy was reached. A 15 kHz high-pass filter (-12 dB octave-1) was applied to improve SNR. The integration time for the auditory system of Mesoplodon is unknown, so the best available figure is the echo duration, T, of 250 to 320 μs, which is close to the measured integration time of 263 μs for the bottlenose dolphin (Au et al., 1988).
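The 5%-95% energy criterion and the energy flux density calculation can be illustrated as follows. This is a sketch that assumes the pressure samples are already calibrated in μPa. The function names are hypothetical, and the discrete sum divided by the sampling rate stands in for the time integral of squared pressure.

```python
import numpy as np


def energy_bounds(p, lo=0.05, hi=0.95):
    """Sample indices where cumulative energy first reaches lo and hi."""
    cumulative = np.cumsum(np.square(p))
    cumulative = cumulative / cumulative[-1]       # relative signal energy
    start = int(np.searchsorted(cumulative, lo))
    stop = int(np.searchsorted(cumulative, hi))
    return start, stop


def energy_flux_density_db(p, fs):
    """Energy flux density in dB re. 1 uPa^2 s for pressure p in uPa."""
    energy = np.sum(np.square(p)) / fs             # approximates the integral of p^2 dt
    return 10.0 * np.log10(energy)


def echo_duration_s(p, fs):
    """Echo duration T (s) between the 5% and 95% energy points."""
    start, stop = energy_bounds(p)
    return (stop - start) / fs
```

In use, `p` would be the high-pass-filtered 1 ms window centered on the echo envelope peak, and `energy_flux_density_db` would then be applied to the samples between the two bounds.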
No sounds were recorded from the whales within 200 m of the surface, but the animals start to click at a depth between 200 and 500 m during the descent part of foraging dives. The whales generally produce click sounds in two modes: regular clicking and buzzes (Figs 2A, 3). Regular clicking involves production of long click trains with regular ICIs of 200-500 ms, with most ICIs being around 400 ms (Fig. 3). The bimodal nature of the sound production is seen in Fig. 3, where the apparent output (AO) is plotted as a function of interclick interval (ICI). It is seen that buzz clicks are produced with AO some 15 dB lower than regular clicks (Figs 2A, 3), and that the interclick intervals are between 5 and 20 ms. Regular clicks, conversely, have high apparent outputs, and long ICIs between 200 and 500 ms, with clustering around 400 ms (Fig. 3).
The regular clicks are directional, ultrasonic transients with durations around 250 μs and energy from 20 kHz and up to, and probably beyond, the Nyquist frequency of the recording system at 48 kHz (Johnson et al., 2004). The whales produce 4000-5000 regular clicks per dive. Buzzes, terminating some of the regular click trains, consist of 2-5 s high-repetition click trains where the ICIs are reduced to 5-20 ms (Fig. 2A). When analyzing assumed on-axis buzz clicks from nearby conspecifics ensonifying the tagged animal (sensu Johnson et al., 2004), buzz clicks have the same apparent frequency content (with the reservation of limited sampling) as regular clicks, but their duration is around 150 μs, which is only around half of that of regular clicks (Johnson et al., 2004). The whales produce 23 buzzes on average per dive (Johnson et al., 2004), amounting to some 10,000 buzz clicks per dive. Thus, a total of some 15,000 clicks are produced during each foraging dive.
The acoustic behavior of echolocating Mesoplodons during foraging can be divided into three phases: the search, approach and terminal phases. The initial search phase part of the vocal behavior involves long (10-30 s) trains of regular clicks interrupted by short pauses of 1-3 s. During regular clicking with ICIs between 300 and 400 ms, the whale passes through clouds of echo sources of varying echo return relating to the target strength (TS) and the degree of ensonification (Fig. 4A). This phase is coined the search phase. When the whales eventually focus on an object by ensonifying it during several clicks, the approach phase is initiated (Figs 4C, 5). This phase is characterized by a continuous ensonification of the target as the whale homes in on it (Figs 2, 4C). Thus, the approach phase is defined as the part of a click train where the whales continuously ensonify a prey item and receive echoes all the way to the transition to the buzz/terminal phase. Echoes will often become weaker or disappear from the tag recordings just before a buzz (Figs 4C, 5B), and then reappear within the buzz. This phenomenon probably relates to the fact that the whales start to roll upside down just before or in the beginning of the buzz, and the body shades the tag for the echoes (Fig. 5A). There are no apparent differences between the ICI and AO between the search and the approach phases. The third and terminal phase is characterized by a rapid increase of the click rate, the so-called buzz, and a reduction in the apparent output (Figs 2A, 3). The whale intercepts the prey in the terminal phase often by a sharp turn and increased dynamic acceleration (Fig. 5A,B).
The auditory scene (Bregman, 1990) of the echolocating whales comprises passive and active parts. The passive part arises from sounds in the acoustic Umwelt of the whales (Bregman, 1990), whereas the active part is generated and, to some degree, controlled by the echolocating whale by ensonification of objects in the water column (sensu Moss and Surlykke, 2001). Fig. 4 provides an example of the active part of the auditory scene of an echolocating beaked whale. Fig. 4A shows a one-dimensional version of a three-dimensional auditory scene as received by the whale in a 250 s time span. The complexity of the auditory scene is demonstrated by the large number of echoes when the whale passes and ensonifies target aggregations in the water column (Fig. 4B). On top of echoes coming from marine organisms within a range of some 20 m, the whale also receives strong echoes from the bottom when directing its sonar beam towards it (Fig. 4A). Thus, the actively generated acoustic Umwelt of beaked whales is a perceptually complex, multi-target auditory scene of echoes with temporal, spatial and spectral differences.
When the whales swim through clouds of echoes, they do not make capture attempts (Fig. 4B) on all ensonified prey targets nor do they necessarily select the ones with the largest echo strength (Fig. 4C). Rather, they seem to select certain targets in the periphery of dense echo clouds (Figs 4A-C, 6A) in a part of the water column between 650 and 725 m (Fig. 6A), where they also spend the most time (Fig. 6B).
Apparent output and energy flux density of returning echoes
The received echo energy (RL) is given by the source level energy flux density (SL) corrected for the two-way transmission loss (TL) plus the target strength (TS). If spherical spreading is applied, the transmission loss is given by 40 log(R)+α, where R is target range and α is the frequency-dependent absorption, which can be ignored at 40 kHz for the short target ranges in this study. Consequently, for constant source levels and continuous ensonification of a target with constant TS, the received echo level, RL, should be dictated by changes in TL only, yielding 12 dB dh-1. Conversely, if either SL, TS (by changing aspect of the target) or the degree of target ensonification changes, RL changes will not be given by the reducing TL only.
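As a worked form of the relation above (a sketch that uses the symbols of this section, ignores absorption as the text does, and assumes constant SL and TS), the expected gain per distance halved follows directly from the two-way spreading term:

```latex
\begin{align}
RL &= SL - 40\log_{10}(R) + TS \\
\Delta RL &= RL(R/2) - RL(R) = 40\log_{10}(R) - 40\log_{10}(R/2) = 40\log_{10}(2) \approx 12.04\ \text{dB}
\end{align}
```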
Fig. 7A shows how the RL of the closing target of Fig. 2 changes as a function of reducing target range. It is seen that RL increases steadily from 14.3 to 5.5 m where the maximum received level is recorded. After 5 m, the RL drops rapidly and the echoes cannot be detected at ranges closer than some 4 m. Another example is given in Fig. 8A where the RLs also increase with reducing target range, reaching a maximum at 2 m. We define the termination of the approach phase by the click at which the RL has reached its maximum. By plotting the RLs as a function of log10 of the target range (in meters) of the approach phase, the slope of the regression line will provide the range-dependent increase, if any, in RL on a dB scale. This has been done for the two examples of Figs 7A and 8A in Fig. 9A,B. It is seen that there is a large and significant linear relationship between the target range and the RLs in the approach phase of –26.7 and –40.6 log10(R), respectively. All approach phases in the analyzed material show a similar large and significant increase in echo energy (EE) with reducing target range. The mean change of –36 log(R) yields 10.4 dB dh-1 (±2 dB), which in turn suggests that the RL changes can be explained, to a large degree, by TL changes only, and that SL and TS seem to be rather stable.
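The regression step can be sketched as below. The ranges and echo levels are synthetic values generated from an ideal 40 log(R) spreading law with an arbitrary 100 dB offset; they are not data from the tags, and the function names are hypothetical. A slope expressed per decade of range converts to dB per distance halved by multiplying by log10(2).

```python
import math

import numpy as np


def fit_echo_slope(range_m, rl_db):
    """Least-squares slope and intercept of RL (dB) against log10(range)."""
    slope, intercept = np.polyfit(np.log10(range_m), rl_db, 1)
    return float(slope), float(intercept)


def db_per_distance_halved(slope):
    """Increase in RL (dB) when range is halved, from a slope in dB per decade."""
    return -slope * math.log10(2.0)


# Synthetic approach: ideal two-way spherical spreading, arbitrary 100 dB offset.
ranges = np.array([14.0, 12.0, 10.0, 8.0, 7.0, 6.0, 5.5])
rl = 100.0 - 40.0 * np.log10(ranges)
slope, intercept = fit_echo_slope(ranges, rl)

# The two example slopes quoted in the text, converted to dB per halving.
example_fig_9a = db_per_distance_halved(-26.7)   # roughly 8.0 dB
example_fig_9b = db_per_distance_halved(-40.6)   # roughly 12.2 dB
```

For the ideal synthetic data the fitted slope is -40 dB per decade, which is the 12 dB per distance halved expected from transmission loss alone.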
If the RL increases (with reducing range) only result from TL changes, then the SL should be more or less constant throughout the approach phases. We have argued in the Materials and methods section that the apparent output (AO) recorded on the tag can be used as a proxy for changes in SL. Figs 7B and 8B show the AO for the approach phases of Figs 7A and 8A. Fig. 7B shows stable AOs within a 5 dB window until the maximum RL is reached, whereupon the AO drops rapidly in the transition to the buzz phase (Fig. 7C) with a concomitant decrease in RL. By plotting the AO in the approach phase as a function of log10 of range, the slope of the correlation between AO and range can be evaluated on a dB scale. As seen in Fig. 9A, there is a significant positive relationship between target range and AO, so that AO decreases significantly with reducing range in a 9.9 log(R) manner (-2.7 dB dh-1). In contrast, the regression of the AO versus range of the approach in Fig. 8A does not show a significant decrease with range. Seven of the 11 approaches do not have a significant drop in AO with diminishing target range, and three approaches have a significant negative relationship.
Interclick intervals and two-way transit time
When the whale is approaching a prey target, the echoes return after shorter and shorter delays (Δt; Fig. 2B,C) as the TWT drops. For the prey approach depicted in Fig. 7A, the target range is reduced from 14.5 to 5.3 m during the approach phase as Δt or TWT goes from 20 ms to 7 ms. The ICI during the approach phase of around 400 ms is, however, longer than the TWT by more than an order of magnitude. Initially during the approach in Fig. 7C, the ICI increases, but then it drops slowly and steadily during the approach phase. The approach of Fig. 8C shows a different and more typical ICI development, where the ICI initially also increases slightly, but then stays more or less constant during the approach phase. When looking at all the sequences, three have minor drops as depicted in Fig. 7C, whereas the rest only have small or statistically insignificant changes in their ICIs during the approach phase. If the data are pooled (à la Au, 1993, fig. 7.2), and plotted as a function of range (Fig. 10), it is seen that the regression line has a small, but significant slope of 7.1 ms m-1. The correlation is poor and there is considerable scatter as expressed by an R2 of 0.12. It is, however, safe to conclude that the ICIs during the approach phase are much longer than what would be predicted from a short (19-45 ms) processing time plus the TWT. This picture of long, fairly stable ICIs during the approach phase is also supported by the large number of assumed approach phases preceding buzzes from both tag deployments. There are no evident differences between the ICIs of search and approach phases.
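The comparison between observed ICIs and the dolphin-style prediction (TWT plus a short lag) reduces to simple arithmetic, sketched here. The sound speed, the 19-45 ms lag range and the roughly 400 ms observed ICI come from the text; the function names are hypothetical.

```python
SOUND_SPEED = 1485.0     # m/s, from the Materials and methods
OBSERVED_ICI_S = 0.400   # approximate approach-phase ICI reported in the text


def two_way_travel_time(range_m, c=SOUND_SPEED):
    """Two-way travel time (s) to a target at range_m metres."""
    return 2.0 * range_m / c


def predicted_ici(range_m, lag_s, c=SOUND_SPEED):
    """ICI (s) expected if clicks follow the echo after a fixed lag."""
    return two_way_travel_time(range_m, c) + lag_s


for target_range in (14.5, 5.3):
    twt = two_way_travel_time(target_range)
    shortest = predicted_ici(target_range, 0.019)
    longest = predicted_ici(target_range, 0.045)
    print(f"{target_range} m: TWT {twt * 1e3:.1f} ms, "
          f"predicted ICI {shortest * 1e3:.1f}-{longest * 1e3:.1f} ms, "
          f"observed/TWT ratio {OBSERVED_ICI_S / twt:.0f}")
```

Even at the start of the approach, the longest ICI predicted by the lag model stays below 70 ms, far short of the roughly 400 ms observed.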
The present study is based on a limited data set collected mainly from two dives of a single animal. However, these data provide a novel perspective on how an echolocating animal receives and responds to incoming echoes from prey items in a biosonar based foraging system. The results are, to our knowledge, the first quantitative measures of prey echoes collected from any echolocating animal in the wild, and some of the first evidence on how a toothed whale acquires its food with sound in a natural setting. While the data to some extent only represent the biosonar of a single whale, we argue that the findings apply more generally to this species on the basis that this whale seemingly collected food in a successful manner using a biosonar with the derived properties, and that the second tagged whale showed similar movements and acoustic behavior. It should also be noted that the analyzed data are based on those of the approach phases (11 out of 48) where echoes could be traced to within 5 s of the transition to a buzz. Hence the data may only reflect how certain prey items are localized and approached, and that some capture events without detectable echoes in the preceding approach phases may involve capture of small prey using a different biosonar tactic. The acoustic behavior in terms of AO and click intervals during the approaches with no detectable echoes is, however, similar to those approaches from which echoes have been extracted, suggesting that the biosonar operates in the same general fashion during all approaches in both whales irrespective of prey type. Thus, the whales seem to employ stereotyped acoustic behavior irrespective of prey type in this habitat.
During prey localization and capture, the whales use the search, approach and terminal/buzz phases as seen in insectivorous Vespertilionid bats hunting for aerial prey. Bats evolved to echolocate for prey in Eocene more than 50 million years ago (Novacek, 1985) and odontocete cetaceans evolved the same capabilities independently some 30 million years ago (Thewissen, 1998; Fordyce, 2002). It is striking to note how two very different groups of mammals in functional convergence have evolved the same basic acoustic behavior and movements (Fig. 5A) during echolocation and capture of prey in aquatic and aerial habitats. It appears that a pneumatically generated, high repetition, low amplitude buzz, providing rapid temporal updates, is advantageous in the terminal phase of biosonar-based prey capture, irrespective of whether the echolocator is a 3 g bat in a tropical rainforest or a 600 kg whale at 700 m depth in oceanic blue water. There are, however, temporal differences between the two groups in that an echolocation event typically lasts less than 1 s in bats (Kalko, 1995) and around 10 s for the Mesoplodon, which probably relates to the ratio between speed of motion of the predator and target detection ranges, being low in bats and high for the whales.
The beaked whale sonar also differs significantly from the biosonar performance reported for bats and dolphins in other ways. As touched upon in the Introduction, bats employ AGC in their receiving and apparently also in their transmitting systems to counteract the gain of 12 dB dh⁻¹ during target approaches for echo level stabilization. AGC in the transmitting system has also been reported for dolphins approaching deployed recording gear in the wild, and it has been proposed to be an adaptation to stabilize received echoes from fish schools with volume reverberative properties (Au and Benoit-Bird, 2003). The present study has the advantage of being able to quantify echo returns during approaches of the targets, and it turns out that the mean increase in received echo energy is 10.4 dB dh⁻¹ for the analyzed approaches. Considering that the target may change aspect and thereby apparent target strength during the approach, and that the prey may not be right on the acoustic axis of the directional clicks of the whale, 10.4 dB dh⁻¹ is surprisingly close to the 12 dB dh⁻¹ expected if the increase in echo return were due to changes in transmission loss only. Secondly, the 10.4 dB dh⁻¹ increase is far from the 6 or 0 dB dh⁻¹ that would be expected if an AGC similar to the one reported for some bats (Hartley, 1991; Kick and Simmons, 1984) and dolphins (Au and Benoit-Bird, 2003) were implemented. Thus, there is no support in the data for the contention that beaked whales have a 12 or 6 dB AGC in the transmitting part of their biosonar as reported for bats and dolphins.
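The expected values quoted above can be reproduced with a short calculation. The following is a minimal sketch, assuming that two-way transmission loss follows spherical spreading only (40 log10 of range) and ignoring absorption and changes in target strength; the function name and the AGC parameter are illustrative and not taken from the study.

```python
import math


def echo_gain_per_halving(agc_db_per_halving: float = 0.0) -> float:
    """Expected change in received echo level (dB) when target range is halved.

    Assumption: two-way spherical spreading only, i.e. the two-way
    transmission loss is 40*log10(R). `agc_db_per_halving` is the
    hypothetical reduction in source level per distance halved.
    """
    spreading_gain = 40.0 * math.log10(2.0)  # about 12 dB per distance halved
    return spreading_gain - agc_db_per_halving


# Compare the three cases discussed in the text.
for label, agc in [("no transmit AGC", 0.0),
                   ("6 dB transmit AGC", 6.0),
                   ("12 dB transmit AGC", 12.0)]:
    gain = echo_gain_per_halving(agc)
    print(f"{label}: {gain:.1f} dB per distance halved")
```

Under these assumptions, the measured 10.4 dB dh⁻¹ lies closest to the case without transmit AGC.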
Evidence for the lack of a transmit AGC is supported further by the observation that the AO is kept high during the approach phase. The whales seem to increase the received echo level rather than compensating for it by a major reduction in source level. It remains unknown if the sensation level is increased by a similar magnitude or if the increase is partly compensated for by an AGC on the receiving side as is the case for bats (Henson, 1965; Suga and Jen, 1975). The role of the cetacean middle ear in hearing is debated (Hemila et al., 1999; Ketten, 1997; Ridgway et al., 2001). The large mass of odontocete middle ear bones does not suggest rapid and strong middle ear reflexes (Au and Benoit-Bird, 2003), as seen in bats with AGC on the receiving side (Suga and Jen, 1975). However, a recent novel experiment by Supin et al. (2004) shows that the acoustic brainstem response signals of a false killer whale vary little with transmission loss, which may indicate time-varying gain control provided that the target ensonification levels were the same irrespective of range.
The highest received levels of echoes of 90 dB re. 1 μPa² s (135 dB re. 1 μPa, pp) are unlikely to present any harm to the auditory system. Rather it would seem that high echo-to-noise ratios (ENR) increase the information that can be derived from the prey echoes. By maximizing the echo return from the prey, the whales get high ENRs for signal processing and prey classification. The fact that the whale ensonifies a large number of targets while engaging in only a few pursuits (Figs 4A,B and 6) suggests that the whale was selecting certain types of prey. In order to do so, it seems advantageous to gather as much echo information about the targets as early in the approach as possible to maximize the time for classification rather than just detection. Selective foraging seems to be employed by some bats (Black, 1972; Houston et al., 2004), but not all species (Barcley and Brigham, 1994). For Mesoplodon, selective foraging does seem plausible in a heterogeneous prey community where long, deep dives render capture of prey with the highest energy returns per dive effort beneficial. Dolphins have acute discrimination capabilities (Roitblat et al., 1995) that will deteriorate with decreasing SNR. Thus, the lack of AGC in the transmit system of beaked whales may serve to maximize ENR for target classification in a selective foraging scheme to maximize energy return per unit dive effort. Future studies should test if selective foraging relates to niche segregation in habitats with competitive resource partitioning among deep diving odontocete species.
While the animals seem to maximize echo return during the approach, there is little support for the occurrence of acoustic prey debilitation (Norris and Møhl, 1983). The identity of the sonar targets assumed to be prey is unknown, but stomach contents of Mesoplodon densirostris suggest the prey are likely to be squid or deep water fish (Mead, 1989). If the whales were to expose the prey to sound pressure levels of more than 230 dB re. 1 μPa (0-p) required to debilitate fish (Zagaeski, 1987), they should continue to emit the high-powered clicks of the approach phase right up to the prey. Thus, considering that the sound pressure levels are reduced significantly at 2-5 m target range when a buzz is initiated, it seems that the sounds are used to locate the prey, but not to facilitate capture by acoustic debilitation. It remains, however, to be seen if squid and fish may be affected by high repetition, low level click trains in the buzzes.
A shared property of the biosonars of bats and dolphins is that the animals do not emit a sound pulse before they have received the echo from the previous sound pulse (Cahlander et al., 1964; Au, 1993). ICIs are, therefore, generally given by the TWT plus a short lag time. The biosonar of the echolocating whales in the present study performs likewise in that the whales do not emit a click before reception of the echo(s) from the previous click. The small, but significant slope of the regression line fitted to the pooled ICI against range (Fig. 10) suggests, at least in some approaches, a drop in ICI with reducing range/TWT. However, the ICI between 300 and 400 ms during the approach phase of Mesoplodon is an order of magnitude longer than the ICIs of 20-50 ms reported for dolphins echolocating at stationary targets at similar ranges (Au, 1993). Using the ICI and a lag time of some 30 ms, as has been found in dolphins (Au, 1993), much larger target ranges would be predicted: (400 ms - 30 ms) × 1.485 m ms⁻¹ × 0.5 = 275 m. The lack of an intimate relationship between the ICI and TWT during the approach phase is also very different from bats, where such a correlation defines the onset of the approach phase acoustically (Simmons, 1989).
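The range prediction above, and its inverse, can be written out as a small calculation. This is a minimal sketch using the sound speed (1.485 m ms⁻¹) and the 30 ms lag quoted in the text; the function names are illustrative only.

```python
SOUND_SPEED_M_PER_MS = 1.485  # sound speed in sea water used in the text


def range_from_ici(ici_ms: float, lag_ms: float = 30.0,
                   c: float = SOUND_SPEED_M_PER_MS) -> float:
    """Target range (m) predicted if ICI = two-way travel time + fixed lag."""
    return (ici_ms - lag_ms) * c * 0.5


def two_way_travel_time_ms(range_m: float,
                           c: float = SOUND_SPEED_M_PER_MS) -> float:
    """Two-way travel time (ms) to a target at the given range (m)."""
    return 2.0 * range_m / c


# Range implied by a 400 ms ICI with a 30 ms lag (about 275 m).
print(round(range_from_ici(400.0), 1))

# Two-way travel time to a target at 5 m, for comparison with a 400 ms ICI.
print(round(two_way_travel_time_ms(5.0), 1))
```

For a target at 5 m, the two-way travel time is below 7 ms, which illustrates how much longer the observed ICIs are than the travel times at the end of the approach.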
While ICIs of 300-400 ms may indicate the maximum relevant target ranges during the initial search for prey aggregations during the descent part of the dive, they do not reflect the likely much shorter, actual target range while foraging at depth. Recent studies have modeled the prey detection ranges of large delphinids to be 50-300 m (Au et al., 2004; Madsen et al., 2004) and, if Mesoplodons can generate the same source levels as these delphinids of some 220 dB re. 1 μPa (pp), it is not inconceivable that the estimated 275 m search range reflects detection ranges for prey aggregations during the descent part of the dive. But when the prey echo during approaches has been received and, probably, processed within the first 50 ms after emission of the clicks (Fig. 3A,B), it is puzzling why the whales would wait another 300 ms before emission of the next click.
One possible answer to that question may relate to how the whales perceptually organize and analyze the auditory scene partly generated by their own clicks. Temporal control of vocal behavior affects the perceptually guided segregation of many targets in a complex, dynamic acoustic scene (Bregman, 1990), and bats have been inferred to implement auditory streaming by using stable ICIs in the search phase (Moss and Surlykke, 2001). The auditory scene displayed in Fig. 4 shows that deep-diving Mesoplodons generate a complex, 3-D-multi-target input flow to the auditory system that also seems to call for a similar perceptual organization. We propose that the stable ICIs of foraging Mesoplodons may be another example of the acoustic streaming inferred for bats for perceptual processing of a dynamic, actively generated auditory scene comprising back-scattering surroundings and prey targets (Moss and Surlykke, 2001).
Perceptual organization and processing of the auditory scene may help the whales to identify patches of preferred prey, and to keep track of such patches in time and space. As exemplified in Fig. 6, the whale does not engage in foraging where the echo density is the highest. Rather, it seems that the foraging occurs in a simpler acoustic scene (Figs 4 and 6). By keeping the ICIs long and stable both in the search and in the approach phases, the animals may be able to keep more distant echo sources such as prey patches and the bottom perceptually organized, so that this spatial and temporal information can be exploited either after a successful capture or if the approach is aborted.
A second, and not mutually exclusive, explanation for the much longer ICI than would be predicted from the TWT plus a short lag time, may relate to the biomechanics of the sound generator. Toothed whales generate sound by forcing pressurized air past monkey-lips-dorsal-bursae (MLDB) complexes in their foreheads (Ridgway et al., 1980; Cranford et al., 1996). Emission of clicks is preceded by an air-pressure build up in the bony nares (Ridgway et al., 1980) that eventually overcomes the variable tension of the closed MLDB-complexes by which a click is generated either when the monkey (phonic) lips separate (Dubrowskiy and Giro, 2004) or when they slap back together (Cranford and Amundin, 2004). Given that the system operates as a pneumatic capacitor (Cranford and Amundin, 2004; Au and Benoit-Bird, 2003), it may be envisioned that the sound generator has an upper limit to how fast it can produce clicks and still maintain high outputs in the regular click mode (Fig. 3), because it takes time to build up tension in the phonic lips and actuate them by the pressurized air. So while sound production in toothed whales is not linked intimately with the respiratory cycles as is the case in vocalizing bats (Suthers et al., 1972), the biomechanics of pneumatic sound production may pose other constraints on how signal parameters are interlinked in different modes.
Hence, if high ENRs are more important than frequent updates (short ICIs) during the approach phase, the long ICIs may be explained by a need to keep up the acoustic output to maximize echo return for signal processing in selective predation. If the sound production apparatus operates as a pneumatic capacitor, the long ICI may therefore reflect the time constant of high outputs rather than target range per se. If AO can be used as a proxy for the acoustic output, the distinct lower border of AO at ICIs of 200 ms in the regular click mode (Fig. 3) suggests that the biomechanical constraints of a pneumatic sound generator dictate the ICIs during approaches. It may be envisioned that when the prey is within a body length, the whale needs frequent updates to keep track of the prey for capture rather than high received echo levels, so it switches to the buzz mode with short ICIs and low outputs (Fig. 3) by which maximized echo returns are traded for rapid updates on the position and movements of the prey target.
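The two click modes described above differ so much in ICI that they can be separated with a simple threshold. The following is a minimal sketch, assuming a hypothetical cut-off of 0.1 s between the regular mode (ICIs of several hundred ms) and the buzz mode (ICIs around 10 ms); the threshold and the function name are illustrative and are not the criteria used in the study.

```python
def label_click_modes(click_times_s, buzz_ici_threshold_s=0.1):
    """Label each interclick interval as 'regular' or 'buzz'.

    `click_times_s` is a sorted list of click emission times in seconds.
    `buzz_ici_threshold_s` is a hypothetical cut-off, not a value from the study.
    Returns one label per interval, i.e. len(click_times_s) - 1 labels.
    """
    labels = []
    for earlier, later in zip(click_times_s, click_times_s[1:]):
        ici = later - earlier
        labels.append("buzz" if ici < buzz_ici_threshold_s else "regular")
    return labels


# Synthetic click train: three regular intervals of 0.4 s, then a short buzz.
times = [0.0, 0.4, 0.8, 1.2, 1.21, 1.22, 1.23]
print(label_click_modes(times))
```

Any cut-off between the two ICI ranges would give the same labels for this synthetic train; real recordings would also need AO to be taken into account.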
We have demonstrated that Blainville's beaked whales differ from what is known or has been conjectured about the biosonar performance of bats and dolphins by the lack of transmission AGC, and by much longer ICIs than would be predicted from a short, fixed processing time plus TWT. The question is whether this pattern is unique to this species, or if it applies more broadly to other odontocete echolocators. We do not have available echo data from other deep diving odontocetes, but their acoustic behavior may represent a clue to evaluate possible similarities. Both sperm whales (Physeter macrocephalus, Madsen et al., 2002b) and Cuvier's beaked whales (Ziphius cavirostris, Johnson et al., 2004) terminate long click trains by rapid buzzes similar to that of the Mesoplodon (Johnson et al., 2004; present study). Fig. 11 depicts representative ICI developments during transitions from regular clicking to buzzes in a sperm whale and a Cuvier's beaked whale tagged with Dtags. The ICIs of regular clicks vary between 400 and 500 ms for the Cuvier's beaked whale and between 550 and 650 ms for the sperm whale, and long ICIs are maintained until the buzz is initiated. The AO for both species is also kept high until the buzz, after which it drops suddenly as seen in the Mesoplodon. If these buzzes serve the same function as in the Mesoplodon and in bats (Miller et al., 2004), there is circumstantial evidence to suggest that the lack of AGC and lack of TWT adjustment in ICIs are also found in other deep diving odontocetes, and that the sensory and biomechanical implications presented here may apply on a broader scale. These first data from toothed whales echolocating in a context and habitat for which their biosonars have evolved show that they in some ways behave differently from smaller odontocete species trained to solve specific tasks in captive settings.
Future studies should attempt to elucidate how the biosonars of smaller odontocetes operate during echolocation for prey in natural habitats.
In conclusion, Blainville's beaked whales generate some 15,000 clicks per dive for orientation and echolocation of prey. The search and approach phases are characterized by a regular click mode with high, fairly stable outputs and ICIs around 400 ms, and the buzz phases are characterized by low outputs and high repetition rates. When a prey target during the approach is within approximately a body length, the whales trade high echo returns for rapid updates by switching to the buzz mode with low acoustic outputs and ICIs around 10 ms. Contrary to some reports from bats and dolphins, beaked whales do not employ AGC of their transmitter, and ICIs are not given by TWT plus a short, fixed lag time in the approach phase of prey pursuits. It is suggested that stable ICIs in the approach phase facilitate acoustic scene analysis in a multi-target environment, and that a low repetition rate allows the whales to maintain high sound-pressure outputs for prey detection and classification with a pneumatically driven sound generator. Similarities in acoustic behavior suggest that these biosonar characteristics during prey capture may also apply to other large, deep-diving toothed whales.
- automatic gain control
- apparent output
- constant frequency
- distance halved
- echo energy
- echo-to-noise ratio
- frequency modulated
- interclick intervals
- interpulse intervals
- instantaneous pressure
- relative to
- target range
- received echo energy
- source levels
- signal-to-noise ratios
- echo signal duration
- two-way transmission loss
- target strength
- two-way-travel time
A. Bocconcelli, A. Brito, F. Díaz, I. Domínguez, M. Guerra, F. Gutierrez, A. Hernandez and C. Militello are thanked for assistance in the field, and K. Barton, T. Hurst, J. Partan and A. Shorter provided engineering support. B. Møhl, B.K. Nielsen, A. Surlykke, M. Wahlberg and two referees provided analytical suggestions and constructive criticism on earlier versions of the manuscript. Funding for tag development was provided by the Cecil H. and Ida M. Green Award and the US Office of Naval Research. Funding for field work was provided by the Strategic Environmental Research and Development Program (SERDP) under program CS-1188 and the Packard Foundation. Fieldwork was supported by University of La Laguna and Governments of El Hierro and the Canary Islands. Research was conducted under US NMFS permits # 981-1578-02 and 981-1707-00 and a permit from the government of the Canary Islands. This publication is contribution number 11268 from the Woods Hole Oceanographic Institution.
- © The Company of Biologists Limited 2005