We studied whether honeybees can distinguish face-like configurations by using standardized stimuli commonly employed in primate and human visual research. Furthermore, we studied whether, irrespective of their capacity to distinguish between face-like stimuli, bees learn to classify visual stimuli built up of the same elements in face-like versus non-face-like categories. We showed that bees succeeded in discriminating both face-like and non-face-like stimuli and appropriately categorized novel stimuli into these two classes. To this end, they used configural information and not just isolated features or low-level cues. Bees looked for a specific configuration in which each feature had to be located in an appropriate spatial relationship with respect to the others, thus showing sensitivity to first-order relationships between features. Although faces are biologically irrelevant stimuli for bees, the fact that they were able to integrate visual features into complex representations suggests that face-like stimulus categorization can occur even in the absence of brain regions specialized in face processing.
Primates are very good at processing face-like stimuli (Rosenfeld and Van Hoesen, 1979; Parr et al., 2000). In particular, humans have remarkable capabilities for learning unfamiliar faces and recognizing familiar faces (Collishaw and Hole, 2000). This ability has been related to the possession of specialized brain areas both in primates (Tsao et al., 2006) and in humans (Kanwisher, 2000). The capacity for recognizing familiar faces has largely been attributed to configural processing (Tanaka and Sengco, 1997; Collishaw and Hole, 2000; Maurer et al., 2002), which allows treating a complex visual stimulus by taking into account not only its individual components but also the relationships among them (Palmeri and Gauthier, 2004; Peterson and Rhodes, 2003). It has often been assumed that this ability requires time to develop because children confronted with face-recognition tasks move towards configural processing with increasing age and visual experience (Carey and Diamond, 1977; Carey and Diamond, 1994). However, experiments on how humans learn non-face objects using configural processing (Gauthier and Tarr, 1997; Gauthier et al., 2000; Busey and Vanderkolk, 2004) suggest that this ability might be learnt reasonably quickly if the appropriate visual experience is made available.
Putting these results into perspective is difficult given the various meanings that the term ‘configural processing’ can adopt. Indeed, although commonly used in visual cognition studies, the term configural processing remains ambiguous as it can refer to different levels of compound stimulus processing. Configural learning and processing sensu Pearce (Pearce, 1987; Pearce, 1994), for instance, implies that a compound ‘AB’ is treated as an entity different from the sum of its elements; that is, the stimulus complex AB is not viewed as ‘A+B’ but, instead, can be thought of as a distinct entity that is related to A and B only through physical similarity. In visual cognition, the term configural processing rarely refers to Pearce's theories and is used to refer to processing forms that involve perceiving relations among the features of a compound stimulus (Maurer et al., 2002). It is opposed to ‘featural processing’ (or ‘analytical processing’), in which only the features, but not the relationships among them, are taken into account. In the light of such ambiguity, Maurer et al. (Maurer et al., 2002) proposed that studies on visual cognition, particularly face-recognition studies, should distinguish three levels of configural processing: (i) sensitivity to first-order relations, in which basic relationships between features are taken into account (e.g. detecting a face because its features conform to a standard arrangement in which two eyes are located above a nose, which is in turn located above a mouth, etc.); (ii) holistic processing, in which features are bound together into a gestalt; and (iii) sensitivity to second-order relationships, in which distances between features are perceived and used for discrimination (for a review, see Maurer et al., 2002).
Given the lack of consensus about terminology, and the fact that ‘configural processing’ is used indiscriminately to denote one or all three of the types of processing mentioned above, we adopt here the three-level definition of Maurer and colleagues as the main framework for our study.
Besides humans and primates, insects constitute an interesting model to understand how brains learn to process complex images (Peng et al., 2007; Benard et al., 2006). Among insects, honeybees are particularly appealing because they learn and memorize a variety of complex visual cues to identify their food sources, namely flowers. The study of their visual capacities is amenable to the laboratory as it is possible to train and test individual free-flying bees on specific visual targets, on which the experimenter offers a drop of sucrose solution as the equivalent of a nectar reward (reviewed by Giurfa, 2007).
Using this protocol, it has recently been shown that bees are capable of previously unsuspected higher-order forms of visual learning that have been mainly studied in vertebrates with larger brains. Indeed, bees categorize both artificial patterns (for a review, see Benard et al., 2006) and pictures of natural scenes (Zhang et al., 2004). They also learn abstract relationships (e.g. sameness) between visual objects in their environment (Giurfa et al., 2001) and exhibit top-down modulation of their visual perception (Zhang and Srinivasan, 1994). Many of these experiments have shown that the way in which individual bees are conditioned is crucial to uncover fine discrimination performances (Zhang and Srinivasan, 1994; Giurfa et al., 1999; Stach and Giurfa, 2005). Bees trained in differential conditioning protocols, which imply learning to differentiate rewarded from non-rewarded targets, exhibit sophisticated discrimination abilities, some of which were unsuspected in an invertebrate (Giurfa, 2007).
The possibility that small brains can learn to recognize human-face-like stimuli has considerable impact on several domains, from fundamental ones related to the neural architecture required to achieve this task, to applications based on how computer vision could benefit from using similar and potentially highly efficient mechanisms (Rind, 2004). Although the visual machinery of bees has definitely not evolved to detect and recognize human faces, but rather flowers (Chittka and Menzel, 1992) and other biologically relevant objects, it might have the necessary capacities to extract and combine human face features in unique configurations defining different persons. This ability might simply reflect the use by the bees of similar strategies to recognize and discriminate food sources such as flowers in their natural complex environment. In other words, testing whether bees learn to recognize and classify face-like stimuli should be contemplated as a test of configural processing in the visual domain, allowing an understanding of which of the three levels of processing (see above) is used to process a complex visual stimulus through a relatively simple visual machinery. We do not intend to raise the inappropriate question of whether human faces are biologically important for bees, which is certainly not the case. Nevertheless, if classification and processing of face-like stimuli are achieved by a brain lacking specific areas devoted to the recognition of human faces such as those existing in humans and primates (Tsao et al., 2006; Kanwisher, 2000), one might conclude that basic mechanisms already available in more ‘primitive’ nervous systems allow the attainment of comparable goals in the absence of such brain specializations.
A recent study trained free-flying honeybees to discriminate pictures of human faces used in standard psychophysics tests (Dyer et al., 2005) and found that bees could indeed distinguish the pictures presented. This report was questioned (Pascalis, 2006) (but see Dyer, 2006) as it could not control for the actual cues extracted from the pictures and used by the bees for recognition. Indeed, instead of responding to specific face configurations, bees could have used low-level cues to perform their choices. However, there is evidence that wasps recognize conspecific faces (Tibbetts, 2002), and honeybees learn multiple representations of human face stimuli and interpolate this visual information to recognize novel face viewpoints (Dyer and Vuong, 2008), leading to the question of what mechanisms might allow minibrains to perform apparently complex spatial recognition tasks such as face recognition.
Here, we have asked whether honeybees can learn to classify visual stimuli that are constituted by the same visual features in face-like versus non-face-like categories. We used standardized stimuli commonly employed in primate and human visual research, and we analyzed the processing mechanisms, sensu Maurer et al. (Maurer et al., 2002), used by the bees to solve the visual discriminations proposed in our experiments.
MATERIALS AND METHODS
Experimental set-up and procedure
In Experiments 1 to 3, free-flying honeybees, Apis mellifera Linnaeus, were individually trained to collect sucrose solution on visual targets presented on the back walls of a Y-maze (Giurfa et al., 1996). Only one honeybee marked individually with a color spot on the thorax was present at a time.
The maze was covered by an ultraviolet-transparent Plexiglas ceiling to ensure the presence of natural daylight. The entrance of the maze led to a decision chamber, where the honeybee could choose between the two arms of the maze. Each arm was 40×20×20 cm (length×height×width). Visual targets (20×20 cm) were black-and-white parameterized line drawings presented vertically on the back walls of both arms and were placed at a distance of 15 cm from the decision chamber. They thus subtended a maximum visual angle of 67 deg. at the center of the decision chamber. One of the two stimuli was rewarded with 50% (weight/weight) sucrose solution, whereas the other was non-rewarded. Sucrose solution was delivered by means of a transparent micropipette 6 mm in diameter located in the center of the stimulus. The micropipette was undetectable to the bees from the decision chamber and did not provide a sucrose-predicting cue as the non-rewarded stimulus presented a similar but empty micropipette in its center.
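The 67 deg. figure follows from the target size and viewing distance given above. The short check below is an illustrative sketch, not part of the original methods; the function name `visual_angle_deg` is our own, and it assumes the target is centred on the line of sight from the decision chamber.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full angle (in degrees) subtended by an object of extent size_cm
    viewed head-on from distance_cm, with the object centred on the line
    of sight."""
    return math.degrees(2.0 * math.atan((size_cm / 2.0) / distance_cm))


# 20 cm target seen from 15 cm, as stated in the text.
target_angle = visual_angle_deg(20.0, 15.0)
print(f"{target_angle:.1f} deg")
```

With the stated dimensions the result is just above 67 deg., in agreement with the value reported in the text.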
During training, the side of the rewarded stimulus (left or right) was interchanged following a pseudorandom sequence in order to avoid positional (side) learning. If the bee chose the rewarded stimulus, it could drink sucrose solution ad libitum. When it chose the non-rewarded stimulus, it was gently tossed away from the maze such that it had to re-enter it to get the sucrose solution. In such cases, only the first incorrect choice was recorded. After training, transfer tests with different non-rewarded stimuli were performed. Such stimuli were novel to the bees as they were never used during the training. Contacts with the surface of the patterns were counted for 1 min. The choice proportion for each of the two stimuli was calculated. Each test was done twice, interchanging the sides of the patterns to control for side preferences. Refreshing trials, in which the training patterns were presented again and the animal was rewarded on the appropriate ones, were intermingled among the tests to ensure motivation for the subsequent test.
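As an illustration of how a choice proportion can be derived from the two side-swapped runs of a test, the sketch below sums the contacts over both runs before dividing. The text does not say how the two runs were combined, so this pooling rule is an assumption, and the contact counts are invented for the example.

```python
def choice_proportion(contacts_correct: int, contacts_incorrect: int) -> float:
    """Proportion of contacts made with the stimulus counted as correct."""
    total = contacts_correct + contacts_incorrect
    if total == 0:
        raise ValueError("no contacts recorded")
    return contacts_correct / total


# Hypothetical contact counts for one bee (not data from the study).
run_1 = {"correct": 14, "incorrect": 6}  # correct stimulus in the left arm
run_2 = {"correct": 12, "incorrect": 8}  # sides interchanged

# Assumption: contacts from both runs are summed, so that a side
# preference contributes equally to both stimuli.
pooled = choice_proportion(
    run_1["correct"] + run_2["correct"],
    run_1["incorrect"] + run_2["incorrect"],
)
print(f"{100 * pooled:.1f}% correct choices")
```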
In Experiment 4, free-flying bees were trained and tested with visual targets presented on a rotating grey screen, which was 50 cm in diameter (Dyer et al., 2005). The screen was located outdoors and was therefore illuminated by natural daylight. Four visual targets were presented at different, interchangeable positions on the screen. Visual targets were 6×8 cm achromatic photographs presented vertically. At the base of each target, a small platform allowed the bee to land. Two correct landing positions were rewarded with a drop of sucrose solution 30% (weight/weight) placed on the platform, whereas the two alternative positions presented a drop of 0.012% quinine solution. Thus, the presence of a liquid drop could not be used by the bees to discriminate correct from incorrect targets. A choice was recorded whenever the bee touched a landing platform. When the bee landed on a correct target, it could drink the sucrose solution (for details, see Dyer et al., 2005). When, by contrast, it landed on the incorrect target, it experienced the quinine solution. Between foraging bouts, landing platforms and stimuli were cleaned with 30% ethanol. After training, the bee experienced a non-rewarded test in which fresh stimuli were presented. Landings on the non-rewarded stimuli were counted until the bee flew more than one meter away from the screen. A minimum of 20 landings were counted for each test, and the test ended when the bee made 30 choices or when 5 min had elapsed.
Refreshing trials, in which the training patterns were presented again and the animal was rewarded on the appropriate ones, were intermingled between the tests to ensure motivation for the subsequent test.
In a first experiment, we trained bees with face-like stimuli (‘F1’ to ‘F6’) and/or non-face-like stimuli (‘NF1’ to ‘NF6’) (Fig. 1) presented in a Y-maze. Face-like stimuli consisted of parameterized line drawings presenting the main features constitutive of a face (eyes, nose and mouth). Such features could be varied systematically in order to create different face-like alternatives. Non-face-like stimuli NF1 to NF6 presented the same features in a scrambled way so that they exhibited no common configuration. Stimuli were printed on white paper with a high-resolution laser printer. Similar stimuli are commonly used in primate and human visual research (e.g. Sigala and Logothetis, 2002) as they allow independent variation of dimensions such as mouth or nose length and interocular distance. Each element (bar or disc) subtended a minimum visual angle of 8 deg., whereas the global stimuli subtended a visual angle of between 25 deg. and 48 deg., depending on the stimulus. The stimuli were therefore perfectly resolvable to the eyes of the bees. We first verified that bees were able to distinguish stimuli belonging to the same category, face or non-face (i.e. within-class discrimination) after 48-trial training (e.g. F4 vs F6 in the face-like class, and NF3 vs NF5 in the non-face class). Each discrimination experiment was balanced as it involved two groups of bees: in one of them, one stimulus was rewarded and the other stimulus was non-rewarded, whereas, in the other group, the stimulus contingencies were reversed. After training, tests with non-rewarded stimuli were performed.
We then studied whether bees learn to classify face-like versus non-face-like stimuli (i.e. between-class discrimination). We trained bees with five pairs of F versus NF stimuli (Fig. 1), which were presented in a random succession during 48 trials. Experiments were balanced as half of the bees were rewarded with sucrose on the F stimuli, whereas the other half was rewarded on the NF stimuli. The continuous alternation of the stimuli prevented the bees from memorizing a single stimulus pair. We determined whether bees extract the common configuration underlying the rewarded patterns (e.g. F or NF) and transfer appropriately their choice to a test pair of F versus NF stimuli that were never used during the training (sixth pair) and that did not present a sucrose reward. Performance in such transfer tests should thus reveal whether bees possess the capacity to build generic face versus non-face categories.
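The training design (five pairs in random succession over 48 trials, with the sixth pair held out for the transfer test) can be sketched as a schedule generator. Several details are assumptions made only for this illustration: that stimulus Fi is paired with NFi, that pairs are drawn uniformly at random, and that the rewarded side is balanced 24/24 in shuffled order. The function name `training_schedule` is hypothetical.

```python
import random


def training_schedule(n_trials=48, n_pairs=6, held_out=6, seed=0):
    """Return a list of (F stimulus, NF stimulus, rewarded side) tuples.

    Assumptions (not stated in the text): F_i is paired with NF_i, pairs
    are drawn uniformly at random, and sides are balanced then shuffled.
    n_trials is expected to be even.
    """
    rng = random.Random(seed)
    training_pairs = [i for i in range(1, n_pairs + 1) if i != held_out]
    sides = ["left", "right"] * (n_trials // 2)
    rng.shuffle(sides)
    schedule = []
    for side in sides:
        i = rng.choice(training_pairs)
        schedule.append((f"F{i}", f"NF{i}", side))
    return schedule


schedule = training_schedule()
print(schedule[:3])
```

Because the held-out pair never appears in the schedule, a correct choice in the transfer test cannot rest on memory of a specific trained stimulus.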
Four kinds of transfer tests were performed: (i) in a first transfer test, bees were confronted with a novel pair of F versus NF stimuli; bees trained to faces should transfer their choice to the novel F stimulus, whereas bees trained to non-faces should choose the novel NF stimulus; (ii) in a second transfer test, bees were confronted with an ambiguous situation as they had to choose between a novel F stimulus and a novel NF stimulus in which scrambled features presented the spatial configuration of a face; this test should reveal whether bees focus on the configuration irrespective of its content or whether they expect specific features at the appropriate position; (iii) in a third transfer test, bees had to choose between a novel face-like stimulus and the same image rotated by 180 deg. (i.e. upside-down); bees trained to faces should choose the novel face configuration, whereas bees trained to non-faces should choose the inverted face as an example of a non-face stimulus; this test allows ruling out bilateral symmetry as the cue predicting pattern reward, given that both test stimuli are perfectly symmetric; (iv) finally, in a fourth transfer test, bees were presented with the inverted face versus a novel, scrambled non-face-like stimulus. If bees classify novel stimuli in the face versus the non-face categories, random choice should be expected both in bees trained with F (no test stimuli would have a face configuration) and in bees trained with NF stimuli (both test stimuli would belong to the non-face category).
To control for potential effects of the set-up used, the same experiments were conducted using the rotating screen.
In a further experiment performed in the Y-maze, we tested whether bees used the face configuration or low-level features to classify stimuli in the appropriate category. Features such as the centre of gravity of the figures [COG (Ernst and Heisenberg, 1999)], the main visual angle subtended by a visual pattern to the decision point of a bee in the maze [(Horridge et al., 1992); in our case, the decision point was the centre of the triangular imaginary space between both arms of the maze] and the position of the eyes (two dots at the top) can be used as predictive cues allowing category discrimination without the necessity of configuration learning (COG: F stimuli: 8.9±0.7 cm, NF stimuli: 10.2±0.5 cm; Mann-Whitney test: Z5=1.92, P=0.055; visual angle: F stimuli: 32.4±2.6 deg.; NF stimuli: 43.6±1.6 deg.; Z5=2.56, P<0.01).
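To make the COG cue concrete, the sketch below computes the vertical centre of gravity of a binary pattern as the mean height of its black pixels. The text does not specify how COG was measured, so the convention used here (height above the bottom edge of a 20 cm target) is an assumption, and the 4×4 toy patterns are invented; they are not the stimuli of Fig. 1.

```python
def vertical_cog_cm(pattern, target_height_cm=20.0):
    """Mean height (cm above the bottom edge) of the black pixels.

    pattern: list of rows, top row first, with 1 = black and 0 = white.
    The measurement convention is an assumption of this sketch.
    """
    n_rows = len(pattern)
    pixel_cm = target_height_cm / n_rows
    weighted_height = 0.0
    ink = 0
    for r, row in enumerate(pattern):
        row_ink = sum(row)
        height = (n_rows - r - 0.5) * pixel_cm  # centre of row r
        weighted_height += row_ink * height
        ink += row_ink
    if ink == 0:
        raise ValueError("pattern contains no black pixels")
    return weighted_height / ink


# Toy face-like layout: two "eyes" on top, "nose" and "mouth" below.
toy_face = [
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
toy_inverted = toy_face[::-1]  # same pattern rotated upside-down

cog_face = vertical_cog_cm(toy_face)
cog_inverted = vertical_cog_cm(toy_inverted)
print(cog_face, cog_inverted)
```

The example shows why COG is a plausible confound: rearranging the same elements shifts the centre of gravity even though the amount of black is unchanged.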
To control for this possibility, we trained bees to categorize face versus non-face stimuli, following the procedure of the previous experiment. Given that the previous experiment did not show differences in performance between bees trained to choose faces and those trained to choose non-faces, we analyzed the performance of bees trained to face-like stimuli only. After training, we performed two tests with novel stimuli (Fig. 4). In one of these tests, bees were confronted with F6 (not used during the training) versus a variant of F6 in which mouth and nose were swapped (F6′). If bees use only the position of the eyes (two dots on the top) to classify stimuli, random choice should be expected in this test. In the other test, the same bees were presented with a rough-drawn stimulus (‘RD’) versus F6′ used in the previous test (mouth and nose swapped). The RD stimulus was designed in such a way that it had a COG value similar to those of the non-face stimuli (10.8 cm), whereas F6′ had a COG close to those of the face stimuli (9.4 cm) despite not presenting a face configuration. Thus, if bees used the COG, they should prefer F6′ to RD, even if RD corresponds better to the face category than F6′. Moreover, F6′ and RD subtended the same visual angle to the decision point of the maze (39.4 deg.), so this feature could not serve as a predictive cue; if bees based their choice on it, a random choice should be expected. Finally, a fast-Fourier analysis (Zhang et al., 2004; Dyer et al., 2008) showed that the spatial frequency energy distribution of RD differed widely from that of all the stimuli used during training. Thus, bees should always prefer F6′ if they base their choice on this cue.
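The kind of spatial-frequency comparison mentioned above can be sketched with a radially binned power spectrum. This is not the analysis pipeline of the cited studies; it is a minimal NumPy illustration in which the number of bands, the removal of the mean, and the synthetic grating images are all assumptions.

```python
import numpy as np


def radial_power_spectrum(image: np.ndarray, n_bins: int = 8) -> np.ndarray:
    """Fraction of spectral energy in n_bins radial spatial-frequency bands
    (band 0 = lowest frequencies). Frequencies are in cycles per pixel."""
    img = image.astype(float)
    img = img - img.mean()  # drop the DC component (overall brightness)
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.hypot(fy[:, None], fx[None, :])
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    band = np.clip(np.digitize(radius, edges) - 1, 0, n_bins - 1)
    energy = np.bincount(band.ravel(), weights=power.ravel(), minlength=n_bins)
    total = energy.sum()
    return energy / total if total > 0 else energy


# Synthetic test images (not the stimuli of the study).
x = np.arange(32)
coarse = np.tile((x // 8) % 2, (32, 1))  # vertical stripes, period 16 px
fine = np.tile(x % 2, (32, 1))           # vertical stripes, period 2 px

spectrum_coarse = radial_power_spectrum(coarse)
spectrum_fine = radial_power_spectrum(fine)
print(spectrum_coarse.argmax(), spectrum_fine.argmax())
```

Two stimuli whose band profiles differ strongly, as reported for RD relative to the training stimuli, would be separable on this cue alone, which is why the test pits it against configuration.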
In a third experiment, we studied the effect of enriching or impoverishing the face-like configuration learned. In one case, bees were trained in the Y-maze to distinguish two simple F stimuli that consisted of the parameterized line drawings (F1 vs F4 in Fig. 1; see also Fig. 5A, ‘learning test’) and were afterwards tested with the same configurations superimposed onto real-face layouts derived from achromatic photographs of human faces (see Fig. 5A, ‘transfer test’). Such photographs were obtained from standardized psychophysics tests of human visual recognition (Warrington, 1996), and they subtended a visual angle of 67 deg. from the center of the decision chamber. In the other case, the reverse protocol was conducted; that is, bees were trained to discriminate between F1 and F4 configurations superimposed onto real-face layouts and then tested with the line-drawing stimuli (Fig. 5B). In each experiment, one half of the bees were rewarded on F1 (superimposed or not onto a real-face layout), whereas the other half was rewarded on F4, thus ensuring that the experiments were balanced.
We performed two further transfer experiments using photographs of real human faces to determine whether findings on parameterized line stimuli also apply to the recognition of more complex pictures. Pictures of human faces were obtained from standardized psychophysics tests of human visual recognition (Warrington, 1996). They were presented on a circular screen apparatus, which could be rotated to change the position of the figures (Dyer et al., 2005).
Bees were first trained to distinguish two photographs of real human faces (Fig. 6, ‘learning test’) and then tested with altered versions of these photographs. For one group of bees, the outer features (hair and ears) were removed (Fig. 6, ‘transfer test 1’). For another group, the inner features (eyes, nose and mouth) were removed (Fig. 6, ‘transfer test 2’). For the last group, the photographs were scrambled along the vertical axis (see Fig. 6, ‘transfer test 3’). The scrambling method we used exactly matches the method used by Collishaw and Hole (Collishaw and Hole, 2000) and reorders the spatial arrangement of the major human facial features (hair, eyes, nose, mouth and chin) without causing a disruption to any of the particular features that bees could use to solve the recognition task in the transfer test.
Within each group, half of the bees were rewarded on one face (F1: left face on Fig. 6), whereas the other half was rewarded on the other face (F2: right face on Fig. 6), so that the experiments were balanced.
In all cases, we checked for normality using the Lilliefors test. When necessary and depending on the test to be used, data were subjected to an arcsine transformation in order to normalize them. The performance of balanced groups within each experiment (e.g. group trained to discriminate face-like stimuli rewarded from non-face-like stimuli non-rewarded vs group trained with the reversed contingency) was compared by means of a two-factorial ANOVA of repeated measures in which the groups constituted one factor and the test stimuli the other factor. For each individual bee, we calculated the proportion of correct choices per test (i.e. a single value per bee). Performance in a given test was therefore assessed through a sample of such values. This situation allowed a one-sample approach in which our null hypothesis was that the proportion of correct choices in the test considered was not different from a theoretical value of 50%. Such a hypothesis was evaluated by means of a one-sample t-test. In all cases the alpha level was 0.05.
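The one-sample approach described above can be sketched as follows. The per-bee proportions are invented for illustration, and the specific transformation used here, arcsin of the square root of the proportion, is an assumption, since the text only says "arcsine transformation". The null value of 50% is transformed in the same way before testing.

```python
import math
import statistics


def arcsine_sqrt(p: float) -> float:
    """Arcsine square-root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (mean - mu0) / sem, n - 1


# Hypothetical proportions of correct choices, one value per bee.
proportions = [0.60, 0.70, 0.65, 0.75, 0.70, 0.60, 0.72, 0.68]

transformed = [arcsine_sqrt(p) for p in proportions]
t_stat, df = one_sample_t(transformed, arcsine_sqrt(0.5))
print(f"t{df} = {t_stat:.2f}")
```

The resulting statistic would then be compared with the t distribution for the given degrees of freedom at the alpha level of 0.05 stated in the text.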
We first studied within-class discrimination to ensure that transfer performances, if any, are not due to a lack of discrimination. Bees differentiated between F stimuli on the one hand and between NF stimuli on the other hand, thus showing that within-class discrimination was possible. As an example, Fig. 2 shows discrimination for the F pair (F4 vs F6) and the NF pair (NF3 vs NF5) in which stimuli were more similar and thus in principle difficult to distinguish (see Fig. 1).
In the task F4 versus F6 (face-like stimuli), discrimination was the same irrespective of which stimulus was rewarded (two-sample t-test, t6=1.97, P=0.10), so that results were pooled and presented as a single black bar (Fig. 2). Bees chose the correct F stimulus in the absence of sucrose reward in 68.7±3.1% of the cases (mean ± s.e.m.; N=8 bees; one-sample t-test against a 50% random choice, t7=5.72, P<0.001), thus showing a capacity to distinguish between highly similar face-like figures. A similar conclusion applies to non-face-like stimuli. In the task NF3 versus NF5 (non-face-like stimuli), discrimination did not depend on which stimulus was rewarded (t6=0.08, P=0.94), so that results were pooled and presented as a single white bar (Fig. 2). In this case, bees preferred the correct NF stimulus in 67.7±2.0% of the cases (N=8 bees; t7=8.22, P<0.001), thus showing a capacity to discriminate between highly similar non-face-like stimuli.
Bees were then trained to classify face-like versus non-face-like stimuli in a Y-maze with five pairs of F versus NF stimuli (Fig. 1), which were presented in a random succession. Fig. 3 shows the performance during the four transfer tests performed after training (black bars: bees trained on F stimuli; white bars: bees trained on NF stimuli).
In the first transfer test, bees of both groups (F-trained and NF-trained) transferred appropriately their choice to the corresponding stimulus of the novel pair. Thus, bees trained to faces chose the novel face-like configuration (78.4±7.3% correct choices; N=6; black bar in Fig. 3), whereas bees trained to non-faces chose the novel non-face-like configuration (64.3±9.8% correct choices; N=6; white bar in Fig. 3). As there were no significant differences in transfer performances between these two groups (t10=1.41, P=0.19), their data were pooled. Pooled performance was significantly different from a random choice (71.3±6.2% correct choices; t11=3.14, P<0.01), thus showing that bees extracted the correct configuration irrespective of the configuration trained.
In the second transfer test, bees rewarded on F stimuli transferred their choice appropriately to the novel F stimulus (79.5±2.8% correct choices; N=6; black bar in Fig. 3), whereas bees rewarded on NF stimuli preferred the novel NF stimulus in which the wrong features occupied the correct places of the face array (71.4±5.0% correct choices; N=6; white bar in Fig. 3). As there were no significant differences in transfer performances between the two groups of bees (t10=1.25, P=0.24), their data were pooled. Pooled performance was significantly different from a random choice (75.4±3.0%; t11=7.59, P<0.001). These results show that neither did bees trained to faces confuse the novel face-like stimulus with the ambiguous alternative nor did bees trained to non-faces interpret the ambiguous stimulus as a face. In other words, in extracting a face configuration, bees assigned features to a specific position, so that, if the spatial array was preserved but the position assigned to each feature was inappropriate, the stimulus was not recognized as belonging to the category learned.
In the third transfer test, bees rewarded on F stimuli chose the novel F configuration (67.3±6.1% correct choices; N=6; black bar in Fig. 3), whereas bees rewarded on NF stimuli preferred the inverted face (74.2±6.6% correct choices; N=6; white bar in Fig. 3). There were no significant differences in transfer performances between these two groups (t10=0.91, P=0.38). Pooled performance was significantly different from a random choice (70.8±4.4% correct choices; t11=3.86, P<0.005). These results indicate that bees lack rotational invariance as they do not treat an image and its 180 deg.-rotated version as equivalent. A rotated face-like configuration is therefore a non-face configuration, a result that excludes bilateral symmetry, distinctive of F stimuli, as the cue used to classify stimuli.
In the fourth transfer test, both groups of bees chose randomly between an inverted face-like stimulus and a novel non-face-like stimulus with scrambled features. Bees rewarded on F stimuli exhibited a random level of choices for the inverted face (50.6±2.8% choices; N=5; black bar in Fig. 3), whereas bees rewarded on NF stimuli exhibited a similar performance for the novel non-face-like stimulus (51.1±2.9% choices; N=5; white bar in Fig. 3). As there were no significant differences in transfer performances between these two groups (t8=0.40, P=0.70), their data were pooled. Pooled performance did not differ from a random choice (mean choice of the inverted face: 49.8±1.9%, t9=0.12, P=0.91). These results show, therefore, that bees trained to classify faces did not interpret a rotated face configuration as a face, thus reaffirming the lack of rotational invariance, and that bees trained to classify non-face-like stimuli treated a rotated face and a scrambled version of a face as equivalent. These performances reveal the use of specific configurations (i.e. face-like) in which the use of symmetry can be excluded.
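The pooled value for the inverted face can be recovered from the two group means. The arithmetic below uses only the percentages reported above and relies on two points: each test offers exactly two alternatives, so the NF-trained bees' choice of the inverted face is the complement of their choice of the novel non-face stimulus, and both groups have the same size (N=5), so the pooled mean is the simple average.

```python
# Group means reported in the text (percent of choices).
f_group_inverted_face = 50.6   # F-trained bees choosing the inverted face
nf_group_novel_nonface = 51.1  # NF-trained bees choosing the novel non-face

# Two-alternative test: choices for the two stimuli sum to 100%.
nf_group_inverted_face = 100.0 - nf_group_novel_nonface

# Equal group sizes (N=5 each), so the pooled mean is the plain average.
pooled_inverted_face = (f_group_inverted_face + nf_group_inverted_face) / 2
print(f"{pooled_inverted_face:.2f}%")
```

The result agrees, to rounding, with the pooled mean of 49.8% given in the text.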
We repeated this experiment by using the rotating screen to control for potential effects of the set-up used. The results were not significantly different from those obtained in the same experiment performed in the Y-maze (paired sample t-test; first transfer test: t9=0.28, P=0.79; second transfer test: t9=2.03, P=0.07; third transfer test: t9=0.08, P=0.94; the fourth transfer test was not performed). We conclude, therefore, that configural processing is a strategy employed by honeybees to recognize visual targets, which is independent of the experimental set-up used.
This experiment was conceived to determine whether bees solved the previous task using low-level cues such as the center of gravity of the stimuli [COG (Ernst and Heisenberg, 1999)], the visual angle subtended by their main axis (Horridge et al., 1992), their spatial frequency (Horridge, 1997) or the position of the two dots typical of face-like stimuli.
Bees trained to face-like stimuli were confronted with F6 (not used during the training) versus a variant of F6 (F6′) in which mouth and nose were swapped (Fig. 4, left). Bees significantly preferred F6 to F6′ (N=8; black bar in Fig. 4: 72.9±8.7% correct choices; t7=6.87, P<0.001), thus showing that they did not only use the position of the eyes (two dots on the top) to classify stimuli.
In a further test, the same bees were presented with a rough-drawn stimulus (RD) versus F6′ (Fig. 4, right). Bees significantly preferred RD to F6′ (N=8; white bar in Fig. 4: 73.7±8.3% correct choices; t7=7.44, P<0.001), thus showing that neither COG (predicting preference for F6′) nor the visual angle subtended by the stimuli to the decision point of the maze (39.4 deg. in both cases), nor spatial frequency energy distribution (which predicted preference for F6′) accounted for stimulus choice. Stimulus configuration was therefore the main information used by the bees to achieve discriminations.
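The logic of these two control tests is an elimination argument: each candidate cue predicts a particular outcome, and cues whose predictions fail are discarded. The sketch below encodes only the predictions stated in the text (`None` marks a test for which no prediction is given for that cue); the dictionary layout and the function name `consistent_cues` are illustrative choices, and F6′ is written `F6'` in the code.

```python
# Predicted outcome of each test under each candidate cue, as stated in
# the text. None = no prediction stated for that cue in that test.
PREDICTIONS = {
    "eye position only": {"F6 vs F6'": "random", "RD vs F6'": None},
    "centre of gravity": {"F6 vs F6'": None, "RD vs F6'": "F6'"},
    "visual angle": {"F6 vs F6'": None, "RD vs F6'": "random"},
    "spatial frequency": {"F6 vs F6'": None, "RD vs F6'": "F6'"},
    "face configuration": {"F6 vs F6'": "F6", "RD vs F6'": "RD"},
}

# Observed preferences reported in the results.
OBSERVED = {"F6 vs F6'": "F6", "RD vs F6'": "RD"}


def consistent_cues(predictions, observed):
    """Return the cues none of whose stated predictions are contradicted."""
    surviving = []
    for cue, by_test in predictions.items():
        if all(pred is None or pred == observed[test]
               for test, pred in by_test.items()):
            surviving.append(cue)
    return surviving


surviving = consistent_cues(PREDICTIONS, OBSERVED)
print(surviving)
```

Only the configuration account survives both tests, which is the inference drawn in the text.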
To what extent can basic face-like configurations like the ones used in the previous experiments be recognized as such if additional visual cues pertaining to real human faces are added to them? And vice versa, can bees trained on a simple face-like configuration enriched by real human-face features recognize the correct configuration after depriving it of such features? To answer these questions, we performed two series of experiments, testing the effect of enriching or impoverishing the face-like configuration learned.
In the first series, bees trained with the parameterized line drawings alone discriminated well between the two stimuli during the learning test. Bees rewarded on F1 (N=9; left black bar in Fig. 5A) reached 63.9±4.3% correct choices, whereas bees rewarded on F4 (N=9; left white bar in Fig. 5A) reached 64.9±2.2% correct choices. As the two performances did not differ significantly (t16=0.02, P=0.98), their data could be pooled. The resulting performance (64.4±2.3% correct choices) was significantly different from a random choice (t17=5.46, P<0.001), thus showing that bees learned to recognize their respectively trained face-like configuration. In the transfer test, both groups of bees chose the correct face-like configuration even though it was enriched by a human face background (Fig. 5A); bees originally rewarded on F1 chose preferentially the enriched version of F1 (66.3±3.4%; right black bar in Fig. 5A), whereas bees trained on F4 preferred the enriched version of F4 (65.6±2.0%; right white bar in Fig. 5A). There were no significant differences between groups (t16=0.23, P=0.82). The pooled performance was significantly different from a random choice (66.0±1.9% of correct choices; t17=7.93, P<0.001), thus showing that adding a visual background did not alter the recognition of the configuration learnt.
In the second series of experiments, bees were first trained with the parameterized line drawings (F1 or F4) superimposed onto the real-face layouts and then tested with impoverished stimuli presenting only the parameterized line drawings. Training was successful in both groups of bees. Bees rewarded on the enriched F1 reached a level of 66.1±2.8% correct choices (left black bar in Fig. 5B), whereas bees rewarded on the enriched F4 performed at 68.2±3.1% correct choices (left white bar in Fig. 5B). There were no significant differences between these groups (t16=0.53, P=0.60; Fig. 5B). The pooled performance (67.2±2.0% of correct choices) differed significantly from a random choice (t17=7.94, P<0.001) and was similar to that obtained in the learning tests of Fig. 5A (two-sample t-test, t34=0.76, P=0.45). In the transfer tests, both bees trained on F1 (72.0±3.5%; right black bar in Fig. 5B) and bees trained on F4 (68.9±3.7%; right white bar in Fig. 5B) correctly transferred their choice to the impoverished F1 and F4 configurations (Fig. 5B) with comparable performances (t16=0.23, P=0.82). The pooled choice level (70.4±2.5%) was significantly different from a random choice (t17=7.64, P<0.001) and did not differ from the transfer performance found in Fig. 5A (t34=1.47, P=0.15). Transfer in both directions was, therefore, equally possible, thus showing that enriching or impoverishing a simplified face-like configuration by adding or suppressing visual cues from real human faces did not affect visual recognition in bees.
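The analysis pattern used for these experiments (compare the two reward groups, pool them if they do not differ, then test the pooled performance against chance) can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the per-bee percentages are hypothetical, the function names are our own, and we assume a Student two-sample t-test with pooled variance, which is consistent with the reported degrees of freedom (9+9−2=16) but is not stated explicitly in this section. Only the t statistic and its degrees of freedom are computed; the P value would be read from a t distribution.

```python
import math
from statistics import mean, stdev

CHANCE = 50.0  # percent correct expected under random choice between two stimuli


def one_sample_t(values, mu):
    """Return (t, df) for a one-sample t-test of `values` against `mu`."""
    n = len(values)
    sem = stdev(values) / math.sqrt(n)  # standard error of the mean
    return (mean(values) - mu) / sem, n - 1


def two_sample_t(a, b):
    """Return (t, df) for a Student two-sample t-test (pooled variance)."""
    na, nb = len(a), len(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2


# Hypothetical per-bee percentages of correct choices (NOT the study's data).
group_f1 = [62.0, 70.5, 58.0, 66.0, 61.5, 72.0, 55.0, 68.0, 63.0]
group_f4 = [65.0, 60.0, 69.5, 63.0, 71.0, 59.0, 66.5, 64.0, 67.0]

# Step 1: do the two reward groups differ from each other?
t_between, df_between = two_sample_t(group_f1, group_f4)

# Step 2: if not, pool them and test the pooled performance against chance.
pooled = group_f1 + group_f4
t_chance, df_chance = one_sample_t(pooled, CHANCE)

print(f"between groups: t{df_between}={t_between:.2f}")
print(f"pooled vs chance: t{df_chance}={t_chance:.2f}")
```

With nine bees per group, the sketch yields 16 degrees of freedom for the between-group comparison and 17 for the pooled test against chance, matching the subscripts reported in the text.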
Further experiments using actual human photographs were performed to determine whether findings on parameterized line stimuli apply to the recognition of complex pictures such as those of human faces. Bees were trained on the rotating screen to distinguish two photographs of real human faces (Fig. 6, ‘learning test’) and then tested with altered versions of these photographs. Half of the bees were rewarded on one face (F1: left face on Fig. 6), whereas the other half was rewarded on the other face (F2: right face on Fig. 6) so that experiments were balanced.
Bees learned to discriminate the two training stimuli. In the learning test, bees rewarded on F1 reached 74.0±1.0% correct choices (N=21; black bar in Fig. 6, ‘learning test’), whereas bees rewarded on F2 reached 78.0±1.1% correct choices (N=21; white bar in Fig. 6, ‘learning test’). As both performances did not differ significantly (t40=0.41, P=0.068), their data could be pooled. The resulting performance (76.0±1.1% correct choices) was significantly different from a random choice (t41=17.02, P<0.001), thus showing that bees learned to recognize the human-face photograph on which they were rewarded.
In the transfer test in which the outer features (hair and ears) were removed (Fig. 6, ‘transfer test 1’), bees originally rewarded on F1 preferentially chose the inner part of F1 (60.0±2.4%; black bar in Fig. 6, ‘transfer test 1’), whereas bees trained on F2 preferred the inner part of F2 (60.7±3.0%; white bar in Fig. 6, ‘transfer test 1’). As there were no significant differences between the performances of these two groups (t12=0.20, P=0.85), their data could be pooled. The resulting performance was significantly different from a random choice (60.4±1.9% of correct choices; t13=5.47, P<0.001), thus showing that the inner parts of the faces were used by the bees to discriminate between the two human-face photographs. However, discrimination was significantly poorer than that obtained in the learning test with the complete photographs (paired samples t-test, t13=7.86, P<0.001).
In the transfer test in which the inner features (eyes, nose and mouth) were removed (Fig. 6, ‘transfer test 2’), bees trained on F1 significantly preferred the photograph presenting the outer parts of F1 (67.9±1.8%; black bar in Fig. 6, ‘transfer test 2’), whereas bees trained on F2 significantly preferred the photograph presenting the outer parts of F2 (70.7±1.7%; white bar in Fig. 6, ‘transfer test 2’). Performance was similar in both cases (t12=1.14, P=0.28). The pooled choice level (69.3±1.3%) was significantly different from a random choice (t13=10.26, P<0.001). However, discrimination was again significantly poorer than that obtained in the learning tests with the complete photographs (t13=2.97, P=0.01). In addition, recognition based on the outer features of the faces was significantly better than that based on the inner features (t26=3.95, P<0.001; Fig. 6, ‘transfer tests 1 and 2’). This experiment shows, therefore, that bees use both internal and external features of human-face photographs to discriminate between them and that both kinds of features are bound together in a configural representation.
Finally, in the transfer test in which the photographs were scrambled along the vertical axis (see Fig. 6, ‘transfer test 3’), both groups of bees failed to choose the correct scrambled face (Fig. 6, ‘transfer test 3’). Bees originally rewarded on F1 chose the scrambled image of F1 in 51.6±2.5% of the cases (black bar in Fig. 6, ‘transfer test 3’), whereas bees trained on F2 chose the scrambled image of F2 in 50.0±3.1% of the cases (white bar in Fig. 6, ‘transfer test 3’). As there were no significant differences between groups (t12=0.39, P=0.70), their data could be pooled. The resulting performance (50.8±1.9% of correct choices) was not different from a random choice (t13=0.42, P=0.68). These results show that scrambling the photographs completely disrupts face recognition and suggest that bees employ holistic processing [as defined in Maurer et al. (Maurer et al., 2002)] to discriminate the photographs. Indeed, this manipulation alters the configuration of the face but not the features (Collishaw and Hole, 2000). The use of the average picture brightness as a discriminative low-level cue can be discarded given that it was the same in the scrambled photographs.
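The brightness argument can be made concrete with a small sketch. Here we assume, for illustration only, that scrambling along the vertical axis means cutting the picture into horizontal strips and reordering them; the exact procedure used to build the stimuli is not described in this section. The toy image, the strip height and the function names are hypothetical. Because reordering strips only moves pixels around, the mean brightness is unchanged while the spatial arrangement of the features is destroyed.

```python
def mean_brightness(image):
    """Mean grey level of an image given as a list of pixel rows."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def scramble_strips(image, strip_height, order):
    """Cut `image` into horizontal strips and stack them in the given order.

    `order` is a permutation of the strip indices (hypothetical interface).
    """
    strips = [image[i:i + strip_height]
              for i in range(0, len(image), strip_height)]
    if sorted(order) != list(range(len(strips))):
        raise ValueError("order must be a permutation of the strip indices")
    return [row for index in order for row in strips[index]]


# Toy 6x4 greyscale "photograph" (hypothetical values, 0-255).
image = [
    [10, 10, 10, 10],
    [40, 200, 200, 40],
    [40, 40, 40, 40],
    [90, 90, 90, 90],
    [30, 220, 220, 30],
    [60, 60, 60, 60],
]

# Move the bottom strip to the top; the other two strips slide down.
scrambled = scramble_strips(image, strip_height=2, order=[2, 0, 1])

# Same pixels, different layout: mean brightness cannot separate the two.
print(mean_brightness(image) == mean_brightness(scrambled))
print(scrambled != image)
```

Any cue computed from the pixel values alone, irrespective of their position, is preserved by this kind of manipulation, which is why a failure to recognize the scrambled pictures points to the loss of configuration rather than to a change in brightness.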
The present work shows that configural visual processing is present in an insect and underlies its learning and classification of complex images such as face-like stimuli. Bees succeeded in categorizing face-like versus non-face-like stimuli using configural information and not only isolated features and low-level cues such as the symmetry, center of gravity, visual angle, spatial frequency or background cues present in face-like stimuli. Whether bees can use configural information to recognize complex visual stimuli remained an important question to be answered as it has been argued that bees can only use simple, unconnected features for object recognition (Horridge, 2009). Our findings exclude this possibility as stimulus recognition was possible even when low-level cues were removed or were confusing (Figs 4 and 5) and because recognition was not possible in the case of face-like stimuli in which the first-order relationship between features was slightly modified (Fig. 3). Moreover, pictures of real faces that contained all cues presented in a scrambled arrangement were not recognized as the training stimulus, given that the original configuration was disrupted (Fig. 6).
Following differential conditioning, bees thus looked not for isolated features but for a specific configuration in which each feature had to be located in an appropriate relationship with respect to the others. In that sense, their performance is consistent with Maurer et al.'s (Maurer et al., 2002) first level of configural processing termed ‘sensitivity to first-order relations’, in which basic relationships between features are taken into account. The second level proposed by Maurer et al. (Maurer et al., 2002), ‘holistic processing’, constitutes an appealing framework to interpret the performance of the bees, but so far the evidence obtained is contradictory and does not allow concluding that such a processing form is available in bees. Holistic processing implies that features are bound together into a gestalt, which is more than the simple sum of its components. From this perspective, it corresponds to Pearce's configural theories (Pearce, 1987; Pearce, 1994), such that it can be predicted that partial suppression of one or more components should severely affect gestalt recognition. This is not what we observed in experiment 3 (Fig. 5B) in which bees were trained with the parameterized line drawings superimposed onto the real-face layouts and then tested with impoverished stimuli presenting only the parameterized line drawings. In this case, suppression of the real-face background did not affect recognition. By contrast, experiments in which pictures of actual human faces were used (see Fig. 6) yielded evidence consistent with holistic processing as suppressing external or internal features of the faces induced a significant decrease in recognition. These contradictory results might be explained by differences in salience and/or similarity between components, which might affect the capacity for configuring elements into a compound (Deisig et al., 2002). 
The fact that more salient cues might be easier to extract to build a configured representation could explain why bees did not exhibit a decay in performance when the real human face background was suppressed, leaving the parameterized line configuration alone (Fig. 5B). In this case, the high contrast provided by the black features could promote focusing on the simplified configuration. On the contrary, when real human faces were deprived of part of their features (Fig. 6), a decay in performance was observed probably due to the absence of highly salient cues in this case. More experiments are, therefore, necessary to determine whether holistic processing occurs in the framework of complex visual stimuli recognition by honeybees. Finally, no evidence allows discussing Maurer and colleagues' third level of processing, ‘sensitivity to second-order relationships’, in which distances between features are perceived and used for discrimination. These results support, therefore, the notion that configural processing in bees reaches at least the ‘sensitivity to first-order relations’ level, based on extracting the relevant, predictive features common to a given category and combining them in a general representation.
Such a capacity allows constructing a high number of different representations on the basis of a limited number of features, thus providing the basis for complex categorization abilities. Visual categorization in bees has been shown in several independent experiments (for a review, see Benard et al., 2006). Such experiments focused on single-feature categorization and showed that bees transferred their choice to novel stimuli presenting the predictive feature of a category. Recent work shows that bees can construct complex image representations following extended differential conditioning (Stach et al., 2004; Stach and Giurfa, 2005) (but see Horridge, 2009). Here, we move a step further by showing that such a task can involve various, different features as long as these preserve the spatial relationship defining the category. This ability might underlie categorization of natural objects in classes such as radial flowers, plant stems or landscapes, as shown in free-flying honeybees (Zhang et al., 2004) and might thus be very useful for bees for foraging efficiently in a complex visual environment.
A crucial feature in visual discrimination experiments in bees is the visual angle at which targets to be discriminated are presented. Indeed, local or global processing might be promoted depending on how stimuli are perceived by the bees at the decision point in a Y-maze (Zhang et al., 1992). In our parameterized-line drawing experiments, the visual targets subtended a mean visual angle of 38 deg. to the eye of the bee when it had to decide between visual alternatives. This angle was chosen to ensure perception of a figure as a whole. Given the low spatial resolution of the insect compound eye, focusing on global configurations might be an appropriate strategy before closing in on a visual target. Indeed, while spatial details are still unclear at farther distances, basic configurations are preserved and might be perceived in low-frequency visual patterns. For example, honeybees are able to learn images of natural shapes such as trees, which could at a distance assist in navigational tasks in complex visual environments such as forests (Dyer et al., 2008). It is interesting that the bees did not choose to use low-level cues such as symmetry (Fig. 3), COG, spatial frequency distribution (Fig. 4), or brightness (Fig. 5) to recognize the stimuli in the current study. This might be due to configural cues offering more robust information on which to make decisions in complex environments, where some low-level cues such as brightness are often highly variable. Note also that individual features were resolvable in our stimuli so that it was not the lack of discrimination (due to, for instance, a lack of visual resolution) that might have led to prioritizing configural information.
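For readers unfamiliar with the measure, the visual angle subtended by a flat target viewed head-on follows from its size and the viewing distance through the standard relation angle = 2·atan(size / (2·distance)). The sketch below applies this relation; the 20 cm stimulus width is a hypothetical value chosen for illustration, since the stimulus dimensions and the distance to the decision point are not given in this section.

```python
import math


def visual_angle_deg(size, distance):
    """Visual angle (degrees) of a target of width `size` seen head-on
    from `distance` (both in the same length unit)."""
    return math.degrees(2 * math.atan(size / (2 * distance)))


def distance_for_angle(size, angle_deg):
    """Viewing distance at which a target of width `size` subtends
    `angle_deg` degrees."""
    return size / (2 * math.tan(math.radians(angle_deg) / 2))


# Hypothetical 20 cm wide stimulus: how far away must the decision point
# be for the stimulus to subtend the 38 deg. reported in the text?
stimulus_width_cm = 20.0
d = distance_for_angle(stimulus_width_cm, 38.0)
print(f"distance for 38 deg: {d:.1f} cm")
print(f"check: {visual_angle_deg(stimulus_width_cm, d):.1f} deg")
```

The relation also makes explicit why the angle grows as the bee approaches the target: halving the distance roughly doubles the angle for small targets, so that fine details become resolvable only close to the stimulus.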
Our results show that a non-specialized brain can learn to do this complex recognition task using a mechanism of configural processing despite the absence of specialized brain areas, such as the fusiform face area (Sergent et al., 1992; Kanwisher et al., 1997) or its homologous region in the macaque brain (Tsao et al., 2006), which have been proposed to function as dedicated modules for the recognition of faces. This result has significant implications for understanding how larger brains might learn face-processing tasks if specialised neural circuitry is not available (Pierce et al., 2001; Koshino et al., 2008). We maintain nevertheless that face-like stimulus recognition is, in our experiments, an artificial situation far from the biological background of visual recognition tasks with which bees are naturally confronted. For the bees, the various simplified and complex stimuli used in our experiments were simply uncommon flowers on which they were rewarded with sucrose solution as the equivalent of nectar and that they could recognize using configural processing. Although they could use similar processing to distinguish between human faces, nothing prepares them to do so in evolutionary terms. The performance exhibited in our work underlines, nevertheless, that higher-order forms of visual processing and categorization of complex stimuli are not a prerogative of vertebrates. They are already present in more ‘simple’ brains, thus showing that simplicity refers to the number of neurons but not necessarily to the sophistication of performances that can be achieved with such a reduced number of neurons.
We thank two anonymous referees for comments and corrections. M.G. and A.A.-W. thank the French Research Council (CNRS) and the University Paul Sabatier for support. A.A.-W. was supported by a Travelling Fellowship from The Company of Biologists and by the University Paul Sabatier (ATUPS fellowship). A.D. acknowledges the USAF AOARD, the Alexander von Humboldt Foundation and ARC DP0878968 for funding support.
- © 2010.