## SUMMARY

Goldfish swimming was analysed quantitatively to determine if it exhibits distinctive individual spatio-temporal patterns. Due to the inherent variability in fish locomotion, this hypothesis was tested using five nonlinear measures, complemented by mean velocity. A library was constructed of 75 trajectories, each of 5 min duration, acquired from five fish swimming in a constant and relatively homogeneous environment. Three nonlinear measures, the 'characteristic fractal dimension' and 'Richardson dimension', both quantifying the degree to which a trajectory departs from a straight line, and 'relative dispersion', characterizing the variance as a function of the duration, have coefficients of variation less than 7%, in contrast to mean velocity (30%). A discriminant analysis, or classification system, based on all six measures revealed that trajectories are indeed highly individualistic, with the probability that any two trajectories generated from different fish are equivalent being less than 1%. That is, the combination of these measures allows a given trajectory to be assigned to its source with a high degree of confidence. The Richardson dimension and the 'Hurst exponent', which quantifies persistence, were the most effective measures.

## Introduction

Individual differences are often sufficiently large as to make it difficult to quantify a behaviour and to distinguish its underlying components (Gotceitas and Colgan, 1988; Mather and Anderson, 1993; Wilson et al., 1993). However, individual variation may constitute an important aspect of behavioural selection (Clark and Ehlinger, 1987; Gotceitas and Colgan, 1988; Colgan et al., 1991). For example, it might ensure fitness of a population when resources are limited (Magurran, 1986a; Gotceitas and Colgan, 1988). Nevertheless, analyses of behavioural performance often focus on general phenomena of entire populations, and idiosyncratic aspects are noted secondarily. Moreover, studies on individuality often concern higher order behaviours, such as, in the case of teleost fish, foraging, fear avoidance, aggression, predator inspection, mating strategies, parental care and sociability (Gervai and Csányi, 1986; Magurran, 1986a,b; Clark and Ehlinger, 1987; Huntingford and Giles, 1987; Gotceitas and Colgan, 1988; Francis, 1990; Murphy and Pitcher, 1991; Colgan et al., 1991; Wilson et al., 1993; Budaev, 1997; Coleman and Wilson, 1998; Budaev and Zhuikov, 1998; Budaev et al., 1999a,b). While these behaviours all involve motor activity, they are often quantified on the basis of socio-biological descriptors, such as inspection rates, or proximity to other fish. However, less attention has been paid to the possibility of individual differences in the underlying basic patterns of motor activity.

Swimming is actually composed of highly organized spatial and temporal patterns even in a relatively homogeneous environment (Kleerekoper et al., 1974; Steele, 1983). Some of these patterns are complex and cannot be characterized with the tools of classical kinematics, as they may exhibit nonlinear properties, such as persistence (the tendency to repeat a given sequence), redundancy (the relationship between the uncertainty of a signal and its length) and scale invariance (a tendency for a signal to have the same structure when observed on different temporal or spatial scales) (Faure et al., 2003). Indeed, nonlinear measures have been used to characterize locomotion and the behavioural repertoires in various species, including invertebrates (Dicke and Burrough, 1988; Cole, 1995), fish (Coughlin et al., 1992; Alados and Weber, 1999; Brewer et al., 2001), birds (Viswanathan et al., 1996; Ferriere et al., 1999) and mammals (Paulus et al., 1990; Marghitu et al., 1996; Alados et al., 1996; Alados and Huffman, 2000).

The present study was designed (1) to apply five nonlinear measures and one linear measure as descriptors of goldfish swimming trajectories in order to quantify this locomotor behaviour and (2) to develop a discriminant analysis that would allow us to ask if a given trajectory could be assigned to an individual within the experimental pool. It was found that, despite the apparent variability of trajectories, our protocol could reliably achieve such a classification.

## Materials And Methods

### Animals

Mature goldfish (*Carassius auratus* L.) were purchased from a
commercial hatchery (Hunting Creek Fisheries, Inc., Thurmont, MD, USA). Upon
arrival in the laboratory, the animals were adapted to laboratory conditions
for at least one week. Five female fish with similar body length (9–12
cm) were chosen randomly. They were maintained together in a rectangular glass
aquarium (92 cm×41 cm×31 cm; 75 litre), using deionised water
conditioned with NovAqua (0.13 ml l^{–1}; Novalek Inc., Hayward,
CA, USA), Instant Ocean (16 mg l^{–1}; Aquarium Systems, Mentor,
OH, USA), Aquarium Salt (0.125 g l^{–1}; Jungle Labs, Cibolo,
TX, USA), Copper Safe (0.32 ml l^{–1}; Mardel Laboratories,
Inc., Harbor City, CA, USA) and pH Stabilizer 7.2 (0.125 g
l^{–1}; Jungle Labs). Water quality was monitored regularly and
was the same for holding and experimental tanks (temperature
22±1°C; pH 7±0.2; dissolved oxygen saturated, 8 p.p.m.).
Fish were fed on a regular 48-h schedule. A 12 h:12 h light:dark cycle was
supplied by room lights (360 lux at the water surface). All video recordings
were made during the light period.

### Swimming environment

A cylindrical Plexiglas tank (20 litre, 50 cm diameter) was used for the experiments. The water column was comparatively shallow (10 cm deep) to prevent fish from swimming out of the camera's focal plane and to minimise errors due to changing swimming depth. To reduce mechanosensory and visual cues, the tank was mounted on an anti-vibration table and its wall and lid were translucent white. Its bottom was clear to allow video recording from below.

Translucent white plastic sheets were mounted on the inside frame of the table with a small hole in the bottom sheet for the camera lens. Illumination was from above with a circular fluorescent bulb (approximately 350 lux at the water surface) and from below with four floodlights (approximately 250 lux at the bottom of the tank). New conditioned water was used for each recording session.

### Data acquisition and experimental design

Approximately 30 min prior to all recording sessions, fish were transferred to a translucent white container (20 cm×15 cm×10 cm) filled with aerated, conditioned water, to be marked for automated motion tracking. Two markers were applied with instant adhesive (Quick Tite; Locktite Corp., Avon, OH, USA) along the ventral midline of the fish to specify its position on the video image. They were made of double-sided black tape (1 cm×1 cm) with a dot (approximately 4 mm diameter) of white nail polish painted in the centre. For this purpose, the fish was removed from the water, the ventral midline was exposed and the skin was gently dried. The markers were applied between the paired pelvic and pectoral fins and onto the lower jaw in less than 1.5 min, after which the fish recovered in fresh aerated water for at least 10 min. This procedure had no obvious impact on behaviour and, in most cases, the marker remained in place for several days.

To analyse locomotion (see also Faure et al., 2003), recordings of the ventral view of the fish were obtained from below at 30 Hz using a digital camcorder (Canon Optura; Canon USA, Jamesburg, NJ, USA). Each recording session started 30 s after the fish was introduced into the experimental tank and lasted 15 min. Video capturing software (Adobe Premiere; Adobe Systems Inc., San José, CA, USA) was used to subdivide each recording session into three 5-min trajectories. Five such recording sessions, each obtained on a different day, were collected from each of the five fish and used to construct a library of 75 trajectories.

### Data analysis

Commercial motion analysis software (WinAnalyze; Mikromak GmbH, Erlangen,
Germany) provided frame-to-frame data on the *X* and *Y*
position of the markers (Fig.
1A). Only the central marker data were used for the calculations
reported in this paper. The *X* and *Y* position data as
functions of time were used as primary data for the multivariate analysis
described below. The five nonlinear measures chosen for this study were
computed with software of our own construction; the mean velocity was then
taken as the 5-min average of the instantaneous velocity
[(*dx*/*dt*)^{2}+(*dy*/*dt*)^{2}]^{0.5}.
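The velocity calculation above can be sketched as follows. This is a minimal illustration, assuming that the derivatives are approximated by frame-to-frame finite differences at the stated 30 Hz sampling rate; the function names are hypothetical and the software actually used by the authors is not described in this detail.

```python
import numpy as np

FRAME_RATE_HZ = 30.0  # video sampling rate stated in the text


def instantaneous_speed(x, y, frame_rate=FRAME_RATE_HZ):
    """Speed sqrt((dx/dt)^2 + (dy/dt)^2) from frame-to-frame differences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dt = 1.0 / frame_rate
    return np.hypot(np.diff(x) / dt, np.diff(y) / dt)


def mean_velocity(x, y, frame_rate=FRAME_RATE_HZ):
    """Average of the instantaneous speed over the whole trajectory."""
    return float(instantaneous_speed(x, y, frame_rate).mean())


# Synthetic check (not experimental data): 1 mm per frame along X
# corresponds to 30 mm/s at 30 frames per second.
x_demo = np.arange(10, dtype=float)
y_demo = np.zeros(10)
demo_speed = mean_velocity(x_demo, y_demo)
```

For a 5-min trajectory, `x` and `y` would each hold 9000 samples.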

A pragmatic criterion for choosing each nonlinear measure was that it should provide a quantitative value that could be assigned to a trajectory, allowing for statistical comparisons between groups of data. A brief description of these nonlinear measures is presented here; a more detailed mathematical description, including additional references to the primary literature, is given in Rapp et al. (2002).

#### 1. The characteristic fractal dimension (CFD)

The CFD measures the degree to which a trajectory departs from a straight-line path (Katz and George, 1985; Katz, 1988). It is a measure of the total distance travelled from point to point (or frame to frame) relative to the maximum separation of any two points in the series. In other words, it is an approximation, equal to the distance travelled divided by the diameter of the experimental tank. It is sensitive to the duration of the observation period and to the speed of motion (see Rapp et al., 2002). It has a minimum value of 1 but does not have an upper limit. Since, in the present application, the fish is swimming in a cylindrical tank, a circular motion of constant velocity would be equivalent to a straight line. As the trajectory deviates from circular motion, the CFD increases. This measure has been used to analyse a variety of complex geometrical patterns (Rinaldo et al., 1993; Rodriguez-Iturbe and Rinaldo, 1997).
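As an illustration of a measure of this kind, the sketch below uses the normalised formulation of Katz (1988), D = log(n) / [log(n) + log(d/L)], where L is the total path length, d the maximum separation of any two points and n the number of steps. Whether the CFD reported here uses exactly this normalisation is an assumption; the function names are hypothetical.

```python
import numpy as np


def max_separation(xy):
    """Largest distance between any two points (O(n^2) time, O(n) memory)."""
    best = 0.0
    for p in xy:
        d = np.hypot(xy[:, 0] - p[0], xy[:, 1] - p[1]).max()
        best = max(best, float(d))
    return best


def katz_fractal_dimension(x, y):
    """Katz-style dimension: equals 1 for a straight line, grows as the path convolutes."""
    xy = np.column_stack([x, y]).astype(float)
    steps = np.hypot(np.diff(xy[:, 0]), np.diff(xy[:, 1]))
    path_length = steps.sum()   # total distance travelled, frame to frame
    n_steps = len(steps)        # needs n_steps > 1
    d = max_separation(xy)      # maximum separation of any two points
    return float(np.log10(n_steps) / (np.log10(n_steps) + np.log10(d / path_length)))


# Synthetic paths (not experimental data)
t = np.arange(50, dtype=float)
cfd_straight = katz_fractal_dimension(t, np.zeros(50))  # straight line
cfd_zigzag = katz_fractal_dimension(t, t % 2)           # zigzag path
```

Because the path length grows with both observation time and swimming speed while the maximum separation is bounded by the tank, a measure of this form is sensitive to duration and speed, as noted above.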

#### 2. The Richardson dimension (D_{R})

The D_{R} is also an estimate of the degree to which a trajectory
departs from a straight line (Richardson, 1960; Mandelbrot, 1983).
In contrast with the CFD, D_{R} also quantifies how
the estimated length of a curve changes with the precision of the measurement. It is
an example of the generic class of dimension measures that have been applied
to the analysis of the classical problem of fractal geometry, namely 'How long
is the coast line of Britain?'
(Mandelbrot, 1967). Stated
operationally, for a fixed step length one counts the number of steps required
to walk around the coast (or, as in our application, along the fish's
trajectory). The length of the stride, i.e. the distance covered with each
step, is then reduced and the number of steps required using this new step
length is determined. The process is repeated and the log of the number of
steps required is plotted as a function of the log of the step length. The
slope of this curve is used to determine D_{R}. Thus, D_{R} is a measure of
scale invariance. As for the CFD, a value of 1 is obtained from
a straight line. The value of 2 is the maximum possible D_{R} and it
represents a theoretical limit when a trajectory covers the entire
two-dimensional surface. Given the differences between the factors influencing
the CFD and D_{R}, they can diverge.
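The stepping procedure described above can be sketched as a simple divider walk. This is a minimal sketch, assuming that each stride ends at the first recorded point lying at least one step length from the current anchor (no interpolation between frames) and that D_{R} is taken as the negative slope of the log–log fit; the function names and step lengths are hypothetical.

```python
import numpy as np


def count_divider_steps(xy, step_length):
    """Number of strides of at least `step_length` needed to walk the path."""
    anchor = xy[0]
    n_steps = 0
    for point in xy[1:]:
        if np.hypot(point[0] - anchor[0], point[1] - anchor[1]) >= step_length:
            anchor = point      # the stride ends here; start the next one
            n_steps += 1
    return n_steps


def richardson_dimension(x, y, step_lengths):
    """Negative slope of log(number of steps) versus log(step length)."""
    xy = np.column_stack([x, y]).astype(float)
    counts = np.array([count_divider_steps(xy, s) for s in step_lengths], dtype=float)
    slope, _intercept = np.polyfit(np.log(step_lengths), np.log(counts), 1)
    return float(-slope)


# Synthetic straight path (not experimental data): expected value close to 1
x_line = np.arange(1001, dtype=float)
y_line = np.zeros(1001)
dr_line = richardson_dimension(x_line, y_line, [4, 8, 16, 32, 64])
```

The step lengths should span the range over which the log–log plot is approximately linear; every step length must be short enough to yield at least one stride.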

Measures of fractal analysis comparable to CFD and D_{R} have been
used to describe behavioural sequences, such as swimming and foraging in
clownfish (Coughlin et al.,
1992), trails in mites (Dicke
and Burrough, 1988), reproductive behaviour in fathead minnows
(Alados and Weber, 1999),
social behaviour in chimpanzees (Alados and
Huffman, 2000) and head lifting during feeding behaviour in ibex
(Alados et al., 1996).

#### 3. The Lempel–Ziv complexity (LZC)

The LZC is a sequence-sensitive measure that characterizes the structure of time-varying signals as a series of symbols (Lempel and Ziv, 1976; Ziv and Lempel, 1978). The spatial difference in the fish's position between two consecutive points in time is compared, generating a time series of incremental distance travelled. This distance function is simplified by partitioning it into a binary symbol sequence about the median increment size. For example, a typical sequence might be 'aabaabbab' where 'a' symbolises values less than the median and 'b' symbolises those greater. Then, the LZC is calculated for the resulting symbol sequence. It reflects the number of sub-strings in the sequence (e.g. aab) and the rate at which they occur. This measure will therefore give information about the redundancy (or lack thereof) of a trajectory, for example about the irregularity of its velocity. Kurths et al. (1995) used this method for analysing heart rate variability in an investigation of predictors of sudden cardiac death, while Gu et al. (1994) and Xu et al. (1998) found differences in the electroencephalograms (EEGs) of healthy controls and psychotics with symbolic dynamics. Subsequently, it was shown that the complexity of multichannel EEGs of healthy controls is sensitive to changes in behaviour (Watanabe et al., 2002; this reference includes a review of the associated literature). The value of LZC increases approximately linearly with the number of measurements in the time series and attains a maximum with random numbers (Rapp et al., 2001a). For data sets of the length used in this study (9000), a maximum of approximately 700 would be expected.
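The symbolisation and complexity count described above can be sketched as follows. The parsing follows the exhaustive-history scheme of Lempel and Ziv (1976); assigning increments equal to the median to 'b' is an assumption, and the function names are hypothetical.

```python
import numpy as np


def lempel_ziv_complexity(symbols):
    """Number of components in the Lempel-Ziv (1976) parsing of a string."""
    n = len(symbols)
    i = 0
    n_components = 0
    while i < n:
        k = 1
        # Lengthen the candidate sub-string while it can still be copied
        # from the preceding text (overlap with the candidate is allowed).
        while i + k <= n and symbols[i:i + k] in symbols[:i + k - 1]:
            k += 1
        n_components += 1
        i += k
    return n_components


def binarize_about_median(values):
    """'a' for values below the median, 'b' otherwise (ties go to 'b')."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    return "".join("a" if v < median else "b" for v in values)


def trajectory_lzc(x, y):
    """LZC of the frame-to-frame incremental distance travelled."""
    increments = np.hypot(np.diff(x), np.diff(y))
    return lempel_ziv_complexity(binarize_about_median(increments))


# Textbook parsing example: 0|001|10|100|1000|101 has six components
lzc_example = lempel_ziv_complexity("0001101001000101")
symbols_example = binarize_about_median([1, 5, 2, 8])
```

For a random binary sequence of length n, the count approaches n/log2(n), which is about 685 for n = 9000 and thus consistent with the maximum of approximately 700 quoted above.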

#### 4. The Hurst exponent (HE)

The HE measures persistence – the tendency of large displacements to
be followed by large displacements (e.g. an increase is followed by an
increase) and small displacements to be followed by small displacements–
and anti-persistence, which is the tendency of large displacements to
be followed by small displacements (e.g. an increase is followed by a
decrease) and *vice versa* (Hurst,
1951; Hurst et al.,
1965; Feder, 1988;
Bassingthwaighte et al., 1994).
In other words, it describes how deterministic a trajectory is, i.e. the
extent to which a future component of the trajectory is specified by
components of its past. Theoretically, its range of possible values is 0 to 1,
with 0.5 as the crossover point between anti-persistence and persistence
(since it is estimated from the log–log plot of variability
*versus* epoch length, uncertainty in curve fitting can expand this
range slightly). An HE of 0.5 would be obtained if the trajectory was
indistinguishable from a random walk. Biological applications of the HE have
included investigations of heart interbeat interval sequences
(DePetrillo et al., 1999;
Sherman et al., 2000) and
pulmonary dynamics (Zhang and Bruce,
2000).
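One common way to estimate such an exponent is the classical rescaled-range (R/S) analysis, sketched below. Whether the HE reported here was computed with this particular estimator is an assumption; the function names, window sizes and synthetic series are hypothetical.

```python
import numpy as np


def rescaled_range(segment):
    """Range of the cumulative deviations from the mean, divided by the s.d."""
    deviations = segment - segment.mean()
    cumulative = np.cumsum(deviations)
    spread = segment.std()
    if spread == 0:
        return np.nan           # constant segment: R/S is undefined
    return (cumulative.max() - cumulative.min()) / spread


def hurst_exponent(series, window_sizes):
    """Slope of log(mean R/S) versus log(window length)."""
    series = np.asarray(series, dtype=float)
    mean_rs = []
    for w in window_sizes:
        n_windows = len(series) // w
        segments = series[:n_windows * w].reshape(n_windows, w)
        mean_rs.append(np.nanmean([rescaled_range(seg) for seg in segments]))
    slope, _intercept = np.polyfit(np.log(window_sizes), np.log(mean_rs), 1)
    return float(slope)


# Synthetic series (not experimental data)
rng = np.random.default_rng(0)
noise = rng.normal(size=4096)          # uncorrelated increments
windows = [16, 32, 64, 128, 256, 512]
h_noise = hurst_exponent(noise, windows)
h_walk = hurst_exponent(np.cumsum(noise), windows)  # strongly persistent series
```

Uncorrelated increments should give a value near 0.5 (finite-sample estimates tend to lie slightly above it), whereas the strongly persistent series gives a value near 1.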

#### 5. Relative dispersion (R. Disp.)

R. Disp. measures the dependence of signal variance on the duration of the
dataset. It ranges from 1.0 to 1.5
(Boffetta et al., 1999) and
quantifies the change in the uncertainty in a time series' mean value as the
observation period increases. Practically, the R. Disp. is the slope of the
linear region of a log–log plot of the coefficient of variation of a
signal *versus* the length of the data set. Its primary applications
have been in the analysis of the physics of turbulent flow
(Pedersen et al., 1996;
Willis et al., 1997) but it
has also been used in the quantitative characterization of pulmonary perfusion
(Klocke et al., 1995;
Capderou et al., 2000).
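A dispersional analysis of this kind can be sketched as follows. The sketch assumes the common convention in which the signal is averaged within bins of increasing length, the coefficient of variation of the bin means is plotted against bin length on log–log axes, and the reported exponent is 1 minus the fitted slope, so that uncorrelated noise gives 1.5 and a perfectly uniform signal gives 1.0. The exact convention used for R. Disp. in this study is not stated here, and the function name and bin sizes are hypothetical.

```python
import numpy as np


def relative_dispersion_exponent(series, bin_sizes):
    """1 - slope of log(CV of bin means) versus log(bin length)."""
    series = np.asarray(series, dtype=float)
    rd = []
    for m in bin_sizes:
        n_bins = len(series) // m
        bin_means = series[:n_bins * m].reshape(n_bins, m).mean(axis=1)
        # Relative dispersion = s.d. / mean of the aggregated signal
        rd.append(bin_means.std(ddof=1) / bin_means.mean())
    slope, _intercept = np.polyfit(np.log(bin_sizes), np.log(rd), 1)
    return float(1.0 - slope)


# Synthetic uncorrelated signal with a non-zero mean (not experimental data)
rng = np.random.default_rng(1)
signal = rng.normal(loc=10.0, scale=1.0, size=8192)
rdisp_noise = relative_dispersion_exponent(signal, [1, 2, 4, 8, 16, 32, 64])
```

The signal must have a non-zero mean, and only the linear region of the log–log plot should enter the fit.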

All of the algorithms used to calculate these measures are sensitive to
noise in the data, non-stationarities in the underlying dynamics and the
temporal duration of the examined epoch. For example, filtered noise can mimic
low-dimensional chaotic attractors (Rapp
et al., 1993) and, if inappropriately applied, the method of
surrogate data (which is used to validate dynamical calculations) can give
false-positive indications of non-random structure (Rapp et al.,
1994,
2001b). These are central
concerns if one is trying to establish the absolute value of one of these
measures, such as the true value of the D_{R}. However, this is a less
crucial consideration in the present investigation because we do not presume
to calculate the value of any measure in an absolute sense. Rather, we are
computing approximations of these empirical measures, which nonetheless may be
of value in the classification of these signals. The efficacy of these
computed values in the classification was assessed quantitatively in the
course of the discriminant analysis, as described below.

### Discriminant analysis

A multivariate discrimination was constructed to ask specific questions
about the behavioural data. For example, can locomotor performance be
distinguished between individual fish? For this purpose, each swimming
trajectory was represented by its set of values calculated for the five
nonlinear measures described above plus its mean velocity. Since it was
possible that no measure alone would provide consistent results for such
discrimination, all the measures were incorporated into the discriminant
analysis and then their relative contributions to the classification process
were assessed, as described in the Results section. The discriminant analysis
is thus based on these six measures: each trajectory is represented by a
vector of six values, and all calculations are made between these vectors in a
six-dimensional measure space. All statistical procedures used are explained in
mathematical detail by Rapp et al.
(2002).
*P*_{SAME}(Fish A, Fish B) is defined as the probability that
the six-dimensional measurement distributions corresponding to Fish A and Fish
B were drawn from the same parent distribution. The estimate of failure in a
pairwise discrimination is *P*_{ERROR}(Fish A, Fish B). This is
the theoretically estimated probability that a trajectory from Fish A will be
incorrectly classified as a Fish B trajectory and *vice versa*. Note
that *P*_{ERROR} is not the same as *P*_{SAME}
and can be much larger. For example, a previous report
(Rapp et al., 2002) included
an example in which *P*_{SAME}=3.2×10^{–13}
while *P*_{ERROR}=0.32, which is relatively large, given that
the maximum possible error in a pairwise discrimination (the error rate
corresponding to random assignment) is *P*_{ERROR}=0.5. A
disparity between *P*_{ERROR} and *P*_{SAME}
occurs because they address different questions. *P*_{SAME}
determines if the means of two multivariate distributions are significantly
different. For cases where only one measure is used, *P*_{SAME}
is identical to the probability calculated in a *t*-test.
*P*_{SAME} can be very small even when the two distributions
overlap. However, if the distributions do overlap, which is the case here,
there can be considerable error in a between-group classification.

Two classification criteria were used for *P*_{SAME} and
*P*_{ERROR}. The first classification is based on the minimum
Mahalanobis distance (Lachenbruch,
1975). In the context of the six-dimensional measure space, the
Mahalanobis distance is a generalized mathematical distance between the vector
from the single trajectory that is to be classified and the collection of
measure vectors calculated from all of the trajectories obtained from one of
the fish. The test trajectory is deemed to be a member of the group
corresponding to the smallest Mahalanobis distance. The second procedure for
classifying a trajectory is based on the Bayesian likelihood
(McLachlan, 1992). The
trajectory's vector is classified into the group corresponding to the maximum
Bayesian membership probability. Both classification schemes incorporate a
correction for correlations between the measures, ensuring that dynamically
similar measures do not bias the classification results. In practice, the two
procedures usually give identical results. Cases where results differ
correspond to classification with low confidence levels. Finally, as the
descriptive analysis did not reveal consistent time-dependent differences
between three successive 5-min trajectories for most measures, this variable
was not incorporated into the discriminant analysis.
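The minimum-distance criterion can be sketched as follows. This is a minimal illustration, assuming that each group's own sample covariance matrix is used to account for correlations between measures; the procedure of Rapp et al. (2002) may differ in detail (for example, in using a pooled covariance), and the group labels and synthetic data are hypothetical.

```python
import numpy as np


def mahalanobis_to_group(vector, group):
    """Mahalanobis distance from one measure vector to a group of vectors."""
    mean = group.mean(axis=0)
    cov = np.cov(group, rowvar=False)   # captures correlations between measures
    diff = vector - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def classify_min_distance(vector, groups):
    """Assign the vector to the group with the smallest Mahalanobis distance."""
    distances = {label: mahalanobis_to_group(vector, g) for label, g in groups.items()}
    return min(distances, key=distances.get), distances


# Synthetic library (not experimental data): 15 trajectories x 6 measures per fish
rng = np.random.default_rng(42)
groups_demo = {
    "fish_A": rng.normal(loc=0.0, scale=1.0, size=(15, 6)),
    "fish_B": rng.normal(loc=5.0, scale=1.0, size=(15, 6)),
}
label_demo, distances_demo = classify_min_distance(np.full(6, 5.0), groups_demo)
```

Dividing by the covariance matrix is what prevents two strongly correlated measures from being counted twice.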

A distinction should be made between the out-of-sample classifications used in this study and within-sample classification. When an out-of-sample classification is performed, the trajectory to be classified is removed from the library before the classification is calculated. For this reason, the error rates of out-of-sample classifications are always greater than, or at best equal to, the error rates obtained using within-sample classifications, where the trajectory to be classified remains in the library during the calculation. If the number of elements in each group is small (here, there are 15 trajectories for each fish), the disparity between within-sample and out-of-sample classifications can be large. A comparison showing how within-sample classifications can give unrealistically optimistic results is given in Watanabe et al. (2002).
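The difference between the two procedures can be sketched with a leave-one-out loop. The sketch assumes a minimum Mahalanobis distance classifier with per-group covariance matrices; the function names, group labels and synthetic data are hypothetical.

```python
import numpy as np


def nearest_group(vector, groups):
    """Label of the group with the smallest squared Mahalanobis distance."""
    def squared_distance(group):
        diff = vector - group.mean(axis=0)
        return diff @ np.linalg.solve(np.cov(group, rowvar=False), diff)
    return min(groups, key=lambda label: squared_distance(groups[label]))


def error_rate(groups, out_of_sample):
    """Fraction of vectors assigned to the wrong group."""
    errors = 0
    total = 0
    for label, group in groups.items():
        for i, vector in enumerate(group):
            library = dict(groups)
            if out_of_sample:
                # Remove the test vector from its own group before classifying
                library[label] = np.delete(group, i, axis=0)
            errors += nearest_group(vector, library) != label
            total += 1
    return errors / total


# Synthetic, overlapping groups (not experimental data): 15 vectors x 6 measures
rng = np.random.default_rng(7)
library_demo = {
    f"fish_{k}": rng.normal(loc=0.8 * k, scale=1.0, size=(15, 6)) for k in range(3)
}
within_error = error_rate(library_demo, out_of_sample=False)
out_error = error_rate(library_demo, out_of_sample=True)
```

Removing a vector from its own group can only increase its distance to that group, so the out-of-sample error rate cannot fall below the within-sample rate.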

## Results

### Characterization of swimming trajectories

A representative trajectory (Fig.
1A) is characterized by a predominance of swimming along the
circumference of the cylindrical tank ('wall hugging' effect;
Warren and Callaghan, 1975;
Steele, 1983;
Kato et al., 1996), which is
occasionally interrupted by swimming across the centre and by changing speed
(very fast swimming is indicated by a clear separation between successive data
points) and/or direction. Periods of fast swimming were observed as swimming
in circles along the wall, without significant change in direction, and as
occasional fast sprints across the centre of the tank. Additionally, fish did
not only swim forward but sometimes propelled themselves backward, which is
not obvious with visual inspection of a trajectory. In general, a trajectory
gives the impression of moderate irregularity. However, there are also
restricted areas signalled by path components of higher density, mostly along
the wall, and visual inspection of the video tapes suggests they correspond to
small turning movements of the fish while facing the wall or to periods of
inactivity. Swimming in the centre occurs in a different way as the fish swims
more calmly and slowly without generating a dense accumulation of path
components. The instantaneous velocity calculated from the trajectory in
Fig. 1A is shown as a function
of time in Fig. 1B. It reveals
high variability within the 5-min recording period. During this epoch, the
instantaneous velocity of this trajectory ranged from 0 mm
s^{–1} to 460 mm s^{–1}, with a mean value of
49±45 mm s^{–1} (mean ± s.d.). The
velocity trace displays several fast bursts, prolonged periods of slower
swimming and periods of inactivity. For this trajectory, the characteristic
fractal dimension (CFD) is 1.609, indicating that the trajectory is not
straight (if straight, CFD=1). The Richardson dimension (D_{R}) is
1.002, which would appear to suggest minimal deviation from a straight line
and therefore to be in conflict with the CFD. As previously mentioned, the
D_{R} additionally incorporates sensitivity to the measurement scale
while the CFD depends upon the duration of observation. The two measures can
also diverge in the case of noisy data or data digitised over a small range of
values. However, repeated analysis of individual trajectories indicated that
the measurements are not compromised by noise or a limited range. The
Lempel–Ziv complexity (LZC) of this trajectory is 242. Since the
expectation for a purely random trajectory is approximately 700, this result
therefore indicates that velocity does not vary randomly. The Hurst exponent
(HE) is 0.938, indicating that the trajectory is highly persistent; in other
words, its components are determined, or preserved, and thus the trajectory
corresponds to uniform or consistent motion. Finally, the relative dispersion
(R. Disp.) is 1.188. This value is close to the midrange of this measure and
indicates that the mean value of the time series is relatively stable as a
function of time.

Swimming trajectories of different fish are dissimilar in appearance
(Fig. 2). The distribution of
path components in the centre *versus* the periphery of the tank seems
to be most variable. For example, the three consecutive 5-min trajectories of
Fish 2 in Fig. 2A show more
time spent in the centre of the tank than do the three consecutive
trajectories of Fish 5 in Fig.
2B, which indicate relatively little time spent in the centre or
traversing it. Instead, in Fig.
2B there is rather more accumulation of path components near the
wall, sometimes forming dense patches, which are not seen in the trajectories
of Fig. 2A. In addition, there
is session-to-session variation in an individual fish's trajectories, as seen
by comparing the first (Fig.
2B) and fifth (Fig.
2C) recording sessions of Fish 5. In the fifth session, there is a
greater tendency to explore the centre than in the first session. This
difference is reflected in a significant difference between the mean
velocities of the two 15-min sessions (62.6 mm s^{–1}
*vs* 58.8 mm s^{–1}; *P*<0.002). Also, there is
greater variability between the three successive trajectories of the fifth
session than between those in the first session; indeed, the third 5-min
trajectory of Fig. 2C more
closely resembles those of the first session (Fig. 2B) than do the two trajectories
preceding it, as it is denser at the periphery and exhibits a smaller number
of excursions to the centre of the tank.

An initial overall impression of the nonlinear dynamical analysis can be
obtained by determining the range and variability of each measure determined
across all five fish. These results are stated in
Table 1. The coefficient of
variation, CV=(s.d./mean)×100%, provides a quantitative
characterization of the degree of spread in the observed dynamical measures. A
high degree of variation is observed for some measures. The mean velocity has
the highest CV (30.2%) and a nearly 10-fold range in values, and the CV of the
LZC is also high (25.7%). By contrast, the CVs of the CFD, the D_{R}
and the R. Disp. are less than 7%. With the exception of D_{R}, the
mean values of the nonlinear measures are all consistent with properties of a
complex dynamical behaviour. The data summarized in
Table 1 are displayed
separately for each fish in Table
2. The latter results were obtained by averaging the values from
all recording sessions (five per fish) and all trajectories (three for each
recording session). Appreciably different values were obtained for each fish.
Nevertheless, given the large s.d.s, the between-fish distributions
overlap. Mean velocity values are similar for Fish 1 and Fish 4 and for Fish 3
and Fish 5. This pattern was repeated for two of the nonlinear measures, CFD
and D_{R}, but not for the other three. In general, there did not seem
to be a consistent relationship between the mean values of different
parameters and individual fish, suggesting that the measures, which, with the
exception of mean velocity, are empirical, reflect different properties of the
swimming trajectories.
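The coefficient of variation defined above, CV=(s.d./mean)×100%, can be computed as in the sketch below. Use of the sample standard deviation (n–1 denominator) is an assumption, and the input values are synthetic.

```python
import numpy as np


def coefficient_of_variation(values):
    """CV = (s.d. / mean) x 100%, using the sample standard deviation."""
    values = np.asarray(values, dtype=float)
    return float(values.std(ddof=1) / values.mean() * 100.0)


# Synthetic values (not experimental data): mean 50, s.d. 10
cv_demo = coefficient_of_variation([40.0, 50.0, 60.0])
```

Because the CV is dimensionless, it allows the spread of measures with different units and scales, such as mean velocity and D_{R}, to be compared directly.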

Three of the six measures have time-dependent changes during the 15-min
recording periods. Mean velocity decreased by 23% from the first to the last
5-min recordings (from 58.41 mm s^{–1} to 45.01 mm
s^{–1}), and the mean CFD decreased by 5% from 1.62 to 1.54. By
contrast, the average LZC increased by 15% from 214 to 248, while the other
measures did not change appreciably. Since the data were pooled for multiple
exposures of the five fish, a repeated-measures analysis of variance (ANOVA)
was used to ask if there were significant changes in a given measure between
the three subsequent 5-min epochs of a 15-min recording session. The results,
shown in Table 3, indicate
significant differences (*P*<0.015) between subsequent 5-min
trajectories for mean velocity and CFD. Also, in the case of LZC, the first
5-min trajectory was significantly different from both the second and third
ones. These time-dependent changes in the six measures relative to each other
during a 15-min recording are illustrated in
Fig. 3. Values for each measure
are normalized with respect to the corresponding values obtained in the first
5-min trajectory. The repeated-measures ANOVA was also used to ask if there
were differences between the five subsequent sessions in which data were
collected from each fish, and the results were negative. Since the changes
that occurred within a 15-min recording session were minimal, the discriminant
analysis did not treat successive 5-min trajectories separately.

### Discriminant analysis classifies individual fish

Three questions were addressed in the discriminant analysis: (1) based on
the application of these six dynamical measures, would it be possible to
conclude that the five fish are different; (2) given a trajectory and its
dynamical characterization, would it be possible to correctly determine which
fish produced the trajectory and (3) of the six measures used, which ones were
the most effective in discriminating between different fish? These questions
were addressed by performing a discriminant analysis based on the six
measures, with each fish providing a total of 15 trajectories. For this
analysis, no distinction was made between first, second and third 5-min
trajectories. Using these measures, we calculated
*P*_{SAME}(Fish A, Fish B), which is the probability that the
six-dimensional measurement distributions corresponding to Fish A and Fish B
were drawn from the same parent distribution (see Materials and methods). The
results from the 10 possible pairwise discriminations are shown in
Table 4. As an example from
that table, it is seen that *P*_{SAME}(1,
2)=0.19×10^{–5}; that is, the probability that Fish 1 and
Fish 2 trajectories were produced by the same fish is
0.19×10^{–5}. We conclude that Fish 1 and Fish 2 have very
different dynamical profiles. The largest value of *P*_{SAME}
is *P*_{SAME}(3, 4)=0.9×10^{–2}. While Fish
3 and Fish 4 are the most similar, even in this case the probability that
these trajectories were obtained from the same fish is less than 1%. Given the
very low value of *P*_{SAME}, it might be supposed that a
classification of a single trajectory amongst the five fish would be highly
accurate. However, this is not necessarily the case.

*P*_{ERROR} is a theoretical prediction of the pairwise
classification error, using the between-group Mahalanobis distance. In the
present study, using six measures, the theoretical *P*_{ERROR}
for the 10 pairwise calculations was less than 0.07 in eight cases and ranged
from 0.003 (Fish 2 *vs* Fish 4) to a maximum of 0.1118 (Fish 3
*vs* Fish 4).
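The link between the between-group Mahalanobis distance and the predicted pairwise error can be illustrated with the textbook result for two Gaussian groups sharing a covariance matrix and having equal prior probabilities, for which the error is Φ(–D/2), where Φ is the standard normal cumulative distribution and D the Mahalanobis distance. That *P*_{ERROR} in this study was obtained from exactly this expression is an assumption; the function name is hypothetical.

```python
import math


def pairwise_error_from_distance(mahalanobis_distance):
    """Phi(-D/2): expected pairwise misclassification rate for two equal-covariance
    Gaussian groups with equal priors."""
    return 0.5 * math.erfc(mahalanobis_distance / (2.0 * math.sqrt(2.0)))


# Identical groups (D = 0) cannot be separated better than by chance
error_at_zero = pairwise_error_from_distance(0.0)
error_at_three = pairwise_error_from_distance(3.0)
```

Under this expression the error falls from its chance value of 0.5 at D=0 towards zero as the groups move apart.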

The error rate also can be determined empirically by performing a classification. The results of an out-of-sample classification are shown in Table 5 for both minimum Mahalanobis distance and maximum Bayesian likelihood criteria, respectively. For example, the entry 13/12 in the Fish 1–Fish 1 box means that 13 out of 15 Fish 1 trajectories were classified as Fish 1 using the minimum distance criterion and 12 were correctly classified as Fish 1 using the maximum likelihood criterion. The entry 2/3 in the Fish 1–Fish 5 box means that two Fish 1 trajectories were classified as Fish 5 using minimum distance and three Fish 1 trajectories were classified as Fish 5 using maximum likelihood as the criterion. Thus, more than 75% of the trajectories from Fish 1, 2 and 5 were correctly classified with both criteria. Also, a comparison based on mean velocity alone suggests similarities between Fish 1 and 4 and between Fish 3 and 5; the discriminant analysis, which uses six measures, does not often confuse these fish.

The expectation error rate is the error rate that would be observed if the classifications were performed randomly. There are five fish. If trajectories were assigned randomly, four out of five trajectories would be misclassified. This gives an expectation error rate of 80%. For these data, the overall error rate using minimum Mahalanobis distance as the classification criterion is 36%. The overall error rate using the maximum Bayesian likelihood is 28%.
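The arithmetic behind these error rates can be sketched as follows. The confusion matrix in the example is hypothetical and serves only to show the calculation; it does not reproduce Table 5.

```python
def expectation_error_rate(n_groups):
    """Error rate expected if trajectories were assigned to groups at random."""
    return 1.0 - 1.0 / n_groups


def overall_error_rate(confusion):
    """Misclassified fraction from a square confusion matrix
    (rows: true group; columns: assigned group)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


chance_five_fish = expectation_error_rate(5)

# Hypothetical two-group confusion matrix, 15 trajectories per group
hypothetical_confusion = [[13, 2],
                          [3, 12]]
hypothetical_error = overall_error_rate(hypothetical_confusion)
```

With five fish the chance level is 0.8, so any empirical error rate should be judged against 80% rather than against 50%.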

The third question to be addressed with discriminant analysis was, `of
the measures used, which were the most effective in discriminating between
different fish?'. This question is not easily answered when there are five
groups (five fish) as opposed to only two. In the case of a pairwise,
two-group comparison, a measure's coefficient of determination establishes the
amount of total between-group variance that can be accounted for by the
measure (Flury and Riedwyl, 1988). Thus, the larger a measure's coefficient of determination,
the more effective it is in discriminating between groups. A large coefficient
of determination corresponds to a large between-group Mahalanobis distance
(specifically, the partial derivative of the coefficient of determination with
respect to the Mahalanobis distance is positive). The effectiveness of the six
measures in the 10 pairwise between-group discriminations has been assessed
empirically. Table 6 gives the
rank ordering of the coefficients of determination for each measure for each
pairwise discrimination (ordered from the largest to the smallest). For
example, when Fish 1 and Fish 2 are compared, the HE is most effective in
discriminating between the two groups while the D_{R} is the least
effective. When the rank ordering of the 10 pairwise discriminations is
compared, none of the measures stands out as being exceptionally effective.
However, if the rank order is treated as a score for each pair, the data
indicate that the D_{R} and the HE have the lowest cumulative scores,
suggesting they are the most effective. Interestingly, the mean values of
these two measures (Table 1)
are consistent with trajectories that are relatively stable or determined
(i.e. mean of HE=0.82 indicates a high degree of persistence and mean
D_{R}=1.06 indicates high similarity to a straight line trajectory).
The lack of a consistent pattern in the results presented in
Table 6 is not surprising,
since our results established that the fish trajectories are highly
individualistic (Table 4) using
a statistic, *P*_{SAME}, that combines all six measures.
Another approach for obtaining an estimate of the comparative effectiveness of
each dynamical measure is to calculate each measure's average coefficient of
determination, taking the average over the 10 pairwise discriminations. These
average values are shown in Table 7 and again suggest that D_{R} and the HE are the
most effective measures when used alone.
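The two summaries described above, a cumulative rank score and an average coefficient of determination per measure, can be sketched as follows. The coefficient values are random placeholders and are not the data of Table 6 or Table 7; the function name is hypothetical, and ties between coefficients are not handled.

```python
import numpy as np

MEASURES = ["CFD", "D_R", "HE", "relative dispersion",
            "Lempel-Ziv complexity", "mean velocity"]

def rank_scores(coef):
    """coef: (n_pairs, n_measures) coefficients of determination.
    Returns the cumulative rank score and mean coefficient per measure."""
    coef = np.asarray(coef, dtype=float)
    # Rank 1 = largest coefficient within each pairwise comparison
    ranks = (-coef).argsort(axis=1).argsort(axis=1) + 1
    return ranks.sum(axis=0), coef.mean(axis=0)

# Placeholder values only, NOT the published coefficients
rng = np.random.default_rng(1)
coef = rng.uniform(0.0, 0.5, size=(10, len(MEASURES)))
scores, means = rank_scores(coef)

# A lower cumulative score means the measure ranked higher more often
for name, s, m in sorted(zip(MEASURES, scores, means), key=lambda t: t[1]):
    print(f"{name:22s} cumulative rank = {int(s):2d}  mean coefficient = {m:.3f}")
```

The two summaries need not agree: the rank score ignores how large the differences between coefficients are, whereas the average can be dominated by a few pairwise comparisons.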

## Discussion

The results demonstrate that a set of nonlinear measures can be used in a discriminant analysis, or classification system, to distinguish between swimming trajectories of individual fish. That is, any two trajectories generated from different fish are distinguishable with a high confidence level. This discrimination is possible only when those nonlinear measures, along with the linear measure mean velocity, are applied collectively, as no single measure has a high coefficient of determination. The results also show that the nonlinear measures used here potentially provide a perspective on a basic behaviour, swimming in a sparse environment, that complements insights obtained with more classical kinematic measures. In general, the values for the different measures suggest that swimming is not purely random but is rather complex, with detectable redundancy.

### Interpretation of fish locomotion with nonlinear measures

Although they are empirical, the tools of nonlinear dynamical analysis are increasingly being used in the analysis of biological phenomena (Faure and Korn, 2001; Giesinger, 2001), including continuously recorded behavioural sequences. One rationale is that, since these measures are sensitive to the spatio-temporal structure of a sequence, they might reveal hidden structures in those continuous signals. Indeed, studies have shown that the examination of behavioural data that appeared to be random can reveal highly non-random components when analysed with sequence-sensitive nonlinear measures. For example, a number of behaviours have been described as fractal, from spontaneous locomotion (Dicke and Burrough, 1988; Coughlin et al., 1992; Motohashi et al., 1993; Cole, 1995) and foraging (Alados and Weber, 1999) in diverse species to social behaviour in chimpanzees (Alados and Huffman, 2000) and feeding-related activities in goats (Alados et al., 1996). Related tools have also been used to successfully analyse the pattern of transitions between periods of active swimming and inactivity (Faure et al., 2003). As discussed below, this type of analysis might be effectively employed to reveal subtle changes in locomotion not revealed with classical means.

The five nonlinear measures applied in the present study are empirical measures of complexity of swimming behaviour, and each reduces a trajectory to a single value. With the exception of the Richardson dimension, the values of these nonlinear measures are consistent with the notion that goldfish swimming in even a relatively sparse environment is a mixture of random and nonlinear deterministic activities. Their empirical nature may explain the finding that two of the measures, the characteristic fractal dimension and Richardson dimension, which are expected to reflect similar properties, often diverged.

The degree of complexity exhibited in locomotor behaviour and other
behavioural patterns can depend on the environment
(Coughlin et al., 1992;
Motohashi et al., 1993;
Anderson et al., 1997). Spatial
and temporal complexity of foraging trajectories, for example, can be
correlated with the pattern of occurrence of food sources
(Cole, 1995;
Viswanathan et al., 1996).
Similarly, some bird species exhibit nonlinearities in vigilance behaviour
(Ruxton and Roberts, 1999),
and correlations have been drawn between fractal complexity and the ability to
cope with the environment, such as in the presence of toxins or stress
(Alados et al., 1996;
Alados and Weber, 1999;
Alados and Huffman, 2000). One
can thus speculate that fish exposed to an environment more heterogeneous than
that used in the present study would generate swimming trajectories with
higher values of CFD and D_{R}, indicative of a more fractal nature.
Such an experimental design would give more insight into the extent to which the
environment might influence the nonlinear properties and their underlying
components.

The nonlinear measures and discriminant analysis employed here may then be applied to detect subtle changes in behavioural sequences altered by changes in the environment. Fish behaviour is increasingly important in toxicology, and it has already been shown that fractal dimension could serve as a sensitive measure for quantifying differences in locomotor activity during sublethal exposure to toxic contaminants (Motohashi et al., 1993; Alados and Weber, 1999; Brewer et al., 2001). The application of multiple measures, including a linear one, may well enhance such discriminations. Indeed, preliminary data, obtained using this methodology to distinguish swimming trajectories of goldfish exposed to low dosages of Malathion, a pesticide and neurotoxin, confirm this expectation (Neumeister et al., 2001).

Exposure to a novel environment for a continuous period or for several discrete periods will, in general, result in a gradual decrease of locomotor activity over the course of several days or weeks (Russell, 1973; Warren and Callaghan, 1976; Clark and Ehlinger, 1987). Novelty represents a potentially stressful situation (Russell, 1973; Csányi and Tóth, 1985; Gervai and Csányi, 1986). For example, male guppies initially show high velocity swimming at the periphery of an open field, and it has been suggested that this activity is related to some degree of fear (Warren and Callaghan, 1976). In the present study, a relatively small but significant decrease during the 15-min period was detected not only in mean velocity but also in CFD and Lempel–Ziv complexity. The results in the CFD are consistent with reports that fractal dimension decreases in conditions characterized as stressful (Alados et al., 1996; Alados and Weber, 1999; Alados and Huffman, 2000). Nevertheless, this modification with time can be subtle, and it remains to be seen if further development of the discriminant analysis would benefit by treating successive 5-min trajectories separately.

### Classifying trajectories

Multivariate discriminant analysis, which allowed us to classify swimming trajectories to the fish that generated them, has a long and successful history in the physical and biological sciences (Lachenbruch, 1975; McLachlan, 1992). The combination of discriminant analysis with nonlinear measures is, however, comparatively recent (Rapp et al., 2002; Watanabe et al., 2002). In the present study, a discriminant analysis based on six measures was used to characterize between-group differences and to classify individuals amongst the groups, with each fish defining its own group. Five fish were used and five recordings consisting of three consecutive 5-min trajectories were obtained from each fish. Thus, in the language of discriminant analysis, there are five groups, 15 elements in each group and a six-dimensional measure space.

As outlined above, we addressed a sequence of three questions. First, we
asked if we are able to conclude that the fish are different, computing
*P*_{SAME} for each pair of fish. Although direct visual
observation of the fish did not suggest that their swimming behaviour was
dramatically different, the calculations of *P*_{SAME} indicate
that trajectories are highly individual, and each fish has a very different
swimming profile.

We then addressed the problem of classification of individual 5-min
trajectories among the five possible groups, by calculating
*P*_{ERROR} for each pairwise classification. As expected (see
Results), *P*_{ERROR} is larger than *P*_{SAME},
with an average value of 5.7%. However, *P*_{ERROR} is a
theoretical estimate of the error in a pairwise classification based on the
between-group Mahalanobis distance
(Lachenbruch, 1975). An
empirical test of this classification was produced by computing an
out-of-sample classification that used the minimum individual-to-group
Mahalanobis distance as the classification criterion. It gave an error rate of
36%, in contrast to the expected error rate obtained with random assignment of
80%. The error rate using maximum Bayesian likelihood as the assignment
criterion was lower still, at 28%.

It might seem surprising that, while the average *P*_{ERROR}
is 5.7%, the empirically determined classification error rate is greater. Yet,
*P*_{ERROR} is the predicted error rate in a single pairwise
classification. The empirically determined error rate is more appropriately
compared against a classification procedure based on a sequence of pairwise
classifications in which several individual pairwise errors accumulate to
produce the overall result. When the distinction between pairwise and global
error is taken into account, it is seen that the error rates are similar.
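The accumulation of pairwise errors can be illustrated with a rough calculation. Assume, purely for illustration, that assigning a trajectory to the correct fish among five requires four pairwise comparisons to come out correctly, that these comparisons are independent, and that each has the average error of 0.057. Then

$$
P_{\mathrm{GLOBAL}} \approx 1 - \left(1 - \bar{P}_{\mathrm{ERROR}}\right)^{4} = 1 - (0.943)^{4} \approx 0.21 ,
$$

which is bounded above by $4 \times 0.057 \approx 0.23$. These figures are of the same order as the empirical rates of 28% and 36%. The independence assumption is unlikely to hold exactly, and the individual pairwise errors range from 0.003 to 0.1118, so this is only an approximate guide and not the calculation performed in this study.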

The third question concerned the identification of the measure or measures
that were most successful in discriminating between fish. This was
investigated by calculating the coefficient of determination in each pairwise
classification for each measure. The results indicated that no single measure
emerged as the most effective. However, it was possible to conclude that the
nonlinear measures were more effective than the mean velocity, with the most
effective being the HE and D_{R}. The values of these two measures are consistent with the
general conclusion that the trajectories of fish swimming in a sparse environment have a
relatively low degree of complexity.

It should be recognized that the ability to classify any given trajectory is limited. To introduce an analogy, we can prove that fingerprints are highly individual but we cannot usually base a positive identification on a single fingerprint. We should point out that these conclusions are dependent on the measures used in this study. The application of additional measures to these data might result in an improvement in the classification calculations. Thus, the results presented here are, in a sense, a worst-case calculation.

### Individuality

The discriminant analysis of swimming trajectories using nonlinear dynamical measures established convincingly that fish locomotion is highly individualistic. Recent ethological and psychological studies have revealed individual differences in many species (Clark and Ehlinger, 1987; Bell, 1991; Mather and Anderson, 1993; Boissy and Bouissou, 1995). As already mentioned, most of these studies concerned higher order behaviours. To our knowledge, idiosyncratic variability in fish swimming has not been the subject of previous investigations, although it has been noted (Kleerekoper et al., 1974). Locomotion serves a range of behaviours in fish, including exploration, foraging and social interactions. Individuality in these behaviours can be expected to benefit survival of individuals and, therefore, of the population. For example, it may increase access to food sources by enhancing the search efficiency of shoaling fish (Gotceitas and Colgan, 1988; Colgan et al., 1991). Additionally, it can provide a competitive advantage to some individuals, such as the dominant ones within a hierarchy based upon boldness (Budaev, 1997; Wilson et al., 1993). Again, this would contribute to the fitness of the population by guaranteeing survival of individuals in the case of limited resources (Magurran, 1986a; Gotceitas and Colgan, 1988). Thus, the variations observed here may have functional relevance.

Three categories of mechanisms have been proposed to underlie behaviours that are unique to one individual as opposed to another, namely a variable environment, social effects and phenotypic variability (reviewed in Magurran, 1986a). In that context, the present study was designed to quantitatively characterise swimming of one fish alone in a sparse and constant environment, minimising any affective contribution to the resulting pattern. The results demonstrate that, with the appropriate analytical tools, it is possible to conclude that this elementary behaviour exhibits individuality. Thus, we suggest that this property reflects phenotypic differences of either genetic or experiential origin. Such differences are not simply related to environmental conditions, body size or sex, as these factors were controlled in this study. Rather, they may be embedded in underlying intrinsic processes. It has been suggested that a population benefits from varying phenotypes, or differences in individuals, by being better adapted to environmental conditions (Clark and Ehlinger, 1987). In this context, it would be interesting to know how the individuality observed in the present study would change in other conditions, such as a more heterogeneous environment or one requiring social interactions.

## ACKNOWLEDGEMENTS

The authors thank I. Cantave and N. Gianattassio for their contributions in data acquisition and analysis. We also thank H. Eckholdt for his valuable advice and assistance with both theoretical and practical statistical analysis. P.E.R. and C.J.C. thank Tanya Schmah of the Mathematical Institute, Warwick University for essential leadership in the implementation of the nonlinear measures and during the development of the discriminant analysis system. This work has been sponsored by the Defense Advanced Research Projects Agency (DARPA), contract No. N66001-00-C-8012.

© The Company of Biologists Limited 2004