SUMMARY
Experimental scientists often employ a Randomized Complete Block Design (RCBD) to study the effect of treatments on different subjects. Under 'complete randomization', the order of the apparatus setups within each block, including all replications of each treatment across all subjects, is completely randomized. However, in many experimental settings complete randomization is impractical because of the cost of readjusting the device to administer a new treatment. One typically resorts to a type of 'restricted randomization', in which multiple subjects are tested under each treatment before the apparatus is readjusted. The order of the treatments as well as the assignment of subjects to each block are random. If the data obtained under any type of restricted randomization are treated as if they were collected under an RCBD with complete randomization within each block, the risk of false positives (Type I error) may increase. This is of concern in animal orientation studies and in other areas such as chemical ecology, where it is impractical to reset the experimental device for each subject tested. The goal of the research presented in this article is twofold: (1) to demonstrate the consequences of constructing an F-statistic based on a mean square error for testing the significance of treatment effects under the restricted randomization; (2) to describe an alternative method, based on split-plot analysis of variance, for analyzing designed experiments that yields better power under the restricted randomization. The statistical analyses of simulated experiments and data involving virgin male Periplaneta americana substantiate the benefits of the alternative approach under the restricted randomization. The methodology and analysis employed for the simulated experiment are equally applicable to any organism or artificial agent tested under a restricted randomization protocol.
 analysis of variance
 false positive
 false negative
 mean square error
 olfaction
 pheromone
 randomized complete block design
 restricted randomization
 split-plot
 type I error
Introduction
In designed experiments, one measures responses on multiple experimental subjects with the goal of analyzing the effect of changes in controlled experimental conditions, or treatments, on the responses of different subjects. In laboratory experiments, one controls for extraneous conditions, to ensure each experimental run is conducted under similar conditions, so that one is reasonably confident that consistent differences in response are caused by the treatments. However, it is not always possible to ensure that all extraneous conditions are properly controlled. To reduce the effects of extraneous conditions, two techniques are commonly employed: randomization and blocking.
Blocking refers to the division of experimental runs into smaller subgroups, or blocks. Each treatment is applied randomly to a number of subjects within each block. This design, known as a Randomized Complete Block Design (RCBD), is commonly employed in biological experiments, where, for example, experimental runs on a given day may be treated as a block (Sokal and Rohlf, 1981). The randomization protocol reduces any bias in favor of particular treatments, while the blocking enables extraneous variation to be absorbed into block effects. Consequently, one obtains better estimates of treatment effects and more powerful tests for treatment differences (Cochran and Cox, 1957).
In the RCBD, the application of treatments to subjects within a block must be completely randomized. If a treatment is applied to five subjects within a block, then the subjects are chosen randomly within the block. However, in many experimental designs, practical constraints make this ideal unattainable. The motivation for this research arises from insect pheromone-tracking studies, where a pheromone plume is generated at one end of a wind tunnel. An insect begins at the other end of the tunnel and is challenged to track the plume to the pheromone source. The goal is to detect differences in the response of an insect to different types of plumes or treatments (Willis and Baker, 1984; Linn et al., 1988a,b; Mafra-Neto and Cardé, 1998; Zanen and Cardé, 1999; Cardé and Knols, 2000; Dekker et al., 2001; Willis and Avondet, 2005). The effect of various types of formulated synthetic pheromone on different species of walking and flying insects has been studied by various researchers (Linn et al., 1988a,b; Willis and Arbas, 1991; Linn et al., 1996; Cardé et al., 1998; Willis and Avondet, 2005).
Whenever the complete randomization protocol is violated, we refer to the corresponding design framework as 'restricted randomization'. One such type is illustrated in Fig. 1. For instance, in chemo-orientation studies, changing the treatment typically involves dismantling and reconfiguring the experimental device, which is often not practical after each experimental run. A more practical approach often taken by experimental scientists is the following: within each block, multiple subjects are challenged with a single treatment before the device is changed to administer the next treatment. Only the order in which the treatments are applied is randomized. By considering subjects in groups, the experiments can be conducted in a relatively short time span. We demonstrate the effects of a restricted randomization on the analysis and scientific conclusions using a computer-simulated experiment and data involving virgin male Periplaneta americana.
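The difference between the two protocols can be sketched for a single block. This is a minimal illustration only: the four treatment labels, the four subjects per treatment, the seed and the function names are all assumptions made for the example and are not taken from the experiments discussed in this article.

```python
import random


def complete_randomization(treatments, reps, rng):
    """Shuffle every run in the block individually."""
    runs = [t for t in treatments for _ in range(reps)]
    rng.shuffle(runs)
    return runs


def restricted_randomization(treatments, reps, rng):
    """Shuffle only the treatment order; runs of one treatment stay together."""
    order = list(treatments)
    rng.shuffle(order)
    return [t for t in order for _ in range(reps)]


def setups(runs):
    """Count apparatus setups: one at the start plus one per change of treatment."""
    return 1 + sum(a != b for a, b in zip(runs, runs[1:]))


rng = random.Random(1)             # hypothetical seed, for reproducibility only
treatments = ["A", "B", "C", "D"]  # hypothetical treatment labels
complete = complete_randomization(treatments, 4, rng)
restricted = restricted_randomization(treatments, 4, rng)

print("complete:  ", " ".join(complete), "| setups:", setups(complete))
print("restricted:", " ".join(restricted), "| setups:", setups(restricted))
```

Under the restricted protocol the apparatus is set up once per treatment, whereas under complete randomization it may need to be reset before almost every run. That saving is what makes the restricted protocol attractive in practice.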
Analysis of variance (ANOVA) is the fundamental tool for analyzing data from designed experiments (Cochran and Cox, 1957; Searle, 1971). Chapter 8 of Sokal and Rohlf (1981) emphasizes that ANOVA, while being an effective tool for any modern biologist, may create artificial constructs in the mind of a scientist that could lead to misleading conclusions. In subsequent sections of this article, this is demonstrated in the context of restricted randomization.
In many of the animal chemo-orientation studies published over the past two decades (see for instance, Table 1), an RCBD or a related design was employed to analyze experimental data obtained under the restricted or other modified randomization protocols. One such modified experimental design was employed by Linn et al. (1988a), in which some treatment effects were confounded with block effects. This might be one of the reasons for obtaining non-significance of the treatment effects. In a flight tunnel experiment, Linn et al. (1988b) challenged 5–10 males of each species of Trichoplusia ni and Pseudoplusia includens to the treatments at one of the two dosages. From their experimental design description it appears that there were multiple levels of blocking and a type of restricted randomization. However, they analysed the data using a one-way ANOVA, ignoring the block effects and the restriction in randomization.
Mafra-Neto and Cardé (1998) utilized an RCBD to test the effect of treatments; however, it is not clear from the 'Materials and methods' section whether their experimental design indeed satisfies the complete randomization protocol. From the 'Materials and methods' section of Linn et al. (1996), it appears that they did not have a complete randomization protocol; however, they analysed their experimental data using an ANOVA (although it is not clear whether it was one-way or two-way). The effect of light levels and plume structure on the orientation maneuvers of male Lymantria dispar (gypsy moths) flying along pheromone plumes has been studied by Cardé and Knols (2000). They used the flight tracks of 20 males per treatment (a total of six treatments), tested in a randomized complete block design. The goal was to study the effects of odor plume structure on the orientation maneuvers of different species of walking and flying insects (Willis and Baker, 1984; Baker, 1990; Cardé and Knols, 2000). Justus et al. (2002) employed an ANOVA; however, they do not state the details of the experimental design or of the statistical analysis employed, such as a one-way ANOVA or an RCBD. Vickers (2002) considered male Heliothis subflexa, which were flown in a wind tunnel to a variety of combinations of synthetic pheromone components admixed on a filter paper disk. This experimental design violated the complete randomization protocol, since groups of 3–5 males were flown under each treatment on any given day; however, the experimental data were analysed using an RCBD ANOVA.
The flight behavior of mosquitoes in host-odor plumes and the effects of the fine-scale structures of such plumes have been studied by Dekker et al. (2001). They considered seven treatments; each treatment had eight replicates and treatments were randomized within each test day. Zanen and Cardé (1999) applied an RCBD to test the treatments on a given day on male L. dispar; however, they did not employ a two-way ANOVA to analyse the resulting experimental data. Consequently, their analysis does not match the design. Furthermore, it is not clear whether their design satisfied a complete randomization protocol.
All the above analyses lead to an important question: how does violating the fundamental assumption of complete randomization affect the interpretation of experimental results or scientific conclusions? It is often very difficult (1) to assess the consequences of subtle modifications of the design on the resulting analysis and scientific conclusions and/or (2) to identify an alternative method of analysis corresponding to the randomization protocol scheme at hand by simply referring to the statistical literature (Cochran and Cox, 1957; Searle, 1971; Sokal and Rohlf, 1981).
The goal of this article is to provide insight into the statistical analysis issues embedded within designed experiments when practical constraints impose restrictions on randomization of the treatments. The statistical analyses of simulated experiments and data involving virgin male P. americana demonstrate the consequences of overlooking the restricted randomization on the scientific questions being addressed as well as on the analysis and interpretation of the results. Our simulated experimental data, presented in the 'Results' section, demonstrate that the RCBD incorrectly finds a highly significant treatment effect that was not present in the model, while failing to find the real effect present in the model. However, the risk of a false positive (Type I error) indication of the treatment significance is substantially reduced under the alternative analysis. In essence, by employing an RCBD when the underlying assumption is not satisfied, we are more likely to declare that an effect exists when it does not. This has implications for the understanding of experimental results as well as the scientific conclusions. It is important to note that the methodology and analysis employed for the simulated experiment are equally applicable to any organism or artificial agent tested under a restricted randomization framework.
Applying an appropriate model to account for the changes in the design is relevant for two reasons. Violation of assumptions in a particular design could result in (1) underestimation of experimental error variance and (2) false positives (Type I errors). These, in turn, may lead to incorrect results or invalidate the analysis employed by a researcher. Therefore, it is important to choose an appropriate model and error structure in considering designed experiments.
Materials and methods
The effect of a restricted randomization on the analysis of experimental data is best illustrated through a simulated experiment. In general, if there is a restriction on randomization at a given level in experimentation, there will be a 'split' in the design, leading to a split-plot design nested within an RCBD structure. There are two types of experimental units: the larger units, the groups of subjects (for example, insects), are called the whole-plots and the smaller ones, individual subjects, the subplots. A split-plot design creates a nesting within the design structure since the whole-plots are nested within blocks and the subplots are in turn nested within the whole-plots. The design structure for the whole-plot experimental units is essentially an RCBD. A split-plot design has two advantages over a simpler ANOVA: (1) if treatment can be applied to the whole-plot at once, rather than separately to subplots, this may reduce costs, and (2) because subplots are usually more uniform, parameters measuring comparisons among conditions may be estimated more precisely.
Several examples of split-plot designs can be found in the biological literature. Linn and Roelofs (1983) considered a total of 100 treatments in a 5×4×5 factorial design such that two of the three factors were varied between days and one factor was varied within days. This design has a split-plot structure with days serving as whole-plots. The experiments of Linn et al. (1988a) consider two species, Grapholita molesta and Pectinophora gossypiella. They challenged 5–10 males to each treatment per day, with a total of 70 males for each treatment–temperature combination. Both the treatments and temperatures were randomized over the experimental period. This experimental design has a split-plot structure with unbalanced data. In both of these examples, the analysis was based on ANOVA and regression techniques, rather than a split-plot analysis.
Model for a hypothetical split-plot design
Every ANOVA is associated with a linear model specifying the effects being considered. The linear model for a split-plot ANOVA includes hierarchies of terms modeling both the block effects and treatment effects. The key concept in constructing models for split-plot designs is recognizing the different sizes of experimental units and consequently identifying the corresponding design structures and treatment structures.
Consider an experiment in which several treatments are administered to different subjects and the experiment is conducted over several blocks. Suppose that a biological experiment consists of subjects of different ages, challenged to various treatments such as pheromones on different blocks (for example, days). The age factor may be included in the model to determine how the behavioral performance of an animal changes as it develops over time. The split-plot design originated in agricultural field trials and, in this setting, one factor may be fertilizer and another factor irrigation level.
Consider a hypothetical design such that Y_{ijk} denotes a response measured on a subject at the kth (k=1,...,n) age when challenged to the jth (j=1,...,t) treatment in the ith (i=1,...,b) block. The simplest possible model (1) can be written as
Y_{ijk} = μ + α_{i} + β_{j} + γ_{ij} + δ_{k} + ϵ_{ijk},    (1)
where μ represents the overall mean, α_{i} is the block effect, β_{j} is the effect of the jth treatment applied at the whole-plot level, γ_{ij} is the whole-plot effect, and δ_{k} is the age effect. The term ϵ_{ijk} measures random error.
The scientific interest is in the treatment effect β_{j} and the age effect δ_{k}. The basic statistical problem is to detect the significance of these effects and estimate the size of any effects that are present. The effects β_{j} and δ_{k} are treated as fixed effects or parameters in the model. The block effect α_{i} and the whole-plot effect γ_{ij} are of no inherent interest; however, these can cause considerable variation from block to block and whole-plot to whole-plot. Therefore, these effects are treated as random. A typical assumption is that these effects follow distributions N(0,σ_{α}^{2}) and N(0,σ_{γ}^{2}) respectively, where σ_{α}^{2} and σ_{γ}^{2} are unknown variance components corresponding to the block and whole-plot effects, respectively. The random errors ϵ_{ijk} are assumed to follow a N(0,σ_{ϵ}^{2}) distribution.
Simulated experiment: generation of data
We demonstrate how to specify a model for a split-plot design and how to construct appropriate F-test statistics through simulated experimental data. The design is generated as follows:

Sixteen runs, numbered 1 to 16, were performed for each of five blocks.

Each block was divided into four whole-plots of four runs. The four treatments, such as pheromone plumes, were randomly assigned to the four whole-plots.

Four subjects, one of each age, were assigned in a random order to the runs within each whole-plot.
The important feature of this design is that two factors, treatment and age, are being varied across experimental runs. The treatment is varied randomly at the whole-plot level, while the age is varied randomly at the level of individual runs or subplots.
After creating the experimental design, responses were generated according to the following model (2):
Y_{ijk} = μ + α_{i} + δ_{k} + λ_{i}r_{ijk} + ϵ_{ijk},    (2)
where subscripts i, j and k denote the block, treatment and age, respectively. (1) The overall mean is μ=100; (2) the block effect α_{i} has a normal distribution with mean zero and σ_{α}^{2}=25; (3) the δ_{k} term representing the age effect takes values 1,...,4 corresponding to subjects of four different age groups (for example, insects of 10 days, 20 days, 30 days and 40 days old, respectively); (4) the component λ_{i}r_{ijk} represents drift of experimental conditions over time, with r_{ijk} taking the run number 1,...,16 of the (j,k) treatment combination in the ith block; (5) the coefficient λ_{i} follows a N(0,1) distribution; and (6) ϵ_{ijk} has a normal distribution with mean zero and σ_{ϵ}^{2}=1. That is, we assumed that different random effects contribute differently to the level of variation in the final measurement of Y_{ijk}. The statistical model used to generate the data includes an age effect, a random block effect and a random drift effect within the block. However, the model does not include any treatment effect β_{j} and, therefore, the responses are independent of the treatment employed.
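The data-generating steps described above can be sketched as follows. This is an illustrative reimplementation in Python with NumPy, not the S code used for the article; the seed, the variable names and the row layout of `data` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2024)  # hypothetical seed
b, t, n = 5, 4, 4                  # blocks, treatments (whole-plots per block), ages
mu = 100.0                         # overall mean
sd_block = 5.0                     # block variance 25 corresponds to sd 5
sd_drift = 1.0                     # drift coefficient lambda_i ~ N(0, 1)
sd_err = 1.0                       # error variance 1

rows = []
for i in range(b):
    alpha_i = rng.normal(0.0, sd_block)  # random block effect
    lam_i = rng.normal(0.0, sd_drift)    # random drift slope for this block
    run = 0
    # Treatments are assigned to the four whole-plots in random order.
    for j in rng.permutation(t):
        # The four ages are run in random order within the whole-plot.
        for k in rng.permutation(n):
            run += 1                     # run number 1..16 within the block
            delta_k = k + 1              # age effect takes the values 1..4
            # No treatment term: the response does not depend on j.
            y = mu + alpha_i + delta_k + lam_i * run + rng.normal(0.0, sd_err)
            rows.append((i, int(j), int(k), run, y))

# Columns: block, treatment, age index, run number, response.
data = np.array(rows)
print(data.shape)
```

Each whole-plot occupies four consecutive run numbers, so the drift term is shared more closely by runs of the same treatment than by runs of different treatments. This is how uncontrolled variation ends up looking like a treatment effect when the restriction is ignored.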
It is important to emphasize that the model (2), under which the simulated data were generated, differs from the split-plot model (1). In particular, the experimental drift in the model corresponding to the simulation experiment does not match exactly the assumption of the split-plot model. This effect was intentionally introduced into the model, since in practice one does not know the precise form of any uncontrolled variation. It is important that the statistical analysis is robust to misspecification of this term. The data were generated using the statistical language S (Venables and Ripley, 2002) and the model (2) was fitted using the lm() function in S. The model fitting and data analysis can also be performed using R, a freely available open source statistical language available from http://www.r-project.org.
P. americana experiment
The P. americana experiment is an example of a split-plot design, characterized by multiple levels of blocking. The experiment involved 3–18-week-old virgin males of P. americana, which were challenged to track wind-borne plumes of (–)-periplanone-B 2 h into their scotophase (12 h:12 h L:D cycle). The animals were video recorded as they tracked wind-borne plumes of the female sex pheromone (–)-periplanone-B in a laboratory wind tunnel. Each videotaped walking path was digitized using a computerized motion analysis system.
Plume structure
For this experiment, four different plume structures were constructed by varying the size, shape and orientation of the pheromone source (Fig. 2). The point source plume was constructed using a 0.7 cm diameter circular filter paper disk (Whatman No. 1, Eastbourne, East Sussex, UK) held perpendicular to the airflow with an insect pin (Fig. 2A). A ribbon plume with a chemical source 0.05 cm wide was constructed by rotating the 0.7 cm filter paper disk 90°, so that the disk was parallel to the airflow in the wind tunnel, resulting in a very narrow plume (Fig. 2B). The third plume was created by increasing the surface area of the source by ca. 25 times while also proportionally increasing the dosage of pheromone solution applied to the source. The wide plume treatment source was 14.3 cm wide × 0.7 cm tall (Fig. 2C). The cylinder plume structure was generated by placing a Plexiglas® cylinder (81.28 cm tall × 7.62 cm diameter) 5 cm upwind of the 0.7 cm diameter circular filter paper disk held perpendicular to the airflow (Fig. 2D). The reader is referred to Willis and Avondet (2005) for further details on the materials employed in conducting this experiment.
The aim of this experiment was to test the hypothesis that male cockroaches steer their walking while tracking female pheromone using a chemotactic strategy characterized by counter-turning (turning back) when they experience a sharp pheromone-clean air edge (Willis and Avondet, 2005).
Measurements
Response variables measured from the digitized insect movement tracks included: track angle (degrees), track width (cm), ground speed (cm s^{–1}), body axis (degrees), net velocity (cm s^{–1}), inter-turn duration (s), the number of times each animal stopped, and the duration of each stop (s). For the purposes of the analysis, Willis and Avondet (2005) considered (1) a turn as the location at which the head reached a local maximum or minimum value with respect to the lateral frame of reference of the wind tunnel, and (2) an animal to be in stopping position if there was no movement between two sequential positions of the head point. Measurement of these response variables from one animal was considered as one trial. The response variable is the average of the measurements for an entire walking track of a single animal.
The animals are expected to have peak response during a specific period in each scotophase, and only a limited number of experimental runs can be performed each day. The experiment was therefore carried out over 5 days. The design can be summarized as follows:

The experiment was run over 5 days.

Each day was divided into four whole-plots. Each of the four pheromone treatments was randomly assigned to one of the whole-plots.

Within each whole-plot, five animals were tracked. Each animal, or subplot, represents the smallest experimental unit.
This gives a total of 100 experimental runs (5 days × 4 whole-plots × 5 animals). Three animals did not respond when challenged with the experimental conditions, so that 24 observations in total were completed for each treatment, except for the second treatment, which yielded 25. The analysis therefore includes a total of 97 observations.
Split-plot model for the P. americana experiment
The key feature of the P. americana design is that the treatments (the pheromone plumes) were varied at the level of whole-plots, and not at the level of individual experimental runs. In the simulated experimental data described above, a second factor of age was varied at the level of an experimental unit; however, in the P. americana experiment, this second factor is absent.
Since the treatments were applied to groups of animals within each apparatus setup, the treatments must be associated with the whole-plot part of the design. Therefore, in order to make an appropriate inference regarding treatments, the F-statistic denominator must include the random variation between the whole-plots.
In the context of the P. americana experiment, the response Y_{ijk} in model (1) represents the ground speed (cm s^{–1}) averaged over different time points; α_{i} and γ_{ij} are the day and whole-plot effects, respectively; β_{j} is the effect of the jth pheromone applied at the whole-plot level and δ_{k} is the effect of the treatment applied at the subject level. However, since no factor was varied at the subject level in this experiment, the δ_{k} term is absent in the model.
Appropriate F-test statistics
The analysis involves computing the sums of squares (SS) and mean squares (MS) due to each of the terms in the model; see Sokal and Rohlf (1981) and Searle et al. (1992) for the partitioning of the SS. In order to test for the significance of treatment effects, one forms an F-ratio using the treatment MS (MS_{trt}) and the error MS as:
F = MS_{trt} / MSE,    (3)
where the mean square error (MSE) is an unbiased estimator of the error variance σ_{ϵ}^{2}, the variation between subjects within groups. The above F-statistic is an appropriate one to employ when the only source of random variation in the estimated treatment effects is the random errors. This is the case for the age effect δ_{k} term in the model (1). Since each age occurs once in each whole-plot, any block effects must influence all treatments equally. Consequently, the presence of block effects does not inflate the treatment MS.
The F-statistic in Equation (3) is no longer valid when one is testing for the treatment effects applied at the whole-plot level, the β_{j} term in model (1). In the P. americana experiment, there may be additional whole-plot variation due to differences in responsiveness of animals during the scotophase, or due to any random variation in resetting the experimental device. The MSE estimates only the subject-to-subject variation while ignoring these other potential sources of random variation. Therefore, the F-statistic in Equation (3) corresponding to the treatment effects is biased upwards, leading to false indications of significant treatment effects (Type I error).
The split-plot analysis overcomes this difficulty by modeling whole-plot variation by the random γ_{ij} term. The appropriate denominator for the F-ratio is the MS attributed to γ_{ij}; i.e. MS_{intr}. Consequently, the F-ratio becomes:
F = MS_{trt} / MS_{intr}.
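The two choices of denominator can be compared numerically. The sketch below assumes balanced data stored in an array of shape (blocks, treatments, subplot levels); the function name, the dictionary keys, the seed and the toy variance values are all hypothetical. The 'RCBD' error term is taken here to be the pooled residual left after fitting block, treatment and subplot-factor main effects, which is one reasonable reading of an analysis that ignores the whole-plot structure.

```python
import numpy as np


def treatment_f_statistics(y):
    """Treatment F-ratios for balanced data y[block, treatment, subplot level]."""
    b, t, n = y.shape
    grand = y.mean()
    block_means = y.mean(axis=(1, 2))
    trt_means = y.mean(axis=(0, 2))
    sub_means = y.mean(axis=(0, 1))
    wp_means = y.mean(axis=2)  # one mean per whole-plot (block x treatment)

    ss_total = ((y - grand) ** 2).sum()
    ss_block = t * n * ((block_means - grand) ** 2).sum()
    ss_trt = b * n * ((trt_means - grand) ** 2).sum()
    ss_sub = b * t * ((sub_means - grand) ** 2).sum()
    # Block-by-treatment interaction: the whole-plot error.
    wp_dev = wp_means - block_means[:, None] - trt_means[None, :] + grand
    ss_wp = n * (wp_dev ** 2).sum()
    ss_res = ss_total - ss_block - ss_trt - ss_sub - ss_wp

    df_wp = (b - 1) * (t - 1)
    df_res = (n - 1) * (b * t - 1)
    ms_trt = ss_trt / (t - 1)
    ms_wp = ss_wp / df_wp
    # Ignoring the whole-plots pools their variation into a single error term.
    mse_pooled = (ss_wp + ss_res) / (df_wp + df_res)
    return {
        "MS_trt": ms_trt,
        "MS_wholeplot": ms_wp,
        "MSE_pooled": mse_pooled,
        "F_rcbd": ms_trt / mse_pooled,
        "F_splitplot": ms_trt / ms_wp,
    }


# Toy data with whole-plot (setup) variation but no treatment effect.
rng = np.random.default_rng(7)  # hypothetical seed
b, t, n = 5, 4, 4
gamma = rng.normal(0.0, 3.0, size=(b, t, 1))  # whole-plot effects, sd 3
y = 100.0 + gamma + np.arange(1, n + 1) + rng.normal(0.0, 1.0, size=(b, t, n))
result = treatment_f_statistics(y)
print(result)
```

Whenever the whole-plot mean square exceeds the subject-level residual mean square, the pooled denominator is smaller than MS_wholeplot and the RCBD-style F-ratio is the larger of the two.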
Results
We demonstrate the consequences of the randomization protocol on the analysis of experiments and scientific conclusions using the simulated and real data.
Comparison of the RCBD and split-plot analyses: simulation experiment
In terms of the practical significance, the main findings from our simulation experiment are summarized in Tables 2 and 3. Table 2 presents the results of an RCBD ANOVA, which assumes complete randomization as illustrated in Fig. 3. The analysis shows a highly significant block effect (P<0.00001). However, scientific interest is usually in treatment effects and the analysis in Table 2 incorrectly finds a highly significant treatment effect (P=2.533×10^{–5}), while failing to detect the real age effect (P=0.2919). The restricted randomization in this design has led to both false positive (Type I error) and false negative (Type II error) results.
The restricted randomization means that the treatments are applied in whole-plots of four runs, and for the purpose of analyzing treatment effects, the correct analysis is to treat these four runs as a single experimental unit. This leads to the split-plot ANOVA. The results of applying the split-plot analysis to the simulated experimental data are shown in Table 3. This analysis correctly identifies that there is no significant treatment effect (P=0.1889) and that the age effect is highly significant (P=6.126×10^{–5}). Therefore, the risk of a false positive indication of the treatment significance is substantially reduced under the split-plot design. In essence, by employing an RCBD when the underlying assumption is not satisfied, we are more likely to reject a true null hypothesis. This has implications for the understanding of experimental results, since the absence of a treatment effect is itself of scientific relevance.
Comparison of the RCBD and split-plot analyses: the P. americana experiment
We present the analysis of the data from the P. americana experiment using RCBD and split-plot models to demonstrate when and how likely false positives can occur and their consequences for the biological questions. This illustrates how violations of the underlying assumption for the RCBD lead to underestimation of the error variability and inflation of the statistical significance of the treatment effects.
The response variable, ground speed (cm s^{–1}), is shown in Fig. 4 sorted first by day and next by pheromone, and in Fig. 5 sorted first by pheromone and then by day. It is clear that the pheromone D (cylinder source) is behaving differently. It also appears that treatments behave differently on different days. For example, a comparison of pheromone A (point source) vs pheromone B (ribbon source) shows that animals are responding more rapidly to the point source on days 2 and 4 and more rapidly to the ribbon source on days 1 and 5.
Analysis of the data was conducted using the SAS PROC MIXED program (Littell et al., 1996). Table 4 presents a two-way ANOVA for the response variable ground speed (cm s^{–1}). While the usual analysis of an RCBD includes only main effects for treatments and blocks, in the present experiment there are multiple replications of the pheromones on each day. Therefore, we are able to include an interaction term between the treatments and blocks. The table shows highly significant day and pheromone effects (P<0.0001) and a significant pheromone-by-day interaction effect (P<0.0307). These results are consistent with those obtained by Willis and Avondet (2005).
How should the pheromone-by-day interaction effect be interpreted? What are the consequences in terms of the questions of biological interest such as responsiveness of animals to different pheromone plumes? The pheromone-by-day interaction effect means that the animals have responded more to some plumes than others on different days. However, the biological interest lies in the overall response to the individual pheromones and there is no inherent interest in the individual days. It is of little use to state that the animals respond more rapidly to the ribbon source on day 1, and to the point source on day 2, as the data here indicate.
The resolution to this paradox is to model the interaction effect, the γ_{ij} term in model (1), as a random effect, introduced by the restricted randomization of the design. However, these random effects influence estimates of the treatment means. Therefore, the SS attributed to the pheromone effects in Table 4 is inflated by these interactions. As a result, comparing the pheromone MS (MS_{trt}) with the MSE is inappropriate.
Table 5 presents the ANOVA using a split-plot model for the P. americana experiment under the restricted randomization while treating the block effects as random. The point estimates of the variance components and the F-statistic for the treatment effect in the table were generated using the SAS PROC MIXED program (Littell et al., 1996). The F-statistic for the pheromone effect is now obtained as the MS for pheromone (MS_{trt}=424.3216) divided by the MS for pheromone-by-day interaction (MS_{intr}=52.2499).
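As a quick arithmetic check, the ratio of the two mean squares quoted above can be computed directly. This is plain division of the quoted values; an F-statistic reported by mixed-model software for unbalanced data need not coincide exactly with this simple ratio. The variable names are illustrative.

```python
ms_trt = 424.3216   # pheromone mean square quoted in the text
ms_intr = 52.2499   # pheromone-by-day interaction mean square quoted in the text

# Split-plot F-ratio: treatment MS over whole-plot (interaction) MS.
f_split_plot = ms_trt / ms_intr
print(round(f_split_plot, 2))  # prints 8.12
```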
The analysis in Table 5 is further complicated by the three deleted observations, causing imbalance in the design (i.e. n_{ij}≠n for some i=1,..., b and j=1,..., t, where n_{ij} is the number of observations in the ith block for the jth treatment). The consequence is that a Satterthwaite approximation (p. 24 in Milliken and Johnson, 1984) must be used for the denominator degrees of freedom (d.f.) of the F-statistic, namely df_{2}. Note that df_{1} is simply (t–1). The F-statistic is much smaller (8.18 instead of 16.63) under the split-plot analysis; however, the treatment effect is still significant (P=0.003). The 95% confidence intervals for the variance components were generated by the lme() function in S-Plus (Pinheiro and Bates, 2000). The results presented in Table 5 confirm that both σ_{α}^{2} and σ_{γ}^{2}, representing the day and whole-plot effects respectively, are significant.
Pairwise comparisons
To further understand the treatment differences, we performed pairwise comparisons of treatment means. The analyses under RCBD and split-plot design in Table 6 correspond to the results in Tables 4 and 5, respectively. In comparison with the RCBD, the split-plot analysis returns larger standard errors and, in turn, yields smaller T-statistics and larger P-values. In fact, if we choose the level of significance to be 0.01, then the pairwise difference of A vs D (point source vs cylinder) is significant under the RCBD while not significant under the split-plot analysis. Moreover, the P-values corresponding to the pairwise differences of B vs D (ribbon vs cylinder) and C vs D (wide vs cylinder) are closer to 0.01 than to 0.0001. The RCBD analysis has substantially overstated the statistical significance; see Curran-Everett and Benos (2004) for a discussion of why choosing 0.01 for a significance level is appropriate for certain situations.
The expected mean squares (Sokal and Rohlf, 1981; Searle et al., 1992) may be used to determine how much the F-statistic for treatment effects will be inflated, on average, when blocks such as days are treated as fixed and the RCBD analysis is employed as if the design had been carried out under the complete randomization protocol. In this scenario, the F-statistic for treatment effects would use the MSE as its denominator, which has an expected value of σ_{ϵ}^{2}. However, under the split-plot analysis with the block effect treated as random, the denominator would be MS_{intr}, which has an expected value of (nσ_{γ}^{2} + σ_{ϵ}^{2}). In the balanced case (i.e. n_{ij}=n for all i=1,..., b and j=1,..., t), an approximate indication of the inflation in the F-statistic is provided by the ratio:

(n\hat{σ}_{γ}^{2} + \hat{σ}_{ϵ}^{2}) / \hat{σ}_{ϵ}^{2},

where \hat{σ}_{γ}^{2} and \hat{σ}_{ϵ}^{2} are the estimators of σ_{γ}^{2} and σ_{ϵ}^{2}, respectively. Recall that γ_{ij} is confounded with the setup variation of the experimental device as well as any potential treatment-by-block interaction under the restricted randomization protocol, and this combined variation is given by σ_{γ}^{2}. The F-statistic given in Equation (3) under an RCBD does not include σ_{γ}^{2} in its denominator and therefore ignores the setup variation entirely. In other words, this F-statistic ignores the `loss in efficiency' that arises when one cannot randomize between individual subjects because of the practical difficulties involved in conducting an experiment.
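A minimal Python sketch of this inflation ratio is given below, assuming it takes the form (n\hat{σ}_{γ}^{2} + \hat{σ}_{ϵ}^{2}) / \hat{σ}_{ϵ}^{2}, i.e. the estimated expected value of MS_{intr} divided by that of the MSE. The function name and the variance components passed to it are hypothetical. The only values taken from the text are the two reported F-statistics (16.63 and 8.18), whose ratio gives an empirical counterpart of about 2.03; because the real data are unbalanced, that figure is indicative only.

```python
def inflation_ratio(n, var_gamma, var_eps):
    """Approximate inflation of the RCBD F-statistic in a balanced design.

    n          number of subjects per treatment within each block
    var_gamma  estimate of the whole-plot (setup) variance
    var_eps    estimate of the subject-level error variance
    """
    if var_eps <= 0:
        raise ValueError("var_eps must be positive")
    return (n * var_gamma + var_eps) / var_eps


# Hypothetical variance components (not estimates from Table 5).
print(inflation_ratio(n=10, var_gamma=2.5, var_eps=25.0))

# Empirical counterpart from the F-statistics reported in the text.
observed_ratio = 16.63 / 8.18
print(round(observed_ratio, 2))
```

When var_gamma is zero the ratio equals 1, which corresponds to the situation in which readjusting the apparatus adds no variation and the two analyses agree on average.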
The simulated experimental studies presented earlier demonstrate the generality of these conclusions and are consistent with our above treatment of the real data. Recall that the simulation results show a change from highly significant to nonsignificant treatment effects.
Discussion
A randomized complete block design is one of the designs most widely used by experimental scientists in studying the effects of treatments on subjects. Often, treatments are replicated within each block to obtain separate estimates of the error variance and any potential treatment-by-block interaction. Experimental constraints can prevent complete randomization within each block. Previous work on identifying statistically significant treatment effects on behavioral responses has used either a one-way ANOVA or an RCBD. In this article, we have demonstrated that a split-plot model can be applied to analyse data under a restricted randomization protocol. Furthermore, we have demonstrated that overlooking the effect of restricted randomization on inferences from RCBD analyses can lead to spurious interaction effects as well as potentially serious Type I or Type II errors. In particular, if the restricted randomization is ignored and an RCBD analysis performed, then there is a risk of overstating the significance of treatment effects. In contrast, the split-plot analysis provides a powerful alternative for the analysis of data collected under the restricted randomization protocol. The proposed methods are illustrated using real data from chemo-orientation studies; however, they extend directly to other studies where it is impractical to completely randomize the treatments given to individual experimental subjects. The techniques presented in this article can be implemented using widely available statistical software.
Our findings clearly substantiate the consequences of ignoring the restricted randomization and have the following implications. (1) Under the restricted randomization (Fig. 1), one has two sets of experimental subjects: (i) the subjects nested within groups, which in turn serve as blocks for the subjects, and (ii) groups nested within blocks. The appropriate analysis is to employ a split-plot ANOVA by considering the groups of subjects as the whole-plots and the individual subjects as the subplots. (2) The expected mean squares usually assume complete randomization. Under the restricted randomization, one must be cautious in calculating the expected mean squares. In particular, one cannot test the treatment-by-block interaction through an F-statistic given by F=MS_{intr}/MSE; such a test is possible only if the experimental setups are completely randomized between animals.
An important conclusion of our work is the importance of describing completely the design employed and the statistical analysis performed on any experimental data. As we have shown, small changes to the design protocol can have a major effect on the validity of a statistical analysis. There are many different types of ANOVA, and applying an inappropriate analysis to a dataset can result in incorrect conclusions. The main function of the `Materials and methods' section of a scientific article is to provide sufficient details and information so that a knowledgeable reader with access to the original data can verify and reproduce the reported results (Curran-Everett and Benos, 2004). Our applications, as well as the citation of the literature, have been limited by the incomplete description of the experimental designs and statistical methods employed in many of the articles we reviewed, which makes it difficult if not impossible to replicate the experiments or the statistical analysis. Statistical methods and analysis are inherent to many allied fields and underpin the scientific discovery process. As stated eloquently by Curran-Everett and Benos (2004), misunderstanding and misuse of the statistical techniques, as well as misinterpretation of the analysis, jeopardize the scientific discovery process as well as the accumulation of scientific knowledge. We hope that this article will serve to improve the caliber of statistical information as well as the reporting and presentation of the statistical techniques in allied scientific publications.
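To make the contrast between the two analyses concrete, the following Python sketch computes both F-statistics for a balanced design, using the MSE as the denominator for the RCBD-style test and MS_{intr} as the denominator for the split-plot test. It is only a sketch under stated assumptions: the function name, the data layout (y[i][j] holds the n responses for block i and treatment j) and every simulation parameter are hypothetical, the data are simulated with no true treatment effect, and the unbalanced P. americana data would require mixed-model software instead.

```python
import random


def mean(values):
    return sum(values) / len(values)


def split_plot_f(y):
    """Return mean squares and both F-statistics for balanced data y[i][j][k]."""
    b, t, n = len(y), len(y[0]), len(y[0][0])
    cell = [[mean(y[i][j]) for j in range(t)] for i in range(b)]
    block = [mean(cell[i]) for i in range(b)]
    trt = [mean([cell[i][j] for i in range(b)]) for j in range(t)]
    grand = mean(block)

    ss_trt = b * n * sum((m - grand) ** 2 for m in trt)
    ss_intr = n * sum((cell[i][j] - block[i] - trt[j] + grand) ** 2
                      for i in range(b) for j in range(t))
    ss_err = sum((obs - cell[i][j]) ** 2
                 for i in range(b) for j in range(t) for obs in y[i][j])

    ms_trt = ss_trt / (t - 1)
    ms_intr = ss_intr / ((b - 1) * (t - 1))   # whole-plot error
    mse = ss_err / (b * t * (n - 1))          # subplot (subject) error
    return {
        "ms_trt": ms_trt, "ms_intr": ms_intr, "mse": mse,
        "f_rcbd": ms_trt / mse,         # ignores the restricted randomization
        "f_split": ms_trt / ms_intr,    # split-plot test for treatments
    }


# Hypothetical simulation: 6 days, 4 treatments, 10 subjects per setup,
# no treatment effect, with day, setup and subject-level variation.
rng = random.Random(1)
data = []
for i in range(6):
    alpha = rng.gauss(0, 2.0)                 # day (block) effect
    row = []
    for j in range(4):
        gamma = rng.gauss(0, 1.5)             # setup (whole-plot) effect
        row.append([alpha + gamma + rng.gauss(0, 3.0) for _ in range(10)])
    data.append(row)

result = split_plot_f(data)
print(result["f_rcbd"], result["f_split"])
```

In this layout the split-plot F-statistic would be referred to an F distribution with (t–1) and (b–1)(t–1) degrees of freedom; P-values are omitted here to keep the sketch free of external libraries.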
ACKNOWLEDGEMENTS
R.S.P.'s research was supported in part by the National Science Foundation (NSF) DMS 0239053 and Office of Naval Research (ONR) grants N000140210316 and N000140410481. C.L.'s research was supported in part by the NSF grant DMS 0306202 and ONR grant N000140410481. The authors thank the Editor and the reviewers for constructive comments and suggestions that led to significant improvements of the manuscript. We especially thank Mark Willis for providing the P. americana experimental data, for important discussions with regard to the relevant biological literature and for several comments on the manuscript. The authors are grateful to Joseph Koonce and Christopher Cullis for stimulating discussions and for their comments on the earlier version of the manuscript.
 © The Company of Biologists Limited 2005