Stereo videography is a powerful technique for quantifying the kinematics and behavior of animals, but it can be challenging to use in an outdoor field setting. We here present a workflow and associated software for performing calibration of cameras placed in a field setting and estimating the accuracy of the resulting stereoscopic reconstructions. We demonstrate the workflow through example stereoscopic reconstructions of bat and bird flight. We provide software tools for planning experiments and processing the resulting calibrations that other researchers may use to calibrate their own cameras. Our field protocol can be deployed in a single afternoon, requiring only short video clips of light, portable calibration objects.
Researchers have studied many diverse topics, including animal behavior and plant seed dispersal, by recording and quantifying the movement of animals and plants using video (e.g. Hayashi et al., 2009; Olaniran et al., 2013; Bhandiwad et al., 2013). For example, in 2012, 70 papers published in The Journal of Experimental Biology reported using video to measure kinematics, representing 11% of the papers published in the journal that year. Many of these studies used two or more cameras to measure the three-dimensional (3D) locations of points of interest in the scene. In order to use image observations from multiple cameras to reconstruct 3D world positions via triangulation, the relative position and orientation of the cameras (extrinsic parameters) and their focal lengths and principal points (intrinsic parameters) must be given. The process by which these parameters are estimated is known as ‘camera calibration’ and typically involves matching points on a calibration object across camera views.
In the recent biology literature, the most commonly mentioned method for calibrating cameras is direct linear transformation (DLT) (Abdel-Aziz and Karara, 1971), with some researchers also using a camera calibration toolbox for MATLAB (Bouguet, 1999). When using DLT, it is important to obtain calibration points throughout the volume of interest, otherwise reconstruction accuracy may be reduced (Hedrick, 2008). Previous authors who have used DLT in a field setting have constructed a large physical calibration object at the field site (Clark, 2009; Munk, 2011), limiting the size of the calibration volume. Others have carefully measured the extrinsic parameters by hand (Cavagna et al., 2008), relying on a semi-permanent placement of their cameras in a sheltered location. We propose a different calibration approach that is particularly useful for field settings where the volume of interest may be tens of thousands of cubic meters or where cameras cannot be left in place for multiple sessions. Our approach uses the sparse bundle adjustment (SBA) calibration algorithm (Lourakis and Argyros, 2009), which minimizes the difference between the observed and ideal locations of the calibration points in each camera view. Bundle adjustment has been chosen by biologists for its 3D reconstruction accuracy (Walker et al., 2009), but lacks wide use because of the absence of easily accessible software implementations that are directly applicable to analysis of field data.
The purpose of this paper is to describe a successful three-step workflow and provide software tools (supplementary material Fig. S1) that will allow researchers to perform stereo videography easily and accurately in a field setting. The workflow provides guidance on (1) pre-experiment planning of camera placement to meet observational objectives, (2) an in-field calibration protocol and (3) post-experiment camera calibration. Camera placement planning is supported by our easyCamera software tool in MATLAB; camera calibration is supported by easyWand, which is a graphical user interface to the easySBA routine, or a command-line Python package that calls the SBA routine provided by Lourakis and Argyros (Lourakis and Argyros, 2009). Our software is licensed under the GNU General Public License version 3. Usage instructions are provided with the software download. Sample data and video tutorials are also available online. We here present two cases using the proposed workflow to calibrate cameras and provide accurate estimates of the three-dimensional flight paths of cliff swallows (Petrochelidon pyrrhonota) and Brazilian free-tailed bats (Tadarida brasiliensis). Although designed for use in the field, the methods outlined here may also be useful in laboratory situations where introducing a calibration object into the volume to be measured is difficult or infeasible.
RESULTS AND DISCUSSION
When developing our stereo videography workflow and software, we focused on accuracy and ease. It is important to estimate the level of calibration and reconstruction accuracy when the goal of 3D videography is to quantify the kinematics of airborne animals and facilitate the study of their behavior. Any uncertainty in the estimation of their 3D position affects the uncertainty in derived calculations such as velocity and acceleration, which may be of direct biological interest. We here first describe the results of our experiments that yielded stereoscopic reconstructions of bat and bird flight and then discuss their accuracy and the ease in obtaining them.
List of symbols and abbreviations
- DLT: direct linear transformation
- Dmax: maximum observation distance
- f: focal length of the camera
- p: physical width of the camera pixels
- RMS: root mean squared
- SBA: sparse bundle adjustment
- X: physical size of an animal
- xmin: the bound on the pixel span of the image of an animal
Stereoscopic reconstructions of bat field flight
We recorded video of Brazilian free-tailed bats during their evening emergence from Davis Blowout Cave in Blanco County, Texas, USA (Fig. 1), and used the proposed stereo videography workflow to support the estimation of the flight paths of 28 bats flying through a 1400 m³ volume during a 1 s interval with a 7.8 cm root mean squared (RMS) uncertainty in their 3D positions. We used easyWand and easySBA to estimate the extrinsic and intrinsic camera parameters, using, from each view, 226 points manually digitized from the ends of a 1.56 m calibration wand, 2010 points manually digitized from hot packs thrown in the air, and 7135 points on the bats flying through the volume of interest, identified using automated methods (Wu et al., 2009). Manufacturer-provided values served as initial estimates of the intrinsic parameters. The calibrated space was aligned to gravity by calculating the acceleration of hot packs thrown in the volume of interest. The standard deviation of the estimated wand length divided by its mean was 0.046 (4.6%). The respective reprojection errors were 0.63, 0.74 and 0.59 pixels for the three views. The protocols used in bat observation are consistent with the guidelines of the American Society of Mammalogists (Sikes and Gannon, 2011), and were approved by the Institutional Animal Care and Use Committee of Boston University and the Texas Parks and Wildlife Department (permit no. SPR-0610-100).
Stereoscopic reconstructions of bird field flight
A cliff swallow flock was recorded adjacent to the colony roost under a highway bridge in Chatham County, North Carolina, USA, at 35°49′42″N, 78°57′51″W (Fig. 2). The proposed stereo videography process was used to support the estimation of the flight paths of 12 birds flying through a 7000 m³ volume during a 2.3 s interval with a 5.9 cm RMS uncertainty in their 3D positions. The extrinsic camera parameters were estimated by easyWand and easySBA using, from each view, 58 points from the ends of a 1.0 m wand tossed through the scene and 4946 points acquired from swallows flying through the volume of interest via automated processing of the video sequences (Wu et al., 2009). Manufacturer-provided values were used as the intrinsic parameters. The calibrated space was aligned to gravity by measuring the acceleration of a rock thrown through the volume of interest. The standard deviation of the estimated wand length divided by its mean was 0.0056 (0.56%); the respective RMS reprojection errors were 1.16, 1.58 and 1.17 pixels for the three cameras. The swallow observation protocol was approved by the University of North Carolina Institutional Animal Care and Use Committee.
Accuracy of results and ease of experimental setup
As our results demonstrate, the proposed field videography workflow and software yield accurate calibration of multi-camera systems and enable accurate reconstruction of 3D flight paths of bats and birds in field settings. The values of our two measures of calibration inaccuracy were sufficiently small to indicate accurate calibrations without any errors in determining corresponding calibration points.
We balanced our two conflicting observational objectives of recording sufficiently long flight paths in a large volume of interest and recording the animals so that they appeared sufficiently large in all camera views. The resulting camera placement yielded a level of uncertainty in estimated 3D locations of the animals that was less than the length of a bat and half the length of a bird. A different camera placement may have resulted in a different level of accuracy. The Materials and methods section describes how researchers can determine this level in pre-experiment planning using the proposed easyCamera software tool. In post-experiment processing, easyWand can be used to estimate the accuracy of the calibration.
Our recordings were part of extensive multi-day experiments. Camera placement and calibration recordings required 45 min at the beginning of each daily recording bout. This short setup time may be important when the study organism or group is on the move and must be followed, or when site location, daily weather patterns, tides or safety considerations dictate.
We suggest that the methods and implementations provided here will substantially aid biologists seeking to make quantitative measures of animal movements and behavior in field settings, as they have done for our own work on bat and bird flight. While previous studies have gathered similar data, they required heroic attempts at calibration frame construction in the field or carefully controlled field environments. We believe that the ease of setup and accuracy of calibration afforded by our methods open up a wide range of previously unachievable studies, and we plan to continue refining the publicly available software implementations to fit a variety of needs.
MATERIALS AND METHODS
The proposed three-step workflow for performing stereo videography provides guidance on field-experiment planning, capture and post-processing. During planning, appropriate camera equipment is chosen and the placement of the cameras is determined so that the captured imagery will satisfy observational requirements. In the field, a protocol is followed for moving calibration objects through the volume of interest. During post-processing, the image locations of the calibration points are digitized in each camera view. Then, easyWand uses corresponding points to estimate the relative position and orientation of the cameras.
When selecting a multi-camera system, scientists should consider whether the frame rate, spatial resolution, field-of-view and synchronization ability of the cameras are appropriate for the size and speed of the study organisms. Camera synchronization, in particular, is a requirement for successful stereoscopy. In our field work, we used hardware synchronization to ensure accurate temporal alignment of frames across cameras. Multi-camera systems without precise frame synchronization could be calibrated using these methods if wand and animal pixel motion per frame is small, a large number of calibration points are used and some motionless background points are visible in all cameras.
For capturing video of Brazilian free-tailed bats, we used three thermal infrared cameras (FLIR SC8000, FLIR Systems, Inc., Wilsonville, OR, USA) with variable-focus 25 mm lenses and a pixel width of 18 μm, providing a 40.5 deg field of view. The 14 bit grayscale video has a frame size of 1024×1024 pixels and a frame rate of 131.5 Hz. For capturing video of cliff swallows, we used three high-speed cameras (N5r, Integrated Design Tools, Inc., Tallahassee, FL, USA) with 20 mm lenses (AF NIKKOR 20 mm f/2.8D, Nikon Inc., Melville, NY, USA), recording 10 bit grayscale video with a frame size of 2336×1728 pixels at 100 Hz.
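As a quick consistency check, the field of view quoted for the thermal cameras follows from the pinhole model, given the focal length, pixel width and frame width stated above. The sketch below is illustrative only: the function name is hypothetical, and it assumes a centered principal point and no lens distortion.

```python
import math


def field_of_view_deg(focal_length_m, pixel_width_m, n_pixels):
    """Full angular field of view of an ideal pinhole camera, in degrees."""
    sensor_width_m = pixel_width_m * n_pixels
    return math.degrees(2.0 * math.atan(sensor_width_m / (2.0 * focal_length_m)))


# Thermal camera from the text: 25 mm lens, 18 um pixels, 1024 pixels across.
fov_thermal = field_of_view_deg(25e-3, 18e-6, 1024)
print(f"{fov_thermal:.1f} deg")  # approximately 40.5 deg
```

The same function can be applied to other candidate lens and sensor combinations when selecting equipment, provided the pixel width and frame size are known.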
When designing camera placement, scientists should consider their observational objectives, the amount of 3D reconstruction uncertainty they may tolerate, and potential additional requirements introduced by the manual or automatic post-experiment video analysis.
Two observational objectives that commonly conflict are the size of the volume of space in which the animals are observed and the spatial resolution at which they are observed. When designing an experiment for such studies, we suggest imposing a lower bound on the size of animals in the image, so that they are not recorded at sizes that will make post-experiment analysis difficult. Based on a pinhole camera model, the bound xmin on the pixel span of the animal in the image can only be guaranteed if the observation distance between animals and each camera is at most:
$$D_{\max} = \frac{f\,X}{x_{\min}\,p}, \qquad (1)$$
where X is the length of the animal, f is the focal length of the camera and p is the physical width of a pixel. For our studies, in the interest of observing the flight paths of the animals over a large distance, we chose to allow a small image size. A 10 pixel nose-to-tail span of a 10-cm-long bat in an image was ensured for animals that flew at distances smaller than Dmax=(25 mm×10 cm)/(10 pixel×18 μm pixel⁻¹)≈13.9 m from our thermal cameras.
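The pinhole bound described above, Dmax = fX/(xmin p), is simple to evaluate when comparing candidate lenses or minimum image sizes. The following is a minimal sketch rather than part of easyCamera; the function and argument names are hypothetical, and all lengths are assumed to be in meters.

```python
def max_observation_distance(animal_length_m, focal_length_m,
                             pixel_width_m, min_pixel_span):
    """Largest camera-to-animal distance at which an animal of the given
    length still spans at least min_pixel_span pixels (pinhole model)."""
    return focal_length_m * animal_length_m / (min_pixel_span * pixel_width_m)


# Values from the text: 10 cm bat, 25 mm lens, 18 um pixels, 10 pixel span.
d_max = max_observation_distance(0.10, 25e-3, 18e-6, 10)
print(f"D_max = {d_max:.2f} m")  # about 13.89 m
```

Doubling the required pixel span halves the permitted distance, which makes the trade-off between observation volume and spatial resolution explicit.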
An inescapable source of uncertainty in the stereoscopic reconstruction of a 3D point is the quantization of intensity measurements (light or thermal radiation) into an array of discrete pixels. Each pixel, projected into space, defines a pyramidal frustum expanding outward from the camera. The location of the 3D point resides somewhere in the intersection of the frustums defined by pixels in each image. For any camera configuration, we can estimate this uncertainty for every 3D point observed by at least two cameras via simulation using the easyCamera software. Our procedure first projects the 3D point onto the image plane of each camera and quantizes the location of each projection according to the pixel grid of each camera. The discrete pixel coordinates of the image points in each camera are then used to reconstruct a 3D position via triangulation. The reconstruction uncertainty is finally computed as the difference between the original and reconstructed positions of the point.
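The project, quantize, triangulate and compare procedure described above can be sketched for a single point and two cameras. This is not the easyCamera implementation: the camera poses, the centered principal point, the midpoint triangulation method and all function names are assumptions chosen to keep the example short and dependency-free.

```python
import math

F_PX = 25e-3 / 18e-6  # focal length in pixels: 25 mm lens, 18 um pixels
CU = CV = 512.0       # assumed principal point (center of a 1024x1024 frame)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def yaw_rotation(theta):
    """Camera-to-world rotation for a camera yawed by theta (rad) about y."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def to_world(rot, v):
    return [dot(row, v) for row in rot]


def to_camera(rot, v):
    # The inverse of a rotation matrix is its transpose.
    return [dot([rot[0][i], rot[1][i], rot[2][i]], v) for i in range(3)]


def project(point, cam_pos, cam_rot):
    """Pinhole projection of a world point to continuous pixel coordinates."""
    x, y, z = to_camera(cam_rot, [p - c for p, c in zip(point, cam_pos)])
    return (F_PX * x / z + CU, F_PX * y / z + CV)


def quantize(uv):
    """Snap a pixel coordinate to the center of the pixel containing it."""
    return (math.floor(uv[0]) + 0.5, math.floor(uv[1]) + 0.5)


def ray_direction(uv, cam_rot):
    """World-frame direction of the ray through pixel coordinate uv."""
    return to_world(cam_rot, [(uv[0] - CU) / F_PX, (uv[1] - CV) / F_PX, 1.0])


def triangulate(c1, d1, c2, d2):
    """Midpoint of the shortest segment joining two (non-parallel) rays."""
    w = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return [0.5 * (p + s * u + q + t * v) for p, u, q, v in zip(c1, d1, c2, d2)]


def reconstruction_error(point, cameras, use_quantization=True):
    """Distance (m) between a point and its two-camera reconstruction."""
    rays = []
    for pos, rot in cameras:
        uv = project(point, pos, rot)
        if use_quantization:
            uv = quantize(uv)
        rays.append((pos, ray_direction(uv, rot)))
    (c1, d1), (c2, d2) = rays
    return math.dist(point, triangulate(c1, d1, c2, d2))


# Hypothetical setup: 2 m baseline, cameras toed in toward a point 10 m away.
cameras = [([-1.0, 0.0, 0.0], yaw_rotation(0.1)),
           ([1.0, 0.0, 0.0], yaw_rotation(-0.1))]
point = [0.3, -0.2, 10.0]
err_exact = reconstruction_error(point, cameras, use_quantization=False)
err_quantized = reconstruction_error(point, cameras)
print(f"quantization error: {err_quantized * 100:.2f} cm")
```

Evaluating `reconstruction_error` over a grid of points would give an uncertainty map of the kind described in the text; with more than two cameras, a least-squares triangulation would replace the midpoint method used here.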
Our simulation results, shown in Fig. 3, indicate that the size and shape of the observation volumes and the uncertainty due to quantization within these volumes can differ significantly depending on the number of cameras and their placement. Ensuring that the angle between the optical axes is not wider than the field-of-view angle of the cameras leads to ‘open’ intersection volumes that extend infinitely far away from the cameras (all examples in Fig. 3 except C), which is desirable because it facilitates recording even if the animals appear in an unexpected location. The level of uncertainty increases with the distance from the cameras because the volume of the intersection of the pixel frustums also increases with this distance.
In addition to the reconstruction uncertainty created by quantization, we also consider the reconstruction uncertainty arising from the difficulty in identifying the location of an animal in an image (Fig. 4). The location of an animal is often thought of as a single point, e.g. at the center of its body. Localization accuracy of this ill-defined point depends on the resolution of the animal in the image (Fig. 4B). To estimate uncertainty in localization, we added a stochastic element to the simulation procedure described above by perturbing the two-dimensional projections with noise before quantization (supplementary material Fig. S2). Over 100 trials, the RMS distance between the original and reconstructed scene points gives an estimate of the reconstruction uncertainty at the original point (Fig. 4D). The camera placements we selected for our bat field experiments (Fig. 1A) were similar to the configuration shown in Fig. 3B. With our simulation, we were able to determine, prior to any field work, that the levels of uncertainty due to quantization and localization issues would be acceptable for use (supplementary material Figs S2–4).
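The stochastic variant can be illustrated with a top-down, two-dimensional simplification in which each camera has a one-dimensional image. The sketch below is an assumption-laden illustration, not the easyCamera code: the Gaussian noise model, the noise levels, the camera layout and all function names are hypothetical.

```python
import math
import random

F_PX = 25e-3 / 18e-6  # focal length in pixels: 25 mm lens, 18 um pixels
CU = 512.0            # assumed principal point of the 1D image


def project_1d(point, cam_x, yaw):
    """Continuous pixel coordinate of point (x, z) seen from (cam_x, 0)."""
    bearing = math.atan2(point[0] - cam_x, point[1])  # angle from the z axis
    return F_PX * math.tan(bearing - yaw) + CU


def ray_1d(u, yaw):
    """Unit direction (x, z) of the ray through pixel coordinate u."""
    bearing = math.atan((u - CU) / F_PX) + yaw
    return (math.sin(bearing), math.cos(bearing))


def intersect(x1, d1, x2, d2):
    """Intersection of two rays that start at (x1, 0) and (x2, 0)."""
    det = d1[0] * d2[1] - d1[1] * d2[0]
    s = (x2 - x1) * d2[1] / det
    return (x1 + s * d1[0], s * d1[1])


def rms_uncertainty(point, cameras, noise_sd_px, trials=100, seed=0):
    """RMS distance between a point and its noisy, quantized reconstructions."""
    rng = random.Random(seed)
    squared = []
    for _ in range(trials):
        rays = []
        for cam_x, yaw in cameras:
            u = project_1d(point, cam_x, yaw) + rng.gauss(0.0, noise_sd_px)
            u = math.floor(u) + 0.5  # quantize to the pixel center
            rays.append((cam_x, ray_1d(u, yaw)))
        (x1, d1), (x2, d2) = rays
        est = intersect(x1, d1, x2, d2)
        squared.append((est[0] - point[0]) ** 2 + (est[1] - point[1]) ** 2)
    return math.sqrt(sum(squared) / trials)


# Hypothetical setup: 2 m baseline, cameras toed in toward a point 10 m away.
cameras = [(-1.0, 0.1), (1.0, -0.1)]
point = (0.3, 10.0)
rms_quantization_only = rms_uncertainty(point, cameras, noise_sd_px=0.0)
rms_with_noise = rms_uncertainty(point, cameras, noise_sd_px=2.0)
print(f"{rms_quantization_only * 100:.1f} cm vs {rms_with_noise * 100:.1f} cm")
```

A fixed seed keeps the estimate reproducible between planning sessions; the appropriate noise standard deviation depends on how many pixels the animal spans and is left here as a free parameter.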
Reconstruction error occurs when image locations corresponding to different animals are mistakenly used to reconstruct 3D positions. These ‘data association’ errors are commonly made by automated tracking methods, especially if the animals appear similar and small in the images. Camera selection and placement can reduce the potential occurrence of data association errors by imposing appropriate geometric constraints on the triangulation (Fig. 5). We recommend use of three or more cameras and a non-collinear camera placement that ensures that the image planes are not parallel (avoiding the configuration in Fig. 3F).
Protocol for field experiment
The protocol we recommend for field work includes two phases. In the first phase, prior to any camera setup and recordings, a preliminary plan for the location of the recording space and placement of the cameras is made. The easyCamera software is then used to estimate the uncertainty in localizing the study organism in this space, and adjustments to the plan can be made by experimenting with other hypothetical camera configurations. In the second phase, the actual camera setup in the field can be done easily because no field measurements of camera pose or distances to the animals of study are needed. The only measurements required are references for scene scale and orientation. In our experiments, the known length of a calibration wand (supplementary material Figs S5, S6) moved through the scene provided a scale reference, and gravitational acceleration, estimated from the ballistic trajectories of thrown objects, provided a reference for scene orientation.
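The two reference measurements described above, scale from the known wand length and orientation from gravitational acceleration, can be sketched as follows. This is a simplified illustration under stated assumptions, not the easyWand procedure: it ignores air drag, assumes uniformly sampled and already scaled 3D positions, and estimates acceleration from averaged second differences, which amplify digitizing noise (a least-squares quadratic fit would be more robust). All function names are hypothetical.

```python
import math


def mean_acceleration(positions, dt):
    """Average second finite difference of uniformly sampled 3D positions."""
    n = len(positions) - 2
    acc = [0.0, 0.0, 0.0]
    for i in range(1, len(positions) - 1):
        for k in range(3):
            acc[k] += (positions[i + 1][k] - 2.0 * positions[i][k]
                       + positions[i - 1][k]) / dt ** 2
    return [a / n for a in acc]


def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def wand_scale(known_length_m, reconstructed_lengths):
    """Factor that rescales an arbitrary-unit reconstruction to meters."""
    mean_length = sum(reconstructed_lengths) / len(reconstructed_lengths)
    return known_length_m / mean_length


# Synthetic ballistic track in a reconstruction frame tilted relative to gravity.
g_dir = unit([0.1, -0.2, -1.0])          # true "down" in the calibration frame
dt = 1.0 / 100.0                         # 100 Hz frame rate
p0, v0 = [1.0, 2.0, 3.0], [4.0, 1.0, 6.0]
track = [[p0[k] + v0[k] * t + 0.5 * 9.81 * g_dir[k] * t * t for k in range(3)]
         for t in (i * dt for i in range(30))]

acc = mean_acceleration(track, dt)
down = unit(acc)                         # estimated gravity direction
scale = wand_scale(1.0, [2.0, 2.1, 1.9]) # 1.0 m wand, arbitrary-unit lengths
print(down, scale)
```

Rotating the reconstruction so that `down` coincides with the negative vertical axis aligns the scene to gravity, and the magnitude of `acc` offers a sanity check on the scale, since it should be close to 9.81 m s⁻² once the wand-based scale has been applied.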
Our calibration method generally produces more accurate results the more sets of corresponding image points it uses (supplementary material Table S1, column 2). Thus, we used recordings of the animals, digitized automatically using a preliminary wand-only calibration, to augment the number of calibration points and volume encompassed by them. This augmentation is a feature of our SBA-based calibration pipeline not possible with calibration-frame-based DLT methods. See supplementary material Table S1 for an exploration of the effects of using animal points in the calibration. We typically recorded our study videos of bats and birds after obtaining videos of calibration objects, but this order can be reversed.
Post-experiment camera calibration
Our easyWand calibration software bundles a modular pipeline of algorithms that can be used to estimate the relative positions and orientations of the cameras and their intrinsic parameters. The first, most time-consuming step of the calibration procedure is to manually or automatically digitize the image locations of objects recorded in all views. In our post-experiment analysis, we identified thousands of sets of matching image points.
Our software takes the focal lengths and principal points obtained directly from the lenses and image sensors as preliminary estimates of the intrinsic camera parameters and uses the 8-point algorithm (Hartley and Zisserman, 2004) to compute preliminary estimates of the camera pose and the 3D positions of the calibration objects. It then applies the SBA algorithm (Lourakis and Argyros, 2009) to obtain refined estimates for all calibration parameters. Finally, it converts the camera calibration parameters to DLT coefficients so that the results integrate easily into previously existing workflows. None of the 8-point, SBA or DLT algorithms explicitly requires use of a wand, and other sources of matched camera points could be used as input. Wands are convenient for their mobility and as a means to measure scene scale and conduct additional error checking.
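The final conversion step rests on the standard relationship between a 3×4 projection matrix and the 11 DLT coefficients: the matrix is scaled so that its last entry equals 1 and the remaining entries are read off row by row. The sketch below illustrates that relationship only; it is not taken from easyWand, the function names and the example matrix are hypothetical, and image-axis conventions (for example, where the image origin lies) may differ between DLT tools.

```python
def projection_to_dlt(P):
    """Flatten a 3x4 projection matrix into 11 DLT coefficients,
    after scaling so that P[2][3] equals 1."""
    scale = P[2][3]
    if scale == 0:
        raise ValueError("P[2][3] is zero; shift the world origin first")
    flat = [value / scale for row in P for value in row]
    return flat[:11]


def project_with_dlt(L, X, Y, Z):
    """Image coordinates of a 3D point from 11 DLT coefficients."""
    denom = L[8] * X + L[9] * Y + L[10] * Z + 1.0
    u = (L[0] * X + L[1] * Y + L[2] * Z + L[3]) / denom
    v = (L[4] * X + L[5] * Y + L[6] * Z + L[7]) / denom
    return u, v


def project_with_matrix(P, X, Y, Z):
    """Image coordinates of a 3D point from a 3x4 projection matrix."""
    h = [row[0] * X + row[1] * Y + row[2] * Z + row[3] for row in P]
    return h[0] / h[2], h[1] / h[2]


# Hypothetical camera: P = K [I | t] with f = 1388.9 px, principal point
# (512, 512) and translation t = (0.5, 0, 2).
P = [[1388.9, 0.0, 512.0, 1718.45],
     [0.0, 1388.9, 512.0, 1024.0],
     [0.0, 0.0, 1.0, 2.0]]
L = projection_to_dlt(P)
uv_dlt = project_with_dlt(L, 0.3, -0.2, 10.0)
uv_matrix = project_with_matrix(P, 0.3, -0.2, 10.0)
print(uv_dlt, uv_matrix)
```

Because a projection matrix is defined only up to scale, both forms project a 3D point to the same image location, which is why the DLT representation can carry an SBA-refined calibration into DLT-based digitizing tools.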
The easyWand software tool computes three measures of calibration inaccuracy. The ‘reprojection error’, measured in pixels for each camera, is the RMS distance between the original and reprojected image points of each calibration point, where the ‘reprojected’ image points are computed using the estimated 3D position of the calibration point and the estimated camera parameters. The second measure of inaccuracy is the ratio of the standard deviation of wand-length estimates to their mean. A large ratio may indicate problems with the calibration; for example, unidentified lens distortion. The third measure is the average uncertainty in the position of each wand tip, estimated from the distance between the two tips. The easyCamera tool can be used with the estimated extrinsic parameters to compute the uncertainty of the reconstructed 3D positions of the study animals.
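The first two measures can be computed directly from their definitions. The sketch below is illustrative and not the easyWand code: the function names are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
import math
import statistics


def reprojection_rms(observed, reprojected):
    """RMS pixel distance between observed and reprojected image points
    for one camera."""
    squared = [(uo - ur) ** 2 + (vo - vr) ** 2
               for (uo, vo), (ur, vr) in zip(observed, reprojected)]
    return math.sqrt(sum(squared) / len(squared))


def wand_length_ratio(tips_a, tips_b):
    """Standard deviation of reconstructed wand lengths divided by their mean."""
    lengths = [math.dist(a, b) for a, b in zip(tips_a, tips_b)]
    return statistics.pstdev(lengths) / statistics.mean(lengths)


# Toy data: two image points in one camera, and three wand reconstructions.
rms_px = reprojection_rms([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
ratio = wand_length_ratio(
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (1.02, 0.0, 0.0), (0.98, 0.0, 0.0)],
)
print(f"reprojection RMS = {rms_px:.3f} px, wand ratio = {ratio:.4f}")
```

Both quantities are dimensionless or in pixels, so they can be compared across sessions regardless of scene scale.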
We wish to thank Brian Borucki, Ashley Banks, Ann Froschauer and Kimmi Swift for assisting with the bat recordings, and Nick Deluga for assisting with the swallow recordings. We also thank Dewayne Davis and David Bamberger for property access, and the Department of Texas Parks and Wildlife for permitting assistance. Thanks also to two referees for providing useful feedback.
D.T.: developing methodology and software, collecting data, writing manuscript. N.F.: collecting data, writing manuscript. B.J.: collecting data. E.B.: developing software. D.E.: developing software, writing manuscript. Z.W.: developing software. M.B.: study concept, developing methodology, writing manuscript. T.H.: study concept, developing methodology and software, writing manuscript.
The authors declare no competing financial interests.
This work was partially funded by the Office of Naval Research [N000141010952 to M.B. and T.H.], the National Science Foundation [0910908 and 0855065 to M.B. and 1253276 to T.H.] and the Air Force Office of Scientific Research [FA9550-07-1-0540 to M.B.].
Supplementary material available online at http://jeb.biologists.org/lookup/suppl/doi:10.1242/jeb.100529/-/DC1
© 2014. Published by The Company of Biologists Ltd