A monocular wide-field vision system for geolocation with uncertainties in urban scenes

In engineering applications related to video surveillance, the use of monocular omnidirectional cameras would reduce the costs and complications associated with the infrastructure, installation, synchronization, maintenance and operation of multiple cameras. This makes omnidirectional cameras very useful for transport analysis, a key task of which is to accurately geolocate vehicles and/or pedestrians observed over an ample region. The problem of measuring on the plane was previously solved for monocular central perspective images. However, the problem of determining uncertainties in geolocation using monocular omnidirectional images has not been addressed. This problem is not trivial due to the complexity of the image formation models associated with these cameras. The contributions of this work are: (1) the geolocation problem is solved for omnidirectional monocular images through a Bayesian inference approach; (2) the calculation of the Bayesian marginalization integrals is simplified through first-order approximations; (3) the accuracy of the estimated positions and uncertainties is shown through Monte Carlo simulations under realistic measurement conditions; (4) the method is applied to geolocate a vehicle's trajectory on a satellite map in an urban setting.


Introduction
Currently, central perspective projection cameras (described by the pin-hole model) are commonly used to solve video-surveillance tasks. These cameras typically have a maximum field of view of approximately 60°×60°, so in order to observe an ample region it is necessary to use numerous cameras, increasing the costs and complications associated with infrastructure, installation, synchronization, maintenance and operation. A very common solution is to use pan-tilt-zoom (PTZ) cameras, whose direction of observation can be remotely commanded by an operator. However, they have the limitation that they can only observe one region at a time; that is, when the operator reorients the camera it loses vision of the other regions. Additionally, for a PTZ camera to monitor a broad panorama without being continuously controlled by a human operator, frequent automatic reorientations must be programmed, generating mechanical friction and reducing the camera's lifespan.
The limitations mentioned above can be overcome using omnidirectional cameras (OCs) with a visual field of approximately 360°×180°, allowing the observation of the whole hemisphere of interest of the scene [1,2]. Several sensors can be used to achieve wide-field vision, such as synthetic compound eyes and catadioptric and dioptric cameras. Synthetic compound eyes are sensors of reduced size that generally use a set of photodiodes mimicking an ordered array of ommatidia [3]. Due to their small size and reduced weight, these sensors are widely used in small mobile robots; however, given their low resolution they are not used in video-surveillance tasks. Catadioptric cameras use a standard digital camera along with a specially curved reflective surface to increase the camera's field of view [4,5]. This is a convenient and flexible approach, since the mirror profile can be adapted to achieve greater resolution in certain directions of interest. Nevertheless, catadioptric OCs tend to be rather bulky and costly. Finally, omnidirectional dioptric cameras employ a fisheye lens with a field of view so wide that it extends a few degrees behind the camera. Fisheye lenses resemble the natural underwater phenomenon by which a fish sees a hemispherical upward view from beneath the water, known as Snell's window. At present there are several commercial versions of these cameras that can produce a reasonable angular resolution using high-resolution CCD sensors (around 5 megapixels) [6].
Given the previously mentioned potential advantages of OCs, this work focuses on the development of a wide-field video-surveillance system for the monitoring of urban scenes. In this application, the objects of interest (for example vehicles and/or pedestrians) are bound to the Earth's surface by gravity [2,7,8], and most of the events that need attention or cautionary measures take place below the horizon. Therefore, the objective of this system is to measure the geographical location of objects on the terrestrial plane along with the uncertainty of the measurement (figure 1(a)).
The problem of measuring in the world plane from central perspective images, and of accurately predicting the uncertainty of these measurements, was solved by Criminisi et al [9]. They use a homography to map positions from the image to a world plane and predict uncertainty with a first-order model, taking into account uncertainties both in the image input points and in the homography matrix. They use a linear distortion model, estimate the projection parameters by minimizing the projection error in the world plane, and demonstrate that the first-order analysis is accurate.
In order to measure the location of objects on a terrestrial plane with OCs, the first step is to model and correct the distortions (figure 1(a)). In this sense, there is a wide bibliography [4,10,11] on the calibration of catadioptric and dioptric OCs. However, the problem of determining uncertainties in the process of predicting location with OCs has not been addressed. This problem is not trivial due to the complexity of the image formation models associated with these cameras; approaching it from a Bayesian perspective is thus the main contribution of this work.
The Bayesian approach is well suited to formulating both the camera calibration and the position estimation problems in explicit probabilistic terms. Sundareswara and Schrater [12] demonstrated for the case of 3D reconstruction that Bayesian prediction, by marginalizing over the parameters, is less susceptible to statistical fluctuations than the plug-in approach, in which only the most likely parameter value is used. Similarly, Civera et al [13] use a Bayesian approach for intrinsic and extrinsic calibration in addition to 3D scene estimation, but they rely on multiple-view geometry because they use the movement of the camera. Also, due to the complexity of the projection models, sampling algorithms are used for the estimation of parameters (calibration) and scene positions.
This paper addresses the problems of calibration and prediction with a Bayesian approach for a single static OC already installed in an urban setting, and develops a calibration method for localization in the ground plane. The calibration is simple and only requires an operator to manually match some fiducial points in a satellite image and in the OC image. The Bayesian marginalization integrals are simplified by assuming that the projection function can be approximated by a first-order Taylor series, which results in fast calculations. This makes the method suitable for real-time localization with uncertainty.
This work is organized as follows. Section 2 describes the workflow of the proposed method, the geometric model of image formation (projection) that maps world coordinates to image coordinates, and its inverse (back-projection), by which world coordinates are obtained from image coordinates. Section 3 describes the Bayesian approach in detail, along with its implementation in Python scripts applied to real data. Section 4 shows, through Monte Carlo simulations under realistic conditions of application, that the linear approximation and the prediction algorithm are valid, and then applies the method to real data for the geolocation of vehicle trajectories.

Workflow of the proposed method
The model of image formation depends on both the OC's intrinsic parameters (such as focal length, radial distortion and the CCD optical center, symbolised as G) and the extrinsic parameters describing the relative pose of the camera with respect to the world (position and orientation, symbolised as Q) [14,15]. A back-projection function mapping image coordinates X_I to world-plane coordinates X_w using the camera parameters G and Q can be calculated (figure 1(a)).
The calibration's first step (figure 1(b)) is performed in the laboratory by detecting multiple corners of a chessboard with known positions in the world plane (X_w). This process generates an extensive set of training data, D1, from which the intrinsic parameters and their uncertainties are estimated (denoted as the mean m_G and the covariance C_G). After this, it is assumed that the OC is installed in the field observing the region of interest (figure 1(c)). In this setup, a reduced set of world points X'_w and their correspondences in the image are observed (denoted as the calibration dataset D2), which allow the mean and covariance of the pose (m_Q, C_Q) to be estimated. The calculations associated with the estimation of the model parameters {G, Q} are performed offline. After this process, the objective is to project the coordinates of a new detection in the image (X'_I) to the world plane (m_w) together with its uncertainty (C_w), which is intended to be computed online; see figure 1(d). Formally, the probability density function (PDF) p(X_w | X'_I, C'_I, D1, D2) is estimated, i.e. the probability of the position on the map, X_w, given the measurement in the image, (X'_I, C'_I), and the calibration data. The linear propagation of uncertainties approach is taken, since it is computationally inexpensive and also accurate, as demonstrated below.

[Figure 1. (a) Back-projection: the position on the world plane, X_w, is calculated from the position in the image, X_I, the intrinsic parameters G and the extrinsic parameters Q. (b) Intrinsic calibration: N images of a known chessboard are acquired and the corners in each image are automatically detected. This process generates the dataset D1, which allows the mean and variance of the intrinsic parameters, m_G and C_G, to be calculated through maximum likelihood estimation. (c) Extrinsic calibration: on the left, an image of the plane of interest acquired with the OC; on the right, a satellite image of the same area. A few points that can be detected in both images (red dots) are identified. With this dataset, denoted D2, and additional a priori information about the OC's pose provided by the camera installer, the extrinsic parameters m_Q and C_Q are estimated by maximum a posteriori estimation. (d) Geolocation of objects: an object of interest is detected in the OC image (X'_I, with uncertainty C'_I). This information is combined with the estimates of the calibration parameters to predict the object's position in the world reference frame and the associated uncertainty, m_w and C_w.]

Camera
In this work a fisheye IP camera, the VIVOTEK FE8172 (figure 2(a)), is used. It has a field of view (FOV) of 360°×183°, allowing the observation of a full hemisphere; the camera is compact and easily connected over an Ethernet port. A simplified intrinsic model for this OC is shown in figure 2(b). An incident ray from the point P with an angle θ with respect to the optical axis is refracted at the projection center of the camera, forming an exit angle different from the entrance angle, and projects onto the image plane (corresponding to the CCD sensor) at a distance r_d from the projection center. A fisheye camera model is specified by defining a relationship between r_d and θ, which is in general strongly nonlinear. This differs from the model of central perspective projection cameras, for which input and output angles are the same (pin-hole model). In a previous work [1], a calibration of the VIVOTEK FE8172 camera was performed showing that the stereographic model is a good characterisation of the lens' distortion (figure 2(d)). It is given by the relation r_d = k tan(θ/2), where k is the central distance. At full resolution, 1920×1920, the radial distortion parameter obtained was k = 952.16 px. In order to improve the system's performance in terms of frame rate, the resolution is set to 1600×900 and the camera is set to 'wide' mode, obtaining a FOV of 183° horizontally and about 120° vertically. This does not hinder performance because the FOV of the full fisheye mode covers an area so extensive that a large part of the image corresponds to uninteresting regions, while the full-resolution mode reduces the camera's frame rate with little visual information added with respect to 1600×900.
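The radial relation above can be sketched in a few lines of Python. This is an illustrative helper, not the paper's code; it assumes the stereographic form r_d = k tan(θ/2) with the reported k = 952.16 px:

```python
import numpy as np

# Stereographic radial mapping (sketch, consistent with figure 2(b)):
# a ray arriving at angle theta from the optical axis lands at radius
# r_d = k * tan(theta / 2) pixels from the projection center.
K_STEREO = 952.16  # px, value reported at full 1920x1920 resolution

def theta_to_radius(theta, k=K_STEREO):
    """Incidence angle theta (radians) -> image radius in pixels."""
    return k * np.tan(theta / 2.0)

def radius_to_theta(r_d, k=K_STEREO):
    """Inverse mapping: image radius in pixels -> incidence angle."""
    return 2.0 * np.arctan(r_d / k)
```

Note that a ray at θ = 90° lands at radius k·tan(45°) = k ≈ 952 px, which is consistent with a 180° FOV filling the 1920-pixel-wide sensor.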

Stereographic projection model
A projection model describes the path of a light ray that originates from a 3D world position, passes through the camera lens and hits the CCD chip, incrementing an image pixel's intensity measurement (figure 3). This process is broken down in two steps [1,15,16]. First, the 3D position of the light source, X_w, is rotated and translated to the camera's frame of reference, yielding X_c. Information about the distance to the camera is eliminated, since only the direction of arrival of the light ray with respect to the camera matters, yielding a 2D vector X_h. Second, it goes through a function that models the lens distortion and the projection onto the CCD chip, yielding X_I. Given that the third dimension is lost, it is not possible in general to back-project X_w from X_I, but under the hypothesis that the light source is on the ground-level plane the back-projection can be solved. Several models of optical distortion have been proposed [10,17]; in [1] it is shown that the stereographic model is a good description for the OC used here (figure 2(d)) and is easy to fit, having only one parameter describing distortion. Although there are more general distortion models, they have more parameters than needed for this case [18]. In the rest of this section the projection and back-projection functions are formulated following OpenCV's projection model. The rotation matrix R is calculated from three parameters that express the orientation; following OpenCV's convention, r_x, r_y, r_z are the components of the Rodrigues vector [19,20]. The six parameters that describe the rotation and translation are the extrinsic parameters, the concatenation of the Rodrigues and translation vectors: [r_x, r_y, r_z, t_x, t_y, t_z]. Information about the distance to the camera is eliminated by projecting onto the image plane placed at z_c = 1, yielding the now two-dimensional coordinates X_h. Optical distortion is then applied to X_h. The stereographic model only deals with radial distortion, assuming cylindrical symmetry.
The radius in the image plane, r_h (equation (3)), is used to calculate the polar angle between the light ray and the optical axis (equation (4)). The stereographic model applies a nonlinear distortion to it (equation (5)), introducing the only nonlinear intrinsic parameter, k, which also scales to pixel units. In symbols, these steps are:

r_h = sqrt(x_h^2 + y_h^2)  (3)
θ = arctan(r_h)  (4)
r_d = k tan(θ/2)  (5)

[Figure 3. Image formation model when the OC observes the point X_w on the world plane (z_w = 0). The incident ray forms an angle θ with the optical axis of the camera (which corresponds to the z axis of the camera's frame of reference). This ray is refracted according to the stereographic model (see figure 2(b)), which projects onto the CCD sensor.]
Conventionally, the origin of pixel coordinates is located at the top left corner of the image, so the coordinates are appropriately displaced by the optical center [c_x, c_y]. The parameters that depend solely on the camera and describe the optical distortion are the intrinsic parameters: k, c_x and c_y. OpenCV's formulation of the projection model is followed, except for the specific form of the radial distortion function. This makes it fairly easy to later extend the procedure explained here to OpenCV's distortion models.
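The projection pipeline above can be sketched end to end as follows. This is a minimal illustration, not the paper's code: it assumes the stereographic relation r_d = k tan(θ/2), and the parameter names (rvec, tvec, k, cx, cy) are illustrative.

```python
import numpy as np

def rodrigues_matrix(rvec):
    """Rotation matrix from a Rodrigues (axis-angle) vector."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    a = rvec / theta
    K = np.array([[0, -a[2], a[1]],
                  [a[2], 0, -a[0]],
                  [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def project(Xw, rvec, tvec, k, cx, cy):
    """Project a 3D world point to stereographic image coordinates."""
    # 1. World frame -> camera frame.
    Xc = rodrigues_matrix(rvec) @ Xw + tvec
    # 2. Normalize onto the z_c = 1 plane (only the ray direction matters).
    Xh = Xc[:2] / Xc[2]
    # 3. Stereographic radial distortion: r_d = k * tan(theta / 2),
    #    with tan(theta) = r_h on the z_c = 1 plane.
    rh = np.hypot(Xh[0], Xh[1])
    theta = np.arctan(rh)
    rd = k * np.tan(theta / 2.0)
    scale = rd / rh if rh > 0 else 0.0
    # 4. Shift to pixel coordinates (origin at the top-left corner).
    return scale * Xh + np.array([cx, cy])
```

A point on the optical axis projects exactly onto the optical center (cx, cy), which is a quick sanity check of steps 2-4.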

Stereographic back-projection
Since the goal is to predict the real-world position of objects, the model of image formation from the previous section must be inverted. To map image positions into world positions, every step of the image formation model is reversed so that a position in the image, X_I, can be transformed into a position in the physical world, X_w. The back-projection is referred to as the function that performs this mapping given the intrinsic parameters [k, c_x, c_y] and the extrinsic parameters [r_x, r_y, r_z, t_x, t_y, t_z]. The calculations are shown in algorithm 1.
[Algorithm 1: stereographic back-projection. Inputs: X_I, the intrinsic parameters (k, c_x, c_y) and the extrinsic parameters (r_x, r_y, r_z, t_x, t_y, t_z). Lines 2-5 perform the intrinsic correction, the rotation matrix is obtained through the Rodrigues rotation formula, and lines 9-13 solve the projection onto the world plane.]

First, in line 2 the image position is displaced to be expressed with reference to the optical center [c_x, c_y]. The radius with respect to the optical center in pixels is used to calculate the angle of arrival of the light ray using the parameter k in line 3. The tangent of this angle is the radius in homogeneous coordinates (line 4), which by simple proportionality serves to calculate X_h, as shown in line 5. This determines the direction of the incoming ray in the frame of reference of the camera. To project to the world frame of reference, a 3D position needs to be calculated. The missing third-dimensional information (the distance of the object to the camera) is supplied, as in [7], by a reasonable hypothesis in the context of traffic monitoring: the object of interest is on the ground.
Given X_h = [x_h, y_h], the parametrization of the pose (R_ij being the element i, j of the rotation matrix R) and the hypothesis z_w = 0, working with equations (1) and (2) yields the well-known collinearity equations. The solution is shown in lines 9 through 13.

Previous work on intrinsic calibration and python libraries
The most widely used camera calibration procedure is based on Zhang [21] and Bouguet [22] for its ease of use. The procedure is usually: print a chessboard-like pattern and attach it to a planar surface; take about 10 images of the pattern in different positions with respect to the camera; detect the chessboard corner points in the images automatically; and feed the detected corner points and their corresponding planar coordinates to the algorithm, which returns the distortion parameters of the camera and the rotation-translation of the chessboard in each image.
OpenCV [16] is an open-source computer vision library that has been largely adopted as the primary development tool by the community of researchers and developers in computer vision [23]. It includes solutions for camera calibration [21,22] and camera pose estimation [24] for a variety of optical distortion models, including OCs. Among the solutions it provides, it estimates distortion parameters, extrinsic parameters and perspective transformations. As will be explained in section 5, this paper's contribution can be added to OpenCV to yield a more complete treatment of uncertainties.

Bayesian approach to calibration
In this section an approach to camera calibration from a Bayesian perspective is proposed. The starting point is the general expression of the predictive distribution. From there, the two-step calibration process is deduced: intrinsic and extrinsic calibration; in both cases the posterior probability of the parameters given the calibration data is estimated. With the estimated posteriors on the parameters, the predictive distribution is approximated through a linear propagation of uncertainty.
The predictive distribution of the world position of an object is conditioned on a measurement in the image and on previous data, p(X_w | X'_I, C'_I, D1, D2). Following the standard procedures of camera calibration, the previous data is separated in two: the intrinsic calibration data, D1, and the extrinsic calibration data, D2. The new measurement corresponds to the detection of an object in the image, X'_I. The detection process gives a position in the image but must also report some quantification of the uncertainty of the detection. It will be denoted by a covariance matrix C'_I that is considered to come directly from the detection algorithm. It follows that the predictive PDF can be expanded by marginalizing over the image position and the parameters:

p(X_w | X'_I, C'_I, D1, D2) = ∫ p(X_w | X_I, G, Q) p(X_I | X'_I, C'_I) p(G | D1) p(Q | G, D1, D2) dX_I dG dQ

There are four terms in the integrand:
• The first term, p(X_w | X_I, G, Q), is the Dirac delta function on the back-projection.
• The second term describes the PDF of the random variable that represents the position in the image, assumed normal given a noisy measurement parameterised by X'_I, C'_I.
• The third term, p(G | D1), is the posterior probability of the intrinsic parameters given the intrinsic calibration data. It will be addressed in section 3.1. The result is the estimation of the mean and variance of said PDF, assumed normal; that is, m_G and C_G.
• The fourth term, p(Q | G, D1, D2), is the posterior probability of the pose of the camera given the extrinsic calibration data, and the intrinsic calibration as well. This is because the extrinsic calibration requires the results of the intrinsic calibration, as will be explained in section 3.2. Again, the estimated posterior is a normal with mean m_Q and covariance C_Q.

Replacing with the normal PDFs that will be estimated in the following pages yields the expression to be integrated. Figure 4 shows a graphical representation of the calculation of the predictive distribution for the hypothetical case in which the variables X_I, X_w are one-dimensional. It is important to note that, even though all the PDFs in the integrand were approximated by normal distributions, the integral is still hard to evaluate due to the nonlinearity of the back-projection.
The integral could be solved using expensive computational strategies, but as explained above the goal is to perform this calculation online. Linearising the back-projection function around m_G, m_Q and X'_I reduces the calculation to a simple linear combination of mutually independent normal random vectors [27]. In words, the approximately normal PDF of the predicted position in the world has a mean that is a direct evaluation of the back-projection function at the means of the detected image position and the parameters, and a covariance that combines the uncertainty of the detection and of the parameters through the Jacobian J.
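The first-order propagation just described can be sketched generically. The helper below is illustrative (it uses a numerical central-difference Jacobian where the paper may use analytic derivatives); it returns the propagated mean and covariance for any differentiable map f and Gaussian input:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x))
    J = np.zeros((f0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (np.asarray(f(x + dx)) - np.asarray(f(x - dx))) / (2 * eps)
    return J

def propagate(f, mean, cov):
    """First-order mean and covariance of f(x) for x ~ N(mean, cov)."""
    J = numerical_jacobian(f, mean)
    return np.asarray(f(mean)), J @ cov @ J.T
```

Stacking [X_I, G, Q] into one input vector with a block-diagonal covariance reproduces the combination of detection and parameter uncertainties described in the text.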

Intrinsic calibration
Let D1 = {(X_w^(i,j), X'_I^(i,j), C'_I^(i,j))} be the data set for intrinsic calibration, where each tuple (i, j) consists of the world coordinates of a corner of the chessboard calibration pattern in picture i, its detected image coordinates and the covariance of the detection. Detections might not all have the same accuracy, so C'_I^(i,j) is included in the list of calibration data. All the parameters for this data set are W = {G, Q_1, ..., Q_N}, where Q_i is the list of extrinsic parameters of picture i. Since at this point there is no prior information on W, the posterior density of the parameters given the data can be equated to the likelihood, which in turn is the product of the probability of each data tuple given the parameters. In symbols, applying the definition of conditional probability to leave (G, Q_i) on the right side of the conditional quickly leads to equation (13), which can be evaluated numerically for some value of W; it requires the calibration data and the computation of the back-projection and its derivative with respect to X_I, as shown in equation (14). The mean m_G and variance C_G are the components of the mean and variance of p(W | D1) (obtained by Metropolis-Hastings) that correspond to G.
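The per-corner likelihood evaluation can be sketched as follows. This is a hedged illustration of the structure implied by equations (13)-(14), not the paper's code: each corner contributes a Gaussian term in the world plane comparing the known corner position with the back-projection of its noisy detection, with the detection covariance propagated through the derivative of the back-projection with respect to X_I. The callables `back_project` and `jac_image` are placeholders.

```python
import numpy as np

def log_likelihood(params, data, back_project, jac_image):
    """Sum of per-corner Gaussian log-densities in the world plane.

    data: list of (Xw, XI, CI) tuples (true corner, detection,
    detection covariance). back_project(XI, params) -> 2D world point;
    jac_image(XI, params) -> 2x2 derivative of back_project w.r.t. XI.
    """
    total = 0.0
    for Xw, XI, CI in data:
        mu = back_project(XI, params)
        J = jac_image(XI, params)
        C = J @ CI @ J.T          # detection noise propagated to the world
        r = Xw - mu
        total += -0.5 * (r @ np.linalg.solve(C, r)
                         + np.log((2 * np.pi) ** 2 * np.linalg.det(C)))
    return total
```

A sampler such as Metropolis-Hastings then only needs this function evaluated at candidate values of W.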

Extrinsic calibration
After the intrinsic calibration in controlled conditions, where the PDF of G was estimated, the camera is set up in some urban location pointing at a zone of interest (figure 1(c)). The calibration data is now a set, denoted D2, of M points in the real world and their associated image coordinates. The extrinsic calibration is the procedure that estimates the mean and variance of the camera pose Q.
By the law of total probability, and assuming independence between Q and D1 and between G and D2, the posterior on Q is obtained by marginalizing the intrinsic parameters,

p(Q | D1, D2) = ∫ p(Q | G, D2) p(G | D1) dG,

and, as Q and G are a priori independent, p(Q | G, D2) ∝ p(D2 | Q, G) p(Q). Notice that a non-flat prior p(Q) on the camera pose is allowed, as this is a physical magnitude about which there might be some information after installation, unlike the camera's intrinsic parameters, which depend on the model and might be quite obscure to elucidate.
As in equation (14), the likelihood p(D2 | Q, G) can be calculated as the product of the likelihoods of each data tuple, resulting in equation (20).
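A maximum a posteriori pose estimate of this kind can be sketched with SciPy's general-purpose optimizer. This is an illustrative skeleton only: the residual function, the noise level `sigma` and the Gaussian prior are placeholders standing in for the quantities in the text, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(pose, residuals, sigma, prior_mean, prior_cov):
    """Negative log posterior: Gaussian likelihood + Gaussian pose prior."""
    r = residuals(pose)                            # stacked world-plane errors
    nll = 0.5 * np.sum((r / sigma) ** 2)           # iid Gaussian likelihood
    dp = pose - prior_mean                         # non-flat prior on the pose
    nlp = 0.5 * dp @ np.linalg.solve(prior_cov, dp)
    return nll + nlp

def map_pose(residuals, pose0, sigma, prior_mean, prior_cov):
    """MAP estimate of the pose by numerical minimization."""
    res = minimize(neg_log_posterior, pose0,
                   args=(residuals, sigma, prior_mean, prior_cov),
                   method="BFGS")
    return res.x
```

In practice an estimate like this serves as a good starting point for the sampler that characterizes the full posterior.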

Summing up calibration and prediction
In brief, the procedure is as follows. Taking images of a calibration pattern in the laboratory, as in figure 1(b), produces the intrinsic calibration data D1, and the result of the calibration is the parameterization of the posterior PDF of the intrinsic parameters, p(G | D1), with mean m_G and variance C_G. This is done by computing the posterior via equation (14) (refer to section 4 for more details) and standard sampling methods such as Metropolis-Hastings. When the camera is installed in its final position, the extrinsic calibration points D2 can be extracted (figure 1(c)); they are used to estimate the mean m_Q and variance C_Q of the posterior PDF p(Q | D1, D2) (computed as shown in equation (20)). This completes the calibration. With a new detection of a vehicle, (X'_I, C'_I), the predicted PDF in the world frame of reference is calculated with equation (9), as illustrated in figure 1(d). This prediction can be performed online since its computational cost is negligible.

Results
In this section it is shown that the linear approximation for uncertainty propagation delivers significant accuracy when compared to a more proper but computationally intensive nonlinear Monte Carlo estimation. The two-step calibration and prediction are first applied to simulated data: realistic chessboard pictures are generated for the intrinsic calibration, together with six final camera installation positions and orientations for the extrinsic calibration. Then, real chessboard data obtained in controlled conditions is used to estimate the intrinsic parameters; the camera was installed at a testing site and calibration points were manually obtained from images to estimate the extrinsic parameters. Finally, the uncertainty of the predicted world positions for a vehicle detected within the video sequence is shown.
As the pattern for intrinsic calibration, a 37 cm long chessboard with 9×6 interior corners was used. N = 33 pictures were taken and OpenCV's corner detector was applied, as shown in figure 5(a). These pictures were taken so as to cover the field of view, as suggested by Fraser [18]; the detected corners are shown in figure 5(b). OpenCV's calibrateCamera function takes the detected corners and their corresponding positions in 3D (figure 5(c)) and returns the 33 camera poses shown in figure 5(d). The detected chessboard corners and the estimated camera poses are used either as initial conditions for the sampling algorithms or as ground truth to generate synthetic data, as explained in the following subsections.
Both the acquisition of video/images and the off-line data processing were carried out on a desktop computer running the Linux operating system, using Python [29] scripts with the aid of the libraries NumPy [30], SciPy [31], Matplotlib [32] and OpenCV [16], and the Spyder IDE [33]. As OpenCV implements the calibration algorithms of Bouguet [22], it was adopted as the starting point for the calculations and for general image manipulation. The library PyMC3 [34] was used for Monte Carlo simulations.

Comparing linear approximation with Monte Carlo
In this section the first-order approximation of the propagation function is compared against a Monte Carlo (MC) evaluation of the nonlinear mapping. The heart of the stereographic model is a highly nonlinear radial distortion function, because it must conform to the severe optical distortion that characterizes the OC. More generally, any image formation model includes a perspective projection that is strongly nonlinear in the camera pose, so it is not at all evident that the linear approximation would hold in practice.
In a similar fashion to Criminisi et al [35], a population of tuples (G, Q, X_I) following normal distributions is generated; each tuple is then propagated through the nonlinear back-projection. The resulting set of points is compared with the parameterized normal PDF obtained by linear propagation of the normal distributions from which the points were drawn. If the linear approximation is valid, then the mean and covariance of the MC particles will be close to the propagated mean and covariance.
The data gathered for the intrinsic calibration is a useful source of realistic image coordinates. Instead of arbitrarily defining a number of poses that imitate chessboard calibration data, it was preferred to borrow the camera poses associated with a real data set, as they cover a reasonable range of positions. It is considered reasonable (and this is later confirmed empirically) that the intrinsic and extrinsic parameters have been determined to about three significant digits, that is, the standard deviation is 10^-3 of the parameter value, hence defining C_G and C_Q. Figure 6(a) shows the image coordinates sampled from a Gaussian distribution, 5000 samples for each detected corner; the zoomed inset on the right compares, for one detected corner, the covariance ellipses of 90% probability estimated from the Monte Carlo samples (in black) and from the theoretical first-order analysis (in red). In figure 6(b) the same samples were corrected for intrinsic distortion (lines 2-5 of algorithm 1); it can be seen that the ellipses have become elongated in the radial direction and that the difference between MC and the linear approximation has been accentuated due to the linearisation error in the radial direction. In figure 6(c) the perspective projection is performed (lines 11-13). The uncertainty in the six pose parameters adds more uncertainty, but the estimations of the covariances from Monte Carlo and linear propagation are indistinguishable.
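The validation logic of this comparison can be reproduced on a toy problem. The snippet below is illustrative only: it uses a simple stand-in nonlinearity (r → 2 arctan r) rather than the full camera model, pushes Gaussian samples through it, and compares the Monte Carlo moments against first-order propagation.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Stand-in radial nonlinearity applied to 2D points (not the paper's
    # camera model): scale each point so its radius r becomes 2*atan(r).
    r = np.hypot(x[..., 0], x[..., 1])
    s = 2 * np.arctan(r) / r
    return x * s[..., None]

mean = np.array([3.0, 1.0])
cov = np.diag([1e-4, 1e-4])   # small spread: linearisation should hold

# Monte Carlo moments.
samples = rng.multivariate_normal(mean, cov, size=50000)
mc_mean = f(samples).mean(axis=0)
mc_cov = np.cov(f(samples).T)

# First-order propagation with a central-difference Jacobian.
eps = 1e-6
J = np.column_stack([(f(mean + e) - f(mean - e)) / (2 * eps)
                     for e in (np.array([eps, 0.0]), np.array([0.0, eps]))])
lin_mean = f(mean)
lin_cov = J @ cov @ J.T
```

When the input spread is small relative to the curvature of f, the two estimates agree closely, mirroring the agreement reported for figure 6(c).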
There are 33×54 calibration points, each of them was used to produce a pair of prediction PDFs. One by linearly propagating uncertainties and a second one by fitting a Gaussian distribution to the MC back-projected samples. To visually assess the similarity between the two PDFs for all 1782 calibration points, the covariance ellipse associated to the PDF obtained by first order propagation are transformed to a new base where its corresponding numerical MC counterpart becomes the unitary normal distribution with zero mean. Ilustrated in 6(d), then subtract the center of the ellipse from linear propagation and apply a change of base such that this ellipse becomes a unitary circle. Plotting the transformed first order covariance ellipses in figure 6(e) in red lines and as reference the MC covariance circle in red showing all the red ellipses superimposed result in a blue halo around the reference circle.

Intrinsic calibration with synthetic data
To test the intrinsic calibration, the posterior probability distribution of the three intrinsic parameters of the camera is estimated and summarised by its first and second moments. Figure 5 shows the 33 camera poses with respect to the chessboard points and all the corner detections in one single image.
The 3 intrinsic parameters and the 33×6 extrinsic parameters (6 per image) form a multivariate random vector of 201 components. The probability of the vector is evaluated as shown in equation (13). 442 chains of 50 samples were drawn with Differential Evolution Metropolis (DEM) [34,36]. The starting values for the chains were defined ad hoc to minimize the burn-in period. The histograms of the samples from this section and the ones to follow were unimodal and bell-shaped. Table 1 compares the true values of the parameters with the estimations from the samples; the disparity is in the sixth significant digit and is due to the statistical fluctuation of the artificially added detection noise (standard deviation of 1 pixel). This shows that the expectation of the posterior probability is a good estimator of the true parameter values. The variance of the samples comes from the dispersion of said noise: the more uncertain the detection in the image, the less informative the posterior.
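The idea behind Differential Evolution Metropolis can be sketched in a few lines: each chain proposes a jump along the difference of two other randomly chosen chains, which automatically adapts the proposal to the scale and correlation of the target. This is a minimal NumPy sketch of the scheme, not PyMC3's implementation:

```python
import numpy as np

def demc(log_prob, n_chains, n_steps, x0, rng, b=1e-4):
    """Minimal DE-MC sampler: x0 has shape (n_chains, dim)."""
    dim = x0.shape[1]
    gamma = 2.38 / np.sqrt(2 * dim)   # standard DE-MC jump scale
    chains = x0.copy()
    logp = np.array([log_prob(x) for x in chains])
    out = []
    for _ in range(n_steps):
        for i in range(n_chains):
            # Build the proposal from two other chains plus small noise.
            a, c = rng.choice([j for j in range(n_chains) if j != i],
                              size=2, replace=False)
            prop = (chains[i] + gamma * (chains[a] - chains[c])
                    + rng.normal(0.0, b, size=dim))
            lp = log_prob(prop)
            # Standard Metropolis accept/reject step.
            if np.log(rng.uniform()) < lp - logp[i]:
                chains[i], logp[i] = prop, lp
        out.append(chains.copy())
    return np.array(out)   # shape (n_steps, n_chains, dim)
```

Running many short chains in parallel, as done here with 442 chains of 50 samples, relies on exactly this cross-chain proposal mechanism.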

Intrinsic calibration with real data
To estimate the intrinsic parameters of the OC, the same procedure as above is followed with experimental data: the corners detected in the 33 chessboard images (not the ones artificially generated assuming known distortion parameters and camera poses).
Before sampling, a standard nonlinear optimization routine is used to obtain better seed values. The extrinsic parameters given by OpenCV's calibrateCamera and the intrinsic parameters used as ground truth in the previous section result in a back-projection of the corners that shows significant discrepancies with the true chessboard positions. To provide DEM with better initial values for sampling, a standard nonlinear optimization routine from SciPy [31] is used to bring the back-projections closer to their targets, minimizing the error function associated with the posterior on the parameters (equation (13)). The back-projections with the values from the synthetic chessboard (the initial guess) are shown in the left panel of figure 7, and the back-projections with the optimized parameters in the right panel. OpenCV estimates the parameters by minimizing the projection error (in the image), which is why they are poor estimates for back-projection. The clear improvement in the fit drastically cuts down the burn-in period when sampling.
Notice that the variance is much greater than the one estimated for the simulated intrinsic calibration because it now accounts for the error in the model.

Extrinsic calibration and predictions with synthetic data
Following section 4.2, where the calibration of a simulated camera was solved, the same camera is now simulated in an urban environment to perform the extrinsic calibration and test the algorithm with plausible ad hoc camera poses. The main interest is to test the calibration under realistic conditions in the context of monitoring vehicles and pedestrians in urban scenes. Camera heights above ground of {7.5 m, 15 m} are combined with optical-axis angles with respect to the vertical of {0°, 30°, 60°}, each with 20 calibration points, encompassing 6 situations in total. Points on the z = 0 plane lie within a region of 50 m radius, distributed evenly over the observed image; half of them are used to calibrate the pose of the camera and the other half for testing. That is, all points are projected to image coordinates, and ten world-image pairs are used to estimate the pose. The world coordinates of the ten unused image detections are then predicted and compared to their corresponding true world coordinates.
With the calibration points and the previously obtained estimation of the intrinsic parameters, the calibration procedure is applied to sample the six-dimensional pose space. In every case 30 chains of 1000 MC samples are drawn. The means and variances of the six sets of samples are used to back-project the synthetic image detections and their uncertainty to the corresponding georeferenced positions. Figure 8 shows the projected ellipses in the world reference frame; the size of the ellipses and the error with respect to the true position have been magnified by a factor of 10 to make the disparity visible. The projected uncertainty is smaller for positions closer to the camera, and the ellipses there are less elongated because those regions have a better view factor; as the projected point gets further from the camera the uncertainty grows, especially in the radial direction, due to the perspective effect. To visualize all the projection errors, each one is linearly transformed to the space where its projected covariance becomes the identity matrix, as in section 4.1. In figure 8(c) all the calibration points have been transformed in this way; for reference, the circle of 90% probability is drawn. Table 2 reports the root mean squared deviation between the true world positions and the back-projections of the calibration points and prediction test points. The prediction error on testing points is always greater than the error on calibration points, and both are of the order of 10⁻¹ m.
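The whitening transformation used for this kind of plot can be sketched as follows: each error is mapped through the inverse Cholesky factor of its own projected covariance, after which the 90% region is a circle of fixed radius (function names are illustrative):

```python
import numpy as np
from scipy.stats import chi2

def whiten_errors(errors, covariances):
    """Map each 2-D projection error e to L^{-1} e, where L is the Cholesky
    factor of its covariance, so that the whitened covariance is identity."""
    out = []
    for e, S in zip(errors, covariances):
        L = np.linalg.cholesky(S)
        out.append(np.linalg.solve(L, e))
    return np.array(out)

# Radius of the 90% probability circle for a 2-D standard normal.
r90 = np.sqrt(chi2.ppf(0.90, df=2))
```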

Extrinsic calibration and prediction with real data
Following section 4.3, calibration points are used to estimate the camera pose in a real-world situation and to geolocate the trajectory of a vehicle.
The camera was placed 15.7 ± 0.2 m above the ground; this is the a priori information used for calibration. M = 19 calibration points were defined manually, consisting of corresponding pairs of image and latitude-longitude coordinates. The terrain where the experiment took place is even and horizontal, so the assumption z_w = 0 holds. This also facilitates the conversion of the world coordinates to and from different representations (degrees of latitude-longitude, pixels inside a satellite image, meters) using a simple scaling factor. The point on the ground directly below the a priori position of the camera was defined as the coordinate origin (0 m, 0 m) of the ground plane. Detections in the image were assigned 1 pixel of standard deviation. Figure 9 shows the image calibration points and their corresponding latitude-longitude points. The trajectory of a car as it traverses the field of view of the camera is shown in figure 9(a); these detections also have 1 pixel of standard deviation and correspond to a feature of the car close to the ground.
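A minimal sketch of the scaling between latitude-longitude degrees and meters, assuming a spherical Earth of mean radius; this local flat-Earth approximation is accurate over the ~50 m extent of the scene (the function name and radius constant are illustrative):

```python
import numpy as np

R_EARTH = 6_371_000.0  # assumed mean Earth radius in meters

def latlon_to_metric(lat, lon, lat0, lon0):
    """Convert degrees of latitude-longitude to meters east/north of the
    origin (lat0, lon0), here the point on the ground below the camera."""
    dlat = np.radians(lat - lat0)
    dlon = np.radians(lon - lon0)
    x = R_EARTH * np.cos(np.radians(lat0)) * dlon  # east
    y = R_EARTH * dlat                             # north
    return x, y
```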
Using the estimation of intrinsic parameters from section 4.3 and the a priori information, Differential Evolution Metropolis returned 60 chains of 9500 samples of 6-D rotation-translation vectors. The mean and variance of the samples are given in equation (24), where the rotation component (first three elements) is in radians and the translation component is in meters; the standard deviation of the former is ~0.3° and that of the latter ~0.1 m. These are the rotation-translation parameters of the world reference frame from the point of view of the camera; inverting this transformation gives the position of the camera in world coordinates. It yields a height of 17.0 m with a standard deviation of 0.14 m, of the order of the actual height above ground.
The predicted car trace in world coordinates has an uncertainty that combines the estimated uncertainties of the intrinsic parameters, the extrinsic parameters and the image detection. In figure 10 the blue ellipses are the 90% probability regions, drawn every few back-projected detections of the car (red dots). The effect of the perspective projection on the propagation of uncertainty has two components. The first is the distance to the camera, which magnifies the uncertainty, the reverse of an inverse-square law: the inset of figure 10 shows empirically that the area of the 90% confidence ellipse is proportional to the square of the distance to the camera. The second is the view factor of the back-projected point with respect to the camera, which stretches the ellipse in the direction radial to the point closest to the camera. In this case the optical distortion and the view factor tend to elongate the confidence ellipses in approximately the same direction, which is why the ellipses are so stretched. The smallest area of the 90% probability region is 3.15 m², attained when the car is closest to the camera, and it increases with the square of the distance as shown in the lower inset of figure 10.
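The 90% probability regions can be computed directly from the propagated 2×2 covariances; a sketch (the function name is illustrative):

```python
import numpy as np
from scipy.stats import chi2

def ellipse_90(cov):
    """Area and semi-axes of the 90% probability ellipse of a 2-D normal."""
    k = chi2.ppf(0.90, df=2)          # squared Mahalanobis radius at 90%
    vals, vecs = np.linalg.eigh(cov)  # principal variances and directions
    semi_axes = np.sqrt(k * vals)     # semi-axis lengths
    area = np.pi * semi_axes.prod()   # equals pi * k * sqrt(det(cov))
    return area, semi_axes, vecs
```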

Conclusion and discussion
Wide-field vision systems (based on synthetic compound eyes or omnidirectional cameras) are currently being incorporated into engineering applications related to terrestrial and aerial mobile robotics. Despite the advantages mentioned in section 1, OCs are not widely used in video-surveillance applications in urban environments, where the traditional solution is still the installation of many cameras (fixed or PTZ type), each with a reduced visual field.
The main limitation of using OCs in this type of application is the strong distortion introduced in the image. Beyond this limitation (resolved by correcting the distortions computationally [4, 14]), the use of fisheye cameras has the advantage of observing a complete hemisphere of the scene at all times. This is very useful in transport-related applications in which the movement of vehicles or pedestrians over wide regions must be analyzed (for example in convoluted road intersections, see figure 10) [2]. In addition, the evaluation of geolocalization uncertainties is needed for estimation algorithms based on Bayesian filters (Kalman filter, particle filter, etc) used for motion analysis and prediction, tracking, and decision making on vehicular traffic violations. For these reasons, this work studies the use of a monocular omnidirectional camera to geolocate objects, solving the calibration and prediction problems from a Bayesian perspective.

Bayesian approach to camera calibration
Camera calibration is a critical part of any photogrammetric system. The Bayesian approach is well suited to formulate both the calibration and prediction problems in explicit probabilistic terms, and to incorporate a priori information about the camera and/or its installation pose.
Sundareswara and Schrater [12] demonstrated that Bayesian prediction is less susceptible to statistical fluctuations than maximum likelihood estimation. Their work follows ideas similar to the present one, but with critical differences: they use a pinhole model (not dealing with severe distortions) and calibrate in one step (instead of two) with several views of the object of interest (here a single monocular view is assumed), estimating the posterior probability of the parameters and the reconstruction at the same time (here intrinsic calibration must be done prior to the installation of the camera). The result is a population of samples of the parameters that is later averaged, for marginalization, during 3D reconstruction (here calibration means estimating a mean and a covariance; prediction as linear propagation automatically incorporates marginalization).
The methodology proposed in this work is designed for vehicle motion analysis applications in urban environments and consists of two calibration steps and a computationally efficient method for position prediction. The first step is very similar to standard camera calibration techniques and estimates the posterior PDF of the optical distortion parameters within the laboratory.
The second step is specific to the proposed back-projection function and estimates the posterior of the extrinsic parameters. In this case, the Bayesian approach allows for the introduction of a priori information about the camera pose provided by the installer: in the case of very few calibration points the prior should decrease the uncertainty of calibration, and also eliminate the ambiguity of multiple solutions that are typical of symmetric calibration rigs [37].
The posterior distributions of the parameters given the data are estimated with Differential Evolution Metropolis. The population of samples obtained showed that the distribution was unimodal and bell shaped. This observation opens the possibility of replacing this method with a non-linear optimization to obtain the most probable value of the parameters, plus a Laplace approximation to estimate the variance, at a lower computational cost [28].
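A sketch of that cheaper alternative, using SciPy's BFGS (whose inverse-Hessian estimate serves as the Laplace-approximation covariance) on a toy quadratic stand-in for the true negative log-posterior:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_post(theta):
    # Toy Gaussian stand-in for the unimodal calibration posterior.
    prec = np.array([[2.0, 0.3], [0.3, 1.0]])  # assumed precision matrix
    return 0.5 * theta @ prec @ theta

res = minimize(neg_log_post, x0=np.ones(2), method="BFGS")
mode = res.x        # most probable parameter value
cov = res.hess_inv  # Laplace approximation of the posterior covariance
```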
In the prediction step, the propagation of uncertainty assumes a first-order approximation, and the Monte Carlo simulations of figure 6 show that the assumption holds even when the camera has severe optical distortion. The Jacobians for the propagation were calculated using the chain rule. The propagation of uncertainty could be further improved by accounting for higher moments of the PDF and higher orders of the Taylor expansion, provided the higher-order derivatives of the back-projection function were available. Mekid and Vaja [38] derive the expression for the propagation of up to the fourth moment (including skewness and kurtosis) through a Taylor series truncated at third order for the case of 2D random vectors. This could be implemented as methods of automatic differentiation become available for high-level programming languages [39].
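The first-order propagation itself reduces to sandwiching the input covariance between Jacobians; a sketch with a central-difference Jacobian standing in for the chain-rule derivatives of the back-projection function:

```python
import numpy as np

def numeric_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f at x."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = h
        cols.append((f(x + dx) - f(x - dx)) / (2 * h))
    return np.stack(cols, axis=1)

def propagate(f, x, cov_x):
    """First-order (linear) propagation of uncertainty: cov_y = J cov_x J^T."""
    J = numeric_jacobian(f, x)
    return f(x), J @ cov_x @ J.T
```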
This work assumes perfect measurement of the world coordinates X_w and that the fiducial points lie exactly on the ground plane, that is, z_w = 0. These are the only variables not treated as random; they are treated as exact measurements. However, uncertainty in z_w could reasonably arise from two factors: fiducial points selected on objects slightly out of the ground plane (on a road hump or bump, or on the sidewalk) and deviations of the observed surface from the assumed plane model. Errors in both variables will increase the uncertainty of the estimated extrinsic parameters Θ and, following equation (9), this will increase the uncertainty in geolocation. Expanding the model to include the uncertainty of X_w and z_w would complete the Bayesian formulation; this could be done easily via the theorem of marginalization of normal PDFs [28]. It is also important to note that the back-projection model can be extended to models of the ground other than the horizontal plane, including curved surfaces such as quadrics.
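To first order, including independent uncertainties in these inputs would simply add terms to the propagated geolocation covariance; schematically (the subscripted Jacobians and covariances are illustrative, not taken from equation (9)):

$$\Sigma_{\mathrm{geo}} \;\approx\; J_{\Theta}\,\Sigma_{\Theta}\,J_{\Theta}^{\mathsf T} \;+\; J_{X_w}\,\Sigma_{X_w}\,J_{X_w}^{\mathsf T} \;+\; J_{z_w}\,\sigma_{z_w}^{2}\,J_{z_w}^{\mathsf T},$$

where each Jacobian is that of the back-projection function with respect to the corresponding input, evaluated at its mean.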

Results of the method in simulations and real data
The simulated calibrations showed that the intrinsic parameters were estimated with high accuracy and that the extrinsic calibration predicts world positions in agreement with the propagated confidence ellipses (figure 8). The ellipses are smaller when projected closer to the camera, and when the view factor is small they become stretched in the radial direction; both effects tend to be more pronounced as the point projected on the horizontal plane lies further from the camera.
Calibrating with real data, the intrinsic parameters are estimated with a standard deviation of around one thousandth of the estimated value (equation (23)). The increase with respect to the simulated case is because real data does not perfectly follow the proposed model. The extrinsic calibration returns the camera pose with an uncertainty of less than 1° for orientation and of the order of 10⁻¹ m for position (equation (24)). The prediction of the world position of a car is shown in figure 10 as 90% confidence ellipses; the area of each ellipse is proportional to the squared distance of the vehicle to the camera, which is the expected behavior of a perspective projection. Accuracy could be improved with more accurate calibration points and possibly by expanding the model to describe the curvature of the ground surface and the optical distortion at a finer level of detail.

Relationship between the proposed method and the OpenCV library
OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library that includes a comprehensive set of both classic and state-of-the-art computer vision algorithms [16] and is widely used by the artificial vision system developer community. For this reason, this work follows OpenCV's formulation of the calibration model, except for the specific function that models radial distortion; this leaves open the possibility of later including other distortion models. The adopted stereographic model is an appropriate description of the optical distortion for the camera used in this work [1]. The back-projection to world coordinates is solved analytically (algorithm 1) assuming that the object lies on the horizontal z_w = 0 plane.
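The core of that analytic back-projection amounts to a ray-plane intersection; a minimal sketch under the z_w = 0 assumption (function and argument names are illustrative, and the undistortion of the ray is assumed already done):

```python
import numpy as np

def backproject_to_ground(ray_cam, R, t):
    """Intersect a camera ray with the z_w = 0 plane.
    ray_cam: direction of the undistorted ray in camera coordinates;
    R, t: pose of the world frame seen from the camera, so that a world
    point maps as x_c = R @ x_w + t."""
    cam_center_w = -R.T @ t          # camera center in world coordinates
    dir_w = R.T @ ray_cam            # ray direction in world coordinates
    s = -cam_center_w[2] / dir_w[2]  # parameter where the ray hits z_w = 0
    return cam_center_w + s * dir_w
```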
OpenCV provides functionality similar to the solutions proposed here, but with an incomplete treatment of uncertainty. The function calibrateCamera estimates intrinsic parameters by minimizing the projection error as a least-squares estimator [40], following Zhang [21] and Bouguet [22]. It also computes the Jacobian of the projected image coordinates with respect to the parameters, but not for the purpose of uncertainty propagation: it is used during the global optimization of camera calibration. It returns a vector of standard deviations of the parameters by an inverse propagation of sorts: it multiplies the unbiased estimator of the projected variance by the Moore-Penrose inverse of the Jacobian. There is no treatment of interacting terms in the covariance; the parameters are assumed uncorrelated. The calibrations in this work (equations (23) and (24)) show covariance matrices with non-negligible interaction terms, which clearly means the presented approach can contribute to improving OpenCV's methods. Also, calibrateCamera does not take into account the uncertainty of the detected corners. The function solvePnP solves for the pose of an object given corresponding 3D-2D points, and warpPerspective can map image coordinates to a world plane given the right transformation matrix; both without treatment of uncertainty.
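The diagonal-only "inverse propagation" described above can be sketched as follows; this illustrates the idea with the standard least-squares covariance formula, not OpenCV's actual code:

```python
import numpy as np

def param_std_diagonal(J, residuals):
    """Per-parameter standard deviations in the style described for
    calibrateCamera: scale the unbiased residual variance by the
    pseudo-inverse of J^T J and keep only the diagonal, discarding the
    cross-covariance (interaction) terms."""
    n, p = J.shape
    sigma2 = residuals @ residuals / (n - p)   # unbiased projected variance
    JtJ_inv = np.linalg.pinv(J.T @ J)          # Moore-Penrose inverse
    return np.sqrt(sigma2 * np.diag(JtJ_inv))
```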
In sum, this work addresses a set of topics relevant to engineering applications of wide-field vision systems. The algorithm developed predicts position on a map with a correct quantification of positional uncertainty, thus functioning as a position sensor.