Learning a Spatial Field in Minimum Time with a Team of Robots

We study an informative path-planning problem where the goal is to minimize the time required to learn a spatially varying entity. We use Gaussian Process (GP) regression for learning the underlying field. Our goal is to ensure that the GP posterior variance, which is also the mean square error between the learned and actual fields, is below a predefined value. We study three versions of the problem. In the placement version, the objective is to minimize the number of measurement locations while ensuring that the posterior variance is below a predefined threshold. In the mobile robot version, we seek to minimize the total time required to visit and collect measurements from the measurement locations using a single robot. We also study a multi-robot version where the objective is to minimize the time required by the last robot to return to a common starting location called depot. By exploiting the properties of GP regression, we present constant-factor approximation algorithms. In addition to the theoretical results, we also compare the empirical performance using a real-world dataset, with other baseline strategies.

Abstract-We study an informative path-planning problem where the goal is to minimize the time required to learn a spatially varying entity. We use Gaussian Process (GP) regression for learning the underlying field. Our goal is to ensure that the GP posterior variance, which is also the mean square error between the learned and actual fields, is below a predefined value. We study three versions of the problem. In the placement version, the objective is to minimize the number of measurement locations while ensuring that the posterior variance is below a predefined threshold. In the mobile robot version, we seek to minimize the total time required to visit and collect measurements from the measurement locations using a single robot. We also study a multi-robot version where the objective is to minimize the time required by the last robot to return to a common starting location called depot. By exploiting the properties of GP regression, we present constant-factor approximation algorithms. In addition to the theoretical results, we also compare the empirical performance using a real-world dataset, with other baseline strategies.
Index Terms-Informative path planning, Gaussian Process regression.

I. INTRODUCTION
S ENSING, modeling, and tracking various spatially varying entities can improve our knowledge and understanding of them. This can have significant economic, environmental, and health implications. For example, knowing the content of various nutrients in the soil of a farm can help the farmers better understand soil chemistry. Understanding soil chemistry is helpful for the farmers to improve the yield and reduce the application of fertilizers [1]. An overload of certain chemicals inside a water body may have a significant impact on marine life. Knowing the spatial extent of the spill is necessary for effective control and mitigation strategies [2]. Understanding the spatial variation of rock minerals can help in efficient mining strategies [3]. In all such applications, a key first step is the collection of data using appropriate sensors which can then be used to build models of the underlying phenomenon. However, collecting data can be tedious and often requires careful human planning. Manual data collection can also be dangerous. For example, volcano monitoring data helps to see where previous lava flows have gone and previous ash fall has occurred. However, volcanic ash is usually pulverized rocks and glass particles and potentially catastrophic for the people engaged in monitoring [4]. One alternative which would alleviate the human risks of manual data collection is the use of robots equipped with appropriate sensors to collect data.
There are many factors to consider when deploying robots for data collection. Usually, a trade-off must be made between the quantity of sensing resources (e.g., number of deployed robots, energy consumption, mission time) and the quality of data collected. The robots can be deployed to act as stationary or mobile sensors depending on the application ( Figure 1). Deploying robots to function as mobile sensors is especially challenging because of the need for path planning. While deploying mobile robotic sensors, one needs to plan the most informative resource-constrained observation paths to minimize the uncertainty in modeling and tracking the spatial phenomena.
Planning informative resource-constrained observation paths for robot sensors to estimate a spatially varying entity, often known as Informative Path Planning (IPP), has received recent attention in the robotics community [5]- [10]. IPP deals with the problem of deciding an autonomous robot path along which maximum possible information about a quantity of interest can be extracted while operating under a set of resource constraints. In this paper, our quantity of interest is a spatially varying phenomenon, often represented by a spatial field 1 . Generally, the underlying spatial field is specified by a probabilistic model. One of the commonly used probabilistic models is Gaussian Process (GP) [11]. GPs provide an mathematically convenient way of performing non-parametric regression while making fewer assumptions on the underlying field. They allow for expressing domain knowledge through the choice of kernel functions. In particular, for spatially varying fields, numerous studies have shown the efficacy of modeling with GPs [12]. An alternative would be geometric models which make strong assumptions and cannot represent the stochastic noise in the measurements directly [5]. Thus, probabilistic models make a suitable candidate for such scenarios.
Once the underlying spatial field is modeled, the next task is to plan a robot path based on that model. The robot travels along a path planned in this step. Several metrics can be used to perform the planning step. An information-theoretic metric such as mutual information, entropy, or variance is typically used as a criterion to drive the robot to sampling locations [13]. Generally, the information-theoretic metrics are submodular and hence, an approximation guarantee can be given on the performance of the resulting algorithms [5]. Unfortunately, the information-theoretic metrics such as entropy, mutual information, etc. are indirect and do not consider the accuracy of the predictions. Unlike these works, we study how to ensure that the GP predicted mean 2 is accurate and present a constantfactor approximation algorithm if the hyperparameters of the GP kernel do not change.
We use variance of GP prediction as the metric to perform the planning step. Predictive variance also turns out to be the Mean Square Error (MSE) in GP prediction if the hyperparameters are known [14]. Our goal in this work is to plan the informative paths such that the predictive variance at all locations is below a predefined threshold, ∆, after collecting measurements using mobile sensors. This leads to same guarantees on MSE as well. We study three related problems, that of: 1) finding measurement locations to make measurements; 2) planning a tour for a single robot to visit those measurement locations; and 3) planning tours for multiple mobile robots to ensure that the predictive variance is below ∆. The objective is to minimize the number of measurement locations in the first problem and the total tour time in the second problem. With multiple robots, the objective is to minimize the maximum time taken among all the robots. The total tour time is given by the measurement time and the travel time between measurement locations. The measurement time depends on the number of measurements taken at each location as well as the time to take a single measurement. Depending on the sensor, the measurement time can be zero (e.g., cameras) or nonzero (e.g., soil probes measuring organic content). We show that a non-adaptive algorithm suffices to solve the problem and yields a polynomial-time constant-factor approximation to the optimal algorithm. While other algorithms have been proposed before for estimating spatial fields, this is the first result that provides the theoretical guarantees on the total time for ensuring predictive accuracy at all points. Our main contributions include: • introducing stationary sensor placement and mobile sensor algorithms for ensuring that the predictive variance, and hence MSE, at each location in the environment is below a predefined threshold, • providing polynomial-time constant-factor approximation guarantees on their performance, and • showing their performance on a real-world dataset comprising of organic matter concentrations at various locations within a farm. Similar problems have been studied in the literature. For example, Yfantis et. al. [15] studied a stationary sensor problem. Their approach considers and investigates only three types of pre-defined placement designs while for a general case none of them may be a good design. The algorithms presented in this work are not restricted to any pre-defined placement strategy. Further, we are not aware of any existing theoretical guarantees on the mobile sensor problems studied in this paper.
The rest of the paper is organized as follows: In section II, we present a discussion on related works and background on the problems studied. In section III, we formally present the problems and their solutions. Simulation results are presented in the section IV followed by the conclusion and scope for the future work in section V.
A preliminary version of this work was presented at the 13 th International Workshop on the Algorithmic Foundations of Robotics (WAFR'18) [16]. In the preliminary version, we provided guarantees for the chance constraints of incorrect predictions using an aggregate measure of prediction error. In current work, a more direct performance criterion, MSE at each location of the environment, is considered which leads to stronger theoretical guarantees. Also, an extension of the algorithms for the multi-robot case is presented.

II. RELATED WORK AND BACKGROUND
We begin by reviewing the related work in sensor placement where the goal is to cover a given environment using sensors placed at fixed locations and mobile sensing where sensors can move and collect measurements from different locations.

A. Stationary Sensor Placement
When monitoring a spatial phenomenon, such as temperature or humidity in an environment, selection of a limited number of sensors and their locations is an important problem. The goal in this problem is to select the best k out of n possible sensor locations and use the measurements from these to make predictions about the spatial phenomenon. The typical formulation of a sensor selection problem makes it NP-hard [17]. Previous work used global optimization techniques such as branch and bound to exactly solve this problem [18], [19]. However, these exact approaches are often computationally intensive.
One can solve the task as an instance of the art-gallery problem [20], [21]: Find the minimum set of guards inside a polygonal workspace from which the entire workspace is visible. However, this version of the problem only covers visionbased sensors and does not consider noisy measurements [5].
An alternative approach from spatial statistics is to learn a model of the phenomenon, typically as a GP [22], [23]. The learned GP model can then be used to predict the effect of placing sensors at locations and thus optimize their placement. For a given GP model, many criteria including information-theoretic ones have been proposed to evaluate the quality of placement. Shewry and Wynn introduced the maximum entropy criterion [24] where the sensors are placed sequentially at the locations of highest entropy. Ko et al. [25] proposed a greedy algorithm by formulating the entropy maximization as maximizing the determinant of the covariance matrix. However, the entropy criterion tends to place the sensors at the boundary of the environment thus wasting sensed information [26]. Mutual information (MI) can be used as well [22], [27], [28]. Krause et al. [5] study the problem of maximizing MI for optimizing sensor placement problem. They present a polynomial-time approximation algorithm with constant factor guarantee by exploiting submodularity [29]. Eventually, they show that MI criterion leads to improved accuracy with a fewer number of sensors compared to other common design criteria such as entropy [24], A-optimal, Doptimal, and E-optimal design [30].
The above-mentioned methods estimate the prediction error indirectly. Nguyen et al. [31] consider choosing a set of n potential sensor measurements such that the root mean square prediction error is minimized. They present an annealing based algorithm for the sensor selection problem. Their algorithm starts by selecting a potential subset of cardinality k from the entire population of sensor locations. After that, it iteratively attempts to substitute the members of the selected subset by its neighbors according to an optimization criterion.
None of the criteria discussed above cannot directly make any guarantees on the MSE in predictions at each point in the environment. Instead, we design a sensor placement algorithm which results in an accurate reconstruction of the spatial field using the collected sensor measurements. Most works in the past have focused on optimizing an objective function (entropy, MI, etc.) given the resource constraints (limited energy, number of sensors, and time, etc.). We optimize the resource requirement given the objective constraint (MSE below a predefined threshold ∆), predictive accuracy more than a predefined threshold in our case.

B. Mobile Sensing
The goal in the mobile sensing problem, also known as Informative Path Planning (IPP), is to compute paths for robots acting as mobile sensors. Paths are being computed in order to accurately estimate some underlying phenomenon, typically a spatial field [32], [33]. A central problem in IPP is to identify the hotspots in a large-scale spatial field. Hotspots are the regions in which the spatial field measurements exceed a predefined threshold. In many applications, it is necessary to assess the spatial extent and shape of the hotspot regions accurately. Low et al. presented a decentralized active robotic exploration strategy for probabilistic classification/labeling of hotspots in a GP-based spatial field [34]. The time needed by their strategy is independent of the map resolution and the number of robots, thus making it practical for in situ, real-time active sampling. Another formulation in hotspot identification is that of level set identification [35].
Previous works on level set boundary estimation and tracking [36]- [38] have primarily focused on communication of the sensor nodes, without giving much attention to individual sampling locations. Bryan et al. [39] proposed the straddle heuristic, which selects sampling locations by trading off uncertainty and proximity to the desired threshold level, both estimated using GPs. However, no theoretical justification had been given for its use and its extension to composite functions [40]. Gotovos et al. proposed a level set estimation algorithm, which utilizes GPs to model the target function and exploits its inferred confidence bounds to drive the selection process. They provided an information-theoretic bound on the number of measurements needed to achieve a certain accuracy, when the underlying function is sampled from a GP [41].
In many mobile sensing problems, it is not enough to identify only a few specific regions but estimate the entire spatial field accurately. It can be formulated as a path planning problem to observe a spatial field at a set of sampling locations, and then making inference about the unobserved locations [42]. Choosing and visiting the sample locations so that one can have an accurate prediction (point prediction and/or prediction interval) is of great importance in soil science, agriculture, and air pollution monitoring [28]. The objective functions used are usually submodular and thus exhibit a diminishing returns property. Submodularity arises since nearby measurement locations are correlated [43]. Chekuri and Pal introduced a quasipolynomial time algorithm [44] for maximizing a submodular objective along the path using a recursive greedy strategy. This algorithm was further extended by Binney et al. [45] for spatiotemporal fields using average variance reduction [46] as the objective function.
Zhang and Sukhatme proposed an adaptive sampling algorithm consisting of a set of static nodes and a mobile robot tasked to reconstruct a scalar field [47]. They assume that the mobile robot can communicate with all the static nodes and acquire sensor readings from them. Based on this information, a path planner generates a path such that the resulting integrated mean square error is minimized subject to the constraint that the boat has a finite amount of energy.
An important issue in designing robot paths is deciding the next measurement location [6], [48]- [50], often referred to as the exploration strategy. Traditionally, conventional sampling methods [51] such as raster scanning, simple random sampling, and stratified random sampling have been used for single-robot exploration. Low et al. presented an adaptive exploration strategy called adaptive cluster sampling. It was demonstrated to reduce mission time and yield more information about the environment [52]. Their strategy performs better than a baseline sampling scheme called systematic sampling [53] using root mean squared error as a metric. A different adaptive multi-robot exploration strategy called MASP was presented in [54] which performs both widearea coverage and hotspot sampling using non-myopic path planning. MASP allows for varying adaptivity and its performance is theoretically analyzed. Further, it was demonstrated to sample efficiently from a GP and logGP. However, the time complexity of implementing it depends on the map resolution, which limits its large-scale use. To alleviate this computational difficulty, an information-theoretic approach was presented in [55]. The time complexity of the new approach was independent of the map resolution and less sensitive to the increasing robot team size. Garnett et al. [56] considered the problem of active search, which is also about sequential sampling from a domain of two (or more) classes. Their goal was to sample as many points as possible from one of the classes.
Yilmaz et al. [57] solved the adaptive sampling problem using mixed integer linear programming. Popa et al. [49] posed the adaptive sampling problem as a sensor fusion problem within the extended Kalman filter framework. Hollinger and Sukhatme proposed a sampling-based motion planning algorithm that generates maximal informative trajectories for the mobile robots to observe their environment [8]. Their information gathering algorithm extends ideas from rapidlyexploring random graphs. Using branch and bound techniques, they achieve efficient optimization of information gathering while also allowing for operation in continuous space with motion constraints. Low et al. [58] presented two approaches to solve IPP for in situ active sensing of GP-based anisotropic spatial fields. Their proposed algorithms can trade-off active sensing performance with computational efficiency. Ling et al. [9] proposed a nonmyopic adaptive GP planning framework endowed with a general class of Lipschitz continuous reward functions. Their framework can unify some active learning/sensing and Bayesian optimization criteria and offer practitioners flexibility to specify choices for defining new tasks. Tan et al. [59] introduced the receding-horizon crossentropy trajectory optimization. Their focus was to sample around regions that exhibit extreme sensory measurements and much higher spatial variability, denoted as the region of interest. They used GP-UCB [60] as the optimization criteria which helps in exploring initially and converging on regions of interest eventually.
A naive implementation of GP prediction scales poorly with increasing training dataset size. Sparse GP frameworks can overcome this problem by using only a subset of the data to provide accurate estimates. A state-of-the-art sparse GP variant is SPGP [61]- [64]. The SPGP framework learns a pseudo subset that best summarizes the training data. Mishra et al. introduced an online IPP framework AdaPP [65] which uses SPGP.

C. Sensing with Multiple Robots
Mobile sensing can be made faster by distributing the task among several robots. Multi-robot systems can do complex tasks and have been widely used in environmental sampling [66], coverage [67]. Robots can use local communication or control laws to achieve some collective goals.
Singh et al. [6] proposed a sequential allocation strategy that uses GP regression, which can be used to extend any single robot planning algorithm for the multi-robot problem. Their procedure approximately generalizes any guarantees for the single-robot problem to the multi-robot case. However, the approach works only when MI is the optimization objective. Cao et al. [58] presented two approaches along with their complexity analysis addressing a trade-off between active sensing performance and time efficiency. Luo et al. [68] combined adaptive sampling with information-theoretic criterion into the coverage control framework for model learning and simultaneous locational optimization. They presented an algorithm allowing for collaboratively learning the generalized model of density function using a mixture of GPs with hyperparameters learned locally from each robot. Kemna et al. [69] created a decentralized coordination approach which first splits the environments into Voronoi partitions and makes each vehicle then run within their own partition. Other multirobot approaches used in other domains, e.g. exploration and estimation with ground vehicles, include auction-based methods [70]- [72] and spatial segregation, typically through Voronoi partitioning [73], [74].
Tokekar et al. [10] presented a constant factor approximation algorithm for the case of accurately classifying each point in a spatial field. The first step in the algorithm is to determine potentially misclassified points and then to find a tour visiting neighborhoods of each potentially misclassified point. In this paper, we study a regression version of the problem where every point is of interest. We exploit the properties of GP and squared-exponential kernel to find a constant-factor approximation algorithm. Before the details of the algorithms, we review some relevant background and useful properties of GPs and MSE.

D. Gaussian Processes
In GP regression, the posterior variance at any test location x is given by, where K(X, X) is the kernel matrix with entries, Here, σ 2 0 , l, ω 2 are signal variance, length scale, and additive independent and identically distributed Gaussian measurement noise respectively [11]. We use the same value of length scale along each input dimension. Note that the posterior variance at a particular location x conditioned on set of observations at locations X = {x 1 , . . . , x n } does not depend on the actual observation but only on the locations from where the observations are collected. Multiple observations at a location is equivalent to that location being counted as many times as the number of measurements. The kernel is a function that measures the similarity between two measurement locations [11].
Since the posterior variance is a function of only the measurement locations, the posterior variance for all points in the environment can be computed a priori, if the measurement locations are known, even without making any observations. In many implementations [13], [33], [75], the hyperparameters for the kernel k are tuned online as more data is gathered. As such, the hyperparameters may change with the observed data and the posterior variance will depend on the data observed, which may require adaptive planning. We assume that the hyperparameters are estimated a priori. This is done using prior data from the same or similar environments or a pilot deployment over a smaller region, as described in [5], [76], [77]. Example applications are underwater inspection [76] and occupancy map building [77], where prior data is used for determining hyperparameters before the actual deployment. Nevertheless, one can perform sensitivity analysis of the presented algorithms by varying the hyperparameters [78], [79].
The posterior meanμ x|X at a location x is given by a weighted linear combination of the observed data, where y = {y 1 , . . . , y n } denotes the observations at locations X = {x 1 , . . . , x n }.

E. Mean Square Error
MSE measures the expected squared difference between an estimator and the parameter the estimator is designed to estimate [80]. The MSE at a location x for an estimatorf is, where the Equation 3 is the bias-variance expression for the estimator. GP predicted valuef (x) at a location x is an unbiased estimator of the true value f (x) [15] and has a normal distribution with mean given by Equation 2, and variance given by Equation 1. Among all linear and non-linear estimators, GP is the best in terms of minimizing MSE [14], [81]. Further, GPs are unbiased and hence, the MSE at a location x is equal to the posterior variance of the predicted value, i.e., From Equation 4, one can deduce that MSE for GPs is same as the posterior variance and hence, any guarantees for the posterior variance hold for MSE as well.
III. ALGORITHMS In this section, we formally define the problems and the algorithms. We assume that the environment is a two dimensional area U ⊂ R 2 and the underlying spatial field is an instance of a GP, F [15]. F has an isotropic covariance function of the form, defined by a squared-exponential kernel where the hyperparameters σ 2 0 and l are known a priori. Let X denote the set of measurement locations within U produced by an algorithm.
Problem 1 (Placement). Find the minimum number of measurement locations, such that the MSE at each location in U is below ∆< σ 2 0 , i.e., minimize |X|, where |X| is the cardinality of X and M SE(x) is the MSE at location x.
Problem 2 (Mobile). Find the minimum time trajectory for a mobile robot that obtains a finite set of measurements at one or more locations in U , such that the MSE at each location in U is less than ∆, i.e., minimize len(τ ) + ηn(X), τ denotes the tour of the robot. Robot travels at unit speed, obtains one measurement in η units of time and obtains n(X) total measurements.
The robot may be required to obtain multiple measurements from a single location. Therefore, the number of measurements n(X) can be more than |X|. For multiple robots, their tours can start at the same starting location (often referred to as a depot) or can start at different locations. In this paper, our focus is on the former case. The latter case is more appropriate when the robots must persistently monitor the environment.
Problem 3 (multi-robot). For k robots starting from a given starting location (depot), design a set of trajectories that collectively obtain a finite set of measurements at one or more locations in U , such that the MSE at each location in U is less than ∆, i.e., minimize max i∈{1,...,k} τ i denotes the tour of the i th robot and X i the subset of measurement locations covered by the i th robot. The robots travel at unit speed, obtain one measurement in η units of time. i th robot obtains n(X i ) total measurements.
The solution for Problem 1 is a subset of the solution to Problem 2. Further, the solution for Problem 3 is derived from the solution for Problem 2. The three algorithms build on top of each other by: (1) finding a finite number of measurement locations for the robot; (2) finding a tour to visit all the measurement locations; and (3) splitting the tour from step 2 in multiple sub-tours for k robots. We exploit the properties of squared-exponential kernel to find the measurement locations. By knowing the value at a certain point within some tolerance, values at nearby points can be predicted albeit up to a larger tolerance.

A. Necessary and Sufficient Conditions
We start by deriving necessary conditions on how far a test location can be from its nearest measurement location. A test location corresponds to a point in the environment where we would like to make a prediction.
Lemma 1 (Necessary Condition). For any test location x, if the nearest measurement location is at a distance r max away, and, then it is not possible to bring down the MSE below ∆ at x.
Proof. Consider the posterior varianceσ 2 x (n) at x (which is also equal to the M SE at x from Equation 4) after collecting n measurements, possibly from different locations. A lower bound onσ 2 x (n) can be obtained by assuming that all measurements were collected at the nearest location x i to x. This is based on the fact that the closer the observation, lower the predictive variance. A mathematical proof for this is provided in Appendix A. Let the nearest measurement location x i is distance r away from x. Assuming that all n measurements were collected at x i , lower bound for posterior variance at x can be calculated using Equation 1, It is worth mentioning that the square matrix in Equation 7 is of order n × n since there are n measurements. Substituting the value for k(x, x i ) = σ 2 0 exp − r 2 2l 2 in Equation 7 and performing the required matrix operations, we get, Therefore, Even if we had collected infinitely many measurements at the nearest location x i , the posterior variance will still be lower bounded as, If the posterior variance at x even with the infinitely many measurements collected at the nearest measurement location x i (Equation 12) is greater than ∆, i.e., then it is not possible to bring down the MSE at x below ∆ in any circumstance.
Next, we prove a sufficient condition that if every point in the environment where no measurement is obtained (test location) is sufficiently close to a measurement location, then we can make accurate predictions at each point.

Lemma 2 (Sufficient Condition).
For a test location x ∈ U , if there exists a measurement location x i ∈ X, r distance away from x with n suff measurements at x i , such that, then GP predictions at x will be accurate, i.e., MSE at x will be smaller than ∆.
Proof. From Equation 10, we have an expression for variance of the posterior predictive distribution at x. Taking the other measurement locations into consideration can not increase the posterior variance at x. Information never hurts [82]! To prove the sufficiency, we consider n suff measurements at x i only and discard others knowing that the other locations can not increase the posterior variance at x. Bounding the expression in Equation 10 with ∆ results in, Lemma 2 gives a sufficient condition for GP predictions to be accurate at any given test location x ∈ U . The following lemma shows that a finite number of measurements n suff =n α , are sufficient to ensure predictive accuracy in a smaller disk of radius 1 α r max around x i , where α > 1 (Figure 2). Lemma 3. Given a disk of radius 1 α r max centered at x i , n α measurements at x i suffice to make accurate predictions for all points inside the disk, where, Proof. We want a sufficiency condition on the number of measurements n α inside a disk of radius 1 α r max . Lemma 2 gives an upper bound on the radius of a disk such that all points inside the disk will be accurately predicted after n α measurements at the center. We construct a disk (D 2 in Figure 2), whose radius is equal to 1 α r max such that, Plugging in the value of r max from Lemma 1, squaring both sides in Equation 19 and re-arranging for n α gives the required bound stated in Lemma 3. Ceiling function in Equation 18 accounts for the fact that n α is an integer.
A packing of disks of radius r max gives a lower bound on the number of measurements required to ensure predictive accuracy. On the other hand, a covering of disks of radius 1 α r max gives us an upper bound on the number of measurements required. To solve Problem 1, what remains is to relate the upper and lower bound and present an algorithm to place the disks of radii 1 α r max .

B. Placement of Sensors for Problem 1
We use an algorithm similar to the one presented by Tekdas and Isler [83] for stationary sensor placement in order to track a target using bearing sensors. In their case, the goal is to place sensors such that irrespective of where the target is in the environment, there are at least three sensors forming a triangle that get good quality bearing information of the target. The show how to cover the environment with disks and place a triangle of sensors within each disk. The setup is different from the one we have; however, we use a similar disk coverage strategy as a subroutine here. The exact procedure is outlined in Algorithm 1. Proof. Denote the set of measurement locations computed by the optimal algorithm to solve the Problem 1 by X * . The function MIS in Step 1 of Algorithm 1 computes a maximally independent set of disks: the disks in I are mutually nonintersecting (independent) and every disk in X \I intersects at least one disk in I (maximal). The set I can be computed by a disk Fig. 3. We cover each 3rmax radius disk with 1 α rmax radii disks (smaller gray disks) in lawn-mower pattern. 18α 2 disks suffice to cover the bigger disk. The locations of disks of radii 1 α rmax inside a disk of radius 3rmax are obtained by covering the square circumscribing bigger disk with smaller squares inscribed in smaller disks. The centers of smaller squares coincide with the centers of smaller disks. simple polynomial greedy procedure: choose an arbitrary disk d from X , add it to I, remove all disks in X which intersect d, and repeat the procedure until no such d exists.
An optimal algorithm will collect measurements from at least as many measurement locations as the cardinality of I. This can be proved by contradiction. Suppose an algorithm visits measurement locations fewer than the number of disks in I. In that case, there will exist at least one disk of radius r max in I which will not contain a measurement location. This means that there will be at least a point in that disk which will be more than r max away from each measurement location. From Lemma 1, the robot can never make accurate predictions at that point and hence violating the constraint in Problem 1. Hence, |I| ≤ |X * |.
Every disk in X intersects at least one disk in I and hence, lies within 3r max of the center of a disk in I. As a result,X disks cover all the X disks and hence, the entire environment. 3 Collecting measurements from 18α 2 locations inside a 3r max disk suffice to make accurate predictions in that disk (satisfying the Problem 1 constraint for points belonging to that disk) as illustrated in Figure 3. DISKCOVER collects measurement from 18α 2 such locations per disk inX . It collects measurements from a total of 18α 2 |X | locations, hence, satisfying the constraint for all points in the area covered by union ofX disks. Since, union ofX disks covers the entire environment, DISKCOVER satisfies the constraint for all points in the environment. Multiplying both sides of Equation 20 with 18α 2 , we get, 18α 2 |I| ≤ 18α 2 |X * |. Note that |X | = |I|. Hence, where, n DISKCOVER is the number of measurement locations for DISKCOVER.

C. Finding an Approximate Optimal Trajectory for Problem 2
The algorithm for Problem 2 builds on the algorithm presented in the previous section. The locations where measurements are to be made become the locations that are to visited by the robot. The robot must obtain at least n α measurements at the center of each disk of radius 1 α r max . A pseudo-code of the algorithm is presented in the Algorithm 2. Proof. From Theorem 1, we have a constant approximation bound on number of measurement locations. Let the time (travel and measurement time) taken by the optimal algorithm be T * 1 . Using notation from Theorem 1, we assume that the optimal traveling salesperson with neighborhoods (TSPN) time to visit disks in I be T * I . In TSPN, we are given a set of geometric neighborhoods, and the objective is to find the shortest tour that visits at least one point in each neighborhood (disks in this case) [10]. The optimal algorithm will visit at least all disks once in I which gives the following minimum bounds on the optimal travel time (T * travel ) and optimal measurement time (T * measure ), Let the optimal time to visit the centers of disks in I be T * I C . An upper bound on T * I C can be established by the fact that upon visiting each disk, the robot can visit the center of that disk and return back by adding an extra tour length of 2r max , i.e., a detour of maximum length |I| × 2r max for all disks in I. As a result: T * I C ≤ T * I + 2r max |I|. Using inequality from Equation 23: T * I C ≤ T * travel + 2r max |I|. For any disk inX , the length of lawn-mower path starting from the its center and return back (Figure 3) after visiting all center points of 1 α r max disks will be of order O(α 2 )r max . Hence, the total travel time for DISKCOVERTOUR is: T C + |I|O(α 2 )r max , where T C is the (1 + )-approximated time with respect to the optimal TSP tour returned by the (1 + )-approximation algorithm to visit the centers of the disks inX (or I disks since they are concentric). T C can be calculated in polynomial time [84] having bounds: T C ≤ (1+ ) T * I C , with T * I C being the optimal TSP time to visit the centers ofX disks. Measurement time for DISKCOVERTOUR is 18α 2 ηn 2 |I|. Hence, the total time T 1 alg for DISKCOVERTOUR is, n 2 is the number of sufficient measurements required inside a disk of radius 1 2 r max (Lemma 3 with α = 2). Length of any tour that visits k non-overlapping equal size disks of radii r is at least 0.24kr [85], which gives 0.24r max |I| ≤ T * I . Combining this result with Equation 23 modifies the bounds in Equation 26 as, ≤ max 9.33(1 + ) + 82 where c, a constant, is larger one of the two quantities inside the bracket in Equation 30.
Note that the Algorithm 2 collects same number of measurements from each measurement location. There may be another algorithm that collects different number of measurements from different locations which may result in better performance. This modification is an avenue for future work.

D. Finding an Approximate Optimal Trajectory for Problem 3
When one robot can not handle a large territory, to speed up the task, k robots can be sent to collectively visit1 all measurement locations. A natural objective is to ensure that no robot has too large of a task. Hence, We choose our optimization criterion as minimizing the maximum of the krobot tour costs. This is equivalent to minimizing the time taken by the last robot to return back to the common starting location. Our proposed algorithm only works if the robots start and return back to the same location -called depot. Any measurement location can be chosen as the depot but in our case, we assume that the robots start from and return back to a pre-defined depot.
We now describe an algorithm which employs a toursplitting heuristic to plan for k robots. We modify the heuristic proposed by Frederickson et al. [86] to account for the measurement time, and not just the travel time.
Let the output tour of the robot from Algorithm 2 be denoted by τ and l max be the distance of farthest measurement location from the depot. Algorithm 3 k-DISKCOVERTOUR 1: procedure 2: Input: Tour calculated from Algorithm 2, Depot location x 1 . 3: Output: k approximate optimal paths visiting all measurement locations collectively. 4: begin 1) For j th robot, 1 ≤ j < k, find the last measurement location x p(j) such that the time taken to travel from x 1 to x p(j) along τ is not greater than j k T 1 alg − (2l max + ηn 2 ) + (l max + ηn 2 ). 2) Obtain k subtours as R 1 = (x 1 , . . . , x p(1) , x 1 ), . , x n , x 1 ). Proof. First, we prove that the time taken along every subtour is bounded and eventually show that the bound is within a constant factor of the optimal time. With k robots, let the subtours for 1 st and k th robot are x 1 → x p(1) − → x 1 and x 1 → x p(k−1)+1 → x 1 respectively (an example with k = 5 is shown in Figure 4). Subtours for the remaining robots can be denoted by Substituting j = 1 in Algorithm 3, the time to travel from x 1 to x p(1) along τ , T (x 1 τ − → x p(1) ) is no greater than 1 k T 1 alg − (2l max + ηn 2 ) + (l max + ηn 2 ). Time for the first subtour is hence bounded by T (x 1 i.e., 1 k T 1 alg − (2l max + ηn 2 ) + (l max + ηn 2 ) + l max . From the condition in Algorithm 3, we know that x p(k−1) is the last location such that, and hence, Subtracting both sides from T 1 alg , which gives, and hence, time for the last subtour is bounded by i.e., 1 k T 1 alg − (2l max + ηn 2 ) + (3l max + 2ηn 2 ) + l max . Similar inequalities can be derived for remaining subtours as follows. For 1 ≤ j ≤ k − 2, following inequalities hold from Algorithm 3, i.e., the time taken along remaining subtours, also bounded by 1 k T 1 alg − (2l max + ηn 2 ) + 2l max . Hence, we can conclude that the time taken along each subtour does not exceed 1 k T 1 alg − (2l max + ηn 2 ) + (4l max + 2ηn 2 ). Let T k alg be the time taken for largest of the k subtours generated by the Algorithm 3, and T * k be the cost of the largest subtour in an optimal solution to Problem 3. We have, From the triangle inequality, T * k ≥ 1 k T * 1 . It is natural to think that at least one robot will have to go to the farthest location from the depot and come back from there after collecting n 2 measurements which gives us a lower bound on the output of the optimal algorithm, i.e., 2l max + ηn 2 ≤ T * k . Combining these results with Equation 30, we get,

IV. EMPIRICAL EVALUATION
In this section, we report results from empirical evaluation of the theoretical results. We show qualitative and quantitative comparison of our algorithms with other baseline strategies through simulations using precision agriculture as our motivating example.
Dataset: We use a real-world dataset [87], collected from a farm, consisting of organic matter (OM) measurements manually collected from several hundred locations within the farm. The maximum and minimum values of the underlying field are 54.6 parts per million (ppm) and 25.4 ppm respectively shown by the colorbar (Figure 5(a)). Taking this into account, we set ∆ to be equal to 4 which is 10% of the average of maximum and minimum field values. We use a simulated sensor that The squared-exponential kernel has three hyperparameters: length scale (l), signal variance (σ 2 0 ), and noise variance (ω 2 ). The values of l, σ 0 , and ω 2 were estimated to be 8.33 meter, 12.87, and 0.0361 respectively by minimizing the negative log-marginal likelihood of the manually collected data. We assume that the estimated values are the true values of the kernel hyperparameters. In a general application where some prior data is available, the hyperparameters can be estimated in a similar way. We used the GPML toolbox to perform the necessary GP operations [88].

A. Qualitative Example
Stationary Sensor Placement: The final predicted OM content after performing inference using the measurements obtained is shown in Figure 5(b). This predicted OM content is the average of ten trials. In each trial, the reported value by the sensor can be different even at the same location because of the simulated noise. Figure 5(c) shows a plot of the prediction error averaged over those ten trials. We observe that the average prediction error is below ∆ = 4 ppm at each location in the environment. It is important to mention that average prediction error is not same as the MSE. The MSE at a location is the expected squared error in prediction at that location. The average prediction error, referred as empirical MSE in the following text, is an empirical estimate of that expectation. As the number of runs increases, the empirical MSE will converge to the actual MSE. We verify it through simulations and report the results later. Our theoretical guarantees hold for the MSE and not for the empirical MSE. However, one can expect that the empirical MSE will also be less than the pre-defined threshold ∆ given enough trials. The regions where the OM content changes sharply tend to be more erroneously predicted as shown by the lighter colored regions in Figure 5(c). This can be attributed to the inherent smoothness assumptions of a squared-exponential kernel.
Single and Multi-Robot Tours: The measurement locations computed by the DISKCOVERTOUR are shown in Figure 6(a). As post-processing, we removed the redundant measurement locations in overlapping 3r max radii disks. After performing this step, the total number of measurement locations was 2320. A covering of the farm with disks of radii 3r max and an approximate optimal tour visit the centers of those disks calculated by DISKCOVERTOUR is shown in Figure 6(b). We compute the optimal TSP tour since this is a reasonably sized instance. The lawn-mower detours visiting individual 3r max disks have been omitted to make the figure more legible. For the multi-robot version, we assume that we have three robots. Splitting of a single robot tour (Figure 6(b)) in three subtours is shown in Figure 6(c). The robots start from a common depot.
Varying values of ∆: In some applications, one may be interested in having more accurate predictions in some parts of the environment than others. Our algorithms provides a way to choose locations and plan paths in such applications as well. To demonstrate this, we divide the farm in three sub-environments that have different ∆ tolerances. The leftmost, middle, and right-most regions have thresholds of ∆ = 6, 4 and 2 ppm respectively. We solve for the measurement locations independently in each region. The corresponding r max values were calculated to be 4.97, 3.93, and 2.70 meters respectively using Equation 13. Figure 7 shows the measurement locations. One can qualitatively observe that the algorithm places fewer measurement locations in the left-most sub-environment which allows for the highest error tolerance.
An approximate TSP tour visiting the centers of all 3r max disks, in all three regions, is shown in Figure 8. The size of the disks shrinks as one moves to the right-most sub-region which has the least tolerance for prediction error. The TSP tour goes outside the environment in this case, which may be feasible if an aerial robot is used to monitor the farm. In case of applications, where the robot must stay inside the environment, we can enforce this constraint by replacing the Euclidean edge weights in the TSP input graph with the length of the shortest path between two vertices inside the environment.

B. Comparisons with Pre-defined Lawn-mower Tours
One can observe from Figure 6(a) that the measurement location pattern closely resembles a lawn-mower pattern. It motivated us to compare the performance of our algorithms and with lawn-mower plan. Figure 9 and 10 show the average posterior variance and average empirical (for ten trials) MSE respectively for a pre-defined lawn-mower pattern with varying grid resolutions on a semi-logarithmic scale. Note that the posterior variance at a test location is always same in each trial because it is not a function of the actual measurement value. The blue horizontal line corresponds to DCT and is shown for the sake of comparison.
A plot of the time taken by the robot to cover lawnmower patterns with various grid resolutions is shown in Figure 11. The lawn-mower lines in Figures 9, 10, and 11 intersect the DCT lines at approximately a resolution of 2 meters. It suggests that one would need to create a grid of approximately that resolution to achieve same performance as DCT. Figure 12 shows the average posterior variance for (a) The actual organic matter content (ppm).
(b) The predicted organic matter content (ppm). (c) Prediction error between the actual and the predicted OM content (ppm).    (c) All robots start from and return to the depot after making measurements.   DCT and a pre-defined lawn-mower of resolution 2.4 meter as a function time elapsed along a deployment (averaged over 10 deployments). We chose a resolution of 2.4 meters since a lawn-mower planner with this resolution has approximately the same number of measurement locations as DISKCOVERTOUR. We observe that both perform almost the same empirically. . lawn-mower pattern, one would need to pick a grid resolution. There is no systematic way of picking this resolution without enumerating a few combinations to analyze the trade-off between time and posterior variance or MSE. This can be wasteful. Instead, we present a systematic way of planning the measurement locations and give explicit theoretical guarantees

C. Comparison With Other Baselines
A comparison between DISKCOVERTOUR and two baselines, entropy-based, and MI-based planner is shown in Figure 13. The measurement locations for the entropy-based and MI-based planners were calculated greedily, i.e., picking  the next location at the point of maximum entropy and MI respectively as described in [5]. We study the average posterior variance and average empirical MSE in prediction as a function of the total time (measurement plus traveling) spent by the robot on the farm for each planner. After finding the measurement locations for each planner separately, TSP tours visiting those locations were calculated. The X axis in Figure 13 shows the time taken along a tour and the Y axis shows the respective metrics based on measurements collected until that point in time along the tour (averaged over ten trials). We observe that DISKCOVERTOUR performs at par with other planners. The entropy-based planner results in the most significant reduction in posterior variance and average empirical MSE initially. This can be explained by the fact that the entropy-based planning tends to spread the measurement locations far from each other resulting in covering a bigger portion of the environment initially. However, DISKCOVER-TOUR converges to a lower value of average empirical MSE and average posterior variance.

D. MSE and Variance
We verify our hypothesis that MSE is equal to the posterior variance for GPs. A plot of the mean percent difference between the empirical MSE and the posterior variance is shown in Figure 14. The mean is computed over approximately 5600 test locations which are different from the measurement locations and placed on a grid. As the number of trials increases, the mean difference between empirical MSE, which is essentially the MSE given enough number of trials, and the posterior variance decreases implying that the empirical MSE converges to the posterior variance asymptotically. In each trial, the measurement locations, test locations, and the hyperparameters are same, and therefore the variance estimates are same as well. However, the predicted value in each trial, and hence the prediction error, may be different since the actual measurement collected can be different in each trial due to the simulated noise. The effect of noise will decrease as one computes empirical estimate over a larger number of trials.  .

V. CONCLUSION
In this paper, we study several problems: Placing the minimum number of stationary sensors to track a spatial field, mapping a spatial field by a single as well as multiple robots while minimizing the time taken by the robots. For all the problems, we propose polynomial-time approximation algorithms to ensure that the mean square error in prediction the underlying spatial field is smaller than a pre-defined threshold at each point. We also derive the lower bounds on the performance of any algorithm (including optimal) to solve respective problems are provided. We show that it is possible to learn a given spatial field accurately with high confidence without planning adaptively. Note that, if the kernel parameters are optimized online, then, one would require an adaptive strategy.
The algorithms suggested in this paper perform comparatively with the baseline planners developed earlier. Our algorithms have theoretical bounds on their performance. The algorithms can also be generalized to 3D mapping, even though we illustrate using 2D examples. The disks in the 2D case will be replaced by spheres in 3D. The disk packing/covering problem becomes a sphere packing/covering. The tour will need to visit points in 3D, as opposed to 2D. The existing TSP algorithms already apply to the 3D case [42]. Our ongoing work is on developing competitive strategies for spatio-temporal learning and deriving similar guarantees for adaptive cases. APPENDIX A Proof. Consider two measurement locations x 1 , x 2 and a test location x such that x 1 is closer to x. The posterior variance at x if a measurement was collected at x 1 can be computed as follows: Similarly, the posterior variance at x if a measurement was collected at x 2 , From ||x − x 1 || 2 < ||x − x 2 || 2 and f (x) = − exp (−x) being a monotonically increasing function, we have Using this to compare Equations 44 and 45 one can easily see thatσ 2 x|x1 <σ 2 x|x2 .