Displacement-based Error Metrics for Morphodynamic Models

The accuracy of morphological predictions is generally measured by an overall point-wise metric, such as the mean-squared difference between pairs of predicted and observed bed levels. Unfortunately, point-wise accuracy met-rics tend to favour featureless predictions over predictions whose features are (slightly) misplaced. From the perspective of a coastal morphologist, this may lead to wrong decisions as to which of two predictions is better. In order to overcome this inherent limitation of point-wise metrics, we propose a new diagnostic tool for 2-D morphological predictions , which explicitly takes (dis)agreement in spatial patterns into account. Our approach is to formulate errors based on a smooth displacement field between predictions and observations that minimizes the point-wise error. We illustrate the advantages of this approach using a variety of morphological fields, generated with Delft3D, for an idealized case of a tidal inlet developing from an initially very schematized geometry. The quantification of model performance by the new diagnostic tool is found to better reflect the qualitative judgement of experts than traditional point-wise metrics do.


Introduction
Quantitative validation methods for morphodynamic models are often grid-point based; they compare observations and predictions per grid-point and compute various metrics for the entire set or subset of grid-points (e.g., Sutherland et al., 2004).Unfortunately, point-wise accuracy metrics, such as the commonly used MSE (Mean-Squared Error) and RMSE (Root-Mean-Squared Error), tend to penalize, rather than reward, the model's capability to provide information on features of interest, such as scour holes, accumulation zones and migrating tidal channels.For instance, a prediction of a morphological feature that is correct in terms of timing and size, but is misplaced in space, may not outperform even a flat bed, which is inconsistent with the common judgement of morphologists (Fig. 1).This "double penalty effect" (Bougeault, 2003), which applies in full when a feature is misplaced over a distance equal or larger than its size, makes it difficult to demonstrate the quality of a high variability prediction (Anthes, 1983;Mass et al., 2002).Clearly, a high quality validation process requires alternative validation techniques that account for the spatial structure of 2-D morphological fields.
For the verification of weather variables (e.g.precipitation), methods are being actively developed to quantify forecast performance based on spatial structure; see for instance Casati et al. (2008) and Gilleland et al. (2009) for an overview.One of the approaches in meteorology, now also pioneered in other fields (e.g., Haben et al., 2014;Ziegeler et al., 2012), is to find an optimal deformation of the predictions that minimizes the misfit with observations.This optimal deformation can be obtained by employing one of many existing image matching methods, of which optical flow techniques, designed to estimate motion, are probably most well-known in the coastal community.The result of the image matching or warping is a vector field of displacements, which can be regarded as a displacement error field.In addition, an intensity or amplitude error field may be defined as the difference between the deformed prediction and the observations (e.g., Marzban and Sandgathe, 2010), which can be seen as the point-wise error if no penalty applies for misplacements.
Existing verification methods, based on field deformation of meteorological fields, not only differ in the applied image matching method, but also in the approach to the subsequent extraction of map-mean errors.Keil and Craig (2009) determine RMS (root-mean-squared) intensity and mean displacement errors within the boundaries of precipitation features, which they then combine into a single error metric.The latter requires the normalization of the two errors to put each term on equal footing, which introduces two parameters to the formulation.In contrast, Gilleland et al. (2010) propose a combined error metric that besides the post-warp RMS intensity error and the mean displacement error also takes the original RMS intensity error into account, enabling a more fair comparison of forecast performance.Their metric, however, is not easily applicable since it requires three user-chosen weights that are dependent on the error terms themselves.
The goal of this paper is to quantify morphodynamic model performance, while taking the spatial characteristics of 2-D morphology into account.Using a field deformation technique, we have developed and tested a new diagnostic tool for the validation of 2-D morphological predictions.It includes a location (displacement) error metric and a robust and physically intuitive combined error metric that incorporates both location and intensity error.The combined metric rewards predictions to the degree that a larger error reduction can be obtained with smaller displacements.As a reference, we use the subjective but very powerful method of visual inspection of morphological patterns by experts.
Our method is outlined in Sect.2, along with a brief description of the image warping method that we have adopted to calculate the optimal deformation.Next, in Sect.3, we put the new diagnostic tool to the test, using morphological fields generated with Delft3D for an idealized case of a tidal inlet developing from an initially very schematized geometry.Section 4 concludes with a summary of our findings and the implications for morphodynamic model validation.

Method
This section outlines our two-step approach to quantify the (dis)agreement between 2-D morphological patterns.Section 2.1 describes the first step of deforming (or warping) the predicted morphology to minimize the point-wise error with observations.Next, Sect.2.2 formulates two new error metrics, a mean location error that is distilled from the displacement vector fields and a single-number error metric that measures both the correspondence with respect to location and intensity (i.e.depth-values).

Warping method
The measure of closeness between images or spatial fields is encountered in many fields from radiography to meteorology.This has led to the development of a multitude of image matching methods that, depending on the scientific field, are also named registration or warping methods.The goal of such methods is to find the optimal transformation that maps each point of a static image to a corresponding point (with the same intensity) in the moving image.Within the context of morphodynamic model validation, the static image represents the observed depth field o and the moving image the predicted depth field p.
Of all the available techniques, the class of optical flow techniques, designed to estimate small displacements in temporal image sequences, is probably the most well-known in our field.The basic assumption of optical flow is that the intensity of a moving object does not change appreciably in the considered time interval.We employ the efficient, non-rigid (i.e.allowing for free-form deformations) registration technique named Demon's registration (Thirion, 1998), which bears similarities to optical flow, in an implementation by Kroon and Slump (2009).The Demon's approach can be considered as similar to a minimization of the sum of square image intensities between the deformed predictions and observations (Pennec et al., 1999).It is therefore consistent with our quest to find the optimal deformation of the predictions that minimizes the point-wise (R)MSE.
The estimated backward pixel displacements B * = (B * x , B * y ) that are required for a given point in a static image (the observations in our validation context) to match the corresponding point in a moving image (the predictions) is given by Thirion (1998): in which α is a normalization factor that is equal to 1 in the original method and I o and I p are the intensities of the static and moving image, respectively.The latter are taken as the observed and predicted depth fields, normalized by scaling between 0 and 1.Since Eq. ( 1) is based on local information, it is solved iteratively while including Gaussian smoothing as a regularization criterion.This ensures that a realistic, smooth displacement field is found instead of an irregular field that nonetheless minimizes the sum of squares.
The normalization factor is chosen as α = 2.5 in line with Kroon and Slump (2009) and the standard deviation of the Gaussian smoothing window as σ = 4.These parameters are kept constant for all registrations presented in Sect.3. The forward displacements F * = (F * x , F * y ) from the moving to the static image can be determined from B * after the registration.Note that when in the following the subscript * is dropped, we refer to the displacement fields transferred to a physical distance.
For the purpose of model validation, we interpret d 0 = p 0 −o, with p 0 the prediction prior to warp, as the total pointwise error and d 1 = p 1 − o, with p 1 the deformed prediction as follows from the registration, as the point-wise error if no penalty is imposed for location disagreement.Next, we use this perspective in the formulation of map-mean errors.

Formulation of new error metrics
From the Demon's registration (Sect.2.1), we obtain the optimal displacement vector field between predictions and observations as well as the optimal deformation of the predictions."Optimal" in this context means that the sum of squares between the deformed predictions and observations is minimized, such that 0 ≤ RMSE 1 ≤ RMSE 0 , where RMSE 0 and RMSE 1 are the root-mean-squared errors before and after the warp, respectively.Note that we have preferred the RMSE over the MSE, since the first is measured in the same units as the data.Out of two predictions that have the same RMSE 0 , a prediction that has similar morphological features as the measurements, albeit displaced, may receive a lower RMSE 1 than a prediction that is not able to reproduce the observed morphological features at all.Thus, the RMSE 1 is expected to diagnose the agreement between morphological fields if a zero penalty is imposed for misplacements of features.However, which of the two predictions is valued the better prediction by morphologists not only depends on RMSE 0 and RMSE 1 , but also on the magnitude of the displacements required to obtain the error reduction.Therefore, we expect that the similarity in both location and intensity between morphological patterns can be fully assessed using three error metrics in concert: RMSE 0 , RMSE 1 and a mean location error D that we will formulate next from the displacement vector fields.
It is tempting to define D as the arithmetic mean of D = √ (B x 2 + B y 2 ), the field of displacement magnitudes.However, it should be realised that the optical flow problem is underconstrained; for a single grid-point, we only have information on the displacements normal to the contour lines, whereas along the contour lines the displacements are ambiguous (the so-called aperture problem).In the Demon's approach, the Gaussian smoothing acts as the necessary additional constraint, requiring that nearby grid-points have similar displacements.As a consequence, non-zero displace-ments may be found along depth contours in morphologically inactive regions (see Sect. 3), whereas these displacements do not improve the match between the deformed prediction and the observations.Therefore, we propose a weighted mean location error that weights the local backward displacement magnitudes D with their effect on the reduction of the local squared error.In this way, displacements are only taken into account to the extent that they contribute to the minimization of the sum of squares.This yields: Here SE 0 = (p 0 − o) 2 and SE 1 = (p 1 − o) 2 are the local squared errors before and after the warp, respectively, n is the number of equidistant points in the spatial domain and Whereas model performance is usually diagnosed based on RMSE 0 only, we now have two additional metrics RMSE 1 and D. In Sect.3, it is demonstrated that considering these three metrics in concert allows a full assessment of model quality, avoiding the double penalty effect for misplaced features.In practice, guidance may be required on how to weight these three metrics.Besides, the morphologist may sometimes desire a single-number summary of model performance, especially if automated calibration routines are used.To serve these needs, we propose an adjusted RMS error measure, RMSE w , that is computed from a field of weighted squared errors SE w .The latter are determined by locally weighting SE 0 and SE 1 .The purpose of the weighting procedure is to locally relax the requirement of an exact match to an extent determined by the local displacement magnitude.Figure 2 illuminates the weighting procedure for the ith gridpoint; an error reduction is awarded that is a fraction 1−δ i of the full error reduction potential (SE 0,i −SE 1,i ).
Here, δ i = D i /D max and D max is a maximum displacement length above which no relaxation is allowed.A larger fraction 1 − δ i is allowed for smaller displacement magnitudes D i , with a maximum of 1 − δ i = 1 and thus SE w,i = SE 1,i for D i = 0 m.For D i ≥ D max , we have 1 − δ i = 0 and thus SE w,i = SE 0,i .Note that D max is a user-defined, physically intuitive parameter that is dependent on the prediction situation and the goal of the simulation.It can be seen as the maximum distance over which morphological features may be displaced for the prediction to still get (some) credit for predicting these features.We now have for RMSE w : where www.adv-geosci.net/39/37/2014/Adv.Geosci., 39, 37-43, 2014 Fig. 2. Weighted squared error for the ith gridpoint SE w,i , which is the sum of the local squared error after the warp SE 1,i and a penalty for misplacements δ(SE 0,i − SE 1,i ) with δ = D i /D max .The penalty ranges from 0 for D i → 0 to (SE 0,i − SE 1,i ) for D i = D max , a user-defined maximum displacement length.For D i ≥ D max the full point-wise error applies and SE w,i = SE 0,i .
In conclusion, RMSE w as an error metric rewards forecasts to the degree that a larger error reduction can be obtained by smaller displacements.By definition, RMSE 1 ≤ RMSE w ≤ RMSE 0 .If the error reduction due to the image deformation is negligible or can only be obtained with displacements equal to or larger than D max , the diagnosed error is equal to the original error prior to the deformation RMSE 0 .If, on the other hand, the displacements required to minimize the point-wise error are very small relative to D max , we have RMSE w ≈ RMSE 1 .The justification for this approach lies in the tendency of coastal morphologists to credit a prediction for the reproduction of features, albeit displaced, while imposing a relatively small penalty for misplacement.The intuitive weighting of these two aspects is mimicked by the user-defined parameter D max .

Application
Below, the new error metrics are used to diagnose the correspondence between model-generated pairs of morphological patterns for an idealized tidal inlet as well as the relative ranking between the pairs.The fields have been generated for the idealized case of a tidal inlet developing from an initially very schematized geometry (Roelvink, 2006).First, Sect.3.1 demonstrates that the location error D is able to capture the overall misplacement of the morphological patterns.Next, in Sect.3.2, the combined error metric RMSE w is put to the test.Two examples are shown where the RMSE w makes the right the decision as to which of two predictions is the better prediction while the conventional, purely point-wise RMSE 0 fails to do so.

Location error
In this subsection, we consider a subset of the modelgenerated depth fields which only differ with respect to the latitude, and hence Coriolis parameter, used in the model.Of four depth fields, we label the field generated at 53 • N as the "observations" (Fig. 3a) and consider the other fields, for latitudes 90 • N, 0 • and 90 • S, as three competing predictions.
Even though the predictions are not shown here, it will not come as a surprise that the point-wise error RMSE 0 is smallest for 90 • N and largest for 90 • S (Table 1).
In order to determine RMSE 1 and D, the image warping method is applied, following the procedure outlined in Sect.2, and illustrated here for the prediction at 0 • (Fig. 3b).The deformed prediction that matches the observations most ) is the field of displacement magnitudes computed from the backward displacement vector field B (see Eq. 1), and w is determined according to Eq. ( 2).
closely is shown in Fig. 3d and the corresponding backward vector displacement field B in Fig. 3c.As explained in Sect.2.1, in the inactive outer regions, physically unrealistic displacements are found along depth contours, since no penalty is imposed in the minimization for displacements along depth contours.As will be illustrated next, this is solved for in the formulation of D (Eq. 2).
The difference d 0 between the predictions prior to the warp and the observations is shown in Fig. 4a, whereas Fig. 4b shows the difference d 1 after the warp.Note that taking the root-mean-square of d 0 and d 1 yields RMSE 0 and RMSE 1 , respectively.From d 0 , the double penalty problem is clearly observed; for instance at the edges of the ebb-tidal delta, an error is diagnosed both where the delta is present in the observations but absent from the predictions and vice versa.After the warp, both errors have practically disappeared, such that they will not count towards RMSE 1 , demonstrating again that RMSE 1 should be regarded as the point-wise error if no penalty for misplacement is taken into account.For the prediction at 0 • , RMSE 1 /RMSE 0 = 0.5, and slightly smaller ratios are found for the other two predictions (Table 1).
The weighted dispacements wD, with D = √ (B x 2 + B y 2 ) and w according to Eq. ( 2), are shown in Fig. 5. Inherent to the use of the squared error to determine w is that larger error reductions are heavily weighted.Here, we have never-Table 2. Subjective ranking (with 1 being the best prediction) and errors for competing predictions, generated with Delft3D for various boundary conditions.The "observations" are taken as the model outcome at 0 theless chosen this weighting since squared errors are consistent with the minimization as performed by the registration method as well as with the use of the (R)MSE as the pointwise metric, which is common in morphodynamic model validation.Note that for the computation of D (Eq. 2), we require the backward (from the observations to the predictions) rather than the forward displacements; for each point in the observational domain, these provide the distance at which the point in the predictions is located that is shifted to the considered location in the observations.Summing wD for the entire domain yields a location error D = 350 m at 0 • (Table 1).
The values for D for the three predictions demonstrate a qualitative behaviour consistent with the error in latitude and hence Coriolis effect in the various predictions (Table 1).In fact, all three error metrics, RMSE 0 , RMSE 1 and D diagnose the predictions for 90 • N and 90 • S as the best and worst predictions, respectively.Next, we will consider situations in which a ranking consistent with expert judgement is only obtained by considering these three metrics in concert, using an appropriate weighting, or from RMSE w .

Ranking according to the combined error metric
In this subsection, we present an example, again using depth fields generated with the Delft3D model of the schematized tidal inlet, that demonstrates that RMSE w outperforms the traditional score RMSE 0 .Now, the model results at a latitude of 0 • (see Sect. 3.1) are assumed to be the "truth".Four competing predictions are considered that are generated at 0 • with various changes to the model boundary conditions (w.r.t.tidal amplitude and flow direction).Figure 6 shows the four predictions, the "observations" and the deformed predictions that minimize the point-wise error.
We have labelled the predictions according to a subjective ranking based on visual inspection, with A the prediction with the closest match with the observations and D, the worst prediction.We have a slight preference for prediction B over C, but it is possible that other morphologists would tend to regard C as the better prediction.Not surprisingly, the relative ranking as diagnosed by RMSE 0 deviates from the expert ranking (Table 2); based on RMSE 0 one would wrongfully conclude that predictions A and B perform equally well and that prediction D outperforms prediction C. The values of RMSE 1 , D and RMSE w for the respective predictions provide the necessary additional information on model performance (Table 2).The smaller RMSE 1 for prediction A than for prediction B shows that if no penalty is imposed for misplacements, prediction A receives a lower error than B.Moreover, a smaller average displacement D is required to minimize the point-wise error.Thus, even though no distinction can be made based on RMSE 0 , we can conclude that pattern A more closely corresponds to the observations than pattern B. Clearly, considering the values of RMSE 0 , RMSE 1 and D in concert leads to a diagnosis of relative model performance of A and B in line with visual inspection.
To determine RMSE w , a value for D max must be chosen.A defendable choice would be to limit D max to the scale of the morphological features of interest.For this particular case, D max = 3000 m is considered appropriate, being in the order of magnitude of the seaward extent of the ebb-tidal delta.In general, of course, D max must be chosen in accordance with the goal of the simulation.
Figure 7 shows that with D max = 3000 m, RMSE w reports a higher quality for prediction A than for prediction B, regardless of the exact choice for D max .Only if one decides to not allow any relaxation of the requirement of an exact match (D max = 0 m), RMSE w is identical to the full point-wise error RMSE 0 and no distinction can be made between A and B. If one wishes to allow the full error reduction potential (D max → ∞), we have RMSE w = RMSE 1 .
Table 2 illuminates that prediction C, the prediction with the largest RMSE 0 , has a much larger potential for error reduction by warping than prediction D; notwithstanding the larger RMSE 0 , RMSE 1 is smaller for prediction C than for D and at a smaller mean displacement D. The relatively small error reduction potential for D is a result of the fact that features not present in the predictions remain absent after the warping procedure, as evident in the deformed predictions in Fig. 6.As a result, RMSE 1 remains relatively high for D, rightfully penalizing the prediction for the absence of the observed features.A conclusive answer as to whether C or D is the better prediction now requires a (subjective) weighting of RMSE 0 , RMSE 1 and D. Conveniently, the weighting between location errors, pre-warp and postwarp intensity errors is already provided by the formulation of RMSE w , allowing a quantitative single-number comparison between predictions C and D. For D max = 3000 m, the values for RMSE w indicate that prediction C outperforms D (Fig. 7), consistent with the ranking based on visual inspection.Naturally, the occurence of this ranking reversal, as compared to the ranking based on RMSE 0 , depends on the chosen value of D max .

Conclusions
We have developed a new diagnostic tool for morphodynamic model validation.It employs an image warping method that finds the smooth displacement field between predictions and observations that minimizes the point-wise error.Two new metrics are proposed: (1) a location error D that is determined as a weighted mean distance between morphological fields; and (2) a combined error metric RMSE w that takes both location and intensity errors into account.
A full appreciation of the quality of a prediction can be obtained when considering D in concert with both the original point-wise error RMSE 0 and the point-wise error of the deformed predictions, RMSE 1 .In order to quantify the relative performance between predictions, a (subjective) weighting of these three metrics must be carried out.Alternatively, the weighting is already provided by RMSE w that combines all relevant information on location errors and pre-and postwarp intensity errors.
The combined error metric credits predictions to the degree that a larger error reduction can be obtained with smaller displacements.It reduces to RMSE 0 if all displacements are larger than a user-defined D max and to RMSE 1 for displacements that are negligible relative to D max .The latter can be seen as the maximum distance over which morphological features may be displaced for the prediction to still get (some) credit for predicting these features.The appropriate choice for D max depends on the prediction situation and the goal of the simulation.Since it only requires a single, physically intuitive parameter, RMSE w provides a robust basis for comparison.
An example of a schematized tidal inlet has demonstrated that RMSE w outperforms the conventional validation approach based on a strictly point-wise metric such as RMSE 0 .In situations where morphological features are misplaced, point-wise accuracy metrics tend to favour predictions that underestimate variability.For the schematized tidal inlet, it was shown that, as opposed to RMSE 0 , the new combined error metric RMSE w makes choices as to which of two predictions is better, which are consistent with visual validation by experts.

Fig. 1 .
Fig. 1.The "double penalty effect".Top panels: the featureless prediction A has a non-zero difference d A between predicted and observed depth values at the location of the observed feature only.Lower panels: prediction B, which reproduces the feature at the wrong location, is penalized twice (d B is non-zero both where the predicted feature is and where it should be) and is thus diagnosed with a twice as large (R)MSE as prediction A.

Fig. 3 .Fig. 4 .
Fig. 3. Example of the image warp: (a) the "observations", calculated using Delft3D with Coriolis at 53 • N; (b) the predictions, calculated at 0 • ; (c) the backward displacement vector field B of the observations towards the predictions, shown on top of the observations; and (d) the predictions deformed to more closely match the observations.

Table 1 .Fig. 5 .
Fig. 5. Weighted displacements wD for the prediction at 0 • .Here D = √ (B x 2 + B y 2) is the field of displacement magnitudes computed from the backward displacement vector field B (see Eq. 1), and w is determined according to Eq. (2).

Fig. 6 .Fig. 7 .
Fig.6.Predictions A, B, C and D, the "observations" (taken as the model results for 0 • ) and the corresponding deformed predictions that minimize the point-wise mismatch between predictions and observations.The labels are chosen such that the lower the label in the alphabet, the higher the quality that the prediction is probably diagnosed with upon visual inspection.The axes are as in Fig.3.