The use of MOGREPS ensemble rainfall forecasts in operational flood forecasting systems across England and Wales

Operational flood forecasting systems share a fundamental challenge: forecast uncertainty, which needs to be considered when making a flood warning decision. One way of representing this uncertainty is to employ an ensemble approach. This paper presents research funded by the Environment Agency in which ensemble rainfall forecasts are utilised and tested for operational use. The ensemble rainfall forecast used is the Met Office short-range product MOGREPS. It is tested for operational use within the Environment Agency's National Flood Forecasting System (NFFS) for England and Wales, which currently uses deterministic forecasts only. The operational configuration of the NFFS for Thames Region is extended to trial the use of the new ensemble rainfall forecasts in support of probabilistic flood forecasting. The evaluation considers model performance, configuration (how to fit the ensemble forecasts within the current configurations), data volumes, run times and options for displaying probabilistic forecasts. Although the available MOGREPS ensemble rainfall forecasts are not extensive enough to fully verify product performance, it is concluded that their use within current Environment Agency regional flood forecasting systems can provide better information to the forecaster than the deterministic forecasts alone. Of note is the small number of false alarms of river flow exceedance generated when using MOGREPS as input, and the fact that small flow events are also forecast rather well, notwithstanding the rather coarse resolution of the MOGREPS grid (24 km) relative to the studied catchments. In addition, it is concluded that, with careful configuration in NFFS, MOGREPS can be used in existing systems without a significant increase in system load.

Correspondence to: J. Schellekens (jaap.schellekens@deltares.nl)


Introduction
Since the early 1990s, meteorologists have provided ensemble predictions of rainfall, and an increasing number of hydrologists have begun to use these in (semi-)operational flood forecasting systems (e.g. Gouweleeuw et al., 2005; Pappenberger et al., 2005). One major source of uncertainty in hydrological forecasting systems is the forecast rainfall. As such, the use of ensemble rainfall forecasts may lead to increased understanding of the uncertainty associated with hydrological forecasts.
The Met Office is the Environment Agency's main provider of meteorological forecast products. These products are used in the Environment Agency's current operational forecasting system, the National Flood Forecasting System (NFFS). The numerical weather prediction (NWP) capability of the Met Office is continuously being enhanced, and new ensemble products are now being made operationally available. Currently, the nowcasting system STEPS (Short-Term Ensemble Prediction System; Bowler et al., 2006) produces deterministic rainfall forecasts at a 2 km resolution; in the near future these will be available to the Environment Agency in ensemble form. For longer-term NWP, a new ensemble forecasting system called MOGREPS (Met Office Global and Regional Ensemble Prediction System; Bowler et al., 2008) has also been developed, which uses a coarser model resolution of 24 km (now 18 km). These developments offer interesting opportunities for the Environment Agency and open the door to probabilistic flood forecasting, but operational research is required to realise their potential benefits for flood warning. Many existing ensemble prediction systems across Europe are based on ECMWF ensembles (Cloke et al., 2009; Cloke and Pappenberger, 2009; Thielen-del Poze et al., 2009) and are aimed at medium-range forecasting, although MAP D-PHASE (Rotach et al., 2009) focuses on Alpine systems and shorter lead-times.
Published by Copernicus Publications on behalf of the European Geosciences Union.
The MOGREPS product, investigated here for the first time in a hydrological forecasting application context, is aimed at the shorter range of one to three days.
Hydrological models can provide useful river-flow forecasts in support of flood warnings if the rainfall information they are supplied with is sufficiently accurate. These models have generally been used with raingauge data, radar analyses and extrapolated radar forecasts; more recently, longer-term NWP model rainfall forecasts have also been used. All of these have now been incorporated in the NFFS for regional flood forecasting across England and Wales and are used to drive a large number of hydrological models.
The Environment Agency's R&D project "Hydrological Modelling using Convective Scale Rainfall Modelling" (Schellekens et al., 2010) aimed to investigate which hydrological model concepts and associated computational methods make best use of the latest Met Office developments in probabilistic rainfall forecasting. All the hydrological modelling concepts explored were trialled within the current NFFS infrastructure, which is based on Delft-FEWS (Werner et al., 2004). Currently, the new Flood Forecasting Centre (FFC), a joint venture between the Environment Agency and the Met Office, is trialling MOGREPS in combination with a grid-based hydrological model (Grid-to-Grid; Bell et al., 2009) to deliver flood risk outlooks across England and Wales (Price et al., 2011).

MOGREPS
In 2005, the Met Office introduced MOGREPS for short-range ensemble prediction, providing a 24 km resolution regional ensemble for the Atlantic and Europe domain. Ensemble forecasting is based on the principle of adding small perturbations to the "best estimate" of the initial state of the atmosphere; the model is then run forward from the perturbed starting conditions to generate an ensemble of different forecasts. The regional model used here (MOGREPS-R) is designed to provide ensemble forecasts for the short range (days 0-3) for the United Kingdom and Ireland. It provides a 24-member ensemble of 3-hour rainfall totals with a grid resolution of 24 km for a forecast length of 54 h (36 h are used in this research). Boundary conditions for the regional model are provided by a global model (MOGREPS-G) with a 90 km grid and a forecast length of 72 h, also producing a 24-member ensemble. Both models are run twice daily: the global model at 00:00 and 12:00 UTC and the regional model 6 h after these times. Because of spin-up issues, and because only two forecasts are available per day, the first hours of the MOGREPS runs are generally not used. Each ensemble consists of 1 control run and 23 additional members. The control forecast is run at the same resolution as the other members but contains no perturbations to account for initial-condition or model uncertainties; it therefore runs from the best analysis of the initial state of the atmosphere. The control run can be compared with the standard deterministic weather forecast, which is run at a 12 km resolution.
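The perturb-and-run principle can be illustrated with a toy model. The Python sketch below is not MOGREPS, which perturbs a full NWP model; it assumes the three-variable Lorenz-63 system as a stand-in "atmosphere", a hypothetical analysis state, and a Gaussian perturbation size and integration length chosen purely for illustration. Member 0 plays the role of the unperturbed control run and the remaining 23 members start from perturbed analyses, mirroring the 1 + 23 member structure described above.

```python
import random


def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz-63 toy 'atmosphere' by one forward-Euler step."""
    x, y, z = state
    return (
        x + dt * sigma * (y - x),
        y + dt * (x * (rho - z) - y),
        z + dt * (x * y - beta * z),
    )


def run_forecast(initial_state, n_steps):
    """Integrate the toy model forward from a given initial state."""
    state = initial_state
    for _ in range(n_steps):
        state = lorenz63_step(state)
    return state


def make_ensemble(analysis, n_members=24, perturbation_size=1e-3, n_steps=500, seed=1):
    """Member 0 is the unperturbed control; the others start from perturbed analyses."""
    rng = random.Random(seed)
    members = [run_forecast(analysis, n_steps)]  # control run
    for _ in range(n_members - 1):
        perturbed = tuple(v + rng.gauss(0.0, perturbation_size) for v in analysis)
        members.append(run_forecast(perturbed, n_steps))
    return members


analysis = (1.0, 1.0, 20.0)  # hypothetical "best estimate" of the initial state
ensemble = make_ensemble(analysis)
xs = [member[0] for member in ensemble]
print(f"{len(ensemble)} members, x ranges from {min(xs):.2f} to {max(xs):.2f}")
```

Tiny initial differences grow as the toy model is run forward, so the members spread out; this spread is what an ensemble uses to represent forecast uncertainty.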

Implementation of MOGREPS ensembles in NFFS Thames Region
Thames Region, one of the eight regions within the NFFS, was chosen to test the use of ensemble rainfall forecasts from both MOGREPS and STEPS in an emulation of the operational flood forecasting system. The Region has fairly long lead times to its most important forecasting locations but also has significant fast-responding urban areas within its forecasting responsibility. Hydrological modelling in Thames Region employs the Thames Catchment Model (TCM; Wilby et al., 1994; CEH, 2005); a total of 148 TCMs are used, covering most of the region. The forecasting time-step is 15 min and the models have short run-times. Each TCM is used in combination with an ARMA (AutoRegressive Moving Average) error predictor model (CEH, 2005): the forecast flow is the sum of the TCM simulated flow and a prediction of the model error that uses flow observations up to the time the forecast is made. A total of 112 raingauges and 537 river gauges are used in Thames Region to support the forecasting process. TCM plus ARMA combinations are used to forecast flow at key locations, or lateral inflows to the main rivers, which are modelled separately using hydrodynamic ISIS models. Because of the nested configuration of the TCMs, the larger of the currently available models have long lead-times, which makes them less suited to nowcasting ensembles from STEPS and, in some cases, also to short-range ensembles from MOGREPS.
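The way an error predictor combines with the TCM output can be sketched as follows. This is a deliberately simplified, hypothetical stand-in: it fits a first-order autoregressive (AR(1)) model to recent errors by least squares and propagates the last known error forward, whereas the operational CEH ARMA formulation (CEH, 2005) is not reproduced here. The function names and all flow values are invented for illustration.

```python
def fit_ar1(errors):
    """Least-squares AR(1) coefficient for an error series assumed to fluctuate
    around zero (no mean is removed in this simplified sketch)."""
    num = sum(prev * curr for prev, curr in zip(errors[:-1], errors[1:]))
    den = sum(prev * prev for prev in errors[:-1])
    return num / den if den > 0 else 0.0


def error_corrected_forecast(observed, simulated_past, simulated_future):
    """Forecast flow = simulated flow + predicted error.

    observed, simulated_past: flows up to the forecast time (same length).
    simulated_future: model flows over the forecast horizon (one per time-step).
    """
    errors = [obs - sim for obs, sim in zip(observed, simulated_past)]
    phi = fit_ar1(errors)
    predicted_error = errors[-1]  # last known error at the forecast time
    corrected = []
    for q_sim in simulated_future:
        predicted_error *= phi  # with |phi| < 1 the correction decays with lead time
        corrected.append(q_sim + predicted_error)
    return corrected, phi


# Hypothetical 15-min flows (m3/s) up to the forecast time and beyond.
observed = [10.0, 10.4, 10.8, 11.0]
simulated_past = [9.0, 9.5, 10.0, 10.3]
simulated_future = [10.5, 10.6, 10.6, 10.5]
corrected, phi = error_corrected_forecast(observed, simulated_past, simulated_future)
print(f"phi = {phi:.3f}", [round(q, 2) for q in corrected])
```

Because the correction decays with lead time, observed flows dominate the forecast at short lead times while the rainfall-driven model dominates further ahead.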
To evaluate the performance of MOGREPS and STEPS as they would be used in the operational NFFS setup, a complete copy of the operational system was set up and run on a day-to-day basis using all available operational data as input. After the configuration was extended to include the MOGREPS and STEPS ensemble rainfall forecasts, the system could be run in hindcast mode with the available historical forecasts.
Hindcasts using MOGREPS were performed for Thames Region for the period July 2008 to February 2009, for which MOGREPS ensemble rainfall forecasts were available twice a day. Hydrometric data between January 2008 and the start of the hindcast were used to warm up the TCMs used for river flow forecasting. After this warm-up, runs were made ending at 09:00 each day, at which time a set of initial conditions was saved. Repeating this for the whole period provided a set of initial conditions for starting the forecasts. Subsequently, the forecasts were run twice a day, at 09:00 and 21:00, coinciding with current operational practice in Thames Region. Results from the operational system were exported and post-processed outside the operational environment using the R programming language (Ihaka and Gentleman, 1996).
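The pairing of forecast times with MOGREPS-R runs in such a hindcast can be sketched as below. The sketch assumes that all times are UTC and that a regional run can be used as soon as its nominal base time (06:00 or 18:00) has passed; the `latency` parameter is a hypothetical allowance for data-delivery delays, which are not specified here. It only shows which run would drive each 09:00 and 21:00 forecast and does not model how the 36 h used are taken from the 54 h run.

```python
from datetime import datetime, timedelta

MOGREPS_R_RUN_HOURS = (6, 18)  # regional runs, 6 h after the 00/12 UTC global runs
FORECAST_HOURS = (9, 21)       # Thames Region forecast times


def latest_mogreps_run(issue_time, latency=timedelta(0)):
    """Most recent MOGREPS-R base time usable at issue_time.

    latency is a hypothetical delay before a run's output becomes available.
    """
    t = issue_time - latency
    candidates = []
    for day_offset in (0, -1):  # today's and yesterday's runs
        midnight = (t + timedelta(days=day_offset)).replace(
            hour=0, minute=0, second=0, microsecond=0)
        for hour in MOGREPS_R_RUN_HOURS:
            base = midnight + timedelta(hours=hour)
            if base <= t:
                candidates.append(base)
    return max(candidates)


def hindcast_schedule(first_day, last_day, latency=timedelta(0)):
    """Yield (forecast issue time, MOGREPS-R base time) pairs, twice a day."""
    day = first_day
    while day <= last_day:
        for hour in FORECAST_HOURS:
            issue = day + timedelta(hours=hour)
            yield issue, latest_mogreps_run(issue, latency)
        day += timedelta(days=1)


for issue, base in hindcast_schedule(datetime(2009, 2, 1), datetime(2009, 2, 2)):
    print(issue, "<-", base)
```

With a non-zero latency, the 09:00 forecast would fall back to the previous day's 18:00 run, which illustrates why the first hours of each run often cannot be used anyway.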

Results and discussion
The eight-month hindcast period proved too short to determine reliable statistics for assessing performance over flood events. Ideally, verification would measure the performance of the system in forecasting the crossing of important flood warning thresholds. However, meaningful verification statistics require a sufficiently large number of observed "events", where an event is understood to be the crossing of a given flow (or level) threshold. Thresholds that are meaningful in the context of operational flood forecasting are typically relatively high. As the verification period considered here is relatively short, these thresholds may not have been crossed at all, or so infrequently that the number of events is too small to give a meaningful performance statistic. Other work on MOGREPS (Bowler et al., 2007) has looked at verification of the rainfall forecast itself. Here, therefore, the results of the hindcast assessment are presented in three ways: (i) scatter plots of forecast versus observed flow for a given lead time; (ii) the Ranked Probability Skill Score (RPSS), which is of limited value because of the sample size; and (iii) individual forecast hydrograph plots, to give an impression of the behaviour of the short-range ensemble prediction system for the small catchments involved.
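Counting such events from a flow series is straightforward. The sketch below assumes that an event is an upward crossing of the threshold (flow moving from at or below the threshold to above it), so that a prolonged exceedance counts only once; the function name, flows and threshold are hypothetical.

```python
def count_exceedance_events(flows, threshold):
    """Count upward crossings of threshold; a series that starts above the
    threshold counts that first exceedance as an event."""
    events = 0
    above = False
    for q in flows:
        if q > threshold and not above:
            events += 1
        above = q > threshold
    return events


flows = [2.0, 5.0, 9.0, 12.0, 8.0, 11.0, 13.0, 6.0]  # hypothetical flows, m3/s
print(count_exceedance_events(flows, threshold=10.0))  # 2
```

Over an eight-month record, a high warning threshold may yield only a handful of such crossings, or none, which is the sample-size problem described above.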
For the analysis of the results, eight locations with warning thresholds were chosen to provide a representative cross-section of the TCMs used in Thames Region. The name, station identifier and catchment area of these locations are given in Table 1, together with their warning thresholds.
Figures 1 to 3 show typical scatter plots of the flow forecasts using MOGREPS rainfall as input versus the corresponding observations for different forecast lead times. These scatter plots show how well the forecast flows correspond to those observed. As one would expect, the spread of the forecasts narrows and the bias falls as the forecast lead-time decreases. This is more pronounced at some locations than at others, mainly depending on the response time of the specific catchment and on whether MOGREPS predicted a rainfall event. For example, the plots for Newbury (Fig. 1) show hardly any spread because no large rainfall event was predicted (or observed) during the period July 2008 to February 2009. Plots for smaller catchments, such as Addlestone (Fig. 2) and Kinnersley Manor (Fig. 3), already exhibit larger spreads at 6 h lead-time because of their smaller size and shorter response time.
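The two features read from the scatter plots, bias and spread by lead time, can also be summarised numerically. In this sketch, which is an illustration rather than the R analysis actually performed, bias is taken as the mean difference between the ensemble mean and the observation, and spread as the mean population standard deviation of the members; the function name and records are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def lead_time_summary(records):
    """Return {lead_time: (bias, spread)} from (lead_time_h, members, observed) records."""
    by_lead = defaultdict(list)
    for lead, members, observed in records:
        by_lead[lead].append((members, observed))
    summary = {}
    for lead, pairs in sorted(by_lead.items()):
        bias = mean(mean(members) - observed for members, observed in pairs)
        spread = mean(pstdev(members) for members, _ in pairs)
        summary[lead] = (bias, spread)
    return summary


records = [  # hypothetical ensemble flow forecasts and observations, m3/s
    (6, [1.0, 1.2, 0.8], 1.0),
    (6, [2.0, 2.0, 2.0], 2.5),
    (24, [1.0, 2.0, 3.0], 1.5),
]
for lead, (bias, spread) in lead_time_summary(records).items():
    print(f"{lead:>3} h: bias {bias:+.2f}, spread {spread:.2f}")
```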
To verify the whole range of possible outcomes, the Ranked Probability Score (RPS) can be used (Wilks, 2006). For verifying flow rates, a number of categories are defined whose flow ranges together cover all possible outcomes. The squared difference between the cumulative forecast probability and the corresponding cumulative observation for each category is averaged across all categories to obtain the RPS. The RPS is sensitive to category distance: a forecast that falls into a category more distant from the observed one is penalised more. Zero is the perfect score for the RPS. Table 2 shows the mean RPS values for the period July 2008 to February 2009 using the warning thresholds of Table 1. As mentioned previously, these statistics are not very revealing because the limited analysis period does not encompass many threshold-exceedance events. The Ranked Probability Skill Score (RPSS), using the naïve persistence forecast as a reference, has been calculated as well and is shown in Table 3. This skill score measures the improvement of the multi-category probabilistic forecast relative to a reference forecast (usually the long-term or sample climatology, but here the persistence forecast). It is similar to the 2-category Brier Skill Score in that it takes climatological frequency into account. Because the denominator of the score (the RPS of the reference forecast) approaches 0 when the reference forecast is nearly perfect, the score can be unstable when applied to small datasets; the rarer the event, the larger the number of samples needed to stabilise the score. This instability is clearly visible in Table 3. Possibly the only result with some credibility is that for Kinnersley Manor, where there are multiple events over the whole range of observed values. Here, the RPSS values indicate that there is skill in using MOGREPS ensembles as forecast rainfall input to the TCM compared with the naïve persistence forecast. Another insight that may be deduced from Table 3 is that MOGREPS did not produce many false alarms.
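The RPS and RPSS described above can be sketched as follows. Categories are delimited by a hypothetical pair of thresholds standing in for thresholds such as those in Table 1; the RPS is averaged over categories as described in the text (Wilks, 2006, uses the sum, which differs by the constant factor K for K categories), and the persistence reference is treated as a one-member "ensemble" equal to the flow at the forecast time. Returning NaN when the reference RPS is zero makes the instability discussed above explicit. Function names and values are illustrative assumptions.

```python
from statistics import mean


def cumulative_probs(values, thresholds):
    """Cumulative probabilities of the categories delimited by ascending thresholds.

    A value v falls in category k when thresholds[k-1] < v <= thresholds[k];
    the last category (above the highest threshold) always has cumulative prob 1.
    """
    n = len(values)
    return [sum(v <= t for v in values) / n for t in thresholds] + [1.0]


def rps(members, observed, thresholds):
    """Ranked Probability Score of one ensemble forecast (0 is perfect)."""
    f = cumulative_probs(members, thresholds)
    o = cumulative_probs([observed], thresholds)  # observation as a 0/1 step
    return sum((fk - ok) ** 2 for fk, ok in zip(f, o)) / len(f)


def rpss(forecast_scores, reference_scores):
    """1 - mean RPS / mean reference RPS; NaN if the reference is perfect."""
    ref = mean(reference_scores)
    if ref == 0:
        return float("nan")
    return 1.0 - mean(forecast_scores) / ref


thresholds = [5.0, 10.0]          # hypothetical Standby and Warning flows, m3/s
members = [4.0, 6.0, 8.0, 12.0]   # hypothetical ensemble flow forecast
observed, persistence = 7.0, 3.0  # observed flow and flow at forecast time
forecast_rps = rps(members, observed, thresholds)
reference_rps = rps([persistence], observed, thresholds)
print(round(forecast_rps, 4), round(reference_rps, 4),
      round(rpss([forecast_rps], [reference_rps]), 3))
```

In practice the scores are averaged over all forecasts in the period before forming the ratio, which is why `rpss` takes lists of per-forecast scores.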
For Colindeep Lane, Marsh Farm, Binfield and Newbury, only the Standby threshold, and no warning threshold, was crossed (by both the flood forecasts and the observations) during the eight-month hindcast period.
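Statements about false alarms rest on a simple contingency count. The sketch below assumes, hypothetically, that a warning is "issued" when the fraction of ensemble members exceeding the threshold reaches a trigger probability (0.5 here) and that an event "occurs" when the observed peak exceeds the same threshold; neither rule is taken from NFFS practice, and the function names and flows are invented.

```python
def exceedance_probability(members, threshold):
    """Fraction of ensemble members above the threshold."""
    return sum(q > threshold for q in members) / len(members)


def contingency_table(forecasts, observations, threshold, trigger=0.5):
    """Count hits, misses, false alarms and correct negatives."""
    table = {"hits": 0, "misses": 0, "false_alarms": 0, "correct_negatives": 0}
    for members, obs in zip(forecasts, observations):
        warned = exceedance_probability(members, threshold) >= trigger
        occurred = obs > threshold
        if warned and occurred:
            table["hits"] += 1
        elif occurred:
            table["misses"] += 1
        elif warned:
            table["false_alarms"] += 1
        else:
            table["correct_negatives"] += 1
    return table


def false_alarm_ratio(table):
    """False alarms as a fraction of all warnings issued (NaN if none issued)."""
    issued = table["hits"] + table["false_alarms"]
    return table["false_alarms"] / issued if issued else float("nan")


forecast_peaks = [[12.0, 11.0, 9.0, 13.0], [5.0, 6.0, 7.0, 8.0],
                  [11.0, 12.0, 13.0, 14.0], [9.0, 9.0, 11.0, 8.0]]
observed_peaks = [12.0, 6.0, 9.0, 11.0]
table = contingency_table(forecast_peaks, observed_peaks, threshold=10.0)
print(table, false_alarm_ratio(table))
```

When, as for these four stations, no warning threshold is crossed, every cell except false alarms and correct negatives is empty, so only the false alarm count carries information.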
The small number of warning-threshold crossings limits the usefulness of the RPSS in this case. Therefore, the Continuous Ranked Probability Score (CRPS) was also calculated to get a better indication of the performance of MOGREPS (Grimit et al., 2006). Table 4 shows the Continuous Ranked Probability Skill Score (CRPSS), defined as one minus the ratio of the CRPS of the MOGREPS forecast to the CRPS of the naïve forecast. A value of 1 indicates perfect skill, zero indicates that the forecast is equal in skill to the reference forecast, and anything below 0 indicates less skill than the reference forecast. Table 4 shows that the skill of the MOGREPS forecast increases with lead time. This is to be expected, as the naïve forecast does not include any forecast precipitation. For Redbridge the naïve forecast performs better up to 12 h, but for other, faster-responding stations MOGREPS shows skill even at the shorter lead times (6 to 16 h).
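For an ensemble, the CRPS can be computed directly from the members using the standard kernel form CRPS = E|X - y| - 0.5 E|X - X'|, where X and X' are members drawn independently from the ensemble and y is the observation; for a single-valued forecast such as persistence it reduces to the absolute error. The sketch below uses this estimator, which is not necessarily the exact implementation used in the study, together with the CRPSS as one minus the ratio of mean scores; the function names and values are hypothetical.

```python
from statistics import mean


def crps_ensemble(members, observed):
    """CRPS of the empirical ensemble distribution (0 is perfect)."""
    n = len(members)
    accuracy = sum(abs(x - observed) for x in members) / n
    spread = sum(abs(xi - xj) for xi in members for xj in members) / (2 * n * n)
    return accuracy - spread


def crpss(forecast_scores, reference_scores):
    """1 = perfect, 0 = same skill as the reference, < 0 = worse (NaN if ref is 0)."""
    ref = mean(reference_scores)
    if ref == 0:
        return float("nan")
    return 1.0 - mean(forecast_scores) / ref


members = [4.0, 6.0, 8.0, 12.0]  # hypothetical ensemble flow forecast, m3/s
observed, persistence = 7.0, 3.0
forecast_crps = crps_ensemble(members, observed)         # 2.5 - 1.625 = 0.875
reference_crps = crps_ensemble([persistence], observed)  # |3 - 7| = 4.0
print(forecast_crps, reference_crps, crpss([forecast_crps], [reference_crps]))
```

Unlike the RPS, the CRPS needs no thresholds, so every forecast contributes to the score whether or not a warning level is approached.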
Figures 4 and 5 show the forecast and observed hydrographs for Addlestone and Kinnersley Manor for February 2009.For both locations the event of 9 to 11 February is well predicted, although use of the MOGREPS ensembles tends to lead to underestimation of the flood event to some degree.This behaviour can also be observed in some of the scatter plots (Fig. 1).

Conclusions
The MOGREPS ensemble product developed by the Met Office has been tested for operational use by the Environment Agency. The regional model used is designed to provide 24-member ensemble forecasts for the short range (days 0-3) for the United Kingdom and Ireland.
A full verification study of the ensemble flood forecasts made using MOGREPS is not possible given the short hindcast period (July 2008 to February 2009). Nevertheless, several observations can be made from this hindcast study. The MOGREPS ensembles of forecast rainfall give good ensemble flood forecasts when used with the flood forecasting models employed across Thames Region. The frequency of false alarms is low over this period. Smaller events below the warning thresholds are also often well predicted, despite the rather coarse resolution of the ensemble rainfall forecasts available at the time. Whilst MOGREPS at 24 km resolution does not purport to represent localised convective (or orographic) rainfall, this operational evaluation over the lowland Thames Region of England provides evidence of its utility for probabilistic flood forecasting and warning.
Edited by: F. Pappenberger
Reviewed by: two anonymous referees

Fig. 1.Scatter plot of forecast and observed flow for different lead-times for Newbury.

Fig. 2. Scatter plot of forecast and observed flow for different lead-times for Addlestone.

Fig. 3. Scatter plot of forecast and observed flow for different lead-times for Kinnersley Manor.

Table 1. Warning thresholds (m³ s⁻¹) for the locations (name, station identifier and catchment area) used in the evaluation.

Table 2. Mean RPS for the period July 2008 to February 2009 using the thresholds listed in Table 1.

Table 3. Mean RPSS for the period July 2008 to February 2009 using the thresholds of Table 1 and the naïve (persistence) forecast as reference forecast.

Fig. 4. Flow forecast ensembles for 12 consecutive forecasts, 12 h apart, in February 2009 at Addlestone. Observed flows are indicated by circles. Horizontal lines represent the thresholds from Table 1.

Table 4. CRPSS for the period July 2008 to February 2009 using the naïve (persistence) forecast as reference forecast.

Fig. 5. Flow forecast ensembles for 12 consecutive forecasts, 12 h apart, in February 2009 at Kinnersley Manor. Observed flows are indicated by circles. Horizontal lines represent the thresholds from Table 1.