Estimation of the systematic error of precipitation and humidity in the MM5 model

Abstract. To comprehensively diagnose the capability of the model to simulate atmospheric flow, including the relevant microphysical processes, the main prognostic fields of the MM5 model are compared with ERA40 reanalysis data. This approach makes it possible to identify and compare meaningful features of model parameterization schemes and to quantify model errors. Various combinations of schemes for cumulus convection, the planetary boundary layer (PBL), microphysics and radiative transfer are used in order to identify those combinations which produce the closest resemblance between model state and reanalysis. The spatial structure of systematic errors, both horizontal and vertical, is described, and geographical regions and synoptic situations associated with pronounced systematic model deviations are identified. The study focuses on precipitation and humidity fields as well as on the main thermodynamic atmospheric variables on a coarse resolution grid (about 80 km) over the North Atlantic - Europe region. Our results identify advantages and shortcomings of the various parameterization schemes. They also indicate that, in general, the combination of the best schemes does not result in optimal simulations of a particular variable.


Correspondence to: S. Ivanov (svvivo@te.net.ua)

Introduction
Precipitation still poses many challenges for both observers and modelers. Despite numerous studies, sufficiently accurate quantitative estimates of the magnitude and location of precipitation have not yet been achieved. Major problems arise from the still incomplete knowledge of the physical processes involved in precipitation generation and from uncertainties in the computational representation of what is already known. Even the measurement of precipitation has not yet been solved satisfactorily. Despite more than 50 years of development, radar-derived precipitation still differs by up to a factor of 2 from direct measurements at the surface (Austin, 1987). On the model side, besides errors in model formulations, problems are related to the numerical approximation of continuous physics by linearization and to the parameterization of sub-grid scale processes, inevitably resulting in systematic model errors. In contrast to the other major prognostic variables, humidity and precipitation are phenomena with pronounced spatial and temporal intermittence (Rabier et al., 1998), which contributes significantly to the representativeness error (Ivanov and Palamarchuk, 2006) and affects the accuracy of rainfall estimates in general (Berne and Uijlenhoet, 2007).
Increasingly, forecast uncertainties due to initial conditions are tackled by using various approaches (Wei and Toth, 2003; Buizza et al., 2005; Harlim et al., 2005). The main conclusion of those investigations is that the initial condition errors and the numerical model quality are equally important for the forecast performance. However, they cannot be separated, since the two are convolved (Orrell et al., 2001). In contrast to a classical meteorological set-up with an ensemble of short-range runs, a long-term simulation has certain advantages in extracting systematic errors (Jung, 2005). First, state-dependent model error variations are suppressed over a simulation period that is much longer than a typical synoptic period. Second, by averaging the error over a domain and/or throughout a simulation period, the systematic model error is obtained with a number of degrees of freedom equal to the corresponding value for a series of short-range runs with the same total simulation period. Last but not least, by discarding the spin-up period, we obtain a series of model variables with a saturated error. This eliminates the initial condition errors, so that only the boundary condition effect remains (we assume that the ERA40 reanalysis data are reliable enough to be used as the boundary conditions). More details of the practical implementation are given in Sect. 2.

Data and diagnostics
The capability of the model to reproduce the atmospheric flow and relevant microphysical processes is diagnosed by comparing simulations against precipitation and humidity fields from the ERA40 reanalysis. This approach makes it possible to identify and compare meaningful features of model parameterization schemes and to quantify model errors. Note that the derived estimates may contain biases inherited from the reanalysis itself and may be sensitive to the sampling variability of state-dependent model errors. We simulate the atmospheric flow for the period January to February 2002. The area of interest covers the extratropical regions of the Atlantic Ocean and the European continent. This area is characterized by relatively homogeneous conditions for developing atmospheric systems like cyclones over different surface types, i.e. oceanic and continental, and orography. Different parameterization schemes for microphysics, radiative transfer, cumulus convection and the boundary layer are used to investigate their ability to reproduce both the large-scale atmospheric flow and the meso-scale precipitation structure (Table 1). The ERA40 reanalysis is used for the initial and boundary conditions during the long-term simulation.
Two diagnostics have been used to estimate the systematic error. The first is the average difference, $\hat{d}_{dif}$, between forecasts $f_v$ and reanalysis $r_v$ (see, for example, Jung, 2005):

$$\hat{d}_{dif} = \frac{1}{V} \sum_{v=1}^{V} (f_v - r_v),$$

with V either the domain or the period of time being averaged over, and v an index for discrete points in space or in time. The hat denotes that we are dealing with an estimate, which is subject to the finite length of the time series. This diagnostic estimates either the spatial structure of the systematic model errors averaged throughout the period being considered, or the temporal evolution of the systematic errors over the whole domain (or its parts, like the ocean and the continent). It identifies the magnitude and sign of the systematic errors, but it may be misleading when the model properly simulates the spatial or temporal scales of patterns but with incorrect amplitudes. In addition, errors of opposite sign can compensate each other when averaging over a given domain or period. A more detailed description is obtained from the absolute deviation, $d_{std}$. This diagnostic better describes the magnitude of systematic errors, but it does not provide the sign of a deviation. Thus, both diagnostics should be used to obtain a more complete description of systematic model errors.
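As an illustration, the two diagnostics can be sketched in a few lines of Python. The sketch assumes that the average difference is the arithmetic mean of f_v - r_v over the V points and that the absolute deviation is the mean of |f_v - r_v|. The latter form, the function names and the sample values are assumptions made for illustration and are not taken from the study.

```python
def average_difference(forecast, reanalysis):
    """Mean of (f_v - r_v) over all V points: keeps the sign of the error."""
    if len(forecast) != len(reanalysis) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(f - r for f, r in zip(forecast, reanalysis)) / len(forecast)


def absolute_deviation(forecast, reanalysis):
    """Mean of |f_v - r_v| (assumed form): keeps the magnitude, loses the sign."""
    if len(forecast) != len(reanalysis) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - r) for f, r in zip(forecast, reanalysis)) / len(forecast)


# Hypothetical relative humidity values (%) at four grid points.
model = [80.0, 60.0, 70.0, 50.0]
era40 = [70.0, 70.0, 60.0, 60.0]

d_dif = average_difference(model, era40)  # errors of opposite sign cancel
d_std = absolute_deviation(model, era40)  # magnitudes accumulate
print(d_dif, d_std)
```

In this constructed case the average difference vanishes although every point is off by 10%, which is why the two diagnostics are read together.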

Results
We performed two months of integrations and discarded the first 10 days, during which the model error usually grows until saturation, leaving us with a statistically homogeneous series of systematic errors. Our results show that the model error varies depending on the region and the particular atmospheric pattern. Moreover, different atmospheric variables are better reproduced by different parameterization schemes. In other words, there is no single set of parameterization schemes which is optimal for all variables simultaneously. Thus, for a given variable there exists a conditional optimality. To quantitatively estimate the optimality, variable-specific weighting coefficients have been arranged in the following manner: for each variable at every level, the best set of schemes has a zero weight, while the worst set gets the unit value. The other sets receive weights according to their relative location between the best and worst sets. These coefficients are shown in gain tables (Tables 2a-c). The choice of the optimal set depends on the variable and level, but also on the diagnostic. In general, the optimal set of parameterizations is 5653 (see Table 1 for a presentation of available schemes and their digital designation), which includes the mixed-phase microphysics by Reisner (Dudhia, 1993), cumulus convection by Kain and Fritsch (1993), the MRF scheme for the boundary layer by Hong and Pan (1996) and the CCM2 radiation scheme (Hack et al., 1993). Similarly satisfactory results, with a slightly smaller score, are also obtained from the 5654 experiment. This uses the RRTM (Rapid Radiative Transfer Model) radiation scheme (Mlawer et al., 1997), which is about twice as fast as the CCM2 scheme with a minor loss of accuracy. For relative humidity, the set 5243, based on the Anthes-Kuo cumulus convection scheme (Grell et al., 1994) and the Eta PBL scheme (Janjic, 1994), also ranks high.
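The weighting rule described above (zero for the best set, one for the worst, the others in between) can be read as a min-max scaling of an error score. The following Python sketch assumes a linear scaling of the score itself; a rank-based scaling would be another possible reading. The function name, the scores and the set label "5223" are hypothetical.

```python
def scheme_weights(errors):
    """Map each set's error score to [0, 1]: 0 for the best set, 1 for the worst."""
    best = min(errors.values())
    worst = max(errors.values())
    if worst == best:
        # All sets perform identically; no set can be preferred.
        return {name: 0.0 for name in errors}
    return {name: (err - best) / (worst - best) for name, err in errors.items()}


# Hypothetical absolute-deviation scores for one variable at one level.
scores = {"5653": 12.0, "5654": 13.0, "5243": 14.0, "5223": 20.0}
weights = scheme_weights(scores)
print(weights)
```

Because the scaling is done per variable and per level, weights from different variables become comparable even when the underlying errors have different units.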
In the next paragraphs, the simulation results are shown for the 5653 and 5243 experiments, both of which perform well with regard to relative humidity. The behavior of the systematic model errors is described in terms of the horizontal spatial distribution, vertical profiles and temporal variability.

Table 2a. Score of parameterization schemes for geopotential height (H), temperature (T) and relative humidity (RH) with the absolute deviation diagnostic. The parameterization set is denoted in the following manner: MCBR, where M is microphysics, C is cumulus, B is boundary layer and R is radiation, with the digital designation from Table 1.

Spatial distribution of the systematic relative humidity error
The spatial distribution of the systematic model humidity error over the Europe-Atlantic region for the set 5653 is shown in Fig. 1. Except over Greenland, where the orography and surface properties create exceptional conditions, the systematic error has a two-layer structure. In the lower troposphere, and mainly throughout the boundary layer, the model simulates higher humidity than the reanalysis over the whole region. Above 500 hPa the situation is the opposite, i.e., the model underestimates humidity relative to the reanalysis. Figure 2 shows the histograms for the areas covered by the average and absolute errors of relative humidity at the 850 and 300 hPa levels with different parameterization schemes.
In the upper atmosphere, the relative humidity model error is less sensitive to the parameterization set. The largest area is covered by an absolute error of about 25% and by an average error of -5÷15% for all parameterizations. Figure 2b also confirms that the model has a relative humidity bias at the 300 hPa level, which is much less pronounced in the lower troposphere. In the lower troposphere, the absolute error varies from 15 to 30%, depending on the scheme, while the average error ranges from -10 to +10%.
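An area histogram of this kind can be thought of as the fraction of grid cells whose error falls into each error bin. The Python sketch below assumes equal-area grid cells and a bin width of 5%; both choices, the function name and the sample errors are assumptions for illustration only.

```python
import math
from collections import Counter


def area_fraction_by_bin(errors, bin_width=5.0):
    """Fraction of grid cells per error bin, keyed by the bin's lower edge."""
    counts = Counter(math.floor(e / bin_width) * bin_width for e in errors)
    total = len(errors)
    return {edge: n / total for edge, n in sorted(counts.items())}


# Hypothetical average relative humidity errors (%) at eight grid cells.
rh_error = [-12.0, -8.0, -7.0, -3.0, 2.0, -11.0, -9.0, -6.0]
hist = area_fraction_by_bin(rh_error)
print(hist)
```

On a grid whose cells differ in area, each cell would have to be weighted by its area instead of being counted once.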
Within the intermediate layer between the 850 and 700 hPa surfaces, the error changes sign, forming a gradient from south to north. Such a distribution of the systematic model error merits discussion. One possible reason is [...] with advection by large-scale atmospheric flow can partially explain the so-called phase error (Bousquet et al., 2007) associated with misplacement of predicted precipitation patterns. Perhaps the approach proposed by Hoffman et al. (1995) and Brewster (2003), which specifies the forecast error for precipitation in terms of amplitude and displacement errors instead of the conventional control variables, can provide more useful results. Pierce et al. (2006) pointed out, however, that the majority of models have a pattern of drier than observed conditions by 10-25% below 800 hPa, but 25-100% too moist conditions between 300 and 600 hPa in the extra-tropics. John and Soden (2007) found similar results for specific humidity and discussed the origin of these biases in models compared to observations. They reported that satellite data may overestimate humidity in the lower troposphere due to digitizing vertical profiles and shifting observed values to levels below. It should be noted that we consider the systematic model error as the difference between the corresponding model and reanalysis fields. If the reference values, i.e., the reanalysis data, carry a bias that comes from the satellite data, then the result is expected to be biased in the opposite manner. Thus, the model may be correcting the distribution of humidity received from the initial and boundary conditions, i.e., from the reanalysis, in accordance with the physical laws on which the model is based.
[...] schemes underestimate humidity in the layer between 700 and 300 hPa against the reanalysis. The minor error at 200 hPa is easily explained by the small amount of water at this altitude and by the simpler microphysical processes in comparison with those developing within the PBL.
Below 700 hPa, the systematic humidity error depends on the scheme being used. The boundary layer scheme becomes crucial for the sign of the systematic error. For the simulations using the Eta PBL scheme (Janjic, 1994), the model underestimates relative humidity within the lower troposphere as well.

Vertical profiles of the systematic humidity error
In contrast, the use of the MRF PBL scheme (Hong and Pan, 1996) results in increased relative humidity against the reanalysis throughout the boundary layer. The set 5243 has systematic errors within the boundary layer which are close to zero. This set, as well as the optimal set (5653) according to the weighting coefficients of Table 1, is used for further investigation of the spatial structure of the systematic error. Hereafter, two parameterization sets, i.e., set 5653 and set 5243, are considered (Table 1). The first set was selected for its optimal performance in reproducing the large-scale flow as well as meso-scale precipitation structures. The second set is characterized by its ability to simulate an average relative humidity profile within the boundary layer similar to that of the reanalysis. Figure 4 shows the evolution with lead time of the precipitation summed over the whole domain for these two sets. The total amount of precipitation is reproduced in rather good agreement with the reanalysis in both cases (Fig. 4a). It is worth noting that various synoptic patterns with different rain rates developed during the integration period, and the model simulated the changes in precipitation intensity fairly well. However, the different precipitation types, i.e. convective and large-scale (or stable in MM5 terminology), are simulated with differing accuracy by the different parameterization schemes. For example, the combination of the Anthes-Kuo cumulus convection and the MRF PBL schemes significantly overestimates convective precipitation over the domain during the entire model run, by about a factor of 3 (Fig. 4b). Large-scale precipitation is underestimated by about the same amount, so that the total precipitation amount is similar to the reanalysis. Thus, the choice of schemes substantially changes how precipitation is partitioned, here shifting it from large-scale toward convective precipitation.
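The compensation described above, where an excess of convective precipitation offsets a deficit of large-scale precipitation, can be checked by comparing domain sums per precipitation type. The following Python sketch uses made-up numbers chosen only to reproduce that situation qualitatively; the function name and all values are hypothetical.

```python
def domain_ratio(model_series, reference_series):
    """Ratio of period-accumulated model precipitation to the reference."""
    reference_total = sum(reference_series)
    if reference_total == 0:
        raise ValueError("reference precipitation sums to zero")
    return sum(model_series) / reference_total


# Hypothetical domain-summed precipitation (arbitrary units) at four output times.
ref_convective = [1.0, 2.0, 1.0, 2.0]
ref_large_scale = [5.0, 6.0, 7.0, 6.0]
mod_convective = [3.0, 6.0, 3.0, 6.0]
mod_large_scale = [3.0, 2.0, 5.0, 2.0]

ref_total = [c + s for c, s in zip(ref_convective, ref_large_scale)]
mod_total = [c + s for c, s in zip(mod_convective, mod_large_scale)]

convective_ratio = domain_ratio(mod_convective, ref_convective)
large_scale_ratio = domain_ratio(mod_large_scale, ref_large_scale)
total_ratio = domain_ratio(mod_total, ref_total)
print(convective_ratio, large_scale_ratio, total_ratio)
```

A total ratio near one therefore says little about the individual precipitation types, which is why they are examined separately.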

Mean systematic precipitation error
The other set combines the Kain-Fritsch cumulus convection scheme and the Hong-Pan non-local advection scheme within the PBL. It also overestimates convective precipitation, but not as strongly as set 5243, while stable precipitation is similar to that in the reanalysis. Thus, in terms of the temporal evolution of the overall precipitation amount, set 5653 seems to be optimal.

Spatial structure of the precipitation error for a particular synoptic pattern
We now address the question of how precipitation is reproduced in a particular synoptic pattern. Numerical experiments have shown that the model error in the amount of precipitation over the domain keeps the same magnitude throughout the simulation period, although the spatial distribution of model errors varies depending on the particular atmospheric pattern. Below, we consider a typical winter season circulation over the North Atlantic and Europe that developed on 21 January 2002 (Fig. 5). Two extensive cyclones cover the region and stretch throughout the troposphere. One of the cyclones occupies the Newfoundland basin, which is known as a zone of extremely intense air-ocean interaction. In this region, cyclones travel from the cold North American land surface to the warm ocean over the Gulf Stream. They regenerate through energy input from latent heat fluxes at the surface and create storm track weather conditions over the North Atlantic (Rivals et al., 1998; Snyder and Joly, 1998). The other cyclone is a dipole over Northern Europe with a center located west of the British Isles. This cyclone produces both moderate stable precipitation over the Isles and Scandinavia and showers between Greenland and Iceland. Weather conditions over the south-west part of the North Atlantic and Southern Europe were influenced by a depression system bringing humid air with some rain. Figure 6 shows the spatial distributions of the different precipitation types, i.e. convective, stable and total, as they are presented in the reanalysis and in the model simulations with the parameterization sets 5653 and 5243. The differences between the model and reanalysis fields are shown in Fig. 7. In general, set 5653 reproduces the main regional patterns and rates of large-scale precipitation over Newfoundland and the Central and North Atlantic.
A few regions exist, however, where the systematic precipitation error is pronounced. Over the Mediterranean region, both large-scale and convective precipitation are overestimated. The systematic error of stable precipitation is large in magnitude and localized west of the Apennines and in spots over the Eastern Mediterranean. In contrast, the systematic error of convective precipitation is small in magnitude but spread out over the whole basin. The other region of pronounced systematic errors is Scandinavia. Over this area, the model captures the weak stable precipitation but considerably overestimates convective rain for the current synoptic pattern. Moderate large-scale precipitation is also overestimated in the form of a belt within the low pressure zone between Greenland and Iceland.
A complete analysis of the sources of systematic error requires a detailed decomposition of mass and heat fluxes, as well as monitoring of the temporal evolution of the main components. This is a rather ambitious task, which might become the goal of another investigation. Nevertheless, two likely sources of the above systematic errors can be noted. First, over a warm sea surface such as the Mediterranean, air-sea interactions, including water vapor fluxes, are described in the model with higher intensity, which leads to a larger accumulation of water mass in the atmosphere and an overestimation of precipitation. The other source of the systematic error relates to the different rates at which atmospheric processes evolve as described by different model modules. For example, in the model, atmospheric vapor is redistributed from relatively slow large-scale processes toward faster-developing convection within the warm sectors of cyclones, again over a warm ocean surface such as the North Atlantic Current. As a result, over the western part of the North Atlantic and Scandinavia the model underestimates stable precipitation in favor of convective precipitation.
Another interesting feature is present in the frontal zones of cyclones with moderate and intense rain rates over the Newfoundland basin and the North Atlantic. In these areas, similar spatial structures of the systematic error with opposite signs follow each other. This is a typical example of the phase error (Bousquet et al., 2007), where the spatial structure of precipitation patterns and rain rates is reproduced, but the patterns are shifted relative to the true location, normally upstream. One reason might be the discretization of continuous processes in the model. In particular, while temperature and humidity profiles are approximated at the model levels, cloud characteristics such as the condensation level are shifted downward onto model levels. This results in earlier water vapor condensation and, as a consequence, in premature precipitation. Coupling this process with advection by the atmospheric flow displaces the precipitation structures in the model upstream of their real positions. Thus, dipoles of the systematic precipitation error with opposite signs are formed, although the model seems to reproduce the physics itself correctly.
Although parameterization set 5243 has shown the lowest averaged systematic error for relative humidity in the PBL (Fig. 4), this combination provides less realistic precipitation at a particular moment, in both spatial distribution and rain rate (Figs. 6 and 7). Convective precipitation is considerably overestimated in the latitude belt of 30-50° N, and in particular over the whole Mediterranean region. Large-scale precipitation is affected by phase errors of large amplitude over the Newfoundland basin and in the central part of the North Atlantic, and is considerably overestimated over the Balkan Mountains. Thus, one may conclude that set 5653 is to be preferred.
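The dipole signature of a phase error, as discussed above, can be illustrated with a one-dimensional toy example: a correctly shaped rain band placed two cells upstream yields errors of opposite sign that cancel in the average difference but not in the absolute deviation. The Python sketch below also estimates the displacement by a brute-force search over shifts, in the spirit of separating amplitude and displacement errors. It is not the method of any of the cited works; the grid, the rain rates, the convention that increasing index means downstream and the function names are all assumptions.

```python
def shift(pattern, offset):
    """Shift a 1-D pattern by `offset` cells, padding with zeros."""
    n = len(pattern)
    return [pattern[i - offset] if 0 <= i - offset < n else 0.0 for i in range(n)]


def best_displacement(model, reference, max_shift):
    """Shift of the model (in cells) that minimizes the summed squared difference."""
    def cost(s):
        return sum((m - r) ** 2 for m, r in zip(shift(model, s), reference))
    return min(range(-max_shift, max_shift + 1), key=cost)


# Hypothetical rain band along the flow (mm/h); increasing index is downstream.
reference = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
model = shift(reference, -2)  # same band, placed 2 cells upstream

error = [m - r for m, r in zip(model, reference)]
mean_error = sum(error) / len(error)
mean_abs_error = sum(abs(e) for e in error) / len(error)
displacement = best_displacement(model, reference, max_shift=3)
print(error, mean_error, mean_abs_error, displacement)
```

Here the amplitude is reproduced exactly, so the whole error is a displacement error: moving the model band two cells downstream removes it.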

Conclusions
Water vapor is expected to play a key role in atmospheric dynamics. Even small absolute changes in the amount of water vapor can have strong effects on the radiative forcing in the free atmosphere and on the mass and heat fluxes within the boundary layer. Investigating the sensitivity of the model to errors in the initial conditions, including the water vapor content of the atmosphere, as well as the ability of the model to properly reproduce cloud and precipitation processes, is thus of high importance.
This study presents estimates of the systematic error for humidity and precipitation in the MM5 model. The evaluation is based on a comparison of model fields against the ERA40 reanalysis for the winter season of 2002 over the North Atlantic and Europe. Different available parameterization schemes for microphysics, cumulus convection, PBL and radiation are used in order to select the optimal set.
Results show that, in terms of weighting coefficients for the main prognostic variables, the optimal set includes the following parameterization schemes: mixed-phase microphysics by Reisner, cumulus convection by Kain and Fritsch, non-local advection in the PBL by Hong and Pan, and the CCM2 radiation scheme (Dudhia, 1993). This set also best simulates humidity fields and precipitation.
However, the following systematic errors are typical for these parameterization sets. First, the model redistributes water vapor from the middle and upper atmosphere downward to the boundary layer. This results in overestimating relative humidity within the PBL by about 5% and underestimating it by about 10% in the layer between 850 and 300 hPa. It is worth noting that these estimates have been obtained against the ERA40 reanalysis, which still suffers from an unbalanced global water budget (Hagemann et al., 2002), e.g. the dry bias in both winter and summer in the European-Atlantic region; the overestimation of precipitation over the tropical ocean; the long-term mean of P-E (precipitation-evaporation) over the ocean; and the dry bias over North America. This may be because the main purpose of the reanalysis is to obtain a realistic representation of the atmosphere, while processes near the surface or in the soil are only of secondary interest. However, these processes may seriously affect the atmospheric circulation through their impacts on surface fluxes. Thus, given that the model overestimates relative humidity in the lower atmosphere in the mid-latitudes compared to ERA40, one might suggest that the model attempts to restore the water vapor distribution in accordance with its physics.
Further, the total amount of precipitation and its temporal variation are in good agreement with the reanalysis. However, in the mid-latitude belt between 30° N and 50° N, the model overestimates the total precipitation amount. Moreover, this overestimation is mainly due to convective precipitation, while large-scale precipitation is slightly underestimated. In particular, such redistribution occurs over the warm ocean surface within stationary or slow depressions.
Another specific feature of these simulations is the phase error of precipitation associated with misplacement of predicted precipitation patterns, while the precipitation magnitude is reproduced more or less correctly.
Results showed that, for a proper simulation of precipitation and humidity fields, the choice of the cumulus and PBL parameterization schemes is important. In particular, the systematic model errors are lower in the runs using the Kain-Fritsch scheme for cumulus convection and the MRF scheme by Hong and Pan for the PBL. These schemes also better reproduce the other atmospheric variables and the dynamical features of the atmospheric flow. The latter becomes important for proper simulation of advection in medium and extended range runs.
Edited by: S. C. Michaelides
Reviewed by: two anonymous referees