IWV retrieval from ground GNSS receivers during NAWDEX

A ground-based network of more than 1200 Global Navigation Satellite System (GNSS) Continuously Operating Reference Stations (CORS) was analysed using GIPSY-OASIS II software package for the documentation of time and space variations of water vapor in atmosphere during the North Atlantic Waveguide and Downstream impact EXperiment (NAWDEX) during fall 2016. The network extends throughout the North Atlantic, from the Caribbeans to Morocco through Greenland. This paper presents the methodology used for GNSS data processing, screening, and conversion of Zenith Tropospheric Delay (ZTD) estimates to Integrated Water Vapor content (IWV) using surface parameters from reanalysis. The retrieved IWV are used to evaluate the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalyses ERAI and ERA5. ERA5 shows an overall improvement over ERAI in representing the spatial and temporal variability of IWV over the study area. The mean bias is decreased from 0.31± 0.63 to 0.19± 0.56 kg m−2 (mean ±1σ over all stations) and the standard deviation reduced from 2.17±0.67 to 1.64± 0.53 kg m−2 combined with a slight improvement in correlation coefficient from 0.95 to 0.97. At regional scale, both reanalyses show a general wet bias at mid and northern latitudes but a dry bias in the Caribbeans. We hypothesize this results from the different nature of data being assimilated over the tropical oceans. This GNSS IWV data set is intended to be used for a better description of the high impact weather events that occurred during the NAWDEX experiment.


Introduction
The North Atlantic Waveguide and Downstream impact EXperiment (NAWDEX) is an international field campaign that took place during fall 2016. The experiment aims at a better understanding of the effects of the diabatic process on the evolution of atmospherics disturbances along the North Atlantic Jet Stream and a better representation of these processes in Numerical Weather Prediction (NWP) models. In particular, it consisted in the realisation of new and various multi-scale observations in the North Atlantic Basin. The observational field campaign took place from 16 September to 16 October 2016 with the deployment of four research aircrafts and extensive ground-based instrumentation (Schäfler et al., 2018).
In addition to this special ground-based instrumentation, measurements from more than 1200 Global Navigation Satellite System (GNSS) Continuously Operating Reference Stations (CORS) located along the North Atlantic Arc were analysed using a single software package for an extended period, from 1 September to 5 November 2016. Indeed, with constantly improved accuracy in the estimation of tropospheric delay, GNSS has become a very valuable tool for monitoring the atmosphere, and especially atmospheric water vapor . The routine analysis of continuously operating GNSS networks worldwide provide continuous ZTD time series with high resolution (from a few hours down to a few minutes) (Byun and Bar-Sever, 2009). Thus, it is now quite common to use GNSS ZTD estimates retrieved from global or regional networks for climatological and meteorological studies (Poli et al., 2007;Wang and Zhang, 2008;Bock et al., 2008;Guerova et al., 2016;Parracho et al., 2018). The goal of this paper is to describe the retrieval of ZTD and IWV estimates from the GNSS measurements and to assess the quality of IWV estimates from the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalyses ERAI (ECMWF Reanalysis -Interim, Dee et al., 2011) andERA5 (ECMWF Reanalysis 5, Hersbach et al., 2020) in the NAWDEX study area. In Sect. 2, the GNSS CORS network, data processing and post-processing (screening and ZTD to IWV conversion) are described. In Sect. 3, the GNSS IWV data are compared to the two reanalyses and the results are discussed in different geographical regions. Section 4 gives a summary and concludes.
2 GNSS data analysis 2.1 GNSS networks and data processing GNSS measurements for 19 CORS networks, including 1221 stations, were reprocessed for the NAWDEX project ( Fig. 1). For more details about the networks used, please refer to Appendix A.
Global Positioning System (GPS) measurements were processed using the GIPSY/OASIS II v6.4 software in PPP mode (Zumberge et al., 1997). The fiducial-free final satellite obit and clock products 3.0 from Jet Propulsion Laboratory (JPL) were used. The data were analysed in a 30 h window centred on each day from which the 00-24 h parameters were extracted to avoid edge effects. Second order ionospheric correction was used. Phase ambiguities were fixed using the wide-lane phase biases computed by JPL as part of the analysis (Bertiger et al., 2010). The cut-off angle was fixed at 7 • without down-weighting low-elevation observations. We applied IERS conventions for solid Earth tides (Petit and Luzum, 2010) and the Finite Element Solution tide model FES2004 (Lyard et al., 2006) for ocean tide loading effects using the coefficients computed by the ocean tide loading provider (Machiel Simon Bos and Hans-Georg Scherneck, http://holt.oso.chalmers.se/loading/, last access: 20 June 2020). We also used absolute antenna models for satellites and ground receivers (from https://files. igs.org/pub/station/general/igs_01.atx, last access: 29 January 2021).
Tropospheric delays were modelled by time-varying Zenith Hydrostatic Delays (ZHD), Zenith Wet Delays (ZWD) and horizontal gradients along with their respective Vienna Mapping Function 1 (VMF1) mapping functions (Boehm et al., 2006). The a priori values for ZHD and ZWD and the values for the mapping functions were computed from 6-hourly ECMWF operational analyses by Technische Universität Wien (TU-Wien, https://vmf.geo.tuwien.ac.at/, last access: 20 June 2020). Correction to a priori ZWD and horizontal gradients were modelled as random walk processes with a 300 s time resolution and were estimated during the data processing. The random walk parameters were fixed at 5 and 0.5 mm h −1/2 for ZWD and gradients, respectively. Zenith Tropospheric Delays (ZTD) were then calculated by the summation of a priori ZHD, a priori ZWD, and the estimated ZWD correction.

Screening and quality assessment
A quality assessment of the ZTD estimates is mandatory for scientific use, and also to obtain reference data sets. This is particularly crucial when considering large data sets as in this study. Screening consists in the removal of spurious data or outliers that would induce erroneous interpretation and/or degrade comparisons. A GNSS-based screening method (i.e. independent from external data) was used previously with GPS  and DORIS ZTD data (Bock et al., 2014). The method was applied using both ZTD estimates and their formal errors, ans included two major steps: the range check, which tests the values against maximum range of the physical variable, and the outlier check that is statistical in nature and will detect values based on station statistics. In both studies, screening was shown to improve the agreement of GNSS ZTD or IWV data with independent data.
Inspired by previous work and completed by an analysis of the ZTD and its formal error distribution, the following procedure was applied for the screening of the ZTD estimated for the entire network, using both ZTD estimated values and the associated formal errors (σ ZTD ): Overall, 0.31 % of the ZTD estimates data were rejected; screening checks based on ZTD estimates (both range and outlier checks) were ineffective (no input data were effectively rejected). This was due not only to the high quality of the data sets but also to the processing strategy (ZTD estimation as a random walk process) that prevented highly spurious ZTD values by constraining time variations. 0.13 % of input data were rejected after the range check on formal error; Outlier checks on formal error were more active in terms of rejection: they eliminated around 0.16 % of the input data. The last step, the data check, rejected around 0.02 % of input data.
To assess the impact of screening, we also investigated daily processing output parameters (ambiguity resolution rate, height estimates) which were resampled at ZTD time resolution (5 min) and selected according to ZTD screening criteria. Such selection induces a re-weighting of output parameters according to the count of valid estimates per session. Figure 2 represents changes in resampled output parameters (mean ambiguity resolution rate and height repeatability) and ZTD formal error after screening according to the station rejection rate. For most stations, the rejection rate was low and no significant changes (> 1 %) were observed. Significant changes (> 1 %) occurred for a high rejection rate (> 5 %) and were always positive whatever the parameter (increase in ambiguity resolution rate and decrease in height repeatability). Of course, the reduction in mean ZTD formal error was to be expected (decrease greater than 1 % for 73 stations), but the ambiguity resolution rate was also improved, as 10 stations presented an increase in average ambiguity resolution rate greater than 1 %. An improvement in height repeatability was also observed since this value was decreased by more than 1 % for 42 stations. For these 3 parameters, there was no degradation higher than 1 %.

GNSS ZTD to IWV conversion
The ZTD can be divided into 2 components as follows: (1) ZHD was computed from surface pressure from ERAI or ERA5 grids using the Saastamoinen formula (Saastamoinen, 1972). For ERAI, grids were given for a horizontal resolution of 0.75 • every 6 h. For ERA5, grids were given for a horizontal resolution of 0.25 • every hour. ZHD were then extrapolated to GNSS antenna height using a formulation inspired by Steigenberger et al. (2009) and Boehm and Schuh (2013) which was shown to be sufficient for small height differences: where h G and h E are respectively the geoid height of the GNSS antenna and the ERAI or ERA5 grid, T (h E ), P (h E ) and g(h E ) = 9.8062 m s −2 are respectively the temperature, pressure and mean gravity between the GNSS antenna and the ERAI or ERA5 surface; P and T could be approximated using a standard model as GPT (Boehm et al., 2007); k 1 is a refractivity constant (Thayer, 1974) and g atm = 9.7840 m s −2 is the approximated gravity of the centre of mass of the atmosphere (Boehm and Schuh, 2013). ZHD are then interpolated at each GNSS epoch; From GNSS ZTD and ERAI or ERA5 ZHD we can deduce ZWD that is related to IWV following: ( 3) κ is a semi-empirical function given by Bevis et al. (1992). It depends on the integrated mean temperature, T m , which is interpolated using values provided by the TU-Wien database (https://vmf.geo.tuwien.ac.at/, last access: 20 June 2020) that consists in 6-hourly grids derived from ECMWF operational analysis and provided on global grid (2.5 • × 2.0 • ). In order to limit the systematic errors due to the extrapolation with Eq. (2), GNSS stations with a geoid height differing by more than 1000 m from the model orography were not subsequently considered.

IWV from ERA reanalysis
We used IWV grid from ERAI and ERA5. The IWV were first extrapolated from the ERAI or ERA5 orography to the height of the GNSS antenna by using the empirical formulation used by Parracho et al. (2018): where h E and h G are respectively the geoid height of the ERAI or ERA5 grid and the GNSS antenna; IWV(h E ) is the IWV values from the ERAI or ERA5 grid; IWV(h G ) is the extrapolated values at GNSS antenna height. Once again, the selection of only the GNSS stations that had a geoid height differing by less than 1000 m from the model orography enabled a reduction in extrapolation errors. Finally, the IWV were bilinearly interpolated to the horizontal position of the antenna to obtain the ERAI or ERA5 time series at a time resolution of 6 and 1 h respectively for each GNSS station.

Results
In this section, we compare the GNSS IWV derived using ERAI and ERA5 surface pressure fields and the IWV extracted from ERAI and ERA5 IWV fields respectively. It is necessary for the dates of the compared data to match exactly, i.e. GNSS data are resampled at 6 h for comparisons with ERAI and at 1 h for comparisons with ERA5 (the results for 6-hourly ERA5 data are almost exactly the same, see Appendix B). Only stations with enough comparison points for a duration of at least 60 h are considered here. Finally, stations for which more than 50 % of the ZTD estimates had been rejected by screening were also removed from the data set. That decreases the number of stations from 1221 to 1203. Figure 3 gives an overview of the observed differences between the IWV from the ERAI and ERA5 reanalyses and GNSS for the whole network. Both reanalyses show an overall small wet bias compared to GNSS, but the bias is slightly smaller for ERA5 compared to ERAI: 0.31 ± 0.63 kg m −2 for ERAI and 0.19 ± 0.56 kg m −2 for ERA5. A significant overall reduction in the standard deviation of differences is also observed, from 2.17 ± 0.67 kg m −2 for ERAI to 1.64 ± 0.53 kg m −2 for ERA5, which highlights a better representation of the IWV time variations with ERA5 than with ERAI. This reduction in standard deviation also comes with an increase in the correlation coefficient from 0.95 to 0.97. For all three statistics, a tightening of the histograms is clearly observed, as well as an increase in the number of classes corresponding to lower values for bias and standard deviation and higher values for the correlation coefficient with ERA5. The overall better agreement between ERA5 and GNSS can be thought as being due to smaller representativeness errors (errors due to some small-scale variability that is not resolved by the reanalysis but captured by GNSS observations) with ERA5 than with ERAI, thanks to its higher spatial resolution (31 km compared to 80 km), and the impact of more water  Table 1. Statistics of IWV differences between ERAI, ERA5, and GNSS over all stations and over four domains: Caribbean (1), Northern America East Coast (2), Arctic Circle (3) and Western Europe (4). N sta denotes the number of stations in the area; mean IWV (kg m −2 ) is the mean IWV from all GNSS stations on the area; "mean" is the mean difference (kg m −2 , ERAI, ERA5 − GNSS), σ the standard deviation of difference (kg m −2 ), and ρ the mean correlation coefficient. The values indicate the mean ± 1 SD (standard deviation) over the stations in the domain. Note that some stations are outside of the four main domains. vapour observations (especially from satellites) in the more recent reanalysis (Hersbach et al., 2020). The geographical distribution of the three statistics (bias, standard deviation of differences, and correlation coefficient) for both reanalysis is presented in Figs. 4 and 5. It is obvious from these figures that the statistics are strongly region-dependent. Both reanalyses show contrasted agreement with GNSS depending on the region with some differences between the reanalyses. In order to simplify the discussion, the results are analysed in four domains: (1) the Caribbean sea, (2) the Northern America East Coast, (3) Greenland and the broader area close to the Arctic Circle, and (4) Western Eu- rope. The results in all four regions are more spotted (larger dispersion between sites) with ERAI than with ERA5. This feature does again highlight the larger representativeness errors with ERAI. The biases show a quite large scatter from station to station, but more positive biases are observed over Europe (wet bias in the reanalyses) and more negative biases in the Caribbean. The biases are further discussed below. The standard deviation and correlation coefficient are more uniform within each region and highlight: (i) smaller standard deviation in the Arctic where the mean IWV content is lower, Figure 5. Same as Fig. 4 for comparisons between ERA5 and GNSS IWV.
(ii) higher correlation in the US where more observations are assimilated, and (iii) higher standard deviation and smaller correlation in the Caribbean where the mean IWV content and the amplitude of temporal variability is higher. Table 1 reports the average results along with the dispersion (one standard deviation) in the 4 geographical areas. The general comments on standard deviation and correlation are easily checked based on these numbers. It is noticeable that the results are systematically improved with ERA5 in all four regions. The mean IWV numbers also highlight the huge difference in total column water vapour content between the tropics (48.7 kg m −2 ) and the Arctic region (8.2 kg m −2 ). One interesting feature here is also the high variability along the Northern America East Coast (13.5 kg m −2 ) that reflects the alternation of strong cyclonic activity and quieter weather conditions in this region during this period of the year. Cyclonic activity in the storm track region also impacts the Western Europe area as reflected from the relatively high space-time variability (7.3 kg m −2 to be compared to the moderate mean of 20 kg m −2 ).
Coming back to the biases, we emphasized that the mean bias compared to GNSS is slightly reduced in ERA5. However, this result is not uniform throughout the large study area, nor is the sign of the bias between regions. The most striking feature is the negative (dry) bias in the Caribbean in both reanalyses. This is in contrast with the positive (wet) bias in the other regions, in both reanalyses. Such a spatial variation in the mean IWV differences was already highlighted by Parracho et al. (2018) and Bock and Parracho (2019), as a strong systematic feature of ERAI (based on 16 years of data). Both studies showed that ERAI has an overall dry bias in the inter-tropical band and an opposite wet bias at higher latitudes. The reason hypothesized in these studies is the difference in the nature of data assimilated. In the tropics, most of the humidity observations come from satellite observations over the oceans, while over the more continental higher latitudes (especially northern latitudes), the many radiosonde observations have still a strong impact in the data assimilation process and the water vapour observations from satellites are limited over land. These characteristics are again found in ERA5 probably for the same reasons. The second interesting result is the consistency of the systematic difference between ERAI and ERA5 in three of the regions where the bias difference is about 0.2-0.3 kg m −2 , with ERA5 drier than ERAI. Only for the Northern America East Coast is ERA5 moister than ERAI. This result is not explained. Finally, the larger absolute value of the dry bias in ERA5 in the Caribbean compared to ERAI may simply be due to the combination of the overall 0.2-0.3 mean difference between the two reanalyses and the dry bias in the tropics discussed above.

Conclusions
In the framework of the NAWDEX project, GNSS data from more than 1200 CORS located along the North Atlantic arc have been analysed over a one month period during fall 2016. An outlier screening method was applied to ZTD data and rejected about 0.3 % of the data. The method is based on distribution of ZTD and its formal error. It also improved other processing outputs such as the mean ambiguity resolution rate and the height repeatability. GNSS IWV estimates were computed using surface fields from ECMWF reanalyses, ERAI and ERA5, and then compared to the IWV fields from these reanalyses. The change from ERAI to ERA5 goes together with a significant reduction in standard deviation (25 % decrease) and a slight improvement in correlation with GNSS which are partly explained by reduced representativeness errors in ERA5 as well as the impact of more recent model physics and enhanced data assimilation. Improvements are noticeable in the entire area considered in this study but the results show some regional variability due to the contrasted climates between the Tropics and the Arctic. The study highlights a slight reduction in the overall wet bias in ERA5 compared to ERAI in all regions except along Northern America East Coast. This result is unexplained and will be investigated in more detail in the future. Consistent with previous studies based on ERAI, a small dry bias was found in both reanalyses in the Tropics. However, the absolute value of this bias is increased in ERA5 compared to ERAI. We hypothesize this is a result of the overall wet bias reduction observed in ERA5 and the dry bias in the Tropics. In conclusion, general good agreement is found between both reanalyses and GNSS, with a small superiority in the quality of the IWV represented in the ERA5 reanalysis. The higher temporal resolution of ERA5 (1-hourly) combined with the GNSS IWV time series makes both data sets especially useful to study high impact weather events. The GNSS IWV data set is available from the NAWDEX database. The providers listed in Table A1 are gratefully acknowledged for giving access to the GNSS data used in this study. Appendix B: Impact of ERA5 time resolution on differences with GNSS A 6 h resampling of ERA5 grid is evaluated with respect to GNSS IWV in order to assess the impact of the time resolution of the reanalysis. Results are presented in Table B1. No significant change was observed for bias, standard deviation and correlation coefficient.  Table A1. The IWV products described in this study can be provided for free on demand.
Author contributions. PB collected and processed the GNSS data and computed all the results. PB and OB analysed the results and co-wrote the article.
Competing interests. The authors declare that they have no conflict of interests.
Special issue statement. This article is part of the special issue "European Geosciences Union General Assembly 2020, EGU Geodesy Division". It is a result of the EGU General Assembly 2020, 4-8 May 2020.