Trend analysis for daily rainfall series of Barcelona

(1) Universitat Politècnica de Catalunya, Departament de Matemàtica Aplicada III. Barcelona, Spain (ma.isabel.ortego@upc.edu ; jose.gibergans@upc.edu; juan.jose.egozcue@upc.edu), (2) Universitat Politècnica de Catalunya. Laboratori d’Enginyeria Marítima. Barcelona, Spain (raimon.tolosana@upc.edu), (3) Universitat de Barcelona. Departament d’Astronomia i Meteorologia. Barcelona, Spain (carmell@am.ub.es)


Introduction
Frequency analysis of hydrological series is a key point to acquire an in-depth understanding of the behavior of hydrologic events.The occurrence of extreme hydrologic events in an area may imply great social and economical impacts.Therefore, modelling extreme values is of great interest in all environmental studies and civil engineering.
This work adopts a Bayesian version of the commonly used point over threshold (POT) or exceedance methods.The occurrence of events in time are assumed to be Poisson distributed and the excesses over a selected high threshold are assumed to follow a Generalized Pareto Distribution (GPD).
During the last decades, a common concern has been to know whether magnitudes of hazardous events have changed.This issue has been assessed from different points of view, both Bayesian and frequentist.Ideas of extreme value theory have been applied to enviromental studies, e.g.
Correspondence to: M. I. Ortego (ma.isabel.ortego@upc.edu)Smith (1989), where the possibility of a trend was considered.Within the same study, the possibility of an abrupt change due to an improvement in measurement technology is discussed (Raftery, 1989).Other climatic variables such as temperatures (Parey et al., 2007) or rainfall have been also assessed for trends, through nonparametric tests (Pujol et al., 2007) or linear regression type tests (Bodini and Cossu, 2010).However, some authors (Katz et al., 2002) consider that the evidence of trends based on least squares regression does not necessarily reflect properties of extremes like heavy tails.A Bayesian approach may be suitable (Katz et al., 2002;Coles et al., 2003).
The first part of the work focuses on the possible existence of abrupt changes in the parameters of the GPD.These abrupt changes may be due to changes in the location of the observatories and/or technological advances introduced in the measuring instruments.A new parameterization of the GPD distribution is suggested.Using this reparameterization, the second part of the work examines the possible existence of trends.
The paper is organized as follows: the Poisson-GPD occurrence model and the Bayesian estimation of parameters are described in Sect. 2. This methodology is applied to the Barcelona rainfall series and results are shown in Sect.3.

Methodology
A standard hazard analysis approach, Point-Over-Threshold (POT), has been used to model events occurring in time and their magnitude.Rainfall events are modelled as points in time.For each event, a magnitude X (natural logarithm of daily precipitation) is taken as a random variable.The assumptions are: (a) magnitudes are independent from the point process itself; (b) they are independent from event to Published by Copernicus Publications on behalf of the European Geosciences Union.event; (c) all magnitudes have the same distribution F X ; and (d) the occurrence of the events in time follows a Poisson process.Thus, the number of events, N(u), occurring in a given arbitrary time interval, t, is assumed to be Poisson distributed, N (u) ∼ Poisson(λ), There are difficulties in identifying a model for F X , but the Generalized Pareto Distribution (GPD) provides a suitable model for excesses of X over a high enough threshold x 0 , Y = X − x 0 given X > x 0 , (Pickands, 1975).This GPD distribution is parametrized by a shape parameter, ξ , and a scale parameter β: The parameter ξ is non-dimensional, and β has units that are proportional to the units of the excesses (Y ).The value of ξ defines three so-called domains of attraction (DA) (Davison and Smith, 1990;Castillo et al., 2004).The Weibull DA corresponds to GPD's such that ξ < 0 and y sup = −β/ξ , i.e. the support of Y , and hence of X, is limited with a finite upper tail.The Fréchet DA is characterized by ξ > 0 and y sup = +∞, and the support of Y becomes infinite.Finally, the Gumbel DA corresponds to GPD's such that ξ = 0 and y sup = +∞, the GPD takes the limit form which is an exponential distribution.Densities of GPD distributions of different domains of attraction show distinct features (Tolosana-Delgado et al., 2010).The first step in the estimation of the GPD distribution for the excesses, Y , is the selection of an appropriate reference threshold x 0 .A graphical technique (Davison and Smith, 1990;Embrechts et al., 1997;Dupuis, 1998) can be applied taking into account that, for GPD distributed excesses and for x 1 ≥ x 0 , the mean excess is linear with respect to x 1 , (ξ < 1), Once a suitable threshold x 0 has been selected, and considering that this work examines the possible existence of trends, the parameters ξ and β of the GPD are considered as a function of time.In order to parsimoniously deal with this climate variation a new parameterization of the GPD distribution is suggested.The classical scale and shape parameters of the GPD (ξ,β) are reformulated as a location parameter µ, linked to the upper limit of the distribution, and a shape parameter ν, In order to take account for uncertainty of the estimation of these parameters, a Bayesian approach for the joint estimation of µ and ν is chosen.One of the basis of Bayesian estimation of parameters is to assume that they are random variables and, then, they are described by their joint probability distribution (Gelman et al., 1995), before and after the data acquisition.The prior density f λ,µ,ν (λ,µ,ν) represents our knowledge about the parameters before the data sample.As independence of the magnitudes from the point process has been assumed, the prior density can be written as: Using Bayes' theorem, the posterior distribution of the parameters is obtained: where L(λ,µ,ν|y 1 ,...,y n ) is the likelihood function of the excesses and y 1 ,...,y n is a sample of excesses over threshold x 0 .
Therefore, the joint posterior density f λ,µ,ν (λ,µ,ν|y 1 ,...,y n ) accounts for the uncertainty of the parameters.Not only information about the parameters can be obtained from this distribution, but it is also the basis to obtain the distribution of quantities of interest such as hazard parameters, usually through simulated samples of λ, µ and ν.Monte Carlo methods can be used in the cases where this joint posterior probability distribution is not easily tractable (Robert and Casella, 2000).As attention is set to the existence of changes in the GPD parameters, this work approaches only the characterization of the GPD distribution, leaving apart the estimation of the Poisson occurrence parameter.Taking into account the independence between parameters due to model assumptions, only the marginal joint posterior density f µ,ν (µ,ν|y 1 ,...,y n ) is considered: where L(µ,ν|y 1 ,...,y n ) is the likelihood function of the excesses However, once the estimation of the Poisson and the GPD parameters has been performed, hazard parameters can be computed easily using marked Poisson processes properties (Grandell, 1997).

Models
A common concern is to know whether magnitudes of hazardous events have changed in the last decades.Long data series are very appreciated in order to properly study these issues.The series of daily rainfall in Barcelona (1854Barcelona ( -2006) ) seems suitable to address these issues, due to its length.As previously stated, rainfall is taken in a log-scale as a relative scale appears to be natural.For strictly positive data, not only for precipitation, taking log's might be not unimportant.When using a natural scale, some modelling requirements, such as goodness-of-fit or compatibility with physical assumptions, appear to be fulfilled in an easier way than when using the usual raw scale in real space (Egozcue et al., 2006;Pawlowsky-Glahn et al., 2005).On the other hand, excesses over a threshold are modelled by a GPD with a limited maximum value (y sup = −β/ξ finite) in order to be consistent with physical limitations.First, the possible existence of abrupt changes in the parameters of the GPD is treated.The parameters of the model are considered as a function of time.Using the suggested new parameterization of the GPD distribution (µ = log(−β/ξ ) and ν = log(−ξ • β)), the location parameter µ is fixed, due to the limited character of the magnitude, and the possible abrupt change of the parameters of the GPD is then translated into a possible abrupt change in the ν parameter.Therefore, the breakpoint t 0 is a point in time such that where L is a fixed length.
The possible existence of trends in the GPD parameters is also treated.The parameters of the model are also considered as a function of time.Using the suggested new parameterization of the GPD distribution the parsimonious choice is to consider shape as a linear function of time, ν(t) = ν 0 + t ν while keeping location fixed µ(t) = µ 0 .Then, the climate change is assessed by checking the hypothesis ν = 0.

Barcelona daily rainfall series
The series of daily rainfall in Barcelona is one of the longer continuous european instrumental daily rainfall series available.This data base contains records of daily precipitation along more than 150 years (1854-2006) (Fig. 1), although a longer data series of monthly precipitation  sharing the same name is also available, (Barrera-Escoda, 2008).
The POT method has been used, and a GPD distribution has been fit to excesses over threshold x 0 .In our case, applying the mean excess graphical technique (Fig. 2), it seems that a good fit for a reference threshold is x 0 = 3.9, which corresponds to a raw daily rainfall of 50 mm approximately.
During the studied period of time (more than 150 years) the location of rainfall observatories where data were collected has changed as shown in Fig. 3.Most of these sites lied within the Barcelona built-up area, but the existence of parts of the series recorded at the El Prat Airport, ca. 15 km.from Barcelona city center, points to a possible change in the parameters of the selected distribution of daily rainfall.As technological advances have been introduced in the measuring instruments during this period as well, the first part of the work focuses on the possible existence of abrupt changes in the parameters of the GPD (µ,ν) due to these changes of location and/or instruments.Therefore, as previously stated, three parameters need to be estimated: ν 0 ,t 0 and ν.
According to the Bayesian paradigm, previous knowledge about the parameters has been included in its prior density.Prior density for ν 0 ,t 0 and ν has been assumed uniform for a wide range of values.As log-daily-rainfall is considered a bounded magnitude, the domain for the ξ parameter has been completely placed in the Weibull DA, and this character has been transmitted to the prior assumptions for the ν and µ parameters.Through the likelihood of the log-dailyrainfall excesses over x 0 = 3.9 this prior assumptions have been updated, obtaining the posterior density of the parameters.Once the posterior density is obtained, a simulated sample of 10 000 values (ν 0 ,t 0 , ν) is generated using the Gibbs sampler (Robert and Casella, 2000).
There is no strong evidence that the migration of the rain gauge positions through the city and the airport imply important shape differences (Fig. 4).Only mild "significance" can be attached to the changes from locations 6 to 7 (∼ 0.12), the return from 7 to 8 (∼ 0.1) and from the airport (10) back to the city (11) (∼ 0.07).Other suspicious transitions can be rejected, with Bayesian p-values as large as 0.2.
Next, the possible existence of trends is examined.In fact, as previously presented, the interest is set in the possible slope of the spread parameter ν.As in the first model, a uniform prior density of the parameters for a wide range of values has been considered.Posterior joint density of GPD parameters in the new and classical parameterizations are shown in Fig. 5 and Fig. 6.In order to assess the existence of a trend, the marginal posterior density of the slope in the spread parameter, ν has been obtained, Fig. 7.This density shows that no clear slope is significantly different from zero, and therefore a linear trend for the spread parameter is not significant.Furthermore, if there were a trend, it would have positive slope.

Conclusions
A Point-Over-Threshold approach with a new parameterization of the GPD distribution has been used as a model for daily rainfall.The scale and physical features of the phenomenon have been considered, and therefore, daily rainfall has been treated in a log-scale and considered as a bounded magnitude.A long instrumental data series, the daily rainfall Barcelona series, has been considered and a Bayesian approach has been used to assess the existence of abrupt changes in the parameters of the reparameterized GPD distribution and/or trends in these parameters.No strong evidence of a linear trend or step change in the Barcelona rainfall series has been found.Although there are no strong evidences of heterogeneity, attention should be paid to the changes of location with higher frequency of possible breakpoint, i.e. changes from locations 6 to 7, 7 to 8 and 10 to 11 (Fig. 4).Under the parsimonious hypothesis of no change of the upper physical limit of rainfall, a climate change hypothesis is not significant.

Fig. 4 .Fig. 5 .Fig. 6 .
Fig. 4. Upper plot: histogram of posterior simulations of t 0 .Middle plot: positions of the known rain gauge location changes.Lower plot: (color levels) kernel density estimate of the joint posterior distribution of (t 0 , ν); (black lines) quantiles of the distribution of ν conditional on t 0 .