Comparison results for the CFSv2 hindcasts and statistical downscaling over the Northeast of Brazil

An Artificial Neural Network (ANN) approach was used to reproduce the precipitation anomalies for the rainy seasons over the southern and northern parts of the Northeast of Brazil (NEB) during the 1982–2009 period. The seasonal hindcasts of precipitation anomalies from the Climate Forecast System v2 (CFSv2) model and the observed dominant modes of anomalous Sea Surface Temperature (SST) over the South and North Atlantic Ocean were used separately as explanatory variables. The reduction of dispersion between the explanatory and dependent variables after fitting the networks suggests that the ANN is an important complementary technique for climate studies over the NEB. However, a larger dataset is required for the models to capture the non-linear processes in more detail. The practical implication of these results is that the ANNs constructed here could be applied in further analyses, for example, to explore the ANN's ability to improve seasonal climate forecasts, considering that numerical and statistical methods must be complementary tools.


Introduction
The Northeast of Brazil (NEB) occupies a large portion of the tropics on the South American continent, between 1°02′30″ S–18°20′07″ S and 34°47′30″ W–48°45′24″ W, and is bounded to the east and north by the Atlantic Ocean. Although General Circulation Models (GCMs) have a high ability to reproduce the precipitation variability over this region, there are serious difficulties in modeling the monthly or even the seasonal precipitation regimes. Part of the problem is due to the coarse spatial resolution, which smooths the orography. However, GCMs are usually effective in reproducing the main modes of Sea Surface Temperature (SST) variability over the tropical and subtropical oceanic basins.
An alternative way to minimize the deficiencies of the GCMs is to use Artificial Neural Networks (ANNs) (Gardner and Dorling, 1998). The technique establishes statistical links between the observed large-scale circulation and the precipitation or temperature fields by applying transfer functions to the GCM outputs. In this context, the present study focuses on two issues:
- to evaluate the ability of statistical downscaling based on ANNs, applied to Climate Forecast System v2 (CFSv2) hindcasts, to reproduce more efficiently the observed precipitation for the rainy seasons over the northern and southern parts of the NEB during the 1982–2009 period;
- for the same period, to verify how well the trends in the seasonal observed precipitation anomalies can be reproduced by the ANNs using as explanatory variables only the observed dominant modes of anomalous SST variability in the South and North Atlantic Ocean.
The motivation is to use the ANN as a complementary technique to improve the GCM hindcasts over the NEB, considering that most studies have focused on the Southeast, South and Amazon regions of Brazil. We emphasize that no time lag is applied between the explanatory and dependent climate variables, because the focus is on the local, simultaneous contribution, excluding the persistence of the explanatory variables. Similar analyses are found in Cardoso and da Silva Dias (2004) and Mendes and Marengo (2010). In the following section, the data and methodology are described, with a brief outline of the ANNs, the CFSv2 model and the Empirical Orthogonal Function (EOF) analysis. The results are presented in Sect. 3 and discussed in Sect. 4.

Published by
Copernicus Publications on behalf of the European Geosciences Union.
Two grid boxes were selected over the NEB region: one to the north (2° S–7° S; 40° W–45° W) and one to the south (10.5° S–15.5° S; 42° W–47° W); their locations are similar to those used by Nobre and Shukla (1996). The analyzed period, 1982–2009, corresponds to the longest period of CFSv2 hindcast data. Mendes and Marengo (2010) mentioned that a large amount of data, e.g., 40–50 yr or more, is required to use ANNs. Nevertheless, we did not use a longer period, in order to verify whether statistical relationships could be established for the recent climate. The observed datasets were the Climate Prediction Center Merged Analysis of Precipitation (CMAP; Xie and Arkin, 1997) and the SST OIv2 from Reynolds et al. (2002), both at 1° resolution, and the Sea Level Pressure (SLP) from the NCEP/NCAR Reanalysis 1 project (Kalnay et al., 1996), available on a 2.5° grid and interpolated to a 1° latitude–longitude grid.
The annual cycles of observed and simulated monthly precipitation averaged over each of the two grid boxes were compared. The rainy season of each grid box was then selected, and the corresponding precipitation hindcasts were downscaled using nonlinear transfer functions. This provides diagnostic information on the ability of the ANN to improve the seasonal hindcasts.
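A minimal sketch of how a gridded field can be reduced to one value per time step over each grid box, assuming a regular latitude–longitude grid, longitudes west of Greenwich stored as negative numbers, and cosine-of-latitude area weighting. The function name `box_mean` is hypothetical, and the text does not state which averaging the authors applied.

```python
import numpy as np

def box_mean(field, lats, lons, lat_range, lon_range):
    """Cosine-latitude weighted mean of field[..., lat, lon] over a lat/lon box.

    lats, lons: 1-D coordinates in degrees (longitudes west of Greenwich negative).
    lat_range, lon_range: inclusive (min, max) bounds of the box.
    """
    lat_mask = (lats >= lat_range[0]) & (lats <= lat_range[1])
    lon_mask = (lons >= lon_range[0]) & (lons <= lon_range[1])
    sub = field[..., lat_mask, :][..., lon_mask]          # (..., n_lat_box, n_lon_box)
    # Grid cells shrink toward the poles, so each row is weighted by cos(latitude)
    w = np.cos(np.deg2rad(lats[lat_mask]))[:, None] * np.ones(int(lon_mask.sum()))
    return (sub * w).sum(axis=(-2, -1)) / w.sum()

# Hypothetical usage with the box limits given in the text:
# north = box_mean(precip, lats, lons, (-7, -2), (-45, -40))
# south = box_mean(precip, lats, lons, (-15.5, -10.5), (-47, -42))
```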
The EOF time series of observed SST anomalies in the South and North Atlantic Ocean during the rainy seasons of both grid boxes were used as explanatory variables in the network schemes. This made it possible to investigate statistical links between simultaneous factors, namely the large-scale circulation and the observed precipitation pattern during the trimesters analyzed.

The Artificial Neural Network
The ANN is inspired by the human brain, which learns through training (Haykin, 2008). A sequence of patterns associated with a sequence of responses is presented to the network. Based on a set of explanatory variables (inputs), the network is trained to fit weights to those variables that may contribute to the variability of the explained variables (targets). The explanatory variables should be independent to avoid the problem of multicollinearity.
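The independence requirement can be checked numerically. A minimal sketch, assuming the explanatory variables are stored as columns of a NumPy array, computes the variance inflation factor VIF_j = 1/(1 − R_j²), where R_j² comes from regressing column j on all the other columns; values near 1 indicate little multicollinearity. The helper name `vif` is hypothetical. EOF time series are mutually uncorrelated by construction, so they should give values close to 1.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples, n_variables)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)   # inf (with a warning) for perfect collinearity
    return out
```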
The network maps the explanatory dataset onto a second set of output variables, which is compared with the desired target, and corrections are made until the model reaches the lowest possible error. The ANN is a learning technique in which the processing is carried out in parallel. The output variables are obtained through non-linear mathematical transfer functions. To fit the network, the dataset must be partitioned into three parts: training (used to obtain the network weights), validation (used to assess the accuracy of the model) and test (used to obtain a realistic estimate of the model's performance). The ANN architecture used in our study was the Multilayer Perceptron, which many authors have cited as an adequate network for meteorological applications because weather and climate can repeat over time but never in exactly the same way (e.g., Cardoso and da Silva Dias, 2004). MATLAB was used to develop the ANN schemes. The goodness of fit between the target and modelled values was assessed using the Root Mean Squared Error (RMSE) and the linear correlation coefficient.
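The three-way partition described above can be sketched as a random split of the sample indices. This is an illustration assuming a random (not chronological) assignment and rounding of the 60 %–20 %–20 % fractions; the text does not say how MATLAB divided the samples, and `split_indices` is a hypothetical name.

```python
import numpy as np

def split_indices(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition range(n) into training, validation and test index sets."""
    assert abs(sum(fractions) - 1.0) < 1e-9
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    # Whatever remains after training and validation becomes the test set
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

For time series, a contiguous chronological split is a common alternative that avoids mixing neighbouring years across the three sets.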

The climate model
The Climate Forecast System v2 (CFSv2) is a coupled ocean–atmosphere–land surface model with T126 (about 100 km) horizontal resolution and 64 hybrid sigma-pressure levels. Compared with its previous version, CFSv1, the model shows improvements in the data assimilation between atmosphere, ocean and land surface, and in the interactions between clouds, aerosols and radiation (Saha et al., 2013). A 24-member ensemble was used, whose runs were initialized in the first month of each rainy season, on the first day and every fifth day thereafter, at all four cycles of each day (00, 06, 12 and 18 UTC).
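As a worked check on the ensemble size, assuming the start days are the 1st, 6th, 11th, 16th, 21st and 26th of the month (one reading of "the first day and every five successive days"), six start days times four daily cycles give 6 × 4 = 24 members. The sketch enumerates those hypothetical initialization times with the Python standard library and forms an equal-weight ensemble mean; both function names are illustrative.

```python
from datetime import datetime
from statistics import fmean

def hypothetical_init_times(year, month, start_days=(1, 6, 11, 16, 21, 26),
                            cycles_utc=(0, 6, 12, 18)):
    """One start every 5 days from day 1, at each of the four daily cycles (assumed)."""
    return [datetime(year, month, day, hour) for day in start_days for hour in cycles_utc]

def ensemble_mean(member_values):
    """Equal-weight mean of the members' seasonal values."""
    return fmean(member_values)

inits = hypothetical_init_times(1982, 2)
print(len(inits))  # 24
```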
According to Saha et al. (2013), CFSv2 does not show significant skill improvements in global land precipitation forecasts, whereas SST prediction has improved over most of the global oceans, mainly in the extratropics. Over the tropical oceans, CFSv2 skill was slightly lower than that of CFSv1. This is related to the subsurface initial states of the reanalysis used to initialize the model, which led to significantly warmer predicted SST after 1999 owing to the introduction of ATOVS satellite data.
For the second hindcast period, from 1999 to 2010, Silva et al. (2013) found that the rainfall pattern was particularly well captured by the model in both the Dec–Feb and Jun–Aug trimesters; however, there were differences in the seasonal averages, most notably in the tropical and midlatitude North Atlantic Ocean during summer. The model satisfactorily captured the interannual variability pattern of SST, although the maximum variance was shifted slightly eastward relative to the reanalysis. The extratropical variability in both hemispheres was also well reproduced by CFSv2.

The EOF analysis
The dominant modes of observed SST variability over the South and North Atlantic Ocean were obtained by Empirical Orthogonal Function (EOF) analysis (Wilks, 1995) for the rainy seasons of the selected grid boxes. The EOFs were obtained from the covariance matrix of SST anomalies, making it possible to identify the modes (patterns) that capture […] were obtained by using their respective EOF time series as explanatory data. As in Hurrell et al. (2006), no statistical significance test was applied to the correlation fields, since the purpose was to emphasize the physical aspects.
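A minimal NumPy sketch of the EOF computation, assuming the SST field is arranged as a (time, space) matrix with land and missing points already removed. It uses the singular value decomposition of the anomaly matrix, whose right singular vectors are the eigenvectors of the covariance matrix; the square-root-of-cosine-latitude area weighting often applied beforehand is omitted. The sign of each EOF is arbitrary, and `leading_eofs` is a hypothetical name.

```python
import numpy as np

def leading_eofs(sst, n_modes=1):
    """EOFs of an SST array shaped (time, space).

    Returns loading patterns (n_modes, space), EOF time series (time, n_modes)
    and the fraction of total variance explained by each mode.
    """
    anom = sst - sst.mean(axis=0)              # anomalies about the time mean
    # Right singular vectors of anom = eigenvectors of anom.T @ anom / (n - 1);
    # the corresponding eigenvalues are s**2 / (n - 1).
    _, s, vt = np.linalg.svd(anom, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)
    patterns = vt[:n_modes]
    pcs = anom @ patterns.T                    # projection of anomalies onto each pattern
    return patterns, pcs, var_frac[:n_modes]
```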

Statistical downscaling from the CFSv2 hindcasts
The mean total annual observed rainfall in the north grid box over the NEB was 43 mm, and the wet season occurred during Feb–Mar–Apr (FMA), which concentrated 56 % of this total (Fig. 1a). CFSv2 simulated the annual cycle of rainfall well, although it underestimated the wet-season average by 0.71 mm. In the south grid box, the mean total annual rainfall was 35 mm and Nov–Dec–Jan (NDJ) was the wettest season, with 50 % of the accumulated total. The model also gave a fair representation of the annual cycle, but with an underestimation of 0.35 mm during this wet season (Fig. 1b).
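The wet-season selection and its share of the annual total can be sketched as a search over all consecutive three-month windows of the monthly climatology, including windows that wrap around the year, such as NDJ. This is a pure-Python illustration; `wettest_season` is a hypothetical helper, not the authors' code.

```python
MONTHS = "JFMAMJJASOND"  # month initials, January..December

def wettest_season(monthly_clim):
    """Return the label and annual-total share of the wettest 3-month window.

    monthly_clim: 12 climatological monthly values ordered January..December.
    Windows wrap around the calendar year, so NDJ and DJF are included.
    """
    total = sum(monthly_clim)
    best_start, best_sum = 0, float("-inf")
    for start in range(12):
        window = sum(monthly_clim[(start + k) % 12] for k in range(3))
        if window > best_sum:
            best_start, best_sum = start, window
    label = "".join(MONTHS[(best_start + k) % 12] for k in range(3))
    return label, best_sum / total

print(wettest_season([2, 4, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1]))  # ('FMA', 0.6)
```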
For each grid box, the nonlinear ANN consisted of two layers, with 10 neurons in the hidden layer and 1 in the output layer. For both rainy seasons (FMA and NDJ), the input and target data were the hindcast and observed precipitation, respectively. We used 60 % of the observed precipitation data for training, 20 % for validation and 20 % for testing. The transfer function was the hyperbolic tangent sigmoid and the training algorithm was backpropagation. The hyperbolic tangent sigmoid function, applied in both layers, ranges from −1 to +1 and allowed the transfer of relative weights from the targets to the outputs. The backpropagation method appeared to be faster for the present problem. However, Fig. 2b shows that in some years the network was unable to reproduce the extreme values of the observed precipitation.
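A minimal NumPy sketch of the network described above: one input (the hindcast), 10 hidden tanh neurons and one tanh output neuron, trained by batch gradient-descent backpropagation of the mean squared error. Because the tanh output ranges from −1 to +1, inputs and targets are min-max scaled to that interval first. The learning rate, the initialization and the early-stopping rule (stop after 6 consecutive epochs without validation improvement, keeping the best weights, chosen to mirror the 6 extra iterations reported for the north grid box) are assumptions for illustration; the MATLAB training used by the authors may differ in these details, and all names are hypothetical.

```python
import numpy as np

def to_pm1(v, vmin, vmax):
    """Min-max scale to [-1, 1], the output range of the tanh transfer function."""
    return 2.0 * (v - vmin) / (vmax - vmin) - 1.0

def from_pm1(v, vmin, vmax):
    """Inverse of to_pm1."""
    return (v + 1.0) * (vmax - vmin) / 2.0 + vmin

def forward(p, x):
    """1 input -> n_hidden tanh neurons -> 1 tanh output neuron."""
    h = np.tanh(p["W1"] @ x[None, :] + p["b1"])       # (n_hidden, n)
    return h, np.tanh(p["W2"] @ h + p["b2"])[0]       # output shape (n,)

def train(x_tr, y_tr, x_va, y_va, n_hidden=10, lr=0.05, max_epochs=3000,
          max_fail=6, seed=0):
    rng = np.random.default_rng(seed)
    p = {"W1": rng.normal(0, 0.5, (n_hidden, 1)), "b1": np.zeros((n_hidden, 1)),
         "W2": rng.normal(0, 0.5, (1, n_hidden)), "b2": np.zeros((1, 1))}
    best_mse, best_p, fails = np.inf, None, 0
    n = x_tr.size
    for _ in range(max_epochs):
        h, out = forward(p, x_tr)
        # Backpropagate the MSE gradient; d tanh(z)/dz = 1 - tanh(z)**2
        d2 = (2.0 * (out - y_tr) / n * (1.0 - out**2))[None, :]   # (1, n)
        d1 = (p["W2"].T @ d2) * (1.0 - h**2)                        # (n_hidden, n)
        p["W2"] -= lr * d2 @ h.T
        p["b2"] -= lr * d2.sum(axis=1, keepdims=True)
        p["W1"] -= lr * d1 @ x_tr[:, None]
        p["b1"] -= lr * d1.sum(axis=1, keepdims=True)
        # Early stopping: keep the weights with the lowest validation error
        val_mse = np.mean((forward(p, x_va)[1] - y_va) ** 2)
        if val_mse < best_mse:
            best_mse, best_p, fails = val_mse, {k: v.copy() for k, v in p.items()}, 0
        else:
            fails += 1
            if fails >= max_fail:
                break
    return best_p, best_mse
```

Predictions in physical units are obtained by passing scaled inputs through `forward` with the returned weights and applying `from_pm1` with the target's original minimum and maximum.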
For the south grid box during the NDJ season, the network was successfully trained in 15 iterations, with an RMSE of 2.14. Before training, the correlation coefficient between the observed precipitation time series and that simulated by CFSv2 was 0.64, whereas after training it increased to 0.71 (Fig. 3a). In some years, mainly during the first decade, the downscaled time series underestimates the precipitation. In other years, for example 2001, 2006 and 2007, the network tended to overestimate the precipitation values.
The results indicate that the networks showed reasonable skill in reproducing seasonal precipitation in the two grid boxes over the NEB. This was assessed by analyzing the dispersion between the output and target data before and after training the network.
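The before/after comparison of dispersion can be summarized with the two scores named in the text (linear correlation and RMSE) plus the least-squares line drawn through a scatterplot. A sketch, with `dispersion_stats` a hypothetical helper and the array names in the usage comment placeholders:

```python
import numpy as np

def dispersion_stats(target, estimate):
    """Pearson r, RMSE and the least-squares line estimate ~ slope * target + intercept."""
    target = np.asarray(target, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    r = np.corrcoef(target, estimate)[0, 1]
    rmse = np.sqrt(np.mean((estimate - target) ** 2))
    slope, intercept = np.polyfit(target, estimate, 1)   # highest power first
    return {"r": r, "rmse": rmse, "slope": slope, "intercept": intercept}

# Hypothetical comparison for one grid box:
# before = dispersion_stats(obs, cfsv2_hindcast)   # raw hindcast vs observations
# after = dispersion_stats(obs, ann_output)        # downscaled output vs observations
```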

The main modes of observed anomalous SST and their relationship with the precipitation
The dominant modes of observed SST variability over the South and North Atlantic Ocean for the NDJ and FMA seasons were captured by EOF analysis. The EOF modes were computed from seasonal rather than annual SST because changes in trends appear more pronounced between seasons. The corresponding EOF time series (Fig. 4b) was marked by intraseasonal and interannual variability and correlated well with the El Niño–Southern Oscillation (ENSO; Kousky et al., 1984), the negative phase of the North Atlantic Oscillation (NAO; Hurrell et al., 2003) and the positive phase of the Indian Ocean Dipole (IOD; Saji et al., 2010) (Fig. 4c). The correlation with the SLP anomalies showed negative values over the subtropical South Atlantic Ocean and positive values over most of the tropical North Atlantic Ocean and over the North and Northeast regions of Brazil, which are climatologically influenced by the South Atlantic Convergence Zone (SACZ) (Fig. 4d). This suggests that, during NDJ, the positive phase of the SAD is associated with the weakening of the South Atlantic Subtropical High, followed by its southward displacement and a weakening of the northeast trade winds. This implies higher pressure and inhibited convection over the SACZ region and the North and Northeast of Brazil, with the opposite regime over southern Brazil. The canonical El Niño pattern also seemed to influence the subsidence over the NEB during this period. According to Chan et al. (2008), IOD(+) contributes to the southward displacement of the Subtropical Atlantic High through a Rossby wave train that extends across the southern Indian and South Atlantic Oceans, reaching the subtropical latitudes. Our analysis suggests a combined effect of the canonical El Niño, NAO(−), IOD(+) and the positive phase of the SAD in modulating drought over the NEB. The opposite impact may occur during the opposite phases of these patterns.
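Correlation maps such as those in Fig. 4c and d can be sketched as the Pearson correlation, computed independently at every grid point, between an EOF time series and a gridded anomaly field. This assumes both are sampled at the same seasons (no lag), as in the text; `correlation_map` is a hypothetical name, and grid points with zero variance would yield NaN.

```python
import numpy as np

def correlation_map(index, field):
    """Pearson correlation between index (time,) and each point of field (time, lat, lon)."""
    ia = index - index.mean()
    fa = field - field.mean(axis=0)
    cov = np.tensordot(ia, fa, axes=(0, 0))                 # sum over time -> (lat, lon)
    norm = np.sqrt(np.sum(ia**2) * np.sum(fa**2, axis=0))
    return cov / norm
```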
The SST variability over the North Atlantic Ocean during the NDJ season was represented by the NAO mode (Fig. 5a), which captured 24.2 % of the variance in this area and exhibited intraseasonal and interannual periodicities (Fig. 5b). Negative correlations were present in almost all oceanic basins, being more intense in isolated regions of the tropical western Pacific and Indian Oceans, suggesting a com[…]
For the FMA season, the main mode captured over the South Atlantic Ocean explained 30.3 % of the variance, more than in NDJ, but with a similar spatial pattern (Fig. 6a). The time series showed a predominance of intraseasonal and interannual scales (Fig. 6b), and in the years of more intense ENSO the signal showed a larger amplitude and a change of phase compared with the NDJ time series (Fig. 4b). This indicates that the variability of SST anomalies in the South Atlantic Ocean during FMA responds with a lag of nearly 3 months after the peak of an ENSO event, in agreement with the results of Haarsma et al. (2003). Our results also suggest that the SAD mode during the FMA trimester may be associated with positive SST anomalies over almost the entire Indian Ocean, while over the equatorial Pacific the correlations are very weak and negative (Fig. 6c).
The warming of the Indian Ocean could be related to the remote response to El Niño events occurring in NDJ. Such a mechanism, leading to the positive phase, or warming mode, of the Indian Ocean Basin-Wide mode (IOBW(+)), was suggested by Taschetto and Ambrizzi (2012). The correlation between the SAD and the SLP anomalies during FMA was negative over the whole South Atlantic and the tropical North Atlantic Ocean and over central-eastern Brazil (Fig. 6d). In other words, the weakening of the Subtropical Atlantic High, with the consequent weakening of the southeast trade winds and a slight reduction of the northeast trade winds, contributes to positioning the Intertropical Convergence Zone south of its climatological position and to more rain over the NEB. In this case, the local effect of the South Atlantic Ocean seems to be larger than the remote impact of the Indian Ocean.
The first eigenvector of SST anomalies over the North Atlantic during the FMA trimester captured 28 % of the variability in this basin (Fig. 7a) and was associated with the NAO, showing anomalies of opposite sign between the south-central and northwestern areas. The time series showed intraseasonal and interannual frequencies (Fig. 7b), and the positive phase of the NAO suggested a combined influence with the cooling of the central-eastern equatorial Pacific and the Indian Ocean (Fig. 7c). Negative correlations predominated over most of the South Atlantic, with positive correlations to the north (Fig. 7d).

The EOF time series used as explanatory variables
The correlation between the observed precipitation anomalies during NDJ (FMA) over the south (north) grid box and the time series of the SAD mode was 0.11 (0.21), whereas with the NAO mode it was 0.10 (0.34). The low linear correlations reflect the high climate variability and non-linearity over the NEB, which is also indicated by the correlations between the EOF time series and the other oceanic basins in Figs. 4c, 5c, 6c and 7c. We also believe that the precipitation anomalies observed during the NDJ and FMA trimesters could be more strongly related to the captured modes if a time lag were used.
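To test the suggestion that a time lag could strengthen these relationships, one could compute lagged correlations. A minimal sketch, assuming two equally spaced series of the same length; `lagged_corr` is a hypothetical helper. A positive lag pairs x(t) with y(t + lag), so with monthly data `lag=3` would test a 3-month lead of an SST mode over precipitation.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation between x(t) and y(t + lag); lag > 0 means x leads y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return np.corrcoef(x, y)[0, 1]
```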
The possible reduction in dispersion between the observed precipitation anomalies and the SST modes was examined by applying the ANN technique. The network consisted of two layers: 20 neurons in the hidden layer and 1 in the output layer. We used 60 % of the observed precipitation data during the FMA season for training, 20 % for validation and 20 % for testing. The explanatory variables were the main modes of SST anomalies captured in this season. The transfer function was the hyperbolic tangent sigmoid and the training algorithm was Levenberg–Marquardt backpropagation. This training algorithm requires fewer iterations to reach the lowest error and highest correlation coefficient, being more efficient for approximating nonlinear relationships such as those in climate diagnostic studies. It has also been used in previous studies based on EOF modes to reproduce precipitation in other regions of the globe (e.g., Trigo and Palutikof, 2001). We tested some parameters during the construction of the networks, for example the number of neurons and the training–validation–test percentages. The network fitted most efficiently with 20 neurons and a 60 %–20 %–20 % training–validation–test split, compared with 10 neurons and a 70 %–15 %–15 % split or 10 neurons and a 60 %–20 %–20 % split.
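The Levenberg–Marquardt update solves (JᵀJ + μI)δ = −Jᵀe for the weight increment δ, where e is the residual vector and J its Jacobian with respect to the weights; μ is decreased after a successful step (approaching Gauss–Newton) and increased after a failed one (approaching gradient descent). The sketch applies this to a 1-input network with 20 tanh hidden neurons and a tanh output, using a finite-difference Jacobian for brevity. The initial μ, the factors 0.1 and 10, the lower bound on μ and the stopping limit are illustrative assumptions, not the settings of the MATLAB implementation used in the study; all names are hypothetical.

```python
import numpy as np

def mlp(theta, x, n_hidden):
    """1 input -> n_hidden tanh neurons -> 1 tanh output; theta packs W1, b1, W2, b2."""
    H = n_hidden
    w1, b1, w2, b2 = theta[:H], theta[H:2 * H], theta[2 * H:3 * H], theta[3 * H]
    h = np.tanh(np.outer(x, w1) + b1)      # (n, H)
    return np.tanh(h @ w2 + b2)            # (n,)

def jacobian(f, theta, eps=1e-6):
    """Forward-difference Jacobian of the residual vector f(theta)."""
    f0 = f(theta)
    J = np.empty((f0.size, theta.size))
    for j in range(theta.size):
        t = theta.copy()
        t[j] += eps
        J[:, j] = (f(t) - f0) / eps
    return J

def train_lm(x, y, n_hidden=20, mu=1e-3, max_iter=100, seed=0):
    theta = np.random.default_rng(seed).normal(0.0, 0.5, 3 * n_hidden + 1)
    resid = lambda th: mlp(th, x, n_hidden) - y
    sse = np.sum(resid(theta) ** 2)
    for _ in range(max_iter):
        e = resid(theta)
        J = jacobian(resid, theta)
        A, g = J.T @ J, J.T @ e
        while True:
            # Solve (J^T J + mu I) delta = -J^T e for the weight increment
            delta = np.linalg.solve(A + mu * np.eye(theta.size), -g)
            new_sse = np.sum(resid(theta + delta) ** 2)
            if new_sse < sse:   # accept the step and lean toward Gauss-Newton
                theta, sse, mu = theta + delta, new_sse, max(mu * 0.1, 1e-9)
                break
            mu *= 10.0          # reject the step and lean toward gradient descent
            if mu > 1e10:
                return theta, sse
    return theta, sse
```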
Figures 8 and 9 show the validation for the south and north of the NEB region, respectively. When the time series of the SAD and NAO modes were used as explanatory variables, the correlations with the CMAP anomalies (target) were 0.59 and 0.58, and the network was able to simulate the trends and reduced the dispersion between the output and target data (Fig. 8a and c, respectively). However, the RMSEs of 2.57 and 2.71 in the fits of Fig. 8b and d, respectively, suggest some deficiency in reproducing the extreme values of the observed precipitation anomalies. Likewise, after training when the time series of the SAD and NAO modes were used as explanatory dataset for the north grid box during the FMA (Fig. 9a and c,

Discussion
The CFSv2 model showed a fair ability to reproduce the annual cycle of precipitation over the southern and northern parts of the NEB during the 1982–2009 period, although it slightly underestimated the wet-season averages in both parts. The ANN proved to be an efficient tool for reducing the dispersion between the explanatory and dependent variables, as shown by comparing both datasets before and after training, even with no time lag. This improvement was greater when the dispersion before training was larger. However, the statistical method showed the typical problem of underestimating or, in some cases, overestimating the precipitation extremes in some years.
The SAD and NAO modes captured in both the NDJ and FMA seasons showed associations with SST anomalies over different oceanic basins, and consequently different impacts on the atmospheric circulation and rainfall over the NEB were found. Comparing the FMA and NDJ seasons, the SAD mode showed opposite influences on the rainfall regime over the NEB. Low correlations were found between the EOF time series and the observed precipitation anomalies over the NEB during the same season, suggesting high variability. However, when the EOF time series were used as inputs to the networks, the correlations increased, indicating that the SST modes contained sufficient information to explain the precipitation anomalies even with a short dataset (28 yr). Nevertheless, a longer dataset is required for the network to capture the contribution of the non-linear processes in more detail. The smallest RMSE and the best explanation of the variability of the observed precipitation anomalies were obtained by fitting the EOF time series of the SAD and NAO modes during the FMA season. In a future analysis we intend to investigate the decadal modulation of these modes by the Pan Atlantic Oscillation and their impacts over the NEB. Although the networks constructed in the present study were limited, ANNs proved to be an important complementary tool for climate studies over the NEB. We suggest that they could be used in further analyses, for example for seasonal forecasting based on statistical downscaling, but with a longer period (e.g., 40 yr or more) and a time lag. Additional input variables (e.g., moisture content, temperature, wind) are also required to improve the performance of the statistical downscaling models discussed in this work.

Fig. 1. Annual cycle of total rainfall (mm) for the two grid boxes defined over the NEB region: (a) north; (b) south.

Fig. 4. First EOF for NDJ of 1982–2009 over the South Atlantic Ocean: (a) loading patterns; (b) time series; (c) linear correlation between the EOF time series and observed anomalous SST; and (d) linear correlation between the EOF time series and observed anomalous SLP. The statistical analyses and the respective maps were produced with the IRI/LDEO Climate Data Library.
Figures 2 and 3 compare the observed and downscaled precipitation during the FMA (NDJ) season in the north (south) grid box over the NEB region. In both figures, the top shows the corresponding scatterplots and the bottom shows the monthly time series for the validation period. For the north grid box, the validation performance reached a minimum error at the 36th iteration, and training stopped after 6 further iterations. The RMSE was 1.53 and the validation and test curves were very similar (figure not shown). The correlation coefficient between the observed precipitation time series and that simulated by CFSv2 during the FMA season was 0.80, whereas after downscaling the R-value was 0.83, as represented by the scatterplot in Fig. 2a.
Figure 4 illustrates the result of the EOF analysis for the NDJ season. In Fig. 4a, the main spatial pattern captured 24.1 % of the variance in SST anomalies over the South Atlantic Ocean and is characterized by positive anomalies over most of the area, except in the central-west between 25° S and 40° S. The largest amplitude occurred in the central and eastern parts, near the coast of Africa. The captured pattern is defined as the South Atlantic Dipole (SAD; Haarsma et al., 2003, among others). According to Bombardi and Carvalho (2010), positive (negative) SST anomalies over the tropical Atlantic Ocean together with negative (positive) anomalies over the subtropics are associated with an early (late) onset of the monsoon in South America and wet (dry) summers in the Northeast of Brazil.

Fig. 8. Validation results for the NDJ season when the time series of the (a) South Atlantic Dipole and (c) North Atlantic Oscillation modes were used as explanatory variables. The left-hand column shows the monthly time series of observed (blue) and downscaled (green) anomalous precipitation; the right-hand column shows the corresponding scatterplots, (b) and (d), respectively. The R-values are 0.59 and 0.58 for panels (b) and (d), respectively. The scatterplots are adapted from the MATLAB tool, and the solid gray line represents the fitted model.