Evaluation of random forests and Prophet for daily streamflow forecasting
Georgia A. Papacharalampous
CORRESPONDING AUTHOR
Department of Water Resources and Environmental Engineering, National
Technical University of Athens, Zografou, 157 80, Greece
Hristos Tyralis
Air Force Support Command, Hellenic Air Force, Elefsina, 192 00,
Greece
Related authors
Hristos Tyralis and Georgia A. Papacharalampous
Adv. Geosci., 45, 147–153, https://doi.org/10.5194/adgeo-45-147-2018, 2018
Short summary
We use the CAMELS dataset to compare two approaches to multi-step ahead forecasting of monthly streamflow. The first approach uses past monthly streamflow only, while the second additionally uses past monthly precipitation and/or temperature (exogenous information). The exogenous information is incorporated using Prophet, a model widely used at Facebook. The findings suggest that the two approaches are equally useful.
Thomas Dimopoulos, Hristos Tyralis, Nikolaos P. Bakas, and Diofantos Hadjimitsis
Adv. Geosci., 45, 377–382, https://doi.org/10.5194/adgeo-45-377-2018, 2018
Short summary
The paper compares a machine learning algorithm (random forests) with multivariate linear regression on a dataset of 3500 transactions of residential apartments in the Nicosia District of Cyprus. The results indicate high accuracy for random forests, which can be applied in automated valuation models and CAMA systems.
Short summary
The predictive performance of random forests (a machine learning algorithm)
and three configurations of Prophet (a method widely used at Facebook) is
assessed in daily streamflow forecasting for a river in the US. Random forests
outperform the benchmarks used, i.e. a naïve method and a multiple linear
regression model, while Prophet's performance leaves room for improvement.
Random forests are recommended for daily streamflow forecasting.
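To illustrate how a forecasting method can be scored against simple benchmarks of the kind named in the summary, here is a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the naïve method is interpreted as persistence (tomorrow's flow equals today's), the linear benchmark is reduced to a single predictor (the previous day's flow) instead of a multiple regression, the error metric is RMSE, and the series is synthetic. All function names are hypothetical.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def naive_forecast(series):
    """Persistence benchmark (assumed): forecast each day with the previous value.

    The returned forecasts are aligned with series[1:].
    """
    return list(series[:-1])


def fit_lag1_regression(series):
    """Ordinary least squares fit of x[t] = intercept + slope * x[t-1]."""
    x = series[:-1]
    y = series[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def synthetic_flow(n_days, seed=0):
    """Hypothetical autocorrelated series standing in for daily streamflow."""
    rng = random.Random(seed)
    flow = [10.0]
    for _ in range(n_days - 1):
        flow.append(2.0 + 0.8 * flow[-1] + rng.gauss(0.0, 1.0))
    return flow


flow = synthetic_flow(400)
train, test = flow[:300], flow[300:]

# Fit on the training period only, then forecast one step ahead in the test period.
intercept, slope = fit_lag1_regression(train)
actual = test[1:]
naive_pred = naive_forecast(test)
regression_pred = [intercept + slope * previous for previous in test[:-1]]

print("naive RMSE:     ", round(rmse(actual, naive_pred), 3))
print("regression RMSE:", round(rmse(actual, regression_pred), 3))
```

A candidate method such as random forests would be evaluated the same way: fit on the training period, forecast the test period, and compare its error with the two benchmark errors.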