Nowadays, hydrologic model simulations are widely used to better understand hydrologic processes and to predict extreme events such as floods and droughts. In particular, the spatially distributed LISFLOOD model is currently used for flood forecasting at pan-European scale within the European Flood Awareness System (EFAS). Several model parameters cannot be measured directly and must therefore be estimated through calibration. In this work we describe how the free software R was used as a single environment to pre-process hydro-meteorological data, carry out global optimization, and post-process calibration results across Europe.
Historical daily discharge records were pre-processed for 4062 stream gauges, each with a different amount and temporal distribution of data. The hydroTSM, raster and sp R packages were used to select 700 stations with adequate spatio-temporal coverage; the selected stations span a wide range of hydro-climatic characteristics. Nine parameters were selected for calibration based on previous expert knowledge. Customized R scripts were used to extract the observed time series for each catchment and to prepare the input files required to fully set up its calibration. The hydroPSO package was then used to carry out a single-objective global optimization for each selected catchment with the Standard Particle Swarm Optimization 2011 (SPSO-2011) algorithm. Among the many goodness-of-fit measures available in the hydroGOF package, the Nash-Sutcliffe efficiency (NSE) was used to drive the optimization. User-defined functions were developed to read model outputs and pass them to the calibration engine. The long computational time required to complete the calibration at continental scale was partially reduced by using 4 multi-core machines (running both GNU/Linux and Windows) and the "parallel" option available in hydroPSO. Calibration results (not described here) were automatically produced in both text and graphical formats. These included comparisons of observed and simulated hydrographs, histograms, boxplots, and dotty plots of the parameter values sampled during the optimization. The graphical results allowed a quick assessment of model performance and the identification of individual problems during calibration.
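The abstract mentions user-defined functions that read model outputs and pass a goodness-of-fit value (NSE) to the calibration engine. The Python sketch below illustrates that glue. It is neither the authors' R code nor hydroGOF's `NSE()`. The name `read_tss` is borrowed from the poster's R function, but its body here is an assumption. It expects the PCRaster timeseries layout: a title line, a column count N, N column-name lines, then whitespace-separated rows that start with the time step. `nse` and `objective` are hypothetical helper names. NSE is computed as 1 − Σ(O − S)² / Σ(O − Ō)², and time steps where either value is NaN are skipped.

```python
import math
from typing import List, Sequence


def read_tss(text: str, column: int = 1) -> List[float]:
    """Return one value column from a PCRaster-style .tss time series.

    Assumed layout: a title line, a line holding the number of columns N,
    N column-name lines, then whitespace-separated rows whose first field
    is the time step (column 0).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_cols = int(lines[1].split()[0])
    rows = lines[2 + n_cols:]  # skip title, column count and column names
    return [float(row.split()[column]) for row in rows]


def nse(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency, ignoring time steps where either value is NaN."""
    if len(sim) != len(obs):
        raise ValueError("sim and obs must have the same length")
    pairs = [(s, o) for s, o in zip(sim, obs)
             if not (math.isnan(s) or math.isnan(o))]
    if not pairs:
        raise ValueError("no overlapping non-missing values")
    mean_obs = sum(o for _, o in pairs) / len(pairs)
    sse = sum((o - s) ** 2 for s, o in pairs)         # squared model errors
    sst = sum((o - mean_obs) ** 2 for _, o in pairs)  # spread of observations
    if sst == 0:
        raise ValueError("NSE is undefined for constant observations")
    return 1.0 - sse / sst


def objective(sim_tss: str, obs_tss: str) -> float:
    """Value handed to the optimiser: NSE of simulated vs observed discharge."""
    return nse(read_tss(sim_tss), read_tss(obs_tss))
```

Returning NSE directly suits an optimiser that maximises. An optimiser that minimises would take 1 − NSE instead.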
This work illustrates how R proved to be a valuable environment for modeling, visualization and data analysis at continental scale in an efficient and reproducible way, without switching to other applications for individual analyses. The application discussed here concerns the calibration of a hydrologic model written in Python+PCRaster. However, considering the rapidly growing number of contributed packages, its multi-platform architecture and its scripting capabilities, we believe R is a promising environment for hydrology. A similar approach can be applied to a wider class of models requiring parameter optimization.
Using R for Global Optimization of a Fully-distributed Hydrological Model at Continental Scale (AGU 2013)
Mauricio Zambrano-Bigiarini, Zuzanna Zajac and Peter Salamon
Joint Research Centre • AGU 2013-1804792 • Identifier: H51R-06 • Dec 13th, 2013

1) Motivation
6) Calibration results + post-processing
The spatially-distributed LISFLOOD hydrological model is
used for flood forecasting at Pan-European scale, within
the European Flood Awareness System (EFAS).
Several model parameters need to be estimated through
calibration for ca. 700 subcatchments.
Calibrating every individual catchment across Europe is a very time-consuming and error-prone task.
2) Aim
To describe and illustrate how the free software R has been used as a single environment to pre-process hydro-meteorological data, carry out global optimization, and post-process calibration results at European scale.
3) Why use R for large-scale hydrological analysis?
● Base functions allow efficient data manipulation and storage (spatial data and time series).
● Support for almost every vector and raster spatial format (rgdal, raster and sp packages).
● R is both a scientific software environment and a programming language (types, objects, functions, extensions).
● Scripting capabilities allow explicit documentation and reproducible research.
● Fully customizable, high-quality graphical functions for exploratory data analysis and visualization.
● Highly extensible (4000+ packages with state-of-the-art contributions in several fields of knowledge).
● Easy integration with other languages (C/C++, Fortran, Python, etc.), e.g., for intensive computations.
● Easy parallelization (multi-core machines or network clusters).
● Multi-platform (GNU/Linux, Mac OS X, Windows).
● Free and open-source.

Fig 03. Evolution of the global optimum (Nash-Sutcliffe efficiency) and the normalized swarm radius (δnorm) over the iterations.
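Fig 03 tracks two quantities per iteration: the best NSE found so far and the normalized swarm radius δnorm. The "Easy parallelization" bullet refers to evaluating many model runs at once. The Python sketch below shows both ideas in a plain global-best PSO with inertia weight. This is a deliberate simplification: it is not SPSO-2011, which uses a different velocity update and an adaptive random topology, and it is not hydroPSO's code. Here δnorm is assumed to be the largest distance of any particle from the global best, divided by the diameter of the search box. `pso_maximize`, its coefficients and its defaults are illustrative choices. The `parallel` flag uses a thread pool, which is enough when each evaluation mainly waits on an external model process.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def normalized_swarm_radius(swarm: Sequence[Vector], gbest: Vector,
                            lower: Vector, upper: Vector) -> float:
    """Largest distance of any particle from the global best, divided by
    the diameter of the box-shaped search space (result lies in [0, 1])."""
    diameter = math.sqrt(sum((u - l) ** 2 for l, u in zip(lower, upper)))
    return max(math.dist(x, gbest) for x in swarm) / diameter


def pso_maximize(f: Callable[[Vector], float], lower: Vector, upper: Vector,
                 n_particles: int = 20, max_iter: int = 200, tol: float = 1e-4,
                 parallel: bool = False, seed: int = 1
                 ) -> Tuple[Vector, float, List[Tuple[float, float]]]:
    """Global-best PSO with inertia weight (a simplification, not SPSO-2011).

    Returns the best parameter set, its fitness, and a per-iteration history
    of (global optimum, normalized swarm radius), i.e. the two curves of Fig 03.
    """
    rng = random.Random(seed)
    dim = len(lower)
    w, c1, c2 = 0.729, 1.49445, 1.49445  # widely used PSO coefficients

    def evaluate(positions: List[Vector]) -> List[float]:
        # Threads suffice when each call mostly waits on an external model run.
        if parallel:
            with ThreadPoolExecutor() as pool:
                return list(pool.map(f, positions))
        return [f(p) for p in positions]

    x = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
         for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    fit = evaluate(x)
    pbest, pbest_fit = [p[:] for p in x], fit[:]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]
    history: List[Tuple[float, float]] = []

    for _ in range(max_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                x[i][d] += v[i][d]
                if not lower[d] <= x[i][d] <= upper[d]:
                    # Clamp to the parameter range and stop the particle there.
                    x[i][d] = min(max(x[i][d], lower[d]), upper[d])
                    v[i][d] = 0.0
        fit = evaluate(x)
        for i in range(n_particles):
            if fit[i] > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], fit[i]
                if fit[i] > gbest_fit:
                    gbest, gbest_fit = x[i][:], fit[i]
        delta = normalized_swarm_radius(x, gbest, lower, upper)
        history.append((gbest_fit, delta))
        if delta < tol:  # swarm has collapsed around the best point found
            break
    return gbest, gbest_fit, history
```

Stopping on a small δnorm ends the search once the swarm has collapsed around the best point found. This matches the flattening of both curves in a plot like Fig 03. Because the random numbers are drawn only in the main thread, the serial and parallel runs follow the same trajectory.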
Fig 01. Shaded boxes represent the seven major calibration areas used for splitting up the
pan-European spatial domain. Colored dots represent discharge stations coming from two
different data sources, which were analyzed to select ca. 700 stations for calibration.
5) Model Calibration
Fig 06. Figure automatically generated to assess the quality of the calibration results for each individual catchment. The upper panel compares the observed and simulated hydrographs during the verification period. The lower left panel compares the corresponding flow duration curves, while the lower right panel shows numerical statistics comparing the observations with their simulated counterparts.
7) Concluding Remarks
Fig 04. Nash-Sutcliffe efficiency (NSE) response surface projected onto the parameter
space (pseudo 3D-dotty plots) for selected parameters, to highlight equifinality issues.
4) Pre-processing
● Historical daily data for 4062 stream gauges (from national providers).
● The hydroTSM, sp and raster packages were used to select ~700 stations with enough temporal data and a good spatial distribution across Europe.
● Nine parameters were selected for calibration based on previous expert knowledge.
● The pan-European spatial extent was split into 7 main calibration areas in order to reduce model computation time.
● Customized R scripts were used to extract observed time series for each catchment and to prepare the input files required for individual calibrations (i.e., ParamRanges.txt, ParamFiles.txt, obs.tss, and hydroPSO-subbXXX.R files, along with a mask map defining the drainage area of each catchment).
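Station selection is described only as keeping stations "with enough temporal data". The Python sketch below shows one way to express that criterion: the fraction of days in a target period that have a valid value. The poster does not state the actual rule. The 80% threshold, the period, and the names `completeness` and `select_stations` are hypothetical. The spatial-distribution check mentioned alongside it is not sketched.

```python
import math
from datetime import date
from typing import Dict, List, Optional

Series = Dict[date, Optional[float]]  # daily discharge; None marks a gap


def completeness(series: Series, start: date, end: date) -> float:
    """Fraction of days in [start, end] with a valid (non-missing) value."""
    n_days = (end - start).days + 1
    n_valid = sum(
        1 for day, q in series.items()
        if start <= day <= end and q is not None and not math.isnan(q)
    )
    return n_valid / n_days


def select_stations(records: Dict[str, Series], start: date, end: date,
                    min_fraction: float = 0.8) -> List[str]:
    """IDs of stations whose record is at least `min_fraction` complete.

    The default threshold is a hypothetical value, not the one used for EFAS.
    """
    return sorted(sid for sid, series in records.items()
                  if completeness(series, start, end) >= min_fraction)
```

Counting valid days against the full calendar length, rather than against the number of stored entries, penalises stations whose records simply stop early.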
www.jrc.europa.eu
Fig 02. Flow chart of the calibration of a single catchment. Files ParamRanges.txt and ParamFiles.txt define which parameters are to be calibrated and where they have to be modified, respectively. Settings.xml defines the location of the model input files and the values of the model parameters. Light-blue shaded boxes indicate some user intervention, while light-yellow shaded boxes represent static input files (not modified during optimization).
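In the loop of Fig 02, each parameter set proposed by the optimiser has to be checked against its range and written into Settings.xml before the model runs. The Python sketch below shows that step under two assumptions that are not confirmed here. First, ParamRanges.txt is taken to be a whitespace-separated table with one header line and columns number, name, minimum, maximum. Second, Settings.xml is taken to store parameters as `<textvar name="..." value="..."/>` elements. `read_param_ranges` and `write_params` are hypothetical names, not hydroPSO or LISFLOOD functions. The sketch does not reproduce the substitution rules in ParamFiles.txt; it edits the XML directly.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Ranges = Dict[str, Tuple[float, float]]


def read_param_ranges(text: str) -> Ranges:
    """Parse an assumed four-column table (number, name, min, max) after one header line."""
    ranges: Ranges = {}
    for line in text.strip().splitlines()[1:]:
        if not line.strip():
            continue
        _, name, lo, hi = line.split()[:4]
        ranges[name] = (float(lo), float(hi))
    return ranges


def write_params(settings_xml: str, params: Dict[str, float], ranges: Ranges) -> str:
    """Return the settings XML with each calibrated value replaced.

    Assumes parameters live in <textvar name=... value=...> elements.
    Raises ValueError for values outside their range and KeyError for
    parameters missing from the ranges table or the settings file.
    """
    for name, value in params.items():
        lo, hi = ranges[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
    root = ET.fromstring(settings_xml)
    updated = set()
    for var in root.iter("textvar"):
        name = var.get("name")
        if name in params:
            var.set("value", str(params[name]))
            updated.add(name)
    missing = set(params) - updated
    if missing:
        raise KeyError(f"not found in settings: {sorted(missing)}")
    return ET.tostring(root, encoding="unicode")
```

ElementTree's default parser discards XML comments. If the original Settings.xml must be preserved byte for byte apart from the edited values, a text-level substitution would be safer.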
● obs.tss: file with observed discharges.
● dis.tss: file with simulated discharges.
● read_tss(): user-defined R function for reading .tss files.
● NSE(): R function for computing the Nash-Sutcliffe efficiency (hydroGOF package).
● SPSO-2011: Standard Particle Swarm Optimization 2011 (hydroPSO package).

Fig 05. Dotty plots showing the model performance (NSE) versus parameter values for three selected parameters. The vertical red line indicates the "optimum" parameter value.

Mauricio Zambrano-Bigiarini*, Zuzanna Zajac and Peter Salamon
European Commission • Joint Research Centre • Institute for Environment and Sustainability
*Currently at: EULA-Chile Centre, University of Concepción (Chile) • Email: mauricio.zambrano@udec.cl

● R proved to be an efficient environment to facilitate modeling, visualization and data analysis at continental scale.
● The use of a single environment for pre-processing, calibration and post-processing made subsequent changes to any step of the workflow easier.
● Results in hundreds of catchments with different hydro-climatological regimes showed that hydroPSO is an effective and efficient R package for finding near-optimal parameter sets at a low computational cost.
● The use of the "parallel" option available in hydroPSO allowed a substantial reduction of the total calibration time (ca. 50% with 6 cores).
● Although this case study relates only to the calibration of a hydrological model written in Python+PCRaster, we believe that a similar approach can be applied to a wide class of environmental models requiring some form of parameter optimization, from micro to global scale.

References:
EFAS (2013), “European Flood Awareness System”, http://www.efas.eu/. [Online. Last accessed 05-Dec-2013]
van der Knijff, J. M., J. Younis, and A. P. J. De Roo (2010), LISFLOOD: a GIS-based distributed model for river basin scale water balance and flood simulation, International Journal of Geographical Information Science, 24(2), 189–212, doi:10.1080/13658810802549154.
Zambrano-Bigiarini, M., and R. Rojas (2013), A model-independent Particle Swarm Optimisation software for model calibration, Environmental Modelling & Software, 43, 5–25, doi:10.1016/j.envsoft.2013.01.004.
Zambrano-Bigiarini, M. (2013), hydroTSM: Time series management, analysis and interpolation for hydrological modelling, R package version 0.4-1, http://CRAN.R-project.org/package=hydroTSM.
Zambrano-Bigiarini, M. (2013), hydroGOF: Goodness-of-fit functions for comparison of simulated and observed hydrological time series, R package version 0.3-7, http://CRAN.R-project.org/package=hydroGOF.