Content:

  • Installation

  • Preparing model inputs parameters and dataset

  • Runing DRYP model

  • Post processing model outputs

Post-processing datasets

  1. Understanding model outputs

  • Sampling model outputs

  • Post-processing model outputs

NOTE: Before you start, please change the path to access the following directories:

[ ]:
training_general_path = "D:/HAD/training/"
regional_path = "D:/HAD/training/regional/"
basin_path = "D:/HAD/training/basin/"

4.1. Understanding model outputs

Model results are stored in two different formats:

  • Comma delimited files (.csv) to store time series, and

  • netCDF files (.nc) to store gridded output datasets

  • ascii raster files (.asc) to store hydrological states for initial condions

Model outputs stored in csv files store model results at specified locations as well as average model results. Point location result files have a letter “p” followed by name of the stored variable (e.g. tht) added at the end of the name of the file. Basin average results have the letters “avg” added at the end of the file name.

[ ]:
####File does not exist
fsim = [
basin_path + "output/Tana_IMb_sim_p_dis_avg.csv",
]

[ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

TASK: Take a look at the content of the file specified above. HINT: use pandas to open and explore the dataset

[ ]:
def aggregate_slice_csv(fname, agg_step='M', mean=True,
                        date_start='2000-01-01', date_end='2023-01-01',
                        timefield='Date'):
    df = pd.read_csv(fname)
    df[timefield] = pd.to_datetime(df[timefield])
    df = df[df[timefield].between(date_start, date_end)]
    df.index = pd.DatetimeIndex(df[timefield])
    if mean is True:
            df = df.resample(agg_step).mean().reset_index()
    else:
            df = df.resample(agg_step).sum().reset_index()
    return df
[ ]:
# plot data
fig, ax = plt.subplots(3, 1, sharex=True)
fig.set_size_inches(9, 4.5)
label = ['pre', 'tht', "aet"]#, "twsc"]
field = ['pre_0','tht_0', "aet_0"]#, "twsc_0"]
#ilabel = "sim"
for ifname in fsim:
    for ifield, ilabel, iax in zip(field, label, fig.axes):
        data = aggregate_slice_csv(ifname, timefield='Date')
        #ax.plot(data['date'], data['flow(m3/s)'], label=ilabel)#data.plot(y='Flow (Cumecs)')
        if ifield == "twsc_0":
            iax.plot(data['Date'], np.cumsum(data[ifield]))#, label=ilabel)
        elif ifield == "tht_0":
            iax.plot(data['Date'], data[ifield])#, scale=True)
        else:
            iax.plot(data['Date'], data[ifield])#, label=ilabel)

        iax.set_ylabel(ilabel)
iax.legend(["Sim"], frameon=False)

Reading output stored at specific locations

[ ]:
fnamesim=[
basin_path + "output/Juba_IMa_sim_p_dis.csv",
]
#/Users/isamarcortes/Downloads/training/basin/output/Juba_IMa_sim_p_dis.csv
[ ]:
fig, ax = plt.subplots(2, 1, sharex=True)
fig.set_size_inches(9, 3)

for ifnamesim in fnamesim:
    data = aggregate_slice_csv(ifnamesim)
    stn = ['dis_5', "dis_3"]
    for istn, iax in zip(stn, fig.axes):
        iax.plot(data['Date'], data[istn]/30.5/86400, label='Sim')
        iax.set_ylabel("Flow (m3/s)")
[ ]:
# uncomment this line and save it in a local directory
#fname_out = 'd:/HAD_basins/basin_postpp/fig/Juba_streamflow.png'
#fig.savefig(fname_out, dpi=300)

TASK: plot groundwater recharge and total water storage at specific locations and basin average results.

Gridded datasets are stored by DRYP in netCDF and raster files. NetCFD files store temporal grided datasets whereas raster datasets store the last step of the simulation.

The following variables are stored as netCDF files:

  • Precipitaion (pre)

  • Potential evapotranspiration (pet)

  • Actual evapotranspiration (aet)

  • soil water content (tht)

  • Total groundwater recharge (rch)

  • Focused groundwater recharge (fch)

  • Water table elevation (wte)

  • Groundwater discharge (gdh)

  • Groundwater Evapotranspiration (egw)

  • Streamflow (dis)

  • Infiltration (inf)

  • Runoff (run)

  • Total water storage (twsc)

  • soil water content riparian zone (tht)

NetCDF files are easily handled with xarray, although netCDF4 or geopands can also be used to deal with this files.

[ ]:
import xarray as xr
[ ]:
#/Users/isamarcortes/Downloads/training/basin/output/Tana_IMb_sim_grid.nc
fname_nc = basin_path + "output/Tana_IMb_sim_grid.nc"
[ ]:
#xr.open_dataset(fname_nc)
[ ]:
def read_dataset(fname, var_name='tht'):
    # Open the first netCDF file
    data = xr.open_dataset(fname)
    data = data[var_name]
    return data

def slice_dataset_time(dataset, start_time="2003-01-01", end_time="2012-01-01"):
    # Slice the dataset between two dates
    dataset = dataset.sel(time=slice(start_time, end_time))
    return dataset
[ ]:
#read_dataset(fname_nc, var_name='tht')

DRYP stores as raster files, at the end of the simulation, the following variables:

  • water table elevation

  • soil and riparian water content

  • surface water storage

Raster files can easily be handled with rasterio.

[ ]:
import rasterio
[ ]:
def plot_raster_file(fname, ax=None, vmin=-20.0, vmax=20.0):
    # create plot
    if ax is None:
        fig, ax = plt.subplots()
    cmap = plt.cm.get_cmap('coolwarm_r', 12)
    data = rasterio.open(fname).read(1)
    im = ax.imshow(data,# origin="lower",#cmap=cmap,
                   #vmin=vmin, vmax=vmax,
                   )#extent=bounds)

    ax.axis('off')
    plt.colorbar(im)
    return im
[ ]:

fname_raster = basin_path + "output/Tana_IMb_sim_avg_wte_ini.asc"
[ ]:
plot_raster_file(fname_raster, ax=None, vmin=-20.0, vmax=20.0)

4.2. Post-processing model outputs

A post-processing component (DRYP_pptools) has been added to DRYP to perform basic operation with model outputs. This tool is still under development therefore capabilities are limited.

Bellow some of the operations that can be performed with DRYP_pptools are listed, some examples are also shown below:

  1. calulate long term average

  • calculate WRSI

  • calculate anomalies

  • calculate total water storage anomalies (TWSA)

  • calculate seasonal average

  • calculate seasonal anomalies

  • sampling model outputs

Initiallize DRYP post processing tool library

[ ]:
# Import general libraries
import matplotlib.pyplot as plt

# Import libraries from local repository
import sys
sys.path.append('C:/Users/Edisson/Documents/GitHub/DRYPv2.0.1')

import cuwalid.tools.DRYP_pptools as pptools

Initialize the grid postprocessing tool, this step will create a python object with all model output filenames. This function uses the model parameter input file to get model output and paths:

[ ]:
file_model_input = basin_path + "model/HAD_IMERG_Tana_input_sim.dmp"
[ ]:
#gridpp = pptools.grid_pptools(file_model_input)

When runing the function bellow, the long term average, WRSI, and TWSA will be estimated. These functions will directly use model paths and output files specified in the input model file

[ ]:
#gridpp.get_mean() # save mean values
#gridpp.get_wrsi() # save wsri
#gridpp.get_twsa() # save total water storage anomaly

We can also use DRYP_pptools without initalising the gridded component. We can directly call all the functions used by the post processing tools. This function is useful when model outputs are located in other folders.

A detailed description of each function can be found in DRYP documentation, which is located in DRYP/doc/build/html (you can use your browser to open html files).

  1. Calculate the long term average of model fluxes

[ ]:
fname_nc = basin_path + "postpp/Tana_IMb_sim_grid.nc"
[ ]:
fields = ['pre', 'inf', 'pet', 'rch', 'aet', 'gdh', 'egw', 'fch', 'twsc', 'run']
[ ]:
pptools.calculate_mean_from_netCDF(fname_nc, field=fields)#, fname_out=None, deltat='Y', start_time='2000-20-1', end_time=None)
[ ]:

TASK Plot long term average values of focused recharge

  1. Calculate the lWater Requirement Satisfaction Index (WRSI)

[ ]:
pptools.calculate_WRSI_from_netCDF(fname_nc)
  1. Calculate Total Water Storage Anomalies (TWSA)

[ ]:
pptools.calculate_twsa_from_netCDF(fname_nc)

  1. Calculate Aridity Index (AI)

[ ]:
pptools.calculate_AI_from_netCDF(fname_pre, fname_pet)
  1. Calculate the anomalies

[ ]:
pptools.calculate_anomalies_from_netCDF(fname_nc)
  1. Calculate seasonal averages

[ ]:
pptools.calculate_seasonal_average_from_netCDF(fname_nc, season="OND")

4.3. Sampling model outputs

Analysing model outputs often requires extracting especific values from gridded datasets that where not stored as .csv files. Python has different libraries to process netCDF files that can be uses to extract values from gridded datasets. Here, we have combined some of the Python libraries to facilitate DRYP postprocessing outputs, these functions have beed added into the DRYP pptool.

  • Extranting values from point locations

[ ]:
fname_points = basin_path + "/Tana/input/HAD_tana_dryp_station_utm.csv"
fname_netcdf = regional_path + "output/HAD_IMERG_sim_ini_grid.nc"
[ ]:
df = pptools.get_dataframe_point_from_netcdf(fname_netcdf, fname_points, field='dis')

TASK: Explore and plot the created dataframe

  • Extracting values from specified regions

For this task, a region has to be provided, it can be as raster file or a shapefile.

As an example a shape file will be used to extract infomation from a netCDF file.

[ ]:
fname_shp = basin_path + "datasets/shp/Tana_basin.shp"
fname_raster = regional_path + "model/inputs/TA_HAD_DEM_utm_mm.asc"
fname_netcdf = regional_path + "output/HAD_IMERG_sim_ini_grid.nc"
[ ]:
fname_output = regional_path + "postpp/HAD_area_mask.asc"
[ ]:
#This function will create araster file
rrtools.create_raster_from_shapefile(fname_shp, fname_raster, fname_output)
[ ]:
#This function will produce a pandas dataframe
df = pptools.extract_dataframe_zone_from_netcdf(fname_netcdf, fname_output, field=['aet', 'twsc'])

TASK: Explore and plot the created dataframe

Visualising spatial variables

Plot model results (this can be done with any library available in python for processing netcdf files or even in other application)

[ ]:
import shapefile as shp
[ ]:
def read_dataset(fname, var_name='tht'):
    # Open and select a variable of the netCDF file
    data = xr.open_dataset(fname)
    data = data[var_name]
    return data

def get_mask(fmask):
    # get a mask
    mask = np.flip(rasterio.open(fmask).read(1), 0)
    # mask values for visualisation
    mask = np.array(mask, dtype=float)
    mask[mask <= 0] = np.nan
    return mask # output an array

def resample_dataset(data, mean=True, delt='Y'):
    # calculate climatological mean
    if mean is True:
            data = data.resample(time='Y').mean()
    else:
            data = data.resample(time='Y').sum()
    return data # output an array

def get_bounds(fmask):
    # get map extend
    src = rasterio.open(fmask)
    extend = []
    for index in [0, 2, 1, 3]:
            extend.append(src.bounds[index])
    return extend

def plot_map_raster(bounds, data, ax, vmin=-20.0, vmax=20.0):
    # create plot
    cmap = plt.cm.get_cmap('coolwarm_r', 12)
    im = ax.imshow(data, cmap=cmap, origin="lower",
                            vmin=vmin, vmax=vmax,
                            extent=bounds)
    ax.axis('off')
    return im

def plot_map_raster_by_field(bounds, data, field, ax=None):
    # create figure
    if ax is None:
            fig, ax = plt.subplots()
    # create var
    columns = ['pre', 'pet', 'aet', 'tht', 'inf', 'rch', 'run',
                            'tls', 'fch', 'dch', 'gdh', 'wte', 'egw', 'dis']
    vmin = [0, 600, 0, 0.1, 0, 0, 0,
                            0, 0, 0, 0, 0, 0, 0]
    vmax = [1200, 24e2, 2e3, 0.6, 1200, 500, 1000,
                            500, 500, 500, 100, 80.0, 200, 1000]
    units = [1, 1, 1, 1, 1, 1.0, 1e-6,
                            1, 1, 1, 1e3, 1, 1, 1e-6]
    cmap = ['Blues', 'viridis_r', 'coolwarm_r', 'Blues', 'coolwarm_r', 'coolwarm_r', 'hot_r',
            'hot_r', 'hot_r', 'hot_r', 'Blues', 'Spectral', 'viridis', 'viridis']
    index = ['vmin', 'vmax', 'units', 'cmap']

    parameters_field = [vmin, vmax, units, cmap]

    var = pd.DataFrame(data=np.array(parameters_field),
            index=index,
            columns=columns)

    # create plot
    cmap = plt.cm.get_cmap(var[field]['cmap'], 12)
    im = ax.imshow(data, cmap=cmap, origin="lower",
                            vmin=var[field]['vmin'],
                            vmax=var[field]['vmax'],
                            extent=bounds)

    ax.axis('off')
    return im

def plot_polygon(polygone, ax):
    for shape in polygone.shapeRecords():
            x = [i[0] for i in shape.shape.points[:]]
            y = [i[1] for i in shape.shape.points[:]]
            ax.plot(x,y,'gray', linewidth=0.65)

def scale(data):
    ymax = np.nanmax(data)
    ymin = np.nanmin(data)
    data = (data-ymin)/(ymax-ymin)
    return data #print(ymax, ymin)
[ ]:
fname = basin_path + "postpp/Tana_IMb_sim82_grid_mean.nc"
fmask = basin_path + "model/input/HAD_basin_Tana_mask.asc"
fname_shp = basin_path + "dataset/shp/Tana_basin.shp"

Read dataset of model outputs

[ ]:
data = read_dataset(fname, var_name='pre')

read raster, a mask for plotting the dataset

[ ]:
mask = get_mask(fmask)

get boundaries of the raster dataset to specified the extent

[ ]:
bounds = get_bounds(fmask)

read shapefile, to add boundaries to the map

[ ]:
boundary = shp.Reader(fname_shp)
[ ]:
fig, ax = plt.subplots()
fig.set_size_inches(4.5, 3.1)

# plot raster
im = plot_map_raster_by_field(bounds, data*mask, ax, ifield)

# plot polygone
plot_polygon(boundary, ax)

# add labels and other characteristics
plt.axis('off')
plt.title(ifield)
plt.colorbar(im, label=ifield)
[ ]:
#fname_fig = ""
#plt.savefig(fname_fig, dpi=300)
[ ]: