{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Content:\n", "\n", "* Installation\n", "* Preparing model inputs parameters and dataset\n", "* Runing DRYP model\n", "* ***Post processing model outputs***\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Post-processing datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Understanding model outputs\n", "* Sampling model outputs\n", "* Post-processing model outputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NOTE**: Before you start, please change the path to access the following directories:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "training_general_path = \"D:/HAD/training/\"\n", "regional_path = \"D:/HAD/training/regional/\"\n", "basin_path = \"D:/HAD/training/basin/\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.1. Understanding model outputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Model results are stored in two different formats:\n", "\n", "* Comma delimited files (*.csv*) to store time series, and\n", "* netCDF files (*.nc*) to store gridded output datasets\n", "* ascii raster files (*.asc*) to store hydrological states for initial condions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Model outputs stored in csv files store model results at specified locations as well as average model results.\n", "Point location result files have a letter \"p\" followed by name of the stored variable (e.g. tht) added at the end\n", "of the name of the file. Basin average results have the letters \"avg\" added at the end of the file name." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "####File does not exist\n", "fsim = [\n", "basin_path + \"output/Tana_IMb_sim_p_dis_avg.csv\",\n", "]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK**: Take a look at the content of the file specified above. HINT: use pandas to open and explore the dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def aggregate_slice_csv(fname, agg_step='M', mean=True,\n", " date_start='2000-01-01', date_end='2023-01-01',\n", " timefield='Date'):\n", "\tdf = pd.read_csv(fname)\n", "\tdf[timefield] = pd.to_datetime(df[timefield])\n", "\tdf = df[df[timefield].between(date_start, date_end)]\n", "\tdf.index = pd.DatetimeIndex(df[timefield])\n", "\tif mean is True:\n", "\t\tdf = df.resample(agg_step).mean().reset_index()\n", "\telse:\n", "\t\tdf = df.resample(agg_step).sum().reset_index()\n", "\treturn df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plot data\n", "fig, ax = plt.subplots(3, 1, sharex=True)\n", "fig.set_size_inches(9, 4.5)\n", "label = ['pre', 'tht', \"aet\"]#, \"twsc\"]\n", "field = ['pre_0','tht_0', \"aet_0\"]#, \"twsc_0\"]\n", "#ilabel = \"sim\"\n", "for ifname in fsim:\n", " for ifield, ilabel, iax in zip(field, label, fig.axes):\n", " data = aggregate_slice_csv(ifname, timefield='Date')\n", " #ax.plot(data['date'], data['flow(m3/s)'], label=ilabel)#data.plot(y='Flow (Cumecs)') \n", " if ifield == \"twsc_0\":\n", " iax.plot(data['Date'], np.cumsum(data[ifield]))#, label=ilabel)\n", " elif ifield == \"tht_0\":\n", " iax.plot(data['Date'], data[ifield])#, scale=True)\n", " else:\n", " iax.plot(data['Date'], data[ifield])#, label=ilabel)\n", " \n", " iax.set_ylabel(ilabel)\n", "iax.legend([\"Sim\"], frameon=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reading output stored at specific locations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fnamesim=[\n", "basin_path + \"output/Juba_IMa_sim_p_dis.csv\",\n", "]\n", "#/Users/isamarcortes/Downloads/training/basin/output/Juba_IMa_sim_p_dis.csv" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(2, 1, sharex=True)\n", "fig.set_size_inches(9, 3)\n", "\n", "for ifnamesim in fnamesim:\n", " data = aggregate_slice_csv(ifnamesim)\n", " stn = ['dis_5', \"dis_3\"]\n", " for istn, iax in zip(stn, fig.axes):\n", " iax.plot(data['Date'], data[istn]/30.5/86400, label='Sim')\n", " iax.set_ylabel(\"Flow (m3/s)\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# uncomment this line and save it in a local directory\n", "#fname_out = 'd:/HAD_basins/basin_postpp/fig/Juba_streamflow.png'\n", "#fig.savefig(fname_out, dpi=300)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK**: plot groundwater recharge and total water storage at specific locations and basin average results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Gridded datasets are stored by DRYP in netCDF and raster files. NetCFD files store temporal grided datasets whereas raster datasets store the last step of the simulation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following variables are stored as netCDF files:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Precipitaion (pre)\n", "* Potential evapotranspiration (pet)\n", "* Actual evapotranspiration (aet)\n", "* soil water content (tht)\n", "* Total groundwater recharge (rch)\n", "* Focused groundwater recharge (fch)\n", "* Water table elevation (wte)\n", "* Groundwater discharge (gdh)\n", "* Groundwater Evapotranspiration (egw)\n", "* Streamflow (dis)\n", "* Infiltration (inf)\n", "* Runoff (run)\n", "* Total water storage (twsc)\n", "* soil water content riparian zone (tht)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NetCDF files are easily handled with xarray, although netCDF4 or geopands can also be used to deal with this files." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import xarray as xr" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#/Users/isamarcortes/Downloads/training/basin/output/Tana_IMb_sim_grid.nc\n", "fname_nc = basin_path + \"output/Tana_IMb_sim_grid.nc\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#xr.open_dataset(fname_nc)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def read_dataset(fname, var_name='tht'):\n", "\t# Open the first netCDF file\n", "\tdata = xr.open_dataset(fname)\n", "\tdata = data[var_name]\n", "\treturn data\n", "\n", "def slice_dataset_time(dataset, start_time=\"2003-01-01\", end_time=\"2012-01-01\"):\n", "\t# Slice the dataset between two dates\n", "\tdataset = dataset.sel(time=slice(start_time, end_time))\n", "\treturn dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#read_dataset(fname_nc, var_name='tht')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "DRYP stores as raster files, at the end of the simulation, the following variables:\n", "\n", "* water table elevation\n", "* soil and riparian water content\n", "* surface water storage\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Raster files can easily be handled with rasterio." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import rasterio" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_raster_file(fname, ax=None, vmin=-20.0, vmax=20.0):\n", " # create plot\n", " if ax is None:\n", " fig, ax = plt.subplots()\n", " cmap = plt.cm.get_cmap('coolwarm_r', 12)\n", " data = rasterio.open(fname).read(1)\n", " im = ax.imshow(data,# origin=\"lower\",#cmap=cmap, \n", " #vmin=vmin, vmax=vmax,\n", " )#extent=bounds)\t\n", " \n", " ax.axis('off')\n", " plt.colorbar(im)\n", " return im" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "fname_raster = basin_path + \"output/Tana_IMb_sim_avg_wte_ini.asc\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_raster_file(fname_raster, ax=None, vmin=-20.0, vmax=20.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.2. Post-processing model outputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A post-processing component (DRYP_pptools) has been added to DRYP to perform basic operation with model outputs. This tool is still under\n", "development therefore capabilities are limited.\n", "\n", "Bellow some of the operations that can be performed with DRYP_pptools are listed, some examples are also shown below: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. calulate long term average\n", "* calculate WRSI\n", "* calculate anomalies\n", "* calculate total water storage anomalies (TWSA)\n", "* calculate seasonal average\n", "* calculate seasonal anomalies\n", "* sampling model outputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initiallize DRYP post processing tool library" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import general libraries\n", "import matplotlib.pyplot as plt\n", "\n", "# Import libraries from local repository\n", "import sys\n", "sys.path.append('C:/Users/Edisson/Documents/GitHub/DRYPv2.0.1')\n", "\n", "import cuwalid.tools.DRYP_pptools as pptools" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initialize the grid postprocessing tool, this step will create a python object with all model output filenames. This function uses the model parameter input file to get model output and paths:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file_model_input = basin_path + \"model/HAD_IMERG_Tana_input_sim.dmp\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#gridpp = pptools.grid_pptools(file_model_input)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When runing the function bellow, the long term average, WRSI, and TWSA will be estimated. These functions will directly use model paths and output files specified in the input model file" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#gridpp.get_mean() # save mean values\n", "#gridpp.get_wrsi() # save wsri\n", "#gridpp.get_twsa() # save total water storage anomaly" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use DRYP_pptools without initalising the gridded component. We can directly call all the functions used by the post processing tools. This function is useful when model outputs are located in other folders.\n", "\n", "A detailed description of each function can be found in DRYP documentation, which is located in DRYP/doc/build/html (you can use your browser to open html files). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Calculate the long term average of model fluxes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fname_nc = basin_path + \"postpp/Tana_IMb_sim_grid.nc\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fields = ['pre', 'inf', 'pet', 'rch', 'aet', 'gdh', 'egw', 'fch', 'twsc', 'run']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pptools.calculate_mean_from_netCDF(fname_nc, field=fields)#, fname_out=None, deltat='Y', start_time='2000-20-1', end_time=None)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK** Plot long term average values of focused recharge" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Calculate the lWater Requirement Satisfaction Index (WRSI)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pptools.calculate_WRSI_from_netCDF(fname_nc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Calculate Total Water Storage Anomalies (TWSA)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pptools.calculate_twsa_from_netCDF(fname_nc)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Calculate Aridity Index (AI)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pptools.calculate_AI_from_netCDF(fname_pre, fname_pet)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Calculate the anomalies" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pptools.calculate_anomalies_from_netCDF(fname_nc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. Calculate seasonal averages" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pptools.calculate_seasonal_average_from_netCDF(fname_nc, season=\"OND\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.3. Sampling model outputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Analysing model outputs often requires extracting especific values from gridded datasets that where not stored as *.csv* files.\n", "Python has different libraries to process netCDF files that can be uses to extract values from gridded datasets. Here, we have combined some of the Python libraries to facilitate DRYP postprocessing outputs, these functions have beed added into the DRYP pptool." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Extranting values from point locations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fname_points = basin_path + \"/Tana/input/HAD_tana_dryp_station_utm.csv\"\n", "fname_netcdf = regional_path + \"output/HAD_IMERG_sim_ini_grid.nc\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pptools.get_dataframe_point_from_netcdf(fname_netcdf, fname_points, field='dis')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK**: Explore and plot the created dataframe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Extracting values from specified regions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this task, a region has to be provided, it can be as raster file or a shapefile.\n", "\n", "As an example a shape file will be used to extract infomation from a netCDF file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fname_shp = basin_path + \"datasets/shp/Tana_basin.shp\"\n", "fname_raster = regional_path + \"model/inputs/TA_HAD_DEM_utm_mm.asc\"\n", "fname_netcdf = regional_path + \"output/HAD_IMERG_sim_ini_grid.nc\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fname_output = regional_path + \"postpp/HAD_area_mask.asc\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#This function will create araster file \n", "rrtools.create_raster_from_shapefile(fname_shp, fname_raster, fname_output)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#This function will produce a pandas dataframe\n", "df = pptools.extract_dataframe_zone_from_netcdf(fname_netcdf, fname_output, field=['aet', 'twsc'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK**: Explore and plot the created dataframe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Visualising spatial variables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot model results (this can be done with any library available in python for processing netcdf files or even in other application)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import shapefile as shp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def read_dataset(fname, var_name='tht'):\n", "\t# Open and select a variable of the netCDF file\n", "\tdata = xr.open_dataset(fname)\n", "\tdata = data[var_name]\n", "\treturn data\n", "\n", "def get_mask(fmask):\n", "\t# get a mask\n", "\tmask = np.flip(rasterio.open(fmask).read(1), 0)\n", "\t# mask values for visualisation\n", "\tmask = np.array(mask, dtype=float)\n", "\tmask[mask <= 0] = np.nan\n", "\treturn mask # output an array\n", "\n", "def resample_dataset(data, mean=True, delt='Y'):\n", "\t# calculate climatological mean\n", "\tif mean is True:\n", "\t\tdata = data.resample(time='Y').mean()\n", "\telse:\n", "\t\tdata = data.resample(time='Y').sum()\t\n", "\treturn data # output an array\n", "\n", "def get_bounds(fmask):\n", "\t# get map extend\n", "\tsrc = rasterio.open(fmask)\n", "\textend = []\n", "\tfor index in [0, 2, 1, 3]:\n", "\t\textend.append(src.bounds[index])\n", "\treturn extend\n", "\t\n", "def plot_map_raster(bounds, data, ax, vmin=-20.0, vmax=20.0):\n", "\t# create plot\t\n", "\tcmap = plt.cm.get_cmap('coolwarm_r', 12)\n", "\tim = ax.imshow(data, cmap=cmap, origin=\"lower\",\n", "\t\t\t\tvmin=vmin, vmax=vmax,\n", "\t\t\t\textent=bounds)\t\n", "\tax.axis('off')\n", "\treturn im\n", "\t\n", "def plot_map_raster_by_field(bounds, data, field, ax=None):\n", "\t# create figure\n", "\tif ax is None:\n", "\t\tfig, ax = plt.subplots()\n", " # create var \n", "\tcolumns = ['pre', 'pet', 'aet', 'tht', 'inf', 'rch', 'run',\n", "\t\t\t\t'tls', 'fch', 'dch', 'gdh', 'wte', 'egw', 'dis']\n", "\tvmin = [0, 600, 0, 0.1, 0, 0, 0,\n", "\t\t\t\t0, 0, 0, 0, 0, 0, 0]\n", "\tvmax = [1200, 24e2, 2e3, 0.6, 1200, 500, 1000,\n", "\t\t\t\t500, 500, 500, 100, 80.0, 200, 1000]\n", "\tunits = [1, 1, 1, 1, 1, 1.0, 1e-6,\n", "\t\t\t\t1, 1, 1, 1e3, 1, 1, 1e-6]\n", "\tcmap = ['Blues', 'viridis_r', 'coolwarm_r', 'Blues', 'coolwarm_r', 'coolwarm_r', 'hot_r',\n", " 'hot_r', 'hot_r', 'hot_r', 'Blues', 'Spectral', 'viridis', 'viridis']\n", "\tindex = ['vmin', 'vmax', 'units', 'cmap']\n", "\t\n", "\tparameters_field = [vmin, vmax, units, cmap]\n", "\n", "\tvar = pd.DataFrame(data=np.array(parameters_field),\n", "\t\tindex=index,\n", "\t\tcolumns=columns)\n", "\t\n", "\t# create plot\t\n", "\tcmap = plt.cm.get_cmap(var[field]['cmap'], 12)\n", "\tim = ax.imshow(data, cmap=cmap, origin=\"lower\",\n", "\t\t\t\tvmin=var[field]['vmin'],\n", "\t\t\t\tvmax=var[field]['vmax'],\n", "\t\t\t\textent=bounds)\n", "\t\n", "\tax.axis('off')\n", "\treturn im\n", "\n", "def plot_polygon(polygone, ax):\n", "\tfor shape in polygone.shapeRecords():\n", "\t\tx = [i[0] for i in shape.shape.points[:]]\n", "\t\ty = [i[1] for i in shape.shape.points[:]]\n", "\t\tax.plot(x,y,'gray', linewidth=0.65)\n", "\n", "def scale(data):\n", "\tymax = np.nanmax(data)\n", "\tymin = np.nanmin(data)\n", "\tdata = (data-ymin)/(ymax-ymin)\n", "\treturn data #print(ymax, ymin)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fname = basin_path + \"postpp/Tana_IMb_sim82_grid_mean.nc\"\n", "fmask = basin_path + \"model/input/HAD_basin_Tana_mask.asc\"\n", "fname_shp = basin_path + \"dataset/shp/Tana_basin.shp\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read dataset of model outputs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = read_dataset(fname, var_name='pre')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "read raster, a mask for plotting the dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mask = get_mask(fmask)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "get boundaries of the raster dataset to specified the extent" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bounds = get_bounds(fmask)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "read shapefile, to add boundaries to the map" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "boundary = shp.Reader(fname_shp)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots()\n", "fig.set_size_inches(4.5, 3.1)\n", "\n", "# plot raster\n", "im = plot_map_raster_by_field(bounds, data*mask, ax, ifield)\n", "\t\t\n", "# plot polygone\t\n", "plot_polygon(boundary, ax)\n", "\t\t\n", "# add labels and other characteristics\n", "plt.axis('off')\n", "plt.title(ifield)\n", "plt.colorbar(im, label=ifield)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#fname_fig = \"\"\n", "#plt.savefig(fname_fig, dpi=300)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:cwld]", "language": "python", "name": "conda-env-cwld-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 4 }