EVALUATING STORM’S PERFORMANCE

MAM SEASON (FOR 1 CLUSTER)

[10]:

fig, ax = plt.subplots(figsize=(5, 4), dpi=150)
# ax.set_aspect("equal")
# season.plot(cmap="precip2_17lev", ax=ax)
season.plot(cmap="WhiteBlueGreenYellowRed", ax=ax)
plt.show()

../../_images/notebooks_STORM_six__15_0.png

So far we haven’t used the Crameri colormaps (even though we keep importing the library every time). They are designed to be color-blind safe. Let’s have a look at one of them through a not-that-fancy NumPy-style plot.

[11]:

# not-fancy plot
fig, ax = plt.subplots(figsize=(5, 4), dpi=150)
plt.imshow(season.data, origin="upper", cmap=cmc.bukavu_r, interpolation="none")

[11]:

<matplotlib.image.AxesImage at 0x7fdbb9508510>

../../_images/notebooks_STORM_six__17_1.png

Check out the cluster-mask stored in the output.

[12]:

# masks
kmeans = xr.where(skams != -1, skams, np.nan)
# kmeans = skams.where(-1, np.nan)
kmeans = np.unique(kmeans.data)
kmeans = kmeans[~np.isnan(kmeans)].astype("i1")

# counting cluster-pixels
print([xr.where(skams == x, skams, np.nan).count().data for x in kmeans])

[array(86293)]
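The counting step above can also be sketched with plain NumPy. This is only an illustration on a made-up mask (the array `mask` and its values are assumptions, not STORM output); as in the cell above, `-1` is taken to mark pixels outside the region.

```python
import numpy as np

# toy cluster mask (made-up values); -1 marks pixels outside the region
mask = np.array(
    [
        [-1, 0, 0, 1],
        [0, 0, 1, 1],
        [-1, 2, 2, 1],
    ]
)

# drop the no-data pixels, then count how often each cluster label occurs
labels, counts = np.unique(mask[mask != -1], return_counts=True)
pixel_counts = dict(zip(labels.tolist(), counts.tolist()))
print(pixel_counts)
```

`np.unique(..., return_counts=True)` returns labels and counts in one pass, so no loop over clusters is needed.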

For now, STORM outputs the total seasonal rainfall achieved by the simulation (stored in the corresponding CSV files).

Let’s see how far off the simulated seasonal field is from the mean (for the overall HAD).

[14]:

# construct a Pandas dataframe
ks = list(map(lambda x: xr.where(skams == x, season, np.nan).mean().data, kmeans))
out_ks = pd.DataFrame({"k": kmeans, "mean_nc4": ks}, dtype="object")

# input kmeans
in_ks = pd.read_csv(sfile.replace(".nc", "_kmeans.csv"))

# KMEANS from IN/OUT
kkmm = pd.merge(in_ks, out_ks, how="left", on="k")

# what does that look like?
kkmm

[14]:

	k	mean_in	mean_out	mean_xtra	mean_nc4
0	0	237.76195	237.806487	237.804079	237.8040790794155

Not bad at all, right? This shouldn’t be that surprising: STORM is designed precisely to stop when the seasonal average (over the catchment/region) is reached.

(do not mind the “mean_xtra” column)
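To put a number on “not bad”: for the row shown above, (237.8040790794155 - 237.76195) / 237.76195 is roughly 0.02%. The sketch below shows the same check for several clusters at once. It is a minimal NumPy illustration; the arrays `mask` and `season` and the `target` means are made-up stand-ins (the latter playing the role of `mean_in`).

```python
import numpy as np

# toy inputs (made-up): cluster mask with -1 as "no data",
# and a seasonal rainfall field on the same grid
mask = np.array([[0, 0, 1], [0, 1, 1], [-1, 1, -1]])
season = np.array(
    [[100.0, 110.0, 300.0], [90.0, 310.0, 290.0], [0.0, 300.0, 0.0]]
)
# hypothetical target mean per cluster (the role "mean_in" plays above)
target = {0: 100.0, 1: 298.0}

report = {}
for k in np.unique(mask[mask != -1]).tolist():
    simulated = season[mask == k].mean()  # mean over the cluster's pixels
    rel_dev = (simulated - target[k]) / target[k]  # relative deviation
    report[k] = (simulated, rel_dev)

for k, (simulated, rel_dev) in report.items():
    print(f"cluster {k}: mean={simulated:.3f}, deviation={100 * rel_dev:.3f}%")
```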

Let’s now plot the regions/clusters.

[15]:

# skams.plot(cmap=cmc.bukavu, levels=len(kmeans)+1, vmin=0, vmax=len(kmeans))
skams.plot(cmap=cmc.hawaii_r, levels=len(kmeans) + 1, vmin=0, vmax=len(kmeans))

[15]:

<matplotlib.collections.QuadMesh at 0x7fdbb78952d0>

../../_images/notebooks_STORM_six__23_1.png

ONE SIMULATED-SEASON \(\to\) CLUSTERS \(\equiv\) 4

Bear in mind that computation of the following block is somewhat “time-consuming”.

[16]:

# couple of realizations for 4 clusters
sfile = "../model_output/RUN_230901T1010_S1_nada_zero.nc"  # -> 4 CLUSTERS

# read the NetCDF file via Xarray
ds = xr.open_mfdataset(
    sfile,
    group="run_02",
    combine="nested",
    # concat_dim='time',
    decode_times=True,
    use_cftime=True,
    decode_cf=True,
    mask_and_scale=True,
    # data_vars=['rain'],
)
ds = ds.assign_coords(
    {"y": ds.projection_y_coordinate.load(), "x": ds.projection_x_coordinate.load()}
)

# load the simulated year to work with
storm_x = ds["year_2023"].load()
skams_x = ds["k_means"].load()
ds.close()

# seasonal rain
season_x = (storm_x.astype("f8").round(3)).sum(axis=0)

Straight to plotting (no need for intermediate Xarray-displaying).

MAM SEASON (FOR 4 CLUSTERS)

[18]:

fig, ax = plt.subplots(figsize=(5, 4), dpi=150)
# ax.set_aspect("equal")
# season_x.plot(cmap="precip2_17lev", ax=ax)
season_x.plot(cmap="WhiteBlueGreenYellowRed", ax=ax)
plt.show()

../../_images/notebooks_STORM_six__27_0.png

Check out the cluster-mask stored in the output.

[19]:

# masks
kmeans = xr.where(skams_x != -1, skams_x, np.nan)
# kmeans = skams_x.where(-1, np.nan)
kmeans = np.unique(kmeans.data)
kmeans = kmeans[~np.isnan(kmeans)].astype("i1")

# counting cluster-pixels
print([xr.where(skams_x == x, skams_x, np.nan).count().data for x in kmeans])

[array(5418), array(32407), array(33251), array(15217)]

[20]:

# construct a Pandas dataframe
ks_x = list(map(lambda x: xr.where(skams_x == x, season_x, np.nan).mean().data, kmeans))
out_ks = pd.DataFrame({"k": kmeans, "mean_nc4": ks_x}, dtype="object")

# output kmeans
out_ks

[20]:

	k	mean_nc4
0	0	585.1801125876708
1	1	234.3012125775295
2	2	123.34646416649124
3	3	375.2105049615562

[21]:

# input kmeans
in_ks = pd.read_csv(sfile.replace(".nc", "_kmeans.csv"))
in_ks

[21]:

	k	mean_in	mean_out	mean_xtra
0	0	122.20316	123.01343482432205	122.84915667218301
1	1	374.7068	374.86894913237956	374.7010015098805
2	2	582.65515	582.956113011005	582.7885422042028
3	3	234.15459	234.3431215911475	234.178792987438
4	k	mean_in	mean_out	mean_xtra
5	0	582.8078	585.1803500996468	584.8981125876714
6	1	234.23451	234.303528511481	234.02055815101698
7	2	122.230225	123.34755948793945	123.06533824546656
8	3	374.87192	375.2136687682176	374.93388460274764

[22]:

# careful! the CSV stacks one block per run, each with its own header row;
# since I chose 'run_02', I must select 'in_ks.iloc[5:, :]'

# merged IN/OUT  KMEANS
kkmm = pd.merge(in_ks.iloc[5:, :].astype(str), out_ks.astype(str), how="left", on="k")

# what does that look like?
kkmm

[22]:

	k	mean_in	mean_out	mean_xtra	mean_nc4
0	0	582.8078	585.1803500996468	584.8981125876714	585.1801125876708
1	1	234.23451	234.303528511481	234.02055815101698	234.3012125775295
2	2	122.230225	123.34755948793945	123.06533824546656	123.34646416649124
3	3	374.87192	375.2136687682176	374.93388460274764	375.2105049615562

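For some reason, STORM struggles to reach the same accuracy as when dealing with only 1 cluster: here the simulated cluster means (mean_nc4) exceed mean_in by up to about 2.4, against about 0.04 in the 1-cluster case.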

(do not mind the “mean_xtra” column)
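The hard-coded `iloc[5:, :]` above works because the CSV repeats its header before each run’s block. A slightly more general way to pick a run is sketched below. It is only an illustration on a made-up two-run CSV; the helper `select_run` and all the numbers are hypothetical.

```python
import io

import pandas as pd

# toy CSV (made-up numbers) mimicking two runs appended to one file,
# each block carrying its own header line
raw = """k,mean_in,mean_out
0,10.0,10.1
1,20.0,20.3
k,mean_in,mean_out
0,10.0,9.8
1,20.0,19.9
"""

df = pd.read_csv(io.StringIO(raw))

# a row whose "k" cell is the literal string "k" is a repeated header
is_header = df["k"].astype(str) == "k"
# block id: 0 for the first run, 1 after the first repeated header, ...
block = is_header.cumsum()


def select_run(frame, run_index):
    """Return the rows of the run_index-th block (0-based) as floats."""
    rows = frame[(block == run_index) & ~is_header]
    return rows.astype(float).reset_index(drop=True)


second = select_run(df, 1)  # the counterpart of 'run_02'
print(second)
```

Selecting by block rather than by row offset keeps working when the number of clusters (and hence rows per run) changes.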

Plot the regions/clusters.

[23]:

skams_x.plot(cmap=cmc.bukavu, levels=len(kmeans) + 1, vmin=0, vmax=len(kmeans))
# skams_x.plot(cmap=cmc.hawaii_r, levels=len(kmeans) + 1, vmin=0, vmax=len(kmeans))

[23]:

<matplotlib.collections.QuadMesh at 0x7fdbb7759dd0>

../../_images/notebooks_STORM_six__34_1.png

30 RUNS

You’re encouraged to: 1) run 30 simulations on STORM; and 2) re-do this notebook for the OND realization. You’d need the code below (which was used to produce the plots above) to aggregate all the seasonal storms for the simulated years given in the (NetCDF) files.

[ ]:

# file path for simulations based on 4 CLUSTERS
file4 = "./model_output/RUN_230831T1626_S1_nada_zero.nc"
# file path for simulations based on 1 CLUSTER
file1 = "./model_output/RUN_230830T1734_S1_nada_zero.nc"

# variable name where the rainfall is stored (in the nc.file)
var = "year_2023"


def COLLECT(sfile, grp, var):
    # FUNCTION to collect and aggregate all seasonal storms in a given RUN
    ds = xr.open_mfdataset(
        sfile,
        group=grp,
        combine="nested",
        # concat_dim="time",
        # data_vars=[var],
        decode_times=True,
        use_cftime=True,
        decode_cf=True,
        mask_and_scale=True,
    )
    ds = ds.assign_coords(
        {"y": ds.projection_y_coordinate.load(), "x": ds.projection_x_coordinate.load()}
    )

    # 0.002 is the scale_factor; -0.001999999999981128 is the add_offset
    # ...usually you don't need to do this; but something went wrong (apparently)
    storm = (
        ((ds[var] * 0.002) + -0.001999999999981128)
        .round(3)
        .astype("f8")
        .sum(dim="time_001")
        .load()
    )
    # use the line below instead, when outputs are produced correctly
    # storm = ds[var].sum(dim="time_001").load()

    skams = ds["k_means"].load()
    ds.close()
    return storm, skams


def COMPUTE(sfile):
    # FUNCTION to call all the RUNS in a file
    # "30" means 30 RUNS, i.e., run_01 ... run_30 (careful with your own simulations)
    xs = list(
        map(
            lambda x: COLLECT(sfile, x, var),
            [f"run_{x:02d}" for x in np.arange(30) + 1],
        )
    )
    storm = xr.concat(list(zip(*xs))[0], dim="run").mean(dim="run")
    skams = xr.concat(list(zip(*xs))[-1], dim="run").mean(dim="run")
    return storm, skams


def RELATIVE(season, skams):  # season=s4; skams=k4
    # FUNCTION to compute spatial BIASES
    # NOTE: relies on the global `rain` (reference field) defined in the plotting cell
    # mask-dealing
    kmeans = np.unique(skams)
    kmeans = kmeans[~np.isnan(kmeans)].astype("i1")
    # print( [xr.where(skams == x, skams, np.nan).count().data for x in kmeans] )
    ks = list(map(lambda x: xr.where(skams == x, season, np.nan).mean().data, kmeans))
    cx = (season.copy()).where(~skams.isnull())
    # mask into averages: each pixel gets the mean of its own cluster
    # (0 elsewhere, so the sum over "mask" does not add stray cluster labels)
    aa = [xr.where(skams == k, item, 0.0) for k, item in zip(kmeans, ks)]
    aa = xr.concat(aa, dim="mask").sum(dim="mask").where(~skams.isnull())
    # relative field
    rr = (cx - rain) / aa
    # rr.plot(cmap=cmc.roma)
    return rr
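The manual unpacking inside `COLLECT` (multiply by the scale factor, add the offset, round, then sum over time) can be checked on a tiny array. The sketch below uses the two constants quoted in the code above; the packed integers are made-up, and with `mask_and_scale=True` Xarray would normally do this decoding by itself.

```python
import numpy as np

SCALE_FACTOR = 0.002
ADD_OFFSET = -0.001999999999981128

# made-up packed values, shape (time, y, x); with these constants,
# a packed value of 1 decodes to (practically) zero rainfall
packed = np.array(
    [
        [[1, 501], [1, 1]],
        [[1, 2501], [251, 1]],
        [[1001, 1], [1, 1]],
    ],
    dtype="u2",
)

# unpack: value = packed * scale_factor + add_offset, rounded to 3 decimals
rain = (packed * SCALE_FACTOR + ADD_OFFSET).round(3).astype("f8")

# seasonal total: sum over the time axis
season = rain.sum(axis=0)
print(season)
```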

[ ]:

# put into action all the above functions

# results for 4 CLUSTERS
s4, k4 = COMPUTE(file4)
k4 = COLLECT(file4, "run_30", var)[-1]

# results for 1 CLUSTER
s1, k1 = COMPUTE(file1)
k1 = COLLECT(file1, "run_30", var)[-1]
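What `COMPUTE` does to the rainfall can be sketched without any files: build the group names, stack one seasonal field per run, and average along the new run axis. The fields below are random stand-ins (an assumption for illustration only); the group names follow the `run_02` / `run_30` pattern used above.

```python
import numpy as np

n_runs = 30
# group names following the pattern seen above: run_01 ... run_30
groups = [f"run_{i:02d}" for i in range(1, n_runs + 1)]

rng = np.random.default_rng(0)
# stand-in for one seasonal field per run (made-up values), shape (y, x)
fields = {g: rng.gamma(2.0, 100.0, size=(4, 5)) for g in groups}

# stack along a new leading "run" axis, then average over it
stack = np.stack([fields[g] for g in groups], axis=0)  # (run, y, x)
ensemble_mean = stack.mean(axis=0)
print(stack.shape, ensemble_mean.shape)
```

Averaging makes sense for rainfall but not for cluster labels, which are categorical; that is presumably why the cell above takes the mask from a single run instead of the averaged one.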

[ ]:

# the plotting happens here

# concatenate realization, 4-clusters, and 1-clusters (into ONE Xarray)
rain = rain_fs["rain"].where(~k1.isnull())
one_rain = xr.concat([rain, s1, s4], dim="case")
one_rain = one_rain.assign_coords(
    {"case": ["mean_MAM [realization]", "1K [30runs]", "4K [30runs]"]}
)

# plot the rainfall
fig, ax = plt.subplots(dpi=300)
one_rain.plot(
    figsize=(17, 5),
    x="x",
    y="y",
    col="case",
    col_wrap=3,
    cmap="precip2_17lev",
    levels=11,
    vmin=0,
    vmax=1500,
    cbar_kwargs={"shrink": 4 / 5, "pad": +0.01},
)
plt.savefig(
    "realisation_plot10--rain.pdf",
    bbox_inches="tight",
    pad_inches=0.02,
    facecolor=fig.get_facecolor(),
)
plt.close()
plt.clf()

# concatenate the relative-bias datasets
one_diff = xr.concat([RELATIVE(s1, k1), RELATIVE(s4, k4)], dim="case")
one_diff = one_diff.assign_coords(
    {"case": ["1K [30runs_rel.BIAS]", "4K [30runs_rel.BIAS]"]}
)

# plot the biases
fig, ax = plt.subplots(dpi=300)
one_diff.plot(
    figsize=(12, 5),
    x="x",
    y="y",
    col="case",
    col_wrap=2,
    # robust=True,
    cmap=cmc.vik_r,
    levels=10,
    vmin=-2,
    vmax=2.5,
    cbar_kwargs={"shrink": 4 / 5, "pad": +0.01},
)
plt.savefig(
    "realisation_plot10--diff.pdf",
    bbox_inches="tight",
    pad_inches=0.02,
    facecolor=fig.get_facecolor(),
)
plt.close()
plt.clf()
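For reference, the quantity plotted as “rel.BIAS” is the difference between simulated and reference rainfall at each pixel, divided by the simulated mean of the cluster the pixel belongs to. A minimal NumPy sketch of that idea follows; the function `relative_bias` and the three toy arrays are hypothetical and only mirror what `RELATIVE` does with Xarray objects.

```python
import numpy as np

# toy cluster mask; NaN marks pixels outside the region
mask = np.array(
    [[0, 0, 1, 1], [0, 0, 1, 1], [np.nan, 0, 1, np.nan]]
)
# toy simulated and reference seasonal totals (made-up values)
sim = np.array(
    [[10.0, 10.0, 20.0, 20.0], [10.0, 10.0, 20.0, 20.0], [0.0, 10.0, 20.0, 0.0]]
)
ref = np.array(
    [[8.0, 10.0, 20.0, 25.0], [10.0, 12.0, 20.0, 20.0], [0.0, 10.0, 15.0, 0.0]]
)


def relative_bias(sim, ref, mask):
    """(sim - ref) / cluster-mean(sim), NaN outside the mask."""
    out = np.full(sim.shape, np.nan)
    for k in np.unique(mask[~np.isnan(mask)]):
        inside = mask == k
        out[inside] = (sim[inside] - ref[inside]) / sim[inside].mean()
    return out


bias = relative_bias(sim, ref, mask)
print(bias)
```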
