{ "cells": [ { "cell_type": "markdown", "id": "39a05a8e-b585-41ed-9f66-bb407f41e2cf", "metadata": {}, "source": [ "# **GENERATING NETCDF OUTPUTS** #\n", "\n", "#### Objective: ####\n", "+ Create [NetCDF](https://unidata.github.io/netcdf4-python/#netCDF4) files *almost?* fully-complaint with [CF](https://cfconventions.org/) [conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html)\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "6f4fcd5c-8c0e-40c5-9feb-7facfa814032", "metadata": {}, "source": [ "Computational architecture nowadays is [64](https://en.wikipedia.org/wiki/64-bit_computing)-[bit](https://en.wikipedia.org/wiki/Bit) based.\n", "When dealing with numbers, this implies that computers have the capacity to store (in memory or disk) up to $2^{64}$ different values.\n", "For practical purposes, this implies that one has up to 64 spaces to store any possible combination of 1's and 0's (that ultimately represents the actual value we want).\n", "Rainfall is the variable we're interested here, so one can easily store 100 mm (or mm/h) in that 64-space \"vector\".\n", "Compared to $2^{64}$, $100$ is a pretty small number, having the following representation in a binary system: $1100100$.\n", "If you noticed, it only took 7 \"spaces\" to represent 100 in binary.\n", "Hence, wouldn't be smarter to use 7 instead of 64 \"spaces\" to store the number 100?.\n", "For [historical reasons](https://softwareengineering.stackexchange.com/questions/120126/what-is-the-history-of-why-bytes-are-eight-bits), 8-bits (i.e., 1-byte) is the minimum \"space\"-configuration of modern computers.\n", "Thus, using 8 instead of 64 spaces means saving actual \"space\" up to **4** times!.\n", "\n", "This notebook shows you how to take advantage of computer architecture concepts so you can actually save space in your PC, while storing/exporting NetCDF files in the most efficient/optimal way.\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "ce4b93a8-b154-414e-b78a-5a4406b3ed05", "metadata": {}, "source": [ "The masking of the data and the generation of NetCDF files is done by STORM through the functions: ROUNDX, NCBYTES_RAIN, NC_FILE_I, and NC_FILE_II of the [rainfall.py](../rainfall.py) module.\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "7da9b588-ddac-40d2-8434-e256612cd420", "metadata": {}, "source": [ "### STORM PARAMETERS ###\n", "" ] }, { "cell_type": "code", "execution_count": 36, "id": "1ef58462-5d24-4c32-8c70-a3021719522f", "metadata": {}, "outputs": [], "source": [ "# OGC-WKT for HAD [taken from https://epsg.io/42106]\n", "WKT_OGC = (\n", " 'PROJCS[\"WGS84_/_Lambert_Azim_Mozambique\",'\n", " 'GEOGCS[\"unknown\",'\n", " 'DATUM[\"unknown\",'\n", " 'SPHEROID[\"Normal Sphere (r=6370997)\",6370997,0]],'\n", " 'PRIMEM[\"Greenwich\",0,'\n", " 'AUTHORITY[\"EPSG\",\"8901\"]],'\n", " 'UNIT[\"degree\",0.0174532925199433,'\n", " 'AUTHORITY[\"EPSG\",\"9122\"]]],'\n", " 'PROJECTION[\"Lambert_Azimuthal_Equal_Area\"],'\n", " 'PARAMETER[\"latitude_of_center\",5],'\n", " 'PARAMETER[\"longitude_of_center\",20],'\n", " 'PARAMETER[\"false_easting\",0],'\n", " 'PARAMETER[\"false_northing\",0],'\n", " 'UNIT[\"metre\",1,'\n", " 'AUTHORITY[\"EPSG\",\"9001\"]],'\n", " 'AXIS[\"Easting\",EAST],'\n", " 'AXIS[\"Northing\",NORTH],'\n", " 'AUTHORITY[\"EPSG\",\"42106\"]]'\n", ")\n", "\n", "T_RES = 30 # in minutes! -> TEMPORAL.RES of TIME.SLICES\n", "TIME_DICT_ = dict(seconds=1, minutes=1 / 60, hours=1 / 60**2, days=1 / (60**2 * 24))\n", "# UNITS (since DATE_ORIGIN) for NC.TIME dim\n", "TIME_OUTNC = \"minutes\"\n", "\n", "# to store dates as INT\n", "DATE_ORIGIN = \"1970-01-01\"\n", "# Local Time Zone (see links below for more names)\n", "TIME_ZONE = \"Africa/Addis_Ababa\"\n", "\n", "# RAINFMT = 'f4'\n", "# 'u' for UNSIGNED.INT || 'i' for SIGNED.INT || 'f' for FLOAT\n", "# number of Bytes (1, 2, 4 or 8) to store the RAINFALL variable (into)\n", "RAINFMT = \"u2\"\n", "TIMEINT = \"u4\" # format for integers in TIME dimension\n", "TIMEFIL = +(2 ** (int(TIMEINT[-1]) * 8)) - 1\n", "\n", "PRECISION = 0.025\n", "iMIN = 8.0" ] }, { "cell_type": "markdown", "id": "c107d9f2-f1af-447c-b084-0b61dc084e06", "metadata": {}, "source": [ "#### ANOTHER SET OF PARAMETERS ####\n", "" ] }, { "cell_type": "code", "execution_count": 37, "id": "690329e6-3cb2-415d-aa00-f81e7d5b3277", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n", "65535\n" ] } ], "source": [ "INTEGER = int(RAINFMT[-1])\n", "\n", "# if one wants un-signed integers (e.g.) -> 1 (1 because you need 0 for NaN-filling)\n", "MINIMUM = 1\n", "# 65535 (largest unsigned integer of 16 Bits -> 0 also counts)\n", "MAXIMUM = +(2 ** (INTEGER * 8)) - 1\n", "# # 4294967296 (largest unsigned integer of 32 Bits)\n", "# MAXIMUM = +(2 ** (32)) - 1\n", "\n", "print(MINIMUM)\n", "print(MAXIMUM)" ] }, { "cell_type": "code", "execution_count": 38, "id": "50967897-4238-4705-afb3-fa678d835c42", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "iMAX = 1646.325\n", "SCLf = 0.025\n", "ADDf = 7.975\n" ] } ], "source": [ "# TO SCALE 'DOWNW' FLOATS TO INTEGERS\n", "\n", "# NORMALIZING THE RAINFALL SO IT CAN BE STORED AS 16-BIT INTEGER (65,536 -> unsigned)\n", "# https://stackoverflow.com/a/59193141/5885810 (scaling 'integers')\n", "# https://stats.stackexchange.com/a/70808/354951 (normalize data 0-1)\n", "\n", "# if you want a larger precision (or your variable is in the 'low' scale,\n", "# ...say Summer Temperatures in Celsius) you must/could lower this limit.\n", "iMAX = PRECISION * (MAXIMUM - MINIMUM - 1) + iMIN\n", "print(f\"iMAX = {iMAX}\")\n", "\n", "# SCALING FACTOR:\n", "SCL = (iMAX - iMIN) / (MAXIMUM - MINIMUM - 1) # -1 because 0 doesn't count\n", "\n", "# ADDITIVE FACTOR:\n", "# ADD = iMAX - SCL * MAXIMUM\n", "ADD = iMIN - SCL * MINIMUM\n", "\n", "print(f\"SCLf = {SCL}\")\n", "print(f\"ADDf = {ADD}\")" ] }, { "cell_type": "markdown", "id": "ca95c85b-177c-4282-9dcd-c34b885061bb", "metadata": {}, "source": [ "## CHECKING LIMITS AND RESOLUTION ##" ] }, { "cell_type": "code", "execution_count": 39, "id": "d90150d1-203b-4f9f-8a16-dc21787436a8", "metadata": {}, "outputs": [], "source": [ "# loading libraries\n", "\n", "from datetime import datetime, timezone\n", "from zoneinfo import ZoneInfo\n", "\n", "import netCDF4 as nc4\n", "import numpy as np\n", "import rioxarray as rio\n", "import xarray as xr\n", "from dateutil.tz import tzlocal" ] }, { "cell_type": "code", "execution_count": 40, "id": "5d92ac85-9b6e-4802-9f59-62043d1b1ab2", "metadata": {}, "outputs": [], "source": [ "# help functions\n", "\n", "\n", "def ROUNDX(x, prec=3, base=PRECISION):\n", " # FUNCTION to CUSTOM.ROUNDING TO SPECIFIC.BASE\n", " # https://stackoverflow.com/a/18666678/5885810\n", " return (base * (np.array(x) / base).round()).round(prec)\n", "\n", "\n", "def BASE_ROUND(stamps, base=T_RES):\n", " # FUNCTION to ROUND TIME.STAMPS to the 'T_RES' FLOOR! (or NEAREST 'T_RES')\n", " # https://stackoverflow.com/a/2272174/5885810\n", " # \"*60\" because STAMPS come in seconds; and T_RES is in minutes\n", " base = base * TIME_DICT_[TIME_OUTNC] * 60\n", " iround = base * (np.ceil(stamps / base) - 1) # .astype( TIMEINT )\n", " return iround" ] }, { "cell_type": "markdown", "id": "36a1ec40-0000-4d77-bbb2-3c45b9caf35a", "metadata": {}, "source": [ "We won't compute anything here.\n", "Somehow, you'll have arrived to the stage of having some data in memory, which you're about to export as NetCDF file.\n", "For this purpose, we'll load in memory the **for_wrong-1.nc** dataset. *(exported in notebook [for_](../for_.ipynb#wrong))*" ] }, { "cell_type": "code", "execution_count": 41, "id": "13413f40-2ca8-4416-a8ef-431aab305e05", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'x': \n", " float64 x(x)\n", " _FillValue: nan\n", " axis: X\n", " long_name: x coordinate of projection\n", " standard_name: projection_x_coordinate\n", " units: metre\n", " unlimited dimensions: \n", " current shape = (408,)\n", " filling on,\n", " 'y': \n", " float64 y(y)\n", " _FillValue: nan\n", " axis: Y\n", " long_name: y coordinate of projection\n", " standard_name: projection_y_coordinate\n", " units: metre\n", " unlimited dimensions: \n", " current shape = (470,)\n", " filling on,\n", " 'xomethin': \n", " int32 xomethin()\n", " crs_wkt: PROJCS[\"WGS84_/_Lambert_Azim_Mozambique\",GEOGCS[\"unknown\",DATUM[\"unknown\",SPHEROID[\"Normal Sphere (r=6370997)\",6370997,0]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433]],PROJECTION[\"Lambert_Azimuthal_Equal_Area\"],PARAMETER[\"latitude_of_center\",5],PARAMETER[\"longitude_of_center\",20],PARAMETER[\"false_easting\",0],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"42106\"]]\n", " spatial_ref: PROJCS[\"WGS84_/_Lambert_Azim_Mozambique\",GEOGCS[\"unknown\",DATUM[\"unknown\",SPHEROID[\"Normal Sphere (r=6370997)\",6370997,0]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433]],PROJECTION[\"Lambert_Azimuthal_Equal_Area\"],PARAMETER[\"latitude_of_center\",5],PARAMETER[\"longitude_of_center\",20],PARAMETER[\"false_easting\",0],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"42106\"]]\n", " GeoTransform: 1340000.0 5000.0 0.0 1170000.0 0.0 -5000.0\n", " unlimited dimensions: \n", " current shape = ()\n", " filling on, default _FillValue of -2147483647 used,\n", " 'rain': \n", " float32 rain(y, x)\n", " _FillValue: nan\n", " lat#axis: Y\n", " lat#bounds: lat_bnds\n", " lat#DimensionNames: lat\n", " lat#LongName: Latitude at the center of\n", " \t\t\t0.05 degree grid intervals of latitude\n", " \t\t\tfrom -90 to 90.\n", " lat#origname: lat\n", " lat#standard_name: latitude\n", " lat#units: degrees_north\n", " lat#_FillValue: nan\n", " lon#axis: X\n", " lon#bounds: lon_bnds\n", " lon#DimensionNames: lon\n", " lon#LongName: Longitude at the center of\n", " \t\t\t0.05 degree grid intervals of longitude \n", " \t\t\tfrom -180 to 180.\n", " lon#origname: lon\n", " lon#standard_name: longitude\n", " lon#units: degrees_east\n", " lon#_FillValue: nan\n", " scale_factor: 1.0\n", " add_offset: 0.0\n", " grid_mapping: xomethin\n", " unlimited dimensions: \n", " current shape = (470, 408)\n", " filling on,\n", " 'mask': \n", " uint8 mask(y, x)\n", " _FillValue: 0\n", " lat#axis: Y\n", " lat#bounds: lat_bnds\n", " lat#DimensionNames: lat\n", " lat#LongName: Latitude at the center of\n", " \t\t\t0.05 degree grid intervals of latitude\n", " \t\t\tfrom -90 to 90.\n", " lat#origname: lat\n", " lat#standard_name: latitude\n", " lat#units: degrees_north\n", " lat#_FillValue: nan\n", " lon#axis: X\n", " lon#bounds: lon_bnds\n", " lon#DimensionNames: lon\n", " lon#LongName: Longitude at the center of\n", " \t\t\t0.05 degree grid intervals of longitude \n", " \t\t\tfrom -180 to 180.\n", " lon#origname: lon\n", " lon#standard_name: longitude\n", " lon#units: degrees_east\n", " lon#_FillValue: nan\n", " scale_factor: 1.0\n", " add_offset: 0.0\n", " grid_mapping: xomethin\n", " unlimited dimensions: \n", " current shape = (470, 408)\n", " filling on}" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# what's the data?\n", "nc_file = \"for_wrong-1.nc\"\n", "# read the data\n", "data = nc4.Dataset(nc_file)\n", "\n", "# how does it look like?... in the inside!\n", "data.variables" ] }, { "cell_type": "markdown", "id": "61db513e-cf1e-4d0d-8e17-cc7bf6c6ab49", "metadata": {}, "source": [ "**rain** is the variable we're interested in (and it's the one stored in the nc.file)." ] }, { "cell_type": "code", "execution_count": 42, "id": "2e155ea8-bdfb-4848-945a-d682d635b440", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "masked_array(\n", " data=[[ 8.85314465, 12.18742561, 12.00609589, ..., 19.86124039,\n", " 21.57285881, 22.31961441],\n", " [ 11.41961765, 12.11657238, 11.99657345, ..., 21.30685425,\n", " 22.31961441, 22.31961441],\n", " [ 11.08961964, 12.00933456, 11.87809658, ..., 21.30685425,\n", " 22.31961441, 22.31961441],\n", " ...,\n", " [461.05093384, 461.05093384, 462.56283569, ..., 530.9977417 ,\n", " 516.53491211, 516.53491211],\n", " [472.34197998, 472.34197998, 470.32394409, ..., 530.9977417 ,\n", " 516.53491211, 516.53491211],\n", " [459.64810181, 459.64810181, 462.9229126 , ..., 527.46783447,\n", " 526.47857666, 526.47857666]],\n", " mask=False,\n", " fill_value=1e+20)" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# what does the inside of RAIN look like?\n", "data[\"rain\"][:]" ] }, { "cell_type": "markdown", "id": "370e0dd0-ed14-46e6-b34d-21fa3cb06ca1", "metadata": {}, "source": [ "Print some statistics..." ] }, { "cell_type": "code", "execution_count": 43, "id": "9e1c9891-c698-4255-bf75-97aa983aeedb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8.853144645690918\n", "1527.81982421875\n" ] } ], "source": [ "print(data[\"rain\"][:].min())\n", "print(data[\"rain\"][:].max())" ] }, { "cell_type": "markdown", "id": "a0e0b449-c154-49db-8d58-cfe0c2e03280", "metadata": {}, "source": [ "Having previously established the [limits](#other), let's work out if the precision (also previously established) is correct enough." ] }, { "cell_type": "code", "execution_count": 44, "id": "6bc8f1a0-f021-4ff4-873a-1d14757dee5c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "starting from 8.0, you'd need a max. of ~3284 to guarantee an epsilon of 0.05\n", "starting from 8.0, you'd need a max. of ~1973 to guarantee an epsilon of 0.03\n", "starting from 8.0, you'd need a max. of ~1646 to guarantee an epsilon of 0.025\n", "starting from 8.0, you'd need a max. of ~1318 to guarantee an epsilon of 0.02\n", "starting from 8.0, you'd need a max. of ~663 to guarantee an epsilon of 0.01\n" ] } ], "source": [ "# run your own (customized) tests\n", "\n", "# initial seed for \"temp\"\n", "temp = 3.14159\n", "# seed larger than the maximum of RAIN data\n", "seed = 4000\n", "# test different PRECISIONs\n", "epsilon = [0.05, 0.03, 0.025, 0.02, 0.01]\n", "\n", "for epsn in epsilon:\n", " while temp > epsn:\n", " temp = (seed - iMIN) / (MAXIMUM - MINIMUM - 1)\n", " seed = seed - 1\n", " print(\n", " (\n", " f\"starting from {iMIN}, you'd need a max. of ~{seed+1} to guarantee an epsilon of {epsn}\"\n", " )\n", " )" ] }, { "cell_type": "markdown", "id": "def03ae8-6f9c-4630-8b7d-6ecfa0e87482", "metadata": {}, "source": [ "So, starting a bit lower that the **MINIMUM**, and going about the **PRECISION** (of 0.025), we'll surpass our **MAXIMUM**.\n", "This means that we can indeed represent all the numbers in **rain** using just [16-bit](https://en.wikipedia.org/wiki/16-bit_computing) un-signed integers.\n", "\n", "Let's test if that's at all true." ] }, { "cell_type": "code", "execution_count": 45, "id": "5d68bca0-74d3-4a4d-8dde-578102f7189a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(65534,)\n", "1646.325\n", "True\n", "(array([], dtype=int64),)\n", "[ 8. 8.025 8.05 ... 1646.275 1646.3 1646.325]\n" ] } ], "source": [ "# testing if i'm using my whole 16-INT \"bit-space\"\n", "allv = PRECISION * (np.linspace(MINIMUM, MAXIMUM - 1, MAXIMUM - 1) - 1) + iMIN\n", "print(allv.shape)\n", "\n", "# last value\n", "print(allv[-1])\n", "print(allv[-1] == iMAX)\n", "\n", "# is there any non-conscutive integer?\n", "# ...if so, we're in trouble(s)\n", "allt = (allv - ADD) / SCL\n", "allt = allt.round(0).astype(\"u2\")\n", "print(np.where(np.diff(allt) != 1))\n", "\n", "# go back from 16-INT to our \"actual\" values\n", "vall = (allt * SCL) + ADD\n", "vall = ROUNDX((allt * SCL) + ADD)\n", "print(vall)" ] }, { "cell_type": "markdown", "id": "66923250-ba2a-4dd0-9f6a-0491111ba027", "metadata": {}, "source": [ "How does the **rain-mask** look like?.\\\n", "*Note that having a mask of 1's and 0's means we can easily save it as 8-bit unsigned INT.*" ] }, { "cell_type": "code", "execution_count": 46, "id": "8d65ec67-f45b-4c7c-976d-6c268add1216", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 1., 1., ..., 0., 0., 0.],\n", " [1., 1., 1., ..., 0., 0., 0.],\n", " [1., 1., 1., ..., 0., 0., 0.],\n", " ...,\n", " [1., 1., 1., ..., 0., 0., 0.],\n", " [1., 1., 1., ..., 0., 0., 0.],\n", " [1., 1., 1., ..., 0., 0., 0.]])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[\"mask\"][:].data" ] }, { "cell_type": "markdown", "id": "014d72f8-2e17-420d-8829-27cc75e99455", "metadata": {}, "source": [ "## WORKING OUT THE NETCDF ##\n", "\n", "First, we get some help from our void set." ] }, { "cell_type": "code", "execution_count": 47, "id": "51a4a6c4-9a3b-4536-a7b9-42d3eb17bd09", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (y: 470, x: 408)>\n",
       "array([[nan, nan, nan, ..., nan, nan, nan],\n",
       "       [nan, nan, nan, ..., nan, nan, nan],\n",
       "       [nan, nan, nan, ..., nan, nan, nan],\n",
       "       ...,\n",
       "       [nan, nan, nan, ..., nan, nan, nan],\n",
       "       [nan, nan, nan, ..., nan, nan, nan],\n",
       "       [nan, nan, nan, ..., nan, nan, nan]])\n",
       "Coordinates:\n",
       "  * y            (y) float64 1.168e+06 1.162e+06 ... -1.172e+06 -1.178e+06\n",
       "  * x            (x) float64 1.342e+06 1.348e+06 ... 3.372e+06 3.378e+06\n",
       "    spatial_ref  int64 0\n",
       "Attributes:\n",
       "    _FillValue:  nan\n",
       "    units:       mm
" ], "text/plain": [ "\n", "array([[nan, nan, nan, ..., nan, nan, nan],\n", " [nan, nan, nan, ..., nan, nan, nan],\n", " [nan, nan, nan, ..., nan, nan, nan],\n", " ...,\n", " [nan, nan, nan, ..., nan, nan, nan],\n", " [nan, nan, nan, ..., nan, nan, nan],\n", " [nan, nan, nan, ..., nan, nan, nan]])\n", "Coordinates:\n", " * y (y) float64 1.168e+06 1.162e+06 ... -1.172e+06 -1.178e+06\n", " * x (x) float64 1.342e+06 1.348e+06 ... 3.372e+06 3.378e+06\n", " spatial_ref int64 0\n", "Attributes:\n", " _FillValue: nan\n", " units: mm" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create VOID data\n", "void = np.empty((len(data.dimensions[\"y\"]), len(data.dimensions[\"x\"])))\n", "void.fill(np.nan)\n", "\n", "# create xarray\n", "xoid = xr.DataArray(\n", " data=void,\n", " dims=[\"y\", \"x\"],\n", " # name='void',\n", " coords=dict(\n", " y=([\"y\"], data[\"y\"][:].data),\n", " x=([\"x\"], data[\"x\"][:].data),\n", " ),\n", " attrs=dict(\n", " _FillValue=np.nan,\n", " units=\"mm\",\n", " ),\n", ")\n", "xoid.rio.write_crs(rio.crs.CRS(WKT_OGC), grid_mapping_name=\"spatial_ref\", inplace=True)" ] }, { "cell_type": "markdown", "id": "95d97288-a3fb-4b3c-818b-ab6194517c03", "metadata": {}, "source": [ "We'll interested mainly in the following meta-data:" ] }, { "cell_type": "code", "execution_count": 48, "id": "28b77f5c-4a37-40af-b1f1-91a83ba0578b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PROJCS[\"WGS84_/_Lambert_Azim_Mozambique\",GEOGCS[\"unknown\",DATUM[\"unknown\",SPHEROID[\"Normal Sphere (r=6370997)\",6370997,0]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433]],PROJECTION[\"Lambert_Azimuthal_Equal_Area\"],PARAMETER[\"latitude_of_center\",5],PARAMETER[\"longitude_of_center\",20],PARAMETER[\"false_easting\",0],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"42106\"]]\n" ] } ], "source": [ "# the CRS (which we already have)\n", "print(xoid.rio.crs)" ] }, { "cell_type": "code", "execution_count": 49, "id": "d8982464-5e48-498e-bce0-b37ae6f5ed3a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1340000.0 5000.0 0.0 1170000.0 0.0 -5000.0\n" ] } ], "source": [ "# ...and the transform\n", "print(\n", " \" \".join(\n", " map(\n", " str,\n", " np.roll(np.asarray(xoid.rio.transform()).reshape(3, 3), shift=1, axis=1)[\n", " :-1\n", " ]\n", " .ravel()\n", " .tolist(),\n", " )\n", " )\n", ")" ] }, { "cell_type": "markdown", "id": "77585f50-a840-4967-a894-58c918becd18", "metadata": {}, "source": [ "In dealing with NetCDFs, most of the time we'll also have to deal with dates.\\\n", "The thing is... one can also store the date-dimension (and any other dimension --if possible--) as integer." ] }, { "cell_type": "code", "execution_count": 50, "id": "38664e65-467b-4e9b-bd05-c94a4d055c5a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023-10-15 16:53:15.432848+01:00\n", "2023-10-15 16:53:15.432848+03:00\n" ] } ], "source": [ "# today's date\n", "date_now = datetime.now(tzlocal())\n", "print(date_now)\n", "\n", "# append or modify its time.zone\n", "date_now = date_now.replace(tzinfo=ZoneInfo(TIME_ZONE))\n", "print(date_now)" ] }, { "cell_type": "markdown", "id": "bd6d4305-3dd2-4523-a1cf-0cd2ea40cbf2", "metadata": {}, "source": [ "To compute integers from dates, we need an origin time from which we can count seconds, hours, days... etc." ] }, { "cell_type": "code", "execution_count": 51, "id": "aa2c76c8-0e5c-41f6-9652-f15fe255718c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1970-01-01 00:00:00+03:00\n", "1697388795.432848\n" ] } ], "source": [ "# date of origin\n", "DATE_ORIGEN = datetime.strptime(DATE_ORIGIN, \"%Y-%m-%d\").replace(\n", " tzinfo=ZoneInfo(TIME_ZONE)\n", ")\n", "print(DATE_ORIGEN)\n", "\n", "# seconds since that date (to now... which is our date in analysis)\n", "date_sec = (date_now - DATE_ORIGEN).total_seconds()\n", "print(date_sec)" ] }, { "cell_type": "markdown", "id": "0e98fafe-f008-45b1-8360-969306bad618", "metadata": {}, "source": [ "STORM can interpret/transform seconds to the desired unit. (minutes in [this](#pors) case)" ] }, { "cell_type": "code", "execution_count": 52, "id": "a944ef47-443f-4d39-9ddd-9a314f251862", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "28289813.257214133\n" ] } ], "source": [ "# seconds to minutes\n", "date_min = date_sec * TIME_DICT_[TIME_OUTNC]\n", "print(date_min)" ] }, { "cell_type": "markdown", "id": "588c0f54-aab5-45e9-abed-6e72b0e925ed", "metadata": {}, "source": [ "Rounding minutes..." ] }, { "cell_type": "code", "execution_count": 53, "id": "8ef34314-f081-431f-ba31-4a050865842a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "28289790.0\n" ] } ], "source": [ "o_date_min = BASE_ROUND(date_min)\n", "print(o_date_min)" ] }, { "cell_type": "markdown", "id": "d7338217-eee3-41fd-9721-ab144cb00ab0", "metadata": {}, "source": [ "Finally, we have all the ingredients to create the NetCDF." ] }, { "cell_type": "code", "execution_count": 54, "id": "f5138794-1a80-4ed7-8ff0-80990f5149e5", "metadata": {}, "outputs": [], "source": [ "# the name of the output-NetCDF\n", "out_name = \"fiv_ok.nc\"\n", "\n", "# CREATE NC.FILE\n", "nc = nc4.Dataset(out_name, \"w\", format=\"NETCDF4\")\n", "nc.created_on = datetime.now(tzlocal()).strftime(\"%Y-%m-%d %H:%M:%S %Z\") # %z\n", "\n", "# define SUB.GROUP\n", "sub_grp = nc.createGroup(\"group_one\")\n", "# define its dimensions\n", "sub_grp.createDimension(\"y\", len(data.dimensions[\"y\"]))\n", "sub_grp.createDimension(\"x\", len(data.dimensions[\"x\"]))\n", "# time could be created here... but it's not\n", "# sub_grp.createDimension('t', None) # unlimited\n", "\n", "# name for the CRS\n", "sref_name = \"spatial_ref\"\n", "\n", "# create variables\n", "grid = sub_grp.createVariable(sref_name, \"u1\")\n", "\n", "# define some attributes -> PART 1\n", "grid.long_name = sref_name\n", "grid.crs_wkt = xoid.spatial_ref.attrs[\"crs_wkt\"]\n", "grid.spatial_ref = xoid.spatial_ref.attrs[\"crs_wkt\"]\n", "# https://www.simplilearn.com/tutorials/python-tutorial/list-to-string-in-python\n", "# https://www.geeksforgeeks.org/how-to-delete-last-n-rows-from-numpy-array/\n", "# grid.GeoTransform = ' '.join( map(str, list(xoid.rio.transform()) ) )\n", "# # this is apparently the \"correct\" way to store the GEOTRANSFORM!\n", "grid.GeoTransform = \" \".join(\n", " map(\n", " str,\n", " np.roll(np.asarray(xoid.rio.transform()).reshape(3, 3), shift=1, axis=1)[:-1]\n", " .ravel()\n", " .tolist(),\n", " )\n", ") # [:-1] removes last row\n", "\n", "# define some attributes -> PART 2\n", "# ~~~ from CFCONVENTIONS.ORG ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [start]\n", "grid.grid_mapping_name = \"lambert_azimuthal_equal_area\"\n", "grid.latitude_of_projection_origin = 5\n", "grid.longitude_of_projection_origin = 20\n", "grid.false_easting = 0\n", "grid.false_northing = 0\n", "# grid.horizontal_datum_name = 'WGS84' # in pple this can also be un-commented!\n", "grid.reference_ellipsoid_name = \"sphere\"\n", "# new in CF-1.7 [https://cfconventions.org/wkt-proj-4.html]\n", "grid.projected_crs_name = \"WGS84_/_Lambert_Azim_Mozambique\"\n", "# ~~~ from CFCONVENTIONS.ORG ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [end]\n", "\n", "# define some attributes -> PART 3\n", "# ~~~ from https://publicwiki.deltares.nl/display/NETCDF/Coordinates [start]\n", "grid._CoordinateTransformType = \"Projection\"\n", "grid._CoordinateAxisTypes = \"GeoY GeoX\"\n", "# ~~~ from https://publicwiki.deltares.nl/display/NETCDF/Coordinates ~ [end]\n", "\n", "# STORING LOCAL COORDINATES (which are seen also as variables)\n", "yy = sub_grp.createVariable(\"projection_y_coordinate\", \"i4\", dimensions=(\"y\"))\n", "xx = sub_grp.createVariable(\"projection_x_coordinate\", \"i4\", dimensions=(\"x\"))\n", "yy[:] = data[\"y\"][:].data\n", "xx[:] = data[\"x\"][:].data\n", "yy.coordinates = \"projection_y_coordinate\"\n", "xx.coordinates = \"projection_x_coordinate\"\n", "yy.units = \"meter\"\n", "xx.units = \"meter\"\n", "yy.long_name = \"y coordinate of projection\"\n", "xx.long_name = \"x coordinate of projection\"\n", "yy._CoordinateAxisType = \"GeoY\"\n", "xx._CoordinateAxisType = \"GeoX\"\n", "yy.grid_mapping = sref_name\n", "xx.grid_mapping = sref_name\n", "\n", "# store these attributes (for later use)\n", "tag_y = yy.getncattr(\"coordinates\")\n", "tag_x = xx.getncattr(\"coordinates\")\n", "\n", "# STORING THE MASK (as 8-bit INT)\n", "ncmask = sub_grp.createVariable(\n", " \"mask\", \"i1\", dimensions=(\"y\", \"x\"), zlib=True, complevel=9\n", ")\n", "ncmask[:] = data[\"mask\"][:].data.astype(\"i1\")\n", "ncmask.grid_mapping = sref_name\n", "ncmask.long_name = \"catchment mask\"\n", "ncmask.description = \"1 means catchment or region : 0 is void\"\n", "ncmask.coordinates = f\"{tag_y} {tag_x}\"\n", "\n", "\n", "# # if ANOTHER/OTHER variable is needed (e.g.)\n", "# maskxx = sub_grp.createVariable(\n", "# \"k_means\",\n", "# datatype=\"i1\",\n", "# dimensions=(\"y\", \"x\"),\n", "# dimensions=(\"y\", \"x\"),\n", "# zlib=True,\n", "# complevel=9,\n", "# fill_value=np.array(-1).astype(\"i1\"),\n", "# )\n", "# maskxx.grid_mapping = sref_name\n", "# maskxx.long_name = \"k-means clusters\"\n", "# maskxx.description = \"-1 indicates region out of any cluster\"\n", "# maskxx.coordinates = f\"{tag_y} {tag_x}\"\n", "\n", "\n", "# define the TIME.variable/dimension\n", "nctnam = \"time\"\n", "sub_grp.createDimension(nctnam, None)\n", "timexx = sub_grp.createVariable(nctnam, TIMEINT, (nctnam), fill_value=TIMEFIL)\n", "timexx[:] = o_date_min.astype(TIMEINT)\n", "timexx.long_name = \"starting time\"\n", "# '%Y-%m-%d %H:%M:%S %Z%z'\n", "timexx.units = f\"{TIME_OUTNC} since {DATE_ORIGEN.strftime('%Y-%m-%d %H:%M:%S')}\"\n", "timexx.calendar = \"proleptic_gregorian\" # 'gregorian'#\n", "timexx._CoordinateAxisType = \"Time\"\n", "timexx.coordinates = nctnam\n", "\n", "# SWITCH IF YOU ONE STILL TO PRESERVE YOUR var AS FLOAT\n", "if RAINFMT[0] == \"f\":\n", " # -DOING.FLOATS------------------------------------------------------------------\n", " ncvarx = sub_grp.createVariable(\n", " \"rain\",\n", " datatype=f\"{RAINFMT}\",\n", " dimensions=(nctnam, \"y\", \"x\"),\n", " zlib=True,\n", " complevel=9,\n", " least_significant_digit=3,\n", " fill_value=np.nan,\n", " )\n", " ncvarx[:] = (data[\"rain\"][:].data).astype(f\"{RAINFMT}\")\n", "else:\n", " # -DOING.INTEGERS--------------------------------------------------------------\n", " ncvarx = sub_grp.createVariable(\n", " \"rain\",\n", " datatype=f\"{RAINFMT}\",\n", " dimensions=(nctnam, \"y\", \"x\"),\n", " zlib=True,\n", " complevel=9,\n", " # least_significant_digit=3),\n", " fill_value=np.array(0).astype(f\"{RAINFMT}\"),\n", " )\n", " ncvarx[0, :] = (((data[\"rain\"][:].data - ADD) / SCL).round(0)).astype(f\"{RAINFMT}\")\n", " # ncvarx[0,:] -> ONLY FOR THIS EXERCISE\n", "\n", "# attributes for the RAIN variable (so later can be properly read)\n", "ncvarx.precision = PRECISION\n", "ncvarx.scale_factor = SCL\n", "ncvarx.add_offset = ADD\n", "# ncvarx._FillValue = np.array(MINIMUM).astype(f\"{RAINFMT}\")\n", "# ncvarx._FillValue = np.array(0).astype(f\"{RAINFMT}\")\n", "ncvarx.units = \"mm\"\n", "ncvarx.long_name = \"rainfall\"\n", "ncvarx.grid_mapping = sref_name\n", "ncvarx.coordinates = f\"{tag_y} {tag_x}\"\n", "\n", "nc.close()" ] }, { "cell_type": "markdown", "id": "050e6f6c-5c12-49b8-be22-958a54596d6d", "metadata": {}, "source": [ "Time to check if this technique is at all useful??" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:prll]", "language": "python", "name": "conda-env-prll-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }