netCDF for Dummies

I have been working at Nusantara Earth Observation Network for a while, but I was only exposed to geospatial data recently. Without a formal education in geosciences or geospatial data, I often find myself fumbling for information. One of the most prevalent data formats that I come across is netCDF. This article is my attempt to restate, in one place, information about netCDF that is available on the internet from various sources.

Network Common Data Form

Network Common Data Form (netCDF) is a file format for storing multidimensional scientific data (variables). Each netCDF file is made up of three basic components: dimensions, variables, and attributes.

A netCDF dimension has a name and a length, and is used to specify the shape of one or more variables. It can represent time, latitude, longitude, or atmospheric level/ocean depth.

A netCDF variable is an array of values of the same type. Each variable has a name, a data type, and a shape described by a list of dimensions. Scalar variables have an empty list of dimensions. A netCDF variable may also have an associated list of attributes that carry information about the variable, such as a units string, a valid range of values, a special value for missing data, and a long descriptive name.

A netCDF attribute provides auxiliary information about a variable or about the dataset as a whole. Together, these three components make a netCDF file self-describing: it contains all the information needed to describe the data it holds. NetCDF data is also machine-independent. It uses the eXternal Data Representation (XDR) to represent arrays of bytes, 16-bit short integers, 32-bit long integers, and IEEE-standard 32- and 64-bit floating-point numbers. Because the library converts to and from XDR, programs always deal with integer and floating-point data in the native form of the machine on which they run.
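To make the machine-independence point concrete, here is a minimal sketch using only Python's standard `struct` module. It relies on the fact that XDR stores values in big-endian byte order. The helper name `to_xdr_like` is made up for this illustration and is not part of any netCDF library; real netCDF libraries do this conversion for you.

```python
import struct
import sys

# Illustrative helper (not a netCDF API): pack one value in big-endian order,
# which is the byte order XDR uses for 32-bit integers and IEEE floats.
FORMATS = {"int": ">i", "float": ">f", "double": ">d"}

def to_xdr_like(value, kind):
    """Return the big-endian byte representation of a single value."""
    return struct.pack(FORMATS[kind], value)

# The encoded bytes are the same on every machine, whatever its native order.
encoded = to_xdr_like(1, "int")
print("native byte order:", sys.byteorder)
print("encoded bytes:", encoded.hex())

# Decoding gives the value back in the machine's native form.
(decoded,) = struct.unpack(">i", encoded)
print("decoded value:", decoded)
```

On a little-endian machine (most desktops and laptops), the native in-memory layout of the integer differs from the encoded bytes, yet the decoded value is identical. That is the property that lets netCDF files move between machines.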

Because netCDF is a self-describing, machine-independent format, numerous scientific groups use it to share their data. NetCDF can be accessed through programming interfaces in C, C++, Java, Fortran, Python, IDL, MATLAB, R, Ruby, and Perl.

Processing NetCDF in Python with Xarray

Xarray expands on the capabilities of the NumPy ndarray (N-dimensional array) by adding labels in the form of dimensions, coordinates, and attributes on top of NumPy-like arrays. Xarray’s interface is based on the netCDF data model, but it goes beyond the traditional netCDF interfaces. Xarray is designed to be domain agnostic, providing multi-dimensional array manipulation for all sorts of applications.
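The difference between positional and labeled access can be sketched as follows. The grid values, coordinate values, and variable names here are all made up for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 grid of temperatures; names and values are invented.
values = np.array([[26.1, 26.4, 26.9],
                   [27.0, 27.3, 27.8]])

temperature = xr.DataArray(
    values,
    dims=("lat", "lon"),
    coords={"lat": [-6.0, -5.0], "lon": [106.0, 107.0, 108.0]},
    attrs={"units": "degC", "long_name": "sea surface temperature"},
)

# NumPy indexing is positional: second row, third column.
by_position = values[1, 2]

# xarray can select the same cell by its coordinate labels instead.
by_label = temperature.sel(lat=-5.0, lon=108.0).item()

print(by_position, by_label)
```

Both lookups return the same cell, but the labeled version keeps working if the array is reordered or subset, because it does not depend on where the cell sits in memory.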

Xarray is tailored to work with netCDF files, which were the source of xarray’s data model, and integrates tightly with dask for parallel computing on large datasets. Xarray has two core data structures: DataArray, a labeled N-dimensional array that is similar to a pandas.Series generalized to N dimensions, and Dataset, a dict-like container of multiple N-dimensional arrays that is similar to a pandas.DataFrame.
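As a rough illustration of how these two structures mirror netCDF's dimensions, variables, and attributes, the sketch below builds a small Dataset in memory. The variable names (`sst`, `chl`), units, and values are invented for the example and do not come from any real file.

```python
import numpy as np
import xarray as xr

# Made-up data: two variables that share the same (time, lat, lon) dimensions.
shape = (2, 2, 3)
ds = xr.Dataset(
    data_vars={
        # Each entry is (dimension names, values, variable attributes).
        "sst": (("time", "lat", "lon"), np.zeros(shape), {"units": "degC"}),
        "chl": (("time", "lat", "lon"), np.ones(shape), {"units": "mg m-3"}),
    },
    coords={
        "time": np.array(["2021-01-01", "2021-01-09"], dtype="datetime64[ns]"),
        "lat": [-6.0, -5.0],
        "lon": [106.0, 107.0, 108.0],
    },
    attrs={"title": "toy dataset for illustration"},
)

# Dataset level: dimensions, variables, and global attributes.
print(dict(ds.sizes))
print(list(ds.data_vars))
print(ds.attrs["title"])

# Taking one variable out of the Dataset gives a DataArray.
sst = ds["sst"]
print(type(sst).__name__, sst.dims, sst.attrs["units"])
```

A Dataset opened from a netCDF file with `xr.open_dataset` has the same layout, so the same attributes (`sizes`, `data_vars`, `attrs`) are a quick way to inspect an unfamiliar file.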

Xarray is built on other existing Python libraries, such as NumPy and pandas for fast arrays and indexing, Dask for parallel computing, and matplotlib for plotting.

Anaconda Environment for NetCDF Processing

To begin working with netCDF files in Python, create a new environment so that your existing environment stays intact.

$ conda create -n netcdf python=3.9
$ conda activate netcdf

Then, from the conda-forge channel, we can install the packages we need, along with JupyterLab as a sandbox for working with netCDF data. The rioxarray module is also installed so that netCDF data can be exported as GeoTIFF.

$ conda install -c conda-forge xarray dask netCDF4 bottleneck
$ conda install -c conda-forge rioxarray
$ conda install -c conda-forge jupyterlab
$ jupyter-lab
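
Before opening a notebook, it can be worth checking that the packages are visible from the new environment. The snippet below is a small sketch using only the standard library; the helper name `missing_packages` is made up for this example.

```python
from importlib.util import find_spec

# Import names of the packages installed above.
REQUIRED = ["xarray", "dask", "bottleneck", "rioxarray", "jupyterlab"]

def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("Missing packages:", missing or "none")
```

If the list is not empty, the most likely cause is that the `netcdf` environment is not the active one.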

NetCDF to Raster

By using rioxarray (rasterio and xarray), we can convert a netCDF file to a raster file (GeoTIFF). The following example exports a DataArray read from a netCDF file as a GeoTIFF:

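import xarray as xr
import rioxarray as rxr  # importing rioxarray registers the .rio accessor

fname = 'L3m_20210101-20210108__242278547_4_AV-OLA_ZSD_8D_00.nc'

# Open the file and pick the variable to export
ds = xr.open_dataset(fname)
zsd = ds['ZSD_mean']

# Set the spatial dimensions and CRS on the DataArray that is exported,
# so the settings apply to the object passed to to_raster
zsd = zsd.rio.set_spatial_dims(x_dim='lon', y_dim='lat')
zsd = zsd.rio.write_crs('EPSG:4326')

# Write the DataArray to a GeoTIFF
zsd.rio.to_raster('ZSD_20210101-20210108_AV-OLA.tif')

Figure 1. Raster data exported from netCDF, visualized in QGIS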

References:

  1. xarray: N-D labeled arrays and datasets in Python (pydata.org)
  2. Unidata | NetCDF (ucar.edu)
  3. NetCDF: an interface for scientific data access | IEEE Journals & Magazine | IEEE Xplore

Published by josefmtd

Electronics Engineer
