NetCDF is a machine-independent, array-oriented, multi-dimensional, self-describing, and portable data format used by various scientific communities. It has a filename extension of .nc or .cdf (though it is believed that there are subtle differences between the two). Unlike files in .csv or .xlsx , NetCDF format cannot be accessed and viewed directly using spreadsheet software.
Even if you could, you would not do that on a 4-dimensional data with a bunch of metadata.
I will take climate data from Atmospheric Radiation Measurement Climate Research Facility (ARM) in the United States, and European Centre for Medium-Range Weather Forecasts (ECMWF) in Europe as an example.
PrerequisitesWe will use xarray library in python for data processing. Long story short, it builds upon numpy (and dask ) libraries and leverages the power of pandas , but you probably don’t need to know about it. As you might know, package dependency is a pain in Python. That is why the most convenient way to get everything installed is to use the following command:
$ conda install xarray dask netCDF4 bottleneckExperienced Python programmers are recommended check the relevant documentation for more details. If you are a beginner, no worries. I made a list of dependencies that you need to check:
Python 2.7/3.5+ required numpy 1.12+ required pandas 0.19.2+ required scipy for interpolation features bottleneck for speeding up NaN-skipping netCDF4-python for basic netCDF operation such as reading/writing dask-array 0.16+ for parallel computing with daskIf you want to visualize your dataset, you will probably need these:
matplotlib 1.5+ for plotting cartopy for maps seaborn for better colour palettesFor absolute beginners, you can check your default version of Python by
$ python --version Python 2.7.5You can also check if Python3 is installed by
$ python3 --version Python 3.4.9To check the version of packages, use pip freeze or conda list . Things should check out if you install xarray through conda .
Alternativesiris is an alternative to xarray , but some works need to be done to make it work on windows, and it does not work well on Mac OS. Iris is also an English word, so googling ‘iris’ gives you many irrelevant results. It was a pain for me to use iris .
Data PreviewIt is always a good idea to ‘preview’ and ‘get to know’ your data, its metadata and data structures. Assume you have installed netCDF4-python and the only two commands you need are ncdump and ncview . The former gives text representation of your netCDF dataset (basically metadata and the data itself), while the latter is a very powerful graphical interface for instant data visualization.
ncdumpGo to the directory of your dataset and try
$ ncdump -h twparmbeatmC1.c1.20050101.000000.cdfAs we do not need to see the values of every data entry at the moment, -h ensures only header (metadata) is shown. You will get
netcdf twparmbeatmC1.c1.20050101.000000 { dimensions: time = UNLIMITED ; // (8760 currently) range = 2 ; p = 37 ; z = 512 ; variables: double base_time ; base_time:long_name = "Base time in Epoch" ; base_time:units = "seconds since 1970-1-1 0:00:00 0:00" ; base_time:string = "2005-01-01 00.00, GMT" ; base_time:ancillary_variables = "time_offset" ; float prec_sfc(time) ; prec_sfc:long_name = "Precipitation Rate" ; prec_sfc:standard_name = "lwe_precipitation_rate" ; prec_sfc:units = "mm/hour" ; prec_sfc:missing_value = -9999.f ; prec_sfc:_FillValue = -9999.f ; prec_sfc:source = "twpsmet60sC1.b1" ; float T_p(time, p) ; T_p:long_name = "Dry Bulb Temperature, from sounding in p coordinate" ; T_p:standard_name = "air_temperature" ; T_p:units = "K" ; T_p:missing_value = -9999.f ; T_p:_FillValue = -9999.f ; T_p:source = "twpsondewnpnC1.b1:tdry" ; // global attributes:< OTHER METADATA >
}
You can see dimensions, variables, and other metadata which are quite self-explanatory. Global attributes (not printed above) tells us how the data is collected and pre-processed. In this example, they are measurement data taken at 147.4E 2.1S, Manus, Papua New Guinea by ARM.
When we look into the list of variables: 1-dim prec_sfc and 2-dim T_p , we realize that they have different dimensions(!). Precipitation rate is a scalar measurement at each time, whereas temperature is column (measurements at different pressure levels instead of altitude levels this time) at every time. It is quite common to see 4-dim data in climate science ― latitude, longitude, altitude/pressure level, time.
ncviewTry the following command and it gives you a graphical interface that lists all variables in your dataset, and it is quite straightforward.
$ ncview twparmbeatmC1.c1.20050101.000000.cdf
Graphical interface in linux using ncview Terminology

Data structures of xarray DataArray
xarray.DataArray is an implementation of a labelled, multi-dimensional array for a single variable , such as precipitation, temperature etc.. It has the following key properties:
values : a numpy.ndarray holding the array’s values dims : dimension names for each axis (e.g., ('lat', 'lon', 'z', 'time') ) coords : a dict-like container of arrays (coordinates) that label each point (e.g., 1-dim arrays of numbers, DateTime objects, or strings) attrs : an OrderedDict to hold arbitrary metadata (attributes) DataSetxarray.DataSet is a collection of DataArrays. Each NetCDF file contains a DataSet.
Coding usingXArray Data ImportYou cannot play with the data until you read it. Use open_dataset or open_mfdataset to read a single or multiple NetCDF files, and store it in a DataSet called DS .
import xarray as xr # single file dataDIR = '../data/ARM/twparmbeatmC1.c1.20050101.000000.cdf' DS = xr.open_dataset(dataDIR) # OR m