Pangeo is a community effort for big data in the geosciences. As part of this effort, we are bringing together a collection of Python tools, including Xarray, Dask, and Jupyter, to facilitate highly scalable data analytics on large geoscientific datasets. I am one of the cofounders of the Pangeo project and sit on the Pangeo Steering Council.
Xarray is the fusion of Python’s pandas, NumPy, and netCDF4 packages. It brings labeled dimensions, coordinates, and attributes to N-dimensional arrays, which makes computation on them more concise and less error-prone. I have been contributing to the Xarray project since May 2014.
Dask is a Python library for scalable computing with dynamic task scheduling. As part of my work with Pangeo and Xarray, I have been contributing to the Dask ecosystem. In particular, I have worked extensively on Dask deployment utilities like Dask-jobqueue and Dask-Kubernetes.
Zarr is a data format for chunked, compressed, multidimensional arrays. It is designed for working with large scientific datasets and supports multiple backend storage interfaces (cloud storage, HTTPS, POSIX, etc.). I am on the Zarr core team and have mostly focused on optimizations for working with Zarr from Xarray and Dask.