Pangeo is a community effort for big data in the geosciences. As part of this effort, we are bringing together a collection of Python tools, including Xarray, Dask, and Jupyter, to facilitate highly scalable data analytics on large geoscientific datasets. I am one of the co-founders of the Pangeo project and sit on the Pangeo Steering Council.


Xarray combines ideas from Python’s pandas, NumPy, and netCDF4 packages. It provides labeled N-dimensional arrays, so data can be selected, aligned, and aggregated by dimension name and coordinate value rather than by integer position. I have been contributing to the Xarray project since May 2014.


Dask is a Python library for scalable computing with dynamic task scheduling. As part of my work with Pangeo and Xarray, I have been contributing to the Dask ecosystem. In particular, I have worked extensively on Dask deployment utilities like Dask-Jobqueue and Dask-Kubernetes.


Zarr is a data format for chunked, compressed, multidimensional arrays. It is designed for working with large scientific datasets and supports multiple backend storage interfaces (cloud object storage, HTTPS, POSIX file systems, etc.). I am on the Zarr core team and have mostly focused on optimizations for working with Zarr from Xarray and Dask.