Michael Sumner – hypertidy

I’m a researcher at the Australian Antarctic Division in Hobart, working on geospatial data systems for Southern Ocean and Antarctic science. My day job involves satellite imagery, ocean model outputs, and tracking data — which means a lot of rasters, meshes, and coordinate transformations.

I maintain the hypertidy family of R packages (see the packages listing) and contribute to the broader R geospatial ecosystem. Most of what I build is motivated by problems that come up in research: needing direct access to GDAL without heavy abstractions, working with grids that don’t fit the GIS raster mould, and connecting raster and vector through mesh-based data structures.

I originally created vapour to give R direct, dependency-light access to the GDAL C API: reading raster and vector data without importing the whole spatial stack. Chris Toney’s gdalraster has since superseded vapour, with comprehensive bindings and an active community. It’s where I’d point anyone who wants to work close to GDAL in R today.

Hypertidy is a collaborative effort. Mark Padgham wrote geodist and contributed core C++ to laridae, and was central to the topology discussions that shaped silicate. Noam Ross at rOpenSci originally created fasterize. tidync went through rOpenSci review. Every day I rely on a collaborative data collection built with bowerbird by Ben Raymond at the AAD.

This blog is where I write about GDAL, raster logic, NetCDF, coordinate systems, and the R packages that tie them together.

Here are two older blogs that I don’t want to forget about: mdsumner.github.io/blog/ and Fair Tirade.

The hypertidy philosophy

Hypertidy is built on a few core ideas:

Spatial is not special. User interfaces, graphics, and time-series all involve “space”. Date-time is a projection. Longitude-latitude is a projection. They’re all mappings to a coordinate axis.

Raster and vector are not naturally distinct. High-dimensional, curvilinear, cell-based rasters, polygon grids, and ragged arrays all cross the traditional raster/vector boundary. Both are special cases of mesh structures — GIS just optimized for the extremes.

Topology matters. Simple features left topology behind, resulting in fragmented and inefficient workarounds. Polygons are topologically identical to closed lines. Points, lines, and surfaces are 0-, 1-, and 2-dimensional primitives whose topology is independent of their geometry.
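
To make the point about polygons and closed lines concrete, here is a minimal sketch in plain Python. It is an illustration only, not silicate’s actual data model, and the names `decompose_ring`, `square`, and `kite` are hypothetical. It splits a ring into a table of unique vertices (the geometry) and a list of edges stored as index pairs (the topology).

```python
def decompose_ring(coords):
    """Split a closed ring into unique vertices and edges (index pairs).

    coords: list of (x, y) tuples whose first and last entries are equal.
    """
    if coords[0] != coords[-1]:
        raise ValueError("ring must be closed")
    vertices = []   # geometry: one row per unique coordinate
    index = {}      # coordinate -> position in `vertices`
    ids = []        # the ring expressed as vertex indices
    for xy in coords:
        if xy not in index:
            index[xy] = len(vertices)
            vertices.append(xy)
        ids.append(index[xy])
    # topology: each edge is a pair of vertex indices, no coordinates involved
    edges = list(zip(ids[:-1], ids[1:]))
    return vertices, edges


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
kite = [(0, 0), (2, -1), (5, 0), (2, 1), (0, 0)]  # different shape, same connectivity

sq_vertices, sq_edges = decompose_ring(square)
kite_vertices, kite_edges = decompose_ring(kite)
```

Both rings produce the same edge list, `[(0, 1), (1, 2), (2, 3), (3, 0)]`, even though their coordinates differ. Whether that cycle of 1-dimensional primitives is read as a closed line or as the boundary of a polygon is an interpretation layered on top, not something stored in the edges themselves.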

Data sources are databases. Manage what you have, use clever indexing, don’t materialize what you don’t need. This is what GDAL does well, and it’s what gdalraster exposes directly.

Small, composable tools. R spatial needs better APIs — a rich ecosystem of focused packages rather than monolithic frameworks. Thin wrappers over GDAL and PROJ, not heavy abstractions on top of them.

Structured and unstructured grids are a continuum. Array-based data have a regular indexing relationship between dimensions. Unstructured grids — triangulations, tetrahedral meshes, ragged arrays — can represent anything. GIS vectors are a special-case optimization of the unstructured case. The general solution is the mesh.
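
As a rough illustration of that continuum, the sketch below expands a regular grid, described only by its dimensions and extent, into an explicit mesh of corner vertices and quad cells. It is plain Python under stated assumptions: the function name `grid_to_mesh`, the row-major top-left ordering, and the example extent are all hypothetical choices, not taken from any hypertidy package.

```python
def grid_to_mesh(ncol, nrow, xmin, xmax, ymin, ymax):
    """Expand a regular grid into explicit corner vertices and quad cells."""
    dx = (xmax - xmin) / ncol
    dy = (ymax - ymin) / nrow
    # (ncol + 1) * (nrow + 1) corner vertices, row-major from the top-left
    vertices = [
        (xmin + i * dx, ymax - j * dy)
        for j in range(nrow + 1)
        for i in range(ncol + 1)
    ]
    # one quad per cell, as four indices into `vertices`
    quads = []
    for j in range(nrow):
        for i in range(ncol):
            top_left = j * (ncol + 1) + i
            bottom_left = top_left + (ncol + 1)
            quads.append((top_left, top_left + 1, bottom_left + 1, bottom_left))
    return vertices, quads


vertices, quads = grid_to_mesh(ncol=3, nrow=2, xmin=0.0, xmax=3.0, ymin=0.0, ymax=2.0)
```

Six numbers are enough to describe the structured grid, because the indexing relationship is regular. The explicit form needs 12 vertices and 6 quads for the same 3 by 2 grid, but once it is explicit the vertices can be moved (a curvilinear grid) or the quads split into triangles without changing the data structure.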