This is a quarto website for the hypertidy family of R packages for multi-dimensional and spatial data.
Hypertidy is an approach to spatial or multi-dimensional data in R based on the following principles:
- data sources are best managed with database techniques, which mostly means managing what you have and using clever indexing
- spatial is not special, user-interfaces, graphics and time-series all involve ‘space’
- raster and vector are not naturally distinct, high-dimensional, curvilinear, cell-based rasters, polygon grids and ragged-arrays all cross this boundary
- ‘mesh-based’ or ‘indexed’ data structures are central, general data types while traditionally “geospatial” structures are domain-specific optimizations, for example vector and raster exist at extremes of a continuum defined by mesh structures
- simple features left topology behind, resulting in fragmented and inefficient workarounds in the geo-spatial communities
- R spatial needs better APIs, a rich ecosystem of small, composable software modules
- tidy data principles are about leveraging databases and database techniques
Examples of these principles are seen in these R packages.
Hypertidy manifesto
There’s a series of principles that hypertidy represents, we don’t think these are very controversial but they do not align well with some high-profile geo-spatial projects.
Gridded data
Hypertidy recognizes that not all gridded data fit into the GIS raster conventions. Gridded data comes in many forms, geographic with longitude-latitude or projected spaces, with time and or depth dimensions, with different orderings of axes (i.e. time-first, the latitude-longitude), and with generally any arbitrary space. A space is simply a set of axes with particular units and projection, and yes we mean “space” and “projection” in the more general mathematical sense. Date-time data is a projection, there is a mapping of a particular set of values to the real line and the position on that line for particular instant is defined by the axis units and epoch.
Mesh and grid share the same meaning in some contexts.
Structured vs. unstructured, topology vs. geometry
Array-based data have a straight-forward relationship between a set of axes that have discrete steps. These are “structured grids”. Unstructured grids include triangulations, non-regularly binned histograms, tetrahedral meshes and ragged arrays.
An unstructured mesh (grid) is able to represent any data structure, but structured meshes have some advantages because of the regular indexing relationship between dimensions.
GIS vector constructs “polygons”, “lines”, “points” are special case optimizations of the unstructured grid case. Polygons really are topologically identical to lines, and they are a dead-end in the broader scheme of dimensionality. Points and lines can are topologicaly 0-dimensional and 1-dimensional respectively, and this shape-constraint is the same no matter what geometric dimension they are defined in. A line can twist around a 4D space with x, y, z, t coordinates at its segment nodes or it can be constrained to single dimension with only one of those coordinates specifying its position. The topology of the line is completely independent of the geometry, if we treat the line as composed of topological primitives.
Polygons are not composed of topological primitives, but they can be treated as being composed of line primitives.