An unfinished post
There’s several meanings floating around when you say the “GDAL warper”. It can mean
- command-line gdalwarp
- the rasterio package in Python, and its WarpedVRT
- the GDAL C++ API warping library
We can also mean
- the sf package in R, and its gdal_utils() function
- the stars package in R, and its st_warp() function
- the terra package in R, and its project() function
- the gdalUtils package in R
- the gdalUtilities package in R
But, what I mean is the GDAL C++ API warping library.
The GDAL C++ API warping library
GDAL is complex. It deals with very many different formats, and has many tools. In terms of this post, it has very low-level development facilities, written in C++. This means you can usually go as deep as you need into a geospatial problem by writing C++ code against the library directly. Often you don’t need C++ and can use Python, and sometimes you can use R depending on what is exposed there. Sometimes you need to modify GDAL itself, and you can do anything you want then(!), which is how open source is supposed to work.
Key feature of the gdalwarp_lib.cpp GDALWarp()
C++ function, versus the GDALWarpDirect()
, GDALWarpIndirect()
and the lower level GDALWarpMulti()
and GDALWarpImage()
functions.
When we use gdalwarp_lib, we get handling of multiple-zoom level sources without intervention on our own.
Most use cases I see of the warper facilities are for whole-sale conversion of large datasets, multiple input files converted to a another projection in a large output file or series of tiles.
The warper is also a generalized RasterIO
(rasterio is a famous python package for using GDAL’s raster facilities, it’s not what we mean here).
RasterIO is a C++ function in the GDAL library, its job is to take a source data set (i.e. a path to GeoTIFF) and to provide you with a window of pixels from that source. A window can be a subset, the entire raster, or a resampling (i.e. fewer pixels than native) of either the entire or a subset of the raster.
It’s really, cool and using it looks like this:
= rasterBand->RasterIO(GF_Read,
err , Yoffset,
Xoffset, nYSize,
nXSize&double_scanline[0],
, outYSize,
outXSize,
GDT_Float640, 0, &psExtraArg);
There is a lot going on in that function call, but in short the key parts are (these are my argument-variable-names):
rasterBand->RasterIO()
is the way we read a pixel values, we have a raster band and we call the member function RasterIO()Xoffset
,Yoffset
is the first column and row we should consider for reading (0,0 if we start at the top left corner)nXSize
,nYSize
outXSize
,outYSize
is the dimension of the window we get out
That last bits, the two kinds of Size is the magic, we can ask to start at a particular row,column and read out a given number of pixels in x and y. But, not only that we can specifiy where to end in the source. If nXSize and outXSize are not equal in value then we have asked for a resampling of the source “only read every nXSize / outXSize
values in the x direction. If they are equal, just read every one. Note also that we might ask for an outX/YSize that is larger than the source nXSize/nYSize - and this would give us a resampling to higher resolution. This is really where the resampling algorithm comes in. ‘Nearest neighbour’ would give us copies of pixels, ‘Bilinear’ and interpolation between source pixels for new pixels in between. Other algorithms include ‘Cubic’, ‘Lanczos’, ‘Average’, ‘Sum’.
This is the key behind the fast and lazy reads provided by the terra and sf packages in R, this was originally available also in rgdal and raster made heavy use of it.
But, what is the offset and the size? We really want to think in geographic coordinates, and this is exactly what crop()
does for example. We get a discretized crop, not an exact one because we only get to read by this raster-based mechanism - we are bound to the size and alignment of the source pixels.
Under the hood, functions like crop() do the following:
offsets/scale vs. xlim,ylim
The other arguments in RasterIO.
GF_Read
controls the mode we are into (read or write or update)GDT_Float64
controls the type of data we get out (64 bit doubles here, GDAL will auto-convert if the source is different type)0, 0, &psExtraArg
are further details we won’t discuss (though, the way resampling is done is controlled in the extra args options).
What are its limitations:
- only one source at a time, you can’t resample from multiple rasters at once (we are using offset/size indexing so the rasters must all be the same shape, you can read multiple bands)
What is interesting about RasterIO vs. Warping is that we would never ever use warping for pure graphics. It doesn’t make sens
Enter the WuTang Warper!
Doing discretized extent raster math is boring. With the warper we don’t need to do it.
This is because the warper is for changing projections, this is an entire re-modelling of raster data. (in the 2000s it was possible to do but still quite slow and problematic, hence you have very slowly changing standards and perceptions of what is possible/normal/appropriate in software comunities etc.).
This is a global longlat data set, but we want a local projection so we adopt one by finding its specification.
If we were using RasterIO we’d have to do this kind of math to run the index function.
But with the warper, we can use an extent (but we still need to align it, or we won’t get what the RasterIO facility would faithfully give).