GDAL raster read/write by blocks

news
code
Author

Michael D. Sumner

Published

2022-05-04

Read and write raster by blocks.

A block is another word for a tile, a tile in a small-ish raster window within a larger raster. Tiles can be very clever, such as 256x256, they make a nice way to organize large data- keeping pieces of data that are nearby spatial nearby to each other in memory. There is a pair of functions internal to vapour that will 1) read data from a raster block 2) write data to a raster block. At the moment only Float64 data is handled.

Tiles

Tiles are a way of organizing rasters, a tile may be 256x256, 512x512 (typically powers of 2 for sensible reasons), or they may be of higher dimensions 256x256x8 - this is really a private detail for a storage format, such as a file. The data values stored in a tile are a nother matter, these are like variables - we might have 1 variable (say elevation) stored in a double floating point value, or we might have 3 variables of byte values for storing an RGB image. These are independent concepts to the size of the tile, and strictly there may be no variables at all, just the abstract idea of the tiling.

Another kind of block, or tile is a the scanline of a raster. Imagine reading each row of the raster, from the top row to the bottom. Each line is its own kind of degenerate tile. We can read from a raster this way no matter what its internal tiling is, just software might be opening several tiles and reading just one line from each. So, best if we match our “tile attack” to the native structure of the data.

Here we find a large, tiled raster, obtain its internal tiling, and use that to update a copy of the same file.

Please note how we copy the source file, then write to that. There’s probably other software better suited to doing this atm, this post simply aims to air the topic a little in an R context.

Here we use only temporary files that we download, and copy as needed. In practice, you should set this up to work across different physical disks, and for better workflow we need the ability to open an empty file to write to (WIP).

raster_url <- "ftp://ftp.data.pgc.umn.edu/elev/dem/setsm/REMA/mosaic/v1.1/200m/REMA_200m_dem.tif"

readfrom_file <- tempfile(pattern = "readfrom", fileext = ".tif")
writeto_file <- tempfile(pattern = "writeto", fileext = ".tif")
## file size is 1.3Gb
curl::curl_download(raster_url, readfrom_file)

## this now makes a copy of the 1.3Gb file, so we have two of them
fs::file_copy(readfrom_file, writeto_file)

Now we want the tiling

info <- vapour::vapour_raster_info(readfrom_file)
tiling <- list(dimension = info$dimXY, tiles = info$tilesXY)
fac <- 100
fac * tiling$tiles
if (tiling$tiles[2] == 1) {
  ## let's take fac scanlines at a time
  tiling$tiles[2] <- fac
}
calc_steps <- function(dimension, tiles) {
  bounds_x <- seq(0, dimension, by = tiles)
  steps_x <-  rep(tiles, length.out = length(bounds_x)-1)
  dangle_x <- sum(steps_x) -dimension
  if (dangle_x > 0) steps_x[length(steps_x)] <- steps_x[length(steps_x)] - dangle_x
  
  list(head(bounds_x, -1), steps_x)
}
x_step <- calc_steps(tiling$dimension[1], tiling$tiles[1])
y_step <- calc_steps(tiling$dimension[2], tiling$tiles[2])
y_step
system.time({
for (i in seq_along(x_step[[1]])) {
  startx <- x_step[[1]][i]
  countx <- x_step[[2]][i]
  for (j in seq_along(y_step[[1]])) {
  starty <- y_step[[1]][j]
  county <- y_step[[2]][j]
    
  ## now read
  offset <- c(startx, starty)
  dimension <- c(countx, county)
  vals <- vapour:::vapour_read_raster_block(readfrom_file, offset = offset, dimension = dimension, band_output_type = info$datatype, band = 1)[[1L]]
  
  ## do something to the values
  if (any(na.omit(vals) > info$nodata_value)) {
    vals <- vals * -1
    vapour:::vapour_write_raster_block(writeto_file, vals, offset, dimension, band = 1, overwrite = TRUE)
  }
  }
}