save_multitask_to_cache

save_multitask_to_cache(canvas, count, canvas_zarr, count_zarr, save_path='temp.zarr', *, verbose=True)

Write accumulated horizontal row blocks to a Zarr cache on disk.

This function is called when intermediate per-head accumulators (canvas and count) become large enough to risk exceeding the memory threshold. It computes the current Dask arrays for each head, writes them to Zarr datasets under save_path, and updates canvas_zarr / count_zarr so later merges operate directly on Zarr-backed arrays rather than holding everything in memory.

For each head:
  1. The corresponding canvas and count Dask arrays are fully computed.

  2. If this is the first time spilling for that head, new Zarr datasets are created using chunk shapes consistent with the canvas rows.

  3. The computed rows are appended to the Zarr datasets by resizing the arrays and writing the new rows at the end.

  4. The updated Zarr arrays are returned to be wrapped by Dask in later steps.

Parameters:
  • canvas (list[da.Array]) – Accumulated per-head row blocks (probability/logit sums). Each head’s entry has shape (N_rows, H, W, C) where N_rows grows as horizontal rows are merged.

  • count (list[da.Array]) – Accumulated per-head row hit counts aligned with canvas, with matching shape and chunking.

  • canvas_zarr (list[zarr.Array | None]) – List of Zarr datasets for storing accumulated canvas values per head. None entries indicate that no Zarr datasets have been created yet for those heads.

  • count_zarr (list[zarr.Array | None]) – List of Zarr datasets mirroring canvas_zarr but storing hit counts instead of accumulated values.

  • save_path (str | Path) – Path to the Zarr group used for caching. A new group is created if needed on the first spill.

  • verbose (bool) – Whether to display a progress bar.

Returns:

Updated canvas_zarr and count_zarr lists, where each head now has a Zarr dataset containing all accumulated rows up to this point.

Return type:

tuple[list[zarr.Array], list[zarr.Array]]

Notes

  • Chunking for the Zarr datasets follows the Dask chunk size along the row axis to allow efficient later vertical merging.

  • This function does not normalize probabilities; normalization happens in the final vertical merge via merge_multitask_vertical_chunkwise.

  • After spilling, upstream functions will reset in-memory canvas and count to free RAM and continue populating new entries.