save_multitask_to_cache
- save_multitask_to_cache(canvas, count, canvas_zarr, count_zarr, save_path='temp.zarr', *, verbose=True)
Write accumulated horizontal row blocks to a Zarr cache on disk.
This function is called when the intermediate per-head accumulators (canvas and count) become large enough to risk exceeding the memory threshold. It computes the current Dask arrays for each head and writes them to Zarr datasets under save_path. It then updates canvas_zarr and count_zarr so that later merges operate directly on Zarr-backed arrays instead of holding everything in memory.
- For each head:
The corresponding canvas and count Dask arrays are fully computed.
If this is the first spill for that head, new Zarr datasets are created with chunk shapes consistent with the canvas rows.
The computed rows are appended to the Zarr datasets by resizing the arrays and writing the new rows at the end.
The updated Zarr arrays are returned so that later steps can wrap them with Dask.
- Parameters:
canvas (list[da.Array]) – Accumulated per-head row blocks (probability/logit sums). Each head's entry has shape (N_rows, H, W, C), where N_rows grows as horizontal rows are merged.
count (list[da.Array]) – Accumulated per-head row hit counts aligned with canvas, with matching shape and chunking.
canvas_zarr (list[zarr.Array | None]) – Zarr datasets storing the accumulated canvas values per head. None entries indicate heads for which no Zarr dataset has been created yet.
count_zarr (list[zarr.Array | None]) – Zarr datasets mirroring canvas_zarr but storing hit counts instead of accumulated values.
save_path (str | Path) – Path to the Zarr group used for caching. If needed, a new group is created on the first spill.
verbose (bool) – Whether to display a progress bar.
- Returns:
Updated canvas_zarr and count_zarr lists, in which each head has a Zarr dataset containing all rows accumulated up to this point.
- Return type:
Notes
Chunking for the Zarr datasets follows the Dask chunk size along the row axis to allow efficient later vertical merging.
This function does not normalize probabilities; normalization happens in the final vertical merge via merge_multitask_vertical_chunkwise.
After spilling, upstream functions reset the in-memory canvas and count to free RAM and continue populating new entries.