Efficiently Reproducing Distributed Workflows in Notebook-based Systems
Notebooks provide an author-friendly environment for iterative development, modular execution, and easy sharing. Distributed workflows are increasingly authored and executed in notebooks, yet sharing and reproducing them remains challenging. Even small code or parameter changes often force full end-to-end re-execution of the distributed workflow, limiting iterative development for such workloads. Current methods for improving notebook execution operate on single-node workflows, while optimization techniques for distributed workflows typically sacrifice reproducibility. We introduce NBRewind, a notebook kernel system for efficient, reproducible execution of distributed workflows in notebooks. NBRewind consists of two kernels: audit and repeat. The audit kernel performs incremental, cell-level checkpointing to avoid unnecessary re-runs; the repeat kernel reconstructs checkpoints and enables partial re-execution, including of notebook cells that manage the distributed workflow. Both kernels rely on data-flow analysis across cells. We show how checkpoints and logs, when packaged as part of a standardized notebook specification, improve sharing and reproducibility. Using real-world case studies, we show that creating incremental checkpoints adds minimal overhead and enables portable, cross-site reproducibility of notebook-based distributed workflows on HPC systems.
@article{ P-2026-96,
title = { Efficiently Reproducing Distributed Workflows in Notebook-based Systems },
author = { Talha, Azaz and Raza, Ahmad and Malik, Tanu },
journal = { The 26th IEEE International Symposium on Cluster, Cloud, and Internet Computing (CCGrid) },
publisher = { IEEE },
year = 2026,
}