Advanced Light Source Tomography
Reproducibility and repeatability are essential for tomographic analyses at the ALS, where users need to repeat an experimental analysis on their own data. Scientific facility users may have limited programming skills, and often just need a pre-defined computational analysis that they can run and modify without developing code from scratch. Jupyter notebooks are well suited to this type of user: analysis notebooks become recipes that can be minimally tweaked to run with different parameters or data. In our work with the ALS, we enabled a common, shared software environment where users could run a set of curated notebooks managed by the project
\citep{cholia}. The specialized software environment included tomography software, along with custom Jupyter widgets supporting interactive path selection and viewing of sinograms or digital slices through 3D data; installing and maintaining such an environment can be challenging for end users. Our tool,
clonenotebooks (based on
nbviewer), enabled users to browse Jupyter notebooks in repositories such as GitHub and clone them to NERSC with a pointer to this managed execution environment.
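To illustrate the kind of interactive viewing these widgets provide, the following is a minimal sketch of a slice viewer built with ipywidgets; the volume, function, and slider names are illustrative placeholders, not the project's actual widget code.

```python
# Minimal slice-viewer sketch using ipywidgets (illustrative only).
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, IntSlider

# Stand-in for a reconstructed 3D volume (depth x height x width).
volume = np.random.rand(64, 256, 256)

def show_slice(z=0):
    """Display one digital slice through the volume."""
    plt.imshow(volume[z], cmap="gray")
    plt.title(f"Slice {z}")
    plt.show()

# A slider in the notebook scrubs through slices interactively.
interact(show_slice, z=IntSlider(min=0, max=volume.shape[0] - 1))
```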
Exploring parameter spaces is important in tomographic analysis, where one may need to try many combinations of reconstruction parameters to determine the best fit for the data.
Papermill (developed at Netflix) allows us to execute parameterized Jupyter notebooks, where certain variables can be replaced across runs. We developed a Papermill-based workflow to enable interactive exploration of parameter spaces for ALS users, where each parameter set is applied to a common template notebook. Once a notebook with an optimal parameter set is identified, that parameter set can then be applied to additional data sets. HPC scaling is achieved through Dask, as in the earlier use cases.
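The following is a minimal sketch of this workflow; the notebook paths, parameter names, and grid values are hypothetical, and it assumes the template notebook contains a cell tagged "parameters" (as Papermill requires) and that a Dask cluster is already running on the allocated compute nodes.

```python
# Sketch: fan a Papermill parameter sweep out over a Dask cluster.
# Paths, parameter names, and values below are illustrative assumptions.
import papermill as pm
from dask.distributed import Client

def run_one(run_id, params):
    """Execute the shared template notebook with one parameter set."""
    output = f"runs/tomo_{run_id:03d}.ipynb"
    pm.execute_notebook("tomo_template.ipynb", output, parameters=params)
    return output

# Hypothetical reconstruction parameters to explore.
param_grid = [
    {"rotation_center": c, "ring_removal": r}
    for c in (1010, 1015, 1020)
    for r in (True, False)
]

# Connect to a Dask scheduler launched by the batch job script.
client = Client(scheduler_file="scheduler.json")
futures = [client.submit(run_one, i, p) for i, p in enumerate(param_grid)]
outputs = client.gather(futures)  # one executed notebook per parameter set
```

Each output notebook records its inputs and results, so the best parameter set can be identified by inspection and then reused on additional data sets.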
Conclusion
Jupyter is proving to have a transformative impact on modern computational science by enabling a new integrative mode of interactive supercomputing, where code, analysis, and data all come together under a single visual interface that can seamlessly access powerful hardware resources.
The patterns illustrated in the above use cases are flexible and can be applied to other science domains as well. Common themes of parallel execution, reproducible environments, and interactive visualization recur throughout our work enabling Jupyter for science on HPC systems.
Jupyter is quickly becoming the entry point to HPC for a growing class of users. The ability to provision different resources and integrate with HPC workload management systems through JupyterHub is an important enabler of easy-to-use interactive supercomputing.
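As one concrete illustration of this integration, JupyterHub can delegate spawning to a batch scheduler through batchspawner; the snippet below is a sketch of a jupyterhub_config.py for a Slurm-based site, with illustrative values rather than an actual site configuration.

```python
# jupyterhub_config.py sketch: run each user's notebook server as a
# Slurm batch job via batchspawner (values are illustrative).
c.JupyterHub.spawner_class = "batchspawner.SlurmSpawner"
c.SlurmSpawner.req_partition = "interactive"  # queue for interactive sessions
c.SlurmSpawner.req_runtime = "04:00:00"       # wall-clock limit per session
c.SlurmSpawner.req_nprocs = "4"               # cores per notebook server
```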
Because of mission, design, and technological trends, supercomputers and the HPC centers that run them are still less homogeneous as a group than cloud providers. This means "one size fits all" solutions are sometimes harder to come by. And while providers of supercomputing power want to increase ease of use, they are not interested in homogenizing or concealing specialized capabilities from expert users. Developers working on Jupyter projects that intersect with HPC should be especially careful to avoid making assumptions about HPC center policy (e.g., queue configuration, submit and run limits, privileged access), and should seek input from HPC developers on how to generalize those assumptions. As long as Jupyter developers remain committed to extensibility, abstraction, and remaining agnostic about deployment options, developers at HPC centers and their research partners can help fill the gaps.
There seems to be considerable momentum around Jupyter and HPC. We have built a network of contacts at other HPC centers to collaborate with and learn from. In 2019, with the Berkeley Institute for Data Science, we hosted a Jupyter Community Workshop on
Jupyter at HPC and research facilities to kickstart that process.
At NERSC our next step is to renew focus on supporting Jupyter at new scales and on new hardware. While we do a good job of meeting the needs of several hundred users per month, we need to do more to lower the barriers to entry for analytics cluster software and parallelized visualization tools. We look forward to this work beginning on the Perlmutter machine this year.
Acknowledgments
This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231. We would like to thank the core Jupyter team, and especially our collaborators at U.C. Berkeley for technical guidance and support around the Jupyter and JupyterHub ecosystem.