Our deployment has proceeded in overlapping phases, sometimes with parallel configurations for different use cases:
Phase 1: JupyterHub as science gateway. Jupyter first gave users the ability to run notebooks and kernels that could access data on NERSC's Global Filesystem (NGF), a collection of massive file systems for project- or group-level storage and user home directories. The principal use cases were smaller-scale data analytics and visualization, limited by the available hardware resources. This phase leveraged repurposed hardware external to the supercomputers and so could not provide access to platform storage or interact with batch queues. But it was a start, and it signaled to users that Jupyter use was welcome.
Phase 2: Jupyter on a Cori login node. Through a custom JupyterHub spawner based on SSH, we gave users a means to launch notebooks on Cori from an external hub. These notebooks could access Cori platform storage, interact with batch queues, run Jupyter kernels based on user-customized containers, and more. Networking adjustments enabled direct connections from login nodes to compute nodes, allowing persistent notebooks to interact with analytics (e.g., Spark, Dask) or visualization (e.g., yt) applications. All of this made Cori a much more data-science-friendly supercomputer. Phases 1 and 2 overlapped, and for some time two hubs were managed in parallel on reclaimed servers, a difficult-to-maintain situation that also confused users.
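To illustrate the spawner approach, the following is a minimal sketch of an SSH-based JupyterHub spawner. The class name, traits, and quoting here are illustrative rather than our production implementation, and key- or host-based SSH authentication to the login node is assumed.

\begin{verbatim}
# Hedged sketch of an SSH-based JupyterHub spawner; names are illustrative.
import asyncio

from jupyterhub.spawner import Spawner
from traitlets import Integer, Unicode


class SimpleSSHSpawner(Spawner):
    """Start a user's single-user notebook server on a remote login node."""

    remote_host = Unicode("login.example.gov", config=True,
                          help="Login node on which to start the server.")
    remote_port = Integer(8888, config=True,
                          help="Port the single-user server listens on.")

    proc = None  # handle to the local ssh client process

    async def start(self):
        # Assemble the single-user server command and the environment
        # (API token, hub URL, etc.) that JupyterHub expects to pass along.
        env = self.get_env()
        exports = " ".join(f"{k}={v!r}" for k, v in env.items())  # naive quoting
        cmd = " ".join(self.cmd + self.get_args())

        # Launch the server remotely over SSH; it runs as long as ssh does.
        self.proc = await asyncio.create_subprocess_exec(
            "ssh", f"{self.user.name}@{self.remote_host}",
            f"export {exports}; {cmd} --port={self.remote_port}",
        )
        # JupyterHub's proxy routes the user's traffic to this (host, port).
        return self.remote_host, self.remote_port

    async def poll(self):
        # None means "still running"; an integer exit status means stopped.
        if self.proc is None:
            return 0
        return self.proc.returncode

    async def stop(self, now=False):
        self.proc.terminate()
        await self.proc.wait()
\end{verbatim}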
Phase 3: Jupyter as interface to an HPC center. Moving JupyterHub to a Docker container-as-a-service platform at NERSC based on Rancher helped us address these issues, increased stability, and provided a platform for continuous enhancement. Extending JupyterHub's named-server support and leveraging community projects like wrapspawner and batchspawner have given us a paradigm not only for systematically expanding access to more shared nodes and compute nodes on Cori, but also for integrating new hardware upon delivery: new supercomputers, smaller special-purpose clusters (e.g., GPU clusters), and staff-only test systems. JupyterHub now serves as a single point of entry to Jupyter at NERSC for all users and staff (Figure \ref{900402}).
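Concretely, this pattern can be expressed in a few lines of JupyterHub configuration. The sketch below is illustrative only: the partition name, resource requests, and profile list are hypothetical placeholders, not our production settings.

\begin{verbatim}
# jupyterhub_config.py -- hedged sketch of the wrapspawner/batchspawner
# pattern; hostnames, partitions, and resource values are placeholders.
import batchspawner  # registers the batchspawner API handler with the hub

c.JupyterHub.allow_named_servers = True  # one user, several named servers

# ProfilesSpawner shows a menu; each entry wraps another spawner class.
c.JupyterHub.spawner_class = "wrapspawner.ProfilesSpawner"
c.ProfilesSpawner.profiles = [
    # (display name, key, spawner class, config overrides)
    ("Shared login node", "shared",
     "jupyterhub.spawner.LocalProcessSpawner", {}),
    ("Exclusive compute node (Slurm)", "compute",
     "batchspawner.SlurmSpawner",
     dict(req_partition="jupyter",   # hypothetical partition name
          req_runtime="4:00:00",
          req_nprocs="32")),
]
\end{verbatim}

Because ProfilesSpawner defers actual spawning to whichever class the user selects, a single hub can front shared login nodes, batch-scheduled compute nodes, and newly delivered systems alike; adding hardware amounts to adding a profile entry.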
Phase 4: JupyterLab as innovation platform. JupyterLab, introduced in 2018, addresses challenges that various software workflows face with notebooks alone. It enhances the Jupyter user experience through a powerful, modular extension framework that enables text editors, terminals, file viewers for various data formats, and other custom components to run side by side with the notebook in a tabbed workspace. We have leveraged this framework to preserve the core Jupyter user experience and functionality while providing enhancements that expose HPC center features in a more user-friendly way. Examples of integrations and tools arising from our research partnership include: