By keeping all of the new code in a Notebook initially, it was easy to iteratively test what was working and check results. The full dataset is split as evenly as possible based on the number of workers available, and each chunk is passed to the wrapper function on a worker. This parallel implementation of the Bragg disk detection code dramatically reduced processing time for large data compared to the original serial version: a 300GB dataset now takes only minutes to process instead of days. The parallelized Bragg disk routines were run on 40 Cori nodes (1,280 workers), with the results synthesized and visualized in the Jupyter Notebook.
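The chunking-and-dispatch pattern described above can be illustrated with a minimal sketch. The sketch below assumes a Dask distributed `Client` as the scheduling layer; the function names (`detect_disks_chunk`, `process_in_parallel`) and the stand-in per-pattern computation are hypothetical, not the actual py4DSTEM routines, and the split-by-worker-count logic is the part being demonstrated.

```python
import numpy as np
from dask.distributed import Client

def detect_disks_chunk(chunk):
    """Hypothetical wrapper: run disk detection on one chunk of
    diffraction patterns and return per-pattern results."""
    # Stand-in computation; the real code would call the actual
    # Bragg disk detection routine on each pattern here.
    return [pattern.sum() for pattern in chunk]

def process_in_parallel(data, client):
    """Split `data` as evenly as possible across the available
    workers, submit one wrapper call per chunk, and gather."""
    n_workers = len(client.scheduler_info()["workers"])
    chunks = np.array_split(data, n_workers)          # near-even split
    futures = client.map(detect_disks_chunk, chunks)  # one task per worker
    per_chunk = client.gather(futures)                # collect results
    # Flatten the per-chunk lists back into one result list.
    return [r for chunk_result in per_chunk for r in chunk_result]

if __name__ == "__main__":
    # Local cluster for testing; on a system like Cori this would
    # connect to a scheduler spanning many nodes.
    client = Client()
    data = np.random.rand(1000, 64, 64)  # toy stack of 64x64 patterns
    results = process_in_parallel(data, client)
    print(f"processed {len(results)} patterns")
```

Splitting once by worker count keeps every worker busy with a single large task, which minimizes scheduling overhead; finer-grained chunking would trade some overhead for better load balancing if per-pattern costs vary.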