Dmitriy Morozov (LBNL)
Tom Peterka (ANL)
Objectives
- DIY is a programming model and runtime for block-parallel analytics on DOE leadership machines.
- Its main abstraction is block parallelism: all parallel operations and communications are expressed in terms of blocks, not processors.
- This enables the same program to run in- and out-of-core with single or multiple threads.
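This flexibility is a constructor-level choice rather than an algorithmic one. Below is a minimal sketch, assuming DIY2's C++ API (diy::Master, diy::FileStorage, diy::save/diy::load); the Block type and the thread/memory limits are illustrative:

```cpp
// Minimal sketch: the same analysis code runs in-core or out-of-core,
// single- or multithreaded, depending only on how the executor is built.
#include <vector>
#include <diy/mpi.hpp>
#include <diy/master.hpp>
#include <diy/storage.hpp>
#include <diy/serialization.hpp>

struct Block { std::vector<double> values; };            // illustrative block type

void* create_block()                                     { return new Block; }
void  destroy_block(void* b)                             { delete static_cast<Block*>(b); }
void  save_block(const void* b, diy::BinaryBuffer& bb)   { diy::save(bb, static_cast<const Block*>(b)->values); }
void  load_block(void* b, diy::BinaryBuffer& bb)         { diy::load(bb, static_cast<Block*>(b)->values); }

int main(int argc, char** argv)
{
    diy::mpi::environment  env(argc, argv);              // RAII MPI_Init/MPI_Finalize
    diy::mpi::communicator world;

    int threads          = 4;    // blocks resident in memory execute concurrently
    int blocks_in_memory = 16;   // -1 keeps all blocks in-core; a finite limit spills to storage

    diy::FileStorage storage("./DIY.XXXXXX");            // swap space for out-of-core blocks
    diy::Master      master(world, threads, blocks_in_memory,
                            &create_block, &destroy_block,
                            &storage, &save_block, &load_block);
}
```

Changing threads or blocks_in_memory reconfigures execution without touching the analysis code.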
Impact
- DIY enabled Delaunay and Voronoi tessellation of dark matter particles from cosmology simulations to scale to 128K processes, improving performance by 50X [2].
- DIY enabled real-time ptychographic phase retrieval of synchrotron X-ray images on 128 GPUs [3].
- Honorable mention paper at LDAV 2016 [1].
Current Activities
The components of DIY and its place in the software stack are designed to address the data-movement challenge in extreme-scale data analysis.
- Enabling VTK-m by DIY-ing various VTK distributed-memory filters: parallel resampling, multipart dataset redistribution, and stream tracing.
- Ongoing development to prepare for exascale: relaxing synchronization, exploiting deeper memory hierarchies, and ensuring compatibility with many-core threading models.
[1] Morozov and Peterka, Block-Parallel Data Analysis with DIY2, LDAV 2016.
[2] Morozov and Peterka, Efficient Delaunay Tessellation through K-D Tree Decomposition, SC16.
[3] Nashed et al., Parallel Ptychographic Reconstruction, Optics Express 2014.
Notes
Investigators: Dmitriy Morozov (LBNL) and Tom Peterka (ANL)
Motivation: The rapid growth of computing and sensing capabilities is generating enormous amounts of scientific data. Parallelism can reduce the time required to analyze these data, and distributed memory allows datasets to be accommodated that exceed the capacity of even the largest-memory nodes. The most familiar parallel computing model in distributed-memory environments is arguably data-parallel domain-decomposed message passing: divide the input data into subdomains, assign subdomains to processors, and communicate between processors with messages. Complicating data-intensive analysis, however, is the fact that it occurs in multiple environments, ranging from supercomputers to smaller clusters and clouds to scientists' workstations and laptops. Hence, we need to develop portable analysis codes that are highly scalable on HPC architectures while remaining usable on smaller machines with far fewer cores and far less memory.
Solution: DIY is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks; and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to execute blocks residing in memory concurrently with multiple threads. This enables the same program to execute in-core or out-of-core, serially or in parallel, single-threaded or multithreaded, or in combinations thereof, as sketched below.
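To make the model concrete, here is a minimal sketch, assuming DIY2's C++ API (diy::Master, diy::ContiguousAssigner, diy::decompose, foreach/exchange); the Block contents and the callback names local_compute and receive are ours, not part of the library. It decomposes a 3D domain into blocks, iterates over the blocks resident in memory, and exchanges data between neighboring blocks:

```cpp
#include <vector>
#include <diy/mpi.hpp>
#include <diy/master.hpp>
#include <diy/assigner.hpp>
#include <diy/decomposition.hpp>

typedef diy::DiscreteBounds Bounds;

struct Block { double local_sum = 0.0; };                // illustrative block contents

void* create_block()         { return new Block; }
void  destroy_block(void* b) { delete static_cast<Block*>(b); }

// Per-block computation: do local work, then enqueue the result to every neighbor block.
void local_compute(Block* b, const diy::Master::ProxyWithLink& cp)
{
    b->local_sum = 1.0;                                  // stand-in for real analysis
    for (int i = 0; i < cp.link()->size(); ++i)
        cp.enqueue(cp.link()->target(i), b->local_sum);
}

// Per-block receive: dequeue whatever the neighbors sent after the exchange.
void receive(Block* b, const diy::Master::ProxyWithLink& cp)
{
    std::vector<int> incoming_gids;
    cp.incoming(incoming_gids);                          // gids with queued messages
    for (int gid : incoming_gids)
    {
        double v;
        cp.dequeue(gid, v);
        b->local_sum += v;
    }
}

int main(int argc, char** argv)
{
    diy::mpi::environment  env(argc, argv);
    diy::mpi::communicator world;

    int nblocks = 64;      // total number of blocks, independent of process count

    diy::Master             master(world, 1, -1, &create_block, &destroy_block);
    diy::ContiguousAssigner assigner(world.size(), nblocks);

    Bounds domain(3);      // global 3D domain (dimension argument in recent DIY versions)
    for (int i = 0; i < 3; ++i) { domain.min[i] = 0; domain.max[i] = 255; }

    // Regular decomposition: creates the blocks, links each to its neighbors,
    // and registers the local ones with master.
    diy::decompose(3, world.rank(), domain, assigner, master);

    master.foreach(&local_compute);   // iterate over blocks (moved in/out of core as needed)
    master.exchange();                // move enqueued data between blocks
    master.foreach(&receive);
}
```

Because the callbacks are written against blocks rather than MPI ranks, the same code runs unchanged whether the 64 blocks are spread over 64 processes or cycled through memory on a single core.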
Impact: Our contributions are a set of high-level programming abstractions (block decomposition, block execution in processes and threads, communication patterns over blocks, parallel I/O) and high-level algorithms built on them. Numerous open-source scientific data analysis applications using DIY have already been released and published:
Publications that use DIY:
- Morozov, D., Peterka, T.: Efficient Delaunay Tessellation through K-D Tree Decomposition. SC16.
- Peterka, T., Morozov, D., Phillips, C.: High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation. SC14.
- Lu, K., Shen, H.-W., Peterka, T.: Scalable Computation of Stream Surfaces on Large Scale Vector Fields. SC14.
- Chaudhuri, A., Lee, T.-Y., Shen, H.-W., Peterka, T.: Efficient Indexing and Querying of Distributions for Visualizing Large-scale Data. IEEE PacificVis 2014.
- Nashed, Y., Vine, D., Peterka, T., Deng, J., Ross, R., Jacobsen, C.: Parallel Ptychographic Reconstruction. Optics Express 2014.
- Sewell, C., Meredith, J., Moreland, K., Peterka, T., DeMarle, D., Lo, L.-T., Ahrens, J., Maynard, R., Geveci, B.: The SDAV Software Frameworks for Visualization and Analysis on Next-Generation Multi-Core and Many-Core Architectures. UltraVis 2012.
- Gyulassy, A., Peterka, T., Pascucci, V., Ross, R.: The Parallel Computation of Morse-Smale Complexes. IPDPS 2012.
- Nouanesengsy, B., Lee, T.-Y., Lu, K., Shen, H.-W., Peterka, T.: Parallel Particle Advection and FTLE Computation for Time-Varying Flow Fields. SC12.
- Chaudhuri, A., Lee, T.-Y., Zhou, B., Wang, C., Xu, T., Shen, H.-W., Peterka, T., Chiang, Y.-J.: Scalable Computation of Distributions from Large Scale Data Sets. LDAV 2012.
- Peterka, T., Ross, R.: Versatile Communication Algorithms for Data Analysis. IMUDI 2012.
- Peterka, T., Ross, R., Nouanesengsy, B., Lee, T.-Y., Shen, H.-W., Kendall, W., Huang, J.: A Study of Parallel Particle Tracing for Steady-State and Time-Varying Flow Fields. IPDPS 2011.
Publications about DIY itself:
- Morozov, D., Peterka, T.: Block-Parallel Data Analysis with DIY2. LDAV 2016.
- Peterka, T., Ross, R., Kendall, W., Gyulassy, A., Pascucci, V., Shen, H.-W., Lee, T.-Y., Chaudhuri, A.: Scalable Parallel Building Blocks for Custom Data Analysis. LDAV 2011.