Thomas M. Stricker
Direct Deposit – When Message Passing Meets Shared Memory

Degree Type: Ph.D. in Computer Science
Advisor(s): Thomas Gross
Graduated: May 2000

Abstract:
The winner for the most efficient data transfer, i.e., the best data transfer services with the least amount of hardware support, is neither a pure coherent shared-memory architecture nor a pure, coarse-grain message-passing distributed-memory architecture. Looking at end-to-end transfers, the optimum lies between the two extremes. Fine-grain data transfer mechanisms that rely on non-coherent remote loads and stores in a global address space are highly useful. New models of communication that separate control transfer from data transfer are required to link the properties of these mechanisms to the properties of parallel programs and their correctness; my deposit and fetch model does this. The evaluation of several implementations of direct deposit indicates that direct deposit yields a major win (a factor of three on a Cray T3D) for large data transfers with complex communication or memory access patterns, and that the benefit is largely due to a reduction of data copies inside the communication system.

The search for optimal performance in message-passing systems can be approached from two ends. First, the performance of a full-function messaging library can be analyzed and the costly operations carefully eliminated. Second, an implementor can start from the most efficient low-level primitives and add functionality until a reasonable programming model is offered. I have worked from both ends and, in both cases, arrived at direct deposit message passing.

Thesis Committee:
Thomas Gross (Chair)
Guy Blelloch
Dave O'Hallaron
Peter Steenkiste
Kai Li (Princeton University)

Randy Bryant, Head, Computer Science Department
James Morris, Dean, School of Computer Science

Keywords: Parallel compilers, High Performance Fortran, direct deposit message passing, deposit model, postal model, rendezvous model, shared memory, remote store, remote load, memory system performance, massively parallel multiprocessors, Cray T3E, Cray T3D, Intel iWarp, Intel Paragon

CMU-CS-00-133.pdf (905.23 KB, 179 pages)
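To give the flavor of a deposit-style transfer described in the abstract, the minimal sketch below uses OpenSHMEM-style one-sided puts: the sender stores its data directly into the receiver's memory in a global address space, and a separate synchronization step carries the control transfer. This is only an illustration under assumed names and sizes, not the thesis's actual implementation on the Cray T3D, iWarp, or Paragon.

/* Minimal sketch of a deposit-style transfer using OpenSHMEM one-sided puts.
 * Compile with an OpenSHMEM toolchain, e.g.: oshcc deposit.c -o deposit
 * Buffer names and the message size (8 longs) are assumptions for this example. */
#include <shmem.h>
#include <stdio.h>

#define N 8

int main(void) {
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric (static) buffer: same address on every PE, forming the
     * global address space that the remote store targets. */
    static long dest[N];
    long src[N];
    for (int i = 0; i < N; i++)
        src[i] = me * 100 + i;

    /* "Deposit": the sender writes data directly into the receiver's buffer
     * with a one-sided put; no matching receive call and no intermediate
     * copy inside the communication system. */
    int target = (me + 1) % npes;
    shmem_long_put(dest, src, N, target);

    /* Control transfer is handled separately from the data transfer:
     * here a barrier tells every PE that all deposits have completed. */
    shmem_barrier_all();

    printf("PE %d received dest[0]=%ld from PE %d\n",
           me, dest[0], (me - 1 + npes) % npes);

    shmem_finalize();
    return 0;
}

The point of the sketch is the separation the abstract argues for: the put moves only data, while the barrier (or, in a real system, a flag written after a fence) carries the control information that the data has arrived.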