Thomas M. Stricker

Direct Deposit – When Message Passing meets Shared Memory
Degree Type: Ph.D. in Computer Science
Advisor(s): Thomas Gross
Graduated: May 2000

Abstract:

Neither a pure coherent shared memory architecture nor a pure, coarse-grain message passing distributed memory architecture wins the contest for the most efficient data transfer service, i.e., the best transfer performance with the least amount of hardware support. Looking at end-to-end transfers, the optimum lies between the two extremes. Fine-grain data transfer mechanisms that rely on noncoherent remote loads and stores in a global address space are highly useful. New models of communication that separate control transfer from data transfer are required to link the properties of those data transfer mechanisms to the properties of parallel programs and their correctness. My deposit and fetch model does exactly this. The evaluation of several implementations of direct deposit indicates that direct deposit yields a major win (a factor of three on a Cray T3D) for large data transfers with complex communication or memory access patterns, and that the benefit is largely due to a reduction of data copies inside the communication system.

The search for optimal performance in message passing systems can be approached from two ends. First, the performance of a full-function messaging library can be analyzed and the costly operations carefully eliminated. Second, an implementor can start from the most efficient low-level primitives and add functionality until a reasonable programming model is offered. Personally, I have worked from both ends and arrived in both cases at direct deposit message passing.
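The deposit model separates the two concerns: the sender writes data directly into a preallocated buffer in the receiver's address space, and only afterwards passes control, e.g., by raising a completion flag. The following minimal sketch in C illustrates that put-then-notify pattern using modern OpenSHMEM one-sided primitives as a stand-in; it is only an illustration of the idea, not the thesis's actual iWarp or Cray T3D implementation, and the buffer names and sizes are invented for the example.

/* deposit_sketch.c - illustrative put-then-notify transfer (OpenSHMEM) */
#include <shmem.h>
#include <stdio.h>

#define N 1024

int main(void)
{
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric (remotely addressable) objects: the receiver pre-allocates
       the destination, so the sender can deposit data directly into place. */
    static double buf[N];     /* data lands here on the target PE   */
    static int    ready = 0;  /* control flag, raised after the data */

    double local[N];
    for (int i = 0; i < N; i++)
        local[i] = me + i;

    shmem_barrier_all();

    if (me == 0 && npes > 1) {
        /* Data transfer: deposit directly into PE 1's buffer,
           with no intermediate message buffer or extra copy.    */
        shmem_double_put(buf, local, N, 1);
        /* Order the data ahead of the control flag to the same PE. */
        shmem_fence();
        /* Control transfer: tell PE 1 the deposit is complete.     */
        shmem_int_p(&ready, 1, 1);
    } else if (me == 1) {
        /* Wait for the control flag; the data is already in place. */
        shmem_int_wait_until(&ready, SHMEM_CMP_EQ, 1);
        printf("PE 1 received buf[0] = %g\n", buf[0]);
    }

    shmem_finalize();
    return 0;
}

Because the destination address is known before the transfer starts, the data never passes through intermediate system buffers; avoiding those copies is the source of the performance gain cited in the abstract.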

Thesis Committee:
Thomas Gross (Chair)
Guy Blelloch
Dave O’Hallaron
Peter Steenkiste
Kai Li (Princeton University)

Randy Bryant, Head, Computer Science Department
James Morris, Dean, School of Computer Science

Keywords:
Parallel compilers, High Performance Fortran, direct deposit message passing, deposit model, postal model, rendezvous model, shared memory, remote store, remote load, memory system performance, massively parallel multiprocessors, Cray T3E, Cray T3D, Intel iWarp, Intel Paragon

CMU-CS-00-133.pdf (905.23 KB, 179 pages)