Jonathan C. Hardwick

Practical Parallel Divide-and-Conquer Algorithms Degree Type: Ph.D. in Computer Science
Advisor(s): Guy Blelloch
Graduated: December 1997

Abstract:

Nested data parallelism has been shown to be an important feature of parallel languages, allowing the concise expression of algorithms that operate on irregular data structures such as graphs and sparse matrices. However, previous nested data-parallel languages have relied on a vector PRAM implementation layer that cannot be efficiently mapped to MPPs with high inter-processor latency. This thesis shows that by restricting the problem set to that of data-parallel divide-and-conquer algorithms I can maintain the expressibility of full nested data-parallel languages while achieving good efficiency on current distributed-memory machines.

Specifically, I define the team parallel model, which has four main features: data-parallel operations within teams of processors, the subdivision of these teams to match the recursion of a divide-and-conquer algorithm, efficient single-processor code to exploit existing serial compiler technology, and an active load-balancing system to cope with irregular algorithms. I also describe Machiavelli, a toolkit for parallel divide-and-conquer algorithms that implements the team parallel model. Machiavelli consists of simple parallel extensions to C, and is based around a distributed vector datatype. A preprocessor generates both serial and parallel versions of the code, using MPI as its parallel communication mechanism to assure portability across machines. Load balancing is performed by shipping function calls between processors.

Using a range of algorithm kernels (including sorting, finding the convex hull of a set of points, computing a graph separator, and matrix multiplication), I demonstrate optimization techniques for the implementation of divide-and-conquer algorithms. An important feature of team parallelism is its ability to use efficient serial algorithms supplied by the user as the base case of recursion. I show that this allows parallel algorithms to achieve good speedups over efficient serial code, and to solve problems much larger than those which can be handled on one processor. As an extended example, a Delaunay triangulation algorithm implemented using the team-parallel model is three times as efficient as previous parallel algorithms.

Keywords:
Divide-and-conquer, parallel algorithm, language model, team parallelism, Machiavelli