next up previous contents
Next: Optimizations Up: Merge-Sort Previous: Merge-Sort

Description of the Algorithm

The first algorithm implemented and tested was merge-sort. It is used to quickly sort a large one dimensional array of numbers, and is better suited to a parallel implementation than other sort algorithms because of the way data is divided into equal sized blocks which can be processed separately. A brief recursive definition of the algorithm in pseudo C++ goes as follows[Gor97]:

arrayType sort(arrayType array[])

{
if(!array.IsBaseCase())
{
array1 = array.FirstHalf();
array2 = array.SecondHalf();
array = merge(sort(array1), sort(array2));
}
return array;
}
The base case is a single element array. The merge function takes two sorted arrays and merges them into a single, larger sorted array. The algorithm specifies that an array is divided in half repeatedly, until it is broken down into one element arrays. The one element arrays are merged into two element arrays, the two element arrays into four element arrays, and so on until all elements have been merged back into an array of the original size.

  
Figure 6: Three Level Binary Tree with Node Numbers
\includegraphics[bb = 20 20 592 428,
scale=0.4]{ParallelPaper1andPicturesFig1.eps}

In the distributed implementation of merge-sort examined in this thesis, each process starts by forming a balanced binary tree, as shown in Figure 6. The tree height is determined by calculating the number of processes in a tree of a certain height, and incrementing that height from 0 to the point where a tree with one more level would require more processes than were created. Based on the tree parameters, each process calculates the MPI rank of its parent and children. Each process figures out whether it is the root, a leaf, or in the middle. Also based on tree height and the number of processes in the tree, appropriate index[] and edges[] arrays for passing to MPI_GRAPH_CREATE are constructed. This function creates a process topology based on the description of a graph. The graph constrains which processes can talk to each other. Once the tree is established, the root begins generating the array of integers to be sorted, using the rand() function. For measurement purposes, the root node starts a timer before dividing the array and sending half to each child node. Meanwhile, each process that isn't root is waiting to receive its part of the array. When the sub-array is received, if a process isn't a leaf, it splits its array in half, and sends half to each child process. It then waits for each child to sort and return its sub-array, and merges them. If a process is a leaf, it calls the qsort() function to sort the sub-array it receives. The qsort() function implements quicksort, which is not a distributed divide and conquer algorithm. It is the fastest sorting algorithm for a single processor, so it is used when there merge-sort has been distributed to all available processors. Next, if a process isn't root, it sends its sorted array back to its parent. If a process is root, it merges the sorted sub-arrays and stops the timer since the sort is done!

next up previous contents
Next: Optimizations Up: Merge-Sort Previous: Merge-Sort
Garth Brown
2001-01-26