Next: Optimizations
Up: Merge-Sort
Previous: Merge-Sort
The first algorithm implemented and tested was merge-sort. It is used
to quickly sort a large one dimensional array of numbers, and is
better suited to a parallel implementation than other sort algorithms
because of the way data is divided into equal sized blocks which can
be processed separately. A brief recursive definition of the algorithm
in pseudo C++ goes as follows[Gor97]:
arrayType sort(arrayType array[])
{
if(!array.IsBaseCase())
{
array1 = array.FirstHalf();
array2 = array.SecondHalf();
array = merge(sort(array1), sort(array2));
}
return array;
}
The base case is a single element array. The merge function takes two
sorted arrays and merges them into a single, larger sorted array. The
algorithm specifies that an array is divided in half repeatedly, until
it is broken down into one element arrays. The one element arrays are
merged into two element arrays, the two element arrays into four
element arrays, and so on until all elements have been merged back
into an array of the original size.
Figure 6:
Three Level Binary Tree with Node Numbers
|
In the distributed implementation of merge-sort examined in this
thesis, each process starts by forming a balanced binary tree, as shown
in Figure 6. The
tree height is determined by calculating the number of processes in a
tree of a certain height, and incrementing that height from 0 to the
point where a tree with one more level would require more processes
than were created. Based on the tree parameters, each process
calculates the MPI rank of its parent and children. Each process
figures out whether it is the root, a leaf, or in the middle. Also
based on tree height and the number of processes in the tree,
appropriate index[] and edges[] arrays for passing to
MPI_GRAPH_CREATE are constructed. This function creates a process
topology based on the description of a graph. The graph constrains
which processes can talk to each other. Once the tree is established,
the root begins generating the array of integers to be sorted, using
the rand() function. For measurement purposes, the root node starts a
timer before dividing the array and sending half to each child node.
Meanwhile, each process that isn't root is waiting to receive its part
of the array. When the sub-array is received, if a process isn't a
leaf, it splits its array in half, and sends half to each child
process. It then waits for each child to sort and return its
sub-array, and merges them. If a process is a leaf, it calls the
qsort() function to sort the sub-array it receives. The qsort()
function implements quicksort, which is not a distributed divide and
conquer algorithm. It is the fastest sorting algorithm for a single
processor, so it is used when there merge-sort has been distributed to
all available processors. Next, if a process isn't root, it sends its
sorted array back to its parent. If a process is root, it merges the
sorted sub-arrays and stops the timer since the sort is done!
Next: Optimizations
Up: Merge-Sort
Previous: Merge-Sort
Garth Brown
2001-01-26