
5.7.2.4 Coarse grid solver

The main difficulty in parallelizing a multilevel or multigrid algorithm lies in solving the coarse grid system exactly:

$\displaystyle \sum_{s=1}^{P} A_{s,1}^T\, \mathsf{K}_{s,1}\, A_{s,1}\, \underline{\mathsf{u}}_1 \;=\; \sum_{s=1}^{P} A_{s,1}^T\, \underline{\mathsf{f}}_{s,1} \enspace .$ (5.10)
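The assembly in (5.10) can be made concrete with a small NumPy sketch. The example below is hypothetical: a 1D coarse grid with 3 global nodes split into 2 subdomains, where $A_{s,1}$ is the boolean connectivity matrix of subdomain $s$ and $K_{s,1}$, $\underline{f}_{s,1}$ are its local stiffness matrix and load vector.

```python
import numpy as np

# Hypothetical 1D coarse grid: 3 global nodes, 2 subdomains.
# A_s maps global to local dofs (boolean connectivity); K_s is the
# local 1D Laplacian stiffness matrix; f_s is the local load vector.
A = [np.array([[1., 0., 0.],
               [0., 1., 0.]]),
     np.array([[0., 1., 0.],
               [0., 0., 1.]])]
K_local = [np.array([[1., -1.],
                     [-1., 1.]]) for _ in range(2)]
f_local = [np.array([0.5, 0.5]) for _ in range(2)]

# Accumulated coarse matrix and right-hand side as in (5.10):
K1 = sum(A_s.T @ K_s @ A_s for A_s, K_s in zip(A, K_local))
f1 = sum(A_s.T @ f_s for A_s, f_s in zip(A, f_local))

print(K1)   # [[ 1. -1.  0.] [-1.  2. -1.] [ 0. -1.  1.]]
print(f1)   # [0.5 1.  0.5]
```

The subdomain contributions overlap only in the shared node, where they are summed, exactly the accumulation the parallel variants below must realize by communication.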

The following approaches are common:
  1. Direct solver - sequential

    The coarsest grid is so small that one processor  $ \ensuremath{\mathbb{P}}_0$ can store it in addition to its own data.
    $ \bullet$ Accumulate  $ \mathsf{K}_1$ (REDUCE).
    $ \bullet$ Accumulate  $ \underline{\mathsf{f}}_1$ (REDUCE).
    $ \bullet$ Processor  $ \ensuremath{\mathbb{P}}_0$ solves $ \mathfrak{K}_1\cdot\underline{\mathfrak{u}}_1  =  \underline{\mathfrak{f}}_1$ .
    $ \bullet$ Distribute $ \underline{\mathfrak{u}}_1$ to all other processors (SCATTER).

    This algorithm is purely sequential!
  2. Direct solver - parallel

    We store parts of the accumulated matrix on all processors and solve the coarse grid system by a direct parallel solver. Here, load imbalances will appear.
  3. Iterative solver - parallel

    Here, no accumulation of $ \mathsf{K}_1$ and $ \underline{\mathsf{f}}_1$ is necessary, but in each iteration at least one accumulation has to be performed (see 5.1 - 5.6).

    Caution: If CG-like methods are used as coarse grid solvers, then a symmetric multigrid operator cannot be achieved. To preserve the symmetry one should combine, e.g., a Gauß-Seidel forward iteration with the corresponding backward iteration (SSOR).
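The sequential direct solver (option 1) can be sketched as a serial simulation: the MPI REDUCE and the final distribution are mimicked by Python sums and copies, and all matrices are hypothetical stand-ins for the distributed coarse grid data.

```python
import numpy as np

# Serial simulation of option 1 on an assumed 2-processor setting.
# Each "processor" stores a distributed part K_s, f_s of the coarse system;
# the parts sum to the SPD tridiagonal matrix tridiag(-1, 2, -1).
K_parts = [np.array([[2., -1., 0.], [-1., 1., 0.], [0., 0., 0.]]),
           np.array([[0., 0., 0.], [0., 1., -1.], [0., -1., 2.]])]
f_parts = [np.array([1.0, 0.5, 0.0]),
           np.array([0.0, 0.5, 1.0])]

# Steps 1+2: accumulate K_1 and f_1 on processor P_0 (REDUCE).
K1 = sum(K_parts)
f1 = sum(f_parts)

# Step 3: P_0 solves the accumulated system by a direct method.
u1 = np.linalg.solve(K1, f1)

# Step 4: distribute u_1 to all other processors (SCATTER).
u_on_proc = [u1.copy() for _ in K_parts]

print(u1)  # [1.5 2.  1.5]
```

While $\mathbb{P}_0$ performs step 3, all other processors idle, which is exactly why this variant is purely sequential.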

The (usually poor) ratio between communication and arithmetic performance of parallel machines results in a bad parallel efficiency of the coarse grid solver. This can be observed when using a W-cycle on 32 or more processors, because that multigrid cycle visits the coarser grids more often than a V-cycle does.
One idea to overcome this bottleneck is to distribute the coarser grids over fewer than $ P$ processors. The disadvantage of this approach is the additional communication between grid levels $ \ell_0$ and  $ \ell_0 + 1$ in the intergrid transfer.
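The scaling argument can be illustrated with a crude cost model. All constants and the quadratic work estimate below are purely illustrative assumptions, not measurements of any machine:

```python
import math

# Toy cost model (hypothetical constants) for why the coarse grid solve
# scales badly: the arithmetic per processor shrinks with P, while the
# start-up (latency) cost of the global reductions grows with log2(P).
T_FLOP = 1.0e-9      # assumed time per floating point operation
T_LATENCY = 1.0e-5   # assumed message start-up time

def coarse_solve_efficiency(n_coarse, P):
    """Crude efficiency estimate for one coarse grid iteration on P procs."""
    t_arith = n_coarse**2 * T_FLOP / P      # distributed matrix work
    t_comm = math.log2(P) * T_LATENCY       # tree-based REDUCE/SCATTER
    return t_arith / (t_arith + t_comm)

print(round(coarse_solve_efficiency(100, 2), 3))   # 0.333
print(round(coarse_solve_efficiency(100, 32), 4))  # 0.0062
```

Under these assumptions the efficiency collapses around 32 processors, matching the observation for the W-cycle above; restricting the coarser grids to fewer processors shrinks the $\log_2 P$ communication term at the price of extra intergrid transfer.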

Communication in the coarse grid solver!

Gundolf Haase 2000-03-20