5.4.4.1 Variant 1 : formal transformation

Next: 5.4.4.2 Variant 2 : Gauß-Seidel-Jacobi Up: 5.4.4 Parallel algorithms Previous: 5.4.4 Parallel algorithms Contents

5.4.4.1 Variant 1 : formal transformation

Writing the update step in Alg. 5.7 using parallel data types results in :
$\begin{algorithmus} % latex2html id marker 17184 [H]\caption{Parallel update ste... ...lor{red}\mathfrak{u}}}}^{k-1}_s \right) \end{array}\end{array}$\end{algorithmus}$
Algorithm 5.9 seems to be perfect at the first glance however an efficient parallel implementation is nearly impossible. The numbering of unknowns like in Sec. 4.3.1 produces a structure of matrices and vectors as presented in (4.2).
For a detailed analysis, we split the calculations in the update step of Alg. 5.9 according to the block structure of vectors and matrices.

$\displaystyle \begin{eqnarray}\underline{{\ensuremath{\color{green}{\sf r}}}}_V... ...j}^{k} - \sum\limits_{j=i}^{N} {K}_{I,ij}{u}_{I,j}^{k-1} \right) \end{eqnarray}$

(5.3a)

Sum $\pmb{\sum\limits_{s=1}^{P}} A_{s,i}^T(\cdot)$ means that in the update of each single component

we have to accumulate the appropriate component of the residuum. The amount of data to transfer is the same as in the update step of the $\omega$ -Jacobi iteration (

) but here we have to perform

single communications with additional

startups. Assuming the ratio $t_{\mathrm{Startup}} = c\cdot t_{\mathrm{Word}}$ we have :

$\displaystyle t_{\mathrm{Jacobi}} := t_{\mathrm{Startup}} + N_C\cdot t_{\mathrm{Word}}$	$\displaystyle \le$	$\displaystyle N_C\cdot ( t_{\mathrm{Startup}} + t_{\mathrm{Word}}) =: t_{\mathrm{GS}}$
$\displaystyle (N_C + c)\cdot t_{\mathrm{Word}}$	$\displaystyle \le$	$\displaystyle N_C (c+1)\cdot t_{\mathrm{Word}} \qquad\mathrm{if}\quad N_C \ge 1$

The ratio of the communication behavior between both iterations is

$\displaystyle t_{\mathrm{GS}} \;\ge\; N_C\left[\frac{c+1}{c+N_C}\right] \cdot ... ...tackrel{N_c\to\infty}{\longrightarrow} (c+1) \cdot t_{\mathrm{GS}} \enspace ,$

i.e. the parallel efficiency of the Gauß-Seidel iteration with update step (5.3) will be much lower than in a Jacobi iteration.

By taking advantage of the neighborhood relations in the domain decomposition, the communication ratio can be reduced to the maximal number of coupled unknowns on one interface edge/face.

For an efficient parallelization of (5.3) the requirements 3a and 3a on the FE-mesh have to be fulfilled, see also Fig. 5.4. If we cannot fulfill these conditions then we have to apply (5.3b) and (5.3d) purely sequentially for all $i=\overline{1,N_C} \makebox[0pt]{}$ ! In that case a red-black numbering of the interfaces is advisable which is not non-trivial for complex domains.

Next: 5.4.4.2 Variant 2 : Gauß-Seidel-Jacobi Up: 5.4.4 Parallel algorithms Previous: 5.4.4 Parallel algorithms Contents

Gundolf Haase 2000-03-20