next up previous contents
Next: 5.4.4.2 Variant 2 : Gauß-Seidel-Jacobi Up: 5.4.4 Parallel algorithms Previous: 5.4.4 Parallel algorithms   Contents


5.4.4.1 Variant 1 : formal transformation

Writing the update step in Alg. 5.7 using parallel data types results in :
\begin{algorithmus}
% latex2html id marker 17184
[H]\caption{Parallel update ste...
...lor{red}\mathfrak{u}}}}^{k-1}_s \right)
\end{array}\end{array}$\end{algorithmus}
Algorithm 5.9 seems to be perfect at the first glance however an efficient parallel implementation is nearly impossible. The numbering of unknowns like in Sec. 4.3.1 produces a structure of matrices and vectors as presented in (4.2).
For a detailed analysis, we split the calculations in the update step of Alg. 5.9 according to the block structure of vectors and matrices.
$\displaystyle \begin{eqnarray}\underline{{\ensuremath{\color{green}{\sf r}}}}_V...
...j}^{k} - \sum\limits_{j=i}^{N} {K}_{I,ij}{u}_{I,j}^{k-1} \right) \end{eqnarray}$ (5.3a)

Sum  $ \pmb{\sum\limits_{s=1}^{P}} A_{s,i}^T(\cdot)$ means that in the update of each single component $ i$ we have to accumulate the appropriate component of the residuum. The amount of data to transfer is the same as in the update step of the $ \omega $-Jacobi iteration ( $ N_C=N_V+N_E$) but here we have to perform $ N_C$ single communications with additional $ N_C$ startups. Assuming the ratio $ t_{\mathrm{Startup}} = c\cdot t_{\mathrm{Word}}$ we have :
$\displaystyle t_{\mathrm{Jacobi}}  := 
t_{\mathrm{Startup}} + N_C\cdot t_{\mathrm{Word}}$ $\displaystyle \le$ $\displaystyle N_C\cdot ( t_{\mathrm{Startup}} + t_{\mathrm{Word}})
 =:  t_{\mathrm{GS}}$  
$\displaystyle (N_C + c)\cdot t_{\mathrm{Word}}$ $\displaystyle \le$ $\displaystyle N_C (c+1)\cdot t_{\mathrm{Word}}
\qquad\mathrm{if}\quad N_C \ge 1$  

The ratio of the communication behavior between both iterations is

$\displaystyle t_{\mathrm{GS}} \;\ge\;
N_C\left[\frac{c+1}{c+N_C}\right] \cdot ...
...tackrel{N_c\to\infty}{\longrightarrow}
(c+1) \cdot t_{\mathrm{GS}} \enspace ,
$

i.e. the parallel efficiency of the Gauß-Seidel iteration with update step (5.3) will be much lower than in a Jacobi iteration.

By taking advantage of the neighborhood relations in the domain decomposition, the communication ratio $ N_C$ can be reduced to the maximal number of coupled unknowns on one interface edge/face.

For an efficient parallelization of (5.3) the requirements 3a and 3a on the FE-mesh have to be fulfilled, see also Fig. 5.4. If we cannot fulfill these conditions then we have to apply (5.3b) and (5.3d) purely sequentially for all $ i=\overline{1,N_C} \makebox[0pt]{}$ ! In that case a red-black numbering of the interfaces is advisable which is not non-trivial for complex domains.


next up previous contents
Next: 5.4.4.2 Variant 2 : Gauß-Seidel-Jacobi Up: 5.4.4 Parallel algorithms Previous: 5.4.4 Parallel algorithms   Contents
Gundolf Haase 2000-03-20