6.2.2 Parallelization of the LU factorization

Next: 6.3 Gauß elimination of Up: 6.2 LU factorization Previous: 6.2.1 Vectorization of LU Contents

6.2.2 Parallelization of the LU factorization

In the following we have a distributed memory computer in mind.

Here, we investigate another algorithm for the LU factorization without Pivot search. Again,

is stored column-wise and

is stored rowise.
$\begin{algorithmus} % latex2html id marker 26997 [H]\caption{Rank-r-modification... ..._{i,j}$\ \\ \>\> END DO \\ \> END DO \\ END DO \end{tabbing}\end{algorithmus}$

**Figure 6.4:** Illustration to the rang-r-modification
$\begin{figure}\unitlength0.05\textwidth \mbox{}\hfill \begin{picture}(10,10) ... ...(4.5,2.5){\makebox(0,0){$\ell_{k,i}$}} \end{picture} \hfill\mbox{} \end{figure}$

For the purpose of parallelization, a block variant of the rank-r-modification seems preferable. Now, $A_{k,i}$ denotes the rows and columns of the block splitting of matrix

.
$\begin{algorithmus} % latex2html id marker 27054 \caption{Block variant of rank-... ...l=\overline{i+1,n} \makebox[0pt]{})$\ \\ END DO \end{tabbing}\end{algorithmus}$

If one uses a rowise or column-wise distribution of matrix on the processors, then the smaller the rest matrix gets the fewer processors are used (see Fig. 6.4).

$\Downarrow$

square block scattered decomposition as it is used in a parallel version of ScaLAPACK.

Ex.: Scattered distribution

$\begin{figure}\mbox{}\hfill \unitlength0.1\textwidth \begin{picture}(10,4) \mu... ...}_{(i-1) \mod P_x, (j-1) \mod P_y}$}} % \end{picture}\hfill\mbox{} \end{figure}$

Now we can formulate the parallel block version by using the scattered distribution of

.
$\begin{algorithmus} % latex2html id marker 27159 [H] \caption{Block wise rank-r-... ...i}\cdot U_{i,\ell}$\ \\ END DO \end{tabbing}\mbox{}\\ [-5ex] \end{algorithmus}$

Remark : A non-blocking communication seems advantageous from the point of implementation . A distribution of blocks in direction of rows and columns similar to the hypercube numbering (ring embedded in hypercube !) is also possible.

Next: 6.3 Gauß elimination of Up: 6.2 LU factorization Previous: 6.2.1 Vectorization of LU Contents

Gundolf Haase 2000-03-20