6.2.2 Parallelization of the LU factorization
In the following we have a distributed memory computer in mind.
Here we investigate another algorithm for the LU factorization without
pivot search. Again,
L is stored column-wise and U is stored row-wise.
Figure 6.4: Illustration of the rank-r modification.
For the purpose of parallelization, a block variant of the
rank-r-modification seems preferable.
Now,
denotes the rows and columns of the block splitting
of matrix
.
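As a sketch of what one step of the block variant computes, assume the matrix is called A (its symbol is lost in this text), the leading block A_11 has r rows and columns, and no pivot search is needed. The block names below are assumptions chosen for illustration:

```latex
\begin{aligned}
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
&= \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
   \begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix}, \\
A_{11} &= L_{11} U_{11}, \qquad
U_{12} = L_{11}^{-1} A_{12}, \qquad
L_{21} = A_{21} U_{11}^{-1}, \\
L_{22} U_{22} &= A_{22} - L_{21} U_{12}.
\end{aligned}
```

The last line is the rank-r modification of the remaining matrix: L_21 U_12 is the product of an (n-r) x r block with an r x (n-r) block, so its rank is at most r. The same step is then applied to the remaining matrix.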
If one uses a row-wise or column-wise distribution of the matrix
across the processors, then the smaller the remaining matrix gets, the fewer
processors are used (see Fig. 6.4).
This is avoided by the square block scattered decomposition, as used in
ScaLAPACK, the distributed-memory parallel counterpart of LAPACK.
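The loss of active processors can be made concrete with a small count. The following Python sketch is a simplified one-dimensional illustration, not the two-dimensional scheme itself: it compares a contiguous row-wise distribution with a scattered (cyclic) one. The function names and the sizes n = 16, p = 4 are assumptions chosen for the example.

```python
def active_block(k, n, p):
    """Number of processes owning at least one of the rows k..n-1
    when the n rows are split into p contiguous blocks."""
    rows_per_proc = -(-n // p)  # ceiling division
    return len({i // rows_per_proc for i in range(k, n)})


def active_scattered(k, n, p):
    """Same count when row i is stored on process i mod p (scattered)."""
    return len({i % p for i in range(k, n)})


n, p = 16, 4
for k in (0, 4, 8, 12):  # k = first row of the remaining matrix
    print(k, active_block(k, n, p), active_scattered(k, n, p))
# 0 4 4
# 4 3 4
# 8 2 4
# 12 1 4
```

With the contiguous distribution one process drops out every four elimination steps, while the scattered distribution keeps all four busy until fewer than p rows remain.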
Ex.: Scattered distribution
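The example itself is not preserved in this text. As a stand-in, the following Python sketch prints which process owns each block under a scattered distribution. It assumes a pr x pc process grid with row-major ranks and square blocks of size nb; the names owner and owner_map are hypothetical and not taken from ScaLAPACK.

```python
def owner(i, j, nb, pr, pc):
    """Process grid coordinates (row, col) owning matrix entry (i, j)
    for square blocks of size nb on a pr x pc process grid."""
    return ((i // nb) % pr, (j // nb) % pc)


def owner_map(nblocks, pr, pc):
    """Rank (row-major in the process grid) owning each block of an
    nblocks x nblocks block matrix."""
    return [[(I % pr) * pc + (J % pc) for J in range(nblocks)]
            for I in range(nblocks)]


for row in owner_map(4, 2, 2):
    print(*row)
# 0 1 0 1
# 2 3 2 3
# 0 1 0 1
# 2 3 2 3
```

Any remaining matrix that still spans at least pr x pc blocks touches every process, which is the property the contiguous distributions lack.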
Now we can formulate the parallel block version by using the
scattered distribution of the matrix.
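The formulation of the algorithm is not preserved in this text. As a stand-in, here is a minimal serial sketch in Python of a block LU factorization without pivot search, with comments marking where a distributed version would need communication. The names block_lu and multiply_lu, the block size r and the test matrix are assumptions for illustration, not the author's notation.

```python
def block_lu(a, r):
    """In-place block LU without pivoting on a list-of-lists matrix.
    Afterwards a holds U on and above the diagonal and the
    unit lower triangular L (without its diagonal) below it."""
    n = len(a)
    for k in range(0, n, r):
        e = min(k + r, n)
        # 1. Factor the column panel (rows k..n-1, columns k..e-1):
        #    yields L11, U11 and L21. In a distributed version this
        #    panel lives in one process column.
        for j in range(k, e):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                for c in range(j + 1, e):
                    a[i][c] -= a[i][j] * a[j][c]
        # 2. Row panel: U12 = L11^{-1} A12 by forward substitution.
        #    In a distributed version L11 must reach the process row
        #    that owns A12.
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # 3. Rank-r modification A22 -= L21 * U12. Each process would
        #    update its own blocks after receiving its parts of L21
        #    and U12.
        for i in range(e, n):
            for c in range(e, n):
                a[i][c] -= sum(a[i][j] * a[j][c] for j in range(k, e))
    return a


def multiply_lu(lu):
    """Rebuild L * U from the packed factors to check the result."""
    n = len(lu)
    return [[sum((1.0 if k == i else lu[i][k]) * lu[k][j]
                 for k in range(min(i, j) + 1))
             for j in range(n)] for i in range(n)]


a0 = [[4.0, 1.0, 2.0, 0.5],
      [1.0, 5.0, 1.0, 2.0],
      [2.0, 1.0, 6.0, 1.0],
      [0.5, 2.0, 1.0, 7.0]]
lu = block_lu([row[:] for row in a0], r=2)
back = multiply_lu(lu)
err = max(abs(back[i][j] - a0[i][j]) for i in range(4) for j in range(4))
print(err)  # should be at rounding-error level
```

The test matrix is diagonally dominant, so the factorization is safe without pivot search. For other matrices a zero or tiny pivot would break this sketch.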
Remark:
Non-blocking communication seems advantageous from the
point of view of implementation.
A distribution of the blocks in the row and column directions,
similar to the hypercube numbering (ring embedded in a hypercube), is also possible.
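The remark does not say how the ring is embedded in the hypercube. A common choice, assumed here, is the binary-reflected Gray code: ring position i is mapped to hypercube node gray(i), so that neighbouring ring positions sit on directly connected hypercube nodes. The function names are hypothetical.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def ring_in_hypercube(d):
    """Hypercube node ids visited by a ring with 2**d positions."""
    return [gray(i) for i in range(2 ** d)]


ring = ring_in_hypercube(3)
print(ring)  # [0, 1, 3, 2, 6, 7, 5, 4]

# Successive nodes (including the wrap-around) differ in exactly one bit,
# i.e. they are neighbours in the hypercube.
print(all(bin(a ^ b).count("1") == 1
          for a, b in zip(ring, ring[1:] + ring[:1])))  # True
```

Applying such a numbering separately to the row and column indices of the process grid would give the distribution of blocks in both directions that the remark mentions.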
Gundolf Haase
2000-03-20