
4.1.3.3 Parallelization of $ C_{n\times n} \;:=\; C_{n\times n} + A_{n\times n} \ast B_{n\times n} $

Starting point:
- outer product with row-wise access
- block matrices $ C^{i,j}$, $ A^{i,j}$, $ B^{i,j}$
- 2D torus topology ($ n \times n$ processes), distributed memory


DO $ k \;:=\; 1, n $
   DO $ i \;:=\; 1, n $
      $ C_{i,\ast} \;:=\; C_{i,\ast} + A_{i,k} \ast B_{k,\ast}$
   END DO
END DO

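The loop nest above can be sketched sequentially as follows. This is only an illustration under stated assumptions: the names `block_mult_add` and `block_outer_product` are hypothetical, blocks are stored as nested Python lists, indices run from 0 instead of 1, and the row update $ C_{i,\ast}$ is written out as an explicit loop over the block columns $ j$.

```python
def block_mult_add(X, Y, Z):
    """Return X + Y*Z for dense blocks stored as lists of lists."""
    rows, inner, cols = len(Y), len(Z), len(Z[0])
    return [[X[r][c] + sum(Y[r][t] * Z[t][c] for t in range(inner))
             for c in range(cols)]
            for r in range(rows)]


def block_outer_product(C, A, B):
    """Outer-product form with row-wise access.

    C, A, B are n x n grids of blocks (hypothetical storage layout).
    C is updated in place and also returned.
    """
    n = len(A)
    for k in range(n):              # DO k := 1, n
        for i in range(n):          # DO i := 1, n
            for j in range(n):      # C_{i,*} := C_{i,*} + A_{i,k} * B_{k,*}
                C[i][j] = block_mult_add(C[i][j], A[i][k], B[k][j])
    return C


if __name__ == "__main__":
    # 2 x 2 grid of 1 x 1 blocks, chosen only to keep the example small
    A = [[[[1]], [[2]]], [[[3]], [[4]]]]
    B = [[[[5]], [[6]]], [[[7]], [[8]]]]
    C = [[[[0]], [[0]]], [[[0]], [[0]]]]
    print(block_outer_product(C, A, B))
```

For a fixed $ k$, every block row $ i$ needs the whole block row $ B_{k,\ast}$ but only the single block $ A_{i,k}$, which is what makes a row-wise broadcast of $ A$ blocks attractive on a distributed-memory machine.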
The following algorithm was proposed by Fox [Fox88,KGGK94].

Algorithm: broadcast-multiply-r...
   ...
   - send $ \widetilde{B}_{i,j}$ to $ {\cal P}^{[i-1,j]}$

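Remark: At the end, $ \widetilde{B}_{i,j} = B_{i,j}$ holds, because after $ n$ cyclic shifts along a column of the torus every block has returned to the process it started on.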
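A sequential simulation can make the broadcast, multiply and roll phases concrete. The sketch below follows the commonly described form of Fox's algorithm and is not reconstructed from the listing above. It assumes that in step $ k$ the block $ A_{i,(i+k) \bmod n}$ is broadcast along process row $ i$, that every process then multiplies it with its current $ \widetilde{B}_{i,j}$, and that $ \widetilde{B}_{i,j}$ is finally sent to $ {\cal P}^{[i-1,j]}$ with wrap-around. The names `block_mult_add` and `fox_multiply` are hypothetical, indices are 0-based, and the process grid is modelled by plain nested lists instead of real message passing.

```python
def block_mult_add(X, Y, Z):
    """Return X + Y*Z for dense blocks stored as lists of lists."""
    rows, inner, cols = len(Y), len(Z), len(Z[0])
    return [[X[r][c] + sum(Y[r][t] * Z[t][c] for t in range(inner))
             for c in range(cols)]
            for r in range(rows)]


def fox_multiply(C, A, B):
    """Simulate broadcast-multiply-roll on an n x n torus of processes.

    Entry [i][j] of each grid is the block held by process P[i, j].
    Returns the updated C and the final B_tilde.
    """
    n = len(A)
    B_tilde = [row[:] for row in B]      # each process starts with its own B block
    for k in range(n):
        # Broadcast: row i receives the block A_{i,(i+k) mod n}
        A_bcast = [A[i][(i + k) % n] for i in range(n)]
        # Multiply: purely local update on every process
        for i in range(n):
            for j in range(n):
                C[i][j] = block_mult_add(C[i][j], A_bcast[i], B_tilde[i][j])
        # Roll: P[i, j] sends B_tilde to P[i-1, j], so row i receives from row i+1
        B_tilde = [B_tilde[(i + 1) % n] for i in range(n)]
    return C, B_tilde


if __name__ == "__main__":
    # 2 x 2 grid of 1 x 1 blocks, chosen only to keep the example small
    A = [[[[1]], [[2]]], [[[3]], [[4]]]]
    B = [[[[5]], [[6]]], [[[7]], [[8]]]]
    C = [[[[0]], [[0]]], [[[0]], [[0]]]]
    C, B_tilde = fox_multiply(C, A, B)
    print(C)
    print(B_tilde == B)
```

In this simulation process $ {\cal P}^{[i,j]}$ holds $ B_{(i+k) \bmod n,\,j}$ at the start of step $ k$, so the broadcast block $ A_{i,(i+k) \bmod n}$ always meets its matching partner, and the $ n$ roll steps bring every block back to its origin as stated in the remark.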



Gundolf Haase 2000-03-20