- inner product
DO
DO
END DO
END DO
- Row-wise access on the left factor, column-wise access on the right factor: memory access conflicts.
- Access on the result matrix is scalar (one entry at a time): poor vectorization properties.
- Parallelization: distribution of matrices.
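The inner product form can be sketched as follows. This is a minimal illustration only, assuming the product is written c = a*b with square n-by-n matrices; the names a, b, c, n, dp and the test values are assumptions made for this sketch and are not taken from the text. The intrinsic dot_product stands in for the innermost summation.

```fortran
program inner_product_form
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 3            ! assumed size, for illustration only
  real(dp) :: a(n,n), b(n,n), c(n,n)     ! assumed names: c = a*b
  integer :: i, j

  ! Fill the factors with simple, reproducible values.
  do j = 1, n
    do i = 1, n
      a(i,j) = real(i + j, dp)
      b(i,j) = real(i - j, dp)
    end do
  end do

  ! Inner product form: each entry c(i,j) is the inner product of
  ! row i of a (row-wise access) and column j of b (column-wise access).
  ! The result is written one scalar at a time.
  do i = 1, n
    do j = 1, n
      c(i,j) = dot_product(a(i,:), b(:,j))
    end do
  end do

  ! Cross-check against the intrinsic matrix product.
  if (maxval(abs(c - matmul(a, b))) > 1.0e-12_dp) error stop 'mismatch'
  print *, 'inner product form agrees with matmul'
end program inner_product_form
```

With Fortran's column-major storage, the section a(i,:) is a strided access while b(:,j) is contiguous, which is the mixed access pattern noted above.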
- middle product
DO
DO
END DO
END DO
- In the inner loop a single matrix entry acts as a scalar, the remaining terms are vectors: a Daxpy operation (y := a*x + y).
- Efficient use of vector pipes and memory bandwidth from cache.
- The loop over the Daxpy operations represents the Matrix-by-Vector operation from BLAS2.
- Column-wise access on the two matrices that supply the vectors is required.
- If these two matrices are stored row-wise, then the corresponding subscripts have to be exchanged in the algorithm.
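The middle product form can be sketched in the same way. Again this is only an illustration under assumptions: the product is taken as c = a*b with column-major n-by-n matrices, and the names a, b, c, n, dp and the loop order (columns of c outermost) are choices made for this sketch rather than the text's own notation.

```fortran
program middle_product_form
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 3            ! assumed size, for illustration only
  real(dp) :: a(n,n), b(n,n), c(n,n)     ! assumed names: c = a*b
  integer :: i, j, k

  ! Fill the factors with simple, reproducible values.
  do j = 1, n
    do i = 1, n
      a(i,j) = real(i + j, dp)
      b(i,j) = real(i - j, dp)
    end do
  end do

  c = 0.0_dp
  do j = 1, n
    ! The whole k-loop computes c(:,j) = a * b(:,j),
    ! i.e. one Matrix-by-Vector product.
    do k = 1, n
      ! Daxpy: the scalar b(k,j) times column k of a is added to column j of c.
      c(:,j) = c(:,j) + b(k,j) * a(:,k)
    end do
  end do

  ! Cross-check against the intrinsic matrix product.
  if (maxval(abs(c - matmul(a, b))) > 1.0e-12_dp) error stop 'mismatch'
  print *, 'middle product form agrees with matmul'
end program middle_product_form
```

Both vector operands, a(:,k) and c(:,j), are contiguous columns here, so the update runs with unit stride.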
- outer product
DO
DO
END DO
END DO
- Similar to the middle product; here one vector stays constant throughout the inner loop.
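A possible sketch of the outer product form follows, under the same assumptions as before: c = a*b with column-major n-by-n matrices, where the names a, b, c, n, dp and the choice of the j-loop as the inner loop are assumptions of this sketch. Each pass of the outermost loop adds one rank-1 update (column k of a times row k of b) to c.

```fortran
program outer_product_form
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 3            ! assumed size, for illustration only
  real(dp) :: a(n,n), b(n,n), c(n,n)     ! assumed names: c = a*b
  integer :: i, j, k

  ! Fill the factors with simple, reproducible values.
  do j = 1, n
    do i = 1, n
      a(i,j) = real(i + j, dp)
      b(i,j) = real(i - j, dp)
    end do
  end do

  c = 0.0_dp
  do k = 1, n
    ! Column a(:,k) stays constant during the whole j-loop.
    do j = 1, n
      ! Same Daxpy update as in the middle product form,
      ! only the order of the two loops is exchanged.
      c(:,j) = c(:,j) + b(k,j) * a(:,k)
    end do
  end do

  ! Cross-check against the intrinsic matrix product.
  if (maxval(abs(c - matmul(a, b))) > 1.0e-12_dp) error stop 'mismatch'
  print *, 'outer product form agrees with matmul'
end program outer_product_form
```

Compared with the middle product sketch, every column of c is touched in each pass over k, so c is read and written n times in total.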
Gundolf Haase
2000-03-20