4.1.2 Matrix-by-Vector operations (BLAS2)


The BLAS2 and BLAS3 libraries handle full matrices as well as dense band and profile matrices. Here we mainly investigate BLAS2, i.e., matrix-by-vector operations.

Storage schemes for full matrices

i) row storage [C, Pascal]
ii) column storage [F77]
iii) mixed storage: row storage of the upper triangular submatrix, column storage of the lower triangular submatrix
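The three schemes differ only in where the entry $a_{ij}$ lands in memory. The following C sketch maps 0-based indices (i, j) to storage positions. For scheme iii) it assumes a layout the text does not fix: the upper triangle including the diagonal is packed row by row into one array, and the strictly lower triangle is packed column by column into a second array. The function names are hypothetical.

```c
#include <assert.h>

/* i) row storage (C, Pascal): rows are contiguous. Indices are 0-based. */
int pos_row(int i, int j, int n) { return i * n + j; }

/* ii) column storage (F77): columns are contiguous. */
int pos_col(int i, int j, int n) { return j * n + i; }

/* iii), upper part (assumed layout): a_ij with i <= j, packed row by row.
   Rows 0..i-1 occupy n + (n-1) + ... + (n-i+1) = i*n - i*(i-1)/2 slots. */
int pos_upper_rows(int i, int j, int n)
{
    assert(0 <= i && i <= j && j < n);
    return i * n - i * (i - 1) / 2 + (j - i);
}

/* iii), lower part (assumed layout): a_ij with i > j, packed column by column
   in a separate array. Columns 0..j-1 occupy (n-1) + ... + (n-j) slots. */
int pos_lower_cols(int i, int j, int n)
{
    assert(0 <= j && j < i && i < n);
    return j * (n - 1) - j * (j - 1) / 2 + (i - j - 1);
}
```

With this layout, row i of the upper part and column j of the lower part are both contiguous, which is what allows unit-stride DDOT and DAXPY calls for scheme iii).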

Exercise 12:

Rewrite the operation $\underline{v} = A_{n\times n} \underline{x}$ by means of the BLAS routines DDOT and DAXPY for the storage schemes i)-iii).
$\ast$ Implement scheme ii) without BLAS, but with loop unrolling by a factor of 2 (loop stride 2).
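One possible solution sketch in C. To keep it self-contained, my_ddot and my_daxpy are hypothetical local stand-ins that mimic the reference BLAS routines DDOT(N,DX,INCX,DY,INCY) and DAXPY(N,DA,DX,INCX,DY,INCY) for positive increments; in practice one would call the library routines instead. Scheme iii) assumes the upper triangle including the diagonal packed row-wise in U and the strictly lower triangle packed column-wise in L. The unrolled variant unrolls the loop over the columns by 2, which is one reasonable reading of the starred task.

```c
#include <assert.h>

/* Hypothetical stand-ins with DDOT/DAXPY semantics (positive strides only). */
static double my_ddot(int n, const double *x, int incx, const double *y, int incy)
{
    assert(incx > 0 && incy > 0);
    double s = 0.0;
    for (int k = 0; k < n; ++k)
        s += x[k * incx] * y[k * incy];
    return s;
}

static void my_daxpy(int n, double a, const double *x, int incx, double *y, int incy)
{
    assert(incx > 0 && incy > 0);
    for (int k = 0; k < n; ++k)
        y[k * incy] += a * x[k * incx]; /* y := a*x + y */
}

/* Scheme i), row storage: v_i = (row i of A) . x, one DDOT per row. */
void matvec_rows(int n, const double *A, const double *x, double *v)
{
    for (int i = 0; i < n; ++i)
        v[i] = my_ddot(n, &A[i * n], 1, x, 1);
}

/* Scheme ii), column storage: v = sum_j x_j * (column j of A), one DAXPY per column. */
void matvec_cols(int n, const double *A, const double *x, double *v)
{
    for (int i = 0; i < n; ++i)
        v[i] = 0.0;
    for (int j = 0; j < n; ++j)
        my_daxpy(n, x[j], &A[j * n], 1, v, 1);
}

/* Scheme iii), assumed packed layout: DDOT on upper rows, DAXPY on lower columns. */
void matvec_mixed(int n, const double *U, const double *L, const double *x, double *v)
{
    int pu = 0, pl = 0; /* start of current row of U / current column of L */
    for (int i = 0; i < n; ++i) {     /* row i of U holds a_ii .. a_i,n-1 */
        v[i] = my_ddot(n - i, &U[pu], 1, &x[i], 1);
        pu += n - i;
    }
    for (int j = 0; j < n - 1; ++j) { /* column j of L holds a_j+1,j .. a_n-1,j */
        my_daxpy(n - 1 - j, x[j], &L[pl], 1, &v[j + 1], 1);
        pl += n - 1 - j;
    }
}

/* Scheme ii) without BLAS: column loop unrolled by 2, two columns per sweep over v. */
void matvec_cols_unrolled2(int n, const double *A, const double *x, double *v)
{
    for (int i = 0; i < n; ++i)
        v[i] = 0.0;
    int j = 0;
    for (; j + 1 < n; j += 2) {
        const double *c0 = &A[j * n], *c1 = &A[(j + 1) * n];
        double x0 = x[j], x1 = x[j + 1];
        for (int i = 0; i < n; ++i)
            v[i] += c0[i] * x0 + c1[i] * x1;
    }
    if (j < n) /* odd n: one remaining column */
        for (int i = 0; i < n; ++i)
            v[i] += A[j * n + i] * x[j];
}
```

Row storage leads to one DDOT per row and column storage to one DAXPY per column. The unrolled version sweeps over v only about n/2 times instead of n times.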

We compare two variants of the matrix-by-vector multiplication $\underline{v} = A_{n\times n} \underline{x}$ with a tridiagonal matrix
\begin{displaymath}
A = \begin{pmatrix}
b_1 & c_1 & & & \\
a_1 & b_2 & c_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & a_{n-2} & b_{n-1} & c_{n-1} \\
 & & & a_{n-1} & b_n
\end{pmatrix}
\end{displaymath}
on a vector unit.
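Written out row by row, the product only involves the three diagonals, and the first and last rows each lack one term:
\begin{displaymath}
v_1 = b_1 x_1 + c_1 x_2, \qquad
v_i = a_{i-1} x_{i-1} + b_i x_i + c_i x_{i+1} \quad (i = 2,\dots,n-1), \qquad
v_n = a_{n-1} x_{n-1} + b_n x_n .
\end{displaymath}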

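Code a) The matrix is stored in terms of its diagonal and its two off-diagonals, namely the vectors $\underline{a} = (a_1,\dots,a_{n-1})^T$, $\underline{b} = (b_1,\dots,b_n)^T$ and $\underline{c} = (c_1,\dots,c_{n-1})^T$:

      v(1) = b(1)*x(1) + c(1)*x(2)
      v(n) = a(n-1)*x(n-1) + b(n)*x(n)
      DO i = 2, n-1
         v(i) = a(i-1)*x(i-1) + b(i)*x(i) + c(i)*x(i+1)
      END DO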
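A C transcription of code a) can be handy for experiments. This sketch assumes 0-based arrays a[0..n-2], b[0..n-1], c[0..n-2] holding $a_1,\dots,a_{n-1}$, $b_1,\dots,b_n$, $c_1,\dots,c_{n-1}$; the function name tridiag_matvec_a is hypothetical.

```c
#include <assert.h>

/* Code a): first and last rows as separate scalar statements,
   interior rows in one loop. Row r (0-based) uses a[r-1], b[r], c[r]. */
void tridiag_matvec_a(int n, const double *a, const double *b, const double *c,
                      const double *x, double *v)
{
    assert(n >= 2);
    v[0]     = b[0] * x[0] + c[0] * x[1];                 /* first row */
    v[n - 1] = a[n - 2] * x[n - 2] + b[n - 1] * x[n - 1]; /* last row */
    for (int i = 1; i < n - 1; ++i)                       /* interior rows */
        v[i] = a[i - 1] * x[i - 1] + b[i] * x[i] + c[i] * x[i + 1];
}
```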

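Code b) In comparison to code a), we enlarge the vectors by zero entries, $a_0 := 0$, $c_n := 0$ and $x_0 := x_{n+1} := 0$ (i.e., $\underline{a}$ now has the indices $0,\dots,n-1$, $\underline{c}$ the indices $1,\dots,n$ and $\underline{x}$ the indices $0,\dots,n+1$), so that the first and the last row are computed in the same loop as the interior rows:

      DO i = 1, n
         v(i) = a(i-1)*x(i-1) + b(i)*x(i) + c(i)*x(i+1)
      END DO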
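A C transcription of the padded variant, as a sketch assuming the enlarged vectors are stored 0-based as ap[0..n-1] with ap[0] = 0 and ap[k] = $a_k$, cp[0..n-1] with cp[k] = $c_{k+1}$ and cp[n-1] = 0, and xp[0..n+1] with xp[k] = $x_k$ and xp[0] = xp[n+1] = 0. The names ap, cp, xp and tridiag_matvec_b are hypothetical.

```c
#include <assert.h>

/* Code b): thanks to the zero padding, every row uses the interior formula,
   so the whole product is a single loop without special cases. */
void tridiag_matvec_b(int n, const double *ap, const double *b, const double *cp,
                      const double *xp, double *v)
{
    assert(n >= 1 && ap[0] == 0.0 && cp[n - 1] == 0.0); /* padding present */
    for (int i = 0; i < n; ++i)
        v[i] = ap[i] * xp[i] + b[i] * xp[i + 1] + cp[i] * xp[i + 2];
}
```

The price is that the padded arrays must be allocated one or two entries longer than in code a).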

The first two statements of code a) handle the boundary rows outside the loop and have to be performed sequentially as scalar operations; on a vector unit this results in a significant slowdown. As a consequence, code b) is faster, although slightly more operations have to be performed.
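Counting each multiplication and each addition as one operation, and assuming code b) pads the vectors with zeros so that all $n$ rows use the interior formula, the extra work can be quantified:
\begin{displaymath}
\text{code a): } 2\cdot 3 + (n-2)\cdot 5 = 5n-4, \qquad \text{code b): } n\cdot 5 = 5n .
\end{displaymath}
Code b) thus performs only 4 additional operations, but all of them inside the vectorizable loop.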

Subsections

4.1.2.1 Parallel machines with distributed memory


Gundolf Haase 2000-03-20