
4.1.2 Matrix-by-Vector operations (BLAS2)

The libraries BLAS2 and BLAS3 handle full matrices and dense band/profile matrices. Here, we investigate mainly the first one.

Storage schemes for full matrices $ A$:
  i) row storage [C, Pascal]
  ii) column storage [F77]
  iii) $ A  =  A^T$ : row storage of the upper triangular submatrix $ A_U$
  iv) $ A  =  A^T$ : column storage of the lower triangular submatrix $ A_L$
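The four storage schemes can be made concrete as index mappings into a one-dimensional array. The following is a minimal sketch, assuming 0-based indices and tightly packed storage; the function names (`idx_row`, `idx_col`, `idx_upper_row`, `idx_lower_col`) are hypothetical and not part of BLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Row storage (C, Pascal): entry A(i,j) of an n x n matrix, 0-based. */
size_t idx_row(size_t n, size_t i, size_t j) { return i * n + j; }

/* Column storage (F77): entry A(i,j) of an n x n matrix, 0-based. */
size_t idx_col(size_t n, size_t i, size_t j) { return j * n + i; }

/* Packed row storage of the upper triangle A_U (requires i <= j).
   Rows 0..i-1 hold n, n-1, ..., n-i+1 entries, i.e. i*(2n-i+1)/2 in total. */
size_t idx_upper_row(size_t n, size_t i, size_t j)
{
    assert(i <= j && j < n);
    return i * (2 * n - i + 1) / 2 + (j - i);
}

/* Packed column storage of the lower triangle A_L (requires i >= j).
   For A = A^T this is the same array layout as idx_upper_row
   with the roles of i and j exchanged. */
size_t idx_lower_col(size_t n, size_t i, size_t j)
{
    assert(j <= i && i < n);
    return idx_upper_row(n, j, i);
}
```

For a symmetric matrix both packed schemes need only $ n(n+1)/2$ entries instead of $ n^2$, and they produce the same array, because the $ i$-th row of $ A_U$ equals the $ i$-th column of $ A_L$.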



Exercise 12:
Rewrite the operation $ \underline{v}  =  A_{n\times n} \underline{x}$ by means of BLAS routines (DDOT, DAXPY) for storage schemes i)-iii).
$ \ast$ Scheme ii) without BLAS but with loop unrolling (stride 2).


We compare two variants of the matrix-by-vector multiplication $ \underline{v}  =  A_{n\times n} \underline{x}$ with a tridiagonal matrix \begin{displaymath}
A  =
\begin{pmatrix}
b_1 & c_1 & & & \\
a_1 & b_2 & c_2 & & \\
& \ddots & \ddots & \ddots & \\
& & a_{n-2} & b_{n-1} & c_{n-1} \\
& & & a_{n-1} & b_n
\end{pmatrix}
\end{displaymath}
on a vector unit.

Code a) The matrix is stored by its main diagonal and its two off-diagonals, namely the vectors $ a(1:n-1)$, $ b(1:n)$, $ c(1:n-1)$.

    
$ v_1 \;=\; b_1 \ast x_1 + c_1 \ast x_2 $ 

$ v_n \;=\; b_n \ast x_n + a_{n-1} \ast x_{n-1} $
DO $ i = 2, n-1$
    $ v_i \;=\; a_{i-1} \ast x_{i-1} + b_i \ast x_i + c_i \ast x_{i+1} $
END DO
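
Code a) can be sketched in C as follows. This is an illustration under stated assumptions, not a tuned implementation: the arrays are 0-based, so $ a_i$, $ b_i$, $ c_i$, $ x_i$ are stored at index $ i-1$, and the function name `tridiag_matvec_a` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* v = A*x for a tridiagonal A stored as three vectors (0-based):
   a[0..n-2] subdiagonal, b[0..n-1] main diagonal, c[0..n-2] superdiagonal. */
void tridiag_matvec_a(size_t n, const double *a, const double *b,
                      const double *c, const double *x, double *v)
{
    assert(n >= 2);

    /* First and last row are handled separately as scalar statements. */
    v[0]     = b[0] * x[0] + c[0] * x[1];
    v[n - 1] = a[n - 2] * x[n - 2] + b[n - 1] * x[n - 1];

    /* Interior rows 2..n-1 (1-based) form the vectorizable loop. */
    for (size_t i = 1; i + 1 < n; ++i)
        v[i] = a[i - 1] * x[i - 1] + b[i] * x[i] + c[i] * x[i + 1];
}
```

The two statements before the loop correspond to the separate treatment of $ v_1$ and $ v_n$ in the pseudocode.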
Code b) In contrast to code a), the vectors $ a$, $ c$, $ x$ are enlarged by additional zero entries:
$ a(0:n-1)$, $ b(1:n)$, $ c(1:n)$, $ x(0:n+1)$.

    
$ x_0 \;=\; x_{n+1} \;=\;0 $

$ a_0 \;=\; c_n \;=\;0 $
DO $ i  =  1,  n $
    $ v_i \;=\; a_{i-1} \ast x_{i-1} + b_i \ast x_i + c_i \ast x_{i+1} $
END DO
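
A corresponding C sketch of code b) is given below. It assumes the caller has already padded the arrays as described: `a[0..n-1]` holds $ a_0,\dots,a_{n-1}$ with $ a_0 = 0$, `c[0..n-1]` holds $ c_1,\dots,c_n$ with $ c_n = 0$, and `xp[0..n+1]` holds $ x_0,\dots,x_{n+1}$ with $ x_0 = x_{n+1} = 0$. The names `tridiag_matvec_b` and `xp` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* v = A*x for a tridiagonal A with zero-padded vectors:
   a[0..n-1]  = a_0..a_{n-1},  a[0]   = 0
   b[0..n-1]  = b_1..b_n
   c[0..n-1]  = c_1..c_n,      c[n-1] = 0
   xp[0..n+1] = x_0..x_{n+1},  xp[0]  = xp[n+1] = 0
   v[0..n-1]  = v_1..v_n */
void tridiag_matvec_b(size_t n, const double *a, const double *b,
                      const double *c, const double *xp, double *v)
{
    assert(n >= 1);
    /* Check the padding that makes the boundary rows fit the loop. */
    assert(a[0] == 0.0 && c[n - 1] == 0.0);
    assert(xp[0] == 0.0 && xp[n + 1] == 0.0);

    /* One uniform loop over all rows i = 1..n, no special cases. */
    for (size_t i = 1; i <= n; ++i)
        v[i - 1] = a[i - 1] * xp[i - 1] + b[i - 1] * xp[i] + c[i - 1] * xp[i + 1];
}
```

Every row is computed by the same loop body, at the cost of four extra multiplications and additions with zero operands.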
The first two statements in code a) are scalar operations outside the loop and have to be performed sequentially, which results in a significant slowdown on a vector unit. As a consequence, code b) is faster, although it performs slightly more arithmetic operations.

Gundolf Haase 2000-03-20