Next: 4.1.3.2 Algorithms for
Up: 4.1.3 Matrix-by-Matrix-operations (BLAS3)
Previous: 4.1.3 Matrix-by-Matrix-operations (BLAS3)
  Contents
6 different algorithms can be realized
with respect to
- Rowise or columnwise access on matrix elements
- Available basic routines from BLAS1 like inner product or triade.
Concrete efficient implementation strongly
depends on hardware and compiler chosen :
- Memory bank conflicts :
The global memory of a computer consists of several (1,2,4,8,...)
memory banks.
A memory bank will be locked for a certain time
(named bank cycle time) after a load/store of an address.
Any further access on that memory bank has to wait until it will be
unlocked again.
Let us regard the loading of a vector (matrix column/row)
where the loading of one vector element takes one cycle
and the bank cycle time is about 4 cycles.
We investigate two different storage schemes for that vector
Scheme a)
The number of memory banks and the bank cycle time
determine those strides which cause conflicts.
Scheme b)
Here, the appearance of conflicts will be determined by the
length of subvectors stored on the memory banks.
Thus, accessing the vector with stride 1,2,3 causes conflicts in the
example.
Memory access conflicts lower significantly the performance already on
processors with cache.
This applies in particular to vector units.
- Page fault :
A memory page from which an address should be loaded
is not available in the fastest memory (level-1/level-2 cache),
i.e., that page has to be transfered from the slower memory.
Nowadays, processors try to omit page faults by loading in advance those
pages following an actual field of addresses.
The hit rate provided by several profiling tools measures the
rate of accesses on fast memory without page faults.
- Small amount of communication (no fine grain communication)
on distributed memory machines.
- Length of pipes in a vector unit.
Next: 4.1.3.2 Algorithms for
Up: 4.1.3 Matrix-by-Matrix-operations (BLAS3)
Previous: 4.1.3 Matrix-by-Matrix-operations (BLAS3)
  Contents
Gundolf Haase
2000-03-20