Vectorized/cyclic reduction:
For simplicity, assume that the vector length is a power of two;
otherwise the notation would be
more complicated.
DO
END DO
DO
  DO
  END DO
END DO
Cyclic reduction is also known as the
cascadic algorithm or as recursive doubling,
and it is often used in parallelization.
The vector length handled in the inner loop shrinks as the
index of the outer loop increases, but
each pass still uses the vector unit at least partially.
The correspondence between cyclic reduction and a
REDUCE operation on a binary tree (Sect. 3.3.4)
can be seen in Fig. 4.1.
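
The two-level loop structure above can be illustrated with a minimal Python sketch. It is not the original listing: the operation (a plain sum), the function name `cyclic_reduction_sum`, and the pairing of neighbouring entries are assumptions made for illustration, and the vector length is assumed to be a power of two. The list comprehension stands in for the vectorizable inner loop, and `passes` records how the vector length shrinks from one outer pass to the next.

```python
def cyclic_reduction_sum(values):
    """Sum a sequence by cyclic (pairwise) reduction.

    Illustrative sketch only; assumes len(values) is a power of two.
    Returns the sum and the vector length used in each outer pass.
    """
    n = len(values)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a power of two")

    s = list(values)   # work on a copy, leave the input untouched
    passes = []        # vector length of each inner pass

    # Outer loop: log2(n) passes, each halving the active vector length.
    while n > 1:
        n //= 2
        # Inner loop: n independent additions (the vectorizable part).
        # The right-hand side is built completely before the assignment.
        s[:n] = [s[2 * i] + s[2 * i + 1] for i in range(n)]
        passes.append(n)

    return s[0], passes


total, passes = cyclic_reduction_sum([1, 2, 3, 4, 5, 6, 7, 8])
print(total, passes)   # 36 [4, 2, 1]
```

For eight entries the inner loop runs with vector lengths 4, 2 and 1, which are the level sizes of a binary reduction tree read from the leaves towards the root.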