next up previous contents
Next: 3.2 Synchronization of parallel Up: 3. Parallelism in algorithms Previous: 3. Parallelism in algorithms   Contents


3.1 Some definitions

Scalability :
Simple (or, better, automatic) adaptation of a program to a given number of processors.

Scalability is nowadays valued more highly than maximum efficiency (i.e., the gain in computing time through parallelism) on one particular architecture/topology.

Granularity :
Size of the program sections that can be executed without communication with other processors.

Fine grain algorithm : $ \longrightarrow$ systolic arrays [SIMD]

Ex.: $ \omega $-Jacobi iteration

Laplace problem, discretized by means of a 5-point stencil.

$\displaystyle d_{i,j} \;:=\; c_{i,j} + \frac{\omega}{4}
\left({ f_{i,j} -
\left[ - c_{i-1,j} - c_{i,j-1} + 4 c_{i,j} - c_{i+1,j} - c_{i,j+1} \right]
}\right)$

$\displaystyle \underline{c} \;:=\; \underline{d}$

[Figure: $ \omega $-Jacobi update pattern of the 5-point stencil on the $ (i,j)$ grid]

Here, no data dependencies occur within one iteration.
$ \Longrightarrow$ easy to parallelize (and vectorize).
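This independence can be sketched in Python (a minimal sketch with plain lists; the grid size, right-hand side, and the helper name `jacobi_sweep` are assumptions of this illustration, not part of the lecture notes):

```python
def jacobi_sweep(c, f, omega):
    """One omega-Jacobi sweep for the 5-point Laplace stencil.

    Every d[i][j] depends only on the OLD iterate c, so all (i, j)
    updates are independent and may be computed in parallel."""
    n = len(c)
    d = [row[:] for row in c]          # new iterate, boundary copied
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            residual = f[i][j] - (4 * c[i][j]
                                  - c[i - 1][j] - c[i + 1][j]
                                  - c[i][j - 1] - c[i][j + 1])
            d[i][j] = c[i][j] + omega * residual / 4
    return d                           # corresponds to the copy c := d

# one interior point: c = 0, f = 4, omega = 1  ->  new value 1
c = [[0.0] * 3 for _ in range(3)]
f = [[0.0] * 3 for _ in range(3)]
f[1][1] = 4.0
d = jacobi_sweep(c, f, 1.0)
```

The final copy `return d` plays the role of the step $ \underline{c} := \underline{d}$ above.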
Ex.: $ \omega $-Gauß-Seidel iteration (forward)

Laplace problem, discretized by means of a 5-point stencil.

$\displaystyle \widehat{c}_{i,j} \;:=\; c_{i,j} + \frac{\omega}{4}
\left({ f_{i,j} -
\left[ - \widehat{c}_{i-1,j} - \widehat{c}_{i,j-1} + 4 c_{i,j} - c_{i+1,j} - c_{i,j+1} \right]
}\right)$

Figure 3.1: Gauß-Seidel on a systolic array
[Figure: the grid points are processed along numbered anti-diagonals (wavefronts)]

The values $ \widehat{c}_{i-1,j}$ and $ \widehat{c}_{i,j-1}$ have to be calculated before $ \widehat{c}_{i,j}$ can be computed.

1 $ ^{\text{st}}$ iteration :
Reduction of run time :

$\displaystyle \frac{t_{\text{parallel}}}{t_{\text{sequential}}} \;=\;
\frac{m+n-1}{mn} \;\approx\; \frac{2\sqrt{mn}}{mn} \;=\; \frac{2}{\sqrt{mn}} \;=\;
{\cal O}\!\left(\frac{1}{\sqrt{P}}\right)$

(using $ m+n-1 \approx 2\sqrt{mn}$ for $ m \approx n$ and $ P = mn$ processors).
An optimal ratio would be $ \frac{1}{P} \;=\; \frac{1}{mn}$ !

2 $ ^{\text{nd}}$ iteration : This parallelization is asymptotically optimal, i.e., for a large number of iterations it achieves the best possible time gain.
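The $ m+n-1$ count can be illustrated by a small scheduling sketch (pure Python; the helper name `wavefront_schedule` is chosen for this illustration): all points on the same anti-diagonal $ i+j = \mathrm{const}$ are mutually independent, so one diagonal is processed per parallel step.

```python
def wavefront_schedule(m, n):
    """Group the grid points (i, j), 1 <= i <= m, 1 <= j <= n, into
    anti-diagonals i + j = s; the number of diagonals equals the
    number of sequential parallel steps of one Gauss-Seidel sweep."""
    steps = []
    for s in range(2, m + n + 1):       # i + j runs over 2 .. m+n
        diagonal = [(i, s - i)
                    for i in range(max(1, s - n), min(m, s - 1) + 1)]
        steps.append(diagonal)
    return steps

steps = wavefront_schedule(4, 3)        # a 4 x 3 grid as an example
```

Here `len(steps)` is $ m+n-1 = 6$, while the total work is $ mn = 12$ point updates, which reproduces the ratio $ (m+n-1)/(mn)$ above.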


Coarse grain algorithm : $ \longrightarrow$ multiprocessor machines [MIMD],
e.g., the block variants of the $ \omega $-Jacobi or Gauß-Seidel iteration

Remark : Communication between processes takes considerably more time than arithmetic operations or accesses to local memory. Therefore, coarse grain algorithms are of great advantage in the parallelization of algorithms.

Functional parallelism :
Splitting of an algorithm into tasks that can be executed in parallel and perform different operations on the data.

Ex.: $ E \;=\; (d \ast e \ast f + g + c) \ast b + a + h $




Exercise 8:
Distribute the operations above onto 3 processors.
Hint : Use arithmetic transformation rules (result: 4 clock cycles).
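As an illustration of functional parallelism (deliberately not the exercise's optimal 4-clock schedule), the independent subexpressions of $ E$ can be handed to different workers; the function name, the use of Python threads, and the worker count are assumptions of this sketch:

```python
from concurrent.futures import ThreadPoolExecutor

def eval_E_parallel(a, b, c, d, e, f, g, h):
    """Evaluate E = (d*e*f + g + c)*b + a + h with three workers.

    The three subexpressions d*e*f, g + c, and a + h involve disjoint
    data and no mutual dependencies, so they may run concurrently;
    only the final combination is sequential."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        t1 = pool.submit(lambda: d * e * f)   # worker 1
        t2 = pool.submit(lambda: g + c)       # worker 2
        t3 = pool.submit(lambda: a + h)       # worker 3
        return (t1.result() + t2.result()) * b + t3.result()

E = eval_E_parallel(1, 2, 3, 4, 5, 6, 7, 8)
```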
Data parallelism :
One part (or similar parts) of a program is executed in parallel on different sets of data.


Ex.: Block-$ \omega $-Jacobi



Special case : Geometric parallelism.

The set of data is split into subsets according to geometric relations between the objects of interest (nodes of the discretization, particles, ...). In the case of an FEM, FDM, or FVM discretization, these relations are represented by the matrix graph.
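A minimal sketch of such a geometric split (a 1-D chain of discretization nodes divided into contiguous blocks, one per processor; the helper name `split_nodes` is chosen for this illustration):

```python
def split_nodes(n_nodes, n_procs):
    """Distribute node indices 0 .. n_nodes-1 into contiguous blocks.

    Geometrically neighbouring nodes stay on the same processor, so
    only the nodes at block boundaries require communication with a
    neighbouring processor."""
    base, rest = divmod(n_nodes, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < rest else 0)   # spread the remainder
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks

blocks = split_nodes(10, 3)    # 10 nodes onto 3 processors
```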


Gundolf Haase 2000-03-20