next up previous contents
Next: 3.4.3 Communication expenditure Up: 3.4 Performance evaluation of Previous: 3.4.1 Speedup and Scaleup   Contents


3.4.2 Efficiency

Parallel efficiency:
Here we present only the formula for the scaled parallel efficiency; the classical efficiency can be derived analogously.

\begin{equation}
\boxed{ E_{C,par}\;=\; \frac{S_C(P)}{P} } \end{equation}



A desirable efficiency would be 100%, i.e., the use of $ P$ processors accelerates the code by a factor of $ P$. For the scaleup, the parallel efficiency can be given explicitly:

$\displaystyle E_{C,par} \;=\; \frac{S_C(P)}{P} \;=\; \frac{s_1+P(1-s_1)}{P}\;=\;
1-s_1+\frac{s_1}{P} > 1- s_1
$

$ \Longrightarrow$ The efficiency decreases slightly as the number of processors increases.
$ \Longrightarrow$ The efficiency is nevertheless always at least as high as the parallel fraction $ 1-s_1$ of the program.

$ \Longrightarrow$ Continuing the lecture makes sense.
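As a quick numerical illustration, the following Python sketch evaluates the scaled speedup $S_C(P)=s_1+P(1-s_1)$ and the resulting efficiency $E_{C,par}=1-s_1+s_1/P$; the serial fraction $s_1 = 0.05$ is purely an illustrative assumption, not a value from the text.

```python
# Scaled (Gustafson-type) speedup and parallel efficiency for a given
# serial fraction s1; the symbols follow the formulas in this section.

def scaled_speedup(s1: float, p: int) -> float:
    """S_C(P) = s1 + P * (1 - s1)."""
    return s1 + p * (1.0 - s1)

def parallel_efficiency(s1: float, p: int) -> float:
    """E_Cpar = S_C(P) / P = 1 - s1 + s1/P."""
    return scaled_speedup(s1, p) / p

if __name__ == "__main__":
    s1 = 0.05  # assumed 5% serial fraction, for illustration only
    for p in (1, 4, 16, 64):
        # Efficiency drops slightly with P but never below 1 - s1 = 0.95.
        print(f"P={p:3d}  E_Cpar={parallel_efficiency(s1, p):.4f}")
```

Note how the efficiency approaches, but never falls below, the parallel fraction $1-s_1$ as $P$ grows, exactly as the inequality above states.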


Numerical efficiency:
Compares the fastest sequential algorithm with the fastest parallel algorithm (run on one processor) via the relationship

$\displaystyle \boxed{ E_{num}\;=\; \frac{t_{serial}}{t_{parallel}} }.$ (3.4)



From this, the total scaled efficiency of a parallel algorithm follows as

$\displaystyle \boxed{ E\;=\; E_{C,par} \cdot E_{num} }$ (3.5)
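The combination of the two factors can be sketched as follows. The timings and the serial fraction are illustrative assumptions; the numerical efficiency is taken as the ratio of the fastest sequential time to the parallel algorithm's one-processor time, so that $E_{num}\le 1$ and the total efficiency cannot exceed the parallel efficiency.

```python
# Total scaled efficiency E = E_Cpar * E_num, combining the two
# formulas of this section. All numbers are illustrative, not measured.

def numerical_efficiency(t_serial: float, t_parallel: float) -> float:
    """E_num: fastest sequential algorithm vs. parallel algorithm on 1 proc."""
    return t_serial / t_parallel

def parallel_efficiency(s1: float, p: int) -> float:
    """Scaled parallel efficiency E_Cpar = 1 - s1 + s1/P."""
    return 1.0 - s1 + s1 / p

if __name__ == "__main__":
    t_serial = 10.0    # assumed time of the best sequential algorithm [s]
    t_parallel = 12.5  # assumed time of the parallel algorithm on 1 proc [s]
    e_num = numerical_efficiency(t_serial, t_parallel)
    e_par = parallel_efficiency(s1=0.05, p=16)
    print(f"E_num={e_num:.3f}  E_Cpar={e_par:.4f}  E={e_num * e_par:.4f}")
```

Since the parallel algorithm is typically slower than the best sequential one when run on a single processor, $E_{num}<1$ and the total efficiency $E$ is strictly smaller than $E_{C,par}$.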

The above statements on scaled speedup and efficiency are very optimistic.

But:
Formula (3.4) assumes a uniform distribution of the total task over all processors. In practice this often cannot be achieved, or it is not maintained during the computation. Furthermore, formula (3.4) does not account for any losses due to communication.
$ \Longrightarrow$ The attained efficiency is lower than the theoretical one.


Load balancing:
Tries to distribute the workload evenly across all processors.
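A minimal sketch of static load balancing, assuming the work consists of $n$ equal-cost items to be split over $p$ processors: a block partition in which the per-processor counts differ by at most one. This illustrates the "uniform distribution" assumption discussed above; it is an example construction, not an algorithm from the text.

```python
# Static block partition of n equal-cost work items over p processors,
# so that the per-processor shares differ in size by at most one item.

def block_partition(n: int, p: int) -> list:
    """Return p contiguous index ranges covering 0..n-1 as evenly as possible."""
    base, rest = divmod(n, p)
    ranges, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < rest else 0)  # first `rest` ranks get one extra
        ranges.append(range(start, start + size))
        start += size
    return ranges

if __name__ == "__main__":
    parts = block_partition(10, 4)
    print([len(r) for r in parts])  # sizes differ by at most one
```

For items of unequal or unpredictable cost, such a static split is no longer sufficient and dynamic strategies are needed, which motivates the possibilities listed next.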

Possibilities for achieving a load balancing close to the optimum:
Gundolf Haase 2000-03-20