Frequently used routines for vector and matrix operations in standard numerical software are collected in the Basic Linear Algebra Subprograms (BLAS) library. Highly optimized versions of these routines are available for nearly all hardware platforms and processors. They are often written in assembler so that they can exploit the particular cache structure and the inherent parallelism of the processor.
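One common way to exploit a cache in a matrix product is loop blocking (tiling). The sketch below assumes this technique to illustrate the idea; it is not taken from any particular BLAS implementation. It is written in plain Python, where it brings no speed-up. It only shows the loop structure that tuned libraries implement in compiled code or assembler. The function names and the block size `bs` are hypothetical. Real implementations choose block sizes to fit the cache sizes of the target processor.

```python
def matmul_naive(A, B):
    """C = A*B with the textbook loop nest (A is n x k, B is k x m, lists of rows)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a_ip = A[i][p]
            for j in range(m):
                C[i][j] += a_ip * B[p][j]
    return C


def matmul_blocked(A, B, bs=32):
    """Same product, but the loops walk over bs x bs tiles so that a tile of B
    can be reused for several rows of A while it is still in cache.
    bs is a hypothetical tuning parameter, not a value from any library."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, bs):
        for pp in range(0, k, bs):
            for jj in range(0, m, bs):
                # multiply tile A[ii:ii+bs][pp:pp+bs] by tile B[pp:pp+bs][jj:jj+bs]
                for i in range(ii, min(ii + bs, n)):
                    C_i = C[i]
                    for p in range(pp, min(pp + bs, k)):
                        a_ip = A[i][p]
                        B_p = B[p]
                        for j in range(jj, min(jj + bs, m)):
                            C_i[j] += a_ip * B_p[j]
    return C
```

For every entry of `C` both functions add the same products in the same order. Only the traversal order of the loops differs, which is exactly the kind of change that cache-aware implementations make.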
Nowadays the library is organized into three levels (vector-vector,
matrix-vector and matrix-matrix operations), each of which supports
single- and double-precision real as well as complex data arrays.
In the following we give an overview of the three levels and
investigate some points of interest concerning vectorization and
parallelization.
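As an orientation, the typical operation of each level can be written as plain reference loops. This is a minimal, unoptimized sketch. The function names `axpy`, `gemv` and `gemm` follow the BLAS routine names (for example DAXPY, DGEMV and DGEMM in double precision). The signatures here are an assumed simplification, without the increment, leading-dimension and transpose arguments of the real interface.

```python
def axpy(alpha, x, y):
    """Level 1 (vector-vector): y <- alpha*x + y.  O(n) flops on O(n) data."""
    for i in range(len(x)):
        y[i] += alpha * x[i]
    return y


def gemv(alpha, A, x, beta, y):
    """Level 2 (matrix-vector): y <- alpha*A*x + beta*y.  O(n^2) flops on O(n^2) data."""
    for i in range(len(A)):
        s = 0.0
        for j in range(len(x)):
            s += A[i][j] * x[j]
        y[i] = alpha * s + beta * y[i]
    return y


def gemm(alpha, A, B, beta, C):
    """Level 3 (matrix-matrix): C <- alpha*A*B + beta*C.  O(n^3) flops on O(n^2) data."""
    k, m = len(B), len(B[0])
    for i in range(len(A)):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += A[i][p] * B[p][j]
            C[i][j] = alpha * s + beta * C[i][j]
    return C


# Small usage example
print(axpy(2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # [3.0, 5.0, 7.0]
```

The ratio of arithmetic to data noted in the docstrings is why Level 3 routines profit most from cache-aware coding. In a matrix-matrix product each element is reused about n times, while Level 1 and Level 2 routines use each element only a constant number of times.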