> sudo apt-get install libatlas-dev libatlas-base-dev
Profiling:
> g++ -g -c *.cpp
> g++ *.o -o main.GCC_
# [no '-pg' in link command]
> valgrind --tool=callgrind --simulate-cache=yes ./main.GCC_
> kcachegrind `ls -1tr callgrind.out.* |tail -1`
> icpc -g -c *.cpp
> icpc *.o -o main.ICC_
> amplxe-gui &
Note (gprof): -pg in compile flags as well as in link flags.

Original code for inner product:
double scalar(const int N, const double x[], const double y[])
{
    double sum = 0.0;
    for (int i=0; i<N; ++i)
    {
        sum += x[i]*y[i];
    }
    return sum;
}
int main()
{
...
double s = scalar(n,a,b);
...
}
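
The driver above is elided, so the following is only a minimal sketch of how the call to scalar() could be timed before and after an optimization. The helper time_scalar, the test data (all ones and all twos) and the use of std::chrono are assumptions for illustration and are not part of the original code.

```cpp
#include <chrono>
#include <vector>

// Inner product as in the original code above.
double scalar(const int N, const double x[], const double y[])
{
    double sum = 0.0;
    for (int i = 0; i < N; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Hypothetical helper (not in the original): returns the average wall-clock
// time in seconds per call of scalar() over 'reps' repetitions on vectors
// of length n. The mean of the returned inner products is stored in 'result'.
double time_scalar(const int n, const int reps, double &result)
{
    std::vector<double> a(n, 1.0), b(n, 2.0);   // assumed test data
    double s = 0.0;
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        s += scalar(n, a.data(), b.data());     // accumulate so the call is not optimized away
    }
    const auto t1 = std::chrono::steady_clock::now();
    result = s / reps;                          // each call yields 2*n for this data
    return std::chrono::duration<double>(t1 - t0).count() / reps;
}
```

Calling time_scalar(n, reps, s) from main() with a large n and comparing the returned time across the compiler invocations listed below gives a rough first measurement; the profilers above remain the better tool for locating hot spots.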
g++ -O2 skalar.cpp
icpc -O2 skalar.cpp
pgc++ -fast skalar.cpp
clang++ -O3 skalar.cpp
Performance improvements:
__restrict__, __builtin_prefetch,
> more /proc/cpuinfo
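
As a sketch of how the two keywords above might be applied to the inner product: __restrict__ and __builtin_prefetch are compiler extensions (available in g++ and clang++, not standard C++). The function name scalar_restrict and the prefetch distance of 64 elements are assumptions chosen for illustration; a suitable distance depends on the CPU and has to be found by measurement.

```cpp
// Inner product with __restrict__ pointers and explicit software prefetch.
// Compiler extensions: intended for g++ / clang++.
double scalar_restrict(const int N,
                       const double *__restrict__ x,
                       const double *__restrict__ y)
{
    // Assumed prefetch distance in elements (tuning parameter, not from the source).
    const int dist = 64;
    double sum = 0.0;
    for (int i = 0; i < N; ++i) {
        if (i + dist < N) {
            // Arguments: address, 0 = prefetch for read, 0 = low temporal locality.
            __builtin_prefetch(&x[i + dist], 0, 0);
            __builtin_prefetch(&y[i + dist], 0, 0);
        }
        sum += x[i] * y[i];
    }
    return sum;
}
```

Whether this is faster than the original has to be checked with the profilers above. For a purely streaming read such as this one, the hardware prefetcher often does the job already, and __restrict__ typically pays off more in kernels that also write to one of the arrays.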