Start Benchmarking B: Matrix-Vector Product (row-wise access)

 <A[17,.],x> = 8000


N = 8000         M = 8000
Time for Nloops (s): 11
Time per loop (s)  : 0.028
GFLOPS             : 4.3
GiB/s              : 34
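
For reference, here is a minimal C++ sketch of a row-wise matrix-vector kernel and of one way the two throughput figures could be derived from the time per loop. The function names are hypothetical and are not taken from the benchmark program. The sketch also rests on two assumptions, which are marked in the code as well. First, the counts are divided by 2^30 rather than 10^9. Second, the bandwidth figure counts 16 bytes per inner iteration: one double of A and one of x. With both assumptions, 0.028 s reproduces the logged 4.3 GFLOPS and 34 GiB/s.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: GFLOPS and GiB/s are computed with a divisor of 2^30, not 1e9.
constexpr double kGibi = 1073741824.0;

// Hypothetical kernel: y = A * x for an n x m matrix A stored row-major in a flat vector.
// Row-wise access: the inner loop walks along row i of A, so A is read contiguously.
std::vector<double> matvec_rowwise(const std::vector<double>& A,
                                   const std::vector<double>& x,
                                   std::size_t n, std::size_t m) {
    assert(A.size() == n * m && x.size() == m);
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        double s = 0.0;  // accumulates <A[i,.], x>
        for (std::size_t j = 0; j < m; ++j) {
            s += A[i * m + j] * x[j];
        }
        y[i] = s;
    }
    return y;
}

// One multiply and one add per matrix entry: 2*n*m flops per call.
double matvec_gflops(std::size_t n, std::size_t m, double seconds) {
    return 2.0 * double(n) * double(m) / seconds / kGibi;
}

// Assumption: 16 bytes per inner iteration (one double of A, one double of x).
double matvec_gib_per_s(std::size_t n, std::size_t m, double seconds) {
    return 16.0 * double(n) * double(m) / seconds / kGibi;
}
```

For example, matvec_gflops(8000, 8000, 0.028) evaluates to about 4.26, and matvec_gib_per_s(8000, 8000, 0.028) evaluates to about 34.06. If A and x are filled with ones, every entry of y equals M, which would match the printed <A[17,.],x> = 8000. The log does not say how A and x are initialized, so that fill is also a guess.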



Start Benchmarking B: Matrix-Vector Product (row-wise access) [parallel]

 <A[17,.],x> = 8e+03


N = 8000         M = 8000
Time for Nloops (s): 7.6
Time per loop (s)  : 0.019
GFLOPS             : 6.3
GiB/s              : 50
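
The parallel run cuts the time per loop from 0.028 s to 0.019 s, a speedup of about 1.5. The sketch below shows one way to parallelize the row loop with OpenMP. The function name is hypothetical, and the log does not record the thread count. The kernel performs only two flops for every two doubles it loads. One plausible reading, which this log does not prove, is that the kernel is limited by memory bandwidth. That would explain why the speedup is modest while the reported bandwidth rises from 34 to 50 GiB/s.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical OpenMP variant of the row-wise kernel (compile with -fopenmp).
// Each y[i] depends only on row i, so rows are split across threads with no
// shared writes. Without OpenMP the pragma is ignored and the loop runs serially.
std::vector<double> matvec_rowwise_par(const std::vector<double>& A,
                                       const std::vector<double>& x,
                                       std::size_t n, std::size_t m) {
    assert(A.size() == n * m && x.size() == m);
    std::vector<double> y(n, 0.0);
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < n; ++i) {
        double s = 0.0;  // private to the thread that handles row i
        for (std::size_t j = 0; j < m; ++j) {
            s += A[i * m + j] * x[j];
        }
        y[i] = s;
    }
    return y;
}
```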



Start Benchmarking C: Matrix-Matrix Product

 C[10,15] = 4e+03


N = 4000         M = 4000        L = 4000
Time for Nloops (s): 28
Time per loop (s)  : 28
GFLOPS             : 4.2
GiB/s              : 0.013
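
Below is a naive C++ sketch of the matrix-matrix product using the i-k-j loop order, together with its flop count. The function names, the storage layout, and the mapping of N, M, L to the dimensions are all assumptions. The product needs 2 * 4000^3, about 1.28 * 10^11, flops. With an assumed divisor of 2^30, 28 s gives about 4.26 GFLOPS. That is consistent with the logged 4.2, since the printed time is rounded. The small bandwidth figure matches counting each of the three 4000 x 4000 double matrices once: 3 * 4000^2 * 8 bytes is about 0.36 GiB, and over 28 s that is about 0.013 GiB/s. This byte count is inferred from the numbers and is not stated in the log.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: C = A * B with A (n x m), B (m x l), C (n x l), all row-major.
// The i-k-j loop order keeps the innermost loop running along rows of B and C,
// so both are accessed contiguously.
std::vector<double> matmat(const std::vector<double>& A,
                           const std::vector<double>& B,
                           std::size_t n, std::size_t m, std::size_t l) {
    assert(A.size() == n * m && B.size() == m * l);
    std::vector<double> C(n * l, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < m; ++k) {
            const double a = A[i * m + k];  // reused across the whole j loop
            for (std::size_t j = 0; j < l; ++j) {
                C[i * l + j] += a * B[k * l + j];
            }
        }
    }
    return C;
}

// n*m*l multiply-add pairs give 2*n*m*l flops; divisor 2^30 is an assumption.
double matmat_gflops(std::size_t n, std::size_t m, std::size_t l, double seconds) {
    return 2.0 * double(n) * double(m) * double(l) / seconds / 1073741824.0;
}
```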



Start Benchmarking C: Matrix-Matrix Product [parallel]

 C[10,15] = 4e+03


N = 4000         M = 4000        L = 4000
Time for Nloops (s): 8.2
Time per loop (s)  : 8.2
GFLOPS             : 15
GiB/s              : 0.044
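
Here the parallel run brings the time down from 28 s to 8.2 s, a speedup of about 3.4. The thread count is not recorded. Below is a hedged OpenMP sketch with a hypothetical function name. Parallelizing the outer row loop gives each thread its own rows of C, so no synchronization is needed. Unlike the matrix-vector product, this kernel reuses each loaded element many times. That is one plausible reason it scales better, although the log does not show this directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical OpenMP variant (compile with -fopenmp; otherwise the pragma is ignored).
// Parallelizing the outer i loop gives each thread its own rows of C,
// so no two threads ever write the same element and no locking is needed.
std::vector<double> matmat_par(const std::vector<double>& A,
                               const std::vector<double>& B,
                               std::size_t n, std::size_t m, std::size_t l) {
    assert(A.size() == n * m && B.size() == m * l);
    std::vector<double> C(n * l, 0.0);
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < m; ++k) {
            const double a = A[i * m + k];
            for (std::size_t j = 0; j < l; ++j) {
                C[i * l + j] += a * B[k * l + j];
            }
        }
    }
    return C;
}
```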



Start Benchmarking D: Polynomial Evaluation

 p(x[0]) = 1


p = 10000        N = 100000
Time for Nloops (s): 16
Time per loop (s)  : 1
GFLOPS             : 1.8
GiB/s              : 14
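
Polynomial evaluation is commonly done with Horner's scheme, but whether the benchmark uses it is an assumption. Below is a C++ sketch with a hypothetical function name. It also treats p as the polynomial degree, which is another assumption. Each point costs p multiplications and p additions, so p = 10000 and N = 100000 give 2 * 10^9 flops per run. Divided by 2^30 and a run time of about 1 s, that comes to about 1.86. This is in line with the logged 1.8 once the rounding of the printed time is taken into account.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: evaluate p(t) = a[0] + a[1] t + ... + a[deg] t^deg at every x[i]
// with Horner's scheme, p(t) = a[0] + t (a[1] + t (a[2] + ... + t a[deg])).
// Each point costs deg multiplies and deg adds.
std::vector<double> horner_all(const std::vector<double>& a,
                               const std::vector<double>& x) {
    assert(!a.empty());
    const std::size_t deg = a.size() - 1;
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        double r = a[deg];
        for (std::size_t k = deg; k-- > 0;) {  // k = deg-1, ..., 0
            r = r * x[i] + a[k];
        }
        y[i] = r;
    }
    return y;
}
```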



Start Benchmarking D: Polynomial Evaluation [parallel]

 p(x[0]) = 1


p = 10000        N = 100000
Time for Nloops (s): 16
Time per loop (s)  : 1.1
GFLOPS             : 1.8
GiB/s              : 14
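
The parallel run shows no gain: all loops take 16 s in both runs, and both report 1.8 GFLOPS. The log does not record why, and the thread count and compiler flags are not part of the output. For comparison, here is a sketch, with a hypothetical function name, of where the parallelism would have to go. Horner's recurrence is sequential within one point, so only the loop over the independent points can be split across threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical OpenMP variant (compile with -fopenmp; otherwise the pragma is ignored).
// The recurrence r = r*t + a[k] is sequential for one point, so the
// parallel loop runs over the independent points x[i] instead.
std::vector<double> horner_all_par(const std::vector<double>& a,
                                   const std::vector<double>& x) {
    assert(!a.empty());
    const std::size_t deg = a.size() - 1;
    const std::size_t n = x.size();
    std::vector<double> y(n);
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < n; ++i) {
        double r = a[deg];
        for (std::size_t k = deg; k-- > 0;) {  // k = deg-1, ..., 0
            r = r * x[i] + a[k];
        }
        y[i] = r;
    }
    return y;
}
```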



Comparing the runtime (in s) of sum and inner product, with and without parallelization

 k          N      sum  sum_par  inner_prod  inner_prod_par
 3       1000  6.1e-07  2.1e-06     1.4e-06         0.00042
 4      10000    3e-06  1.9e-06     3.1e-06         5.5e-06
 5     100000  2.9e-05  2.4e-05     6.2e-05         2.2e-05
 6    1000000  0.00056  0.00029       0.001          0.0024
 7   10000000   0.0045   0.0029      0.0082          0.0065
 8  100000000    0.044    0.032       0.077           0.063
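
In these timings, the parallel versions lose for small N. At N = 1000, inner_prod_par (0.00042 s) is about 300 times slower than inner_prod (1.4e-06 s). sum_par is faster than sum from N = 10^4 on. inner_prod_par is slower at N = 10^3, 10^4 and 10^6 and faster elsewhere. Even at N = 10^8 the gains are only about 1.4x for the sum and 1.2x for the inner product. These are single timings, so individual entries, such as the N = 10^6 one, may be noise. Below is a C++ sketch of the four kernels using OpenMP reductions and a std::chrono timer. Only the names sum, sum_par, inner_prod and inner_prod_par come from the table; the function bodies and seconds_for are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernels (compile with -fopenmp to enable the parallel versions;
// without it the pragmas are ignored and the loops run serially).
double sum(const std::vector<double>& x) {
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i];
    return s;
}

double sum_par(const std::vector<double>& x) {
    const std::size_t n = x.size();
    double s = 0.0;
    // Each thread sums into a private copy of s; OpenMP adds the copies at the end.
    #pragma omp parallel for reduction(+ : s)
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

double inner_prod(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

double inner_prod_par(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    double s = 0.0;
    #pragma omp parallel for reduction(+ : s)
    for (std::size_t i = 0; i < n; ++i) s += x[i] * y[i];
    return s;
}

// Wall-clock seconds for one call of f, measured with a monotonic clock.
template <class F>
double seconds_for(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

A reduction adds partial sums in a different order from the serial loop. For general floating-point data, sum_par and inner_prod_par can therefore differ from the serial results in the last bits.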