task8

2025-11-11 15:50:51 +01:00 · 2025-11-11 15:50:51 +01:00 · 3882aee07a
commit 3882aee07a
parent 3763c53dab
71 changed files with 160045 additions and 0 deletions
--- a/ex3/ex3_results.txt
+++ b/ex3/ex3_results.txt
@@ -0,0 +1,222 @@
+
+-------------- Task 1 --------------
+
+-------------------------------------------------------------
+STREAM version $Revision: 5.10 $
+-------------------------------------------------------------
+This system uses 8 bytes per array element.
+-------------------------------------------------------------
+Array size = 80000000 (elements), Offset = 0 (elements)
+Memory per array = 610.4 MiB (= 0.6 GiB).
+Total memory required = 1831.1 MiB (= 1.8 GiB).
+Each kernel will be executed 20 times.
+    The *best* time for each kernel (excluding the first iteration)
+    will be used to compute the reported bandwidth.
+-------------------------------------------------------------
+Your clock granularity/precision appears to be 1 microseconds.
+Each test below will take on the order of 116886 microseconds.
+    (= 116886 clock ticks)
+Increase the size of the arrays if this shows that
+you are not getting at least 20 clock ticks per test.
+-------------------------------------------------------------
+WARNING -- The above is only a rough guideline.
+For best results, please be sure you know the
+precision of your system timer.
+-------------------------------------------------------------
+Function    Best Rate MB/s  Avg time     Min time     Max time
+Copy:           29569.4     0.048585     0.043288     0.059164
+Scale:          17644.0     0.082248     0.072546     0.102548
+Add:            21030.1     0.100620     0.091298     0.124700
+Triad:          21230.7     0.100758     0.090435     0.120631
+-------------------------------------------------------------
+Solution Validates: avg error less than 1.000000e-13 on all three arrays
+-------------------------------------------------------------
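As a cross-check of the STREAM numbers above: STREAM reports the best rate as bytes moved divided by the minimum time. It counts 2 arrays for Copy and Scale and 3 arrays for Add and Triad. The following C++ sketch reproduces that arithmetic. The enum and function names are illustrative and are not taken from the STREAM source.

```cpp
#include <cassert>
#include <cstddef>

// The four STREAM kernels (a, b, c are double arrays, s is a scalar):
//   Copy:  c = a          1 read + 1 write  -> 2 arrays
//   Scale: b = s * c      1 read + 1 write  -> 2 arrays
//   Add:   c = a + b      2 reads + 1 write -> 3 arrays
//   Triad: a = b + s * c  2 reads + 1 write -> 3 arrays
enum class Kernel { Copy, Scale, Add, Triad };

constexpr int arrays_touched(Kernel k) {
    return (k == Kernel::Copy || k == Kernel::Scale) ? 2 : 3;
}

// Best rate in MB/s (1 MB = 1e6 bytes) from the minimum time over all runs.
double best_rate_mb_s(Kernel k, std::size_t n_elements, double min_time_s) {
    assert(min_time_s > 0.0);
    const double bytes = arrays_touched(k) * static_cast<double>(sizeof(double))
                       * static_cast<double>(n_elements);
    return 1.0e-6 * bytes / min_time_s;
}
```

Copy moves 2 * 8 * 80,000,000 = 1.28e9 bytes, and 1.28e9 B / 0.043288 s is about 29569 MB/s. Triad moves 1.92e9 bytes, and 1.92e9 B / 0.090435 s is about 21231 MB/s. Both values match the table.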
+./flops.exe
+
+    FLOPS C Program (Double Precision), V2.0 18 Dec 1992
+
+    Module     Error        RunTime      MFLOPS
+                            (usec)
+        1      4.0146e-13      0.0024   5827.9076
+        2     -1.4166e-13      0.0007  10037.8942
+        3      4.7184e-14      0.0039   4371.9185
+        4     -1.2557e-13      0.0034   4355.5711
+        5     -1.3800e-13      0.0066   4415.6439
+        6      3.2380e-13      0.0065   4441.6299
+        7     -8.4583e-11      0.0053   2277.1707
+        8      3.4867e-13      0.0069   4367.6094
+
+    Iterations      =  512000000
+    NullTime (usec) =     0.0000
+    MFLOPS(1)       =  7050.6178
+    MFLOPS(2)       =  3461.6233
+    MFLOPS(3)       =  4175.0442
+    MFLOPS(4)       =  4389.7311
+
+-------------- Task 2 --------------
+
+Memory needed (double 64-bit, 8 bytes):
+(A) (2N + 1) * 8 bytes
+(B) (M*N + M + N) * 8 bytes
+(C) (M*L + L*N + M*N) * 8 bytes
+(D) (N + N + p) * 8 bytes
+
+Floating point operations:
+(A) 2N
+(B) M * 2N
+(C) M * 2L * N
+(D) 2 * N * p (Horner scheme)
+
+Read/Write operations:
+(A) Read: 2N         Write: 1
+(B) Read: M*2N       Write: M*N
+(C) Read: M*2L*N     Write: M*L*N
+(D) Read: 2*N*p      Write: N*p
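The counts above can be turned into a small calculator. This sketch uses hypothetical names and 64-bit integer counts. It also derives the ratio of flops to memory footprint, which is an optimistic upper bound on arithmetic intensity because it assumes each value is loaded from memory only once.

```cpp
#include <cassert>
#include <cstdint>

// Memory footprint and flop count of one kernel (double = 8 bytes).
// Hypothetical helper type; formulas copied from the Task 2 answers.
struct KernelCost {
    std::uint64_t bytes;
    std::uint64_t flops;
};

// (A) (2N + 1) * 8 bytes, 2N flops
KernelCost cost_A(std::uint64_t N) { return {(2 * N + 1) * 8, 2 * N}; }

// (B) (M*N + M + N) * 8 bytes, M * 2N flops
KernelCost cost_B(std::uint64_t M, std::uint64_t N) {
    return {(M * N + M + N) * 8, M * 2 * N};
}

// (C) (M*L + L*N + M*N) * 8 bytes, M * 2L * N flops
KernelCost cost_C(std::uint64_t M, std::uint64_t L, std::uint64_t N) {
    return {(M * L + L * N + M * N) * 8, M * 2 * L * N};
}

// (D) (N + N + p) * 8 bytes, 2 * N * p flops
KernelCost cost_D(std::uint64_t N, std::uint64_t p) {
    return {(N + N + p) * 8, 2 * N * p};
}

// Flops per byte of footprint: small values mean the kernel is limited by
// memory bandwidth, large values leave room to be compute-bound.
double intensity(const KernelCost& c) {
    assert(c.bytes > 0);
    return static_cast<double>(c.flops) / static_cast<double>(c.bytes);
}
```

(A) stays near 1/8 flop per byte and (B) near 1/4, regardless of size. (C) grows with the matrix dimensions (about N/12 for square matrices), and (D) grows with p. As a hypothetical example, N = 50,000,000 gives cost_A = 800,000,008 bytes, about 0.745 GiB. That is the same figure Task 4 prints for Benchmark (A).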
+
+-------------- Task 3 --------------
+
+Functions implemented in task_3.cpp
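task_3.cpp is not part of this excerpt. The Task 2 memory and flop counts suggest that (A) to (D) are:

- the scalar product,
- the matrix-vector product,
- the matrix-matrix product,
- polynomial evaluation with Horner's scheme.

Under that assumption, the kernels could look roughly like this hypothetical C++ sketch, which uses row-major flat storage and illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// (A) scalar product sum_i x_i * y_i: N mults + N adds = 2N flops
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// (B) y = A x with A of size M x N (row-major): M * 2N flops
std::vector<double> matvec(const std::vector<double>& A, std::size_t M,
                           std::size_t N, const std::vector<double>& x) {
    assert(A.size() == M * N && x.size() == N);
    std::vector<double> y(M, 0.0);
    for (std::size_t i = 0; i < M; ++i) {
        double s = 0.0;
        for (std::size_t j = 0; j < N; ++j) s += A[i * N + j] * x[j];
        y[i] = s;
    }
    return y;
}

// (C) C = A B with A: M x L and B: L x N (row-major): M * 2L * N flops
std::vector<double> matmul(const std::vector<double>& A, const std::vector<double>& B,
                           std::size_t M, std::size_t L, std::size_t N) {
    assert(A.size() == M * L && B.size() == L * N);
    std::vector<double> C(M * N, 0.0);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < L; ++k) {
            const double a = A[i * L + k];
            for (std::size_t j = 0; j < N; ++j) C[i * N + j] += a * B[k * N + j];
        }
    return C;
}

// (D) p(x) = a[0] + a[1] x + ... + a[p-1] x^(p-1) at every point in x,
// via Horner's scheme: one multiply and one add per coefficient.
std::vector<double> horner(const std::vector<double>& a, const std::vector<double>& x) {
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        double s = 0.0;
        for (std::size_t k = a.size(); k-- > 0;) s = s * x[i] + a[k];
        y[i] = s;
    }
    return y;
}
```

The i-k-j loop order in `matmul` keeps the innermost accesses to `B` and `C` contiguous. This is a common choice, not necessarily the one used in task_3.cpp.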
+
+-------------- Task 4 --------------
+
+----- Benchmark (A) -----
+Memory allocated  : 0.745 GByte
+Duration per loop : 0.036 sec
+GFLOPS            : 2.579
+GiByte/s          : 20.630
+-------------------------
+----- Benchmark (B) -----
+Memory allocated  : 0.715 GByte
+Duration per loop : 0.105 sec
+GFLOPS            : 1.704
+GiByte/s          : 6.818
+-------------------------
+----- Benchmark (C) -----
+Memory allocated  : 0.026 GByte
+Duration per loop : 0.459 sec
+GFLOPS            : 4.062
+GiByte/s          : 0.057
+-------------------------
+----- Benchmark (D) -----
+Memory allocated  : 0.015 GByte
+Duration per loop : 0.310 sec
+GFLOPS            : 1.201
+GiByte/s          : 0.048
+-------------------------
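The reported rates follow from the Task 2 counts and the measured time per loop. Below is a minimal timing sketch for Benchmark (A), using `std::chrono::steady_clock`. It assumes GFLOPS means 1e9 flop/s and GiB/s means 2^30 bytes/s; the conventions of the original benchmark code are not shown here. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical result record for one benchmark.
struct BenchResult {
    double seconds_per_loop;
    double gflops;     // 1e9 flops per second
    double gib_per_s;  // 2^30 bytes per second
};

// Times `loops` repetitions of the scalar product on vectors of length n and
// derives the rates from 2n flops and (2n + 1) * 8 bytes per loop.
BenchResult bench_dot(std::size_t n, int loops) {
    assert(n > 0 && loops > 0);
    std::vector<double> x(n, 1.0), y(n, 2.0);
    volatile double sink = 0.0;  // stores the result so the loop is not removed
    const auto t0 = std::chrono::steady_clock::now();
    for (int k = 0; k < loops; ++k) {
        double s = 0.0;
        for (std::size_t i = 0; i < n; ++i) s += x[i] * y[i];
        sink = s;
    }
    const auto t1 = std::chrono::steady_clock::now();
    const double t = std::chrono::duration<double>(t1 - t0).count() / loops;
    const double flops = 2.0 * static_cast<double>(n);
    const double bytes = (2.0 * static_cast<double>(n) + 1.0) * 8.0;
    return {t, flops / t * 1e-9, bytes / t / (1024.0 * 1024.0 * 1024.0)};
}
```

As a sanity check on the output above, 0.745 GByte / 0.036 s is about 20.7. That is close to the reported 20.630 GiByte/s, and the small gap is consistent with the duration being printed rounded to milliseconds.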
+
+
+-------------- Task 5 --------------
+
+----- Benchmark norm -----
+||x|| = 897124.301552
+Memory allocated  : 0.373 GByte
+Duration per loop : 0.022 sec
+GFLOPS            : 4.222
+GiByte/s          : 16.890
+-------------------------
+What do you observe? Why?
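+-> The norm needs less time per loop than the scalar product of Task 4 (A) (0.022 s vs. 0.036 s) at the same vector length (0.373 vs. 0.745 GByte allocated), because it loads the elements of only one vector instead of two. Since both loops are limited mainly by memory bandwidth, halving the memory traffic for roughly the same number of flops raises the rate from 2.579 to 4.222 GFLOPS.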
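The observation can be quantified with the same kind of counting as in Task 2. In this small sketch (hypothetical names), the norm performs about 2 flops per element, like the scalar product. However, it streams 8 instead of 16 bytes per element, which doubles the flops per byte.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Euclidean norm ||x|| = sqrt(sum_i x_i^2): one load, one multiply and one
// add per element (about 2N flops), streaming a single vector.
double norm2(const std::vector<double>& x) {
    double s = 0.0;
    for (double xi : x) s += xi * xi;
    return std::sqrt(s);
}

// Flops per byte of memory traffic for a loop doing ~2 flops per element
// while streaming `vectors` arrays of doubles (8 bytes per element each).
double flops_per_byte(int vectors) {
    assert(vectors > 0);
    return 2.0 / (8.0 * vectors);
}
```

The norm gives flops_per_byte(1) = 0.25 and the scalar product gives flops_per_byte(2) = 0.125.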
+
+-------------- Task 6 --------------
+
+Benchmarks using cBLAS
+----- Benchmark (A) -----
+Memory allocated  : 0.745 GByte
+Duration per loop : 0.023 sec
+GFLOPS            : 4.006
+GiByte/s          : 32.052
+-------------------------
+----- Benchmark (B) -----
+Memory allocated  : 0.715 GByte
+Duration per loop : 0.026 sec
+GFLOPS            : 7.010
+GiByte/s          : 28.045
+-------------------------
+----- Benchmark (C) -----
+Memory allocated  : 0.026 GByte
+Duration per loop : 0.020 sec
+GFLOPS            : 91.320
+GiByte/s          : 1.278
+-------------------------
+
+
+-------------- Task 7 --------------
+
+A =
+4.000000 1.000000 0.250000 0.111111 0.062500 
+1.000000 4.000000 1.000000 0.250000 0.111111 
+0.250000 1.000000 4.000000 1.000000 0.250000 
+0.111111 0.250000 1.000000 4.000000 1.000000 
+0.062500 0.111111 0.250000 1.000000 4.000000 
+
+
+b =
+0.000000 1.000000 
+0.000000 1.000000 
+0.000000 1.000000 
+0.000000 1.000000 
+0.000000 1.000000 
+
+
+L + U =
+4.000000 1.000000 0.250000 0.111111 0.062500 
+0.250000 3.750000 0.937500 0.222222 0.095486 
+0.062500 0.250000 3.750000 0.937500 0.222222 
+0.027778 0.059259 0.250000 3.749370 0.937050 
+0.015625 0.025463 0.059259 0.249922 3.749234 
+
+
+x =
+0.000000 0.196259 
+0.000000 0.148391 
+0.000000 0.151272 
+0.000000 0.148391 
+0.000000 0.196259 
+
+
+Check solution:
+A * x = 
+0.000000 1.000000 
+0.000000 1.000000 
+0.000000 1.000000 
+0.000000 1.000000 
+0.000000 1.000000 
+
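The `L + U` matrix printed above stores both factors in one array. The strict lower triangle holds L with an implied unit diagonal, and the upper triangle holds U; for example, l21 = 1/4 = 0.25 and u22 = 4 - 0.25 * 1 = 3.75. The following is a minimal sketch of that layout: Doolittle LU without pivoting, then forward and backward substitution for each right-hand side. It is not necessarily how the exercise code computes the factorization, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // dense row-major n x n

// In-place Doolittle LU WITHOUT pivoting: afterwards the strict lower triangle
// holds L (unit diagonal implied) and the upper triangle holds U. Skipping
// pivoting is safe here because the test matrix is strictly diagonally
// dominant; general matrices need partial pivoting.
void lu_inplace(Matrix& a, std::size_t n) {
    assert(a.size() == n * n);
    for (std::size_t k = 0; k < n; ++k) {
        assert(a[k * n + k] != 0.0);
        for (std::size_t i = k + 1; i < n; ++i) {
            const double l = a[i * n + k] / a[k * n + k];
            a[i * n + k] = l;
            for (std::size_t j = k + 1; j < n; ++j) a[i * n + j] -= l * a[k * n + j];
        }
    }
}

// Solves L U X = B for nrhs right-hand sides stored column after column in b
// (b[r * n + i] is entry i of column r); b is overwritten with X.
void lu_solve(const Matrix& lu, std::size_t n, std::vector<double>& b, std::size_t nrhs) {
    assert(b.size() == n * nrhs);
    for (std::size_t r = 0; r < nrhs; ++r) {
        double* x = &b[r * n];
        for (std::size_t i = 0; i < n; ++i)  // forward: L y = b
            for (std::size_t j = 0; j < i; ++j) x[i] -= lu[i * n + j] * x[j];
        for (std::size_t i = n; i-- > 0;) {  // backward: U x = y
            for (std::size_t j = i + 1; j < n; ++j) x[i] -= lu[i * n + j] * x[j];
            x[i] /= lu[i * n + i];
        }
    }
}

// Test matrix as printed above: a_ii = 4, a_ij = 1 / (i - j)^2 otherwise.
Matrix make_test_matrix(std::size_t n) {
    Matrix a(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            const double d = static_cast<double>(i > j ? i - j : j - i);
            a[i * n + j] = (i == j) ? 4.0 : 1.0 / (d * d);
        }
    return a;
}
```

The factorization costs about (2/3) n^3 flops and is done once. Each further right-hand side only adds two triangular solves of about n^2 flops each. This is one plausible reason why the timings below grow much more slowly than Nrhs.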
+
+N    =     | 1      | 2      | 4      | 8      | 16     | 32
+-----------|--------|--------|--------|--------|--------|-------
+Nrhs = 2   | 0.0047 | 0.0045 | 0.0046 | 0.0130 | 0.0203 | 0.0476
+Nrhs = 4   | 0.0027 | 0.0031 | 0.0033 | 0.0046 | 0.0085 | 0.0250
+Nrhs = 8   | 0.0035 | 0.0035 | 0.0045 | 0.0061 | 0.0119 | 0.0300
+Nrhs = 16  | 0.0085 | 0.0062 | 0.0221 | 0.0113 | 0.0599 | 0.0757
+Nrhs = 32  | 0.0122 | 0.0165 | 0.0112 | 0.0123 | 0.0238 | 0.0834
+Nrhs = 64  | 0.0072 | 0.0078 | 0.0164 | 0.0133 | 0.0421 | 0.0666
+Nrhs = 128 | 0.0073 | 0.0189 | 0.0269 | 0.0199 | 0.0337 | 0.1041
+Nrhs = 256 | 0.0107 | 0.0135 | 0.0279 | 0.0351 | 0.0582 | 0.1438
+Nrhs = 512 | 0.0276 | 0.0174 | 0.0237 | 0.1027 | 0.1113 | 0.2417
+
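+For fixed N, the runtime does not grow monotonically with Nrhs, and overall it grows much more slowly than the number of right-hand sides.
+Going from Nrhs = 2 to Nrhs = 512 (256 times as many) increases the time only by a factor of about 4 to 8 (e.g. 0.0476 -> 0.2417 for N = 32), so the time per right-hand side drops sharply. This is faster than expected.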
+
+
+-------------- Task 8 --------------
+
+
+ There are 1 processes running.
+
+Intervalls: 100 x 100
+
+ Start Jacobi solver for 10201 d.o.f.s
+aver. Jacobi rate :  0.997922  (1000 iter)
+final error: 0.124971 (rel)   0.000194029 (abs)
+JacobiSolve: timing in sec. : 0.079399
+ASCI file  square_100.txt  opened
+17361  2  34320  3
+
+ Start Jacobi solver for 17361 d.o.f.s
+aver. Jacobi rate :  0.998401  (1000 iter)
+final error: 0.201744 (rel)   0.000265133 (abs)
+JacobiSolve: timing in sec. : 0.18853
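Two observations can be checked against this output. First, 100 x 100 intervals give 101 * 101 = 10201 grid points, which matches the d.o.f. count of the first run. Second, the reported average rate equals the 1000th root of the relative error: 0.124971^(1/1000) is about 0.997922 and 0.201744^(1/1000) is about 0.998401. This suggests the rate is defined as a geometric mean.

The sketch below illustrates both points with a plain Jacobi iteration for the 5-point finite-difference Laplacian on the unit square with zero Dirichlet values. This discretization is an assumption, since the second run reads its mesh from square_100.txt. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Average contraction per iteration as the geometric mean of the error
// reduction: rate = rel_error^(1 / iterations).
double average_rate(double rel_error, int iterations) {
    assert(rel_error > 0.0 && iterations > 0);
    return std::pow(rel_error, 1.0 / iterations);
}

// One Jacobi sweep for -Laplace(u) = f on an (m+1) x (m+1) grid, h = 1/m,
// u = 0 on the boundary (boundary entries of u_new are never written):
// u_new(i,j) = (h^2 f(i,j) + u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1)) / 4
void jacobi_sweep(const std::vector<double>& u, std::vector<double>& u_new,
                  const std::vector<double>& f, std::size_t m) {
    const std::size_t n = m + 1;  // grid points per direction
    const double h2 = 1.0 / (static_cast<double>(m) * static_cast<double>(m));
    for (std::size_t i = 1; i < m; ++i)
        for (std::size_t j = 1; j < m; ++j) {
            const std::size_t k = i * n + j;
            u_new[k] = 0.25 * (h2 * f[k] + u[k - n] + u[k + n] + u[k - 1] + u[k + 1]);
        }
}

// Runs `iters` Jacobi sweeps starting from u = 0 and returns the last iterate.
std::vector<double> jacobi(const std::vector<double>& f, std::size_t m, int iters) {
    const std::size_t n = m + 1;
    assert(f.size() == n * n);
    std::vector<double> u(n * n, 0.0), u_new(n * n, 0.0);
    for (int it = 0; it < iters; ++it) {
        jacobi_sweep(u, u_new, f, m);
        u.swap(u_new);
    }
    return u;
}
```

With 10201 unknowns, the per-iteration rate is very close to 1, so 1000 iterations reduce the error only to about 12 % of its initial value, as the first run shows.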
+
+
+