dino.celebic 2025-11-11 15:50:51 +01:00
commit 3882aee07a
71 changed files with 160045 additions and 0 deletions

ex3/ex3_results.txt Normal file

@@ -0,0 +1,222 @@
-------------- Task 1 --------------
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 80000000 (elements), Offset = 0 (elements)
Memory per array = 610.4 MiB (= 0.6 GiB).
Total memory required = 1831.1 MiB (= 1.8 GiB).
Each kernel will be executed 20 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 116886 microseconds.
(= 116886 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Best Rate MB/s Avg time Min time Max time
Copy: 29569.4 0.048585 0.043288 0.059164
Scale: 17644.0 0.082248 0.072546 0.102548
Add: 21030.1 0.100620 0.091298 0.124700
Triad: 21230.7 0.100758 0.090435 0.120631
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
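The "Best Rate" column can be reproduced from the "Min time" column. The sketch below assumes the usual STREAM convention: Copy and Scale touch two arrays, Add and Triad touch three, and 1 MB = 1e6 bytes. The function name stream_rate_mb_per_s is hypothetical and not part of STREAM.

```cpp
#include <cassert>
#include <cstddef>

// Bandwidth in MB/s (1 MB = 1e6 bytes): bytes moved by one kernel call
// divided by the best (minimum) time of that kernel.
// arrays_touched: 2 for Copy and Scale, 3 for Add and Triad (assumed STREAM convention).
double stream_rate_mb_per_s(std::size_t n_elements, int arrays_touched,
                            double min_time_sec)
{
    assert(arrays_touched > 0 && min_time_sec > 0.0);
    const double bytes = static_cast<double>(arrays_touched)
                       * static_cast<double>(sizeof(double))
                       * static_cast<double>(n_elements);
    return 1.0e-6 * bytes / min_time_sec;
}
```

With the values above, stream_rate_mb_per_s(80000000, 2, 0.043288) gives roughly the 29569.4 MB/s reported for Copy; small deviations come from the rounded min time.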
./flops.exe
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module Error RunTime MFLOPS
(usec)
1 4.0146e-13 0.0024 5827.9076
2 -1.4166e-13 0.0007 10037.8942
3 4.7184e-14 0.0039 4371.9185
4 -1.2557e-13 0.0034 4355.5711
5 -1.3800e-13 0.0066 4415.6439
6 3.2380e-13 0.0065 4441.6299
7 -8.4583e-11 0.0053 2277.1707
8 3.4867e-13 0.0069 4367.6094
Iterations = 512000000
NullTime (usec) = 0.0000
MFLOPS(1) = 7050.6178
MFLOPS(2) = 3461.6233
MFLOPS(3) = 4175.0442
MFLOPS(4) = 4389.7311
-------------- Task 2 --------------
Memory needed (double 64-bit, 8 bytes):
(A) (2N + 1) * 8 bytes
(B) (M*N + M + N) * 8 bytes
(C) (M*L + L*N + M*N) * 8 bytes
(D) (N + N + p) * 8 bytes
Floating point operations:
(A) 2N
(B) M * 2N
(C) M * 2L * N
(D) 2 * N * p (Horner scheme)
Read/Write operations:
(A) Read: 2N Write: 1
(B) Read: M*2N Write: M*N
(C) Read: M*2L*N Write: M*L*N
(D) Read: 2*N*p Write: N*p
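The counts above can be read off from straightforward loop implementations. The following is a minimal sketch of the four kernels, assuming row-major storage in std::vector<double>. The names scalar, mat_vec, mat_mat and poly_horner are hypothetical; this is not the code from task_3.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// (A) Scalar product: one multiply and one add per element, i.e. 2N flops.
double scalar(const Vec& x, const Vec& y)
{
    assert(x.size() == y.size());
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// (B) Matrix-vector product b = A*x with A of size M x N: 2*M*N flops.
Vec mat_vec(const Vec& A, const Vec& x, std::size_t M, std::size_t N)
{
    assert(A.size() == M * N && x.size() == N);
    Vec b(M, 0.0);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j)
            b[i] += A[i * N + j] * x[j];  // b[i] is updated once per (i, j)
    return b;
}

// (C) Matrix-matrix product C = A*B, A is M x L, B is L x N: 2*M*L*N flops.
Vec mat_mat(const Vec& A, const Vec& B,
            std::size_t M, std::size_t L, std::size_t N)
{
    assert(A.size() == M * L && B.size() == L * N);
    Vec C(M * N, 0.0);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < L; ++k)
            for (std::size_t j = 0; j < N; ++j)
                C[i * N + j] += A[i * L + k] * B[k * N + j];
    return C;
}

// (D) Polynomial with p coefficients a[0..p-1] (a[k] multiplies x^k),
// evaluated at N points with the Horner scheme: about 2*p flops per point.
Vec poly_horner(const Vec& a, const Vec& x)
{
    assert(!a.empty());
    Vec y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        double v = a.back();
        for (std::size_t k = a.size() - 1; k-- > 0;)
            v = v * x[i] + a[k];
        y[i] = v;
    }
    return y;
}
```

The write counts listed for (B), (C) and (D) correspond to counting every update of the result entry in the innermost loop, as in mat_vec and mat_mat here; a compiler may keep that entry in a register, in which case far fewer writes reach memory.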
-------------- Task 3 --------------
Functions implemented in task_3.cpp
-------------- Task 4 --------------
----- Benchmark (A) -----
Memory allocated : 0.745 GByte
Duration per loop : 0.036 sec
GFLOPS : 2.579
GiByte/s : 20.630
-------------------------
----- Benchmark (B) -----
Memory allocated : 0.715 GByte
Duration per loop : 0.105 sec
GFLOPS : 1.704
GiByte/s : 6.818
-------------------------
----- Benchmark (C) -----
Memory allocated : 0.026 GByte
Duration per loop : 0.459 sec
GFLOPS : 4.062
GiByte/s : 0.057
-------------------------
----- Benchmark (D) -----
Memory allocated : 0.015 GByte
Duration per loop : 0.310 sec
GFLOPS : 1.201
GiByte/s : 0.048
-------------------------
-------------- Task 5 --------------
----- Benchmark norm -----
||x|| = 897124.301552
Memory allocated : 0.373 GByte
Duration per loop : 0.022 sec
GFLOPS : 4.222
GiByte/s : 16.890
-------------------------
What do you observe? Why?
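-> Faster per loop than the scalar product (0.022 sec vs. 0.036 sec): the norm loads the elements of only 1 vector instead of 2, so it moves half as much memory for the same number of floating point operations.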
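The difference shows up in the ratio of flops to bytes loaded. A minimal sketch, assuming 8-byte doubles and counting only loads; the names norm2, dot_intensity and norm_intensity are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean norm: 2N flops like the scalar product, but only N loads.
double norm2(const std::vector<double>& x)
{
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * x[i];
    return std::sqrt(s);
}

// Flops per byte loaded for the scalar product: 2N flops, 2N doubles.
double dot_intensity(std::size_t n)
{
    assert(n > 0);
    return (2.0 * n) / (2.0 * n * sizeof(double));
}

// Flops per byte loaded for the norm: 2N flops, N doubles.
double norm_intensity(std::size_t n)
{
    assert(n > 0);
    return (2.0 * n) / (1.0 * n * sizeof(double));
}
```

The ratios 0.125 and 0.25 agree with the reported GFLOPS / GiByte/s pairs: 2.579 / 20.630 for benchmark (A) and 4.222 / 16.890 for the norm.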
-------------- Task 6 --------------
Benchmarks using cBLAS
----- Benchmark (A) -----
Memory allocated : 0.745 GByte
Duration per loop : 0.023 sec
GFLOPS : 4.006
GiByte/s : 32.052
-------------------------
----- Benchmark (B) -----
Memory allocated : 0.715 GByte
Duration per loop : 0.026 sec
GFLOPS : 7.010
GiByte/s : 28.045
-------------------------
----- Benchmark (C) -----
Memory allocated : 0.026 GByte
Duration per loop : 0.020 sec
GFLOPS : 91.320
GiByte/s : 1.278
-------------------------
-------------- Task 7 --------------
A =
4.000000 1.000000 0.250000 0.111111 0.062500
1.000000 4.000000 1.000000 0.250000 0.111111
0.250000 1.000000 4.000000 1.000000 0.250000
0.111111 0.250000 1.000000 4.000000 1.000000
0.062500 0.111111 0.250000 1.000000 4.000000
b =
0.000000 1.000000
0.000000 1.000000
0.000000 1.000000
0.000000 1.000000
0.000000 1.000000
L + U =
4.000000 1.000000 0.250000 0.111111 0.062500
0.250000 3.750000 0.937500 0.222222 0.095486
0.062500 0.250000 3.750000 0.937500 0.222222
0.027778 0.059259 0.250000 3.749370 0.937050
0.015625 0.025463 0.059259 0.249922 3.749234
x =
0.000000 0.196259
0.000000 0.148391
0.000000 0.151272
0.000000 0.148391
0.000000 0.196259
Check solution:
A * x =
0.000000 1.000000
0.000000 1.000000
0.000000 1.000000
0.000000 1.000000
0.000000 1.000000
N = | 1 | 2 | 4 | 8 | 16 | 32
---------|--------|--------|--------|--------|--------|-------
Nrhs = 2 | 0.0047 | 0.0045 | 0.0046 | 0.0130 | 0.0203 | 0.0476
Nrhs = 4 | 0.0027 | 0.0031 | 0.0033 | 0.0046 | 0.0085 | 0.0250
Nrhs = 8 | 0.0035 | 0.0035 | 0.0045 | 0.0061 | 0.0119 | 0.0300
Nrhs = 16 | 0.0085 | 0.0062 | 0.0221 | 0.0113 | 0.0599 | 0.0757
Nrhs = 32 | 0.0122 | 0.0165 | 0.0112 | 0.0123 | 0.0238 | 0.0834
Nrhs = 64 | 0.0072 | 0.0078 | 0.0164 | 0.0133 | 0.0421 | 0.0666
Nrhs = 128 | 0.0073 | 0.0189 | 0.0269 | 0.0199 | 0.0337 | 0.1041
Nrhs = 256 | 0.0107 | 0.0135 | 0.0279 | 0.0351 | 0.0582 | 0.1438
Nrhs = 512 | 0.0276 | 0.0174 | 0.0237 | 0.1027 | 0.1113 | 0.2417
For fixed N, the solution time per rhs does not increase consistently as Nrhs grows, so the solver scales very well with the number of right-hand sides.
It's faster than expected.
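The printed L + U is consistent with an in-place LU factorization without row exchanges, where the factorization is done once and reused for every right-hand side. The sketch below assumes this and rebuilds the 5 x 5 example; it is not the solver used for the timings (which may call a library routine), and the names make_matrix, lu_factor and lu_solve are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Test matrix as printed above: 4 on the diagonal, 1/(i-j)^2 elsewhere (row-major).
Vec make_matrix(std::size_t n)
{
    Vec A(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            const double d = static_cast<double>(i) - static_cast<double>(j);
            A[i * n + j] = (i == j) ? 4.0 : 1.0 / (d * d);
        }
    return A;
}

// In-place LU without pivoting: afterwards the strict lower triangle holds L
// (unit diagonal implied) and the upper triangle holds U.
void lu_factor(Vec& A, std::size_t n)
{
    assert(A.size() == n * n);
    for (std::size_t k = 0; k < n; ++k) {
        assert(A[k * n + k] != 0.0);
        for (std::size_t i = k + 1; i < n; ++i) {
            A[i * n + k] /= A[k * n + k];
            for (std::size_t j = k + 1; j < n; ++j)
                A[i * n + j] -= A[i * n + k] * A[k * n + j];
        }
    }
}

// Solve L*U*X = B for nrhs right-hand sides; B is n x nrhs (row-major)
// and is overwritten with the solution X.
void lu_solve(const Vec& LU, Vec& B, std::size_t n, std::size_t nrhs)
{
    assert(LU.size() == n * n && B.size() == n * nrhs);
    for (std::size_t i = 0; i < n; ++i)            // forward substitution, L*Y = B
        for (std::size_t k = 0; k < i; ++k)
            for (std::size_t r = 0; r < nrhs; ++r)
                B[i * nrhs + r] -= LU[i * n + k] * B[k * nrhs + r];
    for (std::size_t i = n; i-- > 0;) {            // backward substitution, U*X = Y
        for (std::size_t k = i + 1; k < n; ++k)
            for (std::size_t r = 0; r < nrhs; ++r)
                B[i * nrhs + r] -= LU[i * n + k] * B[k * nrhs + r];
        for (std::size_t r = 0; r < nrhs; ++r)
            B[i * nrhs + r] /= LU[i * n + i];
    }
}
```

The factorization costs on the order of N^3 operations once, while each additional right-hand side only adds two triangular solves of order N^2, which is one plausible reason the time per rhs stays low.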
-------------- Task 8 --------------
There are 1 processes running.
Intervalls: 100 x 100
Start Jacobi solver for 10201 d.o.f.s
aver. Jacobi rate : 0.997922 (1000 iter)
final error: 0.124971 (rel) 0.000194029 (abs)
JacobiSolve: timing in sec. : 0.079399
ASCI file square_100.txt opened
17361 2 34320 3
Start Jacobi solver for 17361 d.o.f.s
aver. Jacobi rate : 0.998401 (1000 iter)
final error: 0.201744 (rel) 0.000265133 (abs)
JacobiSolve: timing in sec. : 0.18853
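The reported average Jacobi rate appears to be the geometric mean of the error reduction per iteration, i.e. rate^iter = relative error. This relation is inferred from the printed numbers, not taken from the solver source; the function name average_rate is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Average per-iteration reduction factor, assuming
// rate^iters == rel_error (relative error after `iters` iterations).
double average_rate(double rel_error, int iters)
{
    assert(rel_error > 0.0 && iters > 0);
    return std::pow(rel_error, 1.0 / iters);
}
```

For the first run, average_rate(0.124971, 1000) is close to the printed 0.997922; a rate this near 1 means the error shrinks by only about 0.2 % per iteration.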