UPDATE 2025-12-18
Ex2_Second_Attempt:
* Makefile missing [GH: added]
* std::execution::par: correct results, and it seems to be about 6x faster on 8 cores.

Ex5_Second_Attempt:
* Parallel FEM assembly: OK
* GetDiag is not parallelized
* extra _omp versions of JacobiSolve and two other functions
  These are not necessary. Parallelize directly in the existing functions, also in vdop.cpp.
  See Ex5_Second_Attempt.pdf
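
What "parallelize directly in the functions" could look like, as a minimal sketch. The helper name vddiv and its signature are assumptions for illustration; the actual functions in vdop.cpp may look different. The point is that the pragma is simply ignored when compiling without -fopenmp, so one function serves both the serial and the parallel build.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element-wise division x[i] = y[i] / z[i]; the real helper in
// vdop.cpp may have a different name and signature.
void vddiv(std::vector<double>& x, const std::vector<double>& y,
           const std::vector<double>& z)
{
    assert(x.size() == y.size() && y.size() == z.size());
    const std::size_t n = x.size();
    // Without -fopenmp this pragma is ignored and the loop runs serially,
    // so no separate _omp copy of the function is needed.
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = y[i] / z[i];
    }
}
```

The same pattern should apply to the loops in JacobiSolve and GetDiag, provided each iteration writes only its own entry.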
----------------------------
Summary of your results?!

g++ -O3 -fopenmp  mainEx1.cpp mylib.cpp -o dotprod
I added Makefile, CLANG_default.mk and GCC_default.mk to your repository.

Linux: Using g++
> make run EX=Ex5
Using clang++
> make run EX=Ex5 COMPILER=CLANG_

Added 
#include <algorithm>     // GH: transform()
in mylib.pp:2

1:
No scheduling strategies were tested.
reduction_vec_append() is implemented but never tested.
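
A possible way to cover both points, as a sketch only. With schedule(runtime) the scheduling strategy is taken from the environment variable OMP_SCHEDULE, so static, dynamic and guided can be timed without recompiling. The functions dot_runtime and collect_even, and the reduction identifier vec_append, are hypothetical stand-ins; your reduction_vec_append() may be declared differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product; the schedule is chosen at run time via OMP_SCHEDULE,
// e.g. "static", "dynamic,1024" or "guided".
double dot_runtime(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    double sum = 0.0;
    #pragma omp parallel for schedule(runtime) reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// User-defined reduction that appends the thread-private vectors.
// Private copies are default-constructed, i.e. they start empty.
#pragma omp declare reduction(vec_append : std::vector<int> : \
        omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))

// Collects all even numbers in [0, n). The result is sorted at the end
// because the order in which the private parts are appended is unspecified.
std::vector<int> collect_even(int n)
{
    std::vector<int> result;
    #pragma omp parallel for reduction(vec_append : result)
    for (int i = 0; i < n; ++i) {
        if (i % 2 == 0) result.push_back(i);
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

Usage, assuming the binary name from the compile line below: OMP_SCHEDULE="dynamic,1024" ./dotprod
A small check such as comparing collect_even(10) against {0, 2, 4, 6, 8} would already count as a test of the reduction.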


2: mainEx2.cpp
No parallelization at all, neither with OpenMP nor with C++ execution policies.

3: mainEx3.cpp
nested parallelization in count_goldbach(), single_goldbach()
Did that pay off?!
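
One way to answer this is to time the nested version against a variant that parallelizes only the outer loop. A minimal sketch, assuming count_goldbach() loops over the even numbers and single_goldbach() counts the prime pairs of one number; sieve, single_goldbach_serial and count_goldbach_outer are hypothetical names, not your functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sieve of Eratosthenes: is_prime[k] != 0 iff k is prime, for 0 <= k <= n.
std::vector<char> sieve(int n)
{
    assert(n >= 1);
    std::vector<char> is_prime(static_cast<std::size_t>(n) + 1, char{1});
    is_prime[0] = 0;
    is_prime[1] = 0;
    for (int p = 2; static_cast<long long>(p) * p <= n; ++p) {
        if (is_prime[p]) {
            for (int q = p * p; q <= n; q += p) is_prime[q] = 0;
        }
    }
    return is_prime;
}

// Number of decompositions k = p + q with primes p <= q; serial inner loop.
int single_goldbach_serial(int k, const std::vector<char>& is_prime)
{
    int count = 0;
    for (int p = 2; p <= k / 2; ++p) {
        if (is_prime[p] && is_prime[k - p]) ++count;
    }
    return count;
}

// Total number of decompositions of all even numbers in [4, n].
// Only the outer loop is parallel; dynamic scheduling because the
// work per iteration grows with k.
long long count_goldbach_outer(int n)
{
    assert(n >= 4);
    const std::vector<char> is_prime = sieve(n);
    long long total = 0;
    #pragma omp parallel for schedule(dynamic, 64) reduction(+ : total)
    for (int k = 4; k <= n; k += 2) {
        total += single_goldbach_serial(k, is_prime);
    }
    return total;
}
```

If the outer loop alone already keeps all cores busy, the inner parallel region mostly adds overhead; a timing comparison of both variants would show this.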

4:
Try collapse(2) in bench_funcs.cpp:75
(it was faster in my code for Mat-Mat-Mult).
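
A sketch of the idea, assuming dense row-major storage; the function name matmul_collapse and its signature are illustrative and will differ from the code in bench_funcs.cpp. collapse(2) fuses the i and j loops into one iteration space of n*m iterations, which helps load balancing when n alone is small compared to the number of threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B for row-major dense matrices: A is n x k, B is k x m, C is n x m.
void matmul_collapse(const std::vector<double>& A, const std::vector<double>& B,
                     std::vector<double>& C, int n, int k, int m)
{
    assert(A.size() == static_cast<std::size_t>(n) * k);
    assert(B.size() == static_cast<std::size_t>(k) * m);
    assert(C.size() == static_cast<std::size_t>(n) * m);
    // The i and j loops must be perfectly nested for collapse(2):
    // no statements between the two loop headers.
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < m; ++j) {
            double s = 0.0;  // private accumulator, one per (i, j)
            for (int l = 0; l < k; ++l) {
                s += A[static_cast<std::size_t>(i) * k + l] *
                     B[static_cast<std::size_t>(l) * m + j];
            }
            C[static_cast<std::size_t>(i) * m + j] = s;
        }
    }
}
```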

5:
Why not use my provided 2D FEM code? That was already required in exercise 3.
Your code handles only 1D FEM, which is quite simple to parallelize because of the very simple matrix pattern.
jacobi_par_parallel()
build_fem_system_atomic()  OK
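
To illustrate why the 1D case is simple: a sketch of atomic assembly for linear elements on a uniform mesh of [0,1], assuming a tridiagonal storage with three vectors. The struct Tridiag and the function assemble_1d_atomic are hypothetical and not taken from your build_fem_system_atomic(). Only the diagonal entries are shared between neighbouring elements; in a 2D mesh many more entries are shared, which is where the assembly becomes interesting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tridiagonal storage: row i holds lower[i], diag[i], upper[i].
struct Tridiag {
    std::vector<double> lower, diag, upper;
};

// Stiffness matrix of -u'' with linear elements on a uniform mesh of [0,1].
// Element matrix: (1/h) * [[1, -1], [-1, 1]].
Tridiag assemble_1d_atomic(int nelem)
{
    assert(nelem > 0);
    const std::size_t nnode = static_cast<std::size_t>(nelem) + 1;
    const double h = 1.0 / nelem;
    Tridiag K{std::vector<double>(nnode, 0.0),
              std::vector<double>(nnode, 0.0),
              std::vector<double>(nnode, 0.0)};
    double* diag = K.diag.data();
    #pragma omp parallel for
    for (int e = 0; e < nelem; ++e) {
        const double ke = 1.0 / h;
        // Elements e-1 and e both add to diag[e]: protect the update.
        #pragma omp atomic
        diag[e] += ke;
        #pragma omp atomic
        diag[e + 1] += ke;
        // Each off-diagonal entry is written by exactly one element.
        K.upper[e] = -ke;
        K.lower[e + 1] = -ke;
    }
    return K;
}
```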