Summary of your results?!

g++ -O3 -fopenmp  mainEx1.cpp mylib.cpp -o dotprod
I added Makefile, CLANG_default.mk and GCC_default.mk to your repository.

Linux, using g++:
> make run EX=Ex5
Using clang++:
> make run EX=Ex5 COMPILER=CLANG_

Added
#include <algorithm>     // GH: transform()
in mylib.cpp:2

1:
No scheduling options tested.
reduction_vec_append() is implemented but never tested.
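Both gaps can be covered by one small test. The following is a minimal sketch, not your reduction_vec_append(): it assumes a user-defined reduction `vec_append` that concatenates per-thread vectors, and a hypothetical kernel collect_multiples_of_3(). The loop uses schedule(runtime), so static/dynamic/guided can be compared via the OMP_SCHEDULE environment variable without recompiling. Because OpenMP may combine the partial vectors in any order, the result is sorted before it is checked.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical user-defined reduction: concatenate the per-thread vectors.
// The combination order is unspecified, so the element order may vary.
#pragma omp declare reduction(vec_append : std::vector<int> :          \
        omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))   \
        initializer(omp_priv = std::vector<int>())

// Hypothetical test kernel: collect all multiples of 3 in [0, n).
// schedule(runtime) reads the schedule from OMP_SCHEDULE at run time.
std::vector<int> collect_multiples_of_3(int n) {
    assert(n >= 0);
    std::vector<int> result;
#pragma omp parallel for schedule(runtime) reduction(vec_append : result)
    for (int i = 0; i < n; ++i) {
        if (i % 3 == 0) result.push_back(i);
    }
    return result;
}

// Wall-clock time of one call in seconds; 'correct' reports whether the
// sorted result equals 0, 3, 6, ... (all multiples of 3 below n).
double time_collect(int n, bool& correct) {
    const auto t0 = std::chrono::steady_clock::now();
    std::vector<int> v = collect_multiples_of_3(n);
    const auto t1 = std::chrono::steady_clock::now();
    std::sort(v.begin(), v.end());  // undo the unspecified append order
    correct = (v.size() == static_cast<std::size_t>((n + 2) / 3));
    for (std::size_t j = 0; correct && j < v.size(); ++j)
        correct = (v[j] == 3 * static_cast<int>(j));
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling time_collect() from your main and running e.g. `OMP_SCHEDULE="static,1000" ./dotprod`, then `OMP_SCHEDULE="dynamic,1000"` and `OMP_SCHEDULE="guided"`, gives a first scheduling comparison; the chunk size 1000 is an arbitrary choice.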


2: mainEx2.cpp
No parallelization at all: neither OpenMP nor C++ execution policies are used.
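Since the content of mainEx2.cpp is not reproduced here, the following is only a generic sketch with a hypothetical kernel scale_and_sum(): an element-wise loop plus a sum, parallelized with a single OpenMP `parallel for` and a `reduction(+ : sum)` clause. The C++17 alternative would be std::transform_reduce with std::execution::par from <execution>; with GCC's libstdc++ that typically needs TBB (link with -ltbb) to actually run in parallel.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel (not taken from mainEx2.cpp): y = a * x, returns sum(y).
double scale_and_sum(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(&x != &y);  // x and y must not alias, since y is resized below
    const long n = static_cast<long>(x.size());
    y.resize(x.size());
    double sum = 0.0;
    // Iterations are independent; each thread accumulates a private partial
    // sum, and the reduction clause adds the partial sums at the end.
#pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i];
        sum += y[i];
    }
    return sum;
}
```

Note that a floating-point reduction may give slightly different results for different thread counts, because the summation order changes.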

3: mainEx3.cpp
Nested parallelization in count_goldbach() and single_goldbach().
Did that pay off?!
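For comparison, a sketch that parallelizes only the outer loop over the even numbers. All names here (sieve, single_goldbach_seq, count_goldbach_outer) are hypothetical and only mirror the idea of your functions. With typical default settings (e.g. libgomp without OMP_MAX_ACTIVE_LEVELS set), an inner parallel region runs on a single thread anyway, so nesting mostly adds overhead. Since larger n need more work, schedule(dynamic, 64) is used to balance the load; the chunk size is an arbitrary choice.

```cpp
#include <cassert>
#include <vector>

// Sieve of Eratosthenes: is_prime[k] is true iff k is prime, for 0 <= k <= n.
std::vector<bool> sieve(int n) {
    assert(n >= 1);
    std::vector<bool> is_prime(n + 1, true);
    is_prime[0] = is_prime[1] = false;
    for (int i = 2; static_cast<long>(i) * i <= n; ++i)
        if (is_prime[i])
            for (int j = i * i; j <= n; j += i) is_prime[j] = false;
    return is_prime;
}

// Number of decompositions n = p + q with primes p <= q (sequential).
int single_goldbach_seq(int n, const std::vector<bool>& is_prime) {
    int count = 0;
    for (int p = 2; p <= n / 2; ++p)
        if (is_prime[p] && is_prime[n - p]) ++count;
    return count;
}

// counts[n] for all even n in [4, nmax]. Only the outer loop is parallel;
// the sieve is read-only inside the region, and each iteration writes its
// own element of counts, so no synchronization is needed.
std::vector<int> count_goldbach_outer(int nmax) {
    assert(nmax >= 4);
    const std::vector<bool> is_prime = sieve(nmax);
    std::vector<int> counts(nmax + 1, 0);
#pragma omp parallel for schedule(dynamic, 64)
    for (int n = 4; n <= nmax; n += 2)
        counts[n] = single_goldbach_seq(n, is_prime);
    return counts;
}
```

Timing this version against your nested one for the same nmax and thread count would answer the question above.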

4:
Try collapse(2) in bench_funcs.cpp:75
(it was faster in my code for the matrix-matrix multiplication).
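A minimal sketch of what collapse(2) looks like for a matrix-matrix product, assuming row-major n x n matrices in flat std::vector<double> storage; matmat_collapse() is a hypothetical name, not the function at bench_funcs.cpp:75. The i- and j-loops must be perfectly nested (no code between them) for collapse(2) to apply. Whether it is faster depends on n, the thread count and the memory access pattern, so it has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B for n x n matrices stored row-major in flat vectors.
// collapse(2) merges the i- and j-loops into one iteration space of n*n
// independent (i, j) pairs, giving the scheduler more work units than the
// n rows alone; the k-loop stays sequential inside each pair.
void matmat_collapse(int n, const std::vector<double>& A,
                     const std::vector<double>& B, std::vector<double>& C) {
    const std::size_t nn = static_cast<std::size_t>(n) * n;
    assert(A.size() == nn && B.size() == nn);
    C.assign(nn, 0.0);
#pragma omp parallel for collapse(2)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double s = 0.0;  // declared inside the loop nest, hence private
            for (int k = 0; k < n; ++k)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
    }
}
```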

5:
Why didn't you use my provided 2D FEM code? That was already required in exercise 3.
Your code only covers 1D FEM, which is quite simple to parallelize because of the very simple matrix pattern (tridiagonal for linear elements).
jacobi_par_parallel()
build_fem_system_atomic()  OK
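To illustrate why 1D is the easy case, here is a sketch under stated assumptions: the hypothetical test problem -u'' = 1 on (0,1) with u(0) = u(1) = 0, linear elements on a uniform mesh, and the tridiagonal matrix stored as two diagonals. build_fem_system_atomic_sketch() and jacobi_sketch() are hypothetical stand-ins, not your functions. Only neighbouring elements share a node, so the element loop needs atomics just for the shared diagonal and load entries; the Dirichlet values are imposed by simply never updating the boundary nodes. In 2D, each node is shared by many elements and the sparsity pattern is irregular, which is what makes the exercise 3 code more interesting to parallelize.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Symmetric tridiagonal system for m linear elements (nodes 0..m).
struct Tridiag {
    std::vector<double> diag;  // A(i,i), size m+1
    std::vector<double> off;   // A(i,i+1) = A(i+1,i), size m
    std::vector<double> rhs;   // load vector, size m+1
};

// Parallel element-wise assembly. Elements e-1 and e share node e, so the
// updates of diag and rhs use '#pragma omp atomic'; off[e] belongs to
// element e alone and needs no protection.
Tridiag build_fem_system_atomic_sketch(int m) {
    assert(m >= 2);
    const double h = 1.0 / m;
    Tridiag A{std::vector<double>(m + 1, 0.0), std::vector<double>(m, 0.0),
              std::vector<double>(m + 1, 0.0)};
#pragma omp parallel for
    for (int e = 0; e < m; ++e) {
#pragma omp atomic
        A.diag[e] += 1.0 / h;
#pragma omp atomic
        A.diag[e + 1] += 1.0 / h;
        A.off[e] = -1.0 / h;
#pragma omp atomic
        A.rhs[e] += 0.5 * h;  // f = 1: each element adds h/2 per node
#pragma omp atomic
        A.rhs[e + 1] += 0.5 * h;
    }
    return A;
}

// Parallel Jacobi iteration on the interior nodes 1..m-1; the boundary
// values stay 0. Stops when the max-norm of the update drops below tol.
std::vector<double> jacobi_sketch(const Tridiag& A, double tol, int max_iter) {
    const int n = static_cast<int>(A.diag.size());  // m + 1 nodes
    std::vector<double> u(n, 0.0), u_new(n, 0.0);
    for (int it = 0; it < max_iter; ++it) {
        double diff = 0.0;
#pragma omp parallel for reduction(max : diff)
        for (int i = 1; i < n - 1; ++i) {
            u_new[i] = (A.rhs[i] - A.off[i - 1] * u[i - 1] - A.off[i] * u[i + 1]) / A.diag[i];
            diff = std::max(diff, std::fabs(u_new[i] - u[i]));
        }
        u.swap(u_new);  // boundary entries are 0 in both vectors
        if (diff < tol) break;
    }
    return u;
}
```

For this test problem, the nodal values of the linear FEM solution coincide with the exact solution u(x) = x(1-x)/2, so with m = 10 the value at node 5 (x = 0.5) should converge to 0.125.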