UPDATE 2025-12-18

Ex2_Second_Attempt:
 * Makefile missing [GH: added]
 * execution::par : correct results and seems 6x faster on 8 cores.

Ex5_Second_Attempt:
 * FEM assembling parallel OK
 * no parallelization of GetDiag
 * extra _omp version of JacobiSolve and two others
   This is not necessary. Parallelize directly in the functions, also in vdop.cpp
 See pdf Ex5_Second_Attempt.pdf

----------------------------
Summary of your results?!

g++ -O3 -fopenmp mainEx1.cpp mylib.cpp -o dotprod

I added Makefile, CLANG_default.mk, GCC_default.mk to your repository
Linux:
Using g++
 > make run EX=Ex5
Using clang++
 > make run EX=Ex5 COMPILER=CLANG_

Added #include // GH: transform() in mylib.cpp:2

1: no scheduling tested
   reduction_vec_append() implemented but never tested.
2: mainEx2.cpp no parallelization at all, neither OpenMP nor C++ execution policies
3: mainEx3.cpp nested parallelization in count_goldbach(), single_goldbach()
   Did that pay off?!
4: Try collapse(2) in bench_funcs.cpp:75 (faster in my code for Mat-Mat-Mult)
5: Why not use my provided 2D FEM code? That was already required in exercise 3.
   Your code covers only 1D FEM, which is quite simple to parallelize because of the very simple matrix pattern.
   jacobi_par_parallel(), build_fem_system_atomic() OK
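
Regarding the collapse(2) hint for Mat-Mat-Mult: a minimal sketch of how the clause could be applied. The function name matmul_collapse, its signature and the row-major storage are assumptions for illustration only; this is not the code from bench_funcs.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from bench_funcs.cpp):
// C = A * B for row-major matrices, A is n x k, B is k x m, C is n x m.
std::vector<double> matmul_collapse(const std::vector<double>& A,
                                    const std::vector<double>& B,
                                    int n, int k, int m)
{
    assert(A.size() == static_cast<std::size_t>(n) * k);
    assert(B.size() == static_cast<std::size_t>(k) * m);
    std::vector<double> C(static_cast<std::size_t>(n) * m, 0.0);

    // collapse(2) fuses the i- and j-loops into one iteration space of
    // n*m independent iterations that is distributed over the threads.
    // Both loops must be perfectly nested (nothing between the two for-lines).
    // Without -fopenmp the pragma is ignored and the loops run serially.
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < m; ++j) {
            double sum = 0.0;  // local to each (i,j) iteration, so no data race
            for (int l = 0; l < k; ++l) {
                sum += A[static_cast<std::size_t>(i) * k + l]
                     * B[static_cast<std::size_t>(l) * m + j];
            }
            C[static_cast<std::size_t>(i) * m + j] = sum;
        }
    }
    return C;
}
```

Compile with -fopenmp as in the g++ line above. Whether collapse(2) pays off depends on the matrix sizes and the number of threads; it tends to help when the outer loop alone provides too few iterations for an even distribution, so please measure both variants.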