- Sebastian Engel: Talk 1, Talk 2, Report

"Optimal Control of the Wave Equation with BV-Functions"

Lit: FEniCS

- Stefan Hofmeister: Talk 1, Talk 2, Report

"Active Contours without Edges: Segmentation of Coronal Holes"

Highlight: sequential code 2x faster; an additional 8x faster with 16 CPU threads, 36x faster with 64 KNL threads

Lit: - Parallel Python (Cython, Cython 2, PP?, SciPy, mpi4py, NVIDIA: Python on GPU, PyCUDA)
- CL Linear Algebra Benchmark.
- OpenACC and CUDA and Thrust: Larkin, Harris, Talk Angerer (1st part)
- Unified memory using CUDA: Harris, Negrut, on Pascal
- Unified memory using OpenACC: Sakharnikh, Kraus, Stack Overflow
- Video on C++ classes and OpenACC.

- Peter Leitner: Talk 1, Talk 2, Report

"Solution of the (1+1)-D Dirac Equation on a staggered grid"

Highlight: Python --> Fortran (500x faster); Fortran + OpenMP 1000x faster

Lit: Parareal [Gander], MGRIT [Falgout]

- Patrick Schiffmann: Talk 1, Talk 2, Report

"RBF Interpolation for Mesh Deformation"

Highlight: 4x faster

Lit: - Numba (Python with LLVM just in time compilation)
- Intel report for specific code segments (-opt-report-routine=string: generates reports only for functions or subroutines whose names contain string. By default, reports are generated for all functions and subroutines)
- Python multithreading problem
- Dynamic arrays in many dimensions
- 1024-core manycore (1, 2, 3)
- VCL: C++ vector class library
- NVIDIA CUB: a lower level Thrust

March 27, 2017