High Performance Computing (Master Course)
- Contents:
- We will start with an introduction to the basic principles and
algorithms of hardware-aware and parallel computing, followed by
transferring selected algorithms onto many-core architectures.
- Lecturer:
- Prof. Gundolf Haase, Heinrichstr. 36, room 506, Tel. 5178
- Appointments:
Wednesday 8:15 - 9:45 in Heinrichstr. 36, SR 11.33
Contents:
- Introduction to hardware and parallel concepts: pdf.
- Iterative Methods: pdf.
- Introduction to Finite Elements: pdf.
- Parallel Finite Elements (OpenMP, MPI): pdf. Commented code for OpenMP and for MPI.
- Geometric Multigrid: pdf.
- Algebraic Multigrid: pdf.
- Cardiovascular applications: pdf.
- Parallel improvements: pdf.
- Non-linear problems: pdf.
- MPI+X: pdf.
- Exercises (follow the links in the PDF document):
- Hardware (login from outside KFU only via VPN):
- Mephisto at IMSC; try the Jupyter interface.
- Clusters in Graz: sauron (queuing system); hostfile, see hints.
- Remote login to servers:
- A VPN connection to KFU is required: install the AnyConnect software via the
VPN Service (server: https://univpn.uni-graz.at; login: KFU e-mail)
- Linux: use ssh -X 143.50.47.xxx to connect to the compute server
- Windows: install WinSSHTerm with a guided installation of further packages
(putty, winscp, X-Server)
- Software:
- Material:
- Link for the impatient.
- Winter/Summer School 2019.
- Course I held in Chile, its presentations, the quick start.
- Using BLAS and other linear algebra packages.
- OpenMP: 5.1, Quick Reference, LLNL tutorial on OpenMP, nice tutorial,
guide to OpenMP; German tutorial; compiler
- OpenMP Accelerator Offload (Intel)
- SYCL
- MPI: Open MPI (home, doc), LLNL tutorial on MPI
- Intel-optimized BLAS/LAPACK: MKL
- AMD-optimized BLAS: ACML (good documentation)
- valgrind with MPI, see §4.9 of the manual.
- likwid (code, wiki)
- Use of restrict
- Dining philosophers problem
- Some words on Linux/Unix.
- R. Grimm: C++ Core Guidelines: Mehr Fallen in der Concurrency (in German)
- Further Links
- Agner Fog: Software optimization
- SIMD instruction list; Intel Intrinsics Guide.
- Top 500 [Nov 2019]: ARM A64FX with 16.9 GFLOPS/Watt (+Nvidia), Groq
- Top 5 [June 2020]
- Wiki for semiconductors (AMD Zen (Epyc, EPYC 7551, Threadripper),
Intel Coffee Lake, Skylake, Haswell (i7-4770), Alder Lake)
- AI chip TPU by Google
- Vienna Supercomputer [VSC-4, VSC-3]
- Chinese #1: Sunway TaihuLight [May 2016]
- Intel Xeon E7-8890-v4 (24 x 2.20 GHz, 60 MB Cache, 102 GB/s
bandwidth, 844 GFLOPS(D)) [June '16: Intel, press, info]
- Intel® Xeon® Phi Knights Landing (72 cores, 1.5 GHz, 16 GB MCDRAM, 490/102
GB/s bandwidth, 3 TFLOPS(D)) [June '16: Intel, heise]
- Intel Xeon W-3175X (28 x 3.1 GHz, 38.5 MB Cache) [Oct'18: info]
- AMD Ryzen Threadripper (32 x 3.0 GHz, 80 MB Cache) [Aug.'18, heise]
- NVIDIA: Pascal architecture, Titan X; 12 GB with 480 GB/s, 11 TFLOPS
(single; double 1/32).
- 2020: Nvidia Ampere (1, 2, 3, RTX 3090, ditto;
36 Shader-Teraflops, 69 Raytracing-Teraflops, 285 Tensor-Teraflops)
- 2020: Arm (Ampere Altra, Neoverse, 72-core)
- Top 500 (2020: Fugaku, 2022: Frontier)
Books
- Sterling/Anderson/Brodowicz: "High Performance Computing", Elsevier,
2018 (e-book)
- Thomas Rauber and Gudula Rünger: "Parallel Programming: for Multicore
and Cluster Systems", Springer, Berlin, 2013, 3rd edition (e-book,
1st ed: e-book)
- Craig C. Douglas, Gundolf Haase and Ulrich Langer: "A Tutorial
on Elliptic PDE Solvers and their Parallelization", SIAM, 2003 (e-book)
- B1, B2
- Bartłomiej Filipek: C++17 in Detail.
- C++20: official standard, blog by Rainer Grimm
- Nov 6, 2024