Ph.D.-Seminar:

High Performance Computing I (WS 24/25)

Contents:

We will start with an introduction to the basic principles and algorithms of parallel computing, followed by transferring selected algorithms onto many-core architectures. We will focus on NVIDIA GPUs using CUDA, including the use of multiple GPUs. The students will compare the performance of the algorithms on the GPU with their performance on multi-core CPUs using OpenMP.

A technical report must be submitted by the end of the term.

Lecturer: Prof. Gundolf Haase, Heinrichstr. 36, Room 506, Tel. 5178

Appointments: Wednesday 10:00 - 11:30 in Heinrichstr. 36, SR 11.33

Timetable of lectures:

Oct 2, 2024
No lecture

Oct 9, 2024

Brief introduction to the concepts of parallel computing: vectorization, shared/distributed resources, threads/processes. (V 0)

GPU computing: What's special? (V 1, V 3)

First steps in CUDA, improving the scalar product (Codes, examples).
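As a CPU baseline for the scalar product, the following is a minimal sketch and not the course code: a sequential version and an OpenMP version using a reduction clause. The function names scalar_seq and scalar_omp are assumptions chosen for illustration. Without OpenMP enabled (e.g. without -fopenmp for GCC) the pragma is ignored and the loop simply runs sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference: s = sum_i x[i] * y[i]
// (hypothetical name, not taken from the course codes)
double scalar_seq(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size());
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        s += x[i] * y[i];
    }
    return s;
}

// OpenMP version: each thread accumulates a private partial sum,
// and the reduction clause adds the partial sums at the end of the loop.
// (hypothetical name, not taken from the course codes)
double scalar_omp(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size());
    const double* px = x.data();
    const double* py = y.data();
    const long long n = static_cast<long long>(x.size());
    double s = 0.0;
    #pragma omp parallel for reduction(+ : s)
    for (long long i = 0; i < n; ++i) {
        s += px[i] * py[i];
    }
    return s;
}
```

Because the parallel version adds the terms in a different order, its result can differ from the sequential one in the last digits for general floating-point data; this is worth keeping in mind when comparing CPU and GPU results.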

Oct 16, 2024

Comparison of A100 and H100; further development.
Improving the scalar product etc. (all Codes, examples).

firstSteps

skalar (+BLAS, +Thrust), float_skalar, par_skalar (+MPI)

densematrices_libs (+BLAS)

cusolver (+cuSOLVER, +unified memory)

GPU computing: CUDA, MATLAB, Python
CUDA: 12.1, documentation, Best Practices, Profiler, Debugger

Oct 23, 2024

Consulting on exercise.
Profiling
Hardware (login from outside KFU only via VPN):

Mephisto at IMSC

Remote login to servers:

A VPN connection to KFU is needed: install the AnyConnect software via the VPN Service (server: https://univpn.uni-graz.at; login: KFU e-mail address)

Linux: use ssh -X 143.50.47.xxx to connect to the compute server

Windows: install WinSSHTerm with a guided installation of further packages (PuTTY, WinSCP, X server)

Oct 30, 2024

Nov 5, 2024
No lecture

Discussion: results of first exercise.

Nov 12, 2024
No lecture

Travel Prof. Haase (COLIBRI PhD Retreat)

Nov 19, 2024

Nov 26, 2024

Individual projects

Dec 3, 2024

Dec 10, 2024

Jan 8, 2025

Jan 15, 2025

Jan 22, 2025

Jan 29, 2025

Task sheets:

Task 1 GPU
Task 2 with some linear algebra
Task 3 ??

Books:

Learn CUDA Programming: A beginner's guide to GPU programming and parallel computing with CUDA 10.x and C/C++; Han/Sharma; Packt; 2019

Extended (old) course material: follow the link. See templates.

Material for CUDA:

NVIDIA: Hardware Donation Program

CUDA Toolkit Documentation: all

CUDA Toolkit 12 Download/Tutorials

CUDA 6.5: blog by Mark Harris

NVIDIA, CUDA, OpenCL, OpenCL for NVIDIA
CUDA Programming: Getting Started, Guide, Reduction in CUDA
AMD, Radeon: Developer Center
List of GPU-accelerated libraries; Thrust 2.5 (C++ STL in CUDA), ppt

Software/Compiler/Hardware:

FLAMEGPU: Flexible Large Scale Agent Modelling Environment for the GPU
Nvidia: Pascal with 3840 cores
OpenACC (Cray, NVIDIA, PGI, CAPS), Quick Ref
cuBLAS, cuFFT, cuSPARSE, cuRAND, Thrust 2.5, cuSOLVER
LAPACK on GPU (Info): cuLA
CUDA programs can also run on CPUs
Great course by Mike Giles (see also the guest talks)
PETSc on GPU
Kepler-GK110: 1, 2, 3, 4
Tesla K80: 1; Tesla K20: 1, 2, 3, Top 500
AMD: FirePro S1000, Kaveri (856 GFLOPS, update), hUMA, APU13

Further Links

Comparison GPU/CPU
Tesla with Fermi [Nov. 16, 2009], GF100 [Jan. 18, 2010], Tesla C2050, Quadro 6000
Viennese supercomputer VSC (Vienna Scientific Cluster)
Chinese GPU supercomputers [May 31, 2010], Nebulae, auto-tuning
Aubrey Isle / Knights Ferry co-processor card by Intel [June 1, 2010; c't 13/2010, p. 20], comparison with Fermi, Compiler, 1 TFLOP DGEMM [Nov. 16, 2011]
AMD Llano [Oct. 19, 2010], update [June 2011]
NVIDIA GPUs in servers by Cray [Sept. 22, 2010]
Next generation: Kepler and Maxwell [Sept. 22, 2010]
Chinese Tianhe-1A ranked first in the Top 500 [Oct. 28, 2010]: 14336 Xeon + 7168 Tesla (2.5 PFLOPS, 4.04 MW), located at NSC in Tianjin (see Spiegel, Heise, NVIDIA). More details.
BlueGene/Q with 17 cores (Heise)
Mathematica 8 supports GPU computing
Low-energy supercomputer at the University of Frankfurt (Heise)
Top 500 [June 2011] (Heise)
Oak Ridge plans with 18000 Kepler-GPUs [Oct. 11, 2011]; Titan [Oct. 30, 2012]
MS-AMP [Feb 2012]
Cray XC30 using Xeon Phi [Nov. 9, 2012]; Titan
NVIDIA: Volta (1 TB/s bandwidth)
Qarnot Computing

07.08.2024