M. Bauer, H. Köstler, C. Feichtinger, J. Habich, U. Rüde
waLBerla
(Widely applicable Lattice-Boltzmann from Erlangen) is a massively
parallel software framework supporting a wide range of physical
phenomena. In this talk we present the software design and
performance optimization techniques used in this framework, with a
focus on reconciling conflicting goals: high node performance and
scalability on the one hand, and clean software structure,
maintainability, and flexibility on the other.
We first present single-core optimization techniques and software
design concepts used for the lattice Boltzmann method (LBM) kernels in
waLBerla. The single-core performance results are evaluated using the
Execution-Cache-Memory (ECM) machine model.
Then the scalability of the code is studied on current high-performance
computing clusters. The test platforms are the Blue Gene/Q system
"JUQUEEN" at the Jülich Supercomputing Centre and the Intel Xeon
E5-2680 based petascale system "SuperMUC" located in Munich.
Time permitting, we will also discuss a variant of waLBerla suited for
GPGPUs and GPU-accelerated clusters, including performance results on
Tsubame 2.0.