Combining shared memory and distributed memory computation.
OpenMP: Quick reference, home page, tutorial (LLNL).

Compiling code:
> mpicxx [compiler options] -fopenmp skalar.cpp -o main.GCC_
> mpirun -np 4 ./main.GCC_
> mpirun -np 4 --hostfile my_hostfile ./main.GCC_
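The hostfile passed to mpirun lists the machines on which processes may be started. A minimal sketch of what `my_hostfile` could contain (host names and slot counts are illustrative, not from the original):

```
# my_hostfile -- one machine per line;
# "slots" caps how many MPI processes mpirun may place on that host
node01 slots=2
node02 slots=2
```

With `-np 4` and this hostfile, mpirun would place two processes on each of the two hosts.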
Changing the underlying compiler for Open MPI (briefly):
> export OMPI_CXX="icpc -openmp"
Parallelizing the inner product with OpenMP+MPI:
Original code for the inner product:

double scalar(const int N, const double x[], const double y[])
{
  double sum = 0.0;
  for (int i = 0; i < N; ++i) {
    sum += x[i] * y[i];
  }
  return sum;
}
int main()
{
...
double s = scalar(n,a,b);
...
}
#include <mpi.h>
// local inner product, OpenMP-parallel
// (the loop variable i is declared in the for statement and is implicitly
//  private, so no private(i) clause is needed -- or even allowed -- here)
double scalar(const int N, const double x[], const double y[])
{
  double sum = 0.0;
  #pragma omp parallel for shared(x,y) schedule(static) reduction(+:sum)
  for (int i = 0; i < N; ++i) {
    sum += x[i] * y[i];
  }
  return sum;
}

// MPI inner product: combine the process-local results
double scalar(const int n, const double x[], const double y[], const MPI_Comm icomm)
{
  const double s = scalar(n, x, y);   // call the local inner product
  double sg;
  MPI_Allreduce(&s, &sg, 1, MPI_DOUBLE, MPI_SUM, icomm);
  return sg;
}
int main(int argc, char* argv[])
{
...
MPI_Init(&argc,&argv);
...
double s = scalar(n,a,b,MPI_COMM_WORLD);
...
MPI_Finalize();
...
}
Compiling with GCC and PGI:
> mpicxx [compiler options] -fopenmp skalar.cpp -o main.GCC_
> pgc++ -Mmpi=mpich -fast -mp skalar.cpp -o main.PGI_
Each MPI process spawns OMP_NUM_THREADS OpenMP threads, set either via the environment
> export OMP_NUM_THREADS=2
or in the code via
omp_set_num_threads(2);
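Putting the pieces together, a typical hybrid launch might look as follows (the process and thread counts are illustrative; 4 processes with 2 threads each occupy 8 cores in total):

```
> export OMP_NUM_THREADS=2
> mpirun -np 4 ./main.GCC_
```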