CUDA C/C++ has been successfully integrated into MOHID. CUDA is an NVIDIA technology that makes it easier to program graphics cards for general-purpose computation. Graphics cards (GPUs) bring performance gains for applications that can be massively parallelized. Hydrodynamic models are well suited to this because of their 2D or 3D grid structure.
The Thomas algorithm (a widely used tridiagonal solver) can now run in CUDA while being called from within MOHID. Performance tests show that on a TESLA C1060 (a standard CUDA card, currently priced at around €1200) the Thomas algorithm runs 5.3x faster in the Z direction and 7.8x faster in the X-Y direction. These speedups are relative to the FORTRAN implementation running on one core of a high-end CPU. In addition, the overall execution time of MOHID was reduced through memory optimizations (array padding and page-locked memory).
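For readers unfamiliar with the solver, the following is a minimal sketch of the textbook Thomas algorithm in plain C++. It is not the MOHID or CUDA code from this work. The function name thomas_solve and the array layout are assumptions made for illustration, and the sketch assumes a system that needs no pivoting (for example a diagonally dominant one).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook Thomas algorithm for a tridiagonal system of size n.
//   a: sub-diagonal   (a[0] is unused)
//   b: main diagonal
//   c: super-diagonal (c[n-1] is unused)
//   d: right-hand side
// Hypothetical helper for illustration; assumes no pivoting is needed.
std::vector<double> thomas_solve(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 const std::vector<double>& c,
                                 const std::vector<double>& d)
{
    const std::size_t n = b.size();
    std::vector<double> cp(n), dp(n), x(n);
    if (n == 0) {
        return x;
    }

    // Forward sweep: eliminate the sub-diagonal.
    cp[0] = c[0] / b[0];
    dp[0] = d[0] / b[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double m = b[i] - a[i] * cp[i - 1];
        cp[i] = c[i] / m;
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
    }

    // Back substitution: solve from the last unknown upwards.
    x[n - 1] = dp[n - 1];
    for (std::size_t i = n - 1; i-- > 0; ) {
        x[i] = dp[i] - cp[i] * x[i + 1];
    }
    return x;
}
```

Each solve is sequential along its own line of cells, but the systems belonging to different grid lines are independent of each other. A GPU version would typically solve many of them at once, for instance one system per thread. That mapping is an assumption here, not a description of the actual implementation.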
A rough estimate of the potential speedup from running the complete hydrodynamic module of MOHID in CUDA is 15x to 50x on a TESLA C1060. In terms of computation time, a simulation that currently takes a month of wall-clock time would take two days or less. The estimated gain is higher than the speedup achieved so far because the Thomas algorithm on its own requires a lot of communication between host (CPU) and device (GPU). If the complete model resides on the GPU, this communication overhead can become insignificant. Of course, there will always be some overhead from code that cannot be parallelized (the blue part in figure 2, estimated at 5%).
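The effect of host-device communication on the achieved speedup can be illustrated with a very simple timing model. This is only a sketch: the function names effective_speedup and days_after_speedup are hypothetical, and the model (a fixed transfer cost added to the accelerated compute time) is an assumption, not the method used to produce the estimates above.

```cpp
#include <cassert>
#include <cmath>

// Overall speedup when the computation itself is accelerated by
// kernel_speedup, but host<->device transfers add a fixed cost.
// cpu_time and transfer_time must use the same time unit.
// Hypothetical model for illustration only.
double effective_speedup(double cpu_time, double transfer_time, double kernel_speedup)
{
    const double gpu_time = transfer_time + cpu_time / kernel_speedup;
    return cpu_time / gpu_time;
}

// Wall-clock duration of a run after applying an overall speedup.
double days_after_speedup(double days_now, double speedup)
{
    return days_now / speedup;
}
```

In this model, a computation accelerated 50x with no transfer cost gets the full 50x overall, while the same computation with a transfer cost equal to 8% of the original CPU time drops to 10x overall. Taking a month as 30 days, days_after_speedup(30, 15) gives 2 days and days_after_speedup(30, 50) gives 0.6 days, which is consistent with the "two days or less" figure.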
Two reports have been produced, describing the work and the main results:
This work was carried out by Jonathan van der Wielen, a 24-year-old Dutch Technical Computer Science student who is currently working on the final project for his Bachelor's degree. The preliminary results presented here were achieved after integrating CUDA into MOHID, in less than three months of work and with relatively inexpensive equipment. Jonathan believes CUDA has enough potential to convince FORTRAN programmers to make the effort of learning C++.

Figure 1: Performance test results for the Thomas algorithm in CUDA and FORTRAN (total execution time)

Figure 2: Rough estimate of the performance gain for ModuleHydroDynamic