README

CUDAEASY v. 0.954499736104
Copyright (C) 2009/11 Jani Sainio jani.sainio at utu.fi
Distributed under the terms of the GNU General Public License.
Please cite arXiv:0911.5692 if you use this code in your research.

1 Introduction

CUDAEASY is a program that solves the evolution of interacting scalar fields in an expanding universe.
It uses NVIDIA GPUs through CUDA to compute this evolution faster than CPU implementations.
See the paper cited above for more information.
Files included:
- common_alt.mk (Modified CUDA .mk-file)
- cpu_dst.c (discrete sine transform using FFTW; used in the initialization procedures)
- cudaeasy.cu (The main program code)
- Makefile
- README

2 Installation and dependencies

CUDAEASY requires the CUDA drivers to be installed on your machine. To get the speed advantage, you should preferably also have a fast NVIDIA GPU (tested on the GT200 series). It also needs the CUFFT library, which should come with CUDA. Additionally, FFTW3 and the math library are needed. Output is written in the SILO format, which makes visualization in LLNL VisIt easy, so the SILO library is needed as well. Check and edit common_alt.mk to match your configuration. The current version has been tested only on Ubuntu 9.04, but it should also work on other distributions and operating systems. In the tested setup, the files were located in /NVIDIA_GPU_Computing_SDK/C/src/CUDAEASY, except for common_alt.mk, which was located in /NVIDIA_GPU_Computing_SDK/C/common.

3 Building and Running

Typing make in the folder where cudaeasy.cu is located should compile all the source files. The current version also keeps all intermediate files created during the build (e.g. .ptx files). After the build is complete, the binary can be found in /NVIDIA_GPU_Computing_SDK/C/bin/linux/release. You can run the program by typing ./cudaeasy. While running, the program prints the following information to the screen:
- the current step number
- the scale factor a
- 1./(H*a^(-3/2))
- the residual curvature K
- rho_av/(3*f*H²)
- the effective equation of state w
The file output path should be defined in the main() function of cudaeasy.cu.

4 Running your own models

The code has been tested with the chaotic inflation potential V(phi,psi) = ½m²phi² + ½g²phi²psi². For other models, you should first verify the output of this program against LATTICEEASY or DEFROST. Current GPUs have different numerical error behavior from CPUs, so this check is strongly recommended.
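The following worked example applies the dV convention used in the field_evo and field_evo_rho kernels to the tested potential. In that convention, dV is the derivative of V with respect to the evolved field, divided by that field's value. The subscripts on dV are added here only for readability. After simplification, neither expression divides by a field, so both stay finite when a field passes through zero. The potential term in each field equation is recovered by multiplying dV by the field value.

```latex
\begin{aligned}
V(\phi,\psi) &= \tfrac{1}{2} m^2 \phi^2 + \tfrac{1}{2} g^2 \phi^2 \psi^2, \\
dV_\phi \equiv \frac{1}{\phi}\,\frac{\partial V}{\partial \phi} &= m^2 + g^2 \psi^2, \\
dV_\psi \equiv \frac{1}{\psi}\,\frac{\partial V}{\partial \psi} &= g^2 \phi^2.
\end{aligned}
```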

To simulate your own model, you have to edit cudaeasy.cu.
Possible changes include:
- Different GRID_DIMX(E), GRIDDIMY(E) and GRIDDIMZ(E) values, which define the size of the lattice in units of thread blocks. Note that GRIDDIMZ(E) only defines the length in the z-direction; it is not related to the dimensions of the CUDA grid.
- Different BLOCK_DIMX(E) and BLOCKDIMY(E) values, which define the thread block size. BLOCKDIMZ(E) is used only for convenience and is not related to the dimensions of the thread block.
- Additional arguments to the field_evo and field_evo_rho kernels for any additional fields.
- The variable dV in the field_evo and field_evo_rho kernels. It is the derivative of the potential with respect to the field being evolved, divided by the field value, i.e. dV = 1/f*dV/df, where f is the field. Note that the coefficients of the scalar fields (for example, terms that depend on the scale factor a) are stored in constant memory. They can only be updated by host code, which is done in the cpu_evo and cpu_evo_rho functions.
- The variable gpe (gradient and potential energy) in the field_evo and field_evo_rho kernels, which depends on the potential V. Here too, the coefficients of the scalar fields are stored in constant memory and can only be updated by host code in the cpu_evo and cpu_evo_rho functions.
- The initial conditions in main().
- Allocation of host and device memory for any additional fields in main().
- Initialization of all fields in host and device memory in main().
- Updates to the h_coeff variables in main(), which hold the various coefficients (including the coefficients of the potential).
- Additional fields that also need to be evolved in the evolution loop in main(). The arguments of the functions called in the loop also have to be changed to include these fields.