See this page for a list of current compilers (including CUDA): https://github.com/trilinos/Trilinos/wiki/Pull-Request-Testing-Interface
Building Trilinos with CUDA support requires nvcc_wrapper, a script distributed with Kokkos inside Trilinos. To enable both CUDA and MPI when using OpenMPI, set the following environment variable:
export OMPI_CXX=/<Tpath>/Trilinos/Trilinos/packages/kokkos/config/nvcc_wrapper
where <Tpath> is the directory that contains your copy of Trilinos.
This variable tells mpicxx to use nvcc_wrapper as the underlying compiler. Note that nvcc_wrapper uses g++ as the default C++ host compiler.
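As a quick sanity check, the sketch below exports the variable and, if an mpicxx is found on PATH, asks it which underlying compiler it will invoke. NVCC_WRAPPER_PATH is a hypothetical helper variable introduced here for illustration, and its default value is only a placeholder. The --showme:command option is specific to OpenMPI's wrapper compilers, so the check assumes an OpenMPI installation.

```shell
# NVCC_WRAPPER_PATH is a hypothetical helper variable (not part of Trilinos or Kokkos).
# Set it to the location of nvcc_wrapper in your copy of Trilinos.
NVCC_WRAPPER_PATH="${NVCC_WRAPPER_PATH:-/path/to/nvcc_wrapper}"

# Tell OpenMPI's mpicxx to use nvcc_wrapper as its underlying C++ compiler.
export OMPI_CXX="$NVCC_WRAPPER_PATH"

# Warn early if the path does not point at an executable file.
if [ ! -x "$OMPI_CXX" ]; then
  echo "warning: $OMPI_CXX is not an executable file" >&2
fi

# OpenMPI only: print the underlying compiler command mpicxx will run.
if command -v mpicxx >/dev/null 2>&1; then
  mpicxx --showme:command || echo "mpicxx does not look like an OpenMPI wrapper" >&2
else
  echo "mpicxx not found on PATH" >&2
fi
```

If the setup is correct, the printed command should be the nvcc_wrapper path rather than g++ or another host compiler.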
The following CMake configure script fragment then configures Trilinos:
-DCMAKE_CXX_COMPILER=/<Mpath>/bin/mpicxx \
-DCMAKE_C_COMPILER=/<Mpath>/bin/mpicc \
-DCMAKE_Fortran_COMPILER=/<Mpath>/bin/mpif77 \
-DCMAKE_CXX_FLAGS="-g -lineinfo -Xcudafe \
--diag_suppress=conversion_function_not_usable -Xcudafe \
--diag_suppress=cc_clobber_ignored -Xcudafe \
--diag_suppress=code_is_unreachable" \
-DTPL_ENABLE_MPI=ON \
-DTPL_ENABLE_CUDA=ON \
-DKokkos_ENABLE_CUDA=ON \
where <Mpath> is the base directory of the OpenMPI installation used for the build.
The CMAKE_CXX_FLAGS line passes arguments through nvcc_wrapper that suppress several superfluous warnings generated by nvcc: each -Xcudafe forwards the --diag_suppress option that follows it to the CUDA front end.
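For orientation, here is one way the fragment above might be wrapped in a small script. This is a sketch only: MPI_BASE, TRILINOS_SRC, DRY_RUN and the function name configure_trilinos are hypothetical names chosen for this example, and the default paths are placeholders. A real configure would also need the package-enable and other options appropriate for your build, which are omitted here.

```shell
# Placeholders (assumptions): adjust to your OpenMPI install and Trilinos source tree.
MPI_BASE="${MPI_BASE:-/path/to/openmpi}"
TRILINOS_SRC="${TRILINOS_SRC:-/path/to/Trilinos}"

# Build the CMAKE_CXX_FLAGS value: debug info plus one
# "-Xcudafe --diag_suppress=<name>" pair per nvcc diagnostic to silence.
CXX_FLAGS="-g -lineinfo"
for diag in conversion_function_not_usable cc_clobber_ignored code_is_unreachable; do
  CXX_FLAGS="$CXX_FLAGS -Xcudafe --diag_suppress=$diag"
done

# Hypothetical helper: assemble the cmake arguments from the fragment above.
# With DRY_RUN=1 (the default here) it only prints the command, one argument per line.
configure_trilinos() {
  set -- \
    "-DCMAKE_CXX_COMPILER=$MPI_BASE/bin/mpicxx" \
    "-DCMAKE_C_COMPILER=$MPI_BASE/bin/mpicc" \
    "-DCMAKE_Fortran_COMPILER=$MPI_BASE/bin/mpif77" \
    "-DCMAKE_CXX_FLAGS=$CXX_FLAGS" \
    "-DTPL_ENABLE_MPI=ON" \
    "-DTPL_ENABLE_CUDA=ON" \
    "-DKokkos_ENABLE_CUDA=ON" \
    "$TRILINOS_SRC"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' cmake "$@"
  else
    cmake "$@"
  fi
}

# Show what would be run; set DRY_RUN=0 to actually invoke cmake.
configure_trilinos
```

Building the flag string in a loop keeps each suppressed diagnostic on its own word, which makes it easier to add or remove one without disturbing the quoting of CMAKE_CXX_FLAGS.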