@@ -669,17 +669,23 @@ Tuning Guide*, which can be found in the `Cornelis Networks Customer Center
When do I need to select a CUDA device?
---------------------------------------

- "mpi-cuda-dev-selection"
-
- OpenMPI requires CUDA resources allocated for internal use. These
- are allocated lazily when they are first needed, e.g. CUDA IPC mem handles
- are created when a communication routine first requires them during a
- transfer. So, the CUDA device needs to be selected before the first MPI
- call requiring a CUDA resource. MPI_Init and most communicator related
- operations do not create any CUDA resources (guaranteed for MPI_Init,
- MPI_Comm_rank, MPI_Comm_size, MPI_Comm_split_type and MPI_Comm_free). It
- is thus possible to use those routines to query rank information and use
- those to select a GPU, e.g. using
+ Open MPI requires CUDA resources to be allocated for internal use. When
+ possible, these resources are allocated lazily when they are first needed,
+ e.g. CUDA IPC memory handles are created when a communication routine first
+ requires them during a transfer. MPI_Init and most communicator-related
+ operations do not create any CUDA resources (guaranteed at least for
+ MPI_Comm_rank and MPI_Comm_size on ``MPI_COMM_WORLD``).
+
+ However, this is not always the case. In certain instances, such as when
+ using PSM2 or the ``smcuda`` BTL (with the OB1 PML), it is not feasible to
+ delay the CUDA resource allocation, and these resources are instead
+ allocated during ``MPI_Init()``.
+
+ Regardless of the situation, the CUDA device must be selected before the
+ first MPI call that requires a CUDA resource. When CUDA resources are
+ initialized lazily, the communicator operations listed above can be used to
+ query rank information and select a GPU, as shown below.
+


.. code-block:: c
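The body of the code block above continues past the end of this hunk. As an
illustration only, not the docs' own example, here is a minimal sketch of the
lazy-initialization pattern just described: it selects a device using nothing
but ``MPI_Comm_rank`` on ``MPI_COMM_WORLD``, and it assumes every node exposes
the same number of CUDA devices so that the world rank can stand in for a
node-local rank.

.. code-block:: c

   #include <mpi.h>
   #include <cuda_runtime.h>

   int main(int argc, char *argv[])
   {
       /* Safe only when CUDA resources are initialized lazily; with PSM2
          or the smcuda BTL, MPI_Init itself allocates CUDA resources. */
       MPI_Init(&argc, &argv);

       /* Guaranteed not to create CUDA resources. */
       int rank;
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       /* Assumption: the same number of visible CUDA devices on every
          node, so the world rank works as a stand-in for a local rank. */
       int num_devices = 0;
       cudaGetDeviceCount(&num_devices);
       if (num_devices > 0)
           cudaSetDevice(rank % num_devices);

       /* CUDA-aware MPI transfers may follow from here on. */

       MPI_Finalize();
       return 0;
   }

For the PSM2/``smcuda`` case, the device has to be chosen before
``MPI_Init()``, when no MPI calls are available yet. One common workaround,
sketched below under the assumption that the job is launched with Open MPI's
``mpirun`` (which exports ``OMPI_COMM_WORLD_LOCAL_RANK``; other launchers use
other variable names), is to read a launcher-provided local rank from the
environment:

.. code-block:: c

   #include <stdlib.h>
   #include <mpi.h>
   #include <cuda_runtime.h>

   int main(int argc, char *argv[])
   {
       /* Pick the device before MPI_Init, since MPI_Init allocates CUDA
          resources when PSM2 or the smcuda BTL is in use. */
       const char *lr = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
       int local_rank = lr ? atoi(lr) : 0;

       int num_devices = 0;
       cudaGetDeviceCount(&num_devices);
       if (num_devices > 0)
           cudaSetDevice(local_rank % num_devices);

       MPI_Init(&argc, &argv);
       /* ... communication and computation ... */
       MPI_Finalize();
       return 0;
   }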