Access GPU partitions in MIG


I have been given access to a GPU cluster whose GPUs (2x NVIDIA A100 80GB) are partitioned into sub-elements using MIG…

Unfortunately, I cannot find an example that shows how to access a partition via the UUID of the sub-element (MIG-11c29e81-e611-50b5-b5ef-609c0a0fe58b)… Or rather, how do I tell torch to use it?

device("cuda:0") would not be enough, since it only identifies the GPU the partition is placed on…

You could use CUDA_VISIBLE_DEVICES to specify the desired MIG instance.

This variable does not exist on my system… Stack Overflow suggests setting it with "export CUDA_VISIBLE_DEVICES=0", but what would CUDA_VISIBLE_DEVICES then give me? I already know all the IDs.

You have to define this env variable as given in the user guide:

$ nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-e86cb44c-6756-fd30-cd4a-1e6da3caf9b0)
  MIG 3g.20gb Device 0: (UUID: MIG-GPU-e86cb44c-6756-fd30-cd4a-1e6da3caf9b0/1/0)
  MIG 3g.20gb Device 1: (UUID: MIG-GPU-e86cb44c-6756-fd30-cd4a-1e6da3caf9b0/2/0)

$ CUDA_VISIBLE_DEVICES=MIG-GPU-e86cb44c-6756-fd30-cd4a-1e6da3caf9b0/1/0 ./BlackScholes &
$ CUDA_VISIBLE_DEVICES=MIG-GPU-e86cb44c-6756-fd30-cd4a-1e6da3caf9b0/2/0 ./BlackScholes &
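For the PyTorch side of the question, the same idea can be sketched in Python: set CUDA_VISIBLE_DEVICES to the MIG UUID before CUDA is initialized (safest is before importing torch), after which the selected instance should be the only visible device and therefore "cuda:0". The helper names parse_mig_uuids and select_mig_device are hypothetical, and the sample text is just the nvidia-smi -L listing from above; on a real node you would read the listing from nvidia-smi itself.

```python
import os
import re


def parse_mig_uuids(listing: str) -> list:
    """Return the MIG device UUIDs found in `nvidia-smi -L` output, in order."""
    # Only UUIDs that start with "MIG-" are kept; the parent "GPU-..." lines are skipped.
    return re.findall(r"\(UUID:\s*(MIG-[^)\s]+)\)", listing)


def select_mig_device(uuid: str) -> None:
    """Expose a single MIG instance to this process (hypothetical helper)."""
    # Must run before CUDA is initialized, so call it before importing torch.
    os.environ["CUDA_VISIBLE_DEVICES"] = uuid


# Sample listing copied from the post above; assumed to match your driver's format.
sample_listing = """\
GPU 0: A100-SXM4-40GB (UUID: GPU-e86cb44c-6756-fd30-cd4a-1e6da3caf9b0)
  MIG 3g.20gb Device 0: (UUID: MIG-GPU-e86cb44c-6756-fd30-cd4a-1e6da3caf9b0/1/0)
  MIG 3g.20gb Device 1: (UUID: MIG-GPU-e86cb44c-6756-fd30-cd4a-1e6da3caf9b0/2/0)
"""

mig_uuids = parse_mig_uuids(sample_listing)
select_mig_device(mig_uuids[0])

# Only now import torch; the chosen MIG instance is then addressed as "cuda:0":
#   import torch
#   device = torch.device("cuda:0")
```

Newer drivers print MIG UUIDs in the shorter form from the original question (MIG-11c29e81-…); the pattern above only relies on the "MIG-" prefix, so it should pick up both forms.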