Hello! I'm trying to train a U-Net on a dual-GPU setup using torch.nn.DataParallel. However, most of the time only the first GPU is utilized while the second one stays idle. On the other hand, if I look at memory usage using nvidia-smi, it seems the load is divided equally between the 2 GPUs:
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3023 C python 3639MiB |
| 1 N/A N/A 3023 C python 3503MiB |
+-----------------------------------------------------------------------------+
Tue Aug 3 21:02:49 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:05:00.0 Off | N/A |
| 27% 66C P2 228W / 250W | 3661MiB / 11178MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:86:00.0 Off | N/A |
| 20% 44C P2 74W / 250W | 3517MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
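To track this over time rather than from a single snapshot, per-GPU utilization and memory can be sampled separately. Below is a minimal sketch, assuming nvidia-smi is on the PATH and queried with `--query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits`. The helper names `parse_gpu_stats` and `sample_gpu_stats` are hypothetical, and the example string simply reuses the numbers from the output above.

```python
import subprocess

# Query only the fields of interest, as plain CSV without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse lines like '0, 100, 3661' into a list of per-GPU dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        stats.append({"gpu": int(index), "util_pct": int(util), "mem_mib": int(mem)})
    return stats


def sample_gpu_stats():
    """Run nvidia-smi once and return the parsed stats (needs an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


# Example with the values from the snapshot above (no GPU required):
example = "0, 100, 3661\n1, 0, 3517\n"
for row in parse_gpu_stats(example):
    print(row)
```

Calling `sample_gpu_stats()` in a loop during training would show whether GPU 1 stays at 0% utilization throughout or only between bursts of work, even though memory is allocated on both devices.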
This is how I load the model to the GPUs:
if torch.cuda.is_available():
    device = torch.device('cuda')
    model = torch.nn.DataParallel(model).to(device)
I am using batch as the first dimension of my tensors, so dim=0 is used by default. While training I transfer my tensors to the GPU like so:
imgA = imgA.float().cuda()
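Since DataParallel splits the input along dim=0, how much each GPU receives depends on the batch size. Below is a minimal pure-Python sketch of that split, assuming the scatter step follows torch.chunk semantics (each chunk holds the ceiling of batch size divided by device count, and fewer chunks are produced when the batch runs out). `chunk_sizes` is a hypothetical helper for illustration, not a PyTorch function.

```python
import math


def chunk_sizes(batch_size, num_devices):
    """Per-device batch sizes under torch.chunk-style splitting along dim 0."""
    if batch_size <= 0 or num_devices <= 0:
        raise ValueError("batch_size and num_devices must be positive")
    per_device = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_device, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


# How a batch would be divided between 2 GPUs for a few batch sizes.
for n in (1, 2, 5, 8):
    print(n, chunk_sizes(n, 2))
```

Under this assumption a batch size of 1 yields a single chunk, so the second GPU would receive no data in that step; a batch size of 2 or more gives both devices a share.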
My model and submodules are implemented via nn.Module.
This is my dataset:
import numpy as np
from PIL import Image
from torchvision.datasets import Cityscapes

class CityscapesExt(Cityscapes):
    voidClass = 19

    # Convert ids to train_ids
    id2trainid = np.array([label.train_id for label in Cityscapes.classes if label.train_id >= 0], dtype='uint8')
    id2trainid[np.where(id2trainid == 255)] = voidClass

    # Convert train_ids to colors
    mask_colors = [list(label.color) for label in Cityscapes.classes if label.train_id >= 0 and label.train_id <= 19]
    mask_colors.append([0,0,0])
    mask_colors = np.array(mask_colors)

    # List of valid class ids
    validClasses = np.unique([label.train_id for label in Cityscapes.classes if label.id >= 0])
    validClasses[np.where(validClasses == 255)] = voidClass
    validClasses = list(validClasses)

    # Create list of class names
    classLabels = [label.name for label in Cityscapes.classes if not (label.ignore_in_eval or label.id < 0)]
    classLabels.append('void')

    def __getitem__(self, index):
        filepath = self.images[index]
        image = Image.open(filepath).convert('RGB')

        targets = []
        for i, t in enumerate(self.target_type):
            if t == 'polygon':
                target = self._load_json(self.targets[index][i])
            else:
                target = Image.open(self.targets[index][i])
            targets.append(target)
        target = tuple(targets) if len(targets) > 1 else targets[0]

        if self.transforms is not None:
            if self.split == 'train':
                image_A, image_B, affine2_to_1, target, flip = self.transforms(image, target)
                target = self.id2trainid[target]
                return image_A, image_B, affine2_to_1, target, flip
num_workers=8 and pin_memory=True are also set on the DataLoader.
Am I missing something here, or is this expected behavior?