I am trying to take advantage of PyTorch's multi-GPU support on a single machine by using `nn.DataParallel`.

Note: I am using a framework called fastai2, which builds on top of PyTorch, so my scripts will have a bit of that sprinkled in.
```python
import numpy as np
from fastai2.vision.all import *
from fastai2.distributed import *

def train():
    path = untar_data(URLs.CAMVID_TINY)

    def label_func(fn):
        return path/"labels"/f"{fn.stem}_P{fn.suffix}"

    codes = np.loadtxt(path/'codes.txt', dtype=str)
    fnames = get_image_files(path/"images")
    dls = SegmentationDataLoaders.from_label_func(
        path, bs=8, fnames=fnames, label_func=label_func, codes=codes
    )

    learner = unet_learner(dls, resnet34).to_fp16()

    # wrap the model for multi-GPU training when more than one GPU is available
    if torch.cuda.device_count() > 1:
        wrapped_model = nn.DataParallel(learner.model)
        learner.model = wrapped_model.module

    callbacks = [
        EarlyStoppingCallback(min_delta=0.001, patience=5)
    ]
    learner.fine_tune(20, freeze_epochs=2, wd=0.01, base_lr=0.0006, cbs=callbacks)
    print('Done')

if __name__ == "__main__":
    train()
```
`unet_learner` returns a `Learner` whose `.model` attribute is an `nn.Module`, which is what I am trying to wrap with `nn.DataParallel`.
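
For reference, the pattern I am trying to reproduce is the standard one from the PyTorch docs, where the `DataParallel` wrapper is kept and called directly for the forward pass. A minimal sketch (the `ToyNet` module, tensor shapes, and batch size are made up for illustration):

```python
import torch
import torch.nn as nn

# hypothetical toy module, just to illustrate the wrapping pattern
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ToyNet()
if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch across the visible GPUs
    model = nn.DataParallel(model)
model.to(device)

# the wrapped model is used directly in the training loop
out = model(torch.randn(32, 10, device=device))
print(out.shape)  # torch.Size([32, 2])
```

With that pattern, `DataParallel` splits each batch along dimension 0 and scatters the chunks across the visible GPUs.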
Problem
This does not seem to have the intended effect. I am still only able to use 1 GPU.
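
In case it is relevant, this is roughly how I check what PyTorch can see (the printed values obviously depend on the machine):

```python
import torch

print(torch.cuda.is_available())   # expect True
print(torch.cuda.device_count())   # should be > 1 for DataParallel to do anything
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```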
I tried changing the batch size (`bs` in `SegmentationDataLoaders`) as well, and that did not make any difference other than running out of GPU memory.
Any ideas on what I might be missing?