Multi-GPU training for a Unet-based segmentation model

I am trying to take advantage of PyTorch’s multi-GPU support on a single machine by using nn.DataParallel.

Note: I am using a framework called fastai2, which builds on top of PyTorch, so my scripts will have a bit of that sprinkled in.

import numpy as np
from fastai2.vision.all import *  # assumed: the module name was missing from the post
from fastai2.distributed import *

def train():
    path = untar_data(URLs.CAMVID_TINY)

    def label_func(fn): 
        return path/"labels"/f"{fn.stem}_P{fn.suffix}"

    codes = np.loadtxt(path/'codes.txt', dtype=str)
    fnames = get_image_files(path/"images")
    dls = SegmentationDataLoaders.from_label_func(
        path, bs=8, fnames=fnames, label_func=label_func, codes=codes
    )

    learner = unet_learner(dls, resnet34).to_fp16()
    if torch.cuda.device_count() > 1:
        wrapped_model = nn.DataParallel(learner.model)
        learner.model = wrapped_model.module

    callbacks = [
        EarlyStoppingCallback(min_delta=0.001, patience=5)
    ]

    learner.fine_tune(20, freeze_epochs=2, wd=0.01, base_lr=0.0006, cbs=callbacks)

if __name__ == "__main__":
    train()

unet_learner returns a Learner whose model attribute is an nn.Module, and that is what I am trying to wrap with nn.DataParallel.


This does not seem to have the intended effect. I am still only able to use 1 GPU.

I also tried changing the batch size (bs in SegmentationDataLoaders), but that made no difference other than running out of GPU memory. 🙂

Any ideas on what I might be missing?

I’m a bit confused about what these lines are doing:

wrapped_model = nn.DataParallel(learner.model)
learner.model = wrapped_model.module

I assume that learner.model is an nn.Module and wrapped_model would be an nn.DataParallel object, so wouldn't you be reassigning the original module back to learner.model?
What would happen if you removed the second line of code?
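
To make the question concrete, here is a minimal sketch of what I think is happening. It uses a plain nn.Linear as a stand-in for learner.model (that stand-in and the variable names reassigned and kept are my assumptions, not your actual setup), and it only checks object identity, so it should behave the same with or without a GPU:

```python
from torch import nn

# Hypothetical stand-in for learner.model; any nn.Module behaves the same here.
model = nn.Linear(4, 2)

wrapped_model = nn.DataParallel(model)

# .module hands back the original, unwrapped network ...
print(wrapped_model.module is model)  # True

# ... so mirroring the second line of the snippet drops the wrapper again.
reassigned = wrapped_model.module
print(isinstance(reassigned, nn.DataParallel))  # False

# Keeping the wrapper itself is what leaves DataParallel in the forward path.
kept = wrapped_model
print(isinstance(kept, nn.DataParallel))  # True
```

If that matches your case, the model the learner trains with would be the same single-device module you started from, which would be consistent with only one GPU being used.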

Sorry about that. My code snippet was incorrect. I was actually assigning the nn.DataParallel instance to learner.model. Let me send you the error message I saw first thing tomorrow.