Where can I find a complete example for DataParallel on version 0.4.1?

I want to train my model on multiple GPUs, but I get many unexpected errors when implementing the loss function and the backward pass. I need an example that works on version 0.4.1.

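    model = SoP.SoP_model(*args)
    # One dict per parameter group: the first group uses the default lr (LR),
    # the second overrides it with args.DRNlr.
    optimizer = torch.optim.Adam(
        [{'params': model.audio_s.parameters()},
         {'params': model.drn_model.parameters(), 'lr': args.DRNlr}],
        lr=LR,
        weight_decay=WEIGTH_DECAY)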

Here you can see how to set different parameters for different parts of the model. Initialization, loading pretrained weights and the whole backward process are covered below.
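To make the per-group settings easier to check, here is a minimal sketch on a toy model. `TwoPartModel`, `encoder`, `head` and the learning-rate values are made up for illustration and are not part of the original script; only the `torch.optim.Adam` parameter-group syntax is the same.

```python
import torch
import torch.nn as nn


class TwoPartModel(nn.Module):
    """Hypothetical stand-in for a model made of two submodules."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 4)
        self.head = nn.Linear(4, 1)

    def forward(self, x):
        return self.head(torch.relu(self.encoder(x)))


model = TwoPartModel()
optimizer = torch.optim.Adam(
    [
        {'params': model.encoder.parameters()},           # falls back to the default lr
        {'params': model.head.parameters(), 'lr': 1e-4},  # overrides lr for this group
    ],
    lr=1e-3,            # default for groups that do not set their own lr
    weight_decay=1e-5,  # applied to every group
)

# Each group keeps its own copy of the hyperparameters.
group_lrs = [group['lr'] for group in optimizer.param_groups]
group_decays = [group['weight_decay'] for group in optimizer.param_groups]
print(group_lrs, group_decays)
```

Any option that a group does not set explicitly is taken from the keyword arguments passed to the optimizer, which is why only the second group needs an `'lr'` entry.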

    criterion = torch.nn.BCEWithLogitsLoss(size_average=size_average)

    def init_weights(m):
        # Xavier-initialize only the Conv2d layers
        if isinstance(m, nn.Conv2d):
            nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain('conv2d'))

    if Pretrained is not None:
        print('Loading pretrained weights')
        model.load_state_dict(torch.load(Pretrained))
    else:
        model.unet_model.apply(init_weights)

    if freezeUNET:
        # Frozen parameters receive no gradients during backward
        for param in model.unet_model.parameters():
            param.requires_grad = False

    model.train()
    if CUDA:
        # Replicas run on all visible GPUs; outputs are gathered on GPU 1
        model = torch.nn.DataParallel(model, output_device=1).cuda()

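The block above has a little of everything: the loss, weight initialization, loading pretrained weights, freezing a submodule and wrapping the model in DataParallel.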
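The initialization and freezing steps can be checked in isolation, without any GPU. The following is only a sketch: `backbone`, `head` and `net` are hypothetical toy modules standing in for the real submodules, and passing only the trainable parameters to the optimizer is one common option, not something the original script does.

```python
import torch
import torch.nn as nn


def init_weights(m):
    # Only touch Conv2d layers; other module types keep their default init.
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain('conv2d'))


# Hypothetical two-stage network: a conv "backbone" to freeze and a trainable head.
backbone = nn.Sequential(nn.Conv2d(1, 2, kernel_size=3, padding=1), nn.ReLU())
head = nn.Conv2d(2, 1, kernel_size=1)
net = nn.Sequential(backbone, head)
net.apply(init_weights)  # apply() visits every submodule recursively

# Freeze the backbone: autograd will not compute gradients for these tensors.
for param in backbone.parameters():
    param.requires_grad = False

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

x = torch.randn(4, 1, 8, 8)
loss = net(x).mean()
loss.backward()

frozen_grads = [p.grad for p in backbone.parameters()]
head_grads = [p.grad for p in head.parameters()]
print(all(g is None for g in frozen_grads), all(g is not None for g in head_grads))
```

After `backward()` the frozen parameters still have `grad` set to `None`, while the head has gradients, so the optimizer step only updates the head.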

    for t in range(EPOCHS):
        for j in range(iterations):
            audio, video, gt = loader()
            video = video.float()
            audio = audio.float()
            if CUDA:
                # Since 0.4.0 tensors no longer need a Variable wrapper, and
                # `async=True` was renamed to `non_blocking=True`.
                # gt goes to GPU 1 because output_device=1 gathers the output there.
                gt = gt.cuda(1, non_blocking=True)
                video = video.cuda(1)
                audio = audio.cuda(1)
            output = model(video, audio)
            loss = criterion(output, gt.float())
            # compute gradient and do the optimizer (Adam) step
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

Note that I'm not using the default DataLoader, so the data-loading part will look different in your code.
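For reference, here is a minimal self-contained sketch of the same pattern using only calls available since 0.4 (`torch.device`, `.to()`, `loss.item()`). The toy model, the random data and the hyperparameters are assumptions chosen for illustration. It wraps the model in DataParallel only when more than one GPU is visible, so it also runs on a single GPU or on the CPU.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Toy binary classifier (hypothetical; replace with your own model).
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))
if torch.cuda.device_count() > 1:
    # Replicates the module on every visible GPU and splits each batch along dim 0.
    model = nn.DataParallel(model)
model = model.to(device)  # parameters must live on the first device

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Random toy data: the label is 1 when the features sum to a positive number.
inputs = torch.randn(32, 16)
targets = (inputs.sum(dim=1, keepdim=True) > 0).float()

losses = []
model.train()
for step in range(50):
    x = inputs.to(device)
    y = targets.to(device)
    output = model(x)            # with DataParallel, gathered on the first device by default
    loss = criterion(output, y)  # loss is computed on that device
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Because the default `output_device` is the first device, the targets only need to be on that device. Passing `output_device=1` as in the code above means the targets must live on GPU 1 instead.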