Error when using multiple GPUs in my code

Hi, I want to run my project on two GPUs in parallel. After writing my code, I added the commands below: I wrapped my model for data parallelism and then passed gpus=2 to pl.Trainer.

def main():
    datamodule = DataModule(train_ds, val_ds)
    model = nn.DataParallel(mymodel(config))  # the model wrapped for data parallelism, as described above
    trainer = pl.Trainer(
        accelerator="gpu",
        gpus=2,
    )
    trainer.fit(model, datamodule)

if __name__ == "__main__":
    main()

But when I run it, I get the error below. How should I edit my source code? (I'd appreciate any help with this.)
TypeError: Trainer.fit() requires a LightningModule, got: DataParallel

Looks like this is something more related to Lightning. Could you post all of your code here so that we can better help you? Thanks!
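
In the meantime, one thing stands out from the error itself: Trainer.fit() wants the LightningModule directly, and Lightning handles multi-GPU placement on its own, so the model should not be wrapped in nn.DataParallel first. A rough sketch of what I mean (assuming mymodel is your LightningModule class, as in your snippet):

def main():
    datamodule = DataModule(train_ds, val_ds)
    model = mymodel(config)  # plain LightningModule, no nn.DataParallel wrapper
    trainer = pl.Trainer(
        accelerator="gpu",
        gpus=2,  # Lightning replicates the model across both GPUs itself
    )
    trainer.fit(model, datamodule)

if __name__ == "__main__":
    main()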

Hi, yes, sure. Thank you for your response.
I am posting my source code below; hopefully it shows the error better.

This is the data module:

class DataModule(pl.LightningDataModule):

  def __init__(self, train_dataset, val_dataset,  batch_size = 1):
    super(DataModule, self).__init__()
    self.train_dataset = train_dataset
    self.val_dataset = val_dataset
    self.batch_size = batch_size

  def train_dataloader(self):
    return DataLoader(self.train_dataset, batch_size = self.batch_size, collate_fn = collate_fn, shuffle = True, num_workers = 0, pin_memory = True)
  
  def val_dataloader(self):
    return DataLoader(self.val_dataset, batch_size = self.batch_size,collate_fn = collate_fn, shuffle = False, num_workers = 0, pin_memory = True)

And this is part of the model:

class LaTrForVQA(pl.LightningModule):
  def __init__(self, config , learning_rate = 1e-4, max_steps = 100000//2):
    super(LaTrForVQA, self).__init__()   
    self.config = config
    self.save_hyperparameters()
    self.latr =  LaTr_for_finetuning(config)
    self.training_losses = []
    self.validation_losses = []
    self.max_steps = max_steps

  def configure_optimizers(self):
    return torch.optim.AdamW(self.parameters(), lr = self.hparams['learning_rate'])

  def forward(self, batch_dict):
    boxes =   batch_dict['boxes']
    img =     batch_dict['img']
    question = batch_dict['question']
    words =   batch_dict['tokenized_words']
    answer_vector = self.latr(lang_vect = words, 
                               spatial_vect = boxes, 
                               img_vect = img, 
                               quest_vect = question
                               )
    return answer_vector

  def calculate_metrics(self, prediction, labels):

      ## Calculate the accuracy score between the predictions and ground-truth labels for a batch, taking the pad sequence into account
      batch_size = len(prediction)
      ac_score = 0

      for (pred, gt) in zip(prediction, labels):
        ac_score+= calculate_acc_score(pred.detach().cpu(), gt.detach().cpu())
      ac_score = ac_score/batch_size
      return ac_score

  def training_step(self, batch, batch_idx):
    answer_vector = self.forward(batch)

    ## https://discuss.huggingface.co/t/bertformaskedlm-s-loss-and-scores-how-the-loss-is-computed/607/2
    loss = nn.CrossEntropyLoss()(answer_vector.reshape(-1,self.config['classes']), batch['answer'].reshape(-1))
    _, preds = torch.max(answer_vector, dim = -1)

    ## Calculating the accuracy score
    train_acc = self.calculate_metrics(preds, batch['answer'])
    train_acc = torch.tensor(train_acc)

    ## Logging
    self.log('train_ce_loss', loss,prog_bar = True)
    self.log('train_acc', train_acc, prog_bar = True)
    self.training_losses.append(loss.item())

    return loss

  def validation_step(self, batch, batch_idx):
    logits = self.forward(batch)
    loss = nn.CrossEntropyLoss()(logits.reshape(-1,self.config['classes']), batch['answer'].reshape(-1))
    _, preds = torch.max(logits, dim = -1)

    ## Validation Accuracy
    val_acc = self.calculate_metrics(preds.cpu(), batch['answer'].cpu())
    val_acc = torch.tensor(val_acc)

    ## Logging
    self.log('val_ce_loss', loss, prog_bar = True)
    self.log('val_acc', val_acc, prog_bar = True)    
    return {'val_loss': loss, 'val_acc': val_acc}

  def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i, opt_closure = None, on_tpu=False,
    using_native_amp=False, using_lbfgs=False):
        if self.trainer.global_step < 1000:
            lr_scale = min(1., float(self.trainer.global_step + 1) / 1000.)
            for pg in optimizer.param_groups:
                pg['lr'] = lr_scale * self.hparams.learning_rate
        else:
            for pg in optimizer.param_groups:
                pg['lr'] = polynomial(self.hparams.learning_rate, self.trainer.global_step, max_iter = self.max_steps)
        optimizer.step(opt_closure)
        optimizer.zero_grad()

  def validation_epoch_end(self, outputs):       
        val_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        val_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
        self.log('val_loss_epoch_end', val_loss, on_epoch=True, sync_dist=True)
        self.log('val_acc_epoch_end', val_acc, on_epoch=True, sync_dist=True)
        
        self.val_prediction = []

model = LaTrForVQA(config)

And this is the trainer:

trainer = pl.Trainer(
    max_steps = max_steps,
    default_root_dir="runs",
    gpus=2,
    deterministic=True,
) 

Now I fit my model

datamodule = DataModule(train_ds, val_ds)
trainer.fit(model,datamodule)   

If you have any other questions, please ask.
I changed my code based on a suggestion to drop DataParallel and only use pl.Trainer(gpus=2), but with this code I still can't use both GPUs in parallel: only one GPU does any work, and I don't know what else I should change.

To enable parallel training in Lightning, you need to set a strategy; see:
accelerators — PyTorch Lightning 2.1.2 documentation.
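
For example, something along these lines could serve as a starting point (just a sketch, reusing the variables from your snippet; on recent versions accelerator="gpu" with devices=2 replaces the deprecated gpus argument):

trainer = pl.Trainer(
    max_steps=max_steps,
    default_root_dir="runs",
    accelerator="gpu",
    devices=2,
    strategy="ddp",  # or "ddp_spawn" / "dp", depending on how you launch the code
    deterministic=True,
)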

Hi,
I have tried several different strategies and none of them work; each gives a different error, and I can't find a solution. Here are some of the errors:

When I set strategy="ddp_spawn" (or "ddp"), I get this error:

MisconfigurationException: Trainer(strategy='ddp_spawn') is not compatible with an interactive environment. Run your code as a script, or choose one of the compatible strategies: Trainer(strategy=None|dp|ddp_fork). In case you are spawning processes yourself, make sure to include the Trainer creation inside the worker function

When I set strategy="ddp_fork", I get this error:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

When I set strategy="dp", I get this error:
AssertionError: Gather function not implemented for CPU tensors

When I set strategy="ddp_notebook", I get this error:
ValueError: 'ddp_notebook' is not a valid DistributedType

If the problem is in my source code, what should I change?
I would really appreciate your help with this.

How did you set up your multi-process environment? From these errors, it looks like Lightning is complaining about the way the processes are being launched.
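
In particular, the ddp_spawn message asks you to run the code as a plain script rather than in an interactive session (e.g. a notebook). Roughly, that means something like the sketch below, where train.py is just a hypothetical file name launched with python train.py, and DataModule, LaTrForVQA, config, train_ds and val_ds come from your code above:

# train.py (hypothetical) -- run with: python train.py
import pytorch_lightning as pl

def main():
    datamodule = DataModule(train_ds, val_ds)
    model = LaTrForVQA(config)
    trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp")
    trainer.fit(model, datamodule)

if __name__ == "__main__":
    main()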

If I have understood your point correctly: first of all, I created a virtual environment in Anaconda Navigator with Python 3.7, and my code can see both GPUs:

import torch
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())
print('__CUDA Device Name 1:', torch.cuda.get_device_name(0))
print('__CUDA Device Name 2:', torch.cuda.get_device_name(1))
print('__CUDA Device current:', torch.cuda.get_device_name(torch.cuda.current_device()))
print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(0).total_memory/1e9)
print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(1).total_memory/1e9)
print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(0).total_memory/1e9 + torch.cuda.get_device_properties(1).total_memory/1e9)

__CUDNN VERSION: 8302
__Number CUDA Devices: 2
__CUDA Device Name 1: NVIDIA GeForce GTX 1080 Ti
__CUDA Device Name 2: NVIDIA GeForce GTX 1070 Ti
__CUDA Device Total Memory [GB]: 11.721572352
__CUDA Device Total Memory [GB]: 8.514043904
__CUDA Device Total Memory [GB]: 20.235616256

The version of pytorch-lightning is 1.8.0.post1.

I have also tried different builds of the PyTorch package, and all of them give these errors.
It is worth mentioning that with the same setup I can use both of my GPUs with plain DataParallel, but I can't get multi-GPU training to work in pytorch-lightning.
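
To show what I mean by DataParallel working, this is roughly how I can already use both GPUs in plain PyTorch (a simplified sketch, not my exact training loop; the tensors are the ones from my batch dict above):

import torch.nn as nn

net = LaTr_for_finetuning(config)
net = nn.DataParallel(net, device_ids=[0, 1]).to("cuda:0")

# DataParallel splits each batch across both GPUs and gathers the outputs on GPU 0
answer_vector = net(lang_vect=words, spatial_vect=boxes, img_vect=img, quest_vect=question)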

If I have not understood your point, could you give me more guidance?

Hi @fduwjj,
This problem is very important to me. I would appreciate it if you could share your opinion on it. :pray:

Try replacing pl.LightningDataModule in class DataModule() with pl.LightningModule.

Did you find an answer? I’m having similar problems…

Unfortunately, no. I just use the GPU that has more RAM.