Error when using multiple GPUs in my code

Hi, I want to run my project on two GPUs in parallel. After writing my code, I added the commands below: I wrapped my model for data parallelism and then passed gpus=2 to pl.Trainer.

def main():
    datamodule = DataModule(train_ds, val_ds)
    model = nn.DataParallel(mymodel(config))  # the model wrapped for data parallelism, as described above
    trainer = pl.Trainer(
        accelerator="gpu",
        gpus=2,
    )
    trainer.fit(model, datamodule)

if __name__ == "__main__":
    main()

But when I run it, I get the error below. How should I edit my source code? (I'd appreciate any help with this.)
TypeError: Trainer.fit() requires a LightningModule, got: DataParallel

Looks like this is something more related to Lightning. Could you post all of your code here so that we can better help you? Thanks!
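
In the meantime, one thing stands out from the error itself: Trainer.fit() wants the LightningModule directly, and Lightning handles multi-GPU placement on its own, so the model should not be wrapped in nn.DataParallel first. A rough sketch of what I mean (assuming mymodel is your LightningModule class, as in your snippet):

def main():
    datamodule = DataModule(train_ds, val_ds)
    model = mymodel(config)  # plain LightningModule, no nn.DataParallel wrapper
    trainer = pl.Trainer(
        accelerator="gpu",
        gpus=2,  # Lightning replicates the model across both GPUs itself
    )
    trainer.fit(model, datamodule)

if __name__ == "__main__":
    main()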

Hi, yes, sure. Thank you for your response.
I am posting my source code below; hopefully it shows the error better.

This is the data module:

class DataModule(pl.LightningDataModule):

  def __init__(self, train_dataset, val_dataset,  batch_size = 1):
    super(DataModule, self).__init__()
    self.train_dataset = train_dataset
    self.val_dataset = val_dataset
    self.batch_size = batch_size

  def train_dataloader(self):
    return DataLoader(self.train_dataset, batch_size = self.batch_size, collate_fn = collate_fn, shuffle = True, num_workers = 0, pin_memory = True)
  
  def val_dataloader(self):
    return DataLoader(self.val_dataset, batch_size = self.batch_size,collate_fn = collate_fn, shuffle = False, num_workers = 0, pin_memory = True)

And this is part of the model:

class LaTrForVQA(pl.LightningModule):
  def __init__(self, config , learning_rate = 1e-4, max_steps = 100000//2):
    super(LaTrForVQA, self).__init__()   
    self.config = config
    self.save_hyperparameters()
    self.latr =  LaTr_for_finetuning(config)
    self.training_losses = []
    self.validation_losses = []
    self.max_steps = max_steps

  def configure_optimizers(self):
    return torch.optim.AdamW(self.parameters(), lr = self.hparams['learning_rate'])

  def forward(self, batch_dict):
    boxes =   batch_dict['boxes']
    img =     batch_dict['img']
    question = batch_dict['question']
    words =   batch_dict['tokenized_words']
    answer_vector = self.latr(lang_vect = words, 
                               spatial_vect = boxes, 
                               img_vect = img, 
                               quest_vect = question
                               )
    return answer_vector

  def calculate_metrics(self, prediction, labels):

      ## Calculate the accuracy score between the predictions and ground-truth labels for a batch, taking the pad sequence into account
      batch_size = len(prediction)
      ac_score = 0

      for (pred, gt) in zip(prediction, labels):
        ac_score+= calculate_acc_score(pred.detach().cpu(), gt.detach().cpu())
      ac_score = ac_score/batch_size
      return ac_score

  def training_step(self, batch, batch_idx):
    answer_vector = self.forward(batch)

    ## https://discuss.huggingface.co/t/bertformaskedlm-s-loss-and-scores-how-the-loss-is-computed/607/2
    loss = nn.CrossEntropyLoss()(answer_vector.reshape(-1,self.config['classes']), batch['answer'].reshape(-1))
    _, preds = torch.max(answer_vector, dim = -1)

    ## Calculating the accuracy score
    train_acc = self.calculate_metrics(preds, batch['answer'])
    train_acc = torch.tensor(train_acc)

    ## Logging
    self.log('train_ce_loss', loss,prog_bar = True)
    self.log('train_acc', train_acc, prog_bar = True)
    self.training_losses.append(loss.item())

    return loss

  def validation_step(self, batch, batch_idx):
    logits = self.forward(batch)
    loss = nn.CrossEntropyLoss()(logits.reshape(-1,self.config['classes']), batch['answer'].reshape(-1))
    _, preds = torch.max(logits, dim = -1)

    ## Validation Accuracy
    val_acc = self.calculate_metrics(preds.cpu(), batch['answer'].cpu())
    val_acc = torch.tensor(val_acc)

    ## Logging
    self.log('val_ce_loss', loss, prog_bar = True)
    self.log('val_acc', val_acc, prog_bar = True)    
    return {'val_loss': loss, 'val_acc': val_acc}

  def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i, opt_closure = None, on_tpu=False,
    using_native_amp=False, using_lbfgs=False):
        if self.trainer.global_step < 1000:
            lr_scale = min(1., float(self.trainer.global_step + 1) / 1000.)
            for pg in optimizer.param_groups:
                pg['lr'] = lr_scale * self.hparams.learning_rate
        else:
            for pg in optimizer.param_groups:
                pg['lr'] = polynomial(self.hparams.learning_rate, self.trainer.global_step, max_iter = self.max_steps)
        optimizer.step(opt_closure)
        optimizer.zero_grad()

  def validation_epoch_end(self, outputs):       
        val_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        val_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
        self.log('val_loss_epoch_end', val_loss, on_epoch=True, sync_dist=True)
        self.log('val_acc_epoch_end', val_acc, on_epoch=True, sync_dist=True)
        
        self.val_prediction = []

model = LaTrForVQA(config)

And this is the trainer:

trainer = pl.Trainer(
    max_steps = max_steps,
    default_root_dir="runs",
    gpus=2,
    deterministic=True,
) 

Now I fit my model

datamodule = DataModule(train_ds, val_ds)
trainer.fit(model,datamodule)   

If you have any other questions, please ask.
I changed my code based on a suggestion to drop DataParallel and only use pl.Trainer(gpus=2), but with this code I still can't use both GPUs in parallel: only one GPU does any work, and I don't know what else I should change.

To enable parallel training in Lightning, you need to set a strategy; see:
accelerators — PyTorch Lightning 2.1.2 documentation.
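
For example, something along these lines could serve as a starting point (just a sketch, reusing the variables from your snippet; on recent versions accelerator="gpu" with devices=2 replaces the deprecated gpus argument):

trainer = pl.Trainer(
    max_steps=max_steps,
    default_root_dir="runs",
    accelerator="gpu",
    devices=2,
    strategy="ddp",  # or "ddp_spawn" / "dp", depending on how you launch the code
    deterministic=True,
)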

Hi,
I have tried several different strategies and none of them work; each gives a different error, and I can't find a solution. Here are some of the errors:

When I set strategy="ddp_spawn" (or "ddp"), I get this error:

MisconfigurationException: Trainer(strategy='ddp_spawn') is not compatible with an interactive environment. Run your code as a script, or choose one of the compatible strategies: Trainer(strategy=None|dp|ddp_fork). In case you are spawning processes yourself, make sure to include the Trainer creation inside the worker function

When I set strategy="ddp_fork", I get this error:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

When I set strategy="dp", I get this error:
AssertionError: Gather function not implemented for CPU tensors

When I set strategy="ddp_notebook", I get this error:
ValueError: 'ddp_notebook' is not a valid DistributedType

If the problem is in my source code, what should I change?
I would really appreciate your help with this.

How did you set up your multi-process environment? From these errors, it looks like Lightning is complaining about the way the processes are being launched.
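
In particular, the ddp_spawn message asks you to run the code as a plain script rather than in an interactive session (e.g. a notebook). Roughly, that means something like the sketch below, where train.py is just a hypothetical file name launched with python train.py, and DataModule, LaTrForVQA, config, train_ds and val_ds come from your code above:

# train.py (hypothetical) -- run with: python train.py
import pytorch_lightning as pl

def main():
    datamodule = DataModule(train_ds, val_ds)
    model = LaTrForVQA(config)
    trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp")
    trainer.fit(model, datamodule)

if __name__ == "__main__":
    main()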

If I have understood your point correctly: first of all, I created a virtual environment in Anaconda Navigator with Python 3.7, and my code can see both GPUs:

import torch
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())
print('__CUDA Device Name 1:', torch.cuda.get_device_name(0))
print('__CUDA Device Name 2:', torch.cuda.get_device_name(1))
print('__CUDA Device current:', torch.cuda.get_device_name(torch.cuda.current_device()))
print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(0).total_memory/1e9)
print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(1).total_memory/1e9)
print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(0).total_memory/1e9 + torch.cuda.get_device_properties(1).total_memory/1e9)

__CUDNN VERSION: 8302
__Number CUDA Devices: 2
__CUDA Device Name 1: NVIDIA GeForce GTX 1080 Ti
__CUDA Device Name 2: NVIDIA GeForce GTX 1070 Ti
__CUDA Device Total Memory [GB]: 11.721572352
__CUDA Device Total Memory [GB]: 8.514043904
__CUDA Device Total Memory [GB]: 20.235616256

The version of pytorch-lightning is 1.8.0.post1.

I have also tried different builds of the PyTorch package, and all of them give these errors.
It is worth mentioning that with the same setup I can use both of my GPUs with plain DataParallel, but I can't get multi-GPU training to work in pytorch-lightning.
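
To show what I mean by DataParallel working, this is roughly how I can already use both GPUs in plain PyTorch (a simplified sketch, not my exact training loop; the tensors are the ones from my batch dict above):

import torch.nn as nn

net = LaTr_for_finetuning(config)
net = nn.DataParallel(net, device_ids=[0, 1]).to("cuda:0")

# DataParallel splits each batch across both GPUs and gathers the outputs on GPU 0
answer_vector = net(lang_vect=words, spatial_vect=boxes, img_vect=img, quest_vect=question)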

If I have not understood your point, could you give me more guidance?

Hi @fduwjj,
This problem is very important to me. I would appreciate it if you could share your opinion on it. :pray:

Try replacing pl.LightningDataModule in class DataModule() with pl.LightningModule.

Did you find an answer? I’m having similar problems…

Unfortunately, no. I just use the GPU that has more RAM.