Internal assert failed from PyTorch, bug report

I'm writing a neural network with pytorch-lightning and DGL that uses multiple optimizers with manual_backward, and I'm training with DDP on a single GPU. When I run my code, the message below is displayed. How can I fix it?
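For context, here is a stripped-down sketch of the pattern my training_step follows. The module, loss, and optimizer names are placeholders for the real GCN/pooling code, and the exact manual-optimization API differs a bit between Lightning versions:

import pytorch_lightning as pl
import torch

class Model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # placeholders for the real GCN backbone and estimator networks
        self.backbone = torch.nn.Linear(16, 16)
        self.estimator = torch.nn.Linear(16, 1)

    def configure_optimizers(self):
        # one optimizer per sub-network
        opt_main = torch.optim.Adam(self.backbone.parameters(), lr=1e-3)
        opt_est = torch.optim.Adam(self.estimator.parameters(), lr=1e-3)
        return [opt_main, opt_est]

    def training_step(self, batch, batch_idx, optimizer_idx=0):
        opts = self.trainer.optimizers
        h = self.backbone(batch)
        L_main = h.pow(2).mean()          # placeholder main loss
        L_est = self.estimator(h).mean()  # placeholder estimator loss
        # backward on the estimator loss first, keeping the graph alive
        # so the main loss can be backpropagated afterwards; this is the
        # call that fails in the traceback below
        self.manual_backward(L_est, opts[1], retain_graph=True)
        self.manual_backward(L_main, opts[0])

The trainer is launched roughly as Trainer(gpus=1, distributed_backend='ddp', automatic_optimization=False) followed by trainer.fit(model, train_loader, val_loader), though the flag names vary across Lightning versions.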

Traceback (most recent call last):
  File "main.py", line 50, in <module>
    trainer.fit(model, train_loader, val_loader)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 439, in fit
    results = self.accelerator_backend.train()
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 146, in train
    results = self.ddp_train(process_idx=self.task_idx, model=model)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 279, in ddp_train
    results = self.train_or_test()
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 66, in train_or_test
    results = self.trainer.train()
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 482, in train
    self.train_loop.run_training_epoch()
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 541, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 678, in run_training_batch
    self.trainer.hiddens
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 760, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 304, in training_step
    training_step_output = self.trainer.accelerator_backend.training_step(args)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 156, in training_step
    output = self.trainer.model(*args)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/overrides/data_parallel.py", line 176, in forward
    output = self.module.training_step(*inputs[0], **kwargs[0])
  File "/afs/ece.cmu.edu/usr/xujinl/dynamic_grpah_pooling_rl/gcn_w_dyn_pool.py", line 195, in training_step
    self.manual_backward(L_est, opts[1], retain_graph=True)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 1081, in manual_backward
    self.trainer.train_loop.backward(loss, optimizer, -1, *args, **kwargs)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 781, in backward
    self.trainer.accelerator_backend.backward(result, optimizer, opt_idx, *args, **kwargs)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 98, in backward
    closure_loss.backward(*args, **kwargs)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: has_marked_unused_parameters_ INTERNAL ASSERT FAILED at /opt/conda/conda-bld/pytorch_1591914880026/work/torch/csrc/distributed/c10d/reducer.cpp:327, please report a bug to PyTorch.  (mark_variable_ready at /opt/conda/conda-bld/pytorch_1591914880026/work/torch/csrc/distributed/c10d/reducer.cpp:327)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7f27ea039b5e in /afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10d::Reducer::mark_variable_ready(c10d::Reducer::VariableIndex) + 0x9ba (0x7f2817a1b3aa in /afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #2: c10d::Reducer::autograd_hook(c10d::Reducer::VariableIndex) + 0x2d0 (0x7f2817a1b910 in /afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0x8a395c (0x7f2817a1095c in /afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x60d (0x7f281412d00d in /afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7f281412eed2 in /afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f2814127549 in /afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f2817677638 in /afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0xc819d (0x7f2819ed219d in /afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #9: <unknown function> + 0x7ea5 (0x7f2838eebea5 in /lib64/libpthread.so.0)
frame #10: clone + 0x6d (0x7f2838c148cd in /lib64/libc.so.6)

Exception ignored in: <function tqdm.__del__ at 0x7f27dcf30b90>
Traceback (most recent call last):
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/tqdm/std.py", line 1122, in __del__
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/tqdm/std.py", line 1335, in close
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/tqdm/std.py", line 1514, in display
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/tqdm/std.py", line 1125, in __repr__
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/tqdm/std.py", line 1475, in format_dict
TypeError: cannot unpack non-iterable NoneType object

Are you seeing the same issue without using Lightning?
Also, could you post an executable code snippet so that we can reproduce the issue?
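In case it helps, a minimal check without Lightning could look something like the sketch below: a single-process DDP group on one GPU, two optimizers, and two backward passes through the same forward, which is the pattern your traceback shows. All names here are illustrative:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # single-process "ddp" group on one GPU
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=0, world_size=1)
    torch.cuda.set_device(0)

    backbone = torch.nn.Linear(16, 16).cuda()
    estimator = torch.nn.Linear(16, 1).cuda()
    model = DDP(torch.nn.Sequential(backbone, estimator), device_ids=[0])

    opt_main = torch.optim.SGD(backbone.parameters(), lr=0.1)
    opt_est = torch.optim.SGD(estimator.parameters(), lr=0.1)

    out = model(torch.randn(4, 16).cuda())
    L_est = out.mean()
    L_main = out.pow(2).mean()

    # two backwards through one DDP forward, mirroring
    # manual_backward(L_est, opts[1], retain_graph=True)
    L_est.backward(retain_graph=True)
    L_main.backward()
    opt_est.step()
    opt_main.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

If that script triggers the same reducer assert, it points at DDP itself; if it runs cleanly, the problem is more likely in how Lightning drives the backward calls.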

I’ve confirmed it’s an error on the pytorch-lightning side. Thank you for the suggestion!