I am using BatchNorm1d following a linear layer, so the input to BatchNorm1d is 2-dimensional. But I get this error message:

torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
allow_unreachable=True)  # allow_unreachable flag
RuntimeError: tensor does not have a device

The PyTorch version is 1.4.0. However, this code works with PyTorch 0.1.12. Does anyone have any ideas on fixing this bug? Thanks a lot.
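For context, the relevant arrangement looks roughly like this (a minimal sketch with made-up dimensions, not my actual model):

import torch
import torch.nn as nn

fc = nn.Linear(128, 64)     # linear layer
bn = nn.BatchNorm1d(64)     # batch norm applied to the 2D linear output

x = torch.randn(32, 128)    # [batch_size, in_features]
out = bn(fc(x))             # shape [32, 64]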
Could you update PyTorch to 1.5.1 and rerun the code? If you are still seeing this issue, could you post a code snippet to reproduce it, please?
Thanks for your reply. I still get the same error:

RuntimeError: tensor does not have a device (device at /opt/conda/conda-bld/pytorch_1591914880026/work/c10/core/TensorImpl.h:463)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7f178facab5e in /anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: at::Tensor::options() const + 0x1ef (0x7f17bce2675f in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #2: at::native::empty_like(at::Tensor const&, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) + 0x36 (0x7f17b7c5e556 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0xe912d1 (0x7f17b7f6b2d1 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0xee45d3 (0x7f17b7fbe5d3 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x26c805a (0x7f17925ec05a in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: std::tuple<at::Tensor, at::Tensor, at::Tensor> at::native::batch_norm_backward_cuda_template<float, float, int>(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, bool, double, std::array<bool, 3ul>) + 0x17f (0x7f179260d77f in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: at::native::batch_norm_backward_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, bool, double, std::array<bool, 3ul>) + 0x2ce (0x7f17925eea2e in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0xde3c3c (0x7f1790d07c3c in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #9: <unknown function> + 0xe21143 (0x7f17b7efb143 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x2884df4 (0x7f17b995edf4 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0xe21143 (0x7f17b7efb143 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x25f0429 (0x7f17b96ca429 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::generated::NativeBatchNormBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x338 (0x7f17b96ca9a8 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x2ae7df5 (0x7f17b9bc1df5 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7f17b9bbf0f3 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7f17b9bbfed2 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f17b9bb8549 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f17bd108638 in /anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #19: <unknown function> + 0xb8408 (0x7f17bf958408 in /anaconda3/lib/python3.7/site-packages/torch/lib/…/…/…/…/./libstdc++.so.6)
frame #20: <unknown function> + 0x7e65 (0x7f17e71c6e65 in /lib64/libpthread.so.0)
frame #21: clone + 0x6d (0x7f17e6eef88d in /lib64/libc.so.6)
My code is a little bit long; this is the relevant part I am running. I'm not sure whether we need to register the parameters?
# (inside __init__)
self.register_buffer('prior_mean', prior_mean)
self.register_buffer('prior_var', prior_var)
self.register_buffer('prior_logvar', prior_logvar)
# initialize decoder weight
if ac.init_mult != 0:
    # std = 1. / math.sqrt(ac.init_mult * (ac.num_topic + ac.num_input))
    self.decoder.weight.data.uniform_(0, ac.init_mult)
# remove BN's scale parameters
self.logvar_bn.register_parameter('weight', None)
self.mean_bn.register_parameter('weight', None)
self.decoder_bn.register_parameter('weight', None)

def forward(self, input, compute_loss=False, avg_loss=True):
    # compute posterior
    en1 = F.softplus(self.en1_fc(input))                     # encoder 1 output
    en2 = F.softplus(self.en2_fc(en1))                       # encoder 2 output
    en2 = self.en2_drop(en2)
    posterior_mean = self.mean_bn(self.mean_fc(en2))         # posterior mean
    posterior_logvar = self.logvar_bn(self.logvar_fc(en2))   # posterior log variance
    posterior_var = posterior_logvar.exp()
I think self.logvar_bn.register_parameter('weight', None) is at fault. Could you explain why, after declaring logvar_bn = nn.BatchNorm1d, we need to register its weight parameter as None? Thanks. Does this mean we are not going to use batch norm?
The affine parameters won't be used if you register the weight and bias as None.
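In other words, something like the following (a minimal sketch, not code from this thread):

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
bn.register_parameter('weight', None)   # drop the learnable scale (gamma)
bn.register_parameter('bias', None)     # drop the learnable shift (beta)

# this behaves like constructing the layer without affine parameters:
bn_no_affine = nn.BatchNorm1d(4, affine=False)

x = torch.randn(8, 4)
out = bn(x)   # normalization only, no learnable scale/shift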
The PyTorch nn.BatchNorm implementation does the same in these lines of code.
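Paraphrased, the relevant pattern in _BatchNorm.__init__ looks roughly like this (details vary between PyTorch versions):

if self.affine:
    self.weight = nn.Parameter(torch.ones(num_features))
    self.bias = nn.Parameter(torch.zeros(num_features))
else:
    self.register_parameter('weight', None)
    self.register_parameter('bias', None)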
I'm not familiar with the linked repository, but could you replace it with a valid parameter (or just remove these lines) if you think they might cause the issue?
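For example, instead of the register_parameter('weight', None) calls, something along these lines might work (my own sketch for keeping the scale fixed at 1, not verified against the linked repository):

# keep a valid weight tensor, but freeze it at 1 instead of removing it
self.logvar_bn.weight.data.fill_(1.0)
self.logvar_bn.weight.requires_grad = False
self.mean_bn.weight.data.fill_(1.0)
self.mean_bn.weight.requires_grad = False
self.decoder_bn.weight.data.fill_(1.0)
self.decoder_bn.weight.requires_grad = False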