Failing to replicate model on multiple GPUs

Previously I raised issue #34941. After debugging it, I found a bug in the function `take_tensors`: https://github.com/pytorch/pytorch/blob/master/torch/csrc/utils/tensor_flatten.cpp#L10
The function loses track of a buffer in the following scenario.

Function input:
-> a list of 356 tensors, a size_limit, and fine_grained = false

Function steps:
At line https://github.com/pytorch/pytorch/blob/master/torch/csrc/utils/tensor_flatten.cpp#L55
we have 3 groups:
group 1 -> 1 element, size 0
group 2 -> 238 elements
group 3 -> 117 elements

Because group 1 has size 0, the condition at https://github.com/pytorch/pytorch/blob/master/torch/csrc/utils/tensor_flatten.cpp#L57 evaluates to true and group 1 is skipped.

At line https://github.com/pytorch/pytorch/blob/master/torch/csrc/utils/tensor_flatten.cpp#L62
the result contains only:
group 2 -> 238 elements
group 3 -> 117 elements

so only 238 + 117 = 355 tensors are carried forward, instead of 356.

This makes the internal assert fail, because 356 != 355.
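
To make the lost buffer concrete, here is a small pure-Python sketch of that grouping behaviour as I understand it (only an illustration, not the real C++ code): a group whose total byte size is 0 can still contain one real tensor, and skipping the whole group silently drops that tensor.

```python
# Simplified, pure-Python illustration of the behaviour I observed in
# take_tensors (not the actual C++ implementation). Each group is a list
# of per-tensor byte sizes; group 1 holds a single zero-element buffer,
# so its total size is 0 bytes.

groups = [
    [0],        # group 1: 1 tensor, 0 bytes total
    [4] * 238,  # group 2: 238 tensors (byte sizes here are arbitrary)
    [4] * 117,  # group 3: 117 tensors
]

results = []
for group in groups:
    # Mirrors the check at tensor_flatten.cpp#L57: a group whose total
    # size is 0 is skipped, even though it still contains a real tensor.
    if sum(group) == 0:
        continue
    results.append(group)

tensors_in = sum(len(g) for g in groups)     # 356
tensors_out = sum(len(g) for g in results)   # 355
print(tensors_in, tensors_out)  # 356 355 -> tensors.size() != order.size()
```

With this check, 356 tensors go in but only 355 come back out, which is exactly the mismatch the assert at #L74 reports.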

Here is the traceback of the bug:

"/opt/conda/envs/m2release/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__ result = self.forward(*input, **kwargs) File "/opt/conda/envs/m2release/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 154, in forward replicas = self.replicate(self.module, self.device_ids[:len(inputs)]) File "/opt/conda/envs/m2release/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 159, in replicate return replicate(module, device_ids, not torch.is_grad_enabled()) File "/opt/conda/envs/m2release/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 102, in replicate buffer_copies_not_rg = _broadcast_coalesced_reshape(buffers_not_rg, devices, detach=True) File "/opt/conda/envs/m2release/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 66, in _broadcast_coalesced_reshape return comm.broadcast_coalesced(tensors, devices) File "/opt/conda/envs/m2release/lib/python3.6/site-packages/torch/cuda/comm.py", line 39, in broadcast_coalesced return torch._C._broadcast_coalesced(tensors, devices, buffer_size) RuntimeError: tensors.size() == order.size() INTERNAL ASSERT FAILED at ../torch/csrc/utils/tensor_flatten.cpp:74, please report a bug to PyTorch.