Model evaluation fails when using GPUs but works well on CPU

My model is trained on multiple GPUs and training works fine. The problem happens during evaluation: it is strange to me because evaluation runs without any problem on the CPU, but on the GPU it raises the error below. Any help would be appreciated; thanks in advance.

(I’m using torch.nn.TransformerEncoder layers, and the error message seems to be related to the encoder mask, but I don’t know what to do. The shape of the mask is well defined, and there was no problem on the CPU or during training.)
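
For reference, the padding mask is built roughly like this (a minimal sketch with toy values; PAD_ID and the example tensors are illustrative, not my actual data). The mask has shape (batch, seq_len) and is True at the positions attention should ignore:

import torch

PAD_ID = 0  # assumed padding token id
seqs = torch.tensor([[5, 3, 7, PAD_ID],
                     [2, PAD_ID, PAD_ID, PAD_ID]])
pad_mask = seqs.eq(PAD_ID)  # shape (batch, seq_len), dtype torch.bool; True = padding
print(pad_mask)
# tensor([[False, False, False,  True],
#         [False,  True,  True,  True]])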


RuntimeError Traceback (most recent call last)
Input In [49], in <cell line: 4>()
10 pes = pes.to(device)
11 pads = pads.to(device)
---> 13 outputs = model(seqs, pes, pads)

File ~/anaconda3/envs/rna_ss/lib/python3.9/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
1126 # If we don't have any hooks, we want to skip the rest of the logic in
1127 # this function, and just call forward.
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/RSSP/RSSP/model.py:95, in Model.forward(self, seqs, pes, pads, verbose)
92 def forward(self, seqs, pes, pads, verbose=False):
93 input, pe = self.input_generator(seqs, pes)
---> 95 encoded = self.transformer_encoder(input, src_key_padding_mask = pads[:,:,0])
97 concatenated_encoder_output = torch.cat([encoded, pe], dim=2)
99 mat = matrix_rep(concatenated_encoder_output) # symmetric

File ~/anaconda3/envs/rna_ss/lib/python3.9/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
1126 # If we don't have any hooks, we want to skip the rest of the logic in
1127 # this function, and just call forward.
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/rna_ss/lib/python3.9/site-packages/torch/nn/modules/transformer.py:238, in TransformerEncoder.forward(self, src, mask, src_key_padding_mask)
236 output = mod(output, src_mask=mask)
237 else:
-> 238 output = mod(output, src_mask=mask, src_key_padding_mask=src_key_padding_mask)
240 if convert_to_nested:
241 output = output.to_padded_tensor(0.)

File ~/anaconda3/envs/rna_ss/lib/python3.9/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
1126 # If we don’t have any hooks, we want to skip the rest of the logic in
1127 # this function, and just call forward.
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/rna_ss/lib/python3.9/site-packages/torch/nn/modules/transformer.py:437, in TransformerEncoderLayer.forward(self, src, src_mask, src_key_padding_mask)
417 tensor_args = (
418 src,
419 self.self_attn.in_proj_weight,
(...)
430 self.linear2.bias,
431 )
432 if (not torch.overrides.has_torch_function(tensor_args) and
433 # We have to use a list comprehension here because TorchScript
434 # doesn't support generator expressions.
435 all([(x.is_cuda or 'cpu' in str(x.device)) for x in tensor_args]) and
436 (not torch.is_grad_enabled() or all([not x.requires_grad for x in tensor_args]))):
-> 437 return torch._transformer_encoder_layer_fwd(
438 src,
439 self.self_attn.embed_dim,
440 self.self_attn.num_heads,
441 self.self_attn.in_proj_weight,
442 self.self_attn.in_proj_bias,
443 self.self_attn.out_proj.weight,
444 self.self_attn.out_proj.bias,
445 self.activation_relu_or_gelu == 2,
446 False, # norm_first, currently not supported
447 self.norm1.eps,
448 self.norm1.weight,
449 self.norm1.bias,
450 self.norm2.weight,
451 self.norm2.bias,
452 self.linear1.weight,
453 self.linear1.bias,
454 self.linear2.weight,
455 self.linear2.bias,
456 src_mask if src_mask is not None else src_key_padding_mask, # TODO: split into two args
457 )
458 x = src
459 if self.norm_first:

RuntimeError: Mask shape should match input shape; transformer_mask is not supported in the fallback case.

It seems the “fast path” in the transformer encoder is being taken and fails there. Could you create a GitHub issue if it’s still failing in the latest nightly release, please?
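
As a temporary workaround (a sketch inferred from the condition at transformer.py:432-436 in the trace above, not an official fix): the fast path is only taken when autograd is disabled or no tensor requires grad, so running the evaluation forward without torch.no_grad() should fall back to the regular Python path. It costs extra memory for the autograd graph, so treat it as a stopgap:

# Skip torch.no_grad() during evaluation so the fused fast path
# (which rejects this mask) is not selected; detach afterwards.
model.eval()
outputs = model(seqs, pes, pads)  # note: no torch.no_grad() wrapper
outputs = outputs.detach()        # drop the autograd graph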

Met the same error today on the stable version.

input.shape: [4, 4096, 32]
mask.shape: [4, 4096]
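
A minimal sketch that should reproduce this (d_model=32 matches the input above; nhead, num_layers, and the padding pattern are assumptions):

import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2).cuda().eval()

x = torch.randn(4, 4096, 32, device="cuda")                   # input.shape: [4, 4096, 32]
mask = torch.zeros(4, 4096, dtype=torch.bool, device="cuda")  # mask.shape: [4, 4096]
mask[:, 2048:] = True                                         # treat the second half as padding

with torch.no_grad():  # no_grad lets eval take the fused fast path
    out = encoder(x, src_key_padding_mask=mask)
print(out.shape)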

Could you check the latest nightly binary and create a GitHub issue if you are still seeing the error?
I don’t know if @DJB followed up on it or not.

Hi, I can confirm the same problem in the latest nightly binary as well.

1.11.0 doesn’t have the problem. I’ll create a GitHub issue.

The latest nightly binary should work, since #83142 was already merged.
If you are still seeing the error, feel free to update the corresponding GitHub issue.


I am facing the same issue. I read the fix in Fix issue in softmax.cu with transformer error when mask seqlen > 1024 by erichan1 · Pull Request #83639 · pytorch/pytorch · GitHub, but I’m still unsure what to do. Do I need a different version of PyTorch? My current version is 1.12.0.

EDIT: Sorry, I just created a new environment with the latest PyTorch release (1.13.1) and I no longer get the error.
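
In case it helps anyone else, a quick sanity check that the new environment really picked up the fixed build:

import torch

print(torch.__version__)          # should report 1.13.1 (or newer)
print(torch.version.cuda)         # CUDA version the wheel was built against
print(torch.cuda.is_available())  # confirm the GPU is visible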