RuntimeError: input tensor does not match matmul output shape

Frame 0 Prompt: ['A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation.', 'yellow color scheme']
Seed used: 85109229
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [72], in <cell line: 173>()
    172 torch.cuda.empty_cache()
    173 try:
--> 174   do_run()
    175 except KeyboardInterrupt:
    176     pass

Input In [65], in do_run()
    475 for prompt in frame_prompt:
    476     txt, weight = parse_prompt(prompt)
--> 477     txt = clip_model.encode_text(clip.tokenize(prompt).to(device)).float()
    479     if args.fuzzy_prompt:
    480         for i in range(25):

File ~/notebook/CLIP/clip/model.py:349, in CLIP.encode_text(self, text)
    347 x = x + self.positional_embedding.type(self.dtype)
    348 x = x.permute(1, 0, 2)  # NLD -> LND
--> 349 x = self.transformer(x)
    350 x = x.permute(1, 0, 2)  # LND -> NLD
    351 x = self.ln_final(x).type(self.dtype)

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/notebook/CLIP/clip/model.py:204, in Transformer.forward(self, x)
    203 def forward(self, x: torch.Tensor):
--> 204     return self.resblocks(x)

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/modules/container.py:139, in Sequential.forward(self, input)
    137 def forward(self, input):
    138     for module in self:
--> 139         input = module(input)
    140     return input

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/notebook/CLIP/clip/model.py:191, in ResidualAttentionBlock.forward(self, x)
    190 def forward(self, x: torch.Tensor):
--> 191     x = x + self.attention(self.ln_1(x))
    192     x = x + self.mlp(self.ln_2(x))
    193     return x

File ~/notebook/CLIP/clip/model.py:188, in ResidualAttentionBlock.attention(self, x)
    186 def attention(self, x: torch.Tensor):
    187     self.attn_mask = self.attn_mask.to(dtype=x.dtype, device=x.device) if self.attn_mask is not None else None
--> 188     return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/modules/activation.py:1153, in MultiheadAttention.forward(self, query, key, value, key_padding_mask, need_weights, attn_mask, average_attn_weights)
   1142     attn_output, attn_output_weights = F.multi_head_attention_forward(
   1143         query, key, value, self.embed_dim, self.num_heads,
   1144         self.in_proj_weight, self.in_proj_bias,
   (...)
   1150         q_proj_weight=self.q_proj_weight, k_proj_weight=self.k_proj_weight,
   1151         v_proj_weight=self.v_proj_weight, average_attn_weights=average_attn_weights)
   1152 else:
-> 1153     attn_output, attn_output_weights = F.multi_head_attention_forward(
   1154         query, key, value, self.embed_dim, self.num_heads,
   1155         self.in_proj_weight, self.in_proj_bias,
   1156         self.bias_k, self.bias_v, self.add_zero_attn,
   1157         self.dropout, self.out_proj.weight, self.out_proj.bias,
   1158         training=self.training,
   1159         key_padding_mask=key_padding_mask, need_weights=need_weights,
   1160         attn_mask=attn_mask, average_attn_weights=average_attn_weights)
   1161 if self.batch_first and is_batched:
   1162     return attn_output.transpose(1, 0), attn_output_weights

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/functional.py:5128, in multi_head_attention_forward(query, key, value, embed_dim_to_check, num_heads, in_proj_weight, in_proj_bias, bias_k, bias_v, add_zero_attn, dropout_p, out_proj_weight, out_proj_bias, training, key_padding_mask, need_weights, attn_mask, use_separate_proj_weight, q_proj_weight, k_proj_weight, v_proj_weight, static_k, static_v, average_attn_weights)
   5123     dropout_p = 0.0
   5125 #
   5126 # (deep breath) calculate attention and out projection
   5127 #
-> 5128 attn_output, attn_output_weights = _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
   5129 attn_output = attn_output.transpose(0, 1).contiguous().view(tgt_len * bsz, embed_dim)
   5130 attn_output = linear(attn_output, out_proj_weight, out_proj_bias)

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/functional.py:4801, in _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
   4799 # (B, Nt, E) x (B, E, Ns) -> (B, Nt, Ns)
   4800 if attn_mask is not None:
-> 4801     attn = torch.baddbmm(attn_mask, q, k.transpose(-2, -1))
   4802 else:
   4803     attn = torch.bmm(q, k.transpose(-2, -1))

RuntimeError: input tensor does not match matmul output shape

@albanD Do you have the binary wheel file mentioned in the GitHub issue? I can’t find it anywhere.

Hey!
I do not necessarily recommend using these, as they are still a bit of a work in progress.
But what is mentioned there can be found on this page (make sure you’re logged in to GitHub): macos-arm64-binary-wheel · pytorch/pytorch@bf961d5 · GitHub
At the very bottom you will find zip files containing a .whl file that you can install with pip install torch-XXX.whl.
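Once it is installed, a quick sanity check would be something like this (just a sketch, assuming this build exposes torch.backends.mps the way the later nightlies do):

import torch

print(torch.__version__)                  # should report the version baked into the wheel
print(torch.backends.mps.is_built())      # True if the wheel was compiled with MPS support
print(torch.backends.mps.is_available())  # True if this machine/OS can actually use it

# tiny smoke test on the MPS device
x = torch.ones(3, 3, device="mps")
print((x @ x).sum())                      # expect tensor(27., device='mps:0')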

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Input In [6], in <cell line: 4>()
      2 prompt = "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."
      3 clip_model = clip.load('ViT-B/32', jit=False)[0].eval().requires_grad_(False).to(device)
----> 4 txt = clip_model.encode_text(clip.tokenize(prompt).to(device)).float()

File ~/notebook/CLIP/clip/model.py:355, in CLIP.encode_text(self, text)
    351 x = self.ln_final(x).type(self.dtype)
    353 # x.shape = [batch_size, n_ctx, transformer.width]
    354 # take features from the eot embedding (eot_token is the highest number in each sequence)
--> 355 x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection
    357 return x

NotImplementedError: Could not run 'aten::index.Tensor' with arguments from the 'MPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::index.Tensor' is only available for these backends: [Dense, Negative, UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseXPU, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:37399 [kernel]
QuantizedCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:1294 [kernel]
BackendSelect: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
Named: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_1.cpp:11951 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:89 [backend fallback]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]

I do not necessarily recommend using these, as they are still a bit of a work in progress.

I’ve been waiting for GPU acceleration for too long, so I’m eager to test whether it’s now ready to run Disco Diffusion locally.

If you’re running from that binary, you should be able to set PYTORCH_ENABLE_MPS_FALLBACK=1 to get the fallback and avoid this error.
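Something like this at the very top of the notebook should do it (a minimal sketch; as far as I know the flag has to be read before torch initializes, so exporting it in the shell that launches Jupyter works too):

import os
# Route ops that have no MPS kernel yet (like aten::index.Tensor above) to the CPU.
# Set this before `import torch`; alternatively: export PYTORCH_ENABLE_MPS_FALLBACK=1
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")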