RuntimeError: input tensor does not match matmul output shape

Frame 0 Prompt: ['A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation.', 'yellow color scheme']
Seed used: 85109229
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [72], in <cell line: 173>()
    172 torch.cuda.empty_cache()
    173 try:
--> 174   do_run()
    175 except KeyboardInterrupt:
    176     pass

Input In [65], in do_run()
    475 for prompt in frame_prompt:
    476     txt, weight = parse_prompt(prompt)
--> 477     txt = clip_model.encode_text(clip.tokenize(prompt).to(device)).float()
    479     if args.fuzzy_prompt:
    480         for i in range(25):

File ~/notebook/CLIP/clip/model.py:349, in CLIP.encode_text(self, text)
    347 x = x + self.positional_embedding.type(self.dtype)
    348 x = x.permute(1, 0, 2)  # NLD -> LND
--> 349 x = self.transformer(x)
    350 x = x.permute(1, 0, 2)  # LND -> NLD
    351 x = self.ln_final(x).type(self.dtype)

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/notebook/CLIP/clip/model.py:204, in Transformer.forward(self, x)
    203 def forward(self, x: torch.Tensor):
--> 204     return self.resblocks(x)

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/modules/container.py:139, in Sequential.forward(self, input)
    137 def forward(self, input):
    138     for module in self:
--> 139         input = module(input)
    140     return input

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/notebook/CLIP/clip/model.py:191, in ResidualAttentionBlock.forward(self, x)
    190 def forward(self, x: torch.Tensor):
--> 191     x = x + self.attention(self.ln_1(x))
    192     x = x + self.mlp(self.ln_2(x))
    193     return x

File ~/notebook/CLIP/clip/model.py:188, in ResidualAttentionBlock.attention(self, x)
    186 def attention(self, x: torch.Tensor):
    187     self.attn_mask = self.attn_mask.to(dtype=x.dtype, device=x.device) if self.attn_mask is not None else None
--> 188     return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/modules/activation.py:1153, in MultiheadAttention.forward(self, query, key, value, key_padding_mask, need_weights, attn_mask, average_attn_weights)
   1142     attn_output, attn_output_weights = F.multi_head_attention_forward(
   1143         query, key, value, self.embed_dim, self.num_heads,
   1144         self.in_proj_weight, self.in_proj_bias,
   (...)
   1150         q_proj_weight=self.q_proj_weight, k_proj_weight=self.k_proj_weight,
   1151         v_proj_weight=self.v_proj_weight, average_attn_weights=average_attn_weights)
   1152 else:
-> 1153     attn_output, attn_output_weights = F.multi_head_attention_forward(
   1154         query, key, value, self.embed_dim, self.num_heads,
   1155         self.in_proj_weight, self.in_proj_bias,
   1156         self.bias_k, self.bias_v, self.add_zero_attn,
   1157         self.dropout, self.out_proj.weight, self.out_proj.bias,
   1158         training=self.training,
   1159         key_padding_mask=key_padding_mask, need_weights=need_weights,
   1160         attn_mask=attn_mask, average_attn_weights=average_attn_weights)
   1161 if self.batch_first and is_batched:
   1162     return attn_output.transpose(1, 0), attn_output_weights

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/functional.py:5128, in multi_head_attention_forward(query, key, value, embed_dim_to_check, num_heads, in_proj_weight, in_proj_bias, bias_k, bias_v, add_zero_attn, dropout_p, out_proj_weight, out_proj_bias, training, key_padding_mask, need_weights, attn_mask, use_separate_proj_weight, q_proj_weight, k_proj_weight, v_proj_weight, static_k, static_v, average_attn_weights)
   5123     dropout_p = 0.0
   5125 #
   5126 # (deep breath) calculate attention and out projection
   5127 #
-> 5128 attn_output, attn_output_weights = _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
   5129 attn_output = attn_output.transpose(0, 1).contiguous().view(tgt_len * bsz, embed_dim)
   5130 attn_output = linear(attn_output, out_proj_weight, out_proj_bias)

File ~/miniconda3/envs/jupyter/lib/python3.8/site-packages/torch/nn/functional.py:4801, in _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
   4799 # (B, Nt, E) x (B, E, Ns) -> (B, Nt, Ns)
   4800 if attn_mask is not None:
-> 4801     attn = torch.baddbmm(attn_mask, q, k.transpose(-2, -1))
   4802 else:
   4803     attn = torch.bmm(q, k.transpose(-2, -1))

RuntimeError: input tensor does not match matmul output shape

@albanD Do you have the binary wheel file mentioned in the GitHub issue? I can’t find it anywhere.

Hey!
I do not necessarily recommend using these, as they are still a bit of a work in progress.
But what is mentioned there can be found on this page (make sure you’re logged in to GitHub): macos-arm64-binary-wheel · pytorch/pytorch@bf961d5 · GitHub
At the very bottom you will find zip files containing a .whl file that you can install with pip install torch-XXX.whl.
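Once it is installed, a quick sanity check would be something like this (just a sketch, assuming this build exposes torch.backends.mps the way the later nightlies do):

import torch

print(torch.__version__)                  # should report the version baked into the wheel
print(torch.backends.mps.is_built())      # True if the wheel was compiled with MPS support
print(torch.backends.mps.is_available())  # True if this machine/OS can actually use it

# tiny smoke test on the MPS device
x = torch.ones(3, 3, device="mps")
print((x @ x).sum())                      # expect tensor(27., device='mps:0')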

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Input In [6], in <cell line: 4>()
      2 prompt = "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."
      3 clip_model = clip.load('ViT-B/32', jit=False)[0].eval().requires_grad_(False).to(device)
----> 4 txt = clip_model.encode_text(clip.tokenize(prompt).to(device)).float()

File ~/notebook/CLIP/clip/model.py:355, in CLIP.encode_text(self, text)
    351 x = self.ln_final(x).type(self.dtype)
    353 # x.shape = [batch_size, n_ctx, transformer.width]
    354 # take features from the eot embedding (eot_token is the highest number in each sequence)
--> 355 x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection
    357 return x

NotImplementedError: Could not run 'aten::index.Tensor' with arguments from the 'MPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::index.Tensor' is only available for these backends: [Dense, Negative, UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseXPU, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:37399 [kernel]
QuantizedCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:1294 [kernel]
BackendSelect: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
Named: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:11242 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_1.cpp:11951 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:89 [backend fallback]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]

I do not necessarily recommend using these, as they are still a bit of a work in progress.

I’ve been waiting for GPU acceleration for too long, so I’m eager to test whether it’s now ready to run Disco Diffusion locally.

If you’re running from that binary, you should be able to set PYTORCH_ENABLE_MPS_FALLBACK=1 to get the fallback and avoid this error.
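Something like this at the very top of the notebook should do it (a minimal sketch; as far as I know the flag has to be read before torch initializes, so exporting it in the shell that launches Jupyter works too):

import os
# Route ops that have no MPS kernel yet (like aten::index.Tensor above) to the CPU.
# Set this before `import torch`; alternatively: export PYTORCH_ENABLE_MPS_FALLBACK=1
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")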