'memory_format' argument is incompatible with Metal tensor

:bug: Bug

To Reproduce

Steps to reproduce the behavior:

I followed this tutorial and confirmed that MobileNetV2 with the Metal backend runs correctly on my phone.

This is the code I used to export a PyTorch model with the Metal backend.

import torch
import torch.nn as nn
import torch.utils.mobile_optimizer as mobile_optimizer
import torch.nn.functional as F

class Demo(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        x = F.interpolate(x, scale_factor=0.25, mode='bilinear')
        return x

model = Demo()
model = torch.quantization.convert(model)
model = torch.jit.script(model)
model = mobile_optimizer.optimize_for_mobile(model, backend='Metal')
model._save_for_lite_interpreter('model.ptl')

x = torch.rand((1, 3, 256, 256))
out = model(x)
print(out.shape)
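For what it's worth, the `torch.quantization.convert` call in the repro above may be a no-op: `Demo` has no quantizable submodules and was never prepared with a qconfig. A minimal check, under the assumption that no quantization configuration was applied elsewhere:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Demo(nn.Module):
    def forward(self, x):
        return F.interpolate(x, scale_factor=0.25, mode='bilinear')

# With no prepare()/qconfig step and no quantizable submodules,
# convert() returns an unchanged (copied) Demo module, so the
# exported model is likely not quantized at all.
converted = torch.quantization.convert(Demo())
print(type(converted).__name__)
print(converted(torch.rand(1, 3, 256, 256)).shape)
```

If that holds, the failure would come from the Metal op coverage itself (the trace below goes through `constant_pad_nd` into `empty` with a `memory_format` argument) rather than from quantized ops.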

In Xcode I used the following code:

    c10::InferenceMode mode;
    at::Tensor tensor = torch::rand({1, 3, 256, 256}, at::kFloat).metal();
    auto outputTensor = _impl.forward({tensor}).toTensor().cpu();
    std::cout << outputTensor << std::endl;

I got the following stack trace:

2021-08-06 11:14:27.162164+0900 HelloWorld[1919:810455] Metal GPU Frame Capture Enabled
2021-08-06 11:14:27.162400+0900 HelloWorld[1919:810455] Metal API Validation Enabled
2021-08-06 11:14:27.544492+0900 HelloWorld[1919:810455] 'memory_format' argument is incompatible with Metal tensor
  
  Debug info for handle, -1, not found.
  
Exception raised from empty at /path/pytorch_metal/aten/src/ATen/native/metal/MetalAten.mm:84 (most recent call first):
frame #0: _ZN3c106detail14torchCheckFailEPKcS2_jS2_ + 92 (0x10320a224 in HelloWorld)
frame #1: _ZN2at6native5metal5emptyEN3c108ArrayRefIxEENS2_8optionalINS2_10ScalarTypeEEENS5_INS2_6LayoutEEENS5_INS2_6DeviceEEENS5_IbEENS5_INS2_12MemoryFormatEEE + 284 (0x10303e948 in HelloWorld)
frame #2: _ZN2at12_GLOBAL__N_119empty_memory_formatEN3c108ArrayRefIxEENS1_8optionalINS1_10ScalarTypeEEENS4_INS1_6LayoutEEENS4_INS1_6DeviceEEENS4_IbEENS4_INS1_12MemoryFormatEEE + 220 (0x1025f3c10 in HelloWorld)
frame #3: _ZNK3c1010Dispatcher4callIN2at6TensorEJNS_8ArrayRefIxEENS_8optionalINS_10ScalarTypeEEENS6_INS_6LayoutEEENS6_INS_6DeviceEEENS6_IbEENS6_INS_12MemoryFormatEEEEEET_RKNS_19TypedOperatorHandleIFSG_DpT0_EEESJ_ + 220 (0x1024fe25c in HelloWorld)
frame #4: _ZN2at4_ops19empty_memory_format4callEN3c108ArrayRefIxEENS2_8optionalINS2_10ScalarTypeEEENS5_INS2_6LayoutEEENS5_INS2_6DeviceEEENS5_IbEENS5_INS2_12MemoryFormatEEE + 144 (0x1023c4ab0 in HelloWorld)
frame #5: _ZN2at6native15constant_pad_ndERKNS_6TensorEN3c108ArrayRefIxEERKNS4_6ScalarE + 1148 (0x102e032cc in HelloWorld)
frame #6: _ZN3c104impl34call_functor_with_args_from_stack_INS0_6detail31WrapFunctionIntoRuntimeFunctor_IPFN2at6TensorERKS5_NS_8ArrayRefIxEERKNS_6ScalarEES5_NS_4guts8typelist8typelistIJS7_S9_SC_EEEEELb0EJLm0ELm1ELm2EEJS7_S9_SC_EEENSt3__15decayINSF_21infer_function_traitsIT_E4type11return_typeEE4typeEPNS_14OperatorKernelENS_14DispatchKeySetEPNSK_6vectorINS_6IValueENSK_9allocatorISW_EEEENSK_16integer_sequenceImJXspT1_EEEEPNSH_IJDpT2_EEE + 136 (0x102722250 in HelloWorld)
frame #7: _ZN3c104impl31make_boxed_from_unboxed_functorINS0_6detail31WrapFunctionIntoRuntimeFunctor_IPFN2at6TensorERKS5_NS_8ArrayRefIxEERKNS_6ScalarEES5_NS_4guts8typelist8typelistIJS7_S9_SC_EEEEELb0EE4callEPNS_14OperatorKernelERKNS_14OperatorHandleENS_14DispatchKeySetEPNSt3__16vectorINS_6IValueENSR_9allocatorIST_EEEE + 40 (0x102722164 in HelloWorld)
frame #8: _ZNK3c1010Dispatcher9callBoxedERKNS_14OperatorHandleEPNSt3__16vectorINS_6IValueENS4_9allocatorIS6_EEEE + 128 (0x1031092ec in HelloWorld)
frame #9: _ZN5torch3jit6mobile16InterpreterState3runERNSt3__16vectorIN3c106IValueENS3_9allocatorIS6_EEEE + 4056 (0x103115bfc in HelloWorld)
frame #10: _ZNK5torch3jit6mobile8Function3runERNSt3__16vectorIN3c106IValueENS3_9allocatorIS6_EEEE + 192 (0x103107ea8 in HelloWorld)
frame #11: _ZNK5torch3jit6mobile6Method3runERNSt3__16vectorIN3c106IValueENS3_9allocatorIS6_EEEE + 516 (0x10311a76c in HelloWorld)
frame #12: _ZNK5torch3jit6mobile6MethodclENSt3__16vectorIN3c106IValueENS3_9allocatorIS6_EEEE + 24 (0x10311b350 in HelloWorld)
frame #13: _ZN5torch3jit6mobile6Module7forwardENSt3__16vectorIN3c106IValueENS3_9allocatorIS6_EEEE + 172 (0x102378984 in HelloWorld)
frame #14: -[TorchModule + + (0x102378328 in HelloWorld)
frame #15: $s10HelloWorld14ViewControllerC11viewDidLoadyyF + 1660 (0x1023886d4 in HelloWorld)
frame #16: $s10HelloWorld14ViewControllerC11viewDidLoadyyFTo + 32 (0x1023891dc in HelloWorld)
frame #17: 186F3A78-108A-3057-A67E-800A88EBFF00 + 4611712 (0x1845b0e80 in UIKitCore)
frame #18: 186F3A78-108A-3057-A67E-800A88EBFF00 + 4629560 (0x1845b5438 in UIKitCore)
frame #19: 186F3A78-108A-3057-A67E-800A88EBFF00 + 3874620 (0x1844fcf3c in UIKitCore)
frame #20: 186F3A78-108A-3057-A67E-800A88EBFF00 + 3875400 (0x1844fd248 in UIKitCore)
frame #21: 186F3A78-108A-3057-A67E-800A88EBFF00 + 3879180 (0x1844fe10c in UIKitCore)
frame #22: 186F3A78-108A-3057-A67E-800A88EBFF00 + 3884176 (0x1844ff490 in UIKitCore)
frame #23: 186F3A78-108A-3057-A67E-800A88EBFF00 + 3765444 (0x1844e24c4 in UIKitCore)
frame #24: 186F3A78-108A-3057-A67E-800A88EBFF00 + 16971476 (0x18517a6d4 in UIKitCore)
frame #25: CC806D5A-7150-373C-9CAA-1507F0A58DF1 + 1434660 (0x1855f0424 in QuartzCore)
frame #26: CC806D5A-7150-373C-9CAA-1507F0A58DF1 + 1461164 (0x1855f6bac in QuartzCore)
frame #27: CC806D5A-7150-373C-9CAA-1507F0A58DF1 + 1507692 (0x18560216c in QuartzCore)
frame #28: CC806D5A-7150-373C-9CAA-1507F0A58DF1 + 755064 (0x18554a578 in QuartzCore)
frame #29: CC806D5A-7150-373C-9CAA-1507F0A58DF1 + 930504 (0x1855752c8 in QuartzCore)
frame #30: 186F3A78-108A-3057-A67E-800A88EBFF00 + 11851464 (0x184c986c8 in UIKitCore)
frame #31: 4D6DD6DD-22E4-3858-9A0C-3CB77C2F13D6 + 632400 (0x182356650 in CoreFoundation)
frame #32: 4D6DD6DD-22E4-3858-9A0C-3CB77C2F13D6 + 628964 (0x1823558e4 in CoreFoundation)
frame #33: 4D6DD6DD-22E4-3858-9A0C-3CB77C2F13D6 + 606324 (0x182350074 in CoreFoundation)
frame #34: CFRunLoopRunSpecific + 572 (0x18234f818 in CoreFoundation)
frame #35: GSEventRunModal + 160 (0x198a55570 in GraphicsServices)
frame #36: 186F3A78-108A-3057-A67E-800A88EBFF00 + 11731176 (0x184c7b0e8 in UIKitCore)
frame #37: UIApplicationMain + 164 (0x184c80664 in UIKitCore)
frame #38: main + 84 (0x10238d1cc in HelloWorld)
frame #39: 5FFFB964-39D6-3CCF-BD34-C6CA4A148D1A + 4416 (0x18202e140 in libdyld.dylib)

Expected behavior

The exported model should run with the Metal backend on device and return the downsampled output, just as it does on CPU.

Environment

Please copy and paste the output from our environment collection script (or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

PyTorch version: 1.10.0a0+git512448a
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 11.5 (x86_64)
GCC version: Could not collect
Clang version: 12.0.5 (clang-1205.0.22.11)
CMake version: version 3.19.6
Libc version: N/A

Python version: 3.9.5 (default, May 18 2021, 12:31:01) [Clang 10.0.0 ] (64-bit runtime)
Python platform: macOS-10.16-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.10.0a0+git512448a
[conda] blas 1.0 mkl
[conda] mkl 2021.2.0 hecd8cb5_269
[conda] mkl-include 2021.2.0 hecd8cb5_269
[conda] mkl-service 2.3.0 py39h9ed2024_1
[conda] mkl_fft 1.3.0 py39h4a7008c_2
[conda] mkl_random 1.2.1 py39hb2f4e1b_2
[conda] numpy 1.20.2 py39h4b4dc7a_0
[conda] numpy-base 1.20.2 py39he0bd621_0
[conda] torch 1.10.0a0+git512448a dev_0

Additional context

cc: @xta0. Looks like an issue with the Metal tensor implementation?

Have you tried not quantizing your model? We don’t support quantization for Metal yet.
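Following that suggestion, a minimal sketch of the export flow with the quantization step removed. The Metal-specific optimize/save calls are left commented out because they require a PyTorch build with Metal support; `'model.ptl'` is the filename from the original repro:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.mobile_optimizer as mobile_optimizer

class Demo(nn.Module):
    def forward(self, x):
        return F.interpolate(x, scale_factor=0.25, mode='bilinear')

# Script directly; skip torch.quantization.convert entirely.
model = torch.jit.script(Demo())

# These steps need a Metal-enabled PyTorch build, so they are
# commented out in this CPU-only sketch:
# model = mobile_optimizer.optimize_for_mobile(model, backend='Metal')
# model._save_for_lite_interpreter('model.ptl')

# Sanity check on CPU: 256 * 0.25 = 64 along each spatial dim.
x = torch.rand(1, 3, 256, 256)
print(model(x).shape)
```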

:slightly_smiling_face: Hey everyone,

Kenmbkr, I’ve run into ‘memory_format’ and Metal tensor issues in PyTorch myself, so I feel your pain. From your description and stack trace, the ‘memory_format’ argument passed to `empty` is what the Metal backend rejects, and the problem likely lies in how your model is being optimized for mobile with the Metal backend.

One thing you might want to try is exporting without quantization. PyTorch’s quantization is not yet fully compatible with the Metal backend, so removing the `torch.quantization.convert` step could be a workaround, although it might affect the model’s performance. I’ve seen similar cases in my work in the IT services industry, and others in this discussion give the same advice, so it’s worth trying.