RuntimeError: could not create a primitive


I’m new to PyTorch, and for a student project I’d like to set up a simple convolution on a 26-megapixel image. The code below fails with “RuntimeError: could not create a primitive”. When I decrease the size of the `torch.randn` tensor, conv2d works as expected. Where is my error? The code comes from a larger project on GitHub, SESF_Net; I’m just trying to learn and understand it.

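import os
import torch
import torch.nn as nn
import torch.nn.functional as f
# 3x3 kernel with a single 1 left of centre, repeated once per channel (shape 64x1x3x3)
r_shift_kernel = torch.FloatTensor([[0, 0, 0], [1, 0, 0], [0, 0, 0]]).reshape((1, 1, 3, 3)).repeat(64, 1, 1, 1)
# 64-channel feature map at the full image size (4160 x 6240, about 26 MP)
f1 = torch.randn(1,64,4160,6240)
# depthwise convolution: groups=64 applies each kernel copy to its own channel
f1_r_shift = f.conv2d(f1, r_shift_kernel, padding=1, groups=64)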

Thanks and best wishes.
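For readers puzzled by the kernel: conv2d in PyTorch is a cross-correlation, so with `padding=1` this depthwise kernel copies column `j-1` into column `j` of every channel and fills the first column with zeros, i.e. a one-pixel shift to the right. The sketch below checks that on a toy tensor; `shift_right_conv` and `shift_right_slice` are hypothetical helper names, not part of SESF_Net or PyTorch.

```python
import torch
import torch.nn.functional as f

def shift_right_conv(x: torch.Tensor) -> torch.Tensor:
    """Shift every channel one pixel right using the depthwise kernel from the question."""
    channels = x.shape[1]
    kernel = torch.tensor([[0.0, 0.0, 0.0],
                           [1.0, 0.0, 0.0],
                           [0.0, 0.0, 0.0]]).reshape(1, 1, 3, 3).repeat(channels, 1, 1, 1)
    # groups=channels applies one copy of the 3x3 kernel to each channel separately
    return f.conv2d(x, kernel, padding=1, groups=channels)

def shift_right_slice(x: torch.Tensor) -> torch.Tensor:
    """Same shift with plain indexing: column j takes column j-1, column 0 becomes 0."""
    out = torch.zeros_like(x)
    out[..., 1:] = x[..., :-1]
    return out

x = torch.randn(1, 4, 5, 6)
print(torch.allclose(shift_right_conv(x), shift_right_slice(x)))  # True
```

If a pure shift is all that is needed, the indexing version avoids the convolution backend entirely. That is a possible workaround, not an explanation of the error itself.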

I cannot reproduce this issue using the provided code snippet and a source build.
Unfortunately, I also couldn't find many references to this RuntimeError. Could you post the complete stack trace, please?

The full traceback is just this:

Traceback (most recent call last):
  File "", line 7, in <module>
    f1_r_shift = f.conv2d(f1, r_shift_kernel, padding=1, groups=64)
RuntimeError: could not create a primitive

This morning I tested the example successfully on my laptop (a 2017 MacBook with 16 GB RAM; the code above takes 3 hours…). The same code crashes on a Linux machine (CPU only, no CUDA) with a 12-core Xeon and 192 GB RAM. I reinstalled all Python packages and tested Python 3.7 and 3.9 with the latest torch and torchvision versions available from pip, and I get the same error on that machine. Is there any other way to debug this? I’m a bit clueless.
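One way to narrow this down: the message “could not create a primitive” appears to come from oneDNN (formerly MKL-DNN), the library PyTorch uses for many CPU convolutions. A minimal sketch, assuming your build exposes the `torch.backends.mkldnn.flags` context manager (recent releases do), runs the same conv2d once with oneDNN enabled and once with it disabled. `conv_with_and_without_mkldnn` is a hypothetical helper, not a PyTorch API.

```python
import torch
import torch.nn.functional as f

def conv_with_and_without_mkldnn(x, kernel, **kwargs):
    """Hypothetical helper: run conv2d with oneDNN on and off, keeping any RuntimeError."""
    results = {}
    try:
        results["mkldnn_on"] = f.conv2d(x, kernel, **kwargs)
    except RuntimeError as err:
        results["mkldnn_on"] = err
    # temporarily disable the oneDNN path; the previous setting is restored on exit
    with torch.backends.mkldnn.flags(enabled=False):
        try:
            results["mkldnn_off"] = f.conv2d(x, kernel, **kwargs)
        except RuntimeError as err:
            results["mkldnn_off"] = err
    return results
```

On the failing machine you could call `conv_with_and_without_mkldnn(f1, r_shift_kernel, padding=1, groups=64)` with the tensors from the question. If only the `mkldnn_on` entry is an exception, that points at the oneDNN path rather than at your code. The fallback path may be slower and may need extra memory.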

The only information on this error message I could find was in a PaddlePaddle discussion, where one user pointed to missing AVX instructions on the CPU as a possible cause.
Could your CPU be lacking this instruction set?
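To check that question on a Linux machine, a small sketch reads the AVX-related flags the kernel reports in `/proc/cpuinfo` and prints PyTorch's build configuration. `avx_flags` is a hypothetical helper; it returns an empty set on systems without `/proc/cpuinfo` (e.g. macOS) or where the line is not named `flags`.

```python
import torch

def avx_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the AVX-related CPU flags listed by the Linux kernel (empty set if unavailable)."""
    try:
        with open(cpuinfo_path) as fh:
            for line in fh:
                if line.startswith("flags"):
                    # e.g. "flags : fpu vme ... avx avx2 avx512f ..."
                    return {flag for flag in line.split(":", 1)[1].split() if flag.startswith("avx")}
    except OSError:
        pass  # not Linux, or /proc is not mounted
    return set()

print(sorted(avx_flags()))
print("oneDNN available:", torch.backends.mkldnn.is_available())
# build settings; recent versions also list the CPU capability PyTorch detected
print(torch.__config__.show())
```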

The CPU is a Xeon W-3235. According to Intel's specification, it supports AVX-512 and has two AVX-512 FMA units.

The error is only raised when the image dimensions are large. The following works, but the larger values above do not:

f1 = torch.randn(1,64,2400,3200)

With the larger values, f.conv2d eventually fails with this uninformative error. Is there any other way to debug this?
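One cheap thing to look at is the raw size of the two inputs. The sketch below computes element counts and float32 byte sizes and compares them with 2**31 - 1, the largest signed 32-bit integer. Treating that as a threshold is purely an assumption for illustration; `tensor_bytes` is a hypothetical helper, and for an existing tensor `f1.numel() * f1.element_size()` gives the same byte count.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Element count and byte size of a dense tensor (float32 by default)."""
    numel = 1
    for dim in shape:
        numel *= dim
    return numel, numel * bytes_per_element

# largest signed 32-bit integer; an assumed limit worth comparing against, not a confirmed one
INT32_MAX = 2**31 - 1

for shape in [(1, 64, 2400, 3200), (1, 64, 4160, 6240)]:
    numel, nbytes = tensor_bytes(shape)
    print(shape, f"{numel:,} elements", f"{nbytes / 2**30:.2f} GiB",
          "bytes > INT32_MAX:", nbytes > INT32_MAX)
```

The working input comes to about 1.83 GiB, just under 2**31 bytes, while the failing one is about 6.19 GiB. The element count of the failing tensor (about 1.66 billion) still fits in a signed 32-bit integer, so this is only a pattern in two data points, not a diagnosis.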

Could you create an issue on GitHub so that we can track and check this error?
It could be a known limitation of specific CPU instructions, but I’m not familiar enough with these CPU backend libraries.
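When filing the issue, the PyTorch bug-report template asks for the environment summary printed by `python -m torch.utils.collect_env`. A short sketch collects the same text from Python via `get_pretty_env_info` so it can be pasted next to the minimal reproduction from the question; the extra oneDNN line is just an addition for this particular error.

```python
import torch
from torch.utils.collect_env import get_pretty_env_info

# same summary as `python -m torch.utils.collect_env` (PyTorch version, OS, CPU, pip packages, ...)
report = get_pretty_env_info()
print(report)
print("oneDNN available:", torch.backends.mkldnn.is_available())
```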