RuntimeError: could not create a primitive

Hi,

I’m new to PyTorch, and for a student project I’d like to set up a simple convolution on a 26-megapixel image. The code below fails with “RuntimeError: could not create a primitive”. When I decrease the size of the `torch.randn` tensor, the conv2d works as expected. Where is my error? The code comes from a larger project, SESF_Net, on GitHub. I’m just trying to learn and understand.

import torch
import torch.nn.functional as f

# 3x3 kernel that shifts each channel one pixel to the right; one kernel per channel (groups=64)
r_shift_kernel = torch.FloatTensor([[0, 0, 0], [1, 0, 0], [0, 0, 0]]).reshape((1, 1, 3, 3)).repeat(64, 1, 1, 1)
f1 = torch.randn(1, 64, 4160, 6240)  # 26-megapixel input with 64 channels
f1_r_shift = f.conv2d(f1, r_shift_kernel, padding=1, groups=64)
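For scale, the input tensor alone is fairly large; a rough back-of-envelope estimate, assuming float32 (4 bytes per element):

```python
# 1 x 64 x 4160 x 6240 float32 input: rough memory footprint
elems = 1 * 64 * 4160 * 6240
print(f"{elems * 4 / 1024**3:.1f} GiB")  # the conv output is the same size again
```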

Thanks and best wishes.

I cannot reproduce this issue using the provided code snippet and a source build.
Unfortunately, I also cannot find a lot of references for the mentioned RuntimeError. Could you post the complete stack trace, please?

The traceback only looks like this:

Traceback (most recent call last):
  File "demo.py", line 7, in <module>
    f1_r_shift = f.conv2d(f1, r_shift_kernel, padding=1, groups=64)
RuntimeError: could not create a primitive

This morning I tested the example successfully on my laptop (a 2017 MacBook with 16 GB of RAM; the code above takes about 3 hours…). It crashes on a Linux machine (CPU-only, no CUDA) with a 12-core Xeon and 192 GB of RAM. I also reinstalled all Python packages and tested Python 3.7 and 3.9, running the latest torch and torchvision versions available on pip. Same error on that machine. Any other way to debug it? I’m a bit clueless.

The only information I could find for this error message was in a PaddlePaddle discussion, where one user pointed to missing AVX instructions on the CPU as a possible cause.
Could your CPU be lacking this instruction set?
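One quick way to check, as a sketch (this assumes an x86 Linux machine, which exposes CPU flags in /proc/cpuinfo, and that your build uses the oneDNN/MKL-DNN CPU backend, which is where “could not create a primitive” errors typically originate):

```python
import torch

# whether this PyTorch build can use the oneDNN (MKL-DNN) CPU backend at all
print(torch.backends.mkldnn.is_available())

# the kernel lists supported instruction sets in the "flags" line of /proc/cpuinfo
with open("/proc/cpuinfo") as fh:
    flags = next(line for line in fh if line.startswith("flags")).split()
print("avx2:", "avx2" in flags, " avx512f:", "avx512f" in flags)
```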

The CPU is a Xeon W-3235. According to the specification, the processor has two AVX-512 instruction units.

The error is only raised when I make the image dimensions quite large. The following works, but not the larger values above:

f1 = torch.randn(1, 64, 2400, 3200)

With large values, f.conv2d crashes with this uninformative error. Any other way to debug this?
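In the meantime, one possible workaround is to run the convolution in horizontal strips and concatenate the results, so each individual conv2d call stays small. This is only a sketch, hard-coded to the 3x3 kernel with padding=1 from above (`conv2d_in_strips` and the strip size are my own names, not part of the project):

```python
import torch
import torch.nn.functional as F

def conv2d_in_strips(x, weight, groups, strip=512):
    """Equivalent to F.conv2d(x, weight, padding=1, groups=groups) for a
    3x3 kernel, but processes the input in horizontal strips."""
    xp = F.pad(x, (1, 1, 1, 1))  # pad once up front, then conv with padding=0
    height = x.shape[2]
    outs = []
    for top in range(0, height, strip):
        # one extra padded row above and below provides the 3x3 halo
        chunk = xp[:, :, top:top + strip + 2, :]
        outs.append(F.conv2d(chunk, weight, padding=0, groups=groups))
    return torch.cat(outs, dim=2)
```

On a small example this matches the one-shot conv2d; whether it sidesteps the primitive-creation error at the full 4160×6240 size is an assumption I have not verified.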

Could you create an issue on GitHub so that we can track and check this error?
It could be a known limitation of specific CPU instructions, but I’m not familiar enough with these CPU backend libraries.