Why does PyTorch change the strides of a tensor after inference?

tejal567 · May 26, 2021, 8:16am

I have observed that the strides of the input and output tensors (just before and after network inference) are different. This behavior can be observed in both Python and C++. I am not sure whether this is an inherent feature or a bug.

Example:
input_Tensor size: [1, 3, 256, 256]
input_Tensor strides: [256x256x3, 256x256, 256, 1]

output_Tensor size: [1, 3, 256, 256]
output_Tensor strides: [256x256x3, 1 , 256x3 , 3]
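For reference, the two stride patterns above match the default contiguous (NCHW) layout and the channels_last (NHWC) layout of the same logical shape. A minimal pure-Python sketch reproduces both from the shape alone; the helper names contiguous_strides and channels_last_strides are hypothetical and not part of PyTorch:

```python
def contiguous_strides(shape):
    """Element strides of a row-major (NCHW) tensor with this shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def channels_last_strides(shape):
    """Element strides when a 4-D (N, C, H, W) tensor is stored as NHWC."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


shape = (1, 3, 256, 256)
print(contiguous_strides(shape))     # (196608, 65536, 256, 1)
print(channels_last_strides(shape))  # (196608, 1, 768, 3)
```

Here 196608 = 256x256x3 and 768 = 256x3, so the second tuple is exactly the output_Tensor strides listed above.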

This behavior is especially problematic in C++ (libtorch) when we set the input tensor data from a buffer like this:

torch::Tensor input_Tensor = torch::from_blob(input_buffer,{1,3,256,256},torch::kFloat);

and, after inference, copy the output tensor data to a buffer like this:

memcpy(output_buffer, (float*)output_Tensor.data_ptr(), 256*256*3*sizeof(float));

So, if the strides of input_Tensor and output_Tensor differ as in the example above, output_buffer is filled with data in the wrong order. One way to resolve this is to make output_Tensor contiguous before doing the memcpy.
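To illustrate why a flat copy goes wrong, here is a small pure-Python sketch (no PyTorch involved; to_contiguous is a hypothetical helper) that treats a tensor as a flat buffer plus strides. Reading the buffer in storage order yields NHWC order, while walking it through the strides, which is in effect what making the tensor contiguous does, yields the expected NCHW order:

```python
from itertools import product


def to_contiguous(storage, shape, strides):
    """Return the elements in logical row-major order, following the strides."""
    out = []
    for index in product(*(range(d) for d in shape)):
        offset = sum(i * s for i, s in zip(index, strides))
        out.append(storage[offset])
    return out


# Logical tensor of shape (N=1, C=2, H=2, W=2) whose NCHW values are 0..7,
# but stored in channels_last order with element strides (8, 1, 4, 2).
shape = (1, 2, 2, 2)
strides = (8, 1, 4, 2)
storage = [0, 4, 1, 5, 2, 6, 3, 7]

print(storage)                                 # raw flat copy: NHWC order
print(to_contiguous(storage, shape, strides))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

In libtorch, the corresponding step would be calling contiguous() on the output tensor and copying from the result.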

tejal567 · May 26, 2021, 8:16am

@ptrblck Please comment on this

ptrblck · May 26, 2021, 8:28am

I guess you are using the channels_last memory format, which would change the metadata as described here.

tejal567 · May 27, 2021, 7:31am

Yes, output_Tensor is in channels_last format, but I have not explicitly requested the channels_last memory format anywhere. The input_Tensor is contiguous but the output_Tensor is not; PyTorch (libtorch) did this automatically.

ptrblck · May 27, 2021, 7:33am

That shouldn’t happen. Could you post an executable code snippet to reproduce this issue?

tejal567 · May 27, 2021, 1:25pm

Please find below the exact steps and code to reproduce this issue.
My execution environment:
Python torch version: 1.7.0+cu101
Libtorch version: 1.7.0 cuda 10.1

In my original code, I transferred my model weights from a caffe2 network, so the Python code below simulates that. First, we create the model file with this Python code:

import numpy as np
import torch
import torch.nn as nn

class TestNet(nn.Module):
    def __init__(self):
        super(TestNet, self).__init__()
        self.conv1 = nn.Conv2d(2,10,3,padding=1)

    def forward(self,x):
        x = self.conv1(x)
        return x

def get_arr():
    arr = np.random.rand(10,3,3,2)
    arr = np.transpose(arr,(0,3,1,2))
    return arr

model = TestNet()  # TestNet is the class defined above
arr = get_arr()
model._modules["conv1"].weight.data = torch.from_numpy(arr.astype(np.float32))
print (arr.shape)
print (arr.strides)
print (arr.data.contiguous)
model.eval()
arr = np.zeros((1,2,256,256)).astype(np.float32)
inp_T = torch.from_numpy(arr)
traced_script_module = torch.jit.trace(model, inp_T)
traced_script_module.save("TestNet.pt")

With the model file created by the code above, we now run the following C++ code:

#include <torch/script.h>
#include <iostream>
#include <vector>

using namespace std;

int main() {
    // Load the traced model and move it to the GPU.
    torch::jit::script::Module model = torch::jit::load("TestNet.pt");
    model.to(at::kCUDA);
    model.eval();
    torch::NoGradGuard no_grad;

    // Wrap a zero-initialized host buffer as a contiguous NCHW tensor.
    float* in_data = new float[2 * 256 * 256]();
    torch::Tensor tensor_in = torch::from_blob(in_data, { 1, 2, 256, 256 }, torch::kFloat);
    tensor_in = tensor_in.to(at::kCUDA);
    tensor_in.set_requires_grad(false);
    cout << tensor_in.is_contiguous() << endl;
    cout << tensor_in.strides() << endl;
    cout << tensor_in.sizes() << endl;

    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(tensor_in);
    torch::Tensor pred_out = model.forward(inputs).toTensor();
    cout << pred_out.is_contiguous() << endl;
    cout << pred_out.strides() << endl;
    cout << pred_out.sizes() << endl;

    delete[] in_data;  // safe: tensor_in now holds a separate GPU copy
    return 0;
}

Output of python code:

(10, 2, 3, 3)
(144, 8, 48, 16)
False

Output of C++ code:

1
131072,65536,256,1
1,2,256,256
0
655360,1,2560,10
1,10,256,256

The C++ output above shows that although our input tensor is contiguous, the output tensor is not. Interestingly, I found that the get_arr() function in the Python code is responsible for this. If we replace the get_arr() function above with the one below, the output tensor is also contiguous.
Replace the get_arr() above with this:

def get_arr():
    arr = np.random.rand(10,3,3,2)
    arr = np.transpose(arr,(0,3,1,2))
    arr = np.ascontiguousarray(arr)
    return arr

Now the output of the Python code is:

(10, 2, 3, 3)
(144, 72, 24, 8)
True

Output of C++ code:

1
131072,65536,256,1
1,2,256,256
1
655360,65536,256,1
1,10,256,256

Observe that our output tensor is contiguous now. How is this behaviour connected to our model weights? We might infer that if the model weights are not contiguous, then the output tensor is not contiguous either. But how do we then explain the behaviour below?
If get_arr() is like this:

def get_arr():
    arr = np.random.rand(10,2,3,3)
    arr = np.transpose(arr,(0,1,3,2))
    return arr

Output of python code:

(10, 2, 3, 3)
(144, 72, 8, 24)
False

Output of C++ code:

1
131072,65536,256,1
1,2,256,256
1
655360,65536,256,1
1,10,256,256

So in this case the model weights are not contiguous, but the output tensor is contiguous. It would be great if someone could explain this weird behaviour.
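One way to compare the three cases is to convert the NumPy byte strides printed above into element strides (the arrays are float64 at that point, so divide by 8) and check which layout each one matches. This is a pure-Python sketch with a hypothetical classify helper, not a PyTorch API:

```python
def classify(shape, byte_strides, itemsize=8):
    """Label a 4-D layout as 'contiguous', 'channels_last', or 'other'."""
    n, c, h, w = shape
    strides = tuple(s // itemsize for s in byte_strides)
    if strides == (c * h * w, h * w, w, 1):
        return "contiguous"
    if strides == (h * w * c, 1, w * c, c):
        return "channels_last"
    return "other"


shape = (10, 2, 3, 3)
print(classify(shape, (144, 8, 48, 16)))   # channels_last
print(classify(shape, (144, 72, 24, 8)))   # contiguous
print(classify(shape, (144, 72, 8, 24)))   # other
```

Only the first weight layout matches channels_last, and it is the only case in which the output came back with channels_last strides. That would be consistent with the output layout following the memory format of the weights rather than their contiguity as such, though this sketch alone does not prove it.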

ptrblck · May 27, 2021, 8:18pm

Thanks for the update.
I haven’t reproduced the complete issue in libtorch, since you are manually manipulating the .data attribute and thereby setting the weight into a channels-last format. Manipulating .data is not recommended, as it could yield unwanted issues; wrap the code in a with torch.no_grad() block and assign the new nn.Parameter directly if necessary.
As your Python script shows, arr.data.contiguous returns False and also:

print(model.conv1.weight.stride())
> (18, 1, 6, 2)
print(model.conv1.weight.is_contiguous())
> False
print(model.conv1.weight.is_contiguous(memory_format=torch.channels_last))
> True

shows that your manual assignment is setting the weight parameter to channels_last.
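A minimal sketch of such an assignment, assuming the imported weights should end up in the default contiguous layout. The random tensor only stands in for the converted weights, and the variable names are illustrative:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(2, 10, 3, padding=1)

# Stand-in for weights imported in (out, H, W, in) order and permuted to
# (out, in, H, W), which leaves them with channels_last strides.
imported = torch.rand(10, 3, 3, 2).permute(0, 3, 1, 2)
print(imported.stride())            # (18, 1, 6, 2)

# Assign a new parameter instead of touching .data, and force the
# default contiguous layout while doing so.
with torch.no_grad():
    conv.weight = nn.Parameter(imported.contiguous())

print(conv.weight.stride())         # (18, 9, 3, 1)
print(conv.weight.is_contiguous())  # True
```

With the weights stored this way, the traced model would be expected to behave like the np.ascontiguousarray variant of get_arr() shown earlier in the thread.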

tejal567 · May 28, 2021, 8:03am

Yes, it looks like it. The strides and contiguity of the output tensor depend on the model weights. Anyway, thanks Piotr for your help.