Backpropagation after torch.flip


Suppose you have convolution layers (C), a max pooling layer (MP), and dense layers (Ds) in the following architecture:

Input → C1 → C2 → MP → Ds → output

Your input is a 10x2 matrix (rows x columns), and your C1 and C2 kernels are 3x1. So after C1 you get an 8x2 matrix, and after C2 a 6x2. The MP is a (1,2), so you get a 6x1, which is then fed into the Ds to get an output. Backpropagation works fine.
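A quick shape check of that pipeline (a sketch; the NCHW `Conv2d` layout and single channels are assumptions, since the post only gives the 2D sizes):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 10, 2)                 # N, C, 10 rows, 2 columns
k = torch.randn(1, 1, 3, 1)                  # a 3x1 kernel
out = F.conv2d(x, k)                         # C1 -> (1, 1, 8, 2)
out = F.conv2d(out, k)                       # C2 -> (1, 1, 6, 2)
out = F.max_pool2d(out, kernel_size=(1, 2))  # MP -> (1, 1, 6, 1)
print(out.shape)                             # torch.Size([1, 1, 6, 1])
```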

What about this scenario, where you have a flip operation (FLIP) that flips only one column of the previous output (PO)?

PO[:, 1] = torch.flip(PO[:, 1], dims=[0])

For example, if your output was [[1,4], [2,5], [3,6]], the FLIP output is [[1,6], [2,5], [3,4]].
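A minimal sketch of that flip (assuming the second column is the one being reversed, as in the example):

```python
import torch

PO = torch.tensor([[1., 4.],
                   [2., 5.],
                   [3., 6.]])
PO[:, 1] = torch.flip(PO[:, 1], dims=[0])  # reverse only the second column
print(PO)  # tensor([[1., 6.], [2., 5.], [3., 4.]])
```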

Input → C1 → C2 → FLIP → MP → Ds → output

Now, after your C2 outputs a 6x2, you FLIP and then carry on with the rest of the network. So sometimes the max pooling picks up values from the 1st column and sometimes from the 2nd (where the position has changed).

Does backpropagation follow this flip operation? As in, will the C1 and C2 weights adjust accordingly? Or maybe a better question: after the torch.flip operation (which creates a copy of the original input tensor), will backpropagation find the right targets to update?

I hope I didn’t confuse you too much. Thank you for your patience and consideration of this question.

Yes, Autograd will track the flip operation, as seen in this small example:

x = torch.randn(2, 2, requires_grad=True)
> tensor([[-0.6456,  1.6832],
          [ 0.3877,  1.0371]], requires_grad=True)

grad = torch.tensor([[1., 0.],
                     [1., 0.]])
> tensor([[1., 0.],
          [1., 0.]])

x.grad = None
y = torch.flip(x, dims=(1,))
> tensor([[ 1.6832, -0.6456],
          [ 1.0371,  0.3877]], grad_fn=<FlipBackward0>)

y.backward(grad)
x.grad
> tensor([[0., 1.],
          [0., 1.]])
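Extending the same idea to the conv stack from the question (a sketch; the layer shapes and single channels are assumptions, and for simplicity the out-of-place flip reverses both columns): gradients still reach both conv weights through the flip.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

c1 = nn.Conv2d(1, 1, kernel_size=(3, 1))
c2 = nn.Conv2d(1, 1, kernel_size=(3, 1))

x = torch.randn(1, 1, 10, 2)                 # N, C, rows, columns
out = c2(c1(x))                              # -> (1, 1, 6, 2)
out = torch.flip(out, dims=(3,))             # out-of-place flip along the columns
out = F.max_pool2d(out, kernel_size=(1, 2))  # -> (1, 1, 6, 1)
out.sum().backward()

print(c1.weight.grad is not None)  # True: gradients flow back through the flip
print(c2.weight.grad is not None)  # True
```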

Great, thank you!

Follow-up, if you can:
We add another conv layer (C3) after the max pool. Why does a ReLU activation after C3 allow this operation, while a Tanh raises an error (an in-place error), which then requires a clone?

In-place operations on the inputs to a layer are disallowed if that layer needs those inputs to calculate the gradients, which seems to be the case for tanh but not for relu.
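One way to reproduce the error (a sketch; the shapes and the specific in-place write are assumptions): tanh's backward can be computed from its saved output (the derivative is 1 - tanh(x)**2), so writing into that output in place trips autograd's version check at backward time, while a clone() gives autograd a separate tensor to write into.

```python
import torch

x = torch.randn(6, 2, requires_grad=True)

# In-place write into tanh's output: the saved output is modified,
# so backward raises the in-place RuntimeError.
y = torch.tanh(x)
y[:, 1] = torch.flip(y[:, 1], dims=[0])
try:
    y.sum().backward()
    raised = False
except RuntimeError:
    raised = True
print("in-place error raised:", raised)       # True

# clone() first: the write goes into a fresh tensor, so backward works.
x.grad = None
y = torch.tanh(x).clone()
y[:, 1] = torch.flip(y[:, 1], dims=[0])
y.sum().backward()
print("grad reached x:", x.grad is not None)  # True
```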


Thank you @ptrblck. And thank you for the countless other answers you’ve helped me with (indirectly, from other people’s posts) over the past year.
