I’d like to implement something similar to KeypointNet’s multi-view consistency and rotation losses. The gist is as follows:

Let M be our model, I an image, and T a transformation matrix. For the multi-view consistency loss, we have an affine transformation matrix T and want to impose equivariance on the model with respect to T: M(T*I) == T*M(I).

Similarly, for the rotation loss, given M(T*I) and M(I), we estimate T with T_ and enforce T == T_.

The pseudo code I’ve implemented looks as follows:

```
optim.zero_grad()
y_1 = M(I)        # first forward pass
y_2 = M(T*I)      # second forward pass
mvc_loss = ||T*y_1 - y_2||^2
T_ = estimate_transformation(y_1, y_2)
rot_loss = ||T - T_||^2
loss = mvc_loss + rot_loss
loss.backward()   # single backward through both forward passes
optim.step()
```
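To make the step above concrete, here is a runnable toy version. The linear “model”, the rotation matrix standing in for T, and the use of an identity warp on the image are all placeholder assumptions, not the real keypoint network:

```python
import math
import torch

torch.manual_seed(0)

# Placeholder "model": maps a flattened 4x4 image to K 2-D keypoints.
# In the real setting M would be the keypoint network.
K, D = 8, 2
M = torch.nn.Linear(16, K * D)
optim = torch.optim.Adam(M.parameters(), lr=1e-3)

def keypoints(img):
    return M(img.flatten()).view(K, D)

# A known 2-D rotation T acting on keypoint coordinates. In the real
# pipeline T*I would warp the image; here we skip the warp for brevity.
c, s = math.cos(0.3), math.sin(0.3)
T = torch.tensor([[c, -s], [s, c]])

I = torch.randn(4, 4)
I_t = I  # placeholder for the warped image T*I

optim.zero_grad()
y_1 = keypoints(I)    # first forward pass
y_2 = keypoints(I_t)  # second forward pass

# Multi-view consistency: T applied to y_1 should match y_2.
mvc_loss = ((y_1 @ T.T - y_2) ** 2).mean()

# Rotation loss: least-squares estimate of T from the two keypoint sets.
T_ = torch.linalg.lstsq(y_1, y_2).solution.T
rot_loss = ((T - T_) ** 2).mean()

loss = mvc_loss + rot_loss
loss.backward()       # single backward through both forward passes
optim.step()
```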

I’ve seen the threads Multiple forward before backward call, How to implement accumulated gradient in pytorch (i.e. iter_size in caffe prototxt) and How to implement accumulated gradient？, regarding calling multiple forward passes before backpropagating. However, they all call backward() directly after each forward call, whereas I need to call forward twice before I can call backward, since the loss depends on both forward passes. I’ve been getting NaNs and am wondering whether I’m handling this correctly.
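As I understand it, PyTorch’s anomaly detection can pinpoint the operation that first produces a NaN during backward; a minimal usage sketch (toy tensors, not my actual model):

```python
import torch

# Inside this context manager, backward() raises an error at the exact
# op whose gradient first becomes NaN, instead of propagating it silently.
with torch.autograd.detect_anomaly():
    x = torch.randn(3, requires_grad=True)
    loss = (x ** 2).sum()
    loss.backward()
```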

Can anybody confirm that what I’m doing is correct? If it isn’t could someone indicate what the correct way would be?

Thanks!