For reference, I am trying to implement this paper: https://openreview.net/pdf?id=HkO-PCmYl. But you should not need to read it to help me.
This is my forward pass in my nn.Module:
```python
def forward(self, x):
    if self.training:
        # one random mixing coefficient per sample in the batch
        alpha = Variable(torch.rand(x.size(0)).cuda())
    else:
        alpha = Variable(torch.FloatTensor([0.5]).cuda())
    p1 = self.path1(x)
    p2 = self.path2(x)
    # reshape alpha from (N,) to (N, 1, 1, 1) so it can be expanded
    alpha = torch.unsqueeze(alpha, dim=1)
    alpha = torch.unsqueeze(alpha, dim=2)
    alpha = torch.unsqueeze(alpha, dim=3)
    alpha = alpha.expand(p1.size())
    x = alpha * p1 + (1 - alpha) * p2
    return x
```
So two questions:
- Is there a better way to handle `alpha`? All the unsqueezing and expanding seems clumsy (see the broadcasting sketch after this list).
- In the backward pass I want to compute the gradient with `alpha = 0.5` instead of the random `alpha` from the forward pass. How would I implement this? Maybe you can give a working implementation? I saw http://pytorch.org/docs/master/autograd.html#torch.autograd.Function, but I am not sure how to use it… (my rough attempt is sketched after this list).
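For the first question, the cleanest thing I can think of is to draw `alpha` with the trailing singleton dimensions already in place and rely on broadcasting instead of the unsqueeze/expand chain. This is just a sketch and assumes a PyTorch version that supports broadcasting:

```python
if self.training:
    # shape (N, 1, 1, 1): one coefficient per sample, broadcast over C, H, W
    alpha = Variable(torch.rand(x.size(0), 1, 1, 1).cuda())
else:
    alpha = Variable(torch.cuda.FloatTensor(x.size(0), 1, 1, 1).fill_(0.5))
x = alpha * p1 + (1 - alpha) * p2
```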
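For the second question, my rough attempt based on the `torch.autograd.Function` docs is below. I am assuming the static `forward`/`backward` API is the right one to use here, and that returning `None` for `alpha` (which should not receive a gradient) is correct — please tell me if not:

```python
import torch

class ShakeFunction(torch.autograd.Function):
    """Mix two paths with a random alpha in the forward pass,
    but backpropagate as if alpha were fixed at 0.5."""

    @staticmethod
    def forward(ctx, p1, p2, alpha):
        # alpha is broadcast against p1/p2, just like in my forward pass above
        return alpha * p1 + (1 - alpha) * p2

    @staticmethod
    def backward(ctx, grad_output):
        # Use 0.5 for both paths instead of the alpha from the forward pass;
        # alpha itself is random, not learned, so it gets no gradient (None).
        return 0.5 * grad_output, 0.5 * grad_output, None
```

In `forward` I would then replace the mixing line with `x = ShakeFunction.apply(p1, p2, alpha)` — is that the intended usage?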