Characterizing and Improving Stability in Neural Style Transfer, by Gupta, A., Johnson, J., Alahi, A., and Fei-Fei, L.
It is a Recurrent Convolutional Neural Network. The light blue box is a simple convolutional neural network, and the rest of the structure makes the network recurrent. The authors use a sequence 10 frames long, which gets unfolded over 10 steps. The network is fed the current frame and the previous stylized frame (the frame generated at the previous step).
I have a working implementation of the feedforward architecture (the light blue box in the picture) and I would like to transform it into a Recurrent Convolutional Neural Network. Unfortunately I could not find much about this topic in the PyTorch community.
I have two questions about transforming a Convolutional Neural Network into an RCNN:
How can I prepare the dataset of frames so that I can feed them to the RCNN? Should I write a Dataset class that returns a sequence of frames? (I have put a rough sketch of what I mean at the end of this post.)
How can I unfold this sequence of frames in order to use backpropagation through time? I read the PyTorch documentation and saw that I cannot use an RNN, LSTM, or GRU layer in this particular case, but that I should write the recursion myself.
I would much appreciate any suggestions, pointers, tutorials, or videos I can take a look at in order to understand this part.
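For the first question, this is roughly the Dataset I have in mind (just a sketch: the folder layout, the SEQ_LEN constant, and the way I build 'target' are my own assumptions, not taken from the papers):

import os
from PIL import Image
import torch
from torch.utils.data import Dataset
from torchvision import transforms

SEQ_LEN = 10  # length of each training sequence

class FrameSequenceDataset(Dataset):
    # Assumes one subfolder per video clip, containing frames named so that an
    # alphabetical sort puts them in temporal order, all at the same resolution.
    def __init__(self, root):
        self.to_tensor = transforms.ToTensor()
        self.samples = []
        for clip in sorted(os.listdir(root)):
            frames = sorted(os.listdir(os.path.join(root, clip)))
            # every window of SEQ_LEN consecutive frames is one training sample
            for start in range(len(frames) - SEQ_LEN + 1):
                self.samples.append([os.path.join(root, clip, f)
                                     for f in frames[start:start + SEQ_LEN]])

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        frames = [self.to_tensor(Image.open(p).convert('RGB')) for p in self.samples[idx]]
        video = torch.stack(frames, dim=1)  # (3, SEQ_LEN, H, W)
        # 'target' would be built the same way from the corresponding target frames;
        # here I just reuse the inputs as a placeholder
        return {'input': video, 'target': video}

With the default DataLoader collation this gives batches of shape (B, 3, SEQ_LEN, H, W), which is the layout I index with size(2)/size(3)/size(4) in my training loop.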
Perhaps you could summarize the contents of the papers? I'm guessing people on the forum want to help, but don't have the time to read 2 papers before answering!
total_iter = 0  # global iteration counter used for logging
# Loop over the epochs
for epoch in range(1, args.epochs + 1):
    model.train()
    epoch_loss = 0
    # Loop through the sequences of frames returned by the DataLoader
    for i, sequence in enumerate(training_data_loader):
        optimizer.zero_grad()
        loss = 0
        im_out = i % args.image_freq == 0
        # I initialize the previous prediction as a black image
        prev_est = torch.zeros(sequence['input'].size(0), 3,
                               sequence['input'].size(3),
                               sequence['input'].size(4)).cuda(args.gpu, non_blocking=True)
        # Loop over the sequence of 10 frames; at each step I pass 2 frames (t-1, t) to the network
        for j in range(sequence['input'].size(2) - 1):
            inputs = torch.index_select(sequence['input'], 2,
                                        torch.tensor([j, j + 1])).cuda(args.gpu, non_blocking=True)
            t = torch.squeeze(torch.index_select(sequence['target'], 2,
                                                 torch.tensor([j + 1])), 2).cuda(args.gpu, non_blocking=True)
            output, l = model(inputs, prev_est, t, i, writer, im_out)
            loss += l
            # Set the previous estimate to the output of the network
            prev_est = output
        epoch_loss += loss.item() / sequence['input'].size(2)
        # Here I should achieve backpropagation through time,
        # but I am not sure it is doing it correctly
        loss.backward()
        optimizer.step()
        writer.add_scalar('learning_rate', args.lr, total_iter)
        writer.add_scalar('train_loss', loss.item(), total_iter)
        print("===> Epoch[{}]({}/{}): Loss: {:.4f}".format(
            epoch, i, len(training_data_loader), loss.item()))
        total_iter += 1
    print("===> Epoch {} Complete: Avg. Loss: {:.4f}".format(
        epoch, epoch_loss / len(training_data_loader)))
The training_data_loader provides a tensor containing a sequence of 10 frames. Is the way I am doing BPTT correct?
I guess you need to create an “unrolled” network with gradients flowing through it: sum your loss at each timestep, then call backward on the summed loss after the final timestep. Then, as long as the gradients are all attached, they should theoretically flow back through “time”.
This post implies you can achieve that by using the same variable for input and output. Maybe give that a shot?
# non-truncated: backpropagate through all T timesteps
for t in range(T):
    out = model(out)
out.backward()

# truncated to the last K timesteps
for t in range(T):
    out = model(out)
    if T - t == K:
        out = out.detach()  # detach() is not in-place, so reassign to cut the graph here
out.backward()
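To make it concrete for your sequence loop, here is a self-contained toy version (a single Conv2d and random tensors standing in for your model and data, so all the names are just placeholders) showing where the graph stays attached for full BPTT and where you would detach to truncate it:

import torch
import torch.nn as nn

net = nn.Conv2d(6, 3, kernel_size=3, padding=1)   # toy stand-in for the stylization network
frames = torch.randn(4, 3, 10, 32, 32)            # (batch, channels, time, H, W)
targets = torch.randn(4, 3, 10, 32, 32)

prev_est = torch.zeros(4, 3, 32, 32)
loss = 0
for j in range(frames.size(2) - 1):
    cur = frames[:, :, j + 1]                       # frame t
    out = net(torch.cat([cur, prev_est], dim=1))    # frame t + previous estimate
    loss = loss + ((out - targets[:, :, j + 1]) ** 2).mean()
    prev_est = out            # graph stays attached -> full BPTT over the sequence
    # prev_est = out.detach() # use this instead to truncate BPTT to a single step
loss.backward()               # one backward call propagates through all attached timesteps

As far as I can tell, your loop is already doing the full-BPTT variant, since you keep prev_est = output and call loss.backward() once after the inner loop.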
I have a question about the “warp” function in PyTorch in my experiments.
If I want to warp an image using optical flow, can I use grid_sample() and affine_grid() to implement the warp function?
hi @riccardosamperna, if I want to warp image1 to image2, should the input optical flow (acquired from FlowNet) be the flow from image2 to image1 rather than from image1 to image2?
I don’t really understand your question and you should probably open another discussion, but let’s see if I can help you.
If you have image1 and image2 and you want to warp image1 towards image2, you calculate the optical flow between image1 and image2 and use it to warp image1. If you have the flow from image2 to image1, the flow from image1 to image2 is (approximately) just its opposite.
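For what it’s worth, this is roughly the warp I use, built from affine_grid() and grid_sample() (a sketch, assuming the flow is given in pixels with channel 0 = horizontal and channel 1 = vertical displacement, and that we are backward-warping, i.e. sampling image1 at image2’s pixel grid plus the flow):

import torch
import torch.nn.functional as F

def warp(img, flow):
    # img:  (B, C, H, W) image to be warped
    # flow: (B, 2, H, W) displacement field in pixels
    B, C, H, W = img.shape
    # identity sampling grid in normalized [-1, 1] coordinates, shape (B, H, W, 2)
    theta = torch.tensor([[1., 0., 0.], [0., 1., 0.]], device=img.device).repeat(B, 1, 1)
    grid = F.affine_grid(theta, img.shape, align_corners=True)
    # convert the pixel displacements to the same normalized coordinates
    flow_norm = torch.stack([flow[:, 0] / ((W - 1) / 2.0),
                             flow[:, 1] / ((H - 1) / 2.0)], dim=-1)
    return F.grid_sample(img, grid + flow_norm, align_corners=True)

Note that grid_sample() expects normalized coordinates, so if you add the raw pixel flow without the division by (W - 1) / 2 and (H - 1) / 2, the sampling grid barely moves and the warped image looks almost identical to the input.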
Thanks for your reply. I implemented the warp function in PyTorch, but when I use the optical flow generated by FlowNet 2.0, the warped image shows hardly any noticeable change. I suppose there may be a problem in the optical flow or in the warp; I will check the function carefully later.