Neural Style Transfer on videos


#1

I would like to implement an architecture similar to this:

Characterizing and Improving Stability in Neural Style Transfer, Gupta, A. and Johnson, J. and Alahi, A. and Fei-Fei, L.

It is a recurrent convolutional neural network. The light blue box is a simple convolutional neural network, and the rest of the structure makes the network recurrent. The authors use sequences of 10 frames, unrolled over 10 steps. At each step the network is fed the current frame and the previous stylized frame (the frame generated at the previous step).
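To make the recurrence concrete, here is a minimal sketch of one way to feed the current frame together with the previous stylized frame. The class `RecurrentStylizer`, the helper `stylize_sequence`, the layer sizes, and the choice to concatenate the two inputs along the channel axis are all assumptions for illustration, not the architecture from the paper.

```python
import torch
import torch.nn as nn


class RecurrentStylizer(nn.Module):
    """Hypothetical stand-in for the feedforward net (the light blue box)."""

    def __init__(self, channels=3, hidden=16):
        super().__init__()
        # Input has 2 * channels: current frame + previous stylized frame.
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, frame, prev_stylized):
        # Assumption: the two inputs are joined along the channel axis.
        return self.net(torch.cat([frame, prev_stylized], dim=1))


def stylize_sequence(model, frames):
    """frames: (batch, time, channels, height, width); returns the same shape."""
    prev = torch.zeros_like(frames[:, 0])  # no previous output at t = 0
    outputs = []
    for t in range(frames.size(1)):
        prev = model(frames[:, t], prev)  # output becomes the next step's input
        outputs.append(prev)
    return torch.stack(outputs, dim=1)


model = RecurrentStylizer()
frames = torch.rand(2, 10, 3, 32, 32)  # a batch of two 10-frame clips
stylized = stylize_sequence(model, frames)
```

Because `prev` is never detached, the autograd graph spans all 10 steps, which is what backpropagation through time needs.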

I have a working implementation of the feedforward architecture (the light blue box in the picture) and I would like to turn it into a recurrent convolutional neural network. Unfortunately, I could not find much on the topic in the PyTorch community.

I have two questions about turning a convolutional neural network into an RCNN:

  1. How can I prepare the dataset of frames in order to feed them to the RCNN? Should I make a Dataset class that returns a sequence of frames?

  2. How can I unroll this sequence of frames in order to use backpropagation through time? I read the PyTorch documentation and saw that I cannot use an RNN, LSTM, or GRU layer in this particular case, so I have to write the recursion myself.
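For the first question, one possible approach is a `Dataset` whose items are whole clips rather than single frames. The sketch below assumes the video is already loaded as one tensor; the class name `FrameSequenceDataset` and the `seq_len` and `stride` parameters are hypothetical.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class FrameSequenceDataset(Dataset):
    """Hypothetical dataset that slices one video into fixed-length clips."""

    def __init__(self, video, seq_len=10, stride=10):
        # video: (num_frames, channels, height, width)
        self.video = video
        self.seq_len = seq_len
        # Start index of every clip that fits entirely inside the video.
        self.starts = list(range(0, video.size(0) - seq_len + 1, stride))

    def __len__(self):
        return len(self.starts)

    def __getitem__(self, idx):
        start = self.starts[idx]
        # Returns a clip of shape (seq_len, channels, height, width).
        return self.video[start:start + self.seq_len]


video = torch.rand(45, 3, 16, 16)  # random data standing in for real frames
dataset = FrameSequenceDataset(video, seq_len=10, stride=10)
loader = DataLoader(dataset, batch_size=2, shuffle=True)
batch = next(iter(loader))  # shape: (batch, seq_len, channels, height, width)
```

Shuffling happens at the clip level, so the frame order inside each clip is preserved. A stride smaller than `seq_len` would give overlapping clips.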

I would much appreciate any suggestions, pointers, tutorials, or videos that could help me understand this part.


(Duane Nielsen) #2

Perhaps you could summarize the contents of the papers? I’m guessing people on the forum want to help, but don’t have the time to read 2 papers before answering!


#3

Thanks for your suggestion, I edited my question. I hope it is now clear what I would like to implement.


#4

At the moment I have something like this:

# Loop for the epochs
for epoch in range(1, args.epochs + 1):
        model.train()
        epoch_loss = 0
        # Loop through sequences of frames returned by the Dataloader
        for i, sequence in enumerate(training_data_loader):
            optimizer.zero_grad()
            loss = 0
            im_out = i % args.image_freq == 0

            # I initialize the previous prediction as a black image
            prev_est = torch.zeros(sequence['input'].size(0), 3, sequence['input'].size(3), sequence['input'].size(4)).cuda(args.gpu, non_blocking=True)
            # Loop over the sequence of 10 frames, I pass to the network 2 frames (t-1, t)
            for j in range(sequence['input'].size(2) - 1):
                inputs = torch.index_select(sequence['input'], 2, torch.tensor([j,j+1])).cuda(args.gpu, non_blocking=True)
                t = torch.squeeze(torch.index_select(sequence['target'], 2, torch.tensor([j+1])), 2).cuda(args.gpu, non_blocking=True)
                output, l = model(inputs, prev_est, t, i, writer, im_out)
                loss += l 
                # Set the previous estimate as the output from the network
                prev_est = output   
            
            epoch_loss += loss.item() / sequence['input'].size(2)
            # This should perform backpropagation through time, but I am not sure it does so correctly
            loss.backward()
            optimizer.step()

            writer.add_scalar('learning_rate', args.lr , total_iter)
            writer.add_scalar('train_loss', loss.item(), total_iter)

            print("===> Epoch[{}]({}/{}): Loss: {:.4f}".format(epoch, i, len(training_data_loader), loss.item()))
            total_iter += 1

        print("===> Epoch {} Complete: Avg. Loss: {:.4f}".format(epoch, epoch_loss / len(training_data_loader)))

The training_data_loader provides a tensor containing a sequence of 10 frames. Is this the correct way to do BPTT?


(Duane Nielsen) #5

Yeah, tough question but an interesting one.

I guess you need to create an “unrolled” network with gradients flowing through it: sum your loss over the timesteps, then call backward() on the summed loss after the final timestep. As long as each output stays attached to the graph when it is fed back in, the gradients should, in theory, flow back through “time”.

This post implies you can achieve that by using the same variable for input and output. Maybe give that a shot?

# non-truncated
for t in range(T):
    out = model(out)
out.backward()

# truncated to the last K timesteps
for t in range(T):
    if T - t == K:
        # detach() is not in-place: reassign to cut the graph before the last K steps
        out = out.detach()
    out = model(out)
out.backward()
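The snippet above is pseudocode. A small runnable check of the same idea might look like the sketch below, which uses a toy `nn.Linear` in place of the real network and calls `.sum()` because `backward()` needs a scalar. The names `x_full` and `x_trunc` and the values of `T` and `K` are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 4)  # toy stand-in for the recurrent step
T, K = 6, 2

# Non-truncated: the graph covers all T steps, so the gradient
# reaches the initial input.
x_full = torch.rand(1, 4, requires_grad=True)
out = x_full
for t in range(T):
    out = model(out)
out.sum().backward()

# Truncated: cut the graph so that only the last K steps remain attached.
x_trunc = torch.rand(1, 4, requires_grad=True)
out = x_trunc
for t in range(T):
    if T - t == K:
        out = out.detach()  # detach() returns a new tensor; reassign it
    out = model(out)
out.sum().backward()

reached_full = x_full.grad is not None
reached_trunc = x_trunc.grad is not None  # False: the graph was cut
```

In the truncated run the initial input never receives a gradient, while the model parameters still do, from the last `K` steps only.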

Correct way to do backpropagation through time?


#6

Thanks for your reply, I found the conversation you posted in an older forum post and I tried to follow it.

I have posted some code of my training procedure so far. The network is training, but I have the feeling that the gradient is not propagated through time.


(Duane Nielsen) #7

Cool, you can use hooks to inspect your gradients.
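As a rough sketch of that idea, `Tensor.register_hook` can record the gradient norm arriving at each step's output. The toy `nn.Conv2d` model, the tensor shapes, and the `make_hook` helper are assumptions for illustration. The loss here depends only on the final output, so any gradient seen at an earlier step must have travelled back through the recurrence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy recurrent step: 3 channels of current frame + 3 of previous output.
model = nn.Conv2d(6, 3, kernel_size=3, padding=1)
frames = torch.rand(1, 4, 3, 8, 8)  # (batch, time, channels, height, width)

grad_norms = {}


def make_hook(step):
    # Returns a hook that stores the norm of the gradient w.r.t. this output.
    def hook(grad):
        grad_norms[step] = grad.norm().item()
    return hook


prev = torch.zeros(1, 3, 8, 8)
for t in range(frames.size(1)):
    prev = model(torch.cat([frames[:, t], prev], dim=1))
    prev.register_hook(make_hook(t))

# Loss on the final step only.
loss = prev.pow(2).mean()
loss.backward()

for step in sorted(grad_norms):
    print("step", step, "grad norm", grad_norms[step])
```

If an output were detached somewhere in the loop, the hooks for the steps before that point would never fire and those keys would be missing from `grad_norms`.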


(Qing En) #8

I have a question about the “warp” function in PyTorch for my experiments.
If I want to warp an image using optical flow, could I use grid_sample() and affine_grid() to implement the warp?
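A possible sketch of such a warp with `torch.nn.functional.grid_sample` is shown below. `affine_grid` builds sampling grids from affine matrices, so for a per-pixel flow field the grid is constructed by hand here. The function name `warp`, the flow layout (channel 0 is the horizontal displacement, channel 1 the vertical one, both in pixels), and the border padding are assumptions.

```python
import torch
import torch.nn.functional as F


def warp(image, flow):
    """Backward-warp image with a dense flow field.

    image: (N, C, H, W); flow: (N, 2, H, W) in pixels (assumed layout).
    Output pixel (x, y) is sampled from image at (x + dx, y + dy).
    """
    n, _, h, w = image.shape
    xs = torch.arange(w, dtype=image.dtype, device=image.device).view(1, 1, w)
    ys = torch.arange(h, dtype=image.dtype, device=image.device).view(1, h, 1)
    x = xs + flow[:, 0]  # (N, H, W) absolute sampling positions
    y = ys + flow[:, 1]
    # grid_sample expects coordinates in [-1, 1]; with align_corners=True
    # the values -1 and 1 map to the centers of the corner pixels.
    x = 2.0 * x / max(w - 1, 1) - 1.0
    y = 2.0 * y / max(h - 1, 1) - 1.0
    grid = torch.stack([x, y], dim=-1)  # (N, H, W, 2), last dim is (x, y)
    return F.grid_sample(image, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)


image = torch.rand(1, 3, 5, 7)
identity = warp(image, torch.zeros(1, 2, 5, 7))  # zero flow: unchanged image

shift = torch.zeros(1, 2, 5, 7)
shift[:, 0] = 1.0  # sample one pixel to the right everywhere
shifted = warp(image, shift)
```

Since `grid_sample` is differentiable with respect to both the image and the grid, a warp like this can sit inside a training loss.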