Backpropagating through a network with an RNN at its end


I implemented a network containing a CNN layer, an FC layer and a bidirectional GRU, and I am seeing behavior that I don’t understand. Here is my code snippet:

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.optim as optim

class DummyNet(nn.Module):
	def __init__(self):
		super(DummyNet, self).__init__()
		self.cnn = nn.Conv2d(1, 3, 3)
		self.fc = nn.Linear(192, 3)
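		# 2-layer bidirectional GRU: input size 3, hidden size 5 (matches h0's 2 * 2 = 4 first dim)
		self.rnn = nn.GRU(3, 5, 2, batch_first=True, dropout=1, bidirectional=True)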
		self.h0 = Variable(torch.zeros((4, 1, 5)))
	def forward(self, input):
		# Flatten the conv output: 3 channels, each (H - 2) x (W - 2) after the 3x3 kernel
		xc = self.cnn(input).view(-1, (input.size(2) - 2) * (input.size(3) - 2) * 3)
		xf = self.fc(xc)
		xs = xf.view(xf.size(0), 1, xf.size(1))
		x, self.h0 = self.rnn(xs, self.h0)
		return x

net = DummyNet()
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
inputs = torch.randn((1, 1, 10, 10))
preoutputs = net(inputs)
labels = torch.zeros(preoutputs.size())
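loss = criterion(preoutputs, labels)
loss.backward()  # populate .grad on the parameters before inspecting them
for param in net.parameters():
	if param.grad is not None and param.grad.sum() != 0:
		print('No zero')
	elif param.grad is None:
		print('None')
	else:
		print('Zero')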


I declare my network, which is a sequence of a conv layer, a fully connected layer and the bidirectional GRU. Thus, the loss should backpropagate through all these layers and produce gradients for all of their parameters. However, the prints indicate that most of the parameters (I believe all of them up to the RNN) have a gradient of 0. If I modify the network and use another fully connected layer instead of the RNN, then the gradients are nonzero. Does anybody know why?
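In case it helps with reproducing this, a more informative check than the prints above would be to list every parameter by name together with its gradient norm, which shows exactly where the gradient vanishes. This is only a minimal sketch: `grad_report` is a hypothetical helper name of mine, and the small `nn.Sequential` model is a stand-in used to keep the example short, not my actual network.

```python
import torch
import torch.nn as nn


def grad_report(module):
    """Map each parameter name to its gradient norm (None if no gradient exists)."""
    report = {}
    for name, param in module.named_parameters():
        report[name] = None if param.grad is None else param.grad.norm().item()
    return report


# Stand-in model (an assumption for illustration, not the DummyNet above).
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.Tanh(), nn.Linear(3, 2))

outputs = model(torch.randn(5, 4))
loss = nn.MSELoss()(outputs, torch.zeros_like(outputs))
loss.backward()  # gradients only exist after this call

report = grad_report(model)
for name, norm in report.items():
    print(name, norm)
```

Calling `grad_report(net)` after `loss.backward()` on the network from my snippet should then show which named parameters (for example `cnn.weight` or `fc.weight`) end up with a zero norm.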

Thank you in advance!