How can I build an RNN without using nn.RNN

Hi,

I need to build an RNN (without using nn.RNN) with the following specifications:

  • It is a character-level RNN.

  • It should have 1 hidden layer.

  • It should have the following set of weights:

    • Wxh (from the input layer to the hidden layer)

    • Whh (the recurrent connection within the hidden layer)

    • Who (from the hidden layer to the output layer)

  • It should use Tanh for the hidden layer.

  • It should use softmax for the output layer.

I have implemented the code and I am using CrossEntropyLoss() as the loss function, which gives me the following error:

RuntimeError: multi-target not supported at /opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/THNN/generic/ClassNLLCriterion.c:22

Here is my code for the model:


class CharRNN(torch.nn.Module):

    def __init__(self,input_size,hidden_size,output_size, n_layers = 1):

        super(CharRNN, self).__init__()
        self.input_size  = input_size
        self.hidden_size = hidden_size
        self.n_layers    = n_layers

        self.x2h_i = torch.nn.Linear(input_size + hidden_size, hidden_size)
        self.x2h_f = torch.nn.Linear(input_size + hidden_size, hidden_size)
        self.x2h_o = torch.nn.Linear(input_size + hidden_size, hidden_size)
        self.x2h_q = torch.nn.Linear(input_size + hidden_size, hidden_size)
        self.h2o   = torch.nn.Linear(hidden_size, output_size)
        self.sigmoid = torch.nn.Sigmoid()
        self.softmax = torch.nn.Softmax()
        self.tanh    = torch.nn.Tanh()

    def forward(self, input, h_t, c_t):

        combined_input = torch.cat((input,h_t),1)

        i_t = self.sigmoid(self.x2h_i(combined_input))
        f_t = self.sigmoid(self.x2h_f(combined_input))
        o_t = self.sigmoid(self.x2h_o(combined_input))
        q_t = self.tanh(self.x2h_q(combined_input))

        c_t_next = f_t*c_t + i_t*q_t
        h_t_next = o_t*self.tanh(c_t_next)

        # project the hidden state to the output layer and apply softmax
        output = self.softmax(self.h2o(h_t_next))
        return output, h_t_next, c_t_next
    
    def initHidden(self):
        return torch.autograd.Variable(torch.zeros(1, self.hidden_size))

    def weights_init(self,model):
    
        classname = model.__class__.__name__
        if classname.find('Linear') != -1:
            model.weight.data.normal_(0.0, 0.02)
            model.bias.data.fill_(0)

And this is the code for training the model:


input_tensor  = torch.autograd.Variable(torch.zeros(seq_length,n_vocab))
target_tensor = torch.autograd.Variable(torch.zeros(seq_length,n_vocab))

model   = CharRNN(input_size = n_vocab, hidden_size = hidden_size, output_size = output_size)
model.apply(model.weights_init)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)

for i in range(n_epochs):
    print("Iteration", i)
    
    start_idx    = np.random.randint(0, n_chars-seq_length-1)
    train_data   = raw_text[start_idx:start_idx + seq_length + 1]
    
    input_tensor = torch.autograd.Variable(seq2tensor(train_data[:-1],n_vocab), requires_grad = True)
    target_tensor= torch.autograd.Variable(seq2tensor(train_data[1:],n_vocab), requires_grad = False).long()
    
    loss = 0
    
    h_t = torch.autograd.Variable(torch.zeros(1,hidden_size))
    c_t = torch.autograd.Variable(torch.zeros(1,hidden_size))
    
    for timestep in range(seq_length):
        
        output, h_t, c_t = model(input_tensor[timestep].view(1,n_vocab), h_t, c_t)
        
        loss += criterion(output,target_tensor[timestep].view(1,n_vocab))
        
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    
    x_t = input_tensor[0].view(1,n_vocab)
    h_t = torch.autograd.Variable(torch.zeros(1,hidden_size))
    c_t = torch.autograd.Variable(torch.zeros(1,hidden_size))
    
    gen_seq = []
    
    for timestep in range(100):
        output, h_t, c_t = model(x_t, h_t, c_t)
        ix = np.random.choice(range(n_vocab), p=output.data.numpy().ravel())
        x_t = torch.autograd.Variable(torch.zeros(1,n_vocab))
        x_t[0,ix] = 1
        gen_seq.append(idx2char[ix])
        
    txt = ''.join(gen_seq)
    print ('----------------------')
    print (txt)
    print ('----------------------')

Can you please help me?

Thanks in advance.

CrossEntropyLoss needs a 1-dimensional target.
From the docs:

This criterion expects a class index (0 to C-1) as the target for each value of a 1D tensor of size minibatch

Example:

batch_size = 10
n_classes = 5
data = Variable(torch.randn(batch_size, n_classes))
target = Variable(torch.LongTensor(batch_size).random_(n_classes))

criterion = nn.CrossEntropyLoss()
loss = criterion(data, target)

Hi @ptrblck

Thanks for your reply.

I checked the dimensions of the output and the target:

output  torch.Size([1, 97])
target  torch.Size([97])

How can I convert the target size to [1]?
I tried view(1,1), but that did not work.

target.unsqueeze_(0) will add a new dimension.
However, this won’t help, since you would need a single entry with values between [0, 96].
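
For example, the criterion would accept shapes like these (just an illustration with made-up values, assuming your 97 classes):

import torch
import torch.nn as nn
from torch.autograd import Variable

output = Variable(torch.randn(1, 97))       # model output for one timestep, shape [1, 97]
target = Variable(torch.LongTensor([42]))   # a single class index in [0, 96], shape [1]

loss = nn.CrossEntropyLoss()(output, target)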

Could you print again the shape of target_tensor[timestep] in the line of code where the loss is calculated?

Hi @ptrblck,

target_tensor[timestep].size() torch.Size([97])

Is it the probability of all 97 classes?
I assume n_vocab is the number of all possible classes, right?

Hi ,

Yes, it is… you are right.

The target is a one-hot encoded matrix.

Would it work if you just call torch.max on it to get the current class?
Or do you have “soft” probabilities, i.e. some other values than [0, 0, 1, 0, 0, ...]?

The target variable contains only 0s and 1s. So, can I simply apply torch.max?

Yep, you can drop the max value and just use the index:

target = torch.FloatTensor([0, 0, 1, 0, 0])
_, idx = torch.max(target, 0)
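
In your training loop this could look something like this (an untested sketch, assuming target_tensor[timestep] is the one-hot row for the current character):

_, target_idx = torch.max(target_tensor[timestep], 0)   # index of the 1 in the one-hot row
loss += criterion(output, target_idx.view(1))           # a [1] LongTensor of class indices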

Hi @ptrblck, thanks a lot for your help. You are a saviour.

However, I have a small doubt.

Do I need to pass the max index to cross entropy, or the value?

You need to pass the class index to the criterion.
If you passed the max value instead, it would be 1 all the time. :wink:

Okay… will it also be applicable for NLLLoss?

Sure, then you would have to call F.log_softmax(output) before passing it to NLLLoss or add it as a layer in your model.
CrossEntropyLoss basically combines a log_softmax with NLLLoss.
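
As a quick check, both approaches should give the same value for the same logits and target index (a small standalone example, not your exact tensors):

import torch
import torch.nn.functional as F
from torch.autograd import Variable

output = Variable(torch.randn(1, 97))          # raw scores (logits) for one timestep
target_idx = Variable(torch.LongTensor([5]))   # class index of the correct character

loss_ce = F.cross_entropy(output, target_idx)                    # CrossEntropyLoss on the logits
loss_nll = F.nll_loss(F.log_softmax(output, dim=1), target_idx)  # log_softmax + NLLLoss

print(loss_ce, loss_nll)  # both losses should match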


Hi, just one thing… I am curious how the index is used in cross entropy here.

I mean, theoretically, cross entropy takes values, right?

Could you please clear up this doubt?

Thanks a lot again :slight_smile:

The formula from the docs shows that just the class index is used.
So basically you save the extra transformation that would otherwise be needed for one-hot encoded targets.
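
Concretely, with a one-hot target the cross entropy reduces to the negative log-probability at the target index, so the index carries all the information (a small made-up example):

import torch
import torch.nn.functional as F
from torch.autograd import Variable

logits = Variable(torch.randn(1, 5))
target_idx = 2

log_probs = F.log_softmax(logits, dim=1)

# cross entropy with a class index ...
loss_index = F.cross_entropy(logits, Variable(torch.LongTensor([target_idx])))

# ... equals the negative log-probability selected by the one-hot target
one_hot = Variable(torch.zeros(1, 5))
one_hot[0, target_idx] = 1
loss_one_hot = -(one_hot * log_probs).sum()

print(loss_index, loss_one_hot)  # same value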