How can I build an RNN without using nn.RNN

Hi,

I need to build an RNN (without using nn.RNN) with the following specifications:

  • It is a character-level RNN.

  • It should have 1 hidden layer.

  • It should have the following set of weights:

    • Wxh (from the input layer to the hidden layer)

    • Whh (the recurrent connection within the hidden layer)

    • Who (from the hidden layer to the output layer)

  • It should use Tanh for the hidden layer.

  • It should use softmax for the output layer.

I have implemented the code and I am using CrossEntropyLoss() as the loss function, which gives me the following error:

RuntimeError: multi-target not supported at /opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/THNN/generic/ClassNLLCriterion.c:22

Here is my code for the model:


class CharRNN(torch.nn.Module):

    def __init__(self,input_size,hidden_size,output_size, n_layers = 1):

        super(CharRNN, self).__init__()
        self.input_size  = input_size
        self.hidden_size = hidden_size
        self.n_layers    = n_layers

        self.x2h_i = torch.nn.Linear(input_size + hidden_size, hidden_size)
        self.x2h_f = torch.nn.Linear(input_size + hidden_size, hidden_size)
        self.x2h_o = torch.nn.Linear(input_size + hidden_size, hidden_size)
        self.x2h_q = torch.nn.Linear(input_size + hidden_size, hidden_size)
        self.h2o   = torch.nn.Linear(hidden_size, output_size)
        self.sigmoid = torch.nn.Sigmoid()
        self.softmax = torch.nn.Softmax()
        self.tanh    = torch.nn.Tanh()

    def forward(self, input, h_t, c_t):

        combined_input = torch.cat((input,h_t),1)

        i_t = self.sigmoid(self.x2h_i(combined_input))
        f_t = self.sigmoid(self.x2h_f(combined_input))
        o_t = self.sigmoid(self.x2h_o(combined_input))
        q_t = self.tanh(self.x2h_q(combined_input))

        c_t_next = f_t*c_t + i_t*q_t
        h_t_next = o_t*self.tanh(c_t_next)

        # project the hidden state to the output layer and apply softmax
        output = self.softmax(self.h2o(h_t_next))
        return output, h_t_next, c_t_next
    
    def initHidden(self):
        return torch.autograd.Variable(torch.zeros(1, self.hidden_size))

    def weights_init(self,model):
    
        classname = model.__class__.__name__
        if classname.find('Linear') != -1:
            model.weight.data.normal_(0.0, 0.02)
            model.bias.data.fill_(0)

And this is the code for training the model:


input_tensor  = torch.autograd.Variable(torch.zeros(seq_length,n_vocab))
target_tensor = torch.autograd.Variable(torch.zeros(seq_length,n_vocab))

model   = CharRNN(input_size = n_vocab, hidden_size = hidden_size, output_size = output_size)
model.apply(model.weights_init)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)

for i in range(n_epochs):
    print("Iteration", i)
    
    start_idx    = np.random.randint(0, n_chars-seq_length-1)
    train_data   = raw_text[start_idx:start_idx + seq_length + 1]
    
    input_tensor = torch.autograd.Variable(seq2tensor(train_data[:-1],n_vocab), requires_grad = True)
    target_tensor= torch.autograd.Variable(seq2tensor(train_data[1:],n_vocab), requires_grad = False).long()
    
    loss = 0
    
    h_t = torch.autograd.Variable(torch.zeros(1,hidden_size))
    c_t = torch.autograd.Variable(torch.zeros(1,hidden_size))
    
    for timestep in range(seq_length):
        
        output, h_t, c_t = model(input_tensor[timestep].view(1,n_vocab), h_t, c_t)
        
        loss += criterion(output,target_tensor[timestep].view(1,n_vocab))
        
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    
    x_t = input_tensor[0].view(1,n_vocab)
    h_t = torch.autograd.Variable(torch.zeros(1,hidden_size))
    c_t = torch.autograd.Variable(torch.zeros(1,hidden_size))
    
    gen_seq = []
    
    for timestep in range(100):
        output, h_t, c_t = model(x_t, h_t, c_t)
        ix = np.random.choice(range(n_vocab), p=output.data.numpy().ravel())
        x_t = torch.autograd.Variable(torch.zeros(1,n_vocab))
        x_t[0,ix] = 1
        gen_seq.append(idx2char[ix])
        
    txt = ''.join(gen_seq)
    print ('----------------------')
    print (txt)
    print ('----------------------')

Can you please help me?

Thanks in advance.

CrossEntropyLoss needs a 1-dimensional target.
From the docs:

This criterion expects a class index (0 to C-1) as the target for each value of a 1D tensor of size minibatch

Example:

batch_size = 10
n_classes = 5
data = Variable(torch.randn(batch_size, n_classes))
target = Variable(torch.LongTensor(batch_size).random_(n_classes))

criterion = nn.CrossEntropyLoss()
loss = criterion(data, target)

Hi @ptrblck

Thanks for your reply.

I checked the dimensions of the output and the target:

output  torch.Size([1, 97])
target  torch.Size([97])

How can I convert the target size to [1]?
I tried view(1,1), but that did not work.

target.unsqueeze_(0) will add a new dimension.
However, this won’t help, since you would need a single entry with values between [0, 96].
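
For example, the criterion would accept shapes like these (just an illustration with made-up values, assuming your 97 classes):

import torch
import torch.nn as nn
from torch.autograd import Variable

output = Variable(torch.randn(1, 97))       # model output for one timestep, shape [1, 97]
target = Variable(torch.LongTensor([42]))   # a single class index in [0, 96], shape [1]

loss = nn.CrossEntropyLoss()(output, target)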

Could you print again the shape of target_tensor[timestep] in the line of code where the loss is calculated?

Hi @ptrblck,

target_tensor[timestep].size() torch.Size([97])

Is it the probability of all 97 classes?
I assume n_vocab is the number of all possible classes, right?

Hi ,

Yes, it is… you are right.

The target is a one-hot encoded matrix.

Would it work if you just call torch.max on it to get the current class?
Or do you have “soft” probabilities, i.e. some other values than [0, 0, 1, 0, 0, ...]?

The target variable contains only 0s and 1s. So, can I simply apply torch.max?

Yep, you can drop the max value and just use the index:

target = torch.FloatTensor([0, 0, 1, 0, 0])
_, idx = torch.max(target, 0)
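
In your training loop this could look something like this (an untested sketch, assuming target_tensor[timestep] is the one-hot row for the current character):

_, target_idx = torch.max(target_tensor[timestep], 0)   # index of the 1 in the one-hot row
loss += criterion(output, target_idx.view(1))           # a [1] LongTensor of class indices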

Hi @ptrblck, thanks a lot for your help. You are a saviour.

However, I have a small doubt.

Do I need to pass the max index to cross entropy, or the value?

You need to pass the class index to the criterion.
If you passed the max value instead, it would be 1 all the time. :wink:

Okay… will it also be applicable for NLLLoss?

Sure, then you would have to call F.log_softmax(output) before passing it to NLLLoss or add it as a layer in your model.
CrossEntropyLoss basically combines a log_softmax with NLLLoss.
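
As a quick check, both approaches should give the same value for the same logits and target index (a small standalone example, not your exact tensors):

import torch
import torch.nn.functional as F
from torch.autograd import Variable

output = Variable(torch.randn(1, 97))          # raw scores (logits) for one timestep
target_idx = Variable(torch.LongTensor([5]))   # class index of the correct character

loss_ce = F.cross_entropy(output, target_idx)                    # CrossEntropyLoss on the logits
loss_nll = F.nll_loss(F.log_softmax(output, dim=1), target_idx)  # log_softmax + NLLLoss

print(loss_ce, loss_nll)  # both losses should match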


Hi, just one thing… I am curious how the index is used in cross entropy here.

I mean, theoretically, cross entropy takes values, right?

Could you please clear up this doubt?

Thanks a lot again :slight_smile:

The formula from the docs shows that just the class index is used.
So basically you save the extra transformation that would otherwise be needed for one-hot encoded targets.
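
Concretely, with a one-hot target the cross entropy reduces to the negative log-probability at the target index, so the index carries all the information (a small made-up example):

import torch
import torch.nn.functional as F
from torch.autograd import Variable

logits = Variable(torch.randn(1, 5))
target_idx = 2

log_probs = F.log_softmax(logits, dim=1)

# cross entropy with a class index ...
loss_index = F.cross_entropy(logits, Variable(torch.LongTensor([target_idx])))

# ... equals the negative log-probability selected by the one-hot target
one_hot = Variable(torch.zeros(1, 5))
one_hot[0, target_idx] = 1
loss_one_hot = -(one_hot * log_probs).sum()

print(loss_index, loss_one_hot)  # same value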