LSTM Loss of zero

Why is my LSTM giving a loss of zero? I am trying to train it on a simple task: predicting the next word from a context of two words, with each word represented as a one-hot (bag-of-words) vector.
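
To make the intended setup clearer, here is a minimal sketch on a toy three-word vocabulary (the names and sizes are illustrative only, not my real data): each context word is one-hot encoded, the LSTM reads the two vectors one time step at a time, and the output of the second step is scored against the index of the target word.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy vocabulary, for illustration only.
toy_vocab = ["when", "forty", "winters"]
toy_ix = {w: i for i, w in enumerate(toy_vocab)}

def toy_one_hot(word):
    # One-hot vector of length len(toy_ix).
    v = torch.zeros(len(toy_ix))
    v[toy_ix[word]] = 1.0
    return v

lstm = nn.LSTM(input_size=len(toy_vocab), hidden_size=len(toy_vocab))
hidden = (torch.zeros(1, 1, len(toy_vocab)), torch.zeros(1, 1, len(toy_vocab)))

context, target = ["when", "forty"], "winters"
out = None
for w in context:
    # One word per time step: input shape is (seq_len=1, batch=1, input_size).
    out, hidden = lstm(toy_one_hot(w).view(1, 1, -1), hidden)

# Score the second step's output against the target word's index.
log_probs = F.log_softmax(out.view(1, -1), dim=1)
loss = F.nll_loss(log_probs, torch.tensor([toy_ix[target]]))
print(loss.item())

My actual code is below.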

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

test_sentence = """When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a totter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.
How much more praise deserv'd thy beauty's use,
If thou couldst answer 'This fair child of mine
Shall sum my count, and make my old excuse,'
Proving his beauty by succession thine!
This were to be new made when thou art old,
And see thy blood warm when thou feel'st it cold.China, officially the People's Republic of China (PRC), is a country in East Asia and the world's most populous country, with a population of around 1.404 billion.[10] Covering approximately 9,600,000 square kilometers (3,700,000 sq mi), it is the third- or fourth-largest country by total area.[k][16] Governed by the Communist Party of China, the state exercises jurisdiction over 22 provinces, five autonomous regions, four direct-controlled municipalities (Beijing, Tianjin, Shanghai, and Chongqing), and the special administrative regions of Hong Kong and Macau.

China emerged as one of the world's earliest civilizations, in the fertile basin of the Yellow River in the North China Plain. For millennia, China's political system was based on hereditary monarchies, or dynasties, beginning with the semi-legendary Xia dynasty in 21st century BCE.[17] Since then, China has expanded, fractured, and re-unified numerous times. In the 3rd century BCE, the Qin reunited core China and established the first Chinese empire. The succeeding Han dynasty, which ruled from 206 BC until 220 AD, saw some of the most advanced technology at that time, including papermaking and the compass,[18] along with agricultural and medical improvements. The invention of gunpowder and movable type in the Tang dynasty (618–907) and Northern Song (960–1127) completed the Four Great Inventions. Tang culture spread widely in Asia, as the new Silk Route brought traders to as far as Mesopotamia and Horn of Africa.[19] Dynastic rule ended in 1912 with the Xinhai Revolution, when a republic replaced the Qing dynasty. The Chinese Civil War resulted in a division of territory in 1949, when the Communist Party of China established the People's Republic of China, a unitary one-party sovereign state on Mainland China, while the Kuomintang-led government retreated to the island of Taiwan. The political status of Taiwan remains disputed.

Since the introduction of economic reforms in 1978, China's economy has been one of the world's fastest-growing with annual growth rates consistently above 6 percent.[20] According to the World Bank, China's GDP grew from $150 billion in 1978 to $12.24 trillion by 2017.[21] Since 2010, China has been the world's second-largest economy by nominal GDP[22] and since 2014, the largest economy in the world by purchasing power parity (PPP).[23] China is also the world's largest exporter and second-largest importer of goods.[24] China is a recognized nuclear weapons state and has the world's largest standing army and second-largest defense budget.[25][26] The PRC is a permanent member of the United Nations Security Council as it replaced the ROC in 1971, as well as an active global partner of ASEAN Plus mechanism. China is also a leading member of numerous formal and informal multilateral organizations, including the Shanghai Cooperation Organization (SCO), WTO, APEC, BRICS, the BCIM, and the G20. In recent times, scholars have argued that it will soon be a world superpower, rivaling the United States.""".split()


vocab = set(test_sentence)
word_to_ix = {word: i for i, word in enumerate(vocab)}

# Build (two-word context, target word) training examples from consecutive tokens.
trigrams = [[[test_sentence[i], test_sentence[i + 1]], test_sentence[i + 2]]
            for i in range(len(test_sentence) - 2)]

def one_hot(word, word_to_ix):
    # One-hot encode `word` as a row vector of length len(word_to_ix).
    vector = torch.zeros(len(word_to_ix), dtype=torch.long)
    vector[word_to_ix[word]] += 1
    return vector.view(1, -1)

class NGramLanguageModeler(nn.Module):

    def __init__(self):
        super(NGramLanguageModeler, self).__init__()

        # 369 is hard-coded and must match the vocabulary size (len(vocab)).
        self.lstm = nn.LSTM(369, 369)
        # Initial (hidden, cell) state, stored on the module and updated in forward.
        self.hidden = (torch.randn(1, 1, 369),
                       torch.randn(1, 1, 369))

    def forward(self, inputs):
        # Feed the two one-hot context vectors through the LSTM one step at a time,
        # carrying self.hidden forward between steps and between calls.
        tensor = inputs[0].float()
        _, self.hidden = self.lstm(tensor.view(1, 1, -1), self.hidden)
        tensor = inputs[1].float()
        out, self.hidden = self.lstm(tensor.view(1, 1, -1), self.hidden)

        return F.log_softmax(out)
        
loss_function = nn.NLLLoss()
model = NGramLanguageModeler()
optimizer = optim.SGD(model.parameters(), lr=0.001)

losses=[]

for epoch in range(1):
    
    total_loss = 0
    
    for context, target in trigrams:

        # Step 1. Prepare the inputs to be passed to the model (i.e., turn the two
        # context words into one-hot float tensors)
        tensor_1 = one_hot(context[0], word_to_ix).float()
        tensor_2 = one_hot(context[1], word_to_ix).float()

        # Step 2. Recall that torch *accumulates* gradients. Before passing in a
        # new instance, you need to zero out the gradients from the old
        # instance
        model.zero_grad()

        # Step 3. Run the forward pass, getting log probabilities over next
        # words
        log_probs = model([tensor_1,tensor_2])

        # Step 4. Compute your loss function. (Again, Torch wants the target
        # word wrapped in a tensor)
        print(log_probs)
        print(tensor_1)
        print(tensor_2)
        loss = loss_function(log_probs.view(1,-1),torch.tensor([word_to_ix[target]], dtype=torch.long))

        # Step 5. Do the backward pass and update the gradient
        loss.backward(retain_graph=True)
        optimizer.step()

        # Get the Python number from a 1-element Tensor by calling tensor.item()
        print(loss.item())
        total_loss += loss.item()

    losses.append(total_loss)