Representing textual labels in a CNN


I’m trying to build a text classifier. The goal is to assign each sentence of a Wikipedia page to one of a set of classes. I have 4 classes, which I named “other”, “eats”, “habitat” and “lifespan”. I was wondering how these labels should be turned into integers. I have a word-to-index dictionary that maps each word in my Wikipedia training text to a unique index. Should I encode the labels with that same dictionary, or can I just label them 0, 1, 2, 3 (which is what I’m currently doing)? The problem I’m facing with this scheme is that my loss becomes NaN after an epoch. I’m not sure the labeling is the problem; it’s just a hunch. Please tell me if I’m going wrong anywhere else.

I believe just labelling them 0,1,2,3 is the typical approach.
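A minimal sketch of that mapping, assuming the four class names from the question. The names `CLASSES`, `label_to_index`, `index_to_label` and `encode_labels` are hypothetical, and the ordering of the classes is arbitrary. The dictionary is kept separate from the word-to-index vocabulary, since labels only select an output unit and never pass through the embedding layer.

```python
# Class names taken from the question; the ordering is an arbitrary choice.
CLASSES = ["other", "eats", "habitat", "lifespan"]

# Kept separate from the word-to-index vocabulary: labels index output units only.
label_to_index = {name: i for i, name in enumerate(CLASSES)}
index_to_label = {i: name for name, i in label_to_index.items()}


def encode_labels(labels):
    """Map a list of class names to integer class indices."""
    return [label_to_index[name] for name in labels]


encoded = encode_labels(["habitat", "other", "lifespan"])
print(encoded)  # [2, 0, 3]
```

Keeping the reverse mapping around makes it easy to turn a predicted index back into a readable class name.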

What do you use for the last model layer, and what loss function do you use?

In the literature, classification models typically use a softmax layer as the last layer together with a cross-entropy loss. In PyTorch this can be achieved in two ways.

  1. Use nn.LogSoftmax for the last layer together with nn.NLLLoss. The targets are still the integer class indices 0, 1, 2, 3; nn.NLLLoss does not take one-hot vectors such as [1,0,0,0].
  2. Simply use nn.CrossEntropyLoss without the nn.LogSoftmax layer and provide target values as 0, 1, 2, 3.
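A small check of how the two options relate, assuming a recent PyTorch release. The logits and targets here are made-up values for a batch of 3 sentences and 4 classes, and in this sketch both losses receive the targets as integer class indices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical raw scores (logits): 3 sentences, 4 classes.
logits = torch.randn(3, 4)
# Targets are integer class indices of dtype int64 (LongTensor).
targets = torch.tensor([0, 3, 1])

# Option 1: explicit log-softmax followed by negative log-likelihood.
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_nll = nn.NLLLoss()(log_probs, targets)

# Option 2: cross-entropy applied directly to the raw logits.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# The two losses agree up to floating-point error.
same = torch.allclose(loss_nll, loss_ce)
print(same)
```

Applying a softmax or log-softmax layer and then nn.CrossEntropyLoss on top of it would normalize twice, so the model in option 2 should return raw logits.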

If you are already doing one of these, then the next thing to try is to reduce the learning rate, though that might simply delay the loss becoming NaN.

If that fails then I wouldn’t know what to suggest without seeing any code.

def __init__(self):
    self.embed = nn.Embedding(self.voacb_size + 1, self.embedding_length)
    self.conv1=nn.ModuleList([nn.Conv2d(in_channels=self.in_channel,out_channels=self.num_filters,kernel_size=(x,self.embedding_length)) for x in self.filter_sizes])

def forward(self,input_x):
    x = self.embed(input_x)
    x = [F.relu(conv(x)).squeeze(3) for conv in self.conv1]
    x = [F.max_pool1d(i, i.size(2)).squeeze(2) for i in x]
    x = torch.cat(x, 1)
    x = self.dropout(x)  # (batch_size,len(kernel_sizes)*num_kernels)
    logit = self.fc(x)  # (batch_size, num_aspects)
    return logit

This is what I’m doing. I believe I am using cross_entropy as my loss function; it is defined in my train function. Please let me know if you want to have a look at it.

I can’t see anything obviously wrong with it. Assuming your input data is properly cleaned, then I guess there must be some latent numerical instability in your model, though I can’t see where.

What would I need to do to clean the data? My vocabulary is pretty small, and I have a unique index for each word. Is there anything else that I need to be doing? Also, the surprising thing is that the loss grows from 500 to a really big value in 3 epochs and then starts returning NaN.

def train_model(X,y,epochs=100):
    model = CNN()
    #loss_fn = F.cross_entropy()
    loss_accumulated = 0
    #trainloader = torch.utils.data.DataLoader(..., batch_size=32, shuffle=True, num_workers=8)
    optimizer = torch.optim.SGD(model.parameters(),lr=0.05,momentum=0.01)
    for i in range(epochs):
        for data,label in zip(X,y):
            data, target = Variable(torch.LongTensor([data])).cuda(), Variable(torch.LongTensor([label])).cuda()
        if i%10==0:
            print("Sleeping ...")
        print("Epoch is "+ str(i))
        print(loss_accumulated)
    torch.save(..., "CNN_params.pkl")

This is my training code.

That is definitely a good start. Are your sentences all of good quality and your labels accurate?
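One way to check the encoded data before training, as a sketch. `check_dataset` is a hypothetical helper; it assumes `X` is a list of word-index lists, `y` is a list of integer labels, and that the embedding is built as nn.Embedding(vocab_size + 1, ...) as in the model above, so valid word indices run from 0 to vocab_size.

```python
def check_dataset(X, y, vocab_size, num_classes):
    """Return a list of human-readable problems found in the encoded data."""
    problems = []
    if len(X) != len(y):
        problems.append(f"{len(X)} sentences but {len(y)} labels")
    for n, (sentence, label) in enumerate(zip(X, y)):
        if len(sentence) == 0:
            problems.append(f"sentence {n} is empty")
        # An embedding with vocab_size + 1 rows accepts indices 0..vocab_size.
        if any(idx < 0 or idx > vocab_size for idx in sentence):
            problems.append(f"sentence {n} has an out-of-range word index")
        # Labels must be valid class indices 0..num_classes - 1.
        if not 0 <= label < num_classes:
            problems.append(f"sentence {n} has invalid label {label}")
    return problems


# Made-up example data: sentence 1 has a bad index and label, sentence 2 is empty.
X = [[1, 2, 3], [4, 9], []]
y = [0, 5, 1]
problems = check_dataset(X, y, vocab_size=8, num_classes=4)
for p in problems:
    print(p)
```

These checks are cheap and only rule out encoding mistakes; they say nothing about whether the labels are semantically right for each sentence.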

That is really not normal. The loss shouldn’t grow in the first epoch, and yet I can’t see any glaring errors in the code you have shown.

Maybe there are too few units in your model.
Maybe the learning rate is far too high.
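The posted loop does not show the forward pass, the loss computation, or the optimizer step, so the following is only a sketch of what a complete step usually looks like, not a reconstruction of your code. The tiny model and random data are stand-ins assumed for illustration, and the learning rate of 0.005 is just an example of a smaller value than the 0.05 in the post. It clears the gradients before every step, resets the accumulated loss each epoch, and stops as soon as the loss is no longer finite.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in model and data, assumed for illustration only; the real CNN goes here.
model = nn.Sequential(nn.Embedding(11, 8), nn.Flatten(), nn.Linear(5 * 8, 4))
X = torch.randint(0, 11, (16, 5))  # 16 sentences of 5 word indices each
y = torch.randint(0, 4, (16,))     # integer class labels 0..3

# A smaller learning rate than the 0.05 used in the question.
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)

epoch_losses = []
for epoch in range(3):
    loss_accumulated = 0.0         # reset every epoch so epochs are comparable
    for data, target in zip(X, y):
        optimizer.zero_grad()      # clear gradients left over from the last step
        logit = model(data.unsqueeze(0))                    # shape (1, 4)
        loss = F.cross_entropy(logit, target.unsqueeze(0))  # integer target
        if not torch.isfinite(loss):
            raise RuntimeError(f"non-finite loss in epoch {epoch}")
        loss.backward()
        optimizer.step()
        loss_accumulated += loss.item()
    # Report the mean loss per sentence rather than the running sum.
    epoch_losses.append(loss_accumulated / len(X))
print(epoch_losses)
```

Reporting the mean loss per sentence instead of a sum makes the number independent of the dataset size; for 4 classes an untrained model should start near ln(4) ≈ 1.39, so a value like 500 is presumably a sum over many sentences.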

I’m at a loss to know what else to suggest.