[Noob] Model with Embedding bag not learning, problem with my model?

Hi,

Apologies if my model itself is downright trash; please correct me in that case. I am just learning PyTorch.
So, I am trying to do sentiment analysis on the IMDB dataset: 1-grams, sending the word indices to an EmbeddingBag, then a single fc layer.


import torch as t
import torch.nn as nn
import torch.nn.functional as F

class ffn(nn.Module):
    def __init__(self):
        super().__init__()
        # 1000-word vocabulary, 20-dim embeddings, averaged per sample (default mode='mean')
        self.embedBag = nn.EmbeddingBag(1000, 20, sparse=True)
        self.fc = nn.Linear(20, 1)

    def forward(self, x):
        x = self.embedBag(x)
        y = self.fc(x)
        return t.sigmoid(y)

 
mod = ffn()
# using batch_size 1 so that no padding is required
imdb_data = t.utils.data.DataLoader(imdb_ds(train_x, train_y), batch_size=1)
optim = t.optim.Adagrad(mod.parameters(), lr=0.001)

loss_ls = []
for i, (x, y) in enumerate(imdb_data):
    optim.zero_grad()
    ybar = mod(x)
    # target must be float and match the output shape (1, 1)
    loss = F.binary_cross_entropy(ybar, y.float().view(-1, 1))
    loss_ls.append(loss.item())
    loss.backward()
    optim.step()
    if i % 5000 == 0: print(loss)

Is there a problem with the loss? Is the sigmoid causing the issue? My loss is just stuck around 0.7 and fluctuates randomly.

It’s usually better to remove the sigmoid and use nn.BCEWithLogitsLoss or F.binary_cross_entropy_with_logits.
However, your model might also be too small, so I would recommend adding a ReLU and another linear layer to it.
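Something like this minimal sketch of what I mean (the hidden size of 64 is just an arbitrary assumption, and I kept the rest of your setup unchanged):

import torch as t
import torch.nn as nn

class ffn(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedBag = nn.EmbeddingBag(1000, 20, sparse=True)
        self.fc1 = nn.Linear(20, 64)   # hidden size 64 is an arbitrary choice
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x):
        x = self.embedBag(x)
        x = t.relu(self.fc1(x))
        return self.fc2(x)             # return the raw logits, no sigmoid

mod = ffn()
criterion = nn.BCEWithLogitsLoss()
# inside the training loop:
# loss = criterion(mod(x), y.float().view(-1, 1))

The loss consumes the raw logits directly; you only apply the sigmoid afterwards if you need probabilities (e.g. to compute the accuracy).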
Let me know if that helps. :slight_smile:


Thanks for your suggestion. It really helped :relaxed:

2 questions, if you don’t mind:

  1. I used binary cross entropy with logits. But when I wrote my function to compute accuracy, I had to take the sigmoid of the model’s output. So, isn’t it better to put the sigmoid or softmax directly in the model itself? When does the reverse become more useful?
  2. Why have functions like sigmoid and tanh been moved from nn.functional to torch.sigmoid and torch.tanh? Having them inside nn.functional makes more sense to me intuitively, since functions live inside functional. Pretty sure team PyTorch has better reasons than mine :stuck_out_tongue:
  1. Yes, you could use a sigmoid to get the probabilities and calculate the accuracy using a threshold. Alternatively, you could also apply a threshold directly to the logits (no sigmoid): a threshold of 0 on the logits corresponds to a threshold of 0.5 on the probabilities, since sigmoid(0) == 0.5, but a threshold in the range [0, 1] might be easier and more intuitive to use (see the sketch after this list).
    If you put the sigmoid directly into the model, nn.BCELoss will apply torch.log internally, while passing logits to nn.BCEWithLogitsLoss will use the log-sum-exp trick as seen in these lines of code and will thus yield more numerical stability.

  2. I don’t have a strong opinion on it, so feel free to chime in with your view on which approach is less confusing. :slight_smile: Generally, “mathematical” functions should go into the torch namespace, while NN-specific methods should stay in the nn.functional namespace, as described here. torch.nn.functional.sigmoid should still work, I think.
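To make point 1 concrete, here is a small sketch (the logit values and the 0.5 threshold are just for illustration):

import torch as t
import torch.nn.functional as F

logits = t.tensor([-1.2, 0.3, 2.5, -0.1])
targets = t.tensor([0., 1., 1., 0.])

# Thresholding the probabilities at 0.5 ...
pred_from_probs = (t.sigmoid(logits) > 0.5).float()
# ... is equivalent to thresholding the raw logits at 0, since sigmoid(0) == 0.5
pred_from_logits = (logits > 0.).float()
print(t.equal(pred_from_probs, pred_from_logits))  # True

accuracy = (pred_from_logits == targets).float().mean()

# Numerical stability: with an extreme logit, sigmoid + BCE saturates to inf,
# while the logits version stays finite thanks to the log-sum-exp trick.
big = t.tensor([100.])
target = t.tensor([0.])
print(F.binary_cross_entropy(t.sigmoid(big), target))   # inf
print(F.binary_cross_entropy_with_logits(big, target))  # tensor(100.)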


As Sigmoid/ReLU are still inside torch.nn, I feel it is a bit confusing.
I feel they should be moved to torch.math, and these sigmoid/relu functions should be moved to torch.math.functional.

I wouldn’t create a new math namespace, as this would mean that matmul, dot, sum, etc. would also have to be moved there, wouldn’t it?
Feel free to add your suggestions to the linked GitHub issue to discuss it further. :slight_smile:
