Hi Mahammad!
This is fine.
For inference (e.g., validation), this is fine. For training (where you would
use backpropagation), you would not use with torch.no_grad():.
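As a minimal sketch of the distinction (with a made-up toy model and data, just to show the pattern):

```python
import torch
import torch.nn as nn

# hypothetical tiny model and data, purely for illustration
model = nn.Linear(8, 1)
inp = torch.randn(4, 8)
target = torch.randn(4, 1)
loss_fn = nn.MSELoss()

# validation / inference: no_grad() skips building the autograd graph,
# saving memory and compute
model.eval()
with torch.no_grad():
    val_loss = loss_fn(model(inp), target)
# val_loss carries no graph, so you cannot (and need not) backpropagate it

# training: do NOT use no_grad(), otherwise backward() would fail
model.train()
train_loss = loss_fn(model(inp), target)
train_loss.backward()
```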
You haven’t said how big your dataset is, so we don’t know how many
samples you have in an epoch. However, 25 epochs of training is, in
general, quite small. Try training much longer. It’s perfectly possible
for your training loss to “plateau” and then start making progress again
as you train more (and this can happen multiple times in a training run).
This could just be “noise.” Six epochs is basically nothing. If the random
initialization of your model happens to better match the samples randomly
selected for your validation set, your validation loss might randomly be
lower than your training loss (and you might expect to see this about half
the time). Only if your validation loss systematically remains lower than
your training loss after you have trained much longer should you start to
suspect that something fishy might be going on.
Neither. For any but the simplest toy problems, you haven’t trained long
enough to reach any conclusions.
You can do it either way – the two are equivalent. For purely stylistic
reasons, I prefer to not have the singleton dimension, but you don’t
need to change your model architecture to get rid of it – you can simply
.squeeze() away any singleton dimensions that the output of your
model might have (if you care – again, it doesn’t matter).
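To illustrate the equivalence (with assumed, illustrative shapes):

```python
import torch
import torch.nn.functional as F

out = torch.randn(16, 1, 128, 128)   # model output with a singleton channel dim
target = torch.randn(16, 128, 128)   # target without that dim

# the two ways of lining up the shapes give the same loss
loss_a = F.mse_loss(out.squeeze(1), target)       # squeeze away the singleton
loss_b = F.mse_loss(out, target.unsqueeze(1))     # or add it to the target
```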
You don’t want to use CrossEntropyLoss. You should be using
BCEWithLogitsLoss.
I wouldn’t call this a huge imbalance, but it is large enough that you will
want to compensate for it, for example, by using pos_weight. (Some
people suggest using an intersection-over-union (IoU) or Dice-coefficient
loss for imbalanced segmentation problems. My advice is to start with
pos_weight and only augment BCEWithLogitsLoss with something
like IoU if it’s clear that pos_weight isn’t working well enough.)
No, a value of 4 would be way too small to effectively compensate for your
class imbalance. A value of about 400 would be a good starting point.
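A sketch of what this looks like, assuming (for illustration) roughly 400 background pixels per foreground pixel:

```python
import torch

# assumed ratio of negative to positive pixels: about 400 to 1
pos_weight = torch.tensor([400.0])
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# raw logits from the model -- do NOT apply sigmoid() yourself
logits = torch.randn(8, 128, 128)
# sparse, mostly-zero target, mimicking the class imbalance
target = (torch.rand(8, 128, 128) < 1.0 / 401.0).float()

loss = loss_fn(logits, target)
```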
It’s up to you and your use case whether precision or recall is more
important to you and therefore how you should tune that trade-off.
Start by using a value that is roughly equal to the real ratio. Train for much
longer so that your results are meaningful. Then look at your precision and
recall and adjust pos_weight, as appropriate, to achieve your desired
precision / recall trade-off.
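A minimal helper for computing that trade-off from raw logits (names and the logit-space threshold of 0.0, i.e., probability 0.5, are my choices, not anything from your code):

```python
import torch

def precision_recall(logits, target, threshold=0.0):
    """Precision and recall for binary predictions taken from raw logits.

    A threshold of 0.0 in logit space corresponds to probability 0.5.
    """
    pred = (logits > threshold).float()
    tp = (pred * target).sum()                    # true positives
    precision = tp / pred.sum().clamp(min=1)      # tp / predicted positives
    recall = tp / target.sum().clamp(min=1)       # tp / actual positives
    return precision.item(), recall.item()
```

Increasing pos_weight pushes predictions toward the positive class, which typically raises recall at the cost of precision; lowering it does the opposite.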
Again, 7 epochs of training is almost nothing.
Turn off early stopping (so that you can let your training run much longer).
Best.
K. Frank