Your code doesn’t show any implementation of a loss function, just the usage of criterion, which looks alright assuming the input tensor shapes and dtypes are correct.

I thought the right way of doing it is to follow one of the two methods explained in this post:

either I use the averaged batch loss calculated by the criterion, sum these and divide by the number of batches, or
I use the loss of all samples of every batch, sum them all and divide by the size of the dataset.
Which one was I missing here? So either I add the multiplication by the batch size inside the loop, or I divide by the number of batches outside the loop instead of the dataset size?

Right now you are dividing the accumulated loss by the number of samples in the Dataset via:

train_loss /= len(train_loader.dataset)

which assumes train_loss is the sum of the losses of all samples.
Based on:

train_loss+=loss.data

this would be the case if you are using the sum reduction. If not (e.g. if you are using the mean reduction) you would need to multiply each batch loss by the current batch size (i.e. output.size(0)) before accumulating it.
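
A minimal sketch of that scaling, assuming toy random data and an `nn.Linear` stand-in model (both hypothetical, not taken from your code). The batch size is chosen so that the last batch is smaller than the others:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: 10 samples, 4 features, 3 classes.
# batch_size=4 yields batches of 4, 4 and 2 samples.
inputs = torch.randn(10, 4)
targets = torch.randint(0, 3, (10,))
train_loader = DataLoader(TensorDataset(inputs, targets), batch_size=4)

model = nn.Linear(4, 3)            # stand-in for the real model
criterion = nn.CrossEntropyLoss()  # default reduction is "mean"

train_loss = 0.0
with torch.no_grad():  # only measuring the loss here, no optimizer step
    for data, target in train_loader:
        output = model(data)
        loss = criterion(output, target)
        # Undo the per-batch mean so train_loss is a sum over samples.
        train_loss += loss.item() * output.size(0)

# Dividing by the dataset size is now valid.
train_loss /= len(train_loader.dataset)

# Reference value: mean loss over the whole dataset in a single pass.
with torch.no_grad():
    reference = criterion(model(inputs), targets).item()
```

With this accumulation `train_loss` should match `reference` up to floating point error, regardless of how the samples are split into batches.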

If I am using the default definition of the loss function (which uses the mean reduction by default), I assume that I can also use this instead:

train_loss /= len(train_loader), which corresponds to way 2 in the previous post you replied to: the sum of the average losses over the batches divided by the number of batches?

This would work if all batches contain the same number of samples. In case the last one has fewer samples, you would add a small error to the loss calculation as also described here.
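
To make the size of that error concrete, here is a small pure-Python sketch with made-up per-sample losses (the numbers are hypothetical and only chosen to make the arithmetic easy to follow):

```python
# Hypothetical per-sample losses for a dataset of 10 samples.
sample_losses = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 4.0]
batch_size = 4

# Split into batches of 4, 4 and 2 samples (the last batch is smaller).
batches = [
    sample_losses[i:i + batch_size]
    for i in range(0, len(sample_losses), batch_size)
]

# What a criterion with the mean reduction would return per batch.
batch_means = [sum(b) / len(b) for b in batches]

# Way 2: average of the batch means, divided by the number of batches.
mean_of_batch_means = sum(batch_means) / len(batches)

# Way 1: sum of all sample losses, divided by the dataset size.
per_sample_mean = sum(sample_losses) / len(sample_losses)

# Way 1 recovered from batch means by weighting with the batch size.
weighted = sum(m * len(b) for m, b in zip(batch_means, batches)) / len(sample_losses)

print(mean_of_batch_means)  # 2.0
print(per_sample_mean)      # 1.6
```

The two samples in the short last batch count as much as four samples in a full batch under way 2, which is where the difference comes from. With many full batches and one slightly smaller last batch the gap is usually much smaller than in this exaggerated example.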

Then I would go with way 1: summing the loss of all samples across the dataset and dividing by the size of the dataset. My validation cross-entropy loss is very high (> 1) compared to the training one, the validation accuracy is around 65% and the training accuracy is 95%. This is clear overfitting, isn't it?
PS: I have a small validation dataset of around 2000 samples, same for the test set, but the training data has around 22000 samples.