Fully connected layer confusion

I have to do transfer learning, and instead of just changing the last layer to my custom number of classes (which is 6), I used the method below. What I feel I did wrong is not using softmax.
Is placing ReLU in the second-to-last position correct? I feel like I needed a softmax here; does it impact accuracy and so on?
My problem is a multi-class problem. I used CrossEntropyLoss, which from my reading applies softmax internally, so do I need to worry?

model.classifier = nn.Sequential(
    nn.Linear(in_features=1792, out_features=625),  # 1792 is the original in_features
    nn.ReLU(),  # ReLU as the activation function
    nn.Linear(in_features=625, out_features=256),
    nn.ReLU(),  # this part
    nn.Linear(in_features=256, out_features=6),
)

As you have observed, CrossEntropyLoss includes a softmax: "Note that this case is equivalent to the combination of LogSoftmax and NLLLoss," as stated in the docs. Adding another softmax layer would be redundant and strange. You would only expect to see a softmax (strictly, a LogSoftmax) in the forward pass if NLLLoss were used instead of CrossEntropyLoss.
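You can verify this equivalence numerically with a quick sketch: the same random logits and targets produce identical losses whether you use CrossEntropyLoss directly or LogSoftmax followed by NLLLoss.

```python
import torch
import torch.nn as nn

# Random logits for a batch of 4 samples over 6 classes, with integer targets.
logits = torch.randn(4, 6)
targets = torch.tensor([0, 2, 5, 1])

# CrossEntropyLoss applied to raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent formulation: LogSoftmax followed by NLLLoss.
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

# The two losses match, so no explicit softmax layer is needed in the model.
assert torch.allclose(ce, nll)
```

This is why the final layer of the classifier should output raw logits when training with CrossEntropyLoss.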

Yes, adding a ReLU before the final linear layer can be considered standard practice. However, I would also compare the performance of going directly from the 1792 features to the number of output classes, and check whether that causes accuracy degradation vs. the current approach.
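For reference, the simpler baseline head suggested above is a one-line change (a sketch; the 8-sample batch is just dummy data for a shape check):

```python
import torch
import torch.nn as nn

# Baseline for comparison: map the 1792 backbone features straight to 6 classes.
simple_head = nn.Linear(in_features=1792, out_features=6)

x = torch.randn(8, 1792)  # dummy batch of backbone features
print(simple_head(x).shape)  # torch.Size([8, 6])
```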

Well, when I use only nn.Linear(in_features=1792, out_features=6), it gives a bit less accuracy, around 86%, while the multi-layer approach above gave 89% accuracy.
So you mean I do not need to worry about the ReLU before the classifier?

ReLU before the classifier is fine (it would be strange to add it after the classifier), but I would experiment to see what gives the best results.
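One practical note that follows from the discussion above: you only apply softmax yourself at inference time, when you want probabilities rather than logits. A minimal sketch:

```python
import torch

# During training, raw logits go straight into CrossEntropyLoss.
# At inference, apply softmax to get class probabilities, or just argmax the logits.
logits = torch.randn(1, 6)
probs = torch.softmax(logits, dim=1)
pred = probs.argmax(dim=1)

# Softmax is monotonic, so argmax over probabilities equals argmax over logits.
assert pred.item() == logits.argmax(dim=1).item()
```

So skipping the softmax layer does not affect the predicted class at all; it only changes whether you see probabilities or raw scores.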

I would also take a look at some reference model implementations, e.g., torchvision.models.resnet in the Torchvision documentation, for more ideas.

So you think what I did is correct, right? I need this answer because I have to move on to the next steps.