Let me explain myself. I have to train a pretrained (on ImageNet) ResNet50 network, but instead of having one final FC layer of 1000 outputs, I need to add 3 more FC layers (1000->128, 128->64 and 64->10). I do this because I want to train on a custom dataset (10 classes of my own making). My code is this:
Stacking successive linear layers may not be very effective: (i) the composition of linear maps is still just a linear operation, and (ii) it adds a lot of parameters. Generally we insert differentiable non-linear activations between linear layers (sigmoid, tanh, ReLU, which is differentiable almost everywhere, etc.).
Regarding the last layer: since you are dealing with a classification problem, you want a distribution as output. Generally the last layer outputs the logits, so no ReLU is applied at the end (in PyTorch, for example, CrossEntropyLoss applies the softmax internally).