For instance, in the last 1024-d layer, I want to train the first 512 dimensions with SGD and the rest with Adam. How can I achieve this?
When I pass model.parameters[-2][:,-512:] to an optimizer, I get an error saying that it can't optimize a non-leaf tensor…
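One possible workaround, sketched under some assumptions: slicing a parameter produces a new tensor that is a view of the original and is not a leaf, which is why the optimizer rejects it. A way around this is to store the two halves as separate leaf parameters and concatenate them in the forward pass, so that each half can go to its own optimizer. The `SplitLinear` class, the output size of 10, the learning rates, and the choice to update the bias with SGD are all hypothetical and only there for illustration. The split is along the input dimension because that is what the `[:,-512:]` slice suggests.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SplitLinear(nn.Module):
    """Linear layer whose weight is stored as two separate leaf tensors.

    Hypothetical helper: not part of PyTorch, just an illustration.
    """

    def __init__(self, in_features=1024, out_features=10, split=512):
        super().__init__()
        # Borrow the default nn.Linear initialisation, then split the weight.
        full = nn.Linear(in_features, out_features)
        self.w_first = nn.Parameter(full.weight[:, :split].detach().clone())
        self.w_rest = nn.Parameter(full.weight[:, split:].detach().clone())
        self.bias = nn.Parameter(full.bias.detach().clone())

    def forward(self, x):
        # Rebuild the full (out_features, in_features) weight on every call.
        weight = torch.cat([self.w_first, self.w_rest], dim=1)
        return F.linear(x, weight, self.bias)


torch.manual_seed(0)
layer = SplitLinear()

# Each optimizer only sees its own leaf tensors.
# Assumption: the bias is trained with SGD; it could equally go to Adam.
opt_sgd = torch.optim.SGD([layer.w_first, layer.bias], lr=0.01)
opt_adam = torch.optim.Adam([layer.w_rest], lr=1e-3)

# Dummy data, only to show one training step.
x = torch.randn(8, 1024)
target = torch.randn(8, 10)

before_first = layer.w_first.detach().clone()
before_rest = layer.w_rest.detach().clone()

opt_sgd.zero_grad()
opt_adam.zero_grad()
loss = F.mse_loss(layer(x), target)
loss.backward()
opt_sgd.step()
opt_adam.step()
```

In a real model, the other layers' parameters would be added to whichever optimizer should train them, and both optimizers would be stepped after each `backward()` call.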