How would I apply a different learning rate to different portions of a model? Would it be as simple as creating two optimizers with different sets of model parameters and calling optimizer.step() on both for each batch?
Check the Per-parameter options section here: http://pytorch.org/docs/optim.html
Instead of feeding in a generator over all parameters, pass an iterable of dicts, each with the key params and the parameter group as its value. Any other key in a dict, such as lr, overrides the optimizer's default for that group, so a single optimizer is enough.
It should be simple to group your model params according to the learning rates you want to apply to them.
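Here is a minimal sketch of what that could look like. The model, its layer names (features, classifier), and the learning-rate values are made up for illustration; only the list-of-dicts form passed to the optimizer comes from the docs.

```python
import torch
import torch.nn as nn
import torch.optim as optim


# Hypothetical two-part model, used only to have two sets of parameters.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(10, 20)
        self.classifier = nn.Linear(20, 2)

    def forward(self, x):
        return self.classifier(torch.relu(self.features(x)))


model = Net()

# One optimizer, two parameter groups. A group without its own "lr"
# falls back to the default lr given as a keyword argument.
optimizer = optim.SGD(
    [
        {"params": model.features.parameters()},                # lr = 1e-2 (default)
        {"params": model.classifier.parameters(), "lr": 1e-3},  # overrides the default
    ],
    lr=1e-2,
    momentum=0.9,  # applies to both groups
)

# A single training step on random data (assumed shapes, for illustration).
x = torch.randn(4, 10)
y = torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()  # one step() call updates both groups

# The learning rate of each group can be inspected (or changed) afterwards.
group_lrs = [group["lr"] for group in optimizer.param_groups]
```

With this setup there is still only one optimizer.step() per batch, and each group's learning rate stays accessible through optimizer.param_groups.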