Optimizer with different learning rates, compatible with GPU

Hi all,

I am currently facing the following problem:
Given a list of initial values and a list of corresponding learning rates (sizes not known beforehand), I want to pass the values to an optimizer with the corresponding lr specified in the lists. I cannot just create a single tensor and slice it when passing to the optimizer, as the slices would not be leaf tensors anymore. Therefore I am working with a list of scalar torch.tensors and use torch.stack to get a single tensor I can work with. However, I really need the code to run efficiently on the GPU, and I suspect the use of Python lists causes data transfer between GPU and CPU, which slows things down (not sure, not an expert). Also, calling torch.stack in every iteration seems quite expensive. Any ideas how I can make this fast on the GPU?

Here are some code snippets:

self.lagrange_lambda = [torch.tensor(lagrange_lambda_init_list[i], requires_grad=True, device=self.device).float() for i in range(self.num_constr)]
self.lambda_optimizer = optim.Adam([{'params': [torch.tensor(self.lagrange_lambda[i])], 'lr': learning_rate_lambda_list[i]} for i in range(self.num_constr)])

and then in the loop:

lagrange_lambda = torch.stack(self.lagrange_lambda)

Thanks for any help!

That's not the case: the Python list only stores the tensor objects with their metadata, while the actual internal data of each tensor stays on the GPU.
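To illustrate this point, here is a minimal sketch (falling back to CPU when no GPU is available): the list is just a container of references, and torch.stack reads the device-resident storages directly, so the stacked result lives on the same device as its inputs.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The Python list only holds references to the tensor objects;
# each tensor's storage lives wherever it was allocated (here: `device`).
params = [
    torch.tensor(float(i), requires_grad=True, device=device)
    for i in range(3)
]

# Stacking reads those storages directly; no round-trip through CPU memory,
# and the original tensors remain leaf tensors.
stacked = torch.stack(params)
print(stacked.device)
```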

I also don't fully understand your use case, as it seems you would initialize the optimizer only once, so even a slow setup loop shouldn't matter compared to the overall training duration.

This line of code:

self.lambda_optimizer = optim.Adam([{'params': [torch.tensor(self.lagrange_lambda[i])], 'lr': learning_rate_lambda_list[i]} for i in range(self.num_constr)])

also creates new tensors (torch.tensor(...) copies its input), so the optimizer updates those copies and self.lagrange_lambda will never be optimized.
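A sketch of the fix (using hypothetical initial values and learning rates, since the real lists aren't shown): create the leaf tensors once and pass those same objects to the optimizer, one param group per learning rate. Gradients then flow through torch.stack back to the individual leaves, and optimizer.step() updates them in place.

```python
import torch
from torch import optim

# Hypothetical per-constraint initial values and learning rates
lagrange_lambda_init_list = [0.1, 0.2, 0.3]
learning_rate_lambda_list = [1e-2, 1e-3, 1e-4]
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the leaf tensors once (set the dtype directly instead of calling
# .float() afterwards, which would produce non-leaf tensors)...
lagrange_lambda = [
    torch.tensor(init, dtype=torch.float32, requires_grad=True, device=device)
    for init in lagrange_lambda_init_list
]

# ...and hand these SAME objects to the optimizer, one param group per lr.
lambda_optimizer = optim.Adam(
    [{"params": [p], "lr": lr}
     for p, lr in zip(lagrange_lambda, learning_rate_lambda_list)]
)

# In the training loop: stack for vectorized use; backward() populates
# the .grad of each leaf, and step() updates them in place.
loss = torch.stack(lagrange_lambda).sum()
loss.backward()
lambda_optimizer.step()
```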
