@ptrblck thanks a lot for pointing out the link to that question, sir. I am new to PyTorch and am finding the hint difficult to follow. Could you please help me understand the hint given by @tom sir? Please excuse me if my question is very basic and naive. The code below is in no way functional; I just pieced together pseudocode from @tom sir's answer.

```
import torch


class MaskedModel(torch.nn.Module):
    def __init__(self, the_mask, weight_init, epochs=5000, learning_rate=0.1, lamb=0.1):
        super().__init__()
        self.epochs = epochs
        self.learning_rate = learning_rate
        self.lamb = lamb  # regularisation strength
        # the_mask: True where the weight should be updated, False where it is frozen
        self.register_buffer('weight_update_mask', the_mask)
        self.weight_param = torch.nn.Parameter(weight_init.clone())  # trainable copy
        self.register_buffer('weight_fixed', weight_init.clone())    # frozen copy

    def forward(self, x):
        weight = torch.where(self.weight_update_mask, self.weight_param, self.weight_fixed)
        return torch.matmul(weight, x)

    def f_loss(self, Y, pred, w):
        pred_loss = torch.norm(Y - pred, p='fro') ** 2
        reg = torch.norm(w, p='fro') ** 2
        return pred_loss / Y.size(0) + self.lamb * reg

    def fit(self, X, Y):
        # only weight_param is optimised; the frozen entries get no gradient
        opt = torch.optim.Adam([self.weight_param], lr=self.learning_rate,
                               betas=(0.9, 0.99), eps=1e-08,
                               weight_decay=0, amsgrad=False)
        for epoch in range(self.epochs):
            pred = self.forward(X)
            # regularise the effective weight so frozen entries get no gradient
            weight = torch.where(self.weight_update_mask, self.weight_param, self.weight_fixed)
            loss = self.f_loss(Y, pred, weight)  # loss value, a scalar
            opt.zero_grad()   # reset the gradients, else they accumulate
            loss.backward()   # derivative of the loss w.r.t. weight_param
            opt.step()        # update the trainable parameters
        return torch.where(self.weight_update_mask, self.weight_param, self.weight_fixed)
```
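To check whether I understood the `torch.where` trick at all, I tried this tiny standalone experiment (the shapes and numbers are just made up by me for illustration):

```python
import torch

mask = torch.tensor([True, False, True])                    # middle entry frozen
weight_param = torch.tensor([2., 2., 2.], requires_grad=True)  # trainable copy
weight_fixed = torch.tensor([1., 1., 1.])                      # frozen copy

# effective weight: weight_param where mask is True, weight_fixed elsewhere
weight = torch.where(mask, weight_param, weight_fixed)

loss = (weight ** 2).sum()
loss.backward()

print(weight_param.grad)  # tensor([4., 0., 4.]) -- zero where mask is False
```

If this is right, then the frozen positions of `weight_param` receive exactly zero gradient, so the optimiser never moves them.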

The following are my doubts:

1. Should `the_mask` contain a tensor of 0's and 1's, with 1's at the positions of the variables we want to compute gradients for and 0's at the positions where we want to skip gradient computation?
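For example, is the intent something like this? (This is just my guess at what the mask should look like for a hypothetical 2x3 weight matrix.)

```python
import torch

# my guess: True (1) where the weight is trainable, False (0) where it is frozen
the_mask = torch.tensor([[True, False, True],
                         [False, True, False]])

print(the_mask.dtype)  # torch.bool -- torch.where expects a boolean condition
```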

2. What is the significance of the `self.weight_param` and `self.weight_fixed` variables? Are these automatically created by PyTorch, or should I define and initialise them in my `__init__` function? If so, how do I initialise them?
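My current guess for question 2 (the names and shapes here are assumptions on my part, not something from the answer) is that both would be created in `__init__`, with the trainable copy as an `nn.Parameter` and the frozen copy as a buffer:

```python
import torch
import torch.nn as nn

class MaskedModel(nn.Module):
    def __init__(self, mask, weight_init):
        super().__init__()
        self.register_buffer('weight_update_mask', mask)
        # trainable copy: gradients flow into this one
        self.weight_param = nn.Parameter(weight_init.clone())
        # frozen copy: a buffer is saved with the model but never optimised
        self.register_buffer('weight_fixed', weight_init.clone())

m = MaskedModel(torch.eye(2, dtype=torch.bool), torch.ones(2, 2))
print(len(list(m.parameters())))  # 1 -- only weight_param is a parameter
```

Is that the idea, or does PyTorch create these for me somehow?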

3. What changes should be made to my `fit` function? (I don't expect code, but please at least explain it in words.)