Hey, I am working on a loss function that detects anomalies in one model by comparing it against another model.
Consider any model, and a loss function like this:
import copy
import torch

def loss_anom(global_model, worker_model):
    # nn.Module has no .copy(); deep-copy the global model so it is left untouched
    global_model_copy = copy.deepcopy(global_model)
    loss_ = 0.0
    # accumulate the squared L2 distance between corresponding parameters
    for p, q in zip(global_model_copy.parameters(), worker_model.parameters()):
        loss_ += torch.sum((p - q) ** 2)
    print(loss_.item())  # log the squared distance
    return loss_ ** 0.5  # L2 distance between the two parameter sets
My optimizer updates the worker_model parameters.
The problem I am facing is that when I call loss.backward(), it sets the gradients of the parameters to NaN.
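For reference, my training step looks roughly like this. This is only a simplified sketch with placeholder models and optimizer settings, not my actual code; I am also assuming here that the worker model starts from the global model's weights:

import copy
import torch
import torch.nn as nn

# Placeholder models: in this sketch the worker starts as a copy of the global model (assumption)
global_model = nn.Linear(10, 1)
worker_model = copy.deepcopy(global_model)

# The optimizer only updates the worker_model parameters
optimizer = torch.optim.SGD(worker_model.parameters(), lr=0.01)

optimizer.zero_grad()
loss = loss_anom(global_model, worker_model)
loss.backward()

# Inspecting the gradients here is where I see the NaNs on worker_model
for name, param in worker_model.named_parameters():
    print(name, param.grad)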
Is there any workaround for this? Am I making a mistake here?