DataParallel / loss: "element 0 of tensors does not require grad and does not have a grad_fn"

Hi,
I'm trying to create new weights in order to calculate the projection between the feature maps and those new weights.

I created a class called ArcFace (which some of you may know) and instantiate it when the model is initialized:

    self.arc = ArcFace(s=10, num_classes=num_classes, margin=2)

Inside the ArcFace class I initialize a new parameter:

    self.weight = torch.nn.Parameter(torch.normal(0, 0.01, (num_classes, self.feature_dim)))

Before running nn.DataParallel(model).cuda(), the weights are accessible as model.arc.weight, as they should be. After the model is wrapped, DataParallel changes everything: the variable becomes model.module.arc.weight. In both cases requires_grad = True, and the parameter's attributes look like this:
    _backward_hooks: None
    _grad_fn: None
    _grad: None

if cfg.use_gpu:
    model = nn.DataParallel(model).cuda()
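For reference, here is a minimal standalone sketch of that behaviour (toy modules and sizes, just for illustration): wrapping in DataParallel only changes how the parameter is reached, not its requires_grad flag.

    import torch
    import torch.nn as nn

    class ToyArc(nn.Module):
        def __init__(self, num_classes=10, feature_dim=512):
            super().__init__()
            self.weight = nn.Parameter(torch.normal(0, 0.01, (num_classes, feature_dim)))

    class ToyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.arc = ToyArc()

    model = ToyModel()
    print(model.arc.weight.requires_grad)          # True

    model = nn.DataParallel(model)                 # the original model now lives in model.module
    print(model.module.arc.weight.requires_grad)   # still True; only the attribute path changed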

Then the optimizer is set up, and after that the scheduler.

After all that, when I train the model:

    if self.loss == 'arcloss':
        logits = self.model(imgs)   # outputs the embeddings of the net for each image in a batch of 64

Then the logits are sent to the forward function of the ArcFace class, where the same weights we initialized at the start (self.weight) are normalized.

At this point requires_grad has changed to False (!) and grad_fn is still None, even though the weights are only normalized and then multiplied with the embeddings coming out of the net:

        outputs = self.model.module.arc(embeddings, pids)   # inside self.model.module.arc, the self.weight initialized in ArcFace is multiplied with the embeddings
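To make that chain concrete, here is a stripped-down, standalone mock of it (a toy backbone instead of my real net, illustrative shapes); at every step I would expect requires_grad = True and a non-None grad_fn:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    backbone = nn.DataParallel(nn.Linear(128, 512))       # stands in for self.model
    arc_weight = nn.Parameter(torch.randn(10, 512))        # stands in for self.model.module.arc.weight

    imgs = torch.randn(64, 128)
    embeddings = backbone(imgs)
    print(embeddings.requires_grad, embeddings.grad_fn)    # expected: True, non-None

    outputs = F.linear(F.normalize(embeddings), F.normalize(arc_weight))
    print(outputs.requires_grad, outputs.grad_fn)          # expected: True, non-None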

Here is the ArcFace class:

    import math

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class ArcFace(torch.nn.Module):
        """ArcFace (https://arxiv.org/pdf/1801.07698v1.pdf)."""

        def __init__(self, s=30.0, num_classes=10, margin=0.5):
            super(ArcFace, self).__init__()
            self.classes = num_classes
            self.out_features = 512
            self.scale = s
            self.cos_m = math.cos(margin)
            self.sin_m = math.sin(margin)
            self.theta = math.cos(math.pi - margin)
            self.sinmm = math.sin(math.pi - margin) * margin
            self.easy_margin = False

            self.weight = torch.nn.Parameter(torch.randn(self.classes, self.out_features), requires_grad=True)
            nn.init.xavier_uniform_(self.weight)

        def forward(self, logits: torch.Tensor, labels: torch.Tensor):
            # cosine similarity between normalized embeddings and normalized class weights
            cosine = F.linear(F.normalize(logits), F.normalize(self.weight))
            cosine = cosine.clamp(-1, 1)
            sine = torch.sqrt(1.0 - torch.pow(cosine, 2))
            phi = cosine * self.cos_m - sine * self.sin_m
            if self.easy_margin:
                phi = torch.where(cosine > 0, phi, cosine)
            else:
                phi = torch.where(cosine > self.theta, phi, cosine - self.sinmm)
            # --------------------------- convert label to one-hot ---------------------------
            one_hot = torch.zeros(cosine.size(), device=cosine.device)
            one_hot.scatter_(1, labels.view(-1, 1).long(), 1)
            # apply the margin term only to the target class: out_i = phi_i if one_hot_i else cosine_i
            output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
            output *= self.scale
            return output
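As a sanity check of my understanding, a standalone forward and backward through just this class (reusing the class and imports above, random inputs, illustrative shapes) should already produce a grad_fn and populate the weight's gradient:

    arc = ArcFace(s=10.0, num_classes=10, margin=0.5)
    emb = torch.randn(64, 512, requires_grad=True)      # stands in for the embeddings from the net
    labels = torch.randint(0, 10, (64,))

    out = arc(emb, labels)
    print(out.requires_grad, out.grad_fn)                # expected: True and a non-None grad_fn
    out.sum().backward()
    print(arc.weight.grad is not None)                   # expected: True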

With the returned variable "output" I then compute the softmax loss. The criterion is:

    self.criterion = CrossEntropyLoss(
        num_classes=self.datamanager.num_train_pids,
        use_gpu=self.use_gpu,
        label_smooth=label_smooth
    )

    loss = self.compute_loss(self.criterion, outputs, pids)
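compute_loss just applies the criterion above to the ArcFace outputs and the pids. A rough standalone stand-in (plain F.cross_entropy instead of the project's label-smoothed CrossEntropyLoss, illustrative shapes) behaves like this when the graph is intact:

    import torch
    import torch.nn.functional as F

    outputs = torch.randn(64, 10, requires_grad=True)   # stands in for the ArcFace output
    pids = torch.randint(0, 10, (64,))

    loss = F.cross_entropy(outputs, pids)
    print(loss.requires_grad, loss.grad_fn)              # expected: True and a non-None grad_fn
    loss.backward()                                      # works because the graph is intact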
When I then run:

    self.optimizer.zero_grad()
    loss.backward()

I get the error:

    element 0 of tensors does not require grad and does not have a grad_fn

I don't understand: if I multiply my normalized self.weight with the embeddings, i.e. perform a multiplication operation, why doesn't autograd record that operation in grad_fn so it knows what to differentiate through? But it does not, and I actually get this error:

    Exception has occurred: RuntimeError
    element 0 of tensors does not require grad and does not have a grad_fn

I have been reading for a couple of days trying to understand the problem, but with no luck.

I just want to calculate the projection of the embeddings coming out of the net against the weights initialized in the ArcFace module.

I don't understand why the loss and those same weights have no grad_fn and requires_grad = False.
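From what I understand, a multiplication involving a Parameter is normally tracked by autograd, and the requires_grad = False / grad_fn = None state only shows up when the graph is not being recorded at all, for example under torch.no_grad() (minimal standalone contrast, illustrative shapes):

    import torch
    import torch.nn.functional as F

    w = torch.nn.Parameter(torch.randn(10, 512))
    emb = torch.randn(64, 512)

    out = F.linear(F.normalize(emb), F.normalize(w))
    print(out.requires_grad, out.grad_fn)      # True, non-None: the multiplication is recorded

    with torch.no_grad():                      # same computation with grad tracking disabled
        out2 = F.linear(F.normalize(emb), F.normalize(w))
    print(out2.requires_grad, out2.grad_fn)    # False, None
    # Calling backward() on anything derived from out2 raises exactly the error above.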

Please help!

For completeness, here is the relevant part of the training step:

    self.num_classes = len(pids)
    ### Need to output y = classifier(v), which is what the accuracy is computed from
    #outputs = self.model(imgs)

    if self.loss == 'arcloss':
        logits = self.model(imgs)
        outputs = self.model.module.head(logits, pids)
    else:
        outputs = self.model(imgs)

    loss = self.compute_loss(self.criterion, outputs, pids)    ## metric_fc

    self.optimizer.zero_grad()
    loss.backward()     # <-- error happens here! I tried .double() and all other stuff
    self.optimizer.step()

    #acc = np.mean((outputs == pids).astype(float))
    loss_summary = {
        'loss': loss.item(),
Could you post a minimal, executable code snippet reproducing the issue, please?
You can post code snippets by wrapping them into three backticks ```, which makes debugging easier.