Hi,

I'm trying to create a new weight matrix so that I can compute the projection of the feature maps onto it.

In the model's initializer I create an instance of a class called ArcFace, which some of you may know:

self.arc = ArcFace(s=10, num_classes=num_classes, margin=2)

In the ArcFace class I create a new parameter: self.weight = torch.nn.Parameter(torch.normal(0, 0.01, (num_classes, self.feature_dim)))

Before running nn.DataParallel(model).cuda(), the weights were stored under model.arc.weight, as they should be.

After the model is wrapped in DataParallel, the same variable is reached through model.module.arc.weight instead.

In both cases requires_grad = True, and the other autograd attributes are:

_backward_hooks: None

_grad_fn: None

_grad: None

```
if cfg.use_gpu:
    model = nn.DataParallel(model).cuda()
```
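As a sanity check on the wrapping step, here is a minimal sketch with hypothetical stand-in classes (TinyModel and TinyHead are not part of the real code). It relies only on DataParallel storing the wrapped model under .module, and it skips .cuda() so that it can also run on CPU. It suggests that the parameter reached through model.module.arc.weight is the same object as model.arc.weight, and that a _grad_fn of None is normal for a leaf parameter.

```python
import torch
import torch.nn as nn


class TinyHead(nn.Module):
    # Hypothetical stand-in for the ArcFace module: it only owns a weight.
    def __init__(self, num_classes=4, feature_dim=8):
        super().__init__()
        self.weight = nn.Parameter(torch.normal(0, 0.01, (num_classes, feature_dim)))


class TinyModel(nn.Module):
    # Hypothetical stand-in for the full model with an `arc` attribute.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 8)
        self.arc = TinyHead()

    def forward(self, x):
        return self.backbone(x)


model = TinyModel()
weight_before = model.arc.weight

# Wrap the model; the original module is kept under `.module`.
wrapped = nn.DataParallel(model)

weight_after = wrapped.module.arc.weight
same_object = weight_after is weight_before

# A leaf parameter has no grad_fn, and no grad until backward() has run.
print(same_object, weight_after.requires_grad, weight_after.grad_fn)
```

If the same check passes on the real model, the DataParallel wrapping is probably not what breaks the gradient.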

Then the optimizer is set up, followed by the scheduler.

After all that, when I train the model:

```
if self.loss == 'arcloss':
    logits = self.model(imgs)  # embeddings from the network for each image in a batch of 64
```

Then the logits are passed to the forward function of the ArcFace class.

In that function the initialized weights (the self.weight created at construction time) are normalized, and at that point requires_grad has changed to False!

grad_fn is still None, even though the weights are only normalized and multiplied with the embeddings output by the network.

```
# self.model.module.arc holds the self.weight initialized in ArcFace, which is multiplied with the embeddings
outputs = self.model.module.arc(embeddings, pids)
```
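One situation that reproduces these symptoms is running the forward pass while gradient tracking is disabled, for example inside torch.no_grad(). Whether that is what happens in the code above is only an assumption. The sketch below uses hypothetical tensors in place of the real embeddings and class weights, and compares the same computation with and without tracking.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

weight = nn.Parameter(torch.randn(4, 8))  # hypothetical class weights
embeddings = torch.randn(2, 8)            # hypothetical embeddings

# Normal mode: the result is connected to `weight` in the autograd graph.
cosine = F.linear(F.normalize(embeddings), F.normalize(weight))
tracked = (cosine.requires_grad, cosine.grad_fn is not None)

# With gradient tracking disabled, no operation is recorded.
with torch.no_grad():
    cosine_ng = F.linear(F.normalize(embeddings), F.normalize(weight))
untracked = (cosine_ng.requires_grad, cosine_ng.grad_fn is not None)

# The Parameter itself keeps requires_grad=True in both cases;
# only the tensors computed from it lose their graph.
print(tracked, untracked, weight.requires_grad)
```

Printing torch.is_grad_enabled() at the top of ArcFace.forward would show whether the real training step runs in such a context.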

```
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcFace(torch.nn.Module):
    """ArcFace (https://arxiv.org/pdf/1801.07698v1.pdf)."""

    def __init__(self, s=30.0, num_classes=10, margin=0.5):
        super(ArcFace, self).__init__()
        self.classes = num_classes
        self.out_features = 512
        self.scale = s
        self.cos_m = math.cos(margin)
        self.sin_m = math.sin(margin)
        self.theta = math.cos(math.pi - margin)
        self.sinmm = math.sin(math.pi - margin) * margin
        self.easy_margin = False
        self.weight = torch.nn.Parameter(torch.randn(self.classes, self.out_features), requires_grad=True)
        # self.weight.requires_grad = True
        nn.init.xavier_uniform_(self.weight)
        # self.classifier = nn.Linear(self.out_features, num_classes).cuda()
        # parametrize.register_parametrization(self.classifier, "weight", Norm())
        # print(self.classifier.weight)
        # self.classifier.requires_grad_ = True

    def forward(self, logits: torch.Tensor, labels: torch.Tensor):
        # print(self.classifier.weight)
        # cosine = F.linear(F.normalize(logits), self.head.weight)
        # print(self.weight)
        cosine = F.linear(F.normalize(logits), F.normalize(self.weight))
        cosine = cosine.clamp(-1, 1)
        sine = torch.sqrt(1.0 - torch.pow(cosine, 2))
        phi = cosine * self.cos_m - sine * self.sin_m
        if self.easy_margin:
            phi = torch.where(cosine > 0, phi, cosine)
        else:
            phi = torch.where(cosine > self.theta, phi, cosine - self.sinmm)
        # --------------------------- convert label to one-hot ---------------------------
        one_hot = torch.zeros(cosine.size(), device='cuda')
        # one_hot = torch.zeros(cosine.size()).cuda()
        one_hot.scatter_(1, labels.view(-1, 1).long(), 1)
        # -------------torch.where(out_i = {x_i if condition_i else y_i) -------------
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)  # you can use torch.where if your torch.__version__ is 0.4
        output *= self.scale
        return output
```
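The phi line in forward relies on the identity cos(θ + m) = cos θ · cos m - sin θ · sin m, where cosine is cos θ and sine is sin θ. As a small numerical check of that step only, the sketch below compares the formula with the value computed directly from the angle. The input values and the margin of 0.5 (the class default) are chosen purely for illustration.

```python
import math

import torch

margin = 0.5  # default margin of the class above, used here for illustration
cosine = torch.linspace(-0.99, 0.99, steps=9)

# Same two lines as in forward(): sin(theta) is non-negative for theta in [0, pi].
sine = torch.sqrt(1.0 - torch.pow(cosine, 2))
phi = cosine * math.cos(margin) - sine * math.sin(margin)

# Reference value computed directly from the angle.
phi_ref = torch.cos(torch.acos(cosine) + margin)

max_abs_diff = (phi - phi_ref).abs().max().item()
print(max_abs_diff)
```

Every operation in this chain is differentiable, so on its own the margin computation should not detach the output from self.weight.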

With the returned variable output I then compute the softmax (cross-entropy) loss:

self.criterion = CrossEntropyLoss(num_classes=self.datamanager.num_train_pids, use_gpu=self.use_gpu, label_smooth=label_smooth)

loss = self.compute_loss(self.criterion, outputs, pids)

Then, when running

self.optimizer.zero_grad()

loss.backward()

the call to loss.backward() fails with:

element 0 of tensors does not require grad and does not have a grad_fn

I don't understand this. If I multiply the normalized self.weight with the embeddings, that is an ordinary multiplication, so why doesn't autograd record it in grad_fn and know what to differentiate? Instead I get this error:

Exception has occurred: RuntimeError

element 0 of tensors does not require grad and does not have a grad_fn

I have spent a couple of days reading and trying to understand the problem, with no luck.

I just want to compute the projection of the embeddings that come from the network onto the weights initialized in the ArcFace module.

I don't understand why the loss and those weights have no grad_fn and requires_grad = False.
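For comparison, here is a minimal CPU-only sketch of the behavior one would normally expect from this kind of pipeline. The backbone, the weight, and the batch are hypothetical stand-ins for the real network, ArcFace weights, and data, and the margin is left out. The describe helper (also hypothetical) prints, for each intermediate tensor, whether it is still attached to the autograd graph, which may help locate the step where the real code loses it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
backbone = nn.Linear(16, 8)               # hypothetical stand-in for the network
weight = nn.Parameter(torch.randn(4, 8))  # hypothetical class weights
imgs = torch.randn(6, 16)                 # hypothetical batch
pids = torch.randint(0, 4, (6,))          # hypothetical labels


def describe(name, t):
    # Report whether a tensor is still attached to the autograd graph.
    print(f"{name}: requires_grad={t.requires_grad}, grad_fn={t.grad_fn}")


print("grad enabled:", torch.is_grad_enabled())

embeddings = backbone(imgs)
describe("embeddings", embeddings)

# Scaled cosine similarity between embeddings and class weights.
outputs = 10.0 * F.linear(F.normalize(embeddings), F.normalize(weight))
describe("outputs", outputs)

loss = F.cross_entropy(outputs, pids)
describe("loss", loss)

# With an intact graph, backward() fills the gradients of both
# the class weights and the backbone parameters.
loss.backward()
print(weight.grad is not None, backbone.weight.grad is not None)
```

Calling the same helper on logits, outputs and loss in the real training step would show the first tensor that reports requires_grad=False.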

Please help! Any pointers would be greatly appreciated.

```
self.num_classes = len(pids)
### Need to output the y = classifier(v) , that is the connection for the acc
# outputs = self.model(imgs)
if self.loss == 'arcloss':
    logits = self.model(imgs)
    outputs = self.model.module.head(logits, pids)
else:
    outputs = self.model(imgs)
loss = self.compute_loss(self.criterion, outputs, pids)  # metric_fc
self.optimizer.zero_grad()
loss.backward()  # the error is raised here; I tried .double() and everything else I could think of
self.optimizer.step()
# acc = np.mean((outputs == pids).astype(float))
loss_summary = {
    'loss': loss.item(),
```