Until now I have been working with TensorFlow, but for various reasons I want to port my code to PyTorch. I am working on this problem (tensorflow - Keras multioutput custom loss with intermediate layers output - Stack Overflow), and I am not sure whether the code I have written in PyTorch does what I intend, since the loss is stuck from the very beginning. I have tried to replicate the code in this way:

```
import torch
import torch.nn as nn
import torch.optim as optim
import torch.backends.cudnn as cudnn

# Load the pretrained submodels and freeze their weights
submodel2 = submodel2()
submodel2.load_state_dict(torch.load('pretrained_submodel2.pt'))
for param in submodel2.parameters():
    param.requires_grad = False

submodel3 = submodel3()
submodel3.load_state_dict(torch.load('pretrained_submodel3.pt'))
for param in submodel3.parameters():
    param.requires_grad = False

# submodel1 (with trainable weights) + submodel2 and submodel3 with frozen weights
all_model = all_model(submodel2, submodel3)

criterion1 = nn.SomeLoss()
criterion2 = nn.SomeLoss()
criterion3 = nn.SomeLoss()
optimizer = optim.Adam(all_model.parameters(), lr=0.001)

cudnn.benchmark = True
all_model = all_model.cuda()

for epoch in range(2500):  # loop over the dataset multiple times
    for i, data in enumerate(trainloader, 0):
        # Move the batch to the GPU as float tensors
        input1 = data['input1'].cuda().float()
        input2 = data['input2'].cuda().float()
        label1 = data['label1'].cuda().float()
        label2 = data['label2'].cuda().float()

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = all_model(input1, input2)
        output1, output2, output3 = outputs['output1'], outputs['output2'], outputs['output3']

        loss1 = criterion1(input1.float(), output1.float().detach())
        loss2 = criterion2(input2.float(), output2.float().flatten().detach())
        loss3 = criterion3(output2.float(), output2.float().detach())
        loss = loss1 + loss2 + loss3

        loss.backward()
        optimizer.step()
```

But the loss does not improve. I have also tried calling `backward()` on each loss separately, like this:

```
loss1.backward()
loss2.backward()
loss3.backward()
optimizer.step()
```

But I get this error:

```
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```