Combining Trained Models in PyTorch
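
For context, the ensemble wrapper discussed earlier in this thread feeds the same input through two trained models, concatenates their outputs, and passes them through a new classifier. A minimal sketch along those lines (the feature sizes are placeholders; here each sub-model is assumed to output 2 features):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        # placeholder classifier: 2 + 2 concatenated features -> 2 classes
        self.classifier = nn.Linear(4, 2)

    def forward(self, x):
        x1 = self.modelA(x)
        x2 = self.modelB(x)
        x = torch.cat((x1, x2), dim=1)
        x = self.classifier(F.relu(x))
        return x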

The DenseNet and VGG models both expect an input of shape [batch_size, 3, 224, 224].
The error is raised because your inputs only have a spatial size of 3x3 and are thus too small.
The conv and pooling layers decrease the spatial size, so an intermediate activation would become empty.
To avoid this error, you would have to increase the spatial size, e.g. to the aforementioned shape.
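
A minimal sketch of the shape requirement, assuming torchvision's densenet121 (real images would typically be resized with transforms.Resize or F.interpolate rather than replaced by random tensors):

import torch
import torchvision.models as models

model = models.densenet121()

# a 3x3 input would be shrunk to an empty activation by the conv/pooling stages
# x = torch.randn(2, 3, 3, 3)    # raises the size error described above
x = torch.randn(2, 3, 224, 224)  # the expected spatial size
out = model(x)
print(out.shape)                 # torch.Size([2, 1000])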


The error is solved, but now I get this tensor as output. Thanks for your patience and your answer.

tensor([[0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219]], device='cuda:0', grad_fn=<AddmmBackward>)

I want to save it as a model. I tried changing F.relu() to nn.ReLU(), but it raised another error.

AttributeError: 'ReLU' object has no attribute 'dim'

Could you post a code snippet, which raises this issue, please?
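
For reference, this error typically means that an nn.ReLU module object was passed to a layer instead of the activation's output tensor. A minimal sketch with placeholder layer sizes (not the actual model):

import torch
import torch.nn as nn

fc = nn.Linear(4, 2)
x = torch.randn(1, 4)
relu = nn.ReLU()

out = fc(relu(x))  # OK: the module is called on a tensor and its output is passed on
# out = fc(relu)   # AttributeError: 'ReLU' object has no attribute 'dim',
#                  # since the module itself was passed instead of a tensor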


Hi ptrblck,
I am new to PyTorch (deep learning). This example of combining trained models is exactly what I am facing now. I need some help to check my approach:
My problem is: I have model A and model B, and I use MyEnsemble(modelA, modelB) as you described.
I defined: model = MyEnsemble(modelA, modelB).
Then, for the parameter optimizer I used the following training loop:

optimizer = th.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
model.train()
train_losses, val_losses, train_accs, val_accs = [], [], [], []
early_stopping = EarlyStopping(patience=patience, verbose=True)
for epoch in range(n_epochs):
    train_loader = DataLoader(traindata_list, batch_size=batchsize, shuffle=True, num_workers=5)
    test_loader = DataLoader(testdata_list, batch_size=batchsize, shuffle=True, num_workers=5)
    avg_loss, avg_acc = [], []
    batch_idx = 0
    tqdm_train_loader = tqdm(train_loader)
    for Batch_data in tqdm_train_loader:
        Batch_data.to(device)
        out_labels = model(Batch_data)
        loss = F.nll_loss(out_labels, Batch_data.y)
        optimizer.zero_grad()
        loss.backward()
        avg_loss.append(loss.item())
        optimizer.step()
        _, pred = out_labels.max(dim=-1)
I am wondering whether this is the correct way to optimize the parameters.
I really appreciate your help.
Thanks!

Your code is a bit hard to read, as you haven’t formatted it. You can post code snippets by wrapping them in three backticks ```, which would make debugging easier.
Are you seeing any issues with the current code?

Hi ptrblck,
My current code (as posted) still runs and gives me some results, but I am still not sure whether it is correct.
Sorry, I will post it again:

optimizer = th.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
model.train()
train_losses, val_losses, train_accs, val_accs = [], [], [], []
early_stopping = EarlyStopping(patience=patience, verbose=True)
for epoch in range(n_epochs):
    train_loader = DataLoader(traindata_list, batch_size=batchsize, shuffle=True, num_workers=5)
    test_loader = DataLoader(testdata_list, batch_size=batchsize, shuffle=True, num_workers=5)
    avg_loss, avg_acc = [], []
    batch_idx = 0
    tqdm_train_loader = tqdm(train_loader)
    for Batch_data in tqdm_train_loader:
        Batch_data.to(device)
        out_labels = model(Batch_data)
        loss = F.nll_loss(out_labels, Batch_data.y)
        optimizer.zero_grad()
        loss.backward()
        avg_loss.append(loss.item())
        optimizer.step()
        _, pred = out_labels.max(dim=-1)

The code looks generally alright besides the missing indentation, but I assume that was a copy/paste issue.

The Batch_data.to(device) looks a bit weird, as you would have to reassign the tensor as:

Batch_data = Batch_data.to(device)

However, based on the usage Batch_data.y, I guess Batch_data might not be a tensor but a custom class, so the to() operation might work as intended.

Hi ptrblck,
Sorry for my late reply. I was sick a couple of weeks ago.
Yes, the missing indentation was due to copy/paste.
Thanks for your comments and for pointing out ‘Batch_data = Batch_data.to(device)’.
I used Batch_data.to(device) and my model still trains on that data. My question is: should we always assign ‘Batch_data = Batch_data.to(device)’, and what happens if we forget to assign it?

If Batch_data is a tensor, you have to reassign it, since this operation is not executed in-place on the tensor (unlike the recursive call on an nn.Module):

import torch
import torch.nn as nn

device = 'cuda'

model = nn.Linear(1, 1)
# parameters are initialized on the CPU
print(model.weight.device)
> cpu

# move to GPU
model.to(device)
print(model.weight.device)
> cuda:0

# create tensor on the CPU
x = torch.randn(1)
print(x.device)
> cpu

# try to move to GPU, but forget the assignment
x.to(device)
# x is still on the CPU
print(x.device)
> cpu

# move properly
x = x.to(device)
print(x.device)
> cuda:0

Thanks ptrblck for the detailed explanation with an example.
I am clear about it now.
Many thanks!

Hi ptrblck,
I have been stuck on a problem for almost a week. I want to backpropagate the loss from a text-recognition model to my generator. I have built the pipeline, but the generator parameters are not being updated. Can you help me with this…
Thanks

Could you check if the generator’s parameters get valid gradients after the corresponding backward() call?
You could check it via:

for name, param in generator.named_parameters():
    print(name, param.grad)

If these .grad attributes are None after the backward operation, it would point towards a detached computation graph and you would have to check where it was detached.
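
Typical operations that detach the graph are calling .detach(), converting a tensor to numpy and back, or running the forward pass inside a torch.no_grad() block. A minimal sketch with a placeholder generator:

import torch
import torch.nn as nn

generator = nn.Linear(1, 1)
x = torch.randn(1, 1)

out = generator(x).detach()   # .detach() cuts the graph back to the generator
loss = out.sum()
# loss.backward()             # would fail: loss does not require grad anymore

out = generator(x)            # without detaching, gradients can flow back
loss = out.sum()
loss.backward()
for name, param in generator.named_parameters():
    print(name, param.grad)   # valid gradients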

Thanks for replying.
Yes, I have checked in the way you suggested for both models: after the training loop the grads are not None; before it they were None.
Actually I have an ASTER model (for text recognition) and a UNET model for text reconstruction, and I want to update the generator params based on the loss calculated by ASTER…

Sorry that I am taking a lot of your time. But this is the colab notebook,

Here at the end I have a test loop, which loops over a single image;
‘model’ is the ASTER model, and ‘model_torch_rep’ is the generator model…

Also, usually if we use any pretrained model for such a thing, we first put it into eval() and also set requires_grad=False.

But in this case, if I put it into eval(), it gives an error like ‘cudnn RNN cannot be backpropagated in eval mode’ (something like that).

To avoid the cudnn RNN issue, call .train() on this layer only or disable cudnn (for this layer) so that the native PyTorch implementation will be used.
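
A minimal sketch of both workarounds, using a small placeholder model with an LSTM submodule (this has to run on the GPU, since the error is specific to cudnn):

import torch
import torch.nn as nn

class Recognizer(nn.Module):
    # placeholder standing in for the text-recognition model
    def __init__(self):
        super(Recognizer, self).__init__()
        self.rnn = nn.LSTM(8, 8, batch_first=True)

    def forward(self, x):
        out, _ = self.rnn(x)
        return out

model = Recognizer().cuda()
x = torch.randn(2, 4, 8, device='cuda', requires_grad=True)

# Option 1: keep the model in eval() mode, but switch only the RNN back to train(),
# since the cudnn RNN backward can only be called in training mode
model.eval()
model.rnn.train()
model(x).sum().backward()

# Option 2: disable cudnn locally, so the native PyTorch implementation is used
model.eval()
with torch.backends.cudnn.flags(enabled=False):
    model(x).sum().backward()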

Did you check the .grad attributes after the backward call as suggested?
If so, did they contain valid values or were they set to None? Also, are you setting the requires_grad attribute of these parameters to False, or only in another use case for a pretrained model?

If I set requires_grad=False, then it gives an error.

I do not know about this disabling of cudnn.

But I am not understanding: since requires_grad=True for the ASTER model, it will also be updated, which is not required…

Is there any way that I can use the ASTER model only for calculating the CTC loss between the text predictions and labels, and then backpropagate the loss to update the generator params only?
Also, I am using this as an intermediate model to make the output from the generator compatible as input to the ASTER model…

class BridgeModel(nn.Module):
    def __init__(self):
        super(BridgeModel, self).__init__()

    def forward(self, x):
        # repeat the single channel to create a 3-channel input
        # out = torch.cat([x, x, x], 1)
        out = x.repeat(1, 3, 1, 1)
        out = nn.functional.interpolate(out, (128, 128))
        # normalize to [-1, 1]
        out.sub_(0.5).div_(0.5)
        input_dict = {}
        input_dict['images'] = out.to(device)
        input_dict['rec_targets'] = torch.IntTensor(1, args.max_len).fill_(1).to(device)
        input_dict['rec_lengths'] = [args.max_len]
        return input_dict

Thanks again for your help…

The .requires_grad attribute of the parameters of the model which should be updated should not be set to False.

You can disable the gradient calculation for models which should be frozen, as seen in this example:

import torch
import torch.nn as nn

modelA = nn.Linear(1, 1)
modelB = nn.Linear(1, 1)

# freeze modelB
for param in modelB.parameters():
    param.requires_grad = False

out = modelA(torch.randn(1, 1))
out = modelB(out)
out.backward()

for param in modelA.parameters():
    print(param.grad) # valid grads

for param in modelB.parameters():
    print(param.grad) # None

torch.backends.cudnn.enabled = False would disable cudnn globally, while the context manager with torch.backends.cudnn.flags(enabled=False) disables it locally.
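
Putting these pieces together for the use case above, a rough sketch with placeholder modules standing in for the generator and the frozen recognizer (the real models, the bridge module, and the CTC loss differ):

import torch
import torch.nn as nn

generator = nn.Conv2d(1, 1, 3, padding=1).cuda()     # stand-in for the UNET generator
recognizer = nn.LSTM(8, 8, batch_first=True).cuda()  # stand-in for the ASTER recognizer

# freeze the recognizer; it is only used to compute the loss
for param in recognizer.parameters():
    param.requires_grad = False
recognizer.eval()

optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)

x = torch.randn(2, 1, 8, 8, device='cuda')
target = torch.randn(2, 8, 8, device='cuda')

optimizer.zero_grad()
out = generator(x)
with torch.backends.cudnn.flags(enabled=False):       # avoid the cudnn RNN eval issue
    preds, _ = recognizer(out.squeeze(1))
loss = nn.functional.mse_loss(preds, target)          # the real pipeline would use the CTC loss
loss.backward()                                       # gradients flow back into the generator only
optimizer.step()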