The DenseNet and VGG models both expect an input in the shape `[batch_size, 3, 224, 224]`.
The error is raised because your inputs only have a spatial size of 3x3 and are thus too small.
The conv and pooling layers decrease the spatial size, so an intermediate activation would end up empty.
To avoid this error, you would have to increase the spatial size, e.g. to the aforementioned shape.
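To make the shrinking concrete, here is a rough sketch in plain Python that tracks the spatial size through a VGG-style stack. It assumes five blocks, each consisting of a 3x3 conv with padding 1 followed by a 2x2 max pool with stride 2; the helper names `conv_out` and `vgg_like_sizes` are made up for this illustration and are not part of torchvision:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a conv or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def vgg_like_sizes(size, num_blocks=5):
    """Track the spatial size through blocks of (3x3 conv, pad 1) + (2x2 max pool, stride 2)."""
    sizes = [size]
    for _ in range(num_blocks):
        size = conv_out(size, kernel=3, stride=1, padding=1)  # conv keeps the size
        size = conv_out(size, kernel=2, stride=2)             # pooling roughly halves it
        sizes.append(size)
        if size <= 0:
            # the activation would be empty here, which is where the error comes from
            break
    return sizes


print(vgg_like_sizes(224))  # [224, 112, 56, 28, 14, 7]
print(vgg_like_sizes(3))    # [3, 1, 0]
```

With a 3x3 input the size already reaches zero in the second block, while a 224x224 input still leaves a 7x7 activation after the last block.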
The error is solved, but this time I get this tensor as output. Thanks for your patience and answer.
```
tensor([[0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219],
        [0.1389, 0.0219]], device='cuda:0', grad_fn=<AddmmBackward>)
```
I want to save it as a model. I tried changing `F.relu()` to `nn.ReLU()`, but it raised another error:
AttributeError: 'ReLU' object has no attribute 'dim'
Could you post a code snippet, which raises this issue, please?
Hi ptrblck,
I am new to PyTorch (and deep learning). This example of combining trained models is exactly what I am facing now, and I need some help to check my approach:
I have model A and model B, and I use MyEnsemble(modelA, modelB) as you described.
I defined: model = MyEnsemble(modelA, modelB).
Then, for parameters optimizer I used: optimizer = th.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
model.train()
train_losses,val_losses,train_accs,val_accs = [],[],[],[]
early_stopping = EarlyStopping(patience=patience, verbose=True)
for epoch in range(n_epochs):
train_loader = DataLoader(traindata_list, batch_size=batchsize, shuffle=True, num_workers=5)
test_loader = DataLoader(testdata_list, batch_size=batchsize,
shuffle=True, num_workers=5)
avg_loss,avg_acc = [],[]
batch_idx = 0
tqdm_train_loader = tqdm(train_loader)
for Batch_data in tqdm_train_loader:
Batch_data.to(device)
out_labels = model(Batch_data)
loss = F.nll_loss(out_labels, Batch_data.y)
optimizer.zero_grad()
loss.backward()
avg_loss.append(loss.item())
optimizer.step()
_, pred = out_labels.max(dim=-1)
I am wondering whether this is the correct way to optimize the parameters.
I really appreciate your help.
Thanks!
Your code is a bit hard to read, as you haven’t formatted it. You can post code snippet by wrapping them into three backticks ```, which would make debugging easier.
Are you seeing any issues with the current code?
Hi ptrblck,
My current code (as I posted) still works and gives me some results, but I am still not sure whether it is correct.
Sorry, I will post it again:
optimizer = th.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
model.train()
train_losses,val_losses,train_accs,val_accs = [],[],[],[]
early_stopping = EarlyStopping(patience=patience, verbose=True)
for epoch in range(n_epochs):
train_loader = DataLoader(traindata_list, batch_size=batchsize, shuffle=True, num_workers=5)
test_loader = DataLoader(testdata_list, batch_size=batchsize,
shuffle=True, num_workers=5)
avg_loss,avg_acc = [],[]
batch_idx = 0
tqdm_train_loader = tqdm(train_loader)
for Batch_data in tqdm_train_loader:
Batch_data.to(device)
out_labels = model(Batch_data)
loss = F.nll_loss(out_labels, Batch_data.y)
optimizer.zero_grad()
loss.backward()
avg_loss.append(loss.item())
optimizer.step()
_, pred = out_labels.max(dim=-1)
The code looks generally alright besides the missing indentations, but I assume these are copy/paste issues.
The `Batch_data.to(device)` call looks a bit weird, as you would have to reassign the tensor as:
Batch_data = Batch_data.to(device)
However, based on the usage of `Batch_data.y`, I guess `Batch_data` might not be a tensor but a custom class, so the `to` operation might work as intended.
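To illustrate the last point, here is a minimal sketch of a custom batch class whose `to` method reassigns its own attributes, so calling it without reassignment still has an effect. `ToyBatch` is a hypothetical name and not the class used in the post above, and a dtype change stands in for the device move so that the snippet also runs without a GPU:

```python
import torch


class ToyBatch:
    """Hypothetical container holding inputs `x` and labels `y`."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def to(self, *args, **kwargs):
        # The attributes are reassigned internally, so the caller does not have to.
        self.x = self.x.to(*args, **kwargs)
        self.y = self.y.to(*args, **kwargs)
        return self


batch = ToyBatch(torch.randn(4, 2), torch.zeros(4))
batch.to(torch.float64)   # no reassignment needed for this class
print(batch.x.dtype)      # torch.float64

t = torch.randn(2)
t.to(torch.float64)       # result is discarded for a plain tensor
print(t.dtype)            # torch.float32
```

Whether a real batch class behaves this way depends on how its `to` method is implemented, so it is worth checking the device of its attributes once.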
Hi ptrblck,
Sorry for my late reply. I was sick a couple of weeks ago.
Yes, the “missing indentations” shown here are due to pasting.
Thanks for your comments and for pointing out ‘Batch_data = Batch_data.to(device)’.
I used Batch_data.to(device), and my model still trains on that data. My question is: should we always assign ‘Batch_data = Batch_data.to(device)’, and what happens if we forget to assign?
If `Batch_data` is a tensor, you have to reassign it, since this operation is not executed in-place on the tensor (unlike the recursive call in an `nn.Module`):
```python
import torch
import torch.nn as nn

device = 'cuda'
model = nn.Linear(1, 1)
# parameters are initialized on the CPU
print(model.weight.device)
# > cpu

# move to GPU (nn.Module.to moves the parameters in place)
model.to(device)
print(model.weight.device)
# > cuda:0

# create tensor on the CPU
x = torch.randn(1)
print(x.device)
# > cpu

# try to move to GPU, but forget the assignment
x.to(device)
# x is still on the CPU
print(x.device)
# > cpu

# move properly
x = x.to(device)
print(x.device)
# > cuda:0
```
Thanks ptrblck for detailed explanation with example.
I am clear about it now.
Many thanks!
Hi Ptrblck,
I have been stuck on a problem for almost a week. I want to backpropagate the loss from a text-recognition model to my generator. I have built the pipeline, but the generator parameters are not being updated. Can you help me with this…
Thanks
Could you check if the generator’s parameters get valid gradients after the corresponding `backward()` call?
You could check it via:
```python
for name, param in generator.named_parameters():
    print(name, param.grad)
```
If these `.grad` attributes are `None` after the `backward` operation, it would point towards a detached computation graph and you would have to check where it was detached.
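As a rough illustration of what a detached graph looks like, the sketch below uses two small linear layers as hypothetical stand-ins for the generator and the recognition model (they are not the models from this thread). Detaching the generator output leaves its `.grad` attributes at `None`, while the attached path populates them:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the generator and the model computing the loss
generator = nn.Linear(4, 4)
recognizer = nn.Linear(4, 1)
x = torch.randn(2, 4)

# Detached path: .detach() cuts the graph between the two models
fake = generator(x).detach()
loss = recognizer(fake).mean()
loss.backward()
detached_grads = [p.grad for p in generator.parameters()]
print(detached_grads)  # [None, None]

# Attached path: the generator output is passed on directly
recognizer.zero_grad()
fake = generator(x)
loss = recognizer(fake).mean()
loss.backward()
attached_grads = [p.grad for p in generator.parameters()]
print(all(g is not None for g in attached_grads))  # True
```

Other common ways to detach the graph are converting to NumPy and back, recreating tensors via `torch.tensor(...)`, or running part of the forward pass under `torch.no_grad()`.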
Thanks for replying.
Yes, I checked it the way you described for both models: after the training loop the grads are not None, while before it they were None.
Actually, I have an ASTER model (for text recognition) and a UNET model for text reconstruction. I want to update the generator params based on the loss calculated by ASTER…
Sorry for taking so much of your time. This is the Colab notebook,
Here at the end I have a test loop, which loops over a single image,
‘model’ is the ASTER model, and ‘model_torch_rep’ is the generator model…
Also, usually if we use a pretrained model for such a thing, we first put it into eval() and also set requires_grad=False,
but in this case, if I put it into eval, it gives the error ‘CUDNN RNN CANNOT BE BACKPROPAGATED IN EVAL’ (something like that).
To avoid the cudnn RNN issue, call `.train()` on this layer only or disable cudnn (for this layer) so that the native PyTorch implementation will be used.
Did you check the `.grad` attributes after the `backward` call as suggested?
If so, did they contain valid values or were they set to `None`? Also, are you setting the `requires_grad` attribute of these parameters to `False`, or only in another use case for a pretrained model?
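A minimal sketch of the first suggestion, keeping the whole model in eval mode and switching only the RNN layers back to training mode. `TinyRecognizer` is a hypothetical stand-in and not the ASTER architecture:

```python
import torch.nn as nn


class TinyRecognizer(nn.Module):
    """Hypothetical stand-in for a pretrained model that contains an RNN."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)


model = TinyRecognizer()
model.eval()  # batchnorm (and dropout) layers use their eval behavior

# Switch only the recurrent layers back to training mode
for module in model.modules():
    if isinstance(module, nn.RNNBase):
        module.train()

print(model.bn.training, model.rnn.training)  # False True
```

Note that an RNN created with `dropout > 0` would apply that dropout again once it is in training mode, so disabling cudnn might be the better option in that case.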
If I set requires_grad=False, then it gives an error.
I do not know about disabling cudnn.
But I do not understand: since requires_grad=True for the ASTER model, it will also be updated, which is not wanted…
Is there any way that I can use the ASTER model only for calculating the CTC loss between the text predictions and labels, and then backpropagate the loss to update only the generator params?
Also, I am using this as an intermediate model to make the output from the generator compatible with the input to the ASTER model…
```python
import torch
import torch.nn as nn

# `device` and `args` are defined elsewhere in the notebook


class BridgeModel(nn.Module):
    def __init__(self):
        super(BridgeModel, self).__init__()

    def forward(self, x):
        # out = torch.cat([x, x, x], 1)
        out = x.repeat(1, 3, 1, 1)
        out = nn.functional.interpolate(out, (128, 128))
        out.sub_(0.5).div_(0.5)
        input_dict = {}
        input_dict['images'] = out.to(device)
        input_dict['rec_targets'] = torch.IntTensor(1, args.max_len).fill_(1).to(device)
        input_dict['rec_lengths'] = [args.max_len]
        return input_dict
```
Thanks again for your help…
The `.requires_grad` attribute of the parameters of the model that should be updated should not be set to `False`.
You can disable the gradient calculation for models that should be frozen, as seen in this example:
```python
import torch
import torch.nn as nn

modelA = nn.Linear(1, 1)
modelB = nn.Linear(1, 1)

# freeze modelB
for param in modelB.parameters():
    param.requires_grad = False

out = modelA(torch.randn(1, 1))
out = modelB(out)
out.backward()

for param in modelA.parameters():
    print(param.grad)  # valid grads

for param in modelB.parameters():
    print(param.grad)  # None
```
Setting `torch.backends.cudnn.enabled = False` or using the context manager `with torch.backends.cudnn.flags(enabled=False):` would disable cudnn.
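Applied to the generator question above, a rough sketch could look like this: the frozen model is only used to compute the loss, and the optimizer only receives the generator's parameters. The linear layers are hypothetical stand-ins for the generator and the recognition model, and cross entropy stands in for the CTC loss to keep the example short:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the generator and the pretrained recognition model
generator = nn.Linear(8, 8)
recognizer = nn.Linear(8, 4)

# Freeze the recognizer; gradients still flow through it to the generator
for param in recognizer.parameters():
    param.requires_grad_(False)

# Only the generator's parameters are passed to the optimizer
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)

gen_before = [p.detach().clone() for p in generator.parameters()]
rec_before = [p.detach().clone() for p in recognizer.parameters()]

x = torch.randn(16, 8)
target = torch.randint(0, 4, (16,))

loss = F.cross_entropy(recognizer(generator(x)), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

gen_changed = any(not torch.equal(a, b) for a, b in zip(gen_before, generator.parameters()))
rec_unchanged = all(torch.equal(a, b) for a, b in zip(rec_before, recognizer.parameters()))
print(gen_changed, rec_unchanged)
```

Freezing the parameters and leaving them out of the optimizer are two separate safeguards; either one alone keeps the recognizer's weights fixed, but freezing also avoids computing gradients that are never used.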