You have to set the requires_grad attribute on the parameters, not on the nn.Module itself.
Change:
model_conv.classifier.requires_grad_=True
to
for param in model_conv.classifier.parameters():
    param.requires_grad = True
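To illustrate why the plain assignment has no effect, here is a minimal sketch. It assumes a small nn.Linear as a stand-in for model_conv.classifier (the names frozen and trainable are made up for this example). Assigning to requires_grad_ only stores a Python attribute on the module, while the loop over parameters() actually changes the flags. nn.Module also provides a requires_grad_() method that does the same thing as the loop when it is called rather than assigned to.

```python
import torch.nn as nn

# Stand-in for model_conv.classifier (assumption: any nn.Module behaves the same way here).
frozen = nn.Linear(4, 2)
for param in frozen.parameters():
    param.requires_grad = False

# Plain assignment only stores an attribute on the module object;
# the parameters themselves are left untouched.
frozen.requires_grad_ = True
still_frozen = all(not p.requires_grad for p in frozen.parameters())
print("still frozen after assignment:", still_frozen)

# Setting the flag on each parameter is what actually re-enables gradients.
trainable = nn.Linear(4, 2)
for param in trainable.parameters():
    param.requires_grad = False
for param in trainable.parameters():
    param.requires_grad = True
now_trainable = all(p.requires_grad for p in trainable.parameters())
print("trainable after loop:", now_trainable)

# Calling the method (instead of assigning to it) has the same effect as the loop.
via_method = nn.Linear(4, 2).requires_grad_(False).requires_grad_(True)
method_trainable = all(p.requires_grad for p in via_method.parameters())
print("trainable after method call:", method_trainable)
```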
Hello @ptrblck, I read through the comments, but I am still unsure why a similar error occurs in my case as well. Here's my training code:
def train_fn(data_loader, model, optimizer, scheduler):
    model.train()
    total_train_loss = 0
    lst_active_labels = []
    lst_active_preds = []
    for batch in tqdm(data_loader, total=len(data_loader)):
        b_input_ids = batch[0].cuda()
        b_input_mask = batch[1].cuda()
        b_labels = batch[2].cuda()
        # Zero the gradients
        model.zero_grad()
        outputs = model(b_input_ids,
                        attention_mask=b_input_mask,
                        labels=b_labels)
        loss = outputs[0]
        # loss.requires_grad =True
        loss.backward()
        total_train_loss += loss.item()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        labels = b_labels.view(-1)
        active_logits = outputs[1].view(-1, 2)
        flattened_predictions = torch.argmax(active_logits, axis=1)
        active_accuracy = labels.view(-1) != -100
        labels_tmp = torch.masked_select(labels, active_accuracy)
        pred_tmp = torch.masked_select(flattened_predictions, active_accuracy)
        lst_active_labels.extend(labels_tmp.tolist())
        lst_active_preds.extend(pred_tmp.tolist())
    avg_f1_score_0 = f1_score(lst_active_labels, lst_active_preds, average='binary', pos_label=0)
    avg_f1_score_1 = f1_score(lst_active_labels, lst_active_preds, average='binary', pos_label=1)
    avg_accuracy_score = accuracy_score(lst_active_labels, lst_active_preds)
    avg_mcc_score = matthews_corrcoef(lst_active_labels, lst_active_preds)
    return (float(total_train_loss / len(data_loader)), avg_f1_score_0, avg_f1_score_1,
            avg_accuracy_score, avg_mcc_score, lst_active_labels, lst_active_preds)
The error occurs in loss.backward().
Thanks in advance
Most likely you are detaching an activation tensor from the computation graph inside the model. Could you post the model definition, please?
Thanks @ptrblck for the quick response.
This is my model. I am trying to predict the quality (0/1) of each token in a pair of source and translated sentences. The training data looks like [SEP] source_sentence [SEP] target/translated_sentence [SEP], and the labels are 0 1 1 0 … for each token in the source and target.
class EntityModel(nn.Module):
    def __init__(self):
        super(EntityModel, self).__init__()
        self.bert = XLMRobertaForTokenClassification.from_pretrained(
            config.BASE_MODEL, output_attentions=False, output_hidden_states=False, num_labels=2)

    def forward(self, ids, attention_mask, labels):
        outputs = self.bert(ids,
                            attention_mask=attention_mask,
                            labels=labels, return_dict=False)
        return outputs[0], outputs[1]
I can’t see anything obviously wrong in your code, and based on the docs the model should return the logits as well as the loss.
Check the .grad_fn attributes of both outputs and make sure they point to a valid backward function (not showing None).
If the .grad_fn of these outputs is indeed set to None, then the tensor is detached from the computation graph and you might need to look into the model itself.
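As a rough illustration of that check, the sketch below compares a loss that is still attached to the graph with one computed from a detached activation. The tiny nn.Linear model and the tensor names are assumptions made for this example, not the poster's XLM-RoBERTa setup.

```python
import torch
import torch.nn as nn

# Tiny stand-in model (assumption: any module with trainable parameters works for this check).
model = nn.Linear(3, 2)
x = torch.randn(4, 3)

logits = model(x)
loss = logits.pow(2).mean()                     # still attached to the graph
detached_loss = logits.detach().pow(2).mean()   # graph was cut by .detach()

# A valid backward node is printed for the attached loss, None for the detached one.
print("attached loss grad_fn:", loss.grad_fn)
print("detached loss grad_fn:", detached_loss.grad_fn)

loss.backward()  # works, since loss.grad_fn is set
```

Calling backward() on detached_loss instead would raise the "element 0 of tensors does not require grad and does not have a grad_fn" error discussed in this thread.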
Hello,
I have a problem. I trained a model, and I want to train another model that contains the first one without changing the weights of the first one. Here is my code:
class FinalModel(pl.LightningModule):
    def __init__(self, model, class_weights):
        super().__init__()
        self.model = model
        for param in self.model.parameters():
            param.requires_grad = False
        self.linear = nn.Linear(512, 242)
        self.train_loss_tracker = EMATracker()
        self.train_acc_tracker = EMATracker()
        self.train_bal_acc_tracker = EMATracker()
        self.train_ma_prec_tracker = EMATracker()
        self.train_ma_recall_tracker = EMATracker()
        self.train_ma_f1_tracker = EMATracker()
        self.train_mi_prec_tracker = EMATracker()
        self.train_mi_recall_tracker = EMATracker()
        self.train_mi_f1_tracker = EMATracker()
        self.class_weights = class_weights

    def forward(self, x):
        x = self.model(x)
        x = self.linear(x)
        return x

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=3e-4)
        return optimizer
When I start training it, it says: element 0 of tensors does not require grad and does not have a grad_fn.
Can you please help me?
Regards
Your model seems to work fine if I remove some unneeded stuff:
import torch
import torch.nn as nn
from torchvision import models

class FinalModel(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model
        # freeze the wrapped model
        for param in self.model.parameters():
            param.requires_grad = False
        self.linear = nn.Linear(512, 242)

    def forward(self, x):
        x = self.model(x)
        x = self.linear(x)
        return x

tmp = models.resnet18()
tmp.fc = nn.Linear(512, 512)
model = FinalModel(model=tmp)
x = torch.randn(1, 3, 224, 224)
out = model(x)
print(out.grad_fn)
# <AddmmBackward0 object at 0x7f00ad895940>
out.mean().backward() # works
so I assume something else might be causing the issue in your setup. E.g. did you globally disable gradient computation?
I am sorry, you are right. The problem was somewhere else. Sorry for the inconvenience!
Try:
torch.set_grad_enabled(True)
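For context, here is a small sketch of how a globally disabled autograd produces this error and how re-enabling it resolves it. The nn.Linear model and the variable names are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
x = torch.randn(2, 3)

# Simulate code that disabled gradient computation globally and never re-enabled it.
torch.set_grad_enabled(False)
try:
    loss_no_graph = model(x).sum()   # no graph is recorded here
finally:
    torch.set_grad_enabled(True)     # restore the default behavior

try:
    loss_no_graph.backward()
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print("backward on graph-less loss:", error_message)

# With gradient computation enabled again, the same forward pass records a graph.
loss = model(x).sum()
loss.backward()
print("weight grad is set:", model.weight.grad is not None)
```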
I have this same error… Hoping someone can help me ID whatever is causing detach!
Thanks!
Mark
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(input_size, hidden_units)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(hidden_units, 256)),
    ('relu', nn.ReLU()),
    ('fc3', nn.Linear(256, 102)),
    ('output', nn.LogSoftmax(dim=1))
]))
model.classifier = classifier
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr=learning_rate)
# epochs = 10
train_losses = []
valid_losses = []
valid_accuracies = []

def train():
    print('\nTraining classifier for {}. Please wait...'.format(arch))
    model.to(device)
    for e in range(epochs):
        # print('Training Epoch {}...\r'.format(e+1), end =" ")
        # epoch timing
        epoch_start = time()
        running_train_loss = 0
        # enable dropout and gradient computing
        model.train()
        for images, labels in trainloader: # for each image input
            # Load the training image and label into GPU
            images, labels = images.to(device), labels.to(device)
            # Fresh start
            optimizer.zero_grad()
            # run fwd to get log probabilities
            log_ps = model.forward(images)
            # Check & calculate loss, hopefully improves over time
            train_loss = criterion(log_ps, labels)
            # drive the loss backward into the model
            train_loss.backward()
            # take a step and adjust weights
            optimizer.step()
            running_train_loss += train_loss.item()
        else: # loss check using validation data
            img_num = 0
            running_valid_loss = 0
            running_valid_accuracy = 0
            accuracy = 0
            # disable dropout
            model.eval()
            # gradient computing needs to be turned off explicitly
            with torch.no_grad():
                for images, labels in validloader:
                    # print('Validing Epoch {} image {}...\r'.format(e+1, img_num+1), end =" ")
                    # Load the validation image and label into GPU
                    images, labels = images.to(device), labels.to(device)
                    # run fwd to get log probabilities
                    log_ps = model.forward(images)
                    # probabilities
                    ps = torch.exp(log_ps)
                    # calculations for validation loss
                    valid_loss = criterion(log_ps, labels)
                    running_valid_loss += valid_loss.item()
                    # calculations for accuracy
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    running_valid_accuracy += torch.mean(equals.type(torch.FloatTensor))
        # epoch run time
        epoch_end = time()
        epoch_time = epoch_end - epoch_start
        # save epoch results
        train_losses.append(running_train_loss/len(trainloader))
        valid_losses.append(running_valid_loss/len(validloader))
        valid_accuracies.append(running_valid_accuracy/len(validloader))
        # print epoch results
        print(
            'Epoch# {}/{}: '.format(e+1, epochs),
            '{:.2f} sec. '.format(epoch_time),
            'Train Loss: {:.3f}. '.format(train_losses[-1]),
            'Validation Loss: {:.3f}. '.format(valid_losses[-1]),
            'Accuracy: {:.1f}%'.format(100.0*valid_accuracies[-1])
        )
    print('Training completed.')
Your general code looks alright and I cannot reproduce the issue by adding some missing pieces:
from collections import OrderedDict

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

model = models.vgg16()
input_size = model.classifier[0].in_features
hidden_units = 512
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(input_size, hidden_units)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(hidden_units, 256)),
    ('relu', nn.ReLU()),
    ('fc3', nn.Linear(256, 102)),
    ('output', nn.LogSoftmax(dim=1))
]))
model.classifier = classifier
criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(10, 3, 224, 224), torch.randint(0, 102, (10,)))
trainloader = DataLoader(dataset)
device = 'cpu'

def train():
    for e in range(10):
        running_train_loss = 0
        model.train()
        for images, labels in trainloader: # for each image input
            # Load the training image and label into GPU
            images, labels = images.to(device), labels.to(device)
            # Fresh start
            optimizer.zero_grad()
            # run fwd to get log probabilities
            log_ps = model.forward(images)
            # Check & calculate loss, hopefully improves over time
            train_loss = criterion(log_ps, labels)
            # drive the loss backward into the model
            train_loss.backward()
            # take a step and adjust weights
            optimizer.step()
        print(f"Epoch {e}, loss {train_loss.item()}")

train()
Could you check what the difference between our code snippets might be?
They look the same basically. Mine chokes on the loss.backward(), BTW. I should also mention that it did not choke on vgg16, but for this Udacity project it says to try with other pretrained models, and it died when I tried resnet18. I wouldn’t know if that’s useful information. Thanks!
Yes, that’s useful, since other models might not use the .classifier attribute.
I assume you are freezing all parameters of the resnet and then want to train the newly initialized .classifier afterwards.
However, note that resnet18 uses .fc for the last layer:
model = models.resnet18()
print(model)
# ...
# (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
# (fc): Linear(in_features=512, out_features=1000, bias=True)
# )
model.classifier
# AttributeError: 'ResNet' object has no attribute 'classifier'
# create a *new* classifier attribute which is never used
model.classifier = nn.Linear(1, 1)
print(model)
# ...
# (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
# (fc): Linear(in_features=512, out_features=1000, bias=True)
# (classifier): Linear(in_features=1, out_features=1, bias=True)
# )
So you are creating a new .classifier attribute which is never used, while your output comes from the frozen .fc layer and will thus raise the error.
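A possible way to adapt the setup is to replace the layer the model actually uses in its forward pass. The sketch below uses TinyBackbone, a hypothetical stand-in that mimics resnet18 by exposing its last layer as .fc, so that it runs without torchvision. The 102 output classes come from the classifier posted above; everything else is an assumption for illustration. With torchvision's resnet18 the same idea would apply to model.fc, whose in_features is 512 according to the printout above.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Hypothetical stand-in for resnet18: the last layer is stored in .fc."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(self.features(x))

model = TinyBackbone()

# Freeze every existing parameter, including the original .fc layer.
for param in model.parameters():
    param.requires_grad = False

# Replace the layer used in forward(); new parameters require gradients by default.
model.fc = nn.Linear(model.fc.in_features, 102)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

out = model(torch.randn(2, 3, 16, 16))
loss = out.mean()
loss.backward()   # works, because the output now depends on trainable parameters
optimizer.step()
print(out.grad_fn)
```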
Thanks for the tip! I’m going to stick with some other models for now and come back to try to get a shim to get this model working later.
Hi, may I ask a question? I want to train an ensemble model and only train the parameters of the first model, but when I call loss.backward(), I get an error message like “element 0 of tensors does not require grad and does not have a grad_fn”.
I have checked the network, and the parameters are set as intended (all parameters of modelA have requires_grad set to True, and all parameters of modelB have requires_grad set to False).
Please post some code that reproduces your error, and make sure the tensors with respect to which you are calculating the gradients are being used in the computation graph of the tensor you wish to differentiate (i.e. in the calculations of the tensor you call backward on).
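To make that point concrete, here is a minimal sketch with two small linear layers standing in for modelA (trainable) and modelB (frozen). The layer sizes and the way the two models are chained are assumptions, since the original ensemble code was not posted. A loss that depends only on the frozen model has no grad_fn, while a loss that also passes through the trainable model can be differentiated.

```python
import torch
import torch.nn as nn

model_a = nn.Linear(4, 4)   # stand-in for modelA: trainable
model_b = nn.Linear(4, 2)   # stand-in for modelB: frozen
for param in model_b.parameters():
    param.requires_grad = False

x = torch.randn(3, 4)

# Only the frozen model contributes: nothing in this graph requires gradients.
loss_b_only = model_b(x).mean()
print("frozen-only loss grad_fn:", loss_b_only.grad_fn)

# The trainable model is part of the computation: gradients flow through
# the frozen model back into model_a.
loss_chain = model_b(model_a(x)).mean()
loss_chain.backward()
print("model_a weight grad is set:", model_a.weight.grad is not None)
print("model_b weight grad is set:", model_b.weight.grad is not None)
```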
Hello @ptrblck,
I’ve built the latest C++ PyTorch from source with BUILD_MOBILE_AUTOGRAD: ON on iOS. However, when I try to reproduce the simplest autograd example, I get an error.
{
    torch::AutoGradMode enable_grad(true);
    auto x = torch::ones({2, 2}, torch::requires_grad());
    auto y = x + 2;
    y = y.sum();
    y.backward();
}
ERROR_LOG: element 0 of tensors does not require grad and does not have a grad_fn
My initial task is to collect the gradients from the forward pass in order to preprocess the input (to perform the forward pass again and get more robust softmax scores).
It seems like autograd is not working on iOS.
I think it’s a known limitation and have heard that changing some build arguments could reenable autograd. However, based on your experience it doesn’t seem to work (anymore). Could you create a GitHub issue so that the code owners could take a look at it, please?
I am having the same issue when following https://github.com/gagan3012/keytotext with a custom dataset. I am encountering the same error, and I can’t trace it back in the code. Kindly help me solve this error.
/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
485 inputs=inputs,
486 )
→ 487 torch.autograd.backward(
488 self, gradient, retain_graph, create_graph, inputs=inputs
489 )
/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
198 # some Python versions print out the first line of a multi-line function
199 # calls in the traceback and some print out the last line
→ 200 Variable.execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
201 tensors, grad_tensors, retain_graph, create_graph, inputs,
202 allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Could you post a minimal and executable code snippet reproducing the error, please?