HJX-zhanS
(Jiaxuan Han)
July 20, 2022, 2:15am
1
I am currently trying to build the following model.
My code is as follows:
```python
class CNN_1(torch.nn.Module):
    def __init__(self):
        super(CNN_1, self).__init__()
        ...

    def forward(self, X):
        ...
        return X


class CNN_2(torch.nn.Module):
    def __init__(self):
        super(CNN_2, self).__init__()
        ...

    def forward(self, X):
        ...
        return X


class CNN_3(torch.nn.Module):
    def __init__(self):
        super(CNN_3, self).__init__()
        self.cnn1 = CNN_1()
        self.cnn2 = CNN_2()
        self.fc = torch.nn.Linear(..., ...)

    def forward(self, X_1, X_2):
        X_1 = self.cnn1(X_1)
        X_2 = self.cnn2(X_2)
        X = torch.cat([X_1, X_2])
        X = self.fc(X)
        return X
```
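One detail worth double-checking in the sketch above: `torch.cat` defaults to `dim=0`, which stacks along the batch dimension, while feature concatenation before a linear layer usually needs `dim=1`. A quick illustration with made-up branch outputs:

```python
import torch

a = torch.randn(8, 30)  # (batch, features) from one branch
b = torch.randn(8, 30)  # (batch, features) from the other branch

# Default dim=0 stacks the batches on top of each other
print(torch.cat([a, b]).shape)         # torch.Size([16, 30])

# dim=1 concatenates the feature dimensions, which is what a
# following Linear layer typically expects
print(torch.cat([a, b], dim=1).shape)  # torch.Size([8, 60])
```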
The training code is as follows:

```python
model = CNN_3()
criterion = CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model.parameters(), lr=1e-3)

X_1 = ...
X_2 = ...
label = ...

model.train()
preds = model(X_1, X_2)
loss = criterion(preds, label)
loss.requires_grad_(True)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```
However, the performance of the model is not good. I checked the model parameters before and after training and found that they did not change, and that `grad_value` is `None`.
I don't know what's wrong; please help me, thank you.
thecho7
(Suho Cho)
July 20, 2022, 2:22am
2
The structure is correct.
Would you check that `optimizer.step()` is correctly called?
This is the basic pattern for how your model is updated:

```python
optimizer.zero_grad()
out = model(input)
loss = loss_function(out, gt)
loss.backward()
optimizer.step()
```
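One quick way to confirm the update actually happens is to snapshot a parameter before the step and compare it afterwards. A minimal sketch with a toy linear model (the names here are illustrative stand-ins, not your actual model):

```python
import torch

# Toy stand-in model: a single linear layer
model = torch.nn.Linear(4, 2)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 4)
gt = torch.randint(0, 2, (8,))

# Snapshot the weight before the update
before = model.weight.detach().clone()

optimizer.zero_grad()
out = model(x)
loss = criterion(out, gt)
loss.backward()
optimizer.step()

# If the graph is intact, the gradient exists and the weight moved
print(model.weight.grad is not None)               # True
print(torch.equal(before, model.weight.detach()))  # False
```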
HJX-zhanS
(Jiaxuan Han)
July 20, 2022, 2:28am
3
Thank you for your answer.
`optimizer.step()` is correctly called, but the model parameters did not change and `grad_value` is `None`.
thecho7
(Suho Cho)
July 20, 2022, 2:35am
4
Would you give us a code snippet?
HJX-zhanS
(Jiaxuan Han)
July 20, 2022, 2:45am
5
Here is the code snippet:

```python
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss


class ConvBlock(torch.nn.Module):
    def __init__(self, kernel_h, emb_size, max_line):
        super(ConvBlock, self).__init__()
        self.cnn = torch.nn.Conv1d(in_channels=emb_size, out_channels=10, kernel_size=kernel_h)
        self.max_pool = torch.nn.MaxPool1d(kernel_size=(max_line - kernel_h + 1))

    def forward(self, X):
        X = self.cnn(X.squeeze(1).permute(0, 2, 1))
        X = F.relu(X)
        # X = X.squeeze(-1)
        X = self.max_pool(X)
        X = X.squeeze(-1)
        return X


class MyTextCNN(torch.nn.Module):
    def __init__(self, emb_size, max_line):
        super(MyTextCNN, self).__init__()
        self.block2 = ConvBlock(3, emb_size, max_line)
        self.block3 = ConvBlock(4, emb_size, max_line)
        self.block4 = ConvBlock(5, emb_size, max_line)

    def forward(self, X):
        X = X.unsqueeze(1)
        X_2 = self.block2(X)
        X_3 = self.block3(X)
        X_4 = self.block4(X)
        X = torch.cat([X_2, X_3, X_4], dim=1)
        return X


class AssembleModel(torch.nn.Module):
    def __init__(self):
        super(AssembleModel, self).__init__()
        self.cnn_1 = MyTextCNN(300, 200)
        self.cnn_2 = MyTextCNN(300, 500)
        self.fc = torch.nn.Linear(30, 2)

    def forward(self, X_1, X_2):
        X_1 = self.cnn_1(X_1)
        X_2 = self.cnn_2(X_2)
        X = torch.cat([X_1, X_2], dim=1)
        out = self.fc(X)
        return out


# train
model = AssembleModel()
criterion = CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model.parameters(), lr=1e-3)

X_1 = ...
X_2 = ...
label = ...

model.train()
optimizer.zero_grad()
preds = model(X_1, X_2)
loss = criterion(preds, label)
loss.requires_grad_(True)
loss.backward()
optimizer.step()
```
HJX-zhanS
(Jiaxuan Han)
July 20, 2022, 5:10am
7
I use this code to check `grad_value`:

```python
for name, parms in model.named_parameters():
    print('-->name:', name)
    print('-->para:', parms)
    print('-->grad_requirs:', parms.requires_grad)
    print('-->grad_value:', parms.grad)
```
As for `loss.requires_grad_(True)`, if I remove it, I get this error:

```
Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
I don’t see any operation which would detach the computation graph, so could you post the input shapes to make the code executable?
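For context, that error means the loss is no longer connected to an autograd graph, and calling `requires_grad_(True)` on the loss only masks the symptom rather than reconnecting the graph. A minimal repro with plain tensors (illustrative, not the model above):

```python
import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

# Detached computation: .detach() cuts the graph, so the loss has no grad_fn
loss = (w.detach() * x).sum()
print(loss.requires_grad)  # False -- backward() here would raise the RuntimeError

# requires_grad_(True) lets backward() run without an error,
# but gradients can never reach w because the graph was already cut
loss.requires_grad_(True)
loss.backward()
print(w.grad)  # None

# Intact graph: gradients flow back to w as expected
loss2 = (w * x).sum()
loss2.backward()
print(torch.allclose(w.grad, x))  # True
```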
HJX-zhanS
(Jiaxuan Han)
July 20, 2022, 6:06am
9
The shape of `X_1` is `(batch_size, 200, 300)`, and the shape of `X_2` is `(batch_size, 500, 300)`.
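For reference, the final linear layer's input size follows from these shapes: each `ConvBlock` has `out_channels=10` and its max-pool collapses the length dimension to 1, so each `MyTextCNN` emits 3 × 10 = 30 features, and concatenating both branches gives 60. A quick check of one block's shape arithmetic (reusing the layer definitions from the snippet, not the full model):

```python
import torch
import torch.nn.functional as F

# One ConvBlock's shapes for the X_1 branch: (batch, 200, 300)
emb_size, max_line, kernel_h = 300, 200, 3
cnn = torch.nn.Conv1d(in_channels=emb_size, out_channels=10, kernel_size=kernel_h)
pool = torch.nn.MaxPool1d(kernel_size=max_line - kernel_h + 1)

x = torch.randn(4, max_line, emb_size)  # (4, 200, 300)
h = cnn(x.permute(0, 2, 1))             # (4, 10, 198): 200 - 3 + 1 positions
h = pool(F.relu(h)).squeeze(-1)         # (4, 10): pooled over all positions
print(h.shape)                          # torch.Size([4, 10])
# 3 blocks per MyTextCNN -> 30 features; 2 branches -> in_features=60 for fc
```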
Thanks! The shapes don't work, as you would be running into a shape mismatch in `self.fc`.
After fixing it by setting `in_features=60`, the model works correctly:
```python
model = AssembleModel()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model.parameters(), lr=1e-3)

X_1 = torch.randn(10, 200, 300)
X_2 = torch.randn(10, 500, 300)
label = torch.randint(0, 2, (10,))

model.train()
optimizer.zero_grad()
preds = model(X_1, X_2)
loss = criterion(preds, label)
loss.backward()
optimizer.step()

for name, param in model.named_parameters():
    print(name, param.grad.abs().sum())
```
Output:

```
cnn_1.block2.cnn.weight tensor(81.9196)
cnn_1.block2.cnn.bias tensor(0.0434)
cnn_1.block3.cnn.weight tensor(119.6594)
cnn_1.block3.cnn.bias tensor(0.0476)
cnn_1.block4.cnn.weight tensor(126.1779)
cnn_1.block4.cnn.bias tensor(0.0402)
cnn_2.block2.cnn.weight tensor(132.8887)
cnn_2.block2.cnn.bias tensor(0.0697)
cnn_2.block3.cnn.weight tensor(106.3185)
cnn_2.block3.cnn.bias tensor(0.0431)
cnn_2.block4.cnn.weight tensor(169.7578)
cnn_2.block4.cnn.bias tensor(0.0539)
fc.weight tensor(10.4102)
fc.bias tensor(0.1133)
```
So your actual use case seems to use different code.
HJX-zhanS
(Jiaxuan Han)
July 20, 2022, 8:51am
11
Thank you so much! I tested the code you provided and it worked.