Hi, I’m trying to build a digit classifier using a CNN. Each data point consists of a 70x70 image and a label, which is a list of length 5 since each image contains up to 5 digits.
Here’s an example of a label for an image that contains the digits 1, 5 and 9 (unused slots are filled with -1):
[1, 5, 9, -1, -1]
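As a minimal sketch of one way to encode labels like this for a classification loss (not necessarily how my dataset does it), every label is padded to length 5 with -1, and -1 is then mapped to class index 10, because nn.CrossEntropyLoss expects targets in the range 0 to C-1. The helper names pad_label, to_class_indices and from_class_indices are hypothetical:

```python
MAX_DIGITS = 5        # each image contains up to 5 digits
NO_DIGIT = -1         # padding value used in the labels
NO_DIGIT_CLASS = 10   # assumption: "no digit" is the 11th class (index 10)


def pad_label(digits):
    """Pad a list of up to 5 digits with -1, e.g. [1, 5, 9] -> [1, 5, 9, -1, -1]."""
    if len(digits) > MAX_DIGITS:
        raise ValueError("an image has at most 5 digits")
    return list(digits) + [NO_DIGIT] * (MAX_DIGITS - len(digits))


def to_class_indices(label):
    """Map -1 to class 10 so every slot is a valid class index in 0..10."""
    return [NO_DIGIT_CLASS if d == NO_DIGIT else d for d in label]


def from_class_indices(indices):
    """Inverse of to_class_indices: map class 10 back to -1."""
    return [NO_DIGIT if i == NO_DIGIT_CLASS else i for i in indices]


label = pad_label([1, 5, 9])
print(label, to_class_indices(label))
```

Keeping -1 in the stored labels and converting only at loss time means the labels stay readable while the loss sees valid class indices.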
Below are snippets of my code so far, for reference:
import torch
import torch.nn as nn
import torch.optim as optim

class complexNet(nn.Module):
    def __init__(self):
        super(complexNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, ...)
        ...
        self.fc5 = nn.Linear(120, 11)  # 11 possible classes: digits 0-9 and no digit, -1.

    def forward(self, x):
        x = x.cuda()
        x = self.pool(torch.sigmoid(self.conv1(x)))
        ...
        x = x.view(-1, c*w*h)
        ...
        x = self.fc5(x)
        return x
model = complexNet()
optimizer = optim.Adam(...)
loss = nn.MultiLabelSoftMarginLoss()
for i, batch in enumerate(train_loader, 0):
    x, y = batch
    logit = model(x)
    _, predicted = torch.max(logit, dim=1)
    print(predicted)
    J = loss(logit, y)
    print(J.item())
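One common way to get 5 predictions per image, shown here as a sketch rather than a drop-in fix for the code above, is to have the last layer emit 5 * 11 = 55 logits, reshape them to (batch, 5, 11) inside forward, and apply cross-entropy to each slot. SlotHead and slot_loss are hypothetical names, the 120 input features mirror fc5 above, and mapping -1 to class 10 is an assumption consistent with the 11-class comment:

```python
import torch
import torch.nn as nn

NUM_SLOTS = 5        # up to 5 digits per image
NUM_CLASSES = 11     # digits 0-9 plus "no digit"
NO_DIGIT_CLASS = 10  # assumption: -1 labels are trained as class 10


class SlotHead(nn.Module):
    """Hypothetical stand-in for fc5: 11 logits for each of the 5 digit slots."""

    def __init__(self, in_features=120):
        super().__init__()
        self.fc = nn.Linear(in_features, NUM_SLOTS * NUM_CLASSES)

    def forward(self, x):
        # (batch, 120) -> (batch, 55) -> (batch, 5, 11)
        return self.fc(x).view(-1, NUM_SLOTS, NUM_CLASSES)


def slot_loss(logits, y):
    """Cross-entropy averaged over all 5 slots of every image.

    logits: (batch, 5, 11) float tensor
    y:      (batch, 5) long tensor that uses -1 for "no digit"
    """
    targets = y.clone()
    targets[targets == -1] = NO_DIGIT_CLASS  # -1 is not a valid class index
    # CrossEntropyLoss accepts input (N, C, d1) with targets (N, d1),
    # so move the class dimension to position 1.
    return nn.CrossEntropyLoss()(logits.permute(0, 2, 1), targets)


# Usage: random features stand in for the output of the conv/fc layers
features = torch.randn(10, 120)
y = torch.tensor([[1, 5, 9, -1, -1]] * 10)
head = SlotHead()
logits = head(features)
J = slot_loss(logits, y)
J.backward()
print(logits.shape, J.item())
```

With the reshape done in forward, the training loop only ever sees (batch, 5, 11) logits. Flattening to (batch * 5, 11) logits and (batch * 5,) targets would give the same mean loss; the permute form just avoids the extra reshape of the targets.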
The print(predicted) statement outputs something like the following, where each tensor has length 10 because I’m using a batch size of 10:
...
tensor([6, 6, 6, 6, 6, 6, 6, 6, 6, 6], device='cuda:0')
tensor([ 6, 6, 6, 6, 6, 6, 6, 6, 10, 6], device='cuda:0')
tensor([6, 4, 6, 5, 6, 6, 6, 5, 6, 6], device='cuda:0')
tensor([6, 6, 6, 6, 6, 6, 5, 6, 6, 6], device='cuda:0')
...
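The length-10 output follows from the shapes involved: fc5 returns one row of 11 logits per image, so torch.max over dim=1 picks a single class per image. The sketch below contrasts that with a hypothetical (batch, 5, 11) output, where taking the max over the class dimension (dim=2) yields one class per digit slot; random tensors stand in for real model outputs:

```python
import torch

batch_size = 10

flat_logits = torch.randn(batch_size, 11)      # shape of what fc5 currently returns
_, per_image = torch.max(flat_logits, dim=1)
print(per_image.shape)  # torch.Size([10]): one class per image

slot_logits = torch.randn(batch_size, 5, 11)   # hypothetical per-slot output
_, per_slot = torch.max(slot_logits, dim=2)    # max over the 11 classes
print(per_slot.shape)   # torch.Size([10, 5]): one class per digit slot
```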
Ideally, the predictions would instead look like the following, with one row of 5 digits per image:
tensor([
    [1, 2, 3, 4, 5],
    [1, 0, 0, -1, -1],
    [5, 2, 3, 0, -1],
    [9, 2, 3, 3, -1],
    [0, -1, -1, -1, -1],
    ...
    ...
], device='cuda:0')
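Assuming the model produced (batch, 5, 11) logits with class 10 meaning "no digit", a tensor in the format shown above could be recovered by taking the argmax per slot and mapping class 10 back to -1. This is a minimal sketch; decode_predictions is a hypothetical helper, and the one-hot logits only exist to make the expected result easy to check:

```python
import torch
import torch.nn.functional as F


def decode_predictions(slot_logits, no_digit_class=10):
    """Turn (batch, 5, 11) logits into (batch, 5) digits, using -1 for empty slots."""
    predicted = slot_logits.argmax(dim=2)  # most likely class for each slot
    return predicted.masked_fill(predicted == no_digit_class, -1)


# Usage: build logits whose argmax is known, then decode them
classes = torch.tensor([[1, 5, 9, 10, 10],
                        [0, 10, 10, 10, 10]])
logits = F.one_hot(classes, num_classes=11).float()
decoded = decode_predictions(logits)
print(decoded)
```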
In addition, and unsurprisingly, the loss function produces the following warning and error:
D:\SoftwarePackages\Anaconda\lib\site-packages\torch\nn\functional.py:1594: UserWarning: Using a target size (torch.Size([10, 5])) that is different to the input size (torch.Size([10, 11])) is deprecated. Please ensure they have the same size.
"Please ensure they have the same size.".format(target.size(), input.size()))
...
ValueError: Target and input must have the same number of elements. target nelement (50) != input nelement (110)
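The error reflects the shape contract of nn.MultiLabelSoftMarginLoss: it expects a target with the same shape as the input, (N, C), holding 0 or 1 for each class, whereas y here is (10, 5) digit values (50 elements against 10 * 11 = 110). As a sketch of what that loss would actually want, the code below builds a multi-hot target from one padded label. It can only record which digits are present, so a label such as [1, 0, 0, -1, -1] loses the repeated 0 and the digit order. Using 10 columns (one per digit, no "no digit" column) is an assumption made for the illustration:

```python
import torch
import torch.nn as nn

y = torch.tensor([[1, 0, 0, -1, -1]])  # an example padded label
num_digits = 10                        # assumption: one column per digit 0-9

multi_hot = torch.zeros(y.size(0), num_digits)
for row in range(y.size(0)):
    present = y[row][y[row] >= 0]      # drop the -1 padding
    multi_hot[row, present] = 1.0      # the duplicate 0 collapses to a single 1

print(multi_hot)  # only columns 0 and 1 are set

logits = torch.randn(y.size(0), num_digits)
J = nn.MultiLabelSoftMarginLoss()(logits, multi_hot)  # shapes now match: (1, 10)
```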
For simplicity, the problem can be broken down into two main concerns for the time being:
- I would like each element of the predicted tensor to be an array of length 5 instead, where each entry is a digit from the image (or -1 for no digit).
- Should this be done in the forward function or within the training loop?
I’ve tried to solve this problem mostly based on advice from 2 other topics, but have not had any success.
Any help at all is hugely appreciated. Thank you!