How would I train my model properly?

Hello.
I am trying to train a CNN.
The CNN can be found here:

However, I don't really get it to run.
I want to use CrossEntropyLoss as my loss function, since this is the function used in the original model as well.
The reason I want to retrain is that I need to modify it a little for a bigger model and a bigger picture.
(384 * 384 image → 384 * 384 * 384 voxel space)
I only have one dataset, meaning one image and the corresponding voxel model.
I know that this will never work out for real usability, but it has to be done, just to prove that it is possible. (Huge overfitting is okay; I basically want to run it on 5 datasets later and make the CNN overfit on those 5 images.)

Now I wrote a script which should train my CNN. However, with some loss functions it runs but the loss never gets better or worse, and others, like CrossEntropyLoss, don't work at all.
Can someone help me and tell me what I am doing wrong?
My script for the CNN is the following:

import numpy as np
import cv2 as cv
import torch
import vrn  # model definition (vrn_unguided) from the VRN repository

VRN = vrn.vrn_unguided
VRN.load_state_dict(torch.load('vrn-unguided-selfmade.pth'))
# set the model to training mode
VRN.train()
# loss fct to calculate the error while training -> sigmoid cross entropy loss, same as in the original vrn.
lossfct = torch.nn.modules.loss.CrossEntropyLoss()
# optimizer settings
optimizer = torch.optim.SGD(VRN.parameters(), lr=0.1)

# between the #-lines is just data-loading 
#################################################################
# file format .txt: each non-comment line holds one filled voxel as "x y z"
voxel_array = np.zeros((1, 384, 384, 384), dtype=float)
file = open("sample_1.txt","r")
for line in file:
    if "#" not in line:
        x = line.partition(" ")[0]
        y = line.partition(" ")[2].partition(" ")[0]
        z = line.partition(" ")[2].partition(" ")[2].partition(" ")[0]
        voxel_array[0, int(x), int(y), int(z)] = 1
file.close()

###############
# here we load the image to train with
### get landmarks from test image
image_file = 'rsz_pic_zu_sample.jpg'
image = cv.imread(image_file)
try:
    image_height, image_width, image_depth = image.shape
except:
    print('cannot load image:', image_file)
# prepare the network input from the image
inp = torch.from_numpy(image.transpose((2, 0, 1))).float().unsqueeze_(0)

###############################################################

voxel_array = torch.from_numpy(voxel_array)


# the actual training takes place here. Right now it just iterates 20 times over the same data.
for epoch in range(20):
    print(epoch)
    VRN.train()
    predicted_model = VRN(inp)[-1].data.cpu()
    vol = predicted_model.numpy()

    optimizer.zero_grad()
    vol = torch.from_numpy(vol)

    print(vol.shape)
    print(voxel_array.shape)
    loss = lossfct(vol, voxel_array)
    print(loss)
    loss.backward()
    optimizer.step()

torch.save(VRN.state_dict(), "vrn-unguided-selfmade.pth")
print("Done, saved CNN")


You are detaching the output tensor in:

predicted_model = VRN(inp)[-1].data.cpu()

by accessing the .data attribute (which you shouldn't use at all, as it might yield unwanted side effects) and also in:

vol = predicted_model.numpy()

since you are leaving PyTorch and using numpy (Autograd won't be able to track these operations).

I would assume that:

    loss = lossfct(vol, voxel_array)
    loss.backward()

would raise an error, since you are using numpy arrays, but you might also have re-wrapped the array into a tensor to get rid of the error.

To fix this issue: don’t use the .data attribute and, if possible, use PyTorch operations instead of numpy.
However, if you need to use numpy specific operations, you would have to implement the backward manually via a custom autograd.Function.
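
For reference, a minimal sketch (not from the original posts) of what the training step could look like once the detaching is removed, assuming inp, voxel_array, VRN, lossfct and optimizer are set up as above and that the shapes and dtypes already fit the chosen loss function:

for epoch in range(20):
    optimizer.zero_grad()
    predicted_model = VRN(inp)[-1]                # stay in PyTorch, keep the graph attached
    loss = lossfct(predicted_model, voxel_array)  # loss is now differentiable w.r.t. the VRN parameters
    loss.backward()
    optimizer.step()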

Hello,
so I changed some code in the script and now get the following error:

RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4

However, my training script now looks like this (the rest is unchanged):

for epoch in range(20):
    print(epoch)
    VRN.train()
    predicted_model = VRN(inp)[-1].cpu()
    optimizer.zero_grad()
    print(predicted_model.shape)
    print(voxel_array.shape)
    loss = lossfct(predicted_model, voxel_array.squeeze(1))
    print(loss)
    loss.backward()
    optimizer.step()

The prints of the shapes say both have the same shape, namely

torch.Size([1, 384, 384, 384])

However, I understand that PyTorch needs it to be [384, 384, 384],
so I changed the code to

 loss = lossfct(predicted_model.squeeze(0), voxel_array.squeeze(0))

so that we lose the 1 dimension. However, this leads to

ValueError: Expected target size (384, 384), got torch.Size([384, 384, 384])

which I don't understand, because no combination makes the function work.
Basically it should just compute the difference between the number of 1s in the arrays, I think.
And by minimising this, the CNN should get better at predicting the real voxel volume.
Or am I wrong here?

nn.CrossEntropyLoss expects model outputs in the shape [batch_size, nb_classes, *additional_dims], while the target should contain class indices in the range [0, nb_classes-1] and have the shape [batch_size, *additional_dims]. Assuming you are working on a multi-class segmentation use case with volumes, your output might be [batch_size, nb_classes, depth, height, width], while the target should then be [batch_size, depth, height, width]. Based on the error message I assume that the target shape might be wrong.

Since I didn't manage to make it run with CrossEntropyLoss, I simply tried another loss function.
With that I trained the VRN network 5000 times on a single picture and the corresponding voxel model.
I thought that, regardless of the loss function etc., I could always overfit a model to fit the data.
This is my script:

VRN = vrn.vrn_unguided
VRN.load_state_dict(torch.load('vrn-unguided-selfmade.pth'))
# telling the model it will be trained
VRN.train()
# lossfct to calculate error while training -> sigmoid cross entropy loss  in original vrn.
lossfct = torch.nn.modules.loss.MSELoss()
# optimizer settings
optimizer = torch.optim.SGD(VRN.parameters(), lr=1)

# this is the data loading; it is checked and loads correctly
voxel_array = np.zeros((1, 384, 384, 384), dtype=float)
file = open("sample_1.txt","r")
for line in file:
    if "#" not in line:
        x = line.partition(" ")[0]
        y = line.partition(" ")[2].partition(" ")[0]
        z = line.partition(" ")[2].partition(" ")[2].partition(" ")[0]
        voxel_array[0, int(x), int(y), int(z)] = 1
file.close()
# image to model. 
image_file = 'rsz_pic_zu_sample.jpg'
image = cv.imread(image_file)
try:
    image_height, image_width, image_depth = image.shape
except:
    print('cannot load image:', image_file)
# prepare the network input from the image
inp = torch.from_numpy(image.transpose((2, 0, 1))).float().unsqueeze_(0)

voxel_array = torch.from_numpy(voxel_array)
voxel_array = voxel_array.float()

for epoch in range(10):
    print(epoch)
    VRN.train()
    predicted_model = VRN(inp)[-1].cpu()
    # vol = predicted_model.numpy()
    # vol = predicted_model.numpy()

    optimizer.zero_grad()

    # voxel_array = voxel_array.squeeze(0)

    # vol = torch.from_numpy(vol)
    loss = lossfct(predicted_model.squeeze(0), voxel_array.squeeze(0))
    print(loss)
    loss.backward()
    optimizer.step()

torch.save(VRN.state_dict(), "vrn-unguided-selfmade.pth")
print("Done, saved CNN")

However, after running this 5k times:
if I test the network with the exact same picture as in training, the result is way different from the training model.
This is the model it was trained on

and this is the output after 5k training iterations on the same data

How can it be that, after so many iterations, even the training data gives such a different result?

I changed my script because I think it will run if I just use the right functions etc. The script now looks like this:

lossfct = torch.nn.modules.loss.CrossEntropyLoss()
# optimizer settings
optimizer = torch.optim.RMSprop(VRN.parameters(), lr=0.1)

voxel_array = torch.from_numpy(voxel_array)
voxel_array = voxel_array.long()

for epoch in range(10):
    print(epoch)
    VRN.train()
    predicted_model = VRN.forward(inp)
    # vol = predicted_model.numpy()
    # vol = predicted_model.numpy()
    # voxel_array = voxel_array.squeeze(0)
    # vol = torch.from_numpy(vol)
    loss = lossfct(predicted_model[0], voxel_array)
    print(loss)
    loss.backward()
    optimizer.step()

but it produces the following error:

RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4

However, if I squeeze, I get tensor errors. How could I fix this?
My shape is [1, 384, 384, 384] for the target and the same for the model output. However, I don't know how to add the C (class) dimension mentioned for CrossEntropyLoss, since I theoretically have “unlimited” classes, so it simply does not make much sense to me. But without the C dimension I always get a batch size or dimension error.

This is the issue, so please take another look at my previous post, which explains the expected shapes.
The issue is that the model output should contain a class dimension in dim1, while the target should not.

Thanks! But shouldn't a simple unsqueeze(0) solve this?
However, I got this error now:

AttributeError: 'list' object has no attribute 'log_softmax'

after I changed

predicted_model[0] to predicted_model

which should basically do the same as unsqueeze.

No, since this would remove the batch dimension and you would end up with the wrong shapes again.

Based on the second error, it seems that predicted_model is a list and not a tensor, so you won’t be able to use log_softmax on it.
Also, note that indexing a tensor as x[0] is only equivalent to squeeze(0) if the size of that dimension is 1.
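
As a small illustration of that last point (a quick sketch, not from the original thread):

import torch

x = torch.randn(1, 384)
print(x[0].shape)          # torch.Size([384]) -> same result as x.squeeze(0) here
y = torch.randn(2, 384)
print(y[0].shape)          # torch.Size([384]) -> just the first slice, not a squeeze
print(y.squeeze(0).shape)  # torch.Size([2, 384]) -> unchanged, since dim0 is not of size 1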

Yes, silly of me.

print(predicted_model)

showed that it is just a 1-item list containing the tensor, so the [0] is needed.
However, if I understand you right, all I need to do is add a dimension to the predicted output of my network? Since there is only one sample to train with, it can simply be 1? But what about when I really want to train?
I then changed it to

    predicted_model[0].expand(1,1,384,384,384)
    loss = lossfct(predicted_model[0], voxel_array)

to add one dimension to my predicted model.
Now I should have the shapes
input (prediction) = [1, 1, 384, 384, 384]
and
target = [1, 384, 384, 384]

However, I still get an error, which is:

RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4

and I thought I could solve this by editing the target array the same way as the input.
With

voxel_array = torch.zeros((384, 384, 384), dtype=float)
voxel_array.expand(1,384,384,384)
predicted_model[0].expand(1,1,384,384,384)

I encounter another error:

RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4

I feel like the first approach is the right one, but I don't know. :cry:

I'm not sure what's causing the issue, as the shapes work for me:

import torch
import torch.nn as nn

nb_classes = 10
output = torch.randn(1, nb_classes, 384, 384, 384, requires_grad=True)
target = torch.randint(0, nb_classes, (1, 384, 384, 384))
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
loss.backward()

Could you double check the shapes of the output and target?

The expansion of the tensors did not happen.

    print(predicted_model[0].shape)
    print(voxel_array.shape)

still gives

torch.Size([1, 384, 384, 384])
torch.Size([1, 384, 384, 384])

so I guess I'll have to figure out why they are not transformed…

Okay, now I get

torch.Size([1, 1, 384, 384, 384])
torch.Size([1, 384, 384, 384])

from the prints.
I forgot the unsqueeze before. To expand you need to do

    predicted_model[0].unsqueeze_(0)                 # in-place; result has shape [1, 1, 384, 384, 384]
    predicted_model[0].expand(1, 1, 384, 384, 384)   # note: expand returns a new tensor and would need to be assigned

in case someone has the same problem.
Somehow, I now run into this error:

IndexError: Target 1 is out of bounds.

Might this happen because I just changed the size without putting values there?

Your current output tensor contains logits for a single class (check the size of dim1, which is 1 in your case and which is nb_classes in my case), so you would have to make sure the output has the shape [batch_size, nb_classes, depth, height, width].

But shouldn't it be just 1, since I train with my one-sample dataset? And also, what should nb_classes be for my CNN? When thinking about real training later, there are most likely “infinite” classes (I know it's not infinite, but really big). I am sorry for all my misunderstandings, I am very new to PyTorch.
Also, the CNN was already given and I am not sure how to change the output so that it fits like this.

If the number of classes is set to one, you don’t really have a valid use case, since a simple model predicting zeros for all samples would achieve an accuracy of 100% (there are no other classes in the end).
Also, since the target contains a class index of 1, this indicates that at least two classes are expected (target values would be at least [0, 1]).

So basically, for my use case (training with max. 4 samples), CrossEntropyLoss is the wrong loss function?
I simply have to show that retraining would be useful, without actually going through the whole training, because we lack data.
Can you suggest a better loss function for my use case?

No, not necessarily.
The number of samples does not correspond to the number of classes.
You can freely pick the former (number of samples), while the latter (number of classes) is defined by your use case.
E.g. if you want to train a model to predict 10 classes, the number of classes would be 10 and the output layer would return a tensor containing the logits for all these 10 classes.

Even if you are using a single sample, the corresponding class might be any value between 0 and 9, so the output shape should still be [batch_size, nb_classes, *] and the target can contain values in [0, 9].
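
Applied to the voxel occupancy setup discussed in this thread (each voxel is either empty or filled), that would mean two classes. A minimal sketch of the matching shapes; the two-channel output is an assumption for illustration, not something the given VRN model produces out of the box:

import torch
import torch.nn as nn

nb_classes = 2  # a voxel is either empty (0) or filled (1)

# assumed model output: one logit per class and per voxel
output = torch.randn(1, nb_classes, 384, 384, 384, requires_grad=True)
# occupancy target holding class indices 0/1, as built from sample_1.txt above
target = torch.zeros(1, 384, 384, 384, dtype=torch.long)
target[0, 100, 100, 100] = 1  # example filled voxel

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
loss.backward()

Alternatively, since the original VRN is described as using a sigmoid cross entropy loss, nn.BCEWithLogitsLoss with a single-channel output of shape [1, 384, 384, 384] and a float 0/1 target of the same shape would also match this binary setup.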