Error training ResNet 50 model

Hi everyone! When I try to run my code below, I get a runtime error saying: Given groups=1, weight of size [64, 3, 7, 7], expected input[32, 256, 256, 3] to have 3 channels, but got 256 channels instead.

I wrote a custom dataset that was supposed to account for this, but I’m not sure what’s wrong.

Custom Dataset

import os
from glob import glob

import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset


class BreastCancerImages(Dataset):

    def __init__(self, csv_file, root_dir, n_channel=3, transform=None, lbl_col_idx=6, image_file_dir='/kaggle/input/rsna-mammography-images-as-pngs/images_as_pngs/train_images_processed'):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied on a sample.
        """
        self.n_channel = n_channel
        self.d_train = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform
        self.lbl_col_idx = lbl_col_idx
        self.files = list(glob(os.path.join(image_file_dir, "**", "*.png")))

    def __len__(self):
        return len(self.d_train)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        # x: load the image at this index (assumes self.files order matches the CSV rows)
        img_name = self.files[idx]
        image = open_img(img_name, self.n_channel)
        image = torch.tensor(image)
        if self.transform:
            image = self.transform(image)

        # y: read the label from the annotations CSV
        labels = self.d_train.iloc[idx, self.lbl_col_idx]
        labels = np.array([labels])
        labels = torch.tensor(labels.astype('int'))
        return image, labels

ResNet 50

import torch.nn as nn
from torchvision import models

resnet50 = models.resnet50(pretrained=True)
resnet50

for param in resnet50.parameters():
    param.requires_grad = False

resnet50.fc = nn.Linear(2048, 2)  # ResNet-50's final fc layer takes 2048 input features

Training Code

for epoch in range(num_epochs):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(train_loader):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')

I’m assuming you accounted for this issue in your open_img function. If you don’t mind sharing that one as well, I think I might be able to help you there. If that is not the case, then the issue is that your input tensor has the size [BxHxWxC] whereas [BxCxHxW], i.e. [32, 3, 256, 256], was expected. You can simply return image.permute(0, 3, 1, 2), labels and it should work fine. Otherwise, run your code in debug mode and check the tensor sizes before and after every forward call.
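For reference, a quick shape check of that permutation (the sizes are taken from your error message):

import torch

x = torch.randn(32, 256, 256, 3)  # batch in [BxHxWxC] layout, as in the error
x = x.permute(0, 3, 1, 2)         # reorder to [BxCxHxW]
print(x.shape)                    # torch.Size([32, 3, 256, 256])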

I’ll try returning image.permute(0, 3, 1, 2), labels
Here’s my open_img function:

import cv2
import numpy as np

def open_img(f_path, n_channel=3, resize_to=None):
    if n_channel == 1:
        # Load as grayscale and append a channel axis -> shape (H, W, 1)
        im = np.expand_dims(cv2.imread(str(f_path), cv2.IMREAD_GRAYSCALE), axis=-1)
    else:
        # cv2.imread returns a channels-last (H, W, 3) BGR array
        im = cv2.imread(str(f_path))
    if resize_to is not None:
        im = cv2.resize(im, (resize_to, resize_to))
    im = (im / 255.0).astype(np.float32)  # scale pixel values to [0, 1]
    return im

Yeah, I think the format returned by cv2.imread differs from what a torch image tensor should look like, and you did not account for that.

How should I change my function to account for this?

Just add the permute to your return: either in open_img, which returns a np.array of shape [HxWxC] (in that case change its return to return im.permute(0, 3, 1, 2)), or change the __getitem__ return to the one suggested above.

  1. for im.permute(0, 3, 1, 2), I get an error saying: AttributeError: ‘numpy.ndarray’ object has no attribute ‘permute’

  2. for image.permute(0, 3, 1, 2), I get an error saying: RuntimeError: number of dims don’t match in permute

I’m sorry, the numpy method is not called permute; it is called transpose:

https://numpy.org/doc/stable/reference/generated/numpy.transpose.html

As for the dimension error: __getitem__ and cv2.imread do not return a batch; they return/read a single data point without a batch dimension. So try return image.permute(2, 0, 1), labels or return im.transpose(2, 0, 1) (the ndarray method takes the axes positionally), and it should work fine.
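For reference, a minimal sketch of both single-image conversions from [HxWxC] to [CxHxW] (using a dummy 256x256 RGB array in place of a real image):

import numpy as np
import torch

im = np.zeros((256, 256, 3), dtype=np.float32)  # [HxWxC], as open_img returns

# numpy: ndarray.transpose takes the new axis order positionally
im_chw = im.transpose(2, 0, 1)                  # -> (3, 256, 256)

# torch: convert first, then permute
t = torch.from_numpy(im).permute(2, 0, 1)       # -> torch.Size([3, 256, 256])
print(im_chw.shape, t.shape)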

Thank you, I have fixed that error. However, when I go to train my model, I get the error: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu! (when checking argument for argument weight in method wrapper_nll_loss_forward)

for param in resnet50.parameters():
    param.requires_grad = False
    
resnet50.fc = nn.Linear(2048,2)
model = resnet50.to(device)
# Define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss(weight = class_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for epoch in range(num_epochs):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(train_loader):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')

How do I fix this error?

Make sure that your CrossEntropyLoss weights are a CUDA tensor. That way, when the loss function applies the weights, they will be on the same device as the model outputs.


I have fixed the error above with the following code:

# Define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss(weight = class_weights.to(device))
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

However, when I go to train my model, I am consistently getting this output:

[screenshot: training loss, printed as 0.000000 on every batch]

Training Code

for epoch in range(num_epochs):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(train_loader):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, torch.max(labels, 1)[1])
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        print('Epoch: {} \tTraining Loss: {:.6f} '.format(epoch, running_loss))

print('Finished Training')

Where is this coming from?

First of all, maybe check the weight tensor. It is very weird to get exactly zero from CE loss.

Also, what are you trying to do with the following line? Why are you applying a max operation?

loss = criterion(outputs, torch.max(labels, 1)[1])

In my opinion, your dataloader should give you the proper ground truth labels right away.

Try just running one debug pass with a batch_size of 1 and see what loss value you get.
If you are trying to do semantic segmentation, also try saving the tensors and visualizing them to make sure you are feeding your criterion the proper data.

If you are trying to do multiclass segmentation, your target tensors should be of dtype=torch.long, and your output from the network should be of size [B,C,H,W], where B is the batch size, C is the number of classes to predict, and (H,W) is the original image size. The labels (ground truth) should come out of your dataloader with shape [B, H, W].
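A toy illustration of those shapes with CrossEntropyLoss (all sizes here are made up):

import torch
import torch.nn as nn

B, C, H, W = 4, 3, 64, 64
logits = torch.randn(B, C, H, W)                           # network output: per-class score per pixel
target = torch.randint(0, C, (B, H, W), dtype=torch.long)  # ground truth: one class index per pixel
loss = nn.CrossEntropyLoss()(logits, target)
print(loss.item())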


Since my dataset is imbalanced, I used a weighted loss function and calculated the weights like this:

class_weights = [(class_cancer / d_train['cancer'].count()), (class_no_cancer / d_train['cancer'].count())]
class_weights=torch.tensor(class_weights,dtype=torch.float)

class_weights

If I simply use:

loss = criterion(outputs, labels)

I get an error saying: 0D or 1D target tensor expected, multi-target not supported

As for my task: my model needs to output 2 class scores (cancer, no cancer).
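The multi-target error happens because CrossEntropyLoss expects class-index targets of shape [B], while your dataset’s labels = np.array([labels]) gives them shape [B, 1] after batching. A minimal sketch:

import torch
import torch.nn as nn

outputs = torch.randn(32, 2)           # [B, num_classes] logits
labels = torch.randint(0, 2, (32, 1))  # [B, 1], as the custom dataset returns them
criterion = nn.CrossEntropyLoss()

# criterion(outputs, labels) would raise the multi-target error;
# squeezing the extra dimension gives the expected [B] class-index target
loss = criterion(outputs, labels.squeeze(1))
print(loss.item())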

Since your model needs to differentiate between two classes, the loss function best suited for this is BCEWithLogitsLoss, because what you are describing is a binary classification task (it is either one class or the other).

Unless I have more than two classes in my dataset, I always define which one is the positive class (I would guess that it is ‘class_cancer’ in this case) and pass the weight you calculated with class_cancer / d_train['cancer'].count() as the pos_weight argument.
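A self-contained sketch of that setup (the class counts here are made up, and pos_weight follows the common n_negative / n_positive convention):

import torch
import torch.nn as nn

# Hypothetical class counts; a common convention is pos_weight = n_negative / n_positive
n_cancer, n_no_cancer = 100, 900
pos_weight = torch.tensor([n_no_cancer / n_cancer])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(32, 1)                     # model with a single output unit
targets = torch.randint(0, 2, (32, 1)).float()  # 1 = cancer, 0 = no cancer
loss = criterion(logits, targets)
print(loss.item())

Note the single output unit: with something like nn.Linear(2048, 1) as the final layer, the targets can stay [B, 1] floats rather than needing a [B, 2] shape.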


I see, that makes sense. But since the target and input sizes have to match for BCEWithLogitsLoss, how do I adjust my labels to have size [32, 2] (the same as the input)?