Hi, I am new to ML (and here), and this error has been haunting me for days. I have checked (almost) all the solutions online, but none of them solves my problem. Any help would be very much appreciated.
I am using a CNN to classify images into 11 classes. The challenge is that only a portion of the training set is labelled (about 3000 images); the rest (about 6000) are unlabelled. I am trying to do the labelling before each training epoch: if the predicted probability for an image is higher than a certain threshold (e.g. 0.7), I add the image and its predicted label to two lists,
```python
def get_pseudo_labels(dataset, model, threshold=0.7):
    ...  # a DataLoader over `dataset` (the unlabelled set) is created here
    samples = []
    pseudolabels = []
    for batch in dataloader:
        img, _ = batch  # img size: [128 (batch size), 3, 128, 128]
        with torch.no_grad():
            logits = model(img.to(device))
        probs = torch.softmax(logits, dim=-1)  # size: [128, 11]
        max_probs, max_indices = torch.max(probs, dim=1)
        for i, (max_prob, max_idx) in enumerate(zip(max_probs, max_indices)):
            if max_prob > threshold:
                samples.append(img[i].cpu())        # tensor of size (3, 128, 128)
                pseudolabels.append(max_idx.cpu())  # 0-dim tensor, one of the 11 classes
    # I have checked: the two lists always have the same length
    if len(samples) > 1:
        dataset = MyDataset(samples, pseudolabels)  # the pseudo set
    else:
        dataset = None
    return dataset
```
and then send the two lists to instantiate a MyDataset object, like this:
```python
import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, X, y=None):
        # Stack the lists of tensors into single tensors
        self.data = torch.stack(X)    # e.g. torch.Size([98, 3, 128, 128]); (3, 128, 128) is one image
        self.label = torch.stack(y)   # e.g. torch.Size([98]); 98 is however many samples passed the 0.7 threshold

    def __getitem__(self, idx):
        return self.data[idx], self.label[idx]

    def __len__(self):
        return len(self.data)
```
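(Side note, in case it matters: my understanding is that MyDataset should behave the same as `torch.utils.data.TensorDataset` once the lists are stacked. A minimal sketch, with dummy data standing in for my `samples`/`pseudolabels` lists:)

```python
import torch
from torch.utils.data import TensorDataset

# Dummy stand-ins for the samples/pseudolabels lists built above.
samples = [torch.zeros(3, 128, 128) for _ in range(5)]
pseudolabels = [torch.tensor(k % 11) for k in range(5)]

# Equivalent to MyDataset(samples, pseudolabels): stack once, then index.
pseudo_set = TensorDataset(torch.stack(samples), torch.stack(pseudolabels))

img, label = pseudo_set[0]
print(img.shape, label.shape)  # torch.Size([3, 128, 128]) torch.Size([])
```

So indexing yields an image tensor and a 0-dim label tensor, the same as MyDataset does.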
The code for training:
```python
for epoch in range(n_epochs):
    if do_semi:  # semi-supervised: feed the pseudo-labels back into the training set
        pseudo_set = get_pseudo_labels(unlabeled_set, model)
        if pseudo_set is not None:
            concat_dataset = ConcatDataset([train_set, pseudo_set])
            train_loader = DataLoader(concat_dataset, batch_size=batch_size,
                                      shuffle=True, num_workers=2, pin_memory=True)
        else:
            train_loader = DataLoader(train_set, batch_size=batch_size,
                                      shuffle=True, num_workers=2, pin_memory=True)

    # ---------- Training ----------
    model.train()
    # Iterate over the training set in batches.
    for idx, batch in enumerate(train_loader):
        imgs, labels = batch
        # When the batch has no problem, it yields:
        # imgs:   torch.Size([128, 3, 128, 128]) <class 'torch.Tensor'>
        # labels: torch.Size([128])              <class 'torch.Tensor'>
```
Here, at the beginning of the inner loop (where the DataLoader yields a batch), is where the error occurs:
```
AttributeError: Caught AttributeError in DataLoader worker process 1.
...
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 52, in default_collate
    numel = sum([x.numel() for x in batch])
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 52, in <listcomp>
    numel = sum([x.numel() for x in batch])
AttributeError: 'int' object has no attribute 'numel'
```
It looks like something that is not a tensor has been passed into the dataset. I have checked almost everywhere: the pseudo set itself is fine, and the error occurs while the DataLoader is producing a batch. The weird part is that the error does not appear every time a pseudo set is added to training; it shows up later in training, but sooner or later it always comes. Could anyone tell me where it could go wrong? Or recommend a better way to wrap up the pseudo set?
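One guess while writing this up (I have not confirmed it): if the original train_set returns plain Python `int` labels while my MyDataset returns 0-dim tensors, then a shuffled batch from the ConcatDataset would mix the two types, and the collate line from the traceback would fail exactly like this (minimal sketch, not my actual code):

```python
import torch

# A shuffled batch from the ConcatDataset could mix label types:
# a 0-dim tensor pseudo-label next to a plain int label.
labels_in_batch = [torch.tensor(3), 7]

# The line from collate.py in the traceback assumes every element
# has .numel() once the first element is a Tensor:
try:
    numel = sum([x.numel() for x in labels_in_batch])
except AttributeError as e:
    print(e)  # 'int' object has no attribute 'numel'
```

That might also explain the intermittency, since the failure would depend on which label type happens to come first in a batch.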
Again, any help is very VERY MUCH appreciated.
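Edit: in case it helps, here is the small type-check I am planning to run over the concatenated dataset before training, to flag any item whose label is not a tensor (a sketch, with dummy datasets standing in for my real train_set and pseudo_set):

```python
import torch
from torch.utils.data import ConcatDataset, Dataset, TensorDataset

# Dummy stand-ins: a dataset with plain int labels (like an image-folder
# style train_set) and one with 0-dim tensor labels (like my pseudo set).
class IntLabelSet(Dataset):
    def __init__(self, n=3):
        self.items = [(torch.zeros(3, 128, 128), 1) for _ in range(n)]
    def __getitem__(self, idx):
        return self.items[idx]
    def __len__(self):
        return len(self.items)

pseudo_set = TensorDataset(torch.zeros(2, 3, 128, 128), torch.tensor([0, 1]))
concat_dataset = ConcatDataset([IntLabelSet(), pseudo_set])

# Flag every index whose label is not a tensor.
bad = [i for i in range(len(concat_dataset))
       if not torch.is_tensor(concat_dataset[i][1])]
print(bad)  # -> [0, 1, 2]
```

If anything shows up in `bad`, I suppose that would confirm the labels have mixed types.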