I have 5529 labels, and each image has a variable number of labels. In my final result, I can have up to 100 labels.
I made my custom dataset following this code: https://www.kaggle.com/mratsim/starting-kit-for-pytorch-deep-learning
```python
import os

import numpy as np
import pandas as pd
import torch
from PIL import Image
from sklearn.preprocessing import MultiLabelBinarizer
from torch.utils.data import Dataset


class myCustomDataset(Dataset):
    """My dataset."""

    def __init__(self, csv_file, root_dir, img_ext, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            img_ext (string): Image file extension appended to each name, e.g. '.jpg'.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        tmp_df = pd.read_csv(csv_file, sep=';', header=None)
        assert tmp_df.iloc[:, 0].apply(lambda x: os.path.isfile(root_dir + x + img_ext)).all(), \
            "Some images referenced in the CSV file were not found"

        self.mlb = MultiLabelBinarizer()
        self.root_dir = root_dir
        self.img_ext = img_ext
        self.transform = transform

        self.X_train = tmp_df.iloc[:, 0]
        # Note: this binarizes column 0, the same image-name column used for
        # X_train, so every distinct image name becomes its own class.
        self.y_train = self.mlb.fit_transform(tmp_df.iloc[:, 0].str.split()).astype(np.float32)

    def __len__(self):
        return len(self.X_train.index)

    def __getitem__(self, index):
        img = Image.open(self.root_dir + self.X_train[index] + self.img_ext)
        img = img.convert('RGB')
        if self.transform is not None:
            img = self.transform(img)

        # One row of y_train as a float32 tensor.
        label = torch.from_numpy(self.y_train[index])
        return img, label
```
Now I pass my validation set to the dataset and create a DataLoader:
```python
from torchvision import transforms

transformedvalid_dataset = myCustomDataset(
    csv_file='/home/nis/Downloads/trialdata/Validation-Concepts.csv',
    root_dir='/home/nis/Downloads/trialdata/validation-set/',
    img_ext='.jpg',
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),  # RandomSizedCrop is the deprecated name
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ]),
)
```
I am able to create my custom dataset and the validation loader:
```python
print(len(transformedvalid_dataset))
```
```
14157
```
```python
trainloader = torch.utils.data.DataLoader(transformedvalid_dataset, batch_size=32, shuffle=True)
dataiter = iter(trainloader)
images, labels = next(dataiter)
print(type(images))
print(images.shape)
print(labels.shape)
```
```
<class 'torch.Tensor'>
torch.Size([32, 3, 224, 224])
torch.Size([32, 14157])
```
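The second dimension of `labels` is the number of classes the `MultiLabelBinarizer` learned, i.e. `len(self.mlb.classes_)`. The small self-contained sketch below runs `fit_transform` on a hypothetical four-row table to compare binarizing column 0 (as the posted `__init__` does) with binarizing a label column. The image ids, label names, and whitespace-separated label format are invented for illustration; the real CSV layout may differ.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical annotation table: column 0 = image id, column 1 = labels
# (whitespace-separated here purely as an assumption for the demo).
tmp_df = pd.DataFrame([
    ["img_a", "C1 C2"],
    ["img_b", "C2"],
    ["img_c", "C1 C3"],
    ["img_d", "C3"],
])

# As in the posted __init__: binarize column 0, the image ids.
mlb_ids = MultiLabelBinarizer()
y_ids = mlb_ids.fit_transform(tmp_df.iloc[:, 0].str.split()).astype(np.float32)
print(y_ids.shape)  # (4, 4): one "class" per image id

# Binarizing column 1, the labels, instead.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tmp_df.iloc[:, 1].str.split()).astype(np.float32)
print(list(mlb.classes_))            # ['C1', 'C2', 'C3']
print(y.shape)                       # (4, 3): n_images x n_labels
print(y[0])                          # [1. 1. 0.]  multi-hot row for img_a
print(mlb.inverse_transform(y[:1]))  # [('C1', 'C2')]
```

Each row of `y` is what `self.y_train[index]` holds for one image; `torch.from_numpy(y[0])` wraps that float32 row as a 1-D tensor without copying it.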
But when I do the same for the training images (56638 images), I get a memory error while constructing the dataset itself. So I have not even reached the DataLoader for the training images; that would be the next step, and the batch size is only set in the DataLoader.
I could not figure out the cause or find any links that explain how to fix the error.
How can I solve it? Also, the second dimension of my labels equals the number of images in the validation set (14157); is that correct?
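A rough size estimate shows why the training set fails while the validation set loads. The sketch below assumes that, as in the validation run, every training image name is distinct, so the posted code builds a dense float32 `y_train` with one column per image name; the 5529 figure is the label count stated at the top of the post. The helper name `dense_label_bytes` is hypothetical.

```python
def dense_label_bytes(n_rows: int, n_classes: int, itemsize: int = 4) -> int:
    """Bytes for a dense (n_rows x n_classes) array; itemsize 4 = float32."""
    return n_rows * n_classes * itemsize


n_train = 56638   # training images (from the post)
n_labels = 5529   # distinct labels (from the post)

# Posted code: column 0 (image names) is binarized.
# Assumes all 56638 names are distinct, so one class per image name.
per_image_name = dense_label_bytes(n_train, n_train)
# Binarizing the actual label column instead: one class per label.
per_label = dense_label_bytes(n_train, n_labels)

print(f"image names as classes: {per_image_name / 1e9:.2f} GB")  # 12.83 GB
print(f"labels as classes:      {per_label / 1e9:.2f} GB")       # 1.25 GB
```

For comparison, the 14157 × 14157 float32 validation matrix is about 0.80 GB, which is why it still fits. Note also that `.astype(np.float32)` returns a copy by default, so peak memory while building `y_train` is higher than the final array size.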
Also, how can I see what the data looks like at
`self.y_train = self.mlb.fit_transform(tmp_df.iloc[:,0].str.split()).astype(np.float32)`
and at
`label = torch.from_numpy(self.y_train[index])`?
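Even when a label column is binarized, a dense float32 matrix of 56638 × 5529 entries is large. One option is `MultiLabelBinarizer(sparse_output=True)`, which returns a SciPy CSR matrix that stores only the nonzero entries, and then densifying a single row per `__getitem__` call. `SparseMultiHotLabels` below is a hypothetical helper, not part of any library; it assumes the labels for each image sit in one column as whitespace-separated strings (pass `sep` otherwise), and which CSV column that is remains an assumption about your file.

```python
from typing import Optional

import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer


class SparseMultiHotLabels:
    """Hypothetical helper: stores multi-hot labels as a sparse CSR matrix
    and builds a dense float32 vector for one image at a time."""

    def __init__(self, label_strings: pd.Series, sep: Optional[str] = None):
        # sparse_output=True -> CSR matrix of shape (n_images, n_labels)
        # that stores only the "label present" entries.
        self.mlb = MultiLabelBinarizer(sparse_output=True)
        self.matrix = self.mlb.fit_transform(label_strings.str.split(sep)).tocsr()

    def __len__(self) -> int:
        return self.matrix.shape[0]

    def row(self, index: int) -> np.ndarray:
        # Densify a single row: length n_labels, 1.0 where the label is present.
        return self.matrix[index].toarray().ravel().astype(np.float32)


# Tiny demo with made-up label strings (whitespace-separated by assumption).
demo = SparseMultiHotLabels(pd.Series(["C1 C2", "C2", "C1 C3"]))
print(list(demo.mlb.classes_))  # ['C1', 'C2', 'C3']
print(demo.row(0))              # [1. 1. 0.]

# Inside myCustomDataset (assuming column 1 holds the labels):
#   self.labels = SparseMultiHotLabels(tmp_df.iloc[:, 1])
#   ...
#   label = torch.from_numpy(self.labels.row(index))
```

Storage for the CSR matrix grows with the number of label assignments rather than with n_images × n_labels, at the cost of one small `toarray()` per sample.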