Dataloader iterates through entire dataset

I created a dataset that loads a single data sample at a time on demand (1 sample consists of multiple images), and I have a data loader with a small batch size. When I try to show just the first few batches of my dataset, the loader keeps trying to iterate through my entire dataset instead of pulling out just the few data samples:

import numpy as np
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset

class FaceDataset(Dataset):
    def __init__(self):
        df = pd.read_csv("data/positions.csv")
        df["filename"] = df["id"].astype("str") + ".jpg"

        self.filenames = df["filename"].tolist()
        self.targets = torch.FloatTensor(list(zip(df["x"], df["y"])))
        self.head_angle = torch.FloatTensor(df["head_angle"].tolist())

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, idx):
        sample = {
            "targets": self.targets[idx],
            "head_angle": self.head_angle[idx],
        }

        for img_type in ["face", "face_aligned", "l_eye", "r_eye", "head_pos"]:
            img = Image.open("data/{}/{}".format(img_type, self.filenames[idx]))
            img = torch.from_numpy(np.array(img))
            # img /= 255
            sample[img_type] = img

        return sample

ds = FaceDataset()
data = DataLoader(ds, batch_size=2, shuffle=True, num_workers=2)

for i_batch, sample_batched in enumerate(data):
    print(i_batch, sample_batched)

    if i_batch == 1:
        break

The length of my dataset is about 15k samples, and the loader seems to be trying to load everything instead of just a single batch of 2. Jupyter just freezes before anything gets printed. Am I creating my dataset/loader incorrectly?
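For reference, all I am trying to do is peek at the first couple of batches. A self-contained sketch of that pattern on a toy `TensorDataset` (standing in for my real `FaceDataset`), using `itertools.islice` to stop after two batches:

```python
from itertools import islice

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the real dataset: 100 samples of (input, target).
ds = TensorDataset(torch.randn(100, 1), torch.randn(100, 1))
loader = DataLoader(ds, batch_size=2, shuffle=True, num_workers=0)

# islice stops after two batches; the loader only calls __getitem__
# for the indices it actually needs, so the rest is never loaded.
for i_batch, batch in enumerate(islice(loader, 2)):
    print(i_batch, batch)
```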

When I try to iterate over the dataset, I seem to get the correct data back for 1 training sample. The problem only seems to come in when iterating over the data loader object:

for batch in ds:
    print(batch)
    break

EDIT: I think I've narrowed this down to the num_workers param. When I set it to 0, the whole thing works as expected, but when I set it to anything > 0, nothing gets printed.
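With `num_workers > 0` the DataLoader spawns worker processes, and on platforms that use the "spawn" start method (Windows, and some notebook setups) each worker re-imports the main module, so loader code at module top level can deadlock. A hedged sketch of the usual workaround, moving everything behind a main guard (whether the guard is required is platform-dependent; on Linux with the default "fork" start method it is usually unnecessary):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def main():
    # Toy dataset in place of the real one.
    ds = TensorDataset(torch.randn(100, 1), torch.randn(100, 1))
    loader = DataLoader(ds, batch_size=2, shuffle=True, num_workers=2)
    for i_batch, batch in enumerate(loader):
        print(i_batch, batch)
        if i_batch == 1:
            break


if __name__ == "__main__":
    # Under "spawn", worker processes re-execute this module's top level;
    # the guard keeps them from re-running the loader loop themselves.
    main()
```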

Your small code snippet runs correctly on my machine:

dataset = TensorDataset(torch.randn(100, 1), torch.randn(100, 1))
loader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=2)

for batch_idx, batch in enumerate(loader):
    print(batch_idx, batch)
    if batch_idx == 1:
        break

> 0 [tensor([[ 0.7712],
>            [-0.3303]]), tensor([[0.0505], ...
> 1 [tensor([[-0.5104],
>            [-0.5665]]), tensor([[-0.4806], ...

Is the DataLoader with multiple workers working at all in your current setup and is only this small loop creating issues?
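One way to check whether the workers are coming up at all is a `worker_init_fn` that reports its worker id (a sketch; note that prints from worker processes can be swallowed by some Jupyter frontends, which is itself a known symptom of this class of problem, so running it as a plain script is more reliable):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, get_worker_info


def report_worker(worker_id):
    # Runs once in each worker process, right after it starts.
    info = get_worker_info()
    print("worker {} started, seed={}".format(worker_id, info.seed))


dataset = TensorDataset(torch.randn(100, 1), torch.randn(100, 1))
loader = DataLoader(dataset, batch_size=2, num_workers=2,
                    worker_init_fn=report_worker)

# Each worker should print a line before the first batch arrives;
# if the call hangs instead, the workers never started properly.
next(iter(loader))
```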