Sure, there are several ways to implement data augmentation.
One way would be to use torchvision.transforms on images. As these transformations often work only on images, we would need to convert the numpy arrays into images first, augment the data, and finally convert them to tensors.
Another way would be to implement these transformations ourselves, as we already have the tensor data.
In both cases we would have to take care of the targets as well, since the keypoints have to be transformed to match the image, e.g. flipped along with it.
In the first case, we could use torchvision’s functional API. Here is a small example I wrote a while ago.
You could just reimplement the blog post’s data augmentation, as the flip indices etc. are already provided:
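import random

import torch
from torch.utils.data import Dataset


class MyCustomDataset(Dataset):
    def __init__(self, x, y, dtype):
        # clone() so the tensors do not share memory with the numpy arrays
        self.x = torch.from_numpy(x).to(dtype=dtype).clone()
        self.y = torch.from_numpy(y).to(dtype=dtype).clone()
        self.dtype = dtype
        self.data_len = len(x)
        # Pairs of target indices that trade places on a horizontal flip
        # (left/right keypoints), taken from the blog post
        self.flip_indices = [
            (0, 2), (1, 3),
            (4, 8), (5, 9), (6, 10), (7, 11),
            (12, 16), (13, 17), (14, 18), (15, 19),
            (22, 24), (23, 25),
        ]

    def __getitem__(self, index):
        # clone() so the augmentation does not modify the stored data
        img = self.x[index].clone()
        label = self.y[index].clone()
        # Flip each sample with a probability of 0.5
        if random.randint(0, 1) == 1:
            print('Flipping image')
            # Flip along the width dimension (assumes img is [C, H, W])
            img = img.flip(2)
            # x-coordinates sit at the even indices; negating mirrors them
            # only if the targets are centered around 0, as in the blog post
            label[::2] = label[::2] * -1
            # Swap left and right keypoints
            for a, b in self.flip_indices:
                label[a], label[b] = label[b].clone(), label[a].clone()
        return (img, label)

    def __len__(self):
        return self.data_len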
You have to add some clone() calls, as otherwise the original data will be modified or the label swap won’t work.
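To illustrate both problems, here is a small standalone sketch with toy values (not the real keypoint data). Indexing a tensor returns a view, so a plain tuple swap reads an already overwritten value, and torch.from_numpy shares memory with the source array:

```python
import numpy as np
import torch

# Swap without clone(): both right-hand values are views into `label`,
# so the second assignment reads the value written by the first one.
label = torch.tensor([1.0, 2.0])
label[0], label[1] = label[1], label[0]
naive = label.clone()  # both entries now hold the same value

# Swap with clone(): the right-hand values are independent copies.
label = torch.tensor([1.0, 2.0])
label[0], label[1] = label[1].clone(), label[0].clone()
swapped = label.clone()  # entries are exchanged as intended

# torch.from_numpy shares memory with the numpy array ...
arr = np.zeros(3, dtype=np.float32)
shared = torch.from_numpy(arr)
shared[0] = 5.0  # also changes arr[0]

# ... while a clone can be modified without touching the array.
copied = torch.from_numpy(arr).clone()
copied[1] = 7.0  # arr[1] is unchanged
```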
Using the Dataset, you just have to handle a single sample. The DataLoader will automatically create batches using the Dataset.
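As a rough sketch of how the batching works, here is a stand-in dataset with placeholder data. TinyDataset, the 1x96x96 image shape, the 30 target values, and the batch size are all assumptions for illustration, not taken from your code:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class TinyDataset(Dataset):
    """Hypothetical stand-in that returns one (image, label) pair per index."""

    def __init__(self, n):
        self.x = torch.zeros(n, 1, 96, 96)  # placeholder images
        self.y = torch.zeros(n, 30)         # placeholder keypoint targets

    def __getitem__(self, index):
        # Only a single sample is handled here
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)


# The DataLoader stacks single samples into batches
loader = DataLoader(TinyDataset(10), batch_size=4, shuffle=True)
batch_shapes = [(tuple(imgs.shape), tuple(labels.shape)) for imgs, labels in loader]
```

With 10 samples and a batch size of 4, the last batch is smaller than the others unless you pass drop_last=True.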
I’m not sure how Lasagne calculates the losses, but I assume that in both cases the mean losses of all batches are averaged over the epoch.
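To make that assumption concrete, here is a small sketch with made-up loss values and batch sizes. It says nothing about what Lasagne actually does; it only shows that averaging the per-batch means differs from a per-sample average when the last batch is smaller:

```python
# Made-up per-batch mean losses and batch sizes for one epoch
batch_losses = [0.5, 0.3, 0.1]
batch_sizes = [4, 4, 2]

# Average of the per-batch means (what I assume is being reported)
mean_of_batch_means = sum(batch_losses) / len(batch_losses)

# Average over all samples, weighting each batch by its size
per_sample_mean = sum(
    loss * n for loss, n in zip(batch_losses, batch_sizes)
) / sum(batch_sizes)
```

If all batches have the same size, both values are identical.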