I am currently working on an implementation of the Consistency regularization method proposed in the following paper.
[arXiv:2002.04724] Improved Consistency Regularization for GANs
Within this paper, when injecting real and generated images into the Discriminator, they apply data augmentation to each image and compute the loss value according to the following algorithm.
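For readers without the paper at hand, here is a rough sketch of how the consistency terms could be added to the discriminator loss. It assumes the balanced form (bCR), in which the squared difference between the discriminator output on an image and on its augmented copy is penalized for both real and generated images. The names `bcr_penalty`, `augment`, `lambda_real`, and `lambda_fake` are placeholders of mine, and the weight values are illustrative only.

```python
import torch
import torch.nn as nn

def bcr_penalty(discriminator, real, fake, augment,
                lambda_real=10.0, lambda_fake=10.0):
    """Consistency penalty to add to the discriminator loss (sketch).

    `augment` is any callable mapping a [B, C, H, W] tensor to a tensor of
    the same shape; the lambda weights are illustrative placeholders.
    """
    # Penalize changes in the discriminator output under augmentation.
    l_real = (discriminator(real) - discriminator(augment(real))).pow(2).mean()
    l_fake = (discriminator(fake) - discriminator(augment(fake))).pow(2).mean()
    return lambda_real * l_real + lambda_fake * l_fake

# Toy usage with a linear discriminator and a horizontal flip as augmentation.
torch.manual_seed(0)
disc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1))
real = torch.randn(4, 3, 8, 8)
fake = torch.randn(4, 3, 8, 8)  # stands in for generator(z).detach()
flip = lambda t: torch.flip(t, dims=[-1])
penalty = bcr_penalty(disc, real, fake, flip)
```

Because `augment` receives the generated batch directly, it has to operate on tensors on the same device as the generator output, which is the point of the question.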
If I only want to perform data augmentation on the real images, I can do so during batch data creation in
torch.utils.data.Dataset, but how can I perform data augmentation on a Tensor generated from the Generator?
The following method sends data back and forth between the CPU and the GPU, which adds overhead and slows down computation.
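import torch
from torchvision import transforms

def transform(image_tensor):  # function name assumed
    """transform tensor for consistency regularization
    :param image_tensor: torch.Tensor [B, C, H, W]
    """
    device = image_tensor.device
    _, _, H, W = image_tensor.shape
    cr_transforms = transforms.Compose([
        # ... (augmentation steps not shown)
        transforms.Normalize((0.5, ), (0.5, ))
    ])
    # every image is moved to the CPU, transformed one by one, then stacked
    image_tensor = torch.cat([cr_transforms(image).unsqueeze_(0)
                              for image in image_tensor.cpu()], dim=0)
    return image_tensor.to(device)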
What’s the best way?
Obviously, as you have mentioned, transferring every tensor between the CPU and the GPU just to use the available methods, which only work on PIL images, is not a proper solution.
As far as I know, there is no built-in function for Crop, but for the others we have a solution.
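As an illustration of an augmentation that needs no PIL round trip, here is a minimal sketch of a per-sample random horizontal flip written with plain tensor operations. The function name `random_hflip` and its parameter `p` are my own, not part of torchvision.

```python
import torch

def random_hflip(batch, p=0.5):
    """Flip each image of a [B, C, H, W] batch left-right with probability p."""
    # One random decision per sample, created on the same device as the batch.
    flip_mask = torch.rand(batch.shape[0], 1, 1, 1, device=batch.device) < p
    # Broadcast the mask over C, H, W and pick the flipped or original pixels.
    return torch.where(flip_mask, batch.flip(-1), batch)

x = torch.arange(2 * 1 * 2 * 3, dtype=torch.float32).reshape(2, 1, 2, 3)
always = random_hflip(x, p=1.0)
never = random_hflip(x, p=0.0)
```

Since only tensor operations are used, the batch stays on its device and gradients can flow back to the generator.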
One solution is to copy the source code and just change the PIL input to a tensor. For instance, here is the implementation of RandomResizedCrop
For cropping, indexing works fine, and for resizing there is a built-in function. The only remaining issue is the random generation of the crop parameters, which can be copied from the source code.
Here is what I have changed; it works for a single 3D tensor:
import math
import random
import torch
import torch.nn.functional as F

# an arbitrary 3D input
x = torch.ones((3, 100, 100)) * 255
x[:, 25:75, 25:75] = 0
scale = (0.08, 1.0)  # torchvision's default for RandomResizedCrop
ratio = (3. / 4., 4. / 3.)
height, width = x.shape[-2], x.shape[-1]
size = (64, 64)
area = height * width
z = None
for _ in range(10):
    target_area = random.uniform(*scale) * area
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    aspect_ratio = math.exp(random.uniform(*log_ratio))
    w = int(round(math.sqrt(target_area * aspect_ratio)))
    h = int(round(math.sqrt(target_area / aspect_ratio)))
    if 0 < w <= width and 0 < h <= height:
        i = random.randint(0, height - h)
        j = random.randint(0, width - w)
        z = i, j, h, w
        break  # a valid crop was found
if z is None:
    # Fallback to central crop
    in_ratio = float(width) / float(height)
    if in_ratio < min(ratio):
        w = width
        h = int(round(w / min(ratio)))
    elif in_ratio > max(ratio):
        h = height
        w = int(round(h * max(ratio)))
    else:  # whole image
        w = width
        h = height
    i = (height - h) // 2
    j = (width - w) // 2
    z = i, j, h, w
resized = F.interpolate(x[:, i:h, j:w].unsqueeze(0), size=size, mode='bicubic')
You can also wrap this code in a class, in the same form as the source code I referenced.
I tested it the way you presented it here and made a minor fix to the crop indices:
# x[:, i:h, j:w]
x[:, i:i+h, j:j+w]
That probably makes more sense, as j+w never exceeds the original width, and h and w are only a proportion of the original sizes.
The code I provided still needs a few other fixes, though. For instance, it uses external libraries (random, math) that need to be replaced by torch to work on tensors.
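Putting the pieces of this thread together, a torch-only version for a whole batch might look like the sketch below. It applies the corrected `i:i+h, j:j+w` indexing and draws the random numbers with torch instead of `random`. The function name `random_resized_crop` is mine, the fallback is simplified to the whole image rather than torchvision's central crop, and I use bilinear interpolation here as an arbitrary choice.

```python
import math
import torch
import torch.nn.functional as F

def random_resized_crop(batch, size=(64, 64), scale=(0.08, 1.0),
                        ratio=(3. / 4., 4. / 3.)):
    """Randomly crop and resize each image of a [B, C, H, W] batch (sketch)."""
    _, _, height, width = batch.shape
    area = height * width
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    crops = []
    for image in batch:
        # Simplified fallback: keep the whole image if no valid crop is found.
        i, j, h, w = 0, 0, height, width
        for _ in range(10):
            target_area = area * torch.empty(1).uniform_(*scale).item()
            aspect = math.exp(torch.empty(1).uniform_(*log_ratio).item())
            cand_w = int(round(math.sqrt(target_area * aspect)))
            cand_h = int(round(math.sqrt(target_area / aspect)))
            if 0 < cand_w <= width and 0 < cand_h <= height:
                h, w = cand_h, cand_w
                # torch.randint excludes the upper bound, hence the +1.
                i = int(torch.randint(0, height - h + 1, (1,)))
                j = int(torch.randint(0, width - w + 1, (1,)))
                break
        crop = image[:, i:i + h, j:j + w].unsqueeze(0)
        crops.append(F.interpolate(crop, size=size, mode='bilinear',
                                   align_corners=False))
    return torch.cat(crops, dim=0)

x = torch.randn(4, 3, 100, 100, requires_grad=True)
out = random_resized_crop(x)
out.sum().backward()  # gradients reach the input through crop and resize
```

Only the scalar crop parameters are drawn on the CPU; the image data itself never leaves its device, so this can be applied directly to the generator output.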
Hi, I just found a much better solution using kornia, an augmentation library that works directly on tensors and supports CUDA too.
Here is the simplified solution:
import kornia as K
import torch

x = torch.randn(1, 3, 200, 200).cuda()
Note that these augmentation methods can be used in the exact same way as