I was playing with a simple neural style transfer model, using the pretrained VGG model from torchvision. Following the official advice, I preprocessed the input images (content and style; the image being optimized starts as random noise, so I didn't preprocess it) before training (see the prep transform below), and post-processed the images (content, style, optimized input) once training was done (see inv_prep below).
The problem is that preprocessing seems to give worse results: when I removed the mean/std preprocessing and tried again, I got a better result, with both runs starting from the same initial noise image. Both versions of the transforms are below; a short usage sketch follows them.
import numpy as np
import torch as th
import torchvision as thv

# below, no mean/std preprocessing, only image Scale + axis-moving
prep = thv.transforms.Compose([
    thv.transforms.Scale(200),                                               # resize shorter side to 200 px (Resize in newer torchvision)
    thv.transforms.Lambda(lambda x: np.moveaxis(np.array(x), 2, 0)),         # PIL image -> HWC array -> CHW
    thv.transforms.Lambda(lambda x: th.from_numpy(x.astype('float32'))),     # uint8 values in [0, 255] -> float32 tensor
    thv.transforms.Lambda(lambda x: x.unsqueeze(0))                          # add batch dimension
])
inv_prep = thv.transforms.Compose([
    thv.transforms.Lambda(lambda x: x.squeeze(0)),                           # drop batch dimension
    thv.transforms.Lambda(lambda x: x.numpy().clip(0, 255).astype('uint8')), # back to uint8 pixel values
    thv.transforms.Lambda(lambda x: np.moveaxis(x, 0, 2)),                   # CHW -> HWC
])
# below, with mean/std preprocessing, and yes I know I could use ToPILImage to convert tensors back to images
# (mean/std are the ImageNet statistics recommended for torchvision's pretrained models)
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

prep = thv.transforms.Compose([
    thv.transforms.Scale(200),
    thv.transforms.ToTensor(),                       # PIL image -> CHW float tensor in [0, 1]; needed before Normalize
    thv.transforms.Normalize(mean=mean, std=std),    # x -> (x - mean) / std, per channel
    thv.transforms.Lambda(lambda x: x.unsqueeze(0))  # add batch dimension
])
inv_prep = thv.transforms.Compose([
    thv.transforms.Lambda(lambda x: x.squeeze(0)),                                  # drop batch dimension
    thv.transforms.Normalize(mean=[0, 0, 0], std=1 / std),                          # x -> x * std
    thv.transforms.Normalize(mean=-mean, std=[1, 1, 1]),                            # x -> x + mean, undoing Normalize
    thv.transforms.Lambda(lambda x: (x.numpy().clip(0, 1) * 255).astype('uint8')),  # [0, 1] floats -> uint8 pixels
    thv.transforms.Lambda(lambda x: np.moveaxis(x, 0, 2)),                          # CHW -> HWC
])
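For reference, this is roughly how I apply the two transforms around the optimization; a rough sketch only, with hypothetical file and variable names (content.jpg, style.jpg, opt_img) and the actual loss/optimizer loop omitted:
# rough usage sketch (hypothetical names): prep before training, inv_prep afterwards
from PIL import Image

content = prep(Image.open('content.jpg'))   # preprocessed content image, 1 x 3 x H x W
style = prep(Image.open('style.jpg'))       # preprocessed style image, 1 x 3 x H x W
opt_img = th.randn(content.size())          # random noise input, NOT preprocessed
opt_img.requires_grad_(True)                # this is the tensor being optimized
# ... style-transfer optimization loop over opt_img goes here ...
result = inv_prep(opt_img.detach().cpu())   # back to an H x W x 3 uint8 array for viewing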
This is the result image I got with mean/std preprocessing, after 100 iterations:
This is the result image I got without preprocessing, after 100 iterations:
Why is that? Did I do something wrong? I checked the final optimized input: its pixel values range from roughly [-50, 50], and I'm not sure how to convert them back to real image pixel values. I think this is the part I didn't get right, and it's what produces the bad result in the first picture (although I can still make out the shape of Peter :-|).
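For context, a quick sanity check (a small sketch, assuming the ImageNet mean/std defined above): real pixels in [0, 1] can only map into a narrow band under Normalize, so values around ±50 sit far outside anything inv_prep can turn back into a valid pixel:
# range that real pixel values in [0, 1] occupy after Normalize(mean, std)
lo = (0.0 - mean) / std   # per-channel lower bound, about [-2.12, -2.04, -1.80]
hi = (1.0 - mean) / std   # per-channel upper bound, about [ 2.25,  2.43,  2.64]
print(lo, hi)             # anything outside this band gets clipped away by inv_prep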