PyTorch equivalent of TensorFlow conv2d_transpose filter tensor

The target, style_labels, is a 27x1 tensor where each element is a class label (an int in the range 0-26 inclusive).

It’s not a multi-label target in that sense (i.e. an image can only have one class), but the TensorFlow implementation used sigmoid_cross_entropy_with_logits, whose PyTorch equivalent is the loss I’m using. The plain nn.CrossEntropyLoss is then definitely the better choice - thanks for picking that up! The TensorFlow implementation might be wrong then…
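To make the difference concrete, here’s a minimal sketch of the two target formats (the batch size, the style_logits name, and the shapes are my own assumptions, not from the original code):

```python
import torch
import torch.nn as nn

num_classes = 27   # style labels are ints in [0, 26]
batch_size = 4     # hypothetical batch size

# Hypothetical discriminator output: raw (unnormalized) class scores.
style_logits = torch.randn(batch_size, num_classes)

# Integer class labels; nn.CrossEntropyLoss expects a target of shape [N],
# so an [N, 1] tensor like style_labels needs to be flattened first.
style_labels = torch.randint(0, num_classes, (batch_size, 1))
loss_ce = nn.CrossEntropyLoss()(style_logits, style_labels.squeeze(1))

# TensorFlow's sigmoid_cross_entropy_with_logits corresponds to
# nn.BCEWithLogitsLoss, which treats each class as an independent binary
# problem and therefore wants one-hot float targets instead.
one_hot = nn.functional.one_hot(style_labels.squeeze(1), num_classes).float()
loss_bce = nn.BCEWithLogitsLoss()(style_logits, one_hot)
```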

Now I guess the question is whether nn.CrossEntropyLoss satisfies what the paper requires of this loss:

Maximizing the stylistic ambiguity can be achieved by maximizing the style class posterior entropy. Hence, we need to design the loss such that the generator G produces an image x ∼ p_data and, meanwhile, maximizes the entropy of p(c|x) (i.e. style class posterior) for the generated images. However, instead of maximizing the class posterior entropy, we minimize the cross entropy between the class posterior and a uniform target distribution. Similar to entropy that is maximized when the class posteriors (i.e., p(c|G(z))) are equiprobable, cross entropy with uniform target distribution will be minimized when the classes are equiprobable. So both objectives will be optimal when the classes are equiprobable. However, the difference is that the cross entropy will go up sharply at the boundary since it goes to infinity if any class posterior approaches 1 (or zero), while entropy goes to zero at this boundary condition. Therefore, using the cross entropy results in a hefty penalty if the generated image is classified to one of the classes with high probability. This in turn would generate very large loss, and hence large gradients, if the generated images start to be classified to any of the style classes with high confidence.
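If I’m reading that right, the generator’s style term isn’t cross entropy against the real labels at all, but cross entropy between the predicted class posterior and a uniform distribution. A minimal sketch of how I’d write that in PyTorch (the function name and shapes are my own assumptions):

```python
import torch
import torch.nn.functional as F

def style_ambiguity_loss(style_logits: torch.Tensor) -> torch.Tensor:
    """Cross entropy H(u, p) between a uniform target u and the class
    posterior p(c|G(z)). Minimized when all classes are equiprobable;
    grows very large if any single posterior approaches 1."""
    num_classes = style_logits.size(1)
    log_posterior = F.log_softmax(style_logits, dim=1)       # log p(c|G(z))
    uniform = torch.full_like(log_posterior, 1.0 / num_classes)
    return -(uniform * log_posterior).sum(dim=1).mean()

# Usage sketch: logits from the discriminator's style head on generated images.
loss = style_ambiguity_loss(torch.randn(4, 27))
```

As far as I know, PyTorch 1.10+ also lets nn.CrossEntropyLoss take probability targets directly, so passing a uniform [N, 27] tensor as the target should give the same result.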

It’s times like these I wish my maths was better!