I am training a simple GAN model where the discriminator has to classify between real and generated images. The discriminator model looks like this:
import numpy as np
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, image_size=128, conv_dim=64, c_dim=5, repeat_num=6):
        super().__init__()
        layers = []
        layers.append(nn.Conv2d(3, conv_dim, kernel_size=4, stride=2, padding=1))
        curr_dim = conv_dim
        # Each block halves the spatial size and doubles the channel count
        for i in range(1, repeat_num):
            layers.append(nn.Conv2d(curr_dim, curr_dim*2, kernel_size=4, stride=2, padding=1))
            curr_dim = curr_dim * 2
        kernel_size = int(image_size / np.power(2, repeat_num))
        self.main = nn.Sequential(*layers)
        # conv1 feeds the real/fake output, conv2 feeds the class output
        self.conv1 = nn.Conv2d(curr_dim, 1, kernel_size=3, stride=1, padding=1, bias=False)
        self.conv2 = nn.Conv2d(curr_dim, c_dim, kernel_size=kernel_size, bias=False)
        self.soft = nn.Softmax()
The Sigmoid layer should have a single output indicating whether an image is real or fake. The Softmax layer should output the probabilities for all 5 classes the fake image might belong to.
I am getting values ranging from 258.514 to 3.999 as an output for the Sigmoid layer.
Shouldn’t this layer constrain the values between 0 and 1? Am I using the layers in the right manner?
Could you post a minimal and executable code snippet showing this unexpected behavior, please?
Your code is unfortunately still not executable and I cannot reproduce the issue using:
model = Discriminator()
x = torch.randn(2, 3, 128, 128)
out = model(x)
# tensor(0.4997, grad_fn=<MinBackward1>) tensor(0.5007, grad_fn=<MaxBackward1>)
When I try to reproduce the issue with a randomly initialized tensor, I get values between 0 and 1:
tensor(0.4993, grad_fn=<MinBackward1>) tensor(0.5003, grad_fn=<MaxBackward1>)
But when I run the code with actual images, this is what I get:
tensor([[[[52.6691, 52.6701, 52.6636, 52.6650, 52.6738]
Could the issue be with the images then?
I doubt it, but would need an executable code snippet to be able to reproduce and debug the issue.
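As a sanity check, here is a small sketch showing that torch.sigmoid maps any finite input into the range 0 to 1, even inputs on the scale of unnormalized pixel values. The input range used below (0 to 255) is an assumption for illustration and is not taken from your data. If a printed tensor contains values like 52.67, it is most likely not the tensor that came out of the Sigmoid.

```python
import torch

torch.manual_seed(0)

# Hypothetical pre-activation values on the scale of raw 0-255 pixels
logits = torch.rand(2, 1, 2, 2) * 255.0
probs = torch.sigmoid(logits)

# The Sigmoid output stays bounded regardless of the input scale
print(probs.min().item() >= 0.0, probs.max().item() <= 1.0)

# Very large or very small inputs saturate instead of leaving the range
extremes = torch.sigmoid(torch.tensor([-300.0, 0.0, 300.0]))
print(extremes)
```

So it may be worth checking which tensor is actually being printed, for example the raw output of self.conv1 rather than the result of the Sigmoid.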
Perhaps your second Conv2d after the Sigmoid is getting kernels outside of 0 and 1 during training. Maybe try moving your Sigmoid after that layer and see what happens.
Thanks for your suggestion.
But I am feeding the output of self.conv1 into the Sigmoid layer and the output of self.conv2 into the Softmax layer. Will the second Conv2d still affect the output of the Sigmoid layer?
Anyway, I made the change and tried again, but it's still not working as expected.
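To check whether the second Conv2d can influence the Sigmoid output, here is a minimal sketch with two heads reading the same feature map. The layer sizes, the variable h, and the way the heads are wired are assumptions, since the actual forward pass is not shown in this thread. In a layout like this the two branches are independent in the forward pass: overwriting the weights of conv2 leaves the Sigmoid output unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical shared feature map, standing in for h = self.main(x)
h = torch.randn(2, 8, 2, 2)

# Two heads on the same features (sizes chosen only for this sketch)
conv1 = nn.Conv2d(8, 1, kernel_size=3, stride=1, padding=1, bias=False)  # real/fake head
conv2 = nn.Conv2d(8, 5, kernel_size=2, bias=False)                       # class head

out_src_before = torch.sigmoid(conv1(h))

# Drastically change the weights of the class head
with torch.no_grad():
    conv2.weight.mul_(1000.0)

# Recompute both heads from the same features
out_src_after = torch.sigmoid(conv1(h))
out_cls = torch.softmax(conv2(h).view(h.size(0), -1), dim=1)

print(torch.equal(out_src_before, out_src_after))  # Sigmoid branch is unaffected
print(out_cls.sum(dim=1))                          # each row of class probabilities sums to 1
```

During training both heads still send gradients into the shared layers, so they are coupled through the weights of self.main, but that alone cannot push a Sigmoid output outside the range 0 to 1.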
On closer inspection, I see what you’re doing in the forward pass.
Perhaps try cloning h and assigning it to another variable (i.e. r = h.clone()) before sending it through further layers. Then use r in your other branch.
It’s hard to say exactly what the issue is without seeing your training process.
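The cloning suggestion above could look roughly like this. The names h and r follow the post; the in-place multiplication is a hypothetical example of how one branch could otherwise modify the tensor the other branch reads. If nothing modifies h in place, cloning should not change the result.

```python
import torch

torch.manual_seed(0)

# Hypothetical shared features, standing in for h = self.main(x)
h = torch.randn(2, 4)
expected = h.clone()  # reference copy, used only for the comparison below

# Give the second branch its own copy before anything modifies h
r = h.clone()

# An in-place op in the first branch changes h itself
h.mul_(100.0)

# r still holds the original values, while h does not
print(torch.equal(r, expected))
print(torch.equal(h, expected))
```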