I know that the values themselves come out as expected, but they have no gradient.
And when I use floating-point numbers instead, the generated image doesn't look anything like what I want. I don't know what to do.
Since you weren't able to train the model using the uint8 approach, could it be that the issue with the generated image quality is caused by some other part of your model/training process rather than this particular function?
Theoretically, I could implement it with cv2, but then there would be no gradient during training. My processing happens inside the training graph, so I have to work with tensors. However, operating on the float tensor directly is not enough and it has to be converted to uint8, which gives me a headache.
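For context, this is presumably the kind of OpenCV call meant here (my assumption about the exact function): cv2.convertScaleAbs computes |alpha * src + beta| saturated to uint8, but it works on NumPy arrays outside the autograd graph, so no gradient can flow back through it.

import cv2
import numpy as np

# dummy uint8 image standing in for the real input (hypothetical values)
img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
# relighting outside the graph: saturate_cast<uint8>(|alpha*img + beta|), non-differentiable
relit = cv2.convertScaleAbs(img, alpha=2.5, beta=0.1)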
The main issue is that you are using torch.clamp(new_tensor, 0, 255), and this is probably why you are getting very different outputs for the int vs. float operations. One workaround is to use rounding operations.
import torch

# your int method
def apply_relighting_tensor_int(tensor, alpha, beta):
    tensor = tensor * 255.0
    # casting to uint8 truncates the values and breaks the gradient
    new_tensor = tensor.to(torch.uint8)
    new_tensor = new_tensor * alpha + beta / 255.
    new_tensor = torch.abs(new_tensor)
    new_tensor = new_tensor.to(torch.float32)
    new_tensor = new_tensor / 255.0
    return new_tensor
# new float method
def apply_relighting_tensor_float(tensor, alpha, beta):
    tensor_float = tensor * 255.0
    # round instead of casting to uint8 to mimic the integer behaviour
    new_tensor = torch.round(tensor_float * alpha + beta, decimals=1)
    new_tensor = new_tensor / 255.0
    return torch.round(new_tensor, decimals=1)
if __name__ == "__main__":
    image_tensor = torch.rand(1, 3, 64, 64)
    # random alpha and beta
    alpha = 2.5
    beta = 0.1
    # apply functions
    int_tensor = apply_relighting_tensor_int(image_tensor, alpha=alpha, beta=beta)
    float_tensor = apply_relighting_tensor_float(image_tensor, alpha=alpha, beta=beta)
    # check stats
    print(f"INT VS FLOAT MAX: {int_tensor.max()} vs {float_tensor.max()}")
    print(f"INT VS FLOAT MIN: {int_tensor.min()} vs {float_tensor.min()}")
    print(f"INT VS FLOAT MEAN: {int_tensor.mean()} vs {float_tensor.mean()}")
    print(f"INT VS FLOAT STD: {int_tensor.std()} vs {float_tensor.std()}")
Though it doesn't reproduce the uint8 outputs exactly, the stats show that the float method is almost the same as the int method.
INT VS FLOAT MAX: 2.4901974201202393 vs 2.5
INT VS FLOAT MIN: 1.537870048196055e-06 vs 0.0
INT VS FLOAT MEAN: 1.2464064359664917 vs 1.2522379159927368
INT VS FLOAT STD: 0.7196686863899231 vs 0.7208324670791626
This apply_relighting_tensor_float method shouldn’t have any issues with the gradient flow.
Thank you very much for your reply. The round operation may affect the gradient.
Since the round operation is not differentiable (its gradient is zero almost everywhere), using rounding inside a neural network or an optimization process leads to two problems (see the small check after this list):
Gradient blocking: the operation makes the gradient along that path zero, which blocks gradient flow and may prevent the model from learning effectively.
Optimization difficulties: non-differentiable operations such as round make optimization harder because gradient information is lost or inaccurate.
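A minimal check of this point (my addition, reusing apply_relighting_tensor_float from the post above): backpropagate a scalar loss through the function and inspect the input gradient.

import torch

x = torch.rand(1, 3, 8, 8, requires_grad=True)
out = apply_relighting_tensor_float(x, alpha=2.5, beta=0.1)
out.sum().backward()
# torch.round's backward is defined as zero, so nothing reaches x
print(x.grad.abs().max())  # prints 0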
The relit image will then be fed into a detector, so I can't afford to break the gradient. I'll try your approach first; if it still doesn't work, I'll come back and let you know.