Based on your code snippet I assumed batch corresponds to the batch size, in which case my code snippet should yield the same result as yours.
Could you post the shape of A and an executable code snippet (using random values for A)?
import torch

batch_size = 3
height = 2
width = 2
A = torch.randint(2, 11, (batch_size, height, width)).float()
AA = A.clone()
print(A)

# I can get what I want from the for-loop solution below
for i in range(batch_size):
    A[i] -= torch.min(A[i])
    A[i] /= torch.max(A[i])

# Your solution
AA -= AA.min(1, keepdim=True)[0]
AA /= AA.max(1, keepdim=True)[0]

print(A)   # A and AA are different
print(AA)
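For comparison, here is a minimal sketch of a flatten-with-view variant (my reading of the "view approach" mentioned below, not necessarily the exact original snippet). It reduces over all non-batch dimensions and therefore matches the loop, unlike min(1)/max(1) on the 3-D tensor, which only reduce dim1:

x = torch.randint(2, 11, (batch_size, height, width)).float()

x_loop = x.clone()
for i in range(batch_size):
    x_loop[i] -= torch.min(x_loop[i])
    x_loop[i] /= torch.max(x_loop[i])

x_view = x.clone()
flat = x_view.view(batch_size, -1)          # flatten each sample; shares storage with x_view
flat -= flat.min(1, keepdim=True)[0]        # per-sample min
flat /= flat.max(1, keepdim=True)[0]        # per-sample max (after the shift)

print(torch.allclose(x_loop, x_view))       # True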
Strange, but your approach with views is very slow.
It is faster than the loop approach when I measure it with timeit, but the inference pipeline got about 10 times slower (about 50 FPS with the for loop vs. about 5 FPS with views).
EDIT 1:
Just added torch.cuda.synchronize()
for loop: 0.5 ms
view approach: 150 ms
I don’t understand what is happening; view shouldn’t change the tensor itself (e.g. from contiguous to non-contiguous).
Do you have any thoughts?
Additional info:
I use CUDA tensor with shape [B, 3, 1024, 1024]
torch version: 1.2.0
cuda version: 10.0.130
GPU: NVIDIA Quadro GV100
OS: Linux
The view operation should be really cheap, as it only changes the metadata, i.e. no copy will be performed, and you would get an error if a copy were necessary.
Could you post your profiling code so that I could take a look, please?
I think the overhead is created by the torch.max call with a dimension keyword, as it’ll also return the indices.
In your use case the batch dimension can more or less be ignored, as it’s much smaller than the flattened height*width.
If I’m not mistaken, there was recently a feature request to add a return_indices argument to torch.max.
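As a side note, newer PyTorch releases (well after 1.2.0) provide torch.amin/torch.amax, which return only the values without computing indices. A rough timing sketch with explicit synchronization (tensor sizes and names here are just placeholders) could look like this:

import time
import torch

x = torch.randn(8, 3, 1024, 1024, device='cuda')

def min_max_view(t):
    # flatten each sample; amin/amax return only the values, no indices
    flat = t.view(t.size(0), -1)
    flat -= flat.amin(1, keepdim=True)
    flat /= flat.amax(1, keepdim=True)
    return t

for _ in range(10):            # warmup
    min_max_view(x.clone())
torch.cuda.synchronize()

t0 = time.perf_counter()
for _ in range(100):
    min_max_view(x.clone())
torch.cuda.synchronize()
print('{:.3f} ms per call'.format((time.perf_counter() - t0) / 100 * 1e3))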
@ptrblck, say I have a DataLoader and want to normalize my whole dataset using the min-max scaling solution above. Would it be a good approach to do it on every batch (where the min and max would differ between batches)? Or would there be a better approach?
Usually you wouldn’t normalize with the batch statistics directly, since you would also need to do the same during your inference/deployment code.
Depending on your use case it could work (e.g. if your test setup also uses the same batch size), or it could change the behavior of your model a bit, since the min and max values depend on the actual noise in each batch. The effect might be small, so it could still work, but you would need to verify it.
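A minimal sketch of the alternative (computing the statistics once over the training set and reusing them at inference; loader and variable names are just placeholders):

data_min, data_max = float('inf'), float('-inf')
for x, _ in train_loader:                   # assumes (input, target) batches
    data_min = min(data_min, x.min().item())
    data_max = max(data_max, x.max().item())

def normalize(x, lo=data_min, hi=data_max):
    # same statistics for training and inference
    return (x - lo) / (hi - lo)

for x, y in train_loader:
    x = normalize(x)
    # ... forward/backward pass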
I see, thanks for the reply.
How could your snippet then be used to normalize the samples in a DataLoader?
Say I have the following, is this the correct approach?:
The posted approach would also normalize the entire batch. As I previously mentioned, it could work, but note that this normalization now depends on the batch size and could thus change the behavior of the model if the batch size is changed, e.g. during inference. The change might be minimal and the model might still perform well, so you would need to check it in your use case.
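To make the difference concrete, here is a sketch (the originally posted snippet isn't reproduced here; loader and shapes are placeholders) contrasting batch-wide and per-sample statistics inside the DataLoader loop:

for x, y in loader:                         # x: [B, C, H, W]
    # batch-wide statistics: depend on which samples end up in the batch
    x_batch = (x - x.min()) / (x.max() - x.min())

    # per-sample statistics: independent of the batch size
    flat = x.view(x.size(0), -1)
    lo = flat.min(1, keepdim=True)[0].view(-1, 1, 1, 1)
    hi = flat.max(1, keepdim=True)[0].view(-1, 1, 1, 1)
    x_sample = (x - lo) / (hi - lo)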