Concerning unknown errors with torch.abs() and F.elu()

tsuijenk · January 10, 2022, 9:52am

Hello everyone,

I’ve been debugging this for many hours, but I still haven’t been able to figure out the cause.

torch.abs() raises an unknown error when I run it from the terminal with PyTorch Lightning and Wandb (if that matters). Details about the input tensor:
- no NaN values
- values in the range [0, 1]
- size (20, 1600, 16)
- sum of 12,700, so it is probably sparse (mostly zeros)
  * This “sparse tensor” issue is the only possible cause that came to mind, since I found several related posts online.
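A quick way to double-check the properties listed above, and to tell the two meanings of "sparse" apart, is something like the sketch below. It assumes the tensor is an ordinary dense tensor; describe_tensor is a hypothetical helper, and the random stand-in only mimics the shape and value range from the post. A tensor that is mostly zeros is still a dense (strided) tensor as far as PyTorch is concerned, and torch.abs handles it like any other, so a high fraction of zeros alone would not be expected to cause an error.

```python
import torch


def describe_tensor(t: torch.Tensor) -> dict:
    """Summarize NaNs, value range, sum, shape, and sparsity of a tensor."""
    is_sparse_layout = t.layout != torch.strided  # storage format, e.g. torch.sparse_coo
    dense = t.to_dense() if is_sparse_layout else t
    return {
        "shape": tuple(dense.shape),
        "has_nan": bool(torch.isnan(dense).any()),
        "min": float(dense.min()),
        "max": float(dense.max()),
        "sum": float(dense.sum()),
        "is_sparse_layout": is_sparse_layout,
        # "Mostly zeros" is a property of the values, independent of the layout.
        "fraction_nonzero": int(torch.count_nonzero(dense)) / dense.numel(),
        "device": str(t.device),
        "dtype": str(t.dtype),
    }


# Hypothetical stand-in: a dense (20, 1600, 16) tensor in [0, 1] that is mostly zeros.
torch.manual_seed(0)
x = torch.rand(20, 1600, 16)
x[x < 0.95] = 0.0  # keeps roughly 5% of the entries non-zero

dense_info = describe_tensor(x)
sparse_info = describe_tensor(x.to_sparse())  # same values, COO sparse layout
print(dense_info["is_sparse_layout"], dense_info["fraction_nonzero"])
print(sparse_info["is_sparse_layout"], sparse_info["fraction_nonzero"])

# Many zeros do not make a tensor "sparse" to torch.abs: it runs as usual.
y = torch.abs(x)
```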
F.elu(x) also produces an unknown error:
- x is also in the range [0, 1], with no NaN values.
- the error occurs in the first up-sampling stage, after the two down-sampling stages.
- within that stage, it happens after two convolutional layers, two F.elu(x) calls, and a third convolutional layer.
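On the GPU, CUDA kernels run asynchronously, so an opaque "unknown error" is often reported at a later call than the one that actually failed; F.elu may just be where it surfaced. Two common ways to localize it are setting the environment variable CUDA_LAUNCH_BLOCKING=1 (in the shell, or in the script before CUDA is initialized) and re-running the same stage on CPU, where a failure usually becomes a descriptive Python exception. The sketch below assumes a Conv1d-based stage; UpStage and forward_on_cpu are hypothetical names, and the channel counts and kernel size are made up for illustration.

```python
import copy
import os

# Must be set before CUDA is initialized: kernel launches then run synchronously,
# so the traceback points at the op that actually failed. Has no effect on CPU.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch
import torch.nn as nn
import torch.nn.functional as F


class UpStage(nn.Module):
    """Hypothetical first up-sampling stage: conv -> elu -> conv -> elu -> conv -> elu.

    Conv1d, the kernel size, and the channel count are illustrative assumptions,
    not the architecture from the post.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv3 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.elu(self.conv1(x))
        x = F.elu(self.conv2(x))
        x = self.conv3(x)
        return F.elu(x)  # roughly where the failing F.elu call sits in the post


def forward_on_cpu(module: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Re-run a forward pass on CPU, where failures raise readable Python exceptions."""
    cpu_module = copy.deepcopy(module).cpu()
    with torch.no_grad():
        return cpu_module(x.detach().cpu())


stage = UpStage(channels=8)
out = forward_on_cpu(stage, torch.rand(4, 8, 100))
print(out.shape)

# A deliberate channel mismatch: on CPU this raises a RuntimeError naming the shapes.
error_message = None
try:
    forward_on_cpu(stage, torch.rand(4, 7, 100))
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)
```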

I am very puzzled. Could both of these be caused by a memory error?
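To judge whether memory is a plausible cause, it can help to compute how large the tensors are and to check whether the error really is an out-of-memory one. Recent PyTorch versions raise torch.cuda.OutOfMemoryError (a subclass of RuntimeError) and older ones raise a plain RuntimeError; in both cases the message contains "out of memory", so a message that only says "unknown error" may point elsewhere. The helper names below (tensor_nbytes, is_oom_error, peak_cuda_memory_mib) are hypothetical; torch.cuda.max_memory_allocated is a real PyTorch call.

```python
import torch


def tensor_nbytes(t: torch.Tensor) -> int:
    """Bytes used by a dense tensor's elements (excludes allocator overhead)."""
    return t.numel() * t.element_size()


def is_oom_error(exc: BaseException) -> bool:
    """True if exc looks like a PyTorch CUDA out-of-memory error."""
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc).lower()


def peak_cuda_memory_mib() -> float:
    """Peak memory allocated by tensors on the current CUDA device, or 0.0 without CUDA."""
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.max_memory_allocated() / 2**20


x = torch.zeros(20, 1600, 16)  # float32 by default
print(tensor_nbytes(x))  # 20 * 1600 * 16 * 4 bytes = 2048000
```

Assuming float32, the (20, 1600, 16) tensor from the post takes about 2 MiB on its own, so any memory pressure would more likely come from the activations kept for the backward pass across the whole model and batch.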

tsuijenk · January 10, 2022, 10:25am

Update: Sadly, I am pretty sure an out-of-memory error is involved in this.

I have been using a batch size of 64, but I guess I might have to drop to 32.
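If halving the batch size helps but the effective batch size of 64 should stay the same, gradient accumulation is a common alternative: run two micro-batches of 32 and let their gradients add up before the optimizer step. In PyTorch Lightning this is the Trainer argument accumulate_grad_batches. The plain-PyTorch sketch below (accumulated_grads is a hypothetical helper, and a toy nn.Linear model stands in for the real network) checks that the accumulated gradients match a single full-batch pass.

```python
import torch
import torch.nn as nn


def accumulated_grads(model, loss_fn, inputs, targets, micro_batch_size):
    """Compute full-batch gradients by accumulating micro-batch gradients.

    Each micro-batch loss is scaled by its share of the batch so that, for a
    mean-reduced loss, the summed gradients match one full-batch pass.
    """
    model.zero_grad()
    n = inputs.shape[0]
    for xb, yb in zip(inputs.split(micro_batch_size), targets.split(micro_batch_size)):
        loss = loss_fn(model(xb), yb) * (xb.shape[0] / n)
        loss.backward()  # gradients add up in .grad across iterations
    return [p.grad.clone() for p in model.parameters()]


torch.manual_seed(0)
model = nn.Linear(16, 1)
loss_fn = nn.MSELoss()  # mean reduction
inputs, targets = torch.randn(64, 16), torch.randn(64, 1)

# Reference: one pass over the full batch of 64.
model.zero_grad()
loss_fn(model(inputs), targets).backward()
full = [p.grad.clone() for p in model.parameters()]

# Two micro-batches of 32 give the same gradients with lower peak activation memory.
accum = accumulated_grads(model, loss_fn, inputs, targets, micro_batch_size=32)
print(all(torch.allclose(a, b, atol=1e-6) for a, b in zip(full, accum)))
```

One caveat: layers that use batch statistics, such as BatchNorm, only see 32 samples per forward pass, so accumulation is not exactly equivalent for models that contain them.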

Still open to more solutions and tips.

P.S. My code is fully vectorized.