I am a university student studying speech recognition in Korea.

My team and I are working on a project right now, but I have run into a problem.
When my model starts training,
an error comes up at a random batch and kills the training sequence.

It doesn't die at exactly the same batch every time, so I don't know what causes this error.
I asked for advice on the Korean PyTorch forum, and many people suggested upgrading my CUDA version. Is that the only way to fix this situation?

Here are the settings and tools I used:
device : GTX 1080 Ti
CUDA version : 9.2
cuDNN version : 7.6.4
torch version : 1.3.1

Command used to install PyTorch:
conda install pytorch torchvision cudatoolkit=9.2 -c pytorch -c defaults -c numba/label/dev
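To see which CUDA runtime your PyTorch binary actually ships with (which can differ from the system toolkit), a quick check like this might help; the exact values printed will depend on your install:

```python
import torch

# The CUDA and cuDNN versions PyTorch was *built* with -- these are what
# matter for the binary, not the system-wide toolkit version.
print(torch.__version__)                 # e.g. 1.3.1
print(torch.version.cuda)                # e.g. '9.2' or '10.1'
print(torch.backends.cudnn.version())    # e.g. 7604
print(torch.cuda.is_available())         # True if the GPU is usable
```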

It would be lovely if someone could help me.
Thanks! :smile:

That command should install PyTorch with CUDA 10.1, but apparently you are using 9.2?

Could you explain your use case a bit?
E.g., what kind of data are you using (what shape)? Does it have variable shapes?
Do you see an increased memory usage on your GPU using nvidia-smi or does it stay constant during training? What kind of model are you using?
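Besides watching nvidia-smi, you could log the allocator's view from inside the training loop. A minimal sketch (the helper name and logging interval are made up for illustration):

```python
import torch

def log_gpu_memory(batch_idx, every=100):
    # Hypothetical helper: call once per batch inside the training loop.
    # Steadily growing numbers across batches would point to a memory leak.
    if torch.cuda.is_available() and batch_idx % every == 0:
        alloc = torch.cuda.memory_allocated() / 1024**2
        peak = torch.cuda.max_memory_allocated() / 1024**2
        print(f"batch {batch_idx}: allocated {alloc:.1f} MiB, peak {peak:.1f} MiB")
```

On a machine without a GPU the helper simply does nothing, so it is safe to leave in place.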

If possible, could you try to run your code with random inputs and try to reproduce this error with them? If you see the same error, could you post a small code snippet to reproduce this issue?
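For the random-input reproduction, a loop like this could stand in for the real pipeline; the LSTM sizes and tensor shapes here are placeholders, not the poster's actual configuration, so swap in your own model and the shapes your data loader produces:

```python
import torch
import torch.nn as nn

# Placeholder model -- replace with the real RNN from the project.
model = nn.LSTM(input_size=40, hidden_size=64, num_layers=2, batch_first=True)
if torch.cuda.is_available():
    model = model.cuda()

# Raise the step count into the thousands to try to trigger the crash.
for step in range(10):
    x = torch.randn(8, 50, 40)  # (batch, time, features), fixed shape
    if torch.cuda.is_available():
        x = x.cuda()
    out, _ = model(x)           # out: (batch, time, hidden)
    loss = out.mean()           # dummy loss, just to drive backward()
    loss.backward()
    model.zero_grad()
```

If the same error shows up with random data, the model or the setup is at fault rather than the dataset, which narrows things down considerably.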

Sorry for the confusion! :disappointed_relieved:
It was like this:
conda install pytorch torchvision cudatoolkit=9.2 -c pytorch -c defaults -c numba/label/dev

The data has a fixed shape, and I used an RNN model for training.
I haven't checked the GPU memory usage or tried running the code with random inputs yet, so let me check and come back!

Thank you very, very much!