How to train LSTM with GPU

Actually I’m not asking anything.
By “that makes no sense to me”, I meant that “using a profiler to determine whether the model is running on the GPU or not” is a mistake. That is the real bug, the root of this problem (the other thing is simply a “symptom”). When practicing software engineering, it’s important to find the root of the problem and fix it, so that you don’t have any more “symptoms” in the future.

Hi @lugiavn,

I am sorry, I was using the wrong terms previously: by “performance profiler” I really meant that I looked at a simple system management interface (nvidia-smi) to see whether the GPU was being used or not.

To recap: since I didn’t receive any error, I used a debugger to check that both the model and all the necessary data had been moved to the GPU and were running there.
Since they were, I then looked at the system management interface, noticed the “symptom” (GPU usage intermittently between 0 and 1%), and from there traced the “root” of my problem, which was the bottleneck.
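For completeness, a quicker check than attaching a debugger is to simply print the device of the model’s parameters and of a batch. Here is a minimal sketch (the model and tensor below are dummy placeholders, not my actual code):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Dummy model and batch, just to illustrate the placement check
model = nn.LSTM(input_size=128, hidden_size=64, batch_first=True).to(device)
batch = torch.randn(32, 10, 128).to(device)  # (batch, seq_len, features)

# Every parameter and every input tensor should report the same CUDA device
print(next(model.parameters()).device)  # e.g. cuda:0
print(batch.device)                     # e.g. cuda:0

# nvidia-smi then only tells you how busy the GPU is, not where the tensors
# live; near-0% utilization usually points to a data-loading bottleneck
# rather than a placement problem.
```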
As you may have understood, I am a beginner with PyTorch and not a really skilled developer, so this was probably not the cleverest way of finding my error: if you have any good software engineering practices to share, I would love to hear them! But I think that is off topic here.

Cheers :slight_smile:

Hi, @Crohn_eng
I am currently working on a project that uses an LSTM to predict video frames. I found your code on the LSTM autoencoder very enlightening. However, I am getting a bit confused defining my own train function. Would you mind sharing your GitHub link for this project (or the complete repo) so I can see how you define the train function in the LSTM autoencoder? You may also send it to my email (yinzy5@gmail.com) if that’s more convenient for you. I would really appreciate your help.

Hey @kanikel,

sorry for the delay in answering you.
I don’t have a GitHub repository for my code, since everything is hosted on a private server at my university (this is my thesis project).
To be honest, I haven’t worked directly with video frames in PyTorch. As I wrote in my previous comments, my dataset consisted of feature vectors extracted from the video frames in an offline stage.
However, in a different part of my thesis I did some work similar to video frame prediction (though I moved to Keras in the meantime :no_mouth:), so maybe I can help you.
I suggest you open a new topic explaining the problem you encountered defining your train function, and then we can try to find a solution together with the other members of the community.
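In the meantime, here is a minimal, generic sketch of what a PyTorch training loop for an LSTM autoencoder could look like; the model, dimensions, and data below are just placeholders, not the code from my thesis:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class LSTMAutoencoder(nn.Module):
    """Toy sequence autoencoder: encode a sequence, then reconstruct it."""
    def __init__(self, n_features, hidden_size):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(hidden_size, n_features, batch_first=True)

    def forward(self, x):
        # x: (batch, seq_len, n_features)
        encoded, _ = self.encoder(x)
        reconstructed, _ = self.decoder(encoded)
        return reconstructed

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LSTMAutoencoder(n_features=128, hidden_size=64).to(device)
criterion = nn.MSELoss()  # reconstruction loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy dataset of random feature sequences, just to make the sketch runnable
data = torch.randn(256, 10, 128)
loader = DataLoader(TensorDataset(data), batch_size=32, shuffle=True)

def train(model, loader, epochs=5):
    model.train()
    for epoch in range(epochs):
        total = 0.0
        for (batch,) in loader:
            batch = batch.to(device)         # move each batch to the GPU
            optimizer.zero_grad()
            output = model(batch)
            loss = criterion(output, batch)  # target is the input itself
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch}: loss {total / len(loader):.4f}")

train(model, loader)
```

It only shows the general structure (move the model and each batch to the device, compute the reconstruction loss, backpropagate), so you will need to adapt the model and data handling to your own setup.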

Cheers :slight_smile: