Running Error Message

Hi,

I’m working on a multi-class classification problem. Specifically, I’m trying to determine whether a speech recording contains some specific noise or no noise.

After training my model for 100 epochs, the CUDA utilization suddenly drops to 0-2%. In the previous epochs, it was about 60-70%.

Even though the error message reports a format issue, I don’t think the file format is the real problem, because in the previous epochs I had already loaded the same file successfully with the same code.

The issue seems to pop up out of nowhere.

Does anyone have some thoughts on that?

I’ll really appreciate it!

One thing you could check is whether it is a memory issue. My thinking is that “unknown format” might really mean something like “something went wrong while decoding”, and if your training leaks memory, it might end up unable to load whatever it needs for decoding.

It’s just a guess, though.

Best regards

Thomas

Thanks for your idea!

I’m not familiar with “training leaks memory”, but it does make sense to me.

I will look into that further. Thanks for your help!

There are certainly better ways to monitor this, but here is my very simple recipe for getting a first impression:

  • Start the training,
  • Every now and then I run the program “top” in a console window; hitting “M” (Shift+m) sorts the processes by memory use. The column RES (resident memory, which “ps” calls RSS) is the one I look at.
  • If it keeps increasing between the 5th and 10th epoch (or so) and continues to increase between 10th and 20th, you might have something that keeps using memory and never returns it (i.e. a memory “leak”).
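If you would rather log this from inside the training script than watch “top”, a minimal sketch could look like the following. It assumes Linux (it reads /proc/self/status), and the names current_rss_mib, keeps_growing and the 50 MiB threshold are made up for illustration; the training step itself is only a placeholder comment.

```python
def current_rss_mib():
    """Resident memory of this process in MiB (Linux only, read from /proc)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like "VmRSS:   123456 kB"
                return int(line.split()[1]) / 1024
    raise RuntimeError("VmRSS not found in /proc/self/status")


def keeps_growing(samples, min_growth_mib=50.0):
    """True if each checkpoint is at least min_growth_mib above the previous one.

    The threshold is an arbitrary assumption; pick one that fits your model.
    """
    if len(samples) < 2:
        return False
    return all(b - a >= min_growth_mib for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    checkpoints = (5, 10, 20)  # epochs at which to record memory use
    rss_log = []
    for epoch in range(1, 21):
        # Placeholder: run one epoch of your own training loop here.
        if epoch in checkpoints:
            rss_log.append(current_rss_mib())
            print(f"epoch {epoch}: RSS = {rss_log[-1]:.1f} MiB")
    if keeps_growing(rss_log):
        print("Memory keeps increasing between checkpoints: possible leak.")
```

This only tracks host (CPU) memory of the training process, which is what the “top” recipe looks at; GPU memory would need a separate check.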

Best regards

Thomas