Running inference of a model as part of loading training data for another model

jhnthndr · January 7, 2025, 7:04am

Hi,

I have been training a primary model in PyTorch Lightning. Now I would like to modify my training data by running inference with a second, already trained model on it and using that model's output for the modifications. I have tried calling the secondary model both in the collate function and in the Dataset class, but in both cases I get an error:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
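For context, here is a stripped-down sketch of the Dataset variant. All names are placeholders (`AugmentedDataset`, the `nn.Linear` stand-in for the secondary model, the random samples), not my real code. It runs as written on the CPU with `num_workers=0`; the error above appears in my setup when the secondary model lives on the GPU and the DataLoader uses worker processes.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class AugmentedDataset(Dataset):
    """Hypothetical dataset that extends each sample with a secondary model's output."""

    def __init__(self, samples, secondary_model, device="cpu"):
        self.samples = samples
        self.device = torch.device(device)
        # The secondary model is only used for inference, so put it in eval mode.
        self.secondary_model = secondary_model.to(self.device).eval()

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        x = self.samples[idx]
        with torch.no_grad():
            # Run the secondary model on a batch of one and drop the batch axis again.
            out = self.secondary_model(x.unsqueeze(0).to(self.device)).squeeze(0).cpu()
        # Placeholder "modification": append the secondary output to the sample.
        return torch.cat([x, out])


samples = torch.randn(8, 4)      # placeholder training data
secondary = nn.Linear(4, 2)      # stand-in for the already trained secondary model
dataset = AugmentedDataset(samples, secondary)

# num_workers=0 keeps everything in the main process, so no fork is involved.
loader = DataLoader(dataset, batch_size=4, num_workers=0)
batch = next(iter(loader))
print(batch.shape)
```

Switching this sketch to `device="cuda"` together with `num_workers > 0` is the combination that corresponds to what I am trying to do.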

This led me to try adding mp.set_start_method('spawn', force=True) at different places in my start script, but this causes another problem, related to multiprocessing and pickling:

AttributeError: Can't pickle local object 'convert_frame.<locals>._convert_frame'
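For reference, this is roughly where the call sits in my start script. It is a simplified sketch: `configure_start_method` and `main` are placeholder names, and I use the standard-library `multiprocessing` module here instead of my real training code. As I understand it, the 'spawn' method re-imports the main module in every child process, so the call belongs under the `if __name__ == "__main__":` guard.

```python
import multiprocessing as mp


def configure_start_method(method="spawn"):
    """Set the global start method and return the one that is now active."""
    # force=True overrides a start method that was already set earlier.
    mp.set_start_method(method, force=True)
    return mp.get_start_method()


def main():
    active = configure_start_method("spawn")
    print(f"multiprocessing start method: {active}")
    # ... build the datasets, DataLoaders, and trainer here (omitted) ...


if __name__ == "__main__":
    # With 'spawn', child processes import this file again, so everything that
    # starts work must stay behind this guard.
    main()
```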

From searching, I have seen that lambda functions can cause this, but I have not been able to identify one as the source of the problem. The stack trace also comes from deep within PyTorch Lightning and multiprocessing package code.
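As far as I understand the message, it means that a function defined inside another function (here `_convert_frame` inside `convert_frame`) ended up in something that had to be pickled for the spawned process. The sketch below reproduces that class of failure with the standard library only. The names `make_converter`, `_convert`, and `can_pickle` are made up for illustration and have nothing to do with the real `convert_frame`.

```python
import functools
import operator
import pickle


def make_converter():
    # A function defined inside another function is a "local object":
    # pickle cannot look it up by name, so it refuses to serialize it.
    def _convert(x):
        return x * 2

    return _convert


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # The exception type for local objects differs between Python versions.
        return False


local_function = make_converter()
lambda_function = lambda x: x * 2
# A partial over an importable function carries the same behaviour but can be pickled.
picklable_function = functools.partial(operator.mul, 2)

print("local function picklable:", can_pickle(local_function))
print("lambda picklable:", can_pickle(lambda_function))
print("partial picklable:", can_pickle(picklable_function))
```

So lambdas are only one way to trigger this; any nested function that gets sent to a spawned process would fail in the same way.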

I would think that wanting to run inference with one model during data loading for training another model would not be that uncommon, but I have found close to nothing on the subject. Am I on an impossible mission, or does anyone here have suggestions for how I could get this to work?

Alternatively, I could preprocess my dataset with the output of the secondary model, but I find that inflexible and inelegant, so I would prefer my initial approach.
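For completeness, the preprocessing alternative I have in mind would look roughly like the sketch below. Again, the names (`precompute_outputs`, the `nn.Linear` stand-in, the random samples) are placeholders, and the example runs on the CPU. The secondary model is run once over all samples, and the training DataLoader then only handles plain CPU tensors.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def precompute_outputs(model, samples, batch_size=4, device="cpu"):
    """Run the secondary model once over all samples and return its outputs on the CPU."""
    model = model.to(device).eval()
    outputs = []
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size].to(device)
        outputs.append(model(batch).cpu())
    return torch.cat(outputs)


samples = torch.randn(10, 4)     # placeholder training data
secondary = nn.Linear(4, 2)      # stand-in for the already trained secondary model

secondary_out = precompute_outputs(secondary, samples)

# The training set pairs each sample with its precomputed secondary output.
train_set = TensorDataset(samples, secondary_out)
loader = DataLoader(train_set, batch_size=5, num_workers=0)
x, extra = next(iter(loader))
print(x.shape, extra.shape)
```

The drawback for me is that the outputs are fixed once computed, so any change to the secondary model or to the modification logic means recomputing them.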