I cannot thank you enough for this post! For the last few weeks I have been losing my mind: I work on Graph Neural Networks and was pre-processing `torch_geometric.data.Data` objects using multiprocessing. After a few minutes I got the same error as you, although my machine still had over 200GB of RAM free. Adding a simple `.clone()` to my data objects solved the problem, and now everything works as expected.
Cannot love you more!!! You're a superhero!!!
Cannot thank you more!!! After such a long time, it's finally solved!!!
By the way, I guess this may be related to `vm.max_map_count` in Linux. For me, with `vm.max_map_count=65530`, the code hits this error after about 64900 batches. So I think that without `.copy()` the map count may keep increasing.
But I cannot explain why this error only occurs when `num_workers > 0`.
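
If anyone wants to check this theory on their own machine, here is a rough sketch (Linux only) that compares the number of memory mappings a process holds against the kernel limit. The helper names `read_max_map_count` and `count_mappings` are just ones I made up for illustration, and counting lines in `/proc/<pid>/maps` is only an approximation of what the kernel counts against `vm.max_map_count`.

```python
def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Return the kernel's per-process mapping limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Not on Linux, or the file is missing or unreadable.
        return None


def count_mappings(pid="self"):
    """Return the number of lines in /proc/<pid>/maps, or None if it cannot be read.

    Each line describes one mapped region, so this roughly tracks
    how close the process is to the vm.max_map_count limit.
    """
    try:
        with open(f"/proc/{pid}/maps") as f:
            return sum(1 for _ in f)
    except OSError:
        return None
```

Calling `count_mappings()` every few hundred batches inside the training loop (or passing the PID of a worker process) and printing it next to `read_max_map_count()` should show whether the count keeps climbing toward the limit.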