CUDA out of memory with multiprocessing

Hi all,

I’m using multiprocessing to run inference, and I keep getting a CUDA out of memory error. The code looks like the following:

from torch.multiprocessing import Pool, set_start_method

if __name__ == '__main__':

    try:
        set_start_method('spawn')
    except RuntimeError:
        pass

    with Pool(3) as pool:
        pool.map(predict_lat, range(839))

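One variation I’ve been wondering about (but haven’t verified) is recycling the worker processes between tasks, so that whatever a task allocated dies with its process:

from torch.multiprocessing import Pool, set_start_method

if __name__ == '__main__':

    try:
        set_start_method('spawn')
    except RuntimeError:
        pass

    # maxtasksperchild=1 replaces each worker after it finishes a single task,
    # so any CUDA memory the task held should go away with the process
    with Pool(3, maxtasksperchild=1) as pool:
        pool.map(predict_lat, range(839))

I’m not sure whether that would just paper over the real problem, though.
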
And the predict_lat function is the following:

import numpy as np
from torch.utils.data import DataLoader

def predict_lat(lat):
    ds = COWholeDataset(ppt=ppt, tmax=tmax, tmin=tmin, tmean=tmean, attributions=attributions, topo_file=topo,
                        lat_id=lat, window_size=180, snow_cover=None)
    dataloader = DataLoader(ds, batch_size=512, shuffle=False)
    pred = []
    print('Sub begin: ', lat)
    for data in dataloader:
        x_d_new, x_attr = data
        x_d_new, x_attr = x_d_new.to(device), x_attr.to(device)
        y_hat = model(x_d_new, x_attr)[0]
        y_hat_sub = y_hat[:, -1:, :]
        pred.append(y_hat_sub.cpu().data.numpy())
        del x_d_new, x_attr, y_hat, y_hat_sub
    pred = np.concatenate(pred).flatten()
    np.save(path + str(lat) + '_' + str(e), pred)

My thinking is that the GPU memory should be released once a subprocess finishes its task, so if my GPU can hold three models at once, the pool should be fine.
But every time the first task finishes and the fourth one starts, I get the CUDA out of memory error. Does that mean the GPU memory occupied by the first task is not being released properly?
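
For reference, this is the kind of explicit cleanup I was assuming each worker might need if the caching allocator is holding on to the blocks. It’s just a sketch using the same globals as above (model, device, path, e, the forcing arrays), and I haven’t confirmed it actually helps:

import numpy as np
import torch
from torch.utils.data import DataLoader

def predict_lat(lat):
    ds = COWholeDataset(ppt=ppt, tmax=tmax, tmin=tmin, tmean=tmean, attributions=attributions,
                        topo_file=topo, lat_id=lat, window_size=180, snow_cover=None)
    dataloader = DataLoader(ds, batch_size=512, shuffle=False)
    pred = []
    with torch.no_grad():  # don't build autograd graphs, which would keep activations on the GPU
        for x_d_new, x_attr in dataloader:
            x_d_new, x_attr = x_d_new.to(device), x_attr.to(device)
            y_hat = model(x_d_new, x_attr)[0]
            pred.append(y_hat[:, -1:, :].cpu().numpy())
    torch.cuda.empty_cache()  # hand unused cached blocks back before the next task starts
    pred = np.concatenate(pred).flatten()
    np.save(path + str(lat) + '_' + str(e), pred)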

Thanks for any ideas and help!