I have a single GPU (an A100 with 80 GB of memory).
For an NLP task I tried to parallelize model training, but it does not work because my data has an unusual structure. For example:
sample 1: 5 batches, each batch of size 500
sample 2: 1 batch, each batch of size 500
sample 3: 1574 batches, each batch of size 500
…
…
sample 1000000: ? batches, each batch of size 500
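To make the layout concrete, here is a minimal sketch of how I would describe the data in code (the feature dimension and the helper are my own assumptions, just for illustration):

```python
import torch

BATCH_SIZE = 500
FEATURE_DIM = 768  # assumed; the real feature size depends on the model

def make_sample(num_batches):
    # one sample = a variable-length list of fixed-size batches
    return [torch.randn(BATCH_SIZE, FEATURE_DIM) for _ in range(num_batches)]

samples = {
    "sample 1": make_sample(5),     # 5 batches
    "sample 2": make_sample(1),     # 1 batch
    "sample 3": make_sample(1574),  # 1574 batches (~2.4 GB at float32)
    # ... up to "sample 1000000", each with its own batch count
}
```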
As you can see, each sample has a different number of batches.
The simplest workaround I can think of is to flatten everything so that every sample contains exactly one batch,
for example:
sample 3-1: 1 batch, each batch of size 500
sample 3-2: 1 batch, each batch of size 500
…
…
sample 3-1574: 1 batch, each batch of size 500
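A rough sketch of what that flattening would look like (the helper name and dict layout are mine, matching the sketch above):

```python
def flatten_samples(samples):
    # Split every multi-batch sample into one-batch sub-samples,
    # e.g. "sample 3" becomes "sample 3-1" ... "sample 3-1574".
    flat = {}
    for sample_id, batches in samples.items():
        for i, batch in enumerate(batches, start=1):
            flat[f"{sample_id}-{i}"] = [batch]  # exactly one batch each
    return flat

flat = flatten_samples(samples)  # keys like "sample 3-1", "sample 3-2", ...
```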
But I don't want to do that: the batches within a sample may be related to one another, and if I split them apart, model performance may suffer.
So my question is: is there a way to use a single GPU with multiple processes?
Here is the closest thing I have found:
Multiprocessing best practices — PyTorch master documentation
But it does not work for me: whenever I run it in a Jupyter notebook, I always get a CUDA initialization error.
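For reference, here is a stripped-down sketch of the kind of thing I tried based on that page (the toy model and worker are placeholders, not my real training code):

```python
import torch
import torch.multiprocessing as mp

def worker(rank, model):
    # each process runs its own forward pass on the same physical GPU
    device = torch.device("cuda:0")
    x = torch.randn(500, 10, device=device)
    out = model(x)
    print(f"worker {rank}: output sum = {out.sum().item():.4f}")

if __name__ == "__main__":
    # CUDA requires the 'spawn' start method; with 'fork' (the Linux
    # default) PyTorch raises "Cannot re-initialize CUDA in forked
    # subprocess", which looks like the error I am seeing.
    mp.set_start_method("spawn", force=True)
    model = torch.nn.Linear(10, 10).cuda()
    model.share_memory()  # per the best-practices page
    mp.spawn(worker, args=(model,), nprocs=2)
```

I suspect the notebook itself is part of the problem: 'spawn' has to re-import the worker function in each child process, and functions defined inside a Jupyter cell cannot be imported that way, so this may only run correctly as a standalone .py script.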