I’m facing a tedious problem when using PyTorch’s tensor op APIs. I want to use my GPU’s compute power to accelerate my data processing, but my GPU’s memory is too small, so I have to cut my operand tensors into smaller pieces, run the op on each piece, move each partial result back to CPU main memory, and once all the pieces are done, combine them into one final result there. That means I have to do the cutting, the device-to-device moving, and the result combining by hand, and I have to repeat this boilerplate for every different operator in PyTorch.
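For context, the manual pattern I keep rewriting looks roughly like this (`chunked_op` is just a helper name I made up; it assumes the op can be applied independently to each chunk along dim 0):

```python
import torch

def chunked_op(x, op, chunk_size, device="cuda"):
    """Apply `op` to `x` in chunks on `device`, accumulating results on CPU."""
    # Fall back to CPU if no GPU is present, so the sketch runs anywhere.
    if device == "cuda" and not torch.cuda.is_available():
        device = "cpu"
    parts = []
    for chunk in torch.split(x, chunk_size, dim=0):
        # Move a small slice to the GPU, compute, move the result back.
        parts.append(op(chunk.to(device)).cpu())
    # Combine all partial results in CPU main memory.
    return torch.cat(parts, dim=0)

x = torch.randn(10_000, 256)          # too big to process on GPU at once
y = chunked_op(x, torch.sigmoid, chunk_size=1024)
```

This works, but I have to adapt it (chunk dimension, combining logic, multiple operands) for each operator, which is exactly the tedium I’d like a tool to handle.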
My question is: is there currently any tool that does this tensor cutting and result combining automatically? Then I could fully utilize my GPU’s compute performance together with my CPU memory’s larger size.
Thanks in advance!