Using views on GPU to reduce memory transfers

Hello, I am training a model that uses a lot of overlapping data, and I would like to reduce the memory transfers caused by reloading the same data for each case.

More precisely:
I have a 3D array, let's say 10000×10000×10000.
The model accepts 32×32×32 arrays and intentionally learns only local features.

Each training example I feed to the model is the 32×32×32 neighbourhood of a voxel of the main array, so two neighbouring voxels share most of their data.
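To illustrate the overlap: the neighbourhoods of two adjacent voxels can be taken as plain slices, and those slices are views over the same buffer. This is a minimal sketch with NumPy on a small toy volume (the real 10000³ array would be ~4 TB in float32); basic slicing of `torch.Tensor` behaves the same way.

```python
import numpy as np

K = 32  # block size

# Toy stand-in for the full 10000^3 volume.
volume = np.random.rand(64, 64, 64).astype(np.float32)

# 32^3 neighbourhoods of two adjacent voxels, taken as views (no copy).
a = volume[0:K, 0:K, 0:K]
b = volume[1:K + 1, 0:K, 0:K]

# The two views overlap in a 31x32x32 slab of the underlying data.
assert np.shares_memory(a, b)
```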

Additionally, I need to feed the same 32×32×32 block to the model 8 times for each voxel, modifying just the single central voxel each time.

Clearly, I could copy each 32×32×32 neighbourhood in a loop and then, in an inner loop, stack 8 modifications of the central voxel. But by a very rough approximation, each voxel's data would then be copied on the order of 32³ ≈ 33,000 times per epoch (and 8× that counting the stacked modifications), which is extremely wasteful, particularly on the GPU.
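For reference, this is roughly what the naive approach I describe looks like, sketched with NumPy on a toy volume; the per-voxel `.copy()` and the 8-way `repeat` are exactly the redundant allocations I want to avoid (the modification values written to the central voxel are hypothetical placeholders):

```python
import numpy as np

K = 32       # block size
C = K // 2   # index of the central voxel within a block

# Toy stand-in for the real 10000^3 volume.
volume = np.random.rand(64, 64, 64).astype(np.float32)

def naive_batches(vol):
    """One full 32^3 copy per voxel, then 8 near-identical stacked variants."""
    D = vol.shape[0] - K + 1
    for i in range(D):
        for j in range(D):
            for k in range(D):
                block = vol[i:i+K, j:j+K, k:k+K].copy()        # full copy per voxel
                batch = np.repeat(block[None], 8, axis=0)      # 8 mostly-duplicated copies
                batch[:, C, C, C] = np.arange(8, dtype=vol.dtype)  # hypothetical edits
                yield batch  # shape (8, 32, 32, 32)
```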

Is there a workaround for my case?