My training loop looks like the following:
for X, mask in dataloader(...):
X = X[mask] # only a small subset (of rows) of X is good for training, very slow
X_cuda = X.cuda(non_blocking=True)
prediction = model(X_cuda)
...
My X is very large, and the masking/row-selection takes a lot of time in each iteration (because it copies the data instead of sharing storage), so I cannot keep my GPU sufficiently utilized.
Is there a way to do X[mask] that avoids data copying?
Thanks
I don’t think that’s possible, as the number of output elements depends on the mask and is thus unknown before the actual values are available. I.e. you won’t be able to preallocate a tensor of a specific shape unless you waste memory by using the max. number of elements (X.nelement()).
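To illustrate why the copy is hard to avoid, here is a minimal sketch (illustrative tensor shapes, not from the original post): the number of selected rows is data-dependent, and the only way to "preallocate" is a worst-case-sized buffer.

```python
import torch

X = torch.randn(1000, 64)

# Deterministic mask for illustration: every 10th row is selected
mask = torch.zeros(1000, dtype=torch.bool)
mask[::10] = True

selected = X[mask]          # allocates a new tensor and copies the rows
n = int(mask.sum())         # output row count is only known from the mask values

# Worst-case preallocation: a buffer sized as if all rows were selected.
# This reuses storage across iterations but still performs the copy.
buf = torch.empty_like(X)
buf[:n] = X[mask]

# The masked result does not share storage with X
print(selected.data_ptr() == X.data_ptr())  # False
```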
Can X[mask] return a view or something?
The following code returns a view instead of a copy, right?
a, b = get_range(...)
view = X[a: b]
No, I don’t think masking can create a view, since the masked indices can be random.
Yes, slicing the tensor will create a view, so if your mask follows a regular stride/slicing pattern you might want to convert it to a slicing operation.
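A quick check of the difference (illustrative shapes and indices only, not from the original post): a slice shares storage with the source tensor, while boolean masking copies even when the selected rows happen to be contiguous.

```python
import torch

X = torch.randn(100, 8)

# Slicing returns a view: no copy, same underlying storage
view = X[10:20]
# Writing through the view is visible in X
view[0, 0] = 42.0
print(X[10, 0].item())  # 42.0

# Boolean masking copies, even for a contiguous run of True rows
mask = torch.zeros(100, dtype=torch.bool)
mask[10:20] = True
copied = X[mask]        # same rows as the slice, but freshly allocated
```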