Hi,
I’m working on a textual modeling problem. There are three steps in the forward method, as shown below.
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleDict()
        # in_features / out_features stand in for the real layer sizes
        self.layers['layer_0'] = nn.Linear(in_features, out_features)
        self.layers['relu_0'] = nn.ReLU()

    def forward(self, inp):
        with torch.no_grad():
            # Step 1: tokenize the raw input (runs on CPU)
            tokenized_inp = tokenize(inp)
            # Step 2: convert the tokenizer output to embeddings on the CUDA device
            x = generate_embedding(tokenized_inp)
        # Step 3: run the trainable network that follows (outside no_grad)
        for layer_name, layer in self.layers.items():
            x = layer(x)
        return x
Steps 1 and 2 do not include any trainable parameters; I’m using a Hugging Face tokenizer and embedding model. Can I parallelize these steps before executing step 3, which is the real ‘training’ step of the network?
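To make the “no trainable parameters” point concrete, here is a minimal sketch where `nn.Embedding` stands in for the Hugging Face embedding model (a toy stand-in, not my real setup): with the module frozen and the call wrapped in `no_grad`, steps 1–2 contribute nothing to the autograd graph.

```python
import torch
import torch.nn as nn

# nn.Embedding is a toy stand-in for the Hugging Face embedding model
embedder = nn.Embedding(100, 16)
embedder.requires_grad_(False)  # freeze: no trainable parameters

token_ids = torch.randint(0, 100, (4, 8))  # pretend output of step 1
with torch.no_grad():
    x = embedder(token_ids)                # step 2

print(x.requires_grad)  # False: nothing here needs gradients
```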
I want to do this because my GPU memory is underutilized: with a batch size of 64, I use only about 20% of it.
Step 1 is tokenization, which runs serially on the CPU, and step 2 moves data from CPU to CUDA. Both steps scale linearly with batch size, so if I try to fill more GPU memory with a larger batch, I lose speed at these two steps.
Is it possible to perform tokenization and embedding in parallel to take advantage of the GPU memory capacity?
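For context, the kind of overlap I have in mind is what `DataLoader` worker processes provide: tokenization (step 1) runs in parallel CPU workers while the GPU is busy, and pinned memory plus `non_blocking` copies overlap the host-to-device transfer (step 2) with compute. A minimal sketch, with a toy whitespace tokenizer standing in for the Hugging Face one:

```python
import torch
from torch.utils.data import DataLoader, Dataset

# Toy whitespace "tokenizer" (stand-in for the Hugging Face tokenizer)
VOCAB = {"<pad>": 0}

def tokenize(texts, max_len=8):
    ids = []
    for t in texts:
        row = [VOCAB.setdefault(w, len(VOCAB)) for w in t.split()][:max_len]
        row += [0] * (max_len - len(row))  # pad to max_len
        ids.append(row)
    return torch.tensor(ids)

class TextDataset(Dataset):
    def __init__(self, texts):
        self.texts = texts
    def __len__(self):
        return len(self.texts)
    def __getitem__(self, i):
        return self.texts[i]

def collate(batch):
    # Step 1 runs here, inside the DataLoader worker processes,
    # in parallel with the GPU work on the previous batch
    return tokenize(batch)

texts = ["hello world", "parallel tokenization", "gpu memory"] * 32  # 96 items
loader = DataLoader(
    TextDataset(texts),
    batch_size=64,
    num_workers=2,                            # CPU-side parallelism
    collate_fn=collate,
    pin_memory=torch.cuda.is_available(),     # enables async H2D copies
)

device = "cuda" if torch.cuda.is_available() else "cpu"
for tok in loader:
    # Step 2: non_blocking copy can overlap the next batch's tokenization
    tok = tok.to(device, non_blocking=True)
    # Step 3 (the trainable part of the network) would run here
```

This is just a sketch of the pattern I’m asking about; I’d still like to know whether it is the right way to keep the GPU fed at larger batch sizes.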
Thanks