Improving training time for LSTM model

I created an LSTM model and rented a GPU and CPU in the cloud for training. The problem is that the model isn’t using all the available resources. My CPU utilization is less than 5% and my GPU is at ~20%. The GPU utilization follows a sine wave pattern, which I suspect is due to switching the GPU over to validation.

Is there a way to speed up the training time by using 100% of the CPU and GPU?

If your GPU is not fully utilized, your training might suffer from bottlenecks such as data loading or workloads that are too small to saturate the GPU.
In the former case, you could try to use synthetic data and check the GPU utilization again.
If you are dealing with a small model and small batches, the actual computation on the GPU might be so small that the overhead from the data transfer to the device and from the kernel launches becomes visible.
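If you want to see where the time actually goes, you could also time the wait on the DataLoader separately from the GPU work. A rough sketch (train_loader, model, criterion, and optimizer stand in for your own objects, with the model already on the GPU):

import time
import torch

data_time = 0.0
compute_time = 0.0
end = time.perf_counter()

for data, target in train_loader:
    t0 = time.perf_counter()
    data_time += t0 - end                # time spent waiting for the next batch

    data, target = data.cuda(), target.cuda()
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()             # wait for the GPU so the host-side timing is accurate
    end = time.perf_counter()
    compute_time += end - t0

print(f'data loading: {data_time:.1f}s, compute: {compute_time:.1f}s')

If the data loading time dominates, the DataLoader is the bottleneck; if the compute time dominates, the model or batch size is too small to keep the GPU busy.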

I believe the bottleneck is in data loading. I’m using DatasetFolder to load the data in batches. What do you mean by using synthetic data?

Would increasing the number of workers in the DataLoader increase the workload on the GPU?

Instead of loading real data, you could initialize random data, push it to the GPU, and profile the training loop without the data loading:

import torch

# for a multi-class classification; model, criterion, optimizer,
# and the size variables are assumed to be defined already,
# with the model on the GPU
data = torch.randn(batch_size, channels, height, width, device='cuda')
target = torch.randint(0, nb_classes, (batch_size,), device='cuda')

for epoch in range(10):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
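
Since you are training an LSTM, the random input would be sequence-shaped rather than image-shaped. A minimal sketch with made-up sizes (batch_size, seq_len, input_size, hidden_size, and nb_classes are placeholders for your real shapes):

import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size, nb_classes = 64, 100, 32, 128, 10

class LSTMClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, nb_classes)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: [batch, seq_len, hidden]
        return self.fc(out[:, -1])     # classify from the last time step

model = LSTMClassifier().cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# synthetic sequence data created directly on the GPU
data = torch.randn(batch_size, seq_len, input_size, device='cuda')
target = torch.randint(0, nb_classes, (batch_size,), device='cuda')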

Generally yes, but too many workers might slow down your system again. The sweet spot depends on your workstation, so you would have to profile a few values.
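
For reference, the workers are set when constructing the DataLoader. A sketch with placeholder values (train_dataset stands in for your own dataset, e.g. the DatasetFolder you mentioned):

from torch.utils.data import DataLoader

train_loader = DataLoader(
    train_dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,        # subprocesses that load batches in the background; profile a few values
    pin_memory=True,      # can speed up host-to-GPU copies
)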
