How to increase GPU usage?

Hello, I recently started learning PyTorch and have been playing around a bit with regression.
For that, I made a simple neural network, which I am training on a dataset of roughly 780,000 samples with 34 features.
Since it's a lot of data, I wanted to use my GPU (NVIDIA 1050 Ti) instead of the CPU for training.
However, the training still seems a bit slow to me, and when I checked GPU usage in the Windows Task Manager, it was basically locked at 5%.
I would like to know whether it's possible to increase my GPU usage so that training is faster, and if so, how.
Thank you in advance. Here's my code in case anything needs to be checked:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Model(nn.Module):

    def __init__(self, input_size):
        super().__init__()  # must run before any layers are assigned
        self.input = nn.Linear(input_size, 256)
        self.l1 = nn.Linear(256, 128)
        self.l2 = nn.Linear(128, 64)
        self.l3 = nn.Linear(64, 32)
        self.l4 = nn.Linear(32, 16)
        self.l5 = nn.Linear(16, 8)
        self.l6 = nn.Linear(8, 4)
        self.l7 = nn.Linear(4, 2)
        self.output = nn.Linear(2, 1)

        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        # each hidden layer: linear -> ReLU -> dropout
        x = self.dropout(F.relu(self.input(x)))
        x = self.dropout(F.relu(self.l1(x)))
        x = self.dropout(F.relu(self.l2(x)))
        x = self.dropout(F.relu(self.l3(x)))
        x = self.dropout(F.relu(self.l4(x)))
        x = self.dropout(F.relu(self.l5(x)))
        x = self.dropout(F.relu(self.l6(x)))
        x = self.dropout(F.relu(self.l7(x)))
        return self.output(x)

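input_dim = train.shape[1]

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = Model(input_dim).to(device)  # the model has to live on the GPU too
optimizer = optim.SGD(model.parameters(), lr=0.003)
criterion = nn.MSELoss()

epochs = 1
for e in range(epochs):
    steps = 0
    for inputs, values in train_loader:
        steps += 1
        # move the batch to the same device as the model
        inputs, values = inputs.to(device), values.to(device)
        optimizer.zero_grad()  # clear gradients from the previous step
        output = model(inputs)
        loss = criterion(output, values)
        loss.backward()
        optimizer.step()
        l = loss.item()
        if steps % 100000 == 0:
            print("Step ", steps, "MSE: ", l)
    print("Epoch ", e, "MSE: ", l)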

First, if loading the data into memory or other pre-processing on the CPU is the bottleneck, GPU usage will be low, because the GPU has to wait for each batch to be pre-processed and sent over.
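
One way to check whether this is the case is to time how long the loop waits for the next batch versus how long it spends on the training step. Below is a minimal sketch using only the standard library. The names time_loader_vs_compute, step_fn and slow_loader are made up for illustration; in the code from the question, train_loader would take the place of slow_loader and step_fn would wrap the forward/backward/optimizer calls.

```python
import time


def time_loader_vs_compute(loader, step_fn, max_steps=50):
    """Return (seconds spent waiting for batches, seconds spent in step_fn)."""
    wait_s = 0.0
    compute_s = 0.0
    it = iter(loader)
    for _ in range(max_steps):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is the loop waiting for data
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # the forward/backward/optimizer step would go here
        t2 = time.perf_counter()
        wait_s += t1 - t0
        compute_s += t2 - t1
    return wait_s, compute_s


def slow_loader(n_batches, delay_s):
    """Hypothetical stand-in for a loader with slow pre-processing."""
    for i in range(n_batches):
        time.sleep(delay_s)
        yield i


wait_s, compute_s = time_loader_vs_compute(slow_loader(5, 0.01), lambda batch: None)
print(f"waiting on data: {wait_s:.3f}s, compute: {compute_s:.3f}s")
```

If the waiting time dominates, the data pipeline is the bottleneck rather than the GPU. Note that CUDA calls are asynchronous, so for meaningful numbers step_fn should end with something that synchronizes, such as loss.item() or torch.cuda.synchronize().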

Second, how much of the GPU memory is being used? If some GPU memory is still free, you can increase the batch size so that the GPU does more work per batch.
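
As a rough illustration of the batch-size point, here is a sketch that builds two loaders over the same data and compares the number of steps per epoch. The random tensors and the batch sizes 32 and 1024 are assumptions for the demo (the question does not say which batch size is used), and the sample count is reduced to keep it quick.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with 34 features, as in the question (fewer samples for the demo).
features = torch.randn(10_000, 34)
targets = torch.randn(10_000, 1)
dataset = TensorDataset(features, targets)

small_loader = DataLoader(dataset, batch_size=32, shuffle=True)
large_loader = DataLoader(dataset, batch_size=1024, shuffle=True)

# len(loader) is the number of batches, i.e. optimizer steps per epoch.
print(len(small_loader), "steps per epoch at batch_size=32")
print(len(large_loader), "steps per epoch at batch_size=1024")

if torch.cuda.is_available():
    # Memory currently held by tensors vs. total memory on device 0, in MiB.
    used = torch.cuda.memory_allocated(0) / 2**20
    total = torch.cuda.get_device_properties(0).total_memory / 2**20
    print(f"GPU memory in use: {used:.0f} / {total:.0f} MiB")
```

With a network this small, each step is very little work for the GPU, so fewer and larger batches usually keep it busier. A larger batch size may also call for retuning the learning rate.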

Are you saying that the problem is that the train_loader loop runs on the CPU?
If so, is there any way to change this, or to work around it?


Yes, but it depends on what the bottleneck in your program is.

If it is the first case, you can try to reduce the pre-processing time, for example by using more CPU workers. Alternatively, if the pre-processing takes too long, it might help to save the pre-processed data in pickle files once and load those files directly.
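
The pickle idea could look roughly like the sketch below. load_or_preprocess and expensive_preprocess are hypothetical names, and the dummy list stands in for whatever the real pre-processing produces; the temporary directory is only there to keep the demo self-contained.

```python
import os
import pickle
import tempfile


def load_or_preprocess(cache_path, preprocess_fn):
    """Load the cached result if it exists, otherwise compute it once and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = preprocess_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data


def expensive_preprocess():
    # Hypothetical stand-in for the real pre-processing (scaling, encoding, ...).
    return [[x / 10.0 for x in range(34)] for _ in range(100)]


cache_path = os.path.join(tempfile.mkdtemp(), "train_preprocessed.pkl")
first = load_or_preprocess(cache_path, expensive_preprocess)   # computes and writes the cache
second = load_or_preprocess(cache_path, expensive_preprocess)  # reads the cache instead
print(first == second)
```

For the worker suggestion, torch.utils.data.DataLoader takes a num_workers argument that controls how many CPU processes load batches in parallel. On Windows, code that uses workers has to run under an if __name__ == "__main__": guard.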

It really depends on how fast loading the data is compared to pre-processing it.