How to run a simple neural network on GPUs?

I want to implement the VGG16 model on MNIST. For this reason I resize the 28x28 images to 224x224, but this time I need to run it on a GPU. I followed the tutorials, but I couldn't run my code on the GPU. I am not sure how to send mini-batches to the GPU device. Here is my code.

import torch
import torch.nn.functional as F
from torch import optim
from torch.utils.data import TensorDataset, DataLoader

dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
loss_func = F.cross_entropy
bs = 64  # batch size
l_r = 0.001  # learning rate
epochs = 1

model = VGG16Conv()
model.to(dev)
opt = optim.SGD(model.parameters(), lr=l_r)

train_ds = TensorDataset(x_train_, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)
valid_ds = TensorDataset(x_valid_, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)

for epoch in range(epochs):
    print('EPOCH ', epoch)
    for xb, yb in train_dl:
        for sub in range(1, 4):
            loss = loss_func(model.feedforward(xb, sub), yb)
            loss.backward()
            self.plot_grad_flow()
            opt.step()
            opt.zero_grad()
    print('VALIDATION STARTS')
    valid = []
    with torch.no_grad():
        valid_loss = sum(loss_func(model.feedforward(xb), yb) for xb, yb in valid_dl)
        #accuracy(model.feedforward(xb), yb)
        valid.append(sum(accuracy(model.feedforward(xb), yb) for xb, yb in valid_dl))
    print('validation loss: ', valid_loss / len(valid_dl))
    print('validation accuracy: ', sum(valid) / len(y_valid))
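
For reference, the 28x28 → 224x224 resize mentioned above is done roughly like this (just a sketch of the idea, not my exact preprocessing; it assumes x_train is a float tensor of shape [N, 1, 28, 28]):

import torch
import torch.nn.functional as F

x_train = torch.rand(64, 1, 28, 28)                  # placeholder MNIST-like batch
x_train_ = F.interpolate(x_train, size=(224, 224),   # upsample to the VGG16 input size
                         mode='bilinear', align_corners=False)
print(x_train_.shape)                                # torch.Size([64, 1, 224, 224])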

Hi,
You can allocate any torch tensor/model on the GPU by calling
tensor = tensor.cuda()
or
tensor = tensor.to(device)
where device is torch.device('cpu') or torch.device('cuda:0').

It means you simply have to use model = model.cuda() (or model = model.to(device)) and do the same with your input, your ground truth and your loss.
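
A minimal, self-contained sketch of that (nn.Linear is just a stand-in for your model):

import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(10, 2).to(device)              # move the model parameters
criterion = nn.CrossEntropyLoss().to(device)     # class-style loss, moved as well

x = torch.randn(4, 10).to(device)                # input
y = torch.randint(0, 2, (4,)).to(device)         # ground truth
loss = criterion(model(x), y)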


I am kind of a newbie, and what I understand from your response is that in my case the input and ground truth should be modified as

train_ds = TensorDataset(x_train_, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)
train_dl = train_dl.to(dev)
valid_ds = TensorDataset(x_valid_, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)
valid_dl = valid_dl.to(dev)

and for the loss

loss=loss_func(model.feedforward(xb, sub),yb).to(dev)

But it still does not work.

Hi, sorry for the short reply.
With regard to your code, you should do the following:

model = VGG16Conv()
model = model.to(dev)  # reassign to be safe: tensor.to() is not in-place, so write model = model.to(dev)
opt = optim.SGD(model.parameters(), lr=l_r)

train_ds = TensorDataset(x_train_, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)
valid_ds = TensorDataset(x_valid_, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)

for epoch in range(epochs):
    print('EPOCH ', epoch)
    for xb, yb in train_dl:
        for sub in range(1, 4):
            # Allocate the input samples, not the dataloader itself :)
            # You may also have to allocate sub, depending on what it does in your model
            loss = loss_func(model.feedforward(xb.to(dev), sub), yb.to(dev))
            loss.backward()
            self.plot_grad_flow()
            opt.step()
            opt.zero_grad()

You don't have to allocate the dataloader, only the loaded samples.

In this specific case, you don't have to "allocate the loss", as you are using the functional version, which behaves like a plain function. Its nn.Module counterpart is a class; if you use the class version (and it holds tensors such as class weights), you should allocate it as well.
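
For example (a sketch using the names from your code; the class weights are only there to show when the module version actually holds tensors that need moving):

import torch.nn as nn

# functional version: behaves like a plain function, nothing to allocate
loss = F.cross_entropy(model.feedforward(xb.to(dev), sub), yb.to(dev))

# class version: it is an nn.Module; if it holds tensors (e.g. class weights),
# allocate it on the device like the model
class_weights = torch.ones(10)                          # assuming 10 MNIST classes
criterion = nn.CrossEntropyLoss(weight=class_weights).to(dev)
loss = criterion(model.feedforward(xb.to(dev), sub), yb.to(dev))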

In the case of validation it's the same: allocate the inputs as in training.
Lastly, the typical way of doing a forward pass is calling the model directly (once it's been instantiated). You can simply do model(x, sub). You may want to check whether your feedforward method calls the model's forward function.
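
For instance, a skeleton of what that looks like (not your actual model):

import torch.nn as nn

class VGG16Conv(nn.Module):       # skeleton only; your real layers go here
    def forward(self, x, sub):
        return x                  # placeholder

model = VGG16Conv()
out = model(xb, sub)              # calling the model runs forward() plus any hooks,
                                  # which is why it's preferred over a custom method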

If you get an error saying something like "input 1 is of type torch.FloatTensor and input 2 is of type torch.cuda.FloatTensor", it means there is something not allocated on the GPU.


I was trying to send the input and labels to the GPU in the for loop as
for xb.to(dev), yb.to(dev) in train_dl:
however it did not work. Anyway, your reply worked, thank you very much.

If you are using the PyTorch DataLoader, it's not desirable to send inputs to the GPU inside the loader, as it preloads several batches and may occupy a lot of memory. Besides, I'm not sure that multiprocessing allows you to do that.
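
The usual pattern is to keep the DataLoader on the CPU and move each batch inside the loop, something like this (a sketch reusing your train_ds, bs and dev; pin_memory and non_blocking are optional speed-ups):

train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True,
                      num_workers=2, pin_memory=True)   # workers load batches on the CPU

for xb, yb in train_dl:
    xb = xb.to(dev, non_blocking=True)   # copy the batch to the GPU here,
    yb = yb.to(dev, non_blocking=True)   # not inside the Dataset or DataLoader
    # ... forward pass, loss, backward as before ...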
