How long does it usually take to train a VGGNet?

I am trying to build a VGGNet by myself on the CIFAR-10 dataset,

and this is the model that I made:


CNN(
  (layer1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
  )
  (layer2): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): ReLU()
    (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
  )
  (layer3): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): ReLU()
    (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
    (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU()
  )
  (layer4): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (1): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): ReLU()
    (3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
    (5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU()
  )
  (layer5): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): ReLU()
    (3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
    (5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU()
  )
  (layer6): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc1): Linear(in_features=512, out_features=256, bias=True)
  (fc2): Linear(in_features=256, out_features=256, bias=True)
  (fc3): Linear(in_features=256, out_features=10, bias=True)
)
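
For reference, a module definition along these lines would produce that printout (a sketch only; the forward method and the activations between the fc layers are assumptions, since they don't show up in the printed module):

import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.layer2 = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.layer3 = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.layer4 = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.layer5 = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.layer6 = nn.Sequential(nn.MaxPool2d(2, 2))
        self.fc1 = nn.Linear(512, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        # a 32x32 CIFAR-10 image is halved by each of the 5 MaxPool2d layers -> 1x1x512
        for layer in (self.layer1, self.layer2, self.layer3,
                      self.layer4, self.layer5, self.layer6):
            x = layer(x)
        x = x.flatten(1)                 # (N, 512)
        x = torch.relu(self.fc1(x))      # fc activations assumed, not shown in the printout
        x = torch.relu(self.fc2(x))
        return self.fc3(x)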


It runs, but it takes too much time: more than 30 minutes per epoch, and I am sure that I'm using the GPU.

Is this normal, or did I do something wrong?
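
(A quick way to sanity-check the device placement, sketched below, assuming model is the CNN instance above:)

import torch

print(torch.cuda.is_available())         # True if a CUDA device is visible
print(next(model.parameters()).device)   # should print cuda:0 after model.to('cuda')
# the input batches also need to be moved to the same device inside the training loop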

If each epoch takes 30 minutes, it seems the GPU might not be used.
Could you run this dummy code snippet and report the per-epoch time you are seeing:

import time

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

model = models.vgg16().to('cuda')
dataset = datasets.CIFAR10(
    root=ROOT,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    ]),
    download=True
)

loader = DataLoader(
    dataset,
    num_workers=4,
    pin_memory=True,
    shuffle=True,
    batch_size=64
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    torch.cuda.synchronize()  # CUDA ops are asynchronous, so synchronize before starting the timer
    t0 = time.time()
    for data, target in loader:
        optimizer.zero_grad()
        data = data.to('cuda')
        target = target.to('cuda')
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
    torch.cuda.synchronize()  # wait for all pending GPU work before stopping the timer
    t1 = time.time()
    print('Epoch {} took {}s'.format(epoch, t1 - t0))

I’m getting approx. 36 seconds per epoch on a Titan V.


Thank you very much.
Now it got much faster (about 30 s per epoch)!
I have another question:
Do I always have to call .to('cuda') to use the GPU?
Doesn't it go to the GPU automatically?
Or is there something to change the default device from CPU to GPU?

In the "normal" use case, you would have to push the model as well as your data and target tensors to the device manually, and I would recommend sticking to this approach.

In my opinion it's clearer than, e.g., setting the default tensor type to a CUDA tensor.
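
The usual pattern looks something like this (a minimal sketch, reusing the CNN model and the loader from above):

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = CNN().to(device)            # move the parameters to the GPU once

for data, target in loader:
    data = data.to(device)          # move each batch explicitly
    target = target.to(device)
    output = model(data)
    # ... loss, backward, and optimizer step as usual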
