How long does it usually take to train a VGGNet?

I am trying to build a VGGNet by myself on the CIFAR-10 dataset,

and this is the model that I made:


CNN(
  (layer1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
  )
  (layer2): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): ReLU()
    (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
  )
  (layer3): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): ReLU()
    (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
    (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU()
  )
  (layer4): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (1): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): ReLU()
    (3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
    (5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU()
  )
  (layer5): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): ReLU()
    (3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
    (5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU()
  )
  (layer6): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc1): Linear(in_features=512, out_features=256, bias=True)
  (fc2): Linear(in_features=256, out_features=256, bias=True)
  (fc3): Linear(in_features=256, out_features=10, bias=True)
)
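
For reference, a module definition along these lines would produce that printout (a sketch only; the forward method and the activations between the fc layers are assumptions, since they don't show up in the printed module):

import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.layer2 = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.layer3 = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.layer4 = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.layer5 = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.layer6 = nn.Sequential(nn.MaxPool2d(2, 2))
        self.fc1 = nn.Linear(512, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        # a 32x32 CIFAR-10 image is halved by each of the 5 MaxPool2d layers -> 1x1x512
        for layer in (self.layer1, self.layer2, self.layer3,
                      self.layer4, self.layer5, self.layer6):
            x = layer(x)
        x = x.flatten(1)                 # (N, 512)
        x = torch.relu(self.fc1(x))      # fc activations assumed, not shown in the printout
        x = torch.relu(self.fc2(x))
        return self.fc3(x)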


It runs, but it takes too much time: more than 30 minutes per epoch, and I am sure that I'm using the GPU.

Is this normal, or did I do something wrong?
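
(A quick way to sanity-check the device placement, sketched below, assuming model is the CNN instance above:)

import torch

print(torch.cuda.is_available())         # True if a CUDA device is visible
print(next(model.parameters()).device)   # should print cuda:0 after model.to('cuda')
# the input batches also need to be moved to the same device inside the training loop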

If each epoch takes 30 minutes, it seems the GPU might not be used.
Could you run this dummy code snippet and report the per-epoch time you are seeing:

import time

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

model = models.vgg16().to('cuda')
dataset = datasets.CIFAR10(
    root=ROOT,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    ]),
    download=True
)

loader = DataLoader(
    dataset,
    num_workers=4,
    pin_memory=True,
    shuffle=True,
    batch_size=64
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    torch.cuda.synchronize()  # CUDA ops are asynchronous, so synchronize before starting the timer
    t0 = time.time()
    for data, target in loader:
        optimizer.zero_grad()
        data = data.to('cuda')
        target = target.to('cuda')
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
    torch.cuda.synchronize()  # wait for all pending GPU work before stopping the timer
    t1 = time.time()
    print('Epoch {} took {}s'.format(epoch, t1 - t0))

I’m getting approx. 36 seconds per epoch on a Titan V.


Thank you very much.
Now it got much faster (about 30 s per epoch)!
I have another question:
Do I always have to call .to('cuda') to use the GPU?
Doesn't it go to the GPU automatically?
Or is there something to change the default device from CPU to GPU?

In the "normal" use case, you would have to push the model as well as your data and target tensors to the device manually, and I would recommend sticking to this approach.

In my opinion it's clearer than, e.g., setting the default tensor type to a CUDA tensor.
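
The usual pattern looks something like this (a minimal sketch, reusing the CNN model and the loader from above):

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = CNN().to(device)            # move the parameters to the GPU once

for data, target in loader:
    data = data.to(device)          # move each batch explicitly
    target = target.to(device)
    output = model(data)
    # ... loss, backward, and optimizer step as usual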
