Training a Video model

Hi,

Personal Computer:

Python: 3.6
Pytorch: 0.4.0
Ubuntu: 16
GPU: GeForce GTX 1080/PCIe/SSE2

I am training a model on a dataset of videos. I have run the code both on my personal computer and on a server with four GPUs, but I get the same error on both.

Kindly help me if you understand this error:

What are you passing as args.gpus?
Are you able to create a dummy CUDA tensor using:

x = torch.randn(1, device='cuda')          # on the default CUDA device
x = torch.randn(1, device=args.gpus[0])    # on the first device id passed via args.gpus
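
It could also be worth checking what PyTorch actually sees on the machine; a quick sketch (plain PyTorch calls, nothing specific to this project):

import torch

print(torch.cuda.is_available())   # True if the driver and CUDA libraries are found
print(torch.cuda.device_count())   # number of visible GPUs, e.g. 1 for a single GTX 1080
print(torch.version.cuda)          # CUDA version this PyTorch build was compiled against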

Thank you so much for your reply. I am actually trying to implement a project from GitHub.

You can also check the train.py code I am using for training in the screenshot.
Screenshot of my system:

Yes, I am able to create a dummy CUDA tensor.

This seems to be caused by a typo. Did you try to print(label) instead of lable?

After fixing the typo, label prints the labels correctly, but I still get this error:

Traceback (most recent call last):
  File "train.py", line 273, in <module>
    main()
  File "train.py", line 102, in main
    train(train_loader, model, criterion, optimizer, epoch, cur_lr)
  File "train.py", line 140, in train
    output = model(input_var)
  File "/home/kanza/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kanza/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 112, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/kanza/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kanza/Downloads/pytorch-coviar/pytorch-coviar-master/model.py", line 64, in forward
    base_out = self.base_model(input)
  File "/home/kanza/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kanza/.local/lib/python3.5/site-packages/torchvision/models/resnet.py", line 144, in forward
    x = self.layer1(x)
  File "/home/kanza/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kanza/.local/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/kanza/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kanza/.local/lib/python3.5/site-packages/torchvision/models/resnet.py", line 88, in forward
    residual = self.downsample(x)
  File "/home/kanza/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kanza/.local/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/kanza/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kanza/.local/lib/python3.5/site-packages/torch/nn/modules/batchnorm.py", line 49, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "/home/kanza/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 1194, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
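
The last line says the CUDA allocator ran out of memory, so the first thing to rule out is that a single batch simply does not fit on the 8 GB GTX 1080. A minimal sketch of such a check, assuming a ResNet-style base model and 224x224 inputs (the model and sizes are stand-ins for illustration, not the exact ones from train.py):

import torch
import torchvision

# Stand-in for the base model used in the project; swap in the real one if possible.
model = torchvision.models.resnet152().cuda()
batch_size = 40                  # set this to the batch size used for training
x = torch.randn(batch_size, 3, 224, 224, device='cuda')
out = model(x)
out.sum().backward()             # include the backward pass, which also allocates gradient memory
# max_memory_allocated() reports the peak number of bytes held by tensors on the GPU.
print('peak GPU memory: %.1f MB' % (torch.cuda.max_memory_allocated() / 1024**2))

If this already fails with the same out-of-memory error, lowering the batch size (or the input resolution) is the usual fix.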

Hello @ptrblck,

I am sorry, I didn't answer this question earlier, as I am new to deep learning.

What are you passing as args.gpus?

It is like this:

model = torch.nn.DataParallel(model, device_ids=args.gpus).cuda()
cudnn.benchmark = True

I have also tried it with device_ids=[0,1,2,3], but I still get an assertion error.
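
Note that device_ids=[0,1,2,3] can only work if four GPUs are actually visible; on a single GTX 1080 it has to be [0] (or None, which means "use all visible GPUs"). A sketch of how args.gpus could be parsed and sanity-checked before wrapping the model (the --gpus flag and the tiny stand-in model are assumptions, not necessarily what this train.py does):

import argparse
import torch
import torch.nn as nn

parser = argparse.ArgumentParser()
# e.g. run as: python train.py --gpus 0 1 2 3
parser.add_argument('--gpus', nargs='+', type=int, default=None)
args = parser.parse_args()

if args.gpus is not None:
    # Asking DataParallel for a device id that does not exist raises an assertion error.
    assert max(args.gpus) < torch.cuda.device_count(), 'requested a GPU id that does not exist'

model = nn.Linear(10, 2)   # stand-in for the real video model
model = torch.nn.DataParallel(model, device_ids=args.gpus).cuda()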

The updated state of my program is this: