I am looking for the best way to train my network on ImageNet. Since the dataset is very large, a single mistake on my part could cost several hours or even days.
Please suggest some reading on loading ImageNet and plugging it into a model.
There is an ImageNet example in the PyTorch repo. Could you use this as a template for your use case or do you need something else?
Thank you for your response.
But when I use the given code and plug in the ImageNet dataset, I get an error because the validation set is structured differently: unlike the train directory, the val directory contains no subfolders, only image files, and this code expects class subfolders rather than loose images. Please let me know where I am making a mistake.
  File "", line 1, in
  File "/home/anaconda3/lib/python3.5/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 714, in runfile
  File "/home/anaconda3/lib/python3.5/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 89, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/home/spyder_workspace/work/w_cuda/Imagenet/imagenet_training.py", line 410, in
  File "/home/spyder_workspace/work/w_cuda/Imagenet/imagenet_training.py", line 234, in main
  File "/home/anaconda3/lib/python3.5/site-packages/torchvision/datasets/folder.py", line 178, in __init__
  File "/home/anaconda3/lib/python3.5/site-packages/torchvision/datasets/folder.py", line 79, in __init__
    "Supported extensions are: " + ",".join(extensions)))
RuntimeError: Found 0 files in subfolders of: /home/brijraj/Downloads/ILSVRC/Data/CLS-LOC/val/
Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif
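For context, the error says the dataset class looked for images inside subfolders of val/ and found none. A rough, standard-library-only approximation of that check is sketched below. It is not torchvision's actual code, and `count_images_per_class` is a made-up helper name; the extension list is copied from the error message.

```python
import os

# Extensions taken from the error message above.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif")


def count_images_per_class(root):
    """Return {subfolder name: number of image files found beneath it}.

    Files that sit directly in `root` are ignored, which is why a flat
    val/ directory yields an empty result.
    """
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if not os.path.isdir(class_dir):
            continue  # a loose file, not a class folder
        n = 0
        for _, _, files in os.walk(class_dir):
            n += sum(1 for f in files if f.lower().endswith(IMG_EXTENSIONS))
        counts[name] = n
    return counts
```

Running it on both train/ and val/ before starting a long job shows quickly whether the two directories have the same layout.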
I solved this issue by rearranging the data into the required directory structure.
Now I have a different problem: GPU memory overflow. GPU memory usage grows steadily with the number of iterations, and after about 11k iterations it is fully occupied and the run fails with an error. I don't think memory usage should grow with the iterations, so something in my code must be causing it.
Can you help me identify which variable's size depends on the iteration count?
    for i, (input1, target) in tqdm(enumerate(train_loader)):
        if args.gpu is not None:
            input1 = input1.cuda()  # args.gpu, non_blocking=True)
        target = target.cuda()  # args.gpu, non_blocking=True)
        output = model(Variable(input1))
        loss = criterion(output, Variable(target))
        prec1, prec5 = accuracy((output.data).cpu(), target.cpu(), topk=(1, 5))
        losses.update(loss, input1.size(0))
        top1.update(prec1, input1.size(0))
        top5.update(prec5, input1.size(0))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        batch_time.update(time.time() - end)
        end = time.time()
        if i % args.print_freq == 0:
            # logging call truncated in the post; surviving keyword arguments:
            # batch_time=batch_time, top1=top1, top5=top5, loss=losses
            pass
You are storing the computation graph in every iteration by passing loss directly to your logging:
    losses.update(loss, input1.size(0))
Change this line to the following and check the memory issue again:
    losses.update(loss.item(), input1.size(0))
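To illustrate why logging the loss object itself can make memory grow, here is a toy model that does not use PyTorch at all. `Node`, `graph_size`, `RunningSum` and `simulate` are all hypothetical names invented for this sketch. `Node` only mimics the fact that a result keeps references to what it was computed from, and `RunningSum` assumes the meter accumulates a weighted sum of the values passed to `update`.

```python
class Node:
    """Toy stand-in for a tensor that remembers what it was computed from."""

    def __init__(self, value, parents=()):
        self.value = float(value)
        self.parents = tuple(parents)  # these references keep earlier nodes alive

    def __add__(self, other):
        if isinstance(other, Node):
            return Node(self.value + other.value, (self, other))
        return Node(self.value + other, (self,))

    __radd__ = __add__  # lets `0.0 + node` work

    def __mul__(self, factor):
        return Node(self.value * factor, (self,))

    def item(self):
        """Return a plain Python float with no link back to the graph."""
        return self.value


def graph_size(x):
    """Count the Node objects reachable from x (0 for a plain number)."""
    if not isinstance(x, Node):
        return 0
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.parents)
    return len(seen)


class RunningSum:
    """Simplified meter (assumption): accumulates a weighted running sum."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        self.sum = self.sum + val * n  # chains onto the previous sum
        self.count += n


def simulate(steps, detach):
    """Run `steps` fake iterations, logging either the Node or its float."""
    meter = RunningSum()
    for _ in range(steps):
        loss = Node(1.0)  # pretend this came from criterion(output, target)
        meter.update(loss.item() if detach else loss, n=4)
    return meter


if __name__ == "__main__":
    for detach in (False, True):
        meter = simulate(100, detach)
        print("detach =", detach, "-> nodes kept alive:", graph_size(meter.sum))
```

In the toy, the number of objects kept alive grows with every iteration when the `Node` is logged, and stays at zero when a plain float is logged. Real autograd is more involved, but the reference-holding pattern is the same idea.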
But I am using PyTorch version '0.3.0.post4', which does not support ".item()". My GPU only supports CUDA compute capability up to 3.5, so I have no option of upgrading.
Do you think there could be any other solution?
In that case you could use loss.data[0] instead:
    losses.update(loss.data[0], input1.size(0))
Let me know, if the memory issue disappears.
Thank you so much… Yes It’s resolved.
Hey, why did you withdraw this post? It might help someone else in the future.
Could you please explain in detail how to format and arrange the data?
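One possible way to do the rearrangement, sketched under assumptions rather than taken from the poster's actual solution: move every validation image into a subfolder named after its class, so that val/ has the same layout as train/. The sketch assumes a label file with the columns `ImageId` and `PredictionString`, where the first token of `PredictionString` is the class ID (the Kaggle CLS-LOC download ships a `LOC_val_solution.csv` in this form, but check your copy). The function names and the `.JPEG` extension are illustrative choices.

```python
import csv
import os
import shutil


def read_val_labels(csv_path):
    """Map image ID -> class ID from a CSV with ImageId,PredictionString columns."""
    labels = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Assumption: the first token of PredictionString is the class ID.
            labels[row["ImageId"]] = row["PredictionString"].split()[0]
    return labels


def arrange_val_by_class(val_dir, labels, ext=".JPEG"):
    """Move val_dir/<id><ext> to val_dir/<class>/<id><ext>; return files moved."""
    moved = 0
    for image_id, class_id in labels.items():
        src = os.path.join(val_dir, image_id + ext)
        if not os.path.isfile(src):
            continue  # already moved or missing; skip quietly
        class_dir = os.path.join(val_dir, class_id)
        os.makedirs(class_dir, exist_ok=True)
        shutil.move(src, os.path.join(class_dir, image_id + ext))
        moved += 1
    return moved
```

A call would look like `arrange_val_by_class(val_dir, read_val_labels(csv_path))`. Because files are moved in place, it is worth trying this on a copy of a few images first.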