RuntimeError: contiguous is not implemented for type UndefinedType

Feynman27 · December 13, 2017, 6:27pm

I’ve just updated from pytorch 0.2.0 to 0.3.0 in a clean anaconda environment using conda install pytorch torchvision -c pytorch. After the update, I’m seeing an error that I don’t see in v0.2.0:

Traceback (most recent call last):
  File "./tools/trainval_net.py", line 129, in <module>
    max_iters=args.max_iters)
  File "/home/thomasbalestri/PycharmProjects/tools/../lib/model/train_val.py", line 378, in train_net
    sw.train_model(max_iters)
  File "/home/thomasbalestri/PycharmProjects/tools/../lib/model/train_val.py", line 269, in train_model
    self.net.train_step(blobs, self.optimizer)
  File "/home/thomasbalestri/PycharmProjects/tools/../lib/nets/network.py", line 678, in train_step
    self._losses['total_loss'].backward()
  File "/home/thomasbalestri/anaconda3/envs/pytorch0.3.0_py2/lib/python2.7/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/thomasbalestri/anaconda3/envs/pytorch0.3.0_py2/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: contiguous is not implemented for type UndefinedType

The error appears to be thrown when I call the backward() method on my loss.

type(self._losses['total_loss'])
<class 'torch.autograd.variable.Variable'>

Any idea what would cause this error?

richard · December 13, 2017, 6:32pm

Sounds like a bug.

Could you please give a little context around what you’re doing? A script that reproduces this would help a lot.

Feynman27 · December 13, 2017, 6:36pm

I’m running Faster R-CNN based on https://github.com/ruotianluo/pytorch-faster-rcnn. The loss is a sum over the RPN and Fast R-CNN losses, which comprise two smooth L1 loss functions for bbox regression and two cross-entropy loss functions for classification (see here).

I’m trying to reproduce the error in a small snippet, but haven’t had any luck so far. I did find, however, that I can call backward on the RPN losses alone. The error above is thrown when I call backward on the RCNN cross-entropy and bbox losses.
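
For anyone narrowing down a similar failure, the term-by-term check described above can be sketched roughly as follows. This is only an illustration: find_failing_losses and the loss names in the usage note are hypothetical, and the only assumption about each loss is that it has a backward(retain_graph=...) method, as autograd Variables do.

```python
def find_failing_losses(losses):
    """Call backward() on each loss separately and collect the failures.

    `losses` maps a name to any object with a backward(retain_graph=...)
    method, for example an autograd Variable holding a scalar loss.
    Returns a dict of name -> error message for the terms that raised.
    """
    failures = {}
    for name, loss in losses.items():
        try:
            # retain_graph=True keeps the shared graph alive so that the
            # next term can still backpropagate through it.
            loss.backward(retain_graph=True)
        except RuntimeError as err:
            failures[name] = str(err)
    return failures
```

A call might look like find_failing_losses({'rpn_cross_entropy': a, 'cross_entropy': b}), where a and b are the individual loss terms. Gradients accumulate across these calls, so they should be zeroed before resuming real training.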

richard · December 13, 2017, 7:08pm

Hmm, one thing that could help is a gdb stack trace.

Something like:

gdb python
>> catch throw
>> run <how the script was called>
>> backtrace
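
The same sequence can also be run non-interactively. The sketch below only builds the command line; gdb_batch_command is a hypothetical helper name, and it relies on gdb's -batch, -ex and --args options. Note that gdb stops at the first C++ throw it sees, which may not be the one that reaches Python.

```python
import shlex


def gdb_batch_command(script_args, python_exe="python"):
    """Build an argv list that runs gdb without an interactive prompt.

    -batch makes gdb exit once its commands finish, each -ex supplies one
    gdb command, and everything after --args is the program to debug.
    """
    return [
        "gdb", "-batch",
        "-ex", "catch throw",  # stop when a C++ exception is thrown
        "-ex", "run",
        "-ex", "backtrace",    # print the stack at the point where gdb stopped
        "--args", python_exe,
    ] + list(script_args)


if __name__ == "__main__":
    # Example arguments only; substitute however the script is normally called.
    cmd = gdb_batch_command(["./tools/trainval_net.py", "--net", "res101"])
    print(" ".join(shlex.quote(part) for part in cmd))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run on a machine where gdb is installed.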

Feynman27 · December 14, 2017, 8:26pm

I tried running with gdb as you’ve advised above, but when running backtrace, I get

(gdb) backtrace
No stack.

I’ve tried to reproduce the issue using a self-contained cross-entropy example (see below), but this runs without error.

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

class net(nn.Module):
    def __init__(self):
        nn.Module.__init__(self)
        self.conv = nn.Conv2d(3,31,1,2,1)
        self.avg_pool = nn.AvgPool2d((7,7), stride=(7,7))
        self._num_classes = 31

    def forward(self, X):
        X = self.conv(X)
        cls_score = self.avg_pool(X).squeeze()
        labels = Variable(torch.LongTensor(256,1).cuda().bernoulli_().squeeze())
        cross_entropy = F.cross_entropy(cls_score.view(-1,31), labels)
        return cross_entropy

if __name__ == '__main__':
    X = Variable(torch.randn(256,3,14,14).cuda())
    _net = net()
    _net.cuda()
    cross_entropy = _net.forward(X)
    learning_rate = 1e-4
    optimizer = torch.optim.Adam(_net.parameters(), lr=learning_rate)
    optimizer.zero_grad()
    cross_entropy.backward()

richard · December 14, 2017, 8:28pm

Are you sure you typed in catch throw before you ran the python program? It’s a little strange that there’s no backtrace…

Feynman27 · December 14, 2017, 8:33pm

Yep. I’m not too familiar with gdb (I usually debug with pdb), so maybe I’m doing something incorrectly. Here’s the entire output:

(pytorch0.3.0_py2) thomasbalestri@linux02:~/PycharmProjects/pytorch-detect-to-track$ gdb python
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
---Type <return> to continue, or q <return> to quit---
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
(gdb) catch throw
Catchpoint 1 (throw)
(gdb) run ./tools/trainval_net.py --weight data/pretrained_models/res101.pth --imdb imagenet_vid_train --imdbval imagenet_vid_val  --iters 100000 --cfg experiments/cfgs/res101.yml --net res101  --set ANCHOR_SCALES [8,16,32] ANCHOR_RATIOS [0.5,1.0,2.0]
Starting program: /home/thomasbalestri/anaconda3/envs/pytorch0.3.0_py2/bin/python ./tools/trainval_net.py --weight data/pretrained_models/res101.pth --imdb imagenet_vid_train --imdbval imagenet_vid_val  --iters 100000 --cfg experiments/cfgs/res101.yml --net res101  --set ANCHOR_SCALES [8,16,32] ANCHOR_RATIOS [0.5,1.0,2.0]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Called with args:
Namespace(cfg_file='experiments/cfgs/res101.yml', imdb_name='imagenet_vid_train', imdbval_name='imagenet_vid_val', max_iters=100000, net='res101', set_cfgs=['ANCHOR_SCALES', '[8,16,32]', 'ANCHOR_RATIOS', '[0.5,1.0,2.0]'], tag=None, weight='data/pretrained_models/res101.pth')
Using config:
{'ANCHOR_RATIOS': [0.5, 1.0, 2.0],
 'ANCHOR_SCALES': [8, 16, 32],
 'CLASS_AGNOSTIC': True,
 'DATA_DIR': '/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/data',
 'DENSENET': {'FIXED_BLOCKS': 0},
 'EXP_DIR': 'res101',
 'MATLAB': 'matlab',
 'MOBILENET': {'DEPTH_MULTIPLIER': 1.0,
               'FIXED_LAYERS': 5,
               'REGU_DEPTH': False,
               'WEIGHT_DECAY': 4e-05},
 'PIXEL_MEANS': array([[[ 102.9801,  115.9465,  122.7717]]]),
 'POOLING_MODE': 'roi',
 'POOLING_SIZE': 7,
 'RESNET': {'FIXED_BLOCKS': 1, 'MAX_POOL': False},
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track',
 'TEST': {'BBOX_REG': True,
          'HAS_RPN': True,
          'MAX_SIZE': 1000,
          'MODE': 'nms',
          'NMS': 0.3,
          'PROPOSAL_METHOD': 'gt',
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'RPN_TOP_N': 5000,
          'SCALES': [600],
          'SVM': False},
 'TRAIN': {'ASPECT_GROUPING': False,
           'BATCH_SIZE': 256,
           'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_NORMALIZE_TARGETS': True,
           'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
           'BBOX_REG': True,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'BIAS_DECAY': False,
           'DISPLAY': 20,
           'DOUBLE_BIAS': False,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'GAMMA': 0.1,
           'HAS_RPN': True,
           'IMS_PER_BATCH': 1,
           'LEARNING_RATE': 0.0005,
           'MAX_SIZE': 1000,
           'MOMENTUM': 0.9,
           'PROPOSAL_METHOD': 'gt',
           'RPN_BATCHSIZE': 256,
           'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 2000,
           'RPN_PRE_NMS_TOP_N': 12000,
           'SCALES': [600],
           'SNAPSHOT_ITERS': 5000,
           'SNAPSHOT_KEPT': 3,
           'SNAPSHOT_PREFIX': 'res101_faster_rcnn',
           'STEPSIZE': [70000, 140000, 190000, 240000, 1100000, 1160000],
           'SUMMARY_INTERVAL': 180,
           'TRUNCATED': False,
           'USE_ALL_GT': True,
           'USE_FLIPPED': False,
           'USE_GT': False,
           'WEIGHT_DECAY': 0.0001},
 'USE_GPU_NMS': True}
Number of classes: 31
Loaded dataset `imagenet_vidtrain` for training
Set proposal method: gt
Preparing training data...
imagenet_vidtrain gt roidb loaded from /home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/data/cache/imagenet_vidtrain_gt_roidb.pkl
done
Number of classes: 31
38121 roidb entries
Output will be saved to `/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/output/res101/imagenet_vidtrain/default`
TensorFlow summaries will be saved to `/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tensorboard/res101/imagenet_vidtrain/default`
Number of classes: 31
Loaded dataset `imagenet_vidval` for training
Set proposal method: gt
Preparing training data...
imagenet_vidval gt roidb loaded from /home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/data/cache/imagenet_vidval_gt_roidb.pkl
done
Number of classes: 31
5515 validation roidb entries
Filtered 1064 roidb entries: 38121 -> 37057
Filtered 103 roidb entries: 5515 -> 5412
Pairs in roidb: 32943
Pairs in roidb: 4825
Solving...
[New Thread 0x7fff96095700 (LWP 23365)]
[New Thread 0x7fff95894700 (LWP 23366)]
Loading initial model weights from data/pretrained_models/res101.pth
Loaded.
[New Thread 0x7fff8ffd3700 (LWP 23368)]
[New Thread 0x7fff8f7d2700 (LWP 23369)]
[New Thread 0x7fff8efd1700 (LWP 23370)]
[New Thread 0x7fff923d4700 (LWP 23373)]
[New Thread 0x7fff91bd3700 (LWP 23374)]
[New Thread 0x7fff913d2700 (LWP 23375)]
[New Thread 0x7fff90bd1700 (LWP 23376)]
[New Thread 0x7fff8e7d0700 (LWP 23377)]
[New Thread 0x7fff8dfcf700 (LWP 23378)]
[New Thread 0x7fff8d7ce700 (LWP 23379)]
[New Thread 0x7fff8cfcd700 (LWP 23380)]
[New Thread 0x7fff79fff700 (LWP 23381)]
[New Thread 0x7fff797fe700 (LWP 23382)]
[New Thread 0x7fff78ffd700 (LWP 23383)]
/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tools/../lib/nets/network.py:361: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape)
[Thread 0x7fff78ffd700 (LWP 23383) exited]
[Thread 0x7fff797fe700 (LWP 23382) exited]
[Thread 0x7fff8cfcd700 (LWP 23380) exited]
[Thread 0x7fff8dfcf700 (LWP 23378) exited]
[Thread 0x7fff79fff700 (LWP 23381) exited]
[Thread 0x7fff8d7ce700 (LWP 23379) exited]
/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tools/../lib/nets/network.py:414: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  cls_prob = F.softmax(cls_score)
> /home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/lib/nets/network.py(314)_add_losses()
-> cross_entropy = F.cross_entropy(cls_score.view(-1, self._num_classes), label)
(Pdb) c
[New Thread 0x7fff79fff700 (LWP 23398)]
[New Thread 0x7fff8d7ce700 (LWP 23399)]
[New Thread 0x7fff8dfcf700 (LWP 23400)]
[New Thread 0x7fff8cfcd700 (LWP 23401)]
[New Thread 0x7fff797fe700 (LWP 23402)]
[New Thread 0x7fff78ffd700 (LWP 23403)]
[New Thread 0x7fff47fff700 (LWP 23404)]
[New Thread 0x7fff477fe700 (LWP 23405)]
Traceback (most recent call last):
  File "./tools/trainval_net.py", line 129, in <module>
    max_iters=args.max_iters)
  File "/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tools/../lib/model/train_val.py", line 378, in train_net
    sw.train_model(max_iters)
  File "/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tools/../lib/model/train_val.py", line 269, in train_model
    self.net.train_step(blobs, self.optimizer)
  File "/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tools/../lib/nets/network.py", line 676, in train_step
    self._losses['total_loss'].backward()
  File "/home/thomasbalestri/anaconda3/envs/pytorch0.3.0_py2/lib/python2.7/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/thomasbalestri/anaconda3/envs/pytorch0.3.0_py2/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: contiguous is not implemented for type UndefinedType
[Thread 0x7fff47fff700 (LWP 23404) exited]
[Thread 0x7fff78ffd700 (LWP 23403) exited]
[Thread 0x7fff797fe700 (LWP 23402) exited]
[Thread 0x7fff8cfcd700 (LWP 23401) exited]
[Thread 0x7fff8dfcf700 (LWP 23400) exited]
[Thread 0x7fff8d7ce700 (LWP 23399) exited]
[Thread 0x7fff79fff700 (LWP 23398) exited]
[Thread 0x7fff8e7d0700 (LWP 23377) exited]
[Thread 0x7fff90bd1700 (LWP 23376) exited]
[Thread 0x7fff913d2700 (LWP 23375) exited]
[Thread 0x7fff91bd3700 (LWP 23374) exited]
[Thread 0x7fff923d4700 (LWP 23373) exited]
[Thread 0x7fff8efd1700 (LWP 23370) exited]
[Thread 0x7fff8f7d2700 (LWP 23369) exited]
[Thread 0x7fff8ffd3700 (LWP 23368) exited]
[Thread 0x7fff95894700 (LWP 23366) exited]
[Thread 0x7fff96095700 (LWP 23365) exited]
[Thread 0x7ffff7fce700 (LWP 23357) exited]
[Inferior 1 (process 23357) exited with code 01]
(gdb) backtrace
No stack.
(gdb)

richard · December 14, 2017, 8:44pm

You’re doing it correctly! I think what’s happening is that pdb is “stealing” the call from gdb? Would it be possible to disable pdb from attaching to runtime errors?

Feynman27 · December 14, 2017, 8:46pm

That pdb break was accidental. Removing it produces the same output:

(pytorch0.3.0_py2) thomasbalestri@markable02:~/PycharmProjects/pytorch-detect-to-track$ gdb python
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
(gdb) catch throw
Catchpoint 1 (throw)
(gdb) run ./tools/trainval_net.py --weight data/pretrained_models/res101.pth --imdb imagenet_vid_train --imdbval imagenet_vid_val  --iters 100000 --cfg experiments/cfgs/res101.yml --net res101  --set ANCHOR_SCALES [8,16,32] ANCHOR_RATIOS [0.5,1.0,2.0]
Starting program: /home/thomasbalestri/anaconda3/envs/pytorch0.3.0_py2/bin/python ./tools/trainval_net.py --weight data/pretrained_models/res101.pth --imdb imagenet_vid_train --imdbval imagenet_vid_val  --iters 100000 --cfg experiments/cfgs/res101.yml --net res101  --set ANCHOR_SCALES [8,16,32] ANCHOR_RATIOS [0.5,1.0,2.0]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Called with args:
Namespace(cfg_file='experiments/cfgs/res101.yml', imdb_name='imagenet_vid_train', imdbval_name='imagenet_vid_val', max_iters=100000, net='res101', set_cfgs=['ANCHOR_SCALES', '[8,16,32]', 'ANCHOR_RATIOS', '[0.5,1.0,2.0]'], tag=None, weight='data/pretrained_models/res101.pth')
Using config:
{'ANCHOR_RATIOS': [0.5, 1.0, 2.0],
 'ANCHOR_SCALES': [8, 16, 32],
 'CLASS_AGNOSTIC': True,
 'DATA_DIR': '/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/data',
 'DENSENET': {'FIXED_BLOCKS': 0},
 'EXP_DIR': 'res101',
 'MATLAB': 'matlab',
 'MOBILENET': {'DEPTH_MULTIPLIER': 1.0,
               'FIXED_LAYERS': 5,
               'REGU_DEPTH': False,
               'WEIGHT_DECAY': 4e-05},
 'PIXEL_MEANS': array([[[ 102.9801,  115.9465,  122.7717]]]),
 'POOLING_MODE': 'roi',
 'POOLING_SIZE': 7,
 'RESNET': {'FIXED_BLOCKS': 1, 'MAX_POOL': False},
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track',
 'TEST': {'BBOX_REG': True,
          'HAS_RPN': True,
          'MAX_SIZE': 1000,
          'MODE': 'nms',
          'NMS': 0.3,
          'PROPOSAL_METHOD': 'gt',
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'RPN_TOP_N': 5000,
          'SCALES': [600],
          'SVM': False},
 'TRAIN': {'ASPECT_GROUPING': False,
           'BATCH_SIZE': 256,
           'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_NORMALIZE_TARGETS': True,
           'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
           'BBOX_REG': True,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'BIAS_DECAY': False,
           'DISPLAY': 20,
           'DOUBLE_BIAS': False,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'GAMMA': 0.1,
           'HAS_RPN': True,
           'IMS_PER_BATCH': 1,
           'LEARNING_RATE': 0.0005,
           'MAX_SIZE': 1000,
           'MOMENTUM': 0.9,
           'PROPOSAL_METHOD': 'gt',
           'RPN_BATCHSIZE': 256,
           'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 2000,
           'RPN_PRE_NMS_TOP_N': 12000,
           'SCALES': [600],
           'SNAPSHOT_ITERS': 5000,
           'SNAPSHOT_KEPT': 3,
           'SNAPSHOT_PREFIX': 'res101_faster_rcnn',
           'STEPSIZE': [70000, 140000, 190000, 240000, 1100000, 1160000],
           'SUMMARY_INTERVAL': 180,
           'TRUNCATED': False,
           'USE_ALL_GT': True,
           'USE_FLIPPED': False,
           'USE_GT': False,
           'WEIGHT_DECAY': 0.0001},
 'USE_GPU_NMS': True}
Number of classes: 31
Loaded dataset `imagenet_vidtrain` for training
Set proposal method: gt
Preparing training data...
imagenet_vidtrain gt roidb loaded from /home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/data/cache/imagenet_vidtrain_gt_roidb.pkl
done
Number of classes: 31
38121 roidb entries
Output will be saved to `/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/output/res101/imagenet_vidtrain/default`
TensorFlow summaries will be saved to `/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tensorboard/res101/imagenet_vidtrain/default`
Number of classes: 31
Loaded dataset `imagenet_vidval` for training
Set proposal method: gt
Preparing training data...
imagenet_vidval gt roidb loaded from /home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/data/cache/imagenet_vidval_gt_roidb.pkl
done
Number of classes: 31
5515 validation roidb entries
Filtered 1064 roidb entries: 38121 -> 37057
Filtered 103 roidb entries: 5515 -> 5412
Pairs in roidb: 32943
Pairs in roidb: 4825
Solving...
[New Thread 0x7fff96095700 (LWP 24483)]
[New Thread 0x7fff95894700 (LWP 24484)]
Loading initial model weights from data/pretrained_models/res101.pth
Loaded.
[New Thread 0x7fff8ffd3700 (LWP 24486)]
[New Thread 0x7fff8f7d2700 (LWP 24487)]
[New Thread 0x7fff8efd1700 (LWP 24489)]
[New Thread 0x7fff923d4700 (LWP 24490)]
[New Thread 0x7fff91bd3700 (LWP 24491)]
[New Thread 0x7fff913d2700 (LWP 24492)]
[New Thread 0x7fff90bd1700 (LWP 24493)]
[New Thread 0x7fff8e7d0700 (LWP 24494)]
[New Thread 0x7fff8dfcf700 (LWP 24495)]
[New Thread 0x7fff8d7ce700 (LWP 24496)]
[New Thread 0x7fff8cfcd700 (LWP 24497)]
[New Thread 0x7fff79fff700 (LWP 24498)]
[New Thread 0x7fff797fe700 (LWP 24499)]
[New Thread 0x7fff78ffd700 (LWP 24500)]
/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tools/../lib/nets/network.py:360: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape)
[Thread 0x7fff797fe700 (LWP 24499) exited]
[Thread 0x7fff8cfcd700 (LWP 24497) exited]
[Thread 0x7fff8d7ce700 (LWP 24496) exited]
[Thread 0x7fff8dfcf700 (LWP 24495) exited]
[Thread 0x7fff78ffd700 (LWP 24500) exited]
[Thread 0x7fff79fff700 (LWP 24498) exited]
/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tools/../lib/nets/network.py:413: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  cls_prob = F.softmax(cls_score)
[New Thread 0x7fff79fff700 (LWP 24501)]
[New Thread 0x7fff78ffd700 (LWP 24502)]
[New Thread 0x7fff8dfcf700 (LWP 24503)]
[New Thread 0x7fff8d7ce700 (LWP 24504)]
[New Thread 0x7fff8cfcd700 (LWP 24505)]
[New Thread 0x7fff797fe700 (LWP 24506)]
[New Thread 0x7fff47fff700 (LWP 24507)]
[New Thread 0x7fff477fe700 (LWP 24508)]
Traceback (most recent call last):
  File "./tools/trainval_net.py", line 129, in <module>
    max_iters=args.max_iters)
  File "/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tools/../lib/model/train_val.py", line 378, in train_net
    sw.train_model(max_iters)
  File "/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tools/../lib/model/train_val.py", line 269, in train_model
    self.net.train_step(blobs, self.optimizer)
  File "/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tools/../lib/nets/network.py", line 675, in train_step
    self._losses['total_loss'].backward()
  File "/home/thomasbalestri/anaconda3/envs/pytorch0.3.0_py2/lib/python2.7/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/thomasbalestri/anaconda3/envs/pytorch0.3.0_py2/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: contiguous is not implemented for type UndefinedType
[Thread 0x7fff477fe700 (LWP 24508) exited]
[Thread 0x7fff47fff700 (LWP 24507) exited]
[Thread 0x7fff797fe700 (LWP 24506) exited]
[Thread 0x7fff8d7ce700 (LWP 24504) exited]
[Thread 0x7fff8dfcf700 (LWP 24503) exited]
[Thread 0x7fff78ffd700 (LWP 24502) exited]
[Thread 0x7fff79fff700 (LWP 24501) exited]
[Thread 0x7fff8e7d0700 (LWP 24494) exited]
[Thread 0x7fff90bd1700 (LWP 24493) exited]
[Thread 0x7fff913d2700 (LWP 24492) exited]
[Thread 0x7fff91bd3700 (LWP 24491) exited]
[Thread 0x7fff923d4700 (LWP 24490) exited]
[Thread 0x7fff8efd1700 (LWP 24489) exited]
[Thread 0x7fff8f7d2700 (LWP 24487) exited]
[Thread 0x7fff8ffd3700 (LWP 24486) exited]
[Thread 0x7fff95894700 (LWP 24484) exited]
[Thread 0x7fff96095700 (LWP 24483) exited]
[Thread 0x7ffff7fce700 (LWP 24476) exited]
[Inferior 1 (process 24476) exited with code 01]
(gdb) backtrace
No stack.
(gdb) Quit

richard · December 14, 2017, 9:37pm

Yikes. What happens if you do this before the run:

>>> b at::runtime_error
>>> b runtime_error
>>> b utils.cpp:8

(I’m not sure which one is the correct syntax but hopefully one of them will catch the error…)

If you let me know how you’re running https://github.com/ruotianluo/pytorch-faster-rcnn I could try to run and take a look.

Feynman27 · December 14, 2017, 10:26pm

When running with b utils.cpp:8, I see a missing-file error:

Breakpoint 2, THPPointer<THLongStorage>::free (this=0x7fffffffba10) at /opt/conda/conda-bld/pytorch_1512378360668/work/torch/csrc/generic/utils.cpp:13
13      /opt/conda/conda-bld/pytorch_1512378360668/work/torch/csrc/generic/utils.cpp: No such file or directory.

Not sure if this is anything meaningful.

(pytorch0.3.0_py2) thomasbalestri@markable02:~/PycharmProjects/pytorch-detect-to-track$ gdb python
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
(gdb) catch throw
Catchpoint 1 (throw)
(gdb) b utils.cpp:8
No source file named utils.cpp.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (utils.cpp:8) pending.
(gdb) run ./tools/trainval_net.py --weight data/pretrained_models/res101.pth --imdb imagenet_vid_train --imdbval imagenet_vid_val  --iters 100000 --cfg experiments/cfgs/res101.yml --net res101  --set ANCHOR_SCALES [8,16,32] ANCHOR_RATIOS [0.5,1.0,2.0]
Starting program: /home/thomasbalestri/anaconda3/envs/pytorch0.3.0_py2/bin/python ./tools/trainval_net.py --weight data/pretrained_models/res101.pth --imdb imagenet_vid_train --imdbval imagenet_vid_val  --iters 100000 --cfg experiments/cfgs/res101.yml --net res101  --set ANCHOR_SCALES [8,16,32] ANCHOR_RATIOS [0.5,1.0,2.0]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Called with args:
Namespace(cfg_file='experiments/cfgs/res101.yml', imdb_name='imagenet_vid_train', imdbval_name='imagenet_vid_val', max_iters=100000, net='res101', set_cfgs=['ANCHOR_SCALES', '[8,16,32]', 'ANCHOR_RATIOS', '[0.5,1.0,2.0]'], tag=None, weight='data/pretrained_models/res101.pth')
Using config:
{'ANCHOR_RATIOS': [0.5, 1.0, 2.0],
 'ANCHOR_SCALES': [8, 16, 32],
 'CLASS_AGNOSTIC': True,
 'DATA_DIR': '/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/data',
 'DENSENET': {'FIXED_BLOCKS': 0},
 'EXP_DIR': 'res101',
 'MATLAB': 'matlab',
 'MOBILENET': {'DEPTH_MULTIPLIER': 1.0,
               'FIXED_LAYERS': 5,
               'REGU_DEPTH': False,
               'WEIGHT_DECAY': 4e-05},
 'PIXEL_MEANS': array([[[ 102.9801,  115.9465,  122.7717]]]),
 'POOLING_MODE': 'roi',
 'POOLING_SIZE': 7,
 'RESNET': {'FIXED_BLOCKS': 1, 'MAX_POOL': False},
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track',
 'TEST': {'BBOX_REG': True,
          'HAS_RPN': True,
          'MAX_SIZE': 1000,
          'MODE': 'nms',
          'NMS': 0.3,
          'PROPOSAL_METHOD': 'gt',
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'RPN_TOP_N': 5000,
          'SCALES': [600],
          'SVM': False},
 'TRAIN': {'ASPECT_GROUPING': False,
           'BATCH_SIZE': 256,
           'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_NORMALIZE_TARGETS': True,
           'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
           'BBOX_REG': True,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'BIAS_DECAY': False,
           'DISPLAY': 20,
           'DOUBLE_BIAS': False,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'GAMMA': 0.1,
           'HAS_RPN': True,
           'IMS_PER_BATCH': 1,
           'LEARNING_RATE': 0.0005,
           'MAX_SIZE': 1000,
           'MOMENTUM': 0.9,
           'PROPOSAL_METHOD': 'gt',
           'RPN_BATCHSIZE': 256,
           'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 2000,
           'RPN_PRE_NMS_TOP_N': 12000,
           'SCALES': [600],
           'SNAPSHOT_ITERS': 5000,
           'SNAPSHOT_KEPT': 3,
           'SNAPSHOT_PREFIX': 'res101_faster_rcnn',
           'STEPSIZE': [70000, 140000, 190000, 240000, 1100000, 1160000],
           'SUMMARY_INTERVAL': 180,
           'TRUNCATED': False,
           'USE_ALL_GT': True,
           'USE_FLIPPED': False,
           'USE_GT': False,
           'WEIGHT_DECAY': 0.0001},
 'USE_GPU_NMS': True}
Number of classes: 31
Loaded dataset `imagenet_vidtrain` for training
Set proposal method: gt
Preparing training data...
imagenet_vidtrain gt roidb loaded from /home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/data/cache/imagenet_vidtrain_gt_roidb.pkl
done
Number of classes: 31
38121 roidb entries
Output will be saved to `/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/output/res101/imagenet_vidtrain/default`
TensorFlow summaries will be saved to `/home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/tensorboard/res101/imagenet_vidtrain/default`
Number of classes: 31
Loaded dataset `imagenet_vidval` for training
Set proposal method: gt
Preparing training data...
imagenet_vidval gt roidb loaded from /home/thomasbalestri/PycharmProjects/pytorch-detect-to-track/data/cache/imagenet_vidval_gt_roidb.pkl
done
Number of classes: 31
5515 validation roidb entries
Filtered 1064 roidb entries: 38121 -> 37057
Filtered 103 roidb entries: 5515 -> 5412
Pairs in roidb: 32943
Pairs in roidb: 4825
Solving...

Breakpoint 2, THPPointer<THLongStorage>::free (this=0x7fffffffba10) at /opt/conda/conda-bld/pytorch_1512378360668/work/torch/csrc/generic/utils.cpp:13
13      /opt/conda/conda-bld/pytorch_1512378360668/work/torch/csrc/generic/utils.cpp: No such file or directory.
(gdb)

richard · December 14, 2017, 10:29pm

That makes sense; gdb can’t show the source at utils.cpp:8 because the C++ source isn’t bundled with the pytorch binary. Did either of the first two work?

Feynman27 · December 14, 2017, 10:34pm

No, they didn’t give me any additional information. Since I’ve modified the original codebase quite a bit, I’ll try to reproduce the error on the original https://github.com/ruotianluo/pytorch-faster-rcnn so that you might be able to reproduce it.

Feynman27 · December 15, 2017, 3:07am

Okay. Can reproduce the issue simply by changing the POOLING_MODE in pytorch-faster-rcnn/experiments/cfgs/res101.yml from crop to roi. This means the issue is somewhere within the roi pooling module.

To reproduce the error, please check out my fork (master):

git clone https://github.com/Feynman27/pytorch-faster-rcnn.git

After checking out the repo, you’ll have to download pascal voc 2007:

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar

In the repo’s data directory, create a softlink to VOCdevkit called VOCdevkit2007.

You’ll also need pretrained resnet101 weights from here. Place this in data/pretrained_models/res101.pth.

After this, build the nms and roi pooling modules by running ./make_modules.sh and run the training script using ./experiments/scripts/run_resnet.sh.

richard · December 15, 2017, 3:40pm

Thank you for your detailed instructions!

Should I extract the tar files to /data?

I’m also getting this when trying to run run_resnet.sh:

[rzou@devgpu041.ash5 ~/pytorch/undefined-bug/pytorch-faster-rcnn]  ./experiments/scripts/run_resnet.sh
Traceback (most recent call last):
  File "./tools/trainval_net.py", line 20, in <module>
    from nets.vgg16 import vgg16
  File "/home/rzou/pytorch/undefined-bug/pytorch-faster-rcnn/tools/../lib/nets/vgg16.py", line 10, in <module>
    from nets.network import Network
  File "/home/rzou/pytorch/undefined-bug/pytorch-faster-rcnn/tools/../lib/nets/network.py", line 20, in <module>
    from layer_utils.proposal_layer import proposal_layer
  File "/home/rzou/pytorch/undefined-bug/pytorch-faster-rcnn/tools/../lib/layer_utils/proposal_layer.py", line 13, in <module>
    from model.nms_wrapper import nms
  File "/home/rzou/pytorch/undefined-bug/pytorch-faster-rcnn/tools/../lib/model/nms_wrapper.py", line 11, in <module>
    from nms.pth_nms import pth_nms
  File "/home/rzou/pytorch/undefined-bug/pytorch-faster-rcnn/tools/../lib/nms/pth_nms.py", line 2, in <module>
    from _ext import nms
ModuleNotFoundError: No module named '_ext'

Feynman27 · December 15, 2017, 3:43pm

That usually happens if you don’t build the nms and roi_pooling modules. Did you run make_modules.sh?

You can extract the tar files to data, but I just extract them elsewhere and create a symlink within the repo’s data directory using something like ln -s <path/to/VOCdevkit> VOCdevkit2007.
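
The same link can be created from Python. This is a minimal sketch using os.symlink; link_devkit is a hypothetical helper, and the paths passed to it are whatever applies on your machine.

```python
import os


def link_devkit(devkit_path, data_dir, link_name="VOCdevkit2007"):
    """Create data_dir/link_name as a symlink to devkit_path.

    Equivalent in spirit to `ln -s <path/to/VOCdevkit> VOCdevkit2007`
    run inside the data directory. An existing symlink of the same name
    is replaced; a real file or directory there is left alone and the
    call raises an error instead.
    """
    link_path = os.path.join(data_dir, link_name)
    if os.path.islink(link_path):
        os.remove(link_path)  # drop a stale link before recreating it
    os.symlink(os.path.abspath(devkit_path), link_path)
    return link_path
```

Using an absolute target keeps the link valid regardless of the directory the training script is started from.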

Feynman27 · December 15, 2017, 3:51pm

You will also probably have to install Tensorboard-Pytorch:

pip install tensorboardX
pip install tensorflow

richard · December 15, 2017, 3:57pm

I did run make_modules.sh.

In the lib/nms folder there exists a _ext folder but it’s not being recognized from python?

  File "/home/rzou/undefined-bug/pytorch-faster-rcnn/tools/../lib/model/nms_wrapper.py", line 11, in <module>
    from nms.pth_nms import pth_nms
  File "/home/rzou/undefined-bug/pytorch-faster-rcnn/tools/../lib/nms/pth_nms.py", line 2, in <module>
    from _ext import nms
ModuleNotFoundError: No module named '_ext'
[rzou@devgpu041.ash5 ~/undefined-bug/pytorch-faster-rcnn] ls lib/nms/_ext
__init__.py  nms
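
One possible explanation, not confirmed in this thread: ModuleNotFoundError only exists in Python 3.6 and later, whereas the original environment above is Python 2.7. Python 3 no longer resolves from _ext import nms relative to the importing package, so the _ext folder can exist on disk and still not be found. The sketch below reproduces that behaviour with a throwaway package; demo_nms and the placeholder contents are made up for illustration and are not the repository's code.

```python
import importlib
import os
import sys
import tempfile


def make_demo_package(root, import_line):
    """Create root/demo_nms/ with an _ext subpackage and a pth_nms module."""
    pkg = os.path.join(root, "demo_nms")
    os.makedirs(os.path.join(pkg, "_ext"))
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "_ext", "__init__.py"), "w") as f:
        f.write("nms = 'placeholder for the compiled extension'\n")
    with open(os.path.join(pkg, "pth_nms.py"), "w") as f:
        f.write(import_line + "\n")


def try_import(import_line):
    """Return True if demo_nms.pth_nms imports with the given import line."""
    root = tempfile.mkdtemp()
    make_demo_package(root, import_line)
    sys.path.insert(0, root)
    try:
        # Forget any copy of the demo package from a previous call.
        for name in [n for n in sys.modules if n.startswith("demo_nms")]:
            del sys.modules[name]
        importlib.invalidate_caches()
        importlib.import_module("demo_nms.pth_nms")
        return True
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False
    finally:
        sys.path.remove(root)


if __name__ == "__main__":
    print("implicit relative:", try_import("from _ext import nms"))
    print("explicit relative:", try_import("from ._ext import nms"))
```

If this is the cause, running under Python 2.7 or switching the import to the explicit relative form would be the things to try.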

Feynman27 · December 15, 2017, 4:00pm

Under lib/nms/_ext, do you have an __init__.py file? I may not have pushed it to GitHub.

richard · December 15, 2017, 4:03pm

I do have one, but it looks like an empty file: