Strangely slow weight loading [fixed]

I’m running into a strange issue where loading weights is quite slow. Specifically, for the DC-GAN example in the repo, loading the weights for a DC-GAN with 10 latent variables takes 150 seconds, which doesn’t seem right given the size of the model.

The code to create/load the model is below; is anything obviously wrong here? Thanks!

Edit: the slow part is torch.load; load_state_dict is almost instantaneous.

Edit 2: this was fixed by upgrading to CUDA 8.0.

class _netG(nn.Module):
    def __init__(self, ngpu):
        super(_netG, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(     nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2,     ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(    ngf,      nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )
    def forward(self, input):
        ...  # [omitted for brevity]

netG = _netG(ngpu)
netG.apply(weights_init)
with rtk.timing.Logger('load weights'):
    if opt.netG != '':
        netG.load_state_dict(torch.load(opt.netG))
print(netG)
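For anyone hitting something similar: it helps to time torch.load and load_state_dict separately to see which step is slow. rtk.timing.Logger above appears to be a custom helper; a minimal stdlib equivalent (a sketch, with the label names chosen here for illustration) could look like:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Print wall-clock time for the enclosed block.
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.3f}s")

# To split the two steps in the original code, wrap each separately:
#     with timed('torch.load'):
#         state = torch.load(opt.netG)
#     with timed('load_state_dict'):
#         netG.load_state_dict(state)
with timed('example'):
    sum(range(1000))
```

If torch.load turns out to be the slow step because it is initializing the GPU, loading to CPU first via the map_location argument (a standard torch.load parameter, e.g. torch.load(opt.netG, map_location='cpu')) and moving the model to the GPU afterwards may help isolate CUDA startup cost from deserialization.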