Long delay after `backward`

Hey everyone, I'm seeing some behavior that I'm not sure is a bug or not. I have the following piece of code:

    for dataset_no, (inputs, targets) in enumerate(trainloader):
        inputs = Variable(inputs.cuda(), requires_grad=False)
        targets = Variable(targets.cuda(), requires_grad=False)
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = model(inputs)
        objective = criterion(outputs, targets) \
                    + model.readout.l1() * gamma_readout \
                    + model.group_sparsity() * gamma_hidden
        objective.backward()
        optimizer.step()

The first iteration of the loop runs fast, but on the second iteration the lines

    inputs = Variable(inputs.cuda(), requires_grad=False)
    targets = Variable(targets.cuda(), requires_grad=False)

take forever to execute.
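Since CUDA kernels are launched asynchronously, I suspect the `.cuda()` lines might just be where the stream happens to block, rather than the real cost. Here is a minimal timing helper (a sketch, not part of my actual script) I could use to check this; the `synchronize` argument is meant to take `torch.cuda.synchronize` so that pending GPU work is charged to the step that queued it:

```python
import time

def timed(label, fn, synchronize=None):
    """Run fn() and print how long it took.

    If a synchronize callable is given (e.g. torch.cuda.synchronize),
    call it before and after timing so asynchronously launched GPU
    work is attributed to the step that actually queued it, not to
    whichever later line happens to block on the stream.
    """
    if synchronize is not None:
        synchronize()
    t0 = time.perf_counter()
    result = fn()
    if synchronize is not None:
        synchronize()
    dt = time.perf_counter() - t0
    print(f"{label}: {dt:.4f}s")
    return result, dt
```

With this I would wrap each step of the loop, e.g. `timed("forward", lambda: model(inputs), torch.cuda.synchronize)`, and see whether the time really goes into the `.cuda()` transfer or is just the previous `backward()` draining there.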

The network I use is:

    NetWork (
      (core): Sequential (
        (conv0): Conv3d(1, 32, kernel_size=(5, 5, 5), stride=(1, 1, 1))
        (f0): CustomNonLinearity (
        )
        (conv1): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), dilation=(2, 2, 2))
        (f1): CustomNonLinearity (
        )
        (conv2): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), dilation=(2, 2, 2))
      )
      (readout): FullyConnected (32 x 24 x 52 -> 452)
      (nonlinearity): CustomNonLinearity (
      )
    )

Does anyone see an obvious mistake? Thanks a lot.