How does one collect statistics during training in PyTorch efficiently?

If I am running a conv net or a standard architecture like AlexNet, how does one collect stats of the model during training in an efficient way? Like weight norms, gradient norms, test and train errors, etc.

Is it just something like:


inside the for loop? (And for the parameter stats, just looping through all layers.) I was just worried that doing that too often (bringing data to the CPU) could be inefficient, so I wanted to see what the good practices were and make sure I didn't do anything silly.
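For the parameter stats, a minimal sketch of looping over the layers could look like this (the small `nn.Sequential` model is a hypothetical stand-in; `.item()` forces a GPU-to-CPU sync, so in practice you would only do this every `logging_freq` iterations, not every step):

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for AlexNet / a conv net
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# One forward/backward pass so gradients exist
x = torch.randn(16, 4)
loss = model(x).pow(2).mean()
loss.backward()

# One scalar per parameter tensor; each .item() transfers a single float
weight_norms = {name: p.detach().norm().item()
                for name, p in model.named_parameters()}
grad_norms = {name: p.grad.norm().item()
              for name, p in model.named_parameters() if p.grad is not None}
```

Since only one float per parameter tensor crosses to the CPU, this is cheap as long as it is done at a modest logging frequency.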

I was also wondering if there was anything in particular I had to be careful about; perhaps only using a mini-batch, or only recording the error and loss as a running average during each epoch (so we run the model over the data set once, and at the end take the average of the loss and error over the mini-batches: running_average_for_one_epoch = sum_{mini batches} loss_per_minibatch / |total number of mini-batches used|).
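That running average is just the sum of per-mini-batch losses divided by the number of mini-batches; a tiny sketch with made-up loss values:

```python
# Hypothetical per-mini-batch losses for one epoch
losses = [0.9, 0.7, 0.5, 0.3]

# Accumulate a single Python float, then average at the end of the epoch
running = 0.0
for l in losses:
    running += l
epoch_avg = running / len(losses)  # sum over mini-batches / number of mini-batches
```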

Is there anything in particular I should be careful about?

Why not have a float containing your loss, add values to it at every loop iteration, and average at the end of the loop, then repeat? That way you reduce the amount of data you append to the array.

Sure, we can do that. I guess the main thing that is unavoidable is that we need to bring the value from GPU to CPU at each epoch... unless I create a GPU array and add the value to it at each epoch.

I guess my question is: is it better to instantiate an array of zeros on the GPU and fill that in, or to instantiate a numpy array of zeros (on the CPU, of course) and fill that in (or perhaps a list and append to it)?
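A minimal sketch of the two CPU-side options being compared (the per-epoch loss here is a stand-in value, not a real training loss): preallocating a tensor of zeros and filling it in, versus appending to a plain list. In both cases only a single float crosses from GPU to CPU per recorded value.

```python
import torch

nb_epochs = 5

# Option A: preallocate a CPU buffer and fill it in by index
train_losses = torch.zeros(nb_epochs)

# Option B: a plain Python list, appended to each epoch
loss_list = []

for epoch in range(nb_epochs):
    epoch_loss = torch.tensor(float(epoch))  # stand-in for a computed loss tensor
    # .item() extracts one Python float (one transfer if the tensor lives on GPU)
    train_losses[epoch] = epoch_loss.item()
    loss_list.append(epoch_loss.item())
```

Either way the transfer cost is one scalar per epoch, so the choice is mostly about convenience; a list is simpler when the number of recorded values is not known in advance.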

I guess as long as you are extracting a single float from the GPU, the performance difference doesn't matter. Looking at the CIFAR-10 tutorial training code:

def train_cifar(args, nb_epochs, trainloader, testloader, net, optimizer, criterion, logging_freq=2000):
    for epoch in range(nb_epochs):  # loop over the dataset multiple times
        running_train_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs
            inputs, labels = data
            if args.enable_cuda:
                inputs, labels = inputs.cuda(), labels.cuda()
            inputs, labels = Variable(inputs), Variable(labels)
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # accumulate statistics; loss.item() extracts a single Python float
            running_train_loss += loss.item()
            if i % logging_freq == logging_freq - 1:    # print every logging_freq mini-batches
                # divide by logging_freq because we summed logging_freq mini-batch losses
                print(f'monitoring during training: epoch={epoch+1}, batch_index={i+1}, loss={running_train_loss/logging_freq}')
                running_train_loss = 0.0

Well, the only limitation I see is memory size: if you keep something on the GPU, it might take up space you need for your model or data. And because extracting a single float doesn't seem to change performance much, that is what I would do: just extract a value each time.
