Printing tensor in for loop changes value

I am dealing with an extremely weird bug in an inference loop (a modified timm inference loop; the model is a pretrained 'convnext_xlarge_in22ft1k'):

binary_results = []
with torch.no_grad():
    for batch_idx, (input, targets) in enumerate(loader):
        input = input.cuda()
        labels = model(input)
        probs = torch.nn.functional.softmax(labels, dim=1)
        max_val, predicted_class = probs.max(1)
        binary_results.append((predicted_class == targets).int())

Collecting the values in binary_results and writing them to a file reveals that the first 512 predictions (the first two batches) are all 0, which is incorrect. Simply changing the code to:

binary_results = []
with torch.no_grad():
    for batch_idx, (input, targets) in enumerate(loader):
        input = input.cuda()
        labels = model(input)
        probs = torch.nn.functional.softmax(labels, dim=1)
        max_val, predicted_class = probs.max(1)
        binary_results.append((predicted_class == targets).int().cpu())

gives correct results. Additionally, simply printing the first batch fixes the first 256 predictions:

binary_results = []
with torch.no_grad():
    for batch_idx, (input, targets) in enumerate(loader):
        input = input.cuda()
        labels = model(input)
        probs = torch.nn.functional.softmax(labels, dim=1)
        max_val, predicted_class = probs.max(1)
        binary_results.append((predicted_class == targets).int())
        if batch_idx == 0:
            print('Batch results are:', binary_results[0])

What is happening?

Could you post a minimal and executable code snippet as well as more information about your setup?

Sure! Here’s an executable code snippet:

import os
import time
import argparse
import logging
import numpy as np
import torch

from timm.models import create_model, apply_test_time_pool
from timm.data import ImageDataset, create_loader, resolve_data_config

torch.backends.cudnn.benchmark = True
_logger = logging.getLogger('inference')


parser = argparse.ArgumentParser(description='PyTorch ImageNet Inference')
parser.add_argument('data', metavar='DIR',
                    help='path to dataset')
parser.add_argument('--model', '-m', metavar='MODEL', default='dpn92',
                    help='model architecture (default: dpn92)')
parser.add_argument('-j', '--workers', default=4, type=int, metavar='N',
                    help='number of data loading workers (default: 4)')
parser.add_argument('-b', '--batch-size', default=256, type=int,
                    metavar='N', help='mini-batch size (default: 256)')
parser.add_argument('--input-size', default=None, nargs=3, type=int,
                    metavar='N N N', help='Input all image dimensions (d h w, e.g. --input-size 3 224 224), uses model default if empty')
parser.add_argument('--interpolation', default='', type=str, metavar='NAME',
                    help='Image resize interpolation type (overrides model)')
parser.add_argument('--num-classes', type=int, default=1000,
                    help='Number classes in dataset')
parser.add_argument('--checkpoint', default='', type=str, metavar='PATH',
                    help='path to latest checkpoint (default: none)')
parser.add_argument('--pretrained', dest='pretrained', action='store_true',
                    help='use pre-trained model')
parser.add_argument('--no-test-pool', dest='no_test_pool', action='store_true',
                    help='disable test time pool')
parser.add_argument('--save_location', help='place you want to save the model')

def main():
    args = parser.parse_args()
    args.pretrained = args.pretrained or not args.checkpoint
    model = create_model(
        args.model,
        num_classes=args.num_classes,
        in_chans=3,
        pretrained=args.pretrained,
        checkpoint_path=args.checkpoint)

    config = resolve_data_config(vars(args), model=model)

    model, test_time_pool = (
        model, False) if args.no_test_pool else apply_test_time_pool(model, config)
    model = model.cuda()

    loader = create_loader(
        ImageDataset(args.data),
        input_size=config['input_size'],
        batch_size=args.batch_size,
        use_prefetcher=True,
        interpolation=config['interpolation'],
        num_workers=args.workers,
        crop_pct=1.0 if test_time_pool else config['crop_pct'])

    model.eval()

    binary_results = []
    with torch.no_grad():
        for batch_idx, (input, targets) in enumerate(loader):
            input = input.cuda()
            labels = model(input)
            probs = torch.nn.functional.softmax(labels, dim=1)
            max_val, predicted_class = probs.max(1)
            binary_results.append((predicted_class == targets).int())

    
    binary_results = torch.cat(binary_results, dim=0)
    
    file_name = '/test.csv'
    temp = args.save_location + file_name
    print('location of the file is:', temp)
    with open(temp, 'w') as out_file:
        filenames = loader.dataset.filenames(basename=True)
        for filename, label in zip(filenames, binary_results.tolist()):
            out_file.write('{0},{1}\n'.format(filename, label))

if __name__ == '__main__':
    main()

My PyTorch version is 1.12.0, and nvcc --version reports CUDA 11.7 (on an Ubuntu 22.04 server).
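For completeness, the same environment details can be gathered from the Python side; a small sketch (it only assumes torch is importable, and the GPU line is skipped on CPU-only machines):

```python
# Report the versions relevant to this issue: PyTorch itself, the CUDA
# version it was built against, cuDNN, and the visible GPU (if any).
import torch

print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```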

Thanks, but it’s not executable as-is since it has a data dependency. Please remove all unneeded parts or switch to a standard dataset.

I apologize for not specifying the relevant details. The script was run on the ImageNet validation set, which contains 50,000 images organized into folders. To run the script, I use: python name-of-script "path-to-image-folders" --model "convnext_xlarge_in22ft1k" --save_location "path" --pretrained

Using torch.cuda.synchronize() in every iteration of the loop seems to fix this as well.
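That all three workarounds (an explicit .cpu() copy, a print, a torch.cuda.synchronize()) behave the same suggests the stored result tensors alias memory that the asynchronous prefetching loader reuses before the values are actually read. Here is a CPU-only analogy of that hazard, with a hypothetical reusing_loader standing in for the prefetcher (plain Python, no CUDA needed):

```python
# A loader that reuses one buffer per batch, the way an asynchronous
# prefetcher may reuse device memory before a consumer reads it.
def reusing_loader(batches):
    buf = [0] * 4                # one shared buffer
    for batch in batches:
        buf[:] = batch           # overwrite in place, like a prefetch copy
        yield buf                # the caller only gets a reference

batches = [[0, 1, 2, 3], [4, 5, 6, 7]]

# Keeping the yielded object directly stores a reference to the shared
# buffer, so the first "result" is silently clobbered by the second batch:
aliased = [b for b in reusing_loader(batches)]

# Taking a copy each iteration (the analogue of .cpu() or a forced sync)
# snapshots the values before the buffer is reused:
copied = [list(b) for b in reusing_loader(batches)]

print(aliased[0])  # [4, 5, 6, 7] -- wrong, first batch was overwritten
print(copied[0])   # [0, 1, 2, 3] -- correct
```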