Confused about the inputs configuration

I found that the ImageNet example on GitHub does not use input.cuda(async=True):

for i, (input, target) in enumerate(train_loader):
    # measure data loading time
    data_time.update(time.time() - end)

    target = target.cuda(async=True)
    input_var = torch.autograd.Variable(input)
    target_var = torch.autograd.Variable(target)

while other examples do move the inputs to the GPU, e.g. the MNIST example on GitHub:

for batch_idx, (data, target) in enumerate(train_loader):
    if args.cuda:
        data, target = data.cuda(), target.cuda()

So is it meaningful to call input.cuda() at all? Is it meaningful to set async=True for the inputs? Is there any practical difference between the two approaches?

The ImageNet example looks like a bug to me: if the input stays on the CPU while the model is on the GPU, it should raise a type error when a GPU tensor meets a CPU tensor.
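For reference, here is my current understanding as a sketch (the helper name to_device is my own, not from either example): async=True (renamed non_blocking=True in newer PyTorch, since async became a reserved word in Python) only enables a truly asynchronous host-to-device copy when the source CPU tensor is in pinned (page-locked) memory; otherwise the copy is synchronous regardless of the flag.

```python
import torch

def to_device(t, device):
    # Hypothetical helper for illustration. pin_memory() page-locks the
    # host tensor so that the subsequent copy to the GPU can overlap
    # with computation; non_blocking=True is a no-op for unpinned
    # tensors or CPU-to-CPU moves.
    if device.type == "cuda":
        t = t.pin_memory()
    return t.to(device, non_blocking=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = torch.randn(4, 3)
batch_dev = to_device(batch, device)
```

If that understanding is right, the async=True on target.cuda() in the ImageNet example only pays off because the DataLoader there is constructed with pin_memory=True.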