Should it really be necessary to do var.detach().cpu().numpy()?

Fair enough, but could we at least get rid of the need for X.cpu().numpy()? It seems X.numpy() alone should be enough.
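
For anyone landing here, a minimal sketch of why each call in the chain is there, assuming a reasonably recent PyTorch and NumPy. The helper name to_numpy is made up for illustration and is not part of PyTorch.

```python
import numpy as np
import torch


def to_numpy(t: torch.Tensor) -> np.ndarray:
    """Hypothetical convenience wrapper (not a PyTorch API)."""
    # detach(): numpy() raises on a tensor that requires grad, so drop
    #           the autograd graph first.
    # cpu():    numpy() raises on a tensor living on an accelerator; cpu()
    #           copies it to host memory and returns the tensor unchanged
    #           if it is already on the CPU.
    return t.detach().cpu().numpy()


# Usage: a tensor that requires grad cannot be converted directly.
x = torch.ones(2, 3, requires_grad=True)
y = x * 2

try:
    y.numpy()
    direct_call_failed = False
except RuntimeError:
    direct_call_failed = True

arr = to_numpy(y)
```

Wrapping the chain in one function at least keeps the call sites short. As far as I know, recent PyTorch releases also accept X.numpy(force=True), which performs the detach and the copy to the CPU for you; check the documentation for your installed version before relying on it.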