Copy a GPU tensor (network output) to the CPU, but into preallocated memory

Assume I have a tensor a, the float output of a network with shape [100, 100], and a preallocated float* buffer a_cpu on the host. I could call a.cpu() and then memcpy the data into that pointer, but that is double work: one device-to-host copy plus one host-to-host copy. Is there a way to do the .cpu() transfer but have it write directly into my preallocated memory?
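
For concreteness, this is roughly what my current two-copy version looks like in libtorch. The names a and a_cpu are as above; the function name and the shape check are just illustrative:

```cpp
#include <cstring>
#include <torch/torch.h>

// Current approach: two copies.
// 1) a.cpu() allocates a fresh CPU tensor and copies device -> host.
// 2) memcpy copies host -> host into the preallocated buffer.
void copy_to_preallocated(const torch::Tensor& a, float* a_cpu) {
    // a is a CUDA float tensor of shape [100, 100]; a_cpu points to
    // preallocated host memory holding at least a.numel() floats.
    torch::Tensor tmp = a.cpu().contiguous();       // device -> host (new allocation)
    std::memcpy(a_cpu,                              // host -> host (the redundant copy)
                tmp.data_ptr<float>(),
                tmp.numel() * sizeof(float));
}
```

Ideally the device-to-host transfer would land in a_cpu directly, without the intermediate tensor.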