A Workaround for HalfTensor lacking Stateless Methods

Hey guys,

EDIT: SEE BELOW

I’m working on using FP16 in DenseNets, where concatenation is a key part of the architecture, but since HalfTensors don’t support stateless methods I can’t call cat().

I’ve worked around it by instantiating a temporary variable to store the concatenated results and filling it by indexing, but then I get an error calling backward() that looks like it comes from the final layer:

torch.mm(grad_output, weight)
RuntimeError: Type HalfTensor doesn't implement stateless methods
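For context, the workaround looks roughly like the following sketch (a modern, CPU-friendly rendering; `cat_via_copy` is my name, and `narrow`/`copy_` stand in for the indexing assignment described):

```python
import torch

def cat_via_copy(tensors, dim=0):
    # Workaround sketch: preallocate one output tensor of the final
    # size, then copy each input into its slice, instead of calling
    # the (then-unsupported) stateless cat().
    sizes = list(tensors[0].size())
    sizes[dim] = sum(t.size(dim) for t in tensors)
    out = tensors[0].new_empty(sizes)
    offset = 0
    for t in tensors:
        length = t.size(dim)
        out.narrow(dim, offset, length).copy_(t)
        offset += length
    return out
```

On current PyTorch this matches `torch.cat` exactly, but it does an extra allocation-plus-copy per call, which is presumably the inefficiency mentioned below.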

Is there an easy way to work around this, or is training in half precision not supported yet?

I’m on CUDA 8.0, cuDNN 5105, Ubuntu 14.04, PyTorch 0.1.9 (the latest version you get through conda install), and a Maxwell Titan X. I’m in a situation where memory costs matter more than speed, so I can eat the slowdown of FP16 if it reduces memory usage.
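The memory-over-speed trade-off is easy to check: FP16 halves per-element storage, and that halving is the whole draw here, since Maxwell cards get little or no FP16 arithmetic speedup. A quick sanity check:

```python
import torch

# FP32 vs FP16 storage: 4 bytes/element vs 2 bytes/element, so a
# .half() copy of a tensor takes half the memory.
x32 = torch.randn(1024, 1024)   # ~4 MB of data
x16 = x32.half()                # ~2 MB of data
print(x32.element_size(), x16.element_size())
```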

As an aside, I get an error when calling torch.backends.cudnn.version() if I don’t call torch.backends.cudnn._loadlib() first (the “cuDNN not initialized” error). cuDNN still works, but this can be confusing when you’re trying to check whether a source build has succeeded.
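The workaround amounts to forcing the library to load before querying it; something like this sketch (the fallback branch only applies to old builds where `_loadlib()`, an internal helper of that era, existed; newer builds load cuDNN lazily and the `try` succeeds directly):

```python
import torch.backends.cudnn as cudnn

# Sketch: query the cuDNN version, loading the library first if the
# old "cuDNN not initialized" RuntimeError comes up.
try:
    v = cudnn.version()
except RuntimeError:
    cudnn._loadlib()      # internal helper on old builds
    v = cudnn.version()
print(v)                  # a version int, or None if cuDNN is absent
```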

EDIT: Never mind; it looks like nothing other than F.linear has this issue, so switching over to the stateful version (three edits) fixed it, and it’s now running smoothly and quickly in FP16. Dongers up for pytorch, woo!
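For anyone hitting the same thing, my guess at the kind of swap described (names and shapes here are illustrative, not from the original post): replace the stateless F.linear call with the stateful nn.Linear module, which owns its weight and bias and dispatches through methods the old HalfTensor did implement.

```python
import torch
import torch.nn as nn

# Stateful replacement for a stateless F.linear(x, weight, bias) call.
layer = nn.Linear(8, 4)
x = torch.randn(2, 8)
y = layer(x)   # same math as F.linear(x, layer.weight, layer.bias)
```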

It would probably still make sense to add a stateless cat() method, though; I suspect the way I’m doing it is inefficient.

Thanks,

Andy

If you could give a repro for the F.linear issue, we’re happy to fix this ASAP. Please open an issue on pytorch/pytorch with the repro.

Done, and I’ve included a link to my fix.

Any ideas on a better workaround for the cat() issue would also be appreciated. (I’m guessing the reason tensors don’t have a tensor.cat() method attached to them is that it would change their shape/memory size?)

It’s hard to say what would be a good choice of semantics for tensor.cat() (should it modify the tensor? should it be included in the list?), so we’d rather stick to the stateless one. If the method is implemented in THC, then exposing it is a matter of changing one line in our C wrappers. Otherwise, it will require some changes in the backends.
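To illustrate the stateless semantics being preferred here (a minimal example of the existing free-function form, not of any proposed method): torch.cat takes the full list of inputs and returns a fresh tensor, so there is no mutation and no ambiguity about whether a receiver tensor belongs in the list.

```python
import torch

# Stateless concatenation: a free function over an explicit list.
a = torch.ones(2, 3)
b = torch.zeros(2, 3)
c = torch.cat([a, b], dim=0)
print(c.shape)   # torch.Size([4, 3])
```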