Typecasting network and data to HalfTensor results in all parameters = NaN

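The limited range and precision of half (float16) can bite you fairly easily: the largest finite value is 65504, so intermediate results overflow to inf quickly, and NaNs follow from there. For example, batch norm doesn’t seem to work well with half. There is some more discussion in the mail and thread linked below.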
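
To make the failure mode concrete, here is a minimal sketch of how a variance-style computation can turn into NaN in half precision. It is only an illustration, not PyTorch's actual batch norm code: `to_half` and `half_variance` are hypothetical helpers that emulate float16 rounding with the standard library's `struct` module, and the naive E[x^2] - E[x]^2 formula is an assumption chosen because it shows the overflow clearly.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration; it emulates float16 without a GPU.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; float16 arithmetic
        # would produce +/-inf here, so we mimic that.
        return math.copysign(math.inf, x)


def half_variance(values):
    """Naive E[x^2] - E[x]^2 with every intermediate rounded to half."""
    n = float(len(values))
    total = 0.0
    total_sq = 0.0
    for v in values:
        h = to_half(v)
        total = to_half(total + h)
        # 300 * 300 = 90000 exceeds 65504, so this term becomes inf.
        total_sq = to_half(total_sq + to_half(h * h))
    mean = to_half(total / n)
    mean_sq = to_half(total_sq / n)
    # inf - inf is NaN, and NaN then spreads through everything downstream.
    return to_half(mean_sq - to_half(mean * mean))


activations = [300.0, 301.0, 299.0, 300.0]
print(half_variance(activations))
```

The inputs themselves are perfectly ordinary and fit in half; it is the squared intermediate that overflows. Once a single NaN reaches a gradient, the optimizer step writes it into the parameters, which would be consistent with seeing all parameters become NaN.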

Best regards

Thomas