AssertionError: min nan should be less than max nan

Thank you. After merging conv and batchnorm I solved that problem, but I have run into this one during quantization-aware training. Do you have any suggestions?

File “/home/g/anaconda3/lib/python3.7/site-packages/torch/quantization/observer.py”, line 165, in _calculate_qparams
zero_point = qmin - round(min_val / scale)
ValueError: cannot convert float NaN to integer

Is there a problem with my data?

Do I need to add exception handling?

I later found that NaN values existed in conv2d.weight, and they appeared during training.

If your weights got a NaN value, this might be due to a NaN input or a faulty weight update caused by e.g. a high learning rate.
Did you observe the loss during training?
If some weights are exploding, you would usually see a NaN loss.
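A quick way to act on this advice is to scan the model's parameters for NaNs after each optimizer step, so an exploding update is caught at the layer where it happens. This is a minimal sketch; `find_nan_params` is a hypothetical helper, not part of PyTorch:

```python
import torch
import torch.nn as nn

def find_nan_params(model: nn.Module):
    """Return the names of parameters that contain at least one NaN."""
    return [name for name, p in model.named_parameters()
            if torch.isnan(p).any().item()]

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))

# Deliberately corrupt one weight to show the check firing.
with torch.no_grad():
    model[0].weight[0, 0, 0, 0] = float('nan')

print(find_nan_params(model))  # ['0.weight']
```

Calling this right after `optimizer.step()` narrows down which update first produced the NaN.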


I have a similar problem when the model uses quantization-aware training, and the training loss is not NaN. How can I solve it?
File “/mnt/storage1/doris/miniconda3/envs/torch-nightly-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py”, line 540, in call
result = self.forward(*input, **kwargs)
File “model-code/resnet34/train_age_quan.py”, line 1130, in forward
x = self.ConvBNReLU1(x)
File “/mnt/storage1/doris/miniconda3/envs/torch-nightly-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py”, line 540, in call
result = self.forward(*input, **kwargs)
File “/mnt/storage1/doris/miniconda3/envs/torch-nightly-py3.6/lib/python3.6/site-packages/torch/nn/modules/container.py”, line 100, in forward
input = module(input)
File “/mnt/storage1/doris/miniconda3/envs/torch-nightly-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py”, line 540, in call
result = self.forward(*input, **kwargs)
File “/mnt/storage1/doris/miniconda3/envs/torch-nightly-py3.6/lib/python3.6/site-packages/torch/nn/intrinsic/qat/modules/conv_fused.py”, line 243, in forward
return self.activation_post_process(F.relu(ConvBn2d._forward(self, input)))
File “/mnt/storage1/doris/miniconda3/envs/torch-nightly-py3.6/lib/python3.6/site-packages/torch/nn/intrinsic/qat/modules/conv_fused.py”, line 95, in _forward
conv = self._conv_forward(input, self.weight_fake_quant(scaled_weight))
File “/mnt/storage1/doris/miniconda3/envs/torch-nightly-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py”, line 540, in call
result = self.forward(*input, **kwargs)
File “/mnt/storage1/doris/miniconda3/envs/torch-nightly-py3.6/lib/python3.6/site-packages/torch/quantization/fake_quantize.py”, line 81, in forward
self.scale, self.zero_point = self.calculate_qparams()
File “/mnt/storage1/doris/miniconda3/envs/torch-nightly-py3.6/lib/python3.6/site-packages/torch/quantization/fake_quantize.py”, line 76, in calculate_qparams
return self.activation_post_process.calculate_qparams()
File “/mnt/storage1/doris/miniconda3/envs/torch-nightly-py3.6/lib/python3.6/site-packages/torch/quantization/observer.py”, line 481, in calculate_qparams
return self._calculate_per_channel_qparams(self.min_vals, self.max_vals)
File “/mnt/storage1/doris/miniconda3/envs/torch-nightly-py3.6/lib/python3.6/site-packages/torch/quantization/observer.py”, line 150, in _calculate_per_channel_qparams
), “min {} should be less than max {}”.format(min_vals[i], max_vals[i])
AssertionError: min nan should be less than max nan

Can you check whether the weights/activations contain NaN values?
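One way to check the activations is to register a forward hook on every module, so the first layer whose output contains a NaN is reported during a normal forward pass. A sketch under that assumption (`add_nan_hooks` is a hypothetical helper):

```python
import torch
import torch.nn as nn

def add_nan_hooks(model: nn.Module):
    """Print a warning whenever a module's output contains a NaN."""
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor) and torch.isnan(output).any():
            print(f"NaN in output of {module.__class__.__name__}")
    for m in model.modules():
        m.register_forward_hook(hook)

net = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
add_nan_hooks(net)

x = torch.full((1, 4), float('nan'))  # a NaN input propagates through
net(x)  # the hooks report each module whose output contains NaN
```

Running one training batch with these hooks installed should reveal whether the NaNs originate in the input, a specific layer, or only in the observer statistics.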

You can avoid this with a masking approach. Note first that in Python, NaN is defined as the value that is not equal to itself:

>>> float('nan') == float('nan')
False
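That self-inequality gives a dependency-free NaN test, which the masking approach builds on. A minimal sketch (the variable names are illustrative):

```python
# `v != v` is True only when v is NaN, since NaN is the one float
# that does not equal itself.
values = [1.0, float('nan'), 3.0]
mask = [v != v for v in values]                # True where v is NaN
cleaned = [0.0 if m else v for v, m in zip(values, mask)]

print(mask)     # [False, True, False]
print(cleaned)  # [1.0, 0.0, 3.0]
```

With NumPy or PyTorch the same idea is spelled `np.isnan(a)` or `torch.isnan(t)`, which can then be used to mask or replace the offending entries before computing min/max statistics.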

It might be worth avoiding NaN altogether when you need integers. NaN literally means "not a number", and it cannot be converted to an integer. In general, Python prefers raising an exception to returning NaN, so things like sqrt(-1) and log(0.0) will raise instead of returning NaN. However, you may still get NaN back from some other library.

That said, since v0.24 you actually can keep the two together: pandas introduced nullable integer data types, which allow integers to coexist with NaN. Even in the latest versions of pandas, though, if the column has object dtype you have to convert it to float first, something like:

df['column_name'].astype(float).astype("Int32")

NB: you have to go through float first and then to the nullable Int32, for some reason; a direct cast from object to "Int32" fails.
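Put together, the two-step cast looks like this. A small sketch assuming a pandas version with nullable integers (0.24+); the column and variable names are made up for illustration:

```python
import pandas as pd

# An object-dtype column containing a missing value.
df = pd.DataFrame({'column_name': ['1', '2', None]})

# Step 1: object -> float (None becomes NaN).
# Step 2: float  -> nullable Int32 (NaN becomes pd.NA).
converted = df['column_name'].astype(float).astype('Int32')

print(converted.dtype)        # Int32
print(int(converted.isna().sum()))  # 1
```

The resulting column holds real integers alongside `pd.NA`, so no information is lost to a float cast and no `ValueError` is raised for the missing entry.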