RuntimeError: value cannot be converted to type float without overflow: inf?

With a smaller learning rate the network trains, but training takes much longer. What other methods can solve this problem? Thanks.
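Two common alternatives to a uniformly small learning rate are a short learning-rate warm-up (start small, ramp up to the target rate) and clipping the gradient norm so that one bad step cannot blow up the weights. In the log, iteration 0 completes and the crash follows, which suggests (not confirmed) that the first update itself is already too large. The sketch below shows both ideas in plain Python on lists of floats; `warmup_lr`, `clip_by_global_norm` and `start_factor` are hypothetical names, not part of PyramidBox or PyTorch.

```python
import math


def warmup_lr(iteration, base_lr, warmup_iters, start_factor=0.01):
    """Linearly ramp the learning rate from start_factor * base_lr up to
    base_lr over the first warmup_iters iterations, then hold it at base_lr."""
    if iteration >= warmup_iters:
        return base_lr
    alpha = iteration / warmup_iters
    return base_lr * (start_factor + (1.0 - start_factor) * alpha)


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one common factor so that their combined L2 norm
    is at most max_norm. grads is a list of lists of floats (one list per
    parameter tensor). Returns (clipped_grads, norm_before_clipping)."""
    total_norm = math.sqrt(sum(g * g for param in grads for g in param))
    if total_norm <= max_norm or total_norm == 0.0:
        # Already small enough: return an unchanged copy.
        return [list(param) for param in grads], total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in param] for param in grads], total_norm
```

In a PyTorch training loop these ideas map onto existing APIs roughly as follows: before `optimizer.step()`, assign the warm-up value to `param_group['lr']` for every entry in `optimizer.param_groups`, and after `loss.backward()` call `torch.nn.utils.clip_grad_norm(net.parameters(), max_norm)` (renamed `clip_grad_norm_` in PyTorch 0.4). Here `net`, `optimizer` and `max_norm` stand for your own objects. The warm-up length and `max_norm` are tuning knobs, and any values you pick for them are starting points rather than recommendations.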

/home/lshi22/.local/lib/python3.5/site-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.23) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
Initializing weights…
Loading Dataset…
Training SSD on WiderFace
/home/lshi22/.local/lib/python3.5/site-packages/torch/autograd/functions/tensor.py:447: UserWarning: mask is not broadcastable to self, but they have the same number of elements. Falling back to deprecated pointwise behavior.
  return tensor.masked_fill(mask, value)
front and back Timer: 22.854674100875854 sec.
iter 0 || Loss: 71.4618 ||
Loss conf: 38.856040954589844 Loss loc: 13.214371681213379
Loss head conf: 28.07379722595215 Loss head loc: 10.708907127380371
lr: 0.001
Saving state, iter: 0
Traceback (most recent call last):
  File "train.py", line 241, in <module>
    train()
  File "train.py", line 188, in train
    loss_l, loss_c = criterion(tuple(out[0:3]), targets)
  File "/home/lshi22/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lshi22/PyramidBox/layers/modules/multibox_loss.py", line 102, in forward
    loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))
  File "/home/lshi22/PyramidBox/layers/box_utils.py", line 246, in log_sum_exp
    return torch.log(torch.sum(torch.exp(x-x_max), 1, keepdim=True)) + x_max
RuntimeError: value cannot be converted to type float without overflow: inf
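The last frame computes log-sum-exp as `torch.log(torch.sum(torch.exp(x-x_max), 1, keepdim=True)) + x_max`, and the error appears to be raised when `x_max` is `inf`. That would mean the confidence logits had already blown up before the loss was computed. The definition of `x_max` is not shown. If it is a single maximum over the whole batch, rows whose values sit far below it can underflow to `exp(...) == 0` and give `log(0)`. The usual stable formulation subtracts each row's own maximum instead, which in PyTorch would be `x.max(1, keepdim=True)[0]`. The sketch below shows that idea in plain Python on lists of floats. `log_sum_exp_rows` and `assert_finite` are hypothetical helpers, not PyramidBox code. This only makes the loss better behaved. If the logits already contain `inf`, the divergence itself still has to be stopped, and a finite check just helps locate where it first appears.

```python
import math


def log_sum_exp_rows(rows):
    """Numerically stable log(sum(exp(x))) computed separately for each row.

    Subtracting each row's own maximum keeps every exp() argument <= 0, so
    nothing overflows, and the row's largest term becomes exp(0) = 1, so the
    sum can never underflow to 0.
    """
    out = []
    for row in rows:
        m = max(row)
        if math.isinf(m):
            # Max is +inf (result is +inf) or the row is all -inf (result is
            # -inf); return it directly instead of computing inf - inf = nan.
            out.append(m)
            continue
        out.append(m + math.log(sum(math.exp(v - m) for v in row)))
    return out


def assert_finite(rows, name="logits"):
    """Raise ValueError naming the first nan/inf entry, to catch divergence early."""
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if not math.isfinite(v):
                raise ValueError(
                    "{} has non-finite value {} at [{}, {}]".format(name, v, i, j))
```

For example, `log_sum_exp_rows([[1000.0, 1000.0]])` returns about `1000.693`, whereas a naive `math.exp(1000.0)` raises `OverflowError`.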