Error Cholesky CPU

Hi, I’m getting this RuntimeError while running an optimization algorithm. It ran without problems two days ago, but now I run into this error every time. The error doesn’t always show up on the same iteration: sometimes the algorithm runs for 23 iterations, sometimes for only one or none at all.
This is the error message:

RuntimeError                              Traceback (most recent call last)
     44         tilts, outputs, rewards = Algorithm5.update_data(next_config)
     45         # update the model
---> 46         Algorithm5.fit_model(next_config)
     48         end_k = time.time()

 in fit_model(self, new_config)
     54             optimizer.zero_grad()
     55             output = self.model(self.model.train_inputs[0])
---> 56             loss = -mll(output, self.model.train_targets)
     57             loss.backward()
     58             optimizer.step()

~\anaconda3\lib\site-packages\gpytorch\ in __call__(self, *inputs, **kwargs)
     27     def __call__(self, *inputs, **kwargs):
---> 28         outputs = self.forward(*inputs, **kwargs)
     29         if isinstance(outputs, list):
     30             return [_validate_module_outputs(output) for output in outputs]

~\anaconda3\lib\site-packages\gpytorch\mlls\ in forward(self, function_dist, target, *params)
     49         # Get the log prob of the marginal distribution
     50         output = self.likelihood(function_dist, *params)
---> 51         res = output.log_prob(target)
     53         # Add additional terms (SGPR / learned inducing points, heteroskedastic likelihood models)

~\anaconda3\lib\site-packages\gpytorch\distributions\ in log_prob(self, value)
    134         # Get log determininat and first part of quadratic form
--> 135         inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)
    137         res = -0.5 * sum([inv_quad, logdet, diff.size(-1) * math.log(2 * math.pi)])

~\anaconda3\lib\site-packages\gpytorch\lazy\ in inv_quad_logdet(self, inv_quad_rhs, logdet, reduce_inv_quad)
   1000             from .chol_lazy_tensor import CholLazyTensor
-> 1002             cholesky = CholLazyTensor(self.cholesky())
   1003             return cholesky.inv_quad_logdet(inv_quad_rhs=inv_quad_rhs, logdet=logdet, reduce_inv_quad=reduce_inv_quad)

~\anaconda3\lib\site-packages\gpytorch\lazy\ in cholesky(self, upper)
    737             (LazyTensor) Cholesky factor (lower triangular)
    738         """
--> 739         res = self._cholesky()
    740         if upper:
    741             res = res.transpose(-1, -2)

~\anaconda3\lib\site-packages\gpytorch\utils\ in g(self, *args, **kwargs)
     32         cache_name = name if name is not None else method
     33         if not is_in_cache(self, cache_name):
---> 34             add_to_cache(self, cache_name, method(self, *args, **kwargs))
     35         return get_from_cache(self, cache_name)

~\anaconda3\lib\site-packages\gpytorch\lazy\ in _cholesky(self)
    413         # contiguous call is necessary here
--> 414         cholesky = psd_safe_cholesky(evaluated_mat).contiguous()
    415         return NonLazyTensor(cholesky)

~\anaconda3\lib\site-packages\gpytorch\utils\ in psd_safe_cholesky(A, upper, out, jitter)
     46             except RuntimeError:
     47                 continue
---> 48         raise e

~\anaconda3\lib\site-packages\gpytorch\utils\ in psd_safe_cholesky(A, upper, out, jitter)
     23     """
     24     try:
---> 25         L = torch.cholesky(A, upper=upper, out=out)
     26         return L
     27     except RuntimeError as e:

RuntimeError: cholesky_cpu: U(135,135) is zero, singular U.

What can I do?

Hi RR!

It looks like this is a known bug – see the following two github issues:

Even though these two issues are marked closed, it’s not clear to me
that the core issue has been addressed.

In any event, it appears that some upstream code is letting a singular
matrix escape to a Cholesky-decomposition routine that expects a
positive-definite matrix.
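As a minimal illustration of that failure mode (using NumPy rather than torch, purely to keep the sketch dependency-light): a rank-deficient covariance matrix breaks the Cholesky factorization, while a tiny diagonal “jitter” restores strict positive definiteness:

```python
import numpy as np

# Rank-deficient (singular) "covariance" matrix: the second row
# duplicates the first, so it is only positive SEMI-definite.
K = np.array([[1.0, 1.0, 0.5],
              [1.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])

try:
    np.linalg.cholesky(K)           # expects strictly positive definite
except np.linalg.LinAlgError as err:
    print("Cholesky failed:", err)  # analogous to the cholesky_cpu error

# A small diagonal "jitter" makes the matrix strictly positive
# definite again, and the factorization goes through.
L = np.linalg.cholesky(K + 1e-6 * np.eye(3))
```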

These two github issues (as well as others) discuss some possible
work-arounds. You could read through the issue discussions and see
if any of the suggestions work for your use case.
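One workaround commonly suggested in such threads is to retry the factorization with progressively larger diagonal jitter – roughly the strategy that the psd_safe_cholesky visible in your traceback follows internally. Here is a standalone sketch of that idea (NumPy is used only to keep the example self-contained; this is not the library’s actual implementation):

```python
import numpy as np

def cholesky_with_jitter(A, max_tries=5, initial_jitter=1e-8):
    """Cholesky with retry: on failure, add progressively larger
    diagonal jitter until the matrix is numerically positive
    definite. Illustrative sketch only."""
    try:
        return np.linalg.cholesky(A)
    except np.linalg.LinAlgError:
        pass
    jitter = initial_jitter
    eye = np.eye(A.shape[-1])
    for _ in range(max_tries):
        try:
            return np.linalg.cholesky(A + jitter * eye)
        except np.linalg.LinAlgError:
            jitter *= 10.0  # escalate and try again
    raise RuntimeError("matrix is not positive definite, even with jitter")
```

Raising the likelihood’s noise floor or running the model in double precision are variations on the same theme: keeping the covariance matrix numerically positive definite.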

If you’re not running the latest and greatest, you could try upgrading to
the up-to-date version of whatever package you are using and see if
the github issues were, in fact, legitimately closed.

Good luck.

K. Frank