How to return a boolean tensor depending on a condition?

Consider the Cartesian product between a set of query_idx and a set of doc_idx, as shown below:

torch.cartesian_prod(preds["query_idx"], preds["doc_idx"])
# tensor([
#     [0, 9], [0, 5], [0, 7], [1, 9], [1, 5], [1, 7], [2, 9], [2, 5], [2, 7]
# ])
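For reference, the product above can be reproduced like this (the exact contents of preds are an assumption inferred from the output shown):

```python
import torch

# hypothetical inputs matching the output shown above
preds = {
    "query_idx": torch.tensor([0, 1, 2]),
    "doc_idx": torch.tensor([9, 5, 7]),
}

# every (query, doc) combination, in row-major order
cartesian_prod = torch.cartesian_prod(preds["query_idx"], preds["doc_idx"])
```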

Also consider the relevance map, which states which documents are relevant to a given query:

relevance_map = {
    0: [5, 9], # docs 5 and 9 are relevant results for query 0
    1: [9],
    2: [5, 7],
}
Then, how can we generate a new tensor classifying each pair of the Cartesian product as True (relevant) or False (non-relevant), according to the relevance map?

    True, True, False, True, False, False, False, True, True

Hi Aldebaran, you could try this:

torch.Tensor([x[1].item() in relevance_map.get(x[0].item(), []) for x in cartesian_prod]).bool()
tensor([ True,  True, False,  True, False, False, False,  True,  True])

It is a solution. However, it’s not differentiable because it uses the .item() operation.

I’m not sure that this type of result is differentiable, since it’s so discontinuous. I think you’d have to define a continuous function that somehow captures how close a pair in your cartesian_prod is from being relevant (say, with 1 for totally irrelevant and 0 if in the relevant map, and somehow smooth in between) and run .sum().backward() on that. Basically I think you need to extend this discrete math problem to a continuous case, in some way, and then differentiate that.

Personally I’m not sure how to do that.

Hopefully I’m wrong and there’s a far simpler and more direct way to do this, and someone more knowledgeable here can help you out.
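To make that idea a bit more concrete, here is one illustrative sketch; the distance measure (absolute difference of doc ids) is purely an assumption about what "close to relevant" could mean:

```python
import torch

# relevance map and pairs from the question
relevance_map = {0: [5, 9], 1: [9], 2: [5, 7]}
cartesian_prod = torch.tensor(
    [[0, 9], [0, 5], [0, 7], [1, 9], [1, 5], [1, 7], [2, 9], [2, 5], [2, 7]],
    dtype=torch.float32,
    requires_grad=True,
)

# pad the relevance lists with inf so they fit into one tensor
max_len = max(len(v) for v in relevance_map.values())
rel = torch.tensor(
    [relevance_map[q] + [float("inf")] * (max_len - len(relevance_map[q]))
     for q in sorted(relevance_map)]
)

# distance from each pair's doc id to the nearest relevant doc for its query
dist, _ = (rel[cartesian_prod[:, 0].long()] - cartesian_prod[:, 1:2]).abs().min(dim=1)

# smooth score: 0 when the pair is relevant, approaching 1 the farther away it is
score = 1 - torch.exp(-dist)
score.sum().backward()  # gradients flow back to cartesian_prod
```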

EDIT: Thinking about this more, one way to convert this to a differentiable problem would be to make your output a matrix shaped like (number_of_docs x number_of_queries). Its entries would be floats in the (0, 1) range. Your target would be a similarly shaped matrix whose values are either 0 or 1 depending on whether that pair of (document, query) is in the relevance map or not. You can run some type of binary cross entropy loss between these two matrices. Of course, you could run into memory problems if you have lots of docs and queries but your relevance map is very sparse, etc.
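A minimal sketch of that matrix formulation, assuming 3 queries and 10 documents (doc ids 0–9); the model scores here are random placeholders:

```python
import torch
import torch.nn.functional as F

n_queries, n_docs = 3, 10  # assumed sizes; the doc ids above go up to 9
relevance_map = {0: [5, 9], 1: [9], 2: [5, 7]}

# dense 0/1 target: target[q, d] = 1 iff doc d is relevant to query q
target = torch.zeros(n_queries, n_docs)
for q, docs in relevance_map.items():
    target[q, docs] = 1.0

# hypothetical model scores, one per (query, doc) pair
scores = torch.randn(n_queries, n_docs, requires_grad=True)
loss = F.binary_cross_entropy_with_logits(scores, target)
loss.backward()  # loss is differentiable w.r.t. the scores
```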


I am assuming that by looking for a differentiable solution, you mean that the resulting tensor has requires_grad = True in the end. If so, then we have a problem, because you want a boolean tensor.

If we look at the documentation for autograd we can see that only floating point tensors are supported.
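You can verify this directly; PyTorch refuses to track gradients on a boolean tensor (the exact error message may vary across versions):

```python
import torch

# requires_grad is only supported for floating point and complex dtypes
try:
    torch.tensor([True, False], requires_grad=True)
except RuntimeError as err:
    print(f"RuntimeError: {err}")
```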

If this is the case, then I don't think it is going to be possible to get what you want (a differentiable boolean tensor).

However, let me try and fail with extra steps:


  • You don't care that relevance_map is a dict

We can change it into a tensor in which all rows have the same length. For this, I padded the shorter rows with -inf.

cartesian_prod = torch.tensor([[0., 9], [0, 5], [0, 7], [1, 9], [1, 5], [1, 7], [2, 9], [2, 5], [2, 7]], requires_grad=True)
relevance_map = torch.tensor([[5,9],[-float('inf'), 9],[5,7]])

Solution 1 - no grad

We get a boolean tensor; however, the gradient is lost when we do the equality (==) operation. Even if it weren't, any would remove it as well.

tmp = torch.any(relevance_map[cartesian_prod[:, 0].long()] == cartesian_prod[:, 1].unsqueeze(0).T, dim=1)
# tmp: tensor([ True,  True, False,  True, False, False, False,  True,  True])
# tmp.requires_grad: False

Solution 2 - not boolean (and weird format)

Here all operations should be differentiable. The problem is that the output is a float tensor, where 0 means True and anything other than 0 means False (as I said, a weird format).

tmp2, _ = torch.abs(relevance_map[cartesian_prod[:, 0].long()] - cartesian_prod[:, 1].unsqueeze(0).T).min(dim=1)
# tmp2: tensor([0., 0., 2., 0., 4., 2., 2., 0., 0.], grad_fn=<MinBackward0>)
# tmp2.requires_grad: True

Solution 3 - even more unnecessary steps - still no boolean

Using tmp2 from the last solution.
Here 1 means True and 0 means False. (that's better)

div = tmp2.clone()
div[div==0] = 1

tmp3 = -torch.div(tmp2, div) + 1
# tensor([1., 1., 0., 1., 0., 0., 0., 1., 1.], grad_fn=<AddBackward0>)
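For completeness, here are Solutions 2 and 3 combined into one self-contained snippet; detaching the divisor so it acts as a constant scale is my own choice here:

```python
import torch

# setup from above: pairs with requires_grad, relevance map padded with -inf
cartesian_prod = torch.tensor(
    [[0., 9], [0, 5], [0, 7], [1, 9], [1, 5], [1, 7], [2, 9], [2, 5], [2, 7]],
    requires_grad=True,
)
relevance_map = torch.tensor([[5., 9], [-float("inf"), 9], [5, 7]])

# Solution 2: distance to the nearest relevant doc (0 == relevant)
tmp2, _ = torch.abs(
    relevance_map[cartesian_prod[:, 0].long()] - cartesian_prod[:, 1].unsqueeze(0).T
).min(dim=1)

# Solution 3: rescale so that 1 == relevant and 0 == non-relevant
div = tmp2.detach().clone()  # treat the divisor as a constant
div[div == 0] = 1
tmp3 = -torch.div(tmp2, div) + 1
```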