Problem with the function torch.index_select(input, dim, index, out=None)

The selection is based on the scores of the frames. So, if the threshold is low enough, the number of selected frames could be larger than 1.

“So, if the threshold is low enough” → I cannot understand this part.

The situation is probably as follows.

The decoder of FCSN consists of several temporal deconvolution operations, which produce a vector of prediction scores with the same length as the input video. Each score indicates the likelihood of the corresponding frame being a key frame or a non-key frame. Based on these scores, we select k key frames to form the predicted summary video.
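
For example, picking k key frames from such per-frame scores might look like this sketch (the shapes, the value of k, and the names are illustrative, not taken from the FCSN code):

import torch

T, k = 10, 3
scores = torch.randn(T)            # per-frame prediction scores from the decoder
topk = torch.topk(scores, k)       # the k highest-scoring frames
summary = torch.zeros(T)
summary[topk.indices] = 1.0        # 1 = key frame, 0 = non-key frame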

Is the score described in the paper the probability that you use?

The output of the FCSN (with shape [1,2,1,T]) is the score indicating whether or not to choose each frame. The scores are actually not limited to [0,1]; they can be any real number, because no activation function is applied before the FCSN output.
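
As a minimal sketch of that output (not the actual FCSN decoder, which uses temporal deconvolutions), a final layer with two channels and no activation produces exactly this kind of unbounded [1,2,1,T] score tensor:

import torch
import torch.nn as nn

T = 100
features = torch.randn(1, 64, 1, T)             # hypothetical decoder features, [1, C, 1, T]
score_head = nn.Conv2d(64, 2, kernel_size=1)    # 2 channels: key / non-key scores
scores = score_head(features)                   # shape [1, 2, 1, T]; unbounded, no activation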

Thank you very much, Tony-Y.

I think there are several approaches (sketched in code after the list):

  1. If Score(key) > Score(not) then it is a key frame.

  2. If Score(key) > threshold then it is a key frame.

etc.
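
For example, assuming channel 0 holds Score(key) and channel 1 holds Score(not), the two rules could be written roughly as:

import torch

scores = torch.randn(1, 2, 1, 5)                           # FCSN output, [1, 2, 1, T]
# Rule 1: key frame if Score(key) > Score(not)
keyframes_rule1 = (scores[:, 0] > scores[:, 1]).float()    # [1, 1, T] of 0/1
# Rule 2: key frame if Score(key) > threshold
threshold = 0.0                                            # illustrative threshold value
keyframes_rule2 = (scores[:, 0] > threshold).float()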

By the way,

summary = x_select + h_select
summary = self.relu_summary(summary)

summary should resemble x_select, but when summary equals x_select, h_select vanishes.

I have implemented the first approach you mentioned, as below.

Thank you very much, Tony-Y.

I have reached the maximum number of replies a new user can create.
Shall we move to GitHub? I have created an issue.

>>> import torch
>>> h = torch.randn(1,2,1,5, requires_grad=True)
>>> h.max(1, keepdim=True)
torch.return_types.max(
values=tensor([[[[-0.2387,  0.0180,  1.2485, -0.1497, -1.0964]]]],
       grad_fn=<MaxBackward0>),
indices=tensor([[[[1, 0, 1, 0, 1]]]]))

The indices tensor is also non-differentiable.
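
You can check this directly; continuing the session above, the indices tensor is an integer tensor with no grad_fn, so no gradient can flow through it:

>>> out = h.max(1, keepdim=True)
>>> out.indices.dtype
torch.int64
>>> out.indices.requires_grad
False
>>> out.indices.grad_fn is None
True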

I got the Basic badge 21 minutes ago.

I will try your suggestion on GitHub.

But how do I implement “index_mask = sigmoid(Score(key) - Score(not))” on the [1,2,1,T] output tensor of the FCSN?

Thank you very much, Tony-Y.

>>> import torch
>>> h = torch.randn(1,2,1,5)
>>> index_mask = torch.sigmoid( h[:,0] - h[:,1] )
>>> index_mask
tensor([[[0.7338, 0.7272, 0.6340, 0.0496, 0.2830]]])

But the operations “h[:,0]” and “h[:,1]” are just like index_select; are they differentiable?

Thank you very much, Tony-Y.

It is a slice of the tensor, and slicing is differentiable.
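
A quick check that gradients really do flow back through h[:,0] and h[:,1]:

import torch

h = torch.randn(1, 2, 1, 5, requires_grad=True)
index_mask = torch.sigmoid(h[:, 0] - h[:, 1])   # slicing + sigmoid, both differentiable
index_mask.sum().backward()                     # backward pass runs without error
print(h.grad.shape)                             # torch.Size([1, 2, 1, 5]): gradients reach h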

Although we get an index_mask that is differentiable, how can we calculate the reconstruction loss and the diversity loss with this index_mask?

My problem is:

  • Suppose index_mask = tensor([[[0.7338, -0.7272, 0.6340, -0.0496, 0.2830]]]); how can I convert index_mask to tensor([[[1, 0, 1, 0, 1]]])? And the conversion process needs to be differentiable.

No component of index_mask is negative, because you use the sigmoid transformation of the score.

Now, every component of index_mask is a real value in the range [0,1]. If you convert it to an integer value, there is no way to differentiate through that conversion.
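
A small illustration of why the hard conversion loses the gradient (the 0.5 threshold is just an example):

import torch

h = torch.randn(1, 2, 1, 5, requires_grad=True)
soft_mask = torch.sigmoid(h[:, 0] - h[:, 1])   # values in (0, 1), differentiable
hard_mask = (soft_mask > 0.5).float()          # 0/1 values, but the graph is cut here
print(soft_mask.requires_grad)                 # True
print(hard_mask.requires_grad)                 # False: no gradient flows through the comparison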

I have followed your idea, but the loss still looks quite weird.

Thank you very much, Tony-Y.

Your code is inconsistent with my proposed approach:

By the way, can you reproduce the results of SUM-FCN?

I revised my reconstruction and diversity losses.
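
For concreteness, a soft mask might enter such losses roughly as follows (a sketch under assumed feature shapes and loss definitions, not necessarily how FCSN.py defines them):

import torch
import torch.nn.functional as F

T, C = 5, 64
features = torch.randn(1, C, T)          # hypothetical per-frame features
reconstruction = torch.randn(1, C, T)    # hypothetical decoder reconstruction
mask = torch.rand(1, 1, T)               # soft index_mask, values in (0, 1)

# Reconstruction loss weighted by the soft mask, so selected frames contribute most
reconst_loss = (mask * (reconstruction - features) ** 2).mean()

# Diversity loss: mean pairwise cosine similarity between masked frame features
masked = F.normalize((mask * features).squeeze(0), dim=0)   # [C, T], each frame unit-normed
sim = masked.t() @ masked                                    # [T, T] cosine similarities
div_loss = (sim.sum() - sim.diagonal().sum()) / (T * (T - 1))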

The reproduction of SUM-FCN is in the FCSN.py file.

Thank you very much, Tony-Y.

Are your results for SUM-FCN consistent with the original results?