Hello,

I am trying to train a network in a similar way to that proposed in https://arxiv.org/abs/1703.05175 (prototypical networks).

I think I have either misunderstood the paper or something strange is happening with my gradients.

Each mini-batch in my network contains 10–32 datapoints/classes to classify, plus 3 examples from each class used to compute class ‘centres’.

Let’s say there are always 10 classes. I then compute the cosine similarity between each of the 10 datapoints and the 10 centres, and use a softmax to normalise.
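In case it helps, here is a minimal self-contained sketch of the scoring step I mean, with made-up sizes (10 classes, embedding dimension 8) and random tensors standing in for my actual embeddings:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_classes, dim = 10, 8                    # made-up sizes for illustration
queries = torch.randn(n_classes, dim)     # one query embedding per class
centers = torch.randn(n_classes, dim)     # class centres (mean of 3 examples each)

# Cosine similarity of every query against every centre, via broadcasting:
# (10, 1, 8) vs (1, 10, 8) -> (10, 10) score matrix.
scores = F.cosine_similarity(queries.unsqueeze(1), centers.unsqueeze(0), dim=2)

# Normalise each query's scores over the 10 centres.
log_probs = F.log_softmax(scores, dim=1)
print(log_probs.shape)  # torch.Size([10, 10])
```

Each row of `log_probs` is then matched against the query's class label with NLL.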

The labels for each mini-batch are always 1–10, and the network is trained by minimising the NLL. My loss computes fine, but when I call backward() on it, I get an error: **none of the leaf nodes require gradients**.

To check whether I was doing something really silly: if I feed ‘features’ (in the code snippet below) to a linear layer with 10 outputs, and feed that to my loss, then backward() works fine. So I guess something is going wrong when I compute the cosine distances. I am doing this within the forward pass:

```
features = self.fc(x)

enroll_embs = features[:enr, :]
test_embs = features[enr:, :]

output = self.output(enroll_embs)  # for sanity check

speaker_centers = torch.Tensor(enroll_embs.size(0), 2048)

# Average each group of 3 examples into one class centre.
ptr = 0
for ind in range(enroll_embs.size(0)):
    center = test_embs[ptr:ptr + 3, :]
    center = torch.mean(center, 0)
    speaker_centers[ind, :] = center.data
    ptr += 3

speaker_centers = Variable(speaker_centers.cuda())
cosine_scores = Variable(torch.Tensor(enroll_embs.size(0), 32).cuda())

# Score each enrolment embedding against every centre.
for ind, emb in enumerate(enroll_embs):
    rep_emb = emb.repeat(enroll_embs.size(0), 1)
    cosine_score = F.cosine_similarity(rep_emb, speaker_centers)
    cosine_scores[ind, :] = F.log_softmax(cosine_score.data)

return cosine_scores, output
```
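For what it's worth, my current guess is that the centres need to be built from differentiable ops only (no `.data`, no freshly allocated `Tensor`/`Variable`) so everything stays in the graph. A toy sketch of that idea, with made-up sizes and a random tensor standing in for `self.fc(x)` — is this the right direction?

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
enr, dim = 10, 8
# Stand-in for self.fc(x): 10 enrolment rows + 30 test rows (3 per class).
features = torch.randn(enr * 4, dim, requires_grad=True)
enroll_embs = features[:enr, :]
test_embs = features[enr:, :]

# Centres via view + mean, which keep the autograd graph intact.
speaker_centers = test_embs.view(enr, 3, dim).mean(dim=1)

# All pairwise cosine scores at once, then log-softmax per row.
scores = F.cosine_similarity(
    enroll_embs.unsqueeze(1), speaker_centers.unsqueeze(0), dim=2
)
log_probs = F.log_softmax(scores, dim=1)

log_probs.sum().backward()
print(features.grad is not None)  # True
```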

Any advice would be most appreciated.

Thanks