PyTorch version: 1.9.0

`torch.nn.functional.cosine_similarity` shows unexpected behavior with the `torch.int8` dtype.

```
>>> import torch
>>> import torch.nn.functional as F
>>> a = torch.randn(10)*100
>>> a
tensor([ 47.8993, -27.2694, 20.9548, -13.5573, -54.0388, -76.8524, -7.4037,
-95.7477, 108.9276, -44.5625])
>>> b = a
>>> F.cosine_similarity(a, b, dim=-1)
tensor(1.)
>>> F.cosine_similarity(a.to(torch.int8), b, dim=-1)
tensor(3.5215e+12)
>>> F.cosine_similarity(a.to(torch.int16), b, dim=-1)
tensor(1.0000)
```

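As we can see, `F.cosine_similarity(a.to(torch.int8), b, dim=-1)` returns `tensor(3.5215e+12)`, while the expected result is close to `tensor(1.)`, as in the `torch.int16` case. Cosine similarity can never exceed 1 in absolute value, so the `int8` result is clearly wrong.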
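To double-check the expected value independently of PyTorch, here is a minimal pure-Python sketch that recomputes the cosine similarity by hand from the values printed above. It assumes the `int8` cast truncates toward zero for in-range values (which `int()` also does); the `cosine_similarity` helper is a hypothetical reference function for illustration, not the PyTorch implementation.

```python
import math

# Values copied from the tensor `a` printed in the report above.
a = [47.8993, -27.2694, 20.9548, -13.5573, -54.0388,
     -76.8524, -7.4037, -95.7477, 108.9276, -44.5625]

# Assumption: the float -> int8 cast truncates toward zero, like int().
a_int8 = [int(x) for x in a]

# All values fit in the int8 range, so the cast itself does not overflow.
assert all(-128 <= v <= 127 for v in a_int8)


def cosine_similarity(x, y):
    """Reference cosine similarity computed in double precision."""
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return dot / (norm_x * norm_y)


expected = cosine_similarity(a_int8, a)
print(expected)  # very close to 1, nowhere near 3.5215e+12
```

Since every element of `a` fits in `int8`, the cast alone should not produce a wrong answer; this suggests (though does not prove) that the problem lies in how the `int8` input is handled inside the computation.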