THCudaCheck FAIL on access to tensor element

I have a GTX 1080 with 8 GB onboard.


>   device = torch.device("cuda")
>   Tensor = torch.cuda.HalfTensor(30*1000*1000, 128)

Here it is OK: the tensor takes less than 8 GB.

>   out = torch.mv(Tensor, Tensor[0])

Still OK; used memory is still less than 8 GB.

>   print(out[0])

And here CUDA fails on the print(!!) (actually on the access to out[0]):

THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorCopy.cpp line=70 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "test_pytorch.py", line 112, in <module>
    main()
  File "test_pytorch.py", line 26, in main
    print(out[0])
  File "/home/integral/.local/lib/python3.5/site-packages/torch/tensor.py", line 57, in __repr__
    return torch._tensor_str._str(self)
  File "/home/integral/.local/lib/python3.5/site-packages/torch/_tensor_str.py", line 256, in _str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/integral/.local/lib/python3.5/site-packages/torch/_tensor_str.py", line 82, in __init__
    copy = torch.empty(tensor.size(), dtype=torch.float64).copy_(tensor).view(tensor.nelement())
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generic/THCTensorCopy.cpp:70

Copying the tensor "out" to the CPU gives the same error.
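
A way to check whether the failure comes from the mv kernel itself rather than from the print is to force a synchronization right after the call (CUDA kernels launch asynchronously, so the error normally surfaces only at the first operation that copies data back). A minimal sketch with the same sizes as above:

>   import torch
>   Tensor = torch.cuda.HalfTensor(30*1000*1000, 128)  # ~7.7 GB of float16
>   out = torch.mv(Tensor, Tensor[0])
>   # Block until all queued kernels finish; if the mv kernel performed an
>   # illegal access, error 77 is reported here instead of at the print.
>   torch.cuda.synchronize()
>   print(out[0])

Setting the environment variable CUDA_LAUNCH_BLOCKING=1 has the same effect without code changes.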

The error is raised whenever I occupy more than half of the GPU memory; with a tensor of 16M rows or fewer, no error arises:

>   Tensor = torch.cuda.HalfTensor(16*1000*1000, 128)
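
That threshold matches a simple size calculation (2 bytes per half-precision element; the arithmetic is mine):

>   for rows in (16 * 1000 * 1000, 30 * 1000 * 1000):
>       print(rows, "rows:", rows * 128 * 2 / 1e9, "GB")
>   # 16000000 rows: 4.096 GB  (about half of 8 GB)
>   # 30000000 rows: 7.68 GB   (well over half)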

One more observation.

I can create two tensors of 15M vectors each (in total about 8 GB, i.e. memory consumption close to 100%), then run torch.mv on both and combine the results. No errors in this case.
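
A sketch of that split (torch.cat at the end is just my way of combining the two partial results; the names a and b are mine):

>   import torch
>   rows, cols = 15 * 1000 * 1000, 128
>   a = torch.cuda.HalfTensor(rows, cols)  # ~3.84 GB
>   b = torch.cuda.HalfTensor(rows, cols)  # ~3.84 GB more, ~7.7 GB total
>   vec = a[0]
>   out = torch.cat([torch.mv(a, vec), torch.mv(b, vec)])  # full 30M result
>   print(out[0])  # prints fine in this case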

The same behaviour occurs with Torch7.