RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. Can't use GPU with tacotron2

Hasan_Khan · August 23, 2021, 8:57am

Shifting CUDA to CPU for Inferencing

I am trying to run inference with my trained text-to-speech Tacotron2 model on the CPU. The model originally ran inference on a GPU, but since no GPU is available I am moving it to the CPU. I have made the required changes, such as setting map_location=torch.device('cpu').

The error is still not resolved. Please help me understand the issue and fix it. Thanks!

ptrblck · August 23, 2021, 9:08am

I cannot reproduce the issue, as the torch.hub.load method would already load the model as described in the docs:

>>> import torch
>>> waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp32')
Downloading: "https://github.com/NVIDIA/DeepLearningExamples/archive/torchhub.zip" to /root/.cache/torch/hub/torchhub.zip
Downloading checkpoint from https://api.ngc.nvidia.com/v2/models/nvidia/waveglow_ckpt_fp32/versions/19.09.0/files/nvidia_waveglowpyt_fp32_20190427
>>> waveglow
WaveGlow(
  (upsample): ConvTranspose1d(80, 80, kernel_size=(1024,), stride=(256,))
  (WN): ModuleList(
    (0): WN(
      (in_layers): ModuleList(
        (0): Conv1d(512, 1024, kernel_size=(3,), stride=(1,), padding=(1,))
        (1): Conv1d(512, 1024, kernel_size=(3,), stride=(1,), padding=(2,), dilation=(2,))
   [...]

Calling torch.load afterwards on it yields the expected seek error:

>>> torch.load(waveglow, map_location='cpu')

AttributeError: 'WaveGlow' object has no attribute 'seek'

PS: you can post code snippets by wrapping them into three backticks ```, which makes debugging easier and allows to index the code for a better search.

Hasan_Khan · August 23, 2021, 9:26am

Can the model be moved from the GPU to the CPU? Also, what does this seek error indicate?

ptrblck · August 23, 2021, 9:28am

torch.hub.load loads the model onto the CPU so there is no need to push it to the CPU again.

The seek error is raised, since waveglow is an object of the nn.Module type and not a file (which is seek’able).
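
As a minimal sketch of this point (using a small `nn.Linear` as a stand-in, which is an assumption for illustration and not the WaveGlow model): `torch.load` reads from a path or a file-like object, so loading from an in-memory buffer works, while passing the module itself fails.

```python
import io

import torch
import torch.nn as nn

# Stand-in module for illustration only (not the WaveGlow model from this thread).
model = nn.Linear(4, 2)

# torch.save / torch.load operate on paths or file-like objects.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)  # rewind the seekable buffer before reading it back
state_dict = torch.load(buffer, map_location="cpu")

# An nn.Module is not a file, so torch.load cannot read from it.
try:
    torch.load(model, map_location="cpu")
    load_failed = False
except Exception as exc:
    load_failed = True
    print(type(exc).__name__, exc)
```

The exact exception message may differ between PyTorch versions, which is why the sketch catches a generic `Exception`.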

stupid-doge · April 25, 2023, 6:23pm

Actually, I am running into the same issue. I am using the pretrained slow_r50 model. During training I fine-tune the pretrained model and then save the parameters as ‘ResNet18_best.pth’. In the test script I build the model structure and then use torch.load to load the best parameters. However, I hit the same error. Could you help me out? Thanks!

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
import torch
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms
from PIL import Image
from dataset import MyDataset
from models import resnet18


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


transforms = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) 
])


# test_dataset = MyDataset("/hw3_16fpv", "test_for_student.csv", stage="test",ratio=0.2,transform=transforms)
test_dataset = MyDataset("/hw3_16fpv", "test_for_student.csv", stage="test",ratio=0.2,transform=transforms)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=16)
print(len(test_loader))


# net = resnet18(num_classes=10, sample_size=224, sample_duration=16).to(device)
# use the pretrained model
net = torch.hub.load('facebookresearch/pytorchvideo', 'slow_r50', pretrained=True).to(device)
# replace the last layers
net.proj = nn.Linear(in_features=2048, out_features=10, bias=True)
net.load_state_dict(torch.load('ResNet18_best.pth'))



net.eval()
result = []
with torch.no_grad():
    for data in test_loader:
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = net(inputs)
        _, predicted = torch.max(outputs.data, 1)
        result.extend(predicted.cpu().numpy())
        
fread = open("test_for_student.label", "r")
video_ids = []
for line in fread.readlines():
    video_id = os.path.splitext(line.strip())[0]
    video_ids.append(video_id)



with open('result_ResNet18_3D.csv', "w") as f:
    f.writelines("Id,Category\n")
    for i, pred_class in enumerate(result):
        f.writelines("%s,%d\n" % (video_ids[i], pred_class))

The error is:

Using cache found in /home/kaiz/.cache/torch/hub/facebookresearch_pytorchvideo_main
Traceback (most recent call last):
  File "/home/kaiz/hw3/test2csv.py", line 35, in <module>
    net.load_state_dict(torch.load('ResNet18_best.pth'))
  File "/opt/miniconda3/envs/5032/lib/python3.9/site-packages/torch/serialization.py", line 789, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/miniconda3/envs/5032/lib/python3.9/site-packages/torch/serialization.py", line 1131, in _load
    result = unpickler.load()
  File "/opt/miniconda3/envs/5032/lib/python3.9/site-packages/torch/serialization.py", line 1101, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/opt/miniconda3/envs/5032/lib/python3.9/site-packages/torch/serialization.py", line 1083, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/opt/miniconda3/envs/5032/lib/python3.9/site-packages/torch/serialization.py", line 215, in default_restore_location
    result = fn(storage, location)
  File "/opt/miniconda3/envs/5032/lib/python3.9/site-packages/torch/serialization.py", line 182, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/opt/miniconda3/envs/5032/lib/python3.9/site-packages/torch/serialization.py", line 166, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

ptrblck · April 25, 2023, 6:44pm

Did you try the suggested map_location=torch.device('cpu') in torch.load to map the state_dict to the CPU? It seems your system doesn’t have a GPU to move the data to.
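
A minimal sketch of that suggestion, assuming a small stand-in `nn.Linear` and a temporary checkpoint file (both hypothetical; they replace the fine-tuned network and 'ResNet18_best.pth' from the post). The checkpoint here is saved on the CPU, so it does not reproduce the error; it only shows where `map_location` goes.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical small model standing in for the fine-tuned network.
net = nn.Linear(2048, 10)

# Save a state_dict to a temporary file (stands in for 'ResNet18_best.pth').
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save(net.state_dict(), ckpt_path)

# Pick whichever device is actually available at load time.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# map_location remaps the stored tensors to `device` during deserialization,
# so a checkpoint written on a GPU can be read on a CPU-only machine.
state_dict = torch.load(ckpt_path, map_location=device)
net.load_state_dict(state_dict)
net.to(device)
```

Passing the same `device` to both `map_location` and `.to()` keeps the script working whether or not CUDA is visible when it runs.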

stupid-doge · April 25, 2023, 7:07pm

Definitely. CUDA is available (True) during training… however, in this test file, CUDA is sometimes False.
And when I try to use the CPU directly for the whole Python file, it still does not work.

ptrblck · April 25, 2023, 10:11pm

It’s unexpected that CUDA is sometimes available and sometimes not. Did you see this issue before and does your GPU work afterwards? Also, is this specific to one environment or a general issue on your machine?
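
One thing that may be worth checking (a guess, not something confirmed in this thread): the test script above sets CUDA_VISIBLE_DEVICES to '1' before importing torch, which exposes only the GPU with index 1. On a machine whose only GPU has index 0, no device would be visible to PyTorch. A minimal diagnostic sketch, where `cuda_report` is a hypothetical helper name:

```python
import os

import torch


def cuda_report():
    """Collect the values that decide whether PyTorch can see a GPU."""
    return {
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
        "torch_cuda_build": torch.version.cuda,  # None for CPU-only builds
    }


report = cuda_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running this in both the training and the test environment would show whether the two differ in what they expose.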

stupid-doge · April 26, 2023, 3:48am

I have always used a single environment… and I have never run into this before. I think it is caused by torch.hub.load.

Alternatively, do you know how I can view the source code, so that I can copy the model structure and then load the parameters I trained?

Vasil_Popov · July 9, 2023, 6:12am

I have exactly the same situation and problem, except that I’m trying to test a super-resolution model.