TypeError: tensor is not JSON serializable

Hello,

In my training code I write the mean accuracy and the accuracy of each class to a JSON file at every epoch. The code works well in general, except that on one of my datasets it generates an error:

json.dump(class_accuracies, outfile, cls=MyEncoder)
  File "/usr/lib/python2.7/json/__init__.py", line 189, in dump
    for chunk in iterable:
  File "/usr/lib/python2.7/json/encoder.py", line 431, in _iterencode
    for chunk in _iterencode_list(o, _current_indent_level):
  File "/usr/lib/python2.7/json/encoder.py", line 332, in _iterencode_list
    for chunk in chunks:
  File "/usr/lib/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib/python2.7/json/encoder.py", line 442, in _iterencode
    o = _default(o)
  File "/root/computing/models/image_recognition/resnet/khue_lib/train_qopius.py", line 62, in default
    return super(MyEncoder, self).default(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: tensor(0.9238, device='cuda:0', dtype=torch.float64) is not JSON serializable
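For reference, the error can be reproduced with a minimal snippet (a sketch, assuming only that a 0-dim tensor ends up as a value in the dumped dictionary):

import json
import torch

acc = torch.tensor(0.9238, dtype=torch.float64)  # 0-dim tensor, like my accuracy values
json.dumps({'accuracy': acc})  # raises TypeError: ... is not JSON serializable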

What is strange is that this error only occurs on one dataset. I have been using this code for a long time without any problems. My PyTorch version is 1.0, and the main part of the code is attached below.

Thank you very much for your help!

import json
from collections import defaultdict

import torch

class_accuracies = []
best_acc = -1.0
for epoch in range(num_epochs):
    for phase in ['train', 'val']:
        print('Entering phase:', phase)
        if phase == 'train':
            scheduler.step(best_acc)
            model.train()
        else:
            model.eval()

        running_loss = 0.0
        running_corrects = 0

        # Iterate over data.
        print('Iterating over data:')
        n_samples = 0
        for batch_idx, (inputs, labels) in enumerate(dataloaders[phase]):
            inputs = inputs.to(device)
            labels = labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward
            # track history only if in train
            with torch.set_grad_enabled(phase == 'train'):
                outputs = model(inputs)
                _, preds = torch.max(outputs, 1)
                loss = criterion(outputs, labels)

                # backward + optimize only if in training phase
                if phase == 'train':
                    loss.backward()
                    optimizer.step()

            # statistics
            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels.data)
            n_samples += len(labels)

            if phase == 'train':
                print('{}/{} Avg Loss: {:.4f} Avg Acc: {:.4f}'.format(n_samples, dataset_sizes[phase],
                                        running_loss/n_samples, running_corrects.double()/n_samples))
        print('Done iterating over data')
        epoch_loss = running_loss / dataset_sizes[phase]
        epoch_acc = running_corrects.double() / dataset_sizes[phase]
        print('\t{} Loss: {:.4f}\t{} Acc: {:.4f}'.format(phase, epoch_loss, phase, epoch_acc))

    # Here we are at the end of the 'val' phase
    # check for improvement over the last epochs
    if epoch_acc > best_acc:
        print('Improved.')
        best_acc = epoch_acc
        try:
            state_dict = model.module.state_dict()
        except AttributeError:
            state_dict = model.state_dict()
        torch.save(state_dict, save_path)

    # log per-class accuracies for this epoch
    class_acc_dict = defaultdict(str)
    class_acc_dict['epoch'] = epoch
    class_acc_dict['accuracy'] = best_acc
    for i, class_name in enumerate(class_names):
        # unicode() because this runs on Python 2.7
        class_acc_dict[unicode(class_name).encode("utf-8")] = per_class_accuracy[i]
    
    class_accuracies.append(class_acc_dict)
    with open(class_accuracy_file, 'w') as outfile:
        json.dump(class_accuracies, outfile, cls=MyEncoder) 
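For context, MyEncoder is a custom json.JSONEncoder subclass. A tensor-aware version along these lines would sidestep the error entirely (a sketch, not my original encoder):

import json

import numpy as np
import torch

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        # convert 0-dim tensors to Python scalars, larger tensors to nested lists
        if isinstance(obj, torch.Tensor):
            return obj.item() if obj.dim() == 0 else obj.tolist()
        # numpy scalars and arrays are not JSON serializable by default either
        if isinstance(obj, (np.integer, np.floating)):
            return obj.item()
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return super(MyEncoder, self).default(obj)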

Update: Problem solved by changing the code as follows:

if isinstance(per_class_accuracy[i], torch.Tensor):
    per_class_accuracy[i] = per_class_accuracy[i].cpu().numpy()

and

if isinstance(best_acc, torch.Tensor):
    best_acc = best_acc.cpu().numpy()

and

if isinstance(best_acc, torch.Tensor):
    class_acc_dict['accuracy'] = best_acc.cpu().numpy()
else:
    class_acc_dict['accuracy'] = best_acc
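Note that calling .item() on a 0-dim tensor would also work and is simpler: it returns a plain Python float, which json can serialize natively:

if isinstance(best_acc, torch.Tensor):
    best_acc = best_acc.item()  # plain Python float, JSON serializable as-is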

It still seems to me that this is a bug, since the problem occurred on only one dataset.

Did you ever check this? Did it work for you?

I wish there were a torch.to_json() function that took a dictionary and just worked.


My understanding is that the json module is the one in charge of serializing, so it has no way of knowing what to do with tensors unless the PyTorch folks implement a dict-transformation method or something like that. That's a guess.

Anyway, if you are only interested in saving the data as a JSON file so you can read it more easily later (since pickle files are not human-readable), you can do this:


def _to_json_dict_with_strings(dictionary):
    """
    Convert a dict to a dict whose leaves are all strings. Recursively converts
    values to strings if they are not dictionaries themselves.

    Use cases:
        - saving a dictionary of tensors (converts the tensors to strings!)
        - saving arguments from a script (e.g. argparse) so they print prettily
    """
    if not isinstance(dictionary, dict):
        return str(dictionary)
    return {k: _to_json_dict_with_strings(v) for k, v in dictionary.items()}

def to_json(dic):
    # accept plain dicts as well as objects with a __dict__ (e.g. argparse.Namespace)
    if isinstance(dic, dict):
        dic = dict(dic)
    else:
        dic = dic.__dict__
    return _to_json_dict_with_strings(dic)

def save_to_json_pretty(dic, path, mode='w', indent=4, sort_keys=True):
    import json

    with open(path, mode) as f:
        json.dump(to_json(dic), f, indent=indent, sort_keys=sort_keys)

def my_pprint(dic):
    """
    Pretty-print a (possibly nested) dictionary by stringifying its leaves first.

    @param dic: dictionary whose leaves may be tensors or other non-serializable objects
    @return: None (prints the pretty string)

    Note: this is not the same as pprint.
    """
    import json

    # make all leaves strings recursively with their native str function
    dic = to_json(dic)
    # pretty print
    pretty_dic = json.dumps(dic, indent=4, sort_keys=True)
    print(pretty_dic)

import torch
# json.dumps on this dict directly would raise "not JSON serializable" for the torch.Tensors
from pprint import pprint

dic = {'x': torch.randn(1, 3), 'rec': {'y': torch.randn(1, 3)}}

my_pprint(dic)
pprint(dic)

output:

{
    "rec": {
        "y": "tensor([[-0.3137,  0.3138,  1.2894]])"
    },
    "x": "tensor([[-1.5909,  0.0516, -1.5445]])"
}
{'rec': {'y': tensor([[-0.3137,  0.3138,  1.2894]])},
 'x': tensor([[-1.5909,  0.0516, -1.5445]])}
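Incidentally, the json module can do the same leaf-stringifying in one line via its default hook, which is called for any object it does not know how to serialize:

import json
import torch

dic = {'x': torch.randn(1, 3), 'rec': {'y': torch.randn(1, 3)}}
print(json.dumps(dic, default=str, indent=4, sort_keys=True))  # same stringified output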

Note: if you only want to store the tensors/dict, just use torch.save; it is much easier (but the result is not human-readable).
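e.g.:

import torch

dic = {'x': torch.randn(1, 3), 'rec': {'y': torch.randn(1, 3)}}
torch.save(dic, 'dic.pt')        # binary file, not human-readable
restored = torch.load('dic.pt')  # tensors come back as real tensors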

For a related answer check: https://stackoverflow.com/a/66180687/1601580
