Custom Example template in torch Dataset

lampadephoria · September 30, 2020, 3:36am

According to the documentation,

get() returns torch::data::Example<>. Example is a template whose two type parameters both default to torch::Tensor:
template <typename Data = Tensor, typename Target = Tensor>

Is there a way to customize the types? I would like get() to return
torch::data::Example<torch::Tensor, std::string>. If in my custom dataset, I define my get() as

torch::data::Example<torch::Tensor, std::string> get(size_t index) override;

the compiler complains

error: return type is neither identical to nor covariant with return type "torch::data::Example<at::Tensor, at::Tensor>" of overridden virtual function "torch::data::datasets::Dataset<Self, SingleExample>::get [with Self=CustomDataset, SingleExample=torch::data::Example<at::Tensor, at::Tensor>]"
      torch::data::Example<torch::Tensor, std::string> get(size_t index) override;

ptrblck · October 1, 2020, 3:30am

Maybe you could derive a custom class from here and change the types?
CC @yf225 who would know how to override the types properly

lampadephoria · October 4, 2020, 3:00am

Thank you for the suggestion, @ptrblck.
Here is my CustomExample, modeled on the first example at the link you suggested.

template <typename Data = torch::Tensor, typename Path = std::string>
struct CustomExample {
    using DataType = Data;
    using PathType = Path;

    CustomExample() = default;
    CustomExample(Data data, Path path)
        : data(std::move(data)), path(std::move(path)) {}

    operator Data&() { return data; }
    operator const Data&() const { return data; }

    operator Path&() { return path; }
    operator const Path&() const { return path; }

    Data data;
    Path path;
};

However, when I replace

torch::data::Example<> get(size_t index) override;

with

CustomExample<> get(size_t index) override;

I get the following errors:

.../dataset.hpp(76): error: return type is neither identical to nor covariant with return type "torch::data::Example<at::Tensor, at::Tensor>" of overridden virtual function "torch::data::datasets::Dataset<Self, SingleExample>::get [with Self=sas::dlx::CustomDataset, SingleExample=torch::data::Example<at::Tensor, at::Tensor>]"
      CustomExample<> get(size_t index) override;
                      ^

.../dataset.hpp(76): error: member function declared with "override" does not override a base class member
      CustomExample<> get(size_t index) override;

Where my get() is part of

class CustomDataset : public torch::data::Dataset<CustomDataset>{...}

I guess the base class torch::data::Dataset<> fixes the signature of get(), so any derived class has to keep the same signature. If so, is there a way to register a different get() function with the torch data loader?
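
The error message above names a second template parameter, SingleExample, on torch::data::datasets::Dataset, and that parameter is what fixes the return type of get(). Passing the custom example type there, for instance by deriving from torch::data::Dataset<CustomDataset, torch::data::Example<torch::Tensor, std::string>>, should let the override match. The sketch below illustrates the mechanism with simplified stand-ins that compile without libtorch. Example, Dataset, PathExample and StringTargetDataset are illustrative mock-ups here, not the library's classes, and std::vector<float> stands in for torch::Tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins that only mirror the template shape of
// torch::data::Example and torch::data::Dataset. NOT the libtorch classes.
template <typename Data = std::vector<float>, typename Target = std::vector<float>>
struct Example {
    Data data;
    Target target;
};

template <typename Self, typename SingleExample = Example<>>
class Dataset {
public:
    using ExampleType = SingleExample;
    virtual ~Dataset() = default;

    // The return type of get() is fixed by the second template argument.
    virtual ExampleType get(std::size_t index) = 0;

    // Default batching: call get() once per requested index.
    std::vector<ExampleType> get_batch(const std::vector<std::size_t>& indices) {
        std::vector<ExampleType> batch;
        batch.reserve(indices.size());
        for (const auto i : indices) {
            batch.push_back(get(i));
        }
        return batch;
    }
};

// Hypothetical example type whose target is a path string.
using PathExample = Example<std::vector<float>, std::string>;

// Because PathExample is passed as the second template argument, the base
// class declares `PathExample get(size_t)`, so `override` is accepted.
class StringTargetDataset : public Dataset<StringTargetDataset, PathExample> {
public:
    explicit StringTargetDataset(std::vector<std::string> paths)
        : paths_(std::move(paths)) {}

    PathExample get(std::size_t index) override {
        assert(index < paths_.size());
        // Dummy pixel data; a real dataset would load the image here.
        return {std::vector<float>{static_cast<float>(index)}, paths_[index]};
    }

private:
    std::vector<std::string> paths_;
};
```

With the default second argument left in place, the base class would declare get() as returning Example<>, and the same override would be rejected just as in the errors above.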

AntMorais · March 10, 2021, 3:51pm

Hi @lampadephoria, did you manage to solve this issue? I have a similar problem, except I want to create a custom class with types Data, Target and Path.

glaringlee · March 10, 2021, 5:07pm

@lampadephoria, I think you should remove override since there is no get function in parent class for you to override.
Take a look at this example:
‘examples/custom-dataset.cpp at master · pytorch/examples · GitHub’

Example class is there:
‘pytorch/example.h at master · pytorch/pytorch · GitHub’

AntMorais · March 12, 2021, 10:13am

@glaringlee, I am trying to return the paths of the images in each batch during training. To do that, I created a torch::data::PathDataset struct with the two types from torch::data::Example plus a std::string ImgPath. I followed your suggestion and removed override from the get function, but I still got the same error as @lampadephoria. I fixed that by creating a torch::data::datasets::CustomDataset class.

using Data = std::vector<std::pair<std::string, long>>;


namespace torch {
namespace data {


/// A dataset consists of data, an associated target (label) and the associated path.
template <typename Output = torch::Tensor, typename Target = torch::Tensor, typename ImgPath = std::string>
struct PathDataset {
    using DataType = Output;
    using TargetType = Target;
    using ImgPathType = ImgPath;

    PathDataset() = default;
    PathDataset(Output data, Target target, ImgPath img_path) : data(std::move(data)), target(std::move(target)), img_path(std::move(img_path)) {}


    Output data;
    Target target;
    ImgPath img_path;
};


} // data
} // torch


namespace torch{
namespace data{
namespace datasets{

template <typename Self, typename SingleExample = PathDataset<>>
class CustomDataset : public BatchDataset<Self, std::vector<SingleExample>> {
public:
    using ExampleType = SingleExample;

    /// Returns the example at the given index.
    virtual ExampleType get(size_t index) = 0;

    /// Returns a batch of data.
    /// The default implementation calls `get()` for every requested index
    /// in the batch.
    std::vector<ExampleType> get_batch(ArrayRef<size_t> indices) override {
        std::vector<ExampleType> batch;
        batch.reserve(indices.size());
        for (const auto i : indices) {
            batch.push_back(get(i));
        }
        return batch;
    }
};

} // datasets
} // data
} // torch


class CustomDataset : public torch::data::datasets::CustomDataset<CustomDataset>
{
    private:
        Data data;
        Options options;
    public:
        explicit CustomDataset(const Data& data) : data(data) {}

        torch::data::PathDataset<> get(size_t index)  {
            std::string path = options.datasetPath + data[index].first;
            auto mat = cv::imread(path);
            assert(!mat.empty());

            cv::resize(mat, mat, cv::Size(options.image_size, options.image_size));
            std::vector<cv::Mat> channels(1);
            cv::split(mat, channels);

            // this is for 1 channel images (e.g. mnist)
            auto pixel_values_tensor = torch::from_blob(
                channels[0].ptr(),
                {options.image_size, options.image_size},
                torch::kUInt8
                );

            auto tdata = pixel_values_tensor
                .view({1, options.image_size, options.image_size})
                .to(torch::kFloat);
            
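            // from_blob does not take ownership of the label's memory, so clone()
            // gives the returned tensor its own copy.
            auto tlabel = torch::from_blob(&data[index].second, {1}, torch::kLong).clone();
            return {tdata, tlabel, path};
        }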

        torch::optional<size_t> size() const {
            return data.size();
        }
        
};

No errors here, but when I try to create a training set:

auto train_set = CustomDataset(data.first).map(torch::data::transforms::Stack<>());

I get the following error:

error: static assertion failed: BatchType type of dataset does not match input type of transform
[build]   110 |           typename TransformType::InputBatchType>::value,
[build]       |                                                    ^~~~~

and

error: cannot convert ‘vector<torch::data::PathDataset<>,allocator<torch::data::PathDataset<>>>’ to ‘vector<torch::data::Example<>,allocator<torch::data::Example<>>>’
[build]    75 |     return transform_.apply_batch(dataset_.get_batch(std::move(indices)));
[build]       |                                                                         ^

I would appreciate it if you could help me or direct me to an example in the documentation.
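
The static assertion arises because Stack<> defaults to Stack<Example<>>, whose input batch type is std::vector<Example<>>, while this dataset produces std::vector<PathDataset<>>. One way out is a custom collate transform whose input batch type matches the dataset. In libtorch that would presumably derive from torch::data::transforms::Collation<torch::data::PathDataset<>> and implement apply_batch. The sketch below shows only the shape of such a transform, with std::vector<float> standing in for torch::Tensor so that it compiles without libtorch. FakeTensor, PathSample, PathBatch and StackWithPaths are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-in for torch::Tensor so the sketch compiles without libtorch.
using FakeTensor = std::vector<float>;

// Per-sample record, mirroring the three-field struct from the post.
struct PathSample {
    FakeTensor data;
    long target;
    std::string img_path;
};

// Collated batch: one entry per sample in each field.
struct PathBatch {
    std::vector<FakeTensor> data;
    std::vector<long> target;
    std::vector<std::string> img_paths;
};

// Hypothetical collate step playing the role that Stack<> plays for
// Example<>. Its InputBatchType matches what the dataset's get_batch()
// returns, which is the condition the static assertion checks.
struct StackWithPaths {
    using InputBatchType = std::vector<PathSample>;
    using OutputBatchType = PathBatch;

    OutputBatchType apply_batch(InputBatchType samples) const {
        PathBatch batch;
        batch.data.reserve(samples.size());
        batch.target.reserve(samples.size());
        batch.img_paths.reserve(samples.size());
        for (auto& sample : samples) {
            // With real tensors, data and target would be collected and then
            // combined with torch::stack; the paths stay a plain vector.
            batch.data.push_back(std::move(sample.data));
            batch.target.push_back(sample.target);
            batch.img_paths.push_back(std::move(sample.img_path));
        }
        assert(batch.data.size() == batch.img_paths.size());
        return batch;
    }
};
```

A transform of this shape would be passed to map() in place of torch::data::transforms::Stack<>(), and the training loop would then read the paths from the collated batch alongside the data and targets.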