make_data_loader() calls the dataset's destructor multiple times

Hello,
I have a CustomDataset, listed below, whose destructor releases the memory allocated in the constructor.

class CustomDataset : public torch::data::Dataset<CustomDataset>{
public:
    CustomDataset() {
        testArray_ = new float[1024];
    };
    ~CustomDataset() override {
        if (testArray_) {
            delete[] testArray_;  // array form, to match new float[1024]
            testArray_ = nullptr;
        }
    };

    torch::optional<size_t> size() const override;
    torch::data::Example<> get(size_t index) override;
private:
    float* testArray_;
};

When I pass it to the data loader as shown below:

    dataset = new CustomDataset();

    auto dataLoader = torch::data::make_data_loader<torch::data::samplers::SequentialSampler>(
            (*dataset).map(torch::data::transforms::Stack<>()),
            torch::data::DataLoaderOptions()
                    .batch_size(batchSize)
                    .workers(nWorkers)
                    .drop_last(true)); // SequentialSampler is requested here, so no shuffling

make_data_loader() appears to call ~CustomDataset() multiple times, even though CustomDataset() is called only once, in the “new” expression. Interestingly, when ~CustomDataset() is called the second time, testArray_ has exactly the same value as in the first call, so the same memory is freed again, and that triggers SIGABRT. Here is the call stack:

Thread 11 "cas" received signal SIGABRT, Aborted.
0x00007f01a3682387 in raise () from /lib64/libc.so.6
(gdb) where
#0  0x00007f01a3682387 in raise () from /lib64/libc.so.6
#1  0x00007f01a3683a78 in abort () from /lib64/libc.so.6
#2  0x00007f01a36c4ed7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f01a36cd299 in _int_free () from /lib64/libc.so.6
#4  0x00007f0174d03b0f in CustomDataset::~CustomDataset (this=0x7f0177369130)
    at .../dataset.hpp:31
#5  0x00007f0174da4247 in torch::data::datasets::BatchDataset<CustomDataset, std::vector<torch::data::Example<at::Tensor, at::Tensor>, std::allocator<torch::data::Example<at::Tensor, at::Tensor> > >, c10::ArrayRef<unsigned long> >::map<torch::data::transforms::Normalize<at::Tensor> >(torch::data::transforms::Normalize<at::Tensor>) & (
    this=0x7f01715d5b10, transform=...)
    at .../pytorch/torch/include/torch/csrc/api/include/torch/data/datasets/base.h:58

My questions are:

  1. Why is the destructor called (at least) twice when the constructor is called only once?
  2. Why does the member variable have the same value in both destructor calls?
  3. Does this mean that the CustomDataset class cannot implement a destructor?

Thank you very much.

The first caller of the destructor is:

(gdb) f 1
#1  0x00007f6bb426cddb in torch::data::datasets::map<CustomDataset, torch::data::transforms::Normalize<at::Tensor> > (dataset=..., transform=...)
    at .../pytorch/torch/include/torch/csrc/api/include/torch/data/datasets/map.h:112
112       return {std::move(dataset), std::move(transform)};
(gdb) list
107                   DatasetType::is_stateful,
108                   typename DatasetType::BatchType::value_type,
109                   typename DatasetType::BatchType>::type,
110               typename TransformType::InputBatchType>::value,
111           "BatchType type of dataset does not match input type of transform");
112       return {std::move(dataset), std::move(transform)};
113     }
114
115     } // namespace datasets
116     } // namespace data

The second caller of the destructor is:

#1  0x00007f6bb4257247 in torch::data::datasets::BatchDataset<CustomDataset, std::vector<torch::data::Example<at::Tensor, at::Tensor>, std::allocator<torch::data::Example<at::Tensor, at::Tensor> > >, c10::ArrayRef<unsigned long> >::map<torch::data::transforms::Normalize<at::Tensor> >(torch::data::transforms::Normalize<at::Tensor>) & (
    this=0x7f6bb15cc4c0, transform=...)
    at .../pytorch/torch/include/torch/csrc/api/include/torch/data/datasets/base.h:58
58          return datasets::map(static_cast<Self&>(*this), std::move(transform));
(gdb) list
53        virtual optional<size_t> size() const = 0;
54
55        /// Creates a `MapDataset` that applies the given `transform` to this dataset.
56        template <typename TransformType>
57        MapDataset<Self, TransformType> map(TransformType transform) & {
58          return datasets::map(static_cast<Self&>(*this), std::move(transform));
59        }
60
61        /// Creates a `MapDataset` that applies the given `transform` to this dataset.
62        template <typename TransformType>
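
Reading these two frames, my guess is that the dataset is being copied rather than constructed again: map() & passes *this to datasets::map(), which appears to take the dataset by value (it is later handed on with std::move(dataset) at map.h:112). Because CustomDataset declares a destructor but no copy constructor, the compiler-generated copy would duplicate the testArray_ pointer without running CustomDataset(). Below is a minimal sketch of that mechanism without libtorch. RawOwner, Wrapper, wrap and run_demo are hypothetical stand-ins, not libtorch names, and the destructor only logs instead of freeing so that the demo itself does not crash.

```cpp
#include <cassert>
#include <cstdio>
#include <utility>

// Hypothetical stand-in for CustomDataset: same ownership pattern, no libtorch.
struct RawOwner {
    static int constructed;  // number of RawOwner() calls
    static int destroyed;    // number of ~RawOwner() calls
    float* data;

    RawOwner() : data(new float[1024]) { ++constructed; }
    // No user-declared copy constructor: the compiler generates one that
    // copies `data` as a plain pointer value and never runs RawOwner().
    ~RawOwner() {
        ++destroyed;
        std::printf("~RawOwner this=%p data=%p\n",
                    static_cast<void*>(this), static_cast<void*>(data));
        // A `delete[] data;` here would free the same block once per copy.
    }
};
int RawOwner::constructed = 0;
int RawOwner::destroyed = 0;

// Hypothetical stand-in for MapDataset: stores the dataset by value.
struct Wrapper { RawOwner dataset; };

// By-value parameter, like the `dataset` argument seen at map.h:112.
// std::move falls back to a copy because RawOwner has no move constructor.
Wrapper wrap(RawOwner dataset) { return {std::move(dataset)}; }

struct DemoResult { int constructed; int destroyed; bool same_pointer; };

DemoResult run_demo() {
    RawOwner::constructed = 0;
    RawOwner::destroyed = 0;
    bool same = false;
    float* buffer = nullptr;
    {
        RawOwner original;
        buffer = original.data;
        Wrapper w = wrap(original);  // copies into the parameter, then into w
        same = (w.dataset.data == original.data);
    }  // w.dataset and original are destroyed here
    delete[] buffer;  // free the single allocation exactly once
    return {RawOwner::constructed, RawOwner::destroyed, same};
}
```

Calling run_demo() from a main() should log at least three destructor calls with different this values but the same data value, while the constructor counter stays at 1. That would match what I see in gdb, although the exact number of copies inside libtorch may differ.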

Any ideas about my questions? Thank you for your suggestions.

CC @yf225, who might know what’s going on.
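
On question 3, one possible direction, assuming the dataset really is passed by value as map.h:112 and base.h:58 suggest: a destructor should be fine as long as copies of the dataset do not share a raw pointer. The sketch below replaces the raw float* with a std::vector<float>, so each copy owns its own buffer and nothing has to be freed by hand. It is libtorch-free, and VectorOwner, VectorWrapper, wrap_copy and copies_are_independent are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for CustomDataset with value-semantic storage.
class VectorOwner {
public:
    VectorOwner() : testArray_(1024, 0.0f) {}
    // A destructor is still allowed; the vector releases its own memory.
    ~VectorOwner() = default;

    const float* data() const { return testArray_.data(); }
    std::size_t size() const { return testArray_.size(); }

private:
    std::vector<float> testArray_;  // deep-copied whenever the object is copied
};

// Hypothetical stand-in for a wrapper that stores the dataset by value.
struct VectorWrapper { VectorOwner dataset; };

// By-value parameter, mirroring the shape of the call seen at map.h:112.
VectorWrapper wrap_copy(VectorOwner dataset) { return {std::move(dataset)}; }

// Returns true when the stored copy has its own buffer of the same size.
bool copies_are_independent() {
    VectorOwner original;
    VectorWrapper w = wrap_copy(original);
    return w.dataset.data() != original.data() &&
           w.dataset.size() == original.size();
}
```

The trade-off is that every copy duplicates the 1024 floats. If the buffer is large, a hand-written copy constructor and copy assignment operator, or a std::shared_ptr member, would be alternatives worth considering.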