On macOS, I compiled LibTorch for C++, built the sample code against the resulting headers and library, and hit a PyTorch issue

yat · December 27, 2023, 5:33am

#include "torch/torch.h"

std::string kDataRoot{"/Users/xxx/exp/c++exp/modelcreate/build/data"};

// Define a new Module.
struct Net : torch::nn::Module {
  Net() {
    // Construct and register three Linear submodules.
    fc1 = register_module("fc1", torch::nn::Linear(784, 64));
    fc2 = register_module("fc2", torch::nn::Linear(64, 32));
    fc3 = register_module("fc3", torch::nn::Linear(32, 10));
  }

  // Implement the Net's algorithm.
  torch::Tensor forward(torch::Tensor x) {
    // Use one of many tensor manipulation functions.
    x = torch::relu(fc1->forward(x.reshape({x.size(0), 784})));
    x = torch::dropout(x, /*p=*/0.5, /*train=*/is_training());
    x = torch::relu(fc2->forward(x));
    x = torch::log_softmax(fc3->forward(x), /*dim=*/1);
    return x;
  }

  // Use one of many "standard library" modules.
  torch::nn::Linear fc1{nullptr}, fc2{nullptr}, fc3{nullptr};
};

int main() {
  // Create a new Net.
  auto net = std::make_shared<Net>();

  // Create a multi-threaded data loader for the MNIST dataset.
  auto data_loader = torch::data::make_data_loader(
      torch::data::datasets::MNIST(kDataRoot).map(
          torch::data::transforms::Stack<>()),
      /*batch_size=*/64);

  // Instantiate an SGD optimization algorithm to update our Net's parameters.
  torch::optim::SGD optimizer(net->parameters(), /*lr=*/0.01);

  for (size_t epoch = 1; epoch <= 10; ++epoch) {
    size_t batch_index = 0;
    // Iterate the data loader to yield batches from the dataset.
    for (auto& batch : *data_loader) {
      // Reset gradients.
      optimizer.zero_grad();
      // Execute the model on the input data.
      torch::Tensor prediction = net->forward(batch.data);
      // Compute a loss value to judge the prediction of our model.
      torch::Tensor loss = torch::nll_loss(prediction, batch.target);
      // Compute gradients of the loss w.r.t. the parameters of our model.
      loss.backward();
      // Update the parameters based on the calculated gradients.
      optimizer.step();
      // Output the loss and checkpoint every 100 batches.
      if (++batch_index % 100 == 0) {
        std::cout << "Epoch: " << epoch << " | Batch: " << batch_index
                  << " | Loss: " << loss.item<float>() << std::endl;
        // Serialize your model periodically as a checkpoint.
        torch::save(net, "net.pt");
      }
    }
  }
}

Then I ran the code and hit the following issue (after downloading the MNIST dataset):
libc++abi: terminating due to uncaught exception of type c10::Error: stream.read(reinterpret_cast<char*>(&value), sizeof value) INTERNAL ASSERT FAILED at "/Users/xxxx/exp/libtorch/pytorch/torch/csrc/api/src/data/datasets/mnist.cpp":42, please report a bug to PyTorch.
Exception raised from read_int32 at /Users/xxxx/exp/libtorch/pytorch/torch/csrc/api/src/data/datasets/mnist.cpp:42 (most recent call first):
frame #0: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 188 (0x101167248 in libc10.dylib)
frame #1: c10::TypeError::~TypeError() + 0 (0x10b645794 in libtorch_cpu.dylib)
frame #2: torch::data::datasets::(anonymous namespace)::expect_int32(std::__1::basic_ifstream<char, std::__1::char_traits<char>>&, unsigned int) + 164 (0x10fa95c04 in libtorch_cpu.dylib)
frame #3: torch::data::datasets::MNIST::MNIST(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, torch::data::datasets::MNIST::Mode) + 404 (0x10fa9508c in libtorch_cpu.dylib)
frame #4: main + 72 (0x100b16aac in model_create)
frame #5: start + 2360 (0x1839410e0 in dyld)

Abort trap: 6

The message explicitly asks me to report a bug to PyTorch. Any clue?

Versions

The version is 2.3.0a0.

albanD · December 27, 2023, 10:22am

Hey!

This is quite surprising indeed: it is failing to load the dataset. Are you sure the dataset downloaded properly and that you have a valid archive locally?

yat · December 27, 2023, 1:02pm

Yes, the dataset downloaded properly.

yat · December 28, 2023, 11:27pm

Just an update: I downloaded the MNIST dataset using Python and it worked. Thanks for looking at this.