EXC_BAD_ACCESS when calling GRUCell forward

I am trying to train a GRU-based neural network using the PyTorch C++ API.

I built PyTorch from source (commit: b3b333205fb0362d1f303a4c1054592c5fa4aea3), and I am linking it to my codebase.

I have a simple piece of code that calls the GRUCell forward function:

torch::Tensor out = _gru->forward(torch::Tensor(x));
return out;
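
For context, here is roughly how the surrounding module is wired up; the names (GRULayer, _gru) match the backtrace below, but the sizes and constructor details are placeholders rather than my exact code:

// Sketch of the wrapper module around torch::nn::GRUCell (placeholder sizes).
#include <torch/torch.h>

struct GRULayer : torch::nn::Module {
  GRULayer(int64_t input_size, int64_t hidden_size)
      : _gru(register_module("gru", torch::nn::GRUCell(input_size, hidden_size))) {}

  torch::Tensor forward(torch::Tensor x) {
    // This is the call that sporadically crashes.
    torch::Tensor out = _gru->forward(torch::Tensor(x));
    return out;
  }

  torch::nn::GRUCell _gru{nullptr};
};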

However, the code sporadically crashes with the following backtrace:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x106100000)                                             
  * frame #0: 0x000000018427691c libBLAS.dylib`___lldb_unnamed_symbol2152 + 528                                                                      
    frame #1: 0x0000000184293070 libBLAS.dylib`___lldb_unnamed_symbol2240 + 396                                                                      
    frame #2: 0x0000000184275bd0 libBLAS.dylib`___lldb_unnamed_symbol2145 + 708                                                                      
    frame #3: 0x00000001842cc014 libBLAS.dylib`___lldb_unnamed_symbol2470 + 152                                                                      
    frame #4: 0x00000001844c7e10 libBLAS.dylib`cblas_dgemm + 1580                                                                                    
    frame #5: 0x00000001841eea00 libBLAS.dylib`DGEMM + 248                                                                                           
    frame #6: 0x000000010df0b1bc libtorch_cpu.dylib`at::native::cpublas::gemm(at::native::TransposeType, at::native::TransposeType, long long, long long, long long, double, double const*, long long, double const*, long long, double, double*, long long) + 424
    frame #7: 0x000000010e00ed00 libtorch_cpu.dylib`at::native::addmm_impl_cpu_(at::Tensor&, at::Tensor const&, at::Tensor, at::Tensor, c10::Scalar const&, c10::Scalar const&) + 6292
    frame #8: 0x000000010e00d1d8 libtorch_cpu.dylib`at::native::structured_addmm_out_cpu::impl(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&, at::Tensor const&) + 320
    frame #9: 0x000000010ee0ef34 libtorch_cpu.dylib`c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &at::(anonymous namespace)::wrapper_CPU_addmm(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&) + 136
    frame #10: 0x00000001102db5c8 libtorch_cpu.dylib`c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&), &torch::autograd::VariableType::(anonymous namespace)::addmm(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&) + 3164
    frame #11: 0x000000010e670488 libtorch_cpu.dylib`at::_ops::addmm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&) + 324
    frame #12: 0x000000010dfe9cf4 libtorch_cpu.dylib`at::native::linear(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&) + 216
    frame #13: 0x000000010e643ea8 libtorch_cpu.dylib`at::_ops::linear::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&) + 312
    frame #14: 0x000000010e0ff770 libtorch_cpu.dylib`at::native::(anonymous namespace)::CellParams::linear_ih(at::Tensor const&) const + 84
    frame #15: 0x000000010e0f5490 libtorch_cpu.dylib`at::native::(anonymous namespace)::GRUCell<at::native::(anonymous namespace)::CellParams>::operator()(at::Tensor const&, at::Tensor const&, at::native::(anonymous namespace)::CellParams const&, bool) const + 1136
    frame #16: 0x000000010e0f4e10 libtorch_cpu.dylib`at::native::gru_cell(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&) + 732
    frame #17: 0x000000010e9b5374 libtorch_cpu.dylib`at::_ops::gru_cell::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::optional<at::Tensor> const&) + 384
    frame #18: 0x0000000111cc4bac libtorch_cpu.dylib`torch::nn::GRUCellImpl::forward(at::Tensor const&, at::Tensor) + 224
    frame #19: 0x0000000100257e74 test`GRULayer::forward(at::Tensor) + 112
    frame #20: 0x000000010028ae6c test`torch::nn::AnyValue torch::nn::AnyModuleHolder<GRULayer, at::Tensor>::InvokeForward::operator()<at::Tensor>(at::Tensor&&) + 76
    frame #21: 0x000000010028ae14 test`torch::nn::AnyValue torch::unpack<torch::nn::AnyValue, at::Tensor, torch::nn::AnyModuleHolder<GRULayer, at::Tensor>::InvokeForward, torch::nn::AnyModuleHolder<GRULayer, at::Tensor>::CheckedGetter, 0ul>(torch::nn::AnyModuleHolder<GRULayer, at::Tensor>::InvokeForward, torch::nn::AnyModuleHolder<GRULayer, at::Tensor>::CheckedGetter, torch::Indices<0ul>) + 64
    frame #22: 0x000000010028adc8 test`torch::nn::AnyValue torch::unpack<torch::nn::AnyValue, at::Tensor, torch::nn::AnyModuleHolder<GRULayer, at::Tensor>::InvokeForward, torch::nn::AnyModuleHolder<GRULayer, at::Tensor>::CheckedGetter>(torch::nn::AnyModuleHolder<GRULayer, at::Tensor>::InvokeForward, torch::nn::AnyModuleHolder<GRULayer, at::Tensor>::CheckedGetter) + 56
    frame #23: 0x000000010028abc0 test`torch::nn::AnyModuleHolder<GRULayer, at::Tensor>::forward(std::__1::vector<torch::nn::AnyValue, std::__1::allocator<torch::nn::AnyValue> >&&) + 1080
    frame #24: 0x0000000100287120 test`torch::nn::AnyValue torch::nn::AnyModule::any_forward<at::Tensor&>(at::Tensor&) + 200
    frame #25: 0x0000000100273668 test`at::Tensor torch::nn::SequentialImpl::forward<at::Tensor, at::Tensor&>(at::Tensor&) + 140
    frame #26: 0x00000001002735d0 test`TorchNetwork::forward(at::Tensor) + 52
    frame #27: 0x00000001000bdcc8 test`RNNClassifier::train_impl(DataSet const&) + 1188
    frame #28: 0x00000001000ea974 test`ADAlgorithm::train(DataSet const&) + 44
    frame #29: 0x000000010001a0c0 test`RNNClassifier_SLStagingData_Test::TestBody() + 656
    frame #30: 0x0000000100368420 test`void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 132
    frame #31: 0x0000000100330540 test`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 96
    frame #32: 0x0000000100330490 test`testing::Test::Run() + 192
    frame #33: 0x00000001003313cc test`testing::TestInfo::Run() + 304
    frame #34: 0x00000001003324f0 test`testing::TestSuite::Run() + 328
    frame #35: 0x000000010033feb0 test`testing::internal::UnitTestImpl::RunAllTests() + 1060
    frame #36: 0x0000000100368ca8 test`bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 132
    frame #37: 0x000000010033f850 test`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 96
    frame #38: 0x000000010033f73c test`testing::UnitTest::Run() + 216
    frame #39: 0x0000000100033400 test`RUN_ALL_TESTS() + 16
    frame #40: 0x00000001000333c4 test`main + 96
    frame #41: 0x00000001835a3f28 dyld`start + 2236

I am using an Apple M2 chip.

Does anybody know what could be the issue?

An additional piece of information that might be useful: before I dump the backtrace, this is the error lldb reports when the process stops:

Process 43364 stopped                                                                                                                                
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x117500000)                                             
    frame #0: 0x00000001842769e0 libBLAS.dylib`___lldb_unnamed_symbol2152 + 724                                                                      
libBLAS.dylib`___lldb_unnamed_symbol2152:                                                                                                            
->  0x1842769e0 <+724>: .long  0x00201088                ; unknown opcode                                                                            
    0x1842769e4 <+728>: add    x8, x1, x2, lsl #4                                                                                                    
    0x1842769e8 <+732>: mov    x9, #0x1700000000000000                                                                                               
    0x1842769ec <+736>: add    x8, x8, x9                                                                                                            
Target 0: (test) stopped. 

The disassembly shows an unknown opcode at the faulting instruction.
I know very little about low-level programming, but this makes me think that the code-generation part of PyTorch is producing machine code that the M2 chip does not recognize.
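
In case it helps narrow things down, this is the kind of standalone check I was planning to run next: a plain double-precision addmm loop that goes through the same addmm -> cblas_dgemm path shown in frames #4-#11, without any GRUCell involved (the sizes here are arbitrary placeholders):

// Standalone sanity check: exercise addmm / cblas_dgemm directly (placeholder sizes).
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::Tensor bias = torch::zeros({128}, torch::kDouble);
  torch::Tensor a = torch::randn({64, 256}, torch::kDouble);
  torch::Tensor b = torch::randn({256, 128}, torch::kDouble);
  for (int i = 0; i < 1000; ++i) {
    // Same operator as frame #11 (at::_ops::addmm::call).
    torch::Tensor c = torch::addmm(bias, a, b);
  }
  std::cout << "addmm loop finished without crashing" << std::endl;
  return 0;
}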