Difficulty Replicating a scikit-learn GP in GPyTorch

I am currently trying to port some code from scikit-learn to GPyTorch, but I am having difficulty matching both the accuracy and the computational time.

My data has N=300 observations with D=6 input variables and M=100 output variables, so the inputs are NxD and the outputs are NxM. In scikit-learn I can train a GP to handle this multi-output problem with relative ease using the following code:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

model = GaussianProcessRegressor(kernel=Matern(length_scale=6*[1], nu=2.5))

model.fit(X_train, Y_train)
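For reference, here is a self-contained version of the scikit-learn setup, using synthetic stand-in data with the shapes described above (the real `X_train`/`Y_train` are of course different):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
N, D, M = 300, 6, 100
X_train = rng.standard_normal((N, D))  # N x D inputs
Y_train = rng.standard_normal((N, M))  # N x M outputs

# One anisotropic Matern 5/2 kernel with a shared set of hyperparameters;
# scikit-learn treats the M output columns as independent GPs on this
# common kernel, so only one NxN covariance matrix is ever factorized.
model = GaussianProcessRegressor(kernel=Matern(length_scale=6 * [1.0], nu=2.5))
model.fit(X_train, Y_train)

Y_pred = model.predict(X_train)
print(Y_pred.shape)  # (300, 100)
```

This trains in seconds on my machine, which is the baseline I am trying to match.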

and I have tried to replicate this in GPyTorch with the following code:

import gpytorch

class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(MultitaskGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=100
        )
        self.covar_module = gpytorch.kernels.MultitaskKernel(
            gpytorch.kernels.MaternKernel(nu=2.5, ard_num_dims=6), num_tasks=100,
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)


likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=100)
model = MultitaskGPModel(X_train, Y_train, likelihood)

Ideally, I would want the kernel and mean to be shared across the independent GPs, which I believe is how the model is constructed in scikit-learn. However, the GPyTorch model takes significantly longer to train, does not reach the same level of accuracy, and additionally crashes in eval mode when making predictions.

I would greatly appreciate any help replicating the scikit-learn model in GPyTorch.