Short intro: I don’t know a lot about the math behind machine learning, but I think I understand the basics and the general idea. I am currently following the fastai course/book and decided to write a multiclass linear classification model from scratch as a learning exercise.
I have a working version of this simple model, which classifies 28×28 pictures of handwritten digits from the MNIST dataset. The highest accuracy I’ve been able to achieve is ~58%. Here’s a link to it on Kaggle: Digits recognizer | Kaggle
I would like to ask a few questions in order to understand what I could improve on:

As I understand it, a model with random weights should have an accuracy of about 10%, so an accuracy of 58% should indicate that my model is doing something right and isn’t just blindly guessing, correct?
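
To make the 10% figure concrete, here is a minimal sketch of the chance baseline I have in mind. It assumes the ten digit classes are roughly balanced and that the guesses are completely independent of the labels; `chance_accuracy` and its parameters are hypothetical names for this illustration, and the labels are simulated rather than taken from MNIST.

```python
import random

def chance_accuracy(n_samples=10_000, n_classes=10, seed=0):
    """Accuracy of uniformly random guesses against uniformly random labels."""
    rng = random.Random(seed)
    # Simulated labels (assumption: classes are balanced, as they roughly are in MNIST).
    labels = [rng.randrange(n_classes) for _ in range(n_samples)]
    # Guesses that carry no information about the labels.
    guesses = [rng.randrange(n_classes) for _ in range(n_samples)]
    correct = sum(g == y for g, y in zip(guesses, labels))
    return correct / n_samples

acc = chance_accuracy()
print(f"chance accuracy: {acc:.3f}")
```

If this reasoning holds, anything well above roughly 0.10 on held-out images should mean the model has picked up some real signal.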

If my model is indeed working, how good is an accuracy of 58%? What accuracy could one reasonably expect from a model with a single layer and no nonlinearities?

I picked learning rates at random and kept the ones that gave good accuracy. What better way of picking a learning rate would you suggest?
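
For comparison, one more systematic option I could imagine is a sweep over learning rates spaced evenly on a log scale, training briefly at each one and comparing the resulting loss. Below is a minimal sketch of that idea, not the fastai learning rate finder (`lr_find`). It assumes a toy 2-D, 3-class dataset instead of MNIST, full-batch gradient descent, and a single linear layer with a softmax cross-entropy loss; `make_blobs`, `final_loss`, and all constants are hypothetical choices for illustration. It also compares training loss for brevity, where a held-out validation set would be the fairer yardstick.

```python
import math
import random

def make_blobs(n_per_class, centers, rng):
    """Toy 2-D data: one Gaussian blob per class (a stand-in for real images)."""
    xs, ys = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            xs.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            ys.append(label)
    return xs, ys

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def final_loss(xs, ys, n_classes, lr, epochs=20):
    """Train one linear layer with full-batch gradient descent.

    Returns the mean cross-entropy loss measured during the last epoch.
    """
    w = [[0.0, 0.0] for _ in range(n_classes)]
    b = [0.0] * n_classes
    n = len(xs)
    loss = float("inf")
    for _ in range(epochs):
        grad_w = [[0.0, 0.0] for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        loss = 0.0
        for (x0, x1), y in zip(xs, ys):
            logits = [w[k][0] * x0 + w[k][1] * x1 + b[k] for k in range(n_classes)]
            probs = softmax(logits)
            loss -= math.log(max(probs[y], 1e-12))
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. logit k: p_k - 1[k == y].
                delta = probs[k] - (1.0 if k == y else 0.0)
                grad_w[k][0] += delta * x0
                grad_w[k][1] += delta * x1
                grad_b[k] += delta
        loss /= n
        for k in range(n_classes):
            w[k][0] -= lr * grad_w[k][0] / n
            w[k][1] -= lr * grad_w[k][1] / n
            b[k] -= lr * grad_b[k] / n
    return loss

rng = random.Random(0)
xs, ys = make_blobs(50, [(0.0, 4.0), (-4.0, -3.0), (4.0, -3.0)], rng)

# Candidate learning rates spaced by factors of 10, from 1e-4 up to 10.
learning_rates = [10.0 ** e for e in range(-4, 2)]
losses = {lr: final_loss(xs, ys, 3, lr) for lr in learning_rates}
best_lr = min(losses, key=losses.get)

for lr in learning_rates:
    print(f"lr={lr:g}  loss={losses[lr]:.4f}")
print("lowest loss at lr =", best_lr)
```

The appeal of a log-spaced grid over random picks is that it covers several orders of magnitude with only a handful of short runs, which makes it easier to see where training is too slow and where it becomes unstable.
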
And finally, I would really appreciate it if someone were to look at my model and give some advice on how to improve it or my coding style. I am more than willing to provide some comments/explanations about my model in case my code is confusing.