I have been using two books, first one for TensorFlow and now, after switching to PyTorch, a second one:
Machine Learning with PyTorch and Scikit-Learn, ISBN 978-1801819312
The TensorFlow book is "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow", ISBN 978-1-492-03264-9.
Both books have excellent coverage of machine learning concepts, from the basics through linear models and CNNs up to RNNs, with working code examples.
But on more recent techniques, i.e. attention and self-attention, the coverage seems partial. Both books explain the theory well, but the working code examples only go part of the way and stop there.
I suspect this is because transformers with attention and self-attention are a recent development.
I am looking for a newer book or other literature with good code examples that build a fully functioning RNN with attention from start to end. I don't need the theoretical explanation, of which there is plenty.
I personally really like Harvard’s notebook called The Annotated Transformer. It has explanations alongside the code, which makes it more hands-on and easier to understand, at least for me.
This book has an entire chapter on the transformer and does computations step by step: https://www.amazon.com/Machine-Learning-PyTorch-Scikit-Learn-learning/dp/1801819319
Thanks, this looks promising. I will look into it and get back.
Yes, that is the book I was learning from and mentioned. I see its from-scratch code goes up to multi-head attention as the basis for the transformer. After that it resorts to examples built on pip-installed libraries. However, I was hoping for a full-fledged example that does not use pip libraries:
- input -> tokenizer -> embedding -> (various transformer implementations) -> output.
- A simpler GPT/BERT implementation (or something like it) that does not use pip libraries, plus pretraining and fine-tuning.
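For what it's worth, the first stages of that pipeline (input, tokenizer, embedding) can be sketched in plain Python with no pip libraries. This is only an illustration under assumptions: a character-level tokenizer, a randomly initialized embedding table, and sinusoidal positional encoding. The function names (build_vocab, tokenize, init_embedding, embed) are made up for this example and are not from either book.

```python
import math
import random


def build_vocab(text):
    """Character-level vocabulary: each distinct character gets an integer id."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}


def tokenize(text, vocab):
    """Map every character of the input to its integer id."""
    return [vocab[ch] for ch in text]


def init_embedding(vocab_size, d_model, seed=0):
    """Random embedding table (assumption: small Gaussian init, untrained)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
            for _ in range(vocab_size)]


def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def embed(token_ids, table, d_model):
    """Look up each token, scale by sqrt(d_model), add the position signal."""
    pe = positional_encoding(len(token_ids), d_model)
    scale = math.sqrt(d_model)
    return [[table[tok][j] * scale + pe[pos][j] for j in range(d_model)]
            for pos, tok in enumerate(token_ids)]


text = "hello world"
vocab = build_vocab(text)
ids = tokenize(text, vocab)
table = init_embedding(len(vocab), d_model=8)
x = embed(ids, table, d_model=8)
print(ids)
print(len(x), "positions x", len(x[0]), "dimensions")
```

The result x is a plain list of lists (one row per input character), which is what a hand-written transformer block would take as its input.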
But regardless, the earlier examples in the book without pip imports were massively insightful for understanding the internal workings of the various models during training.
Hi Andrei, minGPT looks nice, but it still uses nn.Module. I am looking for a simple implementation of a GPT-like model that is completely hardcoded, at least for the training parts (the exceptions being the embedding and positional encoding, which can use a library), so that I can observe and graph the output of each layer at each epoch, or even for each sample, as the training samples are fed in. Ideally, it would be a hardcoded implementation (not using a library) of the original attention model with 6 encoder/decoder layers (i.e. "Attention Is All You Need").
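A rough sketch of what "observing every layer" could look like for a single attention step, using only the standard library. It is a forward pass only (no training loop, no gradients), with one head, a causal mask, and random weights. The helper names and the trace dictionary are assumptions for illustration and are not taken from minGPT or the books. Plain lists are slow, so this only makes sense for tiny sizes.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) list of lists by a (k x m) list of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one row; -inf entries become 0."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention that keeps every intermediate."""
    trace = {}
    trace["q"] = matmul(x, w_q)
    trace["k"] = matmul(x, w_k)
    trace["v"] = matmul(x, w_v)
    d_k = len(w_q[0])
    raw = matmul(trace["q"], transpose(trace["k"]))
    # Causal mask: position i may only attend to positions j <= i.
    trace["scores"] = [[s / math.sqrt(d_k) if j <= i else float("-inf")
                        for j, s in enumerate(row)]
                       for i, row in enumerate(raw)]
    trace["weights"] = [softmax(row) for row in trace["scores"]]
    trace["out"] = matmul(trace["weights"], trace["v"])
    return trace


rng = random.Random(0)
seq_len, d_model = 4, 8
x = rand_matrix(seq_len, d_model, rng)  # stand-in for embedded input tokens
w_q = rand_matrix(d_model, d_model, rng)
w_k = rand_matrix(d_model, d_model, rng)
w_v = rand_matrix(d_model, d_model, rng)

trace = causal_self_attention(x, w_q, w_k, w_v)
for name, value in trace.items():
    print(name, len(value), "x", len(value[0]))
```

Each entry of trace is a plain list of lists, so it can be logged or passed to any plotting tool after every sample or epoch. Training would still need hand-written gradients (or raw tensors with autograd) on top of this.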