How do I learn PyTorch… in a deeper manner?

So I've been using PyTorch for experiments for the past two years.
I can get others' implementations up and running in a reasonable time frame.
I know how to deal with data loading, inference, and logging, and can implement something like a vanilla transformer, CycleGAN, etc. from scratch.
But I always have a feeling that I'm playing in shallow water: I know how to get things up and running, but nothing else.
As someone who is currently studying in the deep learning field (and will be for at least 3 more years), I wish to better understand and use PyTorch.

Right now I just dig into some keywords about PyTorch here and there, but I wonder: is there anything like “Fluent Python” (but for PyTorch) to better guide me in this deeper learning process?

Welcome all advice in general.

Hi Grand!

There are a lot of ways to go about this, and I’m sure that others will have a variety
of suggestions. I’m a fundamentals kind of guy, so my advice will focus on
the basics.

General programming and python programming are obviously a requirement. You
can of course learn them as you are learning pytorch, but don’t neglect them. As
you are working with pytorch, make a point of continuing to sharpen your general
and python programming skills.

Study the details of pytorch’s Tensors – these are the workhorses of pytorch.
Experiment with some simple individual tensors. Use the python features id(),
type(), type().mro(), and dir() to see some of what’s going on under the
hood.
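
For instance, a quick interactive session along these lines (the tensor itself is just a throwaway example):

import torch

t = torch.zeros(3)
id(t)            # the identity of the python object wrapping the tensor
type(t)          # <class 'torch.Tensor'>
type(t).mro()    # the class hierarchy sitting behind Tensor
dir(t)           # the (long) list of attributes and methods a Tensor exposes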

Use pytorch features to probe tensors. For example, you could try:

import torch

# a small contiguous tensor, its transpose, and a row slice
t = torch.arange(6).byte().reshape(2, 3)
tt = t.T    # a transposed view -- shares t's storage
ts = t[1]   # the second row -- also a view into t's storage

t.is_contiguous()       # True
tt.is_contiguous()      # False -- transposing reorders the strides
ts.is_contiguous()      # True

t.stride()              # (3, 1)
tt.stride()             # (1, 3)
ts.stride()             # (1,)

t.storage_offset()      # 0
tt.storage_offset()     # 0
ts.storage_offset()     # 3 -- the row starts three elements into the storage

t.untyped_storage()     # all three tensors share the same underlying storage
tt.untyped_storage()
ts.untyped_storage()

In general, drill down from time to time to see what pytorch is actually doing. Pick
your spots – you can’t do everything – but look for opportunities to explore the
details.

Study the various ways you can “restructure” tensors – things like view(),
reshape(), permute(), and indexing and slicing of tensors, including “advanced
tensor indexing.”
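
A rough sketch of the kind of experiments worth running here (the shapes are arbitrary):

import torch

t = torch.arange(24).reshape(2, 3, 4)

t.view(6, 4)             # reinterprets the same storage (requires contiguity)
t.reshape(4, 6)          # like view(), but will copy if it has to
t.permute(2, 0, 1)       # reorders dimensions without moving any data
t.permute(2, 0, 1).contiguous().view(4, 6)    # permuted tensors need contiguous() before view()

t[0, 1]                  # basic indexing gives a view
t[:, 1:3]                # slicing also gives a view into the original storage
t[[0, 1], [2, 0]]        # "advanced" indexing with index lists returns a copy
t[t > 10]                # boolean-mask indexing also returns a copy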

[Edit] Understand how broadcasting works and how to use squeeze() and
unsqueeze() to align tensor dimensions properly. Look at how matmul() performs
different kinds of tensor-tensor multiplication depending on the dimensions of its
input tensors. Note that matmul() subsumes things like mm(), dot(), and bmm().
I generally use matmul() (which can be written with the infix operator @), but you
might prefer to use one of the more specialized tensor-multiplication functions if you
feel that it helps stylistically to make your code more readable. Study how einsum()
performs “contractions” over multiple tensors – it’s the “swiss-army knife” for
performing more-or-less any tensor-tensor “multiplication,” including contractions
over more than two tensors.
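
Again, just a sketch of the sort of things to try (the shapes here are made up):

import torch

a = torch.randn(5, 3)
b = torch.randn(3)

# broadcasting: unsqueeze() adds a size-1 dimension so shapes line up
a + b                    # (5, 3) + (3,) broadcasts b across the rows
a + b.unsqueeze(0)       # the same thing, spelled out explicitly
col = torch.randn(5)
a + col.unsqueeze(1)     # (5, 3) + (5, 1) broadcasts across the columns

# matmul() picks the right kind of product from the shapes of its inputs
m = torch.randn(3, 4)
a @ m                    # (5, 3) @ (3, 4) -> (5, 4), like mm()
b @ b                    # (3,) @ (3,) -> a scalar, like dot()
batch = torch.randn(10, 5, 3)
batch @ m                # batched matrix multiply, like bmm(), with m broadcast

# einsum() expresses the same contractions (and more) with index notation
torch.einsum('ij,jk->ik', a, m)          # matrix-matrix product
torch.einsum('i,i->', b, b)              # dot product
torch.einsum('bij,jk->bik', batch, m)    # batched product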

Tensor operations, autograd, and the gradient-descent-based optimizers are how
pytorch trains models.

Use pytorch tensor operations – but no autograd nor built-in optimizers – to find
the minimum of a quartic polynomial using gradient descent. Then do it again with
autograd and a built-in optimizer.
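
A minimal sketch of what this exercise might look like (the particular quartic, starting point, and learning rate are just illustrative):

import torch

# minimize f(x) = x**4 - 3 * x**2 + x by gradient descent

# version 1: tensor operations only, with a hand-derived gradient
x = torch.tensor(2.0)
lr = 0.01
for _ in range(1000):
    grad = 4 * x**3 - 6 * x + 1    # f'(x), worked out by hand
    x = x - lr * grad
print(x.item())

# version 2: the same thing with autograd and a built-in optimizer
x = torch.tensor(2.0, requires_grad=True)
opt = torch.optim.SGD([x], lr=0.01)
for _ in range(1000):
    opt.zero_grad()
    loss = x**4 - 3 * x**2 + x
    loss.backward()
    opt.step()
print(x.item())    # both versions should land at the same local minimum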

Compute the gradient of something like:

torch.nn.Sequential(
    torch.nn.Linear(2, 2), torch.nn.ReLU(), torch.nn.Linear(2, 1)
)

“by hand” (using pytorch tensor operations) and compare your result with that
produced by autograd (that is, by calling .backward()).
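
As a sketch of what the comparison might look like for the gradient of the first Linear’s weight (the chain-rule expression here is just one way to write it out):

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(2, 2), torch.nn.ReLU(), torch.nn.Linear(2, 1)
)
x = torch.randn(1, 2)

# autograd's answer for the first Linear's weight
out = model(x)
out.backward()
grad_autograd = model[0].weight.grad

# the same gradient "by hand" via the chain rule
with torch.no_grad():
    w0, b0 = model[0].weight, model[0].bias
    w2 = model[2].weight
    z = x @ w0.T + b0              # pre-activation of the first Linear
    relu_mask = (z > 0).float()    # derivative of ReLU
    # d(out) / d(w0[i, j]) = w2[0, i] * relu'(z[0, i]) * x[0, j]
    grad_by_hand = (w2 * relu_mask).T @ x

print(torch.allclose(grad_autograd, grad_by_hand))    # should be True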

Compute a plain-vanilla gradient-descent step for some simple model and
compare it to the result given by SGD.
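
For example, something like this (a tiny made-up model just for the comparison):

import torch

# one hand-rolled gradient-descent step on a tiny model, compared with SGD
model = torch.nn.Linear(3, 1)
x, y = torch.randn(4, 3), torch.randn(4, 1)
lr = 0.1

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

# what a plain-vanilla step p <- p - lr * grad would give
expected = {name: p - lr * p.grad for name, p in model.named_parameters()}

# what SGD (no momentum, no weight decay) actually does
torch.optim.SGD(model.parameters(), lr=lr).step()

for name, p in model.named_parameters():
    print(name, torch.allclose(p, expected[name]))    # True for weight and bias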

Implement your own version of Linear as a custom module (with a custom
torch.autograd.Function supplying its forward() and backward() methods), and
compare its results with those of pytorch’s Linear.
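
A skeletal sketch of one way to set this up (the names are my own; the backward formulas assume the usual y = x @ W.T + b convention):

import torch

class MyLinearFunction(torch.autograd.Function):
    # the forward() / backward() pair lives in a custom autograd Function
    @staticmethod
    def forward(ctx, input, weight, bias):
        ctx.save_for_backward(input, weight)
        return input @ weight.T + bias

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors
        grad_input = grad_output @ weight
        grad_weight = grad_output.T @ input
        grad_bias = grad_output.sum(dim=0)
        return grad_input, grad_weight, grad_bias

class MyLinear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, input):
        return MyLinearFunction.apply(input, self.weight, self.bias)

# compare the forward results against pytorch's Linear with the same parameters
x = torch.randn(4, 3)
mine, ref = MyLinear(3, 2), torch.nn.Linear(3, 2)
with torch.no_grad():
    ref.weight.copy_(mine.weight)
    ref.bias.copy_(mine.bias)
print(torch.allclose(mine(x), ref(x)))

# and the gradients as well
mine(x).sum().backward()
ref(x).sum().backward()
print(torch.allclose(mine.weight.grad, ref.weight.grad))

torch.autograd.gradcheck() is also handy for verifying a hand-written backward().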

Experiment with some of the basic loss functions such as CrossEntropyLoss
and MSELoss. Consider implementing some of your own versions just for fun.
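
For instance (the shapes and the number of classes are arbitrary):

import torch

logits = torch.randn(4, 5)              # a batch of 4 samples, 5 classes
targets = torch.tensor([0, 2, 1, 4])

# built-in cross entropy vs. a hand-rolled version
builtin = torch.nn.CrossEntropyLoss()(logits, targets)
log_probs = torch.log_softmax(logits, dim=1)
by_hand = -log_probs[torch.arange(4), targets].mean()
print(torch.allclose(builtin, by_hand))

# built-in mean-squared error vs. a hand-rolled version
pred, target = torch.randn(4, 3), torch.randn(4, 3)
print(torch.allclose(torch.nn.MSELoss()(pred, target), ((pred - target)**2).mean()))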

Use pytorch (autograd, appropriate loss function, and an optimizer) to perform a
linear and/or quadratic regression of some (potentially synthetically generated)
dataset.
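
A compact sketch, with synthetic data whose true slope and intercept are made up here:

import torch

# synthetic data: y = 2 * x - 1, plus a little noise
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 2 * x - 1 + 0.1 * torch.randn_like(x)

model = torch.nn.Linear(1, 1)
loss_fn = torch.nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(500):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print(model.weight.item(), model.bias.item())    # should end up near 2 and -1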

Learn what requires_grad = True does, what leaf variables are (look at a
tensor’s .is_leaf property), how autograd’s “computation graph” is built and
works, what an “inplace modification” is, and track a tensor’s ._version property
as you modify that tensor inplace a couple of times.
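
For example:

import torch

# a leaf tensor: created directly by you, with requires_grad=True
w = torch.ones(3, requires_grad=True)
print(w.is_leaf)     # True

# an intermediate tensor: produced by an operation, so not a leaf
y = w * 2
print(y.is_leaf)     # False
print(y.grad_fn)     # the node autograd recorded in its computation graph

# inplace modifications bump the tensor's version counter
t = torch.zeros(3)
print(t._version)    # 0
t.add_(1)            # inplace add
print(t._version)    # 1
t[0] = 5             # indexed assignment is also an inplace modification
print(t._version)    # 2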

These are the basic building blocks of how pytorch trains models. If you’re a
fundamentals kind of guy, you will want to understand all of the above in some
detail. If you do, you’ll have a solid foundation.

Pytorch has a lot of stuff in it. So after this, you will have to pick and choose which
pieces you want to learn in greater depth. Perhaps you’re applying pytorch to a
particular task and you’re running some task-specific code that you found on the
internet. If something looks fishy to you – or just looks interesting – use that as an
excuse to drill down into it along the lines described above for exploring the basics.

Even if you don’t have a reason to use any of pytorch’s “built-in” models (for
example, things like resnet18) for any of your real use cases, it would still be
worthwhile to experiment with some of them. This should give you exposure to
(hopefully) best practices for building more complicated models. Look at the
architecture of a built-in model and see if you can modify it – for example, modifying
or adding a layer or changing an activation – by performing “surgery” on the model
after it’s been instantiated, rather than editing the source code or rewriting it from
scratch. This will help you learn how the pieces of the model are hooked together
and how to use the run-time flexibility of python (and pytorch) to your advantage.
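
As a sketch of such “surgery” (assuming a reasonably recent torchvision; fc and relu are the attribute names resnet18 currently uses):

import torch
import torchvision

model = torchvision.models.resnet18(weights=None)
print(model)    # inspect the architecture and the layer names

# "surgery" after instantiation: swap the final classifier for a 10-class head
model.fc = torch.nn.Linear(model.fc.in_features, 10)

# or swap an activation (this catches only the top-level ReLU; the ones inside
# the residual blocks would need a recursive walk over named_modules())
model.relu = torch.nn.LeakyReLU()

out = model(torch.randn(1, 3, 224, 224))
print(out.shape)    # torch.Size([1, 10])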

Good luck!

K. Frank

Thx Frank! I really appreciate this!