tl;dr: I’m considering building a budget machine for tinkering with LLMs, but I’m not sure whether this is a good idea, or how best to go about it.
For context: I work in a university department. I currently have access to a 2080 Ti on a shared machine, and we’re in the process of acquiring a small server with two L40 cards, so I’ll be able to run any larger experiments there.
However, I would like a small machine of my own for tinkering: trying out different models and techniques, just playing around, and preparing larger experiments to be run on the server. My focus is on teaching and education, not on state-of-the-art research.
Aiming for a good amount of VRAM, the 4060 Ti 16 GB seems like the most obvious choice; I also like its low power requirements (for both energy and cooling). But the card seems to have a poor reputation overall, as far as I can tell mostly because of its narrow 128-bit memory bus, and memory bandwidth is what limits token generation. I’m also not sure where the current sweet spot for CPU and memory is; I’ve completely lost track of Intel’s and AMD’s generations over the last few years.
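To get a feel for what fits in 16 GB versus 24 GB, here’s a rough back-of-envelope estimate (the 20% runtime overhead for activations and KV cache is my own loose assumption, not a measured number):

```python
# Back-of-envelope VRAM estimate for LLM inference.
# overhead=1.2 is a loose assumption covering activations and the
# KV cache; real usage varies with context length and runtime.

def vram_gb(params_billion: float, bytes_per_param: float,
            overhead: float = 1.2) -> float:
    """Approximate VRAM (GiB) to hold the weights plus runtime overhead."""
    return params_billion * 1e9 * bytes_per_param * overhead / 2**30

for n in (7, 13, 34):
    print(f"{n:2d}B  fp16: {vram_gb(n, 2.0):5.1f} GB   "
          f"8-bit: {vram_gb(n, 1.0):5.1f} GB   "
          f"4-bit: {vram_gb(n, 0.5):5.1f} GB")
```

By this estimate, a 7B model in fp16 already brushes against 16 GB, while 4-bit quantization fits even a 13B model comfortably on either card.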
For the most VRAM for the money, an AMD 7900 XTX 24 GB would also be an interesting alternative. I know that PyTorch supports ROCm, but the consensus still seems to be to stick with Nvidia for ML/AI, at least for the time being. What are the community’s opinions and experiences with ROCm in practice?
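In case it helps anyone weighing the same choice, this is the sanity check I’d run on a ROCm box; as far as I know, PyTorch’s ROCm wheels reuse the familiar torch.cuda API, so the usual calls work unchanged on an AMD card:

```python
# Quick sanity check for a PyTorch ROCm build. The ROCm wheels
# expose the GPU through the torch.cuda API; the "cuda" device
# string maps to HIP on AMD hardware.
import torch

print("PyTorch:", torch.__version__)
print("ROCm/HIP build:", torch.version.hip is not None)

if torch.cuda.is_available():
    # On a ROCm build this reports the AMD GPU, e.g. the 7900 XTX.
    print("Device:", torch.cuda.get_device_name(0))
    x = torch.randn(1024, 1024, device="cuda")
    print("Matmul OK:", (x @ x).shape)
else:
    print("No GPU backend available")
```

If that runs, the basic PyTorch workflow looks the same as on Nvidia; from what I’ve read, the rough edges tend to be in driver setup and in libraries beyond PyTorch rather than in this core workflow, but I’d be glad to hear first-hand experiences.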