Resources
Some resources I’ve found interesting or useful (all are free)
- Eugene Yan: List of Applied ML Papers/tech blogs on ML in production
- Eugene Yan: Language Modeling Reading List (to Start Your Paper Club)
- How Prototyping Can Help You to Get Buy-In
- Chip Huyen: Building LLM applications for production
- Chip Huyen: Llama Police – A dashboard of Open Source LLM Tools
- Eugene Yan’s List of Open-source LLMs
- Erik Linder-Norén: Machine Learning From Scratch: Python implementations of fundamental algorithms from scratch
- Stas Bekman - Machine Learning Engineering
- Excerpt:
Tell how many GPUs you need in 5 secs
- Training in mixed half-precision: model_size_in_B * 18 * 1.25 / gpu_size_in_GB
- Inference in half precision: model_size_in_B * 2 * 1.25 / gpu_size_in_GB
That’s the minimum; you’ll need more for a bigger batch size and a longer sequence length. Here is the breakdown:
- Training: 8 bytes for AdamW states, 4 bytes for grads, 4+2 bytes for weights
- Inference: 2 bytes for weights (1 byte if you use quantization)
- 1.25 is 25% for activations (very very approximate)
For example: Let’s take an 80B param model and 80GB GPUs and calculate how many of them we will need for:
- Training: at least 23 GPUs (80 * 18 * 1.25 / 80 = 22.5)
- Inference: at least 3 GPUs (80 * 2 * 1.25 / 80 = 2.5)
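The rule of thumb above is easy to turn into a tiny Python helper (the function name and signature are my own, not from the book):

```python
import math

def gpus_needed(model_size_in_B, gpu_size_in_GB, bytes_per_param, activation_overhead=1.25):
    """Minimum GPU count: params (billions) * bytes/param * activation overhead / GPU memory (GB)."""
    return math.ceil(model_size_in_B * bytes_per_param * activation_overhead / gpu_size_in_GB)

# 80B-param model on 80GB GPUs
print(gpus_needed(80, 80, 18))  # training (mixed half-precision, AdamW): 23
print(gpus_needed(80, 80, 2))   # inference (half precision): 3
```

Pass `bytes_per_param=1` for quantized inference, per the breakdown above.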
- Excerpt: