Optimized vLLM configuration for nvidia/Gemma-4-31B-IT-NVFP4 on two RTX PRO 6000 Blackwell GPUs (no NVLink), with Gemma 4's native Multi-Token Prediction (MTP) drafter (google/gemma-4-31B-it-assistant ...
You are free to share (copy and redistribute) this article in any medium or format, and to adapt (remix, transform, and build upon) the material for any purpose, even commercially, within the parameters ...