
Self-Hosted AI Coding Agent on a Tesla V100 Server GPU — VRAM Optimization & Speculative Decoding
Converting a Tesla V100 server GPU to PCIE for a self-hosted AI coding agent using llama.cpp, with VRAM optimization, quantization, and speculative decoding to achieve 100 tokens/s at just 100W.



