0xSero/turboquant

TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration

Python

Stars: 1.3k (+3 today, +8/wk, +62/mo)

Forks: 161

Issues: 12

Watchers: 1.3k

Repository Info

License: GPL-3.0

Created: Mar 25, 2026

Last push: Mar 27, 2026

Snapshot History

Date	Stars	Forks	Issues
May 8, 2026	1.3k	161	12
May 7, 2026	1.3k	161	11
May 6, 2026	1.3k	161	11
May 5, 2026	1.3k	160	12
Apr 7, 2026	831	101	8