psmarter/mini-infer

LLM inference engine from scratch — paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, OpenAI-compatible serving

Python · continuous-batching · cuda · inference · inference-engine · kv-cache · +11 more

Stars

202

+2 today · +2 /wk · +6 /mo

Forks

Issues

Watchers

202

Repository Info

License: MIT

Created: Dec 30, 2025

Last push: 4/24/2026

Homepage: smarter.xin/

Snapshot History

Date	Stars	Forks
May 7, 2026	202	10
May 5, 2026	201	10
Apr 16, 2026	178	9
Apr 15, 2026	174	9
Mar 24, 2026	103	6
Mar 23, 2026	103	6