AttentionVC

Built by JustinZ and Jennie

psmarter/mini-infer

LLM inference engine from scratch — paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, OpenAI-compatible serving

Language: Python
Topics: continuous-batching, cuda, inference, inference-engine, kv-cache (+11 more)
Stars: 202 (+2 today, +2/wk, +6/mo)
Forks: 10
Issues: 0
Watchers: 202


Repository Info

License: MIT
Created: Dec 30, 2025
Last push: Apr 24, 2026
Homepage: smarter.xin/

Snapshot History

Date            Stars   Forks   Issues
May 7, 2026       202      10        0
May 5, 2026       201      10        0
Apr 16, 2026      178       9        0
Apr 15, 2026      174       9        0
Mar 24, 2026      103       6        0
Mar 23, 2026      103       6        0