AttentionVC

Built by JustinZ and Jennie

microsoft/LLMLingua

[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.
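The description above is about budget-driven prompt compression: rank prompt tokens by estimated importance and keep only enough to fit a target budget. The sketch below is a toy illustration of that idea only — it is NOT LLMLingua's actual algorithm (which scores tokens with a small language model's perplexity); word frequency stands in as a crude importance signal, and `compress_prompt` here is a hypothetical helper, not the library's API.

```python
# Toy sketch of budget-driven prompt compression (illustration only).
# LLMLingua itself ranks tokens by a small LM's perplexity; here we use
# word rarity as a stand-in "importance" score.
from collections import Counter

def compress_prompt(prompt: str, ratio: float = 0.5) -> str:
    """Keep the rarest (assumed most informative) words within the budget."""
    words = prompt.split()
    budget = max(1, int(len(words) * ratio))
    freq = Counter(words)
    # Rank positions by rarity (low frequency = assumed high information),
    # then restore the kept words in their original order.
    ranked = sorted(range(len(words)), key=lambda i: freq[words[i]])
    keep = sorted(ranked[:budget])
    return " ".join(words[i] for i in keep)

prompt = "the cat sat on the mat while the dog watched the cat"
compressed = compress_prompt(prompt, ratio=0.5)
# → "sat on mat while dog watched"
```

The real library exposes a compressor class whose usage is documented in the repo's README; consult it rather than this sketch for actual parameters and behavior.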

Language: Python
Stars: 5.9k (+3 today, +10/wk, +31/mo)
Forks: 352
Issues: 108
Watchers: 5.9k


Repository Info

License: MIT
Created: Jul 7, 2023
Last push: Oct 28, 2025
Homepage: llmlingua.com/