- VAIBHAV SHANKAR SHARMA's Post
- Optimizing Cache Efficiency for In-memory Key-value Stores
- A Review on Methods to Optimize LLM's KV Cache Consumption
- Synthesizing Recurrence with KV Cache Compression for Efficient ...
- KV cache reuse — tensorrt_llm documentation
- Data Popularity for Cache Eviction Algorithms using Random Forests
- Adaptive KV Cache Compression for LLMs (Oral)
- 2D Management of KV-Cache in LLM Inference via Layer-wise ...
- Optimizing KV Cache Eviction by Adaptive Budget Allocation ...
VAIBHAV SHANKAR SHARMA's Post - NACL - LinkedIn
... KV cache eviction policy combining PROXY-TOKENS EVICTION and RANDOM EVICTION. ... optimizes token retention while satisfying cache budget ...
Optimizing Cache Efficiency for In-memory Key-value Stores
It reduces the likelihood of prematurely evicting hot data from the cache. Second, recognizing the distinction between keys and values, we propose ...
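The excerpt points at two ideas: protecting hot data from premature eviction and treating keys and values differently. As a hedged illustration only (not the cited paper's actual design, and with made-up names like `HotAwareCache`), the sketch below keeps small per-key access counters resident even after a value is evicted, so a returning hot key is not mistaken for cold data:

```python
from collections import OrderedDict

class HotAwareCache:
    """Toy in-memory KV cache: values can be evicted, but tiny per-key access
    counters survive eviction so hot keys are less likely to be dropped again.
    Illustrative sketch only, not the design from the cited paper."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # max number of resident values
        self.values = OrderedDict()   # key -> value, maintained in LRU order
        self.hits = {}                # key -> access count, kept after eviction

    def get(self, key):
        self.hits[key] = self.hits.get(key, 0) + 1
        if key in self.values:
            self.values.move_to_end(key)  # refresh LRU position
            return self.values[key]
        return None                       # miss: value evicted or never stored

    def put(self, key, value):
        self.hits[key] = self.hits.get(key, 0) + 1
        self.values[key] = value
        self.values.move_to_end(key)
        while len(self.values) > self.capacity:
            # Among a few of the least-recently-used keys, evict the one with
            # the lowest historical access count, sparing hot data.
            candidates = list(self.values)[:4]
            victim = min(candidates, key=lambda k: self.hits.get(k, 0))
            self.values.pop(victim)
```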
A Review on Methods to Optimize LLM's KV Cache Consumption
... KV Cache, mainly including three methods: Eviction, Merging, and Quantization. ... Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long- ...
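Of the three categories the review names, quantization is the simplest to show concretely. The sketch below is a generic per-tensor symmetric int8 round-trip for a cached key or value tensor in PyTorch; it only illustrates the category and is not any particular method surveyed in the review.

```python
import torch

def quantize_kv(t: torch.Tensor):
    """Per-tensor symmetric int8 quantization of a K or V cache tensor.
    Returns the int8 payload plus the scale needed to dequantize."""
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(t / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# Example: a cached key tensor of shape (heads, seq_len, head_dim)
k = torch.randn(8, 1024, 64)
q, scale = quantize_kv(k)
print(q.element_size() / k.element_size())        # 0.25 -> 4x smaller storage
print((dequantize_kv(q, scale) - k).abs().max())  # worst-case quantization error
```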
Synthesizing Recurrence with KV Cache Compression for Efficient ...
While existing KV cache methods approach this problem by pruning or evicting large swaths of relatively less important KV pairs to dramatically ...
KV cache reuse — tensorrt_llm documentation - GitHub Pages
enableBlockReuse (default: false) allows reuse of previously computed KV cache blocks across requests. This is expected to optimize memory use and computation.
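The flag quoted here is the executor-level option. As a hedged sketch of how block reuse is typically switched on from the Python LLM API (the exact import paths and keyword names are assumptions that vary across TensorRT-LLM releases, and the model name is a placeholder), the configuration looks roughly like this; check the linked documentation for the current API:

```python
# Assumed module paths and argument names; verify against the tensorrt_llm docs.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

kv_cache_config = KvCacheConfig(enable_block_reuse=True)  # reuse KV blocks across requests
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct",       # placeholder model
          kv_cache_config=kv_cache_config)

# Requests sharing a long common prefix (e.g., the same system prompt) can now
# reuse previously computed KV cache blocks instead of recomputing them.
outputs = llm.generate(["Summarize the KV cache reuse feature in one sentence."])
print(outputs[0].outputs[0].text)
```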
NACL: A Robust KV Cache Eviction Framework for Efficient Long ...
This method optimizes token retention while satisfying cache budget constraints. RANDOM EVICTION incorporates randomness into the eviction ...
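Both the LinkedIn summary above and this abstract describe the same two-part policy: score-based retention (PROXY-TOKENS EVICTION) combined with a random component, applied under a fixed cache budget. The sketch below is only a schematic of that combination, keeping the highest-scoring tokens plus a random slice of the rest; the scoring input and the split ratio are my assumptions, not NACL's actual algorithm.

```python
import torch

def evict_kv(importance: torch.Tensor, budget: int, random_frac: float = 0.25):
    """Schematic mix of score-based and random eviction.

    importance:  per-token scores over the current KV cache (higher = keep),
                 e.g. derived from attention paid by a set of proxy tokens.
    budget:      total number of tokens allowed to stay in the cache.
    random_frac: fraction of the budget filled by uniformly sampled tokens,
                 adding robustness against biased importance estimates.
    Returns indices of tokens to retain.
    """
    n = importance.numel()
    if n <= budget:
        return torch.arange(n)
    n_random = int(budget * random_frac)
    n_top = budget - n_random
    keep_top = torch.topk(importance, n_top).indices            # score-based retention
    mask = torch.ones(n, dtype=torch.bool)
    mask[keep_top] = False
    remaining = torch.nonzero(mask, as_tuple=False).squeeze(1)  # candidates for random keep
    keep_rand = remaining[torch.randperm(remaining.numel())[:n_random]]
    return torch.cat([keep_top, keep_rand]).sort().values

# Example: keep 512 of 4096 cached tokens, a quarter of them chosen at random.
scores = torch.rand(4096)
kept = evict_kv(scores, budget=512)
```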
Data Popularity for Cache Eviction Algorithms using Random Forests
... efficient and adaptive cache eviction policy that aligns with our ultimate goal of optimizing ... File size distribution. Average file size: 2.13 GB ...
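As a hedged sketch of the general idea (not this paper's features, labels, or model), the snippet below trains a random forest to predict whether a cached object will be re-requested soon and evicts the objects with the lowest predicted popularity; the feature set and training data are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-object features: [seconds_since_last_access, past_requests, file_size_gb]
X_train = np.array([[10, 50, 0.5], [3600, 2, 4.0], [60, 20, 2.1], [7200, 1, 1.0]])
y_train = np.array([1, 0, 1, 0])  # 1 = re-requested within the next hour (made-up labels)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def evict_least_popular(cached, n_evict):
    """cached: list of (object_id, feature_vector). Returns the n_evict object ids
    with the lowest predicted probability of being requested again."""
    feats = np.array([f for _, f in cached])
    popularity = model.predict_proba(feats)[:, 1]   # P(re-requested soon)
    order = np.argsort(popularity)                  # least popular first
    return [cached[i][0] for i in order[:n_evict]]

print(evict_least_popular([("a", [5, 40, 0.3]), ("b", [5000, 1, 3.2])], n_evict=1))
```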
Adaptive KV Cache Compression for LLMs (Oral) - ICLR 2025
Oral. Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. Suyu Ge · Yunan Zhang · Liyuan Liu · Minjia Zhang · Jiawei Han · Jianfeng Gao.
2D Management of KV-Cache in LLM Inference via Layer-wise ...
... allocation of KV-cache budget among layers on-the-fly and ... KV-cache for each layer with its very own budget. By optimizing the KV-cache ...
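The fragment describes allocating the overall KV-cache budget across layers on the fly, giving each layer its own budget. The sketch below shows one simple way to do that, splitting a total token budget in proportion to a per-layer importance signal (for example, how concentrated each layer's attention is); the proportional rule and the floor guarantee are my assumptions, not the paper's allocator.

```python
import torch

def split_budget_by_layer(layer_scores: torch.Tensor, total_budget: int, floor: int = 8):
    """Allocate a total KV-cache token budget across layers proportionally to
    layer_scores, while guaranteeing each layer at least `floor` tokens.
    layer_scores: 1D tensor with one non-negative importance value per layer."""
    n_layers = layer_scores.numel()
    spendable = total_budget - floor * n_layers          # assumes total_budget >= floor * n_layers
    weights = layer_scores / layer_scores.sum().clamp(min=1e-8)
    budgets = floor + torch.floor(weights * spendable).long()
    budgets[0] += total_budget - int(budgets.sum())      # hand rounding leftovers to layer 0
    return budgets

# Example: 32 layers sharing a 4096-token budget; the scores might come from how
# peaked each layer's attention distribution is on the current prompt.
scores = torch.rand(32)
per_layer = split_budget_by_layer(scores, total_budget=4096)
assert int(per_layer.sum()) == 4096
```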