- VAIBHAV SHANKAR SHARMA's Post
- Optimizing Cache Efficiency for In-memory Key-value Stores
- A Review on Methods to Optimize LLM's KV Cache Consumption
- Synthesizing Recurrence with KV Cache Compression for Efficient ...
- KV cache reuse — tensorrt_llm documentation
- Data Popularity for Cache Eviction Algorithms using Random Forests
- Adaptive KV Cache Compression for LLMs (Oral)
- 2D Management of KV-Cache in LLM Inference via Layer-wise ...
- Optimizing KV Cache Eviction by Adaptive Budget Allocation ...
VAIBHAV SHANKAR SHARMA's Post - NACL - LinkedIn
... KV cache eviction policy combining PROXY-TOKENS EVICTION and RANDOM EVICTION. ... optimizes token retention while satisfying cache budget ...
Optimizing Cache Efficiency for In-memory Key-value Stores
It reduces the likelihood of prematurely evicting hot data from the cache. Second, recognizing the distinction between keys and values, we propose ...
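The excerpt points at two ideas: protecting hot data from premature eviction and treating keys and values differently. As a hedged illustration only (not the cited paper's actual design, and with made-up names like `HotAwareCache`), the sketch below keeps small per-key access counters resident even after a value is evicted, so a returning hot key is not mistaken for cold data:

```python
from collections import OrderedDict

class HotAwareCache:
    """Toy in-memory KV cache: values can be evicted, but tiny per-key access
    counters survive eviction so hot keys are less likely to be dropped again.
    Illustrative sketch only, not the design from the cited paper."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # max number of resident values
        self.values = OrderedDict()   # key -> value, maintained in LRU order
        self.hits = {}                # key -> access count, kept after eviction

    def get(self, key):
        self.hits[key] = self.hits.get(key, 0) + 1
        if key in self.values:
            self.values.move_to_end(key)  # refresh LRU position
            return self.values[key]
        return None                       # miss: value evicted or never stored

    def put(self, key, value):
        self.hits[key] = self.hits.get(key, 0) + 1
        self.values[key] = value
        self.values.move_to_end(key)
        while len(self.values) > self.capacity:
            # Among a few of the least-recently-used keys, evict the one with
            # the lowest historical access count, sparing hot data.
            candidates = list(self.values)[:4]
            victim = min(candidates, key=lambda k: self.hits.get(k, 0))
            self.values.pop(victim)
```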
A Review on Methods to Optimize LLM's KV Cache Consumption
... KV Cache, mainly including three methods: Eviction, Merging, and Quantization. ... Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long- ...
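Of the three categories the review names, quantization is the simplest to show concretely. The sketch below is a generic per-tensor symmetric int8 round-trip for a cached key or value tensor in PyTorch; it only illustrates the category and is not any particular method surveyed in the review.

```python
import torch

def quantize_kv(t: torch.Tensor):
    """Per-tensor symmetric int8 quantization of a K or V cache tensor.
    Returns the int8 payload plus the scale needed to dequantize."""
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(t / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# Example: a cached key tensor of shape (heads, seq_len, head_dim)
k = torch.randn(8, 1024, 64)
q, scale = quantize_kv(k)
print(q.element_size() / k.element_size())        # 0.25 -> 4x smaller storage
print((dequantize_kv(q, scale) - k).abs().max())  # worst-case quantization error
```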
Synthesizing Recurrence with KV Cache Compression for Efficient ...
While existing KV cache methods approach this problem by pruning or evicting large swaths of relatively less important KV pairs to dramatically ...
KV cache reuse — tensorrt_llm documentation - GitHub Pages
enableBlockReuse (default: false) allows reuse of previously computed KV cache blocks across requests. This is expected to optimize memory use and computation.
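The flag quoted here is the executor-level option. As a hedged sketch of how block reuse is typically switched on from the Python LLM API (the exact import paths and keyword names are assumptions that vary across TensorRT-LLM releases, and the model name is a placeholder), the configuration looks roughly like this; check the linked documentation for the current API:

```python
# Assumed module paths and argument names; verify against the tensorrt_llm docs.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

kv_cache_config = KvCacheConfig(enable_block_reuse=True)  # reuse KV blocks across requests
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct",       # placeholder model
          kv_cache_config=kv_cache_config)

# Requests sharing a long common prefix (e.g., the same system prompt) can now
# reuse previously computed KV cache blocks instead of recomputing them.
outputs = llm.generate(["Summarize the KV cache reuse feature in one sentence."])
print(outputs[0].outputs[0].text)
```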
NACL: A Robust KV Cache Eviction Framework for Efficient Long ...
This method optimizes token retention while satisfying cache budget constraints. RANDOM EVICTION incorporates randomness into the eviction ...
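Both the LinkedIn summary above and this abstract describe the same two-part policy: score-based retention (PROXY-TOKENS EVICTION) combined with a random component, applied under a fixed cache budget. The sketch below is only a schematic of that combination, keeping the highest-scoring tokens plus a random slice of the rest; the scoring input and the split ratio are my assumptions, not NACL's actual algorithm.

```python
import torch

def evict_kv(importance: torch.Tensor, budget: int, random_frac: float = 0.25):
    """Schematic mix of score-based and random eviction.

    importance:  per-token scores over the current KV cache (higher = keep),
                 e.g. derived from attention paid by a set of proxy tokens.
    budget:      total number of tokens allowed to stay in the cache.
    random_frac: fraction of the budget filled by uniformly sampled tokens,
                 adding robustness against biased importance estimates.
    Returns indices of tokens to retain.
    """
    n = importance.numel()
    if n <= budget:
        return torch.arange(n)
    n_random = int(budget * random_frac)
    n_top = budget - n_random
    keep_top = torch.topk(importance, n_top).indices            # score-based retention
    mask = torch.ones(n, dtype=torch.bool)
    mask[keep_top] = False
    remaining = torch.nonzero(mask, as_tuple=False).squeeze(1)  # candidates for random keep
    keep_rand = remaining[torch.randperm(remaining.numel())[:n_random]]
    return torch.cat([keep_top, keep_rand]).sort().values

# Example: keep 512 of 4096 cached tokens, a quarter of them chosen at random.
scores = torch.rand(4096)
kept = evict_kv(scores, budget=512)
```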
Data Popularity for Cache Eviction Algorithms using Random Forests
... efficient and adaptive cache eviction policy that aligns with our ultimate goal of optimizing ... File size distribution. Average file size: 2.13 GB ...
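As a hedged sketch of the general idea (not this paper's features, labels, or model), the snippet below trains a random forest to predict whether a cached object will be re-requested soon and evicts the objects with the lowest predicted popularity; the feature set and training data are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-object features: [seconds_since_last_access, past_requests, file_size_gb]
X_train = np.array([[10, 50, 0.5], [3600, 2, 4.0], [60, 20, 2.1], [7200, 1, 1.0]])
y_train = np.array([1, 0, 1, 0])  # 1 = re-requested within the next hour (made-up labels)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def evict_least_popular(cached, n_evict):
    """cached: list of (object_id, feature_vector). Returns the n_evict object ids
    with the lowest predicted probability of being requested again."""
    feats = np.array([f for _, f in cached])
    popularity = model.predict_proba(feats)[:, 1]   # P(re-requested soon)
    order = np.argsort(popularity)                  # least popular first
    return [cached[i][0] for i in order[:n_evict]]

print(evict_least_popular([("a", [5, 40, 0.3]), ("b", [5000, 1, 3.2])], n_evict=1))
```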
Adaptive KV Cache Compression for LLMs (Oral) - ICLR 2025
Oral. Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. Suyu Ge · Yunan Zhang · Liyuan Liu · Minjia Zhang · Jiawei Han · Jianfeng Gao.
2D Management of KV-Cache in LLM Inference via Layer-wise ...
... allocation of KV-cache budget among layers on-the-fly and ... KV-cache for each layer with its very own budget. By optimizing the KV-cache ...
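The fragment describes allocating the overall KV-cache budget across layers on the fly, giving each layer its own budget. The sketch below shows one simple way to do that, splitting a total token budget in proportion to a per-layer importance signal (for example, how concentrated each layer's attention is); the proportional rule and the floor guarantee are my assumptions, not the paper's allocator.

```python
import torch

def split_budget_by_layer(layer_scores: torch.Tensor, total_budget: int, floor: int = 8):
    """Allocate a total KV-cache token budget across layers proportionally to
    layer_scores, while guaranteeing each layer at least `floor` tokens.
    layer_scores: 1D tensor with one non-negative importance value per layer."""
    n_layers = layer_scores.numel()
    spendable = total_budget - floor * n_layers          # assumes total_budget >= floor * n_layers
    weights = layer_scores / layer_scores.sum().clamp(min=1e-8)
    budgets = floor + torch.floor(weights * spendable).long()
    budgets[0] += total_budget - int(budgets.sum())      # hand rounding leftovers to layer 0
    return budgets

# Example: 32 layers sharing a 4096-token budget; the scores might come from how
# peaked each layer's attention distribution is on the current prompt.
scores = torch.rand(32)
per_layer = split_budget_by_layer(scores, total_budget=4096)
assert int(per_layer.sum()) == 4096
```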