Grouped Query Attention Performance Theoretical Analysis
Lei Mao's Log Book
2025-02-03 02:28:15
Sharing Key and Value Tensors for a Group of Query Tensors to Reduce Transformer Attention Layer Memory IO Pressure