[BugFix] Fix NaN errors in paged attention kernel by WoosukKwon · Pull Request #936 · vllm-projec...
Fixes #641. This PR fixes the paged attention kernel. Currently, the kernel computes `attn_weight * value` for all tokens in a value block, even if some of them are not included in the context. It is ...
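
The description above is cut off, so the following is only a minimal sketch of one way the described behavior can produce NaN. It assumes the out-of-context slots of a value block hold uninitialized memory that happens to be NaN, and that their attention weights are zero. Under IEEE 754, `0 * NaN` is still NaN, so a single such slot poisons the whole weighted sum. The function names, the block size of 4, and the masking approach are hypothetical illustrations, not the kernel's actual code or necessarily the fix this PR applies.

```python
import math

# Hypothetical setup: one value block with 4 slots, of which only the
# first 3 tokens belong to the context. The last slot stands in for
# uninitialized memory and is assumed to contain NaN.
context_len = 3
values = [1.0, 2.0, 3.0, float("nan")]
weights = [0.2, 0.3, 0.5, 0.0]  # out-of-context token has zero weight


def weighted_sum_unmasked(weights, values):
    """Multiply every slot in the block, including out-of-context ones."""
    return sum(w * v for w, v in zip(weights, values))


def weighted_sum_masked(weights, values, context_len, block_start=0):
    """Treat values at token positions >= context_len as zero before multiplying."""
    total = 0.0
    for i, (w, v) in enumerate(zip(weights, values)):
        if block_start + i >= context_len:
            v = 0.0  # ignore whatever garbage the slot contains
        total += w * v
    return total


unmasked = weighted_sum_unmasked(weights, values)
masked = weighted_sum_masked(weights, values, context_len)

print("unmasked is NaN:", math.isnan(unmasked))
print("masked is NaN:", math.isnan(masked))
```

In this sketch the zero weight alone does not protect the result; the out-of-context value has to be excluded or zeroed explicitly before the multiplication.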