SORRY-Bench: Systematically Evaluating Large Language Model Safety...
Evaluating the ability of aligned large language models (LLMs) to recognize and reject unsafe user requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts, however, face...