Publication

SALAD-Bench

A Hierarchical and Comprehensive Safety Benchmark for Large Language Models

arXiv GitHub Bilibili YouTube Data MD-Judge Leaderboard
AstroWind Hero Image

Compact Taxonomy with Hierarchical Levels

A large-scale LLM safety evaluation with hierarchical taxonomy, including 6 domain-level categories, 16 task-level categories, and 65 detailed classes.

Enhanced Difficulty and Complexity

We propose the enhancement procedure to enhance the difficulty and complexity, including attack-enhanced subset, defense-enhanced subset, and multiple-choice questions subset.

Reliable and Seamless Evaluator

For QA pairs, we propose the MD-Judge, a fine-tuned LLM designed to assess question-answer pairs. And for MCQs, we propose the MCQ-Judge, which employs in-context learning and regex parsing to identify unsafe choices from the options provided.

Joint-Purpose Utility

Comprehensive evaluation for LLMs, attacking methods and defense methods.

Leaderboard

AstroWind Hero Image