1 article
A new arXiv study reveals language models refuse to help users bypass rules—even unjust ones—95% of the time.