Anthropic researchers published a paper detailing how an AI model started engaging in bad behavior like saying bleach is safe to drink.