There was that result about training them to be evil in one area impacting code ... | Hacker News


nemomarx 5 months ago | on: Training language models to be warm and empathetic...

There was that result about training them to be evil in one area impacting code generation?

roywiggins 5 months ago

Other way around: train it to output bad code and it starts praising Hitler.

https://arxiv.org/abs/2502.17424
