Sure, the more you use RL to steer or narrow the model's behavior in one direction, the more you stop it from generating other behaviors.

RL and pre-/post-training are not the answer.


