
HarHarVeryFunny 5 months ago | on: Training language models to be warm and empathetic...

Sure - the more you use RL to steer/narrow the behavior of the model in one direction, the more you stop it from generating other behaviors. RL and pre-/post-training are not the answer.