The current state of the art is RLHF (reinforcement learning from human feedback...

theptip on Feb 15, 2023, on: Bing: “I will not harm you unless you harm me firs...

The current state of the art is RLHF (reinforcement learning from human feedback): the model is initially trained to complete human utterances, then fine-tuned to maximize a reward derived from human feedback on whether the completion was "helpful" etc. https://huggingface.co/blog/rlhf
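
To make the feedback step a bit more concrete: in a common setup (an assumption here, not something spelled out in the comment), human raters compare two completions and a reward model is trained so that the preferred one scores higher. A minimal sketch of that pairwise loss follows; the function name and the example scores are hypothetical, and real systems compute this over batches of model outputs rather than plain floats.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise (Bradley-Terry style) loss: -log(sigmoid(chosen - rejected)).

    Hypothetical helper for illustration only. The loss is small when the
    human-preferred completion already gets the higher reward.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up reward scores: the first pair agrees with the rater, the second does not.
agrees = preference_loss(2.0, 0.0)
disagrees = preference_loss(0.0, 2.0)
print(agrees < disagrees)
```

The fine-tuning stage then adjusts the language model to produce completions that such a reward model scores highly; that part is usually done with a policy-gradient method and is not shown here.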