Reinforcement learning from human feedback (RLHF) can teach LLMs a variety of interesting skills. For example, Sparrow, a chatbot developed by @DeepMind, is taught (via RLHF) to support its factual claims by finding relevant information on Google... 🧵 [1/7]