Training Agents with Foundation Models Workshop
University of Cambridge, UK
2024-08-09
BFMs conventionally infer \(z\) from \(D_{\text{train}}\) via:
\(z \approx \mathbb{E}_{(s_t, a_t, s_{t+1}) \sim D_{\text{train}}} \big[R(s_t, a_t, s_{t+1}) B(s_t, a_t, s_{t+1})\big]\)
With a reward correction \({\color{green}{\Delta R}}\) added, this becomes:
\(z \approx \mathbb{E}_{(s_t, a_t, s_{t+1}) \sim D_{\text{train}}} \big[(R(s_t, a_t, s_{t+1}) + {\color{green}{\Delta R}}) B(s_t, a_t, s_{t+1})\big]\)
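The two expectations above can be approximated by a sample mean over transitions drawn from \(D_{\text{train}}\). The sketch below is a minimal illustration, not the authors' implementation. It assumes \(B\) returns a \(d\)-dimensional vector per transition and that \(\Delta R\) is available as one scalar per transition. How \(\Delta R\) is computed is not covered here. The function name `infer_z` and the toy arrays are hypothetical.

```python
import numpy as np


def infer_z(rewards, backward_embeddings, delta_r=None):
    """Sample-mean estimate of z ~= E[(R + dR) * B] over a batch of transitions.

    rewards:             shape (n,), R(s_t, a_t, s_{t+1}) for each transition
    backward_embeddings: shape (n, d), B(s_t, a_t, s_{t+1}) for each transition
    delta_r:             optional shape (n,), assumed per-transition correction dR
    """
    r = np.asarray(rewards, dtype=float)
    b = np.asarray(backward_embeddings, dtype=float)
    if delta_r is not None:
        # Second equation: shift each reward by its correction term.
        r = r + np.asarray(delta_r, dtype=float)
    # Weight each embedding by its (corrected) reward, then average over the batch.
    return (r[:, None] * b).mean(axis=0)


# Toy batch of three transitions with d = 2 (illustrative values only).
rewards = [1.0, 0.0, 2.0]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
delta_r = [0.0, 1.0, -1.0]

z_plain = infer_z(rewards, embeddings)               # first equation
z_corrected = infer_z(rewards, embeddings, delta_r)  # second equation
```

Passing `delta_r=None` recovers the conventional estimate, so the same routine covers both cases.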
✅ Contextual BFMs can return performant policies for dynamics not seen during training
❌ In our experiments, this required large classifier training sets (\(>10^6\) transitions)
Twitter: @enjeeneer
Website: https://enjeeneer.io
Scott Jeen | https://enjeeneer.io