4 Years in 4 (ish) Minutes

Reflections on a PhD

Scott Jeen

University of Cambridge

Agenda

  1. Zero-Shot Reinforcement Learning: Why?
  2. With No Prior Data
  3. From Low Quality Data
  4. Under Changed Dynamics
  5. Outlook
  6. Reflections

Reinforcement Learning: Why?

  • Many of society’s problems can be cast as sequential decision-making problems:
    - Micro: drug discovery
    - Micro: energy generation control (fusion, fission, wind)
    - Micro: a teacher interacting sequentially with a student
    - Macro: climate policy
    - Macro: legislative innovation
    - Macro: science
  • Reinforcement learning is (as yet) the most effective computational approach to solving sequential decision-making problems

RL + Simulators + Compute = Superhuman

RL + Learned Simulators + Compute = Meh?

  • In the absence of a ground-truth simulator, we can learn one from data
  • In practice, this means collecting data from our problem’s environment, and building a model that simulates its dynamics
  • But these models will only ever be approximations of the real world
  • A gap between these learned simulators and the real world is therefore inevitable: the system cannot see the real world in advance

Zero-Shot Reinforcement Learning

  • Adapting quickly to the real world is the primary concern of zero-shot reinforcement learning methods
  • Impressive progress has been made when the gap between the learned simulator and the real world is small
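  • I contend that, to solve real-world problems, these methods need to deal with a larger gap and satisfy three constraints: learning with no prior data, from low-quality data, and under changed dynamics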

Paper 1: Data Quality Constraint

Paper 2a: Dynamics Constraint

Paper 2b: Dynamics Constraint

Paper 3: With No Prior Data (for building control)

Reflections

Things I did well

  • I hit the rock every day (the stonecutter’s creed)
  • I got formal feedback (peer review) as often as I could
  • I focussed on problems, not solutions (latterly)
  • I didn’t follow the zeitgeist

Things I didn’t do well

  • I should have spoken to (and collaborated with) more people, especially at conferences
  • I could’ve written up my work more frequently
  • I could’ve followed the zeitgeist more
  • I didn’t spot quickly enough that the building control work was a dead end (I was too focussed on solutions, not problems)
  • I wasted time in the first 18 months

Things I’ll take away (and you might too)

  • Your idea can always be more general
  • Your feedback loop can always be tighter (i.e. you can always interact with the real world more regularly)
  • You can always speak to more people
  • You can always write up your work (even if nobody reads it!)
  • There’s a lower bound on the number of attempts it takes to solve a hard problem
  • You can always work quicker (sorry)
  • You can untie any knot if you play with it long enough (provided physics lets you untie it)

Thanks!