4 Years in 4 (ish) Minutes

Agenda

Zero-Shot Reinforcement Learning: Why?
With No Prior Data
From Low Quality Data
Under Changed Dynamics
Outlook
Reflections

Reinforcement Learning: Why?

Society faces many problems, many of which can be cast as sequential decision-making problems

- Micro: drug discovery

- Micro: energy generation control (fusion, fission, wind)

- Micro: teacher interacting sequentially with a student

- Macro: climate policy

- Macro: legislative innovation

- Macro: science

Reinforcement learning is, as yet, the most effective computational approach to solving sequential decision-making problems

RL + Simulators + Compute = Superhuman

RL + Learned Simulators + Compute = Meh?

In the absence of a ground-truth simulator, we can learn one from data

In practice, this means collecting data from our problem’s environment, and building a model that simulates its dynamics

But these models will only ever be approximations of the real world

So a gap between these learned simulators and the real world is inevitable: the system cannot see the real world in advance.
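A minimal sketch of that gap, under loud assumptions: the "real world" here is a made-up one-dimensional system (`real_step`), the learned simulator is a plain least-squares linear fit, and every name and constant is hypothetical rather than taken from the papers. It only illustrates how a model fitted on a narrow slice of data can track reality there and drift away elsewhere.

```python
import random

def real_step(x, u):
    """Hypothetical 'real world' dynamics with a mild nonlinearity (an assumption)."""
    return 0.9 * x + 0.5 * u - 0.05 * x ** 3

def collect_data(n, seed=0):
    """Sample transitions (x, u, x_next) from a narrow region of the state space."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-0.5, 0.5)
        u = rng.uniform(-1.0, 1.0)
        data.append((x, u, real_step(x, u)))
    return data

def fit_linear_model(data):
    """Least-squares fit of x_next ~ a*x + b*u via the 2x2 normal equations."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b

def rollout_gap(a, b, x0, controls):
    """Largest absolute difference between the learned and real trajectories."""
    x_real = x_model = x0
    gap = 0.0
    for u in controls:
        x_real = real_step(x_real, u)
        x_model = a * x_model + b * u
        gap = max(gap, abs(x_real - x_model))
    return gap

# Learn the simulator from data gathered near x = 0.
a, b = fit_linear_model(collect_data(200))

# Roll out inside the region the data covered, then well outside it.
in_dist = rollout_gap(a, b, 0.2, [0.0] * 20)
out_dist = rollout_gap(a, b, 2.0, [1.0] * 20)
print(f"gap inside data region: {in_dist:.4f}, outside: {out_dist:.4f}")
```

The linear fit is deliberately too simple for the cubic term; a richer model would shrink the gap but, under the same argument, not remove it.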

Zero-Shot Reinforcement Learning

Adapting quickly to the real world is the primary concern of zero-shot reinforcement learning methods

Impressive progress has been made when the gap between the learned simulator and the real world is small

I contend that, to solve real-world problems, these methods need to deal with a larger gap and satisfy additional constraints:

Paper 1: Data Quality Constraint

Paper 2a: Dynamics Constraint

Paper 2b: Dynamics Constraint

Paper 3: With No Prior Data (for building control)

Reflections

Things I did well

I hit the rock every day (stonecutter’s credo)

I got formal feedback (peer review) as often as I could

I focussed on problems, not solutions (latterly)

I didn’t follow the zeitgeist

Reflections

Things I didn’t do well

I should have spoken to (and collaborated with) more people, especially at conferences

I could’ve written up my work more frequently

I could’ve followed the zeitgeist more

I didn’t spot quickly enough that the building control work was a dead end (I was too focussed on solutions, not problems)

I wasted time in the first 18 months

Things I’ll take away (and you might too)

Your idea can always be more general

Your feedback loop can always be tighter (i.e. you can always interact with the real world more regularly)

You can always speak to more people

You can always write up your work (even if nobody reads it!)

There’s a lower bound on the number of attempts it takes to solve a hard problem

You can always work quicker (sorry)

You can untie any knot if you play with it long enough (and physics will let you untie it)

Thanks!