Sample Algo
(October 2020)
1 Introduction
1: Input: distribution over tasks
2: Input: two learning rates
3: Randomly initialize the dynamic model and the policy
4: Initialize the data buffer
5: while not done do
6:   Sample a batch of tasks from the task distribution
7:   for all tasks in the batch do
8:     Sample trajectories using the policy and split them into two sets
9:     Update
10:     Update
11:   end for
12:
13:
14: end while
15: Output: meta-learned dynamic model and meta-policy parameters
16: Output: data buffer
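
The listing above does not spell out its update rules, so the following Python sketch is only one possible reading of its loop structure. It assumes a toy scalar task family (s' = a*s + u), a one-parameter dynamic model w, a linear policy u = -k*s, and first-order gradient-based meta-updates; none of these choices come from the listing. The names rollout, model_grad, policy_grad, meta_train, alpha, and beta are hypothetical.

```python
import random

def rollout(a, k, steps, rng):
    """Toy task s' = a*s + u with exploratory action u = -k*s + Gaussian noise."""
    s, data = rng.uniform(-1.0, 1.0), []
    for _ in range(steps):
        u = -k * s + rng.gauss(0.0, 0.3)
        s_next = a * s + u
        data.append((s, u, s_next))
        s = max(-1.0, min(1.0, s_next))  # clip so the toy rollout stays bounded
    return data

def model_grad(w, data):
    """Gradient w.r.t. w of the mean squared error of the model s' ~ w*s + u."""
    return sum(2.0 * (w * s + u - s_next) * s for s, u, s_next in data) / len(data)

def policy_grad(k, w, data):
    """Gradient w.r.t. k of the model-predicted cost ((w - k) * s)**2."""
    return sum(-2.0 * (w - k) * s * s for s, _, _ in data) / len(data)

def meta_train(iterations=400, batch=4, alpha=0.1, beta=0.2, seed=0):
    rng = random.Random(seed)
    # Randomly initialize the dynamic model (w) and the policy (k).
    w, k = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    buffer = []  # data buffer
    for _ in range(iterations):
        # Assumed task distribution: a ~ Uniform(0.5, 0.9).
        tasks = [rng.uniform(0.5, 0.9) for _ in range(batch)]
        grad_w = grad_k = 0.0
        for a in tasks:
            data = rollout(a, k, 20, rng)
            train, held_out = data[:10], data[10:]  # split into two sets
            buffer.append(data)
            # Assumed per-task updates: one gradient step each.
            w_i = w - alpha * model_grad(w, train)
            k_i = k - alpha * policy_grad(k, w_i, train)
            # First-order meta-gradients evaluated on the held-out set.
            grad_w += model_grad(w_i, held_out)
            grad_k += policy_grad(k_i, w_i, held_out)
        # Assumed meta-updates after the per-task loop.
        w -= beta * grad_w / batch
        k -= beta * grad_k / batch
    return w, k, buffer
```

Calling meta_train() returns the meta-learned model parameter, the meta-policy parameter, and the filled buffer, mirroring the three outputs of the listing. In this toy setting both parameters should settle inside the range of the task coefficients, since a policy gain that matches the model coefficient drives the predicted next state to zero.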
1: Input: meta-learned dynamic model and a learning rate
2: Input: meta-policy parameters and a learning rate
3: Input: data buffer
4: Randomly initialize the dynamic model and the policy
5: Initialize the data buffer
6: while not done do
7:   Randomly choose a batch from the data buffer
8:   for all items in the batch do
9:     while not done do
10:       Use to update corresponding to by:
11:     end while
12:     Sample a batch of data using the dynamic model and the policy
13:     Update
14:   end for
15:   Update
16: end while
17: Output: meta-policy
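
As a rough illustration of the second listing, the Python sketch below adapts a dynamic model to each buffer entry, generates data from the adapted model and the policy, and then updates the policy. It is a sketch under assumptions: a scalar model s' ~ w*s + u, a linear policy u = -k*s, states in [-1, 1], a fixed number of inner steps in place of "while not done", and first-order gradient updates. The names model_grad, imagine, policy_grad, improve_policy, model_lr, and policy_lr are hypothetical, and the update rules themselves are not taken from the listing.

```python
import random

def model_grad(w, data):
    """Gradient w.r.t. w of the mean squared error of the model s' ~ w*s + u."""
    return sum(2.0 * (w * s + u - s_next) * s for s, u, s_next in data) / len(data)

def imagine(w, k, states):
    """Model-generated transitions under the policy u = -k*s (no real environment)."""
    return [(s, -k * s, w * s - k * s) for s in states]

def policy_grad(imagined):
    """Gradient w.r.t. k of the mean predicted cost s_pred**2 (d s_pred / d k = -s)."""
    return sum(-2.0 * s_pred * s for s, _, s_pred in imagined) / len(imagined)

def improve_policy(w, k, buffer, model_lr=0.5, policy_lr=0.2,
                   outer_steps=200, batch=4, inner_steps=10, seed=0):
    rng = random.Random(seed)
    for _ in range(outer_steps):
        meta_grad = 0.0
        for data in rng.sample(buffer, batch):  # random batch from the data buffer
            # Adapt the dynamic model to this buffer entry.
            w_i = w
            for _ in range(inner_steps):
                w_i -= model_lr * model_grad(w_i, data)
            # Generate data from the adapted model and the current policy.
            states = [rng.uniform(-1.0, 1.0) for _ in range(16)]
            k_i = k - policy_lr * policy_grad(imagine(w_i, k, states))
            # First-order meta-gradient at the adapted policy.
            meta_grad += policy_grad(imagine(w_i, k_i, states))
        k -= policy_lr * meta_grad / batch  # meta-policy update
    return k
```

A call such as improve_policy(0.7, 0.0, buffer) expects buffer to be a list of transition lists of the form (s, u, s_next) with at least batch entries. Only model-generated data enters the policy update, which is the point of this step: the real environment is not queried here.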