The document discusses a new reinforcement learning algorithm called Learned Policy Gradient (LPG). LPG learns to optimize an agent's policy and value function directly from interaction with the environment, without requiring domain knowledge or manual engineering. The setup has two components: an agent parameterized by θ that outputs a policy and a prediction vector, and a learned update rule parameterized by η that discovers how to update the agent's θ from its experience. Experiments show that LPG learns effective policies across different environments, generalizes to new tasks, and, after being trained only on simple grid worlds, achieves better performance than baseline algorithms on Atari games.
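The two-level structure described above can be sketched in code. The following is a minimal, illustrative toy in NumPy, not the paper's actual architecture: the real LPG uses an LSTM meta-network and backpropagates through the agent's learning process, while here the meta-network η is just a hypothetical linear map from per-step features to update targets, and all sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes (the real LPG operates on full environments).
N_STATES, N_ACTIONS, PRED_DIM = 4, 2, 3

# Agent parameters theta: policy logits and a prediction vector per state.
theta = {
    "policy_logits": rng.normal(size=(N_STATES, N_ACTIONS)),
    "predictions": np.zeros((N_STATES, PRED_DIM)),
}

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def agent_output(theta, state):
    """The agent maps a state to a policy pi(.|s) and a prediction vector y(s)."""
    pi = softmax(theta["policy_logits"][state])
    y = theta["predictions"][state]
    return pi, y

# Meta-parameters eta: here, a simple linear map from per-step features
# (reward, done flag, current predictions) to update targets.
eta = rng.normal(size=(2 + PRED_DIM, 1 + PRED_DIM)) * 0.1

def lpg_targets(eta, reward, done, y):
    """The learned update rule maps trajectory data to targets for the agent."""
    feats = np.concatenate([[reward, float(done)], y])
    out = feats @ eta
    return out[0], out[1:]  # scalar policy target, vector prediction target

def agent_update(theta, eta, state, action, reward, done, lr=0.1):
    """Inner-loop step: eta tells the agent how to adjust its parameters."""
    pi, y = agent_output(theta, state)
    pi_hat, y_hat = lpg_targets(eta, reward, done, y)
    # Gradient of log pi(a|s) with respect to this state's logits.
    grad_logits = -pi
    grad_logits[action] += 1.0
    theta["policy_logits"][state] += lr * pi_hat * grad_logits
    theta["predictions"][state] += lr * (y_hat - y)
    return theta

# One illustrative inner-loop update on a fake transition.
theta = agent_update(theta, eta, state=0, action=1, reward=1.0, done=False)
```

In the full method, an outer loop would additionally optimize η across many environments so that agents trained by this update rule achieve high return; that meta-optimization is omitted here.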