Understanding Learned Reward Functions

[2012.05862] Understanding Learned Reward Functions - arXiv

In this paper, we investigate techniques for interpreting learned reward functions. In particular, we apply saliency methods to identify failure modes.

Understanding Learned Reward Functions - People @EECS

Inverse Reinforcement Learning (IRL) infers a reward function R for which observed user demonstrations are optimal. Other reward learning techniques learn a ...

Explaining Learned Reward Functions with Counterfactual ... - arXiv

Title: Explaining Learned Reward Functions with Counterfactual Trajectories ... Abstract: Learning rewards from human behaviour or feedback is a ...

Explaining Learned Reward Functions with Counterfactual ...

We propose Counterfactual Trajectory Explanations (CTEs) to interpret reward functions in Reinforcement Learning by contrasting an original and ...

DYNAMICS-AWARE COMPARISON OF LEARNED REWARD ...

Experiments suggest that DARD better reflects reward function similarity than does EPIC in comparing learned reward functions in two simulated, physics-based ...

Dynamics-Aware Comparison of Learned Reward Functions

The ability to learn reward functions plays an important role in enabling the deployment of intelligent agents in the real world. However, comparing reward ...

Poster: Understanding Learned Reward Functions - NeurIPS 2024

How learning reward functions can go wrong - Towards Data Science

In other words, a reward learning process gives a rule by which the agent forms its belief about the correct reward function given the actions ...

On The Fragility of Learned Reward Functions - ResearchGate

Reward functions are notoriously difficult to specify, especially for tasks with complex goals. Reward learning approaches attempt to infer ...

REVEALE: Reward Verification and Learning Using Explanations

The agent explains its reward function and the human signals whether the explanation passes the verification test. When the explanation is rejected, the agent ...

Real-World DRL: 5 Essential Reward Functions for Modeling ...

IRL involves learning the reward function based on observed behavior. This is particularly useful when the exact reward structure is difficult ...

Preprocessing Reward Functions for Interpretability | Adam Gleave

Existing work has applied general-purpose interpretability tools to understand learned reward functions. We propose exploiting the intrinsic structure of ...

On The Fragility of Learned Reward Functions - Semantic Scholar

This work demonstrates with experiments in tabular and continuous control environments that the severity of relearning failures can be sensitive to changes ...

Reward shaping — Mastering Reinforcement Learning

Reward shaping: If rewards are sparse, we can modify or augment our reward function to reward behaviour that we think moves us closer to the solution.
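
As a rough illustration of the idea in this snippet, here is a minimal sketch of one common form of reward shaping, potential-based shaping, where a term gamma * potential(next_state) - potential(state) is added to a sparse reward. The grid world, the goal cell, the negative-Manhattan-distance potential, and every function name below are assumptions made for this example; none of them come from the linked page.

```python
from typing import Callable, Tuple

State = Tuple[int, int]  # hypothetical grid-world state: (row, column)

GOAL: State = (3, 3)  # assumed goal cell for this toy example


def sparse_reward(state: State, next_state: State) -> float:
    """Sparse base reward: 1 only on the transition that reaches the goal."""
    return 1.0 if next_state == GOAL else 0.0


def neg_manhattan(state: State) -> float:
    """Assumed potential: negative Manhattan distance to the goal."""
    return -float(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))


def make_shaped_reward(
    base_reward: Callable[[State, State], float],
    potential: Callable[[State], float],
    gamma: float,
) -> Callable[[State, State], float]:
    """Return base_reward plus the shaping term gamma * phi(s') - phi(s)."""

    def shaped(state: State, next_state: State) -> float:
        bonus = gamma * potential(next_state) - potential(state)
        return base_reward(state, next_state) + bonus

    return shaped


# gamma = 1.0 keeps the arithmetic in this toy example exact.
shaped = make_shaped_reward(sparse_reward, neg_manhattan, gamma=1.0)

toward = shaped((0, 0), (0, 1))  # one step closer to the goal
away = shaped((0, 1), (0, 0))    # one step further from the goal
arrive = shaped((3, 2), (3, 3))  # the step that reaches the goal
```

In this sketch a step toward the goal earns a positive shaped reward and a step away earns a negative one, even though the base reward is zero for both, so the agent gets a learning signal before it ever reaches the goal.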

HumanCompatibleAI/interpreting-rewards: Experiments in ... - GitHub

This repository accompanies the paper Understanding Learned Reward Functions by Eric J. Michaud, Adam Gleave and Stuart Russell.

How to Make a Reward Function in Reinforcement Learning?

Before defining the reward function, it's essential to clearly understand the goal of the agent. Ask yourself: ... For example, if you are working ...

How does one learn a reward function in Reinforcement ... - Quora

Reward functions are generally not learned, but part of the input to the agent. Formally, RL tackles the problem of finding solutions to Markov decision ...

How to make a reward function in reinforcement learning?

Reward functions describe how the agent "ought" to behave. In other words, they have "normative" content, stipulating what you want the ...

Reinforcement learning - Wikipedia

For reinforcement learning in psychology, see Reinforcement and Operant conditioning. Reinforcement learning (RL) is an interdisciplinary area of machine ...

Any references on how to build and evaluate reward functions?

Different tabular algorithms like SARSA and Q-Learning have slightly different reward functions, but they mostly revolve around solving Bellman ...