Safe Exploration is an approach to collect ground truth data by safely interacting with the environment. Source: Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems
(Image credit: Papersgraph)
These leaderboards are used to track progress in Safe Exploration.
No benchmarks available.
Use these libraries to find Safe Exploration models and implementations.
No datasets available.
No subtasks available.
This work addresses the problem of deploying a reinforcement learning agent on a physical system, such as a datacenter cooling unit or a robot, where critical constraints must never be violated, and directly adds to the policy a safety layer that analytically solves an action-correction formulation for each state.
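The per-state action correction described above can be sketched as a projection of the proposed action onto a linearized safety constraint. This is a minimal illustration, assuming a single constraint of the form g·a + c ≤ 0 (the function name and arguments are illustrative, not the paper's API):

```python
import numpy as np

def safety_layer(action, g, c):
    """Project a proposed action onto the half-space g . a + c <= 0.

    Closed-form solution of  min ||a' - action||^2  s.t.  g . a' + c <= 0,
    for one linearized safety constraint (an assumption of this sketch).
    """
    violation = g @ action + c
    if violation <= 0.0:
        return action  # already safe: no correction needed
    # Lagrange multiplier of the single active constraint
    lam = violation / (g @ g)
    return action - lam * g
```

For example, with g = [1, 0] and c = 0.5, the proposed action [1, 0] violates the constraint and is corrected to [-0.5, 0], which lies exactly on the constraint boundary.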
The feasible actor-critic (FAC) algorithm is introduced: the first model-free constrained RL method that considers statewise safety, i.e., safety for each initial state, with theoretical guarantees that FAC outperforms previous expectation-based constrained RL methods in both constraint satisfaction and reward optimization.
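Statewise safety can be contrasted with a single expectation-based constraint by giving each state its own Lagrange multiplier. The following is a simplified, hypothetical sketch of one dual-ascent step for one state's multiplier (FAC itself learns a multiplier network; names here are illustrative):

```python
def statewise_lagrangian(reward_q, cost_q, lam, cost_limit, lr=0.1):
    """One projected dual-ascent step of a statewise Lagrange multiplier.

    reward_q, cost_q: Q-value estimates at the current state-action.
    lam: the multiplier for THIS state (statewise, not one global scalar).
    Returns the Lagrangian policy objective and the updated multiplier.
    """
    objective = reward_q - lam * cost_q
    # multiplier grows while this state's expected cost exceeds its limit
    lam_new = max(0.0, lam + lr * (cost_q - cost_limit))
    return objective, lam_new
```

Because the multiplier is per state, a state whose cost estimate exceeds the limit is penalized even when the constraint holds on average over initial states.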
A suite of reinforcement learning environments illustrating various safety properties of intelligent agents, including safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries, is presented.
The preliminary findings indicate that the approach, based on the empirical cumulative distribution function (ECDF), can provide a basis for detecting whether the application context of an ML component is valid in safety- and security-critical settings.
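For reference, the ECDF mentioned above is straightforward to compute from a sample; this is a minimal, generic sketch unrelated to the paper's specific pipeline:

```python
import numpy as np

def ecdf(sample):
    """Return the empirical cumulative distribution function of a 1-D sample.

    F(x) = fraction of observations <= x.
    """
    xs = np.sort(np.asarray(sample))
    def F(x):
        return np.searchsorted(xs, x, side="right") / len(xs)
    return F
```

Comparing the ECDF of inputs seen at runtime against the ECDF of the training data is one common way to flag that a model is operating outside its valid context.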
The generalization experiments, conducted on both procedurally generated and real-world scenarios, show that increasing the diversity and size of the training set improves the RL agent's generalizability.
A novel algorithm is developed and proven able to completely explore the safely reachable part of the MDP without violating the safety constraint; it is demonstrated on digital terrain models for the task of exploring an unknown map with a rover.
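The core idea of exploring only the safely reachable part of the state space can be sketched as growing a certified-safe set: a neighboring state is added only when Lipschitz continuity of the safety function guarantees it stays above a threshold. A minimal sketch, with illustrative names and a dictionary-based terrain model rather than the paper's actual formulation:

```python
def expand_safe_set(safe, value, dist, h_min, L):
    """One expansion step of the provably safe set.

    A candidate state sp is added when some already-safe state s certifies it:
    value[s] - L * dist[s][sp] >= h_min, so by L-Lipschitz continuity of the
    safety function, value[sp] >= h_min is guaranteed before visiting sp.
    """
    new_safe = set(safe)
    for s in safe:
        for sp, d in dist.get(s, {}).items():
            if value[s] - L * d >= h_min:
                new_safe.add(sp)
    return new_safe
```

Iterating this step until the set stops growing yields the safely reachable region; states that cannot be certified from any safe neighbor are never visited.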
A list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function, an objective function that is too expensive to evaluate frequently, or undesirable behavior during the learning process, is presented.
This paper presents a learning-based model predictive control scheme that can provide provable high-probability safety guarantees, and exploits regularity assumptions on the dynamics, in the form of a Gaussian process prior, to construct provably accurate confidence intervals on predicted trajectories.
This paper presents a learning-based model predictive control scheme that provides high-probability safety guarantees throughout the learning process, and constructs provably accurate confidence intervals on predicted trajectories based on a reliable statistical model.
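The confidence intervals used by the learning-based MPC schemes above come from a Gaussian process posterior: the predicted mean plus or minus a scaled posterior standard deviation. A minimal one-dimensional sketch with an RBF kernel (the papers' statistical models and scaling factors are more elaborate; all names here are illustrative):

```python
import numpy as np

def gp_posterior(X, y, x_star, lengthscale=1.0, noise=1e-3):
    """GP posterior mean and std at query points x_star (1-D RBF kernel)."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)
    K = k(X, X) + noise * np.eye(len(X))        # kernel matrix + noise
    k_star = k(X, x_star)                       # cross-covariances
    mu = k_star.T @ np.linalg.solve(K, y)       # posterior mean
    var = k(x_star, x_star).diagonal() - np.einsum(
        "ij,ij->j", k_star, np.linalg.solve(K, k_star))
    return mu, np.sqrt(np.maximum(var, 0.0))
```

A high-probability interval on the prediction is then mu ± beta * sigma for a confidence scaling beta (e.g. beta = 2); the MPC scheme keeps the whole interval, not just the mean, inside the safe region.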
This work introduces a new learning method for contextual bandit problems, the Safe Exploration Algorithm (SEA), which overcomes the drawbacks of existing approaches: it never performs worse than the baseline policy and does not harm the user experience, while still exploring the action space and thus being able to find an optimal policy.
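The "never worse than the baseline" behavior can be sketched as a guarded action choice: take the exploratory action only when a high-confidence lower bound on its reward is at least the baseline's value, and fall back to the baseline otherwise. This is a simplified reading of the safety condition, with hypothetical function arguments:

```python
def sea_choose_action(context, explore_policy, baseline_policy,
                      lower_bound, baseline_value):
    """Guarded exploration for a contextual bandit (illustrative sketch).

    explore_policy / baseline_policy: map a context to an action.
    lower_bound(context, action): high-confidence lower bound on reward.
    baseline_value(context): estimated reward of the baseline's action.
    """
    a_explore = explore_policy(context)
    if lower_bound(context, a_explore) >= baseline_value(context):
        return a_explore          # provably no worse than the baseline
    return baseline_policy(context)  # fall back: protect user experience
```

As reward estimates tighten with more data, the lower bound rises and the guard permits exploration more often, so the policy can still converge toward an optimal one.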