Multi-Objective Reinforcement Learning
Inspired by problems faced during lead optimization in medicinal chemistry, this work extends the MolDQN model with multi-objective reinforcement learning, maximizing drug-likeness while maintaining similarity to the original molecule.
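A minimal sketch of the kind of scalarized reward such an extension implies: a weighted combination of a drug-likeness score and similarity to the lead molecule. The `qed` and `tanimoto_similarity` functions below are hypothetical stand-ins (e.g. for RDKit-style scorers), not the paper's implementation.

```python
# Hedged sketch: scalarize drug-likeness and similarity-to-lead with a
# preference weight w.  Both scoring functions are placeholder stubs.

def qed(molecule: str) -> float:
    """Placeholder drug-likeness score in [0, 1] (e.g. RDKit's QED)."""
    return 0.7  # stub value for illustration

def tanimoto_similarity(molecule: str, reference: str) -> float:
    """Placeholder fingerprint similarity in [0, 1]."""
    return 0.9  # stub value for illustration

def multi_objective_reward(molecule: str, reference: str, w: float = 0.5) -> float:
    """Weighted-sum scalarization of the two objectives, w in [0, 1]."""
    return w * qed(molecule) + (1.0 - w) * tanimoto_similarity(molecule, reference)

print(multi_objective_reward("CCO", "CCN", w=0.6))
```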
This work proposes a multi-objective Q-network whose outputs are conditioned on the relative importance of objectives, and introduces Diverse Experience Replay (DER) to counter the inherent non-stationarity of the dynamic weights setting.
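A minimal sketch of a preference-conditioned Q-network of this kind, assuming the objective weights are simply concatenated to the state; the layer sizes and conditioning scheme are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ConditionedQNetwork(nn.Module):
    """Q-network whose input is the state concatenated with the current
    objective-weight vector, so a single network covers many preferences."""

    def __init__(self, state_dim: int, n_objectives: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_objectives, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one scalarized Q-value per action
        )

    def forward(self, state: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, weights], dim=-1))

q = ConditionedQNetwork(state_dim=8, n_objectives=2, n_actions=4)
values = q(torch.randn(1, 8), torch.tensor([[0.3, 0.7]]))  # Q-values under w = (0.3, 0.7)
```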
A generalized version of the Bellman equation is proposed to learn a single parametric representation for optimal policies over the space of all possible preferences in MORL, with the goal of enabling few-shot adaptation to new tasks.
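A sketch of what such a generalized (envelope-style) backup can look like for a single preference vector, with the maximization taken over both next actions and a set of sampled preferences; the array shapes and weight-sampling scheme are assumptions for illustration.

```python
import numpy as np

def envelope_backup(reward_vec, next_q, w, gamma=0.99):
    """One generalized Bellman backup for preference w.

    next_q has shape (n_sampled_weights, n_actions, n_objectives): vector
    Q-values of the next state under several candidate preferences.  The
    target maximizes the scalarized value over actions *and* sampled
    preferences, then keeps the corresponding vector value.
    """
    scalarized = next_q @ w                        # (n_sampled_weights, n_actions)
    idx = np.unravel_index(np.argmax(scalarized), scalarized.shape)
    return reward_vec + gamma * next_q[idx]        # vector-valued target

w = np.array([0.4, 0.6])
next_q = np.random.rand(3, 4, 2)                   # 3 sampled weights, 4 actions, 2 objectives
target = envelope_backup(np.array([1.0, 0.0]), next_q, w)
```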
This work proposes Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori.
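A highly simplified sketch of the outer loop such a method implies: train a scalarized deep RL agent per candidate preference and keep the vector returns that are best for some weight. Real DOL selects the next weight via optimistic linear support (corner weights) rather than the fixed grid used here, and `train_scalarized_agent` is a hypothetical stand-in.

```python
import numpy as np

def train_scalarized_agent(w):
    """Hypothetical stand-in for training a deep RL agent on the scalar
    reward w·r and returning the vector of expected returns it achieves."""
    return np.random.rand(len(w))  # placeholder result

# Sweep a fixed grid of preferences (the real algorithm chooses corner
# weights of the current upper bound instead of a grid).
candidates = []
for w0 in np.linspace(0.0, 1.0, 5):
    w = np.array([w0, 1.0 - w0])
    candidates.append((w, train_scalarized_agent(w)))

# Keep only solutions that are best under at least their own weight vector.
coverage_set = [(w, v) for w, v in candidates
                if all(w @ v >= w @ u for _, u in candidates)]
```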
This work proposes an Anchor-changing Regularized Natural Policy Gradient (ARNPG) framework that systematically incorporates ideas from well-performing first-order methods into the design of policy optimization algorithms for multi-objective MDP problems.
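The anchor idea can be illustrated with a generic KL-anchored policy update, in which each step pulls the policy toward exponentiated Q-values while staying close to an anchor distribution; this is a simplified mirror-descent-style sketch, not the exact ARNPG update.

```python
import numpy as np

def kl_anchored_update(anchor_policy, q_values, eta=0.1):
    """One KL-regularized policy step: pi_new(a) ∝ anchor(a) * exp(eta * Q(a)).
    The anchor would be changed across iterations in an ARNPG-style scheme."""
    logits = np.log(anchor_policy + 1e-12) + eta * q_values
    unnorm = np.exp(logits - logits.max())         # subtract max for stability
    return unnorm / unnorm.sum()

anchor = np.full(4, 0.25)                          # uniform anchor over 4 actions
q = np.array([1.0, 0.5, 0.0, -0.5])
new_policy = kl_anchored_update(anchor, q, eta=0.5)
```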
This work introduces MO-Gym, an extensible library containing a diverse set of multi-objective reinforcement learning environments that extends the widely used OpenAI Gym API, allowing reuse of algorithms and features that are well established in the reinforcement learning community.
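A minimal usage sketch assuming a Gym-style multi-objective API in which `step` returns a NumPy reward vector with one entry per objective. The import name, environment id, and the exact `step` return signature below are assumptions; check the library's documentation for the actual interface.

```python
import numpy as np
import mo_gym  # assumed package name for the MO-Gym library

env = mo_gym.make("deep-sea-treasure-v0")  # example environment id (assumption)
obs = env.reset()
preference = np.array([0.8, 0.2])          # e.g. weight on treasure vs. time penalty

done = False
while not done:
    action = env.action_space.sample()
    # The reward is a vector, one component per objective; the tuple layout
    # follows the newer Gym API and may differ by library version.
    obs, vector_reward, terminated, truncated, info = env.step(action)
    scalar_reward = preference @ vector_reward     # scalarize for a standard agent
    done = terminated or truncated
```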
This work proposes a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally derived prioritization schemes that improve sample-efficient learning, and empirically shows that the method outperforms state-of-the-art MORL algorithms on challenging multi-objective tasks with both discrete and continuous state and action spaces.
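A tabular sketch of GPI action selection, the building block the summary refers to: act greedily with respect to the best scalarized value that any previously learned policy assigns. Array shapes are assumptions for illustration.

```python
import numpy as np

def gpi_action(q_sets, state_idx, w):
    """Generalized Policy Improvement: given vector Q-tables for a set of
    learned policies, pick the action with the highest scalarized value
    over all of them.

    q_sets: array of shape (n_policies, n_states, n_actions, n_objectives)
    """
    scalarized = q_sets[:, state_idx] @ w          # (n_policies, n_actions)
    best_over_policies = scalarized.max(axis=0)    # (n_actions,)
    return int(best_over_policies.argmax())

q_sets = np.random.rand(3, 10, 4, 2)               # 3 policies, 10 states, 4 actions, 2 objectives
action = gpi_action(q_sets, state_idx=0, w=np.array([0.5, 0.5]))
```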
This work proves that the value function transforms smoothly under a transformation of the reward-function weights (and thus admits a smooth interpolation in policy space), and that this interpolation can provide robust value estimates for sampled states and actions in both discrete and continuous domains.
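A toy illustration of the intuition for a fixed policy: the scalarized value w·V is linear in the preference weights, so moving the weights along a segment interpolates the value smoothly. The vector return below is arbitrary example data, not taken from the paper.

```python
import numpy as np

V = np.array([3.0, 1.5])                            # example vector return of a fixed policy
w_a, w_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

for t in np.linspace(0.0, 1.0, 5):
    w = (1.0 - t) * w_a + t * w_b                   # interpolate the preference weights
    print(round(t, 2), float(w @ V))                # scalarized value varies smoothly with t
```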
This paper proposes a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences over objectives in a scale-invariant way: it computes an improved action distribution per objective and uses supervised learning to fit a parametric policy to a combination of these distributions.
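A sketch of the two steps the summary describes, in a tabular setting: compute a per-objective improved action distribution by reweighting the old policy with exponentiated Q-values (a per-objective temperature expresses the preference in a scale-invariant way), then combine the distributions as the target for a supervised policy-fitting step. The temperatures, combination rule, and array shapes are illustrative assumptions.

```python
import numpy as np

def improved_distribution(pi_old, q_k, epsilon_k):
    """Per-objective improvement: reweight the old policy by exp(Q_k / eps_k)."""
    unnorm = pi_old * np.exp(q_k / epsilon_k)
    return unnorm / unnorm.sum()

pi_old = np.full(4, 0.25)                           # uniform old policy over 4 actions
q_per_objective = [np.array([1.0, 0.0, 0.5, 0.2]),
                   np.array([0.0, 2.0, 0.1, 0.3])]
epsilons = [0.5, 0.5]                               # per-objective preferences / temperatures

targets = [improved_distribution(pi_old, q, e)
           for q, e in zip(q_per_objective, epsilons)]
combined_target = np.mean(targets, axis=0)          # simple combination (an assumption)
# A parametric policy would then be fit to `combined_target` with supervised
# learning (e.g. a cross-entropy / KL objective).
```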