The acrobot system includes two joints and two links, where the joint between the two links is actuated. Initially, the links are hanging downwards, and the goal is to swing the end of the lower link up to a given height.
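The goal condition described above can be sketched as a small height check. This is a minimal illustration, not a definitive implementation: the unit link lengths, the target height of 1.0, the angle convention (first angle measured from the downward vertical, second angle relative to the first link), and the function names `tip_height` and `goal_reached` are all assumptions made for this example and are not stated on this page.

```python
import math

# Assumed values for illustration only; the page does not specify them.
LINK_LENGTH_1 = 1.0   # length of the upper link
LINK_LENGTH_2 = 1.0   # length of the lower link
TARGET_HEIGHT = 1.0   # height the tip must exceed, measured from the fixed pivot


def tip_height(theta1, theta2):
    """Height of the free end of the lower link relative to the fixed pivot.

    theta1: angle of the upper link from the downward vertical (radians).
    theta2: angle of the lower link relative to the upper link (radians).
    With both angles at zero the links hang straight down.
    """
    return (-LINK_LENGTH_1 * math.cos(theta1)
            - LINK_LENGTH_2 * math.cos(theta1 + theta2))


def goal_reached(theta1, theta2, target=TARGET_HEIGHT):
    """True once the tip of the lower link is above the target height."""
    return tip_height(theta1, theta2) > target


if __name__ == "__main__":
    print(tip_height(0.0, 0.0))        # links hanging downwards
    print(goal_reached(0.0, 0.0))
    print(goal_reached(math.pi, 0.0))  # links pointing straight up
```

Under these assumptions the hanging start configuration sits at the lowest possible tip height, so any controller has to pump energy into the system through the actuated joint between the links before the check can succeed.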
A conceptual model is constructed that, instead of specifying mechanistic requirements to generate criticality, exploits the maintenance of an organizational structure capable of reproducing critical behavior within a few universality classes of systems independently of their specific mechanisms or topologies.
A simple learning rule is proposed that maintains an internal organizational structure, drawn from a specific family of systems, at criticality. The rule is implemented in artificial embodied agents controlled by a neural network that maintains a correlation structure randomly sampled from an Ising model at critical temperature.
This work explores learning agent-agnostic synthetic environments (SEs) for Reinforcement Learning as a bi-level optimization problem. An SE is represented as a neural network, and its parameters are optimized with Natural Evolution Strategies over a population of SE parameter vectors.
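To make the outer-loop optimizer concrete, the following is a minimal sketch of a Natural Evolution Strategies style update on a toy objective. It is not the paper's method: the fitness function here is a hypothetical quadratic standing in for the "train an agent on the SE, then evaluate it" inner loop, and the function name `nes_maximize` and all hyperparameter values are assumptions chosen for illustration.

```python
import random


def nes_maximize(fitness, theta, iterations=200, population=50,
                 sigma=0.1, lr=0.05, seed=0):
    """Maximize `fitness` with a basic Gaussian-perturbation NES estimator.

    Each iteration samples a population of perturbed parameter vectors,
    scores them, and moves theta along the score-weighted average noise.
    """
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iterations):
        # One Gaussian noise vector per population member.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(population)]
        scores = [fitness([t + sigma * e for t, e in zip(theta, eps)])
                  for eps in noises]
        # Subtract the mean score as a baseline to reduce variance.
        baseline = sum(scores) / population
        for j in range(dim):
            grad_j = sum((s - baseline) * eps[j]
                         for s, eps in zip(scores, noises))
            theta[j] += lr * grad_j / (population * sigma)
    return theta


def toy_fitness(params):
    """Hypothetical stand-in objective with its maximum at (1, 1)."""
    return -sum((p - 1.0) ** 2 for p in params)


if __name__ == "__main__":
    print(nes_maximize(toy_fitness, [0.0, 0.0]))
```

In the bi-level setting summarized above, each fitness evaluation would itself involve training and evaluating an agent, which is why a gradient-free, population-based outer loop is a natural fit.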
It is found that on an environment that requires multimodal posterior predictives, mixture density nets outperform all other models by a large margin, and that heteroscedasticity at training time, perhaps acting as a regularizer, improves predictions at longer horizons.
This work proposes a strategy for encoding curiosity algorithms as programs in a domain-specific language and searching, during a meta-learning phase, for algorithms that enable RL agents to perform well in new domains.
A reward shaping method based on source embedding similarity is proposed that is applicable to domains with both discrete and continuous action spaces. It does not outperform two baselines, but it does show an improvement in the discrete action spaces.
It is proved that wavelets are both necessary and sufficient for constructing a function approximator that can be adaptively refined without loss of precision, and it is shown that a fixed wavelet basis set performs comparably to the high-performing Fourier basis on Mountain Car and Acrobot.
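For context on the baseline mentioned above, here is a minimal sketch of Fourier basis features in the form commonly used for linear value function approximation: one cosine feature per integer coefficient vector, evaluated on a state rescaled to the unit cube. This illustrates the Fourier basis only, not the wavelet construction from the paper; the function name `fourier_features` and the assumption that the state is already normalized to [0, 1] in every dimension are choices made for this example.

```python
import itertools
import math


def fourier_features(state, order):
    """Return cos(pi * c . s) for every coefficient vector c in {0..order}^d.

    state: sequence of floats, each assumed to be normalized to [0, 1].
    order: highest integer frequency used per dimension.
    The result has (order + 1) ** len(state) features.
    """
    dim = len(state)
    features = []
    for coeffs in itertools.product(range(order + 1), repeat=dim):
        dot = sum(c * s for c, s in zip(coeffs, state))
        features.append(math.cos(math.pi * dot))
    return features


if __name__ == "__main__":
    phi = fourier_features([0.2, 0.7], order=3)
    print(len(phi))
```

The feature count grows exponentially with the state dimension, which is manageable for low-dimensional benchmarks such as Mountain Car and Acrobot but is one reason fixed bases are compared carefully against adaptive alternatives.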
Application of IDA-PBC to mechanical systems has received much attention in recent decades, but its application is still limited by the solvability of the so-called matching conditions. In this work, it is shown that total energy-shaping control of under-actuated mechanical systems has a control-by-interconnection interpretation. Using this interpretation, alternate matching conditions are formulated that define constraints on the added energy, rather than on the total closed-loop energy. It is additionally shown that, for systems with under-actuation degree one whose mass matrix depends on a single coordinate, the kinetic energy matching conditions reduce to ODEs that can be evaluated numerically. Using this approach, controllers are proposed for the benchmark cart-pole and acrobot systems.
An autoencoder neural network was used as a novelty detector to generate intrinsic rewards that guide the search through the state space. It achieved more efficient and accurate robot control in three of the four tasks when purely intrinsic rewards were used instead of standard extrinsic rewards.