Estimating the head pose of a person is a crucial problem with many applications, such as aiding gaze estimation, modeling attention, fitting 3D models to video, and performing face alignment. (Image credit: FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation from a Single Image)
An elegant and robust way to determine pose is presented by training a multi-loss convolutional neural network on 300W-LP, a large synthetically expanded dataset, to predict intrinsic Euler angles directly from image intensities through joint binned pose classification and regression.
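The joint binned classification and regression idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bin layout (66 bins of 3 degrees starting at -99 degrees), the weight `alpha`, and all function names are assumptions made for the example. Each angle gets a vector of bin logits; a cross-entropy term supervises the bin, and a squared-error term supervises the expected angle computed from the softmax probabilities.

```python
import math

# Assumed bin layout for illustration: 66 bins, 3 degrees wide, starting at -99.
NUM_BINS = 66
BIN_WIDTH = 3.0
ANGLE_MIN = -99.0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def angle_to_bin(angle_deg):
    """Map an angle in degrees to its bin index, clamped to the valid range."""
    idx = int((angle_deg - ANGLE_MIN) // BIN_WIDTH)
    return min(max(idx, 0), NUM_BINS - 1)


def expected_angle(logits):
    """Continuous angle estimate: expectation of the bin index, rescaled to degrees."""
    probs = softmax(logits)
    expected_index = sum(i * p for i, p in enumerate(probs))
    return expected_index * BIN_WIDTH + ANGLE_MIN


def multi_loss(logits, target_deg, alpha=1.0):
    """Cross-entropy on the target bin plus alpha times squared error on the expected angle."""
    probs = softmax(logits)
    cross_entropy = -math.log(probs[angle_to_bin(target_deg)])
    squared_error = (expected_angle(logits) - target_deg) ** 2
    return cross_entropy + alpha * squared_error
```

In a full model, one such loss would be computed per Euler angle (yaw, pitch, roll) and the three summed; the classification term keeps the prediction in the right neighbourhood while the regression term refines it.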
This paper investigates how close a very deep neural network comes to saturating performance on existing 2D and 3D face alignment datasets, and builds a very strong baseline for this purpose.
The resulting Wide Headpose Estimation Network (WHENet) is the first fine-grained modern method applicable to the full range of head yaws, yet it also meets or beats state-of-the-art methods for frontal head pose estimation.
This work studies learning from a synergy process of 3D Morphable Models (3DMM) and 3D facial landmarks to predict complete 3D facial geometry, including 3D alignment, face orientation, and 3D face modeling.
A comprehensive comparison of several successful deep learning-based face detectors is conducted to measure their efficiency using two metrics, FLOPs and latency; the results can guide the choice of face detectors for different applications and the development of more efficient and accurate detectors.
The proposed deep label distribution learning (DLDL) method effectively utilizes the label ambiguity in both feature learning and classifier learning, which helps prevent the network from overfitting even when the training set is small.
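One way to picture label distribution learning for head pose is to replace a single angle label with a soft distribution over discretized angles, so that neighbouring bins share some probability mass. The sketch below is an illustration under stated assumptions, not the DLDL authors' code: the Gaussian shape, the `sigma` value, the bin spacing, and the function names are all hypothetical choices, and the KL divergence is shown as one common way to compare the target distribution with a predicted one.

```python
import math


def gaussian_label_distribution(target, bin_centers, sigma):
    """Soft label: a normalized Gaussian centred on the target angle, sampled at the bin centres."""
    weights = [math.exp(-((c - target) ** 2) / (2.0 * sigma ** 2)) for c in bin_centers]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(target_dist, predicted_dist, eps=1e-12):
    """KL(target || predicted); eps guards against log(0) in the prediction."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target_dist, predicted_dist)
        if t > 0.0
    )


# Hypothetical discretization: yaw bins every 15 degrees from -90 to 90.
BIN_CENTERS = list(range(-90, 91, 15))

# A ground-truth yaw of 20 degrees becomes a distribution peaked at the 15-degree bin.
soft_label = gaussian_label_distribution(20.0, BIN_CENTERS, sigma=15.0)
```

Because the soft label spreads mass over nearby angles, each training image also contributes supervision to adjacent bins, which is the intuition behind the reduced overfitting on small training sets.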
Tests show that the proposed real-time, six-degrees-of-freedom 3D face pose estimation method, which works without face detection or landmark localization, outperforms state-of-the-art (SotA) face pose estimators and surpasses SotA models of comparable complexity on the WIDER FACE detection benchmark, despite not being optimized on bounding box labels.
FaceXFormer is the first model capable of handling ten facial analysis tasks while maintaining real-time performance at 33.21 FPS. It uses FaceX, a lightweight decoder with a novel bi-directional cross-attention mechanism that jointly processes face and task tokens to learn robust and generalized facial representations.
This paper proposes a supervised initialization scheme for cascaded face alignment based on explicit head pose estimation, with two ways of generating the initialization: the first projects a mean 3D face shape onto the 2D image under the estimated head pose; the second searches for nearest-neighbour shapes in the training set according to head pose distance.
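The first initialization scheme, projecting a mean 3D shape under an estimated pose, can be sketched as below. The details are assumptions for illustration only: the summary does not specify a camera model or angle convention, so this example uses a weak-perspective (scaled orthographic) projection, the rotation order R = Rz(roll) Ry(yaw) Rx(pitch) with angles in radians, and a tiny made-up shape `TOY_MEAN_SHAPE` standing in for a real mean face.

```python
import math


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(yaw, pitch, roll):
    """R = Rz(roll) @ Ry(yaw) @ Rx(pitch), angles in radians (assumed convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(rz, matmul3(ry, rx))


def project_shape(points_3d, yaw, pitch, roll, scale=1.0, tx=0.0, ty=0.0):
    """Rotate 3D points by the head pose, drop depth, then scale and translate into the image."""
    r = rotation_matrix(yaw, pitch, roll)
    projected = []
    for x, y, z in points_3d:
        u = scale * (r[0][0] * x + r[0][1] * y + r[0][2] * z) + tx
        v = scale * (r[1][0] * x + r[1][1] * y + r[1][2] * z) + ty
        projected.append((u, v))
    return projected


# Hypothetical four-point "mean shape": two eyes, nose tip (protruding in z), chin.
TOY_MEAN_SHAPE = [(-1.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, -1.0, 0.0)]

# 2D initialization for a head turned 30 degrees in yaw.
initial_landmarks = project_shape(TOY_MEAN_SHAPE, math.radians(30.0), 0.0, 0.0)
```

In practice `scale`, `tx`, and `ty` would be derived from the detected face box, and the projected points would serve as the starting shape for the cascaded regressor.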
The Surrey Face Model is presented: a multi-resolution 3D Morphable Model made available to the public for non-commercial purposes, along with a lightweight open-source C++ library designed with simplicity and ease of integration as its foremost goals.