Retrieval of similar human poses from images or videos
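At its simplest, the task can be sketched as nearest-neighbor search over normalized 2D keypoint vectors. The sketch below is a minimal, illustrative baseline (all names are hypothetical, not from any listed paper): poses are centered and scaled so that retrieval is invariant to translation and scale, then compared by Euclidean distance.

```python
import numpy as np

def normalize_pose(keypoints):
    """Center a (J, 2) array of 2D joint keypoints and scale it to unit
    norm, making comparisons invariant to translation and scale."""
    centered = keypoints - keypoints.mean(axis=0)
    scale = np.linalg.norm(centered)
    return centered / (scale + 1e-8)

def retrieve_similar(query, gallery, k=3):
    """Return indices of the k gallery poses closest to the query pose."""
    q = normalize_pose(query).ravel()
    dists = [np.linalg.norm(q - normalize_pose(g).ravel()) for g in gallery]
    return np.argsort(dists)[:k]
```

Real systems replace the hand-crafted normalization with a learned, often view-invariant, embedding, as the papers below describe.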
These leaderboards are used to track progress in Pose Retrieval.
Use these libraries to find Pose Retrieval models and implementations.
An approach that learns a compact, view-invariant embedding space from 2D joint keypoints alone, without explicitly predicting 3D poses, and uses probabilistic embeddings to model the uncertainty inherent in the 2D input.
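The probabilistic-embedding idea can be illustrated with a toy Monte Carlo matching score: each pose maps to a Gaussian (mean plus variance), and similarity is estimated by sampling from both distributions. This is only a sketch of the concept; the function and the sigmoid matching head are illustrative stand-ins, not the paper's actual model.

```python
import numpy as np

def matching_probability(mu_a, var_a, mu_b, var_b, n_samples=200, seed=0):
    """Monte Carlo estimate of how likely two probabilistic pose
    embeddings match: draw samples from each Gaussian and average a
    sigmoid of the (shifted) negative squared distance. The sigmoid is a
    toy stand-in for a learned matching head."""
    rng = np.random.default_rng(seed)
    za = mu_a + np.sqrt(var_a) * rng.normal(size=(n_samples, mu_a.size))
    zb = mu_b + np.sqrt(var_b) * rng.normal(size=(n_samples, mu_b.size))
    d2 = np.sum((za - zb) ** 2, axis=1)
    return float(np.mean(1.0 / (1.0 + np.exp(d2 - 1.0))))
```

Larger variances express higher input uncertainty (e.g. ambiguous 2D projections), which softens the matching score between such poses.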
This work introduces a fast-search method that approximates an exhaustive search over a joint objective function for simultaneously retrieving the object category, a CAD model, and the pose of an object, given an approximate 3D bounding box.
An automatic method is presented for annotating images of indoor scenes with the CAD models of their objects by relying on RGB-D scans; a 'cloning procedure' is introduced that identifies objects with the same geometry and annotates them with the same CAD models.
A new captioning dataset named FixMyPose is introduced, along with strong cross-attention baseline models (unimodal/multimodal, RL, multilingual); the baselines are shown to be competitive with other models when evaluated on other image-difference datasets.
This work bridges the domain gap by efficiently transfer-learning from both domain-specific and task-specific source models, and applies the resulting state-of-the-art character pose estimator to the novel task of pose-guided illustration retrieval.
Human pose estimation (HPE) from RGB and depth images has recently experienced a push for viewpoint-invariant and scale-invariant pose retrieval methods. Current methods fail to generalize to unconventional viewpoints due to the lack of viewpoint-invariant data at training time, and existing datasets do not provide multiple-viewpoint observations, focusing mostly on frontal views. In this work, we introduce PanopTOP, a fully automatic framework for the generation of semi-synthetic RGB and depth samples with 2D and 3D ground truth of pedestrian poses from multiple arbitrary viewpoints. Starting from the Panoptic Dataset [15], we use the PanopTOP framework to generate the PanopTOP31K dataset, consisting of 31K images of 23 different subjects recorded from diverse and challenging viewpoints, including the top view. Finally, we provide baseline results and cross-validation tests for our dataset, demonstrating how it is possible to generalize from the semi-synthetic to the real-world domain. The dataset and the code will be made publicly available upon acceptance.
This work introduces TriBERT -- a transformer-based architecture, inspired by ViLBERT, that enables contextual feature learning across three modalities (vision, pose, and audio) through flexible co-attention -- along with a learned visual tokenization scheme based on spatial attention that leverages weak supervision to allow granular cross-modal interactions for the visual and pose modalities.
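The core of a co-attention block is cross-modal attention: tokens from one modality form the queries while tokens from another supply the keys and values. The single-head NumPy sketch below is illustrative only (simplified to use one shared projection-free matrix for keys and values), not TriBERT's actual implementation.

```python
import numpy as np

def co_attention(tokens_mod_a, tokens_mod_b):
    """Single-head cross-modal attention: each token from modality A
    (e.g. pose) attends over all tokens from modality B (e.g. audio).
    Projections are omitted for brevity; B's tokens serve directly as
    keys and values."""
    d = tokens_mod_a.shape[-1]
    scores = tokens_mod_a @ tokens_mod_b.T / np.sqrt(d)
    # Numerically stable softmax over modality B's tokens.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens_mod_b  # one contextualized output per A-token
```

In a full tri-modal model, such blocks are applied between each pair of modalities and stacked with feed-forward layers.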
This work proposes category-level pose estimation by learning an alignment metric in an embedding space, using a contrastive loss with a dynamic margin and a continuous pose-label space; it achieves state-of-the-art performance on PASCAL3D and OccludedPASCAL3D and surpasses competing methods on KITTI3D in a cross-dataset evaluation setting.
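A contrastive loss with a dynamic margin can be sketched as follows: instead of a fixed margin, the required separation between two embeddings grows with the distance between their continuous pose labels. This is a minimal toy formulation, assuming a scalar pose distance and a hypothetical scale factor `alpha`; it is not the paper's exact loss.

```python
import numpy as np

def dynamic_margin_contrastive(emb_a, emb_b, pose_dist, alpha=1.0):
    """Toy pairwise contrastive loss with a margin proportional to the
    pose-label distance: identical poses (pose_dist == 0) are pulled
    together, while dissimilar poses must stay at least
    alpha * pose_dist apart in embedding space."""
    d = np.linalg.norm(emb_a - emb_b)
    if pose_dist > 0:
        margin = alpha * pose_dist
        return max(0.0, margin - d)  # push apart until the margin is met
    return d ** 2                    # pull matching poses together
```

Because the margin varies continuously with the pose labels, the embedding space inherits the metric structure of the pose space rather than a binary same/different split.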