An affordance recognition pipeline combines a category-agnostic region proposal network, which proposes instance regions of an image across categories, with a self-attention mechanism that is trained to interpret each proposal and learns to capture rich contextual dependencies across the region.
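As a rough illustration of the second stage, the sketch below applies self-attention to the feature vectors inside a single proposed region. It is a minimal example under stated assumptions, not the pipeline's actual implementation: the text does not specify the attention formulation, so scaled dot-product attention is assumed; the projection matrices are random stand-ins for trained weights; the region proposal network is not implemented, and the box is a hand-picked placeholder for one of its outputs. The names `crop_region` and `region_self_attention` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def crop_region(feature_map, box):
    """Flatten the features inside one proposed box into a token sequence.

    feature_map: (H, W, C) array of backbone features.
    box: (x0, y0, x1, y1) in feature-map cells, end-exclusive (assumed format).
    Returns an (N, C) array with one row per spatial position in the box.
    """
    x0, y0, x1, y1 = box
    crop = feature_map[y0:y1, x0:x1, :]
    return crop.reshape(-1, crop.shape[-1])


def region_self_attention(tokens, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the positions of one region.

    Every position attends to every other position in the same region,
    which is how contextual dependencies across the region are mixed in.
    """
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(16, 16, 8))  # placeholder backbone features
box = (2, 3, 6, 8)                          # placeholder proposal: 4 wide, 5 tall
d = 4                                       # attention dimension (assumed)
w_q, w_k, w_v = (rng.normal(size=(8, d)) for _ in range(3))  # untrained stand-ins

tokens = crop_region(feature_map, box)
context, weights = region_self_attention(tokens, w_q, w_k, w_v)
```

Here `context` holds one context-enriched vector per position in the region, and `weights` shows how strongly each position attends to the others. In a trained system, a prediction head on top of `context` would presumably produce the affordance output for that proposal.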