From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions