From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions (2014-02-28)