A manually improved dataset for lexical-semantic relation prediction is provided, and its impact is evaluated across three pre-trained neural language models. The evaluation reveals strong performance divergences between languages and systematic confusion of specific relations.