This work develops a new transformer architecture, the Poly-encoder, which learns global rather than token-level self-attention features and achieves state-of-the-art results on three existing tasks.
J. Weston, Kurt Shuster, Samuel Humeau, M. Lachaux