3260 papers • 126 benchmarks • 313 datasets
This paper presents and applies a GUI-assisted framework that makes it easy to probe a massive number of languages for all the morphosyntactic features present in the Universal Dependencies data, and proposes a toolkit to systematize the analysis of multilingual flaws in multilingual models.
The usage of adjectives and verbs that language models generate around politicians’ names is quantified as a function of the politicians’ gender, demonstrating that pre-trained language models’ stance towards politicians varies strongly across the analyzed languages.
A method based on logistic regression classifiers is proposed to probe English, French, and Arabic PTLMs and quantify the potentially harmful content they convey with respect to a set of templates, in order to assess and mitigate the toxicity transmitted by PTLMs.
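The logistic-regression probing setup described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual pipeline: the "embeddings" here are synthetic random vectors with an injected linear signal standing in for real PTLM hidden states, and all variable names are illustrative.

```python
# Minimal probing-classifier sketch: train a logistic regression on
# (synthetic) embeddings to test whether a binary property is linearly
# decodable from them. A real probe would replace X with hidden states
# extracted from a pre-trained language model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n, dim = 1000, 64
X = rng.normal(size=(n, dim))         # stand-in "embeddings"
w_true = rng.normal(size=dim)         # hidden direction encoding the property
y = (X @ w_true > 0).astype(int)      # binary property labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)
print(f"probe accuracy: {acc:.2f}")
```

High held-out accuracy is read as evidence that the property is linearly decodable from the representations; a probe trained on real embeddings would be compared against such a controlled baseline.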
It is found that although large language models fine-tuned on MNLI have some basic perception of the order between points in time, by and large these models lack a thorough understanding of the relation between temporal expressions.
In this paper, we set out to quantify the syntactic capacity of BERT in the evaluation regime of non-context free patterns, as occurring in Dutch. We devise a test suite based on a mildly context-sensitive formalism, from which we derive grammars that capture the linguistic phenomena of control verb nesting and verb raising. The grammars, paired with a small lexicon, provide us with a large collection of naturalistic utterances, annotated with verb-subject pairings, that serve as the evaluation test bed for an attention-based span selection probe. Our results, backed by extensive analysis, suggest that the models investigated fail in the implicit acquisition of the dependencies examined.
It is found that the most powerful “transformer” models predict nearly 100% of explainable variance in neural responses to sentences and generalize across different datasets and imaging modalities (functional MRI and electrocorticography).
This work is the first to apply the probing paradigm to representations learned for document-level information extraction (IE), designing eight embedding probes to analyze surface, semantic, and event-understanding capabilities relevant to document-level event extraction.