This work introduces a method for valid inference powered by machine learning, enabling researchers to draw more reliable and accurate conclusions from machine learning predictions.
Significance
Machine learning is increasingly used as an efficient substitute for traditional data collection when such collection is difficult. For example, predictions of conditions such as poverty, deforestation, and population density based on satellite imagery are used to supplement accurate survey data, which require significant time and resources to collect. However, predictions are imperfect and potentially biased, calling into question the validity of conclusions drawn from such data. This manuscript introduces a method for valid inference powered by machine learning. The method enables researchers to draw more reliable and accurate conclusions from machine learning predictions.