Cross-prediction-powered inference

Published in

Proceedings of the National Academy of Sciences...(2023)

TL;DR

This paper introduces a method for valid inference powered by machine learning, enabling researchers to draw more reliable and accurate conclusions from machine learning predictions.

Abstract

Significance: Machine learning is increasingly used as an efficient substitute for traditional data collection when the latter is challenging. For example, predictions of conditions such as poverty, deforestation, and population density based on satellite imagery are used to supplement accurate survey data, which requires significant time and resources to collect. However, predictions are imperfect and potentially biased, calling into question the validity of conclusions drawn from such data. This manuscript introduces a method for valid inference powered by machine learning. The method enables researchers to draw more reliable and accurate conclusions from machine learning predictions.

Authors

Tijana Zrnic

1 paper

E. Candès

4 papers

References (60 items)

PPI++: Efficient Prediction-Powered Inference

ChatGPT Chemistry Assistant for Text Mining and Prediction of MOF Synthesis

A General M-estimation Theory in Semi-Supervised Framework

Prediction-powered inference

Evolutionary-scale prediction of atomic level protein structure with a language model

The structural context of posttranslational modifications at a proteome-wide scale

Semi-Supervised Quantile Estimation: Robust and Efficient Inference in High Dimensional Settings

High-dimensional semi-supervised learning: in search of optimal inference of the mean

Retiring Adult: New Datasets for Fair Machine Learning

Highly accurate protein structure prediction for the human proteome

Highly accurate protein structure prediction with AlphaFold

A Simple and General Debiased Machine Learning Theorem with Finite Sample Guarantees

Tailored inference for finite populations: conditional validity and transfer across distributions

Cross-Validation: What Does It Estimate and How Well Does It Do It?

Methods for correcting inference based on outcomes predicted by machine learning

A generalizable and accessible approach to machine learning with global satellite imagery

Cross-validation Confidence Intervals for Test Error

Satellite‐based estimates reveal widespread forest degradation in the Amazon

Asymptotics of cross-validation

A survey on semi-supervised learning

Semisupervised inference for explained variance in high dimensional linear regression and its applications

Comprehensive survey of deep learning in remote sensing: theories, tools, and challenges for the community

A Deep Learning Approach for Population Estimation from Satellite Imagery

Double/Debiased Machine Learning for Treatment and Structural Parameters

Mapping poverty using mobile phone and satellite data

Semi-Supervised Linear Regression

Combining satellite imagery and machine learning to predict poverty

Locally Robust Semiparametric Estimation

Semi-supervised inference: General theory and estimation of means

XGBoost: A Scalable Tree Boosting System

High-Resolution Global Maps of 21st-Century Forest Cover Change

Galaxy Zoo 2: detailed morphological classifications for 304,122 galaxies from the Sloan Digital Sky Survey

Global, 30-m resolution continuous fields of tree cover: Landsat-based rescaling of MODIS vegetation continuous fields with lidar-based estimates of error

Introduction to Semi-Supervised Learning

Doubly Robust Estimation in Missing Data and Causal Inference Models

Asymptotics of cross-validated risk estimation in estimator selection and performance assessment

The Sloan Digital Sky Survey: Technical Summary

Multiple imputation: a primer

Semiparametric Regression for Repeated Outcomes With Nonignorable Nonresponse

Multiple Imputation After 18+ Years

Semiparametric Efficiency in Multivariate Regression Models with Missing Data

Efficient and Adaptive Estimation for Semiparametric Models.

The asymptotic variance of semiparametric estimators

Estimation of Regression Coefficients When Some Regressors are not Always Observed

ROOT-N-CONSISTENT SEMIPARAMETRIC REGRESSION

Consistent Estimation of the Influence Function of Locally Asymptotically Linear Estimators

Consequences and Detection of Misspecified Nonlinear Regression Models

A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity

On the Efficiency of a Class of Non-Parametric Estimates

INFERENCE AND MISSING DATA

Testing Statistical Hypotheses

Models as approximations I

Asymptotic statistics

Testing Statistical Hypotheses

Semi-Supervised Learning Literature Survey

Efficient and adaptive estimation for semiparametric models

Large sample estimation and hypothesis testing

On the nonparametric estimation of functionals

Valid inference after prediction

On high-dimensional Gaussian comparisons for cross-validation

Field of Study

Medicine, Computer Science, Mathematics

Journal Information

Name

Proceedings of the National Academy of Sciences of the United States of America

Volume

118

Venue Information

Name

Proceedings of the National Academy of Sciences of the United States of America

Type

journal

URL

https://www.jstor.org/journal/procnatiacadscie

Alternate Names

PNAS
PNAS online
Proceedings of the National Academy of Sciences of the United States of America.
Proc National Acad Sci
Proceedings of the National Academy of Sciences
Proc National Acad Sci U S Am