Explore 10M+ cutting-edge AI and machine learning research papers with citations, graphs, and insights.
Showing 12 of 30,951 papers
Kaiming He
X. Zhang
Shaoqing Ren
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize and can gain accuracy from considerably increased depth.
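The core idea of the summary above, learning a residual function F(x) and adding the input back through an identity shortcut, can be sketched as follows (a minimal NumPy illustration under assumed toy dimensions, not the paper's convolutional implementation; the two-layer residual function and random weights are hypothetical):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x): the block learns only the residual
    F(x) = W2 @ ReLU(W1 @ x), while the identity shortcut passes
    the input through unchanged, easing optimization of deep stacks."""
    return relu(w2 @ relu(w1 @ x) + x)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
y = residual_block(x, w1, w2)  # same shape as the input
```

Because the shortcut is an identity, a block initialized near zero weights behaves almost like a pass-through, which is one intuition for why very deep residual stacks remain trainable.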
S. McKinney
Todor Markov
Jacob Menick
I. Sutskever
N. Keskar
(+274 more authors)
GPT-4, a large-scale multimodal model that accepts image and text inputs and produces text outputs, is developed; the Transformer-based model is pre-trained to predict the next token in a document and exhibits human-level performance on various professional and academic benchmarks.
Nikhila Ravi
Wan-Yen Lo
Ross B. Girshick
Laura Gustafson
Chloé Rolland
(+7 more authors)
The Segment Anything Model (SAM) is introduced: a new task, model, and dataset for image segmentation, and its zero-shot performance is impressive – often competitive with or even superior to prior fully supervised results.
Noam Shazeer
Ashish Vaswani
Lukasz Kaiser
Jakob Uszkoreit
Niki Parmar
(+3 more authors)
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, applied successfully to English constituency parsing with both large and limited training data.
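The attention mechanism the Transformer is built on can be sketched as scaled dot-product attention, softmax(QKᵀ/√d_k)V (a minimal NumPy illustration with hypothetical shapes, not the paper's multi-head implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: each query attends to all keys,
    and the scaling by sqrt(d_k) keeps the logits well-conditioned."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 16))  # 4 query positions, d_k = 16
K = rng.standard_normal((4, 16))
V = rng.standard_normal((4, 16))
out = scaled_dot_product_attention(Q, K, V)  # one value per query position
```

Self-attention is this operation with Q, K, and V all derived from the same sequence, which is what lets the architecture drop recurrence and convolutions entirely.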
Zewar Shah
Shan Zhiyong
Adnan
This project aims to provide an efficient and robust real-time emotion identification framework that makes use of paralinguistic features such as intensity, pitch, and MFCCs, and employs Diffusion Maps to reduce data redundancy and high dimensionality.
Yuhai Wu
This chapter presents techniques for statistical machine learning: Support Vector Machines (SVMs) for recognizing and classifying patterns, SVMs for predicting structured objects, the k-nearest neighbor method for classification, and Naive Bayes classifiers.
Edouard Grave
Armand Joulin
Hugo Touvron
Aurélien Rodriguez
Guillaume Lample
(+9 more authors)
LLaMA, a collection of foundation language models ranging from 7B to 65B parameters, is introduced, and it is shown that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets.
Gary S. Collins
K. Moons
P. Dhiman
R. Riley
A. L. Beam
(+29 more authors)
The development of TRIPOD+AI is described, the expanded 27-item checklist is presented with a more detailed explanation of each reporting recommendation, and the TRIPOD+AI for Abstracts checklist is also presented.
Jacob Devlin
Kenton Lee
Kristina Toutanova
Ming-Wei Chang
A new language representation model, BERT, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
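The fine-tuning recipe described above, a single task-specific output layer on top of the pretrained encoder, can be sketched as follows (a hypothetical NumPy illustration; the [CLS] vector below is a random stand-in for a real pretrained BERT encoder output, and the 2-class head is an assumption):

```python
import numpy as np

def classification_head(h_cls, W, b):
    """The one additional output layer: logits = W @ h_[CLS] + b,
    followed by a softmax. Everything below this layer (the deep
    bidirectional encoder) is pretrained and merely fine-tuned."""
    logits = W @ h_cls + b
    probs = np.exp(logits - logits.max())  # stable softmax
    return probs / probs.sum()

rng = np.random.default_rng(0)
h_cls = rng.standard_normal(768)          # stand-in for BERT-base's [CLS] vector
W = rng.standard_normal((2, 768)) * 0.02  # hypothetical binary-classification head
b = np.zeros(2)
probs = classification_head(h_cls, W, b)  # class probabilities, sums to 1
```

During fine-tuning, both this small head and the pretrained encoder weights are updated end-to-end on the downstream task.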
Arpit Patidar
Abir Chakravorty
An innovative convolutional neural network architecture aimed at addressing challenges of detection and classification of apple fruit diseases is proposed and experimentally validated, achieving a remarkable classification accuracy of 95.37%.
Yinghai Lu
J. Reizenstein
Sharan Narang
Hugo Touvron
Aurélien Rodriguez
(+63 more authors)
This work develops and releases Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters, which may be a suitable substitute for closed-source models.
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
(+22 more authors)
This work introduces Flamingo, a family of Visual Language Models (VLMs) with the ability to bridge powerful pretrained vision-only and language-only models, handle sequences of arbitrarily interleaved visual and textual data, and seamlessly ingest images or videos as inputs.
Page 1 of 2,580