By ayoub on Sept. 18, 2024
Research Papers List
Seminal Papers / Need-to-know
Computer Vision
2010
Noise-contrastive Estimation: a New Estimation Principle for Unnormalized Statistical Models
2012
ImageNet Classification with Deep Convolutional Neural Networks
3D Convolutional Neural Networks for Human Action Recognition
2013
Visualizing and Understanding Convolutional Networks
Learning Factored Representations in a Deep Mixture of Experts
2014
Generative Adversarial Networks
2015
Very Deep Convolutional Networks for Large-Scale Image Recognition
Going Deeper with Convolutions
FaceNet: a Unified Embedding for Face Recognition and Clustering
Distilling the Knowledge in a Neural Network
Deep Unsupervised Learning Using Nonequilibrium Thermodynamics
2016
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Rethinking the Inception Architecture for Computer Vision
Deep Residual Learning for Image Recognition
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
You Only Look Once: Unified, Real-Time Object Detection
2017
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
Understanding Intermediate Layers Using Linear Classifier Probes
Image-to-Image Translation with Conditional Adversarial Networks
Improved Image Captioning Via Policy Gradient Optimization of SPIDEr
2018
From Recognition to Cognition: Visual Commonsense Reasoning
Focal Loss for Dense Object Detection
Relational Inductive Biases, Deep Learning, and Graph Networks
Squeeze-and-Excitation Networks
When Does Label Smoothing Help?
Unsupervised Feature Learning Via Non-Parametric Instance Discrimination
2019
Objects As Points
RandAugment: Practical Automated Data Augmentation with a Reduced Search Space
Semantic Image Synthesis with Spatially-Adaptive Normalization
Generative Modeling by Estimating Gradients of the Data Distribution
2020
Denoising Diffusion Probabilistic Models
Designing Network Design Spaces
Training Data-efficient Image Transformers & Distillation Through Attention
NeRF: Representing Scenes As Neural Radiance Fields for View Synthesis
Bootstrap Your Own Latent: a New Approach to Self-supervised Learning
A Simple Framework for Contrastive Learning of Visual Representations
Conditional Negative Sampling for Contrastive Learning of Visual Representations
Momentum Contrast for Unsupervised Visual Representation Learning
Generative Pretraining from Pixels
Random Erasing Data Augmentation
2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
RepVGG: Making VGG-style ConvNets Great Again
ArcFace: Additive Angular Margin Loss for Deep Face Recognition
Do Vision Transformers See Like Convolutional Neural Networks?
BEiT: BERT Pre-Training of Image Transformers
Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
CvT: Introducing Convolutions to Vision Transformers
An Empirical Study of Training Self-Supervised Vision Transformers
Diffusion Models Beat GANs on Image Synthesis
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Multiscale Vision Transformers
Score-Based Generative Modeling Through Stochastic Differential Equations
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Scaling Vision with Sparse Mixture of Experts
MLP-Mixer: an All-MLP Architecture for Vision
2022
A ConvNet for the 2020s
Natural Language Descriptions of Deep Visual Features
Vision Models are More Robust and Fair When Pretrained on Uncurated Images Without Supervision
Block-NeRF: Scalable Large Scene Neural View Synthesis
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
Masked Autoencoders are Scalable Vision Learners
The Effects of Regularization and Data Augmentation are Class Dependent
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Pix2seq: a Language Modeling Framework for Object Detection
An Improved One Millisecond Mobile Backbone
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Swin Transformer V2: Scaling up Capacity and Resolution
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Sequencer: Deep LSTM for Image Classification
High-Resolution Image Synthesis with Latent Diffusion Models
Make-A-Video: Text-to-Video Generation Without Text-Video Data
Denoising Diffusion Implicit Models
CSWin Transformer: a General Vision Transformer Backbone with Cross-Shaped Windows
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
iBOT: Image BERT Pre-Training with Online Tokenizer
Imagen Video: High Definition Video Generation with Diffusion Models
2023
Hiera: a Hierarchical Vision Transformer Without the Bells-and-Whistles
Tree-Ring Watermarks: Fingerprints for Diffusion Images That are Invisible and Robust
From Sparse to Soft Mixtures of Experts
Estimating Example Difficulty Using Variance of Gradients
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Initializing Models with Larger Ones
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Patch N’ Pack: NaViT, a Vision Transformer for Any Aspect Ratio and Resolution
NLP
1997
Long Short-Term Memory
2003
A Neural Probabilistic Language Model
2004
ROUGE: a Package for Automatic Evaluation of Summaries
2005
METEOR: an Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments
2010
Recurrent Neural Network Based Language Model
2011
Generating Text with Recurrent Neural Networks
2013
Efficient Estimation of Word Representations in Vector Space
Distributed Representations of Words and Phrases and Their Compositionality
2014
On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
GloVe: Global Vectors for Word Representation
Sequence to Sequence Learning with Neural Networks
Learning Phrase Representations Using RNN Encoder–Decoder for Statistical Machine Translation
2015
Neural Machine Translation by Jointly Learning to Align and Translate
Effective Approaches to Attention-based Neural Machine Translation
Skip-Thought Vectors
2016
Google’s Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation
Neural Machine Translation of Rare Words with Subword Units
HyperNetworks
2017
Attention is All You Need
Outrageously Large Neural Networks: the Sparsely-Gated Mixture-of-Experts Layer
Using the Output Embedding to Improve Language Models
Enriching Word Vectors with Subword Information
2018
Deep Contextualized Word Representations
Improving Language Understanding by Generative Pre-Training
SentencePiece: a Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing
Self-Attention with Relative Position Representations
Blockwise Parallel Decoding for Deep Autoregressive Models
Universal Language Model Fine-tuning for Text Classification
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
MS MARCO: a Human Generated MAchine Reading COmprehension Dataset
2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
RoBERTa: a Robustly Optimized BERT Pretraining Approach
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Adaptive Input Representations for Neural Language Modeling
Attention Interpretability Across NLP Tasks
Grad-CAM: Visual Explanations from Deep Networks Via Gradient-based Localization
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
GLUE: a Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Parameter-Efficient Transfer Learning for NLP
Cross-lingual Language Model Pretraining
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data
Latent Retrieval for Weakly Supervised Open Domain Question Answering
Multi-Stage Document Ranking with BERT
Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts
Synthetic QA Corpora Generation with Roundtrip Consistency
Towards VQA Models That Can Read
2020
Language Models are Few-Shot Learners
Longformer: the Long-Document Transformer
Big Bird: Transformers for Longer Sequences
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
The Curious Case of Neural Text Degeneration
ELECTRA: Pre-training Text Encoders As Discriminators Rather Than Generators
TinyBERT: Distilling BERT for Natural Language Understanding
MPNet: Masked and Permuted Pre-training for Language Understanding
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Scaling Laws for Neural Language Models
Unsupervised Cross-lingual Representation Learning at Scale
SpanBERT: Improving Pre-training by Representing and Predicting Spans
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
Document Ranking with a Pretrained Sequence-to-Sequence Model
ColBERT: Efficient and Effective Passage Search Via Contextualized Late Interaction Over BERT
REALM: Retrieval-Augmented Language Model Pre-Training
Linformer: Self-Attention with Linear Complexity
BLEURT: Learning Robust Metrics for Text Generation
Query-Key Normalization for Transformers
2021
Towards a Unified View of Parameter-Efficient Transfer Learning
BinaryBERT: Pushing the Limit of BERT Quantization
Towards Zero-Label Language Learning
Improving Language Models by Retrieving from Trillions of Tokens
WebGPT: Browser-assisted Question-answering with Human Feedback
The Power of Scale for Parameter-Efficient Prompt Tuning
Prefix-Tuning: Optimizing Continuous Prompts for Generation
LoRA: Low-Rank Adaptation of Large Language Models
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
Muppet: Massive Multi-task Representations with Pre-Finetuning
Synthesizer: Rethinking Self-Attention in Transformer Models
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Extracting Training Data from Large Language Models
Large Dual Encoders are Generalizable Retrievers
Text Generation by Learning from Demonstrations
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
A General Language Assistant As a Laboratory for Alignment
2022
Formal Mathematics Statement Curriculum Learning
Survey of Hallucination in Natural Language Generation
Transformer Quality in Linear Time
Chain of Thought Prompting Elicits Reasoning in Large Language Models
PaLM: Scaling Language Modeling with Pathways
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models
Training Compute-Optimal Large Language Models
Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change)
OPT: Open Pre-trained Transformer Language Models
Diffusion-LM Improves Controllable Text Generation
DeepPERF: a Deep Learning-Based Approach for Improving Software Performance
No Language Left Behind: Scaling Human-Centered Machine Translation
Efficient Few-Shot Learning Without Prompts
Large Language Models are Different
Solving Quantitative Reasoning Problems with Language Models
AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers
Finetuned Language Models are Zero-shot Learners
Learning to Summarize from Human Feedback
Training Language Models to Follow Instructions with Human Feedback
Constitutional AI: Harmlessness from AI Feedback
RoFormer: Enhanced Transformer with Rotary Position Embedding
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Locating and Editing Factual Associations in GPT
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Holistic Evaluation of Language Models
SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization
InCoder: a Generative Model for Code Infilling and Synthesis
Large Language Models are Zero-Shot Reasoners
An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks
Unsupervised Dense Information Retrieval with Contrastive Learning
Implicit Relation Linking for Question Answering Over Knowledge Graph
Galactica: a Large Language Model for Science
MuRAG: Multimodal Retrieval-Augmented Generator
Distilling Knowledge from Reader to Retriever for Question Answering
Learn to Explain: Multimodal Reasoning Via Thought Chains for Science Question Answering
BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
Recurrent Memory Transformer
2023
ReAct: Synergizing Reasoning and Acting in Language Models
LLaMA: Open and Efficient Foundation Language Models
Alpaca: a Strong, Replicable Instruction-Following Model
Transformer Models: an Introduction and Catalog
Learning to Compress Prompts with Gist Tokens
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
LIMA: Less is More for Alignment
Language is Not All You Need: Aligning Perception with Language Models
QLoRA: Efficient Finetuning of Quantized LLMs
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Deduplicating Training Data Makes Language Models Better
Llama 2: Open Foundation and Fine-Tuned Chat Models
Retentive Network: a Successor to Transformer for Large Language Models
The Case for 4-bit Precision: K-bit Inference Scaling Laws
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
UL2: Unifying Language Learning Paradigms
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Accelerating Large Language Model Decoding with Speculative Sampling
Pretraining Language Models with Human Preferences
Large Language Models As Optimizers
G-Eval: NLG Evaluation Using GPT-4 with Better Human Alignment
Chain-of-Verification Reduces Hallucination in Large Language Models
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Mass-Editing Memory in a Transformer
MTEB: Massive Text Embedding Benchmark
Language Modeling is Compression
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Zephyr: Direct Distillation of LM Alignment
Intuitions
Weights’s Alignment Handbook
Evaluating Large Language Models: a Comprehensive Survey
Tamil-LLaMA: a New Tamil Language Model Based on LLaMA 2
Think Before You Speak: Training Language Models with Pause Tokens
YaRN: Efficient Context Window Extension of Large Language Models
StarCoder: May the Source be with You!
Let’s Verify Step by Step
Scalable Extraction of Training Data from (Production) Language Models
Gemini: a Family of Highly Capable Multimodal Models
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Human-Centered Loss Functions (HALOs)
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Scale Prompt Hacking Competition
Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs
Tuning Language Models by Proxy
Group Preference Optimization: Few-shot Alignment of Large Language Models
Large Language Models are Neurosymbolic Reasoners
LM-Infinite: Simple On-The-Fly Length Generalization for Large Language Models
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Large Language Models are Null-Shot Learners
Knowledge Fusion of Large Language Models
MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models
ChatQA: Building GPT-4 Level Conversational QA Models
Parameter-efficient Tuning for Large Language Model Without Calculating Its Gradients
Mathematical Discoveries from Program Search with Large Language Models
Gaussian Error Linear Units (GELUs)
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
AutoGen: Enabling Next-Gen LLM Applications Via Multi-Agent Conversation
Towards Expert-Level Medical Question Answering with Large Language Models
Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
BitNet: Scaling 1-bit Transformers for Large Language Models
2024
Relying on the Unreliable: the Impact of Language Models’ Reluctance to Express Uncertainty
Matryoshka Representation Learning
Self-Refine: Iterative Refinement with Self-Feedback
The Claude 3 Model Family: Opus, Sonnet, Haiku
ORPO: Monolithic Preference Optimization Without Reference Model
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Stealing Part of a Production Language Model
OneBit: Towards Extremely Low-bit Large Language Models
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Multilingual E5 Text Embeddings: a Technical Report
MambaByte: Token-free Selective State Space Model
How Faithful are RAG Models? Quantifying the Tug-of-war Between RAG and LLMs’ Internal Prior
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models Through Question Complexity
Many-Shot In-Context Learning
Gemma 2: Improving Open Language Models at a Practical Size
The Llama 3 Herd of Models
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Speech
2006
Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
2010
Front-end Factor Analysis for Speaker Verification
2012
Sequence Transduction with Recurrent Neural Networks
2013
Hybrid Speech Recognition with Deep Bidirectional LSTM
2014
Towards End-To-End Speech Recognition with Recurrent Neural Networks
Deep Neural Networks for Small Footprint Text-dependent Speaker Verification
2015
Listen, Attend and Spell
2017
CNN Architectures for Large-Scale Audio Classification
2018
X-Vectors: Robust DNN Embeddings for Speaker Recognition
WaveGlow: a Flow-based Generative Network for Speech Synthesis
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
2019
Wav2vec: Unsupervised Pre-training for Speech Recognition
SpecAugment: a Simple Data Augmentation Method for Automatic Speech Recognition
Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
Fréchet Audio Distance: a Metric for Evaluating Music Enhancement Algorithms
2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Wav2vec 2.0: a Framework for Self-Supervised Learning of Speech Representations
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
GAN-based Data Generation for Speech Emotion Recognition
Generalized End-to-end Loss for Speaker Verification
2021
Generative Spoken Language Modeling from Raw Audio
Text-Free Prosody-Aware Generative Spoken Language Modeling
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Recent Advances in End-to-End Automatic Speech Recognition
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
SUPERB: Speech Processing Universal PERformance Benchmark
2022
Direct Speech-to-speech Translation with Discrete Units
Textless Speech Emotion Conversion Using Discrete and Decomposed Representations
Generative Spoken Dialogue Language Modeling
Textless-lib: a Library for Textless Spoken Language Processing
Self-Supervised Speech Representation Learning: a Review
Masked Autoencoders That Listen
Robust Speech Recognition Via Large-Scale Weak Supervision
AudioGen: Textually Guided Audio Generation
AudioLM: a Language Modeling Approach to Audio Generation
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
2023
Scaling Speech Technology to 1,000+ Languages
Distil-Whisper: Robust Knowledge Distillation Via Large-Scale Pseudo Labelling
Matcha-TTS: a Fast TTS Architecture with Conditional Flow Matching
Audiobox: Unified Audio Generation with Natural Language Prompts
Multimodal
2015
CIDEr: Consensus-based Image Description Evaluation
2016
“Why Should I Trust You?” Explaining the Predictions of Any Classifier
SPICE: Semantic Propositional Image Caption Evaluation
2017
A Unified Approach to Interpreting Model Predictions
Mixup: Beyond Empirical Risk Minimization
Multimodal Machine Learning: a Survey and Taxonomy
2019
Representation Learning with Contrastive Predictive Coding
2020
Modality Dropout for Improved Performance-driven Talking Faces
Augmentation Adversarial Training for Self-supervised Speaker Recognition
BERTScore: Evaluating Text Generation with BERT
2021
Comparing Data Augmentation and Annotation Standardization to Improve End-to-end Spoken Language Understanding Models
Learning Transferable Visual Models from Natural Language Supervision
Zero-Shot Text-to-Image Generation
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
MLIM: Vision-and-language Model Pre-training with Masked Language and Image Modeling
MURAL: Multimodal, Multi-task Retrieval Across Languages
Perceiver: General Perception with Iterative Attention
Multimodal Few-Shot Learning with Frozen Language Models
On the Opportunities and Risks of Foundation Models
CLIPScore: a Reference-free Evaluation Metric for Image Captioning
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
2022
DeepNet: Scaling Transformers to 1,000 Layers
Data2vec: a General Framework for Self-supervised Learning in Speech, Vision and Language
Hierarchical Text-Conditional Image Generation with CLIP Latents
AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models
A Generalist Agent
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
I-Code: an Integrative and Composable Multimodal Learning Framework
VL-BEIT: Generative Vision-Language Pretraining
FLAVA: a Foundational Language and Vision Alignment Model
Flamingo: a Visual Language Model for Few-Shot Learning
Stable and Latent Diffusion Model
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
UniT: Multimodal Multitask Learning with a Unified Transformer
Perceiver IO: a General Architecture for Structured Inputs & Outputs
Foundation Transformers
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Imagic: Text-Based Real Image Editing with Diffusion Models
EDICT: Exact Diffusion Inversion Via Coupled Transformations
CLAP: Learning Audio Concepts from Natural Language Supervision
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
OCR-free Document Understanding Transformer
PubTables-1M: Towards Comprehensive Table Extraction from Unstructured Documents
CoCa: Contrastive Captioners are Image-Text Foundation Models
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Grounded Language-Image Pre-training (GLIP)
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
2023
Pix2Video: Video Editing Using Image Diffusion
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
HuggingGPT: Solving AI Tasks with ChatGPT and Its Friends in HuggingFace
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
ImageBind: One Embedding Space to Bind Them All
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
PaLM-E: an Embodied Multimodal Language Model
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Visual Instruction Tuning
Multimodal Chain-of-Thought Reasoning in Language Models
Dreamix: Video Diffusion Models are General Video Editors
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
OpenFlamingo: an Open-Source Framework for Training Large Autoregressive Vision-Language Models
Med-Flamingo: a Multimodal Medical Few-shot Learner
Towards Generalist Biomedical AI
PaLI: a Jointly-Scaled Multilingual Language-Image Model
Nougat: Neural Optical Understanding for Academic Documents
Text-Conditional Contextualized Avatars for Zero-Shot Personalization
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
AnyMAL: an Efficient and Scalable Any-Modality Augmented Language Model
Phenaki: Variable Length Video Generation from Open Domain Textual Description
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
SeamlessM4T – Massively Multilingual & Multimodal Machine Translation
PaLI-X: on Scaling up a Multilingual Vision and Language Model
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Sparks of Artificial General Intelligence: Early Experiments with GPT-4
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
MiniGPT-v2: Large Language Model As a Unified Interface for Vision-Language Multi-task Learning
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Diffusion Model Alignment Using Direct Preference Optimization
Seamless: Multilingual Expressive and Streaming Speech Translation
VideoPoet: a Large Language Model for Zero-Shot Video Generation
LLaMA-VID: an Image is Worth 2 Tokens in Large Language Models
FERRET: Refer and Ground Anything Anywhere at Any Granularity
StarVector: Generating Scalable Vector Graphics Code from Images
KOSMOS-2: Grounding Multimodal Large Language Models to the World
Generative Multimodal Models are In-Context Learners
Alpha-CLIP: a CLIP Model Focusing on Wherever You Want
2024
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Core ML
1991
What Every Computer Scientist Should Know about Floating-Point Arithmetic
1997
Bidirectional Recurrent Neural Networks
2001
Obtaining Calibrated Probability Estimates from Decision Trees and Naive Bayesian Classifiers
2002
Transforming Classifier Scores Into Accurate Multiclass Probability Estimates
Dimensionality Reduction by Learning an Invariant Mapping
2006
Reducing the Dimensionality of Data with Neural Networks
2007
What Every Programmer Should Know about Memory
2008
ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning
2009
Large-scale Deep Unsupervised Learning Using Graphics Processors
Practical Guide to Controlled Experiments on the Web: Listen to Your Customers Not to the HiPPO
Curriculum Learning
2011
SMOTE: Synthetic Minority Over-sampling Technique
2012
Acoustic Modeling Using Deep Belief Networks
Improving Neural Networks by Preventing Co-adaptation of Feature Detectors
Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained
2014
Dropout: a Simple Way to Prevent Neural Networks from Overfitting
Intriguing Properties of Neural Networks
2015
ADAM: a Method for Stochastic Optimization
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
2016
XGBoost: a Scalable Tree Boosting System
Layer Normalization
Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs
2017
Axiomatic Attribution for Deep Networks
Decoupled Weight Decay Regularization
On Calibration of Modern Neural Networks
Beta Calibration: a Well-founded and Easily Implemented Improvement on Logistic Calibration for Binary Classifiers
Understanding Black-box Predictions Via Influence Functions
Mixed Precision Training
StarSpace: Embed All the Things!
2018
Model Cards for Model Reporting
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
Representer Point Selection for Explaining Deep Neural Networks
2019
Fast Transformer Decoding: One Write-Head is All You Need
Similarity of Neural Network Representations Revisited
Toward a Better Trade-off Between Performance and Fairness with Kernel-based Distribution Matching
Root Mean Square Layer Normalization
Generating Long Sequences with Sparse Transformers
Understanding and Improving Layer Normalization
2020
Estimating Training Data Influence by Tracing Gradient Descent
LEEP - Log Expected Empirical Prediction
OTDD - Optimal Transport Dataset Distance
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
GLU Variants Improve Transformer
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
2021
Efficient Deep Learning: a Survey on Making Deep Learning Models Smaller, Faster, and Better
Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
Using AntiPatterns to Avoid MLOps Mistakes
Self-attention Does Not Need O(n²) Memory
Sharpness-Aware Minimization for Efficiently Improving Generalization
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Efficiently Modeling Long Sequences with Structured State Spaces
2022
Pathways: Asynchronous Distributed Dataflow for ML
PolyLoss: a Polynomial Expansion Perspective of Classification Loss Functions
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Federated Learning with Buffered Asynchronous Aggregation
Applied Federated Learning: Architectural Design for Robust and Efficient Learning in Privacy Aware Settings
Operationalizing Machine Learning: an Interview Study
A/B Testing Intuition Busters
Effect of Scale on Catastrophic Forgetting in Neural Networks
Fine-Tuning Can Distort Pretrained Features and Underperform Out-of-Distribution
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Robust Fine-tuning of Zero-shot Models
Efficiently Scaling Transformer Inference
2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
Dataless Knowledge Fusion by Merging Weights of Language Models
Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
2024
Kolmogorov–Arnold Networks (KANs): an Alternative to Multi-Layer Perceptrons for Enhanced Interpretability and Accuracy
RecSys
2008
Calibrated Recommendations
2009
The Wisdom of the Few: a Collaborative Filtering Approach Based on Expert Opinions from the Web
2010
Factorization Machines
2011
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
2015
Collaborative Deep Learning for Recommender Systems
2016
Wide & Deep Learning for Recommender Systems
Deep Neural Networks for YouTube Recommendations
Product-based Neural Networks for User Response Prediction
2017
Neural Collaborative Filtering
Deep & Cross Network for Ad Click Predictions
DeepFM: a Factorization-Machine Based Neural Network for CTR Prediction
2018
Deep Interest Network for Click-Through Rate Prediction
2019
Behavior Sequence Transformer for E-commerce Recommendation in Alibaba
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
2020
DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems
Neural Collaborative Filtering vs. Matrix Factorization Revisited
2022
PinnerFormer: Sequence Modeling for User Representation at Pinterest
RL
2015
Trust Region Policy Optimization
2016
Mastering the Game of Go with Deep Neural Networks & Tree Search
2017
Proximal Policy Optimization Algorithms
Evolution Strategies As a Scalable Alternative to Reinforcement Learning
Playing FPS Games with Deep Reinforcement Learning
Mastering the Game of Go Without Human Knowledge
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
2019
AlphaStar: Grandmaster Level in StarCraft II Using Multi-agent Reinforcement Learning
2020
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
2021
Highly Accurate Protein Structure Prediction with AlphaFold
2023
Faster Sorting Algorithms Discovered Using Deep Reinforcement Learning
Graph ML
2000
Nonlinear Dimensionality Reduction by Locally Linear Embedding
A Global Geometric Framework for Nonlinear Dimensionality Reduction
2001
Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering
2014
DeepWalk: Online Learning of Social Representations
2016
Asymmetric Transitivity Preserving Graph Embedding
Structural Deep Network Embedding
Node2vec: Scalable Feature Learning for Networks
2017
Inductive Representation Learning on Large Graphs
Semi-Supervised Classification with Graph Convolutional Networks
2018
Graph Attention Networks
2019
Exploiting Edge Features for Graph Neural Networks
Selected Papers / Good-to-know
Computer Vision
2015
Learning Deep Features for Discriminative Localization
2016
Understanding the Effective Receptive Field in Deep Convolutional Neural Networks
2017
Quo Vadis, Action Recognition? a New Model and the Kinetics Dataset
Densely Connected Convolutional Networks
2018
Neural Discrete Representation Learning
2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
2020
Taming Transformers for High-Resolution Image Synthesis
Self-training with Noisy Student Improves ImageNet Classification
Big Transfer (BiT): General Visual Representation Learning
Multi-modal Dense Video Captioning
Efficient Saliency Maps for Explainable AI
2021
Finetuning Pretrained Transformers Into RNNs
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Self-supervised Learning for Fast and Scalable Time Series Hyper-parameter Tuning
Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More
Emerging Properties in Self-Supervised Vision Transformers
Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples
Enhancing Photorealism Enhancement
FNet: Mixing Tokens with Fourier Transforms
Are Convolutional Neural Networks or Transformers More Like Human Vision?
RegNet: Self-Regulated Network for Image Classification
Lossy Compression for Lossless Prediction
2022
YOLOv6: a Single-Stage Object Detection Framework for Industrial Applications
2023
Your Diffusion Model is Secretly a Zero-Shot Classifier
DINOv2: Learning Robust Visual Features Without Supervision
Consistency Models
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
ZipIt! Merging Models from Different Tasks Without Training
Self-Consuming Generative Models Go MAD
Substance or Style: What Does Your Image Embedding Know?
Scaling Vision Transformers to 22 Billion Parameters
CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks
On the Impact of Knowledge Distillation for Model Interpretability
Replacing Softmax with ReLU in Vision Transformers
Learning Vision from Models Rivals Learning Vision from Data
TUTEL: Adaptive Mixture-of-Experts at Scale
2024
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Scalable Diffusion Models with State Space Backbone
Towards Evaluating the Robustness of Visual State Space Models
NLP
2008
ROUGE-C: a Fully Automated Evaluation Method for Multi-document Summarization
2018
Generating Wikipedia by Summarizing Long Sequences
2019
Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks
Diversity and Depth in Per-Example Routing Models
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
2020
Efficient Transformers: a Survey
Towards a Human-like Open-Domain Chatbot
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Dense Passage Retrieval for Open-domain Question Answering
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Unsupervised Commonsense Question Answering with Self-Talk
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
2021
Pretrained Transformers As Universal Computation Engines
SimCSE: Simple Contrastive Learning of Sentence Embeddings
DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
Transformer Feed-Forward Layers are Key-Value Memories
Measuring Massive Multitask Language Understanding
2022
A Causal Lens for Controllable Text Generation
SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples
LaMDA: Language Models for Dialog Applications
Causal Inference Principles for Reasoning about Commonsense Causality
RescoreBERT: Discriminative Speech Recognition Rescoring with BERT
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, a Large-Scale Generative Language Model
Extreme Compression for Pre-trained Transformers Made Simple and Efficient
Memorizing Transformers
Ask Me Anything: a Simple Strategy for Prompting Language Models
Large Language Models Can Self-Improve
∞-former: Infinite Memory Transformer
Multitask Prompted Training Enables Zero-Shot Task Generalization
Large Language Models Encode Clinical Knowledge
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Automatic Chain of Thought Prompting in Large Language Models
Less is More: Parameter-Free Text Classification with Gzip
A Length-Extrapolatable Transformer
Efficient Training of Language Models to Fill in the Middle
Language Models of Code are Few-Shot Commonsense Learners
A Systematic Investigation of Commonsense Knowledge in Large Language Models
MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
STaR: Self-Taught Reasoner: Bootstrapping Reasoning with Reasoning
2023
Challenges and Applications of Large Language Models
LLM-Adapters: an Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
GPT Detectors are Biased Against Non-native English Writers
GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo
SELF-INSTRUCT: Aligning Language Model with Self Generated Instructions
Efficient Methods for Natural Language Processing: a Survey
Better Language Models of Code Through Self-Improvement
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Active Retrieval Augmented Generation
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Exploring In-Context Learning Capabilities of Foundation Models for Generating Knowledge Graphs from Text
How Language Model Hallucinations Can Snowball
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Gorilla: Large Language Model Connected with Massive APIs
SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning
Deliberate Then Generate: Enhanced Prompting Framework for Text Generation
Enabling Large Language Models to Generate Text with Citations
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Fine-Tuning Language Models with Just Forward Passes
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
RWKV: Reinventing RNNs for the Transformer Era
Knowledge Distillation of Large Language Models
Unifying Large Language Models and Knowledge Graphs: a Roadmap
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Textbooks are All You Need
Extending Context Window of Large Language Models Via Positional Interpolation
Deep Language Networks: Joint Prompt Training of Stacked LLMs Using Variational Inference
A Simple and Effective Pruning Approach for Large Language Models
To Repeat or Not to Repeat: Insights from Scaling LLM Under Token-Crisis
ART: Automatic Multi-step Reasoning and Tool-use for Large Language Models
Lost in the Middle: How Language Models Use Long Contexts
Improving Retrieval-Augmented Large Language Models Via Data Importance Learning
Scaling Transformer to 1M Tokens and Beyond with RMT
Hyena Hierarchy: Towards Larger Convolutional Language Models
LongNet: Scaling Transformers to 1,000,000,000 Tokens
The Curse of Recursion: Training on Generated Data Makes Models Forget
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
FLASK: Fine-grained Language Model Evaluation Based on Alignment Skill Sets
Secrets of RLHF in Large Language Models Part I: PPO
WizardLM: Empowering Large Language Models to Follow Complex Instructions
Universal and Transferable Adversarial Attacks on Aligned Language Models
Scaling TransNormer to 175 Billion Parameters
What Learning Algorithm is In-context Learning? Investigations with Linear Models
What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
Multimodal Neurons in Pretrained Text-Only Transformers
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
The Hydra Effect: Emergent Self-repair in Language Model Computations
MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
XSTest: a Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Jina Embeddings: a Novel Set of High-Performance Sentence Embedding Models
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
AlpaGasus: Training a Better Alpaca with Fewer Data
How is ChatGPT’s Behavior Changing Over Time?
Do Multilingual Language Models Think Better in English?
Skill-it! a Data-Driven Skills Framework for Understanding and Training Language Models
In-context Autoencoder for Context Compression in a Large Language Model
No Train No Gain: Revisiting Efficient Training Algorithms for Transformer-based Language Models
Leveraging Implicit Feedback from Deployment Data in Dialogue
FacTool: Factuality Detection in Generative AI – a Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Large Language Models Can be Easily Distracted by Irrelevant Context
Fast Inference from Transformers Via Speculative Decoding
Textbooks are All You Need II
Cognitive Mirage: a Review of Hallucinations in Large Language Models
Structured Chain-of-Thought Prompting for Code Generation
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Editing Commonsense Knowledge in GPT
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models Via Chain-of-Thought Fine-Tuning
FinGPT: Open-Source Financial Large Language Models
The Reversal Curse: LLMs Trained on “A is B” Fail to Learn “B is A”
Less is More: Task-aware Layer-wise Distillation for Language Model Compression
Reinforced Self-Training (ReST) for Language Modeling
How Do Large Language Models Capture the Ever-changing World Knowledge? a Review of Recent Advances
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
A Reparameterized Discrete Diffusion Model for Text Generation
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
Likelihood-Based Diffusion Language Models
Who’s Harry Potter? Approximate Unlearning in LLMs
Mistral 7B
Take a Step Back: Evoking Reasoning Via Abstraction in Large Language Models
Text Generation with Diffusion Language Models: a Pre-training Approach with Continuous Paragraph Denoise
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
LLMs As Factual Reasoners: Insights from Existing Benchmarks and Beyond
Llemma: an Open Language Model for Mathematics
CODEFUSION: a Pre-trained Diffusion Model for Code Generation
CodeT5+: Open Code Large Language Models for Code Understanding and Generation
Augmenting Language Models with Long-Term Memory
ALCUNA: Large Language Models Meet New Knowledge
The Perils & Promises of Fact-checking with Large Language Models
SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
ChainPoll: a High Efficacy Method for LLM Hallucination Detection
Mixture-of-Experts Meets Instruction Tuning: a Winning Combination for Large Language Models
LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion
Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
FACTSCORE: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Fine-tuning Language Models for Factuality
Better Zero-Shot Reasoning with Self-Adaptive Prompting
Universal Self-Adaptive Prompting for Zero-shot and Few-shot Learning
Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
Thread of Thought: Unraveling Chaotic Contexts
Large Language Models Understand and Can be Enhanced by Emotional Stimuli
Text Embeddings Reveal (Almost) As Much As Text
Influence Scores at Scale for Efficient Language Data Sampling
TableLlama: Towards Open Large Generalist Models for Tables
NEFTune: Noisy Embeddings Improve Instruction Finetuning
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Online Speculative Decoding
PaSS: Parallel Speculative Sampling
System 2 Attention (is Something You Might Need Too)
Aligning Large Language Models Through Synthetic Feedback
Contrastive Chain-of-Thought Prompting
ChipNeMo: Domain-Adapted LLMs for Chip Design
Efficient Streaming Language Models with Attention Sinks
Precise Zero-Shot Dense Retrieval Without Relevance Labels
Tied-LoRA: Enhancing Parameter Efficiency of LoRA with Weight Tying
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
Chain-of-Knowledge: Grounding Large Language Models Via Dynamic Knowledge Adapting Over Heterogeneous Sources
Exponentially Faster Language Modeling
Prompt Injection Attack Against LLM-integrated Applications
Jailbroken: How Does LLM Safety Training Fail?
Orca 2: Teaching Small Language Models How to Reason
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves
Pythia: a Suite for Analyzing Large Language Models Across Training and Scaling
Starling-7B: Increasing LLM Helpfulness & Harmlessness with RLAIF
Large Language Models are Human-Level Prompt Engineers
A Survey of Graph Meets Large Language Model: Progress and Future Directions
Nash Learning from Human Feedback
Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models
Magicoder: Source Code is All You Need
TarGEN: Targeted Data Generation with Large Language Models
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
Revisiting Large Language Models As Zero-shot Relation Extractors
NexusRaven-V2: Surpassing GPT-4 for Zero-shot Function Calling
Instruction-Following Evaluation for Large Language Models
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenges
FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge
Automatic Hallucination Assessment for Aligned Large Language Models Via Transferable Adversarial Attacks
OLaLa: Ontology Matching with Large Language Models
LLM-Pruner: on the Structural Pruning of Large Language Models
SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot
RAGAS: Automated Evaluation of Retrieval Augmented Generation
EVER: Mitigating Hallucination in Large Language Models Through Real-Time Verification and Rectification
Prometheus: Inducing Fine-Grained Evaluation Capability in Language Models
AlphaCode 2
MediTron-70B: Scaling Medical Pretraining for Large Language Models
Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
X-InstructBLIP: a Framework for Aligning X-Modal Instruction-Aware Representations to LLMs and Emergent Cross-modal Reasoning
SwiftSage: a Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Show Your Work: Scratchpads for Intermediate Computation with Language Models
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
When Do Generative Query and Document Expansions Fail? a Comprehensive Study Across Methods Retrievers and Datasets
MemGPT: Towards LLMs As Operating Systems
The Internal State of an LLM Knows When It’s Lying
GPT4All: an Ecosystem of Open Source Compressed Language Models
The Falcon Series of Open Language Models
Promptbase: Elevating the Power of Foundation Models Through Advanced Prompt Engineering
Phi-2: the Surprising Power of Small Language Models
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
PromptBench: a Unified Library for Evaluation of Large Language Models
Making LLMs Worth Every Penny: Resource-Limited Text Classification in Banking
Mathematical Language Models: a Survey
A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges
Language Model Inversion
LLM360: Towards Fully Transparent Open-Source LLMs
LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
Retrieval-Augmented Generation for Large Language Models: a Survey
LLM in a Flash: Efficient Large Language Model Inference with Limited Memory
ReST Meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Adversarial Attacks on GPT-4 Via Simple Random Search
An In-depth Look at Gemini’s Language Abilities
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
Large Language Models are Better Reasoners with Self-Verification
PaperMage: a Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents
Large Language Models on Graphs: a Comprehensive Survey
An LLM Compiler for Parallel Function Calling
Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder–Decoder Models
Is ChatGPT Good at Search? Investigating Large Language Models As Re-Ranking Agents
NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models Via Complexity Classes
Robust Knowledge Extraction from Large Language Models Using Social Choice Theory
LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models
Editing Models with Task Arithmetic
Time is Encoded in the Weights of Finetuned Language Models
TinyGPT-V: Efficient Multimodal Large Language Model Via Small Backbones
OpenChat: Advancing Open-Source Language Models with Mixed-Quality Data
What Makes Good Data for Alignment? a Comprehensive Study of Automatic Data Selection in Instruction Tuning
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
MultiInstruct: Improving Multi-Modal Zero-Shot Learning Via Instruction Tuning
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Dense X Retrieval: What Retrieval Granularity Should We Use?
ARES: an Automated Evaluation Framework for Retrieval-Augmented Generation Systems
A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
A Survey of Reasoning with Foundation Models
GPT-4V(ision) is a Generalist Web Agent, If Grounded
Large Language Models for Generative Information Extraction: a Survey
EQ-Bench: an Emotional Intelligence Benchmark for Large Language Models
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code
TrustLLM: Trustworthiness in Large Language Models
Blending is All You Need: Cheaper Better Alternative to Trillion-Parameters LLM
Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models Through Logic
Airavata: Introducing Hindi Instruction-Tuned LLM
Chain-of-Symbol Prompting for Spatial Relationships in Large Language Models
Continual Pre-training of Language Models
Simplifying Transformer Blocks
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
AnglE-Optimized Text Embeddings
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Ghostbuster: Detecting Text Ghostwritten by Large Language Models
Monarch Mixer: a Simple Sub-Quadratic GEMM-Based Architecture
DistillCSE: Distilled Contrastive Learning for Sentence Embeddings
The Unlocking Spell on Base LLMs: Rethinking Alignment Via In-Context Learning
GLiNER: Generalist Model for Named Entity Recognition Using Bidirectional Transformer
2024
BLIVA: a Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Building a Llama2-finetuned LLM for Odia Language Utilizing Domain Knowledge Instruction Set
Leveraging Large Language Models for NLG Evaluation: a Survey
Nomic Embed: Training a Reproducible Long Context Text Embedder
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
Seven Failure Points When Engineering a Retrieval Augmented Generation System
SALAD-Bench: a Hierarchical and Comprehensive Safety Benchmark for Large Language Models
DoRA: Weight-Decomposed Low-Rank Adaptation
ICDPO: Effectively Borrowing Alignment Capability of Others Via In-context Direct Preference Optimization
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Two-dimensional Matryoshka Sentence Embeddings
Benchmarking Hallucination in Large Language Models Based on Unanswerable Math Word Problem
IndicVoices: Towards Building an Inclusive Multilingual Speech Dataset for Indian Languages
ArtPrompt: ASCII Art-based Jailbreak Attacks Against Aligned LLMs
The Calibration Gap Between Model and Human Confidence in Large Language Models
Fact-Checking the Output of Large Language Models Via Token-Level Uncertainty Quantification
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
The Power of Noise: Redefining Retrieval for RAG Systems
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Human Alignment of Large Language Models Through Online Preference Optimisation
A General Theoretical Paradigm to Understand Learning from Human Preferences
What are Tools Anyway? a Survey from the Language Model Perspective
AutoDev: Automated AI-Driven Development
LLM4Decompile: Decompiling Binary Code with Large Language Models
OLMo: Accelerating the Science of Language Models
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
RAG Vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
RAFT: Adapting Language Model to Domain Specific RAG
Corrective Retrieval Augmented Generation
SaulLM-7B: a Pioneering Large Language Model for Law
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
AIOS: LLM Agent Operating System
Lumos: a Modular Open-Source LLM-Based Agent Framework
A Comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course
sDPO: Don’t Use Your Data All at Once
RS-DPO: a Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
Dataverse: Open-Source ETL (Extract Transform Load) Pipeline for Large Language Models
Teaching Large Language Models to Reason with Reinforcement Learning
Jamba: a Hybrid Transformer-Mamba Language Model
Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge
LLM2Vec: Large Language Models are Secretly Powerful Text Encoders
HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation
ReFT: Representation Finetuning for Language Models
Towards Conversational Diagnostic AI
Reka Core, Flash, and Edge: a Series of Powerful Multimodal Language Models
Phi-3 Technical Report: a Highly Capable Language Model Locally on Your Phone
Mixtral of Experts
BioMistral: a Collection of Open-Source Pretrained Large Language Models for Medical Domains
Gemma: Open Models Based on Gemini Research and Technology
SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models
Instruction-tuned Language Models are Better Knowledge Learners
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Prometheus 2: an Open Source Language Model Specialized in Evaluating Other Language Models
Mixture of LoRA Experts
Teaching Large Language Models to Self-Debug
You Only Cache Once: Decoder-Decoder Architectures for Language Models
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Better & Faster Large Language Models Via Multi-token Prediction
When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards
Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
Hallucination of Multimodal Large Language Models: a Survey
In-Context Learning with Long-Context Models: an In-Depth Exploration
NOLA: Compressing LoRA Using Linear Combination of Random Basis
Data Selection for Transfer Unlearning
A Primer on the Inner Workings of Transformer-Based Language Models
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Nemotron-4 340B Technical Report
RewardBench: Evaluating Reward Models for Language Modeling
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Transferring Knowledge from Large Foundation Models to Small Downstream Models
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss
MAGPIE: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
MDPO: Conditional Preference Optimization for Multimodal Large Language Models
Aligning Large Multimodal Models with Factually Augmented RLHF
Statistical Rejection Sampling Improves Preference Optimization
Chameleon: Mixed-Modal Early-Fusion Foundation Models
MMMU: a Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
MMBench: is Your Multi-modal Model an All-around Player?
GPQA: a Graduate-Level Google-Proof Q&A Benchmark
Sycophancy to Subterfuge: Investigating Reward Tampering in Language Models
ReST-MCTS*: LLM Self-Training Via Process Reward Guided Tree Search
FLAME: Factuality-Aware Alignment for Large Language Models
Is DPO Superior to PPO for LLM Alignment? a Comprehensive Study
Improving Multi-step Reasoning for LLMs with Deliberative Planning
SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation
A Careful Examination of Large Language Model Performance on Grade School Arithmetic
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Improve Mathematical Reasoning in Language Models by Automated Process Supervision
Accessing GPT-4 Level Mathematical Olympiad Solutions Via Monte Carlo Tree Self-refine with LLaMa-3 8B: a Technical Report
SimPO: Simple Preference Optimization with a Reference-Free Reward
Discovering Preference Optimization Algorithms with and for Large Language Models
ToRA: a Tool-Integrated Reasoning Agent for Mathematical Problem Solving
Scaling LLM Test-Time Compute Optimally Can be More Effective Than Scaling Model Parameters
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Mixture-of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Learn Beyond the Answer: Training Language Models with Reflection for Mathematical Reasoning
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
V-STaR: Training Verifiers for Self-Taught Reasoners
Speech
2017
On Evaluating and Comparing Conversational Agents
2018
Attention-Based Models for Text-Dependent Speaker Verification
Efficient Voice Trigger Detection for Low Resource Hardware
2020
Automatic Speaker Recognition with Limited Data
Speaker Identification for Household Scenarios with Self-attention and Adversarial Training
Stacked 1D Convolutional Networks for End-to-end Small Footprint Voice Trigger Detection
Optimize What Matters: Training DNN-HMM Keyword Spotting Model Using End Metric
MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition
HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
2021
Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation
Joint ASR and Language Identification Using RNN-T: an Efficient Approach to Dynamic Language Switching
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Deep Spoken Keyword Spotting: an Overview
BW-EDA-EEND: Streaming End-to-end Neural Speaker Diarization for a Variable Number of Speakers
Attentive Contextual Carryover for Multi-turn End-to-end Spoken Language Understanding
SmallER: Scaling Neural Entity Resolution for Edge Devices
Leveraging Multilingual Neural Language Models for On-Device Natural Language Understanding
Comparing Data Augmentation and Annotation Standardization to Improve End-to-end Spoken Language Understanding Models
CLAR: Contrastive Learning of Auditory Representations
2022
Robust Self-Supervised Audio-Visual Speech Recognition
Adaptive Global-Local Context Fusion for Multi-Turn Spoken Language Understanding
SpeechMatrix: a Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
GIT: a Generative Image-to-text Transformer for Vision and Language
2023
Simple and Controllable Music Generation
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Joint Audio and Speech Understanding
Long-Form Music Generation with Latent Diffusion
Multimodal
2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Rethinking Attention with Performers
2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Image As a Foreign Language: BEIT Pretraining for All Vision and Vision-Language Tasks
Visual Programming: Compositional Visual Reasoning Without Training
Video-ChatGPT: Towards Detailed Video Understanding Via Large Vision and Language Models
2023
Meta-Transformer: a Unified Framework for Multimodal Learning
Any-to-Any Generation Via Composable Diffusion
Bytes are All You Need: Transformers Operating Directly on File Bytes
Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering
Sound Reconstruction from Human Brain Activity Via a Generative Model with Brain-like Auditory Features
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Unified Model for Image, Video, Audio and Language Tasks
Qwen-7B: Open Foundation and Human-aligned Models
Qwen-VL: a Frontier Large Vision-Language Model with Versatile Abilities
NExT-GPT: Any-to-Any Multimodal LLM
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Demystifying CLIP Data
Scalable Diffusion Models with Transformers
DeepFloyd IF
PIXART-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
RAPHAEL: Text-to-Image Generation Via Large Mixture of Diffusion Paths
ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts
CogVLM: Visual Expert for Pretrained Language Models
Improved Baselines with Visual Instruction Tuning
Matryoshka Diffusion Models
MAViL: Masked Audio-Video Learners
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Hyperbolic Image-Text Representations
Evaluating Object Hallucination in Large Vision-Language Models
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
FLAP: Fast Language-Audio Pre-training
Jointly Learning Visual and Auditory Speech Representations from Raw Data
MIRASOL3B: a Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities
Video-LLaMA: an Instruction-tuned Audio-Visual Language Model for Video Understanding
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Visual Instruction Inversion: Image Editing Via Visual Prompting
A Video is Worth 4096 Tokens: Verbalize Videos to Understand Them in Zero Shot
Emu Edit: Precise Image Editing Via Recognition and Generation Tasks
2024
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
MME: a Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
PALO: a Polyglot Large Multimodal Model for 5B People
Sigmoid Loss for Language Image Pre-Training
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model Under Weak Conditions
DeepSeek-VL: Towards Real-World Vision-Language Understanding
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
OOTDiffusion: Outfitting Fusion Based Latent Diffusion for Controllable Virtual Try-on
Glyph-ByT5: a Customized Text Encoder for Accurate Visual Text Rendering
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Mora: Enabling Generalist Video Generation Via a Multi-Agent Framework
VILA: on Pre-training for Visual Language Models
PaliGemma: a Versatile 3B VLM for Transfer
Core ML
2016
The Peaking Phenomenon in Semi-supervised Learning
2018
Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning
2019
Which Algorithmic Choices Matter at Which Batch Sizes? Insights from a Noisy Quadratic Model
2020
Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference
2021
Tensor Programs V: Tuning Large Neural Networks Via Zero-Shot Hyperparameter Transfer
2022
OmniXAI: a Library for Explainable AI
VeLO: Training Versatile Learned Optimizers by Scaling up
2023
CoLT5: Faster Long-Range Transformers with Conditional Computation
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Sophia: a Scalable Stochastic Second-order Optimizer for Language Model Pre-training
DoReMi: Optimizing Data Mixtures Speeds up Language Model Pretraining
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Tackling the Curse of Dimensionality with Physics-Informed Neural Networks
(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models
The Depth-to-Width Interplay in Self-Attention
The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
2024
Evolutionary Optimization of Model Merging Recipes
BANG: Billion-Scale Approximate Nearest Neighbor Search Using a Single GPU
RecSys
2019
Deep Learning Recommendation Model for Personalization and Recommendation Systems
FiBiNET: Combining Feature Importance and Bilinear Feature Interaction for Click-Through Rate Prediction
AutoInt: Automatic Feature Interaction Learning Via Self-Attentive Neural Networks
2020
DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems
GCN-Based User Representation Learning for Unifying Robust Recommendation and Fraudster Detection
2022
DHEN: a Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction
2023
Towards Deeper, Lighter, and Interpretable Cross Network for CTR Prediction
Do LLMs Understand User Preferences? Evaluating LLMs on User Rating Prediction
Fresh Content Needs More Attention: Multi-funnel Fresh Content Recommendation
Large Language Models are Zero-Shot Rankers for Recommender Systems
How Can Recommender Systems Benefit from Large Language Models: a Survey
RL
2022
TransDreamer: Reinforcement Learning with Transformer World Models
Graph ML
2019
RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space
2023
Graph-Bert: Only Attention is Needed for Learning Graph Representations
2024
GraphMaker: Can Diffusion Models Generate Large Attributed Graphs?
Generative Diffusion Models on Graphs: Methods and Applications
A Survey on Graph Diffusion Models: Generative AI in Science for Molecule Protein and Material