Mastering Diverse Domains through World Models
Adding Conditional Control to Text-to-Image Diffusion Models
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech
Token Merging: Your ViT But Faster
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
Dual PatchNorm
Reversible Vision Transformers
Offsite-Tuning: Transfer Learning without Full Model
A Length-Extrapolatable Transformer
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Multimodal Chain-of-Thought Reasoning in Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
InstructPix2Pix: Learning to Follow Image Editing Instructions
Towards Robust Blind Face Restoration with Codebook Lookup Transformer
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
Why do Nearest Neighbor Language Models Work?
Text2Poster: Laying Out Stylized Texts on Retrieved Images
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
Reversible Column Networks