
3 presentations
4 views
Presentations

Memory Augmented Language Models through Mixture of Word Experts
Cicero Nogueira dos Santos and 4 other authors

CoLT5: Faster Long-Range Transformers with Conditional Computation (video)
Joshua Ainslie and 11 other authors

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Joshua Ainslie and 5 other authors