Posts Tagged "ML"

A Mini BERT Cross-Section: How [PAD] Tokens Work in Attention Masking

A deep dive into how [PAD] tokens behave inside BERT, following the pipeline from embedding through matmul to attention masking.
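
For readers who want a concrete picture of the masking step, here is a minimal sketch in plain Python. It assumes the common additive-mask approach, where padded positions get a large negative bias before the softmax; the post itself may do this differently. The function name `masked_softmax`, the scores, and the 4-token sequence are all hypothetical, chosen only for illustration.

```python
import math

def masked_softmax(scores, mask):
    """Softmax over one row of attention scores, ignoring masked positions.

    scores: raw query-key scores for a single query token
    mask:   1 for real tokens, 0 for [PAD] (hypothetical convention)
    """
    # Add a large negative bias to padded positions so exp() drives them to ~0.
    biased = [s + (0.0 if m else -1e9) for s, m in zip(scores, mask)]
    # Subtract the max for numerical stability.
    top = max(biased)
    exps = [math.exp(b - top) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-token sequence: three real tokens followed by one [PAD].
scores = [2.0, 1.0, 0.5, 3.0]
attention_mask = [1, 1, 1, 0]
weights = masked_softmax(scores, attention_mask)
```

Even though the [PAD] position has the highest raw score here, its attention weight comes out as effectively zero, and the remaining weight is shared among the real tokens.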