Epic article, Miguel! I love how you broke down scaled dot-product attention step by step with examples.
One suggestion: I’d include a one-liner at the top of the article stating what BERT is; I had no clue when I started reading and only found out through a few link clicks.