Another epic article! What a great breakdown of the paper/findings. Incredible to see how well BERT did on some tasks, especially SWAG!
This is how all research should be communicated — with an article of this quality.
One question: is the BERT architecture open-source? Or is a trained model available to use?