[RoBERTa: A Robustly Optimized BERT Pretraining Approach]
RoBERTa = a replication study of BERT pretraining
[Problem]
→ BERT was significantly undertrained, and its design choices (training data size, batch size, masking strategy, the NSP objective) were never carefully compared against alternatives.
[Solution]
→ Retrain BERT with several changes to the pretraining recipe:
(1) Train the model longer, with bigger batches, over more data (see the setup comparison after this list)
(2) Remove the NSP (Next Sentence Prediction) objective
(3) Train on longer, full-length sequences
(4) Replace static masking with dynamic masking (see the masking sketch after this list)
(5) Collect a large new dataset (CC-News)
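Most of these changes concern scale rather than architecture. The sketch below collects the headline figures reported in the paper into a plain Python dict for side-by-side reading; only the quoted numbers come from the paper, while the dict layout and field names are illustrative.

```python
# Rough side-by-side of the two pretraining setups (figures as reported
# in the RoBERTa paper; this dict is only an illustration, not a real
# training configuration).
BERT_LARGE_PRETRAIN = {
    "data": "BookCorpus + English Wikipedia (~16GB)",
    "batch_size": "256 sequences",
    "steps": "1M",
    "sequences": "up to 512 tokens, mostly shorter during training",
    "objectives": ["MLM (static masking)", "NSP"],
}

ROBERTA_PRETRAIN = {
    "data": "BookCorpus + Wikipedia + CC-News + OpenWebText + Stories (~160GB)",
    "batch_size": "8K sequences",
    "steps": "up to 500K",
    "sequences": "full-length 512-token sequences",
    "objectives": ["MLM (dynamic masking)"],  # NSP removed
}
```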
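For (4): BERT masks each training sequence once during preprocessing, so the same masked positions are reused every epoch (static masking). RoBERTa instead samples a new mask every time a sequence is fed to the model. Below is a minimal PyTorch sketch of that idea, not the authors' code; it assumes the usual 15% / 80-10-10 BERT masking recipe, and MASK_ID / VOCAB_SIZE / the function name are hypothetical.

```python
import torch

MASK_ID = 4          # hypothetical id of the <mask> token
VOCAB_SIZE = 50265   # RoBERTa BPE vocabulary size
MLM_PROB = 0.15      # fraction of tokens selected for prediction

def dynamic_mask(input_ids: torch.Tensor, special_tokens_mask: torch.Tensor):
    """Sample a fresh MLM mask for this batch (labels are -100 where unmasked).

    `special_tokens_mask` is a bool tensor marking positions (e.g. <s>, </s>,
    padding) that must never be masked.
    """
    labels = input_ids.clone()

    # pick ~15% of the non-special tokens as prediction targets
    probs = torch.full(input_ids.shape, MLM_PROB)
    probs.masked_fill_(special_tokens_mask, 0.0)
    selected = torch.bernoulli(probs).bool()
    labels[~selected] = -100  # ignored by the cross-entropy loss

    masked_ids = input_ids.clone()
    # 80% of selected tokens -> <mask>
    to_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & selected
    masked_ids[to_mask] = MASK_ID
    # half of the remainder (10% overall) -> random token; the last 10% stay unchanged
    to_random = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & selected & ~to_mask
    masked_ids[to_random] = torch.randint(VOCAB_SIZE, input_ids.shape)[to_random]
    return masked_ids, labels

# Calling dynamic_mask inside the data loader means the same sentence gets a
# different masking pattern on every pass, instead of the single (or 10
# pre-computed) static masks used in the original BERT setup.
```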
[Results]
→ State-of-the-art results on GLUE, RACE, and SQuAD at the time of publication; matches or exceeds every model published after BERT.
[Contribution]
→ Shows that much of the gain reported by post-BERT models comes from previously overlooked design choices and pretraining scale rather than new objectives or architectures; models and code are released.