Simple experiment
→ simply remove these biased tokens (e.g., high-frequency subwords and punctuation) → use the average of the remaining token embeddings as the sentence representation (see the sketch after this block)
Conclusion: avoiding embedding bias can improve the performance of sentence representations.
Limitation: manually removing embedding biases is labor-intensive.
→ Solution: reformulate sentence representation as a fill-in-the-blank problem using different prompts.
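A minimal sketch of the debiased-averaging experiment, assuming bert-base-uncased and an illustrative `BIASED_TOKENS` set (the actual biased tokens would come from frequency statistics, and the same filtering can also be applied to static token embeddings):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

# Hypothetical bias list: punctuation plus a few high-frequency subwords.
# Illustrative only; a real list would be derived from token frequency statistics.
BIASED_TOKENS = {".", ",", "!", "?", "the", "a", "an", "of", "[CLS]", "[SEP]"}

def debiased_avg_embedding(sentence: str) -> torch.Tensor:
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]          # (seq_len, dim)
    tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    keep = [i for i, t in enumerate(tokens) if t not in BIASED_TOKENS]
    # Sentence representation = average of the remaining (non-biased) token embeddings.
    return hidden[keep].mean(dim=0)
```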
(1) Contrastive-learning-based methods → construct positive sentence pairs.
(2) Anisotropy: BERT-flow, BERT-whitening → reduce the anisotropy by post-processing the sentence embeddings from the original BERT.
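For reference, the post-processing idea behind whitening can be sketched as follows (a minimal numpy version, not the official BERT-whitening code): center the sentence embeddings and decorrelate their dimensions so the embedding space becomes roughly isotropic.

```python
import numpy as np

def whiten(embeddings: np.ndarray) -> np.ndarray:
    """Whiten an (n_sentences, dim) matrix of sentence embeddings.

    Centering plus decorrelating the dimensions makes the embedding
    distribution roughly isotropic, which is the goal of this line of work.
    """
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov(embeddings - mu, rowvar=False)      # (dim, dim) covariance
    u, s, _ = np.linalg.svd(cov)
    w = u @ np.diag(1.0 / np.sqrt(s))                # whitening transform
    return (embeddings - mu) @ w
```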
Analysis of the effect of BERT layers → compare two sentence embeddings:
(1) averaging static token embeddings (input of BERT)
(2) averaging the last layer (output of BERT)
→ measure the sentence-level anisotropy
: the anisotropy of sentence embeddings is computed from the cosine similarity between sentence embeddings (see the sketch below)
→ the closer to 1, the more anisotropic
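A minimal sketch of this comparison, assuming bert-base-uncased and a tiny placeholder sentence list (the paper measures over a full corpus): build both kinds of sentence embeddings, then estimate sentence-level anisotropy as the average cosine similarity over all sentence pairs.

```python
import torch
from itertools import combinations
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def sentence_embedding(sentence: str, use_last_layer: bool) -> torch.Tensor:
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        if use_last_layer:
            # (2) average of the last transformer layer (output of BERT)
            hidden = model(**inputs).last_hidden_state[0]
        else:
            # (1) average of the static token embeddings (input of BERT, no layers applied)
            hidden = model.get_input_embeddings()(inputs["input_ids"])[0]
    return hidden.mean(dim=0)

def sentence_anisotropy(embeddings: list[torch.Tensor]) -> float:
    # Average cosine similarity over all sentence pairs; closer to 1 = more anisotropic.
    sims = [torch.cosine_similarity(a, b, dim=0).item()
            for a, b in combinations(embeddings, 2)]
    return sum(sims) / len(sims)

sentences = ["a man is playing a guitar", "a woman is cooking", "the cat sleeps on the sofa"]
for use_last in (False, True):
    embs = [sentence_embedding(s, use_last) for s in sentences]
    print("last layer" if use_last else "static", sentence_anisotropy(embs))
```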
bert-base-uncased, roberta-base: the BERT layers harm sentence embedding performance (the last-layer average underperforms the static-token average)
Also, the performance degradation from the BERT layers is NOT due to sentence-level anisotropy
→ Reason: the last-layer average is actually more isotropic than the static token average