Abstract

1. Introduction

Evolution of response selection

RNN- and CNN-based models: compute a relevance score from the encoded representations
utterance-response matching models: build utterance-level encodings and use them to find related words between the two sentences
self-attention-based matching models: strengthen the encoding of fine-grained segment representations
models based on contextualized language representations (BERT, XLNet)
- pre-trained contextual language models were shown to be effective
- SOTA on several NLP tasks
- however, they do not perform well on domain-specific corpora
  
  Reasons:
  1) the model is biased toward one domain, so it cannot fully represent dialog contexts
  2) conversational text contains many colloquial expressions and abbreviations, so it is often ungrammatical
→ Approaches to this problem
Domain knowledge embeddings
1. integrate external knowledge into the embeddings
2. RRC (review reading comprehension): an attempt to learn domain specificity with a pre-trained model
3. a BERT-based post-training method proposed for RRC
→ post-train BERT with MLM and NSP on the target domain only

⇒ enables task-aware contextualized representations
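
To make the MLM half of post-training concrete, here is a minimal sketch of BERT-style token masking on a tokenized domain sentence. It uses the standard BERT recipe (15% of tokens selected; of those, 80% become `[MASK]`, 10% a random token, 10% left unchanged). The function name `mask_tokens`, the toy vocabulary, and the per-token sampling are assumptions for illustration, not the paper's code; the original BERT implementation selects a fixed number of positions per sequence rather than sampling each token independently.

```python
import random

MASK = "[MASK]"
# Special tokens are never selected for masking (assumed set for this sketch).
SPECIAL = {"[CLS]", "[SEP]", "[EOT]"}


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for masked language modeling.

    labels[i] holds the original token at selected positions and None
    elsewhere, so the loss is computed only on selected positions.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in SPECIAL or rng.random() >= mask_prob:
            continue  # position not selected
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels


if __name__ == "__main__":
    sentence = ["[CLS]", "how", "do", "i", "install", "gcc", "[EOT]", "[SEP]"]
    toy_vocab = ["apt", "sudo", "ubuntu", "kernel"]
    print(mask_tokens(sentence, toy_vocab, rng=random.Random(0)))
```

In actual post-training the masked inputs would be fed to BERT together with the NSP label, and both losses would be summed.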
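Proposed method
- a post-training method for multi-turn conversational systems
- BERT-base applied to Ubuntu Corpus V1
- NSP, which predicts whether the next sentence follows or not, plays a very important role
- [EOT]: lets the model capture the relationships between utterances during post-training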
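
The role of NSP and `[EOT]` can be sketched by building post-training examples from one multi-turn dialog. The input layout below (`[CLS] u1 [EOT] u2 [EOT] ... [SEP] next [SEP]`), the helper names `build_input` and `nsp_examples`, whitespace tokenization in place of WordPiece, and random sampling of negatives are all assumptions for illustration; the paper's exact format may differ.

```python
import random

EOT = "[EOT]"


def build_input(context, response):
    """Concatenate context utterances, marking each turn end with [EOT]."""
    tokens = ["[CLS]"]
    for utterance in context:
        tokens += utterance.split() + [EOT]
    tokens += ["[SEP]"] + response.split() + ["[SEP]"]
    return tokens


def nsp_examples(dialog, distractors, rng=None):
    """Build (tokens, label) pairs: 1 = true next utterance, 0 = sampled one."""
    rng = rng or random.Random()
    examples = []
    for t in range(1, len(dialog)):
        context, next_utterance = dialog[:t], dialog[t]
        examples.append((build_input(context, next_utterance), 1))
        # Negative: an utterance drawn from elsewhere (assumed sampling scheme).
        examples.append((build_input(context, rng.choice(distractors)), 0))
    return examples


if __name__ == "__main__":
    dialog = ["my wifi is down", "which ubuntu version ?", "16.04"]
    distractors = ["try apt-get update"]
    for tokens, label in nsp_examples(dialog, distractors, random.Random(0)):
        print(label, " ".join(tokens))
```

A real pipeline would also need to make sure a sampled negative is not identical to the true next utterance.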

Lowe et al. [1]

: Ubuntu Corpus V1

Kadlec et al. [2]