Abstract

slot filling
- A task for evaluating the ability to automatically extract a KG from a document collection
- Given $[ENTITY, SLOT, ?]$, fill in $?$ using the relevant passages
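
To make the $[ENTITY, SLOT, ?]$ format concrete, here is a minimal sketch of how such a query could be represented and turned into a text input for a model. The class name `SlotQuery`, its methods, and the example entity are hypothetical and only for illustration; the `[SEP]` separator is an assumption about the input format, not something stated in these notes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SlotQuery:
    """A slot filling query [ENTITY, SLOT, ?] (hypothetical helper)."""
    entity: str
    slot: str

    def to_text(self, sep: str = " [SEP] ") -> str:
        # Assumed input format: "<entity> [SEP] <slot>".
        return f"{self.entity}{sep}{self.slot}"

    def fill(self, filler: str) -> Tuple[str, str, str]:
        # Replace "?" with a filler to obtain a complete KG triple.
        return (self.entity, self.slot, filler)


query = SlotQuery("Albert Einstein", "place of birth")
print(query.to_text())
print(query.fill("Ulm"))
```

The filler itself would come from the model; here it is passed in by hand only to show the shape of the resulting triple.
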
Recently, retrieval-based LMs have been used to solve the problem in an end-to-end fashion.
- RAG: performs well even without information extraction pipelines
- However, on the KILT benchmark it still falls short of the level of real-world IE systems.
This paper
- Presents several strategies for adapting RAG's retriever and generator to build a better slot filler
  
  → $KGI_0$: takes first place on the KILT leaderboard on T-REx and zsRE by a large margin

1. Introduction


The biggest barrier to enterprise adoption of KG technology
- Defining a schema and populating business-specific relational data sources
→ KGI: addresses this problem with zero-shot slot filling

Goal of slot filling

→ HOW TO?: search the corpus for occurrences of the input entity → gather information about the slot fillers from the surrounding context
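
The "find occurrences, then read the context" idea can be sketched with plain string matching. This is only a naive illustration: the function name `find_entity_contexts`, the `window` parameter, and the sample passages are hypothetical, and the retrieval-based systems discussed in these notes use learned retrievers rather than exact matching.

```python
import re
from typing import List, Tuple


def find_entity_contexts(entity: str, passages: List[str],
                         window: int = 60) -> List[Tuple[int, str]]:
    """Return (passage index, text snippet) for every mention of `entity`."""
    pattern = re.compile(re.escape(entity), re.IGNORECASE)
    contexts = []
    for idx, passage in enumerate(passages):
        for match in pattern.finditer(passage):
            # Keep up to `window` characters on each side of the mention.
            start = max(0, match.start() - window)
            end = min(len(passage), match.end() + window)
            contexts.append((idx, passage[start:end]))
    return contexts


passages = [
    "Marie Curie was born in Warsaw.",
    "Nothing relevant here.",
    "In 1903, Marie Curie won the Nobel Prize in Physics.",
]
for idx, snippet in find_entity_contexts("Marie Curie", passages, window=20):
    print(idx, snippet)
```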

Slot filling system pipeline

Evolution of slot filling systems

Problems with earlier systems
- They relied on complex pipelines including NER, entity co-reference resolution, and RE. → Of these, extracting relations between entities from text was the weakest component
Efforts to improve relation extraction
- Rule-based, supervised, and distantly supervised approaches, etc.
Limitations of these approaches
- Hand-crafting rules requires considerable human effort
- Training data must be annotated
- Well-curated datasets must be built for bootstrapping relation classifiers
Using LMs as a knowledge source
- Zero-shot slot filling with pre-trained transformers
  - Retrieval-augmented LMs such as RAG and REALM started to be used
  - They also started to provide contextual evidence for the generated slot fillers.
KILT (Knowledge Intensive Language Tasks): standardizes zsRE and T-REx
$KGI_0$
- A new slot-filling-specific training scheme that uses both DPR and RAG
- RAG's strategy of multiple sequence-to-sequence generations performs much better than multi-DPR BART, which concatenates three passages

RAG
- Method
  1. Integrates with DPR to obtain evidence passages for the query.
  2. Performs sequence-to-sequence generation with a model initialized from BART → each query is concatenated with an evidence passage to generate the answer
- Baseline RAG approach
  - During fine-tuning, only the query encoder and the generation component are trained; the passage encoder stays fixed
  - On the slot filling test, even the best-performing baseline does worse than BM25
    
    → Fine-tuning all components is more beneficial
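
The two RAG steps above can be sketched with toy numbers: passages are scored by the inner product of query and passage vectors, and the answer distribution is marginalized over the retrieved passages, $p(y|x) = \sum_z p(z|x)\,p(y|x,z)$. This is a minimal sketch, not the paper's implementation: all function names, vectors, and per-passage answer probabilities are made up, and a real system would obtain them from the DPR encoders and the BART generator.

```python
import math
from typing import Dict, List, Tuple


def inner_product(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_top_k(query_vec: List[float], passage_vecs: List[List[float]],
                   k: int) -> List[Tuple[int, float]]:
    """Return (passage index, score) for the k highest inner products."""
    scores = [inner_product(query_vec, p) for p in passage_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


def marginalize(passage_probs: List[float],
                answer_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """p(y|x) = sum over passages z of p(z|x) * p(y|x,z)."""
    totals: Dict[str, float] = {}
    for p_z, per_passage in zip(passage_probs, answer_probs):
        for answer, p_y in per_passage.items():
            totals[answer] = totals.get(answer, 0.0) + p_z * p_y
    return totals


# Toy example. In the baseline, the passage vectors would be precomputed and
# fixed, while the query vector comes from the fine-tuned query encoder.
query_vec = [1.0, 0.0]
passage_vecs = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
top = retrieve_top_k(query_vec, passage_vecs, k=2)
p_z = softmax([score for _, score in top])
# Hypothetical generator outputs, one answer distribution per retrieved passage.
answer_probs = [{"Ulm": 0.9, "Munich": 0.1}, {"Ulm": 0.6, "Munich": 0.4}]
print(marginalize(p_z, answer_probs))
```

Because the passage vectors never change in the baseline, the passage index can be built once; updating the passage encoder as well requires re-encoding the passages.
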
Multi-task DPR
- Trains the DPR passage and query encoders with multi-task training on the KILT suite of benchmarks
- The query and the top 3 passages are passed along with each passage's index → the BART model produces the answer
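
A rough sketch of how a concatenated generator input of this kind could be assembled is shown below. The exact input format is not given in these notes, so the separator, the `[index]` prefix, the function name `build_concatenated_input`, and the passage ids are all assumptions for illustration.

```python
from typing import List, Tuple


def build_concatenated_input(query: str,
                             passages: List[Tuple[int, str]],
                             top_k: int = 3,
                             sep: str = " </s> ") -> str:
    """Join the query and the top-k (passage index, text) pairs into one string."""
    parts = [query]
    for passage_index, text in passages[:top_k]:
        # Prefix each passage with its index so the source can be traced.
        parts.append(f"[{passage_index}] {text}")
    return sep.join(parts)


retrieved = [
    (17, "Albert Einstein was born in Ulm."),
    (42, "Einstein developed the theory of relativity."),
    (5, "Ulm is a city in Germany."),
    (99, "This fourth passage is dropped."),
]
print(build_concatenated_input("Albert Einstein [SEP] place of birth", retrieved))
```

Unlike the per-passage generation in RAG, this builds a single input, so the generator sees all three passages at once.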