Small Language Models Are Also Few-Shot Learners
Zero-shot learning: predicting unseen classes that are not included in the train set
Zero-shot learning with LMs
Providing task descriptions
→ applied to text classification
→ commonsense knowledge mining
→ argumentative relation classification
→ probing the knowledge contained within LMs
Reformulating tasks as cloze questions is difficult
→ propose PET (uses knowledge distillation, KD)
→ self-training (to easily combine several reformulations)
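The KD step above can be sketched as a toy example (hypothetical logits and label names; in real PET each pattern's score comes from an MLM evaluating verbalizer tokens in a cloze): each reformulation (pattern) yields a label distribution for an unlabeled example, and the averaged soft labels become the distillation targets for the final classifier.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical per-pattern logits for one unlabeled example,
# ordered [positive, negative]; illustration only.
pattern_logits = [
    [2.0, 0.5],   # pattern 1
    [1.2, 1.0],   # pattern 2
    [2.5, 0.1],   # pattern 3
]

# Distillation step: average the patterns' soft predictions
# into a single soft label for this example.
per_pattern_probs = [softmax(l) for l in pattern_logits]
soft_label = [
    sum(p[i] for p in per_pattern_probs) / len(per_pattern_probs)
    for i in range(2)
]
```

The soft label (not a hard argmax) is what lets several reformulations be combined smoothly: patterns that disagree simply contribute less peaked targets.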
This Model
PET uses MLM
→ to assign probabilities to sequences of text
→ In contrast to PET, examples are given as context but no parameter updates are performed.
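The MLM-as-classifier idea above can be sketched as follows (a minimal sketch with a mocked MLM and hypothetical scores; real PET queries a pretrained masked LM): the input is wrapped in a cloze pattern, and the probability the MLM assigns to each label's verbalizer token at the masked position defines the class distribution.

```python
import math

# Verbalizer: map each label to a single token.
VERBALIZER = {"positive": "great", "negative": "terrible"}

def pattern(x):
    # Cloze pattern P(x): wrap the input so the label becomes a masked token.
    return f"{x} It was [MASK]."

def mask_logit(cloze_text, token):
    # Stand-in for a pretrained MLM's logit for `token` at the [MASK]
    # position; fixed hypothetical scores for illustration only.
    fake_scores = {"great": 3.1, "terrible": 0.4}
    return fake_scores[token]

def classify(x):
    # Softmax over the verbalizer tokens' mask logits = label distribution.
    text = pattern(x)
    logits = {label: mask_logit(text, tok) for label, tok in VERBALIZER.items()}
    m = max(logits.values())
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}

probs = classify("Best pizza ever!")
```

Because classification reduces to assigning probabilities at a masked position, the same pretrained MLM head is reused with no task-specific architecture, which is why a few labeled examples suffice to fine-tune it.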
Reducing the amount of compute required for few-shot learning is closely related to other efforts in Green AI