2. Why ‘pre-trained language model’?
● Achieves strong performance across a wide variety of language tasks.
● Similar to how ImageNet classification pre-training helps many vision tasks (*)
● Even better than in the CV setting, it does not require labeled data for pre-training (see the sketch below).
(*) Although recently He et al. (2018) found that pre-training might not be necessary for image segmentation tasks.
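A minimal sketch (not from the slides) of why no labeled data is needed: language-model pre-training builds its own (input, target) pairs from raw text via next-word prediction, so the "labels" come for free. The tokenization and example text here are toy assumptions.

```python
# Toy illustration: raw text alone yields supervised-looking training pairs.
raw_text = "pre-trained language models transfer well to many tasks"
tokens = raw_text.split()  # toy whitespace tokenization; real models use subword vocabularies

# Next-word prediction: each prefix is an input, the following token is the target.
examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in examples[:3]:
    print(context, "->", target)
# ['pre-trained'] -> language
# ['pre-trained', 'language'] -> models
# ['pre-trained', 'language', 'models'] -> transfer
```

This is what makes the setup cheaper than ImageNet-style pre-training, which relies on millions of human-annotated class labels.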