6. Target encoder: an exponential moving average (EMA) of the online encoder's weights.
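The EMA target update can be sketched as below. This is a framework-free illustration, assuming parameters are flat lists of floats (a real encoder holds tensors); the function name ema_update and the coefficient name tau are hypothetical. The target encoder receives no gradient updates: its weights change only through this averaging step.

```python
def ema_update(target_params, online_params, tau):
    """Return tau * target + (1 - tau) * online, element-wise.

    tau is a hypothetical name for the EMA coefficient: tau = 0 copies the
    online weights outright, tau close to 1 makes the target change slowly.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]


online = [1.0, 2.0, 3.0]
target = [0.0, 0.0, 0.0]
target = ema_update(target, online, tau=0.9)
print(target)  # approximately [0.1, 0.2, 0.3]
```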
Self-Predictive Representations (SPR)
1. Online encoder and target encoder
2. Transition Model
3. Projection Heads
4. Prediction Loss
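The four components above can be chained as in the following toy sketch. It is not SPR's implementation: vectors are Python lists, every network (online_encoder, target_encoder, transition, online_proj, target_proj, predictor) is a hypothetical callable, and the Q-learning loss that the SPR loss is added to is omitted. The transition model is rolled out K steps from the encoded current observation, conditioned on the actions; the prediction head is applied only on the online branch, and the loss is the negative cosine similarity between predicted and target projections, summed over the K steps. In a real implementation the target branch is not back-propagated through.

```python
import math


def l2_normalize(v):
    """Scale v to unit length (returned unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def spr_loss(obs_seq, actions, online_encoder, target_encoder,
             transition, online_proj, target_proj, predictor):
    """obs_seq holds o_t..o_{t+K}; actions holds a_t..a_{t+K-1}."""
    if len(obs_seq) != len(actions) + 1:
        raise ValueError("need exactly one more observation than actions")
    z = online_encoder(obs_seq[0])              # latent for o_t
    loss = 0.0
    for k, action in enumerate(actions, start=1):
        z = transition(z, action)                # predicted latent for o_{t+k}
        y_hat = predictor(online_proj(z))        # online projection + prediction head
        y_tgt = target_proj(target_encoder(obs_seq[k]))  # target branch (no gradient in practice)
        loss -= cosine(y_hat, y_tgt)
    return loss


# Toy usage with hypothetical stand-in networks.
def identity(v):
    return list(v)


def add_action(z, a):
    return [x + a for x in z]


obs = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
print(spr_loss(obs, [1.0, 1.0], identity, identity, add_action,
               identity, identity, identity))  # about -2.0 (perfect predictions)
```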
11. Related Works
● Data-Efficient RL
○ SimPLe: pixel-level transition model.
○ Data-Efficient Rainbow (DER) and OTRainbow:
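○ Learn a latent-space model with a reconstruction loss.
○ DrQ, RAD: image augmentation alone lets them outperform many model-based methods.
○ Data augmentation is effective for improving generalization in multi-task and transfer learning.
SPR's approach makes even more effective use of data augmentation.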
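A common image augmentation in this line of work is the random shift used by DrQ: pad the frame by replicating its border, then crop back to the original size at a random offset. The sketch below is an illustration under assumptions, not code from DrQ, RAD or SPR: the image is a list of rows, pad=4 is only a default, and random_shift is a hypothetical name. Clamping indices into range is equivalent to replicate-padding followed by a crop.

```python
import random


def random_shift(image, pad=4, rng=random):
    """Shift an H x W image by up to `pad` pixels in each direction.

    Equivalent to replicate-padding by `pad` on every side and cropping an
    H x W window at a uniformly random offset. `rng` needs a randint method.
    """
    h, w = len(image), len(image[0])

    def px(i, j):
        # Clamp into the valid range = read from a replicate-padded border.
        return image[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]

    dy = rng.randint(-pad, pad)
    dx = rng.randint(-pad, pad)
    return [[px(i + dy, j + dx) for j in range(w)] for i in range(h)]


frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(random_shift(frame, pad=1, rng=random.Random(0)))
```

Each call draws a new offset, so two augmented views of the same frame usually differ; the rng argument allows a seeded random.Random for reproducibility.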
12. Related Works
● Representation Learning in RL:
○ CURL: image augmentation + contrastive loss.
■ Is it the image augmentation that actually does the work? (as suggested by RAD)
○ CPC, ST-DIM, DRIML: temporal contrastive losses.
○ DeepMDP: trains a transition model with an L2 loss.
■ Uses the online encoder to produce the prediction targets, which makes it prone to representational collapse.
■ Adds an observation reconstruction objective to mitigate this.
○ PBL: directly predicts representations of future states.
■ Uses two target networks. Focuses on multi-task generalization. Uses 100 times as much data as SPR.
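The contrastive methods listed above (CURL, CPC, ST-DIM, DRIML) train with an InfoNCE-style loss, which needs negatives, whereas DeepMDP, PBL and SPR regress predicted representations directly. The sketch below shows a generic InfoNCE loss for contrast. It is not any of these papers' code: it uses a plain dot product as the similarity (CURL, for instance, uses a bilinear product), treats the other keys in the batch as negatives, and info_nce is a hypothetical name.

```python
import math


def info_nce(queries, keys):
    """Mean InfoNCE loss: query i should match key i (the positive) rather
    than any other key in the batch (the negatives)."""
    total = 0.0
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[i]  # -log softmax probability of the positive
    return total / len(queries)


matched = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(matched < swapped)  # True: aligned positives give a lower loss
```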
SPR, by contrast, is self-supervised, is trained in latent space, and uses a normalized loss.
It also uses a target encoder and data augmentations.
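One reason a normalized loss helps can be seen in a toy example: an unnormalized L2 loss between prediction and target shrinks by the square of any common scale factor, so simply shrinking all representations lowers it, while the L2 distance between unit-normalized vectors (equal to 2 - 2 * cosine similarity) does not depend on scale. This is only an illustration of that one property, not a full account of representational collapse in DeepMDP, and the function names are hypothetical.

```python
import math


def l2_loss(pred, target):
    """Squared L2 distance."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def normalized_l2_loss(pred, target):
    """Squared L2 distance between unit-normalized vectors (= 2 - 2 * cosine).

    Assumes both vectors are non-zero.
    """
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    return l2_loss(unit(pred), unit(target))


pred, target = [1.0, 2.0], [2.0, 1.0]
for scale in (1.0, 0.1, 0.01):
    p = [scale * x for x in pred]
    t = [scale * x for x in target]
    # The unnormalized loss drops with the scale; the normalized one stays near 0.4.
    print(scale, l2_loss(p, t), normalized_l2_loss(p, t))
```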