Hi, thank you very much for sharing the code for the paper. Integrating contrastive learning into skill discovery is very attractive.
However, I found that in this implementation, the state encoder and skill encoder in the CIC module ($g_{\psi_1}$ and $g_{\psi_2}$ in the paper) are never applied to the observation and skill before they are fed into the policy networks. In cic/agent/cic.py line 222, the CIC module's parameters are updated once,
but the encoders are not called afterwards to encode obs and skill.
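To make sure I am describing my expectation clearly, here is a minimal sketch of what I assumed would happen (the names `CICEncoder`, `state_encoder`, `skill_encoder`, and `actor` are my own placeholders, not the attributes in your repo):

```python
import torch
import torch.nn as nn

class CICEncoder(nn.Module):
    """Stand-in for g_psi1 / g_psi2: maps a raw input to an embedding."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

obs_dim, skill_dim, hidden_dim, emb_dim, act_dim = 24, 64, 256, 64, 6
state_encoder = CICEncoder(obs_dim, hidden_dim, emb_dim)    # g_psi1
skill_encoder = CICEncoder(skill_dim, hidden_dim, emb_dim)  # g_psi2
actor = nn.Linear(2 * emb_dim, act_dim)  # placeholder policy head

obs = torch.randn(1, obs_dim)
skill = torch.randn(1, skill_dim)

# What I expected: encode obs and skill first, then feed the
# embeddings to the policy...
actor_input = torch.cat([state_encoder(obs), skill_encoder(skill)], dim=-1)
action = actor(actor_input)

# ...whereas, if I read the code correctly, the actor/critic receive the
# raw obs and skill concatenated directly, e.g.:
# actor_input = torch.cat([obs, skill], dim=-1)
```

Please let me know if I am misreading how the encoders are meant to be used.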
Another question: how can the agent guarantee that the policy is "indeed conditioned on $z$" when the intrinsic reward has nothing to do with $z$? In other words, $\tau$ can be arbitrarily diverse, which is good for exploration, but there seems to be no mechanism to ensure the agent knows what the influence of $z$ is.
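To spell out my reading of the objective (please correct me if this is wrong): the paper decomposes the mutual information as

$$I(\tau; z) = H(\tau) - H(\tau \mid z),$$

and in this implementation the intrinsic reward appears to come only from the particle-based estimate of the entropy term $H(\tau)$, while the conditional term $H(\tau \mid z)$ only enters through the contrastive loss used to train $g_{\psi_1}$ and $g_{\psi_2}$. If that is the case, the reward signal itself carries no information about $z$, which is what prompts my question.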
I really like your work, but these issues confuse me a lot. Please correct me if I am wrong or am missing something. Thank you again for your kindness in sharing.