The state and skill encoders learned with contrastive learning are never used? #6

@xf-zhao

Hi, thank you very much for sharing the code for the paper. Integrating contrastive learning into skill discovery is very attractive.

However, I found that in this implementation, the state encoder and skill encoder in the cic module ($g_{\psi_1}$ and $g_{\psi_2}$ in the paper) are never used to encode the observation and skill before they are fed into the policy networks. In cic/agent/cic.py line 222, the parameters of the cic module are updated once, but the module is never called afterwards to encode obs and skill (see the sketch below).
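
To make this concrete, here is a minimal sketch of the pattern I am describing. The names, shapes, and hyperparameters are hypothetical stand-ins, not the repository's actual classes: the contrastive update trains both encoders, but the actor only ever receives the raw observation and raw skill vector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, skill_dim, hidden = 24, 64, 256  # hypothetical sizes

# Illustrative stand-ins: state_net plays the role of g_psi1 (transition
# encoder) and skill_net the role of g_psi2 (skill encoder).
state_net = nn.Sequential(nn.Linear(2 * obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, skill_dim))
skill_net = nn.Sequential(nn.Linear(skill_dim, hidden), nn.ReLU(), nn.Linear(hidden, skill_dim))
actor = nn.Sequential(nn.Linear(obs_dim + skill_dim, hidden), nn.ReLU(), nn.Linear(hidden, 6))

def contrastive_update(obs, next_obs, skill):
    # Noise-contrastive loss over the batch: the matching (tau, z) pair is
    # the positive, every other pair in the batch is a negative.
    query = state_net(torch.cat([obs, next_obs], dim=-1))  # g_psi1(tau)
    key = skill_net(skill)                                 # g_psi2(z)
    logits = F.normalize(query, dim=-1) @ F.normalize(key, dim=-1).T / 0.5
    labels = torch.arange(obs.shape[0])
    return F.cross_entropy(logits, labels)                 # trains both encoders

def act(obs, skill):
    # The policy consumes the raw observation and raw skill; neither encoder
    # appears on this path, which is the behavior in question.
    return actor(torch.cat([obs, skill], dim=-1))
```

If this reading is correct, the encoders only shape the intrinsic reward, not the representation the policy conditions on.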

Another question: how can the agent guarantee that the policy is indeed conditioned on $z$, given that the intrinsic reward has nothing to do with $z$? In other words, $\tau$ can be arbitrarily diverse, which is good for exploration, but there seems to be no mechanism to ensure the agent knows what the influence of $z$ is.
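
For reference, my understanding is that the intrinsic reward is an APT-style k-nearest-neighbor particle estimate of the transition entropy $H(\tau)$, along the lines of the hedged sketch below (again not the repository's exact code, and the default $k$ is hypothetical). Note that $z$ never enters the computation:

```python
import torch

def knn_intrinsic_reward(rep: torch.Tensor, k: int = 12) -> torch.Tensor:
    # rep: (B, d) batch of embedded transitions tau.
    # Particle estimate of H(tau): the reward grows with the distance to the
    # k-th nearest embedded transition in the batch.
    dists = torch.cdist(rep, rep, p=2)                 # (B, B) pairwise distances
    knn, _ = dists.topk(k + 1, dim=-1, largest=False)  # +1 skips the zero self-distance
    return torch.log(1.0 + knn[:, -1])                 # per-transition reward; z plays no role
```

So two different skills $z_1 \neq z_2$ that produce equally diverse trajectories would receive the same reward.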

I really like your work, but these issues confuse me a lot. Please correct me if I am wrong or have missed something. Thank you again for your kindness in sharing.
