Explore ability to improve generation using Self-Rewarding #76

@pramitchoudhary

Description

As I understand it, there are two parts to this:

  1. LLM-as-a-judge, to generate a reward score. As a first step, evaluate whether this strategy can help during a re-generation request with exhaustive beam search.
     [Screenshot attached, 2024-01-22]
  2. Self-training/modification on preference pairs
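
A rough sketch of how the two parts could fit together, to make the idea concrete. Everything named here is hypothetical: `JUDGE_PROMPT`, `parse_score`, `score_candidates`, `preference_pair`, and the `judge` callable are illustrative assumptions, not existing code in this repo, and the prompt wording and 0 to 5 scale are placeholders. The candidates would come from beam search during a re-generation request; here they are hard-coded and the judge is a stub.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical judge prompt; wording and the 0-5 scale are assumptions.
JUDGE_PROMPT = (
    "Rate the response to the question on a scale of 0 to 5.\n"
    "Question: {question}\nResponse: {response}\n"
    "End your answer with 'Score: <number>'."
)


def parse_score(judge_output: str) -> Optional[float]:
    """Extract the number after 'Score:' or return None if it is missing."""
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", judge_output)
    return float(match.group(1)) if match else None


def score_candidates(
    question: str, candidates: List[str], judge: Callable[[str], str]
) -> List[Tuple[str, float]]:
    """Part 1: ask the judge to score each candidate, best first."""
    scored = []
    for candidate in candidates:
        output = judge(JUDGE_PROMPT.format(question=question, response=candidate))
        score = parse_score(output)
        if score is not None:  # skip candidates the judge failed to score
            scored.append((candidate, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def preference_pair(scored: List[Tuple[str, float]]) -> Optional[Dict[str, str]]:
    """Part 2: pair the best and worst candidates for preference training."""
    if len(scored) < 2 or scored[0][1] == scored[-1][1]:
        return None  # no usable preference signal
    return {"chosen": scored[0][0], "rejected": scored[-1][0]}


def fake_judge(prompt: str) -> str:
    """Stub standing in for a real LLM call: longer responses score higher."""
    response = prompt.split("Response: ", 1)[1].split("\n", 1)[0]
    return f"Score: {min(len(response.split()), 5)}"


question = "What is the capital of France?"
candidates = ["Paris", "The capital is Paris", "It is Paris"]
scored = score_candidates(question, candidates, fake_judge)
pair = preference_pair(scored)
print(scored[0][0])
print(pair)
```

For re-generation, the top-scored candidate would be returned to the user; the `chosen`/`rejected` pairs could be collected over time as training data for a preference-based fine-tuning step.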

Reference: https://arxiv.org/abs/2401.10020
