Explore ability to improve generation using Self-Rewarding

As I understand it, there are two parts:
1. LLM-as-a-judge, to generate a reward score. As a first step, evaluate whether this strategy can help during a re-generation request with exhaustive beam search.

<img width="1056" alt="Screen Shot 2024-01-22 at 1 27 14 PM" src="https://github.com/h2oai/sql-sidekick/assets/1318029/0ad74c43-45df-4019-b063-c87e13b28a6d">

2. Self-training/modification on preference pairs
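
A minimal sketch of how the two parts could fit together, offered as an assumption rather than the project's implementation: each candidate from the beam is scored by a judge model, the candidates are ranked by that score, and the best and worst are kept as a preference pair for later training. The names `JUDGE_PROMPT`, `parse_score`, `rank_candidates`, and `preference_pair` are hypothetical, the prompt wording is illustrative (the referenced paper describes an additive 5-point rubric), and `judge` stands in for whatever callable sends a prompt to the LLM and returns its text reply.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical judge prompt; the exact wording is an assumption for illustration.
JUDGE_PROMPT = (
    "Review the SQL query below for the given question using an additive "
    "5-point scale, then end with the line 'Score: <total points>'.\n\n"
    "Question: {question}\nSQL: {sql}\n"
)


def parse_score(judge_output: str) -> Optional[float]:
    """Extract the numeric score from the judge's reply, or None if absent."""
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", judge_output)
    return float(match.group(1)) if match else None


def rank_candidates(
    question: str,
    candidates: List[str],
    judge: Callable[[str], str],
) -> List[Tuple[str, float]]:
    """Score each candidate with the judge and sort best-first.

    Candidates whose judge reply has no parsable score are dropped.
    """
    scored = []
    for sql in candidates:
        reply = judge(JUDGE_PROMPT.format(question=question, sql=sql))
        score = parse_score(reply)
        if score is not None:
            scored.append((sql, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def preference_pair(
    ranked: List[Tuple[str, float]],
) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) from a best-first ranking.

    Returns None when there are fewer than two candidates or when the
    best and worst scores tie, since such a pair carries no preference.
    """
    if len(ranked) < 2 or ranked[0][1] == ranked[-1][1]:
        return None
    return ranked[0][0], ranked[-1][0]
```

For part 1, the first entry of `rank_candidates(...)` would be the query returned on a re-generation request. For part 2, the pairs collected from `preference_pair(...)` would feed a preference-tuning step such as DPO, which is outside the scope of this sketch.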

Reference: https://arxiv.org/abs/2401.10020

Explore ability to improve generation using Self-Rewarding #76
