Conversation

SemyonEpanov (Collaborator):

No description provided.

 if field_response not in df.columns:
     df[field_response] = ""
-if field_response not in df.columns:
+if field_ans not in df.columns:
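
(For context, a sketch of the fixed guards as plain code, assuming pandas and that both names are column-name strings; the helper name is hypothetical:)

```python
import pandas as pd

def ensure_output_columns(df: pd.DataFrame, field_response: str, field_ans: str) -> None:
    # Hypothetical helper: create each output column only if it is missing,
    # so reruns over a partially filled frame stay idempotent.
    if field_response not in df.columns:
        df[field_response] = ""
    if field_ans not in df.columns:
        df[field_ans] = ""
```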
Member:

Great catch!

Return JSON only with:
{{"answer":"{letters}","rationale":"1-3 sentences (concise)","key_steps":["fact1","fact2","fact3"]}}
Answer the MCQ briefly and factually (no step-by-step reasoning).
Member:

Why? I thought we wanted to elicit step-by-step reasoning.

SemyonEpanov (author):

If we use step-by-step prompting, it makes sense to enable thinking, which will be very expensive on MMLU-Pro (try changing the prompt and setting the -1 flag as an experiment).

SemyonEpanov (author):

I mean:

            thinking_config=types.ThinkingConfig(thinking_budget=0)
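
For reference, a minimal sketch of how that toggle fits into a google-genai call (the model name here is an assumption; a budget of 0 disables thinking, while -1 requests dynamic thinking):

```python
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model; any 2.5-series model applies
    contents="Answer the MCQ briefly and factually.",
    config=types.GenerateContentConfig(
        # thinking_budget=0 disables thinking; -1 lets the model decide
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```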

Member:

I guess we first need to better align on the goal of the experiment, then. Could you add a design doc to docs/ with a hypothesis, an execution plan, and expected results?

},
}

def process_tsv(tsv_path, out_jsonl, limit=None):
Member:

Unify it with the existing distill_on_dataset?

SemyonEpanov (author):

Yes, we can, but I think the CLI call is more useful.

Member:

Shall we call it as a function from a script in experiments/ instead of a plain CLI call then?
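
A minimal sketch of that variant, assuming process_tsv (defined in the diff above) is importable; the script path and import location are hypothetical:

```python
# experiments/run_mmlu_synth.py (hypothetical path)
from core.process import process_tsv  # import path is an assumption

if __name__ == "__main__":
    # Keeping the exact run parameters in a checked-in script records them
    # in version control, unlike a one-off CLI invocation.
    process_tsv(
        tsv_path="data/mmlu_pro.tsv",        # assumed input location
        out_jsonl="out/mmlu_pro_synth.jsonl",
        limit=100,
    )
```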

SemyonEpanov (author):

I accidentally renamed the file and made changes in one commit (sorry about that).

In general: I switched from Gemini to OpenRouter, changed the logic to step-by-step reasoning, and kept key_steps as the summary.

In the future, I want to merge branch a into branch c, since the first part of c duplicates a.

@@ -0,0 +1,17 @@
1) **Main point**

Obtain a synthetic dataset (answers + brief explanations + analysis of erroneous answers + CoT tokens) for training subsequent models.
Member:

We actually want to elicit the full reasoning chain, don't we?

Could you also add why we want to do it? AFAIU, we want to fine-tune small models on different versions of the distilled CoT and compare the performance. Right?

from core.prompts.mmlu_branches_aug import *

# defaults
DEFAULT_MODEL = os.getenv("OPENROUTER_MODEL", "deepseek/deepseek-r1:free")
Member:

Shall we pass it as a config? As discussed before, we want reproducible results, and it is extremely easy to forget which options we used if we pass them as env vars or CLI args.

SemyonEpanov (author):

It's just a default argument in the file; the function itself accepts and works with explicit arguments. Using a config is therefore the responsibility of whoever calls the code.

def synth_on_dataset(
    in_filename: str,
    out_jsonl: str,
    model: str = DEFAULT_MODEL,
    max_tokens: int = DEFAULT_MAX_TOKENS,
    dump_every: int = DUMP_EVERY,
    limit: int | None = None,
    branches: tuple[str, ...] = DEFAULT_BRANCHES
):
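
Given that signature, a caller could still pin every option in a tracked config; a sketch, assuming a JSON config file and that synth_on_dataset is importable (paths and the fallback values are assumptions):

```python
import json

from core.synth import synth_on_dataset  # import path is an assumption

# experiments/configs/mmlu_synth.json pins model, max_tokens, branches, etc.
with open("experiments/configs/mmlu_synth.json") as f:
    cfg = json.load(f)

synth_on_dataset(
    in_filename=cfg["in_filename"],
    out_jsonl=cfg["out_jsonl"],
    model=cfg.get("model", "deepseek/deepseek-r1:free"),
    max_tokens=cfg["max_tokens"],
    limit=cfg.get("limit"),
    branches=tuple(cfg.get("branches", ("A", "B", "C"))),
)
```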

CHUNK_SIZE = int(os.getenv("SYNTH_CHUNK_SIZE", "16"))
DUMP_EVERY = int(os.getenv("SYNTH_DUMP_EVERY", "10"))

ALL_LETTERS = [chr(c) for c in range(ord("A"), ord("Z")+1)]
Member:

SemyonEpanov (author):

Let's discuss this in a conference call.

pass
j = j or {}

if reasoning_text and "thinking" not in j:
Member:

Could you help me understand what we are doing here?

SemyonEpanov (author):

Validation of the JSON received from the LLM.
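
Roughly along these lines, a sketch (the helper name is hypothetical; j and reasoning_text mirror the snippet above):

```python
import json

def validate_llm_json(raw_text: str, reasoning_text: str | None) -> dict:
    """Hypothetical helper: parse the LLM reply defensively."""
    try:
        j = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        j = None
    j = j or {}  # fall back to an empty record on unparseable output
    # Preserve a separately returned reasoning stream if the JSON lacks one.
    if reasoning_text and "thinking" not in j:
        j["thinking"] = reasoning_text
    return j
```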

record_in = _build_record_in(row_dict, question, choices, letters, gold, model)
jobs.append((row.Index, "A", {"question": question, "choices": choices, "gold": gold, "record_in": record_in, "letters": letters}))
jobs.append((row.Index, "B", {"question": question, "choices": choices, "gold": gold, "record_in": record_in, "letters": letters}))
jobs.append((row.Index, "C", {"question": question, "choices": choices, "gold": gold, "record_in": record_in, "letters": letters}))
Member:

Should we first get A and B, and then run C on top of A, as you proposed in the chat before?

SemyonEpanov (author):

Yes, that's right. It's refactored in the new version.
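
A sketch of that ordering (run_jobs and the payload shape are assumptions): branches A and B are scheduled first, then C is built on top of A's result:

```python
def schedule_two_phase(rows, run_jobs):
    # Phase 1: independent branches A and B for every row.
    jobs_ab = [(idx, branch, payload)
               for idx, payload in rows
               for branch in ("A", "B")]
    results = run_jobs(jobs_ab)  # assumed to return {(idx, branch): output}

    # Phase 2: C consumes A's output for the same row.
    jobs_c = [(idx, "C", {**payload, "branch_a": results[(idx, "A")]})
              for idx, payload in rows]
    return run_jobs(jobs_c)
```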

Return JSON ONLY with the following schema:
{{
"answer": "{letters}",
"rationale": "concise 1-2 sentence justification (no fluff)",
Member:

Shall we ask for the final answer straight away if we are using a reasoning model and we can extract its reasoning chain?

Return JSON only:
{{"correct_answer":"{letters}",
"why_correct": "step-by-step reasoning showing why the gold option is correct",
"distractor_analysis": {distractor_tpl} }}
Member:

Could you help me understand distractor_analysis vs why_correct?

SemyonEpanov (author):

distractor_analysis explains all the answer options; why_correct explains only the correct one.
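
For illustration, one possible record under that schema (all contents invented):

```json
{
  "correct_answer": "B",
  "why_correct": "Step 1: recall the definition. Step 2: apply it to option B, which satisfies it.",
  "distractor_analysis": {
    "A": "Confuses the definition with a related concept.",
    "C": "True in general but does not answer this question.",
    "D": "Contradicts the premise given in the stem."
  }
}
```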
