
Proposal: Document WFGY as an external LLM/RAG robustness test pack for OpenBB AI workflows #7369

@onestardao

Description


Hi, and thanks for OpenBB. It has become a very important open-source workspace for financial research and analytics.

I maintain an MIT-licensed open-source project called WFGY (~1.5k GitHub stars).
One of its main components is a 16-problem “ProblemMap” for RAG and LLM pipelines, which catalogues common failure modes across:

  • document and data ingestion
  • embeddings and vector stores
  • retrieval and ranking
  • tool calls and multi-step reasoning
  • evaluation gaps and guardrails

ProblemMap overview:
https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md

The map is already referenced or integrated in several external projects, including Harvard MIMS Lab’s ToolUniverse, QCRI’s Multimodal RAG Survey and curated lists such as Awesome AI in Finance.


Why this matters for OpenBB

OpenBB positions itself as an AI-powered research and analytics workspace, with features such as chat, assistants and automated reporting. Many users are effectively building small LLM or RAG agents on top of OpenBB data sources and workflows.

In these setups, users often report that the hardest part is not building the first prototype, but understanding why a particular AI workflow behaves inconsistently. Typical questions are:

  • Did the retrieval over filings or time series fail?
  • Did the model hallucinate a financial fact?
  • Did a plugin or tool call misfire?
  • Is the evaluation loop missing important checks?

The 16-problem map provides a compact checklist for these failure modes and can be used as an external stress-test pack for OpenBB-based LLM workflows.


Concrete proposal

If you think this is aligned with the project scope, I would be happy to:

  1. Draft a short docs section such as “External robustness testing for OpenBB AI workflows” that explains how OpenBB users can apply the WFGY ProblemMap when they build chat or agent-like flows on top of the platform.

  2. Provide a small example scenario where an OpenBB-based workflow is exercised against a subset of the problems, for example hallucinated metrics, broken retrieval over filings and unstable multi-step reasoning.

  3. Include a compact table:
    Symptom in an OpenBB AI workflow → ProblemMap number → which part of the data, retrieval or agent configuration to inspect.
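To make item 3 concrete, the table could also ship as a tiny lookup helper. The sketch below is purely illustrative: the symptom keys, ProblemMap numbers and inspection hints are placeholders I invented for this example, not the real ProblemMap entries.

```python
# Hypothetical sketch of the proposed "symptom -> ProblemMap entry -> what to
# inspect" table. All entries below are placeholders, not the real 16-problem map.
SYMPTOM_MAP = {
    "hallucinated_metric": {
        "problem_no": 1,  # placeholder ProblemMap number
        "inspect": "retrieval grounding and prompt context",
    },
    "broken_filing_retrieval": {
        "problem_no": 5,  # placeholder
        "inspect": "embedding model and vector store indexing",
    },
    "unstable_multi_step_reasoning": {
        "problem_no": 13,  # placeholder
        "inspect": "tool-call wiring and agent state handling",
    },
}


def triage(symptom: str) -> str:
    """Return a short hint telling the user where to look first."""
    entry = SYMPTOM_MAP.get(symptom)
    if entry is None:
        return "Symptom not catalogued; check the full ProblemMap."
    return f"ProblemMap #{entry['problem_no']}: inspect {entry['inspect']}"
```

A user debugging an inconsistent workflow would call `triage("broken_filing_retrieval")` and get pointed at the embedding/indexing layer before touching prompts or agent logic.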

This would be a docs-only, optional reference. OpenBB would remain independent of any particular evaluation toolkit, and WFGY would simply appear as one external option for users who need robustness testing.

If this sounds useful, I can open a PR with a first draft and adjust based on your feedback.
