
Proposal: Document WFGY as an external LLM/RAG robustness test pack for OpenBB AI workflows #7369

@onestardao

Description


Hi, and thanks for OpenBB. It has become a very important open-source workspace for financial research and analytics.

I maintain an MIT-licensed open-source project called WFGY (~1.5k GitHub stars).
One of its main components is a 16-problem “ProblemMap” for RAG and LLM pipelines, which catalogues common failure modes across:

  • document and data ingestion
  • embeddings and vector stores
  • retrieval and ranking
  • tool calls and multi-step reasoning
  • evaluation gaps and guardrails

ProblemMap overview:
https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md

The map is already referenced or integrated in several external projects, including Harvard MIMS Lab’s ToolUniverse, QCRI’s Multimodal RAG Survey and curated lists such as Awesome AI in Finance.


Why this matters for OpenBB

OpenBB positions itself as an AI-powered research and analytics workspace, with features such as chat, assistants and automated reporting. Many users are effectively building small LLM or RAG agents on top of OpenBB data sources and workflows.

In these setups, users often report that the hardest part is not building the first prototype, but understanding why a particular AI workflow behaves inconsistently. Typical questions are:

  • Did the retrieval over filings or time series fail?
  • Did the model hallucinate a financial fact?
  • Did a plugin or tool call misfire?
  • Is the evaluation loop missing important checks?

The 16-problem map provides a compact checklist for these failure modes and can be used as an external stress-test pack for OpenBB-based LLM workflows.


Concrete proposal

If you think this is aligned with the project scope, I would be happy to:

  1. Draft a short docs section such as “External robustness testing for OpenBB AI workflows” that explains how OpenBB users can apply the WFGY ProblemMap when they build chat or agent-like flows on top of the platform.

  2. Provide a small example scenario where an OpenBB-based workflow is exercised against a subset of the problems, for example hallucinated metrics, broken retrieval over filings and unstable multi-step reasoning.

  3. Include a compact table:
    Symptom in an OpenBB AI workflow → ProblemMap number → which part of the data, retrieval or agent configuration to inspect.
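To make item 3 concrete, the table could also ship as a tiny lookup helper. The sketch below is purely illustrative: the symptom keys, ProblemMap numbers and inspection hints are placeholders I invented for this example, not the real ProblemMap entries.

```python
# Hypothetical sketch of the proposed "symptom -> ProblemMap entry -> what to
# inspect" table. All entries below are placeholders, not the real 16-problem map.
SYMPTOM_MAP = {
    "hallucinated_metric": {
        "problem_no": 1,  # placeholder ProblemMap number
        "inspect": "retrieval grounding and prompt context",
    },
    "broken_filing_retrieval": {
        "problem_no": 5,  # placeholder
        "inspect": "embedding model and vector store indexing",
    },
    "unstable_multi_step_reasoning": {
        "problem_no": 13,  # placeholder
        "inspect": "tool-call wiring and agent state handling",
    },
}


def triage(symptom: str) -> str:
    """Return a short hint telling the user where to look first."""
    entry = SYMPTOM_MAP.get(symptom)
    if entry is None:
        return "Symptom not catalogued; check the full ProblemMap."
    return f"ProblemMap #{entry['problem_no']}: inspect {entry['inspect']}"
```

A user debugging an inconsistent workflow would call `triage("broken_filing_retrieval")` and get pointed at the embedding/indexing layer before touching prompts or agent logic.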

This would be a docs-only, optional reference. OpenBB would remain independent of any particular evaluation toolkit, and WFGY would simply appear as one external option for users who need robustness testing.

If this sounds useful, I can open a PR with a first draft and adjust based on your feedback.
