Hi, and thanks for OpenBB. It has become a very important open-source workspace for financial research and analytics.
I maintain an MIT-licensed open-source project called WFGY (~1.5k GitHub stars).
One of its main components is a 16-problem “ProblemMap” for RAG and LLM pipelines, which catalogues common failure modes across:
- document and data ingestion
- embeddings and vector stores
- retrieval and ranking
- tool calls and multi-step reasoning
- evaluation gaps and guardrails
ProblemMap overview:
https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md
The map is already referenced in or integrated into several external projects, including Harvard MIMS Lab’s ToolUniverse, QCRI’s Multimodal RAG Survey, and curated lists such as Awesome AI in Finance.
Why this matters for OpenBB
OpenBB positions itself as an AI-powered research and analytics workspace, with features such as chat, assistants, and automated reporting. Many users are effectively building small LLM or RAG agents on top of OpenBB data sources and workflows.
In these setups, users often report that the hardest part is not building the first prototype, but understanding why a particular AI workflow behaves inconsistently. Typical questions are:
- Did the retrieval over filings or time series fail?
- Did the model hallucinate a financial fact?
- Did a plugin or tool call misfire?
- Is the evaluation loop missing important checks?
The 16-problem map provides a compact checklist for these failure modes and can be used as an external stress-test pack for OpenBB-based LLM workflows.
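As a sketch of what an "external stress-test pack" could look like in practice, here is a minimal, entirely hypothetical Python harness. `run_stress_pack`, `Probe`, and the sample workflow and check are illustrative stand-ins, not WFGY or OpenBB APIs:

```python
# Minimal sketch of running a failure-mode checklist against an LLM workflow.
# Everything here is hypothetical: `toy_workflow` stands in for any
# OpenBB-based LLM/RAG pipeline, and the probe is an illustrative example,
# not a real ProblemMap item.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Probe:
    name: str                      # which failure mode this probe targets
    question: str                  # input fed to the workflow
    check: Callable[[str], bool]   # returns True if the answer passes


def run_stress_pack(
    run_workflow: Callable[[str], str], probes: list[Probe]
) -> dict[str, bool]:
    """Run each probe through the workflow and record pass/fail."""
    return {p.name: p.check(run_workflow(p.question)) for p in probes}


def toy_workflow(question: str) -> str:
    """Toy workflow that refuses to answer when retrieval returns nothing."""
    docs: list[str] = []  # imagine retrieval over filings happened here
    if not docs:
        return "I could not find a source for that."
    return docs[0]


probes = [
    Probe(
        name="hallucinated metric",
        question="What was ACME's 2023 revenue?",
        # With no retrieved sources, the workflow should refuse
        # rather than invent a number.
        check=lambda answer: not any(ch.isdigit() for ch in answer),
    ),
]

print(run_stress_pack(toy_workflow, probes))  # {'hallucinated metric': True}
```

In a docs example, each probe would cite the ProblemMap entry it exercises, so a failing probe points directly at the part of the pipeline to inspect.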
Concrete proposal
If you think this is aligned with the project scope, I would be happy to:
- Draft a short docs section such as “External robustness testing for OpenBB AI workflows” that explains how OpenBB users can apply the WFGY ProblemMap when they build chat or agent-like flows on top of the platform.
- Provide a small example scenario in which an OpenBB-based workflow is exercised against a subset of the problems, for example hallucinated metrics, broken retrieval over filings, and unstable multi-step reasoning.
- Include a compact table: symptom in an OpenBB AI workflow → ProblemMap number → which part of the data, retrieval, or agent configuration to inspect.
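Such a table could also ship as a small machine-readable mapping next to the docs. The sketch below uses invented problem numbers and inspection hints purely as placeholders; the real assignments would come from the WFGY ProblemMap README:

```python
# Hypothetical symptom -> ProblemMap lookup for the proposed docs table.
# The problem numbers and inspection hints are illustrative placeholders,
# NOT the actual WFGY ProblemMap assignments.
SYMPTOM_TO_PROBLEM = {
    "model invents a financial figure": {
        "problem_no": 1,  # placeholder number
        "inspect": "retrieval grounding and answer citation checks",
    },
    "retrieval over filings returns unrelated chunks": {
        "problem_no": 2,  # placeholder number
        "inspect": "chunking, embeddings, and vector-store configuration",
    },
    "multi-step agent loops or stalls": {
        "problem_no": 3,  # placeholder number
        "inspect": "tool-call schema and step-limit guardrails",
    },
}


def triage(symptom: str) -> str:
    """Map an observed symptom to a ProblemMap pointer and inspection hint."""
    entry = SYMPTOM_TO_PROBLEM[symptom]
    return f"See ProblemMap #{entry['problem_no']}; inspect {entry['inspect']}."


print(triage("multi-step agent loops or stalls"))
```

Keeping the mapping as data rather than prose would let users script their own triage on top of it.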
This would be a docs-only, optional reference. OpenBB would remain independent of any particular evaluation toolkit, and WFGY would simply appear as one external option for users who need robustness testing.
If this sounds useful, I can open a PR with a first draft and adjust based on your feedback.