📊 Hedge Fund Tracker

If this tool is helping you, please ⭐ the repo! It really helps discoverability.

SEC 13F Filing Tracker | Institutional Portfolio Analysis | AI-Powered Stock Research

A comprehensive Python tool for tracking hedge fund portfolios through SEC filings (13F, 13D/G, Form 4). Transform raw SEC EDGAR data into actionable investment insights. Built for financial analysts, quantitative traders, and retail investors seeking to analyze institutional investor strategies, portfolio changes, and discover stock opportunities by following elite fund managers.

Keywords: SEC filings tracker, 13F analysis, hedge fund portfolio, institutional investors, stock research, investment intelligence, CUSIP converter, financial data scraper, AI stock analysis

🚀 Quick Start

# Clone the repository
git clone https://github.com/dokson/hedge-fund-tracker.git
cd hedge-fund-tracker

# Install Python dependencies
pipenv install

# Install and build the React frontend
cd app/frontend && npm install && npm run build && cd ../..

# Run the application (opens web UI in your browser)
pipenv run python -m app.main

✨ Key Features

| Feature | Description |
| --- | --- |
| 🌐 Modern Web UI | Premium React-based platform with real-time SSE streaming for AI tasks, native Dark Mode, and responsive design. |
| 📊 Visual Analytics | Interactive charts (Recharts) to track institutional holdings, sectoral trends, and quarterly portfolio evolutions. |
| 🆚 Comparative Analysis | Combines quarterly (13F) and non-quarterly (13D/G, Form 4) filings for an up-to-date view. |
| 📋 Comprehensive Reports | High-fidelity analysis pages for both investment funds (portfolios) and specific stocks (tickers). |
| 🔍 Smart Ticker Resolution | Multi-fallback system (yfinance, Finnhub, FinanceDatabase) to resolve CUSIPs into actionable stock symbols. |
| 🤖 AI Financial Analyst | Leverages top-tier LLMs to calculate "Promise Scores" and perform deep due diligence on high-conviction opportunities. |
| ⚙️ Automated Data Pipeline | Scheduled GitHub Actions to fetch, process, and commit the latest SEC filings directly to your repository. |
| 🌐 GitHub Pages Demo | Static deployment with bundled data; all analysis features work without a backend. |
| ⭐ Personalized Watchlist | Star your favorite funds or stocks for quick access and personalized tracking across the platform. |
| 🗃️ GICS Hierarchy | Autonomous parser to build a granular GICS classification database. |

📦 Installation

Prerequisites

  1. 📥 Clone and navigate:

    git clone https://github.com/dokson/hedge-fund-tracker.git
    cd hedge-fund-tracker
  2. 📲 Install dependencies: Navigate to the project root and run the following command. This will create a virtual environment and install all required packages.

    pipenv install

    💡 Tip: If pipenv is not found, you might need to use python -m pipenv install. This can happen if the user scripts directory is not in your system's PATH.

  3. 🔨 Build the frontend: Build the React interface (required once before first run):

    cd app/frontend && npm install && npm run build && cd ../..
  4. ▶️ Run the application: Execute within the project's virtual environment:

    pipenv run python -m app.main

    This starts a FastAPI server on http://localhost:8000 and opens the web UI in your browser automatically.

    ⚠️ Note on CLI mode (Legacy): The terminal CLI is a deprecated version of the tool that predates the Web UI. It still works, but it requires a .env configuration file, which you either create manually (see .env.example) or let the Web UI generate automatically on its first launch. If you still wish to use the "old school" CLI, run:

    pipenv run python -m app.main --cli

Data Management

The data update operations (downloading and processing filings) are inside a dedicated script. This keeps the main application focused on analysis, while the updater handles populating and refreshing the database.

To run the data update operations, you need to use the updater.py script from the project root:

pipenv run python -m database.updater

Database Updater

The updater.py script includes semi-automated maintenance tasks:

  • Sorting: Upon exit (option 0), the script automatically sorts the database/stocks.csv file by ticker to maintain performance and prevent Git diff noise.
  • Auto-Documentation: This README's excluded funds section is synchronized whenever the database is refreshed manually.

Running the updater opens a separate menu for data management:

┌───────────────────────────────────────────────────────────────────────────────┐
│                     Hedge Fund Tracker - Database Updater                     │
├───────────────────────────────────────────────────────────────────────────────┤
│  0. Exit                                                                      │
│  1. Generate latest 13F reports for all known hedge funds                     │
│  2. Fetch latest non-quarterly filings for all known hedge funds              │
│  3. Generate 13F report for a known hedge fund                                │
│  4. Manually enter a hedge fund CIK to generate a 13F report                  │
└───────────────────────────────────────────────────────────────────────────────┘
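The exit-time sort described above could look roughly like the sketch below. It is not the updater's actual code: the column names (`CUSIP`, `Ticker`, `Company`) are assumptions based on the "CUSIP-Ticker-Name" description of stocks.csv, and the sample rows are illustrative.

```python
import csv
import io


def sort_stocks_csv(text: str, key: str = "Ticker") -> str:
    """Return CSV text with the data rows sorted by `key`; the header stays first."""
    reader = csv.DictReader(io.StringIO(text))
    rows = sorted(reader, key=lambda row: row[key])
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames,
        quoting=csv.QUOTE_ALL,   # keep every field quoted, as in the CSV samples in this README
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Assumed column names and illustrative rows.
sample = (
    '"CUSIP","Ticker","Company"\n'
    '"594918104","MSFT","Microsoft Corp"\n'
    '"037833100","AAPL","Apple Inc"\n'
)
sorted_text = sort_stocks_csv(sample)
print(sorted_text)
```

Keeping the file in a stable order means that a refresh only changes the rows that were actually added or updated, which is what keeps Git diffs small.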

GICS Classification

The project includes an autonomous GICS (Global Industry Classification Standard) parser (database/gics/updater.py). GICS was originally developed by MSCI and S&P; the parser scrapes Wikipedia to build the full hierarchy of 163 sub-industries. This provides the AI Analyst with granular industry context while remaining independent of third-party libraries.

API Configuration

The tool can utilize API keys for enhanced functionality, but all are optional:

| Service | Purpose | Get Free API Key |
| --- | --- | --- |
| Finnhub | CUSIP to stock ticker conversion | Finnhub Keys |
| GitHub Models | Access to top-tier models (e.g., xAI Grok-3, OpenAI GPT-5, etc.) | GitHub Tokens |
| Google AI Studio | Access to Google Gemini models | AI Studio Keys |
| Groq AI | Access to various LLMs (e.g., OpenAI gpt-oss, Meta Llama, etc.) | Groq Keys |
| Hugging Face | Access to open weights models (e.g., DeepSeek R1, Kimi-Linear-48B, etc.) | HF Tokens |
| OpenRouter | Access to various LLMs (e.g., Claude 4.5 Opus, GLM 4.5 Air, etc.) | OpenRouter Keys |

💡 Note: Ticker resolution primarily uses yfinance, which is free and requires no API key. If that fails, the system falls back to Finnhub (if an API key is provided), with the final fallback being FinanceDatabase.
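As a rough illustration of the fallback order in the note above, the sketch below tries a list of resolvers in sequence and returns the first ticker found. The three resolver functions are hypothetical stand-ins that make no network calls; they are not the project's real yfinance, Finnhub, or FinanceDatabase wrappers.

```python
from typing import Callable, Optional

Resolver = Callable[[str], Optional[str]]


def resolve_ticker(cusip: str, resolvers: list[Resolver]) -> Optional[str]:
    """Try each resolver in order and return the first non-empty ticker."""
    for resolver in resolvers:
        try:
            ticker = resolver(cusip)
        except Exception:
            # A failing source (no API key, network error) falls through to the next one.
            continue
        if ticker:
            return ticker
    return None


# Hypothetical stand-ins for the three data sources.
def from_yfinance(cusip: str) -> Optional[str]:
    return None  # simulate a lookup miss


def from_finnhub(cusip: str) -> Optional[str]:
    raise RuntimeError("no API key configured")  # simulate a missing key


def from_finance_database(cusip: str) -> Optional[str]:
    return {"037833100": "AAPL"}.get(cusip)


chain = [from_yfinance, from_finnhub, from_finance_database]
ticker = resolve_ticker("037833100", chain)
print(ticker)
```

Ordering the free, key-less source first means the optional Finnhub key is only consulted when the primary lookup fails.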

💡 Note: You don't need to use all the APIs. For the generative AI models (Google AI Studio, GitHub Models, Groq AI, Hugging Face, and OpenRouter), you only need the API keys for the services you plan to use. For instance, if you want to experiment with models like OpenAI GPT-4o mini, you just need a GitHub Token. Experimenting with different models is encouraged, as the quality of AI-generated analysis, both for identifying promising stocks and for conducting due diligence, can vary. However, top-performing stocks are typically identified consistently across all tested models. All APIs used in this project are currently free (with GitHub Models providing a generous free tier for developers).

📁 Project Structure

hedge-fund-tracker/
├── 📁 .github/
│   ├── 📁 scripts/
│   │   └── 🐍 fetcher.py           # Scheduled data-fetching script (run by workflows/filings-fetch.yml)
│   └── 📁 workflows/                # GitHub Actions for automation
│       ├── ⚙️ deploy-pages.yml     # GitHub Actions: Deploy to GitHub Pages
│       ├── ⚙️ filings-fetch.yml    # GitHub Actions: Filings fetching job
│       └── ⚙️ python-tests.yml     # GitHub Actions: Unit tests
├── 📁 app/                          # Main application logic
│   ├── 📁 frontend/                 # React + Vite web UI
│   │   ├── 📁 public/               # Static assets (404.html, logo.png)
│   │   ├── 📁 scripts/              # copy-database.mjs (bundles CSVs for GH Pages)
│   │   ├── 📁 src/
│   │   │   ├── 📁 components/       # Shared UI components (ModelSelector, TerminalOutput, FeatureNotAvailable, etc.)
│   │   │   ├── 📁 lib/              # config.ts (IS_GH_PAGES_MODE), dataService.ts (CSV I/O), aiClient.ts (SSE)
│   │   │   └── 📁 pages/            # AIRanking, AIDueDiligence, FundsConfig, AISettings, DatabaseOperations
│   │   ├── 📦 package.json
│   │   └── ⚙️ vite.config.ts
│   ├── 🐍 server.py                 # FastAPI server (serves frontend + all API endpoints)
│   └── ▶️ main.py                  # Entry point: web server (default) or CLI (--cli)
├── 📁 database/                     # Data storage
│   ├── 📁 2025Q1/                  # Quarterly reports
│   │   ├── 📊 fund_1.csv           # Individual fund quarterly report
│   │   ├── 📊 fund_2.csv
│   │   └── 📊 fund_n.csv
│   ├── 📁 YYYYQN/
│   ├── 📁 GICS/
│   │   ├── 🗃️ hierarchy.csv        # Full GICS hierarchy
│   │   └── ▶️ updater.py           # GICS updater script
│   ├── 📝 hedge_funds.csv          # Curated hedge funds list -> EDIT THIS to add or remove funds to track
│   ├── 📝 models.csv               # LLMs list to use for AI Financial Analyst -> EDIT THIS to add or remove AI models
│   ├── 📊 non_quarterly.csv        # Stores latest 13D/G and Form 4 filings
│   ├── 📊 stocks.csv               # Master data for stocks (CUSIP-Ticker-Name)
│   └── ▶️ updater.py               # Main entry point for updating the database
├── 📁 tests/                        # Test suite
├── 📝 .env.example                 # Template for your API keys
├── ⛔ .gitignore                   # Git ignore rules
├── 🧾 LICENSE                      # Dual license (see the License section)
├── 🛠️ Pipfile                      # Project dependencies
├── 🔏 Pipfile.lock                 # Locked dependency versions
└── 📖 README.md                    # Project documentation (this file)

📝 Hedge Funds Configuration File: database/hedge_funds.csv contains the list of hedge funds to monitor (CIK, name, manager) and can also be edited at runtime.

📝 LLMs Configuration File: database/models.csv contains the list of available LLMs for AI analysis and can also be edited at runtime.

👨🏻‍💻 How This Tool Tracks Hedge Funds

This tracker leverages the following types of SEC filings to provide a comprehensive view of institutional activity.

  • 📅 Quarterly 13F Filings

    • Required for funds managing $100M+
    • Filed within 45 days of quarter-end
    • Shows portfolio snapshot on last day of quarter
  • 📝 Non-Quarterly 13D/G Filings

    • Required when acquiring 5%+ of company shares
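    • Schedule 13D must be filed within 5 business days of crossing the threshold (shortened from 10 calendar days in 2024); Schedule 13G deadlines vary by filer type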
    • Provides a timely view of significant investments
  • ✍🏻 Non-Quarterly SEC Form 4 Insider Filings

    • Filed by insiders (executives, directors) or large shareholders (>10%) when they trade the company's stock
    • Must be filed within 2 business days of the transaction
    • Offers real-time insight into the actions of key individuals and institutions
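To make the 13F timing concrete, here is a small sketch that derives the quarter-end date and the 45-day filing deadline from a quarter label in the `YYYYQN` format used by the database folders. The function names are hypothetical, and the sketch ignores any weekend or holiday adjustment of the deadline.

```python
from datetime import date, timedelta


def quarter_end(quarter: str) -> date:
    """Last calendar day of a quarter given as 'YYYYQN', e.g. '2025Q1'."""
    year, q = int(quarter[:4]), int(quarter[5])
    # First day of the following quarter, minus one day.
    first_of_next = date(year + 1, 1, 1) if q == 4 else date(year, 3 * q + 1, 1)
    return first_of_next - timedelta(days=1)


def thirteen_f_deadline(quarter: str) -> date:
    """Quarter end plus 45 calendar days (no weekend or holiday adjustment)."""
    return quarter_end(quarter) + timedelta(days=45)


print(quarter_end("2025Q1"))          # 2025-03-31
print(thirteen_f_deadline("2025Q1"))  # 2025-05-15
```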

🏢 Hedge Funds Selection

This tool tracks a curated list of institutional investors filing with the U.S. SEC that I found to be top performers over the last 3-5 years. The list is the result of my own methodology, detailed below, which aims to identify the top percentile of global investment funds.

Selection Methodology

Modern portfolio theory (MPT) offers many methods for quantifying the risk-return trade-off, but they are often ill-suited to the limited data available in public filings. The hedge_funds.csv list was therefore generated using my own selection algorithm, designed to identify top-performing funds while accounting for volatility.

Note: The selection algorithm is external to this project and was used only to produce the curated hedge_funds.csv list.

My approach prioritizes high cumulative returns but also analyzes the path taken to achieve them. It penalizes volatility, similar to the Sharpe Ratio, but adjusts this penalty dynamically based on performance consistency. Likewise, it penalizes drawdowns, echoing the principle of the Sterling Ratio, but intentionally dampens that penalty to avoid overly punishing funds that recover effectively from temporary downturns.
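For readers unfamiliar with the two ingredients named above, the sketch below computes a textbook Sharpe ratio and a maximum drawdown from a series of periodic returns. It is not the selection algorithm, which is external to this project and not published here; the dynamic and dampened penalties are not reproduced, and the sample returns are made up.

```python
import statistics


def sharpe_ratio(returns: list[float], risk_free: float = 0.0) -> float:
    """Mean excess return divided by its sample standard deviation (not annualized)."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough loss of the compounded value, as a fraction of the peak."""
    value = peak = 1.0
    worst = 0.0
    for r in returns:
        value *= 1 + r
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Made-up quarterly returns: +10%, -20%, +25%.
quarterly_returns = [0.10, -0.20, 0.25]
print(round(sharpe_ratio(quarterly_returns), 3))
print(round(max_drawdown(quarterly_returns), 3))
```

The Sterling Ratio divides return by a drawdown measure; its exact definition varies between sources, so only the drawdown building block is shown.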

List Management

The list of hedge funds is actively managed to maintain its quality; funds that underperform may be replaced, while new top performers are periodically added.

However, despite their strong performance, several funds with portfolios predominantly focused on Healthcare and Biotech, such as Nextech Invest, Enavate Sciences, Caligan Partners, and Boxer Capital Management, have been intentionally excluded. These funds invest in highly specialized sectors where I lack the necessary expertise. Consequently, I consider them too risky for my personal investment profile, given the complexity and volatility inherent in biotech and healthcare ventures.

Notable Exclusions

The quality of the output analysis is directly tied to the quality of the input data. To enhance the accuracy of the insights and opportunities identified, many popular high-profile funds have been intentionally excluded by design (the list below is automatically managed and capped at 50 funds; the full list is in excluded_hedge_funds.csv):

💡 Note: For convenience, key information for these funds, including their CIKs, is maintained in the database/excluded_hedge_funds.csv file.

Adding Custom Funds

Want to track additional funds? Simply edit database/hedge_funds.csv and add your preferred institutional investors. For example, to add Berkshire Hathaway, Pershing Square, and ARK Invest, append the three fund rows below (the first line is the file's existing header):

"CIK","Fund","Manager","Denomination","CIKs"
"0001067983","Berkshire Hathaway","Warren Buffett","Berkshire Hathaway Inc",""
"0001336528","Pershing Square","Bill Ackman","Pershing Square Capital Management, L.P.",""
"0001697748","ARK Invest","Cathie Wood","ARK Investment Management LLC",""

💡 Note: hedge_funds.csv currently includes not only traditional hedge funds but also other institutional investors (private equity funds, large banks, VCs, pension funds, etc.) that file 13F forms with the SEC, selected from what I consider the top 5% of performers.

If you wish to track any of the Notable Exclusions hedge funds, you can copy the relevant rows from excluded_hedge_funds.csv into hedge_funds.csv.

Columns for Custom Funds:
  • Denomination: This is the exact legal name used by the fund in its filings. It is essential for accurately processing non-quarterly filings (13D/G, Form 4) as the scraper uses it to identify the fund's specific transactions within complex filing documents.

  • CIKs (optional): A comma-separated list of additional CIKs. This field is used to track filings from related entities or subsidiaries. Some investment firms have complex structures where different legal entities file separately (e.g., a management company and a holding company).

    Example: Jeffrey Ubben's ValueAct Holdings (CIK = 0001418814) also has filings under ValueAct Capital Management (CIK = 0001418812). By adding 0001418812 to the CIKs column, the tool aggregates non-quarterly filings from both entities for a complete view.

    "CIK","Fund","Manager","Denomination","CIKs"
    "0001418814","ValueAct","Jeffrey Ubben","ValueAct Holdings, L.P.","0001418812"

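A minimal sketch of how the optional CIKs column can be expanded into the full list of CIKs to query for a fund. The column names come from the header shown above; the `all_ciks` helper is hypothetical and is not the project's own loader.

```python
import csv
import io


def all_ciks(row: dict[str, str]) -> list[str]:
    """Primary CIK followed by any extra CIKs from the comma-separated CIKs column."""
    extra = [cik.strip() for cik in row.get("CIKs", "").split(",") if cik.strip()]
    return [row["CIK"], *extra]


sample = '''"CIK","Fund","Manager","Denomination","CIKs"
"0001418814","ValueAct","Jeffrey Ubben","ValueAct Holdings, L.P.","0001418812"
"0001067983","Berkshire Hathaway","Warren Buffett","Berkshire Hathaway Inc",""
'''

funds = {row["Fund"]: all_ciks(row) for row in csv.DictReader(io.StringIO(sample))}
print(funds["ValueAct"])            # ['0001418814', '0001418812']
print(funds["Berkshire Hathaway"])  # ['0001067983']
```

CIKs are kept as strings so that their leading zeros survive parsing.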
🧠 AI Models Selection

The AI Financial Analyst's primary goal is to identify stocks with the highest growth potential based on hedge fund activity. It achieves this by calculating a "Promise Score" for each stock. This score is a weighted average of various metrics derived from 13F filings. The AI's first critical task is to act as a strategist, dynamically defining the heuristic by assigning the optimal weights for these metrics based on the market conditions of the selected quarter. Its second task is to provide quantitative scores (e.g., momentum, risk) for the top-ranked stocks.
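The weighted-average idea behind the Promise Score can be sketched as follows. The metric names, weights, and values are invented for illustration; the real metrics come from the 13F data, and the real weights are chosen by the LLM for the selected quarter.

```python
def promise_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the metrics named in `weights`."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * metrics[name] for name in weights) / total


# Hypothetical weights, as if returned by the model for one quarter.
weights = {"net_buyers": 0.5, "new_positions": 0.3, "portfolio_pct_delta": 0.2}

# Hypothetical, already normalized metrics for two placeholder tickers.
stocks = {
    "AAA": {"net_buyers": 0.8, "new_positions": 0.5, "portfolio_pct_delta": 0.1},
    "BBB": {"net_buyers": 0.2, "new_positions": 0.9, "portfolio_pct_delta": 0.6},
}

ranking = sorted(stocks, key=lambda t: promise_score(stocks[t], weights), reverse=True)
for ticker in ranking:
    print(ticker, round(promise_score(stocks[ticker], weights), 2))
```

Dividing by the sum of the weights keeps scores comparable even if the model returns weights that do not add up to exactly 1.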

The models included in database/models.csv have been selected because they have demonstrated the best performance and reliability for these specific tasks. Through experimentation, they have proven effective at interpreting the prompts and providing insightful, well-structured responses.

Adding Custom AI Models

You can easily add or change the AI models used for analysis by editing the database/models.csv file. This allows you to experiment with different Large Language Models (LLMs) from supported providers.

To add a new model, open database/models.csv and add a new row with the following columns:

  • ID: The specific model identifier as required by the provider's API.
  • Description: A brief, user-friendly description that will be displayed in the selection menu.
  • Client: The provider of the model. Must be one of GitHub, Google, Groq, HuggingFace, or OpenRouter.
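Before launching an analysis, a quick check like the one below can catch malformed rows in models.csv. It is only a sketch: the column names and allowed Client values are taken from the list above, while the model IDs in the sample are placeholders rather than real identifiers.

```python
import csv
import io

ALLOWED_CLIENTS = {"GitHub", "Google", "Groq", "HuggingFace", "OpenRouter"}


def invalid_rows(text: str) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for rows that break the rules above."""
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        if not (row.get("ID") or "").strip():
            problems.append((line_no, "missing ID"))
        if row.get("Client") not in ALLOWED_CLIENTS:
            problems.append((line_no, f"unknown Client {row.get('Client')!r}"))
    return problems


# Placeholder model IDs, not real identifiers.
sample = '''"ID","Description","Client"
"example-model-a","Hypothetical model A","GitHub"
"","Row with no ID","Groq"
"example-model-b","Hypothetical model B","Anthropic"
'''

problems = invalid_rows(sample)
for line_no, problem in problems:
    print(f"line {line_no}: {problem}")
```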

Here are the official model lists for each provider:

⚠️ Limitations & Considerations

It's crucial to understand the inherent limitations of tracking investment strategies solely through SEC filings:

| Limitation | Impact | Mitigation |
| --- | --- | --- |
| 🕒 Filing Delay | Data can be 45+ days old | Focus on long-term strategies |
| 🧩 Incomplete Picture | Only US long positions shown | Use as part of broader analysis |
| 📉 No Short Positions | Missing hedge information | Consider reported positions carefully |
| 🌎 Limited Scope | No non-US stocks or other assets | Supplement with additional data |

A Truly Up-to-Date View

Many tracking websites rely solely on quarterly 13F filings, which means their data can be over 45 days old and miss many significant trades. Non-quarterly filings like 13D/G and Form 4 are often ignored because they are more complex to process and merge.

This tracker helps overcome that limitation by integrating multiple filing types. When analyzing the most recent quarter, the tool automatically incorporates the latest data from 13D/G and Form 4 filings. As a result, the holdings, deltas, and portfolio percentages reflect not just the static 13F snapshot, but also any significant trades that have occurred since. This provides a more dynamic and complete picture of institutional activity.
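One way to picture the merge described above: start from the 13F snapshot, overwrite each position with the most recent non-quarterly filing dated after the quarter end, then recompute portfolio percentages. This is a simplified sketch under assumed data shapes (ticker to market value, and filings as dictionaries with `ticker`, `filed`, and `value` keys); the tool's actual merge logic may differ.

```python
from datetime import date


def overlay(holdings: dict[str, float], filings: list[dict], quarter_end: date) -> dict[str, float]:
    """Replace 13F values with the latest non-quarterly value filed after quarter end."""
    latest: dict[str, dict] = {}
    for filing in filings:
        if filing["filed"] <= quarter_end:
            continue  # already reflected in the 13F snapshot
        current = latest.get(filing["ticker"])
        if current is None or filing["filed"] > current["filed"]:
            latest[filing["ticker"]] = filing
    merged = dict(holdings)
    for ticker, filing in latest.items():
        merged[ticker] = filing["value"]
    return merged


def portfolio_pct(values: dict[str, float]) -> dict[str, float]:
    """Share of each position in the total portfolio value, in percent."""
    total = sum(values.values())
    return {ticker: 100 * value / total for ticker, value in values.items()}


# Placeholder tickers and values.
snapshot = {"AAA": 600.0, "BBB": 400.0}
filings = [
    {"ticker": "AAA", "filed": date(2025, 4, 10), "value": 900.0},
    {"ticker": "AAA", "filed": date(2025, 4, 20), "value": 1200.0},
    {"ticker": "BBB", "filed": date(2025, 3, 15), "value": 100.0},  # before quarter end, ignored
]

merged = overlay(snapshot, filings, quarter_end=date(2025, 3, 31))
pct = portfolio_pct(merged)
print(merged)
print(pct)
```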

🌐 GitHub Pages Deployment

The frontend can be deployed as a static demo on GitHub Pages — no Python backend required. AI features and data updates are disabled in this mode, but all core analysis pages work with bundled data.

Live demo: https://{username}.github.io/hedge-fund-tracker/

What's Available in GitHub Pages Mode

| Page | Status |
| --- | --- |
| Dashboard (Latest Filings) | Fully functional |
| Quarterly Trends | Fully functional |
| Hedge Fund Portfolios | Fully functional |
| Stocks Browser | Fully functional |
| Funds Config | Read-only (data visible, no edits) |
| AI Ranking | Disabled (requires local backend) |
| AI Due Diligence | Disabled (requires local backend) |
| AI Settings | Hidden |
| Database Operations | Hidden |

How to Deploy

  1. Fork the repository on GitHub
  2. Enable GitHub Pages: Go to Settings > Pages > Source: "GitHub Actions"
  3. Push to master — the deploy workflow (.github/workflows/deploy-pages.yml) runs automatically

The build step (npm run build:gh-pages) bundles all CSV data into dist/database/ so the static site is fully self-contained.

Local Development

For full functionality (AI analysis, data updates, file editing), run locally:

pipenv install
cd app/frontend && npm install && npm run build && cd ../..
pipenv run python -m app.main

⚙️ Automation with GitHub Actions

This repository includes a GitHub Actions workflow (.github/workflows/filings-fetch.yml) designed to keep your data effortlessly up-to-date by automatically fetching the latest SEC filings.

How It Works

  • Scheduled Runs: The workflow runs automatically to check for new 13F, 13D/G, and Form 4 filings from the funds you are tracking (hedge_funds.csv). It runs four times a day from Monday to Friday (at 01:30, 13:30, 17:30, and 21:30 UTC) and once on Saturday (at 04:00 UTC).
  • Safe Branching Strategy: Instead of committing directly to your main branch, the workflow pushes all new data to a dedicated branch named automated/filings-fetch.
  • GitHub Pages Deploy: A separate workflow (.github/workflows/deploy-pages.yml) automatically rebuilds and deploys the static frontend to GitHub Pages whenever frontend or database files change on master.
  • User-Controlled Merging: This approach gives you full control. You can review the changes committed by the bot and then merge them into your main branch whenever you're ready. This prevents unexpected changes and allows you to manage updates at your own pace.
  • Automated Alerts: If the script encounters a non-quarterly filing where it cannot identify the fund owner based on your hedge_funds.csv configuration, it will automatically open a GitHub Issue in your repository, alerting you to a potential data mismatch that needs investigation.

How to Enable It

  1. Fork the Repository: Create your own fork of this project on GitHub.
  2. Enable Actions: GitHub Actions are typically enabled by default on forked repositories. You can verify this under the Actions tab of your fork.
  3. Configure Secrets: For the workflow to resolve tickers and create issues, add your FINNHUB_API_KEY as a repository secret: in your fork, go to Settings > Secrets and variables > Actions.

🗃️ Technical Stack

| 🗂️ Category | 🦾 Technology |
| --- | --- |
| Core | Python 3.13+, pipenv |
| Backend | FastAPI, uvicorn |
| Frontend | React 18, Vite, TypeScript, Tailwind CSS |
| UI Components | shadcn/ui, Radix UI, Lucide, Sonner |
| Data Viz & State | Recharts, TanStack Query v5 |
| Web Scraping | Requests, Beautiful Soup 4, lxml |
| Reliability | Tenacity, Python-Dotenv |
| Stocks Data | yfinance, Finnhub-Stock-API, FinanceDatabase |
| Gen AI | python-toon, Google AI SDK, OpenAI SDK |

🤝🏼 Contributing & Support

💬 Loved it? Help it grow

✍🏻 Feedback

This tool is in active development, and your input is valuable. If you have any suggestions or ideas for new features, please feel free to get in touch.

📚 References

🙏🏼 Acknowledgments

This project began as a fork of sec-web-scraper-13f by Gary Pang. The original tool provided a solid foundation for scraping 13F filings from the SEC's EDGAR database. It has since been significantly re-architected and expanded into a comprehensive analysis platform, incorporating multiple filing types, AI-driven insights, and automated data management.

📄 License

This project uses a dual license:

  • Original work (Gary Pang's sec-web-scraper-13f): MIT License.
  • All new work (everything added by Alessandro Colace): Copyright © 2025 Alessandro Colace — All Rights Reserved. Personal and educational use is permitted; redistribution and commercial use require written permission.

See the LICENSE file for the full terms.