henrysouchien/edgar-parser

edgar-parser

Open-source SEC filing parser for local Python workflows.

edgar-parser is the free parser engine behind the EdgarParser stack. It reads public SEC EDGAR filings directly and turns 10-K, 10-Q, and selected 8-K content into structured financial facts and filing sections without a market-data vendor.

Use this package when you want local, self-hosted parsing in Python. If you want the hosted API, prewarmed cache, MCP server for agents, concept registry, operational KPI extraction, or citation-oriented evidence tools, use EdgarParser and the hosted documentation.

This is not a raw EDGAR scraper. The library handles XBRL namespace resolution, fiscal period matching, current vs. prior period alignment, sign normalization, and fuzzy metric lookup so you get analysis-ready data from SEC filings.

Open engine vs. hosted product

| Need | Use |
| --- | --- |
| Local Python parsing of 10-K/10-Q XBRL facts | edgar-parser |
| Local qualitative section parsing from 10-K/10-Q filings | edgar-parser |
| Optional 8-K earnings-release extraction with your own LLM keys | edgar-parser[llm] |
| Hosted API with auth, rate limits, and prewarmed filing caches | edgarparser.com |
| MCP tools for Claude Desktop, Cursor, and other agents | edgar-mcp |
| Concept registry, operational KPI extraction, langextract spans, grounded evidence, and richer agent workflows | Hosted API docs |

The public package stays focused on the parser engine. Hosted product features ship through the API and MCP surfaces.

What it does

Core Extraction (parse_filing)

  • Parses iXBRL facts from 10-Q and 10-K filings on SEC EDGAR
  • Resolves XBRL namespaces, units, scales, and dimensional qualifiers
  • Enriches facts with presentation roles and negated-label sign flipping
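To make the scale and sign handling above concrete, here is a minimal sketch of how a displayed iXBRL number might be normalized. It is not the library's code: `normalize_fact` and its parameters are hypothetical, and it assumes the usual iXBRL conventions (a power-of-ten `scale`, a `sign="-"` attribute) plus a flag for facts whose presentation uses a negated label.

```python
from decimal import Decimal
from typing import Optional


def normalize_fact(raw_text: str, scale: int = 0,
                   sign_attr: Optional[str] = None,
                   negated_label: bool = False) -> Decimal:
    """Convert a displayed iXBRL number into a signed, unscaled value.

    Hypothetical helper for illustration only.
    """
    # Remove thousands separators from the displayed text.
    cleaned = raw_text.replace(",", "").strip()
    # Apply the power-of-ten scale (6 means "in millions").
    value = Decimal(cleaned) * (Decimal(10) ** scale)
    # iXBRL marks negative values with sign="-" rather than in the text.
    if sign_attr == "-":
        value = -value
    # A negated presentation label shows the value with its sign flipped.
    if negated_label:
        value = -value
    return value


revenue = normalize_fact("1,234", scale=6)
buyback = normalize_fact("500", scale=3, negated_label=True)
print(revenue, buyback)
```

Using `Decimal` rather than `float` keeps reported amounts exact, which matters when values are later compared across periods.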

Period Matching (match_filing)

  • Aligns current and prior period values for each financial line item
  • Handles quarterly, full-year, and 4Q (derived annual) modes
  • Uses zip-matching first, with an adaptive fallback to fuzzy matching for edge cases
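One way to picture zip-matching with a fuzzy fallback is sketched below. This is an illustration under assumptions, not the implementation behind `match_filing`: it assumes "zip-matching" means pairing current and prior line items positionally when their labels line up, and the `match_periods` function and its `threshold` parameter are hypothetical.

```python
from difflib import SequenceMatcher


def match_periods(current, prior, threshold=0.8):
    """Pair (label, value) items from the current and prior periods.

    Returns a list of (label, current_value, prior_value) tuples;
    prior_value is None when no acceptable match exists.
    """
    # Fast path: both lists have the same labels in the same order,
    # so the items can simply be zipped together.
    if len(current) == len(prior) and all(
        c[0] == p[0] for c, p in zip(current, prior)
    ):
        return [(c[0], c[1], p[1]) for c, p in zip(current, prior)]

    # Fallback: for each current item, pick the most similar unused prior label.
    matched = []
    remaining = list(prior)
    for label, value in current:
        best, best_score = None, 0.0
        for candidate in remaining:
            score = SequenceMatcher(
                None, label.lower(), candidate[0].lower()
            ).ratio()
            if score > best_score:
                best, best_score = candidate, score
        if best is not None and best_score >= threshold:
            remaining.remove(best)
            matched.append((label, value, best[1]))
        else:
            matched.append((label, value, None))
    return matched


current = [("Revenues", 100), ("NetIncomeLoss", 20)]
prior = [("NetIncomeLoss", 15), ("Revenues", 90), ("Goodwill", 5)]
print(match_periods(current, prior))
```

The positional path is cheap and exact when a filing keeps its line items stable between periods. The similarity search only runs when the two lists disagree.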

8-K Earnings Extraction (earnings_8k) — optional, requires anthropic or openai

  • Extracts financial data from 8-K earnings press releases using Claude (primary) with OpenAI GPT-4o fallback
  • Automatic fallback when 10-Q/10-K is not yet available for a period
  • Same output schema as core extraction for seamless integration
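The primary-with-fallback arrangement described above can be sketched generically. The code below does not call any LLM API and is not the `earnings_8k` implementation: `extract_with_fallback` and both stand-in extractors are hypothetical, and the fact keys simply mirror those used in the quick start.

```python
from typing import Callable, Dict, List

Extractor = Callable[[str], List[Dict]]


def extract_with_fallback(text: str, primary: Extractor,
                          fallback: Extractor) -> List[Dict]:
    """Try the primary extractor and use the fallback if it raises."""
    try:
        return primary(text)
    except Exception:
        # Any failure in the primary provider triggers the fallback.
        return fallback(text)


# Hypothetical stand-ins for real LLM clients so the sketch runs offline.
def flaky_primary(text: str) -> List[Dict]:
    raise RuntimeError("primary provider unavailable")


def simple_fallback(text: str) -> List[Dict]:
    return [{"metric": "Revenue", "current_value": 100.0}]


facts = extract_with_fallback("press release text", flaky_primary, simple_fallback)
print(facts)
```

Because both extractors return the same list-of-dicts shape, the caller does not need to know which provider produced the facts.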

Filing Sections (section_parser)

  • Parses qualitative sections from 10-K/10-Q filings (Risk Factors, MD&A, Business, etc.)
  • 8-K earnings release parsing via source="8k" — extracts narrative and tables from Item 2.02 exhibits
  • Summary and full-text modes with configurable word limits
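As a rough illustration of what a word-limited summary mode could look like, the sketch below truncates section text at a word boundary. The function name `summarize_section` and the `max_words` parameter are hypothetical and are not taken from `section_parser`.

```python
def summarize_section(text: str, max_words: int = 50) -> str:
    """Return at most max_words words, marking truncation with ' ...'."""
    words = text.split()
    if len(words) <= max_words:
        # Short sections are returned whole, with whitespace normalized.
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."


risk_factors = "Our business is subject to intense competition in all markets"
print(summarize_section(risk_factors, max_words=5))
```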

High-Level Tools (tools)

  • get_financials(ticker, year, quarter) — all facts for a filing period
  • get_filings(ticker, year, quarter) — list available filings (10-Q, 10-K, 8-K)
  • get_metric(ticker, year, quarter, metric_name) — single metric lookup with fuzzy matching
  • get_filing_sections(ticker, year, quarter) — qualitative section text from the 10-K/10-Q by default; pass source="8k" for the 8-K earnings release
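Fuzzy metric lookup can be approximated with the standard library, as in the sketch below. This is an assumption about the general technique, not the matching logic inside `get_metric`: the `find_metric` helper and its `cutoff` value are hypothetical, and the fact dictionaries reuse the `metric` and `current_value` keys from the quick start.

```python
from difflib import get_close_matches


def find_metric(facts, query, cutoff=0.6):
    """Return the fact whose metric name is closest to query, or None."""
    # Index facts by lowercased metric name for case-insensitive matching.
    by_name = {fact["metric"].lower(): fact for fact in facts}
    hits = get_close_matches(query.lower(), list(by_name), n=1, cutoff=cutoff)
    return by_name[hits[0]] if hits else None


facts = [
    {"metric": "Revenues", "current_value": 100},
    {"metric": "NetIncomeLoss", "current_value": 20},
]
print(find_metric(facts, "Revenue"))
print(find_metric(facts, "zzz"))
```

A higher `cutoff` rejects more near misses, which trades recall for fewer wrong matches between similarly named concepts.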

Install

pip install edgar-parser

For 8-K earnings extraction (uses Claude API with OpenAI fallback):

pip install "edgar-parser[llm]"

Quick start

from edgar_parser.tools import get_financials, get_metric

# Get all financial facts from Apple's Q1 2025 10-Q
result = get_financials("AAPL", 2025, 1)
for fact in result["facts"][:5]:
    print(f"{fact['metric']}: {fact['current_value']}")

# Look up a single metric
revenue = get_metric("AAPL", 2025, 1, "Revenue")
print(f"Revenue: {revenue['value']} {revenue['unit']}")

Or use the lower-level API for more control:

from edgar_parser import parse_filing, match_filing

# Parse raw XBRL facts
parsed = parse_filing("AAPL", 2025, 1, full_year_mode=False)

# Match current vs. prior periods
matched = match_filing(parsed)
print(matched.head())

Key functions

| Function | Module | Description |
| --- | --- | --- |
| get_financials | tools | All facts for a ticker/period, with caching |
| get_filings | tools | List SEC filings (10-Q, 10-K, 8-K) for a period |
| get_metric | tools | Single metric lookup with fuzzy matching |
| get_filing_sections | tools | Qualitative section text (10-K/10-Q or 8-K with source="8k") |
| parse_filing | pipeline | Low-level XBRL fact extraction |
| enrich_filing | pipeline | Fiscal period categorization and enrichment |
| match_filing | matching | Current vs. prior period alignment |
| find_8k_for_period | earnings_8k | Find 8-K earnings release for a fiscal period |

Requirements

  • Python 3.10+
  • No API key needed — data comes directly from SEC EDGAR (public)
  • Optional: ANTHROPIC_API_KEY environment variable for 8-K extraction
  • Optional: OPENAI_API_KEY environment variable for 8-K fallback when Anthropic API is unavailable
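Before relying on 8-K extraction, it can help to check which of the optional keys are set. The helper below is a hypothetical convenience, not part of the package; it only reads the two environment variable names listed above.

```python
import os


def available_llm_providers(env=None):
    """List providers whose API key is set, in primary-then-fallback order."""
    env = os.environ if env is None else env
    providers = []
    if env.get("ANTHROPIC_API_KEY"):
        providers.append("anthropic")
    if env.get("OPENAI_API_KEY"):
        providers.append("openai")
    return providers


print(available_llm_providers())
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.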

Release policy

edgar-parser 0.3.x receives bug fixes for the public parser surface: SEC compatibility fixes, correctness fixes, packaging fixes, and dependency compatibility updates. New hosted-product capabilities are not backported into this package.

That split keeps the local parser useful and auditable while the hosted EdgarParser API carries the agent-facing product surface.

See also

  • EdgarParser docs — hosted API, tool reference, MCP setup, and changelog.
  • edgar-mcp — MCP server that exposes the hosted API as AI-agent tools.
