A comprehensive data pipeline by BetterGov.PH that converts Philippine government budget data (NEP/GAA) and UACS reference data into a Neo4j graph database for advanced analysis and visualization.
This project processes 6.8+ million budget records spanning fiscal years 2020-2026, along with comprehensive reference data including organizational hierarchies, geographic locations, funding sources, and expense classifications. The data is structured as a property graph to enable complex queries and relationship analysis.
- 6,858,701 total budget records (NEP + GAA 2020-2026)
- β±106.9 trillion total budget amount
- 70,000+ total nodes across all dimensions
- 13 budget files processed (7 NEP + 6 GAA)
open-government-data/
βββ data/ # Converted JSON output files
β βββ budget/ # Budget records by year/type
β β βββ 2020/
β β β βββ budget-mapping.json
β β β βββ items/ # Batched budget records
β β β βββ nep_2020_batch_*.json (6 batches)
β β β βββ gaa_2020_batch_0001.json
β β βββ 2021/ ... 2026/ # Same structure for each year
β β
β βββ funding_source/ # Funding source dimension
β β βββ _metadata.json
β β βββ authorizations.json
β β βββ financing_sources.json
β β βββ fund_categories.json
β β βββ fund_clusters.json
β β βββ funding_sources.json
β β
β βββ location/ # Geographic dimension
β β βββ _metadata.json
β β βββ regions.json
β β βββ provinces.json
β β βββ cities_municipalities.json
β β βββ barangays.json
β β βββ locations.json
β β
β βββ object_code/ # Expense classification
β β βββ _metadata.json
β β βββ classifications.json
β β βββ sub_classes.json
β β βββ groups.json
β β βββ objects.json
β β βββ sub_objects.json
β β βββ expense_categories.json
β β
β βββ organization/ # Government org hierarchy
β β βββ _metadata.json
β β βββ departments.json
β β βββ agencies.json
β β βββ operating_unit_classes.json
β β βββ operating_units.json
β β βββ organizations.json
β β
β βββ pap/ # Programs and activities
β βββ _metadata.json
β βββ sector_outcomes.json
β βββ horizontal_programs.json
β
βββ scripts/ # Data conversion scripts
β βββ nep-gaa/ # Budget data converters
β β βββ converter.py # Universal NEP/GAA converter
β β βββ input/ # Source NEP/GAA JSON files
β β βββ NEP-2020.json ... NEP-2026.json
β β βββ GAA-2020.json ... GAA-2025.json
β β
β βββ nep-gaa-excel/ # Excel-based converters (legacy)
β β βββ converter.py
β β βββ converter_nep2025.py
β β
β βββ uacs/ # UACS reference converters
β βββ funding-source/
β β βββ converter.py
β β βββ input/*.xlsx
β βββ location/
β β βββ converter.py
β β βββ Input/*.xlsx
β βββ mfo-pap/
β β βββ converter.py
β β βββ input/*.xlsx
β βββ object-code/
β β βββ analyze.py
β β βββ input/*.json
β βββ organization/
β βββ converter.py
β βββ input/*.xlsx
β
βββ sync.py # Neo4j sync script
βββ validator.py # Data validation utility
βββ .env.example # Environment template
βββ .gitignore
βββ README.md
The graph database models Philippine government budget data as interconnected entities:
graph TB
subgraph "Budget Records (6.8M+ nodes)"
BR[BudgetRecord<br/>---<br/>id: string<br/>budget_type: NEP/GAA<br/>fiscal_year: string<br/>amount: float<br/>description: string<br/>prexc_fpap_id: string<br/>sort_order: string]
end
subgraph "Organization Hierarchy (14,953 nodes)"
DEPT[Department<br/>---<br/>code: 2-digit<br/>description: string<br/>abbreviation: string]
AGY[Agency<br/>---<br/>code: 3-digit<br/>uacs_code: 5-digit<br/>description: string<br/>department_code: string]
OU[OperatingUnit<br/>---<br/>code: 7-digit<br/>uacs_code: 12-digit<br/>description: string<br/>class_code: 2-digit<br/>lower_ou_code: 5-digit]
OUC[OperatingUnitClass<br/>---<br/>code: 2-digit<br/>description: string<br/>status: string]
ORG[Organization<br/>---<br/>uacs_code: 12-digit<br/>description: string<br/>department_code: 2-digit<br/>agency_code: 3-digit<br/>operating_unit_code: 7-digit]
end
subgraph "Location Hierarchy (53,263 nodes)"
REG[Region<br/>---<br/>code: 2-digit<br/>description: string]
PROV[Province<br/>---<br/>psgc_code: string<br/>description: string<br/>region_code: string]
CITY[CityMunicipality<br/>---<br/>psgc_code: string<br/>description: string<br/>province_code: string]
BRGY[Barangay<br/>---<br/>psgc_code: 9-digit<br/>description: string<br/>status: string]
end
subgraph "Funding Source (806 nodes)"
FC[FundCluster<br/>---<br/>code: 2-digit<br/>description: string]
FIN[FinancingSource<br/>---<br/>code: 2-digit<br/>description: string]
AUTH[Authorization<br/>---<br/>code: 2-digit<br/>description: string]
FCAT[FundCategory<br/>---<br/>uacs_code: string<br/>description: string<br/>sub_category: string<br/>status: string]
FS[FundingSource<br/>---<br/>uacs_code: 8-digit<br/>code: 3-digit<br/>description: string<br/>fund_cluster: string<br/>financing_source: string<br/>authorization: string]
end
subgraph "Expense Classification (683 nodes)"
ECLASS[Classification<br/>---<br/>code: 1-digit<br/>description: string<br/>uacs_code: 8-digit]
ESUBCLASS[SubClass<br/>---<br/>code: 2-digit<br/>simple_code: 2-digit<br/>description: string<br/>classification_code: string<br/>uacs_code: 8-digit]
EGROUP[ExpenseGroup<br/>---<br/>code: 2-digit<br/>full_code: 5-digit<br/>description: string<br/>sub_class_code: string<br/>classification_code: string<br/>uacs_code: 8-digit]
EOBJ[Object<br/>---<br/>code: 3-digit<br/>full_code: 8-digit<br/>description: string<br/>group_code: string<br/>sub_class_code: string<br/>classification_code: string<br/>uacs_code: 8-digit]
SOBJ[SubObject<br/>---<br/>code: 2-digit<br/>uacs_code: 10-digit<br/>description: string<br/>object_code: string<br/>group_code: string<br/>sub_class_code: string<br/>classification_code: string<br/>expense_category: string]
end
subgraph "Program/Activity"
SO[SectorOutcome<br/>---<br/>code: string<br/>description: string]
HP[HorizontalProgram<br/>---<br/>code: string<br/>description: string]
end
%% Budget Record Relationships
BR -->|ALLOCATED_TO| ORG
BR -->|LOCATED_IN_REGION| REG
BR -->|FUNDED_BY| FS
BR -->|CLASSIFIED_AS| SOBJ
%% Organization Hierarchy
DEPT -->|HAS_AGENCY| AGY
AGY -->|HAS_OPERATING_UNIT_CLASS| OUC
OUC -->|HAS_OPERATING_UNIT| OU
%% Location Hierarchy
REG -->|HAS_PROVINCE| PROV
PROV -->|HAS_CITY| CITY
CITY -->|HAS_BARANGAY| BRGY
%% Funding Source Hierarchy
FC -->|HAS_FINANCING_SOURCE| FIN
FIN -->|HAS_AUTHORIZATION| AUTH
AUTH -->|HAS_FUND_CATEGORY| FCAT
FCAT -->|HAS_FUNDING_SOURCE| FS
%% Expense Classification Hierarchy
ECLASS -->|HAS_SUB_CLASS| ESUBCLASS
ESUBCLASS -->|HAS_GROUP| EGROUP
EGROUP -->|HAS_OBJECT| EOBJ
EOBJ -->|HAS_SUB_OBJECT| SOBJ
style BR fill:#e1f5ff
style ORG fill:#fff4e1
style REG fill:#e8f5e9
style FS fill:#f3e5f5
style SOBJ fill:#ffe0b2
- BudgetRecord: Individual budget line items containing fiscal allocations
id: Unique identifier (sequential)budget_type: NEP (National Expenditure Program) or GAA (General Appropriations Act)fiscal_year: Year of the budget (2020-2026)amount: Budget allocation amountdescription: Budget item descriptionprexc_fpap_id: Program/Project/Activity identifiersort_order: Ordering sequence
-
Department: Top-level executive departments (40 entities)
code: 2-digit department codedescription: Full department nameabbreviation: Short name/acronym
-
Agency: Government agencies under departments (466 entities)
code: 3-digit agency codeuacs_code: 5-digit UACS identifier (dept + agency)description: Full agency namedepartment_code: Parent department reference
-
OperatingUnitClass: Classification of operating units (26 entities)
code: 2-digit class codedescription: Class descriptionstatus: Active/Inactive status
-
OperatingUnit: Operational units within agencies (14,461 entities)
code: 7-digit unit codeuacs_code: 12-digit full UACS codedescription: Unit nameclass_code: Operating unit classlower_ou_code: 5-digit lower unit identifier
-
Organization: Complete organizational entities (14,461 entities)
uacs_code: 12-digit unique identifierdescription: Full organization namedepartment_code,agency_code,operating_unit_code: Hierarchy references
-
Region: Philippine administrative regions (17 entities)
code: 2-digit region codedescription: Region name (e.g., "NCR", "Region I")
-
Province: Provincial-level divisions (82 entities)
psgc_code: Philippine Standard Geographic Codedescription: Province nameregion_code: Parent region reference
-
CityMunicipality: Cities and municipalities (1,634 entities)
psgc_code: PSGC identifierdescription: City/municipality nameprovince_code: Parent province reference
-
Barangay: Barangay-level locations (42,046 entities)
psgc_code: 9-digit PSGC codedescription: Barangay namestatus: Active/Inactive status
-
FundCluster: Primary fund groupings
code: 2-digit cluster codedescription: Cluster name
-
FinancingSource: Source of financing
code: 2-digit source codedescription: Source description
-
Authorization: Budget authorization type
code: 2-digit authorization codedescription: Authorization description
-
FundCategory: Fund category classification
uacs_code: UACS identifierdescription: Category descriptionsub_category: Sub-category classificationstatus: Active/Inactive
-
FundingSource: Complete funding source (806 entities)
uacs_code: 8-digit unique identifiercode: 3-digit codedescription: Full funding source namefund_cluster,financing_source,authorization: Hierarchy references
-
Classification: Top-level expense classification
code: 1-digit class codedescription: Classification nameuacs_code: 8-digit UACS code
-
SubClass: Expense sub-classification
code: 2-digit subclass codesimple_code: Simplified codedescription: Subclass nameclassification_code: Parent classification
-
ExpenseGroup: Grouped expense categories
code: 2-digit group codefull_code: 5-digit full codedescription: Group descriptionsub_class_code,classification_code: Hierarchy references
-
Object: Expense object codes
code: 3-digit object codefull_code: 8-digit full codedescription: Object description- Hierarchy references to group, subclass, classification
-
SubObject: Detailed expense sub-objects (683 entities)
code: 2-digit sub-object codeuacs_code: 10-digit unique identifierdescription: Sub-object descriptionexpense_category: PS (Personal Services), MOOE (Maintenance), CO (Capital Outlay)- Full hierarchy references
-
SectorOutcome: Government sector outcomes
code: Outcome codedescription: Outcome description
-
HorizontalProgram: Cross-cutting programs
code: Program codedescription: Program description
The central BudgetRecord node connects to four key dimensions:
-
ALLOCATED_TO β Organization
- Links budget items to organizational units
- Enables department/agency budget analysis
- Supports organizational hierarchy traversal
-
LOCATED_IN_REGION β Region
- Geographic allocation of budget
- Regional distribution analysis
- Can traverse to Province β City β Barangay
-
FUNDED_BY β FundingSource
- Identifies funding source for budget items
- Traces funding hierarchy: FundCluster β FinancingSource β Authorization β FundCategory
- Enables funding pattern analysis
-
CLASSIFIED_AS β SubObject
- Expense classification and categorization
- Links to expense hierarchy: Classification β SubClass β ExpenseGroup β Object β SubObject
- Expense category analysis (PS/MOOE/CO)
Department (40)
βββ HAS_AGENCY β Agency (466)
βββ HAS_OPERATING_UNIT_CLASS β OperatingUnitClass (26)
βββ HAS_OPERATING_UNIT β OperatingUnit (14,461)
Region (17)
βββ HAS_PROVINCE β Province (82)
βββ HAS_CITY β CityMunicipality (1,634)
βββ HAS_BARANGAY β Barangay (42,046)
FundCluster
βββ HAS_FINANCING_SOURCE β FinancingSource
βββ HAS_AUTHORIZATION β Authorization
βββ HAS_FUND_CATEGORY β FundCategory
βββ HAS_FUNDING_SOURCE β FundingSource (806)
Classification
βββ HAS_SUB_CLASS β SubClass
βββ HAS_GROUP β ExpenseGroup
βββ HAS_OBJECT β Object
βββ HAS_SUB_OBJECT β SubObject (683)
βββ IN_CATEGORY β ExpenseCategory
The Unified Accounts Code Structure (UACS) standardizes government financial data:
Format: [DEPT(2)][AGENCY(3)][CLASS(2)][LOWER_OU(5)]
Example: 270012200001
- Department: Positions 0-1 (2 digits)
- Agency: Positions 2-4 (3 digits)
- Class: Positions 5-6 (2 digits)
- Lower Operating Unit: Positions 7-11 (5 digits)
Format: [FC(2)][FS(1)][AUTH(2)][CAT(3)]
Example: 01101001
- Fund Cluster: 2 digits
- Financing Source: 1 digit
- Authorization: 2 digits
- Category: 3 digits
Format: [CLASS(1)][SUBCLASS(2)][GROUP(2)][OBJ(3)][SUBOBJ(2)]
Example: 5020199001
- Classification: 1 digit
- Sub-Class: 2 digits
- Group: 2 digits
- Object: 3 digits
- Sub-Object: 2 digits
Format: [REGION(2)][PROVINCE(2)][CITY(2)][BARANGAY(3)]
Example: 133914001 (9 digits for barangay)
- Python 3.8+
- Neo4j Database 5.x
- Required Python packages:
pip install neo4j pandas openpyxl
- Set up Neo4j credentials in .env:
NEO4J_URI=neo4j://localhost NEO4J_USER=neo4j NEO4J_PASSWORD=your_password
Convert reference data from XLSX to JSON:
# Organization hierarchy
python scripts/uacs/organization/converter.py
# Location data
python scripts/uacs/location/converter.py
# Funding sources
python scripts/uacs/funding-source/converter.py
# Object codes (expense classification)
python scripts/uacs/object-code/converter.py
# Programs and activities
python scripts/uacs/mfo-pap/converter.pyConvert NEP/GAA budget records:
# Universal converter for all years
python scripts/nep-gaa/converter.pyThe converter handles:
- NEP 2020-2026 (7 fiscal years)
- GAA 2020-2025 (6 fiscal years)
- 8-digit and 6-digit funding code formats
- Automatic UACS code validation
Load all converted data into Neo4j:
python sync.pyThe sync process:
- Creates uniqueness constraints
- Loads reference dimensions (funding, organization, location, object codes)
- Loads budget records in batches
- Creates relationships between nodes
- Handles 6.8M+ records efficiently
| Fiscal Year | NEP Records | GAA Records | Total Amount |
|---|---|---|---|
| 2020 | 578,234 | 64,999 | β±8.33T |
| 2021 | 64,999 | 498,342 | β±4.99T |
| 2022 | 64,999 | 628,668 | β±10.55T |
| 2023 | 674,011 | 650,369 | β±21.07T |
| 2024 | 719,974 | 682,814 | β±23.07T |
| 2025 | 736,593 | 723,105 | β±25.36T |
| 2026 | 771,594 | - | β±13.59T |
| Dimension | Count | Description |
|---|---|---|
| Departments | 40 | Top-level government departments |
| Agencies | 466 | Government agencies |
| Operating Units | 14,461 | Organizational units |
| Regions | 17 | Philippine regions |
| Provinces | 82 | Philippine provinces |
| Cities/Municipalities | 1,634 | Cities and municipalities |
| Barangays | 42,046 | Barangay-level locations |
| Funding Sources | 806 | Budget funding sources |
| Expense Categories | 683 | Object code classifications |
MATCH (br:BudgetRecord {fiscal_year: "2025", budget_type: "NEP"})
-[:ALLOCATED_TO]->(org:Organization)
MATCH (dept:Department {code: org.department_code})
RETURN dept.description AS department,
SUM(br.amount) AS total_budget
ORDER BY total_budget DESC
LIMIT 10MATCH (br:BudgetRecord {fiscal_year: "2025"})
-[:LOCATED_IN_REGION]->(r:Region)
RETURN r.description AS region,
COUNT(br) AS record_count,
SUM(br.amount) AS total_amount
ORDER BY total_amount DESCMATCH (br:BudgetRecord {fiscal_year: "2025"})
-[:CLASSIFIED_AS]->(so:SubObject)
-[:IN_CATEGORY]->(ec:ExpenseCategory)
RETURN ec.description AS category,
COUNT(br) AS transactions,
SUM(br.amount) AS total
ORDER BY total DESCMATCH path = (br:BudgetRecord)-[:FUNDED_BY]->(fs:FundingSource)
-[:HAS_FUND_CLUSTER]->(fc:FundCluster)
WHERE br.fiscal_year = "2025"
RETURN DISTINCT fc.description AS fund_cluster,
COUNT(br) AS budget_items,
SUM(br.amount) AS total_amount
ORDER BY total_amount DESCMain synchronization script that loads data into Neo4j with:
- Batched node creation (5,000-10,000 per batch)
- Batched relationship creation (5,000 per batch)
- Progress tracking and error handling
- Automatic constraint creation
Budget Converter (scripts/nep-gaa/converter.py)
Universal converter supporting:
- All NEP/GAA formats (2020-2026)
- 8-digit native funding codes
- 6-digit funding codes with category lookup
- Field name variations (UACS_SOBJ_CD vs UACS_OBJ_CD)
- Sequential unique ID generation
Specialized converters for each dimension:
- Organization: Department β Agency β Operating Unit hierarchy
- Location: Region β Province β City β Barangay
- Funding: Fund Cluster β Financing Source β Authorization β Category
- Object Code: Classification β SubClass β Group β Object β SubObject
- Budget Data: DBM NEP/GAA JSON exports (2020-2026) - https://www.dbm.gov.ph/
- UACS Reference: DBM UACS master tables (XLSX format) - https://uacs.gov.ph/
- Location Data: PSA PSGC (Philippine Standard Geographic Code)
- PAP Data: DBM MFO/PAP reference tables
- Never commit .env file to version control
- Use strong passwords for Neo4j in production
- Restrict database access to authorized users only
- Consider encrypting data at rest for sensitive deployments
This repository is dedicated to the public domain under CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.
You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
- β No Copyright
- β No Rights Reserved
- β No Attribution Required
For more information, see the CC0 1.0 Universal license.
This project is part of the BetterGov.PH initiative, dedicated to making Philippine government data transparent, accessible, and actionable for citizens, researchers, and policymakers.
Mission: Empower Filipinos with open government data to promote transparency, accountability, and evidence-based decision-making.
Visit us at https://bettergov.ph/ to explore more government data projects and visualizations.
Contributions are welcome! Please ensure:
- Data converters preserve UACS code integrity
- Batch processing for large datasets
- Comprehensive error handling and logging
- Documentation for new data sources
- Fork the repository and create a new branch for your feature
- Test your changes thoroughly with sample data
- Update documentation to reflect any new features or changes
- Submit a pull request with a clear description of your changes
- Data Quality: Improve validation and error handling
- Performance: Optimize batch processing and database queries
- Visualization: Create analysis tools and dashboards
- Documentation: Improve guides, examples, and API documentation
- Data Sources: Add new budget years or reference datasets
This project is made possible by contributors dedicated to government transparency:
- Project Lead: BetterGov.PH Team
- Data Engineering: Community contributors
- Special Thanks: Department of Budget and Management (DBM) for open data access
Want to see your name here? Contribute to the project!
Last Updated: October 2025 Data Coverage: FY 2020-2026 Total Records: 6.8M+ budget line items