Yuxi0048/PipeNetworkCompletion

Topology Prediction of Underground Utility Networks with Graph Neural Networks

License: BSD-3-Clause · Python 3.10

Research code package for Underground Utility Network Completion based on Spatial Contextual Information of Ground Facilities and Utility Anchor Points using Graph Neural Networks.

The repository includes an installable Python package, command-line evaluation scripts, environment checks, tests, documentation, data-layout guidance, and branch-based source traceability to the notebook-era repository state.

The initial version of this repository was published on December 26, 2023, and the codebase was refactored by Codex and Claude Code on May 13, 2026.

Caution

Data and model checkpoints are not distributed in this repository. Raw GIS files, processed graph pickles, model checkpoints, and derived artifacts that can reconstruct the utility network are excluded. Users must obtain source data from the original public providers and follow their terms; the authors cannot redistribute raw or derived data without separate written permission from the providers.

Quick Start

Windows (PowerShell)

.\scripts\create_env.ps1
.\.conda\pipe-network-completion\python.exe -m pip install -e .
.\.conda\pipe-network-completion\python.exe scripts\verify_environment.py

macOS / Linux (bash)

conda env create -f environment.yml
conda activate pipe-network-completion
pip install -e .
python scripts/verify_environment.py

Editable installation keeps pipe_network_completion importable from scripts, notebooks, and tests.
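
One way to confirm that the editable install took effect is to ask Python whether it can locate the package. The sketch below uses only the standard library; the helper name `is_importable` is illustrative and is not part of the repository.

```python
import importlib.util


def is_importable(package: str) -> bool:
    """Return True if `package` can be located on the current import path."""
    # find_spec locates a top-level package without importing it.
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    name = "pipe_network_completion"
    if is_importable(name):
        print(f"{name}: importable")
    else:
        print(f"{name}: not found; rerun `pip install -e .` in the active environment")
```

Run it with the interpreter of the environment created above so the check reflects that environment rather than a system Python.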

Full Test-Split Evaluation

Run this only after the local graph data, checkpoint, and metrics files are available under the documented paths:

python scripts/evaluate_checkpoint.py \
  --checkpoint models/checkpoints/model1212_hiddensize_128_drop_00.pt \
  --metrics results/metrics/model_metrics1212.csv \
  --split test

Supporting documentation: docs/TRACEABILITY.md maps refactored modules to the notebook workflow and archived branches; docs/DATA_LAYOUT.md documents artifact locations; and models/README.md explains the checkpoint naming scheme.
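
The prerequisites for the evaluation command can be checked up front. The following is a minimal sketch, assuming the command is run from the repository root and that evaluating a split needs only that split's graph pickle; the helper `missing_for_evaluation` is hypothetical, and the paths are the ones quoted in this README.

```python
from pathlib import Path

# Paths taken from the evaluation command and the pipeline layout in this README.
CHECKPOINT = Path("models/checkpoints/model1212_hiddensize_128_drop_00.pt")
METRICS = Path("results/metrics/model_metrics1212.csv")
GRAPH_DIR = Path("data/processed/graphs")


def missing_for_evaluation(root: Path, split: str = "test") -> list[Path]:
    """Return required files that are absent under `root` (empty list means ready)."""
    # Assumption: only the graph pickle of the requested split is needed.
    required = [CHECKPOINT, METRICS, GRAPH_DIR / f"{split}_data.pkl"]
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_for_evaluation(Path.cwd())
    if missing:
        print("Not ready; missing:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All evaluation inputs found.")
```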

Pipeline

The command-line workflow follows the artifact lifecycle:

data/raw/
  gis/sewer/              Urban Utilities sewer shapefile bundles
  gis/roads/              Brisbane City Council road shapefile bundle
  mh_road/MH_Road.pkl     local manhole-road nearest-feature table
  -> process.py
  -> data/interim/*.pkl
  -> scripts/build_graphs.py
  -> data/processed/graphs/{train,val,test}_data.pkl
  + models/checkpoints/*.pt
  + results/metrics/*.csv
  -> scripts/evaluate_checkpoint.py
  -> checkpoint metrics

  1. Preprocess GIS shapefiles and the manhole-road near table into *_proc.pkl and split_mask.pkl:

    python process.py

  2. Assemble per-split HeteroData graphs from those pickles:

    python scripts/build_graphs.py

  3. Evaluate a saved checkpoint against the published metrics:

    python scripts/evaluate_checkpoint.py

Each script supports --help.
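
The three steps can also be chained from Python so that a failing stage stops the run. This is a sketch rather than a script shipped with the repository: `run_pipeline` and `STAGES` are hypothetical names, and it assumes the scripts run with their default arguments from the repository root, as in the commands above.

```python
import subprocess
import sys

# The three stages, in the order given in the numbered steps above.
STAGES = [
    ("preprocess", ["process.py"]),
    ("build graphs", ["scripts/build_graphs.py"]),
    ("evaluate", ["scripts/evaluate_checkpoint.py"]),
]


def run_pipeline(runner=subprocess.run, python=sys.executable):
    """Run each stage with the given interpreter, stopping at the first failure."""
    completed = []
    for name, script_args in STAGES:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a later stage never starts after an earlier one has failed.
        runner([python, *script_args], check=True)
        completed.append(name)
    return completed
```

Calling `run_pipeline()` from the repository root returns the list of completed stage names. The `runner` parameter exists so the sequencing can be exercised without executing the real scripts.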

Main Artifacts

Source Traceability

The main branch is the maintained research code package. Notebook-era source code and earlier repository layouts are preserved in sanitized archived remote branches, so readers can inspect the historical code without adding data files to the current runnable tree.

git fetch --all --tags
git branch -r
git switch --detach origin/Legacy-Final

Use origin/Legacy-Final for the previous final research state and origin/Legacy-main for the earlier main-branch snapshot. Return to the current codebase with:

git switch main

Data

The source code is public. Raw GIS files, processed graph pickles, model checkpoints, and other artifacts that can reconstruct the utility network are not redistributed in this repository. Users should obtain source GIS data directly from the relevant public data providers and follow their terms of use. The authors do not redistribute raw or derived data artifacts unless separate written permission is obtained from the relevant data providers.

Input Files For process.py

Place each shapefile bundle in the folder below. Keep every .shp file with its matching sidecars, including .dbf, .shx, .prj, and any other files exported with the same base name.

| File placement | Dataset / source | Type | Role in the workflow |
| --- | --- | --- | --- |
| data/raw/gis/sewer/SewerManholes_ExportFeatures.shp | Urban Utilities sewer manholes | Point | Main manhole/anchor-point layer. Combined with MH_Road.pkl to build MH_proc.pkl. |
| data/raw/gis/sewer/SewerGravityMa_ExportFeature1.shp | Urban Utilities sewer gravity main - trunk | Line | Trunk gravity-main segments. Combined with SewerGravityMa_ExportFeature2.shp to build Line_proc.pkl. |
| data/raw/gis/sewer/SewerGravityMa_ExportFeature2.shp | Urban Utilities sewer gravity main | Line | Main gravity sewer segments. Combined with SewerGravityMa_ExportFeature1.shp to build Line_proc.pkl. |
| data/raw/gis/sewer/SewersqlSewerP_ExportFeature.shp | Urban Utilities sewer pump assets | Point | Loaded by process.py as pump point assets. In the current preprocessing path, pump rows without the manhole-road near-table fields are filtered before MH_proc.pkl is written. |
| data/raw/gis/roads/Roads_ExportFeatures.shp | Brisbane City Council Open Data road hierarchy | Line | Road context layer used to build road nodes and road-road relationships. |
| data/raw/mh_road/MH_Road.pkl | Locally generated manhole-road near table | Table | Links manholes to nearby roads. Expected fields are OBJECTID, NEAR_FID, NEAR_POS, NEAR_DIST, and SIDE. |
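
The expected fields of the near table can be verified before preprocessing. The sketch below is an illustration only: `missing_near_fields` is a hypothetical helper that takes any iterable of column names, so it makes no claim about how MH_Road.pkl is stored.

```python
# Field names listed for MH_Road.pkl in the table above.
EXPECTED_FIELDS = ("OBJECTID", "NEAR_FID", "NEAR_POS", "NEAR_DIST", "SIDE")


def missing_near_fields(columns) -> list[str]:
    """Return expected near-table fields absent from `columns`, in table order."""
    present = set(columns)
    return [name for name in EXPECTED_FIELDS if name not in present]


if __name__ == "__main__":
    # Stand-in column list. With pandas one could instead pass
    # pandas.read_pickle("data/raw/mh_road/MH_Road.pkl").columns,
    # assuming the pickle holds a DataFrame (not confirmed by this README).
    example = ["OBJECTID", "NEAR_FID", "NEAR_DIST"]
    print(missing_near_fields(example))  # ['NEAR_POS', 'SIDE']
```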

If pump assets should be represented as graph nodes, revise process.py and regenerate the derived artifacts; otherwise the table above documents the current notebook-compatible preprocessing path.

Generated local artifacts:

  • data/interim/MH_proc.pkl
  • data/interim/Road_proc.pkl
  • data/interim/MH_R_RL_proc.pkl
  • data/interim/Line_proc.pkl
  • data/interim/R_R_proc.pkl
  • data/interim/split_mask.pkl
  • data/processed/graphs/train_data.pkl
  • data/processed/graphs/val_data.pkl
  • data/processed/graphs/test_data.pkl

Public GitHub releases should not attach these artifacts unless redistribution permission is documented.
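
A release checklist could include an automated guard against attaching such files. The sketch below flags paths by file extension only; the extension set is an assumption drawn from the file types named in this README, and `restricted_paths` is a hypothetical helper, not part of the documented release procedure.

```python
from pathlib import PurePosixPath

# Assumed markers of restricted artifacts: pickles, checkpoints, and the
# shapefile components named in this README. Adjust to the actual policy.
RESTRICTED_SUFFIXES = {".pkl", ".pt", ".shp", ".dbf", ".shx", ".prj"}


def restricted_paths(paths):
    """Return the subset of `paths` whose extension marks a data or model artifact."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() in RESTRICTED_SUFFIXES]


if __name__ == "__main__":
    candidate_files = [
        "README.md",
        "data/interim/MH_proc.pkl",
        "models/checkpoints/model1212_hiddensize_128_drop_00.pt",
        "scripts/build_graphs.py",
    ]
    for path in restricted_paths(candidate_files):
        print(f"do not attach: {path}")
```

Matching on extensions keeps the check independent of folder layout, at the cost of missing artifacts saved under other extensions.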

Running With Restricted Data

Readers can use the repository at two levels:

  1. Code and environment check: clone the repository, create the environment, install the package, and run scripts/verify_environment.py. This path does not require raw GIS files or model artifacts.
  2. Provider-data workflow: obtain GIS data directly from the relevant public data providers, follow their terms of use, place the files in the documented data/raw/ layout, then run process.py and scripts/build_graphs.py. Model evaluation requires locally generated graph artifacts and a checkpoint the reader is permitted to use.
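
For the provider-data workflow, the data/raw/ layout can be checked before running process.py. The following is a minimal sketch under stated assumptions: it looks only for the files and sidecar extensions named in this README, and `missing_raw_inputs` is a hypothetical helper.

```python
from pathlib import Path

# Shapefile locations, relative to data/raw/, from "Input Files For process.py".
SHAPEFILES = [
    "gis/sewer/SewerManholes_ExportFeatures.shp",
    "gis/sewer/SewerGravityMa_ExportFeature1.shp",
    "gis/sewer/SewerGravityMa_ExportFeature2.shp",
    "gis/sewer/SewersqlSewerP_ExportFeature.shp",
    "gis/roads/Roads_ExportFeatures.shp",
]
NEAR_TABLE = "mh_road/MH_Road.pkl"
SIDECARS = (".dbf", ".shx", ".prj")  # sidecars the README names explicitly


def missing_raw_inputs(raw_dir: Path) -> list[str]:
    """List expected raw inputs (including sidecars) not found under `raw_dir`."""
    missing = []
    for rel in SHAPEFILES:
        shp = raw_dir / rel
        for suffix in (".shp", *SIDECARS):
            candidate = shp.with_suffix(suffix)
            if not candidate.is_file():
                missing.append(candidate.relative_to(raw_dir).as_posix())
    if not (raw_dir / NEAR_TABLE).is_file():
        missing.append(NEAR_TABLE)
    return missing


if __name__ == "__main__":
    for rel in missing_raw_inputs(Path("data/raw")):
        print(f"missing: data/raw/{rel}")
```

An export may include further sidecars with the same base name; this check does not look for them.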

Tests

The pytest suite checks imports, path constants, and the checkpoint/metrics inventory:

pip install -e ".[test]"
pytest

Artifact-backed validation for local provider data is handled by scripts/verify_environment.py --load-data.

Releases

Tagged source releases are published at github.com/Yuxi0048/PipeNetworkCompletion/releases. Public releases contain source code and documentation. Data archives are not provided through this repository. The release procedure is documented in RELEASE.md, and the version history lives in CHANGELOG.md.

Run Notes

Evaluation uses LinkNeighborLoader with finite neighborhood sampling, matching the notebook workflow. The checkpoint evaluation script sets Python, NumPy, and PyTorch seeds and reports observed metrics beside the published metrics row.
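
Seeding the three random number generators typically looks like the sketch below. It is not the repository's own code: `set_seed` is an illustrative name, and the optional imports are an assumption made so the snippet also runs where NumPy or PyTorch is not installed.

```python
import random


def set_seed(seed: int) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)  # seeds NumPy's legacy global generator
    except ImportError:
        pass  # NumPy is not available in this environment
    try:
        import torch

        torch.manual_seed(seed)  # seeds PyTorch's default generators
    except ImportError:
        pass  # PyTorch is not available in this environment


if __name__ == "__main__":
    set_seed(0)
    first = random.random()
    set_seed(0)
    print(first == random.random())  # True
```

Seeding makes sampled neighborhoods repeatable on the same software stack; small metric differences across hardware or library versions remain possible.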

The archived notebook on origin/Legacy-Final records exploratory architecture variants. The maintained CLI focuses on checkpoint evaluation with the prepared graph artifacts.

Acknowledgments

We thank Urban Utilities for providing public access to high-quality utility network data, and Brisbane City Council Open Data for the public geospatial context used alongside those utility assets. Users are responsible for following provider terms when accessing or redistributing raw or derived data.

Citation

@inproceedings{10.22260/ISARC2024/0121,
  doi = {10.22260/ISARC2024/0121},
  year = {2024},
  month = {June},
  author = {Zhang, Yuxi and Cai, Hubo},
  title = {Underground Utility Network Completion based on Spatial Contextual Information of Ground Facilities and Utility Anchor Points using Graph Neural Networks},
  booktitle = {Proceedings of the 41st International Symposium on Automation and Robotics in Construction},
  isbn = {978-0-6458322-1-1},
  issn = {2413-5844},
  publisher = {International Association for Automation and Robotics in Construction (IAARC)},
  pages = {936-943},
  address = {Lille, France}
}

Contact: Yuxi Zhang, zhan2889@purdue.edu

Location Encoder Citation

The location encoder implementation is adapted from the space2vec/grid-cell spatial representation work:

@inproceedings{space2vec_iclr2020,
  title = {Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells},
  author = {Mai, Gengchen and Janowicz, Krzysztof and Yan, Bo and Zhu, Rui and Cai, Ling and Lao, Ni},
  booktitle = {The Eighth International Conference on Learning Representations},
  year = {2020},
  organization = {OpenReview}
}

About

Original implementation of the ISARC 2024 paper on underground utility network completion using spatial context and graph neural networks.
