Claude/finetune flood detection zq leg #3874

Open
VIncentmuyi wants to merge 141 commits into open-mmlab:main from VIncentmuyi:claude/finetune-flood-detection-ZqLeg

Conversation

@VIncentmuyi

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.

Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repos?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMDet3D.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

VIncentmuyi and others added 30 commits December 3, 2025 12:50
Modified the following config files, switching from 20000-iter training to 100-epoch training:
- Deeplabv3+UAVflood.py
- segformer_mit-b0_8xb1-160k_UAVflood-256x256.py
- Unet-Uavflood.py
- vit-Uavflood.py
- convnext-base-uavflood.py
- Swin-uavflood-256x256.py

Main changes:
1. param_scheduler: changed by_epoch from False to True; warmup now covers the first 5 epochs, with the main schedule running from epoch 5 to 100
2. train_cfg: switched from IterBasedTrainLoop to EpochBasedTrainLoop with max_epochs=100
3. default_hooks: made checkpoint and logger epoch-based, validating and saving a checkpoint every 10 epochs

https://claude.ai/code/session_01HTaghbFUmt7u1CcEGHvmpJ
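The iter-to-epoch switch described above can be sketched roughly as follows. Field names follow MMEngine conventions; the scheduler types and hyperparameter values here are illustrative assumptions, not copied from the PR's configs.

```python
# Hedged sketch of the iter->epoch training switch (MMEngine-style config).
# Exact scheduler types and values in the PR may differ.
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=100, val_interval=10)

param_scheduler = [
    # Linear warmup over the first 5 epochs.
    dict(type='LinearLR', start_factor=1e-6, by_epoch=True, begin=0, end=5),
    # Main schedule from epoch 5 to 100 (PolyLR chosen for illustration).
    dict(type='PolyLR', power=0.9, eta_min=0.0, by_epoch=True, begin=5, end=100),
]

default_hooks = dict(
    # Save a checkpoint every 10 epochs instead of every N iterations.
    checkpoint=dict(type='CheckpointHook', by_epoch=True, interval=10),
    logger=dict(type='LoggerHook', log_metric_by_epoch=True),
)
```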
Changes:
1. Removed the schedule_20k.py inheritance from _base_ in all 6 config files to avoid conflicts with the epoch-based training setup
2. Added the missing optim_wrapper config (SGD optimizer) to Deeplabv3+UAVflood.py
3. The other 5 files already define their own optim_wrapper, so no changes were needed

This keeps the config files fully self-contained and free of interference from the iter-based training settings.

https://claude.ai/code/session_01HTaghbFUmt7u1CcEGHvmpJ
…OSq'

# Conflicts:
#	configs/deeplabv3plus/Deeplabv3+UAVflood.py
…ampler

The training was not stopping at max_epochs=100 because InfiniteSampler
causes the dataset to loop infinitely. Changed to DefaultSampler to ensure
training stops correctly after 100 epochs and validation is triggered
at the configured intervals.

https://claude.ai/code/session_01BjJg5WLcsLWaZp3Vx2LV5f
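The sampler change described above amounts to a one-line config swap; a hedged sketch (assumed MMEngine field names, illustrative batch size):

```python
# InfiniteSampler never exhausts the dataset, so EpochBasedTrainLoop's
# epoch boundary is never reached. DefaultSampler yields exactly one
# pass over the dataset per epoch, so max_epochs=100 actually stops
# training and validation fires at the configured interval.
train_dataloader = dict(
    batch_size=8,  # illustrative value, not taken from the PR
    sampler=dict(type='DefaultSampler', shuffle=True),
)
```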
claude and others added 29 commits April 2, 2026 14:15
Same fix as the benchmark script: the model expects a list of 3-D tensors,
not a 4-D tensor with a batch dimension.

https://claude.ai/code/session_0135dmPP4TXG96XoRSSNvu5v
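A minimal illustration of the shape fix above: unbind the 4-D batch into a list of 3-D (C, H, W) tensors before handing it to the model. The shapes are illustrative.

```python
import torch

# What the script passed: a single 4-D (N, C, H, W) batch tensor.
batch_4d = torch.randn(2, 3, 256, 256)

# What the model expects: a list of 3-D (C, H, W) tensors.
inputs = [img for img in batch_4d]
assert all(t.dim() == 3 for t in inputs)
```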
…ents-NHeaL

Fix visualize_expert_routing.py: dummy input should be 3D (C,H,W)
- Update MODAL_DISPLAY to new dataset names (UrbanSARFlood, FloodNet, GF-FloodNet)
- Only show y-axis dataset labels on the leftmost subplot in Fig 4a and 4b
  to prevent long names from overlapping with adjacent subplots

https://claude.ai/code/session_0135dmPP4TXG96XoRSSNvu5v
…ents-NHeaL

Fix overlapping y-axis labels in expert routing figures
For new flood event data that only contains one modality (e.g. RGB 3-band),
overrides FixedRatioModalSampler with DefaultSampler and sets filter_modality.

https://claude.ai/code/session_0135dmPP4TXG96XoRSSNvu5v
…ents-NHeaL

Add single-modal fine-tuning config for generalization experiments
MMEngine config inheritance merges dicts recursively, so the old
FixedRatioModalSampler fields (modal_ratios, modal_order, etc.)
were leaking into the DefaultSampler. Using _delete_=True forces
a full replacement of the sampler dict.

https://claude.ai/code/session_0135dmPP4TXG96XoRSSNvu5v
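The `_delete_=True` fix described above looks roughly like this in the child config (a sketch using MMEngine's documented merge semantics; field names assumed):

```python
# Without _delete_=True, MMEngine merges this dict into the inherited
# FixedRatioModalSampler dict, so stale keys (modal_ratios, modal_order,
# ...) leak into the DefaultSampler. _delete_=True tells the config
# loader to replace the inherited sampler dict wholesale.
train_dataloader = dict(
    sampler=dict(_delete_=True, type='DefaultSampler', shuffle=True),
)
```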
…ents-NHeaL

Fix sampler override with _delete_=True to prevent merge leak
Reads a large GeoTIFF, tiles it into patches with overlap, runs
multi-modal model inference, and stitches results into a full-size
GeoTIFF. Overlapping regions are averaged. Supports rasterio and GDAL.
Output: flood=red(255,0,0), non-flood=black(0,0,0), preserves CRS/transform.

https://claude.ai/code/session_0135dmPP4TXG96XoRSSNvu5v
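The tile-and-stitch loop with overlap averaging can be sketched as below. `predict` is a hypothetical stand-in for running the model on one patch; the real script additionally reads/writes GeoTIFFs via rasterio or GDAL and preserves CRS/transform.

```python
import numpy as np

def stitch_tiles(h, w, tile, stride, predict):
    """Average-blend overlapping tile predictions into a full-size map.

    predict(y0, y1, x0, x1) -> (y1-y0, x1-x0) score array; a stand-in
    for model inference on one patch.
    """
    acc = np.zeros((h, w), dtype=np.float64)  # summed scores
    cnt = np.zeros((h, w), dtype=np.float64)  # coverage count
    for y0 in range(0, max(h - tile, 0) + 1, stride):
        for x0 in range(0, max(w - tile, 0) + 1, stride):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
            acc[y0:y1, x0:x1] += predict(y0, y1, x0, x1)
            cnt[y0:y1, x0:x1] += 1.0
    # Overlapping regions are averaged, matching the commit description.
    return acc / np.maximum(cnt, 1.0)
```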
…ents-NHeaL

Add large TIF tile-based inference script with stitching
Was using per-tile min-max normalization, but training uses ImageNet-style
mean/std normalization (MultiModalNormalize). This mismatch caused the
model to see completely different input distributions, producing mostly
non-flood predictions.

Fix: use the exact same mean/std values from MultiModalNormalize for
each modality (rgb, sar, GF). Also added normalization debug output.

https://claude.ai/code/session_0135dmPP4TXG96XoRSSNvu5v
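The mismatch above is the difference between these two transforms. The mean/std below are the standard ImageNet values, used purely for illustration; the authoritative per-modality values live in MultiModalNormalize.NORM_CONFIGS.

```python
import numpy as np

# Illustrative ImageNet-style stats; the real values come from
# MultiModalNormalize.NORM_CONFIGS per modality (rgb, sar, GF).
mean = np.array([123.675, 116.28, 103.53])
std = np.array([58.395, 57.12, 57.375])

def normalize_like_training(img):
    # Matches the training pipeline: fixed mean/std, stable across tiles.
    return (img - mean) / std

def minmax_per_tile(img):
    # The buggy inference-time scaling: the output distribution depends
    # on each tile's own min/max, so the model sees unfamiliar inputs.
    return (img - img.min()) / (img.max() - img.min() + 1e-8)
```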
…ents-NHeaL

Fix critical normalization bug in large TIF inference
Fig 4a/4c/4d now load actual test images through the dataset pipeline
(with proper normalization), capturing true data-driven routing patterns.
Random noise only reflects modal_bias; real images show the combined
effect of input features + modal_bias on expert routing decisions.
Falls back to random noise if no images found for a modality.

https://claude.ai/code/session_0135dmPP4TXG96XoRSSNvu5v
…ents-NHeaL

Use real test images instead of random noise for routing analysis
- Skip LoadAnnotations (labels not needed for routing analysis)
- Use minimal pipeline: LoadImage -> Resize(256) -> Normalize -> Pack
- Set test_cfg to 'whole' mode (avoid slide_inference overhead)
- Remove seg_map_path dependency
- Print debug info on first pipeline failure

https://claude.ai/code/session_0135dmPP4TXG96XoRSSNvu5v
…ents-NHeaL

Fix routing visualization to properly load real test images
Introduce a Sen1Floods11Dataset that loads the S1Hand (2-band SAR) or
S2Hand (13-band Sentinel-2 MSI) subdirectories paired with LabelHand
masks. A companion LoadSen1Floods11Annotation transform decodes the
signed-int label TIFFs via tifffile and remaps the -1 nodata value to
the standard 255 ignore index so CrossEntropyLoss skips those pixels.

Wire the new 's1' (2ch) and 's2' (13ch) modalities into
MultiModalNormalize.NORM_CONFIGS with sensible defaults, and ship a
tools/compute_sen1floods11_stats.py helper that can recompute mean/std
from any split (nodata-masked) for users who want dataset-specific
statistics.

Two new finetune configs (finetune_sen1floods11_s1.py and
finetune_sen1floods11_s2.py) inherit the existing freeze-backbone /
retrain-stem-and-decoder recipe but override modal_configs,
training_modals, and dataset_names so the pretrained Swin body is
reused while the stem conv and decode head retrain from scratch for
the new sensor. Shape-mismatched modal-specific weights from the
pretrained ckpt are dropped by mmengine's strict=False loader.
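The nodata remap in the annotation transform described above is simple to sketch (hypothetical helper name; the real transform also handles TIFF decoding via tifffile):

```python
import numpy as np

def decode_label(label):
    """Remap Sen1Floods11's signed-int nodata value (-1) to the standard
    255 ignore index so CrossEntropyLoss skips those pixels."""
    label = label.astype(np.int64)
    label[label == -1] = 255
    return label.astype(np.uint8)
```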
…ion-training-SFpKO

Add Sen1Floods11 S1/S2 fine-tune configs and dataset
SparseDispatcher used torch.nonzero(gates) (which treats NaN as nonzero
since NaN != 0) to build _batch_index, but counted (gates > 0) (which
excludes NaN) for _part_sizes. When any gate became NaN, torch.split
crashed because the split sizes no longer summed to the dispatched
tensor. This surfaced while fine-tuning finetune_sen1floods11_s1.py:
the freshly-initialized 2-channel s1 patch embed can emit zero-norm
pooled features, and F.normalize(0) -> NaN propagated through the
softmax/scatter into gates.

Two fixes:
1. CosineTopKGate: pass eps=1e-6 to F.normalize so zero-norm vectors
   no longer produce NaN logits in the first place.
2. SparseDispatcher: sanitize gates with nan_to_num as a safety net,
   and use a single positive_mask for both _batch_index and
   _part_sizes so the two can never disagree again.
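Both fixes can be sketched in a few lines (illustrative tensors; the real code lives in CosineTopKGate and SparseDispatcher):

```python
import torch
import torch.nn.functional as F

# Fix 1: eps keeps zero-norm vectors from producing NaN logits.
x = torch.zeros(2, 8)                       # zero-norm pooled features
logits = F.normalize(x, dim=-1, eps=1e-6)   # 0 / max(0, eps) -> all zeros
assert torch.isfinite(logits).all()

# Fix 2: sanitize gates, then derive BOTH the batch index and the split
# sizes from one positive_mask so they can never disagree.
gates = torch.tensor([[0.7, 0.0], [float('nan'), 0.3]])
gates = torch.nan_to_num(gates)             # safety net: NaN -> 0
positive_mask = gates > 0
batch_index = positive_mask.nonzero()[:, 0]
part_sizes = positive_mask.sum(dim=0).tolist()
# torch.split(part_sizes) now always matches the dispatched tensor.
assert sum(part_sizes) == batch_index.numel()
```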
Sen1Floods11 S1Hand TIFFs (and many other SAR / MSI products) encode
nodata pixels as NaN or +/-Inf inside the raster. The pipeline loaded
them with tifffile and fed them straight into

    img = (img - mean) / std

which propagated NaN through the first conv, the Swin body, and the
CosineTopKGate, turning every loss term (CE head + MoE balance + aux
head) into NaN from the very first training step. During
finetune_sen1floods11_s1 the user saw loss: nan from epoch 1 and
Flood IoU pinned at 0 (the model only predicted Background).

tools/compute_sen1floods11_stats.py already masks out non-finite
S1Hand pixels when computing the stats, confirming this is a known
property of the source data; the runtime pipeline just wasn't doing
the same filtering.

Fix: in MultiModalNormalize.transform, replace non-finite pixels with
the per-channel mean (so they normalize to 0) before doing the actual
(img - mean) / std, and then clip the result to +/-10 sigma as a
safety net for products that use a finite sentinel like -9999 instead
of NaN. +/-10 sigma is well outside any legitimate value for the
rgb / sar / GF / s1 / s2 / multispectral configs, so real data is
untouched.
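The sanitize-then-normalize fix can be sketched as follows (hypothetical helper name; mean/std are per-channel (C,) arrays for an (H, W, C) image):

```python
import numpy as np

def sanitize_and_normalize(img, mean, std, clip_sigma=10.0):
    """Replace non-finite pixels with the per-channel mean (so they
    normalize to exactly 0), then clip to +/-clip_sigma as a safety net
    for finite sentinels like -9999. Sketch of the fix described above."""
    img = img.astype(np.float64).copy()
    bad = ~np.isfinite(img)                  # NaN / +Inf / -Inf pixels
    img = np.where(bad, np.broadcast_to(mean, img.shape), img)
    out = (img - mean) / std
    return np.clip(out, -clip_sigma, clip_sigma)
```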
Adds the missing end-to-end setup for fine-tuning the Swin+MoE backbone
on Sen1Floods11 with either S1Hand (2-band SAR) or S2Hand (13-band MSI)
against the shared LabelHand masks.

tools/setup_sen1floods11.py (new):
  One-shot setup that reads data/Sen1Floods11/LabelHand/, writes
  deterministic 70/15/15 train/val/test splits to
  data/Sen1Floods11/splits/{train,val,test}.txt (MD5-hashed basenames
  so re-running is idempotent), and then computes per-channel mean/std
  for s1 and s2 using only the training split - with NaN/Inf pixels
  and label == -1 pixels masked out, matching how MultiModalNormalize
  now sanitizes inputs at runtime. Prints a copy-pasteable
  NORM_CONFIGS snippet so the shipped defaults can be refreshed for
  the actual on-disk data.
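The MD5-hashed, idempotent split assignment can be sketched like this (hypothetical function name; the real script's bucketing may differ in detail):

```python
import hashlib

def assign_split(basename, ratios=(0.70, 0.15, 0.15)):
    """Deterministic 70/15/15 split from an MD5 hash of the tile
    basename: the same file always lands in the same split, so
    re-running the setup script is idempotent."""
    h = int(hashlib.md5(basename.encode()).hexdigest(), 16) % 10_000
    frac = h / 10_000
    if frac < ratios[0]:
        return 'train'
    if frac < ratios[0] + ratios[1]:
        return 'val'
    return 'test'
```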

configs/floodnet/finetune_sen1floods11_s1.py &
configs/floodnet/finetune_sen1floods11_s2.py:
  * Wire ann_file='splits/train.txt' / 'splits/val.txt' /
    'splits/test.txt' into the three dataloaders. Previously every
    dataloader scanned the full S1Hand / S2Hand directory, which meant
    train / val / test all saw the same tiles and any reported val
    mIoU was a training-set metric. BaseSegDataset._join_prefix
    resolves the ann_file relative to data_root, so no absolute paths
    in the config.
  * Drop RandomResize from the train pipeline. All Sen1Floods11 tiles
    are already 512x512; the inherited (2048, 512) + 0.5..2.0 range
    came from a FloodNet config and added no value for this dataset.
    New pipeline is Load -> LoadAnn -> RandomCrop(256) ->
    RandomFlip -> MultiModalNormalize -> MultiModalPad -> Pack, with
    LoadAnn before RandomCrop so cat_max_ratio=0.75 can reject
    all-background crops.
  * Val / test batch_size dropped to 1 (sliding-window inference on
    512x512 tiles spawns 9 crops per sample; batch 16 pushed 144
    crops through the network at once for no speed benefit).
  * Docstrings now explain the required setup order: run
    tools/setup_sen1floods11.py once to produce the splits (and
    optionally the stats), then launch tools/train.py.

Both configs share the same splits so S1 vs S2 results can be
compared on identical tile sets.
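A hedged fragment of the config changes above (transform names and crop sizes follow the commit description; other field values are assumptions, not copied from the PR):

```python
# Sketch of the fine-tune config wiring. ann_file is resolved relative
# to data_root by BaseSegDataset._join_prefix, so no absolute paths.
train_dataloader = dict(
    dataset=dict(
        ann_file='splits/train.txt',
        pipeline=[
            dict(type='LoadImage'),
            # LoadAnn BEFORE RandomCrop so cat_max_ratio can reject
            # all-background crops.
            dict(type='LoadSen1Floods11Annotation'),
            dict(type='RandomCrop', crop_size=(256, 256), cat_max_ratio=0.75),
            dict(type='RandomFlip', prob=0.5),
            dict(type='MultiModalNormalize'),
            dict(type='MultiModalPad', size=(256, 256)),
            dict(type='PackSegInputs'),
        ],
    ))
# Sliding-window inference on 512x512 tiles spawns 9 crops per sample,
# so batch_size=1 keeps only 9 crops in flight at once.
val_dataloader = dict(batch_size=1, dataset=dict(ann_file='splits/val.txt'))
```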
Remaps the two-color palette saved by SegVisualizationHook:
  black [0,0,0]     (Background) -> #7c7c7c [124,124,124]
  red   [255,0,0]   (Flood)      -> #000bc5 [0,11,197]

Supports in-place overwrite or --dst output directory,
single files, and recursive directory processing.

https://claude.ai/code/session_01GgpRbsrDW4KRenqK8x8pQf
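The palette remap above (RGB triples taken from the commit message) reduces to a per-color mask-and-assign; a minimal sketch with a hypothetical function name:

```python
import numpy as np

# Remap described in the commit: Background black -> #7c7c7c,
# Flood red -> #000bc5.
COLOR_MAP = {
    (0, 0, 0): (124, 124, 124),
    (255, 0, 0): (0, 11, 197),
}

def remap_colors(img):
    """Remap an (H, W, 3) uint8 prediction image. Masks are built
    against the original image, so remaps never chain."""
    out = img.copy()
    for src, dst in COLOR_MAP.items():
        mask = np.all(img == np.array(src, dtype=img.dtype), axis=-1)
        out[mask] = dst
    return out
```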
Problem: model predicts class 0/1 for ALL pixels including nodata
regions (label=-1). Nodata pixels falsely shown as Flood in output.

Training loss and IoUMetric already correctly ignore label=255 via
their respective ignore_index defaults, so metrics are unaffected.

Fix:
- SegVisualizationHook._mask_nodata(): sets pred=255 where gt==255
  before visualization. Called after evaluator.process() so metrics
  stay correct. _draw_sem_seg filters class>=num_classes, so nodata
  pixels are left uncolored (black).
- remap_pred_colors.py: add --label-dir to load GT TIFFs and paint
  nodata pixels a distinct color (default: white #ffffff), so they
  are visually separable from Background.

https://claude.ai/code/session_01GgpRbsrDW4KRenqK8x8pQf
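The core of the `_mask_nodata` fix is a single masked assignment; a sketch with a hypothetical standalone function:

```python
import numpy as np

def mask_nodata(pred, gt, ignore_index=255):
    """Set pred to the ignore index wherever the GT is nodata, so
    _draw_sem_seg (which skips class >= num_classes) leaves those
    pixels uncolored. Must run AFTER evaluator.process() so metrics
    are unaffected."""
    pred = pred.copy()
    pred[gt == ignore_index] = ignore_index
    return pred
```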
Per user preference, nodata and background share the same color.
Both render as black [0,0,0] after SegVisualizationHook._mask_nodata,
and both get remapped to #7c7c7c by the existing COLOR_MAP entry.
No GT-label loading needed.

https://claude.ai/code/session_01GgpRbsrDW4KRenqK8x8pQf
@CLAassistant

CLAassistant commented Apr 17, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ claude
❌ VIncentmuyi