feat(pt, dpmodel): use data stat for observed type #5269
iProzd wants to merge 11 commits into deepmodeling:master from
Conversation
for more information, see https://pre-commit.ci
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough

Adds observed-type support: a BaseModel accessor to read metadata, stat utilities to collect/save/restore observed types, plumbing for a preset_observed_type through atomic/model layers, training persistence into model metadata, inference preferring metadata, and tests covering the flows.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Training as Training Pipeline
    participant Model as AtomicModel/Model
    participant Sampler as Data Sampler
    participant StatFile as Stat File
    participant Metadata as Model Metadata
    participant Inference as Inference Engine

    Training->>Model: compute_or_load_stat(preset_observed_type?)
    alt preset provided
        Model->>Metadata: set observed_type (preset)
    else no preset
        Model->>StatFile: try restore observed_type
        alt found
            StatFile-->>Model: observed_type
            Model->>Metadata: set observed_type
        else not found
            Model->>Sampler: sample data
            Sampler-->>Model: samples
            Model->>Model: collect_observed_types(samples)
            Model->>StatFile: save observed_type
            Model->>Metadata: set observed_type
        end
    end

    Inference->>Model: get_observed_types()
    Model->>Metadata: check info.observed_type
    alt metadata available
        Metadata-->>Inference: return observed_type
    else no metadata
        Model->>Model: get_observed_type_list()
        Model-->>Inference: return fallback observed_type
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deepmd/pt/model/model/model.py (1)
1-1: 🛠️ Refactor suggestion | 🟠 Major

Fix ruff linting violations before merge.
The repository currently fails ruff checks with numerous violations including:
- Unused function/method arguments (ARG001, ARG002, ARG004)
- Mutable default arguments (B006)
- Missing stacklevel in warnings (B028)
- Long exception messages outside exception class (TRY003)
- Unused unpacked variables (RUF059)
- Unused local variables (F841)
- And other code quality issues
Per the mandatory coding guideline for `**/*.py` files, run `ruff check .` and `ruff format .` to identify and fix all violations before committing changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/pt/model/model/model.py` at line 1, Run ruff locally (ruff check .) and ruff format . to identify and auto-fix style issues, then manually address the remaining violations in deepmd/pt/model/model/model.py: remove or use unused function/method arguments (ARG001/ARG002/ARG004) or prefix with underscore, replace mutable default args (B006) with None and set inside functions, add stacklevel to warnings.warn calls (B028), construct exception messages inside the exception class or pass the message to the exception initializer (TRY003), eliminate unused unpacked variables (RUF059) and unused local variables (F841), and resolve any other ruff-reported issues; ensure edits target the specific functions/methods/classes in this module that ruff flags and re-run ruff until all violations are cleared before committing.
🧹 Nitpick comments (7)
deepmd/dpmodel/model/base_model.py (1)
145-157: Remove redundant `json` import.

The `json` module is already imported at the module level (line 3), making the in-function import at line 151 unnecessary.

♻️ Proposed fix

```diff
 def get_observed_type_list(self) -> list[str]:
     """Get observed types from model metadata.

     Returns empty list if not available.
     """
     if self.model_def_script:
-        import json
-
         params = json.loads(self.model_def_script)
         observed = params.get("info", {}).get("observed_type")
         if observed is not None:
             return observed
     return []
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/dpmodel/model/base_model.py` around lines 145 - 157, In get_observed_type_list remove the in-function "import json" since json is already imported at module level; simply use json.loads(self.model_def_script) directly in the method (retain the existing variable names params and observed) and keep the same return behavior when observed is None.

deepmd/pt/infer/deep_eval.py (1)
739-753: Consider consistent sorting across both paths.

The metadata path returns `observed_type_list` directly without sorting, while the fallback path applies `sort_element_type()`. This could lead to inconsistent ordering depending on how the model was created or loaded.

If the metadata is expected to already be sorted (because it was sorted when saved), this is fine. Otherwise, consider applying `sort_element_type()` to both paths for consistency:

♻️ Suggested change for consistent sorting

```diff
 if observed_type_list is not None:
     return {
         "type_num": len(observed_type_list),
-        "observed_type": observed_type_list,
+        "observed_type": sort_element_type(observed_type_list),
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/pt/infer/deep_eval.py` around lines 739 - 753, The metadata branch returns observed_type_list from self.model_def_script without sorting while the fallback uses sort_element_type(), causing inconsistent ordering; update the metadata path in the method that inspects self.model_def_script (the observed_type_list variable) to also pass the list through sort_element_type() before returning so both branches return a consistently sorted "observed_type" and matching "type_num" (refer to sort_element_type and dp.model["Default"].get_observed_type_list for the fallback behavior).

deepmd/dpmodel/utils/stat.py (1)
29-56: Consider logging or raising on out-of-range type indices.

Line 55 silently filters out indices that are `>= len(type_map)`. While this is defensive, it could hide data corruption or configuration issues where `atype` contains invalid type indices. Consider adding a warning log when this occurs:

♻️ Suggested improvement

```diff
+# Check for invalid indices
+invalid_indices = [i for i in sorted(observed_indices) if i >= len(type_map)]
+if invalid_indices:
+    log.warning(
+        f"Found atom type indices {invalid_indices} >= len(type_map)={len(type_map)}, ignoring."
+    )
 observed_types = [type_map[i] for i in sorted(observed_indices) if i < len(type_map)]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/dpmodel/utils/stat.py` around lines 29 - 56, In collect_observed_types, currently indices >= len(type_map) are silently dropped; update the function to detect any out-of-range indices from atype (use observed_indices and len(type_map)) and emit a warning (via the project logger or warnings.warn) listing the invalid indices and the originating system info (e.g., include an index or brief system identifier) so configuration/data issues are visible; do not change the existing filtering behavior beyond logging, but consider raising an exception instead if a strict mode flag is set.

source/tests/consistent/test_observed_type.py (1)
48-51: Consider adding a test for malformed JSON.

Currently, `get_observed_type_list` would raise a `json.JSONDecodeError` if `model_def_script` contains invalid JSON. You may want to add a test to verify this behavior or consider if the method should handle this gracefully.

🧪 Optional test case

```python
def test_malformed_json(self) -> None:
    model = self._make_model_with_script("{invalid json")
    with self.assertRaises(json.JSONDecodeError):
        model.get_observed_type_list()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/consistent/test_observed_type.py` around lines 48 - 51, Add a test that verifies get_observed_type_list raises json.JSONDecodeError for invalid JSON input: create a new test method (e.g., test_malformed_json) that uses _make_model_with_script("{invalid json") and asserts that calling get_observed_type_list() raises json.JSONDecodeError (use with self.assertRaises(json.JSONDecodeError)). Ensure the test imports json if needed and sits alongside test_empty_script in the same TestCase.

deepmd/pt/model/atomic_model/dp_atomic_model.py (1)
368-370: Avoid retaining caller-owned list for `preset_observed_type`.

Store a copy to prevent accidental external mutation from changing model state.

🧩 Suggested fix

```diff
 if preset_observed_type is not None:
-    self._observed_type = preset_observed_type
+    self._observed_type = list(preset_observed_type)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/pt/model/atomic_model/dp_atomic_model.py` around lines 368 - 370, The code currently assigns the caller-owned list directly to self._observed_type (when preset_observed_type is not None), which allows external mutation to affect model state; change the assignment in the dp_atomic_model (class/method around preset_observed_type handling in dp_atomic_model.py) to store a defensive copy (e.g., use preset_observed_type.copy() or list(preset_observed_type)) so the model keeps its own independent list instance rather than a reference to the caller's list.

source/tests/pt/test_observed_type.py (1)
245-253: TearDown cleanup is too broad for cwd-level file deletion.

Deleting all `model*.pt` files and fixed names from the working directory is brittle and may remove unrelated artifacts. Prefer per-test temp directories and scoped cleanup.

Also applies to: 301-308
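The scoped-cleanup pattern suggested here can look like the following standard-library sketch (the class and test names are invented for illustration, not taken from the PR):

```python
import os
import tempfile
import unittest


class TestWithScopedTmp(unittest.TestCase):
    def setUp(self) -> None:
        # Every artifact the test produces goes under a per-test directory...
        self._tmp = tempfile.TemporaryDirectory()
        self.workdir = self._tmp.name

    def tearDown(self) -> None:
        # ...so cleanup removes exactly that directory and nothing else,
        # instead of pattern-matching files in the current working directory.
        self._tmp.cleanup()

    def test_writes_artifact(self) -> None:
        path = os.path.join(self.workdir, "model.pt")
        with open(path, "w") as f:
            f.write("dummy")
        self.assertTrue(os.path.exists(path))
```

With pytest, the built-in `tmp_path` fixture achieves the same isolation without an explicit tearDown.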
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/pt/test_observed_type.py` around lines 245 - 253, The tearDown method currently removes files by scanning the current working directory (using os.listdir and patterns like files starting with "model" and ending with "pt" or fixed names), which is too broad; change tests to create and use a scoped temporary directory (e.g., via tempfile.TemporaryDirectory or pytest's tmp_path) for any files produced by tests and update the test logic and the tearDown method (or better, use context managers/fixtures instead of tearDown) to clean only that directory rather than calling os.listdir(".") and deleting global filenames like "lcurve.out", "frozen_model.pth", "output.txt", "checkpoint", or removing "stat_files"; ensure all file operations in the test and functions under test accept or are pointed to the temp directory so cleanup is deterministic and file deletion is limited in scope.

deepmd/pt/model/model/model.py (1)
33-49: Document the new `preset_observed_type` argument in the method docstring.

The signature changed, but the parameter list still omits this field.
📘 Suggested doc update

```diff
 Parameters
 ----------
 sampled_func
     The sampled data frames from different data systems.
 stat_file_path
     The path to the statistics files.
+preset_observed_type
+    Optional observed element types provided by user/config. If given,
+    implementations should prefer this value over restoring or recomputing.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/pt/model/model/model.py` around lines 33 - 49, The docstring for the method that now accepts the parameter preset_observed_type is missing documentation for that argument; update the method docstring (the same docstring that describes sampled_func and stat_file_path and sits on the function with return type "-> NoReturn") to add a concise description of preset_observed_type (type: list[str] | None), what values it expects, and how it affects behavior (e.g., when provided it specifies which observed descriptor types to treat as pre-set and thus used/validated when computing or loading statistics; when None default behavior applies), keeping style consistent with the existing Parameters section and referencing stat_file_path and sampled_func where relevant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepmd/pt/utils/stat.py`:
- Around line 45-49: The __all__ list in this module is not alphabetically
sorted (RUF022); reorder the entries in __all__ so they are sorted
lexicographically (e.g., "_restore_observed_type_from_file",
"_save_observed_type_to_file", "collect_observed_types") to satisfy ruff linting
and ensure ruff check . passes; update the existing __all__ variable that
currently lists "collect_observed_types", "_restore_observed_type_from_file",
"_save_observed_type_to_file".
In `@source/tests/pt/test_observed_type.py`:
- Around line 240-244: The test currently guards loading with if
observed_file.exists(), allowing silent success when the cached file is missing;
update the test_stat_file_caching test to assert the file exists before loading
(or remove the conditional so np.load runs unconditionally) and then proceed to
load data and assert keys "H" and "O" exist, referencing the observed_file
variable and the test_stat_file_caching test function to locate where to add
self.assertTrue(observed_file.exists(), ...) or remove the if-check so failures
surface properly.
---
Outside diff comments:
In `@deepmd/pt/model/model/model.py`:
- Line 1: Run ruff locally (ruff check .) and ruff format . to identify and
auto-fix style issues, then manually address the remaining violations in
deepmd/pt/model/model/model.py: remove or use unused function/method arguments
(ARG001/ARG002/ARG004) or prefix with underscore, replace mutable default args
(B006) with None and set inside functions, add stacklevel to warnings.warn calls
(B028), construct exception messages inside the exception class or pass the
message to the exception initializer (TRY003), eliminate unused unpacked
variables (RUF059) and unused local variables (F841), and resolve any other
ruff-reported issues; ensure edits target the specific functions/methods/classes
in this module that ruff flags and re-run ruff until all violations are cleared
before committing.
---
Nitpick comments:
In `@deepmd/dpmodel/model/base_model.py`:
- Around line 145-157: In get_observed_type_list remove the in-function "import
json" since json is already imported at module level; simply use
json.loads(self.model_def_script) directly in the method (retain the existing
variable names params and observed) and keep the same return behavior when
observed is None.
In `@deepmd/dpmodel/utils/stat.py`:
- Around line 29-56: In collect_observed_types, currently indices >=
len(type_map) are silently dropped; update the function to detect any
out-of-range indices from atype (use observed_indices and len(type_map)) and
emit a warning (via the project logger or warnings.warn) listing the invalid
indices and the originating system info (e.g., include an index or brief system
identifier) so configuration/data issues are visible; do not change the existing
filtering behavior beyond logging, but consider raising an exception instead if
a strict mode flag is set.
In `@deepmd/pt/infer/deep_eval.py`:
- Around line 739-753: The metadata branch returns observed_type_list from
self.model_def_script without sorting while the fallback uses
sort_element_type(), causing inconsistent ordering; update the metadata path in
the method that inspects self.model_def_script (the observed_type_list variable)
to also pass the list through sort_element_type() before returning so both
branches return a consistently sorted "observed_type" and matching "type_num"
(refer to sort_element_type and dp.model["Default"].get_observed_type_list for
the fallback behavior).
In `@deepmd/pt/model/atomic_model/dp_atomic_model.py`:
- Around line 368-370: The code currently assigns the caller-owned list directly
to self._observed_type (when preset_observed_type is not None), which allows
external mutation to affect model state; change the assignment in the
dp_atomic_model (class/method around preset_observed_type handling in
dp_atomic_model.py) to store a defensive copy (e.g., use
preset_observed_type.copy() or list(preset_observed_type)) so the model keeps
its own independent list instance rather than a reference to the caller's list.
In `@deepmd/pt/model/model/model.py`:
- Around line 33-49: The docstring for the method that now accepts the parameter
preset_observed_type is missing documentation for that argument; update the
method docstring (the same docstring that describes sampled_func and
stat_file_path and sits on the function with return type "-> NoReturn") to add a
concise description of preset_observed_type (type: list[str] | None), what
values it expects, and how it affects behavior (e.g., when provided it specifies
which observed descriptor types to treat as pre-set and thus used/validated when
computing or loading statistics; when None default behavior applies), keeping
style consistent with the existing Parameters section and referencing
stat_file_path and sampled_func where relevant.
In `@source/tests/consistent/test_observed_type.py`:
- Around line 48-51: Add a test that verifies get_observed_type_list raises
json.JSONDecodeError for invalid JSON input: create a new test method (e.g.,
test_malformed_json) that uses _make_model_with_script("{invalid json") and
asserts that calling get_observed_type_list() raises json.JSONDecodeError (use
with self.assertRaises(json.JSONDecodeError)). Ensure the test imports json if
needed and sits alongside test_empty_script in the same TestCase.
In `@source/tests/pt/test_observed_type.py`:
- Around line 245-253: The tearDown method currently removes files by scanning
the current working directory (using os.listdir and patterns like files starting
with "model" and ending with "pt" or fixed names), which is too broad; change
tests to create and use a scoped temporary directory (e.g., via
tempfile.TemporaryDirectory or pytest's tmp_path) for any files produced by
tests and update the test logic and the tearDown method (or better, use context
managers/fixtures instead of tearDown) to clean only that directory rather than
calling os.listdir(".") and deleting global filenames like "lcurve.out",
"frozen_model.pth", "output.txt", "checkpoint", or removing "stat_files"; ensure
all file operations in the test and functions under test accept or are pointed
to the temp directory so cleanup is deterministic and file deletion is limited
in scope.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- deepmd/dpmodel/model/base_model.py
- deepmd/dpmodel/utils/stat.py
- deepmd/entrypoints/show.py
- deepmd/pt/infer/deep_eval.py
- deepmd/pt/model/atomic_model/base_atomic_model.py
- deepmd/pt/model/atomic_model/dp_atomic_model.py
- deepmd/pt/model/model/make_model.py
- deepmd/pt/model/model/model.py
- deepmd/pt/train/training.py
- deepmd/pt/utils/stat.py
- source/tests/consistent/test_observed_type.py
- source/tests/pt/test_observed_type.py
```python
if observed_file.exists():
    data = np.load(str(observed_file), allow_pickle=True)
    self.assertIn("H", data.tolist())
    self.assertIn("O", data.tolist())
```
`test_stat_file_caching` can pass even when caching is broken.

The `if observed_file.exists()` guard suppresses failure when the file is missing, so this test can silently succeed.
✅ Suggested fix

```diff
 observed_file = stat_base / "observed_type"
-if observed_file.exists():
-    data = np.load(str(observed_file), allow_pickle=True)
-    self.assertIn("H", data.tolist())
-    self.assertIn("O", data.tolist())
+self.assertTrue(
+    observed_file.exists(),
+    "observed_type cache file was not created",
+)
+data = np.load(str(observed_file), allow_pickle=True)
+self.assertIn("H", data.tolist())
+self.assertIn("O", data.tolist())
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
self.assertTrue(
    observed_file.exists(),
    "observed_type cache file was not created",
)
data = np.load(str(observed_file), allow_pickle=True)
self.assertIn("H", data.tolist())
self.assertIn("O", data.tolist())
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@source/tests/pt/test_observed_type.py` around lines 240 - 244, The test
currently guards loading with if observed_file.exists(), allowing silent success
when the cached file is missing; update the test_stat_file_caching test to
assert the file exists before loading (or remove the conditional so np.load runs
unconditionally) and then proceed to load data and assert keys "H" and "O"
exist, referencing the observed_file variable and the test_stat_file_caching
test function to locate where to add self.assertTrue(observed_file.exists(),
...) or remove the if-check so failures surface properly.
Pull request overview
This pull request implements automatic collection and persistence of observed atom types during model training, using data statistics rather than relying solely on bias-based detection. The feature collects which element types actually appear in the training data and stores this information in the model metadata, with support for stat file caching and user presets.
Changes:
- Adds `collect_observed_types()`, `_save_observed_type_to_file()`, and `_restore_observed_type_from_file()` functions to collect and persist observed types from training data
- Integrates observed type collection into the training pipeline with support for stat file caching and user-specified overrides
- Updates inference code to prioritize metadata-based observed types over bias-based fallback methods
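The collection step in the first bullet boils down to deduplicating the atom-type indices seen across sampled frames and mapping them through the type map. A rough sketch, under the assumption that each sample exposes an integer `atype` sequence (the `_sketch` suffix marks this as an illustration, not the actual implementation):

```python
def collect_observed_types_sketch(sampled, type_map):
    """Collect element names whose type indices appear in any sampled frame."""
    observed_indices = set()
    for frame in sampled:
        for idx in frame["atype"]:
            observed_indices.add(int(idx))
    # Drop indices outside the type map; the review above suggests
    # at least warning when this happens instead of silently filtering.
    return [type_map[i] for i in sorted(observed_indices) if i < len(type_map)]


frames = [{"atype": [0, 0, 1]}, {"atype": [1, 5]}]  # index 5 is out of range
types = collect_observed_types_sketch(frames, ["H", "O", "C"])
```

Sorting by index before mapping keeps the result deterministic across runs, which matters once the list is persisted into model metadata.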
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| source/tests/pt/test_observed_type.py | Comprehensive test suite for observed type collection, persistence, and integration with training pipeline |
| source/tests/consistent/test_observed_type.py | Tests for dpmodel's metadata-based observed type retrieval |
| deepmd/dpmodel/utils/stat.py | Backend-agnostic implementation of observed type collection and file I/O functions |
| deepmd/pt/utils/stat.py | Re-exports observed type functions from dpmodel for use in PyTorch backend |
| deepmd/pt/model/atomic_model/base_atomic_model.py | Adds _observed_type attribute initialization to base atomic model |
| deepmd/pt/model/atomic_model/dp_atomic_model.py | Implements observed type collection with priority: user preset > stat file > computed from data |
| deepmd/pt/model/model/model.py | Adds preset_observed_type parameter to compute_or_load_stat interface |
| deepmd/pt/model/model/make_model.py | Passes through preset_observed_type parameter to atomic model |
| deepmd/pt/train/training.py | Integrates observed type collection into training, persisting results to model_params and model_def_script |
| deepmd/pt/infer/deep_eval.py | Updates get_observed_types() to prioritize metadata over bias-based fallback |
| deepmd/entrypoints/show.py | Updates show command to read observed types from metadata when available |
| deepmd/dpmodel/model/base_model.py | Adds get_observed_type_list() method to retrieve observed types from model metadata |
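The priority order described for dp_atomic_model.py (user preset > stat file > computed from data) can be illustrated as a small resolver. The callable parameters are hypothetical stand-ins for `_restore_observed_type_from_file`, `collect_observed_types`, and `_save_observed_type_to_file`:

```python
def resolve_observed_type(preset, restore_from_file, compute_from_data, save_to_file):
    """Resolve observed types with priority: preset > cached stat file > data."""
    if preset is not None:
        return list(preset)  # defensive copy, per the review comment above
    cached = restore_from_file()
    if cached is not None:
        return cached
    computed = compute_from_data()
    save_to_file(computed)  # cache for the next run
    return computed


saved = []
result = resolve_observed_type(
    preset=None,
    restore_from_file=lambda: None,        # no stat file yet
    compute_from_data=lambda: ["H", "O"],
    save_to_file=saved.append,
)
```

Note that the expensive data pass and the save only happen on a cache miss with no preset, which is what makes the stat-file caching worth testing.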
Comments suppressed due to low confidence (2)
deepmd/pt/model/atomic_model/dp_atomic_model.py:334
- The docstring for `compute_or_load_stat` is missing documentation for the new `preset_observed_type` parameter. All parameters should be documented in the Parameters section to maintain consistency with the rest of the codebase and help users understand its purpose.
```python
def compute_or_load_stat(
    self,
    sampled_func: Callable[[], list[dict]],
    stat_file_path: DPPath | None = None,
    compute_or_load_out_stat: bool = True,
    preset_observed_type: list[str] | None = None,
) -> None:
    """
    Compute or load the statistics parameters of the model,
    such as mean and standard deviation of descriptors or the energy bias of the fitting net.
    When `sampled` is provided, all the statistics parameters will be calculated (or re-calculated for update),
    and saved in the `stat_file_path`(s).
    When `sampled` is not provided, it will check the existence of `stat_file_path`(s)
    and load the calculated statistics parameters.

    Parameters
    ----------
    sampled_func
        The lazy sampled function to get data frames from different data systems.
    stat_file_path
        The dictionary of paths to the statistics files.
    compute_or_load_out_stat : bool
        Whether to compute the output statistics.
        If False, it will only compute the input statistics (e.g. mean and standard deviation of descriptors).
    """
```
deepmd/pt/model/model/model.py:49
- The docstring for `compute_or_load_stat` is missing documentation for the new `preset_observed_type` parameter. All parameters should be documented in the Parameters section to maintain consistency with the codebase and help users understand its purpose.
```python
def compute_or_load_stat(
    self,
    sampled_func: Any,
    stat_file_path: DPPath | None = None,
    preset_observed_type: list[str] | None = None,
) -> NoReturn:
    """
    Compute or load the statistics parameters of the model,
    such as mean and standard deviation of descriptors or the energy bias of the fitting net.
    When `sampled` is provided, all the statistics parameters will be calculated (or re-calculated for update),
    and saved in the `stat_file_path`(s).
    When `sampled` is not provided, it will check the existence of `stat_file_path`(s)
    and load the calculated statistics parameters.

    Parameters
    ----------
    sampled_func
        The sampled data frames from different data systems.
    stat_file_path
        The path to the statistics files.
    """
```
```python
os.remove(f)
if f in ["stat_files"]:
    shutil.rmtree(f)
```
The condition `f.endswith("pt")` omits the dot, so it also matches filenames that merely end in the letters "pt" (for example names ending in "ckpt"), not only `.pt` files; note that it does not match `.pth` files such as "frozen_model.pth", since `endswith` checks a literal suffix. This pattern is inconsistent with the rest of the codebase, which uses `f.endswith(".pt")` with the dot, and could cause the tearDown to attempt to remove files that should be handled separately by the explicit check on line 305.
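For reference, `str.endswith` tests a literal suffix, so the dot changes which names match. This is plain Python semantics, demonstrated on illustrative filenames rather than the repository's actual artifacts:

```python
names = ["model.pt", "model.ckpt", "frozen_model.pth"]

# Without the dot, any name ending in the letters "pt" matches,
# including "model.ckpt"; ".pth" names match neither pattern.
matches_pt = [f for f in names if f.endswith("pt")]

# With the dot, only genuine ".pt" files match.
matches_dot_pt = [f for f in names if f.endswith(".pt")]
```

This is why the review recommends the dotted form `f.endswith(".pt")` for cleanup patterns.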
♻️ Duplicate comments (1)
source/tests/pt/test_observed_type.py (1)
236-239: ⚠️ Potential issue | 🟡 Minor

`test_stat_file_caching` can pass even when caching is broken.

The `if observed_file.exists()` guard suppresses failure when the file is missing, so this test can silently succeed without actually validating the caching behavior.

🔧 Proposed fix

```diff
 observed_file = stat_base / "observed_type"
-if observed_file.exists():
-    data = np.load(str(observed_file), allow_pickle=True)
-    self.assertIn("H", data.tolist())
-    self.assertIn("O", data.tolist())
+self.assertTrue(
+    observed_file.exists(),
+    "observed_type cache file was not created",
+)
+data = np.load(str(observed_file), allow_pickle=True)
+self.assertIn("H", data.tolist())
+self.assertIn("O", data.tolist())
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/pt/test_observed_type.py` around lines 236 - 239, The test test_stat_file_caching currently guards the assertions with if observed_file.exists(), which masks missing-file failures; remove the conditional and instead assert the file exists (using observed_file.exists() or self.assertTrue) before loading, then load with np.load and assert the expected keys ("H", "O") from data.tolist(); update references in this test to use the observed_file variable and the existing np.load/data.tolist assertions so the test fails if caching did not create the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@source/tests/pt/test_observed_type.py`:
- Around line 236-239: The test test_stat_file_caching currently guards the
assertions with if observed_file.exists(), which masks missing-file failures;
remove the conditional and instead assert the file exists (using
observed_file.exists() or self.assertTrue) before loading, then load with
np.load and assert the expected keys ("H", "O") from data.tolist(); update
references in this test to use the observed_file variable and the existing
np.load/data.tolist assertions so the test fails if caching did not create the
file.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- deepmd/dpmodel/utils/stat.py
- deepmd/pt/infer/deep_eval.py
- deepmd/pt/train/training.py
- deepmd/pt/utils/stat.py
- source/tests/consistent/test_observed_type.py
- source/tests/pt/test_observed_type.py
🚧 Files skipped from review as they are similar to previous changes (1)
- source/tests/consistent/test_observed_type.py
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #5269      +/-   ##
==========================================
+ Coverage   82.12%   82.15%   +0.02%
==========================================
  Files         753      753
  Lines       75825    75921      +96
  Branches     3648     3649       +1
==========================================
+ Hits        62275    62374      +99
+ Misses      12382    12381       -1
+ Partials     1168     1166       -2
```

☔ View full report in Codecov by Sentry.
for more information, see https://pre-commit.ci
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deepmd/pt/model/atomic_model/pairtab_atomic_model.py (1)
229-257: ⚠️ Potential issue | 🟠 Major

Use `preset_observed_type` and persist fallback observed types.

`preset_observed_type` (Line 234) is currently unused, and `self._observed_type` is never set in this path. That weakens the new observed-type propagation contract and also triggers Ruff ARG002.

🔧 Proposed fix
```diff
 def compute_or_load_stat(
     self,
     sampled_func: Callable[[], list[dict]] | list[dict],
     stat_file_path: DPPath | None = None,
     compute_or_load_out_stat: bool = True,
     preset_observed_type: list[str] | None = None,
 ) -> None:
 @@
     if compute_or_load_out_stat:
         self.compute_or_load_out_stat(sampled_func, stat_file_path)
+
+    from deepmd.dpmodel.utils.stat import (
+        _restore_observed_type_from_file,
+        _save_observed_type_to_file,
+        collect_observed_types,
+    )
+
+    if preset_observed_type is not None:
+        self._observed_type = preset_observed_type
+    else:
+        observed = _restore_observed_type_from_file(stat_file_path)
+        if observed is None:
+            sampled = sampled_func() if callable(sampled_func) else sampled_func
+            observed = collect_observed_types(sampled, self.type_map)
+            _save_observed_type_to_file(stat_file_path, observed)
+        self._observed_type = observed
```

As per coding guidelines:
`**/*.py`: Always run `ruff check .` and `ruff format .` before committing changes or CI will fail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/pt/model/atomic_model/pairtab_atomic_model.py` around lines 229 - 257, compute_or_load_stat currently ignores the preset_observed_type parameter and never updates self._observed_type, causing the observed-type propagation contract to break and raising a lint ARG002; update compute_or_load_stat so that it uses preset_observed_type (if provided) to set self._observed_type before any stat computation or loading, and when preset_observed_type is None ensure you persist or fall back to an existing observed type (e.g., keep prior self._observed_type or derive it from loaded stats) so both branches (when compute_or_load_out_stat is True and when False) always result in a valid self._observed_type value; refer to the compute_or_load_stat method, the preset_observed_type parameter, and self._observed_type to locate where to assign and persist the observed-type.
🧹 Nitpick comments (1)
deepmd/pt/model/atomic_model/base_atomic_model.py (1)
370-395: Document the new `preset_observed_type` argument in the abstract API docstring. Line 375 introduces a public-facing parameter, but the Parameters section does not describe it yet. Add it to keep interface docs in sync.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/pt/model/atomic_model/base_atomic_model.py` around lines 370 - 395, Update the compute_or_load_stat docstring Parameters section to document the new preset_observed_type argument: state that preset_observed_type (list[str] | None, default None) is an optional list of observed statistic names/types to force as the set of statistics to compute or load (when provided only those observed types will be processed; when None behavior falls back to auto-detection from the merged data or existing stat files). Mention its effect on compute_or_load_out_stat if relevant and keep wording concise; place this description alongside the other parameter entries in the compute_or_load_stat docstring.
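A minimal sketch of the requested Parameters entry, in the NumPy docstring style the codebase uses elsewhere (the wording is a suggestion, not the merged text, and the signature is abbreviated):

```python
def compute_or_load_stat(
    self,
    sampled_func,
    stat_file_path=None,
    compute_or_load_out_stat=True,
    preset_observed_type=None,
) -> None:
    """Compute or load the statistics parameters of the model.

    Parameters
    ----------
    preset_observed_type : list[str] | None, optional
        Observed types to force for this model. When provided, these types
        are recorded directly; when None, the observed types are restored
        from the stat file or collected from the sampled data. Default: None.
    """
```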
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- deepmd/pt/model/atomic_model/base_atomic_model.py
- deepmd/pt/model/atomic_model/linear_atomic_model.py
- deepmd/pt/model/atomic_model/pairtab_atomic_model.py
- deepmd/pt/model/model/spin_model.py
- source/tests/pt/test_observed_type.py
🚧 Files skipped from review as they are similar to previous changes (1)
- source/tests/pt/test_observed_type.py
🧹 Nitpick comments (1)
deepmd/dpmodel/model/base_model.py (1)
145-155: Consider adding defensive error handling for JSON parsing. If `model_def_script` contains a non-empty but malformed JSON string, `json.loads()` will raise `JSONDecodeError`. Since this method is used in inference paths, an unhandled exception could disrupt the workflow.

🛡️ Suggested defensive handling

```diff
 def get_observed_type_list(self) -> list[str]:
     """Get observed types from model metadata.

     Returns empty list if not available.
     """
     if self.model_def_script:
-        params = json.loads(self.model_def_script)
-        observed = params.get("info", {}).get("observed_type")
-        if observed is not None:
-            return observed
+        try:
+            params = json.loads(self.model_def_script)
+            observed = params.get("info", {}).get("observed_type")
+            if isinstance(observed, list):
+                return observed
+        except json.JSONDecodeError:
+            pass
     return []
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/dpmodel/model/base_model.py` around lines 145 - 155, get_observed_type_list currently calls json.loads(self.model_def_script) without handling malformed JSON; wrap the JSON parsing in a try/except to catch json.JSONDecodeError (and optionally ValueError) and return [] on failure, logging or storing the parse error if needed; locate this logic in get_observed_type_list and ensure it still returns observed when parsing succeeds and an empty list when self.model_def_script is falsy or parsing fails.
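The fallback behavior the prompt describes can be exercised in isolation. `ModelStub` below is a stand-in with the same `model_def_script` attribute shape, not the real `BaseModel`:

```python
import json


class ModelStub:
    """Stand-in exposing model_def_script the way BaseModel does."""

    def __init__(self, model_def_script: str) -> None:
        self.model_def_script = model_def_script

    def get_observed_type_list(self) -> list[str]:
        """Return observed types from metadata, or [] on any parse failure."""
        if self.model_def_script:
            try:
                params = json.loads(self.model_def_script)
                observed = params.get("info", {}).get("observed_type")
                if isinstance(observed, list):
                    return observed
            except json.JSONDecodeError:
                # Malformed metadata must not break inference.
                pass
        return []


print(ModelStub('{"info": {"observed_type": ["H", "O"]}}').get_observed_type_list())
print(ModelStub("not json").get_observed_type_list())  # malformed JSON → []
print(ModelStub("").get_observed_type_list())          # empty script → []
```

The `isinstance(observed, list)` check additionally guards against a scalar or null value stored under the key, which the original `is not None` test would pass through.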
```python
def test_with_observed_type_in_info(self) -> None:
    script = json.dumps(
        {
            "info": {"observed_type": ["H", "O"]},
```
Is it a good design to add "observed_type" to the "info" block? According to the doc, "info" is "Used only in multitask models".
By design, "info" should be a metadata dict suitable for both single- and multi-task models. Doc updated.
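As a concrete illustration of that design, the observed types can ride along in the shared "info" metadata for any model. The keys are the ones discussed in this thread; the surrounding dict is a sketch, not the actual training code:

```python
import json

# Training side: record observed types in the "info" metadata block.
model_params = {
    "type_map": ["H", "C", "O"],
    "info": {"observed_type": ["H", "O"]},
}
model_def_script = json.dumps(model_params)

# Inference side: read them back, falling back to an empty list.
restored = json.loads(model_def_script).get("info", {}).get("observed_type", [])
print(restored)  # ['H', 'O']
```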
Signed-off-by: Duo <50307526+iProzd@users.noreply.github.com>
Summary by CodeRabbit
New Features
Tests