Skip to content

Conversation

@jan-janssen
Copy link
Member

@jan-janssen jan-janssen commented Nov 8, 2025

Summary by CodeRabbit

  • New Features
    • Added functionality to query and retrieve LAMMPS interatomic potentials by name or atomic structure
    • Enhanced potential data management capabilities with improved validation and resource path handling
    • Potential lookup features now accessible at the package level

@coderabbitai
Copy link

coderabbitai bot commented Nov 8, 2025

Walkthrough

This PR introduces a new potential management framework for LAMMPS, adding a comprehensive module with classes and utility functions for loading, querying, and mapping interatomic potentials, while exposing key query functions at the package level.

Changes

Cohort / File(s) Summary
Package-level exports
pyiron_lammps/__init__.py
Added imports for get_potential_by_name and get_potential_dataframe from potential module to expose them at package level
Potential management framework
pyiron_lammps/potential.py
New module introducing PotentialAbstract base class, LammpsPotentialFile subclass, and PotentialAvailable wrapper for potential access; utility functions for resource path discovery, dataframe manipulation, potential filtering, and path normalization; functions get_potential_dataframe, get_potential_by_name, and validate_potential_dataframe for querying potentials

Sequence Diagram

sequenceDiagram
    participant User
    participant get_potential_by_name
    participant get_resource_path_from_conda
    participant LammpsPotentialFile
    participant CSV as Potential CSV
    participant validate_potential_dataframe

    User->>get_potential_by_name: request by name
    get_potential_by_name->>get_resource_path_from_conda: discover resource path
    get_resource_path_from_conda-->>get_potential_by_name: resource path
    get_potential_by_name->>LammpsPotentialFile: load potentials
    LammpsPotentialFile->>CSV: read potential dataframe
    CSV-->>LammpsPotentialFile: dataframe
    LammpsPotentialFile->>LammpsPotentialFile: update_potential_paths
    LammpsPotentialFile-->>get_potential_by_name: filtered potential
    get_potential_by_name->>validate_potential_dataframe: validate result
    validate_potential_dataframe-->>get_potential_by_name: validated
    get_potential_by_name-->>User: potential record
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Multiple classes with inheritance patterns: Review PotentialAbstract base class design, inheritance flow in LammpsPotentialFile, and dynamic attribute/item access implementations (__getattr__, __getitem__)
  • Resource path discovery logic: Verify conda environment variable handling in get_resource_path_from_conda and path resolution in find_potential_file_base across multiple fallback paths
  • Dataframe transformation pipeline: Examine update_potential_paths and associated converter logic for column transformations (Species, Config, Filename)
  • Input validation and error handling: Check robustness of type enforcement (str/list/set), error messages in validate_potential_dataframe, and exception handling throughout
  • Path normalization: Verify POSIX conversion and absolute path resolution in convert_path_to_abs_posix

Poem

🐰 A framework hops forth, organized and neat,
Potentials now dance in a dataframe sweet,
Resources discovered with conda's good grace,
Atoms find their pairs—potential's embrace!
Paths unfold, validations ring true,
The warren's first chemistry glue ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Migrate potential interface from atomistics to pyiron_lammps' accurately describes the main change: introducing a new potential interface module to pyiron_lammps by migrating functionality from the atomistics package.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch potential

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Nov 8, 2025

Codecov Report

❌ Patch coverage is 25.00000% with 105 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.74%. Comparing base (115a677) to head (5f29115).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pyiron_lammps/potential.py 24.46% 105 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #261       +/-   ##
===========================================
- Coverage   95.17%   79.74%   -15.43%     
===========================================
  Files           4        5        +1     
  Lines         497      637      +140     
===========================================
+ Hits          473      508       +35     
- Misses         24      129      +105     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (5)
pyiron_lammps/potential.py (5)

49-60: Consider explicit Optional type annotation.

Line 53 uses list[str] = None which implicitly creates an Optional type. PEP 484 recommends explicit Optional[list[str]] for clarity.

Apply this diff if desired:

     def __init__(
         self,
         potential_df: pandas.DataFrame,
         default_df: pandas.DataFrame = None,
-        selected_atoms: list[str] = None,
+        selected_atoms: Optional[list[str]] = None,
     ):

104-111: Consider spread operator for list construction.

Line 106 concatenates lists using +. The spread operator provides a more explicit alternative.

Apply this diff if desired:

-        selected_atoms = self._selected_atoms + [item]
+        selected_atoms = [*self._selected_atoms, item]

223-231: Consider spread operator for list construction.

Similar to the parent class, line 225 could use the spread operator for list construction.

Apply this diff if desired:

-        selected_atoms = self._selected_atoms + [item]
+        selected_atoms = [*self._selected_atoms, item]

287-304: Remove redundant backslash replacement.

Line 303 calls .replace("\\", "/") after .as_posix(), but as_posix() already converts all path separators to forward slashes, making the replacement redundant.

Apply this diff:

     return (
         Path(path.strip())
         .expanduser()
         .resolve()
         .absolute()
         .as_posix()
-        .replace("\\", "/")
     )

358-380: Consider shorter error messages or custom exception classes.

The validation logic is comprehensive and correct. However, the error messages on lines 367-370, 372-375, and 377-380 are quite verbose. While they provide helpful context, consider either:

  1. Creating custom exception classes with descriptive names, or
  2. Moving detailed messages to exception docstrings

This aligns with best practices for exception handling (TRY003).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d27297 and 5f29115.

📒 Files selected for processing (2)
  • pyiron_lammps/__init__.py (1 hunks)
  • pyiron_lammps/potential.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pyiron_lammps/__init__.py (1)
pyiron_lammps/potential.py (2)
  • get_potential_by_name (349-355)
  • get_potential_dataframe (340-346)
🪛 Ruff (0.14.3)
pyiron_lammps/potential.py

53-53: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


78-78: Avoid specifying long messages outside the exception class

(TRY003)


89-89: Avoid specifying long messages outside the exception class

(TRY003)


106-106: Consider [*self._selected_atoms, item] instead of concatenation

Replace with [*self._selected_atoms, item]

(RUF005)


213-213: Avoid specifying long messages outside the exception class

(TRY003)


225-225: Consider [*self._selected_atoms, item] instead of concatenation

Replace with [*self._selected_atoms, item]

(RUF005)


263-267: Avoid specifying long messages outside the exception class

(TRY003)


367-370: Avoid specifying long messages outside the exception class

(TRY003)


372-375: Avoid specifying long messages outside the exception class

(TRY003)


377-380: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unittest_matrix (windows-latest, 3.14)

@@ -1,5 +1,6 @@
import pyiron_lammps._version
from pyiron_lammps.output import parse_lammps_output as parse_lammps_output_files
from pyiron_lammps.potential import get_potential_by_name, get_potential_dataframe
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add new imports to __all__ or reconsider module-level import.

Line 3 imports get_potential_by_name and get_potential_dataframe at the module level, but they're not added to the __all__ list (lines 18-21). This creates ambiguity about whether these functions are part of the public API.

Recommendation: Add both functions to __all__ to explicitly expose them as part of the public API, consistent with the other exports.

Apply this diff:

 __all__ = [
+    "get_potential_by_name",
+    "get_potential_dataframe",
     "parse_lammps_output_files",
     "write_lammps_structure",
 ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from pyiron_lammps.potential import get_potential_by_name, get_potential_dataframe
__all__ = [
"get_potential_by_name",
"get_potential_dataframe",
"parse_lammps_output_files",
"write_lammps_structure",
]
🤖 Prompt for AI Agents
In pyiron_lammps/__init__.py around line 3, you import get_potential_by_name and
get_potential_dataframe at module level but did not add them to the module's
__all__ (lines ~18-21); update the __all__ list to include
"get_potential_by_name" and "get_potential_dataframe" so they are explicitly
exported as part of the public API, or alternatively remove the top-level import
and require users to import them from pyiron_lammps.potential if you do not
intend them to be public.

Comment on lines +8 to +34
potential_installation = """
Potential installation guide:
1. Check whether iprpy-data is installed. If not, install it using:
`conda install -c conda-forge iprpy-data`
2. Check whether the resource path is set via:
```python
import os
print(os.environ["CONDA_PREFIX"])
```
3. If the resource path is set, you can call the potential using:
```python
from atomistics.calculators import get_potential_by_name
get_potential_by_name(
potential_name=my_potential,
resource_path=os.path.join(os.environ["CONDA_PREFIX"], "share", "iprpy"),
)
```
"""
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Update module reference in installation guide.

The installation guide references atomistics.calculators on line 25, but this PR migrates the interface to pyiron_lammps. The import statement should be updated to reflect the correct module name.

Apply this diff:

-from atomistics.calculators import get_potential_by_name
+from pyiron_lammps import get_potential_by_name
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
potential_installation = """
Potential installation guide:
1. Check whether iprpy-data is installed. If not, install it using:
`conda install -c conda-forge iprpy-data`
2. Check whether the resource path is set via:
```python
import os
print(os.environ["CONDA_PREFIX"])
```
3. If the resource path is set, you can call the potential using:
```python
from atomistics.calculators import get_potential_by_name
get_potential_by_name(
potential_name=my_potential,
resource_path=os.path.join(os.environ["CONDA_PREFIX"], "share", "iprpy"),
)
```
"""
potential_installation = """
Potential installation guide:
1. Check whether iprpy-data is installed. If not, install it using:
`conda install -c conda-forge iprpy-data`
2. Check whether the resource path is set via:
🤖 Prompt for AI Agents
In pyiron_lammps/potential.py around lines 8 to 34, the installation guide
string references the old module name "atomistics.calculators" on the import
example; update that import to the new module "pyiron_lammps" and adjust the
subsequent usage example to call get_potential_by_name from pyiron_lammps (i.e.,
replace the import line and any fully-qualified calls in the string), keeping
the rest of the textual instructions unchanged.

Comment on lines +186 to +192
def default(self):
if self._default_df is not None:
atoms_str = "_".join(sorted(self._selected_atoms))
return self._default_df[
(self._default_df["Name"] == self._default_df.loc[atoms_str].values[0])
]
return None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add error handling for missing default potential.

Line 190 accesses self._default_df.loc[atoms_str] without checking if atoms_str exists in the index. This will raise a KeyError if the element combination has no default defined.

Apply this diff to add proper error handling:

     def default(self):
         if self._default_df is not None:
             atoms_str = "_".join(sorted(self._selected_atoms))
+            if atoms_str not in self._default_df.index:
+                return None
             return self._default_df[
                 (self._default_df["Name"] == self._default_df.loc[atoms_str].values[0])
             ]
         return None
🤖 Prompt for AI Agents
In pyiron_lammps/potential.py around lines 186 to 192, the code directly indexes
self._default_df.loc[atoms_str] which will raise a KeyError if atoms_str is not
present; update the method to first check whether self._default_df is not None
and atoms_str exists in self._default_df.index (or catch KeyError) and if
missing return None (or raise a clear ValueError with context), otherwise
proceed to select rows matching the stored default Name; ensure no unguarded
.loc access remains so the method never throws an uncaught KeyError.

Comment on lines +194 to +221
def find_default(
self, element: Union[set[str], list[str], str]
) -> pandas.DataFrame:
"""
Find the potentials
Args:
element (set, str): element or set of elements for which you want the possible LAMMPS potentials
path (bool): choose whether to return the full path to the potential or just the potential name
Returns:
list: of possible potentials for the element or the combination of elements
"""
if isinstance(element, list):
element = set(element)
elif isinstance(element, str):
element = {element}
elif not isinstance(element, set):
raise TypeError("Only, str, list and set supported!")
element_lst = list(element)
if self._default_df is not None:
merged_lst = list(set(self._selected_atoms + element_lst))
atoms_str = "_".join(sorted(merged_lst))
return self._default_df[
(self._default_df["Name"] == self._default_df.loc[atoms_str].values[0])
]
return None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add error handling for missing default potential.

Line 219 has the same issue as the default() method - accessing self._default_df.loc[atoms_str] without checking if the key exists in the index.

Apply this diff:

     def find_default(
         self, element: Union[set[str], list[str], str]
     ) -> pandas.DataFrame:
         """
         Find the potentials
 
         Args:
             element (set, str): element or set of elements for which you want the possible LAMMPS potentials
             path (bool): choose whether to return the full path to the potential or just the potential name
 
         Returns:
             list: of possible potentials for the element or the combination of elements
 
         """
         if isinstance(element, list):
             element = set(element)
         elif isinstance(element, str):
             element = {element}
         elif not isinstance(element, set):
             raise TypeError("Only, str, list and set supported!")
         element_lst = list(element)
         if self._default_df is not None:
             merged_lst = list(set(self._selected_atoms + element_lst))
             atoms_str = "_".join(sorted(merged_lst))
+            if atoms_str not in self._default_df.index:
+                return None
             return self._default_df[
                 (self._default_df["Name"] == self._default_df.loc[atoms_str].values[0])
             ]
         return None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def find_default(
self, element: Union[set[str], list[str], str]
) -> pandas.DataFrame:
"""
Find the potentials
Args:
element (set, str): element or set of elements for which you want the possible LAMMPS potentials
path (bool): choose whether to return the full path to the potential or just the potential name
Returns:
list: of possible potentials for the element or the combination of elements
"""
if isinstance(element, list):
element = set(element)
elif isinstance(element, str):
element = {element}
elif not isinstance(element, set):
raise TypeError("Only, str, list and set supported!")
element_lst = list(element)
if self._default_df is not None:
merged_lst = list(set(self._selected_atoms + element_lst))
atoms_str = "_".join(sorted(merged_lst))
return self._default_df[
(self._default_df["Name"] == self._default_df.loc[atoms_str].values[0])
]
return None
def find_default(
self, element: Union[set[str], list[str], str]
) -> pandas.DataFrame:
"""
Find the potentials
Args:
element (set, str): element or set of elements for which you want the possible LAMMPS potentials
path (bool): choose whether to return the full path to the potential or just the potential name
Returns:
list: of possible potentials for the element or the combination of elements
"""
if isinstance(element, list):
element = set(element)
elif isinstance(element, str):
element = {element}
elif not isinstance(element, set):
raise TypeError("Only, str, list and set supported!")
element_lst = list(element)
if self._default_df is not None:
merged_lst = list(set(self._selected_atoms + element_lst))
atoms_str = "_".join(sorted(merged_lst))
if atoms_str not in self._default_df.index:
return None
return self._default_df[
(self._default_df["Name"] == self._default_df.loc[atoms_str].values[0])
]
return None
🧰 Tools
🪛 Ruff (0.14.3)

213-213: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In pyiron_lammps/potential.py around lines 194 to 221, the code accesses
self._default_df.loc[atoms_str] without verifying that atoms_str is a valid
index key; update the logic to check whether atoms_str exists in
self._default_df.index (or wrap the access in a try/except KeyError) before
attempting to read .loc; if the key is missing, return None (or raise a clear
KeyError with a descriptive message) instead of letting an unhandled exception
occur, ensuring behavior matches the existing default() method.

Comment on lines +349 to +355
def get_potential_by_name(potential_name: str, resource_path=None):
if resource_path is None:
resource_path = get_resource_path_from_conda()
df = LammpsPotentialFile(resource_path=resource_path).list()
return update_potential_paths(
df_pot=df[df.Name == potential_name], resource_path=resource_path
).iloc[0]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add validation before accessing filtered result.

Line 354 filters by potential_name and immediately accesses .iloc[0] without checking if any results were found. If no matching potential exists, this will raise an IndexError instead of a meaningful error message.

Apply this diff:

 def get_potential_by_name(potential_name: str, resource_path=None):
     if resource_path is None:
         resource_path = get_resource_path_from_conda()
     df = LammpsPotentialFile(resource_path=resource_path).list()
-    return update_potential_paths(
-        df_pot=df[df.Name == potential_name], resource_path=resource_path
-    ).iloc[0]
+    filtered_df = df[df.Name == potential_name]
+    if len(filtered_df) == 0:
+        raise ValueError(f"Potential '{potential_name}' not found in database.")
+    return update_potential_paths(
+        df_pot=filtered_df, resource_path=resource_path
+    ).iloc[0]
🤖 Prompt for AI Agents
pyiron_lammps/potential.py around lines 349 to 355: the function filters df by
potential_name and directly returns .iloc[0], which will raise IndexError if no
match is found; modify the function to check if the filtered dataframe is empty
before accessing iloc[0], and if empty raise a clear ValueError (or custom
exception) stating the requested potential_name was not found and include either
a short list of available potential names or the resource_path used for lookup
so callers can diagnose the issue; if not empty, proceed to return the first row
as before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants