[wip][mt] Add a getter for matrix info. #11941
base: master
- Return meta info with shape.
- Use CuPy when the device is CUDA.
Pull request overview
This PR adds a new getter method for matrix information that returns data with proper shape information using the array interface protocol. The implementation automatically uses CuPy arrays when the device is CUDA, improving GPU interoperability.
Changes:
- Added a new `MetaField` enum and `MapMetaField` function to standardize field-name handling.
- Implemented a new `GetInfo` overload that returns array interface strings for device-aware data access.
- Added the `XGDMatrixGetArrayInfo` C API function to expose the new functionality.
- Refactored the Python getters (`get_label`, `get_weight`, `get_base_margin`) to use the new `_get_info` method.
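A minimal sketch of the consumer side, assuming the string returned for a field follows the key layout of NumPy's `__array_interface__` (host) or the CUDA `__cuda_array_interface__` (device): it parses the JSON, wraps the dict in an object that exposes the matching attribute, and lets `np.asarray` or `cupy.asarray` build an array with the recorded shape. The names `array_from_interface_str` and `_InterfaceHolder` are hypothetical; this is not the PR's `_get_info`.

```python
import json
from typing import Any, Dict

import numpy as np


class _InterfaceHolder:
    """Hypothetical wrapper exposing a parsed interface dict under `attr`."""

    def __init__(self, attr: str, interface: Dict[str, Any]) -> None:
        setattr(self, attr, interface)


def _normalize(raw: Dict[str, Any]) -> Dict[str, Any]:
    # JSON only has lists, but both protocols expect tuples for these keys.
    strides = raw.get("strides")
    return {
        "data": (int(raw["data"][0]), bool(raw["data"][1])),
        "shape": tuple(raw["shape"]),
        "strides": None if strides is None else tuple(strides),
        "typestr": raw["typestr"],
        "version": 3,
    }


def array_from_interface_str(interface_str: str, is_cuda: bool, copy: bool = True) -> Any:
    """Build a NumPy (host) or CuPy (CUDA) array from a JSON array-interface string."""
    raw = json.loads(interface_str)
    interface = _normalize(raw)
    if is_cuda:
        import cupy as cp  # imported lazily: only the CUDA path needs CuPy

        if "stream" in raw:
            # Assumption: forward the producer's stream as-is for synchronization.
            interface["stream"] = raw["stream"]
        arr = cp.asarray(_InterfaceHolder("__cuda_array_interface__", interface))
    else:
        arr = np.asarray(_InterfaceHolder("__array_interface__", interface))
    # The buffer belongs to the producer (e.g. the DMatrix); copying keeps the
    # result valid after the producer is freed or modified.
    return arr.copy() if copy else arr
```

Returning a copy by default trades a little memory for safety: a zero-copy view would dangle once the DMatrix that owns the buffer goes away.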
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/data/data.cc | Added MetaField enum, MapMetaField function, and new GetInfo overload for array interface; refactored existing GetInfo to use switch statements |
| src/c_api/c_api.cc | Added XGDMatrixGetArrayInfo function to expose new functionality through C API |
| python-package/xgboost/core.py | Added _get_info method and updated get_label, get_weight, get_base_margin to use it |
| include/xgboost/data.h | Added declaration for new GetInfo overload |
```diff
 def get_weight(self) -> np.ndarray:
-    """Get the weight of the DMatrix.
-    Returns
-    -------
-    weight : array
-    """
-    return self.get_float_info("weight")
+    """Get the weight of the DMatrix."""
+    return self._get_info("weight")

 def get_base_margin(self) -> np.ndarray:
-    """Get the base margin of the DMatrix.
-    Returns
-    -------
-    base_margin
-    """
-    return self.get_float_info("base_margin")
+    """Get the base margin of the DMatrix."""
+    return self._get_info("base_margin")
```
Copilot (AI), Jan 19, 2026:
The new behavior of get_weight and get_base_margin, returning CuPy arrays on CUDA devices, lacks test coverage. Add tests, similar to the existing test_metainfo in test_device_quantile_dmatrix.py, that verify these methods return CuPy arrays when the DMatrix is on a CUDA device.
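One way the requested coverage might look, sketched under the assumption (the scenario the comment points at) that a `QuantileDMatrix` built from CuPy inputs keeps its meta info on the CUDA device. The test name is hypothetical. `unittest.skipUnless` keeps the sketch standard-library-only; a pytest `skipif` marker would work equally well. Values are compared after `ravel()` because the PR returns meta info with its shape, which may be 2-D for labels and base margins.

```python
import importlib.util
import unittest


@unittest.skipUnless(importlib.util.find_spec("cupy") is not None, "requires CuPy and a CUDA device")
def test_get_weight_base_margin_return_cupy() -> None:
    import cupy as cp
    import xgboost as xgb

    rng = cp.random.RandomState(1994)
    rows, cols = 16, 3
    X = rng.randn(rows, cols)
    y = rng.randn(rows)
    w = rng.uniform(0.5, 1.5, size=rows)
    margin = rng.randn(rows)

    # Assumption: meta info built from CuPy inputs stays on the CUDA device,
    # so the refactored getters should hand back CuPy arrays.
    Xy = xgb.QuantileDMatrix(X, label=y, weight=w, base_margin=margin)

    pairs = (
        (Xy.get_label(), y),
        (Xy.get_weight(), w),
        (Xy.get_base_margin(), margin),
    )
    for got, expected in pairs:
        assert isinstance(got, cp.ndarray)
        # Meta info is stored as float32, so compare with a float32-friendly rtol.
        cp.testing.assert_allclose(got.ravel(), expected.ravel(), rtol=1e-5)
```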
```cpp
case MetaField::kGroupPtr: {
  aif = linalg::ArrayInterfaceStr(linalg::MakeVec(this->group_ptr_));
  break;
}
```
Copilot (AI), Jan 19, 2026:
The switch statement is missing a default case to handle unknown field names. This means if an invalid field is passed to GetInfo, it will silently return an empty string instead of providing a clear error message. Add a default case that logs a fatal error, similar to the switch statement at lines 655-683.
```cpp
  }
}
default: {
  LOG(FATAL) << "Unknown field name: " << key;
  break;
}
```
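The mapping-plus-default pattern can be illustrated with a small Python analogue of `MetaField` and `MapMetaField`. The field list is an illustrative subset and `map_meta_field` is a hypothetical name; the point is that an unknown key raises an error instead of yielding an empty string.

```python
from enum import Enum, auto


class MetaField(Enum):
    """Python stand-in for the PR's C++ MetaField enum (illustrative subset)."""

    LABEL = auto()
    WEIGHT = auto()
    BASE_MARGIN = auto()
    GROUP_PTR = auto()


_FIELD_NAMES = {
    "label": MetaField.LABEL,
    "weight": MetaField.WEIGHT,
    "base_margin": MetaField.BASE_MARGIN,
    "group_ptr": MetaField.GROUP_PTR,
}


def map_meta_field(key: str) -> MetaField:
    """Map a field name to its enum value, failing loudly on unknown names."""
    try:
        return _FIELD_NAMES[key]
    except KeyError:
        # Counterpart of `LOG(FATAL) << "Unknown field name: " << key;`
        raise ValueError(f"Unknown field name: {key}") from None
```

With this shape, a typo such as `"wieght"` surfaces as an error at the lookup rather than as a silently empty result further down.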
```cpp
XGB_DLL int XGDMatrixGetArrayInfo(DMatrixHandle handle, char const *field, char const **out_array) {
  API_BEGIN();
  CHECK_HANDLE();
  xgboost_CHECK_C_ARG_PTR(field);

  auto p_fmat = CastDMatrixHandle(handle);
  MetaInfo const &info = p_fmat->Info();

  auto &res = p_fmat->GetThreadLocal().ret_str;
  info.GetInfo(p_fmat->Ctx(), StringView{field}, &res);

  *out_array = res.c_str();
  API_END();
}
```
Copilot (AI), Jan 19, 2026:
The new C API function XGDMatrixGetArrayInfo is implemented but not declared in the public C API header file (include/xgboost/c_api.h). Add a function declaration with proper documentation following the same pattern as XGDMatrixGetFloatInfo and XGDMatrixGetUIntInfo.
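For context on how a binding might consume the new entry point, here is a hypothetical ctypes wrapper built only from the signature in the C API hunk, `(DMatrixHandle, char const *, char const **)`, and the usual C API convention of returning 0 on success and a nonzero value on failure. `get_array_info` is an illustrative name, and `lib` is assumed to be the loaded libxgboost. Because the returned pointer refers to `ret_str` in the DMatrix's thread-local storage, the wrapper decodes it immediately.

```python
import ctypes
import json
from typing import Any, Dict


def get_array_info(lib: Any, handle: ctypes.c_void_p, field: str) -> Dict[str, Any]:
    """Hypothetical wrapper: call XGDMatrixGetArrayInfo and parse the JSON result.

    `lib` is assumed to be the loaded libxgboost (ctypes.CDLL) and `handle`
    a DMatrixHandle.
    """
    out = ctypes.c_char_p()
    ret = lib.XGDMatrixGetArrayInfo(handle, field.encode("utf-8"), ctypes.byref(out))
    if ret != 0:
        # XGBGetLastError returns a C string describing the last failure.
        lib.XGBGetLastError.restype = ctypes.c_char_p
        raise RuntimeError(lib.XGBGetLastError().decode("utf-8"))
    # `out` points at a thread-local buffer owned by the DMatrix that a later
    # call may overwrite, so copy it into a Python object right away.
    return json.loads(out.value.decode("utf-8"))
```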
ref #9043