@foreverallama foreverallama commented Aug 20, 2025

Spinoff from #23 to read the undocumented datatype mxOPAQUE_CLASS. Part of a series of changes to read and write these types across formats v5 and v7.3.

Some context:

  • Subsystem data is written as a uint8 array, but its contents look like another embedded MAT-file that has to be decoded and read in turn. It needs to be parsed before reading any variables in the file.
  • mxOPAQUE_CLASS variables are written with the following headers: Flags, Variable Name, Type Name, Class Name and Metadata
  • MATLAB uses the same subsystem format for both v7 and v7.3 files, so starting with v7 is good enough.
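
As a rough illustration of the header list above, the five fields could be held in a small record like the one below. This is only a sketch: the struct name, field types, and the `is_mcos` helper are hypothetical and are not taken from MAT_v5.jl, and treating "MCOS" as a value of the Type Name field is an assumption based on the object types mentioned in this thread.

```julia
# Hypothetical container for the five mxOPAQUE_CLASS header fields.
# Field types are assumptions made for illustration only.
struct OpaqueHeader
    flags::Vector{UInt32}   # array flags
    variable_name::String   # name of the variable in the file
    type_name::String       # assumed to be e.g. "MCOS", "java" or "handle"
    class_name::String      # MATLAB class name, e.g. "datetime"
    metadata::Any           # object metadata, resolved against the subsystem
end

# Hypothetical helper: only MCOS objects would be resolved via the subsystem.
is_mcos(h::OpaqueHeader) = h.type_name == "MCOS"
```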

Edit:

Added a new file "MAT_subsys.jl" which contains methods for caching, parsing, and retrieving subsystem data to be assigned to an object. With this it should successfully load classdef objects. Additional context regarding how subsystem data is organized below:

  • MCOS subsystem data is a cell array tagged to a class called "FileWrapper__"
  • The first cell is a metadata array. It contains 9 blocks of metadata. Most of these blocks are to be interpreted as uint32 integers even though they are written as uint8
    -- Block 1 is a version indicator and some offset values
    -- Block 2 is a list of class and property names as uint8 integers (null terminated)
    -- Block 3 is a list of class IDs
    -- Blocks 4 and 6 contain metadata about how property names are linked to property values
    -- Block 5 is a list of object ID metadata
    -- Block 7 is a list of dynamic properties attached to the object
    -- Blocks 8 and 9 are unknown
  • Cell 2 is empty (probably reserved?)
  • Cells 3:end-3 hold property values (depending on the subsystem version, this range may extend to end-2)
  • The last 2 or 3 cells are some kind of shared class templates. Only the last cell is known - it contains default property values
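
The layout described above can be sketched with a few small helpers. These are hypothetical functions written for illustration, not the ones in MAT_subsys.jl; the sketch assumes little-endian data and treats the number of trailing template cells as a parameter, since it depends on the subsystem version.

```julia
# Hypothetical helpers illustrating the layout above; not the MAT_subsys.jl API.

# Reinterpret a uint8 metadata block as UInt32 words.
# Assumption: the bytes are little-endian; ltoh converts to host order.
function as_uint32(bytes::Vector{UInt8})
    length(bytes) % 4 == 0 || throw(ArgumentError("block length must be a multiple of 4"))
    return [ltoh(w) for w in reinterpret(UInt32, bytes)]
end

# Split a block of null-terminated class/property names into Strings.
# copy() is needed because String(::Vector{UInt8}) takes ownership of its input.
function split_names(bytes::Vector{UInt8})
    return String.(split(String(copy(bytes)), '\0'; keepempty=false))
end

# Partition the FileWrapper__ cell array into metadata, property values and
# trailing template cells. ntemplates is 2 or 3 depending on subsystem version.
function partition_cells(cells::Vector{Any}; ntemplates::Int=3)
    metadata  = cells[1]                       # cell 2 is skipped (empty)
    props     = cells[3:end-ntemplates]
    templates = cells[end-ntemplates+1:end]
    return metadata, props, templates
end
```

For example, `partition_cells(cells; ntemplates=2)` would cover the variant where property values run up to end-2.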

foreverallama commented Aug 21, 2025

With these changes, full support is added for loading classdef objects in MAT-files in both v5 and HDF5 formats. Classdef objects are returned as a Matrix{Dict{String, Any}}. Each Dict maps property names to property values, with an additional key __class__ containing the class name as a String.
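
To make the returned shape concrete, here is a hand-built stand-in for a 1x1 object array and two small accessors. The class name, property names, and helper functions are all made up for illustration; a real value would come from reading a MAT-file rather than being constructed by hand.

```julia
# Hand-built stand-in for a loaded classdef variable (hypothetical contents).
obj = Matrix{Dict{String,Any}}(undef, 1, 1)
obj[1, 1] = Dict{String,Any}(
    "__class__" => "MyClass",   # class name stored under the extra key
    "prop1"     => 1.0,
    "prop2"     => "hello",
)

# Hypothetical accessors: class name, and property names without the extra key.
class_name(o::Dict{String,Any}) = o["__class__"]
property_names(o::Dict{String,Any}) = sort!([k for k in keys(o) if k != "__class__"])
```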

The changes support different MAT-file and subsystem versions. They also support loading all types of MCOS classdef objects (which covers most opaque objects), including handle class objects. Other types I've seen are java and handle (for COM objects), which I don't know how to decode yet, but these are quite rare and probably extremely specific to MATLAB.

Some notes:

  1. I didn't add a separate test because test/v7.3/struct_table_datetime.mat already seems to contain several objects such as datetime, string, categorical, and table. I just updated the existing test there instead.

  2. I just copied the copyright notice template from a different file, but I'm not too sure about adding a copyright notice for MAT_subsys.jl since the code is derived from reverse engineering the file format. Maybe someone else can comment on this?

  3. For most classes, such as datetime or string, you will still need to decode the property map into usable information. It would be good to have some utility functions to do that. I've already documented most of it; I'll get to it some other time (or maybe someone else will take it up).

Edit:
Squashed and consolidated changes for readability. Adds support for loading mxOPAQUE_CLASS objects from both v7 and v7.3 formats. For a review, the main part of the code would be MAT_subsys.load_subsys! and MAT_subsys.load_mcos_object. I've also highlighted some parts I'm not sure about with FIXME or TODO

* MAT_subsys.jl: New file MAT_subsys with methods to set, parse and retrieve subsystem data
* MAT_v5.jl: New method "read_opaque" to handle mxOPAQUE_CLASS
* MAT_v5.jl: New method "read_subsystem" to handle subsystem data
* MAT.jl (matread): Update to clear subsystem and object cache after load

Support for loading mxOPAQUE_CLASS objects in v7.3 HDF5 format

* MAT_HDF5.jl (matopen): New argument Endian indicator, Reads and parses subsystem on load
* MAT_HDF5.jl (close): Update to write endian header based on system endianness
* MAT_HDF5.jl (m_read::HDF5.Dataset): Update to handle MATLAB_object_decode (mxOPAQUE_CLASS) types
* MAT_HDF5.jl (m_read::HDF5.Group): Update to read subsystem data and function_handles
* MAT.jl (matopen): Update function calls

Updated test for struct_table_datetime.mat to ensure accurate deserialization (including nested properties) in both v7 and v7.3 formats

* test/read.jl: Update tests for "function_handles.mat" and "struct_table_datetime.mat"