Add vs00 timed variable string events #102

geishm-ansto · 2025-09-04T04:13:59Z

Add a new flatbuffer description that is useful at ANSTO to support logging and variable strings in the nxs file writer.
The README file was updated and there are no breaking changes.

Approval Criteria

This PR should not be merged until the ECDC Group Leader (acting or permanent) has given their explicit approval in the comments section.
SCIPP/DRAM should also be consulted on changes which may affect them.

rerpha · 2025-10-06T12:14:58Z

Hi @geishm-ansto, we were thinking of doing something similar at ISIS (nexusformat/definitions#1432): first adding a new NeXus base class that supports strings in the same way NXLog works, then adding a flatbuffers schema here for EPICS string PV value updates. I don't know if this fits your use case too?

geishm-ansto · 2025-10-06T21:46:04Z

Hi @rerpha, it's possible that we could use it, but I would need to see the details. At the moment we have implemented vs00 in a local ANSTO variant of the streaming data types and nxs writer, but we didn't want to add a different Python package for reading the data type, so I raised the PR to see if there was any interest. We wanted to be able to capture system log messages during an experiment using a variable string format.

ggoneiESS · 2025-10-27T15:37:11Z

For some reason we didn't get a notification on this, so I'm adding a comment to stay updated. But yes, the current issue is writing it out in a NeXusy way.

ggoneiESS · 2026-01-08T19:24:21Z

I was reminded about this today.

Are either of you using this in the filewriter? I have worked with strings before in the modules there and it can be a bit difficult when using variable lengths.

An additional entry in the flatbuffer specifying the size of the string would be an improvement.
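
For context on the size question: FlatBuffers itself serialises a string as a 32-bit length prefix followed by the bytes and a terminating NUL, so a reader can recover the length without a separate field. The sketch below imitates that layout with standard C++ only. `encodeString` and `decodeString` are hypothetical helper names, not FlatBuffers API, and the vs00 schema is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative only: mimics a length-prefixed string layout
// (uint32 little-endian byte count, the bytes, then a trailing NUL).
// These helpers are hypothetical and are not part of FlatBuffers.
std::vector<std::uint8_t> encodeString(const std::string &Text) {
  const auto Length = static_cast<std::uint32_t>(Text.size());
  std::vector<std::uint8_t> Buffer;
  Buffer.reserve(4 + Text.size() + 1);
  // Write the length prefix, least significant byte first.
  for (int Shift = 0; Shift < 32; Shift += 8) {
    Buffer.push_back(static_cast<std::uint8_t>((Length >> Shift) & 0xFFu));
  }
  Buffer.insert(Buffer.end(), Text.begin(), Text.end());
  Buffer.push_back(0); // Terminator, not counted in Length.
  return Buffer;
}

std::string decodeString(const std::vector<std::uint8_t> &Buffer) {
  assert(Buffer.size() >= 5); // Prefix plus terminator at minimum.
  std::uint32_t Length = 0;
  for (int Index = 0; Index < 4; ++Index) {
    Length |= static_cast<std::uint32_t>(Buffer[Index]) << (8 * Index);
  }
  assert(Buffer.size() >= 4u + Length + 1u);
  // The length comes from the prefix, so embedded NULs would survive too.
  return std::string(Buffer.begin() + 4, Buffer.begin() + 4 + Length);
}
```

Calling `decodeString(encodeString("beam on"))` returns the original text, and the first four bytes of the encoded buffer hold its length. An explicit size field in the schema would therefore duplicate information the encoding already carries, although it might still be convenient for a writer that wants to pre-size fixed-length storage.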

It would also be good to know about use cases.

geishm-ansto · 2026-01-11T22:20:53Z

@ggoneiESS Hi, we have a local ANSTO branch of the filewriter, and within that I have added support for the 'vs00' flatbuffer. We use it primarily to record logging events. It required adding a VariableString class to the ExtensibleDataset component and a vs00 writer module.
I believe adding the string length is not necessary, as it is already managed at the lower level.

/// \brief An extensible one-dimensional dataset of variable-length strings.
class VariableString : public hdf5::node::ChunkedDataset {
public:
  VariableString() = default;
  /// \brief Create/open a variable-length string dataset.
  ///
  /// \param Parent The group/node where the dataset is to be located.
  /// \param Name The name of the dataset.
  /// \param CMode Should the dataset be opened or created.
  /// \param ChunkSize The number of strings in one chunk.
  VariableString(const hdf5::node::Group &Parent, std::string Name, Mode CMode,
                 size_t ChunkSize = 1024);

  /// \brief Append a new string to the dataset array
  ///
  /// \param InString The string that is to be appended to the dataset.
  void appendStringElement(std::string const &InString);

private:
  hdf5::datatype::String StringType;
  size_t NrOfStrings{0};
};

VariableString::VariableString(const hdf5::node::Group &Parent,
                               std::string Name, Mode CMode, size_t ChunkSize)
    : hdf5::node::ChunkedDataset(),
      StringType(hdf5::datatype::String::variable()) {

  if (Mode::Create == CMode) {
    // Start empty and allow unlimited growth along the single dimension.
    hdf5::Dimensions ChunkDims{ChunkSize};
    hdf5::dataspace::Simple Space({0}, {hdf5::dataspace::Simple::unlimited});
    Dataset::operator=(hdf5::node::ChunkedDataset(
        Parent, Name, StringType, Space, ChunkDims));
  } else if (Mode::Open == CMode) {
    // Resume appending after the strings that are already stored.
    Dataset::operator=(Parent.get_dataset(Name));
    NrOfStrings = static_cast<size_t>(dataspace().size());
  } else {
    throw std::runtime_error(
        "VariableString::VariableString(): Unknown mode.");
  }
}

void VariableString::appendStringElement(std::string const &InString) {
  Dataset::extent(0, 1); // Extend by 1 element along dimension 0
  // Select the newly added last element and write the string into it.
  hdf5::dataspace::Hyperslab Selection{{NrOfStrings}, {1}};
  write(InString, Selection);
  ++NrOfStrings;
}
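
The append logic above follows an extend-then-write pattern: grow the dataset by one element, write the new string at the old end index, then bump the counter. A minimal in-memory sketch of that pattern, assuming a hypothetical `InMemoryStringDataset` stand-in rather than the h5cpp classes (neither class below exists in the filewriter):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a 1-D extensible dataset.
// It is not part of h5cpp or the filewriter.
class InMemoryStringDataset {
public:
  // Grow the dataset by Delta elements along its single dimension.
  void extent(std::size_t Delta) { Elements.resize(Elements.size() + Delta); }

  // Write one element at Offset; the slot must already exist.
  void write(std::string const &Value, std::size_t Offset) {
    assert(Offset < Elements.size());
    Elements[Offset] = Value;
  }

  std::size_t size() const { return Elements.size(); }
  std::string const &at(std::size_t Index) const { return Elements.at(Index); }

private:
  std::vector<std::string> Elements;
};

// Mirrors the bookkeeping of appendStringElement(): the counter starts at
// the current size (the "open" case) and always points one past the end.
class StringAppender {
public:
  explicit StringAppender(InMemoryStringDataset &Target)
      : Dataset(Target), NrOfStrings(Target.size()) {}

  void appendStringElement(std::string const &InString) {
    Dataset.extent(1);                    // Make room for one more element.
    Dataset.write(InString, NrOfStrings); // Fill the slot just created.
    ++NrOfStrings;
  }

private:
  InMemoryStringDataset &Dataset;
  std::size_t NrOfStrings;
};
```

Appending strings of different lengths, including an empty one, needs no per-element size argument here because each `std::string` carries its own length. Constructing a second `StringAppender` on a dataset that already holds data continues from the existing size, which corresponds to the `Mode::Open` branch.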

rerpha · 2026-01-12T08:35:44Z

We aren't using it currently, but may do so in the future for generic string diagnostics, even if/when nexusformat/definitions#1590 is accepted and we make a new schema for SE strings.

ggoneiESS · 2026-01-14T12:39:22Z

I have done a bit of a refresher, but haven't done a deep-dive into the implementation in hdf5 2.0.0 (we aren't using that yet but we will this year).

I still worry a bit about the idea of variable-length strings. If this is used rarely (in comparison to e.g. detector data etc) it's not a big deal but:

- variable-length datasets cannot be compressed
- the data no longer exists contiguously (it necessarily becomes an array of pointers to strings, rather than just raw data)

And (academic but technical arguments):

- heap storage requires more space than regular 'raw data' storage (i.e. how the HDF5 object exists in memory)
- general reduction in I/O efficiency because it requires individual write operations for each data element rather than one write per dataset chunk (actually, chunking isn't allowed at all)

Performance is definitely at a premium vs. storage.
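
To make the layout trade-off concrete, here is a rough byte-count comparison of a padded fixed-length layout against a reference-plus-heap layout. It is a simplified model, not a measurement of HDF5: `fixedLengthBytes` and `variableLengthBytes` are hypothetical helpers, and the default 16-byte per-element reference cost is an assumed figure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Fixed-length layout: every string is padded to the longest one and the
// elements are stored back to back as raw data.
std::size_t fixedLengthBytes(const std::vector<std::string> &Messages) {
  std::size_t Longest = 0;
  for (const auto &Message : Messages) {
    Longest = std::max(Longest, Message.size());
  }
  return Longest * Messages.size();
}

// Variable-length layout: each element is a reference into a separate heap,
// plus the characters themselves. ReferenceBytes is an assumed per-element
// cost used only for this illustration.
std::size_t variableLengthBytes(const std::vector<std::string> &Messages,
                                std::size_t ReferenceBytes = 16) {
  assert(ReferenceBytes > 0);
  std::size_t Total = 0;
  for (const auto &Message : Messages) {
    Total += ReferenceBytes + Message.size();
  }
  return Total;
}
```

Under these assumptions, three two-character strings cost 6 bytes fixed against 54 bytes variable, while two short strings plus one 100-character string cost 300 bytes fixed against 152 bytes variable. The model ignores compression, chunk padding, and heap bookkeeping, so it only indicates which way the balance tips for uniform versus mixed message lengths.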

I found this via the HDF5 clinic - https://steven-varga.ca/blog/hdf5-fixed-vs-variable-benchmark/ and it provides a CPP file. It might be possible to incorporate into a filewriter test.

Geish Miladinovic added 2 commits July 23, 2025 14:25

added vs00 schema

d52894a

timestamp as long rather ulong

a236964
