Multivariate Detector Pipeline #57

@abaranov25

Description

This issue proposes adding a multivariate version of the current SigLLM detector pipeline. The new pipeline largely reuses the univariate SigLLM detector pipeline but changes how scalars are represented in the LLM input strings. We provide implementations of several methods for encoding multivariate inputs into a 1D string. For example, given timesteps t₀ = [50, 30, 100] and t₁ = [55, 28, 104]:

  • Value Concatenation – Simply flatten the values across time:
    - 50,30,100,55,28,104
  • Value Interleave – Pad values to equal digit length and concatenate timestep by timestep:
    - 050030100,055028104
  • Digit Interleave – Interleave digits positionally across dimensions:
    - 001530000,001520584
  • JSON Format – Encode as dimension-labeled key:value pairs:
    - d0:50,d1:30,d2:100,d0:55,d1:28,d2:104
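
To make the four encodings concrete, here is a minimal Python sketch that reproduces the example strings above. The function names, the list-of-lists input, and the automatic choice of padding width are assumptions made for illustration, not the code proposed in this issue. It also assumes non-negative integer values.

```python
from typing import List

Series = List[List[int]]  # one inner list per timestep (hypothetical input shape)


def _width(values: Series) -> int:
    """Digit length of the longest value; used as the zero-padding width."""
    return max(len(str(v)) for row in values for v in row)


def value_concatenation(values: Series) -> str:
    # Flatten every value across time into one comma-separated string.
    return ",".join(str(v) for row in values for v in row)


def value_interleave(values: Series) -> str:
    # Zero-pad each value to equal length, then join the values of each timestep.
    w = _width(values)
    return ",".join("".join(str(v).zfill(w) for v in row) for row in values)


def digit_interleave(values: Series) -> str:
    # Zero-pad, then emit the i-th digit of every dimension before moving to digit i+1.
    w = _width(values)
    chunks = []
    for row in values:
        padded = [str(v).zfill(w) for v in row]
        chunks.append("".join(p[i] for i in range(w) for p in padded))
    return ",".join(chunks)


def json_format(values: Series) -> str:
    # Label each value with its dimension index as a key:value pair.
    return ",".join(f"d{j}:{v}" for row in values for j, v in enumerate(row))


example = [[50, 30, 100], [55, 28, 104]]
print(value_concatenation(example))  # 50,30,100,55,28,104
print(value_interleave(example))     # 050030100,055028104
print(digit_interleave(example))     # 001530000,001520584
print(json_format(example))          # d0:50,d1:30,d2:100,d0:55,d1:28,d2:104
```

Zero-padding is what keeps Value Interleave and Digit Interleave decodable: with a fixed width, each timestep chunk can be split back into its dimensions without extra separators.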

LLMs have shown sensitivity to token structure and ordering, so the choice of encoding may affect detection quality. We therefore also provide easy-to-use scaffolding for implementing any other multivariate formatting method: an end user only needs to implement format_as_string and format_as_integer for the chosen method. Before running on real data, the pipeline runs a basic test to check that the format can encode scalars into strings and decode those strings back into the original scalars.
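
As a rough illustration of the scaffolding idea, the sketch below shows one way a custom format and its encode/decode check could look. Only the method names format_as_string and format_as_integer come from this proposal. The base class, the method signatures, the `check_round_trip` helper, and the `ValueInterleaveFormatter` example are hypothetical and may differ from the actual implementation.

```python
from abc import ABC, abstractmethod
from typing import List

Series = List[List[int]]  # one inner list per timestep (assumed input shape)


class MultivariateFormatter(ABC):
    """Hypothetical base class: subclasses supply the two conversion methods."""

    @abstractmethod
    def format_as_string(self, values: Series) -> str:
        """Encode multivariate integer values into a single 1D string."""

    @abstractmethod
    def format_as_integer(self, text: str) -> Series:
        """Decode a string produced by format_as_string back into integers."""

    def check_round_trip(self, sample: Series) -> bool:
        # Basic sanity test: decoding the encoded sample must return the sample.
        return self.format_as_integer(self.format_as_string(sample)) == sample


class ValueInterleaveFormatter(MultivariateFormatter):
    """Example format: zero-pad each value to `width` digits and join per timestep."""

    def __init__(self, width: int = 3):
        self.width = width

    def format_as_string(self, values: Series) -> str:
        return ",".join(
            "".join(str(v).zfill(self.width) for v in row) for row in values
        )

    def format_as_integer(self, text: str) -> Series:
        w = self.width
        return [
            [int(chunk[i:i + w]) for i in range(0, len(chunk), w)]
            for chunk in text.split(",")
        ]


formatter = ValueInterleaveFormatter(width=3)
sample = [[50, 30, 100], [55, 28, 104]]
assert formatter.check_round_trip(sample)
```

Running the round-trip check on a small sample before touching real data catches formats that lose information, for example a padding width too small for the largest value.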
