Improve xarray display of structural components for multivariate time series #555

Open: 5 commits to merge into base branch main

AlexAndorra
Contributor

Closes #545

Currently, extract_components_from_idata labels the state coord with nested square-bracket notation, which makes it hard to select components when working with multivariate series. For example:

state = [
    "trend[level[gdp]]",
    "trend[trend[gdp]]",
    "trend[level[unemployment]]",
    "trend[trend[unemployment]]",
    "ar[gdp]",
    "ar[unemployment]",
]

This PR adds a restructure argument to extract_components_from_idata (default False, for backwards compatibility). When True, the state coordinates are restructured as a multi-index, which enables selections like idata.sel(component='level'), idata.sel(observed='gdp'), or even idata.sel(component='level', observed='gdp').
This is especially useful for multivariate models with multiple observed states.

More precisely, the state dimension is broken down into two new ones, component and observed, with coordinates [('level', 'gdp'), ('trend', 'gdp'), ('ar', 'gdp')] for gdp and [('level', 'unemployment'), ('trend', 'unemployment'), ('ar', 'unemployment')] for unemployment.
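For illustration only, here is a minimal sketch of the idea using plain xarray. It is not the code in this PR: the data values are made up, and the split-on-bracket parsing is an assumption that only covers the label shapes listed above.

```python
import numpy as np
import xarray as xr

labels = [
    "trend[level[gdp]]",
    "trend[trend[gdp]]",
    "trend[level[unemployment]]",
    "trend[trend[unemployment]]",
    "ar[gdp]",
    "ar[unemployment]",
]

# Assumption for this sketch: the innermost name is the observed state and
# the name just outside it is the component ("ar[gdp]" -> ("ar", "gdp")).
components, observed = [], []
for label in labels:
    parts = label.rstrip("]").split("[")
    components.append(parts[-2])
    observed.append(parts[-1])

# Dummy values, one per state, so that selections are easy to check.
da = xr.DataArray(
    np.arange(6.0),
    dims="state",
    coords={
        "component": ("state", components),
        "observed": ("state", observed),
    },
)

# Combine the two coordinates into a multi-index on the state dimension.
da = da.set_index(state=["component", "observed"])

levels = da.sel(component="level")                     # level of every series
gdp = da.sel(observed="gdp")                           # every component of gdp
level_gdp = da.sel(component="level", observed="gdp")  # a single state
```

Because the index is built from per-state (component, observed) pairs rather than a full grid, each observed series can carry a different set of components.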

This also lets each observed state have its own arbitrary model structure, which the current multivariate setup already supports.

NB: This PR is a first pass and in no way exhaustive. We will probably need to cover more complex cases as users surface them. But it gets the ball rolling and should be self-contained enough to merge as is.

@jessegrabowski
Member

I need to think a bit about this. As a v0 I guess it's fine, but I have the feeling that the whole extract_components should be refactored to just do the right thing from the jump. It could probably make better use of arviz in the first place. For example, here we're casting everything to numpy and then working with that. Seems dumb?

Some general comments:

  • I'm not wild about regex when dealing with nested structures, it has a lot of sharp edges, and the patterns are quite arcane.
  • I don't think we need to be backwards compatible. We're doing API breaks with every PR these days.
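To make the regex concern concrete, here is a minimal regex-free sketch of parsing the nested labels. The helper name split_nested_label is hypothetical and is not part of this PR or of any library; it only assumes labels shaped like "outer[inner[name]]".

```python
def split_nested_label(label: str) -> list[str]:
    """Split "trend[level[gdp]]" into ["trend", "level", "gdp"].

    Hypothetical helper for illustration. It accepts only strictly nested
    labels, where all closing brackets sit at the end of the string.
    """
    depth = label.count("[")
    balanced = depth == label.count("]")
    if not balanced or (depth and not label.endswith("]" * depth)):
        raise ValueError(f"Malformed nested label: {label!r}")

    # Drop the trailing closing brackets, then split on the opening ones.
    core = label[: len(label) - depth]
    parts = core.split("[")
    if any(not part for part in parts):
        raise ValueError(f"Empty name in nested label: {label!r}")
    return parts


parsed = [split_nested_label(s) for s in ("trend[level[gdp]]", "ar[gdp]")]
```

Malformed input such as "a[b][c]" or "[gdp]" raises a ValueError instead of silently producing a partial match.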

@AlexAndorra
Contributor Author

Yep, I agree with that. We do need this patch for the Berlin tutorial, but it can be just that: a patch.
I'm all for making this better from the get-go, but I have to say I won't have the bandwidth to work on such a high-stakes PR. @OriolAbril will probably have some great points on whether we could and should rely more heavily on ArviZ.

Development

Successfully merging this pull request may close these issues.

Better dimensions & coordinates for multivariate time series components