DVEmbAttributor #193
Conversation
Please address the failed tests.

We are still working on a performance test to reproduce a small experiment's result presented in the paper.

Please add some tests.

I've added some tests. This is still a work in progress as we're currently adding the ghost product part.
TheaperDeng
left a comment
Have not fully completed the review.
dattri/algorithm/dvemb.py
Outdated
    def __init__(
        self,
        model: nn.Module,
        loss_func: Callable,
We may want to use AttributionTask here to integrate model and loss_func.
fixed.
dattri/algorithm/dvemb.py
Outdated
        data_tensors: Tuple[Tensor, Tensor],
    ) -> Tensor:
        inputs = data_tensors[0].unsqueeze(0)
        targets = data_tensors[1].unsqueeze(0)
If we use AttributionTask (https://github.com/TRAIS-Lab/dattri/blob/main/dattri/task.py#L40), this will be a user-defined function, and we don't really need to assume the user's data is a tuple of two tensors.
fixed.
dattri/algorithm/dvemb.py
Outdated
        factorization_type: Type of gradient factorization to use. Options are
            "none" (default),
            "kronecker" (same as in the paper),
            or "elementwise" (more memory-efficient).
I guess this is not directly related to memory, since our behavior respects the proj_dim parameter as the main parameter to control memory usage.
fixed
dattri/algorithm/dvemb.py
Outdated
        loss_func: Callable,
        device: str = "cpu",
        proj_dim: Optional[int] = None,
        factorization_type: str = "none",
We may also support a layer_names parameter here so that users can define the layers where the gradient decomposition happens. Remember, not only nn.Linear is a linear layer :)
dattri/dattri/algorithm/logra/logra.py
Lines 42 to 44 in e5eb9e7
    layer_names: Optional[
        Union[str, List[str]]
    ] = None,  # Maybe support layer class as input?
added
dattri/algorithm/dvemb.py
Outdated
        project_input = self._generate_projector(
            input_dim,
            self.projection_dim,
        )
Let's use the projector once #206 is merged.
dattri/algorithm/dvemb.py
Outdated
        )
        projected_grads = per_sample_grads @ self.projector

        scaling_factor = 1.0 / math.sqrt(self.projection_dim)
We already have a normalization in line 134 right?
fixed. Now dvemb's score is essentially equal to the ground truth's without using factorization.
dattri/algorithm/dvemb.py
Outdated
        loss_func: Callable,
        device: str = "cpu",
        task: AttributionTask,
        criterion: nn.Module,
Since you have already integrated the criterion into the loss_func of AttributionTask, it may be better to use get_loss_func to get it instead of redefining it.
I think it's hard to completely remove criterion. While the task does hold all the components, we need to access the functional loss_func and the stateful criterion for two different paths.
The functional task.get_grad_loss_func() is used for the full gradient path when factorization_type="none". However, when factorization is enabled, we must use _calculate_gradient_factors, which relies on backward hooks to capture the intermediate gradient factors. These hooks are only triggered by loss.backward(), requiring us to use the nn.Module directly after a standard model.forward(). Using the functional task.get_loss_func() in this case would bypass the hook system entirely.
Thanks for your reply! You could use loss = task.get_loss_func(params, data) and then call loss.backward() to trigger the backward hook. (In some versions of PyTorch, torch.func.functional_call may bypass the forward hook, so you might want to be aware of that.)
By the way, you might consider optimizing calculate_gradient_factors — instead of performing an additional forward/backward pass, you could collect the factors during the parameter update step.
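As a minimal sketch of the suggestion above (not dattri's actual code): a backward hook registered on a layer fires whenever loss.backward() is called on a loss computed from that layer's standard forward pass, so a loss obtained functionally can still drive the hook machinery. The layer, hook, and cache names here are illustrative only.

```python
import torch
import torch.nn as nn

# Sketch only: a backward hook on a Linear layer fires when
# loss.backward() runs, regardless of how the loss value was obtained,
# as long as the graph goes through this layer's forward pass.
layer = nn.Linear(4, 2)
captured = {}

def bwd_hook(module, grad_input, grad_output):
    # grad_output[0] is dL/d(layer output): one row per sample.
    captured["B"] = grad_output[0].detach()

layer.register_full_backward_hook(bwd_hook)

x = torch.randn(3, 4)
y = torch.randint(0, 2, (3,))
loss = nn.functional.cross_entropy(layer(x), y)
loss.backward()  # triggers bwd_hook
```

Note that, as the comment above warns, torch.func.functional_call may bypass module hooks in some PyTorch versions, so the forward pass here is a plain module call.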
I second that we should avoid the criterion argument. Factorization is also used in the LoGraAttributor, which did not use a separate criterion. Maybe take a look there about how to avoid this argument?
I think the reason is that we are using vmap and torch.func to calculate the gradient for the non-factorization path, and autograd for the factorization path of dvemb (we support both of them). And the loss function (target function) has a different format between these two. E.g.,
dattri/examples/pretrained_benchmark/influence_function_lds.py
Lines 23 to 27 in f5fb64b
    def f(params, data_target_pair):
        image, label = data_target_pair
        loss = nn.CrossEntropyLoss()
        yhat = torch.func.functional_call(model_details["model"], params, image)
        return loss(yhat, label.long())
dattri/examples/pretrained_benchmark/logra_lds.py
Lines 23 to 28 in f5fb64b
    def f(model, batch, device):
        inputs, targets = batch
        inputs = inputs.to(device)
        targets = targets.to(device)
        outputs = model(inputs)
        return nn.functional.cross_entropy(outputs, targets)
One thing we can do is to use task and remove this criterion for now, and detect if a wrong format is given by the user, providing clear instructions in the documentation and error messages.
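A lightweight way to detect a wrongly formatted target function could be an arity check on its signature, matching the two formats shown above. check_loss_func_format and its error message below are hypothetical, not part of dattri's API:

```python
import inspect

def check_loss_func_format(f):
    """Hypothetical validator (not dattri's API): route a user-supplied
    target function by its number of parameters."""
    n = len(inspect.signature(f).parameters)
    if n == 2:
        return "functional"  # f(params, data_target_pair): vmap/torch.func path
    if n == 3:
        return "autograd"    # f(model, batch, device): hook/factorization path
    msg = (
        "loss_func must take (params, data_target_pair) for the "
        "non-factorization path or (model, batch, device) for the "
        "factorization path."
    )
    raise ValueError(msg)
```

This only catches arity mistakes, not semantic ones, but it lets the error message point the user at the two supported formats.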
    )

    def _run_dvemb_simulation(self, attributor: DVEmbAttributor):
        """A generic simulation runner for any configured DVEmbAttributor."""
It might be fine to collect gradients for the training dataset just to verify that the algorithm runs correctly without updating the model parameters in the unit test, but it would be good to add a comment clarifying that this isn’t the correct way to collect gradients.
OK, I've added the comment clarifying that.
    from dattri.benchmark.load import load_benchmark

    """This file define the MLP model."""
The docstring about this file should be put at the top. Please also elaborate a bit more: "This file defines the MLP model" is inaccurate about this file.
examples/dvemb/dvemb_mlp.py
Outdated
    @@ -0,0 +1,199 @@
    """Example code to compute Leave-One-Out (LOO) scores on MLP trained on MNIST dataset.
Update this docstring.
TheaperDeng
left a comment
I have updated the API structure and added descriptive in-line comments.
    DVEmbAttributor(
        task=self.task_eager,
        factorization_type="none",
    )
I have added a unit test to verify the validation of invalid loss function formats.
dattri/algorithm/dvemb.py
Outdated
        If None, uses all Linear layers.
        You can check the names using model.named_modules().
        Hooks will be registered on these layers to collect gradients.
        Only available when factorization_type is not "none".
"Will only be used when ..."
dattri/algorithm/dvemb.py
Outdated
        factorization_type: Type of gradient factorization to use. Options are
            "none" (default),
            "kronecker" (same as in the paper),
            or "elementwise" (better performance while
Maybe elaborate a little bit about what "elementwise" does?
dattri/algorithm/dvemb.py
Outdated
        ValueError: If embeddings for the specified `epoch` are not found.
        """
        if not self.embeddings:
            msg = "Embeddings not computed. Call compute_embeddings first."
Should we call compute_embeddings in self.cache() for consistency with other attributors? And in this error message we can ask the user to call cache()
    def fwd_hook(idx: int) -> Callable:
        def _hook(
            layer: nn.Module,
            inputs: Tuple[Tensor, ...],
            _output: Tensor,
        ) -> None:
            a = inputs[0].detach()

            if a.dim() > 2:  # noqa: PLR2004
                a = a.reshape(a.size(0), -1)
            caches[idx]["A"] = a
            caches[idx]["has_bias"] = layer.bias is not None

        return _hook
In mixed precision, this hook forces the retention of fp32 input activations. Since autograd saves a separate bf16 copy for the backward pass, the hook prevents the fp32 tensor from being freed, significantly increasing memory usage. To fix this, we need to use saved_tensors_hooks instead, which is non-trivial (see my recent revision to GhostSuite). For now, I suggest making a note here and in the README.
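For context, a minimal sketch of the saved_tensors_hooks mechanism the comment refers to (this is illustrative, not the GhostSuite revision; the bf16 downcast policy is an assumption): autograd passes every tensor it saves for backward through a pack hook and restores it through an unpack hook, so activations can be stored in a cheaper dtype instead of being pinned at fp32.

```python
import torch

# Sketch only: downcast floating-point activations when autograd packs
# them for backward, restoring the original dtype on unpack, so the
# graph does not retain an extra fp32 copy.
def pack(t):
    if t.is_floating_point():
        return (t.dtype, t.to(torch.bfloat16))
    return t

def unpack(packed):
    if isinstance(packed, tuple):
        dtype, t = packed
        return t.to(dtype)
    return packed

x = torch.randn(8, 4, requires_grad=True)
with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    y = (x * x).sum()
y.backward()  # gradients computed from the downcast-then-restored copy
```

The trade-off is a small precision loss in the recomputed gradients in exchange for not keeping the fp32 activations alive, which is why integrating this correctly is non-trivial.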
Description
1. Motivation and Context
This pull request introduces the Data Value Embedding (DVEmb) attributor, a method for calculating trajectory-specific data influence. Unlike existing methods that often overlook the temporal dynamics of model training, DVEmb captures how the influence of a data point evolves over time by creating epoch-specific embeddings, which allows for a more accurate analysis of data value.
2. Summary of the change
github issue
3. What tests have been added/updated for the change?