add CompactConvolutionalTransformer (CCT) architecture #272
felixdittrich92 wants to merge 2 commits into frgfm:main
Conversation
/quack review
Thanks for the PR 🙏
Other sections:
- holocron/models/classification/cct.py: L38, L57, L80, L83, L84, L88, L93, L149, L153, L163, L169, L173, L189, L209, L237, L262, L287, L312
- holocron/nn/modules/transformers.py: L15, L30, L38, L57, L67, L75, L97, L136

For a better experience, it's recommended to check the review comments in the Files Changed tab.
Feedback is a gift! Quack will adjust your review style when you add reactions to a review comment
    class CCT_2_Checkpoint(Enum):
Let's use the regular PascalCase convention for classes.

Pattern explanation 👈
For more details, feel free to check this resource.

Code example
Here is an example of how this is typically handled:

    # class moduleParser:
    class ModuleParser:

Quack feedback loop 👍👎
This comment is about [class-naming]. Add the following reactions on this comment to let us know if:
- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
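Applied to this PR, the fix is mechanical. A minimal sketch of the rename, assuming the class is a plain `Enum` of checkpoint entries (the member name and URL below are placeholders, not the actual checkpoint definition from cct.py):

```python
from enum import Enum

# Before (flagged): class CCT_2_Checkpoint(Enum): ...
# After: PascalCase, no underscores. Member name and value are placeholders.
class CCT2Checkpoint(Enum):
    DEFAULT = "https://example.com/cct_2_placeholder.pth"

# Callers only change the class name; member access is unchanged.
checkpoint_url = CCT2Checkpoint.DEFAULT.value
```

Only the class name changes, so the rename is a safe find-and-replace as long as no public API exposes the old name.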
    class ConvPatchEmbed(nn.Module):
This section seems to be untested, let's fix that together!

Pattern explanation 👈
As your program evolves, refactoring and modifications may subtly cause regressions. Adding well-designed tests for your features will help spot them easily and consistently. For more details, feel free to check this resource.

Code example
Here is an example of how this is typically handled:

    import pytest
    from my_lib.submodule import MyClass

    @pytest.mark.parametrize(
        "input_arg, expected_output",
        [
            [YOUR_INPUT, YOUR_OUTPUT],
        ],
    )
    def test_myclass_custommethod(input_arg, expected_output):
        obj = MyClass()
        assert obj.custommethod(input_arg) == expected_output

Quack feedback loop 👍👎
This comment is about [untested]. Add the following reactions on this comment to let us know if:
- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
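For the modules in this PR, an output-shape check is usually the highest-value first test. Since the real `ConvPatchEmbed` needs torch, here is a dependency-free sketch of the same parametrized idea, with a hypothetical `FakePatchEmbed` standing in for the module:

```python
# Hypothetical stand-in for ConvPatchEmbed: it only models the property a
# shape test would assert, i.e. (H, W) input size -> (num_patches, embed_dim).
class FakePatchEmbed:
    def __init__(self, patch_size: int, embed_dim: int) -> None:
        self.patch_size = patch_size
        self.embed_dim = embed_dim

    def output_shape(self, height: int, width: int) -> tuple:
        num_patches = (height // self.patch_size) * (width // self.patch_size)
        return (num_patches, self.embed_dim)

# Parametrized cases, in the spirit of @pytest.mark.parametrize:
# (patch_size, embed_dim, height, width, expected_shape)
cases = [
    (4, 128, 32, 32, (64, 128)),
    (8, 256, 32, 32, (16, 256)),
]

for patch_size, embed_dim, height, width, expected in cases:
    embed = FakePatchEmbed(patch_size, embed_dim)
    assert embed.output_shape(height, width) == expected
```

A torch version would replace `output_shape` with a real forward pass and assert on `module(torch.rand(1, 3, height, width)).shape` instead.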
    nn.init.kaiming_normal_(m.weight)
    ...
    class CCT(nn.Module):
This section seems to be untested, let's fix that together! (same [untested] pattern and code example as above)
    nn.init.constant_(m.weight, 1.0)
    ...
    def _cct(
This section seems to be untested, let's fix that together! (same [untested] pattern and code example as above)
    return _configure_model(model, checkpoint, progress=progress)
    ...
    def _checkpoint(
This section seems to be untested, let's fix that together! (same [untested] pattern and code example as above)
        mask (torch.Tensor): optional mask
        """
    ...
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(query.size(-1))
It doesn't look like we need this intermediate variable.

Pattern explanation 👈
You shouldn't concede readability, but when it's possible, avoid unnecessary memory allocation. For more details, feel free to check this resource.

Code example
Here is an example of how this is typically handled:

    # def make_upper(input_str: str) -> str:
    #     upper_str = input_str.upper()
    #     return upper_str
    def make_upper(input_str: str) -> str:
        return input_str.upper()

Quack feedback loop 👍👎
This comment is about [intermediate-variable]. Add the following reactions on this comment to let us know if:
- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
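For context, the flagged line builds the score matrix QKᵀ/√d of scaled dot-product attention. A dependency-free sketch of the full computation (a pure-Python stand-in for the torch version, with no mask and no batching, so the shapes stay small and explicit):

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(query, key, value):
    """softmax(Q Kᵀ / sqrt(d)) V on plain nested lists."""
    d = len(query[0])
    # The flagged intermediate: scores = Q Kᵀ / sqrt(d)
    scores = [
        [sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d) for k_row in key]
        for q_row in query
    ]
    attn = [softmax(row) for row in scores]
    out = [
        [sum(a * v_row[j] for a, v_row in zip(attn_row, value)) for j in range(len(value[0]))]
        for attn_row in attn
    ]
    return out, attn

# One query attending over two identical keys: the weights split evenly,
# so the output is the average of the two value rows.
out, attn = scaled_dot_product_attention(
    query=[[1.0, 0.0]],
    key=[[1.0, 0.0], [1.0, 0.0]],
    value=[[2.0, 0.0], [4.0, 0.0]],
)
```

In the PR's code, `scores` feeds straight into a softmax, so it can be inlined as the review suggests; keeping the name so the line mirrors the formula is also a defensible readability call.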
    self.linear_layers = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(3)])
    self.output_linear = nn.Linear(d_model, d_model)
    ...
    def forward(self, query: torch.Tensor, key: torch.Tensor, value: torch.Tensor, mask=None) -> torch.Tensor:
Would you mind adding type annotations? 🙏

Pattern explanation 👈
In Python, type annotations are not enforced, but they are valuable for debugging with mypy, for your collaborators' understanding, and for your IDE's autocomplete. For more details, feel free to check this resource.

Code example
Here is an example of how this is typically handled:

    # def is_upper(variable_name):
    def is_upper(variable_name: str) -> bool:

Quack feedback loop 👍👎
This comment is about [missing-type-annotations]. Add the following reactions on this comment to let us know if:
- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
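Applied to the flagged signature, only `mask` is missing its annotation. A sketch of the annotated version, using a placeholder `Tensor` class so it runs without torch (in the PR this would be `torch.Tensor` on a module method):

```python
from typing import Optional

class Tensor:  # placeholder for torch.Tensor in this sketch
    pass

# Before: ..., mask=None) -> Tensor
# After: the accepted types of `mask` are explicit for mypy and IDEs.
def forward(query: Tensor, key: Tensor, value: Tensor,
            mask: Optional[Tensor] = None) -> Tensor:
    return query

mask_annotation = forward.__annotations__["mask"]
```

`Optional[Tensor]` documents that callers may omit the mask entirely, which is exactly the contract the `= None` default already implies.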
    ]
    ...
    # apply attention on all the projected vectors in batch
    x, attn = scaled_dot_product_attention(query, key, value, mask=mask)
It doesn't look like we need this intermediate variable. (same [intermediate-variable] pattern and code example as above)
    def forward(self, x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        ...
        output = x
It doesn't look like we need this intermediate variable. (same [intermediate-variable] pattern and code example as above)
There was a problem hiding this comment.
Thanks for the PR 🙏
Other sections
holocron/models/classification/cct.py:L189holocron/models/classification/cct.py:L38holocron/models/classification/cct.py:L93holocron/models/classification/cct.py:L163holocron/models/classification/cct.py:L173holocron/models/classification/cct.py:L189holocron/models/classification/cct.py:L209holocron/models/classification/cct.py:L237holocron/models/classification/cct.py:L262holocron/models/classification/cct.py:L287holocron/models/classification/cct.py:L312holocron/models/classification/cct.py:L57holocron/models/classification/cct.py:L84holocron/models/classification/cct.py:L88holocron/models/classification/cct.py:L149holocron/models/classification/cct.py:L153holocron/models/classification/cct.py:L169holocron/models/classification/cct.py:L80holocron/models/classification/cct.py:L83holocron/models/classification/cct.py:L88holocron/nn/modules/transformers.py:L15holocron/nn/modules/transformers.py:L38holocron/nn/modules/transformers.py:L75holocron/nn/modules/transformers.py:L97holocron/nn/modules/transformers.py:L30holocron/nn/modules/transformers.py:L57holocron/nn/modules/transformers.py:L67holocron/nn/modules/transformers.py:L67holocron/nn/modules/transformers.py:L136For a better experience, it's recommended to check the review comments in the tab Files Changed.
Feedback is a gift! Quack will adjust your review style when you add reactions to a review comment
| ) | ||
|
|
||
|
|
||
| class CCT_2_Checkpoint(Enum): |
There was a problem hiding this comment.
Let's use the regular PascalCase convention for classes
Pattern explanation 👈
For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
# class moduleParser:
class ModuleParser:Quack feedback loop 👍👎
This comment is about [class-naming]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| ] | ||
|
|
||
|
|
||
| class ConvPatchEmbed(nn.Module): |
There was a problem hiding this comment.
This section seems to be untested, let's fix that together!
Pattern explanation 👈
As your program evolves, refactoring and modifications may subtly cause regressions. Adding well-designed tests for your features will help spot them easily and consistently. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
import pytest
from my_lib.submodule import MyClass
@pytest.mark.parametrize(
"input_arg, expected_output",
[
[YOUR_INPUT, YOUR_OUTPUT],
],)
def test_myclass_custommethod(input_arg, expected_output):
obj = MyClass()
assert obj.custommethod(input_arg) == expected_outputQuack feedback loop 👍👎
This comment is about [untested]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| nn.init.kaiming_normal_(m.weight) | ||
|
|
||
|
|
||
| class CCT(nn.Module): |
There was a problem hiding this comment.
This section seems to be untested, let's fix that together!
Pattern explanation 👈
As your program evolves, refactoring and modifications may subtly cause regressions. Adding well-designed tests for your features will help spot them easily and consistently. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
import pytest
from my_lib.submodule import MyClass
@pytest.mark.parametrize(
"input_arg, expected_output",
[
[YOUR_INPUT, YOUR_OUTPUT],
],)
def test_myclass_custommethod(input_arg, expected_output):
obj = MyClass()
assert obj.custommethod(input_arg) == expected_outputQuack feedback loop 👍👎
This comment is about [untested]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| nn.init.constant_(m.weight, 1.0) | ||
|
|
||
|
|
||
| def _cct( |
There was a problem hiding this comment.
This section seems to be untested, let's fix that together!
Pattern explanation 👈
As your program evolves, refactoring and modifications may subtly cause regressions. Adding well-designed tests for your features will help spot them easily and consistently. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
import pytest
from my_lib.submodule import MyClass
@pytest.mark.parametrize(
"input_arg, expected_output",
[
[YOUR_INPUT, YOUR_OUTPUT],
],)
def test_myclass_custommethod(input_arg, expected_output):
obj = MyClass()
assert obj.custommethod(input_arg) == expected_outputQuack feedback loop 👍👎
This comment is about [untested]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| return _configure_model(model, checkpoint, progress=progress) | ||
|
|
||
|
|
||
| def _checkpoint( |
There was a problem hiding this comment.
This section seems to be untested, let's fix that together!
Pattern explanation 👈
As your program evolves, refactoring and modifications may subtly cause regressions. Adding well-designed tests for your features will help spot them easily and consistently. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
import pytest
from my_lib.submodule import MyClass
@pytest.mark.parametrize(
"input_arg, expected_output",
[
[YOUR_INPUT, YOUR_OUTPUT],
],)
def test_myclass_custommethod(input_arg, expected_output):
obj = MyClass()
assert obj.custommethod(input_arg) == expected_outputQuack feedback loop 👍👎
This comment is about [untested]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| mask (torch.Tensor): optional mask | ||
| """ | ||
|
|
||
| scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(query.size(-1)) |
There was a problem hiding this comment.
It doesn't look like we need this intermediate variable
Pattern explanation 👈
You shouldn't concede readability, but when it's possible, avoid unnecessary memory allocation. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
# def make_upper(input_str: str) -> str:
# upper_str = input_str.upper()
# return upper_str
def make_upper(input_str: str) -> str:
return input_str.upper()Quack feedback loop 👍👎
This comment is about [intermediate-variable]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| self.linear_layers = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(3)]) | ||
| self.output_linear = nn.Linear(d_model, d_model) | ||
|
|
||
| def forward(self, query: torch.Tensor, key: torch.Tensor, value: torch.Tensor, mask=None) -> torch.Tensor: |
There was a problem hiding this comment.
Would you mind adding type annotations? 🙏
Pattern explanation 👈
In Python, type annotations are not enforced. But that can be valuable for debugging using mypy, for your collaborators' understanding, and the autocomplete of your IDE. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
# def is_upper(variable_name):
def is_upper(variable_name: str) -> bool:Quack feedback loop 👍👎
This comment is about [missing-type-annotations]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| ] | ||
|
|
||
| # apply attention on all the projected vectors in batch | ||
| x, attn = scaled_dot_product_attention(query, key, value, mask=mask) |
There was a problem hiding this comment.
It doesn't look like we need this intermediate variable
Pattern explanation 👈
You shouldn't concede readability, but when it's possible, avoid unnecessary memory allocation. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
# def make_upper(input_str: str) -> str:
# upper_str = input_str.upper()
# return upper_str
def make_upper(input_str: str) -> str:
return input_str.upper()Quack feedback loop 👍👎
This comment is about [intermediate-variable]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| ] | ||
|
|
||
| # apply attention on all the projected vectors in batch | ||
| x, attn = scaled_dot_product_attention(query, key, value, mask=mask) |
There was a problem hiding this comment.
It doesn't look like we need this intermediate variable
Pattern explanation 👈
You shouldn't concede readability, but when it's possible, avoid unnecessary memory allocation. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
# def make_upper(input_str: str) -> str:
# upper_str = input_str.upper()
# return upper_str
def make_upper(input_str: str) -> str:
return input_str.upper()Quack feedback loop 👍👎
This comment is about [intermediate-variable]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
|
|
||
| def forward(self, x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor: | ||
|
|
||
| output = x |
There was a problem hiding this comment.
It doesn't look like we need this intermediate variable
Pattern explanation 👈
You shouldn't concede readability, but when it's possible, avoid unnecessary memory allocation. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
# def make_upper(input_str: str) -> str:
# upper_str = input_str.upper()
# return upper_str
def make_upper(input_str: str) -> str:
return input_str.upper()Quack feedback loop 👍👎
This comment is about [intermediate-variable]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
There was a problem hiding this comment.
Thanks for the PR 🙏
Other sections
holocron/models/classification/cct.py:L189holocron/models/classification/cct.py:L38holocron/models/classification/cct.py:L93holocron/models/classification/cct.py:L163holocron/models/classification/cct.py:L173holocron/models/classification/cct.py:L189holocron/models/classification/cct.py:L209holocron/models/classification/cct.py:L237holocron/models/classification/cct.py:L262holocron/models/classification/cct.py:L287holocron/models/classification/cct.py:L312holocron/models/classification/cct.py:L57holocron/models/classification/cct.py:L84holocron/models/classification/cct.py:L88holocron/models/classification/cct.py:L149holocron/models/classification/cct.py:L153holocron/models/classification/cct.py:L169holocron/models/classification/cct.py:L80holocron/models/classification/cct.py:L83holocron/models/classification/cct.py:L88holocron/nn/modules/transformers.py:L15holocron/nn/modules/transformers.py:L38holocron/nn/modules/transformers.py:L75holocron/nn/modules/transformers.py:L97holocron/nn/modules/transformers.py:L30holocron/nn/modules/transformers.py:L57holocron/nn/modules/transformers.py:L67holocron/nn/modules/transformers.py:L67holocron/nn/modules/transformers.py:L136For a better experience, it's recommended to check the review comments in the tab Files Changed.
Feedback is a gift! Quack will adjust your review style when you add reactions to a review comment
| ) | ||
|
|
||
|
|
||
| class CCT_2_Checkpoint(Enum): |
There was a problem hiding this comment.
Let's use the regular PascalCase convention for classes
Pattern explanation 👈
For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
# class moduleParser:
class ModuleParser:Quack feedback loop 👍👎
This comment is about [class-naming]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| ] | ||
|
|
||
|
|
||
| class ConvPatchEmbed(nn.Module): |
There was a problem hiding this comment.
This section seems to be untested, let's fix that together!
Pattern explanation 👈
As your program evolves, refactoring and modifications may subtly cause regressions. Adding well-designed tests for your features will help spot them easily and consistently. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
import pytest
from my_lib.submodule import MyClass
@pytest.mark.parametrize(
"input_arg, expected_output",
[
[YOUR_INPUT, YOUR_OUTPUT],
],)
def test_myclass_custommethod(input_arg, expected_output):
obj = MyClass()
assert obj.custommethod(input_arg) == expected_outputQuack feedback loop 👍👎
This comment is about [untested]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| nn.init.kaiming_normal_(m.weight) | ||
|
|
||
|
|
||
| class CCT(nn.Module): |
There was a problem hiding this comment.
This section seems to be untested, let's fix that together!
Pattern explanation 👈
As your program evolves, refactoring and modifications may subtly cause regressions. Adding well-designed tests for your features will help spot them easily and consistently. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
import pytest
from my_lib.submodule import MyClass
@pytest.mark.parametrize(
"input_arg, expected_output",
[
[YOUR_INPUT, YOUR_OUTPUT],
],)
def test_myclass_custommethod(input_arg, expected_output):
obj = MyClass()
assert obj.custommethod(input_arg) == expected_outputQuack feedback loop 👍👎
This comment is about [untested]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| nn.init.constant_(m.weight, 1.0) | ||
|
|
||
|
|
||
| def _cct( |
There was a problem hiding this comment.
This section seems to be untested, let's fix that together!
Pattern explanation 👈
As your program evolves, refactoring and modifications may subtly cause regressions. Adding well-designed tests for your features will help spot them easily and consistently. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
import pytest
from my_lib.submodule import MyClass
@pytest.mark.parametrize(
"input_arg, expected_output",
[
[YOUR_INPUT, YOUR_OUTPUT],
],)
def test_myclass_custommethod(input_arg, expected_output):
obj = MyClass()
assert obj.custommethod(input_arg) == expected_outputQuack feedback loop 👍👎
This comment is about [untested]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| return _configure_model(model, checkpoint, progress=progress) | ||
|
|
||
|
|
||
| def _checkpoint( |
There was a problem hiding this comment.
This section seems to be untested, let's fix that together!
Pattern explanation 👈
As your program evolves, refactoring and modifications may subtly cause regressions. Adding well-designed tests for your features will help spot them easily and consistently. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
import pytest
from my_lib.submodule import MyClass
@pytest.mark.parametrize(
"input_arg, expected_output",
[
[YOUR_INPUT, YOUR_OUTPUT],
],)
def test_myclass_custommethod(input_arg, expected_output):
obj = MyClass()
assert obj.custommethod(input_arg) == expected_outputQuack feedback loop 👍👎
This comment is about [untested]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| mask (torch.Tensor): optional mask | ||
| """ | ||
|
|
||
| scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(query.size(-1)) |
There was a problem hiding this comment.
It doesn't look like we need this intermediate variable
Pattern explanation 👈
You shouldn't concede readability, but when it's possible, avoid unnecessary memory allocation. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
# def make_upper(input_str: str) -> str:
# upper_str = input_str.upper()
# return upper_str
def make_upper(input_str: str) -> str:
return input_str.upper()Quack feedback loop 👍👎
This comment is about [intermediate-variable]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| self.linear_layers = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(3)]) | ||
| self.output_linear = nn.Linear(d_model, d_model) | ||
|
|
||
| def forward(self, query: torch.Tensor, key: torch.Tensor, value: torch.Tensor, mask=None) -> torch.Tensor: |
There was a problem hiding this comment.
Would you mind adding type annotations? 🙏
Pattern explanation 👈
In Python, type annotations are not enforced. But that can be valuable for debugging using mypy, for your collaborators' understanding, and the autocomplete of your IDE. For more details, feel free to check this resource.Code example
Here is an example of how this is typically handled:
# def is_upper(variable_name):
def is_upper(variable_name: str) -> bool:Quack feedback loop 👍👎
This comment is about [missing-type-annotations]. Add the following reactions on this comment to let us know if:- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
| ] | ||
|
|
||
| # apply attention on all the projected vectors in batch | ||
| x, attn = scaled_dot_product_attention(query, key, value, mask=mask) |
There was a problem hiding this comment.
It doesn't look like we need this intermediate variable
Pattern explanation 👈
> class CCT_2_Checkpoint(Enum):

Let's use the regular PascalCase convention for classes.

Pattern explanation: for more details, feel free to check this resource.

Code example: here is an example of how this is typically handled:

```python
# class moduleParser:
class ModuleParser:
```

Quack feedback loop 👍👎: this comment is about [class-naming]. Add the following reactions on this comment to let us know if:
- 👍 that comment was on point
- 👀 that doesn't seem right
- 👎 this isn't important for you right now
- 😕 that explanation wasn't clear
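Applied to the quoted class, the rename could look like the sketch below. The member name and value are hypothetical placeholders, since the enum body isn't shown in this comment:

```python
from enum import Enum


class CCT2Checkpoint(Enum):
    # Hypothetical member: the real enum presumably lists checkpoint entries;
    # only the class name matters for the PascalCase convention
    IMAGENETTE = "cct_2.pth"
```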
> class ConvPatchEmbed(nn.Module):

This section seems to be untested, let's fix that together!

Pattern explanation: as your program evolves, refactoring and modifications may subtly cause regressions. Adding well-designed tests for your features will help spot them easily and consistently. For more details, feel free to check this resource.

Code example: here is an example of how this is typically handled:

```python
import pytest

from my_lib.submodule import MyClass


@pytest.mark.parametrize(
    "input_arg, expected_output",
    [
        [YOUR_INPUT, YOUR_OUTPUT],
    ],
)
def test_myclass_custommethod(input_arg, expected_output):
    obj = MyClass()
    assert obj.custommethod(input_arg) == expected_output
```

This comment is about [untested].
> class CCT(nn.Module):

This section seems to be untested, let's fix that together! ([untested], same pattern and example as above.)
> def _cct(

This section seems to be untested, let's fix that together! ([untested], same pattern and example as above.)
> def _checkpoint(

This section seems to be untested, let's fix that together! ([untested], same pattern and example as above.)
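For the model classes flagged as untested, a typical check instantiates the model and asserts the output shape. The sketch below uses a stand-in module because the CCT constructors aren't shown in this thread; a real test would import them from `holocron.models` instead:

```python
import torch
from torch import nn


# Stand-in for a classifier constructor mapping (N, 3, H, W) -> (N, num_classes);
# substitute the actual cct_* factory function in the real test
def tiny_classifier(num_classes: int = 10) -> nn.Module:
    return nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, num_classes))


def check_output_shape(model_fn, num_classes: int = 10, size: int = 32) -> None:
    model = model_fn(num_classes).eval()
    with torch.no_grad():
        out = model(torch.rand(2, 3, size, size))
    # classification head should produce one logit vector per sample
    assert out.shape == (2, num_classes)


check_output_shape(tiny_classifier)
```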
> mask (torch.Tensor): optional mask
> """
> scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(query.size(-1))

It doesn't look like we need this intermediate variable.

Pattern explanation: you shouldn't concede readability, but when it's possible, avoid unnecessary memory allocation. For more details, feel free to check this resource.

Code example: here is an example of how this is typically handled:

```python
# def make_upper(input_str: str) -> str:
#     upper_str = input_str.upper()
#     return upper_str
def make_upper(input_str: str) -> str:
    return input_str.upper()
```

This comment is about [intermediate-variable].
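For context, the quoted line computes the scaled dot-product attention scores, softmax(QKᵀ/√d_k)·V. A minimal standalone sketch, assuming the diff's function returns both the output and the attention map (as the `x, attn = ...` call sites in this review suggest) and that masked positions are marked by zeros:

```python
import math

import torch


def scaled_dot_product_attention(query, key, value, mask=None):
    # (..., seq_q, seq_k) similarity scores, scaled by sqrt of the head dimension
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(query.size(-1))
    if mask is not None:
        # assumption: positions where mask == 0 are excluded from attention
        scores = scores.masked_fill(mask == 0, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    # weighted sum over values, plus the attention map for inspection
    return torch.matmul(attn, value), attn


out, attn = scaled_dot_product_attention(
    torch.ones(1, 2, 4), torch.ones(1, 2, 4), torch.ones(1, 2, 4)
)
```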
> self.linear_layers = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(3)])
> self.output_linear = nn.Linear(d_model, d_model)
>
> def forward(self, query: torch.Tensor, key: torch.Tensor, value: torch.Tensor, mask=None) -> torch.Tensor:

Would you mind adding type annotations? 🙏

Pattern explanation: in Python, type annotations are not enforced, but they can be valuable for debugging with mypy, for your collaborators' understanding, and for your IDE's autocomplete. For more details, feel free to check this resource.

Code example: here is an example of how this is typically handled:

```python
# def is_upper(variable_name):
def is_upper(variable_name: str) -> bool:
```

This comment is about [missing-type-annotations].
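In the quoted signature only `mask` is missing its annotation. A sketch of the fully annotated version (the class name is taken from the review context, and the body is reduced to a placeholder here):

```python
from typing import Optional

import torch
from torch import nn


class MultiHeadAttention(nn.Module):
    """Reduced sketch: only the signature annotations matter here."""

    def __init__(self, d_model: int) -> None:
        super().__init__()
        self.output_linear = nn.Linear(d_model, d_model)

    def forward(
        self,
        query: torch.Tensor,
        key: torch.Tensor,
        value: torch.Tensor,
        mask: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        # placeholder body: the real module projects q/k/v and applies attention
        return self.output_linear(query)


out = MultiHeadAttention(4)(torch.rand(1, 2, 4), torch.rand(1, 2, 4), torch.rand(1, 2, 4))
```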
> # apply attention on all the projected vectors in batch
> x, attn = scaled_dot_product_attention(query, key, value, mask=mask)

It doesn't look like we need this intermediate variable. ([intermediate-variable], same pattern and example as above.)
> def forward(self, x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
>     output = x

It doesn't look like we need this intermediate variable. ([intermediate-variable], same pattern and example as above.)
This PR:

- adds the CompactConvolutionalTransformer (CCT) architecture, along with the transformer modules it relies on (`nn.TransformerEncoder` would work also, but is still not ONNX exportable)

NOTE:
@frgfm as discussed there is some space for improvements (have fun 😄)

(cct_2.pth attached)
cct_2.zip