Conversation

@vrdn-23 vrdn-23 commented Nov 6, 2025

What does this PR do?

This PR fixes a consistency issue with how TEI handles GeLU activation compared to the transformers library and the candle library.

It seems that the value gelu is meant to serialize to an old, incorrect version of the GeLU activation (based on the comment given here), as can be seen from this code snippet in transformers:

ACT2CLS = {
    "gelu": GELUActivation,
    "gelu_10": (ClippedGELUActivation, {"min": -10, "max": 10}),
    "gelu_fast": FastGELUActivation,
    "gelu_new": NewGELUActivation,
    "gelu_python": (GELUActivation, {"use_gelu_python": True}),
    "gelu_pytorch_tanh": GELUTanh,
    "gelu_python_tanh": (GELUTanh, {"use_gelu_tanh_python": True}),
    "gelu_accurate": AccurateGELUActivation,
    "laplace": LaplaceActivation,
    "leaky_relu": nn.LeakyReLU,
    "linear": LinearActivation,
    "mish": MishActivation,
    "quick_gelu": QuickGELUActivation,
    "relu": nn.ReLU,
    "relu2": ReLUSquaredActivation,
    "relu6": nn.ReLU6,
...

This means that any config that uses the value gelu for hidden_activation ends up using the GELUActivation function, which is based on torch.erf. The tanh-approximated GeLU is referenced via gelu_new or gelu_pytorch_tanh instead.
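
To make the difference concrete, here is a minimal PyTorch sketch (not TEI or candle code) comparing the erf-based GeLU that transformers resolves gelu to against the tanh approximation behind gelu_new / gelu_pytorch_tanh; the input values are arbitrary:

import math

import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# Exact GeLU: x * 0.5 * (1 + erf(x / sqrt(2))) -- what "gelu" resolves to.
gelu_erf = x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

# Tanh approximation -- what "gelu_new" / "gelu_pytorch_tanh" resolve to.
gelu_tanh = torch.nn.functional.gelu(x, approximate="tanh")

print(gelu_erf)
print(gelu_tanh)
print((gelu_erf - gelu_tanh).abs().max())  # small but nonzero difference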

This is also the behavior followed by the huggingface/candle repository here (gelu corresponds to xs.gelu_erf(), not xs.gelu()).

This PR brings the TEI implementation in line with how transformers parses the config.json values and how candle resolves activations.

I came across this inconsistency while reviewing some of the code changes I had in #746, but thought it should be opened as a separate PR, given that it will slightly change (read: correct) existing model behavior. (h/t to @bbaldino for pointing this out to me)

Please do let me know if I'm missing something obvious here as to why TEI is not in sync with how the activation functions are defined. My understanding is that this is just a bug that got carried over from legacy code introduced in #41.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?

Who can review?

@Narsil OR @alvarobartt OR @kozistr


vrdn-23 commented Nov 6, 2025

Okay, so after some more digging, it seems one of the main reasons not to change this would be the speed of gelu_erf() compared to gelu(). I was digging through the candle repository and saw some relevant issues (linked below). I can run some benchmarks later this week to see if there is still a performance loss.

huggingface/candle#1062
huggingface/candle#2418
huggingface/candle#1926 (comment)
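
For reference, a rough micro-benchmark along these lines could look like the sketch below; it uses PyTorch as a stand-in rather than candle's gelu()/gelu_erf(), so the numbers would only be indicative, and the tensor shape and iteration count are arbitrary:

import timeit

import torch

x = torch.randn(512, 1024)

# Exact (erf-based) GeLU vs. the tanh approximation, 1000 calls each.
erf_s = timeit.timeit(lambda: torch.nn.functional.gelu(x), number=1000)
tanh_s = timeit.timeit(
    lambda: torch.nn.functional.gelu(x, approximate="tanh"), number=1000
)

print(f"exact (erf) gelu: {erf_s:.3f}s")
print(f"tanh-approx gelu: {tanh_s:.3f}s")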
