Fix test_eager_matches_sdpa_inference for XPU backend #34889
Conversation
Force-pushed from c472e44 to be8177c
Looks like Friday evening is not the best time to run CI. I pushed the same code 3 times and saw different errors on each run :). Not related to the change, I think. Will continue on Monday :).
Clarified XPU backend behavior for
# As of PyTorch 2.5 XPU backend supports only torch.nn.attention.SDPBackend.MATH
# which is implemented on PyTorch level using aten operators and is
# device agnostic with respect to implementation of each aten operator.
atol = atols["cuda", False, torch_dtype]
nit:
Although XPU only supports MATH, does that mean the results from XPU will be the same as, say, CUDA? Otherwise, I don't see the reason to reuse the CUDA thresholds.
In my understanding, at least for recent versions of PyTorch (2.5 and the upcoming 2.6), MATH should give identical results on any hardware, including different GPU devices and CPU, because the algorithm is implemented at the torch level and is device agnostic (well, up to the implementation of each aten operator, which is device specific, but those should still give the same results). The only exception here is MPS, which has a separate branch in the code, though that is also implemented at the torch level. Here are the relevant places in the sources (a small sanity-check sketch follows the links):
- https://github.com/pytorch/pytorch/blob/5ececd4caa4ec1534c567ec9db6efb861b760685/aten/src/ATen/native/transformers/attention.cpp#L776 (see the separate code branch for MPS right above this code)
- https://github.com/pytorch/pytorch/blob/5ececd4caa4ec1534c567ec9db6efb861b760685/aten/src/ATen/native/transformers/attention.cpp#L795
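To illustrate that claim, here is a small standalone sanity check (a sketch only, assuming a PyTorch 2.5 build with an available XPU device; not part of this PR):

import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Random query/key/value tensors in (batch, heads, seq_len, head_dim) layout.
q, k, v = (torch.randn(2, 4, 16, 32) for _ in range(3))

with sdpa_kernel(SDPBackend.MATH):
    ref = F.scaled_dot_product_attention(q, k, v)  # CPU reference
    out = F.scaled_dot_product_attention(
        q.to("xpu"), k.to("xpu"), v.to("xpu")  # assumes an XPU device is present
    ).cpu()

# MATH is built from device-agnostic aten ops, so the two results should agree
# within ordinary floating-point tolerances.
print((ref - out).abs().max())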
So, per the above, I think the current reuse of the CUDA thresholds is reasonable... That being said, the question is whether this is sustainable in the longer term, or whether we will soon need to adjust the thresholds for XPU. Well, we might need to. It's likely that upstream PyTorch XPU will get an implementation of one or both of the attention algorithms, and I am not sure those will behave the same as CUDA. Plus, there is also IPEX for XPU, which might behave differently. And all of that might be version dependent....
Anyhow, whether we reuse the CUDA thresholds or not is a call we need to make here. In any case, I would start by copying the CUDA thresholds to an XPU-specific location. Note that we might need a few iterations to settle everything down.
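For illustration, that copy step could be as simple as the following sketch (the key layout of the tolerance dict here is an assumption, not the exact structure used in the test):

import torch

# Hypothetical tolerance table keyed by (device, enable_kernels, dtype),
# mirroring the lookup pattern used in the test.
atols = {
    ("cuda", False, torch.float32): 1e-6,
    ("cuda", False, torch.bfloat16): 1e-2,
    ("cpu", False, torch.float32): 1e-6,
}

# Seed XPU-specific entries from the CUDA ones, so they can later be tuned
# independently if XPU-native SDPA kernels or IPEX start to diverge from the
# MATH results.
for (device, enable_kernels, dtype), value in list(atols.items()):
    if device == "cuda":
        atols[("xpu", enable_kernels, dtype)] = value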
Sounds reasonable to reuse the (MATH) thresholds for now. Thank you for explaining!
I ran
Above being said, we synced with @faaany offline and she did try this PR on her side. I believe she saw the tests passing for her without IPEX, but she mentioned that some tests are failing for her with IPEX. The latter point is different from the test results I have. I suspect that's due to the different IPEX versions we tried: I think @faaany tried a later IPEX version (she's offline now, I will check with her later on that). Note that without this PR
Thank you again!
@faaany Do you have any further comments?
For myself: need to follow up with #34941 (SDPA tests for BEiT), which is being reviewed in parallel.
The failure on CI seems unrelated. I also see some flakiness in the CI results on main. I tried to rebase a couple of times, but this did not help. No changes since the last review.
Yes, we do have some flaky tests (trying to fix them, but at a slow pace).
…ds.cuda.sdp_kernel Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
As of PyTorch 2.5 XPU backend supports only torch.nn.attention.SDPBackend.MATH which is implemented on PyTorch level using aten operators and is device agnostic with respect to implementation of each aten operator. Thus, we can reuse CUDA (or CPU) MATH weights for XPU. Fixes: huggingface#34888 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
…in nemotron Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
There is no need to have a green CI if you think some failures are irrelevant. Ping me for a double check, then we can wait for a core maintainer's review.
Yeah, I think that's the case here. I rebased today without any changes, and another test failed this time, I believe unrelated as well. Sorry, I am just paranoid about a green CI - I consider it my responsibility to achieve that on my PRs.
Feel free to merge if it's alright with you @ydshieh
@@ -607,7 +607,7 @@ def get_mean_reldiff(failcase, x, ref, atol, rtol):

# TODO: test gradients as well (& for FA2 as well!)
with torch.no_grad():
-    with torch.backends.cuda.sdp_kernel(
+    with sdpa_kernel(
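For reference, a minimal before/after sketch of that API change (a standalone example, not the actual test code):

import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

q, k, v = (torch.randn(1, 8, 128, 64) for _ in range(3))

# Old, deprecated form:
#   with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True,
#                                       enable_mem_efficient=False):
#       ...
# New form: select the backend(s) explicitly via torch.nn.attention.sdpa_kernel.
with torch.no_grad(), sdpa_kernel(SDPBackend.MATH):
    out = F.scaled_dot_product_attention(q, k, v)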
thanks for updating!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…e#34889) * Use torch.nn.attention.sdpa_kernel instead of deprecated torch.backends.cuda.sdp_kernel Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> * Fix test_eager_matches_sdpa_inference for XPU backend As of PyTorch 2.5 XPU backend supports only torch.nn.attention.SDPBackend.MATH which is implemented on PyTorch level using aten operators and is device agnostic with respect to implementation of each aten operator. Thus, we can reuse CUDA (or CPU) MATH weights for XPU. Fixes: huggingface#34888 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> * Use torch.amp.autocast instead of deprecated torch.cuda.amp.autocast in nemotron Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> --------- Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Included fixes:
- Use torch.nn.attention.sdpa_kernel instead of deprecated torch.backends.cuda.sdp_kernel
- Use torch.amp.autocast instead of deprecated torch.cuda.amp.autocast in nemotron (see the sketch below)
- Fix test_eager_matches_sdpa_inference for XPU backend (as of PyTorch 2.5, XPU supports only torch.nn.attention.SDPBackend.MATH)

Fixes: #34888
CC: @amyeroberts @ydshieh
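A minimal sketch of the autocast change mentioned for nemotron (illustrative only; the real model code differs, and a CUDA device is assumed - "xpu" works analogously):

import torch

x = torch.randn(8, 16, device="cuda")  # assumption: a CUDA device is available

# Deprecated form (emits a FutureWarning on recent PyTorch):
#   with torch.cuda.amp.autocast(dtype=torch.bfloat16):
#       y = x @ x.T
# Replacement: pass the device type explicitly to the generic torch.amp.autocast.
with torch.amp.autocast("cuda", dtype=torch.bfloat16):
    y = x @ x.T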