
[Tests] Diverse Whisper fixes #33665

Merged · 13 commits · Oct 3, 2024
Conversation

@ylacombe (Contributor) commented Sep 23, 2024

What does this PR do?

There are a lot of pending failing tests with Whisper. This PR addresses some of the issues:

  1. #31683 (Whisper - list index out of range with word level timestamps) and #31770 (Whisper fix audio out of range) reported out-of-range word-level timestamps. This happens because decoder_input_ids were once forced_input_ids, which had an impact on beam_indices.

beam_indices has the length of decoder_input_ids + potentially generated ids, but doesn't take decoder_input_ids into account when keeping track of the indices. In other words, beam_indices[0] is really the beam index of the first generated token, not of decoder_input_ids[0].
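A minimal sketch of the offset described above (shapes and values are illustrative assumptions, not the actual output of a `generate()` call):

```python
import torch

# Illustrative example of the beam_indices offset (values are assumed).
decoder_input_ids = torch.tensor([[50258, 50259, 50359]])  # 3 forced tokens
num_forced = decoder_input_ids.shape[1]

# beam_indices spans len(decoder_input_ids) + generated tokens in length,
# but beam_indices[:, 0] tracks the beam of the first *generated* token,
# not decoder_input_ids[:, 0] (trailing -1s are padding).
beam_indices = torch.tensor([[0, 1, 0, -1, -1, -1]])

# Shift right by num_forced so position i refers to token i of the full
# (forced + generated) sequence:
aligned = torch.full_like(beam_indices, -1)
aligned[:, num_forced:] = beam_indices[:, : beam_indices.shape[1] - num_forced]
print(aligned)  # tensor([[-1, -1, -1,  0,  1,  0]])
```

With this alignment, indexing the beam trace at the position of a generated token no longer runs off the end of the tensor, which is the class of out-of-range error the linked issues describe.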

  2. The Flash-Attention 2 attention mask was causing an issue.

  3. The remaining work is done on the modeling tests. Note that some of these tests were failing for straightforward reasons, e.g. the output was a dict. They may still fail, but no longer for those straightforward reasons, so debugging will be easier.

Note: with #33450 and this PR, we're down from 29 failing tests to 17.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@LysandreJik (Member)

LMK when you want me to review @ylacombe (please fix the tests and the conflicts first 😊)

@ylacombe (Contributor, Author) commented Oct 2, 2024

Hey @LysandreJik, could you review now? It seems that the failing check is unrelated to Whisper!

@LysandreJik (Member) left a comment

I'm not totally up to date with the changes in generation_whisper, so I'll trust you with them; if you want a double check, feel free to cc @ArthurZucker, but if you're confident you can go ahead and merge.

@ylacombe (Contributor, Author) commented Oct 3, 2024

cc @ArthurZucker for a quick look!

@ylacombe ylacombe requested a review from ArthurZucker October 3, 2024 09:55
@ArthurZucker (Collaborator) left a comment

LGTM, let's just run slow before we merge? 🤗

@ylacombe (Contributor, Author) commented Oct 3, 2024

cc @ydshieh, there's still this error when running the slow tests:

```
File "/usr/lib/python3.8/runpy.py", line 194 in _run_module_as_main
Bus error (core dumped)
```

@ydshieh (Collaborator) commented Oct 3, 2024

Hi @ylacombe, let me merge #33849, then try rebasing this PR on top of main and let's see.

@ydshieh (Collaborator) commented Oct 3, 2024

Could you rebase and trigger again with `[run-slow] whisper`?

@ylacombe (Contributor, Author) commented Oct 3, 2024

It's working, thanks!

@ylacombe (Contributor, Author) commented Oct 3, 2024

The list of failing tests is as expected, merging!

@ylacombe ylacombe merged commit bf0ffe3 into huggingface:main Oct 3, 2024
18 of 20 checks passed
@leng-yue commented Oct 4, 2024

Great work! I noticed a small issue in this PR: when prompt_ids is specified, the model always returns a timestamp of 0.0.

To fix this quickly, you can update

```python
cross_attentions.append(torch.cat([x[i] for x in generate_outputs.cross_attentions], dim=2))
```

to:

```python
cross_attentions.append(torch.cat([x[i] for x in generate_outputs.cross_attentions], dim=2)[:, :, num_input_ids-3:, :])
```
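For context, a shape sketch of what this concatenation does (the shapes are assumptions for illustration, not the exact internals of `generate()`):

```python
import torch

# Assumed layout: generate() returns cross_attentions as a tuple over
# generated steps, each step a tuple over decoder layers of tensors
# shaped [batch, num_heads, query_len, encoder_len] (query_len == 1 per step).
batch, heads, enc_len, num_layers, num_steps = 1, 4, 10, 2, 6
generate_cross_attentions = tuple(
    tuple(torch.rand(batch, heads, 1, enc_len) for _ in range(num_layers))
    for _ in range(num_steps)
)

# Concatenating over dim=2 stacks the per-step query axis back into a
# single [batch, heads, num_steps, encoder_len] tensor per layer:
cross_attentions = []
for i in range(num_layers):
    cross_attentions.append(
        torch.cat([x[i] for x in generate_cross_attentions], dim=2)
    )
```

The proposed `[:, :, num_input_ids-3:, :]` slice then trims leading positions along that stacked query axis (dim 2) before the attention weights are used for timestamp alignment.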

@ylacombe (Contributor, Author) commented Oct 7, 2024

Hey @leng-yue, thanks for your message!
While there may be something wrong with how cross-attention weights are computed, I'm not sure your proposed solution works.
Could you provide a code snippet, and maybe an explanation of the proposed solution, if you have time?
Thanks!

@leng-yue commented Oct 8, 2024

Basically, num_input_ids is prefix_tokens (timestamps, language, etc.) + prompt_ids; therefore, we want to strip the prompt_ids, which is what slicing from num_input_ids - 3 does.
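The arithmetic behind that slice start can be sketched as follows (the lengths are hypothetical; only the `3` comes from the snippet above):

```python
# Illustrative arithmetic for the slice above (lengths are assumptions).
# num_input_ids = prefix tokens (timestamps, language, etc.) + prompt_ids
num_prefix_tokens = 3        # the "3" in `num_input_ids - 3`
prompt_ids_len = 5           # hypothetical prompt length
num_input_ids = num_prefix_tokens + prompt_ids_len

# The slice start `num_input_ids - 3` therefore equals the prompt length,
# so `[:, :, num_input_ids-3:, :]` drops exactly the prompt positions
# from the cross-attention time axis while keeping the prefix onward.
start = num_input_ids - num_prefix_tokens
print(start)
```

In other words, the slice keeps only the positions that were not contributed by prompt_ids, which is why the commenter expects it to restore non-zero timestamps.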

NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Oct 21, 2024
* fix beam indices in token_timestamps

* fix attention_mask in FA2

* correct translation example with the right example

* correct how somes tests are using outputs + correct num_frames

* fix shortform batch prev cond tests

* make fix-copies

* make fix-copies

* take care of shifting beam indices

* [run-slow] whisper

* [run-slow] whisper
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024