[Tests] Diverse Whisper fixes #33665
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LMK when you want me to review @ylacombe (please fix the tests and the conflicts first 😊)
Hey @LysandreJik, could you review now? It seems that the failing check is unrelated to Whisper!
I'm not totally up to date with the changes in `generation_whisper`, so I'll trust you with them; if you want a double check feel free to cc @ArthurZucker, but if you're confident you can go ahead and merge.
cc @ArthurZucker for a quick look!
LGTM, let's just run slow before we merge? 🤗
cc @ydshieh, there's still this error when doing slow tests:
Could you rebase and trigger again with …
It's working, thanks!
The list of failing tests is as expected, merging!
Great work! I noticed there's a small issue in this PR, especially when … To fix this quickly, you can update …:
cross_attentions.append(torch.cat([x[i] for x in generate_outputs.cross_attentions], dim=2)[:, :, num_input_ids-3:, :])
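For context, here is a minimal sketch of what that slicing does, using dummy tensors instead of real Whisper outputs. `num_input_ids` and the `-3` offset are taken from the snippet above; everything else (shapes, values) is an illustrative assumption, not code from the PR.

```python
import torch

# Illustrative shapes only (hypothetical values, not taken from the PR).
batch, heads, frames = 1, 8, 1500
num_input_ids = 4      # decoder prompt tokens (task/language/timestamp tokens)
num_generated = 10     # tokens produced by generate()

# generate() returns cross_attentions as one tuple per generated step; for a given
# decoder layer i, the first step covers the whole prompt and each later step adds
# a single query position.
layer_i_steps = [torch.rand(batch, heads, num_input_ids, frames)] + [
    torch.rand(batch, heads, 1, frames) for _ in range(num_generated - 1)
]

# Concatenate along the query dimension, then slice off the leading prompt
# positions so that (mostly) only the attentions of generated tokens remain.
# The "-3" offset mirrors the snippet above and is treated as an assumption here.
full = torch.cat(layer_i_steps, dim=2)           # (batch, heads, prompt + generated, frames)
generated_only = full[:, :, num_input_ids - 3:, :]
print(full.shape, generated_only.shape)
```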
Hey @leng-yue, thanks for your message!
Basically …
* fix beam indices in token_timestamps
* fix attention_mask in FA2
* correct translation example with the right example
* correct how somes tests are using outputs + correct num_frames
* fix shortform batch prev cond tests
* make fix-copies
* make fix-copies
* take care of shifting beam indices
* [run-slow] whisper
* [run-slow] whisper
What does this PR do?
There are a lot of pending failing tests with Whisper. This PR addresses some issues:
* `decoder_input_ids` were once `forced_input_ids`. This had an impact on the `beam_indices`: `beam_indices` has a length of `decoder_input_ids + potentially_generated_ids` but doesn't take `decoder_input_ids` into account when keeping track of the indices. In other words, `beam_indices[0]` is really the beam index of the first generated token, not of `decoder_input_ids[0]` (see the sketch after this list).
* The Flash-Attention 2 attention mask was causing an issue.
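To make the `beam_indices` offset concrete, here is a minimal sketch of the pitfall. All names, shapes, and data below are illustrative assumptions, not the PR's actual implementation: it only shows why lookups over the full sequence have to be shifted by the prompt length when `beam_indices` covers generated tokens only.

```python
import torch

# Illustrative sizes (hypothetical, not from the PR).
num_decoder_input_ids = 4   # prompt tokens fed to the decoder
num_generated = 6           # tokens produced during beam search
num_beams = 3

# beam_indices has one entry per *generated* position: which beam the token at
# that step was taken from. It does not cover the decoder prompt.
beam_indices = torch.randint(0, num_beams, (num_generated,))

# Per-token data (e.g. scores or timestamps) laid out over the FULL sequence,
# prompt + generated, for every beam.
per_token_data = torch.rand(num_beams, num_decoder_input_ids + num_generated)

# beam_indices[j] describes the j-th *generated* token, so the matching position
# in the full sequence is offset by the number of decoder input ids; indexing
# position j directly would silently mix up prompt and generated positions.
gathered = torch.stack(
    [per_token_data[beam_indices[j], num_decoder_input_ids + j] for j in range(num_generated)]
)
print(gathered.shape)  # -> torch.Size([6])
```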
The remaining work is done on the modeling tests. Note that some of these tests were failing for straightforward reasons - e.g. the output was a dict - and are actually still failing, but their reasons for failing are no longer straightforward. Debugging will be easier though.
Note: With #33450 and this, we're down from 29 failing tests to 17