
allow unused input parameters passthrough when chunking in asr pipelines #33889

Merged

Conversation

VictorAtIfInsurance
Contributor

@VictorAtIfInsurance VictorAtIfInsurance commented Oct 2, 2024

What does this PR do?

Harmonizes the behaviour for which input data is passed through automatic speech recognition pipelines.

Today you can get different return dictionary keys when using the ASR pipeline with and without the call parameter chunk_length_s. With this parameter you only get a dictionary with the transformed input data, while omitting it lets you retrieve any unused input data back in the resulting dictionary. This inconsistency can be confusing when you need to keep track of which inputs correspond to which outputs, and it is not obvious to a user why it happens.

The following code snippet highlights these differences:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
import numpy as np


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3-turbo"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

data = {"raw": np.zeros(100), "sampling_rate": 16_000, "id": 1}

chunk_output = pipe(data.copy(), chunk_length_s=30)
non_chunk_output = pipe(data.copy())

print(chunk_output)
print(non_chunk_output)
assert chunk_output.keys() == non_chunk_output.keys()

This prints the following, and the assert fails:

{'text': ' you'}
{'text': ' you', 'id': [1]}

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@LysandreJik
Member

cc @Rocketknight1 for pipelines and @ylacombe for ASR

@ylacombe
Contributor

ylacombe commented Oct 4, 2024

Hey @VictorAtIfInsurance, thanks for this PR!
Could you please provide some motivation for the PR, and code snippets if necessary?

From a quick look, I think we'd rather raise an error when some parameters are unused, rather than just passing them through. Especially because we want to avoid silent errors or mis-usages. For example, let's say I misspelled the much-used return_timestamps as retun_timestamps. With your PR, there would be no warnings and no errors, and the resulting dictionary would not be as expected!
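To make the concern concrete, here is a minimal, illustrative sketch (not the transformers implementation; `run_pipeline_permissive`, `run_pipeline_strict`, and `KNOWN_PARAMS` are hypothetical stand-ins) of permissive passthrough versus strict validation of unknown kwargs:

```python
# Hypothetical set of parameters the pipeline actually understands.
KNOWN_PARAMS = {"chunk_length_s", "return_timestamps"}

def run_pipeline_permissive(audio, **kwargs):
    # Unknown keys are silently carried into the output: a typo like
    # "retun_timestamps" goes unnoticed and timestamps are never produced.
    extra = {k: v for k, v in kwargs.items() if k not in KNOWN_PARAMS}
    # ... run the model on `audio` with the known kwargs ...
    return {"text": "...", **extra}

def run_pipeline_strict(audio, **kwargs):
    # Unknown keys raise immediately, so the typo is caught at call time.
    unknown = set(kwargs) - KNOWN_PARAMS
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    return {"text": "..."}
```

The permissive variant returns `{'text': '...', 'retun_timestamps': True}` for the misspelled call, while the strict variant raises a ValueError.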

@VictorAtIfInsurance
Contributor Author

VictorAtIfInsurance commented Oct 4, 2024

Hi,

I should've marked this as WIP. The test is not set up to properly test the intention of the PR; it should cover unused inputs, not any unused parameter as it is written now.

One of the reasons for this PR is the inconsistent behaviour when passing chunk_length_s as a parameter to the pipeline.

E.g. I would've expected the outputs from these two different pipe calls to have the same structure, i.e. that the assert would pass:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
import numpy as np


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3-turbo"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

data = {"raw": np.zeros(100), "sampling_rate": 16_000, "id": 1}

chunk_output = pipe(data.copy(), chunk_length_s=30)
non_chunk_output = pipe(data.copy())

print(chunk_output)
print(non_chunk_output)
assert chunk_output.keys() == non_chunk_output.keys()

But printing the outputs gives

{'text': ' you'}
{'text': ' you', 'id': [1]}

I.e. the unused parameter in the input "survives" when not chunking but is removed when chunking.

The behaviour of the non-chunking pipeline call comes from this line, so I would expect something similar for chunking as well:

yield {"is_last": True, **processed, **extra}
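As an illustrative sketch (a simplified stand-in, not the actual transformers `chunk_iter`), the proposed fix amounts to the chunking branch also forwarding the unused input keys (`extra`) alongside each chunk, mirroring the yield above:

```python
def chunk_iter(raw, chunk_len, extra):
    # Walk over the raw audio in fixed-size chunks, forwarding any unused
    # input keys (`extra`) with every chunk so they survive into the output.
    n = len(raw)
    for start in range(0, n, chunk_len):
        chunk = raw[start:start + chunk_len]
        is_last = start + chunk_len >= n
        yield {"is_last": is_last, "chunk": chunk, **extra}
```

With this shape, the chunked and non-chunked code paths carry the same extra keys, so the output dictionaries end up with the same structure.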

The behaviour of the non-chunking pipeline is especially useful when you generate your input data through a generator: you can call the pipeline directly with the generator and still retrieve the metadata it returns, without the extra step of first fetching the next item and calling the pipe separately. See example:

def generate_data():
    for i in range(10):
        yield {"raw": np.zeros(100), "sampling_rate": 16_000, "id": i}

print("Non chunking")
for output in pipe(generate_data()):
    print(output)

print()
print("Chunking")
for output in pipe(generate_data(), chunk_length_s=30):
    print(output)

which prints

Non chunking
{'text': ' you', 'id': [0]}
{'text': ' you', 'id': [1]}
{'text': ' you', 'id': [2]}
{'text': ' you', 'id': [3]}
{'text': ' you', 'id': [4]}
{'text': ' you', 'id': [5]}
{'text': ' you', 'id': [6]}
{'text': ' you', 'id': [7]}
{'text': ' you', 'id': [8]}
{'text': ' you', 'id': [9]}

Chunking
{'text': ' you'}
{'text': ' you'}
{'text': ' you'}
{'text': ' you'}
{'text': ' you'}
{'text': ' you'}
{'text': ' you'}
{'text': ' you'}
{'text': ' you'}
{'text': ' you'}

In these instances it would've been super useful to pass along the unused inputs.

I hope this makes sense and that you agree with the intended behaviour. If so, I will update the tests accordingly. Sorry for posting unfinished code for review.

@VictorAtIfInsurance VictorAtIfInsurance changed the title allow unused parameter passthrough when chunking in asr pipelines [WIP] allow unused parameter passthrough when chunking in asr pipelines Oct 4, 2024
@VictorAtIfInsurance VictorAtIfInsurance changed the title [WIP] allow unused parameter passthrough when chunking in asr pipelines [WIP] allow unused input parameters passthrough when chunking in asr pipelines Oct 4, 2024
@VictorAtIfInsurance VictorAtIfInsurance changed the title [WIP] allow unused input parameters passthrough when chunking in asr pipelines allow unused input parameters passthrough when chunking in asr pipelines Oct 7, 2024
@VictorAtIfInsurance VictorAtIfInsurance force-pushed the feature/parameter-passthrough branch from 2ebdb16 to ac1116b Compare October 7, 2024 13:59
@VictorAtIfInsurance
Contributor Author

@ylacombe

I've now added some code examples explaining my issue, and updated the test accordingly:)

Contributor

@ylacombe ylacombe left a comment


Hey @VictorAtIfInsurance, this actually makes sense, especially since we pass extra parameters when we don't chunk.

I'm pretty sure it doesn't align with the pipeline philosophy though, and we could maybe ask @Rocketknight1's opinion on this!

@Rocketknight1
Member

I think keeping the output consistent with different input kwargs definitely fits with our philosophy!

@VictorAtIfInsurance we're actually working on a project to standardize the inputs and outputs for our pipeline classes, to match an inference spec that will be common across lots of projects like TGI / transformers.js / etc. See this PR.

If this PR makes the pipeline always compliant with the Hub output spec, then I'm definitely in favour of it! Can you:

  1. Rebase on the latest version of main to make sure your PR is doing the spec compliance tests
  2. Add a compliance test when chunk_length_s is passed as well?

Other than that this PR looks good to me.

@VictorAtIfInsurance
Contributor Author

VictorAtIfInsurance commented Oct 11, 2024

Hi,

Thanks for the input.

I rebased and added a check against AutomaticSpeechRecognitionOutput using the compare_pipeline_output_to_hub_spec test utility method.

As I change the output of the pipeline call, the test fails locally, which is to be expected since the output doesn't follow the spec's structure. So this is probably not the way forward. Then again, the non-chunk version of the ASR pipeline has the same problem on main; see the second assert in test_pipeline_output_on_unused_kwargs in #34087.

What would be the correct way to handle invalid kwargs with the new standardized inputs for pipelines? Raise an exception, as in test_fail_on_invalid_input_kwargs in the same PR (#34087)?

Or maybe a warning would be better in that case, since raising could break backwards compatibility for users?

What would be the best next step here, @ylacombe? I think consistency is important, but maybe my proposed change is not the way to go if it deviates from the standardized approach.
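To make the spec discussion concrete, here is a minimal, illustrative sketch of the kind of check a hub-spec compliance test performs. `AsrOutputSpec` and `unexpected_keys` are hypothetical stand-ins, not the actual AutomaticSpeechRecognitionOutput class or the compare_pipeline_output_to_hub_spec utility:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class AsrOutputSpec:
    # Illustrative spec: only these keys are allowed in a pipeline output.
    text: str
    chunks: Optional[list] = None

def unexpected_keys(output: dict, spec) -> list:
    # Return output keys that the spec does not allow, sorted for stability.
    allowed = {f.name for f in fields(spec)}
    return sorted(set(output) - allowed)
```

Under such a check, `{'text': ' you'}` passes, while `{'text': ' you', 'id': [1]}` fails with `['id']` as the unexpected key, which is exactly why the passthrough output conflicts with a strict spec.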

Contributor

@ylacombe ylacombe left a comment


Hey @VictorAtIfInsurance, indeed it is not compatible with AutomaticSpeechRecognitionOutput, which you can find here.

I think keeping the output consistent with different input kwargs definitely fits with our philosophy!

@Rocketknight1, I'm not sure I follow. AutomaticSpeechRecognitionOutput is really constraining, and some models (say, Whisper) might return many more output fields! How can we reconcile both?

@Rocketknight1
Member

@ylacombe It's okay for models to return many output fields, but we're trying to get the pipelines to have standardized inputs/outputs, so that they can fit into a standard inference API.

However, in this case, the pipeline obviously has compliance issues already, as mentioned in #34087. I didn't realize this, I'm sorry! I don't think we should delay this PR because of issues in main, so for now, I think we can skip the compliance checks I suggested, and I'll work on making the outputs compliant later in a separate PR.

@VictorAtIfInsurance
Contributor Author

VictorAtIfInsurance commented Oct 16, 2024

@ylacombe I've removed the compliance check in the test, making the test pass. Let me know if I should update anything more here.

@VictorAtIfInsurance VictorAtIfInsurance force-pushed the feature/parameter-passthrough branch from fd8d15f to ec1d73b Compare October 16, 2024 13:20
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@VictorAtIfInsurance
Contributor Author

Rebased on main, as some tests failed due to missing files.

@VictorAtIfInsurance VictorAtIfInsurance force-pushed the feature/parameter-passthrough branch 2 times, most recently from 1fd7168 to 9cdf43d Compare October 25, 2024 08:58
@VictorAtIfInsurance
Contributor Author

@Rocketknight1 Could we rerun the workflow? The last run failed because I was missing some files from main, but it should be fine now that I've rebased.

@VictorAtIfInsurance
Contributor Author

@ylacombe are we ready to merge? :)

Contributor

@ylacombe ylacombe left a comment


LGTM, let's rebase on main?

@VictorAtIfInsurance VictorAtIfInsurance force-pushed the feature/parameter-passthrough branch from 9cdf43d to 3835ade Compare October 31, 2024 13:18
@VictorAtIfInsurance
Contributor Author

Sure, though there were no merge conflicts.

@ylacombe ylacombe requested a review from ArthurZucker October 31, 2024 15:52
Collaborator

@ArthurZucker ArthurZucker left a comment


Very nice thanks!

@VictorAtIfInsurance
Contributor Author

@ylacombe would you mind merging this PR? I do not have write access to do it myself.

@ArthurZucker ArthurZucker merged commit a0f4f31 into huggingface:main Nov 25, 2024
26 checks passed
@ArthurZucker
Collaborator

Merged, sorry for the wait!

@VictorAtIfInsurance VictorAtIfInsurance deleted the feature/parameter-passthrough branch November 26, 2024 10:01
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
…nes (huggingface#33889)

* allow unused parameter passthrough when chunking in asr pipelines

* format code

* format

* run fixup

* update tests

* update parameters to pipline in test

* updates parametrs in tests

* change spelling in gitignore

* revert .gitignore to main

* add git ignore of devcontainer folder

* assert asr output follows expected inference output type

* run fixup

* Remove .devcontainer from .gitignore

* remove compliance check
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024
…nes (huggingface#33889)