Generate: assistant should sample when the main model samples #33534

gante · 2024-09-17T12:01:39Z

What does this PR do?

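TL;DR: in assisted generation, the assistant model must sample when the main model is sampling. Otherwise, the mathematical guarantees of the corresponding code path do not hold, i.e. the generated tokens are no longer distributed as the main model's own samples would be (see the speculative decoding paper).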

This reverts #30778, where I forced the assistant model to always run greedy decoding for speed purposes (more matched candidate tokens = faster).
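To illustrate the point above, here is a minimal, self-contained sketch of a single speculative sampling step over a toy three-token vocabulary. It is not the `transformers` implementation: the function names (`residual`, `emitted_distribution`) and the distributions `p` (main model) and `q` (assistant) are hypothetical, chosen only for illustration. The sketch computes the exact distribution of the emitted token under the standard accept/resample rule, once with the candidate sampled from `q` and once with the candidate picked greedily from `q`.

```python
def residual(p, q):
    """Normalized max(0, p - q): the distribution used after a rejection."""
    r = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(r)
    # If p == q there is no rejection mass; return p so the function stays total.
    return [ri / total for ri in r] if total > 0 else list(p)


def emitted_distribution(p, q, proposal):
    """Exact distribution of the token emitted by one speculative step.

    p:        main model distribution
    q:        assistant distribution used in the acceptance test
    proposal: distribution the candidate token is actually drawn from
    """
    res = residual(p, q)
    out = [0.0] * len(p)
    for x, prob_x in enumerate(proposal):
        if prob_x == 0.0:
            continue
        # Accept candidate x with probability min(1, p(x) / q(x)).
        accept = min(1.0, p[x] / q[x])
        out[x] += prob_x * accept
        # On rejection, the emitted token is drawn from the residual.
        for y, res_y in enumerate(res):
            out[y] += prob_x * (1.0 - accept) * res_y
    return out


# Toy distributions (assumed values, for illustration only).
p = [0.5, 0.3, 0.2]  # main model
q = [0.6, 0.3, 0.1]  # assistant model

# Greedy assistant: all proposal mass on argmax(q).
greedy = [1.0 if i == q.index(max(q)) else 0.0 for i in range(len(q))]

sampled_out = emitted_distribution(p, q, proposal=q)
greedy_out = emitted_distribution(p, q, proposal=greedy)

print("candidate sampled from q:", sampled_out)
print("candidate greedy from q: ", greedy_out)
```

In this toy setup, the sampled candidate reproduces `p` up to floating-point error, while the greedy candidate puts roughly 0.83 of the mass on token 0 and none on token 1, so the output no longer follows the main model's distribution.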

LysandreJik

Thanks @gante!

HuggingFaceDocBuilderDev · 2024-09-20T16:12:54Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


LysandreJik approved these changes Sep 17, 2024


it should sample

cb6e1ba

gante force-pushed the fix_32867 branch from e9a18e8 to cb6e1ba Compare September 20, 2024 15:48

gante merged commit 77c5d59 into huggingface:main Sep 20, 2024
21 checks passed

gante deleted the fix_32867 branch September 20, 2024 16:01

amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Oct 2, 2024

Generate: assistant should sample when the main model samples (huggin…

95b5405

…gface#33534)

keyboardAnt mentioned this pull request Nov 1, 2024

Speculative decoding: Test the _target distribution (to prevent issues like #32867) #34553

Merged

3 tasks

keyboardAnt mentioned this pull request Nov 16, 2024

[OLD] New PR: #35029. [[Universal Speculative Decoding CandidateGenerator]] #34760

Closed

6 tasks

keyboardAnt mentioned this pull request Nov 30, 2024

Universal Speculative Decoding CandidateGenerator #35029

Open

6 tasks

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024

Generate: assistant should sample when the main model samples (huggin…

3aa2ec7

…gface#33534)

BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024

Generate: assistant should sample when the main model samples (huggin…

596595f

…gface#33534)
