Allow compressed-tensors quantized model to be trained #34520

Conversation

@horheynm (Contributor) commented Oct 30, 2024

What does this PR do?

Using HFQuantizer, models that were quantized using compressed-tensors can be loaded.

The purpose of this PR is to allow such quantized models to be trained through the Trainer pathway.
Currently, if the model is quantized (an HFQuantizer is instantiated based on the quantization_config), the Trainer raises an error when training is attempted.

Using llm-compressor, we have a pathway to run oneshot (to quantize) and then finetune (QAT).
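For context, a minimal sketch of the intended flow (this is illustrative, not code from this PR; the checkpoint name and dataset are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Hypothetical checkpoint quantized with compressed-tensors; from_pretrained picks up
# the quantization_config and instantiates the corresponding HFQuantizer internally.
model_id = "org/llama-w8a8-compressed"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Before this PR the Trainer refused to train a quantized base model;
# with this change, compressed-tensors models can go through QAT fine-tuning.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="qat-finetune", per_device_train_batch_size=1),
    train_dataset=train_dataset,  # assumed: an already prepared/tokenized dataset
)
trainer.train()
```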

Who can review?

@SunMarc @younesbelkada

@horheynm horheynm changed the title Nm train quantized models from compressed tensors Allow compressed-tensors quantized model to be trained Oct 30, 2024
@SunMarc (Member) left a comment


Thanks for the PR! I left a few suggestions. Could you explain a bit more how you are performing training with compressed-tensors models if you are not using PEFT? Are you doing QAT, or just adding custom LoRA layers yourself?

src/transformers/utils/quantization_config.py (review comment, outdated, resolved)
src/transformers/trainer.py (review comment, outdated, resolved)
@horheynm horheynm marked this pull request as ready for review November 5, 2024 16:29
@horheynm horheynm marked this pull request as draft November 5, 2024 16:31
@horheynm (Contributor, Author) commented Nov 6, 2024

@SunMarc
Yes, we are quantizing the model using oneshot from compressed-tensors, loading that model using AutoModelForCausalLM and HFQuantizer. Once loaded, we will run the training as QAT fine-tuning. The 'quantization' we run is fake-quant.

We are not using LoRA adapters.
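Roughly, the first two steps of that pathway look like this (the oneshot arguments are abbreviated and the recipe/paths are placeholders; see the llm-compressor docs for the exact signature). The QAT fine-tuning step is then the standard Trainer call sketched in the PR description above.

```python
from llmcompressor.transformers import oneshot  # llm-compressor's one-shot quantization entry point
from transformers import AutoModelForCausalLM

# 1) One-shot quantize a base model with a compressed-tensors recipe (illustrative arguments).
oneshot(
    model="meta-llama/Llama-2-7b-hf",
    recipe="w8a8_recipe.yaml",   # placeholder recipe file
    output_dir="llama2-7b-w8a8",
)

# 2) Reload the quantized checkpoint through transformers; the saved quantization_config
#    makes from_pretrained instantiate the compressed-tensors HFQuantizer.
model = AutoModelForCausalLM.from_pretrained("llama2-7b-w8a8")

# 3) QAT fine-tune with Trainer: weights are fake-quantized, so gradients flow through them.
```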

@horheynm horheynm marked this pull request as ready for review November 13, 2024 17:06
@horheynm horheynm marked this pull request as draft November 13, 2024 17:12
@SunMarc (Member) commented Nov 15, 2024

> Yes, we are quantizing the model using oneshot from compressed-tensors, loading that model using AutoModelForCausalLM and HFQuantizer. Once loaded, we will run the training as QAT fine-tuning. The 'quantization' we run is fake-quant.
>
> We are not using LoRA adapters.

Nice, thanks for confirming! It would be nice to add is_qat_trainable to the base class (HfQuantizer) and set it to False by default. Feel free to ping me when the PR is ready!
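A minimal sketch of that suggestion (names follow the discussion, not necessarily the merged diff verbatim):

```python
# src/transformers/quantizers/base.py (sketch)
class HfQuantizer:
    @property
    def is_qat_trainable(self) -> bool:
        """Whether the quantized model can be fine-tuned (QAT) through the Trainer."""
        return False  # conservative default for all quantizers


# src/transformers/quantizers/quantizer_compressed_tensors.py (sketch)
class CompressedTensorsHfQuantizer(HfQuantizer):
    @property
    def is_qat_trainable(self) -> bool:
        # compressed-tensors checkpoints are fake-quantized, so gradients can flow
        return True
```

The Trainer can then gate its existing "model is quantized" error on this flag instead of raising unconditionally.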

src/transformers/trainer.py (review comment, outdated, resolved)
@horheynm horheynm marked this pull request as ready for review November 19, 2024 21:43
@horheynm (Contributor, Author) commented Nov 19, 2024

> Yes, we are quantizing the model using oneshot from compressed-tensors, loading that model using AutoModelForCausalLM and HFQuantizer. Once loaded, we will run the training as QAT fine-tuning. The 'quantization' we run is fake-quant.
> We are not using LoRA adapters.
>
> Nice, thanks for confirming! It would be nice to add is_qat_trainable to the base class (HfQuantizer) and set it to False by default. Feel free to ping me when the PR is ready!

Hey Marc,

It's ready for review.
There is a test failure, but I think it's from an API timeout. It shows:

1 failed because `requests.exceptions.ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID` -> f55e37eb-9159-4b2b-bb6e-e87085e7590b)')

I tried to rerun, but the option is not clickable from my end. Is there anything I can do from my end to rerun?

Thank you!

@SunMarc (Member) left a comment


Thanks for the PR! I left a few comments. Also, you didn't make any substantial changes in trainer; is that expected?

src/transformers/utils/quantization_config.py (review comment, outdated, resolved)
src/transformers/utils/quantization_config.py (review comment, outdated, resolved)
src/transformers/trainer.py (review comment, outdated, resolved)
src/transformers/quantizers/base.py (review comment, outdated, resolved)
@SunMarc (Member) commented Nov 20, 2024

> I tried to rerun, but the option is not clickable from my end. Is there anything I can do from my end to rerun?

No, don't worry. I will rerun it.

…ithub.com:neuralmagic/upstream-transformers into nm-train-quantized-models-from-compressed-tensors
@horheynm horheynm force-pushed the nm-train-quantized-models-from-compressed-tensors branch from 8cc545b to 90a92f1 Compare November 21, 2024 14:28
@SunMarc (Member) left a comment


LGTM! Just a nit about is_qat_trainable for compressed_tensors: I don't think we should set it to True by default. Maybe this is linked to the run_compressed var that you wanted to add in another PR.
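For illustration, the nit could look something like this (hypothetical; run_compressed handling was planned for a follow-up PR, so the attribute access below is an assumption):

```python
class CompressedTensorsHfQuantizer(HfQuantizer):
    @property
    def is_qat_trainable(self) -> bool:
        # Only report trainable when weights are decompressed into fake-quant form;
        # a still-compressed (run_compressed=True) checkpoint could not be trained directly.
        return not getattr(self.quantization_config, "run_compressed", False)
```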

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@horheynm (Contributor, Author) commented

@SunMarc
Hi Marc,
This PR is ready to merge.

@SunMarc SunMarc requested a review from ArthurZucker November 22, 2024 15:41
@ArthurZucker (Collaborator) left a comment


Thanks! Given this:

> Using llm-compressor, we have a pathway to run oneshot (to quantize) and then finetune (QAT).

Just want to make sure we are not breaking anything (i.e., do we need a version check for this?)

@dsikka (Contributor) commented Nov 25, 2024

> Thanks! Given this:
>
> > Using llm-compressor, we have a pathway to run oneshot (to quantize) and then finetune (QAT).
>
> Just want to make sure we are not breaking anything (i.e., do we need a version check for this?)

Hi @ArthurZucker, where are you recommending adding a version check?

@horheynm (Contributor, Author) commented

> Thanks! Given this:
>
> > Using llm-compressor, we have a pathway to run oneshot (to quantize) and then finetune (QAT).
>
> Just want to make sure we are not breaking anything (i.e., do we need a version check for this?)

This won't be breaking. We had a oneshot-then-finetune pathway in the past. We are now using HFQuantizer to load the model instead of the custom class (based on AutoModelForCausalLM) we used to have. So previous versions of llm-compressor (the library used to run oneshot and finetune) and compressed-tensors (the library used to compress/decompress tensors) will still support this pipeline.

If we want to add a version check for extra safety, I would be happy to add one!

Let me know!
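If a guard were ever wanted, a typical pattern would be a minimum-version check on the compressed-tensors package, e.g. during environment validation. This is a sketch only; the threshold and function name are illustrative, not part of this PR:

```python
from importlib.metadata import version
from packaging.version import parse

def check_compressed_tensors_version(minimum: str = "0.7.0") -> None:
    """Raise if the installed compressed-tensors package is older than `minimum`."""
    installed = parse(version("compressed-tensors"))
    if installed < parse(minimum):
        raise ImportError(
            f"compressed-tensors>={minimum} is required for QAT fine-tuning, "
            f"but version {installed} is installed."
        )
```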

@ArthurZucker (Collaborator) left a comment


Thanks for answering. It's fine, sounds like we don't need a check. Merging!

@ArthurZucker ArthurZucker merged commit 57ca9e6 into huggingface:main Nov 28, 2024
24 checks passed
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
…4520)

* populate quantization_config for kv-cache-scheme only configs

* make compressed-tensors quantized models trainable

* populate versions on quant config

* pass oneshot then finetune

* remove breakpoint

* SunMarc comments and fix to_dict logic

* lint

* lint

* test

* comment

* comments'
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024