Allow compressed-tensors quantized model to be trained #34520

Conversation

@horheynm (Contributor) commented Oct 30, 2024

What does this PR do?

Using HFQuantizer, models that were quantized using compressed-tensors can be loaded.

The purpose of this PR is to allow such quantized models to be trained through the Trainer pathway.
Currently, if the model is quantized (an HFQuantizer is instantiated based on the quantization_config), the Trainer raises an error when training is attempted.

Using llm-compressor, we have a pathway to run oneshot (to quantize) and then finetune (QAT).
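For context, a minimal sketch of the intended flow (this is illustrative, not code from this PR; the checkpoint name and dataset are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Hypothetical checkpoint quantized with compressed-tensors; from_pretrained picks up
# the quantization_config and instantiates the corresponding HFQuantizer internally.
model_id = "org/llama-w8a8-compressed"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Before this PR the Trainer refused to train a quantized base model;
# with this change, compressed-tensors models can go through QAT fine-tuning.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="qat-finetune", per_device_train_batch_size=1),
    train_dataset=train_dataset,  # assumed: an already prepared/tokenized dataset
)
trainer.train()
```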

Who can review?

@SunMarc @younesbelkada

@horheynm horheynm changed the title Nm train quantized models from compressed tensors Allow compressed-tensors quantized model to be trained Oct 30, 2024
@SunMarc (Member) left a comment


Thanks for the PR! I left a few suggestions. Could you explain a bit more how you are performing training with compressed-tensors models if you are not using PEFT? Are you doing QAT, or just adding custom LoRA layers yourself?

src/transformers/utils/quantization_config.py (review comment, outdated, resolved)
src/transformers/trainer.py (review comment, outdated, resolved)
@horheynm horheynm marked this pull request as ready for review November 5, 2024 16:29
@horheynm horheynm marked this pull request as draft November 5, 2024 16:31
@horheynm (Contributor, Author) commented Nov 6, 2024

@SunMarc
Yes, we are quantizing the model using oneshot from compressed-tensors, loading that model using AutoModelForCausalLM and HFQuantizer. Once loaded, we will run the training as QAT fine-tuning. The 'quantization' we run is fake-quant.

We are not using LoRA adapters.
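Roughly, the first two steps of that pathway look like this (the oneshot arguments are abbreviated and the recipe/paths are placeholders; see the llm-compressor docs for the exact signature). The QAT fine-tuning step is then the standard Trainer call sketched in the PR description above.

```python
from llmcompressor.transformers import oneshot  # llm-compressor's one-shot quantization entry point
from transformers import AutoModelForCausalLM

# 1) One-shot quantize a base model with a compressed-tensors recipe (illustrative arguments).
oneshot(
    model="meta-llama/Llama-2-7b-hf",
    recipe="w8a8_recipe.yaml",   # placeholder recipe file
    output_dir="llama2-7b-w8a8",
)

# 2) Reload the quantized checkpoint through transformers; the saved quantization_config
#    makes from_pretrained instantiate the compressed-tensors HFQuantizer.
model = AutoModelForCausalLM.from_pretrained("llama2-7b-w8a8")

# 3) QAT fine-tune with Trainer: weights are fake-quantized, so gradients flow through them.
```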

@horheynm horheynm marked this pull request as ready for review November 13, 2024 17:06
@horheynm horheynm marked this pull request as draft November 13, 2024 17:12
@SunMarc (Member) commented Nov 15, 2024

> Yes, we are quantizing the model using oneshot from compressed-tensors, loading that model using AutoModelForCausalLM and HFQuantizer. Once loaded, we will run the training as QAT fine-tuning. The 'quantization' we run is fake-quant.
>
> We are not using LoRA adapters.

Nice, thanks for confirming! It would be nice to add is_qat_trainable to the base class (HfQuantizer) and set it to False by default. Feel free to ping me when the PR is ready!
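A minimal sketch of that suggestion (names follow the discussion, not necessarily the merged diff verbatim):

```python
# src/transformers/quantizers/base.py (sketch)
class HfQuantizer:
    @property
    def is_qat_trainable(self) -> bool:
        """Whether the quantized model can be fine-tuned (QAT) through the Trainer."""
        return False  # conservative default for all quantizers


# src/transformers/quantizers/quantizer_compressed_tensors.py (sketch)
class CompressedTensorsHfQuantizer(HfQuantizer):
    @property
    def is_qat_trainable(self) -> bool:
        # compressed-tensors checkpoints are fake-quantized, so gradients can flow
        return True
```

The Trainer can then gate its existing "model is quantized" error on this flag instead of raising unconditionally.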

src/transformers/trainer.py (review comment, outdated, resolved)
@horheynm horheynm marked this pull request as ready for review November 19, 2024 21:43
@horheynm (Contributor, Author) commented Nov 19, 2024

> Yes, we are quantizing the model using oneshot from compressed-tensors, loading that model using AutoModelForCausalLM and HFQuantizer. Once loaded, we will run the training as QAT fine-tuning. The 'quantization' we run is fake-quant.
> We are not using LoRA adapters.
>
> Nice, thanks for confirming! It would be nice to add is_qat_trainable to the base class (HfQuantizer) and set it to False by default. Feel free to ping me when the PR is ready!

Hey Marc,

It's ready for review.
There is a test failure, but I think it's from an API timeout. It shows:

1 failed because `requests.exceptions.ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID` -> f55e37eb-9159-4b2b-bb6e-e87085e7590b)')

I tried to rerun, but the option is not clickable from my end. Is there anything I can do from my end to rerun?

Thank you!

@SunMarc (Member) left a comment


Thanks for the PR! I left a few comments. Also, you didn't make any substantial changes in trainer; is that expected?

src/transformers/utils/quantization_config.py (review comment, outdated, resolved)
src/transformers/utils/quantization_config.py (review comment, outdated, resolved)
src/transformers/trainer.py (review comment, outdated, resolved)
src/transformers/quantizers/base.py (review comment, outdated, resolved)
@SunMarc (Member) commented Nov 20, 2024

> I tried to rerun, but the option is not clickable from my end. Is there anything I can do from my end to rerun?

No, don't worry. I will rerun it.

…ithub.com:neuralmagic/upstream-transformers into nm-train-quantized-models-from-compressed-tensors
@horheynm horheynm force-pushed the nm-train-quantized-models-from-compressed-tensors branch from 8cc545b to 90a92f1 Compare November 21, 2024 14:28
@SunMarc (Member) left a comment


LGTM! Just a nit about is_qat_trainable for compressed_tensors: I don't think we should set it to True by default. Maybe this is linked to the run_compressed var that you wanted to add in another PR.
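For illustration, the nit could look something like this (hypothetical; run_compressed handling was planned for a follow-up PR, so the attribute access below is an assumption):

```python
class CompressedTensorsHfQuantizer(HfQuantizer):
    @property
    def is_qat_trainable(self) -> bool:
        # Only report trainable when weights are decompressed into fake-quant form;
        # a still-compressed (run_compressed=True) checkpoint could not be trained directly.
        return not getattr(self.quantization_config, "run_compressed", False)
```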

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@horheynm (Contributor, Author) commented

@SunMarc
Hi Marc,
This PR is ready to merge.

@SunMarc SunMarc requested a review from ArthurZucker November 22, 2024 15:41
@ArthurZucker (Collaborator) left a comment


Thanks! Given this:

> Using llm-compressor, we have a pathway to run oneshot (to quantize) and then finetune (QAT).

Just want to make sure we are not breaking anything (i.e., do we need a version check for this?)

@dsikka (Contributor) commented Nov 25, 2024

> Thanks! Given this:
>
> > Using llm-compressor, we have a pathway to run oneshot (to quantize) and then finetune (QAT).
>
> Just want to make sure we are not breaking anything (i.e., do we need a version check for this?)

Hi @ArthurZucker, where are you recommending adding a version check?

@horheynm (Contributor, Author) commented

> Thanks! Given this:
>
> > Using llm-compressor, we have a pathway to run oneshot (to quantize) and then finetune (QAT).
>
> Just want to make sure we are not breaking anything (i.e., do we need a version check for this?)

This won't be breaking. We had a oneshot-then-finetune pathway in the past. We are now using HFQuantizer to load the model instead of the custom class (based on AutoModelForCausalLM) we used to have. So previous versions of llm-compressor (the library used to run oneshot and finetune) and compressed-tensors (the library used to compress/decompress tensors) will still support this pipeline.

If we want to add a version check for extra safety, I would be happy to add one!

Let me know!
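If a guard were ever wanted, a typical pattern would be a minimum-version check on the compressed-tensors package, e.g. during environment validation. This is a sketch only; the threshold and function name are illustrative, not part of this PR:

```python
from importlib.metadata import version
from packaging.version import parse

def check_compressed_tensors_version(minimum: str = "0.7.0") -> None:
    """Raise if the installed compressed-tensors package is older than `minimum`."""
    installed = parse(version("compressed-tensors"))
    if installed < parse(minimum):
        raise ImportError(
            f"compressed-tensors>={minimum} is required for QAT fine-tuning, "
            f"but version {installed} is installed."
        )
```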

@ArthurZucker (Collaborator) left a comment


Thanks for answering. It's fine, sounds like we don't need a check. Merging!

@ArthurZucker ArthurZucker merged commit 57ca9e6 into huggingface:main Nov 28, 2024
24 checks passed
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
…4520)

* populate quantization_config for kv-cache-scheme only configs

* make compressed-tensors quantized models trainable

* populate versions on quant config

* pass oneshot then finetune

* remove breakpoint

* SunMarc comments and fix to_dict logic

* lint

* lint

* test

* comment

* comments'
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024