Allow compressed-tensors quantized model to be trained #34520
Conversation
Thanks for the PR! Left a few suggestions. Could you explain a bit more how you are performing training with compressed-tensors models if you are not using PEFT? Are you maybe doing QAT, or just adding custom LoRA layers yourself?
@SunMarc We are not using LoRA adapters.
Nice, thanks for confirming! It would be nice to add the …
Hey Marc, it's ready for review.
I tried to rerun, but the option is not clickable on my end. Is there anything I can do to trigger the rerun? Thank you!
Thanks for the PR! I left a few comments. Also, you didn't make any substantial changes in the trainer; is that expected?
No, don't worry. I will rerun it.
Branch updated from 8cc545b to 90a92f1.
LGTM! Just a nit about is_qat_trainable for compressed_tensors: I don't think we should set it to True by default. Maybe this is linked to the run_compressed var that you wanted to add in another PR.
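For context, here is a rough sketch of the kind of gating being discussed. It is illustrative only, not the actual transformers implementation: the class name, the run_compressed flag (the variable mentioned above as belonging to a follow-up PR), and the check below are all assumptions.

```python
# Illustrative sketch only (not the actual transformers source): the idea is
# that each quantizer backend exposes a trainability flag, and the trainer
# consults it before fine-tuning a quantized model.

class CompressedTensorsQuantizerSketch:
    def __init__(self, run_compressed: bool = False):
        # run_compressed is the hypothetical flag from the follow-up PR
        # mentioned above; if the model stays compressed at runtime,
        # QAT-style finetuning would not be possible.
        self.run_compressed = run_compressed

    @property
    def is_qat_trainable(self) -> bool:
        # Per the review nit: avoid returning True unconditionally.
        return not self.run_compressed


def assert_trainable(quantizer: CompressedTensorsQuantizerSketch) -> None:
    # The trainer-side check: refuse to finetune when the backend says no.
    if not quantizer.is_qat_trainable:
        raise ValueError("This quantized model cannot be fine-tuned in its current form.")


assert_trainable(CompressedTensorsQuantizerSketch(run_compressed=False))  # passes
```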
@SunMarc
Thanks! Given this:
Using llm-compressor, we have a pathway to run oneshot (to quantize) and then finetune (QAT).
Just want to make sure we are not breaking anything (i.e., do we need a version check for this?)
Hi @ArthurZucker, where are you recommending adding a version check?
This won't be breaking. We had a oneshot-then-finetune pathway in the past. We are now using HFQuantizer to load the model instead of the custom class (based on AutoModelForCausalLM) we used to have. So previous versions of llm-compressor (the library to run oneshot and finetune) and compressed-tensors (the library to compress/decompress tensors) will support this pipeline. If we want to add a version check for peace of mind, I would be happy to add one! Let me know!
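If such a guard were ever added, one possible shape is a simple minimum-version check. The helper below and any minimum version passed to it are placeholders, not something this PR adds:

```python
# Illustrative only: a minimum-version guard for the llm-compressor /
# compressed-tensors packages. The packages are real; this helper and the
# example minimum version are assumptions for the sketch.
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version


def require_min_version(package: str, minimum: str) -> None:
    try:
        installed = Version(version(package))
    except PackageNotFoundError as exc:
        raise ImportError(f"{package} is required but is not installed.") from exc
    if installed < Version(minimum):
        raise ImportError(
            f"{package}>={minimum} is required for this pathway, found {installed}."
        )


# Example usage (placeholder minimum version):
# require_min_version("compressed-tensors", "0.6.0")
```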
Thanks for answering, it's fine. Sounds like we don't need a check, merging!
Allow compressed-tensors quantized model to be trained (#34520)
* populate quantization_config for kv-cache-scheme only configs
* make compressed-tensors quantized models trainable
* populate versions on quant config
* pass oneshot then finetune
* remove breakpoint
* SunMarc comments and fix to_dict logic
* lint
* lint
* test
* comment
* comments
What does this PR do?
Using HFQuantizer, models that were quantized using compressed-tensors can be loaded. The purpose of this PR is to allow these quantized models to be trained through the Trainer pathway.
Currently, if the model is quantized (i.e., an HFQuantizer is instantiated based on the quantization_config), the Trainer raises an error for training.
Using llm-compressor, we have a pathway to run oneshot (to quantize) and then finetune (QAT).
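To illustrate the intended flow, here is a minimal sketch of loading a compressed-tensors quantized checkpoint and handing it to Trainer. The model id and the toy dataset are placeholders, not artifacts from this PR:

```python
# Minimal sketch of what this PR enables: a compressed-tensors quantized
# checkpoint loaded through HFQuantizer can now be passed to Trainer without
# raising the "quantized models cannot be trained" error.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_id = "org/model-w4a16-compressed-tensors"  # hypothetical checkpoint name
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Tiny toy dataset, just to keep the example self-contained.
texts = ["hello world", "oneshot then finetune (QAT) example"]
enc = tokenizer(texts, padding="max_length", truncation=True, max_length=32)
enc["labels"] = [ids.copy() for ids in enc["input_ids"]]
train_dataset = Dataset.from_dict(dict(enc))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="qat-finetune", num_train_epochs=1),
    train_dataset=train_dataset,
)
trainer.train()  # previously this raised because the model is quantized
```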
Who can review?
@SunMarc @younesbelkada