
Support gradient checkpointing in Qwen2VL ViT #34724

Merged
merged 3 commits into huggingface:main on Nov 19, 2024

Conversation

li-plus
Contributor

@li-plus li-plus commented Nov 14, 2024

What does this PR do?

Support gradient checkpointing for the Qwen2VL ViT part. The current implementation on the main branch only supports gradient checkpointing in the language model. This PR extends checkpointing to the vision encoder as well.
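As a rough illustration of the technique (a hedged sketch, not the PR's actual code), gradient checkpointing in a ViT typically wraps each transformer block in torch.utils.checkpoint during training, so block activations are recomputed in the backward pass instead of being stored. The toy encoder below is purely illustrative; class and attribute names are made up:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class TinyViTEncoder(nn.Module):
    # Toy stand-in for a vision encoder; names are illustrative only.
    def __init__(self, dim=32, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        self.gradient_checkpointing = False

    def forward(self, x):
        for block in self.blocks:
            if self.gradient_checkpointing and self.training:
                # Recompute this block's activations during backward,
                # trading compute for activation memory.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


enc = TinyViTEncoder().train()
enc.gradient_checkpointing = True
x = torch.randn(2, 32, requires_grad=True)
enc(x).sum().backward()
print(x.grad.shape)
```

The backward pass still produces correct gradients; only the peak memory profile changes.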

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

cc @ArthurZucker, @amyeroberts, @qubvel

@li-plus li-plus force-pushed the qwen2vl-grad-ckpt branch 2 times, most recently from 9163b7f to cad72d1, on November 14, 2024 01:58
@qubvel
Member

qubvel commented Nov 18, 2024

Hi @li-plus! Thanks for your contribution!

Can you please enable gradient checkpointing tests for this model to make sure it works properly? I see these tests are skipped on main, however, I was able to run them without issues locally.

@li-plus
Contributor Author

li-plus commented Nov 18, 2024

> Hi @li-plus! Thanks for your contribution!
>
> Can you please enable gradient checkpointing tests for this model to make sure it works properly? I see these tests are skipped on main, however, I was able to run them without issues locally.

@qubvel Thanks for the advice. I've re-enabled the gradient checkpointing tests for Qwen2VL in the latest commit. They run fine on my machine.

@qubvel
Member

qubvel commented Nov 18, 2024

Thanks! Can you please also push an empty commit with the message [run-slow] qwen2_vl to trigger all model tests? We should be fine here, because test_training_gradient_checkpointing is not a slow test, but it's good to double-check that everything else is fine 🙂

@li-plus
Contributor Author

li-plus commented Nov 18, 2024

Thanks. Just pushed!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@li-plus
Contributor Author

li-plus commented Nov 18, 2024

@qubvel It seems those failures are not related to this PR. Any idea?

Member

@qubvel qubvel left a comment


No worries, I checked, the same tests fail on main. Thanks for triggering slow tests!

@qubvel qubvel requested a review from ArthurZucker November 18, 2024 21:02
Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks, indeed there is supports_gradient_checkpointing set to True. Good catch!

@ArthurZucker ArthurZucker merged commit 0db91c3 into huggingface:main Nov 19, 2024
16 of 18 checks passed
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* Support gradient checkpointing in Qwen2VL ViT

* Enable gradient checkpoint tests for Qwen2VL

* [run-slow] qwen2_vl
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024
* Support gradient checkpointing in Qwen2VL ViT

* Enable gradient checkpoint tests for Qwen2VL

* [run-slow] qwen2_vl

@ShuaibinQi

@li-plus

Thanks for your commit! When I use gradient_checkpointing for Qwen2VisionTransformerPretrainedModel, I get this error:

AttributeError: 'Qwen2VisionTransformerPretrainedModel' object has no attribute '_gradient_checkpointing_func'. Did you mean: 'gradient_checkpointing'?

Is it necessary to implement _gradient_checkpointing_func in Qwen2VisionTransformerPretrainedModel?

My transformers version is the latest: transformers==4.47.0

@li-plus li-plus deleted the qwen2vl-grad-ckpt branch December 16, 2024 12:36
@li-plus
Contributor Author

li-plus commented Dec 16, 2024

@ShuaibinQi Hi, I could not reproduce this error with transformers 4.47.0 using this demo code:

import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info


processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="cuda",
)
model.gradient_checkpointing_enable()
model.train()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

print(f'before forward {torch.cuda.memory_allocated()/1e9=:.3f} GB, {torch.cuda.memory_reserved()/1e9=:.3f} GB')
output = model(**inputs, use_cache=False)
print(f'after forward {torch.cuda.memory_allocated()/1e9=:.3f} GB, {torch.cuda.memory_reserved()/1e9=:.3f} GB')
output.logits.sum().backward()
print(f'after backward {torch.cuda.memory_allocated()/1e9=:.3f} GB, {torch.cuda.memory_reserved()/1e9=:.3f} GB')

Did you use gradient_checkpointing_enable to enable gradient checkpointing?
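To illustrate why the AttributeError above can occur (a toy sketch mirroring the Hugging Face convention, not transformers internals): gradient_checkpointing_enable() is what installs _gradient_checkpointing_func on the module, so setting the gradient_checkpointing flag by hand skips that step and the forward pass then fails. All class and method names below are a mock, not the real Qwen2VL code:

```python
import functools

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class ToyVisionModel(nn.Module):
    # Mock model following the convention that enable() both flips the
    # flag and installs the checkpointing function.
    def __init__(self):
        super().__init__()
        self.block = nn.Linear(8, 8)
        self.gradient_checkpointing = False

    def gradient_checkpointing_enable(self):
        self.gradient_checkpointing = True
        self._gradient_checkpointing_func = functools.partial(
            checkpoint, use_reentrant=False
        )

    def forward(self, x):
        if self.gradient_checkpointing and self.training:
            return self._gradient_checkpointing_func(self.block, x)
        return self.block(x)


m = ToyVisionModel().train()
m.gradient_checkpointing = True  # flag only: the func was never installed
try:
    m(torch.randn(1, 8))
except AttributeError as e:
    print(e)  # missing _gradient_checkpointing_func

m.gradient_checkpointing_enable()  # the supported path
out = m(torch.randn(1, 8))
```

In this sketch, calling the enable method (rather than toggling the flag) is what makes the checkpointed forward pass work.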
