Mistral-related models for QnA #34045
Conversation
```python
# Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering.__init__ with Llama->Mistral,transformer->model
def __init__(self, config):
    super().__init__(config)
    self.model = MistralModel(config)
    self.qa_outputs = nn.Linear(config.hidden_size, 2)

    # Initialize weights and apply final processing
    self.post_init()

# Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering.get_input_embeddings with transformer->model
def get_input_embeddings(self):
    return self.model.embed_tokens

# Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering.set_input_embeddings with transformer->model
def set_input_embeddings(self, value):
    self.model.embed_tokens = value

@add_start_docstrings_to_model_forward(MISTRAL_INPUTS_DOCSTRING)
# Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering.forward with Llama->Mistral, transformer->model
```
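For context, the forward being copied here is a plain extractive-QA head on top of the backbone. A condensed sketch follows; the real method in modeling_llama.py takes more arguments and handles return_dict and multi-GPU position tensors, so treat this only as an outline of the copied logic.

```python
# Condensed sketch of the copied forward, as it would sit in the class above.
# Assumes `from torch import nn` and
# `from transformers.modeling_outputs import QuestionAnsweringModelOutput` are in scope.
def forward(self, input_ids=None, attention_mask=None, start_positions=None, end_positions=None, **kwargs):
    outputs = self.model(input_ids, attention_mask=attention_mask, **kwargs)
    sequence_output = outputs[0]                            # (batch, seq_len, hidden_size)

    logits = self.qa_outputs(sequence_output)               # (batch, seq_len, 2)
    start_logits, end_logits = logits.split(1, dim=-1)
    start_logits = start_logits.squeeze(-1).contiguous()    # (batch, seq_len)
    end_logits = end_logits.squeeze(-1).contiguous()

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # Gold positions that fall outside the sequence are clamped and ignored in the loss.
        ignored_index = start_logits.size(1)
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)
        loss_fct = nn.CrossEntropyLoss(ignore_index=ignored_index)
        total_loss = (loss_fct(start_logits, start_positions) + loss_fct(end_logits, end_positions)) / 2

    return QuestionAnsweringModelOutput(
        loss=total_loss,
        start_logits=start_logits,
        end_logits=end_logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )
```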
So it's more of a stylistic choice: individual copies vs. including the (unnecessary) base model prefix.
If #34061 gets merged, we can top-level copy from llama without any problems.
You can also use `# Ignore copy` on the single place where the copy does not match!
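For anyone following along, a rough sketch of the two styles being discussed (this is not the code in this PR, and the exact placement rules for `# Ignore copy` are whatever utils/check_copies.py enforces):

```python
# Style A (per-method copies, as in the diff above): each method carries its own
# "Copied from ... with Llama->Mistral,transformer->model" line.

# Style B (suggested here): one class-level "Copied from", with "# Ignore copy"
# marking the single spot that intentionally diverges. Rough sketch only.
# Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering with Llama->Mistral
class MistralForQuestionAnswering(MistralPreTrainedModel):
    # Ignore copy
    base_model_prefix = "model"
    ...
```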
Ah ok, perfect! I'll change it later and ping you when ready ;)
LGTM in general! Would be nice to have a single `# Copied from` at the top of the class (either `# Ignore copy` or just don't copy from llama for one of them!)
@ArthurZucker Changed it to a top-level `# Copied from` now. Lmk if I should change something else.
```python
# Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering with Llama->Mistral,LLAMA->MISTRAL,transformer->model
class MistralForQuestionAnswering(MistralPreTrainedModel):
    base_model_prefix = "model"
```
`base_model_prefix = "model"` is due to the llama naming I mentioned; otherwise the classes have different structures and the `# Copied from` check will fail with an error.
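A tiny illustration of why the prefix has to match the attribute name (a sketch with a random-weight toy config, assuming the new class is exported from transformers as usual; `PreTrainedModel.base_model` resolves the backbone via `getattr(self, self.base_model_prefix)`):

```python
from transformers import MistralConfig, MistralForQuestionAnswering  # class added in this PR

# Toy config just to build the module; not a real checkpoint.
config = MistralConfig(hidden_size=32, intermediate_size=64, num_hidden_layers=1,
                       num_attention_heads=4, num_key_value_heads=2, vocab_size=128)
qa_model = MistralForQuestionAnswering(config)

# .base_model looks up the attribute named by base_model_prefix, so the prefix must
# match what __init__ sets (self.model here, vs. self.transformer in the llama QA class).
assert qa_model.base_model is qa_model.model
```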
That's it! Merging 🤗
* mistral qna start
* mixtral qna
* oops
* qwen2 qna
* qwen2moe qna
* add missing input embed methods
* add copied to all methods, can't directly from llama due to the prefix
* make top level copied from
What does this PR do?
Adds question answering heads to mistral, mixtral, qwen2, and qwen2moe. Because of the `# Copied from` statements, either we add the head to every one of these models or we have to ignore the missing ones in the copy checks. Based on #29168 but using `# Copied from` instead.
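As a quick sanity check of what this enables, here is a minimal sketch using a tiny random-weight config (not a real checkpoint; with a pretrained model one would typically go through AutoModelForQuestionAnswering instead):

```python
import torch
from transformers import MistralConfig, MistralForQuestionAnswering

# Tiny random-weight config just to exercise the new QA head.
config = MistralConfig(hidden_size=64, intermediate_size=128, num_hidden_layers=2,
                       num_attention_heads=4, num_key_value_heads=2, vocab_size=1000)
model = MistralForQuestionAnswering(config)

input_ids = torch.randint(0, config.vocab_size, (1, 16))
outputs = model(input_ids=input_ids)

# One start and one end logit per token; the argmax of each gives the predicted answer span.
print(outputs.start_logits.shape, outputs.end_logits.shape)  # torch.Size([1, 16]) twice
start, end = outputs.start_logits.argmax(-1), outputs.end_logits.argmax(-1)
```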
Motivation: We have a benchmark paper at https://github.com/LSX-UniWue/SuperGLEBer which uses the transformers QnA models for simplicity, but since these heads are not available in main they are currently patched in manually. Would be great to see this land in main!
Fixes #28908
Before submitting
* Did you read the contributor guideline, Pull Request section?
* Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
* Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@LysandreJik @ArthurZucker