
Mistral-related models for QnA #34045

Merged — 8 commits merged into huggingface:main from mistral-for-qna on Oct 14, 2024
Conversation

vasqu (Contributor) commented on Oct 9, 2024

What does this PR do?

Adds question answering heads to Mistral, Mixtral, Qwen2, and Qwen2MoE. Either we take every model because of the copy statements, or we need to ignore it in the copied checks. Based on #29168, but using # Copied from instead.

Motivation: we have a benchmark paper at https://github.com/LSX-UniWue/SuperGLEBer that uses the transformers QnA models for simplicity, but since the support is not available in main, it is manually patched in. It would be great to see this land in main!

Fixes #28908
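
For context, a minimal usage sketch of the new QA heads once merged. The checkpoint id is a placeholder (any Mistral/Mixtral/Qwen2/Qwen2MoE checkpoint would do), and on a plain base checkpoint the qa_outputs head is freshly initialized, so it needs fine-tuning on extractive-QA data before the answers are meaningful:

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Resolves to the new MistralForQuestionAnswering; the qa_outputs head is newly initialized here.
model = AutoModelForQuestionAnswering.from_pretrained(model_id)

question = "Which models gained QA support?"
context = "Question answering heads were added to Mistral, Mixtral, Qwen2 and Qwen2MoE."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Extractive QA: take the most likely start/end positions and decode that span.
start = outputs.start_logits.argmax(-1).item()
end = outputs.end_logits.argmax(-1).item()
print(tokenizer.decode(inputs["input_ids"][0, start : end + 1]))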

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@LysandreJik @ArthurZucker

Comment on lines 1296 to 1314
    # Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering.__init__ with Llama->Mistral,transformer->model
    def __init__(self, config):
        super().__init__(config)
        self.model = MistralModel(config)
        self.qa_outputs = nn.Linear(config.hidden_size, 2)

        # Initialize weights and apply final processing
        self.post_init()

    # Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering.get_input_embeddings with transformer->model
    def get_input_embeddings(self):
        return self.model.embed_tokens

    # Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering.set_input_embeddings with transformer->model
    def set_input_embeddings(self, value):
        self.model.embed_tokens = value

    @add_start_docstrings_to_model_forward(MISTRAL_INPUTS_DOCSTRING)
    # Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering.forward with Llama->Mistral, transformer->model
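
The forward body itself is truncated in the excerpt above. As a rough sketch, assuming it mirrors the standard extractive-QA forward that the copied LlamaForQuestionAnswering uses, the 2-dimensional qa_outputs head is split into per-token start/end logits:

        # Sketch only; assumes the copied forward follows the usual transformers QA pattern.
        outputs = self.model(input_ids, attention_mask=attention_mask)
        sequence_output = outputs[0]                       # (batch, seq_len, hidden_size)
        logits = self.qa_outputs(sequence_output)          # (batch, seq_len, 2)
        start_logits, end_logits = logits.split(1, dim=-1)
        start_logits = start_logits.squeeze(-1).contiguous()  # (batch, seq_len)
        end_logits = end_logits.squeeze(-1).contiguous()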
vasqu (Contributor, author): Copying from each method individually because Llama has a wrong base model prefix, which has already led to issues in the past (#30381). Currently, that forces the copy check to include the base model prefix (ref. here) when using a top-level # Copied from.

vasqu (Contributor, author): So it's more of a stylistic choice: individual copies vs. including the (unnecessary) base model prefix.

vasqu (Contributor, author): If #34061 gets merged, we can copy from Llama at the top level without any problems.

Collaborator: You can also use # Ignore copy on the single place where the copy does not match!
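
As a rough sketch of that suggestion (the marker follows the check_copies convention; its exact placement here is an assumption, not taken from the PR), the class-level copy could be kept and only the diverging line excluded:

# Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering with Llama->Mistral,LLAMA->MISTRAL,transformer->model
class MistralForQuestionAnswering(MistralPreTrainedModel):
    # Ignore copy
    base_model_prefix = "model"  # differs from the Llama source, which uses "transformer"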

vasqu (Contributor, author): Ah ok, perfect. I'll change it later and ping you when ready ;)

ArthurZucker (Collaborator) left a comment:

LGTM in general! It would be nice to have a single # Copied from at the top of the class (either # Ignore copy or just don't copy from Llama for one of them!)


vasqu (Contributor, author) commented on Oct 10, 2024

@ArthurZucker Changed it to top-level copied from now. Lmk if I should change something else.

# Copied from transformers.models.llama.modeling_llama.LlamaForQuestionAnswering with Llama->Mistral,LLAMA->MISTRAL,transformer->model
class MistralForQuestionAnswering(MistralPreTrainedModel):
    base_model_prefix = "model"
vasqu (Contributor, author) commented on Oct 10, 2024:

base_model_prefix = "model" is due to the Llama issue I mentioned; otherwise the classes have different structures and the copied-from check will fail with an error.
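
For illustration, a condensed side-by-side of the mismatch being worked around (abbreviated, not a verbatim diff from the PR):

# Llama (the copy source) names its backbone attribute "transformer":
class LlamaForQuestionAnswering(LlamaPreTrainedModel):
    base_model_prefix = "transformer"

    def __init__(self, config):
        super().__init__(config)
        self.transformer = LlamaModel(config)
        self.qa_outputs = nn.Linear(config.hidden_size, 2)
        self.post_init()

# Mistral (this PR) renames the attribute via transformer->model and overrides the prefix:
class MistralForQuestionAnswering(MistralPreTrainedModel):
    base_model_prefix = "model"

    def __init__(self, config):
        super().__init__(config)
        self.model = MistralModel(config)
        self.qa_outputs = nn.Linear(config.hidden_size, 2)
        self.post_init()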

ArthurZucker (Collaborator): That's it! Merging 🤗

ArthurZucker merged commit 7434c0e into huggingface:main on Oct 14, 2024
21 checks passed
vasqu deleted the mistral-for-qna branch on October 14, 2024 at 09:46
NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Oct 21, 2024
* mistral qna start

* mixtral qna

* oops

* qwen2 qna

* qwen2moe qna

* add missing input embed methods

* add copied to all methods, can't directly from llama due to the prefix

* make top level copied from
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024
Successfully merging this pull request may close these issues: Add MistralForQuestionAnswering

2 participants