0.1.171
- Support accurate chat formatting when using local LLMs via oobabooga (and possibly also llama-cpp-python). Once you spin up a local LLM at an endpoint like localhost:5000/v1, you can either use the chat formatting done by the API server or, to have more control over the formatting and ensure it is accurate, specify the formatter via OpenAIGPTConfig.formatter, e.g.
from langroid.language_models.openai_gpt import OpenAIGPTConfig

llm_config = OpenAIGPTConfig(
    chat_model="local/localhost:5000/v1",
    formatter="mistral-instruct-v0.2",
)
Langroid uses the formatter name to find the nearest matching model name on the HuggingFace hub, gets its corresponding tokenizer, and uses tokenizer.apply_chat_template(chat) to convert a chat into a single string.
HuggingFace tends to have the most reliable chat templates, so this helps ensure that the chat is formatted the way the local LLM was trained, which typically produces better results than a format that deviates from it.
(For example, when using litellm via ollama, we found that the Mistral chats were being formatted differently from how MistralAI specified them.)
As a convenience, the formatter can be included as a suffix at the end of chat_model, separated by //, e.g.
llm_config = OpenAIGPTConfig(chat_model="local/localhost:5000/v1//mistral-instruct-v0.2")
This lets you easily run any of the example scripts that have a model-switch param -m by simply adding
-m local/localhost:5000/v1//mistral-instruct-v0.2
Also, any of the tests can be run against a local model by using the --m <local_model>//<formatter> option, e.g.
pytest -s -x tests/main/test_llm.py --m local/localhost:5000/v1//mistral-instruct-v0.2
- Add all OpenAI API params (such as logprobs, etc.) as OpenAIGPTConfig.params, which is a Pydantic object of class OpenAICallParams. (Caution: these params are not yet used anywhere in the code, other than temperature.)
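The formatter lookup in the first item (finding the nearest matching HuggingFace model name for a formatter string) can be illustrated with a minimal sketch. It scores string similarity with Python's difflib over a small hard-coded candidate list; the list and the helper name nearest_model are hypothetical, a real lookup would search the HuggingFace hub, and Langroid's actual matching logic may differ.

```python
import difflib
from typing import List, Optional

# Hypothetical candidate list; a real lookup would query the HuggingFace hub instead.
CANDIDATE_MODELS = [
    "mistralai/Mistral-7B-Instruct-v0.1",
    "mistralai/Mistral-7B-Instruct-v0.2",
    "meta-llama/Llama-2-7b-chat-hf",
    "HuggingFaceH4/zephyr-7b-beta",
]


def nearest_model(formatter: str, candidates: List[str]) -> Optional[str]:
    """Hypothetical helper: return the candidate most similar to `formatter`, or None."""
    # Compare case-insensitively, but remember the original spelling of each name.
    lowered = {c.lower(): c for c in candidates}
    matches = difflib.get_close_matches(
        formatter.lower(), list(lowered), n=1, cutoff=0.5
    )
    return lowered[matches[0]] if matches else None


print(nearest_model("mistral-instruct-v0.2", CANDIDATE_MODELS))
```

Here the v0.2 model wins over v0.1 because its name shares one more trailing character with the formatter string. Once a model name is chosen, its tokenizer could be loaded (for example with transformers' AutoTokenizer.from_pretrained) and its apply_chat_template method used to render the chat.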
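The // convenience suffix in the first item amounts to splitting chat_model into a model part and an optional formatter part. A minimal sketch, assuming the model part itself never contains //; the function name split_model_and_formatter is hypothetical and not necessarily how Langroid parses the string.

```python
from typing import Optional, Tuple


def split_model_and_formatter(chat_model: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split "model//formatter" into (model, formatter).

    Assumes the model part contains no "//", so the first "//" is the separator.
    """
    model, sep, formatter = chat_model.partition("//")
    if not sep or not formatter:
        # No suffix given: let the API server do the chat formatting.
        return model, None
    return model, formatter


print(split_model_and_formatter("local/localhost:5000/v1//mistral-instruct-v0.2"))
```

The same split applies whether the string comes from OpenAIGPTConfig, the -m switch of an example script, or the --m test option.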
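The second item groups the OpenAI call parameters into a single object. A minimal sketch of that idea, using a stdlib dataclass in place of Pydantic so it runs without extra dependencies; the class name CallParams, its small set of fields, and the method to_request_kwargs are hypothetical, and the real fields and methods of OpenAICallParams may differ.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class CallParams:
    """Hypothetical stand-in for a params class like OpenAICallParams."""

    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    logprobs: Optional[bool] = None
    top_logprobs: Optional[int] = None
    stop: Optional[List[str]] = None

    def to_request_kwargs(self) -> Dict[str, Any]:
        # Send only the params that were explicitly set, so the API's defaults apply otherwise.
        return {k: v for k, v in asdict(self).items() if v is not None}


print(CallParams(temperature=0.2, logprobs=True).to_request_kwargs())
```

Dropping unset fields keeps each request minimal and avoids overriding server-side defaults with explicit nulls.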