MemGPT code is AAA+, unfortunately I cannot get it to work reliably (no matter which LLM I try) #1776
Comments
Can you try again with 0.5.0? There should be a lot of bugfixes to the configuration for LLMs and embedding models now. Please re-open if you are still having issues.
I have been facing similar issues with Claude Sonnet. The "decision making" part of managing memory and tools seems quite unusable, in the sense that it cannot decide when and which functions to call. Once I explicitly tell it to update and fetch from core/archival memory it does it, but that becomes less practical for most use cases. It would be great if you could share some benchmarks and best practices around making it work. @distributev have you given it a try more recently?
@shivamatfigr From the MemGPT perspective I got pretty good results with gpt-4o-mini, which also has a very good price. As a basic assistant it works well - it still throws the occasional stacktrace related to memory updating, but it is definitely usable (unlike the other LLMs, which I could not get to work at all).

gpt-4o-mini comes 5th on this leaderboard - notice the pricing too, versus the other LLMs at the top (which I could not get to work anyway): https://gorilla.cs.berkeley.edu/leaderboard.html This leaderboard makes sense for MemGPT because it tests how good LLMs are at "function calling", which is exactly what MemGPT needs.

As a basic assistant to keep your TODOs, gpt-4o-mini works. The limitation is that .... it is gpt-4o-mini, so you cannot do much more than "keeping your TODOs". I tried to overcome this by giving gpt-4o-mini tools to use when its limits are reached. One tool I gave it was "another, smarter LLM to call through a command line interface" - interesting to play with, but the system becomes too complex, prone to errors, and a rabbit hole. gpt-4o-mini does not realize on its own "I'm too stupid for this, let me use the LLM CLI tool to ask the smarter LLM"; I need to explicitly tell it "now use your LLM CLI tool and ask Claude Sonnet" - which defeats the purpose.
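For context, an "escalate to a smarter LLM" tool like the one described above can be sketched as a plain Python function, since MemGPT/Letta derives a custom tool's schema from the function signature and docstring. This is a minimal sketch, not the commenter's actual setup: the `llm` CLI name and the model id are placeholders for whatever command-line wrapper you have installed.

```python
import subprocess

def ask_stronger_llm(prompt: str) -> str:
    """Escalate a question to a stronger LLM via a local CLI wrapper.

    Args:
        prompt (str): The question the current model cannot handle alone.

    Returns:
        str: The stronger model's reply.
    """
    # "llm" is a placeholder CLI command and "claude-sonnet" a placeholder
    # model id -- adjust both to whatever wrapper you actually use.
    result = subprocess.run(
        ["llm", "-m", "claude-sonnet", prompt],
        capture_output=True, text=True, timeout=120,
    )
    return result.stdout.strip()
```

The catch, as noted above, is that the weaker model rarely decides to call this tool on its own; you end up prompting it to do so explicitly.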
@distributev quick question as we're looking into this - does gpt-4o more reliably use the memory tools?
Most of the testing I described above was done before the MemGPT to Letta project name change. At that time I could only get gpt-4o-mini working: gpt-4o (directly from OpenAI), Claude Sonnet (directly from Anthropic), Llama, and a few other LLMs from OpenRouter were all unusable, failing with the same stacktrace generated by the memory function tools which MemGPT uses. gpt-4o-mini was the only model I could get working. I tried Letta once and it works the same with gpt-4o-mini, but I did not retry Letta with all the LLMs that previously failed.
Hi MemGPT Team,
Thank you for such a high quality codebase. I'm pretty confident that, as LLMs improve, MemGPT will become the "standard" among all products of its kind.
I would recommend the MemGPT team put in place a webpage similar to
https://aider.chat/docs/leaderboards/
where people would immediately see (and with high confidence) what kind of quality to expect from any of the available LLMs.
I want to ask the community: were you able to get MemGPT working in a "day to day" kind of way, and if yes, which LLMs are you using, and for which kind of scenario? (How long is your "persona" prompt? Do you have instructions for the LLM to follow in your persona? How many? Are you using custom tools, and if so, how many and how complex?) If you use it "day to day", does MemGPT/the LLM work flawlessly, or are you used to seeing "stacktraces", and when you get one you just click "run again" and it works the next time?
I'm very curious to understand how (and if) people are using MemGPT.
I want to say that, from my own AI projects, I understood (before MemGPT) that it is incredibly difficult to make LLMs follow instructions, no matter how well crafted and clear the prompt instructions are. It becomes even more difficult as the number of instructions grows, and when you combine this with the "function calling" ability of LLMs (where, similarly to instructions, the more functions you add the more confused the LLM becomes), it becomes very hard, with the current ability of LLMs, to get anything more than "hello world" working. Even when an LLM follows the instructions the first time, for the next two requests it will not, and you get a stacktrace (for anything more than "hello world").
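To make the "more functions, more confusion" point concrete: every tool an agent exposes is serialized as a JSON schema that the model must re-read and reason over on every turn. A minimal sketch in the OpenAI-style tools format (the two function names are borrowed from MemGPT's built-in memory functions; the `tool_schema` helper is illustrative, not MemGPT code):

```python
import json

def tool_schema(name, description, params):
    """Build an OpenAI-style function-calling tool entry."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),
            },
        },
    }

# MemGPT's memory management alone contributes several such entries
# (core_memory_append, archival_memory_search, ...), before any of
# your own custom tools are added on top.
tools = [
    tool_schema("core_memory_append", "Append to a core memory block.",
                {"name": {"type": "string"}, "content": {"type": "string"}}),
    tool_schema("archival_memory_search", "Search archival memory.",
                {"query": {"type": "string"}}),
]

# Every extra tool inflates the payload the model sees on each turn.
print(len(json.dumps(tools)))
```

Weaker models tend to degrade as this list grows, which matches the experience described in this thread.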
Because of that, I'm pretty sure what I describe below is happening because of what LLMs are (not) capable of right now, and not because of MemGPT (which, as I already said, has very well crafted source code).
I tried MemGPT twice: the first time 6 months ago, when I gave up because 90% of requests produced stacktraces and only 10% worked. For the past few days I tried MemGPT again, and this time I also got familiar with the codebase. The situation is the same as 6 months ago.
With Anthropic's Claude Sonnet I could not get anything working.
With OpenAI's gpt-4-1106-preview (which is advertised as 'featuring improved instruction following, JSON mode, reproducible outputs') I am able (from time to time) to get some requests processed, but only when I start with `--first --no-verify` - and even then, subsequent requests start to fail and cannot recover. I also tried other OpenAI models and could not get any of them to work. Here is how I create my agent.
There is no point in attaching long stacktraces here. I'm pretty confident I have the setup/configuration done correctly.