BUG: Use Python pickle protocol version 4 for np.save #26388
Should this change be backported to NumPy 1.23/1.24? Python changed its default pickle protocol version starting from Python 3.8, and NumPy 1.23 and later versions only support Python 3.8 and newer. |
The failures of the two Windows jobs are not related to this PR. The issue lies with the new spin version 0.9, which added gcov support but called the build with the gcov keyword argument even when the test does not require gcov. I have created a PR (scientific-python/spin#183) at spin to address this problem. |
No, we're currently only backporting to the 2.0 and 1.26 branches. |
There is a reason that we hardcode the protocol, despite the drawbacks when we forget about it as we drop support for versions of Python. We don't want Python 3.14 to decide that In an ideal world, the policy that we probably want is something like:
|
Scratch 1.26, we cannot build wheels for macosx arm64 and it is not worth the work to fix that. |
The Since NumPy only supports the officially supported Python versions, using the It's also possible to allow np.save() users to override the pickle protocol in order to benefit from the performance gains provided by the newer protocols. However, I believe this requirement can be addressed by letting users tweak the |
This doesn't say anything about changes in Python 3.14 or any future version though...
Today I learned this is a mutable constant:
Still, I don't think we should be telling users to mutate that, and adding a new keyword to handle this in numpy makes sense to me, along with updating the default in numpy. |
@ambv is this a CPython policy that |
This could use a release note, see |
I believe users who explicitly request a specific pickle protocol version know what they're doing, since it's a pretty low-level detail.
If we decide to add the |
The policy in that comment is that you are only allowed to bump it if the last version that didn't have it is EOLed. There's nothing in there that states that it will be bumped when that version goes EOL.
The |
Since the supported Python versions are 3.8+, I believe 5 would be the best default? Does this change need steering council approval or something? |
Yes. |
Let's just hardwire a later version. '4' would be conservative and would solve the size problem. |
After some thought, I believe it's the best way if we change the interface to provide users with flexibility. I will implement the changes with a default protocol of 5 (or 4?), and allow users to use the keyword |
We discussed this at the triage meeting. We suggest sticking with 4. This does need a release note. |
Shall we add the keyword to np.save? I’m currently doing some benchmarks to see whether it’s possible to increase the performance with protocol 5 |
I'll follow your decision. If you believe we should give users the flexibility, I'll add the keyword, document the performance characteristics of protocols 4 and 5, and write the release note following the existing example. If you believe we should hardwire version 4, I'll only write the release note, explaining the performance improvement of protocol 4 and the possible incompatibility for anyone using an ancient Python version. Everyone using Python 3.4+ should be fine, since |
Let's break this into pieces:
|
Currently I only see performance gains for protocol 5 in IPC situations, which is not actually the use case for Since we hardcode protocol version 4 in |
Many people actually use |
It is best to treat separate problems in separate PRs/issues. Let's go with what we have here now. If you want to continue, you can open a deprecation PR to deprecate the outdated |
Thanks @vxst |
* BUG: Use default Python pickle protocol version rather than outdated protocol 3
* Update default pickle of np.save to 4
* Adds document for default pickle protocol 4
BUG: Use Python pickle protocol version 4 for np.save (#26388)
Currently, when NumPy saves data using pickle, it forces the protocol version to 3, which was the default value from Python 3.0 to Python 3.7. However, since Python 3.7 has reached its end-of-life (EOL), there are no actively maintained Python versions that default to using pickle protocol 3.
By allowing NumPy to use the default pickle protocol version, objects larger than 4GB can be pickled, resolving the issue described in #26224. Although this new protocol version is incompatible with Python 3.3 and earlier versions, Python 3.3 has long reached its EOL. Therefore, there is no need to force pickle protocol 3 when saving data using pickle.
It is important to note that the pickle module still supports reading pickled objects written with protocol 3. As a result, this change will not introduce incompatibilities when reading files saved with the earlier pickle protocol (the one used by current NumPy releases).
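The compatibility claim above can be checked with a minimal sketch that uses only the standard-library `pickle` module. This is not NumPy's internal code path; the `payload` object and variable names are made up for illustration, and it assumes Python 3.4 or newer so that protocol 4 is available.

```python
import pickle

# Hypothetical payload standing in for the object data that np.save would pickle.
payload = {"name": "example", "values": list(range(10))}

# Serialize the same object with the old (3) and new (4) protocol.
blob_v3 = pickle.dumps(payload, protocol=3)
blob_v4 = pickle.dumps(payload, protocol=4)

# Pickles written with protocol 2 or later start with the PROTO opcode (0x80)
# followed by one byte holding the protocol number.
header_v3 = blob_v3[:2]
header_v4 = blob_v4[:2]

# The reader detects the protocol from the stream itself, so no protocol
# argument is needed when loading either blob.
restored_v3 = pickle.loads(blob_v3)
restored_v4 = pickle.loads(blob_v4)

print("protocol bytes:", header_v3[1], header_v4[1])
print("round trip equal:", restored_v3 == restored_v4 == payload)
```

Because the protocol number is stored in the stream, a reader that supports protocol 4 can load both old and new data without any extra configuration.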