Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Refactor stringdtype casts.c to use cpp templates #28091

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

peytondmurray
Copy link
Contributor

@peytondmurray peytondmurray commented Jan 3, 2025

Summary

Part of #25693.

This PR replaces several places where macros were being used to generate cast specs for the vstring type with c++ templates. No user-facing changes were made.

Other changes

  • In working on this, I added a few checks in various places to ensure that the cast specs were being correctly generated.
  • Added a few comments about why 64-bit and 128-bit floats are handled differently than the 16-bit and 32-bit floats
  • Fixed an instance where a UnicodeEncodeError could be raised. Previously this was done like this:
PyObject *exc = PyObject_CallFunction(
        PyExc_UnicodeEncodeError, "ss#nns", "ascii", s.buf,
        (Py_ssize_t)s.size, (Py_ssize_t)i, (Py_ssize_t)(i+1), "ordinal not in range(128)");
PyErr_SetObject(PyExceptionInstance_Class(exc), exc);
Py_DECREF(exc);

but in using ss#nns, PyExc_UnicodeEncodeError expected a string and buffer length for the second and third arguments. I've changed this to sOnns, to accept a bytes object containing the offending data.

Notes

  • In swapping over to templates, it was helpful to add a few new functions that mapped numpy types to names, shortnames, and PyArray_DTypeMeta structs. I'm not sure if there is a better way to do this, but if you have any ideas about this I'd be grateful to learn how this could be done more elegantly (or if these would make sense to have in a different place in the code - I could see them being useful in more places than just this...):
  1. typenum_to_shortname_cstr - I realize not all NPY_TYPES have corresponding shortnames, but these were the ones necessary to finish this effort
  2. typenum_to_cstr
  3. typenum_to_dtypemeta
  • C++ is a bit more picky about a few other things that are unrelated to the macros, so I fixed those as well:
    • Variables of type char can't be initialized with string literals
    • Some cases where integer types were being compared to enum types or size_t
    • Cases where enum flags were being combined with |, e.g. NPY_METH_NO_FLOATINGPOINT_ERRORS | NPY_METH_REQUIRES_PYAPI is not of type NPY_ARRAYMETHOD_FLAGS, it is of type int. Explicit casts were added to ensure the correct type
    • There was one instance where a goto was jumping over a variable initialization, which isn't allowed by the cpp compiler. The instance was modified to work without the jump.

@ngoldbaum I noticed that the string to f128 cast has only the NPY_METH_NO_FLOATINGPOINT_ERRORS array method flag. But if you look at string_to_float, or string_to_float_resolve_descriptors, it looks like there are lines that are accessing python objects, calling Py_INCREF, etc. I've kept the array flags identical to how they were before, but should this also have a NPY_METH_REQUIRES_PYAPI flag as well?

@seberg seberg changed the title [MAINT] Refactor casts.c to use cpp templates [MAINT] Refactor stringdtype casts.c to use cpp templates Jan 3, 2025
@seberg seberg changed the title [MAINT] Refactor stringdtype casts.c to use cpp templates MAINT:Refactor stringdtype casts.c to use cpp templates Jan 3, 2025
@seberg seberg changed the title MAINT:Refactor stringdtype casts.c to use cpp templates MAINT: Refactor stringdtype casts.c to use cpp templates Jan 3, 2025
@peytondmurray peytondmurray force-pushed the 25693-npystring-templating-casts branch 6 times, most recently from fd3e922 to 3cdcafe Compare January 9, 2025 06:13
@peytondmurray peytondmurray force-pushed the 25693-npystring-templating-casts branch from 3cdcafe to 220615b Compare January 9, 2025 06:26
@peytondmurray peytondmurray force-pushed the 25693-npystring-templating-casts branch from 220615b to d15e583 Compare January 9, 2025 06:56
Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anyone else wants to look over this carefully, I strongly recommend the side-by-side diff view.

Most of these comments are formatting related. Overall this looks like a very nice improvement.

numpy/_core/src/multiarray/stringdtype/casts.cpp Outdated Show resolved Hide resolved
numpy/_core/src/multiarray/stringdtype/casts.cpp Outdated Show resolved Hide resolved
numpy/_core/src/multiarray/stringdtype/casts.cpp Outdated Show resolved Hide resolved
numpy/_core/src/multiarray/stringdtype/casts.cpp Outdated Show resolved Hide resolved
goto next_step;
in += in_stride;
out += out_stride;
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're going to change this can you also do the same change in stringdtype_ufuncs.cpp? Or just leave it, I added the gotos because I was spending far too much time trying to understand the control flow and the repeated code was confusing me.

Copy link
Contributor Author

@peytondmurray peytondmurray Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I should have left a note about this in the body of the PR. This change is because the compiler will not allow you to goto across variable definitions. So with cpp this is actually a compile time error.

I can still make the same changes in stringdtype_ufuncs.cpp as well so that it matches, if you prefer - but on first look it seems like those jumps don't cross variable definitions.

numpy/_core/src/multiarray/stringdtype/casts.cpp Outdated Show resolved Hide resolved
numpy/_core/src/multiarray/stringdtype/casts.cpp Outdated Show resolved Hide resolved
numpy/_core/src/multiarray/stringdtype/casts.cpp Outdated Show resolved Hide resolved
numpy/_core/src/multiarray/stringdtype/casts.cpp Outdated Show resolved Hide resolved
peytondmurray and others added 7 commits January 10, 2025 12:07
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
@peytondmurray peytondmurray marked this pull request as ready for review January 10, 2025 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
  NODES
COMMUNITY 2
Idea 1
idea 1
Note 2
Project 5
USERS 1