Wikidata:Property proposal/Internet Dictionary of Polish Surnames ID

‎Internet Dictionary of Polish Surnames ID

Originally proposed at Wikidata:Property proposal/Authority control

Description: identifier in the nazwiska.ijp.pan.pl dictionary of surnames used in Poland
Represents: Internet Dictionary of Polish Surnames (Q118130420)
Data type: External identifier
Domain: property, instance of family name (Q101352), masculine family name (Q18972245), and any other entity that is some form of a surname (male, female, toponymic and whatnot)
Allowed values: [-A-Z]+
Example 1: Kowalski (Q3199417) → KOWALSKI
Example 2: Nowak (Q15073902) → NOWAK
Example 3: Czeszejko-Sochacki (Q121880021) → CZESZEJKO-SOCHACKI
Source: https://nazwiska.ijp.pan.pl/
Number of IDs in source: 29,997 (as of 2023-09-12)
Expected completeness: eventually complete (Q21873974)
Formatter URL: https://nazwiska.ijp.pan.pl/haslo/show/name/$1
Robot and gadget jobs: Yes, it would be perfect to let a bot simply populate this field with data I've collected.

Motivation

https://nazwiska.ijp.pan.pl is an online dictionary of surnames used in Poland (chiefly "Polish" surnames). For each surname in the database (~30,000 items as of August 2023), there's information on the origins of the surname, its popularity, and alternative spelling variants. This is now a freely available "ground truth" version of what we know about surnames in use in Poland.

The URL formatter is pretty straightforward. However, several different links resolve successfully for the same surname:
https://nazwiska.ijp.pan.pl/haslo/show/id/4910
https://nazwiska.ijp.pan.pl/haslo/show/name/TARCZYŃSKI
https://nazwiska.ijp.pan.pl/haslo/show/name/TARCZY%C5%83SKI
All of the above resolve successfully no matter the capitalization. Then there's a link to a full-page printout, too:
https://nazwiska.ijp.pan.pl/haslo/print/id/4910
The printout version is in HTML and is actually more convenient to read than the standard page version.

I don't know what the unique and permanent URL structure is, and there are no indicators on the website itself. I think it would make most sense to go by the actual surname string and default to all caps (that's how surnames are listed in the body of the page; this is how data in the PESEL registry is maintained; and this would also avoid dilemmas related to automatic injection and capitalization of compound surnames). It's my understanding that the ID would be more "unique" than the character string, but it would entail an extra step (i.e., go to the search box, look up the surname, copy the ID). Given the database is high-quality, there's little reason to believe IDs would be more stable than surname strings. Finally, I went through the whole site and only found two instances of double entries, and there the IDs were different, so it seems that if there's potential for duplication, it'll happen whether we go by the ID or by the surname string.
TL;DR I suggest we go with https://nazwiska.ijp.pan.pl/haslo/show/name/TARCZYŃSKI ([A-Z])
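To make the suggestion concrete, here is a minimal Python sketch of how a bot might turn a surname into the all-caps identifier and expand the proposed formatter URL. The function names (`to_identifier`, `matches_allowed_values`, `entry_url`) are hypothetical and only for illustration; the URL pattern and the `[-A-Z]+` constraint are taken from the proposal, and the site's actual behaviour is not verified here.

```python
import re
from urllib.parse import quote

# Formatter URL and allowed-values pattern as stated in the proposal.
FORMATTER_URL = "https://nazwiska.ijp.pan.pl/haslo/show/name/$1"
ALLOWED_VALUES = re.compile(r"[-A-Z]+")


def to_identifier(surname: str) -> str:
    """Normalize a surname to the all-caps form suggested in the proposal."""
    return surname.strip().upper()


def matches_allowed_values(identifier: str) -> bool:
    """Check the whole identifier against the proposed constraint."""
    return ALLOWED_VALUES.fullmatch(identifier) is not None


def entry_url(identifier: str) -> str:
    """Expand the formatter URL, percent-encoding non-ASCII letters as UTF-8."""
    return FORMATTER_URL.replace("$1", quote(identifier))


if __name__ == "__main__":
    for name in ("Kowalski", "Czeszejko-Sochacki", "Tarczyński"):
        ident = to_identifier(name)
        print(ident, matches_allowed_values(ident), entry_url(ident))
```

With this sketch, "Tarczyński" becomes TARCZYŃSKI and expands to the percent-encoded link quoted above. Note that TARCZYŃSKI does not pass the `[-A-Z]+` check, because Ń lies outside the ASCII range A-Z, so the constraint may need widening if identifiers keep Polish diacritics.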

(moved from description ArthurPSmith (talk) 19:11, 5 September 2023 (UTC))

This would provide readers with a certain "ground truth" on surnames in Poland.

Discussion

I'm very sorry if this isn't sufficient. First time posting something like this; it's a bit overwhelming. Update: the path can contain - (hyphens).

--Itorokelebogile (talk) 16:07, 27 August 2023 (UTC)

@Itorokelebogile: I fixed up the proposal a bit. The examples should look like what I did for example 1 - can you add 2 more to flesh this out a little? Otherwise it looks good to me. ArthurPSmith (talk) 19:16, 5 September 2023 (UTC)
Bless you, kind Wikidata soul! Updated the remaining 2 examples. Itorokelebogile (talk) 09:51, 6 September 2023 (UTC)