Wikidata:Property proposal/IP range start
IP range start
Originally proposed at Wikidata:Property proposal/Organization
Description | start of the IP range in hexadecimal |
---|---|
Data type | String |
Domain | Qualifier for IPv4 routing prefix (P3761) and IPv6 routing prefix (P3793) in raw hex. |
Allowed values | ^[0-9a-f]{8,12}$ |
Example 1 | University of Oxford (Q34433) → IPv6 routing prefix (P3793) → 2001:630:440::/44 → 200106300440 |
Example 2 | Wikimedia Foundation (Q180) → IPv4 routing prefix (P3761) → 198.35.26.0/23 → c6231a00 |
Example 3 | Johns Hopkins University (Q193727) → IPv4 routing prefix (P3761) → 128.220.0.0/16 → 80dc0000 |
Example 4 | University of Chicago (Q131252) → IPv6 routing prefix (P3793) → 2a03:b600:640::/107 → 2a03b6000640 |
Planned use | Create a bot to update the qualifier based on the values in the properties. |
Robot and gadget jobs | This qualifier should be populated with a bot that I will write |
See also | IPv6 routing prefix (P3793) & IPv4 routing prefix (P3761) |
Motivation
I would like to make a tool on Toolforge that will allow a user to input an IP address and get the organization (Q43229) associated with that IP address. Unfortunately, you cannot query a range unless you have the start and end of that range. Therefore, I would like to create this property (and I will propose an end-of-range property if this is approved), which will be populated by a bot. Then organizations can be queried by IP address. I chose raw hex as the format because that is what MediaWiki uses in the ipblocks table; however, any ordered format should work (for instance, a non-separated decimal format, etc.) U+1F360 (talk) 14:51, 26 September 2019 (UTC)
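A minimal sketch of the range check this motivation describes, assuming Python's standard `ipaddress` module and full-width hex keys (8 digits for IPv4, 32 for IPv6; the proposal's IPv6 examples show only the first 12 digits). The names `range_bounds_hex` and `in_range` are hypothetical, and the end value stands in for the end-of-range property that had not yet been proposed.

```python
import ipaddress


def range_bounds_hex(cidr):
    """Return (start, end) of a CIDR block as zero-padded lowercase hex strings."""
    # strict=False tolerates host bits being set, e.g. 10.0.0.1/24
    net = ipaddress.ip_network(cidr, strict=False)
    width = net.max_prefixlen // 4  # 8 hex digits for IPv4, 32 for IPv6
    start = format(int(net.network_address), "0{}x".format(width))
    end = format(int(net.broadcast_address), "0{}x".format(width))
    return start, end


def in_range(ip, cidr):
    """Check membership using only string comparison on the hex keys."""
    addr = ipaddress.ip_address(ip)
    width = addr.max_prefixlen // 4
    key = format(int(addr), "0{}x".format(width))
    start, end = range_bounds_hex(cidr)
    # Equal-width lowercase hex strings sort in the same order as the numbers
    # they encode; the length check keeps IPv4 and IPv6 keys apart.
    return len(key) == len(start) and start <= key <= end


if __name__ == "__main__":
    print(range_bounds_hex("198.35.26.0/23"))
    print(in_range("198.35.27.10", "198.35.26.0/23"))
```

The point of the fixed width is that a plain string comparison (which is all a String-typed qualifier offers) then behaves like a numeric comparison.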
Discussion
- Comment This would be redundant with the IP range itself. To make such a tool, the simplest way is to do as follows: 1. extract all the IP ranges known to Wikidata with SPARQL, 2. index them in your own database, with the appropriate datatype (for instance PostgreSQL has a dedicated datatype for ranges, but it can also be done with numeric fields in MySQL / MariaDB), 3. query your SQL database directly, 4. set up a job to periodically update your database from Wikidata. − Pintoch (talk) 15:04, 26 September 2019 (UTC)
- That is true, that is a way to do it, but is it valuable for other people to be able to query the data themselves? I suppose ideally there would be a new type of field that would allow for more than one value (start and end), but as far as I know that doesn't exist right now. U+1F360 (talk) 15:09, 26 September 2019 (UTC)
- Another way to (maybe) solve this, would be to create a FILTER in SPARQL, but I'm not sure that's possible (errr... if it can create a FILTER that would index that itself rather than having to generate the range for every item on every query). U+1F360 (talk) 15:19, 26 September 2019 (UTC)
- I'm asking the development team if there is a better solution than duplicating the data Wikidata:Contact_the_development_team#IP_Range_Querieis. U+1F360 (talk) 15:51, 26 September 2019 (UTC)
- @U+1F360: Well, almost any solution is going to require duplication somewhere… you’re proposing to duplicate the data in the qualifiers, Pintoch is suggesting to duplicate it in an external database instead, and special support in Wikibase/WDQS would most likely require duplication in some internal index. I don’t think adding special support for this to the query service is worth the effort, so I agree with Pintoch that an external database seems like the preferable approach. (You can also make that database accessible to other tools, e. g. via a simple HTTP API, to avoid them having to repeat the work.) --Lucas Werkmeister (WMDE) (talk) 12:05, 30 September 2019 (UTC)
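A rough sketch of the external-database approach suggested in this thread, assuming Python with the standard `sqlite3` and `ipaddress` modules rather than the PostgreSQL or MySQL setups mentioned. The SPARQL extraction step is omitted: `ROWS` is a hypothetical stand-in for its results, built from the examples in the proposal. The table, column, and function names are illustrative only. Keys are stored as 32-digit hex text because SQLite integers are 64-bit and cannot hold an IPv6 address.

```python
import ipaddress
import sqlite3

# Hypothetical rows, as if returned by a SPARQL query over P3761 / P3793.
ROWS = [
    ("Q180", "198.35.26.0/23"),
    ("Q193727", "128.220.0.0/16"),
    ("Q34433", "2001:630:440::/44"),
]


def to_key(address):
    # 32 hex digits hold any IPv4 or IPv6 address; equal width keeps
    # text ordering identical to numeric ordering.
    return format(int(address), "032x")


def build_index(rows):
    """Step 2: index the ranges in a local database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE ip_range "
        "(qid TEXT, version INTEGER, start_key TEXT, end_key TEXT)"
    )
    for qid, cidr in rows:
        net = ipaddress.ip_network(cidr, strict=False)
        db.execute(
            "INSERT INTO ip_range VALUES (?, ?, ?, ?)",
            (qid, net.version,
             to_key(net.network_address), to_key(net.broadcast_address)),
        )
    db.execute("CREATE INDEX ip_range_start ON ip_range (version, start_key)")
    db.commit()
    return db


def lookup(db, ip):
    """Step 3: return the item IDs whose range contains the address."""
    addr = ipaddress.ip_address(ip)
    key = to_key(addr)
    cur = db.execute(
        "SELECT qid FROM ip_range "
        "WHERE version = ? AND start_key <= ? AND end_key >= ? ORDER BY qid",
        (addr.version, key, key),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    db = build_index(ROWS)
    print(lookup(db, "198.35.27.1"))
```

A periodic job (step 4) would rebuild or refresh this table from Wikidata; an HTTP API in front of `lookup` would let other tools reuse it.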
- Oppose. If attempting to represent an IPv4 or IPv6 range with a Quantity data type rather than String data type (due to the extra complexity in parsing String data types), then the value should be in decimal (base 10) notation with the value selected as the middle of the range, with +/- X where X is the distance to the start and end of the range. Dhx1 (talk) 14:02, 8 October 2019 (UTC)
- @Dhx1: That is a great idea! Let me figure out some examples. U+1F360 (talk) 14:06, 8 October 2019 (UTC)
- @Dhx1: As a worst-case example, if the range was 10.0.0.1/24 then the value would be 167772287±127.5 (struck; corrected to 167772287.5±127.5)? That seems awesome to me (rounding up in either direction gets the most extreme values). Since that value can be queried, and the current CIDR cannot be, perhaps this should be the default and the existing field should be deprecated? If someone needs the CIDR syntax it's a simple conversion to get the start and end and convert that to CIDR (I'm just wondering if it would be wise to not duplicate the data per the discussion above). U+1F360 (talk) 20:30, 8 October 2019 (UTC)
- @Dhx1: Oops. I did my math wrong, given 10.0.0.1/24 the value would be 167772287.5±127.5. Even better, doesn't require rounding to get the start and end of the range. U+1F360 (talk) 21:23, 8 October 2019 (UTC)
- Here are the remaining questions I have:
- Should I create a new proposal and close this one, or modify/rename this one?
- Should this be a new main property (i.e. not a qualifier)?
- Should the existing properties be deprecated?
- Should there be a property for IPv4 and IPv6? Or should there be a single property and require a of (P642) qualifier with Internet Protocol version 4 (Q11103) or IPv6 (Q2551624) values? Or should we use the Unit with IP address (Q11135) or IPv6 address (Q11097)? U+1F360 (talk) 01:15, 9 October 2019 (UTC)
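A small sketch of the midpoint ± half-width encoding discussed in this thread, assuming Python's standard `ipaddress` and `fractions` modules. The function names are hypothetical. `Fraction` is used so the .5 midpoints stay exact even for 128-bit IPv6 values, which a float could not represent.

```python
import ipaddress
from fractions import Fraction


def cidr_to_quantity(cidr):
    """Return (amount, half_width) so that the range is amount ± half_width."""
    # strict=False tolerates host bits being set, as in 10.0.0.1/24
    net = ipaddress.ip_network(cidr, strict=False)
    start = int(net.network_address)
    end = int(net.broadcast_address)
    return Fraction(start + end, 2), Fraction(end - start, 2)


def quantity_to_bounds(amount, half_width):
    """Recover the integer start and end of the range; no rounding is needed."""
    return int(amount - half_width), int(amount + half_width)


def contains(ip, amount, half_width):
    """An address is in the range if it lies within half_width of the midpoint."""
    return abs(int(ipaddress.ip_address(ip)) - amount) <= half_width


if __name__ == "__main__":
    amount, half = cidr_to_quantity("10.0.0.1/24")
    print(float(amount), float(half))
    print(quantity_to_bounds(amount, half))
```

For 10.0.0.1/24 the bounds come back as the decimal forms of 10.0.0.0 and 10.0.0.255, which can be turned into CIDR notation again if needed.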
- Comment Upon further thought, I think it would be best if a new data type was created that extends from quantity. This would be similar to the way Commons Media extends from string. Doing this would allow the user to input (and display) an IP address, or a CIDR range, and that data would be stored in decimal format (see above). Likewise, it would allow that data to be queried by converting an IP address to decimal format. @Lucas Werkmeister (WMDE): if I were to write a patch for this, would WMDE be willing to review and merge it? U+1F360 (talk) 15:59, 10 October 2019 (UTC)
- @U+1F360: I Support having a new datatype for this; there is a phabricator "epic" phab:T91505 for tracking proposals for new datatypes, and you can see how some of those have fared over the past few years. ArthurPSmith (talk) 16:49, 10 October 2019 (UTC)
- I created a new task phab:T235389. I think this proposal can be closed and I'll create a new one once that work is done. :) U+1F360 (talk) 16:43, 13 October 2019 (UTC)
Comment I've created a new proposal, so this one can be closed. Thanks for everyone's help! U+1F360 (talk) 02:06, 8 November 2019 (UTC)