Talk:Q19859377

Latest comment: 1 year ago by Wostr in topic decimal separators
description: chemical compound


decimal separators


  Notified participants of WikiProject Chemistry

If I want to get the molar weight of the chemical substance N-Methylformanilide, which has a Wikidata page at https://www.wikidata.org/wiki/Q19859377, I use the following SPARQL query:

  SELECT DISTINCT ?chemical_substance ?value ?unitLabel WHERE {
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    ?chemical_substance wdt:P231 ?CAS_Registry_Number.
    FILTER REGEX(STR(?chemical_substance),"Q19859377").
    ?chemical_substance p:P2067 ?property.
    ?property psv:P2067 [ wikibase:quantityAmount ?value; wikibase:quantityUnit ?unit;].
  }
  ORDER BY ASC(?chemical_substance) ASC(?value) ASC(?unitLabel)

What I get back is a data record with the following content:

  row('http://www.wikidata.org/entity/Q19859377',literal(type('http://www.w3.org/2001/XMLSchema#decimal','13517')),literal(lang(en,dalton)))

The problem is that on the Wikidata page the value for the molar weight is entered as '13,517' (note the comma as decimal separator), but in the value returned by the SPARQL query the decimal separator is gone, so the value is 13517, which is a factor 1000 too high. Maybe the comma is interpreted as a thousands separator?

See also: https://query.wikidata.org/#SELECT%20DISTINCT%20%3Fchemical_substance%20%3Fvalue%20%3FunitLabel%20WHERE%20%7B%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%20%3Fchemical_substance%20wdt%3AP231%20%3FCAS_Registry_Number.%20FILTER%20REGEX%28STR%28%3Fchemical_substance%29%2C%22Q19859377%22%29.%20%3Fchemical_substance%20p%3AP2067%20%3Fproperty.%20%3Fproperty%20psv%3AP2067%20%5B%20wikibase%3AquantityAmount%20%3Fvalue%3B%20wikibase%3AquantityUnit%20%3Funit%3B%5D.%20%20%7D%20ORDER%20BY%20ASC%28%3Fchemical_substance%29%20ASC%28%3Fvalue%29%20ASC%28%3FunitLabel%29%20

Of course I can manually change the value on this particular Wikidata page, but I cannot do that for all (+500k) chemical substances and all their property values.
The solution might be to enforce . (dot) as the decimal separator, or to take locales (of editor and/or reader) into account, or something else. Please note that this particular molar weight value is just an example, which can easily be verified by a calculation using the molecular formula (C₈H₉NO: 8*12+9*1+14+16 = 135). Other problematic values are not so easily detectable. Jromme (talk) 10:06, 18 September 2023 (UTC)
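The sanity check described above (recompute the molar mass from the formula and compare it with the stored value) could be sketched roughly as follows. This is only an illustration: the function names, the small element table and the 2 % tolerance are assumptions made for this example, and the parser handles only simple formulas without brackets.

```python
import math
import re

# Standard atomic weights (g/mol), rounded; only the elements needed here.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}
# Map Unicode subscript digits (as in C₈H₉NO) to plain digits.
SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def molar_mass(formula: str) -> float:
    """Average molar mass of a simple formula such as 'C8H9NO' (no brackets)."""
    total = 0.0
    pairs = re.findall(r"([A-Z][a-z]?)(\d*)", formula.translate(SUBSCRIPTS))
    for symbol, count in pairs:
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total


def power_of_ten_error(reported: float, expected: float, tolerance: float = 0.02):
    """Return k if reported is about expected * 10**k with k != 0, else None.

    The tolerance is a hypothetical choice, not a Wikidata rule.
    """
    exponent = round(math.log10(reported / expected))
    if exponent == 0:
        return None
    relative = reported / (expected * 10.0 ** exponent)
    return exponent if abs(relative - 1.0) <= tolerance else None


expected = molar_mass("C₈H₉NO")
print(round(expected, 3))                   # computed mass in g/mol
print(power_of_ten_error(13517, expected))  # exponent of the suspected scale error
```

For the value returned by the query, 13517, the check should report an exponent of 2, i.e. a factor of 100 relative to the computed mass of about 135.17. A check like this only works for properties that can be recomputed independently, which is exactly the limitation noted above.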

  • @Jromme: I don't see a comma anywhere; for me this value is just "13 517", with the proper thousands separator in Polish being a space. I always enter data in WD with a comma as the decimal separator and WD always interprets it the way it should. This value was added in this edit, so I suppose that user:Rhj may have used the proper German decimal separator (comma), but WD interpreted it as the American thousands separator, maybe due to some settings in Preferences or the lack of any language data on the userpage. This problem still persists [1], [2]. However, this is the first time I have seen such a thing in WD items, so it's not a problem of "(+500k) chemical substances". In fact, I found only 5 other problems like this; all are corrected now. Wostr (talk) 10:42, 18 September 2023 (UTC)
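The ambiguity discussed here can be illustrated with a minimal sketch. It is not how Wikibase parses input; `parse_number` and `format_number` are hypothetical helpers, and the string "135,17" is only a guess at what was originally typed, since the thread does not record it.

```python
from decimal import Decimal


def parse_number(text: str, decimal_sep: str, group_sep: str) -> Decimal:
    """Parse text under an explicit convention (hypothetical helper)."""
    cleaned = text.strip().replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


def format_number(value: Decimal, decimal_sep: str, group_sep: str) -> str:
    """Render a stored number with the given separators (hypothetical helper)."""
    formatted = f"{value:,}"  # Python output: ',' for groups, '.' for decimals
    # Swap separators via a placeholder so the two replacements do not collide.
    return (
        formatted.replace(",", "\x00")
        .replace(".", decimal_sep)
        .replace("\x00", group_sep)
    )


# The same string read under two conventions gives two different numbers.
print(parse_number("13,517", decimal_sep=".", group_sep=","))  # English reading
print(parse_number("13,517", decimal_sep=",", group_sep="."))  # German reading

# Assumption: if "135,17" was typed and read with English rules,
# the stored number would be 13517, matching the query result.
print(parse_number("135,17", decimal_sep=".", group_sep=","))

# One stored number, rendered for different readers.
stored = Decimal("13517")
print(format_number(stored, decimal_sep=".", group_sep=","))  # English display
print(format_number(stored, decimal_sep=",", group_sep=" "))  # Polish display
```

This would also explain why one reader sees "13,517" and another "13 517" for the same stored quantity: the query service returns the stored number, while the item page renders it in the reader's interface language.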
    I see you changed the value, but it is still not correct.
    The molar weight should be something like 135 (calculating by hand gives 8 * 12 + 9 * 1 + 14 + 16 = 135 for C₈H₉NO), and yes, you are right: my initial message should have said 'which is a factor 100 too high' (so 100 instead of 1000).
    But that still does not address the generic problem with values with decimal commas, which appear a factor 10/100/1000/? too big in some(?) people's (at least my) SPARQL output, depending on locales(?), I don't know.
    And with +500k chemical substances, I did not mean that all the values are incorrect; what I meant is that it is very difficult to identify the (probably very few) incorrect values. Molar weight is easy to verify, but checking other property values is not straightforward. Jromme (talk) 12:47, 18 September 2023 (UTC)
    "I see you changed the value, but it is still not correct." → That was just a mistake: I put the comma in the wrong place. It took two seconds to correct.
    I still don't know where you get this issue; I can't reproduce it. Your query is faulty in a number of ways. I don't know why you chose CAS RNs as an indicator that something is a chemical substance. CAS RNs are really not an indicator of anything other than the fact that some record exists in the CAS database. CAS RNs are issued for a variety of physical and theoretical objects, not only chemical substances. It's much better to use InChI/InChIKey or Wikidata metaclasses like type of chemical entity (Q113145171), group of stereoisomers (Q59199015) etc.
    How did I establish that there should not be any similar problems in WD right now? → I queried all items that are instance of (P31) type of chemical entity (Q113145171) or instance of (P31) group of stereoisomers (Q59199015) with mass (P2067) above 1000. There were some obvious mistakes like in this item, but such cases were incidental.
    However, I wouldn't use WD molecular masses for anything right now. There were no guidelines in the past regarding automatic import of masses from external databases, nor are there any guidelines right now. Most WD items have monoisotopic mass imported, some have average mass imported, and some may have this mass edited in some way; there is no way to tell. What's more, automatic import from external databases is not perfect: in some items wrong IDs resulted in importing wrong data. There is no way right now to check the scale of such problems. Wostr (talk) 13:40, 18 September 2023 (UTC)
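To give a sense of how far apart the two kinds of mass mentioned above are, here is a small sketch for C₈H₉NO. The composition is written out by hand, and the tables hold rounded standard atomic weights and rounded masses of the most abundant isotopes; the variable and function names are made up for this example.

```python
# Composition of N-methylformanilide, C8H9NO, written out by hand.
COMPOSITION = {"C": 8, "H": 9, "N": 1, "O": 1}

# Rounded standard atomic weights (average over natural isotopes), g/mol.
AVERAGE = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

# Rounded masses of the most abundant isotopes (1H, 12C, 14N, 16O), Da.
MONOISOTOPIC = {"H": 1.00782503, "C": 12.0, "N": 14.00307401, "O": 15.99491462}


def total_mass(composition: dict, table: dict) -> float:
    """Sum the per-element masses from the given table."""
    return sum(table[element] * count for element, count in composition.items())


average_mass = total_mass(COMPOSITION, AVERAGE)
monoisotopic_mass = total_mass(COMPOSITION, MONOISOTOPIC)

print(f"average:      {average_mass:.4f}")
print(f"monoisotopic: {monoisotopic_mass:.4f}")
print(f"difference:   {average_mass - monoisotopic_mass:.4f}")
```

For this compound the two values should differ by roughly 0.1 Da, which is small next to a factor-of-100 error but large enough that a stored mass cannot be reused safely without knowing which kind it is.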