Wikipedia talk:AutoWikiBrowser/Typos

This is an old revision of this page, as edited by Certes (talk | contribs) at 08:59, 7 June 2019 (encyclopaedia > encyclopedia: suggestion). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.



Rules that match correct spellings?

Are the typo rules supposed to avoid matching properly spelled words?

  • "Collaborate" matches the properly spelled "collaboration" (e.g. Meego)
  • "Prestigious" matches the properly spelled "prestigious" (e.g. Kristalina Georgieva)
  • "Translate" matches the properly spelled "translate" (e.g. Quidditch Benelux)
  • "-tility" matches the properly spelled "utilities" (e.g. Metro Tunnel)

Thanks! GoingBatty (talk) 18:21, 9 February 2019 (UTC)

I think we need to find a consensus between "overuse" of look-behinds, and compatibility with JWB, which can't use them, so omits those rules. The downside of not using look-behinds is, in AWB, the Typo tab will be filled with cases where it matched the correct text. And performing many unnecessary replacements is slower than using look-behinds. The upside of not using look-behinds is that JWB, and possibly other software(?), is able to use the typo rules. Which of these is more important?
If we decide against look-behinds (or at least limit them), that's a good argument to keep capitalizations in their own section, since capitalization + another typo = look-behind needed. It would be useful to know how many non-capitalization rules have and still need a look-behind (i.e. would adding look-behinds to capitalizations greatly increase the look-behind count, or would it be a drop in the bucket?).   ~ Tom.Reding (talkdgaf)  12:18, 11 February 2019 (UTC)
A few thoughts from a non-expert:
  1. Could JWB be enhanced to remove look-behinds rather than skip the entire expression? The safest approach is to apply this tactic only to look-behinds which we mark as "efficiency only" in some way, but we may even be able to let JWB remove all look-behinds unless some of them actually prevent inappropriate changes.
  2. Could AWB be enhanced to filter typos from its list where the replacement text equals the replaced text?
  3. Could we use look-aheads instead, or would that be grossly inefficient as we'd have to check every start position rather than just those that matched the typo?
  4. Look-behinds may be coming soon to JavaScript, so this problem may eventually solve itself. Interesting reading, and a couple of useful links at the bottom: [1].
Certes (talk) 15:43, 11 February 2019 (UTC)
@Joeytje50: would question #1 above be doable in JWB, perhaps by appending a specific comment at the end of specific rules which have 'efficiency-only' look-behinds?   ~ Tom.Reding (talkdgaf)  05:06, 15 February 2019 (UTC)
@Reedy: would question #2 above be doable in AWB? If >= maybe, I'll create a phab ticket.   ~ Tom.Reding (talkdgaf)  05:11, 15 February 2019 (UTC)
Could AWB add lookbehinds automatically, changing L → R to L(?<!r) → R? r is R made suitable for searching; I don't know this flavour of regex in detail but that may mean replacing $1 by \1, etc. For example, if we write ([Ss])pel+(ed|ing) → $1pell$2, could AWB actually run ([Ss])pel+(ed|ing)(?<!\1pell\2) → $1pell$2? Does this work? Is it efficient? Certes (talk) 12:24, 6 March 2019 (UTC)
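
As a rough illustration of suggestion 2 above (filtering out matches whose replacement equals the matched text), here is a minimal Python sketch. It is not AWB or JWB code: the function name `effective_fixes` is hypothetical, and Python's `re` module differs from the regex flavour AWB uses (for instance, replacements are written `\1` rather than `$1`, and Python only accepts fixed-width look-behinds), so the rule is adapted accordingly.

```python
import re

def effective_fixes(pattern, template, text):
    """Return (matched, replacement) pairs, skipping no-op matches.

    Hypothetical helper: a match is reported only when applying the
    rule would actually change the text.
    """
    fixes = []
    for m in re.finditer(pattern, text):
        replacement = m.expand(template)
        if replacement != m.group(0):
            fixes.append((m.group(0), replacement))
    return fixes

# The spelling rule quoted above, with $1/$2 rewritten as \1/\2 for Python.
SPELL_PATTERN = r"([Ss])pel+(ed|ing)"
SPELL_TEMPLATE = r"\1pell\2"

sample = "She spelled it right, but the speling of the next word was off."
fixes = effective_fixes(SPELL_PATTERN, SPELL_TEMPLATE, sample)
```

Here the rule matches both "spelled" and "speling", but only the second would be listed as a typo, which is the effect an efficiency-only look-behind is meant to achieve.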

New infobox image?

The infobox image shows that AWB/T uses replacewith=. Since AWB/T actually uses replace=, would someone like to change the infobox image? Thanks! GoingBatty (talk) 18:21, 10 March 2019 (UTC)

HTML entities

So I've been running database scans and finding that people often write certain non-ASCII characters as HTML entities. General consensus seems to be that for wikitext readability, this shouldn't be done for Latin alphabet-based letters, certainly, and some common symbols. There are many hundreds of instances, so it would be nice to have some semi-automated help fixing them. Below is a list I've put together of the most frequent occurrences that should be universally safe. I'm hoping this syntax will work with AWB and friends...would anyone be interested in testing them out and adding to the official list if they work? Thanks! -- Beland (talk) 23:30, 13 March 2019 (UTC)

<Typo find="&deg;" replace="°"/>
<Typo find="&sect;" replace="§"/>
<Typo find="&eacute;" replace="é"/>
<Typo find="&pound;" replace="£"/>
<Typo find="&ccedil;" replace="ç"/>
<Typo find="&uuml;" replace="ü"/>
<Typo find="&auml;" replace="ä"/>
<Typo find="&oacute;" replace="ó"/>
<Typo find="&ouml;" replace="ö"/>
<Typo find="&aacute;" replace="á"/>
<Typo find="&#233;" replace="é"/>
<Typo find="&egrave;" replace="è"/>
<Typo find="&iacute;" replace="í"/>
<Typo find="&ntilde;" replace="ñ"/>
<Typo find="&agrave;" replace="à"/>
<Typo find="&euml;" replace="ë"/>
<Typo find="&Eacute;" replace="É"/>
<Typo find="&#246;" replace="ö"/>
<Typo find="&oslash;" replace="ø"/>
<Typo find="&szlig;" replace="ß"/>
<Typo find="&para;" replace="¶"/>
<Typo find="&Uuml;" replace="Ü"/>
<Typo find="&#225;" replace="á"/>
<Typo find="&#237;" replace="í"/>
<Typo find="&#163;" replace="£"/>
<Typo find="&atilde;" replace="ã"/>
<Typo find="&ecirc;" replace="ê"/>
<Typo find="&#228;" replace="ä"/>
<Typo find="&permil;" replace="‰"/>
<Typo find="&Iacute;" replace="Í"/>
<Typo find="&Ouml;" replace="Ö"/>
<Typo find="&#229;" replace="å"/>
<Typo find="&Aacute;" replace="Á"/>
<Typo find="&Aring;" replace="Å"/>
<Typo find="&#250;" replace="ú"/>
<Typo find="&ocirc;" replace="ô"/>
<Typo find="&acirc;" replace="â"/>
<Typo find="&euro;" replace="€"/>
<Typo find="&#248;" replace="ø"/>
<Typo find="&#333;" replace="ō"/>
<Typo find="&#257;" replace="ā"/>
<Typo find="&#182;" replace="¶"/>
<Typo find="&#252;" replace="ü"/> 
<Typo find="&#243;" replace="ó"/>
<Typo find="&#335;" replace="ŏ"/>
<Typo find="&#x101;" replace="ā"/>
<Typo find="&#xe9;" replace="é"/>
<Typo find="&#263;" replace="ć"/>
<Typo find="&#232;" replace="è"/>
<Typo find="&#176;" replace="°"/>
<Typo find="&#347;" replace="ś"/>
<Typo find="&#x17C;" replace="ż"/>
<Typo find="&#241;" replace="ñ"/>
<Typo find="&#224;" replace="à"/>
<Typo find="&#353;" replace="š"/>
<Typo find="&#351;" replace="ş"/>
<Typo find="&#299;" replace="ī"/>
<Typo find="&#x14d;" replace="ō"/>
<Typo find="&#491;" replace="ǫ"/>
<Typo find="&#xe8;" replace="è"/>
<Typo find="&#363;" replace="ū"/>
<Typo find="&#322;" replace="ł"/>
<Typo find="&#269;" replace="č"/>
<Typo find="&#235;" replace="ë"/>
<Typo find="&#365;" replace="ŭ"/>
<Typo find="&#268;" replace="Č"/>
<Typo find="&#xE9;" replace="é"/>
<Typo find="&#226;" replace="â"/>
<Typo find="&#379;" replace="Ż"/>
<Typo find="&#x000e1;" replace="á"/>
<Typo find="&#x000f6;" replace="ö"/>
<Typo find="&#201;" replace="É"/>
<Typo find="&#x000E1;" replace="á"/>
<Typo find="&#xE4;" replace="ä"/>
<Typo find="&#xf3;" replace="ó"/>
<Typo find="&#193;" replace="Á"/>
<Typo find="&#352;" replace="Š"/>
<Typo find="&#xe4;" replace="ä"/>
<Typo find="&#324;" replace="ń"/>
<Typo find="&#256;" replace="Ā"/>
<Typo find="&#xe1;" replace="á"/>
<Typo find="&#97;" replace="a"/>
<Typo find="&#xE1;" replace="á"/>
<Typo find="&#321;" replace="Ł"/>
<Typo find="&#x000f3;" replace="ó"/>
<Typo find="&#xFC;" replace="ü"/>
<Typo find="&#xA3;" replace="£"/>
<Typo find="&#355;" replace="ţ"/>
<Typo find="&#x00B0;" replace="°"/>
<Typo find="&#283;" replace="ě"/>
<Typo find="&#xF3;" replace="ó"/>
<Typo find="&#382;" replace="ž"/>
<Typo find="&#345;" replace="ř"/>
<Typo find="&#x000E9;" replace="é"/>
<Typo find="&#332;" replace="Ō"/>
<Typo find="&#xed;" replace="í"/>
<Typo find="&#167;" replace="§"/>
<Typo find="&#199;" replace="Ç"/>
<Typo find="&#281;" replace="ę"/>
<Typo find="&#380;" replace="ż"/>
@Beland: most of this is duplication of AWB's WP:AWB/UNICODIFY feature. There are only a handful of exceptions, which can be made into temporary typo rules until they're added to the unicodifying function. See Special:Diff/899831606 for reference.   ~ Tom.Reding (talkdgaf)  18:07, 1 June 2019 (UTC)
Aha, good to know that's already available. I was trying to test with JWB, but neither the regular rules nor the new ones are working. (And I don't think it supports unicodify?) I'll remove the duplicates from the live listing for now. -- Beland (talk) 18:13, 1 June 2019 (UTC)
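
For anyone wanting to sanity-check entity rules like the ones listed above before proposing them, a minimal Python sketch follows. It assumes the rules are available as plain text lines in the `<Typo find="…" replace="…"/>` form shown in this thread; the helper name `mismatched_rules` is hypothetical, and the check relies on the standard library's `html.unescape` rather than on anything AWB provides.

```python
import html
import re

# Matches one rule line in the form used in the list above.
RULE_RE = re.compile(r'<Typo find="([^"]+)" replace="([^"]+)"\s*/>')

def mismatched_rules(rule_lines):
    """Return the rule lines whose entity does not decode to the replacement."""
    bad = []
    for line in rule_lines:
        m = RULE_RE.search(line)
        if m is None:
            continue  # not a rule line; ignore it
        entity, expected = m.groups()
        if html.unescape(entity) != expected:
            bad.append(line)
    return bad

# A few rules copied from the list above.
rules = [
    '<Typo find="&deg;" replace="°"/>',
    '<Typo find="&#233;" replace="é"/>',
    '<Typo find="&#x000e1;" replace="á"/>',
    '<Typo find="&#x17C;" replace="ż"/>',
]
problems = mismatched_rules(rules)
```

An empty result means every sampled entity decodes to the character given in `replace`; it says nothing about whether the replacement is appropriate in a particular article.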

"In the mean time" vs "In the meantime"

I recently created a new rule to change "In the mean time" to "In the meantime" and updated the relevant articles, including Henry Gage (soldier). My edit was reverted by Andreas Philopater as "stylistically inferior". Looking for thoughts from the experts here about this rule. Thanks! GoingBatty (talk) 16:33, 20 March 2019 (UTC)

"Mean time" would mean "average time", while "meantime" means "concurrently" or similar, as described in Merriam-Webster and many other reliable sources. I'm failing to find an authoritative source recommending "in the mean time". I suppose there might exist some earlier period in English usage where "mean time" was common, but it has no place in article prose except for inside quotation marks.   ~ Tom.Reding (talkdgaf)  17:01, 20 March 2019 (UTC)
But why would you take an American dictionary as the sole standard? And why would you rule out a legitimately existing option within the full range of the English language, simply because American English is poor in alternatives? --Andreas Philopater (talk) 21:24, 20 March 2019 (UTC)
Andreas Philopater, I'll be interested in further discussion after you provide a reliable source supporting your position.   ~ Tom.Reding (talkdgaf)  22:49, 20 March 2019 (UTC)
@Andreas Philopater: It looks like Tom.Reding referred to "Merriam-Webster and many other reliable sources" instead of taking "an American dictionary as the sole standard". Since American English can be different from English in other parts of the world, I look forward to other reliable sources. Thanks! GoingBatty (talk) 01:52, 23 March 2019 (UTC)
  • Since both forms are available in English, there's no need to force either one as an automatic "correction". --Andreas Philopater (talk) 21:24, 20 March 2019 (UTC)
  • "Mean time" has a specific meaning: local mean time is determined by the sun, as distinct from local standard time. It is defined so that noon by local mean time is when the sun is directly above the local meridian. Meantime does not have the same meaning at all. "In the meantime" means "while something else is currently happening". I don't know which use is appropriate in the article concerned but I do know that you cannot simply switch between "mean time" and "meantime". American English has nothing to do with it. - Nick Thorne talk 22:00, 20 March 2019 (UTC)
Upon re-reading the OP, I think the rule is appropriate. "In the mean time" does not actually make any sense, given what mean time really means. - Nick Thorne talk 22:03, 20 March 2019 (UTC)
And of course it isn't conceivable that the "mean time" in "in the mean time" signifies something other than the "mean time" in "Greenwich Mean Time". Because language always lends itself to neat categorisation and polysemy just isn't a thing. --Andreas Philopater (talk) 19:31, 21 March 2019 (UTC)
@Andreas Philopater: If you find some examples in the English Wikipedia where "in the mean time" means something other than "while something else is currently happening", could you please post them here? Thanks! GoingBatty (talk) 01:52, 23 March 2019 (UTC)
I've looked but can't see any on Wikipedia. They occasionally occur elsewhere, when reporting the average duration or interval of some event.
  • a change in the mean time between unscheduled removal [2]
  • variations in the mean time of diabetes onset [3]
  • Subsequent failures are considered in the mean time between failures [4]
  • a significant difference in the mean time of the mortality [5]
Maybe run this but do a negative lookahead for "of|between"? – Certes (talk) 11:19, 23 March 2019 (UTC)
  Done   ~ Tom.Reding (talkdgaf)  16:00, 27 March 2019 (UTC)
  • As others have said, "mean time" is a term of art for average time, such as the mean time of solar noon or the mean time between failures. The other meaning, a rough synonym for "meanwhile", is usually written "meantime" but is sometimes written "mean time". The question is whether the latter is valid or should be corrected. It's listed in Wiktionary and The Free Dictionary, and there are examples of its use attributed to Reuters reports here, but I can't find a RS to confirm whether it's actually right. Certes (talk) 00:40, 21 March 2019 (UTC)
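
The negative lookahead for "of|between" suggested in this section could look roughly like the Python sketch below. The pattern is an assumption written for illustration, not the rule actually added to the typo list, and `fix_meantime` is a hypothetical name.

```python
import re

# Hypothetical pattern: match "in the mean time" unless the next word is
# "of" or "between", which signals the "average time" sense.
MEANTIME_RE = re.compile(r"\b([Ii])n the mean time\b(?!\s+(?:of|between)\b)")

def fix_meantime(text):
    """Join "mean time" into "meantime", keeping the case of the first letter."""
    return MEANTIME_RE.sub(r"\1n the meantime", text)

fixed = fix_meantime("In the mean time, the garrison held out.")
kept = fix_meantime("a change in the mean time between unscheduled removal")
```

With this pattern the first sentence is changed and the second is left alone, matching the examples quoted above. Other "average time" phrasings that do not use "of" or "between" would still be changed.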

"nationalwide" → "nationwide"

Doesn't occur too often, but a useful addition in my opinion. --bender235 (talk) 21:22, 15 April 2019 (UTC)

"based off of" → "based on"

This is reasonably common but incorrect usage. Dalziel 86 (talk) 01:44, 25 May 2019 (UTC)

Am I not seeing this as first rule under Wikipedia:AutoWikiBrowser/Typos#Incorrect_phrases? Shenme (talk) 04:27, 1 June 2019 (UTC)

encyclopaedia > encyclopedia

@Tom.Reding: Since this edit, the typo fixer has been suggesting we change encyclopaedia to encyclopedia. Was that your intention? Lots of dictionaries list encyclopaedia as a valid spelling. -- John of Reading (talk) 06:30, 7 June 2019 (UTC)

How about something like <Typo word="Encyclopedia (1)" find="\b([eE])ncyl?c?l?op(a?e|æ)a?di(as?|c)\b(?<![eE]ncyclop(a?e|æ)di[ac]s?)" replace="$1ncyclop$2di$3"/> (not tested)? Certes (talk) 08:59, 7 June 2019 (UTC)
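
The untested suggestion above can be tried out in Python with one adaptation: Python's `re` module rejects variable-width look-behinds, so this sketch swaps the trailing look-behind for a leading negative look-ahead that excludes the accepted spellings. That is a different technique from the one proposed, and the names `CORRECT`, `ENCYC_RE` and `fix_encyclopedia` are hypothetical.

```python
import re

# Accepted spellings to leave alone (adapted from the look-behind above).
CORRECT = r"[eE]ncyclop(?:a?e|æ)di(?:as?|c)\b"

# Same find pattern as the suggestion, guarded by a negative look-ahead.
ENCYC_RE = re.compile(
    r"\b(?!" + CORRECT + r")([eE])ncyl?c?l?op(a?e|æ)a?di(as?|c)\b"
)

def fix_encyclopedia(text):
    """Repair misspellings while keeping the -ae-/-e-/-æ- variant found."""
    return ENCYC_RE.sub(r"\1ncyclop\2di\3", text)

result = fix_encyclopedia("The Encyclopaedia Britannica is an encylopedia.")
```

In this sketch "Encyclopaedia" is skipped because it is already an accepted spelling, while "encylopedia" is corrected. Whether the original look-behind form behaves the same way in AWB would still need testing there.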