Wikipedia talk:AutoWikiBrowser/Typos
This page has archives. Sections older than 40 days may be automatically archived by Lowercase sigmabot III.
at the at the
I'm working my way through "at the at the" -"at the" and have got this down to 68 remaining. But it probably makes more sense to feed it into AWB. ϢereSpielChequers 11:39, 18 October 2022 (UTC)
- Done. Neils51 (talk) 08:24, 25 October 2022 (UTC)
- Ta muchly ϢereSpielChequers 19:46, 14 November 2022 (UTC)
Fractions
One of the fraction conversion rules is changing ½ to 1⁄2, which should not be happening on chess-related pages, where ½ is always used to indicate the points allocation in a drawn game, per MOS:FRAC. Example: 40th Chess Olympiad. Colonies Chris (talk) 16:45, 1 November 2022 (UTC).
- @Colonies Chris - I have disabled these rules, since they can't tell whether the article falls under one of the exemptions listed at MOS:FRAC. GoingBatty (talk) 19:03, 10 November 2022 (UTC)
- Thanks. Colonies Chris (talk) 08:30, 11 November 2022 (UTC)
"on" --> "in" in date expressions
Also, there's a problem with inappropriate conversion of "on" to "in" in cases such as "Average ratings calculated by chess-results.com based on August 2014 ratings" --> "Average ratings calculated by chess-results.com based in August 2014 ratings", Example: 41st_Chess_Olympiad. Colonies Chris (talk) 17:02, 1 November 2022 (UTC)
- @Colonies Chris: I've adjusted the rule so that it won't damage "based on August 2014 ratings". I've thrown in a few other words from a similar rule of my own. There's still plenty of scope for false positives, too hard for a regular expression to catch, along the lines "an increase of an astounding 24 million dollars on August 2014 figures" -- John of Reading (talk) 18:05, 1 November 2022 (UTC)
- Thanks. Colonies Chris (talk) 18:31, 1 November 2022 (UTC)
a eu -> an eu
Can we remove the general fix "a eu" -> "an eu"? It is causing false positives with French text. Example diff. cc @Kudpung, MB, and Elinruby:. Thanks. –Novem Linguae (talk) 23:49, 9 November 2022 (UTC)
- Novem Linguae, "a eu" is the French for "has had". I wasn't aware that AWB is supposed to be partly a translation tool. You may also wish to consider "dont" which in French means "of which" and not "don't". There are probably thousands more false positives, I can immediately think of dozens, but the effort should be to explain to users that mainspace is probably not the best place to dump an article in French (or any other language) into mainspace even if the intention is to translate it. Kudpung กุดผึ้ง (talk) 00:20, 10 November 2022 (UTC)
- We do get quotations which aren't marked clearly enough for AWB to skip them. This occurs not only with other languages but with archaic formes of Engliſhe. We just click on the line to undo the fix (or skip RETF in JWB) and move on. A good compromise might be to require more letters after eu, so "a eusomething" is fixed but not "il a eu un ...". However, most "eu*" words take "a" – a Euro, a euphemism, a eureka moment – and I'm struggling to think of one that needs "an". Do we know which rule is doing this? Is it the enormous "A to An" regex which won't fit on my screen? Can we just remove [the eu part of] whatever rule is doing this? Certes (talk) 09:59, 10 November 2022 (UTC)
- I suggest we seek some other remedy than scolding translators, as this vastly decreases the number of people willing to do that work. This particular diff is an edge case caused by problems in another tool, as previously discussed at great length with Kudpung (talk · contribs). However, I also think that Novem Linguae (talk · contribs) and Certes (talk · contribs) are correct; I cannot think of an instance where changing a->an before the string ‘eu’ would result in correct English, and having just expressed a willingness to improve tools, in general, perhaps I should just ask where I might find this regex? Possibly I can help, and would not make any changes without discussing them. It is a small part of the issue I was complaining about, but the small improvement of fixing this one small problem would nonetheless be an improvement. Elinruby (talk) 19:30, 17 November 2022 (UTC)
- I don't think anyone's tracked down exactly which regex makes the change. I guessed at the "A to An" entry in WP:AWB/T#New additions (a larger and more complex regex than I ever saw in 30 years as a software professional) but regex101.com says neither "il a eu un" nor "a euphemism" match it. Certes (talk) 22:07, 17 November 2022 (UTC)
- Agreed (and I've done some pretty tricky + and - lookaround expressions). Even short regex strings can be challenging, and it would be arrogant for anyone to say that one that long and complex properly handles the cases it was designed for, without corrupting ones it wasn't. And that's without even considering the unmaintainability of such a monstrosity by other users, even if it were "correct" under some limited set of conditions. Stuff like that is more about look-at-me bragging rights, than actually helping Wikipedia. Mathglot (talk) 22:14, 17 November 2022 (UTC)
- Ping Novem Linguae (talk · contribs), MB (talk · contribs) Elinruby (talk) 19:32, 17 November 2022 (UTC)
- @Mathglot: this is what I was asking you about Elinruby (talk) 20:02, 17 November 2022 (UTC)
- One approach here, in my opinion, concerns the use of the {{lang}} template. As long as a eu, dont (or any other text, in any language other than English) is contained in a {{lang}} template, then *with proper coding* in AWB, the problems can be avoided. It would be unfair and impractical to require hundreds of regular expressions to be changed, just to deal with this; in my opinion, this is an AWB-wide issue, for all regular expressions that involve typos, therefore, the proper approach, in my opinion, is to make a Change Request to change the operation of AWB itself. What should happen, is, when an AWB regex is tagged as a typo (I don't use AWB so I don't know how that is done) then the code in AWB itself, should ignore cases embedded within a {{lang}} tag (unless the specified language is English, if that ever happens; possibly in copy/paste or translation from other Wikipedias into English). Meanwhile, users should be reminded to use {{lang}} for all foreign text, whose original raison d'etre was about metadata and this just provides another reason to do so. Mathglot (talk) 20:49, 17 November 2022 (UTC)
@Certes, Novem Linguae, Mathglot, and Elinruby: The change was indeed made by the "A to An" rule. You can check this by looking in the "Typos" tab at the bottom right of the AWB window to see which rules have just fired. The rule will change "a" to "an" before all words beginning with "E" or "e" except for some listed exceptions: [eE](?!\b|cologia ... |xtranj). One of the listed exceptions is that it will not change "euphemism" or any other word beginning with "eu" and at least two more letters: [uU](?:[A-Za-z]{2}|\sde\b). After checking the following word, the rule looks back at the preceding word to check for some foreign-language false positives; this stops it changing "il a eu un". The part that spots "il" is [iI](?:\b|l\b| ... |storie|terum). I have added u\b to the list of exceptions so that in future the rule will change "a EU" but not "a eu" or "a Eu".
But, yes, this rule has false positives, and all AWB users should be checking every edit they make, and {{lang}} tags are a good idea. -- John of Reading (talk) 08:17, 18 November 2022 (UTC)
- thank you for the explanation and the fix for the specific error. I also agree that Mathglot (talk · contribs) has made a good case for the lang template. Since I frequently see quotes in other languages, I will see about applying it in such cases, if it prevents headaches over here. Elinruby (talk) 15:55, 18 November 2022 (UTC)
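The exception mechanism described above, and the {{lang}} suggestion, can be sketched in a few lines of Python. This is a heavily simplified stand-in, not the actual "A to An" rule: the real regex has many more exceptions and also looks back at the preceding word, both of which are left out here. The names `A_TO_AN`, `LANG_TEMPLATE`, `fix_a_to_an` and `fix_outside_lang` are invented for illustration, and the template-skipping step is only an assumption about how such protection could work, not a description of what AWB does.

```python
import re

# Simplified stand-in for the "A to An" rule (NOT the real regex).
# "a" + space is changed when the next word starts with E/e, unless:
#   \b               the word is just the single letter "e"/"E"
#   u\b              the word is exactly "eu"/"Eu" (the newly added exception)
#   [uU][A-Za-z]{2}  the word starts with "eu" plus at least two more letters
A_TO_AN = re.compile(r"\b([Aa]) (?=[eE](?!\b|u\b|[uU][A-Za-z]{2}))")

# Hypothetical protection for non-English {{lang|xx|...}} templates.
# Nested templates are not handled in this sketch.
LANG_TEMPLATE = re.compile(r"\{\{lang\|(?!en\b)[^{}]*\}\}", re.IGNORECASE)


def fix_a_to_an(text: str) -> str:
    """Apply the simplified rule to the whole string."""
    return A_TO_AN.sub(r"\1n ", text)


def fix_outside_lang(text: str) -> str:
    """Apply the rule only to text outside non-English {{lang}} templates."""
    out, pos = [], 0
    for m in LANG_TEMPLATE.finditer(text):
        out.append(fix_a_to_an(text[pos:m.start()]))  # text before the template
        out.append(m.group(0))                        # template copied untouched
        pos = m.end()
    out.append(fix_a_to_an(text[pos:]))               # text after the last template
    return "".join(out)


if __name__ == "__main__":
    for sample in ["a elephant", "a EU member", "il a eu un", "a euphemism"]:
        print(repr(sample), "->", repr(fix_a_to_an(sample)))
    print(fix_outside_lang("He saw a elephant. {{lang|fr|il a encore faim}}"))
```

Under these assumptions "a elephant" and "a EU member" are changed while "il a eu un" and "a euphemism" are left alone. French text such as "il a encore faim" would still be damaged by the bare rule, which is the kind of case the {{lang}} wrapper is meant to shield.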
till -> until
In the phrase "glacial till", the word "till" should not be changed to "until". Glacial till is glacial sediment. example diff Kaltenmeyer (talk) 20:13, 13 November 2022 (UTC)
- Should we remove this well-intentioned new rule altogether? Several other uses such as "to till the pasture" and "Gone till November" would be hard to detect and skip. Certes (talk) 22:03, 13 November 2022 (UTC)
- Since this has been added, I've corrected dozens, maybe hundreds of "till"s and many/most are "till [date]". Allowing it to at least change it when followed by a number should be an improvement (although "till 20 acres" is still a problem). MB 22:54, 13 November 2022 (UTC)
- There are about 1K+ exceptions, combinations/permutations of such as soil, glacial, plain, money, with around 50K entries. So unless some serious work is done around exception processing then would vote yes, for suspension. I couldn't find a discussion around the expunging of 'till'? Editors who want to see it removed could use the current regex in their own configs as they are more likely to check use syntax. Maybe it should be targeted as suggested by MB. Neils51 (talk) 23:51, 13 November 2022 (UTC)
- Changing "till" to "until" before a number not followed by acre etc. sounds like a good compromise. Most occurrences of "till word" seem to be titles of works such as From Dusk till Dawn or land-related use such as "till plain". (Non-optimal regexp: \btill(?=\s+\d)(?!\s+\d+[-\s]+(acre|hectare)s?\b) → until.) Certes (talk) 12:16, 14 November 2022 (UTC)
- I've not been editing for several days, and have come here because in the few i've done so far today i'm seeing a silly number of "till" > "until"s; why are we making this change? "Till" is not incorrect, more a matter of style or taste, isn't it? Was there any discussion about this change? Happy days ~ LindsayHello 12:50, 15 November 2022 (UTC)
My go-to reference for usage is Bryan Garner; here is his entry for till:
till; until. Till is, like until, a bona fide preposition and conjunction. Though less formal than until, till is neither colloquial nor substandard. It's especially common in BrE—e.g. ...
followed by several examples from BrE usage, and then:
And it still occurs in AmE—e.g. ...
followed by examples, and then:
If a form deserves a sic it's the incorrect 'til. Worse yet is 'till which is abominable, ...
followed by yet more examples of that monstrosity in reliable, printed publications. My take: get rid of it; it's neither helpful, nor correct. Mathglot (talk) 20:39, 17 November 2022 (UTC)
- Not certain, Mathglot which "it" you are recommending we get rid of; if you mean the "till > until" change, i fully agree; when AWB editing it is the typo"fix" i most often have to unfix. Only thing is, it's a month since the last comment here, and the thing still is there. Any possibility that someone with the ability can/will remove it? Happy days ~ LindsayHello 11:06, 24 December 2022 (UTC)
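For anyone wanting to try the narrower "till before a number" compromise outside AWB, here is a minimal Python sketch using the non-optimal regexp quoted earlier in this thread, unchanged. The function name `till_to_until` is made up for the example. Python's `re` syntax is assumed to behave like the .NET regex engine for this particular pattern, which uses only `\b`, lookaheads and simple character classes.

```python
import re

# Regexp quoted in the thread: "till" followed by a number, unless that number
# is followed by "acre(s)" or "hectare(s)".
TILL_BEFORE_NUMBER = re.compile(
    r"\btill(?=\s+\d)(?!\s+\d+[-\s]+(acre|hectare)s?\b)"
)


def till_to_until(text: str) -> str:
    """Replace 'till' with 'until' only where the pattern above matches."""
    return TILL_BEFORE_NUMBER.sub("until", text)


if __name__ == "__main__":
    samples = [
        "open till 5 pm",
        "in office till 1990",
        "to till 20 acres",
        "a 3 m layer of glacial till",
        "From Dusk till Dawn",
    ]
    for s in samples:
        print(repr(s), "->", repr(till_to_until(s)))
```

Only the first two samples change. The sketch says nothing about whether the change is desirable, which is the separate style question raised above.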
Fiancée
I recently fixed about 100 instances of "Fianceé", mostly to "Fiancée" but a few to the masculine "Fiancé". I was thinking of adding something like
<Typo word="Fiancée" find="\b([fF])ianc[eé]é" replace="$1iancée"/>
That works in testing but has no effect in the actual WP:AWB/T file. It might be a useful addition if anyone can get it to actually do something! (I omitted the customary final \b for the benefit of JWB, which doesn't recognise "é" as a letter. It should be safe unless you know of a Mr. Fianceéwibble.) Certes (talk) 16:11, 25 November 2022 (UTC)
- Double entry? Did some testing and what's working is this current entry;
<Typo word="Fiancé" find="\b([fF])ianc[eè](e)?\b(?![^\s\.]*\.\w)" replace="$1iancé$2"/><!--avoid domains-->
- Neils51 (talk) 21:31, 24 December 2022 (UTC)
- Yes, that similar entry works. It's fixing Fiancee and Fiancèe (also without the final e); I'm trying to fix Fianceé and Fiancéé. Certes (talk) 22:32, 24 December 2022 (UTC)
- Tried with unicode values and seems to work. Neils51 (talk) 12:49, 25 December 2022 (UTC)
- Thanks. That should work in AWB. I've removed the final \b, partly to match "Fianceés" but mainly because JWB's \b matches only an A-Z boundary, which does not occur between é and space in text such as "...fianceé was...". Certes (talk) 13:05, 25 December 2022 (UTC)
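The point about the trailing \b can be reproduced in Python, as a rough analogy only. The sketch below uses the find/replace pair proposed at the top of this section (not the Unicode-value version that was eventually saved, which is not quoted here), and uses Python's `re.ASCII` flag as an assumed stand-in for an engine whose `\b` only recognises unaccented letters as word characters. The names `FIANCEE`, `FIANCEE_ASCII_B`, `FIANCEE_UNICODE_B` and `fix_fiancee` are invented for the example.

```python
import re

E_ACUTE = "\u00e9"  # the letter e with acute accent, written as an escape

# Proposed rule from this section, with no trailing \b.
FIANCEE = re.compile(rf"\b([fF])ianc[e{E_ACUTE}]{E_ACUTE}")

# Same pattern with a trailing \b. Under ASCII-only word rules the accented
# letter is not a word character, so there is no boundary between it and a
# following space.
FIANCEE_ASCII_B = re.compile(rf"\b([fF])ianc[e{E_ACUTE}]{E_ACUTE}\b", re.ASCII)

# With Python's default Unicode-aware \b the same pattern does find a boundary.
FIANCEE_UNICODE_B = re.compile(rf"\b([fF])ianc[e{E_ACUTE}]{E_ACUTE}\b")


def fix_fiancee(text: str) -> str:
    """Move the accent onto the first e: -ce + e-acute or two e-acutes become e-acute + e."""
    return FIANCEE.sub(rf"\1ianc{E_ACUTE}e", text)


if __name__ == "__main__":
    sample = f"his fiance{E_ACUTE} was there"
    print(fix_fiancee(sample))
    print(FIANCEE_ASCII_B.search(sample))    # None: trailing \b never matches
    print(FIANCEE_UNICODE_B.search(sample))  # a match object
```

Dropping the trailing \b sidesteps the difference between the two boundary definitions, at the cost of also matching longer strings that begin the same way, as already noted above.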