Wikipedia talk:AutoWikiBrowser/Typos
This page has archives. Sections older than 40 days may be automatically archived by Lowercase sigmabot III.
Hyphenated phrase
The hyphen is not removed from "less-populated". MB 04:11, 19 April 2022 (UTC)
- @MB I just added a rule for you to fix both "less-populated" and "more-populated". GoingBatty (talk) 04:34, 19 April 2022 (UTC)
I'm getting a lot of -- what I consider -- false positives for the ly-hyphens. Can somebody point me in the direction of the styleguide for that rule? Smasongarrison (talk) 18:46, 9 May 2022 (UTC)
- @Smasongarrison See the response I received from BD2412 on Wikipedia_talk:AutoWikiBrowser/Typos/Archive_4#privately-. Happy editing! GoingBatty (talk) 18:50, 9 May 2022 (UTC)
- thanks! Smasongarrison (talk) 18:53, 9 May 2022 (UTC)
Testing with JWB
Perhaps everyone else knew this already, and there may well be an easier way to do it, but I've finally found a way to test new additions without riskily adding them to the public list or going through the tedious and error-prone process of copying and pasting every regexp into the UI. To add a custom set of typos in a format matching AWB/T to the list, start JWB, invoke the browser's JavaScript console and paste
RETF.list = []; // Empty the list - only needed for iterative testing
(new mw.Api()).get({
    action: 'query',
    prop: 'revisions',
    titles: 'User:Example/typos', // Substitute the title of your typo list page here
    rvprop: 'content',
    rvlimit: '1',
    indexpageids: true,
    format: 'json',
}).done(RETF.buildList);
Omit the first line to retain the standard list, but it's useful to get rid of a broken custom list before retesting after a fix. The titles: line can be any Wikipedia page, e.g. User:You/sandbox. Certes (talk) 21:03, 20 April 2022 (UTC)
- Certes, in Bawl you can now enter a custom page title to be used for RegExTypoFix. Only one page will be used so if a custom title is given the regular RETF won't be used. To test your entries, enter a page title for RETF to use instead of the default, save the settings, enter some text, press the magnifying glass and press the AWB RegExTypoFix button. Bawl will immediately report which (if any) rules matched something. Afterwards, empty the custom page title and save the settings to revert back to the title that is associated with your wiki according to d:Q6585066. — Alexis Jazz (talk or ping me) 22:37, 17 May 2022 (UTC)
- Thanks. I've not been using Bawl but it looks useful; I'll investigate it soon. Certes (talk) 22:40, 17 May 2022 (UTC)
Duplicate word=
We have a few duplicated values for word= in the typo list. Do these need to be made unique? List: "-ality", "First (3)", "Its (after)", "Its (before)", "Nonoperational", "Predecessor", "Regardless", "Sanskrit", "Thaw", "e.g.", "east–west", "km²", "north–south", "south–north", "sworn in", "west–east". (I was checking in case I duplicated any, but someone seems to have beaten me to it.) Certes (talk) 22:47, 20 April 2022 (UTC)
- Also, we have a typo entry marked disable=. Should that be disabled=, or are the two equivalent (perhaps anything other than word= works)? Certes (talk) 11:41, 21 April 2022 (UTC)
- @Certes: If I remember correctly, the AWB implementation just checks that "word=" is present, but doesn't do anything else with it. So, yes, changing "word" to anything else will disable a rule. Duplicate names have no effect, but it's easier to refer to a rule in edit summaries and discussions if they are unique. It's time I downloaded the source code again. -- John of Reading (talk) 14:55, 21 April 2022 (UTC)
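A check like the one Certes describes can be sketched in JavaScript. This is only an illustration: the function name findDuplicateWords, the parsing regex, and the sample rules are all hypothetical, and the regex assumes every rule starts with `<Typo word="..."` as in the rule quoted elsewhere on this page.

```javascript
// Hypothetical helper: list word= values that occur more than once in a
// block of <Typo .../> rules. The parsing regex is an assumption about the
// list format, not taken from AWB's source code.
function findDuplicateWords(listText) {
  const counts = new Map();
  // Only entries that carry a word= attribute are counted; an entry that
  // uses disable= or disabled= instead is skipped.
  const rule = /<Typo\s+word="([^"]*)"/g;
  for (const match of listText.matchAll(rule)) {
    const word = match[1];
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  // Keep only the names seen at least twice.
  return [...counts].filter(([, n]) => n > 1).map(([word]) => word);
}

// Made-up sample rules, for illustration only.
const sample = [
  '<Typo word="Thaw" find="\\bthaaw\\b" replace="thaw"/>',
  '<Typo word="Regardless" find="\\bregardles\\b" replace="regardless"/>',
  '<Typo word="Thaw" find="\\bthaww\\b" replace="thaw"/>',
  '<Typo disable="Oslo" find="\\bOlso\\b" replace="Oslo"/>',
].join('\n');

console.log(findDuplicateWords(sample));
```

With the made-up sample, only "Thaw" is reported, and the disable= entry is ignored because it has no word= attribute.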
"libration war"
Hi, we currently have 107 examples of "libration war", please can they be changed to "liberation war"? Ta ϢereSpielChequers 21:37, 27 April 2022 (UTC)
- In progress, done. Neils51 (talk) 03:37, 28 April 2022 (UTC)
MilliWatt = MediaWiki
<Typo word="W (watt)" find="([\d\.]+(?:[−―–—\s]| )?[µmkMGT])w\b" replace="$1W"/>
changes ".mw-first-heading" (a CSS class of #firstHeading) to ".mW-first-heading". For a non-code example, the ccTLD for Malawi (http://www.registrar.mw/) also matches. Found only three live bad replacements: 2004 New Zealand local elections (diff 457286863), Gulf University for Science and Technology (diff 708765198) and What's Going On up There? (diff 660471548). — Alexis Jazz (talk or ping me) 04:06, 4 May 2022 (UTC)
- @Alexis Jazz: Could we fix this by ensuring a digit appears before the period, such as this:
find="(\d[\d\.]*(?:[−―–—\s]| )?[µmkMGT])w\b"
GoingBatty (talk) 12:37, 4 May 2022 (UTC)
- ...or indeed after the period with just
find="(\d(?:…
, as ".123 mW" seems more likely than "123. mW". That also avoids domains such as "source123.mw". Certes (talk) 13:30, 4 May 2022 (UTC)
- Certes, GoingBatty, ensuring there's a digit sounds good. The digit would have to appear after the period (if there is a period) as .1mW is sometimes used for 0.1mW. — Alexis Jazz (talk or ping me) 14:24, 4 May 2022 (UTC)
- @Alexis Jazz Fixed! GoingBatty (talk) 18:33, 4 May 2022 (UTC)
- GoingBatty, thanks! I think originally it was meant to also match 5.−mw. That seems like an unusual way to write it to me (it's more common for prices?), but the −―–— is probably not really needed anymore when not matching a period. Edit: you're right, matching "48-kw engine" makes more sense. — Alexis Jazz (talk or ping me) 12:03, 5 May 2022 (UTC)
- @Alexis Jazz I think the dashes are needed for something like "a 48-kw engine". GoingBatty (talk) 14:16, 5 May 2022 (UTC)
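The difference between the original rule and GoingBatty's first suggestion can be checked with a small script. This is a sketch under assumptions: it uses JavaScript regex semantics (AWB itself applies rules with .NET), the names originalRule, digitFirstRule and applyRule are made up for illustration, and the alternative after `|` is typed as a plain space, as it appears above. It does not reproduce the rule that was finally saved, which is not quoted in this thread.

```javascript
// Both patterns are copied from the discussion above. JavaScript regex
// semantics are an assumption here; AWB itself runs rules on .NET.
const originalRule = /([\d\.]+(?:[−―–—\s]| )?[µmkMGT])w\b/g;
const digitFirstRule = /(\d[\d\.]*(?:[−―–—\s]| )?[µmkMGT])w\b/g;

// Apply a rule the way the replace="$1W" attribute describes.
function applyRule(rule, text) {
  return text.replace(rule, "$1W");
}

// Sample strings taken from the thread.
const samples = [
  ".mw-first-heading",
  "http://www.registrar.mw/",
  "5 mw",
  ".1mw",
  "source123.mw",
];

for (const text of samples) {
  console.log(
    JSON.stringify(text),
    "| original:", applyRule(originalRule, text),
    "| digit first:", applyRule(digitFirstRule, text)
  );
}
```

Under these assumptions the digit-first pattern leaves ".mw-first-heading" and the Malawi URL alone while still fixing "5 mw" and ".1mw". It still rewrites "source123.mw", which is consistent with Certes's remark that requiring the digit after the period also avoids such domains.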
Olso
I saw this edit correcting a typo of Oslo. I had run AWB with RegEx typo fixing on that page just before, so it would have been fixed earlier if a rule for it existed. It made me think it might be worth adding, if someone familiar with the process would like to. Cheers! --TylerBurden (talk) 12:50, 21 June 2022 (UTC)
- I fixed about 30 Olso→Oslo typos in April. There are a few dozen false positives, including some typos for also, Olsen, etc. and the usual verbatim quotes of mistyped sources, so I didn't create a rule, but it might be useful if applied carefully. Certes (talk) 14:20, 21 June 2022 (UTC)
MOS:CURLY
@Trebuchette: To what extent have these new rules been tested? After a quick check, using User:John of Reading/X3, I don't think they work in AWB itself, because AWB automatically protects quoted text from typo-fixing. And the "CURLY SINGLE QUOTES" rule could cause formatting damage if a curly quote is placed next to a straight quote, as the resultant double-straight-quote will trigger italic markup. -- John of Reading (talk) 07:04, 6 July 2022 (UTC)
- Also beware of converting italic to bold, as in
Spielberg wrote Amblin´
. Certes (talk) 11:07, 6 July 2022 (UTC)
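The formatting risk John of Reading describes can be illustrated with a short JavaScript sketch. The function straightenSingleQuotes is a hypothetical stand-in for a curly-to-straight replacement, not one of the rules under discussion, and the sample sentence is made up.

```javascript
// Hypothetical stand-in for a curly-to-straight rule; the real rules
// discussed above are not reproduced here.
function straightenSingleQuotes(text) {
  return text.replace(/[‘’]/g, "'");
}

// In wikitext, two adjacent straight apostrophes start or end italics.
function hasItalicMarkup(text) {
  return text.includes("''");
}

// Made-up sample: a curly quote sits directly next to a straight one.
const before = "He said ‘'hello'’ twice";
const after = straightenSingleQuotes(before);

console.log(before, "->", after);
console.log("italic markup before:", hasItalicMarkup(before));
console.log("italic markup after:", hasItalicMarkup(after));
```

In this sample the replacement produces a pair of straight apostrophes on each side of "hello", so wikitext that contained no italic markup before the edit would render that word in italics afterwards.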