Wikipedia talk:AutoWikiBrowser/Typos

This is an old revision of this page, as edited by BillFlis (talk | contribs) at 20:51, 9 March 2010 (New or Fix existing typos). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.



"whoom" -> "whom"

Could someone please add that rule? Thanks. --bender235 (talk) 16:21, 23 January 2010 (UTC)Reply

Testing: \b([Ww])hoo+m\b => $1hom right now. I'll see if there are problematic false positives. Shadowjams (talk) 00:01, 25 January 2010 (UTC)Reply
That expression works, but it's not a common typo. Scanning the November database dump I only find that misspelling used in 4 articles, 2 of which are intentional, and 2 of which I corrected. The two misspellings were added by one editor. I'm going to hold off adding it. Shadowjams (talk) 01:55, 25 January 2010 (UTC)Reply
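For anyone who wants to repeat the test outside AWB, here is a minimal sketch of the rule above in Python. The regex is the one being tested; the function names and the idea of feeding it a dictionary of article texts are assumptions for illustration, not part of AWB or of the database scan described above.

```python
import re

# Python spelling of the rule under test: \b([Ww])hoo+m\b => $1hom
WHOOM = re.compile(r"\b([Ww])hoo+m\b")


def fix_whoom(text):
    """Replace 'whoom', 'whooom', ... with 'whom', keeping the first letter's case."""
    return WHOOM.sub(r"\1hom", text)


def count_hits(articles):
    """Given {title: wikitext} (hypothetical input), return {title: match count}
    for every article with at least one match, so hits can be reviewed by hand."""
    hits = {}
    for title, text in articles.items():
        n = len(WHOOM.findall(text))
        if n:
            hits[title] = n
    return hits
```

Counting matches per article before adding a rule makes it easier to spot intentional uses, which is what the scan above turned up.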

"intitled" -> "entitled"

Please add that one. Found it here, but AWB did not detect it (I changed it manually). --bender235 (talk) 22:03, 27 January 2010 (UTC)Reply

Make sure not to correct "intitle"; it is used in query strings for Google Books URLs, e.g. Populares. Paradoctor (talk) 22:34, 27 January 2010 (UTC)Reply
Do you have any indication that there is another rule that generally handles this, but didn't in this case? I can't find (in a very quick search, admittedly) a rule that would have matched this. I'll work on a new one, but if there's an old one that should have gotten it, knowing that would be very helpful. Shadowjams (talk) 08:26, 29 January 2010 (UTC)Reply
Ok, this should work. I don't want to add it in quite yet because I haven't tested it very much, but feel free to add it to your add/replaces, and if you don't see any problems then go ahead and add it to the typo list.
I'm not 100% sure that "intitled" is a typo; the dictionary references I looked up were a little unclear. But I don't think it's a problem edit either. In most cases "entitled" is going to be more right than "intitled", although I wonder if there are cases where "intitled" is correct. I'm not sure.
The other downside: I can't offhand think of a way to keep the case correct while transforming letters, so you'll need two rules, one for "Intitled" and one for "intitled". Just change the first letters, respectively. This one should also catch a simple transposition or deletion in the middle (the most likely typo).

Find: \binti[tl]{1,2}ed\b
Replace: entitled

Let me know how it works out. I'm using it on my own personal set at the moment. Shadowjams (talk) 08:52, 29 January 2010 (UTC)Reply
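As a rough illustration of the case problem mentioned above, here is a Python sketch that handles both "Intitled" and "intitled" in one pass. It relies on a replacement function, which the thread does not say the typo list supports, so the two-rule approach may still be what is needed there. The function name is made up for the example.

```python
import re

# Same body as the Find pattern above, with the first letter captured
# so its case can be carried over to the replacement.
INTITLED = re.compile(r"\b([Ii])nti[tl]{1,2}ed\b")


def fix_intitled(text):
    """Replace 'intitled' (and close misspellings) with 'entitled', preserving case."""
    def repl(match):
        first = "E" if match.group(1) == "I" else "e"
        return first + "ntitled"
    return INTITLED.sub(repl, text)
```

Because the pattern requires the trailing "ed", a bare "intitle" such as the Google Books query parameter is left alone. Note that this sketch does nothing to protect quotations.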
I'm finding a lot of English language quotes, particularly in legal opinions, from the 1800s and before use "intitled". Perhaps we need to make sure any edit doesn't change a quote. Shadowjams (talk) 08:59, 29 January 2010 (UTC)Reply
Don't know why I missed this, but my Merriam Webster lists "intitle" as an archaic version of "entitle". Paradoctor (talk) 09:08, 29 January 2010 (UTC)Reply
There are ways to exclude quoted statements like this, but all of them that I'm coming up with right now are pretty processor intensive. There might be a way to creatively limit this, at some expense of type 2 errors, that is less processor intensive. I might revisit it at another time. I would recommend against using the above regex unless you're extremely careful you're not changing a quote. Shadowjams (talk) 09:10, 29 January 2010 (UTC)Reply
Paradoctor - That is what I found, more or less as well. I don't think there's a problem converting modern text, but we certainly don't want to alter any quotes that use it. Because AWB uses the .net regex library there are some non-greedy expressions that aren't possible in most other regexes that might fix this nicely... but I'm concerned that most solutions will eat a lot of processing power. If some others have ideas I'd like any advice. Shadowjams (talk) 09:14, 29 January 2010 (UTC)Reply
AWB does not apply the typo fixing rules within templates e.g. {{cquote}} or within quote marks e.g. " and all the common variations. Rjwilmsi 09:38, 29 January 2010 (UTC)Reply
Oh, ok, so in a find-replace yes, but not within AWB/t? Shadowjams (talk) 09:42, 29 January 2010 (UTC)Reply
I'm not sure I understand your question. I'll explain my answer again in more detail in the hope it does answer your question: when AWB executes a typo rule from the WP:AWB/T list it first hides the quotes then applies the typo regexes, then unhides the quotes again. If you apply the regex by other means you will not get this quote hiding (unless you write a custom module to access the functions). Rjwilmsi 09:55, 29 January 2010 (UTC)Reply
Sorry for the confusion. That wasn't very clear. You understood what I meant though. I believe, in that case, that the above should fix what the OP was talking about. Of course, the question of whether or not the i version is appropriate in the modern context is still open, although I would assume not especially controversial. Shadowjams (talk) 10:14, 29 January 2010 (UTC)Reply
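For anyone applying these regexes by other means, the hide, apply, unhide sequence described above can be sketched roughly as follows in Python. This is not AWB's code: the definition of "quoted text" here (straight or curly double quotes on one line, and un-nested {{cquote|...}} templates), the placeholder scheme, and the function name are all simplifying assumptions.

```python
import re

# Assumed, simplified notion of text to protect: "..." or curly-quoted runs
# on a single line, plus {{cquote|...}} templates with no nested braces.
QUOTED = re.compile(r'"[^"\n]*"|“[^”\n]*”|\{\{cquote\|[^{}]*\}\}', re.IGNORECASE)


def apply_rules_outside_quotes(text, rules):
    """Apply (pattern, replacement) rules to text while leaving quoted spans untouched."""
    hidden = []

    def hide(match):
        # Swap each quoted span for a numbered placeholder the rules will not match.
        hidden.append(match.group(0))
        return "\x00%d\x00" % (len(hidden) - 1)

    masked = QUOTED.sub(hide, text)
    for pattern, replacement in rules:
        masked = re.sub(pattern, replacement, masked)
    # Put the original quoted spans back in place of their placeholders.
    return re.sub(r"\x00(\d+)\x00", lambda m: hidden[int(m.group(1))], masked)
```

With the "intitled" pattern from earlier in this thread, a sentence such as `The act, "intitled An Act", was intitled badly.` would keep the quoted spelling and change only the second occurrence. A rule that itself matches digits or NUL characters could damage the placeholders, so a more careful version would need a sturdier masking scheme.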

E.g.

The rule for “e.g.” (currently fourth among new additions) adds left bracket, for example “eg.” → “(e.g.”. This should be fixed by removing the bracket. Svick (talk) 04:08, 30 January 2010 (UTC)Reply

I originally put it there, and then its structure was changed, and then User:Marek69 disabled it, then made some changes and re-enabled it. The original one had a leading ( because the overwhelming majority of examples I found were at the beginning of parentheticals, which makes sense when you consider how people use the abbreviation. The rule is probably still adding the bracket because Marek removed the ( from the match without changing the corresponding output.
I had tested the first version and was reasonably confident it didn't have many (I never found any) false positives. I cannot say the same about this new version. I am going to revert it back to the earlier version with a note. If someone wants to test it and change it that's fine too, but I think we're seeing some problems with it right now. Shadowjams (talk) 22:38, 30 January 2010 (UTC)Reply
Another small question. Is E.g. ever proper in the Manual of style? (compared to e.g.). I don't know the answer, but wanted to bring it up. Shadowjams (talk) 22:41, 30 January 2010 (UTC)Reply
The last version didn't work again (changed “eg.” to “(e.g.”, but didn't change “(eg.”), so I disabled it. Before it is turned on again, please make sure it works as it should. Svick (talk) 23:36, 30 January 2010 (UTC)Reply
Looks fixed now. My mistake for not noticing that Marek's change was correct; the later simplification is what caused the problem.
If there are false positives without the (, then we'll need to note those here. Shadowjams (talk) 02:41, 31 January 2010 (UTC)Reply

"Discoverinig" -> "Discovering"

AWB accidentally replaced "Discoverinig" with "Discoverining" here, but it should be "Discovering", of course. --bender235 (talk) 23:31, 6 February 2010 (UTC)Reply

That appears to be a result of the "-ining" regex, which is (?!\b(?:(?:Br|Kl|M|H|St)e|Nar|Kurt|Lap)inig\b)\b(\w+)inig(s|ly)?\b. I don't see any systematic way to fix this class of typos without interfering with the others. In other words, "inig" that should be "ing" are virtually indistinguishable from "inig" that should be "ining". If someone has some way to distinguish the two that would be useful, but I can't think of one right now.
I also don't know which is more common, but that could be a useful exercise. Shadowjams (talk) 08:24, 7 February 2010 (UTC)Reply
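To make the ambiguity concrete, here is a small Python sketch using the "-ining" find pattern quoted above. The replacement `\1ining\2` is an assumption inferred from the reported "Discoverinig" → "Discoverining" edit; the actual replacement string is not quoted in this thread. The helper names are made up for the example.

```python
import re

# Find pattern as quoted above; the replacement below is inferred, not quoted.
INING = re.compile(
    r"(?!\b(?:(?:Br|Kl|M|H|St)e|Nar|Kurt|Lap)inig\b)\b(\w+)inig(s|ly)?\b"
)


def apply_ining(text):
    """Apply the assumed '-ining' fix: stem + 'ining' + optional suffix."""
    return INING.sub(r"\1ining\2", text)


def candidates(word):
    """Return both plausible corrections for a word ending in 'inig',
    as (stem + 'ing', stem + 'ining'), or None if the rule does not match."""
    match = INING.fullmatch(word)
    if match is None:
        return None
    stem, suffix = match.group(1), match.group(2) or ""
    return (stem + "ing" + suffix, stem + "ining" + suffix)
```

For "remainig" the second candidate ("remaining") is the right one, while for "Discoverinig" it is the first ("Discovering"). The pattern alone cannot tell these apart, so choosing between them would need something like a word list, which this sketch does not attempt.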

Fluorescent

Using the "-escent" rule, AWB changes "floresent" to "florescent". Although that is a valid word, the more likely intended word is "fluorescent". A wiki search for "fluorescent" produced 1042 articles, and "florescent" found 32 pages. For those 32, I fixed the incorrect usages, discovering that all except 3 were actually intended to be "fluorescent". MANdARAX • XAЯAbИAM 21:31, 9 February 2010 (UTC)Reply

I've expanded the "Fluoresce" rule and removed "|[Ff]lu?or" from the "-escent" rule. I excluded "florescent" and "florescence" from "fixing" as they are correctly spelled words; however, as noted above, they're extremely rare on Wikipedia and the "fluo..." word is almost always the intended one, so if anyone thinks it's better without the exclusion, feel free to remove it. MANdARAX • XAЯAbИAM 04:09, 21 February 2010 (UTC)Reply

New or Fix existing typos

I have come across a few typos whose rules either are not working or need to be added. They are listed below.

Workign -> Working

For some reason, AWB tried to replace "Workign" with "Wooking" here (I corrected it manually), but it should be "Working". --bender235 (talk) 20:10, 9 March 2010 (UTC)Reply
