Wikipedia talk:AutoWikiBrowser/Typos
Broken regex
I'm being told regex is broken and I don't know how to fix it. MBisanz talk 02:57, 29 February 2008 (UTC)
Looks like AWB replaces "manouvers" with "manoeuvers".
Isn't the latter one a misspelling too? Shouldn't it be "manoeuvres"? I would say "manouvers" should be replaced with "maneuvers" and "manoeuvers" with "manoeuvres" ... But I could be wrong, since English is a second language for me. TestPilot 11:02, 10 March 2008 (UTC)
in so far → insofar?
Looks like "in so far" is a legitimate spelling. Should we really replace it? TestPilot 14:05, 10 March 2008 (UTC)
- I can see a lot of false positives with that. Also [1]. Rocket000 (talk) 23:06, 15 March 2008 (UTC)
Imtrec Aviation -> Intrec Aviation
Can someone add this as an exception? Imtrec Aviation is a legitimate company. Should the "imtrec"->"intrec" rule be kept at all? TestPilot 15:02, 10 March 2008 (UTC)
- Looks like this one got fixed by User:BillFlis. Thanx. TestPilot 07:05, 11 March 2008 (UTC)
Imdadkhani
The script wants to replace the perfectly good "Imdadkhani" (28 pages in WP) with the nonexistent "Indadkhani" for some reason. TestPilot 16:24, 11 March 2008 (UTC)
Retuned
It is not a valid word - it is a misspelling of returned. TestPilot 17:57, 11 March 2008 (UTC)
- Re+tuned, see the usage[2]. MaxSem(Han shot first!) 18:03, 11 March 2008 (UTC)
- Oops. Yes. Correct, sorry. TestPilot 18:05, 11 March 2008 (UTC)
AutoCorrect database
I have created a page with a huge list of typo corrections from the AutoCorrect software. RegExTypoFix already covers lots of the entries, but far from all. The list itself was originally based on an old list of wiki typo corrections, and it was created by the AHK community. The easiest way to check it out in AWB is to create a list from "what links here" for the Zelavin article. Make sure you enable user space pages. Second, today I started work on my own utility for on-the-fly typo autocorrection. It is sort of working already, as I type :), and the good news is that it checks against 2200 regular expressions (everything that was on the AutoWikiBrowser/Typos page) in the blink of an eye, or even faster, on a relatively old computer. So it does look like we can expand the regex list something like tenfold without having to worry too much about performance. TestPilot 02:24, 13 March 2008 (UTC)
- I cleaned up the list and updated the typo page with the new rules. TestPilot 03:58, 14 March 2008 (UTC)
heavly → heavily
In this edit, it somehow used avly → avely. Can this be fixed? Thanks. — E talk 23:44, 20 March 2008 (UTC)
- Removed. MaxSem(Han shot first!) 13:03, 23 March 2008 (UTC)
Thru -> through
Given the number of legitimate uses (including in article titles - see Special:Prefixindex/Thru), should this be an automatic correction? Black Falcon (Talk) 22:41, 22 March 2008 (UTC)
- I think no, so I've removed it. Thanks Rjwilmsi (talk) 12:44, 23 March 2008 (UTC)
Inbhir -> Imbhir
I'm not sure which line is causing the change, but I think there are too many false positives associated with this change of "In" to "Im". Examples of articles on which this would cause errors include Ayr and Cullen. – Black Falcon (Talk) 16:49, 24 March 2008 (UTC)
- A similar issue takes place with replacement of "En" with "Em" (e.g. "Enman" -> "Emman", in the article William George Barker). Black Falcon (Talk) 21:36, 24 March 2008 (UTC)
The Ayr and Cullen articles were incorrectly tagged. I've fixed them [3] and [4]. Thanks Rjwilmsi (talk) 00:07, 30 March 2008 (UTC)
I've added an exception so 'Enman' isn't caught - [5]. Rjwilmsi (talk) 00:13, 30 March 2008 (UTC)
- Thanks. Black Falcon (Talk) 06:47, 6 April 2008 (UTC)
Vitaly → Vitally
I recently ran AWB on Category:Soviet actors (123 articles) and encountered three false positives (Boris Babochkin, Vasily Livanov, and Vitaly Solomin) with "Vitaly" → "Vitally". While "vitaly" is probably a common misspelling of "vitally" (and, thus, the fix for it is useful), the fix could cause errors in articles about Russian people. Since names are likely to be written in upper-case, is there any way to restrict the change to lower-case instances of "vitaly" only? If not, is there some other way to reduce the potential for false positives while preserving the typo fix? Black Falcon (Talk) 06:29, 6 April 2008 (UTC)
- Hopefully fixed. TestPilot 10:20, 16 April 2008 (UTC)
- It seems to be working: I just tried AWB on the articles that produced the false positives and was not prompted for any typo fixes. Thanks! Black Falcon (Talk) 16:49, 16 April 2008 (UTC)
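A minimal sketch of the lower-case-only restriction asked about above. It is written in Python rather than AWB's .NET regex engine, and the pattern and function name are hypothetical; this is not the rule that was actually added to the typo list.

```python
import re

# Hypothetical rule: match only the all-lowercase form, so the capitalised
# given name "Vitaly" is left alone. No re.IGNORECASE flag is used.
LOWERCASE_VITALY = re.compile(r"\bvitaly\b")

def fix_vitaly(text):
    """Replace lowercase 'vitaly' with 'vitally'; leave 'Vitaly' untouched."""
    return LOWERCASE_VITALY.sub("vitally", text)
```

The trade-off is that a sentence-initial misspelling ("Vitaly important ...") would no longer be caught, which seems an acceptable price for not damaging Russian names.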
Consitution > Constitution
Capitalization in Wikipedia DNS
Noticed it changed wikipedia to Wikipedia in Wikiquote in the dns addresses listed there. Convention dictates they remain lowercase. - Kaobear (talk) 15:05, 8 April 2008 (UTC)
- Yeah, I agree, strings "wikipedia.org", "wikipedia.com", "wiktionary.org" and "microsoft.com" should not be capitalized - too many false positives. But, unfortunately, I don't know how to fix that. TestPilot 10:34, 16 April 2008 (UTC)
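One possible way to skip domain names, sketched in Python under the assumption that the capitalisation rule is a plain word match. The pattern and function name are hypothetical, and AWB's actual rule may look quite different; lookarounds of this kind are also available in .NET regexes.

```python
import re

# Hypothetical rule: capitalise "wikipedia" only when it is not part of a
# domain name. (?<![\w.]) rejects a preceding word character or dot
# (e.g. "en.wikipedia"); (?!\.\w) rejects a following ".org", ".com", etc.
WIKIPEDIA_WORD = re.compile(r"(?<![\w.])wikipedia\b(?!\.\w)")

def capitalise_wikipedia(text):
    """Capitalise the standalone word, leaving DNS-style names lowercase."""
    return WIKIPEDIA_WORD.sub("Wikipedia", text)
```

A sentence-final "wikipedia." is still capitalised, because the lookahead only rejects a dot that is followed by another word character.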
esp. --> especially
I was thinking:
<Typo word="especially" find="\b(Esp|esp)\.([ \t])\b" replace="$1ecially$2" />
..but am open to corrections... Ling.Nut (talk) 19:29, 26 April 2008 (UTC)
- Most likely it will be encountered in quotations, where it shouldn't be changed. MaxSem(Han shot first!) 19:57, 26 April 2008 (UTC)
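To make the concern concrete, here is the proposed pattern tried out in Python. The pattern is copied unchanged; only the replacement syntax differs (Python writes `\1` where AWB writes `$1`), and the function name is made up for illustration.

```python
import re

# The proposed pattern, unchanged. AWB writes backreferences as $1 and $2;
# Python's re module writes them as \1 and \2.
ESP_RULE = re.compile(r"\b(Esp|esp)\.([ \t])\b")

def expand_esp(text):
    """Expand 'esp. ' to 'especially ' wherever the pattern matches."""
    return ESP_RULE.sub(r"\1ecially\2", text)
```

The rule does what it says on ordinary prose, but it also rewrites text inside quotation marks, for example `expand_esp('He wrote "esp. in winter".')`, which is exactly the case raised above.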
WP:MOS fixes, such as "no spaces around mdashes"
Is there a reason why AWB doesn't do the more mechanical WP:MOS fixes? Ling.Nut (talk) 09:32, 28 April 2008 (UTC)
- These types of fixes can be proposed at Wikipedia talk:AutoWikiBrowser/Feature requests. I'm not sure whether a spacing fix could (or should) be incorporated into this page... Black Falcon (Talk) 20:15, 28 April 2008 (UTC)
- Thanks! Ling.Nut (talk) 02:12, 29 April 2008 (UTC)
Fix may be needed
While I was adding a nav box and also applying the general and typo fixes, AWB changed the spelling of the word "Succeeded" from "succedded" to "succeededd" in the preview. But when I checked the diff after saving, it was "Succeeded". Can someone look at this problem? --SMS Talk 16:57, 28 April 2008 (UTC)
- BillFlis has corrected this [6]. Thanks Rjwilmsi (talk) 23:44, 30 April 2008 (UTC)
Souffle
It seems to change Souffle into Souffléouffl. Interesting word, but not strictly a correction... -- 20.133.0.13 (talk) 09:40, 29 April 2008 (UTC)
- Thanks, this has already been corrected [7] by BillFlis. Thanks Rjwilmsi (talk) 23:42, 30 April 2008 (UTC)
Entries to move hyphens to en dashes
Per WP:DASH, I'd like to add some entries here that will convert hyphens to en dashes. This is a bit of a departure, though, so I wanted to discuss it first. I've tested these extensively and have not encountered any false positives (I have others that do have a lot of false positives, but I'm not adding them here).
<Typo word="en dash in page ranges" find="(pages\ ?=\ ?|pp\.?\ )([0-9]+)-([0-9]+)" replace="$1$2–$3" />
<Typo word="en dash in date ranges" find="(\[?\[?(January|February|March|April|May|June|July|August|September|October|November|December)\ [1-3]?[0-9]\]?\]?,\ \[?\[?[1-2][0-9][0-9][0-9]\]?\]?)\ ?-\ ?(\[?\[?(January|February|March|April|May|June|July|August|September|October|November|December)\ [1-3]?[0-9]\]?\]?,\ \[?\[?[1-2][0-9][0-9][0-9]\]?\]?)" replace="$1–$3" />
<Typo word="en dash in money ranges" find="(\$[1-9]?[0-9]?[0-9]?[0-9])\ ?-\ ?(\$?[1-9]?[0-9]?[0-9]?[0-9])" replace="$1–$2" />
<Typo word="en dash in measurement ranges" find="([1-9]?[0-9])\ ?-\ ?([1-9]?[0-9])(\ |\ )(years|months|weeks|days|hours|minutes|seconds|kg|mg|kb|km|GHz|Hz|kHz|miles|mi\.|%|MPH|mph)\b" replace="$1–$2$3$4" />
<Typo word="en dash in time ranges" find="([0-1]?[0-9]:[0-5][0-9]\ ?([AaPp][Mm])?)\ ?-\ ?([0-1]?[0-9]:[0-5][0-9]\ ?([AaPp][Mm])?)" replace="$1–$3" />
<Typo word="en dash in age ranges" find="([Aa]ge[sd])\ ([1-9]?[0-9])\ ?-\ ?([1-9]?[0-9])" replace="$1 $2–$3" />
So let me know what you think...—Chowbok ☠ 17:29, 6 May 2008 (UTC)
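For anyone who wants to try the entries outside AWB, here is a rough Python sketch of two of them (page ranges and age ranges). The `find` patterns are copied as written; the function name and the `dash` parameter are my own additions for illustration, so the same code can insert either the literal character or the HTML entity.

```python
import re

EN_DASH = "\u2013"  # the literal en dash character

# Patterns copied from the "page ranges" and "age ranges" entries.
PAGE_RANGE = re.compile(r"(pages\ ?=\ ?|pp\.?\ )([0-9]+)-([0-9]+)")
AGE_RANGE = re.compile(r"([Aa]ge[sd])\ ([1-9]?[0-9])\ ?-\ ?([1-9]?[0-9])")

def fix_ranges(text, dash=EN_DASH):
    """Replace the hyphen in page and age ranges with the given dash string."""
    # Callables are used instead of replacement templates so that any dash
    # string (a character or an entity such as "&ndash;") is inserted verbatim.
    text = PAGE_RANGE.sub(lambda m: m.group(1) + m.group(2) + dash + m.group(3), text)
    text = AGE_RANGE.sub(lambda m: m.group(1) + " " + m.group(2) + dash + m.group(3), text)
    return text
```

Ordinary hyphenated words such as "well-known" are not touched, since both patterns require the "pp."/"pages=" or "ages"/"aged" context.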
- Since Wikipedia is now UTF-8–compatible, why don't you replace the hyphens with the single en-dash character "–", rather than the lame old HTML entity "&ndash;", which takes up seven times the space?--BillFlis (talk) 17:40, 6 May 2008 (UTC)
- Because the edit box is (for most people) in a monospaced font, which makes it impossible to tell the difference between a hyphen, an en dash, and an em dash. You'll also note that the dash characters are not converted to UTF-8 automatically by AWB, for the same reason.—Chowbok ☠ 17:45, 6 May 2008 (UTC)
- I'm with BillFlis in preferring that the single character be used rather than the html entity. If AWB can only support the html entity, I'd rather not see this implemented. older ≠ wiser 18:05, 6 May 2008 (UTC)
- Sigh. Did you read what I just wrote? At least try to address my point...—Chowbok ☠ 18:23, 6 May 2008 (UTC)
- I don't really see why the monospaced font display is an issue. I venture that most editors could care less about the difference and we shouldn't be unnecessarily filling the edit screen with techno-jargon. If AWB is unable to make the distinction, I don't think we should be using AWB to implement such a "solution". older ≠ wiser 18:30, 6 May 2008 (UTC)
- AWB is capable of putting in the UTF-8 character, I'm not sure how you got that it isn't. Anyway, the monospaced font is very much an issue, and editors that know the difference between the dashes absolutely need to be able to see which has been implemented. It's ridiculous to say that it's not a big deal that commonly-confused characters look identical in the edit box.—Chowbok ☠ 18:34, 6 May 2008 (UTC)
- A bad assumption perhaps, because it is rather inconceivable why anyone would want to clutter the articles up with html entities when there is a perfectly good UTF character available. If it is so very important for editors to be able to distinguish them, then why does the MOS make no mention whatsoever of the distinction, let alone indicate any sort of preference? Now that you indicate AWB is capable of inserting the UTF character, I very very strongly oppose having it insert the kludgy html entity. older ≠ wiser 19:02, 6 May 2008 (UTC)
- I don't see why it's "inconceivable" when I've explained it several times now. The reason is that editors need to be able to see if something is a hyphen, en dash, or em dash when editing an article. The advantage of doing it this way is that it allows that. The disadvantage is that you think HTML entities are ugly. Sorry, I'm not convinced that's the better argument.—Chowbok ☠ 19:22, 6 May 2008 (UTC)
- Well, if as you say, it is so important to see the distinction, then why is the MOS and other editing guidelines silent on this point? If it is simply a matter of your preference vs. mine, that is certainly something that should be more widely discussed before encoding it into AWB. older ≠ wiser 19:41, 6 May 2008 (UTC)
- Do keep in mind that, as I said, AWB already does not move &ndash; to – when fixing Unicode. So if we're discussing this, we need to discuss them removing that exception as well. Also, please see below for my question.—Chowbok ☠ 18:45, 7 May 2008 (UTC)
This certainly seems like a worthwhile fix for AWB to do, but I think it would be better as an AWB general fix so it's available to all AWB users, not just those doing typo fixing. Therefore I suggest you post it at Wikipedia talk:AutoWikiBrowser/Feature requests. Thanks Rjwilmsi (talk) 17:52, 6 May 2008 (UTC)
- Will do, thanks.—Chowbok ☠ 18:23, 6 May 2008 (UTC)
Just beneath the edit window are all these special characters, which an editor can simply click on to insert. Guess what's the very first one? An en-dash character. The second is an em-dash. If we're not supposed to use them, then why are they there?--BillFlis (talk) 22:28, 6 May 2008 (UTC)
- I'm not saying it's a policy to use the entities, just good practice. Let me ask you and Bkonrad a question. Suppose I'm editing a page by hand, and I see 1941—1945 in the edit box. What should I do to quickly determine if the correct dash is being used?—Chowbok ☠ 18:42, 7 May 2008 (UTC)
- Hmm, well just eyeballing it in my edit window it looks to me like an endash. And confirmed by using Firefox's search function. older ≠ wiser 19:29, 7 May 2008 (UTC)
- Put a hyphen, an em dash, and an en dash in an edit window. Assuming you're using a monospaced font, I guarantee two of those will be identical.—Chowbok ☠ 22:21, 7 May 2008 (UTC)
- Yep, I did that. I did nothing special to configure Firefox. The difference between them was pretty easy to spot. older ≠ wiser 00:36, 8 May 2008 (UTC)
- Well, I don't know what font you're using, but in Courier, these look the same:
- –
- —
- —Chowbok ☠ 03:01, 8 May 2008 (UTC)
- Hmm, I misspoke. First, in response to your example of 1941—1945, I said it looked like an endash in the edit window, but on more careful examination it is an mdash. Second, a regular hyphen and an ndash do appear identical in the edit window (immediately above, you show an endash and an mdash, which are clearly different, even to my not particularly acute vision). But the Firefox search function does find the correct characters. In any case, you have not responded to my query: if it is so important for editors to be able to make this distinction (based on using the html entities), why is no mention made of it in the MOS or other editing guidelines? older ≠ wiser 12:23, 8 May 2008 (UTC)
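As an aside on the "how do I quickly tell which dash it is" question: when the font makes the characters look alike, one option is to ask for their Unicode names. A small Python sketch, with a made-up function name, that only looks at the hyphen-minus, en dash and em dash:

```python
import unicodedata

def describe_dashes(text):
    """Return the Unicode name of each hyphen or dash character in text."""
    dash_like = {"-", "\u2013", "\u2014"}  # hyphen-minus, en dash, em dash
    return [unicodedata.name(ch) for ch in text if ch in dash_like]
```

Pasting the string from the edit box into `describe_dashes(...)` gives an unambiguous answer regardless of how the edit window renders it.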
suggesting a change
Here, AWB changed "reciding" to "resideing". Using the link to Dictionary.com on User:Mboverload/RegExTypoFix/rejectedwords, the word "resideing" is not a real word. I suggest that the typo fix for "reciding" be changed to "residing".--Rockfang (talk) 12:33, 7 May 2008 (UTC)
- Fixed.--BillFlis (talk) 13:18, 7 May 2008 (UTC)
- Thanks.--Rockfang (talk) 13:51, 7 May 2008 (UTC)
Of fornames and feilds
It's currently changing "forname" to "oref" and "feilding" to "field$S" which, while both interesting words, are probably not correct; can someone who understands these things fix it? — iridescent 20:57, 8 May 2008 (UTC)
- I fixed the "feilding" one. Someone else is going to have to tackle the other one.--BillFlis (talk) 23:01, 8 May 2008 (UTC)