Wikipedia talk:AutoWikiBrowser/Typos
Misspellings to be added
Should new misspellings go here or in the "Misspellings to be Added" section of the main project page? Regardless, here's about 90 that I've amassed. I'd add them myself, but some of those regexes are pretty complex and scare me. I've verified that all these aren't acceptable by dictionary.com and that there are at least 10 instances of each in Wikipedia. False positives haven't been checked for, however. And there are probably prefixes/suffixes that can be added to most of them.
(Can someone please add some of these? --Thiseye 07:02, 2 March 2007 (UTC))
- jeapordy → jeopardy
- likley → likely
- liqour → liquor
- literaly → literally
- minsitry → ministry
- mountian → mountain
- newstands → newsstands
- nobilty → nobility
- oppenent → opponent
- orginial → original
- personna → persona
- editted → edited
- posibility → possibility
- precip(a|ia)tion → precipitation
- prepatory → preparatory
- pricipal → principal
- recruting → recruiting
- reliquish → relinquish
- reminicent → reminiscent
- replacment → replacement
- responed → responded
- sectretary → secretary
- signiture → signature
- similarily → similarly
- similiar → similar
- unsheath → unsheathe
- valiently → valiantly
- wherupon → whereupon
- wheter → whether
- protray → portray
- protrayed → portrayed
Questioned
- widly → widely
- Might be a typo for "wildly" instead of "widely" -- JHunterJ 11:27, 13 April 2007 (UTC)
- intitution → institution
- Might be a typo for "intuition" instead of "institution" -- JHunterJ 16:39, 22 June 2007 (UTC)
Reliable sources
Is dictionary.com a reliable source?--Andeh 06:04, 11 August 2006 (UTC)
- Nope. See here. alphaChimp laudare 06:19, 11 August 2006 (UTC)
- OK, what about Microsoft Word 2000's or higher dictionary?--Andeh 06:25, 11 August 2006 (UTC)
This looks like a good source for misspellings: http://www.misspelled.com/common/a.htm --BillFlis 10:45, 27 August 2006 (UTC)
Full stops, commas, colons, brackets and double spaces
I have felt that the following mistakes are too common (especially in stubs) to ignore:
- c denotes any alphanumeric character
- s denotes a space character
Mistake | Correction | Suggested code |
---|---|---|
c.c | c.sc | <Typo find="\b(a-zA-Z).(a-zA-Z)\b" replace="$1. $2" /> |
cs.c | c.sc | <Typo find="\b(a-zA-Z) .(a-zA-Z)\b" replace="$1. $2" /> |
cs.sc | c.sc | <Typo find="\b(a-zA-Z) . (a-zA-Z)\b" replace="$1. $2" /> |
c,c | c,sc | <Typo find="\b(a-zA-Z),(a-zA-Z)\b" replace="$1, $2" /> |
cs,c | c,sc | <Typo find="\b(a-zA-Z) ,(a-zA-Z)\b" replace="$1, $2" /> |
cs,sc | c,sc | <Typo find="\b(a-zA-Z) , (a-zA-Z)\b" replace="$1, $2" /> |
c;c | c;sc | <Typo find="\b(a-zA-Z);(a-zA-Z)\b" replace="$1; $2" /> |
cs;c | c;sc | <Typo find="\b(a-zA-Z) ;(a-zA-Z)\b" replace="$1; $2" /> |
cs;sc | c;sc | <Typo find="\b(a-zA-Z) ; (a-zA-Z)\b" replace="$1; $2" /> |
c(c | cs(c | And so forth |
c(sc | cs(c | And so forth |
cs(sc | cs(c | And so forth |
c)c | c)sc | And so forth |
cs)c | c)sc | And so forth |
cs)sc | c)sc | And so forth |
ss | s | And so forth |
Note: Suggested code is based on my preliminary understanding of the pattern of the working code at Wikipedia:AutoWikiBrowser/Typos, and I am very sure it is wrong and needs to be corrected.
Szhaider 15:39, 9 October 2006 (UTC)
- These are indeed common mistakes, but unfortunately, in my experience there are too many legitimate exceptions, such as ".NET". The other mistakes may not have so many exceptions, though. Martin 16:16, 9 October 2006 (UTC)
- Yeah, and what about U.S.A.? Or T.S. Eliot? Also, semi-colon is part of many HTML entities, like "&mdash;" etc., which will butt right up against letters.--BillFlis 02:11, 10 October 2006 (UTC)
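The objections in this thread can be checked with a small sketch. It assumes a corrected form of the suggested rules (a letter class in square brackets, an escaped punctuation mark, no word boundaries) and uses Python's `re` rather than AWB's .NET engine, so `$1` becomes `\1`. The helper name `spacing_rule` is invented for illustration.

```python
import re

def spacing_rule(punct: str, space_before: bool = False):
    """Build a 'letter, punctuation, letter' rule that puts one space after punct.

    Hypothetical helper: an assumed, corrected form of the table's suggestions.
    """
    gap = " " if space_before else ""
    pattern = re.compile(r"([a-zA-Z])" + gap + re.escape(punct) + r"([a-zA-Z])")
    return lambda text: pattern.sub(r"\1" + punct + r" \2", text)

fix_stop = spacing_rule(".")                            # c.c  -> c. c
fix_spaced_stop = spacing_rule(".", space_before=True)  # c .c -> c. c
fix_semicolon = spacing_rule(";")                       # c;c  -> c; c

samples = [
    (fix_stop, "It ended.Then it began."),   # the intended fix
    (fix_stop, "T.S. Eliot"),                # initials get split
    (fix_stop, "the U.S.A."),                # abbreviation gets split
    (fix_spaced_stop, "the .NET framework"), # ".NET" is mangled
    (fix_semicolon, "word&mdash;word"),      # HTML entity followed by a letter
]
for rule, text in samples:
    print(repr(text), "->", repr(rule(text)))
```

Only the first sample is an improvement; the others are the false positives raised in the replies.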
facilitate
The new entry for facilitate is not correct. It's changing facilitate to facilitatli. I think it should have $3 instead of $2. --Thiseye 00:44, 1 March 2007 (UTC)
- Thanks for reporting; fixed. -- intgr 00:47, 1 March 2007 (UTC)
secretarty -> secretary
found in Marita Ulvskog. Jobjörn (Talk ° contribs) 01:21, 8 March 2007 (UTC)
- Added to existing "Secretary" entry.--BillFlis 22:33, 8 March 2007 (UTC)
RETF oddities
I noticed something strange that could be a bug in AWB. I've noticed in several articles that if a typo is in wiki tags [[]], then RETF will not catch this. I assumed this was because it's not excluding the brackets as part of the word so it wasn't matching the regex. But then I noticed in the Akshay Pratap Singh article, that the FAR does catch typos within wiki tags. In this article, "politican" is misspelled. I had a FAR entry to correct this which I recently added to RETF. However, I noticed when I disabled the FAR entry, it would no longer be corrected. I updated the FAR regex to exactly that of the RETF regex, and still FAR would correct it, but RETF would not. --Thiseye 22:43, 11 March 2007 (UTC)
- I believe this has been discussed a few times over on the AWB talk pages, it has been setup like this purposely. There are reasons for doing it both ways, and i think we are looking into having it check more... Post it on the AWB talk page... Reedy Boy 17:55, 12 March 2007 (UTC)
Not sure if anyone will see this...
I was wondering if the AWB could include the often misused words "reoccur", "reoccured", and "reoccuring". These are not actual words (contrary to popular assumption)! They should all be changed to "recur", "recurred", and "recurring". Mahalo. --Ali'i 20:44, 13 March 2007 (UTC)
- Oops, they already are included:
<Typo word="(Re(o)c/Re)currence" find="\b([Rr]eoc|[Oo]c|Re)curran(ces?|t|tly)\b" replace="$1curren$2" />
<Typo word="Recurr(ed/ing)" find="\b(R|r)ec(?:cur?|u)r(ed|ing|ent|ently)\b" replace="$1ecurr$2" />
- Sorry about that. Thanks anyway. --Ali'i 20:47, 13 March 2007 (UTC)
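For anyone wanting to check what the two quoted rules actually cover, here is a minimal sketch using Python's `re` (an assumption: AWB uses the .NET engine, but these patterns use only constructs common to both, with `$1` written as `\1`). The helper name `apply_rules` is invented for illustration.

```python
import re

# The two rules quoted above, with AWB's $1/$2 written as Python's \1/\2.
RULES = [
    (r"\b([Rr]eoc|[Oo]c|Re)curran(ces?|t|tly)\b", r"\1curren\2"),
    (r"\b(R|r)ec(?:cur?|u)r(ed|ing|ent|ently)\b", r"\1ecurr\2"),
]

def apply_rules(word: str) -> str:
    """Apply each rule in list order and return the result."""
    for find, replace in RULES:
        word = re.sub(find, replace, word)
    return word

for word in ["occurrance", "recured", "reccuring", "reoccured", "reoccuring"]:
    print(word, "->", apply_rules(word))
```

Under this reading, the rules fix spellings such as "occurrance", "recured" and "reccuring", while "reoccured" and "reoccuring" pass through unchanged, so it may be worth re-checking whether the original request is really covered.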
Includeing -> Including
As above, suggest replacing includeing with including. Harryboyles 05:59, 17 March 2007 (UTC)
Asian needs to be updated
There is a misspelling in Kai Chen, "asain"; the current rule accounts for "aisian"....
Dependant vs. Dependent
It appears that "dependant" is acceptable in British English, esp. as a noun. If people concur, it should be removed from the typo list IMHO. —Wknight94 (talk) 15:21, 23 March 2007 (UTC)
- It's not just British. An American dictionary http://www.m-w.com/dictionary/dependant lists it too.--BillFlis 18:10, 23 March 2007 (UTC)
- So it should be removed, no? —Wknight94 (talk) 14:05, 24 March 2007 (UTC)
- It definitely needs to be removed. As a noun a dependant is a person looked after by another e.g. a father's dependants are his children (sorry for the approximate definition). Dependant may well be incorrectly used e.g. 'dependant on the weather ...' but can't be fixed this way. Rjwilmsi 19:19, 26 March 2007 (UTC)
- I removed it shortly after my last message. —Wknight94 (talk) 21:21, 26 March 2007 (UTC)
Regex/CPU question
I know that we want to reduce the number of regexes to reduce the amount of CPU time used to process them all. I'm assuming this means that there is little to no CPU cost associated with adding a variant to an existing regex compared to adding a completely new entry. Should we avoid adding variants to an existing regex that don't occur too often, or does that matter?
Also, it seems we avoid "catching" the correct spelling within the regex. Is that the standard we should go by? And to what extent should we go to avoid that situation? I've seen some regexes that do catch the correct spelling, so should I try to rework these, or is this sometimes acceptable ("available" is an example). Further, should we avoid trying to catch certain variants of typos to avoid catching the correct spelling? Should we avoid adding a new entry to try to catch a variant to avoid catching the correct spelling ("Vancouver" is an example)? --Thiseye 18:28, 25 March 2007 (UTC)
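One way to make the "does this rule catch the correct spelling?" question concrete is a small check like the sketch below. It is only an illustration in Python's `re`: the function name is invented, and the "available" pattern is a hypothetical stand-in, not the entry actually in the typo list.

```python
import re

def catches_correct_spelling(find: str, correct: str) -> bool:
    """Return True if a rule's find pattern also matches the correct word."""
    return re.search(find, correct) is not None

# Rule quoted elsewhere on this page: matches lower-case "easter" only.
easter_find = r"\beaster\b"

# Hypothetical rule, invented for illustration (not the real "available" entry).
available_find = r"\b([Aa])vail[ai]?ble\b"
available_replace = r"\1vailable"

print(catches_correct_spelling(easter_find, "Easter"))
print(catches_correct_spelling(available_find, "available"))
# A rule that catches the correct spelling rewrites it to itself.
print(re.sub(available_find, available_replace, "available"))
```

In the second case the replacement is a no-op on the correct word, so the text is not damaged, although the rule still fires on a word that needed no change.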
Combining regexes that catch missing "e" before "ly" suffix
I wanted to get some others' thoughts on combining several regexes (and incorporating some new ones). The thing is that if we want to add other variants to these, we'd probably want to separate them out again.
<Typo word="(Accurate/Active/Affectionate/Alternate/Appropriate/(Ab/Re)solute/Collective/Consecutive/Desperate/Exclusive/Extensive/False/Large/Separate/Severe)ly" find="\b((A|a)(ccurat|ctiv|ffectionat|lternat|ppropriat)|([Aa]b|[Rr]e)solut|(C|c)o(llec|nsecu)tiv|(D|d)esperat|(E|e)x(clu|ten)siv|(F|f)als|(L|l)arg|(S|s)e(parat|ver))ly\b" replace="$1ely" />
--Thiseye 00:01, 26 March 2007 (UTC)
- I think this is a good idea, I have been using some regexes like this personally and they can work pretty well. Gaius Cornelius 00:05, 26 March 2007 (UTC)
- Good idea, but I have a suggestion. No English words end in "ivly" or "avly". This:
<Typo word="-(a/i)vely" find="(a|i)vly\b" replace="$1vely" />
- catches your "-ively" words and over a thousand more. I went ahead and added this and a few others under New Additions; I'll let them cook for a while to see if any unforeseen problems arise before deleting any existing entries.--BillFlis 10:29, 26 March 2007 (UTC)
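The two approaches in this thread can be compared side by side. This is a sketch in Python's `re` (AWB itself uses .NET regexes, so `$1` is written `\1` here); the patterns are copied from the rules quoted above, and the constant names are invented.

```python
import re

# Combined rule proposed at the top of the thread (split across lines only for width).
COMBINED = re.compile(
    r"\b((A|a)(ccurat|ctiv|ffectionat|lternat|ppropriat)|([Aa]b|[Rr]e)solut"
    r"|(C|c)o(llec|nsecu)tiv|(D|d)esperat|(E|e)x(clu|ten)siv|(F|f)als"
    r"|(L|l)arg|(S|s)e(parat|ver))ly\b"
)
# General "-(a/i)vely" rule from the reply.
GENERAL = re.compile(r"(a|i)vly\b")

for word in ["accuratly", "activly", "exclusivly", "largly", "bravly"]:
    print(
        word,
        "| combined:", COMBINED.sub(r"\1ely", word),
        "| general:", GENERAL.sub(r"\1vely", word),
    )
```

The general rule picks up "-ivly" and "-avly" words that are not in the combined list (such as "bravly") but leaves "accuratly" and "largly" alone, so the two rules overlap rather than one replacing the other.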
'infinate' fixed to 'infinit'
The typo correction ((In)De/In/Af)Finite fixes 'infinate' to 'infinit'. I'm not competent enough with regex to fix it. Rjwilmsi 19:16, 26 March 2007 (UTC)
- Fixed, but I had to take out the case of "infinity".--BillFlis 19:33, 26 March 2007 (UTC)
- Thanks. And another: ballon can't be corrected to balloon as 'ballon' exists in French and is quoted e.g. Ballon D'or in the Roberto Baggio article.
- That sounds questionable since this is the English Wikipedia. That's one that would need to be rejected manually by the WP:AWB user but shouldn't be removed from the typo list. (My opinion anyway). —Wknight94 (talk) 21:21, 26 March 2007 (UTC)
- Yes, but if you search for "ballon", you get not just Ballon D'Or but a host of articles with that word in the title. On the other hand, we could certainly keep the corrections of "balloning", "ballonist", etc. On the third hand, there aren't a lot of these errors.--BillFlis 10:24, 28 March 2007 (UTC)
'responsable(s)' fix needs to be removed
Responsable(s) exists in French so needs to be removed from the "(Ir)Responsible" correction. Rjwilmsi 20:27, 27 March 2007 (UTC)
tPA is corrected to TPa but it's correct in articles such as Serpin. Rjwilmsi 20:37, 27 March 2007 (UTC)
- Sorry to push back again (as I did above) but this is the English Wikipedia. Shouldn't French words be occurring very very rarely? To me, that's better to cover as an exception by the WP:AWB user (which is what this list is for). —Wknight94 (talk) 22:03, 27 March 2007 (UTC)
- While I tend to agree, the RETF project page does state that the "lofty goal of RETF is to be completely automatic. That is, 100% accuracy." So something's got to give. We can't really have it both ways. I have a couple of ideas that I'm going to propose soon to alleviate this. --Thiseye 04:27, 28 March 2007 (UTC)
- From that goal, anytime someone runs across any change in WP:AWB that they need to roll back, they should remove it from the list, right? I'll do that then. Thanks. —Wknight94 (talk) 11:21, 28 March 2007 (UTC)
For phrases in a language other than English, use {{lang}} for the phrase, for example {{lang|fr|Responsable}}, where the second parameter is the ISO 639 code. It stops AWB changing the text, but I'm not sure about WikEd (if not, it probably should). mattbr 10:53, 28 March 2007 (UTC)
- Thanks. That's a really useful tip I didn't know about. I'll probably go through and tag all French 'responsable's like that. Rjwilmsi 17:25, 28 March 2007 (UTC)
Typica
Typica exists (in English!) but is corrected to Typical. Wasn't sure how to fix the regex myself. Rjwilmsi 07:03, 28 March 2007 (UTC)
- I have removed the regex doing this ((A)Typically). Other changes in the removed regex appear to already be covered in (A)Typical, but someone please update it if not. Thanks, mattbr 10:53, 28 March 2007 (UTC)
Another: In (fact/the/a/an) corrects the name Ina
- Removed "ina" and "inan" from regex because of name false positives. I'd also be concerned "inan" would be a typo of "inane". --Thiseye 01:24, 29 March 2007 (UTC)
Nation name capitalization
What do folks think about taking out some of the capitalizations since there are so many animal species that use lower-case versions of words that would ordinarily be upper-case (see this edit for an example of the mistakes that are often made). —Wknight94 (talk) 22:03, 27 March 2007 (UTC)
- "gum arabic" too. -- Euchiasmus 20:17, 7 April 2007 (UTC)
Proposing to remove "Millennium_" since there is a well-known 18th century book, Millenium Hall. —Wknight94 (talk) 00:06, 30 March 2007 (UTC)
There's a band called 'Agression', so the 'agression' -> aggression fix needs to be edited. Rjwilmsi 06:24, 31 March 2007 (UTC)
Official
There is currently an entry for Official, but I'm not sure if it corrects "Offical" --> "Official". Can someone either please add this or let me know that it is in there already? --After Midnight 0001 05:09, 1 April 2007 (UTC)
- I added that case, as well as a couple more word endings.--BillFlis 11:17, 1 April 2007 (UTC)
.coms
I couldn't get negative lookahead to work properly on the .com's (OK, brainfart Harvard would be .edu anyway). Try 1 and Try 2. I'm trying to get it to ignore URLs and emails (ex NSAKEY). Can somebody take a peek? I was reloading the file with click/unclick of the RETF option. — RevRagnarok Talk Contrib 17:40, 1 April 2007 (UTC)
- AWB ignores external http: links (and from the next release https:, ftp: and mailto:), so these shouldn't be a problem. In regular text, I can't think of a situation where you would write a web or email address outside a link. Could you point me to where you are having the problem? You can try out a regex using the find-and-replace option in AWB, and I don't think clicking/unclicking the checkbox reloads the list, but you can from the last option on the 'General' menu. mattbr 18:12, 1 April 2007 (UTC)
- The developers told me click/unclick reloads and that seems to work. The test article is listed above - NSAKEY has the public key for an email @microsoft.com. — RevRagnarok Talk Contrib 18:18, 1 April 2007 (UTC)
- Sorry missed that. Wrap the text in <pre></pre> rather than using a space at the beginning. AWB will then ignore them. mattbr 18:30, 1 April 2007 (UTC)
- That fixes this case, but on a side note, I'd like to know why the regex didn't work. — RevRagnarok Talk Contrib 18:35, 1 April 2007 (UTC)
- Ticking and unticking the box just enables and disables it; it doesn't refresh the typo list. I've just committed a change so that if you use the option on the general menu, it will reload them. Reedy Boy 18:41, 1 April 2007 (UTC)
- Two weeks ago you said it did reload the typo page. Guess there was a misunderstanding somewhere. Either way, I < pre> tagged the one spot anyway per Matt. — RevRagnarok Talk Contrib 18:52, 1 April 2007 (UTC)
- Sorry about that, I thought (as it was a bit of a quick fix) that it did. When I looked over the code just now, I realised that unless the declaration for the typos was blank (i.e. = null), it wouldn't load them. I've now put a parameter on that, so that you can force a reload, and that works. Sorry for the confusion/lack of complete attention on my part, and for the next release, it definitely has been sorted!! Reedy Boy 19:01, 1 April 2007 (UTC)
Re the regex, sorry bit of a regex novice. Can anyone else help? mattbr 18:50, 1 April 2007 (UTC)
august > August
Since august is a word, should this correction be removed, or improved to fix <number> august > <number> August only? Rjwilmsi 17:53, 3 April 2007 (UTC)
- Good point. Probably, but I was having some problems with lookahead in the past (see above). — RevRagnarok Talk Contrib 18:10, 3 April 2007 (UTC)
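The narrower "<number> august" idea can be sketched without any lookahead by capturing the day number and putting it back. This is a hypothetical rule written for Python's `re`, not an entry from the typo list; in AWB syntax the replacement would use `$1` rather than `\1`.

```python
import re

# Hypothetical rule: capitalise "august" only when it follows a one- or two-digit day.
DAY_AUGUST = re.compile(r"\b(\d{1,2}) august\b")

def fix_august(text: str) -> str:
    """Capitalise the month when it directly follows a day number."""
    return DAY_AUGUST.sub(r"\1 August", text)

print(fix_august("He was born on 12 august 1914."))
print(fix_august("It was an august gathering."))
```

The adjective "august" in the second sentence is left alone because no digits precede it. Dates written month-first ("august 12") would still be missed by this sketch.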
discribed -> described
As in [1]? Jobjörn (Talk ° contribs) 12:06, 4 April 2007 (UTC)
- Added to "Describe", which is now "(De/Pre)scribe".--BillFlis 19:49, 4 April 2007 (UTC)
strengtened > strengthened
as here. Jobjörn (Talk ° contribs) 14:16, 4 April 2007 (UTC)
- Added to "Strength".--BillFlis 19:43, 4 April 2007 (UTC)
"significatly" --> "significately" ???
The rule <Typo word="-(b/c/d/g/i/m/s/t/v)ately_" find="([bcdgimstv])atly\b" replace="$1ately" /> converts significatly to significately.
Surely that can't be what the inventor intended?
--Euchiasmus 20:13, 7 April 2007 (UTC)
- Yeah, that needs to go away. —Wknight94 (talk) 21:19, 7 April 2007 (UTC)
- I added this case to the existing rule for "Significant", and moved the general rules to the end, so this will be treated as a special case before the general rules kick in.--BillFlis 19:39, 9 April 2007 (UTC)
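The effect of rule order described here can be reproduced with a short sketch in Python's `re` (`$1` written as `\1`). The general rule is the one quoted above; the special-case rule is hypothetical, since the actual "Significant" entry is not quoted in this thread.

```python
import re

# General rule quoted above.
GENERAL = (r"([bcdgimstv])atly\b", r"\1ately")
# Hypothetical special case; the real "Significant" entry is not shown on this page.
SPECIAL = (r"\b([Ss])ignificatly\b", r"\1ignificantly")

def apply_in_order(text: str, rules) -> str:
    """Apply (find, replace) pairs one after another, in list order."""
    for find, replace in rules:
        text = re.sub(find, replace, text)
    return text

print(apply_in_order("significatly", [GENERAL, SPECIAL]))  # general rule fires first
print(apply_in_order("significatly", [SPECIAL, GENERAL]))  # special case fires first
print(apply_in_order("immediatly", [SPECIAL, GENERAL]))    # general rule still applies
```

With the general rule first, the word becomes "significately" and the special case never gets a chance to match; with the special case first, the result is "significantly" and the general rule has nothing left to change.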
"distictively" --> "districtively" ???
The word "districtively" doesn't even exist.
Let's have rules that rectify a recognised and bounded set of incorrect words, rather than trying to make the rules too general. What do you think? Euchiasmus 20:30, 7 April 2007 (UTC)
- Agreed as your other significatly example demonstrates. —Wknight94 (talk) 21:19, 7 April 2007 (UTC)
- As the "inventor" of these attempts at general rules, may I ask, what is the harm in replacing one type of error by another? If you did not have the general rule, you would still leave an error. At least its presence in this case alerted you that we need separate rules for these exceptional misspellings. I'll add a rule for "(Di/In)stinctive" to handle your clever discovery!--BillFlis 19:11, 9 April 2007 (UTC)
- It turns out that there was an existing rule to handle "distictively" but it was down in the D's, behind the general rules. I've now moved the general rules to the end, to allow the special cases to be handled first. I also modified the previous "Distinction" to "(Di/In)stinctive".--BillFlis 19:17, 9 April 2007 (UTC)
"other than"
The regexp for other than would change "Will have to agree with each other then convince the rest." Another regexp to change "(another|(?:the|each|some)) other then" to "$1 other, then" first and then apply the "then" to "than" fix would avoid it. This could also be extended to handle "better then" and "worse then". Or the line could be removed. Is it too much processing for these cases? -- JHunterJ 11:31, 13 April 2007 (UTC)
KPA is being changed to kPa because a rule in the Wikipedia:AutoWikiBrowser/Typos#Abbreviations of SI units section is too general. (I've run into other problems with the SI units but I haven't seen them in a while. I'll bring them up next time I see them.) Can we do a (k[pP][Aa]|[Kk][pP]a|KpA) rule instead? —Wknight94 (talk) 21:58, 15 April 2007 (UTC)
- Unrelated: Canarian Black Oystercatcher subspecies' scientific name is "Haematopus niger meade-waldoi" but "niger" gets changed to "Niger". —Wknight94 (talk) 02:31, 16 April 2007 (UTC)
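A quick way to see which spellings the proposed kPa pattern would touch is to run it over every capitalisation. This sketch uses Python's `re`; the surrounding `\b` word boundaries and the names used are assumptions added for the example, not part of the proposal.

```python
import re

# Proposed alternation from the comment above, wrapped in assumed word boundaries.
PROPOSED = re.compile(r"\b(k[pP][Aa]|[Kk][pP]a|KpA)\b")

def fix_kpa(text: str) -> str:
    """Rewrite the matched variants to the SI form 'kPa'."""
    return PROPOSED.sub("kPa", text)

for variant in ["kpa", "Kpa", "KPa", "kPA", "kpA", "KpA", "kPa", "KPA"]:
    print(variant, "->", fix_kpa(variant))
```

All mixed-case variants are normalised to "kPa", the already-correct "kPa" is matched but rewritten to itself, and all-capitals "KPA" is the only form left untouched, which is the behaviour the request asks for.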
Easter
I have found a false positive for the capitalization of "easter" and that is "easter egg" in the sense of Easter egg (virtual). After looking at the what links here page, there are between 250 and 500 links to that page, so there are a fair number of instances of this false positive out there. I personally cannot think of a way to alter the rule to fix this problem. I will be removing the rule. If anyone can think of a way to fix it, feel free to add it back. For future reference, this was the rule: <Typo word="Easter" find="\beaster\b" replace="Easter" /> --Maelnuneb (Talk) 17:07, 17 April 2007 (UTC)
- According to Easter egg (virtual), that usage is capitalized as well. I don't see the problem. -- JHunterJ 17:14, 17 April 2007 (UTC)
- As an aside, if it had been needed, I believe
<Typo word="Easter" find="\beaster(?! egg)\b" replace="Easter" />
- would have accomplished the exception-handling desired. -- JHunterJ 17:29, 17 April 2007 (UTC)
- Looking through the page they don't consistently capitalize Easter. Given that, I am going to change the rule to the one you suggested. --Maelnuneb (Talk) 20:25, 17 April 2007 (UTC)
- Meh. Inconsistency in that article is the problem to fix first, IMO. Which I've done, just now. I'd still say the original rule should be restored here, but I'll see if someone else agrees. -- JHunterJ 21:09, 17 April 2007 (UTC)
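For reference, the behaviour of the lookahead version can be checked with a few sample phrases. This is a sketch in Python's `re`; the negative lookahead syntax `(?! egg)` is the same in .NET, and the constant and function names are invented.

```python
import re

# Rule suggested above: capitalise "easter" unless it is followed by " egg".
EASTER = re.compile(r"\beaster(?! egg)\b")

def fix_easter(text: str) -> str:
    """Capitalise lower-case 'easter' except in 'easter egg(s)'."""
    return EASTER.sub("Easter", text)

for text in ["the easter holidays", "an easter egg in the game", "hidden easter eggs"]:
    print(repr(text), "->", repr(fix_easter(text)))
```

The lookahead only inspects the text after "easter", so both "easter egg" and "easter eggs" are skipped while other uses are capitalised.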
"Comprised of" rule
I'm a little confused about these rules:
<Typo word="comprises" find="\bis comprised (?:up )?of\b" replace="comprises" />
<Typo word="comprise" find="\bare comprised (?:up )?of\b" replace="comprise" />
<Typo word="comprised" find="\b(?:was|were|been) comprised (?:up )?of\b" replace="comprised" />
<Typo word="comprising" find="\b([Cc])omprised (?:up )?of\b" replace="$1omprising" />
Could somebody with a little more English grammar knowledge please explain these. I don't remember there being a problem with "comprised of" but I could be wrong. --Maelnuneb (Talk) 16:33, 19 April 2007 (UTC)
If X, Y, and Z compose a thing (or a thing is composed of X, Y, and Z), that thing comprises X, Y, and Z. See wikt:comprise -- JHunterJ 16:58, 19 April 2007 (UTC)
- Thanks for looking that one up. One would think that I would have known to look there before asking questions, but apparently not. --Maelnuneb (Talk) 17:07, 19 April 2007 (UTC)
- There was an earlier question about this, which I answered on the RegExTypoFix talk page under the heading Urgh!! -- Euchiasmus 21:26, 19 April 2007 (UTC)
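To see what the four rules do to actual sentences, they can be applied in the order listed. This is a sketch in Python's `re` with `$1` written as `\1`; the helper name is invented, and the sample sentences are made up for illustration.

```python
import re

# The four rules quoted at the top of this section, in the same order.
COMPRISED_RULES = [
    (r"\bis comprised (?:up )?of\b", "comprises"),
    (r"\bare comprised (?:up )?of\b", "comprise"),
    (r"\b(?:was|were|been) comprised (?:up )?of\b", "comprised"),
    (r"\b([Cc])omprised (?:up )?of\b", r"\1omprising"),
]

def fix_comprised(text: str) -> str:
    """Apply the rules one after another, most specific first."""
    for find, replace in COMPRISED_RULES:
        text = re.sub(find, replace, text)
    return text

for sentence in [
    "The team is comprised of ten players.",
    "The teams are comprised of ten players.",
    "It was comprised of three parts.",
    "A panel comprised of experts met.",
]:
    print(fix_comprised(sentence))
```

The order matters: the last rule is a catch-all, so if it ran first, "is comprised of" would become "is comprising" before the more specific rules could match.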
It has been suggested on my Talk that the replacement for "is comprised of" be "is composed of" instead of "comprises". I tend to prefer keeping the base word, and as a bonus I like active voice over passive voice. Any other suggestions or agreements with either choice? -- JHunterJ 11:01, 26 April 2007 (UTC)
Significantion?
signification -> significantion? Looked weird to me so I didn't save the change. --Guinnog 23:04, 26 April 2007 (UTC)
- It was wrong. The pattern for "significant" was lacking the word boundaries. Fixed. -- JHunterJ 23:22, 26 April 2007 (UTC)
- Wow, that was fast! Thank you. --Guinnog 23:26, 26 April 2007 (UTC)
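The kind of bug described here is easy to reproduce. The pattern below is a hypothetical reconstruction, not the actual "significant" entry (which is not quoted in this thread); it only shows how leaving out `\b` lets a rule fire inside a longer, correct word. Python's `re` is used, with `$1` written as `\1`.

```python
import re

# Hypothetical rule for "significat(ly)" -> "significant(ly)", without word boundaries.
UNBOUNDED = re.compile(r"([Ss])ignificat(ly|)")
# The same hypothetical rule with word boundaries added.
BOUNDED = re.compile(r"\b([Ss])ignificat(ly|)\b")

REPLACEMENT = r"\1ignificant\2"

for word in ["significatly", "signification"]:
    print(
        word,
        "| unbounded:", UNBOUNDED.sub(REPLACEMENT, word),
        "| bounded:", BOUNDED.sub(REPLACEMENT, word),
    )
```

Without the trailing `\b`, the rule matches the first ten letters of "signification" and yields "significantion"; with it, the match fails because a letter follows, and the correct word is left alone.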
Distinguish
I think this is wrong:
replace="$1istinguis$2" />
Shouldn't it be
replace="$1istinguish$2" />
It seems to be changing 'distinguish' to 'distinguis'. Colonies Chris 13:14, 28 April 2007 (UTC)
- It looks like this has been fixed. --Thiseye 01:26, 30 April 2007 (UTC)
Turks
I think I just found a minor regexp bug while editing this revision of self-loading rifle. The suggested edit for "turks" was "Turks$4" (ie. with the variable in the string). Cheers, -- Seed 2.0 14:51, 1 May 2007 (UTC)
- Fixed.--BillFlis 17:46, 1 May 2007 (UTC)
- Great. Thank you. -- Seed 2.0 18:22, 2 May 2007 (UTC)
Question
Sorry I don't know much about programming or anything, but I'm guessing that we should copy the codes on the page somewhere on our AWB so that it can fix the mistakes when we're using the program, right? I was wondering how we could do that, like where & how do we copy all the typo codes in the program to "make it work" if you see what I mean... Thanks in advance. Zouavman Le Zouave (Talk to me!) 13:06, 3 May 2007 (UTC)
- Nope, just set the option and it will be "on" -- AWB reads it from this article itself. -- JHunterJ 13:12, 3 May 2007 (UTC)
Thanks a lot for such a fast answer! ^^ Zouavman Le Zouave (Talk to me!) 13:14, 3 May 2007 (UTC)
comprised of
AWB changes comprised of to composed of. This is not a typo--one of the meanings of comprised is "to constitute, to make up, to compose", or, pass "to be composed of, to consist of". Could someone explain why AWB is effectively making a word choice change under the guise of fixing a typo? Miss Mondegreen talk 00:42, 4 May 2007
- One of the meanings of "comprised" is as you say. "Comprised of" is informal or incorrect though (see wikt:comprise and my Talk page), and should be changed either to "comprising" or "composed of". I added it as "comprising", but received some complaints that that was hard to understand, so I switched it to "composed of". I certainly don't mind switching it back, and will do so now. -- JHunterJ 11:02, 4 May 2007 (UTC)
- The day I turn to Wiktionary as a dictionary is the day I...well, I don't know what, but something as drastic as hell freezing and pigs flying but less cliché.
- Considering that we aren't supposed to cite Wikipedia or other wiki sites as references, I'll cite the OED:
8. Of things: a. To take up, fully occupy (a space). Obs. rare.
b. To constitute, make up, compose.
c. pass. To be composed of, to consist of.
- "Comprised of" is not incorrect, nor is it listed as informal.
- "Comprised of" isn't listed at all in that excerpt. If "comprise" means "to be composed of", "comprised of" therefore means "to be composed of of", which is why it's wrong (or at best informal). Does the OED not have any usage information for "comprise"? Merriam Webster does [2], as do the American Heritage Dictionary [3], Bartleby's [4], and the Random House Word of the Day [5]. -- JHunterJ 12:33, 4 May 2007 (UTC)
- There does seem to be a lot of fervor about this usage. Googling not only gets a variety of definitions that do or don't include the usage, it also yields a number of grammar junkies lecturing on it. This may mainly come from the fact that it is a fairly recent form of the word. The first known occurrence comes a century after "comprising" (same meaning), and didn't become well used until halfway through the twentieth century. Regardless, it's both correct, and not listed in the dictionary as being an informal usage. American Heritage alludes to it, but the definition there covers scarcely a third of the usages and meanings of the word that the OED covers. I've removed comprised altogether, and unless I've done so incorrectly or there's an issue with the source I provided, it should stay that way.
- I also think it's a bad idea for word changes (changing incorrect word usage) to be made under the guise of typo fixing. Even if the change is absolutely correct, the AWB edit summary reflects a typo change, unless the edit summary is manually changed. I'm also concerned that apparently, definitions and usages for words are being obtained from Wiktionary, or at best, online dictionaries that do not list complete definitions and usages. Is it too much to ask that people actually go and look the word up in a comprehensive dictionary before making an edit that sets in motion changes throughout Wikipedia? Changes that cannot necessarily be easily undone. This seems to me to be the height of irresponsibility. Miss Mondegreen talk 05:00, 4 May 2007 (UTC)
- Well, it's certainly turned out to be contentious, so I don't object to its removal, but I did look it up in other dictionaries first (although I don't have ready access to an OED), so please don't cast the move as irresponsible or hasty. I choose to point to the Wiktionary definition first because it is, like this, a Wikimedia project. -- JHunterJ 12:33, 4 May 2007 (UTC)
- One thing that I am 100% sure of is that this change was made with the best intentions and with a lot of debate. This rule has been talked about a lot recently. There are 2 sections now on this page, one of which I personally started, one on JHunterJ's talk page, and one on this page. As far as I am concerned, this rule has been pretty well defended. JHunterJ's talk page and the talk page for RegExTypoFix have more complete information to look at. Please make sure to look at these pages. --Maelnuneb (Talk) 16:55, 4 May 2007 (UTC)
- I saw the debate on the talk page. And I understand that it was made with good intentions. However, I'm still concerned. Wiktionary, wikipedia, wikimedia projects are NOT acceptable sources per WP:V, WP:RS etc. An issue with the change was raised, and no one really had the answer. Information was taken from Wiktionary, both incomplete and not ok due to policy, and from a seriously incomplete American Heritage entry. I understand not having access to sources, but if you don't have access to information, then don't make changes based on something you can't cite or prove. All the discussions show me is that people fought about something factual without using verified facts; no one pulled out a complete dictionary until I came to the discussion. And that's just irresponsible. If this was important to you, to any of the people who believed in this change, if you honestly thought that it was incorrect and AWB should be fixing it, then you really needed to get access to a dictionary somehow. I find it very hard to believe that none of you could have gone to a library, or at least hunted down a fellow wiki user with access to something like the OED. Look, if someone creates userboxes for access to online databases, I'll be the first person to put them on my userpage and field requests. Half of the articles I do get involved in are because I do have access and I show up to stop a "he said she said" argument about something factual, where all someone needs to do is go and look it up.
- But aside from the particulars of this case, I'm concerned in general. Pretend that this is right. Who on earth thinks that replacing a misused word is fixing a typo? Changing a word, misused or no is always going to have subtleties that you can't program into code. The people who use AWB move a mile a minute, and while they check the proposed edits to make sure that they make sense, you're asking them to know a fair amount to catch stuff like this. And I'm betting that when they don't see them not making sense, they just let it go ahead. And that's a problem. Because now you have a machine correcting grammar based on programming by users who don't always use dictionaries when programming that machine, and that's a really bad kind of self-correcting wiki.
- Having a mistake in AWB can't really be undone. How many edits get performed with the error before it's caught? How many correct spellings of "dependant" were changed to a different spelling, and therefore different meaning by AWB and had to, or still have to be caught by hand? That's not saying that AWB doesn't serve a good purpose, but if word changes are an even greater liability and if they are going to continue to be considered typos, they should at least be kept in a separate section, so that a closer eye can be kept on them. Miss Mondegreen talk 14:49, 5 May 2007 (UTC)
- Please quote the OED passage that uses the phrase "comprised of" as acceptable in other than informal usage before continuing with the idea of the "mistake" done by AWB. I've "really" had an answer for each objection raised so far. -- JHunterJ 22:02, 5 May 2007 (UTC)
[restarting indent] I was not using an OED passage to rebut the American Heritage passage. I was using the OED definition. The OED will list the definition as informal or slang or archaic, if it is in fact informal or slang or archaic, and you can see above that definition 8 was archaic. The definition and usage I was referring to had no such listing--the OED does not list it as any of these things. I'm including the quotes, and spelling and etymology, and everything for the definition and usage that is being discussed here. Then, you'll have everything I have. Miss Mondegreen talk 16:41, 5 May 2007 (UTC)
(kəmˈpraɪz) Also 5-7 compryse, 5 Sc. compris, 7-9 comprize. [f. F. comprendre (pa. pple. and pret. Ind. compris):-L. comprendĕre, contr. from comprehendĕre to COMPREHEND. Probably formed by association with emprise, and possibly with enterprise, both of which verbs were derivatives from Eng. ns. of the same form (repr. F. emprise, entreprise, fem. ns. from pa. pple.), but being used as the Eng. reprs. of emprendre, entreprendre, formed a precedent for the analogous representation of other compounds of -prendre by verbs. in -prise: cf. apprise, surprise. (Many of the early passages in which this word occurs are so vague that it is difficult to gather the exact sense.)]
b. To constitute, make up, compose.
1794 G. ADAMS Nat. & Exp. Philos. II. xvi. 238 The wheels and pinions comprizing the wheel-work. 1794 PALEY Evid. I. ix. (1817) 169 The propositions which comprise the several heads of our testimony. 1850 W. S. HARRIS Rudimentary Magnetism iv. 73 These substances which we have termed diamagnetic..and which comprise a very extensive class of bodies. 1907 H. E. SANTEE Anat. Brain & Spinal Cord (1908) iii. 237 The fibres comprising the zonal layer have four sources of origin. 1925 Brit. Jrnl. Radiology XXX. 148 The various fuses etc. comprising the circuit. 1950 M. PEAKE Gormenghast (1968) xiv. 94 Who, by the way, do comprise the Staff these latter days? 1959 Chambers's Encycl. XIII. 653/1 These fibres also comprise the main element in scar tissue. 1969 W. HOOPER in C. S. Lewis Sel. Lit. Ess. p. xix, These essays together with those contained in this volume comprise the total of C. S. Lewis's essays on literature. 1969 N. PERRIN Dr. Bowdler's Legacy (1970) i. 20 As to who comprised this new reading public, Jeffrey..guessed in 1812 that there were 20,000 upper-class readers in Great Britain.
c. pass. To be composed of, to consist of.
1874 Art of Paper-Making ii. 10 Thirds, or Mixed, are comprised of either or both of the above. 1928 Daily Tel. 17 July 10/7 The voluntary boards of management, comprised..of very zealous and able laymen. 1964 E. PALMER tr. Martinet's Elem. Gen. Ling. i. 28 Many of these words are comprised of monemes. 1970 Nature 27 June 1206/2 Internally, the chloroplast is comprised of a system of flattened membrane sacs.
9. The participles are used absolutely: = Including, included (cf. F. y compris); so the gerund.
1653 H. COGAN tr. Pinto's Trav. vii. 21 He had lost above three thousand and five hundred men, not comprising the wounded. 1663 GERBIER Counsel 37 One quarter of the Ionick Column, the Base and Capital comprised. Ibid. 56 Brick-layers will work..the inside for thirty three shillings, arches comprised. 1887 W. G. PALGRAVE Ulysses, Phra Bat, The edifice..is square, about thirty feet in dimension each way, without comprising the outer colonnade.
Hence comˈprised ppl. a., comˈprising vbl. n. and ppl. a.
c1575 SIR J. BALFOUR Practicks (1754) 147 Redemptioun of comprysit landis. Marg. Difference betwix comprysit landis and wodset landis. 1603 FLORIO Montaigne (1634) 295 If he be in himselfe, they are also two, the comprizing and the comprized. 1609 SKENE Reg. Maj. 110 Comprisings of lands. 1691 E. TAYLOR tr. Behmen 316 Which breaketh the comprized Life again. 1879 SIR G. SCOTT Lect. Archit. I. 229 The subdivisions..three or four under one comprising arch.
Other rules
- Thanks. It's the "c" definition above that I was looking for. I'm surprised to learn that it doesn't address the usage question that arises in the other sources. And I am happy to have removed the rule that was replacing a form of comprise with a form of compose. Just to be sure, are you objecting to the other rules (is comprised of -> comprises, etc) or no? I'd still like to replace them, even if both are correct according to the OED, under the "Try to find words that are common to all" part of the style guidelines, but if they're also at issue, they should be removed as well. -- JHunterJ 00:59, 6 May 2007 (UTC)
- I'm a little surprised too, but I find time and time again when an issue arises that the OED is so much more complete than other sources that I just go back to it. I suspect that "comprised of" is regarded sometimes as informal because it came into existence later--a whole century after comprises. And it's not like there was no other way to say "comprised of"--there were a few other ways to say it just with the word comprised alone, and in this meaning comprised is practically a synonym for composed, so the usage most likely didn't become integrated into the language quickly the way that other usages and words do when there is a need for them to. However, it's not listed as informal by the OED, the only dictionary I've found to actually list all of the definitions and usages, and I read the stuff you linked to, and the way that the issue is written about seems to be of historical note, though I agree--there are always going to be people who prefer one usage over another and enforce that wherever they can.
- My issue with the other rules is that I'd prefer not to mess with people's grammar or writing. I assume that you're referring to Wikipedia:Manual of Style#National varieties of English? Maybe I'm being completely dense, but I really fail to see how on earth that applies to this at all. Can you explain? The thing is at this point, with the remaining rules that you're referring to, is that both are correct, in most instances (unlike spelling, I won't say all). But fixing with AWB could potentially fix something that was correct to something that isn't, or something that read nicely to something that sounds really clumsy because of the sentence structure. Each article is written by different people and they're going to have slightly different tones and be written in different fashions and I think that switching wording like that is a bad idea. There is only a certain extent to which you can copy-edit blindly--there is an art to editing, and it can't be done with an automated browser. Miss Mondegreen talk 02:54, May 6 2007
- The "national varieties" reads to cover variations in usage national and otherwise, and this seems to fit its description, if not its heading. While I have come across replacements that would have been wrong to use "comprised of" -> "comprising", I haven't yet found any that would be rendered incorrect by the other rules, "is comprised of" -> "comprises", etc., and I don't think there would be any. Could there be? -- JHunterJ 12:04, 6 May 2007 (UTC)
- Sure. "Manjung's land area is predominantly comprised of agricultural land" That's actually the change that brought this to my attention. This is why I'm so against fixing grammar automatically--it's hard enough for humans to do. English grammar is complex, obscure, complicated and bizarre--humans have immense difficulty with it. I'm not sure it can be programmed--what absolutes are there? And even then, the programming is dependent on the rest of the article being correct, which is ironic, since it's meant to fix errors. Maybe the minor grammar error that AWB detects and attempts to fix is really a grammar edit elsewhere, but it triggers that phrase that AWB is programmed with. In terms of grammar and word usage, phrases and sentences and paragraphs have to be looked at as ever increasing wholes, until you get to the article as a whole. I just don't think that this is possible. Miss Mondegreen talk 13:05, May 6 2007
- That was a change of "comprised of" to "composed of", and would be eliminated by the elimination of the "comprised of" rule. Is there a potential problem with "is comprised of" -> "comprises"? -- JHunterJ 17:26, 6 May 2007 (UTC)
- Ooh, sorry, I misread that. Uhh...I'm trying examples in my head. I'm not sure if it makes it incorrect, but there are certainly cases where it makes it clumsy, though I'll admit that the wording I'm using to begin with is clumsy already. For example, "a fruit salad is comprised of apples, oranges and grapes" -- "a fruit salad comprises apples, oranges and grapes" -- "a fruit salad is composed of apples, oranges and grapes".
- Now really, I wouldn't use any of these wordings, but composed of and comprised of are best, and comprises is just awful here, though it may be technically correct. But everything I said before, with the wrong example about not wanting to correct grammar with AWB still stands, and it will stand for every instance. English grammar is ridiculously complex and there are so many ifs and ors and buts and we use different spellings and dialects and there are so many variables that I can't see a machine doing this by absolutes, when it is so hard for humans to do this with each individual scenario. Do you really think that AWB can work with grammar the way it does with spelling? Miss Mondegreen talk 21:15, May 6 2007
- Well, in that example, "comprises" and "is composed of" are best to my ear. I don't think substituting "comprises" for "is comprised of" reaches the level of grammar fixing, any more than replacing "I ain't" with "I am not" would. It's still just a rote copy edit. (I can go on like this all day, and wouldn't mind the exchange. If you're still not swayed, though, you can edit the list to remove them, or say so here and I'll remove them.) :-) -- JHunterJ 11:08, 7 May 2007 (UTC)
- Hmmm, then it's clear that some people are familiar with some usages, because to me, comprises sounds painful there, even though technically, I know.... I don't think it should be in the list, though: all are technically correct, so what you are or are not familiar with is closer to a dialect issue than a grammar issue, and AWB definitely shouldn't correct for that. Could you remove it? I'm sure I could, but it's code I'm really not familiar with and I noticed you fixed my removal last time.
- By the way, I was serious about the whole userbox thing before. I don't know if anyone is interested in making them, but if so, let me know. Miss Mondegreen talk 10:40, May 8 2007
Capitalization of state names
I just noticed that we seem to have a rule to %s/georgia/Georgia/gcI but not for other states. I haven't gone through the regexp list but we're at least missing the Carolinas and from the looks of it a few other states. *insert semi-obscure Friends quote about getting 56 states here* ;). -- Seed 2.0 01:35, 5 May 2007 (UTC)
- You must mean "state names of the United States of America", whereas the Georgia you found is a state of the former Soviet Union. Since we have that, we don't need to duplicate it in the long-but-incomplete list of Geographical Place Names of the United States.--BillFlis 12:01, 5 May 2007 (UTC)
Mineral, suggestion
miniral -> mineral, came across it the other day. Pax:Vobiscum 22:51, 9 May 2007 (UTC)
Stratagy -> stratey?
Should go to strategy, of course. I don't know regexes well so I can't really fix it myself. —Dark•Shikari[T] 13:51, 10 May 2007 (UTC)
Also directer -> director should be added. —Dark•Shikari[T] 21:20, 10 May 2007 (UTC)
efectiv -> effectiveive
Just a quick heads up. I just noticed that the suggested fix for the 'efectiv' on Silver Nanoparticles was 'effectiveive' and figured that I'd rather just report it than mess with the regexp myself. -- Seed 2.0 10:39, 17 May 2007 (UTC)
out added as a prefix to {{infobox}}
Can someone explain why AWB would have made this change? Miss Mondegreen talk 09:02, May 18 2007
- I think that's going to be user error. The cursor starts in the upper left, and he may have not realized that he was typing in the AWB window. Note the edit summaries in this sequence:
- 13:59, 12 May 2007 (hist) (diff) One Piece Grand Battle! (Typo fixing, Typos fixed: american → American, english → English, using AWB) (top)
- 13:59, 12 May 2007 (hist) (diff) InuYasha the Movie: Fire on the Mystic Island (Typo fixing using AWB)
- 13:58, 12 May 2007 (hist) (diff) Yotsuya Kaidan (Typo fixing, Typos fixed: the the → the, using AWB) (top)
If the user enters text manually, he loses the "Typos fixed:" portion of the automatic edit summary. -- JHunterJ 10:55, 18 May 2007 (UTC)
Leftfield
What should be done about regexes that are likely to generate false positives? I mean specifically this one:
<Typo word="(Center/Left/Right) field" find="\b([Cc]enter|[Ll]eft|[Rr]ight)f(?:ie|ei)ld(|ers?)\b" replace="$1 field$2" />
It changes "leftfield" to "left field" which is problematic in case of the Leftfield duo. Jogers (talk) 11:44, 22 May 2007 (UTC)
- In the case where the false positive is a proper noun, just remove the relevant capital letter:
<Typo word="(Center/Left/Right) field" find="\b([Cc]enter|left|[Rr]ight)f(?:ie|ei)ld(|ers?)\b" replace="$1 field$2" />
- That will remove the false positives and some of the real positives, which can be added back in as a separate rule:
<Typo word="Left field" find="\bLeftf(?:eild|ield(ers?))\b" replace="Left field$1" />
- (untested). -- JHunterJ 12:14, 22 May 2007 (UTC)
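As a rough way to check the two rules above outside AWB, here is a minimal sketch using Python's `re` module. It is an approximation only: AWB runs on the .NET regex engine, Python writes backreferences as `\1` rather than `$1`, and the sample words are made up for illustration. It assumes Python 3.5 or later, where an unmatched group in the replacement expands to an empty string.

```python
import re

# The two rules from the thread, with $1/$2 rewritten as \1/\2 for Python.
# This is a stand-in for AWB's .NET engine, not the engine itself.
RULES = [
    (r"\b([Cc]enter|left|[Rr]ight)f(?:ie|ei)ld(|ers?)\b", r"\1 field\2"),
    (r"\bLeftf(?:eild|ield(ers?))\b", r"Left field\1"),
]


def apply_rules(text):
    """Apply each (find, replace) pair in order and return the result."""
    for find, replace in RULES:
        text = re.sub(find, replace, text)
    return text


# Hypothetical sample words, chosen only to exercise each branch.
for sample in ["leftfield", "Leftfield", "Leftfeild", "Leftfielders", "rightfeilder"]:
    print(sample, "->", apply_rules(sample))
```

Under these assumptions the capitalized "Leftfield" is left alone, while the misspelled "Leftfeild" and the plural "Leftfielders" are still split, which is the behaviour the separate rule was meant to restore.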
francophone --> Francophone and anglophone --> Anglophone
I was advised by another user that the capitalisation of these words and their derivatives is not used in all variants of English - see WP:CAPITAL#Anglo-_and_similar_prefixes. Therefore I think it would be appropriate to remove / comment out these corrections. Opinions? Rjwilmsi 01:21, 2 June 2007 (UTC)
- Just the "-one" section? Yes, I think that would definitely be appropriate. I think commenting out the "-ile" and "-obe" entries would also be appropriate, since they should remain lowercase on Canada-related articles. -- JHunterJ 11:01, 2 June 2007 (UTC)
Problem with "operational" typo fix
My AWB just replaced "opperational" with "operationional" here, so I think the regex could use a second look. TomTheHand 15:29, 4 June 2007 (UTC)
- Thanks. I adjusted it. -- JHunterJ 15:34, 4 June 2007 (UTC)
Duplicated words
I collapsed the duplicated words into one entry. It could be made even more generic:
<Typo word="Duplicated words" find="\b(\w+)\b\s+\1\b" replace="$1" />
but that'll have more false positives. If you want to be careful with it, add it explicitly to your personal Find & Replace section in AWB. -- JHunterJ 00:18, 10 June 2007 (UTC)
- I think your elegant rule is a good contribution, but it doesn't work when the first of the duplicated words is capitalized, as at the beginning of a sentence, which the old clumsy rules were able to deal with. I don't see how to handle all those cases in a general rule.--BillFlis 00:55, 10 June 2007 (UTC)
- The rule as written just fixed By by -> by here. -- JHunterJ 00:59, 10 June 2007 (UTC)
- Of course, that was in the AWB Find & Replace section, not in the Typos, so maybe it behaves differently in the Typo list. -- JHunterJ 01:00, 10 June 2007 (UTC)
- Ah, if that's the case, as I see it is, it seems that AWB is using a very non-standard type of regular expressions!--BillFlis 01:49, 10 June 2007 (UTC)
In my experience of using the duplicate words rules so far, if we only correct lowercase entries there are fewer false positives (say hardly any compared to a few), so perhaps it's better than separate rules for each word. I agree that the above generic line is far too broad for inclusion in the typo list (just consider 'had had', 'in in'), but is useful for very careful use by an individual. Rjwilmsi 07:48, 10 June 2007 (UTC)
- BTW, I found the case-insensitive solution:
- <Typo word="Duplicated words" find="\b(?i:(\w+)\b\s+\1)\b" replace="$1" />
- but I'll just leave it here based on Rjwilmsi's note. -- JHunterJ 16:59, 26 June 2007 (UTC)
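For anyone wanting to compare the two duplicated-word patterns from this thread, the following is a minimal sketch in Python's `re` module rather than AWB's .NET engine, so results in AWB itself may differ. The `dedupe` helper and the sample sentences are hypothetical, and the scoped `(?i:...)` form needs Python 3.6 or later.

```python
import re

# The generic rule and its case-insensitive variant, as quoted above.
CASE_SENSITIVE = r"\b(\w+)\b\s+\1\b"
CASE_INSENSITIVE = r"\b(?i:(\w+)\b\s+\1)\b"


def dedupe(pattern, text):
    """Collapse a repeated word to its first occurrence (Python uses \\1 for $1)."""
    return re.sub(pattern, r"\1", text)


# Hypothetical sample sentences.
print(dedupe(CASE_SENSITIVE, "saw the the cat"))    # lowercase repeat
print(dedupe(CASE_SENSITIVE, "By by the way"))      # differing case
print(dedupe(CASE_INSENSITIVE, "By by the way"))    # differing case, (?i:) form
print(dedupe(CASE_SENSITIVE, "he had had enough"))  # legitimate repeat
```

In this sketch the case-sensitive pattern leaves "By by" untouched and the `(?i:...)` variant collapses it, while both would also collapse a legitimate "had had", which is the false-positive risk noted above.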
Using the ?: part
If you need to use parentheses for grouping but not for capturing, it's a good idea to use the (?:blah|yadda) form. This allows subsequent capturing parentheses to be accessible in order ($1 and $2 instead of $1 and $3). Even if there are not subsequent capturing parentheses in the regexp, it's a good idea because it (a) alerts future readers/maintainers that the group is not used in the replacement and (b) it allows for a future editor to add a trailing capture without having to figure out what number it is -- the next $x number can be assumed. In my opinion; that's how I do it in my non-Wikipedia programming. -- JHunterJ 22:39, 18 June 2007 (UTC)
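The numbering point can be seen in a small sketch like the one below. It uses Python's `re` module, where `\1` plays the role of `$1`, and both patterns are hypothetical examples written for this illustration, not entries from the typo list.

```python
import re

# Same hypothetical pattern twice: once with a capturing middle group,
# once with the (?:...) non-capturing form.
capturing = re.compile(r"\b(left|right)f(ie|ei)ld(ers?)\b")
non_capturing = re.compile(r"\b(left|right)f(?:ie|ei)ld(ers?)\b")

# Number of capture groups each pattern defines.
print(capturing.groups, non_capturing.groups)

# With the capturing middle group, the suffix is group 3.
fixed_a = capturing.sub(r"\1 field\3", "leftfeilders")

# With the non-capturing middle group, the suffix is simply the next number, 2.
fixed_b = non_capturing.sub(r"\1 field\2", "leftfeilders")

print(fixed_a, "|", fixed_b)
```

Both substitutions produce the same text; the only difference is which group number the replacement has to name, which is the maintenance benefit described above.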
Febuary ->> February
A typo I usually do, Febuary ->> February
37 Pages have that typo.
-Flubeca (t) 16:31, 23 June 2007 (UTC)
- Thanks, we've already got that one listed as a correction. I'll do a search for it later today to correct any articles containing it. Rjwilmsi 16:17, 24 June 2007 (UTC)
- Update: corrected two more articles. I ran the correction about a month ago using a Google search and got most of them. We'll need to wait for the Google cache to reparse the pages before a Google search is clean (mainspace articles only). Rjwilmsi 21:03, 24 June 2007 (UTC)
Affluent (false positive)
Affluent should NOT correct to Afluent.
Affluent - being rich and wealthy --Breno talk 14:22, 27 June 2007 (UTC)
- Fixed. -- JHunterJ 18:31, 27 June 2007 (UTC)
Intension
I suppose that intension should not be changed to intention. Jogers (talk) 17:32, 1 July 2007 (UTC)
- Fixed. -- JHunterJ 19:09, 1 July 2007 (UTC)
Centerfield
Changing "Centerfield" to "Center field" produces false positives. Jogers (talk) 17:44, 1 July 2007 (UTC)
- Fixed. -- JHunterJ 19:09, 1 July 2007 (UTC)
Cristian → Christian
Cristian is a given name and place and shouldn't be corrected to Christian. Thanks, mattbr 19:37, 2 July 2007 (UTC)
New Jersey
One more, new jersey should not auto-capitalise.
The soccer player got his new jersey today. --Breno talk 13:18, 3 July 2007 (UTC)
- Did you actually come across that in Wikipedia? It doesn't sound like a very encyclopedic sentence, and ought to be copy-edited.--BillFlis 13:31, 3 July 2007 (UTC)