Currently, MediaWiki automatically converts spaces before various punctuation marks (such as ;, ? and !) into non-breaking spaces. It has been suggested that the same be done for spaces after section markers (§). For example, the following article currently contains 249 manually encoded non-breaking spaces because of its heavy use of section markers:
https://de.wikipedia.org/wiki/%C2%A7_175
Description
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T15619 Add non-breaking spaces in additional places automatically |
| Open | | None | T119463 Automatically convert spaces after section markers (§) into non-breaking spaces |
Event Timeline
I found this part of Parser.php, which seems related to this task.
```php
$fixtags = [
	# French spaces, last one Guillemet-left
	# only if there is something before the space
	'/(.) (?=\\?|:|;|!|%|\\302\\273)/' => '\\1&#160;',
	# French spaces, Guillemet-right
	'/(\\302\\253) /' => '\\1&#160;',
	'/&#160;(!\s*important)/' => ' \\1',
];
```
Should the section marker be added here?
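The mechanism in question can be sketched in Python (this is not the actual PHP): `$fixtags` is an ordered list of pattern/replacement pairs that are applied one after another. The sketch assumes `&#160;` as the emitted entity, writes the UTF-8 byte escapes `\302\273` and `\302\253` as the characters » and «, omits the `!important` rule, and adds a hypothetical § rule of the kind this task asks for; `FIXTAGS` and `fix_spaces` are illustrative names only.

```python
import re

NBSP = "&#160;"  # assumption: the entity emitted for a non-breaking space

# Hypothetical Python analogue of $fixtags: ordered (pattern, replacement) pairs.
FIXTAGS = [
    # French spaces: a space before ? : ; ! % or » becomes non-breaking,
    # but only if something precedes the space.
    (re.compile(r"(.) (?=\?|:|;|!|%|»)"), r"\1" + NBSP),
    # French spaces after «
    (re.compile(r"(«) "), r"\1" + NBSP),
    # Hypothetical rule for this task: a space after § that precedes a character.
    (re.compile(r"§ (?=.)"), "§" + NBSP),
]


def fix_spaces(text: str) -> str:
    """Apply every rule in order, the way the parser walks the $fixtags table."""
    for pattern, replacement in FIXTAGS:
        text = pattern.sub(replacement, text)
    return text


print(fix_spaces("Siehe § 175 !"))  # → Siehe §&#160;175&#160;!
```

Since the rules run in sequence over the whole output, every new entry (such as the § rule) interacts with all the others, which is one reason adding rules here needs care.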
If you "found this part", where did you find it? Clear links and references are always welcome. Thanks!
@Aklapper The code is in includes/parser/Parser.php, at line 1297.
Should I go ahead and add the section marker there?
I created a patch to add a non-breaking space after §.
I have also uploaded the change to Gerrit; the code needs review.
@Harjotsingh: Thanks for the patch! Please follow https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines and link to this task in your commit message, to automatically get a notification link here.
Change 274770 had a related patch set uploaded (by Harjotsingh):
Convert space after § to non-breaking spaces
Change 275203 had a related patch set uploaded (by Harjotsingh):
Convert space after § to non-breaking spaces
This would be a subtask/duplicate of T15619: Add non-breaking spaces in additional places automatically.
@cscott
You mentioned the parser tests here https://gerrit.wikimedia.org/r/#/c/275203/.
Which tests am I supposed to add, and where can I find out how to do so?
Change 332037 had a related patch set uploaded (by Harjotsingh):
Convert space after § to non-breaking spaces
I don't like this kind of processing in the parser at all. Non-breaking spaces should be added at edit time and saved to the database, not added by the parser at output time.
IMHO this patch will fail. The code
'/(§) (.)/' => '§ '
would delete the character to the right of the space. It should be something like
'/§ (.)/' => '§ \\1'
or
'/§\K (?=.)/' => ' '
or
'/§\K \b/' => ' '
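The difference between the patch's pattern and the suggested corrections can be checked directly. This Python sketch assumes the patch emits `&#160;`. Python's `re` has no PCRE `\K`, so the lookbehind `(?<=§)` stands in for `§\K`; since § is a single fixed-width character, both match only the space itself.

```python
import re

NBSP = "&#160;"  # assumption: the entity the patch emits
text = "§ 175 und § 176"

# Pattern from the patch: (.) consumes the next character, but the
# replacement never writes it back, so that character is lost.
buggy = re.sub(r"(§) (.)", "§" + NBSP, text)

# Fix 1: capture the following character and restore it via a backreference.
fixed_backref = re.sub(r"§ (.)", "§" + NBSP + r"\1", text)

# Fix 2: match only the space; (?<=§) plays the role of PCRE's "§\K".
fixed_lookaround = re.sub(r"(?<=§) (?=.)", NBSP, text)

# Fix 3: like fix 2, but only when a word character follows the space.
fixed_wordboundary = re.sub(r"(?<=§) \b", NBSP, text)

print(buggy)          # → §&#160;75 und §&#160;76
print(fixed_backref)  # → §&#160;175 und §&#160;176
```

The lookaround forms avoid the backreference entirely because nothing besides the space is consumed; the `\b` variant additionally skips cases where § is followed by punctuation.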
@seth
Yes, it was deleting the next character, so a backreference was needed.
I've made the necessary changes.
Thanks!
Converting spaces to non-breaking spaces at parse time according to special replacement rules causes additional parser errors and sometimes unwanted effects. For examples of problems with the current whitespace replacements in the parser, see T40797. These problems are syntactic and could be solved by adding more replacement rules, but that makes everything more complex. There are also semantic problems, because a non-breaking space is not semantically appropriate in every situation.
Here are some real examples:
https://de.wikipedia.org/wiki/DIN_1505-2
This is followed by the designation, for example § and then the numbering, which also includes the subdivision, such as numbered paragraphs or ...
https://de.wikipedia.org/wiki/Codepage_437
The control-character range 00hex–1Fhex is assigned various graphic characters that, with the exception of the section sign §, are not printable, and which on the one hand ...
https://de.wikipedia.org/wiki/Halbeink%C3%BCnfteverfahren
For example, letter d of this § exempted half of the income ...
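These examples show why a purely syntactic rule cannot tell a section reference from § used as a noun or symbol. The Python sketch below runs a naive "space after §" rule over the German originals of the three fragments and it fires on all of them. The narrower "only if a digit follows" variant is a hypothetical heuristic, not something proposed in this task, and it is itself one more replacement rule of the kind criticized above.

```python
import re

# Fragments from the three articles quoted above (German originals).
examples = [
    "zum Beispiel § und dann die Zählung",
    "mit Ausnahme des Paragraphenzeichens § nicht druckbare Grafikzeichen",
    "Buchstabe d dieses § die Hälfte der Bezüge",
]

# Naive rule: any space after § that precedes a non-space gets replaced.
naive = re.compile(r"(?<=§) (?=\S)")
# Hypothetical narrower rule: only when a section number (digit) follows.
digits_only = re.compile(r"(?<=§) (?=\d)")


def would_bind(pattern: re.Pattern, text: str) -> bool:
    """True if the pattern would glue § to the following word."""
    return pattern.search(text) is not None


for fragment in examples:
    # Prints "True False" for each fragment: only the naive rule misfires.
    print(would_bind(naive, fragment), would_bind(digits_only, fragment))
```

Even the narrower rule only shifts the problem: it still cannot know whether a particular "§ 5" should stay together in a given sentence.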
Of course, it is possible to add workarounds to the wikitext to avoid these parser errors, but if the parser is changed, these workarounds must be inserted into the wiki before the change is deployed.
I don't think it is worth it. A better solution is to add these automatic replacement rules to the WikiEditor. An unwanted replacement can then be fixed directly in the editor.
VisualEditor makes it very easy to add automatic replacement rules. However, the resulting &nbsp; entities might be seen as ugly by wikitext editors. T5461: Syntax extensions: special character, e.g. underscore, for non-breaking space (&nbsp;) would make the wikitext look much nicer when explicit non-breaking spaces are added, and give editors full control over their placement (or not).
The current automatic replacement for French spacing in the parser causes problems in several places, for example in T5158.
Replacement rules in the editor are better. To avoid ugly &nbsp; entities in the WikiEditor, the Unicode character U+00A0 should be used directly. T181677 implements syntax highlighting for U+00A0 in the CodeMirror wikitext editor.
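A minimal sketch of the editor-side alternative, assuming a hypothetical `suggest_nbsp` helper rather than any actual WikiEditor or CodeMirror API: the rule rewrites the wikitext once, on demand, stores U+00A0 as a character instead of an entity, and reports how many replacements it made so the author can review and revert unwanted ones before saving.

```python
import re

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE, stored as a character, not as &nbsp;

# Hypothetical editor-side rule: a plain space after § that precedes a non-space.
SECTION_SPACE = re.compile(r"§ (?=\S)")


def suggest_nbsp(wikitext: str) -> tuple[str, int]:
    """Return the rewritten wikitext and the number of replacements made.

    Intended to run on demand in the editor, so the author can review
    the diff and undo any unwanted replacement before saving.
    """
    return SECTION_SPACE.subn("§" + NBSP, wikitext)


text = "Nach § 175 StGB und § 175a StGB"
new_text, count = suggest_nbsp(text)
print(count)                      # → 2
# Running the rule again finds nothing left to replace.
print(suggest_nbsp(new_text)[1])  # → 0
```

Because the non-breaking space is saved in the wikitext itself, the parser does not have to guess at output time, and any mistaken replacement is visible (and fixable) in the source.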
@Harjotsingh: Hi! This task has been assigned to you a while ago. Could you maybe share an update? Do you still plan to work on this task? Thanks! :)
This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!
For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for the available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)
Change 332037 abandoned by Thiemo Kreuz (WMDE):
[mediawiki/core@master] Convert space after § to non-breaking spaces
Reason:
4 years old, disputed and in conflict. This is easy to redo or reopen if it's still needed.