
Locally created fallback should take precedent over config fallback
Open, Needs Triage, Public

Description

Issue:

  • In practice there are only zh-hans.json, zh-hant.json, and zh-hk.json (zh.json contains only the language variant name messages)
  • Currently, MediaWiki JSON messages take precedence over locally customized messages in the language fallback chain
  • This means that, to override a message in zh-hans.json, we need to duplicate the override for ALL languages that fall back to zh-hans
  • This makes custom system messages hard to maintain
  • We should be able to override zh-hans.json messages by creating ONLY /zh-hans, instead of creating all of /zh-hans, /zh-cn, /zh-sg, and /zh-my

Solution:

  • Split out "JSON fallback" and "MediaWiki: fallback"
  • For example: The fallback for "/zh-mo" pages / messages if the site language code is zh:
"/zh-mo" (no longer needs to be duplicated)
 => "zh-mo.json" (never exists)
 => "/zh-hk" (no longer always needs to be duplicated)
 => "zh-hk.json" (should simply fall back to here if the string is translated separately)
 => "/zh-hant" (should simply fall back to here if defined)
 => "zh-hant.json"
 => "(root MediaWiki: page)" ("/zh" pages are not used for backend)
 => "zh.json"
 => "/zh-hans"
 => "zh-hans.json"
 => "/en"
 => "en.json"
 => "qqx"
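
The interleaving proposed above can be sketched in Python. This is only an illustration, not MediaWiki code: the function name `interleaved_chain` and its parameters are hypothetical, and the fallback list for zh-mo is simply read off the chain above.

```python
def interleaved_chain(code, fallbacks, site_lang):
    """Return the lookup order: local page first, then JSON file, per language."""
    chain = []
    for lang in [code, *fallbacks]:
        # Assumption taken from the chain above: the site language's local
        # override is the root MediaWiki: page, not a "/<lang>" subpage.
        local = "(root MediaWiki: page)" if lang == site_lang else f"/{lang}"
        chain.append(local)
        chain.append(f"{lang}.json")
    # qqx is the final stop in the chain listed above.
    chain.append("qqx")
    return chain


chain = interleaved_chain(
    "zh-mo", ["zh-hk", "zh-hant", "zh", "zh-hans", "en"], site_lang="zh"
)
print(" => ".join(chain))
```

Each language contributes its local page immediately before its JSON file, so a single local /zh-hans override is reached by every code that falls back through zh-hans.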

Checklist:


zh-my, zh-sg, zh-mo, zh-tw, zh-cn, and zh are no longer maintained as interface language variants, thanks to the fallback scheme. The currently maintained ones are zh-hant, zh-hans, and zh-hk. It should be stressed that although these variants are disabled, the existing messages have not been removed from the system, which causes maintenance overhead for downstream administrators. Suppose a system message has all the legacy variants set in the MediaWiki software: the interface administrator of a site then needs to create all 8 or so pages for that single message, because otherwise each variant falls back to the default MediaWiki message instead of the fallback message on the local instance.
So if MSG/zh-my and its identical MSG/zh-hans both exist in the MediaWiki software, and MSG/zh-hans is updated (to use some templates, say), MSG/zh-my needs to be created as well; otherwise it falls back to the system message.


After further digging, it seems the problem exists because the read order is MW:MSG/zh-sg, SYS:MSG/zh-sg, SYS:MSG/zh-hans, MW:MSG/zh-hans.
I would propose something like MW:MSG/zh-sg, SYS:MSG/zh-sg, MW:MSG/zh-hans, SYS:MSG/zh-hans, where SYS denotes the defaults in the config files and MW denotes messages created locally in the database.

Related Objects

Event Timeline


@Winston_Sung You want to backport this patch? I think it may not fulfil the backport policy though, it's more like a new feature and includes a bunch of deprecation.

I think it may not fulfil the backport policy though

Is there link to the backport policy?

It's more like a new feature

I think it's more like fixing unintended behavior.

includes a bunch of deprecation

I didn't include the deprecation in the cherry-picks, though.

I think it may not fulfil the backport policy though

Is there link to the backport policy?

https://www.mediawiki.org/wiki/Backporting_policy

It's more like a new feature

I think it's more like fixing unintended behavior.

I think it at least should not be backported to 1.39. Maybe 1.40 is fine since it's still in the rc stage, but you may want to consult @Reedy, who is handling the releases.

See also https://www.mediawiki.org/wiki/Version_lifecycle#Release_schedule but this isn't followed so rigorously.

We don't tend to backport features, but occasionally it happens. Primarily, the key is to avoid introducing breaking changes in patch releases. Sometimes there is no way to avoid it, and we occasionally have to do so.

If it was included in REL1_40, the deprecations could potentially move to 1.40. But doesn't matter too much either way for that.

I can see it being both a feature, but also a bug fix.

The problem I have, is really, REL1_40 is already late, and is going out today. The patch is not overly well tested, though it wasn't reverted, so that's always a good start after a couple of train cycles.

And looking at the backport patches, there's some divergence against master with tests/parser/ParserTestRunner.php. Generally, I try and bring in cherry picks of patches where possible (modifying as needed), rather than just copying code randomly in from other patches.

For example, from what I can tell in the patches, chunks of rMW886e52503f6d: Cleanup ParserTestRunner - take 2 are being copied in to the backports, without even any attribution.

Then, this is the second attempt at a patch which was originally reverted.

Which creates divergent code from master, without it being obvious where these changes came from. Creating potential maintenance issues down the line. Remembering REL1_39 is an LTS and needs to be supported for years.

For REL1_40, I'd prefer rMWbfd4001c6c22: LocalisationCache: Preserve fallback source language info was copied in as close as possible, and then if it's useful/necessary, chain rMW886e52503f6d: Cleanup ParserTestRunner - take 2 ontop. Rather than just arbitrarily merging two commits into one as a backport. If supporting ParserTestRunner changes are needed, cherry pick those standalone too...

For REL1_39, I'd honestly prefer something similar too. But I don't know how many supporting changes would be needed (to ParserTestRunner), as it's ~9 months behind.

For example, from what I can tell in the patches, chunks of rMW886e52503f6d: Cleanup ParserTestRunner - take 2 are being copied in to the backports, without even any attribution.

Oh, I was wondering where the merge conflicts came from.

If it's useful/necessary, chain rMW886e52503f6d: Cleanup ParserTestRunner - take 2 ontop.
If supporting ParserTestRunner changes are needed, cherry pick those standalone too...

Any thoughts? @Func

If it's useful/necessary, chain rMW886e52503f6d: Cleanup ParserTestRunner - take 2 ontop.
If supporting ParserTestRunner changes are needed, cherry pick those standalone too...

Any thoughts? @Func

Not needed for this task; I would rather not touch that part. Even if we want to fix T340390, we can do a minimal fix instead.

Yeah, I'd prefer minimum fix, too.

Reopened as more work is needed.

Is it worth backporting these patches then?

This should be a good candidate for TechNews since it will help many wikis with localisation, and we should inform interface admins of the best practice to maintain local localisation message overrides from now.

The task description is relatively poor, so I would try to write something useful here.

After wmf.16 rolls out (and wmf.18 fixes T340840), wikis will no longer be required to override the MediaWiki: pages of all sub-language codes; they can maintain local overrides more easily if falling back to the next local override is acceptable. For example, enwiki doesn't have to create en-ca and en-gb subpages with transclusion of the base pages anymore.

For Tech News, is this ready to go out in the upcoming edition? (I.e. drafted this week, and sent on Monday)
If so, does this accurately represent the description? (Edits or approval would be appreciated!)

Changes later this week

  • MediaWiki system messages will now fallback to a local fallback if it is available, instead of always using the default fallback. This means wikis no longer need to override each fallback-language separately. For example, English Wikipedia doesn't have to create en-ca and en-gb subpages with a transclusion of the base pages anymore. This makes it easier to maintain local overrides.

Perhaps it would also be good to add a final sentence reminding folks to request changes to the defaults when applicable, e.g. something like this?

If a message change will benefit all projects, then please do remember to file tasks to update the default messages instead of creating an override.

Lastly, a reminder to update anything needed in the main documentation, e.g. https://www.mediawiki.org/wiki/Help:System_message - Thanks!

For Tech News, is this ready to go out in the upcoming edition? (I.e. drafted this week, and sent on Monday)

Yeah, this is ready since the fix for the edit interface will be shipped with wmf.18.

If so, does this accurately represent the description? (Edits or approval would be appreciated!)

Changes later this week

  • MediaWiki system messages will now fallback to a local fallback if it is available, instead of always using the default fallback. This means wikis no longer need to override each fallback-language separately. For example, English Wikipedia doesn't have to create en-ca and en-gb subpages with a transclusion of the base pages anymore. This makes it easier to maintain local overrides.

Overall looks good, but I wonder if "fallback to a local fallback" is a little bit confusing.

I will try:

  • MediaWiki system message system will now look for available local fallbacks, instead of always using the default fallback defined by software. This means wikis no longer need to override each language on the fallback chain separately. For example, English Wikipedia doesn't have to create en-ca and en-gb subpages with a transclusion of the base pages anymore. This makes it easier to maintain local overrides.

Feel free to amend.


Perhaps it would also be good to add a final sentence reminding folks to request changes to the defaults when applicable, e.g. something like this?

If a message change will benefit all projects, then please do remember to file tasks to update the default messages instead of creating an override.

Translations of localisation messages are managed on translatewiki.net, and the edit notice of local message pages already has a link pointing there, so I think we don't need to mention this again.

Lastly, a reminder to update anything needed in the main documentation, e.g. https://www.mediawiki.org/wiki/Help:System_message - Thanks!

It seems this page doesn't have much relevant information, I will try to search for and fix other pages, or extend that page later.
Maybe different communities documented the annoying old behaviour locally.

Test wiki on Patch demo by Func86 using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/c4810be08f/w/

@Quiddity - is there a way to tell what the "fall backs" are on a project?

@Xaosflux I believe the Special:AllMessages page's filters are the way to do that? E.g. mw's en-ca overrides, or enwiki's en-gb overrides. I'm not deeply familiar with this set of features though, so there might be other/better tools available?

@Func That's great, thank you for the draft and the context about the existing edit notice. I've added to the tech news draft. It will be frozen for translations on Friday, if any edits are needed before then.

@Quiddity - is there a way to tell what the "fall backs" are on a project?

You can find them by following the fallback chain.
For example, en-ca and en-gb fall back to en, which is the base page for wikis with en as their default content language.
The ultimate fallback for languages within the family (e.g. English) of the content language is usually the base page.

With the fix for T340840 deployed, you will also be able to see the correct fallback result when viewing non-existent subpages.

Thanks for the note Func, digging though files is a mess - guess a FR for making a front-end lookup/display would be needed

Thanks for the note Func, digging though files is a mess - guess a FR for making a front-end lookup/display would be needed

I created T342669 for improving Special:AllMessages, not sure if that covered your points, feel free to extend that task.

Hum, since we have done the work in this task, we should merge it into this one.

(removed patch-for-review since only some backports are attached to this task)

Change 933684 merged by jenkins-bot:

[mediawiki/core@REL1_39] LocalisationCache: Preserve fallback source language info

https://gerrit.wikimedia.org/r/933684

Change 933685 merged by jenkins-bot:

[mediawiki/core@REL1_40] LocalisationCache: Preserve fallback source language info

https://gerrit.wikimedia.org/r/933685

Is work on this still ongoing? In production noticed that en-ca messages are falling back to locally defined en messages, however en-gb messages are not.

Winston_Sung reopened this task as In Progress.
Winston_Sung updated the task description. (Show Details)
  • (further check needed)

In production noticed that en-ca messages are falling back to locally defined en messages, however en-gb messages are not.

That's by design. en-gb has its own software-defined message, which may or may not differ much from the one for en, so it does not fall back to the override created for en.
If the difference is not necessary, you may go to translatewiki to delete the en-gb subpage.

Wait, so if anyone on that external site creates a page it will still override local? Shouldn't it only use that fall back when there is not a locally defined fallback? Is this something special only for en-gb? The user story in the description of this task says that you should only have to define the root language, not mirror it to every possible variant in order to override them.

The lookup sequence for en-gb in this particular instance is: /en-gb message subpage (does not exist) -> en-gb.json (got the message here, stop) -> message base page (en) -> en.json
This is to avoid falling back too aggressively; it might be acceptable for en-gb to always fall back to en local overrides, but that's not true for other languages.
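
As a rough sketch of that sequence (not the actual MessageCache or LocalisationCache code), the first source that has the message wins. The `resolve` and `sequence_for` helpers, the dictionary layout, and the message texts are all made up for illustration; only the message name Blockiptext and the lookup order come from this thread.

```python
def resolve(key, sequence, local_pages, json_files):
    """Walk the lookup sequence and return (source, text) for the first hit."""
    for kind, lang in sequence:
        store = local_pages if kind == "local" else json_files
        text = store.get((key, lang))
        if text is not None:
            return (f"{kind}:{lang}", text)
    return (None, None)


def sequence_for(lang):
    # /<lang> subpage -> <lang>.json -> message base page (en) -> en.json
    return [("local", lang), ("json", lang), ("local", "en"), ("json", "en")]


# Hypothetical wiki state: only the base page (en) is overridden locally,
# and the software ships a separate en-gb default.
local_pages = {("blockiptext", "en"): "Custom wiki text"}
json_files = {
    ("blockiptext", "en"): "Default text (vandalized)",
    ("blockiptext", "en-gb"): "Default text (vandalised)",
}

en_gb = resolve("blockiptext", sequence_for("en-gb"), local_pages, json_files)
en_ca = resolve("blockiptext", sequence_for("en-ca"), local_pages, json_files)
print(en_gb)  # stops at en-gb.json
print(en_ca)  # reaches the local base page
```

With this data, en-ca reaches the local override because there is no en-ca default, while en-gb never gets past en-gb.json.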

The user story in the description of this task says that you should only have to define the root language, not mirror it to every possible variant in order to override them.

This is still true for most en-gb messages since en-gb.json only has a few lines of definition.

Well, that's annoying - suppose we would need a highly advertised request to avoid this by having WMF wiki's "not accept en/xxx variants from translatewiki" anymore?

I've attempted to get someone there to delete that page (https://translatewiki.net/wiki/MediaWiki:Blockiptext/en-gb)

Well, that's annoying - suppose we would need a highly advertised request to avoid this by having WMF wiki's "not accept en/xxx variants from translatewiki" anymore?

Does that mean spelling variants would need to be recreated on every WMF wiki?

Sorry, it's not safe to place deletion template on message page, so I have undone your change.

And, the difference is en-gb message uses vandalised instead of vandalized, I am not sure if it's good to delete.

Does that mean spelling variants would need to be recreated on every WMF wiki? -- not if they are localized, only if they are on that tiny list coming from external source

So this behavior is the opposite of the "Locally created fallback should take precedent over config fallback" idea in this ticket: the config fallback messages are being used when the variant is not localized, even though the root language is localized.

Does that mean spelling variants would need to be recreated on every WMF wiki? -- not if they are localized, only if they are on that tiny list coming from external source

What exactly do you mean by "localized", and what evidence do you have that the mentioned list is "tiny"?

As far as local:

If Mediawiki:X1 isn't created locally (localized), we expect it will use the mediawiki default text
If it is created, local users expect that people using their project's native language will see it instead
The gist of this ticket is that if someone has picked a variant of the project's native language, showing them the localized version is preferable to the upstream off-project version

If someone has picked a variant of the project's native language, showing them the localized version is preferable to the upstream off-project version.

Not really in all cases.

Not using aggressive fallback is intentional:

  • Giving an untranslated custom message is actually bad for "information accessibility" and may convey no information at all (compared to giving the default translated message, which has more "information accessibility").
  • In some cases, we just want to fix the translation "only for a specified language code" and don't want to override other languages' translations.

We could find solutions that fit the "en case", but making fallback aggressive in all languages is not a good idea.

I added T349115 as a subtask since it is a (minor) bug introduced in rMWbfd4001c6c229657091d866ae51e2cbb5979344a. I proposed a patch.

Winston_Sung moved this task from MediaWiki core to Closed on the Chinese-Sites board.

Reject closure due to the unresolved concern I re-raised above. If it needs to be split out, that should be done first.

Does that mean spelling variants would need to be recreated on every WMF wiki? -- not if they are localized, only if they are on that tiny list coming from external source

They ALREADY need to be created, both before AND after the change.

Before the change:

  • Custom en-CA → Default en-CA (use fallback'd default en, never load custom en)
  • Custom en-GB → Default en-GB (won't load default en if default en-gb exist, never load custom en)

After the change:

  • Custom en-CA → Default en-CA → [added] Custom en → [moved fallback sequence] Default en
  • Custom en-GB → Default en-GB → [added] Custom en (still won't load as default en-gb already defined) → [moved fallback sequence] Default en

Xaosflux wants:

Custom en-CA -> custom en -> default en-ca -> default en
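
To make the difference between the order after the change and the order requested here concrete, here is a small Python comparison. The slot names, the `first_hit` helper, and the set of available sources are hypothetical; the two orders are copied from the comments above.

```python
# Order after the change, as described in the "After the change" list.
IMPLEMENTED = ["custom:{v}", "default:{v}", "custom:en", "default:en"]
# Order requested in this comment.
REQUESTED = ["custom:{v}", "custom:en", "default:{v}", "default:en"]


def first_hit(order, variant, available):
    """Return the first source in the order that actually exists."""
    for slot in order:
        source = slot.format(v=variant)
        if source in available:
            return source
    return None


# Hypothetical state: the wiki overrides en; the software ships en and en-gb.
available = {"custom:en", "default:en", "default:en-gb"}

results = {
    variant: (
        first_hit(IMPLEMENTED, variant, available),
        first_hit(REQUESTED, variant, available),
    )
    for variant in ("en-ca", "en-gb")
}
print(results)
```

Under these assumptions the two orders agree for en-ca and differ only for en-gb, where a software default for the variant exists.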

That seems right to me too - wiki customizations are more important than which variety of English is used.

The user story in the description of this task says that you should only have to define the root language, not mirror it to every possible variant in order to override them.

It did not say that.

What it said is to make the custom messages part of the fallback chain instead of ignoring them (which is what we did before).

Xaosflux wants:

Custom en-CA -> custom en -> default en-ca -> default en

That seems right to me too - wiki customizations are more important than which variety of English is used.

Imagine that your user language is zh-hans and one day someone creates custom messages in en/zh-hant/whatever in the language fallback chain; then boom! The zh-hans user interface is now filled with English/whatever non-zh-hans language's system messages.

This comment was removed by Pppery.

(based on an outdated version of the previous comment)

Wouldn't it be possible to apply that only to language variants, and use the other rules for totally different languages, though?

And the wiki may even want the behavior you're describing as "boom". On Spanish Wikipedia with my interface language set to English, I think I'd still rather see the lengthy message explaining why the Spanish Wikipedia Main Page is protected at https://es.wikipedia.org/w/index.php?title=Wikipedia:Portada&action=edit&uselang=es than the default at https://es.wikipedia.org/w/index.php?title=Wikipedia:Portada&action=edit&uselang=en even though I speak only a little Spanish and am a native English speaker.

It looks more like a matter of custom content vs. custom translation.

  • Custom translation:
    • "Discussion" vs. "Talk" vs. ...
    • "Publish" vs. "Save" vs. ...
  • Custom content:
    • "This page has been protected to prevent editing or other actions. ..." vs. "This is the source code of the main page of the site. You can propose changes on ..."
    • "This action has been identified as harmful ..." vs. "This edit does not comply ..."