Disable translate extension on non-wikitext page
Closed, Resolved (Public)

Description

Right now, typing a <translate> tag on a script page (that is, a page ending in .js in user space or in the MediaWiki namespace) makes the Translate extension kick in, which means it's impossible to save the page with unbalanced tags.

It'd be good to implement some way of disabling this, so that ugly workarounds such as regex groups or string concatenation (e.g., /(<)(translate>)/ or "<tran" + "slate>") are no longer needed.
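
For illustration, the two workarounds mentioned above might look like this in a user script. This is only a sketch: the variable names and the sample string are made up. The point is that the page source never contains the literal opening tag, so the extension has nothing to trip over.

```javascript
// Workaround 1: build the tag from two string pieces so the literal
// opening tag never appears in the page source.
const openTag = "<tran" + "slate>";

// Workaround 2: split the tag across two capture groups in a regex
// literal. The regex still matches the full tag in the text it scans.
const openTagPattern = /(<)(translate>)/;

// Hypothetical sample input, also built by concatenation.
const sample = "before " + openTag + " after";

console.log(openTagPattern.test(sample)); // true
```

Both forms work, but they make the script harder to read, which is the motivation for this task.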

Event Timeline

By script pages, do you mean user page subpages ending in .js and MediaWiki namespace pages ending in .js, or something else?

Yes, sorry.

Change 444549 had a related patch set uploaded (by Matěj Suchánek; owner: Matěj Suchánek):
[mediawiki/extensions/Translate@master] Disable Translate extension on non-wikitext pages

https://gerrit.wikimedia.org/r/444549

Change 444549 abandoned by Matěj Suchánek:
[mediawiki/extensions/Translate@master] Disable Translate extension on non-wikitext pages

Reason:
Probably not the best way to do it

https://gerrit.wikimedia.org/r/444549

AntiCompositeNumber renamed this task from Disable translate extension on script pages to Disable translate extension on non-wikitext page. Sep 4 2021, 12:04 AM

Change 718014 had a related patch set uploaded (by AntiCompositeNumber; author: AntiCompositeNumber):

[mediawiki/extensions/Translate@master] PageTranslation: Prevent non-wikitext pages from being marked for translation

https://gerrit.wikimedia.org/r/718014

Can we have a list of non-wikitext pages across Wikimedia that are currently translatable? I recall seeing some smart solution, but I don’t even remember on which wiki, and I can’t query them on Quarry due to T290378.

There isn't an easy way to do that, since there's also no simple API to list translation source pages. However, I did find out in testing that the translation pages keep the same content model as the source page, so we can use the hack of looking for /en subpages to find most translation pages. That will show some false positives of course.
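
The /en heuristic described above can be sketched roughly as follows. The page records, field names, and function name here are all hypothetical; the actual queries were run against the wiki databases and are not reproduced in this thread.

```javascript
// Hypothetical page records. On a real wiki these would come from a
// database query, which is not shown here.
const pages = [
  { title: "Help:Example/en", contentModel: "wikitext" },
  { title: "User:Example/script.js/en", contentModel: "javascript" },
  { title: "MediaWiki:Gadget-foo.css", contentModel: "css" },
  { title: "Template:Data.json/en", contentModel: "json" },
];

// Keep pages that look like English translation pages ("/en" suffix)
// and whose content model is not wikitext.
function findLikelyNonWikitextTranslations(records) {
  return records
    .filter((p) => p.title.endsWith("/en") && p.contentModel !== "wikitext")
    .map((p) => ({
      translationPage: p.title,
      // Assumed source page: the title with the "/en" suffix removed.
      sourcePage: p.title.slice(0, -"/en".length),
      contentModel: p.contentModel,
    }));
}

const candidates = findLikelyNonWikitextTranslations(pages);
console.log(candidates.map((c) => c.sourcePage));
```

Any manually created page whose title happens to end in /en would also match, which is where the false positives come from.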

Queries for the content wikis with $wmgUseTranslate: commons meta mediawiki testwiki wikidata wikispecies wikisource betawikiversity incubator frwiktionary

I found the following:

Everything else on commons, meta, mediawiki, testwiki, and incubator was either a manual translation or a Flow talk page; all other queried wikis had no results.

There isn't an easy way to do that, since there's also no simple API to list translation source pages.

Actually there is (see the noticeboard I linked in T290378), it’s just not available for those who haven’t signed a non-disclosure agreement with Wikimedia…

However, I did find out in testing that the translation pages keep the same content model as the source page, so we can use the hack of looking for /en subpages to find most translation pages.

And has it always been so? What I recall was years ago. (Of course, this also makes it more probable that I recall it incorrectly, or that the hack has been removed since.) If a translatable page hasn’t been changed for years, its /en subpage won’t have the correct content model unless it was set correctly years ago.

That will show some false positives of course.

And it may have some false negatives as well (although what I recall is certainly not among them): while the vast majority of translatable pages are in English, not all of them are. A few pages are translated from another language, and if they haven’t been translated into English, they won’t be found by your queries.

Thanks! Even though I have some concerns above, I’m more confident with these queries that nothing will break.

Coming at it another way, I used Global Search to look for \<translate\> in all JS, CSS, and JSON pages and the entire Module namespace on all public Wikimedia wikis.

Generated using Wikimedia Global Search on 2021-09-04 17:37

Generated using [https://global-search.toolforge.org/?q=%5C%3Ctranslate%5C%3E&regex=1&namespaces=828%2C486&title= Wikimedia Global Search] on 2021-09-04 17:41

Wiki            Page title
www.mediawiki   Module:Transcluder/doc
www.mediawiki   Module:Yesno/doc
www.mediawiki   Module:Template translation/doc
www.mediawiki   Module:No globals/doc
www.mediawiki   Module:Arguments/doc
www.mediawiki   Module:String/doc
www.mediawiki   Module:Message box/configuration/doc
www.mediawiki   Module:Version/doc
www.mediawiki   Module:Tmpl/doc
www.mediawiki   Module:Message box/doc
www.mediawiki   Module:Int/doc
pl.wikimedia    Moduł:ModuleMsg
meta.wikimedia  Module:Date/sandbox
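
As a small illustration, the query string in the Global Search link above can be rebuilt from its parameters, and the same regex can be tried against local text. The parameter values are copied from that link; the sample strings are made up, and nothing here contacts the search tool.

```javascript
// Search parameters as they appear in the Global Search link.
const params = new URLSearchParams({
  q: "\\<translate\\>", // regex source: \<translate\>
  regex: "1",
  namespaces: "828,486", // namespace IDs copied from the link
  title: "",
});
const searchUrl = "https://global-search.toolforge.org/?" + params.toString();
console.log(searchUrl);

// The same pattern applied locally to hypothetical page text.
const pattern = new RegExp("\\<translate\\>");
console.log(pattern.test("local msg = '<translate>Hello</translate>'")); // true
console.log(pattern.test("local msg = 'Hello'")); // false
```

A match only says that the string occurs in the page source. As noted below, it does not tell you whether the tag is used as markup or just as a string.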

The only newly found potential problem is https://meta.wikimedia.org/wiki/Special:PrefixIndex/User:Jeph_paul/. "Luckily" for the purposes of this task, all the arbitrary translated JavaScript is stored as wikitext, and the two actual JavaScript files are not directly translated. Of course, this means there is no protection whatsoever for the arbitrary JavaScript loaded and run by the script, which is a terrible idea that I'm probably going to have to go find someone to clean up, but this task won't break it. Yay! https://meta.wikimedia.org/wiki/Special:PrefixIndex/Meta:AddMe/ is similar, but at least it loads the wikitext as JSON instead of arbitrary JavaScript. (T238386, T156210)

Everything else is just doc pages or <translate> used as a string, not as markup to be parsed.

https://meta.wikimedia.org/wiki/Module:Date/sandbox, a sandbox where @Pols12 was trying to translate 2 error messages

I don’t remember the result of this test. However, I have also tried to use Translate in a JSON page: meta:Template:Years_or_months_ago/l10n.json. It is parsed as JSON by meta:Module:Years or months ago. However, this does not work when the content model is JSON, so I use the wikitext content model.

Solving this task should not create an obstacle for T156210 or T155100: instead of fully disabling Translate extension on non-wikitext pages, we could create a keyword {{DISABLETRANSLATE}} which would do the job, for example.

Magic words work only in wikitext, which is exactly the content model we don’t want to disable Translate for. Also, the current implementation is only about page translation, i.e. <translate> tags, and as far as I see, T156210 is not planned to be resolved with page translation, while T155100 is not even possible (page translation puts translation on subpages, while tabular data / Module:TNT uses one page for all translations).

Tested patch set #6 - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/718014/6

Test Case #1 - Unbalanced translate tags - FAIL

Steps

  1. Create a page with the JavaScript content model
  2. Add a <translate> tag but skip the closing tag

Expected

Saving a page with an unbalanced translate tag should be allowed if the page content model is not wikitext.

Observation

Did not expect an error, and expected the page save operation to work. Instead, the save failed with:

Error: Unbalanced <translate> tag.

Technical

PageTranslationHook::tpSyntaxError checks for TextContent.
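
The behaviour expected in Test Case #1 can be sketched as a check that is skipped for anything other than wikitext. This is not the extension's code (the extension is written in PHP); the function name and the simple tag counting are assumptions made for illustration, and real tag parsing is more involved.

```javascript
// Hypothetical check: returns an error message, or null if the page
// may be saved.
function checkTranslateTags(text, contentModel) {
  // Assumption: only wikitext pages are subject to the tag check.
  if (contentModel !== "wikitext") {
    return null;
  }
  // Simplified balance test: compare counts of opening and closing tags.
  const opens = (text.match(/<translate>/g) || []).length;
  const closes = (text.match(/<\/translate>/g) || []).length;
  return opens === closes ? null : "Unbalanced <translate> tag.";
}

// A JavaScript page with a lone opening tag is allowed through.
console.log(checkTranslateTags("var s = '<translate>';", "javascript"));
// A wikitext page with the same problem is rejected.
console.log(checkTranslateTags("<translate>Hello", "wikitext"));
```

With a gate like this, the unbalanced tag in step 2 would no longer block the save on a JavaScript page, while wikitext pages keep the existing validation.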

Test Case #2 - New pages - OK

Steps

  1. Create a page with the text content model
  2. Add balanced translate tags

Expected

Should not see a "Mark page for translation" header.

Observation

As expected

Test Case #3 - Existing translatable pages - OK

Steps

  1. Take an existing translatable page
  2. Change content model to text
  3. DO NOT edit the page

Expected

  1. Page should appear under "Pages with pending changes" section
  2. Should be able to mark the page for translation
  3. Should be able to translate the content

Observation

  1. Page appears under the "Pages with pending changes" section
  2. After updating the page content model, I'm still able to mark the page for translation. Note that I've not edited the content.
  3. Able to translate the content

Test Case #3.1 - OK

Steps

  1. Perform steps in TC #3
  2. Edit the translatable page

Expected

  1. After the page is edited, it should appear in the "Broken pages" section under Page Translation

Observation

  1. As expected the page appears under "Broken pages"

Change 718014 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] PageTranslation: Prevent non-wikitext pages from being marked for translation

https://gerrit.wikimedia.org/r/718014
