Page MenuHomePhabricator

PHP Warning: Wikimedia\RemexHtml\DOM\DOMBuilder::createNode(): unterminated entity reference
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   PHP Warning: Wikimedia\RemexHtml\DOM\DOMBuilder::createNode(): unterminated entity reference          & 'sm'
exception.trace
from /srv/mediawiki/php-1.40.0-wmf.12/vendor/wikimedia/remex-html/src/DOM/DOMBuilder.php(241)
#0 /srv/mediawiki/php-1.40.0-wmf.12/vendor/wikimedia/remex-html/src/DOM/DOMBuilder.php(241): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.40.0-wmf.12/vendor/wikimedia/remex-html/src/DOM/DOMBuilder.php(321): Wikimedia\RemexHtml\DOM\DOMBuilder->createNode(Wikimedia\RemexHtml\TreeBuilder\Element)
#2 /srv/mediawiki/php-1.40.0-wmf.12/vendor/wikimedia/remex-html/src/TreeBuilder/TreeBuilder.php(277): Wikimedia\RemexHtml\DOM\DOMBuilder->insertElement(integer, Wikimedia\RemexHtml\TreeBuilder\Element, Wikimedia\RemexHtml\TreeBuilder\Element, boolean, integer, integer)
#3 /srv/mediawiki/php-1.40.0-wmf.12/vendor/wikimedia/remex-html/src/TreeBuilder/TreeBuilder.php(257): Wikimedia\RemexHtml\TreeBuilder\TreeBuilder->insertForeign(string, string, Wikimedia\RemexHtml\Tokenizer\LazyAttributes, boolean, integer, integer)
#4 /srv/mediawiki/php-1.40.0-wmf.12/vendor/wikimedia/remex-html/src/TreeBuilder/InBody.php(456): Wikimedia\RemexHtml\TreeBuilder\TreeBuilder->insertElement(string, Wikimedia\RemexHtml\Tokenizer\LazyAttributes, boolean, integer, integer)
#5 /srv/mediawiki/php-1.40.0-wmf.12/vendor/wikimedia/remex-html/src/TreeBuilder/Dispatcher.php(420): Wikimedia\RemexHtml\TreeBuilder\InBody->startTag(string, Wikimedia\RemexHtml\Tokenizer\LazyAttributes, boolean, integer, integer)
#6 /srv/mediawiki/php-1.40.0-wmf.12/vendor/wikimedia/remex-html/src/Tokenizer/Tokenizer.php(1479): Wikimedia\RemexHtml\TreeBuilder\Dispatcher->startTag(string, Wikimedia\RemexHtml\Tokenizer\LazyAttributes, boolean, integer, integer)
#7 /srv/mediawiki/php-1.40.0-wmf.12/vendor/wikimedia/remex-html/src/Tokenizer/Tokenizer.php(595): Wikimedia\RemexHtml\Tokenizer\Tokenizer->handleAttribsAndClose(integer, string, boolean, integer)
#8 /srv/mediawiki/php-1.40.0-wmf.12/vendor/wikimedia/remex-html/src/Tokenizer/Tokenizer.php(369): Wikimedia\RemexHtml\Tokenizer\Tokenizer->dataState(boolean)
#9 /srv/mediawiki/php-1.40.0-wmf.12/vendor/wikimedia/remex-html/src/Tokenizer/Tokenizer.php(178): Wikimedia\RemexHtml\Tokenizer\Tokenizer->executeInternal(boolean)
#10 /srv/mediawiki/php-1.40.0-wmf.12/includes/ResourceLoader/VueComponentParser.php(110): Wikimedia\RemexHtml\Tokenizer\Tokenizer->execute()
#11 /srv/mediawiki/php-1.40.0-wmf.12/includes/ResourceLoader/VueComponentParser.php(66): MediaWiki\ResourceLoader\VueComponentParser->parseHTML(string)
#12 /srv/mediawiki/php-1.40.0-wmf.12/includes/ResourceLoader/FileModule.php(1372): MediaWiki\ResourceLoader\VueComponentParser->parse(string, array)
#13 /srv/mediawiki/php-1.40.0-wmf.12/includes/ResourceLoader/FileModule.php(341): MediaWiki\ResourceLoader\FileModule->getPackageFiles(MediaWiki\ResourceLoader\DerivativeContext)
#14 /srv/mediawiki/php-1.40.0-wmf.12/includes/ResourceLoader/Module.php(817): MediaWiki\ResourceLoader\FileModule->getScript(MediaWiki\ResourceLoader\DerivativeContext)
#15 /srv/mediawiki/php-1.40.0-wmf.12/includes/ResourceLoader/Module.php(786): MediaWiki\ResourceLoader\Module->buildContent(MediaWiki\ResourceLoader\DerivativeContext)
#16 /srv/mediawiki/php-1.40.0-wmf.12/includes/ResourceLoader/ResourceLoader.php(1109): MediaWiki\ResourceLoader\Module->getModuleContent(MediaWiki\ResourceLoader\DerivativeContext)
#17 /srv/mediawiki/php-1.40.0-wmf.12/extensions/WikimediaMaintenance/blameStartupRegistry.php(139): MediaWiki\ResourceLoader\ResourceLoader->makeModuleResponse(MediaWiki\ResourceLoader\DerivativeContext, array)
#18 /srv/mediawiki/php-1.40.0-wmf.12/maintenance/includes/MaintenanceRunner.php(309): BlameStartupRegistry->execute()
#19 /srv/mediawiki/php-1.40.0-wmf.12/maintenance/doMaintenance.php(85): MediaWiki\Maintenance\MaintenanceRunner->run()
#20 /srv/mediawiki/php-1.40.0-wmf.12/extensions/WikimediaMaintenance/blameStartupRegistry.php(336): require_once(string)
#21 /srv/mediawiki/multiversion/MWScript.php(120): require_once(string)
#22 {main}
Impact
Notes
  • > 100 in the last 30m

Event Timeline

Krinkle subscribed.

RemexHtml is probably not at fault, but rather its caller is. Either RL's Vue converter or GrowthExperiments's Vue file.

Input is from this file:
https://gerrit.wikimedia.org/g/mediawiki/extensions/GrowthExperiments/+/c6d6595adc14df9fd94bd80bc93f37a68c9e703e/modules/ext.growthExperiments.Homepage.NewImpact/components/NoEditsDisplay.vue#22

				<c-text
					v-i18n-html:growthexperiments-homepage-impact-unactivated-subheader-subtext="[ userName ]"
					class="ext-growthExperiments-NoEditsDisplay__content__messages__subtext"
					:size="[ , , 'sm' ]"
					:weight="renderMode !== 'overlay-summary' && 'bold'"
				>
		<c-text
				v-if="isDisabled"
				:size="mode !== 'desktop' && 'sm'"
				color="subtle"
			>

@kostajh Based on that triage, I'm assuming you'll take a look first. If it turns out that Vue requires it this way, pass it back to us and we can look into updating out Vue parser in PHP to grok this somehow.

Change 865065 had a related patch set uploaded (by Kosta Harlan; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@master] User impact: avoid parse warnings in unactivated state

https://gerrit.wikimedia.org/r/865065

Vue's SFC Syntax Specification help page says "A Vue SFC is syntactically compatible with HTML." So I think VueComponentParser is doing the right thing when parsing .vue files as HTML. We'd either have to write &amp;&amp; for conjunction (bleh), or just avoid it altogether like the patch does.

This is a very easy mistake to make though: the syntax feels natural, the Vue compiler doesn't try to validate the source code of dynamic props as HTML so the code works, and few people look at the PHP error logs when testing a .vue file. So IMO it absolutely needs either a linter rule or better error reporting (instead of logging the Remex error to some PDR-3 channel, maybe it could be converted to a console.log command?) or both.

This is a very easy mistake to make though: the syntax feels natural, the Vue compiler doesn't try to validate the source code of dynamic props as HTML so the code works, and few people look at the PHP error logs when testing a .vue file. So IMO it absolutely needs either a linter rule or better error reporting (instead of logging the Remex error to some PDR-3 channel, maybe it could be converted to a console.log command?) or both.

I agree. Looping Design-System-Team, maybe they have a say on this. I think &amp;&amp; is not natural to a Javascript expression which is what lives inside Vue template bound props. I don't know much about Remex but I think parsing Vue SFC files as HTML is right but parsing the content of <template> tag as pure HTML might be tricky. Vue attributes like @keypress.up="someHandler" or :some-attribute="<some JS expression>" might be wrongly treated by standard HTML parsers. My main concern is the solution we apply is compatible with other Vue SFC parsers eg: Vite so a component written for a MW extension can be reused in a Vite stie and viceversa. Or can be compiled with standard rollup, webpack modules.

Looks like a bug in RemexHtml to me. Invalid input should be normalized a long time before it gets to DOMBuilder. The Tokenizer will call error(), which will propagate through to the pipeline to the errorCallback specified in the DOMBuilder constructor. But you probably didn't specify a callback, so the error would be discarded. The invalid references will be interpreted in a compliant way and the DOM should otherwise be normal.

On line 241 is the assignment

$attrNode->value = $attr->value;

Where $attrNode is a DOMAttr. I wouldn't expect that to be expanding character references in the new value. If you use &amp;&amp; instead of && you'll probably just get the same warning. The Tokenizer will convert it to &&, this time without calling the error callback, and then DOMAttr will raise the same warning.

As a quick fix (without knowing the specific context of what you are doing), I'd recommend moving the logic containing && out of the HTML part of the template and into a dedicated computed property. So something like:

<c-text v-if="isDisabled" :size="computedSize" color="subtle">
computedSize() {
    return this.mode !== 'desktop' && 'sm';
}

However, it would be good to know if this is another thing that developers in MediaWiki need to avoid doing inside Vue SFCs. There are a few other issues that have come up in the past (PascalCase component names and self-closing component tags are valid from Vue's perspective but won't work in MW, for example).

The full list of "things you can't do in Vue SFCs inside MW" that I'm aware of lives here: https://www.mediawiki.org/wiki/Vue.js/Guidelines#Use_Single-file_components. If anything here needs to be corrected or updated let me know.

php > $attrNode = new DOMAttr('test');
php > $attrNode->value = '&amp;';
php > print $attrNode->value;
&

Incredible and alarming. The manual points to the W3C spec -- @cscott, am I reading it correctly? This behaviour is explicitly disallowed, right?

JS console:

> var test = document.createAttribute('test');
> test.value = "&amp;"
"&amp;"
> test.value
"&amp;"

I've got a RemexHtml patch almost ready, but we should probably patch PHP as well.

As a quick fix (without knowing the specific context of what you are doing), I'd recommend moving the logic containing && out of the HTML part of the template and into a dedicated computed property. So something like:

There's no need to fix the input. The input is fine. It's OK to put bare ampersands in attributes. The warning is not due to bare ampersands.

Change 869324 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/libs/RemexHtml@master] Workaround for attribute expansion when setting DOMAttr::$value

https://gerrit.wikimedia.org/r/869324

There's no need to fix the input. The input is fine. It's OK to put bare ampersands in attributes. The warning is not due to bare ampersands.

Per the W3C spec, if the tokenizer encounters an ampersand in an attribute, it should try to consume a character reference:

  • && and & followed by whitespace is explicitly allowed and has no special meaning
  • & followed by a word which is not a valid character reference name is a parse error, and has no special meaning
  • & followed by a valid character reference name should be interpreted as a character reference, whether or not followed by a semicolon; if it isn't followed by a semicolon then it is also a parse error.

(The WhatWG spec is a little more confusingly worded but I think it says the same.)

Because && is explicitly allowed, this is a fringe issue since you'd have to use bitwise-and in some dynamic property to encounter it.

It also seems there is an unrelated issue with Remex and/or the PHP DOM implementation as it issues an unterminated entity reference & 'sm' error when encountering 'desktop' && 'sm', but per the rules above && should not be interpreted as the start of a character reference (and even a single & shouldn't be, when followed by space).

All that said, bare ampersands in attributes remain ambiguous: to give a very contrived example, the attribute string x&lt+1 (which is valid Javascript) should be parsed into the attribute value x<+1 (also valid Javascript, with an entirely different meaning). It would then get reserialized as x&lt;+1 for use as a ResourceLoader asset, after Remex has split the .vue DOM into CSS/JS/template parts – a third different valid Javascript snipper.

In practice this usually shouldn't be an issue because the JS coding style mandates space around operators (although do we actually lint .vue files? And does the linter know to treat the right attributes as Javascript?), but could still happen with e.g. a double-quoted attribute containing a single-quoted Javascript literal containing an ampersand.

As @Sgs says, the more important question is probably how the Vue template parser treats ampersands in attributes.

Here's a simple test case: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/869336 which has <layout :expression="x&lt+1"> in a template.

It results in module.exports.template="<layout :expression=\"x<+1\"><\/layout>";, with or without the patch in T324408#8480065.

I think that's wrong: the template string will be parsed by Vue, it should be left to that parser what to do with ampersands.

php > $attrNode = new DOMAttr('test');
php > $attrNode->value = '&amp;';
php > print $attrNode->value;
&

Incredible and alarming. The manual points to the W3C spec -- @cscott, am I reading it correctly? This behaviour is explicitly disallowed, right?

Agreed that PHP is not doing the right thing here.

Change 869324 merged by jenkins-bot:

[mediawiki/libs/RemexHtml@master] Workaround for attribute expansion when setting DOMAttr::$value

https://gerrit.wikimedia.org/r/869324

Change 870740 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Update wikimedia/remex-html to 3.0.3

https://gerrit.wikimedia.org/r/870740

Change 870739 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Update wikimedia/remex-html to 3.0.3

https://gerrit.wikimedia.org/r/870739

Change 870740 merged by jenkins-bot:

[mediawiki/vendor@master] Update wikimedia/remex-html to 3.0.3

https://gerrit.wikimedia.org/r/870740

Change 870739 merged by jenkins-bot:

[mediawiki/core@master] Update wikimedia/remex-html to 3.0.3

https://gerrit.wikimedia.org/r/870739

Thanks for the fix @tstarling! Happy to not have to check this at code review/lint level. We'll close the issue after wmf.17 reaches all wikis.

Change 876028 had a related patch set uploaded (by Reedy; author: C. Scott Ananian):

[mediawiki/vendor@REL1_39] Update wikimedia/remex-html to 3.0.3

https://gerrit.wikimedia.org/r/876028

Change 876046 had a related patch set uploaded (by Reedy; author: C. Scott Ananian):

[mediawiki/core@REL1_39] Update wikimedia/remex-html to 3.0.3

https://gerrit.wikimedia.org/r/876046

Change 876028 merged by Reedy:

[mediawiki/vendor@REL1_39] Update wikimedia/remex-html to 3.0.3

https://gerrit.wikimedia.org/r/876028

Change 876046 merged by jenkins-bot:

[mediawiki/core@REL1_39] Update wikimedia/remex-html to 3.0.3

https://gerrit.wikimedia.org/r/876046

Etonkovidova subscribed.

Checked wmf.18 - no errors were recorded after Jan 5, 2023 @ 19:51:12.209 logstash normalized_message:"unterminated entity reference".

  NODES
INTERN 1
Note 8
Project 11