
Level 2 headings are omitted from action=query extracts in the API on pages where discussion tools are enabled
Closed, Resolved, Public, BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:
None of the level 2 headings (Definition, Onset of illness, Blanching of rash) are returned by the API endpoint

What should have happened instead?:
Level 2 headings should be included in the extracts response

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline

Are you reporting that this used to work but no longer does (In which case when was the last time it worked?), or has it always been this way and you are suggesting that the module should be changed to work as you describe?

Yes exactly, it used to work but no longer does

mdwiki.org runs an older version and it works there

There are no interesting changes between 643fb6c (what mdwiki uses) and current master of TextExtracts. So either it's a configuration difference or a change in a different component

So theoretically both mdwiki and en wp should return the same results, BUT

Here is a page on en wp https://en.wikipedia.org/wiki/Wikipedia:VideoWiki/Dengue_fever
and the api https://en.wikipedia.org/w/api.php?action=query&format=json&prop=extracts&titles=Wikipedia:VideoWiki/Dengue_fever&explaintext=1&exsectionformat=wiki&redirects
only returns === sections, not == or =

The equivalent page on mdwiki is https://mdwiki.org/wiki/Video:Dengue_fever (they are not identical, but both have the three levels of headings)
and the equivalent api call https://mdwiki.org/w/api.php?action=query&format=json&prop=extracts&titles=Video:Dengue_fever&explaintext=1&exsectionformat=wiki&redirects
returns =, == and === sections.
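
For anyone who wants to compare the two wikis without reading the raw JSON, here is a minimal sketch that builds the same query and lists which heading levels appear in a plain-text extract. The function names `build_extract_url` and `heading_levels` are made up for this illustration, and the sample extract strings are invented rather than real API output.

```python
import re
from urllib.parse import urlencode

# With explaintext=1 and exsectionformat=wiki, headings in the extract
# look like "== Definition ==" on a line of their own.
HEADING_RE = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$", re.MULTILINE)


def build_extract_url(api_base, title):
    """Build the prop=extracts query used in this task (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
        "explaintext": 1,
        "exsectionformat": "wiki",
        # The URLs above pass a bare "redirects"; any value enables it.
        "redirects": 1,
    }
    return api_base + "?" + urlencode(params)


def heading_levels(extract):
    """Return the sorted set of heading levels found in a plain-text extract."""
    return sorted({len(m.group(1)) for m in HEADING_RE.finditer(extract)})


# Invented sample extracts, shaped like the two behaviours described above.
working = "Intro\n\n= Dengue fever =\nx\n\n== Definition ==\ny\n\n=== Blanching of rash ===\nz"
broken = "Intro\n\n=== Blanching of rash ===\nz"

print(build_extract_url("https://en.wikipedia.org/w/api.php",
                        "Wikipedia:VideoWiki/Dengue_fever"))
print(heading_levels(working))  # [1, 2, 3]
print(heading_levels(broken))   # [3]
```

Running `heading_levels` on the `extract` field returned by each wiki should make the difference visible at a glance.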

I have made both the EN WP and MDWiki articles exactly the same. And they still return different results via the API.

Okay @Bawolff figured out the issue. It is something called DiscussionTools https://www.mediawiki.org/wiki/Help:DiscussionTools

You can add this functionality to an article with __NEWSECTIONLINK__, and when this is done the extracts API stops returning the headings

https://en.wikipedia.org/w/index.php?title=User:Doc_James/Test&diff=1222436191&oldid=1222416242

The question is, is there a way to turn it off?

Bawolff renamed this task from Level 2 headings are omitted from action=query extracts in the API to Level 2 headings are omitted from action=query extracts in the API on pages where discussion tools are enabled. May 5 2024, 10:57 PM
Bawolff added a project: DiscussionTools.

Okay we appear to have this solved...

In terms of the solution, you could ask if this is the intended behavior of the extract api, but the workaround is to add __NOTALK__ to the {{Videowiki}} template. (Correct me if I am wrong.)

matmarex claimed this task.
matmarex edited projects, added TextExtracts; removed DiscussionTools, MediaWiki-Action-API.
matmarex subscribed.

Hi, I saw this issue reported a few weeks ago (I watch DiscussionTools bug reports), but I didn't find the time to investigate until now, especially after I saw it was marked as resolved.

I regret to say that the bug was only tangentially related to DiscussionTools, and while adding __NOTALK__ to the pages won't hurt, the issue will reappear soon. The root cause is that the TextExtracts extension, which provides the action=query&prop=extracts API, removes all <div> tags from the page (see here), and we are in the process of adding such wrapper tags to headings (see Heading HTML changes). These changes happened on talk pages first (T314714), but deployments to all pages are in progress (T13555).
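
To illustrate the failure mode, here is a rough sketch in Python (not the extension's actual PHP code) of a formatter that discards everything inside <div> elements. The class name, the helper and both HTML snippets are assumptions made up for this example; the wrapped snippet only approximates the new heading markup.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class DivStripper(HTMLParser):
    """Collect heading text, ignoring anything nested inside a <div>."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0
        self.in_heading = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_depth += 1
        elif tag in HEADING_TAGS and self.div_depth == 0:
            # Only headings outside any <div> are kept.
            self.in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag in HEADING_TAGS:
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.headings[-1] += data


def surviving_headings(html):
    """Return the heading texts that remain once <div> contents are dropped."""
    parser = DivStripper()
    parser.feed(html)
    parser.close()
    return parser.headings


# Illustrative markup only: a bare heading versus one inside a wrapper <div>.
LEGACY_HTML = '<h2><span class="mw-headline" id="Definition">Definition</span></h2><p>Text</p>'
WRAPPED_HTML = '<div class="mw-heading mw-heading2"><h2 id="Definition">Definition</h2></div><p>Text</p>'

print(surviving_headings(LEGACY_HTML))   # ['Definition']
print(surviving_headings(WRAPPED_HTML))  # []
```

The heading itself is unchanged in both snippets; it disappears only because its new parent element is one the formatter throws away.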

There is good news too though: it should be fairly easy to provide HTML without these wrappers to the extension, so that it will stop removing the headings.
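
One way this could look, as a sketch only: unwrap each heading from its wrapper before the <div> removal runs. This is not necessarily how the eventual patch works. The function name `unwrap_headings`, the regular expression and the sample markup are assumptions for illustration, and a regex is only adequate here because the wrapper is assumed to contain no nested <div>.

```python
import re

# Matches a heading wrapper such as <div class="mw-heading mw-heading2">...</div>
# and captures the <hN>...</hN> element inside it. Assumes no nested <div>.
WRAPPED_HEADING_RE = re.compile(
    r'<div class="mw-heading[^"]*">\s*(<h([1-6])\b[^>]*>.*?</h\2>).*?</div>',
    re.DOTALL,
)


def unwrap_headings(html):
    """Replace each heading wrapper with the bare heading it contains."""
    return WRAPPED_HEADING_RE.sub(r"\1", html)


# Illustrative markup only.
wrapped = (
    '<div class="mw-heading mw-heading2">'
    '<h2 id="Definition">Definition</h2>'
    '<span class="mw-editsection">[edit]</span>'
    '</div><p>Body</p>'
)

print(unwrap_headings(wrapped))
# <h2 id="Definition">Definition</h2><p>Body</p>
```

After this step a formatter that strips <div> elements no longer takes the heading with it, since the heading sits at the top level again.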

Change #1048082 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/TextExtracts@master] ExtractFormatter: Rescue headings from being removed

https://gerrit.wikimedia.org/r/1048082

Change #1048082 merged by jenkins-bot:

[mediawiki/extensions/TextExtracts@master] ExtractFormatter: Rescue headings from being removed

https://gerrit.wikimedia.org/r/1048082

The fix will be deployed to Wikimedia wikis next week, on the usual schedule.
