Page MenuHomePhabricator

Emit PageUpdatedEvents on all occasion the page content changes.
Open, In Progress, MediumPublic

Description

MediaWiki should emit PageUpdatedEvents whenever the content of a page changes. Besides the obvious case of page edits edit, the following actions change page content:

  • undeleting pages/revisions (see the UndeletePage class)
  • changing the content model (see ContentModelChange class) [currently implicit, through WikiPage::doUserEditContent]
  • rollbacks (see RollbackPage and McrUndoAction classes) [currently implicit, through PageUpdater::saveRevision]
  • moving/renaming pages (see the MovePage class) - or should they? We are not changing page content, just the title of the page. But anything that associates derived data with that title needs to re-generate, so this is probably needed. On the other hand, the event doesn't contain the previous title, so ther ewould not be sufficient information to remove the old derived data...
  • importing pages/revisions (see the ImportableOldrevisionImporter class) - unless importing only old revisions of an existing page.
  • uploading media files (see the LocalFile class) - at least when first creating the file page. Uploads of new versions of the file only create dummy revisions. [currently implicit, through WikiPage::doUserEditContent]
  • MAYBE purges (see ApiPurge and maybe also PurgeAction) - they currently don't trigger all updtes, but perhaps they should?!
  • MAYBE restriction changes, per WikiPage::doUpdateRestrictions which fires RevisionFromEditComplete
NOTE: The RevisionFromEditComplete hook is already triggered in most of these cases. To be a viable replacement, PageUpdated needs to be emitted under the same conditions (except probably WikiPage::doUpdateRestrictions).
NOTE: We are creating dummy revisions in several places where we want to record a change to the revision history, e.g. when changing page restrictions (check for callers of WikiPage::updateRevisionOn). However, we generally do *not* want to emit PageUpdatedEvents in these places, because we are not changing page content. In contrast, when do need to emit PageUpdatedEvents on null edits, purges (!), and page moves (?).

There are further cases where we perform programmatic edits, e.g. DoubleRedirectJob, various maintenance scripts, and numerous extensions. They will already be emitting PageUpdated events, but could set additional flags to better signal to listeners how the edit should be handled. Survey callers of WikiPage::doUserEditContent and PageUpdater::saverevision to identify them.

Consolidating how we create dummy revisions would help: T198297: Use PageUpdater to create dummy revisions

Event Timeline

daniel updated the task description. (Show Details)

Change #1099368 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Ensure necessary updates are performed on import and undelete

https://gerrit.wikimedia.org/r/1099368

How does this task relate to T378934: Define domain events for page deletion and undeletion?

IMO: conceptually PageDelete,Undelete,Move etc. are PageUpdates. We named the changelog state change event 'mediawiki.page_change' (schema, docs)
rather than 'page updated' to avoid confusion between e.g. UPDATE vs DELETE.

(This also relates to the discussion about channels vs eventTypes.)

You could (and probably should?) model a PageDelete as a PageUpdate event, but perhaps you want to allow subscribers to choose to subscribe only to deletes. There are a few ways you could do this:

  • let listeners filter manually after being called (simplest solution!) - but this would require an (easy) way to differentiate the kind of change (delete/move etc.) an event is (I think I prefer this option)
  • Use channels, and on a page delete produce the same event to both e.g. a page_change and a page_delete channel. Subscribers choose which channel to listen to.
  • Make 'changelogKind' part of the DomainEvent interface, and allow subscription based on eventType and changelogKind. See also "page state changes and changelog kinds" section in the task description of T308017: Design Schema for page state and page state with content (enriched) streams.

Oh wait! Sorry I did not review https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1099368 before writing ^ (I was just trying to do some task organization).

You are sort of doing "an (easy) way to differentiate the kind of change" in that patch. I'll comment there.

HCoplin-WMF changed the task status from Open to In Progress.Wed, Dec 4, 2:12 PM
HCoplin-WMF triaged this task as High priority.

purges (see ApiPurge and maybe also PurgeAction).

Hm! Are these page updates? The page (and its revisions) aren't actually changing here, right? It is a materialized view/cached version (html, etc.) of the page that is changing?

daniel updated the task description. (Show Details)

My concern with some of these actions would be that it might become somewhat unclear what

PageUpdatedEvent.php
	/**
	 * Returns the user that performed the edit.
	 */
	public function getAuthor(): UserIdentity {
		return $this->newRevision->getUser( RevisionRecord::RAW );
	}

would/should return:

  • The user that performed the purge/changed the restriction, or the user that performed the last edit?
  • The user that imported the page or the user that saved the last revision at the source wiki?
  • The user that undeleted a revision, or the user that created that undeleted revision, or the user that created the latest revision of the article?

purges (see ApiPurge and maybe also PurgeAction).

Hm! Are these page updates? The page (and its revisions) aren't actually changing here, right? It is a materialized view/cached version (html, etc.) of the page that is changing?

The purpose of a purge is to re-generate any derived data. So purges are not page updates, but (most) listeners should behave as if they were page updates: any data they would have derived from that update should get regenerated, as if it was a fresh update.

For instance, if there is a listener that updates the search index every time a page changes, it should re-index the page when a purge is requested (using action=purge or a null edit or some such).

In contrast, a listener that creates entries in RecentChanges, generates Echo notifications, or increments the user edit count should ignore purges.

would/should return:

  • The user that performed the purge/changed the restriction, or the user that performed the last edit?
  • The user that imported the page or the user that saved the last revision at the source wiki?
  • The user that undeleted a revision, or the user that created that undeleted revision, or the user that created the latest revision of the article?

Yes, that is a fair point. I think getAuthor() should always return the user who created the new revision, because that's the meaning of "author".

But perhaps it's then redundant to have that method, and it would me more useful to have getPerformer() instead. In that case, the performer should indeed be the user who requested the import/undeletion/purge/etc.

Maybe it would be worth having both methods side by side, to make it obvious that there is a difference.

As I said on Slack, I think we shouldn't conflate page (content) updates with page state changes. Some page state change may also cause a change of the content, but most wouldn't. Each change in page state (create, delete, move, etc) should have a separate event that contains the appropriate information. PageUpdated is for things that actually change page content; trying to model page deletion with it will be awkward. Firing it on page moves is already dubious.

I'm starting the think the name of the event isn't great - perhaps PageContentUpdated would have been better.

I'm starting the think the name of the event isn't great - perhaps PageContentUpdated would have been better.

That would indeed make it clearer.

We are going to get funky with data model scope I think though, so we should really take some time to think about this and other event models.

E.g. if an edit comment is hidden on a page, would a PageContentUpdated event be fired? What if it is suppressed on a revision from 10 years ago?

So purges are not page updates, but (most) listeners should behave as if they were page updates: any data they would have derived from that update should get regenerated, as if it was a fresh update.

Okay, but this goes into our discussion about what a DomainEvent is, and about persisted state. In the case of a purge, there is no change to the page entity's persisted state; What is changing is a derived state (rendered html), and the thing that does the re-rendering (parsoid?) should emit the fact that the rendering has changed as a different event. Or, perhaps as a different 'cause'?

We shouldn't use DomainEvents as commands. A purge is a command to do something, not a representation about something that happened.

I'm now pretty confused about what the scope of PageUpdatedEvent is. What is the entity it is modeling updates to?

But perhaps it's then redundant to have that method, and it would me more useful to have getPerformer() instead. In that case, the performer should indeed be the user who requested the import/undeletion/purge/etc.

Maybe it would be worth having both methods side by side, to make it obvious that there is a difference.

This is what we did in the mediawiki/page/change schema. In the schema example, there is a top level performer field, and also a revision.editor (AKA here author?) field. In the case of edits, these are the same. In the case of e.g. revision visibility changes, they are different.

Change #1099368 merged by jenkins-bot:

[mediawiki/core@master] Ensure necessary updates are performed on import and undelete

https://gerrit.wikimedia.org/r/1099368

Change #1104599 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] PageUpdater: no PageUpdatedEvent on derived slot updates

https://gerrit.wikimedia.org/r/1104599

The obvious cases are done. Keeping this open to remind us to have another look at additional cases that may need covering.

daniel lowered the priority of this task from High to Medium.Wed, Dec 18, 8:15 PM
  NODES
Note 3
Project 3