Page MenuHomePhabricator

Event validation errors for mediawiki.page_change.v1 due to missing performer field on revision suppressions
Closed, ResolvedPublic

Description

https://logstash.wikimedia.org/goto/950f2f72d0ea408c9cde46020aafc650

'' should have required property 'performer'

On first glance, Looks isolated to a handful of wikis. I see many for be.wikipedia.org.

These also mostly look like revision visibility changes.

e.g.

  "page_change_kind": "visibility_change",
...
    "is_content_visible": true,
    "is_editor_visible": false,
    "is_comment_visible": false,
...
  "prior_state": {
    "revision": {
      "is_editor_visible": true,
      "is_comment_visible": true
    }

Event Timeline

Hm.

I think I introduced this bug in T342487: [Event Platform] Actor performing suppression revealed publicly. We are not setting performer correctly, but we never made performer a non-required field in the event schema?

https://schema.wikimedia.org/repositories//primary/jsonschema/mediawiki/page/change/1.1.0.yaml

Hm. Technically, removing required-ness of a field is a breaking change. jsonschema-tools will require a major version bump.

In this case, there is nothing in place that stops us from doing this as a major version bump and still using the same stream. The field types are not changes, its only the required-ness of performer that will change. I was just going to do this and submit mediawiki/page/change/2.0.0...but

We have versioned the mediawiki.page_change.v1 stream. The idea was that the stream version would match the allowed schema major versions. Again, this is just a convention (nothing enforces it), but I don't love the idea of putting events with schema version 2.0.0 in a stream that is API versioned for major version 1.

Options:

  • A. make a new v2 stream. YUCK no way.
  • B. Produce 2.0.0 events to v1 stream. Also yuck.
  • C. Do a minor version bump to 1.2.0 and skip the CI check in this case. I think we can add a CI exception for this rule in .jsonschema-tools skipSchemaTestCases.
  • D. In place edit the old schema version(s), making it as if performer was never required.

I'm leaning toward option C. It see like we've done this before for mediawiki/client/error

Change #1047549 had a related patch set uploaded (by Ottomata; author: Ottomata):

[schemas/event/primary@master] Make performer field optional in mediawiki/page/change/1.2.0

https://gerrit.wikimedia.org/r/1047549

Change #1047630 had a related patch set uploaded (by Ottomata; author: Ottomata):

[mediawiki/extensions/EventBus@master] Bump mediawiki/page/change schema version to 1.2.0

https://gerrit.wikimedia.org/r/1047630

C. Do a minor version bump to 1.2.0 and skip the CI check in this case. I think we can add a CI exception for this rule in .jsonschema-tools skipSchemaTestCases.

FWIW: +1 on this approach.

We _might_ break a convention anyway (event schema changes that break backward compatibility should have major version bump), but IMHO
we can be flexible with interpretation here. Since the performer compat break was accidental, this sound like a hotfix.

@Ottomata one thing to be mindful about is downstream consumers. eventutilites_python stream descriptors pin event schema versions:
https://github.com/wikimedia/operations-deployment-charts/blob/master/helmfile.d/services/mw-page-content-change-enrich/values-main.yaml#L31

We might want to version bump that config too.

@Ottomata one thing to be mindful about is downstream consumers. eventutilites_python stream descriptors pin event schema versions:
https://github.com/wikimedia/operations-deployment-charts/blob/master/helmfile.d/services/mw-page-content-change-enrich/values-main.yaml#L31

We might want to version bump that config too.

And we'll need to bump the producer's event schema version too. mediawiki.page_content_change.v1 shares the same event schema as mediawiki.page_change.v1. With the current version (1.1.0), enriched events missing performer would also be rejected.

We _might_ break a convention anyway (event schema changes that break backward compatibility should have major version bump)

Hm. I wonder if we should consider making changes to requiredness non breaking. This is not the first time this has happened.

Or, we could decide to not support requiredness at all. In Hive at least, we make everything nullable anyway. I dunno, probably not a good idea. Maybe we can just recommend not using required unless you are really really sure.

And ya good points about the enrichment job and schemas. Will do.

What I want to know is how I deployed T342487: [Event Platform] Actor performing suppression revealed publicly without doing this. I remember it was difficult for me to test this, since I didn't have admin rights to suppress things. But the reporter of that bug said it looked good to them?

Change #1047995 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] Bump page_change and page_content_change event schema versions to make performer optional

https://gerrit.wikimedia.org/r/1047995

What does this mean for consumers of the Hive table generated by this stream? Can any of these field be NULL now?

We make all fields NULLable by default in Hive tables.

But, for this specific one, yes a missing (null) performer will be allowed by the schema.

It will usually be set. But as T342487 describes, it will not be set in cases where the the actual 'revision deletion' AKA visibility change is being 'suppressed'. In those cases, the admin user doing the suppression is not public data.

NULLable by default in Hive tables.

Seems like in practice though, this is still a behavior change. I suggest we announce it.

What I want to know is how I deployed T342487: [Event Platform] Actor performing suppression revealed publicly without doing this

Ah! T342487 was about the mediawiki.revision-visibility-change change stream, for which performer was never required. We did not test how this would work for suppressions on the current page, which are captured in mediawiki.page_change.v1.

Seems like in practice though, this is still a behavior change. I suggest we announce it.

Good point. Will do.

Ottomata renamed this task from Event validation errors for mediawiki.page_change.v1 since 2024-03-20 to Event validation errors for mediawiki.page_change.v1 due to missing performer field on revision suppressions.Jun 20 2024, 3:15 PM
Ottomata claimed this task.

Change #1047549 merged by jenkins-bot:

[schemas/event/primary@master] Make performer field optional in mediawiki/page/change/1.2.0

https://gerrit.wikimedia.org/r/1047549

I've merged the new schema version and the EventBus produced event schema version. I wanted to merge EventBus change so it will go out with MW train next week.

I'll redeploy the Flink enrichment app on Monday. It should be safe to do this anytime, as it won't really affect the input or output format. I just don't want to do that deploy on a Friday :)

Change #1047630 merged by jenkins-bot:

[mediawiki/extensions/EventBus@master] Bump mediawiki/page/change schema version to 1.2.0

https://gerrit.wikimedia.org/r/1047630

I just tested that mediawiki.page_change.v1 still works in beta. I made an edit to https://meta.wikimedia.beta.wmflabs.org/wiki/Sandbox:otto1 and was able to verify that an event was emitted at https://stream-beta.wmflabs.org/v2/ui/#/?streams=mediawiki.page_change.v1

Change #1047995 merged by jenkins-bot:

[operations/deployment-charts@master] Bump page_change and page_content_change event schema versions to make performer optional

https://gerrit.wikimedia.org/r/1047995

I'll redeploy the Flink enrichment app on Monday. It should be safe to do this anytime, as it won't really affect the input or output format. I just don't want to do that deploy on a Friday :)

I think I was wrong about this! Because the schema version that the running app was using required performer, validation failed when producing the enriched page_content_change event, causing the app to die.

This didn't happen until today because the MW train rolled out to the wiki where this happens today! If I had deployed flink app on monday, this would have been fine.

Change #1050461 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] mw-page-content-change-enrich Bump image version to pick up latest schema repos

https://gerrit.wikimedia.org/r/1050461

Change #1050461 merged by Ottomata:

[operations/deployment-charts@master] mw-page-content-change-enrich Bump image version to pick up latest schema repos

https://gerrit.wikimedia.org/r/1050461

causing the app to die.

Why would it die instead of producing an error?

In this case Flink considered it a serialization error. I think, because the field was required, it was kept as null in the value. null could not be serialized to a required object.

For optional values, if a field is null, the serializer will remove it from the record altogether.

I sent this announcement to wikitech-l today:

Hello,

tl;dr

The performer field on events in the mediawiki.page_change.v1 stream is no longer > required. It will be omitted in certain cases.

If you have any code that expects the performer field to be present, please adapt > accordingly.


Last fall, the Wikimedia Data Engineering team fixed a bug in the > mediawiki.page_change.v1 (and other) streams. Since that fix, the performer field will > be omitted in events for which a wiki admin is doing a RevisionDelete with suppression.

This bug fix was not properly applied to the mediawiki/page/change event schema, causing > validation errors for these cases.

Since mid November 2023, mediawiki.page_change.v1 events caused by admin using ‘revision > suppression’ have been dropped.

We did not notice these errors, as the mediawiki.page_change.v1 stream only contains > RevisionDelete (AKA “revision visibility”) related changes if those changes are made to > the current revision of a page. Using RevisionDelete on the current revision of a page > is usually rare. This recently began happening more often on certain wikis, so we > noticed.

To work around the validation errors, we have made the performer field optional in the > mediawiki/page/change schema.

If you have any questions or issues, please create a Phabricator ticket and tag > Data-Engineering.

Thank you,

Andrew Otto

Wikimedia Data Engineering

  NODES
admin 4
Idea 3
idea 3
Note 3
Project 7
Verify 1