Data-Engineering (Q1 2024 July 1st - September 30th) · Milestone
Archived · Public

Details

Description

Kanban-style board for work scheduled for Q1 FY24-25

Recent Activity

Nov 26 2024

gmodena added a comment to T363587: [Event Platform] Instrument EventBus with prometheus MW Statslib.

I have reverted that patch (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1097995) because the change got merged while I was doing the MediaWiki train deployment. I did not want to deploy the config change given I don't know what it is about.

Nov 26 2024, 1:43 PM · MW-1.43-notes (1.43.0-wmf.22; 2024-09-10), Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board), Event-Platform
hashar reopened T363587: [Event Platform] Instrument EventBus with prometheus MW Statslib as "Open".

Change #1062430 merged by jenkins-bot:

[operations/mediawiki-config@master] config: remove eventbus instrumentation setting

https://gerrit.wikimedia.org/r/1062430

Nov 26 2024, 1:25 PM · MW-1.43-notes (1.43.0-wmf.22; 2024-09-10), Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board), Event-Platform
Maintenance_bot removed a project from T363587: [Event Platform] Instrument EventBus with prometheus MW Statslib: Patch-For-Review.
Nov 26 2024, 11:31 AM · MW-1.43-notes (1.43.0-wmf.22; 2024-09-10), Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board), Event-Platform
gerritbot added a comment to T363587: [Event Platform] Instrument EventBus with prometheus MW Statslib.

Change #1062430 merged by jenkins-bot:

[operations/mediawiki-config@master] config: remove eventbus instrumentation setting

https://gerrit.wikimedia.org/r/1062430

Nov 26 2024, 10:47 AM · MW-1.43-notes (1.43.0-wmf.22; 2024-09-10), Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board), Event-Platform

Nov 4 2024

Ottomata added a comment to T367403: Validate CI integration so that Ci can release Maven artifacts on user's demand.

There was never an answer to some questions there.

Nov 4 2024, 3:24 PM · Discovery-Search (Current work), Release-Engineering-Team (Radar), Data-Engineering (Q1 2024 July 1st - September 30th), Java-Scala-Standardization, Data-Platform-SRE

Oct 31 2024

EBernhardson added a comment to T367403: Validate CI integration so that Ci can release Maven artifacts on user's demand.

I'm looking at this in reference to releasing some projects, initially search/extra, search/extra-analysis, and search/highlighter, from gerrit. I see we didn't test releasing from gerrit yet. I suppose my main question is: where should gerrit repos publish in gitlab? It seems either we create a project per repo, or we use a shared repo. I'm not fully versed in gitlab, but the access tokens I've used so far are per-repo. A shared repo would mean a shared secret instead of a per-project one. Also, I believe access tokens are valid for at most 1 year, so they will need rotation.

Oct 31 2024, 9:11 PM · Discovery-Search (Current work), Release-Engineering-Team (Radar), Data-Engineering (Q1 2024 July 1st - September 30th), Java-Scala-Standardization, Data-Platform-SRE

Oct 23 2024

Ahoelzl closed T373633: Allow maxLength changes for json schema compatibility as Resolved.
Oct 23 2024, 8:32 PM · Data-Engineering (Q1 2024 July 1st - September 30th)
Ahoelzl edited projects for T373633: Allow maxLength changes for json schema compatibility, added: Data-Engineering (Q1 2024 July 1st - September 30th); removed Data-Engineering.
Oct 23 2024, 8:31 PM · Data-Engineering (Q1 2024 July 1st - September 30th)
Ottomata added a subtask for T369900: Develop Airflow ExternalTaskSensor to orchestrate DAG dependencies: T378000: Write documentation on usage of RestExternalTaskSensor.
Oct 23 2024, 5:42 PM · Dumps 2.0 (Kanban Board), Data-Engineering (Q1 2024 July 1st - September 30th)

Oct 11 2024

Antoine_Quhen updated subscribers of T369900: Develop Airflow ExternalTaskSensor to orchestrate DAG dependencies.
Oct 11 2024, 4:22 PM · Dumps 2.0 (Kanban Board), Data-Engineering (Q1 2024 July 1st - September 30th)
gmodena updated the task description for T363587: [Event Platform] Instrument EventBus with prometheus MW Statslib.
Oct 11 2024, 9:11 AM · MW-1.43-notes (1.43.0-wmf.22; 2024-09-10), Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board), Event-Platform

Oct 9 2024

Maintenance_bot removed a project from T372768: [BUG] MediawikiPageContentChangeEnrichAvailability is firing : Patch-For-Review.
Oct 9 2024, 2:30 PM · Dumps 2.0 (Kanban Board), Event-Platform, Data-Engineering (Q1 2024 July 1st - September 30th)
Ahoelzl archived Data-Engineering (Q1 2024 July 1st - September 30th).
Oct 9 2024, 2:05 PM
Ahoelzl closed T370428: PHP Warning: Invalid argument supplied for foreach() in EventBus.php as Resolved.
Oct 9 2024, 2:05 PM · Data-Engineering (Q1 2024 July 1st - September 30th), MW-1.43-notes (1.43.0-wmf.15; 2024-07-23), Data-Platform, Event-Platform, Wikimedia-production-error
Ahoelzl closed T372899: Ingest a test hive database into datahub as Resolved.
Oct 9 2024, 2:05 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Data-Catalog, Data Pipelines
Ahoelzl closed T360968: [Developer Experience] [SPIKE] Investigate process to automate deployment of folders and artifacts to HDFS as Resolved.
Oct 9 2024, 2:05 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Release-Engineering-Team, Spike
Ahoelzl closed T365005: Evaluate ESC and explore an alternative design. as Resolved.
Oct 9 2024, 2:04 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Event-Platform
Ahoelzl closed T366627: [MPIC] Analyse risk of potential performance issues with static approach to stream configuration as Resolved.
Oct 9 2024, 2:04 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Experimentation Lab, Metrics Platform
Ahoelzl closed T370186: Decommission produce_canary_events systemd timer as Resolved.
Oct 9 2024, 2:04 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Patch-For-Review, Event-Platform
Ahoelzl closed T367923: Event validation errors for mediawiki.page_change.v1 due to missing performer field on revision suppressions as Resolved.
Oct 9 2024, 2:04 PM · Data-Engineering (Q1 2024 July 1st - September 30th), MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Event-Platform
Ahoelzl closed T370199: gobblin-wmf: bump event-utilities dependency to unblock MW on K8s migration. as Resolved.
Oct 9 2024, 2:04 PM · Data-Engineering (Q1 2024 July 1st - September 30th), MW-on-K8s
Ahoelzl closed T362785: Add host level instrumentation on webrequest as Resolved.
Oct 9 2024, 2:04 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Patch-For-Review
Ahoelzl closed T372768: [BUG] MediawikiPageContentChangeEnrichAvailability is firing as Resolved.
Oct 9 2024, 2:04 PM · Dumps 2.0 (Kanban Board), Event-Platform, Data-Engineering (Q1 2024 July 1st - September 30th)
Ahoelzl closed T362783: Add instrumentation for actor signatures as Resolved.
Oct 9 2024, 2:04 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Patch-For-Review
Ahoelzl closed T367134: [Refine Refactoring] Changes to EventStreamConfig needed for scheduling Refine via airflow as Resolved.
Oct 9 2024, 2:04 PM · MW-1.43-notes (1.43.0-wmf.15; 2024-07-23), Data-Engineering (Q1 2024 July 1st - September 30th)
Ahoelzl closed T369900: Develop Airflow ExternalTaskSensor to orchestrate DAG dependencies as Resolved.
Oct 9 2024, 2:04 PM · Dumps 2.0 (Kanban Board), Data-Engineering (Q1 2024 July 1st - September 30th)
Ahoelzl closed T372014: Problem deploying - missing airflow_client dependency, a subtask of T369900: Develop Airflow ExternalTaskSensor to orchestrate DAG dependencies, as Resolved.
Oct 9 2024, 2:04 PM · Dumps 2.0 (Kanban Board), Data-Engineering (Q1 2024 July 1st - September 30th)
Ahoelzl closed T372014: Problem deploying - missing airflow_client dependency as Resolved.
Oct 9 2024, 2:04 PM · Dumps 2.0 (Kanban Board), Data-Engineering (Q1 2024 July 1st - September 30th)
Ahoelzl closed T369851: NEW BUG REPORT Mediawiki_history contains duplicate rows for some revisions as Resolved.
Oct 9 2024, 2:04 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Movement-Insights, Analytics-Data-Problem, Data-Platform
Ahoelzl closed T367403: Validate CI integration so that Ci can release Maven artifacts on user's demand as Resolved.
Oct 9 2024, 2:03 PM · Discovery-Search (Current work), Release-Engineering-Team (Radar), Data-Engineering (Q1 2024 July 1st - September 30th), Java-Scala-Standardization, Data-Platform-SRE
Ahoelzl closed T372456: Rollback haproxy feed automated ingestion as Resolved.
Oct 9 2024, 2:03 PM · Patch-For-Review, Event-Platform, Data-Engineering (Q1 2024 July 1st - September 30th)
Ahoelzl closed T363587: [Event Platform] Instrument EventBus with prometheus MW Statslib as Resolved.
Oct 9 2024, 2:03 PM · MW-1.43-notes (1.43.0-wmf.22; 2024-09-10), Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board), Event-Platform
Ahoelzl closed T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow as Resolved.
Oct 9 2024, 2:03 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Movement-Metrics, Movement-Insights
Ahoelzl closed T366562: [Event Platform] - Add schema CI test that array ensures properties with object types also enumerate object properties as Resolved.
Oct 9 2024, 2:03 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Event-Platform
Ahoelzl closed T361502: [Refine Refactoring] Define and implement a automated testing / comparison tool for config store configured datasets as Resolved.
Oct 9 2024, 2:03 PM · Data-Engineering (Q1 2024 July 1st - September 30th)
Maintenance_bot removed a project from T369845: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment: Patch-For-Review.
Oct 9 2024, 1:31 PM · Patch-For-Review, Data-Engineering (Q2 2024 October 1st - December 31th)
Ahoelzl added a project to T375527: NEW BUG REPORT - Issues in calculation logic for unique devices tables: Data-Engineering (Q1 2024 July 1st - September 30th).
Oct 9 2024, 1:19 PM · Experimentation Lab (Data Products (Data Products Sprint 21 🪂)), Data-Engineering (Q2 2024 October 1st - December 31th), Traffic, Data-Platform
CodeReviewBot added a comment to T369845: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment.

aqu merged https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-secondary/-/merge_requests/11

Oct 9 2024, 12:50 PM · Patch-For-Review, Data-Engineering (Q2 2024 October 1st - December 31th)
CodeReviewBot added a comment to T368787: Flink job to enrich reconciliation events.

gmodena updated https://gitlab.wikimedia.org/repos/data-engineering/mediawiki-event-enrichment/-/merge_requests/84

Oct 9 2024, 12:16 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review, Dumps 2.0 (Kanban Board)
gerritbot added a project to T368787: Flink job to enrich reconciliation events: Patch-For-Review.
Oct 9 2024, 11:40 AM · Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review, Dumps 2.0 (Kanban Board)
gerritbot added a comment to T368787: Flink job to enrich reconciliation events.

Change #1078923 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] dse-k8s-services: content_history: version bump image.

https://gerrit.wikimedia.org/r/1078923

Oct 9 2024, 11:40 AM · Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review, Dumps 2.0 (Kanban Board)
CodeReviewBot added a project to T369845: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment: Patch-For-Review.

aqu opened https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-secondary/-/merge_requests/11

Oct 9 2024, 10:34 AM · Patch-For-Review, Data-Engineering (Q2 2024 October 1st - December 31th)

Oct 8 2024

Ottomata added a comment to T366836: Migrate Event Platform Schema Respositories to Gitlab.

!!! amazing!

Oct 8 2024, 5:29 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
Ottomata added a comment to T376144: Some Gobblin folders don't have `_IMPORTED` flags.

It isn't yet, but I think it will be? There are others that are ingested, e.g. mediawiki.page_change.v1, but it only has one partition.

Oct 8 2024, 5:19 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
Snwachukwu updated subscribers of T366836: Migrate Event Platform Schema Respositories to Gitlab.

The switchover has been done. The gerrit repositories are deprecated (set to read-only), and the schema servers have all been updated with the gitlab URLs, with @BTullis's support.

Oct 8 2024, 4:54 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
JAllemandou added a comment to T376144: Some Gobblin folders don't have `_IMPORTED` flags.

Ah! I had forgotten :) But this stream is not ingested in hadoop.
Anyhow, if you agree with the approach, I can implement it.

Oct 8 2024, 4:41 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
Ottomata added a comment to T376144: Some Gobblin folders don't have `_IMPORTED` flags.

keyed-topics is theoretical for us as we don't use them

FWIW, we do use them:

Oct 8 2024, 4:31 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
Maintenance_bot removed a project from T366836: Migrate Event Platform Schema Respositories to Gitlab: Patch-For-Review.
Oct 8 2024, 4:30 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
JAllemandou added a comment to T376144: Some Gobblin folders don't have `_IMPORTED` flags.

Our algorithm about flagging folders is working at topic-partition-level for high-volume topics, and at topic-level for low-volume topics. It doesn't really make sense to mix them.
My suggested solution would be to discard low-volume topic-partitions from high-volume topics. I think this would cover the issue.

Oct 8 2024, 4:26 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
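
The suggestion in the comment above could be sketched roughly as follows. This is an illustrative sketch only, not the actual Gobblin flagging code: the function name, the watermark representation (latest imported timestamp per partition), and the way low-volume partitions are identified are all assumptions made for the example.

```python
from typing import Dict, Iterable


def can_flag_hour(
    hour_end: int,
    watermarks: Dict[int, int],
    is_high_volume_topic: bool,
    low_volume_partitions: Iterable[int] = (),
) -> bool:
    """Decide whether an hourly folder may receive its _IMPORTED flag.

    hour_end: end of the hour as a timestamp (hypothetical unit: seconds).
    watermarks: partition id -> latest imported timestamp (assumed input).
    low_volume_partitions: partitions to discard for high-volume topics
        (how these are detected is left out; it is an assumption here).
    """
    if not watermarks:
        return False
    if not is_high_volume_topic:
        # Topic-level check (assumed): any partition passing the hour is enough.
        return max(watermarks.values()) >= hour_end
    skipped = set(low_volume_partitions)
    considered = [ts for p, ts in watermarks.items() if p not in skipped]
    # Partition-level check: every remaining partition must have passed the hour.
    return bool(considered) and all(ts >= hour_end for ts in considered)
```

With this shape, a high-volume topic whose partition 2 rarely receives data would no longer block the flag once partition 2 is listed in low_volume_partitions.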
CodeReviewBot added a comment to T366836: Migrate Event Platform Schema Respositories to Gitlab.

ebysans merged https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-secondary/-/merge_requests/5

Oct 8 2024, 4:20 PM · Data-Engineering (Q2 2024 October 1st - December 31th)