User Details
- User Since: Jan 6 2022, 7:27 PM (155 w, 6 d)
- Availability: Available
- LDAP User: Marco Fossati
- MediaWiki User: MFossati (WMF)
Mon, Dec 23
Moving to needs design, discussion with engineers needed.
FYI @Sneha you can safely ignore this ticket.
Almost no search queries on Commons contain custommatch:depicts_or_linked_from:
```python
def collect_searches(spark):
    """Collect Commons search requests made through Special:Search or Special:MediaSearch."""
    initial_query = """
        SELECT http, params
        FROM event.mediawiki_cirrussearch_request
        WHERE database='commonswiki' AND params IS NOT NULL
    """
    ddf = spark.sql(initial_query)
    filtered = (
        ddf
        .where(
            ddf.params.title.contains('Special:Search')
            | ddf.params.title.contains('Special:MediaSearch')
        )
        .where(ddf.http.request_headers.referer.contains('index.php'))
    )
    return filtered
```
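As a rough follow-up sketch (not part of the original snippet), one could count how many of the collected requests actually mention the keyword; note that the `search` param name is an assumption about the event schema:

```python
# Hypothetical follow-up check: how many collected requests contain the keyword?
# Assumes the raw query string lives in params['search'] (schema field name is an assumption).
searches = collect_searches(spark)
with_keyword = searches.where(
    searches.params['search'].contains('custommatch:depicts_or_linked_from:')
)
print(with_keyword.count(), searches.count())
```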
We're focusing here on the following weighted tags that go to the Commons search index:
- image.linked.from.wikidata.p18/QID|SCORE
- image.linked.from.wikidata.p373/QID|SCORE
- image.linked.from.wikipedia.lead_image/QID|SCORE
where QID is a Wikidata item and SCORE is computed in commonswiki_file.py.
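To make the tag format concrete, here is a small illustrative parser; the QID and score in the example call are made up, not real data:

```python
def parse_weighted_tag(tag):
    """Split a 'prefix/QID|SCORE' weighted tag into its three parts."""
    prefix, _, rest = tag.partition('/')
    qid, _, score = rest.partition('|')
    return prefix, qid, int(score)

parse_weighted_tag('image.linked.from.wikidata.p18/Q42|87')
# -> ('image.linked.from.wikidata.p18', 'Q42', 87)
```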
Looks like one AC is missing, moving back to ready for dev.
Fri, Dec 20
@BTullis this is done from the Structured Content team's side, so I'm removing tags.
Review done: https://gitlab.wikimedia.org/repos/structured-data/upload-tracking/-/merge_requests/3
Moving back to ready for dev
Thu, Dec 19
Wed, Dec 18
@Etonkovidova, it's merged.
Correct, I can confirm that.
Tue, Dec 17
I think we're doing that for pretty much all text inputs in the describe step, so different behavior in the release rights step seems odd to me.
But are you saying that it would not clear the error as well if the user fixes the error?
No, errors are correctly cleared.
Moving to code review, but looking at Commons weighted tags usage in the meanwhile.
@Sneha, @matthiasmullie: in both the own-work and 3rd-party flows, the custom license text inputs don't display errors as the user types, only when they hit the next button.
I think this should go to a different ticket, though.
Wed, Dec 11
Tue, Dec 10
@Htriedman I'll let you update https://gitlab.wikimedia.org/repos/security/differential-privacy/-/blob/main/.gitlab-ci.yml, CC @BTullis .
Mon, Dec 9
Filter out Commons while we figure out the importance of its weighted tags.
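A minimal sketch of what that filter could look like; the dataframe, its contents, and the column names are placeholders, not the actual job code:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Placeholder dataframe standing in for whatever the weighted-tags job emits.
weighted_tags = spark.createDataFrame(
    [('commonswiki', 'image.linked.from.wikidata.p18/Q42|87'),
     ('enwiki', 'recommendation.image/exists|1')],
    ['wiki_id', 'tag'],
)
# Drop Commons rows until we understand how important its weighted tags are.
without_commons = weighted_tags.where(weighted_tags.wiki_id != 'commonswiki')
```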
For the 2024-11-25 snapshot we didn't have wmf.mediawiki_wikitext_current/snapshot=2024-11, so SLIS was skipped. The SLIS sensor correctly failed, and the ALIS DAG completed, effectively shipping ALIS with no SLIS:
isu = spark.read.table('analytics_platform_eng.image_suggestions_suggestions').where('snapshot="2024-11-25"')
# Null section_index = article-level (ALIS) rows, non-null = section-level (SLIS) rows.
isu.where(isu.section_index.isNull()).count(), isu.where(isu.section_index.isNotNull()).count()
Fri, Dec 6
Published at https://meta.wikimedia.org/wiki/Machine_learning_models/Production/gogologo.
Closing.
Wed, Dec 4
Dec 2 2024
Nov 29 2024
Nov 28 2024
- Indent all the sub-question boxes (1 in own-work flow and 2 in not-own-work flow) as shown in the UI
@Sneha not sure about this one: what exactly needs to be changed, if anything? The current production Commons already has indented boxes, and the patch doesn't seem to change that. Or maybe I'm not seeing obvious differences.
Nov 27 2024
Nov 26 2024
Nov 22 2024
- Update the warning copy under "not own work" > q1 option "I don’t know if it is free to share" as shown in the UI.
The sentence "People will be happy to assist you at Wikimedia Commons's Village Pump. Thank you." is in patch set 1, but it doesn't seem to be in the design: https://www.figma.com/design/PSsy485pa5YAiMsUrcoOui/Commons-upload-wizard?node-id=4362-21818&t=xbQmRcRVbbtM3fDv-4
I'll remove it.
Nov 21 2024
All affected DAGs started today, closing.
Nov 20 2024
alis.groupBy('snapshot').count().orderBy('snapshot').toPandas()
snapshot count
0 2024-09-30 24284047
1 2024-10-07 24287195
2 2024-10-14 24290046
3 2024-10-21 24302041
4 2024-10-28 24329950
5 2024-11-04 24339009
Things we could do per wiki:
- [active wikis] compute a precision P based on user feedback, where P = accepted suggestions / ( accepted + rejected suggestions )
accepted = spark.sql("""
    SELECT wiki, COUNT(is_accepted) AS accepted
    FROM event_sanitized.mediawiki_image_suggestions_feedback
    WHERE datacenter!='' AND year>=2022 AND month>0 AND day>0 AND hour<24 AND is_accepted=true
    GROUP BY wiki ORDER BY wiki
""").toPandas()
rejected = spark.sql("""
    SELECT wiki, COUNT(is_rejected) AS rejected
    FROM event_sanitized.mediawiki_image_suggestions_feedback
    WHERE datacenter!='' AND year>=2022 AND month>0 AND day>0 AND hour<24 AND is_rejected=true
    GROUP BY wiki ORDER BY wiki
""").toPandas()
df = accepted.merge(rejected)
df['precision'] = df.accepted / (df.accepted + df.rejected)
df.sort_values('precision', ascending=False)
Nov 19 2024
One issue I see here: if we keep skipping small changes (when runs aren't skipped), then we'll always end up with huge updates, no?
First thought off the top of my head is that Wikipedias get arrays of size 1 with exists|1 boolean tags, while Commons gets arrays of size N with Wikidata item|score ones, which may be subject to higher variation depending on Wikidata and on how those scores are computed.
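To make that contrast concrete, here is a purely illustrative pair of tag arrays; the QIDs, scores, and the recommendation.image prefix for the Wikipedia case are assumptions, not real data:

```python
# Illustrative only, values made up.
# A Wikipedia article typically carries a single boolean-style weighted tag:
wikipedia_tags = ['recommendation.image/exists|1']  # tag prefix is an assumption

# A Commons file can carry N scored tags, which shift whenever Wikidata or the
# score computation in commonswiki_file.py changes:
commons_tags = [
    'image.linked.from.wikidata.p18/Q42|87',
    'image.linked.from.wikidata.p373/Q42|60',
    'image.linked.from.wikipedia.lead_image/Q42|95',
]
```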