User Details
- User Since: Jan 6 2022, 7:27 PM (155 w, 6 d)
- Availability: Available
- LDAP User: Marco Fossati
- MediaWiki User: MFossati (WMF)
Mon, Dec 23
Moving to needs design, discussion with engineers needed.
FYI @Sneha you can safely ignore this ticket.
Almost no search queries on Commons contain custommatch:depicts_or_linked_from:
```python
def collect_searches(spark):
    """Collect Commons search requests made through Special:Search or Special:MediaSearch."""
    initial_query = """
        SELECT http, params
        FROM event.mediawiki_cirrussearch_request
        WHERE database='commonswiki' AND params IS NOT NULL
    """
    ddf = spark.sql(initial_query)
    filtered = (
        ddf
        .where(
            ddf.params.title.contains('Special:Search')
            | ddf.params.title.contains('Special:MediaSearch')
        )
        .where(ddf.http.request_headers.referer.contains('index.php'))
    )
    return filtered
```
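As a rough follow-up sketch (not part of the original snippet), one could count how many of the collected requests actually mention the keyword; note that the `search` param name is an assumption about the event schema:

```python
# Hypothetical follow-up check: how many collected requests contain the keyword?
# Assumes the raw query string lives in params['search'] (schema field name is an assumption).
searches = collect_searches(spark)
with_keyword = searches.where(
    searches.params['search'].contains('custommatch:depicts_or_linked_from:')
)
print(with_keyword.count(), searches.count())
```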
We're focusing here on the following weighted tags that go to the Commons search index:
- image.linked.from.wikidata.p18/QID|SCORE
- image.linked.from.wikidata.p373/QID|SCORE
- image.linked.from.wikipedia.lead_image/QID|SCORE
where QID is a Wikidata item and SCORE is computed in commonswiki_file.py.
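To make the tag format concrete, here is a small illustrative parser; the QID and score in the example call are made up, not real data:

```python
def parse_weighted_tag(tag):
    """Split a 'prefix/QID|SCORE' weighted tag into its three parts."""
    prefix, _, rest = tag.partition('/')
    qid, _, score = rest.partition('|')
    return prefix, qid, int(score)

parse_weighted_tag('image.linked.from.wikidata.p18/Q42|87')
# -> ('image.linked.from.wikidata.p18', 'Q42', 87)
```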
Looks like one AC is missing, moving back to ready for dev.
Fri, Dec 20
@BTullis this is done from the Structured Content team's side, so I'm removing tags.
Review done: https://gitlab.wikimedia.org/repos/structured-data/upload-tracking/-/merge_requests/3
Moving back to ready for dev
Thu, Dec 19
Wed, Dec 18
@Etonkovidova, it's merged.
Correct, I can confirm that.
Tue, Dec 17
I think we're doing that for pretty much all text inputs in the describe step, so different behavior in the release rights step seems odd to me.
But are you saying that it would not clear the error as well if the user fixes the error?
No, errors are correctly cleared.
Moving to code review, but looking at Commons weighted tags usage in the meanwhile.
@Sneha, @matthiasmullie: in both the own-work and 3rd-party flows, the custom license text inputs don't display errors as the user types, only when they hit the next button.
I think this should go to a different ticket, though.
Wed, Dec 11
Tue, Dec 10
@Htriedman I'll let you update https://gitlab.wikimedia.org/repos/security/differential-privacy/-/blob/main/.gitlab-ci.yml, CC @BTullis .
Mon, Dec 9
Filter out Commons while we figure out the importance of its weighted tags.
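A minimal sketch of what that filter could look like; the dataframe, its contents, and the column names are placeholders, not the actual job code:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Placeholder dataframe standing in for whatever the weighted-tags job emits.
weighted_tags = spark.createDataFrame(
    [('commonswiki', 'image.linked.from.wikidata.p18/Q42|87'),
     ('enwiki', 'recommendation.image/exists|1')],
    ['wiki_id', 'tag'],
)
# Drop Commons rows until we understand how important its weighted tags are.
without_commons = weighted_tags.where(weighted_tags.wiki_id != 'commonswiki')
```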
For the 2024-11-25 snapshot we didn't have wmf.mediawiki_wikitext_current/snapshot=2024-11, so SLIS was skipped. The SLIS sensor correctly failed, and the ALIS DAG completed, effectively shipping ALIS with no SLIS:
isu = spark.read.table('analytics_platform_eng.image_suggestions_suggestions').where('snapshot="2024-11-25"')
# Null section_index = article-level (ALIS) rows, non-null = section-level (SLIS) rows.
isu.where(isu.section_index.isNull()).count(), isu.where(isu.section_index.isNotNull()).count()
Fri, Dec 6
Published at https://meta.wikimedia.org/wiki/Machine_learning_models/Production/gogologo.
Closing.
Wed, Dec 4
Dec 2 2024
Nov 29 2024
Nov 28 2024
- Indent all the sub-question boxes (1 in own-work flow and 2 in not-own-work flow) as shown in the UI
@Sneha not sure about this one: what exactly needs to be changed, if anything? The current production Commons already has indented boxes, and the patch doesn't seem to change that. Or maybe I'm not seeing obvious differences.
Nov 27 2024
Nov 26 2024
Nov 22 2024
- Update the warning copy under "not own work" > q1 option "I don’t know if it is free to share" as shown in the UI.
The sentence "People will be happy to assist you at Wikimedia Commons's Village Pump. Thank you." is in patch set 1, but it doesn't seem to be in the design: https://www.figma.com/design/PSsy485pa5YAiMsUrcoOui/Commons-upload-wizard?node-id=4362-21818&t=xbQmRcRVbbtM3fDv-4
I'll remove it.
Nov 21 2024
All affected DAGs started today, closing.
Nov 20 2024
alis.groupBy('snapshot').count().orderBy('snapshot').toPandas()
snapshot count
0 2024-09-30 24284047
1 2024-10-07 24287195
2 2024-10-14 24290046
3 2024-10-21 24302041
4 2024-10-28 24329950
5 2024-11-04 24339009
Things we could do per wiki:
- [active wikis] compute a precision P based on user feedback, where P = accepted suggestions / ( accepted + rejected suggestions )
accepted = spark.sql("""
    SELECT wiki, COUNT(is_accepted) AS accepted
    FROM event_sanitized.mediawiki_image_suggestions_feedback
    WHERE datacenter!='' AND year>=2022 AND month>0 AND day>0 AND hour<24 AND is_accepted=true
    GROUP BY wiki ORDER BY wiki
""").toPandas()
rejected = spark.sql("""
    SELECT wiki, COUNT(is_rejected) AS rejected
    FROM event_sanitized.mediawiki_image_suggestions_feedback
    WHERE datacenter!='' AND year>=2022 AND month>0 AND day>0 AND hour<24 AND is_rejected=true
    GROUP BY wiki ORDER BY wiki
""").toPandas()
df = accepted.merge(rejected)
df['precision'] = df.accepted / (df.accepted + df.rejected)
df.sort_values('precision', ascending=False)
Nov 19 2024
One issue I see here: if we keep skipping small changes (when runs aren't skipped), then we'll always end up with huge updates, no?
First thought off the top of my head is that Wikipedias get arrays of size 1 with exists|1 boolean tags, while Commons gets arrays of size N with Wikidata item|score ones, which may be subject to higher variation depending on Wikidata and on how those scores are computed.
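To make that contrast concrete, here is a purely illustrative pair of tag arrays; the QIDs, scores, and the recommendation.image prefix for the Wikipedia case are assumptions, not real data:

```python
# Illustrative only, values made up.
# A Wikipedia article typically carries a single boolean-style weighted tag:
wikipedia_tags = ['recommendation.image/exists|1']  # tag prefix is an assumption

# A Commons file can carry N scored tags, which shift whenever Wikidata or the
# score computation in commonswiki_file.py changes:
commons_tags = [
    'image.linked.from.wikidata.p18/Q42|87',
    'image.linked.from.wikidata.p373/Q42|60',
    'image.linked.from.wikipedia.lead_image/Q42|95',
]
```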