Design and merge the new tables of file tables
Open, Medium, Public

Description

There will be three new tables:

  • file
  • filerevision
  • deleted_files
  • (more?)

The details of the schema need to be hashed out, added, and merged, preferably with a proof of concept (POC) so that reads and writes can be tried locally to see how it looks.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Ladsgroup triaged this task as Medium priority. Jun 20 2024, 10:26 PM
Ladsgroup moved this task from Triage to In progress on the DBA board.

deleted_files

Note that we currently do not use a table to store deleted pages. One of the solutions in T20493: RFC: Unify the various deletion systems represents deleted pages using a single bit field, so there is no need for a deleted pages (or archive/deleted revisions) table. Similarly, we can use a bit field to indicate whether a file is deleted. This would also have the benefit of keeping the (upcoming) file ID across deletion and undeletion.
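To make the bit-field idea concrete, here is a minimal sketch in Python. It assumes the file_delete column from the parent task is treated as a bit field; the constant FILE_DELETED, its bit value, and the helper names are hypothetical and not taken from any patch.

```python
# Hypothetical bit value; the real encoding has not been decided.
FILE_DELETED = 0x1


def delete_file(row: dict) -> dict:
    # Set the deleted bit instead of moving the row to another table.
    return {**row, "file_delete": row["file_delete"] | FILE_DELETED}


def undelete_file(row: dict) -> dict:
    # Clear the deleted bit; file_id is never touched.
    return {**row, "file_delete": row["file_delete"] & ~FILE_DELETED}


row = {"file_id": 42, "file_name": "Example.jpg", "file_delete": 0}
deleted = delete_file(row)
restored = undelete_file(deleted)
```

Because the row stays in place, file_id is the same before deletion, while deleted, and after undeletion.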

Per T28741#9912401, we may want a new table to store normalized img_media_type, img_major_mime and img_minor_mime.

Per the parent task, the needed columns are:
file table:

  • file_id
  • file_latest
  • file_name
  • file_type (normalized type)
  • file_delete

filerevision:

  • fr_id
  • fr_file
  • fr_archive_name (if it cannot be generated automatically)
  • fr_size
  • fr_width
  • fr_height
  • fr_bits
  • fr_description_id
  • fr_actor
  • fr_timestamp
  • fr_metadata
  • fr_type (normalized type)
  • fr_deleted (for revdel)
  • fr_sha1 (if we need to keep backwards compatibility)
  • fr_delete (for normal deletion, unless unified with revdel - see T20493)
  • fr_sha256
  • fr_perceptual_hash
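
For trying reads and writes locally, the column lists above could be sketched roughly as follows, using Python's sqlite3 with an in-memory database. The column names come from the lists; the column types, the keys, the subset of filerevision columns shown, and the upload() helper are all assumptions made for illustration and are not the schema from any patch.

```python
import sqlite3

# Column names follow the lists above; types, keys, and the reduced set of
# filerevision columns are assumptions for a local experiment only.
SCHEMA = """
CREATE TABLE file (
    file_id     INTEGER PRIMARY KEY,
    file_latest INTEGER,
    file_name   TEXT NOT NULL UNIQUE,
    file_type   INTEGER,
    file_delete INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE filerevision (
    fr_id        INTEGER PRIMARY KEY,
    fr_file      INTEGER NOT NULL REFERENCES file (file_id),
    fr_size      INTEGER,
    fr_width     INTEGER,
    fr_height    INTEGER,
    fr_timestamp TEXT,
    fr_sha1      TEXT,
    fr_deleted   INTEGER NOT NULL DEFAULT 0
);
"""


def upload(conn, name, size, width, height, timestamp):
    """Hypothetical helper: add a revision and point file_latest at it."""
    row = conn.execute(
        "SELECT file_id FROM file WHERE file_name = ?", (name,)
    ).fetchone()
    if row is None:
        file_id = conn.execute(
            "INSERT INTO file (file_name) VALUES (?)", (name,)
        ).lastrowid
    else:
        file_id = row[0]
    fr_id = conn.execute(
        "INSERT INTO filerevision "
        "(fr_file, fr_size, fr_width, fr_height, fr_timestamp) "
        "VALUES (?, ?, ?, ?, ?)",
        (file_id, size, width, height, timestamp),
    ).lastrowid
    conn.execute(
        "UPDATE file SET file_latest = ? WHERE file_id = ?", (fr_id, file_id)
    )
    return file_id, fr_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = upload(conn, "Example.jpg", 1000, 640, 480, "20240620222600")
second = upload(conn, "Example.jpg", 2000, 800, 600, "20240621000000")
# Read back the current revision by following file_latest.
latest = conn.execute(
    "SELECT fr_size FROM file JOIN filerevision ON fr_id = file_latest "
    "WHERE file_name = ?",
    ("Example.jpg",),
).fetchone()
```

In this sketch a re-upload keeps the same file_id and only adds a filerevision row, so the current version is always reachable through file_latest.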

This might affect some data we sqoop into HDFS and some of how we compute commons impact metrics or similar future metrics. We have to wait until a schema change is proposed to know for sure.

I had to deprioritize this for a bit to deal with the aftermath of the outages. I will get back to it next week.

Change #1091477 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] [WIP] New schema of file tables

https://gerrit.wikimedia.org/r/1091477

Change #1100125 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] [WIP] file: Basic support for writing to the new file tables

https://gerrit.wikimedia.org/r/1100125

Note: the proposed migration path does not seem to functionally separate deletion and revdel. See T20493#10389320 for why this is a bad idea for pages.

I am aware. That's why the patch is WIP.
