This is a somewhat complex problem, so bear with me; I'll try to
describe it as concisely as possible.
Background
I'm in the process of updating a large (20k pages), old (started ca.
2004) MediaWiki site. It was running MediaWiki 1.18.3 and I thought a
good first step would be to update the site to the 1.19.x LTS version,
which I did (with the plan of moving to 1.23.x in March). Extensions
were updated as well. Everything seemed stable, and the pre-upgrade
backups were not kept after a month of no problem reports (big mistake,
I know!). There are nightly backups, but only a week's worth are kept
for storage reasons. However, I have located some very old (3+ years)
database backups.
The site admins use a MW extension called "merge-and-delete" to deal
with spammers. There is a permanent, blocked user called "spammer", and
any time a spammer creates a new user account, the editors use
merge-and-delete to fold that account into the "spammer" account. There
are fewer than 100 real editors on the site, and this process has kept
them from being overwhelmed by the thousands of spammers creating
accounts on the site in recent years.
The Problem
About a month after the 1.19.x update, a problem was discovered.
Somehow, all edits dated 2011 or earlier were altered such that they
are now credited to the "spammer" user rather than the actual user who
made the edits. The cause of the corruption appears to be a bug related
to the merge-and-delete extension and MW 1.19. I'm not seeking help
with that here; I have already turned off and removed the extension.
My question is how to restore the edit history so that edits are
credited to the correct users again. The timestamps and page names of
all edits are still correct; only the user name was corrupted. However,
the only backup old enough to have uncorrupted edit history data is
years old and from a much older version of MW (maybe 1.16), and I need
to fix the problem without losing years of edits.
The Solution?
I can't see any easy fix for this, but I have thought of an approach
that might work. If I write a bot that reads the very old database,
extracts only the edit history dated 2011 or earlier, and then compares
that data to the corrupted live site, matching edits by page name and
timestamp, perhaps it could work its way through the edit history
making corrections. Does this seem plausible? And, if so, any advice on
what to look out for? If anyone has alternative suggestions, I'm up for
entertaining just about any idea on how to fix this.
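
To make the idea concrete, here is a rough Python sketch of the
matching step only. It is not tested against a real wiki. It assumes
I have already dumped (namespace, title, timestamp, user name) rows
from both databases, for example by joining the revision and page
tables on rev_page = page_id and selecting page_namespace, page_title,
rev_timestamp and rev_user_text. The names Edit, plan_corrections and
the cutoff value are my own placeholders, not anything from MediaWiki.
It only builds a list of proposed fixes; it writes nothing.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Edit(NamedTuple):
    """One revision row (hypothetical shape, not a MediaWiki type)."""
    namespace: int
    title: str
    timestamp: str  # 14-digit YYYYMMDDHHMMSS string
    user: str


Key = Tuple[int, str, str]  # (namespace, title, timestamp)


def plan_corrections(
    old_edits: Iterable[Edit],
    live_edits: Iterable[Edit],
    placeholder: str = "spammer",
    cutoff: str = "20120101000000",  # assumed: only touch edits before 2012
) -> Tuple[List[Tuple[Key, str]], List[Key]]:
    """Return (fixes, unresolved) without modifying either database."""
    # Index the old backup by page and timestamp. A value of None marks
    # a key that two different users share, which is too ambiguous to fix.
    old_users: Dict[Key, Optional[str]] = {}
    for e in old_edits:
        if e.timestamp >= cutoff:
            continue
        key = (e.namespace, e.title, e.timestamp)
        if key in old_users and old_users[key] != e.user:
            old_users[key] = None
        else:
            old_users[key] = e.user

    fixes: List[Tuple[Key, str]] = []
    unresolved: List[Key] = []
    for e in live_edits:
        # Only look at old edits currently credited to the placeholder.
        if e.user != placeholder or e.timestamp >= cutoff:
            continue
        key = (e.namespace, e.title, e.timestamp)
        original = old_users.get(key)
        if original is None or original == placeholder:
            unresolved.append(key)  # missing, ambiguous, or already spam
        else:
            fixes.append((key, original))
    return fixes, unresolved
```

Two things I already suspect this gets wrong: pages that were moved or
renamed after the backup will not match on title, and real spam
accounts that were merged after the backup was taken would be
"restored" unless I filter them out by hand. Both cases would need
review before anything is written back to the live site.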
-Steve