
Migrate jackbot from Toolforge GridEngine to Toolforge Kubernetes
Closed, Resolved, Public

Description

Kindly migrate your tool (https://grid-deprecation.toolforge.org/t/jackbot) from Toolforge GridEngine to Toolforge Kubernetes.

Toolforge GridEngine is being deprecated.
See: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/

Please note that a volunteer may perform this migration if this has not been done after some time.
If you have already migrated this tool, kindly mark this as resolved.

If you would rather shut down this tool, kindly do so and mark this as resolved.

Useful Resources:
Migrating Jobs from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
Migrating Web Services from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
Python
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Rebuild_virtualenv_for_python_users

Event Timeline

My apologies if this ticket comes as a surprise to you. In order to ensure WMCS can provide a stable, secure and supported platform, it’s important we migrate away from GridEngine. I want to assure you that while it is WMCS’s intention to shut down GridEngine as outlined in the blog post https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/, a shutdown date for GridEngine has not yet been set. The goal of the migration is to move as many tools as possible onto Kubernetes and to make the transition as smooth as possible for everyone. Once the majority of tools have migrated, discussion on a shutdown date is more appropriate. See T314664: [infra] Decommission the Grid Engine infrastructure.

As noted in https://techblog.wikimedia.org/2022/03/16/toolforge-gridengine-debian-10-buster-migration/ some use cases are already supported by Kubernetes and should be migrated. If your tool can migrate, please do plan a migration. Reach out if you need help or find you are blocked by missing features. Most of all, WMCS is here to support you.

However, it’s possible your tool needs a mixed runtime environment or some other features that aren't yet present in https://techblog.wikimedia.org/2022/03/18/toolforge-jobs-framework/. We’d love to hear of this or any other blocking issues so we can work with you once a migration path is ready. Thanks for your hard work as volunteers and help in this migration!

JackPotte changed the task status from Open to In Progress. (Edited Oct 8 2022, 3:00 PM)

Thank you for this message.
Actually, I had already tried to migrate on 2 March 2022, by testing:

toolforge-jobs run wt2 --command ./WT.sh --image tf-bullseye-std

instead of my current:

jsub -mem 1G -once -quiet -N WT "$HOME/WT.sh"
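For a simple call like this one, the jsub options map onto a toolforge-jobs run call fairly mechanically. The sketch below is a hypothetical helper (jsub_to_toolforge is not part of either tool) that understands only the options used here. It assumes tf-python39 as the image, since the standard image above turned out to lack Python (see wt2.err), and it assumes job names should be lowercase; -mem, -once and -quiet are dropped rather than translated.

```python
import shlex

# jsub options seen in this ticket: those that take a value, and plain flags.
JSUB_VALUE_OPTS = {"-mem", "-N"}
JSUB_FLAG_OPTS = {"-once", "-quiet"}


def jsub_to_toolforge(jsub_line: str, image: str = "tf-python39") -> list[str]:
    """Translate a simple jsub command line into a toolforge-jobs run argv.

    Hypothetical helper: only the options listed above are understood, and
    -mem, -once and -quiet are dropped rather than translated.
    """
    tokens = shlex.split(jsub_line)
    if not tokens or tokens[0] != "jsub":
        raise ValueError("expected a line starting with 'jsub'")
    opts = {}
    i = 1
    while i < len(tokens) and tokens[i].startswith("-"):
        opt = tokens[i]
        if opt in JSUB_VALUE_OPTS:
            if i + 1 >= len(tokens):
                raise ValueError(f"{opt} needs a value")
            opts[opt] = tokens[i + 1]
            i += 2
        elif opt in JSUB_FLAG_OPTS:
            opts[opt] = True
            i += 1
        else:
            raise ValueError(f"unhandled jsub option: {opt}")
    command = " ".join(tokens[i:])  # simple re-join; arguments are not re-quoted
    if not command:
        raise ValueError("no command found after the jsub options")
    # Assumption: job names are lowercase, so the grid name "WT" becomes "wt".
    name = str(opts.get("-N", "job")).lower()
    return ["toolforge-jobs", "run", name, "--command", command, "--image", image]
```

Applied to the jsub line above, this returns ['toolforge-jobs', 'run', 'wt', '--command', '$HOME/WT.sh', '--image', 'tf-python39']; note that $HOME stays literal in the returned list.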

But it didn't do anything, and toolforge-jobs list is still empty afterwards, even today.

The wt2.err contains:

python: not found
python3: not found
git: not found

wt2.out is empty.
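Since the job failed silently, a preflight check at the top of the bot can make such gaps explicit in the .err file. It can only run in an image that ships Python (e.g. tf-python39), so it cannot catch the python/python3 errors above; there a shell check would be needed instead. A minimal sketch, assuming git is the other external tool WT.sh needs (taken from the wt2.err output); missing_tools is a hypothetical helper.

```python
import shutil
from typing import Iterable, Optional

# Assumption: besides Python, WT.sh calls git (it is listed in wt2.err above).
REQUIRED_TOOLS = ("git",)


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS,
                  path: Optional[str] = None) -> list[str]:
    """Return the required executables that shutil.which cannot find.

    `path` overrides the PATH search string (None means use $PATH).
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]
```

A job would print missing_tools() to stderr and exit non-zero if the list is not empty, instead of failing later in the middle of a run.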

So I'll have to dig a little bit harder...

Following https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python, I got:

toolforge-jobs run bootstrap-venv --command "cd $PWD && ./bootstrap_venv.sh" --image tf-python39 --wait

ERROR: Could not find a version that satisfies the requirement apturl==0.5.2 (from versions: none)

I may need to change my app to fit this platform...

After refreezing requirements.txt it's another story:

ERROR: Cannot install -r JackBot/requirements.txt (line 18) and PyYAML==5.4.1 because these package versions have conflicting dependencies.

Let's try v6.0: https://pypi.org/project/PyYAML/#history

After solving my pip conflicts, everything works on my PC, but my toolforge-jobs run on the server crashes on pycairo 1.16.2.

So I updated it to its latest version, 1.21.0, but the crash persists. I'll have to downgrade it to match the server's dependencies.

1.15.6 crashes on the server too.

1.13.0 crashes locally.

This was a Pywikipedia dependency, so I should have started by upgrading the framework...

With Pywikibot 7.7.0, I removed the useless pip dependencies that were crashing, and then it crashed on dbus-python.

Then I managed to get the initialization working by removing it from the dependencies:

toolforge-jobs run bootstrap-venv --command "cd $PWD && ./bootstrap_venv.sh" --image tf-python39 --wait
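The bootstrap step boils down to creating a virtualenv with the job image's own Python and installing requirements.txt into it. A minimal Python equivalent, assuming a venv directory named pyvenv (a hypothetical name; the actual bootstrap_venv.sh from the help page may differ) and that it is launched through toolforge-jobs run with --image tf-python39, so the packages are built for the same interpreter the bot will use:

```python
import subprocess
import venv
from pathlib import Path


def bootstrap_venv(venv_dir="pyvenv", requirements="requirements.txt", run=True):
    """Create a virtualenv and install pinned requirements into it.

    Hypothetical stand-in for bootstrap_venv.sh. Returns the pip commands,
    so they can be inspected without side effects by passing run=False.
    """
    venv_dir = Path(venv_dir)
    pip = str(venv_dir / "bin" / "pip")  # POSIX venv layout, as on Toolforge
    commands = [
        [pip, "install", "--upgrade", "pip", "wheel"],
        [pip, "install", "-r", str(requirements)],
    ]
    if run:
        # clear=True rebuilds from scratch, e.g. after switching images.
        venv.create(venv_dir, with_pip=True, clear=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first pip failure
    return commands
```

For this tool, requirements would be JackBot/requirements.txt, the path shown in the pip error above.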

But my bot crashes afterwards because of a missing dependency: No module named 'requests'.
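One possible cause (an assumption, not confirmed by the log) is that the bot job ran a different interpreter than the one inside the virtualenv, for example a plain python3 instead of the venv's bin/python. A small diagnostic printed at the start of the job would show this in its .out file; the helper names are hypothetical.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when this interpreter was started from a venv."""
    return sys.prefix != sys.base_prefix


def missing_modules(names) -> list[str]:
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def describe_environment(names=("requests", "pywikibot")) -> str:
    """One-line summary worth printing at the start of a job's log."""
    missing = missing_modules(names)
    return (f"python={sys.executable} venv={in_virtualenv()} "
            f"missing={','.join(missing) or 'none'}")
```

If this reports venv=False, pointing the job's --command at the venv's own python is the first thing to try.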

And if I run pip install in each job, it crashes on: Running setup.py install for dbus-python: finished with status 'error'.

(temporarily resolved by removing dbus-python==1.3.2 from requirements.txt with vim)
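Rather than editing requirements.txt by hand after each refreeze, the OS-level packages that a local pip freeze apparently picked up (apturl, dbus-python and pycairo in this ticket) could be filtered out before the bootstrap. A minimal sketch; filter_requirements is hypothetical and the exclusion list only reflects the packages that failed here.

```python
import re

# Assumption: these are the packages that failed to install in the job image
# in this ticket; extend the set as needed.
EXCLUDED = {"apturl", "dbus-python", "pycairo"}


def _normalize(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_" and "." become "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def filter_requirements(lines, excluded=EXCLUDED) -> list[str]:
    """Drop requirement lines whose project name is in `excluded`.

    Comments and blank lines are kept unchanged.
    """
    skip = {_normalize(n) for n in excluded}
    kept = []
    for line in lines:
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line.strip())
        if match and _normalize(match.group(0)) in skip:
            continue
        kept.append(line)
    return kept
```

The filtered lines could be written to a separate requirements file used only on Toolforge, keeping the local one intact.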

Now my bot works with toolforge-jobs run, but crashes when login is required.

In the past, logging in on the host machine was enough for the jsub job. I'm going to have to look for documentation on that...

I've tried:

python3 pwb.py login -all -pass

from the host machine: same problem.

Then, following https://www.mediawiki.org/wiki/Manual:Pywikibot/BotPasswords/fr, it can log in without any password prompt, but in the pod it ends after five minutes with:

ERROR: Logged in as .... instead of 'JackBot'. Forcing re-login.
WARNING: No user is logged in on site wiktionary.fr
EOFError

Adding site.login() in the code does the same.
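In a Kubernetes pod nobody is attached to stdin, so any interactive password prompt hits end-of-input, and Python's input() raises EOFError in exactly that case. That fits the log above, although the ticket does not show which call raised it. A hypothetical guard (prompt_or_fail is not a Pywikibot function) illustrating the idea of failing with an explicit message instead:

```python
import sys
from typing import Optional, TextIO


class NonInteractiveError(RuntimeError):
    """Raised instead of waiting on a prompt that nobody can answer."""


def prompt_or_fail(prompt: str, stream: Optional[TextIO] = None) -> str:
    """Read one answer from `stream`, refusing if it is not a terminal."""
    stream = sys.stdin if stream is None else stream
    if stream is None or not stream.isatty():
        raise NonInteractiveError(
            f"cannot ask {prompt!r}: stdin is not interactive; "
            "credentials must come from the config files instead")
    sys.stderr.write(prompt)
    sys.stderr.flush()
    line = stream.readline()
    if not line:  # readline() returns '' at end of input
        raise EOFError(prompt)
    return line.rstrip("\n")
```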

I think this is exactly the topic of https://phabricator.wikimedia.org/T248471 and https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_165#Pywikibot_login_doesn't_read_password_file._Help/more_doc?_(on_Toolforge)

So I tried https://meta.wikimedia.org/wiki/Special:OAuthConsumerRegistration/propose/oauth2, but it requires a callback URL for my cron (which has no HTTP server), unless I check a box restricting the consumer to my own account (so I would have to log in again as the bot).

https://doc.wikimedia.org/pywikibot/stable/search.html?q=oauth2 is empty too

https://www.mediawiki.org/wiki/Manual:Pywikibot/login.py describes the process, but not completely: ImportError: mwoauth is not installed

So I tried to follow this guide locally to create a user-config.py for the pod, but got:
Error 10: Consumer is owner-only

Then, I've discovered https://www.mediawiki.org/wiki/Manual:Pywikibot/OAuth and it returns: UserWarning: config.authenticate["*.wiktionary.org"] has invalid value

Adding each language made it work, but with the same problem as with the bot password after a few minutes (locally or on the server):

ERROR: Logged in as .... instead of 'JackBot'. Forcing re-login.
WARNING: No user is logged in on site wiktionary.fr
EOFError

The Pywikibot framework seems to have no OAuth 2.0 refresh-token handling.
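For reference, refresh-token handling in OAuth 2.0 (RFC 6749, section 6) means noticing that the access token is about to expire and POSTing the refresh token to the provider's token endpoint to obtain a new one. A generic sketch of those two pieces, not tied to Pywikibot or to any specific Wikimedia endpoint (the endpoint URL and the HTTP call itself are deliberately left out; the helper names are hypothetical):

```python
import time
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class Token:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix time after which access_token is rejected


def needs_refresh(token: Token, now: Optional[float] = None,
                  margin: float = 60.0) -> bool:
    """True if the access token expires within `margin` seconds."""
    now = time.time() if now is None else now
    return now >= token.expires_at - margin


def refresh_request_body(token: Token, client_id: str, client_secret: str) -> str:
    """Form-encoded body of an RFC 6749 section 6 refresh-token request.

    Client credentials are sent in the body here; RFC 6749 also allows
    HTTP Basic authentication instead.
    """
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": token.refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    })
```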

In conclusion, https://www.mediawiki.org/wiki/Manual:Pywikibot/BotPasswords/fr works (Pywikibot can re-login) if my user-config.py uses the same account for all wikis.
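The working setup can be sketched as generating the two Pywikibot files: a user-config.py that names the same account for every wiki and sets password_file, and a user-password.py holding a BotPassword entry. The variable names (family, mylang, usernames, password_file, BotPassword) follow the Pywikibot BotPasswords manual; the helpers, the wiki list and the BotPassword suffix "jobs" are hypothetical placeholders.

```python
def render_user_config(username: str, family: str, langs,
                       password_file: str = "user-password.py") -> str:
    """Build user-config.py text using one account for every listed wiki."""
    langs = list(langs)
    if not langs:
        raise ValueError("at least one language code is needed")
    lines = [f"family = {family!r}", f"mylang = {langs[0]!r}"]
    for lang in langs:
        # Same account on every wiki, which is what made re-login work here.
        lines.append(f"usernames[{family!r}][{lang!r}] = {username!r}")
    lines.append(f"password_file = {password_file!r}")
    return "\n".join(lines) + "\n"


def render_password_file(username: str, suffix: str, password: str) -> str:
    """Build user-password.py text holding a single BotPassword entry."""
    return f"({username!r}, BotPassword({suffix!r}, {password!r}))\n"
```

For example, render_user_config("JackBot", "wiktionary", ["fr", "en"]) yields one usernames line per language, all set to 'JackBot'.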

But toolforge-jobs run reports a failure, maybe because my script should return something more at the end...

Now it says "timed out 300 seconds waiting for job", so I'll have to test without --wait, with a tail -f on the logs.
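Without --wait, the job's progress can be followed from its log files (the framework wrote wt2.out and wt2.err earlier). A rough Python stand-in for tail -f with a timeout and an optional marker line to stop at; the default 300-second timeout only echoes the message above, and printing a marker such as "DONE" at the end of the script is an assumption, not something the bot currently does.

```python
import time


def follow(path, timeout: float = 300.0, poll: float = 1.0,
           stop=None, from_start: bool = False):
    """Yield lines appended to `path`, like `tail -f`, until timeout or `stop`.

    from_start=False begins at the current end of the file (like tail -n 0 -f).
    A line written in several pieces may be yielded in parts.
    """
    deadline = time.monotonic() + timeout
    with open(path, encoding="utf-8", errors="replace") as fh:
        if not from_start:
            fh.seek(0, 2)  # jump to end of file; offset 0 is allowed in text mode
        while time.monotonic() < deadline:
            line = fh.readline()
            if line:
                yield line.rstrip("\n")
                if stop is not None and stop in line:
                    return
            else:
                time.sleep(poll)  # nothing new yet; wait before polling again
```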

Now I can't use crontab -e anymore, because it automatically adds jsub after saving.

All my jsub crons have been moved to toolforge-jobs run --schedule
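Moving a crontab entry to toolforge-jobs run --schedule is mostly a matter of separating the five schedule fields from the command and dropping the jsub wrapper. A hypothetical converter for lines shaped like the jsub call earlier in this task; @daily-style macros are not handled, tf-python39 is assumed as the image, and the 0 3 * * * schedule in the test is made up.

```python
import shlex


def cron_to_toolforge(line: str, name: str, image: str = "tf-python39") -> list[str]:
    """Turn 'm h dom mon dow [jsub opts] CMD' into a toolforge-jobs run argv."""
    fields = line.split(None, 5)
    if len(fields) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = " ".join(fields[:5])
    tokens = shlex.split(fields[5])
    if tokens and tokens[0] == "jsub":
        # Skip jsub and its options; -mem and -N also consume their value.
        i = 1
        while i < len(tokens) and tokens[i].startswith("-"):
            i += 2 if tokens[i] in {"-mem", "-N"} else 1
        tokens = tokens[i:]
    if not tokens:
        raise ValueError("no command left after stripping jsub")
    command = " ".join(tokens)  # simple re-join; arguments are not re-quoted
    return ["toolforge-jobs", "run", name, "--command", command,
            "--image", image, "--schedule", schedule]
```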
