Krenair (Alex Monk)
Wikimedia volunteer

Projects (69)

User Details

User Since
Oct 3 2014, 2:34 PM (534 w, 3 h)
Availability
Available
IRC Nick
Krenair
LDAP User
Alex Monk
MediaWiki User
Krenair

I am a Wikimedia volunteer helping in various technical ways. These days it's usually Beta Cluster, Cloud VPS, or Operations-related labs puppet migrations. Since 2012 I've spent significant amounts of time on MediaWiki development, software deployments to the Wikimedia cluster, OTRS (responding to email sent to addresses such as info-en@wikimedia.org), and various other things.

Some of my old VisualEditor and other work (2014-2016) can be found under @AlexMonk-WMF instead.

I have opinions on things, which do not necessarily represent those of any organisation I am, have previously been, or will in the future be affiliated with.

Recent Activity

Jun 28 2024

Dzahn awarded T152882: Many misc wikis lack mobile domains a Like token.
Jun 28 2024, 5:04 PM · Patch-For-Review, Traffic-Icebox, SRE, DNS, Mobile

Jan 8 2024

Pppery awarded T35429: Make the title blacklist allow auto creations for global accounts a Dislike token.
Jan 8 2024, 3:08 AM · WMF-deploy-2015-07-14_(1.26wmf14), TitleBlacklist

Nov 18 2022

sbassett awarded T109147: Malicious meta admin can add javascript to https://office.wikimedia.org/api/ . Move api listing off wiki a Like token.
Nov 18 2022, 2:59 PM · Security, Wikimedia-Portals, Discovery-ARCHIVED, SRE, Vuln-XSS, Wikimedia-Site-requests

Jun 29 2022

Krenair updated the task description for T308013: Assign SPDX headers to puppet.git.
Jun 29 2022, 6:54 PM · Patch-For-Review, Infrastructure-Foundations, SRE

Apr 19 2022

Krenair added a comment to T305831: Cloud VPS: evaluate if VM name global uniqueness enforcement can be dropped.

Wow, vm-name.wmflabs must be from the extremely early days because I don't remember anything before vm-name.pmtpa.wmflabs, and that was in 2012.

Apr 19 2022, 9:20 PM · Cloud-VPS, cloud-services-team (Kanban)

Dec 28 2021

Krenair added a comment to T298353: problem with let's encrypt cert for star.tools.wmflabs.org.

I just went to have a look and it appears the cert in /var/lib/acme-chief/certs/tools-legacy/live/rsa-2048.crt just got renewed like a minute ago. Majavah, I see you're logged in; did you do some magic?

Dec 28 2021, 8:31 PM · Acme-chief, Toolforge, cloud-services-team (Kanban)

Sep 22 2021

Krenair awarded T291398: Turn usage of AJAX interface to API Modules (1) a Barnstar token.
Sep 22 2021, 12:15 AM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), Patch-For-Review, Platform Engineering Code Jam-2021, Platform Team Workboards (MW Expedition), Technical-Debt, Web-Team-Backlog (Tracking), Collection
Krenair awarded T291399: Turn usage of AJAX interface to API Modules (2) a Barnstar token.
Sep 22 2021, 12:15 AM · MW-1.38-notes (1.38.0-wmf.4; 2021-10-12), Patch-For-Review, Platform Team Workboards (MW Expedition), Platform Engineering Code Jam-2021, Technical-Debt, Collection, User-xSavitar

May 29 2021

Krenair awarded T283980: Phacility (Maintainer of Phabricator) is winding down. Upstream support ending. a The World Burns token.
May 29 2021, 11:15 PM · Release-Engineering-Team (Seen), User-Matthewrbowker, Phabricator

May 26 2021

Krenair awarded T280034: deploy Let's Encrypt certificates for additional fundraising services a Party Time token.
May 26 2021, 1:47 PM · fundraising-tech-ops

May 24 2021

Krenair added a comment to T280400: Change the user-visible domain of OTRS wiki.

Martin: Sounds good, thanks. If you re-run your WebAuthn query now you should get a more convenient result :)

May 24 2021, 6:22 PM · Patch-For-Review, User-Urbanecm, Wiki-Setup (Rename), Znuny

May 22 2021

Krenair added a comment to T283400: Rename OTRS IRC channels.

If we're going to do it, it should probably be VRT rather than VRTS.

May 22 2021, 1:15 PM · Znuny, wikimedia-irc-libera

May 12 2021

Krenair added a comment to T282624: Limit IA granting/revoking to stewards only.

I would like to know why bureaucrats should not be allowed to revoke IA rights.

May 12 2021, 3:05 PM · Community-consensus-needed, Tech Ambassadors & Translators, [DEPRECATED] wdwb-tech, Chinese-Sites, Wikidata, Serbian-Sites, Commons, Wiktionary-fr, Stewards-and-global-tools, User-notice, Trust-and-Safety, Wikimedia-Site-requests

Apr 22 2021

Krenair added a comment to T280693: Update interwiki map on Meta.

Second one is part of and dependent on T280400

Apr 22 2021, 11:04 PM · MediaWiki-Interwiki, Znuny

Apr 19 2021

Krenair renamed T280400: Change the user-visible domain of OTRS wiki from Set up a new DNS name for OTRS wiki to Change the user-visible domain of OTRS wiki.
Apr 19 2021, 5:25 PM · Patch-For-Review, User-Urbanecm, Wiki-Setup (Rename), Znuny
Krenair assigned T280400: Change the user-visible domain of OTRS wiki to Keegan.
Apr 19 2021, 4:34 PM · Patch-For-Review, User-Urbanecm, Wiki-Setup (Rename), Znuny
Krenair added a comment to T280251: Upgrade mysql on db1107 (m2 db master).

Sorry, just saw this. I might not be the best of contacts for OTRS, am just an ordinary user with technical knowledge, not an admin or anything.

Apr 19 2021, 10:36 AM · SRE-tools, Recommendation-API, Performance-Team, Znuny, DBA

Apr 18 2021

Krenair added a comment to T280400: Change the user-visible domain of OTRS wiki.

tl;dr: I think we can do a wiki domain rename here:

  • add new name to dns
  • add new name to apache config
  • add to staticMappings
  • change these in mediawiki-config:
tests/multiversion/MWMultiVersionTest.php:                      [ 'otrs_wikiwiki', 'otrs-wiki.wikimedia.org' ],
tests/urls.txt:https://otrs-wiki.wikimedia.org/wiki/Main_page
wmf-config/CommonSettings.php:          'otrs-wiki.wikimedia.org',
wmf-config/CommonSettings.php:          'otrs-wiki.m.wikimedia.org',
wmf-config/InitialiseSettings.php:      'otrs_wikiwiki' => '//otrs-wiki.wikimedia.org',
wmf-config/InitialiseSettings.php:      'otrs_wikiwiki' => 'https://otrs-wiki.wikimedia.org',
wmf-config/InitialiseSettings.php:      'otrs_wikiwiki' => 'OTRS Wiki',
wmf-config/logos.php:   'otrs_wikiwiki' => '/static/images/project-logos/otrs_wikiwiki.png',
  • update the interwiki map on meta appropriately and go through the dumpInterwiki process
  • find and update any RB/Parsoid/etc. config needed
  • set up redirects for old name in apache
Apr 18 2021, 1:51 PM · Patch-For-Review, User-Urbanecm, Wiki-Setup (Rename), Znuny
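The last step in the list above, redirecting the old name, amounts to a host mapping that keeps the rest of the URL intact so deep links keep working. A minimal Python sketch of that mapping follows; the comment does not name the new domain, so `new-wiki` is a hypothetical placeholder, and `redirect_target` is an illustrative helper, not part of mediawiki-config or the Apache setup.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping: "new-wiki" is a placeholder for the real new name.
# Both the desktop and mobile hosts from the config lines above are covered.
HOST_MAP = {
    "otrs-wiki.wikimedia.org": "new-wiki.wikimedia.org",
    "otrs-wiki.m.wikimedia.org": "new-wiki.m.wikimedia.org",
}


def redirect_target(url, host_map=HOST_MAP):
    """Return the URL an old-name request should be redirected to, or None."""
    parts = urlsplit(url)
    new_host = host_map.get(parts.hostname or "")
    if new_host is None:
        return None  # Not one of the old names: serve normally.
    # Preserve an explicit port; path, query and fragment stay unchanged.
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

In production the equivalent rule would live in the Apache config, as the list says; the sketch only makes the "change the host, keep everything else" behaviour explicit.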
Krenair added a comment to T280400: Change the user-visible domain of OTRS wiki.

Renaming a wiki is extremely complex and we have done it only once (be-x-old to be-tarask, and even that one is not properly finished yet). It's still quite a mess (the renaming script is probably also terribly broken by now), and all renames are blocked on T172035: Blockers for Wikimedia wiki domain renaming.

Apr 18 2021, 1:27 PM · Patch-For-Review, User-Urbanecm, Wiki-Setup (Rename), Znuny

Apr 13 2021

Krenair changed the status of T127570: Rename be_x_oldwiki database to be_taraskwiki from Resolved to Declined.
Apr 13 2021, 5:42 PM · SRE, DBA
Krenair added a comment to T279303: Migrate OTRS CE 6 to Znuny LTS fork.

Looks like it's all okay. Thanks Alexandros

Apr 13 2021, 8:31 AM · User-notice-archive, SRE, Security, Znuny
Krenair added a comment to T275294: ((OTRS)) Community Edition 6 is end-of-life; no FOSS replacement provided.

I can be around during that window I think.

Thanks!

Apr 13 2021, 7:06 AM · User-notice-archive, SRE, Security, Znuny

Apr 10 2021

Krenair added a comment to T279827: Ambiguous user-visible string "owner" mashes concepts of Assignee and Author.

I would assume the owner of a task is the assignee, not the author.

Apr 10 2021, 2:56 PM · Upstream, Phabricator (Upstream), Voice & Tone

Apr 7 2021

Krenair added a comment to T279486: Web proxies are resolved to internal IPs outside of WMCS network.

https://gerrit.wikimedia.org/r/c/openstack/horizon/wmf-proxy-dashboard/+/609859/1/wikimediaproxydashboard/views.py#b240 looks possibly suspect, as it appears to conflate the proxy's external IP with the backend's internal IP. But it's from July; maybe it didn't get rolled out until recently?

Apr 7 2021, 12:07 AM · cloud-services-team (Kanban), Cloud-VPS

Apr 6 2021

Krenair awarded T278390: Toolforge root for Majavah a Like token.
Apr 6 2021, 11:15 PM · cloud-services-team (Kanban), Toolforge
Krenair added a comment to T275294: ((OTRS)) Community Edition 6 is end-of-life; no FOSS replacement provided.

You might be interested in something else we may need to do in the near future, move OTRS wiki to a new name. It doesn't have the Wikidata problem blocking it, so perhaps it's feasible? Anyway, that's a ticket to look out for eventually and it can be discussed there when it's created.

Apr 6 2021, 9:05 PM · User-notice-archive, SRE, Security, Znuny

Apr 5 2021

Krenair added a comment to T275294: ((OTRS)) Community Edition 6 is end-of-life; no FOSS replacement provided.

I would prefer to wait a week until this is published in Tech/News; I'm meeting with the admins and we're discussing and planning a larger re-branding effort. For now the migration is simply a re-branding of our software internals, and communicating this change needs to be done carefully to avoid confusion.

So, April 20th? Fine by me.

@akosiaris apologies, I was referring to waiting to publish the notice in this week's Tech/News instead of last week's. I'd still like to do this on 13 April. Do you have a time window we can schedule this for, and would you like me to make a task for the migration?

Oh, sorry, my bad. I had reserved 2 hours on the 13th, from 07:00 UTC to 09:00 UTC, for the main migration. Judging from the input from Znuny, it should be sufficient, major issues aside. If any minor issues show up after the migration, we can handle them outside that window, of course.

+1 on the task. We probably want to have the most technically inclined agents aware and able to quickly provide input.

Apr 5 2021, 3:28 PM · User-notice-archive, SRE, Security, Znuny

Apr 2 2021

Krenair added a comment to T218729: Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster.

Thanks @Majavah

Apr 2 2021, 12:52 PM · Cloud-VPS (Debian Jessie Deprecation), Beta-Cluster-Infrastructure

Mar 26 2021

Andrew awarded T218729: Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster a Yellow Medal token.
Mar 26 2021, 1:22 PM · Cloud-VPS (Debian Jessie Deprecation), Beta-Cluster-Infrastructure

Feb 23 2021

Krenair added a comment to T275453: addWiki.php warns Deprecated: Premature access to HookContainer, ObjectFactory and ServiceContainer.

This brings back memories. Think it's long past time we get some tests added so this script stops getting broken multiple times a year.

Feb 23 2021, 8:03 PM · MW-1.37-notes (1.37.0-wmf.14; 2021-07-12), Platform Team Workboards (Clinic Duty Team), Wiki-Setup, MediaWiki-extensions-WikimediaMaintenance

Feb 22 2021

Krenair added a comment to T275294: ((OTRS)) Community Edition 6 is end-of-life; no FOSS replacement provided.

@TonyBallioni's original comment before deletion:

Is there any non-open source proprietary software that will function well and we wouldn’t be picking for ideological reasons?

I’d strongly support one of those.

Feb 22 2021, 12:16 AM · User-notice-archive, SRE, Security, Znuny

Feb 19 2021

Krenair awarded T274953: Access group for Gitlab contractors a Dislike token.
Feb 19 2021, 8:20 PM · GitLab, User-brennen, SRE, SRE-Access-Requests

Feb 6 2021

Krenair added a comment to T273956: acme-chief sometimes doesn't refresh certificates because it ignores SIGHUP.

I think I've seen acme-chief not responding to SIGHUP as expected before in deployment-prep, I worry this could happen in prod too.

Feb 6 2021, 10:03 AM · User-bd808, User-dcaro, Acme-chief, cloud-services-team (Kanban)
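For context on what "responding to SIGHUP" usually means: a daemon installs a handler that records the request, and its main loop performs the reload the next time it checks. If the handler is missing, SIGHUP's default action terminates the process; if the loop never gets back to the check, the signal looks ignored. The following is a generic Python sketch of that pattern, not acme-chief's actual code; the class and method names are hypothetical.

```python
import signal


class CertReloader:
    """Hypothetical daemon skeleton that reloads certificates on SIGHUP."""

    def __init__(self):
        self.reload_requested = False
        self.reloads = 0
        # Install the handler; without it, SIGHUP's default action
        # terminates the process instead of triggering a reload.
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame):
        # Keep the handler minimal: just record the request.
        self.reload_requested = True

    def poll(self):
        """Called periodically from the main loop."""
        if self.reload_requested:
            self.reload_requested = False
            self.reload_certificates()

    def reload_certificates(self):
        # Placeholder for re-reading config and certificate files.
        self.reloads += 1
```

Deferring the work to `poll()` keeps the handler safe, but it also means a reload only happens as often as the main loop calls it, which is one place to look when a reload seems not to happen.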

Feb 5 2021

hashar awarded T138672: Duplicate LDAP user for cn=smccandlish a Love token.
Feb 5 2021, 5:54 PM · User-bd808, SRE, cloud-services-team (Kanban), LDAP-Access-Requests, Gerrit, LDAP, Phabricator

Jan 30 2021

Krenair added a comment to T258660: WebAuthn: signed in {some bogus number} times with this key.

Also ran into this: after my first login I saw over 100 logins on this page. It either needs removal or clarification.

Jan 30 2021, 8:45 PM · MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), MediaWiki-extensions-OATHAuth

Jan 17 2021

Krenair added a comment to T271778: Issues with acme-chief cert rotation on deployment-prep, 2021-01-12.

Unlikely

I was asking because T267006#6624466 (deployment-cache-upload06 is upload.beta.wmflabs.org I think?) and the problems started I think around T267858. But could be just coincidence.

Jan 17 2021, 7:33 PM · Beta-Cluster-Infrastructure, Acme-chief

Jan 14 2021

Krenair added a comment to T271778: Issues with acme-chief cert rotation on deployment-prep, 2021-01-12.

Unlikely

Jan 14 2021, 1:27 PM · Beta-Cluster-Infrastructure, Acme-chief
Krenair added a comment to T271808: The certificate for upload.beta.wmflabs.org expired on January 12, 2021.
root@deployment-cache-upload06:/etc/acmecerts/unified/live# openssl x509 -dates -noout -in rsa-2048.crt
notBefore=Jan 12 01:23:09 2021 GMT
notAfter=Apr 12 01:23:09 2021 GMT
root@deployment-cache-upload06:/etc/acmecerts/unified/live# touch /srv/trafficserver/tls/etc/ssl_multicert.config
root@deployment-cache-upload06:/etc/acmecerts/unified/live# systemctl reload trafficserver-tls.service

It should be up and running now. I'm not really familiar with the cloud puppetization, but this doesn't mimic production behaviour.

Jan 14 2021, 12:31 AM · SRE, Traffic, HTTPS, Beta-Cluster-reproducible
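The `openssl x509 -dates -noout` output above shows a 90-day validity window (Jan 12 to Apr 12, 2021). A small helper built on the standard library's `ssl.cert_time_to_seconds` can turn those lines into "days remaining" for a quick check; the function names are hypothetical and this is a sketch, not part of any existing tooling.

```python
import ssl
import time


def parse_openssl_date(line):
    """Turn 'notAfter=Apr 12 01:23:09 2021 GMT' into seconds since the epoch."""
    _, _, value = line.partition("=")
    return ssl.cert_time_to_seconds(value.strip())


def days_remaining(not_after_line, now=None):
    """Days left before expiry; negative once the certificate has expired."""
    if now is None:
        now = time.time()
    return (parse_openssl_date(not_after_line) - now) / 86400
```

Measured from the notBefore date shown above, the certificate has exactly 90 days of validity.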

Jan 12 2021

Dzahn awarded T252199: Stop using letsencrypt::cert::integrated a Like token.
Jan 12 2021, 9:32 PM · cloud-services-team (Kanban), Mail
Krenair added a comment to T271778: Issues with acme-chief cert rotation on deployment-prep, 2021-01-12.

re acme-chief part: It looks like the same thing happened to the mx and wikibase certs too. I haven't checked whether those were updated on the machines that serve them.
Also spotted various prod ncredir certs in /etc/acme-chief/config.yaml that can't be doing any good.

Jan 12 2021, 2:37 AM · Beta-Cluster-Infrastructure, Acme-chief
Krenair updated the task description for T271778: Issues with acme-chief cert rotation on deployment-prep, 2021-01-12.
Jan 12 2021, 2:34 AM · Beta-Cluster-Infrastructure, Acme-chief
Krenair created T271778: Issues with acme-chief cert rotation on deployment-prep, 2021-01-12.
Jan 12 2021, 2:34 AM · Beta-Cluster-Infrastructure, Acme-chief
Krenair edited projects for T271644: Fatal exception undeleting a file on Commons: rev_page field must not be 0!, added: MediaWiki-Page-deletion; removed MediaWiki-Revision-deletion.
Jan 12 2021, 2:13 AM · MW-1.37-notes (1.37.0-wmf.7; 2021-05-25), Patch-For-Review, Platform Team Workboards (Clinic Duty Team), MediaWiki-Page-deletion, Wikimedia-production-error, Commons
Krenair added a comment to T207372: Add simple script for account creation.

well, ideally it would've been a script applicable to all installs of the package, not just in wikimedia puppet.git

Jan 12 2021, 1:58 AM · Patch-Needs-Improvement, Acme-chief

Dec 5 2020

Krenair awarded T260614: Phase out use of .wmflabs tld a Burninate token.
Dec 5 2020, 11:57 PM · Cloud-VPS, cloud-services-team (Kanban)

Dec 1 2020

Krenair reopened T268978: String vs Binary issues while running the puppet compiler as "Open".

reopening too to ensure this gets looked at

Dec 1 2020, 2:00 AM · SRE, Puppet CI
Krenair set Security to security-bug on T268978: String vs Binary issues while running the puppet compiler.

Protecting as security issue due to presence of what appears to be a Jenkins API token in the task description, based on https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Catalog_compiler_local_run_(pcc_utility)

Dec 1 2020, 2:00 AM · SRE, Puppet CI

Nov 30 2020

Krenair added a comment to T268948: Add editprotected permission for interface-admin.

@Urbanecm: I can see it being interpreted either way - at the time this task was named for Wikipedias :) But I don't mind

Nov 30 2020, 3:48 AM · MediaWiki-General

Nov 29 2020

Krenair added a project to T268948: Add editprotected permission for interface-admin: Wikimedia-Site-requests.
Nov 29 2020, 6:10 PM · MediaWiki-General
Krenair added a comment to T268926: tools-sgeexec-0908.tools.eqiad1.wikimedia.cloud is misbehaving.

I just went and checked on this again and found I can SSH in, tasks are running on it, SGE has cleared the alarm/unreachable flags, based on the prometheus data it came back at 02:12:25 (after having stopped at 21:36:25), and according to uptime it hasn't been restarted.

Nov 29 2020, 6:04 PM · Tools
Krenair added a comment to T268893: [tools-sgecron-01] The server is getting out of space, daemon.log is growing a lot.

sudo service webservicemonitor restart has shut it up. Broken connection to LDAP/SSSD or something? I notice sssd has only been running since Tue 2020-11-24 18:06:07 UTC; 4 days ago, and zgrep collector-runner /var/log/syslog.3.gz | grep Traceback -C3 | head -n 300 reveals these exceptions took off only 34 seconds later. That file also shows puppet had just applied a config change and restarted sssd. Maybe we're missing a subscribe/notify relationship in puppet to have it restart webservicemonitor as well, or if that's awkward (do we still have some old sssd alternative lurking somewhere that's conditional in puppet?) then maybe we can make it detect this through monitoring the existence of some always-existing LDAP user, and when that fails, crash to have systemd restart it.

Nov 29 2020, 5:58 PM · Toolforge, cloud-services-team (Kanban)
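The monitoring idea at the end of this comment could look roughly like this: periodically resolve an account that should always exist through NSS (backed by SSSD/LDAP here), and exit with a failure status if it cannot be found, so that a unit configured with systemd's `Restart=on-failure` gets restarted. The canary account name and both helper functions are hypothetical.

```python
import pwd
import sys

# Hypothetical canary: substitute an account guaranteed to exist in LDAP.
CANARY_USER = "ldap-canary"


def ldap_lookup_ok(username=CANARY_USER, getpwnam=pwd.getpwnam):
    """Return True if the name service can still resolve the canary account."""
    try:
        getpwnam(username)
    except KeyError:  # pwd.getpwnam raises KeyError for unknown names.
        return False
    return True


def check_or_exit(**kwargs):
    """Exit non-zero on a failed lookup so systemd (Restart=on-failure) restarts us."""
    if not ldap_lookup_ok(**kwargs):
        sys.exit(1)
```

Checking a single always-present account separates "LDAP lookups are broken" from "this particular tool was deleted", which the per-tool errors in the log cannot distinguish on their own.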
Krenair added a comment to T268943: crontab: crontabs/tmp.YdH9kW: No space left on device.

No, we just removed some stuff under the system's /var/log. It looks like /var/log/syslog, for example, had filled up with collector-runner exceptions; it had managed to generate over 9 million lines in 36 hours, like this:

Nov 29 17:42:11 tools-sgecron-01 collector-runner[9414]: 2020-11-29 17:42:11,517 Exception trying to validate / load tool grantmetrics
Nov 29 17:42:11 tools-sgecron-01 collector-runner[9414]: Traceback (most recent call last):
Nov 29 17:42:11 tools-sgecron-01 collector-runner[9414]:   File "/usr/lib/python3/dist-packages/tools/manifest/webservicemonitor.py", line 39, in from_name
Nov 29 17:42:11 tools-sgecron-01 collector-runner[9414]:     user_info = pwd.getpwnam(username)
Nov 29 17:42:11 tools-sgecron-01 collector-runner[9414]: KeyError: 'getpwnam(): name not found: tools.grantmetrics'
Nov 29 17:42:11 tools-sgecron-01 collector-runner[9414]: During handling of the above exception, another exception occurred:
Nov 29 17:42:11 tools-sgecron-01 collector-runner[9414]: Traceback (most recent call last):
Nov 29 17:42:11 tools-sgecron-01 collector-runner[9414]:   File "/usr/lib/python3/dist-packages/tools/manifest/webservicemonitor.py", line 146, in collect
Nov 29 17:42:11 tools-sgecron-01 collector-runner[9414]:     tool = Tool.from_name(toolname)
Nov 29 17:42:11 tools-sgecron-01 collector-runner[9414]:   File "/usr/lib/python3/dist-packages/tools/manifest/webservicemonitor.py", line 42, in from_name
Nov 29 17:42:11 tools-sgecron-01 collector-runner[9414]:     raise Tool.InvalidToolException("No tool with name %s" % (name,))
Nov 29 17:42:11 tools-sgecron-01 collector-runner[9414]: tools.manifest.webservicemonitor.Tool.InvalidToolException: No tool with name grantmetrics

It's still generating more so this will happen again at some point. It has 3.7G to burn through first though.

Nov 29 2020, 5:45 PM · Toolforge
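Each entry in the log above is two tracebacks joined by "During handling of the above exception, another exception occurred". That is Python's implicit exception chaining: raising a new exception inside an `except` block stores the original as `__context__`. The following simplified reconstruction of the lookup pattern the traceback suggests is illustrative only, not the actual webservicemonitor.py code; `tool_uid` is a hypothetical name.

```python
import pwd


class InvalidToolException(Exception):
    """Stand-in for the exception type named in the traceback."""


def tool_uid(name, getpwnam=pwd.getpwnam):
    """Look up the uid of tool account 'tools.<name>'."""
    try:
        return getpwnam("tools." + name).pw_uid
    except KeyError:
        # Raising inside an except block implicitly chains the KeyError as
        # __context__, so a logged traceback contains both exceptions.
        raise InvalidToolException("No tool with name %s" % (name,))
```

Writing `raise ... from None` would suppress the chained part, but the volume here seems to come mainly from logging a full multi-line trace for every failed lookup on every pass.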
Krenair updated subscribers of T268943: crontab: crontabs/tmp.YdH9kW: No space left on device.

Me and @Andrew removed some stuff, please try again. Note this is writing files on tools-sgecron-01 rather than whatever bastion you are logged on to, so a simple df won't show anything.

Nov 29 2020, 5:23 PM · Toolforge
Krenair added a comment to T268926: tools-sgeexec-0908.tools.eqiad1.wikimedia.cloud is misbehaving.

Some of the continuous jobs that were stopped (except anomie's) have issued root@ failure emails with errors like can't get password entry for user "tools.ket-bot" (I imagine that given how broken this instance is, LDAP connectivity is one of the issues), and there's another couple more 'problems with defaults entries' emails too. All around 01:12

Nov 29 2020, 1:41 AM · Tools
Krenair created T268926: tools-sgeexec-0908.tools.eqiad1.wikimedia.cloud is misbehaving.
Nov 29 2020, 12:03 AM · Tools

Nov 28 2020

Krenair added a comment to T268904: can't start webservices kubernetes.

Hi @Krenair sorry for bothering you, i just have a little question, shortly before i use --backend=kubernetes and worked fine, now i try to use --backend=gridengine, and it give me a message:

Could not find a public_html folder or a .lighttpd.conf file in your tool home.

is that normal and i just need to set lighttpd.conf file or it's a problem need to be fixed? thx again.

I don't know much about the Grid Engine, sorry.

@Krenair that's fire thank you, and if can mention any body knows i'll be grateful, if u don't it's OK also.

Nov 28 2020, 6:05 AM · Toolforge, Kubernetes
Krenair added a comment to T268904: can't start webservices kubernetes.

Hi @Krenair sorry for bothering you, i just have a little question, shortly before i use --backend=kubernetes and worked fine, now i try to use --backend=gridengine, and it give me a message:

Could not find a public_html folder or a .lighttpd.conf file in your tool home.

is that normal and i just need to set lighttpd.conf file or it's a problem need to be fixed? thx again.

Nov 28 2020, 5:51 AM · Toolforge, Kubernetes
Krenair closed T268904: can't start webservices kubernetes as Resolved.
Nov 28 2020, 5:14 AM · Toolforge, Kubernetes
Krenair added a comment to T248041: puppetdb on deployment-puppetdb03 keeps getting OOMKilled.
alex@alex-laptop:~$ ssh deployment-puppetdb03
Linux deployment-puppetdb03 4.19.0-11-amd64 #1 SMP Debian 4.19.146-1 (2020-09-17) x86_64
Debian GNU/Linux 10 (buster)
deployment-puppetdb03 is a PuppetDB server (puppetmaster::puppetdb (postgres master))
The last Puppet run was at Thu Nov 26 20:45:39 UTC 2020 (1890 minutes ago). 
Last puppet commit: 
Last login: Sun Jul 26 10:45:20 2020 from 172.16.1.136
krenair@deployment-puppetdb03:~$ sudo service puppetdb  status
● puppetdb.service - Puppet data warehouse server
   Loaded: loaded (/lib/systemd/system/puppetdb.service; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Thu 2020-11-26 21:15:48 UTC; 1 day 7h ago
     Docs: man:puppetdb(8)
           file:/usr/share/doc/puppetdb/index.markdown
 Main PID: 519 (code=killed, signal=KILL)
Nov 28 2020, 4:39 AM · Patch-For-Review, Developer Productivity, Puppet, Beta-Cluster-Infrastructure
Krenair claimed T268904: can't start webservices kubernetes.

@Mohnd_Kh: Try now

Nov 28 2020, 4:37 AM · Toolforge, Kubernetes

Nov 25 2020

Krenair changed the status of T190781: Secure deployment-prep sudo access to prevent privilege escalation by dns-manager credentials, a subtask of T182927: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains, from Invalid to Declined.
Nov 25 2020, 1:26 AM · Patch-For-Review, Release-Engineering-Team (Watching / External), Beta-Cluster-Infrastructure
Krenair changed the status of T190781: Secure deployment-prep sudo access to prevent privilege escalation by dns-manager credentials from Invalid to Declined.
Nov 25 2020, 1:26 AM · Beta-Cluster-Infrastructure
Krenair closed T190781: Secure deployment-prep sudo access to prevent privilege escalation by dns-manager credentials as Invalid.

Based on that task having been done, I think we can safely say the rest of this is fairly pointless. If you have membership on deployment-prep the most destructive stuff you could do is likely within the instances you're expected to have root on, DNS is likely the more easily recoverable part. If anything this permissions setup more closely relates to what a production root would be able to do.

Nov 25 2020, 1:25 AM · Beta-Cluster-Infrastructure
Krenair closed T190781: Secure deployment-prep sudo access to prevent privilege escalation by dns-manager credentials, a subtask of T182927: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains, as Invalid.
Nov 25 2020, 1:25 AM · Patch-For-Review, Release-Engineering-Team (Watching / External), Beta-Cluster-Infrastructure

Nov 22 2020

Krenair placed T219085: wmf_style check in puppet silently fails when it finds the addition of an error that was also already occurring in the same file up for grabs.
Nov 22 2020, 4:41 AM · Puppet-Core, Infrastructure-Foundations
Krenair closed T257968: Certificate for *.beta.wmflabs.org has expired (July 2020) as Resolved.

@Vgutierrez: Can we make it dynamically reload its code somehow? We should probably have another task for this if so, I'm resolving this one

Nov 22 2020, 4:38 AM · Beta-Cluster-Infrastructure
Krenair placed T220268: Consider ways to make puppetmaster CA changes smoother on the puppet client end up for grabs.
Nov 22 2020, 4:37 AM · Puppet-Infrastructure, Cloud-VPS, cloud-services-team, Infrastructure-Foundations
Krenair awarded T268393: UDP traffic throughput to instances in the "meet" Cloud VPS project not meeting expectations a Evil Spooky Haunted Tree token.
Nov 22 2020, 12:57 AM · cloud-services-team, Cloud-VPS, Wikimedia Meet

Nov 16 2020

Krenair closed T267858: The certificate for upload.beta.wmflabs.org expired on November 13, 2020. as Resolved.
Nov 16 2020, 7:55 PM · SRE, Traffic, HTTPS, Beta-Cluster-Infrastructure
Krenair added a comment to T267935: wikitech: INSERT command denied to user 'wikiuser'@'10.64.32.36' for table 'comment' (10.64.0.98).

It's coming from the job runner. Given the values shown, this is likely a post distributed via MassMessage or something else that expects to be able to write cross-wiki, and a wikitech page was one of the expected destinations? This probably should've been permitted.

Nov 16 2020, 4:30 PM · wikitech.wikimedia.org, cloud-services-team (Kanban)
Krenair added a comment to T196248: TLS certificates renewal process.

I don't think we use certbot anywhere except maybe Gerrit.

Nov 16 2020, 12:50 AM · Documentation, Performance-Team (Radar), HTTPS, Traffic, SRE

Nov 14 2020

Krenair claimed T267858: The certificate for upload.beta.wmflabs.org expired on November 13, 2020.

@Vgutierrez FYI in case this could happen in prod too, I haven't been keeping track of changes lately. If we think it won't happen again or won't happen in prod (e.g. maybe it didn't restart because puppet is erroring somewhere in varnish code on this box?) then I guess we can close this

Nov 14 2020, 3:07 PM · SRE, Traffic, HTTPS, Beta-Cluster-Infrastructure
Krenair added a comment to T267858: The certificate for upload.beta.wmflabs.org expired on November 13, 2020.

For some reason I had to do a full restart of the trafficserver-tls service on the cache-upload06 VM but it has loaded the latest cert now:

root@deployment-cache-upload06:~# openssl s_client -connect upload.beta.wmflabs.org:443 2>/dev/null | openssl x509 -noout -text | grep After
            Not After : Jan 12 06:00:26 2021 GMT
Nov 14 2020, 3:01 PM · SRE, Traffic, HTTPS, Beta-Cluster-Infrastructure
Restricted Application added a project to T267858: The certificate for upload.beta.wmflabs.org expired on November 13, 2020.: SRE.

Cert was renewed:

root@deployment-acme-chief03:~# openssl x509 -in /var/lib/acme-chief/certs/unified/live/rsa-2048.crt -noout -text | grep After
            Not After : Jan 12 05:01:51 2021 GMT
Nov 14 2020, 2:56 PM · SRE, Traffic, HTTPS, Beta-Cluster-Infrastructure

Nov 13 2020

Krenair removed a watcher for Wikimedia-production-error: Krenair.
Nov 13 2020, 9:14 PM

Sep 19 2020

Krenair added a comment to T263328: Agents can view watched tickets outside of assigned queues.

This worked in the past on OTRS 5 with e.g. oversight queues. I assumed it was deliberate: the most sensitive part of a ticket is almost always going to be the first article, and in this case the agent has already seen it.

Sep 19 2020, 5:19 PM · Znuny

Sep 14 2020

Krenair added a comment to T262816: The certificate for en.wikipedia.beta.wmflabs.org expired on 2020-09-14.

It's possible: if acme-chief has got a new cert issued but the cache-text box hasn't run puppet since, you'll see this. Check whether acme-chief has a new one, and if it does, fix puppet on cache-text. If not, investigate why. Am having lunch and then working again, but I can look this evening if no one has fixed it by then.

Sep 14 2020, 12:12 PM · Beta-Cluster-Infrastructure
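The checklist in this comment compares two certificates: the latest one acme-chief holds and the one the cache box actually serves. Assuming both expiry dates have already been read (for example with `openssl x509 -noout -enddate` on each side), the decision could be sketched as follows; the function and its return strings are hypothetical.

```python
import ssl


def diagnose(acme_not_after, served_not_after, now):
    """Follow the checklist above; dates are openssl-style notAfter strings."""
    acme = ssl.cert_time_to_seconds(acme_not_after)
    served = ssl.cert_time_to_seconds(served_not_after)
    if acme <= now:
        # acme-chief itself has nothing valid: the problem is upstream.
        return "investigate acme-chief"
    if served < acme:
        # A newer cert exists, but the cache box still serves the old one.
        return "fix puppet on the cache-text box"
    return "served certificate is current"
```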

Sep 2 2020

Krenair awarded T261900: Request for floating IP / DNS for gitlab-test.wmcloud.org a Like token.
Sep 2 2020, 9:10 PM · User-brennen, Release-Engineering-Team, GitLab-Test, Cloud-VPS (Quota-requests)
Krenair added a comment to T261900: Request for floating IP / DNS for gitlab-test.wmcloud.org.

Yeah that would work and is the mechanism that allows people direct access to e.g. the bastions and the tools login machines. The other potential option is just to require people using the test setup to use some custom SSH config to proxy through the bastions to get there.

Sep 2 2020, 9:10 PM · User-brennen, Release-Engineering-Team, GitLab-Test, Cloud-VPS (Quota-requests)

Aug 31 2020

Krenair awarded T261656: Grant merge rights (+2) on MediaWiki Core to Martin Urbanec a Like token.
Aug 31 2020, 4:52 PM · MediaWiki-Gerrit-Group-Requests

Aug 30 2020

Krenair added a comment to T261551: https://meet.wmflabs.org creates a redirect loop.

Maybe you can look for an X-Forwarded-Proto: https header, which I think the proxy should be setting? If it's set, treat the request as you would one on port 443; if it's not set, then issue a redirect?

Aug 30 2020, 1:07 AM · User-Ladsgroup, Wikimedia Meet
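The suggestion breaks the loop by only redirecting when neither the server itself nor the front proxy saw HTTPS. Whatever server actually handles meet.wmflabs.org would need its own equivalent; as a neutral illustration, here is that logic as a Python WSGI middleware. The middleware name is hypothetical, and the header should only be trusted when the app can be reached solely through the proxy.

```python
from urllib.parse import quote


def https_redirect_middleware(app):
    """Wrap a WSGI app so that plain-HTTP requests are redirected to HTTPS."""

    def wrapper(environ, start_response):
        forwarded = environ.get("HTTP_X_FORWARDED_PROTO", "").lower()
        scheme = environ.get("wsgi.url_scheme", "http")
        if forwarded == "https" or scheme == "https":
            # TLS was terminated here or at the proxy: serve as if on port 443.
            return app(environ, start_response)
        host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "")
        location = "https://" + host + quote(environ.get("PATH_INFO", "/"))
        if environ.get("QUERY_STRING"):
            location += "?" + environ["QUERY_STRING"]
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]

    return wrapper
```

If the proxy talks to the backend over plain HTTP and the backend redirects every plain-HTTP request, each redirected request comes back through the proxy as plain HTTP again, which is the loop described in the task title.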

Aug 28 2020

Krenair added a comment to T251414: Support TLSv1.3 in IABot.

This is not something I believe I have control over.

Aug 28 2020, 11:42 PM · Traffic, InternetArchiveBot, SRE

Aug 24 2020

Krenair added a comment to T261133: Ban IP edits on pt.wiki.

Note to those I see in the ptwiki comments proposing AbuseFilters: AbuseFilter has emergency checks that will disable a filter matching 5% or more of edits.

Aug 24 2020, 11:38 PM · Growth-Team, Anti-Harassment, Wikimedia-Site-requests
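The concern is simple arithmetic: a filter aimed at all IP edits matches every IP edit, so on a wiki where IPs make one edit in twenty or more, the filter alone crosses the 5% threshold. A minimal sketch of that ratio check, using the threshold from the comment; the real extension also applies further conditions (such as a minimum number of matches and how recently the filter was changed) that this hypothetical helper ignores.

```python
# Threshold taken from the comment above: 5% of edits.
EMERGENCY_THRESHOLD = 0.05


def would_trip_emergency_check(filter_matches, total_edits, threshold=EMERGENCY_THRESHOLD):
    """Simplified ratio check; ignores the extension's other conditions."""
    if total_edits == 0:
        return False
    return filter_matches / total_edits >= threshold
```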
Krenair added a comment to T261133: Ban IP edits on pt.wiki.

Yeah this should probably be added to https://meta.wikimedia.org/wiki/Limits_to_configuration_changes

Aug 24 2020, 11:26 PM · Growth-Team, Anti-Harassment, Wikimedia-Site-requests

Aug 19 2020

Krenair created T260835: Stop using letsencrypt::cert::integrated on toolserver-legacy.
Aug 19 2020, 6:06 PM · User-bd808, cloud-services-team (Kanban)
Krenair updated the task description for T252199: Stop using letsencrypt::cert::integrated.
Aug 19 2020, 6:06 PM · cloud-services-team (Kanban), Mail
Krenair created T260834: Stop using letsencrypt::cert::integrated on mx-out*.cloudinfra.
Aug 19 2020, 6:05 PM · Patch-For-Review, cloud-services-team (Kanban), Mail

Aug 18 2020

Krenair added a comment to T260732: ORES icinga alerts.

modules/icinga/manifests/monitor/ores_labs_web_node.pp has check_command => "check_ores_workers!oresweb/node/${title}", which would be e.g. check_ores_workers!oresweb/node/ores-web-04. Also host ores.wmflabs.org.
modules/nagios_common/files/check_commands/check_ores_workers.cfg says this is $USER4$/check_ores_workers $HOSTADDRESS$ '$ARG1$'
So it becomes /usr/local/lib/nagios/plugins/check_ores_workers ores.wmflabs.org 'oresweb/node/ores-web-04'
./modules/nagios_common/files/check_commands/check_ores_workers turns that into /usr/local/lib/nagios/plugins/check_http -f follow -H "ores.wmflabs.org" -I "ores.wmflabs.org" -A "wmf-icinga/something (root@wikimedia.org)" -u "http://oresweb/node/ores-web-04/v3/scores/fakewiki/$(/bin/date +%s)/"

Aug 18 2020, 9:46 PM · ORES, SRE, Machine-Learning-Team
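To make the macro expansion in the comment above concrete, here is a simplified Python sketch. It handles only the macros that appear there ($USER4$, $HOSTADDRESS$, $ARG1$) and takes $USER4$ = /usr/local/lib/nagios/plugins from the expanded path; the function names are hypothetical and this is not how Nagios or Icinga implement expansion internally. Nagios passes the fields after the first ! in check_command as $ARG1$, $ARG2$, and so on, so $ARG1$ is oresweb/node/ores-web-04 rather than the whole check_command string.

```python
from __future__ import annotations

import time


def expand_check_command(check_command: str, command_line: str,
                         host_address: str, user_macros: dict[str, str]) -> tuple[str, str]:
    """Expand a Nagios-style check_command into the command line that gets run.

    Simplified sketch: real Nagios supports many more macros than these.
    """
    # Split on "!": the first field names the command,
    # the remaining fields become $ARG1$, $ARG2$, ...
    name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for key, value in user_macros.items():
        expanded = expanded.replace(f"${key}$", value)
    # Substitute higher-numbered args first so $ARG1$ never clobbers $ARG10$.
    for index in range(len(args), 0, -1):
        expanded = expanded.replace(f"$ARG{index}$", args[index - 1])
    return name, expanded


def ores_probe_url(arg1: str, now: int | None = None) -> str:
    """Build the URL the check_ores_workers wrapper asks check_http to fetch.

    The trailing Unix timestamp mirrors $(/bin/date +%s) in the wrapper.
    """
    timestamp = int(time.time()) if now is None else now
    return f"http://{arg1}/v3/scores/fakewiki/{timestamp}/"
```

The per-run timestamp makes every probe a distinct URL, presumably so the check cannot be answered from a cached score.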

Aug 17 2020

Krenair added a comment to T260449: Users of Jio ISP (India, AS 55836) unable to reach Wikimedia sites.

I don't have OTRS access, sorry. Is this a new reported issue with Jio users?

Aug 17 2020, 6:06 PM · Infrastructure-Foundations, SRE, netops, Traffic

Aug 4 2020

Krenair awarded T88258: Convert WikibaseRepository, WikibaseClient, WikibaseLib and WikibaseView to use extension registration a Barnstar token.
Aug 4 2020, 5:50 PM · Wikidata-Campsite, MW-1.35-notes (1.35.0-wmf.10; 2019-12-10), MW-1.34-notes (1.34.0-wmf.23; 2019-09-17), Patch-For-Review, Wikidata-Trailblazing-Exploration, Story, Technical-Debt, [DEPRECATED] wdwb-tech, Wikidata-Turtles-Tech-Debt, Wikidata-Ministry-Of-Magic-Tech-Debt, Wikidata-Sprint-2017-12-20, Wikidata-Sprint-2015-08-11, Wikidata-Sprint-2015-06-30, Wikidata-Sprint-2015-06-16, Wikidata-Sprint-2015-06-02, MediaWiki-extensions-WikibaseRepository, Wikidata, MediaWiki-extensions-WikibaseClient

Aug 3 2020

Krenair claimed T248041: puppetdb on deployment-puppetdb03 keeps getting OOMKilled.

Replacing it with a medium instance, deployment-puppetdb04.

Aug 3 2020, 11:57 PM · Patch-For-Review, Developer Productivity, Puppet, Beta-Cluster-Infrastructure
Krenair added a project to T259540: deployment-perfapt01 seems to be broken: Beta-Cluster-Infrastructure.
Aug 3 2020, 5:57 PM · Beta-Cluster-Infrastructure
Krenair created T259540: deployment-perfapt01 seems to be broken.
Aug 3 2020, 5:57 PM · Beta-Cluster-Infrastructure

Aug 2 2020

Krenair added a comment to T259444: Request for creating a DNS record for lists.wmcloud.org to 185.15.56.28.

As I recall, with the meet project the OpenStack project itself was named meet, so you automatically got a meet.wmflabs.org Designate zone. I guess one could be created for lists too (similar to the beta zone in deployment-prep). That way you could administer it without going through more tickets in future.

Aug 2 2020, 10:30 PM · User-bd808, cloud-services-team (Kanban), VPS-Projects, SRE, User-Ladsgroup, Wikimedia-Mailing-lists
Krenair added a comment to T259444: Request for creating a DNS record for lists.wmcloud.org to 185.15.56.28.

This should probably just be a record under mailman.wmcloud.org?

Aug 2 2020, 9:39 PM · User-bd808, cloud-services-team (Kanban), VPS-Projects, SRE, User-Ladsgroup, Wikimedia-Mailing-lists

Jul 30 2020

Krenair added a comment to T255249: acme-chief: support for generating a concatenated cert/key file.

I think the keys are generated first, and the certs appear once acme-chief has gone through the ACME API to get them signed by the CA.

Jul 30 2020, 5:47 PM · Patch-For-Review, Acme-chief
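A minimal sketch of what that ordering means for a concatenated cert/key file: the key can exist on its own while the cert is still waiting on the CA, so the bundle is only written once both files are present. The helper name and file layout are hypothetical; the rsa-2048.crt / rsa-2048.key names follow the part names acme-chief lists elsewhere in this log, and putting the cert before the key is an assumption, not acme-chief's documented behaviour.

```python
from pathlib import Path


def write_cert_key_bundle(cert_path: Path, key_path: Path, out_path: Path) -> bool:
    """Hypothetical helper: concatenate cert and key once both exist.

    Returns False (and writes nothing) while either file is missing.
    """
    if not key_path.exists() or not cert_path.exists():
        # The key may already exist while the cert is still pending ACME signing.
        return False
    data = cert_path.read_bytes()
    if not data.endswith(b"\n"):
        data += b"\n"  # keep the PEM blocks on separate lines
    data += key_path.read_bytes()
    # Write to a temporary file, then rename over the target.
    tmp_path = out_path.with_name(out_path.name + ".tmp")
    tmp_path.write_bytes(data)
    tmp_path.replace(out_path)
    return True
```

Writing to a temporary file and renaming it means readers never see a half-written bundle, assuming both paths are on the same filesystem.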

Jul 17 2020

Krenair added a comment to T257968: Certificate for *.beta.wmflabs.org has expired (July 2020).

I'm still getting the cert error on https://upload.beta.wmflabs.org . Other subdomains, e.g. https://en.wikisource.beta.wmflabs.org , are working fine now.

Jul 17 2020, 12:01 AM · Beta-Cluster-Infrastructure

Jul 15 2020

Krenair created P11917 fixes for `puppet` hostname serving on a new labs central puppetmaster in codfw1dev.
Jul 15 2020, 6:29 PM · Cloud-VPS

Jul 14 2020

Krenair updated subscribers of T257968: Certificate for *.beta.wmflabs.org has expired (July 2020).

@Vgutierrez: I'm guessing puppet had failed to run the reload exec itself because of the errors connecting to acme-chief (puppet reported Error 400 on SERVER: part must be in ['ec-prime256v1.crt', 'ec-prime256v1.chain.crt', 'ec-prime256v1.chained.crt', 'ec-prime256v1.key', 'ec-prime256v1.ocsp', 'rsa-2048.crt', 'rsa-2048.chain.crt', 'rsa-2048.chained.crt', 'rsa-2048.key', 'rsa-2048.ocsp'], and requests like /puppet/v3/file_content/acmedata/mx/bfcd4752e6b346289533bcb6934671a2/rsa-2048.crt.key?environment=production& were showing up in the uwsgi-acme-chief logs). The host had the new puppet classes and was making the new .crt.key CERTIFICATE_TYPE calls to acme-chief, and the acme-chief instance had v0.26 installed, but the uwsgi-acme-chief service on the acme-chief box had not been restarted. I wonder if we should automatically restart uwsgi-acme-chief when the acme-chief package is upgraded, somehow (puppet?).

Jul 14 2020, 9:23 PM · Beta-Cluster-Infrastructure
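The 400 error above can be reproduced in miniature, assuming the service checks the requested part against a list loaded when the process starts: a process started before the upgrade keeps rejecting new part names until it is restarted. This Python sketch is an illustration under that assumption, not acme-chief's actual code; OLD_PARTS is copied from the error message, while the .crt.key entry used for the "upgraded" list in the usage check is a hypothetical stand-in for whatever v0.26 actually accepts.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Part names copied from the "part must be in [...]" error in the comment.
OLD_PARTS = [
    "ec-prime256v1.crt", "ec-prime256v1.chain.crt", "ec-prime256v1.chained.crt",
    "ec-prime256v1.key", "ec-prime256v1.ocsp",
    "rsa-2048.crt", "rsa-2048.chain.crt", "rsa-2048.chained.crt",
    "rsa-2048.key", "rsa-2048.ocsp",
]


def part_from_request(request_path: str) -> str:
    """Return the last path segment (the requested part), ignoring the query string."""
    return urlsplit(request_path).path.rsplit("/", 1)[-1]


def check_part(request_path: str, allowed_parts: list[str]) -> tuple[int, str]:
    """Return an HTTP-like (status, message) pair for a file_content request."""
    part = part_from_request(request_path)
    if part not in allowed_parts:
        return 400, f"part must be in {allowed_parts}"
    return 200, part
```

In this model, installing the new package on disk changes nothing for the running process; only a restart loads the new list, which is the gap a puppet-triggered restart would close.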
Krenair lowered the priority of T257968: Certificate for *.beta.wmflabs.org has expired (July 2020) from Unbreak Now! to High.

The immediate problem is solved: I manually did the cert reload (something like touch /srv/trafficserver/tls/etc/ssl_multicert.config && /bin/systemctl reload trafficserver, except that there are two different ssl_multicert.config files on the system and two different trafficserver services).

Jul 14 2020, 8:50 PM · Beta-Cluster-Infrastructure
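A sketch of that manual reload as a loop over (config file, service) pairs, since the comment notes there are two of each. Only /srv/trafficserver/tls/etc/ssl_multicert.config and trafficserver come from the comment; the second pair is not named there, so the caller must supply it. The helper name is hypothetical, and with dry_run=True it only returns the commands instead of running them.

```python
from __future__ import annotations

import subprocess


def reload_tls_configs(pairs: list[tuple[str, str]], dry_run: bool = True) -> list[list[str]]:
    """Touch each ssl_multicert.config and reload its trafficserver service."""
    commands: list[list[str]] = []
    for config_path, service in pairs:
        commands.append(["touch", config_path])
        commands.append(["/bin/systemctl", "reload", service])
    if not dry_run:
        for argv in commands:
            # check=True stops at the first failure, so a service is only
            # reloaded if the touch before it succeeded (like `touch ... && ...`).
            subprocess.run(argv, check=True)
    return commands
```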