Issues with acme-chief cert rotation on deployment-prep, 2021-01-12
Open, Needs Triage, Public

Description

Your certificate (or certificates) for the names listed below will expire in 10 days (on 12 Jan 21 06:00 +0000). Please make sure to renew your certificate before then, or visitors to your website will encounter errors.

[...]
*.wikibooks.beta.wmflabs.org
*.wikimedia.beta.wmflabs.org
*.wikinews.beta.wmflabs.org
*.wikipedia.beta.wmflabs.org
*.wikiquote.beta.wmflabs.org
*.wikisource.beta.wmflabs.org
*.wikiversity.beta.wmflabs.org
*.wikivoyage.beta.wmflabs.org
*.wiktionary.beta.wmflabs.org
etc.

When 12 January arrived the certificate being served was still the old one, so I went to take a look.

I found:

  • acme-chief on deployment-acme-chief03 had not begun to create a new certificate. It still seemed quite content with version 0d829dc82393450fa34fabd837364efa, with the usual hourly reloading. At around 02:22 I found and killed the acme-chief-backend process. It was restarted and immediately became active, issuing the new cert as version 1c321ac6da0c4103bf630165d91bca2c. Why was this necessary?
  • After this, puppet on deployment-cache-text06 happily updated its local files in /etc/acmecerts, but ATS was still serving the old versions. I had to run service trafficserver-tls restart to get it to load the new certs from disk.
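The second symptom (new files on disk, old certificate on the wire) can be detected by comparing fingerprints. The following is a minimal sketch, not something taken from the puppet manifests: the function names are hypothetical, the certificate path is whatever file under /etc/acmecerts holds the leaf certificate, and TLS verification is deliberately disabled because the goal is to inspect the served certificate even when it has expired.

```python
import hashlib
import socket
import ssl

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def pem_fingerprint(pem_text: str) -> str:
    """SHA-256 hex digest of the first (leaf) certificate in a PEM string."""
    start = pem_text.index(PEM_HEADER)
    end = pem_text.index(PEM_FOOTER, start) + len(PEM_FOOTER)
    der = ssl.PEM_cert_to_DER_cert(pem_text[start:end])
    return hashlib.sha256(der).hexdigest()


def served_fingerprint(address: str, sni_name: str, port: int = 443,
                       timeout: float = 10.0) -> str:
    """SHA-256 hex digest of the certificate a server presents for sni_name.

    address is the host to connect to (for example a cache host), and
    sni_name is the hostname sent in the TLS handshake.
    """
    ctx = ssl.create_default_context()
    # Verification is off on purpose: an expired cert must still be readable.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((address, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=sni_name) as tls:
            der = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der).hexdigest()


def is_stale(disk_pem: str, served_fp: str) -> bool:
    """True when the server presents a different cert from the one on disk."""
    return pem_fingerprint(disk_pem) != served_fp
```

A check along these lines would read the on-disk PEM, call served_fingerprint() against the cache host with one of the wildcard names, and flag the host for a trafficserver-tls restart when is_stale() returns True.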

Event Timeline

Re the acme-chief part: it looks like the same thing happened to the mx and wikibase certs too. I haven't checked whether those were updated on the machines that serve them.
Also spotted various prod ncredir certs in /etc/acme-chief/config.yaml that can't be doing any good.

Unlikely

I was asking because of T267006#6624466 (deployment-cache-upload06 is upload.beta.wmflabs.org, I think?), and the problems started around T267858, I think. But it could be just a coincidence.

This isn't about varnish; this is about acme-chief and the interaction between our puppet manifests and ATS.

Looks like *.wikimedia.beta.wmflabs.org expired again, on Tue, 10 Aug 2021 01:01:19 GMT.

It doesn't seem so. It is serving a non-expired certificate generated in July:
Not Before: 7/11/2021
Not After: 10/9/2021
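As a quick sanity check on those dates, here is a small sketch that tests whether a validity window covers a given moment. It assumes the dates above are month/day/year at midnight UTC (the exact times are not given in this task), and the function names are hypothetical.

```python
from datetime import datetime, timezone


def is_valid_at(not_before: datetime, not_after: datetime,
                when: datetime) -> bool:
    """True when `when` falls inside the certificate validity window."""
    return not_before <= when <= not_after


def days_remaining(not_after: datetime, when: datetime) -> int:
    """Whole days left before expiry (negative once expired)."""
    return (not_after - when).days


# Assumed interpretation of "7/11/2021" and "10/9/2021": M/D/YYYY, 00:00 UTC.
not_before = datetime(2021, 7, 11, tzinfo=timezone.utc)
not_after = datetime(2021, 10, 9, tzinfo=timezone.utc)
# The moment the earlier comment reports as the expiry time.
reported = datetime(2021, 8, 10, 1, 1, 19, tzinfo=timezone.utc)

print(is_valid_at(not_before, not_after, reported))
print(days_remaining(not_after, reported))
```

Under those assumptions the window is 90 days long and still covers the reported expiry time, which is consistent with the reply that a non-expired certificate is being served.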