SystemdUnitDown The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1002-dev has been failing for more than two hours.
Closed, Resolved · Public
Authored By

	phaultfinder
	Dec 5 2024, 4:01 AM

Description

dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev
description: Unit remove_dangling_cinder_snapshots.service on node cloudbackup1002-dev has been down for long.
runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
summary: The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1002-dev has been failing for more than two hours.

alertname: SystemdUnitDown
cluster: wmcs
instance: cloudbackup1002-dev:9100
job: node
name: remove_dangling_cinder_snapshots.service
prometheus: ops
severity: critical
site: eqiad
source: prometheus
state: failed
team: wmcs
type: oneshot

Restricted Application added a subscriber: Aklapper. · Dec 5 2024, 4:01 AM

Andrew triaged this task as Medium priority.

This seems to be working now.
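
A minimal sketch of how that could be confirmed on the host, assuming shell access to cloudbackup1002-dev and Python 3.9+. The helper names `parse_show_output`, `unit_is_healthy`, and `check_unit` are hypothetical and not part of any existing WMCS tooling; the sketch only reads the `ActiveState` and `Result` properties that `systemctl show` reports for the unit named in the alert.

```python
import subprocess

# Unit name taken from the alert labels above.
UNIT = "remove_dangling_cinder_snapshots.service"


def parse_show_output(text: str) -> dict[str, str]:
    """Parse the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that carry no KEY=VALUE pair
            props[key] = value
    return props


def unit_is_healthy(props: dict[str, str]) -> bool:
    """Assumed health rule: not in the 'failed' state and last run succeeded."""
    return props.get("ActiveState") != "failed" and props.get("Result") == "success"


def check_unit(unit: str = UNIT) -> bool:
    """Ask systemd for the unit's state; has to run on the affected host."""
    out = subprocess.run(
        ["systemctl", "show", unit, "--property=ActiveState,SubState,Result"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return unit_is_healthy(parse_show_output(out))
```

Under these assumptions, `check_unit()` returns `True` once the most recent run of the oneshot unit finished cleanly. If it returns `False`, `journalctl -u remove_dangling_cinder_snapshots.service` on the same host is the usual place to look for the reason.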