[01:55:48] FIRING: PuppetFailure: Puppet has failed on ms-be2096:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[02:05:48] FIRING: [2x] PuppetFailure: Puppet has failed on ms-be2095:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[06:06:03] FIRING: [2x] PuppetFailure: Puppet has failed on ms-be2095:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[08:12:20] those are nodes still with DC-Ops for setup.
[08:14:08] 👍 now that I am aware it doesn't worry me
[08:16:45] looks like Dell sending us not-blank disks again, but I'll have more of a look after some caffeine
[08:37:21] non-blank...?!
[08:53:09] I'm restarting instances on db2239, it is consuming unexpectedly too much memory
[08:55:09] yeah, I presume it's some artifact of internal testing, but we keep finding that one of the spinning disks in Dell Config-J systems has a vfat filesystem on (looks like an EFI partition from Windows)
[08:56:32] Our (re-)imaging of swift backends doesn't wipe the spinning disks (because we want to preserve them during upgrades), so this means puppet can't run to completion because it wants to make an xfs partition on each spinning disk, but won't overwrite what's already there.
[08:58:26] wait, so you use xfs for swift?
[08:58:51] (unrelated question)
[08:59:20] yes, hence my comments about the y2038 bugs, poor repair tooling and slight concerns about performance when nearly-full
[08:59:44] ok, that changes everything
[08:59:49] ?
[09:00:39] for me, sorry, this is an unrelated train of thought- sorry, I don't have any helpful tip re:partman
[09:01:27] I may use ext4 instead
[09:02:03] as I was seeing degraded performance exactly as I was close to full drive
[09:05:18] Meh, fixing these two servers is fine - wipe the offending disk properly, re-image. I'll talk to dc-ops about asking Dell again to actually send us empty disks.
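
Editor's note: the failure discussed above (puppet refusing to create an xfs partition on a disk that already carries a vfat filesystem) could in principle be spotted before imaging by listing leftover filesystem signatures. The following is only a rough sketch, not the tooling used by the team. It assumes the JSON layout printed by `lsblk --json -o NAME,FSTYPE,TYPE`; the function name, the `allowed` parameter and the sample device names are all hypothetical. Limiting the check to the spinning data disks, rather than the OS disks, is left to the caller.

```python
import json


def find_unexpected_filesystems(lsblk_json, allowed=frozenset({"xfs"})):
    """Return (device name, fstype) pairs for block devices that carry a
    filesystem signature other than the ones listed in `allowed`.

    `lsblk_json` is text in the shape of `lsblk --json -o NAME,FSTYPE,TYPE`.
    The function name and the `allowed` default are illustrative assumptions.
    """
    found = []

    def walk(devices):
        for dev in devices:
            fstype = dev.get("fstype")
            # Blank devices report a null fstype and are skipped.
            if fstype and fstype not in allowed:
                found.append((dev["name"], fstype))
            # Partitions appear as nested "children" of their disk.
            walk(dev.get("children") or [])

    walk(json.loads(lsblk_json).get("blockdevices", []))
    return found


# Hypothetical sample: sdc is blank, sdd has a leftover vfat partition,
# sde already holds an xfs partition that should be preserved.
SAMPLE = """
{"blockdevices": [
  {"name": "sdc", "fstype": null, "type": "disk"},
  {"name": "sdd", "fstype": null, "type": "disk",
   "children": [{"name": "sdd1", "fstype": "vfat", "type": "part"}]},
  {"name": "sde", "fstype": null, "type": "disk",
   "children": [{"name": "sde1", "fstype": "xfs", "type": "part"}]}
]}
"""

if __name__ == "__main__":
    for name, fstype in find_unexpected_filesystems(SAMPLE):
        print(f"{name}: unexpected {fstype} filesystem")
```

A report like this only identifies candidates. Actually wiping a disk, as mentioned at 09:05:18, is a destructive step and is deliberately not part of the sketch.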