[10:08:39] Emperor: o/
[10:09:42] I think I found something a little odd with apus that I can't explain. An s3 upload with s3cmd hangs indefinitely for a small file (like a txt with a date in it) but works perfectly when uploading a 2GB file
[10:10:08] eventually envoy returns 504 after waiting 3 minutes (stream idle timeout)
[10:11:08] so my theory is that the registry hangs when adding small files (it has to do it for bookkeeping etc..)
[10:11:33] does anything ring a bell? Or any suspicion?
[10:11:54] elukey: that is a bit odd, but I'm afraid I've got to try and un-break syncing, which I think got busted by your attempting to delete things from registry-restricted in both eqiad and codfw at the same time last week. Which wasn't an unreasonable thing to have wanted to do, but does seem to have broken sync for at least that bucket
[10:12:34] Emperor: sorry :(
[10:12:50] if it is easier we can drop everything and re-create
[10:12:53] like I say, I don't think you did anything unreasonable.
[10:12:59] elukey: nono, that would make it worse.
[10:13:27] I'm kindof hoping I can fix this without having to do a full sync of everything in apus, but I suspect not.
[10:15:14] Emperor: lemme know if I can help, otherwise write down a list of $beverage that you'll want in Lisbon
[10:57:35] elukey: I think we're caught up on sync now(!)
[10:58:45] or at least radosgw-admin sync status now says so. If that's not accurate, I probably do now need to do a full resync
[11:16:50] Emperor: nice!
[11:17:12] the small file upload doesn't hang anymore
[11:17:50] trying a docker push
[11:18:19] worked \o/
[11:24:53] I am going to switch s4 eqiad master
[11:32:32] marostegui: please use cumin
[11:32:58] I am using a cookbook in cumin, but I don't think it will touch homer
[11:33:30] marostegui: you really think out of netbox
[11:33:51] (out of the netbox, more correct)
[11:44:43] Emperor: I'd like to do a cleanup in the bucket, so basically using s3cmd towards the eqiad apus endpoint (and then leaving codfw to catch up). Green light to proceed?
[11:49:13] elukey: clean-up at either end should be OK (but both at once may have caused confusion)
[11:52:39] all right I'll do it thanks!
[21:35:25] FIRING: SystemdUnitFailed: swift_rclone_sync.service on ms-be2069:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed