[14:05:31] with go.dog away, is there anyone here who could give me pointers for using puppet-private in pontoon, or at least tell me to stop trying? :D
[14:08:42] (TIL .dog is a TLD!)
[14:12:38] there are some funny TLDs https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
[14:16:24] on-call handover; one incident today, deleted eqiad's captcha bucket in swift. That's resolved.
[15:13:51] could T395814 be related to any ongoing migrations from deploy1003 cron to kubernetes by any chance?
[15:13:52] T395814: wikidata-updatequeryservicelag cron job failing - https://phabricator.wikimedia.org/T395814
[15:15:04] Lucas_WMDE: re: the job not being flaky before, systemd would erase a failure before it even got to alerting for a failed run with a minute interval
[15:15:27] ah, interesting
[15:16:12] how far back does the journal go? can we see if the systemd version used to fail occasionally? ^^
[15:16:20] too late unfortunately
[15:16:27] (I think)
[15:16:29] claime: could use a centrallog host maybe?
[15:16:47] cdanis: I don't think we had the systemd timer outputs there
[15:16:57] hmmm I'm not sure, I'd check
[15:17:17] I'm in a meeting rn, but that's possible
[15:17:37] they have syslog, which answers the "did this job fail or succeed" question
[15:18:09] A likely cause would be a temp network failure like the ones we get for memcached (race condition between puppet updating iptables and kubeproxy)
[15:18:38] 💔cdanis@centrallog1002.eqiad.wmnet /srv/syslog/deploy1003 🕦☕ zgrep -i updatequeryservicelag syslog.log-202503*
[15:18:42] oh you have the job output in there too
[15:18:46] not sure I'm holding this right, I'm also in a meeting
[15:20:09] cdanis: would be in mwmaint actually
[15:20:13] cdanis: wrong host, it'd be on mwmaint not deploy
[15:20:13] but checking
[15:20:29] so far no hits for non-successful jobs for my grep, but it's still running
[15:20:46] (`taavi@centrallog1002 /srv/syslog/mwmaint2002 $ zgrep -a "mediawiki_job_wikidata-updateQueryServiceLag.service: " syslog.log*.gz | grep -v Succeeded` is what I'm running)
[15:22:03] thanks yall
[15:22:22] zgrep -i 'updatequeryservice' syslog.log-202505* | grep -c 'Failed to get lagged'
[15:22:25] 5585
[15:22:58] ok, thanks!
[15:23:13] I’ll add that to the task then