[00:06:09] RECOVERY - bacula2 Bacula Private Git on bacula2 is OK: OK: Full, 5399 files, 17.10MB, 2021-02-14 00:05:00 (1.1 minutes ago)
[00:42:03] RhinosF1|NotHere bp
[00:42:08] s/bp/np
[00:42:08] dmehus meant to say: RhinosF1|NotHere np
[01:42:22] PROBLEM - jobrunner3 Current Load on jobrunner3 is WARNING: WARNING - load average: 5.80, 4.60, 3.14
[01:44:22] PROBLEM - jobrunner3 Current Load on jobrunner3 is CRITICAL: CRITICAL - load average: 6.02, 5.06, 3.48
[01:52:22] PROBLEM - jobrunner3 Current Load on jobrunner3 is WARNING: WARNING - load average: 5.52, 5.89, 4.63
[02:02:22] RECOVERY - jobrunner3 Current Load on jobrunner3 is OK: OK - load average: 4.41, 5.07, 4.85
[06:47:46] RECOVERY - phab2 APT on phab2 is OK: APT OK: 24 packages available for upgrade (0 critical updates).
[12:36:10] PROBLEM - cp11 Current Load on cp11 is WARNING: WARNING - load average: 3.54, 2.88, 1.34
[12:40:10] RECOVERY - cp11 Current Load on cp11 is OK: OK - load average: 0.57, 2.09, 1.41
[12:44:34] !log disable puppet on mon2 for T6849
[12:44:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[12:46:47] PROBLEM - mon2 Puppet on mon2 is WARNING: WARNING: Puppet is currently disabled, message: SPF, last run 26 minutes ago with 0 failures
[13:37:31] PROBLEM - cp10 Current Load on cp10 is CRITICAL: CRITICAL - load average: 7.36, 8.02, 3.82
[13:41:50] !log revert grafana hack and enable puppet on mon2
[13:41:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[13:42:47] RECOVERY - mon2 Puppet on mon2 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[13:43:22] RECOVERY - cp10 Current Load on cp10 is OK: OK - load average: 0.42, 3.33, 3.13
[13:43:58] PROBLEM - guia.cineastas.pt - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - guia.cineastas.pt All nameservers failed to answer the query.
[13:50:54] RECOVERY - guia.cineastas.pt - reverse DNS on sslhost is OK: rDNS OK - guia.cineastas.pt reverse DNS resolves to cp11.miraheze.org
[13:55:30] [miraheze/dns] Reception123 pushed 1 commit to master [+1/-0/±0] https://git.io/JtPWp
[13:55:32] [miraheze/dns] Reception123 c80d63c - add ecole.science zone
[14:47:11] ^ looks like Reception123 is doing a custom domain. :)
[14:47:30] [miraheze/ssl] Reception123 pushed 1 commit to master [+1/-0/±1] https://git.io/JtP8w
[14:47:31] [miraheze/ssl] Reception123 24dfd23 - add ecole.science cert
[14:47:51] * dmehus wonders how long till the next Puppet run... do we have a dashboard for Puppet that tracks when it was last run? heh
[14:47:52] dmehus: lol, you posted that exactly as I pushed the commit to SSL :D
[14:48:02] lol yeah
[14:48:15] dmehus: the only comparable thing is https://grafana.miraheze.org/d/W9MIkA7iz/miraheze-cluster?orgId=1&var-job=node&var-node=puppet3.miraheze.org&var-port=9100 :)
[14:48:16] [ Grafana ] - grafana.miraheze.org
[14:48:24] not quite a dashboard but yeah
[14:49:19] Too bad it didn't show the crontab schedule in there
[14:49:24] that'd be cool if it did
[14:49:38] would you ever manually run Puppet for a custom domain or just wait?
[14:54:03] yeah I do sometimes if I have to do something afterwards and prefer just to get it done before
[14:54:54] dmehus: here it is - https://github.com/miraheze/puppet/blob/48162281fa7f478a84ecd512e2888b8f4ed493ec/hieradata/hosts/jobrunner1.yaml#L17
[14:54:55] [ puppet/jobrunner1.yaml at 48162281fa7f478a84ecd512e2888b8f4ed493ec · miraheze/puppet · GitHub ] - github.com
[14:55:23] so at :02 and at :32 for mw servers
[14:58:12] Reception123, ah, thanks. Nice. Only five more minutes then, so can just wait for the next run?
[14:58:37] Yes :)
[14:58:52] so if Puppet can't run on time, is that when we'll see a puppet is failing error?
[15:00:30] PROBLEM - cp10 Current Load on cp10 is CRITICAL: CRITICAL - load average: 4.05, 5.61, 3.19
[15:02:25] PROBLEM - cp10 Current Load on cp10 is WARNING: WARNING - load average: 0.80, 3.90, 2.85
[15:04:20] RECOVERY - cp10 Current Load on cp10 is OK: OK - load average: 0.57, 2.78, 2.56
[15:04:23] dmehus: not exactly. The failing error is usually if there's a configuration error in puppet or if there's a large change to something and it times out
[15:04:33] I forgot what the timeout threshold is but when it reaches that it fails
[15:04:48] And sometimes it happens when a mediawiki extension is upgraded or installed
[15:19:59] Reception123, yeah I figured the failing error would usually be something else. That was my next question... what's the Puppet timeout threshold?
[15:20:45] * dmehus thinks paladox probably has the puppet timeout threshold memorized
[15:21:25] dmehus: it's the same as git
[15:22:19] RhinosF1|NotHere, thanks, but that doesn't help me.
[15:22:43] dmehus: https://github.com/miraheze/puppet/blob/master/modules/git/manifests/clone.pp#L20
[15:22:44] [ puppet/clone.pp at master · miraheze/puppet · GitHub ] - github.com
[15:22:57] ah, thanks :)
[15:23:02] that helps
[15:23:10] is that in 600 in seconds?
[15:24:14] Yep
[15:24:20] ah, thanks :)
[15:24:22] cool
[15:24:37] so 10 minutes basically then it'll squawk
[15:26:11] Whenever the next Icinga check after the failure is
[15:27:10] Which is every 2 minutes
[15:27:42] RhinosF1|NotHere, ah, right, yeah that makes sense, thanks :)
[15:29:49] But if the last puppet run is showing as failed according to puppet internally then Icinga will tell you on here what resource failed
[15:36:48] ah
[16:00:06] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JtPRU
[16:00:08] [miraheze/puppet] paladox aa620a3 - grafana: Block off /api/snapshots
[16:23:59] [ssl] Universal-Omega opened pull request #389: Remove highstreethistory.uk domain - https://git.io/JtP0t
[16:24:25] [ssl] Universal-Omega synchronize pull request #389: Remove highstreethistory.uk domain - https://git.io/JtP0t
[16:24:40] PROBLEM - meta.nocyclo.tk - reverse DNS on sslhost is WARNING: rDNS WARNING - reverse DNS entry for meta.nocyclo.tk could not be found
[16:26:57] PROBLEM - es.nocyclo.tk - reverse DNS on sslhost is WARNING: rDNS WARNING - reverse DNS entry for es.nocyclo.tk could not be found
[16:27:08] PROBLEM - en.nocyclo.tk - reverse DNS on sslhost is WARNING: rDNS WARNING - reverse DNS entry for en.nocyclo.tk could not be found
[16:27:49] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JtP0n
[16:27:51] [miraheze/puppet] paladox 3f444e8 - grafana: Disable external publishing of snapshots
[16:27:56] Universal_Omega: none of them are resolving
[16:28:15] That just alerted
[16:28:59] RhinosF1|NotHere: I know. It's active externally though. There is a site on it that's not miraheze.
[16:29:51] Universal_Omega: I mean *.nocyclo.tk
[16:30:02] yeah the one you removed was Wordpress
[16:33:54] PROBLEM - es.nocyclo.tk - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - es.nocyclo.tk reverse DNS resolves to crawl-66-249-72-188.googlebot.com
[16:34:02] PROBLEM - en.nocyclo.tk - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - en.nocyclo.tk reverse DNS resolves to crawl-66-249-72-188.googlebot.com
[16:35:48] Universal_Omega: ^ looks like something got changed
[16:36:29] RhinosF1|NotHere: should that one be removed also then or not yet?
[16:36:33] .ip 66.249.72.188
[16:36:34] [IP/Host Lookup] Hostname: crawl-66-249-72-188.googlebot.com | Location: United States | ISP: AS15169 GOOGLE
[16:37:00] Universal_Omega: give me 5
[16:37:13] RhinosF1|NotHere: alright, thanks!
[16:38:20] PROBLEM - meta.nocyclo.tk - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - meta.nocyclo.tk reverse DNS resolves to crawl-66-249-72-188.googlebot.com
[16:40:02] Universal_Omega: it looks like the registration might have lapsed
[16:40:07] But just very recently
[16:40:14] (Like last 24 hours)
[16:40:48] RhinosF1|NotHere: so should it be removed or wait 24-48 hours?
[16:41:12] Universal_Omega: ack the alert for 48 hours to give leeway
[16:42:37] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JtPEJ
[16:42:38] [miraheze/puppet] paladox 008c89c - grafana: Enable gzip
[16:46:21] paladox: how many redis procs should be running on any given jbr?
[16:46:51] Nevermind it resolved
[16:46:51] 1
[16:46:57] hmm?
[16:46:58] Jbr3 had 2
[16:47:28] ok
[16:49:26] Zppix: that should always self resolve. I'd assume for some reason one got killed so another started up but for a very short time 2 were running
[16:49:36] paladox: does that sound a sane guess?
[16:50:51] I guess so.
[16:51:21] * RhinosF1|NotHere shrugs as it's not really an issue if it recovers quick
[16:54:46] RECOVERY - es.nocyclo.tk - reverse DNS on sslhost is OK: rDNS OK - es.nocyclo.tk reverse DNS resolves to cp11.miraheze.org
[16:56:22] [ssl] Reception123 closed pull request #389: Remove highstreethistory.uk domain - https://git.io/JtP0t
[16:56:24] [miraheze/ssl] Reception123 pushed 1 commit to master [+0/-1/±1] https://git.io/JtPuU
[16:56:25] [miraheze/ssl] Universal-Omega bbbd370 - Remove highstreethistory.uk domain (#389)
[17:07:00] ^ wow, highstreethistory.uk has already left us, Reception123 :(
[17:07:10] that was just added <2 months ago
[17:07:47] :(
[17:12:44] dmehus: looks like they moved to Wordpress
[17:12:59] yep
[17:24:31] PROBLEM - es.nocyclo.tk - reverse DNS on sslhost is WARNING: rDNS WARNING - reverse DNS entry for es.nocyclo.tk could not be found
[17:25:16] Yes I know Icinga
[17:31:24] RECOVERY - es.nocyclo.tk - reverse DNS on sslhost is OK: rDNS OK - es.nocyclo.tk reverse DNS resolves to cp11.miraheze.org
[17:35:50] RECOVERY - en.nocyclo.tk - reverse DNS on sslhost is OK: rDNS OK - en.nocyclo.tk reverse DNS resolves to cp10.miraheze.org
[17:37:13] Universal_Omega: looks back
[17:37:16] Maybe they renewed
[17:37:55] Whois don't show registered with freedom
[17:37:59] Freenom
[17:39:47] RECOVERY - meta.nocyclo.tk - reverse DNS on sslhost is OK: rDNS OK - meta.nocyclo.tk reverse DNS resolves to cp11.miraheze.org
[17:48:26] RhinosF1|NotHere: great!
[18:01:59] PROBLEM - cp10 Current Load on cp10 is CRITICAL: CRITICAL - load average: 1.95, 4.20, 2.76
[18:03:57] RECOVERY - cp10 Current Load on cp10 is OK: OK - load average: 0.90, 3.16, 2.54
[20:22:45] RECOVERY - dbbackup2 Puppet on dbbackup2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:24:18] !log dump db13 (c4) to dbbackup1
[20:24:21] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[21:24:11] PROBLEM - cp10 Current Load on cp10 is WARNING: WARNING - load average: 2.59, 3.54, 2.06
[21:26:09] RECOVERY - cp10 Current Load on cp10 is OK: OK - load average: 0.91, 2.55, 1.87
[21:49:37] PROBLEM - dbbackup2 Current Load on dbbackup2 is CRITICAL: CRITICAL - load average: 5.16, 3.39, 1.77
[21:50:24] [miraheze/puppet] JohnFLewis pushed 1 commit to master [+0/-0/±2] https://git.io/JtPiz
[21:50:26] [miraheze/puppet] JohnFLewis b6e3b27 - Remove NDKilla from ops
[21:50:44] hmm
[21:51:09] !log remove ndkilla from sre and sre-infrastructure groups following inactivity removal
[21:51:13] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[21:52:39] PROBLEM - ping6 on dbbackup2 is WARNING: PING WARNING - Packet loss = 0%, RTA = 148.38 ms
[21:52:57] PROBLEM - ping6 on dbbackup1 is WARNING: PING WARNING - Packet loss = 0%, RTA = 144.56 ms
[21:53:50] PROBLEM - ping6 on ns1 is WARNING: PING WARNING - Packet loss = 0%, RTA = 146.20 ms
[21:54:58] PROBLEM - ping6 on dbbackup1 is CRITICAL: PING CRITICAL - Packet loss = 16%, RTA = 142.57 ms
[21:55:53] !log removed NDKilla from GitHub and Matomo
[21:55:56] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[21:56:57] PROBLEM - ping6 on dbbackup1 is WARNING: PING WARNING - Packet loss = 0%, RTA = 142.78 ms
[22:00:00] RECOVERY - ping6 on ns1 is OK: PING OK - Packet loss = 0%, RTA = 109.88 ms
[22:00:34] RECOVERY - ping6 on dbbackup2 is OK: PING OK - Packet loss = 0%, RTA = 97.86 ms
[22:00:56] RECOVERY - ping6 on dbbackup1 is OK: PING OK - Packet loss = 0%, RTA = 99.02 ms
[22:19:18] PROBLEM - cp10 Current Load on cp10 is CRITICAL: CRITICAL - load average: 7.71, 4.02, 2.30
[22:21:16] RECOVERY - cp10 Current Load on cp10 is OK: OK - load average: 1.43, 2.85, 2.07
[22:38:42] !log remove NDKilla email address and forward his steward emails to his personal email
[22:38:45] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
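
Note on the Puppet timing discussed between 14:55 and 15:27: the participants mention runs at :02 and :32 on the mw servers, a 600-second timeout, and an Icinga check every 2 minutes. The sketch below only restates that arithmetic. It is a rough illustration, assuming those three figures apply together as described in the chat; the function names `next_puppet_run` and `alert_window` are hypothetical and are not part of Puppet, Icinga, or the miraheze/puppet repository.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Figures quoted in the chat; treat them as assumptions, not verified config.
PUPPET_RUN_MINUTES = (2, 32)          # "at :02 and at :32 for mw servers"
PUPPET_TIMEOUT_SECONDS = 600          # value pointed to in clone.pp, confirmed as seconds
ICINGA_CHECK_INTERVAL_SECONDS = 120   # "every 2 minutes"


def next_puppet_run(now: datetime) -> datetime:
    """Return the first scheduled run strictly after `now` (hypothetical helper)."""
    base = now.replace(second=0, microsecond=0)
    candidates = []
    # Look at this hour and the next one so a time after :32 rolls over correctly.
    for hour_offset in (0, 1):
        for minute in PUPPET_RUN_MINUTES:
            candidate = base.replace(minute=minute) + timedelta(hours=hour_offset)
            if candidate > now:
                candidates.append(candidate)
    return min(candidates)


def alert_window(run_start: datetime) -> Tuple[datetime, datetime]:
    """Earliest and latest time a timed-out run would be reported (hypothetical helper).

    The run fails once the timeout elapses; the alert then appears at the next
    Icinga check, which is at most one check interval later.
    """
    failed_at = run_start + timedelta(seconds=PUPPET_TIMEOUT_SECONDS)
    return failed_at, failed_at + timedelta(seconds=ICINGA_CHECK_INTERVAL_SECONDS)


if __name__ == "__main__":
    asked_at = datetime(2021, 2, 14, 14, 58, 12)  # timestamp of the "next run?" question
    run = next_puppet_run(asked_at)
    earliest, latest = alert_window(run)
    print("next run:", run)
    print("a timed-out run would alert between", earliest, "and", latest)
```

Under these assumptions, a run that hangs would surface in the channel roughly 10 to 12 minutes after it started, which matches the "10 minutes basically then it'll squawk" estimate plus the Icinga check interval.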