[06:26:33] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 3076 MB (12% inode=94%);
[07:24:33] PROBLEM - private.revi.wiki - Comodo on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:34] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[07:24:34] PROBLEM - wiki.autocountsoft.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:38] PROBLEM - wiki.cloudytheology.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:38] PROBLEM - guiasdobrasil.com.br - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:39] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:43] PROBLEM - disabled.life - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:45] PROBLEM - reviwiki.info - PositiveSSLDV on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:25:02] PROBLEM - cp4 Disk Space on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[07:25:14] PROBLEM - netazar.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:25:17] PROBLEM - cp4 Puppet on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[07:25:19] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[07:25:28] PROBLEM - cp4 SSH on cp4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:25:33] PROBLEM - cp4 Current Load on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[07:25:41] PROBLEM - Host cp4 is DOWN: PING CRITICAL - Packet loss = 100%
[07:25:55] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 2 datacenters are down: 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[07:26:06] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[07:26:19] Oh crap - cp4 died
[07:26:28] RECOVERY - private.revi.wiki - Comodo on sslhost is OK: OK - Certificate 'private.revi.wiki' will expire on Thu 07 Nov 2019 11:59:59 PM GMT +0000.
[07:26:30] RECOVERY - wiki.autocountsoft.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.autocountsoft.com' will expire on Sun 10 Nov 2019 11:13:25 AM GMT +0000.
[07:26:32] RECOVERY - wiki.cloudytheology.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.cloudytheology.com' will expire on Sun 15 Sep 2019 11:17:19 PM GMT +0000.
[07:26:38] RECOVERY - disabled.life - LetsEncrypt on sslhost is OK: OK - Certificate 'disabled.life' will expire on Sun 10 Nov 2019 10:58:16 AM GMT +0000.
[07:26:38] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[07:26:40] RECOVERY - reviwiki.info - PositiveSSLDV on sslhost is OK: OK - Certificate 'reviwiki.info' will expire on Wed 03 Feb 2021 11:59:59 PM GMT +0000.
[07:27:45] paladox, Reception123: ^ ideas?
[07:27:54] Cp4 is showing as down
[07:28:04] looking
[07:29:24] RhinosF1: RamNode again
[07:29:33] Ok
[07:29:50] opening a ticket so it should be back soon
[07:31:05] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:35:13] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[07:37:32] RECOVERY - netazar.org - LetsEncrypt on sslhost is OK: OK - Certificate 'www.netazar.org' will expire on Mon 07 Oct 2019 06:46:45 PM GMT +0000.
[07:44:47] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:01:49] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[08:06:16] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:08:15] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[08:14:41] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:16:41] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[08:21:08] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:27:25] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[08:33:51] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:35:30] Reception123: ramnode on a bank holiday slow go?
[08:35:47] I will let you know when I see JohnLewis and I will deliver that message to them
[08:35:47] @notify JohnLewis https://anopequotes.org/?442
[08:35:48] [ Anope Quotes - The unofficial quotes site of Anope IRC Services ] - anopequotes.org
[08:40:25] Perhaps
[08:40:33] I did high priority so not sure that's taking that long
[08:40:38] :)
[08:40:41] *What's
[08:48:44] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[08:53:10] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:25:15] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[09:36:48] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:38:47] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[09:39:24] RECOVERY - guiasdobrasil.com.br - LetsEncrypt on sslhost is OK: OK - Certificate 'guiasdobrasil.com.br' will expire on Sat 16 Nov 2019 01:37:57 PM GMT +0000.
[09:39:32] RECOVERY - Host cp4 is UP: PING OK - Packet loss = 0%, RTA = 0.51 ms
[09:39:36] PROBLEM - cp4 Stunnel Http for test1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:39:36] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:39:36] PROBLEM - cp4 Stunnel Http for misc2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:39:36] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:39:36] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.004 second response time
[09:39:48] RECOVERY - cp4 Stunnel Http for misc2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 41802 bytes in 0.077 second response time
[09:39:55] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.003 second response time
[09:39:57] RECOVERY - cp4 Disk Space on cp4 is OK: DISK OK - free space: / 20822 MB (50% inode=99%);
[09:40:06] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 1%
[09:40:07] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[09:40:19] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[09:40:51] RECOVERY - cp4 Current Load on cp4 is OK: OK - load average: 0.29, 0.18, 0.07
[09:40:54] RECOVERY - cp4 SSH on cp4 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u6 (protocol 2.0)
[09:41:01] RECOVERY - cp4 Stunnel Http for test1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24499 bytes in 0.015 second response time
[09:41:04] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[09:43:00] RECOVERY - cp4 Puppet on cp4 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[10:30:17] Reception123: it came back then ^
[10:31:03] seems like it yeah
[10:31:24] still discussing with RamNode about bandwidth exchange
[10:32:20] Reception123: from ? To ?
[10:32:37] RhinosF1: well cp4 connects to mw*
[10:32:54] Reception123: ok
[11:27:35] Hey JohnLewis
[12:05:09] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fjARX
[12:05:11] [02miraheze/services] 07MirahezeSSLBot 03d26a218 - BOT: Updating services config for wikis
[12:10:13] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fjARH
[12:10:14] [02miraheze/services] 07MirahezeSSLBot 03fd4f906 - BOT: Updating services config for wikis
[12:18:00] JohnLewis: promotional user page - meta - delete?
[12:19:28] Reception123: ^
[12:19:39] on it
[12:20:13] Reception123: CA shows no other edits so not a spam bot I'd say just promotional
[12:20:45] https://www.irccloud.com/pastebin/O0tnqXCR
[12:20:45] [ Snippet | IRCCloud ] - www.irccloud.com
[12:20:59] Reception123: cp4 playing up ^
[12:21:10] We're down
[12:21:12] just saw when I was trying to go on Meta
[12:21:23] And back
[12:21:36] But slow
[12:22:06] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[12:22:21] I can't access anything for now
[12:22:21] Reception123: see Icinga web
[12:22:31] ok looks fine now
[12:22:51] PROBLEM - kkutu.wiki - LetsEncrypt on sslhost is CRITICAL: connect to address kkutu.wiki and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket
[12:23:04] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[12:23:16] Reception123: ^ here it comes
[12:23:26] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw1 mw2
[12:23:46] well I dealt with the Meta spam
[12:23:48] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw2
[12:23:48] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw2
[12:23:49] * RhinosF1 prods JohnLewis and paladox
[12:24:03] now it's mw* what is going on
[12:24:06] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[12:24:47] RECOVERY - kkutu.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'kkutu.wiki' will expire on Sat 16 Nov 2019 02:04:40 PM GMT +0000.
[12:25:04] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[12:25:12] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: HTTP CRITICAL - No data received from host
[12:25:16] MW2 is down
[12:25:18] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:25:27] PROBLEM - mw2 SSH on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:25:43] Reception123: RamNode? Mw2
[12:25:46] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[12:25:48] I can see, checking
[12:26:00] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: HTTP CRITICAL - No data received from host
[12:26:07] RhinosF1: no, mw2 does not appear suspended by RamNode
[12:26:19] PROBLEM - mw2 Current Load on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[12:26:23] can't connect to it though
[12:26:26] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[12:26:39] Reception123: well something has taken it offline
[12:26:39] PROBLEM - mw2 php-fpm on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[12:26:49] RhinosF1: yes, RamNode status indicates "Offline" but not a suspension
[12:26:55] PROBLEM - Host mw2 is DOWN: PING CRITICAL - Packet loss = 100%
[12:27:15] Ah
[12:27:21] RhinosF1: hm?
[12:29:28] * RhinosF1 doesn't understand what broke it
[12:29:55] me neither
[12:30:02] paladox: JohnLewis when you're around pls ping me
[12:30:21] Pls
[12:33:32] !log depool mw2
[12:33:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[12:33:46] !log reboot mw2 (offline)
[12:33:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[12:34:13] RECOVERY - Host mw2 is UP: PING WARNING - Packet loss = 80%, RTA = 0.28 ms
[12:34:15] PROBLEM - mw2 Disk Space on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[12:34:29] PROBLEM - kkutu.wiki - LetsEncrypt on sslhost is CRITICAL: connect to address kkutu.wiki and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket
[12:34:40] Yey
[12:34:50] It's up!
[12:34:59] Reception123: recovering
[12:35:11] yeah, when I can connect to it I'll repool
[12:35:15] still strange that it just shut down like that
[12:35:42] Reception123: doesn't make sense. Any log for it?
[12:35:57] don't think there's a server-shut-down.log :P
[12:36:14] Reception123: anyone on it at the time?
[12:36:23] not afaik
[12:36:42] Doesn't add up - it shouldn't randomly turn off
[12:36:56] If there's no error log then we can't use that
[12:37:07] still can't ssh to it though
[12:37:41] Reception123: Icinga says connection refused
[12:37:43] RhinosF1: might be https://phabricator.miraheze.org/T4128
[12:37:44] [ ⚓ T4128 Fix SSH not working after reboot for various servers ] - phabricator.miraheze.org
[12:38:11] Reception123: could be
[12:38:21] RhinosF1: ping works so I can only imagine it being that that's why I can't ssh
[12:38:21] Try it then
[12:38:36] Reception123: probably - try what it says then
[12:39:37] will do
[12:40:21] RECOVERY - kkutu.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'kkutu.wiki' will expire on Sat 16 Nov 2019 02:04:40 PM GMT +0000.
[12:43:56] RECOVERY - mw2 SSH on mw2 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u6 (protocol 2.0)
[12:44:34] !log started sshd and removed /var/run/nologin on mw2
[12:44:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[12:45:02] RhinosF1: and it's back
[12:45:06] !log repool mw2
[12:45:13] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[12:45:54] RECOVERY - mw2 Current Load on mw2 is OK: OK - load average: 0.29, 0.08, 0.02
[12:46:05] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.025 second response time
[12:46:14] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[12:46:48] RECOVERY - mw2 Disk Space on mw2 is OK: DISK OK - free space: / 32919 MB (42% inode=98%);
[12:46:59] RECOVERY - mw2 php-fpm on mw2 is OK: PROCS OK: 7 processes with command name 'php-fpm7.2'
[12:47:12] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.007 second response time
[12:47:13] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.392 second response time
[12:47:26] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[12:47:32] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.687 second response time
[12:47:48] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[12:47:48] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[12:52:03] PROBLEM - kkutu.wiki - LetsEncrypt on sslhost is CRITICAL: connect to address kkutu.wiki and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket
[12:57:35] Reception123: good job!
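[Editor's note] The fix !logged above - starting sshd and removing /var/run/nologin - works around a known boot-order problem: systemd creates /run/nologin early in boot and normally deletes it once startup finishes, and while it lingers, pam_nologin rejects non-root SSH logins even though the host answers ping (matching the "connection refused"/timeout symptoms in T4128). A minimal sketch of the mechanism, using a scratch file in place of the real /var/run/nologin so it is safe to run anywhere:

```shell
#!/bin/sh
# Sketch (assumption): emulate the cleanup applied to mw2.
# NOLOGIN stands in for /var/run/nologin, which pam_nologin
# checks before allowing a non-root login.
NOLOGIN=./run-nologin-demo

touch "$NOLOGIN"                  # what systemd leaves behind mid-boot
if [ -f "$NOLOGIN" ]; then
    echo "logins blocked"         # pam_nologin would refuse SSH here
    rm -f "$NOLOGIN"              # equivalent of: sudo rm /var/run/nologin
fi
[ -f "$NOLOGIN" ] || echo "logins allowed"
```

On the real host the removal has to be followed by starting sshd, as the !log entry records, since sshd itself had not come up after the reboot.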
[13:02:07] RhinosF1: thanks :)
[13:04:47] Reception123: np
[13:47:56] PROBLEM - mw2 Puppet on mw2 is WARNING: WARNING: Puppet last ran 1 hour ago
[13:53:56] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:14:54] Voidwalker, JohnLewis: I can smell Trouble -> https://meta.miraheze.org/wiki/Special:CentralAuth?target=Sticky+Is+A+Clown+And+Dies+In+Hell
[14:14:55] [ Global account information for Sticky Is A Clown And Dies In Hell - Miraheze Meta ] - meta.miraheze.org
[14:28:39] Voidwalker: thx
[14:55:56] PROBLEM - mw2 Puppet on mw2 is WARNING: WARNING: Puppet last ran 1 hour ago
[14:56:53] paladox: ^ what's going on? It seems like puppet only runs when I run it manually
[14:57:00] hmm
[14:57:02] at least according to icinga
[14:57:26] you never started the cron service Reception123 :)
[14:57:37] !log root@mw2:/home/paladox# sudo service cron start
[14:57:42] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[14:57:46] paladox: why would I need to do that manually?
[14:57:56] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[14:58:15] because it's the same issue affecting ssh comming up after a reboot
[14:58:23] ah ok
[15:00:27] Someone should document that
[15:00:37] Every step to be ran on a server restart
[15:00:53] well they're not steps to be ran
[15:00:54] it's a bug
[15:01:09] that affects like 3 out of 17 servers
[15:02:41] Ah
[16:23:53] JohnLewis, Reception123, paladox: Can I get a check on a CW exception within the last few minutes?
[16:25:02] JohnLewis: any chance of a rights assignment for https://bmtune.miraheze.org/wiki/Special:ListUsers?
[16:25:04] [ User list - BMTune Honda wiki ] - bmtune.miraheze.org
[16:26:30] can't find anything in exception.log
[16:27:23] RhinosF1: what time was it?
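[Editor's note] The cron surprise above is the same T4128 boot bug that bit sshd earlier in the day, and "someone should document that" is answered nowhere in the log. A speculative consolidation of the recovery steps scattered through this channel, as a checklist script; the command names come from the !log entries above, while the DRY_RUN guard is an addition so the sketch can be executed harmlessly:

```shell
#!/bin/sh
# Sketch: post-reboot recovery for a T4128-affected server (mw2 here).
# DRY_RUN=1 only prints what would run; set DRY_RUN=0 on a real host.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run rm -f /var/run/nologin    # clear the boot-time login block
run service ssh start         # sshd did not come up on its own
run service cron start        # nor did cron, so scheduled puppet runs stopped
run puppet agent --test       # confirm puppet completes with 0 failures
```

The final puppet run is only a verification step; Icinga's "Puppet last ran 1 hour ago" warning clears on its own once cron is back.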
[16:27:26] Only thing I've got is
[16:27:27] Mon Aug 26 15:10:08 UTC 2019 mw1 metawiki Error connecting to mediawiki-internal-db-master.miraheze.org: :real_connect(): (HY000/2002):
[16:27:50] Reception123: that looks to early
[16:28:00] well I can't really find anything else
[16:28:00] And per yesterday no code to give you
[16:29:00] RhinosF1: https://meta.miraheze.org/wiki/Special:RequestWikiQueue/9058#mw-section-request was it right? so I can mark as approved
[16:29:01] [ Wiki requests queue - Miraheze Meta ] - meta.miraheze.org
[16:29:39] Reception123: needs rights doing
[16:30:02] well as John said I should probably not do that unless there's really no one around, so I'll wait for him or Void to get back
[16:30:44] Reception123: yeah, time to hang around for a bit
[16:44:09] Reception123, you called? :)
[16:44:24] Voidwalker: yeah rights needed on https://bmtune.miraheze.org/wiki/Special:ListUsers
[16:44:24] [ User list - BMTune Honda wiki ] - bmtune.miraheze.org
[16:44:31] (the user already logged in so no need for me to do anything this time)
[16:45:22] done
[16:45:51] thanks :)
[16:46:37] !log exporting matomo db, and reimporting it on db5
[16:46:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:47:55] PROBLEM - misc2 HTTPS on misc2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 372 bytes in 0.007 second response time
[16:48:58] PROBLEM - cp3 Stunnel Http for misc2 on cp3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 372 bytes in 0.508 second response time
[16:49:01] PROBLEM - cp2 Stunnel Http for misc2 on cp2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 372 bytes in 0.295 second response time
[16:49:16] what's going on this time...
[16:49:36] PROBLEM - cp4 Stunnel Http for misc2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[16:50:08] Reception123: can u mark the WR as approved?
[16:50:33] !log MariaDB [metawiki]> update cw_requests set cw_status = 'approved' where cw_id = 9058;
[16:50:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:52:10] Reception123: that'll be paladox seen as Matamo is misc 2 isn't it
[16:52:25] ah true
[16:52:47] yeh
[16:52:48] that's mean
[16:52:56] but i dunno why misc2 decided to do that...
[16:54:06] paladox: what's mean?
[16:54:16] not sure what you mean?
[16:55:03] paladox: read scrollback - or did your spelling fail?
[16:55:43] oh, spelling mistake, that's ment to be me
[16:58:58] RECOVERY - cp3 Stunnel Http for misc2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 41802 bytes in 1.034 second response time
[16:59:01] RECOVERY - cp2 Stunnel Http for misc2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 41802 bytes in 0.563 second response time
[16:59:26] RECOVERY - cp4 Stunnel Http for misc2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 41802 bytes in 0.112 second response time
[16:59:51] RECOVERY - misc2 HTTPS on misc2 is OK: HTTP OK: HTTP/1.1 200 OK - 41810 bytes in 0.115 second response time
[17:10:33] PROBLEM - cp3 Disk Space on cp3 is WARNING: DISK WARNING - free space: / 2649 MB (10% inode=94%);
[17:13:02] Reception123: ^ recovering
[17:17:34] [02miraheze/mw-config] 07Reception123 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fjA2h
[17:17:35] [02miraheze/mw-config] 07Reception123 03dd96e66 - exempt electronpdf from cookiewarning
[17:17:38] ^ RhinosF1 this is what we're trying out
[17:19:45] RhinosF1: doesn't work due to varnsih
[17:19:47] *varnish
[17:20:04] Okay
[17:22:50] CVT stuff has been updated
[17:24:03] Voidwalker: great, thanks :)
[18:17:38] !log reception@mw1:/srv/mediawiki/w/extensions/CreateWiki/maintenance$ sudo -u www-data php populateMainPage.php --wiki=aescraftwiki
[18:17:42] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[18:19:06] !log MariaDB [metawiki]> update cw_requests set cw_status = 'approved' where cw_id = 9062;
[18:19:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[18:22:44] !log reception@mw1:/srv/mediawiki/w/extensions/CreateWiki/maintenance$ sudo -u www-data php populateMainPage.php --wiki=2b2tmcpewiki
[18:22:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[18:23:18] !log reception@mw1:/srv/mediawiki/w/extensions/CentralAuth/maintenance$ sudo -u www-data php createLocalAccount.php --wiki=2b2tmcpewiki 'Memelord27 '
[18:23:19] Voidwalker: ^
[18:23:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[18:25:50] !log MariaDB [metawiki]> update cw_requests set cw_status = 'approved' where cw_id = 9063;
[18:25:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[18:27:20] * RhinosF1 slaps Reception123 with a trout
[18:27:34] lol
[18:27:43] RhinosF1: fixed but corrected on-wiki instead of saying "set other to inreview" etc.
[18:27:51] since it's the same effect
[18:28:40] Reception123: k thx
[18:34:52] !log drop piwik database and reimport on db5
[18:35:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:11:15] * RhinosF1 prods JohnLewis
[19:11:28] JohnLewis: u around?
[19:11:38] RhinosF1: kinda but not really rn
[19:12:15] JohnLewis: u able to double check why we had 2 CW fails in a row earlier and confirm it should work again
[19:12:29] jobqueue I bet
[19:12:55] JohnLewis: I could guess that
[19:13:05] But Reception123 couldn't find it in the logs
[19:13:15] then the wiki exists :P
[19:13:20] it'll be jobqueue or wiki exists
[19:13:36] JohnLewis: JQ then
[19:13:41] Should it work now?
[19:13:46] 2 in a row unusual
[19:13:51] Probably that annoying jobqueue again
[19:14:23] Redis ooms, so jq then fails until redis starts up
[19:14:34] paladox: ah
[19:14:52] * RhinosF1 tries another request
[19:15:37] paladox: still down
[19:16:12] It’ll be the wiki exists
[19:16:35] paladox: https://goanimatev6.miraheze.org/wiki/Special:ListUsers
[19:16:36] [ User list - GoAnimate V6 ] - goanimatev6.miraheze.org
[19:16:48] I need a main page and local account from u guys first
[19:17:29] JohnLewis, Reception123: ^
[19:18:57] urgh more...
[19:19:27] Yep
[19:19:34] !log reception@mw1:/srv/mediawiki/w/extensions/CentralAuth/maintenance$ sudo -u www-data php createLocalAccount.php --wiki=goanimatev6wiki 'Eijitheawesome'
[19:19:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:19:42] Will someone please get it working again for everyone's sake
[19:20:02] JohnLewis, Voidwalker: ^ add rights pls
[19:20:04] !log reception@mw1:/srv/mediawiki/w/extensions/CreateWiki/maintenance$ sudo -u www-data php populateMainPage.php --wiki=goanimatev6wiki
[19:20:05] That’ll require a resource bump for misc2
[19:20:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:20:20] paladox: should I try a jobrunner restart?
[19:20:27] paladox: well without it we can't create wikis
[19:20:43] Yup
[19:21:06] paladox: and that's a bloody big issue
[19:21:23] Yup
[19:21:47] paladox: so can we try something pls
[19:21:49] done
[19:22:02] Reception123: ^ approve in the dB pls
[19:22:12] !log restarted jobrunner
[19:22:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:22:23] Thank you Reception123
[19:22:30] RhinosF1: not much I can do, requires resource bump
[19:22:32] Mark that approved and I'll try the other
[19:22:38] Reception123: ^
[19:22:48] !log MariaDB [metawiki]> update cw_requests set cw_status = 'approved' where cw_id = 9063;
[19:22:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:22:55] Thx
[19:22:57] RhinosF1: love how wiki creation is now a three man process
[19:23:02] Reception123: so do I
[19:23:03] 1) the creator 2) the sysadmin 3) the steward
[19:23:13] Shall I do the next one?
[19:23:16] more tedious than even deleting a wiki tbh
[19:23:31] RhinosF1: I mean might as well do it now while all the people involved in creation are around ;)
[19:23:51] Reception123: worked!!
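[Editor's note] The UPDATE Reception123 keeps !logging is the manual side of wiki approval: when the job queue drops the ball, the CreateWiki request row is flipped to 'approved' by hand. A stand-alone replay of that statement against a throwaway SQLite database; the cw_requests table shape here is inferred from the log, and in production the command runs in MariaDB on metawiki:

```shell
#!/bin/sh
# Sketch: replay the manual approval statement on a scratch database.
DB=./cw_requests_demo.sqlite
rm -f "$DB"

sqlite3 "$DB" "CREATE TABLE cw_requests (cw_id INTEGER PRIMARY KEY, cw_status TEXT);"
sqlite3 "$DB" "INSERT INTO cw_requests (cw_id, cw_status) VALUES (9063, 'inreview');"

# Same shape as the logged MariaDB command:
#   update cw_requests set cw_status = 'approved' where cw_id = 9063;
sqlite3 "$DB" "UPDATE cw_requests SET cw_status = 'approved' WHERE cw_id = 9063;"
sqlite3 "$DB" "SELECT cw_status FROM cw_requests WHERE cw_id = 9063;"
```

The WHERE clause on the primary key is what keeps this safe to run by hand; an unqualified UPDATE would approve every pending request at once.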
[19:24:05] RhinosF1: great :0
[19:24:08] *:D
[19:24:14] looks like it was that jobqueue after all again
[19:24:22] I never liked it, always some issue with it
[19:24:27] It was even on test1 at some point
[19:24:34] * Voidwalker is relieved :)
[19:24:36] Huh
[19:24:45] * paladox facepalms
[19:25:16] * RhinosF1 cheers
[19:25:29] * RhinosF1 then punches the jobQueue
[19:25:45] good, if not I had to create a new userbox saying "This user assists wiki creators with creation"
[19:26:07] Reception123: :)
[19:26:10] * RhinosF1 laughs
[20:00:44] Miraheze Logo 503 Backend fetch failed
[20:00:46] urgh again
[20:01:10] paladox: ^
[20:01:34] I’m mobile
[20:02:07] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:02:32] it's back
[20:02:42] though I don't get why it keeps doing this again
[20:03:33] paladox: matomo is back by the way :)
[20:04:06] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[20:06:52] Ok :)
[20:06:55] Check logs
[20:52:16] hello
[20:53:50] Hi Examknow
[20:54:21] thanks for fixing the grammar issues
[20:54:40] probably should not have done the i18n files late at night
[20:54:54] Examknow: np, did u see the other PR?
[20:55:07] yup
[20:55:14] And Voidwalker checked it and had a few thoughts
[20:55:31] ok
[20:55:36] I did not see those
[20:55:44] Voidwalker: ^ do you want to do the honours of explaining
[20:56:05] please do
[20:56:09] Examknow: see phab but Voidwalker thinks he has an idea on how to do it
[20:56:15] ok
[20:57:17] Voidwalker: What is your github?
[20:58:26] https://github.com/The-Voidwalker
[20:58:26] [ The-Voidwalker · GitHub ] - github.com
[20:58:43] PROBLEM - misc3 Current Load on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:58:52] PROBLEM - misc3 SSH on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:58:55] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[20:58:58] PROBLEM - misc3 Disk Space on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:59:04] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:59:05] Voidwalker: Added you to repo
[20:59:07] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:59:09] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:59:12] PROBLEM - misc3 proton on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:13] PROBLEM - cp3 Stunnel Http for misc3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:59:13] PROBLEM - misc3 zotero on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:18] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:22] PROBLEM - misc3 restbase on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:22] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:59:25] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:26] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[20:59:36] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:59:44] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 55%
[20:59:46] PROBLEM - misc3 lizard.miraheze.org HTTPS on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:48] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[20:59:59] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:00:00] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:05] Somebody broke something
[21:00:07] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[21:00:09] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:10] PROBLEM - cp2 Stunnel Http for misc3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:12] PROBLEM - misc3 electron on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:00:13] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:16] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:17] PROBLEM - cp4 Stunnel Http for misc3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:17] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:30] PROBLEM - misc3 Puppet on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:59] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[21:01:19] JohnLewis, paladox, Reception123 ^
[21:01:40] meta just went down
[21:01:42] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 37%
[21:01:45] RECOVERY - misc3 lizard.miraheze.org HTTPS on misc3 is OK: HTTP OK: Status line output matched "HTTP/1.1 401 Unauthorized" - 381 bytes in 0.027 second response time
[21:01:57] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 1.336 second response time
[21:01:58] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[21:02:07] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.675 second response time
[21:02:07] RECOVERY - cp2 Stunnel Http for misc3 on cp2 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 0.399 second response time
[21:02:09] RECOVERY - misc3 electron on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 3000
[21:02:11] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.392 second response time
[21:02:15] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.392 second response time
[21:02:15] the idea I'm currently thinking of involves maintaining a list in a central DB of wikis where CU data may be present for the check user
[21:02:15] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.009 second response time
[21:02:16] RECOVERY - cp4 Stunnel Http for misc3 on cp4 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 0.012 second response time
[21:02:16] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 61%
[21:02:24] RECOVERY - misc3 Puppet on misc3 is OK: OK: Puppet is currently enabled, last run 10 minutes ago with 0 failures
[21:02:45] RECOVERY - misc3 Current Load on misc3 is OK: OK - load average: 1.01, 1.20, 0.72
[21:02:54] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[21:02:58] RECOVERY - misc3 Disk Space on misc3 is OK: DISK OK - free space: / 42658 MB (88% inode=94%);
[21:02:58] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 0%
[21:03:00] RECOVERY - misc3 SSH on misc3 is OK: SSH OK - OpenSSH_7.9p1 Debian-10 (protocol 2.0)
[21:03:04] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[21:03:10] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.681 second response time
[21:03:13] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.004 second response time
[21:03:14] hmm
[21:03:14] RECOVERY - misc3 proton on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 3000
[21:03:16] RECOVERY - misc3 zotero on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 1969
[21:03:19] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[21:03:20] trouble is, the code is difficult to navigate and all I've been able to do for the past hour is around 50 lines of changes
[21:03:22] RECOVERY - misc3 restbase on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 7231
[21:03:22] RECOVERY - cp3 Stunnel Http for misc3 on cp3 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 0.493 second response time
[21:03:23] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.789 second response time
[21:03:26] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[21:03:27] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[21:03:32] also icinga-miraheze is making it really hard to
chat here :P [21:03:34] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.392 second response time [21:03:44] misc3 may have been the cause [21:03:48] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy [21:04:06] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [21:04:16] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 1% [21:04:23] Voidwalker: Given that meta just went back up it should the bot should be doone [21:04:27] done [21:05:11] paladox, most likely considering it's the first service to report problems [21:05:22] Voidwalker: i have suggested moving the bot to another channel before [21:06:05] every now and again, it seems like a decent idea :) [21:06:17] i think there should be a #miraheze-maintenance like RhinosF1 said [21:06:25] we should make an RFC [21:06:31] maybe others agree [21:06:58] Examknow: not sure whether it needs an 'RfC' as such - maybe just CN [21:07:12] maybe [21:09:54] Voidwalker: So what about globalCU [21:10:29] Voidwalker yup [21:10:51] currently, I'm in the process of tearing the whole thing apart to try and do what I want, but it's very slow going [21:11:02] ok [21:11:19] I did not really put much docs in there sorry [21:14:36] meta is down again [21:15:04] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [21:15:14] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:15:25] PROBLEM - misc3 SSH on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:15:26] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. 
mw1 mw2 mw3 [21:15:27] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:15:29] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:15:30] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:15:36] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:15:37] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:15:48] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3 [21:15:51] PROBLEM - cp3 Stunnel Http for misc3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:15:52] huh [21:15:59] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:16:06] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [21:16:09] PROBLEM - misc3 lizard.miraheze.org HTTPS on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:16:14] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:16:16] PROBLEM - cp2 Stunnel Http for misc3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:16:21] PROBLEM - misc3 electron on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:16:23] ffs [21:16:23] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:16:27] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
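[Editor's note: Voidwalker's GlobalCU idea above (21:02:15) — keep a central DB listing the wikis where CheckUser data may exist for a user, so a global check only queries those wikis — can be sketched roughly as below. The table name, columns, and helper functions are invented for illustration and are not GlobalCU's actual schema or API.]

```python
import sqlite3

# In-memory stand-in for the central database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE cu_wikis (
    user_name TEXT NOT NULL,
    wiki_id   TEXT NOT NULL,
    PRIMARY KEY (user_name, wiki_id)
)""")

def record_activity(user, wiki):
    # Record that `user` performed an action that leaves CU data on `wiki`.
    conn.execute("INSERT OR IGNORE INTO cu_wikis VALUES (?, ?)", (user, wiki))

def wikis_with_cu_data(user):
    # A global check then only needs to query the wikis returned here,
    # instead of every wiki on the farm.
    rows = conn.execute(
        "SELECT wiki_id FROM cu_wikis WHERE user_name = ?", (user,))
    return sorted(r[0] for r in rows)

record_activity("ExampleUser", "metawiki")
record_activity("ExampleUser", "testwiki")
record_activity("ExampleUser", "metawiki")  # duplicate is ignored by the PK
```

The appeal of this design is that the expensive part of a global check scales with the number of wikis a user actually touched, not with the total number of wikis on the farm.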
[21:16:29] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:16:30] PROBLEM - misc3 Puppet on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:16:31] misc3 again?
[21:16:38] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:16:39] PROBLEM - cp4 Stunnel Http for misc3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:16:50] PROBLEM - misc3 Current Load on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:16:52] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[21:16:55] ssh is not working
[21:16:58] PROBLEM - misc3 Disk Space on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:17:24] PROBLEM - misc3 zotero on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:17:29] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 57%
[21:17:52] RECOVERY - cp3 Stunnel Http for misc3 on cp3 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 2.347 second response time
[21:18:08] RECOVERY - misc3 lizard.miraheze.org HTTPS on misc3 is OK: HTTP OK: Status line output matched "HTTP/1.1 401 Unauthorized" - 381 bytes in 0.012 second response time
[21:18:13] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.006 second response time
[21:18:13] RECOVERY - cp2 Stunnel Http for misc3 on cp2 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 0.299 second response time
[21:18:16] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 54%
[21:18:19] RECOVERY - misc3 electron on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 3000
[21:18:21] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.676 second response time
[21:18:24] RECOVERY - misc3 Puppet on misc3 is OK: OK: Puppet is currently enabled, last run 6 minutes ago with 0 failures
[21:18:25] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.396 second response time
[21:18:27] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.400 second response time
[21:18:38] RECOVERY - cp4 Stunnel Http for misc3 on cp4 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 0.005 second response time
[21:18:38] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.307 second response time
[21:18:47] RECOVERY - misc3 Current Load on misc3 is OK: OK - load average: 0.33, 0.51, 0.53
[21:18:52] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[21:18:53] RECOVERY - misc3 Disk Space on misc3 is OK: DISK OK - free space: / 42657 MB (88% inode=94%);
[21:19:11] and we are back online
[21:19:15] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.637 second response time
[21:19:21] RECOVERY - misc3 zotero on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 1969
[21:19:26] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[21:19:28] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.018 second response time
[21:19:29] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 2%
[21:19:31] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.659 second response time
[21:19:34] RECOVERY - misc3 SSH on misc3 is OK: SSH OK - OpenSSH_7.9p1 Debian-10 (protocol 2.0)
[21:19:34] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.410 second response time
[21:19:36] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.006 second response time
[21:19:40] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[21:19:48] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[21:19:54] Examknow: yep
[21:20:04] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 1.400 second response time
[21:20:07] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[21:20:16] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 0%
[21:21:04] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[21:33:53] Voidwalker: In the meantime is there something I can do on meta?
[21:34:10] Examknow: let me think
[21:34:15] been trying to get wikicreator for a while but there isnt much to do
[21:34:34] Examknow: i'd give it until you've been active for a bit
[21:34:47] i know
[21:34:57] that is why I need something to do on wiki
[21:35:46] Examknow: find an answer to https://meta.miraheze.org/wiki/Stewards%27_noticeboard#Hiding_the_.22from_.2Awikiname.2A.22_text
[21:35:46] [ Stewards' noticeboard - Miraheze Meta ] - meta.miraheze.org
[21:37:14] I know for that one, you have to hide something with css or edit some system message
[21:37:44] Voidwalker: but exactly which one and how
[21:38:07] could you tell me so I can answer it
[21:38:32] * RhinosF1 has no clue or'd have done it by now
[21:39:19] Examknow: we know what they mean or at least I do - it's on some pages
[21:40:09] RhinosF1: Is he talking about his local wiki?
[21:40:41] Examknow: it's on all wikis so just giving a solution would be best and they can use global.css if it's a css hack to fix
[21:40:52] Voidwalker: iirc it's the tagline
[21:41:02] and shouldn't GCU be a candidate for translatewiki
[21:41:05] hey SPF|Cloud
[21:41:53] RhinosF1: I see
[21:42:05] Hi
[21:42:11] SPF|Cloud: how u doing?
[21:42:26] I'm fine
[21:42:30] Examknow: Do you want me to see about translatewiki
[21:42:33] SPF|Cloud: good
[21:42:45] Yes
[21:42:53] Examknow: will do
[21:42:58] Actually going to bed but couldn't leave this question unanswered
[21:43:33] Examknow: https://phpcodechecker.com/
[21:43:33] [ PHP Code Checker - Syntax Check for Common PHP Mistakes ] - phpcodechecker.com
[21:43:57] SPF|Cloud: you sound like me not liking leaving stuff
[21:47:43] Examknow: could you give me owner access so I can set up translatewiki
[21:48:02] ??
[21:48:20] Examknow: I need access to add the translatewiki bot
[21:49:03] you mean for the repo?
[21:49:08] Examknow: ye
[21:49:53] just tell me how to do it and I will do it
[21:51:45] RhinosF1: Hello?
[21:52:10] Examknow: add @translatewiki with push access
[21:53:04] done
[22:04:35] Examknow: see PR
[22:07:08] did you change my flags on your channel?
[22:07:36] Examknow: swapping it to your new cloak
[22:07:42] ok
[22:12:18] Examknow: Am I okay to merge the syntax fixes?
[22:12:25] ok
[22:12:35] * RhinosF1 takes that as a yes
[22:12:55] yup
[22:14:33] does anyone mind me sticking auto patrolled on examknow?
[22:14:41] Voidwalker, JohnLewis: thoughts?
[22:14:55] it's fine
[22:15:14] * RhinosF1 waits for meta
[22:15:26] I am currently copying over some of my user scripts
[22:16:22] Examknow: cool, saves me having to patrol them. That's something you can help with. Marking revisions as patrolled
[22:16:41] RhinosF1: Thanks
[22:16:41] sorry for disappearing, but it's either blanking MediaWiki:Tagline or adding #siteSub { display:none; } to Common.css
[22:17:10] oh I already figured that out myself
[22:17:25] just wasn't sure what he was talking about at first
[22:17:27] Examknow: also pings don't work if you don't resign
[22:20:10] so the request I was working on in the stewards' noticeboard I don't think should have been there, so should I move it or just tell the user for next time
[22:21:12] Examknow: tell them for next time
[22:21:17] ok
[22:21:27] Voidwalker: do you want to look at closing [[Discord/RfC]]
[22:24:02] might want to leave that to whoever is in operations on discord, as they have the ability to action it
[22:24:57] Voidwalker: thought it was both ops and stewards but i'll poke Reception123 tomorrow
[22:25:25] nah, I can do anything else though :P
[22:25:33] Voidwalker: ah
[22:25:52] just can't manage channels
[22:26:00] + some other things
[22:26:05] Voidwalker: gotcha
[22:26:15] Voidwalker is very powerful
[22:26:26] random
[22:26:34] :P
[22:27:02] actually, I can manage existing channels, but whatever at this point
[22:27:12] and JohnLewis would probably be the most - technically any of the steward + operations members that share both
[22:27:39] well JohnLewis is the founder
[22:27:57] Examknow: that's very true
[22:28:29] but in all fairness it's pretty community orientated anyway and allows the community to make a lot of calls
[22:28:43] yeah
[22:29:16] got to be pretty honest, when I first came about miraheze I thought it was just a rip off of the wmf
[22:29:27] but you guys are pretty cool
[22:29:41] having technical access is not an invitation to use that access
[22:29:55] ??
[22:30:17] Examknow: not everything sysadmins can do we are supposed to do
[22:30:25] we as in them
[22:30:33] ah
[22:30:38] huh?
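[Editor's note: Voidwalker's answer at 22:16:41 gives two equivalent fixes for hiding the "from *wikiname*" tagline. The CSS variant, quoted from the log, would go in a wiki's MediaWiki:Common.css (or a user's global.css, as RhinosF1 suggested); the selector is the one named in the log:]

```css
/* Hide the "from *wikiname*" site subtitle shown under page titles.
   Alternative per the log: blank the MediaWiki:Tagline system message. */
#siteSub {
    display: none;
}
```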
[22:30:41] couldn't agree more
[22:30:42] kinda in response to "but in all fairness it's pretty community orientated anyway and allows the community to make a lot of calls"
[22:32:49] I'm confused about what I pinged over ;)
[22:32:56] if it involves Discord, I don't use it
[22:33:03] JohnLewis: how powerful you are
[22:33:11] lol yeah
[22:33:15] and yeah not on discord cause John isn't there
[22:33:29] PROBLEM - test1 Current Load on test1 is CRITICAL: CRITICAL - load average: 2.98, 1.96, 1.30
[22:35:29] PROBLEM - test1 Current Load on test1 is WARNING: WARNING - load average: 1.62, 1.94, 1.38
[22:35:32] Examknow: made https://meta.miraheze.org/w/index.php?title=Template:Freenode&diff=81348&oldid=81068 actually work and more simple
[22:35:33] [ Difference between revisions of "Template:Freenode" - Miraheze Meta ] - meta.miraheze.org
[22:36:36] RhinosF1: Thanks!
[22:37:07] Examknow: np, straight importing uses too many enwp only stuff
[22:37:29] RECOVERY - test1 Current Load on test1 is OK: OK - load average: 1.12, 1.57, 1.30
[22:46:31] Voidwalker: What kind of things should I do before I request wiki creator other than be active
[22:46:44] Examknow: know policy
[22:47:15] although don't worry, we aren't wikipedia and going to give you 600 wiki requests to answer correctly first
[22:47:36] also anyone who wants to give me a nom should feel glad to ;)
[22:47:51] Examknow: stay active for a few weeks and ping me
[22:47:56] ok
[22:48:08] * RhinosF1 will happily consider it
[22:49:53] Thanks!
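[Editor's note: the recurring "Current Load" alerts above come from a check that compares the 1-, 5- and 15-minute load averages against warning and critical thresholds. A minimal sketch of that classification logic follows; the threshold values are made up for illustration, since the actual NRPE configuration is not shown in the log.]

```python
def classify_load(load1, load5, load15,
                  warn=(1.5, 1.4, 1.3), crit=(3.0, 2.5, 2.0)):
    """Return an Icinga-style state for a load-average triple.

    Thresholds here are illustrative defaults, not Miraheze's real ones:
    any value at or above its critical threshold wins, then warning.
    """
    triple = (load1, load5, load15)
    if any(v >= c for v, c in zip(triple, crit)):
        return "CRITICAL"
    if any(v >= w for v, w in zip(triple, warn)):
        return "WARNING"
    return "OK"
```

This is why the channel sees CRITICAL/WARNING/RECOVERY flapping as a load spike (such as paladox's dump import later in the log) ramps up and decays across the three averaging windows.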
[22:51:29] PROBLEM - test1 Current Load on test1 is CRITICAL: CRITICAL - load average: 3.09, 2.11, 1.57
[22:52:00] * RhinosF1 looks over at icinga-miraheze and sighs again - it's only test1
[22:52:37] icinga is getting pretty annoying
[22:52:54] Examknow: wait until there's a batch of icinga spam
[22:53:22] i have seen it
[22:53:29] PROBLEM - test1 Current Load on test1 is WARNING: WARNING - load average: 1.07, 1.80, 1.53
[22:53:46] here we go again
[22:54:02] Examknow: normally either SSL catching up and renewing at once or trash can puppet alerts that are pointless
[22:54:14] for real
[22:54:26] Examknow: for real
[22:57:29] PROBLEM - test1 Current Load on test1 is CRITICAL: CRITICAL - load average: 2.43, 2.35, 1.82
[22:57:49] who's running something at this time?
[22:58:39] paladox: that you? ^
[22:58:53] yeh, it's an import dump running
[22:59:01] paladox: ah cool
[22:59:11] Examknow: there you go, blame it on paladox
[22:59:34] blames paladox
[22:59:43] Examknow: /me
[22:59:44] [miraheze/mediawiki] paladox pushed 1 commit to REL1_33 [+0/-0/±1] https://git.io/fjAKo
[22:59:45] [miraheze/mediawiki] paladox 2009856 - Update Echo
[23:06:24] !log run runJob.php on test1
[23:06:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[23:09:29] PROBLEM - test1 Current Load on test1 is WARNING: WARNING - load average: 0.63, 1.57, 1.80
[23:13:29] RECOVERY - test1 Current Load on test1 is OK: OK - load average: 0.96, 1.29, 1.63
[23:20:09] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/fjAKb
[23:20:11] [miraheze/services] MirahezeSSLBot a4d29a9 - BOT: Updating services config for wikis
[23:22:15] PROBLEM - misc2 HTTPS on misc2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:22:53] paladox: keep an eye pls ^
[23:23:16] misc2 works for me
[23:23:50] paladox: looks to be fine, 2 alerts have just recovered so not sure
[23:24:12] RECOVERY - misc2 HTTPS on misc2 is OK: HTTP OK: HTTP/1.1 200 OK - 41810 bytes in 0.096 second response time
[23:24:24] clear
[23:24:57] paladox: can u look at the orange banner on misc2 and see if it's worth fixing
[23:25:10] orange banner?
[23:25:32] paladox: yeah, the massive orange warning when going to misc2.miraheze.org
[23:26:19] oh, definitely not something we should fix. Since we want piwik/matomo traffic to only go over matomo.miraheze.org. Not misc2.miraheze.org.
[23:26:34] paladox: k
[23:26:37] :)
[23:45:57] PROBLEM - mw2 Current Load on mw2 is CRITICAL: CRITICAL - load average: 8.52, 7.15, 5.16
[23:47:54] RECOVERY - mw2 Current Load on mw2 is OK: OK - load average: 4.96, 6.45, 5.15