[06:26:33] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 3076 MB (12% inode=94%);
[07:24:33] PROBLEM - private.revi.wiki - Comodo on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:34] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[07:24:34] PROBLEM - wiki.autocountsoft.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:38] PROBLEM - wiki.cloudytheology.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:38] PROBLEM - guiasdobrasil.com.br - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:39] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:43] PROBLEM - disabled.life - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:45] PROBLEM - reviwiki.info - PositiveSSLDV on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:25:02] PROBLEM - cp4 Disk Space on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[07:25:14] PROBLEM - netazar.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:25:17] PROBLEM - cp4 Puppet on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[07:25:19] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[07:25:28] PROBLEM - cp4 SSH on cp4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:25:33] PROBLEM - cp4 Current Load on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[07:25:41] PROBLEM - Host cp4 is DOWN: PING CRITICAL - Packet loss = 100%
[07:25:55] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 2 datacenters are down: 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[07:26:06] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[07:26:19] Oh crap - cp4 died
[07:26:28] RECOVERY - private.revi.wiki - Comodo on sslhost is OK: OK - Certificate 'private.revi.wiki' will expire on Thu 07 Nov 2019 11:59:59 PM GMT +0000.
[07:26:30] RECOVERY - wiki.autocountsoft.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.autocountsoft.com' will expire on Sun 10 Nov 2019 11:13:25 AM GMT +0000.
[07:26:32] RECOVERY - wiki.cloudytheology.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.cloudytheology.com' will expire on Sun 15 Sep 2019 11:17:19 PM GMT +0000.
[07:26:38] RECOVERY - disabled.life - LetsEncrypt on sslhost is OK: OK - Certificate 'disabled.life' will expire on Sun 10 Nov 2019 10:58:16 AM GMT +0000.
[07:26:38] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[07:26:40] RECOVERY - reviwiki.info - PositiveSSLDV on sslhost is OK: OK - Certificate 'reviwiki.info' will expire on Wed 03 Feb 2021 11:59:59 PM GMT +0000.
[07:27:45] paladox, Reception123: ^ ideas?
[07:27:54] Cp4 is showing as down
[07:28:04] looking
[07:29:24] RhinosF1: RamNode again
[07:29:33] Ok
[07:29:50] opening a ticket so it should be back soon
[07:31:05] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:35:13] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[07:37:32] RECOVERY - netazar.org - LetsEncrypt on sslhost is OK: OK - Certificate 'www.netazar.org' will expire on Mon 07 Oct 2019 06:46:45 PM GMT +0000.
[07:44:47] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:01:49] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[08:06:16] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:08:15] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[08:14:41] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:16:41] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[08:21:08] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:27:25] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[08:33:51] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:35:30] Reception123: ramnode on a bank holiday slow go?
[08:35:47] I will let you know when I see JohnLewis and I will deliver that message to them
[08:35:47] @notify JohnLewis https://anopequotes.org/?442
[08:35:48] [ Anope Quotes - The unofficial quotes site of Anope IRC Services ] - anopequotes.org
[08:40:25] Perhaps
[08:40:33] I did high priority so not sure that's taking that long
[08:40:38] :)
[08:40:41] *What's
[08:48:44] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[08:53:10] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:25:15] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[09:36:48] PROBLEM - enc.for.uz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:38:47] RECOVERY - enc.for.uz - LetsEncrypt on sslhost is OK: OK - Certificate 'enc.for.uz' will expire on Wed 13 Nov 2019 01:50:42 PM GMT +0000.
[09:39:24] RECOVERY - guiasdobrasil.com.br - LetsEncrypt on sslhost is OK: OK - Certificate 'guiasdobrasil.com.br' will expire on Sat 16 Nov 2019 01:37:57 PM GMT +0000.
[09:39:32] RECOVERY - Host cp4 is UP: PING OK - Packet loss = 0%, RTA = 0.51 ms
[09:39:36] PROBLEM - cp4 Stunnel Http for test1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:39:36] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:39:36] PROBLEM - cp4 Stunnel Http for misc2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:39:36] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:39:36] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.004 second response time
[09:39:48] RECOVERY - cp4 Stunnel Http for misc2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 41802 bytes in 0.077 second response time
[09:39:55] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.003 second response time
[09:39:57] RECOVERY - cp4 Disk Space on cp4 is OK: DISK OK - free space: / 20822 MB (50% inode=99%);
[09:40:06] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 1%
[09:40:07] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[09:40:19] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[09:40:51] RECOVERY - cp4 Current Load on cp4 is OK: OK - load average: 0.29, 0.18, 0.07
[09:40:54] RECOVERY - cp4 SSH on cp4 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u6 (protocol 2.0)
[09:41:01] RECOVERY - cp4 Stunnel Http for test1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24499 bytes in 0.015 second response time
[09:41:04] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[09:43:00] RECOVERY - cp4 Puppet on cp4 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[10:30:17] Reception123: it came back then ^
[10:31:03] seems like it yeah
[10:31:24] still discussing with RamNode about bandwidth exchange
[10:32:20] Reception123: from ? To ?
[10:32:37] RhinosF1: well cp4 connects to mw*
[10:32:54] Reception123: ok
[11:27:35] Hey JohnLewis
[12:05:09] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fjARX
[12:05:11] [02miraheze/services] 07MirahezeSSLBot 03d26a218 - BOT: Updating services config for wikis
[12:10:13] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fjARH
[12:10:14] [02miraheze/services] 07MirahezeSSLBot 03fd4f906 - BOT: Updating services config for wikis
[12:18:00] JohnLewis: promotional user page - meta - delete?
[12:19:28] Reception123: ^
[12:19:39] on it
[12:20:13] Reception123: CA shows no other edits so not a spam bot I'd say just promotional
[12:20:45] https://www.irccloud.com/pastebin/O0tnqXCR
[12:20:45] [ Snippet | IRCCloud ] - www.irccloud.com
[12:20:59] Reception123: cp4 playing up ^
[12:21:10] We're down
[12:21:12] just saw when I was trying to go on Meta
[12:21:23] And back
[12:21:36] But slow
[12:22:06] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[12:22:21] I can't access anything for now
[12:22:21] Reception123: see Icinga web
[12:22:31] ok looks fine now
[12:22:51] PROBLEM - kkutu.wiki - LetsEncrypt on sslhost is CRITICAL: connect to address kkutu.wiki and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket
[12:23:04] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[12:23:16] Reception123: ^ here it comes
[12:23:26] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw1 mw2
[12:23:46] well I dealt with the Meta spam
[12:23:48] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw2
[12:23:48] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw2
[12:23:49] * RhinosF1 prods JohnLewis and paladox
[12:24:03] now it's mw* what is going on
[12:24:06] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[12:24:47] RECOVERY - kkutu.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'kkutu.wiki' will expire on Sat 16 Nov 2019 02:04:40 PM GMT +0000.
[12:25:04] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[12:25:12] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: HTTP CRITICAL - No data received from host
[12:25:16] MW2 is down
[12:25:18] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:25:27] PROBLEM - mw2 SSH on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:25:43] Reception123: RamNode? Mw2
[12:25:46] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[12:25:48] I can see, checking
[12:26:00] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: HTTP CRITICAL - No data received from host
[12:26:07] RhinosF1: no, mw2 does not appear suspended by RamNode
[12:26:19] PROBLEM - mw2 Current Load on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[12:26:23] can't connect to it though
[12:26:26] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[12:26:39] Reception123: well something has taken it offline
[12:26:39] PROBLEM - mw2 php-fpm on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[12:26:49] RhinosF1: yes, RamNode status indicates "Offline" but not a suspension
[12:26:55] PROBLEM - Host mw2 is DOWN: PING CRITICAL - Packet loss = 100%
[12:27:15] Ah
[12:27:21] RhinosF1: hm?
[12:29:28] * RhinosF1 doesn't understand what broke it
[12:29:55] me neither
[12:30:02] paladox: JohnLewis when you're around pls ping me
[12:30:21] Pls
[12:33:32] !log depool mw2
[12:33:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[12:33:46] !log reboot mw2 (offline)
[12:33:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[12:34:13] RECOVERY - Host mw2 is UP: PING WARNING - Packet loss = 80%, RTA = 0.28 ms
[12:34:15] PROBLEM - mw2 Disk Space on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[12:34:29] PROBLEM - kkutu.wiki - LetsEncrypt on sslhost is CRITICAL: connect to address kkutu.wiki and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket
[12:34:40] Yey
[12:34:50] It's up!
[12:34:59] Reception123: recovering
[12:35:11] yeah, when I can connect to it I'll repool
[12:35:15] still strange that it just shut down like that
[12:35:42] Reception123: doesn't make sense. Any log for it?
[12:35:57] don't think there's a server-shut-down.log :P
[12:36:14] Reception123: anyone on it at the time?
[12:36:23] not afaik
[12:36:42] Doesn't add up - it shouldn't randomly turn off
[12:36:56] If there's no error log then we can't use that
[12:37:07] still can't ssh to it though
[12:37:41] Reception123: Icinga says connection refused
[12:37:43] RhinosF1: might be https://phabricator.miraheze.org/T4128
[12:37:44] [ ⚓ T4128 Fix SSH not working after reboot for various servers ] - phabricator.miraheze.org
[12:38:11] Reception123: could be
[12:38:21] RhinosF1: ping works so I can only imagine it being that that's why I can't ssh
[12:38:21] Try it then
[12:38:36] Reception123: probably - try what it says then
[12:39:37] will do
[12:40:21] RECOVERY - kkutu.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'kkutu.wiki' will expire on Sat 16 Nov 2019 02:04:40 PM GMT +0000.
[12:43:56] RECOVERY - mw2 SSH on mw2 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u6 (protocol 2.0)
[12:44:34] !log started sshd and removed /var/run/nologin on mw2
[12:44:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[12:45:02] RhinosF1: and it's back
[12:45:06] !log repool mw2
[12:45:13] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[12:45:54] RECOVERY - mw2 Current Load on mw2 is OK: OK - load average: 0.29, 0.08, 0.02
[12:46:05] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.025 second response time
[12:46:14] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[12:46:48] RECOVERY - mw2 Disk Space on mw2 is OK: DISK OK - free space: / 32919 MB (42% inode=98%);
[12:46:59] RECOVERY - mw2 php-fpm on mw2 is OK: PROCS OK: 7 processes with command name 'php-fpm7.2'
[12:47:12] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.007 second response time
[12:47:13] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.392 second response time
[12:47:26] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[12:47:32] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.687 second response time
[12:47:48] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[12:47:48] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[12:52:03] PROBLEM - kkutu.wiki - LetsEncrypt on sslhost is CRITICAL: connect to address kkutu.wiki and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket
[12:57:35] Reception123: good job!
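[Editor's note] The fix !logged above - starting sshd and removing /var/run/nologin - works around a known boot-order problem: systemd creates /run/nologin early in boot and normally deletes it once startup finishes, and while it lingers, pam_nologin rejects non-root SSH logins even though the host answers ping (matching the "connection refused"/timeout symptoms in T4128). A minimal sketch of the mechanism, using a scratch file in place of the real /var/run/nologin so it is safe to run anywhere:

```shell
#!/bin/sh
# Sketch (assumption): emulate the cleanup applied to mw2.
# NOLOGIN stands in for /var/run/nologin, which pam_nologin
# checks before allowing a non-root login.
NOLOGIN=./run-nologin-demo

touch "$NOLOGIN"                  # what systemd leaves behind mid-boot
if [ -f "$NOLOGIN" ]; then
    echo "logins blocked"         # pam_nologin would refuse SSH here
    rm -f "$NOLOGIN"              # equivalent of: sudo rm /var/run/nologin
fi
[ -f "$NOLOGIN" ] || echo "logins allowed"
```

On the real host the removal has to be followed by starting sshd, as the !log entry records, since sshd itself had not come up after the reboot.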
[13:02:07] RhinosF1: thanks :)
[13:04:47] Reception123: np
[13:47:56] PROBLEM - mw2 Puppet on mw2 is WARNING: WARNING: Puppet last ran 1 hour ago
[13:53:56] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:14:54] Voidwalker, JohnLewis: I can smell Trouble -> https://meta.miraheze.org/wiki/Special:CentralAuth?target=Sticky+Is+A+Clown+And+Dies+In+Hell
[14:14:55] [ Global account information for Sticky Is A Clown And Dies In Hell - Miraheze Meta ] - meta.miraheze.org
[14:28:39] Voidwalker: thx
[14:55:56] PROBLEM - mw2 Puppet on mw2 is WARNING: WARNING: Puppet last ran 1 hour ago
[14:56:53] paladox: ^ what's going on? It seems like puppet only runs when I run it manually
[14:57:00] hmm
[14:57:02] at least according to icinga
[14:57:26] you never started the cron service Reception123 :)
[14:57:37] !log root@mw2:/home/paladox# sudo service cron start
[14:57:42] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[14:57:46] paladox: why would I need to do that manually?
[14:57:56] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[14:58:15] because it's the same issue affecting ssh comming up after a reboot
[14:58:23] ah ok
[15:00:27] Someone should document that
[15:00:37] Every step to be ran on a server restart
[15:00:53] well they're not steps to be ran
[15:00:54] it's a bug
[15:01:09] that affects like 3 out of 17 servers
[15:02:41] Ah
[16:23:53] JohnLewis, Reception123, paladox: Can I get a check on a CW exception within the last few minutes?
[16:25:02] JohnLewis: any chance of a rights assignment for https://bmtune.miraheze.org/wiki/Special:ListUsers?
[16:25:04] [ User list - BMTune Honda wiki ] - bmtune.miraheze.org
[16:26:30] can't find anything in exception.log
[16:27:23] RhinosF1: what time was it?
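[Editor's note] The cron surprise above is the same T4128 boot bug that bit sshd earlier in the day, and "someone should document that" is answered nowhere in the log. A speculative consolidation of the recovery steps scattered through this channel, as a checklist script; the command names come from the !log entries above, while the DRY_RUN guard is an addition so the sketch can be executed harmlessly:

```shell
#!/bin/sh
# Sketch: post-reboot recovery for a T4128-affected server (mw2 here).
# DRY_RUN=1 only prints what would run; set DRY_RUN=0 on a real host.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run rm -f /var/run/nologin    # clear the boot-time login block
run service ssh start         # sshd did not come up on its own
run service cron start        # nor did cron, so scheduled puppet runs stopped
run puppet agent --test       # confirm puppet completes with 0 failures
```

The final puppet run is only a verification step; Icinga's "Puppet last ran 1 hour ago" warning clears on its own once cron is back.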
[16:27:26] Only thing I've got is
[16:27:27] Mon Aug 26 15:10:08 UTC 2019 mw1 metawiki Error connecting to mediawiki-internal-db-master.miraheze.org: :real_connect(): (HY000/2002):
[16:27:50] Reception123: that looks to early
[16:28:00] well I can't really find anything else
[16:28:00] And per yesterday no code to give you
[16:29:00] RhinosF1: https://meta.miraheze.org/wiki/Special:RequestWikiQueue/9058#mw-section-request was it right? so I can mark as approved
[16:29:01] [ Wiki requests queue - Miraheze Meta ] - meta.miraheze.org
[16:29:39] Reception123: needs rights doing
[16:30:02] well as John said I should probably not do that unless there's really no one around, so I'll wait for him or Void to get back
[16:30:44] Reception123: yeah, time to hang around for a bit
[16:44:09] Reception123, you called? :)
[16:44:24] Voidwalker: yeah rights needed on https://bmtune.miraheze.org/wiki/Special:ListUsers
[16:44:24] [ User list - BMTune Honda wiki ] - bmtune.miraheze.org
[16:44:31] (the user already logged in so no need for me to do anything this time)
[16:45:22] done
[16:45:51] thanks :)
[16:46:37] !log exporting matomo db, and reimporting it on db5
[16:46:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:47:55] PROBLEM - misc2 HTTPS on misc2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 372 bytes in 0.007 second response time
[16:48:58] PROBLEM - cp3 Stunnel Http for misc2 on cp3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 372 bytes in 0.508 second response time
[16:49:01] PROBLEM - cp2 Stunnel Http for misc2 on cp2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 372 bytes in 0.295 second response time
[16:49:16] what's going on this time...
[16:49:36] PROBLEM - cp4 Stunnel Http for misc2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[16:50:08] Reception123: can u mark the WR as approved?
[16:50:33] !log MariaDB [metawiki]> update cw_requests set cw_status = 'approved' where cw_id = 9058;
[16:50:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:52:10] Reception123: that'll be paladox seen as Matamo is misc 2 isn't it
[16:52:25] ah true
[16:52:47] yeh
[16:52:48] that's mean
[16:52:56] but i dunno why misc2 decided to do that...
[16:54:06] paladox: what's mean?
[16:54:16] not sure what you mean?
[16:55:03] paladox: read scrollback - or did your spelling fail?
[16:55:43] oh, spelling mistake, that's ment to be me
[16:58:58] RECOVERY - cp3 Stunnel Http for misc2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 41802 bytes in 1.034 second response time
[16:59:01] RECOVERY - cp2 Stunnel Http for misc2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 41802 bytes in 0.563 second response time
[16:59:26] RECOVERY - cp4 Stunnel Http for misc2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 41802 bytes in 0.112 second response time
[16:59:51] RECOVERY - misc2 HTTPS on misc2 is OK: HTTP OK: HTTP/1.1 200 OK - 41810 bytes in 0.115 second response time
[17:10:33] PROBLEM - cp3 Disk Space on cp3 is WARNING: DISK WARNING - free space: / 2649 MB (10% inode=94%);
[17:13:02] Reception123: ^ recovering
[17:17:34] [02miraheze/mw-config] 07Reception123 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fjA2h
[17:17:35] [02miraheze/mw-config] 07Reception123 03dd96e66 - exempt electronpdf from cookiewarning
[17:17:38] ^ RhinosF1 this is what we're trying out
[17:19:45] RhinosF1: doesn't work due to varnsih
[17:19:47] *varnish
[17:20:04] Okay
[17:22:50] CVT stuff has been updated
[17:24:03] Voidwalker: great, thanks :)
[18:17:38] !log reception@mw1:/srv/mediawiki/w/extensions/CreateWiki/maintenance$ sudo -u www-data php populateMainPage.php --wiki=aescraftwiki
[18:17:42] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[18:19:06] !log MariaDB [metawiki]> update cw_requests set cw_status = 'approved' where cw_id = 9062;
[18:19:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[18:22:44] !log reception@mw1:/srv/mediawiki/w/extensions/CreateWiki/maintenance$ sudo -u www-data php populateMainPage.php --wiki=2b2tmcpewiki
[18:22:48] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[18:23:18] !log reception@mw1:/srv/mediawiki/w/extensions/CentralAuth/maintenance$ sudo -u www-data php createLocalAccount.php --wiki=2b2tmcpewiki 'Memelord27 '
[18:23:19] Voidwalker: ^
[18:23:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[18:25:50] !log MariaDB [metawiki]> update cw_requests set cw_status = 'approved' where cw_id = 9063;
[18:25:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[18:27:20] * RhinosF1 slaps Reception123 with a trout
[18:27:34] lol
[18:27:43] RhinosF1: fixed but corrected on-wiki instead of saying "set other to inreview" etc.
[18:27:51] since it's the same effect
[18:28:40] Reception123: k thx
[18:34:52] !log drop piwik database and reimport on db5
[18:35:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:11:15] * RhinosF1 prods JohnLewis
[19:11:28] JohnLewis: u around?
[19:11:38] RhinosF1: kinda but not really rn
[19:12:15] JohnLewis: u able to double check why we had 2 CW fails in a row earlier and confirm it should work again
[19:12:29] jobqueue I bet
[19:12:55] JohnLewis: I could guess that
[19:13:05] But Reception123 couldn't find it in the logs
[19:13:15] then the wiki exists :P
[19:13:20] it'll be jobqueue or wiki exists
[19:13:36] JohnLewis: JQ then
[19:13:41] Should it work now?
[19:13:46] 2 in a row unusual
[19:13:51] Probably that annoying jobqueue again
[19:14:23] Redis ooms, so jq then fails until redis starts up
[19:14:34] paladox: ah
[19:14:52] * RhinosF1 tries another request
[19:15:37] paladox: still down
[19:16:12] It’ll be the wiki exists
[19:16:35] paladox: https://goanimatev6.miraheze.org/wiki/Special:ListUsers
[19:16:36] [ User list - GoAnimate V6 ] - goanimatev6.miraheze.org
[19:16:48] I need a main page and local account from u guys first
[19:17:29] JohnLewis, Reception123: ^
[19:18:57] urgh more...
[19:19:27] Yep
[19:19:34] !log reception@mw1:/srv/mediawiki/w/extensions/CentralAuth/maintenance$ sudo -u www-data php createLocalAccount.php --wiki=goanimatev6wiki 'Eijitheawesome'
[19:19:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:19:42] Will someone please get it working again for everyone's sake
[19:20:02] JohnLewis, Voidwalker: ^ add rights pls
[19:20:04] !log reception@mw1:/srv/mediawiki/w/extensions/CreateWiki/maintenance$ sudo -u www-data php populateMainPage.php --wiki=goanimatev6wiki
[19:20:05] That’ll require a resource bump for misc2
[19:20:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:20:20] paladox: should I try a jobrunner restart?
[19:20:27] paladox: well without it we can't create wikis
[19:20:43] Yup
[19:21:06] paladox: and that's a bloody big issue
[19:21:23] Yup
[19:21:47] paladox: so can we try something pls
[19:21:49] done
[19:22:02] Reception123: ^ approve in the dB pls
[19:22:12] !log restarted jobrunner
[19:22:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:22:23] Thank you Reception123
[19:22:30] RhinosF1: not much I can do, requires resource bump
[19:22:32] Mark that approved and I'll try the other
[19:22:38] Reception123: ^
[19:22:48] !log MariaDB [metawiki]> update cw_requests set cw_status = 'approved' where cw_id = 9063;
[19:22:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:22:55] Thx
[19:22:57] RhinosF1: love how wiki creation is now a three man process
[19:23:02] Reception123: so do I
[19:23:03] 1) the creator 2) the sysadmin 3) the steward
[19:23:13] Shall I do the next one?
[19:23:16] more tedious than even deleting a wiki tbh
[19:23:31] RhinosF1: I mean might as well do it now while all the people involved in creation are around ;)
[19:23:51] Reception123: worked!!
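[Editor's note] The UPDATE Reception123 keeps !logging is the manual side of wiki approval: when the job queue drops the ball, the CreateWiki request row is flipped to 'approved' by hand. A stand-alone replay of that statement against a throwaway SQLite database; the cw_requests table shape here is inferred from the log, and in production the command runs in MariaDB on metawiki:

```shell
#!/bin/sh
# Sketch: replay the manual approval statement on a scratch database.
DB=./cw_requests_demo.sqlite
rm -f "$DB"

sqlite3 "$DB" "CREATE TABLE cw_requests (cw_id INTEGER PRIMARY KEY, cw_status TEXT);"
sqlite3 "$DB" "INSERT INTO cw_requests (cw_id, cw_status) VALUES (9063, 'inreview');"

# Same shape as the logged MariaDB command:
#   update cw_requests set cw_status = 'approved' where cw_id = 9063;
sqlite3 "$DB" "UPDATE cw_requests SET cw_status = 'approved' WHERE cw_id = 9063;"
sqlite3 "$DB" "SELECT cw_status FROM cw_requests WHERE cw_id = 9063;"
```

The WHERE clause on the primary key is what keeps this safe to run by hand; an unqualified UPDATE would approve every pending request at once.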
[19:24:05] RhinosF1: great :0
[19:24:08] *:D
[19:24:14] looks like it was that jobqueue after all again
[19:24:22] I never liked it, always some issue with it
[19:24:27] It was even on test1 at some point
[19:24:34] * Voidwalker is relieved :)
[19:24:36] Huh
[19:24:45] * paladox facepalms
[19:25:16] * RhinosF1 cheers
[19:25:29] * RhinosF1 then punches the jobQueue
[19:25:45] good, if not I had to create a new userbox saying "This user assists wiki creators with creation"
[19:26:07] Reception123: :)
[19:26:10] * RhinosF1 laughs
[20:00:44] Miraheze Logo 503 Backend fetch failed
[20:00:46] urgh again
[20:01:10] paladox: ^
[20:01:34] I’m mobile
[20:02:07] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:02:32] it's back
[20:02:42] though I don't get why it keeps doing this again
[20:03:33] paladox: matomo is back by the way :)
[20:04:06] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[20:06:52] Ok :)
[20:06:55] Check logs
[20:52:16] hello
[20:53:50] Hi Examknow
[20:54:21] thanks for fixing the grammar issues
[20:54:40] probably should not have done the i18n files late at night
[20:54:54] Examknow: np, did u see the other PR?
[20:55:07] yup
[20:55:14] And Voidwalker checked it and had a few thoughts
[20:55:31] ok
[20:55:36] I did not see those
[20:55:44] Voidwalker: ^ do you want to do the honours of explaining
[20:56:05] please do
[20:56:09] Examknow: see phab but Voidwalker thinks he has an idea on how to do it
[20:56:15] ok
[20:57:17] Voidwalker: What is your github?
[20:58:26] https://github.com/The-Voidwalker
[20:58:26] [ The-Voidwalker · GitHub ] - github.com
[20:58:43] PROBLEM - misc3 Current Load on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:58:52] PROBLEM - misc3 SSH on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:58:55] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[20:58:58] PROBLEM - misc3 Disk Space on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:59:04] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:59:05] Voidwalker: Added you to repo
[20:59:07] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:59:09] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:59:12] PROBLEM - misc3 proton on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:13] PROBLEM - cp3 Stunnel Http for misc3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:59:13] PROBLEM - misc3 zotero on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:18] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:22] PROBLEM - misc3 restbase on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:22] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:59:25] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:26] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[20:59:36] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:59:44] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 55%
[20:59:46] PROBLEM - misc3 lizard.miraheze.org HTTPS on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:48] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[20:59:59] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:00:00] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:05] Somebody broke something
[21:00:07] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[21:00:09] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:10] PROBLEM - cp2 Stunnel Http for misc3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:12] PROBLEM - misc3 electron on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:00:13] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:16] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:17] PROBLEM - cp4 Stunnel Http for misc3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:17] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:30] PROBLEM - misc3 Puppet on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:00:59] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[21:01:19] JohnLewis, paladox, Reception123 ^
[21:01:40] meta just went down
[21:01:42] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 37%
[21:01:45] RECOVERY - misc3 lizard.miraheze.org HTTPS on misc3 is OK: HTTP OK: Status line output matched "HTTP/1.1 401 Unauthorized" - 381 bytes in 0.027 second response time
[21:01:57] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 1.336 second response time
[21:01:58] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[21:02:07] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.675 second response time
[21:02:07] RECOVERY - cp2 Stunnel Http for misc3 on cp2 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 0.399 second response time
[21:02:09] RECOVERY - misc3 electron on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 3000
[21:02:11] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.392 second response time
[21:02:15] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.392 second response time
[21:02:15] the idea I'm currently thinking of involves maintaining a list in a central DB of wikis where CU data may be present for the check user
[21:02:15] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.009 second response time
[21:02:16] RECOVERY - cp4 Stunnel Http for misc3 on cp4 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 0.012 second response time
[21:02:16] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 61%
[21:02:24] RECOVERY - misc3 Puppet on misc3 is OK: OK: Puppet is currently enabled, last run 10 minutes ago with 0 failures
[21:02:45] RECOVERY - misc3 Current Load on misc3 is OK: OK - load average: 1.01, 1.20, 0.72
[21:02:54] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[21:02:58] RECOVERY - misc3 Disk Space on misc3 is OK: DISK OK - free space: / 42658 MB (88% inode=94%);
[21:02:58] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 0%
[21:03:00] RECOVERY - misc3 SSH on misc3 is OK: SSH OK - OpenSSH_7.9p1 Debian-10 (protocol 2.0)
[21:03:04] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[21:03:10] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.681 second response time
[21:03:13] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.004 second response time
[21:03:14] hmm
[21:03:14] RECOVERY - misc3 proton on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 3000
[21:03:16] RECOVERY - misc3 zotero on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 1969
[21:03:19] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[21:03:20] trouble is, the code is difficult to navigate and all I've been able to do for the past hour is around 50 lines of changes
[21:03:22] RECOVERY - misc3 restbase on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 7231
[21:03:22] RECOVERY - cp3 Stunnel Http for misc3 on cp3 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 0.493 second response time
[21:03:23] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.789 second response time
[21:03:26] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[21:03:27] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[21:03:32] also icinga-miraheze is making it really hard to
chat here :P [21:03:34] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.392 second response time [21:03:44] misc3 may have been the cause [21:03:48] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy [21:04:06] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [21:04:16] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 1% [21:04:23] Voidwalker: Given that meta just went back up it should the bot should be doone [21:04:27] done [21:05:11] paladox, most likely considering it's the first service to report problems [21:05:22] Voidwalker: i have suggested moving the bot to another channel before [21:06:05] every now and again, it seems like a decent idea :) [21:06:17] i think there should be a #miraheze-maintenance like RhinosF1 said [21:06:25] we should make an RFC [21:06:31] maybe others agree [21:06:58] Examknow: not sure whether it needs an 'RfC' as such - maybe just CN [21:07:12] maybe [21:09:54] Voidwalker: So what about globalCU [21:10:29] Voidwalker yup [21:10:51] currently, I'm in the process of tearing the whole thing apart to try and do what I want, but it's very slow going [21:11:02] ok [21:11:19] I did not really put much docs in there sorry [21:14:36] meta is down again [21:15:04] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [21:15:14] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:15:25] PROBLEM - misc3 SSH on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:15:26] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. 
mw1 mw2 mw3 [21:15:27] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:15:29] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:15:30] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:15:36] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:15:37] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:15:48] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3 [21:15:51] PROBLEM - cp3 Stunnel Http for misc3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:15:52] huh [21:15:59] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:16:06] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [21:16:09] PROBLEM - misc3 lizard.miraheze.org HTTPS on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:16:14] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:16:16] PROBLEM - cp2 Stunnel Http for misc3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:16:21] PROBLEM - misc3 electron on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:16:23] ffs [21:16:23] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:16:27] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
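[Editor's note: Voidwalker's GlobalCU idea above (21:02:15) — keep a central DB listing the wikis where CheckUser data may exist for a user, so a global check only queries those wikis — can be sketched roughly as below. The table name, columns, and helper functions are invented for illustration and are not GlobalCU's actual schema or API.]

```python
import sqlite3

# In-memory stand-in for the central database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE cu_wikis (
    user_name TEXT NOT NULL,
    wiki_id   TEXT NOT NULL,
    PRIMARY KEY (user_name, wiki_id)
)""")

def record_activity(user, wiki):
    # Record that `user` performed an action that leaves CU data on `wiki`.
    conn.execute("INSERT OR IGNORE INTO cu_wikis VALUES (?, ?)", (user, wiki))

def wikis_with_cu_data(user):
    # A global check then only needs to query the wikis returned here,
    # instead of every wiki on the farm.
    rows = conn.execute(
        "SELECT wiki_id FROM cu_wikis WHERE user_name = ?", (user,))
    return sorted(r[0] for r in rows)

record_activity("ExampleUser", "metawiki")
record_activity("ExampleUser", "testwiki")
record_activity("ExampleUser", "metawiki")  # duplicate is ignored by the PK
```

The appeal of this design is that the expensive part of a global check scales with the number of wikis a user actually touched, not with the total number of wikis on the farm.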
[21:16:29] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:16:30] PROBLEM - misc3 Puppet on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:16:31] misc3 again?
[21:16:38] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:16:39] PROBLEM - cp4 Stunnel Http for misc3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:16:50] PROBLEM - misc3 Current Load on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:16:52] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[21:16:55] ssh is not working
[21:16:58] PROBLEM - misc3 Disk Space on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:17:24] PROBLEM - misc3 zotero on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:17:29] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 57%
[21:17:52] RECOVERY - cp3 Stunnel Http for misc3 on cp3 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 2.347 second response time
[21:18:08] RECOVERY - misc3 lizard.miraheze.org HTTPS on misc3 is OK: HTTP OK: Status line output matched "HTTP/1.1 401 Unauthorized" - 381 bytes in 0.012 second response time
[21:18:13] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.006 second response time
[21:18:13] RECOVERY - cp2 Stunnel Http for misc3 on cp2 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 0.299 second response time
[21:18:16] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 54%
[21:18:19] RECOVERY - misc3 electron on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 3000
[21:18:21] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.676 second response time
[21:18:24] RECOVERY - misc3 Puppet on misc3 is OK: OK: Puppet is currently enabled, last run 6 minutes ago with 0 failures
[21:18:25] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.396 second response time
[21:18:27] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.400 second response time
[21:18:38] RECOVERY - cp4 Stunnel Http for misc3 on cp4 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 0.005 second response time
[21:18:38] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.307 second response time
[21:18:47] RECOVERY - misc3 Current Load on misc3 is OK: OK - load average: 0.33, 0.51, 0.53
[21:18:52] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[21:18:53] RECOVERY - misc3 Disk Space on misc3 is OK: DISK OK - free space: / 42657 MB (88% inode=94%);
[21:19:11] and we are back online
[21:19:15] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.637 second response time
[21:19:21] RECOVERY - misc3 zotero on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 1969
[21:19:26] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[21:19:28] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.018 second response time
[21:19:29] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 2%
[21:19:31] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.659 second response time
[21:19:34] RECOVERY - misc3 SSH on misc3 is OK: SSH OK - OpenSSH_7.9p1 Debian-10 (protocol 2.0)
[21:19:34] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.410 second response time
[21:19:36] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.006 second response time
[21:19:40] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[21:19:48] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[21:19:54] Examknow: yep
[21:20:04] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 1.400 second response time
[21:20:07] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[21:20:16] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 0%
[21:21:04] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[21:33:53] Voidwalker: In the meantime is there something I can do on meta?
[21:34:10] Examknow: let me think
[21:34:15] been trying to get wikicreator for a while but there isnt much to do
[21:34:34] Examknow: i'd give it until you've been active for a bit
[21:34:47] i know
[21:34:57] that is why I need something to do on wiki
[21:35:46] Examknow: find an answer to https://meta.miraheze.org/wiki/Stewards%27_noticeboard#Hiding_the_.22from_.2Awikiname.2A.22_text
[21:35:46] [ Stewards' noticeboard - Miraheze Meta ] - meta.miraheze.org
[21:37:14] I know for that one, you have to hide something with css or edit some system message
[21:37:44] Voidwalker: but exactly which one and how
[21:38:07] could you tell me so I can answer it
[21:38:32] * RhinosF1 has no clue or'd have done it by now
[21:39:19] Examknow: we know what they mean or at least I do - it's on some pages
[21:40:09] RhinosF1: Is he talking about his local wiki?
[21:40:41] Examknow: it's on all wikis so just giving a solution would be best and they can use global.css if it's a css hack to fix
[21:40:52] Voidwalker: iirc it's the tagline
[21:41:02] and shouldn't GCU be a candidate for translatewiki
[21:41:05] hey SPF|Cloud
[21:41:53] RhinosF1: I see
[21:42:05] Hi
[21:42:11] SPF|Cloud: how u doing?
[21:42:26] I'm fine
[21:42:30] Examknow: Do you want me to see about translatewiki
[21:42:33] SPF|Cloud: good
[21:42:45] Yes
[21:42:53] Examknow: will do
[21:42:58] Actually going to bed but couldn't leave this question unanswered
[21:43:33] Examknow: https://phpcodechecker.com/
[21:43:33] [ PHP Code Checker - Syntax Check for Common PHP Mistakes ] - phpcodechecker.com
[21:43:57] SPF|Cloud: you sound like me not liking leaving stuff
[21:47:43] Examknow: could you give me owner access so I can set up translatewiki
[21:48:02] ??
[21:48:20] Examknow: I need access to add the translatewiki bot
[21:49:03] you mean for the repo?
[21:49:08] Examknow: ye
[21:49:53] just tell me how to do it and I will do it
[21:51:45] RhinosF1: Hello?
[21:52:10] Examknow: add @translatewiki with push access
[21:53:04] done
[22:04:35] Examknow: see PR
[22:07:08] did you change my flags on your channel?
[22:07:36] Examknow: swapping it to your new cloak
[22:07:42] ok
[22:12:18] Examknow: Am I okay to merge the syntax fixes?
[22:12:25] ok
[22:12:35] * RhinosF1 takes that as a yes
[22:12:55] yup
[22:14:33] does anyone mind me sticking auto patrolled on examknow?
[22:14:41] Voidwalker, JohnLewis: thoughts?
[22:14:55] it's fine
[22:15:14] * RhinosF1 waits for meta
[22:15:26] I am currently copying over some of my user scripts
[22:16:22] Examknow: cool, saves me having to patrol them. That's something you can help with. Marking revisions as patrolled
[22:16:41] RhinosF1: Thanks
[22:16:41] sorry for disappearing, but it's either blanking MediaWiki:Tagline or adding #siteSub { display:none; } to Common.css
[22:17:10] oh I already figured that out myself
[22:17:25] just wasn't sure what he was talking about at first
[22:17:27] Examknow: also pings don't work if you don't resign
[22:20:10] so the request I was working on in the stewards' noticeboard I don't think should have been there, so should I move it or just tell the user for next time
[22:21:12] Examknow: tell them for next time
[22:21:17] ok
[22:21:27] Voidwalker: do you want to look at closing [[Discord/RfC]]
[22:24:02] might want to leave that to whoever is in operations on discord, as they have the ability to action it
[22:24:57] Voidwalker: thought it was both ops and stewards but i'll poke Reception123 tomorrow
[22:25:25] nah, I can do anything else though :P
[22:25:33] Voidwalker: ah
[22:25:52] just can't manage channels
[22:26:00] + some other things
[22:26:05] Voidwalker: gotcha
[22:26:15] Voidwalker is very powerful
[22:26:26] random
[22:26:34] :P
[22:27:02] actually, I can manage existing channels, but whatever at this point
[22:27:12] and JohnLewis would probably be the most - technically any of the steward + operations members that share both
[22:27:39] well JohnLewis is the founder
[22:27:57] Examknow: that's very true
[22:28:29] but in all fairness it's pretty community orientated anyway and allows the community to make a lot of calls
[22:28:43] yeah
[22:29:16] got to be pretty honest, when I first came about miraheze I thought it was just a rip off of the wmf
[22:29:27] but you guys are pretty cool
[22:29:41] having technical access is not an invitation to use that access
[22:29:55] ??
[22:30:17] Examknow: not everything sysadmins can do we are supposed to do
[22:30:25] we as in them
[22:30:33] ah
[22:30:38] huh?
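[Editor's note: Voidwalker's answer at 22:16:41 gives two equivalent fixes for hiding the "from *wikiname*" tagline. The CSS variant, quoted from the log, would go in a wiki's MediaWiki:Common.css (or a user's global.css, as RhinosF1 suggested); the selector is the one named in the log:]

```css
/* Hide the "from *wikiname*" site subtitle shown under page titles.
   Alternative per the log: blank the MediaWiki:Tagline system message. */
#siteSub {
    display: none;
}
```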
[22:30:41] couldn't agree more
[22:30:42] kinda in response to "but in all fairness it's pretty community orientated anyway and allows the community to make a lot of calls"
[22:32:49] I'm confused about what I pinged over ;)
[22:32:56] if it involves Discord, I don't use it
[22:33:03] JohnLewis: how powerful you are
[22:33:11] lol yeah
[22:33:15] and yeah not on discord cause John isn't there
[22:33:29] PROBLEM - test1 Current Load on test1 is CRITICAL: CRITICAL - load average: 2.98, 1.96, 1.30
[22:35:29] PROBLEM - test1 Current Load on test1 is WARNING: WARNING - load average: 1.62, 1.94, 1.38
[22:35:32] Examknow: made https://meta.miraheze.org/w/index.php?title=Template:Freenode&diff=81348&oldid=81068 actually work and more simple
[22:35:33] [ Difference between revisions of "Template:Freenode" - Miraheze Meta ] - meta.miraheze.org
[22:36:36] RhinosF1: Thanks!
[22:37:07] Examknow: np, straight importing uses too many enwp only stuff
[22:37:29] RECOVERY - test1 Current Load on test1 is OK: OK - load average: 1.12, 1.57, 1.30
[22:46:31] Voidwalker: What kind of things should I do before I request wiki creator other than be active
[22:46:44] Examknow: know policy
[22:47:15] although don't worry, we aren't wikipedia and going to give you 600 wiki requests to answer correctly first
[22:47:36] also anyone who wants to give me a nom should feel glad to ;)
[22:47:51] Examknow: stay active for a few weeks and ping me
[22:47:56] ok
[22:48:08] * RhinosF1 will happily consider it
[22:49:53] Thanks!
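[Editor's note: the recurring "Current Load" alerts above come from a check that compares the 1-, 5- and 15-minute load averages against warning and critical thresholds. A minimal sketch of that classification logic follows; the threshold values are made up for illustration, since the actual NRPE configuration is not shown in the log.]

```python
def classify_load(load1, load5, load15,
                  warn=(1.5, 1.4, 1.3), crit=(3.0, 2.5, 2.0)):
    """Return an Icinga-style state for a load-average triple.

    Thresholds here are illustrative defaults, not Miraheze's real ones:
    any value at or above its critical threshold wins, then warning.
    """
    triple = (load1, load5, load15)
    if any(v >= c for v, c in zip(triple, crit)):
        return "CRITICAL"
    if any(v >= w for v, w in zip(triple, warn)):
        return "WARNING"
    return "OK"
```

This is why the channel sees CRITICAL/WARNING/RECOVERY flapping as a load spike (such as paladox's dump import later in the log) ramps up and decays across the three averaging windows.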
[22:51:29] PROBLEM - test1 Current Load on test1 is CRITICAL: CRITICAL - load average: 3.09, 2.11, 1.57
[22:52:00] * RhinosF1 looks over at icinga-miraheze and sighs again - it's only test1
[22:52:37] icinga is getting pretty annoying
[22:52:54] Examknow: wait until there's a batch of icinga spam
[22:53:22] i have seen it
[22:53:29] PROBLEM - test1 Current Load on test1 is WARNING: WARNING - load average: 1.07, 1.80, 1.53
[22:53:46] here we go again
[22:54:02] Examknow: normally either SSL catching up and renewing at once or trash can puppet alerts that are pointless
[22:54:14] for real
[22:54:26] Examknow: for real
[22:57:29] PROBLEM - test1 Current Load on test1 is CRITICAL: CRITICAL - load average: 2.43, 2.35, 1.82
[22:57:49] who's running something at this time?
[22:58:39] paladox: that you? ^
[22:58:53] yeh, it's an import dump running
[22:59:01] paladox: ah cool
[22:59:11] Examknow: there you go, blame it on paladox
[22:59:34] blames paladox
[22:59:43] Examknow: /me
[22:59:44] [miraheze/mediawiki] paladox pushed 1 commit to REL1_33 [+0/-0/±1] https://git.io/fjAKo
[22:59:45] [miraheze/mediawiki] paladox 2009856 - Update Echo
[23:06:24] !log run runJob.php on test1
[23:06:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[23:09:29] PROBLEM - test1 Current Load on test1 is WARNING: WARNING - load average: 0.63, 1.57, 1.80
[23:13:29] RECOVERY - test1 Current Load on test1 is OK: OK - load average: 0.96, 1.29, 1.63
[23:20:09] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/fjAKb
[23:20:11] [miraheze/services] MirahezeSSLBot a4d29a9 - BOT: Updating services config for wikis
[23:22:15] PROBLEM - misc2 HTTPS on misc2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:22:53] paladox: keep an eye pls ^
[23:23:16] misc2 works for me
[23:23:50] paladox: looks to be fine, 2 alerts have just recovered so not sure
[23:24:12] RECOVERY - misc2 HTTPS on misc2 is OK: HTTP OK: HTTP/1.1 200 OK - 41810 bytes in 0.096 second response time
[23:24:24] clear
[23:24:57] paladox: can u look at the orange banner on misc2 and see if it's worth fixing
[23:25:10] orange banner?
[23:25:32] paladox: yeah, the massive orange warning when going to misc2.miraheze.org
[23:26:19] oh, definitely not something we should fix. Since we want piwik/matomo traffic to only go over matomo.miraheze.org. Not misc2.miraheze.org.
[23:26:34] paladox: k
[23:26:37] :)
[23:45:57] PROBLEM - mw2 Current Load on mw2 is CRITICAL: CRITICAL - load average: 8.52, 7.15, 5.16
[23:47:54] RECOVERY - mw2 Current Load on mw2 is OK: OK - load average: 4.96, 6.45, 5.15