[00:11:37] RECOVERY - contraao.com - LetsEncrypt on sslhost is OK: OK - Certificate 'contraao.com' will expire on Mon 06 Apr 2020 10:59:38 PM GMT +0000.
[00:11:37] RECOVERY - wiki.contraao.com - LetsEncrypt on sslhost is OK: OK - Certificate 'contraao.com' will expire on Mon 06 Apr 2020 10:59:38 PM GMT +0000.
[01:10:07] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvel7
[01:10:09] [02miraheze/services] 07MirahezeSSLBot 0326d8dd7 - BOT: Updating services config for wikis
[01:55:07] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve8y
[01:55:09] [02miraheze/services] 07MirahezeSSLBot 03a256d8a - BOT: Updating services config for wikis
[06:25:28] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 2742 MB (11% inode=94%);
[06:56:01] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[nginx]
[07:02:24] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[08:26:34] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[php7.3-redis]
[08:33:01] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[08:44:12] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 2400:6180:0:d0::403:f001/cpweb
[08:46:08] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[09:01:55] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[09:03:53] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[09:56:21] PROBLEM - wiki.kourouklides.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.kourouklides.com' expires in 15 day(s) (Fri 24 Jan 2020 09:52:54 AM GMT +0000).
[09:56:35] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JvegH
[09:56:36] [02miraheze/ssl] 07MirahezeSSLBot 03129f9ed - Bot: Update SSL cert for wiki.kourouklides.com
[09:57:44] PROBLEM - wiki.ameciclo.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.ameciclo.org' expires in 15 day(s) (Fri 24 Jan 2020 09:54:34 AM GMT +0000).
[09:57:58] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvegd
[09:57:59] [02miraheze/ssl] 07MirahezeSSLBot 039af681a - Bot: Update SSL cert for wiki.ameciclo.org
[09:59:48] PROBLEM - wiki.macc.nyc - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.macc.nyc' expires in 15 day(s) (Fri 24 Jan 2020 09:57:47 AM GMT +0000).
[09:59:49] PROBLEM - wiki.ldmsys.net - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.ldmsys.net' expires in 15 day(s) (Fri 24 Jan 2020 09:56:01 AM GMT +0000).
[10:00:02] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvegh
[10:00:04] [02miraheze/ssl] 07MirahezeSSLBot 030734ffb - Bot: Update SSL cert for wiki.macc.nyc
[10:00:26] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve2e
[10:00:27] [02miraheze/ssl] 07MirahezeSSLBot 0357ea278 - Bot: Update SSL cert for wiki.ldmsys.net
[10:01:48] RECOVERY - wiki.ameciclo.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.ameciclo.org' will expire on Tue 07 Apr 2020 08:57:51 AM GMT +0000.
[10:02:17] RECOVERY - wiki.kourouklides.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.kourouklides.com' will expire on Tue 07 Apr 2020 08:56:28 AM GMT +0000.
[10:08:01] PROBLEM - athenapedia.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'athenapedia.org' expires in 15 day(s) (Fri 24 Jan 2020 10:04:36 AM GMT +0000).
[10:08:18] PROBLEM - podpedia.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'podpedia.org' expires in 15 day(s) (Fri 24 Jan 2020 10:04:09 AM GMT +0000).
[10:08:22] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve2m
[10:08:22] [02miraheze/ssl] 07MirahezeSSLBot 0362fc5dd - Bot: Update SSL cert for athenapedia.org
[10:08:32] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve2Y
[10:08:33] [02miraheze/ssl] 07MirahezeSSLBot 030afe1db - Bot: Update SSL cert for podpedia.org
[10:09:04] PROBLEM - pwiki.arkcls.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'pwiki.arkcls.com' expires in 15 day(s) (Fri 24 Jan 2020 10:05:40 AM GMT +0000).
[10:09:19] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve2s
[10:09:20] [02miraheze/ssl] 07MirahezeSSLBot 032f34355 - Bot: Update SSL cert for pwiki.arkcls.com
[10:10:06] PROBLEM - kunwok.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'kunwok.org' expires in 15 day(s) (Fri 24 Jan 2020 10:06:35 AM GMT +0000).
[10:10:25] PROBLEM - pl.nonbinary.wiki - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'nonbinary.wiki' expires in 15 day(s) (Fri 24 Jan 2020 10:06:59 AM GMT +0000).
[10:10:26] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve2Z
[10:10:27] [02miraheze/ssl] 07MirahezeSSLBot 03d483206 - Bot: Update SSL cert for kunwok.org
[10:10:43] PROBLEM - nonbinary.wiki - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'nonbinary.wiki' expires in 15 day(s) (Fri 24 Jan 2020 10:06:59 AM GMT +0000).
[10:11:01] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve2C
[10:11:02] [02miraheze/ssl] 07MirahezeSSLBot 034d9e98c - Bot: Update SSL cert for nonbinary.wiki
[10:11:59] PROBLEM - bconnected.aidanmarkham.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'bconnected.aidanmarkham.com' expires in 15 day(s) (Fri 24 Jan 2020 10:08:55 AM GMT +0000).
[10:12:05] RECOVERY - wiki.macc.nyc - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.macc.nyc' will expire on Tue 07 Apr 2020 08:59:55 AM GMT +0000.
[10:12:10] RECOVERY - wiki.ldmsys.net - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.ldmsys.net' will expire on Tue 07 Apr 2020 09:00:18 AM GMT +0000.
[10:12:12] RECOVERY - athenapedia.org - LetsEncrypt on sslhost is OK: OK - Certificate 'athenapedia.org' will expire on Tue 07 Apr 2020 09:08:08 AM GMT +0000.
[10:12:17] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve24
[10:12:18] [02miraheze/ssl] 07MirahezeSSLBot 03f243387 - Bot: Update SSL cert for bconnected.aidanmarkham.com
[10:12:27] RECOVERY - podpedia.org - LetsEncrypt on sslhost is OK: OK - Certificate 'podpedia.org' will expire on Tue 07 Apr 2020 09:08:25 AM GMT +0000.
[10:14:14] PROBLEM - wiki.consentcraft.uk - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.consentcraft.uk' expires in 15 day(s) (Fri 24 Jan 2020 10:12:05 AM GMT +0000).
[10:14:27] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve2u
[10:14:28] [02miraheze/ssl] 07MirahezeSSLBot 03229c822 - Bot: Update SSL cert for wiki.consentcraft.uk
[10:16:21] PROBLEM - garrettcountyguide.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'garrettcountyguide.com' expires in 15 day(s) (Fri 24 Jan 2020 10:13:59 AM GMT +0000).
[10:16:37] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve2g
[10:16:38] [02miraheze/ssl] 07MirahezeSSLBot 03c1759e6 - Bot: Update SSL cert for garrettcountyguide.com
[10:16:41] PROBLEM - wiki.staraves-no.cz - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.staraves-no.cz' expires in 15 day(s) (Fri 24 Jan 2020 10:13:45 AM GMT +0000).
[10:16:55] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve2a
[10:16:57] [02miraheze/ssl] 07MirahezeSSLBot 03b635bb1 - Bot: Update SSL cert for wiki.staraves-no.cz
[10:19:23] PROBLEM - meregos.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'meregos.com' expires in 15 day(s) (Fri 24 Jan 2020 10:15:38 AM GMT +0000).
[10:19:36] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve2i
[10:19:38] [02miraheze/ssl] 07MirahezeSSLBot 032571b63 - Bot: Update SSL cert for meregos.com
[10:21:45] RECOVERY - bconnected.aidanmarkham.com - LetsEncrypt on sslhost is OK: OK - Certificate 'bconnected.aidanmarkham.com' will expire on Tue 07 Apr 2020 09:12:06 AM GMT +0000.
[10:22:03] RECOVERY - kunwok.org - LetsEncrypt on sslhost is OK: OK - Certificate 'kunwok.org' will expire on Tue 07 Apr 2020 09:10:18 AM GMT +0000.
[10:22:08] RECOVERY - wiki.consentcraft.uk - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.consentcraft.uk' will expire on Tue 07 Apr 2020 09:14:20 AM GMT +0000.
[10:22:45] RECOVERY - wiki.staraves-no.cz - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.staraves-no.cz' will expire on Tue 07 Apr 2020 09:16:48 AM GMT +0000.
[10:22:46] RECOVERY - pl.nonbinary.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'nonbinary.wiki' will expire on Tue 07 Apr 2020 09:10:49 AM GMT +0000.
[10:23:00] RECOVERY - nonbinary.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'nonbinary.wiki' will expire on Tue 07 Apr 2020 09:10:49 AM GMT +0000.
[10:23:16] RECOVERY - pwiki.arkcls.com - LetsEncrypt on sslhost is OK: OK - Certificate 'pwiki.arkcls.com' will expire on Tue 07 Apr 2020 09:09:11 AM GMT +0000.
[10:24:07] RECOVERY - garrettcountyguide.com - LetsEncrypt on sslhost is OK: OK - Certificate 'garrettcountyguide.com' will expire on Tue 07 Apr 2020 09:16:26 AM GMT +0000.
[10:33:36] RECOVERY - meregos.com - LetsEncrypt on sslhost is OK: OK - Certificate 'meregos.com' will expire on Tue 07 Apr 2020 09:19:30 AM GMT +0000.
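
The morning's pattern repeats for each domain: the sslhost check warns once a certificate is within 15 days of expiry, MirahezeSSLBot renews it and commits the new certificate to the miraheze/ssl repository, and the check recovers with an expiry roughly 90 days out (the standard Let's Encrypt lifetime). The expiry test amounts to roughly the following; a minimal sketch, not the actual Icinga plugin, with the hostname and threshold illustrative:

    #!/bin/sh
    # Sketch of a cert-expiry check in the spirit of the sslhost monitor.
    # HOST is illustrative; the real check covers every custom domain.
    HOST="wiki.example.org"
    THRESHOLD=15
    enddate=$(echo | openssl s_client -servername "$HOST" -connect "$HOST:443" 2>/dev/null |
        openssl x509 -noout -enddate | cut -d= -f2)
    days=$(( ($(date -d "$enddate" +%s) - $(date +%s)) / 86400 ))
    if [ "$days" -lt "$THRESHOLD" ]; then
        echo "WARNING - Certificate '$HOST' expires in $days day(s)"
    else
        echo "OK - Certificate '$HOST' will expire on $enddate"
    fi
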
[11:02:13] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[11:02:19] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb
[11:04:12] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[11:04:15] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[13:07:56] PROBLEM - wiki.graalmilitary.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.graalmilitary.com' expires in 15 day(s) (Fri 24 Jan 2020 01:04:20 PM GMT +0000).
[13:08:10] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JverM
[13:08:12] [02miraheze/ssl] 07MirahezeSSLBot 03298ac68 - Bot: Update SSL cert for wiki.graalmilitary.com
[13:11:54] RECOVERY - wiki.graalmilitary.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.graalmilitary.com' will expire on Tue 07 Apr 2020 12:08:04 PM GMT +0000.
[14:55:09] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JveiL
[14:55:11] [02miraheze/services] 07MirahezeSSLBot 036803f6a - BOT: Updating services config for wikis
[15:08:40] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[15:09:11] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:09:36] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw2
[15:09:37] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw2
[15:09:40] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw2
[15:09:50] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 5 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[15:11:27] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:12:21] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4217 bytes in 0.028 second response time
[15:12:49] paladox: ^
[15:13:05] hmm
[15:13:27] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:13:48] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 18667 bytes in 0.855 second response time
[15:13:49] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:14:06] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 0.506 second response time
[15:14:15] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:14:27] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 18668 bytes in 5.135 second response time
[15:14:42] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:14:49] seems to have recovered on its own
[15:16:16] Currently getting 503s
[15:16:25] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15256 bytes in 0.005 second response time
[15:16:40] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 18675 bytes in 0.252 second response time
[15:16:45] oh
[15:17:00] Well, this looks like it wasn't a host, but maybe db/redis?
[15:18:10] db4 looks like it's getting close to critical on space
[15:18:35] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 1366 bytes in 0.034 second response time
[15:18:42] paladox: ^ Void
[15:18:57] done
[15:19:02] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:19:07] paladox: !log
[15:19:15] PROBLEM - db4 Disk Space on db4 is WARNING: DISK WARNING - free space: / 24948 MB (6% inode=95%);
[15:19:25] !log root@db4:/home/paladox# ./purge-binary.sh
[15:19:28] !log root@db5:/home/paladox# ./purge-binary.sh
[15:19:28] !log purge binary logs BEFORE '2020-01-08 08:00'
[15:19:28]
[15:19:29] paladox: and we aren't back
[15:19:35] paladox: oh :P you did it before me
[15:19:42] oh damn
[15:19:46] let me restart php
[15:19:50] It won't render, paladox
[15:19:55] Just plain text
[15:19:59] And 503s
[15:20:22] !log restart php7.3-fpm on mw* and lizardfs6
[15:20:30] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 5.388 second response time
[15:20:31] it's a 502, which varnish will show as a 503
[15:20:34] RECOVERY - db5 Disk Space on db5 is OK: DISK OK - free space: / 44896 MB (24% inode=99%);
[15:20:37] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15263 bytes in 3.935 second response time
[15:20:48] paladox: ok
[15:20:57] I get some 502s
[15:20:59] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[15:21:49] paladox: back logo missing on testwiki
[15:21:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[15:22:04] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:22:05] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:22:07] RhinosF1: back logo?
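
The disk warning on db4 at 15:19 is what prompted the purge-binary.sh runs: deleting old MariaDB binary logs is the quickest way to reclaim space on a database host. The script's contents aren't shown in the log, but the !log entry gives the statement it presumably wraps; a sketch under that assumption, with the cutoff taken from the log:

    #!/bin/sh
    # Sketch of what purge-binary.sh plausibly does; the real script's
    # contents are not in the log. Drops binlogs older than CUTOFF.
    CUTOFF="2020-01-08 08:00"
    mysql -e "PURGE BINARY LOGS BEFORE '$CUTOFF';"
    mysql -e "SHOW BINARY LOGS;"   # confirm what remains
    df -h /                        # confirm space was reclaimed

The usual caveat applies: a binlog can only be purged safely once every replica has read past it, which is why the cutoff sits well in the past.
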
[15:22:28] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:22:35] Zppix: won't load
[15:22:51] paladox: instant commons fails
[15:22:52] miraheze as a whole won't load for me
[15:23:06] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 18668 bytes in 0.813 second response time
[15:23:07] link please :)
[15:23:11] Zppix: it's being useless
[15:23:19] .mh login User:RhinosF1
[15:23:19] https://login.miraheze.org/wiki/User:RhinosF1
[15:23:25] paladox: ^
[15:23:32] .mh test
[15:23:32] https://meta.miraheze.org/wiki/test
[15:23:42] .mh test main_page
[15:23:42] https://test.miraheze.org/wiki/main_page
[15:23:48] paladox: ^ fails logo
[15:24:16] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18675 bytes in 0.662 second response time
[15:24:28] working for me
[15:24:39] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:24:48] paladox: k
[15:24:57] and I've checked the mounts
[15:25:01] they are all connected
[15:25:05] Kk
[15:25:33] paladox: logo back, instant commons still fails on my userpage
[15:25:35] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw2 mw3
[15:25:43] uh
[15:26:18] paladox: look at my login wiki userpage, I see a link instead of the file
[15:26:36] timeout errors
[15:26:46] Should be an icon there https://usercontent.irccloud-cdn.com/file/jW3LCiDU/IMG_5871.PNG
[15:26:52] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:27:07] I can't look at that now :)
[15:27:15] prioritising stabilising mw
[15:27:16] Ok
[15:27:18] Good
[15:27:26] I need to go
[15:29:09] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15256 bytes in 0.294 second response time
[15:29:10] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15262 bytes in 0.004 second response time
[15:29:22] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:29:56] !log restart nginx on mw3
[15:30:28] !log restart nginx on mw* and lizardfs6
[15:30:45] !log restart php7.3-fpm on mw* and lizardfs6
[15:31:01] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[15:31:25] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18674 bytes in 0.975 second response time
[15:31:41] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:31:53] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 18668 bytes in 1.411 second response time
[15:32:13] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:35:23] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 18686 bytes in 7.028 second response time
[15:35:36] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:36:00] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 0.482 second response time
[15:37:02] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:37:37] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:37:37] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:37:46] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 1366 bytes in 0.120 second response time
[15:38:25] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:39:06] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 18667 bytes in 0.818 second response time
[15:39:08] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15263 bytes in 2.068 second response time
[15:39:13] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15263 bytes in 3.408 second response time
[15:39:35] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15248 bytes in 0.514 second response time
[15:39:43] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18674 bytes in 0.214 second response time
[15:39:45] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[15:40:06] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 2.094 second response time
[15:42:40] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[15:43:28] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 7 backends are healthy
[15:43:37] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[15:43:39] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 18667 bytes in 0.854 second response time
[15:44:41] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 3.031 second response time
[15:46:04] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[15:55:54] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:57:20] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 107.191.126.23/cpweb
[15:58:05] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 0.375 second response time
[16:00:21] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[16:02:16] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 4 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb
[16:02:24] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw2 mw3
[16:02:25] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 2 backends are down. mw2 mw3
[16:02:48] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[16:03:06] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[16:03:26] !log MariaDB [(none)]> SET GLOBAL slow_query_log = 'ON';
[16:03:30] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw3
[16:04:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:04:20] !log MariaDB [(none)]> SET GLOBAL long_query_time = 15;
[16:04:21] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 7 backends are healthy
[16:04:22] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[16:04:52] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 5.055 second response time
[16:04:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:05:00] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15263 bytes in 4.138 second response time
[16:05:14] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15249 bytes in 6.208 second response time
[16:05:31] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[16:05:57] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[16:06:17] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[16:10:11] !log MariaDB [(none)]> SET GLOBAL long_query_time = 10;
[16:10:28] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:14:05] !log MariaDB [(none)]> SET GLOBAL slow_query_log = 'OFF';
[16:14:13] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:30:00] paladox: instant commons still ain't working
[16:31:04] it is now
[16:31:46] paladox: thx
[16:33:04] !log MariaDB [(none)]> SET GLOBAL thread_pool_stall_limit = 10;
[16:33:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:52:05] !log MariaDB [(none)]> SET GLOBAL thread_cache_size = 80;
[16:52:11] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:53:38] paladox: what happened earlier?
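
The SET GLOBAL statements logged between 16:03 and 16:14 are the classic incident-time capture: turn the slow query log on with a raised 15-second threshold so only the worst offenders are recorded, let it run while the symptoms are visible, then restore the 10-second default and switch it off so the overhead doesn't linger. As a runnable session, with the values exactly as logged:

    # Capture slow queries during the incident window.
    mysql -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 15;"
    # ...wait for the symptoms to recur, then read the slow query log...
    # Restore the defaults afterwards:
    mysql -e "SET GLOBAL long_query_time = 10; SET GLOBAL slow_query_log = 'OFF';"
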
[16:54:15] I'm not entirely sure, but looks like it may have been the *db*
[16:54:25] I say "may", so it's not definite that it was the db :)
[16:55:11] Ok
[16:56:00] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-8 [+0/-0/±1] 13https://git.io/JveMm
[16:56:02] [02miraheze/puppet] 07paladox 03d242286 - mariadb: Increase thread-cache-size to 80
[16:56:03] [02puppet] 07paladox created branch 03paladox-patch-8 - 13https://git.io/vbiAS
[16:56:05] [02puppet] 07paladox opened pull request 03#1175: mariadb: Increase thread-cache-size to 80 - 13https://git.io/JveMY
[16:56:13] [02puppet] 07paladox closed pull request 03#1175: mariadb: Increase thread-cache-size to 80 - 13https://git.io/JveMY
[16:56:15] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JveMO
[16:56:16] [02miraheze/puppet] 07paladox 035f24004 - mariadb: Increase thread-cache-size to 80 (#1175)
[16:56:31] [02puppet] 07paladox deleted branch 03paladox-patch-8 - 13https://git.io/vbiAS
[16:56:33] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-8
[17:16:34] !log SET GLOBAL thread_cache_size = 100; - (db4|db5)
[17:16:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[17:17:10] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JveDv
[17:17:12] [02miraheze/puppet] 07paladox 034e37be8 - mariadb: Increase thread-cache-size to 100
[17:47:25] PROBLEM - cp3 Disk Space on cp3 is WARNING: DISK WARNING - free space: / 2647 MB (10% inode=94%);
[19:48:25] PROBLEM - db4 Current Load on db4 is WARNING: WARNING - load average: 7.75, 6.14, 3.18
[19:48:28] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 2604:180:0:33b::2/cpweb
[19:48:47] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[19:50:57] PROBLEM - db4 Current Load on db4 is CRITICAL: CRITICAL - load average: 8.37, 7.95, 4.38
[19:53:36] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:54:07] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:54:16] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4219 bytes in 0.209 second response time
[19:55:57] !log restart php7.3-fpm on lizardfs6 and mw*
[19:56:29] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15263 bytes in 0.625 second response time
[19:56:34] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:56:56] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 6.721 second response time
[19:56:58] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18673 bytes in 1.819 second response time
[19:58:32] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:58:38] PROBLEM - cp2 Puppet on cp2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[en.gyaanipedia.co.in_private]
[19:58:59] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:59:10] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:59:12] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[19:59:39] PROBLEM - cp3 Puppet on cp3 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): File[garrettcountyguide.com_private],File[vise.dayid.org_private]
[20:01:13] RECOVERY - db4 Current Load on db4 is OK: OK - load average: 2.41, 6.46, 6.32
[20:02:30] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[20:02:39] RECOVERY - cp2 Puppet on cp2 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[20:03:39] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:03:56] RECOVERY - cp3 Puppet on cp3 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[20:12:41] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-8 [+0/-0/±1] 13https://git.io/JveHX
[20:12:42] [02miraheze/puppet] 07paladox 03d76fc71 - mariadb: Switch on threads pool Also adjust configs (which is from https://github.com/wikimedia/puppet/blob/production/modules/role/templates/mariadb/mysqld_config/production.my.cnf.erb)
[20:12:43] [ puppet/production.my.cnf.erb at production · wikimedia/puppet · GitHub ] - github.com
[20:12:44] [02puppet] 07paladox created branch 03paladox-patch-8 - 13https://git.io/vbiAS
[20:12:45] [02puppet] 07paladox opened pull request 03#1176: mariadb: Switch on threads pool - 13https://git.io/JveH1
[20:20:50] [02puppet] 07paladox synchronize pull request 03#1176: mariadb: Switch on threads pool - 13https://git.io/JveH1
[20:20:52] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-8 [+0/-0/±1] 13https://git.io/JveHA
[20:20:53] [02miraheze/puppet] 07paladox 034e9968d - Update mw.cnf.erb
[20:24:00] [02puppet] 07paladox synchronize pull request 03#1176: mariadb: Switch on threads pool - 13https://git.io/JveH1
[20:24:02] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-8 [+0/-0/±1] 13https://git.io/JveQe
[20:24:03] [02miraheze/puppet] 07paladox 03bc5e40c - Update mw.cnf.erb
[20:30:49] [02puppet] 07paladox synchronize pull request 03#1176: mariadb: Switch on threads pool - 13https://git.io/JveH1
[20:30:51] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-8 [+0/-0/±1] 13https://git.io/JveQt
[20:30:52] [02miraheze/puppet] 07paladox 03087b814 - Update mw.cnf.erb
[20:31:48] Hi JohnLewis
[20:31:59] * hispano76 greetings
[20:32:06] Hi hispano76
[20:32:28] hi
[20:32:55] hi
[20:33:00] How is everyone
[20:33:11] I'm good :) you?
[20:33:34] Not too bad
[20:36:55] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:37:55] PROBLEM - cp3 Stunnel Http for misc2 on cp3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 372 bytes in 1.668 second response time
[20:37:57] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 309 bytes in 0.003 second response time
[20:38:14] !log restart php7.3-fpm on lizardfs6 and mw*
[20:38:36] PROBLEM - cp2 Stunnel Http for misc2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:38:57] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:39:15] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:39:22] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:39:37] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:39:51] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw1 mw3
[20:39:51] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw1
[20:39:56] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 5 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:40:20] RECOVERY - cp3 Stunnel Http for misc2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 43687 bytes in 0.844 second response time
[20:40:29] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15249 bytes in 9.905 second response time
[20:40:59] RECOVERY - cp2 Stunnel Http for misc2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 43687 bytes in 0.541 second response time
[20:41:13] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15263 bytes in 0.515 second response time
[20:41:15] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15249 bytes in 0.007 second response time
[20:41:20] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15256 bytes in 0.501 second response time
[20:41:51] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[20:41:51] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 7 backends are healthy
[20:41:54] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[20:41:55] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[20:43:36] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:05:11] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:13:25] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:19:12] [02puppet] 07paladox synchronize pull request 03#1176: mariadb: Switch on threads pool - 13https://git.io/JveH1
[21:19:13] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-8 [+0/-0/±1] 13https://git.io/Jve7m
[21:19:15] [02miraheze/puppet] 07paladox 03c8a5309 - Update mw.cnf.erb
[21:20:18] [02puppet] 07paladox synchronize pull request 03#1176: mariadb: Switch on threads pool - 13https://git.io/JveH1
[21:20:20] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-8 [+0/-0/±1] 13https://git.io/Jve7Y
[21:20:21] [02miraheze/puppet] 07paladox 03d3b11d7 - Update mw.cnf.erb
[21:23:48] [02puppet] 07paladox synchronize pull request 03#1176: mariadb: Switch on threads pool - 13https://git.io/JveH1
[21:23:49] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-8 [+0/-0/±1] 13https://git.io/Jve7Z
[21:23:51] [02miraheze/puppet] 07paladox 03a669880 - Update mw.cnf.erb
[21:24:52] 503
[21:25:04] paladox: ^
[21:25:48] Paladox, I changed logout to sign out... nothing
[21:25:51] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[21:26:27] 502
[21:26:42] yup
[21:26:44] looking
[21:26:51] we're about to roll out a mysql change
[21:27:05] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[21:27:27] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. lizardfs6 mw1 mw2
[21:27:34] ?
[21:27:37] [02mw-config] 07Hispano76 synchronize pull request 03#2853: enable wgExtraInterlanguageLinkPrefixes on hispanowiki per request - 13https://git.io/JveTS
[21:27:44] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CRITICAL: Puppet has 7 failures. Last run 5 minutes ago with 7 failures. Failed resources (up to 3 shown): Exec[ufw-allow-tcp-from-185.52.3.121-to-any-port-9113],Package[php7.3-redis],Service[rsyslog],Exec[ops_ensure_members]
[21:27:55] [02mw-config] 07Hispano76 edited pull request 03#2853: enable wgExtraInterlanguageLinkPrefixes on hispanowiki and Ucroniaswiki per request - 13https://git.io/JveTS
[21:28:03] PROBLEM - db4 Current Load on db4 is CRITICAL: CRITICAL - load average: 6.52, 8.25, 4.82
[21:28:24] @PF94 currently dealing with an out
[21:28:27] *outage
[21:28:36] the load on the db explains the 503 atm
[21:29:31] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[21:29:43] [02puppet] 07paladox closed pull request 03#1176: mariadb: Switch on threads pool - 13https://git.io/JveH1
[21:29:44] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jve72
[21:29:46] [02miraheze/puppet] 07paladox 033fe2068 - mariadb: Switch on threads pool (#1176) * mariadb: Switch on threads pool Also adjust configs (which is from https://github.com/wikimedia/puppet/blob/production/modules/role/templates/mariadb/mysqld_config/production.my.cnf.erb) * Update mw.cnf.erb * Update mw.cnf.erb * Update mw.cnf.erb * Update mw.cnf.erb * Update mw.cnf.erb * Update mw.cnf.erb
[21:29:47] [ puppet/production.my.cnf.erb at production · wikimedia/puppet · GitHub ] - github.com
[21:29:47] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-8
[21:29:50] [02puppet] 07paladox deleted branch 03paladox-patch-8 - 13https://git.io/vbiAS
[21:30:10] PROBLEM - db4 Current Load on db4 is WARNING: WARNING - load average: 4.25, 6.91, 4.77
[21:31:17] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[21:32:00] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[21:32:07] RECOVERY - db4 Current Load on db4 is OK: OK - load average: 3.91, 6.14, 4.76
[21:32:12] PROBLEM - db4 Puppet on db4 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 11 minutes ago with 0 failures
[21:32:41] PROBLEM - db5 Puppet on db5 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 11 minutes ago with 0 failures
[21:33:36] !log restart mysql on db5
[21:33:54] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[21:33:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[21:34:41] RECOVERY - db5 Puppet on db5 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:35:53] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[ufw-allow-tcp-from-185.52.3.121-to-any-port-9253]
[21:36:09] RECOVERY - db4 Puppet on db4 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[21:37:10] !log restart mysql on db4
[21:37:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[21:39:59] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[21:40:03] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4215 bytes in 0.056 second response time
[21:40:10] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4215 bytes in 0.017 second response time
[21:40:13] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 4 backends are down. lizardfs6 mw1 mw2 mw3
[21:40:46] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4219 bytes in 0.021 second response time
[21:40:57] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4217 bytes in 0.024 second response time
[21:41:16] PROBLEM - misc1 webmail.miraheze.org HTTPS on misc1 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/1.1 200 OK
[21:41:21] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[21:41:27] PROBLEM - db4 MySQL on db4 is CRITICAL: Can't connect to MySQL server on '81.4.109.166' (115)
[21:41:57] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4217 bytes in 0.014 second response time
[21:41:57] PROBLEM - misc4 phab.miraheze.wiki HTTPS on misc4 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/1.1 500 Internal Server Error
[21:41:59] PROBLEM - misc4 phabricator.miraheze.org HTTPS on misc4 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 4222 bytes in 0.027 second response time
[21:42:00] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 4 backends are down. lizardfs6 mw1 mw2 mw3
[21:42:00] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 4 backends are down. lizardfs6 mw1 mw2 mw3
[21:42:09] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[21:43:16] RECOVERY - misc1 webmail.miraheze.org HTTPS on misc1 is OK: HTTP OK: Status line output matched "HTTP/1.1 401 Unauthorized" - 5794 bytes in 0.030 second response time
[21:43:24] RECOVERY - db4 MySQL on db4 is OK: Uptime: 255 Threads: 37 Questions: 7232 Slow queries: 466 Opens: 817 Flush tables: 1 Open tables: 811 Queries per second avg: 28.360
[21:43:52] RECOVERY - misc4 phab.miraheze.wiki HTTPS on misc4 is OK: HTTP OK: Status line output matched "HTTP/1.1 200" - 17725 bytes in 0.067 second response time
[21:43:52] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 18673 bytes in 0.469 second response time
[21:43:54] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[21:43:54] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[21:43:55] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 7 backends are healthy
[21:43:59] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 18668 bytes in 0.829 second response time
[21:44:00] RECOVERY - misc4 phabricator.miraheze.org HTTPS on misc4 is OK: HTTP OK: HTTP/1.1 200 OK - 19074 bytes in 0.129 second response time
[21:44:03] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18668 bytes in 0.804 second response time
[21:44:05] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[21:44:45] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 18673 bytes in 0.270 second response time
[21:44:56] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 18674 bytes in 0.323 second response time
[21:45:21] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[21:58:59] > @PF94 (aka DreamIsland) currently dealing with an out
[21:59:00] ah ok
[22:08:35] Hmm @RhinosF1 or @Void maybe you know ^?
[22:08:49] I'm alseeo
[22:08:57] s/o/p
[22:08:58] RhinosF1 meant to say: I'm alseep
[22:15:24] PROBLEM - cp3 Stunnel Http for misc2 on cp3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 372 bytes in 2.538 second response time
[22:16:06] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:16:16] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 309 bytes in 0.293 second response time
[22:16:28] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:16:38] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:16:50] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:16:52] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 2 backends are down. lizardfs6 mw2
[22:16:54] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. lizardfs6 mw2
[22:16:56] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
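
PR #1176, merged at 21:29, switches MariaDB from one-thread-per-connection to the thread pool; since that switch cannot be flipped at runtime, mysql was restarted on db5 (21:33) and db4 (21:37) straight after. Pulling together the values exercised during the day, the change amounts to roughly the following; a sketch inferred from the SET GLOBAL statements and the PR title, not the actual mw.cnf.erb, and the config path is illustrative:

    #!/bin/sh
    # Sketch: the [mysqld] lines implied by the day's tuning; path assumed.
    cat <<'EOF' >> /etc/mysql/conf.d/threadpool.cnf
    [mysqld]
    thread_handling         = pool-of-threads
    thread_pool_stall_limit = 10
    thread_cache_size       = 100
    EOF
    # thread_handling is not a dynamic variable, hence the restart:
    systemctl restart mysql

Under pool-of-threads the thread_cache_size value mostly stops mattering (it only applies to one-thread-per-connection mode), so the earlier 80-then-100 bumps read as interim mitigation before the pool went live.
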
[22:17:10] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:17:29] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[22:17:29] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[22:17:33] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw2
[22:17:36] RECOVERY - cp3 Stunnel Http for misc2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 43687 bytes in 0.887 second response time
[22:17:53] huh
[22:18:07] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15263 bytes in 0.632 second response time
[22:18:16] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15249 bytes in 0.296 second response time
[22:18:23] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 18674 bytes in 0.266 second response time
[22:18:38] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15262 bytes in 0.294 second response time
[22:18:47] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[22:18:48] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 18674 bytes in 0.247 second response time
[22:18:49] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 7 backends are healthy
[22:18:53] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 0.004 second response time
[22:19:06] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18668 bytes in 0.820 second response time
[22:19:28] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[22:19:28] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[22:19:30] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[22:21:42] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:34:16] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[22:41:46] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4219 bytes in 0.014 second response time
[22:42:17] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:42:26] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:43:06] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:43:08] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:43:18] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4217 bytes in 0.021 second response time
[22:43:23] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4219 bytes in 0.023 second response time
[22:43:24] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4219 bytes in 0.124 second response time
[22:43:24] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[22:43:25] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 4 backends are down. lizardfs6 mw1 mw2 mw3
[22:43:38] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 4 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[22:43:39] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4219 bytes in 0.064 second response time
[22:44:09] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 18674 bytes in 0.223 second response time
[22:44:24] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 0.479 second response time
[22:44:27] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15249 bytes in 0.293 second response time
[22:45:06] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15263 bytes in 1.257 second response time
[22:45:08] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15249 bytes in 3.057 second response time
[22:45:18] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18674 bytes in 4.172 second response time
[22:45:29] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[22:45:30] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[22:45:41] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 18675 bytes in 7.294 second response time
[22:48:59] !log restart php7.3-fpm on all mediawiki hosts
[22:49:53] !log restart nginx on all mediawiki hosts
[22:50:35] PROBLEM - misc2 HTTPS on misc2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:50:49] PROBLEM - cp2 Stunnel Http for misc2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:50:51] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. lizardfs6 mw1 mw3
[22:50:51] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. lizardfs6 mw1 mw3
[22:50:55] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:51:03] PROBLEM - cp4 Stunnel Http for misc2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:51:31] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4219 bytes in 0.043 second response time
[22:51:38] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 4 backends are down. lizardfs6 mw1 mw2 mw3
[22:51:48] PROBLEM - cp3 Stunnel Http for misc2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:51:49] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[22:51:51] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4217 bytes in 0.028 second response time
[22:52:48] RECOVERY - misc2 HTTPS on misc2 is OK: HTTP OK: HTTP/1.1 200 OK - 43695 bytes in 8.057 second response time
[22:52:51] RECOVERY - cp2 Stunnel Http for misc2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 43687 bytes in 0.519 second response time
[22:53:03] RECOVERY - cp4 Stunnel Http for misc2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 43687 bytes in 0.049 second response time
[22:53:08] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 18674 bytes in 4.406 second response time
[22:53:31] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 18675 bytes in 1.111 second response time
[22:53:33] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18675 bytes in 1.055 second response time
[22:53:38] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[22:53:46] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[22:53:48] RECOVERY - cp3 Stunnel Http for misc2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 43687 bytes in 0.879 second response time
[22:53:51] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 18668 bytes in 0.825 second response time
[22:54:45] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[22:54:45] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 7 backends are healthy
[22:54:46] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 18675 bytes in 0.348 second response time
[22:55:21] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[22:56:46] PROBLEM - misc1 Puppet on misc1 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[23:02:35] RECOVERY - misc1 Puppet on misc1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:06:16] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:14:45] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[23:15:21] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[23:16:49] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[23:46:39] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4219 bytes in 0.023 second response time
[23:47:02] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:47:14] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 1366 bytes in 0.051 second response time
[23:47:14] PROBLEM - cp3 Stunnel Http for misc2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:47:14] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 1365 bytes in 0.055 second response time
[23:47:28] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:47:28] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 1366 bytes in 0.064 second response time
[23:47:39] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:47:42] PROBLEM - cp2 HTTPS on cp2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4064 bytes in 0.400 second response time
[23:47:57] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:47:59] PROBLEM - cp2 Stunnel Http for misc2 on cp2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 372 bytes in 8.445 second response time
[23:48:07] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:48:23] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:48:23] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:48:24] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 4 backends are down. lizardfs6 mw1 mw2 mw3
[23:48:25] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 4 backends are down. lizardfs6 mw1 mw2 mw3
[23:48:35] PROBLEM - misc2 HTTPS on misc2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:48:35] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4215 bytes in 0.031 second response time
[23:48:44] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[23:48:59] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[23:49:09] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 4 backends are down. lizardfs6 mw1 mw2 mw3
[23:49:15] PROBLEM - cp4 Stunnel Http for misc2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:49:38] !log rm -rf /srv/mediawiki/w/cache/managewiki/** on all mediawiki hosts
[23:50:00] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15263 bytes in 1.691 second response time
[23:50:29] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 0.507 second response time
[23:50:36] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 0.293 second response time
[23:51:04] !log restart php7.3-fpm on all mediawiki hosts
[23:51:57] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15262 bytes in 0.296 second response time
[23:52:10] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15263 bytes in 0.004 second response time
[23:52:25] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15256 bytes in 0.513 second response time
[23:52:39] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15262 bytes in 0.005 second response time
[23:54:00] PROBLEM - lizardfs6 JobRunner Service on lizardfs6 is CRITICAL: PROCS CRITICAL: 0 processes with args 'redisJobRunnerService'
[23:56:34] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:56:47] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:57:25] PROBLEM - lizardfs6 Puppet on lizardfs6 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[23:57:48] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:58:06] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:59:15] PROBLEM - bacula1 Puppet on bacula1 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[23:59:19] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 0.293 second response time
[23:59:19] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 18674 bytes in 0.516 second response time
[23:59:19] PROBLEM - lizardfs6 JobChron Service on lizardfs6 is CRITICAL: PROCS CRITICAL: 0 processes with args 'redisJobChronService'
[23:59:30] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15257 bytes in 0.482 second response time
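
The final remediation at 23:49 and 23:51 pairs wiping MediaWiki's on-disk ManageWiki cache with a php7.3-fpm restart, presumably so each wiki's configuration gets rebuilt fresh rather than served from possibly stale cache files. As one sequence, a sketch only: the log says "all mediawiki hosts" without naming them, so the host list below is assumed from the hosts seen earlier in the day.

    #!/bin/sh
    # Clear the ManageWiki cache and bounce PHP on each MediaWiki host.
    # Host list assumed from hosts appearing earlier in this log.
    for host in mw1 mw2 mw3 lizardfs6 test1; do
        ssh "$host" "rm -rf /srv/mediawiki/w/cache/managewiki/** && systemctl restart php7.3-fpm"
    done
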