[01:20:33] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw2 mw3
[01:21:56] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[01:21:58] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw1 mw3
[01:22:04] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[01:22:45] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:23:06] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 3 datacenters are down: 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 81.4.109.133/cpweb
[01:24:43] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.661 second response time
[01:24:59] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[01:25:56] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[01:25:58] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[01:26:04] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[01:26:33] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[01:54:18] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[01:54:33] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[01:55:15] PROBLEM - misc3 Puppet on misc3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/logrotate.d/nginx]
[01:56:18] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[01:56:33] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[02:05:14] RECOVERY - misc3 Puppet on misc3 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[02:16:07] !log increasing swap size on misc3 to 768
[02:16:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[02:16:27] !log a small downtime will happen (this is being done because misc3 OOM)
[02:16:32] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[02:19:25] PROBLEM - misc3 Lizardfs Master Port 3 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9421: Connection refused
[02:19:31] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[02:19:34] PROBLEM - misc3 Lizardfs Master Port 2 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9420: Connection refused
[02:19:52] PROBLEM - misc3 Lizardfs Master Port 1 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9419: Connection refused
[02:20:11] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:12] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
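(The 02:16 !log above doesn't record the exact commands used. As a minimal sketch, growing swap via a swap file on a Debian host could look like the following, where the /swapfile path and "768" meaning megabytes are both assumptions:)

    swapoff /swapfile                              # take the old swap file offline
    dd if=/dev/zero of=/swapfile bs=1M count=768   # recreate it at the new size
    chmod 600 /swapfile                            # swap must not be world-readable
    mkswap /swapfile                               # write the swap signature
    swapon /swapfile                               # bring it back online
    free -m                                        # confirm the new swap total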
[02:20:13] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:13] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[02:20:14] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:14] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:21] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[02:20:21] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 41%
[02:20:22] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:25] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:29] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:32] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[02:20:44] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:50] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 90%
[02:20:54] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:21:20] RECOVERY - misc3 Lizardfs Master Port 3 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9421
[02:21:23] Hello Bloodstream! If you have any questions feel free to ask and someone should answer soon.
[02:21:34] RECOVERY - misc3 Lizardfs Master Port 2 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9420
[02:21:52] RECOVERY - misc3 Lizardfs Master Port 1 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9419
[02:22:07] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.016 second response time
[02:22:08] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[02:22:10] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.024 second response time
[02:22:10] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.670 second response time
[02:22:11] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.646 second response time
[02:22:13] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[02:22:18] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[02:22:20] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.004 second response time
[02:22:21] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 2%
[02:22:23] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.393 second response time
[02:22:28] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.393 second response time
[02:22:28] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[02:22:44] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.715 second response time
[02:22:47] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 5%
[02:22:54] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.400 second response time
[02:23:19] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[05:04:53] [miraheze/ManageWiki] translatewiki pushed 1 commit to master [+0/-0/±5] https://git.io/fjj8j
[05:04:54] [miraheze/ManageWiki] translatewiki 3a3cec0 - Localisation updates from https://translatewiki.net.
[05:04:56] [miraheze/CreateWiki] translatewiki pushed 1 commit to master [+0/-0/±2] https://git.io/fjj4e
[05:04:57] [miraheze/CreateWiki] translatewiki cd67d10 - Localisation updates from https://translatewiki.net.
[05:04:59] [miraheze/WikiDiscover] translatewiki pushed 1 commit to master [+0/-0/±1] https://git.io/fjj4v
[05:05:00] [miraheze/WikiDiscover] translatewiki d4500f3 - Localisation updates from https://translatewiki.net.
[06:26:53] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 3063 MB (12% inode=94%);
[10:17:59] Hello A101! If you have any questions feel free to ask and someone should answer soon.
[10:19:37] Oh boy, the welcome bot remembers by nick and not by hostname :(
[10:26:52] Hi! Here is the list of currently open high priority tasks on Phabricator
[10:26:59] No updates for 6 days - https://phabricator.miraheze.org/T4504 - Renew *.miraheze.org cert - authored by Reception123, assigned to Southparkfan
[10:34:37] RECOVERY - reviwiki.info - PositiveSSLDV on sslhost is OK: OK - Certificate 'reviwiki.info' will expire on Wed 03 Feb 2021 11:59:59 PM GMT +0000.
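(The sslhost alerts in this log read like the stock monitoring-plugins certificate mode. A sketch of an equivalent manual check; the plugin path is the usual Debian one, and the 15-day threshold is taken from the expiry warnings that fire later in this log:)

    # -C 15 warns when the certificate expires within 15 days; without -C the
    # plugin reports the HTTP status instead, as in the 200/301 recoveries
    /usr/lib/nagios/plugins/check_http -H reviwiki.info --ssl --sni -C 15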
[10:41:28] PROBLEM - reviwiki.info - PositiveSSLDV on sslhost is CRITICAL: connect to address reviwiki.info and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket
[10:43:23] RECOVERY - reviwiki.info - PositiveSSLDV on sslhost is OK: OK - Certificate 'reviwiki.info' will expire on Wed 03 Feb 2021 11:59:59 PM GMT +0000.
[10:46:29] RECOVERY - www.reviwiki.info - PositiveSSLDV on sslhost is OK: OK - Certificate 'reviwiki.info' will expire on Wed 03 Feb 2021 11:59:59 PM GMT +0000.
[10:47:15] PROBLEM - reviwiki.info - PositiveSSLDV on sslhost is CRITICAL: connect to address reviwiki.info and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket
[10:49:09] RECOVERY - reviwiki.info - PositiveSSLDV on sslhost is OK: OK - Certificate 'reviwiki.info' will expire on Wed 03 Feb 2021 11:59:59 PM GMT +0000.
[10:50:29] PROBLEM - www.reviwiki.info - PositiveSSLDV on sslhost is CRITICAL: connect to address www.reviwiki.info and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket
[10:52:29] RECOVERY - www.reviwiki.info - PositiveSSLDV on sslhost is OK: OK - Certificate 'reviwiki.info' will expire on Wed 03 Feb 2021 11:59:59 PM GMT +0000.
[11:17:21] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 7.29, 5.33, 4.02
[11:19:20] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 3.89, 4.68, 3.94
[13:45:21] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 7.11, 5.70, 4.14
[13:47:21] PROBLEM - mw3 Current Load on mw3 is CRITICAL: CRITICAL - load average: 9.57, 6.84, 4.73
[13:49:21] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 7.06, 7.17, 5.13
[13:51:20] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 4.55, 6.20, 5.02
[14:02:33] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[14:02:53] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:03:01] PROBLEM - misc3 Lizardfs Master Port 3 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9421: Connection refused
[14:03:16] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:03:16] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:03:18] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:03:26] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:03:29] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:03:30] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:03:31] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:03:43] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 44%
[14:03:43] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:03:52] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
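(Each "Stunnel Http for mwN on cpN" result is gathered over NRPE, so "CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds" means the NRPE round-trip to the cp host gave up, not that the HTTP probe itself returned an error. A sketch of re-running one check by hand from the monitoring host; the hostname and the check_stunnel_mw1 command name are assumptions, not Miraheze's real configuration:)

    # -t 10 mirrors the 10-second timeout quoted in the alerts
    /usr/lib/nagios/plugins/check_nrpe -H cp2.miraheze.org -c check_stunnel_mw1 -t 10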
[14:03:56] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[14:03:58] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[14:03:58] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[14:04:10] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 89%
[14:04:12] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:04:13] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:04:18] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[14:04:51] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[14:05:34] PROBLEM - misc3 Lizardfs Master Port 2 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9420: Connection refused
[14:05:42] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 71%
[14:05:53] PROBLEM - misc3 Lizardfs Master Port 1 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9419: Connection refused
[14:06:42] JohnLewis: we're down, 503
[14:06:51] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 44%
[14:06:56] paladox is dealing
[14:07:00] Good
[14:08:18] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 1.409 second response time
[14:08:20] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 9.992 second response time
[14:08:34] !log increased swap to 1280M
[14:08:42] !log that's for misc3
[14:08:56] RECOVERY - misc3 Lizardfs Master Port 3 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9421
[14:09:00] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[14:09:28] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.007 second response time
[14:09:31] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.009 second response time
[14:09:34] RECOVERY - misc3 Lizardfs Master Port 2 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9420
[14:09:35] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.007 second response time
[14:09:36] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.006 second response time
[14:09:41] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 1%
[14:09:43] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.675 second response time
[14:09:44] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.431 second response time
[14:09:48] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 1.523 second response time
[14:09:52] RECOVERY - misc3 Lizardfs Master Port 1 on misc3 is OK: TCP OK - 0.005 second response time on 185.52.1.71 port 9419
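(The "Lizardfs Master Port 1/2/3" checks are plain TCP connects against 9419-9421, the LizardFS master's default metalogger, chunkserver and client listeners; "Connection refused" during the swap work simply means the master process was down at that moment. A manual equivalent with the standard TCP probe:)

    /usr/lib/nagios/plugins/check_tcp -H 185.52.1.71 -p 9419   # master <-> metaloggers
    /usr/lib/nagios/plugins/check_tcp -H 185.52.1.71 -p 9420   # master <-> chunkservers
    /usr/lib/nagios/plugins/check_tcp -H 185.52.1.71 -p 9421   # master <-> clients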
[14:09:56] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[14:09:58] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[14:09:58] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.684 second response time
[14:09:59] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[14:10:10] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 4%
[14:10:10] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.393 second response time
[14:10:18] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[14:10:32] !log increased swap to 1280M on misc3
[14:10:33] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[14:10:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[14:10:45] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static]
[14:10:51] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 1%
[14:11:03] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static]
[14:12:18] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static]
[14:13:01] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[14:14:16] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:14:41] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:49:39] K6ka: yes, it doesn't go by hostname, but by nick
[14:55:47] Hmm, I just got a notification from a private wiki but I'm not sure what about?
[15:00:56] paladox: we're on a go slow (testwiki)
[15:01:23] paladox: 503
[15:01:28] checking
[15:02:35] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[15:02:35] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:02:35] Same issue
[15:04:33] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[15:04:33] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.393 second response time
[15:25:06] [miraheze/mw-config] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjwh
[15:25:08] [miraheze/mw-config] paladox 9f97b81 - citoid: use $wgCitoidFullRestbaseURL not $wgCitoidServiceUrl
[15:25:59] paladox: any good with Scribunto? /doc pages aren't defaulting to wikitext
[15:26:15] sadly nope :(
[15:27:05] paladox: Can't reproduce on testwikipedia or find any config they're using that we're not, so wondering if it was a bug they've fixed
[15:27:28] i'm not sure.
[15:27:52] JohnLewis and I are currently busy trying to fix the backend (and also have some exciting upgrades planned too!)
[15:28:35] Oh goody! Enjoy upgrading! I'll try and weed something out of the knowledgeable folk on the MediaWiki Discord
[15:37:40] PROBLEM - mw4 php-fpm on mw4 is CRITICAL: PROCS CRITICAL: 0 processes with command name 'php-fpm7.2'
[15:37:41] PROBLEM - mw4 HTTPS on mw4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:37:41] \o/
[15:37:42] PROBLEM - bacula1 Current Load on bacula1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:37:44] RECOVERY - bacula1 Current Load on bacula1 is OK: OK - load average: 0.00, 0.00, 0.00
[15:38:59] [mw-config] RhinosF1 opened pull request #2751: per T4699 - https://git.io/fjjrm
[15:41:03] [mw-config] RhinosF1 closed pull request #2751: per T4699 - https://git.io/fjjrm
[15:41:10] [mw-config] RhinosF1 commented on pull request #2751: per T4699 - https://git.io/fjjr3
[15:48:09] paladox: I have to go now for a while, puppet is running on mw4
[15:48:16] ok
[15:48:29] JohnLewis: I can take care of fully integrating it, if you want?
[15:48:30] don't give it the full shebang, just do the stuff needed to deploy it for me
[15:48:35] ok
[15:48:39] so no dns etc.
[15:49:03] * RhinosF1 wonders what is being done....
[15:49:07] ok
[15:49:21] JohnLewis: I add it to varnish, right?
[15:49:32] testing the feasibility of an idea I have to downscale mw* in a way
[15:49:38] paladox: yes, as otherwise how is it going to be put into prod :)
[15:50:10] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/fjjrC
[15:50:12] [miraheze/services] MirahezeSSLBot 4c9d770 - BOT: Updating services config for wikis
[15:50:12] JohnLewis: what does that mean for people? users?
[15:51:23] [puppet] paladox created branch paladox-patch-4 - https://git.io/vbiAS
[15:51:25] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjrE
[15:51:26] [miraheze/puppet] paladox 8a24f35 - Pool in mw4
[15:51:28] [puppet] paladox opened pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:52:01] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjrz
[15:52:03] [miraheze/puppet] paladox 880798a - Update config.yaml
[15:52:38] RECOVERY - mw4 HTTPS on mw4 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.010 second response time
[15:53:24] RhinosF1: for users, likely nothing hopefully
[15:53:38] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjrP
[15:53:39] [miraheze/puppet] paladox 6c0a179 - Update stunnel.conf
[15:54:14] PROBLEM - mw4 Puppet on mw4 is CRITICAL: CRITICAL: Puppet has 37 failures. Last run 2 minutes ago with 37 failures. Failed resources (up to 3 shown): Package[ploticus],Package[ttf-freefont],Package[libav-tools],Package[texvc]
[15:54:16] paladox: a few puppet failures btw, so if you fancy making the puppet module deb 10 friendly, that'd be nice :P
[15:54:24] JohnLewis: will do :)
[15:54:31] just going to pull mw4 in first
[15:54:33] JohnLewis: oh, what does it mean for anyone then?
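(The "Varnish Backends" checks summarise backend probe health. Assuming the check wraps the standard Varnish admin interface, the manual equivalent on a cp host would be something like this, with illustrative output:)

    varnishadm backend.list
    # Backend name        Admin    Probe
    # boot.mw1            probe    Healthy 5/5
    # boot.mw2            probe    Healthy 5/5
    # ...pooling mw4 adds a sixth entry, matching the later
    # "All 6 backends are healthy" recoveries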
[15:54:50] RhinosF1: lower costs hopefully for us :)
[15:54:56] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjrM
[15:54:58] [miraheze/puppet] paladox 24617c3 - Update storage_firewall.yaml
[15:54:59] [puppet] paladox synchronize pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:55:14] JohnLewis: well that's always good news
[15:55:43] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjrS
[15:55:45] [miraheze/puppet] paladox e739a4a - Update varnish.pp
[15:55:46] [puppet] paladox synchronize pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:56:13] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjrQ
[15:56:15] [miraheze/puppet] paladox bfa0547 - Update irc.pp
[15:56:16] [puppet] paladox synchronize pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:57:02] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjr7
[15:57:03] [miraheze/puppet] paladox dfac2c0 - Update services.pp
[15:57:05] [puppet] paladox synchronize pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:57:41] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjr5
[15:57:43] [miraheze/puppet] paladox a5359bd - Update db.pp
[15:57:44] [puppet] paladox synchronize pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:57:59] [puppet] paladox closed pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:58:00] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±5] https://git.io/fjjrd
[15:58:02] [miraheze/puppet] paladox 607e930 - Pool in mw4 (#1079) * Pool in mw4 * Update storage_firewall.yaml * Update varnish.pp * Update irc.pp * Update services.pp * Update db.pp
[15:58:56] [puppet] paladox deleted branch paladox-patch-4 - https://git.io/vbiAS
[15:58:58] [miraheze/puppet] paladox deleted branch paladox-patch-4
[16:00:30] RECOVERY - mw4 php-fpm on mw4 is OK: PROCS OK: 3 processes with command name 'php-fpm7.2'
[16:12:59] PROBLEM - lizardfs4 Puppet on lizardfs4 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[16:13:22] PROBLEM - lizardfs5 Puppet on lizardfs5 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[16:13:26] PROBLEM - misc3 Puppet on misc3 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
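(The per-host "Puppet" alerts watch the result of the last agent run; "Failed to apply catalog, zero resources tracked" means the run aborted before applying anything, a dependency cycle being one possible cause, as the check text says. To reproduce a run by hand on an affected host:)

    puppet agent --test   # one-shot foreground run; non-zero exit if the run failed
    # the icinga check parses the agent's last_run_summary.yaml state file, which
    # is also what the later "failed_to_parse_summary_file" UNKNOWN refers to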
[16:16:01] uh
[16:16:07] network gone down on lizardfs
[16:18:29] it's back
[16:18:54] RECOVERY - lizardfs4 Puppet on lizardfs4 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[16:23:11] RECOVERY - lizardfs5 Puppet on lizardfs5 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[16:23:26] RECOVERY - misc3 Puppet on misc3 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[16:32:11] PROBLEM - cp3 Disk Space on cp3 is WARNING: DISK WARNING - free space: / 2649 MB (10% inode=94%);
[16:33:04] [miraheze/puppet] paladox pushed 1 commit to master [+4/-0/±1] https://git.io/fjjo8
[16:33:05] [miraheze/puppet] paladox 02ed278 - mediawiki: Add support for debian 10
[16:43:55] !log reloaded lizardfs-master on misc3
[16:44:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:44:49] !log running lc on mw4
[16:44:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:45:20] RECOVERY - mw4 Puppet on mw4 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[16:51:06] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjod
[16:51:07] [miraheze/puppet] paladox a5f0a43 - varnish: Add mw4
[16:51:08] [puppet] paladox created branch paladox-patch-4 - https://git.io/vbiAS
[16:51:10] [puppet] paladox opened pull request #1080: varnish: Add mw4 - https://git.io/fjjoF
[16:51:45] [puppet] paladox closed pull request #1080: varnish: Add mw4 - https://git.io/fjjoF
[16:51:46] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjoN
[16:51:48] [miraheze/puppet] paladox 225f68c - varnish: Add mw4 (#1080)
[16:53:22] [miraheze/puppet] paladox deleted branch paladox-patch-4
[16:53:23] [puppet] paladox deleted branch paladox-patch-4 - https://git.io/vbiAS
[16:55:02] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjKe
[16:55:03] [miraheze/puppet] paladox cd14506 - fix syntax
[16:58:02] [miraheze/MirahezeDebug] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjKk
[16:58:05] [miraheze/MirahezeDebug] paladox 45eeba9 - add mw4
[16:59:10] [miraheze/MirahezeDebug] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjKt
[16:59:12] [miraheze/MirahezeDebug] paladox 71946ab - Update popup.html
[17:04:37] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjK3
[17:04:38] [miraheze/puppet] paladox a9de0dd - Update mediawiki::branch to REL1_33
[17:06:01] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw4
[17:06:58] PROBLEM - mw4 Puppet on mw4 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 2 minutes ago with 1 failures
[17:07:52] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw4
[17:07:59] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw4
[17:14:58] PROBLEM - mw4 Puppet on mw4 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_clone_MediaWiki core]
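(The failed Exec[git_clone_MediaWiki core] follows the mediawiki::branch bump to REL1_33 a few minutes earlier. A sketch of the clone such an Exec likely wraps; the repository URL and target path are assumptions, not taken from Miraheze's puppet code:)

    # fetch MediaWiki core at the REL1_33 release branch
    git clone --branch REL1_33 --depth 1 \
        https://github.com/wikimedia/mediawiki.git /srv/mediawiki/w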
[17:15:32] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[17:15:51] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[17:15:56] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[17:21:33] PROBLEM - mw4 Current Load on mw4 is CRITICAL: CRITICAL - load average: 5.80, 3.70, 2.48
[17:24:58] RECOVERY - mw4 Puppet on mw4 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:25:32] RECOVERY - mw4 Current Load on mw4 is OK: OK - load average: 1.59, 3.21, 2.63
[17:33:32] PROBLEM - mw4 Current Load on mw4 is CRITICAL: CRITICAL - load average: 9.36, 5.48, 3.60
[17:36:58] PROBLEM - mw4 Puppet on mw4 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 2 minutes ago with 0 failures
[17:47:33] PROBLEM - mw4 Current Load on mw4 is WARNING: WARNING - load average: 2.57, 3.51, 3.83
[17:53:32] RECOVERY - mw4 Current Load on mw4 is OK: OK - load average: 1.47, 2.62, 3.35
[18:00:41] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw2 mw3 mw4
[18:00:41] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 4 backends are down. mw1 mw2 mw3 mw4
[18:00:59] huh
[18:01:31] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw4
[18:01:40] I would say huh backwards back to you paladox, but it's still huh
[18:02:01] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 1 datacenter is down: 2400:6180:0:d0::403:f001/cpweb
[18:03:25] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[18:05:27] PROBLEM - misc3 Lizardfs Master Port 1 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9419: Connection refused
[18:05:39] PROBLEM - misc3 Lizardfs Master Port 3 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9421: Connection refused
[18:06:04] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[18:06:05] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[18:06:07] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[18:06:09] PROBLEM - misc3 Lizardfs Master Port 2 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9420: Connection refused
[18:06:18] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[18:06:28] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 64%
[18:06:30] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[18:06:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[18:07:14] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw1 mw4
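(The "Current Load" alerts map onto the standard three-value load-average check. The thresholds below are illustrative, not Miraheze's actual configuration:)

    # WARNING above 5/4/3 and CRITICAL above 10/6/4 for the 1/5/15-minute
    # averages; compare "CRITICAL - load average: 9.36, 5.48, 3.60" above
    /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,6,4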
[18:07:27] RECOVERY - misc3 Lizardfs Master Port 1 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9419
[18:07:42] RECOVERY - misc3 Lizardfs Master Port 3 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9421
[18:08:00] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[18:08:11] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.005 second response time
[18:08:11] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.025 second response time
[18:08:12] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.396 second response time
[18:08:12] paladox: can you drop in #wikimedia-operations and tell them to see https://phabricator.wikimedia.org/T232224
[18:08:12] RECOVERY - misc3 Lizardfs Master Port 2 on misc3 is OK: TCP OK - 0.002 second response time on 185.52.1.71 port 9420
[18:08:17] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.396 second response time
[18:08:25] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 2%
[18:08:26] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.676 second response time
[18:08:29] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[18:08:44] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[18:12:42] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[18:12:47] RhinosF1: they are aware
[18:13:05] paladox: thx
[18:14:28] i didn't tell them that task, but they are both aware and also they can see you've created a task
[18:14:33] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw4
[18:15:27] RhinosF1, irony being I got a timeout error when first attempting to view that task :P
[18:18:41] I couldn't view it at all, I thought it was just me lol
[18:18:41] Voidwalker: you would, as phabricator is behind varnish :)
[18:19:01] Voidwalker: phab slows for me but works
[18:19:01] What's up with it?
[18:19:42] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[18:19:42] * RhinosF1 was going to check the logs for -operations but ha, *.wmflabs.org is down
[18:19:42] then eqiad is down
[18:19:42] Voidwalker, I get it, you like hitting delete, but there's no need to nuke the entirety of everything
[18:19:42] DoS attack
[18:19:42] They suspect DoS
[18:19:43] WMF is being DDoS'd?
[18:19:43] I'm getting in IRC, I got to see -operations
[18:19:43] Well no confirmation
[18:19:43] wouldn't be the first time
[18:19:43] It is since I've been around (I think?)
[18:19:43] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[18:19:43] Wait so is enwiki down?
[18:19:43] yes
[18:19:43] paladox: ops clinic just told me
[18:19:43] the whole cluster is
[18:19:43] so wmflabs/wikimedia/wikipedia
[18:19:43] there goes my easy access to IP lookups :P
[18:19:43] 19:17:06 It's a dos attack. Everybody is online looking at it
[18:21:04] Rip ZppixBot
[18:21:08] Oh nvm
[18:21:16] Probably part of the hit
[18:22:22] Can't they just do a DC switchover, paladox
[18:22:22] To update: it's confirmed a DoS attack not DDoS
[18:22:22] that makes it easier to deal with
[18:25:06] I think they are back (for now?)
[18:25:06] Not for me
[18:25:06] Thanks
[18:25:06] So.. is this channel for unsubstantiated speculations?
[18:25:06] :lol:
[18:25:27] This is a known issue. The SRE team is finding a quick solution to restore these services. Thanks
[18:25:27] Hello ShakespeareFan00! If you have any questions feel free to ask and someone should answer soon.
[18:25:27] ShakespeareFan00: no live information on the incident
[18:25:27] Poor ZppixBot is slow because of the DoS
[18:25:27] lol ZppixBot
[18:25:27] * Zppix pats ZppixBot
[18:25:27] It's okay buddy
[18:25:27] * RhinosF1 goes and gives ZppixBot a boost
[18:25:27] Is it only Wikimedia sites?
[18:25:27] ShakespeareFan00: wmflabs/wikimedia/wikipedia
[18:36:43] it's confirmed DDoS
[18:37:44] Who the hell DDoSes WMF? Like what's the point
[18:39:16] paladox: DDoS? Clinic told me DoS rather than DDoS explicitly when I asked
[18:39:53] Zppix: Wikipedia has coverage of things certain entities do not like
[18:40:13] RhinosF1: BBlack confirmed it was DDoS
[18:40:26] Zppix: k
[18:41:32] per standing policy, we are unlikely to hear much about an ongoing investigation
[18:42:13] Yep, clinic have just said DDoS but complicated
[18:43:47] I would not be at all surprised if it's linked to something wikipedia published recently
[18:43:58] ShakespeareFan00: I doubt it
[18:44:14] It's probably some LTA
[18:45:14] Zppix: it must be big
[18:45:32] PROBLEM - mw4 Current Load on mw4 is WARNING: WARNING - load average: 3.72, 3.03, 2.45
[18:47:22] RhinosF1: Is there a way to see where the traffic is coming from geographically?
[18:47:32] RECOVERY - mw4 Current Load on mw4 is OK: OK - load average: 1.75, 2.61, 2.37
[18:47:58] ShakespeareFan00: not that I know of
[18:51:18] PROBLEM - wiki.articmetrosystem.tk - LetsEncrypt on sslhost is CRITICAL: Name or service not knownHTTP CRITICAL - Unable to open TCP socket
[19:01:10] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw3 mw4
[19:01:52] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw1
[19:02:01] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw1 mw4
[19:03:55] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[19:04:08] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 3 datacenters are down: 107.191.126.23/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb
[19:05:38] paladox: not us as well ^ ???
[19:05:43] nope
[19:05:54] we are not being DDoSed, if we were you'd see more than that :P
[19:06:13] paladox: well we are at least slow
[19:06:27] * RhinosF1 was being half serious half sarcastic
[19:06:42] that will be misc3 again
[19:06:53] we are looking to resolve this
[19:08:02] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[19:09:07] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[19:09:39] paladox: You said yesterday :)
[19:10:30] yeh, but resolving it is not as easy as it sounds. We are looking at replacements.
[19:11:34] paladox: ok, you spoke to RN didn't you?
[19:11:43] yes
[19:12:00] paladox: Weren't they going to change something?
[19:12:14] Yes, which they did.
[19:12:20] Doesn't seem to have helped much.
[19:12:42] I gather
[19:14:31] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw4
[19:15:02] !log depool mw4
[19:15:04] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw4
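(The log doesn't show how "!log depool mw4" was carried out. One plausible mechanism with Varnish is forcing the backend's health state from the admin interface; a sketch only, since Miraheze may instead depool through a puppet change as with the pooling earlier:)

    varnishadm backend.set_health boot.mw4 sick   # stop routing traffic to mw4
    varnishadm backend.list                       # verify mw4 now shows as sick
    # "backend.set_health boot.mw4 auto" later hands control back to the probe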
[19:16:45] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[19:17:14] PROBLEM - wiki.mxlinuxusers.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.mxlinuxusers.org' expires in 15 day(s) (Sun 22 Sep 2019 07:13:17 PM GMT +0000).
[19:17:27] [miraheze/ssl] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/fjjiF
[19:17:29] [miraheze/ssl] MirahezeSSLBot 8f9a1e8 - Bot: Update SSL cert for wiki.mxlinuxusers.org
[19:18:31] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[19:19:02] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[19:19:52] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[19:20:18] PROBLEM - storytime.jdstroy.cf - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'storytime.jdstroy.cf' expires in 15 day(s) (Sun 22 Sep 2019 07:16:30 PM GMT +0000).
[19:20:31] [miraheze/ssl] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/fjjix
[19:20:32] [miraheze/ssl] MirahezeSSLBot 00fc235 - Bot: Update SSL cert for storytime.jdstroy.cf
[19:20:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:23:14] RECOVERY - wiki.mxlinuxusers.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.mxlinuxusers.org' will expire on Thu 05 Dec 2019 06:17:21 PM GMT +0000.
[19:24:18] RECOVERY - storytime.jdstroy.cf - LetsEncrypt on sslhost is OK: OK - Certificate 'storytime.jdstroy.cf' will expire on Thu 05 Dec 2019 06:20:25 PM GMT +0000.
[19:24:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 4 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[19:24:58] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[19:25:29] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[19:26:14] !log reboot misc3
[19:27:20] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[19:27:27] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
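(The MirahezeSSLBot pushes above reissue Let's Encrypt certificates once the 15-day expiry warning fires. A hand-run equivalent with certbot; the client choice and webroot path are assumptions, as the bot may drive a different ACME client:)

    certbot certonly --webroot -w /var/www/letsencrypt -d wiki.mxlinuxusers.org
    # the reissued certificate is then committed to the miraheze/ssl repo,
    # matching the "Bot: Update SSL cert for ..." commits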
[19:28:09] PROBLEM - misc3 Lizardfs Master Port 2 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9420: Connection refused
[19:29:22] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.634 second response time
[19:30:09] RECOVERY - misc3 Lizardfs Master Port 2 on misc3 is OK: TCP OK - 0.007 second response time on 185.52.1.71 port 9420
[19:30:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:30:56] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[19:31:08] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[19:31:19] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[19:32:25] !log reboot lizardfs4
[19:34:24] PROBLEM - lizardfs4 SSH on lizardfs4 is CRITICAL: connect to address 81.4.122.238 and port 22: Connection refused
[19:34:41] PROBLEM - lizardfs4 Current Load on lizardfs4 is CRITICAL: connect to address 81.4.122.238 port 5666: Connection refusedconnect to host 81.4.122.238 port 5666: Connection refused
[19:34:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[19:34:53] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[19:34:56] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:34:57] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[19:35:06] PROBLEM - lizardfs4 Lizardfs Chunkserver Port on lizardfs4 is CRITICAL: connect to address 81.4.122.238 and port 9422: Connection refused
[19:35:07] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:07] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:35:08] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:10] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[19:35:14] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:21] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 71%
[19:35:35] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 67%
[19:35:38] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:42] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:51] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:52] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[19:35:55] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:36:16] PROBLEM - lizardfs4 Puppet on lizardfs4 is CRITICAL: connect to address 81.4.122.238 port 5666: Connection refusedconnect to host 81.4.122.238 port 5666: Connection refused
[19:36:19] PROBLEM - lizardfs4 Disk Space on lizardfs4 is CRITICAL: connect to address 81.4.122.238 port 5666: Connection refusedconnect to host 81.4.122.238 port 5666: Connection refused
[19:36:23] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:36:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[19:36:50] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:37:29] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 36%
[19:37:54] uh, ssh hasn't come back up
[19:38:18] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[19:38:45] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[19:39:06] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[19:39:06] RECOVERY - lizardfs4 Lizardfs Chunkserver Port on lizardfs4 is OK: TCP OK - 0.001 second response time on 81.4.122.238 port 9422
[19:39:17] it's back
[19:39:35] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 44%
[19:40:17] RECOVERY - lizardfs4 Puppet on lizardfs4 is OK: OK: Puppet is currently enabled, last run 8 minutes ago with 0 failures
[19:40:19] RECOVERY - lizardfs4 Disk Space on lizardfs4 is OK: DISK OK - free space: / 72802 MB (23% inode=90%);
[19:40:24] RECOVERY - lizardfs4 SSH on lizardfs4 is OK: SSH OK - OpenSSH_7.9p1 Debian-10 (protocol 2.0)
[19:40:41] RECOVERY - lizardfs4 Current Load on lizardfs4 is OK: OK - load average: 1.10, 0.47, 0.18
[19:41:16] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.392 second response time
[19:41:19] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.006 second response time
[19:41:35] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 62%
[19:41:55] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 5.146 second response time
[19:42:01] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.683 second response time
[19:42:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 51%
[19:43:05] having a lot of downtime huh?
[19:43:05] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.400 second response time
[19:43:27] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.004 second response time
[19:43:35] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 6%
[19:43:41] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.057 second response time
[19:43:51] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[19:43:52] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.634 second response time
[19:44:07] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.665 second response time
[19:44:25] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 2%
[19:44:31] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[19:44:39] Voidwalker: yeh, we are looking at alternative file systems.
[19:44:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:44:49] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[19:44:55] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[20:21:12] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 5 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:21:17] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 1 datacenter is down: 81.4.109.133/cpweb
[20:23:06] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[20:23:15] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[20:31:38] paladox: do we run our own cpanel?
[20:31:48] we don't run cpanel
[20:31:58] cpanel is not open source and requires a different OS :)
[20:33:57] paladox: okay good then, see RN's Twitter
[20:34:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:34:51] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw1 mw2
[20:34:53] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 5 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:35:20] FYI I wrote a section on enwiki's village pump about the outage https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Wikimedia_down
[20:35:20] [WIKIPEDIA] Wikipedia:Village pump (technical)#Wikimedia down | "..."
[20:35:24] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw1
[20:35:43] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw1
[20:36:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[20:36:46] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[20:36:50] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[20:37:24] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[20:37:40] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[20:38:05] we don't run exim as far as I know
[20:38:09] Zppix: when half of us can't get on
[20:38:32] RhinosF1: Well you know I'm doing what I can to stop people from complaining to operations
[20:38:49] An email has gone out to wikitech-l at least
[20:39:07] not everyone gets wikitech-l
[20:40:27] wikitech-l is a developer forum
[20:40:30] Nah, Zppix, https://www.independent.co.uk/life-style/gadgets-and-tech/wikipedia-down-not-working-google-stopped-page-loading-encyclopedia-a9095236.html has gone out but it's not amazing
[20:40:30] [ Wikipedia down: Online encyclopedia not working as pages fail to load for some users | The Independent ] - www.independent.co.uk
[20:40:39] Wikimedia-l might have been better
[20:41:18] RhinosF1: that's because they couldn't get a hold of WMF for comment
[20:41:55] Zppix: are you surprised? Hopefully they're busy
[20:42:09] RhinosF1: No I'm not
[20:42:30] It's probably good the report is sketchy
[20:47:19] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw1 mw3
[20:48:09] !log reboot lizardfs5
[20:48:39] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:48:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:49:53] paladox: ^ 503
[20:50:01] yup, aware
[20:50:10] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:50:12] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:50:16] PROBLEM - lizardfs5 Disk Space on lizardfs5 is CRITICAL: connect to address 81.4.122.196 port 5666: Connection refusedconnect to host 81.4.122.196 port 5666: Connection refused
[20:50:17] PROBLEM - lizardfs5 Current Load on lizardfs5 is CRITICAL: connect to address 81.4.122.196 port 5666: Connection refusedconnect to host 81.4.122.196 port 5666: Connection refused
[20:50:19] PROBLEM - lizardfs5 Lizardfs Chunkserver Port on lizardfs5 is CRITICAL: connect to address 81.4.122.196 and port 9422: Connection refused
[20:50:26] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[20:50:31] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[20:50:40] PROBLEM - lizardfs5 Puppet on lizardfs5 is CRITICAL: connect to address 81.4.122.196 port 5666: Connection refusedconnect to host 81.4.122.196 port 5666: Connection refused
[20:51:02] PROBLEM - lizardfs5 SSH on lizardfs5 is CRITICAL: connect to address 81.4.122.196 and port 22: Connection refused
[20:51:13] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:51:27] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:51:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 83%
[20:51:36] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:51:41] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:51:46] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:51:53] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:52:12] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:52:13] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:52:18] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 49%
[20:52:23] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:52:26] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:53:29] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 25%
[20:54:06] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.006 second response time
[20:54:17] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 67%
[20:54:18] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.009 second response time
[20:54:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 83%
[20:56:15] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 45%
[20:56:25] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 38%
[20:57:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 72%
[20:58:13] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 83%
[20:58:40] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:01] paladox: today is the day of connection loss, isn't it
[20:59:06] yeh
[20:59:51] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 9.446 second response time
[21:00:16] paladox: if there's anything I can do, lmk
[21:00:20] It's honestly everywhere today
[21:00:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 70%
[21:00:28] nothing much :P
[21:00:32] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.077 second response time
[21:00:36] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.022 second response time
[21:00:38] k6ka: well if you'd stop pressing all the red buttons
[21:00:41] My own Internet connectivity has not been consistent today
[21:01:12] Zppix: say, I found a button that says "Emergency Stop", what does it do?
[21:01:18] couldn't hurt to press it, I mean. It looks harmless.
[21:01:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 41%
[21:01:44] Hi, lol. Around?
[21:01:45] lol
[21:01:49] k6ka: try going to English Wikipedia from the EU and there's your answer
[21:02:07] I'm in Canada so funny enough Wikipedia is working fine for me
[21:02:10] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 39%
[21:02:27] k6ka: that's because the EU is what's mainly affected rn
[21:02:28] I'm noticing freenode is having issues too. My ZNC pinged out and so did half a dozen bots.
[21:02:41] I note the Middle East is out
[21:02:48] according to Wikipedia
[21:02:55] ShakespeareFan00: I include that when I say EU
[21:02:57] *sorry, according to the Independent
[21:03:14] Did someone say something iffy about a regime?
[21:03:19] ShakespeareFan00: The entire other side of the world from the US is down AFAIK
[21:03:28] k6ka: all our mw* servers are based in the Netherlands so we'll be impacted
[21:03:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 78%
[21:03:37] No ETA as far as I know
[21:03:45] Zppix: I wonder..
[21:04:05] Wikipedia is currently blocked in PRC right?
[21:04:11] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:04:12] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:04:14] ShakespeareFan00: Possibly?
[21:04:14] ShakespeareFan00: Yep
[21:04:25] (But those with VPN can work around the Chinese Firewall?)
[21:04:54] Okay... just wondering where the VPN accesses
[21:04:57] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:04:58] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:05:02] ShakespeareFan00: if you have (G)IPBE
[21:05:05] Through EU or through US/Japan lines?
[21:05:19] (G)IPBE?
[21:05:20] ShakespeareFan00: depends on the VPN server I assume
[21:05:33] Okay just wondering if anyone used a VPN to check on loading times.
[21:06:05] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 85%
[21:06:13] ShakespeareFan00: (Global) IP Block Exemption - without it you can't edit from behind a VPN
[21:06:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 51%
[21:06:28] Is there a direct link to Florida/Virginia I could use to check something
[21:06:31] ?
[21:06:54] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.003 second response time
[21:07:27] ShakespeareFan00: you can try to find a VPN through Google or ask Zppix to screenshot it
[21:07:40] ShakespeareFan00: what do you need?
[21:08:02] ShakespeareFan00: web.archive.org is able to access enwiki
[21:08:03] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 59%
[21:08:04] Zppix: A link that doesn't go through Europe?
[21:08:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 70%
[21:08:30] And also one that ideally doesn't go through Asia
[21:08:59] ShakespeareFan00: if you **need** access to enwiki or other projects, web.archive.org has been reported to be able to access enwiki
[21:08:59] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.007 second response time
[21:09:07] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 2.316 second response time
[21:09:51] Okay
[21:10:02] Zppix: Have you seen this month's Signpost?
[21:10:02] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 68% [21:10:14] ShakespeareFan00: I saw some of it [21:10:28] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time [21:10:30] There are three items in it that could have motivated certain elements [21:10:38] ShakespeareFan00: somewhat, do you want a screenshot? [21:10:52] Nope... It finally loaded for me [21:10:53] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.658 second response time [21:11:03] ShakespeareFan00: don't expect it to be a stable connection [21:11:10] Zppix: I'm not [21:11:13] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.393 second response time [21:11:16] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:11:24] PROBLEM - mw1 Puppet on mw1 is UNKNOWN: UNKNOWN: Failed to check. Reason is: failed_to_parse_summary_file [21:11:27] Zppix: I assume you would be capable of determining which three items may have caused issues [21:11:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 51% [21:11:37] Not saying the outage is linked, but. [21:11:53] Let's not speculate too much [21:12:09] ShakespeareFan00: if I read it, probably [21:12:37] RhinosF1: I mean, to be fair, it's not speculation; obviously something set someone off, you just don't wake up one day and say let's DDoS the WMF [21:12:37] I don't read Signpost, as I am blocked on enwikipedia until 2021 at the earliest [21:12:46] ShakespeareFan00: what did you do? [21:13:09] Zppix: Self-block request after I over-reacted on a policy issue [21:13:14] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.025 second response time [21:13:18] Zppix: well true [21:13:20] (in effect a self-requested competence block) [21:13:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 69% [21:13:50] Sorry, should have said I don't read the Signpost on a regular basis [21:13:55] ShakespeareFan00: See, self-block wouldn't work for me; I'd have to have operations block my IP from accessing the webserver [21:13:57] What I'm trying to say is: don't try to blame certain groups or something [21:13:59] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 49% [21:14:35] Zppix: (re incident) especially on a scale that takes the site offline [21:14:37] Wikipedia is back up, see? https://xkcd.com/903/ [21:14:39] https://xkcd.com/903 | Extended Mind | Alt-text: Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at "Philosophy". [21:14:40] effectively... [21:15:00] Zppix: Have sites like dewikipedia mentioned issues? [21:15:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 56% [21:15:32] ShakespeareFan00: yes, all Wikimedia projects [21:15:43] Hmm [21:15:44] ShakespeareFan00: most Wikimedia sites are down [21:15:48] But the US is still up [21:15:50] for now
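The xkcd 903 alt-text quoted above is effectively a graph-walk algorithm: repeatedly follow the first body-text wikilink that is not parenthesised or italicised. A rough sketch against the public MediaWiki parse API; the wikitext stripping is a crude approximation (templates, tables, and nested markup need real parsing), the User-Agent string and start article are placeholders, and real runs can diverge from the "Philosophy" folklore:

```python
import json
import re
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
UA = {"User-Agent": "first-link-sketch/0.1 (demo)"}   # placeholder UA

def wikitext(title: str) -> str:
    """Fetch a page's wikitext via the public MediaWiki parse API."""
    qs = urllib.parse.urlencode({
        "action": "parse", "page": title, "prop": "wikitext",
        "format": "json", "formatversion": "2",
    })
    req = urllib.request.Request(f"{API}?{qs}", headers=UA)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["parse"]["wikitext"]

def first_link(text: str) -> str | None:
    """Crude stand-in for 'first link not in parentheses or italics'."""
    for pattern in (r"\{\{[^{}]*\}\}", r"\([^()]*\)", r"''[^']*''"):
        prev = None
        while prev != text:                  # peel nested structures
            prev, text = text, re.sub(pattern, "", text)
    for m in re.finditer(r"\[\[([^\]|#]+)", text):
        target = m.group(1).strip()
        if ":" not in target:                # skip File:, Category:, ...
            return target
    return None

def walk(start: str, limit: int = 60) -> None:
    seen, title = set(), start
    while title and title not in seen and len(seen) < limit:
        print(title)
        seen.add(title)
        # Redirect pages are plain "#REDIRECT [[Target]]" wikitext,
        # so first_link() follows them naturally.
        title = first_link(wikitext(title))

if __name__ == "__main__":
    walk("Alan Turing")   # illustrative start; often ends at "Philosophy"
```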
[21:15:52] *.wmflabs is up [21:15:54] ShakespeareFan00: it is, yes [21:15:56] Interesting [21:15:57] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 68% [21:16:10] ShakespeareFan00: it's esams, as far as I know, that's been hit [21:16:16] RhinosF1: *.wmflabs is up, but barely; it was a little iffy at first [21:16:49] Zppix: I've been fine for a bit with the -operations wm-bot logs open [21:17:20] RhinosF1: It seems to be more stable rn, but especially when the US was down wmflabs was all over the place [21:17:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [21:17:35] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:17:41] https://grafana.wikimedia.org/ is still not fully up for me, so I don't have graphs again yet [21:17:42] [ Grafana ] - grafana.wikimedia.org [21:18:05] ShakespeareFan00: Let me give you a summary: everything is ####ed [21:18:16] xD [21:19:08] What Zppix said [21:19:13] Has consideration been given to essentially just pulling the plug? [21:19:19] paladox: what's up with us? [21:19:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 54% [21:19:29] ShakespeareFan00: let's let SRE do their job [21:19:32] I'm working on that one rn :) [21:19:32] I.e. just shut down over the weekend? [21:19:41] ShakespeareFan00: NO [21:19:47] paladox: bandwidth or? [21:19:49] hi paladox [21:19:50] ShakespeareFan00: It wouldn't matter; as soon as the DC goes back up, if they are still sending packets it would still continue the attack [21:20:02] Hmm [21:20:13] hi, can't talk much, trying to resolve this big issue [21:20:18] Paladox: Are you -operations? [21:20:28] I'm in there but I'm not operations [21:20:34] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time [21:20:43] I appreciate -operations can't talk about an active incident [21:22:27] ShakespeareFan00: what I've told you is all that has been publicly released and frankly all I know [21:22:41] Hmm - https://archive.is/J7GGB seems to be down as well [21:23:05] It was mentioned in the Phabricator ticket [21:23:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 79% [21:24:51] wikipedia :/ [21:25:04] hispano76_: and us [21:25:25] hispano76_: Yes, they know it's down, they're working hard to fix it! [21:25:44] ;( [21:26:14] and I plan on staying up as long as I can [21:26:53] RhinosF1: hey, look at it this way: it's harder to vandalize Wikipedia if half the world can't access it [21:27:01] RhinosF1: Is Miraheze affected as well? [21:27:04] Zppix: well yeah [21:27:15] ShakespeareFan00: that's a separate issue paladox is working on [21:27:24] ShakespeareFan00: InstantCommons is down and Miraheze is having a separate issue [21:27:30] so yes, in a way [21:27:30] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 54% [21:27:49] Oh... Because Miraheze uses Commons for some images... Got it [21:27:55] ? [21:28:26] ShakespeareFan00: yep, as we pointed out.
Our servers are getting content from the half of the world with issues [21:28:40] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:29:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 78% [21:33:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 43% [21:33:42] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 36% [21:34:42] let me depool mw1 [21:35:57] Is there anyone from WMF operations in this channel? [21:36:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 42% [21:37:00] ShakespeareFan00: if they are they'd be busy; what's up? [21:37:07] ShakespeareFan00: samwilson or revi both work closely with the WMF but Zppix has NDA access as well. I doubt they can/will answer though [21:37:24] PROBLEM - mw1 Puppet on mw1 is CRITICAL: connect to address 185.52.1.75 port 5666: Connection refusedconnect to host 185.52.1.75 port 5666: Connection refused [21:37:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [21:37:32] PROBLEM - mw1 Disk Space on mw1 is CRITICAL: connect to address 185.52.1.75 port 5666: Connection refusedconnect to host 185.52.1.75 port 5666: Connection refused [21:37:34] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: connect to address 185.52.1.75 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [21:37:38] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 75% [21:37:39] RhinosF1: I appreciate they can't say much about an ongoing incident [21:38:01] Zppix: can I read -operations right that they've just depooled it? [21:38:06] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: HTTP CRITICAL - No data received from host [21:38:09] PROBLEM - mw1 SSH on mw1 is CRITICAL: connect to address 185.52.1.75 and port 22: Connection refused [21:38:18] Zppix: 21:33 cdanis: cdanis@mw1317.eqiad.wmnet ~ 🕠🍺 sudo -i depool [21:38:21] RhinosF1: they depooled one of many servers but idk why or if it matters [21:38:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 74% [21:38:28] Zppix: k [21:38:40] so I can at least read SAL [21:38:44] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: HTTP CRITICAL - No data received from host [21:39:02] PROBLEM - mw1 Current Load on mw1 is CRITICAL: connect to address 185.52.1.75 port 5666: Connection refusedconnect to host 185.52.1.75 port 5666: Connection refused [21:39:04] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: HTTP CRITICAL - No data received from host [21:39:06] PROBLEM - mw1 php-fpm on mw1 is CRITICAL: connect to address 185.52.1.75 port 5666: Connection refusedconnect to host 185.52.1.75 port 5666: Connection refused [21:39:19] PROBLEM - mw1 MirahezeRenewSsl on mw1 is CRITICAL: connect to address 185.52.1.75 and port 5000: Connection refused [21:39:28] icinga-miraheze: it's okay, I promise [21:39:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 47% [21:40:11] What was depooled? [21:40:21] Zppix: it's okay 4 me now [21:40:26] (wikimedia) [21:40:34] ShakespeareFan00: some random mw* server [21:40:35] ShakespeareFan00: for Miraheze or Wikimedia?
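Most of the monitoring noise in this stretch, "connect to address ... port 5666: Connection refused", "Socket timeout after 10 seconds", "TCP OK - 0.001 second response time", boils down to one primitive: a timed TCP connect against a service port (5666 is the NRPE agent; 9419-9422 are the LizardFS master and chunkserver ports seen elsewhere in this log). A self-contained sketch of that primitive in the same reporting style; the host/port in the usage line is illustrative:

```python
import socket
import time

def check_tcp(host: str, port: int, timeout: float = 10.0) -> str:
    """Nagios-style TCP check: connect, time it, report one status line."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
        return f"TCP OK - {elapsed:.3f} second response time on {host} port {port}"
    except socket.timeout:
        return f"CRITICAL - Socket timeout after {timeout:.0f} seconds"
    except OSError as exc:
        return f"CRITICAL - connect to address {host} and port {port}: {exc.strerror or exc}"

if __name__ == "__main__":
    # Illustrative target: an NRPE agent listens on TCP 5666.
    print(check_tcp("127.0.0.1", 5666))
```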
[21:40:41] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.394 second response time [21:40:50] RhinosF1: being okay now means nothing [21:41:04] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.005 second response time [21:41:08] Zppix: well yeah, I've seen that throughout today [21:41:18] mw1 is back [21:41:25] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.015 second response time [21:41:29] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 39% [21:41:35] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 6% [21:41:36] paladox: can confirm [21:41:42] OK, Grafana is up [21:41:58] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 1.788 second response time [21:42:03] RECOVERY - mw1 SSH on mw1 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u6 (protocol 2.0) [21:42:11] file uploads won't work atm [21:42:15] that's known [21:42:25] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 21% [21:42:39] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [21:42:42] codfw seems to have had some spikes, as has esams, as we know [21:42:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [21:45:39] Zppix: https://twitter.com/UKDrillas [21:45:39] [ UkDrillas (@UKDrillas) | Twitter ] - twitter.com [21:46:07] RhinosF1: it's fake [21:46:14] RhinosF1: ignore it and don't give it attention [21:46:31] Zppix: Even if it's fake, call the cops on them [21:46:46] ShakespeareFan00: it's not illegal unless they truly did it [21:46:59] Zppix: The UK has laws about hoax claims [21:47:05] otherwise every angry 12-year-old on video games would be in jail [21:47:14] ShakespeareFan00: Twitter is in the US though [21:47:54] Jurisdictionally it would be a UK matter, and Twitter would have to co-operate with a UK investigation [21:47:57] ShakespeareFan00: T&S are aware and the account has been reported [21:48:27] If it's genuine, I hope those responsible end up in jail for a long time [21:49:04] ShakespeareFan00: Sutherland is aware and I'm sure they'll do what's needed. I've reported the tweet to Twitter [21:49:28] Sutherland? [21:49:32] Trust me, legal and Trust and Safety are going to be all over it for a while [21:49:37] ShakespeareFan00: foks on IRC [21:49:54] We're coming back up slowly... [21:50:19] And T&S can't comment on active incidents or investigations...
[21:50:32] ShakespeareFan00: Joe Sutherland, Wikimedia Trust and Safety (English Wikipedia admin: Fox), on IRC now as foks [21:50:39] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [21:50:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [21:50:50] RhinosF1: Okay [21:50:58] RhinosF1 s/fox/foks [21:51:14] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.007 second response time [21:51:22] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.004 second response time [21:51:27] Zppix: personal account is fox [21:51:41] Zppix: "This is Wikimedia, My name's Joe, I carry a bit.." LOL [21:51:44] Sorry [21:51:52] ? [21:51:54] But the Dragnet intro suddenly came to mind [21:52:20] k6ka yeh, we are currently having issues with one of our data nodes, which took the entire thing down. [21:52:34] I've unmounted a mount so that mw1 could come back online [21:52:35] Zppix: Not seen Dragnet? [21:52:39] nope [21:52:49] though images may appear missing and/or you won't be able to upload. [21:53:06] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 9.743 second response time [21:53:06] Oh... sorry and never mind... but those here that have will understand my joke [21:55:33] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:55:44] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:56:31] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 6.106 second response time [21:56:33] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.005 second response time [21:56:39] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [21:56:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [21:57:25] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:00:23] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:00:57] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:00:58] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:01:06] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:01:15] PROBLEM - mw2 php-fpm on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:01:38] PROBLEM - mw2 Disk Space on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
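paladox's fix above, unmounting a stuck share so mw1 could come back, is the standard remedy when a network filesystem hangs: any process that touches the dead mountpoint blocks, so the whole box looks down even though only the mount is. A sketch of detecting and lazily detaching such a mount; the mountpoint path is borrowed from later in this log, the detection approach is an assumption about how one might script it, and the unmount itself needs root:

```python
import os
import subprocess
import threading

def responds(path: str, timeout: float = 5.0) -> bool:
    """Probe a mountpoint with stat() from a throwaway daemon thread.

    A healthy mount answers instantly; a hung network mount blocks
    stat() indefinitely, so the thread simply never sets the event."""
    done = threading.Event()

    def probe() -> None:
        try:
            os.stat(path)
        except OSError:
            pass          # an error is still an answer, only hangs count
        done.set()

    threading.Thread(target=probe, daemon=True).start()
    return done.wait(timeout)

def lazy_unmount(path: str) -> None:
    # "umount -l" detaches the mountpoint immediately and finishes the
    # cleanup once nothing is using it any more; requires root.
    subprocess.run(["umount", "-l", path], check=True)

if __name__ == "__main__":
    mountpoint = "/mnt/mediawiki-static"   # path as seen later in this log
    if not responds(mountpoint):
        print(f"{mountpoint} is hung, detaching")
        lazy_unmount(mountpoint)
```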
[22:01:52] PROBLEM - mw2 Current Load on mw2 is CRITICAL: connect to address 185.52.2.113 port 5666: Connection refusedconnect to host 185.52.2.113 port 5666: Connection refused [22:02:57] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 2.075 second response time [22:02:59] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 5.445 second response time [22:03:01] PROBLEM - mw2 Puppet on mw2 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 10 minutes ago with 0 failures [22:03:13] RECOVERY - mw2 php-fpm on mw2 is OK: PROCS OK: 6 processes with command name 'php-fpm7.2' [22:03:25] PROBLEM - mw1 Puppet on mw1 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 1 hour ago with 0 failures [22:03:27] RECOVERY - mw1 Disk Space on mw1 is OK: DISK OK - free space: / 29670 MB (38% inode=98%); [22:03:29] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.393 second response time [22:03:35] RECOVERY - mw2 Disk Space on mw2 is OK: DISK OK - free space: / 47807 MB (62% inode=99%); [22:03:38] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.007 second response time [22:03:51] RECOVERY - mw2 Current Load on mw2 is OK: OK - load average: 0.87, 0.24, 0.08 [22:03:51] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy [22:03:53] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.004 second response time [22:03:55] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.392 second response time [22:04:00] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.633 second response time [22:05:03] PROBLEM - mw1 Current Load on mw1 is WARNING: WARNING - load average: 5.07, 7.09, 6.09 [22:05:07] RECOVERY - mw1 php-fpm on mw1 is OK: PROCS OK: 13 processes with command name 'php-fpm7.2' [22:07:20] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:07:24] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:07:30] hi JohnLewis [22:07:49] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:07:52] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw3 [22:08:45] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:08:48] paladox: 503 again lol [22:08:52] huh [22:09:02] RECOVERY - mw1 Current Load on mw1 is OK: OK - load average: 4.70, 6.57, 6.15 [22:09:09] works for me [22:09:14] PROBLEM - mw3 SSH on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:09:24] I'm getting cp2, paladox [22:09:30] PROBLEM - mw3 Disk Space on mw3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:09:56] PROBLEM - mw3 JobChron Service on mw3 is CRITICAL: connect to address 81.4.121.113 port 5666: Connection refusedconnect to host 81.4.121.113 port 5666: Connection refused [22:10:03] should probably issue a notice about what's going on [22:10:37] PROBLEM - mw3 JobRunner Service on mw3 is CRITICAL: PROCS CRITICAL: 0 processes with args 'redisJobRunnerService' [22:10:37] Voidwalker: I did a post on CN for the InstantCommons issues but yeah...
we need to be up [22:10:40] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.007 second response time [22:10:44] PROBLEM - mw3 Puppet on mw3 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 8 minutes ago with 0 failures [22:11:01] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy [22:11:11] RECOVERY - mw3 SSH on mw3 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u6 (protocol 2.0) [22:11:17] Voidwalker: do you want to do Discord? [22:11:26] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.005 second response time [22:11:28] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.691 second response time [22:11:28] RECOVERY - mw3 Disk Space on mw3 is OK: DISK OK - free space: / 52944 MB (68% inode=99%); [22:11:38] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.418 second response time [22:11:51] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy [22:12:31] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy [22:14:56] RhinosF1, done, considering a central notice alert/sitenotice as well [22:15:44] Voidwalker: okay, Central Notice might be better - maybe mention images not showing properly as well (due to Commons being down) [22:16:23] ALL images will be down because our file service is basically broken [22:16:46] it's not only WMF Commons that's causing issues with images [22:17:26] Voidwalker: okay, well don't just say uploads might be broken is what I mean [22:30:06] PROBLEM - bacula1 Puppet on bacula1 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 8 minutes ago with 0 failures [22:38:10] last call for any issues before I sleep! It's quiet so I'm taking the chance [22:39:03] RhinosF1: the sky is falling [22:39:27] Zppix: honestly, today's the day ANYTHING could happen [22:39:33] Lol [22:39:50] central notice should be up [22:39:57] Zppix: craxzy week near enough [22:40:02] s/x/ [22:40:02] RhinosF1 meant to say: Zppi: craxzy week near enough [22:40:10] that worked [22:40:11] Lol [22:42:02] Zppix: at least it's Friday [22:45:35] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fjjMI [22:45:36] [02miraheze/puppet] 07paladox 0342fa4d4 - Update site.pp [22:45:43] * RhinosF1 hides [22:48:10] RECOVERY - lizardfs5 Disk Space on lizardfs5 is OK: DISK OK - free space: / 72763 MB (23% inode=90%); [22:48:11] RECOVERY - lizardfs5 Current Load on lizardfs5 is OK: OK - load average: 1.05, 0.52, 0.22 [22:48:11] RECOVERY - lizardfs5 Lizardfs Chunkserver Port on lizardfs5 is OK: TCP OK - 0.005 second response time on 81.4.122.196 port 9422 [22:51:45] Zppix, check out https://meta.miraheze.org/wiki/User:Void/massGBlock.js [22:51:46] [ User:Void/massGBlock.js - Miraheze Meta ] - meta.miraheze.org [22:52:10] PROBLEM - lizardfs5 Disk Space on lizardfs5 is CRITICAL: connect to address 81.4.122.196 port 5666: Connection refusedconnect to host 81.4.122.196 port 5666: Connection refused [22:52:11] PROBLEM - lizardfs5 Current Load on lizardfs5 is CRITICAL: connect to address 81.4.122.196 port 5666: Connection refusedconnect to host 81.4.122.196 port 5666: Connection refused [22:52:12] Ok [22:52:37] Ty [22:54:04] PROBLEM - puppet1 Puppet on puppet1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_puppet]
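The "s/x/" exchange a little earlier is a nice illustration of sed-style IRC correction bots, including their classic trap: the substitution is applied to the first match in the line, which is why it clipped the "x" from "Zppix" instead of fixing "craxzy". A minimal sketch of the core logic only, with nick tracking and IRC plumbing left out as assumptions about the surrounding bot:

```python
import re

def apply_correction(last_message: str, command: str) -> str | None:
    """Apply an s/pattern/replacement/ command to a user's last message.

    Like sed, only the first match is replaced unless a trailing 'g'
    flag is given, which is exactly why 's/x/' in this log mangled
    'Zppix' instead of fixing 'craxzy'."""
    m = re.fullmatch(r"s/((?:[^/\\]|\\.)*)/((?:[^/\\]|\\.)*)/?(g?)", command)
    if not m:
        return None                       # not a well-formed correction
    pattern, repl, flags = m.groups()
    count = 0 if "g" in flags else 1      # 0 means "replace all" in re.sub
    try:
        return re.sub(pattern, repl, last_message, count=count)
    except re.error:
        return None                       # bad pattern or replacement

if __name__ == "__main__":
    print(apply_correction("Zppix: craxzy week near enough", "s/x/"))
    # -> 'Zppi: craxzy week near enough', matching the bot output above
```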
[22:54:11] PROBLEM - lizardfs5 Lizardfs Chunkserver Port on lizardfs5 is CRITICAL: connect to address 81.4.122.196 and port 9422: Connection refused [22:57:02] RECOVERY - lizardfs5 SSH on lizardfs5 is OK: SSH OK - OpenSSH_7.9p1 Debian-10 (protocol 2.0) [22:58:10] RECOVERY - lizardfs5 Disk Space on lizardfs5 is OK: DISK OK - free space: / 72905 MB (23% inode=90%); [22:58:11] RECOVERY - lizardfs5 Current Load on lizardfs5 is OK: OK - load average: 1.28, 0.60, 0.23 [22:58:11] RECOVERY - lizardfs5 Lizardfs Chunkserver Port on lizardfs5 is OK: TCP OK - 0.011 second response time on 81.4.122.196 port 9422 [22:58:40] RECOVERY - lizardfs5 Puppet on lizardfs5 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:02:40] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fjjMW [23:02:42] [02miraheze/puppet] 07paladox 03b90404c - lizardfs:client: Add mfsdelayedinit to the options. This will make connecting to the lizardfs master happen in the background. [23:04:29] JohnLewis ^ [23:05:25] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:05:40] okay [23:06:56] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:07:19] RECOVERY - mw1 MirahezeRenewSsl on mw1 is OK: TCP OK - 0.001 second response time on 185.52.1.75 port 5000 [23:09:47] RECOVERY - mw3 JobChron Service on mw3 is OK: PROCS OK: 1 process with args 'redisJobChronService' [23:10:31] RECOVERY - mw3 JobRunner Service on mw3 is OK: PROCS OK: 1 process with args 'redisJobRunnerService' [23:10:36] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [23:11:48] File storage should be recovered now :) [23:14:13] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fjjM2 [23:14:14] [02miraheze/puppet] 07paladox 030d8a109 - Add mw4 to manifest [23:16:57] RECOVERY - mw4 Puppet on mw4 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [23:22:04] RECOVERY - puppet1 Puppet on puppet1 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [23:23:08] PROBLEM - test1 Puppet on test1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 19 seconds ago with 1 failures. Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static] [23:26:57] I've disabled the central notice [23:27:06] RECOVERY - test1 Puppet on test1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:27:25] Voidwalker: your script doesn't seem to work on Special:MassBlock, I get no interface [23:27:28] and I've cleared cache [23:27:50] Special:MassGlobalBlock ? [23:28:14] Voidwalker: the script page says Special:MassBlock xD [23:28:39] "Script that allows for mass (global) blocking via Special:MassGlobalBlock" [23:28:56] oh [23:29:02] I see where it's wrong [23:29:02] Yeah :P