[01:20:33] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw2 mw3
[01:21:56] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[01:21:58] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw1 mw3
[01:22:04] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[01:22:45] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[01:23:06] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 3 datacenters are down: 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 81.4.109.133/cpweb
[01:24:43] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.661 second response time
[01:24:59] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[01:25:56] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[01:25:58] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[01:26:04] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[01:26:33] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[01:54:18] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[01:54:33] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[01:55:15] PROBLEM - misc3 Puppet on misc3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/logrotate.d/nginx]
[01:56:18] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[01:56:33] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[02:05:14] RECOVERY - misc3 Puppet on misc3 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[02:16:07] !log increasing swap size on misc3 to 768
[02:16:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[02:16:27] !log a small downtime will happen (this is being done because misc3 OOM)
[02:16:32] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[02:19:25] PROBLEM - misc3 Lizardfs Master Port 3 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9421: Connection refused
[02:19:31] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[02:19:34] PROBLEM - misc3 Lizardfs Master Port 2 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9420: Connection refused
[02:19:52] PROBLEM - misc3 Lizardfs Master Port 1 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9419: Connection refused
[02:20:11] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:12] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
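(The 02:16 !log above doesn't record the exact commands used. As a minimal sketch, growing swap via a swap file on a Debian host could look like the following, where the /swapfile path and "768" meaning megabytes are both assumptions:)

    swapoff /swapfile                              # take the old swap file offline
    dd if=/dev/zero of=/swapfile bs=1M count=768   # recreate it at the new size
    chmod 600 /swapfile                            # swap must not be world-readable
    mkswap /swapfile                               # write the swap signature
    swapon /swapfile                               # bring it back online
    free -m                                        # confirm the new swap total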
[02:20:13] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:13] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[02:20:14] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:14] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:21] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[02:20:21] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 41%
[02:20:22] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:25] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:29] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:32] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[02:20:44] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:20:50] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 90%
[02:20:54] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[02:21:20] RECOVERY - misc3 Lizardfs Master Port 3 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9421
[02:21:23] Hello Bloodstream! If you have any questions feel free to ask and someone should answer soon.
[02:21:34] RECOVERY - misc3 Lizardfs Master Port 2 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9420
[02:21:52] RECOVERY - misc3 Lizardfs Master Port 1 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9419
[02:22:07] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.016 second response time
[02:22:08] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[02:22:10] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.024 second response time
[02:22:10] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.670 second response time
[02:22:11] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.646 second response time
[02:22:13] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[02:22:18] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[02:22:20] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.004 second response time
[02:22:21] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 2%
[02:22:23] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.393 second response time
[02:22:28] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.393 second response time
[02:22:28] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[02:22:44] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.715 second response time
[02:22:47] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 5%
[02:22:54] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.400 second response time
[02:23:19] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[05:04:53] [miraheze/ManageWiki] translatewiki pushed 1 commit to master [+0/-0/±5] https://git.io/fjj8j
[05:04:54] [miraheze/ManageWiki] translatewiki 3a3cec0 - Localisation updates from https://translatewiki.net.
[05:04:56] [miraheze/CreateWiki] translatewiki pushed 1 commit to master [+0/-0/±2] https://git.io/fjj4e
[05:04:57] [miraheze/CreateWiki] translatewiki cd67d10 - Localisation updates from https://translatewiki.net.
[05:04:59] [miraheze/WikiDiscover] translatewiki pushed 1 commit to master [+0/-0/±1] https://git.io/fjj4v
[05:05:00] [miraheze/WikiDiscover] translatewiki d4500f3 - Localisation updates from https://translatewiki.net.
[06:26:53] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 3063 MB (12% inode=94%);
[10:17:59] Hello A101! If you have any questions feel free to ask and someone should answer soon.
[10:19:37] Oh boy, the welcome bot remembers by nick and not by hostname :(
[10:26:52] Hi! Here is the list of currently open high priority tasks on Phabricator
[10:26:59] No updates for 6 days - https://phabricator.miraheze.org/T4504 - Renew *.miraheze.org cert - authored by Reception123, assigned to Southparkfan
[10:34:37] RECOVERY - reviwiki.info - PositiveSSLDV on sslhost is OK: OK - Certificate 'reviwiki.info' will expire on Wed 03 Feb 2021 11:59:59 PM GMT +0000.
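(The sslhost alerts in this log read like the stock monitoring-plugins certificate mode. A sketch of an equivalent manual check; the plugin path is the usual Debian one, and the 15-day threshold is taken from the expiry warnings that fire later in this log:)

    # -C 15 warns when the certificate expires within 15 days; without -C the
    # plugin reports the HTTP status instead, as in the 200/301 recoveries
    /usr/lib/nagios/plugins/check_http -H reviwiki.info --ssl --sni -C 15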
[10:41:28] PROBLEM - reviwiki.info - PositiveSSLDV on sslhost is CRITICAL: connect to address reviwiki.info and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket
[10:43:23] RECOVERY - reviwiki.info - PositiveSSLDV on sslhost is OK: OK - Certificate 'reviwiki.info' will expire on Wed 03 Feb 2021 11:59:59 PM GMT +0000.
[10:46:29] RECOVERY - www.reviwiki.info - PositiveSSLDV on sslhost is OK: OK - Certificate 'reviwiki.info' will expire on Wed 03 Feb 2021 11:59:59 PM GMT +0000.
[10:47:15] PROBLEM - reviwiki.info - PositiveSSLDV on sslhost is CRITICAL: connect to address reviwiki.info and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket
[10:49:09] RECOVERY - reviwiki.info - PositiveSSLDV on sslhost is OK: OK - Certificate 'reviwiki.info' will expire on Wed 03 Feb 2021 11:59:59 PM GMT +0000.
[10:50:29] PROBLEM - www.reviwiki.info - PositiveSSLDV on sslhost is CRITICAL: connect to address www.reviwiki.info and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket
[10:52:29] RECOVERY - www.reviwiki.info - PositiveSSLDV on sslhost is OK: OK - Certificate 'reviwiki.info' will expire on Wed 03 Feb 2021 11:59:59 PM GMT +0000.
[11:17:21] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 7.29, 5.33, 4.02
[11:19:20] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 3.89, 4.68, 3.94
[13:45:21] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 7.11, 5.70, 4.14
[13:47:21] PROBLEM - mw3 Current Load on mw3 is CRITICAL: CRITICAL - load average: 9.57, 6.84, 4.73
[13:49:21] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 7.06, 7.17, 5.13
[13:51:20] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 4.55, 6.20, 5.02
[14:02:33] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[14:02:53] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:03:01] PROBLEM - misc3 Lizardfs Master Port 3 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9421: Connection refused
[14:03:16] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:03:16] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:03:18] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:03:26] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:03:29] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:03:30] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:03:31] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:03:43] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 44%
[14:03:43] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:03:52] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
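(Each "Stunnel Http for mwN on cpN" result is gathered over NRPE, so "CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds" means the NRPE round-trip to the cp host gave up, not that the HTTP probe itself returned an error. A sketch of re-running one check by hand from the monitoring host; the hostname and the check_stunnel_mw1 command name are assumptions, not Miraheze's real configuration:)

    # -t 10 mirrors the 10-second timeout quoted in the alerts
    /usr/lib/nagios/plugins/check_nrpe -H cp2.miraheze.org -c check_stunnel_mw1 -t 10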
[14:03:56] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[14:03:58] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[14:03:58] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[14:04:10] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 89%
[14:04:12] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:04:13] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:04:18] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[14:04:51] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 63%
[14:05:34] PROBLEM - misc3 Lizardfs Master Port 2 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9420: Connection refused
[14:05:42] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 71%
[14:05:53] PROBLEM - misc3 Lizardfs Master Port 1 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9419: Connection refused
[14:06:42] JohnLewis: we're down, 503
[14:06:51] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 44%
[14:06:56] paladox is dealing
[14:07:00] Good
[14:08:18] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 1.409 second response time
[14:08:20] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 9.992 second response time
[14:08:34] !log increased swap to 1280M
[14:08:42] !log that's for misc3
[14:08:56] RECOVERY - misc3 Lizardfs Master Port 3 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9421
[14:09:00] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[14:09:28] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.007 second response time
[14:09:31] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.009 second response time
[14:09:34] RECOVERY - misc3 Lizardfs Master Port 2 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9420
[14:09:35] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.007 second response time
[14:09:36] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.006 second response time
[14:09:41] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 1%
[14:09:43] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.675 second response time
[14:09:44] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.431 second response time
[14:09:48] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 1.523 second response time
[14:09:52] RECOVERY - misc3 Lizardfs Master Port 1 on misc3 is OK: TCP OK - 0.005 second response time on 185.52.1.71 port 9419
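(The "Lizardfs Master Port 1/2/3" checks are plain TCP connects against 9419-9421, the LizardFS master's default metalogger, chunkserver and client listeners; "Connection refused" during the swap work simply means the master process was down at that moment. A manual equivalent with the standard TCP probe:)

    /usr/lib/nagios/plugins/check_tcp -H 185.52.1.71 -p 9419   # master <-> metaloggers
    /usr/lib/nagios/plugins/check_tcp -H 185.52.1.71 -p 9420   # master <-> chunkservers
    /usr/lib/nagios/plugins/check_tcp -H 185.52.1.71 -p 9421   # master <-> clients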
[14:09:56] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[14:09:58] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[14:09:58] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.684 second response time
[14:09:59] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[14:10:10] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 4%
[14:10:10] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.393 second response time
[14:10:18] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[14:10:32] !log increased swap to 1280M on misc3
[14:10:33] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[14:10:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[14:10:45] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static]
[14:10:51] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 1%
[14:11:03] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static]
[14:12:18] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static]
[14:13:01] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[14:14:16] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:14:41] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:49:39] K6ka: yes, it doesn't go by hostname, but by nick
[14:55:47] Hmm, I just got a notification from a private wiki but I'm not sure what about?
[15:00:56] paladox: we're on a go slow (testwiki)
[15:01:23] paladox: 503
[15:01:28] checking
[15:02:35] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[15:02:35] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:02:35] Same issue
[15:04:33] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[15:04:33] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.393 second response time
[15:25:06] [miraheze/mw-config] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjwh
[15:25:08] [miraheze/mw-config] paladox 9f97b81 - citoid: use $wgCitoidFullRestbaseURL not $wgCitoidServiceUrl
[15:25:59] paladox: any good with Scribunto? /doc pages aren't defaulting to wikitext
[15:26:15] sadly nope :(
[15:27:05] paladox: Can't reproduce on testwikipedia or find any config they're using that we're not, so wondering if it was a bug they've fixed
[15:27:28] i'm not sure.
[15:27:52] JohnLewis and I are currently busy trying to fix the backend (and also have some exciting upgrades planned too!)
[15:28:35] Oh goody! Enjoy upgrading! I'll try and weed something out of the knowledgeable folk on the MediaWiki Discord
[15:37:40] PROBLEM - mw4 php-fpm on mw4 is CRITICAL: PROCS CRITICAL: 0 processes with command name 'php-fpm7.2'
[15:37:41] PROBLEM - mw4 HTTPS on mw4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:37:41] \o/
[15:37:42] PROBLEM - bacula1 Current Load on bacula1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[15:37:44] RECOVERY - bacula1 Current Load on bacula1 is OK: OK - load average: 0.00, 0.00, 0.00
[15:38:59] [mw-config] RhinosF1 opened pull request #2751: per T4699 - https://git.io/fjjrm
[15:41:03] [mw-config] RhinosF1 closed pull request #2751: per T4699 - https://git.io/fjjrm
[15:41:10] [mw-config] RhinosF1 commented on pull request #2751: per T4699 - https://git.io/fjjr3
[15:48:09] paladox: I have to go now for a while, puppet is running on mw4
[15:48:16] ok
[15:48:29] JohnLewis: I can take care of fully integrating it, if you want?
[15:48:30] don't give it the full shebang, just do the stuff needed to deploy it for me
[15:48:35] ok
[15:48:39] so no dns etc.
[15:49:03] * RhinosF1 wonders what is being done....
[15:49:07] ok
[15:49:21] JohnLewis: I add it to varnish, right?
[15:49:32] testing the feasibility of an idea I have to downscale mw* in a way
[15:49:38] paladox: yes, as otherwise how is it going to be put into prod :)
[15:50:10] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/fjjrC
[15:50:12] [miraheze/services] MirahezeSSLBot 4c9d770 - BOT: Updating services config for wikis
[15:50:12] JohnLewis: what does that mean for people? users?
[15:51:23] [puppet] paladox created branch paladox-patch-4 - https://git.io/vbiAS
[15:51:25] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjrE
[15:51:26] [miraheze/puppet] paladox 8a24f35 - Pool in mw4
[15:51:28] [puppet] paladox opened pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:52:01] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjrz
[15:52:03] [miraheze/puppet] paladox 880798a - Update config.yaml
[15:52:38] RECOVERY - mw4 HTTPS on mw4 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.010 second response time
[15:53:24] RhinosF1: for users, likely nothing hopefully
[15:53:38] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjrP
[15:53:39] [miraheze/puppet] paladox 6c0a179 - Update stunnel.conf
[15:54:14] PROBLEM - mw4 Puppet on mw4 is CRITICAL: CRITICAL: Puppet has 37 failures. Last run 2 minutes ago with 37 failures. Failed resources (up to 3 shown): Package[ploticus],Package[ttf-freefont],Package[libav-tools],Package[texvc]
[15:54:16] paladox: a few puppet failures btw, so if you fancy making the puppet module deb 10 friendly, that'd be nice :P
[15:54:24] JohnLewis: will do :)
[15:54:31] just going to pull mw4 in first
[15:54:33] JohnLewis: oh, what does it mean for anyone then?
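(The "Varnish Backends" checks summarise backend probe health. Assuming the check wraps the standard Varnish admin interface, the manual equivalent on a cp host would be something like this, with illustrative output:)

    varnishadm backend.list
    # Backend name        Admin    Probe
    # boot.mw1            probe    Healthy 5/5
    # boot.mw2            probe    Healthy 5/5
    # ...pooling mw4 adds a sixth entry, matching the later
    # "All 6 backends are healthy" recoveries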
[15:54:50] RhinosF1: lower costs hopefully for us :)
[15:54:56] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjrM
[15:54:58] [miraheze/puppet] paladox 24617c3 - Update storage_firewall.yaml
[15:54:59] [puppet] paladox synchronize pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:55:14] JohnLewis: well that's always good news
[15:55:43] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjrS
[15:55:45] [miraheze/puppet] paladox e739a4a - Update varnish.pp
[15:55:46] [puppet] paladox synchronize pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:56:13] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjrQ
[15:56:15] [miraheze/puppet] paladox bfa0547 - Update irc.pp
[15:56:16] [puppet] paladox synchronize pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:57:02] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjr7
[15:57:03] [miraheze/puppet] paladox dfac2c0 - Update services.pp
[15:57:05] [puppet] paladox synchronize pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:57:41] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjr5
[15:57:43] [miraheze/puppet] paladox a5359bd - Update db.pp
[15:57:44] [puppet] paladox synchronize pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:57:59] [puppet] paladox closed pull request #1079: Pool in mw4 - https://git.io/fjjru
[15:58:00] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±5] https://git.io/fjjrd
[15:58:02] [miraheze/puppet] paladox 607e930 - Pool in mw4 (#1079) * Pool in mw4 * Update storage_firewall.yaml * Update varnish.pp * Update irc.pp * Update services.pp * Update db.pp
[15:58:56] [puppet] paladox deleted branch paladox-patch-4 - https://git.io/vbiAS
[15:58:58] [miraheze/puppet] paladox deleted branch paladox-patch-4
[16:00:30] RECOVERY - mw4 php-fpm on mw4 is OK: PROCS OK: 3 processes with command name 'php-fpm7.2'
[16:12:59] PROBLEM - lizardfs4 Puppet on lizardfs4 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[16:13:22] PROBLEM - lizardfs5 Puppet on lizardfs5 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[16:13:26] PROBLEM - misc3 Puppet on misc3 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
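(The per-host "Puppet" alerts watch the result of the last agent run; "Failed to apply catalog, zero resources tracked" means the run aborted before applying anything, a dependency cycle being one possible cause, as the check text says. To reproduce a run by hand on an affected host:)

    puppet agent --test   # one-shot foreground run; non-zero exit if the run failed
    # the icinga check parses the agent's last_run_summary.yaml state file, which
    # is also what the later "failed_to_parse_summary_file" UNKNOWN refers to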
[16:16:01] uh
[16:16:07] network gone down on lizardfs
[16:18:29] it's back
[16:18:54] RECOVERY - lizardfs4 Puppet on lizardfs4 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[16:23:11] RECOVERY - lizardfs5 Puppet on lizardfs5 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[16:23:26] RECOVERY - misc3 Puppet on misc3 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[16:32:11] PROBLEM - cp3 Disk Space on cp3 is WARNING: DISK WARNING - free space: / 2649 MB (10% inode=94%);
[16:33:04] [miraheze/puppet] paladox pushed 1 commit to master [+4/-0/±1] https://git.io/fjjo8
[16:33:05] [miraheze/puppet] paladox 02ed278 - mediawiki: Add support for debian 10
[16:43:55] !log reloaded lizardfs-master on misc3
[16:44:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:44:49] !log running lc on mw4
[16:44:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[16:45:20] RECOVERY - mw4 Puppet on mw4 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[16:51:06] [miraheze/puppet] paladox pushed 1 commit to paladox-patch-4 [+0/-0/±1] https://git.io/fjjod
[16:51:07] [miraheze/puppet] paladox a5f0a43 - varnish: Add mw4
[16:51:08] [puppet] paladox created branch paladox-patch-4 - https://git.io/vbiAS
[16:51:10] [puppet] paladox opened pull request #1080: varnish: Add mw4 - https://git.io/fjjoF
[16:51:45] [puppet] paladox closed pull request #1080: varnish: Add mw4 - https://git.io/fjjoF
[16:51:46] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjoN
[16:51:48] [miraheze/puppet] paladox 225f68c - varnish: Add mw4 (#1080)
[16:53:22] [miraheze/puppet] paladox deleted branch paladox-patch-4
[16:53:23] [puppet] paladox deleted branch paladox-patch-4 - https://git.io/vbiAS
[16:55:02] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjKe
[16:55:03] [miraheze/puppet] paladox cd14506 - fix syntax
[16:58:02] [miraheze/MirahezeDebug] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjKk
[16:58:05] [miraheze/MirahezeDebug] paladox 45eeba9 - add mw4
[16:59:10] [miraheze/MirahezeDebug] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjKt
[16:59:12] [miraheze/MirahezeDebug] paladox 71946ab - Update popup.html
[17:04:37] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fjjK3
[17:04:38] [miraheze/puppet] paladox a9de0dd - Update mediawiki::branch to REL1_33
[17:06:01] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw4
[17:06:58] PROBLEM - mw4 Puppet on mw4 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 2 minutes ago with 1 failures
[17:07:52] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw4
[17:07:59] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw4
[17:14:58] PROBLEM - mw4 Puppet on mw4 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_clone_MediaWiki core]
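(The failed Exec[git_clone_MediaWiki core] follows the mediawiki::branch bump to REL1_33 a few minutes earlier. A sketch of the clone such an Exec likely wraps; the repository URL and target path are assumptions, not taken from Miraheze's puppet code:)

    # fetch MediaWiki core at the REL1_33 release branch
    git clone --branch REL1_33 --depth 1 \
        https://github.com/wikimedia/mediawiki.git /srv/mediawiki/w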
[17:15:32] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[17:15:51] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[17:15:56] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[17:21:33] PROBLEM - mw4 Current Load on mw4 is CRITICAL: CRITICAL - load average: 5.80, 3.70, 2.48
[17:24:58] RECOVERY - mw4 Puppet on mw4 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:25:32] RECOVERY - mw4 Current Load on mw4 is OK: OK - load average: 1.59, 3.21, 2.63
[17:33:32] PROBLEM - mw4 Current Load on mw4 is CRITICAL: CRITICAL - load average: 9.36, 5.48, 3.60
[17:36:58] PROBLEM - mw4 Puppet on mw4 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 2 minutes ago with 0 failures
[17:47:33] PROBLEM - mw4 Current Load on mw4 is WARNING: WARNING - load average: 2.57, 3.51, 3.83
[17:53:32] RECOVERY - mw4 Current Load on mw4 is OK: OK - load average: 1.47, 2.62, 3.35
[18:00:41] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw2 mw3 mw4
[18:00:41] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 4 backends are down. mw1 mw2 mw3 mw4
[18:00:59] huh
[18:01:31] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw4
[18:01:40] I would say huh backwards back to you paladox, but it's still huh
[18:02:01] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 1 datacenter is down: 2400:6180:0:d0::403:f001/cpweb
[18:03:25] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[18:05:27] PROBLEM - misc3 Lizardfs Master Port 1 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9419: Connection refused
[18:05:39] PROBLEM - misc3 Lizardfs Master Port 3 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9421: Connection refused
[18:06:04] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[18:06:05] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[18:06:07] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[18:06:09] PROBLEM - misc3 Lizardfs Master Port 2 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9420: Connection refused
[18:06:18] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[18:06:28] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 64%
[18:06:30] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[18:06:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[18:07:14] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw1 mw4
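(The "Current Load" alerts map onto the standard three-value load-average check. The thresholds below are illustrative, not Miraheze's actual configuration:)

    # WARNING above 5/4/3 and CRITICAL above 10/6/4 for the 1/5/15-minute
    # averages; compare "CRITICAL - load average: 9.36, 5.48, 3.60" above
    /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,6,4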
[18:07:27] RECOVERY - misc3 Lizardfs Master Port 1 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9419
[18:07:42] RECOVERY - misc3 Lizardfs Master Port 3 on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.71 port 9421
[18:08:00] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[18:08:11] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.005 second response time
[18:08:11] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.025 second response time
[18:08:12] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.396 second response time
[18:08:12] paladox: can you drop in #wikimedia-operations and tell them to see https://phabricator.wikimedia.org/T232224
[18:08:12] RECOVERY - misc3 Lizardfs Master Port 2 on misc3 is OK: TCP OK - 0.002 second response time on 185.52.1.71 port 9420
[18:08:17] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.396 second response time
[18:08:25] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 2%
[18:08:26] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.676 second response time
[18:08:29] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[18:08:44] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[18:12:42] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[18:12:47] RhinosF1: they are aware
[18:13:05] paladox: thx
[18:14:28] i didn't tell them that task, but they are both aware and also they can see you've created a task
[18:14:33] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw4
[18:15:27] RhinosF1, irony being I got a timeout error when first attempting to view that task :P
[18:18:41] I couldn't view it at all, I thought it was just me lol
[18:18:41] Voidwalker: you would, as phabricator is behind varnish :)
[18:19:01] Voidwalker: phab slows for me but works
[18:19:01] What's up with it?
[18:19:42] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[18:19:42] * RhinosF1 was going to check the logs for -operations but ha, *.wmflabs.org is down
[18:19:42] then eqiad is down
[18:19:42] Voidwalker, I get it, you like hitting delete, but there's no need to nuke the entirety of everything
[18:19:42] DoS attack
[18:19:42] They suspect DoS
[18:19:43] WMF is being DDoS'd?
[18:19:43] I'm getting in IRC, I got to see -operations
[18:19:43] Well no confirmation
[18:19:43] wouldn't be the first time
[18:19:43] It is since I've been around (I think?)
[18:19:43] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[18:19:43] Wait so is enwiki down?
[18:19:43] yes
[18:19:43] paladox: ops clinic just told me
[18:19:43] the whole cluster is
[18:19:43] so wmflabs/wikimedia/wikipedia
[18:19:43] there goes my easy access to IP lookups :P
[18:19:43] 19:17:06 It's a dos attack. Everybody is online looking at it
[18:21:04] Rip ZppixBot
[18:21:08] Oh nvm
[18:21:16] Probably part of the hit
[18:22:22] Can't they just do a DC switchover, paladox
[18:22:22] To update: it's confirmed a DoS attack not DDoS
[18:22:22] that makes it easier to deal with
[18:25:06] I think they are back (for now?)
[18:25:06] Not for me
[18:25:06] Thanks
[18:25:06] So.. is this channel for unsubstantiated speculations?
[18:25:06] :lol:
[18:25:27] This is a known issue. The SRE team is finding a quick solution to restore these services. Thanks
[18:25:27] Hello ShakespeareFan00! If you have any questions feel free to ask and someone should answer soon.
[18:25:27] ShakespeareFan00: no live information on the incident
[18:25:27] Poor ZppixBot is slow because of the DoS
[18:25:27] lol ZppixBot
[18:25:27] * Zppix pats ZppixBot
[18:25:27] It's okay buddy
[18:25:27] * RhinosF1 goes and gives ZppixBot a boost
[18:25:27] Is it only Wikimedia sites?
[18:25:27] ShakespeareFan00: wmflabs/wikimedia/wikipedia
[18:36:43] it's confirmed DDoS
[18:37:44] Who the hell DDoSes WMF? Like what's the point
[18:39:16] paladox: DDoS? Clinic told me DoS rather than DDoS explicitly when I asked
[18:39:53] Zppix: Wikipedia has coverage of things certain entities do not like
[18:40:13] RhinosF1: BBlack confirmed it was DDoS
[18:40:26] Zppix: k
[18:41:32] per standing policy, we are unlikely to hear much about an ongoing investigation
[18:42:13] Yep, clinic have just said DDoS but complicated
[18:43:47] I would not be at all surprised if it's linked to something wikipedia published recently
[18:43:58] ShakespeareFan00: I doubt it
[18:44:14] It's probably some LTA
[18:45:14] Zppix: it must be big
[18:45:32] PROBLEM - mw4 Current Load on mw4 is WARNING: WARNING - load average: 3.72, 3.03, 2.45
[18:47:22] RhinosF1: Is there a way to see where the traffic is coming from geographically?
[18:47:32] RECOVERY - mw4 Current Load on mw4 is OK: OK - load average: 1.75, 2.61, 2.37
[18:47:58] ShakespeareFan00: not that I know of
[18:51:18] PROBLEM - wiki.articmetrosystem.tk - LetsEncrypt on sslhost is CRITICAL: Name or service not knownHTTP CRITICAL - Unable to open TCP socket
[19:01:10] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw3 mw4
[19:01:52] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw1
[19:02:01] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw1 mw4
[19:03:55] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[19:04:08] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 3 datacenters are down: 107.191.126.23/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb
[19:05:38] paladox: not us as well ^ ???
[19:05:43] nope
[19:05:54] we are not being DDoSed, if we were you'd see more than that :P
[19:06:13] paladox: well we are at least slow
[19:06:27] * RhinosF1 was being half serious half sarcastic
[19:06:42] that will be misc3 again
[19:06:53] we are looking to resolve this
[19:08:02] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[19:09:07] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[19:09:39] paladox: You said yesterday :)
[19:10:30] yeh, but resolving it is not as easy as it sounds. We are looking at replacements.
[19:11:34] paladox: ok, you spoke to RN didn't you?
[19:11:43] yes
[19:12:00] paladox: Weren't they going to change something?
[19:12:14] Yes, which they did.
[19:12:20] Doesn't seem to have helped much.
[19:12:42] I gather
[19:14:31] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw4
[19:15:02] !log depool mw4
[19:15:04] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw4
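(The log doesn't show how "!log depool mw4" was carried out. One plausible mechanism with Varnish is forcing the backend's health state from the admin interface; a sketch only, since Miraheze may instead depool through a puppet change as with the pooling earlier:)

    varnishadm backend.set_health boot.mw4 sick   # stop routing traffic to mw4
    varnishadm backend.list                       # verify mw4 now shows as sick
    # "backend.set_health boot.mw4 auto" later hands control back to the probe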
[19:16:45] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[19:17:14] PROBLEM - wiki.mxlinuxusers.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.mxlinuxusers.org' expires in 15 day(s) (Sun 22 Sep 2019 07:13:17 PM GMT +0000).
[19:17:27] [miraheze/ssl] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/fjjiF
[19:17:29] [miraheze/ssl] MirahezeSSLBot 8f9a1e8 - Bot: Update SSL cert for wiki.mxlinuxusers.org
[19:18:31] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[19:19:02] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[19:19:52] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[19:20:18] PROBLEM - storytime.jdstroy.cf - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'storytime.jdstroy.cf' expires in 15 day(s) (Sun 22 Sep 2019 07:16:30 PM GMT +0000).
[19:20:31] [miraheze/ssl] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/fjjix
[19:20:32] [miraheze/ssl] MirahezeSSLBot 00fc235 - Bot: Update SSL cert for storytime.jdstroy.cf
[19:20:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:23:14] RECOVERY - wiki.mxlinuxusers.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.mxlinuxusers.org' will expire on Thu 05 Dec 2019 06:17:21 PM GMT +0000.
[19:24:18] RECOVERY - storytime.jdstroy.cf - LetsEncrypt on sslhost is OK: OK - Certificate 'storytime.jdstroy.cf' will expire on Thu 05 Dec 2019 06:20:25 PM GMT +0000.
[19:24:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 4 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[19:24:58] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[19:25:29] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[19:26:14] !log reboot misc3
[19:27:20] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[19:27:27] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
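(The MirahezeSSLBot pushes above reissue Let's Encrypt certificates once the 15-day expiry warning fires. A hand-run equivalent with certbot; the client choice and webroot path are assumptions, as the bot may drive a different ACME client:)

    certbot certonly --webroot -w /var/www/letsencrypt -d wiki.mxlinuxusers.org
    # the reissued certificate is then committed to the miraheze/ssl repo,
    # matching the "Bot: Update SSL cert for ..." commits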
[19:28:09] PROBLEM - misc3 Lizardfs Master Port 2 on misc3 is CRITICAL: connect to address 185.52.1.71 and port 9420: Connection refused
[19:29:22] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.634 second response time
[19:30:09] RECOVERY - misc3 Lizardfs Master Port 2 on misc3 is OK: TCP OK - 0.007 second response time on 185.52.1.71 port 9420
[19:30:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:30:56] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[19:31:08] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[19:31:19] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[19:32:25] !log reboot lizardfs4
[19:34:24] PROBLEM - lizardfs4 SSH on lizardfs4 is CRITICAL: connect to address 81.4.122.238 and port 22: Connection refused
[19:34:41] PROBLEM - lizardfs4 Current Load on lizardfs4 is CRITICAL: connect to address 81.4.122.238 port 5666: Connection refusedconnect to host 81.4.122.238 port 5666: Connection refused
[19:34:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[19:34:53] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[19:34:56] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:34:57] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[19:35:06] PROBLEM - lizardfs4 Lizardfs Chunkserver Port on lizardfs4 is CRITICAL: connect to address 81.4.122.238 and port 9422: Connection refused
[19:35:07] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:07] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:35:08] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:10] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[19:35:14] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:21] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 71%
[19:35:35] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 67%
[19:35:38] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:42] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:51] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:35:52] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[19:35:55] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[19:36:16] PROBLEM - lizardfs4 Puppet on lizardfs4 is CRITICAL: connect to address 81.4.122.238 port 5666: Connection refusedconnect to host 81.4.122.238 port 5666: Connection refused
[19:36:19] PROBLEM - lizardfs4 Disk Space on lizardfs4 is CRITICAL: connect to address 81.4.122.238 port 5666: Connection refusedconnect to host 81.4.122.238 port 5666: Connection refused
[19:36:23] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:36:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 60%
[19:36:50] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:37:29] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 36%
[19:37:54] uh, ssh hasn't come back up
[19:38:18] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[19:38:45] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[19:39:06] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time
[19:39:06] RECOVERY - lizardfs4 Lizardfs Chunkserver Port on lizardfs4 is OK: TCP OK - 0.001 second response time on 81.4.122.238 port 9422
[19:39:17] it's back
[19:39:35] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 44%
[19:40:17] RECOVERY - lizardfs4 Puppet on lizardfs4 is OK: OK: Puppet is currently enabled, last run 8 minutes ago with 0 failures
[19:40:19] RECOVERY - lizardfs4 Disk Space on lizardfs4 is OK: DISK OK - free space: / 72802 MB (23% inode=90%);
[19:40:24] RECOVERY - lizardfs4 SSH on lizardfs4 is OK: SSH OK - OpenSSH_7.9p1 Debian-10 (protocol 2.0)
[19:40:41] RECOVERY - lizardfs4 Current Load on lizardfs4 is OK: OK - load average: 1.10, 0.47, 0.18
[19:41:16] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.392 second response time
[19:41:19] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.006 second response time
[19:41:35] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 62%
[19:41:55] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 5.146 second response time
[19:42:01] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.683 second response time
[19:42:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 51%
[19:43:05] having a lot of downtime huh?
[19:43:05] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.400 second response time
[19:43:27] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.004 second response time
[19:43:35] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 6%
[19:43:41] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.057 second response time
[19:43:51] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[19:43:52] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.634 second response time
[19:44:07] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.665 second response time
[19:44:25] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 2%
[19:44:31] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[19:44:39] Voidwalker: yeh, we are looking at alternative file systems.
[19:44:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:44:49] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[19:44:55] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[20:21:12] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 5 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:21:17] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 1 datacenter is down: 81.4.109.133/cpweb
[20:23:06] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[20:23:15] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[20:31:38] paladox: do we run our own cpanel?
[20:31:48] we don't run cpanel
[20:31:58] cpanel is not open source and requires a different OS :)
[20:33:57] paladox: okay good then, see RN's Twitter
[20:34:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:34:51] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw1 mw2
[20:34:53] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 5 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:35:20] FYI I wrote a section on enwiki's village pump about the outage https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Wikimedia_down
[20:35:20] [WIKIPEDIA] Wikipedia:Village pump (technical)#Wikimedia down | "..."
[20:35:24] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw1
[20:35:43] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw1
[20:36:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[20:36:46] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[20:36:50] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[20:37:24] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[20:37:40] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[20:38:05] we don't run exim as far as I know
[20:38:09] Zppix: when half of us can't get on
[20:38:32] RhinosF1: Well you know I'm doing what I can to stop people from complaining to operations
[20:38:49] An email has gone out to wikitech-l at least
[20:39:07] not everyone gets wikitech-l
[20:40:27] wikitech-l is a developer forum
[20:40:30] Nah, Zppix, https://www.independent.co.uk/life-style/gadgets-and-tech/wikipedia-down-not-working-google-stopped-page-loading-encyclopedia-a9095236.html has gone out but it's not amazing
[20:40:30] [ Wikipedia down: Online encyclopedia not working as pages fail to load for some users | The Independent ] - www.independent.co.uk
[20:40:39] Wikimedia-l might have been better
[20:41:18] RhinosF1: that's because they couldn't get a hold of WMF for comment
[20:41:55] Zppix: are you surprised? Hopefully they're busy
[20:42:09] RhinosF1: No I'm not
[20:42:30] It's probably good the report is sketchy
[20:47:19] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw1 mw3
[20:48:09] !log reboot lizardfs5
[20:48:39] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:48:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[20:49:53] paladox: ^ 503
[20:50:01] yup, aware
[20:50:10] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:50:12] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:50:16] PROBLEM - lizardfs5 Disk Space on lizardfs5 is CRITICAL: connect to address 81.4.122.196 port 5666: Connection refusedconnect to host 81.4.122.196 port 5666: Connection refused
[20:50:17] PROBLEM - lizardfs5 Current Load on lizardfs5 is CRITICAL: connect to address 81.4.122.196 port 5666: Connection refusedconnect to host 81.4.122.196 port 5666: Connection refused
[20:50:19] PROBLEM - lizardfs5 Lizardfs Chunkserver Port on lizardfs5 is CRITICAL: connect to address 81.4.122.196 and port 9422: Connection refused
[20:50:26] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[20:50:31] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[20:50:40] PROBLEM - lizardfs5 Puppet on lizardfs5 is CRITICAL: connect to address 81.4.122.196 port 5666: Connection refusedconnect to host 81.4.122.196 port 5666: Connection refused
[20:51:02] PROBLEM - lizardfs5 SSH on lizardfs5 is CRITICAL: connect to address 81.4.122.196 and port 22: Connection refused
[20:51:13] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:51:27] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:51:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 83%
[20:51:36] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:51:41] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:51:46] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:51:53] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:52:12] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:52:13] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:52:18] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 49%
[20:52:23] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:52:26] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[20:53:29] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 25%
[20:54:06] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.006 second response time
[20:54:17] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 67%
[20:54:18] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.009 second response time
[20:54:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 83%
[20:56:15] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 45%
[20:56:25] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 38%
[20:57:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 72%
[20:58:13] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 83%
[20:58:40] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:01] paladox: today is the day of connection loss, isn't it
[20:59:06] yeh
[20:59:51] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 9.446 second response time
[21:00:16] paladox: if there's anything I can do, lmk
[21:00:20] It's honestly everywhere today
[21:00:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 70%
[21:00:28] nothing much :P
[21:00:32] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.077 second response time
[21:00:36] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.022 second response time
[21:00:38] k6ka: well if you'd stop pressing all the red buttons
[21:00:41] My own Internet connectivity has not been consistent today
[21:01:12] Zppix: say, I found a button that says "Emergency Stop", what does it do?
[21:01:18] couldn't hurt to press it, I mean. It looks harmless.
[21:01:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 41%
[21:01:44] Hi, lol. Around?
[21:01:45] lol
[21:01:49] k6ka: try going to English Wikipedia from the EU and there's your answer
[21:02:07] I'm in Canada so funny enough Wikipedia is working fine for me
[21:02:10] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 39%
[21:02:27] k6ka: that's because the EU is what's mainly affected rn
[21:02:28] I'm noticing freenode is having issues too. My ZNC pinged out and so did half a dozen bots.
[21:02:41] I note the Middle East is out
[21:02:48] according to Wikipedia
[21:02:55] ShakespeareFan00: I include that when I say EU
[21:02:57] *sorry, according to the Independent
[21:03:14] Did someone say something iffy about a regime?
[21:03:19] ShakespeareFan00: The entire other side of the world from the US is down AFAIK
[21:03:28] k6ka: all our mw* servers are based in the Netherlands so we'll be impacted
[21:03:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 78%
[21:03:37] No ETA as far as I know
[21:03:45] Zppix: I wonder..
[21:04:05] Wikipedia is currently blocked in PRC right?
[21:04:11] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:04:12] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:04:14] ShakespeareFan00: Possibly?
[21:04:14] ShakespeareFan00: Yep
[21:04:25] (But those with VPN can work around the Chinese Firewall?)
[21:04:54] Okay... just wondering where the VPN accesses
[21:04:57] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[21:04:58] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:05:02] ShakespeareFan00: if you have (G)IPBE
[21:05:05] Through EU or through US/Japan lines?
[21:05:19] (G)IPBE?
[21:05:20] ShakespeareFan00: depends on the VPN server I assume
[21:05:33] Okay just wondering if anyone used a VPN to check on loading times.
[21:06:05] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 85%
[21:06:13] ShakespeareFan00: (Global) IP Block Exemption - without it you can't edit from behind a VPN
[21:06:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 51%
[21:06:28] Is there a direct link to Florida/Virginia I could use to check something
[21:06:31] ?
[21:06:54] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.003 second response time
[21:07:27] ShakespeareFan00: you can try to find a VPN through Google or ask Zppix to screenshot it
[21:07:40] ShakespeareFan00: what do you need?
[21:08:02] ShakespeareFan00: web.archive.org is able to access enwiki
[21:08:03] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 59%
[21:08:04] Zppix: A link that doesn't go through Europe?
[21:08:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 70%
[21:08:30] And also one that ideally doesn't go through Asia
[21:08:59] ShakespeareFan00: if you **need** access to enwiki or other projects, web.archive.org has been reported to be able to access enwiki
[21:08:59] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.007 second response time
[21:09:07] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 2.316 second response time
[21:09:51] Okay
[21:10:02] Zppix: Have you seen this month's Signpost?
[21:10:02] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 68% [21:10:14] ShakespeareFan00: I saw some of it [21:10:28] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time [21:10:30] There are three items in it that could have motivated certain elements [21:10:38] ShakespeareFan00: somewhat, do you want a screenshot? [21:10:52] Nope... It finally loaded for me [21:10:53] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.658 second response time [21:11:03] ShakespeareFan00: don't expect it to be a stable connection [21:11:10] Zppix: I'm not [21:11:13] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.393 second response time [21:11:16] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:11:24] PROBLEM - mw1 Puppet on mw1 is UNKNOWN: UNKNOWN: Failed to check. Reason is: failed_to_parse_summary_file [21:11:27] Zppix: I assume you would be capable of determining which three items may have caused issues [21:11:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 51% [21:11:37] Not saying the outage is linked, but. [21:11:53] Let's not speculate too much [21:12:09] ShakespeareFan00: if I read it, probably [21:12:37] RhinosF1: I mean, to be fair, it's not speculation; obviously something set someone off, you just don't wake up one day and say let's DDoS the WMF [21:12:37] I don't read Signpost, as I am blocked on enwikipedia until 2021 at the earliest [21:12:46] ShakespeareFan00: what did you do? [21:13:09] Zppix: Self-block request after I over-reacted on a policy issue [21:13:14] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.025 second response time [21:13:18] Zppix: well true [21:13:20] (in effect a self-requested competence block) [21:13:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 69% [21:13:50] Sorry, should have said I don't read the Signpost on a regular basis [21:13:55] ShakespeareFan00: See, self-block wouldn't work for me; I'd have to have operations block my IP from accessing the webserver [21:13:57] What I'm trying to say is: don't try to blame certain groups or something [21:13:59] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 49% [21:14:35] Zppix: (re incident) especially on a scale that takes the site offline [21:14:37] Wikipedia is back up, see? https://xkcd.com/903/ [21:14:39] https://xkcd.com/903 | Extended Mind | Alt-text: Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at "Philosophy". [21:14:40] effectively... [21:15:00] Zppix: Have sites like dewikipedia mentioned issues? [21:15:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 56% [21:15:32] ShakespeareFan00: yes, all Wikimedia projects [21:15:43] Hmm [21:15:44] ShakespeareFan00: most Wikimedia sites are down [21:15:48] But the US is still up [21:15:50] for now
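The xkcd 903 alt-text quoted above is effectively a graph-walk algorithm: repeatedly follow the first body-text wikilink that is not parenthesised or italicised. A rough sketch against the public MediaWiki parse API; the wikitext stripping is a crude approximation (templates, tables, and nested markup need real parsing), the User-Agent string and start article are placeholders, and real runs can diverge from the "Philosophy" folklore:

```python
import json
import re
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
UA = {"User-Agent": "first-link-sketch/0.1 (demo)"}   # placeholder UA

def wikitext(title: str) -> str:
    """Fetch a page's wikitext via the public MediaWiki parse API."""
    qs = urllib.parse.urlencode({
        "action": "parse", "page": title, "prop": "wikitext",
        "format": "json", "formatversion": "2",
    })
    req = urllib.request.Request(f"{API}?{qs}", headers=UA)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["parse"]["wikitext"]

def first_link(text: str) -> str | None:
    """Crude stand-in for 'first link not in parentheses or italics'."""
    for pattern in (r"\{\{[^{}]*\}\}", r"\([^()]*\)", r"''[^']*''"):
        prev = None
        while prev != text:                  # peel nested structures
            prev, text = text, re.sub(pattern, "", text)
    for m in re.finditer(r"\[\[([^\]|#]+)", text):
        target = m.group(1).strip()
        if ":" not in target:                # skip File:, Category:, ...
            return target
    return None

def walk(start: str, limit: int = 60) -> None:
    seen, title = set(), start
    while title and title not in seen and len(seen) < limit:
        print(title)
        seen.add(title)
        # Redirect pages are plain "#REDIRECT [[Target]]" wikitext,
        # so first_link() follows them naturally.
        title = first_link(wikitext(title))

if __name__ == "__main__":
    walk("Alan Turing")   # illustrative start; often ends at "Philosophy"
```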
[21:15:52] *.wmflabs is up [21:15:54] ShakespeareFan00: it is, yes [21:15:56] Interesting [21:15:57] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 68% [21:16:10] ShakespeareFan00: it's esams, as far as I know, that's been hit [21:16:16] RhinosF1: *.wmflabs is up, but barely; it was a little iffy at first [21:16:49] Zppix: I've been fine for a bit with the -operations wm-bot logs open [21:17:20] RhinosF1: It seems to be more stable rn, but especially when the US was down wmflabs was all over the place [21:17:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [21:17:35] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:17:41] https://grafana.wikimedia.org/ is still not fully up for me, so I don't have graphs again yet [21:17:42] [ Grafana ] - grafana.wikimedia.org [21:18:05] ShakespeareFan00: Let me give you a summary: everything is ####ed [21:18:16] xD [21:19:08] What Zppix said [21:19:13] Has consideration been given to essentially just pulling the plug? [21:19:19] paladox: what's up with us? [21:19:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 54% [21:19:29] ShakespeareFan00: let's let SRE do their job [21:19:32] I'm working on that one rn :) [21:19:32] I.e. just shut down over the weekend? [21:19:41] ShakespeareFan00: NO [21:19:47] paladox: bandwidth or? [21:19:49] hi paladox [21:19:50] ShakespeareFan00: It wouldn't matter; as soon as the DC goes back up, if they are still sending packets it would still continue the attack [21:20:02] Hmm [21:20:13] hi, can't talk much, trying to resolve this big issue [21:20:18] Paladox: Are you -operations? [21:20:28] I'm in there but I'm not operations [21:20:34] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.008 second response time [21:20:43] I appreciate -operations can't talk about an active incident [21:22:27] ShakespeareFan00: what I've told you is all that has been publicly released and frankly all I know [21:22:41] Hmm - https://archive.is/J7GGB seems to be down as well [21:23:05] It was mentioned in the Phabricator ticket [21:23:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 79% [21:24:51] wikipedia :/ [21:25:04] hispano76_: and us [21:25:25] hispano76_: Yes, they know it's down, they're working hard to fix it! [21:25:44] ;( [21:26:14] and I plan on staying up as long as I can [21:26:53] RhinosF1: hey, look at it this way: it's harder to vandalize Wikipedia if half the world can't access it [21:27:01] RhinosF1: Is Miraheze affected as well? [21:27:04] Zppix: well yeah [21:27:15] ShakespeareFan00: that's a separate issue paladox is working on [21:27:24] ShakespeareFan00: InstantCommons is down and Miraheze is having a separate issue [21:27:30] so yes, in a way [21:27:30] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 54% [21:27:49] Oh... Because Miraheze uses Commons for some images... Got it [21:27:55] ? [21:28:26] ShakespeareFan00: yep, as we pointed out.
Our servers are getting content from the half of the world with issues [21:28:40] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:29:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 78% [21:33:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 43% [21:33:42] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 36% [21:34:42] let me depool mw1 [21:35:57] Is there anyone from WMF operations in this channel? [21:36:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 42% [21:37:00] ShakespeareFan00: if they are they'd be busy; what's up? [21:37:07] ShakespeareFan00: samwilson or revi both work closely with the WMF but Zppix has NDA access as well. I doubt they can/will answer though [21:37:24] PROBLEM - mw1 Puppet on mw1 is CRITICAL: connect to address 185.52.1.75 port 5666: Connection refusedconnect to host 185.52.1.75 port 5666: Connection refused [21:37:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [21:37:32] PROBLEM - mw1 Disk Space on mw1 is CRITICAL: connect to address 185.52.1.75 port 5666: Connection refusedconnect to host 185.52.1.75 port 5666: Connection refused [21:37:34] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: connect to address 185.52.1.75 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [21:37:38] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 75% [21:37:39] RhinosF1: I appreciate they can't say much about an ongoing incident [21:38:01] Zppix: can I read -operations right that they've just depooled it? [21:38:06] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: HTTP CRITICAL - No data received from host [21:38:09] PROBLEM - mw1 SSH on mw1 is CRITICAL: connect to address 185.52.1.75 and port 22: Connection refused [21:38:18] Zppix: 21:33 cdanis: cdanis@mw1317.eqiad.wmnet ~ 🕠🍺 sudo -i depool [21:38:21] RhinosF1: they depooled one of many servers but idk why or if it matters [21:38:25] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 74% [21:38:28] Zppix: k [21:38:40] so I can at least read SAL [21:38:44] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: HTTP CRITICAL - No data received from host [21:39:02] PROBLEM - mw1 Current Load on mw1 is CRITICAL: connect to address 185.52.1.75 port 5666: Connection refusedconnect to host 185.52.1.75 port 5666: Connection refused [21:39:04] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: HTTP CRITICAL - No data received from host [21:39:06] PROBLEM - mw1 php-fpm on mw1 is CRITICAL: connect to address 185.52.1.75 port 5666: Connection refusedconnect to host 185.52.1.75 port 5666: Connection refused [21:39:19] PROBLEM - mw1 MirahezeRenewSsl on mw1 is CRITICAL: connect to address 185.52.1.75 and port 5000: Connection refused [21:39:28] icinga-miraheze: it's okay, I promise [21:39:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 47% [21:40:11] What was depooled? [21:40:21] Zppix: it's okay 4 me now [21:40:26] (wikimedia) [21:40:34] ShakespeareFan00: some random mw* server [21:40:35] ShakespeareFan00: for Miraheze or Wikimedia?
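Most of the monitoring noise in this stretch, "connect to address ... port 5666: Connection refused", "Socket timeout after 10 seconds", "TCP OK - 0.001 second response time", boils down to one primitive: a timed TCP connect against a service port (5666 is the NRPE agent; 9419-9422 are the LizardFS master and chunkserver ports seen elsewhere in this log). A self-contained sketch of that primitive in the same reporting style; the host/port in the usage line is illustrative:

```python
import socket
import time

def check_tcp(host: str, port: int, timeout: float = 10.0) -> str:
    """Nagios-style TCP check: connect, time it, report one status line."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
        return f"TCP OK - {elapsed:.3f} second response time on {host} port {port}"
    except socket.timeout:
        return f"CRITICAL - Socket timeout after {timeout:.0f} seconds"
    except OSError as exc:
        return f"CRITICAL - connect to address {host} and port {port}: {exc.strerror or exc}"

if __name__ == "__main__":
    # Illustrative target: an NRPE agent listens on TCP 5666.
    print(check_tcp("127.0.0.1", 5666))
```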
[21:40:41] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.394 second response time [21:40:50] RhinosF1: being okay now means nothing [21:41:04] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 0.005 second response time [21:41:08] Zppix: well yeah, I've seen that throughout today [21:41:18] mw1 is back [21:41:25] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.015 second response time [21:41:29] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 39% [21:41:35] RECOVERY - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 6% [21:41:36] paladox: can confirm [21:41:42] OK, Grafana is up [21:41:58] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 1.788 second response time [21:42:03] RECOVERY - mw1 SSH on mw1 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u6 (protocol 2.0) [21:42:11] file uploads won't work atm [21:42:15] that's known [21:42:25] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 21% [21:42:39] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [21:42:42] codfw seems to have had some spikes, as has esams, as we know [21:42:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [21:45:39] Zppix: https://twitter.com/UKDrillas [21:45:39] [ UkDrillas (@UKDrillas) | Twitter ] - twitter.com [21:46:07] RhinosF1: it's fake [21:46:14] RhinosF1: ignore it and don't give it attention [21:46:31] Zppix: Even if it's fake, call the cops on them [21:46:46] ShakespeareFan00: it's not illegal unless they truly did it [21:46:59] Zppix: The UK has laws about hoax claims [21:47:05] otherwise every angry 12-year-old on video games would be in jail [21:47:14] ShakespeareFan00: Twitter is in the US though [21:47:54] Jurisdictionally it would be a UK matter, and Twitter would have to co-operate with a UK investigation [21:47:57] ShakespeareFan00: T&S are aware and the account has been reported [21:48:27] If it's genuine, I hope those responsible end up in jail for a long time [21:49:04] ShakespeareFan00: Sutherland is aware and I'm sure they'll do what's needed. I've reported the tweet to Twitter [21:49:28] Sutherland? [21:49:32] Trust me, legal and Trust and Safety are going to be all over it for a while [21:49:37] ShakespeareFan00: foks on IRC [21:49:54] We're coming back up slowly... [21:50:19] And T&S can't comment on active incidents or investigations...
[21:50:32] ShakespeareFan00: Joe Sutherland, Wikimedia Trust and Safety (English Wikipedia admin: Fox), on IRC now as foks [21:50:39] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [21:50:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [21:50:50] RhinosF1: Okay [21:50:58] RhinosF1 s/fox/foks [21:51:14] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.007 second response time [21:51:22] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.004 second response time [21:51:27] Zppix: personal account is fox [21:51:41] Zppix: "This is Wikimedia, My name's Joe, I carry a bit.." LOL [21:51:44] Sorry [21:51:52] ? [21:51:54] But the Dragnet intro suddenly came to mind [21:52:20] k6ka yeh, we are currently having issues with one of our data nodes, which took the entire thing down. [21:52:34] I've unmounted a mount so that mw1 could come back online [21:52:35] Zppix: Not seen Dragnet? [21:52:39] nope [21:52:49] though images may appear missing and/or you won't be able to upload. [21:53:06] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 9.743 second response time [21:53:06] Oh... sorry and never mind... but those here that have will understand my joke [21:55:33] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:55:44] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [21:56:31] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 6.106 second response time [21:56:33] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.005 second response time [21:56:39] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [21:56:43] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [21:57:25] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:00:23] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:00:57] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:00:58] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:01:06] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:01:15] PROBLEM - mw2 php-fpm on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:01:38] PROBLEM - mw2 Disk Space on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
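paladox's fix above, unmounting a stuck share so mw1 could come back, is the standard remedy when a network filesystem hangs: any process that touches the dead mountpoint blocks, so the whole box looks down even though only the mount is. A sketch of detecting and lazily detaching such a mount; the mountpoint path is borrowed from later in this log, the detection approach is an assumption about how one might script it, and the unmount itself needs root:

```python
import os
import subprocess
import threading

def responds(path: str, timeout: float = 5.0) -> bool:
    """Probe a mountpoint with stat() from a throwaway daemon thread.

    A healthy mount answers instantly; a hung network mount blocks
    stat() indefinitely, so the thread simply never sets the event."""
    done = threading.Event()

    def probe() -> None:
        try:
            os.stat(path)
        except OSError:
            pass          # an error is still an answer, only hangs count
        done.set()

    threading.Thread(target=probe, daemon=True).start()
    return done.wait(timeout)

def lazy_unmount(path: str) -> None:
    # "umount -l" detaches the mountpoint immediately and finishes the
    # cleanup once nothing is using it any more; requires root.
    subprocess.run(["umount", "-l", path], check=True)

if __name__ == "__main__":
    mountpoint = "/mnt/mediawiki-static"   # path as seen later in this log
    if not responds(mountpoint):
        print(f"{mountpoint} is hung, detaching")
        lazy_unmount(mountpoint)
```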
[22:01:52] PROBLEM - mw2 Current Load on mw2 is CRITICAL: connect to address 185.52.2.113 port 5666: Connection refusedconnect to host 185.52.2.113 port 5666: Connection refused [22:02:57] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 2.075 second response time [22:02:59] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24500 bytes in 5.445 second response time [22:03:01] PROBLEM - mw2 Puppet on mw2 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 10 minutes ago with 0 failures [22:03:13] RECOVERY - mw2 php-fpm on mw2 is OK: PROCS OK: 6 processes with command name 'php-fpm7.2' [22:03:25] PROBLEM - mw1 Puppet on mw1 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 1 hour ago with 0 failures [22:03:27] RECOVERY - mw1 Disk Space on mw1 is OK: DISK OK - free space: / 29670 MB (38% inode=98%); [22:03:29] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.393 second response time [22:03:35] RECOVERY - mw2 Disk Space on mw2 is OK: DISK OK - free space: / 47807 MB (62% inode=99%); [22:03:38] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.007 second response time [22:03:51] RECOVERY - mw2 Current Load on mw2 is OK: OK - load average: 0.87, 0.24, 0.08 [22:03:51] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy [22:03:53] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.004 second response time [22:03:55] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.392 second response time [22:04:00] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.633 second response time [22:05:03] PROBLEM - mw1 Current Load on mw1 is WARNING: WARNING - load average: 5.07, 7.09, 6.09 [22:05:07] RECOVERY - mw1 php-fpm on mw1 is OK: PROCS OK: 13 processes with command name 'php-fpm7.2' [22:07:20] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:07:24] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:07:30] hi JohnLewis [22:07:49] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:07:52] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw3 [22:08:45] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:08:48] paladox: 503 again lol [22:08:52] huh [22:09:02] RECOVERY - mw1 Current Load on mw1 is OK: OK - load average: 4.70, 6.57, 6.15 [22:09:09] works for me [22:09:14] PROBLEM - mw3 SSH on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:09:24] I'm getting cp2, paladox [22:09:30] PROBLEM - mw3 Disk Space on mw3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:09:56] PROBLEM - mw3 JobChron Service on mw3 is CRITICAL: connect to address 81.4.121.113 port 5666: Connection refusedconnect to host 81.4.121.113 port 5666: Connection refused [22:10:03] should probably issue a notice about what's going on [22:10:37] PROBLEM - mw3 JobRunner Service on mw3 is CRITICAL: PROCS CRITICAL: 0 processes with args 'redisJobRunnerService' [22:10:37] Voidwalker: I did a post on CN for the InstantCommons issues but yeah...
we need to be up [22:10:40] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.007 second response time [22:10:44] PROBLEM - mw3 Puppet on mw3 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 8 minutes ago with 0 failures [22:11:01] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy [22:11:11] RECOVERY - mw3 SSH on mw3 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u6 (protocol 2.0) [22:11:17] Voidwalker: do you want to do Discord? [22:11:26] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.005 second response time [22:11:28] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24516 bytes in 0.691 second response time [22:11:28] RECOVERY - mw3 Disk Space on mw3 is OK: DISK OK - free space: / 52944 MB (68% inode=99%); [22:11:38] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24522 bytes in 0.418 second response time [22:11:51] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy [22:12:31] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy [22:14:56] RhinosF1, done, considering a central notice alert/sitenotice as well [22:15:44] Voidwalker: okay, Central Notice might be better - maybe mention images not showing properly as well (due to Commons being down) [22:16:23] ALL images will be down because our file service is basically broken [22:16:46] it's not only WMF Commons that's causing issues with images [22:17:26] Voidwalker: okay, well don't just say uploads might be broken is what I mean [22:30:06] PROBLEM - bacula1 Puppet on bacula1 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 8 minutes ago with 0 failures [22:38:10] last call for any issues before I sleep! It's quiet so I'm taking the chance [22:39:03] RhinosF1: the sky is falling [22:39:27] Zppix: honestly, today's the day ANYTHING could happen [22:39:33] Lol [22:39:50] central notice should be up [22:39:57] Zppix: craxzy week near enough [22:40:02] s/x/ [22:40:02] RhinosF1 meant to say: Zppi: craxzy week near enough [22:40:10] that worked [22:40:11] Lol [22:42:02] Zppix: at least it's Friday [22:45:35] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fjjMI [22:45:36] [02miraheze/puppet] 07paladox 0342fa4d4 - Update site.pp [22:45:43] * RhinosF1 hides [22:48:10] RECOVERY - lizardfs5 Disk Space on lizardfs5 is OK: DISK OK - free space: / 72763 MB (23% inode=90%); [22:48:11] RECOVERY - lizardfs5 Current Load on lizardfs5 is OK: OK - load average: 1.05, 0.52, 0.22 [22:48:11] RECOVERY - lizardfs5 Lizardfs Chunkserver Port on lizardfs5 is OK: TCP OK - 0.005 second response time on 81.4.122.196 port 9422 [22:51:45] Zppix, check out https://meta.miraheze.org/wiki/User:Void/massGBlock.js [22:51:46] [ User:Void/massGBlock.js - Miraheze Meta ] - meta.miraheze.org [22:52:10] PROBLEM - lizardfs5 Disk Space on lizardfs5 is CRITICAL: connect to address 81.4.122.196 port 5666: Connection refusedconnect to host 81.4.122.196 port 5666: Connection refused [22:52:11] PROBLEM - lizardfs5 Current Load on lizardfs5 is CRITICAL: connect to address 81.4.122.196 port 5666: Connection refusedconnect to host 81.4.122.196 port 5666: Connection refused [22:52:12] Ok [22:52:37] Ty [22:54:04] PROBLEM - puppet1 Puppet on puppet1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_puppet]
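The "s/x/" exchange a little earlier is a nice illustration of sed-style IRC correction bots, including their classic trap: the substitution is applied to the first match in the line, which is why it clipped the "x" from "Zppix" instead of fixing "craxzy". A minimal sketch of the core logic only, with nick tracking and IRC plumbing left out as assumptions about the surrounding bot:

```python
import re

def apply_correction(last_message: str, command: str) -> str | None:
    """Apply an s/pattern/replacement/ command to a user's last message.

    Like sed, only the first match is replaced unless a trailing 'g'
    flag is given, which is exactly why 's/x/' in this log mangled
    'Zppix' instead of fixing 'craxzy'."""
    m = re.fullmatch(r"s/((?:[^/\\]|\\.)*)/((?:[^/\\]|\\.)*)/?(g?)", command)
    if not m:
        return None                       # not a well-formed correction
    pattern, repl, flags = m.groups()
    count = 0 if "g" in flags else 1      # 0 means "replace all" in re.sub
    try:
        return re.sub(pattern, repl, last_message, count=count)
    except re.error:
        return None                       # bad pattern or replacement

if __name__ == "__main__":
    print(apply_correction("Zppix: craxzy week near enough", "s/x/"))
    # -> 'Zppi: craxzy week near enough', matching the bot output above
```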
[22:54:11] PROBLEM - lizardfs5 Lizardfs Chunkserver Port on lizardfs5 is CRITICAL: connect to address 81.4.122.196 and port 9422: Connection refused [22:57:02] RECOVERY - lizardfs5 SSH on lizardfs5 is OK: SSH OK - OpenSSH_7.9p1 Debian-10 (protocol 2.0) [22:58:10] RECOVERY - lizardfs5 Disk Space on lizardfs5 is OK: DISK OK - free space: / 72905 MB (23% inode=90%); [22:58:11] RECOVERY - lizardfs5 Current Load on lizardfs5 is OK: OK - load average: 1.28, 0.60, 0.23 [22:58:11] RECOVERY - lizardfs5 Lizardfs Chunkserver Port on lizardfs5 is OK: TCP OK - 0.011 second response time on 81.4.122.196 port 9422 [22:58:40] RECOVERY - lizardfs5 Puppet on lizardfs5 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:02:40] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fjjMW [23:02:42] [02miraheze/puppet] 07paladox 03b90404c - lizardfs:client: Add mfsdelayedinit to the options. This will make connecting to the lizardfs master happen in the background. [23:04:29] JohnLewis ^ [23:05:25] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:05:40] okay [23:06:56] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:07:19] RECOVERY - mw1 MirahezeRenewSsl on mw1 is OK: TCP OK - 0.001 second response time on 185.52.1.75 port 5000 [23:09:47] RECOVERY - mw3 JobChron Service on mw3 is OK: PROCS OK: 1 process with args 'redisJobChronService' [23:10:31] RECOVERY - mw3 JobRunner Service on mw3 is OK: PROCS OK: 1 process with args 'redisJobRunnerService' [23:10:36] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [23:11:48] File storage should be recovered now :) [23:14:13] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fjjM2 [23:14:14] [02miraheze/puppet] 07paladox 030d8a109 - Add mw4 to manifest [23:16:57] RECOVERY - mw4 Puppet on mw4 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [23:22:04] RECOVERY - puppet1 Puppet on puppet1 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [23:23:08] PROBLEM - test1 Puppet on test1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 19 seconds ago with 1 failures. Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static] [23:26:57] I've disabled the central notice [23:27:06] RECOVERY - test1 Puppet on test1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:27:25] Voidwalker: your script doesn't seem to work on Special:MassBlock, I get no interface [23:27:28] and I've cleared cache [23:27:50] Special:MassGlobalBlock ? [23:28:14] Voidwalker: the script page says Special:MassBlock xD [23:28:39] "Script that allows for mass (global) blocking via Special:MassGlobalBlock" [23:28:56] oh [23:29:02] I see where it's wrong [23:29:02] Yeah :P