[00:00:00] PROBLEM - cp4 Stunnel Http for misc2 on cp4 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 2319 bytes in 0.080 second response time [00:00:04] SPF|Cloud: how much of an impact though? [00:00:18] PROBLEM - cp3 HTTPS on cp3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4055 bytes in 0.660 second response time [00:00:48] RhinosF1: large, faster i/o would also make the process faster [00:01:07] it is expected - doing lots of checks in the background [00:01:14] ok [00:02:37] PROBLEM - cp3 Stunnel Http for test2 on cp3 is CRITICAL: NRPE: Command 'check_stunnel_test2' not defined [00:02:47] PROBLEM - cp3 Stunnel Http for mw7 on cp3 is CRITICAL: NRPE: Command 'check_stunnel_mw7' not defined [00:02:54] PROBLEM - cp3 Stunnel Http for mw6 on cp3 is CRITICAL: NRPE: Command 'check_stunnel_mw6' not defined [00:02:58] PROBLEM - cp3 Stunnel Http for mw4 on cp3 is CRITICAL: NRPE: Command 'check_stunnel_mw4' not defined [00:03:28] it's okay icinga-miraheze, you're next [00:03:29] PROBLEM - cp3 Stunnel Http for mw5 on cp3 is CRITICAL: NRPE: Command 'check_stunnel_mw5' not defined [00:04:03] PROBLEM - cp4 Current Load on cp4 is CRITICAL: CRITICAL - load average: 0.30, 3.31, 2.48 [00:04:53] PROBLEM - cp4 HTTPS on cp4 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4136 bytes in 0.008 second response time [00:06:00] PROBLEM - cp4 Current Load on cp4 is WARNING: WARNING - load average: 0.09, 1.57, 1.95 [00:07:50] paladox: still starting? [00:07:54] yup [00:07:58] RECOVERY - cp4 Current Load on cp4 is OK: OK - load average: 0.04, 0.75, 1.52 [00:07:58] oh [00:08:01] it started now [00:08:09] awesome [00:08:13] !log running mysql_upgrade [00:08:28] sorry, I set read_only=1 quickly :P [00:08:46] RECOVERY - cp4 HTTPS on cp4 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 1531 bytes in 0.047 second response time [00:08:52] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 19704 bytes in 6.025 second response time [00:09:39] RECOVERY - cp3 HTTPS on cp3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 1532 bytes in 1.391 second response time [00:09:42] RECOVERY - cp8 HTTPS on cp8 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 1531 bytes in 0.487 second response time [00:11:55] paladox: if you see progress (how many dbs done?), it is mandatory to provide info to us :P [00:12:11] it says "Phase 3/7: Fixing views" [00:12:14] and that's about it [00:13:01] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:13:18] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 19686 bytes in 1.066 second response time [00:13:53] uh oh, meta is loading [00:14:17] It'll be slow as db6 load is at 7 [00:14:25] due to i/o i think (due to mysql_upgrade) [00:14:26] paladox: shall I kill nginx temporarily? 
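For context on the mysql_upgrade run being discussed above: the tool walks every database and table in seven phases (the log quotes "Phase 3/7: Fixing views" shortly after this). A minimal sketch of how such a run is typically started, assuming a local MariaDB socket and credentials in a defaults file; the exact flags used on db6 are not shown in the log:

    # Sketch only: run after the upgraded MariaDB server has started.
    # --verbose prints the per-phase and per-database progress quoted in the log.
    mysql_upgrade --defaults-file=/root/.my.cnf --verbose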
[00:14:30] yeh [00:14:43] there's a salt master on the new server [00:14:45] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 51% [00:14:45] (puppet2) [00:15:40] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 59% [00:16:43] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 90% [00:17:09] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4197 bytes in 0.063 second response time [00:17:43] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 68% [00:18:10] !log stop nginx on mediawiki servers to prevent load on db6, while performing upgrade [00:20:24] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: connect to address 185.52.1.75 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [00:22:07] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy [00:22:08] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 19686 bytes in 1.102 second response time [00:22:22] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 19687 bytes in 0.327 second response time [00:22:24] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 541 bytes in 0.009 second response time [00:22:35] RECOVERY - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is OK: OK - NGINX Error Rate is 11% [00:22:49] SPF|Cloud ^ Guess you forgot puppet :P [00:22:53] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 19687 bytes in 0.822 second response time [00:22:54] oh [00:22:57] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [00:22:59] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 19704 bytes in 0.520 second response time [00:23:03] they shouldn't even be pooled, nvm [00:23:07] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 19686 bytes in 0.321 second response time [00:23:37] RECOVERY - cp8 Varnish Backends on cp8 is OK: All 6 backends are healthy [00:23:43] SPF|Cloud it's done "poserdazfreebieswiki.dpl_clview OK" and now still running.. [00:24:02] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 26% [00:24:20] !log stopped most nginxes again [00:24:55] * RhinosF1 is logging off so he can sleep. Thanks to everyone both in SRE and as users. You've been great to support tonight and good luck for the rest of the update. [00:25:05] paladox: but so far it does look like everything went fine [00:25:10] yup [00:25:31] RhinosF1: sleep well. Thanks so much for the assistance and positivity. [00:25:44] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: connect to address 185.52.2.113 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [00:25:57] SPF|Cloud: no problem. You've done great! [00:26:04] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 4 backends are down. 
mw4 mw5 mw6 mw7 [00:26:08] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4195 bytes in 0.106 second response time [00:26:16] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4195 bytes in 0.077 second response time [00:26:24] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: connect to address 185.52.1.75 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [00:26:26] SPF|Cloud it's running "Phase 4/7: Running 'mysql_fix_privilege_tables'" now [00:26:33] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 71% [00:26:44] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4199 bytes in 0.202 second response time [00:27:01] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4199 bytes in 0.200 second response time [00:27:06] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 51.77.107.210/cpweb, 2001:41d0:800:1056::2/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb [00:27:07] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4197 bytes in 0.077 second response time [00:27:30] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: connect to address 81.4.121.113 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [00:27:32] PROBLEM - cp8 Varnish Backends on cp8 is CRITICAL: 4 backends are down. mw4 mw5 mw6 mw7 [00:27:51] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 98% [00:29:47] !log disable puppet on lizardfs6 and mw4-7 [00:30:12] PROBLEM - lizardfs6 Puppet on lizardfs6 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 8 minutes ago with 0 failures [00:31:06] It's running "Phase 5/7: Fixing table and database names" now [00:31:18] SPF|Cloud ^ [00:31:22] woo [00:31:46] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 541 bytes in 0.012 second response time [00:32:24] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 541 bytes in 0.008 second response time [00:32:26] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 58% [00:33:00] paladox: I presume it's at the next phase now? [00:33:07] yup [00:33:16] it's on the a's [00:33:17] * SPF|Cloud is spying on you with show full processlist; [00:33:27] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 541 bytes in 0.007 second response time [00:33:28] now b's [00:33:30] heh [00:34:24] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 78% [00:35:54] on the c's [00:37:42] on the d's [00:38:04] on the e's [00:40:31] on the f's [00:41:51] on the g's [00:42:17] on the h's [00:42:42] is this phase 6 or 7? [00:44:18] on the i's [00:44:55] SPF|Cloud 6 [00:45:03] ah [00:48:46] on the m's [00:54:57] on the q's [00:55:04] on the r's [00:59:52] on the t's [01:04:21] on the v's [01:08:02] SPF|Cloud finished! [01:08:24] alright! 
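The per-letter progress reports above came from watching the server rather than the tool itself ("spying on you with show full processlist"). A minimal sketch of that kind of monitoring from a second session, assuming client credentials are available locally:

    # Show which database/table mysql_upgrade is currently touching.
    mysql -e "SHOW FULL PROCESSLIST;"

    # Or refresh it every few seconds, hiding idle threads.
    watch -n 5 'mysql -e "SHOW FULL PROCESSLIST;" | grep -v Sleep'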
[01:08:30] !log starting nginx [01:09:54] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 19685 bytes in 0.270 second response time [01:09:56] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 19686 bytes in 0.374 second response time [01:10:04] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 19687 bytes in 1.912 second response time [01:11:13] RECOVERY - cp8 Varnish Backends on cp8 is OK: All 6 backends are healthy [01:11:19] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 43% [01:11:49] RECOVERY - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is OK: OK - NGINX Error Rate is 23% [01:12:48] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 19705 bytes in 1.605 second response time [01:13:13] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 19704 bytes in 0.894 second response time [01:13:19] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 14% [01:13:38] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy [01:13:51] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [01:15:28] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [01:15:33] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [01:15:52] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 46% [01:16:39] time for c2 wikis [01:17:52] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 25% [01:21:43] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 51.77.107.210/cpweb, 2001:41d0:800:1056::2/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb [01:21:45] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 51.77.107.210/cpweb, 2001:41d0:800:1056::2/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb [01:23:15] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:23:50] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw4 mw7 [01:23:53] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw7 [01:25:22] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 19686 bytes in 8.601 second response time [01:25:34] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:25:56] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy [01:26:01] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [01:27:36] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 19687 bytes in 7.254 second response time [01:29:57] !log db6 MariaDB [(none)]> set global innodb_io_capacity=100; [01:36:48] !log MariaDB [(none)]> set global innodb_io_capacity_max=500; [01:41:51] !log stopping nginx on mw[4567] [01:42:00] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:42:06] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. 
mw4 mw5 [01:42:24] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:42:29] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 2 backends are down. mw4 mw7 [01:42:58] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:45:19] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 92% [01:45:42] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 90% [01:46:24] PROBLEM - cp8 Varnish Backends on cp8 is CRITICAL: 4 backends are down. mw4 mw5 mw6 mw7 [01:46:27] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4199 bytes in 0.064 second response time [01:46:42] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4199 bytes in 0.102 second response time [01:50:25] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 57% [01:52:25] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [02:11:44] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvl66 [02:11:45] [02miraheze/puppet] 07paladox 03cee92b9 - grafana: Switch to db6 [02:12:11] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-5 [+0/-0/±1] 13https://git.io/Jvl6i [02:12:13] [02miraheze/puppet] 07paladox 03f3e353a - matomo: Switch to db6 [02:12:14] [02puppet] 07paladox created branch 03paladox-patch-5 - 13https://git.io/vbiAS [02:12:16] [02puppet] 07paladox opened pull request 03#1233: matomo: Switch to db6 - 13https://git.io/Jvl6P [02:15:00] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/Jvl6y [02:15:02] [02miraheze/puppet] 07paladox 031053534 - icinga: Switch to db6 [02:15:03] [02puppet] 07paladox created branch 03paladox-patch-10 - 13https://git.io/vbiAS [02:15:08] [02puppet] 07paladox opened pull request 03#1234: icinga: Switch to db6 - 13https://git.io/Jvl6S [02:15:39] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/Jvl69 [02:15:41] [02miraheze/puppet] 07paladox 0365c215c - Update init.pp [02:15:42] [02puppet] 07paladox synchronize pull request 03#1234: icinga: Switch to db6 - 13https://git.io/Jvl6S [02:15:53] [02puppet] 07paladox closed pull request 03#1234: icinga: Switch to db6 - 13https://git.io/Jvl6S [02:15:55] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±2] 13https://git.io/Jvl6Q [02:15:56] [02miraheze/puppet] 07paladox 034531e22 - icinga: Switch to db6 (#1234) * icinga: Switch to db6 * Update init.pp [02:16:09] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvl65 [02:16:10] [02miraheze/puppet] 07paladox 039de4c51 - roundcubemail: Switch to db6 [02:16:17] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-10 [02:16:19] [02puppet] 07paladox deleted branch 03paladox-patch-10 - 13https://git.io/vbiAS [02:16:30] [02puppet] 07paladox deleted branch 03paladox-patch-2 - 13https://git.io/vbiAS [02:16:31] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-2 [02:21:06] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvl6p [02:21:07] [02miraheze/puppet] 07paladox 0329f3758 - Update main.pp [02:23:34] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 53% [02:23:47] 
[02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvlif [02:23:48] [02miraheze/puppet] 07paladox 03309da41 - Update init.pp [02:25:30] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 92% [02:28:52] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-2 [+0/-0/±1] 13https://git.io/JvliT [02:28:54] [02miraheze/puppet] 07paladox 031412023 - mariadb: Bind to 0.0.0.0 This will allow port 3306 to be served over both ipv6 and ipv4. [02:28:55] [02puppet] 07paladox created branch 03paladox-patch-2 - 13https://git.io/vbiAS [02:28:57] [02puppet] 07paladox opened pull request 03#1235: mariadb: Bind to 0.0.0.0 - 13https://git.io/Jvlik [02:31:59] [02puppet] 07paladox synchronize pull request 03#1235: mariadb: Bind to 0.0.0.0 - 13https://git.io/Jvlik [02:32:01] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-2 [+0/-0/±1] 13https://git.io/JvliI [02:32:02] [02miraheze/puppet] 07paladox 03a3d7baa - Update mw.cnf.erb [03:24:25] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 59% [03:26:25] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 84% [03:38:21] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 56% [03:40:20] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [04:10:11] PROBLEM - misc1 HTTPS on misc1 is CRITICAL: connect to address 185.52.1.76 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [04:10:29] PROBLEM - misc1 Puppet on misc1 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 9 minutes ago with 0 failures [04:10:46] PROBLEM - misc1 grafana.miraheze.org HTTPS on misc1 is CRITICAL: connect to address 185.52.1.76 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [04:11:32] PROBLEM - misc1 icinga.miraheze.org HTTPS on misc1 is CRITICAL: connect to address 185.52.1.76 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [04:13:16] !log MariaDB [(none)]> set global innodb_io_capacity_max=200; [04:48:04] [02mw-config] 07paladox synchronize pull request 03#2881: database: Switch to db6 - 13https://git.io/JvluL [04:48:05] [02miraheze/mw-config] 07paladox pushed 031 commit to 03paladox-patch-3 [+0/-0/±1] 13https://git.io/JvlXU [04:48:07] [02miraheze/mw-config] 07paladox 03a635c3b - Update Database.php [04:50:20] [02mw-config] 07paladox closed pull request 03#2881: database: Switch to db6 - 13https://git.io/JvluL [04:50:21] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JvlXI [04:50:23] [02miraheze/mw-config] 07paladox 03cce9213 - database: Switch to db6 (#2881) * database: Switch to db6 * Update Database.php [04:50:24] [02mw-config] 07paladox deleted branch 03paladox-patch-3 - 13https://git.io/vbvb3 [04:50:26] [02miraheze/mw-config] 07paladox deleted branch 03paladox-patch-3 [04:51:46] [02miraheze/mw-config] 07paladox pushed 031 commit to 03paladox-patch-3 [+0/-0/±1] 13https://git.io/JvlXt [04:51:48] [02miraheze/mw-config] 07paladox 035c19f72 - Unset readonly from all wikis apart from a few [04:51:49] [02mw-config] 07paladox created branch 03paladox-patch-3 - 13https://git.io/vbvb3 [04:51:51] [02mw-config] 07paladox opened pull request 03#2882: Unset readonly from all wikis apart from a few - 13https://git.io/JvlXq [04:52:25] RECOVERY - test1 Puppet on test1 is OK: OK: Puppet is currently enabled, 
last run 1 minute ago with 0 failures [04:53:29] [02miraheze/mw-config] 07paladox pushed 031 commit to 03paladox-patch-3 [+0/-0/±1] 13https://git.io/JvlXY [04:53:30] [02miraheze/mw-config] 07paladox 03ee04372 - Update LocalSettings.php [04:53:32] [02mw-config] 07paladox synchronize pull request 03#2882: Unset readonly from all wikis apart from a few - 13https://git.io/JvlXq [04:57:19] [02mw-config] 07paladox closed pull request 03#2882: Unset readonly from all wikis apart from a few - 13https://git.io/JvlXq [04:57:20] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±2] 13https://git.io/JvlXZ [04:57:22] [02miraheze/mw-config] 07paladox 031d790b3 - Unset readonly from all wikis apart from a few (#2882) * Unset readonly from all wikis apart from a few * Update LocalSettings.php [04:57:23] [02mw-config] 07paladox deleted branch 03paladox-patch-3 - 13https://git.io/vbvb3 [04:57:25] [02miraheze/mw-config] 07paladox deleted branch 03paladox-patch-3 [04:59:12] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 19705 bytes in 0.387 second response time [04:59:43] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [04:59:45] RECOVERY - cp8 Varnish Backends on cp8 is OK: All 6 backends are healthy [04:59:59] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 56% [05:01:55] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 84% [05:03:12] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4197 bytes in 0.102 second response time [05:03:44] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 4 backends are down. mw4 mw5 mw6 mw7 [05:03:45] PROBLEM - cp8 Varnish Backends on cp8 is CRITICAL: 4 backends are down. mw4 mw5 mw6 mw7 [05:05:12] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 19685 bytes in 0.478 second response time [05:06:21] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 55% [05:06:34] PROBLEM - yellowiki.xyz - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'yellowiki.xyz' expires in 15 day(s) (Mon 02 Mar 2020 05:03:40 AM GMT +0000). [05:06:47] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JvlXW [05:06:48] [02miraheze/ssl] 07MirahezeSSLBot 038d9b924 - Bot: Update SSL cert for yellowiki.xyz [05:08:21] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 83% [05:09:26] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:10:44] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 19686 bytes in 0.589 second response time [05:11:55] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 53% [05:12:23] RECOVERY - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is OK: OK - NGINX Error Rate is 34% [05:12:35] RECOVERY - yellowiki.xyz - LetsEncrypt on sslhost is OK: OK - Certificate 'yellowiki.xyz' will expire on Fri 15 May 2020 04:06:41 AM GMT +0000. 
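The innodb_io_capacity changes logged above (01:29, 01:36, and again at 04:13) cap how much background flushing I/O InnoDB schedules, which matters on an HDD-backed host like db6. A sketch of inspecting and adjusting them at runtime; the values are simply the ones quoted in the log, not a recommendation:

    # Current settings.
    mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity%';"

    # Lower the background flushing budget, as logged for db6.
    mysql -e "SET GLOBAL innodb_io_capacity = 100;"
    mysql -e "SET GLOBAL innodb_io_capacity_max = 500;"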
[05:13:55] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 76% [05:14:12] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 19687 bytes in 0.410 second response time [05:14:57] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:15:55] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 20% [05:18:14] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 19686 bytes in 8.762 second response time [05:18:27] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:20:27] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 59% [05:21:15] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 19686 bytes in 0.993 second response time [05:24:26] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 19686 bytes in 0.474 second response time [05:26:18] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 19704 bytes in 4.908 second response time [05:28:53] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:04] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:31:44] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 1351 bytes in 0.067 second response time [05:33:16] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 19704 bytes in 8.960 second response time [05:34:32] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 60% [05:35:03] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 19687 bytes in 0.640 second response time [05:36:29] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 59% [05:36:58] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:37:43] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:38:58] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 19686 bytes in 2.792 second response time [06:38:01] paladox, PuppyKun, Reception123, SPF|Cloud: should we still be down? It’s 503ing [06:38:32] @Site Reliability Engineers [06:40:47] I'm not sure since I didn't do the migration [06:40:59] I can try taking a look but I doubt I can do anything [06:42:24] Reception123: paladox said we we’re back at 01:17 on discord but icinga,phab and wikis are all down [06:46:52] RhinosF1: I can just see that db6 is really slow [06:46:58] Reception123: wiki just loaded very slow but not rendered rigjt [06:47:18] Reception123: hmm, that’s not good. 
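When "db6 is really slow" as noted above, the first question is whether the box is I/O-bound or query-bound. A sketch of the quick checks that answer that, assuming sysstat is installed (the log only shows it being installed later, and on cloud1):

    # Load average and per-device I/O wait; sustained high %iowait points at the disks.
    uptime
    iostat -x 5 3

    # Pending InnoDB I/O requests are another sign the storage is the bottleneck.
    mysql -e "SHOW ENGINE INNODB STATUS\G" | grep -i pending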
[06:47:34] Back to 503 [06:51:21] Reception123: no edits since 2pm my time [06:52:01] Reception123: this is UBM [06:52:03] UBN* [06:52:21] well we have to wait for the others, I'm looking at what I can but I don't think I'll get far [06:53:04] Zppix: I’ve pinged everyone [06:53:12] Even owen to wake john [06:53:24] But paladox sleeps well usually [06:57:44] I have to go to bed its late and i work tomorrow or else id stay up and help [06:58:49] Zppix: I don’t want to go yet but I need to get another 2-3 hrs sleep as well [06:59:13] Reception123: how long u around for [06:59:22] quite a while [07:10:46] Zppix: RhinosF1 shouldn't https://github.com/miraheze/mw-config/blob/master/LocalSettings.php#L24 be DB6? [07:10:46] [ mw-config/LocalSettings.php at master · miraheze/mw-config · GitHub ] - github.com [07:14:15] Try it Reception123 [07:14:23] Worse thing it cAn do is nothing [07:15:33] true [07:15:38] we're completely down anyway [07:17:30] [02miraheze/mw-config] 07Reception123 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvl1h [07:17:31] [02miraheze/mw-config] 07Reception123 0313df231 - try to change $wgLocalVirtualHosts to db6 IP [07:17:56] Reception123: force puppet runs [07:18:07] yeah doing [07:19:37] Reception123: any luck? [07:19:55] nope :( reverting [07:20:01] Wait dont [07:20:05] I think its fine [07:20:19] Reception123: [07:20:27] Zppix: well technically it should be like that [07:20:39] when paladox or someone comes I'll let them know of the change in case it shouldn't [07:20:57] Reception123: it was showing meta for second fhen it died [07:21:09] I wonder if the import is breaking it [07:21:13] Zppix: https://github.com/miraheze/mw-config/commit/05b3c870b20ac07d0922a8cf0169de2965332759 this shows it's the correct procedure when moving dbs [07:21:14] [ 81.4.125.112 -> 81.4.109.166 · miraheze/mw-config@05b3c87 · GitHub ] - github.com [07:21:26] Zppix: don't see an active import if I check htop on db6 [07:21:53] Then what the hell is causing this [07:22:08] Now im just getting 502 Reception123 [07:22:15] Zppix: I got that a few times last time [07:22:23] *before not last time [07:23:25] I just announced on discord that our down status is known [07:23:46] Zppix: I think ATT might be the culprit [07:23:54] How so? [07:24:11] Zppix: Sat Feb 15 7:18:46 UTC 2020 mw4 allthetropeswiki SqlBagOStuff::fetchBlobMulti db6.miraheze.org 1146 Table 'allthetropeswiki.objectcache' doesn't exist (db6.miraheze.org) SELECT keyname,value,exptime FROM `objectcache` WHERE keyname = 'allthetropeswiki:messages:en:status' [07:24:34] How do we fix that [07:25:26] Zppix: I don't know why it says it doesn't exist though, maybe a migration problem? [07:25:38] Idk how so we dix [07:25:39] Fix [07:25:45] Zppix: 'use allthetropeswiki' is slow for now not even working [07:26:18] it just won't do it [07:26:42] Maybe we just need to drop it :P jk Reception123 [07:27:21] Zppix: well db6 is just too slow, a select for meta doesn't work and neither does USE att [07:27:46] So basically we have no db? [07:28:01] Can we switch back to the old db temp? [07:28:17] Zppix: We probably could but then it would mean the migration was for nothing [07:28:39] We need to do something until the other sre are back [07:28:49] We cant have an outage overnight [07:28:53] Well lose wikis [07:29:21] Zppix: We could switch ATT back to db5 maybe? [07:29:42] Reception123: if you know how go for it [07:29:57] Did they keep db5 database intact? 
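A sketch of narrowing down the objectcache error quoted above, i.e. checking whether that one table failed to arrive on db6 or is missing entirely; the database and host names are taken from the error message, and db5's hostname is assumed to follow the same scheme:

    # Did the table make it to the new server?
    mysql -h db6.miraheze.org -e "SHOW TABLES IN allthetropeswiki LIKE 'objectcache';"

    # Compare with the old server.
    mysql -h db5.miraheze.org -e "SHOW TABLES IN allthetropeswiki LIKE 'objectcache';"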
[07:30:24] yeah, they should've [07:30:29] and I'm not sure how but I'll look [07:31:05] Im just not sure why an error for att would break all wikis Reception123 [07:31:19] That sounds like bad error handling [07:33:02] Zppix: I know, it's probably not that [07:33:13] sounds like I should have exported a backup of our wiki before this maintenance [07:34:07] Tegu: no data loss is expected [07:34:11] phew :) [07:34:21] We still have the old db intact [07:38:04] Reception123: GL im off to bed im tired [07:38:17] Zppix: good night [07:38:43] Reception123: wait have you tried restarting the db? [07:39:02] Zppix: would that not make things worse? [07:39:33] How so? [07:39:52] Zppix: data loss? [07:40:22] Idk [07:40:46] I cant think of any other solutions [07:41:22] Besides literally dropping ATT and maybe reimport [07:42:15] But anyway before i come up with more crazy ideas im going to bed [07:44:07] well, good night [07:44:13] Reception123: is jobrunner operating? [07:44:20] Reception123: i just thought of that [07:44:36] let me see which server that's on [07:44:48] Reception123: jobrunner1 [07:44:59] oh yes I forgot it has its own [07:45:34] !log started jobrunner on jobrunner1 [07:45:50] I wonder if thats why [07:46:29] Zppix: yup, that was why... [07:46:37] Im a genius [07:46:39] Phabricator is still down but Meta seems to be back up [07:46:41] Zppix: you saved the day :) [07:47:18] Zppix: it's still really slow but it seems to work [07:47:44] I was legit getting ready to lay down and i was like oh i wonder if jobrunner is working Reception123 [07:48:15] Zppix: heh that was a good idea [07:48:26] though the others should really come and see why it's so slow and what's up with Phab [07:49:20] The main thing is the wikis work, the other services arent the end of the world [07:49:51] Reception123: it may be varnish hasnt had time to recache? [07:49:58] could be [07:50:07] Zppix: well they don't really because it's still really really slow [07:51:03] Landing page is fast Reception123 [07:51:43] Yeah [07:52:01] Anyway im going to bed (maybe?) [09:50:36] Reception123: Are we supposed to still be read only [09:50:50] Warning: The database has been locked for maintenance, so you will not be able to save your edits right now. You may wish to copy and paste your text into a text file and save it for later. [09:50:50] The system administrator who locked it offered this explanation: The master database server is running in read-only mode. [09:52:57] We just 503’d [10:13:40] Update: We are aware of an issue causing slowness for wikis and most services to be unavailable. We are waiting on either <@585842771256934431> or <@484010048004030484> to be available as it is an issue with the migration. There is a good chance they won’t be available for a few hours unfortunately as they are asleep. We will try to get an incident report as soon as we can. [10:14:17] That’s paladox or SPF|Cloud [10:29:24] No idea, but I'm not going to mess with it [10:30:14] Reception123: leave it. I have a good idea what they’ve done but why baffles me. It’s likely the cause as well. [12:16:06] paladox, SPF|Cloud: ? [13:30:28] i'm around [13:35:46] paladox: wikis are slow + readonly, phab is showing a db error + everything else is err_connection_failed [13:36:00] yup, i am aware. [13:36:05] Please work out wtf is going on. Anyway, I’m off out [13:36:17] paladox: that’s not good 20 hours on [13:36:34] RhinosF1 we are on hdd's, things are going to be slow. 
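The fix that eventually brought the wikis back (see 07:45 above) was simply starting the job runner again on jobrunner1. A minimal check-and-start sketch; the systemd unit name used here is an assumption, the log only says "started jobrunner":

    # Is the job runner actually running? (unit name assumed)
    systemctl status jobrunner

    # If not, start it and watch its log to confirm jobs are being picked up.
    systemctl start jobrunner
    journalctl -u jobrunner -f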
[13:36:54] paladox: can you make an update on discord [13:44:03] ok [14:38:20] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvl71 [14:38:22] [02miraheze/mw-config] 07paladox 03ba42894 - Only keep allthetropeswiki in read only [14:40:22] paladox: almost there? [14:40:37] SPF|Cloud i've restored access (which seems to be holding) [14:40:43] i'm still restoring att though [14:40:52] piwik i can do later after all wikis are done [14:41:04] How much of it has been done? [14:41:36] paladox: wikis are read only even without that for some reason [14:41:54] where? [14:42:20] oh [14:42:20] paladox: try edit on meta. i said when you turned up [14:42:40] 13:35:46 paladox: wikis are slow + readonly, phab is showing a db error + everything else is err_connection_failed [14:42:47] 13:36:01 yup, i am aware. [14:44:21] SPF|Cloud where else did you set things into read only? [14:44:51] SET GLOBAL read_only=1; [14:44:52] "The master database server is running in read-only mode." [14:44:53] oh [14:44:57] !log SET GLOBAL read_only=0; [14:45:29] RhinosF1 now it's not. [14:48:21] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvl77 [14:48:23] [02miraheze/puppet] 07paladox 033aeef34 - Update puppet2.yaml [15:00:07] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvl5q [15:00:09] [02miraheze/mw-config] 07paladox 035c3231b - Switch att back to db5 [15:04:22] paladox: should createwiki still be blocked? [15:04:42] yes, for the moment. [15:04:48] we're not ready to unlock that yet [15:05:08] paladox: k, can you respond to https://meta.miraheze.org/w/index.php?diff=96561&oldid=96479&rcid=378506 [15:05:16] [ Difference between revisions of "Stewards' noticeboard" - Miraheze Meta ] - meta.miraheze.org [15:05:28] not at the moment, we're still trying to sort out a few issues, sorry. 
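The read-only confusion above involves two separate switches: the per-wiki read-only flag in mw-config and MariaDB's own global read_only flag that was left set from the migration (SET GLOBAL read_only=1). A sketch of checking and clearing the server-side one, which is what the 14:44 !log entry records; run this on the master only:

    # Is the server itself refusing writes?
    mysql -e "SELECT @@global.read_only;"

    # Clear it once maintenance is over.
    mysql -e "SET GLOBAL read_only = 0;"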
[15:06:47] I’m not able to either [15:09:18] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvl5l [15:09:20] [02miraheze/puppet] 07paladox 032fab380 - db6: Up innodb_buffer_pool_size to 18G [15:13:33] [02puppet] 07paladox synchronize pull request 03#1235: mariadb: Bind to 0.0.0.0 - 13https://git.io/Jvlik [15:13:35] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-2 [+0/-0/±1] 13https://git.io/Jvl5E [15:13:36] [02miraheze/puppet] 07paladox 03fd4a6dc - Update mw.cnf.erb [15:16:34] !log restart mysql on db6 [15:25:07] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/Jvl5Q [15:25:08] [02miraheze/puppet] 07paladox 03d6c6bf2 - salt: Allow master to listen on ipv6 address [15:25:10] [02puppet] 07paladox created branch 03paladox-patch-10 - 13https://git.io/vbiAS [15:25:15] [02puppet] 07paladox opened pull request 03#1236: salt: Allow master to listen on ipv6 address - 13https://git.io/Jvl57 [15:25:37] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvl55 [15:25:38] [02miraheze/puppet] 07paladox 03df1c832 - Update masters.pp [15:26:31] [02puppet] 07paladox synchronize pull request 03#1236: salt: Allow master to listen on ipv6 address - 13https://git.io/Jvl57 [15:26:32] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/Jvl5F [15:26:34] [02miraheze/puppet] 07paladox 03a5e5fd8 - Update master.erb [15:26:39] [02puppet] 07paladox closed pull request 03#1236: salt: Allow master to listen on ipv6 address - 13https://git.io/Jvl57 [15:26:40] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvl5b [15:26:42] [02miraheze/puppet] 07paladox 03ad09b1e - salt: Allow master to listen on ipv6 address (#1236) * salt: Allow master to listen on ipv6 address * Update master.erb [15:26:43] [02puppet] 07paladox deleted branch 03paladox-patch-10 - 13https://git.io/vbiAS [15:26:45] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-10 [15:28:00] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvl5p [15:28:01] [02miraheze/puppet] 07paladox 03e947551 - Update master.erb [15:41:04] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvld4 [15:41:05] [02miraheze/puppet] 07paladox 03e6e4a18 - salt: Run as salt [16:05:47] !log db6: MariaDB [(none)]> set global innodb_flush_log_at_trx_commit=2; [16:11:44] !log install iotop on mw1/cloud1 [16:16:52] !log install sysstat on cloud1 [16:25:23] * hispano76 greetings [16:27:48] [02miraheze/dns] 07paladox pushed 031 commit to 03paladox-patch-3 [+0/-0/±1] 13https://git.io/JvlFA [16:27:50] [02miraheze/dns] 07paladox 03d120ccf - Rollback to old cluster [16:27:51] [02dns] 07paladox created branch 03paladox-patch-3 - 13https://git.io/vbQXl [16:27:53] [02dns] 07paladox opened pull request 03#127: Rollback to old cluster - 13https://git.io/JvlFx [16:28:52] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JvlFh [16:28:53] [02miraheze/mw-config] 07paladox 039ae4b6b - Set all wikis to read only [16:29:28] [02miraheze/mw-config] 07paladox pushed 031 commit to 03paladox-patch-3 [+0/-0/±1] 13https://git.io/Jvlbe [16:29:30] [02miraheze/mw-config] 07paladox 0361af94b - Switch back to old cluster [16:29:31] [02mw-config] 07paladox created branch 03paladox-patch-3 - 13https://git.io/vbvb3 [16:29:33] [02mw-config] 07paladox opened pull request 
03#2883: Switch back to old cluster - 13https://git.io/Jvlbv [16:33:08] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/JvlbJ [16:33:10] [02miraheze/puppet] 07paladox 03a0e6c02 - Switch back to the old cluster [16:33:11] [02puppet] 07paladox created branch 03paladox-patch-10 - 13https://git.io/vbiAS [16:33:32] [02puppet] 07paladox opened pull request 03#1237: Switch back to the old cluster - 13https://git.io/JvlbU [16:35:39] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/JvlbL [16:35:41] [02miraheze/puppet] 07paladox 03ce7c014 - Update stunnel.conf [16:35:42] [02puppet] 07paladox synchronize pull request 03#1237: Switch back to the old cluster - 13https://git.io/JvlbU [16:35:49] !log db6: MariaDB [(none)]> set global innodb_flush_log_at_trx_commit=1; [16:36:53] paladox: shall I review those PRs? [16:38:20] SPF|Cloud yes please [16:39:05] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/Jvlb3 [16:39:06] [02miraheze/puppet] 07paladox 03519d27e - Update varnish.pp [16:39:08] [02puppet] 07paladox synchronize pull request 03#1237: Switch back to the old cluster - 13https://git.io/JvlbU [16:39:16] paladox: https://github.com/miraheze/mw-config/pull/2883/files [16:39:17] [ Switch back to old cluster by paladox · Pull Request #2883 · miraheze/mw-config · GitHub ] - github.com [16:39:24] there are wikis missing on c2 [16:39:36] oh [16:39:37] right [16:40:08] So did we figure out the issues? [16:40:20] [02miraheze/mw-config] 07paladox pushed 031 commit to 03paladox-patch-3 [+0/-0/±1] 13https://git.io/Jvlbl [16:40:22] [02miraheze/mw-config] 07paladox 03a68fe74 - Update Database.php [16:40:23] SPF|Cloud done [16:40:23] Zppix: too high I/O on cloud1 [16:40:23] [02mw-config] 07paladox synchronize pull request 03#2883: Switch back to old cluster - 13https://git.io/Jvlbv [16:40:48] SPF|Cloud: so what are we doing about it? 
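iotop and sysstat were installed above to see which processes are generating the I/O on cloud1. A sketch of the usual invocations; the log records the installs but not the exact commands run:

    # Per-process I/O in batch mode, showing only processes actually doing I/O.
    iotop -obn 3

    # Per-device utilisation, sampled live via sysstat.
    sar -d 1 5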
[16:40:55] [02mw-config] 07Southparkfan closed pull request 03#2883: Switch back to old cluster - 13https://git.io/Jvlbv [16:40:57] [02miraheze/mw-config] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvlb4 [16:40:58] [02miraheze/mw-config] 07paladox 031a06fd8 - Switch back to old cluster (#2883) * Switch back to old cluster * Update Database.php [16:40:59] Zppix reverting back to the old cluster [16:41:01] cannot comment on that yet [16:41:10] but yes - for now, we're rolling back [16:41:17] Damn [16:42:29] [02miraheze/dns] 07paladox pushed 031 commit to 03paladox-patch-3 [+0/-0/±1] 13https://git.io/Jvlbu [16:42:31] [02miraheze/dns] 07paladox 03e4f5e98 - Update config [16:42:32] [02dns] 07paladox synchronize pull request 03#127: Rollback to old cluster - 13https://git.io/JvlFx [16:44:41] [02dns] 07paladox closed pull request 03#127: Rollback to old cluster - 13https://git.io/JvlFx [16:44:43] [02miraheze/dns] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JvlbX [16:44:44] [02miraheze/dns] 07paladox 0368bcbfd - Rollback to old cluster (#127) * Rollback to old cluster * Update config [16:44:46] [02dns] 07paladox deleted branch 03paladox-patch-3 - 13https://git.io/vbQXl [16:44:47] [02miraheze/dns] 07paladox deleted branch 03paladox-patch-3 [16:46:13] [02puppet] 07paladox closed pull request 03#1237: Switch back to the old cluster - 13https://git.io/JvlbU [16:46:15] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±3] 13https://git.io/JvlbM [16:46:16] [02miraheze/puppet] 07paladox 032f65aa3 - Switch back to the old cluster (#1237) * Switch back to the old cluster * Update stunnel.conf * Update varnish.pp [16:46:18] [02puppet] 07paladox deleted branch 03paladox-patch-10 - 13https://git.io/vbiAS [16:46:19] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-10 [16:47:19] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvlby [16:47:21] [02miraheze/puppet] 07paladox 03ad14ba0 - Revert "salt: Run as salt" This reverts commit e6e4a18381bd259ba15555e6bb697e193a89fcaf. 
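Stopping and starting nginx across all the MediaWiki servers, and the pooling changes around it, are the kind of fan-out the salt master mentioned earlier (on puppet2) is for. The log does not show the exact commands used, so this is only a sketch of the standard Salt CLI for that pattern:

    # Confirm the MediaWiki minions respond from the master.
    salt 'mw*' test.ping

    # Stop or start nginx on all of them in one step.
    salt 'mw*' service.stop nginx
    salt 'mw*' service.start nginx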
[16:50:07] [02miraheze/dns] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvlb5 [16:50:09] [02miraheze/dns] 07paladox 03e48673b - Update config [16:51:27] !log set set global read_only=1 [16:51:40] !log starting mysql on db4 & set global read_only=1 [16:53:01] RECOVERY - cp8 Stunnel Http for misc2 on cp8 is OK: HTTP OK: HTTP/1.1 200 OK - 43457 bytes in 0.429 second response time [16:53:15] RECOVERY - db5 Puppet on db5 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:53:15] RECOVERY - misc1 Puppet on misc1 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:53:21] RECOVERY - misc1 grafana.miraheze.org HTTPS on misc1 is OK: HTTP OK: HTTP/1.1 200 OK - 29370 bytes in 0.493 second response time [16:53:41] RECOVERY - misc1 icinga.miraheze.org HTTPS on misc1 is OK: HTTP OK: HTTP/1.1 302 Found - 334 bytes in 0.008 second response time [16:54:01] RECOVERY - cp3 Stunnel Http for misc2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 43457 bytes in 0.749 second response time [16:54:02] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 2991 MB (12% inode=94%); [16:54:05] RECOVERY - misc2 HTTPS on misc2 is OK: HTTP OK: HTTP/1.1 200 OK - 43465 bytes in 0.057 second response time [16:54:10] RECOVERY - cp4 Stunnel Http for misc2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 43457 bytes in 0.060 second response time [16:54:23] RECOVERY - misc1 HTTPS on misc1 is OK: HTTP OK: HTTP/1.1 200 OK - 29340 bytes in 0.162 second response time [16:54:24] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4208 bytes in 0.025 second response time [16:55:22] PROBLEM - cp8 HTTPS on cp8 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4138 bytes in 0.380 second response time [16:55:22] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4208 bytes in 0.024 second response time [16:55:23] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 100% [16:55:39] RECOVERY - db4 MySQL on db4 is OK: Uptime: 274 Threads: 26 Questions: 9206 Slow queries: 362 Opens: 917 Flush tables: 1 Open tables: 911 Queries per second avg: 33.598 [16:55:57] RECOVERY - cp8 Varnish Backends on cp8 is OK: All 6 backends are healthy [16:56:35] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 19723 bytes in 0.287 second response time [16:57:09] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 19725 bytes in 4.547 second response time [16:57:10] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 23% [16:57:29] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 19724 bytes in 8.254 second response time [16:57:30] RECOVERY - cp8 HTTPS on cp8 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 1532 bytes in 0.462 second response time [16:57:31] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 11% [16:58:42] wich is Read-only for rollback? curiosity [16:59:39] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 19725 bytes in 4.307 second response time [16:59:53] PROBLEM - cp8 Varnish Backends on cp8 is CRITICAL: 2 backends are down. mw3 lizardfs6 [17:01:22] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [17:01:38] paladox: is this normal performance for the old cluster? 
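The innodb_flush_log_at_trx_commit flips above trade durability for less fsync traffic: with 2 the redo log is only flushed to disk about once per second, while 1 (the default, restored at 16:35) flushes on every commit. A sketch of the runtime toggle, values as logged:

    # Relax redo-log flushing while the host is I/O-starved
    # (risk: up to about a second of committed transactions lost on a crash).
    mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 2;"

    # Restore full durability once load settles.
    mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 1;"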
[17:03:01] SPF|Cloud yes [17:03:14] I'm going to re-enable the file system on the old gluster [17:03:17] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy [17:03:31] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 19725 bytes in 7.581 second response time [17:03:49] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:04:22] !log gluster volume set mvol features.read-only off - lizardfs6 [17:04:58] RECOVERY - lizardfs6 Puppet on lizardfs6 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:05:24] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 2 backends are down. mw3 lizardfs6 [17:06:01] RECOVERY - cp8 Varnish Backends on cp8 is OK: All 6 backends are healthy [17:07:58] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:08:14] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:09:38] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4212 bytes in 0.067 second response time [17:10:11] PROBLEM - cp8 Varnish Backends on cp8 is CRITICAL: 2 backends are down. mw3 lizardfs6 [17:10:33] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw1 [17:12:22] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 19725 bytes in 4.783 second response time [17:14:00] RECOVERY - cp8 Varnish Backends on cp8 is OK: All 6 backends are healthy [17:14:16] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 19725 bytes in 0.277 second response time [17:14:31] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy [17:14:37] PROBLEM - cp8 Current Load on cp8 is CRITICAL: CRITICAL - load average: 2.20, 2.16, 1.23 [17:15:51] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 19724 bytes in 7.604 second response time [17:16:26] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 19725 bytes in 0.461 second response time [17:16:34] RECOVERY - cp8 Current Load on cp8 is OK: OK - load average: 1.03, 1.70, 1.16 [17:19:51] PROBLEM - misc1 grafana.miraheze.org HTTPS on misc1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:20:02] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [17:20:08] PROBLEM - misc1 HTTPS on misc1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:20:52] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:22:54] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 19724 bytes in 4.990 second response time [17:23:22] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw3 lizardfs6 [17:25:17] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy [17:29:31] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:29:51] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [17:30:24] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:31:25] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw2 [17:31:33] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. 
mw2 [17:32:21] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/JvlND [17:32:23] [02miraheze/puppet] 07paladox 03fd8d36c - icinga: Switch back to db4 [17:32:24] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 19725 bytes in 5.049 second response time [17:32:24] [02puppet] 07paladox created branch 03paladox-patch-10 - 13https://git.io/vbiAS [17:32:26] [02puppet] 07paladox opened pull request 03#1238: icinga: Switch back to db4 - 13https://git.io/JvlNy [17:32:50] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/JvlN9 [17:32:52] [02miraheze/puppet] 07paladox 0331c17e5 - Update init.pp [17:32:53] [02puppet] 07paladox synchronize pull request 03#1238: icinga: Switch back to db4 - 13https://git.io/JvlNy [17:33:07] [02puppet] 07paladox closed pull request 03#1238: icinga: Switch back to db4 - 13https://git.io/JvlNy [17:33:08] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±2] 13https://git.io/JvlNH [17:33:10] [02miraheze/puppet] 07paladox 03ade0ca0 - icinga: Switch back to db4 (#1238) * icinga: Switch back to db4 * Update init.pp [17:33:11] [02puppet] 07paladox deleted branch 03paladox-patch-10 - 13https://git.io/vbiAS [17:33:13] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-10 [17:33:19] PROBLEM - cp8 Varnish Backends on cp8 is CRITICAL: 2 backends are down. mw3 lizardfs6 [17:33:29] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JvlN7 [17:33:31] [02miraheze/puppet] 07paladox 03490e257 - grafana: Switch back to db4 [17:33:31] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [17:33:49] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JvlN5 [17:33:51] [02miraheze/puppet] 07paladox 030670314 - roundcubemail: Switch back to db4 [17:34:05] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 5 datacenters are down: 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb [17:36:52] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:24] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy [17:37:38] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 2 backends are down. mw3 lizardfs6 [17:38:04] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 19724 bytes in 3.531 second response time [17:38:48] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 19724 bytes in 1.287 second response time [17:39:30] RECOVERY - cp8 Varnish Backends on cp8 is OK: All 6 backends are healthy [17:39:43] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [17:41:54] RECOVERY - misc1 HTTPS on misc1 is OK: HTTP OK: HTTP/1.1 200 OK - 29371 bytes in 0.025 second response time [17:44:15] RECOVERY - misc1 grafana.miraheze.org HTTPS on misc1 is OK: HTTP OK: HTTP/1.1 200 OK - 29340 bytes in 0.014 second response time [17:49:40] !log set global read_only=0; [17:49:45] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. lizardfs6 [17:50:56] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 2 backends are down. 
mw3 lizardfs6 [17:52:33] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/JvlAC [17:52:35] [02miraheze/puppet] 07paladox 03bbdd511 - Add mw4 to varnish (as a test) [17:52:36] [02puppet] 07paladox created branch 03paladox-patch-10 - 13https://git.io/vbiAS [17:52:38] [02puppet] 07paladox opened pull request 03#1239: Add mw4 to varnish (as a test) - 13https://git.io/JvlAW [17:53:33] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/JvlA8 [17:53:35] [02miraheze/puppet] 07paladox 03ce81da0 - Update stunnel.conf [17:53:36] [02puppet] 07paladox synchronize pull request 03#1239: Add mw4 to varnish (as a test) - 13https://git.io/JvlAW [17:53:47] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:53:53] [02puppet] 07paladox closed pull request 03#1239: Add mw4 to varnish (as a test) - 13https://git.io/JvlAW [17:53:55] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±2] 13https://git.io/JvlAB [17:53:56] [02miraheze/puppet] 07paladox 038ec290c - Add mw4 to varnish (as a test) (#1239) * Add mw4 to varnish (as a test) * Update stunnel.conf [17:53:58] [02puppet] 07paladox deleted branch 03paladox-patch-10 - 13https://git.io/vbiAS [17:53:59] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-10 [17:59:12] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [17:59:47] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy [18:00:48] PROBLEM - cp4 Puppet on cp4 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 5 minutes ago with 0 failures [18:02:13] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 35545 bytes in 0.407 second response time [18:03:04] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [18:04:40] paladox: working on notice regarding data drift now [18:04:45] RECOVERY - cp4 Puppet on cp4 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:04:50] SPF|Cloud ok [18:05:00] Phabricator is broken though [18:05:36] yeh i'll have to restore that [18:05:40] That one should be back online, as I'll be redirecting users there [18:05:59] Or do you want to receive requests for restore through e-mail? [18:06:57] SPF|Cloud i'm restoring phab [18:06:59] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 5 datacenters are down: 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb [18:08:30] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 1351 bytes in 0.026 second response time [18:09:25] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JvlAb [18:09:27] [02miraheze/puppet] 07paladox 03d771542 - phabricator: Switch back to db4 [18:10:33] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 19726 bytes in 7.175 second response time [18:10:37] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:11:47] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. 
mw3 [18:13:53] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy [18:14:48] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 19724 bytes in 5.921 second response time [18:15:04] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:17:00] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 19724 bytes in 0.390 second response time [18:18:15] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JvlxJ [18:18:17] [02miraheze/mw-config] 07paladox 03257d2d8 - Take all wikis out of read only [18:18:34] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JvlxT [18:18:36] [02miraheze/mw-config] 07paladox 0310e19fe - Update LocalSettings.php [18:20:14] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [18:21:32] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:23:00] PROBLEM - cp8 Varnish Backends on cp8 is CRITICAL: 3 backends are down. mw2 mw3 lizardfs6 [18:23:00] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw3 lizardfs6 [18:24:13] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 5 datacenters are down: 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb [18:24:58] RECOVERY - cp8 Varnish Backends on cp8 is OK: All 6 backends are healthy [18:24:58] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy [18:25:08] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvlxt [18:25:10] [02miraheze/puppet] 07paladox 03d895d71 - fix [18:25:36] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 19725 bytes in 2.788 second response time [18:26:46] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/Jvlx3 [18:26:47] [02miraheze/puppet] 07paladox 03b2a4e82 - Pool in mw[4567] [18:26:49] [02puppet] 07paladox created branch 03paladox-patch-10 - 13https://git.io/vbiAS [18:26:50] [02puppet] 07paladox opened pull request 03#1240: Pool in mw[4567] - 13https://git.io/Jvlxs [18:27:48] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/JvlxG [18:27:49] [02miraheze/puppet] 07paladox 03d5f5d06 - Update stunnel.conf [18:27:51] [02puppet] 07paladox synchronize pull request 03#1240: Pool in mw[4567] - 13https://git.io/Jvlxs [18:27:57] [02puppet] 07paladox edited pull request 03#1240: varnish: Pool in mw[4567] - 13https://git.io/Jvlxs [18:28:07] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4212 bytes in 0.045 second response time [18:28:15] [02puppet] 07paladox closed pull request 03#1240: varnish: Pool in mw[4567] - 13https://git.io/Jvlxs [18:28:17] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±2] 13https://git.io/JvlxZ [18:28:18] [02miraheze/puppet] 07paladox 03f9e3506 - varnish: Pool in mw[4567] (#1240) * Pool in mw[4567] * Update stunnel.conf [18:28:20] [02puppet] 07paladox deleted branch 03paladox-patch-10 - 13https://git.io/vbiAS [18:28:21] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-10 [18:30:04] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [18:30:11] RECOVERY - mw1 MediaWiki Rendering on 
[18:30:37] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[18:34:57] !log depool lizardfs6
[18:36:03] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. lizardfs6
[18:36:22] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/Jvlxa
[18:36:23] [02miraheze/puppet] 07paladox 03e427f60 - mw[4567] remove new_servers
[18:36:25] [02puppet] 07paladox created branch 03paladox-patch-10 - 13https://git.io/vbiAS
[18:36:26] [02puppet] 07paladox opened pull request 03#1241: mw[4567] remove new_servers - 13https://git.io/Jvlxw
[18:36:42] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/Jvlxr
[18:36:43] [02miraheze/puppet] 07paladox 03091ccea - Update mw5.yaml
[18:36:45] [02puppet] 07paladox synchronize pull request 03#1241: mw[4567] remove new_servers - 13https://git.io/Jvlxw
[18:36:55] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/Jvlxo
[18:36:57] [02miraheze/puppet] 07paladox 039d4dea3 - Update mw6.yaml
[18:36:58] [02puppet] 07paladox synchronize pull request 03#1241: mw[4567] remove new_servers - 13https://git.io/Jvlxw
[18:37:07] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/JvlxK
[18:37:08] [02miraheze/puppet] 07paladox 034159890 - Update mw7.yaml
[18:37:10] [02puppet] 07paladox synchronize pull request 03#1241: mw[4567] remove new_servers - 13https://git.io/Jvlxw
[18:37:17] [02puppet] 07paladox closed pull request 03#1241: mw[4567] remove new_servers - 13https://git.io/Jvlxw
[18:37:19] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±4] 13https://git.io/Jvlx6
[18:37:20] [02miraheze/puppet] 07paladox 0373e432f - mw[4567] remove new_servers (#1241) * mw[4567] remove new_servers * Update mw5.yaml * Update mw6.yaml * Update mw7.yaml
[18:37:22] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-10
[18:37:23] [02puppet] 07paladox deleted branch 03paladox-patch-10 - 13https://git.io/vbiAS
[18:39:01] paladox: have you updated phab?
[18:39:47] What do you mean by update?
[18:39:58] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 2 datacenters are down: 128.199.139.216/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[18:40:03] phabricator is back
[18:40:09] paladox: software
[18:40:16] no
[18:40:24] The look is more modern
[18:40:55] Subscribers used to be a text list and now it's showing everyone with profile pics next to their names
[18:41:00] !log repool cp4
[18:41:15] oh
[18:41:27] maybe i did when i switched over to phab1
[18:41:58] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[18:41:59] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 10 backends are healthy
[18:42:07] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[18:42:20] Since before the upgrade, I've been watching it
[18:44:05] since phab1 would have cloned master
[18:44:22] so it must have been newer than what I have at the moment on misc1
[18:45:25] https://phabricator.miraheze.org/T5233#99470 that was at the time of making this comment for example.
[18:45:28] [ ⚓ T5233 Can someone clear all the nonexistent user permissions from the group rights lists from the following wikis? ] - phabricator.miraheze.org
[18:46:58] (I'm just saying)
[18:59:15] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 1 datacenter is down: 2400:6180:0:d0::403:f001/cpweb
[19:04:08] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 81.4.109.133/cpweb
[19:04:58] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/Jvlpg
[19:05:00] [02miraheze/puppet] 07paladox 0395828d1 - Update mw4.yaml
[19:05:01] [02puppet] 07paladox created branch 03paladox-patch-10 - 13https://git.io/vbiAS
[19:05:03] [02puppet] 07paladox opened pull request 03#1242: Update mw4.yaml - 13https://git.io/Jvlp2
[19:05:04] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[19:05:15] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/Jvlpa
[19:05:16] [02miraheze/puppet] 07paladox 03e88554f - Update mw5.yaml
[19:05:18] [02puppet] 07paladox synchronize pull request 03#1242: Update mw4.yaml - 13https://git.io/Jvlp2
[19:05:26] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/JvlpV
[19:05:27] [02miraheze/puppet] 07paladox 0301a897f - Update mw6.yaml
[19:05:29] [02puppet] 07paladox synchronize pull request 03#1242: Update mw4.yaml - 13https://git.io/Jvlp2
[19:05:37] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-10 [+0/-0/±1] 13https://git.io/Jvlpw
[19:05:38] [02miraheze/puppet] 07paladox 03c84d49e - Update mw7.yaml
[19:05:40] [02puppet] 07paladox synchronize pull request 03#1242: Update mw4.yaml - 13https://git.io/Jvlp2
[19:05:56] [02puppet] 07paladox edited pull request 03#1242: Switch gluster to lizardfs6 gluster for mw[4567] - 13https://git.io/Jvlp2
[19:06:00] [02puppet] 07paladox closed pull request 03#1242: Switch gluster to lizardfs6 gluster for mw[4567] - 13https://git.io/Jvlp2
[19:06:01] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±4] 13https://git.io/Jvlpr
[19:06:03] [02miraheze/puppet] 07paladox 03030dec2 - Switch gluster to lizardfs6 gluster for mw[4567] (#1242) * Update mw4.yaml * Update mw5.yaml * Update mw6.yaml * Update mw7.yaml
[19:10:05] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:13:54] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 2 datacenters are down: 81.4.109.133/cpweb, 51.161.32.127/cpweb
[19:14:01] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 51.161.32.127/cpweb
[19:15:59] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:17:48] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[19:19:50] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw4 mw7
[19:19:59] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 2400:6180:0:d0::403:f001/cpweb
[19:20:01] PROBLEM - cp8 Varnish Backends on cp8 is CRITICAL: 2 backends are down. mw6 mw7
[19:20:08] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 2 backends are down. mw6 mw7
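Pull request #1242 above points mw[4567] at the gluster volume exported by lizardfs6. On the MediaWiki side that shared storage usually only surfaces as the upload paths; the sketch below is a hedged illustration where the /mnt/mediawiki-static mount point is taken from the Puppet failure logged later at 20:45, while the per-wiki directory layout and the static URL are assumptions rather than Miraheze's real values.

```php
<?php
// Illustration under stated assumptions: MediaWiki only needs to know where
// the shared storage is mounted; which backend serves /mnt/mediawiki-static
// (the GlusterFS volume on lizardfs6 here) is handled by Puppet on mw[4567].
// Assumes $wgDBname is already set for the current wiki, as in a farm config.
$wgUploadDirectory = "/mnt/mediawiki-static/{$wgDBname}";       // per-wiki subdirectory is an assumption
$wgUploadPath      = "https://static.miraheze.org/{$wgDBname}"; // public URL layout is an assumption
```

The point of keeping the mount path stable is that the storage backend can be swapped (as in PR #1242) without touching the MediaWiki configuration at all.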
[19:21:58] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:28:03] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 128.199.139.216/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb
[19:28:41] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 1 datacenter is down: 51.161.32.127/cpweb
[19:30:01] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:33:58] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 4 datacenters are down: 128.199.139.216/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 2607:5300:205:200::17f6/cpweb
[19:35:49] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 10 backends are healthy
[19:35:58] RECOVERY - cp8 Varnish Backends on cp8 is OK: All 10 backends are healthy
[19:36:07] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 9 backends are healthy
[19:36:25] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[19:37:35] For what it's worth, I'm gone now. Haven't gotten enough rest the past 48 hours.
[19:38:32] [02miraheze/mw-config] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/Jvlh0
[19:38:33] [02miraheze/mw-config] 07paladox 036a0c777 - Update site notice
[19:38:35] [02mw-config] 07paladox created branch 03paladox-patch-4 - 13https://git.io/vbvb3
[19:38:36] [02mw-config] 07paladox opened pull request 03#2884: Update site notice - 13https://git.io/JvlhE
[19:38:48] SPF|Cloud: go rest.
[19:38:57] [02mw-config] 07paladox synchronize pull request 03#2884: Update site notice - 13https://git.io/JvlhE
[19:38:58] [02miraheze/mw-config] 07paladox pushed 031 commit to 03paladox-patch-4 [+0/-0/±1] 13https://git.io/Jvlhu
[19:39:00] [02miraheze/mw-config] 07paladox 037ec5dda - Update Sitenotice.php
[19:40:23] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 4 datacenters are down: 128.199.139.216/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 2607:5300:205:200::17f6/cpweb
[19:41:59] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:42:32] [ANNOUNCEMENT] Important: If you edited a Miraheze wiki over the last 24 hours, please read the instructions at https://phabricator.miraheze.org/maniphest/task/edit/form/15/ - Apologies for the disruption.
[19:43:47] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw4
[19:43:53] PROBLEM - cp8 Varnish Backends on cp8 is CRITICAL: 1 backends are down. mw4
[19:44:15] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[19:48:03] !log depooled mw[4567]
[19:48:06] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw7
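Pull request #2884 above and the 19:42 announcement are the user-facing side of the maintenance. As a sketch of the mechanism only (the real Sitenotice.php wording and markup are not reproduced here), a banner like that is driven by the core $wgSiteNotice setting, which accepts wikitext and is rendered at the top of every page:

```php
<?php
// Sketch only; not the actual Miraheze Sitenotice.php.
// $wgSiteNotice is core MediaWiki and takes wikitext, so the external link
// below renders as a clickable banner on every page of every wiki.
$wgSiteNotice = 'If you edited a Miraheze wiki over the last 24 hours, please read '
    . '[https://phabricator.miraheze.org/maniphest/task/edit/form/15/ these instructions].';
```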
[19:48:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[19:49:44] * RhinosF1 - out
[19:50:07] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 9 backends are healthy
[20:39:38] [02mw-config] 07paladox closed pull request 03#2884: Update site notice - 13https://git.io/JvlhE
[20:39:39] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jvljy
[20:39:41] [02miraheze/mw-config] 07paladox 0342451fd - Update site notice (#2884) * Update site notice * Update Sitenotice.php
[20:39:42] [02mw-config] 07paladox deleted branch 03paladox-patch-4 - 13https://git.io/vbvb3
[20:39:44] [02miraheze/mw-config] 07paladox deleted branch 03paladox-patch-4
[20:39:45] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-10
[20:39:47] [02puppet] 07paladox deleted branch 03paladox-patch-10 - 13https://git.io/vbiAS
[20:45:40] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static]
[20:47:52] A bit confusing, the current sitenotice being 'Tables', paladox, excuse me
[20:49:15] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jv8ef
[20:49:16] [02miraheze/mw-config] 07paladox 035447988 - Fix
[20:49:17] hispano76 ^
[20:49:38] and I haven't looked at the mobile version
[20:49:49] ok
[20:51:57] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[22:44:15] PROBLEM - cp8 Current Load on cp8 is CRITICAL: CRITICAL - load average: 1.04, 2.39, 1.55
[22:46:19] PROBLEM - cp8 Current Load on cp8 is WARNING: WARNING - load average: 0.74, 1.85, 1.46
[22:48:22] RECOVERY - cp8 Current Load on cp8 is OK: OK - load average: 0.35, 1.32, 1.31
[22:55:11] [02miraheze/mw-config] 07paladox pushed 032 commits to 03master [+0/-0/±2] 13https://git.io/Jv8Jk
[22:55:13] [02miraheze/mw-config] 07paladox 035d06486 - Revert "disable createwiki for db migration" This reverts commit fb418d3a509c377642d5d0039fc7f75b625143ed.
[22:55:14] [02miraheze/mw-config] 07paladox 03aac552a - Revert "try to change $wgLocalVirtualHosts to db6 IP" This reverts commit 13df231200946ab5dd522fdd058ab82c716ff994.
[23:14:13] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 1 datacenter is down: 2607:5300:205:200::17f6/cpweb
[23:16:09] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[23:35:07] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jv8UT
[23:35:09] [02miraheze/services] 07MirahezeSSLBot 035fe7a12 - BOT: Updating services config for wikis
[23:41:27] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:41:43] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. lizardfs6
[23:43:24] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 20532 bytes in 0.262 second response time
[23:43:43] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 9 backends are healthy
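The two 22:55 reverts above roll back temporary workarounds from the migration: wiki creation is re-enabled and $wgLocalVirtualHosts is restored. $wgLocalVirtualHosts is the core MediaWiki setting listing hostnames that are served by the farm itself, so internal HTTP requests to them can stay on the local network instead of going back out through the cp* edge proxies. A minimal sketch, with placeholder entries rather than the values that were actually reverted:

```php
<?php
// Minimal sketch, assuming the usual farm setup: hostnames listed here are
// treated as locally served, so MediaWiki routes internal requests to them
// directly rather than via the public edge. Entries are placeholders, not
// Miraheze's real configuration.
$wgLocalVirtualHosts = [
    'meta.miraheze.org',
    // ...one entry per hosted wiki hostname
];
```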