[00:55:11] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:55:22] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:55:34] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:55:50] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:56:07] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3 [00:56:23] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [00:57:04] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw3 [00:57:22] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:58:26] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15296 bytes in 0.502 second response time [00:59:03] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy [00:59:11] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15296 bytes in 0.004 second response time [00:59:19] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 18690 bytes in 1.180 second response time [00:59:33] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 18691 bytes in 0.564 second response time [00:59:52] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18691 bytes in 0.451 second response time [01:00:04] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15295 bytes in 0.486 second response time [01:00:10] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 7 backends are healthy [01:11:25] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 2 backends are down. 
mw1 mw3 [01:13:25] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy [03:18:22] [miraheze/ManageWiki] paladox pushed 1 commit to paladox-patch-2 [+0/-0/±1] https://git.io/JvqPY [03:18:23] [miraheze/ManageWiki] paladox 10f7ef7 - modifyGroupPermission: Fix defaults for options [03:18:25] [ManageWiki] paladox created branch paladox-patch-2 - https://git.io/vpSns [03:19:12] [ManageWiki] paladox opened pull request #133: modifyGroupPermission: Fix defaults for options - https://git.io/JvqP3 [03:21:07] [ManageWiki] sonarcloud[bot] commented on pull request #133: modifyGroupPermission: Fix defaults for options - https://git.io/JvqPs [03:21:11] [ManageWiki] sonarcloud[bot] deleted a comment on pull request #133: modifyGroupPermission: Fix defaults for options - https://git.io/JvqPs [03:21:13] [ManageWiki] sonarcloud[bot] commented on pull request #133: modifyGroupPermission: Fix defaults for options - https://git.io/JvqPG [03:22:07] [ManageWiki] sonarcloud[bot] deleted a comment on pull request #133: modifyGroupPermission: Fix defaults for options - https://git.io/JvqPG [03:22:09] [ManageWiki] sonarcloud[bot] commented on pull request #133: modifyGroupPermission: Fix defaults for options - https://git.io/JvqPn [03:28:51] [miraheze/ManageWiki] paladox pushed 1 commit to paladox-patch-3 [+0/-0/±1] https://git.io/JvqP4 [03:28:53] [miraheze/ManageWiki] paladox 36382c2 - Fix removing groups Based on https://github.com/miraheze/ManageWiki/blob/master/includes/helpers/ManageWikiPermissions.php#L92 remove groups should be after add groups. 
[03:28:54] [ ManageWiki/ManageWikiPermissions.php at master · miraheze/ManageWiki · GitHub ] - github.com [03:28:54] [ManageWiki] paladox created branch paladox-patch-3 - https://git.io/vpSns [03:28:56] [ManageWiki] paladox opened pull request #134: Fix removing groups - https://git.io/JvqPR [03:30:56] [ManageWiki] sonarcloud[bot] commented on pull request #134: Fix removing groups - https://git.io/JvqPE [03:31:00] [ManageWiki] sonarcloud[bot] deleted a comment on pull request #134: Fix removing groups - https://git.io/JvqPE [03:31:02] [ManageWiki] sonarcloud[bot] commented on pull request #134: Fix removing groups - https://git.io/JvqPu [03:32:36] [ManageWiki] sonarcloud[bot] deleted a comment on pull request #134: Fix removing groups - https://git.io/JvqPu [03:32:37] [ManageWiki] sonarcloud[bot] commented on pull request #134: Fix removing groups - https://git.io/JvqPa [03:36:45] [ManageWiki] paladox closed pull request #134: Fix removing groups - https://git.io/JvqPR [03:37:03] [ManageWiki] paladox deleted branch paladox-patch-3 - https://git.io/vpSns [03:37:04] [miraheze/ManageWiki] paladox deleted branch paladox-patch-3 [05:03:24] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [05:04:14] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [05:23:31] PROBLEM - cp2 Stunnel Http for lizardfs6 on cp2 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8203: HTTP/1.1 200 OK [06:26:09] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 3179 MB (13% inode=94%); [06:37:48] !log deleted and dropped gmcwiki (T5150) [06:37:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [08:35:11] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[08:43:07] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:21:37] @wmca JosueThomasDiez [10:21:46] .wmca JosueThomasDiez [10:21:46] https://meta.wikimedia.org/wiki/Special:CentralAuth/JosueThomasDiez [10:22:49] Reception123: https://publictestwiki.com/wiki/Special:Block/JosueThomasDiez [10:22:52] [ Permission error - TestWiki ] - publictestwiki.com [10:23:08] https://publictestwiki.com/wiki/User:JosueThomasDiez [10:23:10] [ User:JosueThomasDiez - TestWiki ] - publictestwiki.com [10:24:52] Hello JosueThomasDiez! If you have any questions, feel free to ask and someone should answer soon. [10:26:54] JosueThomasDiez: May I remind you that ban evading is prohibited [10:26:59] grumble: ^ [10:27:22] RhinosF1 Give me a admin rights. [10:28:46] * grumble grumbles [10:29:07] grumble: this is our other troll [10:35:22] grumble: this is the evil impersonating one [14:29:24] I am griot [14:29:33] s/i/o/ [14:29:34] c^ meant to say: I am groot [14:35:24] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. lizardfs6 mw2 mw3 [14:35:24] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [14:35:25] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [14:35:28] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 4 backends are down. lizardfs6 mw1 mw2 mw3 [14:35:37] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 2 backends are down. mw1 mw2 [14:35:40] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[14:36:08] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:36:39] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:36:49] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:38:50] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:39:27] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4212 bytes in 0.038 second response time [14:39:34] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:39:51] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4210 bytes in 0.022 second response time [14:39:55] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4210 bytes in 0.024 second response time [14:40:44] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15296 bytes in 0.396 second response time [14:41:19] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15302 bytes in 0.662 second response time [14:41:20] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:42:15] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 18699 bytes in 7.924 second response time [14:42:51] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15302 bytes in 7.658 second response time [14:42:57] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15301 bytes in 0.007 second response time [14:43:09] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15287 bytes in 0.532 second 
response time [14:43:19] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 18698 bytes in 0.790 second response time [14:43:25] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18697 bytes in 1.099 second response time [14:43:38] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15301 bytes in 0.294 second response time [14:44:02] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 7 backends are healthy [14:44:03] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 18698 bytes in 0.417 second response time [14:44:24] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy [14:44:26] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [14:44:26] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [14:44:27] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 7 backends are healthy [14:48:54] PROBLEM - cp3 Disk Space on cp3 is WARNING: DISK WARNING - free space: / 2650 MB (10% inode=94%); [14:49:34] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [14:49:53] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:50:14] PROBLEM - mw3 MediaWiki Rendering on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:50:26] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:50:42] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[14:50:56] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:50:57] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:51:07] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:51:26] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:51:27] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw1 mw2 [14:51:41] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:51:43] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 1352 bytes in 0.022 second response time [14:51:44] huh [14:51:48] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3 [14:52:09] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [14:52:09] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. 
mw1 mw2 mw3 [14:53:08] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15288 bytes in 9.688 second response time [14:53:32] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15302 bytes in 0.506 second response time [14:53:39] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 18698 bytes in 0.213 second response time [14:53:41] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 18699 bytes in 1.177 second response time [14:54:16] RECOVERY - mw3 MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 18697 bytes in 0.438 second response time [14:54:43] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:54:43] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:56:38] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15302 bytes in 0.505 second response time [14:56:43] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15295 bytes in 0.295 second response time [14:56:43] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15296 bytes in 0.006 second response time [14:56:48] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15301 bytes in 0.005 second response time [14:57:04] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18698 bytes in 0.409 second response time [14:57:15] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15288 bytes in 0.393 second response time [14:57:30] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 18699 bytes in 1.124 second response time [14:57:32] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 7 backends are healthy [14:58:01] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are 
healthy [14:58:20] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [14:58:20] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [14:58:20] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 7 backends are healthy [15:02:16] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JvqdY [15:02:17] [miraheze/puppet] paladox 3c40ba3 - Revert "mediawiki: Increase php mysql timeout to 8" This reverts commit 5e1860d0c6d15dac14c68b6068a0ae9149a59229. [15:05:05] !log restart php7.3-fpm on mw2 and lizardfs6 [15:05:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [15:05:40] paladox: have you seen the incident at wikimedia in progress? [15:05:45] yup [15:05:49] i just saw it [15:05:57] nothing new? [15:06:05] what do you reckon? [15:06:24] well looks like esam is having severe issues. [15:06:35] though esams is for varnish. [15:06:48] !log restart php7.3-fpm on mw1 and mw3 [15:06:49] aye, so it’s probably just high traffic spike [15:07:06] esams runs one of the nameservers too afaik [15:07:07] grafana should have more details. [15:07:12] yup [15:07:26] grafana went down with during the spike.. conveniently [15:07:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [15:07:35] c^ ohhhh [15:07:39] that's due to varnish [15:07:48] it never went down for us users [15:07:54] c^: hi :) [15:07:55] same for phab [15:07:58] yeah lol, maybe they need to decouple their monitoring from everything else [15:08:03] yah [15:08:07] Reception123: o/ [15:08:19] paladox: hi [15:08:42] hi [15:09:40] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [15:09:43] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
[15:10:21] ok, what gives [15:11:02] wikipedia not loading for me now :P [15:11:26] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [15:11:27] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 2 backends are down. mw1 mw2 [15:11:29] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [15:11:30] though it's a coincidence that our problem comes when wmf have their outage. Maybe commons is causing us issues? [15:11:49] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [15:12:00] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 328 bytes in 0.004 second response time [15:12:05] Reception123 apparently we have an outage incoming... [15:12:28] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 1352 bytes in 0.047 second response time [15:12:34] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw2 mw3 [15:12:59] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [15:13:16] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 2 backends are down. 
mw2 mw3 [15:13:30] !log restart php7.3-fpm & nginx on mw[123] & lizardfs6 [15:13:43] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15302 bytes in 0.006 second response time [15:13:56] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15296 bytes in 0.004 second response time [15:14:24] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15302 bytes in 6.710 second response time [15:14:25] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18698 bytes in 1.129 second response time [15:14:29] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15302 bytes in 7.900 second response time [15:16:18] !log hacked LS on mw1 to switch off wgUseInstantCommons temporarily [15:17:30] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15302 bytes in 0.391 second response time [15:17:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [15:18:23] !log hacked LS on mw[23] & lizardfs6 to switch off wgUseInstantCommons temporarily [15:18:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [15:18:45] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 7 backends are healthy [15:19:39] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy [15:19:55] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [15:19:56] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 7 backends are healthy [15:19:57] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [15:20:24] c^ looks like that outage affected us :( [16:48:18] !log revert hack on mw[123] & lizardfs6 [16:48:26] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [16:51:34] paladox: are we still being affected by the commons outage? 
[16:56:49] paladox: https://meta.miraheze.org/wiki/Stewards%27_noticeboard#Photos_from_wikicommons_disappearance [16:56:52] [ Stewards' noticeboard - Miraheze Meta ] - meta.miraheze.org [17:01:47] Reception123: it’s being mitigated [17:27:13] Curious, is there a reason why a category appears to have several pages in quotes when it doesn't? https://ucronias.miraheze.org/wiki/Categor%C3%ADa:Historias_alternativas [17:27:15] [ Categoría:Historias alternativas - Ucronías ] - ucronias.miraheze.org [17:28:17] paladox: can you refresh category counts for it? [17:39:22] RhinosF1 recountCategories.php? [17:39:56] paladox: if that’s the script that redoes the cat counts [17:40:06] The other could be refreshLinks.php [17:40:26] refreshLinks.php only does links as far as i'm aware [17:40:30] not category specific. [17:40:39] RhinosF1 what mode? [17:40:40] --mode: (REQUIRED) Which category count column to recompute: [17:40:41] "pages", "subcats" or "files". [17:40:48] * paladox presumes pages [17:40:56] paladox: see hispano’s comment [17:41:02] The counts seem wrong [17:41:13] !log root@mw2:/srv/mediawiki/w/maintenance# sudo -u www-data php recountCategories.php --wiki=ucroniaswiki --mode=pages [17:41:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:41:29] hispano76: ^ [17:41:44] !log root@mw2:/srv/mediawiki/w/maintenance# sudo -u www-data php refreshLinks.php --wiki=ucroniaswiki [17:41:56] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:42:08] paladox: shouldn’t recount be done after links are refreshed? [17:42:54] I'm not sure to be honest. It dosen't mention it on the page. 
[17:42:57] https://www.mediawiki.org/wiki/Manual:RecountCategories.php [17:42:57] [ Manual:recountCategories.php - MediaWiki ] - www.mediawiki.org [17:43:07] anyways i'm out for a few hours [17:43:14] Ok [17:43:19] so you'll need to ask Reception123 to do anyother stuff :) [17:43:42] I'll be here for a bit but I'm a bit busy tonight so I don't know how much I'll be able to do [17:47:00] Reception123: you can look [17:47:16] RhinosF1: look at what? [17:47:24] Also see ##RhinosF1 - I blocked them and left a message. If they respond, you might want to lock [17:47:28] PROBLEM - mw2 Current Load on mw2 is CRITICAL: CRITICAL - load average: 12.19, 11.34, 7.40 [17:47:32] Reception123: hispano’s request [17:48:10] so I need to run RecountCategories.php> [17:48:12] *? [17:50:02] Reception123: yep [17:50:08] ok [17:51:47] !log reception@mw1:/srv/mediawiki/w/maintenance$ sudo -u www-data php recountCategories.php --wiki=ucroniaswiki --mode=pages/subpages/files [17:51:56] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:52:03] hispano76: done :) [17:52:58] RhinosF1: and heh Paladox got K-lined again [17:53:20] Reception123: watching the fallout in #freenode [18:09:36] PROBLEM - mw2 Current Load on mw2 is WARNING: WARNING - load average: 4.14, 4.83, 7.29 [18:11:36] RECOVERY - mw2 Current Load on mw2 is OK: OK - load average: 4.63, 4.44, 6.55 [18:19:52] Voidwalker: Can you do a CU [18:20:08] sure, probs [18:21:15] also, paladox was asking if freenode had klined the entire xshellz network earlier [18:21:36] but I'm assuming I'll be CUing the account that was locked earlier [18:22:55] Voidwalker: yeah the one reception did [18:23:00] And I blocked [18:23:18] I blocked an IP range but not sure how good it’ll be and a global would be nice [18:23:25] + delete their wiki [18:23:30] Hi JohnLewis [18:25:45] hi [18:25:47] JohnLewis: 2020-01-25 - 19:40:51UTC tell JohnLewis sonarcloud[bot] is making #miraheze a feed of botspam [18:26:13] JohnLewis: we 
need to either move alerts for things or not make that bot spam so much [18:26:19] It was getting ridiculous [18:26:30] Multiple people have said it now [18:26:31] any member of SRE can sort it out [18:26:51] JohnLewis: make it post to another channel? [18:26:55] Or slow it down [18:27:05] If so, someone pls do it [18:27:11] Either sonar had changed or someone else enabled it [18:27:20] Reception123: hum, I keep seeing the wrong count Presidente Limantour‎ (2 P) / Presidente López Obrador‎ (1 P) ¿is my cache? [18:27:22] because the bot never commented in the past [18:27:28] JohnLewis: it’d be nice to know [18:27:46] Welcome back Paladox, it was xshellz [18:28:04] I enabled sonar via travis, so the bot shouldn’t be doing anything so afaik someone else did this [18:29:01] have globalled the range RhinosF1, not the first account on it that we've locked [18:29:31] Voidwalker: no it isn’t - it’s an account facing a global ban [18:29:37] Under TOS [18:29:55] JohnLewis: pls look at last night’s logs around when I sent that message [18:30:39] For what reason? [18:33:50] Or I think I didn't make myself clear at first sorry :( [18:40:44] JohnLewis: to see exactly what happened [18:41:03] hispano76: you did, the fix just didn’t work. [18:41:06] Reception123: ^ [18:41:30] I know what happens [18:41:38] oh, okay [18:41:49] Okay, maybe see if it can be turned back off JohnLewis [18:41:54] that doesn’t change the fact I’m not the only one who can do anything about it [18:42:21] Anyone in SRE can - I don’t have access to GitHub [18:42:42] JohnLewis: I know but for some reason you end up doing it and ok [18:43:14] I know - it’s annoying. 
I’ve come to about 3 things that don’t need me, requiring my attention [18:43:19] JohnLewis: reception also wanted D/SSRE to approve https://phabricator.miraheze.org/T5114 [18:43:20] [ ⚓ T5114 Setup Github Sponsors ] - phabricator.miraheze.org [18:43:49] Yeah that’s fine, I thought Owen already said he’ll pass details on [18:43:56] its what he told me a few weeks ago [18:44:10] JohnLewis: it’s what I thought but reception wanted it [18:44:55] Well anyone in SRE can do it then, it’s fine [18:45:12] if anything since it revolves around money, I’d say it’s Owen’s decision not ours [18:45:27] JohnLewis: I know, it needs a button ticking and copy pasting a file then filling a form with Owen [18:45:49] If Reception123 does it then I’ll be happy for you do it on the twinkle repo as well if owen has no issue [18:51:50] Ok, I wasn't sure so I preferred asking to make sure there are no objections [18:52:07] But in that case I will wait for Owen to send the information [18:52:17] Reception123: see SRE discreet [18:52:33] o [18:52:33] k [18:52:43] Though I won't have time to do it tonight I can handle it tomorrow evening [18:52:54] Kk [18:58:10] Hello paladoxz! If you have any questions, feel free to ask and someone should answer soon. [18:58:58] Oh, yay my bouncer back [19:54:46] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:55:40] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:55:41] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw1 mw2 [19:55:59] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3 [19:56:05] ? 
[19:56:13] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [19:56:44] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:57:03] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw3 [19:57:10] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [19:57:26] PROBLEM - db5 Disk Space on db5 is WARNING: DISK WARNING - free space: / 20074 MB (10% inode=99%); [19:57:36] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15296 bytes in 0.006 second response time [19:58:51] PROBLEM - lizardfs6 MediaWiki Rendering on lizardfs6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:00:41] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:01:57] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 51% [20:02:24] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:02:38] 502 [20:02:41] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15302 bytes in 0.399 second response time [20:04:00] hum, working slow [20:04:14] PROBLEM - bacula1 Bacula Databases db4 on bacula1 is WARNING: WARNING: Full, 1007552 files, 44.08GB, 2020-01-11 20:01:00 (2.1 weeks ago) [20:04:16] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 13% [20:05:28] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:05:55] sorry paladox look? 
[20:06:00] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:06:10] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15302 bytes in 8.534 second response time [20:06:20] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15295 bytes in 0.005 second response time [20:06:56] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 18698 bytes in 4.157 second response time [20:07:00] ok, time to do the hack again [20:07:08] wikimedia are having issues... [20:07:50] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:09:48] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 18685 bytes in 0.206 second response time [20:10:25] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 7 backends are healthy [20:12:01] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 15295 bytes in 0.305 second response time [20:12:05] !log hacked LS on mw[123] & lizardfs6 to switch off wgUseInstantCommons temporarily [20:12:16] RECOVERY - lizardfs6 MediaWiki Rendering on lizardfs6 is OK: HTTP OK: HTTP/1.1 200 OK - 18685 bytes in 5.173 second response time [20:12:20] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 18684 bytes in 0.355 second response time [20:12:27] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 7 backends are healthy [20:12:28] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [20:12:47] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [20:13:32] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy [20:13:52] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [20:14:45] Was trying to read something on enwiki and get 502 [20:14:58] Yep, things are blowing up today. 
[20:15:04] Reception123: known [20:15:12] k6ka: yeah first commons then this [20:15:27] Reception123: https://bpaste.net/B4HA [20:15:27] First the coronavirus, then Kobe Bryant apparently being killed, then Wikipedia going down [20:15:28] [ View paste B4HA ] - bpaste.net [20:15:30] Don't know what's up with them today [20:15:33] "It is the end!" [20:15:37] "Make preparations!" [20:15:48] k6ka: oh yeah I saw that too. Sad if it's actually true [20:16:06] Reception123: I’m talking to operations [20:16:10] A few other sources other than TMZ (which is probably unreliable) are also reporting the death [20:16:15] And what do they say? [20:16:25] k6ka: yeah, looks pretty likely unfortunately [20:16:30] But Kobe Bryant's the least of my concerns since the coronavirus has probably arrived in Canada [20:16:48] Only a matter of time. My parents lived through the SARS outbreak so they bought me a box of medical masks. [20:21:58] paladox: can you make a #announcement + Channel Topic [20:37:04] RhinosF1: well things should have recovered [20:37:08] All I did was hack the commons config (now mobile) [20:37:35] paladox: yeah just best to let people know why [20:39:45] Well if things are up, the status quo is correct? [20:39:46] (For the topic at least) [20:40:11] I’ve wrote in #annoucement [20:40:46] paladox: Add a note on end [20:41:43] RhinosF1: huh? [20:43:01] Is the file problem fixed? [20:43:14] paladox: Welcome to the IRC channel of Miraheze, a free non-profit wiki hosting provider! 
| https://meta.miraheze.org | Status: Up | This channel is publicly logged at http://wm-bot.wmflabs.org/browser/index.php?display=%23miraheze | By participating in this channel, you agree to abide by our Code of Conduct: https://meta.miraheze.org/m/PA | Fundraiser: https://cutt.ly/Urtm3xZ | Note: InstantCommons Unavailable [20:43:20] [ Meta ] - meta.miraheze.org [20:43:21] [ Wikimedia IRC logs browser ] - wm-bot.wmflabs.org [20:43:22] [ Code of Conduct - Miraheze Meta ] - meta.miraheze.org [20:43:24] [ Fundraiser for Owen Reece Baines by Ferran Tufan : Help Miraheze stay online! ] - cutt.ly [20:43:44] hispano76: see Community Noticeboard on meta but no [20:43:49] What do you mean by a note on our end? :) [20:43:50] Oh [20:44:16] we think we've actually mitigated the connectivity-related issues for now [20:44:16] 2020-01-26 20:36:42 the thing that is currently ongoing is a new [20:44:58] RhinosF1: done [20:45:24] Okay [20:45:34] Thx [20:45:40] paladox: it’s cut off [20:45:57] Yeah, I figured they were related WMF [20:46:45] to wait for it to be resolved :) [20:47:28] Oh, guess we cannot add it then [20:48:30] RhinosF1: ^ [20:50:02] paladox: ok [21:06:45] We can now confirm that Wikimedia issues are related to appservers and connectivity issues are believed to be mitigated. 
[21:11:54] paladox: services normalising [21:11:55] PROBLEM - bacula1 Bacula Databases db5 on bacula1 is WARNING: WARNING: Full, 2212 files, 61.69GB, 2020-01-11 21:09:00 (2.1 weeks ago) [21:16:51] paladox: want to try put instant commons slowly back on [21:17:45] I will in a bit [21:17:48] I’m mobile [21:19:07] paladox: kk [21:19:52] PROBLEM - bacula1 Bacula Phabricator Static on bacula1 is WARNING: WARNING: Full, 81154 files, 2.852GB, 2020-01-11 21:17:00 (2.1 weeks ago) [21:26:13] In case I get permission to use a content under the Creative Commons license or with the permission of the author keeping a copyright note, and I would like to let you know about the authorization, can I send it to tech@miraheze.org or not? [21:28:35] hispano76: you can [21:40:34] Perfect, this goes for a CommonsWiki file that I was allowed to use [21:41:07] ook [22:14:43] [miraheze/mw-config] paladox pushed 1 commit to paladox-patch-1 [+0/-0/±1] https://git.io/Jvmvt [22:14:44] [miraheze/mw-config] paladox ac9e212 - MediawikiChat: Create blockedfromchat group when extension is enabled. [22:14:46] [mw-config] paladox created branch paladox-patch-1 - https://git.io/vbvb3 [22:14:47] [mw-config] paladox opened pull request #2868: MediawikiChat: Create blockedfromchat group when extension is enabled. - https://git.io/Jvmvq [22:15:18] !log revert hack on mw[123] & lizardfs6 [22:15:43] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [22:16:20] paladox: yey! Will you update Community Noticeboard and Discord. [22:16:42] ok [22:17:00] [mw-config] paladox closed pull request #2868: MediawikiChat: Create blockedfromchat group when extension is enabled. - https://git.io/Jvmvq [22:17:02] [miraheze/mw-config] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/Jvmv4 [22:17:03] [miraheze/mw-config] paladox aa6fef8 - MediawikiChat: Create blockedfromchat group when extension is enabled.
(#2868) [22:17:05] [mw-config] paladox deleted branch paladox-patch-1 - https://git.io/vbvb3 [22:17:06] [miraheze/mw-config] paladox deleted branch paladox-patch-1 [22:18:25] paladox: community noticeboard? [22:19:42] done [22:20:16] paladox: yw [22:21:01] yeah! [23:03:10] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 5 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [23:03:30] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [23:04:59] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:05:11] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [23:06:53] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 18715 bytes in 0.220 second response time [23:07:06] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 15302 bytes in 0.188 second response time [23:07:14] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [23:07:38] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [23:12:00] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb [23:13:59] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online