[00:05:23] PROBLEM - mw1 Current Load on mw1 is CRITICAL: CRITICAL - load average: 9.29, 5.85, 4.10 [00:07:24] RECOVERY - mw1 Current Load on mw1 is OK: OK - load average: 4.14, 5.33, 4.15 [00:25:23] PROBLEM - mw1 Current Load on mw1 is WARNING: WARNING - load average: 4.56, 6.88, 5.34 [00:27:23] RECOVERY - mw1 Current Load on mw1 is OK: OK - load average: 2.29, 5.29, 4.95 [00:29:22] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 7.43, 6.95, 6.12 [00:31:18] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 6.75, 6.80, 6.16 [02:25:08] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3Nw [02:25:10] [02miraheze/services] 07MirahezeSSLBot 0331a4892 - BOT: Updating services config for wikis [05:15:10] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3xL [05:15:11] [02miraheze/services] 07MirahezeSSLBot 033eb7728 - BOT: Updating services config for wikis [06:26:28] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 2918 MB (12% inode=94%); [06:34:39] PROBLEM - misc1 Puppet on misc1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[opendmarc] [06:42:38] RECOVERY - misc1 Puppet on misc1 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [06:53:11] [02ssl] 07Reception123 closed pull request 03#222: Create yellowiki.xyz.crt - 13https://git.io/Je3rY [06:53:13] [02miraheze/ssl] 07Reception123 pushed 031 commit to 03master [+1/-0/±1] 13https://git.io/Je3pm [06:53:14] [02miraheze/ssl] 07RhinosF1 03292abf4 - Create yellowiki.xyz.crt (#222) * Create yellowiki.xyz.crt * Update certs.yaml * Update certs.yaml [06:54:14] Reception123: thx, I'll do it on wiki in 10 [06:54:28] RhinosF1: ok [06:54:49] And I might be able to do some SSL but I'll see as it falls right in the middle of SPF's maintenance does my perfect time to do it [06:54:50] RhinosF1: also for https://phabricator.miraheze.org/T4733 (when they do the nameserver option) a DNS record has to be created before generating a cert [06:54:51] [ ⚓ T4733 Custom domain for aghspacesystems wiki ] - phabricator.miraheze.org [06:54:55] Reception123: ^ [06:55:33] Ah You'll have to set a DNS record afaik Reception123 unless I'm learning something new [06:56:35] RhinosF1: it's really not hard, just copy/paste another one that has LetsEncrypt and change the info [06:56:43] (including the 20180829000001 ; serial part with today's date) [06:58:27] Reception123: where's the DNS records stored? [06:58:40] RhinosF1: https://github.com/miraheze/dns/tree/master/zones [06:58:40] [ dns/zones at master · miraheze/dns · GitHub ] - github.com [07:03:49] PROBLEM - yellowiki.xzy - LetsEncrypt on sslhost is CRITICAL: Name or service not knownHTTP CRITICAL - Unable to open TCP socket [07:05:00] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): File[yellowiki.xzy],File[yellowiki.xzy_private] [07:05:24] ^ RhinosF1 hm this again? [07:05:30] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): File[yellowiki.xzy],File[yellowiki.xzy_private] [07:05:38] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. 
Failed resources (up to 3 shown): File[yellowiki.xzy],File[yellowiki.xzy_private] [07:05:51] PROBLEM - cp4 Puppet on cp4 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): File[yellowiki.xzy],File[yellowiki.xzy_private] [07:06:38] PROBLEM - cp2 Puppet on cp2 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): File[yellowiki.xzy],File[yellowiki.xzy_private] [07:06:43] PROBLEM - test1 Puppet on test1 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): File[yellowiki.xzy],File[yellowiki.xzy_private] [07:07:04] Reception123: give it a bit [07:07:21] PROBLEM - cp3 Puppet on cp3 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): File[yellowiki.xzy],File[yellowiki.xzy_private] [07:08:22] Reception123: do u have access [07:08:32] RhinosF1: yeah, if I didn't I couldn't have added the private key [07:08:37] and it's weird, but it seems to have gotten removed [07:08:38] I re-added it [07:08:46] Ok [07:11:06] I was going to say check the files on mw1 [07:11:39] what about them? [07:11:56] To make sure they are there [07:12:06] they are, that's how I copied the private key [07:12:24] Good [07:35:20] Reception123: why hasn't puppet recovered yet [07:35:43] RhinosF1: no idea, the key is there [07:35:56] Hmm [07:36:41] Reception123: does it still error if you force it on a server (sudo puppet agent -tv) [07:37:32] yes [07:37:39] Hmm [07:37:49] The patch was correct right [07:37:51] Could not retrieve information from environment production source(s) puppet:///ssl/certificates/yellowiki.xzy.crt [07:38:21] RhinosF1: yeah I can't see anything wrong with it [07:38:44] SPF|Cloud: can we nick you 20mins early I'm lost [07:39:03] Reception123: this is no sense [07:40:09] RhinosF1: oh I think I know why [07:40:13] And hour and 20 mins early even SPF or whatever it is in the world of timezones [07:40:17] Reception123: yes [07:40:38] [02miraheze/ssl] 07Reception123 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3p1 [07:40:39] [02miraheze/ssl] 07Reception123 03ac07c11 - rm dot from cert name [07:40:41] ^ RhinosF1 could be this [07:40:59] Reception123: ofc [07:41:09] * RhinosF1 hides [07:47:56] .at 11AM CEST [07:47:56] RhinosF1: Sorry, but I didn't understand, please try again. [07:49:35] .at 10am SPF|Cloud [07:49:36] RhinosF1: Sorry, but I didn't understand, please try again. 
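The root cause of the puppet failures above is mundane but easy to repeat: certs.yaml referenced a certificate filename that did not exist in the repo (hence puppet's "Could not retrieve information from ... yellowiki.xzy.crt", fixed by the "rm dot from cert name" commit). A small pre-merge check could catch that class of mistake. The sketch below is hypothetical: the real certs.yaml schema is not shown in this log, so the layouts it accepts are assumptions.

```python
#!/usr/bin/env python3
"""Pre-merge sanity check for an ssl repo: every hostname listed in
certs.yaml should have a matching certificates/<hostname>.crt file.
Hypothetical sketch; the real certs.yaml schema is not shown in this log."""
import sys
from pathlib import Path

import yaml  # pip install pyyaml


def hostnames(certs):
    # Accept either a mapping keyed by hostname or a list of entries with a
    # 'url' field; adjust to whatever the real schema is.
    if isinstance(certs, dict):
        return list(certs)
    return [entry["url"] for entry in certs]


def main(repo_root: str) -> int:
    root = Path(repo_root)
    certs = yaml.safe_load((root / "certs.yaml").read_text())
    missing = [h for h in hostnames(certs)
               if not (root / "certificates" / f"{h}.crt").exists()]
    for host in missing:
        print(f"MISSING: certificates/{host}.crt (referenced by certs.yaml)")
    return 1 if missing else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Wired into the repo's CI, a check like this would fail the pull request before puppet ever tries to serve the missing file to mw1/mw2/mw3.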
[08:45:39] [02mw-config] 07Southparkfan created branch 03Southparkfan-patch-2 - 13https://git.io/vbvb3 [08:45:41] [02miraheze/mw-config] 07Southparkfan pushed 031 commit to 03Southparkfan-patch-2 [+0/-0/±1] 13https://git.io/Je3hw [08:45:42] [02miraheze/mw-config] 07Southparkfan 037c3b6ae - Put several wikis in read only mode for migration (T4724) [08:45:46] [02mw-config] 07Southparkfan opened pull request 03#2764: Put several wikis in read only mode for migration (T4724) - 13https://git.io/Je3hr [08:45:56] [02mw-config] 07Southparkfan edited pull request 03#2764: Put several wikis in read only mode for migration (T4724) - 13https://git.io/Je3hr [08:47:45] [02ssl] 07RhinosF1 opened pull request 03#223: Update certs.yaml - 13https://git.io/Je3ho [08:47:51] SPF|Cloud: ^ [08:48:06] [02ssl] 07Southparkfan closed pull request 03#223: Update certs.yaml - 13https://git.io/Je3ho [08:48:07] [02miraheze/ssl] 07Southparkfan pushed 032 commits to 03master [+0/-0/±2] 13https://git.io/Je3h6 [08:48:09] [02miraheze/ssl] 07RhinosF1 030df63cc - Update certs.yaml [08:48:10] [02miraheze/ssl] 07Southparkfan 03492f06a - Merge pull request #223 from RhinosF1/patch-10 Update certs.yaml [08:51:06] !log purged even more binlogs on db4 [08:51:11] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [08:52:54] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [08:53:19] RECOVERY - cp4 Puppet on cp4 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:53:38] RECOVERY - test1 Puppet on test1 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [08:53:59] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:54:14] RECOVERY - cp2 Puppet on cp2 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [08:54:21] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:54:28] RECOVERY - cp3 Puppet on cp3 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [08:59:57] [02mw-config] 07Southparkfan closed pull request 03#2764: Put several wikis in read only mode for migration (T4724) - 13https://git.io/Je3hr [08:59:58] [02miraheze/mw-config] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3hS [09:00:00] [02miraheze/mw-config] 07Southparkfan 0389958c7 - Put several wikis in read only mode for migration (T4724) (#2764) [09:00:11] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3h9 [09:00:12] [02miraheze/services] 07MirahezeSSLBot 03631ac78 - BOT: Updating services config for wikis [09:01:41] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3hQ [09:01:43] [02miraheze/mw-config] 07paladox 03bd74c6a - Move sidemwiki to new mount [09:02:49] SPF|Cloud: att is showing read-only so it worked [09:02:55] glad [09:04:11] PROBLEM - glusterfs2 Current Load on glusterfs2 is CRITICAL: CRITICAL - load average: 4.66, 3.34, 2.11 [09:04:34] PROBLEM - glusterfs1 Current Load on glusterfs1 is CRITICAL: CRITICAL - load average: 4.75, 3.07, 1.90 [09:04:41] paladox: ^ [09:04:48] yup, aware [09:04:54] i think that's partly me [09:05:48] PROBLEM - test1 Puppet on test1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[git_pull_MediaWiki config] [09:06:01] 1.5G dump for an almost 10G mysql directory, cool [09:06:05] PROBLEM - glusterfs2 Current Load on glusterfs2 is WARNING: WARNING - load average: 3.35, 3.58, 2.36 [09:06:22] SPF|Cloud: that's compressed well [09:06:34] RECOVERY - glusterfs1 Current Load on glusterfs1 is OK: OK - load average: 2.59, 3.11, 2.08 [09:07:59] RECOVERY - glusterfs2 Current Load on glusterfs2 is OK: OK - load average: 2.42, 3.21, 2.37 [09:13:28] PROBLEM - glusterfs1 Current Load on glusterfs1 is WARNING: WARNING - load average: 3.60, 3.91, 2.82 [09:13:42] PROBLEM - glusterfs2 Current Load on glusterfs2 is CRITICAL: CRITICAL - load average: 4.27, 4.17, 3.03 [09:15:26] PROBLEM - glusterfs1 Current Load on glusterfs1 is CRITICAL: CRITICAL - load average: 4.12, 3.98, 2.98 [09:17:25] PROBLEM - glusterfs1 Current Load on glusterfs1 is WARNING: WARNING - load average: 2.61, 3.49, 2.92 [09:17:30] PROBLEM - glusterfs2 Current Load on glusterfs2 is WARNING: WARNING - load average: 2.95, 3.72, 3.11 [09:18:13] !log baobabarchiveswiki migrated to db5, changed cluster in cw_wikis, dropped database [09:18:19] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [09:19:21] SPF|Cloud: erm, that's just gone down https://www.irccloud.com/pastebin/ZyxZaKlp/ [09:19:22] [ Snippet | IRCCloud ] - www.irccloud.com [09:19:24] PROBLEM - glusterfs2 Current Load on glusterfs2 is CRITICAL: CRITICAL - load average: 4.30, 4.05, 3.30 [09:19:34] RhinosF1: yes, deploying config change [09:19:41] (sorry for that) but the content is fine [09:19:50] good [09:20:14] [02miraheze/mw-config] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3j3 [09:20:16] [02miraheze/mw-config] 07Southparkfan 03ca8002d - Move baobabarchiveswiki to c2 [09:20:34] RhinosF1: try again [09:20:50] SPF|Cloud: yep up [09:21:21] PROBLEM - glusterfs1 Current Load on glusterfs1 is CRITICAL: CRITICAL - load average: 6.11, 4.58, 3.45 [09:21:32] needed the space on db4 to dump the other wikis, can migrate all other wikis in one go [09:22:00] SPF|Cloud: ah, that's cleared loads up already [09:22:38] yup, nonsensopediawiki is now being imported on db5 and nonciclopediawiki is being dumped on db4 [09:22:47] nice! [09:23:24] [02miraheze/mw-config] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3js [09:23:26] [02miraheze/mw-config] 07Southparkfan 031e4e4dc - Disable read only for baobabarchiveswiki [09:29:46] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3jn [09:29:47] [02miraheze/puppet] 07paladox 03108e746 - Remove /mnt/mediawiki-trash mount [09:30:34] paladox: there is some local config in test1's LocalSettings.php, can it be removed? [09:30:50] I think john put that there [09:30:59] i just git resetted and then re added it [09:30:59] alright [09:33:34] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [09:33:45] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 2 backends are down. mw1 mw3 [09:33:56] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw1 [09:34:04] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
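For readers following along, the migration SPF|Cloud is running is a repeatable loop: put the wiki read only (the mw-config PRs above), dump it on db4, import it on db5, point cw_wikis at the new cluster, lift read only, then drop the old copy. Below is a rough sketch of that loop; host names, the global database name and the cw_wikis column names are assumptions rather than the exact commands used, and the read-only toggling really happened via mw-config commits.

```python
#!/usr/bin/env python3
"""Outline of the per-wiki move logged above: dump on db4, import on db5,
repoint cw_wikis at the new cluster, drop the old copy.  Illustrative only."""
import subprocess


def migrate(dbname, source="db4.miraheze.org", target="db5.miraheze.org",
            global_db="mhglobal"):  # global_db name is an assumption
    dump = f"/tmp/{dbname}.sql.gz"

    # 1. Dump the wiki on the source host, compressed (hence the
    #    "1.5G dump for an almost 10G mysql directory" remark).
    subprocess.run(
        f"mysqldump -h {source} --single-transaction {dbname} | gzip > {dump}",
        shell=True, check=True)

    # 2. Create the empty database on the target and import into it.
    subprocess.run(["mysql", "-h", target, "-e",
                    f"CREATE DATABASE IF NOT EXISTS {dbname};"], check=True)
    subprocess.run(f"gunzip -c {dump} | mysql -h {target} {dbname}",
                   shell=True, check=True)

    # 3. Point CreateWiki's cw_wikis row at the new cluster ('c2' in the
    #    mw-config commits above); column names here are assumptions.
    subprocess.run(
        ["mysql", "-h", target, global_db, "-e",
         f"UPDATE cw_wikis SET wiki_dbcluster = 'c2' "
         f"WHERE wiki_dbname = '{dbname}';"],
        check=True)

    # 4. Only once the wiki is confirmed healthy on the target: drop the old copy.
    subprocess.run(["mysql", "-h", source, "-e",
                    f"DROP DATABASE {dbname};"], check=True)


if __name__ == "__main__":
    migrate("baobabarchiveswiki")
```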
[09:34:44] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [09:35:27] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 3 backends are down. mw1 mw2 mw3 [09:36:17] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [09:38:31] [02miraheze/mw-config] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3jC [09:38:32] [02miraheze/mw-config] 07Southparkfan 0376d02c2 - nonsensopediawiki to c2 [09:39:26] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [09:39:45] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy [09:39:56] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy [09:40:16] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24586 bytes in 0.391 second response time [09:40:21] [02miraheze/mw-config] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3j8 [09:40:23] [02miraheze/mw-config] 07Southparkfan 031e38463 - Disable read only for nonsensopediawiki [09:40:25] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24586 bytes in 0.724 second response time [09:40:44] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [09:41:27] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [09:43:26] PROBLEM - db4 Disk Space on db4 is WARNING: DISK WARNING - free space: / 25120 MB (6% inode=96%); [09:43:47] PROBLEM - glusterfs1 Puppet on glusterfs1 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [09:44:02] SPF|Cloud: yey, db4 is no longer critical! [09:44:42] might be for a slight moment soon again [09:44:59] as you export the next db [09:45:02] but at some point we'll have 70G back! [09:45:23] PROBLEM - cp2 Puppet on cp2 is CRITICAL: CRITICAL: Puppet has 82 failures. Last run 2 minutes ago with 82 failures. Failed resources (up to 3 shown) [09:45:30] PROBLEM - glusterfs2 Puppet on glusterfs2 is CRITICAL: CRITICAL: Puppet has 16 failures. Last run 2 minutes ago with 16 failures. Failed resources (up to 3 shown): File[/etc/rsyslog.d],File[/etc/rsyslog.conf],File[authority certificates],File[/etc/apt/apt.conf.d/50unattended-upgrades] [09:45:37] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 6 failures. Last run 2 minutes ago with 6 failures. Failed resources (up to 3 shown) [09:45:37] importing nonciclopediawiki now [09:45:41] that sounds nice. I hopefully won't have to write db out of space on an incident report again (at least for a long while) [09:45:45] PROBLEM - cp3 Puppet on cp3 is CRITICAL: CRITICAL: Puppet has 159 failures. Last run 2 minutes ago with 159 failures. Failed resources (up to 3 shown): File[monarchists.wiki],File[monarchists.wiki_private],File[stablestate.org],File[stablestate.org_private] [09:46:21] we actually have 32G worth of binlogs on db5 now deu to all imports ;) [09:47:13] That's stupid high! 
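The "purged binlogs" admin-log entries that follow are the standard fix here: binary logs pile up quickly during bulk imports, and MariaDB/MySQL keeps them until told otherwise. A minimal sketch of the purge (the 7-day cutoff and the use of the command-line client are assumptions about what was actually run):

```python
#!/usr/bin/env python3
"""What "purged binlogs" boils down to: asking MariaDB/MySQL to discard
binary logs older than a cutoff.  Sketch only."""
import subprocess


def purge_binlogs(host: str, keep_days: int = 7) -> None:
    # PURGE BINARY LOGS is standard MySQL/MariaDB syntax.  Be careful on a
    # replication master: replicas must not still need the files removed.
    sql = f"PURGE BINARY LOGS BEFORE NOW() - INTERVAL {keep_days} DAY;"
    subprocess.run(["mysql", "-h", host, "-e", sql], check=True)


if __name__ == "__main__":
    purge_binlogs("db5.miraheze.org")
```

Setting expire_logs_days in the server config makes this rotation automatic, which is the longer-term answer to the "db out of space" incidents mentioned above.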
[09:47:56] !log purged binlogs on db5 [09:48:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [09:51:26] PROBLEM - db4 Disk Space on db4 is CRITICAL: DISK CRITICAL - free space: / 21735 MB (5% inode=96%); [09:55:15] RECOVERY - cp2 Puppet on cp2 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [09:55:37] RECOVERY - glusterfs2 Puppet on glusterfs2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:55:41] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:55:44] RECOVERY - cp3 Puppet on cp3 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [09:55:46] RECOVERY - glusterfs1 Puppet on glusterfs1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:06:58] !log purged even more binlogs @ db5 [10:07:04] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [10:07:16] PROBLEM - db5 Current Load on db5 is WARNING: WARNING - load average: 3.64, 3.47, 2.23 [10:09:16] RECOVERY - db5 Current Load on db5 is OK: OK - load average: 2.24, 2.98, 2.19 [10:10:04] !log installed pv on db5 [10:10:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [10:12:18] [02miraheze/mw-config] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3jP [10:12:19] [02miraheze/mw-config] 07Southparkfan 03b29af07 - nonciclopediawiki to c2 [10:13:48] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3j1 [10:13:49] [02miraheze/puppet] 07paladox 036c15b6c - mediawiki: Tweek gluster mount options [10:13:59] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [10:14:02] [02miraheze/mw-config] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3jM [10:14:03] [02miraheze/mw-config] 07Southparkfan 03bccdb87 - Remove readonly for nonciclopediawiki [10:14:12] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [10:14:18] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw2 [10:14:37] so, it's on ATT now.. [10:15:24] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [10:15:27] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw2 [10:15:45] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw2 [10:17:13] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3jy [10:17:14] [02miraheze/puppet] 07paladox 031934dd4 - Update mediawiki.pp [10:17:21] paladox: there is a small but certain possibility I might not make it to work if I wait until the transfer is finished [10:17:26] RECOVERY - db4 Disk Space on db4 is OK: DISK OK - free space: / 52474 MB (14% inode=96%); [10:17:36] can you take over in that case? I have about 35 minutes left now [10:17:43] oh, yes [10:17:59] cool [10:21:08] paladox, SPF|Cloud: meta is down [10:21:25] works for me [10:21:34] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static-new] [10:21:43] it's up for me [10:21:52] Back [10:22:08] See icinga web looks like a blip [10:23:31] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:25:33] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static-new] [10:25:50] !log depooled mw2 [10:25:56] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [10:26:16] !log reboot mw2 [10:26:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [10:27:53] PROBLEM - mw3 Current Load on mw3 is CRITICAL: CRITICAL - load average: 10.03, 7.79, 6.10 [10:27:59] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:28:31] PROBLEM - mw2 php-fpm on mw2 is CRITICAL: connect to address 185.52.2.113 port 5666: Connection refusedconnect to host 185.52.2.113 port 5666: Connection refused [10:28:49] PROBLEM - mw2 Puppet on mw2 is CRITICAL: connect to address 185.52.2.113 port 5666: Connection refusedconnect to host 185.52.2.113 port 5666: Connection refused [10:28:54] att is at ~60% now [10:29:16] PROBLEM - mw2 Disk Space on mw2 is CRITICAL: connect to address 185.52.2.113 port 5666: Connection refusedconnect to host 185.52.2.113 port 5666: Connection refused [10:29:34] SPF|Cloud: good [10:29:35] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24586 bytes in 0.861 second response time [10:29:52] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.006 second response time [10:30:04] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24592 bytes in 1.174 second response time [10:30:06] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24586 bytes in 0.393 second response time [10:30:30] RECOVERY - mw2 php-fpm on mw2 is OK: PROCS OK: 4 processes with command name 'php-fpm7.2' [10:30:49] !log repool mw2 [10:30:49] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 6 minutes ago with 0 failures [10:30:53] !log depool mw3 [10:30:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [10:30:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [10:31:16] RECOVERY - mw2 Disk Space on mw2 is OK: DISK OK - free space: / 28901 MB (37% inode=98%); [10:31:27] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [10:31:45] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy [10:31:53] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 3.11, 6.25, 5.93 [10:31:56] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy [10:32:29] !log repool mw3 [10:32:29] paladox: might want to check bacula1 disk space as well [10:32:47] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [10:32:51] ok [10:33:39] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:36:08] RECOVERY - test1 Puppet on test1 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [10:39:57] [02miraheze/mw-config] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3j7 [10:39:58] [02miraheze/mw-config] 07Southparkfan 03eebd25a - ATT to c2 [10:40:08] PROBLEM - 
db5 Disk Space on db5 is WARNING: DISK WARNING - free space: / 14390 MB (7% inode=99%); [10:40:16] ^ binlogs, it's fine [10:42:28] [02miraheze/mw-config] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Je3jd [10:42:29] [02miraheze/mw-config] 07Southparkfan 030c8b601 - Remove read only sitenotice [10:46:07] RECOVERY - db5 Disk Space on db5 is OK: DISK OK - free space: / 49162 MB (26% inode=99%); [10:46:09] migration done :) [10:46:18] awesome! [10:46:41] Yey! [10:47:41] 72G left on db5, 76G left on db4 [10:47:47] * RhinosF1 gives SPF|Cloud some cookies [10:48:05] Should hold out that for a bit [10:56:19] PROBLEM - test1 Puppet on test1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki config] [11:00:10] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jesef [11:00:11] [02miraheze/services] 07MirahezeSSLBot 03003f0b1 - BOT: Updating services config for wikis [11:06:43] RECOVERY - test1 Puppet on test1 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [11:54:07] PROBLEM - glusterfs2 Puppet on glusterfs2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [11:54:44] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 2 datacenters are down: 2604:180:0:33b::2/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [11:55:26] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 4 datacenters are down: 107.191.126.23/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [11:55:45] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw2 [11:58:15] RECOVERY - glusterfs2 Puppet on glusterfs2 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:58:37] ^^ due to the high load on glusterfs2 [11:58:40] should recover [11:58:44] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [11:59:26] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [12:01:45] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy [12:05:45] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw2 [12:05:56] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw2 [12:08:06] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw2 [12:08:12] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:08:19] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [12:08:25] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
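A note on what the recurring "Stunnel Http for mwN" PROBLEM/RECOVERY lines mean: each cache proxy runs a check that fetches a page from an app server through a local stunnel and reports Nagios-style (exit 0 OK, 1 WARNING, 2 CRITICAL), giving up after 10 seconds. The sketch below only mirrors that behaviour; the URL and port are placeholders, not the real NRPE/check_http invocation.

```python
#!/usr/bin/env python3
"""Conceptual stand-in for the "Stunnel Http for mwN" checks in this log:
fetch a URL with a 10 s timeout and exit with Nagios-style codes."""
import sys
import time
import urllib.request

OK, WARNING, CRITICAL = 0, 1, 2


def check_http(url: str, timeout: float = 10.0) -> int:
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
        elapsed = time.monotonic() - start
        print(f"HTTP OK: HTTP/1.1 {resp.status} {resp.reason} - "
              f"{len(body)} bytes in {elapsed:.3f} second response time")
        return OK
    except Exception as exc:  # timeout, connection refused, TLS error, ...
        print(f"HTTP CRITICAL - {exc}")
        return CRITICAL


if __name__ == "__main__":
    sys.exit(check_http("http://127.0.0.1:8080/"))  # placeholder endpoint
```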
[12:09:01] PROBLEM - mw2 Current Load on mw2 is WARNING: WARNING - load average: 6.99, 6.62, 5.64 [12:12:30] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [12:13:26] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [12:16:22] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [12:17:26] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [12:18:36] PROBLEM - mw3 Current Load on mw3 is CRITICAL: CRITICAL - load average: 8.01, 6.56, 5.50 [12:24:29] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24586 bytes in 0.005 second response time [12:24:31] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 6.56, 7.32, 6.22 [12:24:38] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24586 bytes in 0.390 second response time [12:24:52] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24586 bytes in 0.740 second response time [12:25:34] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [12:25:45] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy [12:25:56] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy [12:26:29] PROBLEM - mw3 Current Load on mw3 is CRITICAL: CRITICAL - load average: 8.32, 7.68, 6.47 [12:26:40] RECOVERY - mw2 Current Load on mw2 is OK: OK - load average: 2.97, 5.38, 5.97 [12:28:28] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 6.41, 7.16, 6.43 [12:30:26] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 6.13, 6.70, 6.33 [12:32:38] PROBLEM - cp3 Disk Space on cp3 is WARNING: DISK WARNING - free space: / 2649 MB (10% inode=94%); [12:34:21] PROBLEM - mw3 Current Load on mw3 is CRITICAL: CRITICAL - load average: 8.48, 7.31, 6.64 [12:34:58] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static-new] [12:38:18] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 7.55, 7.59, 6.90 [12:42:14] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 5.10, 6.67, 6.70 [12:56:35] PROBLEM - glusterfs1 Current Load on glusterfs1 is WARNING: WARNING - load average: 2.63, 3.09, 3.97 [12:56:54] PROBLEM - glusterfs2 Current Load on glusterfs2 is WARNING: WARNING - load average: 2.59, 2.79, 3.80 [12:58:34] PROBLEM - glusterfs1 Current Load on glusterfs1 is CRITICAL: CRITICAL - load average: 4.01, 3.53, 4.04 [12:58:55] PROBLEM - glusterfs2 Current Load on glusterfs2 is CRITICAL: CRITICAL - load average: 4.44, 3.36, 3.89 [13:04:34] PROBLEM - glusterfs1 Current Load on glusterfs1 is WARNING: WARNING - load average: 2.62, 3.54, 3.90 [13:06:54] PROBLEM - glusterfs2 Current Load on glusterfs2 is WARNING: WARNING - load average: 2.26, 3.35, 3.80 [13:08:54] PROBLEM - glusterfs2 Current Load on glusterfs2 is CRITICAL: CRITICAL - load average: 5.75, 4.26, 4.08 [13:10:34] PROBLEM - glusterfs1 Current Load on glusterfs1 is CRITICAL: CRITICAL - load average: 5.68, 4.09, 3.92 [13:12:53] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [13:20:21] [02mw-config] 07RhinosF1 created branch 03RhinosF1-patch-3 - 13https://git.io/vbvb3 [13:20:22] [02miraheze/mw-config] 07RhinosF1 pushed 031 commit to 03RhinosF1-patch-3 [+0/-0/±1] 13https://git.io/Jesv4 [13:20:24] [02miraheze/mw-config] 07RhinosF1 03fd30833 - Covert MW.org Links to Use Special: MyLanguage Requested at https://meta.miraheze.org/w/index.php?title=Stewards%27_noticeboard&curid=662&diff=83574&oldid=83573 [13:20:25] [ Difference between revisions of "Stewards' noticeboard" - Miraheze Meta ] - meta.miraheze.org [13:20:28] [02mw-config] 07RhinosF1 opened pull request 03#2765: Covert MW.org Links to Use Special: MyLanguage - 13https://git.io/JesvB [13:21:01] [02mw-config] 07RhinosF1 commented on pull request 03#2765: Covert MW.org Links to Use Special: MyLanguage - 13https://git.io/JesvR [13:21:36] miraheze/mw-config/RhinosF1-patch-3/fd30833 - RhinosF1 The build passed. https://travis-ci.org/miraheze/mw-config/builds/587837253 [13:22:24] [02mw-config] 07RhinosF1 edited pull request 03#2765: Covert mediawiki.org links to use Special:MyLanguage - 13https://git.io/JesvB [13:35:37] PROBLEM - glusterfs1 Puppet on glusterfs1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [13:36:13] [02mw-config] 07RhinosF1 assigned pull request 03#2765: Covert mediawiki.org links to use Special:MyLanguage - 13https://git.io/JesvB [13:36:31] [02mw-config] 07RhinosF1 labeled pull request 03#2765: Covert mediawiki.org links to use Special:MyLanguage - 13https://git.io/JesvB [13:39:40] RECOVERY - glusterfs1 Puppet on glusterfs1 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:45:09] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jesvp [13:45:10] [02miraheze/services] 07MirahezeSSLBot 0323c1389 - BOT: Updating services config for wikis [14:12:16] [02mw-config] 07RhinosF1 edited pull request 03#2765: Convert mediawiki.org links to use Special:MyLanguage - 13https://git.io/JesvB [14:14:51] PROBLEM - glusterfs1 Puppet on glusterfs1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
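PR #2765 above ("Convert mediawiki.org links to use Special:MyLanguage", done in batches with a find-and-replace tool) boils down to one rewrite: insert Special:MyLanguage/ after /wiki/ in mediawiki.org URLs so readers land on the page in their own language. A sketch of that rewrite (illustrative only; the actual edit was made with a separate tool, not this script):

```python
#!/usr/bin/env python3
"""Rewrite mediawiki.org page links to go through Special:MyLanguage,
skipping links that already use it."""
import re

MW_LINK = re.compile(
    r"(https://www\.mediawiki\.org/wiki/)(?!Special:MyLanguage/)([^\s\"'<>\]]+)")


def convert(text: str) -> str:
    # Insert Special:MyLanguage/ after /wiki/ unless it is already there.
    return MW_LINK.sub(r"\1Special:MyLanguage/\2", text)


if __name__ == "__main__":
    sample = "See https://www.mediawiki.org/wiki/Help:Links for syntax."
    print(convert(sample))
    # -> See https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Links for syntax.
```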
[14:16:47] RECOVERY - glusterfs1 Puppet on glusterfs1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:25:08] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JesfK [14:25:09] [02miraheze/services] 07MirahezeSSLBot 03f093b50 - BOT: Updating services config for wikis [15:03:26] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 107.191.126.23/cpweb, 2400:6180:0:d0::403:f001/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [15:05:26] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [16:08:33] PROBLEM - knuxwiki.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:09:00] ^ looking [16:09:49] No longer pointing at us [16:14:24] PROBLEM - glusterfs1 Puppet on glusterfs1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:16:59] RECOVERY - knuxwiki.com - LetsEncrypt on sslhost is OK: OK - Certificate 'knuxwiki.com' will expire on Fri 12 Jun 2020 11:59:59 PM GMT +0000. [16:21:15] PROBLEM - mw1 Puppet on mw1 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 8 minutes ago with 0 failures [16:24:43] PROBLEM - glusterfs2 Puppet on glusterfs2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:28:53] RECOVERY - glusterfs2 Puppet on glusterfs2 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:29:05] RECOVERY - glusterfs1 Puppet on glusterfs1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:31:46] revi: are you around? [16:33:37] @473730953735438336 Do you speak Korean (Pioneer) [16:33:59] @The Pioneer ^ [16:34:12] It appears also using user id doesnt work to ping you either [16:35:04] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JesUD [16:35:05] [02miraheze/puppet] 07paladox 03f7aebd6 - mediawiki: Disable readdirp in gluster mount [16:36:05] We really need some Korean speaking wikicreators in either an US or UK timezone [16:36:13] yeah [16:36:15] !log depooled mw1, mw2 and mw3 (in order and repooled) [16:36:31] RhinosF1: cause revi is the only korean speaking person I know of [16:36:34] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [16:37:12] sup [16:37:22] revi: can you transltae https://meta.miraheze.org/wiki/Special:RequestWikiQueue/9314 for me [16:37:25] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [16:37:27] translate* [16:37:30] [ Wiki requests queue - Miraheze Meta ] - meta.miraheze.org [16:37:48] [02ssl] 07RhinosF1 opened pull request 03#224: rmv knuxwiki.com - no longer pointing at us - 13https://git.io/JesUS [16:38:39] taken care of [16:38:42] [02ssl] 07RhinosF1 synchronize pull request 03#224: rmv knuxwiki.com - no longer pointing at us - 13https://git.io/JesUS [16:39:03] revi: that works too :P [16:39:15] ty [16:39:21] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:39:30] paladox: see ssl repo [16:42:24] ok [16:43:06] have the RC filter bug been fixed yet? [16:43:35] Zppix: has anyone updated the phab task? [16:43:43] idk, but im getting 502 [16:44:08] paladox: [16:44:15] Zppix: up for me [16:44:18] 502 where? [16:44:21] meta? 
[16:44:23] * RhinosF1 has to remember what 502 means [16:44:25] https://meta.miraheze.org/wiki/Special:RequestWikiQueue/9314 [16:44:30] RhinosF1: bad gateway [16:44:33] [ Wiki requests queue - Miraheze Meta ] - meta.miraheze.org [16:44:38] Zppix: thx [16:44:49] Zppix: no issue for me [16:44:56] RhinosF1: may want to bookmark this https://en.wikipedia.org/wiki/List_of_HTTP_status_codes [16:44:56] [WIKIPEDIA] List of HTTP status codes | "This is a list of Hypertext Transfer Protocol (HTTP) response status codes. Status codes are issued by a server in response to a client's request made to the server. It includes codes from IETF Request for Comments (RFCs), other specifications, and some additional codes used in some common applications..." [16:45:28] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static-new] [16:46:18] paladox: its back for me [16:46:43] paladox: maybe the depool of mw1,2,3 caused it and i just happened to send a http request at the wrong time? [16:46:50] PROBLEM - test1 Puppet on test1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static-new] [16:48:43] [02ManageWiki] 07GustaveLondon776 opened pull request 03#117: Update extension.json - 13https://git.io/JesUx [16:49:37] [02ManageWiki] 07Pix1234 commented on pull request 03#117: Update extension.json - 13https://git.io/JesUp [16:49:38] [02ManageWiki] 07GustaveLondon776 edited pull request 03#117: Update extension.json - 13https://git.io/JesUx [16:50:14] Zppix: exact reason i haven't merged [16:50:19] I really hate that notifico doesn't use the name of my github and uses my username [16:50:48] Travis uses my github name but notifico doesnt and it annoys me [16:50:51] * RhinosF1 gets onto mw-config changes [16:51:20] [02ssl] 07paladox closed pull request 03#224: rmv knuxwiki.com - no longer pointing at us - 13https://git.io/JesUS [16:51:22] [02miraheze/ssl] 07paladox pushed 031 commit to 03master [+0/-1/±1] 13https://git.io/JesTv [16:51:23] [02miraheze/ssl] 07RhinosF1 03b53bed8 - rmv knuxwiki.com - no longer pointing at us (#224) * rmv knuxwiki.com - no longer pointing at us * Update certs.yaml [16:51:31] paladox: thx [16:53:18] [02miraheze/mw-config] 07RhinosF1 pushed 031 commit to 03RhinosF1-patch-3 [+0/-0/±1] 13https://git.io/JesTf [16:53:20] [02miraheze/mw-config] 07RhinosF1 03ab96b32 - batch 2 (automated using find&replace tool) [16:53:21] [02mw-config] 07RhinosF1 synchronize pull request 03#2765: Convert mediawiki.org links to use Special:MyLanguage - 13https://git.io/JesvB [16:54:59] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): File[knuxwiki.com],File[knuxwiki.com_private] [16:55:32] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): File[knuxwiki.com],File[knuxwiki.com_private] [16:55:45] PROBLEM - cp3 Puppet on cp3 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): File[knuxwiki.com],File[knuxwiki.com_private] [16:55:53] PROBLEM - mw3 Current Load on mw3 is CRITICAL: CRITICAL - load average: 11.35, 8.53, 6.29 [16:55:55] PROBLEM - cp4 Puppet on cp4 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. 
Failed resources (up to 3 shown): File[knuxwiki.com],File[knuxwiki.com_private] [16:56:23] PROBLEM - cp2 Puppet on cp2 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): File[knuxwiki.com],File[knuxwiki.com_private] [16:57:12] [02mw-config] 07RhinosF1 closed pull request 03#2765: Convert mediawiki.org links to use Special:MyLanguage - 13https://git.io/JesvB [16:57:13] [02miraheze/mw-config] 07RhinosF1 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JesTU [16:57:15] [02miraheze/mw-config] 07RhinosF1 0350388ea - Convert mediawiki.org links to use Special:MyLanguage (#2765) Requested on Stewards Noticeboard [16:57:17] [02mw-config] 07RhinosF1 deleted branch 03RhinosF1-patch-3 - 13https://git.io/vbvb3 [16:57:19] [02miraheze/mw-config] 07RhinosF1 deleted branch 03RhinosF1-patch-3 [16:57:53] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 5.55, 7.37, 6.14 [16:59:53] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 4.21, 6.27, 5.88 [17:03:33] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [17:03:37] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [17:03:51] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [17:03:54] RECOVERY - cp4 Puppet on cp4 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:04:22] RECOVERY - cp2 Puppet on cp2 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [17:04:48] RECOVERY - test1 Puppet on test1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:05:40] RECOVERY - cp3 Puppet on cp3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:22:51] hahah netsplit got icinga [17:23:20] :P [17:23:37] aww it's back [17:24:13] !log restarted ircecho on misc1 [17:24:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:24:37] paladox: should let it come back on its own it could of been quiet in here [17:24:56] !log upgrade icinga2 on misc1 [17:24:58] heh [17:25:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:25:30] Zppix, that's what /ignore is for [17:25:36] RhinosF1: out of all the bots in the channel im in i think the only bot i can see that actually got hit by the netsplit was icinga :PP [17:25:37] :P [17:25:45] Zppix: yes [17:25:58] Voidwalker: but then i wont get to watch icinga start to think the world is on fire [17:26:21] https://www.youtube.com/watch?v=oxHcQ4EabaU :P [17:26:22] [ Fallout 4 Soundtrack - The Ink Spots - I Don't Want To Set The World on Fire [HQ] - YouTube ] - www.youtube.com [17:26:41] Zppix: that's when I decide to deploy and puppet choses to restart at the same time [17:27:18] RhinosF1: If anyone is going to cause mass destruction with puppet it would be me cause legit it just start issuing puppet commands and hope it does what i want [17:27:29] Zppix: heh [17:27:50] RhinosF1: wait do you smell smoke or is it just me ? [17:28:04] xD [17:28:15] Zppix: just you. 
[17:28:22] lol [17:28:32] my smell is poor anyway [17:39:08] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JesTP [17:39:10] [02miraheze/puppet] 07paladox 031b8ee02 - Update mail-host-notification.sh [17:39:54] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JesTX [17:39:55] [02miraheze/puppet] 07paladox 0348e0125 - Update mail-service-notification.sh [17:47:44] PROBLEM - glusterfs2 Current Load on glusterfs2 is WARNING: WARNING - load average: 1.84, 2.65, 3.75 [17:47:57] PROBLEM - glusterfs1 Current Load on glusterfs1 is WARNING: WARNING - load average: 2.53, 3.00, 3.88 [17:53:00] RECOVERY - glusterfs2 Current Load on glusterfs2 is OK: OK - load average: 1.92, 2.13, 3.20 [17:53:13] RECOVERY - glusterfs1 Current Load on glusterfs1 is OK: OK - load average: 1.48, 2.10, 3.22 [17:59:49] why does sysadmin need centralauth perms? isnt that what the wikiabuse team, cvt and stewards are for? [17:59:59] Zppix: ToU [18:00:06] PROBLEM - glusterfs1 Current Load on glusterfs1 is CRITICAL: CRITICAL - load average: 10.68, 6.26, 4.51 [18:00:08] I though Stewards handled ToU [18:00:10] thought* [18:00:14] Zppix: no [18:01:10] Stewards have never handled ToU [18:02:54] PROBLEM - glusterfs1 Puppet on glusterfs1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:03:02] paladox: ^ [18:03:23] RhinosF1: doesnt really mean anything persay it could just be the check timed out due to laod [18:03:25] load* [18:03:39] Zppix: nah, i'm getting 503s [18:03:52] RhinosF1: that would be the load :P [18:03:52] paladox can see channels you can't with more info in [18:04:03] RhinosF1: this is true [18:04:18] PROBLEM - glusterfs2 Puppet on glusterfs2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:04:48] PROBLEM - glusterfs2 Current Load on glusterfs2 is CRITICAL: CRITICAL - load average: 6.28, 6.21, 4.80 [18:05:55] we ought to find a way to keep serving like a cached version of a requested page instead of instantly 503 [18:05:58] that would nice [18:06:15] Zppix: nah cause I can't edit then still [18:06:39] RhinosF1: i mean its better then staring at a 503 error page [18:07:55] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw3 [18:08:41] Zppix: true [18:09:20] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:09:37] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw3 [18:10:03] PROBLEM - cp2 Stunnel Http for mw3 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:10:06] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw3 [18:10:21] PROBLEM - cp3 Stunnel Http for mw3 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
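Zppix's suggestion at 18:05 (serve a cached copy instead of an instant 503 when the backend struggles) is what Varnish calls grace mode. The toy cache below only illustrates the "serve stale on error" idea; it is not how the cp* proxies are actually configured.

```python
#!/usr/bin/env python3
"""Toy illustration of serve-stale-on-error: prefer a fresh fetch, but fall
back to a stale cached copy within a grace window if the backend fails."""
import time
import urllib.request


class StaleOnErrorCache:
    def __init__(self, ttl: float = 60.0, grace: float = 3600.0):
        self.ttl = ttl        # serve from cache without asking the backend
        self.grace = grace    # how long a stale copy may still be served
        self.store = {}       # url -> (fetched_at, body)

    def fetch(self, url: str) -> bytes:
        now = time.time()
        cached = self.store.get(url)
        if cached and now - cached[0] < self.ttl:
            return cached[1]                      # fresh enough
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                body = resp.read()
            self.store[url] = (now, body)
            return body
        except Exception:
            # Backend down or slow: fall back to a stale copy if it is still
            # within the grace window, otherwise re-raise (the user sees 503).
            if cached and now - cached[0] < self.ttl + self.grace:
                return cached[1]
            raise
```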
[18:10:59] * Zppix watches as the world of Miraheze burns [18:17:03] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24586 bytes in 0.090 second response time [18:17:13] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy [18:17:35] RECOVERY - cp2 Stunnel Http for mw3 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24570 bytes in 0.456 second response time [18:17:37] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [18:17:43] RECOVERY - cp3 Stunnel Http for mw3 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24570 bytes in 0.655 second response time [18:18:24] RECOVERY - glusterfs1 Puppet on glusterfs1 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [18:18:50] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy [18:21:22] !log removed PII for a user (ToU enforcement) [18:21:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [18:28:51] RECOVERY - glusterfs2 Puppet on glusterfs2 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:31:15] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw1 [18:32:51] PROBLEM - Host mw1 is DOWN: PING CRITICAL - Packet loss = 100% [18:33:46] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: HTTP CRITICAL - No data received from host [18:34:24] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw1 [18:35:15] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw1 [18:35:15] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: HTTP CRITICAL - No data received from host [18:35:16] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 7.23, 7.35, 6.78 [18:36:09] PROBLEM - cp2 Stunnel Http for mw1 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:36:13] ... [18:36:27] oh [18:36:28] bugger [18:36:28] mw1 is down [18:36:36] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static-new] [18:38:39] noice [18:40:18] PROBLEM - mw3 Current Load on mw3 is CRITICAL: CRITICAL - load average: 9.56, 8.23, 7.24 [18:47:45] RECOVERY - Host mw1 is UP: PING OK - Packet loss = 0%, RTA = 0.36 ms [18:47:49] PROBLEM - mw1 Disk Space on mw1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:47:49] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: connect to address 185.52.1.75 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [18:47:49] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:47:49] PROBLEM - mw1 SSH on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:47:49] PROBLEM - mw1 Current Load on mw1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. 
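The sslhost checks seen throughout this log (knuxwiki.com earlier, www.netazar.org just below) amount to: connect to the custom domain, read its certificate, and report how long it has left, failing outright if the domain no longer resolves or no longer points at Miraheze. A minimal equivalent, not the actual plugin Miraheze runs:

```python
#!/usr/bin/env python3
"""Minimal equivalent of the "<domain> - LetsEncrypt on sslhost" checks:
connect to the domain, read its certificate and report the days remaining."""
import socket
import ssl
import time


def cert_days_left(hostname: str, port: int = 443) -> int:
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    # 'notAfter' is a string like 'Jun 12 23:59:59 2020 GMT'
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    return int((expires - time.time()) // 86400)


if __name__ == "__main__":
    for domain in ("knuxwiki.com", "www.netazar.org"):
        try:
            print(f"{domain}: certificate expires in {cert_days_left(domain)} day(s)")
        except OSError as exc:
            # Covers DNS failures ("no longer pointing at us"), timeouts and
            # TLS errors alike, which the icinga check reports as CRITICAL.
            print(f"{domain}: CRITICAL - {exc}")
```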
[18:48:03] RECOVERY - mw1 Disk Space on mw1 is OK: DISK OK - free space: / 28096 MB (36% inode=98%); [18:48:03] RECOVERY - mw1 MirahezeRenewSsl on mw1 is OK: TCP OK - 0.000 second response time on 185.52.1.75 port 5000 [18:48:11] [02mw-config] 07GustaveLondon776 opened pull request 03#2766: Grant bureaucrats renameuser - 13https://git.io/Jesko [18:48:15] RECOVERY - mw1 Current Load on mw1 is OK: OK - load average: 0.03, 0.03, 0.00 [18:48:41] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 26 minutes ago with 0 failures [18:48:48] RECOVERY - mw1 SSH on mw1 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) [18:49:03] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 5.84, 7.58, 7.63 [18:49:03] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: connect to address 185.52.1.75 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [18:50:07] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jesk6 [18:50:09] [02miraheze/services] 07MirahezeSSLBot 034908619 - BOT: Updating services config for wikis [18:50:17] [02mw-config] 07Pix1234 commented on pull request 03#2766: Grant bureaucrats renameuser - 13https://git.io/Jeski [18:51:24] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24592 bytes in 0.684 second response time [18:51:50] PROBLEM - netazar.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.netazar.org' expires in 15 day(s) (Mon 07 Oct 2019 06:46:45 PM GMT +0000). [18:51:59] [02mw-config] 07RhinosF1 commented on pull request 03#2766: Grant bureaucrats renameuser - 13https://git.io/JeskP [18:52:01] [02mw-config] 07RhinosF1 closed pull request 03#2766: Grant bureaucrats renameuser - 13https://git.io/Jesko [18:52:32] PROBLEM - glusterfs2 Puppet on glusterfs2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [18:52:46] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24570 bytes in 0.004 second response time [18:52:46] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 442 bytes in 0.034 second response time [18:52:50] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [18:52:50] PROBLEM - www.netazar.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'www.netazar.org' expires in 15 day(s) (Mon 07 Oct 2019 06:46:45 PM GMT +0000). [18:53:07] RECOVERY - cp2 Stunnel Http for mw1 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24586 bytes in 0.396 second response time [18:53:07] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/Jesk1 [18:53:09] [02miraheze/ssl] 07MirahezeSSLBot 0393c6b65 - Bot: Update SSL cert for www.netazar.org [18:53:34] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:53:43] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy [18:54:59] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy [18:55:51] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 3.75, 4.70, 6.29 [18:58:28] RECOVERY - glusterfs2 Puppet on glusterfs2 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [19:04:15] RECOVERY - netazar.org - LetsEncrypt on sslhost is OK: OK - Certificate 'www.netazar.org' will expire on Fri 20 Dec 2019 05:53:00 PM GMT +0000. [19:04:35] FYI to all, please let me know if you feel a small slowdown in performance. 
[19:04:50] E.g. one minute it's fast, next minute it's slow. [19:05:11] RECOVERY - www.netazar.org - LetsEncrypt on sslhost is OK: OK - Certificate 'www.netazar.org' will expire on Fri 20 Dec 2019 05:53:00 PM GMT +0000. [19:05:39] We've deployed a new file storage solution to meta and need feedback as to whether it's worse or better. [19:06:51] :) better so far I think [19:07:13] apart from whatever happened when I was enforcing the ToU, that was snail pace [19:07:57] great! [19:08:03] i'm hoping for better uptime [19:08:46] even if it means sometimes thumbnails not showing. (preferably everything working) but i'd rather the sites are up than the mount timing out. [19:09:23] given there's been less downtime already I'd say it's good. The thumbnail issues seem minor afaik and are easily fixed. [19:09:37] ok [19:13:04] [02mw-config] 07GustaveLondon776 opened pull request 03#2767: Update LocalSettings.php - 13https://git.io/Jeskd [19:13:41] [02mw-config] 07RhinosF1 commented on pull request 03#2767: Update LocalSettings.php - 13https://git.io/JeskF [19:13:42] [02mw-config] 07RhinosF1 closed pull request 03#2767: Update LocalSettings.php - 13https://git.io/Jeskd [19:14:31] paladox: is there any way to block him opening pointless PRs? [19:14:44] huh? [19:14:48] PROBLEM - glusterfs2 Puppet on glusterfs2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:14:51] you can't block pull requests [19:14:58] paladox: Gustave keeps opening crappy PRs that aren't helpful [19:15:09] You can block him from the repo [19:15:22] I'll talk to him on wiki but pls do for a bit paladox [19:15:22] I'd support that [19:16:23] [02ManageWiki] 07RhinosF1 commented on pull request 03#117: Update extension.json - 13https://git.io/JeskA [19:16:25] [02ManageWiki] 07RhinosF1 closed pull request 03#117: Update extension.json - 13https://git.io/JesUx [19:17:08] paladox: meta slow [19:17:22] thanks, fast for me [19:17:36] paladox: recovered [19:19:48] Hmm re block, i'm not sure about that one. [19:20:01] you can block him from repos or the entire organization, wouldn't really recommend though [19:20:15] paladox: just for like 15-30 mins to give me time to warn him and for him to respond maybe [19:20:35] I'd rather not, unless he's spamming, which his patches do not show. [19:20:42] ok [19:20:51] Zppix: bot dead [19:20:56] back [19:21:04] RhinosF1: patience young one [19:21:31] if the bot dies, wait five minutes before acknowledging it [19:28:15] RECOVERY - glusterfs2 Puppet on glusterfs2 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [19:41:43] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JesI4 [19:41:45] [02miraheze/puppet] 07paladox 0331d9fac - mediawiki: Set fetch-attempt to 5 [19:44:02] PROBLEM - glusterfs2 Puppet on glusterfs2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [19:44:17] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JesIR [19:44:18] [02miraheze/puppet] 07paladox 0327242d1 - Update mediawiki.pp [19:46:53] !log depool mw[123] in order and repool [19:46:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [19:50:24] RECOVERY - glusterfs2 Puppet on glusterfs2 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [19:58:43] PROBLEM - test1 Puppet on test1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures.
Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static-new] [20:05:59] RECOVERY - test1 Puppet on test1 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:35:51] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw3 [20:39:05] PROBLEM - mw3 Current Load on mw3 is CRITICAL: CRITICAL - load average: 8.73, 7.77, 6.03 [20:39:40] PROBLEM - cp4 Stunnel Http for mw3 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:43:00] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 7.71, 7.52, 6.32 [20:43:24] RECOVERY - cp4 Stunnel Http for mw3 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24592 bytes in 0.189 second response time [20:43:30] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [20:46:01] is there a preinstalled feature for viewing site demographics on my wiki? [20:46:23] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 3.75, 6.13, 6.04 [20:47:46] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static-new] [20:53:47] PROBLEM - glusterfs1 Puppet on glusterfs1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [20:54:18] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:18] !log reboot glusterfs[12] [20:55:22] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [21:01:24] RECOVERY - glusterfs1 Puppet on glusterfs1 is OK: OK: Puppet is currently enabled, last run 7 minutes ago with 0 failures [21:05:36] RECOVERY - glusterfs1 Current Load on glusterfs1 is OK: OK - load average: 2.35, 2.23, 1.07 [21:05:41] RECOVERY - glusterfs2 Current Load on glusterfs2 is OK: OK - load average: 2.38, 2.28, 1.13 [21:33:17] [02miraheze/dns] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JesLd [21:33:18] [02miraheze/dns] 07paladox 0379ad3ff - remove es[1234] [21:54:25] PROBLEM - glusterfs1 Current Load on glusterfs1 is CRITICAL: CRITICAL - load average: 12.33, 6.13, 3.56 [21:57:21] ........ [21:57:35] PROBLEM - glusterfs2 Current Load on glusterfs2 is WARNING: WARNING - load average: 3.02, 3.74, 2.79 [21:58:01] paladox: whats up with gluster? [21:58:31] it has high load, i'm rsynincing but it should not be causing that type of high load. Disk performance is horrible :( [22:00:14] RECOVERY - glusterfs2 Current Load on glusterfs2 is OK: OK - load average: 1.59, 2.67, 2.53 [22:01:37] paladox: time for solid state! [22:02:35] RECOVERY - glusterfs1 Current Load on glusterfs1 is OK: OK - load average: 2.16, 3.29, 3.38 [22:19:21] dunno how well we could afford that [22:25:29] Voidwalker: first laabster starts accepting donations for miraheze :P [22:26:51] I'm hoping to make that a non-issue soon ;) [22:27:34] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb [22:27:53] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw3 [22:28:09] Voidwalker: how so? [22:28:20] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. 
mw3 [22:30:07] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online [22:30:26] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy [22:30:39] we're (finally) making progress on one of our long-term goals that's been around since the beginning [22:30:57] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [22:31:21] Voidwalker: which is? [22:32:08] official charity status :) [22:33:00] Voidwalker: i thought miraheze was already 501c? [22:35:22] nope, that's been a goal since forever [22:40:44] PROBLEM - glusterfs1 Current Load on glusterfs1 is CRITICAL: CRITICAL - load average: 4.73, 4.77, 3.25 [22:40:50] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw1 [22:41:14] PROBLEM - glusterfs2 Current Load on glusterfs2 is CRITICAL: CRITICAL - load average: 4.31, 5.10, 3.45 [22:43:23] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. mw1 [22:44:00] PROBLEM - cp3 Stunnel Http for mw1 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:44:07] PROBLEM - glusterfs1 Current Load on glusterfs1 is WARNING: WARNING - load average: 3.07, 3.99, 3.25 [22:45:10] PROBLEM - cp4 Stunnel Http for mw1 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [22:45:10] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw1 [22:47:15] RECOVERY - cp3 Stunnel Http for mw1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24570 bytes in 0.664 second response time [22:47:17] RECOVERY - glusterfs1 Current Load on glusterfs1 is OK: OK - load average: 1.11, 2.55, 2.81 [22:47:46] RECOVERY - glusterfs2 Current Load on glusterfs2 is OK: OK - load average: 1.62, 2.95, 3.12 [22:47:56] RECOVERY - cp4 Stunnel Http for mw1 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24586 bytes in 0.007 second response time [22:47:56] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy [22:49:13] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy [22:49:54] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 6.22, 7.86, 6.43 [22:50:12] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy [22:56:34] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static-new] [22:58:02] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 3.98, 5.98, 6.23 [23:14:44] Voidwalker:P [23:23:15] (We’ve got exciting news just waiting to hear back!) [23:24:11] paladox: whats up with all the alerts [23:26:04] That looks gluster related [23:26:48] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JesqL [23:26:50] [02miraheze/mw-config] 07paladox 0385dbff7 - Revert "Move sidemwiki to new mount" This reverts commit bd74c6a1a0432f978d7a114331334fa37860682f. [23:33:57] Actually it appears that it disconnected from gluster on mw1 [23:34:10] Hence why 503, then eventual recovery [23:35:02] Because by it disconnecting instead of staying there forever it allowed mw1 to technically recover [23:35:46] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:37:23] PROBLEM - test1 Puppet on test1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki config]
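Since paladox asked at 19:04 for feedback on whether the new file storage (the glusterfs hosts above) feels slower, one way to answer is to sample a page on an interval and log latency and status, so blips like the mw1 gluster disconnect just diagnosed show up as data rather than impressions. The URL and interval below are placeholders only:

```python
#!/usr/bin/env python3
"""Sample a page periodically and log latency plus HTTP status, to back up
(or rule out) "it feels slower" reports with numbers."""
import time
import urllib.error
import urllib.request

URL = "https://meta.miraheze.org/wiki/Miraheze"
INTERVAL = 30  # seconds between samples


def sample(url: str) -> str:
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=15) as resp:
            resp.read()
        return f"{resp.status} in {time.monotonic() - start:.2f}s"
    except urllib.error.HTTPError as exc:        # e.g. the 502/503s seen above
        return f"{exc.code} in {time.monotonic() - start:.2f}s"
    except Exception as exc:
        return f"error after {time.monotonic() - start:.2f}s: {exc}"


if __name__ == "__main__":
    while True:
        print(time.strftime("%H:%M:%S"), sample(URL))
        time.sleep(INTERVAL)
```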