[00:01:40] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 4170 MB (17% inode=93%); [00:52:36] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-3 [+0/-0/±1] 13https://git.io/Jf4bX [00:52:38] [02miraheze/puppet] 07paladox 03fada8f3 - openldap: Allow user 'writer' to write in ldap [00:52:39] [02puppet] 07paladox created branch 03paladox-patch-3 - 13https://git.io/vbiAS [00:52:41] [02puppet] 07paladox opened pull request 03#1375: openldap: Allow user 'writer' to write in ldap - 13https://git.io/Jf4b1 [01:18:03] [02puppet] 07paladox synchronize pull request 03#1375: openldap: Allow user 'writer' to write in ldap - 13https://git.io/Jf4b1 [01:18:05] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-3 [+0/-0/±1] 13https://git.io/Jf4Ng [01:18:06] [02miraheze/puppet] 07paladox 039fbff7f - Update openldap.pp [04:27:41] PROBLEM - cloud1 APT on cloud1 is CRITICAL: APT CRITICAL: 39 packages available for upgrade (5 critical updates). [04:49:30] PROBLEM - jobrunner1 APT on jobrunner1 is CRITICAL: APT CRITICAL: 32 packages available for upgrade (5 critical updates). [05:07:32] PROBLEM - db7 APT on db7 is CRITICAL: APT CRITICAL: 35 packages available for upgrade (5 critical updates). [05:13:41] PROBLEM - rdb2 APT on rdb2 is CRITICAL: APT CRITICAL: 26 packages available for upgrade (5 critical updates). [05:27:05] PROBLEM - puppet2 APT on puppet2 is CRITICAL: APT CRITICAL: 25 packages available for upgrade (5 critical updates). [05:32:35] PROBLEM - mw5 APT on mw5 is CRITICAL: APT CRITICAL: 32 packages available for upgrade (5 critical updates). [05:39:23] PROBLEM - dbt1 APT on dbt1 is CRITICAL: APT CRITICAL: 34 packages available for upgrade (5 critical updates). [05:56:27] PROBLEM - test2 APT on test2 is CRITICAL: APT CRITICAL: 36 packages available for upgrade (5 critical updates). [06:03:11] PROBLEM - ns2 APT on ns2 is CRITICAL: APT CRITICAL: 25 packages available for upgrade (5 critical updates). [06:06:29] RECOVERY - ns2 APT on ns2 is OK: APT OK: 20 packages available for upgrade (0 critical updates). [06:08:29] RECOVERY - db7 APT on db7 is OK: APT OK: 30 packages available for upgrade (0 critical updates). [06:13:29] RECOVERY - cloud1 APT on cloud1 is OK: APT OK: 34 packages available for upgrade (0 critical updates). [06:14:05] RECOVERY - puppet2 APT on puppet2 is OK: APT OK: 20 packages available for upgrade (0 critical updates). [06:16:34] RECOVERY - jobrunner1 APT on jobrunner1 is OK: APT OK: 27 packages available for upgrade (0 critical updates). [06:20:25] RECOVERY - dbt1 APT on dbt1 is OK: APT OK: 29 packages available for upgrade (0 critical updates). [06:25:17] Hi guys [06:40:58] RECOVERY - test2 APT on test2 is OK: APT OK: 31 packages available for upgrade (0 critical updates). [06:54:17] RECOVERY - mw5 APT on mw5 is OK: APT OK: 27 packages available for upgrade (0 critical updates). [06:58:14] Reception123: around? [06:58:41] RECOVERY - rdb2 APT on rdb2 is OK: APT OK: 21 packages available for upgrade (0 critical updates). [06:58:46] PROBLEM - mw5 Puppet on mw5 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[php7.3-apcu] [07:05:37] RECOVERY - mw5 Puppet on mw5 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [07:10:20] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfBTg [07:10:21] [02miraheze/services] 07MirahezeSSLBot 03236015b - BOT: Updating services config for wikis [07:18:01] PROBLEM - rdb1 APT on rdb1 is CRITICAL: APT CRITICAL: 26 packages available for upgrade (5 critical updates). [07:27:41] PROBLEM - gluster1 APT on gluster1 is CRITICAL: APT CRITICAL: 27 packages available for upgrade (5 critical updates). [07:54:59] PROBLEM - services2 APT on services2 is CRITICAL: APT CRITICAL: 31 packages available for upgrade (5 critical updates). [07:55:45] RhinosF1: yes [07:56:15] PROBLEM - bacula2 APT on bacula2 is CRITICAL: APT CRITICAL: 25 packages available for upgrade (5 critical updates). [07:56:42] Reception123: see the task I raised to high on fandom [08:01:28] Reception123: *phabricator [08:01:36] * RhinosF1 is multitasking badly [08:05:12] looking [08:07:10] RhinosF1: would that be the MediaWikiChat extension? [08:07:12] morning JohnLewis [08:07:59] Reception123: I’m not sure, they were changes by paladox to the ManageWikiInstaller and that’s all I can see that’s changed [08:08:17] RhinosF1: yeah he made the changes because it didn't work before [08:08:22] but I thought it was fixed [08:08:26] JohnLewis: this is about https://phabricator.miraheze.org/T5587 [08:08:27] [ ⚓ T5587 Blockedfromchat permission issue ] - phabricator.miraheze.org [08:08:56] Reception123: he made the changes to fix mwscript and the reset defaults script yesterday [08:09:10] oh [08:09:18] well I only ran a script on another wiki [08:09:50] Reception123: ManageWiki is the only thing I can see that’s changed [08:09:56] Reception123: it’s every wiki [08:10:53] ah well mwscript, because reset defaults is only done individually [08:11:07] https://csydes.miraheze.org/wiki/Special:ListGroupRights [08:11:08] [ User group rights - C.Syde's Wiki ] - csydes.miraheze.org [08:11:36] Reception123: I’m saying that’s what the changes were supposed to fix yesterday. It may have broken other things. [08:11:47] yeah it's possible [08:11:57] though that permission has always had issues [08:12:09] paladox, JohnLewis: can you see ^ [08:12:22] Reception123: it worked fine a few days ago [08:12:46] It’s definitely not always had issues as I’ve seen it work before [08:14:27] Reception123: can you also re-enable wikibase on test2? [08:14:29] !log sudo -u www-data php /srv/mediawiki/w/maintenance/deleteBatch.php --wiki awesomegameswiki -r "Requested - T5583" /home/reception/cg2.txt (new) [08:14:33] RhinosF1: sure [08:14:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [08:14:41] * RhinosF1 recommends it is never disabled as it breaks things [08:15:03] * Reception123 wonders if one day someone will break test2 so bad that we'll have to drop it and reset [08:15:16] RhinosF1: it is enabled... [08:16:05] Reception123: it wasn’t yesterday. Maybe paladox sorted it [08:16:39] probably [08:17:19] Reception123: he did.
Check exception 3bdfcb57480e80ce535678d2 please on https://test2.miraheze.org/wiki/Item:Q19 [08:17:20] [ Internal error - Test2 ] - test2.miraheze.org [08:17:52] Note: Disabling wikibase should not be done without a sysadmin willing to debug’s intervention [08:18:05] 2020-05-14 08:16:37 test2 test2wiki: [3bdfcb57480e80ce535678d2] /wiki/Item:Q19 InvalidArgumentException from line 123 of /srv/mediawiki/w/includes/api/ApiModuleManager.php: $spec must define a class name [08:18:08] RhinosF1: ^ [08:19:23] Reception123: what the ... [08:20:10] Reception123: are you ready for some fun trying to safely disable WikiBase and re-enable it ? [08:20:36] RhinosF1: I'll do it but if test2 is down I won't be there for the fix :P [08:21:21] Reception123: we need to use delete batch to empty anything in wikibase’s namespaces [08:21:35] Disable wikibase extensions, then drop the namespaces [08:22:56] Ok, I'll be back in a bit to do that [08:24:17] ManageWiki doesn’t deal with wgRevokedPermissions currently [08:26:10] JohnLewis: what broke it then? [08:26:13] And when? [08:26:59] * RhinosF1 also thinks wikibase needs a ‘do not disable unless you know what you’re doing warning. Can break stuff in wonderful ways’ [08:27:08] Improper config when CreateWiki cache took over ages ago? [08:27:39] JohnLewis: could be? Can we fix it? [08:27:57] Yes, by doing proper config for it [08:28:07] Also, how would we run populateSites for all WikiBase wikis on a regular interval? [08:28:13] JohnLewis: which is? [08:28:17] LE.php ? [08:28:42] Yeah [08:29:26] I’ll fix when at pc [08:29:37] JohnLewis: what about running that script regularly [08:29:49] We’d need to know all WikiBase client wikis at the time though [08:31:04] Its complictaed, also boils down to needs [08:31:28] JohnLewis: the sites table would be out of date on those wikis as soon as a new wiki was created [08:31:33] So there’s a need [08:31:41] Well, not really [08:31:49] Especially if I can work out what’s going on with miradatawiki [08:31:50] What if no one needs new wikis added? [08:32:22] JohnLewis: if miradata wiki works, they will. It doesn’t make sense to have a outdated sites table [08:32:50] Why doesn't it, if all they wikis they need are there? [08:33:31] PROBLEM - cp7 Current Load on cp7 is CRITICAL: CRITICAL - load average: 2.41, 4.07, 3.00 [08:33:38] JohnLewis: well, miradata wiki would never work without every wiki. [08:33:59] why? [08:34:28] How can we have a wiki allowing you to link pages on each wiki if all wikis aren’t on there [08:34:44] You know how wikidata works, think that for Miraheze [08:35:00] Okay [08:35:13] Well work with someone on it [08:35:31] I’m asking how to get the list of WikiBase client wikis [08:35:34] If we can do that [08:35:40] As a dblist [08:35:45] Then the puppet change is simple [08:35:51] There's no way currently [08:36:23] * RhinosF1 not even sure how to create a dblist so will need help here [08:36:54] PROBLEM - cp7 Current Load on cp7 is WARNING: WARNING - load average: 3.41, 3.85, 3.12 [08:36:59] per settings, it's not possible [08:37:18] That’s a bottleneck then [08:37:56] Can be done with a script in MirahezeMagic but won't be supported in CW or MW [08:38:13] That’s a start, how can it be done? [08:38:43] looking through every wiki, check if its set, adding it to an array, then writing it to file [08:40:13] RECOVERY - cp7 Current Load on cp7 is OK: OK - load average: 1.38, 2.62, 2.77 [08:41:44] JohnLewis: that means checking against mw_extensions ? 
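(A rough sketch of the MirahezeMagic-style script being discussed here — walk the wiki list, check which wikis have WikibaseClient enabled, and write the result out as a dblist that puppet or a populateSites cron could consume. The DSN, table, column and path names below are illustrative assumptions, not the real CreateWiki/ManageWiki schema.)

    <?php
    // Sketch only: every identifier below (connection details, mw_extensions
    // table layout, dblist path) is a placeholder, not Miraheze's real schema.
    $pdo = new PDO( 'mysql:host=db6.example.org;dbname=managewiki', 'reader', 'secret' );
    $pdo->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );

    // Hypothetical table holding one row per (wiki, extension) pair.
    $stmt = $pdo->query(
        "SELECT ext_wiki FROM mw_extensions WHERE ext_name = 'wikibaseclient' ORDER BY ext_wiki"
    );
    $wikis = $stmt->fetchAll( PDO::FETCH_COLUMN );

    // One database name per line, the usual dblist format.
    file_put_contents( '/srv/mediawiki/dblist/wikibaseclient.dblist', implode( "\n", $wikis ) . "\n" );
    echo count( $wikis ) . " WikibaseClient wikis written\n";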
[08:42:29] That's one way [08:43:14] JohnLewis: I don't know of another way. You'd have to look at every wikis table table and check if wikibaseclient was listed [08:43:24] * RhinosF1 would need to work out the schema for that [08:43:35] * RhinosF1 would need to brush up his sql as well [08:44:33] [02mw-config] 07RhinosF1 opened pull request 03#3076: set blockedfromchat correctly - 13https://git.io/JfBqs [08:44:45] paladox: ^ that fixes it [08:47:14] ok [08:49:28] [02mw-config] 07paladox synchronize pull request 03#3076: set blockedfromchat correctly - 13https://git.io/JfBqs [08:50:17] paladox: why an override? Why not an addition? [08:50:41] ^ [08:51:24] ah, right. [08:52:14] [02mw-config] 07paladox synchronize pull request 03#3076: set blockedfromchat correctly - 13https://git.io/JfBqs [08:52:16] JohnLewis better? [08:52:49] yeah, but why the need for the array? [08:52:55] Can be done in one line, not three [08:54:18] [02mw-config] 07paladox synchronize pull request 03#3076: set blockedfromchat correctly - 13https://git.io/JfBqs [08:58:14] [02mw-config] 07paladox closed pull request 03#3076: set blockedfromchat correctly - 13https://git.io/JfBqs [08:58:16] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfBqF [08:58:17] [02miraheze/mw-config] 07RhinosF1 03eaf2fbd - set blockedfromchat correctly (#3076) [09:09:02] RhinosF1: still need me for wikibase? [09:09:08] Reception123: yes [09:09:20] RhinosF1: so what do you want me to do again? [09:09:26] 09:21:21 Reception123: we need to use delete batch to empty anything in wikibase’s namespaces [09:09:26] 09:21:35 Disable wikibase extensions, then drop the namespaces [09:09:29] Reception123: ^ [09:09:39] ok [09:10:54] RhinosF1: a manual delete is probably easier than deletebatch tbh since there's just a few [09:11:12] Reception123: if you can reach delete on wiki then go ahead [09:11:21] gives an internal error but delete still works [09:11:58] okay [09:14:51] RhinosF1: pages deleted and wikibase disabled but there is no delete option for the item NS [09:15:31] Reception123: can you chuck me crat on test2 to look? [09:16:42] ok [09:17:16] heh for some reason I can't change userrights [09:17:43] Reception123: how lovely [09:17:52] paladox: how come group assignments got removed on test2? [09:18:11] what do you mean? [09:18:39] paladox: I mean I wasn't able to assign userrights as a crat [09:18:48] because "Group assignments" was empty in MWP [09:19:06] I've restored now [09:19:14] it shows for me [09:19:32] paladox: because I just added it back :P [09:20:38] don't get how it disappeared before [09:21:47] heh [09:22:07] Reception123: echo keeps saying 1 notification then when. I click the bell it says none [09:22:18] paladox: just worried because I hope that only happened for test2 [09:22:47] RhinosF1: yeah echo is annoying at times [09:22:54] Reception123: that's a bug [09:23:06] [02miraheze/mediawiki] 07paladox pushed 031 commit to 03REL1_34 [+0/-0/±2] 13https://git.io/JfBmi [09:23:08] [02miraheze/mediawiki] 07paladox 03c67d464 - Update DiscordNotifications and SlackNotifications [09:23:09] * RhinosF1 thinks ManageWiki might be trying to be intelligent [09:23:10] * Reception123 blames upstream [09:24:21] Reception123: let's see if this works [09:24:24] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfBmX [09:24:26] [02miraheze/mw-config] 07paladox 03c596479 - wgWikiUrl -> wgDiscordNotificationWikiUrl [09:36:32] paladox: so can we install slack now? 
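(Back to the blockedfromchat fix merged above: the actual diff for mw-config #3076 isn't shown in this log, but JohnLewis's "one line, not three" remark matches how $wgRevokedPermissions — a core MediaWiki setting — is normally written. The group and right names below are taken from the MediaWikiChat discussion and are illustrative, not necessarily the exact change that landed.)

    <?php
    // Illustrative one-line form: members of the blockedfromchat group lose
    // the 'chat' right, which is how a "blocked from chat" group is normally
    // wired up. Not necessarily the exact change merged in #3076.
    $wgRevokedPermissions['blockedfromchat']['chat'] = true;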
[09:36:38] yes [09:36:51] paladox: could you please do the puppet part since last time somehow I broke it and you managed to fix? [09:37:11] what puppet part? [09:37:30] paladox: well the private webhook part [09:37:35] adding it in common.yaml [09:37:47] did you remove it? [09:38:01] paladox: I think you did? [09:38:06] I'll chec [09:38:08] *check [09:38:39] nope [09:38:40] mediawiki::wiki_slack_hooks_url: [09:38:40] default: '' [09:38:41] PROBLEM - cp7 Current Load on cp7 is WARNING: WARNING - load average: 1.90, 3.50, 3.29 [09:40:45] paladox: oh it's still there ok :) [09:42:00] RECOVERY - cp7 Current Load on cp7 is OK: OK - load average: 1.56, 2.72, 3.04 [10:31:45] paladox: so can I re-add https://github.com/miraheze/puppet/commit/dbf238820e81f5a0bdfcfc35885bb075cae0d0eb ? [10:31:45] [ rm slacknotifications · miraheze/puppet@dbf2388 · GitHub ] - github.com [10:31:55] and do I need to merge mw-config before or puppet before? [10:32:11] yes [10:32:22] though tbf, you didn't really need to remove that in the first place [10:33:22] ok [10:33:28] paladox: which one first though? [10:33:38] the puppet change [10:35:16] ok [10:35:21] thanks [10:39:37] [02miraheze/puppet] 07Reception123 pushed 031 commit to 03master [+0/-0/±2] 13https://git.io/JfBsk [10:39:38] [02miraheze/puppet] 07Reception123 039651400 - Revert "rm slacknotifications" This reverts commit dbf238820e81f5a0bdfcfc35885bb075cae0d0eb. [10:52:56] [02mw-config] 07Reception123 synchronize pull request 03#3069: re-add slacknotifications - 13https://git.io/Jf8CH [10:52:57] [02miraheze/mw-config] 07Reception123 pushed 031 commit to 03Reception123-patch-4 [+0/-0/±1] 13https://git.io/JfBsX [10:52:59] [02miraheze/mw-config] 07Reception123 03bad0dd3 - changes to ext [10:55:44] [02mw-config] 07Reception123 closed pull request 03#3069: re-add slacknotifications - 13https://git.io/Jf8CH [10:55:46] [02miraheze/mw-config] 07Reception123 pushed 031 commit to 03master [+0/-0/±4] 13https://git.io/JfBs5 [10:55:47] [02miraheze/mw-config] 07Reception123 03e983c44 - re-add slacknotifications (#3069) * re-add slacknotifications fix proposed by the developer. * Update LocalExtensions.php * changes to ext [11:03:34] !log ran messagefiles and rebuild LC on mw*/jbr1 [11:03:39] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [11:15:31] PROBLEM - mon1 APT on mon1 is CRITICAL: APT CRITICAL: 31 packages available for upgrade (5 critical updates). [11:58:50] !log added slack hook for jungegeotechnikerwiki [11:58:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:02:22] paladox: there's been a few alerts about critical updates on servers, can you look through and do them if needed? [12:02:34] Reception123: anything else for me to break today? [12:02:47] don't think so ;) [12:03:09] !log apt-get dist-upgrade - mon1 [12:03:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:03:58] PROBLEM - ping4 on cp3 is UNKNOWN: /bin/ping -4 -n -U -w 10 -c 5 128.199.139.216CRITICAL - Could not interpret output from ping command [12:04:48] Alert to Miraheze Staff: It looks like the icinga-miraheze bot has stopped! Ping !sre. [12:04:49] https://meta.miraheze.org is 03UP03 [12:04:54] !log reboot mon1 [12:04:57] *rehash [12:04:57] Rehashing...... [12:05:04] !shutup icinga [12:05:04] I will not report Icinga incidents [12:05:55] RECOVERY - mon1 APT on mon1 is OK: APT OK: 0 packages available for upgrade (0 critical updates). [12:06:04] So I can't delete groups on my wiki? 
[12:06:09] paladox: rdb1, gluster1, services2 and bacula2 are the others [12:06:18] CptViraj: user groups? [12:06:20] PROBLEM - ping4 on cp3 is CRITICAL: PING CRITICAL - Packet loss = 0%, RTA = 255.36 ms [12:06:33] they all need it. [12:06:49] some cannot be rebooted like the db servers and cloud servers. [12:06:50] RhinosF1: yep user groups [12:07:00] but i can update. [12:07:31] paladox: ty, thx for looking [12:07:53] CptViraj: if they're created through managewiki, yes. Which one on which wiki? [12:07:58] !speak [12:07:59] I will now report events to this channel [12:08:05] RECOVERY - rdb1 APT on rdb1 is OK: APT OK: 0 packages available for upgrade (0 critical updates). [12:08:26] RhinosF1: ah they are MediaWiki default ones [12:08:44] !log reboot rdb1 [12:08:46] !log reboot mon1 [12:08:47] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:08:50] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:09:18] !log apt-get dist-upgrade - rdb1 [12:09:20] RhinosF1: for example Rollback [12:09:21] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:09:28] !log apt-get dist-upgrade - phab1 [12:09:31] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:09:43] CptViraj: I will test that now [12:09:52] * RhinosF1 guesses he can break test2 [12:11:25] paladox: if you're rebooting test2, ping me so I know it's you. [12:11:50] !log apt-get dist-upgrade - bacula2 [12:11:56] !log reboot phab1 and bacula2 [12:12:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:12:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:13:21] CptViraj: One moment [12:13:48] !log apt-get dist-upgrade - puppet2 [12:13:53] !log apt-get dist-upgrade - services1 [12:13:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:14:07] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:14:41] !log apt-get dist-upgrade - services2 [12:14:53] !log reboot puppet2, services1 [12:14:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:14:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:15:11] CptViraj: it seems no [12:15:40] RhinosF1: okay thanks [12:15:46] RECOVERY - bacula2 APT on bacula2 is OK: APT OK: 0 packages available for upgrade (0 critical updates). [12:15:48] !log apt-get dist-upgrade - cp6 & ns2 [12:15:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:16:31] PROBLEM - phab1 Puppet on phab1 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [12:17:06] RECOVERY - services2 APT on services2 is OK: APT OK: 0 packages available for upgrade (0 critical updates). [12:17:44] paladox: i'm done on test2 [12:17:49] ok [12:18:17] !log apt-get dist-upgrade - test2 [12:18:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:18:40] PROBLEM - ns1 APT on ns1 is CRITICAL: APT CRITICAL: 23 packages available for upgrade (5 critical updates). 
[12:18:45] !log reboot ns2 [12:18:46] PROBLEM - cloud1 Current Load on cloud1 is CRITICAL: CRITICAL - load average: 27.95, 26.68, 19.81 [12:18:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:18:50] PROBLEM - bacula2 Puppet on bacula2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [12:19:36] !log reboot cp6 [12:19:39] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:19:52] !log reboot test2 [12:19:56] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:20:46] !log apt-get dist-upgrade - jobrunner1 [12:20:50] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:21:10] !log apt-get dist-upgrade - ns1 [12:21:21] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:21:54] RhinosF1: How to delete a user group which is created using ManageWiki? [12:22:17] !log apt-get dist-upgrade - ldap1 [12:22:29] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:22:39] RECOVERY - ns1 APT on ns1 is OK: APT OK: 0 packages available for upgrade (0 critical updates). [12:22:47] PROBLEM - cloud1 Current Load on cloud1 is WARNING: WARNING - load average: 22.73, 23.63, 20.02 [12:23:30] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfBCO [12:23:32] [02miraheze/puppet] 07paladox 039fe9d9c - nutcracker: Fallover to rdb1 [12:23:43] !log reboot ns1 [12:23:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:23:53] !log reboot jobrunner1 [12:24:00] !log reboot ldap1 [12:24:11] RECOVERY - phab1 Puppet on phab1 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [12:24:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:24:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:24:28] !log apt-get dist-upgrade - misc1 [12:24:31] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:25:27] !log reboot misc1 [12:25:31] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:26:14] !log apt-get dist-upgrade - gluster1 [12:26:35] RECOVERY - bacula2 Puppet on bacula2 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [12:26:36] PROBLEM - cloud1 Current Load on cloud1 is CRITICAL: CRITICAL - load average: 24.38, 23.61, 20.81 [12:26:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:28:58] RECOVERY - gluster1 APT on gluster1 is OK: APT OK: 0 packages available for upgrade (0 critical updates). [12:29:07] !log reboot gluster1 [12:29:22] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:29:41] !log apt-get dist-upgrade - gluster2 [12:29:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:30:03] PROBLEM - cp7 Current Load on cp7 is WARNING: WARNING - load average: 2.28, 3.93, 3.49 [12:30:14] RECOVERY - cloud1 Current Load on cloud1 is OK: OK - load average: 16.64, 17.05, 18.58 [12:30:40] PROBLEM - jobrunner1 Puppet on jobrunner1 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[12:30:54] !log reboot gluster2 [12:30:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:31:49] !log apt-get dist-upgrade - mail1 [12:31:51] PROBLEM - jobrunner1 JobRunner Service on jobrunner1 is CRITICAL: PROCS CRITICAL: 0 processes with args 'redisJobRunnerService' [12:31:51] !log apt-get dist-upgrade - rdb2 [12:32:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:32:42] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:33:48] PROBLEM - cp7 Current Load on cp7 is CRITICAL: CRITICAL - load average: 4.27, 3.39, 3.31 [12:34:20] !log reboot rdb1 [12:34:22] RECOVERY - jobrunner1 Puppet on jobrunner1 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [12:34:31] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:35:14] !log reboot mail1 [12:35:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:35:41] RECOVERY - jobrunner1 JobRunner Service on jobrunner1 is OK: PROCS OK: 1 process with args 'redisJobRunnerService' [12:37:23] RECOVERY - cp7 Current Load on cp7 is OK: OK - load average: 1.20, 2.65, 3.06 [12:39:04] PROBLEM - cloud1 Current Load on cloud1 is WARNING: WARNING - load average: 12.40, 19.90, 20.60 [12:40:52] RhinosF1: finally got slack working [12:40:55] thanks for the help paladox ! [12:41:34] Reception123: great! [12:41:38] * RhinosF1 is breaking core [12:42:39] PROBLEM - cloud1 Current Load on cloud1 is CRITICAL: CRITICAL - load average: 24.67, 21.97, 21.13 [12:44:03] PROBLEM - db6 APT on db6 is CRITICAL: APT CRITICAL: 35 packages available for upgrade (5 critical updates). [12:45:47] !log depool mw4 [12:45:50] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:45:57] RECOVERY - cloud1 Current Load on cloud1 is OK: OK - load average: 9.45, 16.70, 19.27 [12:50:00] !log apt-get dist-upgrade and reboot on mw4 [12:50:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:50:41] PROBLEM - cp8 Varnish Backends on cp8 is CRITICAL: 1 backends are down. mw4 [12:51:00] !log repool mw4 [12:51:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:51:28] !log depool mw5 [12:51:38] PROBLEM - cp7 Varnish Backends on cp7 is CRITICAL: 1 backends are down. mw5 [12:51:43] PROBLEM - cp6 Varnish Backends on cp6 is CRITICAL: 1 backends are down. mw5 [12:52:14] ^ we know what icinga is spewing [12:53:28] RhinosF1: well it's a bit weird it's saying it for cp6/7 but not 3/8 [12:53:48] Reception123: icinga is weird [12:53:58] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw5 [12:54:09] ah there it is [12:54:11] !log apt-get dist-upgrade and reboot mw5 [12:54:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:54:23] checks must run at different times [12:54:45] !log repool mw5 [12:54:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:55:01] !log depool mw6 [12:55:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:56:59] !log apt-get dist-upgrade and reboot mw6 [12:57:25] !log repool mw6 [12:57:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [12:58:04] !log depool mw7 [12:58:55] PROBLEM - mw5 Puppet on mw5 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[php7.3-apcu] [12:59:03] ^ paladox [12:59:14] thanks! [12:59:37] paladox: still finishing mw7, though why's icinga complaining about php7.3 on mw5? [12:59:57] likley you upgraded the package at the same time puppet was running? [13:00:04] it'll resolve its self on the next run [13:00:08] ok :) [13:00:38] !log apt-get dist-upgrade and reboot mw7 [13:00:44] Reception123: puppet does wonderful stuff if you wait 10 mins [13:01:00] !log repool mw7 [13:01:08] paladox: hmm that's odd I'm getting "Sorry! We could not process your edit due to a loss of session data." [13:01:16] and MirahezeLogbot doesn't seem to be logging stuff [13:01:25] err [13:01:35] Reception123 run puppet! [13:01:37] ok [13:01:43] nutcracker didn't start by the sounds of things [13:01:59] paladox: already running [13:02:25] will someone fix echo as well? that annoying 1 when the notification won't show is weird [13:02:46] !log restart lobot on mon1 [13:02:48] well first we need to fix this, and then I'll have to go [13:02:55] !log restart lobot on mon1 [13:02:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [13:03:10] ah there you go, fixed :) [13:03:41] RECOVERY - mw5 Puppet on mw5 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [13:03:51] Reception123: shall I file a task about echo? [13:04:08] yeah why not [13:04:12] I'll be back later [13:08:15] PROBLEM - ping4 on mw7 is CRITICAL: PING CRITICAL - Packet loss = 100% [13:08:49] uh [13:09:07] PROBLEM - mw7 APT on mw7 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [13:09:28] PROBLEM - cp7 Stunnel Http for mw7 on cp7 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [13:09:44] PROBLEM - mw7 php-fpm on mw7 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [13:10:05] PROBLEM - mw7 Puppet on mw7 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [13:10:09] PROBLEM - cp6 Stunnel Http for mw7 on cp6 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [13:10:19] !log startup mw7 using proxmox ui [13:10:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [13:11:41] RECOVERY - cp7 Varnish Backends on cp7 is OK: All 7 backends are healthy [13:12:05] RECOVERY - cp6 Varnish Backends on cp6 is OK: All 7 backends are healthy [13:12:11] RECOVERY - ping4 on mw7 is OK: PING OK - Packet loss = 0%, RTA = 0.63 ms [13:12:52] RECOVERY - mw7 APT on mw7 is OK: APT OK: 0 packages available for upgrade (0 critical updates). [13:13:08] RECOVERY - cp7 Stunnel Http for mw7 on cp7 is OK: HTTP OK: HTTP/1.1 200 OK - 15518 bytes in 0.008 second response time [13:13:16] RECOVERY - mw7 php-fpm on mw7 is OK: PROCS OK: 19 processes with command name 'php-fpm7.3' [13:13:29] RECOVERY - cp6 Stunnel Http for mw7 on cp6 is OK: HTTP OK: HTTP/1.1 200 OK - 15526 bytes in 0.010 second response time [13:14:17] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 7 backends are healthy [13:14:31] RECOVERY - cp8 Varnish Backends on cp8 is OK: All 7 backends are healthy [13:16:48] PROBLEM - mw6 Puppet on mw6 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[php7.3-apcu] [13:23:17] RECOVERY - mw7 Puppet on mw7 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [13:23:28] RECOVERY - mw6 Puppet on mw6 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:41:22] * hispano76 greetings [13:44:13] hi hispano76 [13:52:05] hola RhinosF1 [14:08:59] PROBLEM - ping4 on ns1 is WARNING: PING WARNING - Packet loss = 0%, RTA = 107.37 ms [14:11:45] Hows hispano76 today [14:18:53] good [14:22:41] RECOVERY - ping4 on ns1 is OK: PING OK - Packet loss = 0%, RTA = 97.49 ms [14:23:26] PROBLEM - www.bushcraftpedia.org - LetsEncrypt on sslhost is CRITICAL: connect to address www.bushcraftpedia.org and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [14:23:48] ^ down, will look now [14:24:22] How's your day going? hopefully well RhinosF1 :) [14:24:40] hispano76: good [14:25:34] Reception123: that wiki isn’t serving us anyone, should be good to drop [14:26:38] RECOVERY - www.bushcraftpedia.org - LetsEncrypt on sslhost is OK: OK - Certificate 'server.notrace.nl' will expire on Wed 22 Jul 2020 20:37:55 GMT +0000. [14:35:23] PROBLEM - www.bushcraftpedia.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:56:15] SPF|Cloud: https://phabricator.miraheze.org/T5460#109085 [14:56:16] [ ⚓ T5460 Move cloud node to other location within physical DC ] - phabricator.miraheze.org [14:57:00] RhinosF1: technically yes, albeit you know 24/7 coverage is unfeasible in our environment [14:57:28] and even then there is very little to recover, as the full cluster is affected by a rack outage (or row, room, dc outage, whatever it is) [14:57:51] SPF|Cloud: yeah, as I’ve said before, We need staff in an asian/australian timezone [14:58:26] good luck finding such people [14:58:43] SPF|Cloud: that’s our issue [14:58:50] but you should know that automation is key, eliminating the need for 24/7 coverage [14:59:15] True, but do we trust a bot to auto-restart servers that shut down when it can [14:59:44] yes, what do you think modern cloud solutions do? 
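(Earlier in the log mw7 had to be brought back by hand — "!log startup mw7 using proxmox ui" — and the exchange above is about whether that kind of recovery can be trusted to a bot. A very rough sketch of such a watchdog follows, assuming Proxmox VE's REST API with an API token; the hostnames, node name, VM id and token are all placeholders.)

    <?php
    // Rough watchdog sketch: if mw7 stops answering on 443 a few times in a
    // row, ask the Proxmox VE API to start its VM again. Every value below
    // is a placeholder; only the API path is the documented Proxmox call.
    $host    = 'mw7.example.org';
    $apiBase = 'https://cloud1.example.org:8006/api2/json';
    $node    = 'cloud1';
    $vmid    = 120;
    $token   = 'PVEAPIToken=watchdog@pve!auto=00000000-0000-0000-0000-000000000000';

    $failures = 0;
    for ( $i = 0; $i < 3; $i++ ) {
        $sock = @fsockopen( $host, 443, $errno, $errstr, 5 );
        if ( $sock === false ) {
            $failures++;
            sleep( 10 ); // back off before re-checking
        } else {
            fclose( $sock );
            break;
        }
    }

    if ( $failures === 3 ) {
        // POST /api2/json/nodes/{node}/qemu/{vmid}/status/start starts a VM.
        $ch = curl_init( "$apiBase/nodes/$node/qemu/$vmid/status/start" );
        curl_setopt_array( $ch, [
            CURLOPT_POST           => true,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HTTPHEADER     => [ "Authorization: $token" ],
        ] );
        echo "mw7 unreachable, start requested: " . curl_exec( $ch ) . "\n";
        curl_close( $ch );
    }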
[15:00:04] SPF|Cloud: true [15:00:50] It still could probably do with a quick reaction to see what we can do to stabilise the service and review why [15:01:23] an environment fully relying on managed solutions has very little failover to do [15:02:44] SPF|Cloud: true [15:02:56] technically the only thing that is a 'managed solution' at Miraheze (regarding the cloud infra) is anything including and below 'hardware level', ie OVH is fully responsible for the hardware components of the servers, the racks, the network, physical security, etc [15:03:06] They are [15:03:38] That bot could also do with auto-updating status.miraheze.wiki if we go down and it can’t reboot [15:04:05] we have accepted responsibility for everything above that level (maintaining the OS and services we run) - which is fine, but means failing over in case of outages is our responsibilioty too [15:04:39] Ah [15:05:26] and while 24/7 coverage is cool, I do not want to consider it a modern solution for managing IT infrastructure [15:06:23] It’s not too modern but it will probably make some things better [15:06:33] Not everything can be automated as well [15:06:42] probably, but it is not on my to-do list [15:07:13] It’s not really on anyone’s, we need to recruit more technical volunteers first [15:08:01] what you determine to be 'needed' is rather subjective, but alright [15:09:32] let me assure you we will never put a stop to recruiting volunteers, that would be very, very bad to do ;) however, doing so is not easy [15:09:41] It was more to have 24/7 human coverage, we should have volunteers in more time zones [15:09:49] It’s not easy [15:10:00] oh sure, every department can use 24/7 coverage [15:10:40] SRE, stewards, board, CVT, wiki creators - nothing would be off worse with 24/7 coverage [15:10:51] * RhinosF1 hopes he sees us reach it as we progress and things will come with time [15:11:09] The more volunteers we get, the more likely we will reach it [15:11:36] but we were talking here about SRE, and while we actually do more than just technical stuff, I'm only taking about the technical aspects [15:12:08] Stewards is probably closest when the pioneer is active to 24/7 but not near and not 24/7/365 at all. You do a lot more than SRE suggests as a team SPF|Cloud [15:12:34] and from a technical point of view I'd rather invest in disaster recovery [15:12:42] True [15:12:56] If we can recover without intervention, that is good. [15:13:11] and disaster recovery is much more than automation the failover process, it also describes what to do if shit hits the fan while there are people around [15:13:44] data center hardware is OVH's responsibility, if it breaks down we can do nothing more but wait until they have fixed it [15:13:49] That’s a good point. We need to be ready for multiple types of issues. [15:14:42] With us only unfortunately having all servers in one rack, we don’t meet all yet I’d eventually like [15:14:48] and in the rare but not unthinkable case such a downtime lasts 24 hours, it would be a pity to wait until all is brought up again [15:15:18] Alert to Miraheze Staff: It looks like the MirahezeRC bot has stopped! Recent Changes are no longer available from IRC. [15:15:27] in such a case it is unlikely OVH can and do will more other than refunding us (fully or a part of the expenses) - but we're still down [15:15:33] Alert to Miraheze Staff: It looks like the MirahezeRC bot has stopped! Recent Changes are no longer available from IRC. 
[15:15:40] !shutup rc [15:15:40] I will not report MirahezeRC incidents [15:16:48] you are going to consider spinning up instances/VMs (however you would like to call them...) in another DC or with another host, and the more automated your disaster recovery processes are, the faster it goes and without the need for much intervention from SRE [15:17:12] SPF|Cloud: we can only do our best to be ready to fail things over, that requires a lot more investment to able to afford full time secondary DCs in the best case [15:17:33] do you know what a secondary DC costs? [15:17:47] SPF|Cloud: about double our budget? [15:18:10] hint: a lot more investment [15:18:23] yeah, if you look at active-passive, double is more or less right [15:18:49] We do well for the budget we have [15:19:24] But ideally, with a bigger budget, having redudancy to switch to at the same DC and at a backup site would be the 100% best case [15:19:28] I have a personal opinion about cloud resources, but if I had to explain one very, very big advantage of IaaS, it is most providers have hourly billing for them [15:20:07] a backup site as in "duplicating the full infrastructure 24/7/365"? no, I disagree [15:20:35] SPF|Cloud: a backup site as in one which we could switch to reasonably quickly [15:20:44] So yes [15:20:49] Why do you disagree? [15:20:50] 'quickly' depends on your RTO [15:21:16] As quick as it takes to safely switch with the resources available [15:21:37] in general, the faster you want to switch, the more it costs [15:21:52] That’s true [15:22:50] while I personally agree we should aim for the lowest RTO possible, wikis are not enterprise customers paying lots of money for a certain SLA with Miraheze [15:23:12] No, they aren’t [15:23:55] We should still, even if years away, to be able to do a DC switch in an emergency as soon as we practically can [15:24:10] doubling our monthly expenses in order to reduce the failover time by thirty minutes (example), for events that may happen once a year (example), is a good or bad decision depending on your customer's needs [15:25:22] for Miraheze it means somewhat more uptime per year (assuming such catastrophes happen very rarely), but means SRE (which is responsible for the infrastructure) has to budget for more money in order to provide the same services at Miraheze [15:25:44] It depends on our position at the time both financially and infrastructure wise [15:25:58] And what we have the money for [15:25:59] and especially with the current finances, that means we have to cut a lot of our services [15:26:08] Now, we couldn’t afford it [15:26:08] *a lot* [15:26:54] though even if we had $25k in the bank sitting right now, there are much better ways to spend that money [15:29:04] let's imagine OVH goes down tomorrow and we're completely screwed because it will take 24 hours before they fix the rack [15:30:45] OVH costs us $1800/yr. that happens approximately one time a year. with another provider, we have a passive version of our infrastructure, costing another $1800/yr [15:31:42] because the databases and services are already present on the passive infra, Miraheze SRE makes a few configuration changes and the site is back online after 15 minutes [15:35:11] the other option is Miraheze SRE purchasing hourly billed instances at provider B, total infra expenses about $0.80/hr (in our current configuration about $0.208/hr?). 
it takes Miraheze an hour to perform the switchover and as long as Miraheze is not back at OVH, the bill increases with $0.80/hr for that time period - let's say that due to time constraints from people and integrity measures, it costs 36 hours to migrate back to OVH [15:36:34] True [15:36:37] alright, fine - we wasted a small $29 on it and about an hour of extra downtime was needed for the procedure, but $29 was all it took [15:36:43] That is a possiblr solution [15:36:45] compared to $1800/year [15:37:39] That assumes that we can access data on OVH though or risks a data drift from lost data due to slightly old backups. [15:37:52] Although saves $1700 which is a lot [15:38:05] if you are a really big company like Google, having your own data centers, the outcome may be different, but we're not that big [15:38:26] No, we aren’t too big yet [15:38:40] now that's a good point you're mentioning right there - in disaster recovery, the RPO determines how much data loss is tolerated [15:38:58] * RhinosF1 is glad to say that we’ll have 100,000 loginwiki users within the next ~6 weeks [15:39:20] SPF|Cloud: it depends on how often wiki dbs are backed up and when the outage is [15:39:38] if you think a complete active-passive is too expensive, but your 24 hour old backups are too old, you could consider implementing a cut down active-passive infra for your database cluster [15:39:38] If we were lucky that the backups finished just before, it could minutes [15:40:21] SPF|Cloud: they are probably many ways we could consider that all have advantages and disadvantages [15:41:03] for example, you have a small mariadb slave (not per se with gigantic amounts of RAM and CPUs, just being able to replicate all changes) with another provider - if the replication worked, you have intact data to rely on [15:41:24] That would at least save the data [15:41:25] but at a fraction of the cost compared to replicating 100% of your infra [15:41:31] Which is a huge thing [15:42:11] We can duplicate an mw server fairly quickly if we had to spawn one but data could be gone in seconds [15:42:20] now I have to say replication does not replace backups, because if I accidently truncate the centralauth tables, the mariadb slave is screwed as well [15:42:33] SPF|Cloud: 100% [15:44:53] while I have to admit I haven't done miraheze specific research and discussion with SRE for this - which is a must when thinking about DR - a solution could be hosting lower specced slaves for mariadb/gluster at other providers for the data integrity, and automating server and service deployments where possible [15:45:29] It could be one [15:45:45] But this should 100% be discussed with all of SRE [15:46:01] of course, and that will eventually happen [15:46:15] In time it will [15:46:23] We are moving quickly [15:46:30] explaining all of this to you was merely to explain why a complete passive DC is a bad idea in my opinion [15:46:53] and hopefully gives the SREs insights as well :P [15:46:58] SPF|Cloud: for us, probably in the current situation and I agree they are probably better ideas [15:47:11] * RhinosF1 is really happy at the rate we are growing [15:47:22] but +1 for thinking along with us [15:49:10] the ideas in general (active-passive hot/cold, etc) are quite common, determining the best solution is rather specific for the organisation [15:51:28] We should always design as specific to our need as we can [15:51:54] * RhinosF1 just loves thinking about possible biggest & best scenarios as we grow and grow [15:58:28] RECOVERY - 
www.bushcraftpedia.org - LetsEncrypt on sslhost is OK: OK - Certificate 'server.notrace.nl' will expire on Wed 22 Jul 2020 20:37:55 GMT +0000. [15:58:50] PROBLEM - cp7 Current Load on cp7 is WARNING: WARNING - load average: 1.65, 3.91, 3.16 [15:59:16] [02miraheze/ssl] 07paladox pushed 031 commit to 03paladox-patch-1 [+0/-0/±1] 13https://git.io/JfBVY [15:59:18] [02miraheze/ssl] 07paladox 034b649e6 - Remove www.bushcraftpedia.org No longer pointing at us. [15:59:19] [02ssl] 07paladox created branch 03paladox-patch-1 - 13https://git.io/vxP9L [15:59:21] [02ssl] 07paladox opened pull request 03#298: Remove www.bushcraftpedia.org - 13https://git.io/JfBVO [15:59:38] [02miraheze/ssl] 07paladox pushed 031 commit to 03paladox-patch-1 [+0/-1/±0] 13https://git.io/JfBVs [15:59:39] [02miraheze/ssl] 07paladox 034868d5e - Delete www.bushcraftpedia.org.crt [15:59:41] [02ssl] 07paladox synchronize pull request 03#298: Remove www.bushcraftpedia.org - 13https://git.io/JfBVO [15:59:50] [02ssl] 07paladox closed pull request 03#298: Remove www.bushcraftpedia.org - 13https://git.io/JfBVO [15:59:50] Ty paladox [15:59:52] [02miraheze/ssl] 07paladox pushed 031 commit to 03master [+0/-1/±1] 13https://git.io/JfBVn [15:59:53] [02miraheze/ssl] 07paladox 0331109b4 - Remove www.bushcraftpedia.org (#298) * Remove www.bushcraftpedia.org No longer pointing at us. * Delete www.bushcraftpedia.org.crt [15:59:55] [02miraheze/ssl] 07paladox deleted branch 03paladox-patch-1 [15:59:56] [02ssl] 07paladox deleted branch 03paladox-patch-1 - 13https://git.io/vxP9L [16:02:21] RECOVERY - cp7 Current Load on cp7 is OK: OK - load average: 1.50, 2.74, 2.84 [17:03:22] [02miraheze/dns] 07Reception123 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfBoC [17:03:23] [02miraheze/dns] 07Reception123 0333a983a - depool cp3 [17:03:49] !log depooled cp3 [17:03:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:08:17] !log apt-get dist-upgrade and reboot cp3 [17:08:26] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:09:00] [02miraheze/dns] 07Reception123 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfBou [17:09:01] [02miraheze/dns] 07Reception123 033a89c68 - repool cp3 and depool cp7 [17:09:08] !log repool cp3 [17:09:11] !log depool cp7 [17:09:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:09:15] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:13:22] [02miraheze/dns] 07Reception123 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfBoo [17:13:24] [02miraheze/dns] 07Reception123 038d4662e - repool cp7 and depool cp8 [17:13:25] !log apt-get dist-upgrade and reboot cp7 [17:13:28] !log repool cp7 [17:13:30] !log depool cp8 [17:13:31] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:13:37] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:13:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:14:33] PROBLEM - cp3 Disk Space on cp3 is WARNING: DISK WARNING - free space: / 2489 MB (10% inode=93%); [17:15:07] !log apt-get dist-upgrade and reboot cp8 [17:15:16] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [17:16:37] !log repool cp8 [17:16:43] [02miraheze/dns] 07Reception123 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfBoM [17:16:44] [02miraheze/dns] 07Reception123 0360e240d - repool cp8 [17:16:47] Logged 
the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [18:40:06] [02YouTube] 07Elaeagnifolia synchronize pull request 03#5: Enable external sites on the YouTube ext - T5535 - 13https://git.io/JfG6M [19:43:47] [02YouTube] 07Elaeagnifolia commented on pull request 03#5: Enable external sites on the YouTube ext - T5535 - 13https://git.io/JfB1c [20:07:51] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-7 [+0/-0/±1] 13https://git.io/JfBMO [20:07:52] [02miraheze/puppet] 07paladox 03f6965f2 - mediawiki: Redirect /wiki/undefined and /w/undefined to /w/api.php [20:07:54] [02puppet] 07paladox created branch 03paladox-patch-7 - 13https://git.io/vbiAS [20:07:55] [02puppet] 07paladox opened pull request 03#1376: mediawiki: Redirect /wiki/undefined and /w/undefined to /w/api.php - 13https://git.io/JfBM3 [20:08:08] [02puppet] 07paladox edited pull request 03#1376: mediawiki: Redirect /wiki/undefined and /w/undefined to /w/api.php - 13https://git.io/JfBM3 [20:08:15] [02puppet] 07paladox edited pull request 03#1376: mediawiki: Redirect /wiki/undefined and /w/undefined to /w/api.php - 13https://git.io/JfBM3 [20:16:21] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfBMr [20:16:22] [02miraheze/puppet] 07paladox 03a0b7ff3 - Remove static-temp [20:20:48] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-8 [+0/-0/±1] 13https://git.io/JfBMb [20:20:49] [02miraheze/puppet] 07paladox 038eed385 - varnish: Redirect /wiki/undefined and /w/undefined to /w/api.php [20:20:51] [02puppet] 07paladox created branch 03paladox-patch-8 - 13https://git.io/vbiAS [20:20:52] [02puppet] 07paladox opened pull request 03#1377: varnish: Redirect /wiki/undefined and /w/undefined to /w/api.php - 13https://git.io/JfBMA [20:23:11] [02puppet] 07paladox synchronize pull request 03#1377: varnish: Redirect /wiki/undefined and /w/undefined to /w/api.php - 13https://git.io/JfBMA [20:23:13] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-8 [+0/-0/±1] 13https://git.io/JfBDJ [20:23:14] [02miraheze/puppet] 07paladox 03313fe58 - Update default.vcl [20:23:19] [02puppet] 07paladox edited pull request 03#1377: varnish: Redirect /wiki/undefined and /w/undefined to /w/api.php - 13https://git.io/JfBMA [20:23:29] [02puppet] 07paladox closed pull request 03#1377: varnish: Redirect /wiki/undefined and /w/undefined to /w/api.php - 13https://git.io/JfBMA [20:23:31] [02miraheze/puppet] 07paladox pushed 033 commits to 03master [+0/-0/±3] 13https://git.io/JfBDU [20:23:32] [02miraheze/puppet] 07paladox 038f0ae81 - Merge pull request #1377 from miraheze/paladox-patch-8 varnish: Redirect /wiki/undefined and /w/undefined to /w/api.php [20:23:34] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-8 [20:23:35] [02puppet] 07paladox deleted branch 03paladox-patch-8 - 13https://git.io/vbiAS [20:27:58] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-8 [+0/-0/±1] 13https://git.io/JfBDY [20:28:00] [02miraheze/puppet] 07paladox 03ca48e24 - varnish: Fix redirect /wiki/undefined to /w/api.php [20:28:01] [02puppet] 07paladox created branch 03paladox-patch-8 - 13https://git.io/vbiAS [20:28:03] [02puppet] 07paladox opened pull request 03#1378: varnish: Fix redirect /wiki/undefined to /w/api.php - 13https://git.io/JfBDO [20:28:21] [02puppet] 07paladox closed pull request 03#1378: varnish: Fix redirect /wiki/undefined to /w/api.php - 13https://git.io/JfBDO [20:28:23] [02miraheze/puppet] 07paladox pushed 032 commits to 03master [+0/-0/±2] 
13https://git.io/JfBD3 [20:28:24] [02miraheze/puppet] 07paladox 03ed61a20 - Merge pull request #1378 from miraheze/paladox-patch-8 varnish: Fix redirect /wiki/undefined to /w/api.php [20:28:29] [02puppet] 07paladox deleted branch 03paladox-patch-8 - 13https://git.io/vbiAS [20:28:30] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-8 [20:54:48] PROBLEM - test2 Puppet on test2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki core] [20:55:12] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfByB [20:55:13] [02miraheze/services] 07MirahezeSSLBot 03c929424 - BOT: Updating services config for wikis [20:56:05] Hello Grivander! If you have any questions, feel free to ask and someone should answer soon. [20:58:10] RECOVERY - test2 Puppet on test2 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [21:34:41] Hello everyone [21:38:12] I don't have much mediawiki experience. I currently have a free access wiki on mieaheze. And I would like to know if it is possible to limit access to some pages. [21:40:26] I have tried to enable the "Author protect" extension but I have not been able to get the tab to block the pages. [21:43:02] Hello bleb! If you have any questions, feel free to ask and someone should answer soon. [21:46:53] I would like to know how allow access to certain pages only to registered users on a free access wiki [21:47:43] Hi, you can turn your wiki into a private wiki to hide all page, but you cannot hide certain pages at this time. [21:48:01] Limiting read is hard. You can make private and hide all pages with a small whitelist but not have a blacklist [21:52:46] Alejandro, are you referring to reading or editing? [21:53:23] reading [21:53:27] 200/5000 [21:55:04] do all miraheze wikis share the same set of user accounts? [21:55:17] i notice that in the faq about private wikis it says "enter the name of any registered Miraheze user to put that user in the member group on your wiki." [21:55:20] ye bleb [21:55:36] yes* [21:55:56] Well we have a central log in system that allows users to log into other wikis. Private wikis you have to grant the "member" group for those users to see the wiki (so perfectly safe). [21:56:12] so someone on my wiki can't use a username that is taken on another wiki [21:57:52] bleb: they can not [21:58:00] welcome back JohnLewis [21:58:33] Hey [22:00:52] * RhinosF1 has a feeling he will have a long friday [22:01:10] JohnLewis: what are your plans for the rest of today / early friday? [22:04:07] so if i had a miraheze wiki and decided to move to another host, what would be the process? [22:04:46] bleb: you can use Special:DataDump to get a xml+image dump [22:04:58] would i be able to export the database for my pages and users? [22:05:33] PROBLEM - cp8 Disk Space on cp8 is WARNING: DISK WARNING - free space: / 2064 MB (10% inode=93%); [22:06:26] bleb: we can't provide database exports. You can get an xml dump of pages but unfortunately not users due to privacy concerns. [22:07:11] yeah i figured [22:08:17] the xml dump contains page histories though? [22:08:49] Has anyone ever enabled the "Author protect" extension? [22:10:47] bleb, yup [22:10:54] it should [22:13:35] RhinosF1; nothing. Work is being extremely time consuming currently, why? 
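(Going back to the private-wiki answers given to Alejandro and bleb above — hide everything, let a "member" group read, keep a small read whitelist — a minimal sketch of that pattern using standard core MediaWiki settings. The page names are placeholders, and on Miraheze this setup is done through the wiki-management interface rather than by editing config directly.)

    <?php
    // Core MediaWiki private-wiki pattern, shown only to illustrate the
    // answers above; the whitelist page names are placeholders.
    $wgGroupPermissions['*']['read']      = false; // anonymous readers: no access
    $wgGroupPermissions['user']['read']   = false; // logged in but not a member: no access
    $wgGroupPermissions['member']['read'] = true;  // wiki members can read everything

    // A small whitelist of pages that stay publicly readable.
    $wgWhitelistRead = [ 'Main Page', 'Project:About' ];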
[22:14:53] JohnLewis: wondering, nothing much to do for 45 mins [22:15:05] Alejandro: Quite possibly [22:15:44] I have enabled the Author protect extension on the wiki, and I have a user with "author" and "authorprotect" rights assigned, but the tab doesn't appear on the pages he's created. [22:20:19] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfBHO [22:20:21] [02miraheze/services] 07MirahezeSSLBot 031200e48 - BOT: Updating services config for wikis [23:34:49] Hello RockfordRoe! If you have any questions, feel free to ask and someone should answer soon. [23:35:33] Hello, I'm trying to protect all of my templates and modules I have imported so far but I can't seem to find a solution [23:40:18] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfB5F [23:40:19] [02miraheze/services] 07MirahezeSSLBot 0307d4302 - BOT: Updating services config for wikis
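(RockfordRoe's question about locking down imported templates and modules is usually handled at the config level with $wgNamespaceProtection, another core setting, sketched below; the right name is made up for the example and NS_MODULE assumes the Scribunto extension is loaded. On a wiki farm without config access, the practical alternatives are protecting the pages themselves or requesting the namespace protection setting through the farm's management tools.)

    <?php
    // Illustrative: require an extra right to edit Template: and Module:
    // pages, and grant that right to sysops only. 'templateeditor' is a
    // made-up right name; NS_MODULE is defined by the Scribunto extension.
    $wgNamespaceProtection[NS_TEMPLATE] = [ 'templateeditor' ];
    $wgNamespaceProtection[NS_MODULE]   = [ 'templateeditor' ];
    $wgGroupPermissions['sysop']['templateeditor'] = true;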