[00:00:56] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 3642 MB (15% inode=93%);
[00:03:31] PROBLEM - mon1 Current Load on mon1 is WARNING: WARNING - load average: 3.95, 3.52, 2.85
[00:13:29] RECOVERY - mon1 Current Load on mon1 is OK: OK - load average: 2.17, 3.21, 3.12
[00:19:55] This could just be the jobrunner trying to catch up, and it's nothing like the network utilization jump to 7 MB at ~15:00-16:00 hours yesterday, but I've noticed the [[Special:Watchlist]], among a couple other signs, on Meta are not able to "remember," or keep pace with, whether a page was last viewed. Looked at
[00:19:56] https://grafana.miraheze.org/d/W9MIkA7iz/miraheze-cluster?viewPanel=289&orgId=1&from=now-30m&to=now-1m&var-job=node&var-node=mw7.miraheze.org&var-port=9100, and I do see more modest, albeit regular, traffic spikes from ~1 MBs to ~2 MBs. May be worth looking into before the jobrunner becomes way backlogged again.
[00:19:57] [ Grafana ] - grafana.miraheze.org
[00:47:12] Is there an issue with Echo?
I'm getting notifications up to two hours later than I should be
[00:47:42] Notifications from StructuredDiscussions if that makes a difference
[00:57:39] No, but there is an issue with the JobQueue; since that is slow, notifications are not being delivered on time
[01:15:38] I imagine that's it, thanks
[03:19:29] PROBLEM - mon1 Current Load on mon1 is CRITICAL: CRITICAL - load average: 4.29, 3.44, 2.37
[03:21:29] PROBLEM - mon1 Current Load on mon1 is WARNING: WARNING - load average: 3.90, 3.67, 2.59
[03:23:29] PROBLEM - mon1 Current Load on mon1 is CRITICAL: CRITICAL - load average: 5.42, 4.45, 3.01
[03:36:55] Voidwalker: himmalerin Notifications not being delivered in time is due to jobrunner
[04:01:30] PROBLEM - mon1 Current Load on mon1 is WARNING: WARNING - load average: 1.23, 2.29, 3.69
[04:03:31] RECOVERY - mon1 Current Load on mon1 is OK: OK - load average: 1.36, 1.99, 3.40
[04:44:56] !log reception@jobrunner1:/srv/mediawiki/w/maintenance$ sudo -u www-data php deleteBatch.php --wiki reallifevillainswiki --r "Requested - T5997" /home/reception/rlv5.txt
[04:44:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[04:50:27] !log reception@jobrunner1:/srv/mediawiki/w/maintenance$ sudo -u www-data php /srv/mediawiki/w/maintenance/importDump.php --wiki=wizpedia101wiki /home/reception/Wizpedia101-20200730225328.xml
[04:50:31] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[05:25:11] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/JJaHw
[05:25:13] [miraheze/services] MirahezeSSLBot 3b5aa1d - BOT: Updating services config for wikis
[06:05:29] RECOVERY - gluster2 APT on gluster2 is OK: APT OK: 0 packages available for upgrade (0 critical updates).
[06:18:40] RECOVERY - cp6 APT on cp6 is OK: APT OK: 0 packages available for upgrade (0 critical updates).
[06:32:35] [miraheze/puppet] Reception123 deleted branch paladox-patch-2
[06:32:36] [puppet] Reception123 deleted branch paladox-patch-2 - https://git.io/vbiAS
[06:32:38] [miraheze/puppet] Reception123 deleted branch paladox-patch-8
[06:32:39] [puppet] Reception123 deleted branch paladox-patch-8 - https://git.io/vbiAS
[06:32:53] RhinosF1: do you get why icinga-miraheze keeps moaning about APT being "OK"
[06:33:04] RECOVERY - test2 APT on test2 is OK: APT OK: 0 packages available for upgrade (0 critical updates).
[06:33:13] if it told us all the time when every check was ok then we would lose our minds
[06:33:33] Reception123: because it did warn us at some point overnight it wasn't ok I'm guessing
[06:33:48] RhinosF1: well no one updated it in between...
[06:34:23] Reception123: yeah, it's a recovery alert
[06:34:31] As it was critical for 13 hours
[06:34:59] but how does it recover with no intervention? do we have automatic updates for some things?
[06:37:34] Reception123: unattended-upgrades
[06:37:40] That's been there ages
[06:37:59] RhinosF1: hmm then icinga shouldn't warn us about those
[06:38:15] Reception123: yes it should
[06:38:29] RhinosF1: why warn us about something done automatically?
[06:38:34] It's a service going critical recovering
[06:38:40] Reception123: so do many things
[06:38:53] and I don't even know which updates are unattended so it makes me think I need to upgrade all the time
[06:39:47] Reception123: it's so we can ensure it's upgrading fine and check. Let me find the not unattended list
[06:40:01] oh if there's a list that would be useful
[06:40:23] RECOVERY - cp7 APT on cp7 is OK: APT OK: 0 packages available for upgrade (0 critical updates).
[06:40:57] Reception123: https://github.com/miraheze/puppet/blob/10e5bc1d0fd5a047aa182eb2f83a257438d84d2d/modules/base/files/apt/50unattended-upgrades#L7
[06:40:58] [ puppet/50unattended-upgrades at 10e5bc1d0fd5a047aa182eb2f83a257438d84d2d · miraheze/puppet · GitHub ] - github.com
[06:41:07] All of them are NOT done unattended
[06:41:23] RhinosF1: ok so then I've been fooled this entire time :P
[06:41:37] Reception123: it's not harmful
[06:41:48] I know, just useless since it's done by itself
[06:42:09] RECOVERY - phab1 APT on phab1 is OK: APT OK: 0 packages available for upgrade (0 critical updates).
[06:42:26] Reception123: apt list --upgradeable will tell you the things that can be updated
[06:42:40] yup will look at that next time
[07:38:27] RhinosF1 Hello.
[07:40:33] .op
[07:40:34] Please wait...
[07:44:15] Hello SpainDist2! If you have any questions, feel free to ask and someone should answer soon.
[07:44:16] .op
[07:44:16] SpainDist2: Access Denied. If in error, please contact the channel founder.
[07:44:21] .op
[07:44:21] SpainDist2: Access Denied. If in error, please contact the channel founder.
[07:44:23] .op
[07:44:23] SpainDist2: Access Denied. If in error, please contact the channel founder.
[07:44:28] I need to ban RhinosF1
[07:44:37] RhinosF1 bans me without reason.
[09:14:58] [puppet] RhinosF1 opened pull request #1464: MediaWiki-packages: drop lillypond - https://git.io/JJaAE
[09:15:13] Reception123: ^
[09:19:09] Whoever does it might need to apt-get uninstall lillypond on all mediawiki servers (inc test2 and jobrunner)
[09:37:41] SPF|Cloud, Reception123, PuppyKun: SRE assistance may be required for an emergency incident. Please stand by.
[09:54:34] [puppet] Reception123 closed pull request #1464: MediaWiki-packages: drop lillypond - https://git.io/JJaAE
[09:54:35] [miraheze/puppet] Reception123 pushed 1 commit to master [+0/-0/±1] https://git.io/JJap7
[09:54:37] [miraheze/puppet] RhinosF1 1045deb - MediaWiki-packages: drop lillypond (#1464) Per https://gerrit.wikimedia.org/r/c/operations/puppet/+/612274/2/modules/mediawiki/manifests/packages.pp and https://phabricator.wikimedia.org/T248418#6351818
[09:54:38] [ ⚓ T248418 Roll out videojs as the only video/audio player on all Wikimedia wikis ] - phabricator.wikimedia.org
[09:54:38] See -staff, no further help needed
[09:54:59] RhinosF1: wrong task I think
[09:57:03] RhinosF1: also just FYI it's remove not uninstall
[09:58:17] !log sudo apt-get remove lilypond on mw*/jbr/test2
[09:58:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[09:58:57] Reception123: read the comments of the task and ah!
[10:01:17] [miraheze/mw-config] Reception123 pushed 1 commit to Reception123-patch-1 [+0/-0/±1] https://git.io/JJahm
[10:01:19] [miraheze/mw-config] Reception123 864bc8c - restrict score extension T5863
[10:01:20] [mw-config] Reception123 created branch Reception123-patch-1 - https://git.io/vbvb3
[10:01:24] [mw-config] Reception123 opened pull request #3201: restrict score extension - https://git.io/JJahO
[10:01:40] [mw-config] Reception123 edited pull request #3201: restrict score extension - https://git.io/JJahO
[10:02:00] RhinosF1: ^ uh why do I see no travis?
[10:02:13] going to do with a linter
[10:02:36] [mw-config] Reception123 closed pull request #3201: restrict score extension - https://git.io/JJahO
[10:02:37] [miraheze/mw-config] Reception123 pushed 1 commit to master [+0/-0/±1] https://git.io/JJahZ
[10:02:39] [miraheze/mw-config] Reception123 17e39de - restrict score extension (#3201) T5863
[10:02:40] [miraheze/mw-config] Reception123 deleted branch Reception123-patch-1
[10:02:42] [mw-config] Reception123 deleted branch Reception123-patch-1 - https://git.io/vbvb3
[10:10:17] Reception123: it's slow
[10:10:33] RhinosF1: why does everything have to be slow
[10:10:40] (yes I'm looking at jobrunner yesterday)
[10:10:55] Reception123: not a clue
[10:20:48] Reception123: I'm not seeing score showing as disabled
[10:21:24] Reception123: the requires is there twice
[10:21:32] https://github.com/miraheze/mw-config/compare/49fa8378ac68...17e39de9a663
[10:21:33] [ Comparing 49fa8378ac68...17e39de9a663 · miraheze/mw-config · GitHub ] - github.com
[10:21:37] my bad
[10:21:37] fixing
[10:22:01] [miraheze/mw-config] Reception123 pushed 1 commit to master [+0/-0/±1] https://git.io/JJajh
[10:22:02] [miraheze/mw-config] Reception123 1453404 - fix mistake
[10:28:18] Reception123: ty
[10:33:40] Reception123: working
[11:30:38] PROBLEM - misc1 APT on misc1 is CRITICAL: APT CRITICAL: 1 packages available for upgrade (1 critical updates).
[11:52:45] We delete conflicted @MrJaroslavik!
[11:55:29] Edit Delete conflict?
Lol
[11:56:50] Yep, we tried to delete at the same time
[14:30:27] PROBLEM - wiki.mikrodev.com - reverse DNS on sslhost is WARNING: rDNS WARNING - reverse DNS entry for wiki.mikrodev.com could not be found
[14:31:58] ^ looks fine to me
[14:37:24] RECOVERY - wiki.mikrodev.com - reverse DNS on sslhost is OK: rDNS OK - wiki.mikrodev.com reverse DNS resolves to cp7.miraheze.org
[15:16:04] PROBLEM - wiki.hrznstudio.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.hrznstudio.com' expires in 15 day(s) (Mon 17 Aug 2020 15:10:34 GMT +0000).
[15:20:10] [miraheze/ssl] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/JJVO5
[15:20:11] [miraheze/ssl] MirahezeSSLBot 6bf7ada - Bot: Update SSL cert for wiki.hrznstudio.com
[15:36:13] RECOVERY - wiki.hrznstudio.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.hrznstudio.com' will expire on Fri 30 Oct 2020 14:20:01 GMT +0000.
[16:20:15] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/JJVsj
[16:20:16] [miraheze/services] MirahezeSSLBot a04e4a2 - BOT: Updating services config for wikis
[16:23:08] PROBLEM - db7 Check MariaDB Replication on db7 is CRITICAL: MariaDB replication - both - CRITICAL - Slave_IO_Running state : Yes, Slave_SQL_Running state : Yes, Seconds_Behind_Master : 64s
[16:30:59] PROBLEM - cp3 Disk Space on cp3 is WARNING: DISK WARNING - free space: / 2651 MB (10% inode=93%);
[16:35:11] RECOVERY - db7 Check MariaDB Replication on db7 is OK: MariaDB replication - both - OK - Slave_IO_Running state : Yes, Slave_SQL_Running state : Yes, Seconds_Behind_Master : 0s
[16:40:16] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/JJVZU
[16:40:18] [miraheze/services] MirahezeSSLBot 4d39af0 - BOT: Updating services config for wikis
[17:33:14] PROBLEM - cloud2 Current Load on cloud2 is CRITICAL: CRITICAL - load average: 29.20, 21.82, 17.65
[17:37:04] PROBLEM - gluster1 Current Load on gluster1 is CRITICAL: CRITICAL - load average: 12.82, 7.87, 4.57
[17:39:02] PROBLEM - gluster1 Current Load on gluster1 is WARNING: WARNING - load average: 7.95, 7.60, 4.86
[17:41:00] RECOVERY - gluster1 Current Load on gluster1 is OK: OK - load average: 4.61, 6.52, 4.80
[17:43:14] PROBLEM - cloud2 Current Load on cloud2 is WARNING: WARNING - load average: 15.79, 21.55, 20.71
[17:45:16] RECOVERY - cloud2 Current Load on cloud2 is OK: OK - load average: 18.43, 20.27, 20.30
[18:01:40] > @Zppix no, but i just woke up so there was nothing i did with it. i can confirm i can still login, it just showed one warning "action has been canceled as a precaution against session hijacking. Please resubmit the form" and then it worked @Miraheze-IRC
[18:01:55] i'm having this issue today, and i can't log in at all!
[18:01:57] https://cdn.discordapp.com/attachments/435711390544560128/739181137339088916/unknown.png
[18:02:19] no matter how many times i resubmit the form i can't log in HELP
[19:30:11] Reset your cookies
[20:19:25] Any reason why $wgMetaNamespace was removed from ManageWiki? I remember that it was there before
[20:21:25] Hmm
[22:39:13] PROBLEM - misc1 NTP time on misc1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:39:47] PROBLEM - misc1 Disk Space on misc1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:39:51] PROBLEM - misc1 Current Load on misc1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:40:34] PROBLEM - misc1 Puppet on misc1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:43:34] PROBLEM - misc1 SMTP on misc1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:44:49] PROBLEM - ping4 on misc1 is CRITICAL: PING CRITICAL - Packet loss = 100%
[22:45:48] PROBLEM - misc1 SSH on misc1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:46:23] PROBLEM - Host misc1 is DOWN: CRITICAL - Time to live exceeded (185.52.1.76)
[22:55:58] RECOVERY - wiki.valkyrienskies.org - reverse DNS on sslhost is OK: rDNS OK - wiki.valkyrienskies.org reverse DNS resolves to cp6.miraheze.org
[23:00:20] PROBLEM - wiki.valkyrienskies.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Certificate 'wiki.valkyrienskies.org' expired on Fri 29 Nov 2019 13:32:19 GMT +0000.
[23:00:23] RECOVERY - Host misc1 is UP: PING OK - Packet loss = 0%, RTA = 13.88 ms
[23:00:24] PROBLEM - ping6 on misc1 is CRITICAL: PING CRITICAL - Packet loss = 100%
[23:00:24] PROBLEM - misc1 IMAP on misc1 is CRITICAL: connect to address 185.52.1.76 and port 143: No route to host
[23:00:25] RECOVERY - misc1 SMTP on misc1 is OK: SMTP OK - 0.129 sec. response time
[23:00:25] RECOVERY - misc1 SSH on misc1 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0)
[23:00:25] RECOVERY - misc1 Current Load on misc1 is OK: OK - load average: 0.26, 0.06, 0.02
[23:00:37] RECOVERY - ping4 on misc1 is OK: PING OK - Packet loss = 0%, RTA = 12.89 ms
[23:00:40] RECOVERY - misc1 IMAP on misc1 is OK: IMAP OK - 0.035 second response time on 185.52.1.76 port 143 [* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE STARTTLS LOGINDISABLED] Dovecot ready.]
[23:00:48] RECOVERY - misc1 Puppet on misc1 is OK: OK: Puppet is currently enabled, last run 28 minutes ago with 0 failures
[23:00:53] RECOVERY - misc1 Disk Space on misc1 is OK: DISK OK - free space: / 34423 MB (84% inode=99%);
[23:01:15] PROBLEM - ping6 on misc1 is CRITICAL: PING CRITICAL - Packet loss = 100%
[23:01:19] RECOVERY - ping6 on misc1 is OK: PING OK - Packet loss = 0%, RTA = 12.96 ms
[23:01:19] RECOVERY - misc1 NTP time on misc1 is OK: NTP OK: Offset -0.00297704339 secs