[00:00:19] oh, glusterfs OOM'd before I had a chance to restart it [00:01:32] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 3593 MB (14% inode=93%); [00:02:20] RECOVERY - mon1 Disk Space on mon1 is OK: DISK OK - free space: / 4892 MB (13% inode=93%); [00:02:31] PROBLEM - bacula2 Bacula Static on bacula2 is CRITICAL: CRITICAL: Timeout or unknown client: gluster1-fd [00:03:35] !log start bacula-fd on gluster1 [00:03:39] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:04:25] RECOVERY - bacula2 Bacula Static on bacula2 is OK: OK: Full, 1664252 files, 197.7GB, 2020-07-19 23:58:00 (6.4 minutes ago) [00:05:29] PROBLEM - gluster1 Puppet on gluster1 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Service[bacula-fd],Exec[/mnt/mediawiki-static] [00:10:21] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJcg2 [00:10:23] [02miraheze/services] 07MirahezeSSLBot 03ebb0433 - BOT: Updating services config for wikis [00:13:31] !log db12 high on mem usage, added 2G swap + entry in /etc/fstab [00:13:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:16:23] PROBLEM - mw6 APT on mw6 is CRITICAL: APT CRITICAL: 2 packages available for upgrade (1 critical updates). [00:20:06] <-CloudGuy38-> I also see the email that the Scratch Team wants to delete Bad Scratch Wiki https://badscratch.miraheze.org , not just privated, also what are the steps to delete it? I'll try to tell the stewards about it. [00:20:07] [ Bad Scratch Wiki ] - badscratch.miraheze.org [00:33:24] !log db12: set table_(definition|open)_cache to 4000 [00:33:27] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:43:21] RECOVERY - gluster1 Puppet on gluster1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:44:48] !log root@gluster1:/var/log/glusterfs# gluster volume set mvol open-behind off [00:44:51] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:48:29] !log reboot jobrunner1 - gluster mount [00:48:32] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:50:28] PROBLEM - jobrunner1 MirahezeRenewSsl on jobrunner1 is CRITICAL: connect to address 51.89.160.135 and port 5000: Connection refused [00:51:17] PROBLEM - jobrunner1 APT on jobrunner1 is CRITICAL: APT CRITICAL: 4 packages available for upgrade (3 critical updates). [00:52:24] RECOVERY - jobrunner1 MirahezeRenewSsl on jobrunner1 is OK: TCP OK - 0.000 second response time on 51.89.160.135 port 5000 [01:10:14] PROBLEM - cloud1 APT on cloud1 is CRITICAL: APT CRITICAL: 54 packages available for upgrade (1 critical updates). 
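The two db12 entries above ("added 2G swap + entry in /etc/fstab" and "set table_(definition|open)_cache to 4000") don't show the exact commands; a minimal sketch of what that typically looks like — the /swapfile path and the use of SET GLOBAL are assumptions, not taken from the log:

    # assumed path; the actual swap file location on db12 isn't in the log
    fallocate -l 2G /swapfile
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    echo '/swapfile none swap sw 0 0' >> /etc/fstab   # persist across reboots

    # the later table cache change, applied at runtime; matching lines in the
    # MariaDB config are needed for it to survive a server restart
    mysql -e 'SET GLOBAL table_definition_cache = 4000; SET GLOBAL table_open_cache = 4000;'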
[01:16:55] PROBLEM - miraheze.wiki - DNS on sslhost is CRITICAL: DNS CRITICAL - expected '2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210,51.89.160.142' but got '2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210' [01:17:04] PROBLEM - ml.gyaanipedia.co.in - DNS on sslhost is CRITICAL: DNS CRITICAL - expected '2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210,51.89.160.142' but got '2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.89.160.142' [01:17:14] PROBLEM - dariawiki.org - DNS on sslhost is CRITICAL: DNS CRITICAL - expected '2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210,51.89.160.142' but got '2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210' [01:23:37] RECOVERY - miraheze.wiki - DNS on sslhost is OK: DNS OK: 0.052 seconds response time. sslhost returns 2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210,51.89.160.142 [01:23:53] RECOVERY - dariawiki.org - DNS on sslhost is OK: DNS OK: 0.041 seconds response time. sslhost returns 2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210,51.89.160.142 [01:23:57] RECOVERY - ml.gyaanipedia.co.in - DNS on sslhost is OK: DNS OK: 0.049 seconds response time. sslhost returns 2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210,51.89.160.142 [01:41:14] PROBLEM - mw4 APT on mw4 is CRITICAL: APT CRITICAL: 2 packages available for upgrade (1 critical updates). [02:20:44] PROBLEM - cp3 APT on cp3 is CRITICAL: APT CRITICAL: 2 packages available for upgrade (1 critical updates). [02:57:09] PROBLEM - cloud3 APT on cloud3 is CRITICAL: APT CRITICAL: 13 packages available for upgrade (1 critical updates). [03:06:41] PROBLEM - services1 APT on services1 is CRITICAL: APT CRITICAL: 2 packages available for upgrade (1 critical updates). [03:20:19] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJcr9 [03:20:21] [02miraheze/services] 07MirahezeSSLBot 03a08bf11 - BOT: Updating services config for wikis [04:44:16] PROBLEM - mw5 APT on mw5 is CRITICAL: APT CRITICAL: 2 packages available for upgrade (1 critical updates). [05:31:29] PROBLEM - cloud2 APT on cloud2 is CRITICAL: APT CRITICAL: 54 packages available for upgrade (1 critical updates). [05:37:42] PROBLEM - services2 APT on services2 is CRITICAL: APT CRITICAL: 2 packages available for upgrade (1 critical updates). [06:04:15] RECOVERY - cloud1 APT on cloud1 is OK: APT OK: 53 packages available for upgrade (0 critical updates). [06:11:14] RECOVERY - mw4 APT on mw4 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [06:11:41] PROBLEM - rdb2 APT on rdb2 is CRITICAL: APT CRITICAL: 3 packages available for upgrade (2 critical updates). [06:13:14] RECOVERY - jobrunner1 APT on jobrunner1 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [06:19:41] RECOVERY - services2 APT on services2 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [06:26:21] RECOVERY - mw6 APT on mw6 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [06:36:41] RECOVERY - services1 APT on services1 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [06:39:09] RECOVERY - cloud3 APT on cloud3 is OK: APT OK: 12 packages available for upgrade (0 critical updates). [06:49:40] RECOVERY - rdb2 APT on rdb2 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [06:51:30] RECOVERY - cloud2 APT on cloud2 is OK: APT OK: 53 packages available for upgrade (0 critical updates). 
[06:52:53] RECOVERY - cp3 APT on cp3 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [06:54:16] RECOVERY - mw5 APT on mw5 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [07:19:49] .help [07:19:49] Hang on, I'm creating a list. [07:19:51] I've posted a list of my commands at https://clbin.com/uZGd3 - You can see more info about any of these commands by doing .help (e.g. .help time) [07:20:39] .tmask Welcome to the IRC channel of Miraheze, a free non-profit wiki hosting provider! | https://meta.miraheze.org | Status: {} | SRE Duty: {} | This channel is publicly logged at http://wm-bot.wmflabs.org/browser/index.php?display=%23miraheze | By participating in this channel, you agree to abide by our Code of Conduct: https://meta.miraheze.org/m/PA [07:20:39] Gotcha, RhinosF1 [07:20:40] [ Meta ] - meta.miraheze.org [07:20:40] [ Wikimedia IRC logs browser ] - wm-bot.wmflabs.org [07:20:41] [ Code of Conduct - Miraheze Meta ] - meta.miraheze.org [07:20:50] .topic Up RhinosF1 [07:20:50] Please wait... [07:20:51] Not enough arguments. You gave 1, it requires 2. [07:21:33] .topic Up,RhinosF1 [07:21:33] Not enough arguments. You gave 1, it requires 2. [07:21:40] .help topic [07:21:41] Change the channel topic. The bot must be a channel operator for this command to work. [07:21:41] e.g. .topic Your Great New Topic [07:21:56] .showmask [07:21:56] Welcome to the IRC channel of Miraheze, a free non-profit wiki hosting provider! | https://meta.miraheze.org | Status: {} | SRE Duty: {} | This channel is publicly logged at http://wm-bot.wmflabs.org/browser/index.php?display=%23miraheze | By participating in this channel, you agree to abide by our Code of Conduct: https://meta.miraheze.org/m/PA [07:22:09] .topic . Up RhinosF1 [07:22:10] Not enough arguments. You gave 1, it requires 2. [07:22:18] Hmm you're broke [08:07:36] RhinosF1: add DT right to sysops in ManageWikiExtensions [08:12:34] MirahezeBot: later I got osticket to fail over [11:08:35] PROBLEM - www.wikimicrofinanza.it - DNS on sslhost is CRITICAL: DNS CRITICAL - expected '2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210,51.89.160.142' but got '2001:41d0:800:1056::2,51.77.107.210,51.89.160.142' [11:08:36] PROBLEM - wiki.nowchess.org - DNS on sslhost is CRITICAL: DNS CRITICAL - expected '2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210,51.89.160.142' but got '2001:41d0:800:1056::2,51.77.107.210,51.89.160.142' [11:09:08] Huh [11:15:27] RECOVERY - www.wikimicrofinanza.it - DNS on sslhost is OK: DNS OK: 0.051 seconds response time. sslhost returns 2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210,51.89.160.142 [11:15:30] RECOVERY - wiki.nowchess.org - DNS on sslhost is OK: DNS OK: 0.038 seconds response time. sslhost returns 2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210,51.89.160.142 [12:00:09] PROBLEM - cyberlaw.ccdcoe.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'cyberlaw.ccdcoe.org' expires in 15 day(s) (Wed 05 Aug 2020 11:53:28 GMT +0000). [12:00:30] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJcjF [12:00:31] [02miraheze/ssl] 07MirahezeSSLBot 03765780f - Bot: Update SSL cert for cyberlaw.ccdcoe.org [12:14:07] RECOVERY - cyberlaw.ccdcoe.org - LetsEncrypt on sslhost is OK: OK - Certificate 'cyberlaw.ccdcoe.org' will expire on Sun 18 Oct 2020 11:00:23 GMT +0000. 
[12:34:20] PROBLEM - cloud2 Current Load on cloud2 is WARNING: WARNING - load average: 15.14, 21.56, 16.27 [12:34:21] PROBLEM - cp7 Current Load on cp7 is CRITICAL: CRITICAL - load average: 4.73, 8.74, 5.38 [12:36:10] @System Administrators [0b27d8430956b39af4ffb220] 2020-07-20 12:35:19: Fatal exception of type "JobQueueError" trying to decline a wiki request [12:36:18] RECOVERY - cloud2 Current Load on cloud2 is OK: OK - load average: 12.77, 17.88, 15.51 [12:36:20] RECOVERY - cp7 Current Load on cp7 is OK: OK - load average: 1.96, 6.22, 4.86 [12:38:38] Tried it again, so seems to be more than a one-off error... "[c1b6ce924ec7ee40f0a47f16] 2020-07-20 12:38:06: Fatal exception of type "JobQueueError"" [12:38:58] Which request? [12:39:35] https://meta.miraheze.org/wiki/Special:RequestWikiQueue/13281#mw-section-decline pings @RhinosF1 and @paladox, in case they're just hidden [12:39:35] [ Wiki requests queue - Miraheze Meta ] - meta.miraheze.org [12:40:25] @Doug: What did you type? [12:40:25] PROBLEM - ta.gyaanipedia.co.in - DNS on sslhost is CRITICAL: DNS CRITICAL - expected '2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210,51.89.160.142' but got '2001:41d0:800:1056::2,51.77.107.210,51.89.160.142' [12:40:54] @RhinosF1 Same thing I usually type for most requests. It wasn't too long. It was the same length as my previous notes. [12:41:04] Exactly? [12:41:21] "Procedural decline to notify user in case they're not watching this wiki request to go back into [[Special:RequestWikiEdit]] for this request and define the wiki's purpose, scope, and type of content." [12:42:05] Every other time it goes through fine; plus, @paladox increased the character limit with that Phab ticket from last week. This is a different error than the last time. [12:42:25] RhinosF1: are you trying, or do want me to? [12:44:19] @RhinosF1 Just tried approving a request...same error, [3c36b21f9e3e057f62fcd1e1] 2020-07-20 12:43:49: Fatal exception of type "JobQueueError" [12:44:45] So problem occurs with creating wikis too, not just declining them. [12:45:08] I just tried declining a different request, same error [12:45:43] @Sario Yeah, I tried approving/creating the ToastyMC one with the above error. [12:45:44] I think it's time for a phab ticket [12:46:12] 2020-07-20 12:44:13 mw6 metawiki: [6e724fc4e36bcfc852763030] /wiki/Special:RequestWikiQueue/13281 JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: Could not insert 1 EchoNotificationDeleteJob job(s). [12:46:45] RhinosF1: need me or Doug to write a phab ticket? [12:47:05] RECOVERY - ta.gyaanipedia.co.in - DNS on sslhost is OK: DNS OK: 0.037 seconds response time. sslhost returns 2001:41d0:800:1056::2,2001:41d0:800:105a::10,51.77.107.210,51.89.160.142 [12:47:28] no [12:47:30] i on it [12:47:38] ack [12:47:48] 1983 entries in the log with "Redis server error" [12:47:56] I think we can guess the cause [12:48:01] Oof. Puppet? [12:48:06] Redis [12:48:20] sorry not familiar with redis server [12:48:25] Subscribe me on the ticket please [12:48:43] calls this UBN [12:48:51] UBN? [12:49:03] Unbreak now! [12:49:16] LOL [12:49:42] Highest priority [12:49:49] Should we take CreateWiki down for maintenance until this is fixed? [12:49:50] And I agree [12:50:16] We should probably stop trying to create or decline for now [12:50:40] yes, but with 1,983 log entries, I don't think those were all wiki creators. [12:51:00] I wonder what the other ones were from. People editing their requests and not being able to? 
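The "1983 entries in the log" figure and a per-job breakdown of the failed insertions can be pulled straight from the exception log; a hedged sketch, assuming a log path of /var/log/mediawiki/exception.log (the real path isn't shown here):

    LOG=/var/log/mediawiki/exception.log   # assumed path
    grep -c 'Redis server error' "$LOG"    # overall count (1983 at this point)

    # break the failures down by job type (EchoNotificationDeleteJob, etc.)
    grep -oE 'Could not insert 1 [A-Za-z]+ job' "$LOG" | awk '{print $5}' | sort | uniq -c | sort -rn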
[12:51:51] Echo [12:52:53] Ah, thanks. [12:53:16] What's the Phabricator ticket? I don't see it in my subscriptions. Did you ping me in the details? [12:53:19] https://phabricator.miraheze.org/T5939 [12:53:20] [ ⚓ T5939 Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report er ] - phabricator.miraheze.org [12:53:26] > https://phabricator.miraheze.org/T5939 @RhinosF1 thanks [12:53:27] [ ⚓ T5939 Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report er ] - phabricator.miraheze.org [12:54:35] @RhinosF1 Looks good. 🙂 [12:54:38] paging @Site Reliability Engineers [12:54:56] paladox, SPF|Cloud, JohnLewis: ^ [12:56:53] Ooh, @NDKilla's online. Pinging him just in case he's around. [12:57:40] the sre ping should have done it [12:57:48] true [12:58:24] !log Mediawiki & Redis logs are flooded with errors related to https://phabricator.miraheze.org/T5939, cause currently unknown. [12:58:26] [ ⚓ T5939 Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report er ] - phabricator.miraheze.org [12:58:28] No need yo spam pings [12:58:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:58:41] s/yo/to [12:58:41] Sario meant to say: No need to spam pings [12:59:17] I get a lot of mileage out of that function [13:02:43] 2020-07-20 12:43:49 mw6 metawiki: [3c36b21f9e3e057f62fcd1e1] /wiki/Special:RequestWikiQueue/13282 JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: Could not insert 1 NamespaceMigrationJob job(s). [13:03:06] paladox: between 2 mw servers there's about 4000 variations of it [13:04:13] 1195:C 20 Jul 2020 13:03:29.017 # Failed opening the RDB file dump.rdb (in server root dir /srv/redis) for saving: Read-only file system [13:05:23] paladox: since and why? [13:05:51] a chmod issue? [13:06:21] since 06:12:23 [13:06:53] ok [13:07:44] PROBLEM - jobrunner1 Redis Process on jobrunner1 is CRITICAL: PROCS CRITICAL: 4 processes with args 'redis-server' [13:09:44] RECOVERY - jobrunner1 Redis Process on jobrunner1 is OK: PROCS OK: 1 process with args 'redis-server' [13:10:25] !log killed redis-server and started up (had to kill -9 it as it was just hanging on sudo service redis-server stop) [13:10:29] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:10:35] on jobrunner1 [13:10:43] ah [13:11:32] PROBLEM - cp3 Disk Space on cp3 is WARNING: DISK WARNING - free space: / 2646 MB (10% inode=93%); [13:11:34] paladox: is that working now? [13:12:01] logs indicate yes [13:12:50] paladox: how do we recover all the failed jobs? [13:12:50] yep, seems to be working for creating and declining wikis [13:13:10] That i'm not sure about [13:13:16] Logs indicate that various RecentChanges + refreshLinks jobs failed [13:13:35] If they couldn't be inserted then it would be impossible to get them (i think). [13:13:36] Good question; I wonder how many were FuzzyBot jobs. 
[13:13:49] paladox: could we maybe run rebuildall.php on all wikis although that might break a lot for a while [13:13:58] No we cannot [13:14:01] that would take weeks [13:14:24] and also as you said would break wikis [13:14:46] (i take back my weeks comment, it would take maybe a month or 2) [13:14:53] lol [13:14:57] paladox: how else we going to rebuild rc + links, they've not been refreshed on any wiki with edits since 6am [13:15:30] JohnLewis ^ (should we run rebuildAll on all wikis?) [13:15:35] @Doug: They don't get rebuilt. [13:15:43] @RhinosF1 Will the edits since 6 am just show up as patrolled edits, or what would be the impact in this case? [13:16:02] paladox: is there a way to work out any wiki that has had an edit since 6am? [13:16:19] I would imagine that would take quite a while [13:16:58] [02mw-config] 07Amanda-Catherine opened pull request 03#3184: Remove 'skin' from wgHiddenPrefs on dcmultiversewiki T5938 - 13https://git.io/JJCUH [13:17:23] @Doug: I'm not sure how I just see that there's a lot of "Could not insert 1 recentChangesUpdate job(s). " [13:18:10] > @Doug: I'm not sure how I just see that there's a lot of "Could not insert 1 recentChangesUpdate job(s). > " @RhinosF1 Ah, interesting. What are some of the wiki db names related to that error? I could check out the front end to see the way in which it's been impacted. [13:18:22] paladox: we could grep the exception logs for recentChangesUpdate and refreshLinksDynamic and get the line it's on [13:18:31] @Doug: Any wiki with an edit since 6am [13:19:09] then we'd have to take that list and me build a fancy script to go and parse it for db names [13:19:18] >  @Doug: Any wiki with an edit since 6am Which log is that? Is that log in a web-accessible folder? something cleaner than #wiki-feed [13:19:48] we don't have a list yet [13:19:52] ah [13:20:31] https://phabricator.miraheze.org/P332 [13:20:32] [ ✎ P332 (An Untitled Masterwork) ] - phabricator.miraheze.org [13:20:49] paladox: it was on every mw [13:20:59] I like that idea; doesn't sound like it would take long to write the fancy script....not sure how long it would take to extract the data from the exception logs [13:21:07] yes i know that [13:21:10] @Doug: Nearly done [13:21:39] if we strip the colon and dupes from P332 when every mw is added then we should have the list of affected wikis [13:24:08] paladox: update that paste with the list from both jobs on each mw and then I can strip dupes [13:24:55] RhinosF1 done [13:30:52] "Read-only file system" - are we sure that was resolved? File systems don't generally just go in and out of read-only on their own [13:31:44] @Void i mean i could write on the disk, searching the error brought me to https://stackoverflow.com/questions/44814351/failed-opening-the-rdb-file-read-only-file-system but that's set in the unit. [13:31:44] [ linux - Failed opening the RDB file ... Read-only file system - Stack Overflow ] - stackoverflow.com [13:31:54] The logs indicate it's saving sucessfully now [13:32:02] after killing redis-server with kill -9 [13:32:33] > "Read-only file system" - are we sure that was resolved? File systems don't generally just go in and out of read-only on their own @Void Yeah, wondered about that, and also why the redis server/service got hung like that. Something had to have caused it. 
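Void's question above — whether the read-only filesystem is really resolved — can be checked directly with standard tools; a sketch (the /srv/redis data directory comes from the error message quoted earlier):

    # is any filesystem still mounted read-only?
    awk '$4 ~ /(^|,)ro(,|$)/ {print $1, $2, $4}' /proc/mounts

    # can Redis persist its RDB snapshot to /srv/redis again?
    redis-cli CONFIG GET dir
    redis-cli BGSAVE
    redis-cli INFO persistence | grep -E 'rdb_last_bgsave_status|rdb_last_save_time'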
wonders if this will be the type of thing there will be a sysadmin post-mortem meeting on [13:33:08] !log root@jobrunner1:/home/paladox# /usr/local/bin/foreachwikiindblist /srv/mediawiki/w/cache/databases.json /srv/mediawiki/w/maintenance/runJobs.php [13:33:12] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:33:42] Hmm, well if you can write files, then it's probably fine, but finding out what caused it to go read-only would be a very good idea [13:35:08] paladox: generating the list now [13:35:24] ok [13:36:27] The last time I had an issue with a read-only file system, though, the cause couldn't really be identified, but the VHD had been corrupted and needed "repairs" [13:36:54] something caused it to "Received SIGTERM scheduling shutdown..." at 06:11: [13:37:08] *06:11:21 [13:38:04] found it [13:38:56] Oh? [13:38:59] was there a redis upgrade... [13:39:29] redis was upgraded [13:39:31] that's why [13:39:45] the unit from redis package is broken hence why we do our own [13:39:45] paladox: https://phabricator.miraheze.org/P332 [13:39:46] [ Login ] - phabricator.miraheze.org [13:39:58] That seems... Undesirable [13:39:58] 180 affected wikis [13:40:21] att is huge [13:40:36] i would like to hear JohnLewis opinion on running rebuildAll on all those wikis. [13:40:42] @paladox, att? learning acronyms [13:40:49] allthetropes [13:40:55] ah [13:41:01] thanks 🙂 [13:41:26] didn't RhinosF1 post something above about JohnLewis suggesting rebuildAll? [13:41:50] no [13:41:57] I suggested rebuildall [13:42:13] yeah, but I thought I saw something from John concurring with you [13:42:35] paladox: would running rebuild rc and rebuild links individually be better? [13:42:43] JohnLewis ^ (should we run rebuildAll on all wikis?) [13:42:44] yeh [13:42:59] paladox: we could do that as they are two affected parts [13:44:29] [02dns] 07MacFan4000 opened pull request 03#168: update phab endpoint - 13https://git.io/JJCkZ [13:44:44] PROBLEM - test2 APT on test2 is CRITICAL: APT CRITICAL: 2 packages available for upgrade (1 critical updates). [13:45:15] Also mind documenting exactly what happened on the task? Would be a good idea to keep records of this kind of stuff in a better place than just IRC/Discord [13:45:28] paladox: can you merge dns? [13:45:28] (or incident report) [13:45:32] hi AmandaCath [13:45:46] !log root@jobrunner1:/home/paladox# ./foreachwikiindblist /home/paladox/all.dblist /srv/mediawiki/w/maintenance/rebuildrecentchanges.php [13:45:50] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:45:52] * AmandaCath waves to RhinosF1 [13:46:11] [02dns] 07paladox closed pull request 03#168: update phab endpoint - 13https://git.io/JJCkZ [13:46:11] paladox: why all.dblist when you have a list of only affected wikis? [13:46:13] [02miraheze/dns] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJCkB [13:46:14] [02miraheze/dns] 07MacFan4000 03688ea3f - update phab endpoint (#168) [13:46:22] why do 4000 when you can do 180 [13:47:13] [02mw-config] 07paladox closed pull request 03#3184: Remove 'skin' from wgHiddenPrefs on dcmultiversewiki T5938 - 13https://git.io/JJCUH [13:47:15] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJCkE [13:47:16] [02miraheze/mw-config] 07Amanda-Catherine 03f07a57f - Remove 'skin' from wgHiddenPrefs on dcmultiversewiki T5938 (#3184) [13:47:26] it's not 4000... 
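P332 was built by grepping the exception logs on each mw host and stripping the colons and duplicates, as discussed above; a hedged sketch of that step for one host — the log path is an assumption, and the field positions follow the sample line "2020-07-20 12:44:13 mw6 metawiki: ..." quoted earlier:

    LOG=/var/log/mediawiki/exception.log   # assumed path
    grep -hE 'recentChangesUpdate|refreshLinksDynamic' "$LOG" \
      | awk '{print $4}' | tr -d ':' | sort -u > /home/paladox/all.dblist
    wc -l < /home/paladox/all.dblist       # 180 affected wikis at this point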
[13:48:19] oh [13:48:24] I saw all.dblist [13:48:31] not /home/paladox [13:49:49] PROBLEM - cp9 APT on cp9 is CRITICAL: APT CRITICAL: 37 packages available for upgrade (1 critical updates). [13:51:26] https://www.irccloud.com/pastebin/x0wWCVYs/ [13:51:27] [ Snippet | IRCCloud ] - www.irccloud.com [13:52:06] paladox: them 7 wikis all missed categoryMembershipChange - could you please rebuild categories for them [13:52:17] ok [13:52:47] PROBLEM - rdb1 APT on rdb1 is CRITICAL: APT CRITICAL: 3 packages available for upgrade (2 critical updates). [13:52:49] paladox: could you check via salt for grep "categoryMembershipChange" exception.log to check other mw's [13:52:51] PROBLEM - mw4 Current Load on mw4 is WARNING: WARNING - load average: 7.87, 6.85, 5.79 [13:53:33] ^ gluster and php-fpm are higest [13:53:39] i don't see a script for rebuilding categories [13:54:03] paladox: grep on all mw's to check there's none on any but mw6 while I find it [13:54:44] RhinosF1 i don't see a script for rebuilding categories [13:55:07] paladox: https://www.mediawiki.org/wiki/Manual:RecountCategories.php, Will you please get a list from mw's not mw6 [13:55:07] [ Manual:recountCategories.php - MediaWiki ] - www.mediawiki.org [13:55:11] oh [13:56:20] PROBLEM - mw7 Puppet on mw7 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static] [13:56:44] categoryMembershipChange,userGroupExpiry,userOptionsUpdate [13:56:45] RhinosF1 https://phabricator.miraheze.org/P333 [13:56:46] RECOVERY - mw4 Current Load on mw4 is OK: OK - load average: 5.80, 6.67, 5.99 [13:56:46] [ ✎ P333 (An Untitled Masterwork) ] - phabricator.miraheze.org [13:57:19] paladox: them 3 jobs I care about failing [13:58:33] https://www.irccloud.com/pastebin/DeDln2i9/ [13:58:33] [ Snippet | IRCCloud ] - www.irccloud.com [13:58:54] paladox: ^ that's for category, could you get a count from userGroupExpiry and userOptionsUpdate [14:01:02] PROBLEM - cp3 Stunnel Http for mw4 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:01:11] PROBLEM - cp7 Stunnel Http for mw4 on cp7 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:01:59] !log depool and repool mw4 [14:02:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [14:02:05] paladox: I think gluester and php just died [14:02:07] !log reboot mw4 too [14:02:08] PROBLEM - cp9 Varnish Backends on cp9 is CRITICAL: 1 backends are down. mw4 [14:02:08] PROBLEM - cp9 Stunnel Http for mw4 on cp9 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:02:09] PROBLEM - cp6 Varnish Backends on cp6 is CRITICAL: 1 backends are down. mw4 [14:02:09] yup [14:02:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [14:02:14] PROBLEM - cp6 Stunnel Http for mw4 on cp6 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:02:20] RECOVERY - mw7 Puppet on mw7 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [14:02:25] PROBLEM - mw4 HTTPS on mw4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:02:35] PROBLEM - cp7 Varnish Backends on cp7 is CRITICAL: 1 backends are down. mw4 [14:02:58] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw4 [14:03:31] @RhinosF1 Yeah, not sure if this was related, but I timed out a couple times trying to load Meta...third time refreshing, it loaded. Could be a cache proxy issue, though. 
[14:03:43] it's related [14:04:19] PROBLEM - mw6 Puppet on mw6 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/mnt/mediawiki-static] [14:04:25] paladox: CentralAuthCreateLocalAccountJob [14:04:37] PROBLEM - mw4 Puppet on mw4 is CRITICAL: connect to address 51.89.160.128 port 5666: Connection refusedconnect to host 51.89.160.128 port 5666: Connection refused [14:04:44] PROBLEM - mw4 Current Load on mw4 is CRITICAL: connect to address 51.89.160.128 port 5666: Connection refusedconnect to host 51.89.160.128 port 5666: Connection refused [14:05:10] PROBLEM - mw4 NTP time on mw4 is CRITICAL: connect to address 51.89.160.128 port 5666: Connection refusedconnect to host 51.89.160.128 port 5666: Connection refused [14:05:13] PROBLEM - mw4 APT on mw4 is CRITICAL: connect to address 51.89.160.128 port 5666: Connection refusedconnect to host 51.89.160.128 port 5666: Connection refused [14:05:58] PROBLEM - mw4 Disk Space on mw4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:06:08] PROBLEM - mw4 SSH on mw4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:06:19] PROBLEM - mw4 php-fpm on mw4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. [14:06:38] that sounds like the function that attaches your account to a wiki when you visit it the first time [14:06:44] yep [14:09:18] RECOVERY - cp7 Stunnel Http for mw4 on cp7 is OK: HTTP OK: HTTP/1.1 200 OK - 15595 bytes in 0.016 second response time [14:09:37] RECOVERY - cp3 Stunnel Http for mw4 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 15595 bytes in 1.036 second response time [14:10:04] RECOVERY - cp9 Varnish Backends on cp9 is OK: All 7 backends are healthy [14:10:06] RECOVERY - cp6 Varnish Backends on cp6 is OK: All 7 backends are healthy [14:10:18] RECOVERY - cp9 Stunnel Http for mw4 on cp9 is OK: HTTP OK: HTTP/1.1 200 OK - 15595 bytes in 1.513 second response time [14:10:28] RECOVERY - cp6 Stunnel Http for mw4 on cp6 is OK: HTTP OK: HTTP/1.1 200 OK - 15609 bytes in 0.018 second response time [14:10:39] RECOVERY - cp7 Varnish Backends on cp7 is OK: All 7 backends are healthy [14:10:56] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 7 backends are healthy [14:14:18] RECOVERY - mw6 Puppet on mw6 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:39:19] !log root@jobrunner1:/home/paladox# ./foreachwikiindblist /home/paladox/7wikis.dblist /srv/mediawiki/w/maintenance/recountCategories.php --mode=pages [14:39:22] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [14:39:40] !log root@jobrunner1:/home/paladox# ./foreachwikiindblist /home/paladox/7wikis.dblist /srv/mediawiki/w/maintenance/recountCategories.php --mode=subcats [14:39:43] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [14:39:46] !log root@jobrunner1:/home/paladox# ./foreachwikiindblist /home/paladox/7wikis.dblist /srv/mediawiki/w/maintenance/recountCategories.php --mode=files [14:39:47] RhinosF1 ^ [14:39:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [14:40:15] paladox: ty [14:50:20] I had an inkling something may be wrong before I encountered that first fatal exception error when declining a request...in my "Watchlist," I couldn't get the pages to stay unbolded after visiting them. Now, the pages seem to stay unbolded. Probably was related to those job queue errors. 
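With the recountCategories.php passes above and the earlier runJobs.php rerun in flight, one way to confirm the backlog has actually drained is MediaWiki's stock showJobs.php over the same affected-wiki list; a sketch, not a command taken from the log:

    # per-wiki job counts broken down by type; should read 0 everywhere once the reruns finish
    /usr/local/bin/foreachwikiindblist /home/paladox/all.dblist \
        /srv/mediawiki/w/maintenance/showJobs.php --group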
[14:57:59] just noticed something odd in [[Special:RecentChanges]] on Meta, her user talk page needed patrolling except she's autopatrolled and never lost it so it's not related to that RC bug [14:59:02] Oh god, now a whole bunch of revisions I just patrolled need patrolling again [14:59:09] @System Administrators ^ [15:00:25] @Doug: It's because we're fixing the broken jobs [15:00:28] Just wait [15:00:32] It'll fix itself [15:00:38] oh okay [15:00:41] thanks [15:03:19] You don't need to manually do them [15:04:43] !log root@jobrunner1:/home/paladox# ./foreachwikiindblist /home/paladox/all.dblist /srv/mediawiki/w/maintenance/refreshLinks.php [15:04:47] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:06:36] RECOVERY - mw4 Disk Space on mw4 is OK: DISK OK - free space: / 6124 MB (32% inode=67%); [15:06:36] RECOVERY - mw4 HTTPS on mw4 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 557 bytes in 0.055 second response time [15:06:36] RECOVERY - mw4 php-fpm on mw4 is OK: PROCS OK: 27 processes with command name 'php-fpm7.3' [15:06:36] RECOVERY - mw4 SSH on mw4 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) [15:06:41] RECOVERY - mw4 Puppet on mw4 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [15:06:46] RECOVERY - mw4 Current Load on mw4 is OK: OK - load average: 2.12, 2.32, 2.21 [15:07:16] RECOVERY - mw4 NTP time on mw4 is OK: NTP OK: Offset -0.00661033392 secs [15:07:16] RECOVERY - mw4 APT on mw4 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [15:12:24] [02miraheze/puppet] 07paladox pushed 031 commit to 03paladox-patch-11 [+0/-0/±1] 13https://git.io/JJCmc [15:12:25] [02miraheze/puppet] 07paladox 031f9485d - base: Do not run unintended upgrades for redis Redis broke because we use a different systemd unit (e.g it overwrote our unit). [15:12:26] [02puppet] 07paladox created branch 03paladox-patch-11 - 13https://git.io/vbiAS [15:12:28] [02puppet] 07paladox opened pull request 03#1456: base: Do not run unintended upgrades for redis - 13https://git.io/JJCmC [15:12:41] [02puppet] 07paladox closed pull request 03#1456: base: Do not run unintended upgrades for redis - 13https://git.io/JJCmC [15:12:42] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJCmW [15:12:44] [02miraheze/puppet] 07paladox 0310e5bc1 - base: Do not run unintended upgrades for redis (#1456) Redis broke because we use a different systemd unit (e.g it overwrote our unit). 
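The commit above only says unintended redis upgrades are now blocked (the package upgrade had overwritten the custom systemd unit); the mechanism isn't shown, so the following is an assumption about how such a block is commonly done, not the actual puppet change:

    apt-mark hold redis-server     # apt (and, by default, unattended-upgrades) then skips the package
    apt-mark showhold              # verify the hold is in place

    # alternatively, a drop-in under /etc/systemd/system/redis-server.service.d/
    # survives package upgrades, since packaged units live in /lib/systemd/system
    systemctl daemon-reload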
[15:12:45] [02puppet] 07paladox deleted branch 03paladox-patch-11 - 13https://git.io/vbiAS [15:12:47] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-11 [15:14:47] [02puppet] 07Pix1234 opened pull request 03#1457: Add a brand new generated ssh key for mobile access - 13https://git.io/JJCmz [15:15:10] [02puppet] 07paladox closed pull request 03#1457: Add a brand new generated ssh key for mobile access - 13https://git.io/JJCmz [15:15:11] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJCmV [15:15:13] [02miraheze/puppet] 07Pix1234 03b45315c - Add a brand new generated ssh key for mobile access (#1457) [15:17:21] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJCmX [15:17:22] [02miraheze/puppet] 07paladox 0390fd317 - matomo: Upgrade to 3.14.0 [15:18:20] PROBLEM - jobrunner1 Current Load on jobrunner1 is CRITICAL: CRITICAL - load average: 12.30, 8.37, 5.55 [15:21:59] PROBLEM - mon1 Puppet on mon1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_checkout_matomo] [15:22:16] paladox: ^ [15:22:23] yeh [15:22:55] I assume that means you know [15:23:59] RECOVERY - mon1 Puppet on mon1 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [15:25:42] well yeh [15:26:06] And given icinga-miraheze it's now fine [15:30:26] !log install gdb on mw4 [15:30:29] Ping me when the rebuilding is finished and I can check Meta for unpatrols. [15:30:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:31:03] paladox: ^ [15:31:44] it's done meta [15:33:11] @Doug: meta is good [15:39:05] @RhinosF1, okay, thanks 🙂 [15:40:10] Thanks for working on Miraheze 👍 [15:41:07] RECOVERY - espiri.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'espiri.wiki' will expire on Sun 27 Sep 2020 22:04:35 GMT +0000. [15:41:08] @RhinosF1 and @paladox, well sort of, there's still a boatload of unpatrolled revs on CN and SN I patrolled days ago. [15:41:22] Hmmm [15:46:20] PROBLEM - jobrunner1 Current Load on jobrunner1 is WARNING: WARNING - load average: 4.02, 6.23, 7.70 [15:47:00] Back to the 12th now and still have revisions to repatrol [15:47:11] on CN [15:48:26] can't temporarily add autopatrol to Meta:Users, either, because once we remove autopatrol again, the revs will be unpatrolled due to the deployed extension bug I need to get on reporting [15:48:38] is not sure what the solution is [15:50:33] PROBLEM - espiri.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:51:07] maybe a 5-10 minute shutdown and manually edit the database(s) to mark all unpatrolled revisions as patrolled? Any high traffic pages will be monitored by multiple people, so locally they can revert or adjust any mass patrolled revisions. [15:52:20] RECOVERY - jobrunner1 Current Load on jobrunner1 is OK: OK - load average: 4.40, 4.83, 6.56 [15:52:30] I don't get what's going on with your patrol bugs [15:56:15] Going back through Community noticeboard to find out how far back I have to go to get an unpatrolled user whose revision I don't need to patrol. [15:56:52] I think it's two bugs, but they're related. One seems to be related to this RC job queue thing and the other one is the one I've told you about [16:01:31] I've run into a couple of bugs when patrolling as well [16:01:40] Regarding the more pressing bug, how does this job queue/jobrunner thing tie in with the redis-server? 
[16:01:55] I'm using the gadget that adds the "Mark as patrolled" link to Special:RecentChanges and it doesn't always work [16:02:42] @AmandaCath, ah, okay, I'm not using that. This is just going back through the previous diffs and marking as patrolled. [16:03:17] Yeah, that's what I've had to do when the gadget fails [16:03:29] Usual I have no problem with that though [16:03:34] Usually* [16:09:37] yeah, I'm tempted to report this patrol bug as UBN; I normally wouldn't, but the one about a former administrator's or autopatroller's revisions made while administrator/autopatrolled has been outstanding, seemingly, forever. It doesn't happen on Wikimedia-owned wikis, so I'm sure it's a configuration issue with the way the extension is configured in MediaWiki [16:13:52] It's not UBN [16:14:22] which one? the main patrol bug, or this patrol bug related to the rebuilding? [16:15:10] okay I went back and repatrolled the revisions to June 18th, all revisions previously patrolled, and I'm still getting unpatrolled revisions that have been previously patrolled that are again unpatrolled. [16:15:36] looks like today is going to be a big Phab bug filing task day [16:16:40] Either [16:16:47] The rebuilding should be done [16:16:54] It makes no sense that that caused it [16:21:35] !log set i/o threads at 9 on gluster [16:21:39] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:43:23] PROBLEM - cloud2 Current Load on cloud2 is CRITICAL: CRITICAL - load average: 25.09, 21.02, 16.57 [16:45:14] RECOVERY - gluster2 GlusterFS port 49152 on gluster2 is OK: TCP OK - 0.000 second response time on 51.68.201.37 port 49152 [16:45:23] RECOVERY - cloud2 Current Load on cloud2 is OK: OK - load average: 18.52, 19.31, 16.45 [16:47:11] paladox: https://phabricator.miraheze.org/T5941 [16:47:13] [ ⚓ T5941 Error in to upload images... ] - phabricator.miraheze.org [16:48:54] !log restarted gluster on glusterfs2 [16:48:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:50:27] PROBLEM - wiki.xysspon.com - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.xysspon.com' expires in 15 day(s) (Wed 05 Aug 2020 16:45:23 GMT +0000). [16:56:56] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJCZt [16:56:57] [02miraheze/ssl] 07MirahezeSSLBot 038eb02ee - Bot: Update SSL cert for wiki.xysspon.com [17:04:04] RECOVERY - wiki.xysspon.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.xysspon.com' will expire on Sun 18 Oct 2020 15:56:48 GMT +0000. [17:06:04] >  It makes no sense that that caused it @RhinosF1 But you did say there were problems with RecentChanges in the error logs; I still think it would be worthwhile to file a separate bug, as, though it may be somewhat related, this newer bug is related to regular users who had their revisions already manually patrolled become unpatrolled again [17:07:06] paladox: has all the scripts on every wiki finished? [17:34:21] + done one more [17:37:12] !log set i/o threads to 12 on gluster [17:37:15] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:37:20] I have many emails with your name on @Doug [17:37:32] <-CloudGuy38-> What is happening to Void [17:38:52] >  I have many emails with your name on @Doug @RhinosF1 Sorry. I disabled e-mails for Phabricator; I can appreciate you get a lot of notifications. [17:39:11] I do get a lot [17:39:32] > What is happening to Void @-CloudGuy38- What do you mean? 
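The "set i/o threads at 9 / to 12 on gluster" log entries above don't include the command; assuming the standard GlusterFS volume option and the mvol volume named elsewhere in this log, it would look like:

    gluster volume set mvol performance.io-thread-count 12
    gluster volume get mvol performance.io-thread-count    # verify the new value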
[17:40:06] <-CloudGuy38-> Void today [17:40:10] @Doug: can you see if any other wiki on https://phabricator.miraheze.org/P332 has the same issue? [17:40:11] [ Login ] - phabricator.miraheze.org [17:40:19] As the patrol breaking bug [17:40:37] I can nosy in the db later [17:40:55] >  @Doug: can you see if any other wiki on https://phabricator.miraheze.org/P332 has the same issue? @RhinosF1 Sure. I don't know if they've been already patrolled, but should be easy to tell with duplicate patrol log entries for the same revision [17:40:56] [ Login ] - phabricator.miraheze.org [17:41:12] !log depool mw4 [17:41:17] You'll have to check manually [17:41:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:41:21] @-CloudGuy38- Not sure. I assume @Void has a full-time job and is busy with that? [17:41:25] paladox: why? [17:43:17] !log repool mw4 [17:43:22] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:43:23] RhinosF1 because the mount was broken [17:43:38] paladox: oh [17:43:51] * RhinosF1 grrrrs at both mw4 and gluster [17:44:30] <-CloudGuy38-> I think a full time job, and busy. [17:44:44] <-CloudGuy38-> Also it's under a month before my 15th birthday [17:45:02] @RhinosF1 can you unrestrict that Paste for me? [17:45:12] @Doug: link me to a revision that's still broke [17:45:13] (or @paladox) [17:45:21] And that would help wouldn't it if you could see it [17:45:29] https://phabricator.miraheze.org/P332 [17:45:30] [ Login ] - phabricator.miraheze.org [17:45:37] I'll check later and do a sanity check on the tables [17:45:59] Something's wrong and I'm hoping it's something obvious [17:46:02] >  I'll check later and do a sanity check on the tables @RhinosF1 Okay, sounds good, and I will check some of those other wikis. [17:46:15] Me too. I feel like this could be a hard to solve bug. 😦 [17:46:56] !log root@gluster1:/mnt# gluster volume set mvol performance.open-behind on [17:46:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:47:00] If it's what I hope it is, hopefully not [17:47:15] If it's not, I'm going to slam my head against my pillow tonight [17:47:20] >  If it's what I hope it is, hopefully not Oh, that's good. crosses fingers [17:47:42] It's whether next it affected all 180 wikis or just meta [17:48:02] Yeah. [17:48:55] I love redis at times [17:49:01] * RhinosF1 sarcastic [17:50:42] haha [17:50:59] [02miraheze/MatomoAnalytics] 07translatewiki pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJCcK [17:51:01] [02miraheze/MatomoAnalytics] 07translatewiki 035530320 - Localisation updates from https://translatewiki.net. [17:51:02] [ Main page - translatewiki.net ] - translatewiki.net [17:51:02] [02miraheze/CreateWiki] 07translatewiki pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJCc6 [17:51:04] [02miraheze/CreateWiki] 07translatewiki 0349bbb33 - Localisation updates from https://translatewiki.net. [17:51:05] [ Main page - translatewiki.net ] - translatewiki.net [17:51:05] [02miraheze/MirahezeMagic] 07translatewiki pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJCci [17:51:07] [02miraheze/MirahezeMagic] 07translatewiki 034da7435 - Localisation updates from https://translatewiki.net. 
[17:51:08] [ Main page - translatewiki.net ] - translatewiki.net [17:51:08] [02miraheze/WikiDiscover] 07translatewiki pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJCcP [17:51:10] [02miraheze/WikiDiscover] 07translatewiki 03ecdf93d - Localisation updates from https://translatewiki.net. [17:51:11] [ Main page - translatewiki.net ] - translatewiki.net [17:51:21] It's got a lovely habit of being an utter pain in the neck when it fails [17:59:23] PROBLEM - mw7 APT on mw7 is CRITICAL: APT CRITICAL: 2 packages available for upgrade (1 critical updates). [18:26:17] [02miraheze/MatomoAnalytics] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJCC9 [18:26:19] [02miraheze/MatomoAnalytics] 07paladox 038579422 - Matomo: Update javascript Matches changes done in matomo 3.14.0: https://developer.matomo.org/guides/tracking-javascript-guide [18:26:19] [ JavaScript Tracking Client: Integrate - Matomo Analytics (formerly Piwik Analytics) - Developer Docs - v4 ] - developer.matomo.org [18:26:59] [02miraheze/mediawiki] 07paladox pushed 031 commit to 03REL1_34 [+0/-0/±1] 13https://git.io/JJCCF [18:27:01] [02miraheze/mediawiki] 07paladox 0340ff75d - Update MatomoAnalytics [18:38:25] PROBLEM - cloud2 Current Load on cloud2 is WARNING: WARNING - load average: 23.09, 17.22, 14.86 [18:40:16] PROBLEM - mw5 Puppet on mw5 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki core] [18:40:19] PROBLEM - mw6 Puppet on mw6 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki core] [18:40:20] PROBLEM - mw7 Puppet on mw7 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki core] [18:40:23] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJCWR [18:40:23] RECOVERY - cloud2 Current Load on cloud2 is OK: OK - load average: 16.03, 16.04, 14.69 [18:40:24] [02miraheze/services] 07MirahezeSSLBot 03f0c922c - BOT: Updating services config for wikis [18:41:22] PROBLEM - jobrunner1 Puppet on jobrunner1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki core] [18:41:40] PROBLEM - mw4 Puppet on mw4 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[git_pull_MediaWiki core] [18:43:21] RECOVERY - jobrunner1 Puppet on jobrunner1 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [18:43:39] RECOVERY - mw4 Puppet on mw4 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [18:44:16] RECOVERY - mw5 Puppet on mw5 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:44:20] RECOVERY - mw6 Puppet on mw6 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:44:21] RECOVERY - mw7 Puppet on mw7 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:46:21] PROBLEM - cloud2 Current Load on cloud2 is WARNING: WARNING - load average: 19.45, 21.12, 17.53 [18:48:20] RECOVERY - cloud2 Current Load on cloud2 is OK: OK - load average: 10.74, 17.78, 16.75 [18:50:15] [02MatomoAnalytics] 07paladox created branch 03paladox-patch-1 - 13https://git.io/fN4LT [18:50:28] [02MatomoAnalytics] 07paladox opened pull request 03#27: Bump version to 1.0.4 - 13https://git.io/JJClv [18:51:20] [02miraheze/MatomoAnalytics] 07paladox pushed 031 commit to 03paladox-patch-1 [+0/-0/±1] 13https://git.io/JJClU [18:51:22] [02miraheze/MatomoAnalytics] 07paladox 03f51c408 - Update CHANGELOG [18:52:19] PROBLEM - mon1 Disk Space on mon1 is WARNING: DISK WARNING - free space: / 3906 MB (10% inode=93%); [18:53:35] [02MatomoAnalytics] 07paladox synchronize pull request 03#27: Bump version to 1.0.4 - 13https://git.io/JJClv [18:58:39] !log increase mon1 by 10gb [18:58:45] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:59:10] [02miraheze/MatomoAnalytics] 07paladox pushed 031 commit to 03paladox-patch-1 [+0/-0/±1] 13https://git.io/JJClq [18:59:12] [02miraheze/MatomoAnalytics] 07paladox 037e20714 - Bump version to 1.0.4 [19:00:19] RECOVERY - mon1 Disk Space on mon1 is OK: DISK OK - free space: / 13117 MB (29% inode=94%); [19:02:00] [02MatomoAnalytics] 07paladox closed pull request 03#27: Bump version to 1.0.4 - 13https://git.io/JJClv [19:02:02] [02miraheze/MatomoAnalytics] 07paladox pushed 031 commit to 03master [+0/-0/±2] 13https://git.io/JJCls [19:02:03] [02miraheze/MatomoAnalytics] 07paladox 0367ccaea - Bump version to 1.0.4 (#27) * Bump version to 1.0.4 * Update CHANGELOG [19:02:05] [02miraheze/MatomoAnalytics] 07paladox deleted branch 03paladox-patch-1 [19:02:21] [02MatomoAnalytics] 07paladox deleted branch 03paladox-patch-1 - 13https://git.io/fN4LT [19:41:35] PROBLEM - cp7 Current Load on cp7 is CRITICAL: CRITICAL - load average: 15.68, 10.19, 5.35 [19:41:50] PROBLEM - cloud2 Current Load on cloud2 is CRITICAL: CRITICAL - load average: 27.24, 20.90, 16.35 [19:43:52] RECOVERY - cloud2 Current Load on cloud2 is OK: OK - load average: 16.55, 19.92, 16.60 [19:45:28] PROBLEM - cp7 Current Load on cp7 is WARNING: WARNING - load average: 6.06, 7.58, 5.30 [19:47:24] RECOVERY - cp7 Current Load on cp7 is OK: OK - load average: 3.02, 5.98, 4.99 [19:52:33] !log root@jobrunner1:/srv/mediawiki/w/extensions/MatomoAnalytics/maintenance# sudo -u www-data php modifyMatomo.php --wiki loginwiki [19:52:36] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [19:54:52] !log root@jobrunner1:/srv/mediawiki/w/extensions/MatomoAnalytics/maintenance# sudo -u www-data php modifyMatomo.php --wiki test2wiki [19:54:56] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [19:55:22] !log MariaDB [mhglobal]> delete from matomo where matomo_wiki = 'test1wiki'; - db11 
[19:55:28] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [20:10:45] PROBLEM - cp7 Current Load on cp7 is CRITICAL: CRITICAL - load average: 5.16, 8.54, 5.85 [20:12:43] RECOVERY - cp7 Current Load on cp7 is OK: OK - load average: 1.68, 6.09, 5.26 [21:00:08] PROBLEM - fr.gyaanipedia.co.in - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'fr.gyaanipedia.co.in' expires in 15 day(s) (Wed 05 Aug 2020 20:51:22 GMT +0000). [21:07:54] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JJC0o [21:07:56] [02miraheze/ssl] 07MirahezeSSLBot 0329bfa7c - Bot: Update SSL cert for fr.gyaanipedia.co.in [21:13:57] RECOVERY - fr.gyaanipedia.co.in - LetsEncrypt on sslhost is OK: OK - Certificate 'fr.gyaanipedia.co.in' will expire on Sun 18 Oct 2020 20:07:48 GMT +0000.