[00:03:12] PROBLEM - GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 172.104.111.8/cpweb
[00:03:42] PROBLEM - Current Load on swift1 is CRITICAL: CRITICAL - load average: 4.21, 3.04, 2.21
[00:04:57] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNmpR
[00:04:59] [miraheze/puppet] paladox ef6e3b8 - Update account-server.conf.erb
[00:05:05] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNmpE
[00:05:07] [miraheze/puppet] paladox 9e4692e - Update account-server.conf.erb
[00:05:10] RECOVERY - GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[00:05:15] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNmpu
[00:05:16] [miraheze/puppet] paladox 4d8f313 - Update container-server.conf.erb
[00:05:24] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNmpz
[00:05:26] [miraheze/puppet] paladox 019f0eb - Update object-server.conf.erb
[00:05:40] RECOVERY - Current Load on swift1 is OK: OK - load average: 1.79, 2.57, 2.14
[00:08:12] !log upgrading swift1 and swift2 to stretch 9.5 point release (and a reboot)
[00:08:15] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[00:11:41] RECOVERY - Puppet on bacula1 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[00:12:30] !log upgrade misc4 to stretch 9.5 and reboot
[00:12:34] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[00:14:51] !log upgrade arcanist on misc4
[00:14:56] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[00:17:31] PROBLEM - Puppet on misc4 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[php7.2-gettext]
[00:19:31] RECOVERY - Puppet on misc4 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[00:39:30] PROBLEM - Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki core]
[00:45:31] RECOVERY - Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:55:41] PROBLEM - Disk Space on bacula1 is WARNING: DISK WARNING - free space: / 99175 MB (20% inode=99%):
[02:14:52] PROBLEM - Puppet on cp2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[ufw-allow-tcp-from-any-to-any-port-80]
[02:22:51] RECOVERY - Puppet on cp2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:49:40] PROBLEM - Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw1
[02:49:50] PROBLEM - Varnish Backends on cp5 is CRITICAL: 1 backends are down. mw1
[02:51:40] RECOVERY - Varnish Backends on cp4 is OK: All 3 backends are healthy
[02:51:51] RECOVERY - Varnish Backends on cp5 is OK: All 3 backends are healthy
[02:59:31] RECOVERY - Bacula - Databases - db4 on bacula1 is OK: OK: Diff, 293946 files, 49.86GB, 2018-07-15 02:57:00 (2.4 minutes ago)
[03:51:42] PROBLEM - Disk Space on bacula1 is CRITICAL: DISK CRITICAL - free space: / 52208 MB (10% inode=99%):
[06:44:51] PROBLEM - JobQueue on mw3 is CRITICAL: JOBQUEUE CRITICAL - job queue greater than 300 jobs. Current queue: 2660
[07:03:12] PROBLEM - GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 172.104.111.8/cpweb, 2400:8902::f03c:91ff:fe07:444e/cpweb
[07:04:52] PROBLEM - HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 44%
[07:05:12] RECOVERY - GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[07:05:40] PROBLEM - HTTP 4xx/5xx ERROR Rate on cp5 is WARNING: WARNING - NGINX Error Rate is 43%
[07:06:50] RECOVERY - HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 5%
[07:07:40] RECOVERY - HTTP 4xx/5xx ERROR Rate on cp5 is OK: OK - NGINX Error Rate is 27%
[07:31:51] PROBLEM - Current Load on misc4 is CRITICAL: CRITICAL - load average: 4.35, 2.63, 1.32
[07:35:52] RECOVERY - Current Load on misc4 is OK: OK - load average: 1.20, 2.29, 1.52
[07:38:50] RECOVERY - JobQueue on mw3 is OK: JOBQUEUE OK - job queue below 300 jobs
[07:54:51] PROBLEM - JobQueue on mw3 is CRITICAL: JOBQUEUE CRITICAL - job queue greater than 300 jobs. Current queue: 2727
[08:50:51] RECOVERY - JobQueue on mw3 is OK: JOBQUEUE OK - job queue below 300 jobs
[08:55:41] RECOVERY - Disk Space on bacula1 is OK: DISK OK - free space: / 102326 MB (21% inode=99%):
[09:13:40] PROBLEM - Disk Space on bacula1 is WARNING: DISK WARNING - free space: / 99451 MB (20% inode=99%):
[09:17:41] PROBLEM - MediaWiki Rendering on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:17:51] PROBLEM - Current Load on misc4 is CRITICAL: CRITICAL - load average: 4.03, 2.78, 1.61
[09:18:11] PROBLEM - GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 172.104.111.8/cpweb, 2400:8902::f03c:91ff:fe07:444e/cpweb
[09:18:41] PROBLEM - Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw3
[09:18:51] PROBLEM - Varnish Backends on cp5 is CRITICAL: 1 backends are down. mw3
[09:19:31] RECOVERY - MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 30103 bytes in 0.186 second response time
[09:19:51] RECOVERY - Current Load on misc4 is OK: OK - load average: 1.29, 2.28, 1.57
[09:20:11] RECOVERY - GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[09:20:41] RECOVERY - Varnish Backends on cp4 is OK: All 3 backends are healthy
[09:20:51] RECOVERY - Varnish Backends on cp2 is OK: All 3 backends are healthy
[09:20:53] RECOVERY - Varnish Backends on cp5 is OK: All 3 backends are healthy
[10:46:31] JohnLewis: hi
[10:46:42] PROBLEM - Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw2
[10:46:52] PROBLEM - GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 1 datacenter is down: 107.191.126.23/cpweb
[10:48:40] RECOVERY - Varnish Backends on cp4 is OK: All 3 backends are healthy
[10:48:50] RECOVERY - Varnish Backends on cp5 is OK: All 3 backends are healthy
[10:48:52] RECOVERY - GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[10:54:41] PROBLEM - Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[10:54:51] PROBLEM - Varnish Backends on cp5 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[10:54:53] PROBLEM - GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 172.104.111.8/cpweb, 2400:8902::f03c:91ff:fe07:444e/cpweb
[10:56:11] PROBLEM - GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 172.104.111.8/cpweb, 2400:8902::f03c:91ff:fe07:444e/cpweb
[10:56:51] PROBLEM - HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 48%
[10:57:31] PROBLEM - HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 42%
[10:59:31] RECOVERY - HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 18%
[11:02:11] RECOVERY - GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[11:02:51] RECOVERY - HTTP 4xx/5xx ERROR Rate on cp2 is OK: OK - NGINX Error Rate is 4%
[11:10:32] Hmm both mw?
[12:00:51] PROBLEM - Current Load on misc4 is WARNING: WARNING - load average: 3.98, 2.67, 1.58
[12:02:51] RECOVERY - Current Load on misc4 is OK: OK - load average: 1.90, 2.43, 1.63
[12:09:31] PROBLEM - Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki core]
[12:13:31] RECOVERY - Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[12:15:31] RECOVERY - Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:06:02] Wait why is a user called JohnFLewis banned on Wikimedia?
[13:06:14] Is it just a coincidence?
[13:18:22] !log rebooting misc4, swift and test1 to gain FUSE.
[13:18:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[13:19:11] PROBLEM - Puppet on test1 is CRITICAL: connect to address 185.52.2.243 port 5666: Connection refused
[13:21:11] PROBLEM - Puppet on test1 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 1 day ago with 1 failures
[13:29:42] PROBLEM - Disk Space on bacula1 is CRITICAL: DISK CRITICAL - free space: / 51816 MB (10% inode=99%):
[14:33:22] RECOVERY - Bacula - Static Swift2 on bacula1 is OK: OK: Full, 677352 files, 28.51GB, 2018-07-15 14:32:00 (1.2 minutes ago)
[14:35:42] PROBLEM - Bacula - Databases - db4 on bacula1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:36:40] PROBLEM - Bacula - Private Git on bacula1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:37:07] hmm
[14:37:20] PROBLEM - Bacula - Static Swift2 on bacula1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[14:41:20] RECOVERY - Bacula - Static Swift2 on bacula1 is OK: OK: Full, 677352 files, 28.51GB, 2018-07-15 14:32:00 (9.2 minutes ago)
[14:41:30] RECOVERY - Bacula - Databases - db4 on bacula1 is OK: OK: Diff, 293946 files, 49.86GB, 2018-07-15 02:57:00 (11.7 hours ago)
[14:41:40] RECOVERY - Bacula Daemon on bacula1 is OK: PROCS OK: 2 processes with UID = 110 (bacula)
[14:42:31] RECOVERY - Bacula - Private Git on bacula1 is OK: OK: Full, 1372 files, 1.467MB, 2018-07-08 00:05:00 (1.1 weeks ago)
[15:00:52] PROBLEM - JobQueue on mw3 is CRITICAL: JOBQUEUE CRITICAL - job queue greater than 300 jobs. Current queue: 2732
[15:52:12] Hello!
[16:54:51] RECOVERY - JobQueue on mw3 is OK: JOBQUEUE OK - job queue below 300 jobs
[16:58:23] !log upgrade misc1 to stretch 9.5 (point release) and reboot
[16:58:28] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[17:03:52] RECOVERY - JobQueue on mw3 is OK: JOBQUEUE OK - job queue below 300 jobs
[17:07:07] PROBLEM - Puppet on misc1 is CRITICAL: CRITICAL: Puppet has 5 failures. Last run 5 minutes ago with 5 failures. Failed resources (up to 3 shown): Package[dirmngr],Package[apache2],Package[salt-minion],Exec[git_pull_dns]
[17:09:07] RECOVERY - Puppet on misc1 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[17:14:33] [miraheze/dns] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNYlu
[17:14:35] [miraheze/dns] paladox a941ebe - Add lizard to subdomain
[17:18:05] Hello grumble1! If you have any questions feel free to ask and someone should answer soon.
[17:20:13] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNYli
[17:20:15] [miraheze/puppet] paladox 80e31cd - Update bacula-dir.conf
[17:26:16] PROBLEM - JobQueue on mw3 is CRITICAL: JOBQUEUE CRITICAL - job queue greater than 300 jobs. Current queue: 2738
[17:44:45] [puppet] JohnFLewis commented on pull request #775: optimising php-fpm - https://git.io/fNY8K
[17:45:03] [puppet] JohnFLewis closed pull request #774: Add defaulter.py - https://git.io/fNJGo
[18:00:05] [miraheze/services] MacFan4000 pushed 1 commit to master [+0/-0/±1] https://git.io/fNY4v
[18:00:07] [miraheze/services] MacFan4000 ddf0966 - Add idolmasterwiki
[18:19:16] okay the job queue is really starting to get to me now
[18:19:43] it has to be solved because it's only a matter of time until people notice things
[18:22:06] paladox: ^^
[18:22:30] why ping paladox?
[18:22:37] it's 100% a mw-admins issue imho
[18:22:45] What can I do then?
[18:23:02] GlobalUserPage creates all of these jobs
[18:24:05] ok
[18:25:18] How can it be fixed to not have high jobqueue?
[18:25:55] JohnLewis: ^^
[18:26:19] it's nothing technical, it's a social issue
[18:26:36] GUP shouldn't create jobs to propagate a userpage when they have it disabled
[18:27:15] every single edit creates around 2.3k jobs, and people usually edit their meta userpage for many different reasons. Someone generated content there over an hour a few days ago and made upwards of 13k jobs
[18:28:22] I don’t know how to fix.
[18:30:35] JohnLewis: ^^
[18:33:08] take it off meta or make it stop using user pages
[18:34:23] It has to be on meta or it won’t work at all, as meta is where the userpages are
[18:34:54] which is why my suggestion is, stop using meta for the purpose
[18:35:27] But won’t the same thing happen, no matter which wiki we use?
[18:35:45] MacFan4000 why ping me?
[18:36:11] JohnLewis we can disable jobs
[18:36:13] well yes but that's the point of the extension isn't it?
[18:36:15] like we do for videos
[18:36:32] paladox: yes, disable them from the jobrunner
[18:36:35] but they still exist
[18:36:36] ok
[18:36:43] and then they're not ran at all
[18:36:45] JohnLewis what's the job name? Or do i have to search for it?
[18:37:09] GlobalUserPageUpdateJob I think
[18:37:14] thanks
[18:37:35] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNYBk
[18:37:37] [miraheze/puppet] paladox 43ad22d - Disable GlobalUserPageUpdateJob in the job runner
[18:37:44] like that john ^^?
[18:38:07] JohnLewis: ^^
[18:38:31] no need to ping john MacFan4000
[18:39:22] uh
[18:39:24] guys
[18:39:45] 1) did no one read what I said immediately after? 2) if we don't have the jobs in the jobrunner… how do you expect the jobs to be ran?
[18:40:05] (I legit just went to get a drink)
[18:40:34] JohnLewis I'm confused now.
[18:40:37] you said [19:36:32] <+JohnLewis> paladox: yes, disable them from the jobrunner
[18:40:45] read what I said immediately after
[18:40:56] JohnLewis> but they still exist
[18:41:00] 14:36 but they still exist
[18:41:12] oh
[18:41:13] [19:36:35] <+JohnLewis> but they still exist
[18:41:14] [19:36:44] <+JohnLewis> and then they're not ran at all
[18:41:32] plus, it's just sort of logical that if you remove jobs from the thing designed to run jobs, you don't run the jobs you've included in the exclusion list
[18:44:32] paladox: ^^
[18:44:47] PROBLEM - Puppet on mw3 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 1 minute ago with 0 failures
[18:44:48] yep i read it
[18:45:10] and now he's gone to try and kill mw3 :P
[18:45:49] Lol
[18:45:59] JohnLewis lol
[19:00:52] [miraheze/mediawiki] paladox pushed 1 commit to REL1_31 [+0/-0/±1] https://git.io/fNYB7
[19:00:54] [miraheze/mediawiki] paladox d303ac1 - Update Comments to upstream
[19:05:36] RECOVERY - JobQueue on mw3 is OK: JOBQUEUE OK - job queue below 300 jobs
[19:05:41] Uhh I am getting 503s for some reason
[19:06:42] hmm
[19:07:17] It’s working now
[19:17:08] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNYR0
[19:17:10] [miraheze/puppet] paladox 21f9993 - Revert "Disable GlobalUserPageUpdateJob in the job runner" This reverts commit 43ad22dc7782fdc6cf68c54a79a3d280b3a94ccf.
[19:17:57] RECOVERY - Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:33:36] PROBLEM - JobQueue on mw3 is CRITICAL: JOBQUEUE CRITICAL - job queue greater than 300 jobs. Current queue: 2317
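(Editor's aside, not from the log: a minimal sketch of how the queue backlog discussed above could be inspected and drained with MediaWiki core's stock maintenance scripts, reusing the /srv/mediawiki/w/maint*/ path from the !log entries. The metawiki target and the GlobalUserPageUpdateJob type name are taken from the discussion above and are illustrative, not confirmed.)

    # Break the queue down by job type to confirm which type is flooding it
    sudo -u www-data php /srv/mediawiki/w/maint*/showJobs.php --wiki metawiki --group
    # Drain only that type in bounded batches rather than removing it from the jobrunner
    sudo -u www-data php /srv/mediawiki/w/maint*/runJobs.php --wiki metawiki \
        --type GlobalUserPageUpdateJob --maxjobs 1000

Excluding the type from the jobrunner, as tried and then reverted above, leaves the jobs queued but never run, which is the point JohnLewis makes; draining them or stopping them at the source are the alternatives.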
[19:48:57] PROBLEM - Puppet on misc4 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 7 minutes ago with 0 failures
[20:17:18] [ManageWiki] paladox synchronize pull request #13: separate extensions in ManageWiki - https://git.io/fNttj
[20:17:19] [miraheze/ManageWiki] paladox pushed 1 commit to Reception123-patch-1 [+0/-0/±1] https://git.io/fNYEB
[20:17:21] [miraheze/ManageWiki] paladox c041284 - Update SpecialManageWikiExtensions.php
[20:19:42] [miraheze/ManageWiki] paladox pushed 1 commit to Reception123-patch-1 [+0/-0/±1] https://git.io/fNYEg
[20:19:44] [miraheze/ManageWiki] paladox 3649484 - Update en.json
[20:19:45] [ManageWiki] paladox synchronize pull request #13: separate extensions in ManageWiki - https://git.io/fNttj
[20:20:37] [miraheze/ManageWiki] paladox pushed 1 commit to Reception123-patch-1 [+0/-0/±1] https://git.io/fNYEw
[20:20:38] [miraheze/ManageWiki] paladox 140d9ab - Update qqq.json
[20:20:40] [ManageWiki] paladox synchronize pull request #13: separate extensions in ManageWiki - https://git.io/fNttj
[20:21:29] [ManageWiki] paladox synchronize pull request #13: separate extensions in ManageWiki - https://git.io/fNttj
[20:21:31] [miraheze/ManageWiki] paladox pushed 1 commit to Reception123-patch-1 [+0/-0/±1] https://git.io/fNYEX
[20:21:32] [miraheze/ManageWiki] paladox d265d34 - Update en.json
[20:23:51] it's fine now
[20:25:05] [ManageWiki] paladox synchronize pull request #13: separate extensions in ManageWiki - https://git.io/fNttj
[20:25:06] [miraheze/ManageWiki] paladox pushed 1 commit to Reception123-patch-1 [+0/-0/±1] https://git.io/fNYE7
[20:25:08] [miraheze/ManageWiki] paladox 1c7d131 - Update en.json
[20:25:28] [ManageWiki] paladox closed pull request #13: separate extensions in ManageWiki - https://git.io/fNttj
[20:25:30] [miraheze/ManageWiki] paladox pushed 12 commits to master [+2/-0/±17] https://git.io/fNYEF
[20:25:31] [miraheze/ManageWiki] paladox 0044ef9 - Merge pull request #13 from miraheze/Reception123-patch-1 separate extensions in ManageWiki
[20:25:33] [ManageWiki] paladox deleted branch Reception123-patch-1 - https://git.io/vpSns
[20:25:34] [miraheze/ManageWiki] paladox deleted branch Reception123-patch-1
[20:27:37] RECOVERY - JobQueue on mw3 is OK: JOBQUEUE OK - job queue below 300 jobs
[20:31:12] [miraheze/mediawiki] paladox pushed 1 commit to REL1_31 [+0/-0/±1] https://git.io/fNYue
[20:31:14] [miraheze/mediawiki] paladox f512e1e - Update ManageWiki
[20:33:00] 1 hour for jobqueue recovery
[20:37:42] [miraheze/mw-config] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNYuZ
[20:37:44] [miraheze/mw-config] paladox bd11fb5 - Default help param to false to avoid index warnings
[20:39:42] paladox: uh
[20:39:47] JohnLewis ?
[20:39:48] it's settings only
[20:39:59] there's no help for extensions
[20:40:08] oh
[20:40:14] ok
[20:41:16] [miraheze/mw-config] paladox pushed 2 commits to master [+0/-0/±2] https://git.io/fNYuz
[20:41:18] [miraheze/mw-config] paladox c1a7efb - Revert "Default help param to false to avoid index warnings" This reverts commit bd11fb519a5f42611100658143b68c60a533ba92.
[20:41:19] [miraheze/mw-config] paladox 53a2395 - Default help param to false to avoid index warnings
[20:41:20] JohnLewis done
[20:43:17] !log sudo -u www-data php /srv/mediawiki/w/maint*/rebuildLocalisationCache.php --wiki test1wiki on mw*
[20:43:21] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[20:44:16] PROBLEM - Puppet on swift2 is CRITICAL: CRITICAL: Puppet has 20 failures. Last run 3 minutes ago with 20 failures. Failed resources (up to 3 shown): Exec[ufw-allow-tcp-from-any-to-any-port-9102],Exec[ufw-allow-tcp-from-any-to-any-port-22],Exec[ufw-allow-tcp-from-any-to-any-port-5666],Exec[ufw-allow-tcp-from-185.52.1.76-to-any-port-9100]
[20:59:26] Because Special:ManageWiki has no link to Special:ManageWikiExtensions ?
[20:59:41] [miraheze/mw-config] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNYzc
[20:59:43] [miraheze/mw-config] paladox d428cd9 - Set site notice for db maintenance
[20:59:49] Wiki-1776 it should
[20:59:50] at the top
[20:59:54] or is it because it hasn't been fixed yet?
[21:00:34] "To manage the extensions for this wiki please see this page instead."
[21:00:35] Wiki-1776 ^^
[21:00:45] https://wiki1776.miraheze.org/wiki/Especial:ManageWikiExtensions
[21:00:46] Title: [ Error de permisos - Wiki ] - wiki1776.miraheze.org
[21:01:56] paladox: -> https://wiki1776.miraheze.org/wiki/Especial:ManageWiki <-
[21:01:57] Title: [ Error de permisos - Wiki ] - wiki1776.miraheze.org
[21:02:12] yep
[21:02:14] links to https://wiki1776.miraheze.org/wiki/Especial:ManageWikiExtensions
[21:02:14] Title: [ Error de permisos - Wiki ] - wiki1776.miraheze.org
[21:07:15] PROBLEM - Puppet on test1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki config]
[21:20:59] hmm
[21:42:53] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNYgF
[21:42:55] [miraheze/puppet] paladox baadf09 - Update container-server.conf.erb
[21:43:02] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNYgb
[21:43:04] [miraheze/puppet] paladox 794f67c - Update object-server.conf.erb
[21:43:18] !log reboot swift2
[21:43:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[21:43:48] seriously swift is getting annoying now with the number of commits
[21:46:16] RECOVERY - Puppet on swift2 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[21:46:50] at least swift doesn't have a fear of commitment
[21:58:36] PROBLEM - Current Load on test1 is CRITICAL: CRITICAL - load average: 4.27, 2.76, 1.44
[21:59:57] PROBLEM - Current Load on misc4 is CRITICAL: CRITICAL - load average: 4.99, 3.47, 2.13
[22:00:38] 4 on test1?
[22:01:17] RECOVERY - Puppet on test1 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[22:02:16] JohnLewis swift is very intrusive!
[22:02:20] (swift client that is)
[22:02:27] PROBLEM - Current Load on swift1 is CRITICAL: CRITICAL - load average: 4.91, 3.46, 1.79
[22:02:33] Swift is on test1?
[22:02:42] Reception123 the client
[22:02:50] to connect to swift1
[22:02:52] and swift2
[22:03:39] Why do we have it on test1 though? Test1 should only be used for testing or temporarily hosting services
[22:04:21] it's doing the latter
[22:05:15] Ok
[22:05:19] Reception123 it's including it in the mw class
[22:05:35] and yeh test1 is for testing which is why it has to also have it
[22:05:37] PROBLEM - Varnish Backends on cp5 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[22:05:45] PROBLEM - GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 172.104.111.8/cpweb, 2400:8902::f03c:91ff:fe07:444e/cpweb
[22:05:55] Hmm why is this happening again
[22:06:05] swift
[22:06:13] need to migrate
[22:06:45] PROBLEM - GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 5 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 172.104.111.8/cpweb, 2400:8902::f03c:91ff:fe07:444e/cpweb
[22:07:17] PROBLEM - Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[22:07:27] PROBLEM - Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[22:07:57] PROBLEM - HTTP 4xx/5xx ERROR Rate on cp5 is CRITICAL: CRITICAL - NGINX Error Rate is 82%
[22:08:07] PROBLEM - MediaWiki Rendering on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:08:27] PROBLEM - Current Load on swift1 is WARNING: WARNING - load average: 3.18, 3.60, 2.41
[22:09:25] RECOVERY - Varnish Backends on cp4 is OK: All 3 backends are healthy
[22:09:35] RECOVERY - Varnish Backends on cp5 is OK: All 3 backends are healthy
[22:09:45] RECOVERY - GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[22:09:55] RECOVERY - HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 4%
[22:09:57] PROBLEM - HTTP 4xx/5xx ERROR Rate on cp5 is WARNING: WARNING - NGINX Error Rate is 48%
[22:10:27] RECOVERY - Current Load on swift1 is OK: OK - load average: 1.82, 2.94, 2.31
[22:10:45] RECOVERY - GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[22:11:05] PROBLEM - MediaWiki Rendering on test1 is WARNING: HTTP WARNING: HTTP/1.1 404 Not Found - 199 bytes in 0.062 second response time
[22:11:15] RECOVERY - Varnish Backends on cp2 is OK: All 3 backends are healthy
[22:11:25] PROBLEM - Puppet on test1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki core]
[22:11:37] 404 on test1?
[22:11:55] RECOVERY - Current Load on misc4 is OK: OK - load average: 1.24, 2.82, 2.78
[22:11:57] RECOVERY - HTTP 4xx/5xx ERROR Rate on cp5 is OK: OK - NGINX Error Rate is 8%
[22:12:36] PROBLEM - Current Load on test1 is WARNING: WARNING - load average: 1.38, 1.87, 1.78
[22:13:15] paladox: is test1 404 you?
[22:13:33] JohnLewis yes, i am fixing puppet. It timed out so i am re cloning mediawiki
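(Editor's aside, not from the log: a rough sketch of what re-cloning the MediaWiki checkout on test1 could look like after the Exec[git_pull_MediaWiki core] timeout above. The /srv/mediawiki/w path and the REL1_31 branch come from commands and pushes earlier in the log; the GitHub URL and the exact steps are assumptions.)

    # Assumed recovery on test1: keep the half-pulled tree, fetch a fresh checkout, let puppet reapply
    cd /srv/mediawiki
    mv w w.broken
    git clone --branch REL1_31 https://github.com/miraheze/mediawiki.git w
    sudo puppet agent -t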
[22:13:57] uh mkay
[22:14:36] RECOVERY - Current Load on test1 is OK: OK - load average: 1.07, 1.59, 1.68
[22:19:07] PROBLEM - MediaWiki Rendering on test1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki configuration Error - 1272 bytes in 0.018 second response time
[22:33:06] RECOVERY - MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 1101 bytes in 0.017 second response time
[22:43:16] RECOVERY - Puppet on test1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:48:01] !log sudo -u www-data php /srv/mediawiki/w/maint*/rebuildLocalisationCache.php --wiki test1wiki on test1
[22:48:08] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[22:54:28] [miraheze/mw-config] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNYaA
[22:54:29] [miraheze/mw-config] paladox db35bc7 - Set wiki's into read only
[22:57:09] !log stopping mysql on db4
[22:58:18] Hi
[22:58:46] hi
[22:59:27] PROBLEM - Puppet on test1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_MediaWiki core]
[22:59:35] PROBLEM - MediaWiki Rendering on mw3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1346 bytes in 0.048 second response time
[22:59:35] !log upgrading mariadb and stretch to 9.5 point release (and also reboot)
[22:59:45] PROBLEM - GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 172.104.111.8/cpweb, 2400:8902::f03c:91ff:fe07:444e/cpweb
[22:59:55] PROBLEM - MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1346 bytes in 0.177 second response time
[23:00:07] PROBLEM - MediaWiki Rendering on mw2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1346 bytes in 0.060 second response time
[23:00:25] PROBLEM - Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[23:00:37] PROBLEM - Varnish Backends on cp5 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[23:00:47] PROBLEM - GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 172.104.111.8/cpweb, 2400:8902::f03c:91ff:fe07:444e/cpweb
[23:01:06] PROBLEM - MySQL on db4 is CRITICAL: Cant connect to MySQL server on 81.4.109.166 (111 Connection refused)
[23:01:15] PROBLEM - Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[23:01:45] PROBLEM - HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 89%
[23:01:55] PROBLEM - HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 93%
[23:03:16] PROBLEM - Puppet on db4 is CRITICAL: CRITICAL: Puppet has 6 failures. Last run 2 minutes ago with 6 failures. Failed resources (up to 3 shown): Package[postgresql-9.6],Package[postgresql-client-9.6],Package[postgresql-contrib-9.6],Package[check-postgres]
[23:03:56] PROBLEM - HTTP 4xx/5xx ERROR Rate on cp5 is CRITICAL: CRITICAL - NGINX Error Rate is 64%
[23:09:56] RECOVERY - MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 31055 bytes in 0.342 second response time
[23:10:08] RECOVERY - MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 31054 bytes in 0.107 second response time
[23:10:26] RECOVERY - Varnish Backends on cp4 is OK: All 3 backends are healthy
[23:10:36] RECOVERY - Varnish Backends on cp5 is OK: All 3 backends are healthy
[23:11:46] PROBLEM - HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 49%
[23:13:46] PROBLEM - HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 82%
[23:13:56] PROBLEM - Bacula - Databases - db4 on bacula1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:14:06] PROBLEM - MediaWiki Rendering on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:14:16] PROBLEM - MediaWiki Rendering on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:14:26] PROBLEM - Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[23:14:36] PROBLEM - Varnish Backends on cp5 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[23:17:16] RECOVERY - Puppet on db4 is OK: OK: Puppet is currently enabled, last run 6 minutes ago with 0 failures
[23:23:07] PROBLEM - Puppet on misc1 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:24:17] PROBLEM - MySQL on db4 is CRITICAL: Cant connect to MySQL server on 81.4.109.166 (111 Connection refused)
[23:24:57] PROBLEM - Disk Space on db4 is CRITICAL: connect to address 81.4.109.166 port 5666: Connection refused
[23:25:17] RECOVERY - Puppet on test1 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[23:25:47] PROBLEM - Current Load on db4 is CRITICAL: connect to address 81.4.109.166 port 5666: Connection refused
[23:26:17] PROBLEM - SSH on db4 is CRITICAL: connect to address 81.4.109.166 and port 22: Connection refused
[23:28:15] RECOVERY - SSH on db4 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u3 (protocol 2.0)
[23:29:05] PROBLEM - MediaWiki Rendering on test1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1346 bytes in 0.123 second response time
[23:29:45] RECOVERY - Bacula - Databases - db4 on bacula1 is OK: OK: Diff, 293946 files, 49.86GB, 2018-07-15 02:57:00 (20.5 hours ago)
[23:32:47] RECOVERY - GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[23:33:05] RECOVERY - MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 31061 bytes in 0.531 second response time
[23:33:15] RECOVERY - Varnish Backends on cp2 is OK: All 3 backends are healthy
[23:33:35] RECOVERY - MediaWiki Rendering on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 31054 bytes in 0.112 second response time
[23:33:45] RECOVERY - GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[23:33:55] RECOVERY - HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 2%
[23:35:55] PROBLEM - HTTP 4xx/5xx ERROR Rate on cp5 is WARNING: WARNING - NGINX Error Rate is 43%
[23:37:22] [miraheze/mw-config] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/fNYwL
[23:37:24] [miraheze/mw-config] paladox e0de68e - Revert "Set wiki's into read only" This reverts commit db35bc7adacd4e027934a87d83993b108c719940.
[23:37:56] RECOVERY - HTTP 4xx/5xx ERROR Rate on cp5 is OK: OK - NGINX Error Rate is 14% [23:39:26] [02miraheze/mw-config] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fNYwt [23:39:27] [02miraheze/mw-config] 07paladox 03d77b2a4 - Revert "Set site notice for db maintenance" This reverts commit d428cd9cc5ffe6f51a803cc58038e6d49cd0e531. [23:44:56] there a mistake with the maintenance? [23:45:47] Wiki-1776 nope, it has been done. [23:46:04] !log [23:57:09] <+paladox> !log stopping mysql on db4 [23:46:08] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [23:46:14] !log [23:59:35] <+paladox> !log upgrading mariadb and stretch to 9.5 point release (and also reboot) [23:46:17] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [23:47:27] ah, ok
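(Editor's aside, not from the log: a minimal sketch of the db4 maintenance window just logged, assuming db4 runs Debian's stock mariadb-server packages; the read-only and site-notice switches were handled through the mw-config commits above rather than on the host.)

    # On db4, roughly matching the "stopping mysql" and "upgrading mariadb and stretch
    # to 9.5 point release (and also reboot)" !log entries above
    systemctl stop mariadb                     # assumed service name for the "stopping mysql" step
    apt-get update && apt-get dist-upgrade -y  # pulls in the stretch 9.5 point release and mariadb update
    reboot
    # afterwards: wait for the MySQL and MediaWiki Rendering checks to recover, then revert
    # the read-only and site-notice commits in mw-config (as done at 23:37 and 23:39)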