[00:02:48] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 2 datacenters are down: 2400:6180:0:d0::403:f001/cpweb, 81.4.109.133/cpweb
[00:04:49] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[00:20:41] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JeKJL
[00:20:42] [miraheze/puppet] paladox 88d8ca1 - Update mount.pp
[00:24:34] PROBLEM - lizardfs6 Puppet on lizardfs6 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Mount[/mnt/mediawiki-static]
[00:34:42] RECOVERY - lizardfs6 Puppet on lizardfs6 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:08:18] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[02:13:17] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[02:31:11] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 2400:6180:0:d0::403:f001/cpweb
[02:33:14] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[06:27:46] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 2727 MB (11% inode=94%);
[08:18:38] PROBLEM - cp3 Disk Space on cp3 is WARNING: DISK WARNING - free space: / 2649 MB (10% inode=94%);
[13:23:42] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JeKs1
[13:23:43] [miraheze/puppet] paladox 154d3d7 - bacula: Update comment
[13:26:39] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JeKsp
[13:26:41] [miraheze/puppet] paladox 8ffa4bb - bacula: Fix calculation (From John)
[13:26:41] !log Just an update restarting import for T4821
[13:26:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[13:43:35] [puppet] Pix1234 opened pull request #1150: Upgrade mw[123] and lizardfs6 to php 7.3 - https://git.io/JeKG0
[13:44:55] [puppet] paladox closed pull request #1150: Upgrade mw[123] and lizardfs6 to php 7.3 - https://git.io/JeKG0
[13:44:57] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±4] https://git.io/JeKGz
[13:44:58] [miraheze/puppet] Pix1234 1d2dda0 - Upgrade mw[123] and lizardfs6 to php 7.3 (#1150) * Upgrade php to 7.3 * Upgrade php to 7.3 * Upgrade php to 7.3 * Upgrade to php 7.3 while lizardfs6 is running MW
[13:45:43] !log depool, upgrade php7.2 to php7.3 and repool on mw[123] and lizardfs6
[13:45:50] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[13:47:07] PROBLEM - mw2 Puppet on mw2 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 3 minutes ago with 0 failures
[13:47:21] PROBLEM - mw3 Puppet on mw3 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 4 minutes ago with 0 failures
[13:48:21] PROBLEM - lizardfs6 Puppet on lizardfs6 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 5 minutes ago with 0 failures
[13:59:06] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[13:59:09] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. mw2
[13:59:21] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. mw2
[14:01:08] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[14:01:21] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[14:03:15] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:08:23] RECOVERY - lizardfs6 Puppet on lizardfs6 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:09:22] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. lizardfs6
[14:09:53] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. lizardfs6
[14:10:06] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. lizardfs6
[14:11:21] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 6 backends are healthy
[14:11:56] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy
[14:12:03] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 6 backends are healthy
[14:17:46] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JeKZm
[14:17:47] [miraheze/puppet] paladox 05619d5 - Update php.pp
[14:29:37] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 1 backends are down. lizardfs6
[14:30:11] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 1 backends are down. lizardfs6
[14:30:23] PROBLEM - lizardfs6 Puppet on lizardfs6 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 7 minutes ago with 0 failures
[14:30:37] PROBLEM - lizardfs6 php-fpm on lizardfs6 is CRITICAL: PROCS CRITICAL: 0 processes with command name 'php-fpm7.3'
[14:31:21] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 1 backends are down. lizardfs6
[14:31:23] PROBLEM - cp3 Stunnel Http for misc3 on cp3 is CRITICAL: HTTP CRITICAL - No data received from host
[14:31:45] PROBLEM - cp4 Stunnel Http for misc3 on cp4 is CRITICAL: HTTP CRITICAL - No data received from host
[14:31:56] PROBLEM - cp2 Stunnel Http for misc3 on cp2 is CRITICAL: HTTP CRITICAL - No data received from host
[14:45:49] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JeKZx
[14:45:50] [miraheze/puppet] paladox 55398da - Update init.pp
[14:54:10] PROBLEM - cp4 Stunnel Http for lizardfs6 on cp4 is CRITICAL: NRPE: Command 'check_stunnel_lizardfs6' not defined
[14:56:25] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JeKnY
[14:56:27] [miraheze/puppet] paladox bdc7ff0 - Update nrpe.cfg.erb
[15:03:24] PROBLEM - cp2 Stunnel Http for lizardfs6 on cp2 is CRITICAL: HTTP CRITICAL - No data received from host
[15:04:16] PROBLEM - cp3 Stunnel Http for lizardfs6 on cp3 is CRITICAL: HTTP CRITICAL - No data received from host
[15:57:31] JohnLewis: I guess that was because of a security problem.
[15:57:44] hi hispano76
[16:16:33] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 3 minutes ago with 3 failures. Failed resources (up to 3 shown): Exec[git_checkout_landing],Exec[ufw-allow-tcp-from-any-to-any-port-80],Exec[ufw-allow-tcp-from-any-to-any-port-443]
[16:17:56] huh
[16:24:42] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:25:06] good
[16:46:42] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus-php-fpm]
[16:48:31] RhinosF1: hi
[17:19:41] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JeKWA
[17:19:43] [miraheze/puppet] paladox 830b5a8 - Update mount.pp
[17:20:07] [miraheze/MirahezeMagic] translatewiki pushed 1 commit to master [+0/-0/±1] https://git.io/JeKWp
[17:20:09] [miraheze/MirahezeMagic] translatewiki d5362af - Localisation updates from https://translatewiki.net.
[17:20:10] [ Main page - translatewiki.net ] - translatewiki.net.
[17:23:45] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:24:54] ping paladox
[17:25:01] hi?
[17:30:02] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[17:42:03] !log reboot mw1
[17:42:13] !log also depooled then repool mw1
[17:42:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[17:42:26] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[17:44:14] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[18:32:55] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 3 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb
[18:34:53] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[19:49:36] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JeKBg
[19:49:38] [miraheze/puppet] paladox 87f980c - varnish: Remove lizardfs6
[19:51:43] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[19:52:09] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[19:53:08] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 5 backends are healthy
[20:05:08] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/JeKBA
[20:05:09] [miraheze/services] MirahezeSSLBot 24722ef - BOT: Updating services config for wikis
[20:21:24] RECOVERY - lizardfs6 Puppet on lizardfs6 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[20:23:38] RECOVERY - cp2 Stunnel Http for lizardfs6 on cp2 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 0.433 second response time
[20:23:45] RECOVERY - cp3 Stunnel Http for lizardfs6 on cp3 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 1.003 second response time
[20:23:50] RECOVERY - cp4 Stunnel Http for lizardfs6 on cp4 is OK: HTTP OK: Status line output matched "401" - 381 bytes in 0.058 second response time
[20:24:08] RECOVERY - lizardfs6 php-fpm on lizardfs6 is OK: PROCS OK: 41 processes with command name 'php-fpm7.3'
[20:27:38] PROBLEM - cp2 Stunnel Http for lizardfs6 on cp2 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8203: HTTP/1.1 200 OK
[20:27:45] PROBLEM - cp3 Stunnel Http for lizardfs6 on cp3 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8203: HTTP/1.1 200 OK
[20:27:49] PROBLEM - cp4 Stunnel Http for lizardfs6 on cp4 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8203: HTTP/1.1 200 OK
[20:38:34] Voidwalker: ehm, you taking care of the translations in Mediawiki? this is because I already translated the ad into Spanish.
[20:39:10] oh yes
[20:40:20] hispano76, published
[20:40:31] Ok, thanks Voidwalker :)
[20:40:59] no problem, thanks for the ping :)
[21:47:17] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 4 datacenters are down: 107.191.126.23/cpweb, 128.199.139.216/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb
[21:49:15] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[22:16:36] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[22:20:07] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/JeKuT
[22:20:09] [miraheze/services] MirahezeSSLBot 5eac202 - BOT: Updating services config for wikis
[22:32:22] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:46:35] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 3 minutes ago with 3 failures. Failed resources (up to 3 shown): Exec[git_checkout_landing],Exec[ufw-allow-tcp-from-any-to-any-port-80],Exec[ufw-allow-tcp-from-any-to-any-port-443]
[22:58:51] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[23:05:16] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Puppet has 9 failures. Last run 3 minutes ago with 9 failures. Failed resources (up to 3 shown): Exec[ufw-allow-tcp-from-185.52.3.121-to-any-port-9253],Exec[ufw-allow-tcp-from-185.52.3.121-to-any-port-9113],Package[php7.3-apcu],Package[php7.3-redis]
[23:06:14] paladox: ufw exec fails ^
[23:06:28] yup, OOM apparently
[23:06:36] ouch
[23:06:54] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JeKzG
[23:06:55] [miraheze/puppet] paladox 84d8047 - Update mount.pp
[23:07:51] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JeKzC
[23:07:53] [miraheze/puppet] paladox d04ad1d - Update mediawiki.pp
[23:08:34] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 2604:180:0:33b::2/cpweb
[23:09:22] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 2 datacenters are down: 2604:180:0:33b::2/cpweb, 128.199.139.216/cpweb
[23:09:40] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:10:02] PROBLEM - cp2 Stunnel Http for mw2 on cp2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:10:39] paladox: (very) random guess: search engine scraper bots are indexing all wikis
[23:10:53] heh, nope. Seems to be when puppet runs.
[23:10:57] ok
[23:11:16] "not enough RAM for puppet" sounds like the webserver gets too much of it
[23:11:18] though
[23:11:32] PROBLEM - cp4 Stunnel Http for mw2 on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:11:39] PROBLEM - cp3 Stunnel Http for mw2 on cp3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[23:11:48] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 19020 bytes in 0.413 second response time
[23:12:46] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[23:13:31] RECOVERY - cp4 Stunnel Http for mw2 on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 24639 bytes in 0.004 second response time
[23:13:35] RECOVERY - cp3 Stunnel Http for mw2 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 24639 bytes in 0.683 second response time
[23:13:45] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[23:13:47] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JeKzu
[23:13:48] [miraheze/puppet] paladox 9eda91f - Update mount.pp
[23:14:17] RECOVERY - cp2 Stunnel Http for mw2 on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 24639 bytes in 9.881 second response time
[23:17:15] !log restart php7.3-fpm on mw2
[23:17:50] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[23:18:31] !log restart php7.3-fpm on mw1
[23:18:49] mutante fixed!
[23:19:19] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[23:19:27] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[23:19:36] paladox: the restart killed the googlebot connections ?:)
[23:19:48] paladox: cool!
[23:19:54] nope my puppet change seemed to have lowered the amount of ram is used apparently
[23:20:17] oh.. did not notice the puppet change. where was that
[23:20:55] mutante https://git.io/JeKzG
[23:20:56] [ Comparing 87f980cdeddf...84d8047ce49e · miraheze/puppet · GitHub ] - git.io
[23:21:38] i suspect a memory leak or something in join/delete_undef_values
[23:22:00] paladox: you changed the mount options?
[23:22:11] nope, i simplified it
[23:24:34] "code simplification saves RAM on mw application server" heh
[23:24:41] :P
[23:26:10] [miraheze/puppet] paladox pushed 1 commit to master [+0/-0/±1] https://git.io/JeKzX
[23:26:12] [miraheze/puppet] paladox 6dbec84 - Update mount.pp
[23:26:40] hmm
[23:26:44] seems to keep happening
[23:26:52] though at the moment puppet is running
[23:28:06] run it manually?
[23:29:35] yup, i did which works.
[23:29:54] So i'm going to leave it for now and if it keeps happening, have a closer look at it.
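
(Editor's note: the PROBLEM/RECOVERY alerts above follow a regular shape, so checking whether outages cluster around puppet runs can be done mechanically. The following is only a rough sketch, not anything the channel participants ran. The names `ALERT_RE` and `outage_durations` are hypothetical, and it assumes one log entry per line with all timestamps on the same day.)

```python
import re
from datetime import datetime

# Matches alert lines such as:
# [02:08:18] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: ...
ALERT_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\] "
    r"(?P<kind>PROBLEM|RECOVERY) - "
    r"(?P<check>.+?) is (?P<state>OK|WARNING|CRITICAL|UNKNOWN):"
)


def outage_durations(lines):
    """Pair each PROBLEM with the next RECOVERY for the same check.

    Returns a list of (check, timedelta) tuples. Assumes every
    timestamp falls within a single day (no midnight rollover).
    """
    open_since = {}  # check name -> time of the first unresolved PROBLEM
    durations = []
    for line in lines:
        m = ALERT_RE.match(line)
        if not m:
            continue  # chat messages, commit notices, !log entries
        when = datetime.strptime(m["time"], "%H:%M:%S")
        check = m["check"]
        if m["kind"] == "PROBLEM":
            # Keep the earliest PROBLEM if the alert repeats before recovery.
            open_since.setdefault(check, when)
        elif check in open_since:
            durations.append((check, when - open_since.pop(check)))
    return durations


sample = [
    "[02:08:18] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3",
    "[02:13:17] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 6 backends are healthy",
]
for check, duration in outage_durations(sample):
    print(check, duration)  # cp2 Varnish Backends on cp2 0:04:59
```

(A RECOVERY with no earlier PROBLEM, like the cp3 Disk Space entry at 06:27:46, is skipped because its start time is not in the log.)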