[00:05:09] PROBLEM - cp5 HTTPS on cp5 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 3814 bytes in 1.075 second response time
[00:05:27] PROBLEM - mw3 SSH on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:05:37] PROBLEM - lizardfs2 SSH on lizardfs2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:05:53] PROBLEM - db4 Puppet on db4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:05:57] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 172.104.111.8/cpweb, 2400:8902::f03c:91ff:fe07:444e/cpweb
[00:06:11] PROBLEM - lizardfs2 Puppet on lizardfs2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:06:15] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[00:06:19] PROBLEM - db4 Current Load on db4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:06:31] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:06:37] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[00:06:43] PROBLEM - mw3 JobChron Service on mw3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:06:49] PROBLEM - cp5 Varnish Backends on cp5 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[00:07:09] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 87%
[00:07:13] PROBLEM - db4 SSH on db4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:07:17] PROBLEM - mw3 Current Load on mw3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:07:19] seconds.
[00:07:23] PROBLEM - mw3 HTTPS on mw3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:07:25] PROBLEM - lizardfs2 Current Load on lizardfs2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:07:27] PROBLEM - cp4 HTTPS on cp4 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 3801 bytes in 0.076 second response time
[00:07:33] PROBLEM - Host lizardfs2 is DOWN: PING CRITICAL - Packet loss = 100%
[00:07:35] PROBLEM - misc1 HTTPS on misc1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:07:39] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 6 datacenters are down: 107.191.126.23/cpweb, 2604:180:0:33b::2/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 172.104.111.8/cpweb, 2400:8902::f03c:91ff:fe07:444e/cpweb
[00:07:41] PROBLEM - misc2 HTTPS on misc2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:07:49] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 94%
[00:07:55] PROBLEM - cp2 HTTPS on cp2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 3801 bytes in 0.408 second response time
[00:08:03] PROBLEM - bacula1 Bacula Databases db4 on bacula1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[00:08:39] PROBLEM - bacula1 Bacula Lizardfs2 Lizardfs Chunkserver2 on bacula1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
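The CHECK_NRPE socket-timeout results above mean Icinga got no answer from the NRPE agent on the target host within 10 seconds, which can indicate either a hung agent or a host that is down entirely. A minimal way to tell the two apart by hand from the monitoring host, assuming the standard Debian plugin path and an illustrative host/command name:

    # Re-run one of the timed-out checks manually (host and command name are examples):
    /usr/lib/nagios/plugins/check_nrpe -H mw3.miraheze.org -c check_load -t 10
    # If ICMP also fails, the whole host (not just the agent) is unreachable:
    ping -c 3 mw3.miraheze.org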
[00:08:43] PROBLEM - bacula1 Bacula Static Lizardfs2 on bacula1 is CRITICAL: CRITICAL: Timeout or unknown client: lizardfs2-fd
[00:08:45] PROBLEM - db4 MySQL on db4 is UNKNOWN:
[00:09:03] PROBLEM - Host mw3 is DOWN: PING CRITICAL - Packet loss = 100%
[00:09:11] RECOVERY - db4 SSH on db4 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u4 (protocol 2.0)
[00:09:31] RECOVERY - misc1 HTTPS on misc1 is OK: HTTP OK: HTTP/1.1 302 Found - 390 bytes in 0.052 second response time
[00:09:51] RECOVERY - db4 Puppet on db4 is OK: OK: Puppet is currently enabled, last run 9 minutes ago with 0 failures
[00:09:59] RECOVERY - bacula1 Bacula Databases db4 on bacula1 is OK: OK: Diff, 60663 files, 60.17GB, 2018-10-21 04:07:00 (1.4 weeks ago)
[00:10:17] RECOVERY - db4 Current Load on db4 is OK: OK - load average: 0.84, 0.26, 0.09
[00:10:39] PROBLEM - mw2 MediaWiki Rendering on mw2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 3871 bytes in 0.110 second response time
[00:10:41] RECOVERY - misc4 phabricator.miraheze.org HTTPS on misc4 is OK: HTTP OK: HTTP/1.1 200 OK - 18290 bytes in 0.207 second response time
[00:10:43] PROBLEM - test1 MediaWiki Rendering on test1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 3871 bytes in 0.041 second response time
[00:10:45] RECOVERY - db4 MySQL on db4 is OK: Uptime: 105 Threads: 55 Questions: 2062 Slow queries: 0 Opens: 138 Flush tables: 1 Open tables: 132 Queries per second avg: 19.638
[00:10:47] PROBLEM - mw1 MediaWiki Rendering on mw1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 3871 bytes in 0.033 second response time
[00:11:11] RECOVERY - cp5 HTTPS on cp5 is OK: HTTP OK: HTTP/1.1 200 OK - 23533 bytes in 2.642 second response time
[00:11:27] RECOVERY - cp4 HTTPS on cp4 is OK: HTTP OK: HTTP/1.1 200 OK - 23533 bytes in 0.043 second response time
[00:11:37] RECOVERY - misc2 HTTPS on misc2 is OK: HTTP OK: HTTP/1.1 200 OK - 40789 bytes in 0.130 second response time
[00:11:49] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is WARNING: WARNING - NGINX Error Rate is 53%
[00:11:53] RECOVERY - cp2 HTTPS on cp2 is OK: HTTP OK: HTTP/1.1 200 OK - 23565 bytes in 0.525 second response time
[00:12:39] RECOVERY - mw2 MediaWiki Rendering on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 19040 bytes in 0.069 second response time
[00:12:43] RECOVERY - test1 MediaWiki Rendering on test1 is OK: HTTP OK: HTTP/1.1 200 OK - 19040 bytes in 0.029 second response time
[00:12:47] RECOVERY - mw1 MediaWiki Rendering on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 19040 bytes in 0.047 second response time
[00:13:07] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 41%
[00:13:31] RECOVERY - Host mw3 is UP: PING OK - Packet loss = 0%, RTA = 0.56 ms
[00:13:49] PROBLEM - cp2 HTTP 4xx/5xx ERROR Rate on cp2 is CRITICAL: CRITICAL - NGINX Error Rate is 85%
[00:15:08] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is CRITICAL: CRITICAL - NGINX Error Rate is 83%
[00:15:32] PROBLEM - mw2 HTTPS on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:15:54] PROBLEM - cp2 HTTPS on cp2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 3801 bytes in 0.407 second response time
[00:16:40] PROBLEM - misc4 phabricator.miraheze.org HTTPS on misc4 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 4127 bytes in 0.057 second response time
[00:16:46] PROBLEM - db4 MySQL on db4 is CRITICAL: Can't connect to MySQL server on '81.4.109.166' (111 "Connection refused")
[00:17:06] connect to host 185.52.1.75 port 5666: Connection refused
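The "HTTP 4xx/5xx ERROR Rate" checks report what share of responses served by nginx on the cache proxies are errors. A rough manual cross-check of that percentage from a proxy's access log, assuming the default combined log format with the status code in field 9 (the actual plugin and log path used here may differ):

    tail -n 1000 /var/log/nginx/access.log |
      awk '{ if ($9 >= 400) err++ } END { printf "error rate: %.0f%%\n", 100 * err / NR }'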
[00:17:10] PROBLEM - cp5 HTTPS on cp5 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 3814 bytes in 1.025 second response time
[00:17:28] PROBLEM - cp4 HTTPS on cp4 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 3801 bytes in 0.016 second response time
[00:17:54] PROBLEM - mw1 php7.2-fpm on mw1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:18:04] PROBLEM - mw1 Mirahezerenewssl on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:18:18] PROBLEM - mw1 HTTPS on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:18:21] !log upgrade mariadb to 10.2.18
[00:18:40] RECOVERY - misc4 phabricator.miraheze.org HTTPS on misc4 is OK: HTTP OK: HTTP/1.1 200 OK - 18290 bytes in 0.181 second response time
[00:18:44] RECOVERY - db4 MySQL on db4 is OK: Uptime: 158 Threads: 50 Questions: 4490 Slow queries: 0 Opens: 226 Flush tables: 1 Open tables: 220 Queries per second avg: 28.417
[00:19:00] PROBLEM - mw1 Disk Space on mw1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:19:20] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:19:44] PROBLEM - mw2 Current Load on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:20:08] PROBLEM - mw2 Disk Space on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:20:52] PROBLEM - mw2 SSH on mw2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:21:20] PROBLEM - mw2 php7.2-fpm on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:21:24] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:21:28] PROBLEM - Host mw1 is DOWN: PING CRITICAL - Packet loss = 100%
[00:21:50] PROBLEM - Host mw2 is DOWN: PING CRITICAL - Packet loss = 100%
[00:24:54] RECOVERY - mw3 php7.2-fpm on mw3 is OK: PROCS OK: 5 processes with command name 'php-fpm7.2'
[00:24:58] RECOVERY - mw3 HTTPS on mw3 is OK: HTTP OK: HTTP/1.1 200 OK - 22759 bytes in 0.016 second response time
[00:25:02] RECOVERY - mw3 JobRunner Service on mw3 is OK: PROCS OK: 1 process with args 'redisJobRunnerService'
[00:25:16] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 3.01, 0.77, 0.26
[00:25:24] RECOVERY - mw3 SSH on mw3 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u4 (protocol 2.0)
[00:25:34] RECOVERY - mw3 Disk Space on mw3 is OK: DISK OK - free space: / 60754 MB (79% inode=99%);
[00:26:18] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:26:20] RECOVERY - Host mw2 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms
[00:26:22] RECOVERY - mw3 JobChron Service on mw3 is OK: PROCS OK: 1 process with args 'redisJobChronService'
[00:26:26] RECOVERY - mw2 Disk Space on mw2 is OK: DISK OK - free space: / 27688 MB (36% inode=98%);
[00:27:10] RECOVERY - mw2 SSH on mw2 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u4 (protocol 2.0)
[00:27:22] RECOVERY - mw2 php7.2-fpm on mw2 is OK: PROCS OK: 5 processes with command name 'php-fpm7.2'
[00:27:26] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[00:27:36] RECOVERY - mw2 HTTPS on mw2 is OK: HTTP OK: HTTP/1.1 200 OK - 22759 bytes in 0.016 second response time
[00:27:42] RECOVERY - mw2 Current Load on mw2 is OK: OK - load average: 3.06, 0.89, 0.30
[00:33:49] PROBLEM - bacula1 Bacula Lizardfs2 Lizardfs Chunkserver2 on bacula1 is UNKNOWN:
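The db4 MySQL recovery output above (Uptime, Threads, Questions, Queries per second avg) is the summary line that mysqladmin status prints, which is what confirms the server came back after the MariaDB 10.2.18 upgrade logged at [00:18:21]. A hand-run equivalent on db4, assuming client credentials are already configured:

    mysqladmin status
    # or the same counters via SQL:
    mysql -e 'SHOW GLOBAL STATUS LIKE "Uptime"; SHOW GLOBAL STATUS LIKE "Threads_connected";'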
[00:34:31] [02miraheze/dns] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fxFCU
[00:34:32] [02miraheze/dns] 07paladox 038b6c27f - Depool cp4
[00:34:53] RECOVERY - Host mw1 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms
[00:35:11] RECOVERY - mw1 HTTPS on mw1 is OK: HTTP OK: HTTP/1.1 200 OK - 22759 bytes in 0.047 second response time
[00:35:13] RECOVERY - mw1 SSH on mw1 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u4 (protocol 2.0)
[00:35:17] RECOVERY - mw1 Disk Space on mw1 is OK: DISK OK - free space: / 33695 MB (43% inode=99%);
[00:35:19] PROBLEM - mw3 Current Load on mw3 is CRITICAL: CRITICAL - load average: 11.88, 10.10, 5.50
[00:35:49] PROBLEM - bacula1 Bacula Lizardfs2 Lizardfs Chunkserver2 on bacula1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[00:37:17] PROBLEM - lizardfs1 Puppet on lizardfs1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:37:18] !log killing services on mw1
[00:38:35] PROBLEM - bacula1 Bacula Static Lizardfs1 on bacula1 is CRITICAL: CRITICAL: Timeout or unknown client: lizardfs1-fd
[00:38:37] PROBLEM - bacula1 Bacula Lizardfs1 Lizardfs Chunkserver1 on bacula1 is CRITICAL: CRITICAL: Timeout or unknown client: lizardfs1-fd
[00:38:49] PROBLEM - lizardfs1 Current Load on lizardfs1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:39:17] PROBLEM - lizardfs1 Disk Space on lizardfs1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:39:23] PROBLEM - lizardfs1 Lizardfs Chunkserver Port on lizardfs1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:39:25] PROBLEM - lizardfs1 SSH on lizardfs1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:39:59] PROBLEM - Host lizardfs1 is DOWN: PING CRITICAL - Packet loss = 100%
[00:40:13] paladox: all mediawiki servers are experiencing the same issues, make it an official downtime
[00:40:22] Ok
[00:40:28] official downtime?
[00:40:43] (announce the downtime in all official channels)
[00:40:52] topic, twitter, facebook, whatever we're using
[00:41:01] PROBLEM - puppet1 Puppet on puppet1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:41:03] PROBLEM - puppet1 Disk Space on puppet1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:41:07] oh, i have no access to any of them i think
[00:41:19] SPF|Cloud https://twitter.com/miraheze?ref_src=twsrc%5Etfw%7Ctwcamp%5Eembeddedtimeline%7Ctwterm%5Eprofile%3Amiraheze&ref_url=https%3A%2F%2Flogin.miraheze.org%2Fwiki%2FMain_Page
[00:41:21] Title: [ Miraheze (@miraheze) | Twitter ] - twitter.com
[00:41:25] PROBLEM - puppet1 SSH on puppet1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:41:31] PROBLEM - puppet1 Current Load on puppet1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:41:44] oh good lord the puppet machine is down
[00:41:53] PROBLEM - bacula1 Bacula Private Git on bacula1 is CRITICAL: CRITICAL: Timeout or unknown client: puppet1-fd
[00:42:21] PROBLEM - Host puppet1 is DOWN: PING CRITICAL - Packet loss = 100%
[00:42:23] PROBLEM - test1 Current Load on test1 is CRITICAL: CRITICAL - load average: 3.56, 1.76, 0.72
[00:42:37] PROBLEM - mw1 php7.2-fpm on mw1 is CRITICAL: PROCS CRITICAL: 0 processes with command name 'php-fpm7.2'
[00:42:51] PROBLEM - ns1 Puppet on ns1 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
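The dns commit at [00:34:31]-[00:34:32] depools cp4 so traffic stops being directed to the failing cache proxy. Whether the depool has propagated can be checked against the authoritative nameserver; a sketch using dig, with the queried record name chosen only as an example:

    dig +short A miraheze.org @ns1.miraheze.org
    # the depooled proxy's address should no longer appear in the answers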
[00:43:07] PROBLEM - misc3 Current Load on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:43:12] PROBLEM - misc3 Puppet on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:43:16] PROBLEM - bacula1 Puppet on bacula1 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:43:26] !.log rebooting mw1
[00:43:28] PROBLEM - misc3 Parsoid on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:43:30] PROBLEM - misc3 SSH on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:43:38] PROBLEM - misc2 Puppet on misc2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:43:48] PROBLEM - misc2 Prometheus on misc2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:44:22] PROBLEM - Host misc2 is DOWN: PING CRITICAL - Packet loss = 100%
[00:44:44] PROBLEM - misc3 Disk Space on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:44:54] PROBLEM - Host misc3 is DOWN: PING CRITICAL - Packet loss = 100%
[00:45:06] PROBLEM - cp2 Puppet on cp2 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:45:12] PROBLEM - cp5 Puppet on cp5 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:45:44] PROBLEM - misc4 Lizardfs Master Port 1 on misc4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:45:48] PROBLEM - misc4 Lizardfs Master Port 2 on misc4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:46:18] RECOVERY - Host misc2 is UP: PING OK - Packet loss = 0%, RTA = 0.42 ms
[00:46:20] RECOVERY - bacula1 Bacula Private Git on bacula1 is OK: OK: Full, 1613 files, 1.858MB, 2018-10-28 00:08:00 (3.0 days ago)
[00:46:20] great, SolusVM is giving up as well
[00:46:22] out after 10 seconds.
[00:46:24] RECOVERY - misc2 Current Load on misc2 is OK: OK - load average: 1.00, 0.29, 0.10
[00:46:31] mw1 is gone
[00:46:38] RECOVERY - misc2 Disk Space on misc2 is OK: DISK OK - free space: / 31525 MB (76% inode=98%);
[00:46:40] PROBLEM - misc4 phd on misc4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:46:43] SPF|Cloud works for me
[00:46:50] RECOVERY - Host puppet1 is UP: PING OK - Packet loss = 0%, RTA = 0.41 ms
[00:46:52] RECOVERY - misc2 SSH on misc2 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u4 (protocol 2.0)
[00:46:54] PROBLEM - misc4 lizard.miraheze.org HTTPS on misc4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:46:56] PROBLEM - misc3 Restbase on misc3 is CRITICAL: connect to address 185.52.1.144 and port 7231: Connection refused
[00:46:58] SPF|Cloud i can get mw1 up for you
[00:47:00] RECOVERY - misc3 Electron on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.144 port 3000
[00:47:02] RECOVERY - misc2 HTTPS on misc2 is OK: HTTP OK: HTTP/1.1 200 OK - 40789 bytes in 2.725 second response time
[00:47:04] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
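The "Catalog fetch fail. Either compilation failed or puppetmaster has issues" alerts are expected fallout while puppet1 (the puppetmaster) is down: agents cannot fetch a compiled catalog at all. Once puppet1 is reachable again, a single dry run on any affected node shows whether catalog compilation still fails; this uses the standard Puppet agent CLI:

    puppet agent --test --noop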
[00:47:06] PROBLEM - mw1 Mirahezerenewssl on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:47:08] PROBLEM - misc4 phabricator.miraheze.org HTTPS on misc4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:47:14] RECOVERY - puppet1 Disk Space on puppet1 is OK: DISK OK - free space: / 58929 MB (95% inode=99%);
[00:47:16] PROBLEM - misc4 Lizardfs Master Port 3 on misc4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:47:18] PROBLEM - phab.miraheze.wiki on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:47:20] RECOVERY - puppet1 Puppet on puppet1 is OK: OK: Puppet is currently enabled, last run 16 minutes ago with 0 failures
[00:47:22] PROBLEM - misc4 SSH on misc4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:47:26] PROBLEM - mw1 Disk Space on mw1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:47:32] PROBLEM - mw1 SSH on mw1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:47:34] PROBLEM - mw1 Current Load on mw1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:47:36] RECOVERY - puppet1 Current Load on puppet1 is OK: OK - load average: 2.82, 1.18, 0.44
[00:47:40] PROBLEM - misc4 phab.miraheze.wiki HTTPS on misc4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:47:42] RECOVERY - puppet1 SSH on puppet1 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u4 (protocol 2.0)
[00:47:44] RECOVERY - misc2 Redis Process on misc2 is OK: PROCS OK: 1 process with args 'redis-server'
[00:48:12] PROBLEM - bacula1 Bacula Phabricator Static on bacula1 is CRITICAL: CRITICAL: Timeout or unknown client: misc4-fd
[00:48:16] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:48:22] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:48:24] gonna mail RamNode
[00:48:24] PROBLEM - db4 Puppet on db4 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:48:36] PROBLEM - bacula1 Bacula Misc4 Lizardfs Master on bacula1 is CRITICAL: CRITICAL: Timeout or unknown client: misc4-fd
[00:48:37] SPF|Cloud you can log into it now
[00:48:38] (mw1)
[00:48:46] RECOVERY - misc3 Restbase on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.144 port 7231
[00:48:50] PROBLEM - misc1 Puppet on misc1 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
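The misc4 "Lizardfs Master Port 1/2/3" checks are plain TCP probes against the LizardFS master process. They can be reproduced with netcat; the port numbers below are the usual LizardFS master defaults and the hostname is illustrative, so treat both as assumptions:

    for port in 9419 9420 9421; do
        nc -z -v -w 5 misc4.miraheze.org "$port"
    done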
[00:48:52] PROBLEM - adadevelopersacademy.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:48:56] PROBLEM - decrypted.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:02] PROBLEM - wiki.svenskabriardklubben.se - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:04] PROBLEM - madgenderscience.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:08] RECOVERY - Host lizardfs1 is UP: PING OK - Packet loss = 0%, RTA = 1.45 ms
[00:49:10] PROBLEM - miraheze.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:14]
[00:49:16] PROBLEM - garrettcountyguide.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:18] PROBLEM - tensegritywiki.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:20] PROBLEM - wiki.veloren.net - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:24] PROBLEM - wiki.lostminecraftminers.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:28] PROBLEM - test1 HTTPS on test1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:30] RECOVERY - mw1 SSH on mw1 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u4 (protocol 2.0)
[00:49:32] PROBLEM - www.eerstelijnszones.be - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:34] PROBLEM - programming.red - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:36] PROBLEM - saveta.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:38] PROBLEM - www.dariawiki.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:40] PROBLEM - www.b1tcash.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:42] PROBLEM - poserdazfreebies.orain.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:44] PROBLEM - give.effectively.to - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:46] PROBLEM - infectowiki.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:49:48] RECOVERY - lizardfs1 Lizardfs Chunkserver Port on lizardfs1 is OK: TCP OK - 0.001 second response time on 81.4.101.157 port 9422
[00:49:50] PROBLEM - cp4 Puppet on cp4 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[00:50:10] y'all mind if I quit icinga-miraheze?
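Each "- LetsEncrypt on sslhost" alert appears to be an HTTPS/certificate probe against a custom domain, and they are timing out here because the backends behind the proxies are unreachable rather than because certificates expired. One such probe can be reproduced by hand with openssl, using a domain taken from the alerts above:

    echo | openssl s_client -connect wiki.veloren.net:443 -servername wiki.veloren.net 2>/dev/null |
      openssl x509 -noout -issuer -dates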
[00:50:10] PROBLEM - marinebiodiversitymatrix.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:50:12] PROBLEM - wiki.inebriation-confederation.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:50:18] Voidwalker yes please
[00:50:18] PROBLEM - wiki.dwplive.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:50:20] PROBLEM - www.evanswiki.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:50:22] PROBLEM - wiki.exnihilolinux.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:50:24] (please quiet it) :)
[00:50:24] PROBLEM - www.allthetropes.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:50:31] thanks Voidwalker!
[00:52:47] 2018-10-31 00:52:27 mw1 metawiki: [450d126725bdd15cc10cb98a] /wiki/Miraheze Cdb\Exception from line 41 of /srv/mediawiki/w/vendor/wikimedia/cdb/src/Writer/DBA.php: Unable to open CDB file for write "/mnt/mediawiki-static/cdb/managewiki/permissions-metawiki.cdb"
[00:53:03] SPF|Cloud yeh
[00:53:04] and this is exactly why you shouldn't use nfs for cdb
[00:53:06] lizards is down
[00:53:20] better get that path changed to the local ssd
[00:53:24] SPF|Cloud it would go down any ways
[00:54:28] !.log rebooting lizardfs2
[00:54:32] has been down for 45 minutes now
[00:55:38] Oct 31 00:52:20 mw1 systemd[1]: php7.2-fpm.service: Failed to set invocation ID on control group /system.slice/php7.2-fpm.service, ignoring: Operation not permitted
[00:55:42] mw1 seems due for reinstalling I guess
[00:56:20] ok
[01:00:26] mw1 is dead, if I reboot mw2 and mw3 (to come back to normal load) they might die as well
[01:01:53] SPF|Cloud login.m.org is back
[01:02:43] yeah, at least mw3 is getting back online
[01:03:56] mw1 works now
[01:03:57] SPF|Cloud
[01:05:43] I see
[01:05:45] can't say this was a 'professional maintenance window', though
[01:06:04] yeh
[01:06:50] [02miraheze/dns] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fxFWr
[01:06:51] [02miraheze/dns] 07paladox 039699867 - Revert "Depool cp4" This reverts commit 8b6c27feaf733404a91ae34f49415eb85545fbc4.
[01:06:53] apart from test1 everything seems working again
[01:07:12] [02miraheze/mw-config] 07paladox pushed 032 commits to 03master [+0/-0/±2] 13https://git.io/fxFWK
[01:07:13] yeh
[01:07:13] [02miraheze/mw-config] 07paladox 03303f002 - Revert "Increase notice ID" This reverts commit 1aea827bddde58d4a945ee14a5c6886c6a65a80c.
[01:07:15] [02miraheze/mw-config] 07paladox 031afc2ed - Revert "Update Sitenotice.php" This reverts commit b17559a25b24dc8046ac0d4ca0a2aef60fa604b0.
[01:08:11] SPF|Cloud test1 should work now
[01:08:53] yep. icinga alerts are being cleared
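The Cdb\Exception at [00:52:47] and the follow-up remarks ("you shouldn't use nfs for cdb", "better get that path changed to the local ssd") point at the ManageWiki CDB cache living under the shared /mnt/mediawiki-static mount, which was unavailable while the LizardFS servers were down. A quick way to confirm which filesystem backs that path versus a local candidate (the local path is only an example):

    df -hT /mnt/mediawiki-static/cdb
    df -hT /srv/mediawiki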
[01:21:07] PROBLEM - mw3 Current Load on mw3 is WARNING: WARNING - load average: 1.16, 1.72, 7.85
[01:24:59] RECOVERY - mw3 Current Load on mw3 is OK: OK - load average: 1.07, 1.30, 6.32
[01:31:07] RECOVERY - mw3 JobQueue on mw3 is OK: JOBQUEUE OK - job queue below 300 jobs
[01:46:52] [02mw-config] 07The-Voidwalker opened pull request 03#2541: maintain sitenotice id - 13https://git.io/fxF8Y
[02:02:56] [02mw-config] 07MacFan4000 closed pull request 03#2541: maintain sitenotice id - 13https://git.io/fxF8Y
[02:02:58] [02miraheze/mw-config] 07MacFan4000 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fxF8Q
[02:02:59] [02miraheze/mw-config] 07The-Voidwalker 035700f06 - maintain sitenotice id (#2541) otherwise you risk having users miss later notices
[05:29:51] PROBLEM - mw3 JobQueue on mw3 is CRITICAL: JOBQUEUE CRITICAL - job queue greater than 300 jobs. Current queue: 318
[05:31:51] RECOVERY - mw3 JobQueue on mw3 is OK: JOBQUEUE OK - job queue below 300 jobs
[06:16:10] PROBLEM - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is WARNING: WARNING - NGINX Error Rate is 46%
[06:20:10] RECOVERY - cp4 HTTP 4xx/5xx ERROR Rate on cp4 is OK: OK - NGINX Error Rate is 3%
[13:45:24] paladox: After restarting the servers, the imports made by the system appear in recent changes. I'm just letting you know. https://ucronias.miraheze.org/wiki/Especial:CambiosRecientes
[13:45:25] Title: [ Cambios recientes - Ucronías Wiki ] - ucronias.miraheze.org
[13:47:04] But there don't seem to be any errors.
[13:50:16] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/fxbqY
[13:50:18] [02miraheze/services] 07MirahezeSSLBot 039a78f0f - BOT: Updating services config for wikis
[15:59:06] HI
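The mw3 JobQueue check above pages once the MediaWiki job queue exceeds 300 jobs, as in the [05:29:51] alert. The queue can also be inspected by hand on a job runner with the standard showJobs.php maintenance script; the install path comes from the stack trace earlier in this log, while the exact wrapper in use here may differ:

    php /srv/mediawiki/w/maintenance/showJobs.php --wiki metawiki
    php /srv/mediawiki/w/maintenance/showJobs.php --wiki metawiki --group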