[00:23:30] RECOVERY - mw3 JobQueue on mw3 is OK: JOBQUEUE OK - job queue below 300 jobs
[02:40:14] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/fxXrZ
[02:40:16] [miraheze/services] MirahezeSSLBot b1fbf44 - BOT: Updating services config for wikis
[04:03:34] PROBLEM - puppet1 Current Load on puppet1 is CRITICAL: CRITICAL - load average: 4.25, 4.25, 2.19
[04:05:34] RECOVERY - puppet1 Current Load on puppet1 is OK: OK - load average: 0.65, 2.87, 1.93
[05:41:32] PROBLEM - mw3 JobQueue on mw3 is CRITICAL: JOBQUEUE CRITICAL - job queue greater than 300 jobs. Current queue: 2949
[06:45:30] RECOVERY - mw3 JobQueue on mw3 is OK: JOBQUEUE OK - job queue below 300 jobs
[09:14:20] PROBLEM - puppet1 Current Load on puppet1 is CRITICAL: CRITICAL - load average: 9.16, 5.95, 3.11
[09:15:14] PROBLEM - misc2 Current Load on misc2 is CRITICAL: CRITICAL - load average: 10.78, 5.94, 2.65
[09:15:26] PROBLEM - misc2 HTTPS on misc2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:17:14] PROBLEM - misc2 Prometheus on misc2 is CRITICAL: connect to address 81.4.127.174 and port 9090: Connection refused
[09:19:10] PROBLEM - misc3 Puppet on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:20:13] PROBLEM - puppet1 Puppet on puppet1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:20:31] PROBLEM - misc3 Current Load on misc3 is CRITICAL: CRITICAL - load average: 4.10, 3.11, 1.52
[09:21:31] PROBLEM - misc2 Puppet on misc2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:22:21] PROBLEM - misc2 SSH on misc2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:23:19] PROBLEM - misc2 Redis Process on misc2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:23:25] PROBLEM - misc2 Disk Space on misc2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:23:41] PROBLEM - misc3 Disk Space on misc3 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:24:33] PROBLEM - misc3 Parsoid on misc3 is CRITICAL: connect to address 185.52.1.144 and port 8142: Connection refused
[09:24:43] PROBLEM - misc3 SSH on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:25:25] PROBLEM - ns1 Puppet on ns1 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:25:47] PROBLEM - lizardfs2 Puppet on lizardfs2 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:25:51] PROBLEM - puppet1 Disk Space on puppet1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:26:03] PROBLEM - cp4 Puppet on cp4 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:26:07] PROBLEM - bacula1 Puppet on bacula1 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:26:19] PROBLEM - cp2 Puppet on cp2 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:26:33] PROBLEM - mw1 Puppet on mw1 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:26:43] PROBLEM - mw3 Puppet on mw3 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:26:53] PROBLEM - puppet1 SSH on puppet1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:26:57] PROBLEM - mw2 Puppet on mw2 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:28:11] PROBLEM - db4 Puppet on db4 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:29:11] PROBLEM - misc4 Puppet on misc4 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:29:27] PROBLEM - misc1 GDNSD Datacenters on misc1 is CRITICAL: CRITICAL - 5 datacenters are down: 107.191.126.23/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 172.104.111.8/cpweb, 2400:8902::f03c:91ff:fe07:444e/cpweb
[09:29:51] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 5 datacenters are down: 107.191.126.23/cpweb, 81.4.109.133/cpweb, 2a00:d880:5:8ea::ebc7/cpweb, 172.104.111.8/cpweb, 2400:8902::f03c:91ff:fe07:444e/cpweb
[09:29:55] PROBLEM - misc3 Restbase on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:30:05] PROBLEM - cp4 Varnish Backends on cp4 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[09:30:21] PROBLEM - cp2 Varnish Backends on cp2 is CRITICAL: 3 backends are down. mw1 mw2 mw3
[09:30:25] PROBLEM - cp5 Varnish Backends on cp5 is CRITICAL: 2 backends are down. mw2 mw3
[09:30:57] PROBLEM - misc3 Electron on misc3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:31:07] PROBLEM - Host misc2 is DOWN: PING CRITICAL - Packet loss = 100%
[09:31:19] PROBLEM - Host puppet1 is DOWN: PING CRITICAL - Packet loss = 100%
[09:31:55] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[09:31:57] RECOVERY - misc3 Puppet on misc3 is OK: OK: Puppet is currently enabled, last run 19 minutes ago with 0 failures
[09:32:03] RECOVERY - misc3 Disk Space on misc3 is OK: DISK OK - free space: / 58667 MB (95% inode=99%);
[09:32:21] RECOVERY - cp2 Varnish Backends on cp2 is OK: All 5 backends are healthy
[09:32:23] RECOVERY - cp5 Varnish Backends on cp5 is OK: All 5 backends are healthy
[09:32:43] RECOVERY - misc3 Parsoid on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.144 port 8142
[09:32:51] RECOVERY - misc3 Electron on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.144 port 3000
[09:32:53] RECOVERY - misc3 SSH on misc3 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u4 (protocol 2.0)
[09:33:05] RECOVERY - Host misc2 is UP: PING OK - Packet loss = 0%, RTA = 0.50 ms
[09:33:09] RECOVERY - misc3 Current Load on misc3 is OK: OK - load average: 1.27, 0.82, 0.32
[09:33:17] RECOVERY - misc2 Prometheus on misc2 is OK: TCP OK - 0.001 second response time on 81.4.127.174 port 9090
[09:33:31] RECOVERY - Host puppet1 is UP: PING OK - Packet loss = 68%, RTA = 0.74 ms
[09:33:33] RECOVERY - misc1 GDNSD Datacenters on misc1 is OK: OK - all datacenters are online
[09:33:45] RECOVERY - misc3 Restbase on misc3 is OK: TCP OK - 0.001 second response time on 185.52.1.144 port 7231
[09:33:49] PROBLEM - cp5 Puppet on cp5 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:34:03] RECOVERY - cp4 Varnish Backends on cp4 is OK: All 5 backends are healthy
[09:34:49] RECOVERY - puppet1 Puppet on puppet1 is OK: OK: Puppet is currently enabled, last run 33 minutes ago with 0 failures
[09:35:10] RECOVERY - puppet1 SSH on puppet1 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u4 (protocol 2.0)
[09:35:12] RECOVERY - puppet1 Current Load on puppet1 is OK: OK - load average: 1.92, 0.72, 0.26
[09:40:20] PROBLEM - test1 Puppet on test1 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:40:56] PROBLEM - lizardfs1 Puppet on lizardfs1 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:41:08] PROBLEM - misc2 Current Load on misc2 is CRITICAL: CRITICAL - load average: 3.22, 1.65, 0.69
[09:42:54] RECOVERY - lizardfs1 Puppet on lizardfs1 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[09:43:12] PROBLEM - puppet1 Current Load on puppet1 is CRITICAL: CRITICAL - load average: 11.40, 6.45, 2.71
[09:43:34] PROBLEM - misc1 Puppet on misc1 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:44:02] RECOVERY - cp4 Puppet on cp4 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[09:44:06] RECOVERY - bacula1 Puppet on bacula1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:44:10] RECOVERY - db4 Puppet on db4 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:45:44] PROBLEM - misc2 Puppet on misc2 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:45:58] PROBLEM - misc3 Puppet on misc3 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:46:02] PROBLEM - puppet1 Puppet on puppet1 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
[09:47:17] !log PURGE BINARY LOGS BEFORE '2018-10-24 10:46:00'; on db4
[09:48:45] RECOVERY - misc2 Current Load on misc2 is OK: OK - load average: 1.32, 0.43, 0.15
[09:49:57] PROBLEM - cp5 HTTP 4xx/5xx ERROR Rate on cp5 is WARNING: WARNING - NGINX Error Rate is 43%
[09:51:39] !log rebooted puppet1
[09:51:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[09:51:48] !log rebooted misc2 (to try and reduce load)
[09:51:52] !log PURGE BINARY LOGS BEFORE '2018-10-24 10:46:00'; on db4
[09:51:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[09:51:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master
[09:51:57] RECOVERY - cp5 HTTP 4xx/5xx ERROR Rate on cp5 is OK: OK - NGINX Error Rate is 30%
[09:52:35] PROBLEM - misc2 Current Load on misc2 is CRITICAL: CRITICAL - load average: 3.16, 2.13, 0.89
[09:52:55] PROBLEM - lizardfs1 Puppet on lizardfs1 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:54:03] PROBLEM - cp4 Puppet on cp4 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:54:07] PROBLEM - bacula1 Puppet on bacula1 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:54:11] PROBLEM - db4 Puppet on db4 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:54:29] RECOVERY - misc2 Current Load on misc2 is OK: OK - load average: 0.81, 1.59, 0.83
[09:55:07] RECOVERY - puppet1 Current Load on puppet1 is OK: OK - load average: 0.94, 0.21, 0.07
[09:59:53] RECOVERY - puppet1 Puppet on puppet1 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[10:01:17] RECOVERY - misc2 Puppet on misc2 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[10:01:25] RECOVERY - ns1 Puppet on ns1 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[10:01:45] RECOVERY - lizardfs2 Puppet on lizardfs2 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[10:01:53] RECOVERY - misc3 Puppet on misc3 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[10:02:03] RECOVERY - cp4 Puppet on cp4 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[10:02:08] RECOVERY - bacula1 Puppet on bacula1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:02:09] RECOVERY - db4 Puppet on db4 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:02:20] RECOVERY - test1 Puppet on test1 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[10:02:34] RECOVERY - mw1 Puppet on mw1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:02:44] RECOVERY - mw3 Puppet on mw3 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:02:56] RECOVERY - mw2 Puppet on mw2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:03:10] RECOVERY - misc4 Puppet on misc4 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:03:34] RECOVERY - misc1 Puppet on misc1 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[10:03:50] RECOVERY - cp5 Puppet on cp5 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:36:27] RECOVERY - misc2 HTTPS on misc2 is OK: HTTP OK: HTTP/1.1 200 OK - 40789 bytes in 0.090 second response time
[18:01:31] PROBLEM - mw3 JobQueue on mw3 is CRITICAL: JOBQUEUE CRITICAL - job queue greater than 300 jobs. Current queue: 5982
[19:30:18] [miraheze/services] MirahezeSSLBot pushed 1 commit to master [+0/-0/±1] https://git.io/fxMLZ
[19:30:20] [miraheze/services] MirahezeSSLBot 78bdb32 - BOT: Updating services config for wikis
[22:09:29] RECOVERY - mw3 JobQueue on mw3 is OK: JOBQUEUE OK - job queue below 300 jobs