[00:30:56] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [00:30:56] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [00:33:29] PROBLEM - udp2log log age for locke on locke is CRITICAL: CRITICAL: log files /a/squid/fundraising/logs/bannerImpressions-sampled100.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [01:33:58] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours [01:34:16] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:40:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.017 seconds [01:40:45] hashar: ping [01:41:19] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 239 seconds [01:41:28] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 248 seconds [01:44:19] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds [01:45:58] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 10 seconds [01:47:01] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours [02:11:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:21:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.463 seconds [04:44:14] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [04:44:50] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.18 ms [04:48:53] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [04:52:02] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.041 second response time [05:01:20] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [05:01:20] 
PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [05:01:20] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [05:01:20] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [05:01:20] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [05:01:20] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [05:10:20] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [05:15:34] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [05:20:31] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [05:25:37] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [05:30:34] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [05:35:31] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [05:40:10] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours [05:40:19] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [05:45:34] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [05:50:31] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [05:55:37] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:00:34] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:05:31] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:10:37] PROBLEM - 
check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:15:34] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:20:31] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:25:28] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:30:34] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:35:31] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:40:28] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:45:18] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:50:33] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:50:33] PROBLEM - check_nginx on payments1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name nginx [06:55:39] RECOVERY - check_nginx on payments1001 is OK: PROCS OK: 49 processes with command name nginx [06:55:39] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [06:57:36] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [07:00:36] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [07:05:33] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [07:10:39] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [07:15:36] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [07:20:06] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name 
nginx [07:25:03] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [07:30:09] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [07:35:06] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [07:40:03] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [07:45:28] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [07:48:10] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [07:50:25] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [07:55:31] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [08:00:28] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [08:05:25] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [08:10:31] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [08:13:13] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100% [08:15:28] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [08:15:37] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 0.36 ms [08:20:25] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [08:25:22] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [08:30:28] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [08:35:25] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [08:40:13] PROBLEM 
- check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [08:45:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [08:50:16] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [08:55:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [09:00:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [09:05:16] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [09:05:52] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [09:10:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [09:15:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [09:20:16] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [09:25:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [09:30:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [09:35:16] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [09:40:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [09:41:00] 5 nginx instances? 
[09:45:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [09:50:16] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [09:55:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [10:00:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [10:05:16] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [10:10:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [10:13:31] PROBLEM - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is CRITICAL: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2563* [10:15:01] RECOVERY - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is OK: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y OK - 2388 [10:15:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [10:20:16] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [10:25:20] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [10:30:17] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [10:32:23] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [10:32:23] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [10:35:14] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [10:40:20] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [10:45:17] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [10:50:14] PROBLEM - 
check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [10:55:20] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [11:05:14] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [11:10:20] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [11:15:18] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [11:20:14] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [11:25:05] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [11:30:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [11:35:08] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours [11:35:08] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [11:40:05] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [11:45:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [11:48:11] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours [11:50:08] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [11:55:05] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [12:00:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [12:05:08] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [12:10:05] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command 
name nginx [12:15:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [12:20:08] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [12:25:05] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [12:30:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [12:35:17] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [12:36:37] @statistics-on [12:36:38] Statistics were now enabled [12:40:07] New review: Aude; "looks good though found some whitespace" [operations/deployment] (master) C: 0; - https://gerrit.wikimedia.org/r/8732 [12:40:14] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [12:45:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [12:50:17] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [12:55:15] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [13:00:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [13:05:17] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [13:10:14] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [13:15:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [13:15:11] RECOVERY - udp2log log age for locke on locke is OK: OK: all log files active [13:20:17] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [13:25:14] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes 
with command name nginx [13:30:17] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [13:35:23] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [13:40:20] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [13:45:17] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [13:50:23] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [13:55:20] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [14:00:17] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [14:05:23] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [14:10:20] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [14:15:17] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [14:17:50] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100% [14:20:05] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 0.42 ms [14:20:14] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [14:25:20] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [14:29:59] PROBLEM - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is CRITICAL: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2588* [14:30:17] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [14:31:29] RECOVERY - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is OK: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y OK - 2063 [14:34:04] PROBLEM - udp2log log age for 
locke on locke is CRITICAL: CRITICAL: log files /a/squid/fundraising/logs/bannerImpressions-sampled100.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [14:35:34] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [14:40:31] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [14:43:29] New patchset: Jeremyb; "add wikiversions.dat to noc.wm.o/conf" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23784 [14:45:37] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [14:46:13] RECOVERY - udp2log log age for locke on locke is OK: OK: all log files active [14:48:15] hi gerrit-wm! [14:49:00] finally someone else talking then nagios-wm ;) [14:50:25] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [14:55:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [15:00:37] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [15:02:34] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [15:02:34] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [15:02:34] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [15:02:34] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [15:02:34] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [15:02:34] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [15:05:25] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [15:10:22] PROBLEM - check_nginx on 
payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [15:15:08] New review: Hashar; "Good catch! The index.php already has a *.dat filter to show up the wikiversions.dat file." [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/23784 [15:15:09] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23784 [15:15:37] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [15:15:51] jeremyb: thanks for the wikiversions.dat fix :) [15:16:50] hashar: sure ;) there's a relevant thread on wikitech in case you missed it [15:17:13] jeremyb: I came from the thread ;-:D [15:17:19] * jeremyb had already read index.php and saw the dat filter [15:17:30] hashar: i thought maybe you came from the review req ;) [15:18:07] I just woke up [15:18:13] opened laptop for some mail [15:18:15] hashar: want to do some more shell reqs? ;-P [15:18:21] and end up deploying a change hehe [15:18:47] maybe need to boot apache on fenari? [15:18:51] not showing up [15:20:25] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [15:21:15] I have no idea [15:21:25] need to find apache docroot on fenari [15:22:30] New patchset: Mark Bergsma; "Add NTP client to sodium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23786 [15:23:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23786 [15:23:33] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23786 [15:23:49] STUPID CLUSTER CONFIGURATION [15:25:31] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [15:27:03] bahhh [15:27:54] I have no idea how to update fenari document root [15:28:22] PROBLEM - spamassassin on sodium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[15:28:49] PROBLEM - mailman on sodium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:29:13] that's not too good [15:30:37] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [15:32:51] !log Powercycled sodium [15:33:00] Logged the message, Master [15:35:34] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [15:35:52] RECOVERY - spamassassin on sodium is OK: PROCS OK: 4 processes with args spamd [15:36:10] RECOVERY - mailman on sodium is OK: PROCS OK: 10 processes with args mailman [15:40:05] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [15:40:41] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours [15:41:10] New patchset: Hashar; "fix noc.wm.org configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23788 [15:42:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23788 [15:42:07] New review: Hashar; "It seems like updating noc.wm.org require some manual copy from commons to a specific docroot. I hav..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23784 [15:45:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [15:46:05] PROBLEM - MySQL Replication Heartbeat on db1020 is CRITICAL: CRIT replication delay 206 seconds [15:46:20] must get out [15:46:23] PROBLEM - MySQL Slave Delay on db1020 is CRITICAL: CRIT replication delay 212 seconds [15:47:53] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 184 seconds [15:47:53] New patchset: Jeremyb; "admins.pp: annotate the include as disabled" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23789 [15:48:47] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23789 [15:49:05] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 215 seconds [15:50:08] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [15:50:52] is db33 still broken?? [15:51:22] hrmmm, seems the earlier problems with it have fallen off the end of my backscroll [15:55:05] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [16:00:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [16:01:05] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [16:01:32] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [16:05:08] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [16:07:14] RECOVERY - MySQL Replication Heartbeat on db1020 is OK: OK replication delay 0 seconds [16:07:14] RECOVERY - MySQL Slave Delay on db1020 is OK: OK replication delay 0 seconds [16:10:06] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [16:15:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [16:20:08] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [16:25:05] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [16:30:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [16:35:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [16:40:05] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [16:45:10] PROBLEM - check_nginx on payments1004 is CRITICAL: 
PROCS CRITICAL: 5 processes with command name nginx [16:50:07] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [16:55:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [16:58:13] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [17:00:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [17:03:45] New patchset: Hashar; "fix noc.wm.org configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23788 [17:04:39] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23788 [17:05:07] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [17:05:16] PROBLEM - NTP on sodium is CRITICAL: NTP CRITICAL: No response from NTP server [17:10:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [17:14:28] New review: Hashar; "Please keep the million.txt and test.txt files, they are unrelated to the sep11 wiki removal :)" [operations/mediawiki-config] (master); V: -1 C: -1; - https://gerrit.wikimedia.org/r/22534 [17:15:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [17:15:16] lcarr: lots of payments CRITICAL alerts about nginx process counts. can't tell if they're a real problem. maybe tell jeff when you see him [17:15:16] New review: Hashar; "Roan, is this now safe to merge in ?" [operations/mediawiki-config] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/18125 [17:15:34] jeremyb: thanks - it's not live yet so should be ok [17:15:47] LeslieCarr: what isn't? 
[17:15:56] LeslieCarr: there was a fundraising test that ended 20 mins ago [17:16:12] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16874 [17:17:11] the new fundraising infrastructure in eqiad [17:17:17] so all the 100x machines [17:17:19] oh, ok [17:17:22] nvm ;) [17:17:25] it's all being served from tampa still [17:17:26] oh it's ok [17:17:39] i should probably work on the firewall again ;) [17:17:48] heh [17:17:53] meetings still i guess [17:17:54] ? [17:20:07] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [17:25:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [17:26:31] Reedy: you created the noc root dir in mediawiki-config.git [17:26:45] Reedy: I did a change in puppet to use that instead of the old stuff [17:27:13] New patchset: Aude; "disable anon edits on wikimania2012 wiki since we are not monitoring RC as closely now" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23795 [17:27:14] guess that is a dupe of your hehe [17:28:05] Yeah, I did it not so long ago ;) [17:28:18] New review: Hashar; "We should also remove the index.html file which is being provided by puppet. Example on abandoned ch..." 
[operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/23425 [17:28:26] New review: Hashar; "Already there with https://gerrit.wikimedia.org/r/#/c/23425/" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/23788 [17:28:39] Change abandoned: Hashar; "See: https://gerrit.wikimedia.org/r/#/c/23425/" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23788 [17:28:54] I didn't see it being overly urgent to get moved over as it doesn't change much [17:29:03] yup [17:29:05] It's at least in puppet now [17:29:07] just need to remove the index.html file as well [17:29:11] I will amend your change [17:29:52] cool, ty [17:30:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [17:30:11] and of course we need to find an op to merge it :-] [17:32:14] New patchset: Hashar; "Move noc from /h/w/htdocs to /h/w/c/docroot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23425 [17:32:22] Reedy: ^^^^^ [17:32:41] I am not sure what you mean by "files copying first" [17:33:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23425 [17:34:23] I just copied the noc files to the proper place, before the puppet change [17:34:31] due to us not needing to worry about it getting out of sync [17:34:42] kk [17:34:55] guess you will have to sync with whoever from op is going to merge it [17:35:07] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [17:35:22] hashar: whatcha need ? 5 second summary [17:35:30] (too lazy to scroll up) [17:35:40] i don't even understand it ;) [17:35:57] not that i tried too hard [17:35:59] LeslieCarr: merge and push a change to move the docroot for noc.wikimedia.org [17:36:16] hashar: https://gerrit.wikimedia.org/r/#/c/23425/ ? 
[17:36:30] yeah that one [17:36:46] the only impact would be making noc.wm.org looks weird ;-D [17:37:02] ugh, mediawiki-config is still broken? no one has the refs? [17:37:13] Is it? [17:37:17] reedy copied the files to /h/w/c/docroot/noc already and it is on feanari [17:37:33] jeremyb: what do you mean? I did sent changes to it. [17:37:36] why are we removing index.html ? [17:37:48] Reedy: yes [17:37:56] LeslieCarr: it is provided by operations/mediawiki-config.git:/docroot/noc/index.html or something [17:37:56] LeslieCarr: there's an index.php ? idk [17:38:06] oh, nvm [17:38:10] i'm confusing with /conf/ [17:38:16] LeslieCarr: so we don't want Puppet to override index.html anymore ;-) [17:38:23] why is that ? [17:38:55] i mean why is it via this other repo and not the normal puppet repo ? [17:39:34] historically the noc docroot was only on fenari [17:39:36] so anyone can deploy it? [17:39:51] I think I am the one that added index.html to puppet (can't remember) so people can submit changes against it [17:40:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx [17:40:16] then sam took the whole noc docroot and put it in operations/mediawiki-config [17:40:30] I wonder what else is still useful in that htdocs folder [17:40:37] that now hold the dbtree stuff from asher, and various confs [17:43:02] Reedy: I don't think /h/w/htdocs is still used anywhere. Will have to carefully clean it up over time [17:43:04] New patchset: Jeremyb; "noc.wm.o/conf: these files are now in public git" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23797 [17:43:13] the noc one is a good catch [17:43:37] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23797 [17:43:39] hashar: which? 
[17:44:04] we have noc.wm.org served from /h/w/htdocs which is only on fenari
[17:44:10] (sort of)
[17:44:13] Based on the puppet repo, that's the only usage of /h/w/htdocs
[17:44:13] hashar: can we do 23286 too?
[17:44:21] !g 23286
[17:44:21] https://gerrit.wikimedia.org/r/#q,23286,n,z
[17:44:33] and 23419
[17:45:01] sorry I am not willing to deploy a configuration change on a production wiki
[17:45:08] we are at a conference right now
[17:45:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[17:45:19] so supposed to pay attention to what is being broadcasted / shown / said
[17:45:31] sorry jeremyb :(
[17:45:33] we are? shit ;)
[17:45:40] LeslieCarr: hehe
[17:46:07] hashar: ok, sure. maybe monday? idk. i'm mostly not around sat-tues
[17:46:18] to poke people later ;)
[17:46:22] yeah monday definitely
[17:46:32] that is my code review day
[17:46:36] hashar: and the other one too?
[17:46:44] (though that will be a code review evening/night due to jet lag)
[17:46:46] yeah
[17:47:06] k. merci
[17:47:10] !g 23419
[17:47:11] https://gerrit.wikimedia.org/r/#q,23419,n,z
[17:47:15] de rien :)
[17:49:13] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[17:50:08] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[17:55:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[18:00:11] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[18:05:07] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[18:07:54] hashar: Ping
[18:08:29] hoo: yup
[18:08:45] hashar: "Please keep the million.txt and test.txt files, they are unrelated to the sep11 wiki removal :)"
[18:08:55] yeah
[18:08:58] And need to keep them? I know the first one is from en,
[18:09:02] huh, which one?
[18:09:05] so you submitted a change to cleanup the conf from the sep11 conf
[18:09:19] I submitted a change with a few clean ups
[18:09:25] and also deleted both those files which are not related to sep11
[18:09:29] In fact two :P sep11 and those files
[18:09:36] we prefer one cleanup = one change ;-)
[18:09:46] so that should be a commit for sep11 and another one to delete the files
[18:10:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[18:10:44] hashar: That's a bit frustrating, I asked beforehand on IRC and nobody answered, so I assumed it's fine that way :/
[18:11:57] to restore a file: git checkout origin/master -- some/path/to/file.txt
[18:12:03] then git add it
[18:12:11] amend change and resubmit :-)
[18:13:30] New patchset: Hoo man; "Clean up: Removed sep11wiki code" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22534
[18:13:41] \O/
[18:15:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[18:15:21] New patchset: Hashar; "Clean up: Removed sep11wiki code" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22534
[18:20:07] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[18:20:09] New review: Hashar; "seems fine. Still have an occurence of sept11wiki in the closed.dblist that I guess we could remove..." [operations/mediawiki-config] (master); V: 0 C: 1; - https://gerrit.wikimedia.org/r/22534
[18:20:40] luuunnnnchhhh tiiiime
[18:20:53] be careful...
some things hate removals from all.dblist
[18:20:54] iirc
[18:20:56] and he's gone
[18:22:05] New patchset: Hoo man; "Clean up: Removed sep11wiki code" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22534
[18:23:07] New patchset: Hoo man; "Clean up: Removed sep11wiki code" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22534
[18:25:13] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[18:25:57] New patchset: Hoo man; "Clean up: Removed old test files out of wikipedia.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23798
[18:30:10] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[18:35:12] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[18:40:09] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[18:45:06] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[18:50:12] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[18:55:09] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[19:00:06] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[19:05:12] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[19:07:09] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[19:10:09] PROBLEM - check_nginx on payments1004 is CRITICAL: PROCS CRITICAL: 5 processes with command name nginx
[19:15:06] RECOVERY - check_nginx on payments1004 is OK: PROCS OK: 49 processes with command name nginx
[20:23:13] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100%
[20:24:34] RECOVERY - Host
search32 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms
[20:33:16] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[20:33:16] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[21:35:40] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours
[21:48:43] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours
[21:50:58] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[21:51:34] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[22:41:43] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (32940)
[22:42:10] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (32177)
[23:12:37] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[23:13:13] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[23:33:19] PROBLEM - udp2log log age for locke on locke is CRITICAL: CRITICAL: log files /a/squid/fundraising/logs/bannerImpressions-sampled100.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days.
[23:39:10] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (51479)
[23:39:55] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (50112)
[23:45:19] RECOVERY - udp2log log age for locke on locke is OK: OK: all log files active
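
Note on the restore steps given at 18:11:57 to 18:12:11 (checkout the file from origin/master, git add it, amend the change): the following is a minimal sketch of that sequence in a throwaway repository, not a record of what was actually run. The repository layout, the file names sep11.conf and some/path/to/file.txt, and the commit messages are hypothetical stand-ins; only the `git checkout origin/master -- <path>`, `git add` and amend steps come from the log.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# Hypothetical upstream repository with one file to delete and one to keep.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the log
git config user.email "demo@example.org"
git config user.name "Demo"
mkdir -p some/path/to
echo "keep me" > some/path/to/file.txt
echo "old config" > sep11.conf
git add .
git commit -q -m "Initial state"
cd ..

# Clone it; the clone's origin/master plays the role of the reviewed branch.
git clone -q upstream clone
cd clone
git config user.email "demo@example.org"
git config user.name "Demo"

# A cleanup change that accidentally deletes an unrelated file as well.
git rm -q sep11.conf some/path/to/file.txt
git commit -q -m "Clean up: remove sep11 conf"

# The steps from the log: restore the file, stage it, amend the change.
git checkout origin/master -- some/path/to/file.txt
git add some/path/to/file.txt
git commit -q --amend --no-edit

# The amended change should now touch only the intended file.
restored=$(cat some/path/to/file.txt)
changed=$(git diff --name-only origin/master HEAD)
echo "restored content: $restored"
echo "files changed by the amended commit: $changed"
```

The "resubmit" step is left out because it depends on the review server; with Gerrit it is typically a push of the amended commit to the review ref for the target branch.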