[00:09:39] (CR) Krinkle: [C: 1] "Woo, this dates back to 2012. Ib6bc0bb88e2413682e" [operations/puppet] - https://gerrit.wikimedia.org/r/145620 (owner: Ori.livneh)
[00:15:04] PROBLEM - puppet last run on analytics1031 is CRITICAL: CRITICAL: Puppet has 7 failures
[00:16:04] RECOVERY - puppet last run on analytics1031 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[00:16:15] weird, manual puppet run seemed to fix that
[00:16:17] everything looks ok.
[00:18:14] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[00:19:16] mutante: in case someone else runs into the SSH problem: http://superuser.com/a/782127/34937
[00:23:47] (PS4) Awight: Enable FundraisingTranslateWorkflow on metawiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145703
[00:23:58] (CR) jenkins-bot: [V: -1] Enable FundraisingTranslateWorkflow on metawiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145703 (owner: Awight)
[00:26:51] (PS5) Awight: Enable FundraisingTranslateWorkflow on metawiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145703
[00:29:34] (PS1) Ejegg: Add new CentralNotice cookie config var [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145736
[00:45:53] greg-g: I just bumped https://bugzilla.wikimedia.org/show_bug.cgi?id=67805 as it's seemingly spreading to other wikis… :-(
[00:52:30] * greg-g looks
[00:54:27] ugh
[00:55:48] greg-g: Ta.
[01:19:22] PROBLEM - puppet last run on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:31:38] _joe_: Looks like there is a list!
[01:31:38] https://github.com/wikimedia/operations-mediawiki-config/commit/88f997ef96e9efb3687afe916f0e4561a5fe6942
[01:31:44] apache-fast-test urls.txt
[01:45:47] (PS1) Legoktm: Add centralauth.dblist [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145743 (https://bugzilla.wikimedia.org/67910)
[01:52:01] (CR) TTO: [C: -1] "Needs NOC symlinks etc." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145743 (https://bugzilla.wikimedia.org/67910) (owner: Legoktm)
[02:00:00] (PS2) Legoktm: Add centralauth.dblist [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145743 (https://bugzilla.wikimedia.org/67910)
[02:00:29] (CR) Legoktm: "Thanks, didn't know I had to do that." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145743 (https://bugzilla.wikimedia.org/67910) (owner: Legoktm)
[02:16:36] !log LocalisationUpdate completed (1.24wmf12) at 2014-07-12 02:15:33+00:00
[02:16:45] Logged the message, Master
[02:22:34] (CR) Krinkle: "Ideally this'd be generated by a script so that it doesn't get out of sync." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145743 (https://bugzilla.wikimedia.org/67910) (owner: Legoktm)
[02:26:39] (CR) Krinkle: "Alternatively, slightly easier to do, add a test to dblistTest.php that ensures it is what it should be. Then at least when people change " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145743 (https://bugzilla.wikimedia.org/67910) (owner: Legoktm)
[02:26:51] !log LocalisationUpdate completed (1.24wmf13) at 2014-07-12 02:25:47+00:00
[02:26:56] Logged the message, Master
[02:52:53] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 12 02:51:47 UTC 2014 (duration 51m 46s)
[02:52:58] Logged the message, Master
[04:14:02] Anyone know if there's anything up with the mail server?
[04:25:55] N/M, it's working, just taking like 45 minutes to get delivered.
[04:34:46] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 02:33:39 UTC
[04:42:11] <_joe_> Krinkle|detached: thanks! I had build one by myself
[05:32:57] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 7 below the confidence bounds
[05:33:36] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Jul 12 05:33:30 UTC 2014
[06:27:04] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 7 below the confidence bounds
[06:28:44] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:54] PROBLEM - puppet last run on mw1046 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:14] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:24] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:34] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:54] PROBLEM - puppet last run on db1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:43] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:45:53] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[06:45:53] RECOVERY - puppet last run on db1002 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:46:13] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:46:24] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:46:33] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[07:12:12] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[07:34:14] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 05:33:30 UTC
[07:37:38] !log started running checkLocalNames.php --delete=1 on all CentralAuth wikis for bug 67350
[07:37:43] Logged the message, Master
[07:39:20] !log started running checkLocalUser.php --delete=1 on all CentralAuth wikis for bug 67350
[07:39:25] Logged the message, Master
[07:50:34] thanks legoktm
[07:50:55] mhm
[07:53:27] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Jul 12 07:53:25 UTC 2014
[08:00:27] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 05:59:42 UTC
[08:20:29] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Sat Jul 12 08:20:27 UTC 2014
[09:14:48] (CR) Filippo Giunchedi: "looks good, one question around the include" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/145510 (owner: Ori.livneh)
[09:37:16] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0]
[09:52:14] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[09:54:54] (CR) Filippo Giunchedi: "packaging looks good!" (1 comment) [operations/debs/php-mailparse] (review) - https://gerrit.wikimedia.org/r/142751 (owner: 20after4)
[09:55:51] (CR) Filippo Giunchedi: Add init and upstart scripts (1 comment) [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/144981 (owner: Giuseppe Lavagetto)
[09:59:18] (CR) Filippo Giunchedi: [C: 1] "looks good, just a couple of nitpicks" (2 comments) [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/144981 (owner: Giuseppe Lavagetto)
[10:19:06] (CR) Filippo Giunchedi: [C: 1] "agree it shouldn't be kept around if it doesn't work" [operations/puppet] - https://gerrit.wikimedia.org/r/145620 (owner: Ori.livneh)
[11:41:29] (PS1) TTO: Add lawiki featured feed settings [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145760 (https://bugzilla.wikimedia.org/33978)
[11:44:04] PROBLEM - LighttpdHTTP on dataset1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:44:54] RECOVERY - LighttpdHTTP on dataset1001 is OK: HTTP OK: HTTP/1.1 200 OK - 5122 bytes in 0.085 second response time
[12:00:48] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 10:00:21 UTC
[12:35:09] Is there a plan or an ETA for upgrading icinga.wikimedia.org to Ubuntu Trusty?
[12:43:37] (CR) JanZerebecki: [C: 1] "I'm fine with this as this change doesn't have any effect. The !DH at the end of the cipher setting removes all DHE ciphers from the list." [operations/puppet] - https://gerrit.wikimedia.org/r/145688 (owner: Dzahn)
[13:18:06] (CR) Chmarkine: [C: 1] remove SSL cipher DHE-RSA-AES128-GCM-SHA256 [operations/puppet] - https://gerrit.wikimedia.org/r/145688 (owner: Dzahn)
[13:20:38] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Sat Jul 12 13:20:30 UTC 2014
[13:25:38] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:26:28] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53248 bytes in 0.184 second response time
[13:51:43] !log reboot ms-be1007, xfs problems on sdn, load at 300+
[13:51:49] Logged the message, Master
[14:03:23] ugh, that required a powerdown, xfs and the kernel were wedged sideways
[15:38:59] PROBLEM - mysqld processes on labsdb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:39:19] PROBLEM - check if dhclient is running on labsdb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:39:19] PROBLEM - MySQL Recent Restart Port 3306 on labsdb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:39:30] PROBLEM - MySQL Idle Transactions Port 3306 on labsdb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:47:50] RECOVERY - mysqld processes on labsdb1001 is OK: PROCS OK: 1 process with command name mysqld
[15:48:10] RECOVERY - MySQL Recent Restart Port 3306 on labsdb1001 is OK: OK seconds since restart
[15:48:11] RECOVERY - check if dhclient is running on labsdb1001 is OK: PROCS OK: 0 processes with command name dhclient
[15:48:20] RECOVERY - MySQL Idle Transactions Port 3306 on labsdb1001 is OK: OK longest blocking idle transaction sleeps for seconds
[15:59:44] (CR) Ori.livneh: mediawiki: move SSHD nice override from web.pp to init.pp (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/145510 (owner: Ori.livneh)
[16:14:11] PROBLEM - MediaWiki profile collector on tungsten is CRITICAL: CRITICAL: Not all configured mwprof instances are running.
[16:16:11] RECOVERY - MediaWiki profile collector on tungsten is OK: OK: All defined mwprof jobs are runnning.
[18:29:44] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: Puppet has 1 failures
[18:43:27] (CR) JanZerebecki: "Oh just remembered: you might then also want to remove kEDH+AESGCM as that contains DHE. (Again already disabled by !DH .)" [operations/puppet] - https://gerrit.wikimedia.org/r/145688 (owner: Dzahn)
[18:46:40] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[19:59:57] PROBLEM - Puppet freshness on db1019 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 19:57:51 UTC
[20:01:57] PROBLEM - Puppet freshness on db1019 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 19:57:51 UTC
[20:03:57] PROBLEM - Puppet freshness on db1019 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 19:57:51 UTC
[20:05:57] PROBLEM - Puppet freshness on db1019 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 19:57:51 UTC
[20:07:57] PROBLEM - Puppet freshness on db1019 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 19:57:51 UTC
[20:09:57] PROBLEM - Puppet freshness on db1019 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 19:57:51 UTC
[20:11:57] PROBLEM - Puppet freshness on db1019 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 19:57:51 UTC
[20:13:57] PROBLEM - Puppet freshness on db1019 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 19:57:51 UTC
[20:15:50] (CR) 20after4: "Well do we have build machines running trusty? And if it's built on trusty will it still run on precise? I really don't know what version" [operations/debs/php-mailparse] (review) - https://gerrit.wikimedia.org/r/142751 (owner: 20after4)
[20:15:57] PROBLEM - Puppet freshness on db1019 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 19:57:51 UTC
[20:17:57] PROBLEM - Puppet freshness on db1019 is CRITICAL: Last successful Puppet run was Sat 12 Jul 2014 19:57:51 UTC
[20:18:38] RECOVERY - Puppet freshness on db1019 is OK: puppet ran at Sat Jul 12 20:18:32 UTC 2014
[22:04:51] !log checkLocalNames/checkLocalUser finished a few hours ago, I don't have a timestamp (bug 67350)
[22:04:57] Logged the message, Master
[22:09:35] oh, I do have a timestamp
[22:09:36] meh
[22:21:34] !log running foreachwiki extensions/CentralAuth/maintenance/migratePass0.php (bug 67350)
[22:21:39] Logged the message, Master
[22:35:57] (PS1) Nemo bis: Remove dead ULS variable after I49e812eae32266f165591c75fd67b86ca06b13f0 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145861
[22:36:25] (Abandoned) Nemo bis: Remove dead ULS configs after I49e812eae32266f165591c75fd67b86ca06b13f0 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/115880 (owner: Nemo bis)
[23:06:59] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[23:20:05] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
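JanZerebecki's review comments on change 145688 rely on OpenSSL cipher-string semantics: a `!` term permanently deletes every matching suite, including suites named explicitly earlier in the string, so a trailing `!DH` already drops DHE-RSA-AES128-GCM-SHA256 and everything matched by `kEDH+AESGCM`. A minimal sketch for checking this locally with Python's standard `ssl` module follows. The cipher strings are hypothetical stand-ins, not the actual puppet setting, and the exact suites listed depend on the local OpenSSL build.

```python
import ssl
from typing import List


def tls12_cipher_names(cipher_string: str) -> List[str]:
    """Return the TLS 1.2-and-below suites OpenSSL enables for cipher_string.

    TLS 1.3 suites (names starting with "TLS_") are configured separately in
    OpenSSL and are not affected by set_ciphers(), so they are filtered out.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.set_ciphers(cipher_string)
    return [c["name"] for c in ctx.get_ciphers() if not c["name"].startswith("TLS_")]


# Hypothetical cipher strings for illustration only; not the production value.
BASE = "ECDHE+AESGCM:kEDH+AESGCM:DHE-RSA-AES128-GCM-SHA256"
WITH_NOT_DH = BASE + ":!DH"

if __name__ == "__main__":
    before = tls12_cipher_names(BASE)
    after = tls12_cipher_names(WITH_NOT_DH)
    # OpenSSL names ephemeral-DH suites "DHE-..."; ECDHE suites are not matched by !DH.
    print("DHE suites without !DH:", [n for n in before if n.startswith("DHE-")])
    print("DHE suites with !DH:   ", [n for n in after if n.startswith("DHE-")])
```

If the second list comes out empty, removing DHE-RSA-AES128-GCM-SHA256 or `kEDH+AESGCM` from the string changes nothing on that build, which matches the "doesn't have any effect" review.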