[00:27:23] PROBLEM - Disk space on elastic1016 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 19811 MB (3% inode=99%):
[00:33:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[00:40:23] RECOVERY - Disk space on elastic1016 is OK: DISK OK
[01:03:23] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:07:23] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[01:10:23] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:23] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[01:21:24] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:25:53] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 29 data above and 1 below the confidence bounds
[01:25:53] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 29 data above and 1 below the confidence bounds
[01:26:13] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.003 second response time
[01:27:33] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:30:33] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[01:32:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[01:34:33] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
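Every alert in this log follows the shape `[HH:MM:SS] PROBLEM|RECOVERY - <check> on <host> is <STATE>: <detail>`. That regularity makes a flapping check, such as RAID on analytics1004, easy to measure. The sketch below pairs each PROBLEM with the next RECOVERY for the same check and host. It assumes only the line shape seen here and is not an Icinga API. The function name `outages` is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Alert line shape as it appears in this log (inferred from the log, not from Icinga docs):
#   [HH:MM:SS] PROBLEM - <check> on <host> is CRITICAL: <detail>
#   [HH:MM:SS] RECOVERY - <check> on <host> is OK: <detail>
ALERT_RE = re.compile(
    r"\[(?P<ts>\d{2}:\d{2}:\d{2})\] (?P<kind>PROBLEM|RECOVERY) - "
    r"(?P<check>.+?) on (?P<host>\S+) is (?P<state>[A-Z]+):"
)


def outages(lines):
    """Pair each PROBLEM with the next RECOVERY for the same (check, host).

    Returns (check, host, start, duration) tuples. Timestamps are same-day
    clock times, so an outage that spans midnight is not handled.
    """
    open_problems = {}
    result = []
    for line in lines:
        m = ALERT_RE.match(line)
        if not m:
            continue  # chat lines, gerrit notices, !log replies, ...
        t = datetime.strptime(m["ts"], "%H:%M:%S")
        key = (m["check"], m["host"])
        if m["kind"] == "PROBLEM":
            # Keep the first unresolved PROBLEM; repeated reminders are ignored.
            open_problems.setdefault(key, t)
        elif key in open_problems:
            start = open_problems.pop(key)
            result.append((m["check"], m["host"], start.strftime("%H:%M:%S"), t - start))
    return result


# Usage with four of the RAID lines above.
log = [
    "[01:03:23] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.",
    "[01:07:23] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0",
    "[01:10:23] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.",
    "[01:11:23] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0",
]
for check, host, start, duration in outages(log):
    print(check, host, start, duration)
```

Keeping only the first PROBLEM per key means that repeated reminders, such as the hourly "Puppet freshness" alerts, do not reset the outage start time.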
[01:38:33] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[01:54:03] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sat 02 Aug 2014 23:53:02 UTC
[02:17:43] !log LocalisationUpdate completed (1.24wmf15) at 2014-08-03 02:16:39+00:00
[02:17:54] Logged the message, Master
[02:19:02] (Abandoned) Tim Landscheidt: WIP: Add test suite [operations/apache-config] - https://gerrit.wikimedia.org/r/108880 (https://bugzilla.wikimedia.org/43266) (owner: Tim Landscheidt)
[02:23:32] (Abandoned) Tim Landscheidt: Add virtual host for wiki.toolserver.org [operations/apache-config] - https://gerrit.wikimedia.org/r/109460 (https://bugzilla.wikimedia.org/60222) (owner: Tim Landscheidt)
[02:28:48] !log LocalisationUpdate completed (1.24wmf16) at 2014-08-03 02:27:44+00:00
[02:28:54] Logged the message, Master
[02:34:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[02:40:03] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Sun 03 Aug 2014 00:39:52 UTC
[02:40:44] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Sun Aug 3 02:40:37 UTC 2014
[02:53:23] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sun Aug 3 02:53:15 UTC 2014
[03:30:03] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Aug 3 03:28:56 UTC 2014 (duration 28m 55s)
[03:30:08] Logged the message, Master
[03:33:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[04:35:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[05:17:53] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[05:31:53] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[05:34:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[06:29:03] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:13] PROBLEM - puppet last run on db1002 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:44] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 4 failures
[06:29:53] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: Puppet has 5 failures
[06:29:53] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:54] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:03] PROBLEM - puppet last run on mw1046 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:04] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:16] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:23] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:13] PROBLEM - puppet last run on searchidx1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[06:45:53] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[06:45:53] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:45:54] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:45:54] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:46:03] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:46:03] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:46:03] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[06:46:13] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[06:46:23] RECOVERY - puppet last run on searchidx1001 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:46:23] RECOVERY - puppet last run on db1002 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[06:47:24] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[07:30:34] PROBLEM - puppetmaster backend https on strontium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:31:24] RECOVERY - puppetmaster backend https on strontium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.033 second response time
[07:35:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[07:45:04] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 124 seconds
[07:45:43] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds
[08:14:03] PROBLEM - Puppet freshness on db1011 is CRITICAL: Last successful Puppet run was Sun 03 Aug 2014 06:13:14 UTC
[08:37:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[09:02:03] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Sun 03 Aug 2014 07:00:56 UTC
[09:36:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[09:53:23] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Sun Aug 3 09:53:15 UTC 2014
[09:58:03] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sun 03 Aug 2014 07:57:01 UTC
[10:17:03] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Sun Aug 3 10:16:55 UTC 2014
[10:20:43] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Sun Aug 3 10:20:33 UTC 2014
[10:30:33] PROBLEM - puppetmaster backend https on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:31:23] RECOVERY - puppetmaster backend https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.034 second response time
[10:38:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[11:37:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[12:01:33] PROBLEM - puppet last run on lvs4001 is CRITICAL: CRITICAL: Puppet has 1 failures
[12:19:33] RECOVERY - puppet last run on lvs4001 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[12:39:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[13:38:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[13:40:23] (PS1) Yuvipanda: quarry: Use separate worker module for celery [operations/puppet] - https://gerrit.wikimedia.org/r/151409
[14:40:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[14:57:40] (CR) Tim Landscheidt: [C: 1] tools: Install package python-pygments [operations/puppet] - https://gerrit.wikimedia.org/r/151295 (https://bugzilla.wikimedia.org/69050) (owner: Hedonil)
[15:13:16] (PS1) Tim Landscheidt: Tools: Install libgd-gd2-perl [operations/puppet] - https://gerrit.wikimedia.org/r/151416 (https://bugzilla.wikimedia.org/67199)
[15:25:30] (CR) Tim Landscheidt: "@Dzahn: Never done that and don't know if I have the permissions for that." [operations/puppet] - https://gerrit.wikimedia.org/r/124001 (owner: Tim Landscheidt)
[15:39:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[15:44:37] (CR) Tim Landscheidt: "The "puppet errors" were probably due to paravoid's removal of files/ganglia/collect_exim_stats_via_gmetric in I076a849434594efb20da95badc" [operations/puppet] - https://gerrit.wikimedia.org/r/143167 (owner: Yuvipanda)
[15:45:15] (PS1) Calak: i18n: Enable more features for fawiki AbuseFilter [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151421
[15:48:41] (CR) Tim Landscheidt: "Reverted in I6a4a4044ee2257fde1135107147c824723501ee8." [operations/puppet] - https://gerrit.wikimedia.org/r/143251 (owner: Matanya)
[16:04:40] (PS2) Calak: Enable more features for fawiki AbuseFilter [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151421 (https://bugzilla.wikimedia.org/69073)
[16:15:02] (CR) Reza: [C: 1] "Thank you!" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151421 (https://bugzilla.wikimedia.org/69073) (owner: Calak)
[16:38:03] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sun 03 Aug 2014 14:37:21 UTC
[16:41:03] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Sun 03 Aug 2014 14:40:29 UTC
[16:41:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[16:57:44] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Sun Aug 3 16:57:42 UTC 2014
[17:00:53] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Sun Aug 3 17:00:49 UTC 2014
[17:40:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[18:04:10] (CR) Legoktm: [C: -1] Enable more features for fawiki AbuseFilter (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151421 (https://bugzilla.wikimedia.org/69073) (owner: Calak)
[18:37:53] PROBLEM - puppetmaster backend https on strontium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:38:04] (PS3) Calak: Enable more features for fawiki AbuseFilter [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151421 (https://bugzilla.wikimedia.org/69073)
[18:38:43] RECOVERY - puppetmaster backend https on strontium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.021 second response time
[18:39:30] (CR) Reza: [C: 1] Enable more features for fawiki AbuseFilter [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151421 (https://bugzilla.wikimedia.org/69073) (owner: Calak)
[18:42:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[18:42:54] (CR) Calak: "@Legoktm: Thank you, done." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151421 (https://bugzilla.wikimedia.org/69073) (owner: Calak)
[19:07:28] Do we use a ::prefix for including classes, i. e. "include mediawiki::multimedia::fonts" or "include ::mediawiki::multimedia::fonts"? The former is used *far* more often (~ 10 : 1), but https://wikitech.wikimedia.org/wiki/Puppet_usage isn't explicit about that.
[19:11:10] scfc_de: heh, it isn't explicit about most things
[19:17:33] (PS1) Tim Landscheidt: Tools: Include mediawiki::multimedia::fonts in exec_environ [operations/puppet] - https://gerrit.wikimedia.org/r/151440 (https://bugzilla.wikimedia.org/66354)
[19:39:36] aaron@terbium:~$ mwscript showJobs.php enwiki --group
[19:39:38] refreshLinks: 0 queued; 49 claimed (4 active, 45 abandoned); 0 delayed
[19:39:46] * AaronSchulz celebrates \o/
[19:40:06] AaronSchulz: is this on hhvm?
[19:40:21] ?
[19:40:31] mw1053 isn't running atm if that's what you mean
[19:40:44] ah
[19:40:52] gwtoolsetUploadMediafileJob: 0 queued; 14996 claimed (0 active, 14996 abandoned); 0 delayed
[19:40:58] hmm, sad panda for those
[19:41:02] heh
[19:41:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[19:41:15] on commons
[19:41:45] so basically 90% of the queue is just parsoid ;)
[19:42:12] it has as many dedicated runners as the main loop for most jobs
[19:42:27] * AaronSchulz wonders how much you have to throw at it
[20:13:03] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sun 03 Aug 2014 18:12:52 UTC
[20:33:03] PROBLEM - Puppet freshness on db1011 is CRITICAL: Last successful Puppet run was Sun 03 Aug 2014 18:32:49 UTC
[20:33:06] Are we aware that Gmail categorises some e-mails sent through Special:Emailuser as phishing?
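The `showJobs.php --group` output quoted above prints one line per job type. Each line gives counts of queued, claimed (split into active and abandoned) and delayed jobs. The sketch below parses one such line. It assumes only the layout visible in this log. `JobStats` and `parse_showjobs_line` are hypothetical names, not part of MediaWiki.

```python
import re
from typing import NamedTuple


class JobStats(NamedTuple):
    job_type: str
    queued: int
    claimed: int
    active: int
    abandoned: int
    delayed: int


# Line layout as printed in the log above, e.g.
#   refreshLinks: 0 queued; 49 claimed (4 active, 45 abandoned); 0 delayed
LINE_RE = re.compile(
    r"(?P<job_type>\w+): (?P<queued>\d+) queued; "
    r"(?P<claimed>\d+) claimed \((?P<active>\d+) active, (?P<abandoned>\d+) abandoned\); "
    r"(?P<delayed>\d+) delayed"
)


def parse_showjobs_line(line: str) -> JobStats:
    """Parse one `showJobs.php --group` line into integer counts."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised showJobs line: {line!r}")
    d = m.groupdict()
    # Field order after job_type matches the named groups: queued, claimed, active, abandoned, delayed.
    return JobStats(d["job_type"], *(int(d[k]) for k in JobStats._fields[1:]))


# Usage with the two lines from the log.
for line in (
    "refreshLinks: 0 queued; 49 claimed (4 active, 45 abandoned); 0 delayed",
    "gwtoolsetUploadMediafileJob: 0 queued; 14996 claimed (0 active, 14996 abandoned); 0 delayed",
):
    s = parse_showjobs_line(line)
    # In both lines shown in the log, claimed = active + abandoned.
    assert s.claimed == s.active + s.abandoned
    print(f"{s.job_type}: {s.abandoned / s.claimed:.0%} of claimed jobs abandoned")
```

The regex is deliberately strict. A line in any other layout raises `ValueError` instead of being silently miscounted.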
[20:33:34] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sun Aug 3 20:33:31 UTC 2014
[20:33:34] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Sun Aug 3 20:33:31 UTC 2014
[20:33:35] It happened to me just now, for the first time, never seen that before
[20:43:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[21:09:28] odder: I filed a bug about it several months ago
[21:09:47] (please add your information there)
[21:11:53] https://bugzilla.wikimedia.org/buglist.cgi?quicksearch=phishing
[21:16:29] sigh
[21:18:27] odder: https://bugzilla.wikimedia.org/show_bug.cgi?id=56416
[21:37:03] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sun 03 Aug 2014 19:36:48 UTC
[21:42:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[21:43:05] (PS1) Tim Landscheidt: Set up redirects for toolserver.org [operations/puppet] - https://gerrit.wikimedia.org/r/151523 (https://bugzilla.wikimedia.org/60238)
[21:43:24] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[21:44:55] (CR) Tim Landscheidt: "Moved to I398127fe8e8531689a9ee67d56e69f3d06020714." (1 comment) [operations/apache-config] - https://gerrit.wikimedia.org/r/108465 (https://bugzilla.wikimedia.org/60238) (owner: Tim Landscheidt)
[21:46:19] (CR) Tim Landscheidt: "Incorporated Krinkle's advice at I6c320b1d46176aca6b29731527530461f1696fd5 and the changes of Ie16e5de91c68ce9c7e4a3332cd14431033e52135 an" [operations/puppet] - https://gerrit.wikimedia.org/r/151523 (https://bugzilla.wikimedia.org/60238) (owner: Tim Landscheidt)
[21:53:05] (PS1) Tim Landscheidt: Tools: Sort package lists alphabetically [operations/puppet] - https://gerrit.wikimedia.org/r/151526
[21:57:14] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Sun Aug 3 21:57:11 UTC 2014
[21:59:23] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[22:44:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[23:32:44] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:33:03] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sun 03 Aug 2014 21:32:40 UTC
[23:33:23] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sun Aug 3 23:33:18 UTC 2014
[23:34:13] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 29 data above and 0 below the confidence bounds
[23:34:23] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 29 data above and 0 below the confidence bounds
[23:34:34] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.006 second response time
[23:43:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[23:59:02] (PS1) Tim Landscheidt: Tools: Install php5-imagick [operations/puppet] - https://gerrit.wikimedia.org/r/151551 (https://bugzilla.wikimedia.org/69078)
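The `HTTP 5xx req/min` alerts report what share of datapoints sits above a threshold. The critical threshold is 500.0, and recovery is reported against 250.0. The sketch below shows that kind of check under several assumptions: a plain list of per-minute values with `None` for gaps, and a 1% trigger level read off the "Less than 1.00%" recovery message. The real window length and trigger percentages are not stated in the log. The series and the names `percent_above` and `classify` are hypothetical. For scale, 2 of 15 datapoints gives 13.33%, and 1 of 14 gives 7.14%.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-missing datapoints strictly above `threshold`."""
    values = [v for v in datapoints if v is not None]  # gaps are represented as None
    if not values:
        return 0.0
    return 100.0 * sum(v > threshold for v in values) / len(values)


def classify(datapoints, warning=250.0, critical=500.0, min_percent=1.0):
    """Hypothetical state logic; min_percent=1.0 is an assumption, not a known setting."""
    if percent_above(datapoints, critical) >= min_percent:
        return "CRITICAL"
    if percent_above(datapoints, warning) >= min_percent:
        return "WARNING"
    return "OK"


# Hypothetical series: 15 per-minute 5xx counts, two of which exceed 500.
series = [120.0] * 13 + [640.0, 910.0]
print(f"{percent_above(series, 500.0):.2f}% of data above the critical threshold [500.0]")
print(classify(series))
```

Counting the share of points over the line, rather than alerting on a single point, keeps one noisy minute from paging on its own. The trade-off is that a short spike can still cross a low percentage in a small window.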