[00:00:13] mutante twentyafterfour: I'll be back in a couple of hours
[00:05:40] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Puppet last ran 4 hours ago
[00:06:40] (PS10) BryanDavis: [WIP] Add role::mediawiki_vagrant_lxc [puppet] - https://gerrit.wikimedia.org/r/193665 (https://phabricator.wikimedia.org/T90892)
[00:09:59] (PS4) GWicke: Enable group1 wikis in RESTBase [puppet] - https://gerrit.wikimedia.org/r/198433 (https://phabricator.wikimedia.org/T93452)
[00:14:24] (CR) jenkins-bot: [V: -1] Enable group1 wikis in RESTBase [puppet] - https://gerrit.wikimedia.org/r/198433 (https://phabricator.wikimedia.org/T93452) (owner: GWicke)
[00:14:50] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[00:15:10] (PS1) Dzahn: openstack firewall: avoid hardcoding tendril IP [puppet] - https://gerrit.wikimedia.org/r/201875
[00:15:52] (PS11) BryanDavis: [WIP] Add role::mediawiki_vagrant_lxc [puppet] - https://gerrit.wikimedia.org/r/193665 (https://phabricator.wikimedia.org/T90892)
[00:24:23] (PS1) Tim Landscheidt: gridengine: Remove unused file [puppet] - https://gerrit.wikimedia.org/r/201878
[00:24:31] mutante: I'm debugging an apache issue on gallium. It seems root is required to read the access/error log.
[00:24:40] Could you maybe take a peek so I know where to look?
[00:25:05] gallium:/var/log/apache2/integration_error.log
[00:25:13] (PS1) Dzahn: openstack firewall: avoid hardcoding iron IP [puppet] - https://gerrit.wikimedia.org/r/201879
[00:27:43] Krinkle: copied it to your home dir
[00:27:45] Krinkle: https://phabricator.wikimedia.org/P475
[00:27:46] on gallium
[00:27:48] oh
[00:27:49] heh :)
[00:28:04] sorry, i lag, i'm on a bus
[00:28:05] thx both :)
[00:34:52] OK. I can reproduce it locally now
[00:40:19] PROBLEM - salt-minion processes on db1035 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[00:45:30] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1179642 (MZMcBride) >>! In T92438#1179308, @Eloquence wrote: > If so, I'm not sure there is a good option here other than 1) going back to the old name...
[00:45:35] Krenair: ee-prototype did not make it to the labs restbase config yet, maybe the puppet master wasn't updated
[00:47:35] (PS1) Dzahn: openstack firewall: get designate host from hiera [puppet] - https://gerrit.wikimedia.org/r/201880
[01:08:52] (CR) Negative24: "I'm sure @chasemp wouldn't want a "partial configuration" (as he expressed in I3365925902d9e636377fc18b6ee2eb6008a6268e) which it is at th" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/201864 (https://phabricator.wikimedia.org/T95062) (owner: Negative24)
[01:09:24] (PS1) Dzahn: WIP - contint, move zuul_merger_hosts to hiera [puppet] - https://gerrit.wikimedia.org/r/201882
[01:10:40] mutante twentyafterfour: Or just one hour.
[01:11:41] (CR) Dzahn: "actually i don't see any comment from Chase there" [puppet] - https://gerrit.wikimedia.org/r/201864 (https://phabricator.wikimedia.org/T95062) (owner: Negative24)
[01:15:46] Negative24: rush and twentyafterfour know better and i don't see his comment there actually.
was it an IRC conversation without gerrit trail that led to abandoning that? well, i gotta run, my bus arrived, cya
[01:16:12] mutante: it was IRC
[01:17:49] (CR) Negative24: "http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs/20150325.txt at 21:08." [puppet] - https://gerrit.wikimedia.org/r/201864 (https://phabricator.wikimedia.org/T95062) (owner: Negative24)
[01:43:05] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1179687 (Jalexander) Err, the 'name' is one thing that I guess we can keep discussing however I was under the impression that Victoria had already said...
[01:52:43] (CR) 20after4: "I don't really have an objection to this really but currently Phabricator doesn't update automatically, so the storage upgrades being auto" [puppet] - https://gerrit.wikimedia.org/r/201864 (https://phabricator.wikimedia.org/T95062) (owner: Negative24)
[01:53:10] (CR) 20after4: "Chase explicitly set it up to upgrade manually, that's why I suspect he might have an opinion." [puppet] - https://gerrit.wikimedia.org/r/201864 (https://phabricator.wikimedia.org/T95062) (owner: Negative24)
[01:57:14] (CR) 20after4: [C: 1] Puppet run storage upgrade for phd service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/201864 (https://phabricator.wikimedia.org/T95062) (owner: Negative24)
[01:57:36] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1179692 (Eloquence) Ah, thanks for pointing that out, James. Daniel asked me about this ticket and I assumed (given the ticket description and Daniel's...
[01:58:31] PROBLEM - puppet last run on rdb2003 is CRITICAL: CRITICAL: puppet fail
[02:03:54] (PS1) Tim Landscheidt: apache: Mute warnings about right-to-left relationships [puppet] - https://gerrit.wikimedia.org/r/201884
[02:05:47] (PS2) Tim Landscheidt: apache: Mute warnings about right-to-left relationships [puppet] - https://gerrit.wikimedia.org/r/201884 (https://phabricator.wikimedia.org/T87132)
[02:15:19] RECOVERY - puppet last run on rdb2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:22:10] PROBLEM - puppet last run on eventlog1001 is CRITICAL: CRITICAL: Puppet last ran 4 hours ago
[02:28:33] !log l10nupdate Synchronized php-1.25wmf23/cache/l10n: (no message) (duration: 09m 10s)
[02:28:44] Logged the message, Master
[02:35:11] !log LocalisationUpdate completed (1.25wmf23) at 2015-04-04 02:34:07+00:00
[02:35:16] Logged the message, Master
[02:45:29] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0]
[02:51:27] !log l10nupdate Synchronized php-1.25wmf24/cache/l10n: (no message) (duration: 05m 58s)
[02:51:36] Logged the message, Master
[02:51:41] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1179732 (MZMcBride) >>! In T92438#1179201, @Eloquence wrote: > In any case, it's up to the shop team to pick the name for the thing :) In consultation...
[02:56:01] !log LocalisationUpdate completed (1.25wmf24) at 2015-04-04 02:54:57+00:00
[02:56:06] Logged the message, Master
[02:57:10] RECOVERY - High load for whatever reason on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[03:03:59] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0]
[03:24:10] RECOVERY - High load for whatever reason on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[04:39:49] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [24.0]
[04:59:30] PROBLEM - puppet last run on analytics1012 is CRITICAL: CRITICAL: puppet fail
[05:14:59] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0]
[05:17:59] RECOVERY - puppet last run on analytics1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:35:31] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Apr 4 05:34:27 UTC 2015 (duration 34m 26s)
[05:35:37] Logged the message, Master
[06:30:01] PROBLEM - puppet last run on mw2059 is CRITICAL: CRITICAL: puppet fail
[06:31:19] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:39] PROBLEM - puppet last run on db2040 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:39] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:10] PROBLEM - puppet last run on mw2097 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:00] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:35:30] PROBLEM - puppet last run on mw2134 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:46:29] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[06:46:50] RECOVERY - puppet last run on db2040 is OK: OK:
Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:46:50] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:46:50] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:19] RECOVERY - puppet last run on mw2097 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[06:47:20] RECOVERY - puppet last run on mw2134 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:48:30] RECOVERY - puppet last run on mw2059 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:52:40] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [35.0]
[06:54:20] RECOVERY - Persistent high iowait on labstore1001 is OK: OK: Less than 50.00% above the threshold [25.0]
[07:15:50] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100%
[07:17:30] RECOVERY - Host mw2027 is UP: PING OK - Packet loss = 0%, RTA = 43.91 ms
[07:55:30] Where is the repo for frack puppet?
[07:56:49] operations, Datasets-General-or-Unknown: dumps.wikimedia.org seems super-slow right now - https://phabricator.wikimedia.org/T45647#1179843 (Nemo_bis) It would be nice to have those nginx stats for dumps hosts: https://ganglia.wikimedia.org/latest/?r=month&cs=&ce=&m=cpu_report&c=Miscellaneous+eqiad&h=datas...
[07:57:08] (Context: https://phabricator.wikimedia.org/T45647#1179843 )
[08:30:09] RECOVERY - High load for whatever reason on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[08:48:39] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0]
[08:51:59] RECOVERY - High load for whatever reason on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[09:00:29] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [24.0]
[09:15:29] RECOVERY - High load for whatever reason on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[09:27:19] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [24.0]
[09:32:19] RECOVERY - High load for whatever reason on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[09:42:30] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [24.0]
[09:55:49] RECOVERY - High load for whatever reason on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[10:04:19] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [24.0]
[10:09:19] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [24.0]
[10:19:19] RECOVERY - High load for whatever reason on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[11:43:45] (CR) Hashar: [C: 1] apache: Mute warnings about right-to-left relationships [puppet] - https://gerrit.wikimedia.org/r/201884 (https://phabricator.wikimedia.org/T87132) (owner: Tim Landscheidt)
[11:47:05] Puppet, operations, Labs, Patch-For-Review, Regression: Puppet: "Package[gdb] is already declared in file
modules/java/manifests/tools.pp" - https://phabricator.wikimedia.org/T94917#1179901 (hashar) Open>Resolved >>! In T94917#1179105, @Dzahn wrote: > merged. should be fixed now. wanna confi...
[11:47:20] Puppet, operations, Continuous-Integration, Patch-For-Review, Regression: Puppet: "Package[git-core] is already declared in file modules/authdns/manifests/scripts.pp" - https://phabricator.wikimedia.org/T94921#1179904 (hashar) Open>Resolved >>! In T94921#1179104, @Dzahn wrote: > should be f...
[13:06:10] PROBLEM - puppet last run on cp3045 is CRITICAL: CRITICAL: puppet fail
[13:23:00] RECOVERY - puppet last run on cp3045 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[15:51:01] PROBLEM - Outgoing network saturation on labstore1001 is CRITICAL: CRITICAL: 20.69% of data above the critical threshold [100000000.0]
[15:58:03] (PS1) Glaisher: Change project name to 'Wikipedia' at astwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/201897 (https://phabricator.wikimedia.org/T94341)
[16:19:31] RECOVERY - Outgoing network saturation on labstore1001 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[16:20:10] PROBLEM - puppet last run on mw2052 is CRITICAL: CRITICAL: puppet fail
[16:38:30] RECOVERY - puppet last run on mw2052 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[17:27:59] PROBLEM - HHVM rendering on mw1081 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.013 second response time
[17:28:50] PROBLEM - Apache HTTP on mw1081 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.009 second response time
[17:28:59] PROBLEM - HHVM processes on mw1081 is CRITICAL: PROCS CRITICAL: 0 processes with command name hhvm
[17:30:30] RECOVERY - Apache HTTP on mw1081 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.052 second response time
[17:30:30] RECOVERY - HHVM processes on mw1081 is OK: PROCS OK: 25
processes with command name hhvm
[17:31:19] RECOVERY - HHVM rendering on mw1081 is OK: HTTP OK: HTTP/1.1 200 OK - 67932 bytes in 0.179 second response time
[17:39:29] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[18:01:19] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[18:16:34] (CR) Alex Monk: [C: 1] "Will just fall back to a 1x1 image on wikis where it's false, which is OK" [mediawiki-config] - https://gerrit.wikimedia.org/r/194084 (https://phabricator.wikimedia.org/T91340) (owner: MaxSem)
[18:33:44] (CR) Alex Monk: Add wikitech config. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/155789 (owner: Andrew Bogott)
[18:52:44] anyone from ops happen to be around and willing to look for an icon file that may have been left on virt1000?
[18:53:04] but should really be on bits?
[20:39:44] (PS1) Alex Monk: Convert all Bugzilla numbers to Phabricator ticket numbers [mediawiki-config] - https://gerrit.wikimedia.org/r/201910
[20:49:49] (PS2) Alex Monk: Convert all Bugzilla numbers to Phabricator ticket numbers [mediawiki-config] - https://gerrit.wikimedia.org/r/201910
[20:51:04] (CR) Alex Monk: "guess we'll have to deal with all these /[Bb]ug T(\d+)(,| and)/ in another commit" [mediawiki-config] - https://gerrit.wikimedia.org/r/201910 (owner: Alex Monk)
[21:57:00] PROBLEM - Outgoing network saturation on labstore1001 is CRITICAL: CRITICAL: 13.79% of data above the critical threshold [100000000.0]
[22:42:20] RECOVERY - Outgoing network saturation on labstore1001 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[22:44:05] YuviPanda: prod for anyone from ops happen to be around and willing to look for an icon file that may have been left on virt1000?
[22:44:36] oh, sure Krenair
[22:47:25] YuviPanda: https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L10566 is the file he's on about. should be on bits but isn't and likely still on virt1000
[22:50:20] I’ll take a (slow) look
[22:50:38] (CR) John F. Lewis: [C: 1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/188204 (owner: Dzahn)
[22:51:21] (CR) John F. Lewis: [C: 1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/201880 (owner: Dzahn)
[23:01:21] PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: puppet fail
[23:12:23] Krenair: I couldn't find that file on /srv/mediawiki on virt1000
[23:12:38] > yuvipanda@virt1000:/srv/mediawiki$ find . -name 'Wikitech-apple-touch-icon.png'
[23:12:39] >
[23:13:21] it's probably gone
[23:13:21] oh well
[23:13:56] thanks for looking YuviPanda
[23:14:03] Krenair: it doesn't have MF installed, IIRC...
[23:14:26] oh it
[23:14:27] does
[23:14:31] ?userformat=mobile works
[23:18:00] RECOVERY - puppet last run on cp4015 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[23:38:05] (CR) Yuvipanda: "This might also break a bunch of other things that rely on graphite, like all the check_graphite stuff. I've nothing against merging this," [puppet] - https://gerrit.wikimedia.org/r/181949 (owner: Hoo man)