[00:23:57] (PS1) Springle: Reduce MariaDB thread_pool_stall_limit to 100ms [puppet] - https://gerrit.wikimedia.org/r/192254
[00:26:11] (CR) Springle: [C: 2] Reduce MariaDB thread_pool_stall_limit to 100ms [puppet] - https://gerrit.wikimedia.org/r/192254 (owner: Springle)
[01:19:17] Nemo_bis: wtf?
[01:19:39] did you really just send a gripe email to 175 mailman owner addresses?
[01:20:17] could you not have just filed a bug?
[01:43:03] PROBLEM - puppet last run on virt1012 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:00:23] RECOVERY - puppet last run on virt1012 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[02:02:24] !log l10nupdate Synchronized php-1.25wmf17/cache/l10n: (no message) (duration: 00m 01s)
[02:02:33] Logged the message, Master
[02:03:32] !log LocalisationUpdate completed (1.25wmf17) at 2015-02-23 02:02:28+00:00
[02:03:37] Logged the message, Master
[02:03:57] !log l10nupdate Synchronized php-1.25wmf18/cache/l10n: (no message) (duration: 00m 01s)
[02:04:00] Logged the message, Master
[02:05:04] !log LocalisationUpdate completed (1.25wmf18) at 2015-02-23 02:04:00+00:00
[02:05:08] Logged the message, Master
[02:16:39] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Feb 23 02:15:36 UTC 2015 (duration 15m 35s)
[02:16:43] Logged the message, Master
[03:33:33] (PS3) KartikMistry: WIP: Do not use registry and fallback to config.default.js [puppet] - https://gerrit.wikimedia.org/r/191263
[05:05:39] (PS4) KartikMistry: WIP: Do not use registry and fallback to config.default.js [puppet] - https://gerrit.wikimedia.org/r/191263
[05:38:46] (PS1) Springle: depool db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/192269
[05:39:52] (CR) Springle: [C: 2] depool db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/192269 (owner: Springle)
[05:39:57] (Merged) jenkins-bot: depool db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/192269 (owner: Springle)
[05:41:05] !log springle Synchronized wmf-config/db-eqiad.php: depool db1066 (duration: 00m 06s)
[05:41:12] Logged the message, Master
[05:42:43] (CR) BryanDavis: "If needed, by all means turn the logging level back up. The main driver for disabling the debug and info log messages was to relieve press" [mediawiki-config] - https://gerrit.wikimedia.org/r/192083 (owner: Hoo man)
[06:28:25] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:45] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:45] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:45] PROBLEM - puppet last run on db1034 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:03] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:24] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:54] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:04] PROBLEM - puppet last run on ms-fe2003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:04] PROBLEM - puppet last run on db2040 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:54] RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:46:13] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:46:14] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:46:14] RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:46:14] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[06:46:23] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:46:24] RECOVERY - puppet last run on ms-fe2003 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[06:46:24] RECOVERY - puppet last run on db2040 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:46:54] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:51:31] It wasn't meant to be so
[06:51:50] hmm, usually whenever this happens _joe_ pops up
[06:52:40] operations, Wikimedia-Mailing-lists: Let public archives be indexed and archived - https://phabricator.wikimedia.org/T90407#1058223 (Effeietsanders) As someone who also contributed quite a number of messages (but I won't bother counting them), I can very much feel with the feeling that one does not want t...
[06:52:48] just cron.daily running logrotate restarting the puppetmaster
[06:52:58] yeah
[06:53:51] <_joe_> hey
[06:53:57] hah!
[06:53:57] <_joe_> good morning :)
[06:55:01] _joe_: morning!
[06:55:27] <_joe_> YuviPanda: you back?
[06:55:37] _joe_: yup!
[06:55:41] _joe_: although doing IST.
[06:55:43] <_joe_> I'm sorry for you :/
[06:56:14] _joe_: :D I'm still going to be travelling for the next two weeks, though.
[07:34:34] (CR) Yuvipanda: [C: 1] Labs: Increase labnet1001 conntrack tables [puppet] - https://gerrit.wikimedia.org/r/190214 (https://phabricator.wikimedia.org/T72076) (owner: coren)
[07:39:45] PROBLEM - very high load average likely xfs on ms-be1009 is CRITICAL: CRITICAL - load average: 255.44, 122.52, 56.67
[07:44:54] operations, Continuous-Integration, Patch-For-Review: move mediawiki php config files to /etc/php5/mods-available - https://phabricator.wikimedia.org/T90005#1058293 (Joe) Open>Resolved
[07:48:55] operations: Fix the puppet catalog compiler - https://phabricator.wikimedia.org/T90417#1058302 (Joe) a:Joe
[08:38:46] (CR) Yuvipanda: [C: -1] "Ok, so I'm inclined to -2 any change that introduces new perl into ops/puppet." [puppet] - https://gerrit.wikimedia.org/r/192172 (https://phabricator.wikimedia.org/T90331) (owner: Tim Landscheidt)
[08:39:41] <_joe_> what's that perl script?
[08:40:36] (CR) Yuvipanda: Tools: Install at (1 comment) [puppet] - https://gerrit.wikimedia.org/r/191521 (https://phabricator.wikimedia.org/T72324) (owner: Tim Landscheidt)
[08:41:09] _joe_: I'm not fully sure.
[08:41:27] _joe_: I'm also not fully sure what exactly it does
[08:41:36] <_joe_> well, that both of us are not sure just looking at the code is already bad enough
[08:41:58] hmm, zero comments
[08:42:00] and has code like
[08:42:01] next unless $stat =~ m/^([0-9]+) \((.*)\) ([RSDZTW]) ([0-9]+) ([0-9]+) /;
[08:42:27] (CR) Yuvipanda: "Also on the code having no comments on what exactly jobkill does." [puppet] - https://gerrit.wikimedia.org/r/192172 (https://phabricator.wikimedia.org/T90331) (owner: Tim Landscheidt)
[08:43:02] <_joe_> YuviPanda: wat?
[08:43:02] _joe_: I *think* it is trying to kill a whole process group?
[08:45:17] _joe_: so to kill a job GridEngine calls this script with the pid
[08:45:32] _joe_: and I think the script finds the job's children and kills everything
[08:47:09] <_joe_> YuviPanda: which would be matter of just stopping the service if we used some normal supervisor instead of a big bowl of perl for that task
[08:47:45] _joe_: this could also be a terribleness inflicted upon us because gridengine
[08:47:55] <_joe_> no
[08:48:27] <_joe_> and in case, burden of proof is on you toollabs people
[08:48:37] <_joe_> I'll review the thing later
[08:48:39] oh I completely agree
[08:48:41] on that
[08:50:35] <_joe_> btw, I do agree we should puppetize it if it's not
[08:50:49] TimStarling: I try to learn the lesson, thanks again for your comment https://meta.wikimedia.org/wiki/Mailing_lists/Administration#Communication_with_list_administrators
[08:51:01] _joe_: yeah, and hence my -1 rather than -2
[08:51:12] <_joe_> we just need a phab task to remove the whole bigbrother/portwhatever/groupkill thing
[08:51:58] _joe_: yup, agreed. we need service manifests.
[08:52:26] _joe_: I'm wondering, at least for web services, if we can basically kill OGE, and just use upstart/systemd with cgroups or something for limiting memory / CPU usage
[08:53:03] <_joe_> YuviPanda: not really
[08:53:14] <_joe_> you still need OGE to schedule jobs
[08:53:29] _joe_: right, just for webservices.
[08:53:32] oh, hmm
[08:53:34] <_joe_> but OGE whould do something like launching and stopping some supervisor maybe
[08:53:43] would need to distribute across machines
[08:53:48] and re-distribute when machines die
[08:53:48] <_joe_> yes
[08:54:03] <_joe_> or, we use something more modern ;)
[08:54:14] (and sorry)
[08:54:17] <_joe_> but we're moving to my usual general and pointless rant
[08:54:40] _joe_: anything that doesn't involve a large amount of Java with lots of pointers about how it makes running other large amounts of Java easier? :)
[08:55:30] <_joe_> there are such things, but I never tried those
[08:55:40] _joe_: I haven't properly looked yet.
[08:55:50] the thing I definitely don't want to do at all is build our own :)
[08:56:48] _joe_: anyway, if we do make service manifests properly, it shouldn't matter what the underlying scheduler is
[08:56:57] fuuuuuttttuuuurrrreeeee wwwoooorrrkkkk
[08:58:20] (PS2) Yuvipanda: Tools: Install byobu [puppet] - https://gerrit.wikimedia.org/r/191368 (https://phabricator.wikimedia.org/T88989) (owner: Tim Landscheidt)
[08:59:48] (CR) Yuvipanda: [C: 2] Tools: Install byobu [puppet] - https://gerrit.wikimedia.org/r/191368 (https://phabricator.wikimedia.org/T88989) (owner: Tim Landscheidt)
[09:00:31] fucking hate the 'yes' command
[09:01:01] operations, Beta-Cluster: mwscript hardcoded with user 'apache', should use ::mediawiki::users::web - https://phabricator.wikimedia.org/T89165#1058350 (yuvipanda) a:Joe
[09:02:55] (PS2) Yuvipanda: Tools: Make crontab host configurable [puppet] - https://gerrit.wikimedia.org/r/190977 (https://phabricator.wikimedia.org/T87387) (owner: Tim Landscheidt)
[09:04:39] (PS1) Giuseppe Lavagetto: Execute scap-provided script as mediawiki::users::web [puppet] - https://gerrit.wikimedia.org/r/192286
[09:04:45] (CR) Yuvipanda: [C: -1] "Hmm, this feels hacky. Doesn't our LDAP setup already have a 'min number' below which it will refuse to manage users? Also basing 'system " [puppet] - https://gerrit.wikimedia.org/r/190978 (https://phabricator.wikimedia.org/T87527) (owner: Tim Landscheidt)
[09:05:17] greetings
[09:05:23] operations, Beta-Cluster, Patch-For-Review: mwscript hardcoded with user 'apache', should use ::mediawiki::users::web - https://phabricator.wikimedia.org/T89165#1058351 (Joe)
[09:05:32] (PS3) Yuvipanda: Tools: Make crontab host configurable [puppet] - https://gerrit.wikimedia.org/r/190977 (https://phabricator.wikimedia.org/T87387) (owner: Tim Landscheidt)
[09:05:42] operations, Beta-Cluster, Patch-For-Review: mwscript hardcoded with user 'apache', should use ::mediawiki::users::web - https://phabricator.wikimedia.org/T89165#1028958 (Joe) see https://gerrit.wikimedia.org/r/#/c/192286/
[09:06:13] yo YuviPanda ! welcome back
[09:06:17] :D
[09:06:29] (CR) Yuvipanda: [C: 2] "(super-minor spacing / trailing comma change is what I made)" [puppet] - https://gerrit.wikimedia.org/r/190977 (https://phabricator.wikimedia.org/T87387) (owner: Tim Landscheidt)
[09:08:56] (CR) Giuseppe Lavagetto: "If I have to get into the details of what this script does and how it does that, this would be a -2." [puppet] - https://gerrit.wikimedia.org/r/192172 (https://phabricator.wikimedia.org/T90331) (owner: Tim Landscheidt)
[09:09:14] (PS2) Yuvipanda: Execute scap-provided script as mediawiki::users::web [puppet] - https://gerrit.wikimedia.org/r/192286 (https://phabricator.wikimedia.org/T89165) (owner: Giuseppe Lavagetto)
[09:09:25] (CR) Yuvipanda: [C: 1] Execute scap-provided script as mediawiki::users::web [puppet] - https://gerrit.wikimedia.org/r/192286 (https://phabricator.wikimedia.org/T89165) (owner: Giuseppe Lavagetto)
[09:11:56] (CR) Yuvipanda: "Cool! Did you test this on toolsbeta? (or somewhere?)" [puppet] - https://gerrit.wikimedia.org/r/145441 (owner: Tim Landscheidt)
[09:12:57] godog: https://gerrit.wikimedia.org/r/#/c/123903/3 maybe when you have the chance? (also added you as reviewer)
[09:13:44] YuviPanda: sure, looks simple enough
[09:13:53] godog: :)
[09:19:49] (CR) Yuvipanda: [C: 1] "lgtm! need to remove that dependent patch, however." [puppet] - https://gerrit.wikimedia.org/r/186627 (https://phabricator.wikimedia.org/T86445) (owner: Tim Landscheidt)
[09:20:35] (CR) Yuvipanda: "but... but... why?" [software] - https://gerrit.wikimedia.org/r/191846 (owner: Tim Landscheidt)
[09:22:01] (CR) Yuvipanda: [C: 1] "lgtm, but I wonder what our git::clone will do to local changes, etc. Will merge later this week after verifying." [puppet] - https://gerrit.wikimedia.org/r/148172 (owner: Tim Landscheidt)
[09:26:42] !log reboot ms-be1009, xfs hosed
[09:26:45] Logged the message, Master
[09:27:14] (CR) Yuvipanda: "*poke*. Is this still running unpuppetized from somewhere?" [puppet] - https://gerrit.wikimedia.org/r/120186 (owner: Tim Landscheidt)
[09:31:24] RECOVERY - very high load average likely xfs on ms-be1009 is OK: OK - load average: 28.62, 8.30, 2.85
[09:32:13] PROBLEM - puppet last run on graphite1001 is CRITICAL: CRITICAL: Puppet last ran 4 days ago
[09:32:53] that's me, just reenabled puppet
[09:33:14] RECOVERY - puppet last run on graphite1001 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[09:38:47] (PS2) Giuseppe Lavagetto: admin: move to hiera, use roles where appropriate [puppet] - https://gerrit.wikimedia.org/r/191890
[09:48:14] (PS2) Yuvipanda: Remove role::elasticsearch::config abstraction [puppet] - https://gerrit.wikimedia.org/r/188877 (owner: Chad)
[09:50:18] (CR) Yuvipanda: [C: 2] Remove role::elasticsearch::config abstraction [puppet] - https://gerrit.wikimedia.org/r/188877 (owner: Chad)
[09:50:27] (PS1) Filippo Giunchedi: admin: set filippo TERM [puppet] - https://gerrit.wikimedia.org/r/192287
[09:51:15] (CR) Yuvipanda: "Needs manual rebase" [puppet] - https://gerrit.wikimedia.org/r/188702 (owner: Chad)
[09:53:14] (PS1) Giuseppe Lavagetto: puppet-compiler: correctly refresh prod hiera config on every run [software] - https://gerrit.wikimedia.org/r/192289
[10:00:56] (CR) Filippo Giunchedi: [C: 1] add zotero role class skeleton [puppet] - https://gerrit.wikimedia.org/r/191925 (https://phabricator.wikimedia.org/T89867) (owner: Dzahn)
[10:02:50] (CR) Filippo Giunchedi: [C: 1] LVS configuration for zotero service [puppet] - https://gerrit.wikimedia.org/r/191938 (https://phabricator.wikimedia.org/T89867) (owner: Dzahn)
[10:06:29] (PS2) Giuseppe Lavagetto: puppet-compiler: correctly refresh prod hiera config on every run [software] - https://gerrit.wikimedia.org/r/192289
[10:07:19] (CR) Giuseppe Lavagetto: [C: 2] puppet-compiler: correctly refresh prod hiera config on every run [software] - https://gerrit.wikimedia.org/r/192289 (owner: Giuseppe Lavagetto)
[10:09:08] (CR) Filippo Giunchedi: Use apt::repository instead of file resources (1 comment) [puppet] - https://gerrit.wikimedia.org/r/123903 (owner: Tim Landscheidt)
[10:16:04] Ops-Access-Requests, operations, Patch-For-Review: Requesting access to ANALYTICS RESOURCES for joal - https://phabricator.wikimedia.org/T89357#1058457 (ArielGlenn) Most people on the analytics team do not have the two admin groups. Does he need those, Ottomata?
[10:17:33] (PS2) Filippo Giunchedi: admin: set filippo TERM [puppet] - https://gerrit.wikimedia.org/r/192287
[10:17:39] (CR) Filippo Giunchedi: [C: 2 V: 2] admin: set filippo TERM [puppet] - https://gerrit.wikimedia.org/r/192287 (owner: Filippo Giunchedi)
[10:31:55] (CR) Gilles: "Because Media Viewer preloads next and previous images for performance purposes, looking at actual image requests is insufficient. The ima" [puppet] - https://gerrit.wikimedia.org/r/190821 (https://phabricator.wikimedia.org/T89088) (owner: Gilles)
[10:32:40] operations, Wikimedia-Mailing-lists: Let public archives be indexed and archived - https://phabricator.wikimedia.org/T90407#1058474 (Aklapper) p:Triage>Low
[10:39:08] (PS1) ArielGlenn: root on vanadium (event-logging) for nuria and milimetric [puppet] - https://gerrit.wikimedia.org/r/192290 (https://phabricator.wikimedia.org/T88769)
[10:40:05] (PS2) ArielGlenn: root on vanadium (event-logging) for nuria and milimetric [puppet] - https://gerrit.wikimedia.org/r/192290 (https://phabricator.wikimedia.org/T88769)
[10:41:15] (PS1) Giuseppe Lavagetto: puppet-compiler: fix typos [software] - https://gerrit.wikimedia.org/r/192291
[10:46:15] (CR) Giuseppe Lavagetto: [C: 2] puppet-compiler: fix typos [software] - https://gerrit.wikimedia.org/r/192291 (owner: Giuseppe Lavagetto)
[10:58:39] (PS3) Giuseppe Lavagetto: admin: move to hiera, use roles where appropriate [puppet] - https://gerrit.wikimedia.org/r/191890
[11:25:46] (PS4) Giuseppe Lavagetto: admin: move to hiera, use roles where appropriate [puppet] - https://gerrit.wikimedia.org/r/191890
[11:28:16] (CR) Giuseppe Lavagetto: [C: 2] "Verified with the puppet compiler, this should be a no-op" [puppet] - https://gerrit.wikimedia.org/r/191890 (owner: Giuseppe Lavagetto)
[11:30:26] (PS2) Giuseppe Lavagetto: admin: move to hiera, use roles/2 [puppet] - https://gerrit.wikimedia.org/r/191891
[11:43:42] sorry about that, folks. somewhere in the bowels of old hardware, kde desktop, nvidia drivers or giant dust bunnies in the latop, the desktop was crashing on me repeatedly
[11:46:12] (PS3) Giuseppe Lavagetto: admin: move to hiera, use roles/2 [puppet] - https://gerrit.wikimedia.org/r/191891
[11:49:08] (CR) Giuseppe Lavagetto: [C: 2] "Verified again with the puppet compiler, is a noop" [puppet] - https://gerrit.wikimedia.org/r/191891 (owner: Giuseppe Lavagetto)
[11:54:39] (PS3) ArielGlenn: root on vanadium (event-logging) for nuria and milimetric [puppet] - https://gerrit.wikimedia.org/r/192290 (https://phabricator.wikimedia.org/T88769)
[11:56:24] (CR) ArielGlenn: [C: 2] root on vanadium (event-logging) for nuria and milimetric [puppet] - https://gerrit.wikimedia.org/r/192290 (https://phabricator.wikimedia.org/T88769) (owner: ArielGlenn)
[11:59:22] Ops-Access-Requests, operations, Patch-For-Review: Requesting deployment access for milimetric - https://phabricator.wikimedia.org/T88769#1058603 (ArielGlenn) Open>Resolved a:ArielGlenn You and nuria should now both have the privs you need on vanadium. I'm closing this, reopen if there's anyt...
[12:02:58] (PS2) Giuseppe Lavagetto: mediawiki: do not install php-apc on newer hosts [puppet] - https://gerrit.wikimedia.org/r/191906
[12:03:03] operations, OTRS: Make OTRS sessions IP-address-agnostic - https://phabricator.wikimedia.org/T87217#1058611 (tommorris) @Steinsplitter: I know I have a new IP address when reconnecting. The fact that every other web service I use manages to persist connections over IP changes except OTRS kind of led me to...
[12:03:51] (CR) Giuseppe Lavagetto: [C: 2] "This doesn't even remove apcu, so it's a noop for any practical purpose where it's already present." [puppet] - https://gerrit.wikimedia.org/r/191906 (owner: Giuseppe Lavagetto)
[12:10:36] operations, OTRS: Make OTRS sessions IP-address-agnostic - https://phabricator.wikimedia.org/T87217#1058629 (Aschmidt) Please note that the case of dual IPv4/IPv6 addresses is indeed one main problem we reproduced some weeks ago in issue T88224 which was soon closed and marked as a duplicate to this ticke...
[12:10:45] operations, Graphite: replace txstatsd - https://phabricator.wikimedia.org/T90111#1058631 (fgiunchedi) did some tests by piping statsd-tg into both txstatsd and statsite and compare outputs: txstatsd ``` c.000000.count 3 1424692011 ms.000002.999percentile 1022.0 1424692011 ms.000002.99percentile 1011.0 1...
[12:21:35] (PS3) Giuseppe Lavagetto: Execute scap-provided script as mediawiki::users::web [puppet] - https://gerrit.wikimedia.org/r/192286 (https://phabricator.wikimedia.org/T89165)
[12:22:57] operations, Parsoid, Services, service-runner: Create a standard service template / init / logging / package setup - https://phabricator.wikimedia.org/T88585#1058647 (ArielGlenn)
[12:36:01] (PS4) Giuseppe Lavagetto: Execute scap-provided script as mediawiki::users::web [puppet] - https://gerrit.wikimedia.org/r/192286 (https://phabricator.wikimedia.org/T89165)
[12:38:51] (PS5) Giuseppe Lavagetto: Execute scap-provided script as mediawiki::users::web [puppet] - https://gerrit.wikimedia.org/r/192286 (https://phabricator.wikimedia.org/T89165)
[12:40:09] operations, Phabricator, domains: enable email for tickets in domains project? - https://phabricator.wikimedia.org/T88842#1058691 (ArielGlenn)
[12:41:22] operations, wikis-in-codfw: Setup redis clusters in codfw - https://phabricator.wikimedia.org/T86887#1058692 (ArielGlenn)
[12:42:19] (CR) Giuseppe Lavagetto: [C: 2] Execute scap-provided script as mediawiki::users::web [puppet] - https://gerrit.wikimedia.org/r/192286 (https://phabricator.wikimedia.org/T89165) (owner: Giuseppe Lavagetto)
[12:42:46] operations, wikis-in-codfw: Set up the mediawiki application layer in codfw - https://phabricator.wikimedia.org/T86894#1058694 (ArielGlenn)
[12:43:42] operations, Graphite: revisit what percentiles are calculated by txstatsd - https://phabricator.wikimedia.org/T88662#1058695 (ArielGlenn)
[12:48:56] operations, Monitoring: Monitor the up-to-date status of wikitech-static - https://phabricator.wikimedia.org/T89323#1058705 (fgiunchedi) ack, thanks! yep comparing the two sound easy enough
[12:50:14] operations, Beta-Cluster, Patch-For-Review: mwscript hardcoded with user 'apache', should use ::mediawiki::users::web - https://phabricator.wikimedia.org/T89165#1058708 (Joe) Open>Resolved
[12:51:19] (PS3) Matanya: lvs: init.pp lint [puppet] - https://gerrit.wikimedia.org/r/190689
[12:53:38] operations, Wikimedia-Stream, Patch-For-Review: stream.wikimedia.org: Uneven distribution of client connections on backends - https://phabricator.wikimedia.org/T69957#1058710 (ArielGlenn) Ori, asking for feedback on the comments on your changeset so this can get unstalled.
[12:53:49] operations, ops-eqiad: rack and setup restbase production cluster in eqiad - https://phabricator.wikimedia.org/T88805#1058711 (fgiunchedi) a:Cmjohnson>fgiunchedi
[13:01:48] (PS5) Krinkle: Standardize the name of interface editor group [mediawiki-config] - https://gerrit.wikimedia.org/r/186593 (https://phabricator.wikimedia.org/T85731) (owner: Glaisher)
[13:03:45] operations: NIC misassigned (double entries) by jessie installer - https://phabricator.wikimedia.org/T90236#1058717 (Joe)
[13:03:46] operations, wikis-in-codfw: Setup memcached cluster in codfw - https://phabricator.wikimedia.org/T86888#1058716 (Joe)
[14:03:49] (CR) Yuvipanda: [C: 1] Move beta elasticsearch config into hiera (1 comment) [puppet] - https://gerrit.wikimedia.org/r/188702 (owner: Chad)
[14:15:35] (PS4) Yuvipanda: Move beta elasticsearch config into hiera [puppet] - https://gerrit.wikimedia.org/r/188702 (owner: Chad)
[14:15:53] (PS2) Amire80: Set $wgTranslationNotificationsAlwaysHttpsInEmail to true [mediawiki-config] - https://gerrit.wikimedia.org/r/190203 (owner: Se4598)
[14:19:27] (CR) Amire80: [C: 1] Set $wgTranslationNotificationsAlwaysHttpsInEmail to true [mediawiki-config] - https://gerrit.wikimedia.org/r/190203 (owner: Se4598)
[14:20:41] Ops-Access-Requests, operations, Patch-For-Review: Requesting access to ANALYTICS RESOURCES for joal - https://phabricator.wikimedia.org/T89357#1058833 (Ottomata) I would like Joseph to be able to cover for a lot of the things Christian used to do.
[14:23:07] (CR) coren: [C: 1] tools: Add support to bigbrother to just say webservice2 [puppet] - https://gerrit.wikimedia.org/r/187949 (owner: Yuvipanda)
[14:23:27] (PS3) Yuvipanda: tools: Add support to bigbrother to just say webservice2 [puppet] - https://gerrit.wikimedia.org/r/187949
[14:29:04] (PS2) Chad: Move jq package to module, all elasticsearch machines should have it [puppet] - https://gerrit.wikimedia.org/r/188881
[14:30:24] ^demon|away: did you just write a non-compliant commit message?!
[14:30:37] that's 68 characters on the first line
[14:32:03] (CR) Nemo bis: [C: 1] "We default to https in l10n as well." [mediawiki-config] - https://gerrit.wikimedia.org/r/190203 (owner: Se4598)
[14:32:11] can someone from ops say me how many logged-in users don't use https? and if there's regions where this perecentage is much higher (china?). If you know people, who are familiar with https usage, please add them to https://gerrit.wikimedia.org/r/190203
[14:32:34] se4598: csteipp is the person to talk to
[14:32:44] what hoo said :)
[14:32:46]
[14:32:57] but I'd like to know the answer as well
[14:36:47] (CR) Se4598: "@CSteipp: can you comment on this on how many logged-in users don't use https? and if there's regions where this perecentage is much highe" [mediawiki-config] - https://gerrit.wikimedia.org/r/190203 (owner: Se4598)
[14:36:57] thanks
[14:39:51] operations: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1058855 (Ottomata)
[14:40:16] grrr. didn't even save the X core dump when it died.
[14:40:20] operations, Analytics-Cluster: Audit analytics cluster alerts and recipients - https://phabricator.wikimedia.org/T89730#1058856 (Ottomata) p:Triage>Normal
[14:40:48] <^demon|away> paravoid: No, I didn't just write it. I wrote it 18 days ago
[14:44:34] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[14:58:05] (CR) ArielGlenn: [C: 1] "This looks right to me. I think this should go before I345516cd" [puppet] - https://gerrit.wikimedia.org/r/170925 (owner: Glaisher)
[14:59:14] operations, ops-eqiad: cp1047 down - https://phabricator.wikimedia.org/T88045#1058917 (ArielGlenn)
[15:00:52] operations: cp* boxes, pagecache issues & trying newer kernels - https://phabricator.wikimedia.org/T83809#1058922 (ArielGlenn)
[15:02:06] operations, ops-esams: decom amslvs1-4 (dc work) - https://phabricator.wikimedia.org/T87790#1058931 (ArielGlenn)
[15:04:01] (CR) ArielGlenn: "John, what would we be asking legal exactly? Happy to do it, just need to know... what we need to know :-)" [puppet] - https://gerrit.wikimedia.org/r/185474 (https://phabricator.wikimedia.org/T87039) (owner: Glaisher)
[15:12:29] operations, Wikimedia-General-or-Unknown, Patch-For-Review: Varnish: Mobile site redirect interferes with OAuth authorization process - https://phabricator.wikimedia.org/T74186#1058964 (ArielGlenn) Is this still an issue after the (long since) merge of that gerrit changeset?
[15:12:41] operations, OTRS: Make OTRS sessions IP-address-agnostic - https://phabricator.wikimedia.org/T87217#1058965 (Steinsplitter) a:lfaraone>None
[15:12:43] operations, Wikimedia-General-or-Unknown: Varnish: Mobile site redirect interferes with OAuth authorization process - https://phabricator.wikimedia.org/T74186#1058966 (ArielGlenn)
[15:14:04] operations, Analytics-Cluster, Analytics-Kanban: Audit analytics cluster alerts and recipients - https://phabricator.wikimedia.org/T89730#1058972 (ggellerman)
[15:22:23] operations, Beta-Cluster, Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#1059002 (ArielGlenn) is parsoid still left on the to do list?
[15:22:35] operations, Beta-Cluster: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#1059003 (ArielGlenn)
[15:23:53] Puppet, operations, Beta-Cluster, Staging: Move scap puppet code into a module - https://phabricator.wikimedia.org/T87221#1059006 (ArielGlenn)
[15:29:04] operations, Wikimedia-Git-or-Gerrit, Patch-For-Review: Remove port 29418 from cloning process - https://phabricator.wikimedia.org/T37611#1059013 (ArielGlenn) I'd like to close this as declined if no one minds. I look forward to filippo's script being made available.
[15:29:53] operations, Labs, Patch-For-Review: Make labs salt use instance names than ids - https://phabricator.wikimedia.org/T1154#1059014 (ArielGlenn) I have a draft of that plugin which I need to test.
[15:30:59] Ops-Access-Requests, operations, Patch-For-Review: Requesting access to ANALYTICS RESOURCES for joal - https://phabricator.wikimedia.org/T89357#1059019 (ArielGlenn) Ah, good to know. Let's review it at today's ops meeting then.
[15:33:57] operations, Quality-Assurance, Release-Engineering: Create a basic RSpec unit test for operations/puppet - https://phabricator.wikimedia.org/T78342#1059022 (ArielGlenn)
[15:36:54] operations: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1059024 (ArielGlenn) bblack, I am happy to be on this with you, watching/poking on the console. Let me know a good time (early morning for you or late evening for you overlaps with EET). Gotta remember to upate the ipv6 int...
[15:42:19] (CR) Yuvipanda: [C: 2] tools: Add support to bigbrother to just say webservice2 [puppet] - https://gerrit.wikimedia.org/r/187949 (owner: Yuvipanda)
[15:43:33] operations: NIC misassigned (double entries) by jessie installer - https://phabricator.wikimedia.org/T90236#1059028 (fgiunchedi) trying to boot up restbase1001 I've seen it timeout on initial dhcp at boot, most of the time it works though as below: ``` Feb 23 15:39:09 carbon dhcpd: DHCPOFFER on 10.64.0.220 t...
[15:44:40] (PS5) BBlack: varnish+jessie filesystem stuff [puppet] - https://gerrit.wikimedia.org/r/190610
[15:45:41] (CR) BBlack: [C: 2] varnish+jessie filesystem stuff [puppet] - https://gerrit.wikimedia.org/r/190610 (owner: BBlack)
[15:46:35] operations, Parsoid, Services, service-runner: Create a standard service template / init / logging / package setup - https://phabricator.wikimedia.org/T88585#1059031 (GWicke)
[15:48:04] (CR) Hoo man: "Well, I miss a considerable amount of data updates done by jobs on wikidata, some seem to be abandoned, but I doubt the number of abandone" [mediawiki-config] - https://gerrit.wikimedia.org/r/192083 (owner: Hoo man)
[15:49:41] (CR) coren: [C: 2] "Less noise is good." [puppet] - https://gerrit.wikimedia.org/r/145441 (owner: Tim Landscheidt)
[15:50:34] (CR) coren: [C: 2] Labs: Increase labnet1001 conntrack tables [puppet] - https://gerrit.wikimedia.org/r/190214 (https://phabricator.wikimedia.org/T72076) (owner: coren)
[15:50:55] apergos: have you seen https://phabricator.wikimedia.org/T85442
[15:51:25] no
[15:51:40] :D now you have!
[15:51:48] wedon't roll our own, we steal from the salt ppa btw, so we could just steal the salt-syndic package (hopefully)
[15:52:29] operations, Beta-Cluster, Labs: Backport new salt-syndic packages - https://phabricator.wikimedia.org/T85442#1059049 (ArielGlenn) a:ArielGlenn
[15:52:33] apergos: right. stealing would be nice too
[15:52:49] apergos: right now salt commmands from virt1000 don't reach deployment-prep instances
[15:52:56] apergos: thanks!
[15:52:56] * anomie sees nothing for SWAT this morning
[15:53:04] right because not master
[15:53:05] uh huh
[15:53:27] don't thank me yet, but I'll get to it during my clinic duty this week
[15:53:36] apergos: :D cool!
[15:53:40] Coren: also, comments on https://gerrit.wikimedia.org/r/#/c/192172/?
[15:53:55] got anything else I should be claiming?
[15:54:52] apergos: well, https://phabricator.wikimedia.org/T78466 if you want [15:55:45] dunno if I want to own that yet, subscribed for now [15:55:51] apergos: yeah, fair enough [15:56:06] apergos: there was also a ticket to upgrade to a newer version of salt again, since that has auto key signing [15:56:13] * YuviPanda finds [15:56:24] dunno if we wanna do that yet, but it's been a good 7 months since that released, I think [15:56:48] apergos: https://phabricator.wikimedia.org/T88971 [15:57:44] 6operations: Upgrade salt to 2014.7 (investigating) - https://phabricator.wikimedia.org/T88971#1059060 (10yuvipanda) [15:58:27] (03PS5) 10Yuvipanda: Move beta elasticsearch config into hiera [puppet] - 10https://gerrit.wikimedia.org/r/188702 (owner: 10Chad) [15:58:55] 6operations: Upgrade salt to 2014.7 (investigating) - https://phabricator.wikimedia.org/T88971#1059063 (10ArielGlenn) actually I have already a start on this so I'll bump it up in my queue. [15:59:05] apergos: wheee [15:59:07] thanks [15:59:25] 6operations: Upgrade salt to 2014.7 (investigating) - https://phabricator.wikimedia.org/T88971#1059064 (10ArielGlenn) a:3ArielGlenn [15:59:25] apergos: if we have autoaccepting keys it should make it much easier to give projects their own salt master [15:59:29] (along with syndic) [15:59:56] well the keys is a whole other thing, which I need to get back to the salt devs on [16:00:03] reusing puppet keys, see [16:00:33] ah, right [16:00:44] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [16:03:39] yep 2014.7 has been out for a while [16:11:16] <^d> YuviPanda: https://gerrit.wikimedia.org/r/#/c/188881/ is trivial at least :) [16:21:25] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
[16:23:41] (03PS3) 10Yuvipanda: Move jq package to module, all elasticsearch machines should have it [puppet] - 10https://gerrit.wikimedia.org/r/188881 (owner: 10Chad) [16:26:34] (03CR) 10John F. Lewis: "@ArielGlen the status of the domains that are being changed but not actually hosted by the WMF (just Wikibooks.com I think). Such as if th" [puppet] - 10https://gerrit.wikimedia.org/r/185474 (https://phabricator.wikimedia.org/T87039) (owner: 10Glaisher) [16:26:51] <_joe_> !log depooling mw1062 for testing for T86652 [16:26:58] Logged the message, Master [16:31:58] trying to create last two necessary security for "staging" project on wikitech. For whatever reason, I can't create > 10 security groups for this project. Looking at the deployment-prep project, I see 13 security groups. [16:33:03] !log updating jessie d-i image to currently nightly [16:33:07] Logged the message, Master [16:36:12] (03PS1) 10Nuria: Doubling NavigationTiming Sampling rate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192332 [16:40:37] thcipriani: 10 is the default quota for security groups. You need a bump from YuviPanda, andrewbogot or Coren to add more than that to a project. [16:41:26] thcipriani: I can't see your project's quotas at https://wikitech.wikimedia.org/w/index.php?title=Special:NovaProject&action=displayquotas&projectname=staging . Not sure what's up with that. [16:43:20] bd808: probably since you're not a project member of staging [16:43:46] thcipriani: but I have near-god-like wikitech powers! ;) [16:44:03] 7Blocked-on-Operations, 6operations, 10RESTBase, 6Scrum-of-Scrums, and 3 others: RESTBase production hardware - 3 of 6 ready - https://phabricator.wikimedia.org/T76986#1059177 (10GWicke) [16:44:38] thcipriani: looks like you're right though. 
I can't see quotas for random projects [16:46:13] 6operations, 10Wikimedia-General-or-Unknown, 7Regression: svn.wikimedia.org security certificate expired - https://phabricator.wikimedia.org/T88731#1059182 (10Aklapper) [16:47:04] (03PS1) 10coren: Conntrack collector for diamond [puppet] - 10https://gerrit.wikimedia.org/r/192335 [16:47:14] YuviPanda|zzz: ^^ [16:47:21] Ah, bah. Sleep. [16:48:45] (03CR) 10jenkins-bot: [V: 04-1] Conntrack collector for diamond [puppet] - 10https://gerrit.wikimedia.org/r/192335 (owner: 10coren) [16:51:20] (03CR) 10Faidon Liambotis: "No, we're definitely not going to waste IP space or renumber our datacenter to encode the rack into the IP. Especially not because a Cassa" [puppet] - 10https://gerrit.wikimedia.org/r/167645 (https://phabricator.wikimedia.org/T84518) (owner: 10Alexandros Kosiaris) [16:51:22] (03PS2) 10coren: Conntrack collector for diamond [puppet] - 10https://gerrit.wikimedia.org/r/192335 [16:51:34] ... seriously? Lint fail because trailing newline in source fail? [16:53:55] the trailing newline issue is really annoying but also really serious. software that processes textfiles as line should be able to assume that all lines end with a line-ending terminator, even the last one [16:54:42] No, the lint fail wasn't because of a lacking final newline (which is a real issue) but because the end of the file had an extra blank line. 
[16:54:50] oh :) [16:55:15] https://integration.wikimedia.org/ci/job/operations-puppet-pep8/4443/violations/file/modules/diamond/files/collector/conntrack.py/ [16:55:22] yeah pep8 is kinda picky [16:55:29] IMO that goes beyond lint into OCD territory [16:55:32] it usually gets me on failing to have two blank lines between methods, too :p [16:56:01] (what a waste of vertical screen space, imho) [16:57:22] it is easy enough to highlight trailing spaces in text editors [16:57:59] (03PS1) 10BBlack: post-merge bugfix for 1595d545 [puppet] - 10https://gerrit.wikimedia.org/r/192340 [16:58:27] (03CR) 10BBlack: [C: 032 V: 032] post-merge bugfix for 1595d545 [puppet] - 10https://gerrit.wikimedia.org/r/192340 (owner: 10BBlack) [16:58:56] godog: I despise the python obsession with whitespace in the first place. :-) [16:59:44] mutante: yt? [17:00:43] FORTRAN also had unambiguous semantics by careful use of whitespace. Whenever I use python I feel like I should be dusting off a keypunch. :-) [17:02:01] (03CR) 10Chad: [C: 032] Tidy up SpecialVersionUrl hook usage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191927 (https://phabricator.wikimedia.org/T75759) (owner: 10Chad) [17:02:29] 6operations, 10ops-codfw, 3wikis-in-codfw: Move network cable to the other port on codfw memcached hosts - https://phabricator.wikimedia.org/T90456#1059265 (10Joe) 3NEW [17:03:41] PROBLEM - puppet last run on restbase1002 is CRITICAL: CRITICAL: Puppet has 2 failures [17:04:13] Coren: hehe well the parser won't complain of course, just lint/pep8 [17:07:00] RECOVERY - puppet last run on restbase1002 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [17:07:57] gwicke mobrovac restbase1002 should be online [17:08:15] yeeey [17:09:43] <^d> good thing I'm not in a hurry [17:09:58] <^d> the fact that wmf-config changes are gated behind a MW core job is batshit insane [17:10:03] * ^d twiddles thumbs [17:10:21] (03Merged) 10jenkins-bot: Tidy up SpecialVersionUrl hook 
usage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191927 (https://phabricator.wikimedia.org/T75759) (owner: 10Chad) [17:10:57] 6operations, 10Datasets-General-or-Unknown, 10Wikidata: Wikidata dumps contain old-style serialization. - https://phabricator.wikimedia.org/T74348#1059331 (10ArielGlenn) Hello? Any wikidata dumps consumers on this ticket? Otherwise I'll ask in xmlatadumps-l. [17:11:26] !log demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s) [17:11:31] Logged the message, Master [17:18:40] godog: sweet! [17:24:00] (03CR) 10GWicke: "Just for the record, I was not suggesting this option for Cassandra's benefit, but as a simplification for figuring out the dc / row / rac" [puppet] - 10https://gerrit.wikimedia.org/r/167645 (https://phabricator.wikimedia.org/T84518) (owner: 10Alexandros Kosiaris) [17:25:52] (03PS1) 10Ori.livneh: update my (ori) dotfiles [puppet] - 10https://gerrit.wikimedia.org/r/192343 [17:26:07] (03PS2) 10Ori.livneh: update my (ori) dotfiles [puppet] - 10https://gerrit.wikimedia.org/r/192343 [17:26:18] (03CR) 10Ori.livneh: [C: 032 V: 032] update my (ori) dotfiles [puppet] - 10https://gerrit.wikimedia.org/r/192343 (owner: 10Ori.livneh) [17:26:21] (03PS2) 10Ottomata: Create apache site define for forcing https and doing transparent reverse proxy, use this for hue [puppet] - 10https://gerrit.wikimedia.org/r/191911 (https://phabricator.wikimedia.org/T85834) [17:26:57] (03CR) 10Chad: [C: 032] Enable recent changes patrolling on Wikispecies [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192103 (https://phabricator.wikimedia.org/T89147) (owner: 10Odder) [17:27:04] (03Merged) 10jenkins-bot: Enable recent changes patrolling on Wikispecies [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192103 (https://phabricator.wikimedia.org/T89147) (owner: 10Odder) [17:27:10] <_joe_> ottomata: please use the autoload layout for new puppet classes [17:27:56] !log demon Synchronized 
wmf-config/InitialiseSettings.php: T89147 (duration: 00m 05s) [17:27:59] Logged the message, Master [17:29:02] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "I don't think this is an use general enough to need a define in the apache module. This is something that could be confined to the hue mod" [puppet] - 10https://gerrit.wikimedia.org/r/191911 (https://phabricator.wikimedia.org/T85834) (owner: 10Ottomata) [17:29:40] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: Puppet has 1 failures [17:31:10] (03CR) 10Chad: [C: 032] Add autopatrolled user group for dawikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188928 (https://phabricator.wikimedia.org/T88591) (owner: 10Mjbmr) [17:31:18] (03Merged) 10jenkins-bot: Add autopatrolled user group for dawikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188928 (https://phabricator.wikimedia.org/T88591) (owner: 10Mjbmr) [17:31:41] !log demon Synchronized wmf-config/InitialiseSettings.php: T88591 (duration: 00m 06s) [17:31:43] (03PS1) 10BBlack: more bugfixes for 1595d545 [puppet] - 10https://gerrit.wikimedia.org/r/192345 [17:31:46] Logged the message, Master [17:32:08] (03CR) 10BBlack: [C: 032 V: 032] more bugfixes for 1595d545 [puppet] - 10https://gerrit.wikimedia.org/r/192345 (owner: 10BBlack) [17:33:19] (03PS1) 10Ori.livneh: Typo fix for Ia6c7990e9c [puppet] - 10https://gerrit.wikimedia.org/r/192346 [17:33:26] (03PS2) 10Ori.livneh: Typo fix for Ia6c7990e9c [puppet] - 10https://gerrit.wikimedia.org/r/192346 [17:33:35] (03CR) 10Ori.livneh: [C: 032 V: 032] Typo fix for Ia6c7990e9c [puppet] - 10https://gerrit.wikimedia.org/r/192346 (owner: 10Ori.livneh) [17:34:47] (03CR) 10Dzahn: [C: 031] base: move selector outside resource block [puppet] - 10https://gerrit.wikimedia.org/r/191589 (owner: 10Matanya) [17:35:11] (03CR) 10Giuseppe Lavagetto: Create apache site define for forcing https and doing transparent reverse proxy, use this for hue (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/191911 
(https://phabricator.wikimedia.org/T85834) (owner: 10Ottomata) [17:38:29] (03CR) 10Chad: [C: 032] Create 'autopatrolled' user group on maiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190721 (https://phabricator.wikimedia.org/T89346) (owner: 10Gerardduenas) [17:39:45] (03Merged) 10jenkins-bot: Create 'autopatrolled' user group on maiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190721 (https://phabricator.wikimedia.org/T89346) (owner: 10Gerardduenas) [17:40:29] !log demon Synchronized wmf-config/InitialiseSettings.php: T89346 (duration: 00m 07s) [17:40:32] Logged the message, Master [17:41:20] _joe_: yay someone cares! :p [17:41:39] i would put this in hue module, but I didn't want to make a dependency on our apache module [17:41:54] but, uhhhh [17:41:58] can I do the redirect in varnish? [17:42:01] that would be totally cool [17:42:10] it is the misc-web-lbs though, wasn't sure i could modify those [17:42:14] since so many things use them [17:43:19] i don't really like the apache module change either [17:43:28] but mutante said he didn't want it in templates/apache/sites [17:43:36] and I don't want to put it in the hue module because apache dep. [17:43:46] and it doesn't warrant is own module [17:43:59] hm. i wouldn't mind putting the template into hue module i guess [17:44:03] and not doing any puppet with it [17:44:08] and then rendering it using apache module in hue role. [17:44:14] but! as you say. varnish? 
[17:44:26] (03PS3) 10Glaisher: Add apache config for m.{project}.org (-wikipedia) [puppet] - 10https://gerrit.wikimedia.org/r/185461 (https://phabricator.wikimedia.org/T78421) [17:44:52] (03CR) 10Chad: [C: 032] Unifying talk namespaces for fawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190232 (https://phabricator.wikimedia.org/T90340) (owner: 10Mjbmr) [17:45:00] (03Merged) 10jenkins-bot: Unifying talk namespaces for fawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190232 (https://phabricator.wikimedia.org/T90340) (owner: 10Mjbmr) [17:45:50] !log demon Synchronized wmf-config/InitialiseSettings.php: T90340 (duration: 00m 05s) [17:45:55] Logged the message, Master [17:46:20] (03CR) 10Glaisher: "bump bump" [puppet] - 10https://gerrit.wikimedia.org/r/185461 (https://phabricator.wikimedia.org/T78421) (owner: 10Glaisher) [17:47:19] (03CR) 10Ottomata: "If I can do this in varnish, that is certainly ideal. Wasn't sure I could modify the misc.inc.vcl.erb for this though, since so many othe" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/191911 (https://phabricator.wikimedia.org/T85834) (owner: 10Ottomata) [17:47:20] RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [17:51:05] Glaisher: yours is the task I was in the middle of commenting on the last time may laptop fell over this morning (note to anyone keeping track, the random X crashes seem to have stopped, so it was either cairo-dock or dust bunnies in the works) [17:51:39] (03PS1) 10BBlack: do not prompt about swap for jessie caches [puppet] - 10https://gerrit.wikimedia.org/r/192350 [17:51:52] (03CR) 10BBlack: [C: 032 V: 032] do not prompt about swap for jessie caches [puppet] - 10https://gerrit.wikimedia.org/r/192350 (owner: 10BBlack) [17:51:58] apergos: the mobile domains one? 
[17:52:04] yeah [17:52:10] apergos: if the laptop fell over, maybe it's lost its balance instead [17:52:23] I was typing a comment in and boom. black screen and it shut off [17:52:54] on the up side if it was jus toverheating then my cleaning job should forestall more of that for another year or so [17:53:32] thought we told you to stop doing 'shutdown -h now' :p [17:54:00] too addicted... that sense of raw power... [17:55:29] was testing the chaos monkey script [17:56:00] I hate that script [17:56:02] mutante: that sentence made me laugh inside :p [17:56:40] brb need to make dinner and attempt to eat before meeting [17:57:04] 6operations, 10Staging: Package geoipupdate for jessie - https://phabricator.wikimedia.org/T90229#1059516 (10thcipriani) [18:09:30] I have most access to production; however, I can't get to tin.eqiad.wmnet should I create a ticket to get access? Or can someone get me access? [18:12:07] 6operations, 10ops-eqiad: rack and setup restbase production cluster in eqiad - https://phabricator.wikimedia.org/T88805#1059600 (10fgiunchedi) restbase1002 installed, pending restbase1001 [18:17:31] (03CR) 10Chad: [C: 032] Enable EducationProgram in the Hebrew Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190357 (https://phabricator.wikimedia.org/T89393) (owner: 10Amire80) [18:17:38] (03Merged) 10jenkins-bot: Enable EducationProgram in the Hebrew Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190357 (https://phabricator.wikimedia.org/T89393) (owner: 10Amire80) [18:18:04] !log demon Synchronized wmf-config/InitialiseSettings.php: T89393 (duration: 00m 06s) [18:18:08] Logged the message, Master [18:19:30] 10Ops-Access-Requests, 6operations: Requesting access to tin.eqiad.wmnet for thcipriani - https://phabricator.wikimedia.org/T90467#1059641 (10thcipriani) 3NEW [18:19:54] <^d> !log created education program tables for hewiktionary [18:19:57] Logged the message, Master [18:24:31] (03PS1) 10Gage: Icinga: add gage to analytics 
contact group [puppet] - 10https://gerrit.wikimedia.org/r/192360 [18:25:49] (03CR) 10Gage: [C: 032] Icinga: add gage to analytics contact group [puppet] - 10https://gerrit.wikimedia.org/r/192360 (owner: 10Gage) [18:30:31] (03PS1) 10BBlack: remove late_command mkfs, cannot have multiple... [puppet] - 10https://gerrit.wikimedia.org/r/192365 [18:30:46] (03CR) 10BBlack: [C: 032 V: 032] remove late_command mkfs, cannot have multiple... [puppet] - 10https://gerrit.wikimedia.org/r/192365 (owner: 10BBlack) [18:33:08] (03Abandoned) 10Chad: Grant reedy root [puppet] - 10https://gerrit.wikimedia.org/r/122621 (owner: 10Reedy) [18:34:33] (03CR) 10Yuvipanda: ":(" [puppet] - 10https://gerrit.wikimedia.org/r/122621 (owner: 10Reedy) [18:39:06] 6operations: NIC misassigned (double entries) by jessie installer - https://phabricator.wikimedia.org/T90236#1059715 (10fgiunchedi) also seemingly fails to load firmware for bnx2x ``` ~ # dmesg | grep -e eth -e bnx [ 2.749242] bnx2x: Broadcom NetXtreme II 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.78.... [18:43:30] (03PS1) 10Odder: Set $wgArticleCountMethod to 'any' on zhwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192367 (https://phabricator.wikimedia.org/T53604) [18:46:07] (03CR) 10Dzahn: [C: 032] removed mw2215 from list, last mw server is mw2214 [dns] - 10https://gerrit.wikimedia.org/r/191912 (owner: 10Papaul) [18:49:21] andrewbogott: Hi, I'm trying to follow-through on https://phabricator.wikimedia.org/T88599 but it isn't clear to me what the next step should be. Fundraising Tech needs to help provision a new instance, under the integration project. Is there a way to do that without adding us all to the integration project? [18:50:50] awight: Members of a project can access instances within the project. Project admins can create/delete instances in the project. [18:51:04] (03CR) 10ArielGlenn: "sadness..." 
[puppet] - 10https://gerrit.wikimedia.org/r/122621 (owner: 10Reedy) [18:51:33] awight: I don’t object to creating a new project, I don’t really follow whether it should or shouldn’t have its own [18:52:39] andrewbogott: me neither! [18:52:51] There are two special things going on... [18:52:59] So, probably best to respond to the ticket so hashar can read [18:53:07] ok sure, thx. [18:53:58] andrewbogott: actually, I think the situation is already described well in the task. [18:54:28] hashar said, "But it is probably easier to have the dedicated instance added to the integration project and add your team to it. This way people in charge of the other CI instances would be able to connect to your instance and help maintain it." [18:54:46] What does that mean, in terms of getting the instance created and changing permissions? [18:55:00] I'd like to define some actionable items here! [18:56:24] (03Abandoned) 10Dzahn: Revert "Allocate labstore200[12] mgmt IPs" [dns] - 10https://gerrit.wikimedia.org/r/173318 (owner: 10Papaul) [18:56:49] I don’t really know anything about that project, but here is a list of the current admins: https://dpaste.de/XwBB [18:56:59] (Note that Coren and I are on there by default, we aren’t actually active there.) [18:57:24] andrewbogott: OK thank you, that does illuminate things a bit. I think I should go ahead and have my team added, in that case. [18:58:02] awight: I just read the ticket. by 'Continuous Integration' do you mean 'hook into the current gerrit/jenkins infrastructure'? [18:58:12] YuviPanda|zzz: hi :) [18:58:31] Yeah, I have some scripts which unfortunately need to muck around in MySQL in order for our PHPUnit tests to run. [18:58:32] cmjohnson1: ping [18:58:40] pong [18:58:42] hey [18:58:51] So hashar is suggesting that we run CI on dedicated instances, where we create a huge db mess. 
[18:59:16] right [18:59:17] Other than that, we want the usual test and gated-merge jobs [18:59:21] cmjohnson1: I was just looking at https://phabricator.wikimedia.org/T89639; so current status is that we'll send back the controller & wait for a replacement? [18:59:31] so yeah, all action items basically involve hashar / CI team folks now. nothing we can do [18:59:41] YuviPanda|zzz: makes sense! [19:00:44] 6operations, 10Wikimedia-General-or-Unknown: Varnish: Mobile site redirect interferes with OAuth authorization process - https://phabricator.wikimedia.org/T74186#1059827 (10csteipp) >>! In T74186#1058964, @ArielGlenn wrote: > Is this still an issue after the (long since) merge of that gerrit changeset? That c... [19:01:50] gwicke: current status is it's in the troubleshooting phase. Most likely the controller However, I have to work with HP to get it fixed. That could take time and I thought it faster to get 1001/1002 online first and then deal with tech support. They make you jump through several hoops before actually providing you with the part. [19:02:24] (03PS1) 10Ori.livneh: Set up a beacon namespace on bits [puppet] - 10https://gerrit.wikimedia.org/r/192370 [19:04:09] cmjohnson1: makes sense, thanks for the update [19:04:32] yep, once I have something from them I will update phab task [19:05:44] (03CR) 10Ottomata: [C: 031] "I like this idea a lot." 
[puppet] - 10https://gerrit.wikimedia.org/r/192370 (owner: 10Ori.livneh) [19:06:58] 6operations, 10Staging: Increase Security Groups quota on Wikitech staging project - https://phabricator.wikimedia.org/T90473#1059853 (10thcipriani) 3NEW [19:08:37] 6operations, 3wikis-in-codfw: Setup memcached cluster in codfw - https://phabricator.wikimedia.org/T86888#1059868 (10RobH) a:5RobH>3Joe [19:09:43] 7Blocked-on-Operations, 6operations, 10RESTBase, 6Scrum-of-Scrums, and 3 others: RESTBase production hardware - 4 of 6 ready - https://phabricator.wikimedia.org/T76986#1059872 (10GWicke) [19:10:57] 7Blocked-on-Operations, 6operations, 10RESTBase, 6Scrum-of-Scrums, and 3 others: RESTBase production hardware - 4 of 6 ready - https://phabricator.wikimedia.org/T76986#824247 (10GWicke) 4 of 6 servers are now online and serving requests. The remaining two are: - restbase1001: needs to be racked - restbas... [19:25:07] (03Abandoned) 10Nuria: Doubling NavigationTiming Sampling rate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192332 (owner: 10Nuria) [19:25:51] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to ANALYTICS RESOURCES for joal - https://phabricator.wikimedia.org/T89357#1059937 (10RobH) The sudo escalation has been approved in our ops meeting just now (2015-02-23) [19:26:31] 6operations, 6Services, 7service-runner: Create a standard puppet module for service-runner services - https://phabricator.wikimedia.org/T89901#1059938 (10GWicke) [19:26:56] 6operations, 6Release-Engineering, 6Services, 7service-runner: Create a standard puppet module for service-runner services - https://phabricator.wikimedia.org/T89901#1048322 (10GWicke) [19:27:32] 10Ops-Access-Requests, 6operations: Requesting sudo for hafnium for nuria - https://phabricator.wikimedia.org/T88988#1059945 (10RobH) 5stalled>3Resolved This was pushed live via https://phabricator.wikimedia.org/T88769 (two tasks for same thing causing confusion). 
This was also approved for sudo escalatio... [19:27:47] 10Ops-Access-Requests, 6operations: Requesting sudo access to vanadium for mforns - https://phabricator.wikimedia.org/T89471#1059948 (10RobH) 5Open>3Resolved This was pushed live via https://phabricator.wikimedia.org/T88769 (two tasks for same thing causing confusion). This was also approved for sudo esca... [19:29:30] (03PS1) 10Nuria: Making alarms for NavigationTiming events less sensitive [puppet] - 10https://gerrit.wikimedia.org/r/192375 [19:31:32] (03PS2) 10Ori.livneh: Make alarms for NavigationTiming events less sensitive [puppet] - 10https://gerrit.wikimedia.org/r/192375 (owner: 10Nuria) [19:36:38] (03PS3) 10Nuria: Making alarms for NavigationTiming events less sensitive [puppet] - 10https://gerrit.wikimedia.org/r/192375 [19:38:15] 6operations, 7Graphite: replace txstatsd - https://phabricator.wikimedia.org/T90111#1059970 (10GWicke) It would be great if the replacement supported [multi-metric packets](https://github.com/etsy/statsd/blob/master/docs/metric_types.md#multi-metric-packets) as in [statsd](https://github.com/etsy/statsd). Thi... [19:40:30] _joe_: hm, thinking about ganglia/ cluster variable/ hiera [19:40:34] maybe it isn't set properly for some reason? [19:40:48] if i try to render a file with that variable via puppet, that would be a good test, yes? [19:42:31] <_joe_> ottomata: it is set, look at the ganglia config file on the hosts :) [19:42:38] <_joe_> that's where it acts, not magic [19:43:02] PROBLEM - HTTPS on cp1008 is CRITICAL: Return code of 255 is out of bounds [19:43:41] der, right [19:44:06] 6operations, 7Graphite: replace txstatsd - https://phabricator.wikimedia.org/T90111#1059992 (10chasemp) > - the flush period of those statsd instances is shorter than the aggregation period in graphite (1/5?) Why? 
[19:45:12] RECOVERY - HTTPS on cp1008 is OK: SSLXNN OK - 36 OK [19:46:54] mentally checking out for the night (actually eating the dinner I made earlier) [19:48:15] 6operations: NIC misassigned (double entries) by jessie installer - https://phabricator.wikimedia.org/T90236#1060025 (10fgiunchedi) more from /var/log/syslog, I've tried passing `rd.udev.log-priority=debug` and `udev.log-priority=debug` at boot but doesn't seem to have an effect ``` ~ # grep -e bnx -e udev -e e... [19:50:26] RT duty! Makes your dinner get cold [19:50:35] What a great slogan [19:51:49] 6operations, 7Graphite: replace txstatsd - https://phabricator.wikimedia.org/T90111#1060039 (10fgiunchedi) btw this is strictly to replace txstatsd with something more performant but keep things otherwise the same, there's a related discussion in T89857 about broader changes [19:52:16] no, the meeting makes my dinner get cold! clinic duty (there, see? I didn't call it rt duty) is fine [19:53:20] 6operations, 10Wikimedia-General-or-Unknown, 7Documentation: Add a wiki on wikitech is out of date, incomplete - https://phabricator.wikimedia.org/T87588#1060053 (10tomasz) [19:56:59] 6operations, 10Wikimedia-General-or-Unknown, 5Patch-For-Review: Cleanup and delete vewikimedia - https://phabricator.wikimedia.org/T57737#1060077 (10tomasz) [20:06:50] 6operations, 5Patch-For-Review: revoke old digicert certificates - https://phabricator.wikimedia.org/T86689#1060151 (10RobH) 5Open>3Resolved All of these were revoked awhile back, and I neglected to resolve this task. [20:09:38] legoktm: the next step is global talk page ? 
[20:09:58] uhmmmmmmm :P [20:10:24] I'm hesitant to do that because that's supposedly Flow's job [20:11:19] hmm, we havn't talked at all about how that would happen, but seems plausible :) [20:11:21] Probably, global scribunto module is less controversial ;) [20:11:43] ebernhardson: i am willing to pay for such a feature :) [20:12:18] matanya: the main thing we havn't figured out yet is what rules to apply when you attempt to post cross-wiki [20:12:20] * Nemo_bis 's global talk page is his email address [20:12:34] matanya: but with SUL perhaps that doesn't matter anymore(was alway my hope) :) [20:12:53] I hope you are right [20:13:24] i find myself opening way too many wikis to read talk page messages, a one feed for all wikis would be gread [20:13:33] *t [20:13:41] sounds like you want a cross-wiki watchlist :P [20:13:50] that too [20:13:51] yes i want that too :) [20:14:10] i even wrote a small demo with nodejs and knockout that reads the irc feed and allows that [20:14:20] but then the weekend was over and i neve rlooked at it again :P [20:14:20] 10Ops-Access-Requests, 6operations: Requesting access to tin.eqiad.wmnet for thcipriani - https://phabricator.wikimedia.org/T90467#1060186 (10RobH) @thcipriani, As this is a different request than the recently merged (T89378), we'll need a manager's approval for you to have this access (access to tin, and the... 
[20:14:55] ebernhardson: this is my global watchlist: https://tools.wmflabs.org/guc/?user=matanya [20:15:21] but it is not the most efficint way to do stuff [20:15:36] matanya: mine was faster :P (but it kept everything in redis, so probably not scalable :) [20:15:47] matanya: certainly appears to work though [20:16:04] yes, we also need a real global contribs tool [20:17:02] and to look at RC i use Krinkle's RTRC [20:17:15] https://phabricator.wikimedia.org/T66475 [20:17:22] which should be the default RC imho [20:28:23] 10Ops-Access-Requests, 6operations: Requesting access to tin.eqiad.wmnet for thcipriani - https://phabricator.wikimedia.org/T90467#1060226 (10greg) >>! In T90467#1060186, @RobH wrote: > @thcipriani, > > As this is a different request than the recently merged (T89378), we'll need a manager's approval for you t... [20:32:31] (03PS1) 10Ottomata: Redirect to https for hue if not coming via https proxy [puppet] - 10https://gerrit.wikimedia.org/r/192389 (https://phabricator.wikimedia.org/T85834) [20:35:44] (03PS2) 10Ottomata: Redirect to https for hue if not coming via https proxy [puppet] - 10https://gerrit.wikimedia.org/r/192389 (https://phabricator.wikimedia.org/T85834) [20:46:02] PROBLEM - puppet last run on lvs4002 is CRITICAL: CRITICAL: Puppet has 1 failures [20:48:14] (03PS3) 10Ottomata: Redirect to https for hue if not coming via https proxy [puppet] - 10https://gerrit.wikimedia.org/r/192389 (https://phabricator.wikimedia.org/T85834) [20:49:47] (03PS4) 10Ottomata: Redirect to https for hue if not coming via https proxy [puppet] - 10https://gerrit.wikimedia.org/r/192389 (https://phabricator.wikimedia.org/T85834) [21:00:28] (03PS5) 10Ottomata: Redirect to https for hue if not coming via https proxy [puppet] - 10https://gerrit.wikimedia.org/r/192389 (https://phabricator.wikimedia.org/T85834) [21:02:53] RECOVERY - puppet last run on lvs4002 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:06:35] !log deployed 
parsoid version d9ac8c21 [21:06:38] Logged the message, Master [21:08:16] jouncebot: next [21:08:29] * bd808 goes to poke jouncebot with a stick [21:08:34] robh, is https://phabricator.wikimedia.org/T87028 supposed to be restricted visibility? [21:09:37] andrewbogott: yt? [21:10:42] (03CR) 10BBlack: [C: 031] Redirect to https for hue if not coming via https proxy [puppet] - 10https://gerrit.wikimedia.org/r/192389 (https://phabricator.wikimedia.org/T85834) (owner: 10Ottomata) [21:10:54] ottomata: what’s up? [21:11:24] jouncebot: next [21:11:25] In 2 hour(s) and 48 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150224T0000) [21:13:46] andrewbogott: looking into the ganglia thing [21:13:54] great, you beat me to it. [21:14:07] ha, was going to sync with you on it. so, you are just missing one node? virt1000? [21:14:40] ottomata: it’s different from when I last looked [21:14:46] last week everything virt100x was missing [21:14:48] now they are there [21:14:51] and I didn’t change anything [21:15:04] although, yes, virt1000 is still missing [21:15:05] aye [21:15:27] andrewbogott: do you knwo what's up with hieradata/hsots [21:15:33] why are only some of our nodes there [21:15:34] ? [21:15:40] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting deployment access for milimetric - https://phabricator.wikimedia.org/T88769#1060367 (10Milimetric) thanks very much Ariel, I have the rights I need, you rock. [21:15:56] So… as of last week, things that had $cluster defined in puppet appeared in ganglia, things that had it in hiera but not puppet did not. [21:15:58] Today the opposite is the case. 
[21:16:10] So something elsewhere was fixed — I would say that it is working correctly now [21:16:25] Give me a minute, I’ll see what analytics looks like [21:17:20] analytics cluster things are totally missing from ganglia [21:17:26] anything that should be in $cluster = analytics [21:17:30] they disappeared feb 18 [21:17:37] ok, so probably they need hiera entires [21:17:40] entries [21:20:50] _joe_: would this be true? [21:21:31] <_joe_> ottomata: lemme check [21:21:38] <_joe_> ottomata: one machine of the cluster? [21:21:52] analytics1011.eqiad.wmnet [21:22:10] do we need to create hiera entries for every host now? [21:22:32] no, it’s done by roles, and then there’s a hiera entry for the role. e.g. role/common/nova/compute.yaml:cluster: virt [21:22:48] how does that role get applied to the node? [21:22:53] <_joe_> ottomata: cluster { name = "Analytics cluster eqiad" owner = "Wikimedia Foundation" latlong = "unspecified" url = "http://ganglia.wikimedia.org" [21:22:58] yes [21:23:02] <_joe_> so it's correct [21:23:04] in ganglia it is good [21:23:15] but somethign is fishy [21:23:17] and i'm not sure what [21:23:24] <_joe_> it's not with hiera though [21:23:26] so i'm trying to understand this hiera part now, so I can see what has changed [21:23:33] i believe you but i want to understand anyway [21:23:39] <_joe_> ok [21:23:47] why does analytics1001 have a hosts/ file, but not say, analytics1011? [21:24:05] AH [21:24:07] regex.yaml [21:24:07] <_joe_> because I suppose it has a role applied with the role keyword [21:24:11] <_joe_> or that [21:24:22] <_joe_> :) [21:24:24] (03CR) 10BryanDavis: [C: 031] Revert "Limit runJobs output to warning and higher severity" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192083 (owner: 10Hoo man) [21:24:34] if it was role keyword, where would that be? 
[21:24:50] <_joe_> hieradata/role/common [21:25:04] <_joe_> look at wikitech for hiera, it's all there [21:25:05] matanya: For this quarter I hope to spend some time writing an MW extension that provides an RTRC-like experience [21:25:09] Tied in with CVN [21:25:11] and stream.wm.o [21:25:13] rcstream [21:25:15] (instead of polling api) [21:25:24] ok ja sorry, I should rtfm :) thanks [21:25:39] <_joe_> Krinkle: why not a service instead? [21:26:09] _joe_: services providing special pages? [21:26:22] <_joe_> Right [21:26:31] ok, andrewbogott, so. virt1000 also looks good in gmond.conf [21:26:49] <_joe_> I missed the second part [21:27:05] <_joe_> s/second/that/ [21:27:27] <_joe_> ottomata: I guess something fishy is happening with multicast [21:28:30] <_joe_> but look for relevant changes around that date [21:28:36] <_joe_> it may well be my mistake [21:31:52] that would be great [21:34:40] operations, Phabricator: Create policy projects and convert people projects to open - https://phabricator.wikimedia.org/T90491#1060488 (chasemp) NEW [21:35:17] operations, Phabricator: Create policy projects and convert people projects to open - https://phabricator.wikimedia.org/T90491#1060496 (chasemp) [21:38:59] ah, andrewbogott, hm, to your knowledge, shoudl all ganglia aggregators for a given cluster have data for all hosts? [21:39:03] in that cluster? [21:39:12] since they are all using the same multicast group? [21:39:19] I would think so, yes. [21:39:37] yeah me too [21:39:40] in both of our cases [21:39:47] the first aggregator listed in gmetad.conf [21:39:49] is missing the offending nodes [21:39:59] but, the next aggregator has them [21:40:13] it look slike each aggregator only has nodes for that um...rack/row? [21:40:16] however they are set up [21:40:25] maybe something is fishy with multicast [21:40:34] hey paravoid [21:40:46] do you know of any network multicast stuff that might have changed last week?
[21:56:10] (PS1) Ottomata: Remove non production ganglia aggregator analytics1010 [puppet] - https://gerrit.wikimedia.org/r/192460 (https://phabricator.wikimedia.org/T90035) [21:57:16] (CR) Ottomata: [C: 2] Remove non production ganglia aggregator analytics1010 [puppet] - https://gerrit.wikimedia.org/r/192460 (https://phabricator.wikimedia.org/T90035) (owner: Ottomata) [21:58:09] <_joe_> ottomata: that sounds like a problem, yes :) [21:58:28] _joe_: i also noticed, that the first aggregator listed for analytics cluster [21:58:38] was not in the analytics cluster [21:58:43] bblack: thoughts re: https://gerrit.wikimedia.org/r/#/c/192370/ ? ottomata +1'd but it's a VCL patch so I'm not deploying it myself. [21:58:44] could also be very related [21:58:50] but, yes, something still seems weird with multicast [21:58:57] i can't send multicast traffic between rows [21:59:21] <_joe_> ottomata: that is surely related [21:59:34] <_joe_> if not in the cluster, it's not in the multicast group [21:59:46] <_joe_> so no way for it to listen to the others chattering [21:59:50] aye [21:59:54] <_joe_> the ganglia stuff is so broken [21:59:55] but, i did a different test too [22:00:10] so [22:00:12] <_joe_> I'm not sure about the rows problem [22:00:13] next aggregator [22:00:15] analytics1013 [22:00:15] yeah [22:00:19] is in analytics clsuter [22:00:26] <_joe_> I am able to speak multicast between rows [22:00:26] and it only contains hosts that are in its row [22:00:38] <_joe_> mh that's kinda strange [22:00:43] how are you testing? i'm using iperf, maybe doing something wrong [22:00:46] <_joe_> check its own firewall rules [22:00:52] OH [22:00:58] or, maybe I just have a typo in my grep... [22:00:59] <_joe_> ottomata: mc1016 is in ganglia [22:01:10] <_joe_> and it's in row C I think [22:01:16] han gon [22:01:23] operations, Graphite: replace txstatsd - https://phabricator.wikimedia.org/T90111#1060612 (GWicke) >>!
In T90111#1059992, @chasemp wrote: > >> - the flush period of those statsd instances is shorter than the aggregation period in graphite (1/5?) > > Why? If the statsd flush period is longer, then there... [22:02:01] yeah i take it back, it was just the first bad aggregator [22:02:19] still, my iperf multicast test didn't work, must have been doing somehting wrong there too [22:05:41] operations, ops-eqiad: cr1-eqiad power supply fan failure - https://phabricator.wikimedia.org/T89224#1060627 (faidon) Resolved>Open This isn't fixed, T89999 is just masking this error. [22:05:52] <_joe_> ottomata: happy to help by being grumpy and saying "no" to most hypotheses :P [22:06:09] Puppet, operations: Virt nodes missing from the 'virtualization cluster eqiad' ganglia report - https://phabricator.wikimedia.org/T90035#1060631 (Ottomata) This turned out to be caused by a change that removed analytics1010 from the ganglia analytics cluster, but didn't remove it as an aggregator. [22:06:16] Puppet, operations: Virt nodes missing from the 'virtualization cluster eqiad' ganglia report - https://phabricator.wikimedia.org/T90035#1060632 (Ottomata) (for analytics, anyway) [22:06:29] thanks _joe_ :) [22:07:18] (PS6) Ottomata: Redirect to https for hue if not coming via https proxy [puppet] - https://gerrit.wikimedia.org/r/192389 (https://phabricator.wikimedia.org/T85834) [22:08:19] (CR) Ottomata: [C: 2] Redirect to https for hue if not coming via https proxy [puppet] - https://gerrit.wikimedia.org/r/192389 (https://phabricator.wikimedia.org/T85834) (owner: Ottomata) [22:14:07] (PS1) Ottomata: Configure hue to handle upstream https proxy [puppet] - https://gerrit.wikimedia.org/r/192463 [22:14:41] (PS2) Ottomata: Configure hue to handle upstream https proxy [puppet] - https://gerrit.wikimedia.org/r/192463 [22:16:18] (CR) Ottomata: [C: 2] Configure hue to handle upstream https proxy [puppet] - https://gerrit.wikimedia.org/r/192463 (owner:
Ottomata) [22:17:20] (PS4) Ori.livneh: Making alarms for NavigationTiming events less sensitive [puppet] - https://gerrit.wikimedia.org/r/192375 (owner: Nuria) [22:17:20] operations, Phabricator, domains: enable email for tickets in domains project? - https://phabricator.wikimedia.org/T88842#1060657 (chasemp) is this done then? [22:17:28] (CR) Ori.livneh: [C: 2 V: 2] Making alarms for NavigationTiming events less sensitive [puppet] - https://gerrit.wikimedia.org/r/192375 (owner: Nuria) [22:19:08] (Abandoned) Ottomata: Create apache site define for forcing https and doing transparent reverse proxy, use this for hue [puppet] - https://gerrit.wikimedia.org/r/191911 (https://phabricator.wikimedia.org/T85834) (owner: Ottomata) [22:20:04] (CR) Ottomata: [C: 2] "Woot, we ended up doing this in varnish." [dns] - https://gerrit.wikimedia.org/r/180471 (owner: Dzahn) [22:30:42] bblack, hi, dan has been asking me about all traffic tagging, any progress on that? [22:34:00] operations, Graphite: replace txstatsd - https://phabricator.wikimedia.org/T90111#1060704 (chasemp) >>! In T90111#1060612, @GWicke wrote: >>>! In T90111#1059992, @chasemp wrote: >> >>> - the flush period of those statsd instances is shorter than the aggregation period in graphite (1/5?) >> >> Why? > > I... [22:34:08] (PS1) Legoktm: contint: Use 'wmf-deploy' branch for cloning mediawiki/tools/codesniffer [puppet] - https://gerrit.wikimedia.org/r/192466 (https://phabricator.wikimedia.org/T90495) [22:38:04] operations, Graphite: replace txstatsd - https://phabricator.wikimedia.org/T90111#1060711 (GWicke) > You are going to be getting smaller actual aggregate periods than reported, and you are going to be clobbering any gauge data especially. Not with the aggregation settings [I linked to](https://github.com/... [22:39:45] yurik: zero progress (pun intended!) [22:47:47] godog: are instances swift-filippo-c2 and swift-filippo-c1 still in use?
(I’m just doing housekeeping, nothing urgent) [22:48:24] operations, Graphite: replace txstatsd - https://phabricator.wikimedia.org/T90111#1060752 (chasemp) >>! In T90111#1060711, @GWicke wrote: >> You are going to be getting smaller actual aggregate periods than reported, and you are going to be clobbering any gauge data especially. > > Not with the aggregatio... [22:52:38] <_joe_> andrewbogott: you can turn off everything in the hat-imagescaler project if you feel like it [22:53:16] _joe_: does ‘turn off’ == ‘delete’? [22:53:27] <_joe_> nope please :) [22:53:55] ok [22:53:56] <_joe_> I may still use what I prepared there, but they can stay turned off for now [22:54:03] ah, cool. thanks [23:01:10] (CR) Calak: "Can you check this? Thank you." [mediawiki-config] - https://gerrit.wikimedia.org/r/189594 (https://phabricator.wikimedia.org/T89040) (owner: Calak) [23:06:06] operations, Graphite: replace txstatsd - https://phabricator.wikimedia.org/T90111#1060866 (GWicke) Hmm, if I understand you correctly you are saying that it's not possible to aggregate several flushes in graphite. That would indeed be a bummer, as it would make it hard to parallelize statsds feeding into g... [23:14:26] (CR) BBlack: [C: 1] "I don't like this idea a lot, but I don't think there's a technical reason this doesn't work :p" [puppet] - https://gerrit.wikimedia.org/r/192370 (owner: Ori.livneh) [23:14:45] bblack: why not?
[23:15:03] there are so many levels on which I could answer that question [23:15:37] really, probably the highest-level answer is that I'm not a fan of analytics data gathering to begin with :) [23:15:40] well, the point of the patch is to improve the coherence and structure of the way we do this, rather than to scratch a particular itch [23:16:02] my point is, feel free to merge, I don't think it breaks anything, but I'm not going to act cheery about it :p [23:16:05] so if you don't think it's the right idea, i'm up for thinking it through with you [23:16:55] basically, when I see us creating new HTTP requests for the purpose of ... tracking other requests ... my "we're way too far down the analytics rabbit hole" alarm goes off [23:16:55] all statsv data, all projected multimedia data, and a good portion of eventlogging data is performance monitoring [23:17:10] which we're already doing, I know, so this is just one more URL pattern for an existing thing [23:17:28] bblack: that was exactly my reaction (i'll dig up the phab task), but multimedia has a good reason [23:17:42] which is that under certain conditions images are preloaded as an optimization strategy [23:17:47] but they're not necessarily shown to the user [23:17:59] they want to track image views, rather than image requests, and the two are not always identical. [23:18:01] I think I fall into the camp that ultimately doesn't care about the reason. The combined arguments of better privacy and simpler technical details >> data [23:19:15] bblack, I totally see where you're coming from, but I think you might be bringing a bias to the table from your experience of dealing with this in the WMF previously [23:19:42] well, we don't have to hash it out anyways. I know I can't win this battle. 
[23:19:47] but this request specifically came from erik zachte, who is very close to your heart here, and the further one could possibly be from someone who'd want to optimize features for user behaviors [23:19:53] I'm just taking the opportunity to make noise :p [23:20:27] you gave me the green light to merge -- if all I cared about was pushing this out I would have just gone ahead with it after that. [23:20:47] so I wouldn't be so fatalistic about your ability to persuade :) [23:22:19] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds [23:22:27] is there not some way to distinguish the preloads from the actual views in another way? via something about the requests themselves that doesn't vary caching? [23:22:39] setting/seeing a header? [23:24:04] bblack: possibly... [23:24:53] making the URL space for beaconing generic and official is going to create more uses of course. eventually I expect to be debugging something in varnishlog and notice that the top URL hit with 72% of our request traffic is something that's beaconing mouseover movements to gather data on optimizing image placement or something [23:25:08] because I'm very pessimistic about these kinds of things :) [23:25:43] do you mind if I copy/paste that into a comment on the patch? I didn't think of that, and I think it's a fair point [23:26:16] go for it [23:27:29] (CR) Ori.livneh: "Interest thought from bblack: Is there not some way to distinguish the preloads from the actual views in another way, via something about " [puppet] - https://gerrit.wikimedia.org/r/190821 (https://phabricator.wikimedia.org/T89088) (owner: Gilles) [23:27:29] I'm going to go brave the cold and acquire semi-nutrious food-like input, bbl :) [23:27:39] bye, thanks for the review [23:28:00] (CR) Ori.livneh: " making the URL space for beaconing generic and official is going to create more uses of course.
eventually I expect to be debugg" [puppet] - https://gerrit.wikimedia.org/r/192370 (owner: Ori.livneh) [23:28:48] !log on osmium installing packages necessary for building hhvm [23:28:52] Logged the message, Master [23:48:00] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds [23:48:59] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212 [23:57:14] (CR) Gergő Tisza: "It's technically possible as we load the images via AJAX. You'll lose image view information for preloaded images though (there is no sepa" [puppet] - https://gerrit.wikimedia.org/r/190821 (https://phabricator.wikimedia.org/T89088) (owner: Gilles) [23:57:37] operations, ops-codfw: rack/wire/initial setup of db2043-db2070 - https://phabricator.wikimedia.org/T89368#1060964 (Papaul) Rack table update. The DB servers in C6 are in place in total 9 servers in C6. I have 19 more servers that i need to rack, waiting for final rack location.