[00:00:15] related: if i submit a patch, jenkins starts running the tests. if i +2 it, *don't wait for the commit tests to finish, abort them* [00:00:23] rather than wait for the test suite to finish just so you can run it again [00:00:56] hmmm, interesting, yeah [00:02:24] so, the one I skipped/didn't record ("make certain tests only run if a particular file is changed") I'm not sure how to record, maybe I'll just do the last suggestion (re composer.json) and then work from there [00:03:21] is there an RT ticket for better hardware? [00:03:36] I think so yeah, one sec [00:03:41] (PS5) Dzahn: move racktables from misc to module [puppet] - https://gerrit.wikimedia.org/r/167903 [00:05:34] I don't think there are any tests specifically of the code in mw/vendor. Adding mw/vendor meant that it needed to be present for all tests so that the PSR-3 patch can land. [00:05:43] And there is something not quite optimal about the cloning for that as I recall. Like the old system cloned with hardlinks and now it doesn't. [00:06:13] (CR) Ori.livneh: move racktables from misc to module (2 comments) [puppet] - https://gerrit.wikimedia.org/r/167903 (owner: Dzahn) [00:08:16] That being said, on a few random tests I'm looking at in Jenkins, the clone step was <5 seconds of a 6+ minute test run [00:09:04] ori: I can't find a bug or an RT ticket re jenkins ram/hardware, though I remember there being movement on it (or I'm just being hopeful) [00:09:18] I mean, there's the general "add more jenkins slaves" one [00:09:36] ... [00:09:38] (CR) Dzahn: move racktables from misc to module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/167903 (owner: Dzahn) [00:09:42] https://bugzilla.wikimedia.org/show_bug.cgi?id=70049 [00:10:01] (CR) Ori.livneh: [C: 1] move racktables from misc to module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/167903 (owner: Dzahn) [00:10:04] whatever happened to the "run all extension tests under HHVM together" idea? 
[00:10:20] it wouldn't have worked, but it was a start in the right direction [00:26:46] greg-g: what's your wikitech name? [00:26:46] Greg Grossmeier [00:26:46] ^ [00:26:46] done [00:26:46] thanks muchly [00:26:47] (03CR) 10Dzahn: [C: 032] "yep, all these classes do is include webserver::base (already getting that from below), and installing the packages" [puppet] - 10https://gerrit.wikimedia.org/r/167999 (owner: 10Dzahn) [00:27:08] wtf, why do I still see nothing on that page? [00:28:14] you probably need to log out and log back in [00:28:14] heh /me sighs [00:28:15] you're right :/ [00:28:15] greg-g: an e-mail to engineering@ and wikitech@ saying "labs is oversubscribed and we badly need to have more jenkins slaves, could you take a look at your instances and see if there are any you don't need anymore" could help [00:28:16] idspispopd [00:28:24] idklev! [00:28:34] Coren: did you already send a similar email to labs-l like that ^ ? [00:29:06] not everyone reads labs-l [00:29:10] greg-g: We did before the migration. It helped a little. Then we lost one of the virt hosts. 
[00:29:15] * greg-g nods [00:29:17] right right [00:30:04] ori: will do tonight/tomorrow morning, I have to run now, other things added as tasks for antoine to see tonight, will hopefully get movement on those [00:30:13] thanks very much [00:30:16] thank you [00:30:24] hope i wasn't too big a jerk [00:30:36] nothing I can't handle ;) [00:30:53] * greg-g goes [00:35:37] (03PS1) 10Dzahn: move 'noc' from misc to module [puppet] - 10https://gerrit.wikimedia.org/r/168006 [00:39:17] Error: Could not find data item mediawiki_memcached_servers in any Hiera data file and no default supplied at /opt/wmf/software/compare-puppet-catalogs/external/change/168006/puppet/manifests/role/mediawiki.pp:26 on node terbium.eqiad.wmnet [00:39:29] seems so unrelated to the change i made, but happens to be on the same node [00:39:34] in compiler that is [00:40:54] yep, it already fails in "production" http://puppet-compiler.wmflabs.org/436/change/168006/compiled/puppet_catalogs_3_production/terbium.eqiad.wmnet.warnings [00:43:24] (03CR) 10Dzahn: "can't really run this one in compiler because it fails in the "production" run before this change. 
see here http://puppet-compiler.wmflabs" [puppet] - 10https://gerrit.wikimedia.org/r/168006 (owner: 10Dzahn) [00:44:56] (03CR) 10Dzahn: "unrelated Error: Could not find data item mediawiki_memcached_servers in any Hiera data file" [puppet] - 10https://gerrit.wikimedia.org/r/168006 (owner: 10Dzahn) [00:51:30] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 311 seconds [00:52:01] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 345 seconds [00:53:00] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -1 seconds [00:53:31] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds [01:10:11] (03PS2) 10Dzahn: apply role::tor on radium [puppet] - 10https://gerrit.wikimedia.org/r/167161 [01:10:39] (03PS3) 10Dzahn: apply role::tor on radium [puppet] - 10https://gerrit.wikimedia.org/r/167161 [01:11:32] (03CR) 10Dzahn: [C: 032] apply role::tor on radium [puppet] - 10https://gerrit.wikimedia.org/r/167161 (owner: 10Dzahn) [01:12:33] \o/ [01:15:03] paravoid: :) almost, almost.. it applied all the stuff [01:15:05] Unknown option 'RunAsDemon'. Failing [01:15:59] typo, bah [01:16:56] PROBLEM - puppet last run on radium is CRITICAL: CRITICAL: Puppet has 1 failures [01:18:35] (03PS1) 10Dzahn: tor - fix typo 'demon' -> 'daemon' [puppet] - 10https://gerrit.wikimedia.org/r/168012 [01:19:32] (03PS2) 10Dzahn: tor - fix typo 'demon' -> 'daemon' [puppet] - 10https://gerrit.wikimedia.org/r/168012 [01:20:29] "< mutante> typo, bah" +1 for linting/non-voting ;) [01:21:29] it wouldn't catch those though [01:21:43] it was inside an app config [01:22:25] (03CR) 10Dzahn: [C: 032] tor - fix typo 'demon' -> 'daemon' [puppet] - 10https://gerrit.wikimedia.org/r/168012 (owner: 10Dzahn) [01:23:10] mutante: :) [01:23:44] Nickname 'wikimedia-eqiad-1' is wrong length or contains illegal characters :p [01:23:56] hah [01:31:42] umm.. 
supposed to be allowed < opal> [A-Za-z_`[]{}\|][-0-9A-Za-z_`[]{}\|]{29} [01:39:31] (03PS1) 10Dzahn: tor - ensure /srv/tor exists/owned by debian-tor [puppet] - 10https://gerrit.wikimedia.org/r/168017 [01:45:46] (03PS2) 10Dzahn: tor - ensure /srv/tor exists/owned by debian-tor [puppet] - 10https://gerrit.wikimedia.org/r/168017 [01:45:53] (03PS1) 10Dzahn: tor - adjust nickname [puppet] - 10https://gerrit.wikimedia.org/r/168020 [01:46:17] (03PS2) 10Dzahn: tor - adjust nickname [puppet] - 10https://gerrit.wikimedia.org/r/168020 [01:51:41] (03CR) 10Dzahn: [C: 032] tor - ensure /srv/tor exists/owned by debian-tor [puppet] - 10https://gerrit.wikimedia.org/r/168017 (owner: 10Dzahn) [01:52:16] (03CR) 10Dzahn: [C: 032] tor - adjust nickname [puppet] - 10https://gerrit.wikimedia.org/r/168020 (owner: 10Dzahn) [01:58:27] PROBLEM - puppet last run on radium is CRITICAL: CRITICAL: Puppet has 1 failures [01:59:48] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: puppet fail [02:01:36] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [02:07:12] (03PS1) 10Dzahn: tor - run as debian-tor user [puppet] - 10https://gerrit.wikimedia.org/r/168022 [02:07:17] (03CR) 10jenkins-bot: [V: 04-1] tor - run as debian-tor user [puppet] - 10https://gerrit.wikimedia.org/r/168022 (owner: 10Dzahn) [02:07:35] (03PS2) 10Dzahn: tor - run as debian-tor user [puppet] - 10https://gerrit.wikimedia.org/r/168022 [02:11:41] (03CR) 10Dzahn: [C: 032] tor - run as debian-tor user [puppet] - 10https://gerrit.wikimedia.org/r/168022 (owner: 10Dzahn) [02:11:49] (03PS1) 10Dzahn: tor - include passwords for control pass [puppet] - 10https://gerrit.wikimedia.org/r/168023 [02:12:06] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
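On the nickname error at 01:23:44: as far as the Tor manual describes it, a relay Nickname must be 1 to 19 characters long and use only [a-zA-Z0-9], which would explain why 'wikimedia-eqiad-1' (hyphens) was rejected even though the pattern quoted at 01:31:42 allows more. A minimal Python sketch of that check follows; `is_valid_tor_nickname` and `suggest_nickname` are hypothetical helpers for illustration, not part of Tor or of the puppet module.

```python
import re

# Assumed rule (from the Tor manual's description of the Nickname option):
# 1 to 19 characters, ASCII letters and digits only.
TOR_NICKNAME_RE = re.compile(r"[A-Za-z0-9]{1,19}")


def is_valid_tor_nickname(name: str) -> bool:
    """Return True if the whole string satisfies the assumed nickname rule."""
    return TOR_NICKNAME_RE.fullmatch(name) is not None


def suggest_nickname(name: str) -> str:
    """Hypothetical helper: drop disallowed characters, then cut to 19 chars."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", name)[:19]
    if not cleaned:
        raise ValueError("no usable characters left in nickname")
    return cleaned


if __name__ == "__main__":
    for candidate in ("wikimedia-eqiad-1", suggest_nickname("wikimedia-eqiad-1")):
        print(candidate, is_valid_tor_nickname(candidate))
```

Under these rules `wikimedia-eqiad-1` fails and `wikimediaeqiad1` passes; the nickname actually chosen in change 168020 is not shown in the log.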
[02:12:24] (03PS2) 10Dzahn: tor - include passwords for control pass [puppet] - 10https://gerrit.wikimedia.org/r/168023 [02:13:31] (03CR) 10Dzahn: [C: 032] tor - include passwords for control pass [puppet] - 10https://gerrit.wikimedia.org/r/168023 (owner: 10Dzahn) [02:17:49] !log LocalisationUpdate completed (1.25wmf3) at 2014-10-22 02:17:49+00:00 [02:18:01] Logged the message, Master [02:18:27] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [02:21:46] PROBLEM - Disk space on ocg1002 is CRITICAL: DISK CRITICAL - free space: / 349 MB (3% inode=73%): [02:24:13] (03PS1) 10Dzahn: tor - remove obsolete config option 'Group' [puppet] - 10https://gerrit.wikimedia.org/r/168025 [02:24:19] (03CR) 10jenkins-bot: [V: 04-1] tor - remove obsolete config option 'Group' [puppet] - 10https://gerrit.wikimedia.org/r/168025 (owner: 10Dzahn) [02:24:24] (03PS2) 10Dzahn: tor - remove obsolete config option 'Group' [puppet] - 10https://gerrit.wikimedia.org/r/168025 [02:26:46] (03CR) 10Dzahn: [C: 032] "like https://bugzilla.redhat.com/show_bug.cgi?id=515822" [puppet] - 10https://gerrit.wikimedia.org/r/168025 (owner: 10Dzahn) [02:30:07] !log LocalisationUpdate completed (1.25wmf4) at 2014-10-22 02:30:07+00:00 [02:30:13] Logged the message, Master [02:30:45] (03PS1) 10Dzahn: tor - remove trailing / from DataDirectory [puppet] - 10https://gerrit.wikimedia.org/r/168027 [02:32:29] (03PS2) 10Dzahn: tor - remove trailing / from DataDirectory [puppet] - 10https://gerrit.wikimedia.org/r/168027 [02:37:58] (03CR) 10Dzahn: [C: 032] tor - remove trailing / from DataDirectory [puppet] - 10https://gerrit.wikimedia.org/r/168027 (owner: 10Dzahn) [02:52:24] (03PS1) 10Dzahn: tor - do not use /srv, let the package handle it [puppet] - 10https://gerrit.wikimedia.org/r/168031 [02:52:29] (03CR) 10jenkins-bot: [V: 04-1] tor - do not use /srv, let the package handle it [puppet] - 10https://gerrit.wikimedia.org/r/168031 (owner: 10Dzahn) 
[02:53:12] (03PS2) 10Dzahn: tor - do not use /srv, let the package handle it [puppet] - 10https://gerrit.wikimedia.org/r/168031 [02:53:57] PROBLEM - BGP status on cr2-ulsfo is CRITICAL: CRITICAL: No response from remote host 198.35.26.193 [02:54:33] (03CR) 10Dzahn: [C: 032] tor - do not use /srv, let the package handle it [puppet] - 10https://gerrit.wikimedia.org/r/168031 (owner: 10Dzahn) [02:55:47] PROBLEM - puppet last run on radium is CRITICAL: CRITICAL: Puppet has 1 failures [02:56:48] RECOVERY - puppet last run on radium is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [02:56:57] RECOVERY - BGP status on cr2-ulsfo is OK: OK: host 198.35.26.193, sessions up: 42, down: 0, shutdown: 0 [03:15:54] (03Restored) 10Catrope: Clean up the mess that is SSL certificate installation [puppet] - 10https://gerrit.wikimedia.org/r/15561 (owner: 10Catrope) [03:21:16] PROBLEM - Disk space on ocg1001 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=72%): [03:42:56] PROBLEM - Disk space on ocg1001 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=72%): [03:52:06] PROBLEM - Disk space on ocg1003 is CRITICAL: DISK CRITICAL - free space: / 348 MB (3% inode=72%): [03:54:53] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Oct 22 03:54:53 UTC 2014 (duration 54m 52s) [03:54:59] Logged the message, Master [03:57:16] PROBLEM - Disk space on ocg1003 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=72%): [04:20:07] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: puppet fail [04:22:37] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: puppet fail [04:38:37] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [04:41:26] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [05:10:38] (03PS1) 10Ori.livneh: HHVM: install source to /usr/local/src/hhvm [puppet] - 
10https://gerrit.wikimedia.org/r/168041 [05:13:28] (03CR) 10Ori.livneh: [C: 032] HHVM: install source to /usr/local/src/hhvm [puppet] - 10https://gerrit.wikimedia.org/r/168041 (owner: 10Ori.livneh) [05:17:47] PROBLEM - puppet last run on mw1021 is CRITICAL: CRITICAL: puppet fail [05:18:48] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: puppet fail [05:19:39] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: puppet fail [05:19:48] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: puppet fail [05:19:49] PROBLEM - puppet last run on mw1020 is CRITICAL: CRITICAL: puppet fail [05:23:32] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: puppet fail [05:23:33] PROBLEM - puppet last run on mw1028 is CRITICAL: CRITICAL: puppet fail [05:25:24] PROBLEM - puppet last run on mw1026 is CRITICAL: CRITICAL: puppet fail [05:28:12] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: puppet fail [05:31:45] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: puppet fail [05:32:14] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: puppet fail [05:33:14] PROBLEM - puppet last run on mw1023 is CRITICAL: CRITICAL: puppet fail [05:34:15] RECOVERY - puppet last run on stat1001 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [05:34:24] PROBLEM - puppet last run on mw1024 is CRITICAL: CRITICAL: puppet fail [05:34:26] PROBLEM - puppet last run on mw1022 is CRITICAL: CRITICAL: puppet fail [05:36:14] PROBLEM - puppet last run on mw1027 is CRITICAL: CRITICAL: puppet fail [05:39:47] (03PS1) 10Ori.livneh: hhvm: Make source file installation refreshonly [puppet] - 10https://gerrit.wikimedia.org/r/168044 [05:39:50] <_joe_> ori: why did you remove warmup.urls? 
[05:40:04] _joe_: it's in the mediawiki module now [05:40:12] <_joe_> oh ok :) [05:40:24] (03CR) 10Ori.livneh: [C: 032 V: 032] hhvm: Make source file installation refreshonly [puppet] - 10https://gerrit.wikimedia.org/r/168044 (owner: 10Ori.livneh) [05:40:46] <_joe_> Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid parameter shell at /etc/puppet/modules/hhvm/manifests/init.pp:218 [05:41:35] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [05:42:35] RECOVERY - puppet last run on mw1028 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [05:44:15] RECOVERY - puppet last run on mw1026 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [05:46:19] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [05:50:55] RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [05:51:15] RECOVERY - puppet last run on mw1023 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [05:51:15] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [05:53:25] RECOVERY - puppet last run on mw1024 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [05:53:26] RECOVERY - puppet last run on mw1022 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [05:55:18] RECOVERY - puppet last run on mw1027 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [05:56:05] RECOVERY - puppet last run on mw1021 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [05:56:25] PROBLEM - puppet last run on stat1001 is CRITICAL: CRITICAL: puppet fail [05:57:07] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 
34 seconds ago with 0 failures [05:57:46] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [05:59:05] RECOVERY - puppet last run on mw1020 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [05:59:15] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 61 seconds ago with 0 failures [06:01:10] (03CR) 10Nemo bis: [C: 031] "Looks ok on simple.wikipedia.wmflabs.org. http://pediapress.com/books/show/bdace4c4669bb7c42cfe40d818764e/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167866 (https://bugzilla.wikimedia.org/71675) (owner: 10Cscott) [06:02:12] <_joe_> Nemo_bis: good morning [06:02:48] _joe_: mogge :) [06:14:11] PROBLEM - puppet last run on db2011 is CRITICAL: CRITICAL: puppet fail [06:15:06] RECOVERY - puppet last run on stat1001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:21:10] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: puppet fail [06:23:30] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: puppet fail [06:26:12] RECOVERY - Disk space on ocg1001 is OK: DISK OK [06:27:00] RECOVERY - Disk space on ocg1003 is OK: DISK OK [06:27:00] RECOVERY - Disk space on ocg1002 is OK: DISK OK [06:28:00] PROBLEM - puppet last run on dbproxy1001 is CRITICAL: CRITICAL: puppet fail [06:28:40] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: puppet fail [06:28:50] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: puppet fail [06:29:00] PROBLEM - puppet last run on amssq48 is CRITICAL: CRITICAL: puppet fail [06:29:01] PROBLEM - puppet last run on mw1039 is CRITICAL: CRITICAL: puppet fail [06:30:39] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:41] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:41] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures 
[06:30:49] PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:49] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 2 failures [06:30:50] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:00] PROBLEM - puppet last run on analytics1010 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:00] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:00] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:00] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:00] PROBLEM - puppet last run on db1043 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:09] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:19] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:20] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:21] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:30] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:31] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:31] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:39] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:39] PROBLEM - puppet last run on searchidx1001 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:39] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:49] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:49] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:00] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 2 failures [06:32:34] (03PS1) 10Yuvipanda: Add basic 
setup.py file [software/shinkengen] - https://gerrit.wikimedia.org/r/168046 [06:32:35] legoktm: ^ CR? [06:32:38] python [06:33:22] (CR) Yuvipanda: [C: 2 V: 2] Add basic setup.py file [software/shinkengen] - https://gerrit.wikimedia.org/r/168046 (owner: Yuvipanda) [06:33:26] (CR) Legoktm: Add basic setup.py file (1 comment) [software/shinkengen] - https://gerrit.wikimedia.org/r/168046 (owner: Yuvipanda) [06:33:29] RECOVERY - puppet last run on db2011 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [06:33:45] (CR) Legoktm: "-1" [software/shinkengen] - https://gerrit.wikimedia.org/r/168046 (owner: Yuvipanda) [06:34:12] legoktm: bah, missed that [06:35:32] (PS1) Yuvipanda: Fix URL to point to the proper git repository [software/shinkengen] - https://gerrit.wikimedia.org/r/168047 [06:36:29] <_joe_> YuviPanda: what's shinkengen? [06:36:39] PROBLEM - puppet last run on stat1001 is CRITICAL: CRITICAL: puppet fail [06:36:40] <_joe_> does it have to interact with puppet in any way? [06:37:02] (PS1) Yuvipanda: Add Apache2 License [software/shinkengen] - https://gerrit.wikimedia.org/r/168048 [06:37:12] _joe_: it's like naggen2, but for shinken. [06:37:46] _joe_: and no, it doesn't really interact with puppet, other than I'll probably have puppet exec it (or cron it) on the shinken host [06:37:51] it relies on the wikitech API [06:38:05] <_joe_> oh ok so it's specific to labs [06:38:09] _joe_: yeah [06:38:11] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:38:14] <_joe_> maybe you can abstract the api part [06:38:25] <_joe_> and we'd be able to use it in prod as well :) [06:38:48] _joe_: yeah, I'm doing that next :) It expects 'instance' objects... 
[06:39:49] _joe_: but since we've no resource collection in labs, monitor_service, etc will need to be generated differently [06:40:12] _joe_: current thinking is that projects will define simple python functions that return Service objects based on hostname, projectname, puppet classes applied [06:40:20] <_joe_> YuviPanda: why we do not collect resources in python? [06:40:42] _joe_: we can't collect resources in labs. too many self hosted puppetmasters, plus it'll be slow, and also insecure since anyone can have root... [06:40:55] wait, what do you mean 'in python'? [06:42:10] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [06:43:41] There is some sort of a ui image missing : https://bits.wikimedia.org/w/images/ui-bg_highlight-soft_75_cccccc_1x100.png [06:44:15] it is called from some pages: and in the console: Failed to load resource: the server responded with a status of 404 (Not Found) https://bits.wikimedia.org/w/images/ui-bg_highlight-soft_75_cccccc_1x100.png [06:44:59] RECOVERY - puppet last run on searchidx1001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:45:10] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [06:45:10] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [06:45:30] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:45:30] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:45:39] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [06:45:41] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [06:45:50] RECOVERY - puppet last run on cp1056 
is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [06:45:59] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [06:46:00] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:46:10] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [06:46:14] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [06:46:14] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [06:46:14] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [06:46:15] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:46:18] (03PS1) 10Yuvipanda: Setup generator as a script by itself [software/shinkengen] - 10https://gerrit.wikimedia.org/r/168050 [06:46:21] matanya: did you see shinken now? 
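The idea floated at 06:40:12 (per-project Python functions that return Service objects from hostname, project name and applied puppet classes) could look roughly like the sketch below. Everything here is an assumption for illustration: the `Instance` and `Service` fields, the rule functions and `collect_services` are made-up names, not shinkengen's actual API, and `check_ssh`/`check_http` stand in for whatever check commands the Shinken config defines.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Instance:
    """Hypothetical view of a labs instance as returned by some API layer."""
    hostname: str
    project: str
    puppet_classes: Tuple[str, ...]


@dataclass(frozen=True)
class Service:
    """Hypothetical monitoring service definition for one host."""
    host: str
    description: str
    check_command: str


# A rule is a plain function: instance in, zero or more services out.
ServiceRule = Callable[[Instance], List[Service]]


def ssh_everywhere(instance: Instance) -> List[Service]:
    # Every instance gets an SSH check regardless of its puppet classes.
    return [Service(instance.hostname, "SSH", "check_ssh")]


def http_for_webservers(instance: Instance) -> List[Service]:
    # Only instances with the webserver class applied get an HTTP check.
    if "webserver::base" in instance.puppet_classes:
        return [Service(instance.hostname, "HTTP", "check_http")]
    return []


def collect_services(instance: Instance, rules: Iterable[ServiceRule]) -> List[Service]:
    """Apply each rule in order and flatten the results into one list."""
    services: List[Service] = []
    for rule in rules:
        services.extend(rule(instance))
    return services


if __name__ == "__main__":
    web = Instance("tools-web1", "tools", ("webserver::base",))
    for svc in collect_services(web, [ssh_everywhere, http_for_webservers]):
        print(svc)
```

Keeping rules as plain functions means a project can add checks without touching the generator; the generator only needs the list of rules and the instance data.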
[06:46:23] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [06:46:24] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:46:26] matanya: has host configs [06:46:39] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [06:46:39] RECOVERY - puppet last run on mw1039 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [06:46:40] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [06:46:40] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [06:46:42] I logged in on monday [06:46:48] it was empty :) [06:46:50] RECOVERY - puppet last run on dbproxy1001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:47:00] matanya: try shinken.wmflabs.org/all [06:47:09] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [06:47:10] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:47:10] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 65 seconds ago with 0 failures [06:47:20] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [06:47:34] deployment-soa-cache01 is down [06:47:38] that is all i see [06:47:39] RECOVERY - puppet last run on analytics1010 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [06:47:50] RECOVERY - puppet last run on amssq48 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [06:47:50] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 15 
seconds ago with 0 failures [06:50:46] matanya: in the 'all' tab? [06:50:50] yes [06:51:03] http://shinken.wmflabs.org/all?search=tools is all of toollabs, for example [06:51:06] hard refresh fail [06:51:59] http://shinken.wmflabs.org/user/login needs explanatory link [06:54:37] it needs a lot of love before it's useful. [06:54:45] like a MW OAuth auth provider... [07:07:10] (03PS1) 10Yuvipanda: Make sure setup.py picks up all packages [software/shinkengen] - 10https://gerrit.wikimedia.org/r/168052 [07:07:41] (03CR) 10Yuvipanda: [C: 032 V: 032] Fix URL to point to the proper git repository [software/shinkengen] - 10https://gerrit.wikimedia.org/r/168047 (owner: 10Yuvipanda) [07:09:50] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: Puppet has 1 failures [07:27:02] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [07:31:43] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: Puppet has 1 failures [07:35:46] (03PS2) 10Alexandros Kosiaris: Backup /var/lib/carbon/whisper on graphite [puppet] - 10https://gerrit.wikimedia.org/r/167817 [07:36:44] (03CR) 10Alexandros Kosiaris: [C: 032] torrus - remove subtree for eqiad-pmtpa link [puppet] - 10https://gerrit.wikimedia.org/r/167892 (owner: 10Dzahn) [07:36:57] (03CR) 10Alexandros Kosiaris: [C: 032] Backup /var/lib/carbon/whisper on graphite [puppet] - 10https://gerrit.wikimedia.org/r/167817 (owner: 10Alexandros Kosiaris) [07:37:22] (03PS3) 10Giuseppe Lavagetto: gerrit: move to module [puppet] - 10https://gerrit.wikimedia.org/r/167215 [07:39:47] (03CR) 10Giuseppe Lavagetto: "@Daniel: yes there is an issue with the puppet compiler and hiera I will take a look at today." 
[puppet] - 10https://gerrit.wikimedia.org/r/168006 (owner: 10Dzahn) [07:40:51] (03PS4) 10Giuseppe Lavagetto: gerrit: move to module [puppet] - 10https://gerrit.wikimedia.org/r/167215 [07:48:31] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [07:58:23] PROBLEM - DPKG on tungsten is CRITICAL: DPKG CRITICAL dpkg reports broken packages [08:00:40] RECOVERY - DPKG on tungsten is OK: All packages OK [08:00:40] (03PS2) 10Alexandros Kosiaris: backup ganglia data on the collector [puppet] - 10https://gerrit.wikimedia.org/r/167778 (owner: 10Filippo Giunchedi) [08:12:40] (03CR) 10Giuseppe Lavagetto: [C: 032] gerrit: move to module [puppet] - 10https://gerrit.wikimedia.org/r/167215 (owner: 10Giuseppe Lavagetto) [08:18:55] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/167310 (owner: 10Aaron Schulz) [08:23:02] (03CR) 10Filippo Giunchedi: [C: 04-1] "AFAICT x-timestamp is also used by object server during POST/DELETE/PUT to issue "409 conflict" if the timestamp on the object is more rec" [software] - 10https://gerrit.wikimedia.org/r/167828 (owner: 10Filippo Giunchedi) [08:24:55] (03CR) 10Filippo Giunchedi: [C: 031] Adding tools for banning/unbanning an ES node [puppet] - 10https://gerrit.wikimedia.org/r/164617 (owner: 10Chad) [08:26:06] (03CR) 10Filippo Giunchedi: [C: 031] Another es-tool function: restart a node the fast & easy way [puppet] - 10https://gerrit.wikimedia.org/r/164401 (owner: 10Chad) [08:27:39] (03CR) 10Filippo Giunchedi: [C: 031] sudo: create module, remove old files [puppet] - 10https://gerrit.wikimedia.org/r/167183 (owner: 10Giuseppe Lavagetto) [08:28:00] <_joe_> godog: I see admin::sudo is never used anywhere for groups [08:28:26] <_joe_> and btw, it's better to ditch the sudo::user and sudo::group defines completely [08:28:39] (03CR) 10Filippo Giunchedi: [C: 032] Introduce LLDP facts [puppet] - 10https://gerrit.wikimedia.org/r/167644 (owner: 
10Alexandros Kosiaris) [08:29:26] (03CR) 10Alexandros Kosiaris: [C: 04-1] nfs.pp - remove pmtpa (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/167885 (owner: 10Dzahn) [08:33:57] _joe_: indeed doesn't seem to be used for groups [08:34:27] <_joe_> godog: we should, maybe [08:34:47] <_joe_> I'll work on it after I fix the hiera private/common.yaml fail [08:35:49] _joe_: yep, I think the missing bit is the validation, the rest looks straightforward [08:39:13] (03CR) 10Ori.livneh: "Many thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/167817 (owner: 10Alexandros Kosiaris) [08:39:41] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: puppet fail [08:51:56] (03PS4) 10Alexandros Kosiaris: Introduce LLDP facts [puppet] - 10https://gerrit.wikimedia.org/r/167644 [08:51:58] (03PS4) 10Alexandros Kosiaris: Introduce rack/rackrow facts based on LLDP facts [puppet] - 10https://gerrit.wikimedia.org/r/167645 [08:53:51] (03CR) 10Alexandros Kosiaris: [C: 032] Introduce LLDP facts [puppet] - 10https://gerrit.wikimedia.org/r/167644 (owner: 10Alexandros Kosiaris) [08:54:20] and lldp facts go live :-) [08:54:49] can't wait to cross refs some stuff with racktables :-) [08:56:48] (03PS1) 10Gilles: Prerender thumbnails at upload time on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/168054 [08:58:13] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [08:58:21] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0] [09:01:46] (03CR) 10Alexandros Kosiaris: [C: 032] backup ganglia data on the collector [puppet] - 10https://gerrit.wikimedia.org/r/167778 (owner: 10Filippo Giunchedi) [09:02:19] akosiaris: sweet, thanks! 
[09:02:45] :-) [09:02:59] (03PS1) 10Giuseppe Lavagetto: hiera: use the same lookup rules for private and public hiera dirs [puppet] - 10https://gerrit.wikimedia.org/r/168055 [09:06:58] !log Jenkins: upgrading gearman-plugin from 0.0.7-1-g3811bb8 to 0.1.0-1-gfa5f083 . Ie bring us to latest version + 1 commit [09:07:06] Logged the message, Master [09:07:13] (03CR) 10Alexandros Kosiaris: [C: 032] DHCP - delete unused linux-host-entry files [puppet] - 10https://gerrit.wikimedia.org/r/167865 (owner: 10Dzahn) [09:07:16] and I am going to restart jenkins [09:07:59] !log Restarting Jenkins [09:08:03] Logged the message, Master [09:12:01] PROBLEM - DPKG on gallium is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:12:29] (03PS2) 10Alexandros Kosiaris: DHCP - remove Tampa Squid/LVS subnet [puppet] - 10https://gerrit.wikimedia.org/r/167862 (owner: 10Dzahn) [09:12:36] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] DHCP - remove Tampa Squid/LVS subnet [puppet] - 10https://gerrit.wikimedia.org/r/167862 (owner: 10Dzahn) [09:13:00] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [09:16:10] RECOVERY - DPKG on gallium is OK: All packages OK [09:22:24] (03PS2) 10Giuseppe Lavagetto: hiera: use the same lookup rules for private and public hiera dirs [puppet] - 10https://gerrit.wikimedia.org/r/168055 [09:38:56] (03PS3) 10Giuseppe Lavagetto: hiera: use the same lookup rules for private and public hiera dirs [puppet] - 10https://gerrit.wikimedia.org/r/168055 [09:39:07] <_joe_> godog: mind to take a look? ^^ [09:39:18] <_joe_> this solves the private/common fiasco [09:40:04] _joe_: yep I'll take a look [09:40:57] reminds me I am overdue with catching up with Hiera :( [09:41:50] !log Zuul/Jenkins in a deadlock [09:41:57] Logged the message, Master [09:45:46] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Please put manifests in autoload layout."
[puppet] - 10https://gerrit.wikimedia.org/r/167713 (owner: 10Andrew Bogott) [10:05:58] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, please add an example to the commit message so it is clear what the intent is" [puppet] - 10https://gerrit.wikimedia.org/r/168055 (owner: 10Giuseppe Lavagetto) [10:35:48] (03PS3) 10Giuseppe Lavagetto: sudo: create module, remove old files [puppet] - 10https://gerrit.wikimedia.org/r/167183 [10:37:07] (03PS4) 10Giuseppe Lavagetto: sudo: create module, remove old files [puppet] - 10https://gerrit.wikimedia.org/r/167183 [10:51:02] _joe_: hey [10:51:08] _joe_: https://wikitech.wikimedia.org/wiki/Puppet_coding#Puppet_Modules + https://wikitech.wikimedia.org/wiki/Puppet_Todo [10:51:16] they're a bit outdated now I think [10:51:38] but in any case, I don't think we should have a "sudo::appserver" class (for example) [10:51:57] <_joe_> me neither, honestly [10:52:02] the contents of that (that single File) fits better within the appserver module [10:52:04] <_joe_> I am deleting the whole thing [10:52:08] <_joe_> yes [10:53:07] <_joe_> so I'm going to: 1) modify admin::sudo so that it DTRT with groups too 2) convert all sudo_user and sudo_group declarations to use admin::sudo 3) move those files to where they belong [10:53:41] <_joe_> 1) is not problematic as admin::sudo has never been used with is_group :) [10:53:59] I think we've discussed this before [10:54:17] <_joe_> I missed that discussion I guess [10:54:34] and i think the result of that discussion was that "admin" should be kept for granting access (sudo too) to humans [10:54:43] <_joe_> ok [10:54:47] but dunno how it is these days, chase would know better I think [10:55:00] <_joe_> so we may remove the is_group clause [10:55:39] <_joe_> but hey, I'll check with chase as well on that, putting that on hold. 
[11:03:51] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: puppet fail [11:21:56] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [11:43:38] (03PS5) 10Giuseppe Lavagetto: sudo: create module, remove old files [puppet] - 10https://gerrit.wikimedia.org/r/167183 [11:48:17] (03PS1) 10Giuseppe Lavagetto: role::labs::instance: include sudo::labs_project [puppet] - 10https://gerrit.wikimedia.org/r/168062 [11:53:24] (03PS4) 10Giuseppe Lavagetto: hiera: use the same lookup rules for private and public hiera dirs [puppet] - 10https://gerrit.wikimedia.org/r/168055 [11:53:48] <_joe_> godog: ^^ [11:56:28] PROBLEM - puppet last run on mw1023 is CRITICAL: CRITICAL: Puppet has 1 failures [11:57:20] _joe_: I'm off to lunch but will take a look afterwards! [11:57:38] <_joe_> godog: yes, not in a hurry actually [11:57:57] <_joe_> I'll be off for a couple of hours after lunch [11:58:06] <_joe_> (I'm off as well) [12:00:27] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: puppet fail [12:11:01] RECOVERY - puppet last run on mw1023 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [12:18:04] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [12:22:20] greg-g: ori: I agree with the idea with not running non-voting jobs, or even not having such a thing in the first place. Imho they should only exist when CI is debugging something or when we're looking to add a new job that isn't passing just yet while we merge commits that make it passing (e.g. enable a new style rule) It would never exist for more than a day. [12:22:26] But I know hashar doesn't agree. [12:22:54] if it's something not passing any time soon, kill it and have whoever is interested work on it themselves and let us know when it's ready. 
[12:23:26] Krinkle: hi :) you probably want to follow up on the related task greg-g opened https://phabricator.wikimedia.org/T794 [12:25:37] hashar: I am currently not using phabricator in my workflow. And am not really tempted to right now. [12:25:47] Does it have the concept of components inside projects? [12:26:01] Or projects inside other projects? [12:26:56] do whatever you want [12:27:28] I am not going to attempt to follow the discussion via IRC though :D [12:27:41] hashar: I see you raised the same concern (see email) [12:27:50] We need a CI grouping of sorts [12:28:09] No, projects in Phab are a flat namespace. Subprojects are not (yet) supported in Phabricator: https://secure.phabricator.com/T3670 [12:28:13] If that doesn't happen soon I'm tempted to move whatever stuff ended up in phab/releng back to bugzilla. [12:28:18] Why are there things for it on phab already? [12:28:22] Afaik we didn't move CI yet. [12:28:29] I don't want it in two places, what's the point in that? [12:28:40] Or did we import CI? That's fine by me, as long as it's not in two places. [12:29:13] andre__: So what are we going to do with mediawiki core components in bugzilla/phab? [12:29:27] I think it's a definite no-go without that kind of separation, I can't do my job otherwise. [12:29:31] Krinkle, migrate them into projects. [12:29:41] Right, that sounds good. [12:29:44] prefixes or something [12:29:56] (and you even more so of course!) [12:30:21] Project-Component, like MediaWiki-Page_editing [12:30:48] That does mean it's harder to get a grip on wider scopes for you [12:30:50] Krinkle: bring the subject to Greg.
I am not sure why he filed the tasks in Phabricator instead of using Bugzilla which prompted my email this morning about how to handle CI inside Phabricator [12:31:01] andre__: unless there's a way to do MediaWiki-* [12:31:08] I guess he filed the tasks in Phabricator so we can track them in the Releng team workboard [12:31:10] which is fine to me [12:31:11] andre__: do you know of such a way? [12:31:17] andre__: I guess you'll want that for triaging and sorting [12:31:18] Krinkle, define "do". [12:31:23] not sure what the question is [12:31:31] (03CR) 10Giuseppe Lavagetto: [C: 032] hiera: use the same lookup rules for private and public hiera dirs [puppet] - 10https://gerrit.wikimedia.org/r/168055 (owner: 10Giuseppe Lavagetto) [12:31:44] andre__: If we map components to MediaWiki+component as phab product, how will you query all of MediaWiki product? [12:32:01] short of manually selecting them from the global project list which is gonna be huge [12:32:10] (03PS1) 10RobH: granting jkatz bastion and statistics-user access [puppet] - 10https://gerrit.wikimedia.org/r/168066 [12:32:19] Krinkle, by entering all subprojects in a query. And after doing that once you'll quickly learn to save the URL for that query. [12:32:23] I see a ticket can be in multiple projects, that's nice. [12:32:25] The other idea would be umbrella projects. [12:32:47] so we'd manually add tickets in MediaWiki-* projects to the MediaWiki umbrella project, too. [12:32:56] but there's no simple way to say "if task is in project MediaWiki-Page_editing also automatically add it to umbrella project MediaWiki" [12:33:03] ugh yeah [12:33:12] basically it's the same discussion about categories on Wikipedia or Commons [12:33:30] "Female writers" && "Male writers" vs. "Writers" with sub-categories [12:33:33] Pick your poison. [12:33:53] andre__: Have we settled on what we'll do for our phab instance? e.g.
whether to bother with umbrella tracking manually, or require interested users to do a query for all subprojects. [12:34:05] andre__: yeah, due to lack of recursion and intersection. [12:34:09] (03CR) 10RobH: [C: 032] "His three day wait started on 2014-10-15, so this is overdue." [puppet] - 10https://gerrit.wikimedia.org/r/168066 (owner: 10RobH) [12:34:12] Also Female humans and Male humans [12:34:16] and humans [12:34:24] intersect with Writers and Female humans [12:34:26] etc. [12:34:30] I guess that will depend on each team :D [12:34:41] Krinkle, "query for all subprojects" for the time being. [12:34:45] OK [12:35:17] for CI I guess we could get the major tasks / tracking tasks to be flagged with both Releng team project and contint project [12:35:35] andre__: So for Rel Eng, we'd have a CI product as well. But that does mean for workflow boards (like https://phabricator.wikimedia.org/tag/release-engineering/board/ ) we can't add them to the board of a "Parent" project, since it's not really in that project [12:35:38] and have the minor tasks / subtasks just flagged with "contint" so as to not pollute the "parent" project workboard [12:35:41] andre__: maybe we'd need umbrella for that [12:36:03] hashar: yeah we'll add rel-eng only to those that are needed to be in the larger view for releng [12:36:09] and keep the rest out [12:36:17] cool [12:36:29] hence my email wondering how to handle sub projects part of RelEng team responsibility [12:36:46] hashar: do you (As member of rel eng) use that workboard?
[12:36:49] the idea of the RelEng project is to be used for our weekly checkin and have a quick glance at what is being worked on [12:37:20] we wanted to talk about using the work board during our yesterday weekly checkin, but I had a connection issue so we postponed it to next week meeting [12:37:34] I guess we will play a bit with it and iteratively figure out something that works for us [12:38:21] Krinkle: if you have any ideas, please reply to the email so the rest of the team knows about them :] [12:39:33] (03Abandoned) 10Giuseppe Lavagetto: mediawiki: avoid having mysqld running on trusty. [puppet] - 10https://gerrit.wikimedia.org/r/167007 (owner: 10Giuseppe Lavagetto) [12:42:51] PROBLEM - puppet last run on ssl3003 is CRITICAL: CRITICAL: puppet fail [12:44:55] (03PS1) 10Giuseppe Lavagetto: wikitech: do not include sudoers::labs_project via ldap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/168067 [12:48:09] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 203, down: 1, dormant: 0, excluded: 1, unused: 0BRxe-4/3/2: down - BR [12:48:34] that's me, temporary [12:49:47] (03CR) 10Giuseppe Lavagetto: [C: 04-1] move 'noc' from misc to module (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/168006 (owner: 10Dzahn) [12:50:31] (03CR) 10Yuvipanda: [C: 032 V: 032] Add Apache2 License [software/shinkengen] - 10https://gerrit.wikimedia.org/r/168048 (owner: 10Yuvipanda) [12:50:39] (03CR) 10Yuvipanda: [C: 032 V: 032] Setup generator as a script by itself [software/shinkengen] - 10https://gerrit.wikimedia.org/r/168050 (owner: 10Yuvipanda) [12:51:15] (03CR) 10Yuvipanda: [C: 032 V: 032] Make sure setup.py picks up all packages [software/shinkengen] - 10https://gerrit.wikimedia.org/r/168052 (owner: 10Yuvipanda) [13:01:40] RECOVERY - puppet last run on ssl3003 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [13:34:22] RECOVERY - puppet last run on stat1001 is OK: OK: Puppet is 
currently enabled, last run 26 seconds ago with 0 failures [13:38:15] !log catch-up swiftrepl sync eqiad -> codfw for commons containers [13:38:22] Logged the message, Master [13:43:30] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: puppet fail [13:48:45] (03CR) 10Ottomata: Enable GELF for MRAppManager part 2 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/167044 (owner: 10Gage) [13:56:19] PROBLEM - puppet last run on stat1001 is CRITICAL: CRITICAL: puppet fail [14:01:19] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [14:11:29] some mail problems on dewwiki: Unable to execute sendmail -t -i - reported as bug 72358 [14:15:31] !bug 72358 [14:15:54] https://bugzilla.wikimedia.org/show_bug.cgi?id=72358 [14:18:55] what's the HHVM man to speak to at the moment ? [14:20:40] might have a HHVM bug on Commons, at https://commons.wikimedia.org/w/index.php?title=Commons%3AAdministrators%27_noticeboard%2FUser_problems&diff=137580932&oldid=137580888 [14:21:07] NotASpy: none around, the best is to file a bug with keyword 'hhvm' [14:21:29] i've prodded someone on ops but they may be afk [14:21:34] cuz they've been idle for a bit. [14:21:44] will do. Bugzilla or Phab ? [14:21:47] NotASpy: and that diff does not provide that much information as to what the issue is :)D [14:21:50] NotASpy: bugzilla :] [14:21:55] indeed bz [14:22:07] NotASpy: why do we think it's hhvm-related? [14:22:44] the last edit is tagged HHVM and we can't see anything that would be causing the page to display incorrectly (assuming it's displaying incorrectly for everybody) [14:23:24] NotASpy: so you will want to explain what is the bug , maybe attaching a screenshot can help [14:23:32] ...oh crap. yes it is. confirmed [14:23:36] we already have a bug about this [14:23:44] https://bugzilla.wikimedia.org/72357 [14:25:00] that's it. Perfect.
The issue has just resolved itself now someone else has edited the page, but I'll include the diff anyway, just in case you guys can make use of it. [14:25:14] you might want to flag that bug with the keyword hhvm :) [14:25:29] hashar: i have [14:25:32] i am off be back later [14:25:56] NotASpy: note that this is related to the last parse, not the last edit, so the tag won't always be right [14:26:07] it only happened to be right this time because nothing else triggered a parse in the meantime [14:26:07] <_joe_> NotASpy: why do you all think this is hhvm related? [14:26:07] <_joe_> I just see a problematic edit that happened to be done via hhvm [14:26:12] _joe_: it is [14:26:18] right, that all makes sense. [14:26:32] _joe_: look at https://commons.wikimedia.org/w/index.php?title=Commons%3AAdministrators%27_noticeboard%2FUser_problems&diff=137580932&oldid=137580888 with hhvm off and with it on [14:26:33] <_joe_> jackmcbarn: can you be a little more explicit? [14:26:56] <_joe_> jackmcbarn: ok the "wikidata problem" [14:27:05] what's the "wikidata problem"? [14:28:12] _joe_: and is this problem going to affect all the wikipedias tomorrow if we don't do something by then? [14:28:55] <_joe_> jackmcbarn: no the problem I was referring to was one case where hhvm could handle large pages, and zend doesn't [14:29:44] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, please attach a puppet compiler run before merging!" [puppet] - 10https://gerrit.wikimedia.org/r/167183 (owner: 10Giuseppe Lavagetto) [14:30:23] <_joe_> jackmcbarn: the cache is distinct between hhvm and zend, so maybe it's more a fact of one object being cached badly [14:30:35] _joe_: on the diff link i gave you, i don't think it's caching [14:30:47] <_joe_> but this seems like a purely sw-related bug, so I'd leave that to devs :) [14:31:23] <_joe_> jackmcbarn: probably not, quite weird indeed [14:31:23] <_joe_> have you tagged the bug "hhvm"?
[14:31:28] yeah [14:32:24] <_joe_> ok, then devs will see it probably [14:32:26] <_joe_> lemme ping them [14:34:20] RECOVERY - puppet last run on stat1001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [14:49:19] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 205, down: 0, dormant: 0, excluded: 0, unused: 0 [14:54:34] _joe_: did you get puppet fixed on deployment-pdf01 ? [14:55:09] bd808|BUFFER: said we should have specified a reason string when we did puppet agent --disable last night [14:56:19] PROBLEM - puppet last run on stat1001 is CRITICAL: CRITICAL: puppet fail [14:59:09] manybubbles, ^demon|busy, MARKTRACEUR: So who wants to SWAT? [14:59:27] I could, but I did yesterday. Ugh so much work. :P [14:59:39] anomie: hmm - well today I'm doing a meeting for the next half hour so if it is me I'd do 30 minutes late [14:59:54] MARKTRACEUR: yesterday didn't have a swat, right? I looked and it was empty [15:00:04] manybubbles, anomie, ^d, marktraceur, cscott: Dear anthropoid, the time has come. Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141022T1500). [15:00:05] Yeah, that's the point of the ":P" [15:00:11] ah, well done [15:00:26] meeting? [15:00:57] anomie: Are you unable to? [15:01:12] MARKTRACEUR: I could, but I'd rather not if someone else will [15:01:18] manybubbles: there was a sort of swat at the end of the train deploy yesterday. [15:01:26] I'm happy to [15:01:38] I guess cscott has a patch, and obviously he's here [15:01:48] yes indeedy [15:01:50] gi11es also has a patch! Prerendering thumbs on Commons! [15:01:53] YES [15:01:53] So excite. [15:02:02] i could actually deploy my own patch, i think i've got all the necessary perms. [15:02:10] but i'm happy to be lazy. 
[15:02:23] cscott: You just relax and be ready to test it [15:02:31] i've got my clicking finger ready [15:02:34] We'll do cscott first 'cause I'm less scared of his patch [15:02:49] * cscott is so cuddly and nonthreatening [15:03:23] (03CR) 10MarkTraceur: [C: 032] Re-enable PediaPress POD in production. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167866 (https://bugzilla.wikimedia.org/71675) (owner: 10Cscott) [15:03:30] Two config patches does not a long SWAT make [15:03:40] (03Merged) 10jenkins-bot: Re-enable PediaPress POD in production. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/167866 (https://bugzilla.wikimedia.org/71675) (owner: 10Cscott) [15:05:28] !log marktraceur Synchronized wmf-config/CommonSettings.php: [SWAT] Re-enable PediaPress POD in production. (duration: 00m 05s) [15:05:33] Logged the message, Master [15:05:37] cscott: Testy test [15:05:37] ready to test? [15:05:39] !log marktraceur Synchronized wmf-config/CommonSettings-labs.php: [SWAT] Re-enable PediaPress POD in production. (duration: 00m 05s) [15:05:44] Logged the message, Master [15:06:08] gi11es: OK, config patches are fast, you're up [15:06:15] <_joe_> cscott: u here? [15:06:17] OK [15:06:21] _joe_: might be [15:06:30] <_joe_> do you want to work on deployment-pdf01? [15:06:36] (03CR) 10MarkTraceur: [C: 032] Prerender thumbnails at upload time on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/168054 (owner: 10Gilles) [15:06:44] (03Merged) 10jenkins-bot: Prerender thumbnails at upload time on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/168054 (owner: 10Gilles) [15:06:46] <_joe_> I have a flaky connection but I can do something [15:07:26] * MARKTRACEUR waits for confirmation [15:07:37] _joe_: hang on, i'm testing a mediawiki-config change [15:07:46] <_joe_> cscott: uh sorry [15:07:49] <_joe_> just read [15:07:59] MARKTRACEUR: we might have to revert that patch :( [15:08:04] That's OK [15:08:14] cscott: Urgently? 
[15:08:19] still testing, maybe there's something cached wrong [15:08:22] no, not urgently [15:08:30] OK, I'll push gi11es's patch first [15:09:03] !log marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] Pre-render thumbnails on upload on Commons (duration: 00m 05s) [15:09:04] gi11es: Test! [15:09:10] Logged the message, Master [15:09:29] hm, pediapress still works on beta. [15:09:43] cscott: I'm going to make a revert patch just in case [15:09:46] (03PS1) 10MarkTraceur: Revert "Re-enable PediaPress POD in production." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/168082 [15:10:11] we're getting an error from the pediapress side. i don't know yet whether this is a misconfiguration on our side or theirs. [15:10:27] cscott: We have nothing but time, no worries [15:10:42] MARKTRACEUR: TESTING [15:11:52] :) [15:12:12] so where can i find logs from enwiki? it would be nice to see what the collection extension is saying right now. [15:12:17] Uhhhh [15:12:22] fluorine IIRC [15:12:47] they don't go to logstash or anything clever like that, do they? [15:12:58] Clever? Pff. [15:13:11] I mean, maybe. [15:13:16] I don't really know [15:13:28] i need a log wizard [15:13:42] a subset of the logs from flourine end up in logstash [15:13:47] MARKTRACEUR: IT WORKS [15:13:54] YAAAAY [15:14:19] In that case I'm going to run and get coffee, but cscott let me know what happens, I'm still at the ready to revert [15:14:21] IS TODAY CAPSLOCK DAY ?! [15:14:34] bd808: OF COURSE IT IS!!! [15:14:46] MARKTRACEUR: yeah, i'm going to ping the pediapress people and see if they can maybe debug some on their side before we decide whether to revert. [15:15:05] cscott: These log events are added to logstash in prod -- https://github.com/wikimedia/operations-puppet/blob/production/files/logstash/filter-mw-via-udp2log.conf#L44-L83 [15:15:21] OPS, IF SOMETHING BLOWS UP WITH THUMBNAILS LET ME KNOW. 
NEW UPLOADS TO COMMONS NOW HAVE A SET OF THUMBNAILS RENDERED AT UPLOAD TIME [15:15:29] FOR SCIENCE [15:15:42] * bd808 has remapped caps-lock to esc so won't play this game [15:23:13] MARKTRACEUR: CHRISTOPH FROM PEDIAPRESS IS INVESTIGATING NOW. [15:23:26] GOOD PLAN. STANDING BY, CAPTAIN. [15:23:49] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: Puppet has 1 failures [15:23:57] <_joe_> !log rolling restart of hhvm appservers, to alleviate memory issues [15:24:02] Logged the message, Master [15:24:33] _joe_: just checking in about app server reinstall...still on hold? [15:24:57] <_joe_> ot yes [15:26:08] klepper from pediapress asked me what IP the requests from the collection extension would be coming from. [15:26:22] do we have an IP range for the WMF backends? [15:26:49] * MARKTRACEUR doesn't know it off the top of his head [15:27:19] (03CR) 10Ottomata: "BTW, I know that Hiera is the way to go, but I must say...this global variable conditional assignment thing in defaults.pp works great in " [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [15:27:57] Hi - I am Christoph from PediaPress. We are trying to setup a new render server. [15:28:21] And I would like to know from which IP the requests to pediapress.com are sent [15:28:29] (03CR) 10Ottomata: "Example of the way I did this for Hadoop:" [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [15:29:31] kepper: according to https://wikitech.wikimedia.org/wiki/Network_design they should broadly be from 208.80.154.x or 10.64.x i believe. 
[15:29:50] PROBLEM - HHVM rendering on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:29:59] PROBLEM - Apache HTTP on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:30:00] wait, those are our private addresses [15:30:06] (03PS2) 10Ottomata: Require 2 ACKs from kafka brokers for text caches [puppet] - 10https://gerrit.wikimedia.org/r/167551 (https://bugzilla.wikimedia.org/69667) (owner: 10QChris) [15:30:27] (03CR) 10QChris: Require 2 ACKs from kafka brokers for text caches [puppet] - 10https://gerrit.wikimedia.org/r/167551 (https://bugzilla.wikimedia.org/69667) (owner: 10QChris) [15:30:42] kepper: ah, https://wikitech.wikimedia.org/wiki/IP_addresses seems to have what you want [15:30:51] (03CR) 10Ottomata: [C: 032 V: 032] Require 2 ACKs from kafka brokers for text caches [puppet] - 10https://gerrit.wikimedia.org/r/167551 (https://bugzilla.wikimedia.org/69667) (owner: 10QChris) [15:31:00] thank you cscott [15:31:23] !log setting request.required.acks to 2 for mobile and text varnishkafka's (mobile was set to 2 yesterday) [15:31:28] Logged the message, Master [15:31:30] the requests from beta would be coming from the "wikimedia labs" ips, 208.80.155.128/25 -- those are the ones which are definitely working right now. [15:32:02] RECOVERY - HHVM rendering on mw1020 is OK: HTTP OK: HTTP/1.1 200 OK - 68191 bytes in 0.188 second response time [15:32:03] Zomg I made coffee [15:32:13] RECOVERY - Apache HTTP on mw1020 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.049 second response time [15:32:15] cmjohnson: checking in about new elasticsearch node racking timeline :D :D [15:32:32] kepper: the ones from 198.x.x.x and 208.x.x.x and 91.x.x.x are the ones which seem to not be working. [15:32:45] i have questions...i was trying to get a hold of nik [15:32:51] manybubbles: yt? [15:32:58] ottomata: yes [15:33:00] cscott: we have not blocked any of those, i think [15:33:16] (03CR) 10Manybubbles: [C: 031] "That should do it." 
[puppet] - 10https://gerrit.wikimedia.org/r/164401 (owner: 10Chad) [15:33:16] cmjohnson: has questions for manybubbles. here we are! [15:33:17] :) [15:33:21] is it possible to remove some of the old search boxes? [15:33:34] kepper: i take it that you are not seeing the requests arrive at https://pediapress.com/wmfup/ ? [15:33:36] cmjohnson: lsearchd boxes? not at this time, unfortunately [15:33:37] the html layout also looks a little different on Beta - are you sure that you deployed the same configuration? [15:33:41] in a few months I think [15:33:45] :( [15:34:10] i think beta is running a different default skin [15:34:26] yes, maybe I looked at the wrong place, but I did not see any requests [15:34:36] but you see the requests from beta? [15:34:45] will check again [15:34:45] that would be a good double-check that you were looking at the right place [15:34:48] yes [15:34:53] ok, thanks...ottoata I will hopefully have done by tomorrow...I have several things in the air right now [15:35:13] cmjohnson: thanks! [15:35:31] kepper: the production wiki configuration is at https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/CommonSettings.php#L1591 [15:36:02] kepper: and it is modified in beta by https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/CommonSettings-labs.php#L166 [15:37:27] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [15:38:38] MARKTRACEUR, _joe_: I take it fenari is no longer running in tampa? perhaps https://wikitech.wikimedia.org/wiki/Fenari should be updated. [15:39:08] I don't think fenari is running at all... 
[15:39:12] I could be wrong [15:39:21] oh, right, you said flourine or something for the logs [15:39:36] so https://wikitech.wikimedia.org/wiki/Apaches should probably be updated, since it has fenari all over [15:40:20] Probly [15:40:24] I'm not too worried about it [15:40:37] and bd808 -- could I add 'Collection' to that list of logs ending up in logstash? [15:41:20] (03PS1) 10Cmjohnson: Temporarily adding americium to production to test pxe [dns] - 10https://gerrit.wikimedia.org/r/168086 [15:41:26] (03PS9) 10Filippo Giunchedi: Another es-tool function: restart a node the fast & easy way [puppet] - 10https://gerrit.wikimedia.org/r/164401 (owner: 10Chad) [15:41:32] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Another es-tool function: restart a node the fast & easy way [puppet] - 10https://gerrit.wikimedia.org/r/164401 (owner: 10Chad) [15:41:43] cscott: Yes. Let me peek a log file on flourine to see what the format is [15:42:15] (03PS10) 10Chad: Adding tools for banning/unbanning an ES node [puppet] - 10https://gerrit.wikimedia.org/r/164617 [15:42:29] i think just adding "collection", to that list will do it. [15:42:38] cscott: I don't see any logs named "collection" in /a/mw-log on flourine... [15:43:40] $ fgrep '[collection]' /var/log/mediawiki/debug-mediawiki-dev.log [15:43:40] on my localhost gives: [15:43:40] [collection] Server returned error: system overloaded. please try again later. [15:43:41] [collection] Server returned error: system overloaded. please try again later. [15:43:50] it doesn't log very much, though. [15:45:02] cscott: So we'd need to start by adding the 'collection' channel to https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L4120 [15:45:12] ah [15:45:27] Once MW is sending that to the udp2log relay then we can pass it on to logstash [15:46:11] (03PS1) 10Giuseppe Lavagetto: ocg: stop erratic puppet failures. [puppet] - 10https://gerrit.wikimedia.org/r/168088 [15:46:19] cscott: Status? 
[15:46:47] (03CR) 10Chad: "Last one, but arguably one of the most useful ;-)" [puppet] - 10https://gerrit.wikimedia.org/r/164617 (owner: 10Chad) [15:47:03] (03PS2) 10Giuseppe Lavagetto: ocg: stop erratic puppet failures. [puppet] - 10https://gerrit.wikimedia.org/r/168088 [15:47:11] (03PS1) 10Cscott: Add logs from Extension:Collection to logstash. [puppet] - 10https://gerrit.wikimedia.org/r/168089 [15:47:20] MARKTRACEUR: kepper is still working on it, and i'm working with bd808 to try to get more logs. [15:47:27] OK [15:47:32] Will we be set in 13 minutes? [15:47:39] bd808: do you think there are any collection logs on flourine, or are they being thrown away? [15:47:40] I have a meeting, so I can't shepherd it any further [15:47:59] MARKTRACEUR: I don't think there's any harm if the config stays this way until this afternoon's SWAT session. [15:48:00] There's no deploy until 18:00 UTC, but still [15:48:03] OK then [15:48:13] I will close some SSH sessions then [15:48:17] everything works, you just get an error if you try to buy a book from pediapress. [15:48:17] cscott: Since it's not in wgDebugLogGroups it goes to /dev/null [15:48:37] bd808: ok, so maybe i need MARKTRACEUR to swat me a change to wgDebugLogGroups? [15:48:44] yup [15:48:55] MARKTRACEUR: NO REST FOR THE WICKED [15:49:05] cscott: BUT REST IS THE FUTURE OF ALL APIS [15:49:20] bd808: https://gerrit.wikimedia.org/r/168089 is the patch to the logstash config [15:49:23] (03CR) 10Giuseppe Lavagetto: [C: 032] ocg: stop erratic puppet failures. [puppet] - 10https://gerrit.wikimedia.org/r/168088 (owner: 10Giuseppe Lavagetto) [15:49:38] MARKTRACEUR: tHAT IS WHY ONLY GOOD PEOPLE GET REST [15:49:45] cmjohnson: thank you! [15:49:51] I hate this game [15:49:57] OK, looking, merging, deploying [15:50:00] Blink and you'll miss it [15:50:00] i just lost the game. 
[15:50:32] Oh, wait, lol, I can't merge puppet patches [15:50:35] (03CR) 10BryanDavis: [C: 031] "LGTM but until collection is added to $wgDebugLogGroups it won't do anything." [puppet] - 10https://gerrit.wikimedia.org/r/168089 (owner: 10Cscott) [15:51:09] cscott: Literally if I can't do this in 9 minutes you have to find someone else or wait [15:51:10] cscott: I have checked (and double checked) our setup. I can see requests from Beta (208.80.155.255 I assume), but not from en.Wikipedia [15:51:42] (03CR) 10Cmjohnson: [C: 032] Temporarily adding americium to production to test pxe [dns] - 10https://gerrit.wikimedia.org/r/168086 (owner: 10Cmjohnson) [15:52:43] (03PS1) 10Cscott: Send Extension:Collection logs to logstash. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/168091 (https://bugzilla.wikimedia.org/71675) [15:52:58] bd808: could you double check that ^^ [15:53:55] ops folk: do we have any outgoing firewall rules that would prevent requests getting from the apache to pediapress? i assume not, because we *used* to send requests to pediapress and it worked. but maybe the ips have shifted slightly? [15:54:04] (03CR) 10MarkTraceur: [C: 04-1] "Might be needed, but we're trying to figure out the issues on PediaPress's side." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/168082 (owner: 10MarkTraceur) [15:54:41] cscott: Can I deploy this or do I need to wait for the puppet patch to get merged and deployed? [15:54:51] (03CR) 10BryanDavis: [C: 031] Send Extension:Collection logs to logstash. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/168091 (https://bugzilla.wikimedia.org/71675) (owner: 10Cscott) [15:55:00] MARKTRACEUR: i think you can deploy this, and then logstash will ignore it (until the puppet patch is merged) [15:55:04] OK [15:55:06] cscott: yeah, apaches can't talk to the Internet at all [15:55:07] but let's get confirmation from bd808 [15:55:24] Well...that would probably explain the issue then [15:55:25] MARKTRACEUR, cscott: MW config first then puppet [15:55:25] akosiaris: fyi, I'm going to merge that old base::firewall change I let get stale [15:55:30] OK [15:55:33] MARK: ORLY. so are those firewall rules buried in a puppet config somewhere? because the pediapress backend *used* to work. [15:55:33] mutante: +1ed it and reminded me this week [15:55:40] https://gerrit.wikimedia.org/r/#/c/160802 [15:55:42] should be fine i think [15:55:44] cscott: it's not a firewall, they are on a private network and have private IPs [15:55:45] (03CR) 10MarkTraceur: [C: 032] Send Extension:Collection logs to logstash. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/168091 (https://bugzilla.wikimedia.org/71675) (owner: 10Cscott) [15:55:47] they can't talk to the internet [15:55:51] (there's no NAT anywhere) [15:55:53] (03Merged) 10jenkins-bot: Send Extension:Collection logs to logstash. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/168091 (https://bugzilla.wikimedia.org/71675) (owner: 10Cscott) [15:56:01] MARK: but they used to. so how did that work? [15:56:06] maybe webproxy was used to do that [15:56:20] there is indeed a squid HTTP proxy that can be used for some things [15:56:34] there is a reverse proxy somewhere isn't there? That's how Flickr import to commons works right? [15:56:44] there are two *forward* proxies [15:56:44] yes [15:56:54] there is one that we use for system updates etc.
[15:57:01] OK, going [15:57:01] and there's another one for flickr and other services like that [15:57:03] !log marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] Send collection logs to logstash. (duration: 00m 05s) [15:57:06] grep for url-downloader [15:57:06] And SWAT is OVER. [15:57:09] Logged the message, Master [15:57:10] Stop asking me for stuff [15:57:14] * MARKTRACEUR grumps away [15:57:20] paravoid: grep in which repo? [15:57:21] those caps are really annoying and [15:57:24] (03PS3) 10Ottomata: Add nickel to $MONITORING_HOSTS network, rename ferm::rule icinga-all to monitoring-all [puppet] - 10https://gerrit.wikimedia.org/r/160802 [15:57:25] mediawiki-config [15:57:25] paravoid: you're of course right, forward to go out; reverse to come in [15:57:55] cscott: Oh, also, you should test that change as soon as you can [15:58:28] collections log -- "2014-10-22 15:58:04 mw1036 frwiki: Request to http://ocg.svc.eqiad.wmnet:8000 resulted in error" [15:58:28] (03CR) 10Ottomata: [C: 032] Add nickel to $MONITORING_HOSTS network, rename ferm::rule icinga-all to monitoring-all [puppet] - 10https://gerrit.wikimedia.org/r/160802 (owner: 10Ottomata) [15:58:31] ottomata: yeah , go ahead [15:58:37] paravoid: there's 'wgCopyUploadProxy' => array( [15:58:37] 'default' => 'url-downloader.wikimedia.org:8080', [15:58:37] ), [15:58:48] _joe_: merging your ocg change, sok? [15:59:02] bd808: if you're seeing that, can i assume that MARKTRACEUR's config change worked? [15:59:09] cscott: that [15:59:14] (doing it!) [15:59:14] not sure how pediapress used to work though [15:59:21] cscott: yes. logs coming in on fluorine [15:59:25] (03PS1) 10Cmjohnson: Temporarily adding ameriium to production to test pxe boot [puppet] - 10https://gerrit.wikimedia.org/r/168092 [15:59:28] i love code archeology.
[15:59:42] MARKTRACEUR: your SWATting is done [16:00:00] <_joe_> ottomata: yea sorry [16:00:18] <_joe_> ottomata: I was focused in correcting labs [16:00:22] np [16:01:01] <_joe_> cscott: I should have fixed beta ocg [16:01:21] paravoid: hm, from https://www.mediawiki.org/wiki/Offline_content_generator/print-on-demand_service it looks like it was the old pdf servers in tampa which talked to pediapress, not the apaches directly. [16:01:30] those had public IPs [16:01:32] the apaches only talked to the tampa mw-serve [16:01:46] so that would work indeed [16:01:47] which i guess they could get to via vlan? [16:02:01] internal routing works, yes [16:02:03] hm. [16:02:16] (03CR) 10Cmjohnson: [C: 032] Temporarily adding ameriium to production to test pxe boot [puppet] - 10https://gerrit.wikimedia.org/r/168092 (owner: 10Cmjohnson) [16:02:33] i could make ocg proxy over to pediapress. but i'd rather not. [16:02:52] how easy/hard is it to use the existing url-downloader proxy? [16:03:01] should be easy [16:03:21] (03CR) 10Ottomata: [C: 04-2] "-2ing until the new cassandra module is merged and added as a submodule in this commit, and until the desired version of Cassandra .deb is" [puppet] - 10https://gerrit.wikimedia.org/r/167700 (owner: 10Ottomata) [16:04:37] (03CR) 10Ottomata: [C: 032] Allow local server-status requests on http [puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/164583 (https://bugzilla.wikimedia.org/71606) (owner: 10QChris) [16:05:23] paravoid: does url downloader proxy any/everything? or do i need to name the pediapress server specifically? the url_downloader puppet module seems very general. [16:05:40] iirc it just proxies anything, but let me check [16:06:34] yeah it does, as long as it's 80/443 [16:07:08] (03PS3) 10Ottomata: Declare namenode directory only once [puppet] - 10https://gerrit.wikimedia.org/r/164762 (owner: 10QChris) [16:07:11] akosiaris: want to hear my cent on tor exit with my hat as a steward? 
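The forward-proxy route discussed above (url-downloader on port 8080, which reportedly proxies anything "as long as it's 80/443") can be sketched in Python. This is only an illustration under assumptions: the helper names `target_port`, `is_proxyable` and `make_opener` are hypothetical, the port rule is taken from the chat and not from the proxy's actual configuration, and nothing here is the Collection extension's real code.

```python
from urllib.parse import urlsplit
from urllib.request import ProxyHandler, build_opener

# Proxy address as quoted in the channel; adjust for any other environment.
PROXY = "http://url-downloader.wikimedia.org:8080"
# Assumption taken from the chat: only ports 80 and 443 are proxied.
ALLOWED_PORTS = {80, 443}


def target_port(url):
    """Return the explicit port of `url`, or the scheme default (None if unknown)."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return {"http": 80, "https": 443}.get(parts.scheme)


def is_proxyable(url):
    """True if the proxy is expected to accept a request for `url`."""
    return target_port(url) in ALLOWED_PORTS


def make_opener(proxy=PROXY):
    """Build an opener that routes both http and https requests via `proxy`."""
    return build_opener(ProxyHandler({"http": proxy, "https": proxy}))
```

Calling `make_opener().open(url, data=b"...")` would send a POST through the proxy, since urllib issues a POST whenever `data` is supplied; no request is made until `open` is called.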
[16:07:24] paravoid: POSTs should work, right? [16:07:56] yes [16:08:02] you can just try it from an appserver or even tin [16:08:12] yeah, i was about to try wget from tin [16:08:13] export http_proxy=http://url-downloader.wikimedia.org:8080 [16:08:18] then wget/curl [16:08:26] (03CR) 10Ottomata: [C: 032] Declare namenode directory only once [puppet] - 10https://gerrit.wikimedia.org/r/164762 (owner: 10QChris) [16:08:31] (03PS3) 10Ottomata: Declare datanode's mount directories only once [puppet] - 10https://gerrit.wikimedia.org/r/164763 (owner: 10QChris) [16:09:27] (03CR) 10Ottomata: [C: 032] Declare datanode's mount directories only once [puppet] - 10https://gerrit.wikimedia.org/r/164763 (owner: 10QChris) [16:09:58] matanya: obviously [16:10:37] kepper: is there a good test command to use? https://pediapress.com/wmfup/ gives 400 Bad Request and "no command given" in text [16:10:43] here or in pm akosiaris [16:11:05] (03PS1) 10QChris: Have wikimetrics accept server-status requests on http [puppet] - 10https://gerrit.wikimedia.org/r/168096 [16:11:06] well that's already a good answer [16:11:06] whatever you prefer [16:11:12] https_proxy=http://url-downloader.wikimedia.org:8080 wget -v https://pediapress.com/wmfup/ [16:11:18] gives the same thing, so i guess that works. [16:11:27] although it might be best not to bother the rest of the channel [16:11:50] now i assume that PHP has some elegant way to set the desired proxy for an web request [16:11:54] * cscott crosses his fingers [16:12:38] we seem to be using Http::post [16:12:52] (03CR) 10Andrew Bogott: "> Please put manifests in autoload layout." 
[puppet] - 10https://gerrit.wikimedia.org/r/167713 (owner: 10Andrew Bogott) [16:13:07] RECOVERY - RAID on ms-be2007 is OK: OK: optimal, 13 logical, 13 physical [16:14:15] (03PS1) 10Cmjohnson: Fixing americium ip's wrong row [dns] - 10https://gerrit.wikimedia.org/r/168098 [16:15:18] (03CR) 10Cmjohnson: [C: 032] Fixing americium ip's wrong row [dns] - 10https://gerrit.wikimedia.org/r/168098 (owner: 10Cmjohnson) [16:16:41] paravoid: any reason why we can't use url-downloader from labs as well? i'd like to keep the labs and production configurations identical if possible. [16:17:06] I don't think you'll be able to [16:17:10] labs can't connect to prod [16:20:05] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [16:20:35] paravoid: ok, i wasn't sure where in the network url-downloader was. but that makes sense. [16:21:25] MARKTRACEUR: i updated the status in https://bugzilla.wikimedia.org/show_bug.cgi?id=71675 [16:21:36] OK [16:22:21] cscott: It's up to you; I probably won't be SWATting this afternoon [16:28:36] There are about five reports out of -commons that HHVM is breaking CSS today [16:28:39] Just FYI [16:32:31] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [16:33:07] (03CR) 10Hashar: "it is probably fine, just have to verify whether APC is configured with some memory to actually be used as local cache :D It is long sin" [puppet] - 10https://gerrit.wikimedia.org/r/119102 (owner: 10Nemo bis) [16:33:56] MARKTRACEUR: i'M THINKING ABOUT THE PROPER CAPITALIZATION OF swat [16:35:09] HEH. [16:35:09] MARKTRACEUR: is there a bug? 
a message to the ether of -operations might not stick [16:35:32] greg-g: I haven't filed anything yet, it seems like there are a few linked issues that may get fixed at the same time [16:35:39] * greg-g nods [16:35:48] just making sure we don't lose it [16:36:13] (03PS3) 10Andrew Bogott: Move openstack files and manifests into a module [puppet] - 10https://gerrit.wikimedia.org/r/167713 [16:36:24] It's in my brain for "fix today" or at least "investigate today", so no worries [16:37:39] greg-g: the HHVM CSS issue has https://bugzilla.wikimedia.org/72357 and https://bugzilla.wikimedia.org/72205 attached [16:38:03] NotASpy: awesome, thanks, I'll look later, I'm in the middle of 1.5 hours of 1:1s :) [16:38:09] does the issue relate to tidy? [16:38:24] https://bugzilla.wikimedia.org/show_bug.cgi?id=72345 [16:38:24] it's mentioned aude in the bugs [16:38:30] ok [16:38:54] and can be fixed by purging the page with HHVM disabled by way of a temporary fix [16:39:13] morebots: yt? [16:39:13] I am a logbot running on tools-exec-14. [16:39:13] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [16:39:13] To log a message, type !log . [16:39:28] similarly, can be made to manifest itself by purging any page with HHVM enabled. [16:49:43] (03PS1) 10Yuvipanda: shinken: Autogenerate config from shinkengen [puppet] - 10https://gerrit.wikimedia.org/r/168102 [16:51:28] (03PS1) 10QChris: Add link to pagecounts-all-site dataset [puppet] - 10https://gerrit.wikimedia.org/r/168104 [16:59:03] !log reinstalling OS on virt1006 [16:59:10] Logged the message, Master [17:00:03] robh: sorry, tell me again how to specify precise for a pxe install? [17:00:56] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: puppet fail [17:06:25] (03PS1) 10Andrew Bogott: Specify precise for virt servers [puppet] - 10https://gerrit.wikimedia.org/r/168106 [17:07:40] mutante or cmjohnson, could I get a review of ^? Haven't done this before.
[17:09:05] (03CR) 10Andrew Bogott: shinken: Autogenerate config from shinkengen (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/168102 (owner: 10Yuvipanda) [17:09:27] (03CR) 10Cmjohnson: [C: 031] Specify precise for virt servers [puppet] - 10https://gerrit.wikimedia.org/r/168106 (owner: 10Andrew Bogott) [17:12:44] cmjohnson: thanks [17:18:33] (03CR) 10Andrew Bogott: [C: 032] Specify precise for virt servers [puppet] - 10https://gerrit.wikimedia.org/r/168106 (owner: 10Andrew Bogott) [17:19:13] cmjohnson: and then I need to force a puppet run on carbon for those settings to take effect, right? Or is there another step? [17:20:39] after merge? yeah just force the puppet run [17:20:46] RECOVERY - puppet last run on amssq57 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [17:21:22] (03PS1) 10Rush: phab match bz 10MB upload limit [puppet] - 10https://gerrit.wikimedia.org/r/168109 [17:23:04] andrewbogott: if you look in the install module dhcpd files, linux.host.entries.S1-115200 [17:23:12] you'll see entries with precise in their stanzas [17:23:16] (just grep for precise) [17:23:33] robh: yep, cmjohnson helped me sort that already, thanks. [17:23:37] cool [17:23:40] sorry for multiple pings [17:27:38] (03PS1) 10Cmjohnson: Revert "Temporarily adding ameriium to production to test pxe boot" [puppet] - 10https://gerrit.wikimedia.org/r/168110 [17:27:50] (03PS1) 10Cmjohnson: Revert "Fixing americium ip's wrong row" [dns] - 10https://gerrit.wikimedia.org/r/168111 [17:28:12] (03PS1) 10Cmjohnson: Revert "Temporarily adding americium to production to test pxe" [dns] - 10https://gerrit.wikimedia.org/r/168112 [17:28:14] (03CR) 10jenkins-bot: [V: 04-1] Revert "Temporarily adding americium to production to test pxe" [dns] - 10https://gerrit.wikimedia.org/r/168112 (owner: 10Cmjohnson) [17:29:18] paravoid: I guess you are yellow in the etherpad? 
[17:29:19] :) [17:29:26] (SoS) [17:29:32] I am :) [17:30:21] this is awesooommmme thanks for doing notes [17:30:23] oh man so easy [17:30:40] wanna give the update? [17:30:43] yup [17:30:45] uhh [17:30:48] perfect [17:31:01] feel free to complete my notes if I forgot anything [17:31:06] (03CR) 10Cmjohnson: [C: 032] Revert "Temporarily adding ameriium to production to test pxe boot" [puppet] - 10https://gerrit.wikimedia.org/r/168110 (owner: 10Cmjohnson) [17:31:22] YuviPanda: no account in Phab yet? [17:31:44] greg-g: in the prod one? I guess not [17:31:47] YuviPanda: see: https://phabricator.wikimedia.org/T789 [17:31:59] (03CR) 10Cmjohnson: [C: 032] Revert "Fixing americium ip's wrong row" [dns] - 10https://gerrit.wikimedia.org/r/168111 (owner: 10Cmjohnson) [17:32:01] (03CR) 10Cscott: "https://gerrit.wikimedia.org/r/168091 was SWATted, so this patch should be good to go." [puppet] - 10https://gerrit.wikimedia.org/r/168089 (owner: 10Cscott) [17:32:08] PROBLEM - Host virt1006 is DOWN: CRITICAL - Plugin timed out after 15 seconds [17:32:44] YuviPanda: pointing someone in the right direction (in lieu of doing the work yourself) is acceptable [17:32:45] (03PS2) 10Cmjohnson: Revert "Temporarily adding americium to production to test pxe" [dns] - 10https://gerrit.wikimedia.org/r/168112 [17:32:56] (03PS11) 10Filippo Giunchedi: Adding tools for banning/unbanning an ES node [puppet] - 10https://gerrit.wikimedia.org/r/164617 (owner: 10Chad) [17:33:03] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Adding tools for banning/unbanning an ES node [puppet] - 10https://gerrit.wikimedia.org/r/164617 (owner: 10Chad) [17:33:04] greg-g: ah, I could do it myself, yeah. isn't too hard. 
[17:33:17] (03CR) 10Cmjohnson: [C: 032] Revert "Temporarily adding americium to production to test pxe" [dns] - 10https://gerrit.wikimedia.org/r/168112 (owner: 10Cmjohnson) [17:34:04] YuviPanda: don't make yourself the one i always go to ;) [17:34:38] greg-g: heh :) I've shinken monitoring things now, I'll probably have it doing the same things icinga is doing... [17:34:43] so the alerts will be nicer [17:34:52] (03PS1) 10Cscott: Use url-downloader proxy to reach PediaPress POD service. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/168114 (https://bugzilla.wikimedia.org/71675) [17:36:15] (03CR) 10Cscott: "We should either SWAT this, or else https://gerrit.wikimedia.org/r/168114" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/168082 (owner: 10MarkTraceur) [17:36:47] (03PS2) 10Cscott: Revert "Re-enable PediaPress POD in production." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/168082 (https://bugzilla.wikimedia.org/71675) (owner: 10MarkTraceur) [17:37:17] RECOVERY - Host virt1006 is UP: PING OK - Packet loss = 0%, RTA = 1.28 ms [17:40:21] YuviPanda: cool :) [17:40:54] greg-g: not today tho [17:44:22] (03PS1) 1001tonythomas: Made the deployment mx talk back to mediawiki-verp labs host [puppet] - 10https://gerrit.wikimedia.org/r/168116 [17:45:10] Jeff_Green: https://gerrit.wikimedia.org/r/#/c/168116/ looks good ? [17:48:48] cmjohnson: pxe boot on that box is saying "EXT4-fs (nbd4): VFS: Can't find ext4 filesystem" that seems new... [17:49:29] cmjohnson: can you suggest a way for me to determine if that's a config mistake or if the drive controller is kaput? [17:50:46] tonythomas: looking [17:50:53] okey. [17:51:35] (03CR) 10Ottomata: [C: 031] "Ariel, are you ok with this?" [puppet] - 10https://gerrit.wikimedia.org/r/168104 (owner: 10QChris) [17:51:37] andrewbogott: was that during the install? 
[17:51:42] (03CR) 10Jgreen: [C: 032 V: 031] Made the deployment mx talk back to mediawiki-verp labs host [puppet] - 10https://gerrit.wikimedia.org/r/168116 (owner: 1001tonythomas) [17:51:51] tonythomas: merged [17:51:59] Jeff_Green: yay ! [17:52:00] great [17:52:20] now we need to force a git fetch on deployment-mx [17:52:24] cmjohnson: Yes, I think so [17:52:32] yeah. [17:52:33] It's on the mgmt console… a million things like that. [17:52:55] waiting for that one to show up in the exim4.conf ( I dont have access for exim4.conf ) though [17:53:02] do you know for sure it picked up the precise installer? [17:53:04] cmjohnson: wait, hang on, may be mistaken... [17:53:16] tonythomas: sync'd [17:53:23] great [17:53:29] forcing a puppet run [17:54:29] cmjohnson: um, I was looking at the wrong window, so nevermind for now [17:54:31] tonythomas: done [17:54:38] (although now I have to worry about why virt1005 was saying that instead...) [17:54:47] heh..ok [17:55:12] Jeff_Green: nice. its showing up in exim4.conf ? [17:56:27] (03PS2) 10Dzahn: nfs.pp - remove pmtpa, add codfw [puppet] - 10https://gerrit.wikimedia.org/r/167885 [17:56:56] tonythomas: checking [17:56:57] (03CR) 10Dzahn: nfs.pp - remove pmtpa, add codfw (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/167885 (owner: 10Dzahn) [17:57:11] looked like it applied [17:57:13] okey. [17:57:34] domainlist verp_domains = mediawiki-verp.wmflabs.org [17:57:52] command = /usr/bin/curl -H 'Host: mediawiki-verp.wmflabs.org' http://mediawiki-verp.wmflabs.org/w/api.php -d "action=bouncehandler" --data-urlencode "email@-" -o /dev/null [17:57:55] looks good [17:58:02] (03PS3) 10Dzahn: nfs.pp - remove pmtpa, add codfw [puppet] - 10https://gerrit.wikimedia.org/r/167885 [17:58:12] (03CR) 10Dzahn: nfs.pp - remove pmtpa, add codfw (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/167885 (owner: 10Dzahn) [17:58:26] yeah. that looks great ! 
[17:58:39] let me try to send a mail to some unknown user [17:58:41] in a sec [17:59:07] (03CR) 10Dzahn: "failover for netapp" [puppet] - 10https://gerrit.wikimedia.org/r/167885 (owner: 10Dzahn) [18:00:04] yurik: Respected human, time to deploy Wikipedia Zero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141022T1800). Please do the needful. [18:00:54] (03CR) 10Alexandros Kosiaris: [C: 032] nfs.pp - remove pmtpa, add codfw [puppet] - 10https://gerrit.wikimedia.org/r/167885 (owner: 10Dzahn) [18:01:06] paravoid: who should I get to review the cassandra module? [18:01:10] you want to, or should I ask someone else? [18:01:22] I can [18:01:32] mmmk danke, [18:03:07] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet has 1 failures [18:04:48] (03CR) 10Dzahn: [C: 031] "merge tomorrow, Robh updated ticket with "going live on 2014-10-23"" [puppet] - 10https://gerrit.wikimedia.org/r/167627 (owner: 10Dzahn) [18:08:54] Jeff_Green: looks like there is an issue somewhere - exim -bt 'wiki-testwiki-5-nduy3i-k988JSwMyq5PqrOg@mediawiki-verp.wmflabs.org' [18:08:54] from the deployement-mx takes the DNS lookup [18:08:58] not our verp router [18:09:31] uh oh [18:10:18] the domainlist verp_domains value is mediawiki-verp.wmflabs.org right ? [18:11:00] (03PS2) 10Dzahn: move 'noc' from misc to module [puppet] - 10https://gerrit.wikimedia.org/r/168006 [18:11:32] (03CR) 10Dzahn: move 'noc' from misc to module (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/168006 (owner: 10Dzahn) [18:11:33] Jeff_Green: can I have a paste of the exim4.conf ? [18:12:33] (03PS3) 10Dzahn: move 'noc' from misc to module [puppet] - 10https://gerrit.wikimedia.org/r/168006 [18:13:43] _joe_: can https://gerrit.wikimedia.org/r/#/c/167310/ be deployed? [18:15:10] So 21 of the roughly 115 juniper items are accounted for on the service contract output that paravoid gave me [18:15:53] what about the other way around? 
:) [18:16:33] tonythomas: see your homedir on deployment-mx [18:16:46] Jeff_Green: k. checking [18:17:07] (03PS2) 10Ottomata: Have wikimetrics accept server-status requests on http [puppet] - 10https://gerrit.wikimedia.org/r/168096 (owner: 10QChris) [18:17:57] Jeff_Green: can you just do a exim4 restart ? [18:18:02] might be thats the issue ? [18:18:07] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: Puppet has 1 failures [18:18:14] the configs looks just like what we need [18:18:20] puppet should restart it on change, but sure [18:18:22] (03CR) 10Ottomata: [C: 032] Have wikimetrics accept server-status requests on http [puppet] - 10https://gerrit.wikimedia.org/r/168096 (owner: 10QChris) [18:18:37] done [18:18:48] tonythomas: i need to grab lunch, back in ~15 [18:18:57] Jeff_Green: okey [18:18:59] c ya [18:19:47] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [18:20:23] (03CR) 10Ori.livneh: Initial commit of Cassandra puppet module (032 comments) [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [18:20:45] ottomata: ^ the comments are against PS8, I forgot to submit them the other day, but I think they're still relevant [18:20:46] (03PS4) 10Dzahn: Tampa decom - clean up 152.80.208.in-addr.arpa [dns] - 10https://gerrit.wikimedia.org/r/167868 [18:21:34] ori: did you review the whole thing? 
[18:21:54] paravoid: no, but i figured a pair of partial comments were better than no feedback [18:22:00] damn [18:22:17] if you had, I wouldn't need to :) [18:22:26] (03CR) 10Dzahn: [C: 031] "ok, so not deleting the entire zone, instead, clean it up and can be used for codfw now, see the FIXME's" [dns] - 10https://gerrit.wikimedia.org/r/167868 (owner: 10Dzahn) [18:23:59] (03CR) 10Ottomata: Initial commit of Cassandra puppet module (032 comments) [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 (owner: 10Ottomata) [18:24:46] (03PS2) 10Yuvipanda: shinken: Autogenerate config from shinkengen [puppet] - 10https://gerrit.wikimedia.org/r/168102 [18:24:50] andrewbogott: ^ can you CR? [18:25:49] (03CR) 10Ori.livneh: shinken: Autogenerate config from shinkengen (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/168102 (owner: 10Yuvipanda) [18:25:50] YuviPanda: didn't i? [18:26:34] (03CR) 10Alexandros Kosiaris: [C: 032] remove Tampa networks from network.pp [puppet] - 10https://gerrit.wikimedia.org/r/167872 (owner: 10Dzahn) [18:27:41] ori: thanks, fixed :) I usually set the comment header, forgot this time [18:27:56] (03PS3) 10Yuvipanda: shinken: Autogenerate config from shinkengen [puppet] - 10https://gerrit.wikimedia.org/r/168102 [18:28:00] (03PS2) 10Dzahn: remove Tampa networks from network.pp [puppet] - 10https://gerrit.wikimedia.org/r/167872 [18:28:03] andrewbogott: no, looks like not [18:28:24] ottomata: I have a few questions about the cassandra puppetization [18:28:42] YuviPanda: I did, but you committed a patch without responding. You want the exec to run as part of every puppet run, right? [18:28:50] ottomata: do you have time to chat about this? [18:28:52] oh damn [18:29:20] andrewbogott: yeah, exec once per run. 
[18:29:44] andrewbogott: although in the future I'd want it to notify only if there's a change, but no way to do that right now since the exec can produce multiple files [18:29:46] gwicke: ja, i'm working on it now that I have some comments from ori [18:29:51] what's up? [18:30:06] ottomata: so one thing I'm wondering about is how restbase would get the cluster info from puppet [18:30:11] (03CR) 10Andrew Bogott: [C: 031] shinken: Autogenerate config from shinkengen [puppet] - 10https://gerrit.wikimedia.org/r/168102 (owner: 10Yuvipanda) [18:30:27] aye, ok, [18:30:35] ori: can you +1 as well? [18:30:35] so, i'm thinking the restbase role class will define this [18:30:45] (or cassandra role, not sure which yet) [18:30:59] but, either way, it would likely be a hash structure that defined the cluster [18:31:02] topology [18:31:04] restbase and cassandra don't necessarily live on the same box [18:31:08] that's fine [18:31:15] (03PS1) 10Dzahn: make hooft a real 'bastionhost' [puppet] - 10https://gerrit.wikimedia.org/r/168124 [18:31:18] it can be a ::config class that just declares variables [18:31:24] that would be safe to include anywhere [18:31:39] okay [18:31:43] see also [18:31:44] https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/analytics/hadoop.pp#L42 [18:31:58] note that the ::config class doesn't actually do anything [18:32:05] you can include that anywhere if you want to reference the variables defined there [18:32:33] the other hadoop role classes inherit from it, so they then have the ::config variables in local scope [18:32:42] so we'd define one of those per cluster? [18:32:43] YuviPanda: reviewing [18:32:48] what kind of cluster info? [18:32:56] ori: ty. [18:33:00] paravoid: cassandra cluster [18:33:13] nodes, topology etc [18:33:15] YuviPanda: where's the package? [18:33:20] ori: how does pick work when both values are undefined? [18:33:22] why would restbase care about the topology though?
[18:33:28] (03CR) 10Dzahn: "should be fine after Change-Id: I3fb21db64beb3b ?:)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/164542 (owner: 10Alexandros Kosiaris) [18:33:29] ori: the python code is in operations/software/shinkengen [18:33:33] paravoid: it sets up replication for instance [18:33:38] ottomata: i think you need to have at least one value that isn't undefined [18:33:38] per keyspace [18:33:48] ori: and package is locally built and in labsdebrepo for the shinken project. [18:33:51] hm, ok, i won't use pick for ones where i want the default to be undef [18:33:55] hrm [18:34:00] ori: code is still unstable, needs a fair bit more work done [18:34:43] yeah, paravoid, its kinda like the app schema needs to know about the topology in order to define replication. unless we somehow made a fancy custom replication topology that could do this automatically [18:35:11] some custom auto-balance thing, but gwicke says that certain keyspaces will have specific replication needs [18:35:33] YuviPanda: what provisions /etc/shinkengen.yaml? [18:35:33] (03CR) 10Dzahn: "should be good after Change-Id: I61f2f9fcb6392ea13 which should be good after Change-Id: I3fb21db64beb3b" [puppet] - 10https://gerrit.wikimedia.org/r/96424 (owner: 10Dzahn) [18:35:45] isn't replication something that we would set as admins, rather than something the app would define itself? [18:35:53] as is now, with multi-dc replication, the keyspaces need to say how many replicas belong in each dc [18:35:54] explicitly [18:35:59] ori: ah, hmm. nothing. let me puppetize that. 
[18:36:10] ottomata, paravoid: it's not related to balancing; the only thing restbase needs to know is the list of dcs and available replicas in each [18:36:19] it also needs a list of nodes to contact [18:36:23] YuviPanda: that way you can make the exec be refreshonly => true, subscribe => File['/etc/shinkengen.yaml'], which would resolve your FIXME [18:36:32] list of nodes to contact is easy, but why does it need to know dcs? [18:36:38] (03CR) 10Dzahn: [C: 032] remove Tampa networks from network.pp [puppet] - 10https://gerrit.wikimedia.org/r/167872 (owner: 10Dzahn) [18:36:39] oh, because you say the app is creating keyspaces? [18:36:53] yes, restbase is in charge of creating keyspaces [18:36:57] one per table / bucket [18:37:00] ori: no, because the generated config files are based on data from wikitech, so they will change often even if the .yaml file doesn't [18:37:14] depending on the value of the data in those tables, it sets up replication factors [18:37:25] YuviPanda: is it expensive to run? [18:37:41] ori: nope, takes a few secs. [18:37:50] with cross-dc replication you set the number of replicas per DC [18:38:02] paravoid: http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/create_keyspace_r.html#reference_ds_ask_vyj_xj__NTS [18:38:22] YuviPanda: you could have a --dry-run arg that does all the work but doesn't actually update files; it only prints to stderr some indication of whether it would update files or not. and you can make the dry run invocation be the onlyif => clause. [18:38:29] why would the app create keyspaces? [18:38:30] ottomata, paravoid: or http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/alter_keyspace_r.html?scroll=reference_ds_fdz_3l4_wj__example_unique_1 [18:38:44] heh [18:38:45] paravoid: restbase is a table / bucket storage service [18:39:03] ori: ah, hmm. that makes sense, yeah. 
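The `--dry-run` idea suggested above (do all the work, report whether files would change, and let that gate the real run) could look roughly like the following. It is a sketch under assumptions, not shinkengen's code: the function names are hypothetical, and signalling "an update is needed" through exit status 0 is my choice so that a Puppet `onlyif` clause (which runs the exec only when the command exits 0) can consume it; the chat only mentions printing an indication to stderr.

```python
import os
import sys


def files_needing_update(generated, root):
    """Return relative paths whose on-disk content differs from `generated`.

    `generated` maps a relative file name to the content it should have.
    """
    stale = []
    for rel, content in sorted(generated.items()):
        try:
            with open(os.path.join(root, rel), encoding="utf-8") as fh:
                current = fh.read()
        except FileNotFoundError:
            current = None  # a missing file always needs to be written
        if current != content:
            stale.append(rel)
    return stale


def run(generated, root, dry_run=False):
    """Write changed files, or with dry_run only report what would change.

    Returns a process exit status. In dry-run mode 0 means "an update is
    needed" and 1 means "nothing to do" (an assumption, see the lead-in).
    """
    stale = files_needing_update(generated, root)
    if dry_run:
        print("would update: %s" % (", ".join(stale) or "nothing"), file=sys.stderr)
        return 0 if stale else 1
    for rel in stale:
        with open(os.path.join(root, rel), "w", encoding="utf-8") as fh:
            fh.write(generated[rel])
    return 0
```

Because the comparison is against the generated content itself, this also covers the case raised in the channel where the output changes even though the YAML input file does not.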
[18:39:05] that means that clients with the appropriate authorization can create tables and buckets [18:39:26] which all map to one or more keyspaces in Cassandra [18:39:28] what would be the use case for that? [18:39:34] ori: will want to do that in a separate patch, maybe. Since the package itself is still in flux... [18:39:42] paravoid: for tables and buckets? [18:39:52] or for creation? [18:39:56] for restbase creating its own keyspaces [18:40:08] what kind of keyspaces would it create? [18:40:14] Jeff_Green : back ? [18:40:22] back [18:40:22] paravoid: it's one of the main things it does [18:40:34] provide a sane table abstraction [18:41:09] no I mean specifically [18:41:14] each logical table maps to one Cassandra keyspace, so that we can a) maintain secondary indexes and metadata, and b) control replication factors to match the use case [18:41:17] Jeff_Green: I think I found the issue [18:41:29] with an example :) [18:41:42] (03PS4) 10Yuvipanda: shinken: Autogenerate config from shinkengen [puppet] - 10https://gerrit.wikimedia.org/r/168102 [18:41:44] I found the same happening in my other machine, and resolved it by adding the hostname to /etc/exim4/wikimedia_domains [18:42:00] (03CR) 10Dzahn: [C: 04-2] "shouldn't just delete this file without also deleting the entire puppet class and includes in site.pp, unsure how much longer, but we stil" [puppet] - 10https://gerrit.wikimedia.org/r/167325 (owner: 10Dzahn) [18:42:01] paravoid: are you asking for the schema of the tables in the keyspace? [18:42:10] tonythomas: looking [18:42:27] what would constitute a keyspace? 
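To make the per-datacenter replication point concrete: with Cassandra's NetworkTopologyStrategy, the keyspace definition names each datacenter and its replica count, which is why the creator of a keyspace needs the list of DCs. The sketch below only builds the CQL string; the function name, the keyspace name and the replica counts are hypothetical, and it is not taken from restbase-cassandra.

```python
import re

# Conservative identifier check so names can be embedded in the statement safely.
_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]{0,47}$")


def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    `replicas_per_dc` maps a datacenter name to its replica count,
    e.g. {"eqiad": 3, "codfw": 2} (illustrative values only).
    """
    if not _NAME_RE.match(keyspace):
        raise ValueError("invalid keyspace name: %r" % (keyspace,))
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    options = []
    for dc, count in sorted(replicas_per_dc.items()):
        if not _NAME_RE.match(dc):
            raise ValueError("invalid datacenter name: %r" % (dc,))
        if not isinstance(count, int) or count < 1:
            raise ValueError("replica count for %s must be a positive integer" % dc)
        options.append("'%s': %d" % (dc, count))
    return (
        "CREATE KEYSPACE %s WITH replication = "
        "{'class': 'NetworkTopologyStrategy', %s};" % (keyspace, ", ".join(options))
    )
```

A service that creates one keyspace per logical table would call something like this once per table, choosing the replica counts to match that table's use case.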
[18:42:29] Jeff_Green: so we would want mediawiki-verp.wmflabs.org in wikimedia_domains [18:42:36] and we are good [18:42:40] (03CR) 10Andrew Bogott: [C: 032] shinken: Autogenerate config from shinkengen [puppet] - 10https://gerrit.wikimedia.org/r/168102 (owner: 10Yuvipanda) [18:42:43] paravoid: one logical table -> one keyspace [18:42:50] (03Abandoned) 10Dzahn: delete blog apache site [puppet] - 10https://gerrit.wikimedia.org/r/167325 (owner: 10Dzahn) [18:42:59] andrewbogott: ori thanks! I'll implement the dryrun option [18:43:12] for a test, can you just add that entry into the wikimedia_domains, and without a puppet apply - give the exim -bt ( or i can do that ) [18:43:17] paravoid: a keyspace is a concept that's somewhat similar to a database in MySQL [18:43:25] so that we can see whether the routing is happening ? [18:43:29] that was my understanding, yes [18:43:33] tonythomas: yep, on sec [18:43:37] it's basically just a little bit of metadata [18:43:39] okey :) [18:43:39] I don't think I'd ever let an app create its own databases :) [18:43:41] so very cheap to create [18:44:12] paravoid: this is not 'an app' [18:44:22] it's not? [18:44:27] apps are using this service to access storage [18:44:43] it's the only way they interact with the underlying Cassandra storage [18:44:55] Jeff_Green: yay [18:44:58] you changed right ? [18:45:12] paravoid: think of it as the equivalent of DynamoDB [18:45:19] I don't understand the difference [18:45:23] it's still an app isn't it? [18:45:36] or you mean that restbase creates keyspaces for every app uses it? [18:46:05] paravoid: cassandra is a storage mechanism for a storage service [18:46:08] tonythomas: try now [18:46:11] yeah. 
its happening :) [18:46:22] tonythomas01@deployment-mx:/etc/exim4$ exim -bt wiki-testwiki-5-nduz6t-b+lmLnWWh+MRU2nW@mediawiki-verp.wmflabs.org [18:46:22] wiki-testwiki-5-nduz6t-b+lmLnWWh+MRU2nW@mediawiki-verp.wmflabs.org [18:46:22] router = mw_verp_api, transport = mwverpbounceprocessor [18:46:27] looks good - [18:46:37] paravoid: restbase is in charge of configuring this backend, including creation of logical tables using keyspaces [18:46:37] (03PS1) 10Ori.livneh: hhvm: add a 'creates' clause for source code installation [puppet] - 10https://gerrit.wikimedia.org/r/168130 [18:46:37] so we would need a gerrit change for that ? [18:46:52] services! [18:47:20] this has implications to the security model [18:47:30] I still don't understand tbh, I'll have a look at the source [18:47:36] in this model, *every access* is authenticated and authorized in restbase [18:48:05] in our deployment, could you give a few examples of keyspace names? [18:48:06] so restbase fully owns the cluster [18:48:09] and their descriptions [18:48:29] Jeff_Green: and yay ! [18:48:34] the bounce is showing up in the mysql db [18:48:36] (03PS1) 10Dzahn: blog Apache config - remove SSLv3 [puppet] - 10https://gerrit.wikimedia.org/r/168132 [18:48:37] paravoid: https://github.com/gwicke/restbase#usage [18:48:54] this is backed by several tables [18:49:09] each of which is stored in one cassandra keyspace [18:49:37] specifically? [18:49:40] (03CR) 10Ori.livneh: [C: 032] hhvm: add a 'creates' clause for source code installation [puppet] - 10https://gerrit.wikimedia.org/r/168130 (owner: 10Ori.livneh) [18:49:50] what would be the keyspace here? 
[18:50:45] paravoid: http://api.wmflabs.org/v1/en.wikipedia.org/pages.html/ is a listing of a table backed by a keyspace [18:51:01] (03CR) 10Dzahn: "yes, this is now here as a reminder to make us add automatic Icinga monitoring for cert expiry when using install_certificate" [puppet] - 10https://gerrit.wikimedia.org/r/15561 (owner: 10Catrope) [18:51:17] paravoid: it might be quicker to do a hangout & chat about this [18:52:02] (03CR) 10Dzahn: "also see https://gerrit.wikimedia.org/r/#/c/163798/" [puppet] - 10https://gerrit.wikimedia.org/r/15561 (owner: 10Catrope) [18:52:34] paravoid: here are some examples for how those tables are created in the cassandra backend: https://github.com/gwicke/restbase-cassandra/blob/master/storage/cassandra/test.js#L340 [18:53:07] the implementation of all that is in https://github.com/gwicke/restbase-cassandra/blob/master/storage/cassandra/db.js [18:54:19] (why use github.com/gwicke btw and not gerrit?) [18:54:25] (03PS3) 10Dzahn: Give cscott the ability to deploy zuul changes. [puppet] - 10https://gerrit.wikimedia.org/r/165867 (owner: 10Cscott) [18:54:29] (03PS1) 10Yuvipanda: shinken: Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/168133 [18:54:31] (03PS1) 10Yuvipanda: shinken: Add missing shinkengen.yaml config file [puppet] - 10https://gerrit.wikimedia.org/r/168134 [18:54:33] andrewbogott_afk: ^ followups that I missed. [18:54:52] paravoid: I'm hoping for phabricator to be ready soon [18:55:00] for code review? [18:55:14] (03CR) 10Dzahn: [C: 032] "i'm merging this. Robh said on ticket that it can go "live on Wednesday, 2014-10-22" , that is now, has approvals from robla and all" [puppet] - 10https://gerrit.wikimedia.org/r/165867 (owner: 10Cscott) [18:55:15] it'll be months for that [18:55:34] has this code been code reviewed at all? :) [18:55:41] but anyway, I digress [18:55:53] (03CR) 10Dzahn: [V: 032] "i'm merging this. 
Robh said on ticket that it can go "live on Wednesday, 2014-10-22" , that is now, has approvals from robla and all" [puppet] - 10https://gerrit.wikimedia.org/r/165867 (owner: 10Cscott) [18:55:59] at some point it was Real Soon Now ;) [18:56:22] cscott: ^ welcome to being a "contint-admin" [18:56:31] gwicke: there'll be a plan in place to figure out what to do + test instance to check things out by end of 2014, IIRCC [18:56:38] is the existing cluster still running? [18:56:57] it is still running, but doesn't have restbase on it [18:57:11] YuviPanda: k, looking forward to it [18:57:28] ok.. [18:57:55] for now github is quite useful, as it lets us easily test each commit against a real cassandra cluster [18:58:06] can you give me an example of a keyspace (its name, description) and tables (names, descriptions) that would be contained in it? [18:58:56] (03PS11) 10Ottomata: Initial commit of Cassandra puppet module [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/166888 [18:59:20] paravoid: keyspaces have names like org_wikipedia_en_T_pages_html [18:59:52] this would back the table 'pages.html' in the 'en.wikipedia.org' domain [19:00:37] there is also a system domain, which has metadata tables for ACLs and high-level schemas [19:00:50] wait [19:01:04] so there's a keyspace "org_wikipedia_en_T_pages_html"; how many tables does it have in it? [19:01:05] within each keyspace, there is a 'data' and a 'meta' table, plus any number of secondary index tables [19:01:34] paravoid: one logical table == one keyspace [19:01:44] I'm talking about cassandra tables [19:01:49] CREATE TABLE? [19:02:01] as I said, at least 'data' and 'meta' [19:02:21] ok [19:02:27] this is a strange design.. [19:03:40] I arrived at this design after starting out with a single keyspace [19:03:43] is Sean involved in this at all? [19:03:56] but then realized that replication can't be controlled per table [19:04:09] gwicke: why does replication need to be per table? 
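The keyspace naming gwicke describes (the 'pages.html' table in the 'en.wikipedia.org' domain lives in org_wikipedia_en_T_pages_html) can be sketched roughly as below. This is only a guess generalized from that single example: the function name keyspace_name is hypothetical, and the real restbase-cassandra code may treat long names or special characters differently.

```python
def keyspace_name(domain: str, table: str) -> str:
    """Derive a keyspace name from a domain and a logical table name.

    Assumption (inferred from one example in the log, not from the
    restbase source): the domain labels are reversed, dots become
    underscores, and "_T_" separates the domain from the table.
    """
    reversed_domain = "_".join(reversed(domain.split(".")))
    return f"{reversed_domain}_T_{table.replace('.', '_')}"


if __name__ == "__main__":
    print(keyspace_name("en.wikipedia.org", "pages.html"))
```

Each such keyspace would then hold at least the 'data' and 'meta' Cassandra tables mentioned in the discussion, plus any secondary index tables.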
[19:04:17] I've been keeping Sean in the loop, but so far he hasn't really dug in [19:04:22] why not just always 2 or 3 replicas of everything? [19:05:04] ottomata: dome data is merely a cache, and doesn't need extensive replication [19:05:11] dome? [19:05:17] some :) [19:05:18] *some [19:05:21] sorry ;) [19:05:34] cache, because the canonical data is in mysql? [19:05:41] other data is really valuable, and warrants replication factors of 5 [19:06:00] of 5? for redundancy? or just read performance? [19:06:07] also, replication is not only used for durability; it's also used to scale read loads [19:06:28] i'm all curious now too! :) [19:06:29] which differs between logical tables [19:06:36] what's an example of something you'd want replication=5 on? [19:06:54] I think this is conflating the DBA's responsibilities vs. the service's (and its users) responsibilities [19:07:02] it's a bit peculiar and I'm not sure if it's a good idea [19:07:11] unusual at least [19:07:12] (03CR) 10Dzahn: [C: 032] phab match bz 10MB upload limit [puppet] - 10https://gerrit.wikimedia.org/r/168109 (owner: 10Rush) [19:07:20] well, i do understand what gwicke is saying here though, he's saying restbase is not the app, its a service interface [19:07:27] apps using it aren't going to know anything about cassandra [19:07:34] a bit /too/ smart maybe :) [19:07:36] ottomata: tables with high read loads, and tables with information that's really valuable [19:07:54] gwicke: aye, hm, how are those tables chosen? [19:08:01] not by the app, you say, but by restbase? [19:08:08] if by restbase, it must be pretty automatic? [19:08:10] currently restbase only sets up replication on table creation [19:08:10] or is that part of the api? [19:08:27] any changes have to be done manually [19:08:32] but that'll get old very quickly [19:08:43] so we'll need proper automation [19:08:49] aye, restbase will do this? [19:08:54] eventually, yes [19:08:55] like, self alter the keyspaces based on load? 
[19:09:02] (03PS1) 10Ori.livneh: Fix-up for I8f16b61f2 [puppet] - 10https://gerrit.wikimedia.org/r/168140 [19:09:03] likely not that [19:09:05] keyspace replication* [19:09:06] aye [19:09:20] but change factors based on configured replication class [19:09:25] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix-up for I8f16b61f2 [puppet] - 10https://gerrit.wikimedia.org/r/168140 (owner: 10Ori.livneh) [19:09:32] see for example 'reduced redundancy' in DynamoDB [19:09:33] oo, tell us about 'replication class' [19:09:34] :) [19:09:43] that's a restbase thing, i assume? [19:09:50] no, DynamoDB [19:10:24] mutante: whoo (belatedly) [19:10:35] anyway, I really need to get going [19:10:42] ottomata: but it's something we can add to restbase too [19:10:46] gwicke: i wonder if we could integrate travis/github with jenkins somehow [19:10:55] cscott: welcome, let us know if anything is missing [19:11:02] or, you know, stick to what the rest of the organization uses :) [19:11:05] gwicke: rather than create our own travis clone, for certain projects it might be easier just to use travis for the tests [19:11:18] paravoid: we don't have VM support currently [19:11:31] so it's not as easy as we'd like [19:11:37] ok, so, still curious, when creating new keyspaces, how will restbase choose the replicaiton level? [19:11:52] is it specified in the REST request? [19:12:03] ottomata: yes, eventually [19:12:14] it's a property of the table [19:12:15] and while we can add VM support to jenkins/phabricator/whatever -- at some point you have to ask, "is this really our core mission? can't we just use the tools other people use?" [19:12:15] something wrong ? git pull in operations/puppet taking infinite time :( [19:12:21] my opinion at least. [19:12:36] ok, hm. [19:12:36] tonythomas: have you actually waited an infinite duration to verify that? ;) [19:12:38] cscott: can we just not ask and respond that for every repository? [19:12:57] but the app won't do so based on Datacenter, like restbase wil, right? 
[19:13:02] cscott: nearly :D [19:13:11] we're supposed to work as a team, not everyone doing what they think is best for them [19:13:15] app will just be all like: "make me a new table that is super reliable" [19:13:24] paravoid: yes to a large extent. when i was at olpc, a much smaller org, it was clear that our resources didn't really extend to being able to run our own github, etc. [19:13:30] and restbase will do "create keyspace blabla replication{dc1: 5, dc2: 5}" [19:13:38] using a common vcs/review/CI infrastructure is not too much to ask I think [19:13:39] paravoid: we can always switch off those tests, but I think that'd be a loss [19:13:53] paravoid: wmf is larger, so it has the capability to keep its sources in house (for example). but i think we can intelligently draw lines and say "yeah, other people are just better at this than we are" [19:13:56] ottomata: yes [19:14:04] cscott: this is orthogonal to my point [19:14:17] ottomata: it also stores the 'super reliable' property in metadata, so it can evolve what that actually means in terms of cassandra replication [19:14:18] ok, design aside, could restbase infer this then from cluster status? [19:14:22] paravoid: i thought you were making the slippery slope argument. [19:14:29] you can get the list of configured DCs directly from cassandra [19:14:35] I'm making the point that the org has decided for now to use gerrit [19:14:36] rather than passing it into restbase configs [19:14:51] individual engineers or teams shouldn't go their own way just because they lack some feature [19:15:00] paravoid: that's where you lose me [19:15:07] restbase could just say "if someone asks me for super reliable new table, I will say replication = 5 in all DCs that cassandra knows about" [19:15:08] look at the jenkins jobs that are configured. they are *wildly* different. 
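As a rough illustration of ottomata's "create keyspace blabla replication{dc1: 5, dc2: 5}" idea, here is a minimal sketch that turns a requested reliability level into a CQL statement. The REPLICATION_CLASSES table and the function name are hypothetical; only the factor of 5 for very valuable data comes from the discussion. The statement uses the standard NetworkTopologyStrategy syntax, and the keyspace name is double-quoted because unquoted CQL identifiers are folded to lower case.

```python
# Hypothetical mapping from a service-level "replication class" to a
# per-datacenter replica count. Only the value 5 is taken from the log.
REPLICATION_CLASSES = {"cache": 1, "default": 3, "super_reliable": 5}


def create_keyspace_cql(keyspace, datacenters, replication_class="default"):
    """Build a CREATE KEYSPACE statement replicating to every given DC."""
    if not datacenters:
        raise ValueError("at least one datacenter is required")
    factor = REPLICATION_CLASSES[replication_class]
    # One entry for the strategy class, then one per datacenter.
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {factor}" for dc in sorted(datacenters)]
    options = ", ".join(parts)
    return (
        f'CREATE KEYSPACE IF NOT EXISTS "{keyspace}" '
        f"WITH replication = {{{options}}};"
    )


if __name__ == "__main__":
    print(create_keyspace_cql("org_wikipedia_en_T_pages_html",
                              ["dc1", "dc2"], "super_reliable"))
```

Keeping the class name (rather than the numeric factor) in metadata is what would let the service later change what 'super reliable' means, as gwicke notes.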
[19:15:17] paravoid: we have a choice of no cassandra testing or cassandra testing [19:15:27] (03CR) 10Dzahn: "feel like running in compiler and adding a link to the diff?" [puppet] - 10https://gerrit.wikimedia.org/r/167713 (owner: 10Andrew Bogott) [19:15:51] certainly many extensions can reuse configurations, and they should do so as much as possible. but at some point you have to try something new. if the something new works, then you can move more projects to it. [19:15:58] gwicke: (joining other sidetopic :p) could you just use gerrit and then do travis testing from the replicated github repo? [19:16:19] ottomata: that is in fact what i do for most of the ocg packages. [19:16:41] ottomata: it is not well integrated, though -- i have to push to a throwaway branch in my local github fork in order to trigger the travis tests [19:16:44] ottomata: hmm, getting it from cassandra could actually be an interesting option [19:16:53] didn't think of that before [19:17:06] ottomata: what i am asking for (which is probably different from what gwicke and paravoid are talking about) is integrating it more closely so that travis tests can gate gerrit. [19:17:07] we'd still need the list of all nodes [19:17:12] that's easier [19:17:36] ottomata: did you see something in the docs about how to get this info? [19:17:39] anyway really gtg [19:17:42] later! [19:17:46] later paravoid! [19:17:49] gwicke: from cassandra? [19:17:53] no, but nodetool status outputs it [19:18:00] so it must be there [19:18:03] yeah, but that's out of band IIRC [19:18:11] we'd need something usable from CQL [19:18:12] ? [19:18:14] oh [19:18:24] paravoid: goodnight! 
[19:19:12] 'describe cluster' looks promising [19:19:15] oo [19:19:17] gwicke [19:19:17] also [19:19:28] select * from system.schema_keyspaces; [19:19:40] hm [19:19:41] ah [19:19:44] hm, naw [19:19:48] that's just the keyspaces [19:19:51] not the replicas [19:19:55] yeah, btu strategry options are there [19:19:55] hm [19:20:08] you could have a canonical keyspace that every other one should emulate? [19:20:15] kinda hacky, meh [19:20:21] you can see datacenters [19:20:26] http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/describe_r.html?scroll=reference_ds_vyl_gns_xj__examples_unique_2 [19:20:29] meh, but those are manually configured anyway [19:20:31] nm [19:20:40] (03CR) 10Dzahn: [C: 031] sudo: create module, remove old files [puppet] - 10https://gerrit.wikimedia.org/r/167183 (owner: 10Giuseppe Lavagetto) [19:22:13] whoa, that shows the replicas, but not the DCs, gwicke [19:22:18] describe cluster [19:22:57] I get Cluster: Test Cluster [19:22:57] Hi. There is an issue with Commons files' thumbnails paths on other wikis: https://fr.wikipedia.org/wiki/Fichier:Les_dernières_notes_rédigées_par_Robespierre,_saisies_chez_lui_le_jour_de_son_exécution_34_sur_34_-_Archives_Nationales_-_AE-II-1419_pièce_a_et_f.jpg links to .../511px-thumbnail.jpg instead to .../511px-.jpg [19:23:18] ottomata: the problem I see with that is that it lists the clusters used per table [19:23:29] at least as I understand it [19:23:52] more accurately, per keyspace [19:25:47] gwicke: [19:25:47] select host_id, data_center, rack, rpc_address from peers; [19:25:48] ? 
[19:25:57] i'm just peeking at all the system tables now [19:26:21] ahhh [19:26:25] that sounds very promising [19:27:00] (03CR) 10Dzahn: "honestly not sure i see why removing those parameters and templates to files is an advantage, but up to Catrope to comment" [puppet] - 10https://gerrit.wikimedia.org/r/167413 (owner: 10Ori.livneh) [19:27:18] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: puppet fail [19:29:29] ottomata, also: select key, data_center from local; [19:30:36] ottomata: so no need for dcs, only need the list of cluster nodes [19:30:56] hm, aye, gwicke, but i only see a single entry for the local dc in the local table [19:30:57] not all nodes [19:31:11] but, aye, if you only want to list the nodes for the current dc [19:31:16] local table would help [19:31:45] select rpc_address from peers, local where peers.data_center = local.data_center? [19:31:47] somethign like thta [19:31:49] *nod*, local is the current node only afaik; peers only has the remaining nodes [19:32:01] dnno if you can do that in cql [19:32:07] oh [19:32:11] that's why i was missing one [19:32:11] got it [19:32:23] it'd be two queries [19:32:45] hm, i don't see rpc_address in lcoal [19:32:49] but, i gues sif you are talking to local [19:32:51] you alreday know that :p [19:32:54] the only wrinkle might be that it's hard to tell which node you are talking to [19:32:59] gwicke: you do need DCs [19:33:03] if you are going to set replicaiton strategry [19:33:20] yes, but those can be retrieved through CQL [19:33:39] aye, in peers(?) 
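The two-query approach discussed here (system.local covers only the node you are connected to, system.peers covers the rest) could be combined roughly like this. It is a sketch under assumptions: rows are plain dicts as a driver might return them, the helper name nodes_by_datacenter is hypothetical, and the address of the contacted node is passed in explicitly because, as noted above, the local table did not appear to expose rpc_address.

```python
from collections import defaultdict

# The two queries from the discussion, qualified with the system keyspace.
LOCAL_QUERY = "SELECT key, data_center FROM system.local;"
PEERS_QUERY = "SELECT data_center, rpc_address FROM system.peers;"


def nodes_by_datacenter(local_rows, peer_rows, contact_address):
    """Merge system.local and system.peers rows into {dc: [addresses]}.

    contact_address is the node the queries were sent to; the caller
    already knows it, so it stands in for the local node's address.
    """
    topology = defaultdict(set)
    for row in local_rows:
        topology[row["data_center"]].add(contact_address)
    for row in peer_rows:
        topology[row["data_center"]].add(row["rpc_address"])
    return {dc: sorted(addrs) for dc, addrs in sorted(topology.items())}


if __name__ == "__main__":
    local = [{"key": "local", "data_center": "dc1"}]
    peers = [
        {"data_center": "dc1", "rpc_address": "10.0.0.2"},
        {"data_center": "dc2", "rpc_address": "10.0.1.1"},
    ]
    print(nodes_by_datacenter(local, peers, "10.0.0.1"))
```

The keys of the result give the list of datacenters needed for a replication strategy, without passing them in through the service's own config.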
[19:33:43] *nod* [19:34:47] for multi-dc stuff there will likely be more issues we need to consider, like having one DC be primary for writes to avoid the cost of synchronous replication [19:35:04] primary for writes per domain [19:35:30] we can deal with that once we get there [19:37:10] aye, [19:37:20] NNEEEEEWay, guess that thing is just waiting for more review now :p [19:37:43] let me have a look over it after lunch [19:40:20] k [19:40:52] (03PS1) 10Cscott: ocg-render-admins should be able to run any command as the ocg user. [puppet] - 10https://gerrit.wikimedia.org/r/168143 [19:42:40] andrewbogott: could you take a look at that ^^ [19:43:33] (03CR) 10Andrew Bogott: [C: 032] ocg-render-admins should be able to run any command as the ocg user. [puppet] - 10https://gerrit.wikimedia.org/r/168143 (owner: 10Cscott) [19:44:32] thanks. [19:46:58] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [19:47:44] bd808, can you confirm that https://gerrit.wikimedia.org/r/#/c/167726/2/conf/wmf/localsettings.js has the right logstash server / port for production? [19:47:45] » var LOGSTASH_HOSTNAME='logstash1003.eqiad.wmnet'; 50 [19:47:45] » var LOGSTASH_PORT=12201; [19:49:13] subbu: looking ... [19:50:47] subbu: That looks right. `netstat -aln |grep 12201` on logstash1003 shows that port open and it matches the config in /etc/logstash/conf.d/10-input-gelf-gelf.conf [19:51:05] bd808, okay, great. deploying that in 10 mins. [19:51:24] Did you get the events to show up in beta? [19:51:37] yes, once we removed "type" from our events, they started to. [19:51:45] awesome [19:52:17] is there a logstash relatd wiki page where we can add this finding .. so others dont stumble on this? 
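For anyone hitting the same problem subbu describes, a minimal sketch of dropping the "type" field before an event is serialized for the GELF input might look like this. The log only records that removing "type" made the events show up, not why, so the RESERVED_FIELDS set, the to_gelf name and the "msg" key are all assumptions for illustration; this is not Parsoid's actual logging code.

```python
import json

# Assumption: "type" is the only field known (from the log) to cause trouble.
RESERVED_FIELDS = {"type"}


def to_gelf(event, host):
    """Serialize an event dict as a GELF 1.1 JSON payload without "type"."""
    payload = {
        "version": "1.1",
        "host": host,
        "short_message": str(event.get("msg", "")),
    }
    for key, value in event.items():
        if key == "msg" or key in RESERVED_FIELDS:
            continue  # already mapped, or deliberately dropped
        payload["_" + key] = value  # GELF additional fields start with "_"
    return json.dumps(payload, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    print(to_gelf({"msg": "hello", "type": "parsoid", "component": "api"},
                  "wtp1001"))
```

The resulting bytes could then be sent to the GELF port mentioned above (12201), for example in a UDP datagram.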
[19:52:49] subbu: https://wikitech.wikimedia.org/wiki/Logstash would be a good place I think [19:52:52] k [19:54:44] (03PS1) 10Ottomata: Remove all kraken references [puppet] - 10https://gerrit.wikimedia.org/r/168147 [19:58:04] I may need someone to check https://gerrit.wikimedia.org/r/#/c/167852/ for me; I'd have liked Yuvi to be able to look at it but it's blocking me atm. [19:59:27] bd808, https://wikitech.wikimedia.org/w/index.php?title=Logstash&diff=131972&oldid=125197 [20:00:05] gwicke, cscott, subbu: Respected human, time to deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141022T2000). Please do the needful. [20:02:10] (03CR) 10Andrew Bogott: [C: 031] "This is better!" [puppet] - 10https://gerrit.wikimedia.org/r/167852 (owner: 10coren) [20:02:57] PROBLEM - CI tmpfs disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /var/lib/jenkins-slave/tmpfs 4 MB (0% inode=99%): [20:02:57] PROBLEM - Disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /var/lib/jenkins-slave/tmpfs 4 MB (0% inode=99%): [20:03:19] (03CR) 10coren: [C: 032] "Good for testing certainly. More to come." [puppet] - 10https://gerrit.wikimedia.org/r/167852 (owner: 10coren) [20:09:10] (03CR) 10Andrew Bogott: "On virt1000 (controller node): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/442/" [puppet] - 10https://gerrit.wikimedia.org/r/167713 (owner: 10Andrew Bogott) [20:12:37] arlolra is getting a permission denied trying to restart parsoid service on the cluster. [20:13:00] "dsh -g parsoid sudo service parsoid restart" [20:13:22] andrewbogott, ^^ can you help? [20:13:40] looks like robh is on duty [20:13:46] subbu: looking... [20:13:57] (03PS1) 10coren: Tool Labs: web nodes are not infrastructure [puppet] - 10https://gerrit.wikimedia.org/r/168171 [20:13:59] arlolra: what machine are you logged in to? [20:14:03] tin [20:14:04] thanks. 
[20:14:06] PROBLEM - very high load average likely xfs on ms-be1012 is CRITICAL: CRITICAL - load average: 249.30, 132.39, 65.30 [20:14:31] im here but seems andrew is handling =] [20:14:41] subbu: and, what server is rejecting you? [20:14:44] * robh is totally cool with that [20:14:58] andrewbogott, arlolra is doing the deploy today. [20:15:02] so, i didn't try to restart. [20:15:06] PROBLEM - check_puppetrun on indium is CRITICAL: CRITICAL: puppet fail [20:15:30] (03CR) 10coren: [C: 032] "Trivial fix to https://gerrit.wikimedia.org/r/#/c/167852/" [puppet] - 10https://gerrit.wikimedia.org/r/168171 (owner: 10coren) [20:15:53] andrewbogott: the parsoid cluster [20:16:25] git deploy worked, but restarting the service didn't [20:17:00] arlolra: make sure you forward all the way to tin [20:17:07] I suspect that everyone else who is in the parsoid-admin group is also in 'deployment'. Looking to see what's missing... [20:17:44] hm, nope, restart is in there already [20:17:49] and i verified by logging onto wtp1001 that it has the new code and there were no silent failures. [20:18:08] arlolra: are you able to ssh in to any of the parsoid boxes? [20:18:19] andrewbogott, subbu: hhm, nevermind. adding -A to my ssh command (as suggested by gwicke) cleared things up [20:18:22] * arlolra feels bad [20:18:27] ok! [20:18:30] ah, ok. [20:19:14] arlolra: of course we *never* made that mistake before ;) [20:19:49] gwicke, speak for yourself. I always get it right the first time. 
[20:20:00] hehe ;) [20:20:07] PROBLEM - check_puppetrun on indium is CRITICAL: CRITICAL: puppet fail [20:20:22] :) [20:23:59] (03PS1) 10coren: Gridengine: remove redundant inclusion of parent [puppet] - 10https://gerrit.wikimedia.org/r/168173 [20:25:08] PROBLEM - check_puppetrun on indium is CRITICAL: CRITICAL: puppet fail [20:25:21] !log updated Parsoid to version 2a8dc85ce676391acd8c6255c4f94250612c9ee2 [20:25:23] (03CR) 10coren: [C: 032] "rivial fix to https://gerrit.wikimedia.org/r/#/c/167852/" [puppet] - 10https://gerrit.wikimedia.org/r/168173 (owner: 10coren) [20:25:30] Logged the message, Master [20:29:29] bd808: ping re: https://gerrit.wikimedia.org/r/168089 [20:29:40] bd808: (collection logs to logstash) [20:30:10] PROBLEM - check_puppetrun on indium is CRITICAL: CRITICAL: puppet fail [20:30:29] _joe_: did puppet on deployment-pdf01 get fixed? [20:30:46] (03PS1) 1001tonythomas: Added the beta hostname to local_domains to make exim use mw_verp_api [puppet] - 10https://gerrit.wikimedia.org/r/168175 [20:31:45] cscott: I'm not a root so I can't +2 :( [20:32:20] bd808: Jeff_Green https://gerrit.wikimedia.org/r/#/c/168175 [20:32:31] bd808: ah. who's root for logstash? [20:32:58] cscott: I have root on the servers but not the ops/puppet repo. [20:33:18] cscott: I just poke random opsen for merges :) [20:33:27] that's what i thought i was doing. ;) [20:33:35] but my random number generator seems to be faulty [20:33:36] tonythomas: i've gotta finish a fundraising project, can we pick up tomorrow? [20:34:00] Jeff_Green: of course :) c ya tomorrow then, [20:34:15] PROBLEM - Host ms-be1012 is DOWN: PING CRITICAL - Packet loss = 100% [20:34:22] ok have a good night [20:34:53] cscott: robh is on duty according to the channel topic. Maybe you can nudge him. [20:35:06] RECOVERY - check_puppetrun on indium is OK: OK: Puppet is currently enabled, last run 221 seconds ago with 0 failures [20:35:21] robh: nudge! 
https://gerrit.wikimedia.org/r/168089 (Add logs from Extension:Collection to logstash.) [20:35:28] (03CR) 10Andrew Bogott: [C: 032] Add logs from Extension:Collection to logstash. [puppet] - 10https://gerrit.wikimedia.org/r/168089 (owner: 10Cscott) [20:35:52] cscott: ? [20:36:09] you linked me to a merged patchset? [20:36:19] oh, andrew literally JSUST did it [20:36:21] heh [20:36:27] andrewbogott: KEEP MAKING MY JOB EASY [20:36:35] sorry, andrewbogott is just TOO GOOD [20:36:51] best duty week ever. [20:36:56] anyway, now off to look at my logs being stashed! [20:37:35] <^d> cscott: Is that why they call it logstash? Clever! [20:37:52] !log powercycling unresponsive ms-be1012 (this happened before, search SAL for hostname) [20:37:53] ms-be1012 is down (as mutante points out) [20:37:55] no, they call it that because the developers all have handlebar mustaches [20:37:57] oh, cool, you got it [20:37:59] Logged the message, Master [20:38:07] srsly, you guys are making this very easy. [20:38:09] * cscott twirls his mustache at ^d [20:38:18] * ^d looks for a razor [20:38:33] you shouldnt joke about a mans mustache. [20:38:48] robh: even if it looks like a big old log? [20:39:00] well, just grow a unix beard to go with it. [20:40:26] RECOVERY - Host ms-be1012 is UP: PING OK - Packet loss = 0%, RTA = 1.26 ms [20:40:33] !log forced puppet run on logstash1001 to pick up https://gerrit.wikimedia.org/r/168089 [20:40:39] Logged the message, Master [20:41:05] RECOVERY - very high load average likely xfs on ms-be1012 is OK: OK - load average: 19.29, 5.02, 1.69 [20:41:58] mutante: thanks! [20:42:50] robh: yw [20:44:08] man o man parsoid and OCG are chatty with their log messages [20:45:43] bd808: we could suppress request logs [20:46:01] if logstash has trouble coping [20:46:33] gwicke: It seems to be keeping up so far. 
I was just being whiny [20:46:53] okay ;) [20:47:00] hey, it's popular ;) [20:48:14] (03PS1) 10BBlack: Merge branch '3.0.6-plus' into 3.0.6-plus-wm [debs/varnish] (3.0.6-plus-wm) - 10https://gerrit.wikimedia.org/r/168179 [20:48:16] (03PS1) 10BBlack: varnish (3.0.6plus~x-wm3) unstable; urgency=low [debs/varnish] (3.0.6-plus-wm) - 10https://gerrit.wikimedia.org/r/168180 [20:48:53] (03CR) 10BBlack: [C: 032] Merge branch '3.0.6-plus' into 3.0.6-plus-wm [debs/varnish] (3.0.6-plus-wm) - 10https://gerrit.wikimedia.org/r/168179 (owner: 10BBlack) [20:49:26] (03CR) 10BBlack: [C: 032] varnish (3.0.6plus~x-wm3) unstable; urgency=low [debs/varnish] (3.0.6-plus-wm) - 10https://gerrit.wikimedia.org/r/168180 (owner: 10BBlack) [20:52:11] (03PS1) 10BryanDavis: logstash: Add support for redis password [puppet] - 10https://gerrit.wikimedia.org/r/168182 [20:54:30] (03CR) 10BryanDavis: "Cherry picked to deployment-salt for testing." [puppet] - 10https://gerrit.wikimedia.org/r/168182 (owner: 10BryanDavis) [20:58:25] (03CR) 10BryanDavis: [C: 031] "Confirmed that applying this patch in beta fixes redis input plugin." 
[puppet] - 10https://gerrit.wikimedia.org/r/168182 (owner: 10BryanDavis) [20:58:26] RECOVERY - Disk space on lanthanum is OK: DISK OK [20:59:12] robh: You can +2 and merge my logstash config patch if you are faster than andrewbogott -- https://gerrit.wikimedia.org/r/#/c/168182/ [21:01:11] (03CR) 10Andrew Bogott: [C: 032] logstash: Add support for redis password [puppet] - 10https://gerrit.wikimedia.org/r/168182 (owner: 10BryanDavis) [21:01:16] heh [21:03:23] (03PS1) 10Ottomata: Open up more zookeeper ports in ferm [puppet] - 10https://gerrit.wikimedia.org/r/168185 [21:06:56] !log forced puppet run on logstash1001 to pick up https://gerrit.wikimedia.org/r/#/c/168182/ [21:07:03] Logged the message, Master [21:12:23] (03PS1) 10Ejegg: RecordImpression log should not depend on qs order [puppet] - 10https://gerrit.wikimedia.org/r/168186 [21:15:23] bd808: man no where near as fast [21:15:28] (was on vendor phone call ;) [21:16:29] robh: he's cherry-picking all your easy dev happiness patches :) [21:16:48] !log updated OCG to version e977e2c8ecacea2b4dee837933cc2ffdc6b214cb [21:16:51] meh, whatevs, i go for the neglected tickets that piss folks off ;D [21:16:56] Logged the message, Master [21:17:56] RECOVERY - CI tmpfs disk space on lanthanum is OK: DISK OK [21:23:51] (03PS1) 10Andrew Bogott: Generate a useful failure message if archive-project-volumes is run in the wrong place. [puppet] - 10https://gerrit.wikimedia.org/r/168188 [21:25:13] (03PS2) 10Andrew Bogott: Generate a useful failure message if archive-project-volumes is run in the wrong place. [puppet] - 10https://gerrit.wikimedia.org/r/168188 [21:27:19] (03PS3) 10Andrew Bogott: Generate a useful failure message if archive-project-volumes is run in the wrong place. [puppet] - 10https://gerrit.wikimedia.org/r/168188 [21:28:18] (03CR) 10Andrew Bogott: [C: 032] Generate a useful failure message if archive-project-volumes is run in the wrong place. 
[puppet] - 10https://gerrit.wikimedia.org/r/168188 (owner: 10Andrew Bogott) [21:37:05] (03CR) 10AndyRussG: [C: 031] "Looks good! :)" [puppet] - 10https://gerrit.wikimedia.org/r/168186 (owner: 10Ejegg) [21:38:30] !log killed duplicate logstash services running on logstash1001 and restarted [21:38:36] Logged the message, Master [22:05:15] PROBLEM - Apache HTTP on mw1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:05:15] PROBLEM - HHVM rendering on mw1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:08:06] RECOVERY - Apache HTTP on mw1019 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.060 second response time [22:08:06] RECOVERY - HHVM rendering on mw1019 is OK: HTTP OK: HTTP/1.1 200 OK - 66746 bytes in 0.205 second response time [22:14:30] (03PS1) 10BryanDavis: logstash: Remove upstart jobs [puppet] - 10https://gerrit.wikimedia.org/r/168199 (https://bugzilla.wikimedia.org/72202) [22:18:11] (03PS1) 10Dzahn: Revert "Work around Bugzilla XML RPC bug with special Unicode characters" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/168200 [22:19:33] (03CR) 10Dzahn: [C: 032] "we ran into a new problem when Chase tried to get all the attachments via scripts and we suspect this could have broken it. reverting to c" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/168200 (owner: 10Dzahn) [22:20:24] (03CR) 10Dzahn: [V: 032] Revert "Work around Bugzilla XML RPC bug with special Unicode characters" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/168200 (owner: 10Dzahn) [22:21:29] (03CR) 10BryanDavis: "Cherry-picked to deployment-salt for testing. 
Applied cleanly on deployment-logstash1 and made both `sudo service logstash status` and `su" [puppet] - 10https://gerrit.wikimedia.org/r/168199 (https://bugzilla.wikimedia.org/72202) (owner: 10BryanDavis) [22:21:31] !log depooled amssq42 (esams text) for trusty testing [22:21:36] Logged the message, Master [22:22:16] robh, andrewbogott: One more logstash puppet patch at https://gerrit.wikimedia.org/r/#/c/168199/ [22:22:33] (03PS1) 10BBlack: Depool amssq42 backend for trusty testing [puppet] - 10https://gerrit.wikimedia.org/r/168201 [22:22:59] (03CR) 10BBlack: [C: 032 V: 032] Depool amssq42 backend for trusty testing [puppet] - 10https://gerrit.wikimedia.org/r/168201 (owner: 10BBlack) [22:34:23] (03CR) 10RobH: [C: 032] logstash: Remove upstart jobs [puppet] - 10https://gerrit.wikimedia.org/r/168199 (https://bugzilla.wikimedia.org/72202) (owner: 10BryanDavis) [22:34:29] bd808: handling now [22:34:47] sorry for the short delay, i get a bit stuck in and the pings dont always register [22:34:54] robh: I knew if I kept throwing them up you'd catch one. :) [22:35:24] robh: Oh no worries at all. I'm used to waiting months for a merge. ;) [22:35:33] bd808: So i merged it live, what system do i need to push the puppet run on for this again? [22:35:43] (its on puppetmaster, i just am lazy and dont wanna grep site.pp ;) [22:35:59] robh: I can handle the puppet runs. logstash100[123] [22:36:06] ahh, ok, all yours then =] [22:36:50] (03CR) 10Dzahn: "this fixed exporting attachments via XMLRPC for import into phab" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/168200 (owner: 10Dzahn) [22:37:38] (03CR) 10Dzahn: "reverted because it caused a new issue. this should not be applied to non-text attachments. 
see https://phabricator.wikimedia.org/T815" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/156100 (https://bugzilla.wikimedia.org/69747) (owner: 10Aklapper) [22:38:19] (03CR) 10Dzahn: "oops, broken link: T815" [wikimedia/bugzilla/modifications] - 10https://gerrit.wikimedia.org/r/156100 (https://bugzilla.wikimedia.org/69747) (owner: 10Aklapper) [22:42:06] !log forced puppet run on logstash1001 to pick up https://gerrit.wikimedia.org/r/#/c/168199/ [22:42:11] Logged the message, Master [22:42:29] (03PS1) 10Dzahn: tor - not using /srv, it's /var/lib [puppet] - 10https://gerrit.wikimedia.org/r/168203 [22:47:11] !log forced puppet run on logstash1002 to pick up https://gerrit.wikimedia.org/r/#/c/168199/ [22:47:19] Logged the message, Master [22:53:15] (03PS2) 10Dzahn: tor - not using /srv, it's /var/lib [puppet] - 10https://gerrit.wikimedia.org/r/168203 [22:53:20] <^d> I can do today's swat if nobody's dead set. [22:54:51] !log forced puppet run on logstash1003 to pick up https://gerrit.wikimedia.org/r/#/c/168199/ [22:54:56] Logged the message, Master [22:55:05] <^d> cscott: Ping for swat. I'm guessing 168108 doesn't have backports yet? [22:55:46] PROBLEM - Host virt1006 is DOWN: PING CRITICAL - Packet loss = 100% [22:56:26] (03CR) 10Dzahn: [C: 032] tor - not using /srv, it's /var/lib [puppet] - 10https://gerrit.wikimedia.org/r/168203 (owner: 10Dzahn) [22:59:27] RECOVERY - SSH on virt1006 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [22:59:46] RECOVERY - Host virt1006 is UP: PING OK - Packet loss = 0%, RTA = 3.32 ms [22:59:55] ^d: i'm here, and no it doesn't have backports yet [23:00:04] RoanKattouw, ^d, marktraceur, MaxSem, cscott: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20141022T2300). Please do the needful. [23:00:14] <^d> Ok sweet, we can do that :) [23:00:25] <^d> what branch(es) does it need to hit? [23:00:31] ^d: both [23:00:41] <^d> got it. 
[23:01:24] thanks
[23:03:19] <^d> Do de dooooo
[23:06:07] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[23:08:16] !log added jhernandez to wmf LDAP group
[23:08:22] Logged the message, Master
[23:08:54] <^d> cscott: https://gerrit.wikimedia.org/r/168212 for wmf3, wmf4 incoming.
[23:09:13] <^d> https://gerrit.wikimedia.org/r/168213
[23:09:54] <^d> Ok, just waiting on jenkinsss
[23:11:39] ^d: looks good to me
[23:17:26] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[23:21:50] !log demon Synchronized php-1.25wmf3/extensions/Collection: (no message) (duration: 00m 04s)
[23:21:57] Logged the message, Master
[23:22:24] that was collection, now we just have the config change to sync?
[23:22:27] !log demon Synchronized php-1.25wmf4/extensions/Collection: (no message) (duration: 00m 04s)
[23:22:32] Logged the message, Master
[23:22:33] <^d> Yep, just config now
[23:22:58] (CR) Chad: [C: 2] Use url-downloader proxy to reach PediaPress POD service. [mediawiki-config] - https://gerrit.wikimedia.org/r/168114 (https://bugzilla.wikimedia.org/71675) (owner: Cscott)
[23:23:10] (Merged) jenkins-bot: Use url-downloader proxy to reach PediaPress POD service. [mediawiki-config] - https://gerrit.wikimedia.org/r/168114 (https://bugzilla.wikimedia.org/71675) (owner: Cscott)
[23:25:34] !log demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
[23:25:39] Logged the message, Master
[23:25:41] !log demon Synchronized wmf-config/CommonSettings-labs.php: (no message) (duration: 00m 04s)
[23:25:42] <^d> Ok, and we're all done
[23:25:47] ok, i'll test it out!
[23:25:48] Logged the message, Master
[23:26:57] ^d: looks good, thanks!
[23:27:15] <^d> you're welcome :)
[23:27:35] * ^d declares swat a complete and total success by every metric you can think of
[23:28:50] centimeters?
[23:30:20] <^d> Hmm.
[23:30:32] <^d> We could measure LOC in centimeters perhaps.
[23:31:18] (PS1) Reedy: Update CirrusSearch for labs [mediawiki-config] - https://gerrit.wikimedia.org/r/168214 (https://bugzilla.wikimedia.org/72332)
[23:31:35] <^d> lol whoops.
[23:31:43] <^d> i forgot we configure those and there's no lvs.
[23:32:04] <^d> Reedy: gj to me :p
[23:32:41] (CR) Chad: [C: 2] Update CirrusSearch for labs [mediawiki-config] - https://gerrit.wikimedia.org/r/168214 (https://bugzilla.wikimedia.org/72332) (owner: Reedy)
[23:32:48] (Merged) jenkins-bot: Update CirrusSearch for labs [mediawiki-config] - https://gerrit.wikimedia.org/r/168214 (https://bugzilla.wikimedia.org/72332) (owner: Reedy)
[23:42:56] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[23:44:01] (PS1) Dzahn: tor - add custom motd message about arm [puppet] - https://gerrit.wikimedia.org/r/168217
[23:45:29] ori: i'm using your motd module, i think it hasn't been before :)
[23:45:43] you added it from vagrant
[23:45:46] mutante: yep
[23:45:48] cool!
[23:46:19] (CR) Ori.livneh: tor - add custom motd message about arm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/168217 (owner: Dzahn)
[23:47:15] (PS2) Dzahn: tor - add custom motd message about arm [puppet] - https://gerrit.wikimedia.org/r/168217
[23:47:16] good one, yep
[23:47:49] (CR) Dzahn: tor - add custom motd message about arm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/168217 (owner: Dzahn)
[23:47:58] (CR) Dzahn: [C: 2] tor - add custom motd message about arm [puppet] - https://gerrit.wikimedia.org/r/168217 (owner: Dzahn)
[23:53:42] (PS1) Dzahn: tor - fix syntax error in motd script [puppet] - https://gerrit.wikimedia.org/r/168219
[23:55:23] ori: i think it's missing to add #!/bin/sh
[23:55:36] the other files in update-motd.d have that
[23:55:38] and chmod+x
[23:56:47] (CR) Dzahn: [C: 2] tor - fix syntax error in motd script [puppet] - https://gerrit.wikimedia.org/r/168219 (owner: Dzahn)