[00:02:51] (PS1) BBlack: switch neon v6 revdns to mapped [dns] - https://gerrit.wikimedia.org/r/160567
[00:03:40] (CR) BBlack: [C: 2] switch neon v6 revdns to mapped [dns] - https://gerrit.wikimedia.org/r/160567 (owner: BBlack)
[00:22:19] (PS1) BBlack: add all ytterbium addrs to git repl rules [puppet] - https://gerrit.wikimedia.org/r/160573
[00:22:44] (CR) BBlack: [C: 2 V: 2] add all ytterbium addrs to git repl rules [puppet] - https://gerrit.wikimedia.org/r/160573 (owner: BBlack)
[00:24:05] (PS4) BBlack: Disable IPv6 global autoconf if explicit addr config for interface [puppet] - https://gerrit.wikimedia.org/r/128089
[00:25:31] (PS5) BBlack: Disable IPv6 global autoconf if explicit addr config for interface [puppet] - https://gerrit.wikimedia.org/r/128089
[00:27:32] (CR) BBlack: [C: 2] Disable IPv6 global autoconf if explicit addr config for interface [puppet] - https://gerrit.wikimedia.org/r/128089 (owner: BBlack)
[00:51:07] (PS3) Krinkle: Revert "Set wgMathDisableTexFilter to fix performance regression" [mediawiki-config] - https://gerrit.wikimedia.org/r/158559 (https://bugzilla.wikimedia.org/49169) (owner: Physikerwelt)
[00:51:19] (CR) Krinkle: "Clean up weird topic. Remove stray "See" line." [mediawiki-config] - https://gerrit.wikimedia.org/r/158559 (https://bugzilla.wikimedia.org/49169) (owner: Physikerwelt)
[00:56:00] (CR) Krinkle: "-fixme; Fixed in d703e4598bc." [mediawiki-config] - https://gerrit.wikimedia.org/r/157993 (owner: Reedy)
[01:04:56] (Abandoned) Dzahn: exim templates - deprecated variable syntax [puppet] - https://gerrit.wikimedia.org/r/154371 (owner: Dzahn)
[01:11:25] (PS1) Gergő Tisza: Enable 1:1000 attribution logging for MediaViewer [mediawiki-config] - https://gerrit.wikimedia.org/r/160578
[01:11:34] (Abandoned) Dzahn: let NDAed people login on servermon [puppet] - https://gerrit.wikimedia.org/r/159419 (owner: Dzahn)
[01:12:14] (Abandoned) Dzahn: retab InitialiseSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/158313 (owner: Dzahn)
[01:12:44] (Abandoned) Dzahn: remove blog.wikimedia.org related things [puppet] - https://gerrit.wikimedia.org/r/153117 (owner: Dzahn)
[01:13:28] (Abandoned) Dzahn: add halfak to researchers admin group [puppet] - https://gerrit.wikimedia.org/r/155452 (owner: Dzahn)
[01:15:35] (CR) Dzahn: "all you are changing here should likely be reported as a bug against MediaViewer. i already changed it to use thumbnail before seeing this," [puppet] - https://gerrit.wikimedia.org/r/160383 (owner: MZMcBride)
[01:17:52] (CR) Dzahn: "please rebase" [puppet] - https://gerrit.wikimedia.org/r/160383 (owner: MZMcBride)
[01:18:10] (CR) MZMcBride: "Shrug." [puppet] - https://gerrit.wikimedia.org/r/160383 (owner: MZMcBride)
[01:21:17] (CR) MZMcBride: "Rebase button doesn't work due to a path conflict? Sigh." [puppet] - https://gerrit.wikimedia.org/r/160383 (owner: MZMcBride)
[01:27:28] (CR) Dzahn: "i'm not sure yet how to turn something from misc to module structure in multiple smaller changes. i could move copy files one by one witho" [puppet] - https://gerrit.wikimedia.org/r/145472 (owner: Dzahn)
[01:41:44] (PS1) Dzahn: redirect http->https on ishmael [puppet] - https://gerrit.wikimedia.org/r/160585
[01:45:43] (CR) Dzahn: redirect http->https on racktables (1 comment) [puppet] - https://gerrit.wikimedia.org/r/160164 (owner: Dzahn)
[01:47:34] (CR) Dzahn: "it's a bit surprising to see that discussion on gerrit now but not on the other services. how is functionality affected negatively?" [puppet] - https://gerrit.wikimedia.org/r/159729 (https://bugzilla.wikimedia.org/38516) (owner: Chmarkine)
[01:48:05] (CR) Dzahn: [C: 1] Add mobile subdomains for Wikimedia chapter wikis [dns] - https://gerrit.wikimedia.org/r/156596 (owner: Reedy)
[01:48:21] (CR) Dzahn: "i wish i could +1 AND remove myself at the same time" [dns] - https://gerrit.wikimedia.org/r/156596 (owner: Reedy)
[01:48:46] (PS2) Dzahn: redirect http->https on ishmael [puppet] - https://gerrit.wikimedia.org/r/160585
[01:49:19] (CR) Dzahn: "submodules will drive me crazy" [puppet] - https://gerrit.wikimedia.org/r/160585 (owner: Dzahn)
[01:55:38] (CR) Dzahn: "removing self" [puppet] - https://gerrit.wikimedia.org/r/153600 (owner: Yuvipanda)
[01:56:29] (PS3) Dzahn: redirect http->https on ishmael [puppet] - https://gerrit.wikimedia.org/r/160585
[01:59:04] (CR) Dzahn: "it's simply copied from kibana and contacts, not saying it's needed in this case but once it was added by somebody for monitoring the stat" [puppet] - https://gerrit.wikimedia.org/r/160528 (owner: Chmarkine)
[02:02:47] (PS4) Dzahn: redirect http->https on ishmael [puppet] - https://gerrit.wikimedia.org/r/160585
[02:02:51] (CR) jenkins-bot: [V: -1] redirect http->https on ishmael [puppet] - https://gerrit.wikimedia.org/r/160585 (owner: Dzahn)
[02:03:17] (CR) Dzahn: [C: -2] "ok, that's it" [puppet] - https://gerrit.wikimedia.org/r/160585 (owner: Dzahn)
[02:03:22] (Abandoned) Dzahn: redirect http->https on ishmael [puppet] - https://gerrit.wikimedia.org/r/160585 (owner: Dzahn)
[02:06:15] (CR) Dzahn: "i'd measure how much time we spend on rebases of this kind" [puppet] - https://gerrit.wikimedia.org/r/160585 (owner: Dzahn)
[02:09:25] (PS1) Dzahn: redirect http->https on ishmael [puppet] - https://gerrit.wikimedia.org/r/160588
[02:10:50] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3610 MB (3% inode=99%):
[02:16:58] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:24:20] PROBLEM - puppet last run on mw1072 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:24:39] PROBLEM - puppet last run on mw1006 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:31:16] !log LocalisationUpdate completed (1.24wmf20) at 2014-09-16 02:31:16+00:00
[02:31:23] Logged the message, Master
[02:35:09] RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[02:38:02] (CR) Springle: [C: 2] repool db1036, depool db1002 [mediawiki-config] - https://gerrit.wikimedia.org/r/160545 (owner: Springle)
[02:38:46] (Merged) jenkins-bot: repool db1036, depool db1002 [mediawiki-config] - https://gerrit.wikimedia.org/r/160545 (owner: Springle)
[02:39:33] !log springle Synchronized wmf-config/db-eqiad.php: repool db1036, depool db1002 (duration: 00m 07s)
[02:39:38] Logged the message, Master
[02:42:29] RECOVERY - puppet last run on mw1072 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[02:42:43] (PS1) Springle: depool s2 db1054, s3 db1027, s4 db1056, s5 db1037 for codfw cloning [mediawiki-config] - https://gerrit.wikimedia.org/r/160599
[02:43:58] (CR) Springle: [C: 2] depool s2 db1054, s3 db1027, s4 db1056, s5 db1037 for codfw cloning [mediawiki-config] - https://gerrit.wikimedia.org/r/160599 (owner: Springle)
[02:43:59] RECOVERY - puppet last run on mw1006 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[02:44:02] (Merged) jenkins-bot: depool s2 db1054, s3 db1027, s4 db1056, s5 db1037 for codfw cloning [mediawiki-config] - https://gerrit.wikimedia.org/r/160599 (owner: Springle)
[02:45:49] PROBLEM - puppet last run on mw1141 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:50:50] !log springle Synchronized wmf-config/db-eqiad.php: depool s2 db1054, s3 db1027, s4 db1056, s5 db1037 for codfw cloning (duration: 01m 12s)
[02:50:56] Logged the message, Master
[02:52:48] (Abandoned) Chmarkine: racktables - remove RewriteCond on /status [puppet] - https://gerrit.wikimedia.org/r/160528 (owner: Chmarkine)
[02:52:58] PROBLEM - puppet last run on mw1087 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:53:52] !log xtrabackup clone db1054 to db2017
[02:53:58] Logged the message, Master
[02:57:51] (CR) Chmarkine: [C: 1] redirect http->https on ishmael [puppet] - https://gerrit.wikimedia.org/r/160588 (owner: Dzahn)
[02:59:38] PROBLEM - puppet last run on mw1058 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:00:29] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:03:18] RECOVERY - puppet last run on mw1141 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[03:04:47] !log LocalisationUpdate completed (1.24wmf21) at 2014-09-16 03:04:46+00:00
[03:04:53] Logged the message, Master
[03:05:57] (CR) Tim Starling: [C: 1] HHVM: increase JitAColdSize to 60 MiB [puppet] - https://gerrit.wikimedia.org/r/160394 (owner: Ori.livneh)
[03:09:37] (CR) Tim Starling: [C: 1] HHVM: Increase the maximum number of open files to 65536 [puppet] - https://gerrit.wikimedia.org/r/160510 (owner: Hoo man)
[03:11:28] RECOVERY - puppet last run on mw1087 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[03:11:39] !log xtrabackup clone db1027 to db2018
[03:11:46] Logged the message, Master
[03:18:49] RECOVERY - puppet last run on mw1058 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[04:01:05] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Sep 16 04:01:05 UTC 2014 (duration 1m 4s)
[04:01:12] Logged the message, Master
[04:21:15] (CR) MZMcBride: "Broadly, long-term, I'm not sure we want to use "m." I'm wary of expanding its scope." [dns] - https://gerrit.wikimedia.org/r/156596 (owner: Reedy)
[05:31:34] !log xtrabackup clone db1056 to db2019
[05:31:40] Logged the message, Master
[06:28:12] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Epic puppet fail
[06:28:52] PROBLEM - puppet last run on db1034 is CRITICAL: CRITICAL: Epic puppet fail
[06:28:55] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: Epic puppet fail
[06:29:23] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:25] !log xtrabackup clone db1037 to db2023
[06:29:32] Logged the message, Master
[06:29:53] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:03] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:03] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:03] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:30:03] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:12] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:12] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:20] <_joe_> springle: when I see you logging this, I think about the old days when we did mysqldump | nc
[06:30:24] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:33] <_joe_> and I DON'T regret them. At all
[06:30:42] PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:42] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:46] _joe_: nor do i
[06:30:52] PROBLEM - puppet last run on db1003 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:52] PROBLEM - puppet last run on mw1088 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:53] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:03] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:05] <_joe_> it's mod_passenger o'clock
[06:32:10] _joe_: though i mysqldump to shake things up sometimes, jic xtrabackup bug appears
[06:33:53] <_joe_> uhm fair enough
[06:34:11] generally at least one slave per shard gets a clean dump/reload from a known clean source
[06:34:25] then the others binary clone
[06:35:56] <_joe_> fiber is here
[06:35:59] <_joe_> bbl
[06:45:02] RECOVERY - puppet last run on mw1088 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[06:45:23] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[06:45:24] RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:45:32] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:45:42] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:46:02] RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[06:46:03] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:46:03] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[06:46:03] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:46:12] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:46:22] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:46:22] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[06:46:52] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[06:46:54] RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[06:47:03] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:47:22] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[06:47:33] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[06:48:02] RECOVERY - puppet last run on db1003 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:53:12] RECOVERY - Disk space on ms1004 is OK: DISK OK
[06:56:12] PROBLEM - Disk space on ms1004 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=94%): /var/lib/ureadahead/debugfs 0 MB (0% inode=94%):
[07:49:39] PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: Epic puppet fail
[08:08:04] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[08:12:14] (CR) Filippo Giunchedi: "sure we could do that, let's see what other reviewers think and I can add those too" [puppet] - https://gerrit.wikimedia.org/r/160419 (owner: Filippo Giunchedi)
[08:23:00] (CR) Filippo Giunchedi: [C: 1] "to be merged in two days" [puppet] - https://gerrit.wikimedia.org/r/160497 (owner: Matanya)
[08:24:49] (CR) Filippo Giunchedi: [C: 1] Add robots.txt rewrite rule where wiki is public [puppet] - https://gerrit.wikimedia.org/r/147487 (owner: Reedy)
[08:49:03] (CR) Hashar: [C: -1] contint: Ensure nodejs-legacy is installed (1 comment) [puppet] - https://gerrit.wikimedia.org/r/159226 (owner: Krinkle)
[09:07:06] _joe_: hello! :-D
[09:07:17] <_joe_> hashar: hey
[09:07:23] adding an item for your review queue which is hhvm related https://gerrit.wikimedia.org/r/#/c/150813/
[09:07:41] i.e. use /usr/bin/apt-get build-dep hhvm to install hhvm build dependencies instead of a hardcoded list of packages hehe
[09:08:09] not urgent, just wanna make sure it is sitting in your backlog
[09:08:21] <_joe_> it is
[09:08:29] <_joe_> but my backlog grows every day :/
[09:09:49] we are all in the same boat hehe
[09:20:58] <_joe_> !log reimaging mw1018 and mw1021 w HAT: removing from pybal, etc.
[09:21:05] Logged the message, Master
[09:22:43] (CR) Hashar: "On eqiad labs instance, /var/ is only 2GB and elastic search defaults to /var/lib/elasticsearch." [puppet] - https://gerrit.wikimedia.org/r/160524 (owner: Manybubbles)
[09:25:00] i wonder if someone can run http://gerrit-review.googlesource.com/Documentation/cmd-gc.html for wikibase?
[09:25:11] or whatever magic ^d did to 'compact' mediawiki
[09:25:52] it takes awfully long time to clone wikibase, and thus composer install (and whatever jenkins does)
[09:26:38] looks like we can do via ssh?
[09:39:24] aude: it probably could use to be repacked
[09:39:35] aude: a way to tell is to do a fresh clone of the repository
[09:39:42] check the disk usage of the .git repository
[09:39:46] then use git repack -ad
[09:39:52] and compare the disk usage after
[09:39:55] hashar: i can try
[09:40:07] if there is a significant difference, definitely file a bug to have the repo repacked on Gerrit side
[09:40:19] though there might be a cronjob doing it automatically
[09:40:32] also git pack-refs :D
[09:40:36] suppose i have to ask ^d
[09:40:45] wikimedia > git/gerrit
[09:40:52] chad is unlikely to proceed it
[09:41:07] he basically burned out maintaining Gerrit and I am pretty sure he is no more involved in Gerrit
[09:41:11] i expect this would help with jenkins, as while normally jenkins caches git repos, not sure via composer
[09:41:16] hashar: i see
[09:41:34] on production slave, we clone locally from a gerrit replicate on the same disk
[09:41:42] so git creates hardlink for the .git objects which is super fast
[09:41:50] on labs slave, we do full clones from gerrit (iirc)
[09:41:52] that should work with composer?
[09:41:58] composer I have no clue
[09:42:00] ah, full clone
[09:42:11] addshore tweaked a few things a couple weeks ago
[09:42:15] ok
[09:42:18] and I noticed that composer was always doing a full clone
[09:42:24] and not using any local cache apparently
[09:42:30] :(
[09:44:14] alright, repack, pack-refs and gc don't seem to help :(
[09:44:31] maybe it already is done or done on cron
[09:44:51] aude: what is the repo name ?
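The size check described above (fresh clone, measure .git, repack, measure again) can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names git_dir_kb and repack_and_compare are made up here, and git -C is assumed to be available (git 1.8.5 or later).

```shell
#!/bin/sh
# Sketch only: compare the size of .git before and after a repack.
# Function names are illustrative and do not come from the log.

# Print the disk usage of a repository's .git directory in KiB.
git_dir_kb() {
    du -sk "$1/.git" | cut -f1
}

# Repack the repository at $1 and report the before/after sizes.
repack_and_compare() {
    repo=$1
    before=$(git_dir_kb "$repo")
    # -a: put all reachable objects into a single pack
    # -d: delete packs made redundant by the new one
    git -C "$repo" repack -a -d -q
    # Also pack refs, as suggested in the log.
    git -C "$repo" pack-refs
    after=$(git_dir_kb "$repo")
    echo "before=${before}K after=${after}K"
}
```

Running repack_and_compare /path/to/fresh/clone prints both sizes; a large drop suggests the server-side repository would also benefit from a repack.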
[09:45:00] mediawiki-extensions-Wikibase
[09:47:26] hashar: doing the things in https://www.rallydev.com/community/engineering/shrinking-git-repository-move-githubcom
[09:47:34] went from 45M to 16M
[09:47:43] don't know if any of those things are evil though
[09:48:30] or if any are ok
[09:49:29] PROBLEM - Host mw1021 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[09:49:39] PROBLEM - Host mw1018 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[09:51:21] <_joe_> fuck
[09:51:28] <_joe_> I forgot to ack those
[09:51:39] aude: yeah it is just 45M on a fresh clone
[09:54:50] RECOVERY - Host mw1018 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms
[09:54:57] aude: git gc --aggressive does the trick
[09:55:04] hashar: i see
[09:55:08] aude: it spent more time analyzing the objects to produce a better compression
[09:55:19] we can probably have that done
[09:55:38] make jenkins more happy :)
[09:56:01] aude: most of the time is spent in resolving the delta which is CPU bound :-D
[09:56:05] yeah
[09:56:15] or maybe not
[09:56:43] git repack -A -d is nice
[09:56:53] it moves out of the pack files any object which is not reachable
[09:56:57] so you can see them under .git/objects
[09:57:02] that is known as 'loose' objects
[09:57:07] they are reclaimed with git gc
[09:57:18] * aude nods
[09:58:10] https://bugzilla.wikimedia.org/show_bug.cgi?id=70883
[10:02:52] aude: note that my group "release engineering" might well take over maintenance of Gerrit :D
[10:02:56] nothing happened yet afaik
[10:03:16] git gc --aggressive indeed works
[10:04:05] don't know how much there is to maintain? except restarting when it crashes and adding new repos
[10:04:20] and things like gc occasionally
[10:05:02] aude: git gc uses git repack
[10:05:09] i see
[10:05:19] RECOVERY - Host mw1021 is UP: PING OK - Packet loss = 0%, RTA = 0.86 ms
[10:05:30] with --aggressive that pass to git repack the options --depth=250 --window=250
[10:05:35] which tweak the delta compression
[10:05:47] ah
[10:05:56] as for what are the best values to use, one would need a PhD in delta compression :]
[10:06:34] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Epic puppet fail
[10:06:36] you can read the man page of git-repack :-]
[10:06:40] :)
[10:07:02] it's up to us to do this for our github things
[10:07:08] so probably should read
[10:07:18] but i think wikibase is most problematic as it's biggest and oldest
[10:09:49] heavily changed ones can probably benefit from a recompression as well
[10:09:56] mobilefrontend / visualeditor / flow etc
[10:10:04] * aude nods
[10:10:43] hashar: I also fixed the androidsdk on trusty bug, btw :)
[10:11:01] \O/
[10:11:16] YuviPanda: I think the puppet class is no more applied on Trusty instances
[10:11:33] Iirc I added a if ubuntu_version ( <= Trusty ) or something like that
[10:11:44] hashar: yeah, but we have it applied on toollabs and we have a test trusty instance, and once I setup monitoring for puppet failures it was frustrating so I went and fixed it :)
[10:12:13] ahhh
[10:15:27] hashar: re: diamond, I've landed a few patches in Diamond itself and we have some custom collectors, it's fairly easy to do :)
[10:16:33] YuviPanda: yeah just replied on https://bugzilla.wikimedia.org/show_bug.cgi?id=70862
[10:16:34] hashar: I'm currently building an icinga replacement for labs (shinken)
[10:16:41] basically got too many things to do :-D
[10:16:49] hashar: I'll put it on my 'spare bandwidth' todo :)
[10:17:02] good to know you are going to work on shinken
[10:17:41] :)
[10:18:35] (PS1) Giuseppe Lavagetto: mediawiki: make HAT appservers a separate cluster [puppet] - https://gerrit.wikimedia.org/r/160624
[10:18:40] hashar: feel free to file more bugs for things you think should be monitored. I'll work through them :)
[10:18:59] will do!
[10:19:09] hashar: ty
[10:25:40] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[10:47:18] (CR) Giuseppe Lavagetto: [C: -1] "I need to add the corresponding config stanzas to the ganglia class, and probably add a couple more aggregators." [puppet] - https://gerrit.wikimedia.org/r/160624 (owner: Giuseppe Lavagetto)
[10:57:21] (PS1) Yuvipanda: [WIP] Initial shinken setup for labs [puppet] - https://gerrit.wikimedia.org/r/160626
[10:58:03] (PS1) Matanya: ssh: disable root login [puppet] - https://gerrit.wikimedia.org/r/160628
[10:59:13] (CR) jenkins-bot: [V: -1] [WIP] Initial shinken setup for labs [puppet] - https://gerrit.wikimedia.org/r/160626 (owner: Yuvipanda)
[10:59:30] (PS2) Yuvipanda: [WIP] Initial shinken setup for labs [puppet] - https://gerrit.wikimedia.org/r/160626
[11:00:29] (CR) Matanya: "This change is in the light of pushing all access via sudo without direct root login." [puppet] - https://gerrit.wikimedia.org/r/160628 (owner: Matanya)
[11:04:07] (PS3) Yuvipanda: [WIP] Initial shinken setup for labs [puppet] - https://gerrit.wikimedia.org/r/160626
[11:09:00] PROBLEM - puppet last run on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:09:17] (CR) Alexandros Kosiaris: [C: -1] "In all the time since the I 've only used root login for graceful-apache-all and a one-off rsync. We discussed with Giuseppe and Ori a pl" [puppet] - https://gerrit.wikimedia.org/r/160628 (owner: Matanya)
[11:10:34] !log reedy Purged l10n cache for 1.24wmf15
[11:10:39] Logged the message, Master
[11:12:08] !log reedy Purged l10n cache for 1.24wmf18
[11:12:13] Logged the message, Master
[11:12:45] !log reedy Purged l10n cache for 1.24wmf19
[11:12:50] Logged the message, Master
[11:21:31] (PS1) Reedy: Fix MEDIAWIKI_STAGING_DIR to point to /srv/mediawiki-staging [mediawiki-config] - https://gerrit.wikimedia.org/r/160629
[11:22:36] (CR) Reedy: [C: 2] Fix MEDIAWIKI_STAGING_DIR to point to /srv/mediawiki-staging [mediawiki-config] - https://gerrit.wikimedia.org/r/160629 (owner: Reedy)
[11:22:40] (Merged) jenkins-bot: Fix MEDIAWIKI_STAGING_DIR to point to /srv/mediawiki-staging [mediawiki-config] - https://gerrit.wikimedia.org/r/160629 (owner: Reedy)
[11:23:40] (PS1) Reedy: Remove 1.24wmf10 through 1.24wmf14 [mediawiki-config] - https://gerrit.wikimedia.org/r/160630
[11:24:00] (CR) Reedy: [C: 2] Remove 1.24wmf10 through 1.24wmf14 [mediawiki-config] - https://gerrit.wikimedia.org/r/160630 (owner: Reedy)
[11:24:05] (Merged) jenkins-bot: Remove 1.24wmf10 through 1.24wmf14 [mediawiki-config] - https://gerrit.wikimedia.org/r/160630 (owner: Reedy)
[11:25:04] !log reedy Synchronized docroot and w: (no message) (duration: 00m 35s)
[11:25:09] Logged the message, Master
[11:37:50] PROBLEM - RAID on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:37:50] PROBLEM - puppet last run on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:38:40] RECOVERY - puppet last run on fenari is OK: OK: Puppet is currently enabled, last run 1270 seconds ago with 0 failures
[11:38:40] RECOVERY - RAID on fenari is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[11:39:13] (PS4) Yuvipanda: [WIP] Initial shinken setup for labs [puppet] - https://gerrit.wikimedia.org/r/160626
[11:43:58] (CR) Alexandros Kosiaris: [C: 2] Ensure cron absent for misc::nfs-server::home::backup [puppet] - https://gerrit.wikimedia.org/r/159716 (owner: Alexandros Kosiaris)
[11:46:22] (PS1) Filippo Giunchedi: partially enable outbound SMTP STARTTLS support [puppet] - https://gerrit.wikimedia.org/r/160632
[11:48:24] (PS1) Yuvipanda: phabricator: Set base-uri to https for labs [puppet] - https://gerrit.wikimedia.org/r/160633
[11:48:32] godog: akosiaris_ want to merge a trivial patch? ^
[11:48:47] will unblock quim on the phabricator migration
[11:50:24] (PS2) Filippo Giunchedi: partially enable outbound SMTP STARTTLS support [puppet] - https://gerrit.wikimedia.org/r/160632
[11:50:30] YuviPanda: sure, 160633 ?
[11:50:36] godog: ya
[11:51:10] (PS3) Nemo bis: partially enable outbound SMTP STARTTLS support [puppet] - https://gerrit.wikimedia.org/r/160632 (owner: Filippo Giunchedi)
[11:51:13] (CR) Filippo Giunchedi: [C: 1] phabricator: Set base-uri to https for labs [puppet] - https://gerrit.wikimedia.org/r/160633 (owner: Yuvipanda)
[11:51:18] godog: I've it hacked in locally right now, and it works (http://phab-01.wmflabs.org/config/issue/ vs https://phab-01.wmflabs.org/config/issue/)
[11:51:19] YuviPanda: LGTM!
[11:51:27] godog: can you +2 as well? I can't :)
[11:53:01] YuviPanda: sure didn't realize it :) I can merge too if you wish
[11:53:12] godog: yes please!
[11:53:24] (CR) Filippo Giunchedi: [C: 2 V: 2] phabricator: Set base-uri to https for labs [puppet] - https://gerrit.wikimedia.org/r/160633 (owner: Yuvipanda)
[11:53:29] godog: no +2 for me until I formally move to ops in november, so I've to bug people until then
[11:53:59] YuviPanda: haha fair enough! {{done}}
[11:54:05] godog: cool, thanks!
[11:54:14] yw
[12:03:19] jeremyb: maint-announce already over like https://rt.wikimedia.org/Ticket/Display.html?id=8310 are safe to be resolved, correct?
[12:05:02] hoo: any news/update on https://rt.wikimedia.org/Ticket/Display.html?id=8286 ?
[12:05:21] godog: I can't see it anyway, can I?
[12:06:14] ah, I've got mail... I see
[12:06:55] godog: Can I just reply to the mail or will that be rejected by RT?
[12:07:17] hoo: yep replying to the email should be fine
[12:07:40] tired of having to name the reasons again
[12:07:50] maybe I have it somewhere still
[12:11:30] PROBLEM - puppet last run on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:13:24] godog: Replied
[12:17:04] (PS5) Yuvipanda: [WIP] Initial shinken setup for labs [puppet] - https://gerrit.wikimedia.org/r/160626
[12:20:58] !log reedy Synchronized php-1.24wmf20/LocalSettings.php: Fix path to be /srv based (duration: 00m 32s)
[12:21:04] Logged the message, Master
[12:22:23] !log temporarily chgrp wikidev /var/log/hhvm/error.log on mw1018
[12:22:26] Reedy: ^
[12:22:27] Logged the message, Master
[12:22:45] thanks :)
[12:23:01] yw
[12:23:34] Shame nothing else is being added to it :)
[12:23:49] PROBLEM - very high load average likely xfs on ms-be1014 is CRITICAL: CRITICAL - load average: 221.91, 111.26, 53.76
[12:24:02] hmmm
[12:24:09] godog: ^
[12:24:15] akosiaris_: system works!
[12:24:57] Server: Apache [12:24:58] X-Powered-By: HHVM/3.3.0-dev [12:26:23] akosiaris_: yep there's a deadlock in xfs somewhere (my best guess so far) [12:26:24] I guess apache is serving the 404 [12:26:46] !log reboot ms-be1014, xfs issues [12:26:51] Logged the message, Master [12:27:09] godog: yeah, same here. Good to know we can catch that finally [12:27:20] godog: Mind temporarily chgrp-ing the apache2.log/apache2 folder too please? [12:28:34] Reedy: sure, done [12:28:39] PROBLEM - swift-account-auditor on ms-be1014 is CRITICAL: Connection refused by host [12:28:40] PROBLEM - swift-container-auditor on ms-be1014 is CRITICAL: Connection refused by host [12:28:40] PROBLEM - swift-account-replicator on ms-be1014 is CRITICAL: Connection refused by host [12:28:43] (03PS6) 10Yuvipanda: [WIP] Initial shinken setup for labs [puppet] - 10https://gerrit.wikimedia.org/r/160626 [12:29:10] PROBLEM - swift-object-updater on ms-be1014 is CRITICAL: Connection refused by host [12:29:10] PROBLEM - SSH on ms-be1014 is CRITICAL: Connection refused [12:29:12] akosiaris_: yup, very crude check but at least it is in place, if it works out we could even make them reboot themselves [12:29:25] * YuviPanda sets up cron job to rebbot things every 30m [12:29:29] PROBLEM - swift-account-reaper on ms-be1014 is CRITICAL: Connection refused by host [12:29:30] PROBLEM - swift-object-replicator on ms-be1014 is CRITICAL: Connection refused by host [12:29:49] YuviPanda: https://gerrit.wikimedia.org/r/#/c/124861/ might also be worth looking. 
In very early stages but something useful might be in there [12:30:00] PROBLEM - swift-object-auditor on ms-be1014 is CRITICAL: Connection refused by host [12:30:00] PROBLEM - swift-container-updater on ms-be1014 is CRITICAL: Connection refused by host [12:30:01] Nothing exctiing there either [12:30:10] PROBLEM - check if dhclient is running on ms-be1014 is CRITICAL: Connection refused by host [12:30:16] akosiaris_: aaah, coool [12:30:19] didn't know about it [12:30:52] akosiaris_: I'll steal some. [12:31:18] akosiaris_: with my change, I'm trying to 'quickly' get it to a point where it can monitor the current stuff icinga monitors for labs (check graphite and also a check http), and then go from there [12:31:36] because I haven't separated out the deamons, it won't scale as well, so I'll perhaps steal that from yours [12:32:14] YuviPanda: yeah ok, different directions then. I was looking at the daemon architectural level in order for it to scale [12:32:24] yeah, I agree [12:32:26] which is why we are looking at shinken in the first place anyway [12:32:31] right [12:32:44] akosiaris_: I was gonna do that after, but I'll include that in the first patch shortly [12:32:52] akosiaris_: for labs I suppose all deamons will still run on one host [12:32:55] for now at least [12:33:06] (03PS1) 10Springle: repool s2 db1054, s3 db1027, s4 db1056, s5 db1037 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160639 [12:34:12] (03CR) 10Springle: [C: 032] repool s2 db1054, s3 db1027, s4 db1056, s5 db1037 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160639 (owner: 10Springle) [12:34:16] (03Merged) 10jenkins-bot: repool s2 db1054, s3 db1027, s4 db1056, s5 db1037 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160639 (owner: 10Springle) [12:35:17] !log springle Synchronized wmf-config/db-eqiad.php: repool s2 db1054, s3 db1027, s4 db1056, s5 db1037 (duration: 00m 10s) [12:35:22] Logged the message, Master [12:35:40] YuviPanda: yeah ok [12:38:54] anyone logged in 
ms-be1014 console? getting "no more sessions" [12:40:07] <_joe_> godog: racadm reset? [12:40:46] should have been more specific, not console but idrac ssh itself [12:41:06] <_joe_> oh [12:44:20] PROBLEM - swift eqiad-prod container availability on tungsten is CRITICAL: CRITICAL: 10.34% of data under the critical threshold [96.0] [12:47:21] akosiaris_: it seems to be using mongo for a bunch of things [12:47:25] hmm [12:49:28] <_joe_> mongo? [12:49:32] <_joe_> WAT? [12:49:34] ino [12:50:02] > The WebUI use mongodb to store all user preferences, dashboards and other information. [12:50:22] <_joe_> oh ok [12:50:32] <_joe_> I guess mongo can handle that load [12:50:41] ya [12:50:48] <_joe_> as long as no one writes while another one reads [12:51:20] not to mention distance of mars relative to jupiter? [12:53:34] mh perhaps robh knows how to fix 'No more sessions are available for this type of connection!' on idrac ssh even when seemingly people are not connected [12:58:06] (PS1) Reedy: Update symlinks to use /srv/mediawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/160640 [12:58:14] _joe_: ^^ ;) [13:02:43] _joe_: tested on mw1018 [13:02:47] sudo -u mwdeploy ln -nsf $(readlink /srv/mediawiki/docroot/wikipedia.org/w | sed s,/usr/local/apache/common,/srv/mediawiki,) $(echo /srv/mediawiki/docroot/wikipedia.org/w | sed s,/usr/local/apache/common/,/srv/mediawiki,) [13:02:47] sudo -u mwdeploy ln -nsf $(readlink /srv/mediawiki/docroot/wikipedia.org/images | sed s,/usr/local/apache/common,/srv/mediawiki,) $(echo /srv/mediawiki/docroot/wikipedia.org/images | sed s,/usr/local/apache/common/,/srv/mediawiki,) [13:02:55] http://en.wikipedia.org/wiki/Special:Version [13:02:56] mw1018 200 OK 128284 [13:02:56] mw1021 404 Not Found [13:03:05] <_joe_> :))) [13:03:55] <_joe_> Reedy: https://en.wikiversity.org/wiki/Wikiversity:Main_Page * 403 Forbidden [13:03:59] <_joe_> :/ [13:04:28] Where's that?
[13:04:35] I only fixed Wikipedia on mw1018 fyi [13:04:40] <_joe_> ok [13:04:53] <_joe_> shouldn't that be done not-by-hand? [13:05:56] yeah, which is what https://gerrit.wikimedia.org/r/160640 is for [13:06:43] review, merge, git pull, sync-docroot [13:06:54] <_joe_> :) ok [13:07:09] <_joe_> I'd be careful in doing that anyways [13:07:32] akosiaris_: packaging this feels like it's going to be a bit of a nightmare. [13:07:37] are some servers in weird inconsistent states? [13:12:48] YuviPanda: packaging what ? [13:13:11] a just read backlog [13:25:16] godog: i guess so. i was just cleaning up the mess. :) [13:28:37] jeremyb: haha indeed, thanks! [13:29:17] <_joe_> you realize truly how awesome jeremy is when you're on RT duty :) [13:37:31] PROBLEM - RAID on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:38:55] akosiaris_: oh, shinken [13:40:20] YuviPanda: you probably don't need to shinken ui btw [13:40:45] it is supposedly backwards compatible with icinga so it should be able to load all the configuration [13:41:39] PROBLEM - puppet last run on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:42:40] RECOVERY - puppet last run on fenari is OK: OK: Puppet is currently enabled, last run 911 seconds ago with 0 failures [13:43:44] (PS8) Physikerwelt: Introducing Service Cluster A, hosting mathoid [puppet] - https://gerrit.wikimedia.org/r/156576 (https://bugzilla.wikimedia.org/69990) [13:45:52] PROBLEM - puppet last run on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:46:17] (CR) Giuseppe Lavagetto: [C: 1] "this patch would fix the 404 on newly reinstalled mediawiki appservers." [mediawiki-config] - https://gerrit.wikimedia.org/r/160640 (owner: Reedy) [13:47:51] (PS1) Springle: Current state of logging/revision partitioning on s1 and s2.
[software] - https://gerrit.wikimedia.org/r/160653 [13:49:05] RECOVERY - puppet last run on fenari is OK: OK: Puppet is currently enabled, last run 1286 seconds ago with 0 failures [13:49:05] RECOVERY - RAID on fenari is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [13:49:14] (PS3) Giuseppe Lavagetto: varnish: add comment to avoid future pitfalls [puppet] - https://gerrit.wikimedia.org/r/159294 [13:50:37] akosiaris_: true, but labs has nothing atm anyway, so am setting it up from scratch [13:50:52] akosiaris_: might as well use this? plus it'll be properly puppetized, etc [13:50:58] (CR) Ottomata: "Cool! But, let's do them in a separate change, to keep this simpler, eh?" [puppet] - https://gerrit.wikimedia.org/r/160419 (owner: Filippo Giunchedi) [13:51:55] YuviPanda: well if it is better we could. But you are talking mongodb right ? [13:51:59] PROBLEM - RAID on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:51:59] PROBLEM - puppet last run on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:52:08] (PS4) Giuseppe Lavagetto: varnish: add comment to avoid future pitfalls [puppet] - https://gerrit.wikimedia.org/r/159294 [13:52:09] akosiaris_: sadly, yeah. we do have mongodb in prod elsewhere, though [13:52:54] <_joe_> akosiaris_: well I hope it's configured with the --tell-me-write-is-ok-when-its-buffered-in-the-client [13:52:58] the eventlogging thing ? [13:53:10] <_joe_> (which is usually abbreviated to --omfg-webscale) [13:53:13] akosiaris_: hmm, that was turned off, it's running on one of the stat machines, I think [13:53:51] (PS1) Springle: Make final ANALYZE step configurable for online schema changes. [software] - https://gerrit.wikimedia.org/r/160654 [13:53:56] YuviPanda: you just hurt you own case [13:53:57] matanya: y u want revert?
[13:53:59] https://gerrit.wikimedia.org/r/#/c/160544/ [13:54:02] akosiaris_: heh [13:54:27] either way, I'm right now still at the 'play around, see how it goes' stage [13:54:49] (CR) Springle: [C: 2] Make final ANALYZE step configurable for online schema changes. [software] - https://gerrit.wikimedia.org/r/160654 (owner: Springle) [13:55:00] PROBLEM - SSH on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:56:00] PROBLEM - check configured eth on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:56:39] (PS2) Springle: Current state of logging/revision partitioning on s1 and s2. [software] - https://gerrit.wikimedia.org/r/160653 [13:57:04] RECOVERY - check configured eth on fenari is OK: NRPE: Unable to read output [13:57:05] RECOVERY - SSH on fenari is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [13:57:05] RECOVERY - RAID on fenari is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [13:57:05] RECOVERY - puppet last run on fenari is OK: OK: Puppet is currently enabled, last run 1778 seconds ago with 0 failures [13:57:33] (CR) Giuseppe Lavagetto: [C: 2] varnish: add comment to avoid future pitfalls [puppet] - https://gerrit.wikimedia.org/r/159294 (owner: Giuseppe Lavagetto) [13:57:38] (PS3) Springle: Current state of logging/revision partitioning on s1 and s2. [software] - https://gerrit.wikimedia.org/r/160653 [13:57:42] marktraceur and anomie: I can do swat this morning! [13:57:47] Yay! [13:57:51] manybubbles: Ok! [13:57:51] It's in an hour? [13:57:56] yeah [13:58:19] <_joe_> win 36 [13:58:32] its VE and wikidata who both always make their own submodule updates (yay!)
[13:58:44] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:58:49] _joe_: way too many windows :) [13:59:04] <_joe_> springle: that is #mediawiki-core [13:59:12] <_joe_> springle: I have 45 [13:59:19] <_joe_> because I restarted irssi yesterday [13:59:23] <_joe_> I had like 100 [13:59:29] :| [14:00:12] (PS4) Springle: Current state of logging/revision partitioning on s1 and s2. [software] - https://gerrit.wikimedia.org/r/160653 [14:00:14] PROBLEM - RAID on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:00:33] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4775 bytes in 0.955 second response time [14:01:13] RECOVERY - RAID on fenari is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [14:01:28] (CR) Springle: [C: 2] Current state of logging/revision partitioning on s1 and s2. [software] - https://gerrit.wikimedia.org/r/160653 (owner: Springle) [14:03:43] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:03:44] akosiaris_: it doesn't need mongodb if you don't want to have per-user changes and stuff, I think [14:04:24] PROBLEM - RAID on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:04:24] PROBLEM - puppet last run on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:05:23] RECOVERY - RAID on fenari is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [14:06:34] PROBLEM - DPKG on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:06:40] YuviPanda: that's comforting [14:07:24] PROBLEM - check configured eth on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:07:33] RECOVERY - DPKG on fenari is OK: All packages OK [14:08:13] RECOVERY - check configured eth on fenari is OK: NRPE: Unable to read output [14:08:25] PROBLEM - RAID on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:10:04] fenari in swap...
looking [14:10:43] PROBLEM - DPKG on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:11:14] RECOVERY - RAID on fenari is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [14:11:43] RECOVERY - DPKG on fenari is OK: All packages OK [14:11:59] !log stopped apache on fenari . It was in swap, investigating [14:12:04] Logged the message, Master [14:14:08] _joe_: heh :) [14:15:26] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00334448160535 [14:17:38] RECOVERY - puppet last run on fenari is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [14:17:59] RECOVERY - HTTP on fenari is OK: HTTP OK: HTTP/1.1 200 OK - 4775 bytes in 0.071 second response time [14:19:01] alt-j o /win 24 [14:19:04] nope [14:19:29] https://ganglia.wikimedia.org/latest/graph.php?r=2hr&z=xlarge&h=fenari.wikimedia.org&m=cpu_report&s=descending&mc=2&g=mem_report&c=Miscellaneous+pmtpa [14:19:41] apache was badly leaking here [14:19:53] https://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&h=fenari.wikimedia.org&m=cpu_report&s=descending&mc=2&g=mem_report&c=Miscellaneous+pmtpa [14:20:05] since the 12th and then it peaked 1 hour ago [14:20:17] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [14:20:19] it started climbing 1 hour ago I mean [14:20:55] !log restarted apache on fenari , it was leaking memory, situation back to normal, cause unknown yet [14:21:00] Logged the message, Master [14:22:22] (CR) Ottomata: "Hm, why?"
[puppet] - https://gerrit.wikimedia.org/r/160544 (owner: Matanya) [14:22:50] ottomata: https://bugzilla.wikimedia.org/show_bug.cgi?id=70858 [14:24:05] huh [14:24:10] that is inside of zookeeper::server [14:24:31] then ignore and abandon [14:25:15] andrewbogott: hmm, for 'list of hosts to monitor' (for shinken), I suppose I should write a script that hits the wikitech API and maintains it on a per-project basis... [14:25:21] hokkkaaay [14:25:34] YuviPanda: would be nice :) [14:25:44] (Abandoned) Ottomata: Revert "Need network::constants to render ferm defs.conf" [puppet] - https://gerrit.wikimedia.org/r/160544 (owner: Matanya) [14:25:50] andrewbogott: can't hit a direct OpenStack API, can I? [14:26:02] YuviPanda: is that running inside labs or on a prod box? [14:26:07] labmon1001, so prod box [14:26:13] but I'll have to test it in labs [14:27:06] labmon1001 can probably access the openstack API. [14:27:17] RECOVERY - Disk space on ms1004 is OK: DISK OK [14:27:18] hmm [14:27:22] would it be very different? [14:27:27] andrewbogott: is the wikitech API documented someplace? [14:27:35] * YuviPanda hits api.php [14:28:16] YuviPanda: Wikitech doesn't have an API that's any different from the normal MW api [14:28:33] yeah, but will add methods/actions to the mw api [14:28:33] found it [14:28:37] !log Jenkins: breaking continuous integration for MediaWiki repositories. Extensions are now tested with mediawiki/vendor and, mediawiki/core is checked out to the patch branch if it exist. {{gerrit|160656}} [14:28:42] Logged the message, Master [14:29:51] mh there's HTCPPurger running on ms1004 and it filled its disk, I'm assuming it is safe to stop?
[14:30:18] PROBLEM - swift-object-server on ms-be1014 is CRITICAL: Connection refused by host [14:30:18] PROBLEM - swift-container-server on ms-be1014 is CRITICAL: Connection refused by host [14:30:18] PROBLEM - Disk space on ms-be1014 is CRITICAL: Connection refused by host [14:30:27] PROBLEM - swift-container-replicator on ms-be1014 is CRITICAL: Connection refused by host [14:30:27] PROBLEM - DPKG on ms-be1014 is CRITICAL: Connection refused by host [14:30:27] PROBLEM - puppet last run on ms-be1014 is CRITICAL: Connection refused by host [14:30:36] PROBLEM - check configured eth on ms-be1014 is CRITICAL: Connection refused by host [14:30:36] PROBLEM - swift-account-server on ms-be1014 is CRITICAL: Connection refused by host [14:30:37] PROBLEM - RAID on ms-be1014 is CRITICAL: Connection refused by host [14:30:44] andrewbogott: uh oh, there's no API action to list instances in a project?! [14:31:07] YuviPanda: I'm confused :) [14:31:11] You're talking about the MW api, right? [14:31:14] andrewbogott: yes [14:31:17] Why would there be? [14:31:25] sure there must be some smw magic for that? [14:31:25] It is useful! :) [14:31:26] hence [14:31:32] oh god no SMW [14:31:36] :D [14:31:45] 'Wikitech doesn't have an API that's any different from the normal MW api' [14:32:03] But you can query openstack for things like that, probably... [14:32:03] !log silenced ms-be1014 until torrow, pending forced reboot [14:32:06] although not from within labs [14:32:07] https://wikitech.wikimedia.org/w/api.php?action=ask&query=[[Modification%20date::%2B]]|%3FModification%20date|sort%3DModification%20date|order%3Ddesc [14:32:09] Logged the message, Master [14:32:10] e.g. [14:32:42] * aude no idea how to use that [14:32:46] yeah, you can use mw api and smw queries probably. That… might be useful? [14:33:34] andrewbogott: aude I think... 
I'll write a patch to OSM that exposes the list of instances per project [14:33:47] andrewbogott: aude there's mw API calls to reboot, delete an instance, for example [14:33:51] or list members of a project [14:33:58] Really? News to me :) [14:34:00] i see [14:34:03] sounds good [14:34:06] hahah [14:34:08] action=novainstance [14:34:12] https://wikitech.wikimedia.org/wiki/Special:ApiSandbox#action=novaprojects&format=json&subaction=getall&project=quarry [14:34:14] and some others [14:34:14] I don't think it'll do any harm to reveal instance lists. [14:34:20] Since you can browse to that anyway via smw [14:34:29] yeah [14:34:36] andrewbogott: want to write the patch? :D [14:34:41] In theory that's information held securely by OpenStack but clearly we've decided it's not secret. [14:35:10] YuviPanda: if you want to rough in the api wrappings I can fill in the query bits. [14:36:48] !log stopped htcp-purger on ms1004 RT #8358 [14:36:53] Logged the message, Master [14:37:17] akosiaris_: What's the status about Service Cluster A [14:37:27] ? [14:39:54] bd808: I have new wikitech issues -- non-urgent, but could use a hand when you have a few minutes. 
A constellation of issues probably all resulting from https://bugzilla.wikimedia.org/show_bug.cgi?id=70882 [14:40:40] (PS6) Giuseppe Lavagetto: puppet: hiera backend for the WMF [puppet] - https://gerrit.wikimedia.org/r/151869 [14:41:24] (PS7) Giuseppe Lavagetto: puppet: hiera backend for the WMF [puppet] - https://gerrit.wikimedia.org/r/151869 [14:42:31] <_joe_> merging this ^^ andrewbogott: I'll keep an eye on labs, but if all hell breaks loose just revert it [14:42:44] <_joe_> that is if that happens after I'm gone [14:43:00] ok, I'll keep an eye out as well [14:43:10] (CR) Giuseppe Lavagetto: [C: 2] puppet: hiera backend for the WMF [puppet] - https://gerrit.wikimedia.org/r/151869 (owner: Giuseppe Lavagetto) [14:45:46] physikerwelt: will be up and running today [14:46:06] physikerwelt: well scheduled for today at least [14:46:36] (PS1) Mark Bergsma: Add a temp router for migrating some (disabled) LDAP accounts to Google Apps [puppet] - https://gerrit.wikimedia.org/r/160661 [14:46:39] akosiaris_: That's great news [14:47:27] (PS2) Mark Bergsma: Add a temp router for migrating some (disabled) LDAP accounts to Google Apps [puppet] - https://gerrit.wikimedia.org/r/160661 [14:49:35] <_joe_> andrewbogott: virt1000 seems healthy [14:50:16] yeah, lemme make sure a labs instance can still complete a puppet run...
[14:51:05] <_joe_> I am doing it as well [14:52:00] <_joe_> and I chose one where puppet is failing since forever [14:52:06] <_joe_> good choice, giuseppe [14:52:07] (PS1) Steinsplitter: Enable flow on outreachwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/160663 [14:52:23] everything looks good to me [14:52:34] <_joe_> ok great [14:52:46] <_joe_> I'd say, we keep an eye on virt1000's health [14:52:53] !log set vm.dirty_expire_centisecs to 10000 (was 30000) on analytics1021 to experiment with paging and kafka-zookeeper timeouts [14:52:59] Logged the message, Master [14:53:03] quiddity: https://gerrit.wikimedia.org/r/160663 [14:53:08] <_joe_> if everything's good, tomorrow I'll add the hiera.yaml on production [14:53:36] (_joe_: WOOO) [14:54:57] andrewbogott: Hmm... bug 70882 sounds like we have some config missing for WikiMap [14:55:24] * andrewbogott has never heard of wikimap [14:55:42] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 5244.17747124 [14:56:06] It's a class that can tell you how to link to various sites in a wiki farm if you know the db name [14:56:07] bd808: where is it getting the 'labs' for 'labs.wikimedia.org'? Is it just truncating 'labswiki'? [14:56:17] That would be my guess [14:56:33] (PS2) Steinsplitter: Enable Flow on outreachwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/160663 [14:57:25] manybubbles: we are skipping swat today [14:57:28] not ready [14:57:56] aude: k! [14:58:03] jenkins says no [14:58:26] James_F|Away: around for swat? [14:58:28] aude: sorry! [14:58:53] moved us to tomorrow :) [15:00:04] manybubbles, anomie, ^d, marktraceur: Dear anthropoid, the time has come. Please deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140916T1500).
[15:00:20] (Abandoned) Giuseppe Lavagetto: mediawiki: use HHVM everywhere on HAT appservers [puppet] - https://gerrit.wikimedia.org/r/156303 (owner: Giuseppe Lavagetto) [15:00:38] bd808: I'm grepping the config and don't see anything that looks promising… any idea where the mapping lives? [15:00:54] aude: thanks. sorry about jenkins! [15:01:01] James_F: ready? [15:01:14] np [15:01:26] manybubbles: Hey, yes. [15:01:30] manybubbles: Yes. [15:01:52] andrewbogott: My first guess was MWMultiversion but it seems to have an override for default logic -- https://github.com/wikimedia/operations-mediawiki-config/blob/master/multiversion/MWMultiVersion.php#L144 [15:04:11] Oh, that's the fqdn to dbname mapping, not the other way around... [15:04:58] godog: cmjohnson1: eqiuinix ticket made yet? RT 8334 [15:05:47] jeremyb: not AFAIK [15:06:08] well supposedly that's really important. or it will be refused [15:07:55] true, cmjohnson1 or robh might know more how to open equinix inbound shipment tickets [15:08:10] !log manybubbles Synchronized php-1.24wmf21/extensions/VisualEditor/: SWAT visual editor update wmf21 (duration: 00m 07s) [15:08:15] Logged the message, Master [15:08:16] James_F: ^^^ [15:08:34] Ta. [15:08:36] * James_F tests. [15:08:46] i can open a ticket....rt 8334 correct? [15:09:07] manybubbles: Yup, works like a charm. Go for wmf20. [15:09:18] cmjohnson1: yep 8334 [15:12:40] andrewbogott: I think you need to add a labswiki => '//wikitech...' mapping in https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L1306 [15:12:58] ok! Stay tuned... [15:13:11] !log Jenkins: mediawiki extensions phpunit jobs should pass more or less until the CI system is sent an orbit and dies out horribly. in such a case ping me / phone.
[15:13:17] Logged the message, Master [15:13:18] bd808: AHHHHHH [15:13:34] bd808: I have made mediawiki extensions to run the tests using mediawiki/vendor :-]]]] [15:13:49] * bd808 hugs hashae [15:13:54] *hashar [15:13:55] bd808: last thing to handle is migrating the extensions qunit jobs to use vendor as well [15:14:08] side effect, a patch proposed on a REL branch will be tested using the mediawiki/core REL branch :D [15:14:15] Did you fix the slow cloning problem? [15:14:19] no [15:14:38] well baby steps :) [15:14:40] in like 2 weeks at most, gating a change will take up to 15 minutes [15:14:58] but I think I got a good work around. the whole clone is only made once [15:15:03] when the job is first run [15:15:06] <_joe_> bd808: https://gerrit.wikimedia.org/r/#/c/160640/ do you see any problem with this? [15:15:11] that is all :] [15:15:11] (PS1) Andrew Bogott: Add a mapping for wikitech to wgCanonicalServer [mediawiki-config] - https://gerrit.wikimedia.org/r/160667 [15:15:30] bd808: ^ ? [15:15:58] !log manybubbles Synchronized php-1.24wmf20/extensions/VisualEditor/: swat update for wmf20 (duration: 00m 25s) [15:16:00] James_F: ^^^^^ [15:16:03] Logged the message, Master [15:17:50] Thanks manybubbles. Testing now. [15:17:57] (CR) BryanDavis: [C: 1] "The only thing that would be better would be to use relative symlinks instead of absolute so that the next time we rearrange the clone dir" [mediawiki-config] - https://gerrit.wikimedia.org/r/160640 (owner: Reedy) [15:18:23] (CR) BryanDavis: [C: 1] Add a mapping for wikitech to wgCanonicalServer [mediawiki-config] - https://gerrit.wikimedia.org/r/160667 (owner: Andrew Bogott) [15:18:53] (PS2) BryanDavis: Add a mapping for wikitech to wgCanonicalServer [mediawiki-config] - https://gerrit.wikimedia.org/r/160667 (https://bugzilla.wikimedia.org/70882) (owner: Andrew Bogott) [15:20:22] manybubbles: Seems fine, I think. Thanks! [15:20:30] James_F: sweet!
[15:20:51] !log SWAT complete [15:20:57] Logged the message, Master [15:21:14] bd808: Is the proper procedure to schedule that for a swat? Or should I just merge and sync? [15:21:49] andrewbogott: Since manybubbles just finished swat I think you can merge and sync [15:21:55] ok [15:22:03] (CR) Andrew Bogott: [C: 2] Add a mapping for wikitech to wgCanonicalServer [mediawiki-config] - https://gerrit.wikimedia.org/r/160667 (https://bugzilla.wikimedia.org/70882) (owner: Andrew Bogott) [15:22:13] oops, I clicked 'submit' by mistake [15:22:35] andrewbogott: yeah - normally you schedule the SWAT but its probably ok to just go now - we certainly have time [15:22:36] That change may be a red herring. I see we already had wgServer set. [15:23:24] !log andrew Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 19s) [15:23:30] Logged the message, Master [15:24:33] um… ok, /a is missing, where should I be doing my rebase? [15:25:10] andrewbogott: /srv/mediawiki-staging is the new /a/common [15:25:13] 'k [15:25:41] We like to change things just to keep everyone on their toes :) [15:25:54] Also /a is gone and that is awesome [15:25:54] !log andrew Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 03s) [15:26:00] Logged the message, Master [15:26:27] <_joe_> bd808: not quite btw [15:26:57] _joe_: It's at least not on tin anymore [15:27:21] <_joe_> bd808: we need to fix symlinks in docroots btw [15:28:06] Ori didn't get them all then I guess [15:29:08] <_joe_> bd808: he didn't got any of those [15:29:11] Some of those symlinks seem like they should die too.
p, live-1.5, probably some others [15:29:26] I think we tried before [15:29:34] <_joe_> anyways, I guess we could go on and do this [15:29:38] <_joe_> one thing at a time [15:29:40] Then found they were used in weird 3rd party places where exposed via bits and stuff [15:29:43] <_joe_> as ops love [15:29:45] <_joe_> :) [15:29:56] _joe_: first this, second REMOVE MOST OF TEH STUPID DOCROOT FOLDERS [15:30:03] <_joe_> Reedy: for that [15:30:10] <_joe_> I'm preparing "webtest" [15:30:29] I had a quick look when you linked it yesterday [15:30:31] <_joe_> I am tired of doing seemingly perfect changes that kill debug=true on bits [15:30:35] <_joe_> or anything else [15:30:54] * bd808 hugs _joe_ for adding tests [15:31:13] <_joe_> bd808: for now I just wrote a basic engine to _generate_ tests [15:31:19] <_joe_> :P [15:31:24] is SWAT done? [15:32:11] bd808: what would be a simple way to test parallel php -l ? https://gerrit.wikimedia.org/r/#/c/160668/ [15:33:00] godog: We can test scap changes in beta by cherry-picking commits from gerrit [15:34:11] !log Jenkins: deleting /srv/ssd/jenkins-slave/workspace/*testextensions-master on gallium and lanthanum. 
[15:34:12] If no one says otherwise, I'll deploy my symlink updates now(ish) [15:34:17] Logged the message, Master [15:34:26] Reedy: I think you are clear [15:34:35] (PS2) Reedy: Update symlinks to use /srv/mediawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/160640 [15:34:41] (CR) Reedy: [C: 2] Update symlinks to use /srv/mediawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/160640 (owner: Reedy) [15:34:46] (Merged) jenkins-bot: Update symlinks to use /srv/mediawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/160640 (owner: Reedy) [15:35:47] PROBLEM - puppet last run on amssq39 is CRITICAL: CRITICAL: Epic puppet fail [15:35:51] !log reedy Synchronized docroot and w: Update symlinks to use /srv/mediawiki (duration: 00m 16s) [15:35:56] Logged the message, Master [15:36:37] bd808: sounds good to me, on deployment-bastion01 ? [15:37:10] godog: Yes. I was looking for a wiki page that explains and aparrently I haven't made one :( [15:38:23] LGTM [15:38:38] godog: cd /srv/deployment/scap/scap; git deploy start; git cherry-pick ...; git deploy sync; /usr/local/bin/wmf-beta-scap --verbose [15:38:50] heh, I see some enwiki slaves are 5.5.36, some are 5.5.37 [15:42:17] bd808: ok! trying :) [15:42:37] (PS3) Mark Bergsma: Add a temp router for migrating some (disabled) LDAP accounts to Google Apps [puppet] - https://gerrit.wikimedia.org/r/160661 [15:43:36] (PS4) Mark Bergsma: Add a temp router for migrating some (disabled) LDAP accounts to Google Apps [puppet] - https://gerrit.wikimedia.org/r/160661 [15:47:14] (CR) Reedy: Change the deployment source directory from /a/common to /srv/mediawiki (5 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/157485 (owner: Ori.livneh) [15:48:38] bd808: that didn't break scap, but I couldn't find the log line where it ran check_php_syntax either :( [15:48:46] bd808: I think that fixed it! Of course, I failed to check if I could reproduce the issue beforehand.
[15:49:07] andrewbogott: WORKS_FOR_ME and just walk away :) [15:49:29] (PS1) Nuria: Moving /var/lib/wikimetrics to /srv/wikimetrics [puppet/wikimetrics] - https://gerrit.wikimedia.org/r/160672 [15:49:55] godog: uh. well, subprocess.check_call() doesn't log unless it breaks [15:50:42] (PS3) Reedy: Update refresh-dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/157485 (owner: Ori.livneh) [15:50:57] (CR) Mark Bergsma: [C: 2] "I apologize :-)" [puppet] - https://gerrit.wikimedia.org/r/160661 (owner: Mark Bergsma) [15:51:24] godog: If you want debug logging for that you'll need to change things a bit more I guess. [15:52:08] bd808: ah indeed, was expecting it to be wrapped already by log.Timer, I'll give it another go thanks! [15:52:24] (PS1) Reedy: lol, svn [mediawiki-config] - https://gerrit.wikimedia.org/r/160674 [15:52:53] godog: The old way is so fast the timer always said 0.0s :) [15:55:47] RECOVERY - puppet last run on amssq39 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [15:56:23] Yoo, Jeff_Green, yt? [15:56:28] yep [15:57:19] (PS2) Ottomata: Bring more udp2log filters over to kafkatee on analytics1003 [puppet] - https://gerrit.wikimedia.org/r/160523 [15:57:29] (PS43) 01tonythomas: Added the bouncehandler router to catch in all bounce emails [puppet] - https://gerrit.wikimedia.org/r/155753 [15:57:54] Jeff_Green: what's the status of fundraising udp2log stuff? [15:58:28] status quo. last I knew we were waiting for hardware. [15:59:06] hm, is there an RT?
[15:59:16] there's still not hurry, but i'm starting to bring more udp2log stuff over to kafkatee [15:59:21] i can work on all my stuff first [15:59:34] i might try to disable one of the udp2log instances (probalby oxygen) and use kafkatee instead soon [16:00:30] <_joe_> !log mw1018 and mw1021 in the hhvm appservers pool [16:00:35] Logged the message, Master [16:02:37] Reedy: releng meeting :) [16:03:02] pfft [16:03:10] ah I guess we got the server! RT 8211 [16:04:03] (CR) Ottomata: [C: 2 V: 2] Bring more udp2log filters over to kafkatee on analytics1003 [puppet] - https://gerrit.wikimedia.org/r/160523 (owner: Ottomata) [16:05:59] ottomata: so I have to review and package kafkatee. that's RT 7854 [16:07:26] oh cool [16:07:36] well, it is packaged, and there is a puppet module...i thinkit is even a git submodule [16:07:46] oh ja [16:08:01] https://gerrit.wikimedia.org/r/#/admin/projects/operations/puppet/kafkatee [16:08:05] (Abandoned) Nuria: Moving /var/lib/wikimetrics to /srv/wikimetrics [puppet/wikimetrics] - https://gerrit.wikimedia.org/r/160672 (owner: Nuria) [16:08:35] ottomata: frack does not use the production repos [16:08:46] aye, but since it is a submodule, ,can you just include it in your repo that way?
[16:08:58] yep [16:09:02] you can manually copy it in and add it if you like, but since it is s submodule [16:09:02] cool [16:09:43] I'll have to figure out how to do it, but I think I could just pull it in at a specific tag [16:09:56] that's how submodules work anyway [16:10:03] you commit them to point at a specific sha [16:10:11] ok, I haven't done that yet [16:10:11] it won't change in your parent repo unless you change it [16:10:14] (PS1) RobH: setting db2033-2042 install params [puppet] - https://gerrit.wikimedia.org/r/160678 [16:10:23] once your submodule is in, you'll just have to use it [16:10:26] tons of usage examples here [16:10:27] https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/analytics/kafkatee.pp [16:10:47] (PS1) Nuria: Moving /var/lib/wikimetrics to /srv/wikimetrics [puppet] - https://gerrit.wikimedia.org/r/160679 [16:10:48] Jeff_Green: do you know if you need the full stream for your filters? [16:10:50] you probalby don't, right? [16:10:54] like, do you need bits and upload? [16:10:58] or just text and mobile? [16:11:22] ottomata: I don't know. I haven't had a chance to think about any of this [16:11:45] (CR) RobH: [C: 2] setting db2033-2042 install params [puppet] - https://gerrit.wikimedia.org/r/160678 (owner: RobH) [16:11:47] ok, well, we can figure that out, a nice feature of kafka(tee) is you can configure the topics you want to consume [16:11:52] we split the webrequest stream into 4 parts [16:11:56] great [16:11:59] so, if you don't need say, upload traffic [16:12:04] then you won't have to filter through that at all [16:12:07] and your box will do that much less work [16:12:12] right [16:13:24] bd808: latest PS seems to work, anything I need to do on deployment-bastion to revert my changes or can stay as it is? [16:13:59] <_joe_> godog: I usually git reset --hard HEAD~1 [16:14:13] <_joe_> if the change is going to be merged soon [16:14:49] I hope it is!
[16:16:07] godog: _joe_'s revert command would be good and run git deploy again. [16:16:53] godog: Also, do you mind if I mess with the patch a tiny bit? I have python code style OCD and would be glad to make it look right to me if you don't mind minor changes by me to your patch. [16:20:49] (03PS1) 10Ottomata: Include kafkatee::monitoring on analytics1003 [puppet] - 10https://gerrit.wikimedia.org/r/160683 [16:21:29] bd808: sure go ahead! I tweaked just about to make jenkins like me [16:22:13] bd808: if it is double vs single quotes, I started with single quotes but then quoting \' in the command looked even uglier :) [16:24:10] godog: It's actually the line continuation character that is making my eyes bleed. [16:24:15] :) [16:24:56] the bees! the bees! [16:25:36] (03CR) 10Ottomata: [C: 032 V: 032] Include kafkatee::monitoring on analytics1003 [puppet] - 10https://gerrit.wikimedia.org/r/160683 (owner: 10Ottomata) [16:32:01] cmjohnson1: ms-be1014 seems to have idrac hung or sth like that and not reachable via ssh anymore, is there any other trick to try beside kicking it? https://rt.wikimedia.org/Ticket/Display.html?id=8372 [16:32:38] godog: on my way to do that now..ran into someone I know at the data center and was distracted [16:33:22] no it needs to be kicked [16:35:08] PROBLEM - check_apache2 on payments1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2 [16:36:19] * Jeff_Green sees that alert ^^^ [16:36:31] cmjohnson1: ack, thanks! [16:38:37] greg-g, Reedy: Mind if I update scap on the cluster? 
[16:38:49] bd808: WFM [16:40:09] RECOVERY - check_apache2 on payments1003 is OK: PROCS OK: 12 processes with command name apache2 [16:40:15] bd808|deploy: \o/ can finally close off https://bugzilla.wikimedia.org/show_bug.cgi?id=68255 I think [16:40:29] RECOVERY - very high load average likely xfs on ms-be1014 is OK: OK - load average: 1.26, 0.28, 0.09 [16:40:29] RECOVERY - swift-account-replicator on ms-be1014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [16:40:38] RECOVERY - swift-container-updater on ms-be1014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [16:40:39] RECOVERY - swift-object-updater on ms-be1014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [16:40:39] RECOVERY - SSH on ms-be1014 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [16:40:48] RECOVERY - check if dhclient is running on ms-be1014 is OK: PROCS OK: 0 processes with command name dhclient [16:40:50] RECOVERY - swift-account-reaper on ms-be1014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [16:40:53] RECOVERY - swift-container-replicator on ms-be1014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [16:40:54] RECOVERY - puppet last run on ms-be1014 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:40:54] RECOVERY - swift-container-server on ms-be1014 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [16:40:54] RECOVERY - DPKG on ms-be1014 is OK: All packages OK [16:40:54] RECOVERY - Disk space on ms-be1014 is OK: DISK OK [16:41:00] RECOVERY - swift-object-server on ms-be1014 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [16:41:00] RECOVERY - swift-object-replicator on ms-be1014 is OK: PROCS OK: 1 process with regex args 
^/usr/bin/python /usr/bin/swift-object-replicator [16:41:10] RECOVERY - check configured eth on ms-be1014 is OK: NRPE: Unable to read output [16:41:10] RECOVERY - swift-container-auditor on ms-be1014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [16:41:10] RECOVERY - swift-account-auditor on ms-be1014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [16:41:10] RECOVERY - swift-account-server on ms-be1014 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [16:41:10] RECOVERY - swift-object-auditor on ms-be1014 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [16:41:11] RECOVERY - RAID on ms-be1014 is OK: OK: optimal, 14 logical, 14 physical [16:41:16] !log Trebuchet update for scap reporting failure from osmium.eqiad.wmnet, searchidx1001.eqiad.wmnet, fenari.wikimedia.org and mw1110.eqiad.wmnet [16:42:36] !log Trebuchet sync for scap reporting failure from osmium.eqiad.wmnet, mw1053.eqiad.wmnet, searchidx1001.eqiad.wmnet, fenari.wikimedia.org, and mw1110.eqiad.wmnet [16:42:41] Logged the message, Master [16:43:06] !log Updated scap to 663f137 (Check php syntax with parallel `php -l`) [16:43:10] Logged the message, Master [16:45:09] !log bd808 Started scap: No code change scap to test scap internal update [16:45:15] Logged the message, Master [16:47:18] (03PS1) 10Ottomata: Make role::labs::lvm::srv only use 50% of the /dev/vd/ volume group [puppet] - 10https://gerrit.wikimedia.org/r/160687 [16:48:15] (03CR) 10Ottomata: "wikimetrics uses /var/lib/wikimetrics to write public data. We want to be abl to mount a logical volume there as well." [puppet] - 10https://gerrit.wikimedia.org/r/160687 (owner: 10Ottomata) [16:48:22] cmjohnson1: we're back, thanks! 
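[editor's note] The scap change logged above ("Check php syntax with parallel `php -l`") is not shown in this log. As a rough sketch of the general idea only, and not scap's actual code, a parallel syntax check could look like the following. The function names, the `workers` default, and the assumption that `php -l` exits non-zero on a syntax error are mine.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def lint_file(path, lint_cmd=("php", "-l")):
    """Run the lint command on one file; return (path, ok, output)."""
    proc = subprocess.run([*lint_cmd, path], capture_output=True, text=True)
    # Assumption: the linter signals a syntax error via a non-zero exit code.
    return path, proc.returncode == 0, (proc.stdout + proc.stderr).strip()


def lint_files(paths, lint_cmd=("php", "-l"), workers=4):
    """Lint files concurrently and return (path, output) for each failure."""
    # Threads are enough here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: lint_file(p, lint_cmd), paths))
    return [(path, output) for path, ok, output in results if not ok]
```

An empty list from `lint_files` would mean every file passed. The `lint_cmd` parameter exists only so the sketch can be exercised without PHP installed.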
[16:48:27] (03CR) 10Hashar: "Nit: make it a parameter :D" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/160687 (owner: 10Ottomata) [16:48:30] yw [16:48:45] hashar: can't, it's a role [16:48:49] roles shouldn't have parameters [16:48:53] ottomata: ah true :-( [16:48:55] i could make it check a global for a default value [16:49:02] which could be configurable via the labs interface [16:49:06] ottomata: that role (or maybe it is the ::mnt one) is used on integration and beta cluster labs project [16:49:11] and in some cases we need a huge disk space available [16:49:18] though maybe 50% is enough. haven't checked [16:49:29] i'm not sure, it probably varies per instance [16:49:32] ok ok, i got something for ya [16:49:37] will try to make it not change stuff as is [16:49:45] i mean, it's not going to recreate anything that exists [16:49:51] but new instances would be affected [16:50:07] yeah [16:50:13] I am not too worried about it though [16:50:26] 50% is probably enough for "my" projects :D [16:50:32] tools-labs might disagree though [16:50:36] Anybody aware of this error? "Internal error in ApiFormatXml::recXmlPrint: (P225, ...) has integer keys without _element value. Use ApiResult::setIndexedTagName()." [16:51:56] anomie: ^ Do you know what would cause that? Most of fatalog errors right now are variations on that exception. [16:52:08] (03PS2) 10Ottomata: Make role::labs::lvm::srv only use 50% of the /dev/vd/ volume group [puppet] - 10https://gerrit.wikimedia.org/r/160687 [16:52:11] Might be best having a url to go with it [16:52:19] hashar: $::srv_logical_volume_size [16:52:20] oops [16:52:33] that better?
[16:52:39] bd808|deploy: Some extension not setting the metadata needed by format=xml in their API module [16:53:04] ottomata: yeah might do it :-] [16:53:19] ottomata: so we can optionally pass it in wikitech labs console [16:53:30] Reedy: wikidata /w/api.php?action=wbeditentity&format=xml is one [16:53:33] yeah [16:53:44] HMM, you know, I can do what I need to do if I just don't use the role class [16:53:45] role::labs::lvm::srv [16:53:46] all are wikidata I think. aude? [16:53:49] maybe I shouldn't bother, hashar [16:53:49] hm [16:53:55] i can use the define manually [16:54:03] instead of relying on this role wrapper [16:54:18] I see notoken [16:54:18] ottomata: possibly. I was merely nitpicking anyway and highlighting a potential side effect of always using 50% [16:54:27] yeah [16:55:23] ottomata: end of meeting I am off :] [16:58:17] later! [16:58:42] Reedy: Badly behaved bot? Seems like it still shouldn't cause a fatal exception. [16:59:09] Oh maybe it's not fatal? Just in exception-json logs [16:59:42] PHP Catchable fatal error: Argument 1 passed to AbuseFilter::contentToString() must implement interface Content, null given [17:00:28] Ah [17:00:30] Can we take fenari out of scap yet?
[17:00:36] nocnocnoc [17:01:08] (03PS1) 10Ottomata: Use labs_lvm::volume to create /srv and /var/lib/wikimetrics mounts in labs [puppet] - 10https://gerrit.wikimedia.org/r/160689 [17:01:22] bd808|deploy: Reedy known [17:01:49] we are working on fix and would like it for swat tomorrow [17:03:14] (03PS1) 10Filippo Giunchedi: use scap's embedded linking, remove lint script [puppet] - 10https://gerrit.wikimedia.org/r/160691 (https://bugzilla.wikimedia.org/68255) [17:03:15] !log bd808 Finished scap: No code change scap to test scap internal update (duration: 18m 06s) [17:03:20] Logged the message, Master [17:08:15] (03PS3) 10Ottomata: Make role::labs::lvm::srv only use 50% of the /dev/vd/ volume group, move make-instance-vol file into base labs_lvm class [puppet] - 10https://gerrit.wikimedia.org/r/160687 [17:09:16] (03PS4) 10Ottomata: Make role::labs::lvm::srv volume size configurable, move make-instance-vol file into labs_lvm base class [puppet] - 10https://gerrit.wikimedia.org/r/160687 [17:10:19] (03PS2) 10Giuseppe Lavagetto: HHVM: increase JitAColdSize to 60 MiB [puppet] - 10https://gerrit.wikimedia.org/r/160394 (owner: 10Ori.livneh) [17:10:49] (03CR) 10Giuseppe Lavagetto: [C: 032] "I monitored the sizes the whole day on appservers, it makes sense." [puppet] - 10https://gerrit.wikimedia.org/r/160394 (owner: 10Ori.livneh) [17:10:58] (03CR) 10Giuseppe Lavagetto: [V: 032] "I monitored the sizes the whole day on appservers, it makes sense." 
[puppet] - 10https://gerrit.wikimedia.org/r/160394 (owner: 10Ori.livneh) [17:14:19] RECOVERY - swift eqiad-prod container availability on tungsten is OK: OK: Less than 1.00% under the threshold [98.0] [17:15:25] (03PS3) 10Giuseppe Lavagetto: HHVM: Increase the maximum number of open files to 65536 [puppet] - 10https://gerrit.wikimedia.org/r/160510 (owner: 10Hoo man) [17:16:36] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] HHVM: Increase the maximum number of open files to 65536 [puppet] - 10https://gerrit.wikimedia.org/r/160510 (owner: 10Hoo man) [17:36:58] PROBLEM - puppet last run on db60 is CRITICAL: CRITICAL: Epic puppet fail [17:37:08] godog: ping [17:37:08] gwicke: ping detected, please leave a message! [17:38:24] gwicke: in a meeting ATM [17:38:37] k, np [17:42:01] (03PS1) 10RobH: fixing foward dns entries for db2036-db2039 [dns] - 10https://gerrit.wikimedia.org/r/160707 [17:42:34] (03CR) 10RobH: [C: 032] fixing foward dns entries for db2036-db2039 [dns] - 10https://gerrit.wikimedia.org/r/160707 (owner: 10RobH) [17:45:45] (03CR) 10Dzahn: [C: 032] redirect http->https on ishmael [puppet] - 10https://gerrit.wikimedia.org/r/160588 (owner: 10Dzahn) [17:46:02] mark: time for one final review at gerrit.wikimedia.org/r/#/c/155753/ ? [17:46:48] Jeff_Green and me could test the config on labs and it looks like we got the right configuration file as expected [17:48:30] (03CR) 10BryanDavis: [C: 031] "Seems right to me but I haven't grepped for any other random scripts that may be using the lint.php wrapper." 
[puppet] - 10https://gerrit.wikimedia.org/r/160691 (https://bugzilla.wikimedia.org/68255) (owner: 10Filippo Giunchedi) [17:49:28] PROBLEM - Host db2011 is DOWN: PING CRITICAL - Packet loss = 100% [17:49:28] PROBLEM - Host db2007 is DOWN: PING CRITICAL - Packet loss = 100% [17:49:28] PROBLEM - Host baham is DOWN: CRITICAL - Time to live exceeded (208.80.153.13) [17:49:28] PROBLEM - Host acamar is DOWN: CRITICAL - Time to live exceeded (208.80.153.12) [17:49:28] PROBLEM - Host ns1-baham.wikimedia.org is DOWN: CRITICAL - Time to live exceeded (208.80.153.231) [17:49:29] PROBLEM - Host install2001 is DOWN: CRITICAL - Time to live exceeded (208.80.153.4) [17:49:29] PROBLEM - Host bast2001 is DOWN: CRITICAL - Time to live exceeded (208.80.153.5) [17:49:48] PROBLEM - Host db2005 is DOWN: PING CRITICAL - Packet loss = 100% [17:49:49] PROBLEM - Host db2003 is DOWN: PING CRITICAL - Packet loss = 100% [17:49:49] PROBLEM - Host db2012 is DOWN: PING CRITICAL - Packet loss = 100% [17:49:49] PROBLEM - Host db2004 is DOWN: PING CRITICAL - Packet loss = 100% [17:49:49] PROBLEM - Host db2010 is DOWN: PING CRITICAL - Packet loss = 100% [17:49:51] PROBLEM - Host db2002 is DOWN: PING CRITICAL - Packet loss = 100% [17:49:51] PROBLEM - Host db2009 is DOWN: PING CRITICAL - Packet loss = 100% [17:50:09] PROBLEM - Host db2001 is DOWN: PING CRITICAL - Packet loss = 100% [17:50:18] PROBLEM - Host 208.80.153.12 is DOWN: CRITICAL - Time to live exceeded (208.80.153.12) [17:50:18] PROBLEM - Host 2620:0:860:1:d6ae:52ff:feac:4dc8 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:860:1:d6ae:52ff:feac:4dc8 [17:51:02] ^ just codfw network stuff [17:51:08] PROBLEM - Host 2620:0:860:2:d6ae:52ff:fead:5610 is DOWN: PING CRITICAL - Packet loss = 100% [17:51:15] scary [17:51:24] heh, hosts merely by ipv6 [17:51:25] nice [17:51:33] its super scary that way [17:53:08] RECOVERY - Host db2002 is UP: PING OK - Packet loss = 0%, RTA = 53.36 ms [17:53:08] RECOVERY - Host db2010 is UP: PING OK - Packet loss = 0%, RTA = 
52.11 ms [17:53:08] RECOVERY - Host db2004 is UP: PING OK - Packet loss = 0%, RTA = 52.01 ms [17:53:08] RECOVERY - Host db2003 is UP: PING OK - Packet loss = 0%, RTA = 52.82 ms [17:53:08] RECOVERY - Host db2001 is UP: PING OK - Packet loss = 0%, RTA = 52.58 ms [17:53:09] RECOVERY - Host db2012 is UP: PING OK - Packet loss = 0%, RTA = 52.74 ms [17:53:09] RECOVERY - Host db2007 is UP: PING OK - Packet loss = 0%, RTA = 52.37 ms [17:53:18] RECOVERY - Host db2005 is UP: PING OK - Packet loss = 0%, RTA = 52.14 ms [17:53:18] RECOVERY - Host db2009 is UP: PING OK - Packet loss = 0%, RTA = 51.98 ms [17:53:18] RECOVERY - Host install2001 is UP: PING OK - Packet loss = 0%, RTA = 52.18 ms [17:53:18] RECOVERY - Host acamar is UP: PING OK - Packet loss = 0%, RTA = 52.02 ms [17:53:18] RECOVERY - Host db2011 is UP: PING OK - Packet loss = 0%, RTA = 52.31 ms [17:53:18] RECOVERY - Host baham is UP: PING OK - Packet loss = 0%, RTA = 52.36 ms [17:53:19] RECOVERY - Host bast2001 is UP: PING OK - Packet loss = 0%, RTA = 52.31 ms [17:53:38] RECOVERY - Host 2620:0:860:2:d6ae:52ff:fead:5610 is UP: PING OK - Packet loss = 0%, RTA = 43.17 ms [17:53:38] RECOVERY - Host ns1-baham.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 52.04 ms [17:54:18] RECOVERY - Host 2620:0:860:1:d6ae:52ff:feac:4dc8 is UP: PING OK - Packet loss = 0%, RTA = 52.55 ms [17:54:43] (03CR) 10Dzahn: "arr, would have been all ok but this Apache was also still not loading mod_headers" [puppet] - 10https://gerrit.wikimedia.org/r/160588 (owner: 10Dzahn) [17:55:08] RECOVERY - puppet last run on db60 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [17:55:09] RECOVERY - Host 208.80.153.12 is UP: PING OK - Packet loss = 0%, RTA = 52.63 ms [17:55:38] PROBLEM - puppet last run on db2004 is CRITICAL: CRITICAL: Epic puppet fail [17:55:38] PROBLEM - puppet last run on db2003 is CRITICAL: CRITICAL: Epic puppet fail [17:55:38] PROBLEM - puppet last run on db2001 is CRITICAL: CRITICAL: Epic puppet 
fail [17:55:39] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: Epic puppet fail [17:56:33] "if it's in the '2000's' it's not really critical right now" [17:57:48] Server 2k problems... [17:57:49] :D [17:57:57] *g* [17:58:04] blame Y2K [17:58:18] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: Epic puppet fail [17:59:49] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [18:00:04] Reedy, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140916T1800). Please do the needful. [18:00:09] PROBLEM - check_apache2 on payments1004 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2 [18:01:25] (03PS1) 10Dzahn: ishmael - include ::apache::mod::headers [puppet] - 10https://gerrit.wikimedia.org/r/160780 [18:02:17] gwicke: have to run now :( tomorrow I can stay longer tho [18:02:27] (03CR) 10Dzahn: [C: 032] "fix: Syntax error on line 14 of /etc/apache2/sites-enabled/50-ishmael-wikimedia-org.conf:" [puppet] - 10https://gerrit.wikimedia.org/r/160780 (owner: 10Dzahn) [18:05:10] PROBLEM - check_apache2 on payments1004 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2 [18:05:27] that payments error seems unrelated to the other things [18:05:36] godog: okay, it's not urgent- just wanted to ask you about your opinion on something [18:05:47] see you tomorrow [18:06:09] (03PS1) 10Reedy: Non wikipedias to 1.24wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160781 [18:06:10] mutante: i see it [18:06:25] package update restarted apache [18:06:35] Jeff_Green: thanks, i was about to report in -fundraising [18:06:48] cool [18:07:28] PdfHandler is OOM-ing a lot [18:09:18] (03CR) 10Dzahn: "needed Ided5b65aae1fd5 but now it's ok:" [puppet] - 10https://gerrit.wikimedia.org/r/160588 (owner: 10Dzahn) [18:09:50] RECOVERY - puppet last run on db2004 is OK: OK: Puppet is currently enabled, last run 25 
seconds ago with 0 failures [18:09:51] (03CR) 10Reedy: [C: 032] Non wikipedias to 1.24wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160781 (owner: 10Reedy) [18:10:01] (03Merged) 10jenkins-bot: Non wikipedias to 1.24wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160781 (owner: 10Reedy) [18:10:08] RECOVERY - check_apache2 on payments1004 is OK: PROCS OK: 12 processes with command name apache2 [18:10:49] (03PS1) 10Reedy: wgMemoryLimit to 256M [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160783 [18:10:49] RECOVERY - puppet last run on db2001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [18:10:49] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [18:11:12] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf21 [18:11:17] Logged the message, Master [18:11:59] RECOVERY - puppet last run on db2003 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [18:12:18] (03PS2) 10Reedy: lol, svn [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160674 [18:12:23] (03CR) 10Reedy: [C: 032] lol, svn [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160674 (owner: 10Reedy) [18:12:27] (03Merged) 10jenkins-bot: lol, svn [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160674 (owner: 10Reedy) [18:13:01] (03PS3) 10Reedy: Enable Flow on outreachwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160663 (owner: 10Steinsplitter) [18:14:00] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [18:14:49] (03CR) 10Reedy: [C: 04-1] "Needs db tables creating on extension1 before this can be deployed etc..." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/160663 (owner: 10Steinsplitter) [18:15:26] (03PS2) 10Reedy: Enable 1:1000 attribution logging for MediaViewer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160578 (owner: 10Gergő Tisza) [18:15:31] (03Abandoned) 10Steinsplitter: Enable Flow on outreachwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160663 (owner: 10Steinsplitter) [18:15:34] (03CR) 10Reedy: [C: 032] Enable 1:1000 attribution logging for MediaViewer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160578 (owner: 10Gergő Tisza) [18:15:39] (03Merged) 10jenkins-bot: Enable 1:1000 attribution logging for MediaViewer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160578 (owner: 10Gergő Tisza) [18:16:17] (03PS2) 10Reedy: wgMemoryLimit to 256M [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160783 [18:16:24] (03CR) 10Reedy: [C: 032] wgMemoryLimit to 256M [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160783 (owner: 10Reedy) [18:16:29] (03Merged) 10jenkins-bot: wgMemoryLimit to 256M [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160783 (owner: 10Reedy) [18:17:44] (03PS2) 10Reedy: Set wgContentHandlerDB true for enwiki-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159457 (https://bugzilla.wikimedia.org/49193) (owner: 10Spage) [18:17:48] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [18:17:51] (03CR) 10Reedy: [C: 032] Set wgContentHandlerDB true for enwiki-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159457 (https://bugzilla.wikimedia.org/49193) (owner: 10Spage) [18:17:57] (03Merged) 10jenkins-bot: Set wgContentHandlerDB true for enwiki-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/159457 (https://bugzilla.wikimedia.org/49193) (owner: 10Spage) [18:18:28] (03PS2) 10Reedy: Set up "Author" namespace for aswikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160117 
(https://bugzilla.wikimedia.org/70464) (owner: 10Jforrester) [18:18:34] (03CR) 10Reedy: [C: 032] Set up "Author" namespace for aswikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160117 (https://bugzilla.wikimedia.org/70464) (owner: 10Jforrester) [18:18:41] (03Merged) 10jenkins-bot: Set up "Author" namespace for aswikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160117 (https://bugzilla.wikimedia.org/70464) (owner: 10Jforrester) [18:18:54] Reedy: I'm assuming that's the way to do that change, then? :-) [18:19:01] (03PS6) 10Reedy: Fix typos in various localizations of dvwiki configurations [mediawiki-config] - 10https://gerrit.wikimedia.org/r/156821 (https://bugzilla.wikimedia.org/48075) (owner: 10Gerrit Patch Uploader) [18:19:07] * James_F hadn't done new-namespace before. [18:19:07] (03CR) 10Reedy: [C: 032] Fix typos in various localizations of dvwiki configurations [mediawiki-config] - 10https://gerrit.wikimedia.org/r/156821 (https://bugzilla.wikimedia.org/48075) (owner: 10Gerrit Patch Uploader) [18:19:11] (03Merged) 10jenkins-bot: Fix typos in various localizations of dvwiki configurations [mediawiki-config] - 10https://gerrit.wikimedia.org/r/156821 (https://bugzilla.wikimedia.org/48075) (owner: 10Gerrit Patch Uploader) [18:21:01] Reedy: https://gerrit.wikimedia.org/r/#/c/154234/ looks good to go (needs a note to ops@). [18:22:55] Reedy: Can the category collation ones go out? https://gerrit.wikimedia.org/r/#/c/140580/ https://gerrit.wikimedia.org/r/#/c/155241/ https://gerrit.wikimedia.org/r/#/c/147922/ https://gerrit.wikimedia.org/r/#/c/154213/ [18:22:59] No [18:23:05] Boo. Why not? [18:23:14] Last time I checked in with springle, it needs some ops massaging to go along with it [18:23:26] Temporarily disabling a mysql server setting and such [18:23:52] Ah, fun. [18:24:58] I think there was talk about libicu upgrades too [18:25:21] * James_F runs away from those patches, then. 
:-) [18:25:37] grumble grumble [18:25:49] i emailed him about it a few days ago and got no reply [18:26:16] He's busy with mariadb upgrades and provisioning codfw servers [18:26:19] Ping him on irc [18:26:33] ops massaging, eh. we deployed like twenty of these with no issues, and only ran into something when doing frwiki which is large [18:26:33] Or wait. :-) [18:26:55] Reedy: https://gerrit.wikimedia.org/r/#/c/157485/ and maybe https://gerrit.wikimedia.org/r/#/c/158573/ [18:26:59] the wikis in these patches mostly have like 10k pages or something [18:27:17] I don't mind doing them so much [18:27:32] But I'm sure he'd be pissed if it caused an outage after saying not to do it without doing X first [18:27:49] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 3 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [18:28:06] (03PS2) 10Reedy: (Re-)enable VisualEditor for Wikitech (labswiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158573 (owner: 10Jforrester) [18:28:11] (03CR) 10Reedy: [C: 032] (Re-)enable VisualEditor for Wikitech (labswiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158573 (owner: 10Jforrester) [18:28:16] (03Merged) 10jenkins-bot: (Re-)enable VisualEditor for Wikitech (labswiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/158573 (owner: 10Jforrester) [18:28:33] (03PS4) 10Reedy: Update refresh-dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157485 (owner: 10Ori.livneh) [18:28:37] (03CR) 10Reedy: [C: 032] Update refresh-dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157485 (owner: 10Ori.livneh) [18:28:41] (03Merged) 10jenkins-bot: Update refresh-dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/157485 (owner: 10Ori.livneh) [18:29:02] * James_F nods. [18:29:55] James_F: Is https://gerrit.wikimedia.org/r/#/c/112590/ needed/wanted/cared about? 
[18:29:58] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [18:30:14] Reedy: It's… waiting. [18:30:29] Reedy: For community discussion that still isn't moving very quickly. [18:30:40] Reedy: I could abandon and re-do if you'd prefer/ [18:30:45] (03PS2) 10Jforrester: Disable 'beta' label in tab for the VE opt-in wiki (enwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/112590 (https://bugzilla.wikimedia.org/58583) [18:30:52] (03PS2) 10Reedy: Set $wgAbuseFilterAnonBlockDuration to 3 months on mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160286 (https://bugzilla.wikimedia.org/70828) (owner: 10Jackmcbarn) [18:30:56] (03CR) 10Reedy: [C: 032] Set $wgAbuseFilterAnonBlockDuration to 3 months on mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160286 (https://bugzilla.wikimedia.org/70828) (owner: 10Jackmcbarn) [18:31:01] (03Merged) 10jenkins-bot: Set $wgAbuseFilterAnonBlockDuration to 3 months on mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160286 (https://bugzilla.wikimedia.org/70828) (owner: 10Jackmcbarn) [18:32:08] !log reedy Synchronized wmf-config/: (no message) (duration: 00m 15s) [18:32:14] Logged the message, Master [18:36:11] PROBLEM - Host payments1003 is DOWN: PING CRITICAL - Packet loss = 100% [18:37:27] (03CR) 10Mark Bergsma: [C: 031] Added the bouncehandler router to catch in all bounce emails [puppet] - 10https://gerrit.wikimedia.org/r/155753 (owner: 1001tonythomas) [18:38:31] Jeff_Green: gerrit.wikimedia.org/r/#/c/155753/ ready to go ? [18:39:18] Jeff_Green: i take it you are wokring on that payments1003 server that just alerted? (or it is an actual issue?) 
[18:39:32] yeah, I think I give up on icinga [18:39:39] heh, cool just checkin [18:39:44] that's me rebooting it while it is marked down for maintenance [18:40:31] RECOVERY - Host payments1003 is UP: PING OK - Packet loss = 0%, RTA = 1.62 ms [18:44:25] (03PS1) 10RobH: typo in dhcp lease file entries for db2041 and db42 [puppet] - 10https://gerrit.wikimedia.org/r/160788 [18:45:24] (03CR) 10RobH: [C: 032] typo in dhcp lease file entries for db2041 and db42 [puppet] - 10https://gerrit.wikimedia.org/r/160788 (owner: 10RobH) [18:48:47] this appears to be a lie: "When a host or service is in a period of scheduled downtime, Nagios will not allow normal notifications to be sent out for the host or service." [18:50:25] payments1002 rebooting, expect a notification [18:51:34] PROBLEM - Host payments1002 is DOWN: PING CRITICAL - Packet loss = 100% [18:53:31] Jeff_Green: it may be that "host down/up" are not in the group "normal notifications" because host notification is different from a service notification [18:53:41] as in the services on payments1002 [18:54:20] but then.. it says "host or service" so hrrmmm [18:54:32] i dunno. i feel like it didn't used to do this [18:54:41] i can remember this working at some point though :p [18:54:51] at least on nagios [18:54:58] yeah [18:55:23] RECOVERY - Host payments1002 is UP: PING OK - Packet loss = 0%, RTA = 1.09 ms [18:55:50] you know....there's no checkbox for the 'host up' test itself [18:56:31] only for the passive checks [19:02:05] greg-g: Heads-up – we may need to do an emergency Parsoid deploy. [19:04:51] Reedy: Is the deploy window free now? RoanKattouw needs to revert this morning's SWAT for VE… [19:05:00] Ja [19:05:08] OK. RoanKattouw? [19:05:36] RoanKattouw: https://gerrit.wikimedia.org/r/#/c/160793/, https://gerrit.wikimedia.org/r/#/c/160794/ [19:05:40] James_F: wee [19:05:57] greg-g: Yay for corruptions in prod. 
:-( [19:06:22] OK I'll jump in there and do my stuff [19:08:09] WTF [19:08:12] What happened to tin [19:08:18] Why is /srv/mediawiki/php-1.24wmf20 not a git repo [19:08:35] RoanKattouw: /srv/mediawiki-staging [19:08:36] :D [19:08:40] Aha [19:08:51] Was this in Bryan's email and did I just miss it? [19:09:05] I think it was Ori's mail to ops-l [19:10:33] hah git submodule update with rebase is kind of broken for rollbacks [19:11:53] PROBLEM - Host payments1001 is DOWN: PING CRITICAL - Packet loss = 100% [19:15:21] RECOVERY - Host payments1001 is UP: PING OK - Packet loss = 0%, RTA = 0.76 ms [19:15:37] !log catrope Synchronized php-1.24wmf20/extensions/VisualEditor/: (no message) (duration: 00m 09s) [19:15:43] Logged the message, Master [19:15:59] RoanKattouw: Cool log message. [19:16:20] * aude had such fun on monday :p [19:16:31] yesterday [19:16:41] (03CR) 10Alexandros Kosiaris: use scap's embedded linking, remove lint script (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/160691 (https://bugzilla.wikimedia.org/68255) (owner: 10Filippo Giunchedi) [19:16:52] aude: When IE corrupts the page if you leave a patch out, and Firefox corrupts it if you leave it in… :-( [19:17:07] !log catrope Synchronized php-1.24wmf21/extensions/VisualEditor/: Revert IE hacks so Firefox will stop corrupting non-Latin characters (duration: 00m 06s) [19:17:10] omg [19:17:11] Logged the message, Master [19:17:38] aude: Yeah. Fun times. [19:17:46] good luck [19:27:50] James_F: on product duty for ever! :) [19:28:07] matanya: Ideally someone else will do that from time to time. 
:-) [19:34:03] (03PS1) 10Ottomata: Add nickel to $MONITORING_HOSTS network, rename ferm::rule icinga-all to monitoring-all [puppet] - 10https://gerrit.wikimedia.org/r/160802 [19:38:10] !log deployed patches for bugs 70469 and 70672 [19:38:15] Logged the message, Master [19:44:40] who can change user rights on ca.wikimedia (freshly installed) [19:45:01] i don't have permission to Special:UserRights itself [19:45:18] mutante: a steward? [19:45:23] how do we add the initial one [19:45:33] create and promote ? :) [19:45:37] maintenance script [19:45:46] don't know that works with central auth though [19:46:39] hmm, create and promote is a good hint though, thx [19:47:10] i think anyone can login [19:47:17] yes, login is no problem [19:47:24] then a steward or maybe staff cna assign bureaucrat [19:47:31] just now we want to give a person from Canada chapter "full admin" [19:47:40] and then they want to import [19:47:41] yep [19:47:43] from existing wiki [19:47:57] ok, thanks, i'll find a staff [19:48:08] stew* [19:48:09] :P [19:48:20] jeremyb: are you a steward? [19:48:25] no! [19:48:34] #wikimedia-stewards [19:48:50] 'k :) [19:49:42] * hoo is [19:49:45] hoo: ^ :) [19:49:53] you are steward, right? [19:49:58] Yes :) [19:54:02] thanks hoo :) [19:54:08] that worked nicely [19:55:14] Indeed :) [20:09:15] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: Epic puppet fail [20:15:21] (03PS5) 10Ottomata: Move make-instance-vol file into labs_lvm base class [puppet] - 10https://gerrit.wikimedia.org/r/160687 [20:15:25] YuviPanda: ^ [20:16:45] (03CR) 10Yuvipanda: [C: 031] "Looks sane, haven't tested." 
[puppet] - 10https://gerrit.wikimedia.org/r/160687 (owner: 10Ottomata) [20:29:25] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [20:34:44] (03CR) 10Dzahn: "lgtm, except you probably want to add a redirect from http to https , because now metrics.wm.org does that, and after this change you woul" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/160419 (owner: 10Filippo Giunchedi) [20:35:36] (03PS1) 10RobH: setting public ip for labcontrol2001 [dns] - 10https://gerrit.wikimedia.org/r/160815 [20:38:50] (03PS1) 10RobH: setting install params for labcontrol2001 [puppet] - 10https://gerrit.wikimedia.org/r/160816 [20:39:09] (03CR) 10RobH: [C: 032] setting public ip for labcontrol2001 [dns] - 10https://gerrit.wikimedia.org/r/160815 (owner: 10RobH) [20:39:50] (03CR) 10RobH: [C: 032] setting install params for labcontrol2001 [puppet] - 10https://gerrit.wikimedia.org/r/160816 (owner: 10RobH) [20:40:09] (03CR) 10Dzahn: [C: 031] "i take that back, i was confused by the redirect from wikimedia.org to wmflabs.org that happens here. it looks all good, that SSL config i" [puppet] - 10https://gerrit.wikimedia.org/r/160419 (owner: 10Filippo Giunchedi) [20:49:52] (03PS1) 10Dzahn: icinga - use apache::site [puppet] - 10https://gerrit.wikimedia.org/r/160820 [20:50:16] (03PS2) 10Dzahn: icinga - use apache::site [puppet] - 10https://gerrit.wikimedia.org/r/160820 [21:13:44] RoanKattouw: yurik has deployment access... just search his name on https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:13:52] (03CR) 10Hashar: [C: 04-1] "The Jenkins jobs doing php linting rely on parsekit / lint.php. I have to change them to use xargs | php -l" [puppet] - 10https://gerrit.wikimedia.org/r/160691 (https://bugzilla.wikimedia.org/68255) (owner: 10Filippo Giunchedi) [21:13:53] Right [21:14:09] legoktm, RoanKattouw, what did i do? 
[21:14:16] Quite probably nothing [21:14:20] see ops@ [21:14:30] yurikR: So for some reason I thought you didn't have deployment access. But apparently you do? [21:14:38] Am I just not staying on top of things? [21:14:48] RoanKattouw, i had it for the past year :))) [21:14:57] ever since i joined WMF [21:16:05] RoanKattouw, looking... [21:17:07] RoanKattouw: I was going to say... since when has yuri not had deploy rights? ;) [21:17:26] :D [21:17:27] (03CR) 10Jgreen: [C: 032 V: 031] add SPF record for donate.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/159489 (owner: 10Jgreen) [21:18:50] (03PS1) 10Legoktm: Enable UserMerge extension on all public wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160828 (https://bugzilla.wikimedia.org/68844) [21:19:17] (03CR) 10BBlack: [C: 032] De-duplicate various LVS ethernet fixes/optimizations across DCs [puppet] - 10https://gerrit.wikimedia.org/r/160826 (owner: 10BBlack) [21:19:34] (03Abandoned) 10Andrew Bogott: Enable the isolated hoss filter to prevent new hosts on virt1006 [puppet] - 10https://gerrit.wikimedia.org/r/160825 (owner: 10Andrew Bogott) [21:20:35] (03CR) 10Dzahn: "what hashar said, needs if ubuntu_version('>= trusty') {" [puppet] - 10https://gerrit.wikimedia.org/r/159226 (owner: 10Krinkle) [21:21:05] (03CR) 10Reedy: [C: 031] Enable UserMerge extension on all public wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/160828 (https://bugzilla.wikimedia.org/68844) (owner: 10Legoktm) [21:27:36] Reedy: Did you do something weird with ZeroBanner and ZeroPortal in wmf20, like pinning them? [21:27:37] RoanKattouw, Reedy, I am looking at the ZeroBanner: yurik@tin:/srv/mediawiki-staging/php-1.24wmf20/extensions/ZeroBanner$ git log 44f4d8..8c045 [21:27:49] RoanKattouw: Nope [21:27:59] Oooh aha [21:28:06] "Update ZeroPortal to master" [21:28:14] was an effective roll-back apparently [21:28:22] (03PS1) 10Andrew Bogott: Uncluded wikiupdates on all nova boxes. 
[puppet] - 10https://gerrit.wikimedia.org/r/160829 [21:28:48] And git submodule update in rebase mode is arguably broken in a way that makes rollbacks no-ops [21:28:49] oh, is reedy deploying right after me? [21:29:09] (03PS2) 10Andrew Bogott: Include wikiupdates on all nova boxes. [puppet] - 10https://gerrit.wikimedia.org/r/160829 [21:29:20] bah, who broke "sql centralauth" ? [21:29:24] could it be that my deploy and reedy's pending version switchover interacted? [21:29:46] second time lucky [21:29:48] (03CR) 10Andrew Bogott: [C: 032] Include wikiupdates on all nova boxes. [puppet] - 10https://gerrit.wikimedia.org/r/160829 (owner: 10Andrew Bogott) [21:29:58] RoanKattouw, was i deploying something wrong? [21:30:05] No it's my fault [21:32:25] PROBLEM - puppet last run on virt1000 is CRITICAL: CRITICAL: Epic puppet fail [21:33:25] PROBLEM - puppet last run on labnet1001 is CRITICAL: CRITICAL: Epic puppet fail [21:34:28] (03PS1) 10Andrew Bogott: Use require_package('python-mwclient') to solve a duplicate package def [puppet] - 10https://gerrit.wikimedia.org/r/160832 [21:35:06] ori_: quick review? 
https://gerrit.wikimedia.org/r/#/c/160832/ [21:36:03] !log SPF record deployed for donate.wikimedia.org [21:36:08] Logged the message, Master [21:36:57] (03CR) 10Andrew Bogott: [C: 032] Use require_package('python-mwclient') to solve a duplicate package def [puppet] - 10https://gerrit.wikimedia.org/r/160832 (owner: 10Andrew Bogott) [21:38:31] well that did not help at all [21:39:26] PROBLEM - puppet last run on virt0 is CRITICAL: CRITICAL: Epic puppet fail [21:40:00] (03CR) 10Dzahn: "About 42,700 results for linkto:labsconsole.wikimedia.org" [dns] - 10https://gerrit.wikimedia.org/r/160454 (owner: 10Filippo Giunchedi) [21:43:18] (03PS1) 10Andrew Bogott: Revert "Use require_package('python-mwclient') to solve a duplicate package def" [puppet] - 10https://gerrit.wikimedia.org/r/160834 [21:43:56] (03CR) 10jenkins-bot: [V: 04-1] Revert "Use require_package('python-mwclient') to solve a duplicate package def" [puppet] - 10https://gerrit.wikimedia.org/r/160834 (owner: 10Andrew Bogott) [21:44:22] (03PS2) 10Andrew Bogott: Revert "Use require_package('python-mwclient') to solve a duplicate package def" [puppet] - 10https://gerrit.wikimedia.org/r/160834 [21:44:49] (03CR) 10Dzahn: [C: 031] admin: add subbu and gwicke to ocg-render-admins [puppet] - 10https://gerrit.wikimedia.org/r/160497 (owner: 10Matanya) [21:45:09] (03CR) 10Andrew Bogott: [C: 032] Revert "Use require_package('python-mwclient') to solve a duplicate package def" [puppet] - 10https://gerrit.wikimedia.org/r/160834 (owner: 10Andrew Bogott) [21:47:24] (03PS1) 10Andrew Bogott: Remove the dependency on the compute service for wikiupdates. [puppet] - 10https://gerrit.wikimedia.org/r/160836 [21:48:45] (03CR) 10Andrew Bogott: [C: 032] Remove the dependency on the compute service for wikiupdates. 
[puppet] - 10https://gerrit.wikimedia.org/r/160836 (owner: 10Andrew Bogott) [21:49:46] RECOVERY - puppet last run on labnet1001 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:51:11] (03PS1) 10BBlack: fix interface::txqueuelen name param [puppet] - 10https://gerrit.wikimedia.org/r/160841 [21:51:13] (03PS1) 10BBlack: interface-tweaks consistency [puppet] - 10https://gerrit.wikimedia.org/r/160842 [21:51:43] (03PS2) 10BBlack: fix interface::txqueuelen name param [puppet] - 10https://gerrit.wikimedia.org/r/160841 [21:51:46] RECOVERY - puppet last run on virt1000 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [21:51:50] (03CR) 10BBlack: [C: 032 V: 032] fix interface::txqueuelen name param [puppet] - 10https://gerrit.wikimedia.org/r/160841 (owner: 10BBlack) [21:51:59] (03PS2) 10BBlack: interface-tweaks consistency [puppet] - 10https://gerrit.wikimedia.org/r/160842 [21:52:04] (03CR) 10BBlack: [C: 032 V: 032] interface-tweaks consistency [puppet] - 10https://gerrit.wikimedia.org/r/160842 (owner: 10BBlack) [21:59:35] RECOVERY - puppet last run on virt0 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [22:02:48] (03CR) 10Dzahn: [C: 04-1] "are we actually fixing something by this that is worth breaking the old links?" [dns] - 10https://gerrit.wikimedia.org/r/160454 (owner: 10Filippo Giunchedi) [22:05:06] (03CR) 10Dzahn: ".. and removed :)" [dns] - 10https://gerrit.wikimedia.org/r/156596 (owner: 10Reedy) [22:09:57] (03PS3) 10Dzahn: Puppetize icinga log file permission fix. [puppet] - 10https://gerrit.wikimedia.org/r/158633 (owner: 10JanZerebecki) [22:10:57] (03CR) 10Dzahn: [C: 032] Puppetize icinga log file permission fix. 
[puppet] - 10https://gerrit.wikimedia.org/r/158633 (owner: 10JanZerebecki) [22:16:56] (03CR) 10Dzahn: "Notice: /Stage[main]/Icinga::Monitor::Files::Misc/File[/var/log/icinga]/mode: mode changed '2757' to '2755'" [puppet] - 10https://gerrit.wikimedia.org/r/158633 (owner: 10JanZerebecki) [22:18:50] jzerebecki: hello? [22:21:45] (03PS1) 10BBlack: add codfw to runcommand [puppet] - 10https://gerrit.wikimedia.org/r/160844 [22:21:47] (03PS1) 10BBlack: add recdns to codfw pybal [puppet] - 10https://gerrit.wikimedia.org/r/160845 [22:22:01] CUSTOM - DPKG on mchenry is UNKNOWN: NRPE: Unable to read output [22:22:04] (03CR) 10BBlack: [C: 032 V: 032] add codfw to runcommand [puppet] - 10https://gerrit.wikimedia.org/r/160844 (owner: 10BBlack) [22:22:13] (03CR) 10BBlack: [C: 032 V: 032] add recdns to codfw pybal [puppet] - 10https://gerrit.wikimedia.org/r/160845 (owner: 10BBlack) [22:29:23] CUSTOM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 47 data above and 0 below the confidence bounds [22:32:16] (03PS1) 10Andrew Bogott: Tell palladium about all the domains [puppet] - 10https://gerrit.wikimedia.org/r/160848 [22:35:45] (03CR) 10RobH: [C: 031] Tell palladium about all the domains [puppet] - 10https://gerrit.wikimedia.org/r/160848 (owner: 10Andrew Bogott) [22:37:49] (03CR) 10Dzahn: "there is also esams.wmnet" [puppet] - 10https://gerrit.wikimedia.org/r/160848 (owner: 10Andrew Bogott) [22:40:43] (03CR) 10Dzahn: "hey Scott, since there was no recent reply on this i'm being bold and abandon it. but don't get me wrong, if you ever want to work on it a" [dns] - 10https://gerrit.wikimedia.org/r/147168 (https://bugzilla.wikimedia.org/68769) (owner: 10Scottlee) [22:40:51] (03Abandoned) 10Dzahn: Fixed spacing. 
[dns] - 10https://gerrit.wikimedia.org/r/147168 (https://bugzilla.wikimedia.org/68769) (owner: 10Scottlee) [22:46:01] (03PS2) 10Dzahn: Various tweaks to people.wikimedia.org index page [puppet] - 10https://gerrit.wikimedia.org/r/160383 (owner: 10MZMcBride) [22:47:24] (03CR) 10Dzahn: [C: 032] "rebased, trivial changes left, we were already using the thumbnail" [puppet] - 10https://gerrit.wikimedia.org/r/160383 (owner: 10MZMcBride) [22:53:59] (03PS1) 10BBlack: shift codfw LVS DNS over to private [dns] - 10https://gerrit.wikimedia.org/r/160851 [22:54:01] (03PS1) 10BBlack: move baham to private subnet [dns] - 10https://gerrit.wikimedia.org/r/160852 [22:54:19] (03PS1) 10BBlack: move lvs200x to private subnet [puppet] - 10https://gerrit.wikimedia.org/r/160853 [22:54:22] (03PS1) 10BBlack: move baham to private subnet [puppet] - 10https://gerrit.wikimedia.org/r/160854 [23:00:04] RoanKattouw, ^d, marktraceur, MaxSem: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140916T2300). Please do the needful. [23:00:27] Who has it? :) [23:00:30] eh,nothing to deploy? [23:00:36] * MaxSem bites hoo [23:00:39] :D [23:00:48] Am I alone? [23:00:58] ircknick. [23:01:13] {{fixed}} [23:01:31] * hoo goes to prepare the submodule updates [23:01:35] jouncebot needs {{PLURAL}} support [23:01:47] so, icinga-wm isnt talking currently because of issue on neon..sigh [23:01:58] and no sign of superm401 with his patches too [23:01:59] but jouncebot shouldnt be influenced, heh [23:02:13] Run git submodule update for GettingStarted in wmf21, was forgotten last week [23:02:17] :P [23:02:23] I don't think he wants a patch [23:03:26] (03PS1) 10Dzahn: people.wm - add line breaks [puppet] - 10https://gerrit.wikimedia.org/r/160857 [23:03:46] MaxSem, it's https://gerrit.wikimedia.org/r/#/c/160084/ [23:03:56] thx:) [23:03:58] Just for that branch (wmf21). I accidentally forgot to do the submodule update when I deployed it. 
[23:05:25] (03CR) 10Dzahn: [C: 032] "_yes_ the trailing whitespace actually belongs there, only adds line breaks, no other changes" [puppet] - 10https://gerrit.wikimedia.org/r/160857 (owner: 10Dzahn) [23:07:06] whoever is doing SWAT: https://gerrit.wikimedia.org/r/160860 and https://gerrit.wikimedia.org/r/160858 [23:07:27] !log maxsem Synchronized php-1.24wmf21/extensions/GettingStarted: https://gerrit.wikimedia.org/r/#/c/160084/ (duration: 00m 08s) [23:07:34] Logged the message, Master [23:07:42] superm401, ^^ please test:) [23:08:30] hoo, any special lore needed for those? [23:08:36] MaxSem, I can't, it's not enabled on any non-Wikipedias. The version we already deployed to 1.24wmf20 seems to have worked though (and was tested locally and has unit tests). [23:08:48] MaxSem: No, only CSS changes [23:08:59] might need an additional touch if RL acts up, though [23:09:03] hoo, okie [23:09:48] superm401, okay - but then I can't be held liable if wiki dies:) [23:09:55] MaxSem, deal. [23:09:57] MaxSem: Still SWATting? [23:10:15] I have another commit for SWAT if that's OK [23:10:57] aha [23:11:21] (03PS1) 10Dzahn: icinga - partly revert logfile permission fix [puppet] - 10https://gerrit.wikimedia.org/r/160864 [23:12:10] !log maxsem Synchronized php-1.24wmf20/extensions/Wikidata: (no message) (duration: 00m 17s) [23:12:16] Logged the message, Master [23:12:16] hoo, ^ [23:12:34] works right ahead... 
nice :) [23:13:11] (03CR) 10Dzahn: [C: 032] "nice or not, but this is just how it was before the recent change" [puppet] - 10https://gerrit.wikimedia.org/r/160864 (owner: 10Dzahn) [23:15:32] MaxSem: don't forget the other branch [23:15:41] !log Wikidata submodule in wmf21 was in the middle of rebase - reset and updating to a newer submodule commit [23:15:47] Logged the message, Master [23:15:53] oh [23:16:57] !log maxsem Synchronized php-1.24wmf21/extensions/Wikidata: (no message) (duration: 00m 24s) [23:17:01] hoo, ^ [23:17:03] Logged the message, Master [23:17:37] thanks, MaxSem :) [23:17:44] MaxSem: Calendar has the new VE patches finally. Sorry! [23:18:09] (03CR) 10Dzahn: "it's a bit sad that the puppet compiler links are already 404" [puppet] - 10https://gerrit.wikimedia.org/r/116064 (owner: 10Dzahn) [23:19:18] James_F, https://gerrit.wikimedia.org/r/#/c/160869/ is a VE commit - should I merge it? [23:19:40] MaxSem: Sorry, fixed to 160870. [23:19:55] MaxSem: Roan's merging them all for you; just a pull to head of the branch should be sufficient. [23:20:01] RoanKattouw: ^^^ Please confirm. [23:20:05] Yes please merge 160870 [23:20:17] The other one was 160868 which MaxSem has kindly already merged [23:20:30] I've already done 160869 [23:20:40] okay, does everything look merged now? :) [23:20:47] Yes [23:20:49] Thanks [23:21:02] Yay MaxSem. 
:-) [23:21:35] PROBLEM - CI tmpfs disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /var/lib/jenkins-slave/tmpfs 29 MB (5% inode=99%): [23:21:51] !log maxsem Synchronized php-1.24wmf20/extensions/VisualEditor/: (no message) (duration: 00m 04s) [23:21:56] Logged the message, Master [23:22:26] !log maxsem Synchronized php-1.24wmf21/extensions/VisualEditor/: (no message) (duration: 00m 04s) [23:22:31] Logged the message, Master [23:22:35] Thanks MaxSem [23:22:41] please test:) [23:22:44] Yeah [23:22:49] I'll need to test this in three browsers [23:22:59] (03CR) 10Dzahn: "i wonder if interface::add_ip6_mapped will update the IPv6 address if the v4 address changes with this" [puppet] - 10https://gerrit.wikimedia.org/r/160854 (owner: 10BBlack) [23:24:18] Works in Chrome [23:24:37] (03CR) 10MZMcBride: "Thanks much. <3" [puppet] - 10https://gerrit.wikimedia.org/r/160383 (owner: 10MZMcBride) [23:25:46] (03CR) 10Dzahn: "yay, thanks" [puppet] - 10https://gerrit.wikimedia.org/r/160383 (owner: 10MZMcBride) [23:26:15] PROBLEM - Disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /var/lib/jenkins-slave/tmpfs 15 MB (2% inode=99%): [23:27:32] Works in Firefix [23:29:05] RoanKattouw: Freudian. ;-) [23:29:38] :) [23:29:49] I'll test in IE once my Windows laptop finishes installing updates and rebooting [23:30:54] RoanKattouw: So… tomorrow? [23:31:17] eww, why you're not using a VM?:) [23:31:20] (03CR) 10BBlack: [C: 032] shift codfw LVS DNS over to private [dns] - 10https://gerrit.wikimedia.org/r/160851 (owner: 10BBlack) [23:31:54] (03PS2) 10BBlack: move lvs200x to private subnet [puppet] - 10https://gerrit.wikimedia.org/r/160853 [23:32:01] (03CR) 10BBlack: [C: 032 V: 032] move lvs200x to private subnet [puppet] - 10https://gerrit.wikimedia.org/r/160853 (owner: 10BBlack) [23:32:33] IE works fine too [23:32:40] I tested with [[fr:Exotoxine]]