[00:00:04] twentyafterfour: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150625T0000).
[00:00:39] (PS2) Jalexander: Add exception for ALA hackathon at WMF Office [mediawiki-config] - https://gerrit.wikimedia.org/r/220653 (https://phabricator.wikimedia.org/T103764)
[00:01:36] (PS1) Dzahn: backup home dirs on bastion hosts per role [puppet] - https://gerrit.wikimedia.org/r/220657
[00:04:00] PROBLEM - jmxtrans on analytics1022 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args -jar.+jmxtrans-all.jar
[00:04:40] PROBLEM - jmxtrans on analytics1021 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args -jar.+jmxtrans-all.jar
[00:05:00] PROBLEM - jmxtrans on analytics1018 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args -jar.+jmxtrans-all.jar
[00:05:21] PROBLEM - jmxtrans on analytics1012 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args -jar.+jmxtrans-all.jar
[00:05:22] (PS1) Legoktm: contint: Create symlink for composer in /usr/local/bin/ [puppet] - https://gerrit.wikimedia.org/r/220658
[00:06:05] Blocked-on-Operations, operations, Phabricator, Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1398607 (mmodell)
[00:07:11] (PS1) Krinkle: contint: Fix syntax error in localvhost documentation [puppet] - https://gerrit.wikimedia.org/r/220661
[00:08:00] Phabricator upgrade happening now. Will be down for a few minutes.
[00:08:31] (CR) Legoktm: [C: 1] contint: Fix syntax error in localvhost documentation [puppet] - https://gerrit.wikimedia.org/r/220661 (owner: Krinkle)
[00:09:20] RECOVERY - jmxtrans on analytics1022 is OK: PROCS OK: 1 process with command name java, regex args -jar.+jmxtrans-all.jar
[00:10:57] 500 error on Phabricator. Known?
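The jmxtrans alerts above come from a process-count check: it counts processes whose command name is `java` and whose argument string matches the regex `-jar.+jmxtrans-all.jar`, and reports CRITICAL when that count is 0. A minimal Python sketch of that matching logic, assuming the process table is already available as (command name, argument string) pairs and treating any count of at least one as OK (the real threshold is not shown in the log); `count_matching` and `check_jmxtrans` are hypothetical names, not the actual Icinga plugin:

```python
import re
from typing import Iterable, Tuple

# Pattern copied verbatim from the alert text. Note the unescaped "." before
# "jar" matches any character, exactly as the original regex does.
ARGS_RE = re.compile(r"-jar.+jmxtrans-all.jar")


def count_matching(procs: Iterable[Tuple[str, str]], command: str = "java") -> int:
    """Count processes whose command name equals `command` and whose args match ARGS_RE."""
    return sum(1 for name, args in procs if name == command and ARGS_RE.search(args))


def check_jmxtrans(procs: Iterable[Tuple[str, str]]) -> str:
    """Return an Icinga-style status line; the ">= 1 is OK" threshold is an assumption."""
    n = count_matching(procs)
    state = "OK" if n >= 1 else "CRITICAL"
    noun = "process" if n == 1 else "processes"
    return (f"PROCS {state}: {n} {noun} with command name java, "
            f"regex args -jar.+jmxtrans-all.jar")
```

With an empty process list this reproduces the CRITICAL text from the alerts; with one matching `java -jar .../jmxtrans-all.jar` entry it produces the RECOVERY text.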
[00:11:01] 1:07 Phabricator upgrade happening now. Will be down for a few minutes.
[00:11:10] K thanks
[00:11:26] shoulda known about that
[00:12:16] !log Phabricator upgrade happening now. Will be down for a few minutes.
[00:12:22] Logged the message, Master
[00:14:00] legoktm: is logging necessary when it's a scheduled maintenance window? I can log it next time if that's appropriate thing to do, just didn't know that it was
[00:14:55] twentyafterfour: if there's going to be downtime, I think it's helpful
[00:15:13] (PS1) Rush: confd: cleanup template and toml dirs [puppet] - https://gerrit.wikimedia.org/r/220664
[00:15:16] twentyafterfour: yeah even a simple 'restarting phabricator for maintenance' is good practice
[00:15:26] the upgrade would be much faster if I didn't make a full backup each time. I assume we already have scheduled backups for phabricator, how would I find out when the last backup ran (so I can decide whether it's necessary to make another one before the upgrade... or perhaps schedule the automatic backup to happen immediately before the upgrade window)
[00:15:41] RECOVERY - jmxtrans on analytics1018 is OK: PROCS OK: 1 process with command name java, regex args -jar.+jmxtrans-all.jar
[00:15:42] godog: legoktm: thanks
[00:15:45] I'll log it next time
[00:15:51] twentyafterfour: that's mostly my early paranoia / always we were doing major things
[00:15:57] sean would know the schedule
[00:16:01] RECOVERY - jmxtrans on analytics1012 is OK: PROCS OK: 1 process with command name java, regex args -jar.+jmxtrans-all.jar
[00:16:03] but I think we have an 8 hour replication backup?
[00:16:08] should ask
[00:16:32] chasemp: considering that the upgrade usually applies schema changes, it's probably a good idea to make the full dump but it does take a while
[00:16:38] which script runs the backups (again?)
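twentyafterfour's question, whether a fresh full dump is needed before each upgrade, reduces to an age check on the newest existing dump. A minimal sketch, assuming dumps land as `*.sql.gz` files in a single directory (the `/srv/dumps` location is an assumption) and using a hypothetical 8-hour threshold taken from the unconfirmed "8 hour replication backup" guess above. Since upgrades usually apply schema changes, a caller may still choose to always dump:

```python
import glob
import os
from typing import Optional

DUMP_GLOB = "/srv/dumps/*.sql.gz"  # assumed location and naming of existing dumps
MAX_AGE_SECONDS = 8 * 3600         # hypothetical threshold; "8 hour" was only a guess in channel


def newest_dump_mtime(pattern: str = DUMP_GLOB) -> Optional[float]:
    """Return the modification time of the newest matching dump, or None if there is none."""
    mtimes = [os.path.getmtime(p) for p in glob.glob(pattern)]
    return max(mtimes) if mtimes else None


def needs_fresh_dump(newest_mtime: Optional[float], now: float,
                     max_age: float = MAX_AGE_SECONDS) -> bool:
    """True if there is no dump yet, or the newest one is older than max_age seconds."""
    if newest_mtime is None:
        return True
    return now - newest_mtime > max_age
```

For example, `needs_fresh_dump(newest_dump_mtime(), time.time())` before the upgrade window. Keeping the age check separate from the filesystem lookup makes it easy to reuse if the backup schedule turns out to be different.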
[00:16:57] it's gotten a lot longer mainly
[00:17:01] RECOVERY - jmxtrans on analytics1021 is OK: PROCS OK: 1 process with command name java, regex args -jar.+jmxtrans-all.jar
[00:17:07] which of course makes sense
[00:17:42] Negative24: manually running phabricator/bin/storage dump | gzip > /srv/dumps/$DATESTAMP.sql.gz
[00:17:54] we have a lot (a lot, a lot) more tasks than upstream's I noticed
[00:18:10] twentyafterfour: ah thanks
[00:19:10] twentyafterfour: is that mentioned in any docs?
[00:19:28] hey my icinga scheduled maintenance window actually worked, huh? I wish I could schedule it as a recurring window in icinga
[00:20:01] Negative24: not really that I know of, I was just contemplating that I should document this stuff (I'm operating off a rough set of notes I got from chasemp)
[00:20:11] (PS1) Krinkle: contint: Rename 'qunit' localvhost to 'mediawiki' [puppet] - https://gerrit.wikimedia.org/r/220666 (https://phabricator.wikimedia.org/T103766)
[00:20:23] (PS2) BBlack: Get rid of unused director_options [puppet] - https://gerrit.wikimedia.org/r/220643
[00:20:23] Krinkle: local*v*host?
[00:20:25] (PS2) BBlack: Get rid of the default_backend setting [puppet] - https://gerrit.wikimedia.org/r/220642
[00:20:27] (PS5) BBlack: move text backend_random into "directors" [puppet] - https://gerrit.wikimedia.org/r/220645
[00:20:29] (PS5) BBlack: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644
[00:20:29] James_F: Yep
[00:20:39] Krinkle: Not "local vhost"?
[00:20:53] James_F: It's the name of the puppet entity.
[00:20:57] Oy.
[00:21:10] twentyafterfour: I was thinking the same thing. I always look at phabricator on WT and see docs from pre-prod phab time there
[00:21:24] (CR) jenkins-bot: [V: -1] move text backend_random into "directors" [puppet] - https://gerrit.wikimedia.org/r/220645 (owner: BBlack)
[00:21:26] (CR) jenkins-bot: [V: -1] restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[00:21:35] twentyafterfour: it did work, didnt get an SMS
[00:22:03] twentyafterfour: it's actually possible, we have a script on neon to schedule downtimes, it would just need a cronjob to run it
[00:22:10] https://gerrit.wikimedia.org/r/#/q/project:operations/mediawiki-config+-ownerin:wmf-deployment+status:open+-label:Code-Review%253C%253D-1,n,z is longer than it should be.
[00:22:28] Who in ops needs to get https://gerrit.wikimedia.org/r/#/c/139581/6 in?
[00:22:40] mutante: bblack: Can we make grafana.wikimedia.org https-only?
[00:22:41] (CR) Rush: [C: 2] confd: cleanup template and toml dirs [puppet] - https://gerrit.wikimedia.org/r/220664 (owner: Rush)
[00:22:43] Also, Gitblit's broken again (TM)
[00:23:18] Krinkle: afair, we can't
[00:23:23] og?
[00:23:26] interesting
[00:23:28] why not?
[00:28:03] !log upgrade cassandra to 2.1.7 on restbase1004
[00:28:09] Logged the message, Master
[00:29:08] Krinkle: ok, so it was graphite, not grafana, but i meant this: https://gerrit.wikimedia.org/r/#/c/198564/
[00:29:14] see the comments there
[00:29:51] also https://gerrit.wikimedia.org/r/#/c/98003/
[00:29:53] hmm getting 503 error on https://phabricator.wikimedia.org/ ?
[00:30:01] Jamesofur|cloud: Not any more?
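The manual backup step quoted above (`phabricator/bin/storage dump | gzip > /srv/dumps/$DATESTAMP.sql.gz`) can be scripted so it is repeatable before each upgrade window. A minimal Python sketch, assuming `bin/storage dump` writes SQL to stdout as in that pipeline; the `$DATESTAMP` format, the `dump_path` and `run_dump` names, and the write-to-temp-then-rename behaviour are assumptions, not the documented procedure:

```python
import datetime as dt
import gzip
import os
import shutil
import subprocess

DUMP_DIR = "/srv/dumps"  # directory used in the manual command


def dump_path(now: dt.datetime, dump_dir: str = DUMP_DIR) -> str:
    """Build <dump_dir>/<DATESTAMP>.sql.gz; the DATESTAMP format here is an assumption."""
    return os.path.join(dump_dir, now.strftime("%Y%m%d-%H%M%S") + ".sql.gz")


def run_dump(storage_bin: str, out_path: str) -> None:
    """Equivalent of `<storage_bin> dump | gzip > out_path`, failing loudly on errors."""
    tmp_path = out_path + ".partial"  # never leave a truncated dump under the final name
    # Popen's context manager closes stdout and waits for the process on exit.
    with subprocess.Popen([storage_bin, "dump"], stdout=subprocess.PIPE) as proc:
        with gzip.open(tmp_path, "wb") as out:
            shutil.copyfileobj(proc.stdout, out)  # stream; don't buffer the whole dump
    if proc.returncode != 0:
        os.remove(tmp_path)
        raise RuntimeError(f"{storage_bin} dump exited with status {proc.returncode}")
    os.replace(tmp_path, out_path)
```

For example, `run_dump("/path/to/phabricator/bin/storage", dump_path(dt.datetime.utcnow()))`, where the path is a placeholder. Unlike a plain shell pipeline without `pipefail`, this notices when the dump command itself fails.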
[00:30:08] https://wikitech.wikimedia.org/wiki/Httpsless_domains needs an update
[00:30:17] Jamesofur|cloud: upgrade window right now
[00:30:25] true not anymore but it was as I clicked it :)
[00:30:32] (as I linked it that is)
[00:30:34] ah
[00:30:35] * greg-g wishes there was a Special:Version in phab
[00:30:40] !log phabricator upgrade completed
[00:30:44] :)
[00:30:46] Logged the message, Master
[00:31:02] greg-g: there is
[00:31:41] (PS21) Dzahn: Add json, erb and less highlight support to gitblit [puppet] - https://gerrit.wikimedia.org/r/216421 (owner: Paladox)
[00:31:54] oh?
[00:33:20] hmm, well there is in the log files at least - a hash of every repo's HEAD is included in every log, and I thought it was in the footer but I guess our footer overrides it
[00:33:31] mutante: Right. But other front-ends can be made https only
[00:33:56] greg-g: here you go: https://phabricator.wikimedia.org/config/all/ <- can you see that page?
[00:33:58] and any internal use should probably connect to the server directly instead of going through the front - or use https as well.
[00:34:29] (PS1) Rush: pybal: source pools from etcd for lvs2004 [puppet] - https://gerrit.wikimedia.org/r/220670
[00:34:33] Krinkle: likely, yea
[00:34:52] mutante: grafana is almost entirely unused. But I hope to use it more in the future.
[00:35:01] Better to make it https early :)
[00:35:17] phabricator has some kind of failures about sending out mail
[00:35:20] https://phabricator.wikimedia.org/daemon/
[00:35:34] twentyafterfour: 403
[00:35:37] PhabricatorMetaMTAWorker
[00:35:46] ./bin/mail show-outbound -- ...
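Making a proxied front-end HTTPS-only, as discussed above for grafana, has one trap: behind a TLS-terminating proxy the backend only ever sees plain HTTP, so a naive "redirect whenever the request is not HTTPS" rule loops forever. A minimal Python sketch of the usual guard, assuming the front-end passes the original scheme in an `X-Forwarded-Proto` header (the header convention and the `https_redirect` helper are assumptions, not the actual misc-web configuration):

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit


def https_redirect(url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the https:// URL to redirect to, or None if the client already used HTTPS.

    The original scheme is taken from X-Forwarded-Proto when present (assumed to
    be set by the TLS-terminating proxy), otherwise from the request URL itself.
    Redirecting only when that scheme is not "https" is what prevents the loop.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    parts = urlsplit(url)
    proto = lowered.get("x-forwarded-proto", parts.scheme).lower()
    if proto == "https":
        return None
    return parts._replace(scheme="https").geturl()
```

A backend would answer with a 301 and this URL as `Location` whenever the function returns a value, and serve the page normally when it returns `None`.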
[00:35:51] (PS2) Rush: pybal: source pools from etcd for lvs2004 [puppet] - https://gerrit.wikimedia.org/r/220670
[00:35:58] mutante: I'll check it out
[00:36:10] (CR) Rush: [C: 2 V: 2] pybal: source pools from etcd for lvs2004 [puppet] - https://gerrit.wikimedia.org/r/220670 (owner: Rush)
[00:36:17] greg-g: :-/ well that page has the version info
[00:37:33] twentyafterfour: oh well
[00:37:47] Krinkle: yep, i wouldn't know where the config goes though, since the Apache site is just a Proxy setup
[00:38:01] mutante: that looks like normal smtp failures to me
[00:38:04] config for http->https redirect
[00:38:27] !log upgrade cassandra to 2.1.7 on restbase1008
[00:38:33] Logged the message, Master
[00:39:08] (PS6) BBlack: move text backend_random into "directors" [puppet] - https://gerrit.wikimedia.org/r/220645
[00:39:10] (PS6) BBlack: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644
[00:39:13] twentyafterfour: if it's normal, cool, i had not watched that "daemons" page before
[00:39:59] (CR) jenkins-bot: [V: -1] move text backend_random into "directors" [puppet] - https://gerrit.wikimedia.org/r/220645 (owner: BBlack)
[00:40:01] (CR) jenkins-bot: [V: -1] restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[00:40:06] (PS22) Dzahn: Add json, erb and less highlight support to gitblit [puppet] - https://gerrit.wikimedia.org/r/216421 (owner: Paladox)
[00:40:26] mutante: varnish?
[00:41:28] let's replace nginx and apache with varnish as well. It's faster right ?
[00:41:29] :D
[00:42:04] Krinkle: maybe, gotta ask bblack. for other services behind misc-web we don't, the proto redirect is in the backend, with a condition that it's not https already to avoid a loop
[00:42:23] mutante: yeah. same for doc.wm.o
[00:42:25] and integration.wm.o
[00:42:49] those are already enforcing https
[00:44:00] if it was just a regular Apache site we'd just paste the same config snippet we use all the time, yep
[00:44:29] (CR) Legoktm: [C: 1] Remove dependency on echowikis.dblist [puppet] - https://gerrit.wikimedia.org/r/139581 (https://phabricator.wikimedia.org/T59375) (owner: Withoutaname)
[00:45:00] (CR) Dzahn: [C: 2] "it's gitblit, not much to break" [puppet] - https://gerrit.wikimedia.org/r/216421 (owner: Paladox)
[00:45:29] mutante: do we have a list of what's left on misc that isn't https?
[00:45:56] bblack: we used to have https://wikitech.wikimedia.org/wiki/Httpsless_domains but really outdated
[00:45:57] it wouldn't be hard to blanket-cover it in varnish like we do for prod, and then pull all the disparate apache conditionals
[00:46:00] mutante: somewhat normal, there should not be a lot of failed jobs but a few MTA failures isn't really unusual
[00:46:22] (if there's not some we can't yet convert, which I think might be the case)
[00:47:21] (CR) Paladox: "Thanks." [puppet] - https://gerrit.wikimedia.org/r/216421 (owner: Paladox)
[00:47:42] bblack: i can at least speak for zirconium, which is backend for a bunch of misc. i just cleaned that up today and it doesnt listen on 443 for anything anymore
[00:49:31] well sure, that's I guess a different part of cleanup though (killing direct HTTPS on machines already behind misc)
[00:50:04] what we need to inventory at this point is "which misc services do not have apache configs that force HTTPS? (or equivalent for non-apache)"
[00:51:51] ganglia, grafana
[00:52:08] thanks to all the work on https://phabricator.wikimedia.org/T40516 most have been checked
[00:52:13] when HSTS was enabled
[00:52:34] i can make it a ticket for me to compile double-checked list
[00:53:48] torrus, because "monitoring tool"
[00:54:05] also, torrus doesnt have valid cert
[00:54:47] mutante: I meant, the ones actually behind misc-web cluster, specifically (because if none left there, we could turn it on at varnish level for them all)
[00:55:22] bblack: ok, i'll find out
[00:55:55] oh, that made me notice our status page is not showing data
[00:56:01] http://status.wikimedia.org/
[00:56:17] I don't know how many things there are that are currently non-misc-web that could be, we've gotten a lot of the low-hanging fruit there. And so yeah, the non-misc-web ones we have to audit one by one I guess
[00:57:14] (PS1) BryanDavis: Cleanup imports [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220673
[00:57:16] (PS1) BryanDavis: Use collections.defaultdict [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220674
[00:57:18] (PS1) BryanDavis: Force YAML strings to be unicode [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220675
[00:57:20] (PS1) BryanDavis: Exit cleanly on ctrl-C [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220676
[00:57:22] (PS1) BryanDavis: Add a debugging command [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220677
[00:57:24] (PS1) BryanDavis: Give separate notifications to deployers and patch owners [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220678 (https://phabricator.wikimedia.org/T101329)
[00:57:32] (but like you said, that's already largely audited at this point)
[00:58:53] (PS2) BryanDavis: Give separate notifications to deployers and patch owners [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220678 (https://phabricator.wikimedia.org/T101329)
[01:01:50] (CR) Dzahn: ".erb looks highlighted :)" [puppet] - https://gerrit.wikimedia.org/r/216421 (owner: Paladox)
[01:04:02] (CR) Alex Monk: [C: 2] Cleanup imports [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220673 (owner: BryanDavis)
[01:04:22] (CR) Paladox: "Hi yes added that support to with this patch. Json and less are highlighting too." [puppet] - https://gerrit.wikimedia.org/r/216421 (owner: Paladox)
[01:04:58] (CR) Alex Monk: [C: 2] Use collections.defaultdict [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220674 (owner: BryanDavis)
[01:05:35] (Merged) jenkins-bot: Cleanup imports [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220673 (owner: BryanDavis)
[01:06:37] (Merged) jenkins-bot: Use collections.defaultdict [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220674 (owner: BryanDavis)
[01:07:26] Krenair: thanks!
[01:08:52] bd808, for https://gerrit.wikimedia.org/r/#/c/220676/1/jouncebot.py shouldn't it exit(0) and not say 'Unhandled exception' for KeyboardInterrupt?
[01:09:20] Krenair: hmmm... yeah that would be nicer I suppose
[01:09:31] I can amend
[01:09:46] the deploy_page.stop() is the important part
[01:10:29] (PS7) BBlack: move text backend_random into "directors" [puppet] - https://gerrit.wikimedia.org/r/220645
[01:10:31] (PS7) BBlack: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644
[01:11:55] (PS2) BryanDavis: Exit cleanly on ctrl-C [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220676
[01:13:29] (PS4) Ori.livneh: Add Pyglet, a Trebuchet-deployed syntax-highlighting micro-service(!) [puppet] - https://gerrit.wikimedia.org/r/220641
[01:20:38] (CR) Alex Monk: [C: 2] Exit cleanly on ctrl-C [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220676 (owner: BryanDavis)
[01:26:03] (CR) Alex Monk: [C: 2] Give separate notifications to deployers and patch owners [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220678 (https://phabricator.wikimedia.org/T101329) (owner: BryanDavis)
[01:32:18] (CR) Krinkle: [C: -1] "Gonna rename further to not be mediawiki specific either. I plan to use this for other http needs as well. E.g. mounting static files from" [puppet] - https://gerrit.wikimedia.org/r/220666 (https://phabricator.wikimedia.org/T103766) (owner: Krinkle)
[01:39:27] (PS2) Krinkle: contint: Rename 'qunit' localvhost to 'worker' [puppet] - https://gerrit.wikimedia.org/r/220666 (https://phabricator.wikimedia.org/T103766)
[01:42:20] PROBLEM - puppet last run on mw2133 is CRITICAL puppet fail
[01:48:18] operations, Traffic: check if services behind misc-web enforce http->https redirect or not - https://phabricator.wikimedia.org/T103773#1398743 (Dzahn)
[01:50:36] (PS3) Krinkle: contint: Rename 'qunit' localvhost to 'worker' [puppet] - https://gerrit.wikimedia.org/r/220666 (https://phabricator.wikimedia.org/T103766)
[01:51:33] (PS1) Dzahn: bugzilla: backup dump and static files [puppet] - https://gerrit.wikimedia.org/r/220691 (https://phabricator.wikimedia.org/T95184)
[02:01:51] RECOVERY - puppet last run on mw2133 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:04:06] (PS3) BBlack: Get rid of unused director_options [puppet] - https://gerrit.wikimedia.org/r/220643
[02:04:08] (PS3) BBlack: Get rid of the default_backend setting [puppet] - https://gerrit.wikimedia.org/r/220642
[02:04:10] (PS8) BBlack: move text backend_random into "directors" [puppet] - https://gerrit.wikimedia.org/r/220645
[02:04:12] (PS8) BBlack: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644
[02:04:55] !log disabling puppet on cp* caches for patch-testing
[02:05:05] Logged the message, Master
[02:05:55] (CR) BBlack: [C: 2] Get rid of the default_backend setting [puppet] - https://gerrit.wikimedia.org/r/220642 (owner: BBlack)
[02:07:21] (PS1) Dzahn: misc-web varnish: remove stat1001 from config [puppet] - https://gerrit.wikimedia.org/r/220692
[02:07:42] (PS2) Dzahn: bugzilla: backup dump and static files [puppet] - https://gerrit.wikimedia.org/r/220691 (https://phabricator.wikimedia.org/T95184)
[02:09:42] (PS2) Dzahn: contint: Fix syntax error in localvhost documentation [puppet] - https://gerrit.wikimedia.org/r/220661 (owner: Krinkle)
[02:09:53] (CR) Dzahn: [C: 2] bugzilla: backup dump and static files [puppet] - https://gerrit.wikimedia.org/r/220691 (https://phabricator.wikimedia.org/T95184) (owner: Dzahn)
[02:11:20] (PS3) Dzahn: contint: Fix syntax error in localvhost documentation [puppet] - https://gerrit.wikimedia.org/r/220661 (owner: Krinkle)
[02:13:01] (CR) Dzahn: [C: 2] contint: Fix syntax error in localvhost documentation [puppet] - https://gerrit.wikimedia.org/r/220661 (owner: Krinkle)
[02:16:02] PROBLEM - puppet last run on cp2026 is CRITICAL Puppet has 1 failures
[02:16:23] figures :P
[02:16:30] (PS1) Legoktm: contint: Add 'libffi-dev' package [puppet] - https://gerrit.wikimedia.org/r/220694 (https://phabricator.wikimedia.org/T103775)
[02:17:12] PROBLEM - puppet last run on cp2022 is CRITICAL Puppet has 1 failures
[02:17:18] Blocked-on-Operations, operations, Continuous-Integration-Infrastructure, Patch-For-Review: Build Debian package ruby-jsduck for Jessie - https://phabricator.wikimedia.org/T95008#1398829 (Dzahn) a:Dzahn>None
[02:19:55] operations, Monitoring, Patch-For-Review: remove ganglia(old), replace with ganglia_new - https://phabricator.wikimedia.org/T93776#1398845 (Dzahn) https://gerrit.wikimedia.org/r/#/c/217214/ https://gerrit.wikimedia.org/r/#/c/219418/ https://gerrit.wikimedia.org/r/#/c/219079/ https://gerrit.wikimedia.org...
[02:21:29] (CR) Dzahn: "DBA's: want a separate ticket to drop the database(s)?" [puppet] - https://gerrit.wikimedia.org/r/220548 (https://phabricator.wikimedia.org/T103193) (owner: Dzahn)
[02:25:20] (CR) Springle: [C: 1] "This is enough. Puppet doesn't apply stuff, so DBA still needs to REVOKE. Which I shall." [puppet] - https://gerrit.wikimedia.org/r/220548 (https://phabricator.wikimedia.org/T103193) (owner: Dzahn)
[02:25:43] (PS4) Krinkle: contint: Rename 'qunit' localvhost to 'worker' [puppet] - https://gerrit.wikimedia.org/r/220666 (https://phabricator.wikimedia.org/T103766)
[02:27:13] (CR) Springle: "We've never really dropped databases. after all, if they're unused, they're a) relatively harmless to leave intact, and b) easy to access " [puppet] - https://gerrit.wikimedia.org/r/220548 (https://phabricator.wikimedia.org/T103193) (owner: Dzahn)
[02:27:45] (CR) BBlack: [C: 1] "+1, with the caveat that let's not move any existing working redirect-domains to this yet." [dns] - https://gerrit.wikimedia.org/r/216025 (owner: Dzahn)
[02:29:42] ACKNOWLEDGEMENT - puppet last run on cp2022 is CRITICAL Puppet has 1 failures Brandon Black Known issues while testing new etcd+varnish integration for parsoidcluster
[02:29:42] ACKNOWLEDGEMENT - puppet last run on cp2026 is CRITICAL Puppet has 1 failures Brandon Black Known issues while testing new etcd+varnish integration for parsoidcluster
[02:31:27] (PS2) Dzahn: mariadb: drop grants for bugzilla [puppet] - https://gerrit.wikimedia.org/r/220548 (https://phabricator.wikimedia.org/T103193)
[02:31:39] (CR) Krinkle: [C: 1] contint: Create symlink for composer in /usr/local/bin/ [puppet] - https://gerrit.wikimedia.org/r/220658 (owner: Legoktm)
[02:31:55] (PS4) BBlack: Get rid of unused director_options [puppet] - https://gerrit.wikimedia.org/r/220643
[02:31:57] (PS9) BBlack: move text backend_random into "directors" [puppet] - https://gerrit.wikimedia.org/r/220645
[02:31:59] (PS9) BBlack: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644
[02:32:19] (CR) Dzahn: [C: 2] mariadb: drop grants for bugzilla [puppet] - https://gerrit.wikimedia.org/r/220548 (https://phabricator.wikimedia.org/T103193) (owner: Dzahn)
[02:33:18] (PS5) BBlack: Get rid of unused director_options [puppet] - https://gerrit.wikimedia.org/r/220643
[02:33:20] (PS10) BBlack: move text backend_random into "directors" [puppet] - https://gerrit.wikimedia.org/r/220645
[02:33:22] (PS10) BBlack: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644
[02:33:23] endless-rebase!
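Krenair's review point on the "Exit cleanly on ctrl-C" patch above was that a KeyboardInterrupt should stop the deploy-page poller and exit with status 0 rather than being reported as an unhandled exception. That follows a common Python pattern, sketched minimally below; `DeployPage` and `main` are hypothetical stand-ins, not jouncebot's actual classes or entry point:

```python
import logging
from typing import Callable

log = logging.getLogger("bot")


class DeployPage:
    """Hypothetical stand-in for a background poller that must be stopped on exit."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True


def main(run: Callable[[], None], deploy_page: DeployPage) -> int:
    """Run the bot loop and return the process exit status."""
    try:
        run()
    except KeyboardInterrupt:
        # ctrl-C is an expected way to stop the bot: no traceback, exit status 0.
        return 0
    except Exception:
        # Anything else really is unexpected, so log it with its traceback.
        log.exception("Unhandled exception")
        return 1
    finally:
        # Stopping the poller is the important part, on every exit path.
        deploy_page.stop()
    return 0
```

A script entry point would then call something like `sys.exit(main(bot.start, deploy_page))`, where `bot` is hypothetical. Putting `stop()` in `finally` means it runs whether the loop ends normally, by ctrl-C, or by a crash.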
[02:33:42] (CR) BBlack: [C: 2 V: 2] Get rid of unused director_options [puppet] - https://gerrit.wikimedia.org/r/220643 (owner: BBlack)
[02:34:04] !log l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 06m 44s)
[02:34:13] Logged the message, Master
[02:37:44] !log LocalisationUpdate completed (1.26wmf10) at 2015-06-25 02:37:44+00:00
[02:37:51] Logged the message, Master
[02:38:08] (PS2) Alex Monk: Add a debugging command [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220677 (owner: BryanDavis)
[02:38:17] operations, Wikimedia-Bugzilla, Patch-For-Review: remove Bugzilla installation remnants from zirconium and repos - https://phabricator.wikimedia.org/T103193#1398865 (Dzahn) The database copy @springle made for us that was used to create the sanitized dump can also be deleted now. 19:34 <@springle> mut...
[02:38:31] PROBLEM - salt-minion processes on restbase1007 is CRITICAL: Connection refused by host
[02:39:00] PROBLEM - DPKG on restbase1007 is CRITICAL: Connection refused by host
[02:39:01] PROBLEM - SSH on restbase1007 is CRITICAL: Connection refused
[02:39:11] PROBLEM - configured eth on restbase1007 is CRITICAL: Connection refused by host
[02:39:20] PROBLEM - NTP on restbase1007 is CRITICAL: NTP CRITICAL: No response from NTP server
[02:39:21] PROBLEM - dhclient process on restbase1007 is CRITICAL: Connection refused by host
[02:39:31] PROBLEM - Disk space on restbase1007 is CRITICAL: Connection refused by host
[02:39:41] PROBLEM - puppet last run on restbase1007 is CRITICAL: Connection refused by host
[02:39:55] restbase1007 is not yet in production (it has a broken disk), so no need to worry
[02:40:10] !log puppet re-enabled on caches
[02:40:17] Logged the message, Master
[02:40:38] (CR) Alex Monk: [C: -1] Add a debugging command (1 comment) [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220677 (owner: BryanDavis)
[02:41:33] bd808, I'm not reviewing https://gerrit.wikimedia.org/r/#/c/220675/1 but the rest seems fine
[02:41:43] do they really depend on this?
[02:43:27] (PS3) Alex Monk: Add dvidshub to whitelist upload URLs [mediawiki-config] - https://gerrit.wikimedia.org/r/219599 (https://phabricator.wikimedia.org/T103062) (owner: Matanya)
[02:44:50] (PS5) Krinkle: contint: Rename 'qunit' localvhost to 'worker' [puppet] - https://gerrit.wikimedia.org/r/220666 (https://phabricator.wikimedia.org/T103766)
[02:44:55] (CR) jenkins-bot: [V: -1] contint: Rename 'qunit' localvhost to 'worker' [puppet] - https://gerrit.wikimedia.org/r/220666 (https://phabricator.wikimedia.org/T103766) (owner: Krinkle)
[02:45:13] (PS6) Krinkle: contint: Rename 'qunit' localvhost to 'worker' [puppet] - https://gerrit.wikimedia.org/r/220666 (https://phabricator.wikimedia.org/T103766)
[02:45:20] (CR) Alex Monk: [C: 1] Enable the SandboxLink extension on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/220408 (https://phabricator.wikimedia.org/T103643) (owner: Ricordisamoa)
[02:47:11] PROBLEM - LVS HTTP IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[02:47:43] ^ looking...
[02:48:45] bblack: ack, let me know if I can help
[02:49:00] RECOVERY - LVS HTTP IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 494 bytes in 0.002 second response time
[02:50:14] pybal never depooled them...
[02:50:20] (textlb6_80)
[02:50:24] not sure yet
[02:51:35] nothing seems obviously amiss in traffic graphs per node for that pool either. it may have been not-real.
[02:53:38] it looks real in neon's logs, but maybe something about neon, or about eqiad<->esams?
[02:55:00] why esams tho?
[02:56:06] I don't know
[02:56:14] why only ipv6, why only text, too? :)
[02:56:47] (CR) Alex Monk: "The problem with this is that there are existing users in that group... Someone (local bureaucrat? global sysadmin? unclear) will need to " [mediawiki-config] - https://gerrit.wikimedia.org/r/218926 (https://phabricator.wikimedia.org/T102770) (owner: Glaisher)
[02:57:09] it seems to have been intermittently and relatedly failing since :04 of this hour in icinga logs
[02:57:18] just for that service (ipv6 text esams)
[02:58:00] we're at a relatively low point in esams daily traffic curves though, and no spike from any kind of DoS or whatever.
[02:58:46] the first (recent) soft alert in logs was:
[02:58:47] Jun 25 02:04:30 neon icinga: SERVICE ALERT: text-lb.eqiad.wikimedia.org_ipv6;LVS HTTP IPv6;CRITICAL;SOFT;1;Connection timed out
[02:58:53] Krenair: yes, the change in https://gerrit.wikimedia.org/r/#/c/220678 where I use string.format() instead of % does depend on the yaml unicode casting patch
[02:59:26] oooh right
[02:59:32] just strange ordering of dependencies then
[03:00:32] RECOVERY - Disk space on snapshot1001 is OK: DISK OK
[03:01:28] bd808, okay then
[03:01:33] (CR) Alex Monk: [C: 2] Force YAML strings to be unicode [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220675 (owner: BryanDavis)
[03:01:54] (CR) Krinkle: [C: 1] "Deployed on integration-puppet." [puppet] - https://gerrit.wikimedia.org/r/220666 (https://phabricator.wikimedia.org/T103766) (owner: Krinkle)
[03:02:31] going back further in pybal.log timelines, there are intermittent spikes of related issues, somehow, hours/days ago
[03:02:36] again, may not be real :/
[03:03:09] (Merged) jenkins-bot: Force YAML strings to be unicode [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220675 (owner: BryanDavis)
[03:04:01] (PS3) BryanDavis: Add a debugging command [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220677
[03:04:29] yup text-lb and mobile-lb
[03:04:47] (CR) BryanDavis: Add a debugging command (1 comment) [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220677 (owner: BryanDavis)
[03:04:50] !log l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 10m 19s)
[03:04:58] Logged the message, Master
[03:08:58] bd808, you didn't update the indentation
[03:09:08] doh
[03:09:49] (PS4) BryanDavis: Add a debugging command [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220677
[03:10:56] (CR) Alex Monk: [C: 2] Add a debugging command [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220677 (owner: BryanDavis)
[03:11:01] !log LocalisationUpdate completed (1.26wmf11) at 2015-06-25 03:11:01+00:00
[03:11:08] Logged the message, Master
[03:12:31] (Merged) jenkins-bot: Add a debugging command [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220677 (owner: BryanDavis)
[03:13:01] (PS3) Alex Monk: Give separate notifications to deployers and patch owners [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220678 (https://phabricator.wikimedia.org/T101329) (owner: BryanDavis)
[03:13:20] (CR) Alex Monk: [C: 2] Give separate notifications to deployers and patch owners [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220678 (https://phabricator.wikimedia.org/T101329) (owner: BryanDavis)
[03:14:51] (Merged) jenkins-bot: Give separate notifications to deployers and patch owners [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/220678 (https://phabricator.wikimedia.org/T101329) (owner: BryanDavis)
[03:26:03] (CR) Hydriz: first draft python wrapper for html dumps (6 comments) [dumps] (ariel) - https://gerrit.wikimedia.org/r/206849 (https://phabricator.wikimedia.org/T17017) (owner: ArielGlenn)
[03:30:36] (CR) John Vandenberg: [C: 1] contint: Add 'libffi-dev' package [puppet] - https://gerrit.wikimedia.org/r/220694 (https://phabricator.wikimedia.org/T103775) (owner: Legoktm)
[03:31:30] jouncebot: next
[03:31:31] In 11 hour(s) and 28 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150625T1500)
[03:32:09] Krenair: thanks for all the merges. We will see if the world hates my changes tomorrow
[03:41:39] (CR) Hydriz: [C: -1] first draft python wrapper for html dumps [dumps] (ariel) - https://gerrit.wikimedia.org/r/206849 (https://phabricator.wikimedia.org/T17017) (owner: ArielGlenn)
[03:44:30] would you expect your home dir on a bastion host to be backed up (in bacula)? i would say, _either_ it's a thing that gets added with "role bastionhost" or it doesn't. current situation: kind of random, per host in site.pp. _but_ that encourages using bastion hosts as work hosts they shouldn't be (https://en.wikipedia.org/wiki/Bastion_host) but then the actual work hosts are just terbium and tin
[03:58:31] PROBLEM - puppet last run on lvs4004 is CRITICAL puppet fail
[03:58:40] PROBLEM - puppet last run on mw1155 is CRITICAL puppet fail
[03:58:41] PROBLEM - puppet last run on cp1068 is CRITICAL puppet fail
[03:58:51] PROBLEM - puppet last run on mw2195 is CRITICAL puppet fail
[03:59:20] PROBLEM - puppet last run on mw1113 is CRITICAL puppet fail
[03:59:20] PROBLEM - puppet last run on db1037 is CRITICAL puppet fail
[03:59:31] PROBLEM - puppet last run on db2052 is CRITICAL puppet fail
[03:59:31] PROBLEM - puppet last run on mw2078 is CRITICAL puppet fail
[03:59:32] PROBLEM - puppet last run on mw1207 is CRITICAL puppet fail
[03:59:32] PROBLEM - puppet last run on labvirt1006 is CRITICAL puppet fail
[03:59:40] PROBLEM - puppet last run on mc1015 is CRITICAL puppet fail
[03:59:41] PROBLEM - puppet last run on mw1104 is CRITICAL puppet fail
[03:59:41] PROBLEM - puppet last run on elastic1025 is CRITICAL puppet fail
[03:59:42] PROBLEM - puppet last run on db2060 is CRITICAL puppet fail
[04:00:00] PROBLEM - puppet last run on cp2008 is CRITICAL puppet fail
[04:00:00] PROBLEM - puppet last run on mw1255 is CRITICAL puppet fail
[04:00:00] PROBLEM - puppet last run on es2010 is CRITICAL puppet fail
[04:00:20] PROBLEM - puppet last run on db2055 is CRITICAL puppet fail
[04:00:21] PROBLEM - puppet last run on mw2070 is CRITICAL puppet fail
[04:00:21] PROBLEM - puppet last run on mw2055 is CRITICAL puppet fail
[04:00:32] PROBLEM - puppet last run on mw1131 is CRITICAL puppet fail
[04:00:41] PROBLEM - puppet last run on cp3048 is CRITICAL puppet fail
[04:00:41] PROBLEM - puppet last run on radon is CRITICAL puppet fail
[04:00:42] PROBLEM - puppet last run on mw2207 is CRITICAL puppet fail
[04:00:42] PROBLEM - puppet last run on mw2060 is CRITICAL puppet fail
[04:00:42] PROBLEM - puppet last run on mw2120 is CRITICAL puppet fail
[04:00:42] PROBLEM - puppet last run on mw2119 is CRITICAL puppet fail
[04:00:42] PROBLEM - puppet last run on mc2015 is CRITICAL puppet fail
[04:00:50] PROBLEM - puppet last run on cp1064 is CRITICAL puppet fail
[04:00:50] PROBLEM - puppet last run on etherpad1001 is CRITICAL puppet fail
[04:00:51] PROBLEM - puppet last run on db1009 is CRITICAL puppet fail
[04:01:00] PROBLEM - puppet last run on db1035 is CRITICAL puppet fail
[04:01:01] PROBLEM - puppet last run on db1005 is CRITICAL puppet fail
[04:01:01] PROBLEM - puppet last run on cp3007 is CRITICAL puppet fail
[04:01:20] PROBLEM - puppet last run on conf1002 is CRITICAL puppet fail
[04:01:21] PROBLEM - puppet last run on analytics1031 is CRITICAL puppet fail
[04:01:31] PROBLEM - puppet last run on mw2110 is CRITICAL puppet fail
[04:01:31] PROBLEM - puppet last run on mw1085 is CRITICAL puppet fail
[04:01:41] PROBLEM - puppet last run on mw1075 is CRITICAL puppet fail
[04:01:41] PROBLEM - puppet last run on db1019 is CRITICAL puppet fail
[04:01:41] PROBLEM - puppet last run on elastic1009 is CRITICAL puppet fail
[04:01:50] PROBLEM - puppet last run on cp3022 is CRITICAL puppet fail
[04:01:51] PROBLEM - puppet last run on wtp2004 is CRITICAL puppet fail
[04:02:01] PROBLEM - puppet last run on cp4015 is CRITICAL puppet fail
[04:02:01] PROBLEM - puppet last run on cp3020 is CRITICAL puppet fail
[04:02:02] PROBLEM - puppet last run on labsdb1007 is CRITICAL puppet fail
[04:02:10] PROBLEM - puppet last run on mw1191 is CRITICAL puppet fail
[04:02:10] PROBLEM - puppet last run on cp2010 is CRITICAL puppet fail
[04:02:10] PROBLEM - puppet last run on wtp1014 is CRITICAL puppet fail
[04:02:11] PROBLEM - puppet last run on mw2174 is CRITICAL puppet fail
[04:02:11] PROBLEM - puppet last run on mw2138 is CRITICAL puppet fail
[04:02:11] PROBLEM - puppet last run on mw2165 is CRITICAL puppet fail
[04:02:11] PROBLEM - puppet last run on lvs4001 is CRITICAL puppet fail
[04:02:12] PROBLEM - puppet last run on ms-be2010 is CRITICAL puppet fail
[04:02:20] PROBLEM - puppet last run on tmh1002 is CRITICAL puppet fail
[04:02:20] PROBLEM - puppet last run on cp1073 is CRITICAL puppet fail
[04:02:21] PROBLEM - puppet last run on mw1182 is CRITICAL puppet fail
[04:02:30] PROBLEM - puppet last run on mw1013 is CRITICAL puppet fail
[04:02:30] PROBLEM - puppet last run on analytics1029 is CRITICAL puppet fail
[04:02:41] PROBLEM - puppet last run on mw1179 is CRITICAL puppet fail
[04:02:42] PROBLEM - puppet last run on elastic1016 is CRITICAL puppet fail
[04:03:00] PROBLEM - puppet last run on mw1094 is CRITICAL puppet fail
[04:03:01] PROBLEM - puppet last run on praseodymium is CRITICAL puppet fail
[04:03:11] PROBLEM - puppet last run on db1041 is CRITICAL puppet fail
[04:03:11] PROBLEM - puppet last run on db2049 is CRITICAL puppet fail
[04:03:12] PROBLEM - puppet last run on mw2141 is CRITICAL Puppet has 1 failures
[04:03:12] PROBLEM - puppet last run on mw2089 is CRITICAL puppet fail
[04:03:20] what happened?
[04:03:31] PROBLEM - puppet last run on db2061 is CRITICAL puppet fail [04:03:31] PROBLEM - puppet last run on mc1008 is CRITICAL puppet fail [04:03:32] PROBLEM - puppet last run on ms-be1016 is CRITICAL puppet fail [04:03:41] PROBLEM - puppet last run on analytics1019 is CRITICAL puppet fail [04:03:41] PROBLEM - puppet last run on wtp2003 is CRITICAL puppet fail [04:03:41] PROBLEM - puppet last run on mw2099 is CRITICAL puppet fail [04:03:41] PROBLEM - puppet last run on mw2102 is CRITICAL puppet fail [04:03:41] PROBLEM - puppet last run on mw2041 is CRITICAL puppet fail [04:03:50] PROBLEM - puppet last run on cp3013 is CRITICAL puppet fail [04:03:50] PROBLEM - puppet last run on cp1069 is CRITICAL puppet fail [04:03:51] PROBLEM - puppet last run on cp1074 is CRITICAL puppet fail [04:04:00] PROBLEM - puppet last run on mw1234 is CRITICAL puppet fail [04:04:00] PROBLEM - puppet last run on mw1244 is CRITICAL puppet fail [04:04:01] PROBLEM - puppet last run on analytics1036 is CRITICAL puppet fail [04:04:09] i'll restart puppetmaster [04:04:11] PROBLEM - puppet last run on mw1080 is CRITICAL puppet fail [04:04:11] PROBLEM - puppet last run on zirconium is CRITICAL puppet fail [04:04:18] looking as well [04:04:20] PROBLEM - puppet last run on mc1004 is CRITICAL puppet fail [04:04:20] PROBLEM - puppet last run on wtp1024 is CRITICAL puppet fail [04:04:21] PROBLEM - puppet last run on db2035 is CRITICAL puppet fail [04:04:21] PROBLEM - puppet last run on mw2179 is CRITICAL puppet fail [04:04:21] PROBLEM - puppet last run on mw2133 is CRITICAL puppet fail [04:04:21] PROBLEM - puppet last run on mw2046 is CRITICAL puppet fail [04:04:21] PROBLEM - puppet last run on mw2103 is CRITICAL puppet fail [04:04:22] PROBLEM - puppet last run on mw1246 is CRITICAL puppet fail [04:04:22] PROBLEM - puppet last run on mw2072 is CRITICAL puppet fail [04:04:23] PROBLEM - puppet last run on mw1062 is CRITICAL puppet fail [04:04:31] PROBLEM - puppet last run on elastic1028 is CRITICAL 
puppet fail [04:04:32] PROBLEM - puppet last run on cp4007 is CRITICAL puppet fail [04:04:35] !log restarted apache2 on palladium [04:04:41] PROBLEM - puppet last run on cp3046 is CRITICAL puppet fail [04:04:41] Logged the message, Master [04:04:42] PROBLEM - puppet last run on mw1124 is CRITICAL puppet fail [04:04:42] PROBLEM - puppet last run on mw1178 is CRITICAL puppet fail [04:04:51] PROBLEM - puppet last run on analytics1015 is CRITICAL puppet fail [04:05:00] PROBLEM - puppet last run on mw1257 is CRITICAL puppet fail [04:05:02] PROBLEM - puppet last run on mw2032 is CRITICAL puppet fail [04:05:11] PROBLEM - puppet last run on mw1109 is CRITICAL puppet fail [04:05:12] PROBLEM - puppet last run on db2046 is CRITICAL puppet fail [04:05:21] PROBLEM - puppet last run on mw1115 is CRITICAL puppet fail [04:05:21] PROBLEM - puppet last run on eeden is CRITICAL puppet fail [04:05:22] PROBLEM - puppet last run on cp2017 is CRITICAL puppet fail [04:05:30] PROBLEM - puppet last run on mw2074 is CRITICAL puppet fail [04:05:31] PROBLEM - puppet last run on cp4006 is CRITICAL puppet fail [04:05:31] PROBLEM - puppet last run on cp3045 is CRITICAL puppet fail [04:05:41] PROBLEM - puppet last run on logstash1001 is CRITICAL puppet fail [04:05:41] PROBLEM - puppet last run on labnodepool1001 is CRITICAL puppet fail [04:05:51] PROBLEM - puppet last run on mw1240 is CRITICAL puppet fail [04:05:51] PROBLEM - puppet last run on mw1007 is CRITICAL puppet fail [04:06:00] PROBLEM - puppet last run on caesium is CRITICAL puppet fail [04:06:01] PROBLEM - puppet last run on ms-be1001 is CRITICAL puppet fail [04:06:10] PROBLEM - puppet last run on mw2164 is CRITICAL puppet fail [04:06:10] PROBLEM - puppet last run on mw2124 is CRITICAL puppet fail [04:06:10] PROBLEM - puppet last run on mw2122 is CRITICAL puppet fail [04:06:10] PROBLEM - puppet last run on es2008 is CRITICAL puppet fail [04:06:11] PROBLEM - puppet last run on potassium is CRITICAL puppet fail [04:06:11] PROBLEM - puppet 
last run on mw1048 is CRITICAL puppet fail [04:06:11] PROBLEM - puppet last run on ms-be2003 is CRITICAL puppet fail [04:06:12] RECOVERY - puppet last run on etherpad1001 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [04:06:20] PROBLEM - puppet last run on mw1134 is CRITICAL puppet fail [04:06:21] PROBLEM - puppet last run on mw1197 is CRITICAL puppet fail [04:06:22] PROBLEM - puppet last run on uranium is CRITICAL puppet fail [04:06:30] PROBLEM - puppet last run on netmon1001 is CRITICAL puppet fail [04:06:30] PROBLEM - puppet last run on ms-be1003 is CRITICAL puppet fail [04:06:31] PROBLEM - puppet last run on mw1141 is CRITICAL puppet fail [04:06:31] PROBLEM - puppet last run on elastic1001 is CRITICAL puppet fail [04:06:41] PROBLEM - puppet last run on mw1012 is CRITICAL puppet fail [04:06:41] PROBLEM - puppet last run on mw1006 is CRITICAL puppet fail [04:06:41] PROBLEM - puppet last run on analytics1017 is CRITICAL puppet fail [04:06:50] PROBLEM - puppet last run on mc1016 is CRITICAL puppet fail [04:06:51] PROBLEM - puppet last run on mw1145 is CRITICAL puppet fail [04:06:51] PROBLEM - puppet last run on es2005 is CRITICAL puppet fail [04:06:51] PROBLEM - puppet last run on mw2014 is CRITICAL puppet fail [04:06:52] PROBLEM - puppet last run on mc2004 is CRITICAL puppet fail [04:06:52] PROBLEM - puppet last run on mw1082 is CRITICAL puppet fail [04:07:01] PROBLEM - puppet last run on wtp1016 is CRITICAL puppet fail [04:07:01] PROBLEM - puppet last run on cp2023 is CRITICAL puppet fail [04:07:07] mod passenger freaked out in error.log too [04:07:11] PROBLEM - puppet last run on heze is CRITICAL puppet fail [04:07:11] PROBLEM - puppet last run on analytics1040 is CRITICAL puppet fail [04:07:12] PROBLEM - puppet last run on db2039 is CRITICAL puppet fail [04:07:12] PROBLEM - puppet last run on mw2015 is CRITICAL puppet fail [04:07:20] PROBLEM - puppet last run on mw1060 is CRITICAL puppet fail [04:07:20] PROBLEM - puppet last run on 
cp1055 is CRITICAL puppet fail [04:07:21] PROBLEM - puppet last run on elastic1018 is CRITICAL puppet fail [04:07:21] PROBLEM - puppet last run on db1073 is CRITICAL puppet fail [04:07:31] PROBLEM - puppet last run on ms-be1006 is CRITICAL puppet fail [04:07:31] PROBLEM - puppet last run on logstash1004 is CRITICAL puppet fail [04:07:31] PROBLEM - puppet last run on mw2114 is CRITICAL puppet fail [04:07:40] PROBLEM - puppet last run on mw1150 is CRITICAL puppet fail [04:07:40] PROBLEM - puppet last run on elastic1012 is CRITICAL puppet fail [04:07:41] PROBLEM - puppet last run on mw1242 is CRITICAL puppet fail [04:07:42] PROBLEM - puppet last run on mw1205 is CRITICAL puppet fail [04:07:50] PROBLEM - puppet last run on labstore1003 is CRITICAL puppet fail [04:07:51] PROBLEM - puppet last run on cp3012 is CRITICAL puppet fail [04:07:51] PROBLEM - puppet last run on db2059 is CRITICAL puppet fail [04:07:52] PROBLEM - puppet last run on cp2005 is CRITICAL puppet fail [04:07:52] PROBLEM - puppet last run on db2005 is CRITICAL puppet fail [04:07:52] PROBLEM - puppet last run on es2001 is CRITICAL puppet fail [04:08:01] PROBLEM - puppet last run on mw1173 is CRITICAL puppet fail [04:08:01] PROBLEM - puppet last run on mw1008 is CRITICAL puppet fail [04:08:11] PROBLEM - puppet last run on cp3040 is CRITICAL puppet fail [04:08:20] PROBLEM - puppet last run on db1066 is CRITICAL puppet fail [04:08:21] PROBLEM - puppet last run on db1015 is CRITICAL puppet fail [04:08:31] PROBLEM - puppet last run on sca1002 is CRITICAL puppet fail [04:08:31] PROBLEM - puppet last run on mw1160 is CRITICAL puppet fail [04:08:31] PROBLEM - puppet last run on db1022 is CRITICAL puppet fail [04:08:32] PROBLEM - puppet last run on mw1187 is CRITICAL puppet fail [04:08:40] PROBLEM - puppet last run on mw2173 is CRITICAL puppet fail [04:08:40] PROBLEM - puppet last run on mw2163 is CRITICAL puppet fail [04:08:41] PROBLEM - puppet last run on mw2127 is CRITICAL puppet fail [04:08:41] PROBLEM - 
puppet last run on es2009 is CRITICAL puppet fail [04:08:41] PROBLEM - puppet last run on db2018 is CRITICAL puppet fail [04:08:41] PROBLEM - puppet last run on db1046 is CRITICAL puppet fail [04:08:41] PROBLEM - puppet last run on mw1117 is CRITICAL puppet fail [04:08:42] PROBLEM - puppet last run on mc2010 is CRITICAL puppet fail [04:08:42] PROBLEM - puppet last run on dbproxy1008 is CRITICAL puppet fail [04:08:51] PROBLEM - puppet last run on mw1065 is CRITICAL puppet fail [04:08:51] PROBLEM - puppet last run on mw1189 is CRITICAL Puppet has 42 failures [04:08:52] PROBLEM - puppet last run on cp3010 is CRITICAL puppet fail [04:08:52] PROBLEM - puppet last run on ms-fe1001 is CRITICAL puppet fail [04:09:00] PROBLEM - puppet last run on mw2016 is CRITICAL puppet fail [04:09:01] PROBLEM - puppet last run on mw1119 is CRITICAL puppet fail [04:09:01] PROBLEM - puppet last run on mw1222 is CRITICAL Puppet has 7 failures [04:09:01] PROBLEM - puppet last run on mw2184 is CRITICAL puppet fail [04:09:01] PROBLEM - puppet last run on mw2097 is CRITICAL puppet fail [04:09:01] PROBLEM - puppet last run on mw2082 is CRITICAL puppet fail [04:09:10] PROBLEM - puppet last run on mw1164 is CRITICAL puppet fail [04:09:11] PROBLEM - puppet last run on cp3037 is CRITICAL puppet fail [04:09:11] PROBLEM - puppet last run on cp3042 is CRITICAL puppet fail [04:09:11] PROBLEM - puppet last run on cp3014 is CRITICAL puppet fail [04:09:12] PROBLEM - puppet last run on mw1153 is CRITICAL Puppet has 41 failures [04:09:20] PROBLEM - puppet last run on mw1100 is CRITICAL puppet fail [04:09:20] PROBLEM - puppet last run on mw1069 is CRITICAL Puppet has 10 failures [04:09:20] PROBLEM - puppet last run on mw2145 is CRITICAL Puppet has 10 failures [04:09:20] PROBLEM - puppet last run on iron is CRITICAL puppet fail [04:09:21] PROBLEM - puppet last run on mw2083 is CRITICAL puppet fail [04:09:21] PROBLEM - puppet last run on mw2076 is CRITICAL Puppet has 10 failures [04:09:21] PROBLEM - puppet last 
run on mw2093 is CRITICAL puppet fail [04:09:22] PROBLEM - puppet last run on mw2066 is CRITICAL puppet fail [04:09:22] PROBLEM - puppet last run on mw1088 is CRITICAL Puppet has 12 failures [04:09:23] PROBLEM - puppet last run on mw2080 is CRITICAL Puppet has 9 failures [04:09:30] PROBLEM - puppet last run on lvs3001 is CRITICAL puppet fail [04:09:31] PROBLEM - puppet last run on analytics1035 is CRITICAL puppet fail [04:09:31] PROBLEM - puppet last run on mw1068 is CRITICAL Puppet has 9 failures [04:09:42] PROBLEM - puppet last run on db2045 is CRITICAL Puppet has 4 failures [04:09:42] PROBLEM - puppet last run on mw2213 is CRITICAL Puppet has 9 failures [04:09:42] PROBLEM - puppet last run on mw2013 is CRITICAL puppet fail [04:09:42] PROBLEM - puppet last run on mw2036 is CRITICAL puppet fail [04:09:42] PROBLEM - puppet last run on mw2033 is CRITICAL Puppet has 8 failures [04:09:51] PROBLEM - puppet last run on mw1176 is CRITICAL Puppet has 10 failures [04:09:51] PROBLEM - puppet last run on mw1226 is CRITICAL Puppet has 16 failures [04:10:22] PROBLEM - puppet last run on mw2104 is CRITICAL Puppet has 6 failures [04:10:22] PROBLEM - puppet last run on mw2109 is CRITICAL Puppet has 7 failures [04:10:32] PROBLEM - puppet last run on mw1099 is CRITICAL Puppet has 15 failures [04:10:52] PROBLEM - puppet last run on mw1003 is CRITICAL Puppet has 2 failures [04:11:30] PROBLEM - puppet last run on mw2043 is CRITICAL Puppet has 7 failures [04:11:31] PROBLEM - puppet last run on mw2023 is CRITICAL Puppet has 2 failures [04:14:52] RECOVERY - puppet last run on mw1068 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:14:54] it did not even get kicked [04:15:30] RECOVERY - puppet last run on db1037 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [04:15:41] RECOVERY - puppet last run on mw2141 is OK Puppet is currently enabled, last run 57 seconds ago with 0 failures [04:15:41] RECOVERY - puppet last run on db2052 is OK 
Puppet is currently enabled, last run 23 seconds ago with 0 failures [04:15:42] RECOVERY - puppet last run on labvirt1006 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [04:15:51] RECOVERY - puppet last run on db2060 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [04:16:01] RECOVERY - puppet last run on mw1255 is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures [04:16:10] RECOVERY - puppet last run on es2010 is OK Puppet is currently enabled, last run 48 seconds ago with 0 failures [04:16:21] RECOVERY - puppet last run on db2055 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [04:16:31] RECOVERY - puppet last run on lvs4004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:16:32] RECOVERY - puppet last run on mw1155 is OK Puppet is currently enabled, last run 47 seconds ago with 0 failures [04:16:32] RECOVERY - puppet last run on mw1131 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures [04:16:40] RECOVERY - puppet last run on cp1068 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:16:40] RECOVERY - puppet last run on radon is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures [04:16:41] RECOVERY - puppet last run on mw2195 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [04:16:50] RECOVERY - puppet last run on mc2015 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [04:16:51] RECOVERY - puppet last run on db1009 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [04:17:01] RECOVERY - puppet last run on db1035 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:17:11] RECOVERY - puppet last run on mw1113 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures [04:17:20] RECOVERY - puppet last run on conf1002 is OK Puppet is currently enabled, 
last run 1 minute ago with 0 failures [04:17:21] RECOVERY - puppet last run on analytics1031 is OK Puppet is currently enabled, last run 47 seconds ago with 0 failures [04:17:31] RECOVERY - puppet last run on mw1207 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:17:32] RECOVERY - puppet last run on mw2078 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [04:17:32] RECOVERY - puppet last run on mc1015 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:17:40] RECOVERY - puppet last run on mw1104 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:17:41] RECOVERY - puppet last run on elastic1025 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:17:41] RECOVERY - puppet last run on elastic1009 is OK Puppet is currently enabled, last run 54 seconds ago with 0 failures [04:17:51] RECOVERY - puppet last run on cp3022 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures [04:18:00] RECOVERY - puppet last run on cp2008 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:18:00] RECOVERY - puppet last run on wtp2004 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [04:18:01] RECOVERY - puppet last run on labsdb1007 is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures [04:18:01] RECOVERY - puppet last run on cp4015 is OK Puppet is currently enabled, last run 54 seconds ago with 0 failures [04:18:11] RECOVERY - puppet last run on wtp1014 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures [04:18:11] RECOVERY - puppet last run on cp2010 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures [04:18:20] RECOVERY - puppet last run on mw2070 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:18:20] RECOVERY - puppet last run on mw2055 is OK Puppet is currently enabled, last run 26 seconds ago 
with 0 failures [04:18:21] RECOVERY - puppet last run on cp1073 is OK Puppet is currently enabled, last run 40 seconds ago with 0 failures [04:18:31] RECOVERY - puppet last run on cp3048 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:18:40] RECOVERY - puppet last run on mw2207 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:18:40] RECOVERY - puppet last run on mw2060 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:18:40] RECOVERY - puppet last run on cp1064 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:18:40] RECOVERY - puppet last run on mw2120 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:18:40] RECOVERY - puppet last run on mw2119 is OK Puppet is currently enabled, last run 17 seconds ago with 0 failures [04:18:41] RECOVERY - puppet last run on mw1179 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures [04:18:51] RECOVERY - puppet last run on elastic1016 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures [04:18:51] RECOVERY - puppet last run on db1005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:18:52] RECOVERY - puppet last run on cp3007 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:19:01] RECOVERY - puppet last run on mw1094 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [04:19:11] RECOVERY - puppet last run on praseodymium is OK Puppet is currently enabled, last run 40 seconds ago with 0 failures [04:19:12] RECOVERY - puppet last run on db1041 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [04:19:21] RECOVERY - puppet last run on db2049 is OK Puppet is currently enabled, last run 23 seconds ago with 0 failures [04:19:21] RECOVERY - puppet last run on mw2110 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:19:21] RECOVERY 
- puppet last run on mw1085 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:19:31] RECOVERY - puppet last run on mw1075 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:19:31] RECOVERY - puppet last run on db1019 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:19:31] RECOVERY - puppet last run on db2061 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [04:19:40] RECOVERY - puppet last run on mc1008 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [04:19:41] RECOVERY - puppet last run on ms-be1016 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [04:19:50] RECOVERY - puppet last run on analytics1019 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [04:19:51] RECOVERY - puppet last run on cp1069 is OK Puppet is currently enabled, last run 29 seconds ago with 0 failures [04:20:00] RECOVERY - puppet last run on cp3020 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:20:00] RECOVERY - puppet last run on cp1074 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:20:01] RECOVERY - puppet last run on mw1191 is OK Puppet is currently enabled, last run 29 seconds ago with 0 failures [04:20:02] RECOVERY - puppet last run on mw1234 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures [04:20:02] RECOVERY - puppet last run on mw2174 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [04:20:02] RECOVERY - puppet last run on mw2138 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:20:02] RECOVERY - puppet last run on mw2165 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [04:20:11] RECOVERY - puppet last run on tmh1002 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [04:20:11] RECOVERY - puppet last run on ms-be2010 
is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [04:20:11] RECOVERY - puppet last run on lvs4001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:20:20] RECOVERY - puppet last run on mw1182 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:20:21] RECOVERY - puppet last run on zirconium is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures [04:20:21] RECOVERY - puppet last run on mw1013 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures [04:20:21] RECOVERY - puppet last run on analytics1029 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:20:21] RECOVERY - puppet last run on wtp1024 is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures [04:20:31] RECOVERY - puppet last run on mw2133 is OK Puppet is currently enabled, last run 17 seconds ago with 0 failures [04:20:31] RECOVERY - puppet last run on mw1246 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [04:20:40] RECOVERY - puppet last run on elastic1028 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures [04:20:41] RECOVERY - puppet last run on cp4007 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures [04:20:51] RECOVERY - puppet last run on mw1124 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [04:21:11] RECOVERY - puppet last run on mw1257 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures [04:21:21] RECOVERY - puppet last run on mw2089 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:21:22] RECOVERY - puppet last run on mw1109 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [04:21:31] RECOVERY - puppet last run on db2046 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [04:21:40] RECOVERY - puppet last run on eeden is OK Puppet is 
currently enabled, last run 47 seconds ago with 0 failures [04:21:41] RECOVERY - puppet last run on cp2017 is OK Puppet is currently enabled, last run 31 seconds ago with 0 failures [04:21:41] RECOVERY - puppet last run on wtp2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:21:41] RECOVERY - puppet last run on mw2102 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures [04:21:51] RECOVERY - puppet last run on cp3013 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:22:00] RECOVERY - puppet last run on mw1244 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:22:01] RECOVERY - puppet last run on mw1240 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [04:22:01] RECOVERY - puppet last run on analytics1036 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:22:12] RECOVERY - puppet last run on ms-be1001 is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures [04:22:12] RECOVERY - puppet last run on mc1004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:22:21] RECOVERY - puppet last run on db2035 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:22:21] RECOVERY - puppet last run on mw2179 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [04:22:21] RECOVERY - puppet last run on mw2103 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [04:22:21] RECOVERY - puppet last run on mw1048 is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures [04:22:21] RECOVERY - puppet last run on mw1062 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:22:21] RECOVERY - puppet last run on mw2072 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:22:33] RECOVERY - puppet last run on uranium is OK Puppet is currently enabled, last run 16 seconds 
ago with 0 failures [04:22:33] RECOVERY - puppet last run on cp3046 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:22:41] RECOVERY - puppet last run on mw1178 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures [04:22:50] RECOVERY - puppet last run on mw1141 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [04:22:51] RECOVERY - puppet last run on analytics1015 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:23:00] RECOVERY - puppet last run on mc1016 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures [04:23:01] RECOVERY - puppet last run on mw1145 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [04:23:02] RECOVERY - puppet last run on es2005 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [04:23:21] RECOVERY - puppet last run on mw1115 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [04:23:31] RECOVERY - puppet last run on db2039 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [04:23:31] RECOVERY - puppet last run on mw2099 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:23:31] RECOVERY - puppet last run on mw2074 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [04:23:31] RECOVERY - puppet last run on mw2041 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:23:31] RECOVERY - puppet last run on cp1055 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures [04:23:32] RECOVERY - puppet last run on db1073 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures [04:23:40] RECOVERY - puppet last run on cp4006 is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures [04:23:41] RECOVERY - puppet last run on cp3045 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [04:23:41] RECOVERY - 
puppet last run on logstash1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:23:42] RECOVERY - puppet last run on ms-be1006 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures
[04:23:42] RECOVERY - puppet last run on labnodepool1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:23:51] RECOVERY - puppet last run on mw1007 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures
[04:23:51] RECOVERY - puppet last run on caesium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:00] RECOVERY - puppet last run on mw1080 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:01] RECOVERY - puppet last run on cp3012 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures
[04:24:10] RECOVERY - puppet last run on mw2164 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures
[04:24:10] RECOVERY - puppet last run on cp2005 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[04:24:10] RECOVERY - puppet last run on potassium is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures
[04:24:10] RECOVERY - puppet last run on mw2046 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:11] RECOVERY - puppet last run on es2008 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures
[04:24:11] RECOVERY - puppet last run on db2005 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures
[04:24:11] RECOVERY - puppet last run on mw2124 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures
[04:24:12] RECOVERY - puppet last run on mw2122 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:12] RECOVERY - puppet last run on ms-be2003 is OK Puppet is currently enabled, last run 40 seconds ago with 0 failures
[04:24:20] RECOVERY - puppet last run on mw1134 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:21] RECOVERY - puppet last run on mw1197 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures
[04:24:22] RECOVERY - puppet last run on netmon1001 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures
[04:24:22] RECOVERY - puppet last run on cp3040 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures
[04:24:22] RECOVERY - puppet last run on ms-be1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:30] RECOVERY - puppet last run on db1066 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures
[04:24:31] RECOVERY - puppet last run on elastic1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:41] RECOVERY - puppet last run on mw1012 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:41] RECOVERY - puppet last run on mw1006 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:41] RECOVERY - puppet last run on analytics1017 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:42] RECOVERY - puppet last run on sca1002 is OK Puppet is currently enabled, last run 1 second ago with 0 failures
[04:24:42] RECOVERY - puppet last run on db1022 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures
[04:24:50] RECOVERY - puppet last run on mw1187 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures
[04:24:51] RECOVERY - puppet last run on mw1082 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:51] RECOVERY - puppet last run on mw2109 is OK Puppet is currently enabled, last run 29 seconds ago with 0 failures
[04:24:51] RECOVERY - puppet last run on es2009 is OK Puppet is currently enabled, last run 23 seconds ago with 0 failures
[04:24:51] RECOVERY - puppet last run on mw2014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:51] RECOVERY - puppet last run on mw2032 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:52] RECOVERY - puppet last run on mc2004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:24:52] RECOVERY - puppet last run on mc2010 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures
[04:24:53] RECOVERY - puppet last run on dbproxy1008 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:25:00] RECOVERY - puppet last run on wtp1016 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures
[04:25:01] RECOVERY - puppet last run on mw1189 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures
[04:25:01] RECOVERY - puppet last run on cp2023 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:25:02] RECOVERY - puppet last run on ms-fe1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:25:11] RECOVERY - puppet last run on heze is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:25:11] RECOVERY - puppet last run on analytics1040 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:25:11] RECOVERY - puppet last run on mw1222 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures
[04:25:12] RECOVERY - puppet last run on mw2015 is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures
[04:25:12] RECOVERY - puppet last run on mw1164 is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures
[04:25:20] RECOVERY - puppet last run on mw1003 is OK Puppet is currently enabled, last run 0 seconds ago with 0 failures
[04:25:20] RECOVERY - puppet last run on elastic1018 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures
[04:25:21] RECOVERY - puppet last run on mw1153 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures
[04:25:30] RECOVERY - puppet last run on cp3042 is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures
[04:25:30] RECOVERY - puppet last run on cp3014 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures
[04:25:30] RECOVERY - puppet last run on mw1069 is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures
[04:25:30] RECOVERY - puppet last run on mw1100 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures
[04:25:31] RECOVERY - puppet last run on iron is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures
[04:25:31] RECOVERY - puppet last run on logstash1004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:25:31] RECOVERY - puppet last run on mw2145 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[04:25:32] RECOVERY - puppet last run on mw2066 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[04:25:32] RECOVERY - puppet last run on mw2076 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:25:33] RECOVERY - puppet last run on mw1150 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[04:25:33] RECOVERY - puppet last run on elastic1012 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:25:40] RECOVERY - puppet last run on mw2080 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures
[04:25:40] RECOVERY - puppet last run on analytics1035 is OK Puppet is currently enabled, last run 31 seconds ago with 0 failures
[04:25:41] RECOVERY - puppet last run on mw1242 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures
[04:25:41] RECOVERY - puppet last run on lvs3001 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures
[04:25:41] RECOVERY - puppet last run on mw1205 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures
[04:25:41] RECOVERY - puppet last run on labstore1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:25:51] RECOVERY - puppet last run on db2059 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:25:52] RECOVERY - puppet last run on db2045 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures
[04:26:00] RECOVERY - puppet last run on mw2213 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:00] RECOVERY - puppet last run on mw2023 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures
[04:26:00] RECOVERY - puppet last run on es2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:00] RECOVERY - puppet last run on mw2036 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures
[04:26:00] RECOVERY - puppet last run on mw2033 is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures
[04:26:00] RECOVERY - puppet last run on mw2013 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures
[04:26:01] RECOVERY - puppet last run on mw1173 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:01] RECOVERY - puppet last run on mw1176 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:01] RECOVERY - puppet last run on mw1226 is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures
[04:26:10] RECOVERY - puppet last run on mw1008 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:21] RECOVERY - puppet last run on db1015 is OK Puppet is currently enabled, last run 47 seconds ago with 0 failures
[04:26:31] RECOVERY - puppet last run on mw1160 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:41] RECOVERY - puppet last run on mw2173 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures
[04:26:41] RECOVERY - puppet last run on mw1117 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:41] RECOVERY - puppet last run on mw2163 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:41] RECOVERY - puppet last run on db1046 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:41] RECOVERY - puppet last run on mw2104 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:41] RECOVERY - puppet last run on db2018 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:50] RECOVERY - puppet last run on mw1099 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:26:51] RECOVERY - puppet last run on mw1065 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures
[04:27:00] RECOVERY - puppet last run on mw1119 is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures
[04:27:00] RECOVERY - puppet last run on mw2016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:27:00] RECOVERY - puppet last run on cp3010 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:27:01] RECOVERY - puppet last run on mw1060 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:27:01] RECOVERY - puppet last run on mw2184 is OK Puppet is currently enabled, last run 29 seconds ago with 0 failures
[04:27:01] RECOVERY - puppet last run on mw2097 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures
[04:27:01] RECOVERY - puppet last run on mw2082 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:27:10] RECOVERY - puppet last run on cp3037 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:27:20] RECOVERY - puppet last run on mw1088 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:27:21] RECOVERY - puppet last run on mw2114 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:27:21] RECOVERY - puppet last run on mw2083 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:27:21] RECOVERY - puppet last run on mw2093 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures
[04:27:42] RECOVERY - puppet last run on mw2043 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:28:31] RECOVERY - puppet last run on mw2127 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:14:41] Ops-Access-Requests, operations: Deployment Access for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1398996 (ellery) NEW
[05:36:11] (PS11) BBlack: move text backend_random into "directors" [puppet] - https://gerrit.wikimedia.org/r/220645
[05:36:13] (PS11) BBlack: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644
[05:41:39] (PS1) KartikMistry: CX: Enable Content Translation in 20150625 deployment [mediawiki-config] - https://gerrit.wikimedia.org/r/220706 (https://phabricator.wikimedia.org/T95955)
[05:56:15] (CR) Giuseppe Lavagetto: [C: -1] "I like the logic and the implementation, a few details to correct around. I'll give it a try." (7 comments) [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[06:00:04] (PS1) KartikMistry: CX: Add languages for deployment on 20150625 [puppet] - https://gerrit.wikimedia.org/r/220707 (https://phabricator.wikimedia.org/T95955)
[06:21:09] (PS12) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[06:22:37] (CR) Giuseppe Lavagetto: [C: -1] "This huge yaml array written in this way is completely unreadable to me, is a performance hit and should probably be part of your deployme" [puppet] - https://gerrit.wikimedia.org/r/220707 (https://phabricator.wikimedia.org/T95955) (owner: KartikMistry)
[06:36:19] operations, MediaWiki-File-management, MediaWiki-Tarball-Backports, Multimedia, and 6 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1399147 (Legoktm) >>! In T102566#1384154, @faidon wrote: > Our CA for production is GlobalSign. It is one of the big (in...
[06:52:28] (PS1) Nemo bis: Disable uploads on br.wikipedia (except emergency uploads) [mediawiki-config] - https://gerrit.wikimedia.org/r/220715 (https://phabricator.wikimedia.org/T103068)
[06:59:13] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 25 06:59:13 UTC 2015 (duration 59m 12s)
[06:59:21] Logged the message, Master
[07:08:51] (PS1) Jcrespo: Mobile data is generating an avalache of queries on dbstore1002 [puppet] - https://gerrit.wikimedia.org/r/220716
[07:10:49] ^does anyone know who is the owner of stat1003 so he is on the loop?
[07:11:20] I don't think any one person owns stat1003, it's gonna come to down to dealing with individual users
[07:11:39] bblack, I imagined that, but it is the "stats" user
[07:11:52] so very generic
[07:11:56] ah right, ok
[07:12:05] (also, commitmsg mentions wrong hostname, I think)
[07:12:05] I will send a generic email, too
[07:12:29] or probably is being applied to the wrong host
[07:13:09] operations, Wikimedia-Git-or-Gerrit: Get rid of the gerrit Debian package and migrate to puppet - https://phabricator.wikimedia.org/T103735#1399302 (MoritzMuehlenhoff) No, it for some reason the package only takes care of removing the gerrit2 user/group, but not creating them: The maintainer scripts of a...
[07:13:58] (PS2) Jcrespo: Mobile data is generating an avalache of queries on dbstore1002 [puppet] - https://gerrit.wikimedia.org/r/220716
[07:14:06] no, you are right, the commit message was off
[07:14:16] thank you, bblack !
[07:14:22] np!
[07:14:48] I guess ping ottomata specifically when you get the chance, he'll know who else to ping probably
[07:32:16] (PS3) Jcrespo: Disable temporarelly on stat1003 a cron job (mobile data) [puppet] - https://gerrit.wikimedia.org/r/220716
[07:34:15] (CR) Muehlenhoff: "Just nitpicking, but quite a few of those fonts are common in jessie and earlier Ubuntu releases, e.g. the xfonts-* or texlive fonts. Also" [puppet] - https://gerrit.wikimedia.org/r/218640 (https://phabricator.wikimedia.org/T102623) (owner: Dzahn)
[07:35:08] bblack, yeah, I saw the cron comment, but he probably will not be available for a while
[07:46:10] PROBLEM - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1373 bytes in 0.110 second response time
[07:55:07] ^I do not know how to respond to that
[07:55:52] me either!
[07:56:21] <_joe_> jynus: uhm no idea either
[07:56:33] https://www.wikidata.org/wiki/Special:DispatchStats
[07:58:17] s2 is not very busy at he moment, so it has to be at application layer
[07:58:23] <_joe_> I'd say the nagios check needs to be looked at
[07:58:35] <_joe_> "poattern not found"
[07:59:43] it's actually checking the json from: 'https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&format=json&siprop=statistics'
[08:00:13] the ereg match looks for this chunk, with lag <60, basically: "median":{"pending":9683,"lag":2117
[08:00:27] (but that 2117 was what I saw a bit ago)
[08:01:39] so probably job-related
[08:01:47] (so probably the description should say "higher than 1 minute" too)
[08:13:28] Puppet, Beta-Cluster, OCG-General-or-Unknown, OCG-PDF-renderer: Error: Sysctl::Parameters[wikimedia base]: Could not evaluate: can't dup Symbol on deployment-pdf01 - https://phabricator.wikimedia.org/T87197#1399393 (hashar) Unrelated but on deployment-pdf01 I have deleted /var/log/ocg/ content. Th...
[08:20:21] <_joe_> bblack: are you morally against creating custom puppet functions?
[08:20:47] <_joe_> I'd prefer that than the inline_tempalte thing to figure aout the list of backends
[08:21:08] <_joe_> sorry, I have a 2 seconds lag so I type blind :P
[08:25:17] _joe_: they both suck, but I feel like a one-liner inline template is less obtuse than defining a whole new puppet function in another file and calling it.
[08:25:58] (for trivial cases like these, where really the only reason for the one-liner is deficiency of the puppet language's operators and control structures, which are already fixed in puppet4 supposedly)
[08:26:20] unless!*
[08:26:49] unless you can make the puppet function very generic and stdlibby. As in, define a useful map() operator that makes the inline not necessary with sane-looking synax.
[08:27:13] I meant the above more as a reaction to if you defined a custom puppet function that was specific to the directors hacks
[08:27:14] <_joe_> well, since this is failing :)
[08:27:37] <_joe_> and I think it's because it lacks some checks
[08:27:46] it should work in theory, it's probably just syntax fail from me typing it in blindly and guessing
[08:28:09] <_joe_> I'm still trying to find out why it fails btw
[08:29:08] making those work in simpler ways was one reason I defined the 'dynamic' attribute everywhere, instead of just default it to "no" and leaving it out of most stanzas heh.
[08:30:29] _joe_: oh there's a stupid syntax error in both inlines
[08:30:58] the second "{" should not be there in: @directors.map{|k,v| { v['dynamic'] }
[08:31:03] same issue in both
[08:31:09] <_joe_> bblack: I was about to point that out
[08:31:32] comes from me going back and forth between ruby and future-puppet syntax heh
[08:32:04] <_joe_> also found other issues, fixing
[08:32:19] future-puppet is: $a.map |$x| { ... } ruby is a.map{ |$x| .... }
[08:32:29] which seems like a retarded discrepancy given puppet is implemented in ruby
[08:33:11] <_joe_> yeah
[08:33:14] (PS1) Jcrespo: Depool db1018 to disable performance schema [mediawiki-config] - https://gerrit.wikimedia.org/r/220721
[08:33:15] <_joe_> but you know
[08:33:23] <_joe_> puppetlabs
[08:33:23] !log ori Synchronized php-1.26wmf11/resources/src/mediawiki.skinning/elements.css: I0e5f2d3b2: Wrap lines in
 and .mw-code by default (duration: 00m 12s)
[08:33:29] 	 Logged the message, Master
[08:36:33] 	 _joe_: fundamentally what those inlines are doing isn't rocket science.  it's just puppet lacks sufficient expressive power.
[08:37:52] 	 oh I see the other bug now I think, puppet vs ruby hashref syntax?
[08:38:02] 	 on v['backends']
[08:38:24] 	 hmm no, that's right
[08:38:43] <_joe_>	 bblack: yes, I agree
[08:38:58] <_joe_>	 bblack: there are other issues I'm trying to iron out btw
[08:39:16] 	 ok I'll shut up and await PS13 :)
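A minimal Ruby sketch of the fix described at 08:30:58: the stray `{` after `|k,v|` is dropped, so the block body is just the expression to collect. The `@directors` hash is a hypothetical stand-in (director names, `dynamic` values and backend lists are invented), and `ERB#result(binding)` is used only to approximate how Puppet's `inline_template` renders ERB with class variables exposed as `@`-variables; the actual template text from the patch is not reproduced here.

```ruby
require 'erb'

# Hypothetical stand-in for the "directors" hash passed to varnish::instance;
# director names, 'dynamic' values and backend lists are made up.
@directors = {
  'backend'        => { 'dynamic' => 'no',  'backends' => ['cp1', 'cp2'] },
  'backend_random' => { 'dynamic' => 'yes', 'backends' => ['cp2', 'cp3'] },
}

# Fixed block body: no stray "{" after |k,v|, just the expression to collect.
# The future-parser Puppet spelling would be: $directors.map |$k, $v| { $v['dynamic'] }
TEMPLATE = "<%= @directors.map { |k, v| v['dynamic'] }.join(',') %>"

# inline_template renders ERB with Puppet variables visible as @-variables;
# ERB#result(binding) approximates that by evaluating against this top-level scope.
dynamic_flags = ERB.new(TEMPLATE).result(binding)

# Same map pattern, collecting every backend across all directors.
all_backends = @directors.map { |_k, v| v['backends'] }.flatten.uniq

puts dynamic_flags
puts all_backends.join(' ')
```

Collecting per-director lists and then calling `flatten.uniq` keeps the one-liner short, which is the trade-off weighed above against writing a custom Puppet function.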
[08:40:00] 	 (PS2) Jcrespo: Depool db1018 to disable performance schema [mediawiki-config] - https://gerrit.wikimedia.org/r/220721
[08:40:32] 	 (CR) Jcrespo: [C: 2] Depool db1018 to disable performance schema [mediawiki-config] - https://gerrit.wikimedia.org/r/220721 (owner: Jcrespo)
[08:42:39] 	 !log jynus Synchronized wmf-config/db-eqiad.php: depool db1018 for maintenance (duration: 00m 13s)
[08:42:43] 	 Logged the message, Master
[08:48:16] 	 (PS13) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[08:51:31] 	 lol at the <%= bugs too
[08:51:41] 	 man, I should've slept before writing that or something :)
[08:52:46] 	 _joe_: don't worry about templates/varnish/text stuff for _random in that patch
[08:53:07] <_joe_>	 well, it broke
[08:53:08] 	 it's an oversight in that that makes that patch break, but that gets wiped out in the next patch, which is relatively trivial
[08:53:21] 	 the whole thing you're fixing there gets deleted
[08:54:45] <_joe_>	 ok
[08:55:10] 	 (could just merge those two patches really, it would be simpler given the breakage)
[08:56:12] 	 (PS2) Alexandros Kosiaris: backup home dirs on bastion hosts per role [puppet] - https://gerrit.wikimedia.org/r/220657 (owner: Dzahn)
[08:56:19] 	 Puppet, Beta-Cluster, OCG-General-or-Unknown, OCG-PDF-renderer: Error: Sysctl::Parameters[wikimedia base]: Could not evaluate: can't dup Symbol on deployment-pdf01 - https://phabricator.wikimedia.org/T87197#1399486 (hashar) Open>Resolved a:hashar I removed puppet and ruby from deployment-...
[09:00:29] 	 (CR) Alexandros Kosiaris: [C: -1] "Premise seems fine, comments inline" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/220657 (owner: Dzahn)
[09:02:21] 	 !log restarting mysqld on db1018
[09:02:26] 	 Logged the message, Master
[09:07:15] 	 operations, ops-eqiad, Database: Disk issue on db1028 - https://phabricator.wikimedia.org/T103230#1399509 (jcrespo) a:jcrespo
[09:09:43] 	 Blocked-on-Operations, operations, Continuous-Integration-Infrastructure, Patch-For-Review: Build Debian package ruby-jsduck for Jessie - https://phabricator.wikimedia.org/T95008#1399511 (akosiaris) @hashar  Indeed. And, you are obviously right about bundler/gems. I did the porting of ruby-rkelly-re...
[09:18:21] 	 RECOVERY - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1360 bytes in 0.171 second response time
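The dispatch-lag alert reports only "pattern not found" because, per the 08:00:13 explanation, the check folds its threshold (a median lag below 60 seconds) into the regex it runs against the siteinfo statistics JSON. A hedged Ruby sketch of an alternative that extracts the number first, so an alert could state the lag; the helper names, the UNKNOWN state and the sample body are hypothetical, and only the `"median":{"pending":N,"lag":N` chunk shape comes from the log.

```ruby
# Threshold taken from the log: the check's pattern only matches a median
# lag below 60 seconds, although the check's name says 2 minutes.
LAG_THRESHOLD_SECONDS = 60

# Extracts the median lag from a raw siteinfo response body by matching the
# '"median":{"pending":N,"lag":N' chunk. Returns nil if the chunk is absent.
def median_lag(body)
  m = body.match(/"median":\{"pending":(\d+),"lag":(\d+)/)
  m && Integer(m[2])
end

# Hypothetical status helper: unlike a threshold-in-regex check, it separates
# "chunk not found" from "lag too high" and reports the measured value.
def dispatch_status(body, threshold = LAG_THRESHOLD_SECONDS)
  lag = median_lag(body)
  return 'UNKNOWN: median lag not found' if lag.nil?

  lag < threshold ? "OK: median lag #{lag}s" : "CRITICAL: median lag #{lag}s"
end

# Made-up response fragment reusing the numbers quoted in the log.
sample = '{"median":{"pending":9683,"lag":2117}}'
puts dispatch_status(sample)
```

With the 2117-second lag quoted in the log this reports CRITICAL with the measured value, rather than a bare "pattern not found".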
[09:30:03] 	 (PS14) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[09:34:10] 	 (Abandoned) Hashar: (WIP) vmbuilder with puppet (WIP) [puppet] - https://gerrit.wikimedia.org/r/208939 (owner: Hashar)
[09:34:37] 	 (PS4) Hashar: contint: PIL 1.1.7 expects libs in /usr/lib [puppet] - https://gerrit.wikimedia.org/r/216307 (https://phabricator.wikimedia.org/T101550)
[09:35:00] 	 (CR) Hashar: [C: 1 V: 2] "Still on integration puppetmaster." [puppet] - https://gerrit.wikimedia.org/r/216307 (https://phabricator.wikimedia.org/T101550) (owner: Hashar)
[09:35:33] 	 (Abandoned) Hashar: base: vim -> vim-nox [puppet] - https://gerrit.wikimedia.org/r/203342 (owner: Hashar)
[09:35:51] 	 (PS2) Hashar: contint: role::ci::slave::labs::light [puppet] - https://gerrit.wikimedia.org/r/217466 (https://phabricator.wikimedia.org/T94836)
[09:37:41] 	 (CR) Hashar: [C: 1 V: 2] "That is applied on the integration puppet master and has let me create a dumb Jenkins slave based on Jessie, in turn letting us run gdnsd " [puppet] - https://gerrit.wikimedia.org/r/217466 (https://phabricator.wikimedia.org/T94836) (owner: Hashar)
[09:37:53] 	 (PS3) Hashar: contint: authdns::lint on light Jessie slave [puppet] - https://gerrit.wikimedia.org/r/217467 (https://phabricator.wikimedia.org/T98003)
[09:38:28] 	 (CR) Hashar: [C: 1 V: 2] "Applied on integration puppetmaster. The Jenkins job is https://integration.wikimedia.org/ci/job/operations-dns-lint/" [puppet] - https://gerrit.wikimedia.org/r/217467 (https://phabricator.wikimedia.org/T98003) (owner: Hashar)
[09:38:37] 	 (PS2) Hashar: contint: do not install zuul on light slaves [puppet] - https://gerrit.wikimedia.org/r/217476 (https://phabricator.wikimedia.org/T94836)
[09:39:24] 	 (CR) Hashar: [C: 1 V: 2] "That is applied on the integration puppetmaster. Will be removed whenever Zuul is packaged for Jessie." [puppet] - https://gerrit.wikimedia.org/r/217476 (https://phabricator.wikimedia.org/T94836) (owner: Hashar)
[09:39:43] 	 (PS3) Hashar: contint: install python3-tk [puppet] - https://gerrit.wikimedia.org/r/216969 (https://phabricator.wikimedia.org/T101697)
[09:40:05] 	 (CR) Hashar: [V: 2] "Still applied on integration puppetmaster" [puppet] - https://gerrit.wikimedia.org/r/216969 (https://phabricator.wikimedia.org/T101697) (owner: Hashar)
[09:40:13] 	 (PS15) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[09:41:00] 	 (CR) jenkins-bot: [V: -1] restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[09:44:47] 	 (PS16) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[09:45:36] 	 (CR) jenkins-bot: [V: -1] restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[09:54:48] 	 operations, Phabricator, Phabricator-Sprint-Extension, Patch-For-Review: Odd error whilst navigating to https://phabricator.wikimedia.org/project/sprint/burn/1113/ - https://phabricator.wikimedia.org/T102142#1399600 (Christopher)
[10:03:44] 	 (PS1) Jcrespo: Repool db1018 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/220730
[10:05:14] 	 (PS17) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[10:06:54] 	 (CR) Jcrespo: [C: 2] Repool db1018 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/220730 (owner: Jcrespo)
[10:08:36] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1399655 (Aklapper)
[10:09:49] 	 !log jynus Synchronized wmf-config/db-eqiad.php: repool db1018 (duration: 00m 12s)
[10:09:54] 	 Logged the message, Master
[10:12:13] 	 this error is new: Title::invalidateCache Lock wait timeout exceeded; try restarting transaction
[10:13:16] 	 and the question is, why people do not get that an app can be run in more than one server at a time?
[10:13:46] <_joe_>	 jynus: ah it's something I've been asking myself since I started doing operations for websites
[10:14:25] 	 but _joe_ I studied concepts like mutexes, locks and semaphores in university
[10:14:43] <_joe_>	 jynus: the next level is "why people do not get that an app can be run in a context with arbitrary latencies?" which is the multi-dc scenario
[10:14:44] 	 I assume other people too!
[10:15:16] <_joe_>	 jynus: I didn't, I studied general relativity, quantum field theory and FORTRAN
[10:15:30] 	 oh, you are one of "those"
[10:15:38] 	 :-)
[10:15:56] 	 I studied how to deal with idiots
[10:16:24] 	 YuviPanda, doesn't seem to work, you are still dealing with me! :-)
[10:16:31] 	 (PS2) Hashar: contint: Add 'libffi-dev' package [puppet] - https://gerrit.wikimedia.org/r/220694 (https://phabricator.wikimedia.org/T103775) (owner: Legoktm)
[10:16:35] 	 *how* to, you see :P
[10:16:36] 	 not how not to :P
[10:17:01] 	 YuviPanda: that's WMF on-the-job training, right? ;-)
[10:17:13] 	 valhallasw: got a head start, I did
[10:17:28] 	 I should write up the horror stories at some point
[10:18:45] 	 (CR) Hashar: [C: 1 V: 2] "Cherry picked on integration puppetmaster." [puppet] - https://gerrit.wikimedia.org/r/220694 (https://phabricator.wikimedia.org/T103775) (owner: Legoktm)
[10:20:11] 	 _joe_: i will give you the nobel prize for ops if you find a way to teach dev's beyond "the app runs well on my machine", going to one server, then to many then to multiple sites is beyond the commons dev person
[10:20:41] 	 *common
[10:23:00] 	 (CR) Hashar: [C: -1] contint: Create symlink for composer in /usr/local/bin/ (1 comment) [puppet] - https://gerrit.wikimedia.org/r/220658 (owner: Legoktm)
[10:23:37] 	 hi matanya 
[10:23:47] 	 hello YuviPanda 
[10:24:02] 	 yes, i am killing NFS
[10:24:12] 	 matanya: did you get added to the project already?
[10:24:31] 	 oh, i predicted the next question wrongly :)
[10:24:43] 	 :D
[10:25:03] 	 matanya: another question is - for the video project, can you use /data/scratch?
[10:25:06] 	 instead of /data/project?
[10:25:15] 	 (PS2) Hashar: contint: Create symlink for composer in /usr/local/bin/ [puppet] - https://gerrit.wikimedia.org/r/220658 (owner: Legoktm)
[10:26:14] 	 (CR) Hashar: [C: 1 V: 2] "Cherry picked on integration puppetmaster. I have verified via salt the vendor/bin/composer file actually exist :-D" [puppet] - https://gerrit.wikimedia.org/r/220658 (owner: Legoktm)
[10:26:25] 	 matanya: added you
[10:26:28] 	 YuviPanda: i wasn't added to project, and i can start using /data/scratch, but only after the current run ends
[10:26:32] 	 thanks
[10:26:49] 	 matanya: indeed, that's great :) 
[10:26:55] <_joe_>	 matanya: well all of that adds a significant, exponential difficulty to writing applications. It's not easy at all
[10:27:07] 	 matanya: you know hat means that it won't get backed up, etc, right?
[10:27:10] 	 i didn't say it is _joe_ :)
[10:27:14] 	 and could get blown away in case out outages
[10:27:26] 	 YuviPanda: i figured
[10:27:46] 	 it should be ok, if labs outages will be less frequent
[10:27:58] 	 matanya: thank you :)
[10:28:44] 	 (CR) Hashar: [C: 1] "Thanks Timo!" [puppet] - https://gerrit.wikimedia.org/r/220666 (https://phabricator.wikimedia.org/T103766) (owner: Krinkle)
[10:29:44] 	 matanya: can you comment on https://phabricator.wikimedia.org/T102402
[10:30:45] 	 done
[10:31:23] 	 matanya: hmm, so you still need your shared homedir?
[10:31:50] 	 YuviPanda: yes, when i bring more than more instance i need to share my homedir between them
[10:31:52] 	 (PS1) Jcrespo: Depooling es2003 and es2004 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/220734
[10:32:04] 	 matanya: hmm, what are you keeping in your homedirs?
[10:32:05] 	 but that is rare nowadays
[10:32:26] 	 pywikibot, scripts dev in progress etc
[10:32:43] 	 hmmm
[10:33:17] 	 want me to move it elsewhere ?
[10:33:29] 	 matanya: ideally no NFS for homes :D 
[10:33:36] 	 but there's also the question of 'so what if the instance dies'
[10:33:52] 	 ok, i think it can back it up in git
[10:34:07] 	 +1!
[10:34:09] 	 git is best :)
[10:34:11] 	 and just clone every time i need it
[10:34:12] 	 for code
[10:34:14] 	 indeed.
[10:34:23] 	 we'll work on a 'backup' solution as well
[10:34:24] 	 ok, deal
[10:34:57] 	 but give me a few days, i am over-saturated right now
[10:35:30] 	 matanya: no problems at all!
[10:35:38] 	 I still have 65 projects to go through ;)
[10:37:26] 	 (CR) Jcrespo: [C: 2] Depooling es2003 and es2004 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/220734 (owner: Jcrespo)
[10:37:51] 	 good luck with that
[10:40:09] 	 !log jynus Synchronized wmf-config/db-codfw.php: depool es2003 and es2004 for maintenance (duration: 00m 13s)
[10:40:14] 	 Logged the message, Master
[10:42:55] 	 operations, Mathoid, RESTBase, Services: Document and hook up public mathoid end point in RB - https://phabricator.wikimedia.org/T102030#1399731 (mobrovac)
[10:44:09] 	 akosiaris: https://phabricator.wikimedia.org/T103812 another one of your instances, permission to remove NFS :)
[10:44:17] 	 akosiaris: seems to have been used for building etherpadlite packages
[10:45:25] 	 YuviPanda: ah, yes etherpadlite packages can not be built on copper right now
[10:45:39] 	 akosiaris: oh, I see.
[10:45:45] 	 akosiaris: but do you need NFS there? :)
[10:45:52] 	 I can recover your files onto local places
[10:45:55] 	 but I thought I had disable nfs already
[10:46:09] 	 gimme a sec 
[10:46:12] 	 akosiaris: nope, and there's files of yours there.
[10:46:13] 	 akosiaris: ok
[10:46:21] 	 at least when I looked at the server
[10:46:43] 	 ok, so I had disabled and moved /home
[10:47:02] 	 I 've copied already whatever I needed in the VM anyway
[10:47:14] 	 the rest {/keys,/dumps,/scratch} I do not need
[10:47:21] 	 and /project and /home are gone already
[10:47:28] 	 feel free to kill NFS from that project
[10:47:29] 	 akosiaris: alrighty then!
[10:47:31] 	 thank you
[10:51:13] 	 (CR) Muehlenhoff: [C: 1] "Looks good to me (now that sshd has been updated on all precise machines), but before merging the sshd_config.erb changes we should test i" [puppet] - https://gerrit.wikimedia.org/r/218411 (https://phabricator.wikimedia.org/T102401) (owner: Yuvipanda)
[10:53:05] 	 moritzm: sure! let me arrange that
[10:53:15] 	 (PS6) Yuvipanda: ssh: Extend all the cipher goodies to precise as well [puppet] - https://gerrit.wikimedia.org/r/218411 (https://phabricator.wikimedia.org/T102401)
[10:56:31] 	 moritzm: is on deployment-salt.eqiad.wmflabs now
[10:57:15] 	 moritzm: works well afaict
[10:57:51] 	 !log rebooting es2003 and es2004
[10:57:56] 	 Logged the message, Master
[10:59:15] 	 YuviPanda: I'll have a look
[10:59:27] 	 thanks
[11:00:06] 	 memory failure on es2004
[11:02:18] 	 (PS1) Alexandros Kosiaris: Migrate lvs::monitor to hashes and create_resources [puppet] - https://gerrit.wikimedia.org/r/220736
[11:08:39] 	 YuviPanda: deployment-prep has disable_nist_kex and explicit_macs set in hiera, that's not applicable to prod since all the relevant settings are reset to the default values in sshd
[11:09:06] 	 moritzm: oh, hmm. I can disable those temporarily to test
[11:09:41] 	 let's do that on deployment-salt temporarily, otherwise the test case doesn't compare to prod
[11:09:48] 	 moritzm: yeah, doing now
[11:11:58] 	 moritzm: done
[11:12:06] 	 and puppet ran and put the new config in place
[11:12:27] 	 YuviPanda: I'll have a look
[11:12:47] 	 (PS2) Alexandros Kosiaris: Migrate lvs::monitor to hashes and create_resources [puppet] - https://gerrit.wikimedia.org/r/220736
[11:12:49] 	 (PS1) Alexandros Kosiaris: lvs::monitor: Removed hardcoded IPs [puppet] - https://gerrit.wikimedia.org/r/220740
[11:12:51] 	 (PS1) Alexandros Kosiaris: lvs::monitor: merge configuration hashes [puppet] - https://gerrit.wikimedia.org/r/220741
[11:16:37] 	 moritzm: for labs, do you have input on whether HBA from bastion to hosts is a good/bad idea? The idea is to allow people to login to the bastion via mosh, then ssh on from there. https://phabricator.wikimedia.org/T103552
[11:20:20] 	 (enabled per-project or per-host, of course)
[11:21:27] 	 valhallasw: I'll have a look at the ticket later the day, currently on something else
[11:21:34] 	 moritzm: thanks!
[11:22:17] 	 es2003 back to normal, I will deal with potential faulty memory bank on es2004 after lunch
[11:24:07] 	 YuviPanda: looks good to me, with the recommended SSH client settings chacha20-poly1305 is negotiated and otherwise the SSH client defaults are picked
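The outcome moritzm reports follows from how SSH picks a cipher: per RFC 4253 (section 7.1), the chosen algorithm is the first one on the client's preference list that also appears on the server's list. So a client configured with the recommended settings gets chacha20-poly1305, while any other client falls through to its own first mutually supported choice. A minimal Python sketch of that rule; the algorithm lists are hypothetical and are not the actual sshd configuration deployed here.

```python
def negotiate(client_prefs, server_supported):
    """Return the first client-preferred algorithm the server also offers (RFC 4253, 7.1)."""
    for alg in client_prefs:
        if alg in server_supported:
            return alg
    return None  # no algorithm in common: key exchange fails


# Hypothetical server and client lists, for illustration only.
server = ["chacha20-poly1305@openssh.com", "aes256-gcm@openssh.com", "aes128-ctr"]
tuned_client = ["chacha20-poly1305@openssh.com", "aes256-ctr"]
other_client = ["aes128-ctr", "aes192-ctr", "aes256-ctr"]

print(negotiate(tuned_client, server))
print(negotiate(other_client, server))
```

The client's order decides, not the server's, which is why tuning the client settings is enough to get the preferred cipher.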
[11:24:09] 	 let's merge
[11:24:16] 	 moritzm: \o/
[11:24:23] 	 moritzm: let me amend the commit message slightly and merge
[11:25:22] 	 (PS7) Yuvipanda: ssh: Unify config between precise and trusty/jessie [puppet] - https://gerrit.wikimedia.org/r/218411 (https://phabricator.wikimedia.org/T102401)
[11:25:35] 	 moritzm: alright, merging now!
[11:26:10] 	 (CR) Yuvipanda: [C: 2] ssh: Unify config between precise and trusty/jessie [puppet] - https://gerrit.wikimedia.org/r/218411 (https://phabricator.wikimedia.org/T102401) (owner: Yuvipanda)
[11:26:43] 	 moritzm: \o/
[11:27:15] 	 moritzm: I'm hand running on sodium to make sure it's ok
[11:28:50] 	 moritzm: https://dpaste.de/M6Yp diff happened, but I can still ssh in
[11:29:53] 	 on what host is that diff?
[11:30:22] 	 moritzm: sodium
[11:30:29] 	 moritzm: which is lucid.
[11:31:36] 	 hmm, let me see
[11:32:11] 	 PROBLEM - SSH on sodium is CRITICAL: Connection refused
[11:32:20] 	 or maybe not.
[11:32:55] 	 I still have a shell there
[11:34:18] 	 moritzm: hmm, ssh is dead on lucid. let me submit a patch
[11:34:19] 	 YuviPanda: the sodium part of the patch is wrong: formerly it set neither "Ciphers" nor "KexAlgorithms" on lucid, so sshd kept its internal defaults, and with the new version it requests ciphers which aren't present in openssl 0.9.8/lucid
[11:34:28] 	 yeah, just realized
[11:34:37] 	 I'm preparing a patch now.
[11:37:31] 	 (PS1) Yuvipanda: ssh: Don't specify ciphers / keys explicitly on lucid [puppet] - https://gerrit.wikimedia.org/r/220743
[11:37:35] 	 moritzm: ^
[11:38:47] 	 I'll have a look
[11:43:58] 	 (CR) Muehlenhoff: [C: -1] "Jessie doesn't fulfill the "os_version(['ubuntu > lucid']" conditional, so jessie would be stuck with the SSH defaults." [puppet] - https://gerrit.wikimedia.org/r/220743 (owner: Yuvipanda)
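Muehlenhoff's objection is about the shape of the guard. A condition of the form "Ubuntu newer than lucid" is simply false on Debian jessie, so jessie would silently keep the SSH defaults. The follow-up patchset inverts the test so that only lucid is excluded, leaving lucid's sshd (OpenSSH 5.3 on OpenSSL 0.9.8) on its compiled-in defaults. A minimal Python sketch of the two guards; the function names and release list are hypothetical, and this does not reproduce the exact semantics of the Puppet `os_version` function.

```python
# Hypothetical ordering of the Ubuntu releases mentioned in this log.
UBUNTU_RELEASES = ["lucid", "precise", "trusty"]


def explicit_ciphers_v1(distro, release):
    # Guard shaped like "ubuntu > lucid": only newer Ubuntu releases qualify,
    # so any non-Ubuntu host (e.g. Debian jessie) falls back to the defaults.
    return distro == "ubuntu" and UBUNTU_RELEASES.index(release) > UBUNTU_RELEASES.index("lucid")


def explicit_ciphers_v2(distro, release):
    # Inverted guard: every host except Ubuntu lucid gets the explicit list.
    return not (distro == "ubuntu" and release == "lucid")


for host in [("ubuntu", "lucid"), ("ubuntu", "precise"), ("ubuntu", "trusty"), ("debian", "jessie")]:
    print(host, explicit_ciphers_v1(*host), explicit_ciphers_v2(*host))
```

The two guards agree on every Ubuntu release and differ only on jessie, which is exactly the case the review comment flags.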
[11:46:24] 	 (PS2) Yuvipanda: ssh: Explicitly don't specify ciphers / keys on lucid [puppet] - https://gerrit.wikimedia.org/r/220743
[11:46:25] 	 moritzm: ^
[11:55:45] 	 moritzm: shall I merge?
[11:56:33] 	 (PS18) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[11:57:58] 	 (PS2) Jakob: Add Phragile module. [puppet] - https://gerrit.wikimedia.org/r/218930 (https://phabricator.wikimedia.org/T101235)
[12:01:02] 	 YuviPanda: I'll have a look
[12:01:03] 	 (PS19) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[12:02:16] 	 (CR) Muehlenhoff: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/220743 (owner: Yuvipanda)
[12:02:38] 	 YuviPanda: ^
[12:02:45] 	 moritzm: merging
[12:05:02] 	 (CR) Yuvipanda: [C: 2] ssh: Explicitly don't specify ciphers / keys on lucid [puppet] - https://gerrit.wikimedia.org/r/220743 (owner: Yuvipanda)
[12:06:12] 	 (PS1) KartikMistry: CX: Disable newarticle for languages deployed on 20150625 [mediawiki-config] - https://gerrit.wikimedia.org/r/220747 (https://phabricator.wikimedia.org/T103809)
[12:09:11] 	 RECOVERY - SSH on sodium is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7.1 (protocol 2.0)
[12:31:19] 	 operations, MediaWiki-File-management, MediaWiki-Tarball-Backports, Multimedia, and 6 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1400166 (hashar) >>! In T102566#1399147, @Legoktm wrote: >  > Today @Krinkle and I discovered that using homebrew's pyth...
[12:38:10] 	 (PS1) Joal: Add new projectview to projectcounts aggregation [puppet] - https://gerrit.wikimedia.org/r/220752 (https://phabricator.wikimedia.org/T101118)
[12:40:53] 	 would anyone have half an hour to get a few puppet patches merged in?  Got them deployed on the CI puppetmaster but they could use a final merge :-}
[12:51:34] 	 (CR) JanZerebecki: [C: -1] Add Phragile module. (5 comments) [puppet] - https://gerrit.wikimedia.org/r/218930 (https://phabricator.wikimedia.org/T101235) (owner: Jakob)
[12:55:11] 	 Nemo_bis: around?
[12:55:54] 	 Nemo_bis: poke about https://phabricator.wikimedia.org/T103840
[13:02:05] 	 operations, ops-codfw, Database: Faulty memory on es2004 - https://phabricator.wikimedia.org/T103843#1400240 (jcrespo) NEW
[13:02:21] 	 (PS20) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[13:07:58] 	 operations, ops-codfw, Labs: Labs: Install the new RAID controller in labstore2002 and test - https://phabricator.wikimedia.org/T103267#1400255 (coren) Open>Resolved a:coren This was done as a consequence of T103356  Controller is well-supported and performs well, but there are quirks with the...
[13:10:35] 	 YuviPanda: hi, how do I discover what was in NFS?
[13:10:58] 	 Nemo_bis: hi. can you ssh into any of the instances in the project? if so I can mount them temporarily for you to see.
[13:11:06] 	 Nemo_bis: I can also just give you the output of find or something like that
[13:12:13] 	 YuviPanda: find output is fine
[13:12:13] 	 (PS1) Jcrespo: Repool es2003 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/220756
[13:12:19] 	 Nemo_bis: let me do that, moment
[13:13:48] 	 operations, ops-eqiad: graphite1002 slot 7 disk failed - https://phabricator.wikimedia.org/T103159#1400281 (Cmjohnson) Open>Resolved Swapped Disk   Firmware state: Online, Spun Up Firmware state: Online, Spun Up Firmware state: Online, Spun Up Firmware state: Online, Spun Up Firmware state: Online,...
[13:16:27] 	 Nemo_bis: woah, find is taking forever because you seem to have a full checkout of the entire gnome project in there...
[13:16:45] 	 (CR) Jcrespo: [C: 2] Repool es2003 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/220756 (owner: Jcrespo)
[13:17:10] 	 YuviPanda: ok, so *that* is the directory; ok, just delete everything
[13:17:18] 	 Nemo_bis: haha, ok :)
[13:18:58] 	 (PS21) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[13:19:32] <_joe_>	 Nemo_bis: don't trust Yuvi
[13:19:48] <_joe_>	 he's the NFS grinch, he steals storage to ruin your christmas
[13:23:15] 	 ACKNOWLEDGEMENT - mysqld processes on es2004 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld Jcrespo Memory failure, server could be rebooted at any time T103843. Remove after ticket is closed.
[13:25:45] 	 operations, ops-eqiad: install 10g NIC card to labnet1002 - https://phabricator.wikimedia.org/T103849#1400349 (Cmjohnson) NEW
[13:26:17] 	 _joe_: I'll take the risk then ;) and celebrate some pagan festivity instead
[13:30:37] 	 !log jynus Synchronized wmf-config/db-codfw.php: Repool es2003 (but not es2004) after maintenance (duration: 00m 12s)
[13:30:42] 	 Logged the message, Master
[13:36:41] 	 (PS22) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[13:37:01] 	 is someone playing with ntp?
[13:37:13] 	 (PS2) Alexandros Kosiaris: lvs::monitor: Removed hardcoded IPs [puppet] - https://gerrit.wikimedia.org/r/220740
[13:37:15] 	 (PS2) Alexandros Kosiaris: lvs::monitor: merge configuration hashes [puppet] - https://gerrit.wikimedia.org/r/220741
[13:37:17] 	 (PS3) Alexandros Kosiaris: Migrate lvs::monitor to hashes and create_resources [puppet] - https://gerrit.wikimedia.org/r/220736
[13:37:24] 	 operations, ops-codfw, Labs: Labs: Install the new RAID controller in labstore2002 and test - https://phabricator.wikimedia.org/T103267#1400393 (coren) A note, this was done on 2001 in the end as we tried to debug that one.
[13:37:30] 	 jynus: maybe moritzm tweaks ntp
[13:37:35] 	 for the leap second 
[13:38:02] 	 (CR) jenkins-bot: [V: -1] lvs::monitor: merge configuration hashes [puppet] - https://gerrit.wikimedia.org/r/220741 (owner: Alexandros Kosiaris)
[13:38:25] 	 I have an ntp check failure on es2004, which has just been rebooted
[13:38:36] 	 (codfw)
[13:38:52] 	 ntpdate[3625]: no servers can be used, exiting
[13:39:24] 	 it doesn't matter, it's gone
[13:40:40] 	 jynus, hashar: no, nothing yet
[13:41:00] 	 it wasn't a big issue, except the alarm
[13:41:35] 	 YuviPanda: is there a task for " Reduce the size of the filesystem(s) underlying NFS to speed backups and recovery"? What about decreasing the access.log files?
[13:42:23] 	 (PS23) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[13:42:47] 	 Nemo_bis: there's a 'get logstash for tool labs' task somewhere which mostly covers the access.log case
[13:44:04] 	 Nemo_bis: but there doesn't seem to be a task for the underlying filesystem under https://phabricator.wikimedia.org/maniphest/?statuses=open()&projects=PHID-PROJ-gegkei3dnjsw2xg3jliy#R
[13:44:53] 	 off for kids
[13:44:57] * hashar waves :-}
[13:49:44] 	 valhallasw: that seems overkill; it just needs to be trimmed
[13:50:06] 	 assuming the combined size of access.log files does matter for the NFS pollution, which may be true or not
[13:50:21] 	 aiui it's the rate rather than the size
[13:50:31] 	 (although backing up 1TB files obviously is not much fun either)
[13:51:13] 	 (PS3) Alexandros Kosiaris: lvs::monitor: Removed hardcoded IPs [puppet] - https://gerrit.wikimedia.org/r/220740
[13:51:15] 	 (PS3) Alexandros Kosiaris: lvs::monitor: merge configuration hashes [puppet] - https://gerrit.wikimedia.org/r/220741
[13:51:17] 	 (PS4) Alexandros Kosiaris: Migrate lvs::monitor to hashes and create_resources [puppet] - https://gerrit.wikimedia.org/r/220736
[13:52:34] <_joe_>	 akosiaris: I suppose I should look at those, right?
[13:54:38] 	 _joe_: well, you already did
[13:54:40] 	 I am mostly fixing bugs thanks to the catalogcompiler
[13:55:25] <_joe_>	 akosiaris: and since chase already merged the lvs patch, you must have needed to keep that into account
[13:55:28] <_joe_>	 right?
[13:55:35] 	 for example I messed up IPv6 ;-)
[13:55:47] 	 operations, Database: es[12]00[123] maintenance and upgrade - https://phabricator.wikimedia.org/T101084#1400434 (jcrespo) Open>Resolved With today's work I would say the task is finished.  Summary:  * Upgraded to trusty with kernel upgrade and reboot and MariaDB 10.0.16 all es* slaves (18) hosts * De...
[13:55:54] <_joe_>	 oh Im not on ipv6, who cares
[13:55:55] <_joe_>	 :P
[13:56:28] * Reedy high fives _joe_
[13:56:40] 	 Nemo_bis: what valhallasw says. also access.log isn't backed up. also, the task talks about reducing the size of the *filesystem*, not usage of it. we had one big 40T file system...
[13:56:56] 	 _joe_: I think so, lemme rebase once more
[13:57:14] 	 yeah, it's up to date
[13:57:26] <_joe_>	 akosiaris: 'conftool' => key inside lvs_services
[13:57:31] 	 YuviPanda: ah, ok; yeah, I never understood why it was made 40T
[13:57:39] 	 yeah, me neither...
[13:57:41] 	 Bigger is better. Duh
[13:58:54] 	 _joe_: yup, way too unintrusive to matter for this change
[13:58:55] 	 Reedy: clearly you are a true USAian now :)
[13:58:55] 	 it will matter for the next ones though ;)
[13:59:01] 	 _joe_: so, catalogcompiler says a noop, wanna take a look before I merge them ?
[13:59:14] 	 Reedy: I'm going to skip the obvious joke ;-)
[13:59:49] <_joe_>	 akosiaris: well, I need a coffee right now, if you're on a hurry or blocked merge them
[14:00:01] <_joe_>	 akosiaris: keep an eye on codfw
[14:00:15] <_joe_>	 where we're using etcd actively in pybal, I think
[14:00:58] 	 ok
[14:03:06] 	 (CR) Giuseppe Lavagetto: [C: 1] Migrate lvs::monitor to hashes and create_resources [puppet] - https://gerrit.wikimedia.org/r/220736 (owner: Alexandros Kosiaris)
[14:03:06] 	 I was going to cautiously move the rest of codfw over. akosiaris, _joe_: is that going to work out with your stuff today?
[14:03:06] 	 (CR) Giuseppe Lavagetto: [C: 1] lvs::monitor: Removed hardcoded IPs [puppet] - https://gerrit.wikimedia.org/r/220740 (owner: Alexandros Kosiaris)
[14:03:23] <_joe_>	 chasemp: green light from me
[14:03:34] 	 chasemp: no blocker from me
[14:03:46] <_joe_>	 chasemp: when you're done with codfw, can we send an email to ops@ with a timeline for migrating the rest?
[14:03:57] 	 sure what timeline did you have in mind?
[14:04:25] 	 (PS5) Alexandros Kosiaris: Migrate lvs::monitor to hashes and create_resources [puppet] - https://gerrit.wikimedia.org/r/220736
[14:04:38] 	 (CR) Alexandros Kosiaris: [C: 2 V: 2] Migrate lvs::monitor to hashes and create_resources [puppet] - https://gerrit.wikimedia.org/r/220736 (owner: Alexandros Kosiaris)
[14:05:07] <_joe_>	 chasemp: mh dunno, monday ulsfo, sometimes later next week esams and maybe eqiad?
[14:05:19] <_joe_>	 we need to verify the data with caution, too
[14:05:21] 	 (PS4) Alexandros Kosiaris: lvs::monitor: Removed hardcoded IPs [puppet] - https://gerrit.wikimedia.org/r/220740
[14:05:25] 	 (CR) Alexandros Kosiaris: [C: 2 V: 2] lvs::monitor: Removed hardcoded IPs [puppet] - https://gerrit.wikimedia.org/r/220740 (owner: Alexandros Kosiaris)
[14:06:19] 	 realistically I can do ulsfo and esams in the same week, but eqiad would come after that and I'd be less gung ho there, because I want to let it sit for at least 24-48 hours
[14:06:21] 	 just in case
[14:06:37] 	 not that I have a specific reason for it, I just hate to pull the rug out everywhere at once
[14:07:05] 	 but I'm open to what makes sense
[14:07:26] <_joe_>	 chasemp: I mostly agree
[14:08:07] <_joe_>	 chasemp: I think once we have live traffic on it, we can declare the goal is in good shape, right?
[14:09:06] 	 not a facetious answer: what's the goal exactly? iirc, I think so
[14:10:00] 	 (PS4) Alexandros Kosiaris: lvs::monitor: merge configuration hashes [puppet] - https://gerrit.wikimedia.org/r/220741
[14:18:21] 	 (PS1) Rush: pybal: switch lvs2002 & lvs2005 to confd pools [puppet] - https://gerrit.wikimedia.org/r/220759
[14:21:06] 	 (CR) Alexandros Kosiaris: [C: 2 V: 2] lvs::monitor: merge configuration hashes [puppet] - https://gerrit.wikimedia.org/r/220741 (owner: Alexandros Kosiaris)
[14:21:38] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1400460 (Krenair) Isn't this something the restricted group can do on terbium? I know James A has mentioned being able to run maintenance scripts there before, and we should avoid givin...
[14:22:16] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1400462 (Krenair)
[14:30:25] 	 (Abandoned) Alexandros Kosiaris: role::cache: Move inclusion of lvs::configuration from base [puppet] - https://gerrit.wikimedia.org/r/217544 (owner: Alexandros Kosiaris)
[14:33:00] 	 why do leap seconds have to exist :'(
[14:33:55] <_joe_>	 Negative24: heh.
[14:34:22] 	 if only time was linear (...joke)
[14:34:37] 	 Negative24: hopefully in a few hundred years we'll become type II on kardashev scale, and we'll fix the root cause of them
[14:34:47] 	 (PS4) Jcrespo: Disable temporarelly on stat1003 a cron job (mobile data) [puppet] - https://gerrit.wikimedia.org/r/220716
[14:34:54] 	 MatmaRex: itym kardarshian scale?
[14:35:04] 	 Negative24: hi. Can I get rid of NFS on your performance project?
[14:35:07] 	 I filed a bug and poked you :D
[14:35:36] 	 YuviPanda: go for it. I would do it myself but I'm on vacation so ssh is mucked up :S
[14:35:51] 	 Negative24: ok, I'll do it
[14:35:58] 	 I can't get into anything so I've been doing things locally :)
[14:36:03] 	 (CR) Jcrespo: [C: 2] "This breaks http://mobile-reportcard.wmflabs.org but people seem to agree it is a necesary temporary workaround." [puppet] - https://gerrit.wikimedia.org/r/220716 (owner: Jcrespo)
[14:36:24] 	 YuviPanda: "A civilization capable of harnessing the energy of Kim Kardashian's butt"? :>
[14:37:11] 	 (PS24) Giuseppe Lavagetto: restructure varnish::instance's "directors" [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[14:38:18] <_joe_>	 ok, merging this ^^ if no one objects
[14:40:00] 	 akosiaris: around in next 15-20 minutes?
[14:40:11] 	 (PS1) Alexandros Kosiaris: Add ensure parameter to ntp::daemon [puppet] - https://gerrit.wikimedia.org/r/220761
[14:41:17] 	 kart_: yes
[14:45:04] 	 (PS2) KartikMistry: CX: Add languages for deployment on 20150625 [puppet] - https://gerrit.wikimedia.org/r/220707 (https://phabricator.wikimedia.org/T95955)
[14:45:37] 	 akosiaris: cool, need to merge https://gerrit.wikimedia.org/r/#/c/220707/
[14:45:49] 	 oh there is -1 on it.
[14:45:55] <_joe_>	 yes, from me.
[14:46:03] 	 Has Gerrit stopped sending email notification?
[14:46:13] <_joe_>	 since this early morning my time, but I guess you were sleeping by then :)
[14:46:17] 	 _joe_: Sorry, it can't be fixed right now.
[14:46:36] 	 _joe_: it can be fixed later when we will move to service-runner.
[14:46:44] 	 (CR) Giuseppe Lavagetto: [C: 2] "Verified with the compiler, but I'll take great care in deploying it." [puppet] - https://gerrit.wikimedia.org/r/220644 (owner: BBlack)
[14:46:56] 	 _joe_: can you file a bug with what kind of possible performance issues it can bring?
[14:47:04] 	 moritzm, kudos for a very well written communication
[14:47:18] <_joe_>	 can't be fixed right now what? the way you write an array in yaml?
[14:47:41] 	 _joe_: yes
[14:47:41] <_joe_>	 arrayname: [a,b,c] is valid in yaml as much as the form you're using
[14:47:51] <_joe_>	 kart_: so you're not parsing the yaml
[14:48:09] <_joe_>	 anyway, laters. I have a potentially destructive change to babysit
[14:48:16] 	 _joe_: this can be fixed when we move to serive runner.
[14:48:23] 	 service*
[14:48:48] 	 (PS1) coren: Add a cross-labstore key for backups [puppet] - https://gerrit.wikimedia.org/r/220763
[14:48:49] <_joe_>	 kart_: later, sorry
[14:49:06] 	 YuviPanda: This ^^ and its parent are what I need most atm to start the backups.
[14:49:15] 	 (PS7) Eevans: configure additional Cassandra metric alerts [puppet] - https://gerrit.wikimedia.org/r/218408 (https://phabricator.wikimedia.org/T101764)
[14:49:48] 	 (CR) Eevans: "Latest is just a rebase." [puppet] - https://gerrit.wikimedia.org/r/218408 (https://phabricator.wikimedia.org/T101764) (owner: Eevans)
[14:49:59] 	 RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 3076.03806068
[14:50:18] 	 Can someone in ops please review https://gerrit.wikimedia.org/r/#/c/139581/ ?
[14:50:48] 	 jouncebot, next
[14:50:48] 	 In 0 hour(s) and 9 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150625T1500)
[14:51:05] 	 (PS11) Chad: Allow text-lb to redirect svn access to Diffusion [puppet] - https://gerrit.wikimedia.org/r/219228
[14:51:28] * ostriches wants to land that
[14:51:56] 	 (CR) Ottomata: [C: 1] misc-web varnish: remove stat1001 from config [puppet] - https://gerrit.wikimedia.org/r/220692 (owner: Dzahn)
[14:52:08] 	 PROBLEM - puppet last run on mw1054 is CRITICAL Puppet has 1 failures
[14:52:11] <_joe_>	 hey, no varnish change landing now please
[14:52:24] <_joe_>	 ottomata, mutante 
[14:52:30] 	 (CR) GWicke: [C: 1] configure additional Cassandra metric alerts [puppet] - https://gerrit.wikimedia.org/r/218408 (https://phabricator.wikimedia.org/T101764) (owner: Eevans)
[14:52:40] 	 _joe_: cool, I just +1ed it
[14:53:24] 	 akosiaris: can you merge, https://gerrit.wikimedia.org/r/#/c/220707/ ?
[14:55:21] 	 Who's SWATing?
[14:55:57] 	 I will
[14:56:01] 	 Kk.
[14:56:25] 	 operations, Labs, ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1400512 (yuvipanda) What next for this? We put some hosts on it and run a suspend resume loop?
[14:56:26] 	 kart_: gimme a min
[14:56:32] 	 akosiaris: okay!
[14:56:47] 	 akosiaris: I will address _joe_ 's comment later. We know that :)
[14:57:15] 	 akosiaris: Probably from next week, there won't be any updates in languages :)
[14:58:20] 	 kart_: yeah. _joe_ is obviously right. that construct has become utterly unmanageable
[14:59:19] 	 akosiaris: see mistakes I've made. I agree.
[14:59:22] 	 kart_: but I am gonna hold to your word. this needs fixing. Also service-runner and service::node are probably related
[14:59:31] 	 Yep
[14:59:34] 	 _joe_: wanna remove that -1 ?
[15:00:04] 	 manybubbles anomie ostriches thcipriani marktraceur: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150625T1500).
[15:00:04] 	 James_F kart_: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[15:00:06] <_joe_>	 I'll do the same, and yes
[15:00:10] <_joe_>	 1 sec guys
[15:00:17] * James_F waves to Krenair.
[15:00:24] * kart_ waves
[15:00:28] 	 okay
[15:00:48] 	 operations, Labs, ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1400545 (MoritzMuehlenhoff) I think so. That should show fairly reliable whether the problem still exists (the previous crashes were caused by the restarts after the VENOM securit...
[15:01:00] 	 (PS3) Alex Monk: Add exception for ALA hackathon at WMF Office [mediawiki-config] - https://gerrit.wikimedia.org/r/220653 (https://phabricator.wikimedia.org/T103764) (owner: Jalexander)
[15:01:29] 	 (CR) Giuseppe Lavagetto: "I remove my -1 on the promise that this will be amended ASAP." [puppet] - https://gerrit.wikimedia.org/r/220707 (https://phabricator.wikimedia.org/T95955) (owner: KartikMistry)
[15:01:43] 	 (CR) Alex Monk: [C: 2] Add exception for ALA hackathon at WMF Office [mediawiki-config] - https://gerrit.wikimedia.org/r/220653 (https://phabricator.wikimedia.org/T103764) (owner: Jalexander)
[15:01:47] 	 (PS3) Alexandros Kosiaris: CX: Add languages for deployment on 20150625 [puppet] - https://gerrit.wikimedia.org/r/220707 (https://phabricator.wikimedia.org/T95955) (owner: KartikMistry)
[15:01:49] 	 (Merged) jenkins-bot: Add exception for ALA hackathon at WMF Office [mediawiki-config] - https://gerrit.wikimedia.org/r/220653 (https://phabricator.wikimedia.org/T103764) (owner: Jalexander)
[15:01:54] 	 (CR) Alexandros Kosiaris: [C: 2 V: 2] CX: Add languages for deployment on 20150625 [puppet] - https://gerrit.wikimedia.org/r/220707 (https://phabricator.wikimedia.org/T95955) (owner: KartikMistry)
[15:02:08] 	 akosiaris: thanks!
[15:02:34] 	 !log krenair Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/220653/ (duration: 00m 12s)
[15:02:38] 	 Logged the message, Master
[15:02:47] 	 James_F, ^
[15:02:53] 	 Thanks, Krenair.
[15:03:51] 	 (PS2) Alex Monk: CX: Enable Content Translation in 20150625 deployment [mediawiki-config] - https://gerrit.wikimedia.org/r/220706 (https://phabricator.wikimedia.org/T95955) (owner: KartikMistry)
[15:03:57] 	 (CR) Alex Monk: [C: 2] CX: Enable Content Translation in 20150625 deployment [mediawiki-config] - https://gerrit.wikimedia.org/r/220706 (https://phabricator.wikimedia.org/T95955) (owner: KartikMistry)
[15:04:03] 	 (Merged) jenkins-bot: CX: Enable Content Translation in 20150625 deployment [mediawiki-config] - https://gerrit.wikimedia.org/r/220706 (https://phabricator.wikimedia.org/T95955) (owner: KartikMistry)
[15:04:22] 	 kart_: yw. but please address that. It has become a blocker for you and unneeded irqs for us
[15:04:46] 	 !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220706/ (duration: 00m 12s)
[15:04:47] 	 akosiaris: oops. Can you file a bug, so I can give priority?
[15:04:47] 	 kart_, ^
[15:04:51] 	 Logged the message, Master
[15:05:28] 	 kart_: sure. projects to file against ?
[15:05:38] 	 (PS1) Eevans: logstash-logback-encoder setup [software/logstash-logback-encoder] - https://gerrit.wikimedia.org/r/220764
[15:05:46] 	 akosiaris: cxserver-deployments
[15:08:00] 	 Krenair: doing other patch?
[15:08:07] 	 kart_, yeah, why?
[15:08:22] 	 you need to run the cxserver deployment?
[15:08:28] 	 RECOVERY - puppet last run on mw1054 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:08:32] 	 Krenair: nope. Go ahead!
[15:08:55] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1400652 (RobH) @Ellery.  We would need a bit more information to implement this.  Please see the details on https://wikitech.wikimedia.org/wiki/Requesting_shell_access.    These instruc...
[15:09:07] 	 (CR) Muehlenhoff: [C: -1] "There's a typo in the cron definition, it executes /usr/local/sbin/cercleaner.py instead of certcleaner.py" [puppet] - https://gerrit.wikimedia.org/r/220306 (https://phabricator.wikimedia.org/T102504) (owner: Andrew Bogott)
[15:09:59] 	 (CR) Andrew Bogott: "Does pointing out a typo mean you approve of auto-signing in general?" [puppet] - https://gerrit.wikimedia.org/r/220306 (https://phabricator.wikimedia.org/T102504) (owner: Andrew Bogott)
[15:10:13] 	 operations, RESTBase, RESTBase-Cassandra, Patch-For-Review: configure less aggressive cassandra log rotation / send cassandra logs to logstash - https://phabricator.wikimedia.org/T100970#1400659 (Eevans)
[15:10:50] 	 urgh... right, the auto-submodule-update thing
[15:10:55] 	 hm.
[15:12:52] 	 Krenair: again?
[15:12:56] 	 !log krenair Synchronized php-1.26wmf11/extensions/SemanticForms/includes/SF_AutoeditAPI.php: https://gerrit.wikimedia.org/r/#/c/220765/ (duration: 00m 12s)
[15:12:56] 	 (PS2) Rush: pybal: switch lvs2002 & lvs2005 to confd pools [puppet] - https://gerrit.wikimedia.org/r/220759
[15:13:01] 	 Logged the message, Master
[15:13:03] 	 kart_, I had forgotten about it, all is fine
[15:13:48] 	 (CR) Rush: [C: 2] pybal: switch lvs2002 & lvs2005 to confd pools [puppet] - https://gerrit.wikimedia.org/r/220759 (owner: Rush)
[15:14:26] <_joe_>	 chasemp: why didn't you use hiera?
[15:14:34] 	 Coren: doing a round of CR now.
[15:14:52] 	 _joe_: for node selection?
[15:15:29] <_joe_>	 yep
[15:15:51] 	 (PS2) Alex Monk: Enable the SandboxLink extension on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/220408 (https://phabricator.wikimedia.org/T103643) (owner: Ricordisamoa)
[15:15:52] <_joe_>	 well, nevermind, follow your change :)
[15:15:54] 	 the logic is temporary and this puts it all in one place across sites for cleanup and is easier to test / possible to test with puppet apply locally to validate
[15:15:58] 	 (CR) Alex Monk: [C: 2] Enable the SandboxLink extension on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/220408 (https://phabricator.wikimedia.org/T103643) (owner: Ricordisamoa)
[15:16:00] <_joe_>	 ook
[15:16:01] <_joe_>	 np
[15:16:05] 	 (Merged) jenkins-bot: Enable the SandboxLink extension on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/220408 (https://phabricator.wikimedia.org/T103643) (owner: Ricordisamoa)
[15:16:08] <_joe_>	 look after your change
[15:16:10] <_joe_>	 :)
[15:16:41] 	 !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220408/ (duration: 00m 12s)
[15:16:47] 	 Logged the message, Master
[15:17:06] 	 kart_: https://phabricator.wikimedia.org/T103856
[15:17:38] 	 mutante: puppet runs in codfw seem to be starting ganglia-monitor, which was stopped? I don't know if it is crashing in between runs and has a problem, but I've seen it a few times, fyi
[15:17:53] 	 (CR) Alex Monk: "want to stick this up for the next swat deployment, matanya?" [mediawiki-config] - https://gerrit.wikimedia.org/r/219599 (https://phabricator.wikimedia.org/T103062) (owner: Matanya)
[15:17:56] <_joe_>	 chasemp: or puppet doesn't detect correctly it's running
[15:18:16] 	 just as likely yes
[15:18:32] <_joe_>	 chasemp: I'm 99% sure that's the case
[15:18:37] 	 (CR) Matanya: "yes, please." [mediawiki-config] - https://gerrit.wikimedia.org/r/219599 (https://phabricator.wikimedia.org/T103062) (owner: Matanya)
[15:18:58] <_joe_>	 I even looked into it and decided I didn't care enough to solve it until we've migrated to ganglia_new
[15:19:02] 	 (CR) Ottomata: Add new projectview to projectcounts aggregation (2 comments) [puppet] - https://gerrit.wikimedia.org/r/220752 (https://phabricator.wikimedia.org/T101118) (owner: Joal)
[15:19:34] 	 operations, ops-codfw: EQDFW/EQORD Deployment Prep Task - https://phabricator.wikimedia.org/T91077#1400687 (mark) Let's also get a few short SC-SC single mode fibers, so we can connect a few patches (delivered by Equinix) together, until we have the router working/up. We'll be splitting an existing connec...
[15:19:47] 	 Krenair: merged my patch? :)
[15:19:58] 	 Krenair: https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/mediawiki-config+-label:Code-Review-1+-label:Code-Review-2+-label:Verified-1,n,z has quite a bunch of patchsets…
[15:19:59] 	 kart_, the CX deployment one? yeah
[15:20:20] 	 James_F, I generally ignore the ones where ownerin:wmf-deployment
[15:20:30] 	 Krenair: second, was important to merge along with.
[15:20:44] 	 (CR) Mobrovac: [C: 1] logstash-logback-encoder setup [software/logstash-logback-encoder] - https://gerrit.wikimedia.org/r/220764 (owner: Eevans)
[15:20:46] 	 Krenair: Hmm. Reedy didn't. :-)
[15:21:00] 	 I didn't what?
[15:21:04] 	 Krenair: omg. I didn't save the page after adding to SWAT :/
[15:21:06] 	 Krenair: The old rule was "-1 it or Sam will merge it when it looks like a good time for it to go".
[15:21:07] 	 (CR) Yuvipanda: Make labstore configuration into a module (5 comments) [puppet] - https://gerrit.wikimedia.org/r/220618 (https://phabricator.wikimedia.org/T93781) (owner: coren)
[15:21:29] 	 Reedy: Let old config patches lie just 'cos someone who could deploy them had written them.
[15:21:31] 	 Coren: ^. I don't think the replica stuff belongs in NFS, but that's alright for now.
[15:21:40] 	 jfdi
[15:21:41] 	 Krenair: can you deploy https://gerrit.wikimedia.org/r/#/c/220747/ ?
[15:21:49] 	 (PS2) Jforrester: Follow-up dab01edb: Remove remnants of the Parsoid extension [mediawiki-config] - https://gerrit.wikimedia.org/r/217539
[15:22:13] 	 YuviPanda: This patch is just "move what exists into a module"
[15:22:15] 	 (PS1) Alexandros Kosiaris: WIP: Use hiera to disable ntp fleet wise, with exceptions [puppet] - https://gerrit.wikimedia.org/r/220772
[15:22:54] 	 (PS4) Jforrester: Remove 'autoreview' usergroup from enwiki/testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/203370 (https://phabricator.wikimedia.org/T91934) (owner: Cenarium)
[15:22:56] 	 Krenair: added in swat list. Sorry!
[15:23:00] 	 Coren: yeah, so just wanted a FIXME note :) we should put the monitoring into its own module as well, but am ok if you want to put that into a separate patch
[15:23:10] 	 okay
[15:23:24] 	 (PS2) Alex Monk: CX: Disable newarticle for languages deployed on 20150625 [mediawiki-config] - https://gerrit.wikimedia.org/r/220747 (https://phabricator.wikimedia.org/T103809) (owner: KartikMistry)
[15:24:04] 	 (CR) Alex Monk: [C: 2] CX: Disable newarticle for languages deployed on 20150625 [mediawiki-config] - https://gerrit.wikimedia.org/r/220747 (https://phabricator.wikimedia.org/T103809) (owner: KartikMistry)
[15:24:10] 	 (Merged) jenkins-bot: CX: Disable newarticle for languages deployed on 20150625 [mediawiki-config] - https://gerrit.wikimedia.org/r/220747 (https://phabricator.wikimedia.org/T103809) (owner: KartikMistry)
[15:24:16] 	 (PS2) Jforrester: Enwiki: remove changetags from user and add it to sysop, bot and abusefilter groups [mediawiki-config] - https://gerrit.wikimedia.org/r/218353 (https://phabricator.wikimedia.org/T97013) (owner: Cenarium)
[15:24:35] 	 Krenair: thanks!
[15:24:42] 	 !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220747/ (duration: 00m 12s)
[15:24:47] 	 Logged the message, Master
[15:24:48] 	 kart_, you probably should have done the dependency the other way around
[15:24:54] 	 or even in one patch, perhaps
[15:25:16] 	 (03CR) 10coren: Make labstore configuration into a module (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/220618 (https://phabricator.wikimedia.org/T93781) (owner: 10coren)
[15:25:41] 	 (03CR) 10Jakob: Add Phragile module. (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/218930 (https://phabricator.wikimedia.org/T101235) (owner: 10Jakob)
[15:25:47] 	 (03PS2) 10Jforrester: Labs should not use protocol-relative URLs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220068 (owner: 10Cscott)
[15:25:50] 	 I'd really like to get https://gerrit.wikimedia.org/r/#/c/139326/ in but it's blocked by ops reviewing a trivial puppet patch
[15:25:59] 	 Krenair: yeah. Noted. Thanks!
[15:26:07] 	 Krenair: better idea.
[15:26:28] 	 (03PS3) 10Jakob: Add Phragile module. [puppet] - 10https://gerrit.wikimedia.org/r/218930 (https://phabricator.wikimedia.org/T101235) 
[15:26:42] 	 (03PS3) 10coren: Make labstore configuration into a module [puppet] - 10https://gerrit.wikimedia.org/r/220618 (https://phabricator.wikimedia.org/T93781) 
[15:26:50] 	 YuviPanda: ^^
[15:26:54] 	 (03PS2) 10Jforrester: Update Spam and Title blacklist URLs to be https [mediawiki-config] - 10https://gerrit.wikimedia.org/r/218539 (owner: 10Hoo man)
[15:27:06] 	 (03CR) 10Yuvipanda: "Does it have to be root? can we have a special purpose account that's just for rsync backups without too much effort?" [puppet] - 10https://gerrit.wikimedia.org/r/220763 (owner: 10coren)
[15:27:09] 	 RECOVERY - puppet last run on cp2026 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:27:47] 	 (03CR) 10Yuvipanda: [C: 031] "LGTM if we're ok splitting out the monitoring later" [puppet] - 10https://gerrit.wikimedia.org/r/220618 (https://phabricator.wikimedia.org/T93781) (owner: 10coren)
[15:27:58] 	 (03CR) 10coren: "Not really; rsync needs root in order to set owner/permission/etc and giving sudo to rsync is strictly equivalent to giving root." [puppet] - 10https://gerrit.wikimedia.org/r/220763 (owner: 10coren)
[15:28:15] 	 Krenair: OK, that's at least 5 patches we can do at 16:00 (but sadly I'll be out of the office).
[15:28:20] 	 (03PS1) 10Faidon Liambotis: Fix cr2-eqiad/cr1-esams GRE's PTR typos [dns] - 10https://gerrit.wikimedia.org/r/220775 
[15:28:22] 	 (03PS1) 10Faidon Liambotis: Add loopback IPs for cr1-eqord and cr1-eqdfw [dns] - 10https://gerrit.wikimedia.org/r/220776 
[15:28:24] 	 (03PS1) 10Faidon Liambotis: (WIP) Allocate neighbor blocks for cr1-eqord/cr1-eqdfw [dns] - 10https://gerrit.wikimedia.org/r/220777 
[15:28:48] 	 (03PS2) 10Jforrester: Disable uploads on br.wikipedia (except emergency uploads) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220715 (https://phabricator.wikimedia.org/T103068) (owner: 10Nemo bis)
[15:28:54] 	 (03CR) 10Jforrester: [C: 031] Disable uploads on br.wikipedia (except emergency uploads) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220715 (https://phabricator.wikimedia.org/T103068) (owner: 10Nemo bis)
[15:29:01] 	 (03CR) 10Jforrester: [C: 031] Update Spam and Title blacklist URLs to be https [mediawiki-config] - 10https://gerrit.wikimedia.org/r/218539 (owner: 10Hoo man)
[15:29:18] 	 (03CR) 10Jforrester: [C: 031] Labs should not use protocol-relative URLs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220068 (owner: 10Cscott)
[15:29:25] 	 (03CR) 10Jforrester: [C: 031] Enwiki: remove changetags from user and add it to sysop, bot and abusefilter groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/218353 (https://phabricator.wikimedia.org/T97013) (owner: 10Cenarium)
[15:29:30] 	 (03CR) 10Jforrester: [C: 031] Remove 'autoreview' usergroup from enwiki/testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203370 (https://phabricator.wikimedia.org/T91934) (owner: 10Cenarium)
[15:29:45] 	 Coren: this is rsync over ssh, right?
[15:29:50] 	 (03CR) 10Alex Monk: [C: 032] Disable uploads on br.wikipedia (except emergency uploads) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220715 (https://phabricator.wikimedia.org/T103068) (owner: 10Nemo bis)
[15:29:51] * Coren nods.
[15:29:52] 	 and that's why it needs the root key?
[15:29:55] 	 Yep.
[15:29:55] 	 (03Merged) 10jenkins-bot: Disable uploads on br.wikipedia (except emergency uploads) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220715 (https://phabricator.wikimedia.org/T103068) (owner: 10Nemo bis)
[15:29:57] 	 (03CR) 10Jforrester: [C: 031] Add dvidshub to whitelist upload URLs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/219599 (https://phabricator.wikimedia.org/T103062) (owner: 10Matanya)
[15:30:07] 	 (03CR) 10Yuvipanda: [C: 031] "Alright then!" [puppet] - 10https://gerrit.wikimedia.org/r/220763 (owner: 10coren)
[15:30:09] 	 Coren: alright! 
[15:30:14] 	 James_F, there's still plenty of time to swat more things in the queue
[15:30:31] 	 Krenair: Hmm. Want to do it now?
[15:30:42] 	 YuviPanda: Besides, because of the shared filesystems etc, having root on a labstore is already equivalent to having root on all of them.  They're really a single security domain.
[15:30:44] 	 !log krenair Synchronized commonsuploads.dblist: https://gerrit.wikimedia.org/r/#/c/220715/ (duration: 00m 12s)
[15:30:50] 	 Logged the message, Master
[15:30:54] 	 (03PS2) 10Alexandros Kosiaris: Update package_builder docs [puppet] - 10https://gerrit.wikimedia.org/r/219374 
[15:30:59] 	 (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Update package_builder docs [puppet] - 10https://gerrit.wikimedia.org/r/219374 (owner: 10Alexandros Kosiaris)
[15:31:19] 	 YuviPanda: ty for the reviews, that's all I needed to get the backups going.
[15:31:26] 	 yw! sorry about the delay
[15:31:35] 	 (03CR) 10Alex Monk: [C: 032] "Seems sane to me..." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220068 (owner: 10Cscott)
[15:31:41] 	 (03Merged) 10jenkins-bot: Labs should not use protocol-relative URLs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220068 (owner: 10Cscott)
[15:31:45] 	 (03CR) 10Mark Bergsma: [C: 031] Add loopback IPs for cr1-eqord and cr1-eqdfw [dns] - 10https://gerrit.wikimedia.org/r/220776 (owner: 10Faidon Liambotis)
[15:31:48] 	 YuviPanda: You need a clone to delegate some work.  :-)
[15:31:58] 	 Coren: you're the third person to say that :)
[15:32:00] 	 today
[15:32:03] 	 Hah!
[15:32:20] 	 !log krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/220068/ - noop for prod, just labs (duration: 00m 12s)
[15:32:26] 	 Logged the message, Master
[15:32:40] 	 (03PS4) 10coren: Make labstore configuration into a module [puppet] - 10https://gerrit.wikimedia.org/r/220618 (https://phabricator.wikimedia.org/T93781) 
[15:33:51] 	 (03CR) 10coren: [C: 032] "There is already a further round of improvements in the queue, I'll split the monitoring away there." [puppet] - 10https://gerrit.wikimedia.org/r/220618 (https://phabricator.wikimedia.org/T93781) (owner: 10coren)
[15:33:53] 	 James_F, what about https://gerrit.wikimedia.org/r/#/c/200038/ ?
[15:34:52] 	 Krenair: New features probably should get a comment from the author (cscott_away).
[15:35:09] 	 James_F, yeah but I've been waiting for that for ages and it hasn't happened
[15:35:28] 	 :-(
[15:36:23] 	 (03PS1) 10coren: Labs: fix completely dumb type in role::labs::nfs [puppet] - 10https://gerrit.wikimedia.org/r/220779 
[15:36:36] 	 Coren: hehe at 'type'
[15:36:46] 	 Krenair: Should we do https://gerrit.wikimedia.org/r/#/c/218539/ https://gerrit.wikimedia.org/r/#/c/218353/ https://gerrit.wikimedia.org/r/#/c/203370/ ?
[15:37:02] 	 Of *course* I would have a typo in the commit msg of a typo fix.
[15:37:06] 	 James_F, what will it take to get someone in the ops group to +2 https://gerrit.wikimedia.org/r/#/c/139581/ ?
[15:37:19] 	 Krenair: No idea; looks scary.
[15:37:20] 	 (03PS2) 10coren: Labs: fix completely dumb typo in role::labs::nfs [puppet] - 10https://gerrit.wikimedia.org/r/220779 
[15:38:32] 	 YuviPanda: I'm impressed that we managed to *both* completely miss something this glaring.  I guess it's because it made semantic sense even though it was garbage.  :-)
[15:38:36] 	 7Blocked-on-Operations, 6Collaboration-Team, 10Echo, 10Wikimedia-Extension-setup, 5Patch-For-Review: remove echowikis to simplify configuration - https://phabricator.wikimedia.org/T59375#1400772 (10Krenair) Blocked on someone in operations reviewing https://gerrit.wikimedia.org/r/#/c/139581/
[15:39:08] 	 (03CR) 10coren: [C: 032] "Trivial glaring typo fix." [puppet] - 10https://gerrit.wikimedia.org/r/220779 (owner: 10coren)
[15:39:24] 	 James_F, Cenarium's patches... enwiki :/
[15:39:54] 	 Krenair: Community consensus exhibited on both
[15:40:28] 	 (03PS2) 10Alexandros Kosiaris: WIP: Use hiera to disable ntp fleet wise, with exceptions [puppet] - 10https://gerrit.wikimedia.org/r/220772 
[15:40:38] 	 Ah, looks like they don't actually have any users in that group anyway
[15:41:12] 	 Yeah.
[15:41:35] 	 there's some people with it on testwiki
[15:41:42] 	 (03PS12) 10BBlack: move text backend_random into "directors" [puppet] - 10https://gerrit.wikimedia.org/r/220645 
[15:41:43] 	 Want me to remove it from them?
[15:41:59] 	 yes please
[15:42:22] 	 Maybe I should get a sysadmin flag to do these things :p
[15:42:46] 	 (03PS2) 10coren: Add a cross-labstore key for backups [puppet] - 10https://gerrit.wikimedia.org/r/220763 
[15:43:09] 	 What's the display name of the autoreviewer group?
[15:43:21] 	 "Autochecked users"
[15:43:29] 	 You can find it easily by going to Special:ListUsers/autoreview
[15:43:33] 	 Oh, yeah.
[15:43:37] 	 it shows the display name as the selected option
[15:43:57] 	 (03CR) 10coren: [C: 032] "Merge after conflict." [puppet] - 10https://gerrit.wikimedia.org/r/220763 (owner: 10coren)
[15:45:15] 	 (03PS2) 10Dzahn: misc-web varnish: remove stat1001 from config [puppet] - 10https://gerrit.wikimedia.org/r/220692 
[15:45:20] 	 (03CR) 10Muehlenhoff: "Tests were successful both using zotero standalone (as tested on deployment-zotero01) and with zotero serving citoid (as tested on deploym" [puppet] - 10https://gerrit.wikimedia.org/r/220434 (https://phabricator.wikimedia.org/T98852) (owner: 10Muehlenhoff)
[15:46:17] 	 (03CR) 10Dzahn: [C: 032] misc-web varnish: remove stat1001 from config [puppet] - 10https://gerrit.wikimedia.org/r/220692 (owner: 10Dzahn)
[15:47:43] 	 Finally.
[15:47:44] 	 Krenair: https://test.wikipedia.org/w/index.php?title=Special%3AListUsers&username=&group=autoreview&limit=50
[15:48:04] 	 (03CR) 10Alex Monk: [C: 032] Update Spam and Title blacklist URLs to be https [mediawiki-config] - 10https://gerrit.wikimedia.org/r/218539 (owner: 10Hoo man)
[15:48:28] 	 (03Merged) 10jenkins-bot: Update Spam and Title blacklist URLs to be https [mediawiki-config] - 10https://gerrit.wikimedia.org/r/218539 (owner: 10Hoo man)
[15:49:21] 	 !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/218539/ (duration: 00m 15s)
[15:49:27] 	 Logged the message, Master
[15:49:38] 	 (03CR) 10Dzahn: backup home dirs on bastion hosts per role (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/220657 (owner: 10Dzahn)
[15:49:55] 	 Krenair: https://gerrit.wikimedia.org/r/#/c/217539/ is now good to go too.
[15:49:56] 	 (03PS3) 10Dzahn: backup home dirs on bastion hosts per role [puppet] - 10https://gerrit.wikimedia.org/r/220657 
[15:50:10] 	 (03CR) 10Alex Monk: [C: 032] Remove 'autoreview' usergroup from enwiki/testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203370 (https://phabricator.wikimedia.org/T91934) (owner: 10Cenarium)
[15:50:36] 	 (03Merged) 10jenkins-bot: Remove 'autoreview' usergroup from enwiki/testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203370 (https://phabricator.wikimedia.org/T91934) (owner: 10Cenarium)
[15:51:10] 	 !log krenair Synchronized wmf-config/flaggedrevs.php: https://gerrit.wikimedia.org/r/#/c/203370/ (duration: 00m 12s)
[15:51:16] 	 Logged the message, Master
[15:51:52] 	 James_F, let's wait for the parsoid team re. the Parsoid extension
[15:52:16] 	 Krenair: I did; we removed the last bits of it from Puppet last week, I just forgot this bit. :-)
[15:52:24] 	 (03PS1) 10coren: Labs: allow role::nfs::fileserver to use labstore key [puppet] - 10https://gerrit.wikimedia.org/r/220782 
[15:52:35] 	 YuviPanda: Last quickie?  ^^
[15:52:42] 	 looking
[15:53:01] 	 Coren: woah, why?
[15:53:11] 	 why %.d?
[15:53:27] 	 I'm confused
[15:53:32] 	 Because supplemental keys go there per ssh::userkeys
[15:53:37] 	 They're just not read by default
[15:53:59] 	 that seems like a bug in ssh::userkeys to not specify that by default in ssh server config
[15:54:01] 	 This allows checking for a %u.d/labstore key if present.  Same pattern as ganeti
[15:54:04] 	 (03PS1) 10BryanDavis: logging: Force Monolog logger timezone to UTC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220784 (https://phabricator.wikimedia.org/T99581) 
[15:54:06] 	 paravoid: ^
[15:54:20] 	 finally
[15:54:54] 	 (03PS2) 10BryanDavis: logging: Force Monolog logger timezone to UTC [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220784 (https://phabricator.wikimedia.org/T99581) 
[15:55:04] 	 hmm, so as is ssh::userkeys won't actually work for supplemental keys without this extra hack...
[15:55:09] 	 (03CR) 10Alex Monk: [C: 032] Follow-up dab01edb: Remove remnants of the Parsoid extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/217539 (owner: 10Jforrester)
[15:55:12] 	 I think we should just add this to the default list
[15:55:14] 	 *path
[15:55:39] 	 (03Merged) 10jenkins-bot: Follow-up dab01edb: Remove remnants of the Parsoid extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/217539 (owner: 10Jforrester)
[15:55:51] 	 hmm, unless I'm missing something else.
[15:56:22] 	 (03CR) 10Dzahn: [C: 032] add an empty template for domain parking [dns] - 10https://gerrit.wikimedia.org/r/216025 (owner: 10Dzahn)
[15:56:32] 	 right, I guess ssh won't know exactly how to look for the name labstore.
[15:56:44] 	 !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/217539/ (duration: 00m 13s)
[15:56:50] 	 Logged the message, Master
[15:56:52] 	 paravoid: ? finally what?
[15:56:56] 	 (03CR) 10Mobrovac: Enable firejail containment for zotero (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/220434 (https://phabricator.wikimedia.org/T98852) (owner: 10Muehlenhoff)
[15:57:04] 	 YuviPanda: Yeah, AuthorizedKeysFile doesn't actually allow you to use a *.d directory without being explicit.
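A minimal Python sketch of why the extra path has to be spelled out. It models how sshd expands the `%%`, `%h` and `%u` tokens documented for `AuthorizedKeysFile` in sshd_config(5): each whitespace-separated entry becomes exactly one file, and, as Coren says above, nothing scans a `.d` directory. The `/etc/ssh/userkeys/%u` layout and the `labstore` file name are assumptions for illustration, not copied from the puppet repo.

```python
from __future__ import annotations

import os.path


def expand_authorized_keys_files(setting: str, user: str, home: str) -> list[str]:
    """Model how sshd turns an AuthorizedKeysFile value into file paths.

    Only the %%, %h and %u tokens are handled. Each whitespace-separated
    entry becomes exactly one path; nothing is globbed or scanned, so a
    "%u.d" directory is only consulted if a file inside it is named.
    """
    paths = []
    for entry in setting.split():
        out = []
        i = 0
        while i < len(entry):
            if entry[i] == "%" and i + 1 < len(entry):
                token = entry[i + 1]
                if token == "%":
                    out.append("%")
                elif token == "h":
                    out.append(home)
                elif token == "u":
                    out.append(user)
                else:
                    raise ValueError(f"token %{token} not modelled in this sketch")
                i += 2
            else:
                out.append(entry[i])
                i += 1
        path = "".join(out)
        # Relative results are taken relative to the user's home directory.
        if not os.path.isabs(path):
            path = os.path.join(home, path)
        paths.append(path)
    return paths


# Hypothetical settings: the default per-user file, and the same plus a
# named supplemental "labstore" key file.
default = "/etc/ssh/userkeys/%u"
with_labstore = "/etc/ssh/userkeys/%u /etc/ssh/userkeys/%u.d/labstore"

print(expand_authorized_keys_files(default, "root", "/root"))
print(expand_authorized_keys_files(with_labstore, "root", "/root"))
```

With only the default entry, a key dropped into `root.d/` is never read; it only counts once its file is listed explicitly, which is what the "ganeti hack" does.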
[15:57:05] 	 oh I thought you pointed me at the Monolog change
[15:57:11] 	 !log krenair Synchronized wmf-config/CommonSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/217539/ - noop for prod, labs only part (duration: 00m 12s)
[15:57:12] 	 paravoid: haha, no :P
[15:57:14] 	 it was my bug report :P
[15:57:17] 	 Logged the message, Master
[15:57:18] 	 paravoid: that was just a clash.
[15:57:20] 	 PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 9.09% of data above the critical threshold [500.0]
[15:57:25] 	 James_F, anything else?
[15:57:43] 	 paravoid: this is supplemental keys in ssh::userkeys requiring a fairly inelegant hack to ssh::server::authorized_key_file to work.
[15:58:10] 	 (03CR) 10Alex Monk: [C: 032] Add dvidshub to whitelist upload URLs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/219599 (https://phabricator.wikimedia.org/T103062) (owner: 10Matanya)
[15:58:28] 	 what is?
[15:58:33] 	 (03Merged) 10jenkins-bot: Add dvidshub to whitelist upload URLs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/219599 (https://phabricator.wikimedia.org/T103062) (owner: 10Matanya)
[15:58:39] 	 RECOVERY - puppet last run on cp2022 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures
[15:58:43] 	 paravoid: see https://gerrit.wikimedia.org/r/#/c/220782/
[15:58:52] 	 looking
[15:58:59] 	 yeah that's the ganeti hack
[15:59:05] 	 !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/219599/ (duration: 00m 12s)
[15:59:09] 	 YuviPanda: I can't think of a way around yet though.
[15:59:10] 	 I don't like it very much, haven't come up with a better one
[15:59:11] 	 Logged the message, Master
[15:59:13] 	 ugh
[15:59:15] 	 alright then
[15:59:32] 	 I think I missed that /labstore bit and was confused
[15:59:34] 	 paravoid: Short of resource collection, anyways.
[15:59:39] 	 and thought 'of course there has to be an easy way!!!'
[15:59:53] 	 Krenair: Not right now, I think.
[15:59:57] 	 Krenair: Oh, https://gerrit.wikimedia.org/r/#/c/218353/
[15:59:59] 	 Krenair: Did you want to +2 that too?
[16:00:01] 	 paravoid: sorry I'm so slow to fix upstream regressions ;)
[16:00:09] 	 James_F, not really
[16:00:10] 	 Coren: anyway, +1 but would love if you added a comment along the lines of 'same as ganeti' or something like that? things that look strange should always have comments I guess.
[16:00:12] 	 James_F, it's too late anyway
[16:00:17] 	 Krenair: Meh. OK.
[16:00:41] 	 YuviPanda: On the commit msg?  Sure thing.
[16:00:52] 	 Coren: ideally in the hiera YAML file itself, actually
[16:01:01] 	 not in the commit message, since that's one hop away from the strangeness
[16:01:12] 	 (03PS2) 10Dzahn: add вікімедіа.укр (xn--80adgdym4pbd.xn--j1amh) [dns] - 10https://gerrit.wikimedia.org/r/215212 (https://phabricator.wikimedia.org/T95433) 
[16:01:34] 	 6operations, 10Deployment-Systems, 10RESTBase, 6Release-Engineering, 6Services: Get ops feedback regarding the use of SSH for deployment system control channel. - https://phabricator.wikimedia.org/T102687#1400870 (10mmodell)  >>! In T102687#1395521, @Joe wrote: > I strongly oppose to using mcollective, F...
[16:02:20] 	 (03PS3) 10Alexandros Kosiaris: Use hiera to disable ntp fleet wise, with exceptions [puppet] - 10https://gerrit.wikimedia.org/r/220772 
[16:02:30] 	 (03PS2) 10coren: Labs: allow role::nfs::fileserver to use labstore key [puppet] - 10https://gerrit.wikimedia.org/r/220782 
[16:02:32] 	 YuviPanda: How's this?  ^^
[16:02:56] 	 (03CR) 10Yuvipanda: [C: 031] Labs: allow role::nfs::fileserver to use labstore key [puppet] - 10https://gerrit.wikimedia.org/r/220782 (owner: 10coren)
[16:03:10] 	 (03CR) 10coren: [C: 032] Labs: allow role::nfs::fileserver to use labstore key [puppet] - 10https://gerrit.wikimedia.org/r/220782 (owner: 10coren)
[16:03:35] 	 (03PS3) 10coren: Labs: allow role::nfs::fileserver to use labstore key [puppet] - 10https://gerrit.wikimedia.org/r/220782 
[16:03:41] 	 Bleh rebase.
[16:04:29] * Coren wishes gerrit could say "hey, I was asked to submit but a rebase is now needed; rebase and submit if it can be done without conflict"
[16:08:11] 	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "the premise looks good, inline comments" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/220434 (https://phabricator.wikimedia.org/T98852) (owner: 10Muehlenhoff)
[16:08:18] 	 RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[16:09:54] 	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "change looks good, some erraneous changes though" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/220657 (owner: 10Dzahn)
[16:10:23] 	 Coren: I think gerrit can actually be configured to do that (automatically rebase) but we don't turn it on
[16:11:46] 	 (03PS3) 10Andrew Bogott: puppetmaster: Enable autosigning puppet certs for labs [puppet] - 10https://gerrit.wikimedia.org/r/218380 (https://phabricator.wikimedia.org/T102504) (owner: 10Yuvipanda)
[16:11:48] 	 (03PS3) 10Andrew Bogott: Switch on salt auto_accept for labs. [puppet] - 10https://gerrit.wikimedia.org/r/220306 (https://phabricator.wikimedia.org/T102504) 
[16:12:15] 	 (03CR) 10Glaisher: "Perhaps the migrateUserGroup.php could be run when this is deployed? Then we won't need a local crat to be around when this is done." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/218926 (https://phabricator.wikimedia.org/T102770) (owner: 10Glaisher)
[16:12:25] 	 (03PS4) 10Dzahn: backup home dirs on bastion hosts per role [puppet] - 10https://gerrit.wikimedia.org/r/220657 
[16:12:54] 	 akosiaris: arrr. thanks.. no i did not want to edit those files :p
[16:15:06] 	 mutante: :-)
[16:15:15] 	 (03PS5) 10Dzahn: backup home dirs on bastion hosts per role [puppet] - 10https://gerrit.wikimedia.org/r/220657 
[16:15:25] 	 but now 
[16:15:54] 	 akosiaris: ..unless we don't want to encourage people using a bastion for work in the first place
[16:16:49] 	 mutante: niah, let's do it, it might be useful some day
[16:17:19] 	 (03CR) 10Alexandros Kosiaris: [C: 031] backup home dirs on bastion hosts per role [puppet] - 10https://gerrit.wikimedia.org/r/220657 (owner: 10Dzahn)
[16:17:36] 	 (03PS6) 10Dzahn: backup home dirs on bastion hosts per role [puppet] - 10https://gerrit.wikimedia.org/r/220657 
[16:18:31] 	 (03CR) 10Dzahn: [C: 032] backup home dirs on bastion hosts per role [puppet] - 10https://gerrit.wikimedia.org/r/220657 (owner: 10Dzahn)
[16:19:26] 	 (03PS2) 10Joal: Add new projectview to projectcounts aggregation [puppet] - 10https://gerrit.wikimedia.org/r/220752 (https://phabricator.wikimedia.org/T101118) 
[16:19:39] 	 10Ops-Access-Requests, 6operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1401058 (10RobH) I asked that question, but I think all maintenance scripts (that I am aware of) used on terbium are for full cluster deployment access.    Ori has a lot of rights that ar...
[16:20:14] 	 (03CR) 10Joal: Add new projectview to projectcounts aggregation (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/220752 (https://phabricator.wikimedia.org/T101118) (owner: 10Joal)
[16:20:40] 	 (03PS2) 10Faidon Liambotis: (WIP) Allocate neighbor blocks for cr1-eqord/cr1-eqdfw [dns] - 10https://gerrit.wikimedia.org/r/220777 
[16:20:42] 	 (03PS1) 10Faidon Liambotis: Repurpose s/cr2-eqiad/cr1-eqord/ to link with codfw [dns] - 10https://gerrit.wikimedia.org/r/220811 
[16:21:38] 	 ori: https://phabricator.wikimedia.org/T103782 is someone requesting access to terbium to run a script you wrote.  If you have a moment to advise =]
[16:21:53] 	 basically wondering if we can give less than full deployment for this to work or if we just need to give full deploy
[16:22:02] 	 10Ops-Access-Requests, 6operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1401069 (10Krenair) >>! In T103782#1401058, @RobH wrote: > all maintenance scripts (that I am aware of) used on terbium are for full cluster deployment access.  Sorry, I have no idea what...
[16:22:31] 	 10Ops-Access-Requests, 6operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1401078 (10Dzahn) fwiw: when seeing "send mass emails to editors from an internal email address", silverpop and fundraising comes to mind. They use https://en.wikipedia.org/wiki/Silverpop...
[16:24:16] 	 10Ops-Access-Requests, 6operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1401092 (10RobH) It was poorly worded, re-attempting:  My understanding was the majority of the scripts for maintenance on terbium are for use by those with full deployment rights.    Of...
[16:25:03] 	 6operations, 10Traffic: check if services behind misc-web enforce http->https redirect or not - https://phabricator.wikimedia.org/T103773#1401093 (10Dzahn) All service names configured in ''./templates/varnish/misc.inc.vcl.erb''. Are they enforcing https?  {| class="wikitable sortable" |- ! service name !! red...
[16:25:08] 	 6operations, 10Traffic: check if services behind misc-web enforce http->https redirect or not - https://phabricator.wikimedia.org/T103773#1401094 (10Dzahn) 5Open>3Resolved
[16:27:51] 	 mutante: Could you have a look at the svn redirects patch again?
[16:27:52] 	 6operations, 10Traffic: check if services behind misc-web enforce http->https redirect or not - https://phabricator.wikimedia.org/T103773#1401105 (10Dzahn) stat1001 - removed dev - pending to be removed download/dumps - afair there were concerns to enforce https here (what where they exactly?) releases - uses...
[16:28:48] 	 hi kart_. :-)
[16:29:22] <_joe_>	 mutante: have you seen my ganglia change?
[16:29:41] 	 10Ops-Access-Requests, 6operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1401124 (10Dzahn) Most maintenance scripts on terbium are executed by cron jobs.  Let's find out which specific script (name/location) it is that Ori wrote for this.
[16:29:46] <_joe_>	 mutante: I basically reversed the logic in eqiad to just /exclude/ what hasn't been migrated
[16:29:54] 	 so, I was wondering if you have some time for the email tests. I think this will be a longer process so we should use some of your time, but we shouldn't rely fully on you for the test, kart_.
[16:30:15] 	 ostriches: i will, ..today
[16:30:19] 	 Thx!
[16:30:45] 	 _joe_: i have not. that sounds cool. so it should just be analytics and analytics-kafka (and dunno about FR)
[16:31:21] 	 _joe_: i'll focus on getting the analytics ones solved soon
[16:31:23] 	 (03PS1) 10Giuseppe Lavagetto: varnish: add fixed_weight to the directors options [puppet] - 10https://gerrit.wikimedia.org/r/220815 
[16:32:12] <_joe_>	 mutante: and logstash?
[16:32:32] <_joe_>	 mutante: I'd focus on getting an aggregator un in ulsfo :)
[16:32:40] <_joe_>	 s/un/up/
[16:33:14] 	 _joe_: right, logstash. same reason as analytics. ok
[16:33:56] <_joe_>	 mutante: then we're left with netmon1001
[16:34:10] <_joe_>	 actually, I'll look into netmon1001
[16:34:53] 	 ok, thanks
[16:47:52] 	 leila: hello
[16:47:52] 	 leila: sorry, was in other window.
[16:47:52] 	 np kart_ :-)
[16:47:52] 	 leila: do we have a patch for it? Or how do you generally do that?
[16:47:52] 	 (03PS1) 10Dzahn: bugzilla_static: include role::backup::host [puppet] - 10https://gerrit.wikimedia.org/r/220818 (https://phabricator.wikimedia.org/T95184) 
[16:47:52] 	 I'm actually not sure. :-\ let me ask Ellery. it will take 10 min since he's fixing something else with jdlrobson. 
[16:47:52] 	 kart_, ^
[16:47:52] 	 moritzm: you around?
[16:47:52] 	 leila: okay!
[16:47:53] 	 ottomatta: I want to access the HTTP request logs for various things stemming from the Wikipedia Portal.  How do I get access?
[16:47:53] 	 cc: ottomata
[16:47:53] 	 ottomata: context -- https://phabricator.wikimedia.org/T100673
[16:47:55] 	 gwicke: yep
[16:47:55] 	 I was just starting to write a reply to your mail asking about the bit about NTP moving the system clock *forward*
[16:47:56] 	 my understanding is that it'll slow down the system clock until it matches the new time with the leap second
[16:47:56] 	 thus smearing out the leap second, without introducing a jump
[16:47:56] 	 *jump
[16:47:56] 	 that's implemented in the Linux kernel now, but only very recently (it was merged into Linus's git tree 11 days ago), so no real chance to deploy it by Tuesday
[16:47:56] 	 but should be available for the next leap second
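The smearing idea described above (slow the clock down until it has absorbed the extra second, instead of jumping) can be sketched as a linear smear. This is an illustrative model only: the linear shape and the 1000 s window are assumptions chosen to keep the example small, not what the kernel patch or any NTP server actually does.

```python
def smeared_clock(elapsed: float, smear_start: float, window: float) -> float:
    """Clock reading after `elapsed` real (SI) seconds, with one leap
    second absorbed linearly over [smear_start, smear_start + window].

    Inside the window the clock advances at (1 - 1/window) of real speed,
    so it runs slightly slow instead of repeating a second.
    """
    if window <= 1:
        raise ValueError("window must be longer than the second being absorbed")
    if elapsed <= smear_start:
        absorbed = 0.0
    elif elapsed >= smear_start + window:
        absorbed = 1.0
    else:
        absorbed = (elapsed - smear_start) / window
    return elapsed - absorbed


# Example: a (hypothetical) 1000 s smear window starting at t = 0.
readings = [smeared_clock(t, 0.0, 1000.0) for t in range(-10, 1011)]
assert all(b > a for a, b in zip(readings, readings[1:]))  # never steps back
print(smeared_clock(500.0, 0.0, 1000.0))   # halfway: 0.5 s absorbed
print(smeared_clock(2000.0, 0.0, 1000.0))  # after the window: 1 s behind
```

The readings stay strictly increasing, which is the property that matters for the monotonicity concerns raised below.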
[16:47:56] 	 ntp supports gradual adjustments too
[16:47:56] 	 it's the default mode for small-ish adjustments
[16:47:56] 	 I thought that was the point behind disabling ntp
[16:47:57] 	 so that we can start ntp with the right flags to ensure a gradual adjustment
[16:48:49] 	 from the man page:
[16:48:53] 	 -x     Normally, the time is slewed if the offset is less than the step threshold, which
[16:48:54] 	        is 128 ms by default, and stepped if above the threshold. This option sets the
[16:48:56] 	        threshold to 600 s, which is well within the accuracy window to set the clock
[16:48:57] 	        manually.
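A rough Python model of the rule quoted from the ntpd man page: offsets below the step threshold (128 ms by default, 600 s with `-x`) are slewed, larger ones are stepped. The 0.5 ms/s slew rate used to estimate how long a slew takes is the typical kernel limit mentioned in the ntpd documentation for `-x` (about 2000 s of amortization per second of offset); treat it as an assumption, since real ntpd behaviour also depends on its clock-discipline state.

```python
from __future__ import annotations

STEP_THRESHOLD_DEFAULT = 0.128  # seconds (ntpd default, per the man page)
STEP_THRESHOLD_X = 600.0        # seconds (ntpd -x, per the man page)
MAX_SLEW_RATE = 0.0005          # 0.5 ms per second, a typical kernel limit


def correction(offset: float, dash_x: bool = False) -> tuple[str, float]:
    """Classify how an offset would be corrected under the quoted rule.

    Returns ("slew", seconds_to_amortize) or ("step", 0.0).
    """
    threshold = STEP_THRESHOLD_X if dash_x else STEP_THRESHOLD_DEFAULT
    if abs(offset) < threshold:
        return "slew", abs(offset) / MAX_SLEW_RATE
    return "step", 0.0


# A one-second leap offset is stepped by default but slewed with -x.
print(correction(1.0))               # ('step', 0.0)
print(correction(1.0, dash_x=True))  # slewed over roughly 2000 s
```

So restarting ntpd with `-x` after the leap second trades a 1 s jump for roughly half an hour of slightly slow clocks.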
[16:49:43] 	 we're mostly concerned about the specific leap second moment (as in applications reacting to a 23:59:60 timestamp) and related bugs; that's what the current approach bypasses
[16:51:03] 	 there's a large class of issues caused by non-monotonicity though
[16:51:11] 	 but you have a point, we'll check how to re-enable NTP on the 1st as smoothly as possible
[16:52:01] 	 cool, thanks!
[16:52:05] 	 10Ops-Access-Requests, 6operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1401233 (10RobH) a:3ori I've assigned this to Ori for him to comment on his script and the access it requires.  @Ori: If this isn't cool, sorry!  Just put it back to up for grabs and op...
[16:52:10] 	 10Ops-Access-Requests, 6operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1401237 (10RobH) @ellery: if you know the script details, please share.  (I just assumed you didn't have them, since you are requesting access to said script.)
[16:52:23] 	 So, I’m way behind on this… does the leap second mean that the unix count is skipping a second?  Or just that the formula by which the count is converted to human-readable dates changes?
[16:52:35] 	 moritzm: for cassandra monotonicity is what matters, as we are using time to capture causality
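To make the causality concern concrete, here is a toy last-write-wins merge in Python, the general scheme Cassandra uses to resolve conflicting writes by timestamp, heavily simplified. The timestamps are hypothetical, and which exact second the kernel repeats is not modelled; the sketch only assumes the clock is stepped back by one second between two writes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: str
    timestamp_us: int  # write time in microseconds, taken from the wall clock


def lww_merge(current: Optional[Cell], incoming: Cell) -> Cell:
    """Toy last-write-wins: keep the write with the larger timestamp.

    Ties keep the existing cell here; real systems break ties differently.
    """
    if current is None or incoming.timestamp_us > current.timestamp_us:
        return incoming
    return current


# Write A is stamped POSIX 1435708799.5. Suppose the clock is then stepped
# back one second for the leap second, and write B is issued 0.7 s later in
# real time: it gets stamped 1435708799.2, earlier than A.
a = Cell("A", 1435708799_500000)
b = Cell("B", 1435708799_200000)

winner = lww_merge(lww_merge(None, a), b)
print(winner.value)  # prints A: the causally later write B is silently lost
```

A smeared or slewed clock avoids this because its readings never go backwards.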
[16:52:52] 	 andrewbogott: it's repeating a second
[16:53:13] 	 kart_: is it fine that we add you to our Hangout?
[16:53:37] 	 gwicke: seriously?  So seconds-since-epoch isn’t really seconds-since-epoch?  Why would they do it that way rather than just changing the human readable bits?
[16:54:25] 	 andrewbogott: worse is better ;)
[16:55:00] 	 6operations, 10ops-codfw, 10Incident-20150617-LabsNFSOutage: Labstore2001 controler or shelf failure - https://phabricator.wikimedia.org/T102626#1401292 (10greg)
[16:55:09] 	 10Ops-Access-Requests, 6operations: stat1002 access requested for sniedzielski - https://phabricator.wikimedia.org/T103871#1401295 (10Niedzielski) 3NEW
[16:55:38] 	 andrewbogott: posix epoch predates leap seconds :)
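A small Python illustration of "it's repeating a second": POSIX time assumes every day has exactly 86400 seconds, so the leap second at the end of 2015-06-30 has no value of its own. `calendar.timegm` applies that same arithmetic, which makes it a convenient way to show the collision; `posix_seconds` is just a helper for this example.

```python
import calendar


def posix_seconds(year, month, day, hour, minute, second):
    """UTC calendar time -> POSIX seconds, using calendar.timegm.

    timegm applies the POSIX rule that every day has exactly 86400 s.
    """
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))


before = posix_seconds(2015, 6, 30, 23, 59, 59)
leap = posix_seconds(2015, 6, 30, 23, 59, 60)   # the leap second itself
after = posix_seconds(2015, 7, 1, 0, 0, 0)

print(before, leap, after)  # 1435708799 1435708800 1435708800
# The UTC day of 2015-06-30 lasted 86401 SI seconds, but POSIX counts 86400:
print(after - posix_seconds(2015, 6, 30, 0, 0, 0))  # 86400
```

23:59:60 and the following 00:00:00 map to the same count, so the clock has to either repeat a value or be adjusted around it.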
[16:56:23] 	 paravoid: good point, but barely
[16:56:25] 	 paravoid: right, but I thought the whole point of epoch was that it existed outside of mundane concerns like planetary rotation
[16:56:30] 	 http://www.ucolick.org/~sla/leapsecs/onlinebib.html -- "Unix system time and the POSIX standard"
[16:56:42] 	 warning: your head will explode
[16:56:56] 	 but: https://en.wikipedia.org/wiki/Leap_second#Proposal_to_abolish_leap_seconds
[16:56:58] 	 it's incredibly complicated and silly stuff
[16:57:27] 	 leila: fine!
[16:57:38] 	 PROBLEM - puppet last run on db1030 is CRITICAL Puppet has 1 failures
[16:58:09] 	 yeah I linked a document the other day of some list of timekeepers discussing this
[16:58:13] 	 and the possibility of a leap day
[16:58:24] 	 with silly statements such as referring to year 8600 or something
[16:58:45] 	 (03PS7) 10Alexandros Kosiaris: lvs::configuration: use hiera for lvs_service_ips [puppet] - 10https://gerrit.wikimedia.org/r/217289 
[16:59:08] 	 done, kart_. :-)
[16:59:36] 	 So… I’m not sure that page answers my question.  I understand why leap seconds are needed… but it seems like they should just be part of the math used to calculate the clock display.  Obviously we don’t stop the epoch clock for a day every fourth February
[17:00:09] 	 how?
[17:00:15] 	 that would mean redefining epoch
[17:00:50] 	 …
[17:01:00] 	 too late for that! :)
[17:01:14] 	 ah, you mean, because the epoch would be 11:59:58 on January 31 1969?
[17:01:49] 	 huh?
[17:02:01] 	 So, here’s what I think the epoch count is:
[17:02:14] 	 A bell rang at midnight at the start of 1970, and the clock started counting at 0
[17:02:23] 	 So now, it contains how many seconds have passed since that bell rang.
[17:02:24] 	 Incorrect?
[17:02:24] 	 leap seconds get injected all the time, but it's not known in advance when they are going to be
[17:02:49] 	 well in advance, I mean
[17:02:57] 	 it's decided by humans like 6 months in advance
[17:03:32] 	 and yes, the above statement is incorrect
[17:03:38] 	 ok, how is it incorrect?
[17:03:39] 	 because of leap seconds :)
[17:03:42] 	 yes!
[17:03:45] 	 But they’re doing it wrong
[17:03:47] 	 :)
[17:04:07] 	 if they had incorporated that, the epoch -> display time conversion wouldn't be algorithmically easy
[17:04:22] 	 you'd need a list of leap seconds as well to be able to convert
[17:04:24] 	 yep
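The trade-off described here (POSIX time pretends leap seconds do not exist, so turning it into a calendar date is plain arithmetic, while a leap-second-aware count needs a table) can be sketched in Python. This is an illustration only, not how any libc does it. The table is a deliberately short excerpt of the published TAI-UTC offsets, ending with the 36 s value that applies after the June 30, 2015 leap second quoted in this log; earlier and later entries are left out.

```python
import calendar
from datetime import datetime, timezone

def posix_to_utc(t: int) -> datetime:
    """POSIX time -> civil UTC. Every POSIX day is exactly 86400 s,
    so this is pure arithmetic with no table lookup."""
    return datetime.fromtimestamp(t, tz=timezone.utc)

# Excerpt of TAI - UTC offsets in seconds, keyed by the POSIX timestamp at
# which each offset takes effect. Only 2006-01-01 .. 2015-07-01 is listed.
TAI_MINUS_UTC = [
    (calendar.timegm((2006, 1, 1, 0, 0, 0)), 33),
    (calendar.timegm((2009, 1, 1, 0, 0, 0)), 34),
    (calendar.timegm((2012, 7, 1, 0, 0, 0)), 35),
    (calendar.timegm((2015, 7, 1, 0, 0, 0)), 36),
]

def tai_minus_utc(t: int) -> int:
    """Look up TAI - UTC for POSIX time t; unlike posix_to_utc, this needs the table."""
    if t < TAI_MINUS_UTC[0][0]:
        raise ValueError("this excerpt only covers 2006-01-01 onwards")
    offset = TAI_MINUS_UTC[0][1]
    for start, value in TAI_MINUS_UTC:
        if t < start:
            break
        offset = value
    return offset

if __name__ == "__main__":
    before = calendar.timegm((2015, 6, 30, 23, 59, 59))
    print(posix_to_utc(before), tai_minus_utc(before))  # offset 35
    # POSIX has no 23:59:60: the leap second lands on the next day's 00:00:00.
    leap = calendar.timegm((2015, 6, 30, 23, 59, 60))
    print(leap == calendar.timegm((2015, 7, 1, 0, 0, 0)))  # True
```

Keeping the table out of the epoch arithmetic is exactly what lets `posix_to_utc` stay table-free; the price is that two different instants (23:59:60 and 00:00:00) share one POSIX timestamp.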
[17:04:48] 	 ok, so the answer to my original question is “Someone decided it would be easier to stop the clock than complicate the time functions”
[17:04:53] 	 no
[17:04:54] 	 19:55 < paravoid> andrewbogott: posix epoch predates leap seconds :)
[17:04:59] 	 RECOVERY - puppet last run on db1030 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures
[17:05:12] 	 (CR) Alexandros Kosiaris: "running a fleet wide catalog compilation on this one" [puppet] - https://gerrit.wikimedia.org/r/217289 (owner: Alexandros Kosiaris)
[17:05:14] 	 ^proxy again
[17:05:23] 	 sure, but it’s not like we can’t update the epoch -> display code each time a leap second is declared.
[17:05:31] 	 hm
[17:05:33] 	 how?
[17:05:45] 	 I guess the leap second isn’t done via a software update but rather via a few distant ntp servers, huh?
[17:05:46] 	 we're running kernels > 6 months old for sure
[17:06:05] 	 there is a flag in the kernel that gets set for "upcoming leap second"
[17:06:13] 	 and a flag on the ntp protocol to propagate that
[17:06:33] 	 it's a single bit that gets flipped, propagates through the time network and into e.g. the Linux kernel
[17:07:30] 	 (and after the date passes, it gets un-flipped)
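On the NTP side, the "upcoming leap second" warning travels in the Leap Indicator (LI) field of the packet header. RFC 5905 defines LI as the top two bits of the first header byte (so strictly a two-bit field), followed by a three-bit version number and a three-bit mode. A minimal decoding sketch in Python follows; the kernel side (on Linux, adjtimex() status flags such as STA_INS) is not shown.

```python
from typing import Tuple

# Leap Indicator values as defined in RFC 5905.
LEAP_INDICATOR = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "unknown (clock unsynchronized)",
}

def decode_first_byte(packet: bytes) -> Tuple[int, int, int]:
    """Split the first NTP header byte into (leap indicator, version, mode)."""
    if not packet:
        raise ValueError("empty packet")
    b = packet[0]
    li = (b >> 6) & 0b11    # top two bits
    vn = (b >> 3) & 0b111   # next three bits
    mode = b & 0b111        # low three bits
    return li, vn, mode

if __name__ == "__main__":
    # A server reply (mode 4, NTPv4) announcing a leap second insertion.
    li, vn, mode = decode_first_byte(bytes([0b01_100_100]))
    print(li, vn, mode, LEAP_INDICATOR[li])
```

After the leap second has passed, servers go back to sending LI = 0, which matches the "un-flipped" behavior described above.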
[17:07:32] 	 andrewbogott: what you want is called TAI, and is used in GPS among others
[17:07:47] 	 http://www.timeanddate.com/time/international-atomic-time.html
[17:07:48] 	 https://en.wikipedia.org/wiki/International_Atomic_Time
[17:07:53] 	 ^good explanation
[17:07:54] 	 gwicke: yep, I presumed that the unix epoch was the same, but, clearly not.
[17:08:19] 	 humans need imperfect measurement units
[17:08:31] 	 http://www.cl.cam.ac.uk/~mgk25/time/leap/ is a good read too
[17:09:12] 	 http://www.mail-archive.com/leapsecs@rom.usno.navy.mil/msg00206.html is the thread I was referring to above (started by the same guy)
[17:09:13] 	 "After the next leap second has been added on June 30, 2015, the difference between UTC and the International Atomic Time (UTC-TAI) will be 36 sec."
[17:09:28] 	 "We do this for the first time when we skip the Gregorian leap day 5600-02-29"
[17:09:31] 	 operations: Install molly-guard on production hosts - https://phabricator.wikimedia.org/T103873#1401362 (yuvipanda) NEW
[17:09:38] 	      8400-02-28: British time = TI - 11 h
[17:09:38] 	      8400-03-01: British time = TI + 13 h   (8400-02-29 skipped in civilian time)
[17:09:41] 	      8423:       British time = TI + 12 h
[17:10:38] 	 So when TI may have the value 7001-01-01T12:00:00 TI, JD(TI) 4278123, Thursday
[17:10:41] 	 it is completely understandable that UT might have the value 7000-12-31T12:00:00 UT, JD(UT) 4278122, Wednesday
[17:10:42] 	 operations, MediaWiki-File-management, MediaWiki-Tarball-Backports, Multimedia, and 6 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1401372 (Krinkle) >>! In T102566#1400166, @hashar wrote: >>>! In T102566#1399147, @Legoktm wrote: >>  >> Today @Krinkle...
[17:10:45] 	 etc.
[17:10:52] 	 jeebus
[17:11:01] 	 don't know about you guys, but I find planning for the year 8432 a bit hilarious :)
[17:11:35] 	 I like my nights by night, but hey, I will not be around in the year 3000
[17:11:45] 	 greg-g: Hi... do you mind if I do a smallish Wikidata deploy sometime before train? Would include only https://gerrit.wikimedia.org/r/220551
[17:11:49] 	 don’t knock it, we’ll all be able to double our hourly rate in 8431
[17:12:13] 	 hoo: sure, I think you have time
[17:13:13] 	 one extra second, in fact
[17:13:22] 	 greg-g: Thank you :)
[17:15:16] 	 Meh, seems I also need to backport stuff to fix jenkins, but should only touch tests
[17:15:19] 	   18922 Unknown modifier 'b': [/^Vector\-action\-move/bs/i] in /srv/mediawiki/php-1.26wmf10/includes/specials/SpecialAllMessages.php on line 311
[17:15:23] 	 who broke it?
[17:15:47] 	 operations: Install molly-guard on production hosts - https://phabricator.wikimedia.org/T103873#1401392 (Legoktm)
[17:17:07] 	 uh Krenair 
[17:17:16] 	 someone was complaining about that in #mediawiki
[17:17:58] 	 that code is from like, five years ago… :D
[17:18:11] 	 we should have a lint check for preg_quote() without second parameter
[17:18:20] 	 !log ori Synchronized php-1.26wmf11/resources/src/mediawiki.skinning/elements.css: Ieab6b1473e6ce: Fix a mistake (duration: 00m 12s)
[17:18:22] 	 which is literally never correct
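The fatal quoted earlier (`Unknown modifier 'b'` on `/^Vector\-action\-move/bs/i`) is presumably what happens when a string containing `/` goes through `preg_quote()` without the delimiter argument and is then wrapped in `/.../`: the unescaped slash ends the pattern early and the rest is read as modifiers. A lint check of the kind suggested could start out like the Python sketch below. It is a naive, hypothetical scanner (a regex to find calls plus parenthesis counting that skips quoted strings); it does not really parse PHP and would also flag matches inside comments.

```python
import re
from typing import List

# Hypothetical lint rule: find preg_quote( calls.
CALL = re.compile(r"\bpreg_quote\s*\(")

def _has_top_level_comma(src: str, i: int) -> bool:
    """Scan from just after '(' to the matching ')'; True if a second argument exists."""
    depth = 1
    quote = None
    while i < len(src):
        ch = src[i]
        if quote is not None:
            if ch == "\\":
                i += 2  # skip escaped character inside a string literal
                continue
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
            if depth == 0:
                return False  # call closed without a top-level comma
        elif ch == "," and depth == 1:
            return True
        i += 1
    return False  # unterminated call: report it as well

def missing_delimiter(src: str) -> List[int]:
    """Return 1-based line numbers of preg_quote() calls with no delimiter argument."""
    hits = []
    for m in CALL.finditer(src):
        if not _has_top_level_comma(src, m.end()):
            hits.append(src.count("\n", 0, m.start()) + 1)
    return hits

if __name__ == "__main__":
    php = "<?php\n$re = '/^' . preg_quote( $msg ) . '/i';\n"
    print(missing_delimiter(php))
```

Commas inside nested calls or string literals are ignored, so `preg_quote( substr( $a, 1 ) )` is still reported.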
[17:18:24] 	 Blocked-on-Operations, Puppet, operations, Beta-Cluster: Setup a mediawiki033 on Beta Cluster that we can direct the security scanning work to - https://phabricator.wikimedia.org/T72181#1401396 (greg)
[17:18:25] 	 Logged the message, Master
[17:18:51] 	 MatmaRex, we should just friggin unit test everything
[17:19:21] 	 (PS10) Greg Grossmeier: beta: varnish backend/director for isolated security audits [puppet] - https://gerrit.wikimedia.org/r/158016 (https://phabricator.wikimedia.org/T72181) (owner: Dduvall)
[17:19:26] 	 (CR) jenkins-bot: [V: -1] beta: varnish backend/director for isolated security audits [puppet] - https://gerrit.wikimedia.org/r/158016 (https://phabricator.wikimedia.org/T72181) (owner: Dduvall)
[17:20:00] 	 mmm
[17:20:05] 	 or we could add lint checks
[17:25:19] 	 PROBLEM - Tool Labs instance distribution on labcontrol1002 is CRITICAL: NRPE: Command check_check-tools-spread not defined
[17:25:56] 	 mornin' ori. kart_ is helping us with the emails and it seems he doesn't have permission to import from 
[17:26:08] 	 '/srv/mediawiki/multiversion/MWVersion.php'
[17:26:22] 	 from the path above.
[17:26:28] 	 PROBLEM - puppet last run on labcontrol1002 is CRITICAL Puppet has 1 failures
[17:26:31] 	 can you help with that ori?
[17:26:39] 	 PROBLEM - puppetmaster https on labcontrol1002 is CRITICAL: Connection refused
[17:26:50] 	 why are you trying to interact with multiversion directly?
[17:27:10] 	 this is the error kart_ gets ori: Warning: fopen(/tmp/mw-cache-1.26wmf10/conf-enwiki): failed to open stream: Permission denied in /srv/mediawiki/wmf-config/CommonSettings.php on line 167
[17:28:20] 	 kart_: you can ignore that
[17:28:28] 	 it should work regardless
[17:28:42] 	 ori: i owe you a beer
[17:28:52] 	 (why?)
[17:29:42] 	 for closign https://phabricator.wikimedia.org/T2260
[17:29:48] 	 *closing
[17:30:08] 	 RECOVERY - puppet last run on labcontrol1002 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures
[17:30:17] 	 operations, ops-eqiad, Labs, Patch-For-Review, ToolLabs-Goals-Q4: Rename virt1000 to labcontrol1002, move to same subnet as labcontrol1001 - https://phabricator.wikimedia.org/T102646#1401470 (Andrew) Open>Resolved Up and puppetized and happy.
[17:31:00] 	 ori: script exited with that error. Does it need any arguments/options?
[17:31:21] 	 kart_: where do you have the code checked out?
[17:31:27] 	 on your home directory on tin?
[17:31:42] 	 yes
[17:31:53] 	 do you mind if i take a look?
[17:31:56] 	 /home/kartik/recs on tin
[17:32:04] 	 ori: go ahead in my home :)
[17:32:06] 	 (PS1) Andrew Bogott: Move the glance image dir from /a to /srv. [puppet] - https://gerrit.wikimedia.org/r/220828
[17:33:13] 	 operations, ops-eqiad, Labs, Patch-For-Review, ToolLabs-Goals-Q4: Rename virt1000 to labcontrol1002, move to same subnet as labcontrol1001 - https://phabricator.wikimedia.org/T102646#1401488 (Cmjohnson) The disks were raided together. I delete the array and the install worked fine.   I did not ad...
[17:33:35] 	 (PS2) Andrew Bogott: Move the glance image dir from /a to /srv. [puppet] - https://gerrit.wikimedia.org/r/220828
[17:33:59] 	 kart_: you forgot to change this line: 	if ( !file_exists( __DIR__ . "/{$code}-staff-recs.json" ) ) {
[17:34:10] 	 i'll fix it
[17:35:14] 	 (CR) Andrew Bogott: [C: 2] Move the glance image dir from /a to /srv. [puppet] - https://gerrit.wikimedia.org/r/220828 (owner: Andrew Bogott)
[17:35:29] 	 kart_: try now
[17:35:59] 	 ori: cool
[17:37:09] 	 ori: looks good.
[17:37:13] 	 Thanks!
[17:37:52] 	 np
[17:38:02] 	 thanks ori. :-)
[17:38:04] 	 (PS16) Paladox: Rename $wmincClosedWikis to $wgWmincClosedWikis [mediawiki-config] - https://gerrit.wikimedia.org/r/207909
[17:38:27] 	 andrewbogott: can someone tell once and for all what is the story with /a ? is the a new FHS invented by WMF ?
[17:38:36] 	 *is that 
[17:38:49] 	 (PS17) Paladox: Rename all main WikimediaIncubator settings to have a wg prefix [mediawiki-config] - https://gerrit.wikimedia.org/r/207909
[17:39:16] 	 matanya: We’re a few generations away now, but I think it was a Domas thing.
[17:39:24] 	 leila: next step?
[17:39:42] 	 is that basically replaced by /srv?
[17:40:00] 	 I vaguely recall /h/w/c or /home/wikipedia/common from a lot of the pmtpa docs
[17:40:00] 	 "a domas thing" i will quote this in the future
[17:40:16] 	 i remember that too Krenair 
[17:40:21] 	 kart_: so the file has run and the emails should be out, right?
[17:40:58] 	 (CR) Alex Monk: "This can be done by ops now." [puppet] - https://gerrit.wikimedia.org/r/218766 (owner: Alex Monk)
[17:41:26] 	 (PS7) coren: Labs: More puppetization fixes for labstore* [puppet] - https://gerrit.wikimedia.org/r/218666 (https://phabricator.wikimedia.org/T102478)
[17:41:44] 	 FYI in ~3h I'm going to merge modules/cassandra into puppet.git (https://phabricator.wikimedia.org/T92560)
[17:42:14] 	 (PS3) Alex Monk: Pull out unnecessary wikitech settings, move some into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/218098 (https://phabricator.wikimedia.org/T75939)
[17:42:18] 	 (CR) Andrew Bogott: [C: 2] Pull out unnecessary wikitech settings, move some into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/218098 (https://phabricator.wikimedia.org/T75939) (owner: Alex Monk)
[17:42:20] 	 (CR) jenkins-bot: [V: -1] Labs: More puppetization fixes for labstore* [puppet] - https://gerrit.wikimedia.org/r/218666 (https://phabricator.wikimedia.org/T102478) (owner: coren)
[17:42:23] 	 (CR) Alex Monk: [C: 2] Pull out unnecessary wikitech settings, move some into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/218098 (https://phabricator.wikimedia.org/T75939) (owner: Alex Monk)
[17:42:28] 	 (Merged) jenkins-bot: Pull out unnecessary wikitech settings, move some into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/218098 (https://phabricator.wikimedia.org/T75939) (owner: Alex Monk)
[17:42:55] 	 ori, can we log into recommender-feedback@wikimedia.org
[17:42:57] 	 ?
[17:43:23] 	 leila: no, it's not an actual address; it just forwards to you and elery (and the other researcher)
[17:43:25] 	 (PS8) coren: Labs: More puppetization fixes for labstore* [puppet] - https://gerrit.wikimedia.org/r/218666 (https://phabricator.wikimedia.org/T102478)
[17:43:26] 	 !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/218098/ (duration: 00m 12s)
[17:43:32] 	 Logged the message, Master
[17:43:48] 	 !log krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/218098/ (duration: 00m 12s)
[17:43:53] 	 Logged the message, Master
[17:45:41] 	 everything is 503ing for me
[17:45:55] 	 I just reverted the above
[17:46:14] 	 !log krenair Synchronized wmf-config: (no message) (duration: 00m 31s)
[17:46:18] 	 PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0]
[17:46:20] 	 Logged the message, Master
[17:47:21] 	 Unconfirmed reports enwikis down
[17:47:29] 	 PROBLEM - Apache HTTP on mw1072 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 8.866 second response time
[17:47:38] 	 PROBLEM - HHVM rendering on mw1057 is CRITICAL - Socket timeout after 10 seconds
[17:47:41] 	 So... Was that my change?
[17:48:00] 	 seems so
[17:48:09] 	 got one 503, and now it is gone
[17:48:20] 	 how on earth did that cause issues...
[17:48:32] 	 wmf
[17:48:34] 	 wfm*
[17:48:47] 	 https://gdash.wikimedia.org/dashboards/reqerror/
[17:48:50] 	 ^ looks awful
[17:48:59] 	 RECOVERY - Apache HTTP on mw1072 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.077 second response time
[17:49:19] 	 RECOVERY - HHVM rendering on mw1057 is OK: HTTP OK: HTTP/1.1 200 OK - 62705 bytes in 0.088 second response time
[17:49:22] 	 but it was apparently very temporary
[17:49:24] 	 http://plasmasturm.org/log/6debug/
[17:50:09] 	 (PS1) Alex Monk: Revert "Pull out unnecessary wikitech settings, move some into CommonSettings" [mediawiki-config] - https://gerrit.wikimedia.org/r/220836
[17:50:28] 	 (CR) Alex Monk: [C: 2] "already in prod" [mediawiki-config] - https://gerrit.wikimedia.org/r/220836 (owner: Alex Monk)
[17:50:29] 	 mutante: the ‘Virtualization cluster eqiad’ ganglia report is actually showing everything I care about for the first time.  Thank you!
[17:50:33] 	 (Merged) jenkins-bot: Revert "Pull out unnecessary wikitech settings, move some into CommonSettings" [mediawiki-config] - https://gerrit.wikimedia.org/r/220836 (owner: Alex Monk)
[17:50:41] 	 leila, kart_: I created a project (research/RecommendationMailer) and submitted the current code as the initial commit: https://gerrit.wikimedia.org/r/#/c/220834/
[17:51:04] 	 ori, what's "Lost parent, LightProcess exiting"?
[17:51:20] 	 MaxSem: it's a generic message printed whenever HHVM fatals
[17:51:22] 	 Krenair: why $wgUseTeX = true; was removed ?
[17:51:24] 	 not related to the actual reason
[17:51:44] 	 matanya, it was removed in MW 1.18
[17:51:58] 	 so, the deployment caused a mass hhvm crash?
[17:52:28] 	 The change should have been harmless. At most I was expecting wikitech issues.
[17:52:34] 	 operations, Traffic, Pybal: pybal idleconn ipv6 monitors actually check ipv4 - https://phabricator.wikimedia.org/T103880#1401543 (BBlack) NEW
[17:52:51] 	 MaxSem: sounds like it
[17:52:52] 	 Did that server-restarting code go into scap or something?
[17:53:13] 	 it requires a --restart cli option
[17:53:22] 	 Well I definitely didn't specify that
[17:53:44] 	 that's not a clean restart, that sounds like a hard crash
[17:54:51] 	 lots of "hhvm main process (20962) killed by ABRT signal"
[17:54:59] 	 TC cache
[17:55:09] 	 we should start running with --restart
[17:55:25] 	 very broadly the reason this happens for some commits more than others is this:
[17:55:36] 	 HHVM does not deal especially well with global scope.
[17:55:44] 	 Facebook does not have a lot in global scope, they got rid of that.
[17:56:06] 	 You can't really reason about invariants all that well in global scope, so HHVM just declines to optimize.
[17:56:10] 	 They go into cold cache en masse
[17:56:21] 	 This is the source of some bugs that we have still not completely isolated. Namely:
[17:56:32] 	 (a) Changes to StartProfile.php have not been picked up by HHVM, and required a restart
[17:56:47] 	 (b) Changes to InitialiseSettings.php / CommonSettings.php are substantially more likely to lead to translation cache exhaustion
[17:57:10] 	 Why did it start to recover when I reverted the change?
[17:57:29] 	 because everything that could explode already did so
[17:57:36] 	 So do we want scap and/or sync to always --restart after a run?
[17:57:43] 	 Is there a phab ticket or patch for that?
[17:57:58] 	 if you're asking yourself, "why didn't ori document this somewhere", the reason is that i more or less just synthesized this in my head gradually over the past couple of days
[17:58:03] 	 Krenair: what MaxSem said
[17:58:07] 	 things were already restarting
[17:58:19] 	 so bad news: you didn't fix it. but good news: you didn't break it, either :P
[17:58:37] 	 If we did that change again, would it crash again?
[17:58:40] 	 ori: will it cause the same thing again if he 
[17:58:44] 	 What about other entirely unrelated changes?
[17:58:46] 	 heh, yes, what Krenair said :)
[17:58:51] 	 honestly, I'm not sure that --restart would've helped because you first put the files in place and only then restart. hhvm can crash in between
[17:59:05] 	 MaxSem: yes, but then you don't have the accumulation from one change to the next
[17:59:23] 	 i will bet you it won't cause crashes the next time someone syncs a config change
[17:59:26] 	 or the time after that
[17:59:27] 	 or the time after that
[17:59:33] 	 it takes a half dozen or more
[17:59:43] 	 that also answers andrewbogott and Krenair's question, I think
[18:00:02] 	 in other words: it's nothing particular about that change, other than the fact that it's (a) unlucky, (b) touches global scope
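The accumulation argument above (each config sync adds more cold translations for file-scope code, nothing reclaims them, and the crash only comes once the running total crosses a limit, typically after half a dozen or more syncs) can be captured in a deliberately crude toy model. The capacity and sizes are made-up units, and "nothing is ever evicted" is an assumption: HHVM's actual eviction behavior was still an open question in this discussion. The model also ignores MaxSem's caveat that HHVM can crash between the files landing and the restart.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToyTranslationCache:
    """Crude model: capacity and sizes are arbitrary units; nothing is ever evicted."""
    capacity: int
    used: int = 0

    def sync(self, cold_code: int) -> bool:
        """Translate freshly synced file-scope code; False means the cache overflowed."""
        if self.used + cold_code > self.capacity:
            return False
        self.used += cold_code
        return True

    def restart(self) -> None:
        """A process restart begins with an empty cache."""
        self.used = 0

def first_crash(syncs: int, cold_code: int, capacity: int,
                restart_each_time: bool) -> Optional[int]:
    """Return the 1-based sync number that overflows the cache, or None."""
    tc = ToyTranslationCache(capacity)
    for n in range(1, syncs + 1):
        if not tc.sync(cold_code):
            return n
        if restart_each_time:
            tc.restart()
    return None

if __name__ == "__main__":
    print(first_crash(10, cold_code=15, capacity=100, restart_each_time=False))  # 7
    print(first_crash(10, cold_code=15, capacity=100, restart_each_time=True))   # None
```

With a restart after every sync, nothing carries over from one change to the next, which is the point being made about --restart. The specific numbers mean nothing.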
[18:00:04] 	 twentyafterfour greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150625T1800). Please do the needful.
[18:00:12] 	 Okay... Sorry about this guys.
[18:00:22] 	 Krenair: it is 100% not your fault
[18:00:22] 	 I'll probably come back and try the wikitech change again later.
[18:00:26] 	 Good luck to twentyafterfour
[18:00:33] * Krenair goes to get dinner
[18:01:28] 	 I need to gather all of these details in a single phab ticket and link the various related issues
[18:03:26] 	 MaxSem: re: LightProcess, here is Tim's explanation, edited for clarity:
[18:03:31] 	 We know that fork/exec is slow on HHVM, especially under concurrent load. When you fork/exec, you briefly acquire a lock on the kernel's view of the address space of the whole process. LightProcess is a child process that HHVM forks in advance. Then subsequent shell executions are forked from the child process. If you do your high volume forks from a separate worker process, you avoid acquiring the lock on the main process.
[18:03:44] 	 (PS3) Giuseppe Lavagetto: confctl: allow regex expression and a global "all" [software/conftool] - https://gerrit.wikimedia.org/r/220536
[18:03:49] 	 PROBLEM - Redis on stat1001 is CRITICAL: Connection refused
[18:04:02] 	 when the main process dies for whatever reason (including a normal shutdown IIRC), each LightProcess exits too, with the log message you saw
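The LightProcess idea from Tim's explanation can be illustrated outside HHVM with a Python sketch. The idea is to fork a small helper early, while the main process is still small, and send it every command to run, so the expensive fork/exec happens in the helper and not in the large main process. This is an analogy built on the standard `multiprocessing` and `subprocess` modules, not HHVM's code. The class name is made up, and the "fork" start method makes it POSIX-only. Like HHVM's helper, it exits when its parent goes away.

```python
import multiprocessing as mp
import subprocess
import sys
from typing import List, Tuple

def _helper_loop(conn, parent_end) -> None:
    """Runs inside the pre-forked helper and executes commands on request."""
    parent_end.close()  # drop the inherited copy so a dead parent yields EOF
    while True:
        try:
            argv = conn.recv()
        except EOFError:
            return  # parent is gone: exit, like "Lost parent, LightProcess exiting"
        if argv is None:
            return  # explicit shutdown request
        result = subprocess.run(argv, capture_output=True, text=True)
        conn.send((result.returncode, result.stdout))

class LightProcessSketch:
    """Hypothetical helper; create it early, before the main process grows."""

    def __init__(self) -> None:
        ctx = mp.get_context("fork")
        self._conn, child_end = ctx.Pipe()
        self._proc = ctx.Process(target=_helper_loop,
                                 args=(child_end, self._conn), daemon=True)
        self._proc.start()
        child_end.close()  # the parent only talks through its own end

    def run(self, argv: List[str]) -> Tuple[int, str]:
        """Ask the helper to run argv; the fork/exec happens in the helper."""
        self._conn.send(argv)
        return self._conn.recv()

    def close(self) -> None:
        self._conn.send(None)
        self._proc.join()

if __name__ == "__main__":
    lp = LightProcessSketch()
    print(lp.run([sys.executable, "-c", "print('hello from a light process')"]))
    lp.close()
```

The design point is where the fork happens. The helper's address space is a snapshot taken at construction time, so spawning commands from it avoids touching the main process's (possibly much larger) address space.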
[18:04:23] 	 so should I be worried about deploying the train right now?
[18:04:29] 	 RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[18:04:31] 	 twentyafterfour: no
[18:04:39] 	 I mean, no more than usual :P
[18:04:45] 	 but we need to work on rolling restarts of hhvm?
[18:04:54] 	 you should run scap with --restart
[18:05:07] 	 that restarts hhvm? I didn't even know it was a thing
[18:05:17] 	 it has only very recently become a thing (yesterday)
[18:05:18] 	 also, today's deploy doesn't do a full scap, just sync-wikiversions
[18:05:25] 	 that's fine then
[18:05:28] 	 no --restart
[18:05:44] 	 so it's only needed when syncing code changes?
[18:06:25] 	 I don't want to establish guidelines for its use before we have this all documented
[18:06:30] 	 so just pretend it doesn't exist for now
[18:06:37] 	 i am writing up things in phab
[18:06:46] 	 easier for me to just write it there rather than on irc and in phab concurrently
[18:06:50] 	 don't mind the man behind the curtain
[18:07:03] 	 greg-g: "for now" == for the next ten minutes
[18:07:07] 	 word
[18:07:29] 	 PROBLEM - Host es2004 is DOWN: PING CRITICAL - Packet loss = 100%
[18:07:32] * greg-g is totally really super excited about --restart
[18:07:44] 	 (PS1) 20after4: all wikis to 1.26wmf11 [mediawiki-config] - https://gerrit.wikimedia.org/r/220838
[18:09:06] 	 (CR) 20after4: [C: 2] all wikis to 1.26wmf11 [mediawiki-config] - https://gerrit.wikimedia.org/r/220838 (owner: 20after4)
[18:09:13] 	 (Merged) jenkins-bot: all wikis to 1.26wmf11 [mediawiki-config] - https://gerrit.wikimedia.org/r/220838 (owner: 20after4)
[18:09:18] 	 operations, MediaWiki-Sites, SEO, HTTPS-by-default, and 3 others: URLs for the same title without extra query parameters should have the same canonical link - https://phabricator.wikimedia.org/T67402#1401573 (Krinkle) Open>Resolved
[18:09:40] 	 !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf11
[18:09:46] 	 Logged the message, Master
[18:12:14] 	 Weekly train completed. fatalmonitor looks good. This is the end of the line folks, next train leaves the station in 5 days.
[18:12:28] 	 aharoni: ^
[18:12:31] 	 oh
[18:12:34] 	 18:12 < twentyaft> Weekly train completed. fatalmonitor looks good. This is the end of the line folks, next train leaves the station in 5 days.
[18:12:38] 	 :)
[18:12:41] 	 aha
[18:12:51] 	 perfect!
[18:13:03] 	 aaaaand ContentTranslation has a new dashboard
[18:13:06] 	 yay
[18:16:18] 	 andrewbogott: :)
[18:19:50] 	 RECOVERY - Redis on stat1001 is OK: TCP OK - 0.012 second response time on port 6379
[18:23:20] 	 There's a lot of exceptions that we ran out of captchas
[18:23:24] 	 Can anyone please check?
[18:23:32] * hoo is poking at DB issues, so can't right now
[18:24:51] 	 Ops-Access-Requests, operations, Discovery, Discovery-Cirrus-Sprint: Grant access to HTTP request logs - https://phabricator.wikimedia.org/T103872#1401622 (Ottomata) James is asking to be in the `analytics-privatedata-users` group.
[18:28:22] 	 hoo, the exceptions appear to have disappeared
[18:28:37] 	 but they were definitely being thrown until about 19:23:33
[18:29:24] 	 operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1401646 (ori) NEW
[18:29:32] 	 ^ greg-g, twentyafterfour, Krenair, bd808
[18:30:15] 	 there are some dupes of this task that identify a part of the problem, i will merge those in
[18:30:51] * Nemo_bis still isn't clear how the issue was solved last time
[18:31:17] 	 operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1401657 (Krenair)
[18:31:24] 	 "InitialiseSettings.php alone is five times longer than the Bhagavad-Gita." lols that's pretty good
[18:33:32] 	 (PS1) Alex Monk: Revert "Revert "Pull out unnecessary wikitech settings, move some into CommonSettings"" [mediawiki-config] - https://gerrit.wikimedia.org/r/220847
[18:34:10] 	 operations, Deployment-Systems, MediaWiki-ResourceLoader: Bad cache stuck due to race condition with scap between different web servers - https://phabricator.wikimedia.org/T47877#1401662 (Krinkle)
[18:34:37] 	 ori, how is wikitech.php particularly bad?
[18:34:54] 	 operations, Deployment-Systems, HHVM, Patch-For-Review, User-Bd808-Test: Scap should restart HHVM - https://phabricator.wikimedia.org/T103008#1401664 (ori)
[18:34:57] 	 operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1401667 (ori)
[18:35:02] 	 (CR) Andrew Bogott: [C: 1] Revert "Revert "Pull out unnecessary wikitech settings, move some into CommonSettings"" [mediawiki-config] - https://gerrit.wikimedia.org/r/220847 (owner: Alex Monk)
[18:35:03] 	 operations, ops-codfw, Database: Faulty memory on es2004 - https://phabricator.wikimedia.org/T103843#1401670 (Papaul) swap the memory to another slot (from DIMM B2 to DIMM B3 and DIMM B3 to DIMM B2) now the error is not on DIMM B2 but on DIMM B3 . Bad memory the memory needs to me replaced. I will call...
[18:35:17] 	 Krenair: great question, don't know. It is implicated in the last time this happened (i.e., your deploy a few minutes ago), so I'm flagging it
[18:35:22] 	 ok
[18:35:32] 	 I don't know how StartProfile.php is particularly bad either, but I suspect it has to do with the registration of a shutdown handler.
[18:35:36] 	 quiet hours = not many ops around?
[18:36:12] 	 honestly wikitech.php shouldn't even get loaded by machines other than silver
[18:36:42] 	 (PS1) Cmjohnson: adding ip for labnet1002 instance vlan [dns] - https://gerrit.wikimedia.org/r/220849
[18:36:43] 	 yeah
[18:36:46] 	 hmm... except maybe job runners/image scalers/etc.. or people running maintenance scripts on labswiki from tin/terbium.
[18:36:50] 	 (PS1) Dzahn: better role descriptions for some misc services [puppet] - https://gerrit.wikimedia.org/r/220850
[18:37:33] 	 although I'm not sure it's possible to do that at the moment because of the mysql grant restriction
[18:38:34] 	 (PS2) Cmjohnson: adding ip for labnet1002 instance vlan [dns] - https://gerrit.wikimedia.org/r/220849
[18:38:35] 	 operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1401682 (Legoktm)
[18:38:52] 	 ah
[18:39:01] 	 (PS2) Dzahn: better role descriptions for some misc services [puppet] - https://gerrit.wikimedia.org/r/220850
[18:39:09] 	 you can load eval.php for labswiki from tin, but none of the functions requiring a db connection will work
[18:40:05] 	 (PS3) Dzahn: better role descriptions for some misc services [puppet] - https://gerrit.wikimedia.org/r/220850
[18:40:41] 	 (PS2) Dzahn: bugzilla_static: include role::backup::host [puppet] - https://gerrit.wikimedia.org/r/220818 (https://phabricator.wikimedia.org/T95184)
[18:40:44] 	 (CR) Cmjohnson: [C: 2] adding ip for labnet1002 instance vlan [dns] - https://gerrit.wikimedia.org/r/220849 (owner: Cmjohnson)
[18:41:17] 	 (CR) Dzahn: [C: 2] better role descriptions for some misc services [puppet] - https://gerrit.wikimedia.org/r/220850 (owner: Dzahn)
[18:41:48] 	 (PS3) Dzahn: bugzilla_static: include role::backup::host [puppet] - https://gerrit.wikimedia.org/r/220818 (https://phabricator.wikimedia.org/T95184)
[18:43:15] 	 (PS2) Dzahn: misc-web: delete Varnish config for dev.wikimedia [puppet] - https://gerrit.wikimedia.org/r/220498 (https://phabricator.wikimedia.org/T305)
[18:43:31] 	 (PS3) Dzahn: misc-web: delete Varnish config for dev.wikimedia [puppet] - https://gerrit.wikimedia.org/r/220498 (https://phabricator.wikimedia.org/T305)
[18:44:02] 	 (CR) Dzahn: [C: 2] bugzilla_static: include role::backup::host [puppet] - https://gerrit.wikimedia.org/r/220818 (https://phabricator.wikimedia.org/T95184) (owner: Dzahn)
[18:44:18] 	 RECOVERY - Host es2004 is UP: PING OK - Packet loss = 0%, RTA = 43.39 ms
[18:46:32] 	 (PS1) BBlack: define LVS for recdns over TCP [puppet] - https://gerrit.wikimedia.org/r/220885
[18:46:32] 	 ori, so if we wanted to try again now, would it be OK?
[18:46:36] 	 ori: thanks for your comment about initializesettings.php in https://phabricator.wikimedia.org/T103886 . made my day
[18:47:20] 	 ori: "    HHVM's translation cache does have an eviction mechanism, or its eviction mechanism does not work for the cold cache, or it has some unspecified bug.
[18:47:22] 	 "   does not?
[18:47:24] 	 operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1401731 (Manybubbles) Do you mean: HHVM's translation cache does *not* have an eviction mechanism, or i...
[18:47:37] 	 heh
[18:47:39] 	 PROBLEM - Host google is DOWN: PING CRITICAL - Packet loss = 100%
[18:47:49] 	 wat
[18:48:01] 	 greg-g: nice
[18:48:11] 	 host google?
[18:48:21] 	 lol
[18:48:33] 	 so the google is down?
[18:48:37] 	 I doubt it
[18:48:44] 	 (PS1) Cmjohnson: Adding mac address for labnet1002 [puppet] - https://gerrit.wikimedia.org/r/220899
[18:48:49] 	 that's actually true, neon can't ping google.com
[18:48:51] 	 does this translation cache have anything to do with https://phabricator.wikimedia.org/T103888 ?
[18:49:05] 	 I think that's probably something we put in as a double-check on our own monitoring validity
[18:49:19] 	 bast1001 and hooft can
[18:49:21] 	 I can't ping google from palladium either fwiw
[18:49:24] 	 eh, it's a virtual host that was there for those google safe browsing checks
[18:49:46] 	 oh palladium is in wmnet, duh
[18:49:51] 	 some of them are actually critical
[18:49:52] 	 (CR) Cmjohnson: [C: 2] Adding mac address for labnet1002 [puppet] - https://gerrit.wikimedia.org/r/220899 (owner: Cmjohnson)
[18:49:58] 	 https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=Google
[18:49:59] 	 RECOVERY - Host google is UP: PING OK - Packet loss = 0%, RTA = 9.02 ms
[18:50:01] 	 lvs1002 can't reach it
[18:50:05] 	 173.194.206.147 replies, but 173.194.207.147 didn't
[18:50:25] 	 (PS1) Ottomata: Refactor eventlogging role classes to make it easier to include different processes on different hosts [puppet] - https://gerrit.wikimedia.org/r/220912 (https://phabricator.wikimedia.org/T102831)
[18:50:28] 	 actual Google maintenance?
[18:50:42] 	 (PS2) Ottomata: Refactor eventlogging role classes to make it easier to include different processes on different hosts [puppet] - https://gerrit.wikimedia.org/r/220912 (https://phabricator.wikimedia.org/T102831)
[18:50:53] 	 greg-g: does not, yeah.
[18:50:57] 	 (PS3) Ottomata: Refactor eventlogging role classes to make it easier to include different processes on different hosts [puppet] - https://gerrit.wikimedia.org/r/220912 (https://phabricator.wikimedia.org/T102831)
[18:51:21] 	 godog: from lvs1002 I get the opposite
[18:51:28] 	 operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1401738 (ori)
[18:51:28] 	 (107 ok, 106 bad)
[18:51:34] 	 in icinga it's all recovering
[18:51:58] 	 but it was actually host (ping) and some of the HTTP checks
[18:52:05] 	 I think google flipped some service switch and failed to consider DNS TTLs properly, or something
[18:52:13] 	 yea, feels like that
[18:52:21] 	 operations, HHVM, Performance: HHVM 3.6 leaks memory - https://phabricator.wikimedia.org/T99525#1401757 (ori)
[18:52:24] 	 operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1401646 (ori)
[18:52:33] 	 (PS4) Ottomata: Refactor eventlogging role classes to make it easier to include different processes on different hosts [puppet] - https://gerrit.wikimedia.org/r/220912 (https://phabricator.wikimedia.org/T102831)
[18:53:21] 	 (PS2) BBlack: define LVS for recdns over TCP [puppet] - https://gerrit.wikimedia.org/r/220885
[18:53:39] 	 PROBLEM - check google safe browsing for wikipedia.org on google is CRITICAL - Socket timeout after 10 seconds
[18:53:44] 	 could be also failure to withdraw a faulty cluster, sometimes I get working ips from dns
[18:53:47] 	 anyone around have any "oh god please don't do that" considerations about https://gerrit.wikimedia.org/r/#/c/220885 (recdns TCP above)
[18:54:19] 	 i take that back, it's flapping now
[18:55:21] 	 RECOVERY - check google safe browsing for wikipedia.org on google is OK: HTTP OK: HTTP/1.1 200 OK - 4208 bytes in 3.104 second response time
[18:55:23] 	 naming things time! What should a command that can be run from tin to restart all of the hhvm servers be named? hhvm-restart? hhvm-graceful? restart-hhvm?
[18:55:43] 	 do we have anything like it already?
[18:55:48] 	 bd808: <3. thanks for jumping on that so quickly.
[18:55:50] 	 reboot-the-world
[18:55:59] 	 hhvm-graceful-all , hehe
[18:56:06] 	 that was the apache command in the past
[18:56:19] 	 restart-hhvm
[18:56:34] 	 well, no, let's make it clear from the name that its scope is not local
[18:56:52] 	 is it prod-global, or just one service cluster?
[18:56:54] 	 (PS5) Ottomata: Refactor eventlogging role classes to make it easier to include different processes on different hosts [puppet] - https://gerrit.wikimedia.org/r/220912 (https://phabricator.wikimedia.org/T102831)
[18:57:08] 	 app servers
[18:57:15] 	 bblack: seems like a good idea, why not bgp tho?
[18:57:18] 	 and cross-dc?
[18:57:23] * greg-g presumes
[18:57:34] 	 yes
[18:57:54] 	 hhvm-restart++
[18:57:57] 	 godog: I think that's the normal pattern for pybal services: only the first one defined for a set of listener IPs has bgp: yes, the others have bgp: no (see text + text-https and similar).
[18:58:15] 	 I presume because otherwise there would be conflict on which healthchecks do or don't control bgp advert
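A minimal sketch of the pybal convention described above: only the first service defined for a given set of listener IPs carries `bgp: yes`, and every later service sharing those IPs carries `bgp: no`, so one set of health checks decides the BGP advertisement. The `Service` record and the `assign_bgp` helper are hypothetical illustrations, not the actual puppet or pybal data model, and the IPs are documentation addresses.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass
class Service:
    # Hypothetical stand-in for an LVS service definition.
    name: str
    ips: FrozenSet[str]
    bgp: bool = False


def assign_bgp(services: List[Service]) -> List[Service]:
    """Set bgp=True on the first service for each listener-IP set, False on the rest.

    With exactly one advertising service per IP set, there is no conflict over
    whose health checks control the BGP announcement for those IPs.
    """
    advertised: Set[FrozenSet[str]] = set()
    for svc in services:
        svc.bgp = svc.ips not in advertised
        advertised.add(svc.ips)
    return services


if __name__ == "__main__":
    text_ips = frozenset({"192.0.2.10", "2001:db8::10"})
    for svc in assign_bgp([
        Service("text", text_ips),
        Service("text-https", text_ips),
        Service("upload", frozenset({"192.0.2.20"})),
    ]):
        print(f"{svc.name}: bgp: {'yes' if svc.bgp else 'no'}")
```

In this sketch "first" means first in definition order, so `text` advertises and `text-https` does not.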
[18:58:17] 	 restart-hip-hop-virtual-machine-on-all-appservers --yes-im-sure
[18:58:22] 	 I like hhvm-graceful-all actually
[18:58:45] 	 sure, /me doesn't actually care all that much
[18:58:55] 	 it's just clunky enough to match the process
[18:58:59] 	 (relatedly, I should probably flip the bgp:yes/no to the https's in general!)
[18:59:02] 	 sure, that works
[18:59:11] 	 concur
[18:59:52] 	 rhap
[18:59:56] 	 bblack: indeed you are right, nevermind
[19:00:03] 	 (PS6) Ottomata: Refactor eventlogging role classes to make it easier to include different processes on different hosts [puppet] - https://gerrit.wikimedia.org/r/220912 (https://phabricator.wikimedia.org/T102831)
[19:00:11] 	 (CR) Filippo Giunchedi: [C: 1] define LVS for recdns over TCP [puppet] - https://gerrit.wikimedia.org/r/220885 (owner: BBlack)
[19:00:27] 	 rhap is smart, it's like scap but also the HipHop reference
[19:00:47] 	 also, nobody will remember it
[19:01:30] 	 rhap-remember
[19:01:41] 	 apropos rap won't work
[19:01:43] 	 operations, Deployment-Systems, Performance-Team: During deployment old servers may populate new cache URIs - https://phabricator.wikimedia.org/T47877#1401779 (Krinkle)
[19:01:54] 	 (CR) BBlack: [C: 2] "Note this won't take effect immediately anyways - will do manual pybal restarts and monitor the situation" [puppet] - https://gerrit.wikimedia.org/r/220885 (owner: BBlack)
[19:02:05] 	 (PS1) Rush: pybal: switch lvs2003 & lvs2006 to confd pools [puppet] - https://gerrit.wikimedia.org/r/220925
[19:02:09] 	 so what actually happens when "Google Safe Browsing" is broken for a site
[19:02:25] 	 (PS2) Rush: pybal: switch lvs2003 & lvs2006 to confd pools [puppet] - https://gerrit.wikimedia.org/r/220925
[19:03:25] 	 (PS7) Ottomata: Refactor eventlogging role classes to make it easier to include different processes on different hosts [puppet] - https://gerrit.wikimedia.org/r/220912 (https://phabricator.wikimedia.org/T102831)
[19:03:33] 	 operations, Deployment-Systems, Performance-Team: During deployment old servers may populate new cache URIs - https://phabricator.wikimedia.org/T47877#510379 (Krinkle) >>! In T47877#979631, @Jdforrester-WMF wrote: > It's an over-optimisation in ResourceLoader which creates a bug exposed by the way we d...
[19:03:59] 	 PROBLEM - Incoming network saturation on labstore1001 is CRITICAL 10.34% of data above the critical threshold [100000000.0]
[19:04:14] 	 eh?
[19:04:23] 	 (CR) BBlack: [C: -1] "actual diff has 2 and 6 instead of 3 and 6" [puppet] - https://gerrit.wikimedia.org/r/220925 (owner: Rush)
[19:04:38] 	 (CR) Rush: [C: 2] pybal: switch lvs2003 & lvs2006 to confd pools [puppet] - https://gerrit.wikimedia.org/r/220925 (owner: Rush)
[19:05:26] 	 shit bblack thanks, you're right
[19:05:42] 	 won't cause issues but it won't do what it is meant to do either
[19:06:11] 	 (PS4) Dzahn: misc-web: delete Varnish config for dev.wikimedia [puppet] - https://gerrit.wikimedia.org/r/220498 (https://phabricator.wikimedia.org/T305)
[19:07:17] 	 (PS1) Rush: pybal: fix lvs2005 duplicate to be lvs2006 for confd [puppet] - https://gerrit.wikimedia.org/r/220926
[19:07:21] 	 (CR) jenkins-bot: [V: -1] pybal: fix lvs2005 duplicate to be lvs2006 for confd [puppet] - https://gerrit.wikimedia.org/r/220926 (owner: Rush)
[19:07:22] 	 something's odd about my recdns thing too, it's using duplicate pybal service names :/
[19:07:23] 	 (CR) Dzahn: [C: 2] misc-web: delete Varnish config for dev.wikimedia [puppet] - https://gerrit.wikimedia.org/r/220498 (https://phabricator.wikimedia.org/T305) (owner: Dzahn)
[19:07:29] 	 (PS2) Rush: pybal: fix lvs2005 duplicate to be lvs2006 for confd [puppet] - https://gerrit.wikimedia.org/r/220926
[19:09:21] 	 I guess it constructs that name from the internal bits, bleh
[19:09:42] 	 (CR) Rush: [C: 2] pybal: fix lvs2005 duplicate to be lvs2006 for confd [puppet] - https://gerrit.wikimedia.org/r/220926 (owner: Rush)
[19:10:26] 	 (PS2) Dzahn: dev.wikimedia.org: delete puppet module [puppet] - https://gerrit.wikimedia.org/r/220504 (https://phabricator.wikimedia.org/T305)
[19:10:44] 	 recdns_udp and recdns_tcp ? not loveable but idk
[19:11:00] 	 (CR) Dzahn: "This is not used since dev.wm is just a redirect into mediawiki.org - if plans ever change for it to be a separate site it can easily be r" [puppet] - https://gerrit.wikimedia.org/r/220504 (https://phabricator.wikimedia.org/T305) (owner: Dzahn)
[19:11:23] 	 (CR) Dzahn: [C: 2] dev.wikimedia.org: delete puppet module [puppet] - https://gerrit.wikimedia.org/r/220504 (https://phabricator.wikimedia.org/T305) (owner: Dzahn)
[19:11:29] 	 PROBLEM - puppet last run on lvs1002 is CRITICAL puppet fail
[19:11:31] 	 chasemp: it has to be that way in general, IPVS is for one single proto:ip:port
[19:11:57] 	 operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1401821 (Joe) @manybubbles it does not have an eviction mechanism at all.
[19:12:00] 	 I just mean, when it comes down to translating that to a [stanza_name] in pybal.conf, they're duplicated because the code never considered everything being the same but the protocol part
[19:12:47] 	 operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1401824 (Manybubbles) >>! In T103886#1401821, @Joe wrote: > @manybubbles it does not have an eviction m...
[19:13:19] 	 ah ok
[19:15:29] 	 (PS8) Ottomata: Refactor eventlogging role classes to make it easier to include different processes on different hosts [puppet] - https://gerrit.wikimedia.org/r/220912 (https://phabricator.wikimedia.org/T102831)
[19:15:46] 	 (CR) Dzahn: "isn't that what notifies https://lists.wikimedia.org/mailman/listinfo/newprojects ?" [puppet] - https://gerrit.wikimedia.org/r/218766 (owner: Alex Monk)
[19:16:03] 	 (PS9) Ottomata: Refactor eventlogging role classes to make it easier to include different processes on different hosts [puppet] - https://gerrit.wikimedia.org/r/220912 (https://phabricator.wikimedia.org/T102831)
[19:17:25] 	 (PS1) BBlack: pybal.conf: use any non-default proto in stanza name [puppet] - https://gerrit.wikimedia.org/r/220934
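The stanza-name collision described above can be illustrated with a small sketch. If the name ignores the protocol, the UDP and TCP recursive-DNS services (which IPVS keeps separate, one virtual service per proto:ip:port) map to the same `[stanza_name]`; appending any non-default protocol, as the title of change 220934 suggests, keeps them apart. The name format, the assumption that TCP is the default protocol, and the helper names are all hypothetical, not the real pybal.conf template.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

DEFAULT_PROTO = "tcp"  # assumption: TCP is the default, so only UDP gets a suffix

Entry = Tuple[str, str]  # (service name, protocol): a hypothetical simplification


def name_ignoring_proto(service: str, proto: str) -> str:
    # Old behaviour in this sketch: the protocol is dropped from the name.
    return service


def name_with_proto(service: str, proto: str) -> str:
    # Append the protocol only when it differs from the default.
    return service if proto == DEFAULT_PROTO else f"{service}_{proto}"


def duplicate_stanzas(entries: Iterable[Entry],
                      namer: Callable[[str, str], str]) -> List[str]:
    """Return stanza names that would appear more than once in pybal.conf."""
    counts = Counter(namer(service, proto) for service, proto in entries)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    recdns = [("dns_rec", "udp"), ("dns_rec", "tcp")]
    print(duplicate_stanzas(recdns, name_ignoring_proto))  # ['dns_rec']
    print(duplicate_stanzas(recdns, name_with_proto))      # []
```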
[19:17:28] 	 bblack: I'm going to keep puppet disabled on lvs2006 for a moment.  the pool from etcd is kicking back as invalid as generated so I'm looking.  I had mirrored the 'enabled' hosts from current and it isn't picking it up.  just an fyi for the moment.
[19:17:54] 	 (PS2) BBlack: pybal.conf: use any non-default proto in stanza name [puppet] - https://gerrit.wikimedia.org/r/220934
[19:17:59] 	 (CR) Legoktm: "Not anymore, the addWiki script now does it directly: a17c2ef30e0e85ced460f304cf481cdb7d924486" [puppet] - https://gerrit.wikimedia.org/r/218766 (owner: Alex Monk)
[19:18:11] 	 chasemp: ok
[19:19:12] 	 operations: DNS Change for GreenHouse - https://phabricator.wikimedia.org/T103893#1401835 (JGulingan) NEW
[19:19:32] 	 (CR) BBlack: [C: 2 V: 2] pybal.conf: use any non-default proto in stanza name [puppet] - https://gerrit.wikimedia.org/r/220934 (owner: BBlack)
[19:20:23] 	 Blocked-on-Operations, Puppet, operations, Beta-Cluster, Patch-For-Review: Setup a mediawiki033 on Beta Cluster that we can direct the security scanning work to - https://phabricator.wikimedia.org/T72181#1401847 (greg) >>! In T72181#1147999, @yuvipanda wrote: > Note that the old mediawiki03 doesn...
[19:21:12] 	 (PS12) Dzahn: Allow text-lb to redirect svn access to Diffusion [puppet] - https://gerrit.wikimedia.org/r/219228 (owner: Chad)
[19:22:15] 	 (PS2) Dzahn: Remove notifyNewProjects script [puppet] - https://gerrit.wikimedia.org/r/218766 (owner: Alex Monk)
[19:22:29] 	 RECOVERY - puppet last run on lvs1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:24:09] 	 (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/218766 (owner: Alex Monk)
[19:24:13] 	 (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/219228 (owner: Chad)
[19:24:58] 	 (PS3) Dzahn: Point svn.wikimedia.org at text-lb [dns] - https://gerrit.wikimedia.org/r/219234 (owner: Chad)
[19:25:03] 	 operations, Wikimedia-DNS: DNS Change for GreenHouse - https://phabricator.wikimedia.org/T103893#1401869 (Krenair)
[19:25:03] 	 (PS9) Dzahn: Remove subversion server support [puppet] - https://gerrit.wikimedia.org/r/219240 (owner: Chad)
[19:26:24] 	 operations, Wikimedia-DNS, Mail: DNS Change for GreenHouse - https://phabricator.wikimedia.org/T103893#1401871 (Dzahn)
[19:27:02] 	 (PS1) BBlack: pybal dns_rec: flip config names to match pybal.conf for clarity [puppet] - https://gerrit.wikimedia.org/r/220939
[19:27:19] 	 (CR) Dzahn: [C: 2] Remove notifyNewProjects script [puppet] - https://gerrit.wikimedia.org/r/218766 (owner: Alex Monk)
[19:27:55] 	 (PS2) BBlack: pybal dns_rec: flip config names to match pybal.conf for clarity [puppet] - https://gerrit.wikimedia.org/r/220939
[19:28:02] 	 (CR) BBlack: [C: 2 V: 2] pybal dns_rec: flip config names to match pybal.conf for clarity [puppet] - https://gerrit.wikimedia.org/r/220939 (owner: BBlack)
[19:28:06] 	 operations, MediaWiki-Sites, SEO, HTTPS-by-default, and 3 others: URLs for the same title without extra query parameters should have the same canonical link - https://phabricator.wikimedia.org/T67402#1401878 (matmarex) Yes, that patch is fixing something slightly different, but related (and would h...
[19:29:49] 	 operations, Deployment-Systems, Performance-Team, Release-Engineering, HHVM: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1401890 (bd808) > Avoid deploying changes to StartProfile.php and wikitech.php in quiet hours.  I'm not...
[19:30:38] 	 (PS1) BryanDavis: Add an hhvm-graceful-all command [tools/scap] - https://gerrit.wikimedia.org/r/220941 (https://phabricator.wikimedia.org/T103886)
[19:31:10] 	 (CR) jenkins-bot: [V: -1] Add an hhvm-graceful-all command [tools/scap] - https://gerrit.wikimedia.org/r/220941 (https://phabricator.wikimedia.org/T103886) (owner: BryanDavis)
[19:33:06] 	 ori: anything Nikerabbit should know before he does a scap? /me assumes no
[19:33:28] 	 PROBLEM - puppet last run on mira is CRITICAL Puppet has 1 failures
[19:33:29] 	 PROBLEM - puppet last run on silver is CRITICAL Puppet has 1 failures
[19:33:58] 	 (PS2) BryanDavis: Add an hhvm-graceful-all command [tools/scap] - https://gerrit.wikimedia.org/r/220941 (https://phabricator.wikimedia.org/T103886)
[19:34:33] 	 (PS10) Ottomata: Refactor eventlogging role classes to make it easier to include different processes on different hosts [puppet] - https://gerrit.wikimedia.org/r/220912 (https://phabricator.wikimedia.org/T102831)
[19:36:00] 	 (CR) Yuvipanda: [C: 1] Puppetize toolserver.org legacy server (3 comments) [puppet] - https://gerrit.wikimedia.org/r/220134 (https://phabricator.wikimedia.org/T85165) (owner: coren)
[19:37:01] 	 operations, RESTBase, RESTBase-Cassandra, Patch-For-Review: configure less aggressive cassandra log rotation / send cassandra logs to logstash - https://phabricator.wikimedia.org/T100970#1401922 (Eevans) a:Eevans
[19:37:02] 	 (PS13) Dzahn: Allow text-lb to redirect svn access to Diffusion [puppet] - https://gerrit.wikimedia.org/r/219228 (owner: Chad)
[19:38:38] 	 PROBLEM - etherpad.wikimedia.org HTTP on etherpad1001 is CRITICAL - Socket timeout after 10 seconds
[19:38:41] 	 (CR) Dzahn: [C: 2] Allow text-lb to redirect svn access to Diffusion [puppet] - https://gerrit.wikimedia.org/r/219228 (owner: Chad)
[19:38:43] 	 (CR) Paladox: [C: 1] Remove subversion server support [puppet] - https://gerrit.wikimedia.org/r/219240 (owner: Chad)
[19:40:37] 	 operations, RESTBase, Monitoring, Patch-For-Review: Detailed cassandra monitoring: metrics and dashboards done, need to set up alerts - https://phabricator.wikimedia.org/T78514#1401940 (Eevans)
[19:40:40] 	 operations, RESTBase, RESTBase-Cassandra: graphs for Cassandra metrics - https://phabricator.wikimedia.org/T93884#1401935 (Eevans) Open>Resolved a:Eevans Duplicate of T101764
[19:42:18] 	 RECOVERY - etherpad.wikimedia.org HTTP on etherpad1001 is OK: HTTP OK: HTTP/1.1 200 OK - 7928 bytes in 3.561 second response time
[19:43:53] 	 greg-g: I'm ready to deploy, just waiting for okay
[19:44:21] 	 Nikerabbit: you're good
[19:44:21] 	 I don't think anything special is needed greg-g 
[19:44:25] 	 bd808: ty
[19:45:16] 	 thanks
[19:45:47] 	 Blocked-on-Operations, Puppet, operations, Beta-Cluster, Patch-For-Review: Setup a mediawiki033 on Beta Cluster that we can direct the security scanning work to - https://phabricator.wikimedia.org/T72181#1401958 (yuvipanda) It's a different mediawiki03! Just because it's the same name and runs th...
[19:46:12] 	 !log nikerabbit Started scap: T103888 CX aliases
[19:46:17] 	 Logged the message, Master
[19:47:58] 	 RECOVERY - puppet last run on silver is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures
[19:49:39] 	 RECOVERY - puppet last run on mira is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures
[19:50:48] 	 Blocked-on-Operations, Puppet, operations, Beta-Cluster, and 2 others: Setup a dedicated mediawiki host in Beta Cluster that we can use for security scanning - https://phabricator.wikimedia.org/T72181#1401976 (greg)
[19:53:29] 	 operations, ops-codfw: EQDFW/EQORD Deployment Prep Task - https://phabricator.wikimedia.org/T91077#1401985 (RobH)
[19:53:52] 	 (CR) Dzahn: "[terbium:~] $ apache-fast-test svn.urls mw1033" [puppet] - https://gerrit.wikimedia.org/r/219228 (owner: Chad)
[19:57:08] 	 (CR) Ottomata: Add new projectview to projectcounts aggregation (1 comment) [puppet] - https://gerrit.wikimedia.org/r/220752 (https://phabricator.wikimedia.org/T101118) (owner: Joal)
[19:58:31] 	 (PS1) Rush: pybal: pybal-eval-check use syslog as well as stdout [puppet] - https://gerrit.wikimedia.org/r/220948
[19:58:40] 	 (PS2) Rush: pybal: pybal-eval-check use syslog as well as stdout [puppet] - https://gerrit.wikimedia.org/r/220948
[19:59:10] 	 operations: Decommission svn.wikimedia.org server (import SVN into Phabricator) - https://phabricator.wikimedia.org/T86655#1402012 (Dzahn) merged https://gerrit.wikimedia.org/r/#/c/219228/  tested on mw1033:   ``` [terbium:~] $ apache-fast-test svn.urls mw1033 testing 10 urls on 1 servers, totalling 10 reques...
[19:59:27] 	 (CR) jenkins-bot: [V: -1] pybal: pybal-eval-check use syslog as well as stdout [puppet] - https://gerrit.wikimedia.org/r/220948 (owner: Rush)
[20:00:03] 	 operations, ops-codfw: EQDFW/EQORD Deployment Prep Task - https://phabricator.wikimedia.org/T91077#1402015 (RobH)
[20:00:12] 	 operations: Decommission svn.wikimedia.org server (import SVN into Phabricator) - https://phabricator.wikimedia.org/T86655#1402016 (Dzahn) >>! In T86655#1139440, @valhallasw wrote: > Cool! Did you/could you also import the other repositories (most importantly for me, the pywikipedia repo)?  I didn't, Chad did...
[20:01:14] 	 (PS3) Rush: pybal: pybal-eval-check use syslog as well as stdout [puppet] - https://gerrit.wikimedia.org/r/220948
[20:02:54] 	 (PS11) Ottomata: Refactor eventlogging role classes to make it easier to include different processes on different hosts [puppet] - https://gerrit.wikimedia.org/r/220912 (https://phabricator.wikimedia.org/T102831)
[20:06:34] 	 (PS1) Hashar: Fix Shinken Mathoid probe [puppet] - https://gerrit.wikimedia.org/r/220954 (https://phabricator.wikimedia.org/T103595)
[20:06:36] 	 (PS1) Ori.livneh: Set hhvm.jit_pseudomain = false on canary app and API servers [puppet] - https://gerrit.wikimedia.org/r/220955
[20:06:51] 	 bblack: around for a sanity check? ^
[20:07:09] 	 the canary pools are two small subsets of the api and app server clusters
[20:07:46] 	 Nikerabbit: confirmed working, thanks
[20:07:48] 	 just a sec
[20:08:49] 	 !log nikerabbit Finished scap: T103888 CX aliases (duration: 22m 37s)
[20:08:56] 	 Logged the message, Master
[20:09:11] 	 (CR) BBlack: [C: 1] Set hhvm.jit_pseudomain = false on canary app and API servers [puppet] - https://gerrit.wikimedia.org/r/220955 (owner: Ori.livneh)
[20:09:25] 	 thanks
[20:09:35] 	 (CR) Hashar: "Example of probes failing on labs against the beta cluster:" [puppet] - https://gerrit.wikimedia.org/r/220954 (https://phabricator.wikimedia.org/T103595) (owner: Hashar)
[20:09:49] 	 greg-g: I'm done, thank you again
[20:09:54] 	 (PS2) Ori.livneh: Set hhvm.jit_pseudomain = false on canary app and API servers [puppet] - https://gerrit.wikimedia.org/r/220955
[20:10:01] 	 (CR) Ori.livneh: [C: 2 V: 2] Set hhvm.jit_pseudomain = false on canary app and API servers [puppet] - https://gerrit.wikimedia.org/r/220955 (owner: Ori.livneh)
[20:10:32] 	 operations: secure.wikimedia.org entries still showing up in Google search results - https://phabricator.wikimedia.org/T93531#1402076 (Krenair)
[20:11:27] 	 (PS12) Ottomata: Refactor eventlogging role classes to make it easier to include different processes on different hosts [puppet] - https://gerrit.wikimedia.org/r/220912 (https://phabricator.wikimedia.org/T102831)
[20:13:24] 	 greg-g: twentyafterfour: I'm "slightly" delayed because of jenkins/ composer/ tests acting up :P
[20:13:30] 	 Ok if I push now-ish
[20:13:48] 	 (PS4) Dzahn: Point svn.wikimedia.org at text-lb [dns] - https://gerrit.wikimedia.org/r/219234 (https://phabricator.wikimedia.org/T86655) (owner: Chad)
[20:14:51] 	 operations, Labs, wikitech.wikimedia.org, HHVM: Move wikitech to HHVM - https://phabricator.wikimedia.org/T98813#1402098 (Krenair)
[20:15:17] 	 (CR) Legoktm: [C: 1] logging: Force Monolog logger timezone to UTC [mediawiki-config] - https://gerrit.wikimedia.org/r/220784 (https://phabricator.wikimedia.org/T99581) (owner: BryanDavis)
[20:16:49] 	 greg-g: https://gerrit.wikimedia.org/r/220962
[20:17:04] 	 hoo: yeah
[20:17:08] 	 I'm going to have food and see about this after that... will be back in a bit
[20:17:20] 	 (CR) Dzahn: [C: 2] Point svn.wikimedia.org at text-lb [dns] - https://gerrit.wikimedia.org/r/219234 (https://phabricator.wikimedia.org/T86655) (owner: Chad)
[20:18:13] 	 (PS4) Rush: pybal: pybal-eval-check use syslog as well as stdout [puppet] - https://gerrit.wikimedia.org/r/220948
[20:18:21] 	 ori: MediaWiki.xhprof.EditPage.attemptSave.real.mean is a bit sparse, maybe the sampling can be increased?
[20:18:35] 	 !log bye SVN - subversion URLs now redirect to phab or doc
[20:18:41] 	 Logged the message, Master
[20:22:11] 	 ostriches: ^ done
[20:22:19] 	 the redirects
[20:25:28] 	 (PS10) Dzahn: Remove subversion server support [puppet] - https://gerrit.wikimedia.org/r/219240 (https://phabricator.wikimedia.org/T86655) (owner: Chad)
[20:25:31] 	 mutante: yay \o/
[20:26:08] 	 AaronSchulz: because increasing the sampling rate for xhprof is not cost-free, I wonder if we'd be better off doing something like: https://gerrit.wikimedia.org/r/220965
[20:26:16] 	 would that give you the data you need?
[20:27:20] 	 (PS11) Dzahn: Remove subversion server support [puppet] - https://gerrit.wikimedia.org/r/219240 (https://phabricator.wikimedia.org/T86655) (owner: Chad)
[20:29:00] 	 PROBLEM - Incoming network saturation on labstore1001 is CRITICAL 10.71% of data above the critical threshold [100000000.0]
[20:29:43] 	 (CR) Dzahn: [C: 2] Remove subversion server support [puppet] - https://gerrit.wikimedia.org/r/219240 (https://phabricator.wikimedia.org/T86655) (owner: Chad)
[20:31:57] 	 (CR) BryanDavis: "ping? Working in beta." [puppet] - https://gerrit.wikimedia.org/r/216337 (https://phabricator.wikimedia.org/T101541) (owner: BryanDavis)
[20:32:36] 	 (PS7) Ori.livneh: logstash: jessie support and beta cluster cluster [puppet] - https://gerrit.wikimedia.org/r/216337 (https://phabricator.wikimedia.org/T101541) (owner: BryanDavis)
[20:33:11] 	 operations, Wikimedia-General-or-Unknown, Regression, Security-General: svn.wikimedia.org security certificate expired - https://phabricator.wikimedia.org/T88731#1402224 (Dzahn) >>! In T88731#1288890, @RobH wrote: > However, are we still really wanting to support SVN?   No :) It has been removed in...
[20:33:27] 	 ori: what about the API?
[20:34:26] 	 it already sends per module section profiling, maybe it could use responseTime.api-action.*
[20:34:34] 	 operations, Wikimedia-General-or-Unknown, Regression, Security-General: svn.wikimedia.org security certificate expired - https://phabricator.wikimedia.org/T88731#1402241 (Dzahn) svn.wikimedia.org points to general Apache cluster / text-lb for redirects.  So the certificate isn't an issue anymore an...
[20:34:38] 	 (CR) Alex Monk: [C: -1] "Per @Mobrovac on the task, GET /_info please" [puppet] - https://gerrit.wikimedia.org/r/220954 (https://phabricator.wikimedia.org/T103595) (owner: Hashar)
[20:35:19] 	 AaronSchulz: That sounds like a good idea to me. Would you like to take over my patch and add that? I have to babysit prod at the moment, so too distraction-prone.
[20:36:55] 	 ok
[20:37:06] 	 AaronSchulz: I'll review it quickly
[20:38:17] 	 mutante: Nimsoft got upset, heh
[20:38:22] 	 I forgot it tracked svn.wm.o
[20:38:42] 	 ostriches: oh, right. i can fix that 
[20:39:08] 	 (PS1) Dzahn: svn: delete svn.wm.org SSL cert [puppet] - https://gerrit.wikimedia.org/r/220967 (https://phabricator.wikimedia.org/T86655)
[20:41:25] 	 !log deleted SVN monitor from watchmouse 
[20:41:32] 	 Logged the message, Master
[20:41:59] 	 ostriches: it was a "core service":)  done
[20:42:12] 	 Hehehe, ok good
[20:42:51] 	 ostriches: are you doing the cleanup on antimony?
[20:43:32] 	 Planned to, but haven't yet.
[20:43:48] 	 alright
[20:45:06] 	 (PS2) Dzahn: svn: delete svn.wm.org SSL cert [puppet] - https://gerrit.wikimedia.org/r/220967 (https://phabricator.wikimedia.org/T86655)
[20:45:18] 	 do we monitor gerrit/phab from there?
[20:45:52] 	 Krenair: yes, both
[20:45:54] 	 gerrit.wm.o should be, yeah
[20:45:58] 	 see http://status.wikimedia.org/
[20:46:03] 	 ah yes
[20:46:50] 	 interesting, it didn't notice the outage earlier
[20:47:59] 	 (PS3) Dzahn: svn: delete svn.wm.org SSL cert [puppet] - https://gerrit.wikimedia.org/r/220967 (https://phabricator.wikimedia.org/T86655)
[20:48:35] 	 The check for Phab is:
[20:48:38] 	 open https://phabricator.wikimedia.org/T2001
[20:48:49] 	 and check for string "docs are teh suck"
[20:48:58] 	 the delay between checks is 5min
[20:49:51] 	 The check for Gerrit is:
[20:49:55] 	 open https://gerrit.wikimedia.org/r/
[20:50:05] 	 and check for string "Gerrit Code Review"
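The two status.wikimedia.org checks described above amount to "fetch a URL and look for a fixed string" on a five-minute schedule. A minimal sketch of that logic using only Python's standard library; it is not the Watchmouse/Nimsoft implementation, and the `check_page` and `body_matches` names and the 10-second timeout are assumptions.

```python
import urllib.request

# Checks as described in the channel: (URL, string that must appear in the page).
CHECKS = {
    "phabricator": ("https://phabricator.wikimedia.org/T2001", "docs are teh suck"),
    "gerrit": ("https://gerrit.wikimedia.org/r/", "Gerrit Code Review"),
}
INTERVAL_SECONDS = 5 * 60  # delay between checks (5 minutes, as quoted above)


def body_matches(body: str, expected: str) -> bool:
    """The service counts as up only if the expected string is in the page."""
    return expected in body


def check_page(url: str, expected: str, timeout: float = 10.0) -> bool:
    """Fetch url and report whether it served the expected content.

    Connection errors, HTTP error statuses (e.g. a 500) and socket timeouts
    all surface as OSError subclasses in urllib, so they count as "down".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        return False
    return body_matches(body, expected)
```

Calling `check_page(*CHECKS["gerrit"])` once every `INTERVAL_SECONDS` would mimic the schedule. Because the check tests page content rather than only the HTTP status, removing the quoted text from T2001 would make the monitor report Phabricator as down even while it serves pages.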
[20:51:13] 	 (CR) Dzahn: [C: 2] "puppet module installing this has been deleted" [puppet] - https://gerrit.wikimedia.org/r/220967 (https://phabricator.wikimedia.org/T86655) (owner: Dzahn)
[20:54:17] 	 haha
[20:54:59] 	 operations, Wikimedia-General-or-Unknown, Patch-For-Review, Regression, Security-General: svn.wikimedia.org security certificate expired - https://phabricator.wikimedia.org/T88731#1402298 (Dzahn) Open>Resolved a:Dzahn deleted cert from public repo, deleted key from private repo, shredd'...
[20:56:45] 	 operations, Patch-For-Review: Decommission svn.wikimedia.org server (import SVN into Phabricator) - https://phabricator.wikimedia.org/T86655#1402303 (Dzahn) deleted monitor for svn from watchmouse /nimsoft  deleted cert and key from antimony
[20:57:39] 	 mutante, so if anyone ever removes that text, status. will say phab is down?
[20:57:43] 	 (PS1) Gergő Tisza: Autocreate local versions of global accounts on meta, mw.o [mediawiki-config] - https://gerrit.wikimedia.org/r/220970 (https://phabricator.wikimedia.org/T74469)
[20:58:07] 	 PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL 3.39% of data above the critical threshold [1000.0]
[20:58:44] 	 Krenair: yes, but that's why we picked this specific bug. it's 2001, which means it was Bugzilla 1, which is the "docs out of date" thing which, by definition, will never be resolved. :)
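The reasoning above relies on the fixed numbering used when Wikimedia's Bugzilla reports were imported into Phabricator: bug N became task T(N + 2000), so T2001 is Bugzilla bug 1. A small conversion sketch; the helper names are hypothetical, and the reverse mapping only makes sense for tasks that really came from Bugzilla.

```python
BUGZILLA_OFFSET = 2000  # Bugzilla bug N was imported as Phabricator task T(N + 2000)


def phab_task_for_bug(bug_id: int) -> str:
    """Map a Bugzilla bug number to its imported Phabricator task ID."""
    if bug_id < 1:
        raise ValueError("Bugzilla bug numbers start at 1")
    return f"T{bug_id + BUGZILLA_OFFSET}"


def bug_for_phab_task(task: str) -> int:
    """Map an imported task ID such as 'T2001' back to its Bugzilla bug number.

    Only meaningful for tasks imported from Bugzilla; tasks created natively
    in Phabricator have no Bugzilla counterpart.
    """
    digits = task[1:] if task.startswith("T") else task
    number = int(digits)
    if number <= BUGZILLA_OFFSET:
        raise ValueError(f"{task} is below the imported Bugzilla range")
    return number - BUGZILLA_OFFSET
```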
[20:58:54] 	 mutante: do you have any other puppet merges coming up btw? I'll need to remove modules/cassandra submodule
[20:59:23] 	 godog: no, it's a good moment to take a break
[20:59:28] 	 go ahead
[21:00:46] 	 Ops-Access-Requests, operations, Discovery, Discovery-Cirrus-Sprint: Grant access to HTTP request logs - https://phabricator.wikimedia.org/T103872#1402323 (Jdouglas) If all I want to do is write Hive jobs, do I still need such high-level access, or can I get something more limited?
[21:00:59] 	 Krenair: and it's unlikely somebody would want to edit the original bug description. testing actual bug content seemed still better than just some UI element 
[21:01:43] 	 Ops-Access-Requests, operations, Discovery, Discovery-Cirrus-Sprint: Grant access to HTTP request logs - https://phabricator.wikimedia.org/T103872#1402327 (Krenair)
[21:02:39] 	 Ops-Access-Requests, operations, Discovery, Discovery-Cirrus-Sprint: Grant access to HTTP request logs - https://phabricator.wikimedia.org/T103872#1402331 (Ottomata) The webrequest logs are inherently 'private data' (they contain IP addresses etc.) so hence the name.  But, ja, you will need that to...
[21:03:05] 	 ostriches: uhmm.. this one is a blocker for svn decom https://phabricator.wikimedia.org/T95140 ?
[21:04:15] 	 (CR) CSteipp: "I'm wondering if we should have Commons and Wikidata in here too? I don't want to add extra jobs for no reason, but it seems like we expec" [mediawiki-config] - https://gerrit.wikimedia.org/r/220970 (https://phabricator.wikimedia.org/T74469) (owner: Gergő Tisza)
[21:04:57] 	 mutante: I don't think so....
[21:05:52] 	 operations, ops-codfw: EQDFW/EQORD Deployment Prep Task - https://phabricator.wikimedia.org/T91077#1402342 (RobH)
[21:06:07] 	 operations, ops-codfw: EQDFW/EQORD Deployment Prep Task - https://phabricator.wikimedia.org/T91077#1073784 (RobH)
[21:06:11] 	 !log push puppet.git after module/cassandra removal T92560
[21:06:15] 	 phab doesn't actually block us from closing tasks with blockers
[21:06:18] 	 Logged the message, Master
[21:07:53] 	 !log rm /var/lib/git/operations/puppet/modules/cassandra from strontium and palladium
[21:07:58] 	 Logged the message, Master
[21:10:03] 	 !log rm /var/lib/git/operations/puppet/modules/cassandra from rhodium
[21:10:09] 	 Logged the message, Master
[21:10:56] 	 RECOVERY - Unmerged changes on repository puppet on rhodium is OK: No changes to merge.
[21:13:00] 	 thcipriani: ^ you might want to check the beta puppetmaster too for that, it'll fail to git pull by itself until rm -r modules/cassandra is run
[21:13:34] 	 godog: yup, can do
[21:14:20] 	 (Abandoned) Filippo Giunchedi: puppetmaster: don't depend scripts on role::access_new_install [puppet] - https://gerrit.wikimedia.org/r/220180 (https://phabricator.wikimedia.org/T103499) (owner: Filippo Giunchedi)
[21:14:58] 	 thcipriani: cool, thanks!
[21:17:27] 	 andrewbogott: ^ that might affect checked out self puppetmasters too
[21:17:49] 	 operations, Phabricator: Create another instance for phabricator - https://phabricator.wikimedia.org/T103918#1402388 (Paladox) NEW
[21:18:24] 	 operations, Phabricator: Create another instance for phabricator - https://phabricator.wikimedia.org/T103918#1402398 (Paladox)
[21:20:40] 	 operations, Phabricator: Create another instance for phabricator - https://phabricator.wikimedia.org/T103918#1402403 (Krenair) Open>declined a:Krenair This is something that should be in labs, not production.
[21:22:19] 	 operations, Traffic: let all services on misc-web enforce http->https redirects - https://phabricator.wikimedia.org/T103919#1402411 (Dzahn) NEW
[21:23:15] 	 operations, Traffic: check if services behind misc-web enforce http->https redirect or not - https://phabricator.wikimedia.org/T103773#1398743 (Dzahn)
[21:23:18] 	 operations, Traffic: let all services on misc-web enforce http->https redirects - https://phabricator.wikimedia.org/T103919#1402421 (Dzahn)
[21:23:45] 	 (PS1) Ori.livneh: Set hhvm.jit_pseudomain = false on all app servers [puppet] - https://gerrit.wikimedia.org/r/220976
[21:23:47] 	 PROBLEM - Unmerged changes on repository puppet on labcontrol1001 is CRITICAL: There are 29 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[21:23:56] 	 PROBLEM - Unmerged changes on repository puppet on labcontrol1002 is CRITICAL: There are 29 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[21:24:31] 	 (CR) Ori.livneh: [C: 2 V: 2] Set hhvm.jit_pseudomain = false on all app servers [puppet] - https://gerrit.wikimedia.org/r/220976 (owner: Ori.livneh)
[21:24:52] 	 Blocked-on-Operations, operations, RESTBase, RESTBase-Cassandra, Patch-For-Review: move cassandra submodule into puppet repo - https://phabricator.wikimedia.org/T92560#1402438 (fgiunchedi) submodule merged via the procedure in T96016 and puppetmaster updated via `su -c 'rm -r /var/lib/git/operat...
[21:25:02] 	 operations, Traffic: let all services on misc-web enforce http->https redirects - https://phabricator.wikimedia.org/T103919#1402411 (Dzahn) left after removing svn and dev:  git.wikimedia.org graphite.wikimedia.org releases.wikimedia.org grafana.wikimedia.org datasets.wikimedia.org config-master.wikimedia...
[21:25:11] 	 (CR) Ori.livneh: "Grr, has a path conflict. Rebase and I'll merge." [puppet] - https://gerrit.wikimedia.org/r/216337 (https://phabricator.wikimedia.org/T101541) (owner: BryanDavis)
[21:25:20] 	 godog: wait, what might affect it?
[21:25:26] 	 the patch you just abandoned?
[21:25:39] 	 andrewbogott: no, merging modules/cassandra
[21:25:48] 	 I'm looking at labcontrol, likely the same problem
[21:25:50] 	 oh, ok.  dang
[21:26:38] 	 godog: um… I don’t know of an obvious wholesale fix for that, if all self-hosted puppet instances can’t update anymore that’s pretty bad.
[21:27:26] 	 RECOVERY - Unmerged changes on repository puppet on labcontrol1001 is OK: No changes to merge.
[21:27:34] 	 andrewbogott: it is enough to remove modules/cassandra dir to get them going again
[21:27:58] 	 unfortunately I haven't found a workaround for that while merging submodules
[21:29:17] 	 RECOVERY - Unmerged changes on repository puppet on labcontrol1002 is OK: No changes to merge.
[21:29:35] 	 !log rm /var/lib/git/operations/puppet/modules/cassandra from labcontrol1001 labcontrol1002
[21:29:41] 	 Logged the message, Master
[21:29:59] 	 godog: I can remove that dir via salt but even that doesn’t work for projects with local salt masters…
[21:30:07] 	 godog: can you (or whoever broke this) send an email to labs-l?
[21:30:22] 	 Ops-Access-Requests, operations, Patch-For-Review: Requesting stat1002/1003 access for sniedzielski - https://phabricator.wikimedia.org/T97866#1402471 (RobH)
[21:30:24] 	 andrewbogott: I broke it, I'll send an email
[21:30:29] 	 thanks
[21:30:42] 	 I will remove that dir with salt, for those hosts I can reach.
[21:31:26] 	 this is… /var/lib/git/operations/puppet/cassandra?
[21:31:29] 	 or is it in a subdir?
[21:31:37] 	 operations, Traffic, HTTPS, HTTPS-by-default: let all services on misc-web enforce http->https redirects - https://phabricator.wikimedia.org/T103919#1402476 (Krenair)
[21:31:44] 	 operations, Traffic, HTTPS, HTTPS-by-default: let all services on misc-web enforce http->https redirects - https://phabricator.wikimedia.org/T103919#1402478 (Dzahn) https://wikitech.wikimedia.org/wiki/User:Dzahn/misc-web
[21:32:15] 	 (PS8) BryanDavis: logstash: jessie support and beta cluster cluster [puppet] - https://gerrit.wikimedia.org/r/216337 (https://phabricator.wikimedia.org/T101541)
[21:32:16] 	 godog:  so, salt “*” cmd.run “rm -rf /var/lib/git/operations/puppet/modules/cassandra”  ?
[21:32:35] 	 (CR) BryanDavis: "manual rebase" [puppet] - https://gerrit.wikimedia.org/r/216337 (https://phabricator.wikimedia.org/T101541) (owner: BryanDavis)
[21:32:57] 	 operations, Deployment-Systems, Performance-Team, Release-Engineering, and 2 others: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1402487 (Quiddity)
[21:33:17] 	 PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 11 data above and 0 below the confidence bounds
[21:33:25] 	 Ops-Access-Requests, operations: stat1002 access requested for sniedzielski - https://phabricator.wikimedia.org/T103871#1402490 (RobH) We can see that you have discussion on this on T97866, but that same request included access to other items, and those were granted.  Also, the reasons listed on the old t...
[21:33:27] 	 andrewbogott: I'm not sure where the self puppetmaster clones the repo, but yeah likely there
[21:34:12] 	 ok, done
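The workaround discussed above is to delete the stale modules/cassandra checkout left behind by the old submodule, so the next git update can lay the merged files down in its place. A minimal per-host sketch of what the salt one-liner asks each minion to do, assuming the default clone path quoted in the log (self-hosted puppetmasters may clone elsewhere, as noted); the function name is hypothetical.

```python
import os
import shutil

# Path quoted in the log; an assumption for self-hosted puppetmasters,
# which may keep their clone somewhere else.
STALE_SUBMODULE = "/var/lib/git/operations/puppet/modules/cassandra"


def remove_stale_submodule(path: str = STALE_SUBMODULE) -> bool:
    """Hypothetical helper: delete a leftover submodule checkout.

    Returns True if something was removed and False if the path did not
    exist, so running it twice is harmless (like `rm -rf`).
    """
    if not os.path.lexists(path):
        return False
    if os.path.isdir(path) and not os.path.islink(path):
        # Real directory: remove it and everything under it.
        shutil.rmtree(path)
    else:
        # Plain file or symlink: remove only the entry itself.
        os.remove(path)
    return True
```

Calling it on a host where the directory is already gone simply returns False, which matters when the same command is pushed to every instance at once.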
[21:34:27] 	 PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[21:35:03] 	 andrewbogott: sweet, thanks! sorry about the communication failure there :(
[21:35:44] 	 godog: np.  It’s just that I spent my weekend fixing projects with broken self-hosted puppet, so not excited about it breaking again
[21:35:47] 	 RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 1.51 ms
[21:36:04] 	 went to mgmt of 1085 but it's already back
[21:36:35] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1402501 (ellery) @RobH It is these imports that require access to tin:  require_once '/srv/mediawiki/multiversion/MWVersion.php'; require_once getMediaWiki( 'maintenance/commandLine.inc...
[21:37:24] 	 andrewbogott: heh I can imagine, the difference I guess being that puppet is unaware of all this and only git is
[21:39:21] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1402506 (Krenair) That sounds incorrect to me. You should be able to do that on terbium, which can be granted via the restricted group rather than full deployment  access. Which script...
[21:39:39] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1402508 (Legoktm) That script should be runnable from terbium.
[21:39:44] 	 oh heh
[21:41:19] 	 RECOVERY - Incoming network saturation on labstore1001 is OK Less than 10.00% above the threshold [75000000.0]
[21:41:47] 	 PROBLEM - Unmerged changes on repository puppet on rhodium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[21:41:49] 	 (CR) Mobrovac: [C: -1] "Right, GET /_info is the way to go, so no need for a new command, just use check_http_port_url" [puppet] - https://gerrit.wikimedia.org/r/220954 (https://phabricator.wikimedia.org/T103595) (owner: Hashar)
[21:42:31] 	 operations, Traffic, Wikimedia-DNS, Pybal: pybal DNS lookup issues causing outage risks - https://phabricator.wikimedia.org/T103921#1402514 (BBlack) NEW
[21:43:07] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1402522 (MaxSem) So basically you need access to MW shell scripts. I wonder if it makes sense to create a separate group that can run maintenance scripts but can't deploy. Or can there...
[21:43:48] 	 Ops-Access-Requests, operations: stat1002 access requested for sniedzielski - https://phabricator.wikimedia.org/T103871#1402523 (Niedzielski)
[21:45:25] 	 (PS2) Alex Monk: Fix Shinken Mathoid probe [puppet] - https://gerrit.wikimedia.org/r/220954 (https://phabricator.wikimedia.org/T103595) (owner: Hashar)
[21:46:13] 	 anyone deploying?
[21:46:28] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1402532 (Krenair) Please read my above comments, @MaxSem.
[21:46:57] * hoo takes that for a no
[21:46:59] 	 hoo, maybe thcipriani?
[21:47:03] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1402533 (Jalexander) I can confirm, as someone is a member of the restricted group (and other then the analytics group only the restricted group) it gives me access to Terbium and essen...
[21:47:06] 	 I always check `who`
[21:47:32] 	 hoo: Krenair nope not deploying, just have a term open on tin :)
[21:47:35] * hoo prefers w :P
[21:47:51] 	 hoo, I had no idea that was a thing
[21:51:03] 	 Ops-Access-Requests, operations: stat1002 access requested for sniedzielski - https://phabricator.wikimedia.org/T103871#1402572 (Niedzielski) @RobH, sorry for the confusion! I wish to perform statistical analysis of the user agent strings and other properties of page views. I'm not sure which databases th...
[21:54:00] 	 Ops-Access-Requests, operations: stat1002 access requested for sniedzielski - https://phabricator.wikimedia.org/T103871#1402585 (dr0ptp4kt) Approved.
[21:54:37] 	 Ops-Access-Requests, operations: stat1002 access requested for sniedzielski - https://phabricator.wikimedia.org/T103871#1402588 (RobH) p:Triage>High I'll set this from 'needs triage' to high until we get all the info and approvals needed.  Once those come in it can hit normal until end of 3 day wait...
[21:54:53] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1402590 (Jalexander) >>! In T103782#1402533, @Jalexander wrote: > tldr: No need for a new group, restricted already does it, but given that I think I may be the only person who has it w...
[21:59:34] 	 (CR) Gergő Tisza: "Not sure what effect that would have. Normally the account is autocreated on the fly when you visit the wiki. I can think of three failure" [mediawiki-config] - https://gerrit.wikimedia.org/r/220970 (https://phabricator.wikimedia.org/T74469) (owner: Gergő Tisza)
[22:00:09] 	 robh: mind if i do the access requests ?
[22:01:59] 	 !log hoo Synchronized php-1.26wmf11/extensions/Wikidata/: Update Wikidata: Use SELECT FOR UPDATE in SqlIdGenerator (duration: 00m 20s)
[22:02:06] 	 Logged the message, Master
[22:02:52] 	 (PS2) Gergő Tisza: Autocreate accounts on meta, mediawiki.org, loginwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/220970 (https://phabricator.wikimedia.org/T74469)
[22:03:35] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1402624 (RobH) a:ori>ellery So it seems this request can be adjusted to the following:  **Grant Ellery Wulczyn @ellery access to terbium via the restricted group.**    restricted:...
[22:04:47] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1402628 (RobH) Also feed free to edit your task description and title to reflect the proposed change above.  (If you agree, it seems more legitimate if you change it.)  If you agree oth...
[22:06:56] 	 Ops-Access-Requests, operations: stat1002 access requested for sniedzielski - https://phabricator.wikimedia.org/T103871#1402634 (Matanya) a:Matanya
[22:07:16] 	 RECOVERY - carbon-cache too many creates on graphite1001 is OK Less than 1.00% above the threshold [500.0]
[22:10:57] 	 (CR) CSteipp: [C: 1] "Agree it shouldn't be a problem in most cases." [mediawiki-config] - https://gerrit.wikimedia.org/r/220970 (https://phabricator.wikimedia.org/T74469) (owner: Gergő Tisza)
[22:12:15] 	 (PS1) Matanya: access: grant niedzielski access to stat1002 [puppet] - https://gerrit.wikimedia.org/r/220989
[22:13:01] 	 (CR) Hoo man: [C: 1] "Looks good to me... and yes, having accounts auto create on wikidatawiki would be awesome." [mediawiki-config] - https://gerrit.wikimedia.org/r/220970 (https://phabricator.wikimedia.org/T74469) (owner: Gergő Tisza)
[22:13:22] 	 (PS2) Matanya: access: grant niedzielski access to stat1002 [puppet] - https://gerrit.wikimedia.org/r/220989
[22:13:58] 	 Puppet, Phabricator: Local config file contains escape characters - https://phabricator.wikimedia.org/T103924#1402659 (Negative24) NEW
[22:16:01] 	 EQDFW/EQORD are new DC's?
[22:18:48] 	 (PS1) Matanya: access: grant Jdouglas access toanalytics-privatedata-users group [puppet] - https://gerrit.wikimedia.org/r/220990
[22:22:54] 	 (PS5) Rush: pybal: pybal-eval-check use syslog as well as stdout [puppet] - https://gerrit.wikimedia.org/r/220948
[22:23:58] 	 (PS1) Andrew Bogott: Add a labsproject fact that doesn't rely on ldap config. [puppet] - https://gerrit.wikimedia.org/r/220991
[22:24:52] 	 (PS2) Andrew Bogott: Add a labsproject fact that doesn't rely on ldap config. [puppet] - https://gerrit.wikimedia.org/r/220991
[22:25:14] 	 (CR) Rush: [C: 2] pybal: pybal-eval-check use syslog as well as stdout [puppet] - https://gerrit.wikimedia.org/r/220948 (owner: Rush)
[22:26:13] 	 (PS3) Andrew Bogott: Add a labsproject fact that doesn't rely on ldap config. [puppet] - https://gerrit.wikimedia.org/r/220991 (https://phabricator.wikimedia.org/T93684)
[22:31:10] 	 operations, Deployment-Systems, Performance-Team, Release-Engineering, and 2 others: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1402775 (faidon) > Iterate on the graceful restart procedure until it no longer generates alerts o...
[22:31:38] 	 (CR) Giuseppe Lavagetto: [C: 1] "Looking at the compiler, it does what we expect." [puppet] - https://gerrit.wikimedia.org/r/220645 (owner: BBlack)
[22:35:39] 	 matanya: they're peering cabinets or so I think. robh will explain it better
[22:36:06] 	 ah, makes sense. thanks
[22:36:30] 	 They're just a cabinet - I know that much :)
[22:36:42] 	 Not in codfw but else where in da
[22:36:51] 	 *dallas I believe
[22:36:58] 	 probably peering exchange.
[22:37:20] 	 they're network points of presence
[22:37:34] 	 in a more central DC in Dallas (eqdfw) and Chicago (eqord)
[22:37:54] 	 both are going to connected to the local internet exchange
[22:37:58] 	 paravoid: oh Chicago? Okay :)
[22:37:58] 	 but it's not just that
[22:38:09] 	 Chicago is going to get links to Ashburn/Dallas/San Francisco
[22:38:40] 	 instead of having Ashburn-San Francisco and Ashburn-Dallas, we're splitting this in the middle and saving some money
[22:38:48] 	 plus reach the local internet community
[22:38:57] 	 plus increase our resiliency to failures
[22:39:15] 	 sounds good!
[22:39:35] 	 ori, that must be a first .. referencing bhagavad gita in a phab ticket. :)
[22:40:11] 	 paravoid: now i remember i saw a mail about this sometime ago
[22:40:59] 	 (CR) Alex Monk: "I'd kind of prefer to have it properly logged etc.... At least we should make sure their page about the filemover group explains the chang" [mediawiki-config] - https://gerrit.wikimedia.org/r/218926 (https://phabricator.wikimedia.org/T102770) (owner: Glaisher)
[22:41:19] 	 PROBLEM - Incoming network saturation on labstore1001 is CRITICAL 10.71% of data above the critical threshold [100000000.0]
[22:42:37] 	 Krenair: if you wish i can do that ^ with my other hat
[22:43:08] 	 matanya, ahh you're a steward right?
[22:43:13] 	 yes
[22:43:23] 	 that should be enough
[22:43:35] 	 ok, let me know what you wish to do
[22:44:05] 	 although they do have some local bureaucrats: https://it.wikipedia.org/wiki/Speciale:Utenti/bureaucrat
[22:44:25] 	 m7 and vito are stewards as well
[22:45:26] 	 jouncebot, next
[22:45:27] 	 In 0 hour(s) and 14 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150625T2300)
[22:45:30] 	 I think that makes sense
[22:46:14] 	 so you want me to do that in 14 minutes, or wait for them to be online ?
[22:46:47] 	 PROBLEM - Incoming network saturation on labstore1001 is CRITICAL 10.34% of data above the critical threshold [100000000.0]
[22:47:08] 	 sigh...
[22:47:12] 	 is wikitech not letting anyone else in?
[22:47:49] 	 wfm'
[22:48:20] 	 Krenair: summoned Vito for you
[22:48:53] 	 matanya, as in, you can log in?
[22:48:56] 	 or just view?
[22:49:01] 	 log in Krenair 
[22:49:08] 	 what's up Krenair?
[22:49:16] 	 PROBLEM - puppet last run on db2059 is CRITICAL puppet fail
[22:49:47] 	 Vito: https://gerrit.wikimedia.org/r/218926
[22:50:15] 	 Vito, so with this change, once it's done you can't remove users from it
[22:50:28] 	 look at the task :D
[22:50:31] 	 so the users need to be removed, then the change sync'd, then the users re-added
[22:51:03] 	 YuviSheep, is wikitech login broken for you as well?
[22:51:13] 	 I planned to open two tasks for this reaons
[22:51:18] 	 *reason
[22:51:53] 	 matanya, when I try logging into wikitech it just pretends I didn't attempt to log in :/
[22:52:00] 	 it takes me to the returnto page... but still logged out
[22:52:10] 	 hmm, weird
[22:52:23] 	 might be something to do with mfa?
[22:53:41] 	 done Krenair
[22:54:30] 	 so... since I can't log in to wikitech to add these to the calendar, I plan to swat:
[22:54:35] 	 * https://gerrit.wikimedia.org/r/#/c/218926/
[22:54:38] 	 * https://gerrit.wikimedia.org/r/#/c/220847/
[22:54:43] 	 Krenair: i don't know, but i can add it for you
[22:54:49] 	 should poke YuviSheep or andrewbogott_afk 
[22:55:07] 	 RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected
[22:57:25] 	 matanya, interesting. it works if I load it in incognito mode
[22:57:29] 	 but not normal browsing...
[22:57:42] 	 so you might have some gadget or something like that
[22:58:14] 	 okay, cleared cookies, wikitech lets me log in
[22:58:16] 	 weird.
[23:00:05] 	 RoanKattouw ostriches rmoen: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150625T2300).
[23:00:05] 	 gilles bd808: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:10] 	 okay
[23:01:02] 	 gilles, around?
[23:01:10] 	 Krenair: yes
[23:01:14] 	 (PS2) Alex Monk: Enable TinyRGB ICC profile swapping [mediawiki-config] - https://gerrit.wikimedia.org/r/220485 (https://phabricator.wikimedia.org/T100976) (owner: Gilles)
[23:01:15] 	 o/
[23:01:21] 	 (CR) Alex Monk: [C: 2] Enable TinyRGB ICC profile swapping [mediawiki-config] - https://gerrit.wikimedia.org/r/220485 (https://phabricator.wikimedia.org/T100976) (owner: Gilles)
[23:01:21] 	 Krenair: are you swatting? 
[23:01:27] 	 (Merged) jenkins-bot: Enable TinyRGB ICC profile swapping [mediawiki-config] - https://gerrit.wikimedia.org/r/220485 (https://phabricator.wikimedia.org/T100976) (owner: Gilles)
[23:01:30] 	 rmoen, yep
[23:01:36] 	 Krenair: ok ty
[23:02:24] 	 !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220485/ (duration: 00m 16s)
[23:02:26] 	 gilles, done, please test
[23:02:33] 	 Logged the message, Master
[23:02:33] 	 Krenair: sure thing
[23:03:12] 	 Krenair: the test for mine is logging doesn't blow up in prod
[23:03:18] 	 very scientific
[23:03:24] 	 heh
[23:03:48] 	 !log fixed content models on lrcwiki for Module namespace
[23:03:54] 	 Logged the message, Master
[23:05:37] 	 RECOVERY - puppet last run on db2059 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:06:21] 	 Krenair: change works, thank you
[23:06:40] 	 (PS3) Alex Monk: logging: Force Monolog logger timezone to UTC [mediawiki-config] - https://gerrit.wikimedia.org/r/220784 (https://phabricator.wikimedia.org/T99581) (owner: BryanDavis)
[23:06:47] 	 (CR) Alex Monk: [C: 2] logging: Force Monolog logger timezone to UTC [mediawiki-config] - https://gerrit.wikimedia.org/r/220784 (https://phabricator.wikimedia.org/T99581) (owner: BryanDavis)
[23:09:12] 	 Is jenkins asleep?
[23:09:42] 	 (Merged) jenkins-bot: logging: Force Monolog logger timezone to UTC [mediawiki-config] - https://gerrit.wikimedia.org/r/220784 (https://phabricator.wikimedia.org/T99581) (owner: BryanDavis)
[23:09:49] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1402986 (Krenair) Is this request related to the email we all just got in French, something about translating French Wikipedia pages?
[23:11:38] 	 bd808, syncing
[23:11:44] 	 !log krenair Synchronized wmf-config/logging.php: https://gerrit.wikimedia.org/r/#/c/220784/ (duration: 00m 13s)
[23:11:51] 	 Logged the message, Master
[23:12:15] 	 Krenair: see -research
[23:13:24] 	 and the answer to your question on the phab ticket seems to be :"no" :)
[23:13:25] 	 Krenair: logs getting into logstash and nothing exciting in fatalmonitor. I'd call that a win
[23:13:33] 	 certainly seems ok
[23:14:03] 	  nothing exciting in fatalmonitor << only boring fatals? :)
[23:14:28] 	 (CR) Alex Monk: [C: 2] Create 'mover' usergroup at Italian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/218926 (https://phabricator.wikimedia.org/T102770) (owner: Glaisher)
[23:14:51] 	 I am temted to create an app that converts reports from the bot to music
[23:14:59] 	 (Merged) jenkins-bot: Create 'mover' usergroup at Italian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/218926 (https://phabricator.wikimedia.org/T102770) (owner: Glaisher)
[23:15:01] 	 *tempted
[23:15:18] 	 so done?
[23:15:54] 	 Vito, not yet
[23:15:58] 	 !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/218926/ (duration: 00m 12s)
[23:15:59] 	 I see
[23:16:00] 	 it's still rolling out to the servers
[23:16:04] 	 and *now* it's done :)
[23:16:04] 	 Logged the message, Master
[23:16:08] 	 hehehe
[23:16:36] 	 It has to push to about 478 servers at the moment
[23:16:50] 	 so, takes a few seconds
[23:18:33] 	 Utente:Vegetable and you are done Vito 
[23:18:41] 	 already done
[23:18:47] 	 ah
[23:18:49] 	 great
[23:18:57] 	 Who's doing swat?
[23:19:00] 	 me
[23:19:08] 	 got something to add Krinkle?
[23:19:21] 	 Krenair: Yeah :) Minor SyntaHighlight patch
[23:19:25] 	 cool
[23:19:59] 	 [extensions/SyntaxHighlight_GeSHi] (wmf/1.26wmf11) - https://gerrit.wikimedia.org/r/220997
[23:20:01] 	 !log canary update of restbase on restbase1001 to 4b961f166 (deploy d1c4d9961)
[23:20:06] 	 Logged the message, Master
[23:20:55] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1403030 (ellery) Here are the details requested for account creation:  Your full name: Ellery Wulczyn Your labs username/wikitech username: ewulczyn Your preferred shell user name: elle...
[23:22:07] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1403032 (ellery) a:ellery>None
[23:22:44] 	 ty Krenair
[23:23:37] 	 Ops-Access-Requests, operations: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1398996 (ellery) @DarTar @Tnegrin could one of you guys approve this request?
[23:24:43] 	 !log krenair Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: https://gerrit.wikimedia.org/r/#/c/220997/ (duration: 00m 13s)
[23:24:44] 	 Krinkle, ^ please test
[23:24:49] 	 Logged the message, Master
[23:25:06] 	 Krenair: Done. Works perfectly, https://www.mediawiki.org/wiki/Extension:SyntaxHighlight_GeSHi#inline
[23:25:10] 	 great
[23:25:26] 	 PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL 1.69% of data above the critical threshold [1000.0]
[23:25:35] 	 (CR) Alex Monk: [C: 2] Revert "Revert "Pull out unnecessary wikitech settings, move some into CommonSettings"" [mediawiki-config] - https://gerrit.wikimedia.org/r/220847 (owner: Alex Monk)
[23:26:00] 	 (Merged) jenkins-bot: Revert "Revert "Pull out unnecessary wikitech settings, move some into CommonSettings"" [mediawiki-config] - https://gerrit.wikimedia.org/r/220847 (owner: Alex Monk)
[23:27:07] 	 !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/220847/ (duration: 00m 11s)
[23:27:13] 	 Logged the message, Master
[23:27:59] 	 !log krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/220847/ (duration: 00m 12s)
[23:28:05] 	 Logged the message, Master
[23:29:32] 	 seems ok...
[23:31:11] 	 (PS1) Ori.livneh: protoproxy: Set X-SPDY header on proxied requests [puppet] - https://gerrit.wikimedia.org/r/221000
[23:31:26] 	 PROBLEM - Krenair is CRITICAL 100% above critical threshold.
[23:31:53] 	 !log apt-get upgrade on zirconium
[23:31:59] 	 Logged the message, Master
[23:32:06] 	 :)
[23:32:16] 	 PROBLEM - Incoming network saturation on labstore1001 is CRITICAL 14.29% of data above the critical threshold [100000000.0]
[23:32:24] 	 godog:  PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL 1.69% of data above the critical threshold [1000.0]
[23:34:10] 	 (PS9) Ori.livneh: logstash: jessie support and beta cluster cluster [puppet] - https://gerrit.wikimedia.org/r/216337 (https://phabricator.wikimedia.org/T101541) (owner: BryanDavis)
[23:34:15] 	 (CR) Ori.livneh: [C: 2 V: 2] logstash: jessie support and beta cluster cluster [puppet] - https://gerrit.wikimedia.org/r/216337 (https://phabricator.wikimedia.org/T101541) (owner: BryanDavis)
[23:35:40] 	 mutante: remind me how to find a uid in labs please ldapsearch -l something ?
[23:36:34] 	 matanya: ldaplist -l passwd matanya
[23:37:02] 	 or just `id matanya`
[23:38:08] 	 thank you both
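Both lookups suggested above (`ldaplist -l passwd <user>` and `id <user>`) resolve the account through the host's directory setup. A minimal sketch of the same uid lookup from Python, assuming the instance's NSS is configured to resolve LDAP accounts (which is what makes `id` work there); the helper name is hypothetical, while `pwd.getpwnam` is the standard-library call.

```python
import pwd


def lookup_uid(username: str) -> int:
    """Hypothetical helper: return the numeric uid for username.

    pwd.getpwnam goes through the system account database (NSS), so it
    sees the same accounts `id` does. Raises KeyError if the user is unknown.
    """
    return pwd.getpwnam(username).pw_uid
```

On a labs instance, `lookup_uid("matanya")` would return the same number `id matanya` prints as `uid=`.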
[23:38:39] 	 andrewbogott_afk, I guess we don't  need $wgOpenStackManagerProxyGateways['pmtpa'] anymore?
[23:38:44] 	 (CR) Dzahn: [C: 2] add вікімедіа.укр (xn--80adgdym4pbd.xn--j1amh) [dns] - https://gerrit.wikimedia.org/r/215212 (https://phabricator.wikimedia.org/T95433) (owner: Dzahn)
[23:39:18] 	 ori: thanks, yeah new restbase deploy creates new tables
[23:40:55] 	 (PS1) Matanya: access: adding Ellery Wulczyn to user with prod access [puppet] - https://gerrit.wikimedia.org/r/221004
[23:41:33] 	 godog: what is the current disk situation on the graphite host?
[23:42:52] 	 (PS1) Matanya: access: Grant Ellery Wulczyn @ellery access to terbium via the restricted group [puppet] - https://gerrit.wikimedia.org/r/221006
[23:43:08] 	 gwicke: https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=graphite1001.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1435275769&v=75.5&m=part_max_used&vl=%25&ti=Maximum%20Disk%20Space%20Used&z=large
[23:44:19] 	 (CR) Matanya: "deploy this one before : https://gerrit.wikimedia.org/r/#/c/221004/" [puppet] - https://gerrit.wikimedia.org/r/221006 (owner: Matanya)
[23:46:26] 	 operations, vm-requests, Patch-For-Review: EQIAD: 1 VM request for planet - https://phabricator.wikimedia.org/T101899#1403132 (Dzahn) @akosiaris thank you very much. afraid i don't have much to add why you could not reproduce it, but happy that it works now
[23:48:04] 	 Ops-Access-Requests, operations, Patch-For-Review: Deployment Access to tin for Ellery Wulczyn - https://phabricator.wikimedia.org/T103782#1403140 (Platonides) Krenair, I would bet that it is. I wonder how he could send those emails when the task isn't filled, though.
[23:49:25] 	 Why do some people have corp.wikimedia.org as part of their ssh key comments?
[23:50:17] 	 (CR) Alex Monk: [C: -1] "Already exists" [puppet] - https://gerrit.wikimedia.org/r/221004 (owner: Matanya)
[23:50:46] 	 Krenair: because they get laptops with preinstalled OS from OIT
[23:50:54] 	 and the hostname shows up in key comment
[23:53:17] 	 !log planet1001 (ganeti) - signing puppet cert, initial run
[23:53:24] 	 Logged the message, Master
[23:54:37] 	 (PS1) Alex Monk: More wikitech cleanup [mediawiki-config] - https://gerrit.wikimedia.org/r/221009 (https://phabricator.wikimedia.org/T75939)
[23:54:40] 	 (CR) Ori.livneh: [C: 2] "Thanks for jumping on this so quickly." [tools/scap] - https://gerrit.wikimedia.org/r/220941 (https://phabricator.wikimedia.org/T103886) (owner: BryanDavis)
[23:55:00] 	 (Merged) jenkins-bot: Add an hhvm-graceful-all command [tools/scap] - https://gerrit.wikimedia.org/r/220941 (https://phabricator.wikimedia.org/T103886) (owner: BryanDavis)
[23:56:26] 	 (Abandoned) Matanya: access: adding Ellery Wulczyn to user with prod access [puppet] - https://gerrit.wikimedia.org/r/221004 (owner: Matanya)
[23:56:44] 	 (CR) Matanya: "ignore above comment." [puppet] - https://gerrit.wikimedia.org/r/221006 (owner: Matanya)
[23:59:34] 	 operations, Database: review eqiad database server quantities / warranties / service(s) - https://phabricator.wikimedia.org/T103936#1403190 (RobH) NEW a:RobH