[00:08:32] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[00:21:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:23:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[00:27:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:28:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time
[00:33:52] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Mon Aug 5 00:33:49 UTC 2013
[00:34:32] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[00:52:21] Hi RD.
[00:52:36] RD: You could try filing a bug.
[01:26:22] done
[02:05:54] !log LocalisationUpdate completed (1.22wmf12) at Mon Aug 5 02:05:54 UTC 2013
[02:06:12] Logged the message, Master
[02:14:37] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Aug 5 02:14:37 UTC 2013
[02:14:48] Logged the message, Master
[03:18:10] TimStarling: are you there? https://bugzilla.wikimedia.org/52539 should be a quick fix
[03:18:18] i.e. very little work
[03:20:44] hrmm, seems to be done now. but without a link
[03:29:44] danke tim!
[05:57:41] (PS1) Ori.livneh: Rename 'redis.py' to 'redis_monitoring.py' to avoid conflict [operations/puppet] - https://gerrit.wikimedia.org/r/77657
[06:07:15] (CR) Faidon: [C: 2] "This isn't used and it's soon going to be replaced but why not..." [operations/puppet] - https://gerrit.wikimedia.org/r/77164 (owner: Hashar)
[06:08:12] ori-l: tested?
[06:08:15] the redis change
[06:19:09] (CR) Faidon: [C: 1] "(2 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh)
[07:09:31] (PS2) Faidon: Rename 'redis.py' to 'redis_monitoring.py' to avoid conflict [operations/puppet] - https://gerrit.wikimedia.org/r/77657 (owner: Ori.livneh)
[08:34:17] (CR) Hashar: "I have been struck by that one as well with the Jenkins module:" [operations/puppet] - https://gerrit.wikimedia.org/r/77657 (owner: Ori.livneh)
[09:23:23] huh
[09:23:25] icinga's down
[09:23:28] neon is unresponsive
[09:23:29] how nice
[09:23:58] oom'ed
[09:24:50] login: failure forking: Cannot allocate memory
[09:25:51] !log powercycling neon, OOM'ed, "login: failure forking: Cannot allocate memory" etc.; icinga down since ~01:00 UTC
[09:26:02] Logged the message, Master
[09:34:43] hey mark, will you have some time the coming two weeks to start looking into varnishkafka performance so that we have some data to discuss by August 15th?
[09:59:55] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[09:59:55] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[09:59:55] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run
[12:08:20] (PS1) TTO: Add autopatrol protection level for ckbwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/77701
[12:40:02] I'd like to add some useful tools (for instance a centralauth link) here https://login.wikimedia.org/wiki/MediaWiki:Sp-contributions-footer
[12:40:25] anyone knows why I cannot do it even if I've just given myself admin rights on this wiki?
[12:48:23] Vito: because according to https://login.wikimedia.org/wiki/Special:ListGroupRights no one has the "editinterface" right
[12:48:51] MatmaRex: I can give it to myself actually
[12:48:52] (compare to https://en.wikipedia.org/wiki/Special:ListGroupRights etc)
[12:49:24] Vito: hm?
[12:49:32] MatmaRex: loginwiki is getting a battleground for spambots
[12:49:54] so we (stewies) need to do so many checkusages
[12:50:07] Vito: in MW's permission system, user groups (e.g. Administrators) can have one of the many user rights (e.g. editprotected, editinterface)
[12:50:07] having some links in the footer would be quite useful
[12:50:21] I can give editinterface to my globalgroup
[12:50:21] Vito: here, the editinterface right is not assigned to any group
[12:50:36] via Special:GlobalGroupPermissions
[12:50:37] okay, do this then :)
[12:51:04] yep, it will be just a temporary measure ofc :D
[12:51:09] i was just explaining why it doesn't work normally
[12:51:23] oh we already have :|
[12:51:48] actually the error message I'm getting is quite weird
[12:51:53] it's the "no such page" one
[12:52:11] heh
[12:52:13] 404 error actually
[12:52:15] ah, true
[12:52:16] https://login.wikimedia.org/w/index.php?title=MediaWiki:Sp-contributions-footer&action=edit
[12:52:36] heh, that's funny
[12:53:00] Vito: https://login.wikimedia.org/wiki/MediaWiki:Sp-contributions-footer?action=edit apparently works
[12:53:13] but trying to save might point to that 404 again
[12:53:25] you're going to need an actual operator for this, i'm afraid :)
[12:53:52] I see, ty!
[12:54:33] my omnipotence has met its upper limit :D
[12:54:55] I can give editinterface to my globalgroup p858snake|l: I think they are overwritten by initialsettings.php
[12:58:31] in fact we already have global editinterface but it still doesn't work
[12:58:56] ah good, they were setup right then
[12:59:16] Vito: Yes, chris has restricted rights on the wiki for a reason
[12:59:34] nope, now I can create them
[12:59:43] I need to add createpage right
[13:10:25] p858snake|l, MatmaRex: https://login.wikimedia.org/wiki/Special:Contributions/Vituzzu
[13:12:42] SECURRRRITY BRRRREACH
[13:12:43] :D
[13:12:56] z0mg hacking!!!11!1!!11
[13:13:18] kids, lesson for today: it's very hard to block editing on a MediaWiki wiki! :P
[13:14:12] MatmaRex: nah, chris and Reedy probably just didn't think about global user rights when restricting the rights/tools on loginwiki
[13:14:49] probably. doesn't mean we can't make fun of them :D
[13:15:04] p858snake|l: though global rights are supposed to be given to wise people
[13:15:10] I'm an exception btw
[13:15:33] local rights are meant to be given to wise people
[13:15:47] global ones aren't? :O
[13:26:58] Say what now?
[13:27:39] I suspect the chance of abuse is relatively low
[13:29:25] Does someone want to open a bug for this?
[13:29:37] to me there's no bug
[13:29:55] Well, there is and there isn't
[13:30:25] It's somewhat based on the hope that people who may have this ability aren't likely to abuse it
[13:30:45] Reedy: I would imagine loginwiki just needs to be taken out of the global wikisets
[13:31:07] Reedy: Isn't that the same theory for local user rights?
[13:31:23] We purposely removed them for the local users
[13:31:39] * Reedy shrugs
[13:31:53] I know that, thats why I brought this whole thing up with the global groups basically
[13:32:36] loginwiki out of global wikiset would get useless
[14:00:07] !log Put non-European wikivoyage traffic on the eqiad text Varnish cluster
[14:00:18] Logged the message, Master
[14:14:39] (PS1) Mark Bergsma: Move wikivoyage from the text to the text-varnish LVS group [operations/puppet] - https://gerrit.wikimedia.org/r/77710
[14:14:49] (PS1) ArielGlenn: really fix getting last deployment date [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/77712
[14:15:45] (CR) Mark Bergsma: [C: 2] Move wikivoyage from the text to the text-varnish LVS group [operations/puppet] - https://gerrit.wikimedia.org/r/77710 (owner: Mark Bergsma)
[14:17:41] Vito, yes that is a bug, you just exploited it, you're not supposed to be able to do that
[14:31:59] Reedy, did you open a bug or shall I?
[14:33:53] Krenair: someone filed a bug under Security
[14:34:15] (i don't have access to it, i just know it exists, because i lurk too much)
[14:34:33] oh okay, one was filed then.
[14:35:42] It's strange how the length of the URL box and the position of the severity/hardware/OS fields change with the width of the description box on the create bug page
[14:47:13] um, how?
[14:52:47] how?
[14:53:38] well, you drag the textarea to be wider, and the URL box gets wider with it, the severity/hardware/OS fields move to the right..
[14:55:51] heh, it doesn't for me
[14:56:03] but, well, the whole form is a huge four-column table with some colspans
[14:56:13] so i guess if you make the table wider, its columns get wider as well
[14:57:25] reporter text also moves
[14:58:16] everything moves, because that's how tables work, except apparently on Opera. :P
[15:14:32] heya, apergos
[15:14:42] the /usr/local/bin/daily-pagestats-copy.sh job on snapshot1 seemed to be broken
[15:14:51] rsync: mkdir "/mnt/data/pagecounts/incoming" failed: No such file or directory (2)
[15:14:54] I created that directory
[15:15:00] and it seems to be running now
[15:15:34] ugh
[15:15:43] that would be me, earlier today from stupidity
[15:15:58] that means it should have not lost anything. thanks for catching it
[15:17:23] that's daily, I wonder why it has files from pagecounts-20130725-020004.gz in there
[15:17:27] ottomata: ?
[15:18:06] dunno, i just created the dir
[15:18:08] and ran the job manually
[15:18:11] it looks like it is rsyncing everything
[15:18:44] maybe stuff is kept around in there for a week or two
[15:20:18] (PS1) Mark Bergsma: Add Text caches eqiad cluster to Ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/77715
[15:21:26] (CR) Mark Bergsma: [C: 2] Add Text caches eqiad cluster to Ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/77715 (owner: Mark Bergsma)
[15:40:37] (PS1) Jgreen: switch to exim::roled for otrs server [operations/puppet] - https://gerrit.wikimedia.org/r/77717
[15:40:59] (CR) jenkins-bot: [V: -1] switch to exim::roled for otrs server [operations/puppet] - https://gerrit.wikimedia.org/r/77717 (owner: Jgreen)
[15:50:09] (PS2) Jgreen: switch to exim::roled for otrs server [operations/puppet] - https://gerrit.wikimedia.org/r/77717
[15:50:27] (CR) jenkins-bot: [V: -1] switch to exim::roled for otrs server [operations/puppet] - https://gerrit.wikimedia.org/r/77717 (owner: Jgreen)
[15:51:38] ganglia down?
[15:52:31] Aug 5 15:48:02 nickel kernel: [6865893.044225] gmetad[1570]: segfault at 18 ip 00007f2e70e0acde sp 00007f2e64986c20 error 4 in librrd.so.4.0.7[7f2e70de8000+31000]
[15:52:34] nice
[15:53:09] oh my, /mnt/ganglia_tmp is 100% full
[15:53:43] anyone familiar with the setup and what that is for?
[15:53:54] LeslieCarr is
[15:53:57] i forget who else
[15:54:16] I gave it 10% more
[15:55:29] !log cleared old cruft snapmirror replication from nas1-a to nas1001-a. Direction has been reverted long ago
[15:55:41] Logged the message, Master
[15:57:40] (PS1) Faidon: Ganglia: increase tmpfs size from 3G to 4G [operations/puppet] - https://gerrit.wikimedia.org/r/77718
[15:57:50] icinga AND ganglia broken the same day
[15:58:13] (CR) Faidon: [C: 2] Ganglia: increase tmpfs size from 3G to 4G [operations/puppet] - https://gerrit.wikimedia.org/r/77718 (owner: Faidon)
[15:58:15] garg. can someone with fresh eyes look at the elusive typo gerrit's parser claims to be finding in my latest commit?
[15:58:37] https://gerrit.wikimedia.org/r/#/c/77717/
[15:58:43] for the life of me I can't see it
[15:59:51] *click*
[16:02:43] you use $ as class parameters
[16:02:55] AH
[16:02:58] class { 'exim::roled': 11 $enable_otrs_server => 'true'
[16:03:03] drop the $ there
[16:03:03] right, i see it now
[16:03:13] also, add a trailing comma to the enable_spamassassin option
[16:03:17] that's not a syntax error, it's just nice :)
[16:03:30] what's the point of the commented out block?
[16:03:37] 502s from gitblit :(
[16:03:39] just drop it?
[16:03:49] ^d: ^
[16:04:09] paravoid: below? i'll drop it in a few minutes assuming I can get exim::roled to do my bidding
[16:04:17] just drop it now? :)
[16:04:19] I mean, we have git
[16:04:27] if you need to get it back, you can just revert
[16:04:35] paravoid: just a work style difference
[16:05:02] drdee: we've had the "do we need nginx udp2log" thread like 4-5 times now
[16:05:18] drdee: I'll let you (analytics) figure out if we need it first, then I'll comment again :)
[16:05:48] (PS3) Jgreen: switch to exim::roled for otrs server [operations/puppet] - https://gerrit.wikimedia.org/r/77717
[16:06:33] afaik we need it, i didn't understand ottomata's reply; maybe you can reply regardless?
[16:06:47] I don't think we do
[16:07:19] but that means we don't have accurate monitoring of the nginx servers and they are becoming more important
[16:07:33] what andrew said is that we use nginx as an SSL (and IPv6) proxy, all requests hitting nginx end up hitting varnish/squid
[16:07:55] yes, and when we disable nginx from udp2log then we discovered we had big page count problems
[16:08:11] because those requests get internal ip addresses and hence are being filtered
[16:08:14] so a) you don't lose analytics by not counting them b) even if we fixed the module, I remember you didn't have a way to detect that these requests are duplicates
[16:08:16] (CR) Jgreen: [C: 2 V: 2] switch to exim::roled for otrs server [operations/puppet] - https://gerrit.wikimedia.org/r/77717 (owner: Jgreen)
[16:08:26] so you would end up double-counting them
[16:08:44] we are not double counting them because we filter internal ip addresses
[16:09:08] but that means we don't have a reliable packet loss monitoring
[16:09:18] for nginx; i think that's an issue
[16:10:13] packet loss on the udp2log stream you mean?
[16:10:23] yes
[16:10:34] but you don't use that stream I think
[16:11:14] i am pretty sure we do :)
[16:12:05] <^d> jeremyb: On it, grr.
[16:12:50] okay
[16:12:54] <^d> jeremyb: I'm going to finish the icinga monitoring for it this week.
[16:13:43] <^d> jeremyb: Ok, back up. Thanks.
[16:14:38] (PS1) Akosiaris: Refactor nrpe to a module [operations/puppet] - https://gerrit.wikimedia.org/r/77720
[16:14:52] (CR) jenkins-bot: [V: -1] Refactor nrpe to a module [operations/puppet] - https://gerrit.wikimedia.org/r/77720 (owner: Akosiaris)
[16:15:25] ^d: danke
[16:15:51] ^d: any tips for how to teach irssi that you should not be at the end of the list for d ? :-)
[16:16:13] <^d> Heh, nope :)
[16:16:27] Type "^".
[16:16:48] jeremyb: huh! I didn't know it would consider d a way to get to ^d at all, but you're right!
[16:16:57] what Elsie said, though
[16:17:23] also, irssi remembers your last completion and puts that at the top, not sure it'll work for this d but I really mean ^ case
[16:18:24] greg-g: well i used to do de and that got me to chad much quicker. no, it's not remembering ^d was recently used. i suppose i could bug rhonda about it
[16:18:41] then do what Elsie said
[16:18:43] :P
[16:18:48] that's too hard
[16:18:52] I'm hard.
[16:19:01] anyway, have to run
[16:20:11] (PS2) Akosiaris: Refactor nrpe to a module [operations/puppet] - https://gerrit.wikimedia.org/r/77720
[16:20:38] paravoid: /tmp/ganglia on nickel is where all the rrd's are written - in memory because or else the disk is just always pegged ---
[16:21:34] What OS do the WMF servers run?
[16:21:57] Ubuntu precise, I think.
[16:22:34] precise, a few left over lucid and very few hardy on their way out
[16:22:37] but yes, ubuntu
[16:26:27] (CR) Tim Landscheidt: [C: 1] "Looks good to me. When I apply it, I also see "notice: /Stage[main]/Ganglia_new::Monitor::Service/Service[ganglia-monitor]: Triggered 're" [operations/puppet] - https://gerrit.wikimedia.org/r/77657 (owner: Ori.livneh)
[16:31:49] (PS1) Jgreen: finish parameterizing spamassassin, change exim::roled to depend on it without defining it [operations/puppet] - https://gerrit.wikimedia.org/r/77721
[16:32:59] wooooOoo, finally a commit that jenkins doesn't hate.
[16:36:05] (PS2) Jgreen: finish parameterizing spamassassin, change exim::roled to depend on it without defining it [operations/puppet] - https://gerrit.wikimedia.org/r/77721
[16:37:22] (CR) Mark Bergsma: "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/77721 (owner: Jgreen)
[16:39:28] (CR) Jgreen: [C: 2 V: 2] finish parameterizing spamassassin, change exim::roled to depend on it without defining it [operations/puppet] - https://gerrit.wikimedia.org/r/77721 (owner: Jgreen)
[16:40:54] er
[16:41:00] I had a comment right
[16:41:31] Jeff_Green: I didn't want to -1 or -2 it as it was minor, but perhaps a +2 from you is not in order for that patchset :)
[16:41:33] i saw mention of a comment, but there wasn't anything to expand in the UI so I thought it was just a gerrit null-comment
[16:41:41] huh
[16:41:44] but it looks GREAT!
[16:41:47] well anyway, there was an indentation issue
[16:41:54] mixed tabs/spaces
[16:41:55] i fixed the whitespace before pushing it out
[16:42:09] oh
[16:42:11] the +2 is on the version I fixed
[16:42:16] ok
[16:42:20] nvm then :)
[16:42:59] I saw it just before i asked you to review, but was fighting with git+gerrit to get the patch amended
[16:43:14] mark: do you know about theforeman?
[16:43:34] about what?
[16:43:36] http://theforeman.org
[16:43:58] looks nice
[16:44:27] would that help the ops team?
[16:45:04] maybe
[16:45:07] we have some of that already
[16:45:22] notably the monitoring part
[16:45:32] (PS1) Tim Landscheidt: Tools: Enable Ganglia monitoring for Redis [operations/puppet] - https://gerrit.wikimedia.org/r/77722
[16:45:53] the combination of all stuff in one place is the winning part. anyway, the strong part is the provisioning, not the monitoring
[16:48:26] thanks... we'll have a look at it :)
[16:49:06] np
[16:53:16] i think i've looked at that years ago
[16:53:26] wasn't nearly as nice and polished then
[16:55:19] I've seen it a few times over the years too
[16:55:35] someone volunteered to set it up for Debian so I was lazily waiting for that
[16:57:43] at your spare time... :)
[17:00:05] (PS1) Jgreen: merge otrs classes into one, add nrpe since mail::roled breaks without it [operations/puppet] - https://gerrit.wikimedia.org/r/77724
[17:00:22] (CR) jenkins-bot: [V: -1] merge otrs classes into one, add nrpe since mail::roled breaks without it [operations/puppet] - https://gerrit.wikimedia.org/r/77724 (owner: Jgreen)
[17:01:56] (PS2) Jgreen: merge otrs classes into one, add nrpe since mail::roled breaks without it [operations/puppet] - https://gerrit.wikimedia.org/r/77724
[17:03:33] (CR) Jgreen: [C: 2 V: 1] merge otrs classes into one, add nrpe since mail::roled breaks without it [operations/puppet] - https://gerrit.wikimedia.org/r/77724 (owner: Jgreen)
[17:06:08] Jeff_Green: I asked in -staff but not to you, but thought I'd just ask you: will we (ie: you) keep OTRS upgraded regularly now, instead of doing major "holy sh!t!" upgrades?
[17:06:39] greg-g: good question and I don't know the answer. this one took a very long time didn't it.
[17:07:03] i'd imagine periodic updates will be easier going forward and we seem better staffed to handle them than when I started
[17:07:19] so I'm cautiously optimistic I guess? :-)
[17:07:55] Jeff_Green: good enough answer for me :)
[17:08:00] good!
[17:10:04] * Jeff_Green biab.
[17:14:35] AaronSchulz: ping me when you're available
[17:17:39] OTRS was updated? :O
[17:18:59] paravoid: btw, I *think* I agree with your mail re icinga paging, but I'm just worried that for many services the response will automatically be "we don't know enough to do anything other than kicking the reset button, should probably ping X." Which, really, that just increases the outage time window. :/
[17:19:37] paravoid: but, I totally agree with your sentiment that you all *shouldn't* be in the situation where you don't know about the services running in our infra
[17:19:52] the question is, if such measures will just prolong and increase our ignorance.
[17:19:58] * greg-g nods
[17:20:26] right, one or two outages where you watch/learn what someone else does
[17:20:36] but, learning by outage isn't really the best idea ;)
[17:21:05] so, i don't have an answer, just thinking out loud
[17:25:47] paravoid: on another topic, we're getting multiple reports of WMF hosted mailing list messages getting flagged as spam by gmail. And since "real email support" == "Faidon" in my head... ;)
[17:26:31] * greg-g emails ops
[17:26:39] lol
[17:31:29] * greg-g highfives notpeter
[17:34:16] iirc that usually ends up with erik emailing Google contacts
[17:34:40] Nemo_bis: ugh
[17:35:39] last times it was too many people manually declaring our posts spam
[17:35:53] we need some mechanical turks to click "not spam" a million times :P
[17:38:07] but maybe it's something else this time
[17:43:18] well, this changed some time last summer:
[17:43:24] Received-SPF: pass (google.com: best guess record for domain of wiki-research-l-bounces@lists.wikimedia.org designates 208.80.154.4 as permitted sender) client-ip=208.80.154.4;
[17:43:46] now: Received-SPF: neutral (google.com: 2620:0:861:1::2 is neither permitted nor denied by best guess record for domain of wikimedia-l-bounces@lists.wikimedia.org) client-ip=2620:0:861:1::2;
[17:48:50] !log added d2to1, pep8, python-setuptools-git packages in apt.wikimedia.org. Packages were created by hashar using backports from ubuntu. I added debian/changelog entry to get an upgrade friendly version and run lintian on each one of them with no errors.
[17:57:09] Nemo_bis: interesting
[17:57:35] Nemo_bis: I'm copy/pasting that in an email I'm sending to the people who manage our gmail apps account
[17:59:59] huh, interesting
[18:02:50] Nemo_bis: IP4 -> IP6 issue?
[18:03:25] nah
[18:03:37] just the SPF records needing an update, *perhaps*
[18:05:10] no, we have ip6 already
[18:05:25] there were reports from last year about google not respecting ip6 subnets but they were supposed to be fixed
[18:09:59] Hmmm. Don't know about SPF, but "dig TXT wikimedia.org" shows a "v=spf1 ..." record, "dig TXT lists.wikimedia.org" *not*. Does SPF automatically go up the domain chain?
[18:15:34] scfc_de: I hope not, as we are using different IPs for lists iirc :)
[18:16:00] greg-g: btw, Echo uses new/non-standard email addresses for their enotifs, dunno if they warned ops
[18:17:49] Nemo_bis: So it boils down to that we don't have *any* SPF record for lists at the moment (and thus Google rates it only as neutral)?
[18:17:57] There was discussion with ops about their email needs way back in the day
[18:21:12] (CR) ArielGlenn: [C: 2] really fix getting last deployment date [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/77712 (owner: ArielGlenn)
[18:21:54] ah, interesting; they changed the From and Reply-To, but the envelope-from in Received: is enough to make Google happy with the SPF
[18:24:31] (while an explicit Sender would make Gmail and Outlook stupid)
[18:41:12] (PS1) Jgreen: fix checks for enable_otrs_server in exim template [operations/puppet] - https://gerrit.wikimedia.org/r/77728
[18:42:15] (CR) Jgreen: [C: 2 V: 1] fix checks for enable_otrs_server in exim template [operations/puppet] - https://gerrit.wikimedia.org/r/77728 (owner: Jgreen)
[18:53:30] (PS1) Jgreen: include passwords::exim4 in exim::smtp [operations/puppet] - https://gerrit.wikimedia.org/r/77729
[18:57:14] (CR) Jgreen: [C: 2 V: 1] include passwords::exim4 in exim::smtp [operations/puppet] - https://gerrit.wikimedia.org/r/77729 (owner: Jgreen)
[19:08:17] (PS1) Jgreen: passwords::exim4 != passwords::exim [operations/puppet] - https://gerrit.wikimedia.org/r/77731
[19:15:22] (CR) Jgreen: [C: 2 V: 1] passwords::exim4 != passwords::exim [operations/puppet] - https://gerrit.wikimedia.org/r/77731 (owner: Jgreen)
[19:27:38] Aaron|home: ping?
[19:31:35] paravoid: is ceph stable now? ;)
[19:31:45] yes
[19:31:49] that's what I want to talk to you about
[19:31:52] do you have some time now?
[19:34:28] sure
[19:34:32] cool
[19:34:38] so ceph looks okay now, yes
[19:34:49] #5084 was fixed and so did the bugs I was telling you the other day
[19:35:05] and I had no new ones, even upgraded to 0.67-rc3, rebooted all boxes etc.
[19:35:45] so I run thumb syncs repeatedly, works so far
[19:35:56] I'd like to sync origs too and gradually put it back in production
[19:36:22] I was having a look on how to resume syncs last week but I saw that the scripts in your home dir and ran away screaming
[19:36:25] :-)
[19:36:50] heh
[19:37:05] so let's resume this process
[19:37:27] and if you can, I'd like to do this together
[19:37:32] so I know how to do this myself next time :-)
[19:43:49] * Aaron|home cleans up his /home dir
[20:04:11] (PS1) Jgreen: remove unneeded exim4.otrs.erb, tweak exim4.conf.SMTP_IMAP_MM.erb for otrs transport [operations/puppet] - https://gerrit.wikimedia.org/r/77813
[20:05:05] paravoid, hi, how destabilizing would it be to enable ESI on production varnishes for mobile?
[20:05:28] what does that mean?
[20:05:40] i would like to start testing some ESI's for the banner inclusion
[20:05:42] for zero
[20:06:01] varnish config needs a special setting for that IIRC
[20:06:48] that's correct
[20:06:58] do you have something that work in labs?
[20:07:09] (CR) jenkins-bot: [V: -1] remove unneeded exim4.otrs.erb, tweak exim4.conf.SMTP_IMAP_MM.erb for otrs transport [operations/puppet] - https://gerrit.wikimedia.org/r/77813 (owner: Jgreen)
[20:07:15] *works
[20:07:16] not yet, only playing it on my machine
[20:07:23] *with it
[20:07:24] okay, that would be the first step
[20:10:27] paravoid, sure, but i suspect it will still be fairly similar to my own virtual machine and very different from production settings
[20:10:43] why? :)
[20:11:08] I'm not sure what you're requesting?
[20:11:31] (also, mail would probably a better medium, I'm sure at least mark will also be very interested in that discussion)
[20:13:28] paravoid, well, i basically need "set beresp.do_esi = true;" for all requests that have X-CS set - this way I can start playing with some experimental URL query param that enables it before switching it on by default
[20:13:43] uhm
[20:13:48] well, no
[20:13:59] playing with it in production sounds like a bad idea to me
[20:15:00] hehe, not really damagingly playing - rather enabling this feature on a per-request only to see if it performs as expected
[20:15:41] we have labs for that
[20:16:54] (PS2) Jgreen: remove unneeded exim4.otrs.erb, tweak exim4.conf.SMTP_IMAP_MM.erb for otrs transport [operations/puppet] - https://gerrit.wikimedia.org/r/77813
[20:17:51] labs are the same as vagrant setup on my machine - highly simplified config. Replicating all the varnish config in labs is a fairly serious undertaking that no one has successfully done as far as i know (we had this discussion a while ago). Instead what I propose is when I browse to an article with http://en.m.wikipedia.org/wiki/Article?zeroesi=true, varnish would honor the tag...
[20:17:53] ...and try to fetch sub-component of that request
[20:17:57] (CR) Jgreen: [C: 2 V: 1] remove unneeded exim4.otrs.erb, tweak exim4.conf.SMTP_IMAP_MM.erb for otrs transport [operations/puppet] - https://gerrit.wikimedia.org/r/77813 (owner: Jgreen)
[20:18:25] this will be very non-intrusive and simple to manage
[20:18:39] varnish is in beta labs with the production config I think
[20:18:50] mark was working with hashar on this and I think they succeeded
[20:20:08] paravoid, but betalabs uses the same production varnish config as production, isn't it? How can i enable varnish in betalabs without +2 it in varnish config?
[20:20:28] I haven't really looked at it, I'm sure we can find a way
[20:20:37] the configs are erb, so you could check realm
[20:21:45] oki, should i make an rt ticket for this?
[20:22:07] uhm, it depends on what exactly you want
[20:22:54] if it's a request, an RT is best; if you know how to set this up in betalabs, a gerrit patch request (or with an RT we could do it for you I guess); if it's a discussion about ESI, then raising it over email to the list sounds best, possibly involving platform as well
[20:23:09] yurik
[20:23:13] you can use any puppet manifest in labs
[20:23:20] even if it isn't merged yet
[20:23:35] that too, yes
[20:23:37] not betalabs though
[20:23:45] oh no?
[20:23:57] ok don't know what betalabs is so hushing :)
[20:23:59] but if the groundwork for betalabs is there, setting up a separate instance with the varnish prod config doesn't sound too hard
[20:24:04] ottomata, that's the problem - replicating the whole varnish config in labs is a major pain it seems that no one has managed to achieve
[20:24:13] I told you that this has happened
[20:24:37] paravoid, i thought you said it only happened in betalabs, which is not exactly the same as individual labs instances
[20:25:07] they're very close if not the same, we tend to avoid adding specific "if $project ==" in puppet
[20:28:02] oki, but the real question is this -- is enabling just one setting -- esi = true in varnish config and no other changes (all zero extension code could go into production anyway as they are enabled by query param only)
[20:28:19] is worth all the hassle of replicating it in labs?
[20:29:21] its not like it would really tell us something we don't know
[20:29:27] (PS1) Dzahn: re-add myself to paging group for monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/77816
[20:29:33] (CR) jenkins-bot: [V: -1] re-add myself to paging group for monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/77816 (owner: Dzahn)
[20:29:52] it's not just the setting, is it?
[20:29:59] you're planning to deploy code that uses this setting
[20:30:02] in production
[20:30:06] to test this
[20:30:15] to play with esi
[20:30:22] that's the whole point of why we're having labs, isn't it?
[20:30:45] (PS2) Dzahn: re-add myself to paging group for monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/77816
[20:31:30] not exactly - we have a number of "testing" code in production for zero -- for example if you supply ?X-CS=250-99 url query param, zero extension would treat it as if production varnish added the X-CS header
[20:32:08] this doesn't cause any issues with production as it has no security or other implications
[20:32:10] (PS1) Jgreen: tweak exim.conf.SMTP_IMAP_MM.erb for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77817
[20:32:39] (CR) Dzahn: [C: 2] re-add myself to paging group for monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/77816 (owner: Dzahn)
[20:32:45] "we test code in production, so why are you opposed in us testing even more code in production?"
[20:32:50] :-)
[20:32:53] hehe
[20:33:09] hey, writing from inside plane at gate
[20:33:10] well, visual editor was also enabled for a small number of alpha users at first :)
[20:33:25] that's not the same now is it
[20:33:29] just changed monitoring / paging timezones for Roan, leslie and I to HKT
[20:33:41] anyone else who is going feel free to use the HKT zone
[20:33:42] cya
[20:33:45] anyway, I think this needs some better coordination anyway
[20:33:54] I'd really like to hear your ESI plans too
[20:34:06] paravoid, but i emailed about the plans loooong time ago
[20:34:12] you did?
[20:34:13] re zero rearchitecture
[20:34:14] RoanKattouw_away: LeslieCarr: changed your paging timezones to HKT, cya
[20:34:47] paravoid, subject: The summary of new zero architecture proposal
[20:35:15] and mark replied
[20:35:15] (CR) Jgreen: [C: 2 V: 1] tweak exim.conf.SMTP_IMAP_MM.erb for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77817 (owner: Jgreen)
[20:43:50] (PS1) Jgreen: more exim4.conf template tweaks for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77819
[20:44:42] (CR) Jgreen: [C: 2 V: 1] more exim4.conf template tweaks for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77819 (owner: Jgreen)
[20:51:49] Jeff_Green: I see you worked on files/ganglia/collect_exim_stats_via_gmetric ... about 1.5 years ago. Do you know the current fundraising exim setup?
[20:52:19] scfc_de: yes
[20:54:05] Jeff_Green: What's the host name in (current) Ganglia that displays those exim stats?
[20:55:30] scfc_de: aluminium.wikimedia.org
[21:00:05] (PS1) Pyoungmeister: adding testsearch1001-1003 to site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/77823
[21:01:17] (CR) Pyoungmeister: [C: 2] adding testsearch1001-1003 to site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/77823 (owner: Pyoungmeister)
[21:04:59] Jeff_Green: Ah, okay, you really have "no_group metrics". When I tried that on tools-mail, I had to add "-g exim" to the gmetric call, otherwise I didn't see anything (or maybe I wasn't patient enough). Thanks!
[21:07:15] cmjohnson1: hey, did you image the testsearch boxes?
[21:07:33] notpeter: yep and added keys...should be able to ssh
[21:07:57] indeed I can! did you put anything in puppet for them?
[21:08:11] no, i wasn't sure where to add them
[21:08:11] they seem to think that they're lucene frontend boxes
[21:08:15] odd
[21:08:16] ok, cool
[21:08:17] yeah
[21:08:19] very odd
[21:08:20] ok
[21:08:27] no worries, i can keep looking
[21:08:44] thanks!
[21:22:41] (PS1) Jgreen: fix truncated mysql query in exim4 conf for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77826
[21:26:05] (PS1) Pyoungmeister: site.pp: making search node defs more strict [operations/puppet] - https://gerrit.wikimedia.org/r/77827
[21:27:47] (CR) Pyoungmeister: [C: 2] site.pp: making search node defs more strict [operations/puppet] - https://gerrit.wikimedia.org/r/77827 (owner: Pyoungmeister)
[21:27:53] (CR) Jgreen: [C: 2 V: 1] fix truncated mysql query in exim4 conf for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77826 (owner: Jgreen)
[21:28:17] Jeff_Green: merging your thing
[21:28:29] already done!
[21:28:33] woah
[21:28:35] there is no spoon
[21:28:37] i did yours
[22:08:08] (PS1) Jgreen: more exim4.conf for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77829
[22:09:05] (CR) Jgreen: [C: 2 V: 1] more exim4.conf for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77829 (owner: Jgreen)
[22:17:43] (PS1) Jgreen: enable_external_mail for role::otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77831
[22:19:19] (CR) Jgreen: [C: 2 V: 1] enable_external_mail for role::otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77831 (owner: Jgreen)
[22:28:43] Aaron|home: still here?
[22:28:45] (PS1) Jgreen: grr. disable relay in template when we haven't enabled it in parameters [operations/puppet] - https://gerrit.wikimedia.org/r/77832
[22:29:02] Aaron|home: not sure if you're interested, there's a ceph summit happening today (now) and tomorrow
[22:29:09] Aaron|home: http://ceph.com/cds/
[22:29:17] (virtual summit)
[22:29:49] (CR) Jgreen: [C: 2 V: 1] grr. disable relay in template when we haven't enabled it in parameters [operations/puppet] - https://gerrit.wikimedia.org/r/77832 (owner: Jgreen)
[23:07:18] (PS1) Jgreen: more exim template tinkering for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77835
[23:08:34] (CR) Jgreen: [C: 2 V: 1] more exim template tinkering for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77835 (owner: Jgreen)
[23:13:24] RobH: Hey. Seems you're on RT duty? Heads-up about https://bugzilla.wikimedia.org/show_bug.cgi?id=52556 if you haven't seen it already.
[23:13:33] There's growing concern on the mailing lists about it.
[23:13:34] good call
[23:14:00] RobH: yeah, actually you or paravoid might want to take a look at that
[23:14:03] uhhh, greg just replied...
[23:14:06] why me? :)
[23:14:10] greg-g: yea but you just replied!
[23:14:22] paravoid: you know why! "email" and now "dns" keywords!
[23:14:22] spf should fall back to the top level domain for starters
[23:14:26] ah
[23:14:41] and absence of an spf isn't indicative of spam, although I guess it helps having it
[23:15:16] RobH: yeah, just asking for more info from people, ugh, I'm also playing in-between issue trackers (ticket open with OIT who has a ticket open with Google)
[23:15:47] i'll keep an eye on the ticket, but i think we need oit's google reply info before we go changing things
[23:16:05] k
[23:16:10] RobH: For lists.wikimedia.org?
[23:16:15] but regardless, we can setup spf
[23:16:18] what would google's reply have to do with lists' spf?
[23:16:40] i just rather not change things on our configuration if oit is troubleshooting spam issue with google.
[23:16:45] aha!
[23:16:52] wait, I have an email for you, RobH
[23:17:05] whatever google's reply, adding the spf record to our end sounds like a good idea.
[23:17:11] * greg-g didn't read all of their email
[23:17:19] that's basically it
[23:17:30] ugh, what's your wmf email, RobH ?
[23:17:38] rob@ robh@ rhalsell@
[23:17:41] take yer pick
[23:17:44] :)
[23:17:46] rhalsell is official
[23:17:59] (but they are all mine, im the first rob damn it!)
[23:17:59] (PS1) Jgreen: don't spamassassin filter at exim for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77837
[23:18:48] sent
[23:18:56] Jeff_Green: you realize that you doing otrs stuff will make you the ops volunteer favorite / goto person ;]
[23:19:06] shhhhh
[23:19:12] i'm going to filter out that text string in every communication mode I have
[23:19:20] although...it's perl. hmmm.
[23:19:32] well we all know what "special projects" stands for
[23:19:44] RobH: FWIW, wikimedia.org already has a TXT record. lists.wikimedia.org does not.
[23:19:53] Though I'm probably just repeating what you already know.
[23:19:57] urgh, that was overlooked a loooong time ago
[23:20:02] and i guess never fixed...
[23:20:21] i would like this so much better if I weren't using git+gerrit+puppet especially with our puppet manifests
[23:20:32] (CR) Ori.livneh: [C: 1] "Ah, Faidon, good call -- thanks." [operations/puppet] - https://gerrit.wikimedia.org/r/77657 (owner: Ori.livneh)
[23:20:37] i could have configured a fleet of 100 servers by hand by now
[23:20:43] :-)
[23:21:22] lol, google's suggestion is so crap
[23:21:23] (CR) Jgreen: [C: 2 V: 1] don't spamassassin filter at exim for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77837 (owner: Jgreen)
[23:21:28] I wonder why do we even bother contacting them
[23:21:37] they suggested changing our SPF from neutral to softfail
[23:21:48] paravoid: sigh. same as a few months ago
[23:21:55] makes sense doesn't it?
[23:21:57] and they suggested we deploy a typo
[23:22:08] let's get advice on how to build a mailserver from some tier 1 engineers
[23:22:11] what could possibly go wrong!
[23:22:18] *tier 1 support engineers
[23:22:46] sorry
[23:22:54] neutral to softfail should be fine, not a *bad* idea
[23:22:54] * greg-g didn't know if it was crap or not
[23:23:05] oh don't worry greg-g, I'm not blaming you at all :)
[23:23:13] whew
[23:23:15] :)
[23:23:29] anyway
[23:23:48] we're not doing dkim though, we probably should
[23:23:55] we do for fundraising
[23:23:55] hmm, someone is committing exim changesets all day
[23:24:05] sounds like a special project to me
[23:24:11] :-)
[23:24:15] * paravoid ducks
[23:24:17] and goes to sleep
[23:24:19] paravoid: watch it or you'll find yourself receiving a lot of mail you didn't want
[23:24:28] :D
[23:24:42] what to do with this OTRS mail during the outage...hmmmmmmmmm
[23:24:51] hahaha
[23:25:07] our exim template is utterly insane
[23:25:12] and getting worse :-P
[23:25:23] it's fiiiine
[23:25:55] I've seen *much* crazier exim setups
[23:26:16] the final config is fine, but the template is crap
[23:26:30] ah
[23:28:33] (PS1) Pyoungmeister: ORI FOR KING OF DEPLOYMENTS [operations/puppet] - https://gerrit.wikimedia.org/r/77838
[23:30:12] (CR) Asher: [C: 1] "looks legit to me and jenkins" [operations/puppet] - https://gerrit.wikimedia.org/r/77838 (owner: Pyoungmeister)
[23:31:39] (CR) Werdna: [C: 1] "Seems to behave like the current version" [operations/puppet] - https://gerrit.wikimedia.org/r/77838 (owner: Pyoungmeister)
[23:32:38] (CR) Greg Grossmeier: [C: 1] ORI FOR KING OF DEPLOYMENTS [operations/puppet] - https://gerrit.wikimedia.org/r/77838 (owner: Pyoungmeister)
[23:33:24] the good news is, if we ever need to elect a king of deployments, we know who's going to win
[23:33:45] (CR) Greg Grossmeier: "Forgot to comment I was so excited by the MASSIVE improvement in our deployment infrastructure. This kind of thinking will really get us c" [operations/puppet] - https://gerrit.wikimedia.org/r/77838 (owner: Pyoungmeister)
[23:34:46] notpeter: +2
[23:34:49] er 1, I mean 1!
[23:34:53] (CR) Pyoungmeister: "greg: we can push it one step further than devop borat: https://twitter.com/DEVOPS_BORAT/status/281624951298093056" [operations/puppet] - https://gerrit.wikimedia.org/r/77838 (owner: Pyoungmeister)
[23:35:10] haha
[23:36:09] (CR) Alex Monk: [C: 1] "Seems legit!" [operations/puppet] - https://gerrit.wikimedia.org/r/77838 (owner: Pyoungmeister)
[23:37:33] (PS1) Jgreen: puppetize otrs exim system_filter [operations/puppet] - https://gerrit.wikimedia.org/r/77840
[23:37:34] (PS1) Jgreen: remove flat-file otrs system_filter [operations/puppet] - https://gerrit.wikimedia.org/r/77841
[23:40:27] gar gerrit. gar gar gerrit.
[23:41:04] Elsie/RobH: Is adding a DNS record all that needs to be done? Is it that simple?
[23:41:07] (Abandoned) Jgreen: remove flat-file otrs system_filter [operations/puppet] - https://gerrit.wikimedia.org/r/77841 (owner: Jgreen)
[23:41:56] Krenair: not sure, that's the current theory
[23:41:58] Krenair: Adding a TXT record may help. I have NFI what dkim is.
[23:42:22] https://en.wikipedia.org/wiki/DomainKeys_Identified_Mail
[23:43:52] (PS1) Jgreen: remove system_filter.otrs flat file, deprecated [operations/puppet] - https://gerrit.wikimedia.org/r/77842
[23:44:04] (CR) Jgreen: [C: 2 V: 1] remove system_filter.otrs flat file, deprecated [operations/puppet] - https://gerrit.wikimedia.org/r/77842 (owner: Jgreen)
[23:44:18] (CR) Jgreen: [V: 2] remove system_filter.otrs flat file, deprecated [operations/puppet] - https://gerrit.wikimedia.org/r/77842 (owner: Jgreen)
[23:44:47] greg-g, Krenair: I imagine the DKIM request should be split out to a separate Bugzilla (or RT) ticket.
[23:44:59] SPF and DKIM seem to be independent requests.
[23:45:17] (PS1) Jgreen: puppetize otrs system_filter stuff [operations/puppet] - https://gerrit.wikimedia.org/r/77843
[23:45:49] they are, I was kind of using that as a catchall bug for now :)
[23:45:59] maybe inappropriately
[23:46:14] hrmm
[23:46:28] i think we can add the spf, just trying to figure out what part of it i should strip out
[23:46:33] since we dont do list traffic via google
[23:46:43] the include google part yes
[23:46:57] but the ?all stays i assume for the other stuff ahead of it
[23:46:57] (CR) Jgreen: [C: 2 V: 1] puppetize otrs system_filter stuff [operations/puppet] - https://gerrit.wikimedia.org/r/77843 (owner: Jgreen)
[23:47:02] but i dont want to assume on this.
[23:47:48] * Krenair has no idea what RobH is on about
[23:47:51] lists.wikimedia.org. 600 IN TXT "v=spf1 ip4:91.198.174.0/24 ip4:208.80.152.0/22 ip6:2620:0:860::/46 ?all"
[23:48:03] how to append this spf record to our dns so google can go suck it.
[23:48:17] (that wasnt nice at all, i mean so google doesnt flag us as spam)
[23:48:18] ;]
[23:48:34] :)
[23:48:38] paravoid: you have experience in this and am i right?
[23:48:44] so we can play nicely and all the unicorns will be happy!
[23:48:46] * greg-g goes afk for a bit
[23:48:55] Jeff_Green: meh, damned unicorns are jerks
[23:49:04] folks just dont realize cuz they havent met them.
[23:49:07] be nice. they have stabby horns.
[23:49:31] unicorns = less majestic narwhals.
[23:50:02] http://www.weebls-stuff.com/songs/Narwhals/
[23:51:01] Krenair: it may not solve the entire issue, but its step one in fixing i suppose
[23:52:33] RobH: I think Mark Bergsma set up SPF for wikimedia.org (cf. https://bugzilla.wikimedia.org/show_bug.cgi?id=1609). Maybe he can weigh in.
[23:52:49] yea im appending to the ticket now
[23:53:40] scfc_de: nice, yea i think im right
[23:53:46] from watching what hashar did
[23:54:01] but, i'll ping mark to look at it later when he is about and working tomorrow
[23:54:16] and now i have a bunch of linked tickets to reference =]
[23:56:45] doing RT triage this week is either going to be the easiest ever, or the worst.
[23:57:16] (PS1) Jgreen: system-filter template formatting cleanup + comments [operations/puppet] - https://gerrit.wikimedia.org/r/77844
[23:57:18] lighter influx i imagine due to conference, yet no one to handle many requests, due to same.
[23:58:13] (CR) Jgreen: [C: 2 V: 1] system-filter template formatting cleanup + comments [operations/puppet] - https://gerrit.wikimedia.org/r/77844 (owner: Jgreen)
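
Note on the SPF record proposed at 23:47:51: the sketch below is one way to check what that record would say about a given sending address, and how the result changes if the trailing "?all" (neutral) becomes "~all" (softfail), as discussed at 23:21:37. It is an illustration only, not anything run in the channel. The function name evaluate_spf and the sample addresses are assumptions made for the example, and only the ip4, ip6 and all mechanisms are handled; include, a, mx and the rest need DNS lookups and are skipped.

```python
import ipaddress

# The TXT record value exactly as quoted in the log at 23:47:51.
RECORD = "v=spf1 ip4:91.198.174.0/24 ip4:208.80.152.0/22 ip6:2620:0:860::/46 ?all"

# SPF qualifier characters and the result each one produces on a match.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record, sender_ip):
    """Return the SPF result for sender_ip (hypothetical helper).

    Handles only the ip4, ip6 and all mechanisms; any other term is ignored.
    """
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    address = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # a mechanism without a qualifier means pass
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term == "all":
            return QUALIFIERS[qualifier]
        mechanism, _, value = term.partition(":")
        if mechanism in ("ip4", "ip6"):
            network = ipaddress.ip_network(value)
            if address.version == network.version and address in network:
                return QUALIFIERS[qualifier]
    # No mechanism matched and there was no "all" term.
    return "neutral"


if __name__ == "__main__":
    for ip in ("208.80.154.1", "2620:0:861::1", "8.8.8.8"):
        print(ip, evaluate_spf(RECORD, ip))
```

With "?all" any address outside the listed ranges comes back neutral; replacing it with "~all" makes the same address a softfail, which is the change under discussion.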