[00:51:02] what's the way to get access to EventLogging data? I want to run some one-off queries on the data we collect for MediaViewer
[02:07:45] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[02:12:44] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3793 MB (3% inode=99%):
[02:15:26] !log LocalisationUpdate completed (1.24wmf3) at 2014-05-12 02:14:23+00:00
[02:15:37] Logged the message, Master
[02:20:44] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3435 MB (3% inode=99%):
[02:26:57] !log LocalisationUpdate completed (1.24wmf4) at 2014-05-12 02:25:54+00:00
[02:27:04] Logged the message, Master
[02:46:44] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 12 data above and 9 below the confidence bounds
[03:00:45] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:14:46] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon May 12 03:13:40 UTC 2014 (duration 13m 39s)
[03:14:52] Logged the message, Master
[03:35:45] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 12 data above and 9 below the confidence bounds
[03:56:45] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[04:54:47] (PS1) Springle: Support setting session variables for host DB connections. Makes it easy to control MariaDB 10 named replication channels without a bunch of code changes. [operations/software] - https://gerrit.wikimedia.org/r/132917
[04:55:47] (CR) Springle: [C: 2] Support setting session variables for host DB connections. Makes it easy to control MariaDB 10 named replication channels without a bunch of [operations/software] - https://gerrit.wikimedia.org/r/132917 (owner: Springle)
[05:19:15] (CR) Springle: bacula: allow mysqldumps to be kept locally (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/132214 (owner: Alexandros Kosiaris)
[05:19:21] (CR) Springle: bacula: allow mysqldumps to be kept locally (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/132214 (owner: Alexandros Kosiaris)
[05:34:44] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 8 below the confidence bounds
[06:06:31] (PS5) Giuseppe Lavagetto: Fix the use of $nagios_group. [operations/puppet] - https://gerrit.wikimedia.org/r/132187
[06:16:21] <_joe_> JonSpenc3r
[06:16:35] <_joe_> oh gotta change my laptop password, shit
[06:16:53] <_joe_> luckily enough that was a one-use password
[06:17:15] heh
[06:17:29] IRC is *not* a password store.
[06:18:03] <_joe_> brion: I tend to use new passwords for the computers, ones I never used before, because copy/paste fail happen
[06:18:12] wise!
[06:18:59] <_joe_> and when you're sure synergy is working and it's not, well... FAIL
[06:21:54] ah, synergy
[06:22:04] i used to use that for linux & mac side-by-side
[06:22:09] now i use wifi all the time and it’s too much of a pain in the ass
[06:22:16] i just pull up a virtual machine for linux instead :)
[06:22:39] wifi latency + mouse == awful
[06:23:24] brion: why do you prefer mac os ??
[06:23:37] <_joe_> well I do have wifi at home and latency is not bad
[06:23:44] and morning _joe_ and brion
[06:23:47] <_joe_> but I do have a badass AP :)
[06:23:52] <_joe_> morning matanya
[06:24:50] matanya: a) it’s unix-y enough to feel like a native web server environment on the CLI b) i can test mac/ios stuff on it *and* run windows and linux c) it’s so PRETTY
[06:25:16] num num shiny drop shadows
[06:25:23] interesting answer
[06:25:30] matanya: you'd be surprised
[06:25:34] I've been in both the WMF and Google offices
[06:25:37] same deal
[06:25:40] desktops on Ubuntu
[06:25:43] Macbooks for laptops
[06:25:49] it's not just limited to biron
[06:26:09] <_joe_> brion: your trading pretty for principles!!!! You will burn in GNU Hell!!!
[06:26:13] :D
[06:26:19] GNU is Not Unix
[06:26:21] <_joe_> (I do have a macbook air)
[06:26:37] <_joe_> Jasper_Deng: luckily enough, as UNIX sucks
[06:26:43] i used mac for about two days, and was so annoyed of the thought i'm holding a machine that costs like a few months work with hardware i can get for less than a half
[06:26:50] heh
[06:26:52] * Jasper_Deng high fives matanya on that note though
[06:27:19] <_joe_> matanya: macbook air has the better price for pound in his category
[06:27:28] <_joe_> ok it's a pretty small league, agreed
[06:27:34] the apple machines are either a good deal or a HUGE rip-off depending on the model, the options, and where you are in the product cycle
[06:27:39] apart from the unusable OS
[06:27:59] <_joe_> but the lenovo carbon X-1, its main contender, actually costs a little more
[06:28:14] i tend to build them on my own
[06:28:24] waaay cheaper
[06:28:25] <_joe_> matanya: laptops?
[06:28:27] i’m still kinda looking for a really nice linux-friendly laptop, but i love my hi-dpi screens and that’s an area where things are still… immature on linux
[06:28:33] those too _joe_
[06:28:52] getting harder over the years
[06:29:20] everything’s moving toward compact designs and SoC consolidation
[06:29:25] there’s not much to assemble anymore
[06:29:50] i can’t even upgrade the storage on this mac, it’s soldered on the mainboard
[06:30:04] have to go external :)
[06:30:21] nowadays it costs less to buy a ready made device
[06:31:10] but i hate the idea i'm tied to vendor decision on the amount of ram (for instance) i have. what if i want to expand ?
[06:31:26] planned obselescence :D
[06:31:43] CONSUME
[06:32:58] matanya: have you met mareklug?
[06:33:02] irc nick Sir_Designer
[06:33:15] ok i’m going to try printing my boarding pass at the hotel front desk again. wish me luck!
[06:33:17] i haven't Jasper_Deng
[06:33:26] yeah you don't want to let him know you said these
[06:33:42] he's a devout fan of the Mac
[06:34:20] there are people in the cult
[06:36:52] i haz boarding pass! i may yet make it home today
[06:37:33] <_joe_> brion: can't you just show a QR code from your phone?
[06:37:54] _joe_: in theory, but swiss air’s mobile code thing is broken or something and it would only let me download the printable pdf :P
[06:38:23] i’ve done it before on other airlines, but i still feel safer with paper
[06:39:10] <_joe_> it feels like a real ticket :)
[06:39:12] in Chicago once at security, they had 3 lines for people with paper passes, and only 1 for people with QR codes...ever since then I always print it out
[06:41:03] _joe_: any news with puppet 3 migration ?
[06:41:43] <_joe_> matanya: I'm working on it; today I should have made the last fixes and we should have a clear report of what remains to be done
[06:41:56] yay
[06:53:37] ok i’m gonna coffee up and check out… see you guys from back in sf
[07:08:43] (PS1) Giuseppe Lavagetto: Get rid of redundant and confusing $cluster defs. [operations/puppet] - https://gerrit.wikimedia.org/r/132921
[07:09:10] morning
[07:09:20] (CR) Giuseppe Lavagetto: [C: 2] Fix the use of $nagios_group. [operations/puppet] - https://gerrit.wikimedia.org/r/132187 (owner: Giuseppe Lavagetto)
[07:09:34] <_joe_> akosiaris: good morning!
[07:18:14] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Mon May 12 04:17:28 2014
[07:18:34] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Mon May 12 07:18:29 UTC 2014
[07:18:45] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[07:19:48] (PS2) Giuseppe Lavagetto: Get rid of redundant and confusing $cluster defs. [operations/puppet] - https://gerrit.wikimedia.org/r/132921
[07:21:03] (CR) jenkins-bot: [V: -1] Get rid of redundant and confusing $cluster defs. [operations/puppet] - https://gerrit.wikimedia.org/r/132921 (owner: Giuseppe Lavagetto)
[07:22:48] (PS3) Giuseppe Lavagetto: Get rid of redundant and confusing $cluster defs. [operations/puppet] - https://gerrit.wikimedia.org/r/132921
[07:23:07] <_joe_> I already knew that, jenkins-bot
[07:44:08] <_joe_> matanya: http://puppet-compiler.wmflabs.org/change/132921/html/ here is what remains to be done as zero-order fixes - the list is still building though.
[07:45:01] <_joe_> also, before embarking in some change on these topics, please coordinate with me :)
[07:45:08] got much better _joe_
[07:46:39] <_joe_> I'm tackling the protoproxy module.
[07:48:08] (PS1) Matanya: ferm: domain is a fact, fully qualify [operations/puppet] - https://gerrit.wikimedia.org/r/132922
[07:48:34] this ^ one is easy _joe_ want to merge? :)
[07:49:53] <_joe_> matanya: we have tons of these for puppet 3
[07:50:31] <_joe_> just navigate to one single host in the page I gave you and look at the compilation errors and warnings under puppet 3
[07:51:06] this is how i found this one ...
[07:51:43] <_joe_> ok :)
[07:52:07] <_joe_> (btw, table, chain and rule should not have @ in front as well?
[07:52:14] <_joe_> or are they functions?)
[07:53:56] i haven't check, honestly
[08:10:13] (CR) Giuseppe Lavagetto: [C: -2] "I somehow managed to remove the mariadb submodule when rebasing." [operations/puppet] - https://gerrit.wikimedia.org/r/132921 (owner: Giuseppe Lavagetto)
[08:10:59] <_joe_> did I mention I HATE git submodules?
[08:11:01] <_joe_> :)
[08:11:28] <_joe_> springle: do we really need that mariadb thing anyway?
[08:14:01] _joe_: nah i think we can switch everything to sqlite
[08:14:43] <_joe_> I heard of a new version of mongodb which is as cool, only slightly faster
[08:14:49] <_joe_> it's called /dev/null
[08:15:00] <_joe_> results are comparable to mongodb, as well
[08:15:15] <_joe_> it's concurrent and webscale
[08:15:40] <_joe_> never heard anyone complaining his /dev/null is too slow.
[08:18:48] <_joe_> springle: dbstore1002 seems to be in dire straits
[08:19:00] _joe_: yeah, it's unhappy
[08:19:24] <_joe_> ok sorry I opened icinga and I had some 25 critcal alarms that are new :P
[08:19:47] np :) will ack if it takes too much longer
[08:23:07] one thing with tokudb... it really takes its sweet old time to recover. think innodb wins that race
[08:27:06] <_joe_> springle: never had to recover a tokudb store
[08:30:10] my guess is that when one gets away with online DDL by delaying flushes to leaf nodes all day, suddenly having to clean shit up really hurts
[08:32:55] <_joe_> springle: yeah
[08:51:48] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 6 below the confidence bounds
[08:54:44] <_joe_> oh I should really fix this *today*
[08:58:53] (CR) Hashar: "My concern was disclosing article names from the private wiki. That is nicely fixed by https://gerrit.wikimedia.org/r/#/c/52608/ which se" [operations/puppet] - https://gerrit.wikimedia.org/r/49678 (owner: Ottomata)
[09:07:43] (PS7) Giuseppe Lavagetto: protoproxy: call enable_ipv6_proxy in a sane way [operations/puppet] - https://gerrit.wikimedia.org/r/118966 (owner: Matanya)
[09:08:00] <_joe_> matanya, akosiaris, please take a look
[09:18:07] (PS4) Giuseppe Lavagetto: Get rid of redundant and confusing $cluster defs. [operations/puppet] - https://gerrit.wikimedia.org/r/132921
[09:19:28] (CR) Giuseppe Lavagetto: "Amended." [operations/puppet] - https://gerrit.wikimedia.org/r/132921 (owner: Giuseppe Lavagetto)
[09:26:18] (CR) Matanya: [C: 1] "lgtm" [operations/puppet] - https://gerrit.wikimedia.org/r/132921 (owner: Giuseppe Lavagetto)
[09:28:48] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 7 below the confidence bounds
[09:34:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:36:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:38:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:40:04] springle: so a wrapper script is going to be messy I think and I am leaning towards the do-it-in-stages approach. Maybe using this: http://wiki.bacula.org/doku.php?id=bacula_manual:the_job_resource#runscript_body-of-runscript
[09:40:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:40:11] I am evaluating right now
[09:40:24] oh and context is https://gerrit.wikimedia.org/r/132214
[09:41:00] (PS1) Springle: Allow yet more file handles for TokuDB. [operations/puppet] - https://gerrit.wikimedia.org/r/132924
[09:42:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:43:24] akosiaris: ok, so long as we can dump locally per-schema in parallel, i'll be a happy camper :)
[09:44:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:44:20] for now I have the original dumps shell script i committed running manually on mondays
[09:46:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:48:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:49:13] (CR) Springle: [C: 2] Allow yet more file handles for TokuDB. [operations/puppet] - https://gerrit.wikimedia.org/r/132924 (owner: Springle)
[09:49:49] (CR) Alexandros Kosiaris: [C: -1] "So this is right but for the wrong reason. So, domain is not a fact in this case, it is a parameter to the ferm::rule class and as such it" [operations/puppet] - https://gerrit.wikimedia.org/r/132922 (owner: Matanya)
[09:50:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:52:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:52:31] (PS2) Matanya: ferm: fully qualify facts [operations/puppet] - https://gerrit.wikimedia.org/r/132922
[09:54:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:55:15] (PS3) Matanya: ferm: fully qualify variables [operations/puppet] - https://gerrit.wikimedia.org/r/132922
[09:56:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:58:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:29:00 2014
[09:58:56] RECOVERY - Puppet freshness on db72 is OK: puppet ran at Mon May 12 09:58:55 UTC 2014
[10:01:06] PROBLEM - Puppet freshness on db72 is CRITICAL: Last successful Puppet run was Mon May 12 09:58:55 2014
[10:01:59] <_joe_> how can this be a problem? something is not working in this check. And I have no time to check it now.
[10:09:07] (PS3) Gage: initial debianization [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449
[10:12:09] (CR) Gage: "Revised:" [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449 (owner: Gage)
[10:17:05] (CR) Alexandros Kosiaris: [C: -1] "Comments inline" (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/118966 (owner: Matanya)
[10:24:47] <_joe_> akosiaris: snap, The inclusion got yanked by my stupidity, thanks for spotting the typo :)
[10:28:29] what an eye sight akosiaris ! i saw it and missed that typo
[10:29:25] RECOVERY - Puppet freshness on db72 is OK: puppet ran at Mon May 12 10:29:21 UTC 2014
[10:38:15] (PS1) Giuseppe Lavagetto: Fix dynamic scope lookup in maintenance.pp [operations/puppet] - https://gerrit.wikimedia.org/r/132933
[10:41:21] (PS8) Giuseppe Lavagetto: protoproxy: call enable_ipv6_proxy in a sane way [operations/puppet] - https://gerrit.wikimedia.org/r/118966 (owner: Matanya)
[10:58:21] (CR) Giuseppe Lavagetto: [C: 1] protoproxy: call enable_ipv6_proxy in a sane way [operations/puppet] - https://gerrit.wikimedia.org/r/118966 (owner: Matanya)
[11:01:02] !log killed bunch of slow Flow\Formatter\ContributionsQuery::queryRevisions queries on flowdb
[11:01:09] Logged the message, Master
[11:06:34] (PS5) Giuseppe Lavagetto: Get rid of redundant and confusing $cluster defs. [operations/puppet] - https://gerrit.wikimedia.org/r/132921
[11:26:55] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[11:28:04] (CR) Giuseppe Lavagetto: [C: 2] "There are no differences in compilation, so this is really a noop change. I'll wait to merge this if someone wants to CR it." [operations/puppet] - https://gerrit.wikimedia.org/r/132921 (owner: Giuseppe Lavagetto)
[11:29:25] (PS9) Giuseppe Lavagetto: protoproxy: call enable_ipv6_proxy in a sane way [operations/puppet] - https://gerrit.wikimedia.org/r/118966 (owner: Matanya)
[11:54:04] (PS1) Giuseppe Lavagetto: Explicitly set config in openstack template. [operations/puppet] - https://gerrit.wikimedia.org/r/132934
[12:33:14] (Abandoned) Tim Landscheidt: Labs: Provide symbolic links to dumps for compatibility [operations/puppet] - https://gerrit.wikimedia.org/r/119438 (https://bugzilla.wikimedia.org/62296) (owner: Tim Landscheidt)
[13:08:39] (PS10) Giuseppe Lavagetto: protoproxy: call enable_ipv6_proxy in a sane way [operations/puppet] - https://gerrit.wikimedia.org/r/118966 (owner: Matanya)
[13:10:27] <_joe_> oh my, I hope this time it is right.
[13:15:08] (PS11) Giuseppe Lavagetto: protoproxy: call enable_ipv6_proxy in a sane way [operations/puppet] - https://gerrit.wikimedia.org/r/118966 (owner: Matanya)
[13:54:08] (PS1) Giuseppe Lavagetto: Fix dynamic scope lookup in openldap module. [operations/puppet] - https://gerrit.wikimedia.org/r/132942
[13:58:15] (CR) Matanya: "nitpick" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/132933 (owner: Giuseppe Lavagetto)
[14:11:04] <_joe_> The import keyword is deprecated. Using it will cause deprecation warnings to be logged on the puppet master, and we plan to remove import completely in Puppet 4
[14:11:08] <_joe_> oh, yeah.
[14:11:39] the letter e will be deprecated in puppet 4 and removed completely in puppet 5
[14:12:59] <_joe_> the official language of puppet documentation will be swedish!
[14:13:18] <_joe_> (ref. Woody Allen's Bananas)
[14:15:28] mark: you mean in puppt 4 and 5 respectively
[14:17:13] <_joe_> I can't really understand why one would want to remove it - does it creates a great deal of blockers anywhere in the development of new features?
[14:17:24] import you mean?
[14:17:30] <_joe_> it's not even something anyone used outside of site.pp
[14:17:32] <_joe_> chasemp: yes
[14:17:52] <_joe_> it's just harming existing install bases without a good reason
[14:17:53] yeah I had the same thought, although I think the design patterns it allows for are considered not friendly
[14:18:05] like if you have puppet/templates/foo.erb
[14:18:20] and you try to run via puppet apply it trips all over itself in many instances
[14:18:35] I also know that puppet internally, like their labs last I spoke to them has a specific test case/ use case
[14:18:43] they weren't using masters, it was all puppet apply and cron
[14:18:49] which seems insane, considering...they are puppet
[14:19:14] <_joe_> lol
[14:19:17] but my instinct is based on internal use patterns and the heavy modules push import _shouldn't_ be required, idk, more harm than good?
[14:19:25] <_joe_> talk about eating your own dogfood
[14:19:33] I was genuinely wtf'ed by it
[14:19:49] <_joe_> chasemp: it's not that we need import, it's that we have a ton of code written using import
[14:20:11] yep, I feel ya, just saying I think that's their point of view
[14:20:17] <_joe_> so why force us to a tedious, dangerous migration that will make us miserable without a real reason?
[14:20:34] <_joe_> It's clear it's not their POV
[14:20:45] well it's very unruby but I guess 'one way to do it'? type thinking
[14:21:05] keep in mind a lot of their $$ is in supporting enterprise deployments so they are probably learning as they go along now
[14:21:20] and maybe they learned it's complicated to debug or why support to patterns idk
[14:21:26] two patterns I mean
[14:22:01] <_joe_> yeah, ok, it's business-driven development. which most of the time produces shit.
[14:22:16] true
[14:22:43] <_joe_> (not that it can't create great code, it's only software designed to the needs of the business and not of the user)
[14:23:17] <_joe_> now you can stop thinking about Larry Ellison :P
[14:24:31] (PS1) Giuseppe Lavagetto: Fix dynamic scoping in iptables.pp [operations/puppet] - https://gerrit.wikimedia.org/r/132945
[14:26:56] I feel icky anytime I'm toeing a line that feels like defending puppet. Puppet is like the tax guy, I like driving on good roads but I still don't enjoy handing over my money.
[14:27:46] (CR) Rush: [C: 1] "I believe this is right" [operations/puppet] - https://gerrit.wikimedia.org/r/132945 (owner: Giuseppe Lavagetto)
[14:28:13] <_joe_> chasemp: the point is that puppet is like the tax guy, but you have the possibility to pick another 5-6 tax services that suck at the same level, but no other one is trying that hard to render its users miserable.
[14:28:43] <_joe_> and I mean, I don't dislike puppet per-se
[14:28:54] <_joe_> it's just all this deprecation nonsense
[14:28:57] oh you love it
[14:29:09] I heard you singing in the shower about it in athens, pretty sure from the hallway
[14:29:11] :)
[14:29:30] <_joe_> this is a lie, and I can prove it.
[14:29:51] <_joe_> If you ever heard me sing, you will still wake up at night with nightmares
[14:30:02] :D
[14:30:30] (CR) Rush: [C: -1] "hey man, i think still missing python-twisted-web?" [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449 (owner: Gage)
[14:34:12] puppet is pretty awful actually, it's just there's no better option yet, so it's what we've got. The same could be said of the state of most software in the world :)
[14:36:29] I recently found a 1200 line shell mess I used at a place 8 years ago before puppet to do build standards so.....I guess perspective :)
[14:36:32] <_joe_> bblack: mmmh not really, there are alternatives to puppet that do not suck more that it
[14:37:27] <_joe_> *than
[14:37:55] <_joe_> bblack: I think if you can start fresh today you'll probably use ansible
[14:38:12] (PS1) Jackmcbarn: Restrict the move-categorypages right on enwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132947 (https://bugzilla.wikimedia.org/65221)
[14:38:14] _joe_: was that used at expedia?
[14:38:22] <_joe_> chasemp: nah, puppet :)
[14:38:41] no one I know has lived with ansible, I'm still cautious about it
[14:38:54] <_joe_> chasemp: that was us unix freaks, I'm sure windows platform had something more horrible
[14:41:21] at the devops meetup here it's all chef folks. literally 125 members, 20-ish at a time and I'm the only puppet person
[14:41:30] which I find weird but they all seem to love it
[14:44:22] <_joe_> oh yeah chef
[14:45:36] <_joe_> overengineered, in ruby, and with a crazy representation of the state of a system. I have love words for all these systems :)
[14:50:49] manybubbles: So which of us should do the SWAT today?
[14:50:56] _joe_and chasemp: our local devops group is run by a chef employee and one of the members works for ansible. so long as you use something for configuration management, they are happy with you
[14:51:30] anomie: either way is fine with me. I can do it just to say I've deployed while at a coffee shop.
[14:51:37] manybubbles: Go for it
[14:53:16] anyone around to support the visual editor changes? just check if they worked and debug them if they break the world?
[14:53:34] if so I'll merge them to the deployment branches and then build the submodule update
[14:53:39] or you can, kind stranger
[14:53:56] manybubbles: they totally work
[14:56:46] (CR) Giuseppe Lavagetto: [C: 2] Fix dynamic scope lookup in openldap module. [operations/puppet] - https://gerrit.wikimedia.org/r/132942 (owner: Giuseppe Lavagetto)
[14:58:51] (CR) Giuseppe Lavagetto: [C: 2] Explicitly set config in openstack template. [operations/puppet] - https://gerrit.wikimedia.org/r/132934 (owner: Giuseppe Lavagetto)
[15:00:35] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Mon May 12 11:59:40 2014
[15:00:35] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Mon May 12 15:00:32 UTC 2014
[15:07:51] _joe_, chasemp: yeah there are other tools, but puppet really is (imho) the best option today still, in spite of its flaws. But there's a long way to go yet. People will write better tools eventually.
[15:08:32] puppet has enjoyed much 'early mover' benefits not due to any special quality or robustness
[15:08:39] an interesting historical sidenote in all of this that's sometimes not linked enough: https://www.usenix.org/legacy/publications/library/proceedings/lisa02/tech/full_papers/traugott/traugott_html/index.html
[15:09:11] note Luke Kanies was (one of the) authors on that
[15:09:33] but the ideas in there really get at the heart of why dependency hell is necessary, although not necessarily how best to solve it.
[15:09:49] that paper pre-dates puppet considerably, but it's part of what lead to it.
[15:09:59] this is good stuff, have not read this before
[15:11:20] sorry, I guess he wasn't an author of the paper, he was an author of one of the iterations of the tool ISConf mentioned in the paper :)
[15:16:29] twkozlowski: around for me to merge and deploy your change?
[15:16:39] still building the submodule updates for the ve change
[15:17:16] MatmaRex: regarding the ve changes - they contain i18n stuff so I imagine I'll have to scap
[15:18:01] manybubbles: uhm, possibly. i don't know about deployments
[15:18:44] MatmaRex: k. I'll do it. would you prefer that I deploy both changes at the same time (faster, less work for me) or the wmf4 first so you can verify it on test2/mw.org
[15:18:48] basically, how confident are you?
[15:18:53] <_joe_> bblack: well, puppet _is_ a state machine, sort of :P
[15:19:19] manybubbles: it definitely works :)
[15:19:26] (both changes that it)
[15:19:33] that is*
[15:23:01] _joe_: it's more a declarative meta-programming language for the entire turing machine that comprises a single host, or arguably for the larger turing machine which is built from many hosts. But there are lots of gaps in what puppet can do about the problem when you start looking at it like that :)
[15:36:46] back1
[15:36:53] i lost network....
[15:37:01] anomie: did you happen to pick up where I left off?
[15:37:56] manybubbles: I didn't notice you disappeared
[15:38:06] anomie: sweet. just ignore the email I sent you. scapping
[15:38:13] !log manybubbles Started scap: update visual editor for swat deploy
[15:38:19] Logged the message, Master
[15:40:21] * anomie was busy code reviewing
[15:42:38] anomie: that's all I've done this morning too!
[15:43:00] I'm running scap in screen just in case I lose wifi again....
[15:49:54] Reedy: scap question: is this ok? on mw1186 returned [255]: ssh: connect to host mw1186 port 22: Connection timed out
[15:54:55] twkozlowski: I had a network outage that delayed my swat progress - I'm just finishing up deploying the ve change. It'll take a few minutes.
[15:55:15] there isn't anything on the calendar after so I can deploy your config change when that is done if you'd like to check it
[16:00:02] okay
[16:07:11] !log manybubbles Finished scap: update visual editor for swat deploy (duration: 28m 58s)
[16:07:18] Logged the message, Master
[16:07:26] MatmaRex: please verify the ve fix
[16:07:30] twkozlowski: doing yours
[16:07:38] (CR) Manybubbles: [C: 2] Enable Extension:NewUserMessage on ukwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132747 (https://bugzilla.wikimedia.org/65125) (owner: Odder)
[16:07:47] (Merged) jenkins-bot: Enable Extension:NewUserMessage on ukwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132747 (https://bugzilla.wikimedia.org/65125) (owner: Odder)
[16:09:48] !log manybubbles synchronized wmf-config/InitialiseSettings.php
[16:09:56] Logged the message, Master
[16:10:03] twkozlowski: SWATed
[16:11:12] i get broken ccs/js
[16:11:45] matanya: just for visual editor or for everything?
[16:11:47] * matanya checking ganglia for bits
[16:12:19] ^d: can you read scrollback and see if I did something wrong with the last scap? I haven't scapped in a while
[16:12:47] looks like only VE
[16:12:49] does anyone have a url that's broken?
[16:12:49] <^d> manybubbles: Only problem was that single host?
[16:13:13] and now it looks ok
[16:13:26] ^d: yeah, looks like I broke some static assets somehow
[16:13:42] <^d> Is mw1186 a bits box?
[16:14:27] mw1149-1152 are bits apaches
[16:14:28] manybubbles: hmm, i don't get the new message on plwiki
[16:14:44] but things seem to all work well apart from that
[16:14:59] mw1161-1188 are apaches (precise)
[16:14:59] http://i.imgur.com/zcMWXrB.png
[16:15:05] MatmaRex: ok. I'm not seeing anything broken on mw.org
[16:15:13] ^d: should just be a regular appserver
[16:15:27] manybubbles: https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&h=mw1149.eqiad.wmnet&m=cpu_report&s=by+name&mc=2&g=cpu_report&c=Bits+application+servers+eqiad
[16:15:40] see that little spike ?
[16:15:48] manybubbles: but it displays with uselang=en
[16:16:29] MatmaRex: yeah
[16:16:31] meh, it'll probably fix itself.
[16:16:54] so that machine didn't take any of my scap commands
[16:16:59] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Bits%2520application%2520servers%2520eqiad&tab=m&vn=&hide-hf=false
[16:17:10] all for have a little spike there
[16:17:16] four
[16:17:30] matanya: are those spikes normal when we static assets/
[16:17:37] I figured we'd get _some_ spike
[16:18:04] not an ops, so just ignore me :)
[16:18:20] mark: cloud monitor sent an email to the ops list about fobidden
[16:18:27] yeah
[16:18:34] were any assets removed or something?
[16:18:42] mark: I sure don't _think_ so
[16:19:25] mark: I synced some modifications to js files, but nothing moved or removed
[16:20:35] twkozlowski: is your thing working?
[16:20:57] (PS2) Giuseppe Lavagetto: Fix dynamic scope lookup in maintenance.pp [operations/puppet] - https://gerrit.wikimedia.org/r/132933
[16:21:07] (CR) Giuseppe Lavagetto: [C: 2] Fix dynamic scope lookup in maintenance.pp [operations/puppet] - https://gerrit.wikimedia.org/r/132933 (owner: Giuseppe Lavagetto)
[16:21:29] http://bits.wikimedia.org/skins/common/images/poweredby_mediawiki_88x31.png
[16:21:33] that's the check
[16:22:12] mark: I don't believe I intentionally modified that file. I did scap this morning. I normally sync-file/sync-dir during swat deploys
[16:22:20] hmm, beta is forbidden too, it seems
[16:22:37] ok, here's the URL that watchmouse check actually loads
[16:22:45] https://bits.wikimedia.org/skins/common/images/poweredby_mediawiki_88x31.png
[16:22:51] ^d:
[16:22:57] oops, too late
[16:23:03] thanks
[16:23:21] hmmm
[16:23:38] beta issues seems unrelated.
[16:24:08] manybubbles: sorry, been food
[16:24:33] matanya: that is good.
[16:24:45] is it possible we started blocking the cloudmonitor by user agent or something (because forbidden)
[16:24:55] no
[16:24:59] try it in your browser, you get the same
[16:25:06] true
[16:25:08] MatmaRex: plwiki looks right to me, but I'm over https
[16:25:30] yeah, it looks okay now to me as well
[16:25:48] manybubbles: just so you have ref: http://en.wikipedia.beta.wmflabs.org/wiki/Dido_Sotiriou
[16:25:57] click any image
[16:26:05] <^d> mark, mutante: fwiw, nothing's changed in mw/core in skins/common/images/* in over a month, and nothing from the last 5ish commits to that directory involve that file at all.
[16:26:11] so scapping changed permissions on /skins/common/images somehow?
[16:26:20] uhm
[16:26:25] <^d> My guess is permissions too.
[16:27:40] should these be in /a/common? that is empty on mw1149
[16:28:03] manybubbles: I confirm the patch works. Thanks!
[16:28:10] twkozlowski: great! one down
[16:28:13] <^d> manybubbles: No, /a/common/ is just on the deploy host. It should be in /usr/local/apache/common-local/*
[16:29:28] root@mw1149:/usr/local/apache/common# ls -ld php
[16:29:28] lrwxrwxrwx 1 mwdeploy mwdeploy 13 Mar 21 13:39 php -> php-1.23wmf18
[16:29:32] broken link
[16:29:50] <^d> Yeah, that'll do it
[16:30:06] php-1.24wmf4
[16:30:18] mark: that looks right. I'll bet someone removed the directory when cleaning up old versions.
[16:30:40] yeah
[16:30:42] should I swap the link to php-1.24wmf4 on tin and sync it?
[16:30:49] I think so yes [16:30:53] there was that change when reedy removed old versions, but before weekend.. [16:31:21] I'm probably the first scap since then [16:31:42] the file is there in 1.24wmf4 and owned by mwdeploy [16:32:46] (PS1) Manybubbles: Fix symlink [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132953 [16:33:02] https://gerrit.wikimedia.org/r/#/c/132483/ [16:33:09] Remove 1.23wmf13 through 1.23wmf20 [16:33:19] ^d: https://gerrit.wikimedia.org/r/#/c/132953/ does that look right? [16:33:25] that would have hit -> php-1.23wmf18 [16:33:38] (CR) Chad: [C: 2] Fix symlink [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132953 (owner: Manybubbles) [16:33:44] I'll deploy it then [16:33:45] (Merged) jenkins-bot: Fix symlink [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132953 (owner: Manybubbles) [16:34:20] Reedy: [16:35:00] ^d: can't sync-file it because 'it is a directory' [16:35:04] scap? [16:35:12] <^d> Yes [16:35:27] !log manybubbles Started scap: fix php symlink [16:35:27] manybubbles: sync-dir ? [16:35:34] Logged the message, Master [16:35:44] <^d> I'm paranoid right now, scap's a better choice. [16:35:46] ok [16:35:48] mutante: I fear that'd do the wrong thing to a symlink [16:36:27] yep, understand [16:36:28] it'll be about 30 minutes before we hear back from scap [16:36:45] <^d> Well if you just scapped a bit ago it shouldn't be too bad. [16:36:48] <^d> No i18n cache to rebuild. [16:38:47] <_joe_> hey I was away for some minutes, are we sure this is scap related? [16:39:20] _joe_: yea, , recovered [16:39:21] just now [16:39:36] did it just recover? [16:39:37] that looks good [16:39:39] yes [16:39:43] <_joe_> mutante: oh ok, I just got off the minute this happened :) [16:39:52] !log manybubbles Finished scap: fix php symlink (duration: 04m 25s) [16:39:57] much faster!
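The breakage traced above came down to a `php` symlink under /usr/local/apache/common that still pointed at php-1.23wmf18 after that directory had been removed. A minimal sketch of a check for this failure mode follows. It assumes Python 3 and read access to the tree. The function name `find_dangling_symlinks` is hypothetical and is not part of scap or any Wikimedia tooling.

```python
import os


def find_dangling_symlinks(directory):
    """Return (name, target) pairs for symlinks in `directory` whose
    targets no longer exist (hypothetical helper, not part of scap).

    os.path.islink() is true for the link itself, while os.path.exists()
    follows the link and is false once the target has been removed.
    """
    dangling = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append((name, os.readlink(path)))
    return dangling


if __name__ == "__main__":
    # Example path taken from the log; skipped if it is not present locally.
    tree = "/usr/local/apache/common"
    if os.path.isdir(tree):
        for name, target in find_dangling_symlinks(tree):
            print(f"{name} -> {target} (target missing)")
```

Applied to the tree on mw1149 before the fix, a check like this would have reported `php -> php-1.23wmf18`. Running it after removing old branches, rather than waiting for an external monitor to report 403s, is the design point.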
[16:39:59] Logged the message, Master [16:40:10] I just got the recovery email [16:40:12] <_joe_> ok see you later mutante [16:40:20] so what actually broke? [16:40:29] <^d> The symlink [16:40:39] old versions got removed but the symlink wasnt updated [16:40:40] ^d: I _know_ that. what would a user see? [16:40:46] that still pointed to an old version [16:41:02] I'll send an outage report but I need to know what the user sees [16:41:19] i'm not sure it affected much [16:41:32] i got one broken js/css [16:41:47] fixed a moment later [16:41:52] <^d> mark: Yeah I think watchmouse noticed it and we got the fix out before things started disappearing from varnish. [16:42:01] <^d> varnish and/or people's local browser caches. [16:42:16] ^d: so long as their browser cache is hot for us, which is pretty likely [16:42:33] <^d> My browser cache is hot for wiki, is yours? [16:42:48] <^d> ;-) [16:42:50] always [16:45:07] ^d: add as a bugzilla quip [16:46:28] !log mw1186 - down, powercycling [16:46:35] Logged the message, Master [16:46:51] matanya: was that spike related to the broken symlink or just any deploy? [16:47:11] good question manybubbles i don't know [16:47:17] thanks [16:49:05] RECOVERY - Host mw1186 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [16:49:19] !log disabled mw1186 in pybal [16:49:25] manybubbles: ^d , so .. ^ [16:49:26] Logged the message, Master [16:49:33] that was re: your message about mw1186 [16:49:49] it would be out of sync now, so disabled it [16:49:49] mutante: it just came up? [16:49:53] ah [16:49:56] yeah [16:50:00] I guess I can scap it again? 
[16:50:06] manybubbles: checked last scap, see https://ganglia.wikimedia.org/latest/?r=hour&cs=04%2F29%2F2014+00%3A00+&ce=04%2F30%2F2014+00%3A00+&m=cpu_report&s=by+name&c=Bits+application+servers+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4 [16:50:15] no such spike [16:50:29] matanya: cool [16:50:32] so i guess this one is related, somehow [16:51:15] PROBLEM - Apache HTTP on mw1186 is CRITICAL: Connection refused [16:52:52] manybubbles: yea, sync it if you can just hit that single host.. it's disabled though [16:53:19] mutante: I'll just run another scap and it should come back into sync with all the others. scap should be pretty quick this time [16:53:25] ^d: ^^^ make sense? [16:53:52] <^d> There's a way to kick it off just on the individual apache but I can't remember what it is. [16:54:15] RECOVERY - Apache HTTP on mw1186 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.114 second response time [16:55:20] but it won't hurt anything to just scap again [16:55:48] <^d> I guess, yeah [16:57:45] over time i've found the statement 'it wont hurt to scap again' can be a dangerous tempter of server gremlins [16:57:57] !log manybubbles Started scap: scapping again to get ms1186 synced up [16:58:04] Logged the message, Master [16:58:10] RobH: woops, tempting gremlins [16:58:15] mw. but meh [16:58:28] matanya: on no! [16:58:35] sysadmining = modern day witchdoctory! [16:58:46] voodoo computing [16:58:54] !log manybubbles Finished scap: scapping again to get ms1186 synced up (duration: 00m 56s) [16:59:01] Logged the message, Master [16:59:33] RobH: looks like no whammy this time [16:59:45] mw1186 should be synced. [17:00:04] and I've logged off of tin [17:00:09] I'm done with it for the morning [17:00:13] anomie: you missed all the fun! [17:00:21] RobH: can you look at https://etherpad.wikimedia.org/p/nodes_with_a_public_IP and comment where you name is please ? 
[17:00:38] or question marks [17:00:52] !log re-enabled mw1186 in pybal [17:00:59] Logged the message, Master [17:02:06] thanks mutante [17:02:50] np [17:03:08] we have 1 'unknown' in icinga left.. UNKNOWN: Service is flapping: 7 data below and 8 above the confidence bounds [17:03:24] for HTTP error ratio anomaly detection [17:03:35] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Mon May 12 14:03:07 2014 [17:03:52] can you repaste that etherpad link [17:03:57] my laptop got all wonky and i had to reboot [17:04:06] https://etherpad.wikimedia.org/p/nodes_with_a_public_IP [17:06:35] matanya: i ask about firewall rules because some of them are defined in site.pp what they are doing [17:06:41] so not sure why it says esams cache? [17:06:59] the roles the server is holding [17:07:14] let me rephrase, what is the intent of this document? [17:07:43] track what needs firewall rules and prioritize [17:16:59] matanya: search for hostname and status isn't open.. there are boron tickets [17:17:18] used to be fr and now backup, afaik [17:17:51] Jeff_Green: would know [17:18:09] whut [17:18:19] Jeff_Green: what boron is used for today [17:18:34] and if it should have firewall [17:18:36] (PS4) Gage: initial debianization [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449 [17:18:43] oic. it's in frack [17:19:08] behind hardware firewall and it has frack-puppetized iptables rules too [17:19:11] should it stay in site.pp ? [17:19:15] in non-frack [17:19:26] not sure [17:19:46] matanya: so, it has a different puppetmaster [17:19:59] (CR) Gage: "Changed python-twisted-core dep to python-twisted-web (which depends on the former)" [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449 (owner: Gage) [17:20:10] Jeff_Green: is beryllium like that ?
[17:20:31] mutante: boron doesn't get any config from prod puppet, but I'm not sure if anything in prod puppet breaks (monitoring comes to mind) if that goes away [17:20:39] beryllium no, no idea what that is [17:21:43] Jeff_Green: i see, yea, it's in icinga [17:21:49] that would break [17:21:53] i suppose [17:22:03] ok, thanks [17:22:18] dunno. icinga gets at least some of its config from the nsca_* files [17:23:06] yea, we can make a change for that and look closer [17:24:04] the only other reason I can think of for it to be in site.pp is that people refer to it for info on what a host does [17:24:33] beryllium, i don't know either [17:24:49] RobH: is that just a spare? [17:25:06] uhh, i dunno, will check shortly [17:25:13] no rush [17:25:16] if its not on my spare page then it may be an unassigned server in limbo [17:25:19] Gah. Is enwiki's i18n cache breaking generally, or just for ? Have had to create the local MW: value to fix it breaking in the last few minutes. [17:26:56] <_joe_> Jeff_Green: what host? boron? [17:27:11] James_F: i saw that for a minute on pl.wp, but it fixed itself it seems [17:27:17] _joe_: any frack host really [17:27:23] MatmaRex: :-( [17:27:53] _joe_: we don't have a great canonical source of host information, so people use site.pp sometimes [17:29:18] <_joe_> Jeff_Green: meh. 
[17:29:39] _joe_: seconded [17:29:47] <_joe_> Jeff_Green: I'd have plans for this, but first I must learn the whole thing :) [17:30:07] ya [17:35:11] (CR) Rush: [C: 1] "looks nice to me and since it seems like filippo's concerns were also addressed I feel this is gtg" [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449 (owner: Gage) [17:39:17] woo [18:03:35] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Mon May 12 18:03:26 UTC 2014 [18:27:58] (PS3) Gergő Tisza: FUTURE: Eighth batch of pilot sites for Media Viewer [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/129828 (owner: MarkTraceur) [18:28:00] (PS1) Gergő Tisza: FUTURE: Sixth batch of pilot sites for MediaViewer [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132967 [18:28:02] (PS1) Gergő Tisza: FUTURE: Seventh batch of pilot sites for Media Viewer [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132968 [18:30:51] greg-g, we are having some issues with the zero, can we do a quick deploy later today? [18:36:15] _joe_: http://osv.io/ speaking of virtualition :) [18:57:13] (PS1) Odder: Set $wgCategoryCollation to 'uca-cs' on cswiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132975 (https://bugzilla.wikimedia.org/64885) [19:08:21] (PS1) ArielGlenn: snapshot module: fix var ref in template to use @ for puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/132977 [19:10:20] (CR) ArielGlenn: [C: 2] snapshot module: fix var ref in template to use @ for puppet 3 [operations/puppet] - https://gerrit.wikimedia.org/r/132977 (owner: ArielGlenn) [19:16:55] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 12 data above and 9 below the confidence bounds [19:42:55] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [20:08:13] (CR) Faidon Liambotis: [C: -1] "Multiple issues inline."
(12 comments) [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449 (owner: Gage) [20:11:15] !log deployed Parsoid d1c778ea3 [20:11:22] Logged the message, Master [20:11:22] * gwicke is waiting for the restart to finish [20:27:09] (PS1) Matanya: blog: moving firewall to node level [operations/puppet] - https://gerrit.wikimedia.org/r/133018 [20:30:19] * gwicke was done ~10 minutes ago [20:30:43] (PS1) Matanya: blog: remove nrpe [operations/puppet] - https://gerrit.wikimedia.org/r/133019 [21:32:11] greg-g, ping [21:57:20] (PS1) Dzahn: add ferm rule to allow smtp from mchenry/sodium to magnesium [operations/puppet] - https://gerrit.wikimedia.org/r/133025 [21:58:53] (CR) Rush: [C: 2] "yup needed, not getting emails to rt right now" [operations/puppet] - https://gerrit.wikimedia.org/r/133025 (owner: Dzahn) [22:00:37] (CR) Dzahn: [C: 2] add ferm rule to allow smtp from mchenry/sodium to magnesium [operations/puppet] - https://gerrit.wikimedia.org/r/133025 (owner: Dzahn) [22:06:45] (PS1) Dzahn: specify protocol as TCP in ferm rule for RT smtp [operations/puppet] - https://gerrit.wikimedia.org/r/133026 [22:08:06] (CR) Dzahn: [C: 2] "tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN" [operations/puppet] - https://gerrit.wikimedia.org/r/133026 (owner: Dzahn) [22:13:58] (PS1) Dzahn: RT ferm rule: fix protocol, it's tcp not smtp [operations/puppet] - https://gerrit.wikimedia.org/r/133028 [22:15:38] (CR) Dzahn: [C: 2] RT ferm rule: fix protocol, it's tcp not smtp [operations/puppet] - https://gerrit.wikimedia.org/r/133028 (owner: Dzahn) [22:15:52] <^d> I'm poking around in graphite but I can't seem to find where a particular gdash dashboard is generated. Help? [22:16:06] <^d> Maybe I'm looking in the wrong place?
[22:17:06] (CR) Dzahn: "ACCEPT tcp -- mchenry.wikimedia.org anywhere tcp dpt:smtp" [operations/puppet] - https://gerrit.wikimedia.org/r/133028 (owner: Dzahn) [22:17:55] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 3 below the confidence bounds [22:18:37] <^d> Ah, files/gdash/dashboards/* in puppet. [22:18:41] <^d> Answered my own question. [22:36:41] (PS1) Chad: Rewrite search latency metric to track new search [operations/puppet] - https://gerrit.wikimedia.org/r/133030 [22:49:10] (PS2) Gerrit Patch Uploader: Show AbuseFilter log hits on IRC for wikis where logs public [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130274 (https://bugzilla.wikimedia.org/64255) [22:49:12] (CR) Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130274 (https://bugzilla.wikimedia.org/64255) (owner: Gerrit Patch Uploader) [23:04:48] greg-g, is not around, gwicke is hopefully done with depl, i will go next [23:05:12] ^d deploying osmething? [23:15:39] ahem, there are tons of exceptions from Export.php... [23:17:28] jgage, do you know anything about it? [23:17:53] sorry, i do not [23:18:44] hmm... well, the fatalmonitor has tons of warnings... [23:19:02] ok, i need to depl something minor, unrelated, and will poke [23:20:26] (PS1) Rush: changeup diamond collection to 60s [operations/puppet] - https://gerrit.wikimedia.org/r/133035 [23:20:28] (PS1) Rush: rollout diamond in standard for precise only [operations/puppet] - https://gerrit.wikimedia.org/r/133036 [23:24:57] !log yurik synchronized php-1.24wmf4/extensions/ZeroRatedMobileAccess/ [23:25:04] Logged the message, Master [23:28:25] !log yurik synchronized php-1.24wmf3/extensions/ZeroRatedMobileAccess/ [23:28:33] Logged the message, Master [23:30:03] <^d> yurik: No, I'm not.
[23:30:15] too late, already deployed :)) [23:30:15] <^d> And yes, the things with Export are known-ish. [23:30:24] thx! [23:38:20] (CR) Dzahn: [C: 1 V: 1] changeup diamond collection to 60s [operations/puppet] - https://gerrit.wikimedia.org/r/133035 (owner: Rush) [23:40:32] (CR) Dzahn: [C: 1] rollout diamond in standard for precise only [operations/puppet] - https://gerrit.wikimedia.org/r/133036 (owner: Rush) [23:41:30] ^d, is there a way to flush varnish for a specific url - http://zero.wikipedia.org [23:41:53] I think there is [23:41:55] <^d> Dunno? Never done anything with that. [23:42:20] There's instructions on wikitech somewhere. [23:42:35] It takes a root I think. [23:42:43] isn't it just doing a PURGE ? [23:43:13] or did varnish change that? [23:44:38] There's a maintenance script for purges I think, but there is deeper magic possible too. [23:45:22] unfortunately there is more than just one page it seems - there are 3 redirects, each of which should be flushed [23:46:17] https://wikitech.wikimedia.org/wiki/Varnish#One-off_purges [23:46:46] yurik: bblack knows how if he's around [23:46:56] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds [23:47:05] bblack is the king of varnish flushes [23:47:35] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Mon May 12 20:46:59 2014 [23:48:41] hi! [23:49:20] i have a question concerning the limit of thumb.php [23:50:10] got the message "As an anti-spam measure, you are limited from performing this action too many times in a short space of time, and you have exceeded this limit.
Please try again in a few minutes" and would like to ask if it is somehow possible to change the limit [23:52:05] (PS3) Tim Starling: rm trailing slash from destinations where unneeded [operations/apache-config] - https://gerrit.wikimedia.org/r/106110 (owner: Jeremyb) [23:52:41] I wonder where that text comes from. I looked in the operations repos, but couldn't see it. [23:53:48] <^d> mediawiki/core [23:53:50] <^d> languages/i18n/en.json [23:53:50] <^d> 351: "actionthrottledtext": [23:54:09] springle, you might be interested in https://gerrit.wikimedia.org/r/#/c/132378/7/includes/api/ApiWatch.php :P [23:54:16] ^d: Fair play. Thanks. [23:54:20] bd808|MOBILE, thanks! he is apparently gone - who has root on varnish servers? https://wikitech.wikimedia.org/wiki/MobileFrontend#Flushing_the_cache [23:54:26] looks like it will cause horrible replication lag [23:54:30] <^d> Gloria: yw [23:55:05] Gloria: there shouldn't be too many requests, that app had about 500 downloads [23:55:25] Flexman: Ah, if you read through thumb.php, you can see the error you're hitting. [23:55:41] > [23:55:41] $user = RequestContext::getMain()->getUser(); [23:55:41] if ( $user->pingLimiter( 'renderfile' ) ) { [23:55:41] wfThumbError( 500, wfMessage( 'actionthrottledtext' ) ); [23:55:41] return; [23:55:41] > [23:55:53] bblack, when you are around, could you flush all mobile varnishes - zero is misbehaving, we are trying to figure out the cause [23:55:59] Flexman: If it's a mobile app, I doubt an IP address exemption would help you... [23:56:15] (CR) Tim Starling: [C: 2] "It's a nitpick either way. Obviously the script normalises the slashes, because in redirects.conf, only the comments change. Merging becau" [operations/apache-config] - https://gerrit.wikimedia.org/r/106110 (owner: Jeremyb) [23:56:34] Gloria: that app has to do the requests over one server unfortunately, but this means that all requests will come from the same IP [23:56:45] Ahh.
[23:57:18] <^d> current limits are: [23:57:20] <^d> 'renderfile' => array( [23:57:21] <^d> // 1400 new thumbnails per minute [23:57:21] <^d> 'ip' => array( 700, 30 ), [23:57:22] <^d> 'user' => array( 700, 30 ), [23:57:24] <^d> ), [23:57:29] <^d> for wmf wikis. [23:57:45] however what does 700, 30 mean in this case? [23:57:51] The comment above explains. [23:58:10] 700 per 30 seconds. [23:58:14] You can generate 700 thumbnails in 30 seconds, I guess. That sounds high. [23:58:17] hmm this can't be... i never created so many thubs... [23:58:32] after about 15 thumbs this failed to work [23:59:00] I think that may be 700 per 30 seconds for all anonymous users? [23:59:15] ah ok that could explain [23:59:26] Yeah, otherwise that limit sounds way too high. [23:59:29] * bd808|MOBILE can't browse code on phone [23:59:32] yep.. [23:59:43] so that means if i use some registered user, this will be the limit for the user? [23:59:49] <^d> the ping limiter is per-ip. [23:59:59] <^d> Or per-user, if logged in
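The `'renderfile'` entries quoted above are (count, seconds) pairs: `array( 700, 30 )` allows 700 actions per 30-second window, which matches the "1400 new thumbnails per minute" comment. The sketch below shows those semantics as a fixed-window counter. Following ^d's description, it is keyed per IP for anonymous requests and per user when logged in. It is a hypothetical illustration only, not MediaWiki's `pingLimiter()` code. The names `FixedWindowLimiter` and `limiter_key` are invented, and an in-memory dict stands in for whatever shared store a real deployment would use.

```python
import time


class FixedWindowLimiter:
    """Hypothetical fixed-window limiter: at most `max_count` actions per
    `period` seconds for each key (e.g. an IP or a user name)."""

    def __init__(self, max_count, period, clock=time.monotonic):
        self.max_count = max_count
        self.period = period
        self.clock = clock
        self.windows = {}  # key -> (window_start, count); in-memory only

    def hit(self, key):
        """Record one action for `key`; return True if it is over the limit."""
        now = self.clock()
        start, count = self.windows.get(key, (now, 0))
        if now - start >= self.period:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        count += 1
        self.windows[key] = (start, count)
        return count > self.max_count


def limiter_key(ip, user_name=None):
    # Per-user when logged in, per-IP otherwise, as described in the log.
    return f"user:{user_name}" if user_name else f"ip:{ip}"


# Mirrors 'ip' => array( 700, 30 ) from the quoted config.
renderfile_limiter = FixedWindowLimiter(700, 30)
```

One consequence of per-IP keying is relevant to Flexman's setup: if every app request is proxied through one server, all of that app's users share a single `ip:` bucket. Whether that alone explains the failure after about 15 thumbnails is not settled in this log.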