[00:04:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014
[00:18:17] !log bouncing elasticsearch on elastic1015 to pick up gc logging configuration. it might warn but shouldn't cause any service disruption.
[00:18:23] Logged the message, Master
[00:58:24] (PS1) Dr0ptp4kt: Provide read only access right zero-script to zeroscript group. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133200
[01:50:38] https://en.wikipedia.org/wiki/Switzerland?forceprofile=true
[01:50:43] 21.05% 0.301830 225 - hook: LanguageGetTranslatedLanguageNames
[01:50:46] ori: lol
[01:51:01] * AaronSchulz should always browse while logged out ;)
[01:51:28] so much faster
[01:51:35] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Tue May 13 22:51:03 2014
[02:02:48] (CR) MZMcBride: "I believe this change can now be merged and deployed." [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/106761 (https://bugzilla.wikimedia.org/59893) (owner: 01tonythomas)
[02:13:05] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3791 MB (3% inode=99%):
[02:13:08] (PS1) Springle: Role for miscellaneous services DBs m[123], but currently only m3. Reassign db1043 to m3 since it no longer suits s1. [operations/puppet] - https://gerrit.wikimedia.org/r/133202
[02:14:26] !log LocalisationUpdate completed (1.24wmf3) at 2014-05-14 02:13:23+00:00
[02:14:37] Logged the message, Master
[02:15:57] (PS2) Springle: Role for miscellaneous services DBs m[123], but currently only m3. Reassign db1043 to m3 since it no longer suits s1. [operations/puppet] - https://gerrit.wikimedia.org/r/133202
[02:21:05] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3434 MB (3% inode=99%):
[02:21:09] !log upgrade db1043, rebuild as m3 master
[02:21:17] Logged the message, Master
[02:22:11] (CR) Springle: [C: 2] Role for miscellaneous services DBs m[123], but currently only m3. Reassign db1043 to m3 since it no longer suits s1. [operations/puppet] - https://gerrit.wikimedia.org/r/133202 (owner: Springle)
[02:26:12] !log LocalisationUpdate completed (1.24wmf4) at 2014-05-14 02:25:08+00:00
[02:26:19] Logged the message, Master
[02:36:52] (PS1) Springle: Add CNAMEs for m3-master and m3-slave. Yes, presently they point to the same box. [operations/dns] - https://gerrit.wikimedia.org/r/133203
[02:38:57] (CR) Springle: [C: 2] Add CNAMEs for m3-master and m3-slave. Yes, presently they point to the same box. [operations/dns] - https://gerrit.wikimedia.org/r/133203 (owner: Springle)
[02:50:16] (CR) Tim Starling: [C: 2] final (I hope!) fix for protorel redirects [operations/apache-config] - https://gerrit.wikimedia.org/r/106109 (https://bugzilla.wikimedia.org/31369) (owner: Jeremyb)
[02:53:30] !log deploying apache configuration change https://gerrit.wikimedia.org/r/106109
[02:53:36] Logged the message, Master
[03:00:00] I'm getting a weird error when I try to load any of the WMF sites in Firefox: "The page isn't redirecting properly. Firefox has detected that the server is redirecting the request for this address in a way that will never complete." Works fine in Chrome and Safari though.
[03:00:24] So, I'm sure someone else has reported it.
[03:00:28] Too many redirects.
[03:01:05] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:01:17] !log reverting apache change
[03:01:17] Yeah, just mentioned it.
[03:01:23] Logged the message, Master
[03:02:08] it's working again now
[03:02:22] on any page?
[03:05:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014
[03:11:42] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed May 14 03:10:36 UTC 2014 (duration 10m 35s)
[03:11:48] Logged the message, Master
[03:12:07] Someone mentioned the redirect issue on wikitech-l, but didn't provide a link.
[03:17:29] kaldari: did you get it on login?
[03:17:48] no, just trying to load the en.wiki or commons Main Page
[03:18:20] www.wikipedia.org worked, but not any of the language versions
[03:19:39] thanks for fixing it so fast :)
[03:34:24] I archived the server admin log.
[03:44:52] (PS1) Tim Starling: Don't overwrite CGI variable HTTP_X_FORWARDED_PROTO [operations/apache-config] - https://gerrit.wikimedia.org/r/133205
[03:52:49] (CR) Ori.livneh: [C: 1] Don't overwrite CGI variable HTTP_X_FORWARDED_PROTO [operations/apache-config] - https://gerrit.wikimedia.org/r/133205 (owner: Tim Starling)
[03:57:53] (CR) Tim Starling: [C: 2] Added a few tests for redirects.conf [operations/apache-config] - https://gerrit.wikimedia.org/r/133045 (owner: Tim Starling)
[04:03:03] (CR) Tim Starling: [C: 2] Don't overwrite CGI variable HTTP_X_FORWARDED_PROTO [operations/apache-config] - https://gerrit.wikimedia.org/r/133205 (owner: Tim Starling)
[04:18:04] !log deploying apache configuration change with fixes
[04:18:09] Logged the message, Master
[04:20:28] site still up?
[04:33:58] Seems so.
[04:34:22] But I'm wondering whether https://wiktionary.org and such are cached. curl still says they're going to http.
[04:35:03] $ curl -Is https://wiktionary.org | grep Location
[04:35:03] Location: http://www.wiktionary.org/
[04:35:08] $ curl -Is https://wiktionary.org?1 | grep Location
[04:35:08] Location: https://www.wiktionary.org/?1
[04:36:19] Same with https://wikipedia.org
[04:36:32] And wikiversity.org isn't redirecting to www, but maybe that's unrelated.
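Note: the curl checks above compare the scheme of the request with the scheme of the Location header, with and without a cache-busting query string. A minimal sketch of the same comparison in Python follows, assuming the response headers have already been captured as text (for example from `curl -Is`). The function names and the sample status line are hypothetical; only the two Location values come from the log.

```python
from urllib.parse import urlsplit


def parse_headers(raw: str) -> dict:
    """Parse `curl -Is`-style output into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.splitlines():
        if ":" not in line:
            continue  # status line or blank line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def redirect_keeps_scheme(request_url: str, raw_response: str) -> bool:
    """Return True if the Location header uses the same scheme as the request."""
    location = parse_headers(raw_response).get("location")
    if location is None:
        raise ValueError("response has no Location header")
    return urlsplit(location).scheme == urlsplit(request_url).scheme


# Hypothetical captured responses; the Location values are the ones pasted in the log.
cached = "HTTP/1.1 301 Moved Permanently\r\nLocation: http://www.wiktionary.org/\r\n"
fresh = "HTTP/1.1 301 Moved Permanently\r\nLocation: https://www.wiktionary.org/?1\r\n"

print(redirect_keeps_scheme("https://wiktionary.org", cached))
print(redirect_keeps_scheme("https://wiktionary.org?1", fresh))
```

A result that differs between the plain URL and the cache-busted one suggests the plain URL is being answered from a cache rather than by the new Apache configuration.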
[04:37:56] $ curl -Is https://wikibooks.org | grep Location
[04:37:56] Location: https://en.wikibooks.org/
[04:38:01] $ curl -Is http://wikibooks.org | grep Location
[04:38:01] Location: http://www.wikibooks.org/
[04:38:05] Also cute. ^
[04:42:43] < Location: http://www.wiktionary.org/
[04:42:46] < Age: 10342
[04:42:55] that means it's cached
[04:45:19] Okay, I'll try waiting a bit.
[04:52:35] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Tue May 13 22:51:03 2014
[05:42:42] (PS1) Ori.livneh: osmium: Git-clone luasandbox and fss as well [operations/puppet] - https://gerrit.wikimedia.org/r/133208
[05:43:47] (CR) jenkins-bot: [V: -1] osmium: Git-clone luasandbox and fss as well [operations/puppet] - https://gerrit.wikimedia.org/r/133208 (owner: Ori.livneh)
[05:45:22] (PS2) Ori.livneh: osmium: git-clone luasandbox and fss as well [operations/puppet] - https://gerrit.wikimedia.org/r/133208
[05:47:59] (CR) Ori.livneh: [C: 2] osmium: git-clone luasandbox and fss as well [operations/puppet] - https://gerrit.wikimedia.org/r/133208 (owner: Ori.livneh)
[06:06:36] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014
[06:18:15] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 3 below the confidence bounds
[06:35:35] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:57:31] <_joe_> graphite down again...
[07:16:44] (PS5) Matanya: applicationserver: lint and tidy [operations/puppet] - https://gerrit.wikimedia.org/r/122269
[07:44:25] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.024 second response time
[07:45:15] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[07:50:11] (CR) Giuseppe Lavagetto: Improve nginx TLS/SSL settings. (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: JanZerebecki)
[07:53:35] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Tue May 13 22:51:03 2014
[07:54:04] <_joe_> that seems bad
[08:07:13] hello
[08:16:02] hi James_F
[08:42:42] (PS1) Giuseppe Lavagetto: Fix dynamic scope lookup in templates. [operations/puppet] - https://gerrit.wikimedia.org/r/133213
[08:52:04] any sysadmin here?
[08:52:55] what's up?
[08:53:06] <_joe_> cortexA9: state what do you need, someone will answer :)
[08:53:36] i wanna contribute as a volunteer
[08:53:40] _joe_: nitpick: no trailing period in commit messages' first lines :)
[08:53:55] :)
[08:56:16] cortexA9: as told you in -tech
[08:56:39] _joe_: thanks for fixing that protoproxy hell
[08:57:02] <_joe_> paravoid: aw I did that?
[08:58:03] Fix dynamic scope lookup in templates.
[08:59:21] <_joe_> paravoid: wait, I should not use it, sorry quite used to projects where that is required
[08:59:39] <_joe_> (just confronted my commits with other people's)
[09:00:38] <_joe_> paravoid: luckily enough that's easy to fix :)
[09:06:08] :)
[09:06:19] we mostly follow http://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines
[09:06:25] well, some of us do :)
[09:07:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014
[09:08:01] paravoid: any chance you find a moment to prioritize https://etherpad.wikimedia.org/p/nodes_with_a_public_IP for me ?
[09:11:29] prioritize for what?
[09:16:54] what to do first, and what is important, and what not
[09:17:10] firewall (base::firewall)
[09:17:23] (PS2) Giuseppe Lavagetto: Fix dynamic scope lookup in templates [operations/puppet] - https://gerrit.wikimedia.org/r/133213
[09:17:59] paravoid: added at the top of the etherpad
[09:18:08] oh for firewall
[09:18:27] I can't do that prioritization now, but useful quick tips: avoid *lvs*, *cp* and anything in Tampa
[09:19:07] that should cut your list down significantly, so hope it helps
[09:34:11] (CR) Giuseppe Lavagetto: [C: 2] Fix dynamic scope lookup in templates [operations/puppet] - https://gerrit.wikimedia.org/r/133213 (owner: Giuseppe Lavagetto)
[09:38:03] (PS4) Giuseppe Lavagetto: exim: fix scoping [operations/puppet] - https://gerrit.wikimedia.org/r/119496 (owner: Matanya)
[09:39:54] thanks paravoid
[09:40:17] <_joe_> matanya: the change seems good (the one on exim)
[09:40:29] oh, good, i'm glad
[09:40:57] <_joe_> matanya: I should add you to the puppet-compiler in labs, so that you can verify changes by yourself, it's very useful
[09:41:10] would love that
[09:41:29] <_joe_> matanya: in this case, I just did comparator --change 119496 --nodes aluminium.wikimedia.org to verify it's noop
[09:41:48] <_joe_> and then comparator --change 119496 --nodes aluminium.wikimedia.org --transition to verify it works exactly equal in puppet 3
[09:42:00] _joe_: any docs i should read ?
[09:42:20] <_joe_> matanya: https://wikitech.wikimedia.org/wiki/Puppet_migration
[09:42:31] <_joe_> most things, you already should know
[09:43:05] !log Setup BGP configuration for lvs300* on cr1-esams and cr2-knams, with elevated MEDs to keep them as last resorts
[09:43:09] Logged the message, Master
[09:43:22] <_joe_> you have instructions to run this in a vagrant box, but to do that you will need me to finish my fact-cleaning script
[09:43:46] <_joe_> so you may just use the labs machine while we get to have jenkins integration
[09:43:52] oh, i read that page.
[09:44:02] i can't afford vagrant on my box
[09:44:41] <_joe_> ok, the other option is you wait 1-2 days and I'll have full jenkins integration.
[09:44:48] ok, i'll do the stuff in labs, preferable anyways
[09:46:31] !log Started PyBal on lvs300* and established BGP sessions with the routers
[09:46:36] Logged the message, Master
[09:49:20] oh!
[09:49:38] awesome
[09:51:21] <_joe_> \o/
[09:59:25] (CR) Mark Bergsma: "Overall not bad, but you should separate things a bit better into "ircd" (the ircd puppet module), the wikimedia specific parts around tha" (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/132495 (owner: Rush)
[10:07:50] (CR) Mark Bergsma: [C: -1] "Sorry, getting rusty, forgot to -1... ;)" [operations/puppet] - https://gerrit.wikimedia.org/r/132495 (owner: Rush)
[10:11:03] (PS4) Alexandros Kosiaris: bacula: allow mysqldumps to be kept locally [operations/puppet] - https://gerrit.wikimedia.org/r/132214
[10:17:37] akosiaris: two things: 1) what is the fate of my torrus and etherpad modules, still relevant? 2) please merge https://gerrit.wikimedia.org/r/#/c/132922/ if appropriate.
[10:20:04] matanya: 1) the torrus module will probably need a lot of changes, see my various merges in the last week, the etherpad module needs some minor changes (I think I have pushed them in some patchset) 2) See my comment please. The other variables need a @ as well
[10:22:14] (PS4) Matanya: ferm: fully qualify variables [operations/puppet] - https://gerrit.wikimedia.org/r/132922
[10:22:36] akosiaris: sorry, i already fixed that, but didn't push :/ here it is ^
[10:23:53] matanya: ah nice. Running a test and merging :-)
[10:28:40] akosiaris: also, is amanda dead yet ?
[10:29:21] nope, but I got a local branch where I slowly kill parts of it. Need to push that really
[10:30:04] this https://gerrit.wikimedia.org/r/#/c/133104/ is also part of that :)
[10:31:59] nice, thanks :-)
[10:34:20] (CR) Mark Bergsma: "Looks like pretty solid work. :) Some comments inline." (6 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/129501 (owner: Rush)
[10:42:08] poor amanda
[10:42:10] why do you hate her
[10:42:21] i once had a job at a small lawyer firm
[10:42:27] where I set up Amanda for tape backups
[10:42:41] and I named all the tapes "Amanda-001" to "Amanda-014" or something like that
[10:42:51] then the assistants asked if Amanda was my girlfriend's name
[10:42:54] haha
[10:42:58] LOL
[10:43:12] great story
[10:48:17] (CR) JanZerebecki: Improve nginx TLS/SSL settings. (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: JanZerebecki)
[10:54:35] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Tue May 13 22:51:03 2014
[11:03:47] mark: https://rt.wikimedia.org/Ticket/Display.html?id=7485 is duplicate of https://rt.wikimedia.org/Ticket/Display.html?id=7451
[11:04:09] indeed
[11:04:13] thanks for submitting the patchset
[11:04:16] feel free to merge the tickets
[11:04:21] will do
[11:05:08] done
[11:13:20] (CR) JanZerebecki: Improve nginx TLS/SSL settings. (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: JanZerebecki)
[11:43:29] (PS3) Nuria: Upstart follows fork when starting celery. [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/130588 (https://bugzilla.wikimedia.org/63819)
[11:44:41] (CR) Nuria: Upstart follows fork when starting celery. (1 comment) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/130588 (https://bugzilla.wikimedia.org/63819) (owner: Nuria)
[11:48:35] _joe_: graphite doesn't seem very happy, still
[11:50:35] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:50:49] oh thank you icinga-wm
[11:52:11] <_joe_> paravoid: err sorry I am at lunch
[11:52:26] <_joe_> didn't set myself away, sorry
[11:55:31] no worries :)
[12:07:55] (PS1) Ottomata: Adding Leila Zia's new ssh key [operations/puppet] - https://gerrit.wikimedia.org/r/133224
[12:08:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014
[12:11:29] (CR) QChris: [C: -1] "I got pinged on IRC that Ori's suggestion of using exec would not" [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/130588 (https://bugzilla.wikimedia.org/63819) (owner: Nuria)
[12:12:43] (CR) Ottomata: [C: 2 V: 2] Adding Leila Zia's new ssh key [operations/puppet] - https://gerrit.wikimedia.org/r/133224 (owner: Ottomata)
[12:13:07] can anyone with shell access offer some help?
[12:13:27] liangent: what's up?
[12:13:42] ottomata: just want to diagnose https://bugzilla.wikimedia.org/show_bug.cgi?id=20825
[12:14:01] can I get the list of files in /usr/local/apache/common/fonts ?
[12:14:11] per putenv( "GDFONTPATH=/usr/local/apache/common/fonts" ); in CommonSettings.php
[12:15:46] liangent: https://gist.github.com/ottomata/be2e8f786a3e8d1bcd78
[12:17:39] ottomata: hmm got it, so wqy-zenhei.ttc is not there. what about /usr/share/fonts/truetype/wqy ? this is where that file is installed in a typical debian system
[12:18:12] nope, not there
[12:18:16] in truetype/ dir
[12:18:17] just
[12:18:18] ttf-dejavu
[12:19:02] ottomata: so I wonder where https://meta.wikimedia.org/wiki/SVG_fonts are stored...
[12:19:32] GDFONTPATH needs to be pointed to there .. and I guess this fixes 20825
[12:21:48] ottomata: can I get `fc-list | grep wqy`
[12:30:25] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.102 second response time
[12:30:38] <_joe_> !log restarted uwsgi on tungsten
[12:30:42] Logged the message, Master
[12:32:00] liangent: GD needs all font in the same folder https://bugzilla.wikimedia.org/show_bug.cgi?id=37968
[12:32:09] liangent: so we will have to copy paste files there
[12:32:19] (PS4) Nuria: Upstart follows fork when starting celery. [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/130588 (https://bugzilla.wikimedia.org/63819)
[12:32:24] <_joe_> paravoid: let's see if this is enough; if not, I'll add some workers
[12:33:30] liangent: I commented on bug 20825
[12:34:07] liangent: no output
[12:35:07] ottomata: grr so I guess wqy fonts are only installed in image scalers?
[12:35:23] because they work in pngs generated from svgs...
[12:38:55] hashar: re https://bugzilla.wikimedia.org/show_bug.cgi?id=37968 ... what I want to find is *.ttc
[12:42:37] liangent: here what we have in GDFONTPATH : http://paste.openstack.org/show/80352/
[12:42:53] the pngs generated from svgs uses librsvg which uses the font in the system
[12:43:04] ploticus uses libgd which reads fonts from GDFONTPATH
[12:43:58] (PS5) Gage: initial debianization [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449
[12:43:59] we should really phase out EasyTimeline in favor of a more modern solution
[12:45:39] hashar: yeah having size is good thing :) compare to the list provided ottomata - and I haven't thought about it
[12:45:50] fonts containing chinese glyphs are usually much bigger
[12:46:18] and I have no idea how to add new files in GDFONTPATH :(
[12:46:36] i.e. no idea where the reference repository is. Potentially it is only on the deployment server (tin)
[12:48:05] (CR) Gage: "Fixed:" [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449 (owner: Gage)
[12:51:45] (CR) Faidon Liambotis: "There are at least four BSD licenses. It's impossible to know which one the author meant and it's wrong to claim it's one of the four at r" [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449 (owner: Gage)
[12:53:35] (CR) Tim Landscheidt: [C: -1] "Needs to copy the changes in Ie16e5de91c68ce9c7e4a3332cd14431033e52135." [operations/apache-config] - https://gerrit.wikimedia.org/r/109460 (https://bugzilla.wikimedia.org/60222) (owner: Tim Landscheidt)
[12:54:05] (PS22) Matanya: etherpad: convert into a module [operations/puppet] - https://gerrit.wikimedia.org/r/107567
[12:58:51] (PS1) Ottomata: Removing class role::analytics::kraken::jobs::hive::partitions::external [operations/puppet] - https://gerrit.wikimedia.org/r/133225
[13:01:36] lol @ that class name
[13:01:56] haha
[13:02:10] i take it back! we need it for now too, i just need to make it not run on webrequest tables
[13:03:42] (PS1) Filippo Giunchedi: fix carbon destinations with additional workers [operations/puppet] - https://gerrit.wikimedia.org/r/133226
[13:04:44] (PS2) Ottomata: Only using hive-partitioner to create partitions on pagecounts table [operations/puppet] - https://gerrit.wikimedia.org/r/133225
[13:05:07] _joe_: I noticed not all workers were busy, fixed in https://gerrit.wikimedia.org/r/#/c/133226/
[13:05:26] (PS3) Ottomata: Only using hive-partitioner to create partitions on pagecounts table [operations/puppet] - https://gerrit.wikimedia.org/r/133225
[13:05:35] (CR) QChris: Upstart follows fork when starting celery. (2 comments) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/130588 (https://bugzilla.wikimedia.org/63819) (owner: Nuria)
[13:05:46] <_joe_> godog: yes I think your change is correct
[13:07:09] <_joe_> 2 more should be ok-ish
[13:07:36] (CR) Ottomata: [C: 2 V: 2] Only using hive-partitioner to create partitions on pagecounts table [operations/puppet] - https://gerrit.wikimedia.org/r/133225 (owner: Ottomata)
[13:07:58] yeah they were not receiving data anyway
[13:10:22] (CR) Giuseppe Lavagetto: [C: 1] fix carbon destinations with additional workers [operations/puppet] - https://gerrit.wikimedia.org/r/133226 (owner: Filippo Giunchedi)
[13:10:59] (PS1) Liangent: Set unifont-5.1.20080907.ttf for timeline on ZH projects [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133228 (https://bugzilla.wikimedia.org/20825)
[13:11:14] (CR) Gage: "I think I need to do this in order to publish my inline comments" (3 comments) [operations/debs/python-statsd] - https://gerrit.wikimedia.org/r/131449 (owner: Gage)
[13:11:26] hashar: ^ maybe we can choose an existing font for now
[13:11:59] but regarding "5.1.20080907" in font file name -- I'm not sure whether it will change after some system upgrade
[13:16:01] (CR) Liangent: "Reference: the previous unsuccessful attempt of font change:" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133228 (https://bugzilla.wikimedia.org/20825) (owner: Liangent)
[13:16:47] hashar: and does GDFONTPATH support /path/one:/path/two ?
[13:17:24] (CR) Liangent: "Another attempt: https://gerrit.wikimedia.org/r/133228" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/64205 (https://bugzilla.wikimedia.org/20825) (owner: Reedy)
[13:18:11] liangent: afaik no. Only a single path
[13:22:06] hmm
[13:22:30] I guess removing gd dependency means rewriting EasyTimeline completely
[13:25:13] Error undeleting page
[13:25:13] Error undeleting file: The file mwstore://local-swift-eqiad/local-deleted/7/v/i/7vinndy0g2n4tw9lewwymkr7iyq6u8g.jpg does not exist.
[13:25:22] https://commons.wikimedia.org/w/index.php?title=Special:Undelete&target=File%3AArgentina78.JPG
[13:25:30] can someone pls look into this^^?
[13:25:37] I can't restore the file
[13:27:02] !log restarting pybals on lvs300x
[13:27:06] Logged the message, Master
[13:30:35] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Wed May 14 10:29:53 2014
[13:30:53] someone has an idea what's AGAIN broken with swift?
[13:31:09] it is very annoying to get such errors when deleting, restoring, moving files
[13:32:40] <_joe_> Steinsplitter: when did you load that file?
[13:39:29] some minutes ago
[13:46:02] _joe_: it was deleted yesterday
[13:46:31] during swift outage
[13:47:02] _joe_: https://commons.wikimedia.org/w/index.php?title=Special:Undelete&target=File:Argentina78.JPG
[13:47:48] (PS2) Filippo Giunchedi: fix carbon destinations with additional workers [operations/puppet] - https://gerrit.wikimedia.org/r/133226
[13:47:55] sorry, correction: was undeleted 19:59, 13 May 2014 and deleted again at: 01:53, 14 May 2014
[13:48:11] (CR) Filippo Giunchedi: [C: 2] fix carbon destinations with additional workers [operations/puppet] - https://gerrit.wikimedia.org/r/133226 (owner: Filippo Giunchedi)
[13:48:26] (CR) Filippo Giunchedi: [V: 2] fix carbon destinations with additional workers [operations/puppet] - https://gerrit.wikimedia.org/r/133226 (owner: Filippo Giunchedi)
[13:48:31] can we get deleted files from backup _joe_ ?
[13:48:49] <_joe_> matanya: no idea sorry :)
[13:49:19] i guess aaron is the person to ask
[13:52:55] (PS5) Giuseppe Lavagetto: exim: fix scoping [operations/puppet] - https://gerrit.wikimedia.org/r/119496 (owner: Matanya)
[13:55:35] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Tue May 13 22:51:03 2014
[13:56:54] (CR) Giuseppe Lavagetto: [C: 2] exim: fix scoping [operations/puppet] - https://gerrit.wikimedia.org/r/119496 (owner: Matanya)
[13:57:01] thanks icinga-wm, I was going to ask about it, looks like puppet has been disabled there
[14:00:51] <_joe_> godog: oh yes I noticed this morning
[14:01:01] <_joe_> no mention anywhere about this
[14:07:01] perhaps chasemp ?
[14:08:36] <_joe_> either him or ori
[14:12:10] (CR) Tim Landscheidt: "Needs to be assessed how Ie16e5de91c68ce9c7e4a3332cd14431033e52135 applies here." [operations/apache-config] - https://gerrit.wikimedia.org/r/108465 (https://bugzilla.wikimedia.org/60238) (owner: Tim Landscheidt)
[14:28:12] !log mw1053 going down for disk replacement
[14:28:16] Logged the message, Master
[14:29:55] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Wed May 14 14:29:50 UTC 2014
[14:31:55] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[14:32:14] <_joe_> wtf?
[14:32:55] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[14:33:08] <_joe_> bnx2 up and down
[14:33:20] twice
[14:34:21] akosiaris: who is the swift expert among ops ?
[14:34:38] matanya: andrewbogott and paravoid
[14:35:02] commons is missing a file (maybe more) since the outage yesterday
[14:35:05] <_joe_> matanya: I am on it
[14:35:11] _joe_: cool
[14:35:15] thanks _joe_
[14:35:20] <_joe_> matanya: how do you know it is due to the outage?
[14:35:27] <_joe_> because I don't
[14:35:33] just guessing , due to timimg
[14:35:39] *timing
[14:35:41] _joe_, akosiaris: these are from may 11th; probably chris investigating the D5 outage
[14:35:55] paravoid: ah true
[14:36:07] matanya: don't state guesses as facts please :)
[14:36:15] there was no swift outage yesterday
[14:36:15] <_joe_> paravoid: yes I've just checked the timing
[14:36:16] yeah, sorry
[14:36:26] we just lost one box for a while, that's nothing
[14:36:40] paravoid: eth3 would not anyway cause SSH not responding
[14:36:52] so what ? bad icinga check going haywire ?
[14:36:54] the SSH error is known
[14:37:01] i shouldn't jump to conclusion so fast
[14:37:17] <_joe_> matanya: so, if the problem is one file, we may be able to fetch it from one month old replica, but that's about it
[14:37:27] <_joe_> matanya: how many files are missing?
[14:37:31] that should be cool _joe_
[14:37:42] i know of one as of now
[14:37:59] i can do some checks to see if there are more
[14:38:26] <_joe_> matanya: no point in doing that :)
[14:38:39] ok
[14:45:32] <_joe_> so, the file is not where MW expects it to be
[14:46:11] <_joe_> matanya: do you have the original url of the image?
[14:46:36] RECOVERY - Puppet freshness on tungsten is OK: puppet ran at Wed May 14 14:46:34 UTC 2014
[14:50:22] _joe_: mwstore://local-swift-eqiad/local-deleted/7/v/i/7vinndy0g2n4tw9lewwymkr7iyq6u8g.jpg ?
[14:51:07] or do you mean https://commons.wikimedia.org/wiki/File:Argentina78.JPG _joe_ ?
[14:52:12] <_joe_> the second one :)
[14:52:15] <_joe_> ok thanks
[14:52:22] <_joe_> it will take me some time
[15:09:36] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014
[15:10:01] (CR) Alexandros Kosiaris: [C: 2] ferm: fully qualify variables [operations/puppet] - https://gerrit.wikimedia.org/r/132922 (owner: Matanya)
[15:13:51] Hi - graphite seems to be suffering issues since May 13th (yesterday) in the UTC evening
[15:13:59] where should we file the report? RT? Bugzilla?
[15:14:42] (this seems to affect all the sources of monitoring information we checked)
[15:15:23] chasemp: you looking at this ^
[15:16:44] milimetric: it was dogging yesterday evening especially, as far as I know now and as of most of the a.m. it's been responsive to web requests
[15:16:59] yeah, that's consistent with what we see
[15:17:03] milimetric: can you point me to what's failing now?
[15:17:33] all of the graphs we checked seem spotty since yesterday evening and don't seem to be getting better [15:18:32] can you give me a few examples just so I can line up outage times, especially if they are hosed now [15:19:21] I tihnk i see where are dropping incoming fromt the queue [15:19:24] chasemp: Graphite.mw.js.deprecate.addPortletLink.count [15:19:36] chasemp: gerrit.event.patchset-created.count [15:19:55] chasemp: zuul.pipeline.gate-and-submit.all_jobs.count [15:21:14] qchris: milimetric: thanks guys, I had hoped this was getting better but I think not [15:25:52] chasemp: Thanks for taking a look. [15:33:35] PROBLEM - Host dysprosium is DOWN: PING CRITICAL - Packet loss = 100% [15:43:30] (03PS5) 10Nuria: Make upstart track the wikimetrics PID. [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/130588 (https://bugzilla.wikimedia.org/63819) [15:44:35] !log disabling puppet on tungsten to try tweaking carbon settings to affect queue drops (for the better) [15:44:39] Logged the message, Master [15:45:19] (03CR) 10Andrew Bogott: [C: 032] dynamicproxy: Use redis connection pooling [operations/puppet] - 10https://gerrit.wikimedia.org/r/133172 (https://bugzilla.wikimedia.org/65179) (owner: 10Yuvipanda) [15:45:29] (03CR) 10Andrew Bogott: [C: 032] dynamicproxy: Remove some unused code [operations/puppet] - 10https://gerrit.wikimedia.org/r/133171 (owner: 10Yuvipanda) [15:45:41] hashar: So, why not openstack vm pool? [15:46:21] or does it contain multiple proposals? [15:46:38] Interesting. I'm confused. [15:47:03] hey bearND! [15:47:16] YuviPanda: hello again [15:47:20] hashar: and Krinkle are the folks most knowledgeable about our current jenkins setup [15:47:38] hashar: Krinkle bearND is the new android dev, and has a good amount of interest in gerrit/jenkins/process :) [15:48:38] hashar: Krinkle: nice to meet you [15:49:01] Hi [15:49:03] Krinkle: yeah I have rushed out the doc over the afternoon. 
It is lacking a lot of explanations / potential solution. [15:49:30] bearND: flake8 does linting for our python scripts. zuul is the thing that reads events from gerrit and triggers jenkins, I think [15:49:30] Krinkle: if you have anytime, feel free to comment on the doc and I will complete tomorrow. [15:49:37] hashar: I'm confused why it mentions so much outside the openstack pool story. I don't have time to read it all, but did you include that as research or do you plan to use it? [15:49:40] welcome bearND ! [15:50:15] hashar: thanks [15:50:18] Krinkle: I want to investigate different solution. That follow up discussion I had during the hackathon that convinced me to investigate other solutions beside OpenStack. Ie Vagrant :-D [15:50:21] hashar: Isn't all that vm stuffy stuff handled by openstack? Afaik we wouldn't have any dealings with LXC or anything like that [15:50:40] Vagrant would have to support creating instances ahead of time, keeping them in a pool, etc. [15:50:49] yup that is a drawback [15:50:50] otherwise it's going to be slow [15:50:58] I need to list the advantages/drawback for each solution [15:51:01] and we need lots of ops resources to maintain such a pool outside of labs [15:51:18] then talk to you to find out which solution we want to use and convince the rest of engineering that is best one. [15:51:26] the ops resource side I will handle it [15:51:28] hashar: consider fsprotect too [15:51:34] I think we can make the call to just propose one implementation using openstack, seems straight forward enough. If there's drawbacks to that we can look at those. [15:51:48] ori: oh never heard of it. If you have any bandwidth feel free to add a section to the doc [15:52:04] ori: Krinkle: the doc is only shared to you two. Might want to get more people enrolled [15:52:09] next week or so [15:52:19] i'll read it thoroughly first and then comment [15:52:53] bearND: if you are curious about Jenkins / flake8 etc. We have a #wikimedia-qa channel. 
I am myself around from 9:30am to 6pm (GMT+2 i.e. Europe) during business days. [15:53:06] Note that I think most of this gets disqualified if you factor in that 1) it's not about application isolation, it's about trust isolation. Most of the things on this list aside from openstack vms are not built with the "absolute security" in mind of running and executing code submitted by random people. [15:53:18] ori: there is no urgency, it can wait until the end of this week. [15:53:37] and 2) we need to be able to install packages (e.g. npm install) and services (e.g. listen mysql and apache) without any conflicts. [15:53:49] hashar: thanks; i'll check out the channel [15:54:49] Krinkle: that is exactly why I have shared the doc with you. You are way better than me in writing down list of features / requisites :-) [15:55:08] hashar: well, you seemed to have written up quite an essay. I couldn't do that. [15:55:10] (and wouldn't) [15:55:32] and you have the cycles available. I'm just complaining for 2 minutes a day. [15:55:36] :) [15:55:41] well [15:55:49] I took the cycles to do it :] [15:56:04] sorry, that's what I meant, yes. You got them assigned / claimed. [15:56:19] don't feel pressured Krinkle, it can wait a bit. If you manage to read it by the end of the week and add some comment that would be awesome already [15:56:31] I can take care of converting your online comments to huge paragraphs [15:56:51] just -1 +1 accordingly [15:56:56] and I will amend at will [15:57:23] I've thought it through plenty of times on various bugs and even wrote a long essay similar to this last year (forgot where it went). I can't wait to start seeing responses and ops action items getting things moving in some direction. [15:58:00] I asked mark some ops cycles to assist in review / polishing up [15:59:14] Krinkle: gotta rush out, and feel free to postpone the read for later on :] [15:59:34] I see you've got most of my points in here already. I'll finish later. Thx! 
[15:59:38] ori: rushing out to get daughter back home :D [15:59:41] thx timo! [16:00:47] hashar: take care [16:00:59] (03PS5) 10Rush: admin module for user/group/permissions cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/129501 [16:01:01] (03PS5) 10Rush: one-off to convert admins.pp to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 [16:02:49] (03CR) 10jenkins-bot: [V: 04-1] one-off to convert admins.pp to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 (owner: 10Rush) [16:04:50] (03CR) 10Krinkle: admin module for user/group/permissions cleanup (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129501 (owner: 10Rush) [16:05:26] (03CR) 10Rush: admin module for user/group/permissions cleanup (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/129501 (owner: 10Rush) [16:06:03] (03CR) 10Rush: "mark: tested in vm's in monitoring project in labs....my excuse it I had it handy and on local vm for most of the refactoring just for san" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129501 (owner: 10Rush) [16:11:10] (03PS6) 10Rush: admin module for user/group/permissions cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/129501 [16:11:12] (03PS6) 10Rush: one-off to convert admins.pp to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 [16:12:45] milimetric: I'm trying out some friendlier configurations, please let me know if your issues continue through the day or not [16:12:56] (03CR) 10jenkins-bot: [V: 04-1] one-off to convert admins.pp to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 (owner: 10Rush) [16:13:17] chasemp: Krinkle was the one with the original problem, so Krinkle, see above ^ :) [16:14:05] chasemp: Did you find out what was causing the spotty reporting in graphite? [16:14:53] Krinkle: jury is still out, but hopefully. 
some queueing strategy internal to carbon taht is great for 30k metrics and terrible for 125k [16:15:13] Keeping an eye on http://codepen.io/Krinkle/full/cBGCl/ [16:33:55] RECOVERY - Host dysprosium is UP: PING OK - Packet loss = 0%, RTA = 0.44 ms [16:34:22] (03PS5) 10Alexandros Kosiaris: bacula: allow mysqldumps to be kept locally [operations/puppet] - 10https://gerrit.wikimedia.org/r/132214 [16:40:12] (03PS1) 10Yuvipanda: dynamicproxy: Don't specify text/html for gzip [operations/puppet] - 10https://gerrit.wikimedia.org/r/133265 [16:40:46] (03PS6) 10Alexandros Kosiaris: bacula: allow mysqldumps to be kept locally [operations/puppet] - 10https://gerrit.wikimedia.org/r/132214 [16:41:31] (03PS1) 10Yuvipanda: toollabs: Install git-svn [operations/puppet] - 10https://gerrit.wikimedia.org/r/133266 [16:41:35] scfc_de: ^ [16:41:48] (03PS1) 10Spage: Enable Flow on another beta features talk page [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133267 [16:43:57] (03PS2) 10Tim Landscheidt: Tools: Install git-svn [operations/puppet] - 10https://gerrit.wikimedia.org/r/133266 (owner: 10Yuvipanda) [16:44:52] (03CR) 10Tim Landscheidt: [C: 031] Tools: Install git-svn [operations/puppet] - 10https://gerrit.wikimedia.org/r/133266 (owner: 10Yuvipanda) [16:48:45] PROBLEM - NTP on dysprosium is CRITICAL: NTP CRITICAL: Offset unknown [17:02:36] (03PS1) 10Ori.livneh: Add Grafana module & role [operations/puppet] - 10https://gerrit.wikimedia.org/r/133274 [17:02:42] ^ paravoid, chasemp [17:04:20] (03PS1) 10Ori.livneh: Add grafana.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/133275 [17:04:40] (03PS2) 10Ori.livneh: Add Grafana module & role [operations/puppet] - 10https://gerrit.wikimedia.org/r/133274 [17:05:42] * ori runs away [17:05:56] * YuviPanda staples ori to a subway [17:12:21] (03CR) 10BryanDavis: Add Grafana module & role (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/133274 (owner: 10Ori.livneh) [17:14:15] (03CR) 
10Matanya: "Although grafana is behind misc-varnish and although zirconium already has port 80 open please add a ferm::rule to allow port 80 in grafra" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133274 (owner: 10Ori.livneh) [17:22:48] (03CR) 10BryanDavis: "I removed the cherry-pick of this patch from deployment-salt:/var/lib/git/operations/puppet because of the conflict caused by a formatting" [operations/puppet] - 10https://gerrit.wikimedia.org/r/123444 (owner: 10Hashar) [17:22:57] (03PS2) 10Yurik: Provide read only access right zero-script to zeroscript group. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133200 (owner: 10Dr0ptp4kt) [17:23:35] (03CR) 10Yurik: [C: 032] Provide read only access right zero-script to zeroscript group. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133200 (owner: 10Dr0ptp4kt) [17:24:13] (03Merged) 10jenkins-bot: Provide read only access right zero-script to zeroscript group. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133200 (owner: 10Dr0ptp4kt) [17:28:50] (03CR) 10CSteipp: Improve nginx TLS/SSL settings. (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: 10JanZerebecki) [17:30:08] !log yurik synchronized wmf-config/CommonSettings.php [17:30:13] Logged the message, Master [17:47:00] !log yurik synchronized php-1.24wmf3/extensions/ZeroRatedMobileAccess/ [17:47:04] Logged the message, Master [17:50:04] !log yurik synchronized php-1.24wmf4/extensions/ZeroRatedMobileAccess/ [17:50:09] Logged the message, Master [17:57:05] (03CR) 10Rush: "A few of my comments from old changeset didn't submit...and now I don't know how to make them..." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/129501 (owner: 10Rush) [18:10:19] so what's this on tungsten: /srv/deployment/mwprof/mwprof/mwprof --listen-port=3811 [18:10:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014 [18:20:28] <_joe_> chasemp: that is mwprof, the system that collects logs that are sent over udp and streams them to consumers [18:20:45] <_joe_> in this case, there is a mwprof-to-carbon script or something [18:21:01] <_joe_> so all profiling info for mw end up on graphite [18:21:04] kind of came to similar idea [18:21:09] wondering why it's not using statsd [18:21:17] its much much older [18:21:18] <_joe_> wow it kinda looked as I knew what I was talking about [18:21:24] statsd didn't exist at the time [18:21:25] <_joe_> statsd is NIH!!! [18:21:28] gotcha [18:21:28] ok [18:21:33] <_joe_> bring the pitchforks [18:21:38] * _joe_ joking [18:22:19] <_joe_> chasemp: that is one of the zillion things we should consolidate/migrate to a single backend [18:22:30] yes that was my next q [18:22:46] ebernhardson: who would know the most about it / conversion? [18:23:19] chasemp: hmm, i think ori would be the best bet these days. [18:23:31] ebernhardson: thanks [18:23:52] np [18:25:47] !log integration-slave1001 is having issues writing to disk [18:25:52] Logged the message, Master [18:30:06] (03PS1) 10Dr0ptp4kt: Add OM tagging for 416-03. [operations/puppet] - 10https://gerrit.wikimedia.org/r/133288 [18:30:36] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Wed May 14 15:29:38 2014 [18:31:26] bblack, when you have a moment, would you please review and, if appropriate, +2 https://gerrit.wikimedia.org/r/#/c/133288/ with merge and deploy? [18:32:28] !log integration-slave1001 had its 8GB / /dev/vda1 100% full. 
Purging /tmp/perf-*.map brought it back to 41% [18:32:33] Logged the message, Master [18:32:49] RoanKattouw: ^ [18:32:54] hhvm is still fucked [18:33:03] it's doing crap it was told not to [18:33:05] dr0ptp4kt: since this is a config change for an existing carrier, is this going to break cached stuff again? [18:33:29] Krinkle: Thanks for fixing it [18:33:33] putting MBs worth of crap in /tmp indefinitely (instead of in the jenkins workspace) [18:33:41] (03CR) 10BBlack: [C: 032 V: 032] Add OM tagging for 416-03. [operations/puppet] - 10https://gerrit.wikimedia.org/r/133288 (owner: 10Dr0ptp4kt) [18:33:43] bblack, it shouldn't. [18:34:31] I think the idea from irc the other day was that the Special:Zero... page's contents that are already in cache for that carrier-id won't reflect that opera is allowed now? [18:35:03] or does that only apply when we remove legit method of access as zero (e.g. remove opera or https support from a carrier)? [18:40:09] bblack, i believe, at least in part, it's more the removal followed by the addition that causes the strangeness. for 133288, this is adding x-cs tagging for om, so we should be okay. one other...fun...thing i noticed since we added some extra debug logging is the particularly affected carrier's strangeness may also in part be explained by some unreliability of accessing memcached (the fallback is httpGet - an HTTP GET - that then com [18:40:09] to memcached) - still not sure what exactly is going on there, although i suspect it may have something to do with the persistent refresher scripts causing a bottleneck of some sort to surface. for any matter, 133288 should be okay [18:42:43] chasemp: the collector -> graphite thing is really awful. it's code i inherited. if you want to refactor it, it would be wonderful -- it'd be a good opportunity for more opsen to get involved in the operational aspects of mediawiki [18:43:53] ori: cool, it's...on the list! where would I find this code? 
[18:44:15] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:44:38] chasemp: the C app that aggregates the profiling data is , the thing that grabs data from it and routes it to graphite is [18:46:09] dr0ptp4kt: the refresher script's load should amount to 12x (in parallel) of: login, fetch carriers+proxies, logout, all happening once per minute. Given two-step login, that's about 60 hits to api.php on each minute mark (login, login2, carriers, proxies, logout) [18:46:28] I'm not sure what your fetch stuff looks like on the appserver end on top of that, though. [18:46:45] bblack, i'm not talking about /your/ frefresher....i'm talking about those keeprefreshing.com style bots. [18:46:56] bblack, your script is good! [18:47:19] oh I thought we were talking about the memcached on the zero portal [18:47:42] I thought we had blocked out keeprefreshing.com a while back? [18:48:56] yeah, you put something in VCL to cut down on it. Do we need a different/better check? [18:49:08] ( https://gerrit.wikimedia.org/r/#/c/129714/ ) [18:53:24] bblack: hey, re: https://gerrit.wikimedia.org/r/#/c/130256/ -- i know you said you're waiting on me to submit fixes for the issues tim caught in review, but it might take me a bit longer, and in the meantime this issue is hurting mobile site perf [18:53:47] bblack: and since the issues were theoretical, i was wondering if you might be ok with deploying that [18:59:35] ori: yeah, I think so. Although I have some fears that the strtok issue isn't all theoretical. [19:00:02] bblack: i can't fix it at the moment, sorry :( i'm really overloaded [19:03:52] (03PS2) 10BBlack: mobile varnishes: enable GeoIP [operations/puppet] - 10https://gerrit.wikimedia.org/r/130256 (owner: 10Ori.livneh) [19:06:13] paravoid, AaronSchulz: Would either of you be upset if I edited references to ceph out of https://wikitech.wikimedia.org/wiki/Media_storage ? 
[19:07:18] (03CR) 10BBlack: [C: 032 V: 032] mobile varnishes: enable GeoIP [operations/puppet] - 10https://gerrit.wikimedia.org/r/130256 (owner: 10Ori.livneh) [19:07:29] bblack: thanks, will get to that bug asap [19:08:35] the effects will take a little while to show up, because the mobile varnishes have to be restarted (with some pausing) to link in libGeoIP and get vcl reloading right again [19:08:57] * ori nods [19:12:35] bd808|LUNCH: I don't mind [19:13:07] (03PS1) 10Ori.livneh: Add wikidiff2 to osmium checkouts [operations/puppet] - 10https://gerrit.wikimedia.org/r/133300 [19:13:15] (03PS2) 10Ori.livneh: Add wikidiff2 to osmium checkouts [operations/puppet] - 10https://gerrit.wikimedia.org/r/133300 [19:13:50] (03CR) 10Ori.livneh: [C: 032 V: 032] Add wikidiff2 to osmium checkouts [operations/puppet] - 10https://gerrit.wikimedia.org/r/133300 (owner: 10Ori.livneh) [19:17:06] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54243 bytes in 0.518 second response time [19:22:15] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:30:06] (03PS1) 10Rush: set carbon cache/creates/updates on shutdown [operations/puppet] - 10https://gerrit.wikimedia.org/r/133304 [19:30:06] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54264 bytes in 0.232 second response time [19:30:24] (03CR) 10QChris: "Not voting to not stop others to merge if they care less" (031 comment) [operations/puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/130588 (https://bugzilla.wikimedia.org/63819) (owner: 10Nuria) [19:37:06] mutante: what should be done with the static asset ticket? [19:40:57] matanya2: figure out if we really want to monitor skins/ or the skins-1.5 path. confirm if the one in lvm monitor class is _really_ used. then it's either rejected (already have) or fixing the path [19:41:13] MaxSem: Thank you for merging that zhwikisource patch yesterday, appreciate it greatly. 
[19:41:21] eh, wait, the names are slightly different, but you know which i mean [19:41:27] :) [19:41:59] ok, i'll try to find out somehow [19:45:02] matanya2: ok, here's one part of that. [19:45:04] puppet_services.cfg: check_command check_http_lvs!bits.wikimedia.org!/skins-1.5/common/images/poweredby_mediawiki_88x31.png [19:45:07] puppet_services.cfg: check_command check_https_url!bits.wikimedia.org!/skins-1.5/common/images/poweredby_mediawiki_88x31.png [19:45:14] that's straight from neon, so we have that in icinga [19:45:23] but it's all the skins-1.5 path [19:45:33] with the 1 [19:45:34] and it seems that never triggered [19:45:43] 1.5 path [19:45:46] when we had the issue recently with the wrong php symlink [19:46:28] so changing the path seems needed [19:46:30] so we know what we have in watchmouse, what we have in icinga, and question on the ticket is.. which one should be it, or simply rejected [19:47:50] matanya2: yes, well, the one that found the issue seems better [19:49:07] i'll do that [19:49:22] thx [19:50:44] what is frack? [19:50:59] fundraising rack [19:51:40] they are separated? [19:52:11] yeah, probably due to PCI compliance [19:52:27] oh. 
makes sense [19:52:54] yes [19:53:00] exactly that [19:53:12] so i need to find out about that host in maze [19:53:16] h [19:53:27] for frack stuff ask Jeff [19:53:28] jeff will know [19:55:28] (03PS1) 10Chad: Remove LQT from wikimania2011wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133350 [19:56:55] (03CR) 10Chad: [C: 032] Remove LQT from wikimania2011wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133350 (owner: 10Chad) [19:57:03] (03Merged) 10jenkins-bot: Remove LQT from wikimania2011wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133350 (owner: 10Chad) [19:58:34] !log demon synchronized wmf-config/InitialiseSettings.php 'No more LQT on wikimania2011wiki' [19:58:38] Logged the message, Master [20:03:24] i'm multipling [20:27:10] (03PS1) 10Chad: Add CirrusSearch general debug log to whitelist [operations/puppet] - 10https://gerrit.wikimedia.org/r/133354 [20:32:27] !log Restarting logstash on logstash1001.eqiad.wmnet due to missing messages from some (all?) logs [20:32:31] Logged the message, Master [20:43:09] 2014-05-14 20:42:57 mw1107 enwiki: Flow\Data\ObjectLocator::findMulti: No index (out of 2) available to answer query for ref_src_namespace, ref_src_title with options [] [20:43:18] that is spamming at like full speed :) [20:43:35] ebernhardson: ^ [20:46:25] (03PS3) 10Matanya: bits: add icinga check [operations/puppet] - 10https://gerrit.wikimedia.org/r/133051 [20:48:57] (03CR) 10BryanDavis: [C: 031] Add CirrusSearch general debug log to whitelist [operations/puppet] - 10https://gerrit.wikimedia.org/r/133354 (owner: 10Chad) [20:51:22] btw, for people who want to understand why I'm writing https://www.mediawiki.org/wiki/Performance_guidelines , or bikeshed it :) , or talk about the security guidelines or architecture guidelines, join me in #wikimedia-office in 10 min [20:52:39] (03CR) 10BryanDavis: "Cherry-picked into beta puppetmaster." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/133354 (owner: 10Chad) [20:52:52] mutante: heh, was going to say... [20:54:26] jeremyb: :) [20:55:27] (03PS4) 10Matanya: bits: add icinga check [operations/puppet] - 10https://gerrit.wikimedia.org/r/133051 [20:56:11] greg-g: poking again regarding nda things [20:56:20] :/ [20:56:24] where was I on that... [20:57:11] matanya: remind me, do you have a request blocked? RT#? [20:57:31] AaronSchulz: i let werdna know hopefully he can fix, i'm off for the next 3 weeks(baby) [20:58:24] https://icinga.wikimedia.org/cgi-bin/icinga/trends.cgi?createimage&t1=1399420800&t2=1400101083&assumeinitialstates=yes&assumestatesduringnotrunning=yes&initialassumedhoststate=0&initialassumedservicestate=0&assumestateretention=yes&includesoftstates=no&host=bits-lb.eqiad.wikimedia.org&service=LVS+HTTP+IPv4&backtrack=4&zoom=4 [20:58:38] ehm :) nice URL.. but that shows something matanya [20:58:51] (03PS1) 10BryanDavis: Logstash: remove deprecated grep filter [operations/puppet] - 10https://gerrit.wikimedia.org/r/133359 [20:59:06] the history of the existing check [20:59:22] and that it supposedly never broke when the watchmouse URL did break [21:00:28] greg-g: logstash icinga [21:00:34] and that is: /skins-1.5/common/images/poweredby_mediawiki_88x31.png [21:00:59] mutante's link for example greg-g :) [21:01:18] * greg-g nods :) [21:02:32] (03PS5) 10Dzahn: bits: change test URL in static assets check [operations/puppet] - 10https://gerrit.wikimedia.org/r/133051 (owner: 10Matanya) [21:06:18] (03CR) 10Dzahn: [C: 031] "both URLs work, but the new one should make it identical to what watchmouse checks. 
and recently only the watchmouse URL broke when we had" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133051 (owner: 10Matanya) [21:07:39] (03CR) 10Dzahn: "https://bits.wikimedia.org/skins-1.5/common/images/poweredby_mediawiki_88x31.png" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133051 (owner: 10Matanya) [21:11:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014 [21:13:13] (03CR) 10Dzahn: [C: 032] "yea, this is what watchmouse checks" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133051 (owner: 10Matanya) [21:19:25] RECOVERY - RAID on dataset1001 is OK: OK: optimal, 2 logical, 24 physical [21:25:52] (03PS2) 10Dzahn: fix apache-fast-test for use in eqiad [operations/puppet] - 10https://gerrit.wikimedia.org/r/130614 [21:26:26] icinga restart fails with Error: Could not find any servicegroup matching '​analytics_eqiad' [21:29:03] (03PS2) 10Rush: ircd-ratbox and udpmxircecho puppetized [operations/puppet] - 10https://gerrit.wikimedia.org/r/132495 [21:30:21] (03CR) 10Rush: "thoughts?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/132495 (owner: 10Rush) [21:30:52] (03PS1) 10Reza: [WIP] Lots of rights changes for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133364 [21:30:55] (03PS3) 10Rush: puppet-lint in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 [21:31:05] (03CR) 10Rush: [C: 032 V: 032] puppet-lint in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 (owner: 10Rush) [21:31:15] (03CR) 10Dzahn: [C: 032] "ugh, why can't i submit it? 
we can just merge this anytime even if there is no /srv/pybal yet, it doesn't break anything and it's not goin" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130614 (owner: 10Dzahn) [21:31:35] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Wed May 14 15:29:38 2014 [21:31:50] (03PS2) 10Rush: set carbon cache/creates/updates on shutdown [operations/puppet] - 10https://gerrit.wikimedia.org/r/133304 [21:32:04] (03CR) 10Dzahn: "oh yea, dependency of course, my bad" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130614 (owner: 10Dzahn) [21:32:06] (03CR) 10jenkins-bot: [V: 04-1] set carbon cache/creates/updates on shutdown [operations/puppet] - 10https://gerrit.wikimedia.org/r/133304 (owner: 10Rush) [21:33:25] (03PS3) 10Rush: set carbon cache/creates/updates on shutdown [operations/puppet] - 10https://gerrit.wikimedia.org/r/133304 [21:34:11] (03CR) 10Rush: [C: 032 V: 032] "merging so I can enable puppet on graphite server. there may be a purer solution to this than inf for max_cache as of now I don't have it" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133304 (owner: 10Rush) [21:34:19] 10hello [21:35:25] RECOVERY - Puppet freshness on tungsten is OK: puppet ran at Wed May 14 21:35:16 UTC 2014 [21:37:14] (03PS2) 10Reza: [WIP] Lots of rights changes for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133364 [21:38:18] (03PS2) 10BryanDavis: Logstash: remove deprecated grep filter [operations/puppet] - 10https://gerrit.wikimedia.org/r/133359 [21:39:06] (03CR) 10BryanDavis: "Applied in beta via cherry-pick. Logstash working fine." [operations/puppet] - 10https://gerrit.wikimedia.org/r/133359 (owner: 10BryanDavis) [21:39:39] 04how 08are 09you 12guys 13doing? :) [21:40:19] grrrit-wm: what are your messages about? [21:40:29] I guess you are the operator here? [21:41:08] gandaro: need any help? [21:41:11] <^d> grrrit-wm is a bot. 
[21:41:23] (03PS3) 10Reza: [WIP] Lots of rights changes for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133364 [21:42:15] gandaro o_O [21:42:32] Are you really the EmperorOfChina? [21:43:08] obviously, but this is off-topic for this channel I suppose [21:43:31] this is off-topic channel? [21:43:44] greg-g: anyway, if it is too much of a hassle, let it go, i'll manage with what there is [21:44:05] matanya: Well, not much of a hassle for me yet, all I just did was re-ping the relevant thread/people. [21:44:40] greg-g: i mean as an org, not personally, but that too :) [21:44:54] well, it's a pain as an org for dumb reasons :/ [21:45:00] (03CR) 10Dzahn: "icinga on neon doesn't reload because it can't find this service group." (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/132921 (owner: 10Giuseppe Lavagetto) [21:45:22] EmperorOfChina: I will defer to you, master, and not unnerve you guys talking about serious stuff! [21:45:55] greg-g: Oh, UTC+2 times on the wiki deployments page now, well done :) [21:46:04] hoo: thank mwalker :) [21:46:14] hoo: also, it's just whatever your timezone is [21:46:15] nice work :) thanks [21:46:16] yay here too [21:48:31] (03PS1) 10Dzahn: fix service group in kafka role to unbreak icinga [operations/puppet] - 10https://gerrit.wikimedia.org/r/133369 [21:50:10] (03PS3) 10Hoo man: Show AbuseFilter hits on IRC for wikis with public logs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130274 (https://bugzilla.wikimedia.org/64255) (owner: 10Gerrit Patch Uploader) [21:50:31] csteipp: ^ Could you have a quick look? 
Would like that to go out with tonights SWAT [21:50:52] (03PS2) 10Dzahn: fix service group in kafka role to unbreak icinga [operations/puppet] - 10https://gerrit.wikimedia.org/r/133369 [21:52:02] (03CR) 10Rush: "I emailed steveivy@gmail.com asking for clarification" [operations/debs/python-statsd] - 10https://gerrit.wikimedia.org/r/131449 (owner: 10Gage) [21:52:08] (03CR) 10Matanya: "$nagios_servicegroup = '<200b>analytics_eqiad' in vim." [operations/puppet] - 10https://gerrit.wikimedia.org/r/133369 (owner: 10Dzahn) [21:52:11] (03CR) 10Dzahn: [C: 032] " :p" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133369 (owner: 10Dzahn) [21:52:20] hoo: Yeah, I'll look. It made me a little nervous on my first look at it. [21:52:39] csteipp: That's why I want to have another pair of eyes on it ;) [21:53:11] (03CR) 10Dzahn: "to see the error: root@neon:~# icinga -v /etc/icinga/icinga.cfg" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133369 (owner: 10Dzahn) [21:57:25] marktraceur says it's a good idea to mention here that i just deployed a change to the jenkins beta-parsoid-update-eqiad job [21:57:56] I sort of retracted, but while we're at it [21:58:18] cscott: The change just made it so the git files are also deployed, right? [22:03:03] * marktraceur confirms that [22:03:07] !log maxsem synchronized php-1.24wmf4/extensions/MobileFrontend/ 'bug 65042' [22:03:12] Logged the message, Master [22:03:14] !log cscott deployed a jenkins job change that pushes parsoid git files to beta-labs for version purposes [22:03:17] Logged the message, Master [22:03:55] morebots: what do you think of siri? [22:03:55] I am a logbot running on tools-exec-09. [22:03:55] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [22:03:55] To log a message, type !log . [22:04:48] !log maxsem synchronized php-1.24wmf3/extensions/MobileFrontend/ 'bug 65042' [22:04:53] Logged the message, Master [22:10:16] greg-g, more changes coming! 
/me laughs evily [22:10:24] :) :) [22:11:58] and ticket is resolved :) [22:12:03] night all [22:18:09] (03CR) 10BryanDavis: initial commit for a phabricator module (WIP) (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/132505 (owner: 10Dzahn) [22:19:34] (03PS1) 10Cmjohnson: Adding mgmt ip's for new misc servers [operations/dns] - 10https://gerrit.wikimedia.org/r/133378 [22:21:25] (03CR) 10Rush: "https://github.com/sivy/pystatsd/issues/48" [operations/debs/python-statsd] - 10https://gerrit.wikimedia.org/r/131449 (owner: 10Gage) [22:24:54] bd808: hey man on https://gerrit.wikimedia.org/r/#/c/132505/, what's the good stuff you get using trebuchet for this instead of straight clone? [22:25:37] Control of when changes go to the cluster (via `git deploy`) is a big one I think [22:26:00] With git::clone things roll out as soon as puppet runs post-merge [22:26:42] I see you mean to have the phab server be behind master [22:27:00] not playing stupid I promise :) why is that needed for this? [22:27:19] Testing in labs before updating prod? [22:27:22] hoo: I think that looks good. The notifications are equivalent to abuse-filter-log, right? Not abuse-filter-log-details? [22:27:30] why wouldn't you do that in a branch? [22:27:48] in theory cloning our fork should get you what's in our prod no? [22:28:05] chasemp: I suppose that's a good question. I never think of using branches here. [22:28:17] (03PS1) 10Ori.livneh: hhvm:dev: install libthai-dev for wikidiff2 [operations/puppet] - 10https://gerrit.wikimedia.org/r/133380 [22:28:22] hah [22:28:45] My mind has been warped into an always work on master POV [22:28:55] csteipp: Only short summaries for IRC AFAIR [22:29:02] <^d> bd808: master or go home. [22:29:05] like you can't really put much on IRC anyway [22:30:30] I think for this instance it's going to be better to not separate deployment logic, i.e. 
master is canonical for prod in a sense [22:31:10] (03CR) 10CSteipp: [C: 031] "Correctly sets $wgAbuseFilterNotifications = false for any wikis that already overrode the default visibility of the log with $wgGroupPerm" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130274 (https://bugzilla.wikimedia.org/64255) (owner: 10Gerrit Patch Uploader) [22:31:14] Using trebuchet seems like the right kind of dogfood eating rather than yet another deployment scenario, but I may have some intrinsic bias as the current deploy tool slave. [22:32:14] I've seen we use git::clone in numerous places, is it your thought that it shouldn't be? [22:32:38] chasemp: well, why not just use different hosts/repositories for prod and testing? (git push production master vs. git push testing master) [22:33:13] we could but it's added complexity, what's the benefit? [22:33:14] I had some battles with it for deploying the scap scripts but that may have been a weird usecase (needed g+w flags) [22:34:03] <^d> I dislike git::clone. [22:34:08] chasemp: testing can easily be on a different rev than production, and still, both are on master? (wasn't that the point?) [22:34:20] <^d> Last couple of times I tried using it it barfed if your setup is anything but vanilla. [22:34:33] <^d> It doesn't recover well from partially cloned or otherwise non-empty destinations. 
[22:34:48] Trminator: I'm not understanding what you are advocating [22:35:12] I have no particular love of git::clone but I would rather fix it if it's screwy [22:35:15] chasemp: nvm then, likely my mistake then :) [22:35:22] than leave a half dozen instances and move on from it [22:35:53] <^d> chasemp: I'd advocate just removing it and using a real deployment system rather than clone-and-pray :p [22:36:13] I've used a git clone deployment system for years before that worked very well :) [22:36:24] but not this one [22:36:51] honestly just curious, I have no special knowledge of the use case here other than the limited discussion [22:37:24] the downside is [22:37:38] trebuchet separates the state of the system from the management logic in puppet [22:37:52] effectively splitting the ability to maintain state on a system in this case [22:38:11] definitely added complexity for a simple thing and harder to monitor at least from where I'm standing thus far [22:50:14] chasemp: Maybe you and twentyaftertwo can just rebuild all the deployment systems. :) [22:50:33] s/twentyaftertwo/twentyafterfour/ [22:50:47] big difference in those two times hehe [22:53:00] bd808: I'm poking and prodding for info at this point :) curious about trebuchet among other things. the "deployment" system for web code we both used before went through a few iterations but was exceedingly simple. 
staging server running master serving staff (cookie in lb), web ui button to push master on staging out to the webfarm [22:53:23] that was done w/ scp and gnu parallel and we looked at / used in other places for similar things mcollective with a git agent for mass update of repos over the env [22:53:26] upon push [22:53:43] but I digress [22:54:32] Trebuchet is fundamentally staging in git repo on deploy server with salt to tell nodes to fetch a specific tag [22:55:17] scap is fundamentally staging in a directory on deploy server with parallel ssh to tell nodes to rsync [22:55:17] I genuinely don't know enough, but for this salt is a very big hammer [22:56:27] and they do the same thing as each other but in different places? [22:57:08] scap is a one trick pony for deploying the mediawiki code [22:58:06] Trebuchet is shaping up to be a "general" deployment tool. It has hooks for extending functionality on a per repo basis [22:58:38] git-deploy is a synonym for one of them? [22:58:47] git-deploy == trebuchet [22:58:51] ah [22:59:40] The naming thing... it's a long and boring story [23:03:24] <^d> It's a short and simple story. [23:03:29] <^d> "We suck at naming things" [23:03:35] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Wed May 14 20:02:47 2014 [23:04:37] FWIW trebuchet makes more sense to me as a name than sartoris did [23:04:57] sartoris was previous synonym? [23:05:02] Yes. [23:05:07] good times [23:05:19] For a slightly different version with another license and co-author [23:06:00] git-deploy was a perl thing, it was ported to python and tweaked as sartoris and then forked and tweaked more as trebuchet [23:08:33] chasemp: https://github.com/trebuchet-deploy/ is the upstream mostly kinda sorta. [23:08:41] heh [23:09:10] Ryan was talking about a patch to really brin in his upstream but I haven't seen it yet. [23:09:17] *g [23:17:04] ori, have you done anything with the swat today? 
[23:18:57] RoanKattouw, ebernhardson; I guess I shall do the swat [23:19:01] k [23:19:19] (PS1) BryanDavis: Logstash: Add support for logging irc !log messages [operations/puppet] - https://gerrit.wikimedia.org/r/133393 [23:19:28] (CR) Mwalker: [C: 2] Show AbuseFilter hits on IRC for wikis with public logs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130274 (https://bugzilla.wikimedia.org/64255) (owner: Gerrit Patch Uploader) [23:19:37] (CR) Mwalker: [C: 2] Enable VisualEditor as a Beta Feature on Commons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132409 (https://bugzilla.wikimedia.org/65067) (owner: Jforrester) [23:19:57] (Merged) jenkins-bot: Enable VisualEditor as a Beta Feature on Commons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132409 (https://bugzilla.wikimedia.org/65067) (owner: Jforrester) [23:27:17] !log mwalker synchronized wmf-config/ 'SWAT of {{gerrit|132409}} (visual editor) and {{gerrit|130274}} (abuse filter)' [23:27:21] Logged the message, Master [23:28:30] hoo, can you check to see if your configuration change applied correctly? [23:29:14] !log mwalker synchronized visualeditor.dblist 'Another part of {{gerrit|132409}} (visual editor)' [23:29:18] Logged the message, Master [23:29:31] mwalker: Let me think... not quite trivial for this one [23:29:40] ya; I just looked at the change :p [23:29:53] quite the firehose to look in the irc log and you'd have to make an edit that was tagged as abuse [23:30:07] * mwalker goes off to stab the justin beiber page [23:30:16] jk [23:30:30] mwalker: mh, I think enwiki had it before [23:30:45] so better test eg. russian [23:33:39] RoanKattouw, James_F|Away; supposedly visualeditor is now on commons in all the places it should be -- but I'm not actually seeing it in my beta features [23:34:02] mwalker: Did you sync-file the visualeditor.dblist file?
[23:34:11] It appears you didn't [23:34:16] Note that it is not inside the wmf-config/ directory [23:34:45] I supposedly did... I ran: mwalker@tin:/a/common$ sync-file visualeditor.dblist "Another part of {{gerrit|132409}} (visual editor)" [23:36:10] SAL shows that log entry [23:36:40] And https://noc.wikimedia.org/conf/visualeditor.dblist shows commonswiki [23:37:02] aye; and a random apache host has it too [23:37:09] OK [23:37:24] Looking on Commons [23:38:24] marktraceur, is there something special I need to do to deploy something new to beta features? e.g. some cache I need to clear or file I need to touch? [23:38:52] mwalker: You need to whitelist it [23:39:04] that. [23:39:08] mwalker: And to whitelist it you need to talk to James_F|Away. [23:39:13] Or somebody has to have. [23:39:18] marktraceur: This is VE [23:39:21] It's already whitelisted [23:39:22] :) [23:39:36] Hmm [23:39:39] https://commons.wikimedia.org/wiki/Special:Version does not list VisualEditor [23:39:47] Maybe it's config cache [23:39:59] mwalker: Could you try touch wmf-config/InitialiseSettings.php then sync-file it? [23:40:08] Then check if Special:Version lists VisualEditor [23:41:14] RoanKattouw, not sure that's going to help, mw1022 has the version of InitializeSettings that is on tin [23:41:50] I realize that [23:42:00] What I want you to do is kick the config cache [23:42:06] Excuse me, why is this happening during SWAT? [23:42:11] Which varies on the mtime of InitialiseSettings but not on some other things [23:42:13] "no new features" I thought? [23:42:24] marktraceur: It's just a one-line addition to visualeditor.dblist [23:42:36] RoanKattouw, hokay; it seems to have applied [23:42:36] It's not a new feature [23:42:40] It's just enabling an existing feature on one more wiki [23:42:42] Sweet! 
[23:42:49] !log mwalker synchronized wmf-config/InitialiseSettings.php 'Poking settings to try and apply them' [23:43:00] It is for the wiki where you're pushing it...but whatever [23:43:00] Logged the message, Master [23:43:12] greg-g, good question that marktraceur just raised... [23:43:19] are beta features normally swattable? [23:43:38] my feeling is yes; because it's beta [23:43:39] Clarification: expansion of existing beta features to new wikis [23:43:46] More helpfully, does "turn on a new feature" mean "new to the cluster" or "new to the wiki"? [23:44:19] new to cluster: no. New to the wiki: meh, as long as it was announced before hand. But I'd prefer the team just do it outside of a SWAT window. [23:44:37] RoanKattouw, oh; I'm guessing the config cache does not vary on the ve.dblist file which I pushed after I pushed the initializesettings.php file [23:44:40] I'd like teams with deployers on them to not use swat windows, in general. [23:44:43] so I did it in the wrong order [23:45:21] Exactly [23:45:44] (CR) BryanDavis: "Applied in beta via cherry-pick." [operations/puppet] - https://gerrit.wikimedia.org/r/133393 (owner: BryanDavis) [23:46:03] greg-g: Not at all? Not for turning features on on wikis? [23:46:10] Because if former, oops. [23:46:19] I think teams with deployers SWATting bug fixes is perfectly reasonable [23:46:45] mwalker: thanks for swatting, i missed the ping [23:46:51] (PS1) Ori.livneh: Apply mediawiki::cgroup on osmium [operations/puppet] - https://gerrit.wikimedia.org/r/133397 [23:47:12] RoanKattouw: I just want to encourage more self-service/getting the deployers on your team comfortable as much as possible. [23:47:30] marktraceur: re which one? 1) new to cluster or 2) new to a wiki?
[23:47:58] (CR) Ori.livneh: [C: 2] hhvm:dev: install libthai-dev for wikidiff2 [operations/puppet] - https://gerrit.wikimedia.org/r/133380 (owner: Ori.livneh) [23:48:04] greg-g: I'm asking if teams with deployers shouldn't use SWAT at all. [23:48:10] Because if so I've been doing it Wrong. [23:48:13] exit [23:48:16] ah; wrong window [23:48:27] :q! [23:48:37] oh, no, not not at all. Just, I'd like to get ya'll used to doing it yourself so that you feel confident. 'tis all. [23:48:55] (CR) Ori.livneh: [C: 2] Apply mediawiki::cgroup on osmium [operations/puppet] - https://gerrit.wikimedia.org/r/133397 (owner: Ori.livneh) [23:48:59] oh, and to keep SWAT windows more open for people who don't have deployers they can prod easily. [23:49:00] greg-g, isn't that a reversion to the old model of doing things? [23:49:30] where anyone with deploy rights grabbed a slot? [23:49:31] greg-g: Which begs the question, what teams don't have deployers and why? [23:49:34] not exactly. The SWAT windows were for quick bug fixes, generally for people who don't have deploy rights/experience. [23:49:59] bd808: exactly. (I don't know off the top of my head and 'because') [23:51:08] mwalker: I'm not against anyone with deploy rights grabbing a non-used window of time. I'd like us to ultimately move into a more fluid deploy world. SWAT windows are a nice in-betweenie place for a lot of things though. [23:51:09] I'm guessing I'm the weirdest deployer because I'm totally comfortable rolling out a new MW branch but get nervous about the idea of deploy a single file change. :) [23:51:26] bd808: yep, you are bass ackwards. [23:51:51] it makes some amount of sense; it means that everything is consistent [23:52:10] you're not circumventing the process like I just did [23:52:17] which caused the config file cache to get out of sync [23:53:10] bd808: Also you're way more familiar with the script for the former case than the rest of us are :) [23:53:22] mwalker: Exactly.
And the deep end I was pushed off of was "Hey you're deploying core next week. Ask around to find out how to do that." [23:53:46] so; all the teams I can think of who code (ve, parsoid, flow, multimedia, fundraising, mobile and zero) have deployers [23:53:49] who am I missing? [23:54:06] design? analytics? [23:54:49] (CR) BryanDavis: "Dashboard at https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/irc" [operations/puppet] - https://gerrit.wikimedia.org/r/133393 (owner: BryanDavis) [23:54:54] mwalker: correct and correct. Though Design mostly 'pairs' with people, sometimes those who are deployers. So it's "sometimes". [23:55:27] also, the big users/winners of SWAT windows are people like odder [23:55:32] which is great, imo [23:55:51] true; community bug fix deploys [23:58:23] (CR) Ori.livneh: [C: 2] Logstash: remove deprecated grep filter [operations/puppet] - https://gerrit.wikimedia.org/r/133359 (owner: BryanDavis)