[00:02:12] (PS1) Ryan Lane: Revert "Temporarily move OC traffic to eqiad" [operations/dns] - https://gerrit.wikimedia.org/r/91098
[00:02:35] Ryan_Lane: does it work?
[00:03:00] let me do one last set of tests before I say yes ;)
[00:03:04] I wasn't merging that just yet
[00:03:10] but getting ready for it
[00:03:12] k
[00:03:15] I'm sort of around
[00:04:23] <^demon|lunch> Ouch. We do not do bengali better.
[00:08:53] looks like it's working properly to me
[00:09:42] (CR) Ryan Lane: [C: 2] Revert "Temporarily move OC traffic to eqiad" [operations/dns] - https://gerrit.wikimedia.org/r/91098 (owner: Ryan Lane)
[00:27:41] (CR) Awjrichards: "Alternatively, we could keep the www. rule and I can add a separate rule specifically for the m. domain." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91058 (owner: Awjrichards)
[00:29:09] (CR) Ryan Lane: [C: 2] Salt 0.17 compatibility for deployment scripts [operations/puppet] - https://gerrit.wikimedia.org/r/89495 (owner: Ryan Lane)
[00:38:05] (PS1) Springle: depool db1018 while cloning [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91102
[00:38:26] (PS1) Ryan Lane: Don't reference pillars before assigned [operations/puppet] - https://gerrit.wikimedia.org/r/91103
[00:39:01] (CR) Springle: [C: 2] depool db1018 while cloning [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91102 (owner: Springle)
[00:40:18] !log springle synchronized wmf-config/db-eqiad.php 'depool db1018 while cloning'
[00:40:33] Logged the message, Master
[00:42:34] (PS1) Springle: db1034 using mariadb [operations/puppet] - https://gerrit.wikimedia.org/r/91104
[00:43:48] (CR) Springle: [C: 2] db1034 using mariadb [operations/puppet] - https://gerrit.wikimedia.org/r/91104 (owner: Springle)
[00:45:32] springle, have time today to work on getting https://gerrit.wikimedia.org/r/#/c/88666/ merged?
[00:46:42] (CR) Ryan Lane: [C: 2] Don't reference pillars before assigned [operations/puppet] - https://gerrit.wikimedia.org/r/91103 (owner: Ryan Lane)
[00:48:37] andrewbogott: certainly
[00:49:08] It had conflicts with a recent patch of yours, so I'd appreciate you verifying that I didn't break any of that...
[00:49:13] Then we can merge and watch it go :)
[00:50:19] conflicts are hard to track when a patch moves a file :( git just flagged about 150 lines as conflicting
[00:52:13] andrewbogott: go ahead
[00:52:39] (PS1) Springle: depool db1007 while cloning [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91105
[00:53:01] (CR) Springle: [C: 2] depool db1007 while cloning [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91105 (owner: Springle)
[00:53:35] !log springle synchronized wmf-config/db-eqiad.php 'depool db1007 while cloning'
[00:53:47] Logged the message, Master
[00:53:58] (CR) Andrew Bogott: [C: 2] Move mysql_wmf into a module. [operations/puppet] - https://gerrit.wikimedia.org/r/88666 (owner: Andrew Bogott)
[00:56:01] ok, watching a puppet run on db1044
[00:57:03] PROBLEM - mysqld processes on db1041 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[00:57:23] springle, here's the diff: https://dpaste.de/Iy2o
[00:57:31] Not urgent but probably still a mistake of some sort...
[00:58:14] Who do I tase to start a new mailing list? I tried https://bugzilla.wikimedia.org/show_bug.cgi?id=55635, but this is a rumoured black hole.
[00:58:50] awight: Some are managed by ops and some are managed by OIT. And I know not the difference.
[00:58:55] As an Op, I recommend you contact OIT :)
[00:58:58] <^d> awight: That's the right place.
[01:00:15] niiice. OK thanks, I'll rattle some bones.
[01:02:16] andrewbogott: fairly painless. good job. all that an only a file perms tweak needed :)
[01:02:21] so far... ;)
[01:03:25] weirdly, permissions were not specified either in the before or after
[01:03:41] Suppose the permissions of the files /in git/ matter?
[01:04:42] <^d> You can check in permissions in git.
[01:04:42] and who the heck is user 999?
[01:04:53] You can, but would puppet maintain them when installing a file?
[01:04:58] Surely it applies defaults instead
[01:05:05] <^d> Yeah it should.
[01:05:45] <^d> I consider checking permissions into git to be a helpful thing to provide mostly-sane defaults for people.
[01:05:58] <^d> But you shouldn't rely on them and should set permissions appropriately on things :)
[01:06:02] an explicit owner=> group=> mode=> in the file block in puppet safest?
[01:06:16] Yeah, that's surely safest. I just wonder why it changed, in this case...
[01:06:20] <^d> Safest. Certainly can't hurt.
[01:06:20] anyway I'll just specify in puppet
[01:06:45] * andrewbogott curses
[01:06:54] there are three files specified in that puppet block but only two got their perms changed
[01:06:57] !log upgrading db1034 & db1041 to precise
[01:07:12] Logged the message, Master
[01:11:02] (PS1) Andrew Bogott: Specify group & permissions for a couple of files. [operations/puppet] - https://gerrit.wikimedia.org/r/91112
[01:11:42] springle, ^
[01:13:33] PROBLEM - DPKG on db1041 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[01:13:57] andrewbogott: they don't need owner=> ? db1044 has both owner and group 999
[01:14:29] Hm, I was only responding to the diff when puppet applied.
[01:14:41] I don't actually know what the owner/group/permissions /should/ be.
[01:14:43] Do you have an opinion?
[01:14:56] i think they should be root:root 644
[01:15:03] PROBLEM - MySQL Processlist on db1041 is CRITICAL: Connection refused by host
[01:15:04] for all three of those files
[01:15:23] ok
[01:15:53] PROBLEM - MySQL InnoDB on db1041 is CRITICAL: Connection refused by host
[01:15:57] we can only break some monitoring here :)
[01:16:31] host db1041
[01:16:38] shutup
[01:17:01] (PS2) Andrew Bogott: Specify group & permissions for a couple of files. [operations/puppet] - https://gerrit.wikimedia.org/r/91112
[01:18:59] springle: ^ ?
[01:19:49] go for it
[01:20:12] (CR) Andrew Bogott: [C: 2] Specify group & permissions for a couple of files. [operations/puppet] - https://gerrit.wikimedia.org/r/91112 (owner: Andrew Bogott)
[01:21:57] hm… ok, looks reasonable on my end.
[01:22:14] yep
[01:23:49] ok -- I'm going to step away in a few minutes, but please email me if you see anything misbehaving. I'll be up for a while yet.
[01:24:16] andrewbogott: an doing a couple upgrade/installs now. will watch for discrepancies
[01:24:26] thx
[01:26:31] !log start xtrabackup clone db1018 to db1034
[01:26:47] Logged the message, Master
[01:31:05] (PS1) Springle: db1041 using mariadb [operations/puppet] - https://gerrit.wikimedia.org/r/91113
[01:32:05] (CR) Springle: [C: 2] db1041 using mariadb [operations/puppet] - https://gerrit.wikimedia.org/r/91113 (owner: Springle)
[01:37:13] PROBLEM - mysqld processes on db1007 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[01:38:13] RECOVERY - mysqld processes on db1007 is OK: PROCS OK: 1 process with command name mysqld
[01:40:59] !log start xtrabackup clone db1007 to db1041
[01:41:15] Logged the message, Master
[02:10:13] (CR) Fabriceflorin: [C: 1] "Sounds good to me." [operations/puppet] - https://gerrit.wikimedia.org/r/91084 (owner: Dzahn)
[02:25:12] !log LocalisationUpdate completed (1.22wmf22) at Tue Oct 22 02:25:12 UTC 2013
[02:25:29] Logged the message, Master
[02:39:07] Would appreciate if any ops people could weigh in on https://bugzilla.wikimedia.org/show_bug.cgi?id=55981
[02:39:15] Trying to figure out a better Redis configuration for MediaWiki-Vagrant.
[02:39:32] !log LocalisationUpdate completed (1.22wmf21) at Tue Oct 22 02:39:32 UTC 2013
[02:39:45] Logged the message, Master
[03:00:25] !log upgrading db1017 to precise
[03:00:39] Logged the message, Master
[03:04:58] springle: Let's say someone wanted to do an ALTER on the revision table of Wikimedia wikis. Doable? How painful?
[03:05:49] revision has a PK so it can be done online. load spike, but possible
[03:06:10] https://bugzilla.wikimedia.org/show_bug.cgi?id=55398#c4
[03:06:19] Does database size matter?
[03:06:32] I mean, is there enough space to add a column without overflowing disks?
[03:06:46] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Oct 22 03:06:46 UTC 2013
[03:06:46] Or does anyone need to be worried about that kind of thing. revision is big...
[03:06:57] database table size *
[03:07:01] Logged the message, Master
[03:10:07] we have space for that alter
[03:10:35] Okay, thanks. :-) That'll help push the bug forward a bit, I think.
[03:11:29] Long-term, adding this column should reduce database load, I think. Maybe. Rather than moving rows between tables, we'll be able to do simple updates on a bit field, as I understand it.
[03:12:01] i'll comment on the bug
[03:12:16] That'd be awesome.
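The permissions thread between 01:02 and 01:20 settled on root:root 0644 for the three files in that puppet block, after db1044 turned up with owner and group 999. Below is a minimal Python sketch for spotting that kind of drift on a host after a puppet run. `permission_drift` is a hypothetical helper, not part of any Wikimedia tooling, and the caller has to supply the paths, because the log never names the three files.

```python
import os
import stat


def permission_drift(path, want_uid=0, want_gid=0, want_mode=0o644):
    """Hypothetical helper: list how `path` differs from the expected ownership and mode.

    The defaults mean root:root (uid 0, gid 0) with mode 0644, the values agreed
    on in the log. An empty list means the file matches.
    """
    st = os.stat(path)
    problems = []
    if st.st_uid != want_uid:
        problems.append('owner %d != %d' % (st.st_uid, want_uid))
    if st.st_gid != want_gid:
        problems.append('group %d != %d' % (st.st_gid, want_gid))
    # S_IMODE strips the file-type bits and keeps only the permission bits
    mode = stat.S_IMODE(st.st_mode)
    if mode != want_mode:
        problems.append('mode %o != %o' % (mode, want_mode))
    return problems
```

Under those assumptions, running it against the files on db1044 before the fix would have reported the uid/gid 999 ownership that springle spotted by hand. Declaring owner, group and mode explicitly in the puppet file resource, as agreed in the log, is what keeps the list empty.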
[03:30:58] (CR) Dzahn: "so i suppose /etc/ssl/certs/ca-certificates.crt must be included in the .chained.pem or specified additionally or this needs to keep SSLCA" [operations/puppet] - https://gerrit.wikimedia.org/r/90676 (owner: Dzahn)
[03:45:30] (PS1) Springle: repool db1007, warm up db1041 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91121
[03:46:01] (CR) Springle: [C: 2] repool db1007, warm up db1041 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91121 (owner: Springle)
[03:46:50] !log springle synchronized wmf-config/db-eqiad.php 'repool db1007, warm up db1041'
[03:47:05] Logged the message, Master
[04:12:19] (PS1) Dzahn: remove nomcom wiki Apache config [operations/apache-config] - https://gerrit.wikimedia.org/r/91124
[04:20:15] (CR) Dzahn: "no db: Unknown database 'nomcomwiki'." [operations/apache-config] - https://gerrit.wikimedia.org/r/91124 (owner: Dzahn)
[04:27:47] (PS1) Dzahn: remove nomcom wiki [operations/dns] - https://gerrit.wikimedia.org/r/91125
[04:40:13] RECOVERY - search indices - check lucene status page on search18 is OK: HTTP OK: HTTP/1.1 200 OK - 55880 bytes in 0.112 second response time
[05:09:37] (CR) Ori.livneh: "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/90733 (owner: Physikerwelt)
[05:14:26] (PS1) Dzahn: delete search.wikimedia.org Apache config file [operations/puppet] - https://gerrit.wikimedia.org/r/91132
[05:17:15] (PS1) Springle: sideline db1017 for a time, disk & config issues [operations/puppet] - https://gerrit.wikimedia.org/r/91133
[05:23:30] (CR) Springle: [C: 2] sideline db1017 for a time, disk & config issues [operations/puppet] - https://gerrit.wikimedia.org/r/91133 (owner: Springle)
[05:30:55] (CR) Physikerwelt: "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/90733 (owner: Physikerwelt)
[05:35:24] (PS1) Springle: sync sX-master dns after rotations [operations/dns] - https://gerrit.wikimedia.org/r/91136
[05:35:55] (CR) Springle: [C: 2] sync sX-master dns after rotations [operations/dns] - https://gerrit.wikimedia.org/r/91136 (owner: Springle)
[07:01:35] (PS2) Ryan Lane: Remove hook definition calls on deployment server [operations/puppet] - https://gerrit.wikimedia.org/r/90277
[07:16:28] (PS1) Ori.livneh: EventLogging Ganglia monitor: collect every 10 seconds (was: 30) [operations/puppet] - https://gerrit.wikimedia.org/r/91143
[07:20:24] (CR) Ori.livneh: "(6 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/90277 (owner: Ryan Lane)
[07:22:48] (CR) Ori.livneh: "It's one view *right now*, but that doesn't mean it'll stay that way. I encouraged Juliusz to create this view for start and build it up f" [operations/puppet] - https://gerrit.wikimedia.org/r/91079 (owner: JGonera)
[07:23:14] (CR) Ori.livneh: [C: 2] EventLogging Ganglia monitor: collect every 10 seconds (was: 30) [operations/puppet] - https://gerrit.wikimedia.org/r/91143 (owner: Ori.livneh)
[07:24:32] ori-l: isn't iteritems gone in python 3?
[07:24:43] I thought it's deprecated behavior to use iteritems...
[07:25:06] for py3, yes. that's a valid reason if you cite it :P
[07:25:23] I have to cite every py3 compatibility I use? :)
[07:25:42] no, I was half-kidding. It's fine, yeah.
[07:25:45] heh
[07:26:18] and bools aren't really supposed to be quoted in puppet
[07:26:32] if it's indeed a puppet bool, but in that case it should be lowercased
[07:26:34] a strict style guide will ding you on it. I should be using a lowercase
[07:26:36] indeed
[07:27:07] the Python comments were mostly of the 'syntactic sugar you may or may not have known about' variety, but the 'True' thing was confusing
[07:27:34] hm, actually a lot of the things I'm doing in get_config could be done using get('item', 'default')
[07:27:49] and the last could indeed be setdefault
[07:54:46] (PS3) Ryan Lane: Remove hook definition calls on deployment server [operations/puppet] - https://gerrit.wikimedia.org/r/90277
[07:55:29] (CR) Ryan Lane: "(6 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/90277 (owner: Ryan Lane)
[07:56:19] (CR) Ori.livneh: [C: 1] "thanks" [operations/puppet] - https://gerrit.wikimedia.org/r/90277 (owner: Ryan Lane)
[07:56:59] ori-l: so, now adding a repo is down to adding a small amount of config in one hash, and doing a clone on the deployment host
[07:57:24] I'll be adding an 'upstream' config item to the hash to automate the clones as well
[07:57:37] plus automatically configuring the repos and adding necessary hooks
[07:58:43] that's pretty sweet
[07:59:12] 'upstream' would be nice, yeah
[07:59:45] we have a global config system via pillars, may as well make use of them :)
[08:02:05] btw, who wrote puppet-merge?
[08:02:18] mediawiki-config needs the same thing
[08:02:28] mediawiki-config?
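The review exchange between 07:24 and 07:27 touches three Python idioms: `.items()` in place of `.iteritems()` (which no longer exists in Python 3), `dict.get(key, default)`, and `dict.setdefault`. The real `get_config` from change 90277 does not appear in the log, so the code below is a hypothetical stand-in. The repo names and the `upstream`/`checkout_submodules` keys are illustrative assumptions, not the actual deployment pillar schema.

```python
# Hypothetical defaults; keys are illustrative, not the real pillar schema.
DEFAULTS = {'upstream': None, 'checkout_submodules': False}


def get_config(repo, overrides):
    """Return the settings for `repo`, filling in DEFAULTS for missing keys."""
    # .get() with a default avoids a KeyError for repos with no overrides;
    # dict() copies so the caller's data is never mutated.
    config = dict(overrides.get(repo, {}))
    for key, value in DEFAULTS.items():
        # setdefault stores the default only when the key is absent
        config.setdefault(key, value)
    return config


def upstreams(overrides):
    """Map each configured repo to its upstream URL, or None when unset."""
    # .items() works on Python 2 and 3; .iteritems() was removed in Python 3
    return {repo: settings.get('upstream') for repo, settings in overrides.items()}
```

The default is stored as a real `False` and not as a string. In Python the string `'False'` is truthy (`bool('False')` is `True`), which is the same kind of confusion the quoted `'True'` raised in review.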
[08:02:44] ottomatta mostly wrote it
[08:02:55] with help from paravoid, mark, and andrewbogott
[08:03:10] yeah, operations/mediawiki-config.git
[08:03:26] it's the same process, except not automated
[08:03:30] morning
[08:03:34] hey hashar
[08:03:36] that's slated to be pushed via git-deploy
[08:03:56] we expect people to follow the same deploy practice as mediawiki
[08:04:03] we have a long involved process for that
[08:04:30] but deployment isn't the same as merging
[08:05:00] but if there's a workflow for that, so much the better
[08:05:08] it's the dev workflow
[08:05:20] if you want to change that process have at it, though :)
[08:06:51] nah, not really
[08:07:22] it's too out of scope for me i think
[08:07:28] * Ryan_Lane nods
[08:07:54] I try to avoid messing with other's workflow changes when possible ;)
[08:12:42] (PS4) Ryan Lane: Remove hook definition calls on deployment server [operations/puppet] - https://gerrit.wikimedia.org/r/90277
[08:12:43] (PS1) Ryan Lane: Cleanup of config options and pillars [operations/puppet] - https://gerrit.wikimedia.org/r/91151
[08:13:20] (PS2) ArielGlenn: get rid of temp-es* hosts, entries from 2009 long since unused [operations/dns] - https://gerrit.wikimedia.org/r/90871
[08:14:08] Ryan_Lane: you should sleep :-]
[08:14:13] :)
[08:14:27] soon enough
[08:14:36] going to do one more simple change
[08:15:20] (CR) ArielGlenn: [C: 2] get rid of temp-es* hosts, entries from 2009 long since unused [operations/dns] - https://gerrit.wikimedia.org/r/90871 (owner: ArielGlenn)
[08:16:03] !rt 5676
[08:16:04] http://rt.wikimedia.org/Ticket/Display.html?id=5676
[08:22:57] hashar! you should not be encouraging his sleep >.>
[08:23:05] :D
[08:23:39] OH MY GOD IT IS CAPSLOCK DAY!!
[08:23:41] I FORGOT ABOUT IT
[08:23:59] SEE HTTP://CAPSLOCKDAY.COM/
[08:30:40] NO WAY!
[08:39:34] MISTAKE, EXCLAMATION MARKS SHOULD BE CONVERTED TO !!!!1111!!BBQ
[08:55:53] (PS1) Ryan Lane: Arg lists for fetch and checkout module calls [operations/puppet] - https://gerrit.wikimedia.org/r/91155
[08:56:05] and that's my last change of the night :)
[09:19:25] WHAT A GREAT DAY!!!1111!!!BBQ
[09:28:41] (PS1) Springle: repool db1018 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91156
[09:29:12] (CR) Springle: [C: 2] repool db1018 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91156 (owner: Springle)
[09:30:15] !log springle synchronized wmf-config/db-eqiad.php 'repool db1018'
[09:30:32] Logged the message, Master
[09:42:48] !log Zuul: no more voting on code-review label {{bug|55757}} {{gerrit|91157}}. Will make the database happier I guess
[09:43:03] Logged the message, Master
[09:56:18] PROBLEM - Host sq80 is DOWN: PING CRITICAL - Packet loss = 100%
[10:04:13] (PS1) ArielGlenn: removing last vestiges (mgmt ip) for sq31-36, decommed [operations/dns] - https://gerrit.wikimedia.org/r/91160
[10:09:19] !log powercycled sq80, hung on mgmt console
[10:09:32] Logged the message, Master
[10:11:34] RECOVERY - Host sq80 is UP: PING OK - Packet loss = 0%, RTA = 30.99 ms
[10:12:22] swapdeath
[10:12:29] after 555 days or so not too bad
[10:28:45] (PS1) ArielGlenn: fix wrong path for files/charset.cnf (mysql_wmf module) [operations/puppet] - https://gerrit.wikimedia.org/r/91162
[10:30:09] (CR) ArielGlenn: [C: 2] fix wrong path for files/charset.cnf (mysql_wmf module) [operations/puppet] - https://gerrit.wikimedia.org/r/91162 (owner: ArielGlenn)
[10:48:50] (PS1) ArielGlenn: update path for icinga/percona files (now in mysql_wmf module) [operations/puppet] - https://gerrit.wikimedia.org/r/91165
[10:50:42] (CR) ArielGlenn: [C: 2] update path for icinga/percona files (now in mysql_wmf module) [operations/puppet] - https://gerrit.wikimedia.org/r/91165 (owner: ArielGlenn)
[10:56:23] (CR) Mark Bergsma: [C: -1] "Why are you removing the decommissioned Varnish caches from the decommissioned list? They should stay in there for statistics..." [operations/puppet] - https://gerrit.wikimedia.org/r/91041 (owner: Cmjohnson)
[10:58:19] (PS1) ArielGlenn: one more path fixup for icinga/percona files [operations/puppet] - https://gerrit.wikimedia.org/r/91167
[10:59:41] (CR) ArielGlenn: [C: 2] one more path fixup for icinga/percona files [operations/puppet] - https://gerrit.wikimedia.org/r/91167 (owner: ArielGlenn)
[11:06:29] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 10 hours
[11:21:29] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours
[11:55:56] (PS1) ArielGlenn: fix up path for skrillex.yaml.erb (now in mysql_wmf module) [operations/puppet] - https://gerrit.wikimedia.org/r/91173
[11:56:52] (PS5) Andrew Bogott: Remove generic::mysql::packages in favor of mysql module [operations/puppet] - https://gerrit.wikimedia.org/r/90194
[11:57:27] (CR) jenkins-bot: [V: -1] Remove generic::mysql::packages in favor of mysql module [operations/puppet] - https://gerrit.wikimedia.org/r/90194 (owner: Andrew Bogott)
[11:57:59] (PS6) Andrew Bogott: Remove generic::mysql::packages in favor of mysql module [operations/puppet] - https://gerrit.wikimedia.org/r/90194
[11:59:24] (CR) Andrew Bogott: [C: 2] Remove generic::mysql::packages in favor of mysql module [operations/puppet] - https://gerrit.wikimedia.org/r/90194 (owner: Andrew Bogott)
[11:59:43] (PS2) ArielGlenn: fix up path for skrillex.yaml.erb (now in mysql_wmf module) [operations/puppet] - https://gerrit.wikimedia.org/r/91173
[12:01:04] (CR) ArielGlenn: [C: 2] fix up path for skrillex.yaml.erb (now in mysql_wmf module) [operations/puppet] - https://gerrit.wikimedia.org/r/91173 (owner: ArielGlenn)
[12:04:49] RECOVERY - Puppet freshness on tin is OK: puppet ran at Tue Oct 22 12:04:41 UTC 2013
[12:13:04] RECOVERY - Puppet freshness on mchenry is OK: puppet ran at Tue Oct 22 12:13:00 UTC 2013
[12:15:53] (PS1) Andrew Bogott: Remove the last of mysql.pp [operations/puppet] - https://gerrit.wikimedia.org/r/91175
[12:16:28] (CR) jenkins-bot: [V: -1] Remove the last of mysql.pp [operations/puppet] - https://gerrit.wikimedia.org/r/91175 (owner: Andrew Bogott)
[12:29:55] (CR) Ottomata: "Is there a reason this is called 'log.line.scratch.size' instead of just 'log.scratch.size'? What does log.hash.size refer to? I guess " [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/90028 (owner: Edenhill)
[12:31:25] (CR) Ottomata: "Note my comment on https://gerrit.wikimedia.org/r/#/c/90028 about the log.* config value names." [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/90030 (owner: Edenhill)
[12:32:17] (CR) Edenhill: "You are correct in your assumptions and I agree with the namespace clashes." [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/90028 (owner: Edenhill)
[12:38:25] (CR) Ottomata: "(1 comment)" [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/91052 (owner: Edenhill)
[12:40:45] (CR) Edenhill: "(1 comment)" [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/91052 (owner: Edenhill)
[12:55:15] (CR) Ottomata: [C: 1] "This looks great!" [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/91053 (owner: Edenhill)
[13:01:35] using git fetch --all It hangs on feching gerrit, any idea why?
[13:26:40] (CR) Mark Bergsma: [C: 2] Grow scratch pad by temporary buffers if necessary (issue #2) [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/91053 (owner: Edenhill)
[13:29:09] (CR) Edenhill: "Should probably add "kafka." prefix to kafka properties aswell." [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/90028 (owner: Edenhill)
[13:30:51] (PS2) Matanya: Repalce exec calls with file and user. [operations/puppet] - https://gerrit.wikimedia.org/r/86889
[13:31:12] (CR) jenkins-bot: [V: -1] Repalce exec calls with file and user. [operations/puppet] - https://gerrit.wikimedia.org/r/86889 (owner: Matanya)
[13:34:50] (PS3) Matanya: Repalce exec calls with file and user. [operations/puppet] - https://gerrit.wikimedia.org/r/86889
[13:35:11] (CR) jenkins-bot: [V: -1] Repalce exec calls with file and user. [operations/puppet] - https://gerrit.wikimedia.org/r/86889 (owner: Matanya)
[13:36:03] (CR) Mark Bergsma: [C: -2] "Please don't mix formatting changes (tabs/spaces) with other changes." [operations/puppet] - https://gerrit.wikimedia.org/r/86889 (owner: Matanya)
[13:36:18] (CR) coren: [C: 2] toollabs: Install graphviz [operations/puppet] - https://gerrit.wikimedia.org/r/90767 (owner: Yuvipanda)
[13:37:01] (CR) coren: [C: 2] toollabs: Install ant on -login and -dev [operations/puppet] - https://gerrit.wikimedia.org/r/90766 (owner: Yuvipanda)
[13:37:11] (PS2) coren: toollabs: Sort package names in dev_environ [operations/puppet] - https://gerrit.wikimedia.org/r/90765 (owner: Yuvipanda)
[13:48:45] (CR) coren: [C: 2] toollabs: Sort package names in dev_environ [operations/puppet] - https://gerrit.wikimedia.org/r/90765 (owner: Yuvipanda)
[13:48:56] (PS2) coren: toollabs: Install ant on -login and -dev [operations/puppet] - https://gerrit.wikimedia.org/r/90766 (owner: Yuvipanda)
[13:52:29] (CR) coren: [C: 2] toollabs: Install ant on -login and -dev [operations/puppet] - https://gerrit.wikimedia.org/r/90766 (owner: Yuvipanda)
[13:52:40] (PS2) coren: toollabs: Install graphviz [operations/puppet] - https://gerrit.wikimedia.org/r/90767 (owner: Yuvipanda)
[13:53:40] (CR) coren: [C: 2] toollabs: Install graphviz [operations/puppet] - https://gerrit.wikimedia.org/r/90767 (owner: Yuvipanda)
[14:12:20] (PS1) Hashar: bump upstream to v0.6.0-43-g9d3a3d4 [operations/debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/91183
[14:12:21] (PS1) Hashar: changelog entry for upstream bump to v0.6.0-43-g9d3a3d4 [operations/debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/91184
[14:12:24] finally
[14:14:25] (CR) Hashar: "The master branch originally created upstream code which we amended to release our own version. I have created an upstream branch that co" [operations/debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/91183 (owner: Hashar)
[14:15:14] (CR) Hashar: "debian/changelog entry for the upstream merge made by https://gerrit.wikimedia.org/r/#/c/91183/ ." [operations/debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/91184 (owner: Hashar)
[14:30:23] (PS6) Hashar: beta: symlink /a/common [operations/puppet] - https://gerrit.wikimedia.org/r/65254
[14:33:23] (PS4) Matanya: Repalce exec calls with file and user. [operations/puppet] - https://gerrit.wikimedia.org/r/86889
[14:33:43] (CR) jenkins-bot: [V: -1] Repalce exec calls with file and user. [operations/puppet] - https://gerrit.wikimedia.org/r/86889 (owner: Matanya)
[14:34:34] PROBLEM - Host srv291 is DOWN: PING CRITICAL - Packet loss = 100%
[14:35:12] (PS3) Hashar: adjust jobrunner/videoscaler role for beta [operations/puppet] - https://gerrit.wikimedia.org/r/77034
[14:35:14] i give up
[14:35:17] too busy for this
[14:38:15] (CR) Hashar: "Job runner configuration should be way easier since it would just be about editing the configuration hash." [operations/puppet] - https://gerrit.wikimedia.org/r/77034 (owner: Hashar)
[14:38:31] (PS2) Hashar: ntp: explicitly reference variable in current scope [operations/puppet] - https://gerrit.wikimedia.org/r/86647
[14:41:43] (PS3) Hashar: ganglia wrapper for py plugins (and add diskstat plugin) [operations/puppet] - https://gerrit.wikimedia.org/r/85669
[14:57:53] (PS2) Andrew Bogott: Remove the last of mysql.pp [operations/puppet] - https://gerrit.wikimedia.org/r/91175
[15:00:41] (PS2) Hashar: graphite: make sure we aggregate min/max/count properly [operations/puppet] - https://gerrit.wikimedia.org/r/88470
[15:00:47] (PS2) Hashar: graphite: tweak statsd aggregation [operations/puppet] - https://gerrit.wikimedia.org/r/88471
[15:25:05] (CR) Andrew Bogott: [C: 2] Remove the last of mysql.pp [operations/puppet] - https://gerrit.wikimedia.org/r/91175 (owner: Andrew Bogott)
[15:58:06] csteipp: is https://gerrit.wikimedia.org/r/#/c/90670/ OK to merge?
[16:03:55] chrismcmahon: Yeah, just need to merge it on the cluster too when it's merged, so the next deployer doesn't get surprised by it
[16:17:39] (PS2) Reedy: Reduce all the www portal docroots into one [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90668
[16:17:48] (CR) Reedy: [C: 2] Reduce all the www portal docroots into one [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90668 (owner: Reedy)
[16:17:52] (CR) jenkins-bot: [V: -1] Reduce all the www portal docroots into one [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90668 (owner: Reedy)
[16:17:57] SCREW YOU JENKINS
[16:18:21] (CR) Reedy: [V: 2] Reduce all the www portal docroots into one [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90668 (owner: Reedy)
[16:19:21] !log reedy synchronized docroot/wwwportal
[16:19:37] Logged the message, Master
[16:20:11] (PS1) Chad: Foundationwiki gets new search [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91194
[16:20:43] (CR) Manybubbles: [C: 1] Foundationwiki gets new search [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91194 (owner: Chad)
[16:21:45] (CR) Chad: [C: 2] Foundationwiki gets new search [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91194 (owner: Chad)
[16:21:47] (PS2) Reedy: Switch www portals to using one docroot [operations/apache-config] - https://gerrit.wikimedia.org/r/90669
[16:21:58] (Merged) jenkins-bot: Foundationwiki gets new search [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91194 (owner: Chad)
[16:26:09] !log demon synchronized cirrus.dblist 'foundationwiki gets cirrus'
[16:26:20] Logged the message, Master
[16:27:24] (CR) Dzahn: [C: 1] "dzahn@fenari:~$ apache-fast-test wwwportals.url mw1044" [operations/apache-config] - https://gerrit.wikimedia.org/r/90669 (owner: Reedy)
[16:28:00] (CR) Ottomata: [C: 1] "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/85669 (owner: Hashar)
[16:28:44] ottomata1: that is a valid point :]
[16:30:14] <^d> Reedy: I put foundationwiki in cirrus.dblist, but it doesn't seem to be picking it up. Extension's not installed.
[16:30:39] Toch InitialiseSetting and sync?
[16:31:29] !log demon synchronized wmf-config/InitialiseSettings.php 'Touch'
[16:31:44] Logged the message, Master
[16:32:28] <^d> That did it, thanks.
[16:34:36] hashar, I can't really tell what's going on for those jenkins-deb-glue changes
[16:34:41] but I'm sure its fine, do you need me to merge?
[16:35:02] ottomata: basically attempted to update the package
[16:35:11] upstream provides a debian directory
[16:35:25] so we originally forked their repo in our master branch
[16:35:46] I simply added the upstream repo in an upstream branch and merged latest version in our master
[16:36:34] ok
[16:36:43] (PS1) Reedy: Move www.wikimedia.org portal to wwwportals file [operations/apache-config] - https://gerrit.wikimedia.org/r/91195
[16:37:20] ottomata: that is a bit hacky though
[16:38:46] so upstream and master are the same?
[16:39:48] hashar^?
[16:40:12] more or less
[16:40:14] (CR) Andrew Bogott: "You can install the pep8 tool locally and check your python files before submitting. Probably there was a .pep8 rule exception allowing t" [operations/puppet] - https://gerrit.wikimedia.org/r/90760 (owner: Matanya)
[16:40:17] master is a fork of some commit of upstream
[16:40:25] with 2 commits from Andrew to tweak the package version
[16:40:33] then I merged the latest upstream in master
[16:40:41] and added another commit in master to further update our version
[16:40:41] (CR) Dzahn: [C: 2] Move www.wikimedia.org portal to wwwportals file [operations/apache-config] - https://gerrit.wikimedia.org/r/91195 (owner: Reedy)
[16:41:50] so upstream-branch=upstream (no changes) and debian-branch=master (i know you might not be using git-buildpackage, but whatevs)
[16:41:52] I think that sounds good
[16:42:19] (PS3) Ottomata: ntp: explicitly reference variable in current scope [operations/puppet] - https://gerrit.wikimedia.org/r/86647 (owner: Hashar)
[16:42:25] (CR) Ottomata: [C: 2 V: 2] ntp: explicitly reference variable in current scope [operations/puppet] - https://gerrit.wikimedia.org/r/86647 (owner: Hashar)
[16:42:40] ottomata: yeah that is it
[16:43:04] ottomata: jenkins-debian-glue is being build with …. jenkins-debian-glue which rely on gbp
[16:43:17] ottomata: I guess they only use the default so they don't need a debian/gbp.conf
[16:43:18] (PS1) Jforrester: cawiki: Enable VisualEditor for Portal: and Viquiprojecte: [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91197
[16:43:19] (PS1) Jforrester: enwiki: Enable VisualEditor for Portal: and Book: [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91198
[16:43:59] ottomata: the package is only used (for now) on a labs instance I play with, so you can put the packages in apt.wm.o without impacts :]
[16:44:03] yeah exactly
[16:44:05] upstream and master are defaults
[16:44:08] ok cool
[16:44:17] shall I merge that then?
[16:44:25] yup :-]
[16:44:41] maybe I will end up setting up my own CI reprepo :-]
[16:44:48] (PS2) Jforrester: cawiki: Enable VisualEditor for Portal: and Viquiprojecte: [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91197
[16:45:01] (PS2) Jforrester: enwiki: Enable VisualEditor for Portal: and Book: [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91198
[16:45:05] (PS1) CSteipp: Fix error message for recycled passwords [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91199
[16:45:39] (CR) Ottomata: [C: 2 V: 2] changelog entry for upstream bump to v0.6.0-43-g9d3a3d4 [operations/debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/91184 (owner: Hashar)
[16:45:53] (CR) Ottomata: [C: 2 V: 2] bump upstream to v0.6.0-43-g9d3a3d4 [operations/debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/91183 (owner: Hashar)
[16:47:29] (PS1) Andrew Bogott: Remove unused apparmor class [operations/puppet] - https://gerrit.wikimedia.org/r/91200
[16:49:25] (CR) Andrew Bogott: [C: 2] Remove unused apparmor class [operations/puppet] - https://gerrit.wikimedia.org/r/91200 (owner: Andrew Bogott)
[16:49:29] (CR) Dzahn: "yep, this is good for Change-Id: Ic236421a6bd178396be2f665ed0948fd10eb48e8 handle them all in one place" [operations/apache-config] - https://gerrit.wikimedia.org/r/91195 (owner: Reedy)
[16:53:30] (PS3) Reedy: Switch www portals to using one docroot [operations/apache-config] - https://gerrit.wikimedia.org/r/90669
[16:53:31] (CR) jenkins-bot: [V: -1] Switch www portals to using one docroot [operations/apache-config] - https://gerrit.wikimedia.org/r/90669 (owner: Reedy)
[16:54:02] (PS4) Reedy: Switch www portals to using one docroot. [operations/apache-config] - https://gerrit.wikimedia.org/r/90669
[16:54:22] ottomata: moving back home… Will upgrade the debian glue package tomorrow if you have made it available in apt.wm.o :-]
[16:54:49] cool
[16:54:53] oh, you need me to build it?
[16:55:17] can you build it and put a .deb on fenari or something? I can put it in apt then
[16:59:36] ottomata: yeah will do tomorrow and put on fenari then ping you :-]
[17:00:01] ottomata: though git buildpackage might just work :]
[17:00:11] I am off of the coworking place, it is closing/
[17:01:05] cool, laters!
[17:01:25] (CR) Dzahn: [C: 2] "yep, still works fine after removing all www* docroots" [operations/apache-config] - https://gerrit.wikimedia.org/r/90669 (owner: Reedy)
[17:01:41] wheeeeee
[17:03:28] (PS5) Reedy: Move a lot of the miscellaneous wikis out of their own specific docroots [operations/apache-config] - https://gerrit.wikimedia.org/r/90703
[17:03:35] (PS2) Reedy: Remove all superfluous docroot folders [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90704
[17:06:19] !log sync-apache, graceful to unify www portal docroots
[17:06:31] Logged the message, Master
[17:08:46] :D
[17:09:18] Reedy: done
[17:09:34] fwif, your topic branch on one of those above is called "(detached"
[17:09:57] Awesome
[17:10:39] bbiaw
[17:10:43] meta wiki and commons is down
[17:10:46] well
[17:10:51] more of a redirect to foundaiton wiki
[17:10:52] RobH: good morning ;)
[17:10:58] Oh crap
[17:11:03] heyas
[17:11:05] I am guessing all wikimedia wikis have suffered this
[17:11:10] oh crap
[17:11:11] mutante: I think we just broke it
[17:11:23] Reedy I caught it instantly :p
[17:11:40] I wonder if it was
[17:11:40] 205 -
[17:11:41] 206 - ServerName wikimedia.org
[17:11:41] 207 - Redirect permanent / http://www.wikimedia.org/
[17:11:41] I was typing a detailed post
[17:11:42] 208 -
[17:11:44] 209 -
[17:11:53] ya
[17:12:00] arr. revert or can we fix it quicker
[17:12:15] right now the caching results are being poisoned
[17:12:17] i'd revert.
[17:12:18] Probably safer to revert the last commit
[17:12:27] images still work I think
[17:12:32] when i did this years ago i had to purge caches
[17:12:42] Uh, last 2 as the wikimedia one being at fault
[17:12:43] https://upload.wikimedia.org/wikipedia/commons/4/43/Wikimood_07.png
[17:12:57] Reedy: ping
[17:13:04] so can you hurry up? :)
[17:13:15] (PS1) Dzahn: Revert "Switch www portals to using one docroot." [operations/apache-config] - https://gerrit.wikimedia.org/r/91203
[17:13:35] mark: will varnish need purging for this?
[17:13:40] squid
[17:13:45] mutante: I'd just reset head a few revisions and push that config
[17:13:45] (CR) Dzahn: [C: 2] Revert "Switch www portals to using one docroot." [operations/apache-config] - https://gerrit.wikimedia.org/r/91203 (owner: Dzahn)
[17:14:03] well, atleast theres less of them now....
[17:14:13] less?
[17:14:23] mutante, Reedy: I think you have broken Meta-Wiki.
[17:14:27] Yes
[17:14:28] you may have *
[17:14:30] We know full well
[17:14:33] Oh, good. :-)
[17:14:34] well, some are varnish instead, but i guess capacity is up
[17:14:48] syncs
[17:14:49] and i suppose im counting uplaod and shouldnt
[17:16:39] still redirecting
[17:16:47] Yes
[17:16:51] No one said it was fixed
[17:16:54] ah sorry
[17:17:06] I assumed the Dzahn revers fixed it
[17:17:15] thats code revert, still has to deploy across cluster
[17:17:20] (PS1) Dzahn: Revert "Move www.wikimedia.org portal to wwwportals file" [operations/apache-config] - https://gerrit.wikimedia.org/r/91204
[17:17:25] You know what they say about people who assume....
[17:17:28] then we have to manually purge the squid caches since they are poisoned.
[17:17:46] (CR) Dzahn: [C: 2] Revert "Move www.wikimedia.org portal to wwwportals file" [operations/apache-config] - https://gerrit.wikimedia.org/r/91204 (owner: Dzahn)
[17:17:55] Marybelle I assume something unpleasant?
[17:17:57] :D
[17:18:48] RobH how long does that normally take, just to know how much time I have to kill
[17:19:16] !log apache graceful-all
[17:19:17] The last time I saw this happen was by me a few years ago, and that took a long time to fix for every single squid
[17:19:28] Logged the message, Master
[17:19:30] but mark would be more knowledgable as to timeframe (i would think)
[17:20:16] normal pages are cached for a month IIRC
[17:20:27] Hello, I need to run mwscriptwikiset against two temp dblist files, which are located under my home dir, but mwscriptwikiset wouldn't allow me to run it unless they are under /usr/local/apache/common-local/, and I can't copy them to the directory. Any thoughts? Thanks
[17:20:37] I think if you try to manually purge all the Squids, the site will fall over. If you can limit to meta.wikimedia.org, it'd be better.
[17:20:42] gwicke: yea, we will manually purge
[17:21:14] we have to purge for any links affected
[17:21:19] was meta the only thing redirected?
[17:21:24] bsitu: just fyi, opsen are working on a issue right now, might be a bit before a response
[17:21:24] No
[17:21:25] *.wikimedia.org, I think.
[17:21:33] So commons.wikimedia.org, meta.wikimedia.org, etc.
[17:21:33] Anything *.wikimedia.org is probably problematic
[17:21:36] thats way more than just meta
[17:21:36] <^d> bsitu: foreachwikiindblist should work with any dblist.
[17:21:44] but yea, thats less than everything
[17:21:46] <^d> But I think you have to sudo -u apache for it to work.
[17:21:58] reverted, synced, restarted apaches
[17:21:58] (no wikipedia involvement, this is automatically less severe than when i did it years ago ;)
[17:22:00] RobH: It's not wikipedia.org, which would be the most painful. :-)
[17:22:08] commons back for me
[17:22:10] yea my identical to this break was wikipedia
[17:22:11] heh
[17:22:31] mutante: So i dont think its working for evyerone though
[17:22:37] images dont redirect though
[17:22:37] (http://otrs-wiki.wikimedia.org/wiki/Special:RecentChanges forwards to wikimediafoundation.org, but apparently you are already aware)
[17:22:49] pajz: Being worked on, yes. :-)
[17:23:00] https://bugzilla.wikimedia.org/show_bug.cgi?id=56006 has my test case.
[17:23:04] Which seems to be better now.
[17:23:10] !log Banned Content-Type == "application/x-httpd-php" in text Varnish (eqiad)
[17:23:17] Though Special pages are cached differently, I think.
[17:23:23] ^d: thx, let me take a look at it
[17:23:25] Logged the message, Master
[17:23:40] greg-g: got it
[17:23:40] meta back for me, too
[17:23:53] bsitu: :)
[17:23:57] Plus HTTPS reduces caching as well, I think.
[17:24:07] Anyway, glad it's resolved. Thanks Reedy and mutante. :-)
[17:25:43] RobH: https://upload.wikimedia.org/wikipedia/commons/c/c3/White_Cat.jpg
[17:25:53] ?
[17:25:53] upload. doesnt seem to recirect
[17:25:58] its not fixed all the way.
[17:26:02] no I know
[17:26:08] sees the cat
[17:26:21] but however it is defined seems to disregard the previous redirect
[17:27:18] Upload is seperate
[17:27:32] ^d: I couldn't find any documentation for foreachwikiindblist, is it foreachwikiindblist list.dblist script.php ?
[17:28:10] Reedy I dont know, just pointed out the anomoly in *.wikimedia.org redirections
[17:28:12] <^d> Something like that.
[17:28:21] <^d> I think it fails with some help text.
[17:29:13] looks it was a ServerAlias *.wikimedia.org in the config for the "www"-portals
[17:29:37] that ate the redirects
[17:29:48] where uplad isn't a portal
[17:30:38] Upload shouldn't hit those configs should it?
[17:31:26] ottomata: are you doing the update or should I?
[17:31:33] Should have already been routed to the correct cluster
[17:32:08] (please do, I can't answer at least the first question :)
[17:32:52] haha, paravoid, i barely can either, what has ops done since last tuesday?
[17:33:07] traffic in ulsfo
[17:37:36] (PS1) Reedy: Switch www portals to using one docroot [operations/apache-config] - https://gerrit.wikimedia.org/r/91209
[17:43:48] ^d: thx, that works
[17:44:39] <^d> yw
[17:53:31] mark or paravoid, you able to make that meeting with ken and dan? ken added ryan to the meeting, as he's the https contact, but i think it would make sense for one or both of you guys to attend, too, given that you've been the ones looking at lots of facets of w0, including ip addresses.
[17:55:50] (CR) Bsitu: [C: 2] Enable Echo on all wikis except dewiki and itwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91072 (owner: Bsitu)
[17:56:02] (Merged) jenkins-bot: Enable Echo on all wikis except dewiki and itwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91072 (owner: Bsitu)
[17:59:32] (CR) Reedy: "At this point when it's the minority opting out..." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91072 (owner: Bsitu)
[18:00:52] Reedy: yes, the dblist file is used by a cron as well
[18:01:42] !log restarting a few Apaches that didnt get restarted by graceful-all (hrmmpf)
[18:01:54] Logged the message, Master
[18:04:10] (CR) Bsitu: "Yes, the dblist file is used by a cron job as well" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91072 (owner: Bsitu)
[18:04:19] (CR) Ottomata: [C: 2 V: 2] Added statistics (both from varnishkafka and librdkafka) [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/91052 (owner: Edenhill)
[18:05:02] (CR) Ottomata: "Hm. Since this is varnishkafka.conf, I'm ok with properties without a top level prefix to be varnishkafka related settings. E.g. log.dat" [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/90028 (owner: Edenhill)
[18:05:30] (CR) Ottomata: [V: 2] Log failed Kafka message deliveries (issue #1) [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/90029 (owner: Edenhill)
[18:05:48] (CR) Ottomata: [V: 2] Provide some more detail when Kafka ..produce() fails. [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/90030 (owner: Edenhill)
[18:06:11] (CR) Ottomata: [V: 2] Added rate-limiting to (most) error logs generated by varnishkafka (issue #1) [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/90184 (owner: Edenhill)
[18:07:25] (CR) Ottomata: [C: 2 V: 2] Limit maximum tag size content [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/91054 (owner: Edenhill)
[18:08:38] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Enable Echo and Thanks on all wikis except dewiki and itwiki'
[18:08:50] Logged the message, Master
[18:08:54] (CR) Ottomata: [C: 2 V: 2] Increase string renderer output buffer to 8K [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/91055 (owner: Edenhill)
[18:08:58] !log bsitu synchronized echowikis.dblist 'Enable Echo and Thanks on all wikis except dewiki and itwiki'
[18:09:12] Logged the message, Master
[18:09:29] (CR) Ottomata: [C: 2 V: 2] Avoid unnecessary clearing of scratch pad on logline alloc. [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/91056 (owner: Edenhill)
[18:10:19] (CR) Ottomata: "How about I go ahead and merge this and we can fix the config property names in a different commit?" [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/90028 (owner: Edenhill)
[18:11:05] (CR) Ottomata: [C: 2 V: 2] Make scratch buffer size configurable (issue #2) [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/90028 (owner: Edenhill)
[18:11:34] (CR) Ottomata: [C: 2 V: 2] Grow scratch pad by temporary buffers if necessary (issue #2) [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/91053 (owner: Edenhill)
[18:18:48] paravoid: btw, watch out for broken google hangouts on jessie, I had to downgrade libcairo: http://comments.gmane.org/gmane.linux.linaro.devel/16477
[18:18:58] also, other Debian users ^^^
[18:25:27] RobH: ping
[18:29:14] greg-g: thanks for the info
[18:29:49] mark: ping
[18:31:01] gwicke: anything I can help you with?
[18:31:24] cmjohnson1: not sure, am still waiting on https://gerrit.wikimedia.org/r/#/c/91046/
[18:32:07] looks good
[18:32:23] (PS2) Cmjohnson: Sudo for gwicke on cassandra test cluster [operations/puppet] - https://gerrit.wikimedia.org/r/91046 (owner: GWicke)
[18:32:29] (CR) Cmjohnson: [C: 2] Sudo for gwicke on cassandra test cluster [operations/puppet] - https://gerrit.wikimedia.org/r/91046 (owner: GWicke)
[18:32:57] cmjohnson1: muchas gracias!
[18:33:55] (CR) Cmjohnson: [V: 2] Sudo for gwicke on cassandra test cluster [operations/puppet] - https://gerrit.wikimedia.org/r/91046 (owner: GWicke)
[18:35:31] !log bsitu synchronized wmf-config/InitialiseSettings.php 'touch'
[18:35:46] Logged the message, Master
[18:36:49] gwicke: you should be able to log on to your test host now
[18:37:12] cmjohnson1: yes, am already installing cassandra. Thanks!
[18:37:22] cool
[18:40:02] trying to at least.. those hosts don't have access to the internets
[18:42:22] I'd like to install cassandra from the datastax repository, which we don't mirror
[18:42:41] !log bsitu synchronized php-1.22wmf21/extensions/Echo/modules/overlay 'touch'
[18:42:54] Logged the message, Master
[18:43:11] is there a recommended way to handle situations like these, or should I do some tricks with ssh tunneling to bast1001 and apt proxy settings?
[18:43:29] paravoid ^^
[18:44:58] gwicke, you can dl the .deb from datastax and dpkg -i it
[18:45:12] yeah, as a work-around
[18:45:14] or add the datastax repo to your apt sources.list
[18:45:22] that won't work
[18:45:23] oh but they don't have internet
[18:45:49] wait, but they can get packages from ubuntu/debian apt
[18:45:56] they probably proxy through brewster automatically
[18:45:58] from the local mirror of it
[18:46:00] you sure it won't work?
[18:46:03] ohhh
[18:46:05] k
[18:46:08] didn't know that
[18:46:11] hm yeah
[18:46:17] I just tried to install the repo key, which fails
[18:46:19] i thikn you can teach apt to use proxy
[18:46:24] predictably..
[18:46:39] yeah, did the proxy thing in the past
[18:47:03] bast1001 has net access, so I could ssh tunnel there and set the local port as an apt proxy
[18:47:08] all very hacky though..
[18:47:31] cna you just use brewster.wikimedia.org:8080?
[18:47:44] there's an http proxy already there
[18:48:52] ottomata: that did the trick, thanks!
[18:48:58] yup!
[18:49:55] ah, it is already configured in apt.conf
[18:50:11] only the key import was failing
[18:51:22] ah
[19:00:40] Coren: apergos would know
[19:01:40] Well, the filesystem is ready and exportable, needs only a place to allow writing from. :-)
[19:01:51] Coren: it says $cluster = "misc";
[19:01:56] i've no idea what that means
[19:03:04] PROBLEM - Host ms-be1011 is DOWN: PING CRITICAL - Packet loss = 100%
[19:03:05] (PS1) Cmjohnson: Removing dns entries for mw126-135 [operations/dns] - https://gerrit.wikimedia.org/r/91215
[19:03:31] paravoid: ms-be1011 you?
[19:04:01] gwicke: i see cmjohnson1 took care of it for ya, sorry was afk @ bank
[19:04:41] (CR) Jdlrobson: [C: 1] Update MobileWebEditing schema revision [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90451 (owner: JGonera)
[19:04:52] RobH: np, thanks for your help!
[19:05:30] (CR) MaxSem: [C: 1] Update MobileWebEditing schema revision [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90451 (owner: JGonera)
[19:05:43] quite welcome
[19:08:41] * YuviPanda pokes apergos a bit more?
[19:08:50] (PS1) coren: Automount /data/pagecounts for labs projects [operations/puppet] - https://gerrit.wikimedia.org/r/91217
[19:09:39] (PS2) coren: Automount /data/pagecounts for labs projects [operations/puppet] - https://gerrit.wikimedia.org/r/91217
[19:10:00] ow ow
[19:10:02] yes?
[19:10:36] warning that my brain has pretty much shut off for the night
[19:11:20] ow
[19:11:29] apergos: go to sleep, this can wait :)
[19:12:03] not sleep, pre-sleep vegetative state (mindless web surfing)
[19:14:23] (CR) Jdlrobson: [C: -1] "It looks like there could be a slight problem with this..." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90451 (owner: JGonera)
[19:14:48] (CR) Jdlrobson: "If we use the new schema on old code no event logging will happen as the revision it points to has added a required field..." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90451 (owner: JGonera)
[19:17:20] apergos: the script will run on dataset2 or dataset1001?
[19:17:32] apergos: Coren made a separate partition on nfs, so wants to know where they're going to be writing from
[19:17:38] allow both to write
[19:17:57] apergos: okay! I'll also have to setup the new mountpoint, but should be trivial.
[19:18:04] we want to be able to move the job around
[19:18:14] Coren: so both dataset2 and dataset1001 should be able to write to the partition
[19:18:39] YuviPanda: Got it.
[19:19:07] (CR) coren: [C: 2] Automount /data/pagecounts for labs projects [operations/puppet] - https://gerrit.wikimedia.org/r/91217 (owner: coren)
[19:19:07] ty!
[19:21:55] (PS1) Dzahn: make apache-fast-test pybal use eqiad config [operations/puppet] - https://gerrit.wikimedia.org/r/91270
[19:22:31] (CR) Cmjohnson: [C: 2] Removing dns entries for mw126-135 [operations/dns] - https://gerrit.wikimedia.org/r/91215 (owner: Cmjohnson)
[19:23:21] (CR) Reedy: [C: 1] make apache-fast-test pybal use eqiad config [operations/puppet] - https://gerrit.wikimedia.org/r/91270 (owner: Dzahn)
[19:23:43] !log dns update
[19:23:58] Logged the message, Master
[19:25:10] (CR) Dzahn: [C: 2] make apache-fast-test pybal use eqiad config [operations/puppet] - https://gerrit.wikimedia.org/r/91270 (owner: Dzahn)
[19:27:43] paravoid: got a three-node cluster up now, documenting the process at https://www.mediawiki.org/wiki/User:GWicke/Notes/Storage/Testing
[19:47:47] (CR) Ryan Lane: [C: 2] Remove hook definition calls on deployment server [operations/puppet] - https://gerrit.wikimedia.org/r/90277 (owner: Ryan Lane)
[19:54:12] !log updating git deployment system with change 90277 (0424d6f7)
[19:54:27] Logged the message, Master
[20:05:38] Anyone around who can purge something from varnish? Bit of extra cleanup from the outage earlier
[20:06:13] mark: ^ ?
[20:06:17] [21:00:08] Reedy: I think bans on content-length should be possible. /me guesses `ban obj.http.content-length == 33518`
[20:06:25] (PS1) Ori.livneh: IPython: make $ipythondir writable for IPython user [operations/puppet] - https://gerrit.wikimedia.org/r/91287
[20:06:33] he is marked (hah) as /away
[20:06:53] bblack: About?
[20:07:06] Reedy: yeah, mostly
[20:07:18] bblack: in the office?
[20:07:33] in my office, which is probably nowhere near your office :)
[20:07:39] ok :/
[20:07:48] just checking if i can stop by and say hi
[20:07:54] hi! :)
[20:08:38] Full disclosure: I google guessed that ban command.
[20:08:47] hhe
[20:08:49] heh
[20:08:56] Full disclosure: I don't actually know anything, I google everything I do :)
[20:09:25] heh
[20:09:40] I tried to remember how I got things done in the early 90's and I really coudn't
[20:10:03] I do remember having a huge stack of books at my desk
[20:10:06] A bookshelf full of O`Reilly animal books
[20:10:07] (CR) Ori.livneh: [C: 2] IPython: make $ipythondir writable for IPython user [operations/puppet] - https://gerrit.wikimedia.org/r/91287 (owner: Ori.livneh)
[20:10:13] PHP development requires an active web browser session
[20:10:30] PHP development requires an active hole in the side of your head!
[20:10:59] bblack: so no deaf developers?
[20:11:00] bblack: Yes, you need to listen to the code. It says things quietly
[20:11:07] :P
[20:11:16] Reedy: so, what do I have to purge where and how?
[20:11:55] How? However you'd like ;)
[20:12:20] bd808: my favorite new PHP fun fact that I learned via TimStarling the other day: PHP has a wrapper for select(), which can fail and return false for all the usual reasons, but it doesn't give access to errno, so you can't tell if it's just EINTR or not
[20:12:38] how did they expect anyone to write fully functional select()-based code without access to errno? :P
[20:13:04] I saw some of that discussion. I think the answer to your question is that they didn't expect anyone to actually use it.
[20:13:35] There is a lot of code in the php core that's "works for me" for some scenario or other
[20:14:43] bblack: ban obj.http.content-length == 33518 on the text varnishes would be great (if said command is correct from bd808s googling ;D)
[20:14:57] stuff that will work 99% of the time is in many ways worse than something that works 50% of the time.
[20:15:15] i just did on cp1055 and 1054: ban req.http.host == "commons.wikimedia.org"
[20:15:29] but i'm on the way to bank
[20:16:32] We are hoping to make the varnishes forget about cached copies of extract2.php
[20:17:12] based on it's length
[20:17:13] ok
[20:17:19] all varnishes?
[20:17:51] Text vanishes in all data centers I guess
[20:18:45] Although mark only logged having done equiad earlier
[20:19:04] > 17:23 mark: Banned Content-Type == "application/x-httpd-php" in text Varnish (eqiad)
[20:19:24] Ganglia suggests we only have text varnish in eqiad
[20:19:33] the rest (including some of eqiad) is squid
[20:19:36] well there you go then
[20:20:02] So, that's still a yes
[20:20:08] * bd808 shakes fist at squids everywhere
[20:20:13] It is indeed all text varnishes in all datacentres
[20:20:18] yeah ok
[20:21:07] actually those are the only text varnishes in puppet
[20:21:24] there's amssq47 in puppet technically, but that host seems dead or something, I've never reached it when I looked before
[20:22:17] oh nevermind, I must be confused, that machine does work
[20:23:11] Reedy: yeah so I did the content-length ban on amssq47. if mark did eqiad text, that's it then.
[20:23:41] is there an easy way to confirm it?
[20:23:59] marks ban was for a php content type, not the length
[20:25:10] So this is a new one, based on a hunch there might be other wrongly cached things with that content length (as aude found)
[20:25:15] ok
[20:26:27] (CR) Ryan Lane: [C: 2] Cleanup of config options and pillars [operations/puppet] - https://gerrit.wikimedia.org/r/91151 (owner: Ryan Lane)
[20:27:30] ori-l: any feelings on this before I merge it in? https://gerrit.wikimedia.org/r/#/c/91155/1
[20:27:31] Reedy: applied the content-length ban to eqiad texts now
[20:29:23] Great, thanks :)
[20:29:55] bblack: Would you mind !log'ing that?
[20:30:35] !log varnish: banned 'obj.http.content-length == 33518' on text varnishes everywhere (extract2.php leakage)
[20:30:44] (PS1) Yuvipanda: dumps: Copy pagecounts data to public labs nfs too [operations/puppet] - https://gerrit.wikimedia.org/r/91293
[20:30:50] Logged the message, Master
[20:30:56] bblack: Thanks!
[20:31:32] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000
[20:34:42] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:37:50] (CR) Hashar: [C: -1] "(2 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/91293 (owner: Yuvipanda)
[20:39:26] (CR) Yuvipanda: "(2 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/91293 (owner: Yuvipanda)
[20:39:28] hashar: ^
[20:39:35] hashar: sticking to the file's conventions :P
[20:39:42] ahh
[20:40:12] (CR) Hashar: "ah yeah indeed :-} Sorry!" [operations/puppet] - https://gerrit.wikimedia.org/r/91293 (owner: Yuvipanda)
[20:40:55] Ryan_Lane: looks good. I don't love the use of __MAGICWORDS__ since they're usually reserved for Python interpreter internals so it makes it seems like you're practicing some kind of black magic with the interpreter
[20:41:29] we can change that to some other kind of magic word
[20:41:34] hashar: can you remove the -1? :)
[20:41:38] let me know what magic word format you'd like
[20:42:09] YuviPanda: I did, it is just that grrit-wm does know how to report a CR -1 to CR 0 state change :]
[20:42:24] Ryan_Lane: does it have to have some kind of special format? can't it be 'repo' or something?
[20:42:25] ah :D ok
[20:42:37] ori-l: you might actually want to pass in 'repo'
[20:42:47] hashar: re: grrrit-wm, it's documented on wikitech already. On grrrit-wm, rather than grrrit
[20:42:59] it needs to be something unlikely to be used ever
[20:44:30] Oh. Um, I'd prefer something like __salt_repo, __salt_foo, something like that. But since you're already using __salt__ , maybe it's best to stick with __REPO__ for now and refactor later
[20:44:52] well, it could be a number of things
[20:45:02] ideally we'd let people expand most config options for the repo or the global config
[20:45:10] (CR) Yuvipanda: "Not tested yet, needs apergos to test it / merge" [operations/puppet] - https://gerrit.wikimedia.org/r/91293 (owner: Yuvipanda)
[20:45:17] I started with repo because it's the one actually being used right now
[20:45:24] Can someone confirm if https://gerrit.wikimedia.org/r/#/c/91270 has been merged on sockpuppet? If not, could they merge it and then kick puppet on fenari please?
[20:45:34] Reedy: ok, just a moment
[20:45:36] __CONFIG__ would be another one
[20:45:55] or maybe __REPO_CONFIG__
[20:46:07] to pass in global and repo configs to the function
[20:46:27] so if there's a better format to use that's not weird from a python POV, that would be good
[20:46:49] leading double underscore but lowercase and no trailing double underscore
[20:46:50] could use {REPO} {REPO_CONFIG} etc
[20:46:55] oh, I like that
[20:47:02] {} i mean
[20:47:18] Reedy: it was already merged. I am running puppet on fenari now.
[20:47:29] ok. I'll start with what I have and refactor
[20:47:34] The last Puppet run was at Sun Mar 3 23:39:41 UTC 2013 (23 minutes ago).
[20:47:37] I don't want to refactor and add features in the same patch
[20:47:47] yeah, makes sense
[20:48:50] (CR) Ryan Lane: [C: 2] Arg lists for fetch and checkout module calls [operations/puppet] - https://gerrit.wikimedia.org/r/91155 (owner: Ryan Lane)
[20:50:43] Reedy: it ran but it doesn't seem to have changed anything. the executable in fenari is probably not managed by puppet
[20:51:19] I wonder if scriptpath is somewhere ekse..
[20:51:29] Nope, /usr/local/bin
[20:51:46] i don't think the relevant class is applied on fenari
[20:52:07] misc::deployment::common_scripts
[20:52:48] Hmmm
[20:52:53] Tin is using an old version too
[20:55:46] I manually updated the file on fenari to match puppet HEAD, you may want to file a bug about it though, it should really be managed by Puppet
[20:56:48] Thanks
[20:57:00] I'll stick it as an RT ticket as it's more ops-y
[21:08:07] !log pushed git deployment changes 91151 and 91155 to production
[21:08:21] Logged the message, Master
[21:31:22] (CR) Reedy: [C: 1] Switch www portals to using one docroot [operations/apache-config] - https://gerrit.wikimedia.org/r/91209 (owner: Reedy)
[21:37:17] !log reedy synchronized php-1.22wmf22/extensions/RelatedSites
[21:37:31] Logged the message, Master
[21:38:10] !log reedy synchronized php-1.22wmf22/extensions/Wikibase
[21:38:22] Logged the message, Master
[21:47:39] no text in ulsfo still, right?
[21:47:58] i checked dns repo but i just want to be sure...
[21:49:00] I donl't think so..
[21:49:25] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100%
[21:49:29] (CR) Ori.livneh: [C: 1] "Good to merge." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91199 (owner: CSteipp)
[21:49:41] bits, upload and lvs according to ganglia
[21:50:15] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 26.59 ms
[21:58:00] (CR) Ori.livneh: [C: 2] Fix error message for recycled passwords [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91199 (owner: CSteipp)
[21:58:06] (CR) Ori.livneh: [C: 2] Update beta to use loginwiki for SUL [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90670 (owner: CSteipp)
[21:58:28] (Merged) jenkins-bot: Fix error message for recycled passwords [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91199 (owner: CSteipp)
[21:58:32] (Merged) jenkins-bot: Update beta to use loginwiki for SUL [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/90670 (owner: CSteipp)
[22:03:05] PROBLEM - Disk space on xenon is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=97%):
[22:05:21] xenon still parsoid? RoanKattouw ^^
[22:05:46] Not that I know of?
[22:05:46] SAL suggests probably not..
[22:05:52] * RoanKattouw goes to check what's using all that space on xenon
[22:06:24] # cerium,praseodymium and xenon are cassandra test host
[22:06:24] node /^(cerium|praseodymium|xenon)\.eqiad\.wmnet$/ {
[22:06:35] gwicke: xenon is full
[22:06:55] Ah yes
[22:07:00] /home/gwicke is 45GB
[22:07:03] !log ori synchronized wmf-config/Bug54847.php 'Ide07296c5: Fix error message for recycled passwords (Bug 56002)'
[22:07:17] Logged the message, Master
[22:07:26] gwicke: xenon:/home/gwicke/dumps has filled up the root partition
[22:07:36] !log during previous sync-file: could not resolve mw125; time out connecting to srv291
[22:07:48] Logged the message, Master
[22:07:52] As in it's 95% of the disk space utilizatoin
[22:08:28] RoanKattouw: yeah
[22:08:56] !log ori synchronized wmf-config/CommonSettings.php 'I6f6e0ffa3: Update beta to use loginwiki for SUL (bug 55760)'
[22:09:06] csteipp: done
[22:09:10] managed to fill it up by incompressing a 1G dump file
[22:09:12] Logged the message, Master
[22:09:17] Thanks!
[22:09:22] lzma compresses those dumps rather well
[22:09:32] *decompressing
[22:10:00] RECOVERY - Disk space on xenon is OK: DISK OK
[22:11:17] Reedy / greg-g: any idea what the plan is with the merged/unsynced changes to FR extensions, cldr & parser functions on wmf22?
[22:11:55] ori-l: awight will be fixing those during LD
[22:12:01] awight: mwalker|away ^^
[22:13:09] ori-l: I wasn't aware of that, looking now
[22:14:00] I can either sync those for you if you like, or you can sync my change during your LD
[22:16:01] ^ awight
[22:17:44] ori-l: awight; We should only have unsynced changes to CentralNotice...
[22:19:31] mwalker: argh, yes. i was diffing against the wrong branch.
[22:19:36] sorry.
[22:23:12] !log ori synchronized php-1.22wmf21/resources/mediawiki/mediawiki.js 'I0e7a47b5a: mediawiki.inspect: add CSS report (1/4)'
[22:23:27] Logged the message, Master
[22:23:30] !log ori synchronized php-1.22wmf21/resources/mediawiki/mediawiki.inspect.js 'I0e7a47b5a: mediawiki.inspect: add CSS report (2/4)'
[22:23:42] Logged the message, Master
[22:23:48] !log ori synchronized php-1.22wmf22/resources/mediawiki/mediawiki.js 'I0e7a47b5a: mediawiki.inspect: add CSS report (3/4)'
[22:24:03] Logged the message, Master
[22:24:05] !log ori synchronized php-1.22wmf22/resources/mediawiki/mediawiki.inspect.js 'I0e7a47b5a: mediawiki.inspect: add CSS report (4/4)'
[22:24:17] Logged the message, Master
[22:24:39] greg-g: done; thanks.
[22:24:42] np
[22:42:35] wrote outage report as reply on existing wikitech thread
[23:05:44] !log mwalker synchronized php-1.22wmf22/extensions/CentralNotice 'Updating CentralNotice to masterish'
[23:05:55] !log powercycling ms-be1011, locked up, no messages in console
[23:05:57] Logged the message, Master
[23:06:01] weird, two in two days
[23:06:11] Logged the message, Master
[23:07:24] gwicke: ping
[23:07:35] paravoid: pong
[23:07:51] I saw you figured the http_proxy thing already
[23:08:11] yeah, cassandra and rashomon are ready for testing
[23:08:14] RECOVERY - Host ms-be1011 is UP: PING OK - Packet loss = 0%, RTA = 1.00 ms
[23:08:49] currently wrapping up a http test tool that will feed the web api with data from dumps
[23:08:59] that was fast :)
[23:09:14] yeah, it is not that hard
[23:09:37] the hardest bit is figuring out the proxy & old node version on precise stuff
[23:09:45] yeah I noticed that
[23:09:48] sounds suboptimal
[23:10:02] https://www.mediawiki.org/wiki/User:GWicke/Notes/Storage/Testing
[23:10:02] would parsoid work with node 0.10?
[23:10:07] yup, I read that
[23:10:09] paravoid: yes
[23:10:19] we'd like to upgrade to something recent ideally
[23:10:22] maybe it's worth it upgrading across the fleet
[23:10:34] the ppa I used for now does not have the latest security updates, but works with the old libc
[23:10:39] (PS1) Reedy: Alphasort echowikis.dblist [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91311
[23:10:44] limn is the other node app I'm aware of, but that isn't in production yet
[23:10:52] tried to pull from Debian unstable, but got the conflict on libc
[23:10:52] nah, I can just rebuild Debian's version
[23:10:59] yeah don't worry about that
[23:11:02] or that
[23:11:09] that would be ideal really
[23:11:34] that's not the hard part
[23:11:48] the "hard" part is that we currently don't partition our apt repository, for simplicity
[23:11:51] (PS1) Reedy: Update size related dblists [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/91312
[23:12:09] so putting a newer node into apt would mean e.g. parsoid would use it on the next box or when we do apt-get upgrade on one of the existing boxes
[23:12:42] if limn works with 0.10 too then that would be fine
[23:12:49] we can of course create a new section if needed but we try hard to avoid doing so, in the sense that if we're doing the work for a new version, we might just as well make it available for others to benefit from it
[23:13:25] but that also means forced migrations some time, not a huge deal really, just needs talking with people :P
[23:13:38] at the same time
[23:13:45] statsd is also node but is compatible with 0.10
[23:13:46] we currently don't use node publicly on anything important, but if we did we should get a recent version: http://blog.nodejs.org/2013/10/22/cve-2013-4450-http-server-pipeline-flood-dos/
[23:13:51] limn is only running in labs AFAIK
[23:13:54] oh, right, I knew I was forgetting something
[23:13:56] thanks ori-l
[23:14:45] yes, I mentioned that limn is labs only so far, but labs uses the same repo and while I could say "oh well! not production, I don't care", I'm sure people wouldn't appreciate reportcard being down :)
[23:15:19] anyway, that's really not an issue here, I'll take care of it
[23:16:03] k, great
[23:16:07] did you also install JNA?
[23:16:22] not explicitly
[23:16:23] it's not on your notes but datastax seems to recommend it
[23:16:53] I only did apt-get install cassandra openjdk-7-jdk for cassandra
[23:17:30] datastax recommends oracle java too, but I figured I'd try openjdk first
[23:17:37] reportcard is vulnerable, btw
[23:17:39] had no issues with it in local testing
[23:17:41] yup, i ignored that part :)
[23:18:02] docs seems to indicate that jna is as easy as apt-get install libjna-java
[23:18:34] ok
[23:18:47] "Installing JNA can improve Cassandra memory usage. When installed and configured, Linux does not swap out the JVM, and thus avoids related performance issues."
[23:18:50] whatever that means
[23:19:04] memory pinning maybe?
[23:19:27] mlock
[23:19:42] yeah, could be for mlockall
[23:19:57] https://journal.paul.querna.org/articles/2010/11/11/enabling-jna-in-cassandra/
[23:20:01] Since 0.6.2: JNA for mlockall
[23:20:04] Since 0.6.6: JNA for hard links, improving snapshots
[23:20:34] ah, k
[23:20:40] okay :)
[23:21:19] the datastax packages seem to be very similar as the ASF one
[23:21:38] all changelog entries are from @apache.org addresses
[23:21:41] yes, the delta should be pretty minimal if there is any
[23:24:47] paravoid: for now I have set up the commitlog on one of the ssds and the data on the rotating disk raid1
[23:25:24] both are symlinked into /var/lib/cassandra
[23:25:54] I diffed the packages, there's a few .class files that are different
[23:26:24] (PS1) Ryan Lane: Maintain repositories on deployment server [operations/puppet] - https://gerrit.wikimedia.org/r/91319
[23:26:26] ori-l: ^^ present for you
[23:26:39] the packages are identical except apache-cassandra-2.0.1.jar, apache-cassandra-thrift-2.0.1.jar & stress.jar which have different md5sums
[23:27:00] (Abandoned) Ryan Lane: Add git clones for repos on deployment systems [operations/puppet] - https://gerrit.wikimedia.org/r/86762 (owner: Ryan Lane)
[23:27:24] (CR) jenkins-bot: [V: -1] Maintain repositories on deployment server [operations/puppet] - https://gerrit.wikimedia.org/r/91319 (owner: Ryan Lane)
[23:27:32] maybe a different build off the same codebase
[23:27:53] (CR) Ryan Lane: "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/91319 (owner: Ryan Lane)
[23:28:20] !log catrope synchronized php-1.22wmf21/extensions/VisualEditor 'Update VisualEditor for cherry-picks'
[23:28:34] Logged the message, Master
[23:28:38] !log catrope synchronized php-1.22wmf22/extensions/VisualEditor 'Update VisualEditor for cherry-picks'
[23:28:40] paravoid: if you'd prefer to test the Apache version then we can simply re-install cassandra from there
[23:28:51] Logged the message, Master
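One way to reproduce a package comparison like the one above (identical except for three jars with different md5sums) is to unpack both `.deb` files and compare per-file MD5 digests. You could unpack them with `dpkg-deb -x <package>.deb <dir>`. The sketch below assumes the two trees are already unpacked. The helper names are hypothetical, and symlinks are skipped for simplicity.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 16):
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its MD5 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                digests[os.path.relpath(full, root)] = md5_of_file(full)
    return digests


def diff_trees(root_a, root_b):
    """Return (only_in_a, only_in_b, differing) as sorted relative paths."""
    a, b = tree_digests(root_a), tree_digests(root_b)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    differing = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return only_a, only_b, differing
```

A differing digest only says the bytes differ. For jars, a rebuild of the same source can change timestamps inside the archive, which fits the "different build off the same codebase" guess.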
[23:29:17] I have a slight inclination to use ASF just because, but I can't say I can sensibly argue about it :)
[23:30:27] but I was curious to see their changes anyhow
[23:30:38] I wonder if they release sources for it, I'll check
[23:31:04] paravoid: I'm happy to switch
[23:31:04] nope, they don't
[23:32:09] ooooo
[23:32:17] Ryan_Lane: nice!
[23:32:22] yeah :)
[23:32:27] wow, RT tickets with screenshots, rare enough
[23:32:28] way easier than doing it in puppet
[23:32:59] mutante: mine?
[23:33:07] (PS2) Ryan Lane: Maintain repositories on deployment server [operations/puppet] - https://gerrit.wikimedia.org/r/91319
[23:33:55] jeremyb: yea, the mailman one, thanks for handling that:)
[23:35:01] mutante: i triggered the ticket to begin with :)
[23:35:34] (PS1) Faidon Liambotis: apt: import Cassandra from ASF [operations/puppet] - https://gerrit.wikimedia.org/r/91322
[23:35:38] gwicke: ^
[23:35:47] the combo of grains and pillars is pretty awesome, btw. it's possible to set config either globally or locally and use it in modules where necessary
[23:35:48] all those wikimania- lists.. seems like every Wikimania a new one was created :p
[23:36:00] mutante, I was wondering why you created the new one :P
[23:36:10] I was actually going to suggest getting rid of some of them..
[23:36:38] which also means you can apply configuration from puppet where necessary, rather than forcing into weird places
[23:36:46] (CR) GWicke: [C: 1] apt: import Cassandra from ASF [operations/puppet] - https://gerrit.wikimedia.org/r/91322 (owner: Faidon Liambotis)
[23:37:16] Thehelpfulone: i didn't know.. -com vs. -core .. could have been different. and i think i didnt see all of them in listinfo , then just later when checking on sodium
[23:37:36] hm. I should probably combine all the deployment config grains into a single hash. I wonder how doable that is with the grain stuff I'm doing in puppet
[23:37:50] Thehelpfulone: but you'll notice several tickets ad "please ask THO:) or similar while
[23:37:55] yeah, only about two of them are on the main listing IIRC
[23:37:56] you were out
[23:37:59] Thehelpfulone: well it's more clear now that it's clear that it's really public. but if it was private then we could have just used -planning-l
[23:38:00] hehe
[23:38:00] (CR) Faidon Liambotis: [C: 2] apt: import Cassandra from ASF [operations/puppet] - https://gerrit.wikimedia.org/r/91322 (owner: Faidon Liambotis)
[23:39:12] heh, the key with which the ASF archive is signed belongs to someone with a @datastax.com address
[23:39:25] but yeah, I think availability of sources might just be the tipping point for us
[23:39:48] gwicke_away: cassandra packages are in apt now
[23:40:31] ; apt-get clean; apt-get remove cassandra; apt-get install cassandra should do it
[23:40:44] or even apt-get clean; apt-get install --reinstall cassandra might
[23:40:49] but I'm obviously not doing anything for now
[23:41:04] PROBLEM - Disk space on xenon is CRITICAL: DISK CRITICAL - free space: /mnt/data 13925 MB (3% inode=99%):
[23:41:22] paravoid: how about apt-get update?
[23:41:32] paravoid: go ahead, I'm still preparing the test client
[23:41:40] if you're changing sources.list?
[23:42:27] I guess the config files should survive without --purge
[23:42:33] correct
[23:45:48] my current test client is too slow to really push the server
[23:46:49] mutante: you still in the office?
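The xenon alert at 23:41:04 reports free space as an absolute size and as a percentage. The sketch below shows how such a status line can be derived from a mount point using Python's `shutil.disk_usage`. The 10% / 5% thresholds are hypothetical, not xenon's actual monitoring configuration. `shutil.disk_usage(...).free` counts space available to unprivileged users, so the figures will not match check_disk's output exactly.

```python
import shutil


def classify_free_space(free_bytes, total_bytes, warn_pct=10.0, crit_pct=5.0):
    """Classify free space as OK / WARNING / CRITICAL.

    The default thresholds are hypothetical examples.
    Returns (state, free_percent).
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct < crit_pct:
        state = "CRITICAL"
    elif free_pct < warn_pct:
        state = "WARNING"
    else:
        state = "OK"
    return state, free_pct


def check_path(path, warn_pct=10.0, crit_pct=5.0):
    """Build a one-line status for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    state, free_pct = classify_free_space(usage.free, usage.total, warn_pct, crit_pct)
    free_mb = usage.free // (1024 * 1024)
    return f"DISK {state} - free space: {path} {free_mb} MB ({free_pct:.0f}%)"
```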
[23:47:03] or, back in the office, as it were
[23:47:06] greg-g: no, wfl
[23:47:11] eh, work yes, office no
[23:47:11] * greg-g nods
[23:47:34] PROBLEM - DPKG on praseodymium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:47:40] mutante: just got a report of redirects still from ldavis
[23:47:46] so, was going to have her walk down, but, alas :)
[23:47:54] PROBLEM - DPKG on cerium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:48:12] (CR) Ori.livneh: "This looks good, but I prefer to wait until we move Graphite from tampa. I've been working on a patch and it's maybe halfway there. Could " [operations/puppet] - https://gerrit.wikimedia.org/r/88471 (owner: Hashar)
[23:48:54] RECOVERY - DPKG on cerium is OK: All packages OK
[23:49:30] greg-g: hrmm, i see the issue with outreach.. but..
[23:49:34] RECOVERY - DPKG on praseodymium is OK: All packages OK
[23:49:37] looks
[23:50:19] (CR) Ori.livneh: "Can you split ganglia::pyplugin into a separate patch?" [operations/puppet] - https://gerrit.wikimedia.org/r/85669 (owner: Hashar)
[23:53:54] PROBLEM - DPKG on xenon is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:54:54] RECOVERY - DPKG on xenon is OK: All packages OK
[23:56:31] (PS1) Faidon Liambotis: cassandra: add sysctl.d tunable [operations/puppet] - https://gerrit.wikimedia.org/r/91326
[23:56:35] ori-l: fun
[23:56:54] it's even worse than that, postinst is broken and expects /etc/sysctl.d/cassandra.conf to exist or it fails to install/upgrade
[23:57:42] heh
[23:58:08] paravoid: this is with the apache or the datastax packages?
[23:58:11] both
[23:58:19] k
[23:58:25] they are not very polished..
[23:58:41] they're not so bad either
[23:58:55] restarting the service does not work for me in 2.0.1- pretty sure it used to work in 2.0.0
[23:58:56] for vendor packages I'm very happy :)
[23:59:12] most such packages install into /opt or whatever
[23:59:21] sigterm seems to do the right thing though
[23:59:23] (CR) Ori.livneh: [C: 2] graphite: tweak statsd aggregation [operations/puppet] - https://gerrit.wikimedia.org/r/88471 (owner: Hashar)
[23:59:41] (PS3) Ryan Lane: Maintain repositories on deployment server [operations/puppet] - https://gerrit.wikimedia.org/r/91319
[23:59:47] (CR) Ori.livneh: [C: 2] graphite: make sure we aggregate min/max/count properly [operations/puppet] - https://gerrit.wikimedia.org/r/88470 (owner: Hashar)