[00:09:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:19:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.960 seconds
[00:53:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:04:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.239 seconds
[01:38:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:41:55] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 245 seconds
[01:42:57] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 262 seconds
[01:49:24] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 651s
[01:49:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.034 seconds
[01:52:27] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 6 seconds
[01:55:45] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 22s
[01:57:24] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 4 seconds
[01:59:57] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[02:02:48] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[02:21:06] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:30:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.033 seconds
[05:31:03] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[06:05:26] New review: Jeremyb; "change looks good, no comment on policy" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/16237
[07:08:55] New patchset: Jeremyb; "InitialiseSettings.php: reformat some sections" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16273
[07:28:09] New review: Nikerabbit; "Scheduled for I18n deployment tomorrow" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/16252
[07:40:59] hello
[07:42:58] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[07:44:28] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[07:47:37] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[08:01:07] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.022 second response time
[09:01:34] New patchset: Hashar; "beta: send udp2log messages to -dbdump" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16276
[09:01:48] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16276
[09:02:06] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[09:27:26] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[09:38:49] New patchset: Hashar; "role::logging::labs for udp2log in labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16278
[09:39:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16278
[09:49:47] hello
[09:53:54] good morning :-)
[09:58:06] New patchset: Hashar; "abstract out udp2log for MediaWiki logging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16278
[09:58:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16278
[09:59:03] New review: Hashar; "Patchset2 : reuses existing code from nfs nodes and make it a new role: role::logging::mediawiki. Be..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/16278
[10:00:16] paravoid: if you want a morning review, got you an easy one with https://gerrit.wikimedia.org/r/#/c/16278/ ;-D
[10:00:24] it is to setup udp2log on beta
[10:09:29] hashar: nak
[10:09:56] as far as I know about our design, role classes are meant to tie other classes together
[10:10:22] ha hm
[10:10:23] wait
[10:10:32] ;-)
[10:10:56] I thought about factoring out the code to misc::udp2log::instance::mediawiki
[10:11:05] and then have the role class to just require that new one
[10:11:22] but that added an extra level which I thought was not really going to help anyone
[10:43:49] paravoid: and, include more wikimedia specifics
[10:44:00] ?
[10:44:31] mark: I don't understand
[10:47:32] sometimes we can pass specifics via variables so not all the details have to live in manifests and templates
[10:47:50] so the actual manifests (to become modules now) can be a little bit more generic than they'd otherwise be
[10:48:05] what are you commenting on?
[10:48:15] ah, the role classes comment
[10:48:18] yep
[10:48:22] okay, that makes sense now
[10:48:28] :)
[10:48:33] I context switched three times since I made that comment
[10:48:52] did you see hashar's commit?
[10:48:56] not yet
[10:49:25] please do, I'm not sure if it fits into your definition of role classes
[10:50:52] no not really
[10:51:04] in general, only one role class should arrange everything for a box
[10:51:52] New review: Mark Bergsma; "This is not really a role class entirely. In general, a role class should (under normal circumstance..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/16278
[10:54:00] back from lucnh
[10:54:07] well technically not a lunch but he .. :-D
[10:54:17] paravoid: have you looked at my change for labs logging ?
[10:55:44] have YOU read the backlog? ;)
[10:58:07] mark: doh cleaned it out :D
[10:58:14] it is kind of a reflex to ^L it
[10:59:32] mark: so should I rename / move that class from role::** to misc::logging::mediawiki ?
[10:59:52] or misc::udp2log::instance::mediawiki
[11:00:09] which would just be a wrapper around the parameterized class
[11:02:18] we only need this for labs, right... since otherwise you could just put it in site.pp directly
[11:04:21] we need to move this crap off of nfs anyway
[11:08:11] mark: it is just for beta indeed
[11:10:38] that's really slightly different from a role class
[11:10:47] but I'm not sure we should add another layer of abstraction for htis
[11:10:47] this
[11:13:29] mark: so what do I do now ? ;-)
[11:13:50] I don't care changing the class around to fit whatever coding style or class organization ops prefer
[11:13:57] I just need some a clear direction
[11:15:40] can you, fix the indenting to use tabs
[11:15:45] and add a system_role definition
[11:15:52] then I guess it's fine for now
[11:15:59] argh
[11:16:01] copy pasted :-D
[11:17:00] New patchset: Hashar; "abstract out udp2log for MediaWiki logging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16278
[11:17:27] mark: done (patchset 3 )
[11:17:32] read again
[11:17:35] New review: Hashar; "Patchset3: space to tabs in manifests/role/logging.pp" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/16278
[11:17:35] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16278
[11:17:50] need a system_role
[11:18:44] <^demon> hashar: https://github.com/klaussilveira/gitlist/blob/master/lib/GitList/Application.php -- how is line 30-32 even possible?
[11:19:24] <^demon> Shouldn't PHP yell at you for using an object as an array?
[11:19:37] ^demon: not if the object is an array?
[11:19:46] <^demon> How can an object be an array?
[11:19:47] ^demon: you could make an object implement ArrayObject or something
[11:20:33] <^demon> Would that work still?
[11:20:53] well since it is an array
[11:20:53] oh dear, tabs again :)
[11:20:55] it should :-]
[11:21:45] well regardless of the decision we'll make, we should not mix tabs and spaces in existing files eh
[11:21:53] ^demon: that should be the ArrayAccess interface: http://www.php.net/manual/en/class.arrayaccess.php
[11:22:14] <^demon> hashar: Ahhh, eventually up the symfony stack it implements ArrayAccess.
[11:22:25] <^demon> Still, that's kind of silly. Nobody does that.
[11:22:40] ^demon: expect all the frameworks that use symphony / silex and all ? ;-]
[11:22:46] ^demon: our code base is like 5 years old hehe
[11:22:57] <^demon> Doesn't mean that it's right ;-)
[11:23:06] mark: as I linked the other day, http://www.emacswiki.org/pics/static/TabsSpacesBoth.png
[11:23:23] (yeah, fully agreed)
[11:24:08] ^demon: by reimplementing an Array, you could validate the key given to the array. So $object['invalid_key'] = 'value' , could be made to throw an exception about how 'invalid_key' is … invalid! ;-D
[11:24:38] <^demon> hashar: In any case, that 'gitlist' software isn't pretty. Everything is done via exec() to cli git, and then lazy-cached in a ./cache directory if its expensive.
[11:24:40] <^demon> *shudder*
[11:25:59] hashar: re: jobrunner
[11:26:06] are we waiting for the TMH changes as well?
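[Editor's aside: the array-syntax-on-objects behaviour discussed above is PHP's ArrayAccess interface, and hashar's key-validation idea can be sketched roughly as below. The class and key names are illustrative, not taken from gitlist or symfony.]

```php
<?php
// Minimal sketch of the ArrayAccess mechanism under discussion:
// implementing the interface makes $object['key'] syntax work, and
// offsetSet() can validate keys, as hashar suggests.
class StrictContainer implements ArrayAccess {
    private $values = array();
    private $allowed;

    public function __construct(array $allowedKeys) {
        $this->allowed = $allowedKeys;
    }

    public function offsetExists($key) {
        return isset($this->values[$key]);
    }

    public function offsetGet($key) {
        return isset($this->values[$key]) ? $this->values[$key] : null;
    }

    public function offsetSet($key, $value) {
        if (!in_array($key, $this->allowed, true)) {
            throw new InvalidArgumentException("'$key' is ... invalid!");
        }
        $this->values[$key] = $value;
    }

    public function offsetUnset($key) {
        unset($this->values[$key]);
    }
}

$c = new StrictContainer(array('git.client'));
$c['git.client'] = '/usr/bin/git';   // accepted
// $c['invalid_key'] = 'value';      // would throw InvalidArgumentException
```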
[11:26:10] I've kinda lost track
[11:26:16] (again, heh)
[11:29:42] there is the https://gerrit.wikimedia.org/r/#/c/11610/
[11:29:59] the thing I hate is that operations/debs/wikimedia-job-runner is a Debian package
[11:30:11] which rely on a shell script in a MediaWiki extension ( extensions/WikimediaMaintenance )
[11:30:29] maybe we could ditch the package out and replace it by a puppet class :-D
[11:31:01] anyway, in change 11610, we needed a new timeout parameter
[11:31:55] ewww
[11:31:58] which is closely related to https://gerrit.wikimedia.org/r/#/c/15954/ that adds -t maxtime
[11:32:00] or
[11:32:13] if you are in the mood for it, we can deprecate / kill the debian package
[11:32:17] and move everything in puppet :-]
[11:32:21] I am in the mood for it
[11:32:23] (or move everything in the deb package up to ops)
[11:32:25] hehe
[11:32:35] but I don't see how moving it to puppet will be any better
[11:32:54] one thing we have to be VERY carefully, is that any change to job related script has the potential to kill the production job system :/
[11:33:01] (which we need to rewrite entirely, really)
[11:33:53] people keep telling me that I have to break the site to really be part of this team
[11:34:02] which I haven't done yet
[11:34:08] seriously?
[11:34:13] ;-D
[11:34:33] :P
[11:34:41] you must be very cautious (which is a great competency/skill/ability/something)
[11:35:14] killing the jobrunners doesn't cut it
[11:35:20] that's too boring
[11:35:28] damnit
[11:36:01] so,
[11:36:15] why do we even have the job runner deb?
[11:36:29] isn't that a mediawiki thing?
[11:36:33] I guess that is how we managed dependency / deploying init script and such
[11:36:43] because once upon a time, we didn't have puppet
[11:36:50] and that of course :-D
[11:36:57] hehe
[11:37:04] i'd be fine with that moving into puppet
[11:37:10] so we had to poke mark/tim to get the change to sneak in a .deb
[11:37:21] as long as puppet doesn't need to deploy heaps of files which belong in a db
[11:37:22] deb
[11:37:26] but I don't think that's the case here
[11:37:31] right, fully agreed
[11:37:45] there is a shell script / an init script / a default file in etc. That is about it
[11:38:01] yeah, sounds like something more easily handled in puppet
[11:38:27] could convert it to upstart too
[11:38:32] if that's cleaner
[11:38:39] paravoid: that would need a wait to setup specific init script
[11:38:42] role based
[11:38:53] eh?
[11:38:58] something like having a default /etc/init.d/run-job-${some name}
[11:39:11] with TMH, we will start transcoding video
[11:39:20] so we will have boxes dedicated to only videotranscoding
[11:39:28] says who
[11:39:34] the shell script should be given the type of job to run like webTranscoding
[11:39:44] * hashar finds in puppet an example
[11:39:56] init scripts don't take arguments
[11:40:27] some of them do, but that's not during boot and that's always counterintuitive
[11:40:38] something like the varnish stuff: service { "varnishncsa-${name}":
[11:40:38] require => File["/etc/init.d/varnishncsa-${name}"],
[11:40:47] yes
[11:40:49] I hate that :P
[11:41:03] * hashar git blame the varnish.pp to find out who introduced that :-]]]]]]]]]]]]
[11:41:09] then again, i'm not sure if upstart's INSTANCES are any better
[11:41:22] peter introduced that
[11:41:56] i played around with upstart's INSTANCE env var support, but couldn't get that to work reliably with puppet
[11:42:09] although I believe puppet has some support for upstart jobs now, haven't looked at it yet
[11:42:28] * paravoid knows very little about upstart jobs
[11:42:46] (not on purpose :)
[11:43:31] the problem is, you need to pass an environment variable (or argument) to the init script
[11:43:38] which indeed doesn't work on boot
[11:43:42] and also doesn't work that well in puppet
[11:43:44] or at least, didn't
[12:00:33] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[12:03:23] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[12:19:52] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16273
[12:22:01] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16268
[12:22:11] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16267
[12:22:41] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16237
[12:24:00] Reedy: I did sneak a change for beta this morning
[12:24:27] Reedy: 66ca8b0 - beta: send udp2log messages to -dbdump (3 hours ago)
[12:24:35] haven't synced it on production though :/
[12:24:38] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16264
[12:26:02] ^demon: do you happen to know which DB server Gerrit uses now?
[12:26:15] <^demon> db1048.
[12:26:48] bah it is not in ishmael apparently :/
[12:28:43] <^demon> hashar: https://gerrit.wikimedia.org/r/#/c/16150/
[12:30:29] ^demon: great :-]
[12:30:57] ^demon: latency was definitely improved. that might a point to strike in the github vs gerrit wiki page :-]
[12:34:00] mark: what do we do about my role::logging::mediawiki class https://gerrit.wikimedia.org/r/#/c/16278/ ?
[12:34:29] mark: should I get rid of it in favor on misc::udp2log::instance::mediawiki or is that good to go ? :-]
[12:35:38] <^demon> hashar: Someone already crossed it off the "todo" list on the eval page.
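[Editor's aside: the per-instance init script pattern mark quotes above (the varnishncsa fragments) is usually wrapped in a puppet define, since a stock init script cannot take an instance argument at boot. A rough sketch under that assumption; everything except the `varnishncsa-${name}` naming is illustrative.]

```puppet
# Sketch of the per-instance service pattern discussed above: a define
# stamps out one generated init script and one service per instance,
# so each instance can be started at boot without arguments.
define varnish::ncsa_instance() {
    file { "/etc/init.d/varnishncsa-${name}":
        owner   => 'root',
        group   => 'root',
        mode    => '0755',
        # hypothetical template that bakes ${name} into the script
        content => template('varnish/varnishncsa.init.erb'),
    }

    service { "varnishncsa-${name}":
        ensure  => running,
        require => File["/etc/init.d/varnishncsa-${name}"],
    }
}
```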
[12:36:20] I did it :p
[12:36:50] \O/
[12:38:02] !log rebalanced swift rings moving more content to new object servers
[12:38:10] Logged the message, Master
[12:43:21] i thought we already covered that hashar
[12:43:59] mark: mind copy/pasting / repeating ? :r(
[12:44:28] 13:15:40 <@mark> can you, fix the indenting to use tabs
[12:44:28] 13:15:44 <@mark> and add a system_role definition
[12:44:29] 13:15:52 <@mark> then I guess it's fine for now
[12:45:14] ahh my brain parser skipped the system_role line :-D
[12:45:51] twice :P
[12:45:52] mark: I did some nice alignment for the values passed to parameters
[12:45:53] maplebed: oh hi
[12:45:55] wanna keep them ?
[12:46:01] or should I get rid of them too ?
[12:46:02] you seem like you need vacation hehe
[12:46:04] paravoid: morning
[12:46:13] definitely :-( haven't took any vacations for like 2 years
[12:46:30] maplebed: can I shut down owa1/owa2?
[12:46:30] You can parse brains?
[12:46:31] and my little daughter keep crying every evening so yeah, will definitely just sleep for 3 weeks huuh
[12:46:34] !log rebooting ms-be10 for xfs errors and a clean boot
[12:46:42] Logged the message, Master
[12:46:56] paravoid: I'd rather not; how come?
[12:47:05] oh, and shut down or reboot?
[12:47:43] PROBLEM - swift-account-server on ms-be10 is CRITICAL: Connection refused by host
[12:47:43] PROBLEM - swift-object-updater on ms-be10 is CRITICAL: Connection refused by host
[12:47:43] PROBLEM - swift-container-updater on ms-be10 is CRITICAL: Connection refused by host
[12:47:52] PROBLEM - swift-account-auditor on ms-be10 is CRITICAL: Connection refused by host
[12:47:52] PROBLEM - swift-object-auditor on ms-be10 is CRITICAL: Connection refused by host
[12:47:52] PROBLEM - swift-container-auditor on ms-be10 is CRITICAL: Connection refused by host
[12:48:01] PROBLEM - SSH on ms-be10 is CRITICAL: Connection refused
[12:48:10] PROBLEM - Swift HTTP on ms-fe4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:48:10] PROBLEM - Swift HTTP on ms-fe3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:48:10] PROBLEM - LVS HTTP IPv4 on ms-fe.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:48:11] PROBLEM - swift-object-replicator on ms-be10 is CRITICAL: Connection refused by host
[12:48:11] PROBLEM - swift-account-reaper on ms-be10 is CRITICAL: Connection refused by host
[12:48:11] PROBLEM - swift-container-replicator on ms-be10 is CRITICAL: Connection refused by host
[12:48:11] PROBLEM - Swift HTTP on ms-fe2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:48:17] maplebed: shutdown, mark was telling me that owa is something obsolete? can we recycle the hardware?
[12:48:29] but CT was telling me that you might still be using it for swift tests
[12:48:38] PROBLEM - swift-container-server on ms-be10 is CRITICAL: Connection refused by host
[12:48:46] PROBLEM - swift-object-server on ms-be10 is CRITICAL: Connection refused by host
[12:48:46] PROBLEM - swift-account-replicator on ms-be10 is CRITICAL: Connection refused by host
[12:48:52] the hardware is effectively being recycled atm (as you say, for swift tests)... I just didn't change the names (since we don't do that).
[12:49:16] oh I thought that had moved to labs already
[12:49:23] mark: not performance testing.
[12:49:26] can't do it there.
[12:49:31] RECOVERY - Swift HTTP on ms-fe2 is OK: HTTP OK HTTP/1.1 200 OK - 366 bytes in 0.011 seconds
[12:49:33] functional testing is in labs.
[12:50:00] ignornig these pages right?
[12:50:17] New patchset: Hashar; "abstract out udp2log for MediaWiki logging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16278
[12:50:18] i assume ben is looking at it
[12:50:20] As soon as I get the eqiad cluster up and running perf testing will move to that cluster and we can actually recycle the machines.
[12:50:43] RECOVERY - LVS HTTP IPv4 on ms-fe.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 366 bytes in 0.016 seconds
[12:50:43] RECOVERY - Swift HTTP on ms-fe4 is OK: HTTP OK HTTP/1.1 200 OK - 366 bytes in 0.018 seconds
[12:50:43] RECOVERY - Swift HTTP on ms-fe3 is OK: HTTP OK HTTP/1.1 200 OK - 366 bytes in 0.010 seconds
[12:50:56] maplebed: so, remove them from decom hosts then
[12:50:58] New review: Hashar; "Patchset 4:" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/16278
[12:50:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16278
[12:51:02] (I will I mean)
[12:51:07] is this just owa1/owa2?
[12:51:13] mark: got the system_role and removed the spaces : https://gerrit.wikimedia.org/r/#/c/16278/
[12:51:15] or owa3 too?
[12:51:18] paravoid: 3 as well.
[12:51:21] okay
[12:51:35] they hadn't run puppet for a while though
[12:51:46] apergos: ms-be10 paged? or just hit IRC?
[12:51:55] ms-fe LVS paged
[12:51:58] that's system_role is wrong, hashar
[12:52:03] that's not the name of the class is it
[12:52:05] paravoid: that's true; puppet's disabled on them (so as to not wipe out the perf testing changes)
[12:52:19] erm, that's bad
[12:52:36] yeah that's not a good idea
[12:52:36] ms-fe yeah
[12:52:39] esp. considering we do access prov/revocation through puppet
[12:52:43] among other reasons
[12:53:39] can we puppetize or make puppet ignore these perf testing changes?
[12:53:57] mark: I have no idea what the system_role is for I just copied it from above aka role::logging
[12:54:18] hashar: it's the line in /etc/motd, and it the name should match the role class name
[12:55:34] New patchset: Faidon; "Remove owa1/2/3 from decom, still in use" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16294
[12:56:09] New patchset: Faidon; "Remove OWA manifests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16295
[12:56:34] mark: I fixed both system roles with https://gerrit.wikimedia.org/r/16296 ;-D
[12:56:34] PROBLEM - Host ms-be10 is DOWN: PING CRITICAL - Packet loss = 100%
[12:56:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16294
[12:56:44] New patchset: Hashar; "fix system_log entry in role::logging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16296
[12:56:47] maplebed: ^^ deletes owa.pp, ack?
[12:56:52] 16295 that is
[12:56:54] mark: too
[12:57:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16295
[12:57:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16296
[12:57:28] paravoid: fine by me, but I"m no authority. I just use the hardware... :P
[12:57:53] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16294
[12:58:05] maplebed: :-)
[12:58:36] maplebed: so, < paravoid> can we puppetize or make puppet ignore these perf testing changes?
[12:58:42] paravoid: yeah, fine
[12:59:09] New review: Faidon; "approved my mark on IRC" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/16295
[12:59:10] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16295
[12:59:17] by even :)
[12:59:23] paravoid: to me, it makes sense to puppetize the results of perf testing changes, but not really to do each one while testing it.
[12:59:26] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16278
[12:59:41] i.e. when we figure out which changes make performance better, we puppetize and deploy.
[12:59:52] but when poking around at all the various knobs, it's not really useful.
[12:59:55] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16296
[12:59:58] sure, but can we then make puppet just ignore these local changes?
[13:00:09] maplebed: that's all nice and fun, but you should never disable puppet on a system
[13:00:17] having a disabled puppet on a system is a bad idea imho
[13:00:18] since we rely on it for maintaining our systems
[13:00:25] you can disable it briefly
[13:00:30] but shouldn't do that for more than a day or two
[13:00:53] if you want puppet not to touch something you're working on, you need to configure it not to do that, not disable it
[13:01:11] mark: (I merged your merges on sockpuppet)
[13:01:17] (thanks)
[13:01:58] RECOVERY - Host ms-be10 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[13:03:43] I'm trying to think how I would ask puppet to ignore the swift config files while still using the role class setup we've got.
[13:04:00] don't use the role class setup?
[13:04:05] I suppose I could just not include the swift classes on those hosts (since they're there, puppet won't remove them)
[13:04:08] or use it temporarily to set stuff up, and then remove it
[13:04:17] yeah
[13:04:29] that feels way weird.
[13:05:03] but yeah, it would work.
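[Editor's aside: the system_role that mark asks hashar to add above is a local define in the WMF puppet tree that records what a box does (it ends up in /etc/motd), with a title matching the role class name. A hedged sketch of the convention; the description string is illustrative.]

```puppet
# Sketch of the system_role convention discussed above: the define's
# title matches the enclosing role class, and its description is what
# shows up in /etc/motd on hosts that include the role.
class role::logging::mediawiki {
    system_role { 'role::logging::mediawiki':
        description => 'MediaWiki udp2log logging host',
    }
    # ... rest of the role class ...
}
```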
[13:05:08] feels fine to me
[13:05:20] it feels weird because then puppet is lying about what the host is doing.
[13:05:22] feels way better than disabling config management entirely and letting boxes sit unmanaged anyway
[13:05:47] if you care about it, you can add parameters for a debug/testing mode where it handles the box slightly differently
[13:05:59] but it clutters up the config and takes more time
[13:08:44] apergos: btw, I cleaned up snapshot1/2.wikimedia.org
[13:08:52] thank you
[13:09:20] New patchset: Bhartshorne; "disabling puppet swift configs on test cluster for local perf testing changes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16298
[13:09:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16298
[13:10:03] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16298
[13:10:04] maplebed, mark: btw, what's this "do not rename policy"?
[13:10:25] are these machines going to be named owa forever?
[13:12:06] these probably won't
[13:12:16] because we have no owa project, and they're now misc machines
[13:12:38] we try to avoid renaming machines as it's a pita and for misc machines, we have generic names
[13:12:52] aha
[13:12:54] okay :)
[13:12:55] thanks.
[13:13:20] so I try to only use cluster names where we're sure they won't rename (squids, mediawiki, etc)
[13:13:25] and of course this example is right at the line
[13:13:48] hehe
[13:15:28] I got a funny puppet error : Could not find resource 'Class[Misc::Udp2log]' for relationship on 'Misc::Udp2log::Instance[mw]'
[13:15:45] hahahaaha
[13:15:47] manifests/misc/udp2log.pp
[13:15:56] Reedy: at least one laughing :-] thanks!
[13:16:29] the parameterized class misc::udp2log::instance has something that look like a dependency: Class["misc::udp2log"] -> Misc::Udp2log::Instance[$title]
[13:16:38] I am wondering if there is a case mismatch
[13:17:11] that's horrible :)
[13:17:16] no case mismatch I can see though
[13:17:23] ahh
[13:17:35] maybe I needed to include misc::udp2log BEFORE calling the parameterized class
[13:17:51] what do you mean by before?
[13:18:13] I have setup a new role class class role::logging::mediawiki {
[13:18:18] which just call misc::udp2log::instance { "mw":
[13:18:24] maybe it need an include misc::udp2log
[13:18:25] right
[13:18:28] yes
[13:18:30] you need to do that
[13:18:40] the Class[...] -> ... is a depedency, not an include
[13:18:50] can't puppet nicely autoload its classes?
[13:18:52] and you depend on something that was never included, hence the error
[13:19:13] it can, but how would it know that you needed that now? :)
[13:20:48] cant we include the class directly inside the parameterized class ?
[13:20:52] PROBLEM - Host ms-be10 is DOWN: PING CRITICAL - Packet loss = 100%
[13:21:01] you can if you want
[13:21:20] thus the Class["misc::udp2log"] -> Misc::Udp2log::Instance[$title] will self satisfy ;)
[13:26:16] RECOVERY - Host ms-be10 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[13:26:39] New patchset: Hashar; "role::logging::mediawiki needs misc::udp2log and utilities" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16300
[13:26:58] paravoid: ended up including the needed class before calling the parameterized instance ^^^ ( 16300 )
[13:27:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16300
[13:29:52] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16300
[13:31:30] paravoid: also puppet complain about : require => apache_site['controller', '000_default'] with message Resource references should now be capitalized
[13:31:37] paravoid: so we should do Apache_site
[13:31:57] paravoid: but that might just be a false positive from puppet since apache_site is one of our parameterized class
[13:32:51] nope, that's right
[13:33:01] lemme fix that
[13:33:16] paravoid: I got a change to fix some other deprecations
[13:34:20] ♥
[13:34:33] maplebed: if only we had that love machine
[13:34:50] so true.
[13:35:20] New patchset: Hashar; "Reource references should now be capitalized" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16302
[13:35:41] paravoid: here are some more deprecations https://gerrit.wikimedia.org/r/#/c/16302/
[13:35:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16302
[13:36:07] paravoid: I did not fix the calls to our classes such as require => apache_site[foobar], or require => git::clone[foobar]
[13:36:49] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16302
[13:37:25] grnbmbm
[13:37:28] my puppet syntax file sucks
[13:37:34] New patchset: Faidon; "Fix a non-capitalized resource reference" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16304
[13:37:39] "$swiftcleaner_basedir/swiftcleanermanager -c $swiftcleaner_basedir/swiftcleaner-$name.conf -A /tmp/swiftcleaner-${name}-\$(date +\%Y\%m\%dT\%H\%M\%S) -p /tmp/swiftcleaner-$name.pid >> /tmp/swiftcleaner-${name}-\$(date +\%Y\%m\%dT\%H\%M\%S).log"
[13:37:43] that one is fully in purple
[13:37:54] apparently puppet complain about \% not being a recognized escape sequence
[13:38:12] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16304
[13:38:21] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16304
[13:40:27] paravoid: mark: I got mediawiki logs again on beta!!!! thanks! ;-]
[13:40:48] yay
[13:44:52] !log authdns-update for ms-be eqiad hosts
[13:44:59] Logged the message, RobH
[14:09:37] PROBLEM - Host ms-be10 is DOWN: PING CRITICAL - Packet loss = 100%
[14:30:27] RECOVERY - Host ms-be10 is UP: PING WARNING - Packet loss = 80%, RTA = 0.25 ms
[14:41:42] our udp2log stuff is a real mess :-D
[14:42:32] the generated init script adds parameters which are not recognized by the udplog daemon :/
[14:44:19] generated by whom?
[14:45:16] it is an erb template, eh?
[14:45:18] haha, hilarious
[14:45:22] i helped refactor the puppet stuff
[14:45:34] so we have several init scripts
[14:45:35] but I tried not to change the end result too much
[14:45:37] yeah
[14:45:41] its really annoying
[14:45:43] one which is a file named udp2log-aft
[14:45:47] the one provided by the debian package
[14:45:47] yeah
[14:45:51] and one which is .erb based
[14:45:55] (which afaik is not used anymore)
[14:45:58] that .erb is used by non -aft puppet classes
[14:46:09] but the .erb includes parameters such as -b and --test
[14:46:14] which are not in udp2log source : /
[14:46:17] the debian package one?
[14:46:22] oh
[14:46:38] yeah, so, afaik, all udp2log instances are puppetized using udp2log::instance
[14:46:39] in puppet
[14:46:43] and we have two source trees for udp2log (one in svn under /trunk/udplog and the other in gerrit analytics/udplog
[14:46:43] hehe
[14:46:52] yeah I have used that puppet class
[14:46:55] which rely on the .erb
[14:46:55] yes, and those afaik are the same
[14:47:04] i created the analytics/udplog one
[14:47:12] the svn / git source tree are similar. I guess the svn has been migrated to git
[14:47:13] but it has not been changed from svn head
[14:47:16] yeah
[14:47:17] and hopefully made readonly
[14:48:31] ok
[14:48:39] will first open a bug about phasing out the svn path
[14:51:14] ok
[14:51:18] thanks!
[14:51:55] https://bugzilla.wikimedia.org/show_bug.cgi?id=38602
[14:51:58] cced you and Tim
[14:52:05] I guess ^demon will take care of it :-]
[14:52:31] ottomata: oh and I have added you to the linkedin (great way to remember names behind nicknames :pD )
[14:52:42] aye! cool
[14:52:49] ottomata: so can I phase out the -b --test parameters in the erb template.
[14:53:18] i think so
[14:53:28] or maybe even check to see if anything is using the default template
[14:53:32] udp2log
[14:53:37] hmmmm no wait it is
[14:53:38] ahhh i dunno
[14:53:39] yes.
[14:53:40] i think so
[14:54:05] maybe
[14:54:08] <^demon> hashar: Why did you waste time opening a bug?
[14:54:13] lolol
[14:54:14] <^demon> Could've just pinged me to begin with.
[14:54:32] ^demon: wasn't sure you were online sorry ;-]
[14:54:39] I like bug because that is like "fire and forget"
[14:54:46] <^demon> I've been around all morning.
[14:54:51] sure sure sorry ;-(
[14:54:53] I am tired
[14:54:54] Not my problem. (tm)
[14:54:56] must be that
[14:54:58] Reedy: ;)))
[14:55:43] ottomata: is the udp2log-aft still around? What was it for?
[14:56:00] <^demon|away> ^ See what I did there? I let you know I'm not around ;-)
[14:56:14] -aft being article feedback tool atta guess ;)
[14:56:23] ^demon|away: ohhh I am filtering nicknames changes and hiding join/part ;-D
[14:56:33] ^demon|away: though I could look up your name in the list or using tab completion
[14:56:37] ^demon|away: sorry ;-(
[14:57:54] I for one welcome bugs even when I'm around
[14:58:12] helps me to not deal with it immediately if I want to and helps me to not forget
[15:05:27] srv281 is fulled ups
[15:05:56] mark: heya
[15:05:56] hey RobH, any news from Dell on stat1001?
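[Editor's aside: the two puppet errors hashar works through above come down to two rules: a chained arrow like Class["misc::udp2log"] -> Misc::Udp2log::Instance[$title] only orders resources already in the catalog (it is a dependency, not an include), and resource references in require must be capitalized. A sketch of the fixed shape; the parameter name is illustrative, not from the real misc::udp2log::instance.]

```puppet
# The chained arrow in misc::udp2log::instance only *orders* the class
# relative to the instance; the class must still be put in the catalog
# explicitly, or puppet fails with
# "Could not find resource 'Class[Misc::Udp2log]'".
class role::logging::mediawiki {
    # pull the base class into the catalog first
    include misc::udp2log

    misc::udp2log::instance { 'mw':
        log_directory => '/var/log/udp2log',   # illustrative parameter
    }
}

# Relatedly, resource references must be capitalized:
#   require => apache_site['controller']   # deprecated, warns
#   require => Apache_site['controller']   # correct
```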
[15:05:57] New patchset: Faidon; "Create a varnish module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16411 [15:06:03] mark: ^^^ [15:06:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16411 [15:06:46] mark: completely untested, but I thought of converting a larger module in case we need to make some decisions [15:06:47] drdee: i dunno why i have not gotten the new part yet, i will call them later today [15:06:57] ty! [15:07:16] srv281 The last Puppet run was at Mon Sep 26 00:48:26 UTC 2011 (434298 minutes ago) [15:07:23] /dev/sda1 7.9G 7.5G 2.2M 100% / [15:07:27] 106 updates are security updates. [15:07:39] yay [15:07:53] I work that out to be 300 days... [15:08:03] and I can't login [15:08:21] with root key? [15:08:22] probably because my key was added less than 434298 minutes ago :) [15:08:26] heh [15:09:39] hashar: re udp2log-aft [15:09:40] yes [15:09:43] i think that is still around [15:09:47] that is article feedback tool [15:10:12] i can't remember why it is a separate instance, i think that the AFT is sending logs to in manually or something, [15:10:27] so it is not consuming webserver access logs, like all of the other udp2log instances [15:10:42] so they wanted to keep the AFT logs separate from the webserver logs, e.g. sampled-1000.log, etc. [15:12:14] ottomata: btw, any news from the scribe people? 
[15:12:31] ottomata: well I need to figure out how it is installed so :-) [15:12:56] ahh misc::udp2log::instance { "aft": [15:13:01] it is templatized already [15:14:42] !log srv281 has a full / and hasn't had a puppet run in over 434298 minutes [15:14:46] Reedy: are you going to file an RT or should I/ [15:14:49] (for completion) [15:14:50] Logged the message, Master [15:14:57] I can do [15:15:11] Or we pick on someone who does have access ;) [15:15:19] heh :-) [15:15:45] Ah, interesting [15:15:52] paravoid, re scribe people, nopers [15:16:01] I'm presuming it's not pooled, it's just still in mediawiki-installation [15:16:16] one guy on the google group said "yay do it" but that's about it [15:16:51] cmjohnson1: agreed [15:17:46] Seems pretty sensible [15:18:02] well, I guess I don't need access to reprovision [15:18:39] apaches:#{ 'host': 'srv281.pmtpa.wmnet', 'weight': 100, 'enabled': False } #testing as a renderer only now [15:18:43] rendering:#{ 'host' : 'srv281.pmtpa.wmnet', 'weight': 40, 'enabled': False } [15:18:50] so, disabled indeed [15:20:28] cmjohnson1: I can do it now, no ticket needed [15:21:02] New patchset: Bhartshorne; "adding puppet rules for eqiad prod swift cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16413 [15:21:39] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16413 [15:23:16] PROBLEM - MySQL Idle Transactions on db35 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:25:19] what do you mean by 'disconnected everything'? [15:26:44] would you replace the drives in slots 5, 7, and 8, then power back on? [15:26:58] ping me when you power on and I'll watch it boot via ipmi. 
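[Editor's note: the `misc::udp2log::instance { "aft": }` resource mentioned above is the templatized form of the -aft instance; a hedged sketch of such a declaration follows — only the resource title "aft" comes from the log, the parameters are illustrative.]

```puppet
# Hypothetical invocation of the puppetized udp2log instance define;
# parameter names and values below are assumptions.
misc::udp2log::instance { "aft":
    port          => 8420,                     # assumed port
    log_directory => "/var/log/udp2log/aft",   # assumed path
}
```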
[15:27:44] back [15:27:45] hey paravoid + notpeter, [15:28:00] i want to set up scribe to try some things on our analytics sandbox cluster [15:28:11] i want to scribe_cat from udp2log from oxygen into hadoop [15:28:23] i have no problem installing from .deb manually on sandbox cluster [15:28:26] New patchset: Bhartshorne; "adding puppet rules for eqiad prod swift cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16413 [15:28:46] but to cat from udp2log, the easiest thing to do would be to install scribe package on oxygen [15:28:58] hm. [15:29:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16413 [15:29:04] and then set up a udp2log pipe into scribe_cat [15:29:11] cmjohnson1: there aren't any spare 2TB drives? [15:29:29] since the scribe .debs are not yet in our apt repo [15:29:36] would it be ok to manually dpkg -i them for now? [15:29:44] on oxygen? or should I not do that? [15:30:03] preferably not [15:30:25] mark, that was response for me? [15:30:26] I wouldn't like installing random unreviewed stuff on a production box [15:30:28] ottomata: yes [15:30:34] indeed [15:30:42] cmjohnson1: can you throw in the 1TB drives for now and replace them with 2TBs when you get the chance?
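[Editor's note: the udp2log-into-scribe_cat hookup proposed above could be a single filter line, assuming udp2log's usual `pipe <sampling factor> <command>` filter syntax; the path and category name are illustrative.]

```
# Hypothetical /etc/udp2log/filters entry: pipe every line (factor 1)
# into scribe_cat under an assumed "webrequest" category.
pipe 1 /usr/bin/scribe_cat webrequest
```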
[15:30:54] yeah makes sense [15:30:57] thought i'd ask [15:30:57] hmmmm [15:31:13] i'm waiting for notpeter to try it out on some test lucene search cluster [15:31:20] but i'm not sure how long that will take or when he will have time [15:31:20] cmjohnson1: honestly, any working drive will be ok [15:31:22] gr, hmmm [15:31:25] also, it seems to me that you're experimenting with new stuff on production [15:31:35] cmjohnson1: I just don't want to upset the drive order that the OS sees [15:31:37] ok lemme see if I can figure out a multicast pipe to get the logs there [15:31:38] you should be doing this in labs [15:31:40] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [15:31:40] PROBLEM - Host ms-be10 is DOWN: PING CRITICAL - Packet loss = 100% [15:31:41] even if there were debs for what you want, it'd seem a bad idea to me [15:31:43] and have those debs reviewed [15:31:49] totally [15:31:54] i've been working with peter on this [15:31:59] it's not just about the debs, it's an experimental thing altogether [15:32:01] well, peter shouldn't be doing that either [15:32:11] afaik he isn't doing it in production [15:32:21] and I have been testing these things out on my local vm, and in labs [15:32:23] where is that deb? [15:32:47] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16413 [15:32:47] mark, I will forward you an email I sent a bit ago [15:32:58] ok [15:33:36] tbh, I think the .deb is an implementation detail [15:33:42] ? [15:33:53] you could have used whatever software that is in Ubuntu and it would still be wrong doing that on oxygen [15:34:17] ?
[15:34:28] there isn't anything in ubuntu [15:34:32] and I haven't done anything on oxygen [15:34:33] you're doing experimenting/evaluating on a production system [15:34:37] or planning to do [15:34:39] was asking you guys [15:34:42] if it was ok [15:34:48] and you said nope, which is what I thought :) [15:34:52] right, that's what I'm replying to :) [15:35:05] which is what I am replying to :) [15:35:05] what I'm saying is that it's not about installing the .deb [15:35:08] hah awhat else are we doing? [15:35:23] paravoid, scribe is installed and running on a labs instance [15:35:32] great! :) [15:35:33] i have reviewed all of my committed changes with notpeter [15:35:49] there are two separate things we are talking about here [15:36:00] 1. we are going to try out scribe with lucene search logging [15:36:06] as a trial run [15:36:11] i have tested on local vm and labs [15:36:27] that is just to make sure it works at that scale [15:36:42] i don't expect any problems, but the lucene search logging was the less impactful of the two was to try this [15:36:47] the other is nginx ssl logging [15:37:06] so that's the one thing, i'm just waiting on notpeter to test some stuff out in the lucene test machines (wherever that is) [15:37:10] 2. 
[15:37:26] I want to pipe stuff through scribe to our analytics sandbox [15:37:37] that is not something that is really doable in labs [15:37:43] i'm not going to try to set up a hadoop cluster in labs [15:37:47] lol, srv281 is in a right mess [15:38:00] Reedy: I'm about to reprovision it [15:38:00] so, totally cool that I shouldn't install these .debs on oxygen [15:38:04] since it is a 100% prod machine [15:38:18] i don't consider the analytics cluster production yet [15:38:20] so I have no problem doing it there [15:38:41] we are going to wipe those machines and reinstall everything (with tons of reviews) before those will be considered production [15:38:43] ottomata: I don't really know about the specifics, but can't you simulate the log traffic instead of piping real actual logs there? [15:38:55] paravoid: sure, just saw a spam of permission errors about php-1.17 [15:39:17] would be kinda annoying, i guess I could manually copy logs over and script something to read them and try to pipe them in at about the same rate as the are written from prod servers [15:39:29] but, i think oxygen has a multicast udp2log thing set up [15:39:32] gonna try that [15:39:45] i might be able to subscribe to it from our analytics cluster, without installing anything new or fancy on oxygen [15:39:59] testing scribe on the analytics sandbox cluster is fine [15:40:02] it's not production yet [15:40:04] aye [15:40:44] ottomata: i'd like those debian packages to be pushed to gerrit for review [15:40:49] in a git-buildpackage repository format [15:41:17] we should make repositories for them [15:41:24] ok, i am kinda new to debian packaging, so paravoid, if you could help me with that, i would be much obliged [15:41:33] i did not make these debians/ t hough [15:41:43] they were made by others and modified by me to build the java libs [15:41:46] but, mark [15:41:48] the reason they are not in git [15:42:07] is because paravoid and I are trying to foster more collaboration from the scribe 
community on these [15:42:26] how is that a reason for them not being in git? ;) [15:42:31] they are in git [15:42:31] github [15:42:35] sure [15:42:39] they can be in gerrit as well [15:42:48] well, I didn't really expect that you wanted them installed somewhere until that happened :-) [15:42:50] they are forked from other github repos [15:43:03] if I put them in gerrit, they lose the fork history [15:43:04] i understand [15:43:12] but whatever we are gonna build and run in production, should be in gerrit [15:43:13] RECOVERY - MySQL Idle Transactions on db35 is OK: OK longest blocking idle transaction sleeps for 0 seconds [15:43:13] no they don't [15:43:21] that's the point of git [15:43:30] well, the github fork history perhaps [15:43:34] yeah, exactly [15:43:42] too bad [15:44:19] hm, ok... [15:44:31] why woudl you want to make people outside of wmf have to deal with gerrit? [15:44:31] I don't think so. [15:44:39] and it's ms-be10, not 9. [15:44:39] they don't need to deal with gerrit [15:44:52] you can push whatever branch you want in gerrit where it can be reviewed [15:44:52] I htink I would lose a lot of cooporation potential if I told people to go to our gerrit repos [15:44:57] oic. [15:45:01] what happens in github is separate [15:45:06] cmjohnson1: is there an access light? [15:45:10] I can thrash the drive for a minute [15:45:14] hm, ah ok, so you want me to maintain 2 repositories? [15:45:38] i mean, eyah, i guess there are tons of 'repositories' since it is git... [15:45:42] hm [15:45:49] so have a wmf 'production' branch [15:45:53] hm [15:45:54] i just want whatever we run that's not in ubuntu, be in gerrit [15:45:56] where we review it [15:46:12] we do the same for debian/ubuntu packages that we modify [15:46:20] we pull them in from git.debian.org, make changes, push those to gerrit [15:46:21] right... [15:47:07] is there a way to have gerrit track a branch on github? [15:47:09] cmjohnson1: ? 
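[Editor's note: the `git remote add origin git@github; git remote add review git@gerrit # done` shorthand above expands to something like the following; the repository URLs are placeholders, not the real paths.]

```shell
# One clone, two remotes: the community fork on GitHub and the WMF review
# queue in Gerrit (both URLs hypothetical). Work in a scratch directory.
cd "$(mktemp -d)"
git init -q udplog && cd udplog
git remote add origin git@github.com:example/udplog.git
git remote add review ssh://gerrit.example.org:29418/operations/debs/udplog
git remote -v   # shows both remotes; pull from origin, push to review
```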
[15:47:15] i don't think so [15:47:27] but we don't have to [15:47:34] New patchset: Hashar; "update udp2log init script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16417 [15:47:35] so, i'd have to add instructions in the README on how to pull/push between gerrit and github, right? [15:47:36] it's not like we're gonna build and install changes every day from that [15:47:39] right [15:47:46] it's jsut standard git push/pull [15:47:58] i don't see why that would need instructions in the README [15:48:02] well, they'd have to clone from github, and then set upstream on some branch to push to gerrit [15:48:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16417 [15:48:10] git remote add origin git@github; git remote add review git@gerrit # done [15:48:14] cmjohnson1: not that i know of no [15:48:18] its not a raid controller [15:48:22] right but that is only on my local [15:48:25] it is specifically NOT a raid controller [15:48:29] if someone else cloned the github repo, they'd have to do that too [15:48:39] or just you do the syncing to gerrit [15:48:42] does it not tell you which failed disk? [15:48:49] yeah, i would have to do it [15:48:54] the disk bay #s should be on the case someplace. [15:48:59] whenever we feel we need to build a new package release [15:49:03] which won't be every week now will it ;) [15:49:05] i guess i just like setting up things so that I personally am not a required piece [15:49:15] i don't see how you're a required piece [15:49:20] everyone can pull from github, push to gerrit [15:49:35] right, but the information on how to do so, where to pull, where to push, etc. needs to be known [15:49:40] this is no different from any other packages we maintain [15:49:43] someone would ahve to ask you or me [15:49:53] ok ok ok, anyway, that will work [15:49:55] this can be set as a field in the package control file [15:50:02] bwerrrrrrrrrrr [15:50:02] oh yeah? 
[15:50:05] you can point it at github if you want [15:50:06] yeah [15:50:11] as part of the git-buildpackage or whatever? [15:50:12] if you do "apt-get source " [15:50:19] often it will tell you the VCS url [15:50:22] hmmmmmm [15:50:24] coool [15:50:32] ok, I will get paravoids help with that when it is time [15:50:41] i'm not sure if we are ready for that yet, [15:50:51] we *might* (if I have time) try to build from some newer versions of thrift [15:50:56] fwiw, if we're going to be serious about it then we should use git.debian.org rather than github [15:51:02] which supposedly is doable but was difficult on my first runthrhough) [15:51:13] oh mygoodnes [15:51:24] ok, mark, paravoid, i am not going to think about this right now [15:51:31] although considering how Scribe is unmaintained, I'm not sure if I would want that in Debian [15:51:39] yeah you might be right [15:51:59] flume is another good looking option [15:51:59] I'm not sure I'd like wikimedia to move all of its logging infrastructure to an unmaintained software either [15:52:02] but it is jvm stuff [15:52:03] but that's not my call I guess [15:52:07] yeah, you might be right, i dunno [15:52:20] in general, i think scribe kinda just works [15:52:28] which might be why there isn't much development on it [15:52:32] famous last words? [15:52:35] haha, uyp [15:52:40] except when it won't and then we'll be on our own [15:52:47] (apparently) [15:53:07] yup [15:53:08] heheh [15:53:16] i'm used to that [15:53:45] also somehow, our logging infrastructure failing does not scare _me_ ;) [15:53:49] I can turn that off [15:53:57] but others get upset about that I think, hehe [15:54:05] heheh [15:54:26] esp. 
when one of the design goals of the new system is to not lose data :) [15:54:34] aye [15:54:44] bollocks is what I think about that [15:54:44] well, ok, let me ask the two of you a question [15:54:49] but ok, perhaps not lose a lot of data [15:55:14] for stuff that will be 100% on the analytics cluster, I think we can make decisions about what softwares to use ourselves [15:55:14] but [15:55:20] and it should be known what amount of data is lost [15:55:20] I don't see how loosing 5min of data is a big deal, it's not like you're dealing with high precision measurements or money [15:55:22] when it comes to things we will need to install in existing production machines [15:55:37] i think we need to work closely with you guys [15:55:42] so we know what you expect [15:55:44] you need to work closely with us anyway [15:55:45] and you know what is going on [15:55:50] of course of course [15:55:54] since we need to help maintain it [15:55:55] the final analytics cluster will be all reviewed [15:55:59] are you going to operate the analytics cluster? [15:56:03] yes [15:56:12] didn't know that [15:56:12] but but but [15:56:13] if you don't we can't help there and will have to turn it off if it fails [15:56:21] so, no ops involved in that at all? [15:56:22] haha, yes yes [15:56:26] ottomata will be ops too [15:56:28] no puppet, nagios, ganglia etc.? [15:56:28] yes and no [15:56:29] yeah [15:56:31] yes puppet [15:56:36] yes nagios, ganglia, etc. [15:56:42] but I will be main point of ops contact for that [15:56:46] i guess (?) 
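[Editor's note: returning to the packaging point earlier in the log — the "field in the package control file" that lets `apt-get source` report the VCS URL is `Vcs-Git` (with `Vcs-Browser` for the web view) in the source stanza of debian/control. A hypothetical stanza, with the package name, maintainer, and URLs as placeholders:]

```
Source: udplog
Section: admin
Priority: optional
Maintainer: Example Maintainer <ops@example.org>
Vcs-Git: git://github.com/example/udplog.git
Vcs-Browser: https://github.com/example/udplog
```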
[15:56:48] but yeah [15:56:55] everyone should know what is going on [15:57:06] i think I will be in ops meetings from now on [15:57:07] New patchset: RobH; "Revert "Adding db63 -db77 to the dhcpd file"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16418 [15:57:08] so that will help with that [15:57:22] yeah, you'll have to be a full part of ops or this can't work [15:57:27] yeah totally [15:57:30] i'm all for that and excited about it [15:57:41] but but, for existing prod stuff [15:57:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16418 [15:57:51] how do we go about evaluating options and choosing techs? [15:58:02] mail the lists [15:58:03] scribe vs. flume vs. kafka is a good example of a choice we need to make [15:58:07] hmm, right duh [15:58:09] heheh [15:58:10] I don't understand how that is equivalent to "you'll make decisions about softwares to use yourself" :) [15:58:19] ok ok [15:58:19] so [15:58:20] example [15:58:22] how you being part of ops [15:58:35] do we want to use datastax or do we want to use cloudera hadoop? [15:58:38] but not my call for sure [15:59:53] do you guys want to be involved in evalutating and experimenting with how the two work, how they perform on the cluster under different types of loads and jobs, etc. etc.? [15:59:58] s. [15:59:59] yes. 
[16:00:10] ok, so who else in ops is going to join us in doing that [16:00:14] everyone [16:00:17] would love some help and input for sure [16:00:19] as in, you mail the lists [16:00:22] and you'll get input [16:00:32] i fully expect asher to have some things to say about that, for example [16:01:10] and if you don't get any input, noone can complain that we weren't asked :) [16:01:19] when you do whatever you want :) [16:01:20] ja, ok [16:01:34] hehe [16:01:36] i mean, obviously everything we do is going to have to be reviewed and discussed [16:01:41] since it will all be set up via puppet eventually anyway [16:01:49] yeah right [16:01:57] in 5 years probably :P [16:02:01] ? [16:02:05] i'm always sceptical about that hehe [16:02:37] somehow this "we first test and then review/reinstall" never happens in practice ;) [16:02:46] I think sooner is better than later, the more you delay it the bigger the review workload will get [16:02:51] i am puppetizing as we go [16:02:53] and then it'll be too big to happen [16:03:01] but I have no idea if you've done that already [16:03:06] maybe you have and all is good [16:03:35] a bit of an off-topic question [16:03:45] have you seen the sFlow/HTTP stuff that some people are working on for logging? [16:03:52] New patchset: Platonides; "(Bug 38404) Change $wmgBabelMainCategory for eswiki." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16420 [16:04:33] no [16:04:50] !log reinstalling srv281; disk full, hasn't run puppet for 300 days, depooled for ages [16:04:58] Logged the message, Master [16:05:38] ottomata: http://sflow.org/draft_sflow_http.txt [16:05:42] ottomata: http://host-sflow.sourceforge.net/relatedlinks.php [16:06:28] PROBLEM - Host srv281 is DOWN: PING CRITICAL - Packet loss = 100% [16:06:34] this is system perf monitoring? [16:06:47] http [16:06:54] hmm, do we do precise for MWs yet? 
[16:07:12] paravoid, mark, re puppetizing as we go [16:07:14] http://git.less.ly/?p=kraken-puppet.git;a=tree [16:07:22] (sandbox, remember) [16:07:23] paravoid: I have added you as a reviewer to https://gerrit.wikimedia.org/r/#/c/16417/ [16:07:47] ottomata: :/ I don't understand why you work from a separate puppet/git tree [16:07:48] paravoid: minor tweaks to the udp2log init script and some cleanup of an old file [16:07:59] because this is a sandbox [16:08:06] what does that mean? [16:08:06] and we can't wait for review every time I want to try soemthing [16:08:20] we won't include all of these configs in the eventual prod cluster [16:08:29] datastax vs CDH3 vs CDH4 [16:09:14] scribe vs flume vs kafka vs etc. [16:09:32] doing benchmarking with different cluster configs, etc. [16:09:33] hashar: will check in a bit [16:09:45] hashar: do you know if we do precise for srvNNN yet? [16:09:45] paravoid: I am out for today, will connect later on for some conf call [16:09:51] paravoid: I have no idea [16:09:59] okay, thanks anyway :) [16:10:01] paravoid: but "beta" has its apaches running Precise [16:10:22] paravoid: and I think you already updated some packages to let us run mw under Precise (such as font packages that got renamed) [16:10:45] yeah, I remember we fixed that for beta, but I'm not sure if we do that for prod yet [16:10:56] probably not :/ ask in ops-l maybe? [16:11:28] paravod, re sFlow, looks cool, but we want a logging solution that is a bit more generic I think [16:11:45] http requests, random application logs, lucene search logging, etc. [16:11:49] clicktracking, etc. [16:11:50] srv194 is precise, so I guess we do that [16:13:08] ottomata: can you imagine every team wanting to work on something creating their own git repo with their own puppet install and then coming a few months later with huge puppet commits? 
[16:13:16] that's my only problem with that [16:13:42] I haven't been involved in the discussions though [16:13:55] so nothing that I say should be considered the "ops team's position" [16:14:06] i think it is either I use this puppet repo as a sandbox, or I don't puppetize until we are done figuring it out [16:14:21] RECOVERY - Host srv281 is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms [16:14:42] but this isn't even a clone of ours [16:14:45] analytics cluster != labs, so I can't commit our puppet stuff without needing review [16:14:54] I never understood why we didn't make the analytics cluster part of labs [16:14:55] you can commit to your own clones as much as you want [16:15:15] s/clones/branches/ ? [16:15:16] running puppetmaster::self or whatnot there [16:15:24] no, clones [16:15:35] clones of our branches/repo/whatever [16:15:38] how would I get a clone applied to these machines? [16:15:45] oh you mean elsewhere, like we are doing [16:15:49] you have setup a puppetmaster haven't you? [16:15:52] yes [16:16:00] clone our git repo to that, modify that repo [16:16:01] yes i see, you are saying use our repo as a starting point [16:16:04] yes [16:16:10] yep, that would work better too [16:16:12] yeah could do that, but then I get two puppetmasters trying to apply the same configs [16:16:18] huh? [16:16:26] base.pp does a lot of stuff [16:16:38] probably not much of what you should change [16:16:40] i guess the clone node owuldn't include that [16:16:45] why not? [16:16:50] you can modify base.pp on your own clone [16:17:14] this is no different from the puppetmaster::self stuff done in labs [16:17:20] yep [16:17:36] no i mean [16:17:38] with 2 puppetmasters [16:17:40] and you can keep pushing individual branches/commits to gerrit where we can review them as we go [16:17:43] why 2 puppetmasters? 
[16:17:57] PROBLEM - SSH on srv281 is CRITICAL: Connection refused [16:17:59] a single puppetmaster (yours) running the ops repo in your own branch [16:18:28] so, you setup one puppetmaster in your sandbox environment [16:18:29] not following, puppetd by default contacts production puppetmaster running production branch [16:18:30] you clone our repo [16:18:35] you change some stuff [16:18:37] if I make a clone [16:18:49] your puppetmaster runs off YOUR clone [16:18:49] i still need my own puppetmaster to serve up the .pp configs from that clone [16:18:51] right [16:18:56] but that's mostly our repo [16:18:57] ok, paravoid was asking why 2 puppetmasters [16:19:04] you don't need 2 puppetmasters [16:19:06] you need only one, your own [16:19:18] so you have a clone of our repo on there [16:19:21] you modify stuff as you need [16:19:28] say, for hadoop, or whatever [16:19:30] that can probably wait a bit [16:19:37] until we review it [16:19:51] you can submit that when you're positive it works the way you want it [16:20:01] so you are suggesting that analytics cluster does not contact production puppetmaster at all [16:20:02] but other things, like, you need to modify base.pp for something [16:20:10] ( i don't need to modify base.pp) [16:20:18] i'm trying to make it so that everything I do in puppet is modular [16:20:19] you just said that, so i'll use it as an example [16:20:31] you can make that change in base.pp so it works more generically or whatever [16:20:33] well, i said that, meaning that if both puppetmasters were trying to use the same configs [16:20:34] and immediately push that to gerrit [16:20:38] base.pp would be applied by both of them [16:20:43] so it can already be reviewed and incorporated into production [16:20:47] aye true [16:20:57] RECOVERY - SSH on srv281 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [16:20:58] no you don't need to contact production puppetmaster, that doesn't really work [16:21:08] it's working right now, because
my repo is not a clone [16:21:12] just make sure you keep pulling/merging from our repo [16:21:18] so you stay up to date with that [16:21:27] pulling/rebasing preferrably :) [16:21:38] yeah [16:21:38] i'm tryign to keep my stuff 100% modular, and by having a separate setup anyway, it forces me to do this [16:21:45] this is silly [16:23:20] why would you want to work in two different environments when they need to get merged anyway [16:23:35] mark, can you tell me why it matters? i don't really like having this repo at git.less.ly, so we can put it in gerrit or github or whatever, but aside from where the repo is hosted, why does it matter if I use a clone and my one single puppetmaster, or if I use 2 different repos and 2 puppetmasters? [16:23:49] i like the way it is now, because I am using the produciton puppetmaster for the regular stuff [16:23:52] because you're going to make one huge diff [16:23:54] how are you going to merge it in the end? [16:24:04] and my analytics puppetmaster for analytics stuff [16:24:05] that is not reviewable [16:24:14] one huge single commit adding all of your stuff? [16:24:14] wouldn't it be the same if it i was in gerrit? [16:24:19] all the changes are making are new files [16:24:23] no, we'd have the whole history [16:24:26] of multiple commits [16:24:37] we'd have a normal branch merge [16:24:43] instead of one collapsed commit [16:24:48] so, you are telling me, that when you review this stuff, you are going to go back and read all of my individual commits? 
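[Editor's note: the clone-based workflow under discussion — clone the ops repo onto your own puppetmaster, keep local work on its own branch, pull/rebase to stay current — can be walked through locally. In this sketch a scratch repository stands in for operations/puppet; names and messages are illustrative.]

```shell
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=ed GIT_AUTHOR_EMAIL=ed@example.org \
       GIT_COMMITTER_NAME=ed GIT_COMMITTER_EMAIL=ed@example.org
# a stand-in for the ops repo; in reality you would clone from Gerrit
git init -q upstream && git -C upstream commit -q --allow-empty -m "production base"
# the analytics puppetmaster's working copy: a clone, local work on a branch
git clone -q upstream analytics-puppet && cd analytics-puppet
git checkout -q -b analytics
git commit -q --allow-empty -m "analytics sandbox config"
# stay current with production: fetch and rebase rather than merge
git fetch -q origin
git rebase -q origin/HEAD
```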
[16:24:58] also, you could choose to incrementally push some stuff in the meantime [16:25:00] we're certainly not gonna read one huge diff [16:25:04] that you feel they're ready [16:25:14] but you are telling me to push to a branch or a clone [16:25:18] where it isn't going to be reviewed anyway [16:25:22] yes it is [16:25:24] so when it goes back into production branch/clone [16:25:48] that review is preferably done over time [16:25:49] you guys are going to review something when i do say "ok, now let's try a slightliy different balance of mappers/reducers so I can benchmark"? [16:25:52] over and over again? [16:26:03] yeah, some parts of that anyway [16:26:03] git rebase --interactive [16:26:09] and squash commits [16:26:11] is your friend [16:26:18] isn't that the same as one giant commit? [16:26:23] squashing? [16:26:31] if you squash everything in one huge commit, yes [16:26:31] not squash everything into one commit [16:26:33] but you shouldn't [16:26:35] i see [16:26:38] well how about this [16:26:43] squash stuff into multiple commits [16:26:45] i would love to host this repo in gerrit [16:26:52] and preferrably, *merge early* some of your stuff [16:26:59] you don't need to host it in gerrit [16:27:00] so, let me give you an example [16:27:03] (we /could/) [16:27:05] but I would like to keep the machiens talking to prod puppet master [16:27:07] you're using modules [16:27:09] so I don't have to deal with messing that up [16:27:10] yes [16:27:20] our puppet infrastructure didn't support modules until last Thursday [16:27:24] how would we merge that? [16:27:32] ? [16:27:39] they are all new files, right? [16:27:54] we didn't support modules *at all* [16:28:20] we could place your files into a module/ directory, but they would never got loaded by our puppetmaster [16:28:21] oh you are saying how would we merge that if you hadn't already done the work to support modules? [16:28:25] yes. 
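[Editor's note: the `git rebase --interactive` squashing advice above normally means editing the todo list by hand; here the edit is scripted through GIT_SEQUENCE_EDITOR only so the example runs unattended. All file and commit names are illustrative.]

```shell
cd "$(mktemp -d)" && git init -q
export GIT_AUTHOR_NAME=ed GIT_AUTHOR_EMAIL=ed@example.org \
       GIT_COMMITTER_NAME=ed GIT_COMMITTER_EMAIL=ed@example.org
# four small work-in-progress commits
for i in 1 2 3 4; do echo "$i" >> notes.txt; git add notes.txt; git commit -q -m "wip $i"; done
# fold commits 2-4 into the first; interactively you would change "pick"
# to "fixup" (or "squash") on those todo-list lines yourself
GIT_SEQUENCE_EDITOR="sed -i '2,\$s/^pick/fixup/'" git rebase -i --root
git log --oneline   # a single commit remains, content of all four kept
```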
[16:28:28] right, then we would work on that when it happens [16:28:30] to support modules [16:28:53] so rather than working with what we have, you prefer to work on something completely on your own and then expect the existing setup to adjust itself to support it? [16:29:09] do you see the problem here? :) [16:29:10] or adjust it myself, (with ops' help/review) [16:29:11] yes [16:29:27] this is just an example; modules was fairly easy to add support for [16:29:35] but other things may be like that [16:29:37] yes and no, i understand the desire for history, that's cool [16:29:46] it's not just history [16:29:49] you use none of the patterns that exist in the tree, like role classes [16:29:53] this is just totally not distributed version control [16:30:07] and you're going to realize that months from now, when you'll request everything to be merged into a single commit [16:30:08] this is totally doing your own thing [16:30:22] ok ok , so there are 2 ways that I think this can be done that will make me/us happy [16:30:23] so [16:30:26] "release early, release often" :) [16:30:30] 1., the way I am doing now (that's the me) [16:30:44] 2. use a clone, but not host my own puppetmaster [16:30:48] i'm not sure how to make 2 happen [16:30:54] why can't you run a clone on your own puppetmaster? [16:30:59] that's how labs projects work too [16:31:05] yeah with self, hmmm [16:31:07] hang on [16:31:15] if you find this too hard to manage, perhaps you should be doing it in labs [16:31:21] instead of on real iron [16:31:26] hehe, gimme an equivalent labs cluster and that will be fine [16:31:28] we are benchmarking [16:31:37] you can benchmark after you have this stuff dealt with [16:31:58] don't be mean mark! 
I'm not finding this too hard to manage, we are working together here to find the best solution [16:32:19] the best solution seems rather obvious to me, but ok [16:32:30] RECOVERY - Apache HTTP on srv281 is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 0.003 seconds [16:32:32] right, and that is not a healthy way to convince me [16:32:38] you guys have the final word, sure [16:32:49] but let's try to make it feel like we are working togehter positively, ok? [16:33:04] so what's the problem with running a clone of our repo on your own puppetmaster? [16:33:06] imho, it seems to me that you've tried doing this a bit of your own way and we're already seeing clashes -- before even coming to the merging part [16:33:09] thinking, one sec [16:33:17] I propose that we do the clone thing [16:33:27] (I, too, propose) [16:33:32] and since I worked on puppetmaster::self [16:33:37] I can help you in doing that if you wish [16:33:50] so I do a clone, host my own puppetmaster, occasionally pull from production, commit my stuff to my clone, etc. [16:33:57] yes [16:33:58] but all my machines need to work with my puppetmaster [16:34:05] Can someone run rm -rf /home/wikipedia/common/php-1.20wmf4 as root for me please? Various git objects get permission denied errors [16:34:07] isn't that the case already? [16:34:15] no he does both [16:34:18] Reedy: where? [16:34:21] they work off our puppetmaster, and his own [16:34:24] yeah [16:34:27] on fenari will do it [16:34:33] and he tries to make sure they don't collide [16:34:49] which thus far has worked fine, but you guys are saying that later it will clash [16:34:55] when I want to merge to produciton puppet [16:34:56] but your other machines talk with your own puppetmaster as well, don't they? [16:35:05] yes [16:35:18] yeah, but they all work with prod puppet without any modifications to regular puppet [16:35:27] so, all of your machines "work with your puppetmaster", don't they? 
[16:35:34] there is an /etc/puppet.analytics [16:35:35] yeah [16:35:47] but that analytics puppetd is not running [16:35:48] you don't need to do anything other than just make them not work with the prod puppetmaster [16:35:49] i run it manually [16:35:52] which is fine here too [16:36:07] i need to change puppet.conf in the clone [16:36:23] and remember not to merge that back to prod, but aside from that it'll work just fine yeah [16:36:46] Reedy: done. [16:36:47] puppetmaster::self, if it doesn't already, can be made to work for this use case [16:36:51] Thanks [16:37:03] it doesn't, because it assumes it runs on labs [16:37:11] there is a slight problem with our plan [16:37:16] which is the private repo [16:37:30] ah right [16:37:31] hm [16:37:31] yeah that probably needs to be the labs private repo [16:37:32] the labs-private could be used [16:37:37] right [16:37:41] this is now a separate realm [16:37:47] it's not quite labs [16:37:51] OR [16:37:51] what is? analytics? [16:37:54] what about this [16:37:55] it could be put in labs [16:37:59] right, it's a bit bastardized [16:38:04] can I keep running with 2 puppetmasters and still use a clone? [16:38:11] my clone could just make sure that the analytics nodes [16:38:16] don't include anything from the production repo [16:38:17] although when I first heard about that, my immediate reaction was "let's expand labs to iron too" [16:38:31] since this will surely come up in the future again [16:38:37] certainly will [16:38:39] in fact, it came up today! [16:38:42] with ben's swift stuff [16:39:32] i think that would work and would be minimal work [16:39:44] and effectively the same [16:39:47] how would that help?
[16:40:13] main puppetd would contact production puppetmaster, just as it is now [16:40:14] no change [16:40:19] so no issues with private repo [16:40:36] you can just use the labs private repo [16:40:43] analytics puppetmaster would run off of a clone of production puppet repo [16:40:45] running off two puppetmasters is asking for trouble imho [16:40:50] but in site.pp in the clone [16:40:56] i would change it so analytics nodes do not include anything [16:41:04] except my stuff [16:41:05] could work if you're careful and lucky but it'll always be a headache [16:41:16] it is working great right now [16:41:40] it was mainly just configuring them to work out of different directories [16:42:06] mark, to use the labs private repo, what do I have to do? change puppet.conf so that it points elsewhere? [16:42:24] we need to make sure puppetmaster::self clones from the labs private repo [16:42:26] or sorry [16:42:28] fileserver.conf [16:42:29] ? [16:42:47] mark: it does [16:42:47] ok hmm [16:42:48] ha [16:42:53] yeah it probably does [16:43:00] I'm sure it does, I wrote it :) [16:43:02] we need to make sure puppetmaster::self works outside of labs, basically [16:43:07] that I'm not sure of [16:43:13] paravoid can help with that [16:43:16] hahaha [16:43:22] yes, I can [16:43:30] ok ha [16:43:35] then there's just one puppetmaster, and I don't see why that wouldn't work [16:44:03] how do I get ::self installed if I am running my own puppetmaster? [16:44:20] wipe that box and start over [16:44:47] (i mean, you don't have to, but you made your life hard by not doing it this way ;) [16:46:06] we could look at making these machines fall under realm labs (at least for now) [16:46:37] is there a problem with my latest suggestion?
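[The clone-side change described above -- editing site.pp in the cloned repo so that analytics nodes include nothing from production -- could be sketched roughly as below. This is a hypothetical fragment: the node pattern and role class name are illustrative, not the actual manifest.]

```puppet
# site.pp in the analytics clone (sketch only; node regex and class
# names are hypothetical, not the real operations/puppet contents).
# Analytics nodes include only analytics classes, nothing from the
# production roles, so the two puppetmasters cannot collide.
node /^analytics\d+\.eqiad\.wmnet$/ {
    include role::analytics
}
```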
[16:46:39] but private data is a problem here [16:46:56] 18:40:45 running off two puppetmasters is asking for trouble imho [16:47:01] 18:41:05 could work if you're careful and lucky but it'll always be a headache [16:47:05] right, but it is working [16:47:10] and if it is a headache [16:47:12] then it is my headache [16:47:34] so, we did this for a reason, you know? [16:47:34] fine, do as you want, have your headache [16:47:39] we're experimental atm. [16:47:48] obv we're not going to run it this way in production [16:47:56] dschoon: we're suggesting to do exactly what we're doing in labs [16:48:10] but it's been really helpful to be able to quickly test out puppet changes without pushing to gerrit and waiting for review [16:48:22] please read the discussion before you chime in [16:48:26] we're way past that [16:48:33] i did, but i signed on recently [16:48:37] so i might have missed things [16:48:40] ok [16:48:56] we're saying, "totally run your own puppetmaster, but do it off our (cloned) git repo" [16:48:59] it's not in my scrollback, so :) [16:49:00] dschoon: we're basically asking to do exactly what we do with labs and local puppetmasters, which means no waiting for review [16:49:10] i'll shoosh then :) [16:49:21] i have nothing else to contribute to puppet politics [16:49:24] dschoon: that will help for when the time for pushing into prod (and hence review) comes [16:49:25] hehe [16:49:40] ok, paravoid, mark, I will create a clone (in gerrit?) of operations/puppet repo [16:49:42] ideally these reviews don't come all at once when you're totally done [16:49:56] when you start working off random software projects do you do "git checkout -b foo" or "rm -rf .git; git init"? [16:50:04] of course not. we +2 them quietly while you're all asleep.
[16:50:06] that's basically the discussion :) [16:50:19] I will use that to run my secondary puppetmaster [16:50:41] and I will modify site.pp so that analytics nodes do not include anything but my stuff [16:50:49] if you and paravoid get the labs on iron stuff working [16:50:50] when the time for pushing comes, we need to merge a branch, not have a separate commit of "Commit the past 6 months of analytics work" [16:50:55] we can switch to that, or do that in the future [16:51:24] ottomata: not in gerrit, just git clone on your puppetmaster [16:51:39] aaahhhhhh but I don't want to edit there [16:51:48] then edit elsewhere [16:51:56] you don't have to edit there, you can edit/commit locally and push it to your puppetmaster [16:51:57] i need to commit? [16:51:58] oh [16:51:59] this is not svn :) [16:52:01] commit to branch [16:52:04] clone and checkout branch [16:52:10] you need to change your mindset to distributed version control [16:52:21] you sound like you think very much in svn/cvs ways [16:52:23] probably so [16:52:26] yes, "git clone gerrit:operations/puppet; git checkout -b analytics" [16:52:36] i've only used git in a centralized setup (gerrit, etc) [16:52:39] ok cool [16:52:41] that makes more sense [16:52:52] also note that we haven't discussed this before with mark at all [16:52:57] you can push/pull changes between any arbitrary git repos [16:53:12] no need to have gerrit in the way of things ;) [16:53:24] gerrit is just our gatekeeper of what will be our official/"production" stuff [16:53:37] so we're two people coming to the exact same conclusion separately; it doesn't mean it's right, but it certainly has a bit more weight this way methinks :) [16:53:48] that's also why for scribe, I don't care about what happens in github, as long as what runs in production on our cluster, gets reviewed via gerrit [16:53:48] right, yeah, [16:54:00] indeed [16:55:07] also, all of this isn't to blame you for anything; things have been forming as we go (and on the
management level too with you joining ops etc.) [16:55:26] but we should align your work with the ops work better as we go forward I think [16:55:29] yeah I think me being in ops meetings will be really really helpful [16:55:40] this is a perfect example of why we need to work together along the way [16:55:40] and I can definitely help you there if you need it [16:55:48] you've only just started and already you're doing things differently than we would do [16:55:53] imagine how it would be in a few months from now [16:56:03] if you come by with one huge diff out of a completely separate git repo [16:56:29] hey, i was redirected here from #wikimedia-tech... i notice a suspiciously high amount of accesses to the undefined pages of all wikimedia projects for the last at least 2-3 weeks... just as an example, today 15:00-16:00 utc there were 44716 accesses to http://en.wikipedia.org/wiki/undefined (that's more than 1/10th of the Main_Page with 354000 views!) [16:56:43] anyone here with the permissions to investigate? [16:57:00] btw, I think I'm about as new as you are to the foundation ottomata :) [16:57:10] jorn: interesting... can you file a bugzilla ticket for that? [16:57:17] it's not some old-timers attitude or anything :) [16:58:37] mark: doesn't seem so [16:58:39] :D [16:58:45] my username and pw are invalid [17:01:39] i can't really help there i'm afraid, i'm not a bugzilla admin I believe [17:02:30] PROBLEM - Apache HTTP on srv281 is CRITICAL: Connection refused [17:03:24] just thought that would be a funny reply, seems firefox somehow associated the wrong username / pw [17:03:27] !log rebooting potassium, shouldn't be in use, had tons of cruft ssh connections from a month ago [17:03:35] Logged the message, RobH [17:03:40] New patchset: Cmjohnson; "Replacing linux-host-entries.ttyS1-115200 with corrected version" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16423 [17:03:40] category wikimedia / general is ok?
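[The workflow agreed on above -- "git clone gerrit:operations/puppet; git checkout -b analytics", then push/pull between arbitrary git repos without gerrit in the middle -- can be sketched as below. The branch name comes from the conversation; the real remote would be gerrit:operations/puppet, so a local bare repo stands in for it here to keep the sketch self-contained, and all /tmp paths are hypothetical.]

```shell
# Sketch of the clone-and-branch workflow: work on a topic branch of a
# clone instead of "rm -rf .git; git init".
set -e
rm -rf /tmp/operations-puppet.git /tmp/puppet-clone

# Stand-in for the gerrit remote (hypothetical path).
git init --bare /tmp/operations-puppet.git

# Clone it and do all work on an "analytics" topic branch.
git clone /tmp/operations-puppet.git /tmp/puppet-clone
cd /tmp/puppet-clone
git config user.name "Analytics Sketch"
git config user.email "analytics@example.org"
git checkout -b analytics

# Edit and commit locally, then push the branch to the puppetmaster's
# repo; gerrit only gets involved once the branch is merged toward
# production, as a branch merge rather than one huge catch-up commit.
echo "# local analytics change" >> site.pp
git add site.pp
git commit -m "analytics: example local change"
git push origin analytics
```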
[17:04:16] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16423 [17:06:15] PROBLEM - SSH on potassium is CRITICAL: Connection refused [17:06:19] aha [17:06:21] BBQ is ready [17:08:35] New review: RobH; "everything back like it should, much better" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/16423 [17:08:36] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16423 [17:14:23] bug report is here: https://bugzilla.wikimedia.org/show_bug.cgi?id=38604 gtg, bus [17:22:48] New patchset: Pyoungmeister; "apache overhaul: round one of responding to mark's comments" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16426 [17:23:24] New patchset: Pyoungmeister; "Initial comments to app server manifests work" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16122 [17:23:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16426 [17:23:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16122 [17:38:46] Ryan_Lane: Looks like nova is already using keystone on nova-precise1. Can you suggest some next steps I should pursue? (I presume the interesting bit is getting labsconsole to pull account & project information out of keystone) [17:41:12] Reedy: I reformatted srv281 only to get a system with a disk full again, yaaay [17:41:32] our partitioning scheme is just crazy [17:41:43] andrewbogott: I don't see Ryan online [17:41:58] paravoid: Good point :) [17:42:09] Speaking of seeing people online… anyone seen Asher lately? [18:07:05] RobH: ping [18:07:11] ? [18:07:54] jeremyb: Aye sah [18:07:57] hey [18:07:58] RobH: do we have any high performance miscellaneous servers available?
[18:08:19] RobH: I'm asking in regards to the WLM project [18:08:32] there are none in tampa, we are ordering more [18:08:36] i think i have some in ashburn, checking [18:09:31] preilly: i have a couple in ashburn [18:09:41] is there a procurement ticket for this? [18:09:55] RobH: yes [18:10:02] RobH: let me look it up [18:10:22] jeremyb: I believe the docs I have found so far indicate that the debian source package is just the four files I've created through debuild, so those are all up now [18:10:45] New patchset: Bhartshorne; "adding some of the eqiad swift production hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16431 [18:11:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16431 [18:11:26] marktraceur: that looks much better [18:12:16] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16431 [18:12:38] RobH: https://rt.wikimedia.org/Ticket/Display.html?id=3221 [18:13:38] this isn't procurement, this is a request [18:13:49] and it's not really approved yet, so ct needs to comment and say to do it [18:14:01] asher raises the concerns in his last posting. so i'm not against giving you servers [18:14:09] i just need a more mgmt level approval [18:14:12] paravoid: care to help marktraceur some with packaging? /me has been busy with wikimania and now some with other work. and you're much more experienced of course ;).
I can try to help some in ~1 week but if you want to before then that's great too [18:14:31] RobH: got it [18:14:42] https://svn.wikimedia.org/viewvc/mediawiki/trunk/debs/etherpad-lite/ is old but maybe useful [18:14:51] http://marktraceur.info/shared/packages/etherpad-lite/ is newer [18:14:53] but i have a couple of high performance and standard misc servers [18:14:55] in ashburn [18:14:58] fyi [18:15:39] https://github.com/MarkTraceur/etherpad-lite is the repo [18:16:09] RobH: okay I talked to Tomasz he is going to circle around with CT [18:16:15] cool [18:16:16] RobH: thanks for looking into it [18:16:23] quite welcome [18:16:30] * jeremyb always forgets, binasher is pacific? [18:16:50] he lives in SF, yeah [18:16:51] i guess maybe he's not online today [18:17:21] ryan_lane: can you tell me where you're at regarding OSM and keystone? Partially underway, or not at all underway? (Looks like you switched Nova to use keystone on nova-precise1 already...) [18:17:30] I've not seen an out of office email or anything.. [18:17:48] andrewbogott: yeah, but I didn't import the LDAP data, yet [18:18:06] and the LDAP data needs to be changed to work with keystone's schema [18:18:35] for OSM, I've added some basic REST support to the controller class [18:18:38] OK. And, the git diff in the OSM code on that system is pretty big… that's something unrelated that you're working on? [18:18:51] lemme see [18:19:41] Reedy: well he doesn't seem to have been in this channel in the last 24 hrs [18:20:04] Because yesterday was a weekend [18:20:16] paravoid: apache stuff should be on /a now. If not, we can beat notpeter ;) [18:21:02] Reedy: sure, not complaining, just want to poke him. if i knew he wasn't around today then I would delay poke for a day ;) [18:21:57] andrewbogott: oh. the diff is large because I'm subclassing the controller [18:22:03] so that we can support ec2 and nova [18:22:05] while we switch [18:22:15] ok. 
[18:22:19] it's going to be way larger too [18:22:25] because I need to fix all the stupid model classes [18:22:47] So, any specific subtasks you'd like me to look at? [18:22:50] I should have used JSON for the objects, and I used EC2's data class [18:23:03] (I can also just get out of the way if you have it well in hand.) [18:24:26] andrewbogott: well, I could use help with keystone [18:24:43] you mean, the account migration? [18:24:44] if you can import the current test LDAP data, then massage it for keystone, that would be good [18:24:59] it should be relatively straightforward [18:25:48] How hard was it to set up nova-precise1? Is it mostly puppetized? [18:31:11] andrewbogott: should be fully puppetized [18:31:30] ok, I'll make myself a separate instance so I don't step on your toes [18:34:25] New patchset: Alex Monk; "(bug 19569) Add Portal namespace to urwiki." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16432 [18:35:17] andrewbogott: no, it's fine to work on that one [18:35:51] lemme import the data really quick [18:37:03] Ryan_Lane: bump on ircecho reviews (but not in a rush either). some might need to pull in more recent changes that have been merged so I'll double check them before merging [18:38:42] * Ryan_Lane nods [18:39:10] jeremyb: sorry about the wait on that. I've been fighting labs stabilization issues for weeks [18:41:32] Ryan_Lane: i think i noticed a small bit of that [18:41:54] * jeremyb suddenly wonders if the hw migrations by the more crappy route ever finished [18:42:15] Ryan_Lane: http://docs.openstack.org/essex/openstack-compute/admin/content/migrating-from-nova-auth.html <- if this is not relevant to what we need, can you explain?
[18:43:15] andrewbogott: I don't think this will deal with ldap at all [18:43:36] andrewbogott: this is probably the best resource available for this: http://adam.younglogic.com/2012/02/openstack-keystone-ldap-redux/ [18:43:47] it at least describes the schema [18:44:29] So the ldap system we use is yet a third auth scheme, neither keystone nor 'nova auth'? [18:45:02] (Hm, I've seen that page but thought it was about the transition between two different keystone versions.) [18:45:10] LDAP is the backend for keystone [18:45:19] like it's currently the backend for nova [18:48:12] Ryan_Lane: Sorry if I'm being dim. That page that I linked purports to be about migrating from nova-auth to keystone. It's in reference to a system that doesn't use ldap as a back-end, but just relies on the nova/keystone sql dbs? [18:48:21] yes [18:48:22] RECOVERY - Host ms-be10 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [18:48:35] it's migrating the auth info from the nova db to the keystone db [18:48:41] both using a sql backend [18:49:00] we use an LDAP backend for nova [18:49:07] using the nova schema [18:49:25] Yep, ok. [18:52:06] hm [18:52:15] most keystone implementations I've seen use a passwrod [18:52:18] *password [18:52:28] I'm not a huge fan of that [18:52:47] I'd much prefer to use a token, like we currently do [18:54:59] I'm not sure that the ldap backend can pull a token, though [18:56:05] woosters: i've been following https://rt.wikimedia.org/Ticket/Display.html?id=3221 and we're at the point where we'd like to get one of the available misc servers [18:56:42] tfinc: high performance misc server from Ashburn ^^ [18:57:25] !log took ms-be10 out of rotation because it ate itself [18:57:33] Logged the message, Master [18:58:10] preilly: yup [18:58:27] tfinc - will discuss it with asher and get back to u [18:59:24] woosters: will he be in today?
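[The LDAP-backed keystone setup discussed above is driven by the [ldap] section of keystone.conf; the younglogic post linked in the conversation describes the schema side. A minimal sketch of that section is below -- every URL and DN here is a placeholder, not the actual Wikimedia/labs values, and option names are those of the Essex-era LDAP identity backend.]

```ini
[ldap]
# Sketch only: all values are hypothetical placeholders.
url = ldap://localhost
user = cn=Manager,dc=example,dc=org
password = secret
suffix = dc=example,dc=org
user_tree_dn = ou=Users,dc=example,dc=org
tenant_tree_dn = ou=Groups,dc=example,dc=org
role_tree_dn = ou=Roles,dc=example,dc=org
```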
[18:59:45] he should be [19:03:13] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [19:05:05] andrewbogott: I have a really good feeling I'm going to be writing some LDAP code for keystone :( [19:05:18] * Ryan_Lane hangs himself [19:05:28] Worried that no one else is using it atm? [19:05:46] I'm pretty sure there's no support for token auth [19:05:56] and I *really* don't want to use password auth [19:06:36] I guess it's possible to use password auth if I set the expiration time of the keystone token to be the same as the expiration time of the labsconsole cookie [19:07:18] it's also possible that we write a token backend for mediawiki [19:16:30] andrewbogott: I guess we'll do simple auth and set the expiration time for the keystone token to be as long (or longer) than mediawiki's long-lived tokens [19:17:46] * andrewbogott nods [19:19:48] New patchset: Faidon; "nrpe: add a missing requires" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16438 [19:20:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16438 [19:20:45] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16438 [19:27:54] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [19:28:30] !log powering down old transcode1, was camera gateway sandbox, reclaiming name [19:28:38] Logged the message, RobH [19:31:15] !log authdns-update for transcode name updates [19:31:23] Logged the message, RobH [19:45:06] New patchset: Pyoungmeister; "apache overhaul: round two of responding to mark's comments" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16440 [19:45:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16440 [19:47:36] New patchset: Pyoungmeister; "significantly increase timeout on mw-sync." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/16441 [19:48:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16441 [19:48:48] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16441 [19:53:34] paravoid: an additional note on the ruby apps for code review. It's almost definite I'd end up needing to write the LDAP code for any replacement. [19:54:18] I don't mind ruby, but I really don't feel like learning it, as I don't really plan on using it much in the future [19:54:37] and I'm *really* tired of writing LDAP code for every single application we use [19:55:01] * Ryan_Lane doesn't understand how applications get this so wrong [19:56:17] heya [19:56:27] that's really the case for every app out there I think [19:56:35] haha [19:56:38] whatever the language [19:56:44] most people get ldap wrong [19:56:56] I know :( [19:57:02] incl. gerrit, although at least in gerrit it works for basic stuff [19:57:07] I've rewritten implementations in almost 20 apps [19:57:19] but the fact that I can't e.g. change my cn is very annoying [19:57:22] I haven't needed to change gerrit at all :D [19:57:29] or the ssh key stuff [19:57:36] ssh key functionality is missing [19:57:41] that's the only thing I'd need to add [19:57:48] it shouldn't let you change your CN [19:58:01] I asked you if I can change my cn in ldap [19:58:05] right [19:58:07] and you told me that gerrit would break :) [19:58:11] gerrit doesn't support user renames [19:58:13] that counts as broken to me [19:58:24] yes. that's broken [19:58:45] both those things are accepted bugs, thankfully, though [19:58:55] so they'll get fixed in some future version [19:59:00] I think user renames will be coming soon [19:59:04] ^demon: ^^ ?
[19:59:07] you didn't actually reply to my question though [19:59:18] which language would be actually ok for the majority of the team [19:59:18] <^demon> Um, scrollback, one sec. [19:59:30] actually, you even added a language/framework :-) [19:59:47] php, or python, mostly [19:59:52] I think Python has the best chance [19:59:58] likely [20:00:04] php? really? I've heard numerous ops people bash php [20:00:05] <^demon> Gerrit supports renames. And LDAP supports renames. Gerrit just doesn't support renames when using ldap. [20:00:08] php is good because the devs would more actively maintain it [20:00:17] ^demon: yep [20:00:32] paravoid: from a maintenance POV, PHP is really easy [20:00:51] so is python [20:00:52] we were talking about what ops hate or love though :) [20:00:58] * jeremyb waves binasher [20:01:10] there's a couple reasons for hating ruby [20:01:17] also, the code quality of most php applications is really reeeally bad [20:01:18] 1. it's really difficult to maintain [20:01:26] binasher: just wondering about OTRS myisam -> innodb. is that going to magically happen or should I make a ticket or? [20:01:27] 2. people seem to code horribly in ruby [20:01:28] oh I know, I have lots to say about ruby [20:01:36] I work in a distro, remember? [20:01:40] yeah :) [20:01:46] jeremyb: making a ticket would be a good reminder :) [20:01:52] for such an elegant language, it's hard to believe people suck so much in ruby [20:02:08] binasher: should you be cc'd on the initial mail? [20:02:09] I think it's due to the fact that ruby has frameworks that hide most of the hard work from you [20:02:14] binasher: ops-reqs?
[20:02:17] it's like the Active Directory of programming languages [20:02:38] it's not like Java is universally loved in the ops world though [20:02:55] you're one of the very few ops person that I know that doesn't go "ewwwww" when he hears Java actually :) [20:02:57] Java applications are generally easier to maintain than ruby ones [20:03:14] paravoid: jeremyb and I tried to ping you a while back, in case you didn't see it; has to do with some packaging review [20:03:15] depends, have you ever deployed/debugged JBoss apps? [20:03:16] drop a war file into a container service. or launch it directly [20:03:20] paravoid: please rank pentabarf [20:03:22] ;) [20:03:24] paravoid: I maintained jboss apps for years [20:03:26] it's easy [20:03:34] you drop the war file in, and it runs, usually [20:03:49] after you untar the 5gb or so in /opt [20:04:06] depends on how you have jboss configured [20:04:11] it can run the war directly [20:04:20] and you have hundreds of megabytes of logfiles in some random /opt location without logrotates or anything [20:04:31] o.O [20:04:42] and after taking about 5' to start after a restart (jboss as7 actually is much better at that) [20:04:45] none of this sounds accurate to me :) [20:04:49] (but also not very popular yet) [20:05:06] at my last job I managed 20+ jboss apps [20:05:27] in general it was just dropping in a config file and the war file [20:05:44] all the logs go into sane locations, and you use the system's logrotate to manage the log files [20:05:51] or you make jboss send it out to syslog [20:06:04] what do you mean sane? 
it's usually /opt/jboss/server/default/logs [20:06:06] or something like that [20:06:17] it's certainly not /var/log, because that's out of the container [20:06:23] the app can (and should) change that [20:07:04] usually they can't, because of tomcat's security policies [20:07:27] you can either use syslog, or modify the security policy [20:07:29] unless you configure that, but now we've crossed by quite a lot the realm of "easy to configure" :) [20:07:43] jeremyb: yes, ops-req [20:07:50] binasher: k [20:08:00] <^demon> Hypothetical "hard to configure" java apps aren't really relevant here. [20:08:06] also, have you ever tried packaging jboss? or jars? [20:08:07] <^demon> Fact is, gerrit is simple to deploy/configure. [20:08:09] indeed. gerrit is simple to configure [20:08:14] paravoid: yes. on red hat [20:08:27] opendj is easy to configure/deploy too [20:08:35] lucene also isn't very hard [20:08:45] <^demon> I've packaged gerrit 3-4 times today already ;-) [20:08:47] for opendj at least, you still have all the embedded by upstream jars [20:08:50] all of the java apps we use are pretty simple in that regard [20:09:04] paravoid: which is why I don't package it for upstream ;) [20:09:11] it's only 2-3 libraries, though [20:09:20] it probably wouldn't be amazingly hard to decouple [20:09:38] that means that we aren't really getting security updates [20:10:07] well, opendj themselves should be handling that [20:10:10] but yes, that's correct [20:10:18] we need to upgrade opendj, thinking of that [20:10:22] I have the new version packaged [20:10:28] I should do that at some point soon [20:10:36] I've tried packaging shibboleth in the past [20:10:42] it was pure hell [20:10:52] new version has an easier way to remove replication agreements [20:12:03] also, I hate how java apps tend to distribute binaries rather than source [20:12:13] yep. 
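[The approach Ryan_Lane describes above -- letting the system's logrotate manage the JBoss log files instead of relying on the container -- is a stock logrotate fragment. The path below is the hypothetical /opt location mentioned in the conversation, not a real deployment path.]

```
# /etc/logrotate.d/jboss (sketch; path is the hypothetical one from
# the discussion)
/opt/jboss/server/default/logs/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    # JBoss keeps its log files open, so truncate in place instead of
    # moving the file out from under it.
    copytruncate
}
```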
that's normal [20:12:18] I hate it :) [20:12:38] gerrit, opendj and lucene all distribute source [20:12:50] I build opendj from source to make the package [20:12:57] oh really? that's cool [20:12:58] I don't with gerrit, yet [20:12:59] it's unusual [20:13:14] the gerrit package was a rush job [20:13:28] <^demon> We could feasibly build our own gerrit *.war, it's not hard [20:13:38] ^demon: it would be ideal if we did [20:13:46] then we could patch if needed [20:13:59] I didn't say it is hard, we were talking about cultures :) [20:14:08] yeah [20:14:16] I prefer java's to ruby's [20:14:22] and I was saying that every culture has its weirdness [20:14:32] yep [20:14:46] ruby's is just too annoying to deal with [20:14:50] I've worked with Java, it was always less pleasant than e.g. Python, but I would never say no to a useful app just because it's in Java [20:14:54] binasher: im trying to wrap my head around how to set up a puppet manifest for the WLM API host. 1) is there a manifest for anything remotely similar to what we'll need to do that i can look at? 2) is manifests/misc the correct place to put our manifest? [20:15:09] * jeremyb has heard ruby has a lot of bits (e.g. mailing lists) where you must know japanese [20:15:10] eventually they'll start being stable, but until they do, I don't like dealing with them [20:15:23] jeremyb: their primary communication language is japanese [20:15:36] I think they've fixed that by now [20:15:36] Ryan_Lane: right [20:15:39] I don't necessarily have an issue with that [20:15:46] but still, the whole gem thing is the problem [20:15:49] yes [20:15:57] and the embedding of whatever version of rails in each and every project [20:16:01] <^demon> gems are just as bad as pear. [20:16:03] paravoid: they haven't fixed the stability issues [20:16:04] and every other dependency of course [20:16:10] <^demon> Same reason I'd loathe any suggestion relying on them. 
[20:16:12] because they have no api stability [20:16:14] they still break API compatibility in point releases [20:16:20] which is *bullshit* [20:16:21] I know [20:16:24] * jeremyb still wants paravoid to rank pentabarf... :P [20:16:35] never worked with penta, heard the worst about it [20:16:46] oh i thought you did [20:16:49] awjr: not sure, notpeter and Ryan_Lane and others have been more on top of app server puppetizing, and paravoid could be a good resource for general "puppet in labs" questions [20:17:07] thanks binasher [20:17:22] ok. need food [20:17:23] * Ryan_Lane waves [20:17:33] binasher: you should have mail. RT never seems to give me autoreplies with ticket #s so I can't give it to you ;-( (i assume that's by design) [20:20:08] yep, got it! [20:24:58] binasher: Did my email about ceph testing make sense? And/or did I maybe not actually send it to you? [20:25:55] andrewbogott: i'll reply later today [20:26:02] ok [20:26:37] you're looking at using it in the mounted block device way aka EBS, not as an object store, right? [20:27:00] notpeter, Ryan_Lane, paravoid: i need to put together a puppet manifest for for misc app server which we're setting up in labs and will hopefully be transitioning to a dedicated host for production. it will be the basic lamp stack for hosting an API for the wiki loves monuments mobile app. i'm not really familiar with puppet (other than looking through existing confs) - can any of you provide some guidance on how best to get started/where i [20:27:30] binasher: shared filesystem. Which I believe runs on top of EBS [20:43:31] binasher: thanks a ton to anyone involved in making Gerrit sooo fast. 
I end up wondering if my action has been correctly handled because the screen just appears :-] [20:43:54] ;) [20:44:24] Logged the message, Master [20:46:53] PROBLEM - Host ms-be10 is DOWN: PING CRITICAL - Packet loss = 100% [21:14:30] New patchset: awjrichards; "Disable mobile redirect for donate.wikimedia.org" [operations/debs/squid] (master) - https://gerrit.wikimedia.org/r/16449 [21:28:34] awjr: the gerrit link in your rt ticket doesn't work for me. can you add code reviewers in gerrit? [21:30:51] binasher sure. i've been having problems posting to RT. it will occasionally time out, and then when i go back to try posting again, my text is reformatted with escaped html entities, which looks like what happened [21:31:58] New review: Hashar; "Following a discussion with Faidon and Mark on 23rd, we will probably move that Debian package to pu..." [operations/debs/wikimedia-job-runner] (master) C: 0; - https://gerrit.wikimedia.org/r/11610 [21:35:06] Change merged: Asher; [operations/debs/squid] (master) - https://gerrit.wikimedia.org/r/16449 [21:36:47] Is there an RT ticket for the timedmediahandler etc transcode hardware? [21:37:47] reedy - https://rt.wikimedia.org/Ticket/Display.html?id=3298 [21:38:09] Thanks [21:44:07] New patchset: Asher; "updated as of https://gerrit.wikimedia.org/r/16449" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16453 [21:44:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16453 [21:46:17] awjr: how about changing the redirector to read its regex from a config file instead of being hardcoded? it'd be nice if we didn't have to compile and push out binaries for further changes [21:46:34] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16453 [21:46:44] binasher that is a good idea [21:49:31] awjr: and maybe a puppet manifest for managing the config file [21:50:08] ok.
i'll get it to make us all coffee too [21:50:36] uh [21:51:06] binasher you don't like coffee? [21:51:50] seems like it'd be cold by the time it got to sf, so make sure you use the coldbrew class [21:52:01] heh [21:53:44] awjr: but seriously the config file should be managed by puppet in this particular case [21:54:15] preilly yeah, sounds good [22:01:23] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [22:04:23] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [22:07:20] !log deploying new mobile redirector to eqiad squids [22:07:28] Logged the message, Master [22:51:34] New patchset: Alex Monk; "Remove weird unset statement from flaggedrevs.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16457 [22:59:18] New patchset: Ryan Lane; "Enforcing a regex for usernames in nslcd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16458 [23:00:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16458 [23:00:07] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16458 [23:15:22] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 210 seconds [23:29:28] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 6 seconds
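[binasher's suggestion above -- have the mobile redirector read its regex from a config file, with that file managed by puppet -- could be sketched as a file resource like the one below. The file name, source path, and service name are all hypothetical illustrations, not the actual manifest.]

```puppet
# Hypothetical manifest fragment: ship the redirector's regex config
# via puppet so pattern changes don't require rebuilding and pushing
# the redirector binary.
file { '/etc/squid/redirector-patterns.conf':
    owner  => 'root',
    group  => 'root',
    mode   => '0444',
    source => 'puppet:///files/squid/redirector-patterns.conf',
    # restart squid so its redirector children re-read the patterns
    notify => Service['squid'],
}
```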