[00:00:13] it's in SwiftFileBackend now (plus it uses a curl multi client in core in /libs)
[00:01:11] (PS2) Faidon Liambotis: Make temp URLs actually work (e.g. for private containers) [operations/puppet] - https://gerrit.wikimedia.org/r/106635 (owner: Aaron Schulz)
[00:01:26] alrighty, I'm deploying some lightnings
[00:01:33] the funny thing is that SwiftFileBackend actually has slightly less code in it ;)
[00:01:44] :)
[00:01:45] MaxSem: just one, right? ;)
[00:01:54] yup
[00:01:58] (CR) Faidon Liambotis: [C: 2] Swift: make temp URLs actually work [operations/puppet] - https://gerrit.wikimedia.org/r/106635 (owner: Aaron Schulz)
[00:02:26] gerrit is awfully slow
[00:03:23] maybe we should use methods = GET HEAD in the conf too
[00:03:30] y so sloww, dear gerrit?
[00:03:32] it's slightly scary that PUT is in there by default
[00:03:55] some sort of temp key leak could lead to data loss via overwriting files
[00:04:27] under [filter:tempurl] that is
[00:05:16] (CR) MaxSem: [C: 2] Enable new diff everywhere now that wikidiff2 has been upgraded [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106629 (owner: MaxSem)
[00:05:21] we filter methods in varnish too
[00:05:23] but it's a good point
[00:05:41] and we will probably never have a use case for PUT anyway
[00:05:50] paravoid: I'll make a patch
[00:05:52] (Merged) jenkins-bot: Enable new diff everywhere now that wikidiff2 has been upgraded [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106629 (owner: MaxSem)
[00:06:03] I was about to, but gerrit is getting on my nerves
[00:07:02] AaronSchulz: do you know if swift sends a proper Expires header for the URL?
[00:08:37] !log maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/106629'
[00:08:44] Logged the message, Master
[00:08:48] MaxSem: so what is this new diff feature? :)
[00:09:46] paravoid, https://en.m.wikipedia.org/wiki/Special:MobileDiff/589997839...589997897?mobileaction=beta
[00:10:13] * AaronSchulz checks
[00:10:54] (PS1) Aaron Schulz: Added a [filter:tempurl] section to the swift proxy [operations/puppet] - https://gerrit.wikimedia.org/r/106637
[00:11:11] okay, I'm done
[00:11:20] AaronSchulz: do these temp urls go through img_auth.php? or are they served to users via varnish too?
[00:11:57] they just get used by avconv internally at the moment
[00:12:03] ok
[00:12:04] there are no plans for giving them to users
[00:12:10] ok, good :)
[00:12:15] which wouldn't work anyway due to the private dns name
[00:12:40] so the cache headers probably don't matter then
[00:14:51] I wonder why all of our other filters have a use = egg...
[00:15:00] the docs say it's not needed
[00:24:31] paravoid, re backporting- in addition to node 0.10 (doneish) and textlive we'll need a recent phantomjs
[00:24:39] *texlive
[00:24:55] 1.9.0?
[00:25:20] yes, afaik that is the latest in debian
[00:25:43] at least what I have locally
[00:25:50] tracking unstable
[00:26:05] (PS4) BryanDavis: [WIP] Add logstash config for udp2log [operations/puppet] - https://gerrit.wikimedia.org/r/106154
[00:27:18] paravoid, nm- just checked on ruthenium and realized that 1.9.0 is already available
[00:30:48] "Changes newer than 29 seconds may not be shown in this list. "
[00:30:51] Hmm
[00:31:00] "Due to high database server lag, changes newer than 43 seconds may not be shown in this list. :
[00:31:05] Oh my
[00:31:10] And it's rising it seems
[00:31:18] Okay it stopped
[00:34:09] 2014-01-09 21:34:50 mw1108 mediawikiwiki: Memcached error for key "flow-tree:rootpath:050dad18cf43f4081a6a90b11c2793de" on server "127.0.0.1:11211": SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY
[00:34:52] Does this mean $wgMemc->get( key ) would fail?
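The temp URL thread above depends on how Swift's tempurl middleware validates a request: the signature is an HMAC-SHA1 over the HTTP method, the expiry timestamp and the object path, keyed with the temp URL key. Below is a minimal Python sketch of that scheme, assuming the newline-joined `method`, `expires`, `path` body described in the Swift tempurl docs. The key, host, account and container names are made up, and `temp_url` is a hypothetical helper, not SwiftFileBackend's code. It suggests why `methods = GET HEAD` matters: a leaked GET URL cannot be replayed as a PUT, but anyone holding the key can mint a PUT signature, so the proxy's allow-list is the real guard. Because the path is part of the signed string, any disagreement between client and proxy about its encoding is one plausible route to the "401 Unauthorized: Temp URL invalid" reported later.

```python
import hmac
import time
from hashlib import sha1
from urllib.parse import quote, urlencode


def temp_url_sig(key: bytes, method: str, expires: int, path: str) -> str:
    """HMAC-SHA1 over the method, expiry and path, joined by newlines."""
    body = f"{method}\n{expires}\n{path}".encode("utf-8")
    return hmac.new(key, body, sha1).hexdigest()


def temp_url(host: str, key: bytes, method: str, path: str, ttl: int = 300) -> str:
    """Build a temp URL (hypothetical helper, not MediaWiki's SwiftFileBackend).

    Assumption: the proxy verifies the signature against the decoded path,
    so the raw path is signed and only the emitted URL is percent-encoded.
    """
    expires = int(time.time()) + ttl
    sig = temp_url_sig(key, method, expires, path)
    query = urlencode({"temp_url_sig": sig, "temp_url_expires": expires})
    return f"https://{host}{quote(path)}?{query}"


if __name__ == "__main__":
    key = b"example-temp-url-key"  # made-up key
    path = "/v1/AUTH_example/media/7/74/SA_challenge_2013.jpg"  # made-up account/container
    # The method is inside the signed string, so a GET signature is useless for PUT...
    print(temp_url_sig(key, "GET", 1389312000, path) != temp_url_sig(key, "PUT", 1389312000, path))
    # ...but a leaked key can mint a PUT signature, which is why the proxy's
    # `methods` allow-list (GET HEAD only) is the real safeguard.
    print(temp_url("ms-fe.example.org", key, "GET", path))
```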
[00:35:32] found why RT is broken
[00:35:50] bleerghh
[00:37:00] bsitu: it means all memc ops would fail for a while on that server
[00:37:56] pecl avoids "memcached" for php in apache, but it's really avoiding twemproxy, which proxies to all the mc* boxes
[00:38:06] AaronSchulz: ah, thx
[00:38:06] *pecl memcached
[00:38:21] and by "that server" I mean mw1108
[00:38:52] paravoid: so....how about that swift cluster in tampa? :)
[00:39:47] give ma sec, I'm doing a bazillion things atm
[00:39:55] me a sec*
[00:42:45] (CR) Faidon Liambotis: "This is not what this commit does. The previous commit had the same commit message, never do that. The commit should had been titled "rt: " [operations/puppet] - https://gerrit.wikimedia.org/r/106114 (owner: RobH)
[00:42:48] RobH: ^
[00:43:07] (I have a reason for looking at your commits and this got me confused -- RT is broken atm)
[00:43:23] oh, it was workign post commit and apache restart... i thought
[00:43:32] i tested, did my stuff break or something else?
[00:43:41] it's not your fault exactly
[00:43:53] the mail->RT is broken
[00:43:59] we have mails in the queue from vendors etc. too :)
[00:44:09] it's because we have two certificates now, so we're doing SNI
[00:44:27] and the perl libraries in precise don't do SNI
[00:44:44] so your comment on my commit message is what, it shoudl read something like 'install the rt.wikimedia.org' rather than replace?
[00:44:52] since replace i guess means the apache file?
[00:44:57] no
[00:45:20] grr gerrit is slow
[00:45:27] so 8ddbd0760987b2b63681b0ffcfcba235a6db4ee8 is fine
[00:45:31] yea, im trying to look and its reallllly slow
[00:45:57] but then you did another fixup
[00:46:04] to move install_certificate from inside the if to outside
[00:46:15] oh, so my first was fine? i thought it wasnt working =P
[00:46:18] so i moved it due to that
[00:46:26] second guessed myself
[00:46:28] no, the commit message for the first is fine I mean
[00:46:39] it was buggy, and your second commit fixed it
[00:46:47] but your second commit's message should reflect that
[00:46:55] and referenced the other one somehow?
[00:46:57] say "Fixup on the previous commit" or something like that
[00:47:01] ok
[00:47:18] because I was looking at the logs, and found the "replace commit", viewed that, and it was all confusing
[00:47:26] yea, i understand what you mean
[00:47:27] I was like "this doesn't replace anything!?"
[00:47:35] took me a while :)
[00:47:47] so now its also broken due to certs?
[00:47:53] kind of, yes
[00:48:13] .....shit, uhh, does the mailserver use the wildcard?
[00:48:16] cuz i never touched it.
[00:48:28] (or unrelated, but it just occured to me now)
[00:48:47] You don't need to explain what is up with it now if you are busy fixing it, I can wait =]
[00:50:23] yup, give me a sec
[00:58:10] RobH: can you look at gerrit in the meantime?
[00:58:30] checkin
[01:00:23] its maxed out its memory with some java process
[01:00:56] which is of course, well, gerrit..
[01:01:34] Have you tried turning it off and on again?
[01:01:58] im going to restart the process yea, but was trying to see if i could determine what its doing first
[01:02:04] well restart gerrit that is
[01:02:21] i hope that doesnt break shit worse
[01:02:44] I presume you can restart it gracefully? (ie now kill -9 etC)
[01:02:47] !log gerrit is slowed down and memory is maxed out on system, restarting gerrit to see if it helps
[01:02:53] yea, its restarting gracefully
[01:02:53] Logged the message, RobH
[01:03:11] its stopping... i imagine this part takes awhile
[01:03:20] oh awesome
[01:03:28] it cleared, but its clear came through AFTER i restarted
[01:03:32] so its just gonna do what its doin.
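The RT breakage above comes down to SNI: with two certificates behind one address, Apache can only choose the right one if the client names the host in its TLS ClientHello, which the Perl libraries in precise did not do. As a rough Python analogue (not the Perl stack RT actually uses), the sketch below runs only the first step of a handshake through in-memory BIOs and checks that passing `server_hostname` puts the host name into the ClientHello. `rt.wikimedia.org` is just a sample name here; nothing is sent over the network.

```python
import ssl


def client_hello_for(hostname: str) -> bytes:
    """Return the bytes of the TLS ClientHello a client sends to `hostname`.

    Everything happens in memory BIOs; no socket is opened.
    """
    context = ssl.create_default_context()
    incoming = ssl.MemoryBIO()  # would hold bytes received from the server
    outgoing = ssl.MemoryBIO()  # collects bytes the client wants to send
    # server_hostname is what turns on SNI (and hostname checking).
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: ClientHello written, now waiting for ServerHello
    return outgoing.read()


if __name__ == "__main__":
    hello = client_hello_for("rt.wikimedia.org")
    print("SNI supported:", ssl.HAS_SNI)
    print("first byte is a TLS handshake record (0x16):", hello[:1] == b"\x16")
    print("host name visible in ClientHello:", b"rt.wikimedia.org" in hello)
```

A client without SNI sends a ClientHello that lacks this name, so the server can only fall back to its default certificate, which is presumably how mail->RT ended up failing.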
[01:03:36] and its back
[01:03:54] and still slow, wooo
[01:04:11] =/
[01:04:40] It should be alright when it warms up and consumes moar memory
[01:05:07] yea, its no longer maxed out atleast
[01:05:37] ok, its faster now
[01:05:59] !log gerrit seems faster again for now
[01:06:06] Logged the message, RobH
[01:14:10] (PS1) Faidon Liambotis: Add SSLCACertificatePath for rt & magnesium [operations/puppet] - https://gerrit.wikimedia.org/r/106640
[01:16:17] (PS2) Faidon Liambotis: Add SSLCACertificatePath for rt & magnesium [operations/puppet] - https://gerrit.wikimedia.org/r/106640
[01:16:18] (PS2) Faidon Liambotis: swift: don't allow PUT for temp URLs [operations/puppet] - https://gerrit.wikimedia.org/r/106637 (owner: Aaron Schulz)
[01:16:31] gerrit still seems really happy now, and my bloodsugar is craptastic, im afk for a short bit getting food.
[01:16:49] (CR) Faidon Liambotis: [C: 2] swift: don't allow PUT for temp URLs [operations/puppet] - https://gerrit.wikimedia.org/r/106637 (owner: Aaron Schulz)
[01:17:02] RobH: ping me later if you want to know what was broken with RT (it's complicated)
[01:17:16] (CR) Faidon Liambotis: [V: 2] swift: don't allow PUT for temp URLs [operations/puppet] - https://gerrit.wikimedia.org/r/106637 (owner: Aaron Schulz)
[01:17:33] (CR) Faidon Liambotis: [C: 2] Add SSLCACertificatePath for rt & magnesium [operations/puppet] - https://gerrit.wikimedia.org/r/106640 (owner: Faidon Liambotis)
[01:19:51] !log apt: installing trusty's libio-socket-ssl-perl into precise-wikimedia; SNI-capable
[01:19:59] Logged the message, Master
[01:21:09] !log restoring RT's mail interface; expect delayed RT email to arrive soon
[01:21:16] Logged the message, Master
[01:29:03] ottomata: hey
[01:31:58] (PS1) Springle: Reduce LB general traffic during reindexing/partitioning on groupLoadsBySection slaves. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106644
[01:32:44] (CR) Springle: [C: 2] Reduce LB general traffic during reindexing/partitioning on groupLoadsBySection slaves. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106644 (owner: Springle)
[01:32:52] (Merged) jenkins-bot: Reduce LB general traffic during reindexing/partitioning on groupLoadsBySection slaves. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106644 (owner: Springle)
[01:34:00] !log springle synchronized wmf-config/db-eqiad.php 'Reduce LB general traffic during reindexing/partitioning on groupLoadsBySection slaves'
[01:34:06] Logged the message, Master
[01:36:08] AaronSchulz: I'm restarting all Swift frontends for tempurl
[01:36:24] ok
[01:36:32] I was too quick on the depool/restart/pool one of them and it spammed exception.log, ignore that
[01:37:16] ok, it's done
[01:41:21] !log maxsem synchronized php-1.23wmf9/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/#/c/106639/'
[01:41:30] Logged the message, Master
[01:42:24] paravoid: confirmed
[01:43:12] and fudging the url/sig slightly gives a 401 so that's good
[01:43:36] I need to do a swift sprint sometime soon
[01:43:54] upgrade to latest version, switch to using a proper puppet module, remove the whole syslog crap etc.
[01:43:57] so how much do we are about tampa being well synced?
[01:45:05] * AaronSchulz made https://gerrit.wikimedia.org/r/#/c/106638/ to help with that, since the timestamp based mode is slow and sub-optimal after switching around "masters"
[01:45:21] RECOVERY - Disk space on ms-be11 is OK: DISK OK
[01:45:41] of course swiftrepl could do this now (e.g. clean up any discrepancies)
[01:45:42] !log maxsem synchronized php-1.23wmf10/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/#/c/106639/'
[01:45:49] Logged the message, Master
[01:46:24] !log swift: set_weight 0 to ms-be11/sde1, disk failed
[01:46:30] Logged the message, Master
[01:47:12] wait, swift returns md5s on the listings?
[01:47:21] holy crap
[01:47:21] PROBLEM - MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:47:21] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:47:38] springle: ^^
[01:48:13] research slave
[01:49:11] RECOVERY - Disk space on wtp1014 is OK: DISK OK
[01:49:11] RECOVERY - Disk space on wtp1019 is OK: DISK OK
[01:49:11] RECOVERY - Parsoid on wtp1024 is OK: HTTP OK: HTTP/1.1 200 OK - 970 bytes in 0.004 second response time
[01:49:12] RECOVERY - Disk space on wtp1004 is OK: DISK OK
[01:49:21] RECOVERY - Disk space on wtp1007 is OK: DISK OK
[01:49:22] RECOVERY - Parsoid on wtp1010 is OK: HTTP OK: HTTP/1.1 200 OK - 970 bytes in 0.003 second response time
[01:49:31] RECOVERY - Disk space on wtp1024 is OK: DISK OK
[01:49:51] RECOVERY - Disk space on wtp1010 is OK: DISK OK
[01:50:11] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[01:50:11] RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[01:50:22] looks like parsoid had filled the disks with logs
[01:50:44] yes, three days ago
[01:50:52] I pinged you back then but didn't follow up
[01:50:57] I just cleaned them up
[01:51:04] thanks
[01:51:27] with the upstart and log rotation stuff in puppet we'll hopefully end that misery soon
[01:51:33] yup
[01:51:47] I'm not complaining :)
[01:52:06] this is something that we long should had done within ops
[01:52:22] we should had done it long time ago I mean
[01:52:45] well, either way we are getting there ;)
[01:53:41] (CR) Faidon Liambotis: "Brandon, what do you think? Can/should we fix this in the vmod?" [operations/puppet] - https://gerrit.wikimedia.org/r/102887 (owner: Yurik)
[01:55:35] (CR) Faidon Liambotis: "I too find the abstraction a bit too much." [operations/puppet] - https://gerrit.wikimedia.org/r/83768 (owner: Dzahn)
[01:59:11] RECOVERY - Parsoid on wtp1014 is OK: HTTP OK: HTTP/1.1 200 OK - 970 bytes in 0.007 second response time
[02:01:31] RECOVERY - Parsoid on wtp1019 is OK: HTTP OK: HTTP/1.1 200 OK - 970 bytes in 0.006 second response time
[02:05:05] AaronSchulz: you have a change for 1.23wmf10, were you planning on deploying it?
[02:06:59] (CR) Faidon Liambotis: [C: -1] "Pretty good. Inline comments." (6 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/106471 (owner: Subramanya Sastry)
[02:07:55] robla the Flow fix is just a sync-file, but I understand its bad form to leave another's commit on tin undeployed. What should we do?
[02:08:41] AaronSchulz: ping
[02:09:26] paravoid: do you know if the change AaronSchulz is working on is to fix an urgent issue?
[02:09:58] which change is that?
[02:10:12] RECOVERY - Parsoid on wtp1004 is OK: HTTP OK: HTTP/1.1 200 OK - 970 bytes in 0.003 second response time
[02:10:21] robla: I don't know, no
[02:10:52] paravoid: https://gerrit.wikimedia.org/r/#/c/106638/
[02:11:07] he self-merged it, backported to wmf10, and apparently it's on tin now
[02:11:11] oh, this is not urgent at all
[02:11:15] but it can be deployed
[02:11:22] copyFileBackend runs on a cron
[02:11:46] er, not even that, it doesn't even run until we run it
[02:11:56] ah, ok
[02:12:24] it's for copying files between filebackends, i.e. between pmtpa & eqiad right now
[02:12:48] which right now means purely reconciling differences for data integrity reasons
[02:13:09] spage: is the change on tin some variant of https://gerrit.wikimedia.org/r/#/c/106638/ or something else?
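On "swift returns md5s on the listings?" and reconciling pmtpa against eqiad: a container listing requested with `format=json` gives each object's `name`, `bytes`, `last_modified` and `hash`. The hash is the ETag, which for ordinary non-segmented objects is the MD5 of the content. That allows a cheap comparison pass without downloading any objects. The sketch below is a hypothetical helper, not copyFileBackend.php or swiftrepl. It assumes both listings have already been fetched and decoded into lists of dicts, and it ignores the `marker`-based pagination that a real listing of a large container would need. The sample data is made up.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def index_by_name(listing: Iterable[Mapping]) -> Dict[str, str]:
    """Map object name -> hash from a decoded JSON container listing."""
    return {obj["name"]: obj["hash"] for obj in listing}


def diff_listings(
    source: Iterable[Mapping], target: Iterable[Mapping]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare listings of the same container on two clusters.

    Returns (missing, changed, extra):
      missing - in source but not in target (needs copying)
      changed - in both, but hashes differ (needs re-copying)
      extra   - only in target (candidate for deletion)
    """
    src = index_by_name(source)
    dst = index_by_name(target)
    missing = sorted(name for name in src if name not in dst)
    changed = sorted(name for name in src if name in dst and src[name] != dst[name])
    extra = sorted(name for name in dst if name not in src)
    return missing, changed, extra


if __name__ == "__main__":
    eqiad = [
        {"name": "7/74/a.jpg", "hash": "aaa", "bytes": 10},
        {"name": "7/74/b.jpg", "hash": "bbb", "bytes": 20},
        {"name": "7/74/c.jpg", "hash": "ccc", "bytes": 30},
    ]
    pmtpa = [
        {"name": "7/74/b.jpg", "hash": "bbb", "bytes": 20},
        {"name": "7/74/c.jpg", "hash": "old", "bytes": 29},
        {"name": "7/74/d.jpg", "hash": "ddd", "bytes": 40},
    ]
    print(diff_listings(eqiad, pmtpa))
    # (['7/74/a.jpg'], ['7/74/c.jpg'], ['7/74/d.jpg'])
```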
[02:13:31] RECOVERY - Parsoid on wtp1007 is OK: HTTP OK: HTTP/1.1 200 OK - 970 bytes in 0.007 second response time
[02:13:47] robla yes, that's it, only changes maintenance/copyFileBackend.php
[02:13:58] ok...go ahead and sync-file your change
[02:14:49] robla OK, so that one file will be on tin but not on cluster after our sync-file. I'll e-mail Aaron Schulz and Greg-G. Thanks a bunch
[02:14:59] spage: no prob
[02:15:25] paravoid: thanks for clearing that up. I didn't notice it was in the maintenance directory as I skimmed it
[02:18:16] robla: I don't think that was a self-merge, BTW.
[02:18:22] It looks like Bryan D. +2'd it.
[02:19:16] Gloria: you sure about that? Bryan is on the list, but Aaron has the checkmark
[02:19:34] robla: Read the comments, which is basically the history. :P
[02:19:39] Gerrit is stupid.
[02:19:56] It has no actual history. You're just supposed to guess at what happened or pray for an e-mail.
[02:20:06] It --> Gerrit, to be clear.
[02:20:22] ah, I see now
[02:20:52] it was +2 by bryan, +1 by parent5446, rebased and self-merged
[02:21:29] which isn't stupid; the rebase *is* a different commit
[02:21:41] The stupid part is Gerrit's lack of an actual changeset history.
[02:21:48] It says all that, but in the comments.
[02:22:00] As a hack, as far as I'm concerned.
[02:22:38] I don't gerrit is that bad there
[02:22:53] reading the comments is easy enough...unless you have like 50 versions
[02:23:05] Well, it also aggressively collapses the comments.
[02:23:11] And is generally hostile toward conversation.
[02:23:20] So maybe that's the intent, heh. That the comments are from Gerrit alone.
[02:23:27] It could use love.
[02:23:41] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:24:53] AaronSchulz: if we were doing automated stats on self-review, that would have been caught up in the net
[02:25:19] !log spage synchronized php-1.23wmf10/extensions/Flow/includes/Repository/TreeRepository.php 'Flow cache key fix to 1.23wmf10'
[02:25:27] short of parsing the comments, we'd have no way of knowing what was a true self-review, and what was a manual merge after a peer review
[02:25:45] robla: FWIW the sync-common on mw1017 for final sanity check of the fix took "ages". (time flies when you're having fun :) )
[02:25:58] testing fix in prod now
[02:25:59] Stats are often wrong.
[02:28:26] !log LocalisationUpdate completed (1.23wmf9) at Fri Jan 10 02:28:25 UTC 2014
[02:28:31] PROBLEM - HTTP on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:29:25] Perhaps morebots shouldn't be on wikitech.
[02:29:51] robla: the reviewers +2ed the change, which is the codediff + parents, though. you can't assume that the reviewers are ok with +2ing the rebase
[02:30:27] robla: so it *is* a self-review, just with good reason. which is okay :)
[02:30:43] Gloria: yeah, see #-labs for what's happening with virt0
[02:30:52] I just saw. I heh'd pretty hard.
[02:31:48] robla our Flow fix helped, thanks soooo much!
[02:33:32] so did my engineering list post just go to dev/null?
[02:35:36] hmm, it went to engineering@wikimedia.org instead of engineering@list.wikimedia.org
[02:35:54] paravoid: arg, what is the former?
[02:35:56] AaronSchulz: reading it now.
[02:36:14] ok, so that worked, maybe I just don't get me own messages back from there
[02:41:45] AaronSchulz: did you see ^ about your maintenance/copyFileBackend.php change? I updated 1.23wmf10 on tin to sync one file and that showed up.
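The self-review exchange above turns on one distinction: a +2 on the final patch set does not by itself separate a peer-approved merge from an owner's +2 on a rebase that a peer had already approved. If per-patch-set votes were available, computing that distinction would be simple. The sketch below uses a made-up `Approval` record rather than Gerrit's real REST schema. The category labels and the patch set numbers in the example are illustrative; the log does not give the actual numbers.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Approval:
    """One Code-Review vote (hypothetical record, not Gerrit's API schema)."""
    patch_set: int
    reviewer: str
    code_review: int  # -2 .. +2


def classify_merge(owner: str, final_patch_set: int, approvals: Iterable[Approval]) -> str:
    """Label how the final patch set of a change came to carry a +2."""
    votes = list(approvals)
    final_plus2 = {a.reviewer for a in votes
                   if a.patch_set == final_patch_set and a.code_review == 2}
    earlier_peer_plus2 = any(a.patch_set < final_patch_set and a.code_review == 2
                             and a.reviewer != owner for a in votes)
    if final_plus2 - {owner}:
        return "peer-approved"
    if owner in final_plus2 and earlier_peer_plus2:
        # e.g. a peer +2'd an earlier patch set, the owner rebased and +2'd the rebase
        return "self-merged after earlier peer +2"
    if owner in final_plus2:
        return "self-approved"
    return "not approved"


if __name__ == "__main__":
    history = [
        Approval(1, "BryanDavis", 2),
        Approval(1, "parent5446", 1),
        Approval(2, "Aaron Schulz", 2),  # rebased patch set, +2 by the owner
    ]
    print(classify_merge("Aaron Schulz", 2, history))
    # self-merged after earlier peer +2
```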
[02:43:12] yeah you can ignore that
[02:43:48] it was taking jenkins a while to merge...and then I forgot what I was waiting on and got a burrito
[02:46:21] PROBLEM - MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:46:21] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:47:11] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[02:47:11] RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[02:50:10] researchers sure know how to hammer a poor defenceless db slave
[02:50:17] hehe
[02:55:47] (CR) Chad: [C: 2] Turn off automatically searching commons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106537 (owner: Manybubbles)
[02:55:51] hey, we're monitoring logs and seeing steady exceptions from "ApiFormatXml::recXmlPrint: (P248, ...)" (looks like Wikidata 'snaks'), is this a known issue
[02:55:57] (Merged) jenkins-bot: Turn off automatically searching commons [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106537 (owner: Manybubbles)
[02:57:25] aude, Lydia_WMDE : ^ ?
[02:58:01] !log demon synchronized wmf-config/CirrusSearch-common.php 'Disable commons search results for performance.'
[02:58:24] <^d> !log That was supposed to go out during today's LD before I dove down a rabbit hole and forgot
[02:58:40] !log LocalisationUpdate completed (1.23wmf10) at Fri Jan 10 02:58:40 UTC 2014
[03:01:31] RECOVERY - HTTP on virt0 is OK: HTTP OK: HTTP/1.1 302 Found - 457 bytes in 0.072 second response time
[03:01:31] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.823 second response time
[03:15:28] paravoid: so is there any kind of long term plan for tampa yet?
[03:15:53] long term, it's going to be shut down
[03:16:03] (:P)
[03:16:17] but in the meantime, the current thinking is to keep a copy of all data
[03:16:28] and start moving them to the new DC
[03:29:23] paravoid: do we want swift failover support now or just to have copies around?
[03:30:14] and would just having swiftrepl on cron be good enough?
[03:32:04] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Jan 10 03:32:03 UTC 2014
[03:32:10] Logged the message, Master
[04:41:59] (CR) Subramanya Sastry: "Thanks for flagging the bootstrapping issue." [operations/puppet] - https://gerrit.wikimedia.org/r/106471 (owner: Subramanya Sastry)
[05:45:51] (PS1) Springle: Assign db1055 to S1 [operations/puppet] - https://gerrit.wikimedia.org/r/106660
[05:47:15] (CR) Springle: [C: 2] Assign db1055 to S1 [operations/puppet] - https://gerrit.wikimedia.org/r/106660 (owner: Springle)
[05:55:57] !log xtrabackup clone db1050 to db1055
[05:56:04] Logged the message, Master
[06:06:59] PROBLEM - mysqld processes on db1055 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[06:48:46] ori: can you please explian what is somehow? https://gerrit.wikimedia.org/r/#/c/100760/8/modules/subversion/manifests/init.pp
[06:57:32] (CR) Faidon Liambotis: [C: -1] "LGTM in general, very minor comments." (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/106170 (owner: BryanDavis)
[06:59:15] (CR) Faidon Liambotis: "I think I'd prefer just calling it logstash, since that's how everyone's calling it. If alternative interfaces emerge, we probably host th" [operations/dns] - https://gerrit.wikimedia.org/r/105105 (owner: BryanDavis)
[07:09:57] (CR) Faidon Liambotis: "A few minor comments inline." (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/106169 (owner: BryanDavis)
[07:10:04] (CR) Faidon Liambotis: [C: -1] Kibana puppet class [operations/puppet] - https://gerrit.wikimedia.org/r/106169 (owner: BryanDavis)
[07:29:29] spage: thx - aude was working on that
[07:34:42] paravoid: https://bugzilla.wikimedia.org/show_bug.cgi?id=59894 hrm :/
[07:34:56] curl gives me '401 Unauthorized: Temp URL invalid'
[07:35:41] odd since that worked find in eval.php earlier today
[07:39:44] heh, still works in eval
[07:42:24] ottomata: did you note: https://gerrit.wikimedia.org/r/#/c/106505/ ?
[07:49:57] hm, some kind of encoding issue, 7/74/SA_challenge_2013.jpg works fine
[07:51:04] (PS1) Springle: Assign db1060 to S2 (future master) [operations/puppet] - https://gerrit.wikimedia.org/r/106663
[07:52:58] (CR) Springle: [C: 2] Assign db1060 to S2 (future master) [operations/puppet] - https://gerrit.wikimedia.org/r/106663 (owner: Springle)
[07:56:16] (PS2) Faidon Liambotis: Sync EQIAD before PMTPA [operations/puppet] - https://gerrit.wikimedia.org/r/105006 (owner: Reedy)
[07:56:48] (CR) Faidon Liambotis: [C: 2 V: 2] "Despite my objections, it doesn't hurt, so merging." [operations/puppet] - https://gerrit.wikimedia.org/r/105006 (owner: Reedy)
[07:57:16] heh, I wondering about the shuffling :)
[07:57:24] hey
[07:57:36] though not all the rsync stuff uses that
[07:57:54] that's what my objections were about :)
[07:57:57] see comments
[07:58:48] !log xtrabackup clone db1018 to db1060
[07:58:54] Logged the message, Master
[07:59:10] (PS2) Faidon Liambotis: remove chapter domains from DNS that are not owned by the WMF [operations/dns] - https://gerrit.wikimedia.org/r/86659 (owner: Dzahn)
[07:59:19] (PS3) Faidon Liambotis: Remove chapter domains that aren't delegated to us [operations/dns] - https://gerrit.wikimedia.org/r/86659 (owner: Dzahn)
[08:00:17] (CR) Faidon Liambotis: [C: 2] Remove chapter domains that aren't delegated to us [operations/dns] - https://gerrit.wikimedia.org/r/86659 (owner: Dzahn)
[08:02:17] (CR) Faidon Liambotis: [C: -2] "I don't think we need this. This was proposed during a bits outage that was because origin started throwing 40x because of a code deploys " [operations/puppet] - https://gerrit.wikimedia.org/r/95534 (owner: Mark Bergsma)
[08:02:28] (Abandoned) Faidon Liambotis: Double session limit for bits caches [operations/puppet] - https://gerrit.wikimedia.org/r/95534 (owner: Mark Bergsma)
[08:44:21] hello
[08:44:27] hi
[08:44:29] feeling better?
[08:48:17] yeah a bit
[08:48:37] basically slept non stop for the last 36 hours or so and barely eat anything
[08:48:40] goood time
[09:33:13] (PS1) Dzahn: shell and deployment rights for Kartik Mistry [operations/puppet] - https://gerrit.wikimedia.org/r/106668
[09:34:13] paravoid: akosiaris , feel like checking that?
[09:34:22] i suppose you also know Kartik from Debian?
[09:37:19] mutante: LGTM
[09:37:33] akosiaris: thx
[09:37:39] but no... I dont know kartik :-(
[09:37:53] paravoid probably does though. He seems to know everybody
[09:38:22] i just assume it if somebody is a Debian Dev
[09:38:26] :)
[09:38:38] mutante: sent you an email
[09:39:44] matanya: looking
[10:02:23] (PS5) Faidon Liambotis: Varnish: don't mobile redirect www.$project.org [operations/puppet] - https://gerrit.wikimedia.org/r/89879 (owner: JanZerebecki)
[10:02:29] (PS1) Faidon Liambotis: varnish: simplify the mobile redirect regexp [operations/puppet] - https://gerrit.wikimedia.org/r/106669
[10:02:40] MaxSem: can I use your insight for a moment? :)
[10:02:48] sure
[10:03:01] do I have any credits left or should I help you with WAP first? :P
[10:03:29] i was wondering about WAP
[10:03:37] 5003
[10:04:17] MaxSem: it's the two patchsets above
[10:05:55] (CR) Alexandros Kosiaris: svn: convert into a module (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/100760 (owner: Matanya)
[10:13:58] paravoid, at least it was possible to unit-test the redirector:P
[10:19:52] (CR) Alexandros Kosiaris: [C: 2] Removed the unneeded now wikidiff2.ini override [operations/puppet] - https://gerrit.wikimedia.org/r/106510 (owner: Alexandros Kosiaris)
[10:21:19] (CR) MaxSem: [C: 1] Varnish: don't mobile redirect www.$project.org [operations/puppet] - https://gerrit.wikimedia.org/r/89879 (owner: JanZerebecki)
[10:23:34] (CR) MaxSem: [C: 1] "Fuck hell yeah." [operations/puppet] - https://gerrit.wikimedia.org/r/106669 (owner: Faidon Liambotis)
[10:29:32] paravoid, but yeah - vetting my WAP plans would be muchly appreciated:)
[10:36:50] (CR) Matanya: svn: convert into a module (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/100760 (owner: Matanya)
[10:38:23] akosiaris: did you understand what ori meant about moving '/etc/apache2/sites-available/svn' to the role?
[10:40:42] matanya: yes. That manifests/role/subversion should have that resource
[10:41:07] yeah, but why
[10:41:36] hmm. So the general idea is that we should strive to create modules that are not wmf specific
[10:41:37] MaxSem: yeah, I'm terribly sorry about the delay... it hasn't been easy.
[10:41:41] if possible that is
[10:41:52] not everyone loves that idea btw
[10:42:13] it's not just not specific to wmf, it's also not specific to deployment
[10:42:25] what if you want to set up an svn server for a different purpose, not svn.wm.org, for instance?
[10:42:31] it's an unlikely scenario nowadays, but these things happen
[10:43:25] that was my second point
[10:49:16] (PS3) Alexandros Kosiaris: applicationserver: pass puppetlint / retab [operations/puppet] - https://gerrit.wikimedia.org/r/104920 (owner: Hashar)
[11:04:50] (CR) Alexandros Kosiaris: [C: 2] applicationserver: pass puppetlint / retab [operations/puppet] - https://gerrit.wikimedia.org/r/104920 (owner: Hashar)
[11:11:36] (CR) Faidon Liambotis: [C: -1] Collection Renderer (Now a module!) (16 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/102352 (owner: Mwalker)
[11:15:37] (PS2) Dzahn: shell and deployment rights for Kartik Mistry [operations/puppet] - https://gerrit.wikimedia.org/r/106668
[11:23:09] paravoid: ori, re: gdash migration. is this one also fixed? (#5922: fenari used as a proxy for graphite installation)
[11:23:14] yes
[11:23:22] cool, closing another one:)
[11:23:25] thanks
[11:39:31] (CR) Dzahn: [C: 2] "< akosiaris> mutante: LGTM | < apergos> indentation :-/ (fixed in PS2) < apergos> otherwise lgtm | looks good" [operations/puppet] - https://gerrit.wikimedia.org/r/106668 (owner: Dzahn)
[11:52:06] !log welcome new software deployer Kartik from Language Engineering
[11:52:12] Logged the message, Master
[12:18:49] kart_: welcome :)
[12:48:08] PROBLEM - MySQL Processlist on db1059 is CRITICAL: CRIT 0 unauthenticated, 66 locked, 0 copy to table, 0 statistics
[12:58:08] RECOVERY - MySQL Processlist on db1059 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 0 statistics
[13:14:18] PROBLEM - MySQL Processlist on db1006 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 119 statistics
[13:15:18] RECOVERY - MySQL Processlist on db1006 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 3 statistics
[13:56:41] !log Gerrit: adding WikidataJenkins user to groups "Non-Interactive Users" and "Wikidata"
[13:56:48] Logged the message, Master
[13:59:57] RobH: wanna complete the integration/doc migration ?
[14:00:03] RobH: I went sick yesterday sorry :(
[14:16:38] (PS1) Faidon Liambotis: Add librenms module and role class & apply it [operations/puppet] - https://gerrit.wikimedia.org/r/106694
[14:17:45] mark: wanna have a look ^^ ?
[14:17:53] was already on it
[14:17:58] I didn't configure SSL, are we going to do misc-web-lb?
[14:18:07] maybe we shouldn't, like for other monitoring tools
[14:19:02] your defs support it nicely I see, it shouldn't be a big deal
[14:23:54] yeah perhaps not here
[14:46:32] (PS2) Faidon Liambotis: Add librenms module and role class & apply it [operations/puppet] - https://gerrit.wikimedia.org/r/106694
[14:46:34] now I just have to learn the deployment system
[14:46:34] fun
[14:46:34] (PS1) Faidon Liambotis: webserver::apache: misc SSL fixes [operations/puppet] - https://gerrit.wikimedia.org/r/106700
[14:46:36] (PS1) Faidon Liambotis: librenms: install SSL certificate & enable vhost [operations/puppet] - https://gerrit.wikimedia.org/r/106701
[14:47:07] (CR) jenkins-bot: [V: -1] Add librenms module and role class & apply it [operations/puppet] - https://gerrit.wikimedia.org/r/106694 (owner: Faidon Liambotis)
[14:47:21] (CR) jenkins-bot: [V: -1] librenms: install SSL certificate & enable vhost [operations/puppet] - https://gerrit.wikimedia.org/r/106701 (owner: Faidon Liambotis)
[14:47:28] oh
[14:47:36] (CR) jenkins-bot: [V: -1] webserver::apache: misc SSL fixes [operations/puppet] - https://gerrit.wikimedia.org/r/106700 (owner: Faidon Liambotis)
[14:47:36] is it your intention to use $hostname where it's confusing with $::hostname? :)
[14:48:04] -1?
[14:48:15] yes, it is
[14:48:20] my intention
[14:49:00] (CR) Mark Bergsma: Add librenms module and role class & apply it (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/106694 (owner: Faidon Liambotis)
[14:49:12] why not use another name?
[14:50:28] I didn't because it doesn't confuse me, but I'll rename if it confuses you :)
[14:50:33] any suggestions?
[14:50:54] "$fqdn"?
[14:51:02] no that's another facter variable ;)
[14:51:04] same issue, I assume?
[14:51:06] yeah :) [14:51:08] use $site_hostname or smt [14:51:16] or sitename or whatever [14:51:24] sitename sounds good [14:51:26] or $vhostname [14:51:45] it confused me for two seconds [14:51:47] that's enough ;p [14:52:25] and as for observium, mod rewrite just for this, bleergh [14:52:56] mod_rewrite? [14:53:00] why not just redirect? [14:53:07] oh separate site and redirect [14:53:10] yeah I guess that could work [14:53:16] yeah [14:53:38] and then get a SAN certificate? :) [14:53:46] meh [14:53:49] no ;p [14:53:56] a cert warning is fine hehe [14:54:08] or just no SSL [14:54:14] or that indeed [15:00:08] (03PS3) 10Faidon Liambotis: Add librenms module and role class & apply it [operations/puppet] - 10https://gerrit.wikimedia.org/r/106694 [15:00:10] (03PS2) 10Faidon Liambotis: webserver::apache: misc SSL fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/106700 [15:00:12] (03PS2) 10Faidon Liambotis: librenms: install SSL certificate & enable vhost [operations/puppet] - 10https://gerrit.wikimedia.org/r/106701 [15:00:44] (03CR) 10jenkins-bot: [V: 04-1] Add librenms module and role class & apply it [operations/puppet] - 10https://gerrit.wikimedia.org/r/106694 (owner: 10Faidon Liambotis) [15:00:50] oh cmon [15:00:58] (03CR) 10jenkins-bot: [V: 04-1] librenms: install SSL certificate & enable vhost [operations/puppet] - 10https://gerrit.wikimedia.org/r/106701 (owner: 10Faidon Liambotis) [15:03:04] (03PS4) 10Faidon Liambotis: Add librenms module and role class & apply it [operations/puppet] - 10https://gerrit.wikimedia.org/r/106694 [15:03:06] (03PS3) 10Faidon Liambotis: librenms: install SSL certificate & enable vhost [operations/puppet] - 10https://gerrit.wikimedia.org/r/106701 [15:09:34] (03PS5) 10Faidon Liambotis: Add librenms module and role class & apply it [operations/puppet] - 10https://gerrit.wikimedia.org/r/106694 [15:09:36] (03PS4) 10Faidon Liambotis: librenms: install SSL certificate & enable vhost [operations/puppet] - 
https://gerrit.wikimedia.org/r/106701 [15:11:04] (CR) Faidon Liambotis: [C: 2] webserver::apache: misc SSL fixes [operations/puppet] - https://gerrit.wikimedia.org/r/106700 (owner: Faidon Liambotis) [15:13:55] so, how do I push the whole mirror in one go [15:14:13] making sure at the same time each of the thousands of commits won't become a changeset? :P [15:15:58] paravoid: direct push by-passing gerrit? [15:16:48] jzerebecki: how did you push when you imported commons app into that test gerrit box [15:18:35] " if you have direct access to Gerrit's repository location (e.g. via SSH or on the local filesystem), you can just push directly into the repository (whereever Gerrit created it), bypassing Gerrit entirely. You'll need to flush the Gerrit caches afterwards, however, for Gerrit to notice that the repository HEADs have updated." [15:19:16] and you will need a couple of extra powers like push -f i think [15:19:50] I've done this before, after fighting with gerrit for 2 hours (my first 2 gerrit hours ...) [15:20:22] heh, and in my case i installed a test gerrit and jzerebecki pushed it in [15:22:29] i think create reference right is needed... [15:22:54] and push and forge author, committer [15:23:04] https://stackoverflow.com/questions/14789666/import-repository-from-git-to-gerrit [15:26:10] maybe the "just copy the .git file" as well? see last answer [15:26:26] "copy the xy.git Directory of the git repository to the directory where gerrit deposits the git repos. After restart of gerrit process the new project is in the list of new projects" [15:28:36] heh [15:28:57] I wonder how happy is gerrit after all that messing with [15:30:47] hiaaaa, akosiaris, i want to add a .deb from cdh4 that we don't currently have [15:30:50] what's the proper thing to do? [15:30:58] can i just use reprepro? or [15:31:07] reprepro update [15:31:20] oh right and there is a file with a list of packages?
[15:31:21] yeahhh [15:31:38] ./modules/install-server/files/reprepro/updates [15:31:45] gooot it, danke [15:31:53] :-) [15:32:30] the grep-dctrl there is tracking the packages who have a source package of whatever is in that regexp [15:32:42] so don't put the binary package name directly, it won't work [15:32:51] yahh, right [15:32:54] just add to the regex, ja? [15:32:58] yes [15:34:30] ottomata: varnishkafka nagios warnings [15:34:41] ^d: ping? [15:35:19] ottomata: two other things for you [15:35:39] ottomata: first is, are we upgrading to a newer CDH? 4.5 is out [15:35:54] ottomata: second is, have you explored the possibility of mounting HDFS over LUKS? [15:35:57] <^d> ottomata: Sup? [15:36:02] ^d: that was me :) [15:36:08] <^d> Yes, I see now. [15:36:11] ^d: I want to import a project from github onto ops/software/ [15:36:16] I created the repo in gerrit [15:36:21] what's the best way to import? [15:36:37] <^d> clone from github, adjust ACL on repo so you can directly push, push to gerrit. [15:37:19] paravoid, yeah I know about the vk warnings [15:37:21] this is the problem [15:37:27] https://rt.wikimedia.org/Ticket/Display.html?id=6602 [15:37:35] esams gmond runs on a custom port [15:37:42] ganglios uses hardcoded 8649 [15:37:45] ^d: adjust ACL how? [15:38:06] ottomata: why does it use a custom port? what do we need ganglios for? [15:38:20] i'm not sure, i think mark set this up maybe? [15:38:23] <^d> paravoid: Second, I'll show you. [15:38:31] it looks like esams uses unicast ganglia [15:38:35] all aggregators run on hooft [15:38:39] multiple gmond instances there [15:38:56] i am using ganglios to get the varnishkafka stats into icinga [15:39:00] someone else showed me before, must have been months ago :/ [15:39:16] rather than writing some more custom stuff to get stuff out of the varnishkafka.stats.json [15:39:29] <^d> paravoid: Go to https://gerrit.wikimedia.org/r/#/admin/projects/operations/software/gdash,access (adjust repo name in URL as appropriate). 
Click Edit, then add "ldap/ops" (or another group you're in) to Push for refs/* [15:39:54] re cdh4.5 [15:40:03] i've looked at the changelogs, and haven't noticed anything that i thought we needed [15:40:13] i think cdh4.3.2 says it explicitly supports oracle jdk 7 [15:40:18] which isn't openjdk, but at least it is 7 [15:40:39] we could upgrade though, i'm for it [15:41:01] ^d: the permission labeled "Push"? [15:41:07] <^d> paravoid: Yes. [15:41:08] <^d> :) [15:41:25] <^d> Oh, one more thing. [15:41:32] <^d> Also, grant yourself "Forge Committer" [15:41:32] and click on force push [15:41:44] committer & author? [15:41:49] <^d> Just committer. [15:41:58] <^d> Everyone has forge author (it's how you amend someone else's patch) [15:42:09] ok [15:42:19] and just to confirm, this won't create a new changeset per each of these commits, right? [15:42:38] <^d> No, if you just push directly to gerrit, and not through refs/for/* or git-review or something. [15:42:51] ok [15:44:06] * ^d mumbles something about it being before 8am and not had any caffeine yet [15:44:14] eek, sorry [15:45:00] * hashar sends european continental breakfast to ^d [15:45:10] cause we all know the US version sucks [15:45:12] <^d> paravoid: Not your fault, mine for signing into IRC before I was fully awake :p [15:45:38] <^d> hashar: Nom nom nom. [15:45:41] ^d: when you get awake, will you have a chance to fix up mediawiki/extensions.git not updating ? :D [15:45:57] gives ^d a https://www.thinkgeek.com/product/5a65/ [15:46:22] <^d> hashar: There is no fix, it's a bug. Will have to be worked around like I said on the dupe bug.
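^d's answer at [15:42:38] is the key rule: Gerrit turns commits into review changes only when they are pushed to the `refs/for/<branch>` namespace, while a push straight to `refs/heads/*` (with the Push and Forge Committer rights granted above) simply updates the branches. A minimal Python sketch of that rule and of a mirror-style push command, assuming a remote named `gerrit`; the remote name and all helper names are hypothetical, and only branches (not tags) are pushed here.

```python
def destination_ref(refspec: str) -> str:
    """Return the destination side of a git push refspec.

    A leading '+' (force) is dropped; without ':' the same name is used on both sides.
    """
    spec = refspec[1:] if refspec.startswith("+") else refspec
    return spec.split(":", 1)[-1]


def creates_changesets(refspec: str) -> bool:
    """True if Gerrit would turn the pushed commits into changes awaiting review."""
    return destination_ref(refspec).startswith("refs/for/")


def mirror_push_argv(remote: str, force: bool = False) -> list[str]:
    """Build a `git push` that copies every local branch to the same name on `remote`."""
    argv = ["git", "push"]
    if force:
        argv.append("--force")  # only needed if existing remote branches must be overwritten
    argv += [remote, "refs/heads/*:refs/heads/*"]
    return argv
```

For example, `creates_changesets("HEAD:refs/for/master")` is true (the usual git-review upload), while `mirror_push_argv("gerrit")` produces `git push gerrit refs/heads/*:refs/heads/*`, which bypasses review entirely.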
[15:47:57] (PS1) Ottomata: Adding hcatalog package to apt via reprepro update [operations/puppet] - https://gerrit.wikimedia.org/r/106712 [15:51:27] (CR) Ottomata: [C: 2 V: 2] Adding hcatalog package to apt via reprepro update [operations/puppet] - https://gerrit.wikimedia.org/r/106712 (owner: Ottomata) [15:51:35] win 4 [15:51:50] (PS1) Mark Bergsma: Move geoiplookup.esams to the new bits-lb.esams IP [operations/dns] - https://gerrit.wikimedia.org/r/106714 [15:52:19] (CR) Mark Bergsma: [C: 2] Move geoiplookup.esams to the new bits-lb.esams IP [operations/dns] - https://gerrit.wikimedia.org/r/106714 (owner: Mark Bergsma) [15:54:54] (PS6) Faidon Liambotis: Add librenms module and role class & apply it [operations/puppet] - https://gerrit.wikimedia.org/r/106694 [15:54:56] (PS5) Faidon Liambotis: librenms: install SSL certificate & enable vhost [operations/puppet] - https://gerrit.wikimedia.org/r/106701 [15:55:14] (CR) Faidon Liambotis: [C: 2] Add librenms module and role class & apply it [operations/puppet] - https://gerrit.wikimedia.org/r/106694 (owner: Faidon Liambotis) [15:55:58] (CR) Faidon Liambotis: [V: 2] Add librenms module and role class & apply it [operations/puppet] - https://gerrit.wikimedia.org/r/106694 (owner: Faidon Liambotis) [15:58:05] ^d: haven't you managed to fix the mediawiki/extension.git previously ? or are we hitting a different bug?
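Back at [15:37:35]-[15:38:39], the varnishkafka warnings came down to ganglios assuming gmond's default port 8649 while the esams gmond instances on hooft listen on custom ports. gmond hands its current state as an XML dump to any client that connects to its TCP port, so a check only needs the port to be a parameter. A minimal Python sketch, assuming the usual GANGLIA_XML / CLUSTER / HOST / METRIC nesting; it is not ganglios' code, and any host name or non-default port passed to it is a placeholder.

```python
import socket
import xml.etree.ElementTree as ET

DEFAULT_GMOND_PORT = 8649  # gmond's stock TCP port, the one ganglios hardcodes per the log


def fetch_gmond_xml(host: str, port: int = DEFAULT_GMOND_PORT, timeout: float = 5.0) -> bytes:
    """Connect to gmond's TCP port and read the XML dump until gmond closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def metric_values(xml_doc: bytes, metric_name: str) -> dict[str, str]:
    """Map each HOST NAME to the VAL of the named METRIC; hosts without it are skipped."""
    root = ET.fromstring(xml_doc)
    values = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                values[host.get("NAME")] = metric.get("VAL")
    return values
```

Used as `metric_values(fetch_gmond_xml(host, port=custom_port), "load_one")`, the same check works against the default-port gmonds and the custom-port aggregators alike.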
[15:58:53] (PS1) Faidon Liambotis: librenms: fix role class include scoping [operations/puppet] - https://gerrit.wikimedia.org/r/106715 [15:59:13] mutante: thanks [15:59:58] (PS2) Faidon Liambotis: librenms: fix role class include scoping [operations/puppet] - https://gerrit.wikimedia.org/r/106715 [16:00:13] (CR) Faidon Liambotis: [C: 2 V: 2] librenms: fix role class include scoping [operations/puppet] - https://gerrit.wikimedia.org/r/106715 (owner: Faidon Liambotis) [16:01:34] do I need shell access to be able to grep entries pushed by wfDebugLog on production machines? [16:02:21] it's the exception log in particular that I'm interested in [16:03:10] (PS1) Faidon Liambotis: librenms: don't use stdlib's merge on phpdump() [operations/puppet] - https://gerrit.wikimedia.org/r/106717 [16:03:32] (CR) Faidon Liambotis: [C: 2 V: 2] librenms: don't use stdlib's merge on phpdump() [operations/puppet] - https://gerrit.wikimedia.org/r/106717 (owner: Faidon Liambotis) [16:06:36] hmm, akosiaris [16:06:47] this actually was already here: [16:06:47] http://apt.wikimedia.org/wikimedia/pool/main/h/hcatalog/ [16:06:51] 1. not sure how, as it wasn't listed in the regex before [16:06:53] and [16:07:03] 2. it isn't showing up via apt-cache [16:07:06] or apt-get [16:07:33] "this" being hcatalog ? [16:07:35] yes [16:07:49] oh [16:07:50] nm [16:07:51] hmmmmm [16:07:51] ok [16:07:55] 502 Bad Gateway [16:07:59] it is after apt-get update [16:08:00] ok [16:08:06] so, i think i misunderstood what the updates file was doing [16:08:16] it looks like everything from cloudera is synced? [16:08:32] nope [16:08:33] but the updates file will only include it in the actual apt repo for installation if it is listed in the updates file? [16:09:00] it will only add the ones listed there [16:09:07] if it was there...
it was by some error [16:09:12] and was not deleted afterwards [16:09:19] probably my error [16:10:58] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [16:11:12] hmm ok [16:11:14] alright cool [16:11:16] it is working now, thank you! [16:12:19] ottomata: from what i see [16:12:35] you did the correct gerrit change and reprepro update did fetch the binaries [16:12:43] so... no problem at all ? [16:12:55] (PS1) Faidon Liambotis: webserver: remove broken docroot logic [operations/puppet] - https://gerrit.wikimedia.org/r/106722 [16:13:22] well, the binaries were already there? [16:13:28] no? [16:13:29] (CR) Faidon Liambotis: [C: 2] webserver: remove broken docroot logic [operations/puppet] - https://gerrit.wikimedia.org/r/106722 (owner: Faidon Liambotis) [16:13:38] i mean [16:13:45] my regex is only trying to grab hcatalog [16:13:51] not hcatalog-server, webhcat-server, etc. [16:14:01] it is working, i can install hcatalog [16:14:16] but i'm not sure how webhcat stuff got there [16:14:38] maybe the binaries weren't already there [16:14:55] but, yeah, not sure how it got the packages that don't match my regex [16:15:06] aaa now I got it [16:15:10] your regexp [16:15:14] catches the source package [16:15:18] not the binary package [16:15:29] do apt-cache showsrc hcatalog [16:15:30] <^d> hashar: We worked around it before by removing the repo that was causing problems. [16:15:35] (PS1) Ottomata: Adding hcatalog.pp [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/106723 [16:15:36] Binary: hcatalog, hcatalog-server, webhcat, webhcat-server [16:15:37] <^d> But VE team wants that repo now, so the bug came back. [16:16:15] that is why all the other binaries showed up.
That is why I told you that the dctrl line matches source packages and not binary packages [16:16:37] (PS1) Faidon Liambotis: librenms: fix another scoping/dependency cycle [operations/puppet] - https://gerrit.wikimedia.org/r/106724 [16:16:37] ? [16:17:24] (CR) Faidon Liambotis: [C: 2] librenms: fix another scoping/dependency cycle [operations/puppet] - https://gerrit.wikimedia.org/r/106724 (owner: Faidon Liambotis) [16:17:25] ^d: :-( [16:17:28] (PS2) Ottomata: Adding hcatalog.pp [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/106723 [16:17:30] (CR) Faidon Liambotis: [V: 2] librenms: fix another scoping/dependency cycle [operations/puppet] - https://gerrit.wikimedia.org/r/106724 (owner: Faidon Liambotis) [16:17:49] (CR) Ottomata: [C: 2 V: 2] Adding hcatalog.pp [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/106723 (owner: Ottomata) [16:19:56] <^d> hashar: Other way to work around it would be to write a cron...somewhere...that keeps it up to date. [16:21:47] Thanks for merging 105006 paravoid [16:21:58] (PS1) Ottomata: Updating cdh4 module and installing hcatalog on hive clients [operations/puppet] - https://gerrit.wikimedia.org/r/106726 [16:22:00] akosiaris: not sure I understand [16:22:11] why would matching the source packages make the binaries show up?
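akosiaris' point is that the updates filter selects on the source package, not on the binary `Package` name. In a Packages index, a binary's `Source` field is omitted when it equals the package name and can carry a version in parentheses otherwise; grep-dctrl's `-S` option matches against that source name. A minimal Python sketch of the difference, using a made-up index built from the `Binary:` line above (versions and all other fields omitted, plus one unrelated package); it models only the selection, not reprepro's ListShellHook, and the helper names are hypothetical.

```python
import re

# Illustrative Packages-style index (hypothetical; real stanzas carry many more fields).
EXAMPLE_INDEX = """\
Package: hcatalog

Package: hcatalog-server
Source: hcatalog

Package: webhcat
Source: hcatalog

Package: webhcat-server
Source: hcatalog

Package: hive
"""


def parse_index(text: str) -> list[dict[str, str]]:
    """Split a Packages-style index into blank-line separated stanzas of Field: value."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if line.startswith((" ", "\t")) or ":" not in line:
                continue  # continuation lines do not matter for this selection
            key, value = line.split(":", 1)
            fields[key] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def source_name(stanza: dict[str, str]) -> str:
    """Source package name: the Source field minus any '(version)', else Package."""
    return stanza.get("Source", stanza["Package"]).split()[0]


def select(stanzas: list[dict[str, str]], pattern: str, by_source: bool) -> list[str]:
    """Binary package names whose source name (or own name) matches `pattern`."""
    rx = re.compile(pattern)
    key = source_name if by_source else (lambda s: s["Package"])
    return [s["Package"] for s in stanzas if rx.search(key(s))]
```

With `^hcatalog$`, selecting by source name returns all four hcatalog binaries, while selecting by package name returns only `hcatalog`, which is exactly the surprise discussed here.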
[16:22:37] (CR) Ottomata: [C: 2 V: 2] Updating cdh4 module and installing hcatalog on hive clients [operations/puppet] - https://gerrit.wikimedia.org/r/106726 (owner: Ottomata) [16:23:46] (CR) Dzahn: [C: 1] retab certs.pp [operations/puppet] - https://gerrit.wikimedia.org/r/104742 (owner: Hashar) [16:24:22] ^d: might end up writing my own daemon :D [16:25:58] ottomata: because reprepro updates will fetch anything that satisfies the ListShellHook [16:26:43] and the grep-dctrl has the -S flag which tells it to match source package names and fetch the packages for which those source package names match [16:27:36] but ^hcatalog$ doesn't match webhcat-server, does it? [16:27:40] (PS1) Jgreen: add *.frdev.wikimedia.org cnames for lutetium internal-use sites [operations/dns] - https://gerrit.wikimedia.org/r/106727 [16:28:18] what is file {} ensure => file; ? exactly the same as ensure => present;? wrong? [16:30:25] ottomata: it does. Because hcatalog is the source package for webhcat-server [16:30:27] (CR) Jgreen: [C: 2 V: 1] add *.frdev.wikimedia.org cnames for lutetium internal-use sites [operations/dns] - https://gerrit.wikimedia.org/r/106727 (owner: Jgreen) [16:30:48] as well as for the other 3 [16:32:08] (PS1) Ottomata: Using hcatalog-core for hive partitioning [operations/puppet] - https://gerrit.wikimedia.org/r/106728 [16:32:24] AHhhhh [16:32:27] ok thanks akosiaris [16:32:47] (CR) Ottomata: [C: 2 V: 2] Using hcatalog-core for hive partitioning [operations/puppet] - https://gerrit.wikimedia.org/r/106728 (owner: Ottomata) [16:34:41] (CR) Dzahn: certs.pp puppet lint fixes (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/104743 (owner: Hashar) [16:36:56] ok, i think we don't have this 100% correct all over the place: ensure => file means to make sure it is a file - and not a symlink or a directory. Ensure present means make sure it exists, but it can be a file, symlink or directory.
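mutante's summary at [16:36:56] can be read as a test of what already sits at the path. A minimal Python sketch that models only that description, not Puppet's implementation (content, ownership and mode are ignored, and a missing path simply counts as "not yet satisfied", since Puppet would create a regular file for either value): `present` accepts any existing object, `file` accepts only a regular file, and a symlink is judged as a link rather than by its target. The function name is hypothetical.

```python
import os
import stat


def satisfies_ensure(path: str, ensure: str) -> bool:
    """Would an existing `path` already satisfy `ensure` as described in the log?

    'present': any existing object (regular file, directory or symlink).
    'file':    only a regular file; a symlink does not count, even one pointing at a file.
    """
    try:
        st = os.lstat(path)  # lstat inspects the link itself, not what it points to
    except FileNotFoundError:
        return False
    if ensure == "present":
        return True
    if ensure == "file":
        return stat.S_ISREG(st.st_mode)
    raise ValueError(f"unsupported ensure value in this sketch: {ensure!r}")
```

So a directory or symlink left where a manifest says `ensure => file` is a mismatch, while the same object under `ensure => present` is accepted as-is, which is why the two are not interchangeable.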
[16:41:03] (PS1) Mark Bergsma: Update core routers list [operations/puppet] - https://gerrit.wikimedia.org/r/106729 [16:42:16] PROBLEM - MySQL Processlist on db1006 is CRITICAL: CRIT 1 unauthenticated, 0 locked, 0 copy to table, 75 statistics [16:42:36] so [16:42:40] (PS2) Mark Bergsma: Update core routers list [operations/puppet] - https://gerrit.wikimedia.org/r/106729 [16:42:41] who maintains trebuchet these days? [16:42:45] it's broken for me :) [16:43:16] RECOVERY - MySQL Processlist on db1006 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 8 statistics [16:44:12] (CR) Mark Bergsma: [C: 2] Update core routers list [operations/puppet] - https://gerrit.wikimedia.org/r/106729 (owner: Mark Bergsma) [16:44:53] paravoid: I suppose you will just have to wait for the next rename? :) [16:54:32] ^d: So, is the only fix for https://bugzilla.wikimedia.org/show_bug.cgi?id=49846 to rename the VisualEditor repos? [16:54:57] ^d: And if so, do you want to do it? RoanKattouw says he'd just ask you anyway. :-) [16:55:17] (PS1) Faidon Liambotis: librenms: more syntax fixups [operations/puppet] - https://gerrit.wikimedia.org/r/106730 [16:55:22] I'm about to be on planes for 26 hours then sleep for 12-16 hours once I get home [16:55:31] So I'm not gonna be renaming any repos any time soon :) [16:55:47] (CR) Faidon Liambotis: [C: 2] librenms: more syntax fixups [operations/puppet] - https://gerrit.wikimedia.org/r/106730 (owner: Faidon Liambotis) [17:00:44] <^d> James_F: A different name would be nice, yes. [17:00:50] <^d> And avoid the problem entirely [17:01:24] ^d: I was thinking VisualEditor/VE-core.git and mediawiki/extensions/MWVisualEditor.git, but frankly "working" is more important. [17:01:51] ^d: Can you rename repos? Is it hard? [17:01:54] <^d> It's the top-level VisualEditor.git that's causing problems. [17:01:58] <^d> It's a pain in the freaking ass. [17:02:04] <^d> I think we'll have to rename the whole tree.
[17:02:31] … why? It won't conflict with anything given my proposal. [17:02:51] <^d> Oh, MWVisualEditor. [17:02:54] The extension to use VE in MW doesn't need to be called "VisualEditor". [17:02:55] <^d> I missed that. [17:02:59] Yeah. :-) [17:03:07] <^d> If we just rename that, the rest of it shouldn't matter. [17:03:20] VisualEditor.git and VisualEditor/VisualEditor.git is OK? [17:03:22] <^d> Then you can leave the other repos as-is. It'll just be the one super awful rename. [17:03:28] <^d> Yeah that doesn't matter. [17:03:30] Eh. [17:03:51] Is this a not-actually-rename-but-create-new-one,-move-history,-and-delete-old-one? [17:04:09] <^d> I'm going to move everything. [17:04:13] <^d> Plus review history. [17:04:16] RoanKattouw: Want to give your blessing? [17:04:17] <^d> It's going to take awhile. [17:04:31] No rush. I'm about to get on an aeroplane. :-) [17:04:47] <^d> I'd rather not do it on a Friday either. It's going to break all the deployed wmf branches too. [17:04:57] Eurgh. [17:05:00] Yeah. [17:05:02] <^d> Can we live broken for the weekend? [17:05:02] * James_F sighs. [17:05:10] Just add the new submodules [17:05:11] <^d> We'll fix it up first thing next week. [17:05:14] I am in no rush at all [17:05:18] I dunno. BetaLabs is useless for us (and other teams apparently). [17:05:18] Then fix the config in place [17:05:24] <^d> Reedy: I know. My point is I don't want to have to do all that...on a friday. [17:05:30] Friday [17:05:31] Friday [17:05:35] I land at SFO at 11am on Saturday so I should be alive and back in the office on Monday [17:05:36] <^d> Gotta break shit on friday. [17:05:37] Gotta break the site on Friday [17:05:41] ^d: Want to do it on Monday? [17:05:47] <^d> Sure, we can do it monday. [17:05:52] ^d: Kk. [17:05:57] <^d> I'll spend some time today making a list of everything we'll need to update. [17:06:01] Thanks! 
[17:06:02] <^d> Which should save us time then [17:06:14] [x] Everything [17:06:20] <^d> {{done}} [17:06:25] <^d> "Rename all the stuff" [17:06:46] 2 branches. tools/release, operations/mediawiki-config... [17:07:04] Why mw-config? [17:07:14] Oh, of course, extension/Foo/Foo.php path [17:07:23] extension-list , CommonSettings.php [17:07:23] Wow, I knew more than RoanKattouw. :-) [17:07:25] <^d> git repos on ytterbium, lanathium, gallium x2, github. [17:07:32] <^d> Plus, $n tables in the database. [17:08:01] Yay fun. [17:08:28] Do it in screen now and take an early lunch [17:08:51] <^d> It won't take long, it's just a bunch of things and they all need to be done roughly at the same time or shit starts breaking. [17:08:53] <^d> So yes, monday. [17:08:56] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [17:09:04] I think I might already have a deployment window on Monday [17:09:19] Really meant for a config change [17:09:31] "config change" [17:09:37] Monday, January 13 18:00–19:00 10:00–11:00 Roan/James VisualEditor enabled by default on "phase 4" Wikipedias – 102208 [17:09:51] Well, this is a config change. [17:09:54] Of sorts. :-) [17:10:09] But ideally we'd not want to do this just after lots of wikis get VE for all users for the first time. [17:10:11] * James_F sighs. [17:10:16] <^d> RoanKattouw: It also overlaps my search window of 9-11. Which means nobody should be doing anything but you and me [17:10:24] True. [17:11:10] <^d> We're only indexing enwiki in elasticsearch monday. [17:11:14] <^d> Should be no big deal. [17:11:20] * James_F laughs. [17:11:35] <^d> No seriously, should be no big deal :p [17:11:43] Well, OK. [17:12:02] Let's try to all be in the office for 09:00 so we can do the VE -> MWVE stuff beforehand? [17:12:04] We're only enabling VE on... how many wikis this time? [17:12:11] 25. [17:12:14] I think/. [17:12:20] There's a gerrit change somewhere. 
[17:12:23] Hah, that's not that many [17:12:26] Last time it was >100 [17:12:37] (CR) Alexandros Kosiaris: [C: -1] "One major comment. Other than that, after staring for a long time at the old and the new regexps... man the new version is so much cleaner" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/106669 (owner: Faidon Liambotis) [17:12:54] <^d> James_F: Sounds good. I'll add a reminder for myself :) [17:13:29] ^d: And I'll drag RoanKattouw out of bed just 36 hours after he finds it. :-) [17:14:00] woo to mw/extensions getting fixed [17:14:36] PROBLEM - HTTP on netmon1001 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 4581 bytes in 0.031 second response time [17:14:54] zz_yuvipanda: Then you can use BetaLabs. :-) [17:15:00] indeed [17:15:45] James_F: looking forward to that :) [17:22:36] <^d> James_F|Away: Thought of something else. We can probably make a symlink on git.wm.o so those links at least won't break. [17:24:36] RECOVERY - HTTP on netmon1001 is OK: HTTP OK: HTTP/1.1 200 OK - 6420 bytes in 0.039 second response time [17:24:51] Yeah that sounds reasonable [17:35:55] Going to update Scholarships with new i18n files. greg-g has blessed the activity [17:36:31] it's "just" i18n, and it's not mw, so that shouldn't be too scary [17:37:06] and, it's a new language (spanish), not just updates, ie: now spanish users can comfortably apply for a scholarship [17:37:11] spanish-speaking* [17:40:10] {{done}} https://scholarships.wikimedia.org/apply?uselang=es [17:40:33] <^d> Three cheers for i18n [17:40:49] eh? [17:41:31] <^d> Wikimania scholarships. We updated so we could get the es translations, which was a newly added language.
[17:42:08] Which only has one glaringly obvious translation error [17:42:27] heh [17:42:59] The translator localized the FAQ url into a redlink [17:45:51] oh geez [17:45:58] that's kinda bad [17:46:58] I fixed it at translatewiki.net [17:47:34] And updated the qqq to hopefully help it not happen again [17:55:51] (PS2) BryanDavis: Proxy logstash.wikimedia.org via misc varnish cluster [operations/puppet] - https://gerrit.wikimedia.org/r/106170 [17:56:03] (CR) jenkins-bot: [V: -1] Proxy logstash.wikimedia.org via misc varnish cluster [operations/puppet] - https://gerrit.wikimedia.org/r/106170 (owner: BryanDavis) [17:56:26] (PS1) Dzahn: add nuria to privatedata admins [operations/puppet] - https://gerrit.wikimedia.org/r/106738 [18:02:37] (CR) BryanDavis: Proxy logstash.wikimedia.org via misc varnish cluster (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/106170 (owner: BryanDavis) [18:04:51] (CR) Nuria: [C: 1] add nuria to privatedata admins [operations/puppet] - https://gerrit.wikimedia.org/r/106738 (owner: Dzahn) [18:08:57] (PS3) BryanDavis: Proxy logstash.wikimedia.org via misc varnish cluster [operations/puppet] - https://gerrit.wikimedia.org/r/106170 [18:08:59] (PS2) BryanDavis: Kibana puppet class [operations/puppet] - https://gerrit.wikimedia.org/r/106169 [18:10:00] (CR) jenkins-bot: [V: -1] Kibana puppet class [operations/puppet] - https://gerrit.wikimedia.org/r/106169 (owner: BryanDavis) [18:10:02] (CR) jenkins-bot: [V: -1] Proxy logstash.wikimedia.org via misc varnish cluster [operations/puppet] - https://gerrit.wikimedia.org/r/106170 (owner: BryanDavis) [18:13:41] (PS3) BryanDavis: Kibana puppet class [operations/puppet] - https://gerrit.wikimedia.org/r/106169 [18:15:56] (CR) BryanDavis: [C: -1] "Faidon's -1 from patch set 1 still stands. Patch sets 2 and 3 were just manual rebase and fix for error in manual rebase." 
[operations/puppet] - https://gerrit.wikimedia.org/r/106169 (owner: BryanDavis) [18:16:14] (PS4) BryanDavis: Proxy logstash.wikimedia.org via misc varnish cluster [operations/puppet] - https://gerrit.wikimedia.org/r/106170 [18:19:22] hm, temp alarms on ps1-a1 @ stdpa. cause for concern? (email went to noc@) [18:19:42] (PS3) BryanDavis: Add logstash.wikimedia.org [operations/dns] - https://gerrit.wikimedia.org/r/105105 [18:19:43] sdtpa even [18:23:07] (CR) BryanDavis: "Renamed service to logstash because Faidon is correct that everyone thinks of this as logstash and using the name of the particular fronte" [operations/dns] - https://gerrit.wikimedia.org/r/105105 (owner: BryanDavis) [18:39:29] (PS1) Faidon Liambotis: webserver::apache: misc adjustments [operations/puppet] - https://gerrit.wikimedia.org/r/106755 [18:39:31] (PS1) Faidon Liambotis: librenms: another round of misc fixes [operations/puppet] - https://gerrit.wikimedia.org/r/106756 [18:43:02] (CR) Faidon Liambotis: [C: 2] webserver::apache: misc adjustments [operations/puppet] - https://gerrit.wikimedia.org/r/106755 (owner: Faidon Liambotis) [18:43:38] (CR) Faidon Liambotis: [C: 2] librenms: another round of misc fixes [operations/puppet] - https://gerrit.wikimedia.org/r/106756 (owner: Faidon Liambotis) [18:44:08] does anyone know what's happening with temp alarms in tampa [18:45:12] cmjohnson1: ouch, seeing the mails now [18:45:19] no idea yet, hope it's not on fire [18:45:20] * cmjohnson1 calling 365 main techs now [18:45:29] see if it's for real [18:45:45] oh, well 4 degrees up [18:45:49] in the 20s [18:45:54] that's not so bad [18:46:01] er [18:46:09] you are aware that faidon is working on the monitoring system?
:) [18:46:15] it's nothing i'm sure [18:46:31] yea, but they look like real from temperature sensors [18:46:34] (PS1) Faidon Liambotis: Add IPv6 address to netmon1001 [operations/puppet] - https://gerrit.wikimedia.org/r/106758 [18:46:43] though it's about >4 degrees C [18:46:52] yes of course they're real [18:48:19] ok, yea, i knew he was on netmon1001 for the observium replacement [18:48:54] :P [18:49:21] (CR) Faidon Liambotis: [C: 2] Add IPv6 address to netmon1001 [operations/puppet] - https://gerrit.wikimedia.org/r/106758 (owner: Faidon Liambotis) [18:49:34] mark: i wanted to see if it was from faidon messing with it or if something happened on the floor [18:51:01] someone turned streber's port on ? [18:54:08] (PS1) Faidon Liambotis: Add librenms & switch observium to netmon1001 [operations/dns] - https://gerrit.wikimedia.org/r/106759 [18:54:10] I did [18:54:21] (CR) Mark Bergsma: [C: -1] "a few comments" (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/106170 (owner: BryanDavis) [18:54:38] (CR) Faidon Liambotis: [C: 2] Add librenms & switch observium to netmon1001 [operations/dns] - https://gerrit.wikimedia.org/r/106759 (owner: Faidon Liambotis) [19:05:21] (CR) Aaron Schulz: [C: 2] Setup and enabled redisLockManager for all file backends in use [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104317 (owner: Aaron Schulz) [19:05:33] (Merged) jenkins-bot: Setup and enabled redisLockManager for all file backends in use [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/104317 (owner: Aaron Schulz) [19:06:11] !log aaron synchronized wmf-config/filebackend.php 'Setup and enabled redisLockManager for all file backends in use' [19:06:18] Logged the message, Master [19:19:38] (CR) Erik Zachte: [C: 1] add nuria to privatedata admins [operations/puppet] - https://gerrit.wikimedia.org/r/106738 (owner: Dzahn) [19:34:48] check out the temp alert for 
ps1-b8-eqiad: message sent at 10:44 but arrived in my inbox at 11:23. are others seeing the same delay? [19:35:27] maybe its clock is just wrong? [19:40:10] PROBLEM - check_swap on lutetium is CRITICAL: SWAP CRITICAL - 51% free (3814 MB out of 7627 MB) [19:40:44] why is lutetium swapping.... [19:45:10] PROBLEM - check_swap on lutetium is CRITICAL: SWAP CRITICAL - 48% free (3599 MB out of 7627 MB) [19:45:43] eh it's going down slowly so not too worried right now [19:46:13] (PS1) Faidon Liambotis: librenms: discovery fixes [operations/puppet] - https://gerrit.wikimedia.org/r/106764 [19:46:53] (CR) Faidon Liambotis: [C: 2] librenms: discovery fixes [operations/puppet] - https://gerrit.wikimedia.org/r/106764 (owner: Faidon Liambotis) [19:49:28] (PS6) Faidon Liambotis: librenms: install SSL certificate & enable vhost [operations/puppet] - https://gerrit.wikimedia.org/r/106701 [19:50:10] PROBLEM - check_swap on lutetium is CRITICAL: SWAP CRITICAL - 48% free (3606 MB out of 7627 MB) [19:55:10] PROBLEM - check_swap on lutetium is CRITICAL: SWAP CRITICAL - 0% free (0 MB out of 3900 MB) [20:00:10] PROBLEM - check_swap on lutetium is CRITICAL: SWAP CRITICAL - 50% free (3764 MB out of 7627 MB) [20:05:10] PROBLEM - check_swap on lutetium is CRITICAL: SWAP CRITICAL - 14% free (1056 MB out of 7627 MB) [20:10:10] PROBLEM - check_swap on lutetium is CRITICAL: SWAP CRITICAL - 14% free (1061 MB out of 7627 MB) [20:15:10] PROBLEM - check_swap on lutetium is CRITICAL: SWAP CRITICAL - 14% free (1065 MB out of 7627 MB) [20:19:33] (PS1) RobH: librenms.wikimedia.org certificate [operations/puppet] - https://gerrit.wikimedia.org/r/106766 [20:20:10] PROBLEM - check_swap on lutetium is CRITICAL: SWAP CRITICAL - 15% free (1068 MB out of 7627 MB) [20:22:08] (CR) RobH: [C: 2] librenms.wikimedia.org certificate [operations/puppet] - https://gerrit.wikimedia.org/r/106766 (owner: RobH) [20:23:08] (PS7) Faidon Liambotis: librenms: install SSL certificate &
enable vhost [operations/puppet] - https://gerrit.wikimedia.org/r/106701 [20:23:12] hey folks [20:23:15] I need to create a repo [20:23:18] who should I talk to ? [20:23:20] (CR) Faidon Liambotis: [C: 2 V: 2] librenms: install SSL certificate & enable vhost [operations/puppet] - https://gerrit.wikimedia.org/r/106701 (owner: Faidon Liambotis) [20:23:42] mutante , paravoid , akosiaris_away ? [20:25:10] PROBLEM - check_swap on lutetium is CRITICAL: SWAP CRITICAL - 15% free (1088 MB out of 7627 MB) [20:26:55] I gotcha average. [20:27:20] (PS1) Faidon Liambotis: webserver::apache::site: fix syntax error [operations/puppet] - https://gerrit.wikimedia.org/r/106767 [20:27:35] (CR) Faidon Liambotis: [C: 2] webserver::apache::site: fix syntax error [operations/puppet] - https://gerrit.wikimedia.org/r/106767 (owner: Faidon Liambotis) [20:30:10] PROBLEM - check_swap on lutetium is CRITICAL: SWAP CRITICAL - 15% free (1109 MB out of 7627 MB) [20:35:08] PROBLEM - check_swap on lutetium is CRITICAL: SWAP CRITICAL - 0% free (0 MB out of 6415 MB) [20:39:26] (PS1) Hashar: contint: disable tmpfs on labs slave [operations/puppet] - https://gerrit.wikimedia.org/r/106770 [20:39:44] and lutetium went to swap death :( [20:40:08] RECOVERY - check_swap on lutetium is OK: SWAP OK - 100% free (7592 MB out of 7627 MB) [20:40:09] Jeff_Green: lutetium went to swap :( [20:40:15] (PS1) RobH: star.wikimedia.org cert chain fix [operations/puppet] - https://gerrit.wikimedia.org/r/106771 [20:40:30] hashar: yeah I've been looking at it.
curiously it did so even while it had 18GB available [20:40:40] (PS2) Hashar: star.wikimedia.org cert chain fix [operations/puppet] - https://gerrit.wikimedia.org/r/106771 (owner: RobH) [20:41:32] Jeff_Green: I guess dmesg will tell you which process eat all men hehe [20:41:36] men -> mem [20:41:51] nah, there was nothing logged there [20:42:07] I know one of the fr folks ran the box out of RAM with a giant R process though [20:42:28] but even after they killed that, looks like the kernel didn't want to move mysql back into RAM [20:42:39] (CR) Dzahn: [C: -1] "that needs to end in .pem" [operations/puppet] - https://gerrit.wikimedia.org/r/106771 (owner: RobH) [20:42:42] I think restarting mysql finally fixed it [20:44:26] Jeff_Green: they might want to use another machine so :D [20:45:55] I should just stop monitoring it :-P [20:46:03] and if someone could merge in https://gerrit.wikimedia.org/r/106770 , that comment out some puppet error for contint slaves in labs; harmless :) [20:49:42] hashar: doesn't that patch affect production as well? [20:49:54] Or is the 'slave' just our hypothetical labs test box? [20:50:50] andrewbogott: ah yeah you made me doubt :D [20:51:04] andrewbogott: that is for the role::ci::slave::browsertests class which is only applied on labs and only on the instance that run the browsertests [20:51:05] You could wrap it in a test for ::realm [20:51:31] hashar, I'm about to go to lunch but feel free to ping me for a merge on my return. [20:51:32] I got to polish up that class, it is missing a lot of hand made stuff :( [20:51:48] you can merge it, it is not going to cause any harm. Guaranteed. [20:51:56] since it is only applied on one instance :] [20:52:13] hm.... [20:52:14] ok!
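The check_swap alerts for lutetium all share one text shape, so they are easy to pull apart when reviewing an episode like this one. A minimal Python sketch, assuming only the output format visible in this log (helper names are hypothetical). The percentage rule is inferred from the numbers here, not taken from the plugin source: every figure above is reproduced by "100 minus the truncated percentage used", which is why 1068 MB free of 7627 MB (about 14.0%) is reported as 15%. The MB figures are themselves rounded, so treat this as a consistency check only.

```python
import re
from typing import NamedTuple, Optional

# Matches the plugin text seen in the alerts, e.g. "SWAP CRITICAL - 14% free (1056 MB out of 7627 MB)"
SWAP_RE = re.compile(
    r"SWAP (OK|WARNING|CRITICAL|UNKNOWN) - (\d+)% free \((\d+) MB out of (\d+) MB\)"
)


class SwapReading(NamedTuple):
    state: str
    pct_free: int
    free_mb: int
    total_mb: int


def parse_check_swap(line: str) -> Optional[SwapReading]:
    """Extract state and figures from one check_swap alert line, or None if absent."""
    m = SWAP_RE.search(line)
    if m is None:
        return None
    state, pct, free_mb, total_mb = m.groups()
    return SwapReading(state, int(pct), int(free_mb), int(total_mb))


def pct_free_from_mb(free_mb: int, total_mb: int) -> int:
    """100 minus the truncated percentage used (inferred rule; assumes total_mb > 0)."""
    used_pct = (total_mb - free_mb) * 100 // total_mb
    return 100 - used_pct
```

Note also that the reported total is not constant across these alerts (7627, 3900 and 6415 MB), so comparing free MB between checks is only meaningful when the totals match.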
[20:52:38] (03PS3) 10RobH: star.wikimedia.org cert chain fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/106771 [20:52:40] (03CR) 10Andrew Bogott: [C: 032] contint: disable tmpfs on labs slave [operations/puppet] - 10https://gerrit.wikimedia.org/r/106770 (owner: 10Hashar) [20:52:50] andrewbogott: danke! have a good lunch! [20:52:53] (03PS4) 10RobH: star.wikimedia.org cert chain fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/106771 [20:53:07] doh, stripped out hashar's reference and had to readd, heh [20:53:20] hey sorry [20:53:31] nah, i had totally wrong paste [20:53:39] i have no idea what i pasted, was rushed, so was wrong anyhow [20:53:53] andrewbogott: and that fixed puppet \O/ [20:55:42] (03CR) 10RobH: [C: 032] star.wikimedia.org cert chain fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/106771 (owner: 10RobH) [20:56:01] so yea, just traced out the intermediate that came with new cert is indeed the rapidssl intermediate we use on cluster [20:56:52] so now i imagine i need to remove the chained cert and rerun puppet to recreate it [20:56:57] on cp1043/cp1044 [20:57:08] that looks like chinese to me [20:57:25] one day I will get myself a ssl/cert training course [20:57:30] it felt like it to me 3 days ago ;] [20:57:35] well, maybe more like a week ago [20:57:37] but still [20:57:46] (our varnish that is) [20:57:58] I used to have some projects having ssl/cert involved, but then i had a team of security engineer to figure out the details for me hehe [20:58:22] so my job was mostly filling up a Word form saying: foo.company.com would use SSL on X host with IP Y. Please make it happen. [20:59:38] once i finish rolling to both we can retest the labs stuff [20:59:51] its in middle of cp1043 now [21:00:41] well, somethign about that is wrong... [21:00:46] its chained is now three long.wtf... [21:00:59] maybe i have to remove all the cert reference and have it redownload it all. 
[21:01:29] !log stopping nginx on cp1043 and tinkering with its star.wikimedia.org certificate stuff, cp1044 is still online to handle misc-web-lb [21:01:35] Logged the message, RobH [21:02:08] PROBLEM - HTTPS on cp1043 is CRITICAL: Connection refused [21:02:37] well, hopefully it load balances like it should and just cp1044 get the requests ;] [21:03:06] * RobH just has to wait on puppet runs [21:04:31] yea, was just the odd copy down i think, it doesn't split the chain with a newline like it should [21:04:35] should come back online now [21:05:08] RECOVERY - HTTPS on cp1043 is OK: OK - Certificate will expire on 08/24/2015 12:06. [21:09:06] !log cp1043/1044 both updated and back online [21:09:12] Logged the message, RobH [21:09:17] hashar: so the chain is corrected now on systems [21:09:24] wanna see if you get same error as before pls? [21:09:35] robh running puppet on my instance [21:10:07] so i have no idea why it worked before [21:10:15] because its the exact same cert issuer that was on there before! [21:10:38] well, it seems like it, but obviously cannot have been, so someone else got an equinix cert and we were also using it i suppose [21:10:45] (03PS4) 10BryanDavis: Kibana puppet class [operations/puppet] - 10https://gerrit.wikimedia.org/r/106169 [21:10:48] but the one we had before showed rapidssl in its details [21:10:51] so im at a loss. [21:11:01] (as to why it worked before the reissue) [21:12:48] RobH: still broken in labs :D [21:12:51] ... [21:13:02] i have no clue. [21:13:04] maybe it uses a different puppet class to install the cert on instanceS? [21:13:39] you are getting a certificate error during the puppet run when hitting git.wikimedia.org right? [21:14:55] hashar: RobH i think i have hints there [21:15:14] ? [21:15:18] trying to find something [21:16:00] hashar: what do you get this error on [21:17:58] gerrit? 
[21:18:36] mutante curl reject git.wm.o cert :( https://bugzilla.wikimedia.org/show_bug.cgi?id=59910#c1 [21:18:49] it shows CAfile: none and CApath: /etc/ssl/certs [21:19:01] so maybe the cert we use is not installed on labs ? [21:19:14] https://gerrit.wikimedia.org/r/#/c/90676/ [21:19:21] this is what i remembered [21:19:37] do you see anything about CAfile: /etc/ssl/certs/ca-certificates.crt [21:19:42] in the error [21:20:14] (03CR) 10BryanDavis: [C: 04-1] "Still need to figure out how to make the module more generic as suggested by Faidon." [operations/puppet] - 10https://gerrit.wikimedia.org/r/106169 (owner: 10BryanDavis) [21:20:24] mutante: CAfile: none [21:20:26] it has no idea the cafile [21:20:34] (03CR) 10BryanDavis: Kibana puppet class (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/106169 (owner: 10BryanDavis) [21:20:38] so seems like the right fix [21:20:40] it uses SSLCACertificatePath /etc/ssl/certs/ [21:20:49] to find it [21:21:12] not ChainFile [21:22:07] for what it is worth : there is no RapidSSL_CA.pem on my labs instance [21:22:11] yea, so the thing is that we did NOT change gerrit to use ChainFile [21:22:18] because it caused a problem [21:22:32] but I have GeoTrust_Global_CA.pem [21:22:39] im making a fix now [21:23:03] mutante: you didnt end up using that change? [21:23:08] mutante: and I get the same issue on gash or graphite :D [21:23:15] cuz it makes sense and i recall paravoid had to add it to some thing yesterday, i think. [21:23:27] so lets try rolling for git and see how thigns go? [21:24:01] hashar: oh? 
that would sound like what i say is unrelated [21:24:24] if it's about gerrit, you need to find about about /etc/ssl/certs/ca-certificates.crt [21:24:40] if it's about the other and the chained file is used in config [21:25:09] with SSLCertificateChainFile [21:25:22] then the content of the chained.pem is still wrong [21:25:23] well got the issue from labs when pointing to a host which is on misc-web-lb [21:25:36] and as I said, RapidSSL_CA.pem is not present in /etc/ssl/cert which sounds like an issue to me [21:25:46] (03PS1) 10RobH: git.wikimedia.org missing CAPath [operations/puppet] - 10https://gerrit.wikimedia.org/r/106773 [21:25:58] yes, an issue we had other things related to for rt as well [21:26:08] iirc [21:26:15] well, we had many issues, but it was part of it [21:26:31] well, normally you just use one of those options [21:26:44] CA path or ChainFile [21:26:45] ? [21:26:47] oh [21:26:51] so then my fix is no good [21:26:54] what do you suggest? [21:26:57] i'm not sure , maybe it is [21:27:12] gah, fcking swift docs [21:27:12] - figure out if it is about gerrit [21:27:19] or all misc services [21:27:28] uhh [21:27:30] its git.wikimedia.org [21:27:32] not gerrit [21:27:34] you are confusing me. [21:27:41] if it's about gerrit, find about about /etc/ssl/certs/ca-certificates.crt [21:27:41] (why do you keep asking about gerrit?) [21:27:56] if it's not , then the content of your chained.pem is still wrong [21:28:04] because it has a different config [21:28:06] all misc services mutante [21:28:08] are you suggesting its a gerrit issue? [21:28:15] again, im not sure why you keep saying gerrit. 
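In the exchange above, curl reports `CAfile: none` with `CApath: /etc/ssl/certs`. hashar also finds that `RapidSSL_CA.pem` is missing from the labs instance while `GeoTrust_Global_CA.pem` is present. The sketch below shows one way to check a host for the CA files a service depends on. The file names come from the log. The helper `missing_ca_files` is hypothetical, and it assumes the Debian/Ubuntu layout, where CA certificates sit as individual files under `/etc/ssl/certs`.

```python
import os
import ssl


def missing_ca_files(expected, capath=None):
    """Return the expected CA file names that are absent from capath.

    Hypothetical helper: if capath is not given, ask the local OpenSSL build
    for its default CA directory, falling back to /etc/ssl/certs.
    """
    if capath is None:
        # get_default_verify_paths().capath is None when the directory is absent.
        capath = ssl.get_default_verify_paths().capath or "/etc/ssl/certs"
    return [name for name in expected
            if not os.path.isfile(os.path.join(capath, name))]
```

For example, `missing_ca_files(["RapidSSL_CA.pem", "GeoTrust_Global_CA.pem"])` would list whichever of the two files is absent on the machine it runs on. This only checks for the files by name. It does not test whether OpenSSL can actually find them through its hashed-symlink lookup.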
[21:28:19] but that seems to be because they all use RapidSSL_CA which is not installed on labs [21:28:55] everything resolving to misc-web-lb.eqiad uses the rapidssl cert yes [21:29:59] RobH: no, i'm not suggesting that (anymore), if it hits all the services it's unrelated [21:30:04] chain file poitns to the chain file, but not to the intermediary certificate rapidssl [21:30:19] sslcertificatechainfile i mean [21:30:37] but nothing references the rapidssl_ca path in teh config [21:30:40] the even. [21:31:22] seems certificates::rapidssl_ca is only applied when one calls install_certificate :D [21:31:33] I guess all instance should have the certs installed shouldn't they? [21:32:00] the content of the chained.pem is supposed to be ALL the certs in one [21:32:05] ahhh [21:32:14] puppet does "cat" [21:32:25] but then if the certs are not installed on the instance, it will not contains them :D [21:32:27] on the cert file, plus everything that was listed in the $ca [21:32:33] where you put 2 files [21:32:36] well, certs.pp lists the files [21:32:51] (03PS5) 10BryanDavis: Proxy logstash.wikimedia.org via misc varnish cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/106170 [21:33:02] make sure all 3 are actually ending up in the file on the server [21:33:13] when it combines then [21:33:37] oh, recall how it fubars the concatting? [21:33:42] and doesnt insert proper lines? [21:33:51] i had to fix that manually on cp servers [21:33:56] so star.wikimedia.org.pem and RapidSSL_CA.pem GeoTrust_Global_CA.pem [21:33:59] all 3 [21:34:03] (03CR) 10BryanDavis: "Still hoping to make changes to I504c4c1 that will allow removing the req.http.Authorization cache bypass." 
(033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/106170 (owner: 10BryanDavis) [21:34:07] as listed in https://gerrit.wikimedia.org/r/#/c/106771/3/manifests/certs.pp [21:34:19] (03Abandoned) 10RobH: git.wikimedia.org missing CAPath [operations/puppet] - 10https://gerrit.wikimedia.org/r/106773 (owner: 10RobH) [21:34:23] so that part works on the misc varnish [21:34:39] apparently [21:34:46] RobH: yea, maybe another newline or so? [21:35:33] (03PS1) 10Odder: Add aliases for NS 100, 106 on bnwikisource [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106776 [21:35:59] yea but where? [21:36:05] at end of the ca files? [21:36:22] the stock ones dont have them [21:36:28] so its odd its not chaining right [21:36:47] (its right on the cp servers, but i guess labs servers need it done too) [21:37:06] hashar: can you cat /etc/ssl/certs/star.wikimedia.org.chained.pem on your instance [21:37:25] see if it has fialed to concat the three together, i bet the first and second end and start certificate lines are combined [21:37:35] the three what ? [21:38:01] paravoid: can you look at https://gerrit.wikimedia.org/r/#/c/106777/1 ? [21:38:11] hashar: this is what puppet does, it's just cat: [21:38:19] command => "/bin/cat ${certname}.pem ${ca} > ${location}/${certname}.chained.pem", [21:38:24] ok [21:38:27] hashar: So cat that file, and there should be three begin and end certificate lines [21:38:30] and they all need their own line [21:38:48] i am guessing that the end and start certificate line for the first/second ones are not on different lines as it should be [21:38:51] but it only does that when one install a certificate [21:38:54] which would lead to the chain being shit [21:38:59] and apparently by default, we install no cert at all [21:39:03] actually... 
[21:39:05] so there is no chained either [21:39:05] $ less /etc/ssl/certs/star.wikimedia.org.chained.pem [21:39:14] oh [21:39:15] hashar: and ${ca} is what is listed in the patch above as "RapidSSL_CA.pem GeoTrust_Global_CA.pem" [21:39:28] hrmm. [21:39:33] which is what I said somewhere above rapid ssl ca is only installed when one invokes installcertificate [21:39:39] which is not the case on a default instance [21:39:47] so I guess we want to require the various certificates in base.pp [21:40:11] hashar: well, if the cert files disappear one would expect it to do that again [21:40:28] so is this just a labs issue now? [21:40:34] now? [21:40:52] well, its always been just a labs issue i think [21:40:58] I guess so [21:41:03] my fix doesnt seem to affect either thing (fixing the top level of CA chain) but meh [21:41:09] cause I could access git.wm.o from my Safari browser [21:41:13] it worked before but had bad info, it works now just fine [21:41:23] its always been labs, i just assumed labs was more strict about the chain. [21:41:26] certs are confusing hehe [21:41:34] ok, just making sure [21:41:37] i think we may need someone who gets how labs interfaces with our cluster better [21:41:49] but dunno. [21:42:07] if we get the .pem files in /etc/ssl/certs that would fix it isn't it ? [21:45:52] shouldnt they install anyhow/ [21:45:56] they are included in certs.pp [21:46:09] but perhaps thats not what installs them.. [21:47:24] yeah I guess [21:47:25] According to the history of Jenkins job https://integration.wikimedia.org/ci/job/mwext-browsertests-UniversalLanguageSelector-phantomjs/ (which uses https://git.wikimedia.org/ ). That stopped working between Dec 10 2013 17:15 and Dec 11 2013 13:40UTC. 
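The puppet resource quoted above builds the chained file with `/bin/cat ${certname}.pem ${ca} > ${location}/${certname}.chained.pem`. RobH suspects that a source file without a trailing newline makes an `END CERTIFICATE` line fuse with the next `BEGIN CERTIFICATE` line. The sketch below is a minimal illustration of that failure and of a guarded concatenation. `build_chain` and `count_certificates` are hypothetical names. The code keeps `cat`'s order (leaf first, then the CA files as listed), and it does not decode the base64 payload or verify any signatures.

```python
BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def build_chain(pem_texts):
    """Concatenate PEM texts in order, like `cat leaf.pem ca1.pem ca2.pem`,
    but make sure each file ends with a newline so an END marker can never
    be glued to the following BEGIN marker."""
    return "".join(t if t.endswith("\n") else t + "\n" for t in pem_texts)


def count_certificates(chain_text):
    """Count certificates whose BEGIN and END markers each sit on their own
    line; raise ValueError if a marker is fused with other text or the
    markers are unbalanced."""
    begins = ends = 0
    for line in chain_text.splitlines():
        stripped = line.strip()
        if stripped == BEGIN:
            begins += 1
        elif stripped == END:
            ends += 1
        elif BEGIN in stripped or END in stripped:
            raise ValueError("fused PEM marker: %r" % stripped)
    if begins != ends:
        raise ValueError("unbalanced markers: %d BEGIN vs %d END" % (begins, ends))
    return begins
```

For the star.wikimedia.org chain as listed in certs.pp (`star.wikimedia.org.pem`, `RapidSSL_CA.pem`, `GeoTrust_Global_CA.pem`), a correct file would give a count of 3. This matches the "three begin and end certificate lines" RobH asks hashar to look for.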
[21:47:32] not sure what happened [21:47:38] install_certificate{ "gerrit.wikimedia.org": ca => "RapidSSL_CA.pem" } [21:47:54] funny thing is that I have the same issue on lanthanum.eqiad.wmnet as well [21:47:55] that is included in the other rapidssl cert installs as example [21:47:56] heh [21:48:02] it is missing the RapidSSL_CA.pem file :D [21:48:15] old cert was somehow using equifax_ca which is included in distro [21:48:24] ahh [21:48:24] or some basic package that is [21:48:30] so that explains this i think. [21:48:52] something got changed a month ago so ? [21:49:21] no, it changed yesterday [21:49:25] so then thats not it [21:49:25] heh [21:49:29] :D [21:49:44] so this is unrelated, which means i have no idea [21:49:51] every idea i go to is totally wrong for this =P [21:49:53] heh [21:49:54] let me double check the jenkins job hhistory [21:50:21] ah was wrong [21:50:31] first fail is Jan 10 (today) [21:50:38] daaaamn [21:50:40] last success Jan 8 13:00 utc [21:50:44] i was about to let this be your problem ;] [21:50:55] well hell. [21:51:01] karma's a bitch. [21:51:18] hrmm [21:51:33] hehe [21:52:01] so maybe manifests/role/labsproxy.pp [21:52:06] look at line 61 [21:52:29] so if I look at manifests/certs.pp . the install_certificate require certificates::rapidssl_ca, certificates::digicert_ca, certificates::wmf_ca [21:52:45] I guess those CA are not provided in Ubuntu by default and we need them installed on all server / instances [21:53:00] or our server/instances would not be able to https access to the service signed with those CA [21:53:02] change install_certificate{ 'star.wmflabs.org': [21:53:06] to install_certificate{ 'star.wmflabs.org': ca => "RapidSSL_CA.pem": [21:53:08] UNLESS the server had a certificate installed [21:53:09] seems whats needed to me [21:53:18] just the labsproxy [21:53:21] not each labs instance [21:53:23] i think. [21:53:33] same issue on production rob! :D [21:53:42] I reproduce it on lanthanum.eqiad.wmnet [21:53:50] .... 
[21:53:51] which does not have the RapidSSL_CA.pem [21:54:08] sorry to kill your day :-D [21:54:16] that is two afternoon in week :] [21:54:55] ...how does my home machine work then? [21:54:59] bbleh [21:55:08] you got the rapidssl one ? [21:55:14] no idea honestly :( [21:56:21] i dont think we install the chain file on every system, thats not right [21:56:26] so im not sure [21:56:47] the top level geotrust is on every system that uses ssl as part of its package which should ve enough [21:56:54] i am very confused. [22:02:05] (03PS1) 10Faidon Liambotis: librenms: add librenms::syslog class [operations/puppet] - 10https://gerrit.wikimedia.org/r/106780 [22:02:28] ohh [22:02:41] (03CR) 10Faidon Liambotis: [C: 032] librenms: add librenms::syslog class [operations/puppet] - 10https://gerrit.wikimedia.org/r/106780 (owner: 10Faidon Liambotis) [22:02:50] Certificate chain [22:02:50] 0 s:/serialNumber=06QcQ9dUSZqu5ru7oQSfeCpXiBccrCyh/C=US/O=*.wikimedia.org/OU=GT11518520/OU=See www.rapidssl.com/resources/cps (c)10/OU=Domain Control Validated - RapidSSL(R)/CN=*.wikimedia.org [22:02:52] i:/C=US/O=GeoTrust, Inc./CN=RapidSSL CA [22:02:53] 1 s:/C=US/O=Equifax/OU=Equifax Secure Certificate Authority [22:02:53] i:/C=US/O=Equifax/OU=Equifax Secure Certificate Authority [22:02:59] using openssl s_client -connect git.wikimedia.org:443 [22:03:08] that's wrong [22:03:12] on my laptop I get the GeoTrust one [22:03:18] could it be something being cached ? [22:03:25] no [22:03:31] or queries for misc-web-lb ending on the wrong machine ? [22:04:22] i get all geotrust now that i repalced the ca line [22:05:58] but yea, iron fails [22:06:25] paravoid: So do you know what I've done wrong here? It all looks like it should work to me. 
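The `openssl s_client -connect git.wikimedia.org:443` output pasted above shows the problem directly. Certificate 0 is issued by `RapidSSL CA`, but the next certificate served is the self-signed Equifax root rather than the RapidSSL CA intermediate. In TLS 1.2, each certificate after the first is expected to certify the one before it. The sketch below parses the "Certificate chain" section of `s_client` output and reports where that link breaks. The function names are hypothetical, and the parser only handles the `N s:` / `i:` line layout seen in this log.

```python
import re

# Lines such as " 0 s:/C=US/..." and "   i:/C=US/..." from the
# "Certificate chain" section printed by `openssl s_client`.
SUBJECT = re.compile(r"^\s*(\d+)\s+s:(.*)$")
ISSUER = re.compile(r"^\s*i:(.*)$")


def parse_chain(text):
    """Return (subject, issuer) pairs in the order the server sent them."""
    chain, subject = [], None
    for line in text.splitlines():
        m = SUBJECT.match(line)
        if m:
            subject = m.group(2).strip()
            continue
        m = ISSUER.match(line)
        if m and subject is not None:
            chain.append((subject, m.group(1).strip()))
            subject = None
    return chain


def chain_gaps(chain):
    """Indexes i where cert i's issuer is not cert i+1's subject, i.e. where
    an intermediate looks missing or out of order."""
    return [i for i in range(len(chain) - 1) if chain[i][1] != chain[i + 1][0]]
```

Run against the output quoted in the log, `chain_gaps` flags index 0: the leaf's issuer (`RapidSSL CA`) is not the subject of the certificate that follows it. That is consistent with paravoid's diagnosis that the chain is being served wrong.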
[22:06:47] but yea, inside cluster trying to pull git.wikimedia.org over ssl fails [22:06:54] on home system, works great [22:06:55] pasted the command on bug https://bugzilla.wikimedia.org/show_bug.cgi?id=59910#c6 [22:07:02] For me from my home network `openssl s_client --connect git.wikimedia.org:443` returns the same cert 0 as for hashar and also says "verify error:num=19:self signed certificate in certificate chain" [22:07:06] (03PS1) 10Faidon Liambotis: webserver: fix AllowOverride for SSL too [operations/puppet] - 10https://gerrit.wikimedia.org/r/106781 [22:07:30] RobH: it works on your system because you have the geotrust ca in your certificate store, so it's shortcut [22:07:52] bd808: that's because you didn't pass -CApath /etc/ssl/certs or -CAfile /etc/ssl/ca-certificates.crt [22:08:02] (03CR) 10Faidon Liambotis: [C: 032 V: 032] webserver: fix AllowOverride for SSL too [operations/puppet] - 10https://gerrit.wikimedia.org/r/106781 (owner: 10Faidon Liambotis) [22:11:22] so what exactly has to happen to force systems to download the cert that dont have it? [22:11:56] that's not the problem [22:12:00] the problem is that you're serving the chain wrong [22:12:14] the fact that it works on your system is an accident [22:12:54] RobH: I am going to crash to bed. Whenever you get it figured out, can you comment/close https://bugzilla.wikimedia.org/show_bug.cgi?id=59910 ? :-] [22:13:04] yep [22:13:29] can't stay longer, already have to rewrite my sentences twice cause my english is escaping me :D [22:18:09] hrmm [22:18:21] RobH: I think your new star cert needs to have the RapidSSL Intermeidate CA Bundle appended to it. The cert is see served is just the O=*.wikimedia.org cert. See https://knowledge.rapidssl.com/support/ssl-certificate-support/index?page=content&actp=CROSSLINK&id=SO17664 [22:19:38] It works for clients that have the RapidSSL intermediate locally but that's not universal. 
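paravoid points out that `s_client` only verifies against a CA store when given `-CApath /etc/ssl/certs` or `-CAfile`. He also explains that RobH's home machine succeeds only because it already holds the needed CA locally. A client that verifies the way curl does on the cluster hosts can be sketched with Python's standard `ssl` module. The helper names are hypothetical. Non-TLS failures such as a refused connection are deliberately left to propagate as exceptions.

```python
import socket
import ssl


def make_verifying_context(capath=None, cafile=None):
    """Client context that verifies the served chain against a CA store,
    comparable to passing -CApath/-CAfile to `openssl s_client`.
    With neither argument, the system default CA locations are used."""
    ctx = ssl.create_default_context(cafile=cafile, capath=capath)
    # create_default_context already sets both; spelled out for clarity.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    return ctx


def check_https(host, port=443, capath=None, cafile=None, timeout=10):
    """Return None if the TLS handshake verifies, else the SSL error text."""
    ctx = make_verifying_context(capath=capath, cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return None
    except ssl.SSLError as exc:  # includes ssl.SSLCertVerificationError
        return str(exc)
```

For example, `check_https("git.wikimedia.org", capath="/etc/ssl/certs")` run from a cluster host should return an error string while the intermediate is missing from the served chain, and `None` once the chain is fixed. A host that happens to hold the intermediate locally may pass either way, which is the "accident" described above.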
[22:19:57] yea, it seems we had one on system already, but it only has one of those two certificate entries (in the intermediary) [22:21:01] but that same intermediary file is used for other rapidssl certs without issues [22:21:09] i guess its needed the additional due to wildcard... dunno [22:21:25] (cuz nonwildcard email certs with the intermediary listed, and its shorter than the wildcard by one certifiate) [22:23:54] question about prod/test. From what I understand, this is best used for attributes. How about cookbook code? [22:24:44] hrmm, so i live hacked the added cert into the chain [22:24:47] and now i dont have issue on iron [22:25:09] ie for testing cookbook's. Can I set on the client side, this node is a test node, and then in chef only hand out cookbooks that are test or of a certain version? [22:25:23] i wonder if its included in the nonwildcard if it will break them [22:26:10] though perhaps i can just add an addtional rapidssl_ca_2 and include there [22:26:18] and only include in certs.pp for star.wikimedia.org [22:26:24] paravoid: ^ does that sound sane? [22:26:46] bd808: does it work for you now? [22:26:57] if you wouldnt mind checking, i live hacked a fix into place before puppetizing to see if it was right [22:27:11] it seems ok to me on iron as well as lanthanum, but i like a second set of eyes [22:28:26] RobH: man [22:28:36] ? [22:28:49] focus follow eye problem [22:29:01] eh? [22:29:13] But, yes. It looks like curl fetches the git.wikimedia.org now [22:30:01] openssl is still only showing me the star cert, which was why I was trying to type `man openssl` in another window [22:30:57] ahh [22:31:01] now comments make sense ;] [22:31:23] so i think i can append them in, and it should work, Im going to submit a patchset in a moment [22:34:34] RobH: Here's the full cert chain I'm seeing now: http://pastebin.de/38679. No more warnings aobut self-signed bits. 
[22:34:49] yea, i think we are good once i merge the real fix [22:34:59] seems that rapidssl non wildcard certs use one intermediary [22:35:10] wildcard rapidssl certs use that same intermediary and an additional one [22:35:16] i was missing the additional one [22:36:07] the non wildcard cert emails include the intermediary info [22:36:10] That sort of makes sense. wildcards are the highest risk certs to validate so they are giving themselves a way to revoke them all without having to revoke every cert they ever issued. [22:36:11] the wildcard link to another page with it [22:36:21] yea, just didnt realize it [22:36:34] * bd808 nods [22:40:15] bleh, had to clean and rebase my local repo [22:40:20] was all crufty with abandoned commits [22:40:46] (03PS1) 10RobH: fixes star.wikimedia.org intermidite certificate chain [operations/puppet] - 10https://gerrit.wikimedia.org/r/106785 [22:41:11] hrmm, whats with all the crazy \r [22:41:51] vims gettin crazy with returns, gotta fix [22:42:29] well, guess not, disregard =P [22:43:27] (03CR) 10RobH: [C: 032] "no other ops about to review, and misc-web-lb is already broken as it sits, so self-reviewing and committing to fix what I already broke." [operations/puppet] - 10https://gerrit.wikimedia.org/r/106785 (owner: 10RobH) [22:44:21] lets see if it will remove my hacky chain on its own or if i have to shread [22:44:24] shred even [22:44:26] pushing to cp1043 [22:46:54] bah, missed installing it [22:47:08] cp1043 is going to complain, im on it [22:49:44] hrmm [22:54:47] i would like to contribute somehow: I'm quite good with puppet. Where can I found if there's something to do? Bugzilla? 
[22:57:35] (03PS1) 10RobH: install rapidssl_ca_2.pem [operations/puppet] - 10https://gerrit.wikimedia.org/r/106842 [22:58:38] bugzilla has some i suppose [22:58:45] im not sure of a good answer for that honestly [22:59:22] we use RT, but that is a closed tool since it has security and private info, so cannot really point to that [22:59:38] i didn't found anything that is not already in progress [22:59:51] (03CR) 10RobH: [C: 032] install rapidssl_ca_2.pem [operations/puppet] - 10https://gerrit.wikimedia.org/r/106842 (owner: 10RobH) [23:02:44] woooo my fix works (via puppet rather than live hack) on cp1043 [23:03:25] ilmerovingio: Are you on the wikitech mailing list by chance? I feel I should know the answer to that but do not, it may be something worth emailing there since it has the largest audience of ops and ops volunteers (i think) [23:03:37] sorry its not the actual answer =P [23:04:01] https://lists.wikimedia.org/mailman/listinfo/wikitech-l [23:04:41] not the best answer, but better than politely ignoring you ;D [23:05:37] RobH: thanks, I'm joining now the mailing list [23:06:21] So I don't think there is a really good straight-forward answer to that, and there should be [23:06:33] a discussion on list may be the way to start ball rolling, dunno [23:06:52] or maybe im wrong and there is some easy reference i dunno about, but i doubt it [23:06:54] =] [23:07:11] ilmerovingio: One easy way to get involved could be code review [23:07:16] Not the best task in the world, I knoww [23:07:47] :D [23:08:48] is there already a script/howto to setup a testing environment (vagrant?) 
[23:08:58] Haha, not for puppet no :P [23:09:00] https://wikitech.wikimedia.org/wiki/Main_Page [23:09:01] Reedy: tnx [23:09:05] You'll want to create an account there [23:09:16] https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet,n,z [23:09:46] https://wikitech.wikimedia.org/wiki/Help:Terminology [23:09:53] shows users and how they tie to the other tools somewhat [23:10:19] ie: labs user gets gerrit user cuz they are tied together, etc... [23:11:04] many thanks guys :) I'll start with some code review [23:16:35] (03CR) 10Guido.iaquinti: [C: 031] stages.pp puppet lint fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/104919 (owner: 10Hashar) [23:24:37] (03CR) 10Guido.iaquinti: [C: 04-1] "All strings that do not contain variables or escape characters like \n or \t should be enclosed in single quotes." [operations/puppet] - 10https://gerrit.wikimedia.org/r/104806 (owner: 10Hashar) [23:27:35] I assume Guido is ilmerovingio ? [23:27:45] yes [23:27:50] cool [23:27:55] you're quick :) [23:27:56] hi greg-g [23:28:00] hi there [23:28:13] lol, were just lint code review [23:28:27] yeah, but account creation etc, either way, welcome! [23:28:34] I would like to understand how gerrit works before do something wrong [23:28:44] yeah, we all would ;) [23:28:57] ehehe I'm a system engineer, it was not so difficult lol [23:29:35] but i don't understand how can you test puppet module without using a dev environment like vagrant, docker and so on [23:30:17] Wikipedia is a great testing ground [23:30:31] ...especially when you break it:P [23:30:35] we just always get it right the first time 'round [23:30:43] ALWAYS. [23:30:46] sir, you have no idea what we used to test. [23:31:03] "You must be new around here" [23:31:33] but I notice people are asking this so often [23:31:50] do we have a screenshot of the 2006-2008 "Wikipedia has a problem" page? [23:31:55] this is history, after all :) [23:32:02] with a link to donations on it no? 
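Guido's review comment above states a common Puppet style rule: strings without variables or escape characters such as `\n` or `\t` should use single quotes. puppet-lint ships its own `double_quoted_strings` check for this. The sketch below is only a rough approximation of the rule as worded in the review, not puppet-lint's implementation. The function name is hypothetical. Exempting strings that contain a single quote is an added assumption (single-quoting them would require escaping). Comments and multi-line strings are not handled.

```python
import re

# A double-quoted string, allowing backslash escapes inside it.
DQ_STRING = re.compile(r'"((?:[^"\\]|\\.)*)"')


def needless_double_quotes(source):
    """Return (line number, string) pairs for double-quoted strings that
    contain no variable ($), no escape (backslash) and no single quote."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in DQ_STRING.finditer(line):
            body = match.group(1)
            if "$" in body or "\\" in body or "'" in body:
                continue
            findings.append((lineno, match.group(0)))
    return findings
```

Applied to the `command => "/bin/cat ${certname}.pem ${ca} > ..."` line quoted earlier in the log, this reports nothing, because the string interpolates variables. `ensure => "present"` would be flagged.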
[23:32:06] iirc [23:32:32] yeah [23:32:42] nono i'm not saying that you are not doing it right lol, just that for new developers or volounter it could be better to have a test environment [23:32:50] :) [23:32:56] oh go on, don't be so shy. [23:33:02] (and sorry for my bad English) [23:33:02] you mean a base test or like, mediawiki? [23:33:05] test it on the live thing [23:33:08] :-P [23:33:16] cuz we have labs for testing your puppet code, but not a perfect production replica [23:33:22] which would indeed be nice. [23:33:24] base test [23:33:35] for testing in developers worstation [23:33:52] at the commit the test are triggered —> staging —> production [23:34:31] <^demon|away> ilmerovingio: We do have a test environment. We've been working on a project with Vagrant to make it easy to spin up development VMs. [23:34:45] <^demon|away> https://www.mediawiki.org/wiki/MediaWiki-Vagrant [23:35:18] cool ^demon|away i didn't see this page before [23:36:51] (03CR) 10Guido.iaquinti: [C: 031] Use eth0 IP rather than localhost for multi-region [operations/puppet] - 10https://gerrit.wikimedia.org/r/102345 (owner: 10Ryan Lane)