[00:02:57] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 21:02:40 UTC
[00:14:41] <^d> ori, greg-g: sent to ops list.
[01:03:47] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Fri May 30 01:03:37 UTC 2014
[01:07:24] (CR) Gergő Tisza: "Ic62984e0f4a761642b2bdd1bfa362301ed94c284, if merged, will limit the number of parallel expensive thumbnailing attempts directly, so that " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132112 (owner: Gergő Tisza)
[01:18:05] (PS1) Springle: Configure dbstore* for heavier TokuDB usage, but use Aria by default for the analytics staging temporary tables. [operations/puppet] - https://gerrit.wikimedia.org/r/136271
[01:28:57] jenkins seems even slower than normal
[01:47:01] (CR) Springle: [C: 2 V: 2] Configure dbstore* for heavier TokuDB usage, but use Aria by default for the analytics staging temporary tables. [operations/puppet] - https://gerrit.wikimedia.org/r/136271 (owner: Springle)
[01:48:57] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 19:47:08 UTC
[01:50:57] PROBLEM - Puppet freshness on dataset2 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 19:49:01 UTC
[02:31:54] !log LocalisationUpdate completed (1.24wmf6) at 2014-05-30 02:30:51+00:00
[02:32:01] Logged the message, Master
[02:46:05] (PS1) Dzahn: puppetize researchdb password file [operations/puppet] - https://gerrit.wikimedia.org/r/136273
[02:50:54] (PS2) Dzahn: puppetize researchdb password file [operations/puppet] - https://gerrit.wikimedia.org/r/136273
[02:54:12] (PS3) Dzahn: puppetize researchdb password file [operations/puppet] - https://gerrit.wikimedia.org/r/136273
[02:57:09] (PS4) Dzahn: puppetize researchdb password file [operations/puppet] - https://gerrit.wikimedia.org/r/136273
[02:57:42] (CR) Dzahn: "still need this for I30b59c7b705a127" [operations/puppet] - https://gerrit.wikimedia.org/r/135715 (owner: Dzahn)
[03:01:17] !log LocalisationUpdate completed (1.24wmf7) at 2014-05-30 03:00:14+00:00
[03:01:22] Logged the message, Master
[03:11:57] (CR) Tnegrin: [C: 1] "I approve putting the researchers in a separate group and puppetizing but do not have the technical expertise to approve the code changes." [operations/puppet] - https://gerrit.wikimedia.org/r/136273 (owner: Dzahn)
[03:45:36] (PS1) Ori.livneh: role::mediawiki: clear up nested class hierarchy [operations/puppet] - https://gerrit.wikimedia.org/r/136275
[03:45:38] (PS1) Ori.livneh: Tidy role::mediawiki following I9fe649547 [operations/puppet] - https://gerrit.wikimedia.org/r/136276
[03:47:16] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri May 30 03:46:09 UTC 2014 (duration 46m 8s)
[03:47:19] Logged the message, Master
[04:11:28] (CR) Springle: [C: 1] puppetize researchdb password file [operations/puppet] - https://gerrit.wikimedia.org/r/136273 (owner: Dzahn)
[04:49:57] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 19:47:08 UTC
[04:51:57] PROBLEM - Puppet freshness on dataset2 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 19:49:01 UTC
[05:28:49] (PS1) Reza: Enable FlaggedRevs for Central Kurdish Wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136282 (https://bugzilla.wikimedia.org/65809)
[05:41:47] (PS1) Reza: Enable MergeHistory for Persian Wikipedia admins [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136284
[06:29:45] (CR) Nemo bis: [C: 1] Add French Ministry for Culture to wgCopyUploadsDomains [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136154 (https://bugzilla.wikimedia.org/65905) (owner: Jean-Frédéric)
[06:31:27] (CR) Calak: [C: 1] Enable MergeHistory for Persian Wikipedia admins [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136284 (owner: Reza)
[06:50:50] (PS3) Christopher Johnson (WMDE): Icinga: new command "check_dispatch" for Wikidata [operations/puppet] - https://gerrit.wikimedia.org/r/136095
[06:56:43] (CR) Nemo bis: "Not without a bug report" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136284 (owner: Reza)
[07:03:55] (CR) Calak: "https://bugzilla.wikimedia.org/show_bug.cgi?id=65937" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136284 (owner: Reza)
[07:05:23] (PS2) Nemo bis: Enable MergeHistory for Persian Wikipedia admins [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136284 (https://bugzilla.wikimedia.org/65938) (owner: Reza)
[07:50:57] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 19:47:08 UTC
[07:52:57] PROBLEM - Puppet freshness on dataset2 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 19:49:01 UTC
[08:28:39] (CR) Alexandros Kosiaris: [C: -1] "Minor stuff. The approach is sane though." (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/136273 (owner: Dzahn)
[08:43:36] (PS3) QChris: Allow to set up hive's auxpath globally [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/135539
[08:47:25] (PS2) Alexandros Kosiaris: thallium,mercury ganeti eval machines setup [operations/puppet] - https://gerrit.wikimedia.org/r/136104
[08:47:27] (PS2) QChris: Use HCatalog as default auxpath for Hive [operations/puppet] - https://gerrit.wikimedia.org/r/136014
[08:49:18] (CR) QChris: Allow to set up hive's auxpath globally (2 comments) [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/135539 (owner: QChris)
[08:50:59] (CR) QChris: Use HCatalog as default auxpath for Hive (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/136014 (owner: QChris)
[09:17:31] !log Jenkins/Zuul locked
[09:17:36] Logged the message, Master
[09:18:36] (CR) Alexandros Kosiaris: [C: 2 V: 2] thallium,mercury ganeti eval machines setup [operations/puppet] - https://gerrit.wikimedia.org/r/136104 (owner: Alexandros Kosiaris)
[09:24:29] (PS1) Odder: Add wgImportSources entries for Wikimedia Poland wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136291 (https://bugzilla.wikimedia.org/65908)
[09:24:47] !log Jenkins: disconnecting and reconnecting labs slaves to reregister them with Zuul
[09:24:52] Logged the message, Master
[09:26:21] (PS1) Giuseppe Lavagetto: monitoring: monitor mediawiki jobs [operations/puppet] - https://gerrit.wikimedia.org/r/136292
[09:26:48] !log Zuul is processing jobs again. For reference bug is {{bug|63760}}
[09:26:53] Logged the message, Master
[09:29:13] (CR) Odder: "Scheduled for deployment during the morning SWAT window on Monday (2014-06-02)." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136291 (https://bugzilla.wikimedia.org/65908) (owner: Odder)
[09:37:39] (CR) jenkins-bot: [V: -1] puppetize researchdb password file [operations/puppet] - https://gerrit.wikimedia.org/r/136273 (owner: Dzahn)
[09:40:01] (CR) Filippo Giunchedi: [C: 1] monitoring: monitor mediawiki jobs [operations/puppet] - https://gerrit.wikimedia.org/r/136292 (owner: Giuseppe Lavagetto)
[09:40:44] (CR) jenkins-bot: [V: -1] Use HCatalog as default auxpath for Hive [operations/puppet] - https://gerrit.wikimedia.org/r/136014 (owner: QChris)
[10:10:31] pff
[10:10:40] ah that is data.yaml not linting hehe
[10:11:06] _joe_: the puppet change apparently depends on a submodule commit that might not have been merged yet
[10:11:15] 807519d285672cc6abbb2d3f22000285d8a7a6f9 from cdh4
[10:11:25] yeah https://gerrit.wikimedia.org/r/#/c/135539/ (unmerged)
[10:11:50] <_joe_> hashar: which puppet change?
[10:12:29] (CR) Hashar: "Depends on https://gerrit.wikimedia.org/r/#/c/135539/ and Jenkins/Zuul have no support for cross repositories dependencies yet :(" [operations/puppet] - https://gerrit.wikimedia.org/r/136292 (owner: Giuseppe Lavagetto)
[10:13:01] <_joe_> hashar: ??
[10:13:10] <_joe_> our comment puzzles me
[10:13:24] sorry I am not fully awake yet
[10:13:27] <_joe_> how could a graphite check depend on jenkins?
[10:13:36] <_joe_> s/our/your/
[10:13:41] the ops/puppet change update the puppet/cdh4 submodule to commit 807519d
[10:13:46] but that commit has not been merged yet
[10:14:15] <_joe_> hashar: you commented on the wrong change I'd say
[10:14:22] oh
[10:14:43] <_joe_> :)
[10:14:46] (CR) Hashar: "Wrong change sorry. Dismiss my previous comment." [operations/puppet] - https://gerrit.wikimedia.org/r/136292 (owner: Giuseppe Lavagetto)
[10:15:05] (CR) Hashar: "Depends on https://gerrit.wikimedia.org/r/#/c/135539/ and Jenkins/Zuul have no support for cross repositories dependencies yet :(" [operations/puppet] - https://gerrit.wikimedia.org/r/136014 (owner: QChris)
[10:15:33] I meant to comment on qchris puppet change https://gerrit.wikimedia.org/r/#/c/136014/ (chris: that bump the submodule to a commit which hasn't been merged yet so Jenkins git can't find it and fail :-( )
[10:15:51] Yep.
[10:16:06] I pushed the dependent change nonetheless, so ottomata can have a look.
[10:16:27] No Jenkins/zuul issue :-)
[10:16:42] Zuul upstream has a patch that would make it recognize Depends-On: in commit summary :-}
[10:17:01] Really? Awwww. That would be great.
[10:17:29] But then ... we'll switch to phabricator :-)
[10:18:53] hmm
[10:18:56] yeah maybe :-]
[10:19:02] Hahaha :-D
[10:19:31] which mean overhauling the CI infrastructure
[10:19:44] either add support for Phabricator to Zuul (and writing a custom stream-events for Phabricator)
[10:20:03] or figure out a way to have Phabricator to trigger tests somewhere (maybe a self hosted Travis)
[10:20:12] Aren't the Phabricator people building the own CI tool?
[10:20:28] I guessed we'll switch to that as well sooner or later to meet the
[10:20:37] "One tool to rule them all" part.
[10:21:27] Anyways ... I am fine with Zuul not tracking submodule dependencies ...
[10:21:36] Although it would be kind of neat.
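The `Depends-On:` idea raised in the exchange above can be illustrated with a small sketch. This is not Zuul's implementation: the footer format (`Depends-On:` followed by a Gerrit-style Change-Id) and the function name `parse_depends_on` are assumptions made only for illustration.

```python
import re
from typing import List

# Assumed footer format: "Depends-On: I<40 lowercase hex chars>" on its own line.
DEPENDS_ON_RE = re.compile(r"^Depends-On:[ \t]*(I[0-9a-f]{40})[ \t]*$", re.MULTILINE)


def parse_depends_on(commit_message: str) -> List[str]:
    """Return the Change-Ids named in Depends-On footers, in order, without duplicates."""
    change_ids: List[str] = []
    for change_id in DEPENDS_ON_RE.findall(commit_message):
        if change_id not in change_ids:
            change_ids.append(change_id)
    return change_ids
```

A CI scheduler could use such a list to hold a change until the changes it names are merged, which is the cross-repository case hit by the cdh4 submodule bump.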
[10:24:01] that might be possible
[10:24:09] will have to fix that for the mediawiki/core wmf branches anyway :(
[10:49:30] (PS3) Giuseppe Lavagetto: puppet_compiler: add ferm rule to allow web access [operations/puppet] - https://gerrit.wikimedia.org/r/135050
[10:50:01] (CR) Giuseppe Lavagetto: [C: 2] puppet_compiler: add ferm rule to allow web access [operations/puppet] - https://gerrit.wikimedia.org/r/135050 (owner: Giuseppe Lavagetto)
[10:51:57] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 19:47:08 UTC
[10:53:57] PROBLEM - Puppet freshness on dataset2 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 19:49:01 UTC
[11:58:40] (CR) Faidon Liambotis: [C: -1] "Drop the debs from the patchset, these don't belong here." (12 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/136095 (owner: Christopher Johnson (WMDE))
[12:00:09] ssh: Could not resolve hostname bast1001.wikimedia.org: nodename nor servname provided, or not known
[12:00:10] :-(
[12:00:44] must be my local DNS resolver
[12:45:23] <_joe_> hashar: I had a few issues like that today, but I reverted to google dns and all went well
[12:50:05] (PS1) Alexandros Kosiaris: Weekly schedule for dbstore1001 [operations/puppet] - https://gerrit.wikimedia.org/r/136309
[12:54:39] _joe_: might mean our NS have some issue :-(
[12:54:44] I havent investigated though
[12:55:56] I doubt we'd only hear it from you two
[12:56:16] graphs look normal
[13:01:03] (PS1) Hashar: contint: fix resource ordering for labs slave [operations/puppet] - https://gerrit.wikimedia.org/r/136310
[13:09:36] (CR) Anomie: Add wgImportSources entries for Wikimedia Poland wiki (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136291 (https://bugzilla.wikimedia.org/65908) (owner: Odder)
[13:11:54] Coren: re: UA based blocking, it's fairly easy to do. We just keep the list in puppet and have nginx block them all. Shouldn't have too much perf implications either, so we *can* do it if it becomes a problem
[13:13:50] (CR) Hashar: [C: 1 V: 2] "Deployed on the integration puppetmaster. That fixed the issue puppet had of /var/cache/pbuilder already existing while creating the syml" [operations/puppet] - https://gerrit.wikimedia.org/r/136310 (owner: Hashar)
[13:17:07] YuviPanda: It might be worth doing for some of the more obnoxious UAs; but I know last time I was being strong on bots some tool maintainers felt it harmful that their tool couldn't be spidered by search engines.
[13:17:49] (CR) Alexandros Kosiaris: [C: 2] Weekly schedule for dbstore1001 [operations/puppet] - https://gerrit.wikimedia.org/r/136309 (owner: Alexandros Kosiaris)
[13:18:01] The problem is mostly with those tools that have internal circular get query links; an unwise spider can be highly annoying there.
[13:25:34] !log Jenkins polling a third CI slave integration-slave1003.
[13:25:34] (CR) Ottomata: [C: 2 V: 2] "Yup, setting undef explicitly is what I meant. Thanks!" [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/135539 (owner: QChris)
[13:25:39] Logged the message, Master
[13:25:56] (Abandoned) Reedy: Update wgServer, wgCanonicalServer for sub.subdomain wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/53885 (https://bugzilla.wikimedia.org/31335) (owner: Reedy)
[13:26:18] (PS2) Odder: Add wgImportSources entries for Wikimedia Poland wiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136291 (https://bugzilla.wikimedia.org/65908)
[13:26:48] (CR) Odder: Add wgImportSources entries for Wikimedia Poland wiki (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136291 (https://bugzilla.wikimedia.org/65908) (owner: Odder)
[13:26:57] !log Jenkins lowering number of executors on labs slave from 5 to 4 since they have 4 CPU
[13:26:59] (PS3) Ottomata: Use HCatalog as default auxpath for Hive [operations/puppet] - https://gerrit.wikimedia.org/r/136014 (owner: QChris)
[13:27:01] Logged the message, Master
[13:28:25] (CR) Ottomata: [C: -1] "Looks good, except now that the submodule change is merged, it is at a different SHA. This needs a new patch so that cdh4 points at the l" [operations/puppet] - https://gerrit.wikimedia.org/r/136014 (owner: QChris)
[13:34:18] (CR) Alex Monk: "Dependency has been abandoned" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99299 (owner: MZMcBride)
[13:38:03] (CR) QChris: "> Looks good, except now that the submodule change is merged, > it is at a different SHA. This needs a new patch so that" [operations/puppet] - https://gerrit.wikimedia.org/r/136014 (owner: QChris)
[13:42:31] !log Jenkins: removing label hasJenkinsDebianGlue from labs slaves {{gerrit|136315}}
[13:42:35] Logged the message, Master
[13:43:28] !log Jenkins: removing label hasHhvm from labs slaves {{gerrit|136315}}
[13:43:32] Logged the message, Master
[13:47:58] !log Jenkins: removing label hasBrowserTests from labs slaves {{gerrit|136315}}
[13:48:03] Logged the message, Master
[13:52:54] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 19:47:08 UTC
[13:52:58] (PS1) Whym: FeaturedFeeds for Wiktionary [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136316
[13:54:29] (PS1) Filippo Giunchedi: merge bits appservers into appservers [operations/puppet] - https://gerrit.wikimedia.org/r/136317
[13:54:54] PROBLEM - Puppet freshness on dataset2 is CRITICAL: Last successful Puppet run was Thu 29 May 2014 19:49:01 UTC
[13:55:00] (CR) Filippo Giunchedi: "NOT to be merged until next week, for obvious friday reasons" [operations/puppet] - https://gerrit.wikimedia.org/r/136317 (owner: Filippo Giunchedi)
[13:56:01] <_joe_> godog: I'm not sure that is a good idea; remember you have different load-balancing systems in the two cases
[13:56:44] <_joe_> so, in one case (bits) it's based on consistent hashing
[13:56:54] (CR) Whym: "This is without a bugzilla request, but I think is uncontroversial. English Wiktionary has had a setting for this in place: https://en.wi" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/136316 (owner: Whym)
[13:57:03] <_joe_> as fare as I've been told
[13:57:15] why would this matter?
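For readers following the question being asked here, a minimal sketch of the two selection strategies under discussion: hash-based selection, where the same URL always lands on the same backend, and random selection, where it does not. The backend names and the simple hash ring are hypothetical; this is not Varnish's director code.

```python
import bisect
import hashlib
import random


class HashRing:
    """Toy consistent-hash ring: each backend owns several points on the ring."""

    def __init__(self, backends, replicas=64):
        self._ring = sorted(
            (self._hash("%s#%d" % (backend, i)), backend)
            for backend in backends
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def pick(self, url):
        # First ring point clockwise from the URL's hash, wrapping at the end.
        index = bisect.bisect(self._keys, self._hash(url)) % len(self._ring)
        return self._ring[index][1]


def pick_random(backends, rng=random):
    """Random selection: any backend may serve any URL."""
    return rng.choice(backends)
```

Hash-based selection mainly pays off when backends keep per-URL state worth reusing; with stateless backends the two strategies differ little in effect.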
[13:57:26] this doesn't touch varnish btw, just point varnish backends to the general appservers pool
[13:58:05] <_joe_> godog: sorry, I thought you pointed varnish to the appserver LVS balancer
[13:58:23] that's what he did
[13:58:27] why would consistent hashing matter, though?
[13:58:55] <_joe_> paravoid: I've no idea, but we're changing that.
[13:59:36] <_joe_> so, I'd like to understand first why it was done like that in the first place
[13:59:59] <_joe_> and you may have the answer
[14:00:02] consistent hashing at what layer?
[14:00:07] director_type => "random",
[14:00:12] we're not doing chash for bits, no
[14:00:27] <_joe_> oh ok, so who told me that? I dunno
[14:00:36] <_joe_> godog: then nevermind
[14:00:39] :)
[14:00:46] there's no point in doing chash for this
[14:00:49] as the appservers are stateless
[14:01:06] for backend caching we use memcache, and that's a common pool
[14:01:21] <_joe_> paravoid: it seemed to me it could be an attempt at optimizing apc and other caches on the appservers
[14:01:30] <_joe_> ok
[14:01:38] nah
[14:02:17] <_joe_> it did not make that much sense to me, but I assume I am still missing something if one thing does not convince me
[14:02:19] (PS1) Alexandros Kosiaris: Fix a variable reassignment issue [operations/puppet] - https://gerrit.wikimedia.org/r/136320
[14:02:27] * _joe_ still a n00b
[14:02:35] I was on the edge before seeing how small bits is compared to the general appservers
[14:02:47] yeah :)
[14:03:01] <_joe_> bits is small for a good reason
[14:03:16] <_joe_> I think the cache hits on varnish are incredibly high on them
[14:03:38] <_joe_> so you hit the backend much much less and for less expensive resources
[14:03:57] the resources themselves are expensive
[14:04:08] to generate, I mean
[14:04:13] <_joe_> when something like yesterday happens (a ton of uncached static files on all large wikipedias)
[14:04:31] but the URLs are small in quantity and the total dataset is small in size, so it all fits into memory
[14:04:31] <_joe_> it will blow to pieces even if we have 20 servers, IMO
[14:04:53] <_joe_> if I know something about php behaviour :)
[14:07:07] the other test would be a deploy and see how it fares
[14:07:45] (CR) Ottomata: [C: 2 V: 2] "Oh they do match...hmmm. ok!" [operations/puppet] - https://gerrit.wikimedia.org/r/136014 (owner: QChris)
[14:08:02] oh bison...
[14:08:49] (CR) Alexandros Kosiaris: [C: 2] Fix a variable reassignment issue [operations/puppet] - https://gerrit.wikimedia.org/r/136320 (owner: Alexandros Kosiaris)
[14:11:29] heya chasemp, i think the stats git issue is on analytics1027 as well
[14:11:35] gid*
[14:13:30] I'm reading the bison source
[14:13:32] omg
[14:13:33] Taking my car in to the shop can fix in a bit, or I just did the gid shuffle to free up 1001 if you on want to tackle it
[14:13:37] my eyes
[14:14:07] Ottomata ^
[14:14:31] (CR) Filippo Giunchedi: [C: 1] contint: fix resource ordering for labs slave (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/136310 (owner: Hashar)
[14:15:37] k chasemp i'll get it
[14:15:49] what node did you do that on?
[14:15:51] 1003?
[14:15:57] i'll just make gids match
[14:27:56] ori: around?
[14:33:03] * YuviPanda waves at bd808
[14:36:14] hey YuviPanda
[14:36:28] bd808: the m&ms were finished today
[14:36:55] YuviPanda: Oh no! Now you are without sugar.
[14:37:07] bd808: indeed. but still in Scotland, so that might be fixable.
[14:37:18] bd808: I have, however, bought about 3 liters of Sriracha sauce
[14:37:27] :)
[14:37:35] bd808: :D
[14:38:23] bd808: in other ways, would be nice to review the approach taken in https://gerrit.wikimedia.org/r/#/c/135499/. latest PS and PS10 have slightly different approaches, and am wondering if you have any thoughts on less invasive solution to the hackiness in PS10. Whenever you have the time...
[14:38:50] (PS1) Alexandros Kosiaris: Specify the backup pool in mysqlset [operations/puppet] - https://gerrit.wikimedia.org/r/136324
[14:39:26] paravoid: I still can't log into mw1163.eqiad.wmnet and it's sick. The l10n cache needs to be rebuilt there with `sudo -u mwdeploy -n -- /usr/local/bin/scap-rebuild-cdbs`
[14:40:36] YuviPanda: I'll put it on my list of things to look at.
[14:40:39] bd808: ty!
[14:47:21] cmjohnson1, chasemp: Do either of you have a minute to fix l10n cache on mw1163.eqiad.wmnet? It needs `sudo -u mwdeploy -n -- /usr/local/bin/scap-rebuild-cdbs` to be run and ssh isn't letting me authenticate to the host. I think at this point it is only logging errors because of health checks; the error rate isn't high enough to indicate that it's pooled.
[14:47:56] <_joe_> bd808: I can do that
[14:48:10] Awesome. Thanks
[14:48:26] <_joe_> bd808: I think there was some issue with mw-sync though
[14:48:39] <_joe_> let me check with the other ops
[14:48:51] (PS4) Christopher Johnson (WMDE): Icinga: new command "check_dispatch" for Wikidata [operations/puppet] - https://gerrit.wikimedia.org/r/136095
[14:48:52] That was the host where /usr/local/apache/common-local was missing
[14:49:46] It is making noises now in the logs like it got the mw code installed but hasn't had the l10n compiled.
[14:50:00] <_joe_> ok lemme check
[14:51:44] <_joe_> bd808: running
[14:52:16] (CR) Faidon Liambotis: [C: -1] rcs: install servers with trusty (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/136022 (owner: Giuseppe Lavagetto)
[14:53:23] <_joe_> bd808: done
[14:53:28] <_joe_> should i log this?
[14:53:31] I don't remember faidon repooling it (1163), but remember his explicitly not repooling it yesterday.
[14:53:35] _joe_: yes please
[14:54:00] (CR) Alexandros Kosiaris: [C: 2] Specify the backup pool in mysqlset [operations/puppet] - https://gerrit.wikimedia.org/r/136324 (owner: Alexandros Kosiaris)
[14:54:00] greg-g: I don't think it is pooled. It's just being hit by some health check.
[14:54:03] <_joe_> !log ran scap-rebuild-cdb on mw1163
[14:54:08] Logged the message, Master
[14:54:14] <_joe_> you can verify if it's pooled btw
[14:54:20] the depool -> fix -> repool process needs to be much more automated/fool proof than it is now. :/
[14:54:32] it used to be fine
[14:54:40] the process or the machine?
[14:54:44] the process
[14:54:46] * greg-g nods
[14:54:55] it used to be the case where the machine would boot up, apache would *not* be running
[14:54:59] what changed?
[14:55:00] ah
[14:55:03] puppet would do mw-sync, then start apache
[14:55:05] <_joe_> http://noc.wikimedia.org/pybal/eqiad/apaches it is not pooled
[14:55:14] mw-sync is kinda broken now
[14:55:17] in at least two ways
[14:55:35] well, the combination of puppet/mw-sync i should say
[14:57:05] * greg-g waits patiently
[14:57:10] it was the existence of common-local at first
[14:57:15] now it breaks with err: /Stage[main]/Misc::Deployment::Vars/File[/a/common]/ensure: change from absent to directory failed: Cannot create /a/common; parent directory /a does not exist
[14:57:32] plus the /usr/local/bin/scap-rebuild-cdbs that bd808 mentioned above
[14:57:56] I did at least addd scap-rebuild-cdbs to puppet a while ago; I think
[14:59:13] Yeah. exec { 'mw-sync-rebuild-cdbs': ... } in mediawiki::sync
[14:59:29] so why did someone need to run it manually now?
[14:59:33] needed*
[14:59:45] But on that host, mw-sync failed in puppet because of missing /usr/local/apache/common-local
[15:00:03] I think it the sync-common was run by hand.
[15:00:12] Maybe puppet isn't enabled there now?
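The Puppet error quoted above ("Cannot create /a/common; parent directory /a does not exist") reflects a `file` resource creating only the named directory, not its missing parents. The following Python sketch is only an analogy of that behaviour, not a fix for the manifests; the function names are hypothetical, and in practice it would be run against a temporary directory rather than `/a`.

```python
import os
import tempfile


def create_without_parents(path):
    """Create only the leaf directory; report failure if the parent is missing."""
    try:
        os.mkdir(path)
        return True
    except FileNotFoundError:
        return False


def create_with_parents(path):
    """Create the directory together with any missing parents (like `mkdir -p`)."""
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)
```

The first function fails for `<root>/a/common` on a fresh host in the same way the manifest does; the second corresponds to making sure `/a` is created before `/a/common`.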
[15:00:17] * bd808 is guessing
[15:00:37] err: /Stage[main]/Misc::Deployment::Vars/File[/a/common]/ensure: change from absent to directory failed: Cannot create /a/common; parent directory /a does not exist
[15:00:38] (PS4) Giuseppe Lavagetto: rcs: install servers with trusty [operations/puppet] - https://gerrit.wikimedia.org/r/136022
[15:01:05] (and stops after that)
[15:01:07] Blech. /a should die
[15:01:18] /apache too :)
[15:01:39] ottomata: did you get that gid thing straightened out?
[15:01:50] paravoid: FWIW, I ran into that parent issue too, supposedly it is fixed in 2.7 as per http://projects.puppetlabs.com/issues/6368 but I couldn't figure out why it didn't
[15:02:18] <_joe_> godog: parent issue?
[15:02:22] <_joe_> what is that?
[15:02:54] <_joe_> godog: that never worked before 3.x
[15:02:59] (PS1) Hashar: beta: bring in mediawiki/skins.git [operations/puppet] - https://gerrit.wikimedia.org/r/136325 (https://bugzilla.wikimedia.org/65868)
[15:03:04] <_joe_> even if the docs said otherwise
[15:03:36] So currently puppet is missing creation of /a and /usr/local/apache/common-local for a new host.
[15:03:49] fair enough, I didn't dig into the commit from the issue linked
[15:03:53] bd808: do we actually use /a anywhere?
[15:04:28] On tin yes. Elsewhere I'm not sure
[15:04:50] /a/common is the prep location on tin for scap/sync-*
[15:04:59] And the root for the rsync server
[15:05:06] All of which is pretty easy to change
[15:05:42] https://gerrit.wikimedia.org/r/#/c/136251/2/modules/mediawiki/manifests/sync.pp
[15:05:51] morning
[15:06:19] (PS2) Hashar: beta: bring in mediawiki/skins.git [operations/puppet] - https://gerrit.wikimedia.org/r/136325 (https://bugzilla.wikimedia.org/65868)
[15:06:41] ori: misc::deployment::vars needs a require on /a/common too
[15:07:11] the whole thing is such a fucking mess
[15:07:27] yeah it's a tangle of legacy whatever
[15:07:48] the initial bootstrapping should be a shell script perhaps rather than a bunch of resources linking arms across multiple files in the puppet repo
[15:08:40] also the way labs is handled is very poor
[15:09:03] We were under some time pressure :)
[15:09:09] chasemp: naw been in meetings
[15:09:14] having the same class applied on labs and production is a pretty hollow victory if it is accomplished by having if labs { ..... } else { .... }
[15:09:41] let's never again compromise production for the sake of labs
[15:09:41] that's how betalabs has been setup traditionally :(
[15:09:55] although hashar will remember all of my -1s (and occasionally -2s) against that
[15:10:18] hiera("mw_stage_dir")
[15:10:39] it doesn't need to be different across labs and prod
[15:10:44] ori: with all bryan work on beta, we can surely drop some if $::realm labs
[15:10:51] (CR) BryanDavis: "misc::deployment::vars has a define for File[$mw_common_source] that needs a Require => File['/a'] as well. And we need to rewrite the who" [operations/puppet] - https://gerrit.wikimedia.org/r/136251 (owner: Ori.livneh)
[15:11:04] <_joe_> hashar: $mw_stage_dir in puppet 3 then!
[15:11:11] _joe_: !!!!
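The point made above about replacing `if labs { ... } else { ... }` branches with a data lookup such as `hiera("mw_stage_dir")` can be sketched as follows. This is a hypothetical Python analogue of a hierarchical lookup, not Hiera itself: the key `mw_stage_dir` and the path `/a/common` come from the discussion, while the `lookup` function, the level names, and the `hypothetical_key` entry are made up for illustration.

```python
def lookup(key, realm, hierarchy):
    """Return the most specific value for key: realm-level data first, then common."""
    for level in (realm, "common"):
        data = hierarchy.get(level, {})
        if key in data:
            return data[key]
    raise KeyError(key)


# Illustrative data only: one shared value, one made-up realm-specific override.
HIERARCHY = {
    "common": {"mw_stage_dir": "/a/common", "hypothetical_key": "default"},
    "labs": {"hypothetical_key": "labs-only"},
}
```

With this shape, code that needs the staging directory asks for `mw_stage_dir` without branching on the realm, and only values that genuinely differ get a realm-level entry.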
[15:11:18] my view is that it should all be contained in /srv/mediawiki
[15:11:35] puppet should create /srv/mediawiki and chown it, and that's the full extent of it
[15:11:53] yeah
[15:11:55] all the other crazy parts, symlinks etc. should really be gone
[15:11:57] agreed
[15:12:02] Now that we are mostly off of nfs in beta (apache config still there) we probably can clean up more puppet forking.
[15:12:24] and when I say /srv/mediawiki, I mean that
[15:12:32] not /srv/deployment/foo/bar/baz/mediawiki
[15:12:36] (PS3) Hashar: beta: bring in mediawiki/skins.git [operations/puppet] - https://gerrit.wikimedia.org/r/136325 (https://bugzilla.wikimedia.org/65868)
[15:12:38] yes
[15:12:46] :)
[15:12:51] /srv/deployment/mediawiki/mediawiki/mediawiki/mediawiki
[15:13:01] that's the rcstream pattern isn't it
[15:13:02] http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo
[15:13:05] yep
[15:13:19] True story: /srv/deployment/scap/scap/scap/__init__.py
[15:13:28] ottomata: ok fixed it
[15:13:47] sad fact: MediaWiki is written in PHP.
[15:14:15] PHP that is just barely incompatible with php 4
[15:14:34] (CR) Ori.livneh: "if puppet manages a directory and its subdirectory, it automatically orders them, such that explicit dependencies are not required" [operations/puppet] - https://gerrit.wikimedia.org/r/136251 (owner: Ori.livneh)
[15:15:02] bd808: i had second thoughts about using composer to manage libs in prod
[15:15:22] bd808: one issue is that afaik there is no python api for composer, whereas there is for git
[15:15:38] <_joe_> paravoid: I already complained about the rcstream path
[15:15:56] 11:38 < _joe_> paravoid: we should disagree sometimes :)
[15:16:01] rightbackatya :P
[15:16:07] <_joe_> eheh
[15:16:09] ori: And that matters because?
[15:16:29] <_joe_> I was about to add, ori
[15:16:42] (CR) BryanDavis: "Great. I think you've told me this before but it never sticks in my head." [operations/puppet] - https://gerrit.wikimedia.org/r/136251 (owner: Ori.livneh)
[15:16:48] <_joe_> responded like it was the 100th time he heard that complaint :)
[15:17:02] because as we learned from scap and jobs-loop.sh having a deployment infrastructure that is glued together with shell pipelines is brittle
[15:18:05] it's adding a dependency on another tool that isn't doing anything git isn't already doing
[15:18:08] and it's not replacing git
[15:18:15] I wouldn't actually expect the deploy pipeline to execute Composer, but maybe I'm missing a longer term thought.
[15:18:46] <_joe_> someone said "Composer" and "deploy" in the same sentence?
[15:18:48] bd808: oh, so i would run composer locally to update the vendor repo?
[15:18:55] bd808: and then git-review that?
[15:18:55] It is actually going something that git isn't doing. It creates an optimized autoloader.php file.
[15:19:02] ori: Yes.
[15:19:07] <_joe_> IMO, composer may be good at BUILD time
[15:19:11] (PS4) Hashar: beta: bring in mediawiki/skins.git [operations/puppet] - https://gerrit.wikimedia.org/r/136325 (https://bugzilla.wikimedia.org/65868)
[15:19:19] _joe_: And I agree
[15:19:21] _joe_: yes this is what bd808 is proposing apparently, i misunderstood
[15:19:21] <_joe_> if you have a completely different setup than ours
[15:19:31] <_joe_> bd808: :)
[15:19:36] hmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
[15:19:48] lets use composer to deploy!
[15:19:54] <_joe_> eheh
[15:20:01] bd808: yeah, i think that renders my point moot
[15:20:06] <_joe_> hashar: I've seen people using puppet to deploy
[15:20:08] chasemp: thanks!
[15:20:11] bd808: ok, having second thoughts about my second thoughts now :)
[15:20:14] <_joe_> nothing would surprise me anymore
[15:20:31] _joe_: well that probably works on small clusters I guess
[15:20:40] bd808: but we should shoot for doing that for extensions and skins too then
[15:20:41] <_joe_> hashar: it does not.
[15:21:04] <_joe_> hashar: problems you have at 10 servers are not that different, in terms of concepts, of what you have at 10K servers
[15:21:10] ori: Hmmm... interesting.
[15:21:37] <_joe_> the only difference is that with 10 servers your inadequacies are less evident if you deploy wrong
[15:22:07] bd808: we'd be following the advice we gave jeroen
[15:22:18] bd808: namely having a composer.json in the prod branch that enumerates the skins and extensions
[15:22:55] basically instead of having a different .gitmodules between master / 1.2#wmf# we'd have a different composer.json
[15:23:19] ori: Yeah. THat would make composer a first class citizen for us which should help downstream users too.
[15:23:19] (PS1) Rush: admin yaml convert oxygen and snapshot* [operations/puppet] - https://gerrit.wikimedia.org/r/136329
[15:24:14] (PS5) Hashar: beta: bring in mediawiki/skins.git [operations/puppet] - https://gerrit.wikimedia.org/r/136325 (https://bugzilla.wikimedia.org/65868)
[15:24:19] bd808: and composer's plugin framework / custom types lets us collect all the idiosyncratic logic that we currently have scattered around the wmf-config repo and the maintenance scripts
[15:24:22] _joe_: Were you able to access those puppet 2 test boxes, and did they teach you anything? (Looked to me like they just came up and worked, no fuss)
[15:24:52] <_joe_> andrewbogott: nope sorry, stuck in another task today
[15:25:15] (CR) Giuseppe Lavagetto: rcs: install servers with trusty (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/136022 (owner: Giuseppe Lavagetto)
[15:25:23] (PS5) Giuseppe Lavagetto: rcs: install servers with trusty [operations/puppet] - https://gerrit.wikimedia.org/r/136022
[15:25:46] _joe_: no problem, just making sure I'm not blocking you
[15:26:09] <_joe_> no, you're not :)
[15:26:15] bd808, I'm not super-familiar with the composer stuff, but am seeing some parallels with how we use npm in JS land
[15:26:20] <_joe_> I'll try this weekend if I do have time
[15:26:24] (CR) Christopher Johnson (WMDE): "Thanks for your review. I responded inline to the comments and pushed a new patch with the changes." [operations/puppet] - https://gerrit.wikimedia.org/r/136095 (owner: Christopher Johnson (WMDE))
[15:26:30] gwicke: It is very similar
[15:26:33] <_joe_> gwicke: it's npm for php
[15:26:36] <_joe_> basically
[15:26:45] yeah, same problems I guess
[15:28:13] npm, composer, virtualenv + pip, rbenv + gem are all similar
[15:28:33] they are all great in their own niche, but suck at integrating between languages etc
[15:28:34] (PS5) Dzahn: puppetize researchdb password file [operations/puppet] - https://gerrit.wikimedia.org/r/136273
[15:29:02] There are no silver bullets. ;)
[15:29:24] yeah, true
[15:29:48] (CR) jenkins-bot: [V: -1] puppetize researchdb password file [operations/puppet] - https://gerrit.wikimedia.org/r/136273 (owner: Dzahn)
[15:31:03] (PS2) Rush: admin yaml convert oxygen and snapshot* [operations/puppet] - https://gerrit.wikimedia.org/r/136329
[15:31:08] I'm leaning towards using a more general package format wherever possible
[15:31:10] (CR) Rush: [C: 2 V: 2] admin yaml convert oxygen and snapshot* [operations/puppet] - https://gerrit.wikimedia.org/r/136329 (owner: Rush)
[15:31:19] but you knew that already ;)
[15:31:25] bd808: you have two patches now with the subject "Add a PSR-3 based logging interface"
[15:32:11] The first is permanently -2'd. I should amend the subject
[15:33:00] just abandon it
[15:34:00] I can do that. It was sticking around just to show the whole set of changes together, but that's of diminishing use.
[15:37:28] (CR) Giuseppe Lavagetto: [C: 2] rcs: install servers with trusty [operations/puppet] - https://gerrit.wikimedia.org/r/136022 (owner: Giuseppe Lavagetto)
[15:39:41] (CR) Hashar: "That is probability not the nicest puppet around but it got the job done." [operations/puppet] - https://gerrit.wikimedia.org/r/136325 (https://bugzilla.wikimedia.org/65868) (owner: Hashar)
[15:41:27] bd808, ori: so how will this composer stuff work for packaging?
[15:41:29] (CR) Hashar: contint: fix resource ordering for labs slave (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/136310 (owner: Hashar)
[15:41:42] will it be easy to package the dependencies separately?
[15:41:48] (PS2) Hashar: contint: fix resource ordering for labs slave [operations/puppet] - https://gerrit.wikimedia.org/r/136310
[15:42:11] <_joe_> gwicke: packaging what?
[15:42:17] gwicke: I'm not sure. Which variety of packaging are you talking about?
[15:42:42] debs for example [15:42:49] * _joe_ prepares his _we_won't_use_debs_for_deploying_php sign [15:42:55] <_joe_> eheh [15:42:58] <_joe_> I knew that [15:42:59] other folks do [15:43:03] sorry have to run for a bit [15:43:05] bbiaf [15:43:06] For the monolithic tarballs it shouldn't be too bad to integrate. I'm not sure about native packages for mw. I've really never looked at them. [15:43:09] will reply then [15:43:15] <_joe_> gwicke: other folks eat crickets [15:43:23] <_joe_> not a good reason to do that :) [15:43:32] well, I for one wouldn't mind an easy way to install mediawiki [15:43:34] <_joe_> but I won't get drained in that debate again :) [15:43:46] <_joe_> gwicke: composer may help a lot then [15:43:49] gwicke: `git clone ...` [15:44:00] Or `vagrant up` [15:44:23] how good is composer at replacing something like apt? [15:44:25] <_joe_> bd808: If someone wants to create a bundle of mw + a lot of extensions + themes, composer may be a good idea [15:44:32] <_joe_> gwicke: it is not [15:44:37] <_joe_> as npm is not [15:44:40] can it install memcached, node etc? [15:44:46] nope [15:44:52] <_joe_> gwicke: your question makes no sense, sorry [15:44:59] It's for managing php libraries [15:45:10] local to a given directory [15:45:18] <_joe_> "how good is a hammer at replacing a bicycle?" [15:45:28] mediawiki is more than some php code [15:45:28] <_joe_> that's the same kind of difference :) [15:45:46] especially with us moving more towards SOA [15:46:01] <_joe_> gwicke: what has SOA to do with deployment? [15:46:02] gwicke: Which is why it's going to be more and more difficult to package easily [15:46:10] <_joe_> and packaging, also? [15:46:40] It's moving towards needing an "appliance" install for casual users. 
[15:46:40] <_joe_> the problems with SOA are not the deploy and packaging of apps [15:46:43] _joe_, good packaging is a way to make the installation of a collection of services easy [15:47:04] RECOVERY - Puppet freshness on oxygen is OK: puppet ran at Fri May 30 15:47:03 UTC 2014 [15:47:12] <_joe_> gwicke: but the problems with SOA are 99% not with installation [15:47:30] <_joe_> I've worked in an heavy-soa shop for the last three years [15:47:42] they aren't at our scale [15:47:58] but for the casual user, installing 10 services manually won't work [15:48:06] <_joe_> gwicke: as complexity of the software, I'd say 10x [15:48:19] <_joe_> and we had about half the servers we have now [15:48:32] I personally never seen a complex system that can be installed and configured via `apt-get install foo` [15:48:40] <_joe_> so I know something about scale too. Don't assume everyone is naive. [15:48:56] <_joe_> bd808: I did, but that was not a webapp with deploys 3 times a week [15:49:10] <_joe_> (or a day, if we ever want to do that) [15:49:28] I'm aiming for once and hour [15:49:30] bd808, I think a lot of distros are fairly complex these days [15:49:36] s/and/an/ [15:50:02] not sure if something like a kde desktop qualifies as complex though [15:50:05] <_joe_> bd808: nor - worse! 
an SOA with 100 services with different versions of a service living at the same time because of lack of coordination [15:50:19] <_joe_> (which is a good thing, bwt) [15:50:39] so what is the status of https://packages.debian.org/search?keywords=mediawiki [15:50:46] Debian has mediawiki after all [15:50:53] re: apt discussion [15:50:57] they only run LTS [15:51:11] yeah, and don't support multiple wikis off the same codebase afaik [15:51:29] i see [15:51:31] we generally discourage its use, and didn't work on it [15:51:35] <_joe_> mutante: there's no apt discussion, really :) [15:51:45] * bd808 goes back to work [15:51:52] I hope that we'll change that eventually [15:54:14] PROBLEM - Host thallium is DOWN: PING CRITICAL - Packet loss = 100% [15:54:25] that one is me, no worries [15:54:27] bd808: in any case, it'd be great if library inclusion would work without rewriting composer.json, especially for optional stuff [15:54:54] RECOVERY - Host thallium is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [15:54:54] that'd make it easy to package those things separately for distribution [15:55:54] PROBLEM - Host mercury is DOWN: PING CRITICAL - Packet loss = 100% [15:56:22] <_joe_> akosiaris: stop playing with ganeti! [15:56:23] <_joe_> :) [15:56:33] <_joe_> ok, bye, gotta go for today [15:56:43] heh. my playground until I send an email :P [15:57:26] hashar: does the 'Ops' group (aka 1004) have any meaning in the beta project? 
I'm wondering if I can assign it a new ID without breaking anything [15:58:10] andrewbogott: I don't think I ever heard of Ops group [15:58:19] hashar: perfect [15:58:22] andrewbogott: we relied on wikidev at some point but then switched everything to mwdeploy [15:58:25] or l10nupdate [15:58:32] 'k [15:58:36] so should be safe :} [16:00:04] RECOVERY - Host mercury is UP: PING OK - Packet loss = 0%, RTA = 1.18 ms [16:00:24] PROBLEM - Host thallium is DOWN: PING CRITICAL - Packet loss = 100% [16:02:34] !log enabled VT on thallium/mercury for ganeti evaluation purposes [16:02:39] Logged the message, Master [16:02:54] RECOVERY - Host thallium is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [16:03:45] (03PS6) 10Dzahn: puppetize researchdb password file [operations/puppet] - 10https://gerrit.wikimedia.org/r/136273 [16:05:47] Coren: http://www.foodbeast.com/2014/05/07/this-spicy-insect-crunch-roll-is-sushi-made-from-crickets-and-worms/ [16:07:11] (03PS1) 10Rush: admin yaml fixing dataset user dependencies [operations/puppet] - 10https://gerrit.wikimedia.org/r/136335 [16:07:29] !log changes the 'Ops' gid to 700 in ldap [16:07:34] Logged the message, Master [16:07:52] (03PS2) 10Rush: admin yaml fixing dataset user dependencies [operations/puppet] - 10https://gerrit.wikimedia.org/r/136335 [16:08:33] mutante: Past experience with eating insecta have reached the conclusion of "generally palatable, but drab beyond description. Survival food, not gourmet." 
[16:09:50] Coren: heh, fair, i'd try some crunchy crickets:) [16:11:57] (03CR) 10Dzahn: [C: 031] admin yaml fixing dataset user dependencies [operations/puppet] - 10https://gerrit.wikimedia.org/r/136335 (owner: 10Rush) [16:12:15] (03PS1) 10Rush: admin yaml fix gadolinium.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/136336 [16:12:32] (03CR) 10Rush: [C: 032 V: 032] admin yaml fixing dataset user dependencies [operations/puppet] - 10https://gerrit.wikimedia.org/r/136335 (owner: 10Rush) [16:12:39] (03CR) 10Rush: [C: 032 V: 032] admin yaml fix gadolinium.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/136336 (owner: 10Rush) [16:12:59] mutante: Crickets are one of the worst offenders. The only thing I can say about their taste is "crunchy and nondescript". They are usually seasoned, mind you, and that's all you taste. [16:13:43] Coren: sounds like potato chips [16:14:44] RECOVERY - Puppet freshness on gadolinium is OK: puppet ran at Fri May 30 16:14:42 UTC 2014 [16:15:04] RECOVERY - Puppet freshness on dataset1001 is OK: puppet ran at Fri May 30 16:14:57 UTC 2014 [16:17:16] (03PS7) 10Dzahn: puppetize researchdb password file [operations/puppet] - 10https://gerrit.wikimedia.org/r/136273 [16:18:35] (03CR) 10Dzahn: [C: 032] puppetize researchdb password file [operations/puppet] - 10https://gerrit.wikimedia.org/r/136273 (owner: 10Dzahn) [16:18:36] mutante: With legs. [16:19:04] RECOVERY - Puppet freshness on dataset2 is OK: puppet ran at Fri May 30 16:19:00 UTC 2014 [16:19:11] mealworms, another common fare, are basically just mostly tasteless raw protein. Animal tofu. :-) [16:19:29] animal tofu, hehee [16:19:32] I've eaten silkworm pupae, they were pretty good but similarly just a vehicle for chili [16:19:52] But they have the upside of being a byproduct of an already-farmed crop [16:20:00] licked lemon ants [16:20:05] lemon ants? 
[16:20:10] it tasted lemony [16:20:13] * ori googles [16:20:15] But also, because pupae, no visible features. Just little pill-shaped things. Pretty inoffensive. [16:20:23] (licked lemon) ants or licked (lemon ants)? [16:20:34] "Lemon Ants get their name from literally tasting like lemons when eaten.[4]" heh [16:20:44] The ants are, in fact, edible. They are called “lemon ants” because of their vague tangy, lemony taste. Feel free to have a try: it won’t hurt [16:21:07] * greg-g won't eat fire ants [16:21:29] mutante: I expect it hurts the ants. :-) I wonder if that's because formic acid tastes lemony or because those ants actually have citric acid in tastable doses in 'em. [16:21:31] Several people have told me that termites are super delicious. But I've never had the opportunity. [16:21:32] I lived in Texas as a kid, I know what they do to you when a swarm gets a hold of your leg. [16:21:36] greg-g: Might be spicy! :-) [16:21:48] One of them said 'especially with jam' which raised a lot of questions [16:21:53] Tastes like fire! [16:22:14] * Coren hates fire ants, but that's just because he plays Nethack. 
:-) [16:22:26] _joe_: btw, the puppet catalog compiler job says this in its console output: cp: cannot create regular file `/usr/local/bin/naggen': Permission denied [16:22:30] _joe_: it still works, but just fyi [16:22:54] Lemon ants are the only known insect to use formic acid as a herbicide [16:23:28] goes back to puppet, sorry, i just had to because of the "eating crickets" :) [16:24:20] (03PS2) 10Ori.livneh: role::mediawiki: clear up nested class hierarchy [operations/puppet] - 10https://gerrit.wikimedia.org/r/136275 [16:24:44] <_joe_> ori: yeah, known issue :) [16:25:48] loomio: get insects for 3rd floor kitchen [16:27:27] mutante++ [16:33:14] PROBLEM - Disk space on virt1001 is CRITICAL: DISK CRITICAL - free space: /tmp/wlm-apache1 212 MB (2% inode=36%): [16:33:42] (03PS1) 10Chad: Remaining wikis other than enwiki and commonswiki to Cirrus as primary [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136338 [16:36:07] (03CR) 10Ori.livneh: "INFO: Nodes: 7 OK 0 DIFF 0 FAIL" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136275 (owner: 10Ori.livneh) [16:36:41] chasemp: got a sec to glance at that? ^^ it's a simple change and i verified it with jenkins [16:36:49] Coren: I think all acids have a similarly sour taste to the tongue, not just citric. the only data I could find was this, though: http://thehappyscientist.com/science-experiment/acid-hunt [16:38:00] ori: meeting with ottomata atm hopefully in a bit? [16:38:10] chasemp: ah np i'll shop around :P [16:38:18] anyone else? 
[16:39:11] (03PS2) 10Ori.livneh: Tidy role::mediawiki following I9fe649547 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136276 [16:43:19] (03CR) 10Ori.livneh: "INFO: Nodes: 7 OK 0 DIFF 0 FAIL" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136276 (owner: 10Ori.livneh) [16:45:17] (03PS1) 10RobH: fixing a mistake for rcs1001 forward entry [operations/dns] - 10https://gerrit.wikimedia.org/r/136340 [16:46:30] (03CR) 10RobH: [C: 032] fixing a mistake for rcs1001 forward entry [operations/dns] - 10https://gerrit.wikimedia.org/r/136340 (owner: 10RobH) [16:46:43] new version of firefox is so much less painful in gerrit [16:47:54] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Fri 30 May 2014 13:47:01 UTC [16:47:56] wonders how this got in his local git change: modified: ../../modules/cdh4 [16:49:17] RECOVERY - Disk space on virt1001 is OK: DISK OK [16:52:09] (03PS1) 10Dzahn: ensure passwords directory exists on stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136341 [17:01:25] (03CR) 10Andrew Bogott: [C: 031] "Yep, this is better" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136275 (owner: 10Ori.livneh) [17:01:44] (03PS3) 10Ori.livneh: role::mediawiki: clear up nested class hierarchy [operations/puppet] - 10https://gerrit.wikimedia.org/r/136275 [17:01:53] (03CR) 10Ori.livneh: [C: 032] role::mediawiki: clear up nested class hierarchy [operations/puppet] - 10https://gerrit.wikimedia.org/r/136275 (owner: 10Ori.livneh) [17:04:22] (03CR) 10Andrew Bogott: [C: 031] "I don't love the way the gerrit diff-highlighter is marking 'whole strings' as different when all you changed was the quote marks :( But " [operations/puppet] - 10https://gerrit.wikimedia.org/r/136276 (owner: 10Ori.livneh) [17:05:16] (03PS3) 10Ori.livneh: Tidy role::mediawiki following I9fe649547 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136276 [17:05:26] (03CR) 10Ori.livneh: [C: 032 V: 032] Tidy role::mediawiki following 
I9fe649547 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136276 (owner: 10Ori.livneh) [17:07:04] andrewbogott: many thanks! already applied them on several apaches in prod and they are indeed no-ops [17:08:14] (03PS1) 10Ori.livneh: mediawiki::web: Remove $::lsbdistrelease guard [operations/puppet] - 10https://gerrit.wikimedia.org/r/136344 [17:15:40] (03PS1) 10Ori.livneh: Get rid of role::mediawiki::appserver::test [operations/puppet] - 10https://gerrit.wikimedia.org/r/136346 [17:17:24] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Fri May 30 17:17:15 UTC 2014 [17:18:25] (03PS1) 10Ori.livneh: role::mediawiki::job_runner: same config for beta & prod [operations/puppet] - 10https://gerrit.wikimedia.org/r/136347 [17:21:07] (03PS1) 10Ori.livneh: get rid of {jobrunner,videoscaler}-apache-service-stopped Execs [operations/puppet] - 10https://gerrit.wikimedia.org/r/136349 [17:21:54] !log restarted pdns on virt0 and virt1000 [17:21:59] Logged the message, Master [17:26:22] <_joe_> ori: oh the last one, I love this! [17:26:36] _joe_: :) [17:26:41] (03PS1) 10Ori.livneh: mediawiki::web: make $maxclients numeric; simplify config [operations/puppet] - 10https://gerrit.wikimedia.org/r/136351 [17:27:12] <_joe_> ori: is it not relevant anymore? [17:27:57] (03CR) 10Aaron Schulz: [C: 031] Get rid of role::mediawiki::appserver::test [operations/puppet] - 10https://gerrit.wikimedia.org/r/136346 (owner: 10Ori.livneh) [17:30:41] (03CR) 10Aaron Schulz: "It only takes 17 jobs to fill the pipeline, so all someone has to do is edit a template. 
If it has a few hundred backlinks (do any have mo" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136347 (owner: 10Ori.livneh) [17:31:09] (03PS1) 10Ori.livneh: Move beta-specific configs from role::mediawiki::common to role::mediawiki::appserver::beta [operations/puppet] - 10https://gerrit.wikimedia.org/r/136353 [17:31:16] (03CR) 10Aaron Schulz: [C: 031] get rid of {jobrunner,videoscaler}-apache-service-stopped Execs [operations/puppet] - 10https://gerrit.wikimedia.org/r/136349 (owner: 10Ori.livneh) [17:31:43] (03PS1) 10RobH: more fixes for rcs dns [operations/dns] - 10https://gerrit.wikimedia.org/r/136354 [17:32:20] (03CR) 10RobH: [C: 032] more fixes for rcs dns [operations/dns] - 10https://gerrit.wikimedia.org/r/136354 (owner: 10RobH) [17:33:02] (03CR) 10Aaron Schulz: [C: 031] mediawiki::web: make $maxclients numeric; simplify config [operations/puppet] - 10https://gerrit.wikimedia.org/r/136351 (owner: 10Ori.livneh) [17:35:15] (03PS1) 10Ori.livneh: role::mediawiki::common: remove if $::realm == production guard [operations/puppet] - 10https://gerrit.wikimedia.org/r/136355 [17:42:06] (03CR) 10Dzahn: "do we have labs icinga that is similar enough to prod icinga for nrpe::monitor_service to work there?" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/136355 (owner: 10Ori.livneh) [17:47:02] (03PS1) 10Dzahn: ensure passwords dir exists on stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136360 [17:49:10] (03CR) 10Dzahn: [C: 032] ensure passwords dir exists on stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136360 (owner: 10Dzahn) [17:51:08] (03CR) 10Dzahn: "doesn't have the researchers group yet, needs switching to yaml, but then it will" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136360 (owner: 10Dzahn) [17:51:47] (03Abandoned) 10Dzahn: ensure passwords directory exists on stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136341 (owner: 10Dzahn) [18:01:57] (03CR) 10Aaron Schulz: [C: 031] Move beta-specific configs from role::mediawiki::common to role::mediawiki::appserver::beta [operations/puppet] - 10https://gerrit.wikimedia.org/r/136353 (owner: 10Ori.livneh) [18:05:42] (03CR) 10Ori.livneh: "@Dzahn: yep, 10.68.16.195 is the target, per modules/nrpe/manifests/init.pp" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136355 (owner: 10Ori.livneh) [18:09:36] (03PS2) 10coren: adding bonded port labstore1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/135571 (owner: 10Cmjohnson) [18:11:01] (03CR) 10Dzahn: [C: 031] "ori, alright, i was more curious about the current state of labs icinga because in the past they were entirely separate unfortunately" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136355 (owner: 10Ori.livneh) [18:11:38] Help from anyone who has the credentials and skills to play with our eqiad switches? [18:12:47] (03PS1) 10Dzahn: admin yaml for zirconium [operations/puppet] - 10https://gerrit.wikimedia.org/r/136363 [18:13:16] mutante: any +1s you can spare for the dependent commits much appreciated :) [18:14:40] (03CR) 10BryanDavis: [C: 031] "I'm ok with not having access to zirconium. I'll ask for access if and when I need it again." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/136363 (owner: 10Dzahn) [18:14:42] modules/beta/files/dsh/group/ ?? what [18:15:00] ask bd808 [18:15:39] (03CR) 10coren: [C: 032] "The time is now." [operations/puppet] - 10https://gerrit.wikimedia.org/r/135571 (owner: 10Cmjohnson) [18:20:21] (03CR) 10Dzahn: [C: 031] "moving it to the beta class seems to make sense" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136353 (owner: 10Ori.livneh) [18:20:26] "enwiki: [c36702ef] /wiki/Main_Page Exception from line 468 of /usr/local/apache/common-local/php-1.24wmf6/includes/cache/LocalisationCache.php: No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php." [18:20:29] heh, wonder what that was [18:22:13] AaronSchulz: wild guess.. while rebuildLocalisationCache.php is recreating it? [18:22:52] "No handler for model '�w�i�k�i�t�e�x�t' registered in $wgContentHandlers" [18:37:58] these errors get wierder every day :) [18:37:59] AaronSchulz: hmm, that's unicode "replacement character" in between "w i k i t e x t" [18:37:59] http://www.fileformat.info/info/unicode/char/0fffd/index.htm [18:37:59] used to replace an incoming character whose value is unknown or unrepresentable in Unicode .. shrug ? [18:37:59] that's very s�t�r�a�n�g�e [18:37:59] PROBLEM - Host labstore1001 is DOWN: PING CRITICAL - Packet loss = 100% [18:37:59] that would be Coren [18:37:59] AaronSchulz: The missing en l10n cache was the partially setup mw1163 host [18:37:59] It was fixed about 4 hours up the backscroll. [18:37:59] gerrit bot temp. down due to labs maintenance [18:37:59] grrrit-wm: right [18:37:59] mutante: Planned maintenance window [18:37:59] mutante: Although it shouldn't have resulted in the net being completely down for more than a few secs. [18:38:00] mutante: Do you have the creds and skills to fiddle with equiad switch config? 
[18:38:00] Coren: alright [18:38:00] It wasn't ready for my change, I'll have to rollback unless I find someone to do it shortly. [18:38:00] Coren: i have never connected before.. nope [18:38:00] it would start out with needing the creds [18:38:00] Yeah, I can't find enough documentation to do it myself with any mount of confidence. [18:38:00] * Coren rolls back. [18:38:00] RECOVERY - Host labstore1001 is UP: PING OK - Packet loss = 0%, RTA = 2.37 ms [18:39:09] mwalker: hey, Special:BannerRandom, can I ask you about that? [18:39:55] basically, a GET to it is blocking the complete loading of pages on Beta Cluster [18:42:21] (03PS2) 10Dzahn: admin yaml for zirconium [operations/puppet] - 10https://gerrit.wikimedia.org/r/136363 [18:42:45] (03CR) 10Dzahn: [C: 032] admin yaml for zirconium [operations/puppet] - 10https://gerrit.wikimedia.org/r/136363 (owner: 10Dzahn) [18:42:53] (03PS1) 10Dzahn: zirconium yaml: it's 'admin', not 'admins', duh [operations/puppet] - 10https://gerrit.wikimedia.org/r/136367 [18:42:55] (03CR) 10jenkins-bot: [V: 04-1] zirconium yaml: it's 'admin', not 'admins', duh [operations/puppet] - 10https://gerrit.wikimedia.org/r/136367 (owner: 10Dzahn) [18:42:57] (03PS2) 10Dzahn: zirconium yaml: it's 'admin', not 'admins', duh [operations/puppet] - 10https://gerrit.wikimedia.org/r/136367 [18:43:00] mwalker: nvm, now another request is the one blocking, not Special:BannerRandom [18:45:27] (03CR) 10Dzahn: [C: 032] zirconium yaml: it's 'admin', not 'admins', duh [operations/puppet] - 10https://gerrit.wikimedia.org/r/136367 (owner: 10Dzahn) [18:47:24] greg-g, yay heisenbugs! [18:47:29] yeah :/ [18:47:43] beta cluster is just one big fancy heisenbug [18:47:52] But will it blend? 
[18:48:03] :P [18:48:04] those ciscos are kinda big and clunky [18:48:09] we'd need a BIG blender [18:49:55] Memcached error for key "commonswiki:messages:zh-cnmzg4othgqja1revbmxocndy2oue1qjqyrjkyntm4odk4rkiwnurfqtmzniodntm3mgqxnzk1oguxmjc3neigowe2mgq5yjjhothiodc0ztg1n2rmntuynzbmymy5owfijrnpavgayiaxmdy5odhhnzu0zdmxzju2ymriowqzytfiyjgxzde5y2i3o9bs5chyee5ld3nfrl9szwn0yw5nbgv6c25ld3mucxeuy29t:hash" on server ":": A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE [18:49:57] wtf? [18:50:53] that is a really long key! [18:58:54] (03PS1) 10Dzahn: admin yaml for osmium (hhvm box) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136425 [19:07:35] (03PS1) 10Rush: admin yaml stat* and analytics fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 [19:08:02] (03PS2) 10Dzahn: admin yaml for osmium (hhvm box) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136425 [19:12:33] (03CR) 10Dzahn: admin yaml stat* and analytics fixups (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 (owner: 10Rush) [19:13:53] PROBLEM - NTP on rcs1002 is CRITICAL: NTP CRITICAL: Offset unknown [19:17:20] (03CR) 10Dzahn: admin yaml stat* and analytics fixups (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 (owner: 10Rush) [19:18:53] RECOVERY - NTP on rcs1002 is OK: NTP OK: Offset -0.005000114441 secs [19:19:09] (03CR) 10Aaron Schulz: [C: 031] merge bits appservers into appservers [operations/puppet] - 10https://gerrit.wikimedia.org/r/136317 (owner: 10Filippo Giunchedi) [19:19:11] (03CR) 10Dzahn: [C: 032] admin yaml for osmium (hhvm box) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136425 (owner: 10Dzahn) [19:19:46] (03PS1) 10Spage: Enable Flow on mw:Talk:Cite-from-id [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136428 [19:21:00] (03PS1) 10Dzahn: admin yaml for titanium (archiva) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136432 [19:22:10] (03CR) 10jenkins-bot: [V: 04-1] admin yaml for 
titanium (archiva) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136432 (owner: 10Dzahn) [19:22:22] (03PS2) 10Dzahn: admin yaml for titanium (archiva) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136432 [19:22:24] (03CR) 10jenkins-bot: [V: 04-1] admin yaml for titanium (archiva) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136432 (owner: 10Dzahn) [19:23:08] (03PS3) 10Dzahn: admin yaml for titanium (archiva) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136432 [19:24:25] user documention link on archiva is 404 :p [19:24:59] mark: if you are still around… is there a wiki page tracking Ops goals for the next year? Since I just wrote my personal goals I may as well fill in some for Labs while I'm thinking of it [19:25:51] (03PS1) 10Ori.livneh: provision role::rcstream on rcs1001 & rcs1002 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136433 [19:26:12] (03CR) 10Dzahn: [C: 032] admin yaml for titanium (archiva) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136432 (owner: 10Dzahn) [19:26:46] it has moved mutante! [19:26:49] new version of a rchiva it looks like [19:26:53] dunno why they got rid of the old docs [19:26:55] http://archiva.apache.org/docs/2.0.1/ [19:27:18] greg-g: I'm going to update scap on tin and the prod cluster to get back on the python sync-* scripts and add some debug logging. [19:27:23] ooo new feature [19:27:23] ▪ [MRM-1089] - LDAP Support and Documentation [19:27:24] yay! 
[19:27:50] ottomata: ah, gotcha, thx, yea, we have 2.0.0 it seems [19:28:09] bd808|deploy: cool [19:29:22] ottomata: just switched titanium over to admin yaml [19:30:58] !log Scap updated to c4204dd [19:31:04] Logged the message, Master [19:32:28] cool, thanks mutante [19:33:37] !log bd808 Synchronized README: Testing sync-file (duration: 00m 06s) [19:33:40] Logged the message, Master [19:33:50] (03CR) 10Dzahn: provision role::rcstream on rcs1001 & rcs1002 (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136433 (owner: 10Ori.livneh) [19:34:09] mutante: thanks, will do that oto [19:34:11] *too [19:34:15] (03PS1) 10QChris: Force Hive to pick HCatalog from local file system [operations/puppet] - 10https://gerrit.wikimedia.org/r/136435 [19:34:24] ori: :) [19:35:56] (03PS2) 10Ori.livneh: provision role::rcstream on rcs1001 & rcs1002 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136433 [19:36:33] andrewbogott: Coren : is "admins::labs" deprecated? it includes one account which says it's revoked [19:36:46] (03CR) 10Ottomata: [C: 032 V: 032] Force Hive to pick HCatalog from local file system [operations/puppet] - 10https://gerrit.wikimedia.org/r/136435 (owner: 10QChris) [19:36:57] i can make a new group or just remove it from virt0 [19:37:18] mutante: It's not deprecated but it is, as you say, empty. [19:37:20] mutante: You can axe it; it was a one-off for a contractor that never actually worked. [19:37:24] What are you hoping to accomplish? [19:38:02] goals! [19:38:14] andrewbogott: convert virt0 to admin yaml [19:38:26] (03CR) 10Ottomata: admin yaml stat* and analytics fixups (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 (owner: 10Rush) [19:38:33] ah, yeah, you can scrap that class entirely if you want [19:38:50] cool, thanks [19:39:16] we can always create it again in the new way [19:40:08] andrewbogott: Did we ever figure out what happend to the dude? 
[19:40:23] Not so far as I know :( [19:40:25] something about a carpet [19:40:42] (03CR) 10Ori.livneh: "verified the range() logic:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136433 (owner: 10Ori.livneh) [19:40:46] !log bd808 Synchronized wmf-config/db-eqiad.php: Touched db-eqiad.php to test sync-file (duration: 00m 03s) [19:40:51] Logged the message, Master [19:41:21] Interesting [19:41:35] bd808|deploy: didn't work for db-eqiad, huh? [19:41:43] ori: Nope [19:41:55] README is top-level [19:41:59] db-eqiad is in wmf-config [19:42:11] (03PS1) 10Dzahn: delete admins::labs, has 0 members currently [operations/puppet] - 10https://gerrit.wikimedia.org/r/136437 [19:42:13] also the hyphen in the file name [19:43:13] two points from the rsync man page: [19:43:13] 1) "If a pattern excludes a particular parent directory, it can render a deeper include pattern ineffectual because rsync did not descend through that excluded section of the hierarchy. This is particularly important when using a trailing ’*’ rule." [19:43:13] 2) "if the pattern starts with a / then it is anchored to a particular spot in the hierarchy of files, otherwise it is matched against the end of the pathname." [19:43:45] s.pringle's suspicion was the include/exclude stuff [19:43:53] (03PS2) 10Dzahn: delete admins::labs, has 0 members currently [operations/puppet] - 10https://gerrit.wikimedia.org/r/136437 [19:44:02] I do the / anchoring, but maybe the exclude=* is blocking [19:44:27] maybe they should be reordered? 
[19:44:33] I'm going to try with another file in the sub dir [19:44:38] good idea [19:45:02] Just to rule out `*-*` as an issue [19:45:11] nod [19:45:47] !log bd808 Synchronized wmf-config/throttle.php: Touched to test sync-file (duration: 00m 05s) [19:45:52] Logged the message, Master [19:45:53] (03CR) 10Ori.livneh: [C: 031] merge bits appservers into appservers [operations/puppet] - 10https://gerrit.wikimedia.org/r/136317 (owner: 10Filippo Giunchedi) [19:45:55] (03PS1) 10Dzahn: admin yaml for virt0 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136438 [19:46:32] (03CR) 10Andrew Bogott: [C: 031] "Removing this is just fine -- it's been around long enough to revoke what needed revoking." [operations/puppet] - 10https://gerrit.wikimedia.org/r/136437 (owner: 10Dzahn) [19:46:55] ori: Same result. My magic isn't working for files in sub directories [19:46:59] (03PS2) 10Rush: admin yaml stat* and analytics fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 [19:47:10] (03PS1) 10Dzahn: admin yaml for virt1000 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136439 [19:47:26] I'll switch the deployed version back to bash and work up a fix [19:47:52] bd808|deploy: http://masstransmit.com/garage_blog/rsync-quirks/ [19:47:57] "Another issue is subdirectories. If you want to only include a file nested down a level, images/pic.gif, you have to also include the parent directory separately:" [19:48:02] "rsync -rav --include=images/ --include=images/pic.gif --exclude=* user@source::modulename /path/to/local/destination" [19:48:23] Yeah. That was my thought as well [19:48:27] Should be easy to do [19:48:29] bd808|deploy: so it seems that for sync-file, you just need to add an --include arg that is the basename of the arg [19:48:29] yeah [19:48:37] so don't revert, let's try it [19:49:08] (03PS3) 10Rush: admin yaml stat* and analytics fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 [19:49:10] Okey doke. 
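[editor's sketch] The rsync behaviour discussed above (a trailing `--exclude=*` hides a nested file unless every parent directory is also included) can be illustrated with a small helper. This is only a minimal sketch and not the actual scap code: the function name `rsync_include_args` is hypothetical, and it assumes the path is given relative to the transfer root, that patterns are anchored with a leading `/`, and that the catch-all exclude comes last.

```python
from pathlib import PurePosixPath


def rsync_include_args(relative_path):
    """Build rsync filter arguments that transfer exactly one nested file.

    Hypothetical helper for illustration only. rsync will not descend into
    a directory that is excluded, so every parent directory of the target
    needs its own --include rule before the final --exclude=* rule.
    """
    path = PurePosixPath(relative_path)
    parts = path.parts
    if not parts or path.is_absolute() or ".." in parts:
        raise ValueError("expected a non-empty path relative to the sync root")

    args = []
    # Include each parent directory; the trailing slash limits the rule to
    # directories and the leading slash anchors it at the transfer root.
    for depth in range(1, len(parts)):
        args.append("--include=/" + "/".join(parts[:depth]) + "/")
    # Include the file itself, anchored the same way.
    args.append("--include=/" + "/".join(parts))
    # Exclude everything else; rules are checked in order, first match wins.
    args.append("--exclude=*")
    return args


if __name__ == "__main__":
    print(rsync_include_args("wmf-config/db-eqiad.php"))
    print(rsync_include_args("README"))
```

A top-level file such as README needs no parent rule, which would be consistent with the first test sync working while the files under wmf-config did not.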
[19:50:29] (03PS4) 10Rush: admin yaml stat* and analytics fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 [19:50:50] (03PS1) 10Dzahn: add yaml group for PDF QA users [operations/puppet] - 10https://gerrit.wikimedia.org/r/136440 [19:52:34] (03CR) 10RobH: [C: 031] provision role::rcstream on rcs1001 & rcs1002 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136433 (owner: 10Ori.livneh) [19:53:00] (03PS3) 10Ori.livneh: provision role::rcstream on rcs1001 & rcs1002 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136433 [19:53:27] (03PS2) 10Dzahn: add yaml group for PDF QA users, switch tantalum [operations/puppet] - 10https://gerrit.wikimedia.org/r/136440 [19:53:39] robh: and https://gerrit.wikimedia.org/r/#/c/132429/ (_joe_ LGTM'd, i am aware of the path issue). and then you're really free of me. [19:54:35] (03CR) 10RobH: [C: 031] "my +1 doesn't count towards the python script part of this patch set, im not proficient enough in python for that." [operations/puppet] - 10https://gerrit.wikimedia.org/r/132429 (owner: 10Ori.livneh) [19:54:40] yea i cannot really review the python script [19:54:43] but i +1 the rest of it [19:55:07] robh: the patch deletes the python file :) [19:55:20] it's in another repo [19:55:29] oh, yea.... dur [19:55:32] its on left, my bad [19:55:44] * ori hugs robh [19:55:46] you're awesome [19:55:48] i'm so happy [19:55:48] weee [19:55:49] thanks! 
[19:55:54] (03CR) 10RobH: "corrected, its removing the python script so good enough for me" [operations/puppet] - 10https://gerrit.wikimedia.org/r/132429 (owner: 10Ori.livneh) [19:56:05] welcome [19:56:22] (03PS4) 10Ori.livneh: Move rcstream server implementation to external repo [operations/puppet] - 10https://gerrit.wikimedia.org/r/132429 [19:56:29] (03CR) 10Ori.livneh: [C: 032 V: 032] Move rcstream server implementation to external repo [operations/puppet] - 10https://gerrit.wikimedia.org/r/132429 (owner: 10Ori.livneh) [19:56:40] (03PS4) 10Ori.livneh: provision role::rcstream on rcs1001 & rcs1002 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136433 [19:56:46] (03CR) 10Ori.livneh: [C: 032 V: 032] provision role::rcstream on rcs1001 & rcs1002 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136433 (owner: 10Ori.livneh) [20:01:56] (03PS5) 10Rush: admin yaml stat* and analytics fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 [20:03:44] (03PS1) 10Ori.livneh: rcstream: handle fixnum $backends gracefull in rcstream.nginx.erb [operations/puppet] - 10https://gerrit.wikimedia.org/r/136442 [20:04:18] robh: i lied [20:04:27] but that's really it [20:06:02] (03PS6) 10Rush: admin yaml stat* and analytics fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 [20:09:16] (03CR) 10Dzahn: [C: 031] ""statistics-users" _might_ turn out to be identical to "researchers" group, but we can figure that out later. people like handrade show up" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 (owner: 10Rush) [20:10:23] (03CR) 10Ottomata: [C: 032] admin yaml stat* and analytics fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 (owner: 10Rush) [20:10:26] (03PS3) 10Dzahn: delete admins::labs, has 0 members currently [operations/puppet] - 10https://gerrit.wikimedia.org/r/136437 [20:11:12] (03CR) 10Ottomata: "I dont' thikn statistics-users == researchers. E.g. milimetric and qchris are not researchers." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 (owner: 10Rush) [20:11:39] (03CR) 10Dzahn: [C: 032] delete admins::labs, has 0 members currently [operations/puppet] - 10https://gerrit.wikimedia.org/r/136437 (owner: 10Dzahn) [20:15:19] (03PS2) 10Dzahn: admin yaml for virt0 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136438 [20:21:14] (03CR) 10Dzahn: [C: 032] admin yaml for virt0 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136438 (owner: 10Dzahn) [20:24:49] https://gerrit.wikimedia.org/r/#/c/136442/ is a 5-character commit, could someone +1? [20:35:16] !log Scap updated to 6c0c4f0 [20:35:20] Logged the message, Master [20:35:27] (03CR) 10Ottomata: [C: 031] rcstream: handle fixnum $backends gracefull in rcstream.nginx.erb [operations/puppet] - 10https://gerrit.wikimedia.org/r/136442 (owner: 10Ori.livneh) [20:35:34] ori ^ [20:35:38] ottomata: :) thank you [20:35:38] !log bd808 Synchronized wmf-config/throttle.php: Touched to test sync-file (duration: 00m 04s) [20:35:43] (03CR) 10Ori.livneh: [C: 032] rcstream: handle fixnum $backends gracefull in rcstream.nginx.erb [operations/puppet] - 10https://gerrit.wikimedia.org/r/136442 (owner: 10Ori.livneh) [20:35:43] Logged the message, Master [20:36:03] !log bd808 Synchronized wmf-config/db-eqiad.php: Touched db-eqiad.php to test sync-file (duration: 00m 04s) [20:36:08] Logged the message, Master [20:36:41] bd808|deploy: verdict? [20:36:50] Seems to work! [20:36:58] I'm going to try one deep path to make sure [20:37:03] nice! 
[20:37:39] (03CR) 10Dzahn: [C: 032] "worked fine on virt0" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136439 (owner: 10Dzahn) [20:37:43] (03PS2) 10Dzahn: admin yaml for virt1000 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136439 [20:39:00] !log bd808 Synchronized php-1.24wmf7/extensions/Elastica/Elastica/test/lib/Elastica/Test/IndexTest.php: Touched to test sync-file (duration: 00m 04s) [20:39:03] Logged the message, Master [20:39:36] ori: \o/ works [20:40:08] bd808|deploy: maybe one or two other test cases? i was thinking: one with absolute path, one with some relative path traversal ('../../php-1.24wmf7/extensions', etc.) [20:40:28] just to be sure [20:40:44] Absolute path won't work. It never worked. [20:40:54] I can try a relative path though [20:41:19] actually that won't work either [20:41:21] (03CR) 10Dzahn: [C: 032] admin yaml for virt1000 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136439 (owner: 10Dzahn) [20:41:32] bd808|deploy: but the same things that worked before work now, right? [20:42:15] I don't think so. Let me look at the bash script again [20:43:16] ori: The logic was always "DESTDIR=$MW_COMMON/$DIR" [20:44:47] * ori nods [20:44:51] (03CR) 10Dzahn: "Hoo man: the logfile destinations would be fixed in I2cfbb34d0d186" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136118 (owner: 10Dzahn) [20:45:11] (03PS1) 10Chad: Lower search suggestion cache to 3 hours [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136449 [20:45:23] (03CR) 10Dzahn: [C: 031] "after Change-Id: I70bc7bb631b6d953 ?" 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 (owner: 10Reedy) [21:01:52] PROBLEM - Redis on rcs1001 is CRITICAL: Connection refused [21:02:02] PROBLEM - Redis on rcs1002 is CRITICAL: Connection refused [21:02:51] that's me [21:02:59] it's not in service, i'll turn off the alerts [21:04:26] greg-g: I'm done messing around with sync-* [21:05:02] bd808: just saw the email :) [21:07:52] RECOVERY - Redis on rcs1001 is OK: TCP OK - 0.001 second response time on port 6379 [21:08:02] RECOVERY - Redis on rcs1002 is OK: TCP OK - 0.007 second response time on port 6379 [21:09:20] (03PS1) 10Ori.livneh: Add $wgRCFeeds entries for RCStream on rcs100[12].eqiad.wmnet [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136455 [21:10:57] (03PS1) 10Dzahn: admin yaml for labnet1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136456 [21:12:45] (03CR) 10Rush: [C: 031] admin yaml for labnet1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136456 (owner: 10Dzahn) [21:20:37] (03PS1) 10Rush: admin yaml rhenium.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/136460 [21:20:59] (03CR) 10Rush: [C: 032 V: 032] admin yaml rhenium.wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/136460 (owner: 10Rush) [21:21:24] !log Zuul restarted (somehow by mistake) [21:21:28] Logged the message, Master [21:21:40] well it is actually attempting to restart ..... 
[21:25:27] (03PS1) 10Rush: add jkrauska user to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136461 [21:25:40] (03CR) 10Rush: [C: 032 V: 032] add jkrauska user to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136461 (owner: 10Rush) [21:26:13] (03PS2) 10Dzahn: admin yaml for labnet1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136456 [21:26:32] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 30 May 2014 18:25:33 UTC [21:29:36] (03CR) 10Dzahn: [C: 032] admin yaml for labnet1001 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136456 (owner: 10Dzahn) [21:29:39] (03PS1) 10Rush: admin yaml terbium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136462 [21:30:09] (03PS2) 10Rush: admin yaml terbium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136462 [21:30:17] (03CR) 10Rush: [C: 032 V: 032] admin yaml terbium.eqiad.wmnet [operations/puppet] - 10https://gerrit.wikimedia.org/r/136462 (owner: 10Rush) [21:31:39] (03PS1) 10Rush: admin yaml terbium.eqiad.wmnet typo [operations/puppet] - 10https://gerrit.wikimedia.org/r/136463 [21:31:47] (03PS2) 10Rush: admin yaml terbium.eqiad.wmnet typo [operations/puppet] - 10https://gerrit.wikimedia.org/r/136463 [21:31:52] (03CR) 10Rush: [C: 032 V: 032] admin yaml terbium.eqiad.wmnet typo [operations/puppet] - 10https://gerrit.wikimedia.org/r/136463 (owner: 10Rush) [21:36:17] (03PS1) 10Rush: admin yaml sanger [operations/puppet] - 10https://gerrit.wikimedia.org/r/136464 [21:36:38] (03PS2) 10Rush: admin yaml sanger [operations/puppet] - 10https://gerrit.wikimedia.org/r/136464 [21:37:10] ori, in labs; do we have a statsd server I can send things to? [21:37:16] or is that a production only thing? 
[21:39:45] mwalker: we do, bd808 configured it [21:39:52] i think it's deployment-bastion [21:40:29] oh nope [21:40:31] It's on deployment-graphite.eqiad.wmflabs [21:40:33] deployment-graphite.eqiad.wmflabs [21:40:34] that [21:40:48] awesome, does that have a webfrontend? [21:40:54] Although I'm not 100% sure it works [21:40:58] heh [21:41:04] bd808: exactly like prod ;) [21:41:08] mwalker: graphite [21:41:20] mwalker: I mean https://graphite-beta.wmflabs.org/ [21:41:39] And it's broken [21:41:41] I need a user/pass there apparently that has to come from you [21:41:56] See the office page it mentions [21:41:57] oh; which is on your office wiki page if I learned how to read [21:42:02] :) [21:42:16] And then log in and figure out why it's broken [21:42:31] I see how it is :D [21:42:44] payback for all the trouble I gave you with deployments yesterday [21:43:04] I did see your promise of cookies :) [21:43:17] do you like choco? [21:43:40] or rather; what is your favourite type of cookie? [21:43:51] looks like the wsgi server is dead on that box at the moment [21:44:05] mwalker: Oatmeal chocolate chip [21:44:27] Or a beer when I'm in the office next week [21:44:46] both of these things can happen [21:45:06] mwalker: https://graphite-beta.wmflabs.org/dashboard/ works now [21:45:21] Or at least it renders a page [21:45:23] heh [21:45:48] ok; /me prepares to send some junk its way [21:46:44] slightly off-topic query (it's an en.wp issue, but not really our problem) - does IMDb have a bugzilla ? [21:47:41] NotASpy: https://getsatisfaction.com/imdb?topics[type]=problem [21:48:17] * bd808 has never gotten satisfaction from getsatisfaction [21:49:25] OK. Their mobile site doesn't like the URLs we use on en.wp. The normal site converts http://www.imdb.com/name/nm115/ into http://www.imdb.com/name/nm0000115/ but the mobile site just 404s. 
[21:50:40] I suppose we could tweak our template, but there's bound to be a lot of links on the web being broken on mobile devices [21:50:42] (03PS3) 10Rush: admin yaml sanger [operations/puppet] - 10https://gerrit.wikimedia.org/r/136464 [21:51:00] (03PS4) 10Rush: admin yaml sanger [operations/puppet] - 10https://gerrit.wikimedia.org/r/136464 [21:54:25] (03CR) 10Greg Grossmeier: [C: 031] "Safe for a Friday deploy. Ori promised to watch and revert if any problems arise." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136455 (owner: 10Ori.livneh) [21:54:57] (03CR) 10Chad: [C: 031] Add $wgRCFeeds entries for RCStream on rcs100[12].eqiad.wmnet [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136455 (owner: 10Ori.livneh) [21:55:17] (03CR) 10Ori.livneh: [C: 032] Add $wgRCFeeds entries for RCStream on rcs100[12].eqiad.wmnet [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136455 (owner: 10Ori.livneh) [21:55:25] (03Merged) 10jenkins-bot: Add $wgRCFeeds entries for RCStream on rcs100[12].eqiad.wmnet [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136455 (owner: 10Ori.livneh) [21:57:32] !log ori Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 03s) [21:57:37] Logged the message, Master [21:59:50] (03PS1) 10Dzahn: admin yaml for wtp* (parsoid) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136471 [22:00:04] !log ori sync'd out a config change to Add $wgRCFeeds entries for RCStream on rcs100[12].eqiad.wmnet [22:00:08] Logged the message, Master [22:00:15] * greg-g fills in the (no message) bit [22:00:44] d'oh, thanks [22:00:57] (03PS1) 10Dzahn: rm old admins::parsoid class, replaced by yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136476 [22:00:58] :) :) [22:03:19] (03PS1) 10Rush: admin yaml wtp* [operations/puppet] - 10https://gerrit.wikimedia.org/r/136477 [22:04:21] (03CR) 10jenkins-bot: [V: 04-1] admin yaml wtp* [operations/puppet] - 
10https://gerrit.wikimedia.org/r/136477 (owner: 10Rush) [22:06:49] (03PS2) 10Dzahn: admin yaml for wtp* (parsoid) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136471 [22:07:50] (03CR) 10Dzahn: [C: 032] admin yaml for wtp* (parsoid) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136471 (owner: 10Dzahn) [22:08:25] (03PS1) 10Rush: admin yaml for tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/136479 [22:08:42] (03Abandoned) 10Rush: admin yaml wtp* [operations/puppet] - 10https://gerrit.wikimedia.org/r/136477 (owner: 10Rush) [22:08:50] (03PS2) 10Rush: admin yaml for tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/136479 [22:08:57] (03CR) 10Rush: [C: 032 V: 032] admin yaml for tin [operations/puppet] - 10https://gerrit.wikimedia.org/r/136479 (owner: 10Rush) [22:10:30] (03PS1) 10Dzahn: rm admins::ldap, replaced by ldap-admins yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136480 [22:10:36] (03CR) 10jenkins-bot: [V: 04-1] rm admins::ldap, replaced by ldap-admins yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136480 (owner: 10Dzahn) [22:19:39] (03PS2) 10Dzahn: rm admins::ldap, replaced by ldap-admins yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136480 [22:26:38] ori, I'm sending a statsd packet that looks like "ocg.pdftest_count:1|c"; but for some reason graphite seems to be treating it as a gauge and not incrementing -- so the time series just has a series of whatever I put in as the value -- do you have any insight onto what I could be doing wrong? 
[22:27:18] mwalker: you're reading the protocol spec correctly, which the txstatsd folks aren't [22:27:26] change it to an 'm' [22:27:34] i think -- let me verify, sec [22:27:50] yes [22:27:59] send 'ocg.pdftest:1|m' [22:28:12] no _count, it'll be added automatically [22:28:28] that's a meter, per https://github.com/b/statsd_spec#meters [22:28:39] you'll get a count out of that too [22:29:53] ah; interesting; I was reading the Etsy implementation of the statsd protocol assuming it was canonical [22:50:59] (03PS1) 10Yurik: Removed obsolete $wmgZeroDisableImages and $wgZeroDisableImages [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136487 [22:58:00] * YuviPanda waves at ori [23:09:48] (03PS6) 1020after4: initial commit for a phabricator module (WIP) [operations/puppet] - 10https://gerrit.wikimedia.org/r/132505 (owner: 10Dzahn) [23:20:12] (03CR) 10BryanDavis: initial commit for a phabricator module (WIP) (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/132505 (owner: 10Dzahn) [23:35:38] (03CR) 10DarTar: "JanZerebecki asked me to comment on the analytics requirements for this patch. if I understand csteipp’s request we’re already collecting " [operations/puppet] - 10https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: 10JanZerebecki) [23:37:01] (03CR) 1020after4: initial commit for a phabricator module (WIP) (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/132505 (owner: 10Dzahn) [23:51:49] (03CR) 1020after4: initial commit for a phabricator module (WIP) (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/132505 (owner: 10Dzahn) [23:54:10] ori, this doesn't make a lot of sense; the code in github (and I assume in launchpad...) for txstatsd has counter implemented like one would expect. the behaviour we're seeing seems like a bug -- something somewhat unfortunate though is that Sidnei (the author) hasn't been seen on github since nov.
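[note] For reference, the `name:value|type` packets discussed above can be built and sent over UDP roughly like this. A minimal sketch, not the txstatsd client: the helper names `format_metric` and `send_metric` are hypothetical, and the default port 8125 is only the conventional statsd port, assumed here rather than taken from the log. Which type codes the server honours (`c` versus `m`) depends on the server, as the exchange above shows.

```python
import socket

# Type codes used in the conversation and the linked statsd spec:
# counter, meter, gauge, timer. Server support for each is an assumption.
METRIC_TYPES = {"c", "m", "g", "ms"}


def format_metric(name, value, kind):
    """Return a statsd line such as 'ocg.pdftest:1|m'."""
    if kind not in METRIC_TYPES:
        raise ValueError("unknown metric type: %r" % kind)
    if ":" in name or "|" in name:
        raise ValueError("metric name may not contain ':' or '|'")
    return "%s:%s|%s" % (name, value, kind)


def send_metric(host, name, value, kind, port=8125):
    """Fire-and-forget one metric over UDP (port 8125 is an assumed default)."""
    payload = format_metric(name, value, kind).encode("ascii")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    # The suggested fix: send a meter and let the server derive the count.
    print(format_metric("ocg.pdftest", 1, "m"))
```

Calling `send_metric("deployment-graphite.eqiad.wmflabs", "ocg.pdftest", 1, "m")` would mirror the suggestion in the chat, assuming that host accepts statsd traffic on the assumed port.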
[23:55:44] mwalker: i'm just about to head upstairs to 6th, but let's figure that out [23:56:19] we could try and grab chasemp, i think he has a replacement that he plans to roll out [23:58:08] mwalker: note that we're using the configurableprocessor class, not the processor [23:58:10] mwalker: https://github.com/sidnei/txstatsd/blob/master/txstatsd/server/configurableprocessor.py [23:58:34] (03PS1) 10Ori.livneh: Add service check for rcstream backends [operations/puppet] - 10https://gerrit.wikimedia.org/r/136502