[00:01:07] superm401: did you change nicks?
[00:01:22] jeremyb_, no, why?
[00:01:38] i thought i remembered another nick
[00:01:46] anyway, you might want to actually !log that
[00:01:50] instead of just saying it
[00:01:56] ori-l: isAnon +1
[00:02:09] (e.g. mflaschen or something)
[00:02:18] jeremyb_, doesn't scap automatically log when you pass a message?
[00:02:27] Yes
[00:02:28] yeah, forgot about that one earlier
[00:02:30] It does now
[00:02:32] Didn't always
[00:03:05] ok... i can never remember what's automatic
[00:03:24] jeremyb_, mattflaschen on Gerrit, mflaschen for email and a couple obscure other things
[00:04:31] PROBLEM - Puppet freshness on colby is CRITICAL: Puppet has not run in the last 10 hours
[00:04:32] superm401: i was nearly certain i had seen you on IRC at some point before. maybe i imagined it
[00:04:33] mutante: Any idea what results.labs.wikimedia.org is? Resolves to 131.152.80.208.in-addr.arpa name = recursor0.wikimedia.org.
[00:04:53] HTTPSEverywhere has (?:ipv4|ipv6and4|results)\.labs
[00:05:13] The other 2 can be dealt with when we get answers to your email
[00:05:27] Reedy: i asked ops list about those 4
[00:05:35] like a little while ago..pending reply
[00:05:38] Oh, duh
[00:05:42] Reedy: that's not what it resolves to for me
[00:05:43] I didn't see results on the end
[00:06:13] ipv6and4.labs.wikimedia.org.
[00:06:15] whatever
[00:06:23] binasher: on the whole it looks like client-side rendering time dwarfs network latency
[00:06:23] and 91.198.174.7
[00:06:29] ip4.labs and results.labs have the same IP
[00:06:37] so pretty sure it is part of that
[00:07:29]
[00:07:35] We should update this regex
[00:07:41] ganglia3? has HTTPS now
[00:07:48] so does wikitech
[00:08:20] so does shop
[00:08:26] What's harmon?
[00:09:14] not found in site.pp, is up and running, but nothing in motd tells me right away..emm
[00:10:05] I'm killing (?:commonsprototype|mlqt|mobile)\.tesla\.usability as none of them resolve
[00:10:07] it runs 'lldpd'
[00:10:09] ori-l: although, does responseEnd only cover the initial response, and not the fetching of all http resources? if so, that would be counted in rendering
[00:10:16] LLDP is to provide an inter-vendor compatible mechanism to deliver Link-Layer notifications to adjacent network devices.
[00:10:22] apt-cache show lldpd
[00:10:34] binasher: dunno, looking it up
[00:10:45] Reedy: yea, removed tesla stuff from DNS to cleanup a while ago
[00:10:50] jobs redirects to foundation
[00:10:52] except one... the controller itself afair
[00:11:05] because Ryan might still need that to actually shut down stuff
[00:11:20] ori-l: i've been reading through http://dvcs.w3.org/hg/webperf/raw-file/tip/specs/NavigationTiming/Overview.html, but still not sure
[00:13:55] milimetric noticed a while back that the nav timing events correspond to markers in the timeline view in chromium's dev tools, could be useful to cross-reference. my head is exploding from today's deployment though, not able to think through it.
[00:16:02] jeremyb_, scap doesn't log anything when it starts, only when it gets to syncing, and mw-update-l10n before that can take a while.
[00:17:13] (?:(?:commons|de|en|test)\.)?prototype
[00:17:28] !log mflaschen Started syncing Wikimedia installation... : Deployment for Mobile and E3
[00:17:34] Logged the message, Master
[00:18:13]
[00:18:28] he's speaking in regexes again
[00:18:30] 21 minutes later, see what I mean :)
[00:18:31] * ori-l sprinkles holy water
[00:18:32] flaggedrevssandbox and the next are dead too
[00:19:08] nagios points to icinga which also has https
[00:21:30] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[00:22:01] I got an odd error during scap:
[00:22:05] "mw10.pmtpa.wmnet: rsync: send_files failed to open "/php-1.21wmf11/.git/modules/extensions/FormPreloadPostCache/index.lock" (in common): Permission denied (13)"
[00:22:15] Some kind of permissions issue
[00:22:28] "^http://(?:apt|bayes|bayle|brewster|bug-attachment|cs|cz|dataset2|download|dumps|ekrem|emery|ersch|etherpad|harmon|hume|(?:ipv4|ipv6and4|results)\.labs|m|project2|search|sitemap|snapshot3|stafford|statu?s|torrus|ubuntu|wiki-mail|wlm|yongle)\.wikimedia\.org"
[00:22:51] what is the issue with cs and cz?
[00:23:00] I suspect they're hosted offsite...
[00:23:15] oh, yea, true, they redirect to their own TLDs
[00:23:17] Yeah, they resolve elsewhere
[00:23:52] etherpad has https
[00:23:53] That's quite a reduction in number of exceptions though
[00:24:53] you can also remove "m"
[00:24:58] mutante: that's not quite accurate
[00:24:58] https://m.wikimedia.org/
[00:25:31] what is project2?
[00:25:35] jeremyb_: which part
[00:25:41] mutante: rt 2751
[00:26:47] That's etherpad..
[00:27:03] New patchset: Ori.livneh; "Add NavigationTiming to extension-list" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53903
[00:27:33] 15 00:23:52 < mutante> etherpad has https
[00:27:44] Ah
[00:27:53] yeah, it loads originally, then goes back to http
[00:27:54] jeremyb_: ugh, confirmed, well .. kind of .. it redirects from https to http when creating a new pad.. but also.. if i just change the URL back to https.. it stays :p
[00:28:10] mutante: but the ajax is over HTTP?
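On the responseEnd question above: per the Navigation Timing spec, responseEnd marks the last byte of the main document only; subresource fetches (CSS, JS, images) happen afterwards, so a naive network/rendering split attributes them to "rendering", which is what binasher is asking about. A sketch with made-up illustrative values (milliseconds since navigationStart, not real measurements):

```python
# Sketch of the naive network vs. rendering split discussed above.
# All numbers are hypothetical sample values, not measured data.
timing = {
    "navigationStart": 0,
    "responseStart": 180,   # first byte of the main document
    "responseEnd": 250,     # last byte of the main document ONLY
    "loadEventEnd": 1450,   # after subresource fetches + layout/paint
}

# Everything before responseEnd counted as "network", everything after
# as "rendering" -- subresource downloads after responseEnd get
# mis-attributed to rendering under this split.
network_ms = timing["responseEnd"] - timing["navigationStart"]
render_ms = timing["loadEventEnd"] - timing["responseEnd"]
print(network_ms, render_ms)  # → 250 1200
```
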
[00:28:30] iirc
[00:28:35] umpf.. Lost connection with the EtherPad synchronization server. This may be due to a loss of network connectivity.
[00:28:40] yea, as described
[00:29:10] so, "has https" may be a bit of a stretch :)
[00:29:12] but we can remove "m", right
[00:29:28] yea, agreed
[00:30:59] Okay, there were a few scap errors in the "Updating rsync proxies..." section.
[00:31:05] Reedy: fwiw.. https://wikitech.wikimedia.org/wiki/Httpsless_domains
[00:31:16] But it seems to be back to normal operation on mw and srv machines.
[00:31:53] "^http://(?:apt|bayes|bayle|brewster|bug-attachment|cs|cz|dataset2|download|dumps|ekrem|emery|ersch|etherpad|harmon|hume|(?:ipv4|ipv6and4|results)\.labs|m|project2|search|sitemap|snapshot3|stafford|statu?s|torrus|ubuntu|wiki-mail|wlm|yongle)\.wikimedia\.org"
[00:31:58] I think I'll submit that for now then
[00:32:00] the FIXME on rt is also fixed, isn't it
[00:32:12] Reedy: cool
[00:32:26] maybe we should ask mobile about "m"
[00:33:31] marking RT and wikitech as fixed
[00:33:32] Reedy: wikivoyage too? or did it already?
[00:33:46] Eh?
[00:33:53] Ages ago
[00:33:57] yea
[00:33:58] this is HTTPS everywhere, right?
[00:34:05] oh, well it wasn't showing for me in chrome
[00:34:13] https://github.com/reedy/https-everywhere/compare/master...reducewmfexceptions
[00:34:14] idk how long it takes to propagate
[00:34:14] Yeah
[00:34:16] to the stores
[00:34:22]
[00:34:22]
[00:41:20] New patchset: Reedy; "Bug 39482 - Rename "chapcomwiki" to "affcomwiki"" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53894
[00:41:29] ^ jenkins seems to be working now
[00:43:20] project2 - is in DNS but that IP does not appear to be assigned anywhere and it got removed from reverse DNS
[00:43:30] !log mflaschen Finished syncing Wikimedia installation... : Deployment for Mobile and E3
[00:43:38] Logged the message, Master
[00:45:38] And console confirms, scap is done.
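The exception pattern being pruned above is an ordinary regex, so the trimming can be sanity-checked mechanically. A quick sketch using Python's `re` with the pattern quoted verbatim from the log and a few hostnames from the discussion:

```python
import re

# The HTTPS-Everywhere exception pattern quoted above, verbatim.
pattern = re.compile(
    r"^http://(?:apt|bayes|bayle|brewster|bug-attachment|cs|cz|dataset2"
    r"|download|dumps|ekrem|emery|ersch|etherpad|harmon|hume"
    r"|(?:ipv4|ipv6and4|results)\.labs|m|project2|search|sitemap"
    r"|snapshot3|stafford|statu?s|torrus|ubuntu|wiki-mail|wlm|yongle)"
    r"\.wikimedia\.org"
)

# Hosts discussed in the log that are still listed as exceptions:
assert pattern.match("http://results.labs.wikimedia.org/")
assert pattern.match("http://m.wikimedia.org/")
assert pattern.match("http://status.wikimedia.org/")  # statu?s covers stats/status
# Hosts that now have HTTPS and were dropped from the list:
assert not pattern.match("http://wikitech.wikimedia.org/")
# Only plain http URLs are excepted; https is never matched:
assert not pattern.match("https://m.wikimedia.org/")
```
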
[00:46:17] New patchset: Reedy; "Bug 39482 - Rename "chapcomwiki" to "affcomwiki"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53892
[00:46:24] mutante: Fancy testing https://gerrit.wikimedia.org/r/#/c/53894
[00:47:24] !log olivneh synchronized php-1.21wmf11/extensions/NavigationTiming/modules/ext.navigationTiming.js 'Additional fields for NavTiming'
[00:47:30] Logged the message, Master
[00:47:52] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[00:49:59] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53903
[00:51:59] New patchset: Reedy; "Bug 39482 - Rename "chapcomwiki" to "affcomwiki"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53892
[00:55:50] Reedy: Warning: DocumentRoot [/usr/local/apache/common/docroot/affcom] does not exist
[00:56:00] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[00:56:57] !log reedy synchronized wmf-config/InitialiseSettings.php 'chapcomwiki readonly'
[00:57:05] Logged the message, Master
[01:00:25] New patchset: Pyoungmeister; "WIP: first bit of stuff for taming the mysql module and making the SANITARIUM" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53907
[01:00:27] New patchset: Reedy; "Bug 39482 - Rename "chapcomwiki" to "affcomwiki"" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53894
[01:06:31] New patchset: Reedy; "Bug 39482 - Rename "chapcomwiki" to "affcomwiki"" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53894
[01:09:41] !log olivneh synchronized php-1.21wmf11/extensions/NavigationTiming 'Missed a file in earlier sync'
[01:10:05] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53894
[01:10:40] Logged the message, Master
[01:10:44] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53892
[01:11:01] syncing
[01:12:16] !log reedy synchronized docroot
[01:12:22] Logged the message, Master
[01:12:39] dzahn is doing a graceful restart of all apaches
[01:13:22] !log dzahn gracefulled all apaches
[01:13:29] Logged the message, Master
[01:13:47] !log reedy synchronized wmf-config/InitialiseSettings.php
[01:13:56] Logged the message, Master
[01:15:13] RECOVERY - MySQL Replication Heartbeat on db71 is OK: OK replication delay 0 seconds
[01:15:22] RECOVERY - MySQL Slave Delay on db71 is OK: OK replication delay 0 seconds
[01:21:10] !log olivneh synchronized php-1.21wmf11/extensions/GettingStarted/CategoryRoulette.php 'Fix to use single-parameter version of SRANDMEMBER'
[01:21:18] Logged the message, Master
[01:27:00] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files:
[01:27:04] Logged the message, Master
[01:28:12] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files:
[01:28:18] Logged the message, Master
[01:34:13] !log reedy synchronized docroot
[01:34:21] Logged the message, Master
[01:34:53] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files:
[01:34:59] Logged the message, Master
[01:35:25] !log reedy synchronized wmf-config/
[01:35:32] Logged the message, Master
[01:40:00] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:43:45] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files:
[01:43:52] Logged the message, Master
[01:44:11] !log reedy synchronized wmf-config/
[01:44:17] Logged the message, Master
[01:44:43] !log olivneh synchronized php-1.21wmf11/extensions/GettingStarted/SpecialGettingStarted.php 'Fix deployed bug in check for editability'
[01:44:51] Logged the message, Master
[01:48:57] New patchset: Reedy; "Revert "Bug 39482 - Rename "chapcomwiki" to "affcomwiki""" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53914
[01:50:49] New patchset: Reedy; "Revert "Bug 39482 - Rename "chapcomwiki" to "affcomwiki""" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53915
[01:50:56] Change merged: Asher; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53914
[01:51:01] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53915
[01:51:33] reedy is doing a graceful restart of all apaches
[01:52:04] !log reedy gracefulled all apaches
[01:52:10] Logged the message, Master
[01:52:39] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files:
[01:52:46] Logged the message, Master
[01:53:44] !log reedy synchronized wmf-config/
[01:53:50] Logged the message, Master
[01:59:20] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 185 seconds
[01:59:50] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 192 seconds
[02:02:00] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[02:08:43] New patchset: Reedy; "Bug 39482 - Rename "chapcomwiki" to "affcomwiki"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53921
[02:09:03] New patchset: Reedy; "Bug 39482 - Rename "chapcomwiki" to "affcomwiki"" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/53922
[02:11:57] sync-apache ..
[02:12:14] done
[02:12:45] dzahn is doing a graceful restart of all apaches
[02:13:26] !log dzahn gracefulled all apaches
[02:13:33] Logged the message, Master
[02:16:14] !log reedy synchronized docroot
[02:16:19] Logged the message, Master
[02:20:10] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[02:30:02] !log LocalisationUpdate completed (1.21wmf11) at Fri Mar 15 02:30:02 UTC 2013
[02:30:11] Logged the message, Master
[02:32:02] New patchset: Ottomata; "hdfs sync to stats.wikimedia.org on *:15" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53925
[02:32:20] PROBLEM - MySQL Replication Heartbeat on db66 is CRITICAL: CRIT replication delay 200 seconds
[02:32:48] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53925
[02:32:50] PROBLEM - MySQL Slave Delay on db66 is CRITICAL: CRIT replication delay 233 seconds
[02:33:10] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[02:40:42] New patchset: J; "install libjpeg-turbo-progs for rotate api" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44008
[02:50:51] * jeremyb_ knew that wouldn't work, mutante
[03:00:23] mutante: so... there's a labs env for venus, right? to test the migration when you first switched
[03:00:36] have you been able to reproduce it there?
[03:08:19] i almost want to tell him to stop commenting on the bug. but i don't know python *that* well (or at least it's been too long since i had to deal with this particular issue)
[03:09:50] mutante: want to add me to the planet project and i'll play with it some?
[03:11:18] idk if ori-l still wants in
[03:11:50] RECOVERY - MySQL Slave Delay on db66 is OK: OK replication delay 0 seconds
[03:12:20] RECOVERY - MySQL Replication Heartbeat on db66 is OK: OK replication delay 0 seconds
[03:23:37] New patchset: Danny B.; "Adding Wikivoyage" [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/53928
[03:32:15] This NavigationTiming shit is exciting.
[03:32:19] Yay performance data.
[03:50:23] StevenW: is that like boomerang?
[03:50:56] Dunno.
[03:53:25] StevenW: http://lognormal.github.com/boomerang/doc/
[03:54:20] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 186 seconds
[03:54:50] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 198 seconds
[04:03:40] Can someone please take a look at ?
[04:14:31] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours
[04:16:04] Susan: i did
[04:16:19] Helpful.
[04:16:20] (a while ago)
[04:16:29] Susan: it was kicked. people didn't like the spamming
[04:16:37] I don't like people.
[04:16:39] I think we're even.
[04:16:42] there was a user trying to get help and had trouble because of the flood
[04:16:50] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 23 seconds
[04:16:52] (wasn't my decision i only saw the scrollback)
[04:17:03] Susan: anyway, there's #mediawiki-feed for now
[04:17:11] It's a terrible replacement.
[04:17:15] As it's missing wikibugs.
[04:17:22] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[04:17:29] it has wm-bot providing a similar service
[04:17:32] i witnessed it
[04:17:51] and you've been there too
[04:18:34] Susan: at some point we should either let it back into #mediawiki or configure it to not join (and remove the ban). also we could maybe get wikibugs to join #mediawiki-feed if you really care
[04:18:48] I do really care.
[04:28:46] where have you gone mr. wikibugs, joltin' joe has left and gone away
[04:32:29] there was no flood
[04:32:56] the channel was quiet while Andre Klapper went about his normal work
[04:35:04] well i was only reading after the fact not contemporaneously and didn't review timestamps. i saw *other* people called it a flood. i think
[04:36:20] jeremyb_: it is exactly like boomerang, except probably not as good. I hadn't heard of it before. They do some nifty things.
[04:36:38] ori-l: is it worth just using boomerang?
[04:37:11] maybe; dunno.
[04:39:25] The thing that seems good about it is that they've gone through some of the work figuring out how to extract useful metrics from the new wealth of client-side performance data. That's a lot of intellectual labor and experimentation but not a lot of code. We could probably just use some of their techniques.
[04:40:51] ori-l: how would i have found out about NavigationTiming besides just reading on IRC? i'm on the EE list. but i guess i don't read everything there. maybe it was mentioned
[04:41:45] i guess there's > [Analytics] RFC: Building a frontend performance analysis platform
[04:41:52] a whole thread :)
[04:42:06] but that's all from 3ish months back
[04:42:29] there wasn't really any work getting done for three months because supporting pieces needed to be put into place
[04:43:06] what are you guys on now, mingle?
[04:43:25] (i guess we should move this convo to someplace more relevant)
[04:43:26] no; I've made a concerted effort to move away from anything proprietary.
[04:43:47] I have to run, actually
[04:44:14] well, but the rest of the team(s)? analytics and EE
[04:44:21] bye :)
[04:46:30] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[04:55:44] New patchset: Tim Starling; "Fix lack of output for empty path part" [operations/debs/squid] (master) - https://gerrit.wikimedia.org/r/53930
[04:56:01] Change merged: Tim Starling; [operations/debs/squid] (master) - https://gerrit.wikimedia.org/r/53930
[05:15:26] New patchset: Tim Starling; "Update redirector binary" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53935
[05:15:50] New review: Tim Starling; "Tested live." [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/53935
[05:16:00] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53935
[05:19:53] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 192 seconds
[05:20:13] TimStarling: the best way to test
[05:24:30] I tested it on a single squid first
[05:24:56] I was just afraid that it would fail to start up due to incorrect C library version or something
[05:25:07] causing the whole site to instantly fail
[05:25:39] packages do have their positives, hint to ops
[05:26:05] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 13 seconds
[05:27:11] heh
[05:30:10] TimStarling: any chance for CR today?
[05:32:34] quite a list you've got there
[05:34:11] notpeter isn't online anymore it seems
[05:34:27] New patchset: Krinkle; "Ensure package 'doxygen' on contint." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53938
[05:35:02] Can someone in ops please merge/deploy the above package for gallium? I was assuming the package was already there, but looks like it slipped through the cracks when migrating the scripts from svn.pp
[05:35:07] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds
[05:35:25] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds
[05:35:26] jenkins script is currently erroring on bash command not found :-/
[05:35:39] usually I try to avoid building it up and slow down (doing CR and reading about stuff)
[05:36:09] it kind of doubled lately though
[05:38:05] TimStarling: I wish one could publicly star gerrit changes
[05:40:16] Aaron|home: what would that mean?
[05:40:35] i wish one could publicly ask for review from whoever felt like working the queue. without having to name people
[05:40:59] Well, it would be most useful for the list of changes by author. One could flag the more important ones out.
[05:41:27] jeremyb_: maybe https://www.mediawiki.org/wiki/Git/Reviewers will solve that a little
[05:41:29] AaronSchulz: so with twemproxy, everything will normally connect directly to twemproxy, except for mcc and mctest?
[05:41:37] assuming enough people add themselves
[05:41:44] * Aaron|home hasn't added any regexes himself
[05:41:58] TimStarling: depending on the --noproxy option or whatever passed to the script
[05:42:25] ugh, one would have to keep the internalServers and .yaml file in sync though, meh
[05:42:36] why not use a separate BagOStuff object?
[05:42:51] then you could specify a $wgObjectCaches key on the command line to mcc.php
[05:43:13] php mctest.php --cache=backend
[05:43:13] Aaron|home: ahhhh, interesting star use
[05:43:13] php mctest.php --cache=twemproxy
[05:43:50] I guess having a dummy objectcache won't hurt
[05:44:16] it could be useful to have it there in configuration in case a bug develops in twemproxy and we have to stop using it
[05:44:26] TimStarling: Could you perhaps merge that puppet change for me? It's a one line change, currently being blocked by it. https://gerrit.wikimedia.org/r/53938
[05:44:32] yep
[05:45:09] TimStarling: I abandoned that change
[05:45:15] * Aaron|home will add the --cache param
[05:45:29] thanks
[05:46:10] Aaron|home: hrmmmm, i think i saw that page before. but that doesn't really solve my problem
[05:46:36] Aaron|home: on bugzilla (at least the way mozilla uses it) you can ask for a review from a specific person or from "the wind"
[05:47:53] and how well does that work? :)
[05:48:23] i think it was ok?
[05:48:31] i don't really pay attention there any more
[05:48:54] Krinkle: sucks to be you
[05:49:06] New patchset: Tim Starling; "Ensure package 'doxygen' on contint." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53938
[05:49:06] TimStarling: excuse me?
[05:49:21] I'm a bit of a puppet noob, but I'm learning.
[05:49:24] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53938
[05:49:31] Thanks :)
[05:49:35] I mean, your lot in life is a hard one, you deserve sympathy
[05:49:47] because you were blocked i guess
[05:49:55] since you have puppetized your stuff instead of getting root, and now you need to beg any time you want to get something done
[05:50:32] Indeed. Perhaps one day I'll be crossing over in platform more than I am. I'm certainly interested in it.
[05:50:39] this is on gallium?
[05:50:46] Yes sir.
[05:51:13] I'm doing a manual puppet run there for you, so you will have that package within a minute or so
[05:51:22] I'm not sure what the puppet interval is, if , that'd be great.
[05:51:39] puppet interval is 30 mins
[05:52:34] Seems you got it all worked out :) Nice clean queue: https://gerrit.wikimedia.org/r/#/q/owner:%22Tim+Starling+%253Ctstarling%2540wikimedia.org%253E%22+status:open,n,z
[05:53:11] great, the bin is there.
[05:54:20] * TimStarling is cherry picking the easy stuff out of Aaron's list
[05:54:37] guilty pleasures
[05:55:14] so argument parsing in mcc.php looks fun
[05:55:24] $debug = in_array( '--debug', $argv );
[05:55:25] $help = in_array( '--help', $argv );
[05:57:00] Tyler may have a point on https://gerrit.wikimedia.org/r/#/c/53799/1/includes/ScopedCallback.php,unified
[06:00:24] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 183 seconds
[06:01:06] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 199 seconds
[06:02:58] .. and we're rolling! https://integration.mediawiki.org/ci/job/test-mediawiki-docgen/3/console https://doc.wikimedia.org/mediawiki-core/master/php/html/
[06:03:02] how many search clients do you suppose there are at any given time?
[06:03:24] we don't have stats?
[06:03:43] I mean open connections to lucene
[06:05:26] you think this is it? http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=search_threads&s=by+name&c=Search+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4
[06:05:52] I'm not sure what that measures, but it looks like a plausible number
[06:06:06] i.e. 60-80
[06:07:19] the yearly graph shows a spike to 150
[06:07:25] PROBLEM - Puppet freshness on mw26 is CRITICAL: Puppet has not run in the last 10 hours
[06:07:49] we could wrap search connections in a PoolCounter, then it would never cause downtime
[06:08:54] assuming poolcounterd can manage ~1.5 req/s
[06:10:58] well, 3 req/s if you count both lock and unlock
[06:11:08] TimStarling: I guess https://gerrit.wikimedia.org/r/#/c/53796/1 and https://gerrit.wikimedia.org/r/#/c/53806/1 are "easy" too
[06:13:30] looks like it's currently doing 550 pps received with 0.5% CPU
[06:14:13] per server
[06:14:15] there are two servers
[06:14:26] PROBLEM - Puppet freshness on strontium is CRITICAL: Puppet has not run in the last 10 hours
[06:15:56] * jeremyb_ wonders if mutante's there
[06:17:32] we could use closures for PoolCounter now
[06:18:17] once there was a closure subclass of PoolCounterWork, it would only take a few lines of code to add more callers
[06:19:12] Aaron|home: how busy are you?
[06:19:18] doesn't poolcounter get much more than 3 req/s when we're being slashdotted?
[06:19:24] (or popedotted)
[06:20:02] TimStarling: what's pps?
[06:20:13] TimStarling: like right now?
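The `in_array( '--debug', $argv )` style quoted above only handles boolean flags; the `--cache=<name>` option Aaron agrees to add needs key=value parsing. mcc.php/mctest.php are PHP, so this is just an illustrative sketch in Python of the shape being discussed (the backend names are hypothetical stand-ins for `$wgObjectCaches` keys):

```python
import argparse

# Illustrative stand-in for the planned mctest.php CLI: a --cache flag
# selecting a named cache backend, plus an existing boolean flag.
parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true")
parser.add_argument("--cache", default="backend",
                    help="which object-cache config key to use (hypothetical)")

# e.g. the 'php mctest.php --cache=twemproxy' invocation from the log:
args = parser.parse_args(["--cache=twemproxy", "--debug"])
print(args.cache, args.debug)  # → twemproxy True
```
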
[06:20:14] packets per second
[06:20:41] no, like middle of next week
[06:21:25] * jeremyb_ can't tell if Tim's serious :P
[06:21:36] I'm serious
[06:21:46] this was always meant to be part of the application space for poolcounter
[06:22:04] no i meant about "middle of next week"
[06:22:20] well, that is when I would want it done by
[06:22:34] but there are lots of other people who could do it if aaron is busy
[06:23:13] some of those nodes are so much more idle than others
[06:23:26] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 13 seconds
[06:24:00] TimStarling: maybe if my commit backlog was small I'd be more likely to look at it :)
[06:24:15] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
[06:24:19] fair
[06:24:31] bribery
[06:24:38] * Aaron|home should go on another review sprint like last week
[06:24:51] I think it took two people to keep up with brad
[06:25:07] jeremyb_: the time resolution is very low now but maybe it increased the rate from 520 to 670 packets per second
[06:26:14] you would expect very brief spikes each time squid is purged
[06:26:23] that wouldn't show up on this data
[06:30:30] Hm.. quick bash question involving file descriptors 1 and 2 and tee to send to stdout and a file. https://gerrit.wikimedia.org/r/#/c/39212/11/tools/mw-doc-gen.sh
[06:37:24] PROBLEM - Puppet freshness on mw1118 is CRITICAL: Puppet has not run in the last 10 hours
[06:38:44] If |& is used, the standard error of command is connected to command2's standard input through the pipe; it is shorthand for 2>&1 |.
[06:39:08] maybe that's what you want
[06:40:06] $cmd > out.txt |& ( tee error.txt >&2 )
[06:41:06] http://manpages.ubuntu.com/manpages/precise/en/man1/bash.1.html#contenttoc9
[06:46:48] Aaron|home: 53796 is not really that simple
[06:47:53] since there is no calling code, it's hard to know whether the usual problems with purging heavily used caches apply
[06:50:14] Aha, I almost got it.
[06:50:15] Thanks!
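A caveat on the `$cmd > out.txt |& ( tee error.txt >&2 )` suggestion above: as I read the bash manual (the implicit `2>&1` from `|&` is performed *after* the command's own redirections), both streams end up in out.txt and nothing reaches the pipe; a process substitution such as `cmd 2> >(tee error.txt >&2) > out.txt` is usually the form wanted. The intended behavior (stdout to a file, stderr tee'd to a file and the terminal) can be sketched portably in Python; the filenames and command here are hypothetical:

```python
# Sketch: child's stdout goes only to out.txt; its stderr is "tee'd"
# to both error.txt and our own stderr. Filenames are illustrative.
import subprocess
import sys

# Hypothetical command standing in for $cmd.
cmd = [sys.executable, "-c",
       "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]

with open("out.txt", "w") as out:
    proc = subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE, text=True)

# Manual tee: write captured stderr to a file and echo it to our stderr.
with open("error.txt", "w") as errf:
    errf.write(proc.stderr)
sys.stderr.write(proc.stderr)
```
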
[06:50:19] I get it now.
[06:50:25] PROBLEM - Puppet freshness on europium is CRITICAL: Puppet has not run in the last 10 hours
[06:56:21] * jeremyb_ waits for Krinkle's new patchset
[06:56:49] Oh not yet, I'm doing something else first. the script works fine, the output is in jenkins. It's just the txt file
[06:57:12] jeremyb_: I'm currently trying to figure out why docroot on integration.mediawiki.org and doc.wikimedia.org is being overwritten by puppet
[06:58:21] what exactly is being overwritten?
[06:58:55] gallium now has the docroot out of puppet in integration/docroot.git, so we don't have to have ops merge simple html/css changes to the web portal
[06:59:13] afaik we moved the stuff out of puppet, but somehow it is still enforcing it
[06:59:30] causing the files deployed by git/jenkins to be overridden in their working copy
[07:08:14] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 192 seconds
[07:08:24] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 194 seconds
[07:12:58] Krinkle: you've checked for cron jobs?
[07:13:43] jeremyb_: I know it's puppet doing it based on the specific version of the docroot it is restoring
[07:13:55] idk how you could know that
[07:14:02] other things could pull from git
[07:14:02] it's still managed by puppet, see modules/contint/manifests/website.pp
[07:14:10] i looked there
[07:14:25] ori-l: Only the directory, not the files
[07:14:31] those file resources were removed
[07:16:27] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours
[07:16:27] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours
[07:16:27] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[07:16:27] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours
[07:16:27] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[07:17:30] Krinkle: i must insist that you don't know it's puppet
[07:17:36] unless you have more proof :)
[07:18:07] there are no other crons, when I reset it, it is overwritten within an hour
[07:18:15] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds
[07:18:17] I wouldn't be surprised if it was, say, every 30 minutes
[07:18:23] did you check *all* of the crontabs?
[07:18:24] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds
[07:18:29] did you check the cron logs too?
[07:19:35] note, there's /etc/crontab, /etc/cron.d, /etc/cron.{hourly,daily,etc.}, and then each user has a crontab too
[07:20:27] Krinkle: re: doc.wikimedia.org, manifests/misc/docs.pp defines '/srv/org/wikimedia/doc/index.html' to have source => 'puppet:///files/misc/jenkins/doc_index.html'
[07:21:10] jeremyb_: do you have root on gallium? Maybe it is an old effect that only needs to be fixed once. The file is not owned by jenkins so it stays modified
[07:21:21] jeremyb_: /srv/org/mediawiki/integration/index.html
[07:21:25] New patchset: Krinkle; "svn: Disable mwdocs and redirect to doc.wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53954
[07:21:56] error: unable to unlink old 'org/mediawiki/integration/index.html' (Permission denied)
[07:21:59] Krinkle: i don't even have bastion access
[07:22:04] so... no
[07:22:17] I keep confusing you with someone else
[07:22:34] i wonder who!
[07:36:32] New patchset: Krinkle; "misc::docsite: Remove file "doc/index.html"." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53955
[07:37:47] so ori-l found it?
[07:43:26] jeremyb_: I knew doc.wikimedia was still in puppet, I'm focussing on integration.wikimedia.org first
[07:43:33] but I did fix that one while at it at his reminder
[07:43:51] the problem is /org/mediawiki/integration/index.html which is stuck in permission denied
[07:44:57] which is causing git-checkout, git-pull, git-stash whatever to fail
[07:45:12] it is unable to move or unlink that file, even though the permissions are fine, it is the parent directory
[07:47:41] find /org/mediawiki/integration -exec ls -ld {} +
[07:47:46] -> pastebin
[07:47:50] Krinkle
[07:48:40] jeremyb_: no, that'll be huge
[07:48:49] jenkins publishes to a sub directory of this
[07:48:56] oh, not mostly empty?
[07:49:13] nightlies for one
[07:49:26] what are you looking for?
[07:49:31] I know the permissions and the problem [07:49:42] then, find /org/mediawiki/integration -maxdepth 1 -exec ls -ld {} + [07:50:03] /srv/mediawiki/integration is 755 by www-data, /srv is a git clone of integration/docroot.git deployed by jenkins [07:50:11] that dir needs to have jenkins as owner [07:50:15] idk, you said "even though the permisisons are fine it is the parent directory" [07:50:26] yes, index.html itself has permissions owned by jenkins [07:50:56] but even on 777, you can't recreate the file without access to the directory it is in [07:51:07] riiiight [07:51:11] who are you? [07:51:15] jenkins i guess [07:51:22] ? [07:51:39] the user recreating is jenkings [07:51:42] jenkins* [07:51:46] not e.g. www-data [07:52:15] when someone merges a change in gerrit, a zuul notification is sent to jenkins which then starts a build, at the end it does git pull in /srv [07:52:22] or rather git checkout ZUUL_REF [07:52:39] which is done by jenkins [07:53:07] I'm simulating that currently on gallium from sudo -su jenkins so I don't have to push fake changes to gerrit all the time [07:57:14] Where is op when you need one? [07:58:25] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [08:04:58] New patchset: Krinkle; "contint: Fix permissions in /srv." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53957 [08:06:21] yeah, that'll do it [08:06:24] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Puppet has not run in the last 10 hours [08:06:24] PROBLEM - Puppet freshness on sq83 is CRITICAL: Puppet has not run in the last 10 hours [08:08:25] PROBLEM - Puppet freshness on mw1109 is CRITICAL: Puppet has not run in the last 10 hours [08:12:12] Krinkle: sleeping! [08:12:30] speaking of which i should... 
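The point Krinkle makes above — a file's own mode is irrelevant if you cannot write its parent directory — is easy to demonstrate with throwaway paths (these are placeholders, not the real gallium docroot). Note the demo won't reproduce as root, since root bypasses permission checks.

```shell
# Unlinking or replacing a file needs write permission on the *directory*,
# not on the file: even a 0777 file can't be removed from a read-only dir.
tmp=$(mktemp -d)
mkdir "$tmp/docroot"
echo old > "$tmp/docroot/index.html"
chmod 777 "$tmp/docroot/index.html"   # file itself is wide open
chmod 555 "$tmp/docroot"              # but the parent dir is not writable
if rm -f "$tmp/docroot/index.html" 2>/dev/null; then
  echo "unlink succeeded (running as root?)"
else
  echo "unlink failed: parent directory is not writable"
fi
chmod -R u+w "$tmp"
rm -rf "$tmp"
```

This is exactly why chowning index.html to jenkins wasn't enough: git's checkout has to unlink the old file, which requires write access on /srv/org/mediawiki/integration itself.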
[08:14:25] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [08:14:27] New patchset: Krinkle; "Integration: Move to wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513 [08:14:27] New patchset: Krinkle; "misc::docsite: Remove file "doc/index.html"." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53955 [08:14:29] New patchset: Krinkle; "contint: Move docs.pp into contint" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53958 [08:17:07] New patchset: Krinkle; "Integration: Move to wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513 [08:18:09] New patchset: Krinkle; "contint: Move integration site to wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513 [08:19:43] in which repo i can find extract2, search-redirect, missing wiki etc? [08:21:21] Danny_B: operations/mediawiki-config.git [08:21:32] https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-config.git;a=tree [08:35:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:36:13] New review: Nemo bis; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53885 [08:38:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time [08:38:53] New review: Nemo bis; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53885 [08:39:09] New patchset: Nemo bis; "Update wgServer and wgCanonicalServer for multi subdomain wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/53885 [09:04:04] join #mediawiki-feed [09:04:13] grrr. 
/ [09:11:54] RECOVERY - Puppet freshness on colby is OK: puppet ran at Fri Mar 15 09:11:51 UTC 2013 [09:35:34] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 188 seconds [09:37:34] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [09:40:24] PROBLEM - MySQL Idle Transactions on es7 is CRITICAL: Timeout while attempting connection [09:41:15] RECOVERY - MySQL Idle Transactions on es7 is OK: OK longest blocking idle transaction sleeps for seconds [09:53:15] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 191 seconds [09:53:34] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 198 seconds [10:06:24] PROBLEM - Puppet freshness on cp1034 is CRITICAL: Puppet has not run in the last 10 hours [10:22:16] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 12 seconds [10:22:26] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [10:22:34] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [11:35:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:36:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.342 second response time [11:43:43] New patchset: Silke Meyer; "Added customized sidebar for Wikidata test repos." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/53967 [11:57:07] New patchset: Silke Meyer; "Added more extensions and their settings to Wikidata test clients" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53969 [11:59:09] New review: Demon; "This won't actually remove the cronjobs, you should do the following:" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/53954 [12:02:55] New patchset: Aklapper; "bugzilla_report.php: Add query and formatting for list of urgent issues" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387 [12:10:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:13:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.784 second response time [12:40:52] New patchset: Demon; "Finish puppetizing wikibugs bot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53973 [12:41:53] awesome :) [12:44:22] New patchset: Demon; "Finish puppetizing wikibugs bot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53973 [12:45:06] New review: Demon; "PS2 removes the start script. The puppetized ircecho takes care of all of this for us--just have to ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53973 [12:46:25] New patchset: Demon; "Finish puppetizing wikibugs bot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53973 [12:46:46] New review: Demon; "And PS3 removes a totally unrelated change that snuck in." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/53973 [13:07:24] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 185 seconds [13:07:34] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 192 seconds [13:12:17] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 187 seconds [13:12:34] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 190 seconds [13:14:25] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 186 seconds [13:14:37] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 191 seconds [13:16:26] hello [13:19:15] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 184 seconds [13:23:16] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 26 seconds [13:26:36] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [13:35:24] RECOVERY - Puppet freshness on amslvs1 is OK: puppet ran at Fri Mar 15 13:35:14 UTC 2013 [13:36:20] Change abandoned: Silke Meyer; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47424 [14:39:36] New review: Hashar; "Indeed, that file has already been moved to integration/docroot.git with https://gerrit.wikimedia.or..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/53955 [14:47:24] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [14:57:15] New review: Hashar; "We can probably use the `jenkins` group as well since we are all part of it or should be part of it..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53957 [14:58:29] New review: JanZerebecki; "Yes, that is the goal. The PCRE (.*\.|)fqdn should match sub.fqdn and fqdn (with nothing before it)...." 
[operations/apache-config] (master) C: 1; - https://gerrit.wikimedia.org/r/53403 [15:02:06] New review: Hashar; "Refactoring to contint module is nice, but I would prefer we get rid of this hack. Puppet should rea..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/53958 [15:23:10] !rt [15:23:10] http://rt.wikimedia.org/Ticket/Display.html?id=$1 [15:23:16] apergos: ^ [15:24:12] !del rt [15:24:12] if you want to delete a key, this is wrong way [15:24:18] hah [15:24:28] !rt del [15:24:28] Successfully removed rt [15:24:30] !rt del [15:24:30] Unable to find the specified key in db [15:24:41] !rt is https://rt.wikimedia.org/Ticket/Display.html?id=$1 [15:24:41] Key was added [15:24:48] !rt foo [15:24:48] https://rt.wikimedia.org/Ticket/Display.html?id=foo [15:39:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:40:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.033 second response time [15:51:27] New review: Krinkle; "@Hashar: We still need the apache configuration of course, which I move here. And I'm removing the ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53958 [15:53:52] New patchset: Krinkle; "contint: Move docs.pp into contint" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53958 [15:53:53] New patchset: Krinkle; "contint: Move integration site to wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513 [15:54:59] what was I supposed to notice about that borken rt link, jeremyb_ ? [15:55:51] apergos: just what the format is. you had said 15 09:37:06 < apergos> I have no idea what links to rt are supposed to look like, so the answer is 'not by me' [15:56:27] and Nemo_bis updated the map on meta. so now we just need someone to sync it to wikitech :) [15:56:45] jeremyb_: are you sure it uses that map? [15:58:41] Nemo_bis: if someone chooses to.
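As an aside on JanZerebecki's review above: the `(.*\.|)fqdn` pattern can be sanity-checked from a shell, using `example.org` as a stand-in for the real fqdn (GNU grep's -P gives PCRE semantics).

```shell
# The alternation (.*\.|) makes the subdomain prefix optional, so the
# pattern matches both the bare domain and any subdomain of it.
pattern='^(.*\.|)example\.org$'
printf '%s\n' example.org sub.example.org deep.sub.example.org other.org \
  | grep -P "$pattern"
# Matches the first three; other.org is rejected.
```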
ariel's original wikitech work used the map. AFAIK [15:58:55] well it used that map because it was horrible [15:59:11] what should happen is the interwiki cache db for the cluster needs to be regenerated with that map [15:59:19] then I can steal it and update it for wikitech properly [16:00:20] Krinkle: so yeah I did some review :-] [16:00:38] Krinkle: and I found out how to get jenkins to run with umask 0002 so it creates files as group writable by default :-] [16:00:42] (hopefully) [16:01:11] New patchset: Krinkle; "svn: Disable mwdocs and redirect to doc.wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53954 [16:02:52] New patchset: Andrew Bogott; "Fix up the mediawiki extension class a bit." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51797 [16:02:52] New patchset: Andrew Bogott; "Added optional wiki_name setting" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51796 [16:02:53] New patchset: Andrew Bogott; "Rearrange webserver dependencies so that openstack and mediawiki::singlenode classes can play nice together." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51799 [16:02:53] New patchset: Andrew Bogott; "Switch the openstack manifest to use webserver::php5." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51798 [16:02:54] New patchset: Andrew Bogott; "First pass at a labsconsole puppet setup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53989 [16:04:28] New review: Krinkle; "@Hashar: As mentioned in the commit message, yes these resource descriptors should go out of puppet ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53957 [16:05:19] hashar: jenkins can't git-checkout /srv/ [16:05:23] because of that [16:05:29] it will with that change [16:07:20] that ?
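The umask change hashar mentions above has a simple effect: with umask 0002, newly created files default to mode 0664 (group-writable) instead of 0644, because the mask is subtracted from the 0666 creation mode. A local illustration only, not the actual change to the Jenkins daemon (that is what https://gerrit.wikimedia.org/r/53990 does):

```shell
# umask masks permission bits off newly created files:
#   0666 & ~0022 = 0644 (default), while 0666 & ~0002 = 0664 (group-writable).
tmp=$(mktemp -d)
( umask 0022; touch "$tmp/default.txt" )
( umask 0002; touch "$tmp/group.txt" )
stat -c '%a %n' "$tmp"/*.txt   # GNU stat; shows the two resulting modes
rm -rf "$tmp"
```

Combined with group-writable (775) directories owned by the jenkins group, this lets both the daemon and human admins in that group update /srv without sudo.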
[16:07:30] that is contextless :-] [16:07:38] New patchset: Hashar; "jenkins now creates files group writable" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53990 [16:07:41] here it is [16:08:24] PROBLEM - Puppet freshness on mw26 is CRITICAL: Puppet has not run in the last 10 hours [16:08:57] Krinkle: regarding the permissions of /srv/ what do you think about having them belong to the group jenkins ? [16:09:27] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53967 [16:11:17] New review: Hashar; "https://gerrit.wikimedia.org/r/53990 will make the Jenkins daemon to have umask 0002 which should en..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53957 [16:11:20] hashar: sure, so 775>755 as well then, right ? [16:11:29] Krinkle: I think so [16:11:29] for the whole /srv stack [16:11:34] Krinkle: just replied on the /srv/ change [16:11:40] hmm [16:11:41] no [16:11:44] 775 :-] [16:11:50] so we (human) can write to it if needed [16:12:00] without having to sudo jenkins [16:12:33] yeah, the other way around [16:12:39] that's what I meant [16:12:42] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53969 [16:12:45] hashar: I'm including /srv/localhost/qunit as well [16:13:34] yeah good catch :-] [16:13:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:14:43] New patchset: Krinkle; "contint: Fix permissions in /srv." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53957 [16:14:45] New patchset: Krinkle; "contint: Move docs.pp into contint" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53958 [16:14:46] New patchset: Krinkle; "contint: Move integration site to wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513 [16:14:46] New patchset: Krinkle; "misc::docsite: Remove file "doc/index.html"." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/53955 [16:15:19] lovely gerrit :-] [16:15:25] PROBLEM - Puppet freshness on strontium is CRITICAL: Puppet has not run in the last 10 hours [16:16:51] poor timo :-] [16:17:22] New review: Hashar; "Great :-]" [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/53957 [16:18:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.882 second response time [16:19:11] New review: Hashar; "Need to remove ./files/misc/jenkins/doc_index.html from puppet as well =)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/53955 [16:19:30] Krinkle: https://gerrit.wikimedia.org/r/#/c/53955/ ./files/misc/jenkins/doc_index.html can be removed from puppet, no more used with that change [16:19:52] the docs.pp https://gerrit.wikimedia.org/r/#/c/53958/ you can abandon it [16:20:09] Do we have any general puppet guides on wikitech? e.g. coding style, appropriate uses, etc? [16:20:24] andrewbogott: none I know of. [16:20:32] ok, I'm about to make one [16:20:39] (which will consist entirely of questions and no answers) [16:21:00] andrewbogott: there is an upstream style guide though which puppet-lint let you check your manifests with [16:21:24] andrewbogott: what would be nice is an explanation of our overall design. Like the role classes, what the modules should be made for .. [16:21:33] Right, that's just what I'm thinking. [16:21:38] here is the guide http://docs.puppetlabs.com/guides/style_guide.html [16:21:46] andrewbogott: make sure to involve faidon :-] [16:21:52] What should be in a module, what should be outside. 
Also, what should be puppetized and what should just be hand-installed [16:22:02] I was going to make a page with just topic lines and let Faidon fill in the text :) [16:22:02] andrewbogott: but not m4rk cause he will ask us to use tabs :-] [16:22:07] New patchset: Krinkle; "contint: Move docs.pp into contint" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53958 [16:22:08] New patchset: Krinkle; "contint: Move integration site to wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513 [16:22:09] New patchset: Krinkle; "misc::docsite: Remove file "doc/index.html"." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53955 [16:22:20] The tab ship has long-since sailed, hasn't it? [16:22:35] it still has a captain on board [16:23:10] faidon and I are happily sinking the tab boat though [16:23:12] The tab ship runs aground on a reef, littering the shores of Duluth/Superior with ambiguous whitespace [16:23:30] coffee break [16:23:39] I both a) hate tabs and b) know better than to advocate this position in public [16:23:43] then I might need your super power andrew :-] [16:23:58] we got a bunch of small puppet changes for you to press [merge] on :-D [16:24:18] 17:22:20 The tab ship has long-since sailed, hasn't it? [16:24:20] what is this bullshit? [16:25:04] mark: I just meant, "we have existing conventions for tab usage and there's no point in discussing the merits of our conventions" [16:25:30] the space ship has sailed! [16:25:33] that actually sounds funny [16:25:46] Oh yeah, 'space ship' is better. Hm... [16:26:01] :) [16:26:13] =) [16:30:06] funny having jenkins review changes to itself [16:30:31] I should have named it zuul-the-gatekeeper [16:31:44] PROBLEM - Varnish traffic logger on cp1034 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [16:31:48] New review: Hashar; "That is unneeded now, we can remove the doc generation next week when Openstack merge the change and..."
[operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/53958 [16:34:30] * jeremyb_ glares at the ships [16:37:20] Krinkle: don't you need both http and https access for qunit ? [16:37:29] ? [16:37:31] Krinkle: bah that is on localhost.qunit … disregard [16:37:34] https://wikitech.wikimedia.org/wiki/Puppet_usage [16:37:42] Redirect permanent / https://integration.wikimedia.org/ is nice :-] [16:38:16] Used to be rewrite in an earlier patch version, but I grabbed this from another apache conf in puppet [16:38:25] PROBLEM - Puppet freshness on mw1118 is CRITICAL: Puppet has not run in the last 10 hours [16:38:27] I forgot that Redirect takes care of path query and even hash [16:38:30] just straight forward [16:38:35] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [16:38:47] hashar: btw, does bin/mv have a way to replace the existing directory (if any) [16:39:04] I need to mv workspace/ext/visualeditor/docs to /srv/org/wm/docs/VisualEditor/master [16:39:27] should I do silent rm -rf, mkdir -p (without /master) and then mv? [16:39:39] or is there a cleaner way [16:39:50] rsync --delete ? :-D [16:40:48] even --delete-after [16:40:59] that will make sure we still serve something during the file copy [16:44:22] New review: Hashar; "Good. Will need:" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/53513 [16:45:05] New review: Krinkle; "Symlink is in place already (in the docroot.git repo, but it can't be deployed due to the permission..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513 [16:45:25] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 191 seconds [16:45:33] hashar: since you're root, can you fix the permissions manually and quickly git pull? Just so we at least have yesterday's changes live. [16:45:34] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 193 seconds [16:45:52] e.g.
minor fixes to portals and addition of links to the integration page [16:45:57] https://integration.mediawiki.org/ is old [16:46:33] puppet will undo the permissions, but it won't undo the update of integration/index.html [16:46:36] andrewbogott: mark: can one of you volunteer to merge a few minor puppet changes please ? :-] [16:48:30] hashar: add me as a reviewer? [16:50:30] andrewbogott: sure [16:51:21] andrewbogott: mails incoming [16:51:25] PROBLEM - Puppet freshness on europium is CRITICAL: Puppet has not run in the last 10 hours [16:51:35] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa [16:53:33] andrewbogott: and I replied to your puppet usage questions on the talk page : https://wikitech.wikimedia.org/wiki/Talk:Puppet_usage :-] [16:54:08] cool, thanks [16:54:25] did you get the gerrit emails asking for review or do you want me to link the changes there ? [16:55:08] Got 'em. [16:55:25] hashar, contint server == gallium == doc.wikimedia.org? [16:55:30] yeah [16:55:42] I will phase out your puppet doc generation next week [16:55:49] to stop puppet from doing it in favor of Jenkins [16:56:02] great. [16:56:55] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53957 [16:57:07] … I wonder what happens if two runs of the puppet doc-generation overlap? [16:57:20] I will run puppet on gallium whenever the changes land on sockpuppet [16:57:48] andrewbogott: the jenkins job is not enabled yet :-] [16:57:57] New patchset: Hashar; "contint: Move integration site to wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513 [16:58:45] New review: Hashar; "rebased to get rid of the dependency upon "contint: Move docs.pp into contint" https://gerrit.wikime..."
[operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/53513 [16:58:59] https://gerrit.wikimedia.org/r/#/c/53513/ is the last patch of the series [16:59:13] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53955 [16:59:18] hashar: I mean, if two patches are submitted at the same time, will that prompt Jenkins to regenerate the puppet docs twice, and at the same time? [16:59:32] (for certain values of 'same') [16:59:47] !log stopped puppet on gallium to fix some permissions before puppet changes are merged in and applied. [16:59:53] Logged the message, Master [17:01:05] andrewbogott: Hm.. if that were true it might even cause a race condition where they end up outdated. [17:01:20] I don't think parallel jobs within 1 target repository are enabled [17:01:22] for this reason [17:01:23] and others [17:01:33] Krinkle: Yeah, I think there needs to be a lock of some sort if it's called by jenkins. [17:01:35] so it'd never run simultaneously [17:01:48] zuul/jenkins makes sure of that [17:01:54] also because it needs to test against merge conflicts [17:02:07] andrewbogott: but yet, I do think it would queue it twice. [17:02:43] Depending on the job it may be important that you run it for each version. Perhaps there is a way to cancel old jobs in the queue. But it's not like mediawiki's job queue. [17:02:56] it does run postmerge [17:03:08] so gerrit doesn't have to wait for it to vote or anything [17:03:27] !log gallium : update /srv/ to latest version of integration/docroot.git ( aaa4b0c ) and moved all content from /srv/org/mediawiki/integration to the new /srv/org/wikimedia/integration [17:03:34] Logged the message, Master [17:03:43] !log gallium : restarted puppet [17:03:49] Logged the message, Master [17:04:09] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53990 [17:04:19] hashar: That stuff isn't on sockpuppet yet...
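The rsync approach hashar suggested earlier for publishing freshly built docs (rather than rm/mkdir/mv) can be sketched like this. Directories are placeholders, not the real /srv layout; the point of --delete-after is that deletions are postponed until the transfer finishes, so the docroot never goes empty mid-copy.

```shell
# Publish a newly built docs tree over the currently served one.
# --delete-after removes files that vanished upstream only once the
# new content is fully in place.
src=$(mktemp -d)   # stand-in for workspace/ext/visualeditor/docs
dst=$(mktemp -d)   # stand-in for the served VisualEditor/master dir
echo new   > "$src/index.html"
echo stale > "$dst/removed-page.html"
rsync -a --delete-after "$src/" "$dst/"
ls "$dst"   # only index.html remains; the stale page is gone
rm -rf "$src" "$dst"
```

Note the trailing slashes: `"$src/"` means "the contents of src", so the destination directory itself is updated in place rather than nested.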
[17:04:29] andrewbogott: I can handle the jobs dependency in Zuul. Never tried it but apparently Zuul can cancel old jobs if there is a new patchset that overrides it [17:04:39] andrewbogott: yeah just restarted puppet :-] [17:04:54] to make sure I don't leave the box without puppet running :-] [17:05:03] Krinkle: /srv/ should be fine now [17:05:45] hashar: "You don't have permission to access / on this server." [17:05:49] https://integration.mediawiki.org/ [17:06:16] either we don't have followsymlinks or it needs a clause for /org/wikimedia/integration [17:06:20] or both [17:06:28] symlink is wrong :-] [17:06:35] y'all are ready for the s/mediawiki/wikimedia/g patch to drop? [17:06:48] ll /srv/org/mediawiki/integration [17:06:49] lrwxrwxrwx 1 jenkins jenkins 25 Mar 15 17:01 /srv/org/mediawiki/integration -> org/wikimedia/integration [17:06:53] That would work too [17:07:02] hashar: You mean cancel a pending job or a running job? [17:07:37] andrewbogott: so given change A -> B -> C , you merge them all in sequence. Zuul would first trigger A, when it detects B, it cancels A and runs B instead. When C lands, it cancels A. [17:07:48] Krinkle: hashar: re Redirect, no, I don't think it does the hash. I don't think your browser even sends the anchor to the server at all [17:08:12] andrewbogott: if C is a success, zuul assumes A and B to be valid changes and does not retrigger them. If C is a failure, it tries B. [17:08:27] andrewbogott: something like that. But for merged commits that is not really useful. [17:08:30] hashar: I understand the logic but I'm curious what you mean by 'cancel'. Is it actually killing the running job? [17:08:41] andrewbogott: yeah it should :-] [17:09:00] Hm… well hopefully the doc-generator will handle that gracefully :) [17:09:23] hashar: Is that all the patches you needed?
[17:09:24] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513 [17:09:24] will have to find out :-] [17:09:32] andrewbogott: checking my dashboard [17:10:39] andrewbogott: yeah seems good :-] [17:11:35] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 6 seconds [17:11:35] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 2 seconds [17:11:52] hashar: OK, all merged on sockpuppet now. [17:12:00] time to see what breaks :) [17:12:36] running puppet [17:12:55] puppet [17:12:57] I hate you [17:13:31] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate definition: File[/srv/org/wikimedia] is already defined in file /var/lib/git/operations/puppet/modules/contint/manifests/website.pp at line 20; cannot redefine at /var/lib/git/operations/puppet/modules/contint/manifests/website.pp:53 on node gallium.wikimedia.org [17:13:33] :-] [17:13:48] I forgot to check them [17:14:01] * andrewbogott stands by to merge another tiny patch [17:14:35] RECOVERY - Puppet freshness on mw1109 is OK: puppet ran at Fri Mar 15 17:14:28 UTC 2013 [17:16:01] incoming [17:16:20] * greg-g ducks [17:16:20] New patchset: Hashar; "contint: dupe /srv/org/wikimedia" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54000 [17:16:43] andrewbogott: the tiny fix https://gerrit.wikimedia.org/r/#/c/54000/ [17:17:25] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours [17:17:25] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours [17:17:45] * marktraceur jsducks [17:17:46] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54000 [17:18:11] marktraceur: andrew is there doing all the awesome review + deployment of Timo's changes [17:18:14] hashar: ok, merged [17:18:20] running puppet [17:18:46] <^demon> The symlink in /srv/org/mediawiki is broken :/ [17:18:56]
^demon: https://gerrit.wikimedia.org/r/#/c/53999/ [17:19:08] andrewbogott: catalog compiled :] [17:19:31] !log integration website going down while we deploy a new layout. [17:19:38] Logged the message, Master [17:21:15] !log gallium : reloading Zuul so it reports the status url with the wikimedia.org domain [17:21:22] Logged the message, Master [17:22:08] !log mlitn synchronized php-1.21wmf11/extensions/ArticleFeedbackv5/ 'Update ArticleFeedbackv5 to master' [17:22:15] Logged the message, Master [17:23:34] hashar: So, mostly stable again? I'm about to go to lunch. [17:23:43] andrewbogott: yeah sounds fine [17:23:44] * andrewbogott is back in CDT [17:23:51] andrewbogott: doing some checks but yeah that seems good [17:23:58] andrewbogott: thank you a lot for your assistance! [17:24:51] np [17:25:34] RECOVERY - Puppet freshness on amslvs2 is OK: puppet ran at Fri Mar 15 17:25:30 UTC 2013 [17:34:45] New review: Krinkle; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513 [17:36:23] !log jenkins: updated jenkins configuration to have its URL default to wikimedia.org domain instead of the legacy mediawiki.org [17:36:29] Logged the message, Master [17:40:43] New patchset: Hashar; "contint: integration.wikimedia.org was missing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54003 [17:44:51] New patchset: Hashar; "contint: update codecoverage directory" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54004 [17:46:04] New review: Hashar; "Did the change for you Timo : https://gerrit.wikimedia.org/r/#/c/54003/" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53513 [17:48:30] Krinkle|detached: the two changes above should fix the remaining issues :-] [17:49:07] Krinkle|detached: I am out for now, might come back later tonight [17:49:12] Krinkle|detached: congratulations :-] [17:52:12] !log mlitn Started syncing Wikimedia installation... 
: Update ArticleFeedbackv5 to master [17:52:19] Logged the message, Master [17:53:17] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54003 [17:53:41] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54004 [17:56:35] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 182 seconds [17:56:35] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 184 seconds [17:59:24] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [18:00:24] PROBLEM - Puppet freshness on es10 is CRITICAL: Puppet has not run in the last 10 hours [18:03:25] PROBLEM - Puppet freshness on mw1045 is CRITICAL: Puppet has not run in the last 10 hours [18:03:25] PROBLEM - Puppet freshness on mw1117 is CRITICAL: Puppet has not run in the last 10 hours [18:03:25] PROBLEM - Puppet freshness on mw1129 is CRITICAL: Puppet has not run in the last 10 hours [18:07:24] !log mlitn Finished syncing Wikimedia installation... : Update ArticleFeedbackv5 to master [18:07:24] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: Puppet has not run in the last 10 hours [18:07:24] PROBLEM - Puppet freshness on sq83 is CRITICAL: Puppet has not run in the last 10 hours [18:07:30] Logged the message, Master [18:12:21] Reedy, Ryan_Lane: you cleaned up all two-level domains [18:12:27] but did you see en.m.mobile? :) [18:15:17] !log mlitn synchronized php-1.21wmf11/extensions/ArticleFeedbackv5/ [18:15:23] Logged the message, Master [18:19:06] !log ran puppetd --enable on all mw hosts [18:19:08] !log mlitn synchronized php-1.21wmf11/extensions/ArticleFeedbackv5/ [18:19:13] Logged the message, notpeter [18:19:18] Logged the message, Master [18:20:29] mlitn: people are getting database errors on enwiki from aft tables not existing [18:20:49] paravoid: that just redirects, right? 
[18:20:50] indeed, just rolled back that change [18:20:54] paravoid: we can't really fix those [18:21:01] I'll now look into what was wrong [18:21:08] but db error should no longer happen [18:21:26] Ryan_Lane: we will be soon [18:21:37] paravoid: will be soon? [18:21:47] using them? [18:21:53] no, able to fix [18:21:59] how? [18:22:24] the only reason we have en.m.mobile & en.zero.mobile is the whole langlist machinery [18:22:28] (I think) [18:22:32] ah. ok [18:22:36] yes. it is [18:22:45] yeah, this will be gone :) [18:22:58] but, realistically they shouldn't be referenced anywhere anyway :) [18:23:04] I know [18:23:05] so, it shouldn't be a problem [18:23:09] I just realized today [18:23:13] * Ryan_Lane nods [18:23:13] when I was diffing zonefiles [18:23:24] we have a lot of DNS entries that go nowhere [18:23:31] and I found it really funny [18:25:31] so, role::cache::bits requires class geoip which includes geoip::packages which should install geoip-bin [18:25:37] but i don't see that package on arsenic [18:25:45] anyone with more puppet foo want to tell me why ? [18:26:02] oh well it's theoretically installed [18:26:09] nm [18:26:17] so why are the cli commands not working ? [18:26:34] ahha [18:26:37] they changed the names [18:26:39] nm [18:26:47] why did the package change the names? the world may never know [18:30:54] LeslieCarr: As a rule, that'd be because of a name conflict with some other default package [18:31:12] geoip-lookup became geoiplookup [18:31:18] LeslieCarr: Like the dolphin emulator has executable 'dolphin-emu' in Ubuntu because of the file manager. [18:31:19] i say that's just fucking with us [18:31:34] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 27 seconds [18:31:42] LeslieCarr: Maybe it is. It's a conspiracy. 
:-) [18:32:25] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [18:33:51] !log mlitn synchronized php-1.21wmf11/extensions/ArticleFeedbackv5/ [18:33:57] Logged the message, Master [18:38:35] RECOVERY - Puppet freshness on mw26 is OK: puppet ran at Fri Mar 15 18:38:24 UTC 2013 [18:44:11] !log fixing broken dpkg/APT on maerlant - dpkg --configure -a , re-run puppet and let it install package upgrades [18:44:18] Logged the message, Master [18:44:44] RECOVERY - Puppet freshness on maerlant is OK: puppet ran at Fri Mar 15 18:44:36 UTC 2013 [18:45:05] RECOVERY - Puppet freshness on mw1045 is OK: puppet ran at Fri Mar 15 18:44:54 UTC 2013 [18:46:34] RECOVERY - Puppet freshness on mw1118 is OK: puppet ran at Fri Mar 15 18:46:25 UTC 2013 [18:46:46] !log installing package upgrades on formey [18:46:52] Logged the message, Master [18:50:24] !log installing more package upgrades on maerlant (wikimedia-lvs-realserver, libc6, nginx...) [18:50:29] Logged the message, Master [18:54:25] RECOVERY - Puppet freshness on mw1129 is OK: puppet ran at Fri Mar 15 18:54:18 UTC 2013 [18:55:54] RECOVERY - Puppet freshness on mw1117 is OK: puppet ran at Fri Mar 15 18:55:52 UTC 2013 [18:59:34] so, I just made a gerrit project with an empty first commit, pulled it, added shit, wanted to push that stuff back in, and the project is now gone... [18:59:37] I'm.... confused [18:59:44] ^demon: any clues? [19:02:25] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 181 seconds [19:02:34] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 185 seconds [19:04:23] paravoid: We didn't yet... [19:04:31] We only fixed pa.us.wikimedia.org [19:04:37] the arbcom ones still need doing [19:04:41] Another 5 or 6 IIRC [19:06:25] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 193 seconds [19:07:24] <^demon> notpeter: That sounds wrong. 
[19:07:37] ^demon: ok I think https://gerrit.wikimedia.org/r/#/c/53882/ is ready for cr [19:09:01] notpeter: We prefer you not to commit shit to gerrit [19:09:28] ^demon: indeed! [19:09:31] <^demon> notpeter: Define, "gone" [19:09:45] Reedy: me too... but aparently this code needs to be deployed urgently. [19:10:07] ^demon: I'd like to deploy that when it's merged [19:10:15] ^demon: there is no project operations/debs/flask-login [19:10:21] when I list projects [19:13:04] <^demon> You made a repo called "flask-login" [19:13:11] <^demon> not operations/debs/flask-login [19:13:23] !log maerlant - dpkg: error processing ganglia-monitor | backing up /var/www/ to tridge /data/other then deleting contents | removing sites-enabled/ipv6and4 | nginx can't bind to [2620:0:862:1::80:2]:443 [19:13:31] Logged the message, Master [19:13:37] ^demon: damnit [19:13:44] is it possible to rename? [19:14:11] <^demon> Since you haven't pushed anything yet, yes :) [19:14:21] <^demon> We're working on it more generally for all repos :) [19:14:24] I was assuming that by making operations/debs its parent, it would include that in the name [19:14:35] I assume that things are like filesystems [19:14:36] silly me [19:14:41] ^demon: yay! [19:14:42] awesome [19:15:43] <^demon> You're set :) [19:16:02] <^demon> The repos are on the filesystem, but if you move a repo with history it confuses the stuff that's in the database. [19:16:36] ^demon: SVN still active, pywikipedia .. see details in mail [19:17:02] <^demon> No, I meant the wikimedia repo. [19:17:07] <^demon> pywikipedia has its own repo. [19:17:21] <^demon> I mean this: https://svn.wikimedia.org/viewvc/wikimedia/ [19:17:21] that was also still used [19:17:35] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 185 seconds [19:17:35] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 185 seconds [19:17:36] ^demon: sweet! thank you very much! 
[19:17:44] ^demon: r2596 | prolineserver | 2013-03-13 17:31:27 +0000 (Wed, 13 Mar 2013) [19:18:07] <^demon> Yeah, I'm asking people to fess up and move to gerrit ;-) [19:18:29] i know, i replied to that mail [19:19:01] we also have the mysql repo.. last touched in 2008 :) [19:21:24] hashar: I am (finally) back from lunch… everything working ok? [19:22:24] RECOVERY - Puppet freshness on es10 is OK: puppet ran at Fri Mar 15 19:22:17 UTC 2013 [19:22:33] !log taking down raskin for decommissioning [19:22:39] Logged the message, Master [19:37:09] !log maerlant - re-adding 2620:0:862:1::80:2 to eth0, starting nginx [19:37:15] Logged the message, Master [19:43:36] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 183 seconds [19:44:25] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 199 seconds [19:58:53] PROBLEM - Host ms-be1001 is DOWN: CRITICAL - Plugin timed out after 15 seconds [19:58:53] PROBLEM - Host ms-be1007 is DOWN: CRITICAL - Plugin timed out after 15 seconds [19:58:53] PROBLEM - Host cp1007 is DOWN: CRITICAL - Plugin timed out after 15 seconds [19:58:53] PROBLEM - Host ms-be1011 is DOWN: CRITICAL - Plugin timed out after 15 seconds [19:58:53] PROBLEM - Host cp1011 is DOWN: CRITICAL - Plugin timed out after 15 seconds [19:58:53] PROBLEM - Host ms-be1009 is DOWN: CRITICAL - Plugin timed out after 15 seconds [19:59:06] Faidon, have any spare cycles? 
[19:59:20] um… paravoid ^^ [19:59:44] RECOVERY - Host cp1011 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [19:59:44] RECOVERY - Host cp1007 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [19:59:45] RECOVERY - Host ms-be1001 is UP: PING OK - Packet loss = 0%, RTA = 1.17 ms [19:59:45] RECOVERY - Host ms-be1007 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [19:59:45] RECOVERY - Host ms-be1009 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms [19:59:45] RECOVERY - Host ms-be1011 is UP: PING OK - Packet loss = 0%, RTA = 3.69 ms [20:00:26] PROBLEM - Recursive DNS on 208.80.154.50 is CRITICAL: CRITICAL - Plugin timed out while executing system call [20:01:28] RECOVERY - Recursive DNS on 208.80.154.50 is OK: DNS OK: 0.023 seconds response time. www.wikipedia.org returns 208.80.154.225 [20:07:07] PROBLEM - Puppet freshness on cp1034 is CRITICAL: Puppet has not run in the last 10 hours [20:07:46] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 185 seconds [20:08:16] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 199 seconds [20:14:21] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51796 [20:19:56] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 182 seconds [20:20:17] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 190 seconds [20:21:31] binasher: notpeter: is there anything special about db36.pmtpa.wmnet (or at least different from db32)? 
[20:21:49] attempting a mysqldump gives me mysqldump: Couldn't execute 'START TRANSACTION WITH CONSISTENT INNODB SNAPSHOT': You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'INNODB SNAPSHOT' at line 1 (1064) [20:21:53] so, i deleted a site config from nginx, restarted it, and the site is still up..wtf [20:22:04] mysqldump --single-transaction --add-locks --comments --create-options --disable-keys -e --quick --skip-lock-tables -h db36.pmtpa.wmnet enwiki user [20:23:08] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [20:28:33] hi [20:29:14] * jeremyb_ spies a mutante mailing... but is he here? [20:29:41] jeremyb_: so did the ssl guy figure stuff out ? [20:29:59] !log DNS update - removing ipv4/ipv6/ipv6and4/results.labs.wm [20:30:00] LeslieCarr: i didn't reply yet :( [20:30:06] Logged the message, Master [20:30:24] wooo, spring cleaning [20:30:52] hello [20:30:55] so, who broke qunit on jenkins? [20:31:00] !log restarting pdns on ns0 [20:31:03] maybe hashar or Krinkle|detached ? [20:31:07] Logged the message, Master [20:31:11] it's timing out immediately, apparently - see last comments on https://gerrit.wikimedia.org/r/#/c/39792/ [20:31:27] bug 658 ... [20:31:40] forget the bug, look at the tests failing :P [20:31:50] (although you could review it, sure) [20:32:03] i love low-numbered bugs, i fixed something like 258 some time ago, too. [20:32:20] how about bug 1? [20:32:29] jeremyb_: whats up [20:32:50] 19:28:24 Warning: PhantomJS timed out, possibly due to a missing QUnit start() call. Use --force to continue. [20:32:56] jeremyb_: worked on that one, too [20:33:19] mutante: left you a note about feedparser. and do you want to add me to the planet labs project? i can try to reproduce it there? 
[20:33:29] !log puppet runs on neon broken due to mysql puppet issue - mysql-client-5.5 can not create alias mysql-client: object already exists at /var/lib/git/operations/puppet/manifests/mysql.pp:513 [20:33:34] Logged the message, Master [20:33:43] jeremyb_: change 3ff7461 ;) [20:34:10] jeremyb_: sure, adding you to project [20:34:18] MatmaRex: your change is broken : https://gerrit.wikimedia.org/r/#/c/39792/9/tests/qunit/suites/resources/mediawiki/mediawiki.util.test.js,unified [20:34:29] MatmaRex: that is why jshint break [20:34:41] hashar: oh, poop. [20:34:46] MatmaRex: there are some <<<<< and >>>>> in it [20:34:53] yeah, merge conflict markers [20:34:54] dammit [20:35:02] i assumed it was just in release-notes, as always [20:35:09] MatmaRex: though the jshint report is not very helpful :-] ( https://integration.wikimedia.org/ci/job/mediawiki-core-jslint/3818/checkstyleResult/ ) [20:35:39] MatmaRex: ahh here it is : https://integration.wikimedia.org/ci/job/mediawiki-core-jslint/3818/checkstyleResult/file.1648977917/ :::: Expected an identifier and instead saw '<<'. [20:35:57] hashar: hidden among 20 false positives [20:35:57] MatmaRex: we should find out a way to verify whether the .js is valid javascript before running jshint :-] [20:36:01] Ryan_Lane: no more sub.sub.labs in DNS, besides the old projects we are redirecting, killed ipv4/ipv6/results... [20:36:13] mutante: hooray! [20:36:22] hashar: the checkstyle reports are utterly useless [20:36:24] with all the phpcs stuff we don't care about cluttering them [20:37:38] jeremyb_: Successfully added jeremyb to planet. Successfully added Jeremyb to projectadmin. [20:38:18] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 21 seconds [20:38:22] mutante: do you use mars or mostly venus? [20:38:36] (is there an actual project called mars?) 
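[Editor's note] The jshint breakage above came from leftover `<<<<<<<`/`>>>>>>>` merge-conflict markers, and hashar wished for a way to sanity-check a .js file before running jshint. A minimal sketch of such a pre-lint guard; the file name, path, and contents are invented for illustration, not taken from the incident:

```shell
# Hypothetical file with leftover conflict markers (invented example).
cat > /tmp/example.js <<'EOF'
var a = 1;
<<<<<<< HEAD
var b = 2;
=======
var b = 3;
>>>>>>> feature
EOF

check_conflict_markers() {
    # Git conflict markers are runs of exactly 7 characters in column 0.
    grep -nE '^(<{7}( |$)|={7}$|>{7} )' "$1"
}

# Cheap guard: skip the linter entirely if markers are present.
if check_conflict_markers /tmp/example.js > /tmp/markers.txt; then
    echo "conflict markers found, skipping jshint"
fi
```

This only catches the one failure mode discussed in the log (conflict markers); a real "is this valid JavaScript" check would need an actual parser pass.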
[20:38:46] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [20:40:15] jeremyb_: mars was just my own invention, there is no actual project with that name. venus is the one with the public IP.. planet.wmflabs.org .. mars was created later just to confirm everything was puppetized on a fresh instance [20:40:32] ok. so they should be carbon copies [20:40:41] and i can nuke mars if i like? [20:40:52] should be, i thought one of them was using puppetmaster::self but i must have been wrong [20:40:59] you can [20:44:17] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 18 seconds [20:44:56] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [20:46:55] New patchset: Hashar; "restore integration.mediawiki.org config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54068 [20:47:05] !log integration.mediawiki.org is not redirecting to the new integration.wikimedia.org (apache conf is disabled). Patch in {{gerrit|54068}} [20:47:11] Logged the message, Master [20:47:28] mutante: guten tag :-] If you get 2 secs to merge https://gerrit.wikimedia.org/r/54068 and merge on sock puppet that would be nice :-] [20:48:00] mutante: we broke the contint website earlier today. The change will enable the conf that redirect the old URL to the new one (we migrated under wikimedia.org \O/ ) [20:48:05] pgehres: why are you trying to run mysqldump on db36? [20:48:23] binasher: grabbing updates for db29 [20:48:39] what kind of updates? [20:48:53] I should have asked in here. [20:48:53] new users, updated info for the rest of the rows [20:49:10] pgehres: you won't find anything like that on db36 [20:49:19] Yo, mutante, can you have a look at https://gerrit.wikimedia.org/r/#/c/47026/ sometime in the next couple of days? Would be nice to get that project moving again. 
(The review bit is the least of it) [20:50:08] binasher: aha, now that comment in db-pmtpa.php makes sense [20:50:23] sbernardin: Ok, So we are clear to work on professor [20:50:30] I am going to shut it down, once it powers off it is all yours [20:50:39] once you have the fan swapped, power it back up and we should be ok to go from there [20:50:52] replacing the fan will clear the million entries on log so we can see which dimm slot is bad more easily =] [20:51:02] !log professor shutting down for fan swap, short downtime (hopefully) [20:51:08] Logged the message, RobH [20:51:32] RobH: OK [20:51:51] it should be turning off now, once its back up ping me will ya please (I wanna ensure services recover and clear log) [20:51:58] thx =] [20:52:04] andrewbogott: a follow up for the contint website :-] https://gerrit.wikimedia.org/r/54068 we disabled apache conf for the integration.mw.o entry :-/ that breaks back compat. [20:52:29] New review: Dzahn; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/54068 [20:52:44] hashar: there's a problem with that [20:52:53] ah [20:52:53] see inline comment [20:53:00] mutante: sorry, thought you were out for lunch :] [20:53:35] andrewbogott: bookmarking :p [20:53:36] binasher: my algorithm for identifying the pmtpa snapshot host fails on s1, so, my bad, thanks for identifying my stupidity [20:53:46] mutante, thanks. [20:54:17] PROBLEM - Host professor is DOWN: PING CRITICAL - Packet loss = 100% [20:55:04] New review: Hashar; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54068 [20:55:07] poor professor [20:55:22] mutante: files is unneeded in modules :/ (replied on https://gerrit.wikimedia.org/r/#/c/54068/1/modules/contint/manifests/website.pp,unified ) [20:55:27] mutante: yet another inconsistency.
[20:57:05] oh, i see [20:57:44] +2 , needs verified [20:58:01] i hate that inconstancy, I always have to copy paste from another module :( [20:58:23] Zuul is quite busy :/ https://integration.mediawiki.org/zuul/status [20:58:31] it is waiting for something before reporting [20:58:33] untrusted connection [20:58:35] heh [20:58:48] is that the reason for switching? yea, hm [20:58:49] argh [20:58:51] RobH: professor booting up [20:59:15] mutante: looks like I need yet another change to update the cert on integration.mediawiki.org =] [20:59:36] sbernardin: cool, so once we have it back online [20:59:41] i should be able to clear its event log [20:59:45] and now we will see memory erros [20:59:52] before they were getting lost in millions of fan alerts. [20:59:56] Yup [21:00:48] New patchset: Dzahn; "add krinkle to sudo ALL users on gallium (RT-4735)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54073 [21:00:58] hashar: no need to ensure the old one absent, but we should delete it (cert and key) afterwards [21:01:08] and see change above [21:01:52] ooh, ouch, wait for patch set 2 :p [21:02:46] RECOVERY - Host professor is UP: PING OK - Packet loss = 0%, RTA = 26.52 ms [21:05:17] PROBLEM - carbon-cache.py on professor is CRITICAL: PROCS CRITICAL: 0 processes with command name carbon-cache.py [21:05:17] PROBLEM - profiling collector on professor is CRITICAL: PROCS CRITICAL: 0 processes with command name collector [21:05:37] PROBLEM - profiler-to-carbon on professor is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/udpprofile/sbin/profiler-to-carbon [21:05:55] New review: Hashar; "recheck" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54068 [21:06:05] New patchset: Dzahn; "add krinkle to sudo ALL users on gallium (RT-4735)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54073 [21:07:00] so zuul is locked and I have no idea why :( [21:09:01] !log professor services seem back up 
[21:09:07] Logged the message, RobH [21:09:07] binasher: ^ wsorry been up a few minutes [21:09:11] but seems ok [21:09:38] its collectors are showing off in icinga though [21:10:23] http://integration.mediawiki.org/ is down? [21:10:25] or... something? [21:10:32] !g I84579ad01a10fd9377795862bfb262b635925ca6 [21:10:32] https://gerrit.wikimedia.org/r/#q,I84579ad01a10fd9377795862bfb262b635925ca6,n,z [21:10:43] YuviPanda: use integration.wikimedia.org , will be redirected in a few [21:10:44] New patchset: Dzahn; "bugzilla_report.php: Add query and formatting for list of urgent issues" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387 [21:10:52] ah, okay [21:11:07] RobH: thanks, all up now [21:11:14] mutante: thanks :) [21:11:16] RECOVERY - profiling collector on professor is OK: PROCS OK: 2 processes with command name collector [21:11:36] RECOVERY - profiler-to-carbon on professor is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/udpprofile/sbin/profiler-to-carbon [21:11:47] New review: Dzahn; "manual verify - zuul issue" [operations/puppet] (production); V: 2 - https://gerrit.wikimedia.org/r/54068 [21:11:57] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54068 [21:12:28] mutante: sorry trying to debug that issue :/ [21:12:45] hashar: np, i just thought let's fix the redirect while youre on that [21:12:50] YuviPanda: ^ [21:13:24] :) ok. [21:13:37] mutante: the wikimedia commons app nightly apk lives on that, so was just asking :) [21:14:12] yes, it moved, change your bookmark https://integration.wikimedia.org/nightly/mobile/android-commons/ :) [21:14:18] from mediawiki to wikimedia [21:14:36] !log restarting Zuul, it is locked somehow. 
Definitely need to update it :( [21:14:42] Logged the message, Master [21:15:04] New patchset: Asher; "udpprofile should be started at boot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54079 [21:15:13] YuviPanda: yeah we broke integration.mediawiki.org earlier today. Will be back soon (c) [21:15:23] :) [21:15:24] ok [21:15:39] mutante: will do :) It's also embedded as a link in the app, so will change that too [21:15:50] mutante: but old urls will (at some point) redirect to new ones, right? [21:15:53] for a while, at least [21:16:01] hashar: YuviPanda, and there you go, redirects [21:16:07] sweeet :) [21:16:16] mutante: thank you! [21:16:21] thanks :) [21:16:24] hashar: just one warning. [warn] NameVirtualHost *:443 has no VirtualHosts [21:16:27] np [21:17:21] hashar: do you mind if i also install package upgrades while on it? apt itself, ganglia-monitor, php-pear, php5-dev and a few libs [21:17:37] New review: Hashar; "Deployed by Daniel. That fixed https://integration.mediawiki.org/nightly/mobile/android-commons/ whi..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54068 [21:17:51] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53873 [21:17:51] mutante: sure :-] [21:18:03] mutante: I don't do package upgrade on the box since I am lacking console access [21:18:05] !log installing package upgrades on gallium [21:18:11] Logged the message, Master [21:18:20] yea, ok [21:18:34] I am going to kill zuul I guess [21:18:39] it's installing new APT gpg keys for ubuntu repo and stuff [21:18:55] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54079 [21:19:10] done [21:20:35] !log hard restarted Zuul. 
It was locked down again :/ [21:20:43] Logged the message, Master [21:21:09] gotta remember that term "hard restart" sounds better than kill :) [21:25:11] !log added new build of mariadb 5.5.30 to precise-wikimedia on brewster [21:25:18] Logged the message, Master [21:27:51] !log downgrading Zuul to safe version ff79197 (aka stop doing `git remote update` on ref-update events) [21:27:57] Logged the message, Master [21:29:17] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 185 seconds [21:29:46] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 198 seconds [21:29:46] !log restarted Zuul with the safe version ff79197 [21:29:52] Logged the message, Master [21:32:45] New patchset: Pyoungmeister; "initial import of flask-login" [operations/debs/flask-login] (master) - https://gerrit.wikimedia.org/r/54083 [21:34:18] PROBLEM - mysqld processes on db59 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:35:16] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 3 seconds [21:35:47] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [21:36:16] RECOVERY - mysqld processes on db59 is OK: PROCS OK: 1 process with command name mysqld [21:36:32] !log db59 -> mariadb 5.5.30 [21:36:39] Logged the message, Master [21:38:19] PROBLEM - MySQL Slave Delay on db59 is CRITICAL: CRIT replication delay 368 seconds [21:43:36] andrewbogott: got a sec? [21:43:41] mail or irc clients? [21:43:43] * AaronSchulz lols [21:44:14] notpeter: what's up? [21:44:36] I dumped some python into a repo [21:44:37] https://gerrit.wikimedia.org/r/#/c/54083/ [21:44:41] I want to build a package [21:44:55] but I'd really like a dev to look over it to make sure that it's a reasonable thing to deploy on our cluster [21:44:59] it's small [21:45:15] would you be willing to look over it real quick and give it your blessing? [21:45:21] yep, sure.
also, I'm not sure what should be pulled from there and what shouldn't [21:47:17] notpeter: https://github.com/hychen/dh-make-python [21:47:37] notpeter: So, this means that we are effectively forking that github project, right? [21:48:36] andrewbogott: I just wanted to package what was there [21:48:38] but yes [21:48:54] Yeah, makes sense. I guess it's only a fork if we change anything :) [21:49:10] binasher: thanks! [21:50:11] andrewbogott: asher may have made this all un-needed.... [21:50:16] PROBLEM - MySQL Slave Delay on db59 is CRITICAL: CRIT replication delay 196 seconds [21:51:28] !log aaron synchronized php-1.21wmf11/includes 'deployed 08295adcdc4a4130954f52fb1eb5377638cd97bc' [21:51:34] Logged the message, Master [21:52:31] notpeter: Great! So I will do nothing for the moment...? [21:52:52] andrewbogott: well, I'd still like a sanity check on the code :) [21:54:16] RECOVERY - MySQL Slave Delay on db59 is OK: OK replication delay 30 seconds [21:55:12] notpeter: for security reviews you might want to ask Chris Steipp :-] [21:55:27] hashar: Just saw your log. What are the signs of Zuul being locked down? [21:56:12] siebrand: https://integration.wikimedia.org/zuul/status full of changes having completed jobs and jenkins-bot taking a few minutes to report back to Gerrit. [21:56:34] siebrand: hopefully going to be solved next week if I manage to get Zuul upgraded :-] [21:56:35] hashar: Ah, okay. Thanks. [21:58:36] hashar: true! [21:58:49] csteipp: got a sec?
[21:58:49] everyone's favorite question ;) [21:58:52] notpeter: sure [21:59:38] csteipp: reqest to put this on cluster: https://gerrit.wikimedia.org/r/#/c/54083/ [21:59:51] is this: https://github.com/maxcountryman/flask-login [21:59:59] would love a quick pair of security eyes on it [22:00:01] it's small [22:03:06] siebrand: finally opened a bug about Zuul slowness https://bugzilla.wikimedia.org/show_bug.cgi?id=46176 [22:03:20] !log Zuul : opened a bug about this week slowness, breakage: https://bugzilla.wikimedia.org/show_bug.cgi?id=46176 [22:03:26] Logged the message, Master [22:04:21] and I am out of there :-]  Have a nice weekend [22:12:51] binasher: damn, that totes barfs on what I'm trying to build :( [22:12:55] but thank you for the link! [22:17:07] argh [22:17:12] why did hashar quit just now [22:17:20] i have a real jenkins weirdness report this time [22:17:45] https://gerrit.wikimedia.org/r/#/c/53963/ , PS2: it reports issues on nonexistent lines. those lines are added two commits later (see dependencies). [22:18:59] binasher: well, there are lots of eqiad errors there too [22:19:26] yeah, some are legit [22:20:15] mediawiki does do retries when getting a slave connection, right? [22:20:27] it's like at certain time, there is a server that just drops a bunch of packets for a short while [22:20:31] *at a [22:20:43] binasher: it can try other slaves yes [22:22:03] looks like the per-server attempts is set to 2 [22:22:38] in application logic [22:22:47] binasher: so there are indeed per-server retries, though not many [22:22:48] !log deployed change 54093 to wikitech [22:22:53] howdy good peoples, anyone know if Pete Youngmeister is around? 
[22:22:54] Logged the message, Master [22:23:06] AaronSchulz: a lot of errors i see are clustered specifically to the snapshot slaves for es2 and es3 [22:23:10] !log changed wikitech configuration to require email address on account creation [22:23:16] Logged the message, Master [22:23:28] !log hid domain dropdown for account creation and login on wikitech [22:23:35] Logged the message, Master [22:23:49] AaronSchulz: and there's an xfs kernel bug in the mix :) [22:24:19] ori-l: hey! I have a question for you :) [22:24:25] notpeter: This all looks reasonable. Any real security concerns would be in the frameworks that it uses (werkzeug, flask) which I don't know much about. [22:25:07] andrewbogott: cool! thakn you [22:25:10] ah, notpeter, I was looking for you :) ^^ [22:25:37] milimetric: hi, what's up? [22:25:40] when do you have time to help me puppetize the cron job [22:25:48] andrew otto's off today [22:25:50] ah, ok [22:25:58] now, monday, whenever [22:26:10] now's good [22:26:11] it's just a script and a cron that runs the script, yes? [22:26:22] gotta do something with all that adrenaline Leslie induced :) [22:26:24] heh [22:26:26] yep [22:26:32] should we take this to PM? [22:26:37] oh, sure [22:26:39] not sure how your channel works but I don't wanna intrude [22:26:47] either is fine [22:29:04] notpeter: hey! [22:29:12] hey! [22:29:28] exclamation marks! [22:29:32] so you said that you've been packaging up python modules [22:29:35] I currently need to do this [22:29:48] would you be willing to help me out with this? [22:30:04] sure! i'm terrible at it! [22:30:12] heh [22:30:13] no worries [22:30:53] are there tools yo'uve been using? [22:30:59] packaging for debian is a notch above writing map-reduce implementations in haskell in terms of difficulty. faidon disputes this bitterly but only because he is a debian foundation shill [22:31:23] I generally tend to agree :) [22:31:40] not about faidon, of course. 
I think of him more as a very skilled debian zealot :) [22:31:54] FYI: I'm about to debize oursql and requests 1.1 [22:32:25] So you know, let's not do this painful thing twice if we can avoid it and all. :-) [22:32:36] notpeter: one requirement I had was available upstream in the next ubuntu release, so I used git-import-dsc [22:33:24] which might work for oursql or requests [22:33:28] !log asher synchronized wmf-config/db-eqiad.php 'temporarily pulling es1007, es1010' [22:33:34] Logged the message, Master [22:33:44] yes re: requests: http://packages.ubuntu.com/search?keywords=python-requests [22:35:23] !log rebooted es1007,1010 for kernel upgrades [22:35:28] binasher: grep -P 'Error connecting to \d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}' -o dberror.log | grep -P '\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}' -o | uniq -c | sort -g -r [22:35:30] Logged the message, Master [22:35:51] * AaronSchulz looks at the top boxes [22:36:56] PROBLEM - Host es1007 is DOWN: PING CRITICAL - Packet loss = 100% [22:37:03] Coren / notpeter: faidon's suggestion when there was an upstream package was: "Just import the source into git (man git-import-dsc from git-buildpackage), commit changes (just dch -v 0.9-1 && debcommit hopefully) and see if it builds." [22:37:16] which worked for me [22:37:17] PROBLEM - Host es1010 is DOWN: PING CRITICAL - Packet loss = 100% [22:37:34] AaronSchulz: the top, db1020 is also an lvm snapshot box [22:37:41] ori-l: That seems... remarkably optimistic. :-) [22:37:45] heh [22:38:02] ori-l: ok, cool. I think the one that I'm up against doesn't have any packages anywhere... [22:38:09] so I may be sol [22:38:14] i hope to do away with lvm snapshots [22:38:37] RECOVERY - Host es1010 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [22:38:40] well, second to top [22:38:48] RECOVERY - Host es1007 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [22:39:06] binasher: what problems is that causing? 
[22:39:14] ori-l: milimetric notpeter does 'puppetize everything' mean virtualenv is banned from the cluster? [22:39:37] YuviPanda: depends on the use-case, I guess [22:39:51] well, running stat scripts, in this particular instance. [22:40:00] yeah, i think nothing is banned a priori [22:40:14] AaronSchulz: the top top is a box in tampa that i took down, so bogus [22:40:27] but things have to be requested in the shape of git review to the puppet repo [22:40:33] hmm, virtualenv doesn't let you change anything outside you home dir, so is okay? [22:40:35] (notpeter, digging up my notes about packaging something from scratch..) [22:40:50] AaronSchulz: re problem, depends on the server [22:40:53] yeah, with a virtualenv you don't 'need' to go through the puppet repo, so I suppose that is not considered a good thing? [22:41:18] ori-l: oh, can fumble my way through proper debianization [22:41:26] I was just hoping you had a silver bullet :) [22:41:26] PROBLEM - mysqld processes on es1010 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [22:41:35] es1007 and es1010 were hitting http://oss.sgi.com/archives/xfs/2012-09/msg00315.html [22:42:03] !seen ^demon [22:42:27] RECOVERY - mysqld processes on es1010 is OK: PROCS OK: 1 process with command name mysqld [22:42:59] * AaronSchulz looks at grep -P 'Error connecting to \d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}' dberror.log | grep -P '\w{3} \d{1,2} \d\d:\d\d' -o | uniq -c [22:43:54] New patchset: Asher; "lower weight of es snapshot hosts" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54095 [22:44:15] notpeter: nope, fumbled through it. I could perhaps spare you some anguish by pointing out two pitfalls: 1) the target distribution should be "-wikimedia" (e.g. 'precise-wikimedia', rather than 'precise'). 2) gerrit hates you and will merge your debian branch onto master, mistaking it for a feature branch, unless you are careful. 
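[Editor's note] The `grep ... | uniq -c | sort` pipelines AaronSchulz is running above have a subtle counting bug: `uniq -c` only merges *adjacent* duplicate lines, so feeding it unsorted IPs undercounts, and the unescaped dots in the IP pattern can match any character. A reworked version under the same assumptions; the sample dberror.log lines are invented, not real log content:

```shell
# Invented sample of dberror.log-style lines.
cat > /tmp/dberror.log <<'EOF'
Mar 15 22:01 Error connecting to 10.0.6.21: timeout
Mar 15 22:02 Error connecting to 10.0.6.48: timeout
Mar 15 22:03 Error connecting to 10.0.6.21: timeout
EOF

# Escape the dots, strip the fixed prefix, and sort BEFORE uniq -c so
# every occurrence of an IP is counted, not just adjacent repeats.
grep -oE 'Error connecting to [0-9]{1,3}(\.[0-9]{1,3}){3}' /tmp/dberror.log \
    | sed 's/^Error connecting to //' \
    | sort | uniq -c | sort -rn > /tmp/counts.txt
```

With GNU grep's `-P` (as in the original commands), `\K` could drop the prefix without the `sed` step; `-E` plus `sed` is used here only for portability.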
[22:44:49] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54095 [22:45:07] or grep -P 'Error connecting to \d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}' dberror.log | grep -P '\w{3} \d{1,2} \d\d' -o | uniq -c [22:45:22] ^ Coren the caveats above might be useful for you as well [22:45:25] definitely spams some hours and is quite others [22:45:36] !log asher synchronized wmf-config/db-eqiad.php 'returning es1007,10 at lower weights' [22:45:42] Logged the message, Master [22:47:02] AaronSchulz: the last facebook build occasionally hangs or crashes under heavy load and has a slow memory leak leading to oom kills and auto restarts from mysqld_safe [22:47:38] AaronSchulz: there are a variety of issues but I think they're generally hidden from users [22:48:03] well every error in the log is an exception for some user [22:48:18] I suppose if it was a bot it would be more ok, heh [22:48:25] ori-l: cool! thank you :) [22:48:35] AaronSchulz: are you sure about that? [22:48:55] ori-l: Noted. :-) [22:48:58] that's how the code works, it logs the exception and then throws it [22:55:30] AaronSchulz: what about LoadBalancer::reallyOpenConnection ? 
New review: CSteipp; "(3 comments)" [operations/debs/flask-login] (master) - https://gerrit.wikimedia.org/r/54083 [22:57:28] yeah, it uses try/catch and getReaderIndex loops through the others [22:57:50] so for slaves, only if they all failed would the user get the error [22:58:05] so that means there can be log entries with no visible errors [22:58:09] New review: Dzahn; "yep, security_map.group_id != 15 looks good, confirmed the group id is 15" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/53387 [22:58:23] maybe the log should be split up then [22:58:52] it might be nice to know about those but they are kind of spam if another slave was able to be reached [22:59:01] AaronSchulz: and if they've all been walked thru and failed [22:59:03] $this->mLastError = 'No working slave server: ' . $this->mLastError; [22:59:21] should that message make it to the log? [22:59:28] as long as the user notices [22:59:35] which they would one way or another [23:00:07] New patchset: Dzahn; "delete ipv6and4.erb nginx site template, ipv6and4.labs has been removed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54098 [23:00:18] "No working slave server" doesn't appear in dberror.log or exception.log [23:00:34] that would be 'return $this->reportConnectionError( $this->mErrorConnection );' in getConnection [23:01:02] so it must not actually be happening [23:02:02] all the user visible ones are in exception.log [23:02:22] which is less spammy in that regard [23:02:31] New patchset: Dzahn; "delete ipv6and4.erb nginx site template and class from protoproxy, ipv6and4.labs has been removed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54098 [23:03:24] though that has its own flood of errors [23:03:33] some of which really should be fixed [23:03:59] AaronSchulz: http://pastebin.mozilla.org/2220590 [23:05:51] db1020, 1021, 1006 [23:06:13] db1017 is the enwiki master, it has one fail in the log [23:10:48] PROBLEM
- Apache HTTP on mw1092 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:10:48] PROBLEM - Apache HTTP on mw1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:11:47] PROBLEM - Apache HTTP on mw1168 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:11:47] PROBLEM - Apache HTTP on mw1069 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:11:50] PROBLEM - Apache HTTP on mw1094 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:11:50] PROBLEM - Apache HTTP on mw1067 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:11:50] PROBLEM - Apache HTTP on mw1174 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:11:50] PROBLEM - Apache HTTP on mw1084 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:11:50] PROBLEM - Apache HTTP on mw1034 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:11:59] PROBLEM - Apache HTTP on mw1096 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:11:59] PROBLEM - Apache HTTP on mw1170 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:11:59] PROBLEM - LVS HTTPS IPv4 on wikivoyage-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:12:16] PROBLEM - Apache HTTP on mw1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:12:26] PROBLEM - Apache HTTP on mw1039 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:12:27] PROBLEM - Apache HTTP on mw1111 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:12:27] PROBLEM - Apache HTTP on mw1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:12:27] PROBLEM - Apache HTTP on mw1032 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:12:27] PROBLEM - Apache HTTP on mw1104 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:12:27] PROBLEM - Apache HTTP on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:12:47] PROBLEM - Apache HTTP on mw1038 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
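[Editor's note: the failover behavior discussed above at 22:57-23:01 (getReaderIndex walking the slave list, with "No working slave server" only surfacing once every slave has failed) can be sketched roughly as below. This is an illustrative sketch, not MediaWiki's actual LoadBalancer code; the function and variable names are made up.]

```python
import logging

log = logging.getLogger("dberror")

def get_reader_connection(replicas, connect):
    """Try each replica in turn; only surface an error if *all* fail.

    Per-replica failures are logged but hidden from the user as long as
    any other replica can be reached -- the "spam" discussed above.
    """
    last_error = None
    for host in replicas:
        try:
            return connect(host)
        except ConnectionError as e:
            # Logged, but not user-visible: another replica may still work.
            log.warning("replica %s failed: %s", host, e)
            last_error = e
    # Only once the whole list has been walked through and failed does a
    # user-visible error ("No working slave server: ...") get raised.
    raise RuntimeError(f"No working slave server: {last_error}")
```

This matches the observation in the log that the aggregated message belongs in exception.log (user-visible) while individual connection failures land in the spammier dberror.log.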
[23:12:47] PROBLEM - Apache HTTP on mw1178 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:12:47] PROBLEM - Apache HTTP on mw1066 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:12:47] PROBLEM - Apache HTTP on mw1171 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:12:47] PROBLEM - Apache HTTP on mw1163 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:13:56] PROBLEM - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:14] wee [23:14:15] wee! [23:14:40] notpeter: did you break everything? [23:14:45] :) [23:14:51] PROBLEM - Apache HTTP on mw1188 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:52] PROBLEM - Apache HTTP on mw1050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:52] PROBLEM - Apache HTTP on mw1018 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:52] PROBLEM - Apache HTTP on mw1162 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:52] PROBLEM - Apache HTTP on mw1090 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:52] PROBLEM - Apache HTTP on mw1031 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:52] PROBLEM - Apache HTTP on mw1058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:53] PROBLEM - Apache HTTP on mw1183 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:53] PROBLEM - Apache HTTP on mw1060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:53] PROBLEM - Apache HTTP on mw1179 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:54] PROBLEM - Apache HTTP on mw1110 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:55] PROBLEM - Apache HTTP on mw1049 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:55] PROBLEM - Apache HTTP on mw1108 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:55] PROBLEM - Apache HTTP on mw1098 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
[23:14:56] PROBLEM - Apache HTTP on mw1041 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:57] PROBLEM - Apache HTTP on mw1088 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:58] AaronSchulz: I need to fail search! [23:15:02] PROBLEM - Apache HTTP on mw1182 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:15:02] PROBLEM - Apache HTTP on mw1176 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:15:02] PROBLEM - Apache HTTP on mw1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:15:02] PROBLEM - Apache HTTP on mw1091 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:15:56] PROBLEM - Apache HTTP on mw1180 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:16:45] PROBLEM - MySQL Slave Delay on db1030 is CRITICAL: CRIT replication delay 183 seconds [23:18:52] RECOVERY - Apache HTTP on mw1180 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.709 second response time [23:18:52] RECOVERY - Apache HTTP on mw1183 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.846 second response time [23:19:02] RECOVERY - Apache HTTP on mw1179 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.955 second response time [23:19:42] RECOVERY - Apache HTTP on mw1018 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.542 second response time [23:19:42] RECOVERY - Apache HTTP on mw1188 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 2.799 second response time [23:19:42] RECOVERY - Apache HTTP on mw1031 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 3.483 second response time [23:19:51] RECOVERY - Apache HTTP on mw1087 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.049 second response time [23:19:52] RECOVERY - Apache HTTP on mw1058 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.770 second response time [23:19:52] RECOVERY - Apache HTTP on mw1169 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 
bytes in 0.069 second response time [23:19:52] RECOVERY - Apache HTTP on mw1050 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 5.086 second response time [23:19:52] RECOVERY - Apache HTTP on mw1017 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.349 second response time [23:19:52] RECOVERY - Apache HTTP on mw1090 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.989 second response time [23:19:52] RECOVERY - Apache HTTP on mw1039 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.073 second response time [23:19:52] RECOVERY - Apache HTTP on mw1032 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.090 second response time [23:19:53] RECOVERY - Apache HTTP on mw1164 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 2.997 second response time [23:19:54] RECOVERY - Apache HTTP on mw1049 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.576 second response time [23:19:54] RECOVERY - Apache HTTP on mw1065 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.776 second response time [23:19:55] RECOVERY - Apache HTTP on mw1176 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 1.432 second response time [23:19:56] RECOVERY - Apache HTTP on mw1044 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.089 second response time [23:19:56] RECOVERY - Apache HTTP on mw1060 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.848 second response time [23:19:57] RECOVERY - Apache HTTP on mw1089 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.975 second response time [23:19:57] RECOVERY - Apache HTTP on mw1166 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.781 second response time [23:19:57] RECOVERY - Apache HTTP on mw1109 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.781 second response time [23:19:57] RECOVERY - Apache HTTP on mw1019 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 
bytes in 9.906 second response time [23:20:03] RECOVERY - Apache HTTP on mw1079 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.360 second response time [23:20:51] RECOVERY - Apache HTTP on mw1098 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 2.788 second response time [23:20:51] RECOVERY - Apache HTTP on mw1088 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 2.803 second response time [23:20:51] RECOVERY - Apache HTTP on mw1168 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 3.150 second response time [23:20:51] RECOVERY - Apache HTTP on mw1041 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.073 second response time [23:20:52] RECOVERY - Apache HTTP on mw1054 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.729 second response time [23:20:52] RECOVERY - Apache HTTP on mw1174 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 5.414 second response time [23:20:52] RECOVERY - Apache HTTP on mw1062 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.786 second response time [23:20:52] RECOVERY - Apache HTTP on mw1043 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.189 second response time [23:20:53] RECOVERY - Apache HTTP on mw1107 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.049 second response time [23:20:53] RECOVERY - Apache HTTP on mw1056 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.050 second response time [23:20:54] RECOVERY - Apache HTTP on mw1083 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.863 second response time [23:21:02] RECOVERY - Apache HTTP on mw1080 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.799 second response time [23:21:02] RECOVERY - Apache HTTP on mw1091 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 5.043 second response time [23:21:02] RECOVERY - Apache HTTP on mw1030 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 
bytes in 8.832 second response time [23:21:42] RECOVERY - Apache HTTP on mw1046 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 2.277 second response time [23:21:52] RECOVERY - Apache HTTP on mw1111 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.415 second response time [23:21:52] RECOVERY - Apache HTTP on mw1061 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 3.401 second response time [23:21:52] RECOVERY - Apache HTTP on mw1177 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.331 second response time [23:21:52] RECOVERY - Apache HTTP on mw1075 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.190 second response time [23:21:52] RECOVERY - Apache HTTP on mw1037 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.227 second response time [23:21:52] RECOVERY - Apache HTTP on mw1113 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 5.121 second response time [23:21:53] RECOVERY - Apache HTTP on mw1028 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 5.212 second response time [23:21:53] RECOVERY - Apache HTTP on mw1067 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.909 second response time [23:21:54] RECOVERY - Apache HTTP on mw1175 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.042 second response time [23:21:54] RECOVERY - Apache HTTP on mw1052 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 3.119 second response time [23:21:55] RECOVERY - LVS HTTPS IPv4 on wikivoyage-lb.pmtpa.wikimedia.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 757 bytes in 0.202 second response time [23:21:55] RECOVERY - Apache HTTP on mw1182 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 3.907 second response time [23:21:56] RECOVERY - Apache HTTP on mw1105 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 5.345 second response time [23:21:57] RECOVERY - Apache HTTP on mw1068 is OK: HTTP OK: HTTP/1.1 
301 Moved Permanently - 747 bytes in 9.956 second response time [23:22:52] RECOVERY - Apache HTTP on mw1178 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time [23:22:52] RECOVERY - Apache HTTP on mw1051 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.064 second response time [23:22:52] RECOVERY - Apache HTTP on mw1102 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.246 second response time [23:22:52] RECOVERY - Apache HTTP on mw1042 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 3.155 second response time [23:22:52] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.992 second response time [23:22:52] RECOVERY - Apache HTTP on mw1063 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time [23:22:53] RECOVERY - Apache HTTP on mw1103 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time [23:22:54] RECOVERY - Apache HTTP on mw1023 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.807 second response time [23:22:54] RECOVERY - Apache HTTP on mw1026 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.696 second response time [23:22:54] RECOVERY - Apache HTTP on mw1185 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.841 second response time [23:22:54] RECOVERY - Apache HTTP on mw1070 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.224 second response time [23:22:55] RECOVERY - Apache HTTP on mw1059 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.488 second response time [23:22:56] RECOVERY - Apache HTTP on mw1110 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.291 second response time [23:22:56] RECOVERY - Apache HTTP on mw1034 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.319 second response time [23:23:01] RECOVERY - Apache HTTP on mw1112 is OK: HTTP OK: HTTP/1.1 301 
Moved Permanently - 747 bytes in 6.738 second response time [23:23:01] RECOVERY - Apache HTTP on mw1097 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.957 second response time [23:23:51] RECOVERY - Apache HTTP on mw1076 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.160 second response time [23:23:52] RECOVERY - Apache HTTP on mw1167 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.074 second response time [23:23:52] RECOVERY - Apache HTTP on mw1092 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.218 second response time [23:23:52] RECOVERY - Apache HTTP on mw1071 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.588 second response time [23:23:52] RECOVERY - Apache HTTP on mw1036 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.379 second response time [23:23:52] RECOVERY - Apache HTTP on mw1161 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.500 second response time [23:23:56] !log laner synchronized wmf-config/CommonSettings.php 'Disabling aft5' [23:24:02] RECOVERY - Apache HTTP on mw1066 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.683 second response time [23:24:03] RECOVERY - Apache HTTP on mw1184 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.714 second response time [23:24:03] Logged the message, Master [23:24:03] RECOVERY - Apache HTTP on mw1170 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.391 second response time [23:24:42] PROBLEM - MySQL Replication Heartbeat on db64 is CRITICAL: CRIT replication delay 202 seconds [23:24:52] RECOVERY - Apache HTTP on mw1172 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.006 second response time [23:24:52] RECOVERY - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 62797 bytes in 0.207 second response time [23:25:00] RECOVERY - Apache HTTP on mw1108 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 
7.140 second response time [23:25:00] RECOVERY - Apache HTTP on mw1053 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.492 second response time [23:25:01] RECOVERY - Apache HTTP on mw1073 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.666 second response time [23:25:01] RECOVERY - Apache HTTP on mw1187 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.710 second response time [23:25:03] PROBLEM - MySQL Slave Delay on db64 is CRITICAL: CRIT replication delay 183 seconds [23:25:52] RECOVERY - Apache HTTP on mw1081 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 5.322 second response time [23:25:52] PROBLEM - Apache HTTP on mw1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:26:02] PROBLEM - Apache HTTP on mw1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:26:03] RECOVERY - Apache HTTP on mw1040 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.411 second response time [23:26:03] RECOVERY - MySQL Slave Delay on db64 is OK: OK replication delay 0 seconds [23:26:42] RECOVERY - MySQL Replication Heartbeat on db64 is OK: OK replication delay 0 seconds [23:26:57] RECOVERY - Apache HTTP on mw1021 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 3.180 second response time [23:26:57] RECOVERY - Apache HTTP on mw1038 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 5.799 second response time [23:26:57] RECOVERY - Apache HTTP on mw1029 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.257 second response time [23:26:57] RECOVERY - Apache HTTP on mw1023 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.898 second response time [23:27:01] PROBLEM - Apache HTTP on mw1170 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:27:51] RECOVERY - Apache HTTP on mw1024 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.210 second response time [23:27:52] RECOVERY - Apache HTTP on mw1094 is OK: HTTP OK: HTTP/1.1 
301 Moved Permanently - 747 bytes in 7.017 second response time [23:28:01] RECOVERY - Apache HTTP on mw1170 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.408 second response time [23:28:31] notpeter: would it be possible to have a hume counterpart in eqiad? [23:28:44] RECOVERY - MySQL Slave Delay on db1030 is OK: OK replication delay seconds [23:28:44] RECOVERY - Apache HTTP on mw1100 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.051 second response time [23:28:44] RECOVERY - Apache HTTP on mw1047 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.048 second response time [23:28:44] RECOVERY - Apache HTTP on mw1162 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.054 second response time [23:28:44] RECOVERY - Apache HTTP on mw1020 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.064 second response time [23:29:52] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 181 seconds [23:30:05] AaronSchulz: take a look at asher@fenari:/home/asher/ [23:30:11] db1030-proclist [23:30:42] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 196 seconds [23:32:54] New patchset: Ryan Lane; "Disabling aft5" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54101 [23:32:59] binasher: ^^ [23:33:17] New review: Asher; "+10000000000" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/54101 [23:33:17] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54101 [23:34:13] binasher: I did: git reset --hard c6c37d407874741c9163571cd013153d66f1ea45 [23:37:44] New patchset: Milimetric; "stat1 cron job that emails out aggregate pageviews" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54102 [23:49:06] New review: MZMcBride; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54101 [23:49:34] * Susan eyes the channel. 
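[Editor's note: the back-out described above at 23:34 ("git reset --hard c6c37d4...") is the standard pattern of resetting a config checkout to a known-good revision. A self-contained sketch follows; the repository, file name, and file contents here are made up for illustration, not the actual wmf-config deploy procedure.]

```shell
# Sketch: back out a bad config commit by resetting to a known-good sha.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "ops@example.org"
git config user.name "ops"

echo 'aft5 disabled (known-good)' > CommonSettings.php
git add CommonSettings.php
git commit -qm 'known-good config'
good=$(git rev-parse HEAD)       # record the good sha before deploying

echo 'aft5 enabled' > CommonSettings.php
git commit -qam 'enable aft5'    # the change being backed out

git reset --hard -q "$good"      # discard the bad commit entirely
grep -q 'known-good' CommonSettings.php   # working tree is good again
```

Note that `reset --hard` rewrites both HEAD and the working tree, so anything synced from this checkout afterwards reflects the known-good state.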
[23:49:38] Beware the ides of March. [23:50:13] Susan, apparently even the cluster doesn't like aft :( [23:50:26] Heh. :-( [23:50:59] I was just to skim scrollback. It's unclear from just looking at that changeset, what's going on. [23:51:39] That whole message was a sin against grammar. about to * [23:54:04] Matthias is trying to move AFT5 to the new refactored backend & dedicated DB backend, I'm guessing that this caused the brief outage earlier, but Ryan_Lane and binasher would know more. [23:54:24] s/DB backend/DB cluster/ [23:54:36] Okay, yeah, I just looked at scrollback in here and didn't see much off-hand. [23:55:08] Isn't it like Friday at five o'clock there? Odd deployment time. ;-) [23:55:53] i'm writing an email about the incident [23:56:06] i thought the AFT5 deploy was actually backed out today [23:56:41] well, now it is, but I thought mlitn backed it out [23:56:48] it's listed on the deployment calendar for 9am-1pm pacific [23:56:49] Eloquence: Not that I need more e-mail, but is there a policy for being on engineering-l? [23:57:27] Susan: The policy is, if you're too awesome, you can't join. So sorry. [23:57:32] :D [23:58:04] New patchset: Milimetric; "stat1 cron job that emails out aggregate pageviews" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54102 [23:59:15] oh wow, I'm just catching up - did I break something? [23:59:38] apparently everything.