[00:03:28] (PS1) Ori.livneh: Disable module storage on enwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100933
[00:03:30] y'all, https://www.mediawiki.org/w/index.php?title=Talk:Flow&workflow=050b8fe7deb710412404782bcb0873b9 Sincerely <3 \o/
[00:04:05] (CR) Ori.livneh: [C: 2] Disable module storage on enwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100933 (owner: Ori.livneh)
[00:04:15] !log ori updated /a/common to {{Gerrit|I62541406b}}: Disable module storage on enwiki
[00:04:31] Logged the message, Master
[00:04:55] !log ori synchronized wmf-config/InitialiseSettings.php 'I62541406b: Disable module storage on enwiki'
[00:05:11] Logged the message, Master
[00:06:48] (CR) Dzahn: [C: 2] add mw1-16 (pmtpa inactive job runners) to dsh apaches [operations/puppet] - https://gerrit.wikimedia.org/r/100759 (owner: ArielGlenn)
[00:08:10] ori-l: Are you done LDing?
[00:08:28] RoanKattouw: yes, go ahead; I was sneaking this in before LD
[00:08:33] OK no worries
[00:09:53] (CR) Dzahn: "better sync one too many than the other way around and until they are actually shut down, yea" [operations/puppet] - https://gerrit.wikimedia.org/r/100759 (owner: ArielGlenn)
[00:11:33] !log catrope updated /a/common/php-1.23wmf6 to {{Gerrit|I70eaddd39}}: Update VisualEditor to wmf6 branch for cherry-picks
[00:11:50] Logged the message, Master
[00:12:20] !log catrope synchronized php-1.23wmf6/includes/libs/CSSMin.php 'Fix for bug 58338 (@import URL mangling)'
[00:12:37] Logged the message, Master
[00:12:52] !log catrope synchronized php-1.23wmf6/extensions/VisualEditor/ 'Cherry-picks'
[00:13:08] Logged the message, Master
[00:13:14] (CR) Dzahn: [C: 1] "looks like a no-op. you just did tabs->spaces, right" [operations/puppet] - https://gerrit.wikimedia.org/r/100790 (owner: Matanya)
[00:18:03] (CR) Dzahn: "line 52: is this right? = ::labs_mediawiki_hostname ? no $?"
(1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/100790 (owner: Matanya)
[00:20:25] (CR) Dzahn: "please also put your pubkey on your user page on office wiki, thx" [operations/puppet] - https://gerrit.wikimedia.org/r/100923 (owner: Andrew Bogott)
[00:27:13] (CR) Akosiaris: [C: 2] Backup geowiki's data-private bare repository [operations/puppet] - https://gerrit.wikimedia.org/r/98499 (owner: QChris)
[00:27:24] (CR) Akosiaris: [V: 2] Backup geowiki's data-private bare repository [operations/puppet] - https://gerrit.wikimedia.org/r/98499 (owner: QChris)
[00:30:54] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[00:58:34] !log catrope synchronized php-1.23wmf6/extensions/VisualEditor/ 'Cherry-picks'
[00:58:48] Logged the message, Master
[01:15:02] (PS1) Bsitu: Fix Flow config setting for beta labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100945
[01:15:08] (CR) jenkins-bot: [V: -1] Fix Flow config setting for beta labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100945 (owner: Bsitu)
[01:15:59] (PS2) Bsitu: Fix Flow config setting for beta labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100945
[01:36:26] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 10:33:16 AM UTC
[01:37:52] i need DateTime.pm , i wonder if i can get it from one of all those "libdatetime-*" packages
[01:38:18] but there are so many
[01:38:41] mutante: yes you can
[01:39:08] mutante: heh, there are just two :))
[01:39:21] one for i386 and one for amd64
[01:39:36] s/i386/x86/
[01:40:12] apt-cache search libdatetime | wc -l
[01:40:13] 31
[01:40:20] libdatetime-perl
[01:40:32] mutante: I've used it for jenkins
[01:40:42] oh, libdatetime-set-perl is probably all unrelated
[01:40:46] average: k, thanks!
[01:41:09] :) np
[01:41:59] i see, that would pull in a bunch of other lib*-perl
[01:42:08] but yea
[01:42:43] mutante: and for the general solution https://metacpan.org/pod/release/JKUTEJ/Debian-Apt-PM-0.09/script/apt-pm
[01:43:07] "apt-pm - locate Perl Modules in Debian repositories"
[01:44:10] average: nice! ah
[01:44:15] :)
[01:54:04] (PS2) Gage: Add gage as a root [operations/puppet] - https://gerrit.wikimedia.org/r/100923 (owner: Andrew Bogott)
[01:54:08] (PS1) Dzahn: add package libdatetime-perl for bugzilla [operations/puppet] - https://gerrit.wikimedia.org/r/100947
[01:59:45] (PS1) Tholam: Update favicon wiktionary/si.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100949
[02:01:03] (CR) Dzahn: [C: 2] "welcome to root - https://office.wikimedia.org/wiki/User:JGerard_%28WMF%29" [operations/puppet] - https://gerrit.wikimedia.org/r/100923 (owner: Andrew Bogott)
[02:05:54] (Abandoned) BryanDavis: Hack: cron job to clean up files orphaned by UploadFromUrl [operations/puppet] - https://gerrit.wikimedia.org/r/100928 (owner: BryanDavis)
[02:18:04] !log LocalisationUpdate completed (1.23wmf6) at Thu Dec 12 02:18:04 UTC 2013
[02:18:21] Logged the message, Master
[02:21:10] (PS2) Mattflaschen: fix routing of non-wikipedia on beta cluster [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100573 (owner: Hashar)
[02:29:47] greg-g, et al, do I need a window to deploy something that should only affect Beta (https://gerrit.wikimedia.org/r/#/c/100573/)?
[02:38:50] !log LocalisationUpdate completed (1.23wmf5) at Thu Dec 12 02:38:50 UTC 2013
[02:39:08] Logged the message, Master
[02:46:41] (CR) Mattflaschen: [C: 1] "This looks good. I just need to check if I need a window to deploy this."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100573 (owner: Hashar)
[03:25:15] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Dec 12 03:25:15 UTC 2013
[03:25:32] Logged the message, Master
[03:30:05] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[04:36:45] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 10:33:16 AM UTC
[04:53:06] superm401: You don't need a window, you can just deploy that.
[04:53:57] Oh, I thought that was -labs.php files.
[04:55:16] That whole script is a bit terrifying. :-/
[05:01:29] * gwicke lols at http://www.joachim-breitner.de/heisse-news/news_27.xml
[05:03:41] heh
[05:05:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:00:04 AM UTC
[05:07:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:00:04 AM UTC
[05:09:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:00:04 AM UTC
[05:11:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:00:04 AM UTC
[05:13:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:00:04 AM UTC
[05:15:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:00:04 AM UTC
[05:17:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:00:04 AM UTC
[05:19:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:00:04 AM UTC
[05:21:25] if you had asked me a month ago if it were possible for puppet freshness alerts to be more annoying and less useful, I would have said no
[05:21:48] and yet here we are
[05:21:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet
run was Thu 12 Dec 2013 05:00:04 AM UTC
[05:23:38] Gloria: you should write a bot
[05:23:53] that produces dada-style puppet alerts
[05:23:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:00:04 AM UTC
[05:25:16] Puppet freshness is POINTY
[05:25:50] ori-l: I think you're the only person who sees them.
[05:25:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:00:04 AM UTC
[05:26:09] gwicke: Ich lachte, but mainly at "Systemd is outdo in fact no init system , but a secretly by Linus Torvalds"
[05:26:10] My tolerance for stupidity is pretty low. I ignored the bot.
[05:27:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:00:04 AM UTC
[05:28:22] spagewmf: ;)
[05:29:13] ori-l: I liked my puppet idea better, in any case. ;-)
[05:29:49] RECOVERY - Puppet freshness on mw1036 is OK: puppet ran at Thu Dec 12 05:29:44 UTC 2013
[05:31:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:29:44 AM UTC
[05:33:59] PROBLEM - Puppet freshness on mw1036 is CRITICAL: Last successful Puppet run was Thu 12 Dec 2013 05:29:44 AM UTC
[05:59:32] RECOVERY - Puppet freshness on mw1036 is OK: puppet ran at Thu Dec 12 05:59:30 UTC 2013
[06:28:10] PROBLEM - udp2log log age for lucene on oxygen is CRITICAL: CRITICAL: log files /a/log/lucene/lucene.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days.
[06:30:10] RECOVERY - udp2log log age for lucene on oxygen is OK: OK: all log files active
[07:28:45] (PS1) Ori.livneh: Decouple Nginx configs from graphite::web & relegate to subclass [operations/puppet] - https://gerrit.wikimedia.org/r/100957
[07:30:26] (PS1) Nemo bis: Preference "Email me when a page or file on my watchlist is changed" explicitly false [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100958
[07:37:33] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 10:33:16 AM UTC
[07:41:26] (CR) Spage: [C: 1] "Looks more right than before ;) . This change only affects labs, but as I understand it ops doesn't like undeployed changes hanging around" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100945 (owner: Bsitu)
[08:00:14] (PS1) ArielGlenn: stat1002 access for ironholds, rt #6452 [operations/puppet] - https://gerrit.wikimedia.org/r/100960
[08:05:58] (PS2) ArielGlenn: stat1002 access for ironholds, rt #6452 [operations/puppet] - https://gerrit.wikimedia.org/r/100960
[08:07:31] (CR) ArielGlenn: [C: 2] stat1002 access for ironholds, rt #6452 [operations/puppet] - https://gerrit.wikimedia.org/r/100960 (owner: ArielGlenn)
[08:58:07] goood morning
[08:58:44] hi hashar
[09:12:41] hunting the 503 errors we got on beta
[09:12:49] somehow varnish can't access the backends :(
[09:14:03] morning
[09:19:55] apergos: have you ever imported huge files on commons using mediawiki import scripts ?
[09:20:01] no
[09:20:03] apergos: was wondering if I should do it from tin.eqiad.wmnet
[09:20:10] roan did that for a time
[09:20:42] how huge is huge?
[09:20:51] ~15GB
[09:20:56] files at http://lan.80686.net
[09:21:05] tin as enough disk space apparently
[09:21:16] hashar@tin:~$ mwscript
[09:21:17] This script can only be run from the command line.
[09:21:18] hehe
[09:22:20] h but any one file is 2gb or so?
[09:22:32] https://commons.wikimedia.org/wiki/Help:Server-side_upload#See_also
[09:22:35] hm 2.8 still ok
[09:23:02] hashar: if you harrse Reedy, he should give you pointers
[09:23:04] ah yeah one is 2.4G
[09:23:11] the files are listed at http://lan.80686.net
[09:23:19] Nemo_bis: thank you :-]
[09:24:53] ah has reedy been doing them? awesome
[09:25:02] we should really have that on wikitech someplace
[09:25:59] apergos: any clue how I can check the max size a FS supports ?
[09:28:27] if you are thinking of / on tin (ext4) it's fine
[09:28:35] nice thanks
[09:29:09] Max. file size 16 TiB (for 4k block filesystem)
[09:29:25] import something that big into commons and we will kill you
[09:29:32] before you get anywhere near finishing :-P
[09:30:17] apergos: that would fill Swift right?
[09:30:26] ahh
[09:30:40] and tin doesn't have access to internet grblblb
[09:30:49] internal ip
[09:31:43] apergos: At this stage, You probably just need to convert Reedy into CAL and attach his brain to wikitech, to retain his knowledge
[09:31:53] I'll even copy-edit
[09:32:18] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[09:32:50] https://wikitech.wikimedia.org/wiki/Uploading_large_files well looky there, one whole line
[09:32:51] :-D
[09:34:20] :-D
[09:35:35] gotta find out from which host I will wget though
[09:37:48] we usually rely on brewster as proxy server
[09:38:28] ahh that is it
[09:38:31] forgot about that host
[09:38:40] is that simply a http proxy or does it have some cache as well ?
[09:39:01] there's squid over there
[09:39:54] lemme see which ports etc
[09:40:46] 8080
[09:40:49] brewster:8080 indeed
[09:40:54] gotta find out how to pass that to wget :D
[09:41:08] http_proxy=http://brewster.wikimedia.org:8080
[09:41:11] export http_proxy
[09:41:14] should get it
[09:41:23] ah lowercase
[09:41:24] damn
[09:41:31] 0% [ ] 10,625,229 1.15M/s eta 45m 17s
[09:41:41] 45 mins
[09:41:44] yawn
[09:42:09] !log on tin, started a screened wget to download materials that will be uploaded to commons {{bug|58155}}
[09:42:25] Logged the message, Master
[09:43:36] will let it run and figure out later on how to run the import
[09:44:26] that one-liner tells you
[09:44:38] well assuming you put your stuff in the right directory
[09:45:08] but we still should have more of a description, you are right
[09:47:41] 16 FetchError c backend write error: 11 (Resource temporarily unavailable)
[09:47:45] ah and here are my 503
[09:47:50] somehow apaches backend don't respond
[09:48:17] it's repeatable?
[09:48:35] I suspect the apache pool is too small
[09:48:45] ah so a percent of them fail?
[09:48:50] so they would refuse connections when they have too many connected
[09:49:00] how is maxclients on them?
[09:49:28] for that matter how many connections are in use at a given time? do we have that in the local ganglia?
[09:49:35] I have no clue
[09:49:38] (the prod cluster shows it)
[09:50:42] ahh we have apache metrics \O/
[09:50:45] http://ganglia.wmflabs.org/latest/?r=hour&cs=&ce=&m=load_one&s=by+name&c=deployment-prep&h=deployment-apache32&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4
[09:50:59] day view v
[09:51:00] http://ganglia.wmflabs.org/latest/?r=day&cs=&ce=&c=deployment-prep&h=deployment-apache32&tab=m&vn=&mc=2&z=small&metric_group=ALLGROUPS
[09:51:06] lots of idle threads
[09:51:46] maxclients isn't the issue
[09:51:49] I really got to cleanup the apache conf and add them to the git repo
[09:52:23] well I say that but I don't know what the setting is
[09:52:46] maybe it's 20
[09:53:51] apache2.conf:MaxClients 40
[09:53:52] apache2.conf:MaxClients 5
[09:53:54] :D
[09:54:54] er? :-D
[09:55:16] yeah maxclients 5 is defined when there is a define SLWO
[09:55:18] SLOW
[09:55:23] so that should be MaxClients 40
[09:57:11] nothing related in apache error logs
[09:58:57] the apache2.conf comes from puppet so that should be fine hopefully
[10:00:41] right
[10:01:50] what does apache status say, anything remotely useful?
[10:05:11] it is working
[10:05:22] I would need some extended error logs though
[10:05:26] to find out why it reject connections
[10:07:48] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[10:08:12] I can look with you in a little while (hopefully)
[10:10:53] ah at least I found out the reason for SIGTERM
[10:10:54] notice: /Stage[main]/Applicationserver::Service/Service[apache]: Triggered 'refresh' from 1 events
[10:11:00] thanks puppet
[10:11:26] is there anything more complex and confusing in the universe than LDAP?
[10:11:27] heh
[10:11:42] *cough* x.509 + asn1
[10:11:50] worst. idea. ever.
[10:12:06] heh, good call
[10:12:09] but ldap is way up there, I gotta give you that
[10:12:39] does all this also relate to the fact that oauth is not working ?
[10:19:59] GerardM: no
[10:20:52] giving up on apache restarting, will RT it
[10:25:43] hashar: bonjour. The Flow deploy broke Flow on beta labs. We think we have a fix for wmf-config/InitialiseSettings-labs.php in https://gerrit.wikimedia.org/r/#/c/100945/ . Do you usually +2 and deploy these, or is it only Demon/Reedy ?
[10:28:49] spagewmf: anyone with +2 access would randomly approve the -labs.php changes
[10:29:09] spagewmf: I do self merge often :-D
[10:29:23] though only if I am 1000% sure it is not going to impact production hehe
[10:30:28] hashar right, but ^demon & Roan say not to leave merged config patches undeployed
[10:30:34] (CR) Hashar: [C: 2] "+2 and it will self deploy on beta. Then one need to git pull on tin.eqiad.wmnet to avoid people freaking out because something did not g" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100945 (owner: Bsitu)
[10:30:38] spagewmf: yeah replied there
[10:30:40] so basically : +2
[10:30:43] (Merged) jenkins-bot: Fix Flow config setting for beta labs [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100945 (owner: Bsitu)
[10:30:46] jenkins will deploy on beta cluster
[10:31:01] then I go on tin.eqiad.wmnet and pull the beta only change
[10:31:09] this way people get a clean git status on tin
[10:31:17] doing it right now
[10:32:10] hashar, thanks! I didn't know that jenkins deploys config changes as well as core and extension updates.
[10:32:12] ori-l: what ldap thing are you working on?
[10:32:20] (out of curiosity)
[10:32:24] graphite
[10:32:39] access control?
[10:32:42] spagewmf: well Jenkins report so on the change :-D
[10:32:48] yeah
[10:32:52] sorry...
[10:33:00] spagewmf: https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update/1659/console : Change has been deployed on the beta cluster in 13s
[10:33:14] spagewmf: the related documentation is https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated
[10:33:29] spagewmf: anyway change got deployed and tin is up to date :-] So you are safe now
[10:33:50] apergos: hm?
[10:34:03] oh that you get stuck with that
[10:34:42] hashar: right, it's the "101% sure it is not going to impact production" that gets me. We deployed other labs changes in our deployment window (that didn't quite work, hence this patch :) )
[10:34:45] i sort of did it to myself :)
[10:34:47] anyway, thanks
[10:35:06] spagewmf: and if in doubt, get review from other folks as usual :-]
[10:36:02] the best lack all conviction
[10:38:32] PROBLEM - Puppet freshness on elastic1007 is CRITICAL: Last successful Puppet run was Wed 11 Dec 2013 10:33:16 AM UTC
[10:39:10] doo dee doo dee doo
[10:39:21] * apergos is not loving ruby
[10:39:34] oo oo what ruby?
[10:39:39] meh
[10:39:44] 'doo dee doo dee doo' is probably valid ruby
[10:39:49] :-D
[10:40:18] oh just trying to beat a script into submission
[10:40:38] I'm close to winning, just not excited about it
[10:42:04] paravoid: akosiaris: apergos: any clue why a pure virtual package ends up not being marked as installed ?
[10:42:22] what test are you using?
[10:42:33] We have a package { 'fonts-sil-yi': ensure => present }
[10:42:36] but that is never installed
[10:42:59] so puppet reinstall it and triggers an event that escalate up to Service['apache'] which get restarted :(
[10:43:14] it reinstalls every run?
[10:43:21] yep
[10:43:21] well, pretends to reinstall
[10:43:31] puppet log shows no complaints about it?
[10:44:19] * apergos is going to get downright crabby about this ruby thang
[10:45:18] na it keep reinstalling it
[10:45:27] apparently purely virtual package are some kind of redirection
[10:45:35] the real package being installed is fonts-sil-nuosusil
[10:45:48] (CR) Addshore: "2 different ways as in the single repo on labs and multiple repos on prod?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/95996 (owner: Aude)
[10:47:01] hashar: http://en.wikipedia.beta.wmflabs.org/wiki/Talk:Flow_QA working again, merci beaucoup
[10:49:22] spagewmf: bravo!
[10:49:45] (PS1) Hashar: imagescaler: replace virtual font package with real package [operations/puppet] - https://gerrit.wikimedia.org/r/100970
[10:51:26] apergos: https://gerrit.wikimedia.org/r/100970 removes the purely virtual package and use the real one instead
[10:52:44] ok but doesn't .. I mean dpkg -l doesn't show the virtual package installed?
[10:53:02] pasted on https://rt.wikimedia.org/Ticket/Display.html?id=6500
[10:53:09] it does:
[10:53:10] dpkg-query -W --showformat '${Status} ${Package} ${Version}\n' fonts-sil-yi
[10:53:13] err puppet do that
[10:53:18] and that returns unknown ok not-installed fonts-sil-yi
[10:53:30] so version being 'not-installed' puppet install it as a good citizen
[10:53:35] the package is http://packages.ubuntu.com/precise/fonts-sil-yi
[10:53:42] apparently the font got renamed
[10:53:51] it was probably valid in Lucid, no more the case today
[10:54:00] we probably have had that issue since we migrated to Precise :-D
[10:56:23] I am not sure it happens in production though
[10:59:59] away for some coding dojo
[11:00:04] bb in 2 hours
[11:00:06] ok
[11:00:49] * apergos goes back to kicking ruby
[11:09:33] !log labsdb1002:3307 db instance crash, page corruption, now assertion failures.
restoring from upstream db snapshot
[11:09:50] Logged the message, Master
[11:11:26] bah nodojo today apparently :D
[11:20:50] !gerrit I6b9c47055e7ee232cec3fb1f9f5d3a15d4ad392d
[11:20:51] https://gerrit.wikimedia.org/
[11:21:00] !gerrit I6b9c47055e7ee232cec3fb1f9f5d3a15d4ad392d
[11:21:01] https://gerrit.wikimedia.org/
[11:21:03] ..
[11:21:10] !alias gerrit
[11:21:12] !help gerrit
[11:21:13] !documentation for labs !wm-bot for bot
[11:21:16] !gerrit
[11:21:17] https://gerrit.wikimedia.org/
[11:22:22] (PS2) Hashar: imagescaler: replace virtual font package with real package [operations/puppet] - https://gerrit.wikimedia.org/r/100970
[11:22:55] (CR) Hashar: "added reference to I6b9c47055e7ee232cec3fb1f9f5d3a15d4ad392d by Reedy : Update font packages to not use virtual packages" [operations/puppet] - https://gerrit.wikimedia.org/r/100970 (owner: Hashar)
[11:23:09] apergos: I think you can go ahead and merge that change :D
[11:23:27] anyway, will be back in 2 hours :-D
[11:23:47] I want to see what's going on in production first
[11:23:53] sure
[11:24:07] couldn't find an occurrence of SIGTERM in apache log, but I might be looking at the wrong file
[11:24:25] I have amended the change to refer to a previous commit that changed the fonts package to no more point to virtual packages
[11:24:25] I will do it in a while, just... still in my ruby death match, this is dragging on much longer than expected
[11:24:31] right
[11:24:31] good luck with ruby :-D
[11:24:32] off !
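A rough sketch of the check discussed above for fonts-sil-yi, assuming the `dpkg-query -W --showformat '${Status} ${Package} ${Version}\n'` output is simply split into whitespace-separated fields. The `installed?` helper is hypothetical and is not Puppet's actual provider code; the second sample line, including its version string, is invented for contrast.

```ruby
# Hypothetical helper: decide from one dpkg-query line whether a package
# counts as installed. Assumed field layout:
#   "<desired> <error> <status> <package> [<version>]"
def installed?(query_line)
  _desired, _error, status, _package, _version = query_line.split
  status == "installed"
end

# Line quoted in the log for the purely virtual package.
virtual = "unknown ok not-installed fonts-sil-yi"
# Invented example of a real, installed package (version is made up).
real = "install ok installed fonts-sil-nuosusil 0.0.0-0"

puts installed?(virtual)
puts installed?(real)
```

Under this reading, a purely virtual package never reaches the `installed` status, so a `package { ...: ensure => present }` resource would look unsatisfied on every run and keep notifying whatever depends on it.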
[11:24:33] thanks
[11:24:35] see ya
[11:42:48] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[11:46:29] (PS1) Springle: assign db1017 to s5 for analytics [operations/puppet] - https://gerrit.wikimedia.org/r/100973
[11:48:18] (CR) Springle: [C: 2] assign db1017 to s5 for analytics [operations/puppet] - https://gerrit.wikimedia.org/r/100973 (owner: Springle)
[11:57:12] !log xtrabackup clone s5 db1005 to db1017
[11:57:29] Logged the message, Master
[11:58:32] (CR) Ori.livneh: [C: 2] Decouple Nginx configs from graphite::web & relegate to subclass [operations/puppet] - https://gerrit.wikimedia.org/r/100957 (owner: Ori.livneh)
[11:59:35] (CR) Akosiaris: [C: 2] include the bugzilla config in puppet [operations/puppet] - https://gerrit.wikimedia.org/r/100752 (owner: Dzahn)
[12:00:12] (PS1) Ori.livneh: Use Apache to serve Graphite [operations/puppet] - https://gerrit.wikimedia.org/r/100974
[12:01:21] (PS2) Ori.livneh: Use Apache to serve Graphite [operations/puppet] - https://gerrit.wikimedia.org/r/100974
[12:01:43] mutante: hi
[12:01:54] mutante: I have a small question about a repo that disappeared from gerrit
[12:02:14] (CR) Ori.livneh: [C: 2 V: 2] Use Apache to serve Graphite [operations/puppet] - https://gerrit.wikimedia.org/r/100974 (owner: Ori.livneh)
[12:02:17] mutante: actually, some changes that disappeared from gerrit and together with them, also the repos on git.wikimedia.org
[12:02:25] mutante: ofc not all, just one in particular
[12:03:33] mutante: is there any reason why a gerrit change would just vanish ?
[12:03:46] mutante: this is the gerrit change http://web.archiveorange.com/archive/v/RcW9ZCDoEjVWWzbob2aI
[12:04:04] mutante: https://gerrit.wikimedia.org/r/73860
[12:04:59] and the underlying repo also disappeared gerrit.wikimedia.org:29418/analytics/dclass
[12:05:16] fortunately, I still have copies of this on my machine
[12:05:50] the gerrit change disappearing is probably the result of the repo not being there
[12:06:03] why the repo is not there ... i do not know at this time
[12:06:29] akosiaris: that's ok, this is not urgent
[12:06:48] PROBLEM - mysqld processes on db1017 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[12:06:55] akosiaris , mutante when you find something please let me know
[12:07:56] akosiaris: I would suspect that gerrit has a log somewhere with who performed what operations
[12:08:55] akosiaris: but it's also a policy question. do we keep just the latest X months/days of gerrit merged changes ? or do we keep all of them ?
[12:09:10] all of them
[12:10:16] when did the repo disappear ?
[12:10:36] (PS1) Ori.livneh: Decouple Nginx from Gdash module; serve via Apache instead [operations/puppet] - https://gerrit.wikimedia.org/r/100975
[12:11:11] (CR) Ori.livneh: [C: 2 V: 2] Decouple Nginx from Gdash module; serve via Apache instead [operations/puppet] - https://gerrit.wikimedia.org/r/100975 (owner: Ori.livneh)
[12:11:18] RECOVERY - mysqld processes on labsdb1003 is OK: PROCS OK: 3 processes with command name mysqld
[12:11:55] akosiaris: I first noticed about 1-2 months ago
[12:12:18] heh... logs go back 10 days...
[12:12:54] (PS1) Ori.livneh: Serve gdash.wikimedia.org via Apache [operations/puppet] - https://gerrit.wikimedia.org/r/100976
[12:13:17] akosiaris: I see..
[12:13:39] SAL also does not have any entry for dclass being deleted
[12:14:17] akosiaris: how about the git repo ?
[12:14:20] (PS2) Ori.livneh: Serve gdash.wikimedia.org via Apache [operations/puppet] - https://gerrit.wikimedia.org/r/100976
[12:14:35] (CR) Ori.livneh: [C: 2 V: 2] Serve gdash.wikimedia.org via Apache [operations/puppet] - https://gerrit.wikimedia.org/r/100976 (owner: Ori.livneh)
[12:14:44] akosiaris: ssh://gerrit.wikimedia.org:29418/analytics/dclass
[12:15:15] not there
[12:16:28] (PS1) Ori.livneh: Correct typo in template path reference introduced in Ib87eb5dbc [operations/puppet] - https://gerrit.wikimedia.org/r/100977
[12:17:26] anyone online i might be able to chat with about throttling job queue jobs?
[12:17:36] (CR) Ori.livneh: [C: 2] Correct typo in template path reference introduced in Ib87eb5dbc [operations/puppet] - https://gerrit.wikimedia.org/r/100977 (owner: Ori.livneh)
[12:18:37] sigh, i really shouldn't rush
[12:19:32] (PS1) Ori.livneh: Correct file reference in Ib87eb5db [operations/puppet] - https://gerrit.wikimedia.org/r/100978
[12:19:39] sorry for git spam :/
[12:19:47] (CR) Ori.livneh: [C: 2 V: 2] Correct file reference in Ib87eb5db [operations/puppet] - https://gerrit.wikimedia.org/r/100978 (owner: Ori.livneh)
[12:21:49] PROBLEM - DPKG on tungsten is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[12:22:48] RECOVERY - DPKG on tungsten is OK: All packages OK
[12:25:53] (PS1) ArielGlenn: write out attributes for a nagios service in fixed order [operations/puppet] - https://gerrit.wikimedia.org/r/100979
[12:26:05] (PS2) ArielGlenn: write out attributes for a nagios service in fixed order [operations/puppet] - https://gerrit.wikimedia.org/r/100979
[12:26:42] gerrit rebase button, best thing since sliced bread
[12:28:05] (CR) ArielGlenn: [C: 2] write out attributes for a nagios service in fixed order [operations/puppet] - https://gerrit.wikimedia.org/r/100979 (owner: ArielGlenn)
[12:29:44] apergos: @parameters.sort.map { |param,value|
[12:29:55] sorts by key anyway
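A minimal sketch of why the `@parameters.sort.map { |param,value| ...` line quoted above gives a fixed attribute order: `Hash#sort` returns an array of `[key, value]` pairs ordered by key, so the rendered text no longer depends on the hash's iteration order. The `render_service` helper, the output layout, and the sample attributes are hypothetical and not taken from the Puppet source.

```ruby
# Hypothetical helper: render nagios-style service attributes in a fixed order.
def render_service(parameters)
  # Hash#sort yields [key, value] pairs sorted by key, so the result is the
  # same no matter in which order the hash was filled.
  lines = parameters.sort.map { |param, value| "\t#{param} #{value}" }
  "define service {\n#{lines.join("\n")}\n}\n"
end

# Same attributes, inserted in a different order (sample values are made up).
a = render_service("host_name" => "neon", "check_command" => "check_foo")
b = render_service("check_command" => "check_foo", "host_name" => "neon")

puts a
puts a == b
```

String keys are used here because they are comparable in every Ruby version; with keys that do not implement `<=>` the sort would need an explicit block.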
[12:30:26] this is directly from their branch however
[12:31:11] well... :)
[12:31:33] I was wasting a lot of time trying to get the to_s to work before realizng that they had redefined it :-/
[12:33:09] I now have to do two puppet runs on neon just to confirm (though I did some testing already writing to files in a different location)
[12:33:21] and neon puppet runs are still the slowest except maybe bast1001
[12:34:16] is map faster than each and retrieval of the value as they do here?
[12:34:25] (might as well learn something while we're at it)
[12:34:32] no, just more idiomatic if you need the value too
[12:34:54] actually, durr, each can do |k,v| too
[12:34:59] yes it can
[12:35:00] that's even more straightforward
[12:35:12] I don't know why they didn't here
[12:36:18] each can do |k,v| ?
[12:36:22] isn't that each_pair ?
[12:36:54] I don't know each_pair, but I know each does it
[12:37:22] * apergos looks it up
[12:38:37] http://www.ruby-doc.org/core-1.8.7/Hash.html#method-i-each
[12:38:42] seems like the same
[12:38:47] it does, doesn't it
[12:39:02] marginally more efficient
[12:39:07] whatever that means
[12:39:12] meh
[12:39:44] heh... I 've never used each for hashes... I always used each_pair
[12:40:17] which btw returns a different order of items in every execution of puppet
[12:40:34] yes, we know :-D
[12:40:41] ruby 1.8 ftw (not!)
[12:40:53] I"m fixing one of those right now, hopefully
[12:41:10] yeah i fixed one of those yesterday...
[12:41:41] meh.. the ruby 1.8 => 1.9 change was too big for a minor version number
[12:42:00] what else got changed?
[12:42:09] strings
[12:42:20] they got an encoding
[12:42:31] (CR) Reedy: "The commit summary doesn't mention that though, does it?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/95996 (owner: Aude)
[12:42:35] they used to be bytes...
not so much anymore
[12:42:41] ohhhh
[12:42:46] like the python2 => python3 thingy
[12:42:55] yeah I'm not so convinced about python 3
[12:44:11] so map differs from each in that it returns a new array containing the values returned by the block
[12:46:18] good to know
[12:49:13] food soon..jut gotta wait for two more neon runs and verify
[12:49:39] *just
[12:49:48] !log Took down gdash.wm.o for some time to finish Nginx -> Apache change.
[12:50:05] Logged the message, Master
[12:53:47] !log reedy synchronized php-1.23wmf5/extensions/Scribunto/Scribunto.namespaces.php 'Fix ml namespace issue'
[12:54:03] Logged the message, Master
[12:54:18] !log reedy synchronized php-1.23wmf6/extensions/Scribunto/Scribunto.namespaces.php 'Fix ml namespace issue'
[12:54:33] Logged the message, Master
[13:05:40] meh all this to track down the freshness check issues when it goes awry
[13:05:43] but next time I'll be ready
[13:13:24] !log reedy updated /a/common to {{Gerrit|Ib2c3794dc}}: Fix Flow config setting for beta labs
[13:13:30] (PS1) Reedy: Add wmf7 symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100985
[13:13:38] Logged the message, Master
[13:13:45] (CR) Reedy: [C: 2] Add wmf7 symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100985 (owner: Reedy)
[13:13:54] (Merged) jenkins-bot: Add wmf7 symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100985 (owner: Reedy)
[13:14:30] !log reedy synchronized docroot and w
[13:14:33] (PS3) ArielGlenn: imagescaler: replace virtual font package with real package [operations/puppet] - https://gerrit.wikimedia.org/r/100970 (owner: Hashar)
[13:14:46] Logged the message, Master
[13:17:52] (CR) ArielGlenn: [C: 2] imagescaler: replace virtual font package with real package [operations/puppet] - https://gerrit.wikimedia.org/r/100970 (owner: Hashar)
[13:25:50] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx
[crit=500.000000 [13:30:19] apergos: thank you :-] [13:30:30] thank you for tracking it down [13:30:54] !log reedy synchronized php-1.23wmf7 'staging' [13:31:11] Logged the message, Master [13:31:28] info: /Stage[main]/Applicationserver::Config::Apache/Exec[Fake sync apache wmf config on beta]: Scheduling refresh of Service[apache] [13:31:28] hehe [13:31:30] never ending [13:32:28] there is always something [13:34:49] hashar: so, that would be the reason for the 503s in beta, right? [13:34:55] paravoid: might be [13:35:02] that would certainly cause 503s [13:35:17] yeah part of them, not sure it is the only reason though [13:35:25] RECOVERY - Puppet freshness on elastic1007 is OK: puppet ran at Thu Dec 12 13:35:15 UTC 2013 [13:35:49] why would apache be restarted because a font changed? [13:35:51] that is wrong [13:38:27] yeah that as well [13:38:33] havent investigated though [13:38:57] (03PS1) 10Hashar: beta: dont restart apache on fake mwsync [operations/puppet] - 10https://gerrit.wikimedia.org/r/100988 [13:39:15] (03PS1) 10Faidon Liambotis: Temporarily lock-down Icinga Web UI [operations/puppet] - 10https://gerrit.wikimedia.org/r/100989 [13:40:06] got another puppet thing that restart Apache : https://gerrit.wikimedia.org/r/100988 [13:40:07] (03CR) 10Faidon Liambotis: [C: 032] Temporarily lock-down Icinga Web UI [operations/puppet] - 10https://gerrit.wikimedia.org/r/100989 (owner: 10Faidon Liambotis) [13:40:14] related to how we want to rsync mediawiki files before starting apache [13:41:32] apergos: another apache restart reason https://gerrit.wikimedia.org/r/#/c/100988/ :D [13:42:00] yep, having a look [13:44:44] !log reedy started scap: testwiki to 1.23wmf7, build l10n cache [13:45:01] Logged the message, Master [13:45:10] (03PS1) 10Siebrand: Set $wgULSNoWebfontsSelectors [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100990 [13:47:08] (03PS2) 10Siebrand: Set $wgULSNoWebfontsSelectors [operations/mediawiki-config] - 
10https://gerrit.wikimedia.org/r/100990 [13:47:34] hashar: the creates attribute takes care of it for production, but you don't have that in beta [13:48:29] (03PS2) 10ArielGlenn: beta: dont restart apache on fake mwsync [operations/puppet] - 10https://gerrit.wikimedia.org/r/100988 (owner: 10Hashar) [13:49:14] ah that was it [13:49:43] which means in prod that is only useful when a new server is added in [13:49:50] that is correct [13:49:54] doesn't prevent us from repooling an old apache with an obsolete conf :( [13:50:02] there should be another sync for that [13:51:30] (03CR) 10Nikerabbit: [C: 04-1] Set $wgULSNoWebfontsSelectors (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100990 (owner: 10Siebrand) [13:52:06] see service apache in applicationserver::service [13:54:09] (03CR) 10ArielGlenn: [C: 032] beta: dont restart apache on fake mwsync [operations/puppet] - 10https://gerrit.wikimedia.org/r/100988 (owner: 10Hashar) [13:55:44] Reedy: yoho? [13:56:59] apergos: hurrah, puppet no more restarting apache \O/ [13:57:08] really? excellent [13:57:12] (03CR) 10Siebrand: "@niklas: Current file uses the pattern I implemented, as far as I can tell (see inline comment). I'll look into implementing it they way y" (032 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100990 (owner: 10Siebrand) [13:57:31] apergos: that has been like that for ages though, not sure it is really the cause of the 503 on beta [13:57:39] did anyone see Reedy lately, enwiki arb election really needs him [14:01:14] matanya: been idle for 55 minutes [14:01:22] probably at breakfast/shower stage :-D [14:01:55] thanks. 
ok, whenever one sees him, please let him know he is in need [14:03:42] (03PS3) 10Siebrand: Set $wgULSNoWebfontsSelectors [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100990 [14:04:18] (03CR) 10Nikerabbit: [C: 031] Set $wgULSNoWebfontsSelectors [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100990 (owner: 10Siebrand) [14:04:37] paravoid: I'm helping! https://github.com/elasticsearch/elasticsearch/pull/3105#issuecomment-30339106 [14:05:05] haha! [14:05:16] we don't mind that much anyway :) [14:05:22] we can install java via puppet [14:05:25] (we already do) [14:07:05] Reedy, hashar: If you're doing any deployments any time soon, can you please include https://gerrit.wikimedia.org/r/#/c/100990 ? [14:08:09] siebrand: what about I deploy it for you ? :D [14:09:01] hashar: That would be very nice. [14:09:32] siebrand: when are i18n team lightning deploys windows ? [14:09:43] (03CR) 10Hashar: [C: 032] "pushing to prod" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100990 (owner: 10Siebrand) [14:09:45] hashar: Tuesdays. [14:09:54] (03Merged) 10jenkins-bot: Set $wgULSNoWebfontsSelectors [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100990 (owner: 10Siebrand) [14:10:06] siebrand: ah true, same as mine :-] [14:10:16] I usually deploy CI changes just after your window [14:11:48] hashar: This one turned up this morning and I had to make a little effort convincing the team it was a high priority issue. I try to keep those to a minimum and feel slightly embarrassed asking for deployment priority. It's my strong preference to always try and ride the deployment train. 
[14:12:29] !log hashar synchronized wmf-config/CommonSettings.php 'Set $wgULSNoWebfontsSelectors {{gerrit|100990}}' [14:12:42] !log reedy finished scap: testwiki to 1.23wmf7, build l10n cache [14:12:45] Logged the message, Master [14:12:58] !log hashar synchronized wmf-config/InitialiseSettings.php 'touch' [14:13:01] Logged the message, Master [14:13:02] siebrand: production has been my top priority for the last two years or so and is still. [14:13:13] siebrand: so whenever you have such a change to get deployed, feel free to ping me. [14:13:16] Logged the message, Master [14:13:22] hashar: Thanks again. [14:13:27] siebrand: preference during European morning since it is quiet / easier to follow :-] [14:13:37] * siebrand nids. [14:13:41] * siebrand nods at hashar  [14:13:56] manybubbles: but thanks :-) [14:14:03] siebrand: specially for such trivial changes which have low impact, have already been reviewed by your team and are super easy to deploy. Takes like 3 minutes :-D [14:14:22] paravoid: just trying to get them to do the "right" thing. by my definition of right [14:14:28] cool [14:14:40] hmm [14:15:04] apache2.log being spammed by File does not exist /usr/local/apache/common/docroot/wikipedia.org/ ... [14:15:05] :( [14:16:24] hashar: Because of the change just made? 
[14:16:29] na unrelated [14:16:47] seems to have occurred on Dec 5-6th [14:17:25] k [14:17:55] zgrep -F -c 'File does not exist' apache2.log-201312*.gz [14:17:59] * hashar whistles and waits [14:19:41] gotta poke sam about it I guess [14:22:54] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [14:30:36] bah Apache ErrorLogFormat is not available in Apache 2.2 :( [14:41:54] ori-l: ldap isn't confusing ;) [14:41:59] not any more than sql [14:42:10] it's just a matter of using it long enough to know it [14:52:29] (03PS11) 10Aude: Enable Wikidata build on beta labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95996 [14:52:41] (03PS12) 10Aude: Enable Wikidata build on beta labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95996 [14:54:18] (03CR) 10Aude: "I tried to make the commit message more clear. the "Wikidata" git repo loads all the dependencies as submodules and with a "Wikidata" ent" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95996 (owner: 10Aude) [15:04:19] !log Manually disabled storage backends main1b, main2b on cp1068 and restarted Varnish (puppet will overwrite) [15:04:32] Logged the message, Master [15:21:55] (03PS1) 10Ottomata: Adding $should_subscribe parameter to varnishkafka module [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/100998 [15:22:12] (03CR) 10Ottomata: [C: 032 V: 032] Adding $should_subscribe parameter to varnishkafka module [operations/puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/100998 (owner: 10Ottomata) [15:22:33] labs is down? 
[15:24:57] (03PS1) 10Ottomata: Not subscribing varnishkafka to its config files [operations/puppet] - 10https://gerrit.wikimedia.org/r/101000 [15:25:19] (03CR) 10Ottomata: [C: 032 V: 032] Not subscribing varnishkafka to its config files [operations/puppet] - 10https://gerrit.wikimedia.org/r/101000 (owner: 10Ottomata) [15:36:10] PROBLEM - Host elastic1007 is DOWN: PING CRITICAL - Packet loss = 100% [15:39:10] RECOVERY - Host elastic1007 is UP: PING OK - Packet loss = 0%, RTA = 0.70 ms [15:45:39] !log Restarted varnish backend on cp1055 with no config changes [15:45:55] Logged the message, Master [15:46:10] !log started import to commons of several videos from user 80686. {{bug|58155}} [15:46:20] apergos: started uploading the files I downloaded this morning. Thank you for the command at https://wikitech.wikimedia.org/wiki/Uploading_large_files [15:46:27] Logged the message, Master [15:47:39] glad it's working out [15:47:40] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [15:47:41] (03CR) 10Bartosz Dziewoński: [C: 031] "I assume you checked that the pref is not set to any other value for any other wiki. I forgot to check that with 'watchdefault' IIRC, caus" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100958 (owner: 10Nemo bis) [15:51:14] (03CR) 10Nemo bis: "Yep, it's not set anywhere." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100958 (owner: 10Nemo bis) [15:55:21] (03PS1) 10Bartosz Dziewoński: (bug 55630) $wgCategoryCollation = 'xx-uca-ckb' for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101005 [15:56:20] (03PS2) 10Bartosz Dziewoński: (bug 55630) $wgCategoryCollation = 'xx-uca-ckb' for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101005 [16:14:36] (03CR) 10Hashar: [C: 031] fix routing of non-wikipedia on beta cluster [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100573 (owner: 10Hashar) [16:28:44] Reedy: you had "Failed to recurse into submodule path 'Elastica'" [16:28:47] did you fix / how? [16:28:53] what was the problem? [16:28:58] same problem on beta [16:29:28] hashar: ^ [16:30:33] ah [16:30:41] aude: the jenkins job updating beta would tell us [16:30:56] that's the problem [16:31:07] "Failed to recurse into submodule path 'Elastica'" [16:31:13] ah it is broken grbmblbl [16:31:24] which i think reedy had with the new branch [16:32:41] I am not sure whether --recursive is passed [16:32:49] ok [16:32:56] 00:01:51.642 Submodule 'Elastica' (https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Elastica.git) registered for path 'Elastica' [16:32:56] 00:01:52.560 fatal: reference is not a tree: e36213362ef459f92e8b09167e537e9312f55ae3 [16:32:57] 00:01:52.561 Unable to checkout 'e36213362ef459f92e8b09167e537e9312f55ae3' in submodule path 'Elastica' [16:32:58] bahh [16:33:53] <^d> e36213362ef459f92e8b09167e537e9312f55ae3 is a sha1 in Elastica submodule. [16:34:01] <^d> Nik had same problem locally (no idea why) [16:34:02] <^d> He recloned. [16:34:44] * hashar looks on deployment-bastion [16:36:28] <^d> hashar: For reference, that sha1 should point at the v0.90.7.0 tag. [16:36:38] <^d> Vice versa, rather. 
[16:36:58] I am afraid git is confused because the submodule Elastica attempts to register a submodule named Elastica :( [16:37:27] <^d> I can't imagine why :) [16:38:06] i'm sure it worked before [16:38:10] <^d> We've had this setup for several months now without incident. We just changed the deployed version this week. [16:39:50] extensions/CirrusSearch/Elastica is like 5 months old [16:41:27] indeed, what average reported [16:41:38] whats wrong with https://gerrit.wikimedia.org/r/#/c/73860/ [16:41:50] The page you requested was not found, or you do not have permission to view this page. [16:42:44] <^d> hashar: That's...wrong :) [16:42:52] <^d> It was updated within the last week [16:43:01] must have been in a repo that has been deleted [16:43:39] <^d> Either that, or it's a draft (in this case it's not) [16:44:39] no clue [16:44:40] honestly :( [16:44:55] (03CR) 10Odder: [C: 04-1] "Still, the icon is far from the current one; Quim's comment from Dec 11 9:29 PM gives helpful hint as to what we're after." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [16:45:06] [pid 7313] open("/data/project/apache/common-local/php-master/extensions/.git/modules/Elastica/modules/Elastica/objects/e3/6213362ef459f92e8b09167e537e9312f55ae3", O_RDONLY|O_NOATIME) = -1 ENOENT (No such file or directory) [16:45:06] :( [16:45:12] ^d: ok, backlog tells they were looking for "analytics/dclass" and it's gone and there are just no logs when it went away [16:45:20] <^d> hashar: I'd say just destroy the repo and re-clone. [16:45:35] yeah hmm [16:45:36] no [16:45:40] <^d> mutante: There are no logs for deletions. And yes, it was deleted...can't remember who asked me though. [16:45:43] takes too long to recline everything [16:45:52] <^d> Noooooo, just reclone that one repo. 
[16:46:06] <^d> I shall fix :) [16:46:17] ^d: ok, thanks, i think that's all that was requested [16:46:19] average: ^ [16:47:11] i broke it :( fatal: Not a git repository: Elastica/../.git/modules/Elastica [16:47:17] <^d> Yeah, I just saw that... [16:47:24] <^d> I'm on the system. [16:47:28] <^d> I'll fix it. [16:47:37] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [16:47:46] I went crazy and deleted the Elastica module in .git using: [16:47:47] rm -fR /data/project/apache/common-local/php-master/extensions/.git/modules/Elastica [16:47:50] did not went well [16:47:54] no clue how to fix it :-((((( [16:48:10] neither of git submodule update --init // init // sync fix it [16:48:49] mutante: thanks [16:48:58] ^d: I probably forgot to delete the Elastica dir [16:49:10] ^d: anyway not touching files anymore [16:50:13] :( [16:50:21] (03CR) 10Odder: [C: 04-1] "There's too much difference between the 16px and 32px version." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100949 (owner: 10Tholam) [16:50:23] anyway it seems not to interfere with other code updating on beta [16:50:29] apparently [16:51:17] I should get rid of mediawiki/extensions and submodules [16:51:29] and write some bot that listen for Gerrit merge event, then clone / pull accordingly [16:51:50] <^d> Dealing with hundreds of submodules annoys me :p [16:51:55] <^d> Submodules weren't meant for that :p [16:52:08] it not being parallel annoys me even more [16:53:26] <^d> Well, hence my point about them not being designed for it. [16:53:32] (03CR) 10Hashar: [C: 04-1] "Dr0ptp4kt what is that mobile friendliness you are talking about ? 
I am not sure it is a good idea to have google bot crawl bits :/" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95548 (owner: 10Dr0ptp4kt) [16:53:39] <^d> I think the guys who wrote submodules were expecting you to have like...2 or 3 [16:53:47] or none :-D [16:53:52] <^d> Yup [16:54:04] writing a python script that listen for stream-events would be trivial [16:54:08] might hack that up [16:54:15] could be used for auto git-Deploy as well :D [16:54:31] heh [16:54:42] cause hmm [16:54:56] the cabal eventually want to have us auto deploy by +2 ing in Gerrit [16:54:59] <^d> Submodule fixed. [16:55:08] how ??!!? [16:55:55] <^d> Remove entry from .git/config, remove old repo from the disk, then do submodule update init, update, etc. [16:56:49] ah I keep forgetting about .git/config [16:58:20] aude Elastica fixed ^^^ [16:58:25] aude: thanks for the notification! [16:58:44] yay! [17:02:08] superm401: if no one answered you: no, you don't. Beta always be deploying ;) [17:07:19] greg-g: we're preparing some backports for wikibase [17:07:27] to improve js performance [17:07:45] if we can get these in before deploying to wikipedias, that would be awesome [17:08:31] aude: yeah, prepare a backport to wmf6 and have Reedy push it out during today's deploy (in less than 2 hours) [17:08:38] just chatting with lydia/daniel about it :) [17:08:39] ok [17:08:50] yeah [17:08:57] * aude is at home so can't be on hangout [17:08:59] :) [17:09:52] before january we'll try to change more how things work to use parser cache more and do less in js (where not needed) [17:10:02] or soonas possible [17:18:02] aude: yeah, hitting the wbterms table more, right? 
daniel will ping s-pringle about that to get his thoughts [17:18:50] i dont' know that it's more but yes we'd like springle's thoughts [17:19:02] daniel has some patches for wbterms that involve schema change [17:19:19] ideally elastic might be able to handle some of this :) [17:19:45] if it is really more "performant" (probably) [17:20:09] aha [17:20:15] * greg-g nods [17:26:58] (03PS1) 10Ottomata: Restructuring code from bin/logster into logster.logster module [operations/debs/logster] - 10https://gerrit.wikimedia.org/r/101021 [17:27:38] ottomata: are you pushing these upstream? [17:27:58] I feel a bit reluctant having all this upstream work under ops/debs/logster [17:28:07] greg-g: Reedy: https://gerrit.wikimedia.org/r/#/c/101020/ and https://gerrit.wikimedia.org/r/#/c/100701/ [17:34:07] All of the pings [17:35:00] Numerous pings in numerous channels... [17:35:07] If people need/want me to do something... [17:35:29] ping ping ping [17:35:33] Reedy: let's make a club! [17:35:49] #wikimedia-ping [17:36:27] ing ping [17:36:30] pint* [17:36:31] ping* [17:36:33] :) [17:36:49] http://tty.gr/s/irssi.png [17:36:55] it's old but still appropriate [17:44:46] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: testwiki back to 1.23wmf6 till window [17:45:03] Logged the message, Master [17:50:43] (03PS1) 10Reedy: Allow 'crats on test2wiki to give oversight [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101030 [17:52:08] Reedy, that wiki has non-WMF crats -> privacy policy [17:52:19] (03PS2) 10Reedy: Allow 'crats on test2wiki to give oversight [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101030 [17:52:37] Yeah [17:52:40] The reason I just didn't deploy it.. 
[18:07:37] of course apache doesn't support reverse-proxying to unix domain sockets [18:07:48] !@$!@# [18:11:22] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [18:41:39] (03CR) 10Aaron Schulz: [C: 031] l10nupdate-1: Log start and end times of rsync [operations/puppet] - 10https://gerrit.wikimedia.org/r/100913 (owner: 10Anomie) [19:13:58] I have nfi what I need to do [19:14:09] How many things need updating and syncing... [19:14:26] Reedy: our stuff [19:14:35] * aude grabs links [19:14:44] https://gerrit.wikimedia.org/r/#/c/100701/ [19:14:47] There's pings all over the place [19:14:51] :) [19:14:54] It's really unsustainable [19:15:05] https://gerrit.wikimedia.org/r/#/c/101020/ [19:15:09] you're popular! [19:17:09] Can someone please rm -rf /a/common/php-1.22wmf17 on tim please? [19:17:12] *tin [19:17:42] poor tim [19:19:06] bblack, around? [19:24:57] !log reedy synchronized php-1.23wmf6/extensions/ [19:25:14] Logged the message, Master [19:26:16] !log reedy synchronized php-1.23wmf7/extensions/ [19:26:34] Logged the message, Master [19:27:06] !log reedy updated /a/common to {{Gerrit|Ib597431a3}}: Set $wgULSNoWebfontsSelectors [19:27:23] Logged the message, Master [19:27:41] lies all lies [19:27:58] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.23wmf6 [19:28:20] heh [19:28:43] w00t our parser function caches correctly :) [19:28:49] :) [19:29:36] Logged the message, Master [19:29:53] PROBLEM - Host elastic1007 is DOWN: PING CRITICAL - Packet loss = 100% [19:29:59] now to see if our js is better [19:30:42] (03PS1) 10Reedy: Wikipedias to 1.23wmf6 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101045 [19:30:43] (03PS1) 10Reedy: phase1 wikis to 1.23wmf7 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101046 [19:31:33] (03CR) 10Reedy: [C: 032] Wikipedias to 1.23wmf6 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101045 (owner: 
10Reedy) [19:31:33] how do we refresh the timestamps on our js? [19:31:37] 'touch' stuff? [19:31:39] touch all the things [19:31:42] * greg-g nods [19:31:42] or wait ~15 minutes IIRC [19:31:45] ok, do it! :) [19:31:53] for wmf6? [19:32:23] RECOVERY - Host elastic1007 is UP: PING OK - Packet loss = 0%, RTA = 1.15 ms [19:32:24] * aude still has timestamp from tuesday [19:32:30] (03Merged) 10jenkins-bot: Wikipedias to 1.23wmf6 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101045 (owner: 10Reedy) [19:33:03] (03PS1) 10Ori.livneh: Add apache::mod::uwsgi [operations/puppet] - 10https://gerrit.wikimedia.org/r/101048 [19:33:04] (03PS1) 10Ori.livneh: Configure Gdash and Graphite to use mod_uwsgi [operations/puppet] - 10https://gerrit.wikimedia.org/r/101049 [19:33:05] (03PS1) 10Ori.livneh: Parametrize listen_socket in Gdash and Graphite modules [operations/puppet] - 10https://gerrit.wikimedia.org/r/101050 [19:33:12] you shouldn't have to touch javascript any longer [19:33:24] ? [19:34:03] * Reedy touches ori-l [19:34:07] ori-l: gdash is broken [19:34:15] paravoid: i know, i noted in the SAL [19:34:23] oh sorry [19:34:25] i was working on it until 4:30 AM last night and ran out of steam [19:34:43] (03CR) 10Reedy: [C: 032] phase1 wikis to 1.23wmf7 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101046 (owner: 10Reedy) [19:34:44] oh really [19:34:50] (03CR) 10Ori.livneh: [C: 032] Add apache::mod::uwsgi [operations/puppet] - 10https://gerrit.wikimedia.org/r/101048 (owner: 10Ori.livneh) [19:34:50] you run out of steam!?! 
[19:34:53] (03Merged) 10jenkins-bot: phase1 wikis to 1.23wmf7 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/101046 (owner: 10Reedy) [19:34:55] I thought this wasn't possible [19:34:59] heh [19:35:02] :P [19:35:39] (03CR) 10Ori.livneh: [C: 032] Configure Gdash and Graphite to use mod_uwsgi [operations/puppet] - 10https://gerrit.wikimedia.org/r/101049 (owner: 10Ori.livneh) [19:35:47] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: phase1 wikis to 1.23wmf7 [19:35:58] looks like i am getting new js [19:36:03] Logged the message, Master [19:36:14] (03CR) 10Ori.livneh: [C: 04-1] "Not needed, since mod_uwsgi *can* proxy to a UNIX domain socket. But keeping this patch on hand just in case." [operations/puppet] - 10https://gerrit.wikimedia.org/r/101050 (owner: 10Ori.livneh) [19:36:43] (03PS3) 10Dzahn: include the bugzilla config in puppet [operations/puppet] - 10https://gerrit.wikimedia.org/r/100752 [19:36:50] * Reedy waits for APC errors to go away [19:37:48] WMF error-handling :) [19:39:36] (03CR) 10Dzahn: "thanks for the review Alex, this last set just changes that we use "bugzilla_testing" db first. from the past there are 3 db's: bugzilla3," [operations/puppet] - 10https://gerrit.wikimedia.org/r/100752 (owner: 10Dzahn) [19:40:07] 2013-12-12 19:39:57 mw1002 frwiki: /usr/local/apache/uncommon/1.23wmf6/bin/texvc does not exist or is not executable. 
[19:40:30] damn it [19:40:55] But it's broken anyway [19:40:57] reedy@tin:/a/common$ ~/scap-recompile [19:40:57] MediaWiki 1.23wmf6: Compiling texvc...rsync: change_dir "/srv/deployment/mediawiki/common/1.23wmf6/extensions/Math/math" failed: No such file or directory (2) [19:40:57] rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1070) [sender=3.0.9] [19:40:57] failed [19:41:50] I lie [19:41:58] fixing [19:43:47] (03CR) 10Dzahn: [C: 032] include the bugzilla config in puppet [operations/puppet] - 10https://gerrit.wikimedia.org/r/100752 (owner: 10Dzahn) [19:43:56] Reedy: when you're done fixing that texvc thing, share what broke and how you fixed it :) [19:45:17] Just forgetting to run the script to make it [19:45:19] dsh -F25 -cM -g mediawiki-installation -o -oSetupTimeout=10 'sudo -u mwdeploy /usr/bin/scap-recompile' [19:45:30] I did start changing it so we don't need to do that [19:45:32] but never finished [19:46:05] Reedy: :) put it back on the list for your sanity [19:46:07] https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment_v2 [19:46:23] Other distractions in between make things be forgotten [19:46:38] * greg-g nods [19:47:15] PROBLEM - NTP on elastic1007 is CRITICAL: NTP CRITICAL: Offset unknown [19:47:30] yurik: yes [19:48:29] (03PS2) 10Dzahn: add package libdatetime-perl for bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/100947 [19:49:16] (03CR) 10MZMcBride: Parametrize listen_socket in Gdash and Graphite modules (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/101050 (owner: 10Ori.livneh) [19:49:25] bblack, mark & paravoid were discussing ESI stuff, I was wondering if you poked at it (talk to them about it) [19:49:42] sort of, yes [19:49:42] Gloria: good call [19:49:46] ooo! [19:49:59] aaand? [19:50:49] (03CR) 10MZMcBride: "Files should contain trailing newlines. I think you may need to tweak your editor?" 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/95548 (owner: 10Dr0ptp4kt) [19:51:15] RECOVERY - NTP on elastic1007 is OK: NTP OK: Offset 0.0004472732544 secs [19:53:29] (03CR) 10Dzahn: "yea, how about this: rename the whole module to "svnserver" or something to clarify and move server.pp content to init.pp as class svnserv" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100760 (owner: 10Matanya) [19:55:22] (03PS1) 10Ori.livneh: Correct uWSGI parameters for Gdash & add authnz_ldap for Graphite [operations/puppet] - 10https://gerrit.wikimedia.org/r/101053 [19:58:54] (03PS2) 10Ori.livneh: Correct uWSGI parameters for Gdash & add authnz_ldap for Graphite [operations/puppet] - 10https://gerrit.wikimedia.org/r/101053 [19:58:55] i hate the apache module so much [19:59:11] that makes two of us [19:59:17] (or three, or four, ...) [19:59:37] * yurik has switched to netscape server [19:59:38] (03CR) 10Ori.livneh: [C: 032 V: 032] Correct uWSGI parameters for Gdash & add authnz_ldap for Graphite [operations/puppet] - 10https://gerrit.wikimedia.org/r/101053 (owner: 10Ori.livneh) [20:00:04] ottomata: did you add https://www.varnish-cache.org/utility/varnishkafka-varnishlog-apache-kafka-integration ? [20:00:44] NO [20:00:45] waaa [20:00:47] haha [20:00:49] whoputthat thar [20:01:02] Snaps probably :) [20:01:13] heh [20:02:47] how's varnishkafka going? [20:03:29] ottomata: btw, varnishkafka instances should be defines, not classes; right now you can't configure more then one per varnish [20:03:57] paravoid, i'm going down a rabbit hole to fix these ganglia graphs :/ [20:04:04] almost there though [20:04:17] making logster into a ganglia python module, rather than using gmetric cli [20:04:31] ori-l, had thought about that a bit, but wasn't sure [20:04:37] do you have a need of running more than one instance? 
[20:04:55] (i'm also in a meeting right now, so replying with high latency) [20:05:15] (03PS1) 10Ori.livneh: Correct link target of /etc/apache2/sites-available/gdash [operations/puppet] - 10https://gerrit.wikimedia.org/r/101056 [20:05:34] ottomata: yeah, i'd prefer a separate stream for stats beacons with a different format [20:05:56] (03CR) 10Ori.livneh: [C: 032 V: 032] Correct link target of /etc/apache2/sites-available/gdash [operations/puppet] - 10https://gerrit.wikimedia.org/r/101056 (owner: 10Ori.livneh) [20:06:00] ok cool [20:06:25] ori-l we can change that no prob…i guess we'll have to remove the default init script that gets installed though, and install our own instance-specific one via puppet [20:06:32] paravoid: any objections to that? [20:06:43] of what, sorry? [20:07:00] ottomata: i have a really good pattern for multi-instance upstart jobs, too [20:07:47] yeahhhhh, i'd like that better but i think paravoid wants init scripts [20:07:58] paravoid, ori-l wants me to make varnishkafka into a define rather than a class [20:08:05] so that he can run multiple instances on a single node [20:08:18] multiple instances? what for? [20:08:59] i wanted a separate stream for perf stats beacon [20:09:14] different format, fewer fields, different topic [20:09:23] could you share a few more details about that? [20:09:57] i told you about it; it's this, basically: https://gerrit.wikimedia.org/r/#/c/89359/ [20:10:15] sorry, too many things at the same time, I'm getting forgetful :/ [20:10:23] np at all [20:10:41] i abandoned that patch because it was using varnishncsa, and mark asked me to hold off for varnishkafka [20:10:47] ah, right [20:10:49] I remember [20:10:52] Hey all, could use a second opinion: ottomata and I were just plotting in -analytics to stick node.js on stat1, anyone care to object? [20:10:58] Or affirm? 
[20:11:06] stat1 is a mess anyway [20:11:12] Ah, it's lunch time, I have to go, but will read responses, and then do when I get back [20:11:20] heh [20:11:22] but one topic at a time :) [20:11:24] paravoid: Fantastic [20:11:44] (03PS1) 10Dan-nl: gwtoolset-config [operations/puppet] - 10https://gerrit.wikimedia.org/r/101058 [20:12:15] ori-l: so, why can't we do the filtering on the consumer side? [20:12:39] I guess it's distributing the problem vs. centralizing it [20:12:58] but on the other hand it's also putting more strain on boxes that serve a more important function than logging (serving the site) [20:13:00] some time, an audit of stat1 to check which users are actually still using it and on it would be good. i just remember people being added but seldom removed. the number is like 77 [20:13:13] is that really needed i wonder [20:13:15] we've had occurences where varnishncsa instances were using more CPU than varnishd itself [20:13:15] mutante: we could do that when we get the replacement eqiad machine [20:13:44] ottomata: sounds good, yea [20:13:50] looks like elastic1007 rebooted 41 minutes ago [20:13:57] I say, to no one in particular [20:14:22] heheh [20:14:24] marktraceur: it's been flapping [20:14:30] you mean manybubbles? [20:14:33] er [20:14:34] sorry [20:14:41] huh, I'm not surprised with it rebooting [20:14:43] manybubbles: it's been flapping, twice as far as I can see [20:14:49] (03PS1) 10Ori.livneh: profiler-to-carbon: prefix profiling stats with 'MediaWiki' [operations/software/mwprof] - 10https://gerrit.wikimedia.org/r/101060 [20:15:15] ori-l: what do you think? 
[20:15:21] (CR) Ori.livneh: [C: 2 V: 2] profiler-to-carbon: prefix profiling stats with 'MediaWiki' [operations/software/mwprof] - https://gerrit.wikimedia.org/r/101060 (owner: Ori.livneh)
[20:15:52] marktraceur: but "sticking" doesn't involve manual action, right
[20:15:59] paravoid: i fielded the same question a year and a half ago, when i set up the eventlogging stream
[20:16:10] that is, we're about to have full page stats anyways, why don't you just grep what you need out of that
[20:16:35] my attitude now is the same as it was then, which is to be conservative, and not tie a small discrete problem to a much bigger problem
[20:16:56] full page view stats, I mean
[20:17:57] well
[20:18:13] it was essentially tied by waiting for varnishkafka
[20:18:21] paravoid: huh. elasticsearch is recovering. it looks like it is taking about 3 hours to recover all the shards. which seems like a long time
[20:18:44] is it possible to have multiple consumers for a single topic?
[20:18:57] hah, ori-l is talking about a much longer thing than varnishkafka there, more like 'kraken' in general :p
[20:19:00] that's my understanding, but I'm far from an expert
[20:19:02] ori-l yes
[20:19:03] totally
[20:19:08] you can have as many consumers as you want
[20:19:21] well, ok, maybe that's doable then
[20:19:25] each is identified by a consumer-group
[20:19:36] that is used to store the consumer's high water mark offset in zookeeper
[20:19:46] (PS1) Dan-nl: gwtoolset-config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101061
[20:19:47] (at least, with the scala/java consumer)
[20:19:48] oh, nice
[20:20:02] not sure what the python module does,
[20:20:04] the question is if the consumer is going to be able to keep up with processing 100k messages per second :)
[20:20:09] ...in python
[20:20:16] yes, it will
[20:20:21] heh, well, you can do as many consumer threads as there are partitions in a topic
[20:20:27] jdlrobson pushed it to 150k before
[20:20:27] we've got 10 per topic right now
[20:20:33] though not deliberately :)
[20:20:42] also, ori-l, this isn't settled in stone, but I was planning on doing multiple webrequest topics
[20:20:45] one for each varnish role
[20:20:50] mobile, upload, bits, text
[20:20:59] yes, this is a good idea
[20:21:14] how would this affect me?
[20:21:19] i'd only be interested in the bits one
[20:21:19] it's good for many reasons, but it does make consuming the full firehose a little more cumbersome
[20:21:23] ah, then not at all!
[20:21:23] :)
[20:21:28] you don't have to process everything, just bits
[20:21:34] ah, right
[20:21:48] btw, back in september, we talked about a separate eventlogging lb
[20:21:50] marktraceur: remember it's going to die with tampa, might make more sense to propose for the new server
[20:22:01] (changing the subject momentarily)
[20:22:09] (forgive me if it's too chaotic :))
[20:22:14] (PS2) Dan-nl: gwtoolset-config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101061
[20:22:20] (CR) jenkins-bot: [V: -1] gwtoolset-config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101061 (owner: Dan-nl)
[20:22:44] (PS3) Dan-nl: gwtoolset-config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101061
[20:23:19] paravoid: that would be nice, but the use-case for that is very specific -- namely to get detailed geocoded latency data
[20:23:44] ok
[20:23:46] is it still the plan? i wasn't sure if the interactive map app thing you posted a while back obviated it
[20:23:50] greg-g: do you know of any open concerns re: deploying gwtoolset to production?
[20:23:58] i would still love to do it, fwiw
[20:24:33] I don't foresee playing with anything nice like that for 6+ months, but if it's going to be useful to you in other ways, we should do it, yes
[20:24:56] but what i'm taking away from this conversation is that there's a bit of a ways to go to set up varnishkafka for web requests just right, and in the meantime i don't think i'm helping things by piling on an additional requirement
[20:25:11] so i'll just sit on my hands for now, i think -- i don't have a burning need for this RIGHT AWAY
[20:25:22] i'm just excited about the vk deployment, it looks like a very nice tool
[20:28:33] PROBLEM - Graphite Carbon on tungsten is CRITICAL: CRITICAL: Not all configured Carbon instances are running.
[20:29:24] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[20:30:33] RECOVERY - Graphite Carbon on tungsten is OK: OK: All defined Carbon jobs are runnning.
[20:31:29] (PS1) Nemo bis: Make logscale in reqerror graphs actually work [operations/puppet] - https://gerrit.wikimedia.org/r/101065
[20:34:57] paravoid, did you see my comments about metrics in ganglia with positive slope?
[20:35:17] how it is broken if they get updated less often than gmetad's polling interval (15 seconds)
[20:37:00] ottomata: I've seen that
[20:37:11] yeah pretty stupid
[20:37:34] don't use positive slope if you can avoid it
[20:37:43] use the gauge type and calculate deltas in your metric module
[20:37:50] pff,
[20:37:51] and report the absolute value to ganglia
[20:37:59] really you think that is better?
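A minimal sketch of the gauge-plus-deltas approach suggested above (keep state in the metric module, compute the delta yourself, and report the computed value as a plain gauge). Everything here is illustrative: `CounterRate` and its `update` method are hypothetical names, not part of Ganglia or logster, and the wiring into an actual gmond Python module is deliberately left out.

```python
import time


class CounterRate:
    """Turn a monotonically increasing counter into a per-second rate.

    Hypothetical helper: it holds the state (last value, last timestamp)
    that a metric module would need in order to report a gauge instead of
    relying on a positive-slope metric.
    """

    def __init__(self):
        self.last_value = None
        self.last_time = None

    def update(self, value, now=None):
        """Record a new counter reading and return the rate since the last one."""
        now = time.time() if now is None else now
        rate = 0.0
        if self.last_value is not None:
            elapsed = now - self.last_time
            delta = value - self.last_value
            # A negative delta means the counter was reset (e.g. a restart);
            # report 0 for that interval rather than a bogus spike.
            if elapsed > 0 and delta >= 0:
                rate = delta / elapsed
        self.last_value, self.last_time = value, now
        return rate


tracker = CounterRate()
print(tracker.update(1000, now=0.0))   # first sample, nothing to compare against
print(tracker.update(1600, now=60.0))  # 600 events over 60 seconds
print(tracker.update(50, now=75.0))    # counter went backwards: treated as a reset
```

Because the rate is computed inside the module, a late or missed poll should only lengthen the interval the delta is averaged over, instead of showing up as a large jump in the graph.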
[20:38:04] yes
[20:38:16] could just make sure your module runs at least every 15 seconds :/
[20:38:41] that almost means you have to keep state in your ganglia module
[20:38:45] sure, miss one and you get a gigantic spike that makes your graph unusable
[20:38:46] which makes them more complicated
[20:38:54] yes, it's a raw deal
[20:39:05] yeah ergh
[20:39:17] i just did a bunch of work yesterday and today making a generic ganglia module for logster
[20:39:59] i feel like i may have just gone down a rabbit hole of wasted time, not sure though
[20:40:09] this is a cool abstraction, but mehhhhhh
[20:40:29] dan-nl-afk: not that I know of, so I think we're on track for Tuesday :)
[20:40:54] mutante: i don't think we are going to just scrap stat1 as it is, but just copy everything over to a replacement server in eqiad whenever it is available
[20:41:03] we can audit accounts then for sure
[20:42:30] (CR) Reedy: [C: -1] "Should be removed from extension-list aswell in this commit!" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98074 (owner: Nemo bis)
[20:45:39] greg-g: cool. if you happen to see csteipp please ask if he can look at https://gerrit.wikimedia.org/r/#/c/101008/. bd808 wants his review on that before merging
[20:46:04] oh, thought that was merged, so, my comment is pending that merge :)
[20:46:57] :)
[20:48:13] greg-g is csteipp nearby? i don't see him in irc …
[20:48:54] dan-nl-afk: I'm not in the office today, but, he was on this morning, I'll look for him later, he might be lunching at home
[20:49:05] thanks
[20:49:32] (PS1) Chad: Clean up old CodeReview settings: [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101068
[20:56:17] (CR) Reedy: Clean up old CodeReview settings: (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101068 (owner: Chad)
[20:57:05] (CR) Chad: Clean up old CodeReview settings: (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101068 (owner: Chad)
[20:57:26] * MatmaRex sheds a single tear for SVN
[20:57:27] (PS2) Chad: Clean up old CodeReview settings: [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101068
[20:58:21] * bawolff was also feeling kind of sad about that
[21:05:41] (CR) Dzahn: [C: 2] "Description-en: module for manipulating dates, times and timestamps" [operations/puppet] - https://gerrit.wikimedia.org/r/100947 (owner: Dzahn)
[21:19:17] Damn it guys, stop pinging me
[21:19:53] marktraceur: what?
[21:21:44] greg-g: A few pings for me that were meant for manybubbles|away, AFAICT
[21:26:38] ori-l: and i'll push it to 150k again given the chance! muahahahha! ;-)
[21:28:46] (PS1) Aaron Schulz: EasyTimeline support for private wikis via img_auth [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101105
[21:30:28] (PS1) Aaron Schulz: Cross-wiki backlink purging for commons file changes [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101106
[21:30:56] !log reedy synchronized php-1.23wmf6/extensions/Wikibase 'Fix scribunto integration'
[21:31:12] Logged the message, Master
[21:31:36] !log reedy synchronized php-1.23wmf7/extensions/Wikibase 'Fix scribunto integration'
[21:31:53] Logged the message, Master
[21:34:11] mutante: where should a cron like this be defined for mail relays? https://bugzilla.wikimedia.org/show_bug.cgi?id=57890
[21:34:42] I've been reading mchenry's entry at site.pp but I don't understand what's the relevant role if any
[21:35:04] marktraceur: Maybe I should change my name
[21:35:26] manybubbles: Now that's just unreasonable
[21:35:34] We can just publicly shame people who get it wrong
[21:36:00] (CR) Mattflaschen: [C: 2] "Greg confirmed this can be deployed anytime." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100573 (owner: Hashar)
[21:37:51] Nemo_bis: in puppet, somewhere in mail.pp i suppose
[21:38:10] Nemo_bis: best to ask mark
[21:39:01] (Merged) jenkins-bot: fix routing of non-wikipedia on beta cluster [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/100573 (owner: Hashar)
[21:39:13] superm401: !!!
[21:39:22] anyone for this ? /usr/bin/perl: symbol lookup error http://paste.debian.net/70595/
[21:39:35] hashar, ...?
[21:40:18] (PS1) Ori.livneh: Update role::statsd's graphiteHost to tungsten [operations/puppet] - https://gerrit.wikimedia.org/r/101110
[21:40:52] (CR) Ori.livneh: [C: 2 V: 2] Update role::statsd's graphiteHost to tungsten [operations/puppet] - https://gerrit.wikimedia.org/r/101110 (owner: Ori.livneh)
[21:41:08] hashar, did you want someone else to review it after all?
[21:41:18] superm401: na it is fine :-]
[21:41:37] Are you sure?
[21:41:46] nop
[21:41:49] need to test it out
[21:42:00] hehe http://en.wikisource.beta.wmflabs.org/wiki/Main_Page gives me blank pages now
[21:42:17] hashar, I haven't deployed it yet.
[21:42:28] Does Beta pull from that repo automatically?
[21:42:44] yup it should
[21:43:26] It looks like it worked.
[21:43:26] superm401: the Gerrit change received a post merge comment:
[21:43:27] https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update/1664/console : Change has been deployed on the beta cluster in 14s
[21:44:36] http://en.wikisource.beta.wmflabs.org/wiki/Main_Page?foobar yeah works
[21:44:55] http://en.wikisource.beta.wmflabs.org/wiki/Main_Page works for me too, even without a cache-buster.
[21:45:20] it shows me the wikipedia content :/
[21:45:41] * hashar attempts a purge
[21:46:35] (PS1) MarkTraceur: Add nodejs to stat1 [operations/puppet] - https://gerrit.wikimedia.org/r/101111
[21:46:46] ottomata, ^^
[21:47:14] Oh, hm, I only now saw the notes in -analytics telling me to do exactly what I did
[21:47:27] I'm a psychic clearly
[21:49:39] (PS1) Ori.livneh: Annotate logging calls with exception info [operations/software/mwprof] - https://gerrit.wikimedia.org/r/101113
[21:49:39] superm401: thank you very much. People working on Proofread extension will now be able to test it on beta :-]
[21:49:59] (CR) Ori.livneh: [C: 2 V: 2] Annotate logging calls with exception info [operations/software/mwprof] - https://gerrit.wikimedia.org/r/101113 (owner: Ori.livneh)
[21:51:04] (CR) MaxSem: "Can be deployed now, required code went live in wmf6." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/94179 (owner: MaxSem)
[21:51:54] hashar, am I supposed to update it on the production cluster too, so it isn't sitting in git but undeployed there?
[21:52:08] superm401: yup we should do that
[21:52:18] Okay, will do.
[21:54:11] Alright, that's the only change that's not pulled to mediawiki-config on tin.
[21:54:15] So I'll go ahead and sync it.
[21:55:10] yeah we are apparently very good at deploying changes as soon as they are merged
[21:55:11] and rebasing locally
[21:55:18] s/locally/on tin/
[21:55:44] Right
[21:58:04] (PS1) Nemo bis: Enable collect_exim_stats_via_gmetric cron for mail relay [operations/puppet] - https://gerrit.wikimedia.org/r/101117
[21:59:02] !log mflaschen synchronized multiversion/MWMultiVersion.php 'Sync routing fix that only impacts Beta'
[21:59:18] Logged the message, Master
[22:01:02] (PS2) Nemo bis: Remove ancient ArticleFeedbackTool v4 cruft [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98074
[22:01:17] bd808: https://gerrit.wikimedia.org/r/#/c/101118/1/includes/job/JobQueueGroup.php dur dur herp herp
[22:03:30] Aaron|home: I suppose you found that in the most annoying way possible
[22:04:12] wow jenkins is slow
[22:04:54] (CR) Nemo bis: "Right, it couldn't had already been done if it was still here, sorry; https://gerrit.wikimedia.org/r/#/c/101122/" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/98074 (owner: Nemo bis)
[22:07:02] and then it takes an age to submit
[22:08:46] too bad https://gdash.wikimedia.org/dashboards hasn't worked for weeks
[22:09:13] https://gdash.wikimedia.org/ does though
[22:10:15] !log aaron synchronized php-1.23wmf6/includes/job/JobQueueGroup.php 'de525efb4a149680f8bbf7f0d5db18489f59e4e8'
[22:10:31] Logged the message, Master
[22:11:29] bd808: looks like de-duplication stopped the queue from filling, heh
[22:11:43] https://gdash.wikimedia.org/dashboards/jobq/
[22:12:33] heh, 86% duplicate rate at that peak 113k/min
[22:13:06] * bd808 really hates graphite's default graph color scheme
[22:14:54] change it! :-D
[22:15:18] bblack, btw, if the ESI bugfix works, we won't have to even deal with an extra vmod! :)
[22:16:11] hashar: I'll put that on the list of "stuff to do one of these days", but whining in public makes me feel slightly better even without fixing it
[22:16:23] yeah I do that too
[22:16:25] then file a bug :-D
[22:16:33] and context switch back
[22:16:51] after x pick(seconds, days, weeks, months) the bug is solved magically
[22:17:09] the problem with the color scheme is the bikeshedding :/
[22:17:26] Agreed
[22:17:53] one sure thing, I am super happy Ori has been allocated some time to work on gdash / graphite
[22:18:07] i gave a presentation of how we use it, and people were very excited
[22:18:27] a couple of friends set it up for their infra and are now wondering how they managed to live without it
[22:19:08] anyway bed time
[22:19:42] hashar: goodnight
[22:39:19] RoanKattouw: hi, we spoke yesterday about a fix for a breaking change for the Education Extension... If something merged this morning, that means it definitely got in with this cycle? Specifically this is the one that's urgent: https://gerrit.wikimedia.org/r/#/c/100956/
[22:46:19] AndyRussG: Let me get my laptop back to my desk and then I'll check if it made it in. If not, I have a deployment window in 15 minutes that I can use to sneak it in
[22:46:44] RoanKattouw: K, thanks a ton
[22:46:47] AndyRussG: Also, which wikis do you care about this going live to and when? Does it fix a regression from less than a week ago, or from longer ago?
[22:49:30] RoanKattouw: 'tis not in wmf6 or 7 (what's out there now)
[22:50:22] RoanKattouw: It's important for any wikis that have the Education Program extension installed, and have a version of core with commit 57d3f41876599497fcb9c672b88ebad7239353dc
[22:50:47] AndyRussG: which are which ones?
[22:50:55] do you know off hand?
[22:51:15] No, I don't. enwiki is one of them. Let me ask Sage
[22:51:36] k
[22:52:21] (CR) Dzahn: "on a related note: Andre, when doing labs testing please check if ./bugzilla/lib/ has any content. from tarball it should just have a READ" [operations/puppet] - https://gerrit.wikimedia.org/r/100947 (owner: Dzahn)
[23:18:58] (PS4) BryanDavis: Production configuration for GWToolset [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101061 (owner: Dan-nl)
[23:29:44] (PS2) BryanDavis: Add runner for GWToolset jobs [operations/puppet] - https://gerrit.wikimedia.org/r/101058 (owner: Dan-nl)
[23:29:56] !log catrope updated /a/common/php-1.23wmf7 to {{Gerrit|I492fe5762}}: Update VisualEditor to wmf7 branch for cherry-picks
[23:30:12] Logged the message, Master
[23:30:18] (CR) BryanDavis: "Tried to make the commit summary a little more descriptive." [operations/puppet] - https://gerrit.wikimedia.org/r/101058 (owner: Dan-nl)
[23:30:21] greg-g: here is the list of wikis with the Education Program extension: http://wikiapiary.com/wiki/Extension:Education_Program . Also, I'm told that it's working fine currently on test2, so that indicates that the fixing patch from this morning made it in with the breaking one
[23:30:46] (CR) BryanDavis: "Tried to make the commit summary a little more descriptive." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/101061 (owner: Dan-nl)
[23:30:49] !log catrope synchronized php-1.23wmf6/extensions/VisualEditor 'Update VE for cherry-pick'
[23:31:06] Logged the message, Master
[23:31:07] AndyRussG: ah, so only wikipedias.
[23:31:13] !log catrope synchronized php-1.23wmf7/extensions/VisualEditor 'Update VE for cherry-pick'
[23:31:13] greg-g: No
[23:31:28] wikiversity too
[23:31:29] Logged the message, Master
[23:31:34] and news, ugh
[23:31:43] AndyRussG: Ctrl + F in https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php and type EducationProgram
[23:31:51] yeah, that's canonical
[23:32:05] greg-g: English Wikinews and German wikiversity
[23:33:22] AndyRussG: btw, this should explain which wikis get which version when (better tool forthcoming): https://wikitech.wikimedia.org/wiki/Deployments/One_week#Generalized_Deploy_calendar
[23:33:40] with https://wikitech.wikimedia.org/wiki/Deployments#Near_Term for the specifics
[23:35:13] AndyRussG: So it looks like that commit in core you were talking about is only in wmf7
[23:35:36] Ah OK
[23:35:53] I'm verifying this in Gerrit now
[23:36:05] It's https://gerrit.wikimedia.org/r/#/c/92004/ right? Cause that's not in wmf6, only in wmf7
[23:36:12] AndyRussG: And what's the change you need backported again?
[23:36:52] !log catrope started scap: Scap for VisualEditor i18n problems
[23:37:11] Logged the message, Master
[23:37:27] RoanKattouw: Yes, that's the one that broke things. This is the fix: https://gerrit.wikimedia.org/r/#/c/100956/
[23:38:20] AndyRussG: And where is the followup where you fix the getSkin() thing?
[23:38:42] https://gerrit.wikimedia.org/r/#/c/101018/
[23:38:48] Thanks
[23:38:59] You bet...! Thank you also
[23:40:53] Only the first one (100956) is really urgent. What breaks is the addition of a new method in IContextSource that we had to implement
[23:41:39] Well surely that method working correctly is also not a small deal
[23:42:11] AndyRussG: Anyway I've got it all lined up (for both of them) now, I just need to wait until this other deploy process which I started earlier finishes
[23:43:06] RoanKattouw: thanks a ton! Let me know if you need anything
[23:43:49] Nope, I'm all set. I'll ping you once it's deployed, it'll just take a while because the thing I'm waiting for is really slow
[23:55:39] !log catrope finished scap: Scap for VisualEditor i18n problems
[23:55:56] Logged the message, Master