[00:15:43] (PS1) Manybubbles: Cirrus config updates [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107108
[02:11:41] !log LocalisationUpdate completed (1.23wmf9) at Mon Jan 13 02:11:41 UTC 2014
[02:11:49] Logged the message, Master
[02:21:30] !log LocalisationUpdate completed (1.23wmf10) at Mon Jan 13 02:21:29 UTC 2014
[02:21:35] Logged the message, Master
[02:39:15] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Jan 13 02:39:15 UTC 2014
[02:39:22] Logged the message, Master
[03:04:41] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Mon 13 Jan 2014 03:00:04 AM UTC
[03:30:41] RECOVERY - Puppet freshness on mw32 is OK: puppet ran at Mon Jan 13 03:30:34 UTC 2014
[03:32:41] PROBLEM - Puppet freshness on mw32 is CRITICAL: Last successful Puppet run was Mon 13 Jan 2014 03:30:34 AM UTC
[04:00:29] RECOVERY - Puppet freshness on mw32 is OK: puppet ran at Mon Jan 13 04:00:22 UTC 2014
[05:18:09] <^d> !log enwiki reporting lsearchd hasn't updated in days. Cursory investigation says this is right. Nothing in searchidx1001's logs seems telling, yet.
[05:18:15] Logged the message, Master
[05:20:30] <^d> Hmm, getting tons of timeouts trying to obtain locks.
[05:23:05] <^d> enwiki index seems *very* out of date :\
[05:39:21] (CR) Ottomata: [C: +2 V: +2] imported Mercurial ganglios from https://bitbucket.org/maplebed/ganglios/overview [operations/software/ganglios] - https://gerrit.wikimedia.org/r/106505 (owner: Matanya)
[06:51:39] PROBLEM - Puppet freshness on mchenry is CRITICAL: Last successful Puppet run was Mon 13 Jan 2014 03:50:33 AM UTC
[09:52:39] PROBLEM - Puppet freshness on mchenry is CRITICAL: Last successful Puppet run was Mon 13 Jan 2014 03:50:33 AM UTC
[10:26:30] !log upgrading packages on gallium and lanthanum
[10:26:37] Logged the message, Master
[10:28:27] (CR) Dzahn: [C: +2] "seems this was meanwhile also fixed on the remote side, but prefer non-capitalized anyways" [operations/puppet] - https://gerrit.wikimedia.org/r/107105 (owner: Nemo bis)
[10:28:40] (CR) Dzahn: [V: +2] "seems this was meanwhile also fixed on the remote side, but prefer non-capitalized anyways" [operations/puppet] - https://gerrit.wikimedia.org/r/107105 (owner: Nemo bis)
[10:30:27] (PS2) Dzahn: [Planet] Add Virginia Gentilini to Italian Planet [operations/puppet] - https://gerrit.wikimedia.org/r/107106 (owner: Nemo bis)
[10:32:42] (CR) Dzahn: [C: +2] identd:lint [operations/puppet] - https://gerrit.wikimedia.org/r/107032 (owner: Matanya)
[10:36:16] (CR) Dzahn: [C: +2] "lgtm, feed works" [operations/puppet] - https://gerrit.wikimedia.org/r/107106 (owner: Nemo bis)
[10:39:57] (CR) Dzahn: [C: +2] "per Chad, search in Tampa is decom" [operations/puppet] - https://gerrit.wikimedia.org/r/106622 (owner: Chad)
[10:45:11] hashar: have you seen https://bugzilla.wikimedia.org/show_bug.cgi?id=59980 ?
[10:46:59] matanya: yeah and replied on it
[10:47:15] matanya: I think it is going to be a wontfix :/
[10:47:48] was talking to ops about it. We use 'puppet parser validate', which doesn't have much knowledge about which parameters are valid
[10:48:35] ok, hashar, thanks. I had an idea, but i guess it is too much of a hassle and too resource intensive to implement
[10:49:02] matanya: apparently we should compile the puppet catalog
[10:50:10] hashar: yes, i thought of bringing up a vm in labs for every patch and compiling the catalog
[11:02:07] that is more or less the idea I want to eventually achieve one day
[11:02:18] matanya: the crazy idea would be to have a dedicated CI project on wmflabs
[11:02:32] that would run on specific servers isolated from the network
[11:02:41] sounds great
[11:02:42] then spawn a pool of VMs to be consumed by Jenkins jobs
[11:02:50] is that doable?
[11:02:54] unfortunately, there is not much horse power on ops side to make it happen :-]
[11:02:57] yeah it is doable
[11:02:58] entirely
[11:03:00] OpenStack did it
[11:03:19] they wrote a python daemon that interacts with the OpenStack cloud API to maintain a pool of VMs
[11:03:33] then get a Jenkins slave installed on the VM and have it register with the Jenkins master
[11:03:52] so a job can be run in the vm. Once the job is done, some magic thing deletes the vm
[11:04:05] that is all that should be done in order to achieve this?
[11:04:14] (CR) Dzahn: "i'd wait for consensus on the bug here" [operations/puppet] - https://gerrit.wikimedia.org/r/106892 (owner: Tinaj1234)
[11:04:18] this sounds like something nice to do
[11:05:15] matanya: http://tinyurl.com/pmgqb4c
[11:05:40] that shows the number of VMs being built, available, running tests and finally being deleted
[11:06:05] the little daemon attempts to maintain a pool of 100 VMs apparently (yellow + green)
[11:06:23] tempting to do
[11:06:26] definitely
[11:06:27] :D
[11:06:32] but need labs to be migrated to EQIAD first
[11:06:43] and then find out how to get an isolated box or two in there
[11:07:15] matanya: I don't want to put pressure on ops though :/ They are busy enough as it is
[11:07:50] do you want me to try and help out with this a bit? if there is anything i can do?
[11:08:08] (CR) Dzahn: [C: +1] "lgtm (quoting, ensure first, etc), but I'll leave merge to people who were involved writing it and can babysit it to make sure" [operations/puppet] - https://gerrit.wikimedia.org/r/107035 (owner: Matanya)
[11:17:04] (CR) Hashar: [C: +1] "Added a bunch of folks that might be interested in casting their voice." [operations/puppet] - https://gerrit.wikimedia.org/r/106892 (owner: Tinaj1234)
[11:19:00] (CR) Dzahn: [C: +1] "personal opinion, i like the new format better" [operations/puppet] - https://gerrit.wikimedia.org/r/106892 (owner: Tinaj1234)
[11:20:39] out for lunch
[11:28:00] (PS1) Matanya: ganglia_new: lint clean [operations/puppet] - https://gerrit.wikimedia.org/r/107128
[11:30:13] (PS2) Alexandros Kosiaris: retab certs.pp [operations/puppet] - https://gerrit.wikimedia.org/r/104742 (owner: Hashar)
[11:31:50] (CR) Dzahn: [C: +1] ldap : lint cleanup (6 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/102629 (owner: Matanya)
[11:32:41] (CR) Dzahn: "some inline comments" [operations/puppet] - https://gerrit.wikimedia.org/r/102629 (owner: Matanya)
[11:39:20] (CR) Dzahn: "other comments here? don't let my -1 from Sept. block it, some platform eng. reviews could get it going again" [operations/puppet] - https://gerrit.wikimedia.org/r/83574 (owner: Reedy)
[11:44:08] matanya: you know what would be helpful, wikitech editing when you find references to Tampa and know it's already eqiad now
[11:45:04] mutante: I wish i knew what is eqiad now. i'll try to fix it when i meet it, but i really don't know where servers are
[11:45:08] and that table of "https-less domains"
[11:45:21] checking which of them can be removed or checked as resolved/wontfix
[11:45:26] what table?
[11:45:38] matanya: yea, only when you can be really sure it's done from logs
[11:46:02] matanya: https://wikitech.wikimedia.org/wiki/Httpsless_domains
[11:46:09] and there is a matching tracking bug in BZ
[11:46:14] for https/cert issues etc
[11:46:24] ok, i'll sort it out
[11:46:29] thank you! :)
[11:48:05] matanya: https://wikitech.wikimedia.org/wiki/Tampa_cluster (just fix ticket links/status updates if you see them, fyi)
[11:48:30] don't mark services as moved/"done" though without double-checking with the people who did it
[11:49:00] yeah, sure :)
[11:49:08] cool, tyvm
[11:49:19] (CR) Alexandros Kosiaris: [C: +2] retab certs.pp [operations/puppet] - https://gerrit.wikimedia.org/r/104742 (owner: Hashar)
[11:49:26] (CR) Matanya: ldap : lint cleanup (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/102629 (owner: Matanya)
[11:53:37] (CR) Alexandros Kosiaris: [C: +2] stages.pp puppet lint fixes [operations/puppet] - https://gerrit.wikimedia.org/r/104919 (owner: Hashar)
[11:54:16] (CR) Dzahn: "bump" [operations/puppet] - https://gerrit.wikimedia.org/r/96413 (owner: Dzahn)
[11:55:59] (CR) Dzahn: [C: +1] "acked by ezachte, now ideally this would be +2ed/merged by another ops" [operations/puppet] - https://gerrit.wikimedia.org/r/106738 (owner: Dzahn)
[12:00:43] (PS7) Matanya: ldap : lint cleanup [operations/puppet] - https://gerrit.wikimedia.org/r/102629
[12:02:11] conflicts are so much fun :/ especially when i create them with my other patches
[12:11:34] mutante: just to make sure i got it right: ekrem.wikimedia.org redirects to https://meta.wikimedia.org/wiki/IRC and the https version links to /dev/null
[12:12:10] this means it is not https enabled, yeah?
[12:14:51] matanya: redirecting it to IRC was just a convenience thing to get people to the right docs, yes
[12:15:22] what is ekrem anyway?
[12:15:26] because it's role::ircd
[12:15:35] so IRC related docs
[12:15:48] but besides that it's not an http server, it's an IRC server
[12:16:16] so it should not have http anyway
[12:16:19] before that redirect you got a different error
[12:16:24] long time ago
[12:17:01] i think it doesn't and the redirect is in the cluster redirects.conf
[12:17:14] but need to check exactly that
[12:17:41] where is that file?
[12:18:06] matanya: correction, i know why
[12:18:13] it runs apache for another reason
[12:18:21] root@ekrem:/etc/apache2/sites-enabled# ls
[12:18:21] irc.wikimedia.org mobile.wikipedia.org wap.wikipedia.org
[12:18:36] need to find out about the other 2 being deprecated etc
[12:18:52] and after that, remove the httpd from it, correct
[12:19:17] so it had apache anyways and also ircd, and then the redirect was just to make it better than giving you "it works"
[12:19:30] when people enter the URL in a browser
[12:19:39] mobile does work, doesn't ekrem serve it?
[12:19:52] wap doesn't
[12:20:02] find the related bugs and latest status there
[12:20:21] they are already somewhere waiting for comment afair
[12:20:35] bz tickets or rt?
[12:20:38] both :p
[12:20:46] ekrem has RT as a host and services on it
[12:20:57] and BZ has tickets about issues with redirects, certs, ..
[12:21:17] and it should be in the "what's left in Tampa" tracking bug in RT
[12:21:35] search for the hostname
[12:21:54] and in that wikitech "Tampa cluster" template, links to tickets
[12:25:29] mutante: ok, found https://rt.wikimedia.org/Ticket/Display.html?id=4784 and the relevant ircd role you created.
[12:26:02] this means the ircd is still in tampa, and needs a replacement in eqiad. what host is allocated for that?
[12:26:57] (CR) Alexandros Kosiaris: [C: +2] add nuria to privatedata admins [operations/puppet] - https://gerrit.wikimedia.org/r/106738 (owner: Dzahn)
[12:36:42] akosiaris: thanks, that worked, resolved 6617
[12:36:52] matanya: put exactly that on the ticket please, valid question :)
[12:37:59] mutante: :-)
[12:41:12] hey, can someone restart the poolcounter service on helium and potassium? it looks sickly, resulting in jawiki's main page displaying nothing but errors for anons
[12:42:09] (PS1) Alexandros Kosiaris: Remove all occurences of old etherpad [operations/puppet] - https://gerrit.wikimedia.org/r/107136
[12:42:23] akosiaris, apergos, paravoid, mark ^
[12:42:46] MaxSem: done
[12:42:52] thanks :)
[12:42:54] we had a bunch of connection refused errors
[12:42:55] ah, i had the shell open
[12:42:59] aren't they monitored ?
[12:43:03] almost restarted twice
[12:43:12] wee, worked
[12:43:27] great
[12:44:02] I love our wfDebugLog( 'poolcounter' ) messages:
[12:44:02] 2014-01-13 12:43:43 mw1207 ruwiki: Ошибка при подключении к серверу-счётчику пула: Connection refused (Russian: "Error connecting to the pool counter server: Connection refused")
[12:44:05] log it?
[12:44:40] !restarted poolcounter on potassium, helium after MaxSem's request
[12:44:44] hasharAway, bad example if I can read it ;)
[12:44:51] !log restarted poolcounter on potassium, helium after MaxSem's request
[12:44:58] Logged the message, Master
[12:44:59] PROBLEM - poolcounter on helium is CRITICAL: PROCS CRITICAL: 0 processes with command name poolcounterd
[12:45:04] hmmm
[12:45:09] PROBLEM - poolcounter on potassium is CRITICAL: PROCS CRITICAL: 0 processes with command name poolcounterd
[12:45:12] !log that was https://bugzilla.wikimedia.org/show_bug.cgi?id=59993
[12:45:18] Logged the message, Master
[12:45:23] ah, we have a monitor but that just verifies the process is around :(
[12:45:42] the problem was that this process was stuck
[12:46:02] well it is not running now
[12:46:15] monitoring not recovered yet, yea
[12:46:27] nope... it is really not running
[12:46:33] not only monitoring, logs indicate that it doesn't work
[12:46:50] !log starting poolcounter on heloum
[12:46:57] Logged the message, Master
[12:46:58] * Starting poolcounter poolcounter [ OK ]
[12:46:59] RECOVERY - poolcounter on helium is OK: PROCS OK: 1 process with command name poolcounterd
[12:47:02] ?
[12:47:05] (PS1) Dan-nl: adding '*.openbeelden.nl' to the wgCopyUploadsDomains array. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107138
[12:47:09] RECOVERY - poolcounter on potassium is OK: PROCS OK: 1 process with command name poolcounterd
[12:47:25] !log started poolcounter on potassium
[12:47:31] Logged the message, Master
[12:47:32] damn, i always get the typos into the log :p
[12:47:36] ok... why did this happen ?
[12:47:44] heh, well, there we are, but why
[12:48:42] where does poolcounter log to ?
[12:48:46] different restart method? init script?
[12:48:49] grrr
[12:48:49] poolcounterd*
[12:48:51] vs puppet?
[12:49:03] puppet seems to use the init script
[12:49:11] Jan 13 12:43:08 helium puppet-agent[2959]: (/Stage[main]/Poolcounter/Service[poolcounter]/ensure) change from stopped to running failed: Could not start Service[poolcounter]: Execution of '/etc/init.d/poolcounter start' returned 1: at /etc/puppet/manifests/poolcounter.pp:19
[12:49:19] it now works but is getting constant lock timeouts
[12:50:14] it runs as "109"
[12:50:27] UID? permissions?
[12:50:53] poolcounter:x:109:113:PoolCounter,,,:/:/bin/false
[12:50:58] so no problem there
[12:51:02] hmm
[12:51:31] well, to start it I used the exact same command line puppet says it used
[12:51:33] mutante: i asked that. moving on to the next one. thanks for the tutorial :)
[12:52:17] matanya: welcome, the tickets probably need just those questions to be un-stalled
[12:52:40] rt bugmeister :)
[12:53:20] akosiaris: your restart was also /etc/init.d/ ?
[12:53:25] yes
[12:53:35] uhm, then i start to run out of ideas
[12:53:39] PROBLEM - Puppet freshness on mchenry is CRITICAL: Last successful Puppet run was Mon 13 Jan 2014 03:50:33 AM UTC
[12:53:45] I did it via cssh
[12:53:54] so I restarted both at pretty much the same time
[12:54:12] MaxSem: would this cause any problems ?
[12:54:38] akosiaris, no idea
[12:54:47] maybe the restart method has a timing issue, when trying to kill and wait before restart ?
[12:54:51] (PS1) Hashar: poolcounter.pp: retab/puppet lint fix [operations/puppet] - https://gerrit.wikimedia.org/r/107140
[12:54:51] but now it looks just like before the restart
[12:54:53] (PS1) Hashar: poolcounter: monitor TCP port 7531 replies [operations/puppet] - https://gerrit.wikimedia.org/r/107141
[12:54:57] ^^^^ might give us TCP monitoring for poolcounter.
[12:54:59] it was a clean stop, start btw
[12:54:59] remember we had some "sleep x" hacks in slightly similar things
[12:55:03] hmm
[12:55:31] at least, jawiki's main page now works :)
[12:55:47] and hashar lints poolcounter.pp, hehe :) nice
[12:55:52] (PS2) Hashar: poolcounter: monitor TCP port 7531 replies [operations/puppet] - https://gerrit.wikimedia.org/r/107141
[12:55:59] MaxSem: good
[12:56:11] (CR) Hashar: "Patchset 2 explains how I found out port 7531." [operations/puppet] - https://gerrit.wikimedia.org/r/107141 (owner: Hashar)
[12:56:23] likes ja though he can't read it
[12:57:56] also checked there was no package upgrade of poolcounter by grepping /var/log/apt
[12:58:42] it might be hosed due to a libevent upgrade or some other dependency
[12:59:17] bah, poolcounter.log spams us for jawiki
[12:59:21] albeit with an empty message :(
[12:59:37] or maybe japanese is filtered out by wfDebugLog() :D
[12:59:47] potassium, last thing in apt/history.log is just 2014-01-09 upgrading puppet itself
[13:00:03] mutante, akosiaris: bugzilla is on zirconium which is in eqiad, right?
[13:00:36] matanya: yes and no, it's not done
[13:01:07] what is missing? test_user?
[13:01:09] matanya: while we speak prod is still kaulen, but zirconium is prepared for the new version, that's because we do multiple things at one time
[13:01:25] mutante: 4.4 ?
[13:01:31] moving server, upgrading BZ major version and making puppet a module
[13:01:38] to solve those tickets at once
[13:01:41] yes
[13:01:47] heads up, jawiki is broken again
[13:01:56] grmbl
[13:02:07] the question is what is purging it
[13:02:45] or do we have a memcached failure that kills that particular page's parser cache?
[13:03:06] paravoid: time to chime in here?
[13:04:18] worst case, we could disable PC completely and pray no cache stampede happens while we're fixing it
[13:04:39] the poolcounter process is still running
[13:04:45] weird thing, errors are only jawiki and a bit of ruwiki
[13:04:48] and the server doesn't look very busy
[13:05:26] the process is running fine from what it seems
[13:05:32] ack
[13:05:47] what is different with "ja"
[13:05:52] does it have any log?
[13:06:13] going through epoll (libevent), recvfrom, sendto... I actually see data passing through
[13:06:20] MaxSem: not that I can find out...
[13:08:26] do we need Platonides? (author of poolcounter?)
[13:09:59] Reedy, around? any ideas what might be causing constant main page purges on jawiki?
[13:10:39] (PS1) MaxSem: Disable PoolCounter on jawiki, lots of errors breaking main page [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107143
[13:11:06] not sure if I should deploy ^^ yet
[13:11:15] MaxSem: add ruwiki too
[13:11:32] that's relatively rare
[13:12:21] need some help re: a difference between commons beta cluster and production. gwtoolset can download images from http://www.europeana1914-1918.eu without issue on beta, but not on production. the domain has been whitelisted on both servers in the wgCopyUploadsDomains array. any way i can track down the issue? is there a log on production i could look at?
[13:18:32] mutante: so 4.4 is ready or not?
[13:19:17] matanya: not ready, just close to it
[13:19:46] but we know the missing steps and it's being worked on
[13:19:59] no need for new tickets there
[13:20:11] MaxSem: do we have a translated version of that error message?
[13:20:34] mark, lock wait timeout
[13:21:04] isn't that just poolcounter lock contention?
[13:21:25] i mean, disabling poolcounter might then just make things worse
[13:21:29] mark, another one is "queue full"
[13:21:47] that suggests there's a lot of contention, doesn't it
[13:22:17] yeah, that's why I'm trying to figure out why it gets constantly reparsed
[13:23:17] full_queues: 10772
[13:23:19] that's increasing
[13:26:16] so when did this start happening?
[13:26:45] (CR) Alexandros Kosiaris: [C: +2] Remove all occurences of old etherpad [operations/puppet] - https://gerrit.wikimedia.org/r/107136 (owner: Alexandros Kosiaris)
[13:27:01] was first reported an hour ago
[13:27:32] (CR) Mark Bergsma: [C: -1] "I'm not sure that's wise, as the poolcounter service itself seems to be functioning correctly." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107143 (owner: MaxSem)
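For readers following along: "lock wait timeout" and "queue full" are errors returned by MediaWiki's PoolCounter client when too many requests contend for the same piece of work. A minimal sketch of how calling code takes a PoolCounter lock, using core's generic PoolCounterWorkViaCallback (illustrative only; article views actually go through core's PoolWorkArticleView, and $page here is a hypothetical WikiPage object):

    // Sketch of the PoolCounter client API; not the exact production code path.
    $work = new PoolCounterWorkViaCallback(
        'ArticleView',            // pool type, configured per type in $wgPoolCounterConf
        'page:' . $page->getId(), // lock key: all parses of one page share a pool
        array(
            'doWork' => function () {
                // the expensive, guarded work goes here, e.g. parsing the page
            },
            'error' => function ( $status ) {
                // "Pool queue is full" / lock timeout errors surface here; they
                // are what ends up in poolcounter.log via wfDebugLog( 'poolcounter', ... )
                return $status;
            },
        )
    );
    $result = $work->execute();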
[13:27:53] although looking at poolcounter.log, it was this way before too, maybe in a less severe way
[13:28:08] likely at least through the weekend
[13:28:16] awww
[13:28:25] now other wikis also report problems
[13:28:42] 2014-01-13 13:28:02 mw1201 enwiki: Pool queue is full
[13:31:02] I see it even back in october
[13:31:14] 2013-10-19 07:53:22 mw1144 ruwiki: Накопитель запросов полон ("Pool queue is full")
[13:31:14] 2013-10-19 07:53:22 mw1201 enwiki: Pool queue is full
[13:31:14] 2013-10-19 07:53:22 mw1199 jawiki: プールキューがいっぱいです ("Pool queue is full")
[13:31:14] 2013-10-19 07:53:22 mw1130 dewiki: Poolwarteschlange ist voll ("Pool queue is full")
[13:31:14] 2013-10-19 07:53:22 mw1208 enwiki: Pool queue is full
[13:31:28] yeah, that log is never empty
[13:31:29] cute, those localised errors
[13:31:41] (CR) Dzahn: [C: +1] poolcounter: monitor TCP port 7531 replies [operations/puppet] - https://gerrit.wikimedia.org/r/107141 (owner: Hashar)
[13:31:47] we can perhaps increase the queue size a bit, see what that does
[13:32:05] however, looking in the archive, yesterday's log was more than twice as long as the day before it
[13:32:09] we don't have stats on pool queues, do we
[13:32:26] I was trying to find them, but couldn't
[13:35:37] The ja.wp error was reported in https://bugzilla.wikimedia.org/show_bug.cgi?id=59993
[13:35:48] mark ^
[13:36:13] that's what we're investigating
[13:36:37] the restart timestamps can be kind of found via root@neon:/var/log/icinga# grep poolcounter icinga.log
[13:38:58] most traffic is for lucene
[13:39:43] from the api I think
[13:40:06] there was a change earlier that removed tampa search, decom'ed per chad
[13:40:15] search regularly gives poolcounter errors
[13:40:25] searchidx2 was removed from dsh groups
[13:40:33] Nemo_bis, read the backscroll
[13:41:50] https://gerrit.wikimedia.org/r/#/c/106622/
[13:42:37] (Abandoned) Odder: Disable local uploads on Korean Wikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106273 (owner: Odder)
[13:42:40] that was earlier today, can it be related?
[13:42:51] since you say lucene traffic
[13:43:16] mutante: the large log started yesterday
[13:43:23] matanya: ok
[13:43:24] as noted above
[13:43:52] so i guess it is not directly related, though it might be adding to the situation
[13:44:03] -rw-r--r-- 1 udp2log udp2log 1229513 Jan 11 06:20 archive/poolcounter.log-20140111.gz
[13:44:03] -rw-r--r-- 1 udp2log udp2log 1249982 Jan 12 06:23 archive/poolcounter.log-20140112.gz
[13:44:03] -rw-r--r-- 1 udp2log udp2log 2718145 Jan 13 06:25 archive/poolcounter.log-20140113.gz
[13:44:03] -rw-r--r-- 1 udp2log udp2log 73958635 Jan 13 13:31 poolcounter.log
[13:44:20] related to the decom'ing of tampa search that occurred before that?
[13:44:30] maybe
[13:44:52] mutante, I doubt old search used parsed wikitext
[13:45:08] MaxSem: k, just ruling things out that happened today
[13:45:21] (PS1) Mark Bergsma: Raise ArticleView pool size by 50% [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107146
[13:45:53] the question is, does new search use the parser?
[13:46:05] no idea
[13:46:09] any objection to this change?
[13:46:14] nope
[13:46:18] let's try then
[13:46:24] (CR) Mark Bergsma: [C: +2] Raise ArticleView pool size by 50% [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107146 (owner: Mark Bergsma)
[13:46:58] !log mark updated /a/common to {{Gerrit|I0442878ea}}: Raise ArticleView pool size by 50%
[13:47:05] Logged the message, Master
[13:47:33] however, page views are affected, and in a weird way, as if the parser cache was not saving parse results
[13:47:50] !log mark synchronized wmf-config/PoolCounterSettings-eqiad.php 'Raise ArticleView pool queue size by 50%'
[13:47:56] Logged the message, Master
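The knob being turned here lives in wmf-config/PoolCounterSettings-eqiad.php. For orientation, this is the general shape of a pool definition in $wgPoolCounterConf (the key names are the real ones; the numbers below are invented, not the production values):

    // Illustrative pool definition; real values are in operations/mediawiki-config.
    $wgPoolCounterConf = array(
        'ArticleView' => array(
            'class'    => 'PoolCounter_Client', // talks to poolcounterd on helium/potassium
            'timeout'  => 15,  // seconds to wait for a lock ("Timeout waiting for the lock")
            'workers'  => 2,   // concurrent workers allowed per key
            'maxqueue' => 100, // waiters allowed per key; past this, "Pool queue is full"
        ),
    );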
[13:48:54] no real improvement
[13:54:21] the big log started yesterday, but it was also the weekend
[13:54:27] it could have been some deployment on friday or thursday
[13:55:19] and there have been both mediawiki upgrades as well as search migrations then
[13:56:25] mutante: netmon1001 can host manutius and streber services?
[13:58:34] matanya: something netmon*, not sure if all on the same, maybe netmon1002
[13:58:50] should i prepare patches for those?
[13:59:25] just update the tickets for now asking that (which service to which hardware)
[13:59:36] and i don't see any netmon1002, i guess not deployed yet
[14:00:03] i made it up to say i'm not sure about that, i think there has simply been no discussion yet
[14:00:23] for what?
[14:00:33] !log powering off hooper
[14:00:38] streber services have already moved
[14:00:40] the question is if netmon1001 replaces streber and manutius at the same time
[14:00:40] Logged the message, Master
[14:00:46] or there should be 2 hosts
[14:00:51] sure
[14:01:07] they were both torrus and ganglia at some point
[14:01:18] and streber was rancid
[14:01:29] mark: site.pp shows streber is still in heavy use
[14:01:43] site.pp can never show whether anything is in heavy use
[14:02:02] streber is certainly not in use atm
[14:02:25] mark: read as: has many roles :)
[14:02:49] PROBLEM - Host hooper is DOWN: PING CRITICAL - Packet loss = 100%
[14:03:06] https://wikitech.wikimedia.org/wiki/Tampa_cluster#manutius
[14:03:18] MaxSem: might be helpful to clue up those poolcounter error messages a bit
[14:03:23] matanya:
[14:03:32] queue full, which queue, for what article, etc
[14:03:47] mark, to clue up = to make them all English?
[14:03:51] ah
[14:03:54] not necessarily
[14:03:58] but some more debugging info couldn't hurt
[14:04:06] do we know it's just the main page?
[14:04:28] most likely not only
[14:05:44] mutante: so if i understand correctly, manutius and streber should be replaced by some netmon servers in eqiad, but it's not clear which, or what the current status is. is this right?
[14:06:00] all services on streber have already moved to netmon1001
[14:06:01] stupid fucking Status class doesn't allow you to pass both technical and end-user facing error information
[14:06:06] smokeping almost, the rest is done
[14:06:19] and as for manutius, at least torrus still needs to be moved, but it's not really important as torrus is pretty broken
[14:06:21] matanya: no, the "observium" TODO is done meanwhile, just not updated on the wiki
[14:06:28] and ganglia aggregators need to be moved elsewhere
[14:06:45] torrus can go onto netmon1001 too
[14:07:29] thanks, that made it clearer
[14:07:32] mark: would it be ok if i push puppet patches for those changes you mention?
[14:10:43] do we even need torrus now?
[14:11:01] it was mostly squid stats, wasn't it?
[14:11:16] and power usage I think?
[14:11:48] can't we just move these elsewhere and have one tool less? e.g. librenms has some power stuff for example
[14:11:58] I wanted to ask that same question ...
[14:12:43] * matanya looks at the channel ceiling
[14:13:36] i like torrus better
[14:13:43] for?
[14:13:47] what do you use torrus for?
[14:13:48] everything
[14:14:04] but mostly aggregated stats
[14:14:09] I've used it once or twice, I'm not very familiar with it
[14:16:59] anyhow, puppet patches for it, yes or no?
[14:36:21] so http://gdash.wikimedia.org/dashboards/poolcounter/ shows that it indeed most likely started yesterday morning
[14:36:39] i see one config change by reedy, on the CategoryTree extension
[14:36:53] no idea if that could possibly cause this, parser cache related issues maybe
[14:37:01] oh, you moved it here
[14:37:04] I have another theory then
[14:37:15] jawiki & ruwiki are s6, along with frwiki
[14:37:32] db1006 alerted at 07:55 UTC yesterday, briefly
[14:37:34] and is s6
[14:37:36] timing out on a db query inside a poolcounter lock?
[14:37:42] yeah
[14:37:46] possible yeah
[14:39:38] hey hashar, bd808|BUFFER, having an issue with gwtoolset working on production. when david tries to use the extension to download media from an external domain that has been whitelisted by wgCopyUploadsDomains it fails. it works fine on beta. bawolf thinks there may be a proxy whitelist as well … do you know if that's the case or if there might be another config we have to consider?
[14:40:58] or it could be the other way around of course, whatever is causing poolcounter contention is also causing db load
[14:43:19] right now it's lots of wikis hitting a full pool
[14:43:31] it was a burst, now it's fine
[14:43:39] trouble is, that error is appearing a lot normally too
[14:45:16] !log revoked hooper.wikimedia.org in puppetCA, Salt, stored configs in puppet cleaned
[14:45:22] Logged the message, Master
[14:45:33] /* Title::loadRestrictions */ select pr_type, pr_expiry, pr_level, pr_cascade from `page_restrictions` where pr_page = ?
[14:45:47] 52% of queries, 30% of time for db1006 in the past 48h
[14:46:34] 600ms to run
[14:46:43] Who could gather information from Wikipedia's server logs for the amount of page requests that include Firefox's "Mozilla-search" URL parameter?
[14:46:45] See https://bugzilla.mozilla.org/show_bug.cgi?id=758857#c20
[14:47:06] yeah
[14:47:09] but most for frwiki
[14:48:32] why does it take 600ms to run, that doesn't seem right
[14:48:33] it's a pk
[14:48:44] in a 4-column table
[14:49:05] https://ishmael.wikimedia.org/more.php?host=db1006&hours=48&checksum=3714932819255757230
[14:49:18] AFT?
[14:49:21] yes
[14:49:21] ?
[14:49:24] * Nemo_bis runs
[14:49:35] /* ArticleFeedbackv5Permissions::getProtectionRestriction */ select pr_level, pr_expiry from `page_restrictions` where pr_page = ? and pr_type = ? and (pr_expiry = ? or pr_expiry >= ?) limit ?
[14:49:39] is the second most popular
[14:49:43] so yes, it matches pretty well
[14:50:29] fwiw AFT will be disabled on fr.wiki in a matter of days AFAIK
[14:50:30] mark: time avg. seems to spike at a pattern that matches the poolcounter graph; unsure yet if one is the cause of the other or both effects of something else
[14:50:30] funnily, there's no AFT on jawiki
[14:51:24] nor on ruwiki
[14:52:28] http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=MySQL+eqiad&h=db1006.eqiad.wmnet&jr=&js=&v=196&m=mysql_innodb_read_views&vl=views&ti=mysql_innodb_read_views
[14:54:49] (CR) Dzahn: [C: +1] poolcounter.pp: retab/puppet lint fix [operations/puppet] - https://gerrit.wikimedia.org/r/107140 (owner: Hashar)
[14:55:19] (Abandoned) MaxSem: Disable PoolCounter on jawiki, lots of errors breaking main page [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107143 (owner: MaxSem)
[14:56:06] it's a table with just 28k rows
[14:56:14] which one?
[14:56:43] page_restrictions
[14:56:54] yeah it's nothing
[14:57:02] the query is super simple, with pr_page being the primary key
[14:57:29] too many of them? degraded server performance?
[14:58:03] I also don't understand why db1015, at the same weight as db1006, gets 1/4 of the queries
[14:58:16] I'd bet on the former, as PC failures result in errors being returned, not cached, and people making more requests hitting the apaches
[15:00:01] query count is only slightly elevated
[15:00:34] current transactions almost doubled
[15:00:43] which isn't as bad as it sounds, it went from 128 to 240
[15:01:00] so it's "100 more" I guess
[15:10:44] (PS1) Mark Bergsma: Revert "Remove $wgCategoryTreeDynamicTag = true" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107158
[15:10:52] fuck it, let's try it
[15:11:05] heh
[15:11:08] fair enough
[15:11:20] (CR) Mark Bergsma: [C: +2] Revert "Remove $wgCategoryTreeDynamicTag = true" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107158 (owner: Mark Bergsma)
[15:11:44] !log mark updated /a/common to {{Gerrit|Ib336530c8}}: Revert "Remove $wgCategoryTreeDynamicTag = true"
[15:11:51] Logged the message, Master
[15:12:35] !log mark synchronized wmf-config/CommonSettings.php 'Revert Remove = true'
[15:12:39] oh yeah
[15:12:42] Logged the message, Master
[15:12:46] poolcounter.log went quiet
[15:13:05] ohrly
[15:14:39] reedy owes us some beer next week
[15:14:50] (PS1) Alexandros Kosiaris: decom hooper,eiximenis [operations/puppet] - https://gerrit.wikimedia.org/r/107159
[15:14:57] beers?
[15:14:59] he can pay us in gadgets
[15:15:30] nexus 5s or something :P
[15:15:47] i don't need that crap
[15:17:16] https://graphite.wikimedia.org/render/?title=PoolCounter%20Client%20Average%20Latency%20%28ms%29%20log%282%29%20-1week&from=-1week&width=1024&height=500&until=now&areaMode=none&hideLegend=false&logBase=2&lineWidth=1&lineMode=connected&target=cactiStyle%28MediaWiki.PoolCounter.Client.*.tavg%29
[15:17:21] haha
[15:17:25] nice
[15:17:27] yeah
[15:17:39] sorry for derailing you temporarily :)
[15:17:46] it wasn't derailing
[15:17:51] most certainly related ;)
[15:18:20] can that be re-enabled on Meta?
[15:18:28] I don't know
[15:18:30] i'm not gonna try it
[15:18:35] talk to reedy
[15:19:28] mark: will you reply to the bug report or should I?
[15:19:31] https://bugzilla.wikimedia.org/show_bug.cgi?id=59798
[15:19:39] you can do the bug, i'm writing a partial outage report
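For the record, what the revert restores is a single line in wmf-config/CommonSettings.php; the reason it matters is the CategoryTree code that mark quotes a little further down:

    // Restored by the revert:
    $wgCategoryTreeDynamicTag = true;
    // CategoryTree contains:
    //     if ( $parser && $wgCategoryTreeDisableCache && !$wgCategoryTreeDynamicTag ) {
    //         $parser->disableCache();
    //     }
    // and $wgCategoryTreeDisableCache defaults to true, so with the dynamic
    // tag turned off every page embedding a category tree became uncacheable
    // and was re-parsed on every view -- which is what kept filling the
    // ArticleView pools.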
[15:20:07] cool
[15:23:07] oh man
[15:23:13] db1006 is so much quieter now
[15:23:20] http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=MySQL%20eqiad&h=db1006.eqiad.wmnet&r=hour&z=default&jr=&js=&st=1389626543&v=106&m=mysql_innodb_read_views&vl=views&ti=mysql_innodb_read_views&z=large
[15:23:24] among others
[15:26:07] (CR) Anomie: [C: +1] Changed date format in l10nupdate-1 [operations/puppet] - https://gerrit.wikimedia.org/r/106892 (owner: Tinaj1234)
[15:32:34] if ( $parser && $wgCategoryTreeDisableCache && !$wgCategoryTreeDynamicTag ) {
[15:32:37] $parser->disableCache();
[15:32:38] uhm...
[15:32:40] }
[15:33:14] $wgCategoryTreeDisableCache defaults to true
[15:35:14] akosiaris: pdf servers are still hardy? (please tell me they aren't)
[15:35:22] they are
[15:35:25] and they are being replaced soon
[15:35:27] dammit
[15:35:32] don't bother
[15:35:47] paravoid: replaced with what?
[15:36:33] https://www.mediawiki.org/wiki/PDF_rendering
[15:37:39] good news. thanks paravoid
[15:37:49] matanya: manifests/role/ocg.pp i believe
[15:38:02] offline content generator
[15:38:12] indeed
[15:38:19] the updated puppet manifests changeset is https://gerrit.wikimedia.org/r/#/c/102352/
[15:38:28] That is what i thought. I hope i didn't cause an RT-mailing storm in the ops mails because of your request mutante
[15:39:11] what server will hold this module? something in eqiad?
[15:39:38] i see it applied on rhodium.eqiad
[15:39:40] yes, we have 4 servers assigned
[15:39:40] site.pp
[15:39:52] rhodium is the test server, there are three more
[15:39:54] there's an RT somewhere
[15:40:42] 1101 (puppetize PDF servers), 838 (upgrade pdf servers to precise)
[15:40:47] no
[15:41:12] #6149: final decision / migration of PDF servers
[15:41:20] Hardware request for new PDF render servers
[15:41:23] #6335
[15:41:26] #6335: eqiad: (4) pdf generation servers - one allocated for testing
[15:41:32] k :)
[15:41:32] right, those two
[15:41:53] no permission to view :/
[15:42:02] matanya: search RT for "pdf" and you got them all :p
[15:42:10] mutante: please link 6149 with 6335
[15:42:21] no
[15:42:24] separate tickets
[15:42:35] and a link, no merge, paravoid
[15:42:43] linked as "refers to" only
[15:42:46] yes
[15:42:53] no dependency
[15:53:54] PROBLEM - Puppet freshness on mchenry is CRITICAL: Last successful Puppet run was Mon 13 Jan 2014 03:50:33 AM UTC
[15:59:44] RECOVERY - Puppet freshness on mchenry is OK: puppet ran at Mon Jan 13 15:59:38 UTC 2014
[16:10:47] (PS1) Tim Landscheidt: Fix various typos [operations/puppet] - https://gerrit.wikimedia.org/r/107165
[16:15:04] (CR) Faidon Liambotis: [C: +2] Fix various typos [operations/puppet] - https://gerrit.wikimedia.org/r/107165 (owner: Tim Landscheidt)
[16:15:11] (PS2) Faidon Liambotis: Fix various typos [operations/puppet] - https://gerrit.wikimedia.org/r/107165 (owner: Tim Landscheidt)
[16:15:23] (CR) Faidon Liambotis: [C: +2 V: +2] Fix various typos [operations/puppet] - https://gerrit.wikimedia.org/r/107165 (owner: Tim Landscheidt)
[16:25:34] hey bd808, having an issue with gwtoolset working on production. when david tries to use the extension to download media from an external domain that has been whitelisted by wgCopyUploadsDomains it fails. it works fine on beta. bawolf thinks there may be a proxy whitelist as well … do you know if that's the case or if there might be another config we have to consider?
[16:26:06] (PS3) Physikerwelt: added basic hbase support [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/99381
[16:26:30] dan-nl: Is it a particular domain or just any external site?
[16:26:53] we only tried europeana1914-1918.eu
[16:27:52] I'll look around a bit and see if I can remember where the config for the proxy lives
[16:28:03] thanks!
[16:28:34] (CR) Chad: [C: +1] "lgtm, will merge when window opens" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107108 (owner: Manybubbles)
[16:45:48] dan-nl: The squid proxy that is used to fetch external content is configured by files/squid/copy-by-url-proxy.conf in the operations/puppet.git repository. I'm not seeing anything in the acls used there that references particular hosts that are allowed/denied other than internal network blocks.
[16:46:40] Do you have an example URL that I could try fetching via the proxy to see what its response is?
[16:55:07] (PS3) Jforrester: Enable VisualEditor by default on "phase 4" Wikipedias [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102208
[16:55:18] (PS4) Jforrester: Enable VisualEditor by default on "phase 4" Wikipedias [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102208
[16:55:35] (CR) Alexandros Kosiaris: [C: +2] poolcounter.pp: retab/puppet lint fix [operations/puppet] - https://gerrit.wikimedia.org/r/107140 (owner: Hashar)
[16:55:46] (CR) Alexandros Kosiaris: [C: +2] poolcounter: monitor TCP port 7531 replies [operations/puppet] - https://gerrit.wikimedia.org/r/107141 (owner: Hashar)
[16:56:03] (PS1) Mark Bergsma: Revert "Raise ArticleView pool size by 50%" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107176
[16:57:40] (CR) Mark Bergsma: [C: +2] Revert "Raise ArticleView pool size by 50%" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107176 (owner: Mark Bergsma)
[16:58:07] !log mark updated /a/common to {{Gerrit|I89b765424}}: Revert "Raise ArticleView pool size by 50%"
[16:58:13] Logged the message, Master
[16:58:47] !log mark synchronized wmf-config/PoolCounterSettings-eqiad.php
[16:58:52] Logged the message, Master
[16:59:06] (PS6) JanZerebecki: Varnish: don't mobile redirect www.$project.org [operations/puppet] - https://gerrit.wikimedia.org/r/89879
[16:59:08] (PS2) JanZerebecki: varnish: simplify the mobile redirect regexp [operations/puppet] - https://gerrit.wikimedia.org/r/106669 (owner: Faidon Liambotis)
[16:59:54] rebase?
[17:00:17] very unhelpful
[17:00:33] as it gives the impression akosiaris' comment is addressed at first glance, I almost merged
[17:01:41] mark: so, as of now, it's ok to do new search related things? ie: we're back to stable-state/normal?
[17:01:53] yes
[17:01:56] yes, it wasn't search related
[17:02:27] <^d> Search uses pool counter, even if it was the article view queues that were messed up :)
[17:02:32] * greg-g nods
[17:02:44] so, cool, go forth and break more things, ^d
[17:03:03] (CR) Chad: [C: +2] Cirrus config updates [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107108 (owner: Manybubbles)
[17:03:03] the majority of poolcounter traffic is search, yes
[17:03:13] (Merged) jenkins-bot: Cirrus config updates [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107108 (owner: Manybubbles)
[17:04:38] !log demon synchronized wmf-config/CirrusSearch-common.php 'Commonswiki + enwiki Cirrus settings'
[17:04:45] Logged the message, Master
[17:05:32] !log demon synchronized wmf-config/InitialiseSettings.php 'enwiki gets Cirrus as secondary index'
[17:05:38] Logged the message, Master
[17:06:09] <^d> !log cirrus indexes created for enwiki
[17:06:16] Logged the message, Master
[17:06:16] <^d> manybubbles: Go forth and index :)
[17:06:22] I shall!
[17:08:02] it has begun
[17:08:22] (PS7) Faidon Liambotis: Varnish: don't mobile redirect www.$project.org [operations/puppet] - https://gerrit.wikimedia.org/r/89879 (owner: JanZerebecki)
[17:08:24] (PS3) Faidon Liambotis: varnish: simplify the mobile redirect regexp [operations/puppet] - https://gerrit.wikimedia.org/r/106669
[17:14:07] (CR) JanZerebecki: [C: +1] varnish: simplify the mobile redirect regexp [operations/puppet] - https://gerrit.wikimedia.org/r/106669 (owner: Faidon Liambotis)
[17:28:17] (CR) GWicke: "Can we deploy the new code before merging this puppet change to avoid the bootstrapping problem?" [operations/puppet] - https://gerrit.wikimedia.org/r/106471 (owner: Subramanya Sastry)
[17:29:28] <^d> 1.7mil htmlCacheUpdate jobs, 0 claimed.
[17:29:30] <^d> This seems wrong.
[17:30:00] <^d> 2 claimed because I played with it, but meh, why aren't job runners picking them up?
[17:30:46] Aaron worked on the jobs to defer the expansion of backlink jobs
[17:31:21] might or might not be related
[17:31:29] that was before Christmas though
[17:31:33] <^d> Yeah
[17:34:07] domas!
[17:34:22] mark!
[17:34:24] whatsup!
[17:34:29] are you a good swimmer?
[17:34:44] what why
[17:34:45] no
[17:34:47] maybe
[17:34:53] well I keep hearing you're on a sinking ship ;p
[17:35:22] damn!
[17:35:58] and you keep hearing that where?
[17:36:04] in the news
[17:36:14] but i'll see you in the water ;p
[17:37:03] hm
[17:37:04] :)
[17:37:44] * domas looks at operational dashboards, at general dashboards, at the stock market, sips some coffee, and continues working :)
[17:38:00] :)
[17:38:06] mark: I hear that the only dashboard at wikipedia going up is page load times!
[17:38:17] stab stab stab
[17:38:18] :)
[17:38:46] though interesting, there have been page load reductions
[17:39:19] * domas eyes ori's page load metrics
[17:39:35] which one?
[17:39:55] the http://noc.wikimedia.org/~ori/metrics/page-load.html one
[17:40:01] good data
[17:41:14] what was the gesture to kill apps in ios7? :)
[17:41:24] stupid me
[17:41:25] got it
[17:41:29] <^d> Throw the iphone on the ground?
[17:41:43] size 10 boot heel
[17:41:55] up in the air is preferable, I think
[17:41:58] the spotify app toasts my battery from time to time
[17:46:52] <^d> gwicke: Well whatever the cause, the queue is definitely growing right now :\
[17:47:27] mark: how are things on your side?!
[17:47:45] ok I guess
[17:47:47] I'll be in SF next week
[17:47:50] let's meet up some day
[17:48:02] ok!
[17:50:12] (CR) Catrope: [C: +2] Enable VisualEditor by default on "phase 4" Wikipedias [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102208 (owner: Jforrester)
[17:50:22] (Merged) jenkins-bot: Enable VisualEditor by default on "phase 4" Wikipedias [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102208 (owner: Jforrester)
[17:53:36] !log catrope synchronized visualeditor-default.dblist 'Enable VE by default on phase 4 Wikipedias'
[17:53:43] Logged the message, Master
[17:54:04] (CR) GWicke: WIP: Update parsoid puppet config to use new repository (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/106471 (owner: Subramanya Sastry)
[17:55:37] ^d, let's ask Aaron when he is around
[17:55:48] <^d> Yeah, haven't seen him on IRC or in the office yet.
[17:58:27] ^d, sent a mail
[17:58:29] paravoid, akosiaris, mutante: to push into gerrit you just need the gerrit right push (usually without -f). that just bypasses creation of changesets to review.
[17:58:48] <^d> gwicke: I saw, thanks.
[17:58:55] i would not advise bypassing gerrit.
[17:59:15] ?
[17:59:17] the other gerrit permissions are only for specially modified changesets for review.
[17:59:46] paravoid: answer to a few days ago :)
[17:59:58] yeah I've found what I was looking for
[18:01:17] (PS1) Chad: Deletion jobs are also high priority [operations/puppet] - https://gerrit.wikimedia.org/r/107184
[18:01:23] jzerebecki: you mean only when you want to import existing git repos to fork, right
[18:02:13] to get history but not a million gerrit patches
[18:02:37] yes
[18:03:52] k, thx
[18:05:03] why don't i get updates from rt on tickets i edit when people reply to me?
[18:05:39] because you are neither requestor nor admincc?
[18:05:48] oh, not good
[18:06:11] ask jeremyb, he adds himself to tickets he wants mail for
[18:06:41] i look more at the web ui than mail anyways, personally
[18:06:46] for RT
[18:07:37] specifically the "Operations Activity" thing
[18:08:05] matanya: and/or you can also use the bookmark feature, little star icons
[18:08:39] thanks
[18:08:48] got the Bookmarked Tickets widget on my dashboard for that
[18:09:02] so i can check the ones i want to watch without necessarily taking them
[18:17:36] (PS1) Matthias Mullie: Fix notice due to incorrect capitalization [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107186
[18:35:38] !log catrope synchronized wmf-config/InitialiseSettings.php 'touch'
[18:35:44] Logged the message, Master
[18:39:08] (PS1) Chad: Raise number of job runners by 1 per server [operations/puppet] - https://gerrit.wikimedia.org/r/107189
[18:39:16] <^d> AaronSchulz: ^
[18:39:52] (CR) Aaron Schulz: [C: +1] Raise number of job runners by 1 per server [operations/puppet] - https://gerrit.wikimedia.org/r/107189 (owner: Chad)
[18:40:23] (CR) jenkins-bot: [V: -1] Raise number of job runners by 1 per server [operations/puppet] - https://gerrit.wikimedia.org/r/107189 (owner: Chad)
[18:41:02] (PS2) Chad: Raise number of job runners by 1 per server [operations/puppet] - https://gerrit.wikimedia.org/r/107189
[18:41:13] <^d> commas are hard, sometimes.
[18:42:34] which bastion should i use to reach gallium? i can't get to it from iron.
[18:43:35] !log aaron updated /a/common/php-1.23wmf9 to {{Gerrit|Ic44c352a8}}: Update MobileFrontend to wmf/1.23wmf9 tip
[18:43:41] Logged the message, Master
[18:43:45] got it, nm
[18:43:51] missed the -A arg to ssh
[18:44:12] !log aaron synchronized php-1.23wmf9/includes/job/jobs/HTMLCacheUpdateJob.php 'Remove live sleep() hack in html cache jobs'
[18:44:19] Logged the message, Master
[18:44:49] * AaronSchulz wonders what's up with that log entry
[18:49:48] AaronSchulz: sleep! Oh no!
[18:50:01] that certainly would have slowed everything down a bit
[18:50:40] hah
[18:51:02] and to think, I was joking in another project's channel (git-annex) about it being slow because of random sleep()s
[18:51:07] that was there for ages, though it was obsoleted by something else in core
[18:51:46] should we expect the job queue to do more things now?
[18:52:04] it has collected an impressive backlog of those jobs
[18:52:53] it should help, but I don't know if that's enough
[18:53:16] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/83867
[19:00:59] ori: I can't find RunJobs.execute-HTMLCacheUpdateJob.count in graphite, only the archived- stuff
[19:01:53] hmm, probably the cli stuff doesn't end up there, just stats... right
[19:12:10] you couldn't call the jobrunners busier https://ganglia.wikimedia.org/latest/?c=Jobrunners%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[19:15:10] Nemo_bis: I would have expected them more evenly distributed
[19:15:49] maybe some of those do video scaling? or that's in another group, but they're not all created equal
[19:30:18] ^d: would be nice to get https://gerrit.wikimedia.org/r/#/c/106777/ into wmf10
[19:36:40] teaching myself salt; i was curious what eqiad hosts mount things via nfs. is there a less hacky way than this command? : salt '*.eqiad.wmnet' cmd.run 'grep -i nfs /proc/mounts' | grep -B1 -i nfs
[19:37:36] there's a yaml output argument
[19:38:19] and there's an exit code thing too iirc
[19:39:03] oh cool, ok.
[19:39:07] * jgage digs around
[19:46:57] cmd.retcode will give me 1/0, but i don't see a way to say "don't print a line for hosts returning no output"
[19:55:27] (CR) Aaron Schulz: [C: +2] Prevent ParsoidCacheUpdateJobOnDependencyChange from running in the main loops [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/107196 (owner: Aaron Schulz)
[19:56:27] !log aaron synchronized wmf-config/CommonSettings.php 'Prevent ParsoidCacheUpdateJobOnDependencyChange from running in the main loops'
[19:56:33] Logged the message, Master
[19:59:43] hey bd808, sorry, had to quickly take care of a set of twins ...
[20:00:03] bd808 here's a link to a media file: http://www.europeana1914-1918.eu/attachments/9459/1234.9459.full.jpg
[20:00:40] dan-nl: Cool. I'll try to fetch that with curl and see if I get any useful error messages
[20:01:40] !log aaron synchronized php-1.23wmf10/includes/filebackend/SwiftFileBackend.php 'c1ab935f1307876a2127b17ad8ba1108f3e877b8'
[20:01:47] Logged the message, Master
[20:04:51] PROBLEM - Host mw27 is DOWN: PING CRITICAL - Packet loss = 100%
[20:05:00] AaronSchulz: I've been watching the job queue for enwiki lately. the htmlCacheUpdate jobs are accumulating. so are my cirrusSearchLinksUpdate jobs. it is like refreshLinks has a higher priority than expected
[20:06:02] RECOVERY - Host mw27 is UP: PING OK - Packet loss = 0%, RTA = 35.37 ms
[20:08:48] dan-nl: The URL you gave me downloads fine via curl using `--proxy url-downloader.wikimedia.org:8080` from inside the cluster.
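bd808's curl check can also be reproduced from MediaWiki's side on an app server, e.g. in maintenance/eval.php (a sketch only; the URL is the test file and the proxy is the one just mentioned):

    // Rough PHP equivalent of the curl test, runnable from maintenance/eval.php:
    $req = MWHttpRequest::factory(
        'http://www.europeana1914-1918.eu/attachments/9459/1234.9459.full.jpg',
        array( 'proxy' => 'url-downloader.wikimedia.org:8080' )
    );
    $status = $req->execute();
    var_dump( $status->isOK(), $req->getStatus() ); // getStatus() returns the HTTP code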
[20:09:21] hello
[20:09:38] * bd808 waves to hashar
[20:11:05] ahh
[20:11:10] my hero :-]
[20:11:22] bd808: I wrote a basic command line json linter for PHP :-]
[20:11:29] it is not urgent though
[20:12:04] was looking at dan-nl's reported issue about production application servers not being able to do HTTP GETs :/
[20:12:16] and I am pissed off because I should have spotted that
[20:13:02] hashar: I saw the json linter. I've got the review for it open but haven't focused on it long enough to approve or comment.
[20:15:04] hashar: I cannot reproduce dan-nl's download problem with curl from terbium. That makes me think the problem is not the squid proxy.
[20:15:37] bd808: by default there is no proxy configured I think
[20:16:29] but that one works: curl -v http://www.europeana1914-1918.eu/ --proxy url-downloader.wikimedia.org:8080 --user-agent 'hashar-test'
[20:16:53] Is his code not using $wgCopyUploadProxy ? dan-nl ?
[20:18:28] (PS1) Ottomata: Adding mapreduce_shuffle_port parameter [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/107207
[20:18:42] ahh yet another $wg.*Proxy :(
[20:19:52] it is
[20:20:05] hashar: He's using UploadFromUrl which checks $wgCopyUploadProxy in UploadFromUrl::reallyFetchFile()
[20:20:29] bd808: it is … /mediawiki-config/wmf-config/InitialiseSettings.php
[20:21:03] * bd808 nods
[20:21:07] wgCopyUploadsDomains, not wgCopyUploadProxy
[20:21:25] it works on beta without issue
[20:21:37] yeah, beta instances have direct access to the web
[20:21:44] it doesn't need a web proxy
[20:21:50] i see
[20:21:55] the apaches/job runners in production do need a web proxy though
[20:22:24] so I guess you will have to figure out how to have your code use $wgCopyUploadProxy whenever it is set
[20:22:49] in production that is url-downloader.wikimedia.org:8080
[20:22:57] I tested it on a random application server and that works
[20:22:59] so do i need to add the domains in the wgCopyUploadsDomains array to the wgCopyUploadProxy array somewhere?
[20:23:01] albeit the box in pmtpa :/
[20:23:26] UploadFromUrl::reallyFetchFile() checks for and uses the value of $wgCopyUploadProxy
[20:23:38] ahh
[20:23:44] so different issue :/
[20:25:26] so can i just add the whitelisted domains to the InitialiseSettings.php wgCopyUploadProxy array just like the wgCopyUploadsDomains array?
[20:25:52] dan-nl: No. wgCopyUploadProxy has the right value
[20:26:10] (CR) Ottomata: [C: +2 V: +2] Adding mapreduce_shuffle_port parameter [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/107207 (owner: Ottomata)
[20:26:18] I don't see any sign that the squid proxy restricts the domains that can be accessed
[20:26:20] also wonder how flickr works for upload wizard then … do i need to alter the extension code then?
[20:27:04] bd808: yeah I tried that proxy using curl against http://www.europeana1914-1918.eu/ and that works
[20:27:30] Agreed. It worked for me as well.
[20:28:11] So what we need to see is the response body and possibly headers from the request that is failing in prod
[20:31:10] manybubbles: I have some extra runners for RL on terbium to get them down faster after all the RL2 jobs were converted
[20:31:13] * AaronSchulz was still running that
[20:31:29] ah!
[20:31:30] bd808, hashar i added this patch hoping to get some insight from the MWHttpRequest https://gerrit.wikimedia.org/r/#/c/107038, but need to wait for it to get merged
[20:31:45] is there anything else i could do?
[20:31:46] my linksUpdateJobs would have kept up without that, I think
[20:33:15] dan-nl: You are using MWHttpRequest directly?
[20:33:29] That's the bug
[20:33:36] for that evaluation where it fails, yes
[20:33:55] that one needs $wgHTTPProxy, doesn't it?
[20:34:24] Yes. It needs to set the 'proxy' option if $wgCopyUploadProxy !== false
[20:34:26] GWToolset/includes/Handlers/UploadHandler.php evaluateMediafileUrl
[20:34:55] we should probably make $wgHTTPProxy support an array definition
[20:35:00] with roles as keys :-D
[20:35:16] dan-nl: Look at UploadFromUrl::reallyFetchFile to see the logic that you need to add to use the http proxy in environments that configure it
[20:35:20] $wgHTTPProxy['gwtoolset'] = 'proxyhost:8080'
[20:35:34] k, looking now ...
[20:40:56] hashar: Interestingly we don't seem to use $wgHTTPProxy in our config at all. It's in includes/DefaultSettings.php but we don't change it in operations/mediawiki-config anywhere
[20:41:38] hashar, is it possible to set up the beta cluster to use a proxy server as well?
[20:41:41] bd808: time for yet another RFC :-D
[20:41:59] <^d> bd808: We use a proxy. See what I did in ExtensionDistributor.
[20:42:01] <^d> $wgExtDistProxy = 'url-downloader.wikimedia.org:8080';
[20:42:05] dan-nl: I am not sure how to prevent instances from reaching the outside world
[20:43:05] i'm just thinking that it would be "best" if the beta cluster could mimic the production set-up as much as possible, then we would have found this issue on it instead of on production ...
[20:43:17] ^d: I agree that we use proxies in several places but we don't set the "default" proxy value for MWHttpRequest in the cluster config
[20:43:33] <^d> No, we don't. Because we don't want things to just start making requests.
[20:43:40] <^d> It's whitelist-based, so they'd fail anyway.
[20:44:44] <^d> bd808: Anyway, it's sort of an ops question. I had to get help from ma rk I think when I rewrote ExtDist.
[20:44:49] <^d> To set things up right on linne.
[20:45:37] There is a squid on linne that seems to "do the right thing". Dan just didn't know to configure his code to use it and we all missed it in code review.
[20:46:23] It should be an easy fix though. ~3 lines of code
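Put together, the fix bd808 describes mirrors what UploadFromUrl::reallyFetchFile() already does; applied to GWToolset's direct MWHttpRequest call it looks roughly like this (a sketch of the logic, not the actual patch that landed; $url stands in for the external media file URL GWToolset is fetching):

    // Honour $wgCopyUploadProxy the same way UploadFromUrl::reallyFetchFile() does.
    global $wgCopyUploadProxy;

    $options = array( 'followRedirects' => true );
    if ( $wgCopyUploadProxy !== false ) {
        // url-downloader.wikimedia.org:8080 in production; stays false on beta,
        // where instances can reach the web directly.
        $options['proxy'] = $wgCopyUploadProxy;
    }
    $req = MWHttpRequest::factory( $url, $options );
    $status = $req->execute();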
[21:00:35] <^d> You weren't dreaming, but we're not going to rename.
[21:00:36] hashar: Abandoned.
[21:00:37] <^d> It'd break stuff.
[21:05:45] (03PS2) 10Ori.livneh: Deletion jobs are also high priority [operations/puppet] - 10https://gerrit.wikimedia.org/r/107184 (owner: 10Chad)
[21:05:51] (03CR) 10Ori.livneh: [C: 032 V: 032] Deletion jobs are also high priority [operations/puppet] - 10https://gerrit.wikimedia.org/r/107184 (owner: 10Chad)
[21:06:43] ^d: ^
[21:07:18] <^d> woot.
[21:07:34] <^d> Do we have to do anything else after they get merged, or will the job runners pick them up smartly?
[21:07:39] * ^d can't remember
[21:07:53] they will pick it up when puppet runs next, but i can run puppet manually
[21:07:56] what are the hostnames again?
[21:08:18] mw1001-mw1016
[21:08:41] * ori salts
[21:11:27] oh, i fucked up
[21:11:47] i forced puppet on 'mw10*'
[21:12:02] if the puppetmaster overloads, that's why
[21:13:40] looks ok, so not freaking out
[21:14:08] <^d> AaronSchulz: When do abandoned jobs get abandoned forever?
[21:14:32] 7 days after being abandoned they get deleted
[21:15:13] <^d> 7 days, gotcha
[21:21:56] <^d> I wonder if those extra runners are helping.
[21:22:04] <^d> I might be crazy, but I think the # of jobs is going down now.
[21:22:19] <^d> They are :D
[21:34:18] (03PS1) 10Chad: Fix variable name [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/107234
[21:35:18] (03CR) 10Chad: [C: 032] Fix variable name [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/107234 (owner: 10Chad)
[21:35:27] (03Merged) 10jenkins-bot: Fix variable name [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/107234 (owner: 10Chad)
[21:37:13] !log demon synchronized wmf-config/CirrusSearch-common.php 'Typofix in variable name'
[21:37:20] Logged the message, Master
[21:39:49] (03CR) 10Matanya: "why two patches for lint stuff? this one and https://gerrit.wikimedia.org/r/#/c/104807/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/104806 (owner: 10Hashar)
[21:43:43] regarding https://wikitech.wikimedia.org/wiki/Cricket : is the proper thing to rename it to Obsolete:Cricket ?
[21:44:45] <^d> jgage: {{obsolete}}
[21:44:54] jgage: i don't think we have that namespace yet, but better than deleting it. maybe just add a template with a warning that it's old first
[21:44:54] <^d> At the top.
[21:44:58] ah, then that
[21:45:09] ok, thanks dr daemon
[21:45:39] (03CR) 10Matanya: added basic hbase support (033 comments) [operations/puppet/cdh4] - 10https://gerrit.wikimedia.org/r/99381 (owner: 10Physikerwelt)
[21:49:04] greg-g: We're skipping MW deploy next week? :-(
[21:49:16] yep
[21:49:20] Boo.
[21:49:23] blame freedom
[21:49:27] and RFCs
[21:49:38] We don't do roll-outs on Mondays anyway, so no blaming MLK.
[21:49:46] it'd be a two-day week
[21:49:51] what with the RFC stuff
[21:49:54] For some people.
[21:50:03] Lots of people aren't going to that ivory tower thing. :-)
[21:50:08] most people that can fix the site ;)
[21:50:23] {{cn}}. :-P
[21:50:41] * greg-g looks at platform's involvement
[21:51:07] roan's participating, too
[21:51:26] yep, pretty much all the go-to people :P
[21:58:39] <^d> Shouldn't we deploy though, since everyone will be here?
[21:58:47] <^d> So when the site breaks, we're all on hand.
[22:02:44] why isn't there a deploy, greg-g?
[22:06:06] (03CR) 10Hashar: "That one is only tabs to spaces, so it's easy to review and has no impact on production since no code is actually changed." [operations/puppet] - 10https://gerrit.wikimedia.org/r/104806 (owner: 10Hashar)
[22:18:30] (03CR) 10Matanya: [C: 04-1] "some consistency comments :)" (0326 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/104807 (owner: 10Hashar)
[22:18:59] no offense hashar
[22:19:01] matanya: see above
[22:19:18] ^d: we'll all be a bit focused on other things
[22:19:45] freedom and RFCs? what is that, greg-g?
[22:20:01] <^d> greg-g: Focus?
[22:20:07] * ^d finds something shiny
[22:20:15] matanya: Martin Luther King Jr. Day, which is a holiday in the US, and the Architecture Summit (which goes over RFCs) on Thur/Fri
[22:20:20] ^d: sorry, you were saying?
[22:20:31] thanks
[22:20:50] <^d> greg-g: Oooh, a pigeon outside the window!
[22:23:16] (I looked)
[22:26:42] LeslieCarr: are you the right person to escalate issues that need RT attention?
[22:26:56] ticket 6629 needs some attention
[22:28:03] oh, not this week
[22:28:04] but i can look
[22:28:24] oh man, i have no idea....
[22:29:26] (03CR) 10Ottomata: "Thanks so much for this, btw. I'm going to try to find time to test and review this week. :)" [operations/puppet/cdh4] - 10https://gerrit.wikimedia.org/r/99381 (owner: 10Physikerwelt)
[22:31:30] LeslieCarr: can you forward it to the appropriate victim? I don't think anyone knows what the right way to start that thing is.
[22:39:37] um, shoot
[22:39:39] let me see
[22:39:59] do you know how the search cluster is set up ?
[22:40:14] it's sort of this black box....
[22:40:34] looks like bad things just happened to redis
[22:40:48] LeslieCarr: kinda.
[22:41:13] but I really don't know how to launch that job
[22:41:18] rather, what the right way to log the job is
[22:41:25] yeah...
[22:42:28] me neither ;)
[22:42:37] <^d> Me neither (for the record)
[22:42:38] Jeff_Green: is the one on duty... but i don't think he knows either ;)
[22:43:01] <^d> LeslieCarr: I hear notpeter knows about search ;-)
[22:43:04] hahaha
[22:43:05] I have a hunch that they just start this job in a screen session or something horrible.
[22:43:09] oh go
[22:43:10] god
[22:43:12] seriously ?
[22:43:17] dunno
[22:43:30] I _think_ it should always be running, but I can't find the way to make that happen
[22:43:31] PROBLEM - DPKG on searchidx1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[22:43:32] how is it launched in a "not the right way"?
[22:43:42] oh, that alert is me dist-upgrading as i logged in
[22:43:49] since i saw 500 un-updated packages
[22:43:56] brave of you.
[22:44:01] updating those on search...
[22:44:08] it's not going to get less broken
[22:44:09] at least enwiki won't care
[22:44:16] i mean more broken
[22:44:19] omg i'm on duty this week. totally forgot
[22:44:25] it can ALWAYS be more broken.
[22:44:27] Jeff_Green: short straw
[22:44:39] manybubbles: i volunteered :-(
[22:44:40] <^d> RobH: Yeah they will.
[22:44:45] <^d> It's secondary for enwiki.
[22:44:52] <^d> lsearchd still serves primary traffic.
[22:44:56] oh, well
[22:45:03] leslie is quite brave.
[22:45:19] (03Abandoned) 10MZMcBride: Scale back deployment of UniversalLanguageSelector (ULS) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/107014 (owner: 10MZMcBride)
[22:45:23] and may want to be the cause of an outage on her last week
[22:45:24] and no, i have no clue either
[22:45:27] heh
[22:45:48] manybubbles: so how would you force a launch in the incorrect way ?
[22:45:53] and what is the process even called ?
[22:45:58] lucene-search ?
[22:46:42] wrong way?
[22:47:02] well, the command called by that
[22:47:16] well, you want it launched and logging correctly
[22:47:22] but how would you even launch it without logging ?
[22:47:42] basically, this is a black box that i am poking at, and i have no idea what to even look for
[22:47:49] minimum would be $(nohup /a/search/lucene.jobs.sh inc-updater-start) or something
[22:47:53] i mean, i can make sure apache is
[22:47:58] but I really don't know
[22:48:07] ok
[22:49:13] interesting ... since that's not referenced in its init file ...
[22:49:23] ah, there's supposed to be a cron with that ....
[22:49:50] either once a day or once a week for most commands ?
[22:53:30] I think it is all the time. it is in a while true loop
[22:55:19] so.... http://git.wikimedia.org/blob/operations%2Fpuppet.git/2e85a5187b5fce60a800c17020a811824a15d736/manifests%2Fsearch.pp line 171
[22:55:31] are those a complete set of the jobs that need to run ?
[23:00:15] LeslieCarr: I'm really not sure. I think the incremental update process isn't there but was running until that machine was restarted. I'm just piecing that together from the log files though
[23:00:31] hrm
[23:01:19] on reading the code, that incremental updater _looks_ useful. I _think_ we used it, and if I found some evidence besides the log files I'd advise starting it
[23:01:29] heck, I think we should start it anyway, but I don't know the "right" way to start it
[23:01:34] I'm 70% sure it'll fix it
[23:01:48] manybubbles: in .bash_history: sudo -u lsearch /a/search/lucene.jobs.sh inc-updater-start
[23:02:00] ori: you have that?
[23:02:06] well, root does
[23:02:17] sounds right
[23:02:28] until the next restart
[23:02:41] not good
[23:02:42] right
[23:02:49] better?
[23:03:13] the shell script backgrounds that, but I don't see it nohup it
[23:03:30] it certainly could someplace
[23:03:52] i have no idea, just figured i'd look
[23:03:56] thanks
[23:04:17] if you have sudo on the machine, look through root's .bash_history for a poor man's server admin log
[23:04:29] I can't sudo on that machine
[23:04:44] I would have done that though :(
[23:05:15] I have to head out soon too.
[23:05:26] We might be able to get away with leaving it for the morning
[23:05:37] I don't really know how urgent it is
[23:05:41] I mean, we want it fixed
[23:05:45] certainly
[23:12:46] Is manifests/iptables.pp the current best practice for adding iptables rules in puppet?
[23:14:17] seems to be used to add services mostly, but I need to use it to add a redirect to the NAT table....
[23:16:31] cajoel: I think ferm is the way to go, but I'm not sure.
[23:17:49] cajoel: ferm
[23:17:51] definitely
[23:18:24] LeslieCarr: found documentation!
[23:18:29] oh yay
[23:18:33] https://wikitech.wikimedia.org/wiki/Search#Indexing
[23:18:45] says
[23:18:46] root@searchidx1001:~# su -s /bin/bash -c "/a/search/lucene.jobs.sh inc-updater-start" lsearch
[23:18:49] like we thought
[23:18:55] wow, seriously ?
[23:18:56] ugh
[23:19:00] why is that not on startup
[23:19:11] ugh indeed
[23:19:22] cajoel: scfc_de: a potential issue with ferm is that the default policy is to drop :-]
[23:20:37] manybubbles|away: I had an upstart job for lucene.jobs.sh :-D
[23:20:49] but eventually we forgot about it since we wanted to migrate to something else
[23:21:06] Where the hell is $wmgUseDualLicense used?
[23:25:45] !log started indexing on searchidx1001 with " su -s /bin/bash -c "/a/search/lucene.jobs.sh inc-updater-start" lsearch"
[23:25:52] Logged the message, Mistress of the network gear.
[23:25:54] hashar: thanks for the ferm hookup... maybe we should deprecate our iptables.pp if ferm is the new new hotness.
[23:26:40] LeslieCarr: manybubbles|away: here is the old puppet change to wrap lucene inc-updater-start in an upstart job: https://gerrit.wikimedia.org/r/#/c/55406/ :D
[23:26:54] cajoel: well, that is being slowly migrated as time allows
[23:27:26] hi, matanya!
[23:27:31] sorry I missed this.
[23:27:54] I am off to bed *wave*
[23:53:11] (03CR) 10Mwalker: "@ori; good catch :[" (0316 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/102352 (owner: 10Mwalker)
[23:54:53] TimStarling: is F30 the highest scap can go, even with the proxies?
[23:58:16] (03PS10) 10Mwalker: Collection Renderer (Now a module!) [operations/puppet] - 10https://gerrit.wikimedia.org/r/102352
[23:58:52] no, I think I just neglected to increase it after I introduced the proxies