[00:00:21] Yay, error 500 [00:01:17] sweet :) [00:05:44] 1 isn't much use [00:09:14] Server: nginx/0.7.65 [00:09:15] o_0 [00:11:31] X-Cache: MISS from cp1004.eqiad.wmnet,MISS from cp1012.eqiad.wmnet [00:11:31] X-Cache-Lookup: MISS from cp1004.eqiad.wmnet:3128,MISS from cp1012.eqiad.wmnet:80 [00:13:41] Are we using nginx for api app servers? [00:14:20] /proxy [00:41:36] New patchset: Tim Starling; "Restrict NFS exports" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7765 [00:41:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7765 [00:46:49] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7765 [00:46:52] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7765 [00:53:03] "homeless apache servers" ? [00:53:11] Yeah :) [00:53:39] PROBLEM - Puppet freshness on search22 is CRITICAL: Puppet has not run in the last 10 hours [00:53:53] I think that how much we deal with homeless Apaches is appropriate for our respective neighborhoods [00:59:50] We should get people to donate to help them buy homes [01:02:29] heh [01:41:24] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 232 seconds [01:44:15] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds [02:23:52] morebots is AWOL [02:30:34] damn it. wikitech doesn't pull from our repo [02:30:52] I have morebots packaged and updated [02:31:00] We don't miss her [02:31:53] I like using !log and it appearing in my twitter stream [02:34:49] arrrrrggghhh [02:35:14] Reedy: The bot is very anti-social to be on a social network :D [02:35:23] It's submissive [02:35:41] Oh so you just like dominating it? :D [02:36:12] It's a hussie [02:36:18] It suggests most people are master [02:36:26] ok, it's back [02:36:27] PROBLEM - Frontend Squid HTTP on cp1004 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error [02:37:11] thanks [02:39:03] RECOVERY - Frontend Squid HTTP on cp1004 is OK: HTTP OK HTTP/1.0 200 OK - 27545 bytes in 0.109 seconds [03:03:12] RECOVERY - Puppet freshness on search22 is OK: puppet ran at Wed May 16 03:03:03 UTC 2012 [03:03:12] PROBLEM - Frontend Squid HTTP on cp1004 is CRITICAL: HTTP CRITICAL: HTTP/1.0 504 Gateway Time-out [03:05:16] New review: Krinkle; "Maybe we need a Settings file for labs and for production and CommonSettings.php includes the right ..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/7706 [03:05:48] New review: Krinkle; "Maybe use PrivateSettings for that (which isn't in the repo, but is included)" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/7706 [03:09:17] New review: Krinkle; "(no comment)" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/7577 [03:10:36] Nemo_bis: I recall you showing interest in dbbot-wm . FYI, the code is now up on GitHub. including documentation. 
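
An aside on the headers quoted near the top of this log (Server: nginx/0.7.65, X-Cache / X-Cache-Lookup MISS from cp1004 and cp1012): a minimal sketch, not from the log, of pulling those response headers to see which cache layer answered a request. The URL is an arbitrary example.

    <?php
    // Fetch the response headers for a request and print the ones discussed above.
    $url = 'https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json';
    $headers = get_headers( $url, 1 );   // 1 = return an associative array

    foreach ( array( 'Server', 'X-Cache', 'X-Cache-Lookup' ) as $name ) {
        if ( isset( $headers[$name] ) ) {
            $value = is_array( $headers[$name] ) ? implode( ', ', $headers[$name] ) : $headers[$name];
            echo "$name: $value\n";
        }
    }
    // "MISS from cp1004.eqiad.wmnet, MISS from cp1012.eqiad.wmnet" means neither the
    // backend nor the frontend squid had the object cached; the Server header reflects
    // whichever backend actually produced the response body.
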
[03:10:52] https://github.com/Krinkle/ts-krinkle-Kribo https://github.com/Krinkle/ts-krinkle-wmfDbBot [03:14:27] !log depooling cp1004 and stopping the squid backend service to let some connections close [03:14:32] Logged the message, Master [03:16:06] RECOVERY - Frontend Squid HTTP on cp1004 is OK: HTTP OK HTTP/1.0 200 OK - 27543 bytes in 0.113 seconds [03:17:54] PROBLEM - Backend Squid HTTP on cp1004 is CRITICAL: Connection refused [03:22:47] !log repooling squid frontend on cp1004 [03:22:51] Logged the message, Master [03:29:18] RECOVERY - Backend Squid HTTP on cp1004 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.108 seconds [03:30:41] New patchset: Hashar; "redirect (302) /w/ to /w/index.php" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/7772 [03:34:46] !log stopped the squid process on cp1004 and stopped puppet to avoid it being restarted. it's having issues and I can't debug it right now. [03:34:50] Logged the message, Master [03:35:26] @externals [03:35:26] Krinkle: [all.dblist] last update: 2012-05-16 01:40:02 (UTC); [db.php] last update: 2012-05-16 01:40:02 (UTC) [03:36:39] PROBLEM - Backend Squid HTTP on cp1004 is CRITICAL: Connection refused [03:45:39] Hey guys I'm getting constant API errors now, mostly from cp1005.eqiad.wmnet [03:45:49] New patchset: Hashar; "disable "last message repeated n times"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7773 [03:46:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7773 [03:48:15] kaldari: What's the error msg? [03:48:41] same as it's been all afternoon: Error: ERR_SOCKET_FAILURE, errno (98) Address already in use [03:49:08] Ugh [03:49:19] And Ryan is on his way home [03:49:22] He fixed cp1004 earlier [03:49:29] Yeah [03:49:49] Constant? [03:49:55] not anymore [03:50:01] it's just intermittant now [03:50:14] but I was getting it constantly for a few minutes [03:50:19] Yeaaah [03:51:08] I missed Ryan by 10 minutes [03:51:32] New review: Hashar; "$cluster is going to be used to include labs specific settings and, in some files, to tweak settings." [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/7706 [03:51:38] I think he had somewhere to be [03:51:48] i suspect there's only really TimStarling around atm.. 
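
The errno (98) in kaldari's API errors above is EADDRINUSE. A standalone sketch, using PHP's sockets extension and an arbitrary loopback port, that reproduces the same failure mode (two sockets competing for one local address/port pair):

    <?php
    // Reproduce "Address already in use" (errno 98, EADDRINUSE), the error the API
    // squids are surfacing above. The loopback address and port are arbitrary.
    $a = socket_create( AF_INET, SOCK_STREAM, SOL_TCP );
    $b = socket_create( AF_INET, SOCK_STREAM, SOL_TCP );

    socket_bind( $a, '127.0.0.1', 18080 );   // first bind succeeds
    socket_listen( $a );

    if ( !@socket_bind( $b, '127.0.0.1', 18080 ) ) {
        $errno = socket_last_error( $b );
        // On Linux this prints: bind failed: errno 98 (Address already in use),
        // i.e. no usable local address/port pair was available for this socket --
        // the same condition squid reports for its outbound connections.
        echo "bind failed: errno $errno (" . socket_strerror( $errno ) . ")\n";
    }
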
[03:52:00] New patchset: Hashar; "override $cluster when on labs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7706 [03:52:14] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7706 [03:52:17] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7706 [03:53:25] New patchset: Hashar; "implements beta labs specific domains" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7705 [03:53:58] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7705 [03:54:00] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7705 [03:54:26] It seems to be failing about 10-20% of the time on API POST requests for me - all cp1005 as far as I can tell [03:54:29] judging by the error message, Ryan just shut down cp1004, he didn't fix it [03:54:42] I mean the log message [03:55:01] Yeah, he said he didn't have time to debug it [03:57:10] yeah, it's spamming the syslog at a high rate [03:57:43] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [04:02:06] mutante mentioned that ops had to upgrade the kernel on the API machines recently, but I don't know the details [04:04:36] Most likely for the uptime bug [04:04:42] bind(4306, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [04:05:09] !log updating a few plugins on Jenkins (host: gallium ) [04:05:13] Logged the message, Master [04:05:31] doesn't seem right [04:07:01] ok, that is right [04:10:01] kaldari: Ryan despooled cp1005 [04:10:23] kaldari: ho sorry, you were already aware about it :-D [04:10:32] how? [04:10:42] you mean cp1004? [04:11:03] yeah [04:11:05] well, it's the backend process that's an issue [04:11:05] I'm going to turn it off and stop puppet [04:11:10] yeah cp1004 [04:11:14] sorry for the confusion tim [04:12:14] I'm not sure why it is calling bind for outbound connections, is that necessary? [04:15:47] I think so [04:15:53] Man it's been forever since I've done C socket programming [04:17:32] Hah I guess you're right [04:17:39] You don't need to bind() an outgoing socket [04:18:39] Or at least not explicitly [04:18:46] But I seem to recall that connect() calls bind() or something [04:21:43] PROBLEM - Frontend Squid HTTP on cp1005 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error [04:23:13] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [04:23:22] RECOVERY - Frontend Squid HTTP on cp1005 is OK: HTTP OK HTTP/1.0 200 OK - 27542 bytes in 0.116 seconds [04:30:15] it looks like you can use bind() to specify a local interface for an outbound connection [04:31:40] the relevant interface is configurable in squid [04:44:39] timstarling - check out -http://ganglia.wikimedia.org/latest/?c=Application%20servers%20pmtpa&m=load_one&r=hour&s=by%20name&hc=4&mc=2 [04:57:47] it looks like something happened about 8 hours ago judging by the graphs here: http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=load_one&s=by+name&c=Application+servers+pmtpa&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [04:58:37] that's around the time we did the deploy for PageTriage to en.wiki [05:02:59] would turning off PageTriage in the en.wiki config be worth trying? 
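
An aside on the bind() question above: a minimal sketch of what the strace line (bind to 0.0.0.0 with port 0, then connect) corresponds to for an outgoing connection. The destination address and port are placeholders, and the closing comment is a general statement about ephemeral port exhaustion rather than a confirmed diagnosis of this incident.

    <?php
    // Squid explicitly bind()s an *outgoing* socket to port 0 before connect();
    // port 0 just asks the kernel for an ephemeral port, and an explicit bind is
    // also how a specific source address would be pinned (cf. squid's
    // tcp_outgoing_address directive).
    $s = socket_create( AF_INET, SOCK_STREAM, SOL_TCP );

    // Optional for outbound traffic -- connect() binds implicitly if this is skipped.
    socket_bind( $s, '0.0.0.0', 0 );

    if ( @socket_connect( $s, '10.2.1.1', 80 ) ) {           // placeholder destination
        socket_getsockname( $s, $addr, $port );
        echo "connected, local side is $addr:$port\n";        // the ephemeral port picked
    } else {
        echo 'connect failed: ' . socket_strerror( socket_last_error( $s ) ) . "\n";
    }
    // When the ephemeral port range is exhausted (for example by a very large
    // number of sockets sitting in TIME_WAIT), an explicit bind() like this is
    // one place EADDRINUSE shows up.
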
[05:04:08] TimStarling, Reedy: ^ [05:08:46] ctwoo: ^ [05:09:32] i would defer to tim/roan or sam on this matter [05:15:06] * kaldari whistles to himself [05:15:44] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [05:17:05] PROBLEM - Lucene on search1015 is CRITICAL: Connection timed out [05:22:08] Krinkle, thanks; is it linked on Meta somewhere? [05:22:14] or at least wikitech [05:22:24] What is linked ? [05:22:40] Krinkle, dbbot source code [05:22:44] https://www.mediawiki.org/wiki/dbbot-wm [05:22:45] you highlighted me a while ago [05:22:47] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours [05:22:48] ok thanks [05:22:51] * Nemo_bis out now [05:24:17] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [05:27:50] those ganglia graphs are broken [05:28:33] i c [05:29:54] !log experimentally started squid on cp1004 [05:29:57] Logged the message, Master [05:30:10] let's see what happens... [05:30:26] RECOVERY - Backend Squid HTTP on cp1004 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.107 seconds [05:30:53] it's the same [05:30:57] just got the socket failure error for cp1004 [05:31:05] I'm stopping it again [05:32:45] now back to cp1005 socket error [05:34:53] RECOVERY - Lucene on search1015 is OK: TCP OK - 9.020 second response time on port 8123 [05:35:47] PROBLEM - Backend Squid HTTP on cp1004 is CRITICAL: Connection refused [05:38:38] hey Ryan [05:38:45] hi [05:38:47] sup? [05:38:53] looks like we're having the same problem from cp1005 as we were having from cp1004 [05:39:02] * Ryan_Lane grumbles [05:39:17] Tim can probably fill you in on more [05:39:40] I haven't found out much [05:39:40] seems the new version of ganglia isn't working very well on the app servers [05:39:44] sounds like i should scroll up [05:39:51] yeah, it's broken [05:40:15] yet its fine on all the other hosts... [05:40:16] weird [05:40:17] but the problem is pretty hard to see on ganglia [05:40:21] yeah [05:40:33] you can see extra system CPU [05:40:38] all of the sockets are being used up [05:41:15] bind() fails, but I'm not sure how that is possible [05:41:16] I didn't get much time to investigate [05:41:22] the FD count according to cachemgr is quite low [05:42:26] 418 for the backend and 6700 for the frontend [05:42:38] that seems awfully low [05:42:42] not enough to cause an ephemeral port exhaustion or anything like that [05:44:02] PROBLEM - Lucene on search1015 is CRITICAL: Connection timed out [05:44:31] 93020 connections in time wait [05:44:50] if it helps at all, I first noticed the problem about 7+ hours ago, around 22:00 UTC [05:45:21] the rise in system CPU was around 22:30 on cp1005 [05:45:44] what limit do time_wait sockets count towards? [05:46:14] * TimStarling googles [05:46:14] let's see.... [05:46:20] heh. 
doing the same [05:46:44] RECOVERY - Lucene on search1015 is OK: TCP OK - 0.027 second response time on port 8123 [05:47:48] net.ipv4.ip_local_port_range [05:47:56] I believe [05:48:57] tcp_max_tw_buckets [05:49:24] one site recommends setting tcp_tw_recycle=1 [05:49:45] net.ipv4.tcp_max_tw_buckets = 360000 <— so, it isn't that [05:50:11] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [05:50:48] but tcp_tw_recycle is 0 on cp1003 [05:51:17] question is, why are we seeing so many more TIME_WAIT on cp1005 [05:51:28] I saw the same thing on cp1004 earlier [05:52:08] of course, this could simply be a red herring. [05:52:12] it could [05:52:27] but if I set this thing to 1 then the TIME_WAIT connections should go away and we can get on with other theories [05:52:32] just in soft state [05:52:50] !log on cp1005 setting tcp_tw_recycle=1 [05:52:53] Logged the message, Master [05:53:01] * Ryan_Lane nods [05:53:55] * jeremyb seems to be caught up. doesn't seem to be much I can do to help at this point (from the outside) [05:54:24] down to 34k now [05:54:49] 11k, seems to have plateaued [05:55:32] no more bind errors! [05:55:35] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 192 seconds [05:55:53] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 196 seconds [05:56:02] PROBLEM - Lucene on search1015 is CRITICAL: Connection timed out [05:56:12] "Enabling this option is not recommended since this causes problems when working with NAT" <— oh no! we're going to break everything! [05:56:14] heh [05:56:31] my reference is http://www.stolk.org/debian/timewait.html [05:56:49] I'll try it on cp1004 [05:58:17] !log setting net.ipv4.tcp_tw_recycle=1 on cp1005 seems to have fixed it, doing it on cp1004 as well now [05:58:20] Logged the message, Master [05:58:54] RECOVERY - Backend Squid HTTP on cp1004 is OK: HTTP OK HTTP/1.0 200 OK - 27400 bytes in 0.162 seconds [06:01:25] seems to be fixed on my end [06:02:20] a lot of TIME_WAITs were from the appservers [06:02:40] in fact, it looks like the majority of them were [06:03:13] that makes sense, though [06:03:29] does this version of squid not support connection pooling? [06:06:24] dunno [06:07:42] actually, the properly behaving systems don't have a lot of TIME_WAITs to the appservers [06:07:45] what about pipelining? or maybe they're the same thing? [06:10:17] persistent connections is what I'm thinking of, really [06:10:49] maybe these backends were affected because they hold some special high-traffic URLs [06:10:53] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [06:11:11] they are api squids [06:11:39] if they were api squids, why would they have connections to appservers.svc.pmtpa.wmnet ? [06:11:45] isn't there another VIP for API? [06:11:54] yes [06:12:05] haha, used some generic terms in google and [[Manual:Squid caching]] was the second hit [06:12:05] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 3.029 second response time on port 8123 [06:12:46] they are definitely listed as API in text-settings, though [06:13:36] separate issue: search1015 seems to need a boot. 
(is intermittent in nagios about same as 1016 a few days ago) [06:14:01] (that's the pool4 recovery that just happened) [06:16:06] night guys, thanks for saving Wikipedia agains ;) [06:16:14] kaldari: night [06:18:58] cachemgr shows an equal number of requests delivered to appservers.svc and api.svc [06:20:02] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [06:20:02] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [06:20:16] !log restarted lucene on search1015 [06:20:20] Logged the message, Master [06:20:44] seems lucene is possibly misconfigured, but that's unrelated to the process being locked up [06:20:56] RECOVERY - Lucene on search1015 is OK: TCP OK - 0.027 second response time on port 8123 [06:22:05] tcpdump shows it too [06:22:10] odd [06:22:25] well, it does have it configured in the file [06:22:33] I wonder why that's the case [06:23:24] the ones in pmtpa do as well, though [06:23:24] where is it configured? I have been looking at the configuration files and it looks fine [06:24:20] I was looking at the backend conf [06:24:34] cp1005:/etc/squid/squid.conf ? [06:24:40] yes [06:25:06] cache_peer_access 10.2.1.1 deny api_php [06:25:18] ah. right [06:25:19] that should prevent api.php requests from going to 10.2.1.1, but I see them in tcpdump [06:25:29] that makes no sense at all [06:25:54] wait [06:25:54] cache_peer_access 10.2.1.1 allow all [06:26:19] yeah, the first matching rule takes precedence [06:26:22] or at least it's meant to [06:27:03] it's configured the same way in pmtpa [06:28:05] pmtpa is also talking to the appservers [06:28:11] on the api squids [06:29:04] "And, finally, don't forget rules are read from top to bottom. The first rule matched will be used. Other rules won't be applied." [06:29:26] http://www.visolve.com/system_services/opensource/squid/squid30/accesscontrols-4.php#http_access [06:29:35] seems not to be working correctly, then :) [06:29:50] if it's changed, everything would break [06:30:41] I've got to go get some groceries [06:30:48] the site's up isn't it? [06:31:09] yeah [06:31:19] the tw recycle change is working for now [06:31:26] I also need to go away [06:31:28] I need to pack [06:32:23] well, with this configuration the api traffic isn't hitting the app servers, but other requests are. [06:32:31] so it makes sense that there is traffic hitting them [06:34:40] I wonder if one of the api servers is acting up [06:35:14] damn it [06:35:18] ding ding ding [06:35:39] ? [06:35:49] 2012-05-16 06:35:09 srv204 enwiki: [7d70df73] /w/api.php Exception from line 44 of /usr/local/apache/common-local/php-1.20wmf2/extensions/PageTriage/api/ApiPageTriageTemplate.php: ApiPageTriageTemplate::execute: template file not found: "/usr/local/apache/common-local/php-1.20wmf2/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/ext.pageTriage.toolbarView.html" [06:36:00] about a billion of those errors [06:36:03] baaaahhhh [06:36:05] a constant stream [06:36:14] did someone fuck up a deploy? [06:36:30] maybe [06:36:48] heh. people don't check the exception log after they deploy something? :D [06:37:07] * jeremyb is hanging getting into bastion1.pmtpa.wmflabs. 
very likely unrelated to all of this [06:37:42] hm [06:37:44] is for me too [06:37:46] that isn't good [06:38:09] home dirs are hanging [06:38:32] labs-nfs1 is having issues [06:38:36] might need to rebootit [06:39:48] OATHAuth is going to make me fix these session issues with labs quicker I can already tell [06:40:05] rotfl [06:40:11] using it already? [06:40:18] yep [06:40:25] I'm likely going to force cloudadmins to use it [06:40:53] i heard [06:41:46] well sure enough that html file ain't in there [06:41:53] I'll see if I can find it somewhere in the tree [06:42:17] question is, is it supposed to be there, or not supposed to be referenced anymore? [06:42:40] I don't know, but if it's referenced then it can darn well be there for right now [06:42:40] hm. looks like labs-nfs1 reboot itself [06:42:46] yeah [06:44:04] well that's bad [06:44:46] well, it probably reboot itself because it patched and needed a reboot [06:45:00] it's not a great instance to have that happen on, though. heh [06:45:21] not in wnf2 or wmf3. so looks like broken code. meh [06:45:40] it could be in an older version of the extension [06:45:43] did you chek its log? [06:46:16] no, I'm about to do that now [06:47:31] that missing file was introduced in https://gerrit.wikimedia.org/r/7764/ (~6 hrs ago) [06:48:14] I see the refereence is in some javascript here [06:49:02] maybe it wasn't deployed? [06:49:37] yep [06:49:40] wasn't deployed [06:49:52] maybe we should page kaldari [06:50:28] I'm going to page him [06:50:30] good [06:50:31] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [06:50:56] what determines what version of an extension is deployed? [06:50:59] it's just bordeline getting too late i.e. I don't yet feel guilty about it [06:51:28] the extensions megarepo just has master and no other branches [06:51:37] quick hack so the unfinished toolbar won't load if this code gets randomly deployed, since it's in master (oops!) [06:51:40] that's awesome :-D [06:51:49] it could be causing the issue. [06:52:00] it's for sure spamming the hell out of the logs [06:54:35] well something is causing that piece to execute,. master or no. so yeah let's hope he shows up in here soon [07:00:08] who is raindrift? that's our other option [07:00:08] https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/PageTriage.git;a=commitdiff;h=d80183f6a69d14a14d77b929ae8b6b9638e84a23 [07:03:27] ian [07:04:11] could ping him I guess [07:05:52] 21:09 logmsgbot: raindrift synchronizing Wikimedia installation... : PageTriage update [07:06:01] that's from yesterday [07:06:05] * Ryan_Lane nods [07:06:45] you wanna ping him? (I have a soft voip phone so I'm restricted to actual phone calls I guess) [07:07:22] but I am happy to sit in here and work it out with whoever shows up (as in you should go to bed) [07:07:36] pages [07:07:36] *paged [07:07:37] thanks [07:07:40] I still need to pack. heh [07:07:46] ok. well get packing :-D [07:07:50] and now labs is having some issue [07:07:56] oh. right :-/ [07:08:03] I bet you anything it's due to gluster again [07:08:24] gluster is turning out to be a real ball-buster isn't it (scuse the language) [07:08:27] I'm hoping the load goes back down to normal [07:08:31] yes. 
it's killing me [07:08:33] so nice in theory [07:08:41] so fubar in real life [07:09:16] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [07:09:17] if there is something you would want me to try/do if it doesn't behave, I can babysit it [07:09:35] I'm thinking it's actually due to labs-nfs1 [07:10:32] raindrift: heh. funny autocorrect too [07:10:41] hello raindrift [07:10:41] totally [07:10:43] hi! [07:10:45] sorry for the page, [07:10:49] so, I guess our stuff is broken? [07:10:53] what's going on is a pile of errors in the logs: [07:10:55] (yeah) [07:11:01] raindrift: check /home/w/log/exception.log [07:11:05] 2012-05-16 06:35:09 srv204 enwiki: [7d70df73] /w/api.php Exception from line 44 of /usr/local/apache/common-local/php-1.20wmf2/extensions/PageTriage/api/ApiPageTriageTemplate.php: ApiPageTriageTemplate::execute: template file not found: "/usr/local/apache/common-local/php-1.20wmf2/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/ext.pageTriage.toolbarView.html [07:11:07] like these [07:11:15] and so there is no View.html yet of course [07:11:25] er ext.pageTriage.toolbarView.html [07:11:56] what I don't understand is why that stuff is being executed at all [07:12:57] i don't either. [07:13:07] grrrr... I love it when other european leaders tell us what our elections are for. (sorry, have morning news program on. prolyl shouldn't, it just pisses me off) [07:13:19] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [07:13:24] I am guessing that it has to do with yesterday's deployment, what else could it be [07:13:42] anyway, the fix for this is to make the api not throw those errors, since you can request any template file. i'll go fix it. just a minute. [07:13:48] ok [07:13:51] does it need to be deployed at all? or is it just for e.g. test/test2? [07:13:53] well, the file is actually missing [07:13:59] sure. [07:14:10] but it's for a feature that should be disabled. [07:14:13] ah [07:14:32] the toolbar's not done yet, and is therefore turned off. [07:14:39] at least in theory :-D [07:14:54] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [07:15:03] i don't understand what the procedure is to get the current deployed commit id/tag for an extension. PageTriage doesn't even have tags or branches (besides master) it seems [07:15:28] no branches. right [07:19:41] anyone feel like reviewing a change? https://gerrit.wikimedia.org/r/7775 [07:20:40] what happens when that returns false (I don't feel like digging through the rest of the code)? [07:21:17] apergos: look at the 3 lines above [07:21:56] that doesn't answer my question [07:22:12] yeah, maybe that should just return, actually. [07:22:23] what does the caller do when execute returns false, that was my q [07:22:25] ok [07:24:07] okay, fixed [07:25:50] so here I am at patchset two and it says return false... [07:26:28] wtf [07:26:47] there's no difference between the patchsets [07:26:59] ok, so it's not just me hatin' on gerrit's ui [07:27:01] besides commit msg [07:27:49] sorry, i'm still not used to specifying "-a" every time I want to commit. [07:28:00] should be better now. [07:28:33] the thing I don't get is why that js code is being executed at all. It shouldn't be. 
The only way it should run is if $wgPageTriageEnableCurationToolbar is set, which it isn't. [07:28:50] worth testing but given the hour where you are, not right now [07:29:07] (unless you are now so irritated that you have to fix it before you go to bed :-D) [07:29:15] no, i have things to do in the morning. [07:29:23] merged [07:29:31] awesome. thanks. [07:29:38] should i deploy this? [07:29:48] * apergos grits teeth [07:29:56] what else would need to go out with it? [07:30:06] i mean, it's one file. i could just sync the file. [07:30:24] yes, just the one file [07:30:35] sounds good. i'll do that. [07:32:20] prolly worth figuring out why the code is executed soon-ish (next couple of days) [07:35:11] yeah. i'm working on that section actively, so i'll do it tomorrow. [07:35:28] cool, I'm curious to learn the answer too :-D [07:37:00] * jeremyb just subscribed to notifications for all changes on the PageTriage repo ;-P [07:37:07] heh [07:37:39] oooooh, yay, there's an option now to not use that flash clipboard crap [07:37:54] *flash* clipboard? eeewww [07:39:11] there's always been that option [07:40:08] I have never noticed it, (which is a good thing) [07:41:15] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [07:42:47] okay. sync'd. [07:43:04] raindrift: thanks [07:43:06] no more errors [07:43:07] i'm going to bed now. i'll have my phone if y'all need me. [07:43:11] ok. thanks [07:43:17] (yah the log looks lots better) [07:43:27] thanks for letting me know. still weird that it was doing that at all. [07:43:47] like, that RL module shouldn't even have been loaded. But, whatever. I'll work it out later. Good night! [07:43:51] we live in the wierd zone, what can I say [07:43:55] night! [07:44:08] ok. I better pack [07:44:11] go go go [07:44:24] I think I fixed the labs issue [07:44:29] oh yay [07:44:32] so what was it? [07:44:32] stupid NFS server causing cascading failure [07:44:39] ah nfs [07:44:44] the bane of our existence [07:44:47] well, it's an nfs instance [07:44:47] Ryan_Lane: well i guess i didn't look so closely before then. certainly would have invoked it earlier if i'd seen it [07:47:27] oh sweet. my flight has wifi [07:48:20] lucky! [07:49:04] wow [07:49:10] and the exit row was open and doesn't cost extra [07:50:22] and it's the long flight :D [07:50:25] \o/ [07:51:45] sweet! [07:51:52] well booked [07:52:18] it is the middle of the week... 
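
On the toolbar question above, a hypothetical sketch of the gating being described, not the actual PageTriage code or the committed fix: the module name is taken from the missing-file path quoted earlier in the log, and the hook-based shape is an assumption.

    <?php
    // Hypothetical sketch: only ask ResourceLoader for the unfinished toolbar module
    // when $wgPageTriageEnableCurationToolbar is set, so none of its API calls (or
    // the missing .html template) are reachable in production.
    $wgHooks['BeforePageDisplay'][] = function ( $out, $skin ) {
        global $wgPageTriageEnableCurationToolbar;

        if ( !empty( $wgPageTriageEnableCurationToolbar ) ) {
            $out->addModules( 'ext.pageTriage.views.toolbar' );
        }
        return true;
    };
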
[07:52:30] but yeah, luck too [07:53:32] the second flight charged for the exit row [07:53:45] it's exit row and aisle at that :) [07:57:00] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [07:57:32] nacht [08:07:13] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7726 [08:07:15] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7726 [08:12:53] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [08:13:45] @docs [08:13:45] Krinkle: https://www.mediawiki.org/wiki/dbbot-wm [08:13:45] @externals [08:13:46] Krinkle: [operations/mediawiki-config.git] Checked out HEAD: 49ce19eeca5b8238e096b089222312f9258285be - https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-config.git;a=commit;h=49ce19eeca5b8238e096b089222312f9258285be [08:13:57] @info db36 [08:13:57] Krinkle: [db36: s1] 10.0.6.46 [08:14:03] @info 10.0.6.46 [08:14:03] Krinkle: [10.0.6.46: s1] db36 [08:14:44] great, its working. No longer fetching *.php.txt from noc.wikimedia.org [08:18:44] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [08:26:59] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [08:53:23] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [08:55:11] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 190 seconds [08:56:05] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 213 seconds [09:08:52] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , trwiktionary (13946) [09:11:34] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:17:52] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:18:37] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , trwiktionary (17067) [09:26:25] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:27:46] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:27:46] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:29:52] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [09:29:52] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [09:49:31] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [09:55:13] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [10:18:35] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:41:59] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 
processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:42:44] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:43:17] New patchset: ArielGlenn; "verify toc of tarballs; clean up dup code" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7783 [10:44:49] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7783 [10:44:51] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7783 [10:57:39] New patchset: Lcarr; "adding in analytics1-b-eqiad subnet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7784 [10:57:44] mutante: wanna review that? ^^ [10:57:56] on it [10:57:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7784 [10:58:25] mixed tabs/spaces [10:58:46] amending:) [11:00:03] ah i just copied and pasted the labs one ;) [11:00:04] meh, fetching from gerrit.. and waiting.. [11:00:06] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [11:00:24] timeout from gerrit..wth [11:00:26] copy and pasting errors is bad style ;-) [11:01:02] hehe [11:03:42] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [11:03:59] New patchset: Dzahn; "adding in analytics1-b-eqiad subnet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7784 [11:04:01] replaced "labs-hosts1-b-eqiad " [11:04:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7784 [11:05:06] meh :) one more [11:07:07] also waiting on push-.. i have connection issues..hmmm [11:09:21] New patchset: Dzahn; "adding in analytics1-b-eqiad subnet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7784 [11:09:24] there [11:09:41] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7784 [11:13:31] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7784 [11:13:33] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7784 [11:19:18] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [11:55:09] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:01:39] New patchset: ArielGlenn; "besides add missing arg, don't quite verification after first missing file" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7786 [12:02:39] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7786 [12:02:41] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7786 [12:17:10] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:21:58] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:25:16] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:30:58] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:31:16] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [12:42:40] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [13:01:52] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [13:11:01] PROBLEM - Puppet freshness on cp1004 is CRITICAL: Puppet has not run in the last 10 hours [13:27:05] hiiiiiiyaaaa [13:27:59] hello ottomata [13:28:16] someone's... enthusiastic today :P [13:29:13] hi there! [13:29:20] heh [13:29:23] aye! [13:29:29] got stat1 waiting for precise! [13:29:32] thought I'd poke a bit [13:29:32] https://rt.wikimedia.org/Ticket/Display.html?id=2946 [13:31:34] New patchset: Ottomata; "{role,misc}/statistics.pp - installing generic mysqld on stat1." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7285 [13:31:55] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7285 [13:32:24] New review: Ottomata; "Oooook, using generic::mysql::* classes is better." 
[operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/7285 [13:52:34] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [13:59:10] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:03:31] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:13:54] hi [14:14:19] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:14:55] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:16:34] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:19:31] hashar: you free for a minute? [14:19:37] sure [14:19:38] hashar: welcome :) [14:19:44] paravoid: just woke up :-(((( [14:19:46] at 4pm!!! [14:19:50] heh [14:20:04] my sleep habits are totally broken doh [14:20:04] I just woke up too :( [14:20:19] the good news are, I shifted my schedule a bit and run some errands in the morning, so I'll be available until late in case you need me [14:20:22] kaldari: are you in Europe? :-( [14:20:29] California [14:20:36] 7am here [14:20:54] so that is about right to wake up!!! Plus the sun is already shinning outside. Perfect time to wake up! [14:21:02] anyway, what were you willing to ask? [14:21:15] Ryan Lane mentioned that PageTriage extension was throwing exceptions. Can you tell how often? [14:21:35] paravoid: I am out this evening meeting some friends, so will probably stop working in 2 hours roughly [14:22:02] I'm pretty sure I know why it was throwing exceptions (due to a race condition), but I wanted to find out how bad the problem is [14:22:13] kaldari: too there was like a hell a lot of errors this morning (aka like 8 hours ago) about some view file missing ( a .html ) [14:22:17] do you have access to the cluster? [14:22:30] not shell access, no [14:22:30] ' [14:22:47] that is a good thing (ask brion) [14:22:49] although I will later this week probably :) [14:22:53] dont!!! [14:23:01] it will eat all your available time ahaha [14:23:02] I've tried to avoid it for years [14:23:10] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:23:14] but Terry requested it for me [14:24:01] oh my god [14:24:21] it's that bad? [14:24:26] no no [14:24:28] not your [14:24:29] another issue [14:24:32] will see that later [14:24:56] basically, we have all exceptions logged in ONE file [14:25:01] so that is sometime a bit impressive [14:25:07] hooray! [14:25:09] i bet [14:25:17] oh I remember someone fixed an issue with PageTriage this morning [14:25:23] logmsgbot: raindrift synchronized php-1.20wmf2/extensions/PageTriage/api/ApiPageTriageTemplate.php 'fixing exception bug that makes lots of logspam' [14:25:27] and in 1.20wmf3 too [14:25:32] raindrift was the user [14:25:37] ah cool [14:25:54] when the hell did he do that? [14:25:57] https://wikitech.wikimedia.org/view/Server_admin_log [14:26:06] at 7:41 UTC [14:26:19] aka just before 11pm for ya? 
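
Related to hashar's point below that every exception lands in one shared file: MediaWiki can split an extension's noise into its own log group. A sketch of that configuration; the group name and the udp2log destination are placeholders, not what production actually uses.

    <?php
    // LocalSettings-style fragment: route one extension's log traffic to its own
    // channel instead of the shared exception/debug files. Placeholder destination.
    $wgDebugLogGroups['pagetriage'] = 'udp://udp2log.example.wmnet:8420/pagetriage';

    // In the extension, log recoverable problems to the group rather than throwing;
    // wfDebugLog() is a no-op for unconfigured groups, so this is safe everywhere.
    wfDebugLog( 'pagetriage', 'template file not found: ext.pageTriage.toolbarView.html' );
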
[14:26:50] more like midnight I think [14:28:17] so he fixed some exception [14:28:34] and at one point some view file ending in .html was missing on some apaches [14:28:34] is it throwing any currently? [14:28:43] which was throwing error [14:29:16] ah, that would explain a lot [14:29:39] only one exception since 9:30am UTC [14:29:43] which is midnight for you [14:29:43] oh well, I guess I'm late to the party [14:30:10] I knew I shouldn't have gone to sleep :) [14:30:26] well the good thing is that some other people knew how to fix it ! [14:30:42] the only exception I get is [14:30:43] 2012-05-16 10:17:32 mw26 enwiki: [63730017] /w/api.php?action=pagetriagelist&limit=1000&namespace=0 Exception from line 1732 of /usr/local/apache/common-local/php-1.20wmf2/includes/GlobalFunctions.php: Internal error in ApiFormatXml::recXmlPrint: (pages, ...) has integer keys without _element value. Use ApiResult::setIndexedTagName(). [14:30:56] want me to send it to you by email for later fixing? [14:31:07] yes please! [14:31:07] thnkas [14:31:28] rkaldari@wikimedia.org [14:32:10] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:33:21] anyone know what would cause a Could not update the journal database for storage backend "local-NFS". error when Special:Upload ing? [14:33:22] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:33:25] this is on wikimania2013 wiki [14:35:44] maybe Reedy ^ [14:36:09] kaldari: sent. Enjoy your breakfast :-] [14:36:18] thnkas :) [14:46:34] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [14:54:40] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [14:56:10] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.45 ms [14:57:11] PROBLEM - NTP on srv278 is CRITICAL: NTP CRITICAL: Offset unknown [14:57:38] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [15:00:56] RECOVERY - NTP on srv278 is OK: NTP OK: Offset 0.08215665817 secs [15:08:53] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.040 second response time [15:15:11] Jeff_Green: so the additional disks are in for aluminum [15:15:19] the raid controller is backordered, pinging dell again about it [15:15:19] yayyyyyy [15:15:23] booooo [15:15:31] lol [15:15:43] can we install disks today, assuming the FR folks are ok with the downtime? 
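
On the ApiFormatXml exception hashar pastes above: that message is the standard complaint when an API module returns a numerically indexed list without naming its elements for the XML formatter. A hypothetical sketch of the usual fix inside the module's execute(); the 'page' tag and the result shape are illustrative assumptions, not the actual pagetriagelist code.

    <?php
    // Inside the API module's execute(): give the integer-keyed list an element name
    // before adding it, so format=xml can serialise it as <page> tags.
    $pages = array(
        array( 'title' => 'Example one' ),
        array( 'title' => 'Example two' ),
    );

    $result = $this->getResult();
    $result->setIndexedTagName( $pages, 'page' );
    $result->addValue( null, 'pages', $pages );
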
[15:16:36] Yep, we can, just wanted to check with you on a downtime window [15:16:41] ok [15:17:10] i'll let you know as soon as FR folks appear and I can schedule it [15:18:28] cool [15:23:35] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: Puppet has not run in the last 10 hours [15:51:38] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [16:04:09] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [16:18:15] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [16:23:57] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [16:24:56] maplebed: https://gerrit.wikimedia.org/r/7798 [16:26:16] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7798 [16:26:18] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7798 [16:26:19] morning Patrick [16:28:59] !log deploying gerrit change 7798 to the mobile varnish servers [16:29:03] Logged the message, Master [16:29:21] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [16:31:15] !log clearing the mobile varnish cache [16:31:18] Logged the message, Master [16:35:42] * AaronSchulz keeps hearing the word "porno" [16:36:33] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [16:37:20] AaronSchulz: set fenari to be your socks proxy and load en.m.wikipedia.org [16:38:05] and you get pr0n? [16:38:22] \o/ [16:39:56] <^demon> That must be an awful problem to have. [16:40:13] Reedy: you there? [16:40:36] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [16:41:04] Reedy: can you tell me more about "enable centralauth logging to file" ? [16:41:11] it's making the apaches rather unhappy: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=load_one&s=by+name&c=Application+servers+pmtpa&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [16:41:19] could it be sent to nfs instead? [16:42:16] notpeter: nfs imposes higher iowait times; won't that just make the problem worse? [16:42:35] maplebed: via udp2log [16:42:39] maplebed: a porn article or actual porn? [16:42:56] it's how we do most of our agregated logging, as far as I know [16:43:10] preilly suggests that the header is inadventently getting cached (and is figuring out how to fix it.) [16:43:16] they go to nfs1/2 via udp2log [16:43:35] notpeter: you sure? not syslog? [16:44:05] * Damianz gives you some hadoop and a little scribe :D [16:44:06] apache error logs get sent to nfs via udp syslog, php logs via udplog [16:44:19] hurray! we're both right! [16:44:24] Damianz: hurry [16:45:13] * AaronSchulz is afraid to set a socks proxy [16:50:04] !log running ipblocks schema migration on all s7 dbs via osc [16:50:08] Logged the message, Master [16:50:48] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [16:50:54] RobH: what's feasible from your end re. time to work on aluminium? 
[16:51:01] !log udpating dns for osm web servers [16:51:04] Logged the message, RobH [16:51:20] Jeff_Green: im just racking stuff and working away the backlog, so can do it whenever [16:51:24] ah ok [16:51:37] i figure i will go do lunch sometime soon, but my afternoon is open as well [16:51:41] binasher: is that the schema change needed for ipv6? [16:51:41] they say now is actually a good time, if that works for you [16:51:45] I will be here from now till about 7pm est [16:51:47] cook [16:51:50] cool even [16:51:51] cook! [16:51:55] Jeff_Green: that works for me [16:52:02] ok one sec, lemme just make sure they're clear [16:52:16] yay no more porn! [16:52:19] ok, you going to shut it down or shall I? (when you insure its clear) [16:52:22] err... booo no more porn. [16:52:25] paravoid: nope, that'll come next.. this is a change needed for new block functionality in mediawiki [16:52:46] RobH: i'll do it [16:52:51] I keep hearing about porn, I surely must be missing something :) [16:53:59] RobH: it's shutting down now [16:55:01] paravoid: It's a known fact that the internet leads to penises [16:55:36] PROBLEM - Host aluminium is DOWN: PING CRITICAL - Packet loss = 100% [16:56:09] !log running ipblocks schema migration on all s6 dbs via osc [16:56:12] Logged the message, Master [16:56:29] binasher: osc? [16:56:36] online schema change? [16:56:42] yeah [16:57:17] Jeff_Green: ok, going to go add the disks and power it back on [16:57:27] !log aluminum shut down for hard disk additions [16:57:30] Logged the message, RobH [16:57:42] RobH: k [16:58:35] !log running ipblocks schema migration on s5/dewiki via osc [16:58:38] Logged the message, Master [16:59:10] !log running ipblocks schema migration on all s4 dbs via osc [16:59:13] Logged the message, Master [17:00:51] !log running ipblocks schema migration on all s3 (819) dbs via osc [17:00:55] Logged the message, Master [17:06:32] Jeff_Green: ok, its booting back up [17:06:35] great [17:07:46] RECOVERY - Host aluminium is UP: PING OK - Packet loss = 0%, RTA = 26.45 ms [17:08:22] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [17:08:22] Jeff_Green: ok, its back up [17:08:23] RobH: it's up. thanks! [17:08:25] ha [17:08:31] !log aluminum back online [17:08:34] Logged the message, RobH [17:08:58] cool, returning to racking and such, afk [17:14:05] Reedy: ping? [17:16:08] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7753 [17:16:10] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7753 [17:16:58] !log deploying change to swift to make which containers write thumbs configurable [17:17:01] Logged the message, Master [17:38:25] !log running ipblocks schema migration on all s2 dbs via osc [17:38:28] Logged the message, Master [17:40:32] !!log running ipblocks migration on enwiki via osc [17:42:39] ahhhhhem :) [17:42:40] https://rt.wikimedia.org/Ticket/Display.html?id=2946 [17:43:03] !log ipblocks migration completed for all wikis [17:43:06] AaronSchulz: ^^^ [17:43:06] Logged the message, Master [17:43:15] http://wikitech.wikimedia.org/view/How_to_do_a_schema_change [17:43:24] do we really have a lot of tables with no PKs? 
[17:43:29] that seems like an anti-pattern [17:44:00] We do, yeah [17:44:01] we have some, and we have a lot of tables where the pk or only unique key is multi-column, although that's no longer a big problem [17:44:02] There's a bug about it [17:44:10] but the no-pk thing is really vexing [17:44:34] <^demon> What tables have no pk? [17:44:38] Reedy: remember how logging didn't have a PK? [17:44:41] https://bugzilla.wikimedia.org/show_bug.cgi?id=15441 [17:44:48] tim had some fun doing that schema change ;) [17:45:02] https://bugzilla.wikimedia.org/show_bug.cgi?id=15441#c2 [17:45:26] <^demon> lol hitcounter. [17:45:31] isn't the first unique index the PK? [17:45:37] (if not given explicitly) [17:46:02] AaronSchulz: yes [17:46:16] so really it's the "No PK or unique index:" that are an issue [17:46:27] funny that the list has all of the worst tables (for other reasons) [17:47:10] I guess you can assume that if there is no PK, that the table probably was poorly designed for other reasons too [17:50:40] hi RoanKattouw [17:51:10] I had a problem earlier on wikimania2013wiki -- [15:33:21] anyone know what would cause a Could not update the journal database for storage backend "local-NFS". error when Special:Upload ing? [17:51:24] problem's still there AFAIK [17:51:59] AaronSchulz: ---^^ [17:52:52] NFS sucks [17:55:34] the list of tables without a unique or primary key defined in bugzilla 15441 is missing a few, from looking at enwiki [17:56:06] lol [17:56:13] That doesn't suprise me [17:56:25] I think in another bug we noticed some were added to newer tables, but never patched back [17:56:32] Reedy: was that a new wiki [17:56:44] I think that was from the SQL file, so effectively [17:56:56] It's 4 months ago :p [17:56:57] AaronSchulz: Yes that's a new wiki [17:56:57] * AaronSchulz has "addWiki.php and filejournal table" on his todo list [17:57:16] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/5210 is at fault [17:57:19] for user [17:57:22] older than I thought though [17:57:34] Yeah, look at https://bugzilla.wikimedia.org/show_bug.cgi?id=33228 [17:58:52] Reedy: addWiki.php needs to run the optional filejournal table patch [17:59:11] lol [17:59:16] FIXITFIXITFIXITFIXITFIXITFIXITFIXITFIXIT [17:59:28] I just never got around to this, sounds like an extra sourceFile() call [17:59:58] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:00:00] yeah, should be enough [18:00:23] * AaronSchulz volunteers Reedy >:) [18:00:35] Where's the file? [18:01:01] When did you create those? [18:01:20] archives/patch-filejournal.sql [18:02:16] Thehelpfulone: it should work now [18:02:29] AaronSchulz: when did those tables start being needed? [18:03:38] Wondering if any other new wikis need it.. [18:03:40] AaronSchulz: thanks [18:03:44] '19:54 logmsgbot: aaron synchronized wmf-config/CommonSettings.php 'Moved remaining wikis over to new backend config' ' [18:03:47] Reedy: may 8 [18:03:48] !log running recentchanges.rc_ip (ipv6) schema migration on all s7 dbs via osc [18:03:51] Logged the message, Master [18:03:57] Ah, so only wm2013 at fault [18:04:02] /with issues [18:04:19] binasher: which osc are you using? 
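
A sketch of the "extra sourceFile() call" discussed above for addWiki.php, so newly created wikis get the optional filejournal table. Exactly where this belongs in addWiki.php, and how it obtains its database handle, are assumptions; the patch path is the one named in the conversation.

    <?php
    // Hypothetical addition to addWiki.php: after sourcing tables.sql for the new
    // wiki, also apply the optional filejournal patch so the file backend's journal
    // writes have somewhere to go.
    global $IP;
    $dbw = wfGetDB( DB_MASTER );   // assumption: the handle for the newly created wiki
    $dbw->sourceFile( "$IP/maintenance/archives/patch-filejournal.sql" );
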
[18:05:40] AaronSchulz: fenari:/home/asher/db/pt-online-schema-change-2.1.1-no_child_table_patch via /home/asher/db/run-online-schema-change [18:05:56] ok, the pt one [18:10:04] i don't think i'm going to document things until the next version comes out / i don't have to modify it / trust it to run without watching it like a hawk [18:10:15] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:10:31] !log running recentchanges.rc_ip (ipv6) schema migration on all s6 dbs via osc [18:10:35] Logged the message, Master [18:11:58] Reedy, the interwiki map, http://meta.wikimedia.org/wiki/Interwiki_map -- apparently wm2013: links don't work, they were added the day before the sync date? please can you run the script again? [18:14:16] Later [18:16:07] yeah no problem, I realised you're in deployment atm [18:18:20] hm frwiki.recentchanges had about 1 million rows and took 2 minutes [18:19:05] but db46 got some replag [18:19:15] Feck [18:19:15] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [18:19:16] meh, snapshot host [18:19:18] That's quite amazing [18:21:39] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:21:56] !log running recentchanges.rc_ip (ipv6) schema migration on s5 dbs via osc [18:21:59] Logged the message, Master [18:25:16] !log synced wikiversions.* files from NFS to spence local to prevent death of check_job_queue monitoring [18:25:19] Logged the message, Master [18:27:02] !log running recentchanges.rc_ip (ipv6) schema migration on s3 dbs via osc (s4 already completed during prior testing) [18:27:05] Logged the message, Master [18:29:09] binasher: you saw my note about db39 yesterday? (just making sure I didn't "fix" something in a way to break something else) [18:29:25] apergos: no, i didn't [18:29:29] ah [18:30:07] * binasher searches email for db39, doesn't see anything [18:30:13] what happened? [18:30:24] here and in log, jsut a sec [18:30:42] and I forgot to mention it yesterday [18:30:54] ways to scare binasher [18:31:07] :) [18:31:10] 10:37 apergos: on db39 dropped triggers pt_osc_elwiki_recentchanges ins, del, upd, they were preventing all elwiki edits except bot edits with the complaint Table 'elwiki._recentchanges_new' doesn't exist ... binasher, doublecheck me please? [18:31:36] I looked and did not see any other dbs with TRN/TRG files on db39 [18:31:39] only elwiki [18:31:41] weird [18:31:51] odd [18:32:11] I figure this might have had to do with the commonswiki stuff [18:32:22] hrm, that would have been an artifact of some failure case testing i was doing [18:32:22] it jives with the time that edits stopped happening over there [18:32:29] ok [18:32:55] all right, just wanted to make sure you knew and that I didn't break something [18:35:04] nope.. just me breaking things! sigh [18:40:15] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:44:48] ok cool [18:46:41] apergos: thanks for fixing! [18:47:19] sure [18:47:25] thanks for checking [19:13:38] New patchset: Reedy; "Bring sync-dblist somewhere upto date, using sudo, fan and timeout" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7820 [19:13:47] !log restarting ganglia on nickel [19:13:50] Logged the message, notpeter [19:13:59] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7820 [19:14:29] !log manually ran ddsh -cM -g mediawiki-installation -o -oSetupTimeout=30 -F30 "sudo -u mwdeploy rsync -a 10.0.5.8::common/*.dblist /usr/local/apache/common-local" because sync-dblist is woefully out of date.. [19:14:33] Logged the message, Master [19:16:03] Reedy: reminds, me sync-wikiversions should really do the dat and cdb in command, it would be a bit faster [19:17:05] !log running recentchanges.rc_ip (ipv6) schema migration on s2 dbs via osc [19:17:08] Logged the message, Master [19:25:19] !log running recentchanges.rc_ip (ipv6) schema migration on enwiki master (5.2mil rows) via osc - batten down the hatches! [19:25:22] Logged the message, Master [19:26:37] AaronSchulz: can you pause the sha1 backfill? [19:27:16] binasher: hrm, lets see [19:27:24] not easily tbh [19:28:43] is it running in a single process? [19:29:05] there are a handful, I'm pruning the screens that finished [19:29:26] just send them all a SIGSTOP [19:29:45] and you can send them a SIGCONT in about 10 minutes [19:30:15] New patchset: Dzahn; "add 1.20wmf3 as "good" version, declare 1.17 not good anymore now" [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/7822 [19:30:24] actually db12 is the only enwiki db thats struggling.. that box is sad, and needs a new kernel [19:30:50] AaronSchulz: don't worry about it [19:30:51] New review: Dzahn; "(no comment)" [operations/debs/wikistats] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7822 [19:30:53] Change merged: Dzahn; [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/7822 [19:31:56] ok [19:32:30] the best ganglia graph i've ever seen (hattip notpeter) http://ganglia.wikimedia.org/latest/graph.php?r=20min&z=xlarge&c=Application+servers+pmtpa&m=load_one&s=by+name&mc=2&g=mem_report [19:32:46] LOL [19:33:25] heh [19:33:45] I still don't understand why ganglia is ok everywhere except the app servers [19:33:52] it's apparently not happy with the upgrade to ganglia [19:33:55] negative memory! they have data from the future [19:34:44] Reedy: where in git are the sync scripts? [19:34:53] * AaronSchulz can't find them [19:34:57] files/misc/scripts [19:35:04] in the puppet repo [19:35:20] ahh, I was looking in files/misc/, didn't notice scripts [19:36:16] heh [19:36:20] it's not obvious [19:38:26] !log shutting down mysql on db46, preparing to reboot for kernel upgrade [19:38:29] Logged the message, Master [19:39:45] New patchset: Aaron Schulz; "Sync dat and cdb files at once to go a bit faster." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7823 [19:40:05] New review: gerrit2; "Lint check passed." 
[19:43:38] New review: Reedy; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/7823 [19:44:52] !log rebooted db46 [19:44:54] Logged the message, Master [19:45:58] PROBLEM - Host db46 is DOWN: PING CRITICAL - Packet loss = 100% [19:49:07] RECOVERY - Host db46 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms [19:50:53] !log recentchanges.rc_ip migration completed [19:50:56] Logged the message, Master [19:51:20] !log stopping mysql on db12 [19:51:22] Logged the message, Master [19:55:43] PROBLEM - MySQL Replication Heartbeat on db46 is CRITICAL: CRIT replication delay 773 seconds [19:56:10] PROBLEM - MySQL Slave Delay on db46 is CRITICAL: CRIT replication delay 769 seconds [19:57:31] PROBLEM - mysqld processes on db12 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [19:57:39] !log rebooting db12 for kernel upgrade [19:57:42] Logged the message, Master [20:00:04] PROBLEM - Host db12 is DOWN: PING CRITICAL - Packet loss = 100% [20:00:49] RECOVERY - Host db12 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [20:01:43] RECOVERY - mysqld processes on db12 is OK: PROCS OK: 1 process with command name mysqld [20:05:01] PROBLEM - MySQL Slave Delay on db12 is CRITICAL: CRIT replication delay 659 seconds [20:05:05] !log converted centralauth.globalblocks from myisam to innodb [20:05:09] Logged the message, Master [20:05:37] PROBLEM - MySQL Replication Heartbeat on db12 is CRITICAL: CRIT replication delay 660 seconds [20:05:39] !log ran ipv6 migrations on globalblocks [20:05:43] Logged the message, Master [20:08:55] RECOVERY - MySQL Slave Delay on db46 is OK: OK replication delay 22 seconds [20:10:07] RECOVERY - MySQL Replication Heartbeat on db46 is OK: OK replication delay 0 seconds [20:19:18] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [20:22:09] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [20:25:20] New patchset: Pyoungmeister; "adding db61 and 62 as s1 slaves" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7825 [20:25:49] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7825 [20:26:28] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7825 [20:26:30] RECOVERY - MySQL Slave Delay on db12 is OK: OK replication delay 0 seconds [20:26:31] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7825 [20:26:39] RECOVERY - MySQL Replication Heartbeat on db12 is OK: OK replication delay 1 seconds [20:34:07] New patchset: Jgreen; "adding r-base to aluminium/grosley per RT #2972" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7827 [20:34:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7827 [20:36:07] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7827 [20:36:09] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7827 [20:49:08] Reedy, please can you run the interwiki map script now? [21:18:11] mark: Is the IPv6 connectivity testing still a running experiment?
I see your name on that javascript code and it found its way on a lot of wikis in the mean time (through the famous "cut/n/paste all of en.wiki scripts to fix a tiny bug" procedure) [21:18:42] http://ipv4.labs.wikimedia.org/ etc. [21:18:58] domain is still up but still used / cared about ? does it need the traffic still. [21:26:10] Krinkle: mark is not currently here [21:26:39] * Damianz eats some packets [21:26:39] ok [21:26:50] someone else is welcome to answer too, of course, would they know the answer [21:28:24] Krinkle: not sure if it is still known/needed [21:28:44] Krinkle: we're in the netherlands right now and it's 11:30pm ... [21:29:09] can you post up a message on wikitech-l ? [21:29:42] no rush :) [21:29:51] oh you are in the Netherlands [21:29:53] me too :P [21:30:10] Yeah but he's weird [21:30:22] I saw him up at 5am last night :) [21:30:38] there's that [21:30:38] 5am is a normal time [21:30:49] Geographical location does not necessarily correlate with working timezone [21:31:30] well not always, but i am right now [21:31:34] 5am, are you sure ? [21:31:48] because we were working way too late the last two days [21:32:03] but… new servers are racked! yay! [21:32:19] and we roped in another wikimedian who broke down tons of boxes and organized trash and recycling runs :) [21:34:14] LeslieCarr: I was talking about Krinkle being up at 5, not mark [21:34:32] hehe [21:34:39] oh! [21:34:40] :) [21:45:12] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7829 [21:45:14] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7829 [21:45:54] !log reverted this morning's mobile push - tests completed [21:45:57] Logged the message, Master [21:49:23] hu, /mnt/thumbs is empty for hume [21:49:29] *huh [21:50:04] stale? [21:54:04] maplebed: can you peek at https://bugzilla.wikimedia.org/show_bug.cgi?id=31680? I wonder if those files are on swift? [21:56:46] * AaronSchulz sends Reedy back to cr torture [21:56:52] AaronSchulz: yeah, I'll look. [21:58:05] AaronSchulz: the four resolutions in swift (at least in the container listing) are 142, 220, 320, 800. [21:58:15] one sec and I'll see if the other ones are there but not in the container listing. [21:59:07] AaronSchulz: neither file exists on swift (120px or 640px) [21:59:17] Do you want me to annotate the bug with that info or does it not matter? [21:59:35] I guess you can add it [22:01:24] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [22:01:34] added. [22:02:18] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.87 ms [22:05:36] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [22:06:01] New patchset: Reedy; "Adding fork limits and setup timeouts to normalise scripts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7831 [22:06:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7831 [22:07:45] New patchset: Reedy; "Sync dat and cdb files at once to go a bit faster." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/7823 [22:08:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/7823 [22:09:38] New review: Aaron Schulz; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/7823 [22:24:24] anyone know if we switched WM planet into git or if it's still in subversion? 
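On the missing-thumbnail question above (bug 31680): besides reading the container listing, individual renditions can be checked from the command line with the swift client. A small sketch, assuming the sharded thumbnail container naming and a made-up file name; the real container and object names depend on the wiki and file, and credentials come from ST_AUTH/ST_USER/ST_KEY or the -A/-U/-K options:

    # Illustrative names only; nothing here is taken verbatim from the log.
    swift list wikipedia-commons-local-thumb.a1 --prefix 'a/a1/Example.jpg/'
    swift stat wikipedia-commons-local-thumb.a1 'a/a1/Example.jpg/640px-Example.jpg'

A listing that only shows the 142, 220, 320 and 800px entries, plus a not-found error from the stat call, would match what maplebed reports for the 120px and 640px renditions.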
[22:24:30] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [22:24:41] Still in SVN i think [22:24:51] certainly, svn isn't locked [22:27:28] Reedy: cool thanks much [22:29:43] New patchset: Aaron Schulz; "Set $wgSiteStatsAsyncFactor=1 on testwikis." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7833 [22:30:50] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7833 [22:30:52] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7833 [22:46:19] New patchset: Aaron Schulz; "Made mediawikiwiki use $wgSiteStatsAsyncFactor=1." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7835 [22:46:50] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [22:46:53] New review: Aaron Schulz; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7835 [22:46:55] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/7835 [22:55:23] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [22:56:01] * Krinkle imagines RobH with a magnifying glass, watching the optical particle stream leaving from my laptop, across the ocean, entering the data center for my http request [22:56:53] seen the movie Hackers (1995) recently. incredible how they visualize all that internet stuff. If it where only half that cool to look at :P [22:56:57] were* [22:57:35] Have you not seen the pair of trolls we employ to watch over the packets and implement the firewall limits? [22:58:14] hehe [23:01:36] robla: :) [23:02:11] AaronSchulz: ? [23:12:02] PROBLEM - Puppet freshness on cp1004 is CRITICAL: Puppet has not run in the last 10 hours [23:31:23] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [23:34:14] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
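The swift-container-auditor alerts that keep flapping through this log (ms-be1, ms-be4, ms-be5) are plain nagios process-count checks, and clearing one is typically just a matter of getting the auditor process running again. Roughly what both sides look like; the plugin path, the check thresholds and the use of swift-init here are assumptions, not commands quoted in the log:

    # Process check matching the alert text above ("processes with regex args");
    # the plugin path and thresholds vary per host.
    /usr/lib/nagios/plugins/check_procs -c 1:1 \
        --ereg-argument-array='^/usr/bin/python /usr/bin/swift-container-auditor'

    # If it reports 0 matching processes, restart the auditor on the backend:
    swift-init container-auditor restart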