[16:10:31] hey, is there a task for the cloudservices2003-dev work going on the past day?
[16:19:36] shdubsh: .... maybe? I think that host is kind of in limbo right now. We were going to reimage it to do something different and then we got another new host for that role sooner than we expected.
[16:19:49] shdubsh: did it fall out of downtime again?
[16:20:38] it didn't fall out of downtime, its unavailability cascaded to our prometheus jobs reduced availability check
[16:21:32] I'm debating whether to remove that host from prometheus scrapes to clear up the alert.
[16:22:02] andrewbogott: ^ could you take some time today and see if you can make the monitoring dashboards happier about our codfw cluster?
[16:23:13] bd808: I am working on those hosts today. But -- I don't entirely understand why it's not OK to have a host be a work in progress. shdubsh is there some moral equivalent to icinga downtime that I should be doing for new hosts?
[16:23:26] Everything is downtimed but people are still popping in here constantly to ask these things
[16:26:51] andrewbogott: To be clear, I do see that you downtimed the host which is appropriate and very helpful. It is ok and necessary to perform maintenance on hosts. Your actions are not in question at all.
[16:27:29] I am investigating another alert which is fallout from that action and am looking for information on how to deal with my alert.
[16:28:07] shdubsh: right right, it's just that you're the third person to ask about that host which makes me think we need a better way to communicate "please ignore this system"
[16:29:07] If the host is expected to be back soon, then I'll ack the alert and let it go -- preferable for the short term because it takes code changes to bring it back to true.
[16:29:48] If it's going to be longer, then the code changes are preferable because the alert itself aggregates a lot of systems and is an indicator of larger problems.
[16:31:31] Hence, asking for a task to try and judge which action to take. I'm sorry that many questions have been asked about this. I can see how that can get tedious :(
[16:42:29] this is the parent task… https://phabricator.wikimedia.org/T251294
[16:42:37] I don't have one for that server in particular but can make one
[16:43:07] some of those blockers are for other people so it won't be sorted today
[16:47:29] andrewbogott: this is perfect. Thanks!
[17:10:13] !log tools.versions Restarted to create versions.toolforge.org ingress
[17:10:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.versions/SAL
[17:13:59] !log tools.versions Restarted to force traffic to versions.toolforge.org
[17:14:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.versions/SAL
[17:16:20] bd808: any idea when you could review the Striker phab project patch?
[17:17:05] Majavah: sadly no date I can honestly commit to. I did start getting my test environment for that in shape this week.
[17:18:00] The big problem here is that I'm supposed to be a manager as my main role these days and that the non-manager things lose to that work. :/
[17:18:00] bd808: no hurry
[17:18:24] also, any chance you could award me the Nerd Sniper and Volunteer badges in Phab?
[17:18:40] totally recognize myself as a nerd sniper :D
[17:19:11] badges come mysteriously like barn stars ;)
[17:19:54] oh well, still worth a try
[19:57:23] @bd808 I just applied to the manager position, so if everything goes well, maybe I can help?
[19:58:00] fsargent_: maybe so :)
[19:59:43] !log tools.versions Updated to 8c286cf (Update elasticsearch URL)
[19:59:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.versions/SAL
[21:10:22] How many rows does the enwiki comment table have? About 320M?
[21:10:46] (Which is ridiculous, the revision table has 950M.)
[21:11:25] To "join" the revision table to the comment table I have to query the comment table in groups of about 50M revisions and then hardcode the IDs into the next query, eg.
[21:11:44] So hopefully 6 queries will be enough. :)
[21:13:01] are you using the comment_revision view?
[21:15:47] !log tools.versions Updated to c544a32
[21:15:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.versions/SAL
[21:16:02] Hm let's see, I was at some point
[21:58:08] !log tools.sal Updated $HOME/.lighttpd.conf for sal.toolforge.org routing
[21:58:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.sal/SAL
[21:59:21] !log tools.sal Updated to c4ad4bf
[21:59:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.sal/SAL
[22:08:58] AntiComposite: comment_revision is not faster here for me
[22:14:40] Nemo_bis: we are down to 2 wiki replica servers right now because of ongoing hardware/software issues with labsdb1011, so I expect everything is slower for everyone.
[22:15:07] folks are working on labsdb1011 and trying to get it back into rotation
[22:16:32] But I think I remember that some of your work with the comments table is doing "in string" kinds of searches. That sort of thing is always going to be expensive on the server side and probably slow no matter what we do with the hardware
[22:17:30] I suppose an interesting helper for that kind of work would be some search system with full text indexing on comments.
[22:18:34] I don't know what kind of hardware (disk, ram) we would need to keep up with that across all of the wikis off the top of my head.
[22:22:29] bd808: yes, I know about the maintenance. Doing a LIKE with a single wildcard on a 256 byte field should really not be such a gigantic issue.
[22:23:22] I'm certainly narrow-minded compared to the overall goals of the database structure but this "optimisation" has only brought performance losses for the end user so far.
[22:23:30] Sorry, normalisation.
[22:24:02] Nemo_bis: yeah, it makes all the analytic query stuff that tools do crappier. No doubt about that.
[22:24:27] Nobody asked about tools when they were trying to figure out how to keep MediaWiki working at scale
[22:25:40] Correction: one thing got faster, or should have: renames of "big" users. That's the only user-facing justification for the entire mess. No idea about the comment table.
[22:28:17] Nemo_bis: the comment table was invented for a 100% user facing feature -- long comments for non-latin alphabets (T153333)
[22:28:18] T153333: RFC: How should we store longer revision comments? - https://phabricator.wikimedia.org/T153333
[22:29:06] so blame Kaldari for caring about folks using non-latin scripts I guess ;)
[22:58:44] !log tools.docker-registry Switched on --canonical
[22:58:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.docker-registry/SAL
[22:58:58] !log tools.integraality Deploy latest from Git master: f0db935, f8d9fdf, f57c060, 6c534b7 (T248788)
[22:59:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.integraality/SAL
[23:08:09] !log tools.gmt Switched on --canonical
[23:08:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gmt/SAL
[23:11:46] !log tools.grid-jobs Created ingress for grid-jobs.toolforge.org
[23:11:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.grid-jobs/SAL
[23:20:32] !log tools.gridengine-status Turned on --canonical, but it is apparently not working?
[23:20:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gridengine-status/SAL
[23:32:00] !log tools.gridengine-status Live hacked a fix for malformed numeric handling
[23:32:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gridengine-status/SAL
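
The batching approach described at 21:11:25 (querying in groups of about 50M IDs instead of one huge join) might be sketched roughly as below. This is a minimal illustration, not the query that was actually run: the helper names id_batches and build_query are hypothetical, and the view and column names in the SQL (comment_revision, comment_id, comment_text) are assumptions that should be checked against the real replica schema. The LIKE pattern is left as a driver placeholder (%s) rather than being hardcoded.

```python
from typing import Iterator, Tuple


def id_batches(max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) ID ranges that together cover 1..max_id."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = 1
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


def build_query(start: int, end: int) -> str:
    """Build one range-limited query; the LIKE pattern is bound later via %s."""
    # Assumed view and column names; verify against the real schema before use.
    return (
        "SELECT comment_id, comment_text "
        "FROM comment_revision "
        f"WHERE comment_id BETWEEN {start} AND {end} "
        "AND comment_text LIKE %s"
    )


if __name__ == "__main__":
    # Illustrative sizes only, taken from the rough figures quoted in the log.
    for start, end in id_batches(320_000_000, 50_000_000):
        print(build_query(start, end))
```

Restricting each query to a bounded primary-key range keeps any single scan small, at the cost of running several queries and combining the results on the client side.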