[05:36:17] <_joe_> fsero: well it all went pretty well, after all?
[07:45:52] morning! I'd like to deploy https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/514656/ today if nobody opposes
[07:46:30] it seems to be working fine on canaries
[07:47:37] to avoid any service disruption I could go with one mw host at a time, depool/run-puppet/pool
[07:47:51] or even small batches
[07:47:56] +1
[07:48:39] thanks :)
[11:08:52] ok so mcrouter restarts almost done
[11:09:05] the codfw proxies are still to be completed
[11:09:27] I did only one and the impact (from the mcrouter's metrics) was minimal
[11:09:38] same thing reported by Manuel from the db side
[11:09:51] will do the rest after the SWAT
[12:18:22] as FYI, two codfw proxies are still to restart. Need to run an errand for a bit, will complete the work when back :)
[12:18:32] (puppet will be disabled for this time frame)
[12:20:04] <_joe_> thanks :)
[12:20:14] <_joe_> and good to know
[12:20:24] <_joe_> lmk when puppet is enabled again
[15:03:11] ottomata: we currently don't have the eventschemas servers covered at https://wikitech.wikimedia.org/wiki/Service_restarts, is there more to restarting/rebooting them other than a standard depool/repool?
[15:26:26] Nope!
[15:31:15] ok, I'll amend the service restarts page, then
[15:31:22] i'm updating now moritzm
[15:31:23] !
[15:31:28] ok!
[15:32:02] i also updated the Druid stuff, that was out of date
[15:32:09] ack
[16:35:51] <_joe_> jijiki: are you attached to my tmux on cumin1001?
[16:36:23] <_joe_> it just closed...
[16:40:41] yesh, my keyboard fucked up
[16:41:01] because I hadn't replaced the batteries
[16:41:39] oh, it happened to me once, very annoying
[16:42:11] _joe_: I am looking at which servers are remaining
[16:42:22] and I will continue the restarts
[16:42:55] I am using the laptop keyboard so this should be fin
[16:43:00] e
[16:43:03] <_joe_> jijiki: I have an idea about that
[16:43:11] <_joe_> let's deploy my nrpe check
[16:43:16] <_joe_> and see what warns/alerts
[16:43:26] <_joe_> then you can restart on those
[16:46:27] there are 11 servers
[16:46:29] sure
[16:46:34] ok let me look at the patch
[17:00:59] <_joe_> the last version works better :D
[17:02:49] lol I thought you'd go with uptime
[17:02:56] :p
[17:03:07] <_joe_> no hits is more accurate
[17:03:20] ok should I merge it?
[17:03:34] <_joe_> if you wish
[17:04:05] let me do a verification
[17:05:16] serviceops, Citoid, Release Pipeline, Core Platform Team Backlog (Watching / External), Services (watching): Figure out how to test Citoid with Zotero in the pipeline - https://phabricator.wikimedia.org/T225236 (mobrovac)
[17:07:51] lol _joe_ I was running the compiler catalogue :p
[17:08:08] <_joe_> there is an error, not caught by the compiler
[17:08:21] <_joe_> fixing it
[17:40:51] <_joe_> I'm ashamed of myself, I did a gross miscalculation. We have servers in warning that shouldn't be (the opcache hit ratio)
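The 07:47 plan in this log (depool a host, run puppet, repool it, one mw host at a time or in small batches) can be sketched as a generic rolling loop. This is a minimal sketch, not the tooling actually used: `rolling_apply`, `depool`, `action` and `pool` are hypothetical callables standing in for whatever depool/puppet/pool commands are really run, and the choice to leave a failing batch depooled is an assumption.

```python
from typing import Callable, Iterator, List

HostFn = Callable[[str], None]


def batched(hosts: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` hosts."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def rolling_apply(
    hosts: List[str],
    batch_size: int,
    depool: HostFn,  # hypothetical: take the host out of rotation
    action: HostFn,  # hypothetical: e.g. run puppet on the host
    pool: HostFn,    # hypothetical: put the host back into rotation
) -> List[List[str]]:
    """Depool, act on, and repool hosts one batch at a time.

    If `action` raises, the exception propagates: the current batch is
    deliberately left depooled so a broken host gets no traffic, and
    later batches are never touched.
    """
    completed: List[List[str]] = []
    for batch in batched(hosts, batch_size):
        for host in batch:
            depool(host)
        for host in batch:
            action(host)
        for host in batch:
            pool(host)
        completed.append(batch)
    return completed
```

With `batch_size=1` this is the "one mw host at a time" variant; larger values trade a bigger capacity dip for a faster rollout.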
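The NRPE check discussed from 16:43 onward is not shown in the log; the only details given are that it looks at hits rather than uptime and warns on the opcache hit ratio. The following is a minimal sketch, assuming the ratio is computed as hits / (hits + misses) and compared against percentage thresholds. The function name and the 99%/95% thresholds are hypothetical. Exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from typing import Tuple

# Standard Nagios/NRPE plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_opcache_hit_ratio(
    hits: int,
    misses: int,
    warn_pct: float = 99.0,  # hypothetical threshold, in percent
    crit_pct: float = 95.0,  # hypothetical threshold, in percent
) -> Tuple[int, str]:
    """Return (exit_code, message) for an opcache hit ratio check.

    The ratio is expressed in percent so that it is compared against
    thresholds in the same unit.
    """
    if crit_pct > warn_pct:
        raise ValueError("critical threshold must not exceed warning threshold")
    total = hits + misses
    if total == 0:
        # Nothing served yet (e.g. right after a restart): no signal.
        return UNKNOWN, "UNKNOWN: no opcache lookups recorded"
    ratio = 100.0 * hits / total
    if ratio < crit_pct:
        return CRITICAL, f"CRITICAL: opcache hit ratio {ratio:.2f}%"
    if ratio < warn_pct:
        return WARNING, f"WARNING: opcache hit ratio {ratio:.2f}%"
    return OK, f"OK: opcache hit ratio {ratio:.2f}%"
```

In PHP, `opcache_get_status()` reports `hits` and `misses` under its `opcache_statistics` key; a real plugin would feed those in, print the message, and exit with the returned code.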