[08:17:59] serviceops, Graphoid, Operations: Graphoid "No graph found." failures - 11 Dec 2019 - https://phabricator.wikimedia.org/T240419 (akosiaris) Open→Resolved a:akosiaris Graphoid is to be undeployed mid of next quarter. With that in mind, the alerts were because someone edited the monitored...
[08:35:05] hello people
[08:35:19] we'd need to decide what name to use for the memcached gutter pool
[08:35:46] the rack/setup/deploy tasks will need it :)
[08:36:27] effie proposed something like mc-gp[12]00[1-3] IIRC
[08:36:33] to differentiate
[08:37:19] the alternative is mc[12]0[37-39] or something else entirely
[08:40:59] <_joe_> elukey: +1
[08:41:23] <_joe_> :D
[08:42:43] I sense some trolling :D
[08:43:36] if nobody comes up with a different naming I'll go for mc-gp :)
[08:48:11] <_joe_> elukey: that meant "anything you prefer"
[08:48:20] <_joe_> mc-gp is ok
[08:48:47] <_joe_> given it's a different set of servers with different characteristics
[08:49:01] <_joe_> elukey: in the meantime, we noticed with cdanis yesterday
[08:49:19] <_joe_> 1 - mcrouter is running with niceness 0, while php-fpm runs at -19
[08:50:00] <_joe_> 2 - there is a 1:1 correspondence between slowness in proxy processing in mcrouter and slowness in response times from single appservers
[08:50:04] <_joe_> see https://grafana.wikimedia.org/d/000000549/mcrouter?orgId=1&var-source=eqiad%20prometheus%2Fops&var-cluster=appserver&var-instance=mw1324&var-memcached_server=All&fullscreen&panelId=2&from=now-24h&to=now
[08:50:47] <_joe_> and https://grafana.wikimedia.org/d/000000550/mediawiki-application-servers?orgId=1&var-source=eqiad%20prometheus%2Fops&var-cluster=appserver&var-node=mw1324&from=now-24h&to=now&fullscreen&panelId=82 (the GET metrics)
[08:51:17] <_joe_> also we have pretty skewed weights on https for the appservers, and I shall fix them
[09:00:47] _joe_ yep I know I was joking, but I'd prefer to avoid taking decisions for a service that I don't own :)
[09:01:00] about your points
[09:01:17] <_joe_> you don't own memcached? that is new!
[09:02:12] 1. I didn't have any idea about the niceness, no idea also about the real perf consequences on the node (I mean in slowness caused by the imbalance)
[09:02:16] really nice find
[09:02:57] 2. for the 1:1 correspondence it makes sense right? Mediawiki will wait up to 1s for mcrouter, so latencies will suffer when mcrouter slows down
[09:17:17] hi all is there anything special to consider when restarting kube-controller-manager on the k8s controllers, other than depool; restart; pool
[10:43:30] serviceops, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team), Language-Team (Language-2019-October-December), and 4 others: Preemptive refresh in getMultiWithSetCallback() and getMultiWithUnionSetCallback() pollutes cache - https://phabricator.wikimedia.org/T235188 (Nikerabbit)
[10:54:48] serviceops, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team), Language-Team (Language-2019-October-December), and 4 others: Preemptive refresh in getMultiWithSetCallback() and getMultiWithUnionSetCallback() pollutes cache - https://phabricator.wikimedia.org/T235188 (Nikerabbit)...
[16:00:13] _joe_: o/ why don't you like beacon-.wm.org as a domain name for stream intake (eventgate) services?
[16:00:49] <_joe_> because beacon-something is killed by most adblockers
[16:00:54] ah
[16:00:56] <_joe_> but I'm in a meeting
[17:54:52] serviceops: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster - https://phabricator.wikimedia.org/T239835 (akosiaris)
[17:57:22] jbond42: no, not really. Feel free to restart it at will
[17:57:39] the other instance will take over anyway. It does a master election via the API
[17:58:23] akosiaris: great thanks alex, just finishing off a meeting now but will still likely wait until tomorrow
[18:00:13] jbond42: I should add that to a runbook. What triggered you though to want to restart kube-controller-manager? The CA change?
[18:00:30] yes exactly
[18:01:05] i'll update the service_restarts page as well which is what i mostly use
[18:03:45] akosiaris: i added this https://wikitech.wikimedia.org/wiki/Kubernetes#Restarting_calico-node if you are able to give it a check
[18:04:28] not related to the kube-controller but could still use some review :)
[18:04:58] ccccccktdtgkgkhgilcgfinrruchgjcvcfdijlkndlkt
[18:05:13] * jbond42 stupid cat
[18:05:42] lol
[18:05:56] mine said hello during our meeting
[18:06:17] cat incident!
[18:06:37] amazing how they can press those yubi buttons so accurately
[18:07:38] cdanis: yes i saw yours go bye :)
[18:07:39] jbond42: it's pretty great
[18:07:44] ahaha you weren't the only one
[18:08:01] I've added more for the apiserver, the kube-controller-manager and kube-scheduler
[18:08:11] great thanks
[18:08:24] the service restarts page points to that so that's great
[18:13:25] I am happy my dog can't reach my keyboard
[18:13:32] and can't reach many stuff in general
[18:14:15] lucky
[18:14:41] what I have now is
[18:14:53] if she wants to either go out, is hungry, or bored
[18:14:56] she is pacing
[18:14:59] A LOT
[18:15:05] all around the flat
[18:15:19] just pacing aimlessly
[18:17:09] toenails on the floor? click click click click
[18:56:48] yes
[18:56:52] tap tap tap
[19:26:52] lol
[19:26:57] I used to have this with my cats
[19:27:18] I knew when it was time to clip their claws because they would get loud when walking across the wood floor
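
Note on the 08:49 niceness observation (mcrouter at niceness 0, php-fpm at -19): one way to check this on a Linux host is sketched below. It is only an illustration, not the procedure used in the channel. The helper names (`pids_by_name`, `niceness_of`, `niceness_report`) are hypothetical, and the process names you would pass in (for example "mcrouter", or whatever the php-fpm binary is called on the host) are assumptions. It relies on the `/proc/<pid>/comm` files and on Python's `os.getpriority`.

```python
import os
from pathlib import Path


def pids_by_name(name, proc_root="/proc"):
    """Return the sorted PIDs whose command name (the `comm` file) equals `name`."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a per-process directory
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited in the meantime, or entry is unreadable
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)


def niceness_of(pid):
    """Return the nice value of one process (lower value = higher priority)."""
    return os.getpriority(os.PRIO_PROCESS, pid)


def niceness_report(names, proc_root="/proc"):
    """Map each process name to a {pid: niceness} dict for its running processes."""
    report = {}
    for name in names:
        per_pid = {}
        for pid in pids_by_name(name, proc_root):
            try:
                per_pid[pid] = niceness_of(pid)
            except ProcessLookupError:
                continue  # process went away between listing and lookup
        report[name] = per_pid
    return report
```

Calling `niceness_report(["mcrouter"])` on an appserver would return the nice value of each matching PID, which can then be compared with the values for the php-fpm workers. Note that `comm` is truncated by the kernel to 15 characters, so longer process names need to be given in their truncated form.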