[08:17:59] <wikibugs>	 serviceops, Graphoid, Operations: Graphoid "No graph found." failures - 11 Dec 2019 - https://phabricator.wikimedia.org/T240419 (akosiaris) Open→Resolved a:akosiaris Graphoid is to be undeployed mid of next quarter. With that in mind, the alerts were because someone edited the monitored...
[08:35:05] <elukey>	 hello people
[08:35:19] <elukey>	 we'd need to decide what name to use for the memcached gutter pool
[08:35:46] <elukey>	 the rack/setup/deploy tasks will need it :)
[08:36:27] <elukey>	 effie proposed something like mc-gp[12]00[1-3] IIRC
[08:36:33] <elukey>	 to differentiate
[08:37:19] <elukey>	 the alternative is mc[12]0[37-39] or something else entirely
[08:40:59] <_joe_>	 elukey: +1
[08:41:23] <_joe_>	 :D
[08:42:43] <elukey>	 I sense some trolling :D
[08:43:36] <elukey>	 if nobody comes up with a different naming I'll go for mc-gp :)
[08:48:11] <_joe_>	 elukey: that meant "anything you prefer"
[08:48:20] <_joe_>	 mc-gp is ok
[08:48:47] <_joe_>	 given it's a different set of servers with different characteristics
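
For reference, the proposed mc-gp[12]00[1-3] pattern can be expanded into concrete hostnames. The sketch below is only illustrative: it assumes the brackets are read as character classes ([12] meaning 1 or 2, [1-3] meaning 1 through 3), and the helper names expand_class and expand_hosts are hypothetical, not part of any existing tooling.

```python
from __future__ import annotations

import itertools
import re


def expand_class(body: str) -> list[str]:
    """Expand a bracket body such as '12' or '1-3' into single characters."""
    chars = []
    i = 0
    while i < len(body):
        # 'a-b' denotes an inclusive range of characters.
        if i + 2 < len(body) and body[i + 1] == "-":
            start, end = body[i], body[i + 2]
            chars.extend(chr(c) for c in range(ord(start), ord(end) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars


def expand_hosts(pattern: str) -> list[str]:
    """Expand every [..] class in the pattern into all matching names."""
    # re.split with a capture group alternates literal text (even indices)
    # and bracket bodies (odd indices).
    parts = re.split(r"\[([^\]]+)\]", pattern)
    choices = [expand_class(p) if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    for host in expand_hosts("mc-gp[12]00[1-3]"):
        print(host)
```

Under that reading the pattern covers six names, mc-gp1001 through mc-gp1003 and mc-gp2001 through mc-gp2003.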
[08:49:01] <_joe_>	 elukey: in the meantime, we noticed with cdanis yesterday
[08:49:19] <_joe_>	 1 - mcrouter is running with niceness 0, while php-fpm runs at -19
[08:50:00] <_joe_>	 2 - there is a 1:1 correspondence between slowness in proxy processing in mcrouter and slowness in response times from single appservers
[08:50:04] <_joe_>	 see https://grafana.wikimedia.org/d/000000549/mcrouter?orgId=1&var-source=eqiad%20prometheus%2Fops&var-cluster=appserver&var-instance=mw1324&var-memcached_server=All&fullscreen&panelId=2&from=now-24h&to=now
[08:50:47] <_joe_>	 and https://grafana.wikimedia.org/d/000000550/mediawiki-application-servers?orgId=1&var-source=eqiad%20prometheus%2Fops&var-cluster=appserver&var-node=mw1324&from=now-24h&to=now&fullscreen&panelId=82 (the GET metrics)
[08:51:17] <_joe_>	 also we have pretty skewed weights on https for the appservers, and I shall fix them
[09:00:47] <elukey>	 _joe_ yep I know I was joking, but I'd prefer to avoid taking decisions for a service that I don't own :)
[09:01:00] <elukey>	 about your points
[09:01:17] <_joe_>	 you don't own memcached? that is new!
[09:02:12] <elukey>	 1. I didn't have any idea about the niceness, no idea also about the real perf consequences on the node (I mean in slowness caused by the imbalance)
[09:02:16] <elukey>	 really nice find 
[09:02:57] <elukey>	 2. for the 1:1 correspondence it makes sense right? Mediawiki will wait up to 1s for mcrouter, so latencies will suffer when mcrouter slows down
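
The niceness difference mentioned above (mcrouter at 0, php-fpm at -19) can be checked per process. This is a minimal sketch using Python's standard os.getpriority on a Unix host; the function names are hypothetical, and finding the actual mcrouter and php-fpm PIDs is left out.

```python
import os


def niceness_of(pid: int) -> int:
    """Return the nice value of the given process (Unix only)."""
    return os.getpriority(os.PRIO_PROCESS, pid)


def niceness_gap(pid_a: int, pid_b: int) -> int:
    """Return how much 'nicer' process A is than process B.

    A positive result means A runs at a lower scheduling priority than B.
    """
    return niceness_of(pid_a) - niceness_of(pid_b)


if __name__ == "__main__":
    # Demonstrate on the current process; real PIDs would be passed instead.
    print(niceness_of(os.getpid()))
```

With the values quoted in the channel, niceness_gap for an mcrouter PID against a php-fpm PID would come out as 19, meaning mcrouter is scheduled at a lower priority than the PHP workers that wait on it.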
[09:17:17] <jbond42>	 hi all, is there anything special to consider when restarting kube-controller-manager on the k8s controllers, other than depool; restart; pool?
[10:43:30] <wikibugs>	 serviceops, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team), Language-Team (Language-2019-October-December), and 4 others: Preemptive refresh in getMultiWithSetCallback() and getMultiWithUnionSetCallback() pollutes cache - https://phabricator.wikimedia.org/T235188 (Nikerabbit)
[10:54:48] <wikibugs>	 serviceops, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team), Language-Team (Language-2019-October-December), and 4 others: Preemptive refresh in getMultiWithSetCallback() and getMultiWithUnionSetCallback() pollutes cache - https://phabricator.wikimedia.org/T235188 (Nikerabbit)...
[16:00:13] <ottomata>	 _joe_:  o/  why don't you like beacon-<something>.wm.org as a domain name for stream intake (eventgate) services?
[16:00:49] <_joe_>	 because beacon-something is killed by most adblockers
[16:00:54] <ottomata>	 ah
[16:00:56] <_joe_>	 but I'm in a meeting
[17:54:52] <wikibugs>	 serviceops: setup new, buster based, kubernetes etcd servers for staging/codfw/eqiad cluster - https://phabricator.wikimedia.org/T239835 (akosiaris)
[17:57:22] <akosiaris>	 jbond42: no, not really. Feel free to restart it at will
[17:57:39] <akosiaris>	 the other instance will take over anyway. It does a master election via the API
[17:58:23] <jbond42>	 akosiaris: great thanks alex, just finishing off a meeting now but will still likely wait until tomorrow
[18:00:13] <akosiaris>	 jbond42: I should add that to a runbook. What triggered you though to want to restart kube-controller-manager? The CA change?
[18:00:30] <jbond42>	 yes exactly
[18:01:05] <jbond42>	 I'll update the service_restarts page as well, which is what I mostly use
[18:03:45] <jbond42>	 akosiaris: i added this https://wikitech.wikimedia.org/wiki/Kubernetes#Restarting_calico-node if you are able to give it a check
[18:04:28] <jbond42>	 not related to the kube-controller but could still use some review :)
[18:04:58] <jbond42>	 ccccccktdtgkgkhgilcgfinrruchgjcvcfdijlkndlkt
[18:05:13] * jbond42 stupid cat
[18:05:42] <effie>	 lol
[18:05:56] <cdanis>	 mine said hello during our meeting
[18:06:17] <apergos>	 cat incident!
[18:06:37] <apergos>	 amazing how they can press those yubi buttons so accurately
[18:07:38] <jbond42>	 cdanis: yes i saw yours go bye :)
[18:07:39] <akosiaris>	 jbond42: it's pretty great
[18:07:44] <cdanis>	 ahaha you weren't the only one
[18:08:01] <akosiaris>	 I've added more for the apiserver, the kube-controller-manager and kube-scheduler
[18:08:11] <jbond42>	 great thanks
[18:08:24] <jbond42>	 the service restarts page points to that so thats great
[18:13:25] <effie>	 I am happy my dog can't reach my keyboard
[18:13:32] <effie>	 and can't reach much stuff in general
[18:14:15] <apergos>	 lucky
[18:14:41] <effie>	 what I have now is 
[18:14:53] <effie>	 if she wants to either go out, is hungry, or bored
[18:14:56] <effie>	 she is pacing 
[18:14:59] <effie>	 A LOT 
[18:15:05] <effie>	 all around the flat 
[18:15:19] <effie>	 just pacing aimlessly 
[18:17:09] <apergos>	 toenails on the floor? click click click click
[18:56:48] <effie>	 yes 
[18:56:52] <effie>	 tap tap tap 
[19:26:52] <apergos>	 lol
[19:26:57] <apergos>	 I used to have this with my cats
[19:27:18] <apergos>	 I knew when it was time to clip their claws because they would get loud when walking across the wood floor