[08:01:03] <_joe_>	 fsero: I'm changing the values of all the keys we export with mtail, so wait before you edit the graphs more
[08:01:33] <fsero>	 Ack
[08:01:52] <_joe_>	 I changed the prefix to mediawiki_http, which is more fitting
[09:28:57] <fsero>	 _joe_: fixed variables in grafana
[09:29:02] <fsero>	 it was querying old names
[09:29:10] <fsero>	 and im going to add some charts, feel free to burn them all
[09:29:26] <_joe_>	 fsero: ugh wait
[09:29:37] <fsero>	 -me waiting
[09:29:39] <_joe_>	 I think I forgot to save the dashboard
[09:30:02] <fsero>	 save it, i can remake what i just did
[09:30:27] <_joe_>	 yeah I did
[09:30:32] <_joe_>	 yeah wait a sec pls
[09:31:46] <_joe_>	 ok I think you have to fix something for the instance variable
[09:34:28] <fsero>	 lemme fix it
[09:34:44] <_joe_>	 oh yeah they need all to be fixed
[09:36:30] <fsero>	 done
[09:38:05] <_joe_>	 it's interesting to note how on the API cluster hhvm has a better p95 and p75 than php-fpm, while on the appserver cluster it's clearly the other way around
[09:38:12] <_joe_>	 php-fpm is always marginally faster
[09:46:14] <fsero>	 i like the dashboard a lot now :)
[09:46:17] <fsero>	 _joe_: curious https://grafana.wikimedia.org/d/RIA1lzDZk/xxx-joe-appserver?orgId=1&panelId=22&fullscreen&var-datasource=eqiad%20prometheus%2Fops&var-cluster=api_appserver&var-instance=mw1221%3A3903&var-method=GET&var-code=200&from=1564562765879&to=1564566365879
[09:47:31] <_joe_>	 that 304 are the highest ranked doesn't surprise me
[09:48:39] <_joe_>	 yeah it's getting to where I wanted it to be
[09:48:47] <_joe_>	 we shall have such a dashboard for all services
[09:49:05] <_joe_>	 we do for most of the ones on k8s
[16:13:39] <_joe_>	 mutante: I plan to merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/525584 tomorrow
[16:13:46] <_joe_>	 and then test it on mw1270
[16:13:57] <_joe_>	 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/526720
[16:40:28] <mutante>	 _joe_: ok, sounds good! i was already contemplating whether i should. also i amended to https://gerrit.wikimedia.org/r/c/operations/puppet/+/526289  and it compiles on scandium now after some fixes
[16:42:13] <mutante>	 and yea. it does include the common role in the role. though i still think it was discouraged (in the past)
[16:44:46] <mutante>	 made https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups  to replace old docs talking about "DSH group files" when they were still actual files
[16:44:50] <mutante>	 after https://phabricator.wikimedia.org/T227547
[17:12:45] <_joe_>	 mutante: well written, I think it's very clear
[17:14:45] <mutante>	 :)
[18:22:42] <wikibugs>	 serviceops, Mobile-Content-Service, Page Content Service, Reading-Infrastructure-Team-Backlog (Kanban): "worker died, restarting" mobileapps issue - https://phabricator.wikimedia.org/T229286 (Mholloway)
[18:30:13] <wikibugs>	 serviceops, Mobile-Content-Service, Page Content Service, Reading-Infrastructure-Team-Backlog (Kanban): "worker died, restarting" mobileapps issue - https://phabricator.wikimedia.org/T229286 (Mholloway) These worker deaths are being caused by the additional load generated by pregenerating respons...
[18:37:23] <wikibugs>	 serviceops, Mobile-Content-Service, Page Content Service, Reading-Infrastructure-Team-Backlog (Kanban): "worker died, restarting" mobileapps issue - https://phabricator.wikimedia.org/T229286 (Pchelolo) > Aside from that, I would have expected that the pregeneration jobs would be finishing up by n...
[18:47:30] <wikibugs>	 serviceops, Mobile-Content-Service, Page Content Service, Reading-Infrastructure-Team-Backlog (Kanban): "worker died, restarting" mobileapps issue - https://phabricator.wikimedia.org/T229286 (Mholloway) So one issue right off the bat is that we're generating `/feed/onthisday/all` in the service i...
[18:49:03] <wikibugs>	 serviceops, Mobile-Content-Service, Page Content Service, Reading-Infrastructure-Team-Backlog (Kanban): "worker died, restarting" mobileapps issue - https://phabricator.wikimedia.org/T229286 (Mholloway) p:Normal→High
[18:51:12] <wikibugs>	 serviceops, Mobile-Content-Service, Page Content Service, Reading-Infrastructure-Team-Backlog (Kanban): "worker died, restarting" mobileapps issue - https://phabricator.wikimedia.org/T229286 (Pchelolo) >>! In T229286#5381330, @Mholloway wrote: > So one issue right off the bat is that we're genera...
[18:54:18] <wikibugs>	 serviceops, Mobile-Content-Service, Page Content Service, Reading-Infrastructure-Team-Backlog (Kanban): "worker died, restarting" mobileapps issue - https://phabricator.wikimedia.org/T229286 (Mholloway) Weird.  Why are we still getting internal requests for `/feed/onthisday/all`, then?
[19:01:20] <wikibugs>	 serviceops, Mobile-Content-Service, Page Content Service, Reading-Infrastructure-Team-Backlog (Kanban): "worker died, restarting" mobileapps issue - https://phabricator.wikimedia.org/T229286 (Pchelolo) >>! In T229286#5381396, @Mholloway wrote: > Weird.  Why are we still getting internal requests...
[22:16:17] <mutante>	 there is a "Google Kubernetes Engine Plugin 0.6.3" plugin for Jenkins. "allows you to publish deployments built within Jenkins to your Kubernetes clusters running within GKE". maybe that can be configured for your own cluster