[02:24:40] 10serviceops, 10Operations, 10Performance-Team (Radar), 10User-jijiki: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 (10Jdforrester-WMF)
[02:26:52] 10serviceops, 10Operations, 10PHP 7.2 support, 10Performance-Team (Radar), and 2 others: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) - https://phabricator.wikimedia.org/T224491 (10Jdforrester-WMF) This was a blocker for {T219150}, right? Does {T224857} also block that or is this work s...
[03:58:54] 10serviceops, 10Operations, 10PHP 7.2 support, 10Performance-Team (Radar), and 2 others: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) - https://phabricator.wikimedia.org/T224491 (10jijiki) >>! In T224491#5291244, @Jdforrester-WMF wrote: > This was a blocker for {T219150}, right? Yeah...
[06:09:46] 10serviceops, 10Operations, 10Continuous-Integration-Infrastructure (phase-out-jessie): Upload docker-ce 18.06.3 upstream package for Stretch - https://phabricator.wikimedia.org/T226236 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff This was resolved via T215975
[07:00:28] 10serviceops, 10Operations: cronspam for slow queries in PageAssessments - https://phabricator.wikimedia.org/T197564 (10Joe) a:05Joe→03None
[07:02:53] 10serviceops, 10Core Platform Team Backlog (Watching / External), 10Patch-For-Review, 10Services (watching): Undeploy electron service from WMF production - https://phabricator.wikimedia.org/T226675 (10Joe) p:05Triage→03Normal
[07:23:29] 10serviceops, 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, and 3 others: Deploy Wikidiff2 version 1.8.2 with the timeout issue fixed - https://phabricator.wikimedia.org/T223391 (10Joe) a:05jijiki→03Joe I did rollout the new version on the canary servers today. If I don't see higher error rates on mo...
[07:57:54] 10serviceops, 10Operations, 10Continuous-Integration-Infrastructure (phase-out-jessie): Upload docker-ce 18.06.3 upstream package for Stretch - https://phabricator.wikimedia.org/T226236 (10hashar) 05Resolved→03Open I need the package for **Stretch**!
[08:01:37] 10serviceops, 10Operations, 10Continuous-Integration-Infrastructure (phase-out-jessie): Upload docker-ce 18.06.3 upstream package for Stretch - https://phabricator.wikimedia.org/T226236 (10MoritzMuehlenhoff) And that is what T215975 provides...
[08:42:20] 10serviceops, 10Operations, 10Continuous-Integration-Infrastructure (phase-out-jessie): Upload docker-ce 18.06.3 upstream package for Stretch - https://phabricator.wikimedia.org/T226236 (10hashar) Ah eventually I found the entry: ` Name: thirdparty/kubeadm-k8s-docker.com Method: https://download.docker.com/...
[08:45:33] _joe_: Buongiorno! For contint1001 disk partitioning, you told me earlier this week you had a recommendation in mind. Would you be able to dump it on the task please? :] https://phabricator.wikimedia.org/T207707
[10:29:08] 10serviceops, 10Operations, 10observability: Gather metrics on request status codes, latencies from the MediaWiki appservers - https://phabricator.wikimedia.org/T226815 (10Joe)
[10:36:08] 10serviceops, 10Operations, 10observability: Gather metrics on request status codes, latencies from the MediaWiki appservers - https://phabricator.wikimedia.org/T226815 (10Joe) One relatively easy way to go could be to use mtail, which we use for quite some other things too. There is even an [[https://gith...
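As a purely illustrative sketch of the mtail idea Joe floats in T226815 above: mtail tails a log file, runs a small program over each line, and exposes the resulting counters over HTTP for Prometheus to scrape. The program name, log path, and port below are assumptions made for the example, not the actual puppet-managed setup:

# Assumed example invocation on an appserver; apache_status.mtail would be a hypothetical
# mtail program defining counters keyed on HTTP status code plus a request-latency histogram.
mtail -progs /etc/mtail/apache_status.mtail \
      -logs /var/log/apache2/other_vhosts_access.log \
      -port 3903
# Prometheus would then scrape the exported metrics from port 3903 on each appserver.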
[10:36:29] 10serviceops, 10Operations, 10observability: Gather metrics on request status codes, latencies from the MediaWiki appservers - https://phabricator.wikimedia.org/T226815 (10Joe)
[10:37:10] <_joe_> hashar: yeah I said I needed to discuss that with the team, and I think some of it will happen on monday
[10:38:05] _joe_: awesome! thank you :)
[10:40:34] 10serviceops, 10Operations, 10observability: Gather metrics on request status codes, latencies from the MediaWiki appservers - https://phabricator.wikimedia.org/T226815 (10jbond) p:05Triage→03Normal
[10:48:28] 10serviceops, 10Continuous-Integration-Infrastructure, 10Operations, 10Release-Engineering-Team (Kanban): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (10hashar) Update: it needs more discussion among the SRE team :-]
[10:59:27] 10serviceops, 10Wikidata-Termbox-Hike: Create termbox release for test.wikidata.org - https://phabricator.wikimedia.org/T226814 (10Tarrow)
[13:15:17] 10serviceops, 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, and 3 others: Deploy Wikidiff2 version 1.8.2 with the timeout issue fixed - https://phabricator.wikimedia.org/T223391 (10awight) @jijiki We're a bit confused because the beta cluster [[ https://en.wikipedia.beta.wmflabs.org/wiki/Special:Version |...
[13:20:57] 10serviceops, 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, and 3 others: Deploy Wikidiff2 version 1.8.2 with the timeout issue fixed - https://phabricator.wikimedia.org/T223391 (10Joe) >>! In T223391#5292192, @awight wrote: > @jijiki We're a bit confused because the beta cluster [[ https://en.wikipedia.b...
[13:22:30] 10serviceops, 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, and 3 others: Deploy Wikidiff2 version 1.8.2 with the timeout issue fixed - https://phabricator.wikimedia.org/T223391 (10awight) >>! In T223391#5292196, @Joe wrote: >>>! In T223391#5292192, @awight wrote: >> @jijiki We're a bit confused because t...
[13:24:54] 10serviceops, 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, and 3 others: Deploy Wikidiff2 version 1.8.2 with the timeout issue fixed - https://phabricator.wikimedia.org/T223391 (10Joe) Yes :) It's by far the best option.
[13:25:21] 10serviceops, 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, and 3 others: Deploy Wikidiff2 version 1.8.2 with the timeout issue fixed - https://phabricator.wikimedia.org/T223391 (10awight) Certainly seems to be the case. Good news, with PHP7 enabled I get the expected results where the new version of wik...
[13:33:14] what do I need to do, if I need a change of configuration to sessionstore (running in k8s)?
[13:34:15] I'm 99.9% sure I know where that configuration happens (deploy1001:/srv/scap-helm/sessionstore/sessionstore-{codfw,eqiad,staging}-values.yaml), but I don't know what needs to happen from there
[13:38:22] to be clear, the configuration I speak of is the application deployed there (Kask), and nothing else
[13:53:33] <_joe_> you need to do a new deployment, using the new values file
[13:53:54] <_joe_> have you ever deployed sessionstore before?
[13:56:42] <_joe_> something like (check the values please)
[13:57:31] <_joe_> CLUSTER="..."
scap-helm sessionstore upgrade production -f .yaml stable/sessionstore
[13:57:52] <_joe_> but akosiaris and fsero can confirm if I missed something
[13:58:56] <_joe_> btw I see you're at revision 9 of the deployment in eqiad, and at version 2 in codfw
[13:59:05] <_joe_> but they happened at the same time
[14:00:24] <_joe_> it's stable/kask obviously
[14:00:31] <_joe_> which is the chart name
[14:00:32] what _joe_ describes is correct
[14:00:36] yup
[14:00:49] <_joe_> yeah I didn't check before writing
[14:00:51] you just need to change the values.yaml and do scap-helm
[14:01:35] urandom: which reminds me that we should schedule a training for this. But wait a bit so we can finalize the last pieces of helmfile and ditch scap-helm
[14:02:36] akosiaris: yeah, a training would be awesome!
[14:03:21] so, CLUSTER="sessionstore" scap-helm sessionstore upgrade production -f .yaml stable/kask ?
[14:03:49] or...what, specifically, should CLUSTER be set to?
[14:05:01] CLUSTER should be staging|eqiad|codfw
[14:05:12] I suggest you start with staging and check everything works as it should
[14:06:10] fsero: yup; thanks!
[14:08:12] hrmm
[14:08:21] https://www.irccloud.com/pastebin/OWGay4oy/
[14:08:38] Error: UPGRADE FAILED: "production" has no deployed releases
[14:09:35] in staging it's called staging
[14:10:36] CLUSTER=staging scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask
[14:11:06] oooohhh
[14:11:17] man, I had to parse and reparse that like 3 times
[14:22:11] OK, one more dumb question (hopefully the last), but what is the staging address? I know the production (LVS) host(s), but I see no reference to staging in any of the associated tickets
[14:33:36] mmm that I don't know
[14:33:51] maybe there is no LVS/DNS associated
[14:33:55] lemme check
[14:43:44] yup none
[14:43:54] urandom: on deploy1001 you can do sudo KUBECONFIG=/etc/kubernetes/sessionstore-staging.config kubectl -n sessionstore port-forward kask-staging-7b797797cd-6rbmq :8081
[14:44:11] and it will output a random local port like Forwarding from 127.0.0.1:37363 -> 8081
[14:44:19] then in another terminal on deploy1001
[14:44:28] curl -vv -k https://127.0.0.1:37363/healthz
[14:44:33] or whatever you want to check
[14:45:43] fsero: is this something that needs to be undone when I'm finished?
[14:45:55] just Ctrl-C the port-forward command
[14:46:00] oic
[14:47:20] heh, I don't have sudo on deploy1001
[14:47:43] you don't need it
[14:47:49] KUBECONFIG=/etc/kubernetes/sessionstore-staging.config kubectl -n sessionstore port-forward kask-staging-7b797797cd-6rbmq :8081
[14:47:56] i think
[14:48:01] :)
[14:48:26] yeah, seems to be working...
[14:52:08] fsero: thanks!
[15:09:01] fsero: yeah we need to create a DNS like staging.svc.eqiad.wmnet pointing to kubestage1001, kubestage1002
[15:09:13] that should address the testing issue
[15:09:18] doing LVS for that though is overkill
[17:22:02] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: etcd1004-1006 is unused and idle, use the cluster or kill it. - https://phabricator.wikimedia.org/T212934 (10Dzahn)
[17:23:52] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: etcd1004-1006 is unused and idle, use the cluster or kill it. - https://phabricator.wikimedia.org/T212934 (10Dzahn) added decom checklists for each host. taken from https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Steps_for_ANY_Opsen -> https://phabricato...
[17:25:26] 10serviceops, 10Prod-Kubernetes, 10decommission, 10Kubernetes: etcd1004-1006 is unused and idle, use the cluster or kill it. - https://phabricator.wikimedia.org/T212934 (10Dzahn)
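To pull the sessionstore staging steps from the exchange earlier this afternoon into one place (all commands run on deploy1001; the pod name and the forwarded local port are simply the ones that appeared in this session and will differ on another day, so treat them as placeholders):

# 1. Upgrade the staging release with the edited values file (command as given at 14:10):
CLUSTER=staging scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask

# 2. Port-forward to the staging pod (no sudo needed); kubectl prints the local port it chose,
#    e.g. "Forwarding from 127.0.0.1:37363 -> 8081". The pod name below is from this session only.
KUBECONFIG=/etc/kubernetes/sessionstore-staging.config kubectl -n sessionstore port-forward kask-staging-7b797797cd-6rbmq :8081

# 3. In a second terminal on deploy1001, check the service through the forwarded port:
curl -vv -k https://127.0.0.1:37363/healthz

# 4. Ctrl-C the port-forward when done; nothing else needs to be undone.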
[17:42:35] hey is https://wikitech.wikimedia.org/wiki/LVS#Pool_or_depool_hosts_(for_non-Etcd_managed_pools) correct?
[17:42:41] /srv/pybal-confi
[17:42:51] doesn't exist (i see /srv/config-master/pybal/)
[17:43:06] and, i'm not sure what is meant by 'Please don't forget to commit your changes locally.'
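The question above goes unanswered in this log, but the likely reading of "commit your changes locally" is that the pybal configuration directory on the config master is a plain git checkout, so after editing a pool file you record the change with a local git commit. A heavily hedged sketch, assuming the path the asker found and with made-up file and host names:

# Sketch only: the directory layout, pool file name and host name below are guesses,
# and the wiki page (not this log) is the authority on the real procedure.
cd /srv/config-master/pybal/
sudoedit eqiad/appservers          # hypothetical pool file; toggle the host's "enabled" flag
git add eqiad/appservers
git commit -m "Depool mw1262 for maintenance"   # presumably the "commit your changes locally" step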