[00:51:40] serviceops, Shellbox: Have Shellbox emit metrics - https://phabricator.wikimedia.org/T271179 (Legoktm)
[09:32:30] <_joe_> work up for grabs: merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/610050, rebuild the base stretch image, then update all the docker images based on stretch in our production images list.
[09:37:56] <_joe_> hnowlan: around? I see cpjobqueue is still running on the scb cluster... I hope it's not still processing jobs though
[09:45:15] serviceops, Machine Learning Platform, ORES, Okapi, and 4 others: ORES redis: max number of clients reached... - https://phabricator.wikimedia.org/T263910 (awight) >>! In T263910#6620877, @Ladsgroup wrote: > Guess when changes got merged: https://grafana.wikimedia.org/d/HIRrxQ6mk/ores?viewPanel=1...
[09:56:02] serviceops, SRE-tools, IPv6: Some Service Operations clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271142 (JMeybohm) > kubernetes[1007-1014].eqiad.wmnet As I see it these hosts do have IPv6 in DNS (and netbox).
[09:59:24] serviceops, SRE-tools, IPv6: Some Service Operations clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271142 (Joe) Please let's not add IPV6 to hosts blindly, each group would need to be verified independently to ensure enabling ipv6 would not break firewall rules (are the...
[10:01:16] serviceops, SRE-tools, IPv6: Some Service Operations clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271142 (Joe) p:Triage→Low
[10:01:58] _joe_: definitely not processing, all jobs have been disabled since the migration
[10:02:01] stopping them now
[10:02:40] <_joe_> hnowlan: we should just run systemctl disable on all nodes
[10:02:49] <_joe_> and maybe remove the unit files altogether?
[10:04:10] aye, doing that
[10:15:28] serviceops, SRE-tools, IPv6: Some Service Operations clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271142 (Joe) Specifically, the following clusters are reached directly by hostname (and not via LVS) and will need special care: - mc* - rdb* - restbase* - sessionstore*...
[10:16:11] <_joe_> thanks <3
[10:22:46] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: k8s_infrastructure_users: rsyslog and echostore share the same id - https://phabricator.wikimedia.org/T269461 (JMeybohm) a:JMeybohm
[14:59:44] jayme: o/ what can I do to get https://phabricator.wikimedia.org/T269160 done?
[14:59:48] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/644612
[15:00:58] serviceops, Analytics, Analytics-Kanban, Event-Platform, and 5 others: Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo) - https://phabricator.wikimedia.org/T269160 (Ottomata)
[15:01:12] hi ottomata o/ sorry, did not have time to look into that last Q
[15:01:21] no prob, it is low priority for sure
[15:06:35] ottomata: guess you want to make this easily available via LVS/geodns, right?
[15:07:21] jayme: yeah that'd be fine, but not publicly, so something like eventstreams-internal.discovery.wmnet would be fine
[15:07:31] serviceops, Operations, Platform Engineering, Wikidata, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (jijiki)
[15:07:33] MAYYYYBE we could do CAS with a public addy
[15:08:18] but would rather just get it set up first and then think about that
[15:08:39] its not really any less secure than say superset.wikimedia.org so, i guess it would be fine
[15:11:36] serviceops, Machine Learning Platform, ORES, Okapi, and 4 others: ORES redis: max number of clients reached... - https://phabricator.wikimedia.org/T263910 (calbon) I'm just happy I am not resetting it every few hours. One of our goals was to put it in a stable mode and we've done it!
[15:12:32] It's the usual setup dance like with any "public" service, then. I can pick that up this Q (unless someone else wants to - will ask on monday) and set up the LVS stuff etc.
[15:12:59] <_joe_> I have a question, though
[15:13:10] Of course you have :-)
[15:13:11] <_joe_> how does analytics need assistance from us?
[15:13:24] <_joe_> you have two SREs plus you, ottomata
[15:13:33] <_joe_> I doubt you need to wait for us
[15:19:17] serviceops, Operations, Platform Engineering, Wikidata, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc1025.eqiad.wmnet ` The log can be...
[15:19:23] serviceops, Operations, Platform Engineering, Wikidata, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc2025.codfw.wmnet ` The log can be...
[15:19:53] We could use that as training opportunity as well...nothing involved that only service ops can do I guess
[15:21:36] serviceops, Shellbox: Have Shellbox emit metrics - https://phabricator.wikimedia.org/T271179 (Joe) I think we will be running every single endpoint as a separate installation (to reduce the attack surface in the single container). Apart from that, what you described can be covered by a single prometheus...
[15:23:53] i'd be happy to do it
[15:24:02] would need hand holding for the network parts
[15:24:06] jayme: _joe_ ^
[15:24:53] <_joe_> ottomata: maybe we could involve someone who's not currently also managing the team :)
[15:25:08] <_joe_> so that, say, luca knows how to operate on k8s in the end :)
[15:25:21] hmmm that would be cooll
[15:25:23] i bet he'd like that
[15:25:25] elukey: ^ :)
[15:25:55] wait who's managing the team? don't say me, i don't manange no sir I don't.
[15:26:08] :p
[15:30:23] did I get a gift for my 5y wiki-birthday?
[15:30:49] :D
[15:33:10] jayme: if you have patience to teach me the dark secrets of k8s I'd be very happy to learn
[15:33:30] something like: I read the docs, come up with a non-sense plan, and you tell me what's wrong
[15:33:42] (probably more than once)
[15:34:09] I fear the darker parts are on the puppet/lvs side but, yeah - of course!
[15:34:37] there is always puppet ruining the joy
[15:35:26] jayme: deal then, I'll be able to pick this up in ~ 2 weeks time, would it be ok for you?
[15:36:28] yeah, cool! Let me know when you have a spot for a quick check-in. Might make sense to figure out where to pick you up regarding all this :)
[15:38:11] jayme: is there some background reading for n000bs that you'd suggest me to read before asking questions?
[15:38:37] jaime: +1 on the training opportunity
[15:39:19] jayme: +1 on the training opportunity
[15:40:14] elukey: Sure I'll send you something by mail
[15:43:53] thankssss
[15:50:18] (tomorrow probably)
[16:14:29] :)
[16:14:53] elukey: i think most of the dev work is done here: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/644612
[16:14:56] serviceops, Operations, Platform Engineering, Wikidata, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc1025.eqiad.wmnet'] ` and were **ALL** successful.
[16:15:10] there's just some background networking etc. work that also needs to be done to set up a new deployment in k8s
[16:17:49] serviceops, Operations, Platform Engineering, Wikidata, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2025.codfw.wmnet'] ` and were **ALL** successful.
[16:19:45] serviceops, Operations, conftool, Datacenter-Switchover: Disable maintenance scripts via conftool - https://phabricator.wikimedia.org/T266717 (Joe) I think we have a better way to avoid this. Basically we want to stop running scripts once we get into the readonly phase. So we could modify the wra...
[16:49:26] serviceops, Operations, Platform Engineering, Wikidata, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (jijiki)
[17:08:12] <_joe_> rzl/legoktm: I forgot to say - I will try to pen down a bit better what we said in the meeting
[17:09:44] 👍 thanks for doing that
[17:13:18] thanks
[17:20:03] serviceops, Parsoid, RESTBase, Patch-For-Review: Decommission Parsoid/JS from the Wikimedia cluster - https://phabricator.wikimedia.org/T241207 (Jdforrester-WMF) Open→Resolved This is definitely done now.
[17:38:20] serviceops, Operations, conftool, Datacenter-Switchover: Disable maintenance scripts via conftool - https://phabricator.wikimedia.org/T266717 (RLazarus) >>! In T266717#6722705, @Joe wrote: > I think we have a better way to avoid this. Basically we want to stop running scripts once we get into the...
[19:02:09] serviceops, Operations, Performance-Team, Patch-For-Review, User-jijiki: Enable "/*/mw-with-onhost-tier/" route for MediaWiki where safe - https://phabricator.wikimedia.org/T264604 (aaron) >>! In T264604#6681125, @jijiki wrote: > @Krinkle @aaron do you think we are ready to move this forward?...
[21:41:28] serviceops, Operations, Wikimedia-production-error: PHP7 corruption reports in 2020 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (hashar) That happened during 1.36.0-wmf.25 promotion to testwiki. We then had three servers showing all the symptoms of suffering from an opcac...
[21:43:42] serviceops, Operations, Wikimedia-production-error: PHP7 corruption reports in 2020 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (RhinosF1) Do we need to update the title / create a 2021 task?
[21:44:08] serviceops, Operations, Wikimedia-production-error: PHP7 corruption reports in 2020-2021 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (Krinkle)
[22:30:19] serviceops, MW-on-K8s, Platform Engineering, Release-Engineering-Team-TODO, and 5 others: Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (Jdforrester-WMF)
[22:54:46] serviceops, MW-on-K8s, Operations, Shellbox, and 3 others: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (Legoktm)