[00:37:49] serviceops, Operations, Platform Team Workboards (Clinic Duty Team): PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (tstarling)
[07:27:40] _joe_: hi - back to mcrouter proxies...? :) (just grabbing a coffee, be right back)
[07:38:53] <_joe_> jayme: hey, yes, so the way to do it is
[07:39:25] <_joe_> comment two of them in puppet's hieradata, run puppet in all of eqiad, reboot them, repeat with the other two
[07:39:40] <_joe_> I've already rebooted the eqiad ones
[07:39:50] <_joe_> without need to do this dance as no one writes to codfw
[07:40:45] Ah, I see. wilco
[07:41:02] <_joe_> happy to review/assist :)
[07:41:15] thanks
[07:51:03] _joe_: https://gerrit.wikimedia.org/r/c/operations/puppet/+/621196
[07:59:55] _joe_: 'A:all-mw-eqiad' is probably right to run puppet on, right? (not really *all of eqiad*)
[08:02:23] <_joe_> yes
[08:02:30] <_joe_> add -b 30
[08:02:39] <_joe_> so we don't kill the puppetmasters
[08:03:58] ok
[08:14:41] _joe_: did you already restore effie's modifications on mwdebug1002?
[08:14:58] <_joe_> jayme: yes, but don't worry about it
[08:15:06] <_joe_> just ignore it in the puppet run
[08:15:46] _joe_: ignore mwdebug1002 completely you mean? Because the puppet run failed there
[08:16:06] <_joe_> yes, it's not serving live traffic so it's ok if it fails for a bit
[08:16:17] ack
[08:26:29] _joe_: next one please :) https://gerrit.wikimedia.org/r/c/operations/puppet/+/621198
[08:32:43] <_joe_> jayme: yes sorry, I was in another convo
[08:32:51] np
[08:37:49] _joe_: prepared the re-enable patch as well https://gerrit.wikimedia.org/r/c/operations/puppet/+/621201
[08:40:40] serviceops, Operations, Platform Engineering, Wikidata, Sustainability (Incident Followup): mw* servers memory leaks (12 Aug) - https://phabricator.wikimedia.org/T260281 (Joe) a:Joe
[08:55:01] _joe_: mw1381 (the bpo kernel one) was rebooted as well some time ago. Which means it's now running nokmem as well :/
[08:55:15] <_joe_> sigh
[08:58:56] <_joe_> so it was this morning right?
[09:03:43] mhm :-)
[09:06:01] serviceops, Operations, SRE-tools: Create a cookbook for depooling one or all services from one kubernetes cluster - https://phabricator.wikimedia.org/T260663 (fgiunchedi) p:Triage→Medium
[09:06:09] serviceops, Operations, SRE-tools: Create a cookbook for applying an apache config change safely - https://phabricator.wikimedia.org/T260664 (fgiunchedi) p:Triage→Medium
[09:20:36] _joe_: servers still missing nokmem: mwdebug[2001-2002].codfw.wmnet,mwdebug1002.eqiad.wmnet,wtp[1047-1048].eqiad.wmnet
[09:20:58] I'll do mwdebug[2001-2002].codfw.wmnet
[09:21:12] nooo
[09:21:18] nooo?
[09:21:18] I will do it
[09:21:22] oooook
[09:21:23] I am running a test
[09:21:32] ah on codfw as well
[09:21:34] I will do the debugs
[09:21:45] I am running from codfw to eqiad
[09:22:17] ok, sorry for the stroke :D
[09:22:32] nah no big deal even if they were
[09:22:52] I have disabled puppet on 1002 so I have to do that as well
[09:24:10] sorry I didn't do any reboots this time around
[09:24:36] np, next ones are on you then :P
[09:24:58] * effie whistles
[09:25:26] the job is quite easy now with joe's cookbook...
[09:28:22] wtp[1047-1048] are okay now as well... \o/
[09:46:20] <_joe_> yes i was restarting all of parsoid
[12:35:13] serviceops, Operations, Platform Team Workboards (Clinic Duty Team): PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (kostajh) This might substantially complicate the implementation, but it would be nice if you could do something like: `lang=php $result...
[12:41:54] hi serviceops! how do I get my new production-images merged/uploaded? I do not have the rights
[12:46:07] could somebody please volunteer to upload them while Hugh is vacationing?
[12:49:02] <_joe_> 🤔
[12:49:17] <_joe_> Pchelolo: did anyone review your patches?
[12:50:06] I have +1s on them: https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/619512 and https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/620068
[12:52:47] <_joe_> I have one comment about the first :D
[12:53:24] serviceops, Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service: Deploy push-notifications service to Kubernetes - https://phabricator.wikimedia.org/T256973 (MSantos)
[12:54:58] eheh, I boldly looked over that one :P
[12:55:58] <_joe_> jayme: I think it's also a good day to rebuild the base images
[12:56:24] _joe_: done! put myself in there.
[12:57:52] serviceops, Operations, Platform Team Workboards (Clinic Duty Team): PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (tstarling) >>! In T260330#6396222, @kostajh wrote: > Then, if the command execution exceeds the value set with `setTimeLimit()`, the com...
[12:59:21] <_joe_> Pchelolo: I'm rebuilding the base images, then I'll build your two new images
[12:59:41] <_joe_> base images == wikimedia-buster etc
[12:59:49] cool, thank you! I'm not in any rush, but without Hugh around I had no way forward..
[13:01:27] <_joe_> that's why serviceops are around :P
[13:06:03] <_joe_> Pchelolo: done!
[13:06:23] woohoo! thank you. I'll start deploying some stuff soon
[13:06:32] <_joe_> cool
[13:07:00] <_joe_> Pchelolo: how are you managing edge traffic? it goes to the api gateway and from there to mediawiki, when the wiki should be reached?
[13:08:41] _joe_: afaik yes, but Hugh was setting that up so i'm not sure
[13:08:51] <_joe_> ack
[13:11:42] serviceops, Operations, Platform Team Workboards (Clinic Duty Team): PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (kostajh) >>! In T260330#6396259, @tstarling wrote: > This already exists, and will continue to exist in the new system. You can use Shel...
[13:25:47] <_joe_> jayme: looks like our change didn't do the right thing
[13:25:59] <_joe_> and I don't get why the heck
[13:26:12] _joe_: XFP ?
[13:26:15] <_joe_> yes
[13:26:19] hmm...
[13:26:27] <_joe_> actually it seems to have broken both
[13:27:26] <_joe_> it's strange given it worked, apparently, for parsoid...
[13:28:44] <_joe_> I have a meeting now, I'd rather revert and analyze later
[13:33:25] sounds about right
[14:24:30] _joe_: I'm a bit confused...I was very sure that "helmfile template" would render all releases in case of old style helmfile.d layout (means: all releases in default environment). Debug output also states that it will process 1 group of releases: "eventgate-analytics/production, eventgate-analytics/canary" but I only get output for the first one...can you confirm?
[14:25:10] <_joe_> jayme: not now, sorry
[14:25:41] _joe_: nah, no need for *now*. Whenever you have a minute
[14:26:20] <_joe_> yeah sorry already doing 5 things at a time
[14:26:49] <_joe_> jayme: but yes what you wrote was my assumption too
[14:26:56] <_joe_> that helmfile template would process both
[14:28:04] no problem! :) ...let me know if I can take something off of you. I'm about to just add an alias "hellfile -> helmfile" to the deb and leave it alone :D
[14:32:27] ah, silly me. canary release of eg-analytics has "installed: false" set for codfw and eqiad
[14:41:38] <_joe_> jayme: https://gerrit.wikimedia.org/r/c/operations/puppet/+/621283
[14:44:28] _joe_: I don't quite get why the last patch broke it then in case we send xfp: https manually to envoy...
[14:44:52] <_joe_> because it was sending "https,https"
[14:44:59] <_joe_> which is *correct and valid*
[14:45:02] oh man...
[14:45:03] <_joe_> but oh well mediawiki
[14:46:22] think it would be nice to conserve that in the commit message as well :)
[14:47:49] <_joe_> yeah, doing so
[14:51:26] thanks
[15:01:09] serviceops, Prod-Kubernetes, Release Pipeline, Patch-For-Review: Refactor our helmfile.d dir structure for services - https://phabricator.wikimedia.org/T258572 (JMeybohm) What I more or less did and what probably help others to review and produce CRs that are "best we can get"-ish to review: `la...
[15:02:38] <_joe_> jayme: done
[15:07:34] _joe_: +1ed :)
[15:44:03] serviceops, Patch-For-Review: Rebuild wikimedia-stretch docker image for repository updates - https://phabricator.wikimedia.org/T257327 (Dzahn)
[15:44:38] <_joe_> Pchelolo: does restbase propagate X-Forwarded-Proto from the request?
[15:47:09] _joe_: I have to check. Will get back to you after the meeting
[15:47:26] <_joe_> Pchelolo: ack thanks :)
[15:57:13] _joe_: nope. it does not
[16:39:57] <_joe_> Pchelolo: ok, thanks
[17:38:56] serviceops, Operations, Patch-For-Review: decom releases1001 and releases2001 - https://phabricator.wikimedia.org/T260742 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `releases2001.codfw.wmnet` - releases2001.codfw.wmnet (**PASS**) - Downtimed host on...
[18:04:53] serviceops, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install kubernetes1017.eqiad.wmnet - https://phabricator.wikimedia.org/T258747 (wiki_willy) a:Jclark-ctr
[18:37:50] serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (Dzahn) This is stalled on T260627 but otherwise should be good to go.
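Editorial note on the X-Forwarded-Proto exchange at 14:44-14:45: when a header is set twice along the path, the values are commonly folded into one comma-separated string, so a receiver that compares the raw value against "https" will not match "https,https". Below is a minimal Python sketch of a more tolerant check. The helper name is hypothetical and this is not the code used by MediaWiki, envoy, or the puppet change discussed; taking the first list entry as authoritative is an assumption.

```python
def forwarded_proto_is_https(header_value: str) -> bool:
    """Hypothetical helper: read X-Forwarded-Proto as a comma-separated list."""
    # Split "https,https" (or "https, http") into trimmed, lowercased tokens.
    tokens = [t.strip().lower() for t in header_value.split(",") if t.strip()]
    # Assumption: the first entry is the scheme seen closest to the client.
    return bool(tokens) and tokens[0] == "https"


if __name__ == "__main__":
    for value in ("https", "https,https", "http, https", ""):
        # Compare the tolerant check with a naive exact-string comparison.
        print(repr(value), forwarded_proto_is_https(value), value == "https")
```

The naive comparison in the last column is the failure mode described in the log: it accepts only the single-value form.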
[18:40:34] serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (Dzahn)
[18:40:49] serviceops, DBA, Operations, Parsoid, and 2 others: update mysql GRANTs for testreduce - https://phabricator.wikimedia.org/T260627 (Dzahn) Open→Resolved >>! In T260627#6392112, @Kormat wrote: > Hi, i've created the new grants. Please test and let me know if there are any issues. Cheers....
[18:40:59] serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (Dzahn) Stalled→Open
[18:42:53] serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (Dzahn) @ssastry The `parsoid-rt` service is now running on testreduce1001 and does not stop anymore because it can t...
[18:43:46] serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (Dzahn) @ssastry Please let me know what else you need on the new instance.
[19:42:12] Amir1: addshore: _joe_: Where are we on https://phabricator.wikimedia.org/T240884 - who's waiting for what?
[19:42:20] (RFC: regex service)
[19:48:43] We are waiting for resourcing / allocated people power to create such a service (and for our roadmap etc to allow it)
[21:23:58] https://strimzi.io/ -- CRD for running Kafka inside a k8s cluster
[21:46:06] serviceops, MediaWiki-Cache, MediaWiki-General, Performance-Team: Use monotonic clock instead of microtime() for perf measures in MW PHP - https://phabricator.wikimedia.org/T245464 (Krinkle)
[21:47:57] addshore: ok, so you're not blocked on the RFC currently
[22:23:22] serviceops, Operations, Patch-For-Review: decom releases1001 and releases2001 - https://phabricator.wikimedia.org/T260742 (hashar)
[22:24:58] serviceops, Operations, Patch-For-Review: decom releases1001 and releases2001 - https://phabricator.wikimedia.org/T260742 (hashar) I am guessing this task is purely for tracking purpose. In case you are seeking any blessing, given releases.wm.o and the releases-jenkins.wm.o are properly working out o...
[23:03:31] serviceops, Operations, Patch-For-Review: decom releases1001 and releases2001 - https://phabricator.wikimedia.org/T260742 (Dzahn) >>! In T260742#6398400, @hashar wrote: > I am guessing this task is purely for tracking purpose. In case you are seeking any blessing, given releases.wm.o and the releases...