[06:13:24] <_joe_> oh damn mw1251 was a scap proxy mutante
[07:44:34] serviceops, Release-Engineering-Team: mw1251 down (no ssh) but still in dsh group? - https://phabricator.wikimedia.org/T248501 (Joe) p:High→Unbreak! a:Dzahn→Joe
[08:06:30] serviceops, Release-Engineering-Team: mw1251 down (no ssh) but still in dsh group? - https://phabricator.wikimedia.org/T248501 (Joe) Not only that, but also mw1252, which is a mcrouter proxy, got decommissioned yesterday. Fixing both.
[08:35:58] serviceops, Release-Engineering-Team, Patch-For-Review: mw1251 down (no ssh) but still in dsh group? - https://phabricator.wikimedia.org/T248501 (Joe) Open→Resolved
[09:19:33] serviceops, MediaWiki-General, Operations, Service-Architecture: Use envoy for TLS termination on the appservers - https://phabricator.wikimedia.org/T247389 (Joe) p:Triage→Medium
[12:09:12] closing port 80 for caching servers on mwmaint. not needed since it's also TLS now (ATS -> envoy) for noc.wikimedia.org
[12:09:44] though cumin master are still allowed as they were before
[12:10:59] mutante: spicerack's mediawiki module calls noc with
[12:11:02] url = 'http://{noc}/conf/{filename}.php.txt'.format(noc=noc_server, filename=filename)
[12:11:08] but can be easily moved to HTTPS
[12:11:22] this was done before when only http was supported internally
[12:11:33] volans: alright, we can look at that in a separate step, nice
[12:12:26] arr. i got one extra ";" in ferm config, gotta fix that
[14:27:34] serviceops, Operations, Patch-For-Review: decom old appservers in eqiad - https://phabricator.wikimedia.org/T247780 (Dzahn) @Jclark-ctr @Cmjohnson @wiki_willy We (serviceops) are aware that currently there won't be onsite work except for emergencies. Additionally we also wanted to clarify that in thi...
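Editor's sketch for the 12:10-12:11 exchange about moving the noc call to HTTPS. This is not spicerack's actual code: the function name `noc_conf_url`, the `scheme` parameter, and the file name `example-config` are assumptions made for illustration. Only the URL template itself comes from the log.

```python
def noc_conf_url(noc_server, filename, scheme="https"):
    """Build the URL of a config file published under /conf/ on noc.

    Hypothetical helper: the template mirrors the one quoted in the log,
    with the scheme made configurable and defaulting to HTTPS.
    """
    if scheme not in ("http", "https"):
        raise ValueError("unsupported scheme: {!r}".format(scheme))
    return "{scheme}://{noc}/conf/{filename}.php.txt".format(
        scheme=scheme, noc=noc_server, filename=filename
    )


if __name__ == "__main__":
    # 'example-config' is a placeholder file name, not a real noc file.
    print(noc_conf_url("noc.wikimedia.org", "example-config"))
```

Keeping the scheme as a parameter would let a caller fall back to plain HTTP during a transition, although with port 80 being closed on mwmaint that fallback may not be useful for long.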
[14:29:08] serviceops, Operations, Patch-For-Review: decom old appservers in eqiad - https://phabricator.wikimedia.org/T247780 (Dzahn) Open→Stalled Setting to stalled. We are waiting at least until Monday before removing the remaining 5 servers in rack D5.
[14:48:49] hello there, i need a python 2 container to build wheels for Zuul. I crafted the container for docker-pkg but I would need a final review and the container to be build/published to our registry. https://gerrit.wikimedia.org/r/#/c/operations/docker-images/production-images/+/580128/
[14:49:12] it is blocking me for the Zuul migration to scap
[14:51:29] serviceops, Analytics, Event-Platform, Patch-For-Review, Wikimedia-production-error: Lots of "EventBus: Unable to deliver all events" - https://phabricator.wikimedia.org/T247484 (Joe) Open→Resolved >>! In T247484#5999552, @Ottomata wrote: > Checking in, how goes? We're now down to mu...
[15:52:22] new docs how to decom appserver (includes the check if scap/mcrouter proxy and example changes) https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Removing_old_appservers_from_production_(decom)
[15:52:49] (i dont really agree that this stuff was moved to a page called "Runbook" though because that's slightly different but ..shrug)
[15:53:12] at least the "how to add new" and "how to remove old" is in the same place
[16:43:16] serviceops, Operations, Patch-For-Review: decom old appservers in eqiad - https://phabricator.wikimedia.org/T247780 (wiki_willy) Thanks for the heads up @Dzahn . @Jclark-ctr has been working on some of the other decom tasks this past week, but as long as this one doesn't show up on the eqiad workboa...
[16:55:20] serviceops: Lots of "EventBus: Unable to deliver all events: 504: Gateway Timeout" - https://phabricator.wikimedia.org/T248602 (Urbanecm)
[16:58:48] serviceops: Lots of "EventBus: Unable to deliver all events: 504: Gateway Timeout" - https://phabricator.wikimedia.org/T248602 (Joe) So what happened there is that eventgate-main was struggling to respond, and we have a rather aggressive timeout now (at 10 seconds). I'll relax that and add some more retry lo...
[16:59:36] serviceops: Lots of "EventBus: Unable to deliver all events: 504: Gateway Timeout" - https://phabricator.wikimedia.org/T248602 (Joe) p:Triage→Medium
[17:12:21] o/ ok back after meetings and lunch
[18:45:22] serviceops, Release Pipeline, Release-Engineering-Team-TODO, Release-Engineering-Team (Pipeline), Services (watching): TEC3:O3:O3.1:Q3 Goal - Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production CD Pipel... - https://phabricator.wikimedia.org/T212801
[18:45:26] serviceops, ChangeProp, Release Pipeline, Release-Engineering-Team-TODO, and 3 others: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (holger.knust) Open→Resolved
[18:47:03] serviceops, ChangeProp, Release Pipeline, Release-Engineering-Team-TODO, and 3 others: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (Pchelolo) Are you planning to file a new ticket for migrating all the rules to k8s change-prop? The work here is not done yet.
[22:17:19] serviceops, ChangeProp, Release Pipeline, Release-Engineering-Team-TODO, and 3 others: Migrate changeprop to kubernetes - https://phabricator.wikimedia.org/T213193 (holger.knust) Yes, Hugh Nolan will be creating another ticket.