[00:01:59] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (tstarling) I haven't made a tarball for wikidiff2 before and I can't find any documentation of how that is meant to be done. It looks...
[00:15:15] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (tstarling) Never mind, I found https://www.mediawiki.org/wiki/Extension:Wikidiff2/Release_process
[00:15:46] serviceops, Operations, Phabricator, Release-Engineering-Team-TODO, and 2 others: Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (Dzahn) 19:10 < mutante> !log phab2001 - restart ssh-phab service after repooling it after buster reinstall, it wasn...
[00:16:19] serviceops, Operations, Phabricator, Release-Engineering-Team-TODO, and 2 others: Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (Dzahn)
[00:19:47] serviceops, Operations, Phabricator, Release-Engineering-Team-TODO, and 2 others: Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (Dzahn)
[00:20:24] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (tstarling) Should be done now.
[00:21:34] serviceops, Operations, Phabricator, Release-Engineering-Team-TODO, and 2 others: Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (Dzahn) - phab1001 is on buster - phab2001 is now also on buster - next: declare maintenance window to switch prod f...
[00:24:10] serviceops, Operations, Phabricator, Release-Engineering-Team-TODO, and 2 others: Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (Dzahn) Technically this ticket is resolved. The "switch prod from phab1003 to phab1001" and "decom phab1003" are pr...
[06:02:31] serviceops, Phabricator, Documentation, Release-Engineering-Team (Development services), and 2 others: Make PHD run on the backup phabricator server (phab2001, currently) - https://phabricator.wikimedia.org/T232883 (Marostegui) >>! In T232883#5676791, @Dzahn wrote: > > - When i merged and puppet...
[06:51:58] serviceops, Operations, Phabricator, Traffic: Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (Joe) I don't think the solution is removing aphlict, but instead proxying to it directly from envoy or ATS, our choice. @ema @Dz...
[06:54:56] serviceops, Operations, User-Joe: SRE FY2019 Q3 goal: Ramp-up serving traffic to PHP 7 - https://phabricator.wikimedia.org/T212828 (Joe) Open→Resolved
[07:04:45] serviceops, Operations: envoyproxy does not automatically reload certificates - https://phabricator.wikimedia.org/T238597 (Joe) I'm confused. The hot restarter is the default since https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536149/ We just need the cert to notify the envoy service.
[08:30:50] serviceops, Operations, Release-Engineering-Team-TODO, Patch-For-Review, Wikimedia-Incident: docker-registry: some layers has been corrupted due to deleting other swift containers - https://phabricator.wikimedia.org/T228196 (akosiaris) Open→Resolved a:akosiaris I'll boldly resolv...
[08:31:46] serviceops, Operations: upgrade and rename krypton & create its codfw equivalent - https://phabricator.wikimedia.org/T224247 (akosiaris) Open→Resolved krypton is no more since 7a36b4e7a94f486a400f0363c263c446c33bba80, resolving.
[09:57:49] serviceops, Operations, Packaging: Build and upload envoy 1.12.0 package. - https://phabricator.wikimedia.org/T237235 (Joe) In the meantime, we have a security release 1.12.1 - I will build it and upload it to stretch and buster.
[09:58:37] serviceops, Operations, Packaging: Build and upload envoy 1.12.0 package. - https://phabricator.wikimedia.org/T237235 (Joe)
[13:19:52] serviceops, Operations: upgrade and rename krypton & create its codfw equivalent - https://phabricator.wikimedia.org/T224247 (Dzahn) Resolved→Open
[13:21:52] serviceops, Phabricator, Documentation, Release-Engineering-Team (Development services), and 2 others: Make PHD run on the backup phabricator server (phab2001, currently) - https://phabricator.wikimedia.org/T232883 (Dzahn) Yes, it was. But it was not expected that phd fails to start entirely wit...
[13:24:20] serviceops, Operations, Phabricator, Traffic: Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (Dzahn) Either way when switching the Hiera key it should either enable or disable all the things and not just some of them. A...
[13:40:01] serviceops, MediaWiki-Maintenance-scripts, Operations: Stop forcing RUNNER=php for foreachwiki/foreachwikiindblist - https://phabricator.wikimedia.org/T230110 (Dzahn) @DannyS712 I don't think we should even add "Patch-For-Review" in the first place, i am not aware of a single time getting a review be...
[13:50:49] serviceops, MediaWiki-Maintenance-scripts, Operations: Stop forcing RUNNER=php for foreachwiki/foreachwikiindblist - https://phabricator.wikimedia.org/T230110 (MarcoAurelio) >>! In T230110#5678220, @Dzahn wrote: > @DannyS712 I don't think we should even add "Patch-For-Review" in the first place, i am...
[14:11:07] _joe_: o/
[14:11:23] <_joe_> ottomata: hi!
[14:11:50] <_joe_> so, do you have a task where I can read what you did up to now?
[14:11:56] sure do!
[14:12:02] you back in italy land now?
[14:12:07] <_joe_> yep
[14:12:16] https://phabricator.wikimedia.org/T236386
[14:12:23] i updated the description with the todos
[14:12:35] <_joe_> ohh that's great
[14:12:48] <_joe_> I'll try going through that list of patches as fast as I can
[14:13:28] k danke, there's an outstanding q for ema in https://gerrit.wikimedia.org/r/c/operations/puppet/+/551247
[14:13:38] that's the last one tho
[14:13:41] <_joe_> I just have a few other people to unblock too
[14:13:44] k
[14:13:56] <_joe_> and they woke up earlier than you :D
[14:13:58] i'll ping ema in other channel...
[14:13:59] :)
[14:14:26] np, just checking email, looks like I have some more naming bike shed discussions to occupy my time
[14:15:21] <_joe_> ahah
[14:41:12] serviceops, Operations: envoyproxy does not automatically reload certificates - https://phabricator.wikimedia.org/T238597 (CDanis) Open→Resolved a:Joe
[15:08:24] serviceops, Analytics-Kanban, Better Use Of Data, Event-Platform, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata)
[15:11:28] serviceops, Parsoid-PHP: Class not found transient errors after Parsoid/PHP scap3 deploys - https://phabricator.wikimedia.org/T238748 (ssastry) p:Triage→High
[15:20:46] <_joe_> hey serviceops folks, can someone pick https://phabricator.wikimedia.org/T236437 up?
[15:21:02] <_joe_> we have to give racking recommendations to our dcops colleagues
[15:21:38] <_joe_> basically this means going to netbox and ensuring we have an almost-even rate of servers across rows once we remove the servers those will substitute
[15:29:25] I can work on it
[15:30:09] serviceops, Operations, ops-eqiad: rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (jijiki)
[15:34:15] * _joe_ hugs effie
[15:34:24] <_joe_> thanks :)
[15:35:53] <_joe_> we've been blocking that for way too long, and with "we" I mean "I" :P
[15:57:57] serviceops, Analytics-Kanban, Better Use Of Data, Event-Platform, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (akosiaris) >>! In T236386#5672742, @Ottomata wrote: > @Joe @akosiaris @ema I'd like to move forward with these patches this...
[15:59:53] serviceops, Analytics-Kanban, Better Use Of Data, Event-Platform, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata) Thank you! I like your suggestions on the kafka producer TLS one, will implement. Joe can help with the rest tod...
[17:27:38] serviceops, Parsoid-PHP: Class not found transient errors after Parsoid/PHP scap3 deploys - https://phabricator.wikimedia.org/T238748 (ssastry)
[17:29:47] serviceops, Parsoid-PHP: Class not found transient errors after Parsoid/PHP scap3 deploys - https://phabricator.wikimedia.org/T238748 (Joe) The solution would be, IMHO, to deploy parsoid-php as part of the main mediawiki code deployment (so, via scap and not scap3). We have a well established, working...
[17:32:51] serviceops, Parsoid-PHP: Class not found transient errors after Parsoid/PHP scap3 deploys - https://phabricator.wikimedia.org/T238748 (mobrovac) >>! In T238748#5679068, @Joe wrote: > The solution would be, IMHO, to deploy parsoid-php as part of the main mediawiki code deployment (so, via scap and not sca...
[17:37:17] _joe_, o/ ... we do want to move parsoid extension code into core, but that is some time away and so, if there is a solution to make this work till then, that would be helpful ... there are so many pieces to the porting and integration, that we are trying to bite off smallish pieces at a time and finish them off.
[17:41:49] <_joe_> ok, it's not going to be particularly easy but we can try something
[18:00:57] _joe_: how goes over there :)?
[18:01:13] akosiaris: made some changes in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/551610, not totally sure if that works though
[18:01:23] my local minikube is so busted I can't really test locally
[18:01:37] <_joe_> ottomata: I'm midway through the first patch, but I'll finish tomorrow
[18:01:48] <_joe_> I will have all of them reviewed by the time you come online
[18:01:52] ok ty
[18:01:54] <_joe_> pinkie promise
[18:02:04] <_joe_> it looked good though
[18:04:27] serviceops, Release Pipeline, Release-Engineering-Team: Provide the official production base images for Wikimedia use - https://phabricator.wikimedia.org/T238774 (Jdforrester-WMF)
[18:42:40] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (jijiki) Version 1.10.0-1~wmf1 has been deployed to `deployment-mediawiki-09` and `deployment-mediawiki-07`. Please let me know if it w...
[19:06:30] serviceops, Operations: dropped packets to phab1003 22280/tcp - https://phabricator.wikimedia.org/T238781 (ayounsi) p:Triage→Normal
[19:10:34] serviceops, Operations: dropped packets to phab1003 22280/tcp - https://phabricator.wikimedia.org/T238781 (Dzahn) a:Dzahn
[19:18:04] serviceops, Operations: dropped packets to phab1003 22280/tcp - https://phabricator.wikimedia.org/T238781 (Dzahn)
[19:18:08] serviceops, Operations, Phabricator, Traffic: Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (Dzahn)
[19:19:54] serviceops, Operations: dropped packets to phab1003 22280/tcp - https://phabricator.wikimedia.org/T238781 (Dzahn) Port 22280/tcp is the aphlict service which is currently disabled (T238593). In addition to the backend it has also been disabled in ATS (https://gerrit.wikimedia.org/r/c/operations/puppet/+...
[19:26:48] serviceops, Parsoid-PHP, Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review: php-fpm isn't restarted when deploys are rolled back - https://phabricator.wikimedia.org/T238685 (mobrovac) a:mobrovac
[19:27:20] serviceops, Parsoid-PHP, Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review: Class not found transient errors after Parsoid/PHP scap3 deploys - https://phabricator.wikimedia.org/T238748 (mobrovac) a:mobrovac
[19:37:01] serviceops, Operations, Patch-For-Review: dropped packets to phab1003 22280/tcp - https://phabricator.wikimedia.org/T238781 (Dzahn) @ema @Vgutierrez So this was removed from ATS in https://gerrit.wikimedia.org/r/c/operations/puppet/+/551731 but does it also need https://gerrit.wikimedia.org/r/c/oper...
[20:08:05] serviceops, Performance-Team: make xhgui::app role support buster and deploy on new xhgui machines - https://phabricator.wikimedia.org/T238788 (Dzahn)
[20:27:20] serviceops, Operations: dropped packets to echostore.svc.eqiad 8082/tcp - https://phabricator.wikimedia.org/T238789 (ayounsi) p:Triage→Normal
[20:31:51] serviceops, Operations, Phabricator, Release-Engineering-Team-TODO, and 2 others: Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (Dzahn)
[20:34:40] serviceops, Operations, observability: dropped packets to conf1004/5/6 2379/tcp - https://phabricator.wikimedia.org/T238791 (ayounsi)
[20:38:21] serviceops, Product-Infrastructure-Team-Backlog, Wikifeeds: Wikifeeds deployment failed in staging - https://phabricator.wikimedia.org/T238792 (Mholloway)
[20:38:41] serviceops, Product-Infrastructure-Team-Backlog, Wikifeeds: Wikifeeds deployment failed in staging - https://phabricator.wikimedia.org/T238792 (Mholloway)
[20:50:28] serviceops, Operations, Release-Engineering-Team, Performance-Team (Radar), and 2 others: PHP Fatal error: The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp on mwdebug1002) - https://phabricator.wikimedia.org/T214734 (Krinkle) Looks like it is still happening: https://log...
[20:55:22] serviceops, Operations, Phabricator, Traffic, Patch-For-Review: Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (mmodell) >>! In T238593#5678148, @Dzahn wrote: > As Mukunda pointed out the aphlict service does not even...
[20:55:49] serviceops, Product-Infrastructure-Team-Backlog, Wikifeeds: Wikifeeds deployment failed in staging - https://phabricator.wikimedia.org/T238792 (Mholloway) Looks like there are currently issues accessing docker-registry.wikimedia.org, which may be to blame here.
[20:56:19] serviceops, Performance-Team, Patch-For-Review: make xhgui::app role support buster and deploy on new xhgui machines - https://phabricator.wikimedia.org/T238788 (Dzahn)
[21:03:17] serviceops, Parsoid-PHP, Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review: Class not found transient errors after Parsoid/PHP scap3 deploys - https://phabricator.wikimedia.org/T238748 (ssastry) Open→Resolved
[22:16:44] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (Pchelolo) Hehe... @jijiki could you do deployment-parsoid as well please?
[22:17:57] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (eprodromou) >>! In T236963#5679379, @jijiki wrote: > Version 1.10.0-1~wmf1 has been deployed to `deployment-mediawiki-09` and `deploym...
[22:20:25] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (eprodromou) OK, I jumped the gun. Apparently that's not output from the 1.10.0 version.
[22:24:50] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (jijiki) >>! In T236963#5680172, @Pchelolo wrote: > Hehe... @jijiki could you do deployment-mediawiki-parsoid-* as well please? all the...