[00:57:47] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1345.eqiad.wmnet'] ` an...
[01:07:12] serviceops, Patch-For-Review, Performance-Team (Radar): Convert mwdebug VMs to debian buster - https://phabricator.wikimedia.org/T274023 (Dzahn)
[01:08:26] serviceops, Patch-For-Review, Performance-Team (Radar): Convert mwdebug VMs to debian buster - https://phabricator.wikimedia.org/T274023 (Dzahn)
[01:08:54] serviceops, Patch-For-Review, Performance-Team (Radar): Convert mwdebug VMs to debian buster - https://phabricator.wikimedia.org/T274023 (Dzahn) [x] https://wikitech.wikimedia.org/wiki/Mwdebug1001 [x] https://wikitech.wikimedia.org/wiki/Mwdebug1002 [x] https://wikitech.wikimedia.org/wiki/Mwdebug2001...
[01:45:28] serviceops, Patch-For-Review, Performance-Team (Radar): Convert mwdebug VMs to debian buster - https://phabricator.wikimedia.org/T274023 (Dzahn) Open→Resolved
[01:45:35] serviceops, SRE, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (Dzahn)
[01:47:31] serviceops, Patch-For-Review, Performance-Team (Radar): Convert mwdebug VMs to debian buster - https://phabricator.wikimedia.org/T274023 (Dzahn) All users can find a .tar.gz in their home dir on each host that contains what was in their home before they were reimaged. Fingerprints can be found on th...
[04:50:39] serviceops, SRE, Sustainability (Incident Followup): High latency on appservers - https://phabricator.wikimedia.org/T272215 (Krinkle)
[04:52:52] serviceops, SRE, Traffic, Sustainability (Incident Followup), Wikimedia-Incident: The safe service restart script doesn't detect failure when running with poolcounter. - https://phabricator.wikimedia.org/T272262 (Krinkle)
[05:11:48] serviceops, SRE, SRE-OnFire-Incident-Docs, Sustainability (Incident Followup): High latency on appservers - https://phabricator.wikimedia.org/T272215 (Krinkle) >>! In T272215#6755992, @jcrespo wrote: > More details are yet to be provided on the Incident report, I can help with that once the right...
[05:14:44] serviceops, SRE, SRE-OnFire-Incident-Docs, Sustainability (Incident Followup): High latency on appservers - https://phabricator.wikimedia.org/T272215 (Krinkle)
[08:16:02] <_joe_> uhm it's not clear to me what we decided yesterday re: T271573
[08:34:21] _joe_: IIRC we did not decide anything on that particular one
[08:40:04] serviceops, SRE, SRE-OnFire-Incident-Docs, Sustainability (Incident Followup): High latency on appservers - https://phabricator.wikimedia.org/T272215 (jcrespo) I personally don't feel capable neither to write proper docs, file follow ups nor to close it. When I said "more details are yet to be pr...
[08:46:40] serviceops, SRE, SRE-OnFire-Incident-Docs, Sustainability (Incident Followup): High latency on appservers - https://phabricator.wikimedia.org/T272215 (Joe) >>! In T272215#6836259, @Krinkle wrote: >>>! In T272215#6755992, @jcrespo wrote: >> More details are yet to be provided on the Incident repor...
[10:08:34] serviceops, docker-pkg: Add a verify step to docker-pkg - https://phabricator.wikimedia.org/T273427 (Joe) Open→Resolved
[10:08:37] serviceops, Prod-Kubernetes, Patch-For-Review, User-fsero: Set up PodSecurityPolicies in clusters - https://phabricator.wikimedia.org/T228967 (Joe)
[10:12:17] serviceops, SRE, envoy, Service-Architecture: Using envoy to connect from MediaWiki to restbase causes an explosion of live LVS connections. - https://phabricator.wikimedia.org/T266855 (Joe) a:Joe
[10:56:20] serviceops, SRE, envoy, Service-Architecture: Using envoy to connect from MediaWiki to restbase causes an explosion of live LVS connections. - https://phabricator.wikimedia.org/T266855 (Joe) First observation I can make is that most requests are done by the math extension, and usually go in pairs...
[14:09:43] serviceops, Prod-Kubernetes, Patch-For-Review: Check/Rebuild all docker-pkg build docker images running on kubernetes - https://phabricator.wikimedia.org/T274254 (JMeybohm)
[16:43:50] serviceops, SRE, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for host...
[16:44:53] serviceops, SRE, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for host...
[16:45:39] serviceops, SRE, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for host...
[16:46:30] serviceops, SRE, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for host...
[16:57:31] serviceops, Prod-Kubernetes, Patch-For-Review: Check/Rebuild all docker-pkg build docker images running on kubernetes - https://phabricator.wikimedia.org/T274254 (JMeybohm)
[17:12:09] mutante: did you see the opcache free space alert on mwdebug1002? haven't dug at all yet
[17:13:01] correction, 1002 and 2002
[17:14:28] rzl: not yet, but thanks for pointing it out. I think they need reboot, will do that
[17:14:55] (physical machines get that reboot automatically from cookbook)
[17:16:05] ahh
[17:17:12] rzl: actually.. i won't because I see effie is testing something there
[17:17:41] jiji is tailing logs on both. I will check again after the research showcase
[17:18:56] 👍
[18:07:56] serviceops, SRE, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1275.eqiad.wmnet'] ` and were **ALL** s...
[18:08:33] serviceops, SRE, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1344.eqiad.wmnet'] ` and were **ALL** s...
[18:09:40] serviceops, SRE, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1343.eqiad.wmnet'] ` and were **ALL** s...
[18:18:31] serviceops, SRE, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1350.eqiad.wmnet'] ` and were **ALL** s...
[20:27:46] serviceops, MediaWiki-Debug-Logger, Developer Productivity, Patch-For-Review, Release-Engineering-Team (Logspam): Fix unhelpful/duplicate "in on " in php7-fatal-error.php messages - https://phabricator.wikimedia.org/T275075 (Krinkle)