[07:01:59] serviceops, SRE, Performance-Team (Radar), Release-Engineering-Team (Deployment services), and 2 others: Investigate possible performance degradation on mediawiki servers after Debian Buster upgrade - https://phabricator.wikimedia.org/T273312 (Joe) @Legoktm ran some tests, analogous to the one we...
[07:06:50] serviceops, SRE, Performance-Team (Radar), Release-Engineering-Team (Deployment services), and 2 others: Investigate possible performance degradation on mediawiki servers after Debian Buster upgrade - https://phabricator.wikimedia.org/T273312 (Legoktm) >>! In T273312#6795003, @Joe wrote: > * The...
[07:34:10] serviceops, SRE, Performance-Team (Radar), Release-Engineering-Team (Deployment services), and 2 others: Investigate possible performance degradation on mediawiki servers after Debian Buster upgrade - https://phabricator.wikimedia.org/T273312 (MoritzMuehlenhoff) All architectural mitigations are a...
[07:39:41] serviceops, SRE, Performance-Team (Radar), Release-Engineering-Team (Deployment services), and 2 others: Investigate possible performance degradation on mediawiki servers after Debian Buster upgrade - https://phabricator.wikimedia.org/T273312 (MoritzMuehlenhoff) In fact, we still have an API serv...
[08:30:22] serviceops, docker-pkg: Add a verify step to docker-pkg - https://phabricator.wikimedia.org/T273427 (Joe) I'm currently implementing this step, and for now my implementation idea goes as follows: - By default, we'll verify that a `test` executable is present in the image directory, and run it with the i...
[08:44:02] serviceops, docker-pkg: Add a verify step to docker-pkg - https://phabricator.wikimedia.org/T273427 (JMeybohm) While I like the flexibility it brings the downside of potentially having to duplicate test code for multiple images. We will probably have some repeated tasks in the tests, right? I'm especiall...
[08:54:42] serviceops, docker-pkg: Add a verify step to docker-pkg - https://phabricator.wikimedia.org/T273427 (Joe) >>! In T273427#6795227, @JMeybohm wrote: > While I like the flexibility it brings the downside of potentially having to duplicate test code for multiple images. > We will probably have some repeated...
[08:56:50] serviceops, docker-pkg: Add a verify step to docker-pkg - https://phabricator.wikimedia.org/T273427 (Joe) So to make an example, right now `production-images` has some tests written in bash (yes, I know!). We could think of adding to the root of that repository a relatively simple `common-tests.sh` file...
[08:58:53] serviceops, docker-pkg: Add a verify step to docker-pkg - https://phabricator.wikimedia.org/T273427 (JMeybohm) Yeah, sure. I just wanted to make the point that we should (be able to) provide some basic boilerplate for testing and probably have some generic tests implemented that can easily be used (or wi...
[08:59:07] serviceops, docker-pkg: Add a verify step to docker-pkg - https://phabricator.wikimedia.org/T273427 (JMeybohm) >>! In T273427#6795277, @Joe wrote: > So to make an example, right now `production-images` has some tests written in bash (yes, I know!). > > We could think of adding to the root of that reposi...
[09:03:04] serviceops, Prod-Kubernetes, Kubernetes, Release-Engineering-Team (Pipeline): Helm install fails in CI namespace: apparmor failed to apply profile - https://phabricator.wikimedia.org/T273563 (JMeybohm) a:JMeybohm Most likely my fault as I installed the apparmor package as part of T228967. Wil...
[09:10:55] serviceops, Prod-Kubernetes, Kubernetes, Release-Engineering-Team (Pipeline): Helm install fails in CI namespace: apparmor failed to apply profile - https://phabricator.wikimedia.org/T273563 (Joe) p:Triage→High
[09:27:06] serviceops, SRE, Performance-Team (Radar), Release-Engineering-Team (Deployment services), and 2 others: Investigate possible performance degradation on mediawiki servers after Debian Buster upgrade - https://phabricator.wikimedia.org/T273312 (Legoktm) >>! In T273312#6795045, @MoritzMuehlenhoff w...
[09:40:34] serviceops, SRE, Performance-Team (Radar), Release-Engineering-Team (Deployment services), and 2 others: Investigate possible performance degradation on mediawiki servers after Debian Buster upgrade - https://phabricator.wikimedia.org/T273312 (Joe) This kind-of seals the deal. Upgrading the kerne...
[09:43:40] serviceops, SRE, Performance-Team (Radar), Release-Engineering-Team (Deployment services), and 2 others: Investigate possible performance degradation on mediawiki servers after Debian Buster upgrade - https://phabricator.wikimedia.org/T273312 (Ladsgroup) Maybe mw devs (including yours truly) can...
[09:56:13] serviceops, SRE, Performance-Team (Radar), Release-Engineering-Team (Deployment services), and 2 others: Investigate possible performance degradation on mediawiki servers after Debian Buster upgrade - https://phabricator.wikimedia.org/T273312 (Joe) >>! In T273312#6795425, @Ladsgroup wrote: > Mayb...
[09:57:58] serviceops, SRE: upgrade conf2* servers to stretch - https://phabricator.wikimedia.org/T271573 (jcrespo) > If it is too impacting we could try to figure out a workaround for these nodes :( How bad would it be to disable monitoring of backups (and backups to fail) of these (but keeping the backups of con...
[10:23:17] serviceops, Analytics, Analytics-Kanban, Event-Platform, and 5 others: Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo) - https://phabricator.wikimedia.org/T269160 (elukey) I have followed https://wikitech.wikimedia.org/wiki/LVS#Add_a_new_l...
[11:39:13] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review, Release-Engineering-Team (Pipeline): Helm install fails in CI namespace: apparmor failed to apply profile - https://phabricator.wikimedia.org/T273563 (JMeybohm) Fortunately, this issue had not affected prod clusters. So it must be som...
[11:56:18] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review, Release-Engineering-Team (Pipeline): Helm install fails in CI namespace: apparmor failed to apply profile - https://phabricator.wikimedia.org/T273563 (JMeybohm) Open→Resolved Did a puppet run on the affected nodes and scaled b...
[11:57:13] serviceops, SRE, Performance-Team (Radar), Release-Engineering-Team (Deployment services), and 2 others: Investigate possible performance degradation on mediawiki servers after Debian Buster upgrade - https://phabricator.wikimedia.org/T273312 (Daimona) >>! In T273312#6795006, @Legoktm wrote: >>>!...
[15:59:24] serviceops, docker-pkg: Add a verify step to docker-pkg - https://phabricator.wikimedia.org/T273427 (Joe) p:Triage→Medium a:Joe
[16:32:08] serviceops, docker-pkg, Patch-For-Review: Add a verify step to docker-pkg - https://phabricator.wikimedia.org/T273427 (hashar) I am not sure what kind of problem it is aimed at solving or what the proposed command will solve? For the CI images in `integration/config.git` we have two sets of tests:...
[16:38:10] serviceops, docker-pkg, Patch-For-Review: Add a verify step to docker-pkg - https://phabricator.wikimedia.org/T273427 (Joe) >>! In T273427#6796653, @hashar wrote: > B) for some images, we have plain shell scripts (`dockerfiles/*/example-run.sh`). We run them manually and they simply execute the image...
[20:02:45] serviceops, SRE, Performance-Team (Radar), Release-Engineering-Team (Deployment services), and 2 others: Investigate possible performance degradation on mediawiki servers after Debian Buster upgrade - https://phabricator.wikimedia.org/T273312 (Legoktm) >>! In T273312#6795794, @Daimona wrote: >>>!...
[20:36:49] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[20:37:48] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[21:13:05] serviceops, MW-on-K8s, Release Pipeline, Release-Engineering-Team-TODO: Create restricted docker-registry namespace for security patched images - https://phabricator.wikimedia.org/T273521 (thcipriani)
[21:30:17] serviceops, SRE, Performance-Team (Radar), Release-Engineering-Team (Deployment services), and 2 others: Investigate possible performance degradation on mediawiki servers after Debian Buster upgrade - https://phabricator.wikimedia.org/T273312 (Joe) >>! In T273312#6797450, @Legoktm wrote: > Not su...
[21:54:27] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1335.eqiad.wmnet'] ` an...
[21:55:00] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1336.eqiad.wmnet'] ` an...
[21:57:23] serviceops, MW-on-K8s, SRE, Release-Engineering-Team-TODO: Puppet failing on releases hosts due to missing profile::ci::kubernetes_config::token - https://phabricator.wikimedia.org/T273681 (dduvall)
[21:58:55] serviceops, MW-on-K8s, SRE, Release-Engineering-Team-TODO: Puppet failing on releases hosts due to missing profile::ci::kubernetes_config::token - https://phabricator.wikimedia.org/T273681 (Dzahn) a:Dzahn
[22:00:21] serviceops, MW-on-K8s, SRE, Release-Engineering-Team-TODO: Puppet failing on releases hosts due to missing profile::ci::kubernetes_config::token - https://phabricator.wikimedia.org/T273681 (Dzahn) 21:56 <+icinga-wm> PROBLEM - Widespread puppet agent failures on alert1001 is CRITICAL: 0.01208 ge 0...
[22:08:39] serviceops, MW-on-K8s, SRE, Release-Engineering-Team-TODO: Puppet failing on releases hosts due to missing profile::ci::kubernetes_config::token - https://phabricator.wikimedia.org/T273681 (Dzahn) I fixed the first issue by adding the "kubernetes_config::token" to the releases role in the private...
[22:09:03] serviceops, MW-on-K8s, SRE, Release-Engineering-Team-TODO: Puppet failing on releases hosts due to missing profile::ci::kubernetes_config::token - https://phabricator.wikimedia.org/T273681 (Dzahn) ` Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Could not...
[22:12:09] serviceops, MW-on-K8s, SRE, Release-Engineering-Team-TODO: Puppet failing on releases hosts due to missing profile::ci::kubernetes_config::token, dependency issue in kubeconfig.pp - https://phabricator.wikimedia.org/T273681 (Dzahn)
[23:33:22] serviceops, MW-on-K8s, SRE, Patch-For-Review, Release-Engineering-Team-TODO: Puppet failing on releases hosts due to missing profile::ci::kubernetes_config::token, dependency issue in kubeconfig.pp - https://phabricator.wikimedia.org/T273681 (Dzahn) ^ Mostly fixed the puppet runs on releases*...
[23:35:04] serviceops, MW-on-K8s, SRE, Patch-For-Review, Release-Engineering-Team-TODO: Puppet failing on releases hosts due to missing profile::ci::kubernetes_config::token, dependency issue in kubeconfig.pp - https://phabricator.wikimedia.org/T273681 (Dzahn) a:Dzahn→None At this point it is ge...