[09:37:55] serviceops, Operations: wtp1025's root partition full - https://phabricator.wikimedia.org/T258775 (Joe) p:Triage→High I want to check the rest of the wtp servers as well. We'll need to schedule reimaging it ASAP.
[09:40:57] serviceops, Operations, Patch-For-Review, User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (elukey) >>! In T252391#6328325, @aaron wrote: > Given the libketama-style consistent hashing in twemproxy and that, AFAIK, CentralAuth sessions can regenerate...
[09:43:53] serviceops, Operations: All wtp servers in eqiad have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (Joe)
[09:45:09] serviceops, Operations: All wtp servers in eqiad have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (Joe) I think this is a good chance for the newcomers to do some practice with the process of reinstalling a server in our infra - @JMeybohm and @hnowlan I'm thinking of you, but also...
[09:46:52] serviceops, Operations: All wtp servers in eqiad have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (Joe) The problem is also present in codfw, btw: ` $ sudo cumin 'wtp*' 'lvdisplay | fgrep -A 10 _placeholder | fgrep Size' 43 hosts will be targeted: wtp[2001-2004,2006-2020].codfw.w...
[09:54:15] serviceops, Operations: All wtp servers in eqiad have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (hnowlan) I've already done a few reimagings so I'm happy to let someone else have the opportunity :)
[09:59:06] serviceops, Operations: All wtp servers in eqiad have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (JMeybohm) a:JMeybohm Happy to pick this up
[10:10:20] <_joe_> jayme: I thought it would be useful for both you and hnowlan to try to reimage a few servers
[10:10:42] <_joe_> it's an automated process, in theory, but we can go through what needs to be changed together
[10:10:53] <_joe_> (possibly, it's just the partman recipe used)
[10:12:08] I've reimaged a few restbase servers already, with wmf-auto-reimage. If this is a different process I'd be interested though
[10:43:39] <_joe_> no it's the same indeed
[11:03:19] I know we discussed this a bit before, but I'm not sure which route to pursue on envoy 1.15.0. We mentioned creating a package with a specific name like envoyproxy115, publishing a regular envoyproxy package to a dedicated component or just using the official upstream image for the time being. Any thoughts? The last one is obviously the path of least resistance but I'm not sure how tolerated
[11:03:25] that is
[11:07:01] I was on the alternative package name front in this. But a dedicated component in the apt repo might be fine as well I guess. What I don't remember is: Do you actually need a package or just a docker image?
[11:10:27] Just a docker image in theory
[11:26:53] Hm...then you might even be able to skip the step of building a debian package entirely I guess
[11:41:10] Makes sense. I guess it'd be a matter of doing another docker-pkg container in production-images that does the build from source
[12:15:47] Right. And maybe it's a mess to do it (or better: easier when being able to reuse the "build envoy" work of jo.e and rz.l)
[13:11:31] serviceops, Operations, Patch-For-Review, User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (Ottomata) First I've heard that WikimediaEvents uses memcached to track funnels. From https://meta.wikimedia.org/wiki/Schema_talk:EditorJourney, it looks like...
[13:24:15] serviceops, Operations, Patch-For-Review, User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (Joe) Data in redis got evicted at furious rates for years before sessionstore, if we remove one server from the ring i don't expect any real issue.
[13:26:38] serviceops, Operations, Patch-For-Review, User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (Joe) >>! In T252391#6248986, @RLazarus wrote: > Side note: This question is also interesting from a DC switchover perspective (T243316) since that will also ef...
[13:30:45] _joe_ should we start thinking about merging my patch then? --^
[13:31:10] <_joe_> elukey: yes :)
[13:32:13] * elukey dances
[13:34:15] serviceops, Operations, Patch-For-Review, User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (kostajh) >>! In T252391#6337452, @Ottomata wrote: > First I've heard that WikimediaEvents uses memcached to track funnels. From https://meta.wikimedia.org/wik...
[13:45:48] serviceops, Operations, Patch-For-Review: All wtp servers in eqiad have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1025.eqiad.wmnet ` The log can be found in `/var/log/...
[15:18:18] serviceops, Operations: All wtp servers in eqiad have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1025.eqiad.wmnet'] ` and were **ALL** successful.
[15:50:24] serviceops, Operations: All wtp servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (JMeybohm)
[19:36:44] serviceops, Operations, Kubernetes: Fix nginx config and caching for docker registry - https://phabricator.wikimedia.org/T256762 (herron) p:Triage→Medium
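
The T252391 discussion in this log leans on consistent hashing: with a libketama-style ring, taking one memcached shard out should only remap the keys that lived on that shard. A minimal Python sketch of that property is below. It is a toy ring written for illustration, not twemproxy's implementation: the shard names, key names, and the points-per-node count are all hypothetical, and the hash layout is not byte-compatible with ketama.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (ketama-like in spirit only)."""

    def __init__(self, nodes, points_per_node=160):
        # Each node is placed on the ring at many pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}-{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # First 4 bytes of MD5 as an unsigned 32-bit ring position.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big")

    def node_for(self, key):
        # Walk clockwise to the first point past the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


# Hypothetical shard and key names, for illustration only.
shards = [f"shard{n:02d}" for n in range(1, 19)]
removed = "shard07"
keys = [f"session:{i}" for i in range(10000)]

full = HashRing(shards)
reduced = HashRing([s for s in shards if s != removed])

before = {k: full.node_for(k) for k in keys}
after = {k: reduced.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

# Only keys that were on the removed shard change owner.
only_removed_shard_moved = all(before[k] == removed for k in moved)
print(f"{len(moved)} of {len(keys)} keys remapped; "
      f"all from {removed}: {only_removed_shard_moved}")
```

Keys owned by the remaining shards keep their owner because their ring points are untouched; only the removed shard's share is redistributed, and those entries are simply cache misses until they are regenerated.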