[00:21:47] serviceops: Roll out proxy gutter pool - https://phabricator.wikimedia.org/T258779 (Krinkle)
[00:22:22] serviceops: Roll out proxy gutter pool - https://phabricator.wikimedia.org/T258779 (Krinkle)
[00:52:17] serviceops, Operations, Performance-Team, Sustainability (Incident Followup): Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (Krinkle) >>! In T240684#6204688, @Joe wrote: > - mcrouter + gutter pool: better consistency (because deletes get...
[01:38:12] rzl: I've documented gutterpool and proxies pool at https://wikitech.wikimedia.org/wiki/Memcached_for_MediaWiki
[01:38:26] (mainly as a way to test my understanding of it)
[04:42:56] serviceops, MediaWiki-General, MediaWiki-Stakeholders-Group, Release-Engineering-Team, and 3 others: Drop PHP 7.2 support in MediaWiki 1.35 - https://phabricator.wikimedia.org/T257879 (Joe) Wikimedia is currently on php 7.2, and I expect the move from 7.2 to 7.3 or 7.4 will only happen after we'v...
[04:44:18] serviceops, MediaWiki-General, MediaWiki-Stakeholders-Group, Release-Engineering-Team, and 3 others: Drop PHP 7.2 support in MediaWiki 1.35 - https://phabricator.wikimedia.org/T257879 (Joe) I want to add that, as far as production code goes, we will need to keep support for 7.2 and 7.3 in paralle...
[05:40:10] hiii
[05:40:12] hiii
[05:40:16] hiiiiiiiiiii
[05:40:23] HELLOOOOO
[05:40:35] OI ANYONE FUCKEN THERE
[05:40:38] OIIIII
[07:01:11] serviceops, Operations, Performance-Team, Sustainability (Incident Followup): Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (elukey) >>! In T240684#6339437, @Krinkle wrote: >>>! In T240684#6204688, @Joe wrote: >> - mcrouter + gutter pool:...
[07:44:23] serviceops, Operations, Performance-Team, Sustainability (Incident Followup): Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (Joe) >! In T240684#6339437, @Krinkle wrote: > > I might be missing something, but that doesnt' seem great. Per @...
[07:46:32] serviceops, Operations: All wtp servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1026.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202007280746_j...
[08:48:24] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (JMeybohm)
[08:52:48] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (JMeybohm) @Dzahn as discussed yesterday I'm going to reimage the eqiad nodes. Would be nice if you could do the codfw ones. We should also reimage the new (still role(insetup)) "par...
[09:03:38] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1026.eqiad.wmnet'] ` and were **ALL** successful.
[09:14:15] jayme: o/ if this has already discussed apologies in advance, in icinga I see a lot of wtp servers with root partition's space used up to 9x%
[09:14:33] (not only 1026)
[09:14:37] <_joe_> elukey: yes, that's why he's reimaging them
[09:14:49] what he said...
[09:14:51] <_joe_> wtp1026 should not be an issue anymore though
[09:15:08] ah you are doing all of them, fun Tuesday jayme :D
[09:15:50] thanks :P
[09:19:27] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1027.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[09:24:51] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1028.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[09:28:41] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1029.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[09:32:00] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1030.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[09:39:14] serviceops, GrowthExperiments-NewcomerTasks, Operations, Product-Infrastructure-Team-Backlog: Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (kostajh) Adding some tags for visibility. I'm going to be away from a computer for the next two weeks but @Tgr...
[10:31:58] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1027.eqiad.wmnet'] ` and were **ALL** successful.
[10:33:41] serviceops, GrowthExperiments-NewcomerTasks, Operations, Product-Infrastructure-Team-Backlog: Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (kostajh)
[10:34:03] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1031.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[10:37:47] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1028.eqiad.wmnet'] ` and were **ALL** successful.
[10:42:17] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1029.eqiad.wmnet'] ` and were **ALL** successful.
[10:43:06] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1030.eqiad.wmnet'] ` and were **ALL** successful.
[10:47:51] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1032.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[10:51:04] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1033.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[11:33:56] serviceops, Graphoid, Operations, MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), Platform Engineering (Icebox): Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (Jseddon)
[11:44:35] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1031.eqiad.wmnet'] ` and were **ALL** successful.
[11:55:34] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1032.eqiad.wmnet'] ` and were **ALL** successful.
[12:00:16] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1033.eqiad.wmnet'] ` and were **ALL** successful.
[12:05:07] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1034.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[12:08:46] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1035.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[12:12:37] if I'm repackaging envoy 1.15.0 with an aim towards putting it in its own component, should I still follow https://wikitech.wikimedia.org/wiki/Envoy#Prepare_a_new_version? wondering if there are issues with our upstream having a more recent version than our official package version just in case
[12:12:59] also can/should I push stuff to review instead of straight to origin?
[12:14:26] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1036.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[12:17:51] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1037.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[13:17:59] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1035.eqiad.wmnet'] ` and were **ALL** successful.
[13:18:31] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1034.eqiad.wmnet'] ` and were **ALL** successful.
[13:25:45] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1036.eqiad.wmnet'] ` and were **ALL** successful.
[13:28:45] <_joe_> hnowlan: so for upstream changes just push them
[13:29:02] <_joe_> before starting to build, we need to make good to our cloud people though
[13:29:15] <_joe_> and clean up a bit of the mess we left the last few times :)
[13:29:56] heh
[13:30:27] I did not get to this by now, sorry
[13:31:21] <_joe_> jayme: uh?
[13:31:44] <_joe_> hnowlan: don't worry, it's just that we need to spin down a couple old vms and spin up a new one
[13:32:31] we said we gonne talk about this today, _joe_ (the cleanup stuff I mean)
[13:33:25] <_joe_> jayme: I can look into it with hnowlan once he's done preparing the code :)
[13:33:39] <_joe_> also, you've done enough gruntwork for today :P
[13:35:39] _joe_: That would be nice. Was just that I offered hnowlan my help on all this - but happy with you taking that
[13:37:22] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1038.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[13:38:40] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1039.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[13:39:19] serviceops, GrowthExperiments-NewcomerTasks, Operations, Product-Infrastructure-Team-Backlog: Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (Joe) My first note here is that we are actively discouraging shelling out from MediaWiki in production for a s...
[13:40:31] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1040.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[13:42:36] serviceops, GrowthExperiments-NewcomerTasks, Operations, Product-Infrastructure-Team-Backlog: Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (kostajh) > I think the ideal model for this is having a very simple service that basically accepts post reques...
[13:46:54] serviceops, Operations: Clean up the /*/mw/ mcrouter routing prefix - https://phabricator.wikimedia.org/T256291 (RLazarus) Open→Invalid
[13:47:43] Krinkle: thank you! looks good
[13:50:52] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1037.eqiad.wmnet'] ` and were **ALL** successful.
[13:55:54] serviceops, Graphoid, Operations, Chinese-Sites, Platform Engineering (Icebox): Undeploy graphoid for phase 2 wiki's - https://phabricator.wikimedia.org/T258463 (Aklapper)
[13:56:08] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1041.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[14:00:58] serviceops, GrowthExperiments-NewcomerTasks, Operations, Product-Infrastructure-Team-Backlog: Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (kostajh)
[14:26:09] _joe_: ready to go I think
[14:26:39] <_joe_> hnowlan: I'm in erb hell, bear with me for a few minutes
[14:26:59] no worries
[14:27:56] <_joe_> hnowlan: so first of all, have you ever used horizon?
[14:28:22] yeah, used it for all of the deployment-prep changeprop stuff
[14:28:48] I'm not in the package building project on Horizon though
[14:28:55] <_joe_> ok good
[14:29:00] <_joe_> let me add you there
[14:30:28] <_joe_> you're hnowlan on wikitech too?
[14:31:13] <_joe_> ok, you should have access now
[14:31:44] <_joe_> so, we have two envoy build machines: builder-envoy (old, stretch)
[14:31:58] <_joe_> and builder-envoy-02 (new, buster)
[14:32:43] <_joe_> I think we could even get away with just removing the old one, but the new one uses ceph for disk storage, which is costly and we want to be good citizens and not use too many resources
[14:33:00] (also it's slower to build on)
[14:33:28] <_joe_> so our idea was to remove the oldest one, create a new "builder-envoy-03" as buster with m1.xlarge
[14:33:42] <_joe_> once puppet works there, remove the ceph based VM
[14:33:56] <_joe_> did I resume things correctly rzl?
[14:34:03] yep, sounds right to me
[14:37:19] cool, sounds reasonable.
[14:40:04] serviceops, Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service: Deploy push-notifications service to Kubernetes - https://phabricator.wikimedia.org/T256973 (akosiaris)
[14:40:07] <_joe_> as for the puppet config, just copy what -02 has
[14:40:54] definitely nothing on the stretch one that needs saving? if not I'll delete now
[14:41:16] <_joe_> rzl: ?
[14:41:17] 👍
[14:41:26] <_joe_> 👍
[14:42:57] oooph, for a just second the horizon interface refreshed after I clicked delete missing some of the other instances 😰
[14:47:17] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1038.eqiad.wmnet'] ` and were **ALL** successful.
[14:49:13] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1040.eqiad.wmnet'] ` and were **ALL** successful.
[14:55:30] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1042.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[14:58:58] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1043.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[15:01:47] serviceops, Operations, Patch-For-Review, User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (elukey) If we want to play this very safe, we could do the following steps: * step 1 - stop redis on mc1036 and wait a day to see if anything is reported or i...
[15:10:40] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1039.eqiad.wmnet'] ` and were **ALL** successful.
[15:16:17] _joe_: I learned only yesterday that there are "mcrouter proxies", I had assumed we broadcast and shard to other DCs directly. Is there a nutshell you could share about that design choice? Just curious. TLS related?
[15:19:18] <_joe_> yes
[15:19:26] <_joe_> memcached itself can't speak TLS
[15:19:33] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1041.eqiad.wmnet'] ` and were **ALL** successful.
[15:19:44] <_joe_> so the alternatives were: proxy via mcrouter which can
[15:19:53] <_joe_> or add something like socat to memcacheds
[15:20:04] <_joe_> which well, no reason tbh
[15:20:20] <_joe_> it's also a common pattern to stack mcrouters one on top of the other
[15:27:46] So how do we configure the proxies to behave differently eg to avoid an infinite loop based on the seemingly static puppet config for wancache
[15:29:29] <_joe_> what do you mean?
[15:29:44] Ah, we use the remote proxies only not (also) the local one. Nice. So it just works. There ::$site is different so it won't point at the proxies again
[15:29:45] <_joe_> oh I just got what you meant
[15:29:52] <_joe_> yes
[15:30:00] <_joe_> also prefixes get removed
[15:30:18] <_joe_> so you send a request to eqiad to write to /*/mw-wan
[15:30:28] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1044.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[15:30:33] <_joe_> this gets sent to the proxy in codfw with the prefix stripped
[15:30:49] <_joe_> so it gets added the default, local-only prefix
[15:31:00] <_joe_> /codfw/mw
[15:31:01] Ah it gets stripped even in the proxy case. So the only thing you can address is the default /main route
[15:31:06] <_joe_> so it doesn't get replicated
[15:31:13] Right
[15:31:27] <_joe_> every key gets its prefix removed once it's sent upstream
[15:31:35] <_joe_> the prefix is used only by mcrouter
[15:31:46] Last Q: how does that play with gutter pool. Applies to proxied as well?
[15:32:02] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jayme on cumin1001.eqiad.wmnet for hosts: ` wtp1045.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[15:32:15] <_joe_> let's call it failoverroute
[15:32:30] <_joe_> so the prefix /eqiad/mw-wan has a failoverroute
[15:32:48] <_joe_> the /codfw/mw-wan doesn't right now
[15:33:12] <_joe_> it will be added soon, cc rzl (and e.ffie, but she's on PTO)
[15:33:26] I mean once it arrives at the proxy, there the default/main route utilizes its local gutter as well right?
[15:33:31] <_joe_> yes
[15:33:36] Okay
[15:33:42] <_joe_> but the route to the proxy itself doesn't failover
[15:33:44] <_joe_> and it should
[15:33:53] Right yeah I saw the ticket
[15:33:53] <_joe_> it should failover without modifying the ttl
[15:34:07] <_joe_> it's just a trick given no real sharding happens there
[15:34:24] <_joe_> sorry I gtg afk in a couple minutes
[15:34:42] Right. Those could just RR if desirable/possible. Doesn't matter
[15:34:43] Thanks
[16:04:12] Looks like 1.15.0 built fine on envoy-builder-03
[16:05:01] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1043.eqiad.wmnet'] ` and were **ALL** successful.
[16:07:05] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1042.eqiad.wmnet'] ` and were **ALL** successful.
[16:19:02] _joe_: I've two servers depooled because of mw version missmatches that I don't know the reason for yet. You think it's okay and worth it to leave them depooled for investigation?
[16:21:39] wikitech suggests I could run "scap pull" there to fix that, but I'm unsure if that is everything thats needed
[16:22:41] jayme: which two? might be related to the api/appserver mixup somehow
[16:24:05] serviceops, Operations, Patch-For-Review, User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (nettrom_WMF) >>! In T252391#6337508, @kostajh wrote: > That said, I am pinging @nettrom_WMF and @MMiller_WMF to see if EditorJourney is something we are still...
[16:24:58] rzl: wtp1034 and 1035
[16:25:34] ah, that's not it then
[16:26:04] oh sorry, I should have realized you were talking about T258775
[16:26:22] jayme: if you're re-imaging right in the middle of a scap deploy, that will happen
[16:26:34] i'd just `scap pull` on them
[16:26:54] yeah that's what I was just looking for
[16:27:17] cdanis: was hoping for an answer like this :)
[16:30:42] 👍
[16:38:54] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1045.eqiad.wmnet'] ` and were **ALL** successful.
[16:47:49] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (JMeybohm) All hosts but `wtp104[6-8].eqiad.wmnet` completed. Unfortunately wtp1044.eqiad.wmnet is still waiting for the puppet run after reboot. Checking back on that later.
[17:03:36] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1044.eqiad.wmnet'] ` and were **ALL** successful.
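To make the 15:27 to 15:34 mcrouter exchange concrete, here is a minimal Python sketch of the routing _joe_ describes, under loudly simplified assumptions. The `Pool`/`Router` classes, the pool names and the `/<dc>/<cluster>/<key>` parsing are hypothetical illustrations, not the production mcrouter or puppet config. It shows the three points from the chat: a `/*/mw-wan` write goes to the local pool directly and to each remote proxy with the prefix stripped; the proxy then applies its own default, local-only prefix (`/codfw/mw`), so it never re-broadcasts; and the proxy's local route falls back to its gutter pool, while the hop to the proxy itself has no failover yet. The remote proxy is modeled as that DC's own router; in production it is a separate mcrouter instance reached over TLS.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified model of the mcrouter routing described above.
# Pool names, the "/<dc>/<cluster>/<key>" layout and these classes are
# illustrative assumptions, not the production mcrouter/puppet config.

@dataclass
class Pool:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def set(self, key: str, value: str) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


@dataclass
class Router:
    dc: str
    main: Pool
    gutter: Pool
    # Remote DC name -> the mcrouter proxy in that DC (modeled here as that
    # DC's own Router; in production it is a separate instance reached over TLS).
    proxies: dict = field(default_factory=dict, repr=False)

    def _set_local(self, key: str, value: str) -> None:
        # Failover route: write to the gutter pool if the main pool is down.
        try:
            self.main.set(key, value)
        except ConnectionError:
            self.gutter.set(key, value)

    def set(self, routed_key: str, value: str) -> None:
        if not routed_key.startswith("/"):
            # Unprefixed key: apply this router's default, local-only prefix.
            routed_key = f"/{self.dc}/mw/{routed_key}"
        _, dc, cluster, key = routed_key.split("/", 3)
        if (dc, cluster) == (self.dc, "mw"):
            self._set_local(key, value)
        elif (dc, cluster) == ("*", "mw-wan"):
            self._set_local(key, value)
            for proxy in self.proxies.values():
                # The prefix is stripped before going upstream, so the remote
                # proxy falls back to its local-only default and never
                # re-broadcasts (no replication loop). This hop has no failover.
                proxy.set(key, value)
        else:
            raise ValueError(f"no route for {routed_key}")


eqiad = Router("eqiad", Pool("eqiad-main"), Pool("eqiad-gutter"))
codfw = Router("codfw", Pool("codfw-main"), Pool("codfw-gutter"))
eqiad.proxies["codfw"] = codfw
codfw.proxies["eqiad"] = eqiad

eqiad.set("/*/mw-wan/user:1", "v1")   # written once in each DC
codfw.main.up = False
eqiad.set("/*/mw-wan/user:2", "v2")   # lands in codfw's gutter pool
```

Even though the two routers point at each other as proxies, the broadcast terminates after one hop because the stripped key only matches the receiving side's local-only default route.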
[17:03:59] serviceops, GrowthExperiments-NewcomerTasks, Operations, Product-Infrastructure-Team-Backlog: Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (herron) p:Triage→Medium
[17:42:14] serviceops, Operations, Phabricator, Traffic, and 2 others: Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (mmodell) Stalled→Open
[18:49:39] serviceops, Prod-Kubernetes, Product-Infrastructure-Team-Backlog, Proton, Patch-For-Review: "Failed to fork" errors on kubernetes100[1,3,4] - https://phabricator.wikimedia.org/T257679 (Mholloway) The new Chromium launch flags appear to have completely stopped the service from leaking zombie p...
[18:56:45] serviceops, Prod-Kubernetes, Proton, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): "Failed to fork" errors on kubernetes100[1,3,4] - https://phabricator.wikimedia.org/T257679 (Mholloway) a:Mholloway
[18:58:28] serviceops, Graphoid, Operations, Platform Engineering (Icebox): Undeploy graphoid for phase 1 wiki's - https://phabricator.wikimedia.org/T257402 (Pppery)
[19:54:49] Vote for the Gerrit favicon at https://phabricator.wikimedia.org/V21
[21:13:44] serviceops, Operations, Phabricator, Traffic, and 2 others: Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (Dzahn) @20after4 Current status is now - `aphlict1001.eqiad.wmnet` up and running - phabricator-roots admin gro...
[22:24:17] serviceops, Operations: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (JMeybohm) > Unfortunately wtp1044.eqiad.wmnet is still waiting for the puppet run after reboot. Checking back on that later. wtp1044.eqiad.wmnet complete and repooled.
[23:06:12] serviceops, Operations, Patch-For-Review: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp1046.eqiad.wmmet ` The log can be found in `/var/log...
[23:06:15] serviceops, Operations, Patch-For-Review: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1046.eqiad.wmmet'] ` Of which those **FAILED**: ` ['wtp1046.eqiad.wmmet'] `
[23:07:39] serviceops, Operations, Patch-For-Review: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp1046.eqiad.wmnet ` The log can be found in `/var/log...
[23:15:45] serviceops, Operations, Patch-For-Review: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp1047.eqiad.wmnet ` The log can be found in `/var/log...
[23:52:54] serviceops, Operations, Patch-For-Review: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp1048.eqiad.wmnet ` The log can be found in `/var/log...