[00:20:46] <wikibugs>	 10serviceops, 10Operations, 10Parsoid, 10Parsoid-Tests, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['scandium.eqiad.wmnet'] `  and were **ALL** successful.
[02:16:23] <wikibugs>	 10serviceops, 10Operations, 10Parsoid, 10Parsoid-Tests, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10Dzahn) scandium is back up again.  But unfortunately even after the puppet changes above and a second reinstall the...
[08:45:15] <moritzm>	 mw1381 is currently running 4.19, which is causing some issues with mcelog not working, I think this was only needed for some experiments with the kernel mem leak we found a few months ago, ok to revert it back to 4.9 like the rest?
[09:00:31] <jayme>	 IIRC _joe_ wanted to keep it 4.19 for a while. But as we're fine with the workaround and "on track" for buster, I guess it's okay to revert the host back to 4.9
[09:00:56] <_joe_>	 +1
[09:02:11] <_joe_>	 jayme / akosiaris / effie: can I get a couple reviews of https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/634924 and https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/636634/2 ?
[09:03:49] <jayme>	 _joe_: for 634924 there are still a couple of unresolved comments AFAICT
[09:04:06] <_joe_>	 oh damn gerrit
[09:04:28] <jayme>	 unresolved and still valid, I mean :)
[09:06:19] <effie>	 I think we charge double when asked to review and already reviewed patch 
[09:06:25] <_joe_>	 jayme: yeah I just pushed some correction on top of it
[09:06:35] <_joe_>	 and so no comment nor votes appeared
[09:07:45] <moritzm>	 k, will revert mw1381 back to 4.9 later, then
[09:14:16] <jayme>	 thanks!
[09:20:50] <_joe_>	 jayme: I'm about to deploy the mw change to go via envoy to cxserver :P
[09:25:33] <jayme>	 _joe_: cool. Finally cxserver again :D
[09:30:16] <_joe_>	 jayme: I'm just trying to figure out how to test it
[09:30:27] <_joe_>	 given it doesn't seem like there are many errors regarding that
[09:31:45] <effie>	 _joe_:  are you in a hurry with the apache images?
[09:31:57] <_joe_>	 effie: what do you mean?
[09:32:25] <effie>	 I can have a look later
[09:32:38] <effie>	 but not for the next couple of hours
[09:34:06] <_joe_>	 ah ok :) then no, I need to deploy two mediawiki config changes
[09:34:18] <_joe_>	 and then I have a meeting
[09:34:31] <_joe_>	 I guess my morning is mostly gone
[09:35:02] <effie>	 ok cool 
[10:38:28] <wikibugs>	 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Build kubernetes 1.19 - https://phabricator.wikimedia.org/T266766 (10JMeybohm)
[10:40:12] <wikibugs>	 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Upgrade kubernetes clusters to a security supported (LTS) version - https://phabricator.wikimedia.org/T244335 (10JMeybohm)
[10:45:21] <wikibugs>	 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Build kubernetes 1.19 - https://phabricator.wikimedia.org/T266766 (10JMeybohm) p:05Triage→03High
[10:50:24] <jayme>	 _joe_: A mobileapps deployment still kind of stands out :D https://logstash-next.wikimedia.org/goto/73bb32b5d937e268c186e9add1465f0d
[12:51:57] <wikibugs>	 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Upgrade kubernetes clusters to a security supported (LTS) version - https://phabricator.wikimedia.org/T244335 (10JMeybohm)
[13:30:05] <_joe_>	 effie: rzl can I have a brief summary of how we've gotten ourselves into forward-porting redis 2 to debian buster?
[13:30:33] <_joe_>	 I'd also like to know if you've cleared this choice with moritz and john, I would suppose they'd have opinions
[13:33:20] <effie>	 yes sorry, I wanted to first check somethings 
[13:33:29] <effie>	 yes I have spoken to moritz and john
[13:34:13] <effie>	 so, I found some more things that depend on redis
[13:34:17] <effie>	 let me give you the task 
[13:34:28] <_joe_>	 more things meaning other than mediawiki?
[13:34:50] <effie>	 https://phabricator.wikimedia.org/T213089
[13:35:13] <_joe_>	 I'm not reading the whole task again now
[13:35:14] <effie>	 got to Redis Lock Manager issues
[13:35:24] <effie>	 no no just the Redis Lock Manager issues part
[13:35:24] <_joe_>	 that's one of the mediawiki usages, yes
[13:35:55] <effie>	 yeah I meant that is an additional depend on top to the PET depends
[13:36:12] <_joe_>	 you didn't discover something new, and I don't see how that is relevant to "keep redis 2"
[13:36:17] <_joe_>	 if anything, quite the contrary
[13:36:35] <elukey>	 does mediawiki use specific redis 2 functionalities that don't work with redis 5?
[13:36:42] <_joe_>	 so, why are we going with redis 2?
[13:36:59] <effie>	 I wanted to at least use it for the test to reimage one server on buster 
[13:37:11] <effie>	 this is to accomodate this part 
[13:37:26] <effie>	 so I have opened a task for the redis upgrade 
[13:37:40] <_joe_>	 it seems like a waste of time to me
[13:37:44] <effie>	 https://phabricator.wikimedia.org/T265643
[13:38:03] <_joe_>	 if you want to just check memcached, you should remove one redis shard - say mc1020 - from nutcracker
[13:38:13] <_joe_>	 and reimage that server
[13:38:31] <effie>	 yes I know that
[13:38:49] <_joe_>	 ok so... why spend time forward-porting a 10 years old redis version?
[13:39:13] <_joe_>	 anyways, if nothing is decided, that's ok. I'm quite opposed to the idea of using redis 2.8 on buster
[13:39:24] <effie>	 I do not like it more than you do 
[13:39:38] <effie>	 but unless I get an "ok go with redis 5" from stakeholders
[13:39:54] <effie>	 I do not think we are left with another choice other than install 2.8 
[13:40:05] <_joe_>	 NO>
[13:40:11] <effie>	 lol
[13:40:12] <_joe_>	 We already talked about this
[13:40:18] <_joe_>	 you *ARE* the stakeholder
[13:40:31] <_joe_>	 why do you think anyone outside of our team would knwo if it's ok at all
[13:41:07] <_joe_>	 developers never cared particularly, and I'm pretty sure they don't know how to answer that question without investing a significant amount of time
[13:41:20] <effie>	 ^ that I agree
[13:41:33] <effie>	 but what if I go ahead an install redis 5
[13:41:42] <effie>	 and then we find some snowflake that breaks 
[13:41:54] <_joe_>	 you'll do with some care, and if something breaks, we'll consider fixing it
[13:42:00] <_joe_>	 depooling a redis server is pretty easy
[13:42:16] <_joe_>	 also, all uses of redis are by key-value if they go through the mainstash
[13:42:26] <_joe_>	 I fail to imagine how that could be not backwards compatible
[13:42:56] <_joe_>	 the lock manager might be more thorny, but in that case we'll ask aaron to work on making it compatible with redis 5
[13:43:09] <_joe_>	 this can also all tested in deployment-prep first
[13:43:28] <_joe_>	 which would also help fix all the damn mess we have with memcached/redis there
[13:44:06] <effie>	 I agree with evetything, but 
[13:44:16] <effie>	 I do not think this can be solved within a month 
[13:44:27] <effie>	 which is when we want to upgrade the memcached cluster 
[13:44:33] <_joe_>	 seriously?
[13:44:45] <_joe_>	 1 month to spin up 2 buster vms in beta
[13:44:53] <_joe_>	 ask peopel to test the lock manager there
[13:44:57] <_joe_>	 and proceed in prod?
[13:46:05] <_joe_>	 I think it doesn't add much mroe time than forward-porting redis 
[13:46:21] <_joe_>	 (which btw could have some unforeseen issues in itself)
[13:46:45] <_joe_>	 you can also ask for help with parts of the process from other people in the team, me included
[13:47:23] <effie>	 what I am afraid is a) spending time trouble shooting bera 
[13:47:25] <effie>	 beta*
[13:47:56] <effie>	 b) if there is a redis 5 issue, when and how easy can this be solved 
[13:48:19] <effie>	 and how quicly as well 
[13:48:34] <_joe_>	 no one said we can't encounter problems and miss deadlines. I mean if the problem is meeting deadlines, let's forward port memcached too
[13:48:39] <_joe_>	 you're done tomorrow.
[13:48:57] <elukey>	 there is one thing to consider (something that I have brought up 100 times) - we need to reimage one node and let it bake for a bit, memcached perf tuning takes a bit 
[13:48:58] <_joe_>	 well maybe in a week with all the reimages, but you get my point
[13:49:17] <_joe_>	 hence my suggestion to: - reimage one node without redis today
[13:49:24] <effie>	 elukey: that is the goal for the 1 shard, reimaging should start in december
[13:49:26] <_joe_>	 - test redis on beta since we're so nervous
[13:49:36] <_joe_>	 oh.
[13:50:17] <effie>	 this is for one shard in codfw and one shard in eqiad
[13:50:36] <elukey>	 yep but if they are not redis lock managers we can use redis 5, in theory (to avoid the forward port)
[13:50:52] <_joe_>	 yeah
[13:51:03] <_joe_>	 and also eliminate one redis shard
[13:51:08] <_joe_>	 nothing would suffer really
[13:51:20] <_joe_>	 and take our time to reinsert it once beta is done
[13:51:45] <elukey>	 but if we want to keep redis across mc10xx for the foreseeable future even testing redis 5 on a non lock manager is fine (my 2c)
[13:53:47] <effie>	 ok ok so 
[13:54:36] <effie>	 I will remove a redis shard for now
[13:54:59] <effie>	 and we will se how we will proceed
[13:55:39] <effie>	 if we are going to go with redis 5, there might be puppet adjustments needed
[14:07:03] <wikibugs>	 10serviceops, 10Operations, 10ops-codfw, 10User-jijiki: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 - https://phabricator.wikimedia.org/T266577 (10Papaul) This is taking place today and no update yet from service owners.
[14:09:46] <wikibugs>	 10serviceops, 10Operations, 10ops-codfw, 10User-jijiki: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 - https://phabricator.wikimedia.org/T266577 (10Joe) As far as mc2029 is concerned, you can just proceed without any impact.  Not sure about sessionstore2002. @hnowlan @Eevans can you please adv...
[14:10:29] <wikibugs>	 10serviceops, 10Operations, 10ops-codfw, 10User-jijiki: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 - https://phabricator.wikimedia.org/T266577 (10Papaul) @Joe thanks
[14:10:45] <_joe_>	 hnowlan: please take a look at T266577 
[14:10:55] <_joe_>	 papaul needs an answer from us
[14:12:07] <hnowlan>	 ack, will do
[14:19:25] <wikibugs>	 10serviceops, 10Operations, 10ops-codfw, 10User-jijiki: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 - https://phabricator.wikimedia.org/T266577 (10hnowlan) All sessionstore2002 will need is a drain before the host is to be moved - I will be on hand for this.
[14:22:23] <_joe_>	 hnowlan: <3 thanks
[14:49:13] <effie>	 I have downtimed mc2029 
[14:49:21] <effie>	 for the upcoming relocation 
[14:54:40] <_joe_>	 effie, jayme I doubt there will be a syncup today
[14:54:46] <_joe_>	 PET is at their virtual offsite
[14:54:50] <effie>	 oh 
[14:54:54] <effie>	 ok I will decline 
[15:10:57] <wikibugs>	 10serviceops, 10Operations, 10ops-codfw, 10User-jijiki: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 - https://phabricator.wikimedia.org/T266577 (10Papaul) @hnowlan sessionstore2002 has been move and back up online. All yours.  Thanks
[15:12:55] <wikibugs>	 10serviceops, 10Operations, 10ops-codfw, 10User-jijiki: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 - https://phabricator.wikimedia.org/T266577 (10Papaul)
[15:13:33] <wikibugs>	 10serviceops, 10Operations, 10ops-codfw, 10User-jijiki: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 - https://phabricator.wikimedia.org/T266577 (10hnowlan) sessionstore2002 looks good on my end, thanks!
[15:15:21] <wikibugs>	 10serviceops, 10Operations, 10ops-codfw, 10User-jijiki: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 - https://phabricator.wikimedia.org/T266577 (10Papaul)
[15:31:20] <wikibugs>	 10serviceops, 10Operations, 10ops-codfw, 10User-jijiki: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 - https://phabricator.wikimedia.org/T266577 (10Papaul)
[15:32:19] <wikibugs>	 10serviceops, 10Operations, 10ops-codfw, 10User-jijiki: codfw: relocate sessionstore2002 and mc2029 from C4 to C3 - https://phabricator.wikimedia.org/T266577 (10Papaul) 05Open→03Resolved This is complete. Thanks to all
[16:28:48] <wikibugs>	 10serviceops, 10Wikidata, 10Wikidata Query Builder, 10Wikidata Query UI, 10User-Addshore: Host static sites on kubernetes - https://phabricator.wikimedia.org/T264710 (10akosiaris) 05Open→03Invalid >>! In T264710#6586070, @Addshore wrote: > Sounds like a fine solution from our side for now. > I'll let...
[22:26:34] <mutante>	 moving the scap proxy role for A7 eqiad from mw1268 to mw1269, mw1268 needs to move physically to another rack. since it means a puppet diff on all appservers.. merged it just over 30 min before next deploy window
[22:27:06] <mutante>	 some appservers are moving around because they are in the way of ms-be servers with 10G