[09:16:47] serviceops, dev-images, docker-pkg, Patch-For-Review, and 2 others: docker-pkg: "certificate verify failed: unable to get local issuer certificate" for docker-registry.discovery.wmnet when publishing dev-images from contint2001 - https://phabricator.wikimedia.org/T274306 (JMeybohm) @brennen let m...
[09:25:57] <_joe_> jayme: question: should scaffolding use helm2 or helm3 for now?
[09:26:00] <_joe_> I guess the former
[09:26:24] using for what?
[09:27:49] _joe_: it should produce apiVersion: v1 charts, if that's what you mean
[09:27:59] <_joe_> yes
[09:28:06] <_joe_> that's what I meant
[09:29:03] <_joe_> yeah I wanted to be sure it was still the right version :)
[09:30:45] for services unfortunately yes
[09:42:40] <_joe_> akosiaris: you'll love this
[09:43:00] <_joe_> so you remember our discussion about the series of ifs with no else right?
[09:43:14] <_joe_> I followed your desires and defined the variable inside every if
[09:43:23] <_joe_> but ofc golang scoping of variables holds
[09:43:36] <_joe_> so what is defined inside an if isn't reachable outside of it
[09:43:58] <_joe_> it seems to me that text/template goes out of its way to only go with what golang does when it's painful for the user, tbh
[09:45:32] you could define the variable outside and still add an else for a2x, no? :)
[09:45:50] <_joe_> I refuse to be that repetitive, even if it's for a8s
[09:47:50] a8s?
[09:48:19] <_joe_> a8s = expand(a2x)
[09:48:58] heh
[09:51:58] is there guidance/recommendations on how to generate a password to store in private puppet? can I just use e.g. keepassxc's generator function? or is there a preferred openssl command?
[09:55:09] <_joe_> legoktm: I used to use apg but keepassxc seems fine as well
[10:00:17] ok
[10:07:46] _joe_: could you double-check my commit to /srv/private to make sure I did it right?
[10:08:09] <_joe_> sure
[10:09:03] <_joe_> legoktm: you did change the "regular push password"?
[10:10:08] <_joe_> legoktm: not sure if you needed the password as clear-text or hashes, but apart from that, seems legit
[10:10:48] _joe_: we decided to keep the legacy uploader account as part of the regular-push htpasswd file for now, then we'll switch everything that uses it over to regular-push and then remove "uploader"
[10:11:06] it's supposed to be clear text, puppet will hash it
[10:12:26] and thanks
[10:13:44] serviceops, SRE, User-jijiki: Upgrade memcached to version 1.6.x - https://phabricator.wikimedia.org/T270315 (jijiki)
[10:14:10] <_joe_> legoktm: it would maybe make sense to also have a separate user for release engineering that only uploads to releng/
[10:14:24] <_joe_> but not today :)
[10:14:30] I was going to say :p
[10:14:55] but should be straightforward I think, it's just passing the hiera variables through all the classes mostly
[10:18:16] Feb 12 10:17:40 registry1002 nginx[24457]: nginx: [emerg] invalid condition "$request_method" in /etc/nginx/sites-enabled/registry:105
[10:18:59] <_joe_> ugh
[10:19:27] <_joe_> do you have puppet disabled on the other registries?
[10:19:42] no
[10:20:01] should I?
[10:20:15] <_joe_> sudo cumin 'registry*' 'disable-puppet "emergency --joe"'
[10:20:18] <_joe_> just done
[10:20:29] thanks and sorry
[10:20:40] <_joe_> now is nginx running on 1002?
[10:20:43] I think I need () around the if conditions
[10:21:00] no, it's down
[10:21:18] should I try to fix it or revert first?
[10:21:28] <_joe_> so, let's depool that server
[10:21:31] <_joe_> sudo depool
[10:21:34] <_joe_> on the machine
[10:21:51] <_joe_> and then we can find out what's wrong
[10:21:52] done
[10:22:11] <_joe_> !log it :)
[10:23:21] really should've done this disable puppet/depool beforehand
[10:23:33] <_joe_> lesson learned :)
[10:23:50] <_joe_> the point being, we don't have any good ci for this stuff
[10:25:08] mhm :/
[10:25:11] https://gerrit.wikimedia.org/r/c/operations/puppet/+/663797/ is my proposed fix
[10:25:40] <_joe_> let's try it on registry1002 but it looks like what I was about to suggest
[10:26:16] I grepped for "if (" and "if $" against **nginx* and only found the former.. also it's how https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/ has it
[10:27:48] ah wonderful
[10:27:49] "auth_basic" directive is not allowed here in /etc/nginx/sites-enabled/registry:106
[10:29:35] <_joe_> you probably need to use limit_except in a clever way
[10:29:56] <_joe_> or limit
[10:30:18] <_joe_> if you want to go to bed, we can revert your change and iterate from there
[10:30:24] https://nginx.org/en/docs/http/ngx_http_auth_basic_module.html#auth_basic "http, server, location, limit_except"
[10:30:38] I should've read that more carefully
[10:30:50] <_joe_> ah http server configs are the worst
[10:30:55] <_joe_> they're full of such footguns
[10:31:18] <_joe_> but yeah I usually run a docker container locally to test stuff before I commit a change
[10:31:18] this is uh, my second time doing nginx server config.
I feel like it took me the past 5 years to barely figure out Apache
[10:31:24] yeah
[10:31:38] <_joe_> if you think you've figured out apache, I'm impressed
[10:31:40] if it's okay for me to leave puppet disabled for a bit longer, I'd like to fix this now
[10:31:46] <_joe_> I am still at a loss most of the time
[10:31:47] <_joe_> sure
[10:32:48] for reference, I just figured out the contexts for auth_basic by reading this Russian 2014 post to the nginx-ru mailing list https://forum.nginx.org/read.php?21,253994,253996#msg-253996
[10:33:24] <_joe_> lol
[10:33:58] in https://forum.nginx.org/read.php?21,253994,254003#msg-254003 they suggest using the if to set variables and then passing them to auth_basic outside the if statement
[10:34:49] <_joe_> that could also work, but if it's allowed, two subsequent limit_except could work too
[10:34:56] <_joe_> maybe it's not
[10:35:05] <_joe_> I really have to re-read the docs every time
[10:36:32] the reason I didn't use limit_except was because...
[10:36:34] <_joe_> but yeah the blog post makes me think you could just rewrite to different locations for different methods
[10:37:00] <_joe_> that's probably the best approach in general now that I think of it
[10:37:47] <_joe_> anyways, what the forum suggests might be good too
[10:39:04] to do the "return 418" -> error_page @other -> location @other {} thing?
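(Editor's sketch, not part of the log.) The "return 418" -> error_page -> named location pattern mentioned in the last message could look roughly like the fragment below. This is a minimal illustration under stated assumptions, not the change that was proposed or deployed: the upstream name `registry_backend`, its address, the `@write` location name, the realm string and the htpasswd path are all hypothetical. It also reflects the two errors pasted earlier: `if` conditions need parentheses, and `auth_basic` is only accepted in the http, server, location and limit_except contexts, so it sits in the named location rather than inside the `if`.

```nginx
# Hypothetical names throughout: registry_backend, @write, example.htpasswd.
upstream registry_backend {
    server 127.0.0.1:5000;
}

server {
    listen 80;

    location / {
        # Route the internal 418 to a named location; a named location
        # keeps the original URI and method of the request.
        error_page 418 = @write;

        # The condition must be parenthesised; "if $request_method ..."
        # fails with "invalid condition".
        if ($request_method !~ ^(GET|HEAD)$ ) {
            return 418;
        }

        # Read-only methods are proxied without authentication.
        proxy_pass http://registry_backend;
    }

    location @write {
        # auth_basic is allowed here (location context), not inside "if".
        auth_basic           "registry write access";
        auth_basic_user_file /etc/nginx/example.htpasswd;

        # No URI part on proxy_pass, as required in a named location.
        proxy_pass           http://registry_backend;
    }
}
```

Running `nginx -t` against a fragment like this in a local container, as suggested in the log, would catch both of the errors seen on registry1002 before the change reaches a pooled host.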
[10:39:28] I tried reading https://agentzh.blogspot.com/2011/03/how-nginx-location-if-works.html and now I'm even more lost than before
[10:41:05] I'll try the return / location thing
[10:42:12] <_joe_> there is a problem with that
[10:42:21] <_joe_> I don't know if we'll proxy the url correctly after that
[10:42:43] https://nginx.org/en/docs/http/ngx_http_core_module.html#error_page
[10:43:07] > If there is no need to change URI and method during internal redirection it is possible to pass error processing into a named location:
[10:43:17] <_joe_> right
[10:43:49] <_joe_> named locations won't change the url
[10:44:15] <_joe_> ok, I think doing things that way in general will actually make that nginx site more readable
[10:44:26] <_joe_> but you can leave that for tomorrow and just revert for now
[10:47:08] * legoktm does
[10:50:55] ok, everything should be back to normal now
[10:51:34] _joe_: thanks for walking me through that
[11:03:39] <_joe_> legoktm: no one should be left alone fighting http server configs :)
[11:11:49] :)
[11:12:06] I read through https://www.nginx.com/resources/wiki/start/topics/tutorials/config_pitfalls/ and have a good sense of how to fix this tomorrow and test it locally
[11:12:36] <_joe_> great
[11:12:49] <_joe_> get your well deserved rest now
[11:12:54] good night :)
[11:13:04] <_joe_> good night :)
[12:42:01] serviceops, Graphoid, SRE, MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), and 2 others: Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (akosiaris)
[12:42:15] serviceops, Graphoid, SRE, MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), and 2 others: Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (akosiaris)
[12:47:46] <_joe_> akosiaris: <3
[12:48:06] <_joe_> have you sent the kill order for the scbs to dcops?
[12:48:18] No, not yet
[12:48:28] but I am preparing patches
[12:48:38] <_joe_> can we ask them for a video of them hammering the servers to pieces when it's time?
[12:48:42] ooooohhh?
wow
[12:48:47] yeah that's pay per view stuff right there
[12:49:28] rotfl
[12:49:29] +1
[12:49:52] <_joe_> it's amazingly fitting I was listening to "Still" while thinking of this
[12:50:14] <_joe_> (it's the music you hear in this scene https://www.youtube.com/watch?v=N9wsjroVlu8)
[12:50:33] <_joe_> the audio is definitely NSFW
[12:50:54] <_joe_> but I guess everyone is familiar with that scene
[12:56:28] serviceops, Graphoid, SRE, MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), and 2 others: Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (akosiaris) Patches being uploaded. I've tried to cover everything, but maybe I missed something. In the course of the next week they will be slowly d...
[13:17:38] thanks for the link, I hadn't watched it in a long time
[13:17:40] it was time.
[13:22:11] serviceops, Graphoid, Projects-Cleanup, SRE, and 3 others: Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (hashar) Tagging #cleanup for the repositories archival. I guess we can empty up `mediawiki/service/graphoid.git` with a note pointing back to this task, mark the repository r...
[15:26:25] serviceops, Graphoid, Projects-Cleanup, SRE, and 3 others: Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (DannyS712)
[17:22:09] serviceops, dev-images, docker-pkg, Release-Engineering-Team (Local Dev), User-brennen: docker-pkg: "certificate verify failed: unable to get local issuer certificate" for docker-registry.discovery.wmnet when publishing dev-images from contint2001 - https://phabricator.wikimedia.org/T274306 (b...
[17:27:42] FYI for https://phabricator.wikimedia.org/T272085
[17:28:04] we are going for the latest option with more CPUs
[17:28:26] _joe_ wkandek elukey ^
[17:29:27] <_joe_> effie: ack, do you need me to comment on the task?
[17:29:41] no I think we are covered
[17:30:00] just to FYI what is the final status
[17:30:15] we all agree it is the best option
[17:31:37] <_joe_> good
[17:31:43] <_joe_> I also saw the comment from willy
[17:31:58] <_joe_> not sure we'd be able to decom those servers safely without replacements ready
[17:32:13] <_joe_> but I definitely want to assist them
[17:33:36] serviceops, Performance-Team, SRE, Patch-For-Review, User-jijiki: Enable "/*/mw-with-onhost-tier/" route for MediaWiki where safe - https://phabricator.wikimedia.org/T264604 (jijiki) @aaron now that T252564 has been unblocked, after I finish with T273115, I think we should proceed with movin...
[17:34:20] _joe_: we have the gutter pool
[17:34:32] <_joe_> '
[17:34:51] <_joe_> effie: not sure how's that related to willy's comment
[17:34:53] so we can do the replace one by one slower
[17:34:58] <_joe_> he was talking about mw1269-79
[17:34:59] oh let me read again
[17:36:22] I remembered something differently
[17:40:45] so 4 api and 7 app servers
[17:42:23] yes, at best we can be minus 2 api and minus 2 app servers
[17:42:46] I will make willy an offer
[17:44:51] <_joe_> lol I like the banter
[17:45:44] <_joe_> effie: please sign the counteroffer with https://i.pinimg.com/originals/2d/f4/59/2df4596f7335c735c28d777190a61ba5.jpg
[17:55:49] I am not italian enough, but I can do my best
[18:02:01] <_joe_> neither was Marlon Brando
[19:05:30] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[19:06:22] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[19:51:42] effie: I noticed on mwdebug2002, after reimaging it, memcached is "connect to address 10.192.16.66 and port 11210: Connection refused"
[19:51:54] I'll see if it does that as well on 2001 now
[19:52:35] service / unit is active (running) though
[19:56:56] it is fixed after restarting memcached
[19:57:20] it was specific to mwdebug though, not on regular appservers
[19:57:55] probably just because of the extra reboot they got
[20:08:30] yes, in puppet the memcached service is not restarted on an update to the unit file
[20:08:55] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1353.eqiad.wmnet'] ` an...
[20:09:12] so if there is not an extra service restart or reboot, this is expected then
[20:09:41] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1358.eqiad.wmnet'] ` an...
[20:31:17] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[20:31:25] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[21:15:49] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1357.eqiad.wmnet'] ` an...
[21:29:58] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqia...
[21:31:33] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1356.eqiad.wmnet'] ` an...
[22:15:43] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by legoktm on cumin1001.eq...
[22:51:39] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1348.eqiad.wmnet'] ` an...
[23:02:21] serviceops, WikimediaDebug, Patch-For-Review, Performance-Team (Radar): Convert mwdebug VMs to debian buster - https://phabricator.wikimedia.org/T274023 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mwdebug1002.eqiad.wmnet` - mwdebug1002.eqiad.wmnet...
[23:16:00] serviceops, Continuous-Integration-Config: Add tox CI to operations/software/benchmw - https://phabricator.wikimedia.org/T274686 (Legoktm)
[23:42:44] serviceops, SRE, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, User-jijiki: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1281.eqiad.wmnet', 'mw12...