[06:02:41] serviceops, MediaWiki-REST-API, Operations, wikidiff2, and 2 others: Deploy version 1.10.0 of wikidiff2 to production - https://phabricator.wikimedia.org/T236963 (Legoktm) @tstarling is the gpg key that you used to sign that release available anywhere? https://www.mediawiki.org/keys/keys.txt stil...
[09:11:56] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw1317.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911270911_jiji_16360_mw1317_...
[09:14:21] !log reimage mw1322.eqiad.wmnet - T239054
[09:14:41] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw1322.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911270914_jiji_16885_mw1322_...
[09:26:12] effie, _joe_: following up on monday's meeting, what's missing in the reimage script to help this batch of reimages?
[09:27:48] volans: iirc was to use httpbb in place of apache-fast-test
[09:29:30] <_joe_> that's not a blocker though, right?
[09:29:43] <_joe_> it's nice to have :)
[09:30:01] :)
[09:32:08] effie: are you aware of the parallel and sequential modes for the reimage script?
[09:34:01] I love it when volans is selling his merchandise
[09:34:15] I was not aware, please tell me
[09:34:20] :D:D <3
[09:34:28] I am actually serious
[09:35:35] I read from https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Reimage
[09:36:22] so I run wmf-auto-reimage-host for the first ones, just to make sure we are good to go
[09:38:07] that is to run for a single host
[09:38:12] if you run wmf-auto-reimage
[09:38:31] you can pass multiple hosts, either in parallel (with some soft and hard limit on the number of hosts you can do)
[09:38:40] or in sequence with an optional sleep in between them
[09:40:38] I will update the doc and add an example for wmf-auto-reimage
[09:40:47] thank you !
[09:42:56] * effie will buy again
[09:43:29] * volans has another happy customer :D
[09:43:46] the help message should be descriptive enough, but lmk if it's not
[09:44:19] given you told me that you can run a couple at a time
[09:45:09] I would open two tmux and run there in sequence a bunch. As a protip, start the second one a couple of minutes after the first, so that they are a bit splayed, to avoid contention when they try to run puppet on the Icinga host for the downtime
[09:45:26] with many in parallel you might get some failed downtime
[09:57:14] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (jijiki)
[09:57:32] effie: ^^^ in case you missed my last part
[10:01:33] I actually ran them a few minutes apart as I was compiling the list in the task
[10:02:23] is 3 at the same time ok you believe ?
[10:02:35] I was thinking of batches of one of each
[10:04:52] * volans fails to parse
[10:21:22] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1317.eqiad.wmnet'] ` and were **ALL** successful.
[10:24:20] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1322.eqiad.wmnet'] ` and were **ALL** successful.
[10:38:52] volans: I meant one of each cluster (api, app, job)
[10:40:11] ah sure, that makes sense
[10:45:48] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw1347.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271045_jiji_35912_mw1347_...
[10:47:48] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw1337.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271047_jiji_36247_mw1337_...
[10:50:10] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw1327.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271049_jiji_36621_mw1327_...
[11:17:46] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by filippo on cumin1001.eqiad.wmnet for hosts: ` mw2290.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271117_filippo_42756_m...
[11:21:31] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw2289.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271120_jiji_43318_mw2289_...
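The batching idea discussed above (one host from each of the api, app and job clusters per batch, with launches splayed by a couple of minutes) can be sketched roughly as follows. This is only a planning helper under stated assumptions: the host names and cluster membership are hypothetical placeholders, the 2-minute splay is an illustrative value, and the sketch deliberately does not call wmf-auto-reimage, whose options are not shown in this log.

```python
from itertools import zip_longest


def one_per_cluster(clusters):
    """Yield batches that hold at most one host from each cluster."""
    for group in zip_longest(*clusters.values()):
        # zip_longest pads exhausted clusters with None; drop the padding.
        yield [host for host in group if host is not None]


def splayed_starts(batch, splay_minutes=2):
    """Pair each host with a start offset (in minutes) so launches do not coincide."""
    return [(index * splay_minutes, host) for index, host in enumerate(batch)]


# Hypothetical cluster membership, for illustration only.
clusters = {
    "api": ["api-a", "api-b"],
    "app": ["app-a", "app-b", "app-c"],
    "job": ["job-a"],
}

batches = list(one_per_cluster(clusters))
for batch in batches:
    print(splayed_starts(batch))
```

Each printed list is one batch; the first element of every pair is how many minutes after the start of the batch that host's reimage would be launched.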
[11:31:23] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2290.codfw.wmnet'] ` Of which those **FAILED**: ` ['mw2290.codfw.wmnet'] `
[11:50:47] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1337.eqiad.wmnet'] ` and were **ALL** successful.
[11:51:42] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1347.eqiad.wmnet'] ` and were **ALL** successful.
[12:00:32] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1327.eqiad.wmnet'] ` and were **ALL** successful.
[12:05:13] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2289.codfw.wmnet'] ` Of which those **FAILED**: ` ['mw2289.codfw.wmnet'] `
[12:05:51] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw2289.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271205_jiji_55597_mw2289_...
[12:05:54] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2289.codfw.wmnet'] ` Of which those **FAILED**: ` ['mw2289.codfw.wmnet'] `
[12:06:18] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw2289.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271206_jiji_55698_mw2289_...
[12:07:11] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by filippo on cumin1001.eqiad.wmnet for hosts: ` mw2290.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271207_filippo_55841_m...
[12:07:17] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2290.codfw.wmnet'] ` Of which those **FAILED**: ` ['mw2290.codfw.wmnet'] `
[12:07:30] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by filippo on cumin1001.eqiad.wmnet for hosts: ` mw2290.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271207_filippo_55893_m...
[12:07:33] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2290.codfw.wmnet'] ` Of which those **FAILED**: ` ['mw2290.codfw.wmnet'] `
[12:07:44] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by filippo on cumin1001.eqiad.wmnet for hosts: ` mw2290.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271207_filippo_55940_m...
[12:47:45] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2289.codfw.wmnet'] ` and were **ALL** successful.
[12:47:53] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2290.codfw.wmnet'] ` and were **ALL** successful.
[13:16:19] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw1348.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271315_jiji_70711_mw1348_...
[13:17:08] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw1338.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271316_jiji_70851_mw1338_...
[13:17:32] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw1328.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271317_jiji_70890_mw1328_...
[13:24:56] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['mw2288.codfw.wmnet', 'mw2287.codfw.wmnet', 'mw2286.codfw.wmnet'] ` The log can be found in `/var/log/...
[13:26:07] serviceops, Operations: Reimage mwdebug1002 and mw1317 - https://phabricator.wikimedia.org/T236806 (jijiki) Open→Resolved
[14:07:04] serviceops, Operations, Traffic: Investigate the remaining usage of X-Real-IP - https://phabricator.wikimedia.org/T239340 (akosiaris)
[14:07:10] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2288.codfw.wmnet', 'mw2287.codfw.wmnet', 'mw2286.codfw.wmnet'] ` and were **ALL** successful.
[14:07:14] serviceops, Operations, Traffic: Investigate the remaining usage of X-Real-IP - https://phabricator.wikimedia.org/T239340 (akosiaris) p:Triage→Low
[14:16:27] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['mw2285.codfw.wmnet', 'mw2284.codfw.wmnet', 'mw2283.codfw.wmnet'] ` The log can be found in `/var/log/...
[14:20:53] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1338.eqiad.wmnet'] ` and were **ALL** successful.
[14:28:12] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1328.eqiad.wmnet'] ` and were **ALL** successful.
[14:28:40] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1348.eqiad.wmnet'] ` and were **ALL** successful.
[14:44:32] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw1346.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271443_jiji_96389_mw1346_...
[14:58:57] <_joe_> ottomata: howdy
[14:59:53] hello!
[15:00:01] ready over here :)
[15:00:12] <_joe_> sooo I have good news for you
[15:00:17] <_joe_> I have an LVS change too
[15:00:17] oh?
[15:00:21] wonderful
[15:00:21] <_joe_> we can run them together
[15:00:36] what's yours?
[15:00:42] <_joe_> what was your change again?
[15:00:53] patches are listed here
[15:00:54] https://phabricator.wikimedia.org/T236386
[15:00:55] <_joe_> https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/553294/ is mine - blubberoid already exists
[15:01:05] first puppet LVS
[15:01:08] then the discovery stuff
[15:01:28] then maybe if we are feeling frisky we could do the public routing but let's see how the other stuff goes first
[15:01:37] the Kafka TLS stuff i'll do later, not a blocker for this
[15:01:43] <_joe_> ok good
[15:02:00] <_joe_> so, first things first, do you want to share a root tmux on cumin1001?
[15:02:03] k
[15:02:24] <_joe_> I have a session open
[15:02:53] k i usually use screen so am doing tmux --help
[15:02:53] ...
[15:03:02] <_joe_> sudo -i tmux att
[15:03:07] danke
[15:03:25] cool
[15:03:49] haha, already root, no sudo?
[15:03:56] oh really user
[15:03:56] ha
[15:06:15] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw1336.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271505_jiji_103277_mw1336...
[15:06:28] <_joe_> ottomata: now I'm going to merge my patch and your patch
[15:06:33] k
[15:06:34] <_joe_> on puppetmaster1001
[15:06:42] <_joe_> but you don't really need to see how that's done
[15:06:47] :)
[15:07:59] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mw1326.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911271507_jiji_103657_mw1326...
[15:10:31] <_joe_> ottomata: back to our tmux session
[15:10:34] k
[15:11:46] <_joe_> ottomata: you can see your ip being added
[15:11:56] 10.2.2.50 ya
[15:13:02] <_joe_> we now have to ensure puppet runs on all the workers
[15:13:06] aye
[15:13:16] <_joe_> (I created a new tab in tmux)
[15:13:22] ah
[15:13:27] ha, thought i did something
[15:13:28] ok
[15:14:20] <_joe_> uhm
[15:14:57] hm
[15:15:07] did we run puppet on the eqiad ones?
[15:15:15] i guess kubernetes-worker should have done that too
[15:15:39] evengate
[15:15:39] !
[15:15:42] <_joe_> ottomata: "evengate"
[15:15:44] <_joe_> even
[15:15:45] :(
[15:15:46] <_joe_> :P
[15:15:48] ok..
[15:15:51] <_joe_> ok, let's fix it
[15:15:56] <_joe_> can you prepare the patch?
[15:16:18] should we add "evengate" to typos? 0:-)
[15:16:34] ha yeah
[15:17:00] ya prepping
[15:18:12] https://gerrit.wikimedia.org/r/c/operations/puppet/+/553352
[15:18:40] ok to merge?
[15:19:05] <_joe_> yes :P
[15:19:58] ah cool!
[15:19:59] conftool::cleanup: Removing node with tags eqiad/kubernetes/evengate-logging-external/kubernetes1002.eqiad.wmnet
[15:20:24] so puppet on kube workers doesn't matter here, that just creates the LVS IP
[15:20:32] the confd entries are added by puppet-merge
[15:20:32] ?
[15:20:33] <_joe_> yep
[15:20:37] cool
[15:20:43] <_joe_> back to tmux
[15:20:45] k
[15:21:41] <_joe_> so now everything's set up to be pooled
[15:21:46] <_joe_> we can move to the LVS servers
[15:21:51] k
[15:22:02] <_joe_> let's start with the codfw secondary, I'll run puppet via cumin
[15:23:44] huh dpkg-reconfigure eh?
[15:23:58] <_joe_> ottomata: that's how lvs ips get configured yes
[15:24:12] aye interesting
[15:24:49] <_joe_> for some value of "interesting", yes
[15:25:06] <_joe_> ok now we need to hop on the server to do things correctly
[15:25:16] ok
[15:25:19] <_joe_> first lemme log
[15:25:31] lvs2006?
[15:26:13] <_joe_> or well let's do it from cumin
[15:26:20] either is fine
[15:27:01] wmmnet
[15:27:39] <_joe_> you see we have the session ESTABLISHED at 1.0
[15:27:51] <_joe_> now we will restart pybal
[15:28:35] <_joe_> the sessions are already established again
[15:28:43] k
[15:29:37] <_joe_> ok now try to reach eventgate-logging-external.codfw.svc.wmnet
[15:29:42] <_joe_> and see if it works
[15:29:45] <_joe_> it should!
[15:30:10] serviceops, Page Content Service, Product-Infrastructure-Team-Backlog: Mobileapps flapping on scb2005 since 2019-11-26 0:00 UTC - https://phabricator.wikimedia.org/T239344 (Mholloway)
[15:30:27] ya it does!
[15:30:32] < HTTP/1.1 201 Created
[15:30:34] when posting test event
[15:30:41] <_joe_> kubernetes2003.codfw.wmnet: enabled/partially up/pooled │················
[15:30:43] yeehaw, to https too!
[15:30:43] <_joe_> uhmmm
[15:31:21] h
[15:31:24] hm
[15:31:40] <_joe_> Nov 27 15:31:33 lvs2006 pybal[53992]: [eventgate-logging-external_43192 ProxyFetch] WARN: kubernetes2005.codfw.wmnet (enabled/partially up/pooled): Fetch failed (http://localhost/_info), 0.002 s
[15:31:50] hm
[15:31:51] <_joe_> maybe https? or the url is wrong?
[15:32:04] needs port no?
[15:32:18] <_joe_> uhm not sure
[15:32:23] trying
[15:33:17] _info responds on both http and https ports
[15:33:29] <_joe_> but we're connecting to the https one
[15:33:30] but i can't use the https port directly with kubernetes2005 due to SAN
[15:33:46] <_joe_> that doesn't matter, this is pybal's proxyfetch url
[15:34:03] <_joe_> I'm pretty sure the problem is http vs https
[15:34:30] hm, but why would it work on 3 of them?
[15:34:34] <_joe_> also, why are we using ProxyFetch for kubernetes
[15:34:35] <_joe_> that's wrong
[15:34:38] oh
[15:34:40] they are all partially up
[15:34:41] ok
[15:34:44] <_joe_> just use idleconnection
[15:34:54] i don't know the difference?
[15:35:09] <_joe_> ProxyFetch tests that the service is healthy and responds to a url
[15:35:21] <_joe_> IdleConnection just that the port is open and accepts connections
[15:35:32] <_joe_> k8s has its own health checking logic
[15:35:41] <_joe_> we shouldn't add a second stratum above
[15:36:02] meaning we don't need pybal to check the url?
[15:36:06] <_joe_> yeah
[15:36:09] <_joe_> but if you want to
[15:36:12] <_joe_> add https://
[15:36:16] <_joe_> no need for the port
[15:36:17] https://gerrit.wikimedia.org/r/c/operations/puppet/+/550922/5/hieradata/common/lvs/configuration.yaml
[15:36:29] ok, i see, the url there needs https in it
[15:36:35] <_joe_> yeah
[15:36:37] k
[15:36:38] <_joe_> just change that
[15:36:40] k
[15:36:43] serviceops, Page Content Service, Product-Infrastructure-Team-Backlog: Mobileapps flapping on scb2005 since 2019-11-26 0:00 UTC - https://phabricator.wikimedia.org/T239344 (Mholloway) Looks like the timeouts are occurring on requests to `https://api-rw.discovery.wmnet/w/api.php` — note the `https://`.
[15:37:42] _joe_: this is the icinga check command... pybal uses that?
[15:37:54] <_joe_> not the icinga check command
[15:38:06] <_joe_> under monitors:
[15:38:11] oho oho
[15:38:12] got it
[15:38:17] ProxyFetch right sorry
[15:39:04] huh, _joe_ why doesn't that need the port or the hostname? will that work without localhost being in the SAN (or does localhost somehow not matter?)
[15:39:23] <_joe_> ottomata: because pybal
[15:39:32] haa ok
[15:39:36] <_joe_> ask mark :P
[15:39:56] <_joe_> but if you think hard about it, it's just a convention
[15:40:09] <_joe_> the request will be performed to the backend, and will use the scheme you offer
[15:40:41] hm oook...
[15:40:48] https://gerrit.wikimedia.org/r/c/operations/puppet/+/553355
[15:43:58] why were they 'partially up' instead of just down?
[15:44:08] i guess the configured port responded?
[15:44:14] but the extra ProxyFetch check failed?
[15:44:37] <_joe_> ottomata: yep
[15:44:39] <_joe_> btw
[15:44:45] <_joe_> it works now
[15:44:49] i see! :)
[15:45:07] ask what? :)
[15:45:18] hahh o/
[15:45:32] <_joe_> you can do the same, server by server, for the other lvs servers involved
[15:46:00] was just wondering how pybal ProxyFetch works with e.g. https://localhost/_info without port info, and without 'localhost' being in the https certificate's SAN
[15:46:03] ok
[15:46:24] so _joe_ you are merging the lvs changes on all the codfw lvs servers
[15:46:26] <_joe_> ottomata: I'm doing it today because there is also an endpoint of mine
[15:46:39] <_joe_> running puppet on lvs, yes
[15:46:40] so that's just the hostname and URI it uses for the request, right
[15:46:48] the port is the lvs service port
[15:46:56] and the ip is the realserver ip
[15:47:08] ok _joe_ ah, thought you wanted me to do the restarts?
[15:47:20] <_joe_> nope
[15:47:23] k
[15:47:47] why would you use localhost if it's not in the SAN?
[15:47:53] * mark is not sure he totally gets the question :)
[15:49:01] mark: where is the proxyfetch check actually run from? i assume the lvs server checking to see if the backend responds at that URL
[15:49:02] ?
[15:49:13] yes, from the lvs host to the realserver ip
[15:49:29] does it replace e.g. localhost with the realserver ip and port?
[15:49:35] no
[15:49:48] that url does not determine what ip and port it contacts at all
[15:49:54] it is just what it sends in the HTTP request
[15:50:01] the URI and Host: values
[15:50:34] so you can put any hostname in there that the server accepts
[15:50:40] and uri
[15:51:41] isn't the http request from the lvs server tho?
[15:52:05] does pybal just parse the ProxyFetch url and uses the bits it needs, e.g. the path part?
[15:52:22] OH, 'ProxyFetch'
[15:52:45] is the actual http request somehow happening from the backend itself?
[15:52:58] no
[15:53:04] the port would still be needed
[15:53:09] no
[15:53:14] the port is the same port as the lvs service is on
[15:53:22] pybal connects to the realserver ip, and the lvs service port
[15:53:33] and then sends the URI in that url, and a Host: header with the hostname from that url
[15:53:48] the url does not influence what tcp connection pybal makes, at all
[15:54:27] ahhh ok, so pybal parses the url and just uses bits of it
[15:54:52] 'Host: ' realserverip:port/
[15:54:53] yes, the hostname and the uri
[15:55:01] ok got it
[15:55:27] ty
[15:55:32] _joe_: how we doing?
[15:55:41] <_joe_> all ok
[15:55:42] and the scheme also I think
[15:55:44] <_joe_> almost done
[15:55:45] so it can do https requests
[15:55:50] <_joe_> the scheme is used, yes
[15:55:57] <_joe_> ottomata just learned that :P
[15:56:01] ah yes and ://
[15:56:07] but it would ignore port
[15:56:09] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1346.eqiad.wmnet'] ` and were **ALL** successful.
[15:56:37] so the scheme just determines whether it does an SSL connection or plaintext
[15:58:35] aye
[15:58:42] <_joe_> ottomata: our next step is
[15:58:46] <_joe_> running puppet on icinga
[15:58:55] <_joe_> and hoping the alerts you set up work correctly
[15:59:41] ha
[15:59:55] i can hit that https url from deployment1001
[16:00:05] hm _joe_ did we do eqiad already?
[16:00:25] <_joe_> ottomata: yes
[16:00:45] oh ok
[16:00:50] lemme double check there
[16:01:20] ya looks good _joe_
[16:03:02] so this is actually the reason why the monitor was called ProxyFetch
[16:03:05] cool is showing up in icinga.wm.org
[16:03:25] because it ignores the url for the tcp connection part and 'proxies' it via the realserver
[16:03:25] except
[16:03:31] it doesn't do a proxy style http request
[16:03:33] oh that is just the host
[16:03:39] pending check for lvs
[16:03:55] i should have just called it HTTPFetch
[16:03:58] (then again, https ;)
[16:04:04] or Fetch
[16:04:45] or maybe have the url check parts specified separately? I think the confusing bit is 'localhost' and no port
[16:05:33] maybe proto, path, hostname as separate configs ? or whatever is fine :)
[16:06:20] localhost is weird anyway when you're talking to another host
[16:06:47] ya
[16:07:44] not sure why that is used? pybal doesn't care
[16:08:08] i guess just as a placeholder? easier for copy/paste in hiera config?
[16:08:38] i wonder if without hostname works too
[16:08:50] http:///uripath
[16:08:51] _joe_: the icinga check still says pending, not sure how long it takes there?
[16:08:54] never tested :)
[16:09:08] heh, i guess it depends on pybal's url parser
[16:09:14] and the server of course
[16:13:04] <_joe_> ottomata: it shouldn't stay pending too long
[16:13:16] <_joe_> if it keeps being pending, ask someone for help in #-sre
[16:13:25] <_joe_> I'm a bit clogged right now
[16:13:53] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1336.eqiad.wmnet'] ` and were **ALL** successful.
[16:16:31] _joe_: ok, will we have time to do the discovery patches today?
[16:16:56] <_joe_> ottomata: possibly
[16:17:14] <_joe_> lemme take a break, I've been working for 10 hours with little breaks in the middle
[16:18:16] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2285.codfw.wmnet', 'mw2284.codfw.wmnet', 'mw2283.codfw.wmnet'] ` and were **ALL** successful.
[16:19:47] _joe_: ok, it's ok, i am technically off today anyway :)
[16:19:52] i could probably do this again tomorrow this time
[16:19:54] if that works for you
[16:21:11] <_joe_> ottomata: I can take care of it
[16:21:23] _joe_: oh ok
[16:21:23] ty
[16:21:40] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1326.eqiad.wmnet'] ` and were **ALL** successful.
[16:24:52] looks like the icinga checks are coming in fine
[16:29:54] serviceops, Analytics-Kanban, Better Use Of Data, Event-Platform, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata)
[16:31:42] ok, i'm going off, thanks for help _joe_ ! will be of rest of the week, we can pick up remaining stuff on monday
[16:31:46] off*
[16:31:47] byeee
[16:32:50] serviceops, Page Content Service, Product-Infrastructure-Team-Backlog, Patch-For-Review: Mobileapps flapping on scb2005 since 2019-11-26 0:00 UTC - https://phabricator.wikimedia.org/T239344 (LGoto) p:Triage→Normal
[16:33:24] serviceops, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Mobileapps flapping on scb2005 since 2019-11-26 0:00 UTC - https://phabricator.wikimedia.org/T239344 (Mholloway)
[16:42:27] serviceops, Core Platform Team, Product-Infrastructure-Team-Backlog: PCS internal request rates tripled on 2019-11-19 - https://phabricator.wikimedia.org/T238832 (LGoto) p:Triage→Normal
[16:55:54] serviceops, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Mobileapps flapping on scb2005 since 2019-11-26 0:00 UTC - https://phabricator.wikimedia.org/T239344 (Mholloway) a:Mholloway
[16:59:09] serviceops, Operations, Kubernetes, Release Pipeline (Blubber): Move blubberoid to use TLS only. - https://phabricator.wikimedia.org/T236017 (Joe) p:Triage→Normal
[17:02:32] serviceops, Operations, Kubernetes, Patch-For-Review, Release Pipeline (Blubber): Move blubberoid to use TLS only. - https://phabricator.wikimedia.org/T236017 (Joe) Once the patch I created is merged, we will be able to remove the HTTP endpoint as soon as we're varnish-be-free.
[17:32:22] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (jijiki)
[19:08:37] serviceops, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team), Language-Team (Language-2019-October-December), and 4 others: Preemptive refresh in getMultiWithSetCallback() and getMultiWithUnionSetCallback() pollutes cache - https://phabricator.wikimedia.org/T235188 (CCicalese_WM...
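The ProxyFetch behaviour mark described around 15:49 to 15:56 (the TCP connection goes to the realserver IP on the LVS service port, while the configured url only supplies the scheme, the Host: header and the request URI) can be sketched as below. This models only the description given in the chat and is not pybal's actual code; the function name, the returned dictionary, the realserver address 192.0.2.10 and the service port 8443 are all hypothetical.

```python
from urllib.parse import urlsplit


def plan_fetch(monitor_url, realserver_ip, service_port):
    """Describe the check request as explained in the chat (not pybal's real code)."""
    parts = urlsplit(monitor_url)
    return {
        # The url never decides where the TCP connection goes.
        "connect_to": (realserver_ip, service_port),
        # The scheme only selects TLS versus plaintext.
        "tls": parts.scheme == "https",
        # The hostname is sent as the Host: header; any port in the url is ignored.
        "host_header": parts.hostname,
        # The path is the URI that gets requested.
        "request_path": parts.path or "/",
    }


# Hypothetical realserver address and LVS service port, for illustration only.
plan = plan_fetch("https://localhost/_info", "192.0.2.10", 8443)
print(plan)
```

Under this model, switching the configured url from http://localhost/_info to https://localhost/_info changes only the "tls" field, which matches the fix that was merged during the session.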
[19:09:48] serviceops, Product-Infrastructure-Team-Backlog, Core Platform Team Workboards (Clinic Duty Team): PCS internal request rates tripled on 2019-11-19 - https://phabricator.wikimedia.org/T238832 (CCicalese_WMF)
[20:08:47] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` mw2215.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911272008_dzahn_166465_mw22...
[20:20:46] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` mw2224.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911272020_dzahn_168724_mw22...
[20:21:59] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (Dzahn)
[21:19:06] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2215.codfw.wmnet'] ` and were **ALL** successful.
[21:26:40] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2224.codfw.wmnet'] ` and were **ALL** successful.
[21:32:58] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` mw2225.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911272132_dzahn_183058_mw22...
[21:57:43] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` mw2217.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911272155_dzahn_187651_mw22...
[22:06:21] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (Dzahn)
[22:10:13] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` mw2226.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911272209_dzahn_191211_mw22...
[22:15:52] serviceops, Operations, HHVM, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), Performance-Team (Radar): Remove HHVM from production - https://phabricator.wikimedia.org/T229792 (awight)
[22:43:22] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2225.codfw.wmnet'] ` and were **ALL** successful.
[22:47:26] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` mw2243.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911272246_dzahn_198389_mw22...
[23:05:58] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` mw2244.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911272305_dzahn_202800_mw22...
[23:07:31] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2217.codfw.wmnet'] ` and were **ALL** successful.
[23:13:02] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (Dzahn)
[23:15:54] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` mw2218.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/201911272314_dzahn_205154_mw22...
[23:20:59] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2226.codfw.wmnet'] ` and were **ALL** successful.
[23:53:24] serviceops, Operations: Reimage all mediawiki servers - https://phabricator.wikimedia.org/T239054 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2243.codfw.wmnet'] ` and were **ALL** successful.