[00:04:43] cdanis: if these were mostly rest.php req/s I suppose I could hackishly change that back under a hacky req-url condition in wmf-config PHP [00:04:51] but afaik restbase uses api.php mostly right? [00:04:55] so not sure what we can do there. [00:05:06] I've only become more confused [00:05:31] does restbase have alerts for a use case that is publicly reproducible? [00:09:30] https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase is unhelpfully a 404 [00:12:54] https://en.wikipedia.org/api/rest_v1/#/Feed/get_feed_announcements [00:12:58] This seems to work [00:13:02] wikifeeds API [00:13:25] hi, so the Icinga check is locally this: [00:13:32] /usr/local/bin/check-restbase [00:13:59] and https://en.wikipedia.org/api/rest_v1/feed/onthisday/events/04/11 also works fine [00:14:02] running that on restbase2017 does show timeout. it tries to connect to port 721 [00:14:07] 7231 [00:14:21] and regular restbase/page html also fine: https://en.wikipedia.org/api/rest_v1/page/html/Foobar [00:15:45] urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool [00:16:05] but nodejs is running on that port [00:16:23] so that is because it won't work via HTTP anymore and tries HTTPS now? [00:17:27] inside that check-restbase shell script is a http:// [00:18:07] http://10.192.48.119:7231/en.wikipedia.org/v1 [00:26:13] Krinkle: on that page trying to use the "featured" article of the day thing.. "try it out" -> "execute" and it times out [00:26:19] but also that is marked as unstable [00:27:04] the command it gives is: curl -X GET "https://en.wikipedia.org/api/rest_v1/feed/featured/2020/05/04" -H "accept: application/json" [00:30:23] are there any restbase developers or operators on the relevant service alerts? [00:30:45] I imagine it's a one-line fix in its code or configuration somewhere [00:30:59] perhaps file a task for now?
[00:37:50] team-services should get email at services@wikimedia [00:38:01] says icinga exim [00:48:58] PROBLEM - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v2/suggest/sections/{title}/{from}/{to} (Suggest source sections to translate) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX [00:49:34] 10Operations, 10RESTBase: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 (10Dzahn) [00:50:08] PROBLEM - LVS apaches codfw port 80/tcp - Main MediaWiki application server cluster- appservers.svc.eqiad.wmnet IPv4 #page on appservers.svc.codfw.wmnet is CRITICAL: connect to address 10.2.1.1 and port 80: Connection refused https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [00:50:18] 10Operations, 10RESTBase: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 (10Dzahn) Note this endpoint is also marked as "unstable" in the API page [00:50:24] PROBLEM - Check systemd state on mw2236 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:50:25] great [00:50:40] PROBLEM - PyBal IPVS diff check on lvs2010 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.1.1:80]) https://wikitech.wikimedia.org/wiki/PyBal [00:50:44] RECOVERY - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX [00:51:15] 👋 [00:51:18] hi rzl [00:51:33] so i just made a ticket about the restbase part [00:51:36] and now this [00:51:36] PROBLEM - Check the last execution of php7.2-fpm_check_restart on mw2236 is CRITICAL: CRITICAL: Status of the systemd unit php7.2-fpm_check_restart https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [00:51:42] I believe the cause of the page is I80ca62643f5c [00:51:47] yes [00:51:50] PROBLEM - PyBal IPVS diff check on lvs2009 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.1.1:80]) https://wikitech.wikimedia.org/wiki/PyBal [00:52:26] Krinkle: seems it is getting worse and we might have to revert that [00:52:30] I'm not sure why; many appservers are failing the healthchecks that pybal sends (which I suspect is an artifact of it not accepting a 302 as 'successful') [00:52:35] but even in that case, it should fail open? [00:52:44] if enough of them went unhealthy, that is [00:53:06] o/ [00:53:41] mutante: are you saying that the broken unstable restbase endpoint is overloading app servers? [00:54:16] reverting it means breaking real-user logins given the new Chrome version is stable as of today (past-midnight UTC) being rolled out any moment now.
[00:54:47] Krinkle: no, i am just saying that things started breaking since that deploy [00:55:12] if it's just that one unstable restbase endpoint I don't care right now [00:55:15] and then i tried answering your question for a testable use case where restbase times out [00:55:20] and made the ticket for that [00:55:25] and then we all got paged [00:55:36] which alert paged? [00:55:46] is the mw2236 failure related? [00:55:53] * Krinkle is not SRE and does not know anything [00:55:59] PROBLEM - LVS apaches codfw port 80/tcp - Main MediaWiki application server cluster- appservers.svc.eqiad.wmnet [00:56:36] 2236 is probably unrelated, the health check is failing on what looks like all appservers [00:56:40] Krinkle: ones with the magic word # p a g e (which many SREs highlight on) [00:56:48] rzl: I believe I know how to fix it [00:57:01] at least, *that* part of it; I make no promises about restbase. [00:57:03] it... will require a pybal restart [00:57:13] I am about to attempt on the backup codfw pybal [00:57:16] the health check we determined a while ago is not important given apache/80 is not used for anything, and for things that do use it it responds 301 now which is still valid for clients that follow redirects. [00:57:21] if it works there I will puppetize [00:57:44] sounds good, I'm here if I can help [00:57:53] it's 2 AM here. [00:58:20] alerts aside, does it seem like anything beyond wikifeeds/features is having problems?
[00:59:32] Krinkle: from icinga's viewpoint, not much else seems to be alerting [01:01:41] !log ✔️ cdanis@lvs2010.codfw.wmnet ~ 🕘🍺 sudo systemctl restart pybal.service [01:01:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:02:20] RECOVERY - PyBal IPVS diff check on lvs2010 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal [01:02:26] RECOVERY - PyBal backends health check on lvs2010 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [01:02:48] great [01:02:56] I will figure out how to puppetize that change [01:03:10] nice [01:03:14] oh man [01:03:34] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [01:04:03] can someone look at the exceptions ^ and make sure they seem like the normal background noise kind?
[01:04:21] * shdubsh looks [01:04:23] checking now, the timing at least looks wrong to be your change [01:04:26] now that is still looking like background noise before [01:04:33] within the last 12 hours [01:04:55] not looking special on https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&from=now-24h&to=now [01:05:17] The grafana panel has an (i) link to https://logstash.wikimedia.org/app/kibana#/dashboard/mediawiki-errors [01:05:28] nothing special in logstash that I can see [01:05:31] nothing new today [01:05:49] The pre-existing issue "NamespaceInfo::isTalk called with non-integer (string)" had another spike of user-induced traffic [01:05:56] it's being worked on and should be fixed later this week [01:05:58] thanks [01:05:59] (03PS1) 10CDanis: http_status 302 expected after I80ca62643f5c [puppet] - 10https://gerrit.wikimedia.org/r/612449 [01:06:08] just being paranoid rn [01:07:16] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [01:07:22] https://puppet-compiler.wmflabs.org/compiler1001/23850/lvs2010.codfw.wmnet/index.html [01:07:23] this looks correct [01:07:26] cdanis: lgtm, although given its primary use is accepting tier-1 traffic via 443, maybe a test that x-fwd-proto:https: responds 200 OK is important as well.
[01:07:26] it matches what I did by hand [01:07:28] splendid [01:07:46] although I hope there's a separate alert on 443 already that effectively tests that [01:08:12] I'm looking at proxyfetch.py for the first time today, but, I don't believe Pybal supports setting headers [01:08:42] don't forget this is just a health check, not an integration test [01:08:43] I meant icinga [01:08:50] ah for the other alert [01:08:54] yeah I haven't started looking at that yet [01:08:56] was this causing pybal to depool servers? [01:09:01] there were two different failures [01:09:21] does pybal not test on 443 since that's where it directs traffic? [01:09:31] this was also causing pybal to depool servers from `apaches_80`, which is used by ????, but *not* for `appservers-https_443`, which is what all traffic I know of uses [01:09:38] there are 2 separate icinga checks, 80 and 443 [01:09:40] there are two different service IPs [01:09:40] ah, I see [01:09:43] 443 checks are green [01:09:47] there is a separate LVS for port 80 [01:09:49] interesting [01:09:52] legacy [01:09:55] I would not have guessed that has a separate lvs [01:09:57] (03CR) 10RLazarus: [C: 03+1] http_status 302 expected after I80ca62643f5c [puppet] - 10https://gerrit.wikimedia.org/r/612449 (owner: 10CDanis) [01:09:57] maybe still used by restbase???
[01:09:59] dunno [01:10:17] I imagine they both behave the same and both support port 80/443 maybe even depending on how smart lvs is trying to be [01:10:28] (03CR) 10CDanis: [C: 03+2] "pcc matches manual edit that worked on lvs2010 https://puppet-compiler.wmflabs.org/compiler1001/23850/lvs2010.codfw.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/612449 (owner: 10CDanis) [01:10:29] actually probably not knowing what I kow about lvs, it's port based right [01:10:58] the eqiad check is green with a 302 [01:11:07] just the codfw check is not [01:11:19] cdanis: maybe that explains why restbase times out, if it means bypas was exhausting the pool [01:11:30] bypas*pybal [01:12:01] when pybal healthchecks fail on enough servers, it is supposed to start ignoring the healthchecks and considering them healthy [01:12:06] pybal will only depool down to -- ^ [01:12:08] because that saves you from various cascading failure scenarios [01:12:27] okay [01:12:28] you can see that in https://grafana.wikimedia.org/d/000000421/pybal?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-server=All&var-service=apaches_80&from=1594602743325&to=1594689143325 [01:12:30] now to do the scary thing [01:12:34] the number of pooled servers drops but not to zero [01:12:38] right, I don't know if that has been tested lately and/or whether that tipping point is high enough that it will not e.g. 
timeout due to not enough concurrent workers or something [01:13:01] rzl: nice dashboard [01:13:10] I was checking on appserver saturation and it looks fine, but that's the right question [01:13:12] pybal.magic-- [01:13:17] it does look like pybal is doing the right thing though [01:13:18] pybal.understanding++ [01:13:45] rzl: yeah when I looked earlier, I also didn't see any difference in CPU load or network traffic before-vs-after on a handful of appservers that pybal had marked as depooled [01:13:51] which reassured me that nothing was horribly broken [01:13:56] oh interesting, I didn't look that deeply [01:14:10] so it didn't even depool the servers it depooled? *that* surprises me a little [01:14:10] !log ✔️ cdanis@lvs2009.codfw.wmnet ~ 🕘🍺 sudo systemctl restart pybal.service [01:14:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:14:14] RECOVERY - LVS apaches codfw port 80/tcp - Main MediaWiki application server cluster- appservers.svc.eqiad.wmnet IPv4 #page on appservers.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 302 Found - 651 bytes in 0.136 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [01:14:20] RECOVERY - PyBal backends health check on lvs2009 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [01:14:25] but let's talk about this afterward [01:15:02] I'm pleasantly surprised by how quickly that alert resolves btw :) [01:15:16] RECOVERY - PyBal IPVS diff check on lvs2009 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal [01:15:19] I'm used to the "anxiously watch the graph-squigglies move" phase being a lot slower [01:15:48] !log ✔️ cdanis@lvs1016.eqiad.wmnet ~ 🕘🍺 sudo systemctl restart pybal.service [01:15:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:15:52] okay so that's the backup in eqiad [01:16:08] although I believe the same alert wasn't alerting 
there? [01:16:10] cdanis: do we know why codfw paged and eqiad didn't, btw? [01:16:13] no [01:16:13] haha same hat [01:16:23] any state you want to check before we blow it away? [01:17:24] the interesting part is that eqiad check was showing the 302 for sure [01:17:27] and was still ok with it [01:17:35] just not the page [01:17:45] the 'PyBal backends health check' alert is still firing on lvs101 and lvs1015 [01:17:48] lvs1016* [01:17:50] RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [01:17:54] okay [01:17:55] good [01:19:15] okay, it's been longer than the indicated 120 seconds, I am going to also restart the active server [01:20:25] !log ❌cdanis@lvs1015.eqiad.wmnet ~ 🕤🍺 sudo systemctl restart pybal.service [01:20:27] cdanis: Failed to log message to wiki. Somebody should check the error logs. [01:20:43] that's unexpected [01:20:47] no, that's expected [01:20:57] the failover is not-quite-instant *and* it breaks open connections [01:21:05] ohhh got it [01:21:08] RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [01:21:44] cr1-eqiad now shows it's re-peered with lvs1015 [01:21:53] (inserting that in the SAL manually, just for a complete record) [01:21:58] ty [01:22:43] done [01:22:46] okay, so that's the pybal alerts [01:22:50] I still do not know what is up with restbase [01:22:55] nor do I have any idea how important that endpoint is [01:23:13] !log Started long-running Elasticsearch reindex of `eqiad`, `codfw`, and `cloudelastic`. tmux session `reindex` under `ryankemper` on `mwmaint1002` [01:23:14] the endpoint is labelled "unstable" in the API page [01:23:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:23:19] that makes me rate it low.. 
i guess [01:23:31] I know we've been timing out on ~8 rps since I80ca62643f5c was deployed [01:23:36] from envoy telemetry [01:23:54] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [01:24:00] ^ checking [01:24:36] that looks like it was associated with the restart [01:24:47] it's not surprising, a bunch of http calls would time out or get connection reset [01:24:52] yeah agreed [01:24:53] a [01:24:55] spike of Elastica\Exception\ResponseException [01:24:57] a lot of Wikimedia\Assert\InvariantException from line 224 of /srv/mediawiki/php-1.35.0-wmf.40/vendor/wikimedia/assert/src/Assert.php: Invariant failed: Bad UTF-8 at end of string (2 byte sequence) [01:25:03] LVS restarts don't have to be impactful, theoretically, but they are rn [01:25:04] which does sound like broken connections [01:25:18] the : Invariant errors are unrelated, been happening in parsoid for a few weeks [01:25:19] it also looks like it recovered, which is consistent [01:25:28] Elastica\Exception\ResponseException is new [01:25:34] presumably due to something happening here? [01:25:40] amongst pybal's many charms are that it doesn't look at existing ipvsadm state when it starts up; it nukes everything (incl all service IPs) and re-creates state from scratch [01:25:46] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [01:26:03] Krinkle: okay, thanks [01:26:11] Krinkle: Possibly related to the reindex I just kicked off interacting with whatever is going on eqiad?
still catching up on the above stuff [01:26:27] ryankemper: seems unlikely [01:26:31] ack [01:26:58] although it occurs to me I don't know where on grafana to look for elasticsearch rps/latency etc [01:27:01] Yeah, to your point that wouldn't really be generating exceptions at the application level [01:27:32] 10Operations, 10Epic, 10Goal: automatically collect network error reports from users' browsers - https://phabricator.wikimedia.org/T257527 (10Ottomata) > Determine whether or not we want additional stream processing to split apart NEL responses into their component events, as each POST made to the reporting... [01:27:45] cdanis: okay I'm going to bed. [01:28:08] cdanis: perhaps https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1 [01:28:36] I see an nginx request rate (altho not seeing data) and qps by type at least [01:28:36] Krinkle: sounds good [01:29:17] interestingly I do see an increase in codfw fetch latency at 1:18 but it looks like from zero to a reasonable number, so that probably means that codfw just started getting traffic for whatever reason [01:29:36] anyway seems fine, and very likely to be due to the pybal restart i did [01:31:26] rzl: shdubsh: do either of you feel like the restbase/wikifeeds issue is worth pursuing further outside of working hours? [01:32:06] (I'm pretty sure I think it isn't) [01:32:54] I don't have any context besides what's been discussed, but my read is the same as yours [01:33:04] re: restbase, a simple "curl http://10.192.48.119:7231" on random restbase2017 works just fine. it's just the service-checker-swagger that times out [01:33:37] and that also does .. something.. 
http://10.192.48.119:7231/en.wikipedia.org/v1 [01:33:39] I was just about to ask the likelihood that it's a monitoring glitch leftover from the earlier mw-config change [01:33:55] but it's very suspicious that both issues started at the same time [01:34:09] there is https://phabricator.wikimedia.org/T257887 for that part [01:34:16] I think it's very likely there's some restbase endpoint that is not talking to the appservers as it should be [01:34:35] there is envoy there but that doesn't mean everything in restbase's url router/configuration got converted to use it [01:34:36] hmm, previously: https://phabricator.wikimedia.org/T241068 [01:34:56] that looks very possibly related [01:35:49] none of this goes to severity though [01:36:05] I think I'm convinced that this can wait until morning [01:36:32] rzl: https://grafana.wikimedia.org/d/VTCkm29Wz/envoy-telemetry?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-origin=restbase&var-origin_instance=All&var-destination=All&from=now-6h&to=now [01:36:35] the final panel [01:36:42] 8 rps feels like wait until morning for me [01:36:51] 👀 [01:37:05] heh, wikifeeds and local_port_7231 [01:37:09] yes [01:37:10] okay yeah, agreed [01:37:11] it's quite a name [01:38:05] the latter actually gets the bulk of the traffic [01:38:16] based on the, uh, upstrean request rate [01:38:43] ahaha i noticed that too [01:38:53] but even beyond the absolute error rate, we're looking at 1-2% errors, not ideal but life will go on [01:39:06] it's possible that the majority of that timeout is monitoring trying repeatedly [01:42:12] times out: /usr/bin/service-checker-swagger -t 5 10.192.48.119 http://10.192.48.119:7231/en.wikipedia.org/v1 works: curl http://10.192.48.119:7231/en.wikipedia.org/v1/ [01:46:13] 10Operations, 10RESTBase: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 (10Dzahn) on restbase2017: times out: `/usr/bin/service-checker-swagger -t 5 10.192.48.119 
http://10.192.48.119:7231/en.wikipedia.org/v1` works: `curl http://10.192.48.119:7231/en.wikipedia.org/v1/` [01:55:46] it taks about 4 minutes to render http://10.192.48.119:7231/en.wikipedia.org/v1/feed/featured/2016/04/29 [02:05:16] confirmed it works with curl eventually, just with the swagger check it even fails quickly if you raise the -t to 300 or something [02:05:39] (03PS1) 10TrainBranchBot: Branch commit for wmf/1.35.0-wmf.41 [core] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612451 [02:16:27] 10Operations, 10RESTBase: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 (10Dzahn) if you wait for about 4 minutes it eventually will return content when testing with curl. using the service-checker-swagger though fails relatively soon also if you raise the -t parameter val... [02:28:23] 10Operations, 10Epic, 10Goal: automatically collect network error reports from users' browsers - https://phabricator.wikimedia.org/T257527 (10CDanis) >>! In T257527#6303123, @Ottomata wrote: >> Determine whether or not we want additional stream processing to split apart NEL responses into their component eve... [02:34:23] (03CR) 10Dzahn: [C: 03+2] xhgui: Pin php-twig at version 1.* [puppet] - 10https://gerrit.wikimedia.org/r/610446 (https://phabricator.wikimedia.org/T254310) (owner: 10Dave Pifke) [02:34:30] (03CR) 10Dzahn: [C: 03+2] "cloud only" [puppet] - 10https://gerrit.wikimedia.org/r/610446 (https://phabricator.wikimedia.org/T254310) (owner: 10Dave Pifke) [02:46:21] (03CR) 10Dzahn: [C: 03+2] webperf: Serve different robots.txt on beta site [puppet] - 10https://gerrit.wikimedia.org/r/608962 (https://phabricator.wikimedia.org/T255092) (owner: 10Dave Pifke) [02:55:21] (03CR) 10Dzahn: "on webperf1001: Profile::Webperf::Site/File[/var/www/no-robots.txt]/ensure: defined content as ..." 
[puppet] - 10https://gerrit.wikimedia.org/r/608962 (https://phabricator.wikimedia.org/T255092) (owner: 10Dave Pifke) [03:36:45] (03CR) 10Andrew Bogott: [C: 03+2] wmcs domain proxy: add a fallthrough redirect for unknown .wmflabs.org domains [puppet] - 10https://gerrit.wikimedia.org/r/612442 (https://phabricator.wikimedia.org/T256276) (owner: 10Andrew Bogott) [04:07:06] (03PS1) 10Andrew Bogott: wmcs domain proxy: change escaping in lua regexps [puppet] - 10https://gerrit.wikimedia.org/r/612455 (https://phabricator.wikimedia.org/T256276) [04:08:23] (03CR) 10Andrew Bogott: [C: 03+2] wmcs domain proxy: change escaping in lua regexps [puppet] - 10https://gerrit.wikimedia.org/r/612455 (https://phabricator.wikimedia.org/T256276) (owner: 10Andrew Bogott) [04:14:41] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11884 and previous config saved to /var/cache/conftool/dbconfig/20200714-041440-marostegui.json [04:14:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:15:49] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1093 in preparation for failover', diff saved to https://phabricator.wikimedia.org/P11885 and previous config saved to /var/cache/conftool/dbconfig/20200714-041548-marostegui.json [04:15:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:16:00] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [04:17:50] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [04:28:30] (03CR) 10Marostegui: mariadb: Promote db1093 to s6 master [puppet] - 10https://gerrit.wikimedia.org/r/611964 (https://phabricator.wikimedia.org/T257253) (owner: 10Marostegui) [04:28:36] (03CR) 10Marostegui: [C: 03+2] mariadb: Promote db1093 to s6 master [puppet] - 10https://gerrit.wikimedia.org/r/611964 (https://phabricator.wikimedia.org/T257253) (owner: 10Marostegui) [04:39:07] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P11886 and previous config saved to /var/cache/conftool/dbconfig/20200714-043907-marostegui.json [04:39:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:44:25] I will be switching over s6 master in 15 minutes [04:59:12] (03PS2) 10Jforrester: Branch commit for wmf/1.35.0-wmf.41 [core] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612451 (https://phabricator.wikimedia.org/T256669) (owner: 10TrainBranchBot) [04:59:19] (03CR) 10Jforrester: [C: 03+2] Branch commit for wmf/1.35.0-wmf.41 [core] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612451 (https://phabricator.wikimedia.org/T256669) (owner: 10TrainBranchBot) [04:59:44] !log 1.35.0-wmf.41 branched at 7d04152db4f8ea9a459511bed8117101d9bb4602 [04:59:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:00:04] marostegui: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for s6 database master failover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200714T0500). 
[05:00:10] !log Starting s6 failover from db1131 to db1093 - T257253 [05:00:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:00:14] T257253: Degraded RAID on db1131 - https://phabricator.wikimedia.org/T257253 [05:00:39] !log marostegui@cumin1001 dbctl commit (dc=all): 'Set s6 as read-only for maintenance T257253', diff saved to https://phabricator.wikimedia.org/P11887 and previous config saved to /var/cache/conftool/dbconfig/20200714-050039-marostegui.json [05:00:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:00:58] Warning: The database has been locked for maintenance, so you will not be able to save your edits right now [05:01:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'Promote db1093 to s6 master and remove read-only from s6 T257253', diff saved to https://phabricator.wikimedia.org/P11888 and previous config saved to /var/cache/conftool/dbconfig/20200714-050157-marostegui.json [05:02:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:02:17] edit worked now [05:02:23] all done [05:02:27] hey jynus! 
[05:06:19] (03PS1) 10Marostegui: site.pp: Fix typo for db1131 [puppet] - 10https://gerrit.wikimedia.org/r/612458 [05:06:21] (03PS2) 10Marostegui: wmnet: Update s6-master alias [dns] - 10https://gerrit.wikimedia.org/r/611965 (https://phabricator.wikimedia.org/T257253) [05:06:57] (03CR) 10Marostegui: wmnet: Update s6-master alias [dns] - 10https://gerrit.wikimedia.org/r/611965 (https://phabricator.wikimedia.org/T257253) (owner: 10Marostegui) [05:07:06] (03CR) 10Marostegui: [C: 03+2] site.pp: Fix typo for db1131 [puppet] - 10https://gerrit.wikimedia.org/r/612458 (owner: 10Marostegui) [05:07:16] (03CR) 10Marostegui: [C: 03+2] wmnet: Update s6-master alias [dns] - 10https://gerrit.wikimedia.org/r/611965 (https://phabricator.wikimedia.org/T257253) (owner: 10Marostegui) [05:09:13] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1093 from api', diff saved to https://phabricator.wikimedia.org/P11889 and previous config saved to /var/cache/conftool/dbconfig/20200714-050912-marostegui.json [05:09:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:09:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1131 for HW maintenance', diff saved to https://phabricator.wikimedia.org/P11890 and previous config saved to /var/cache/conftool/dbconfig/20200714-050931-marostegui.json [05:09:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:11:51] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Degraded RAID on db1131 - https://phabricator.wikimedia.org/T257253 (10Marostegui) [05:12:16] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Degraded RAID on db1131 - https://phabricator.wikimedia.org/T257253 (10Marostegui) The switchover was done, db1131 is no longer the primary master Times: RO started: 05:00:39 RO finished: 05:01:58 Total RO: 1 minute and 19 seconds [05:15:51] !log marostegui@cumin1001 dbctl commit (dc=all): 'Decrease a bit db1088 load', diff saved to 
https://phabricator.wikimedia.org/P11891 and previous config saved to /var/cache/conftool/dbconfig/20200714-051551-marostegui.json [05:15:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:17:53] (03Merged) 10jenkins-bot: Branch commit for wmf/1.35.0-wmf.41 [core] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612451 (https://phabricator.wikimedia.org/T256669) (owner: 10TrainBranchBot) [05:22:34] (03CR) 10Jcrespo: [C: 03+2] bacula: Merge prometheus exporter and icinga check into a single file [puppet] - 10https://gerrit.wikimedia.org/r/611390 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo) [05:34:06] (03PS1) 10Jcrespo: bacula: Fix typo on check_bacula.py path [puppet] - 10https://gerrit.wikimedia.org/r/612460 (https://phabricator.wikimedia.org/T234900) [05:34:38] (03CR) 10Jcrespo: [C: 03+2] bacula: Fix typo on check_bacula.py path [puppet] - 10https://gerrit.wikimedia.org/r/612460 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo) [05:43:23] RECOVERY - Backup freshness on backup1001 is OK: (No output returned from plugin) https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [05:45:50] ^working on that, I belive it is a race condition on deploy [05:54:00] (03PS1) 10Giuseppe Lavagetto: scb: add service proxy, use it in the applications. 
[puppet] - 10https://gerrit.wikimedia.org/r/612461 (https://phabricator.wikimedia.org/T244843) [05:54:02] (03PS1) 10Giuseppe Lavagetto: maps: add the service proxy [puppet] - 10https://gerrit.wikimedia.org/r/612462 (https://phabricator.wikimedia.org/T244843) [05:54:04] (03PS1) 10Giuseppe Lavagetto: maps: use the service proxy to connect to wdqs [puppet] - 10https://gerrit.wikimedia.org/r/612463 (https://phabricator.wikimedia.org/T244843) [06:00:08] (03PS6) 10Jcrespo: bacula: Add ignorelist for long-term broken backups [puppet] - 10https://gerrit.wikimedia.org/r/612167 (https://phabricator.wikimedia.org/T234900) [06:04:55] (03CR) 10Jcrespo: [C: 03+2] bacula: Add ignorelist for long-term broken backups [puppet] - 10https://gerrit.wikimedia.org/r/612167 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo) [06:19:07] (03PS1) 10Jforrester: testwikis wikis to 1.35.0-wmf.41 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612467 [06:19:09] (03CR) 10Jforrester: [C: 03+2] testwikis wikis to 1.35.0-wmf.41 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612467 (owner: 10Jforrester) [06:19:51] (03Merged) 10jenkins-bot: testwikis wikis to 1.35.0-wmf.41 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612467 (owner: 10Jforrester) [06:29:02] !log jforrester@deploy1001 Started scap: testwikis wikis to 1.35.0-wmf.41 [06:29:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:29:11] 10Operations, 10RESTBase: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 (10Joe) p:05Triage→03Unbreak! The homepage of the mobile applications is broken since tonight.
[06:30:07] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [06:30:11] PROBLEM - ores on ores1009 is CRITICAL: connect to address 10.64.48.28 and port 8081: Connection refused https://wikitech.wikimedia.org/wiki/Services/Monitoring/ores [06:35:05] (03PS1) 10Jforrester: ExtensionDistributor: There are now REL1_35 dev snapshots [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612469 [06:35:35] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [06:40:28] 10Operations, 10RESTBase: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 (10Joe) The feeds that do work do so because they're cached at the edge and/or restbase. Not because wikifeeds (and termbox, for that matter) are not broken. [06:46:05] 10Operations, 10RESTBase: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 (10jcrespo) There is [[ https://grafana.wikimedia.org/d/35vIuGpZk/wikifeeds?panelId=20&fullscreen&orgId=1&from=1594665610342&to=1594708810342&var-dc=eqiad%20prometheus%2Fk8s&var-service=wikifeeds | 2 i... 
[06:50:15] RECOVERY - ores on ores1009 is OK: HTTP OK: HTTP/1.0 200 OK - 6397 bytes in 0.002 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/ores [06:52:29] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1078', diff saved to https://phabricator.wikimedia.org/P11893 and previous config saved to /var/cache/conftool/dbconfig/20200714-065229-marostegui.json [06:52:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:53:09] !log Deploy MCR schema change on s5 primary master T238966 [06:53:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:53:13] T238966: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 [06:53:50] !log oblivian@deploy2001 helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . [06:53:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:54:01] !log jforrester@deploy1001 Scap failed!: 9/9 canaries failed their endpoint checks(http://en.wikipedia.org) [06:54:01] !log jforrester@deploy1001 scap failed: RuntimeError Scap failed!: 9/9 canaries failed their endpoint checks(http://en.wikipedia.org) (duration: 24m 59s) [06:54:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:54:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:54:21] RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:54:29] <_joe_> jynus: ^^ [06:54:38] <_joe_> that's me deploying to eqiad [06:55:01] Hmm. 
[06:55:07] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [06:55:18] (03PS1) 10Marostegui: db1131: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/612471 (https://phabricator.wikimedia.org/T257253) [06:55:53] <_joe_> {"name":"wikifeeds","hostname":"wikifeeds-production-6c49f4bf7d-2nlss","pid":29,"level":"ERROR","message":"unable to verify the first certificate","status":504,"type":"internal_http_error","detail":"unable to verify the first certificate" [06:55:54] <_joe_> sigh [06:56:00] <_joe_> ok no way to fix it quickly then [06:56:06] <_joe_> lemme rollback [06:56:30] !log oblivian@deploy2001 helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . [06:56:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:57:19] PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:57:31] PROBLEM - restbase endpoints health on restbase1023 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:57:31] PROBLEM - restbase endpoints health on restbase1019 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:57:31] PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:57:51] (03CR) 10Marostegui: [C: 03+2] db1131: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/612471 (https://phabricator.wikimedia.org/T257253) (owner: 10Marostegui) [06:57:57] PROBLEM - restbase endpoints health on restbase1026 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:57:57] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:57:57] PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:58:03] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [06:58:07] PROBLEM - restbase endpoints health on restbase1021 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:58:07] PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:58:07] PROBLEM - 
restbase endpoints health on restbase1020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:58:07] PROBLEM - restbase endpoints health on restbase1024 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:58:07] PROBLEM - restbase endpoints health on restbase1027 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:58:09] !log Stop mysql on db1131 for HW maintenance [06:58:09] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:58:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:58:15] PROBLEM - restbase endpoints health on restbase1025 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [06:59:57] PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:04:30] !log Drop gerrit, gerritro, gerrittest users from m2 databases - T255715 [07:04:35] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:04:35] T255715: Make sure both `reviewdb-test` (used forgerrit upgrade testing) and `reviewdb` (formerly production) databases get torn down - https://phabricator.wikimedia.org/T255715 [07:04:51] PROBLEM - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is CRITICAL: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [07:09:59] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [07:12:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1075', diff saved to https://phabricator.wikimedia.org/P11894 and previous config saved to /var/cache/conftool/dbconfig/20200714-071233-marostegui.json [07:12:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:15:00] (03CR) 10Jforrester: [C: 03+1] Revert "Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612486 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [07:18:07] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Revert "Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612486 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [07:18:56] (03Merged) 10jenkins-bot: Revert "Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612486 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [07:19:04] (03PS1) 10Volans: mgmt: netbox-generated data for frack mgmt eqiad [dns] - 10https://gerrit.wikimedia.org/r/612472 (https://phabricator.wikimedia.org/T233183) 
[07:21:18] <_joe_> pulling on mwdebug1001 [07:24:13] (03CR) 10Volans: mgmt: netbox-generated data for frack mgmt eqiad (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/612472 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [07:24:41] <_joe_> still not finished pulling, wow [07:25:05] _joe_: have you set the --faster option? :-P [07:25:16] <_joe_> not in the mood for jokes [07:25:26] <_joe_> we left various systems broken for 8 hours [07:25:33] <_joe_> 9 hours now [07:27:20] !log installing libtasn1-6 security updates [07:27:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:27:53] 10Operations, 10Puppet, 10Diffusion, 10Phabricator: Diffussion (Phabricator) operations-puppet repo synchronization error - https://phabricator.wikimedia.org/T257895 (10jcrespo) [07:29:38] 10Operations, 10Puppet, 10Diffusion, 10Phabricator: Diffusion (Phabricator) operations-puppet repo synchronization error - https://phabricator.wikimedia.org/T257895 (10jcrespo) [07:31:13] !log oblivian@deploy1001 Scap failed!: 7/9 canaries failed their endpoint checks(http://en.wikipedia.org) [07:31:15] RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:31:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:32:15] RECOVERY - restbase endpoints health on restbase2017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:32:28] !log oblivian@deploy1001 sync-file aborted: revert forcehttps in an attempt to fix T257887 (duration: 00m 20s) [07:32:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:32:33] T257887: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 [07:32:44] 10Puppet, 10DBA, 10cloud-services-team (Kanban): labtestpuppetmaster2001 is failing to backup - https://phabricator.wikimedia.org/T256846 
(10jcrespo) I have added a rule to ignore labtestpuppetmaster2001 backup monitoring: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/1cd5aee3ff46cda2a1a... [07:33:09] RECOVERY - restbase endpoints health on restbase2014 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:34:09] RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:34:49] RECOVERY - restbase endpoints health on restbase2016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:35:09] RECOVERY - restbase endpoints health on restbase1025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:36:13] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [07:36:49] RECOVERY - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds [07:36:57] PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:38:09] RECOVERY - restbase endpoints health on restbase1023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:38:29] RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:38:43] RECOVERY - restbase endpoints health on restbase1024 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:38:43] RECOVERY - restbase endpoints health on restbase1020 is OK: All endpoints are healthy 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:38:45] RECOVERY - restbase endpoints health on restbase2012 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:38:47] RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:39:53] PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:39:53] PROBLEM - restbase endpoints health on restbase2017 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:40:27] PROBLEM - restbase endpoints health on restbase2016 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:40:35] RECOVERY - Restbase edge ulsfo on text-lb.ulsfo.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [07:40:45] PROBLEM - Ensure local MW versions match expected deployment on mw1378 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:45] PROBLEM - restbase endpoints health on restbase1025 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:40:47] PROBLEM - Ensure local MW versions match expected deployment on wtp2013 is CRITICAL: CRITICAL: 3 mismatched 
wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:49] PROBLEM - Ensure local MW versions match expected deployment on mw1328 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:49] PROBLEM - Ensure local MW versions match expected deployment on mw1370 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:49] PROBLEM - Ensure local MW versions match expected deployment on mw1383 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:49] PROBLEM - Ensure local MW versions match expected deployment on mw1341 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:49] PROBLEM - Ensure local MW versions match expected deployment on mw1343 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:51] PROBLEM - Ensure local MW versions match expected deployment on mw2244 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:51] PROBLEM - Ensure local MW versions match expected deployment on mw2264 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:51] PROBLEM - Ensure local MW versions match expected deployment on mw2216 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:51] PROBLEM - Ensure local MW versions match expected deployment on mw2255 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:51] PROBLEM - Ensure local MW versions match expected deployment on mw2231 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:51] PROBLEM - Ensure local MW versions match expected 
deployment on mw2135 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:40:52] PROBLEM - Ensure local MW versions match expected deployment on mw2143 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:01] PROBLEM - Ensure local MW versions match expected deployment on mw1402 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:03] PROBLEM - Ensure local MW versions match expected deployment on mw1344 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:03] PROBLEM - Ensure local MW versions match expected deployment on wtp1036 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:03] PROBLEM - Ensure local MW versions match expected deployment on mw1362 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:03] PROBLEM - Ensure local MW versions match expected deployment on mw1284 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:03] PROBLEM - Ensure local MW versions match expected deployment on mw1282 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:03] PROBLEM - Ensure local MW versions match expected deployment on mw1268 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:05] PROBLEM - Ensure local MW versions match expected deployment on mw1275 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:05] PROBLEM - Ensure local MW versions match expected deployment on mw1310 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers 
[07:41:05] PROBLEM - Ensure local MW versions match expected deployment on mw1269 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:05] PROBLEM - Ensure local MW versions match expected deployment on mw1266 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:05] PROBLEM - Ensure local MW versions match expected deployment on mw2328 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:07] PROBLEM - Ensure local MW versions match expected deployment on mw2207 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:11] PROBLEM - Ensure local MW versions match expected deployment on mw2311 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:11] PROBLEM - Ensure local MW versions match expected deployment on mw2320 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:11] PROBLEM - Ensure local MW versions match expected deployment on mw2271 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:13] PROBLEM - Ensure local MW versions match expected deployment on mw2227 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:13] PROBLEM - Ensure local MW versions match expected deployment on mw2217 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:13] PROBLEM - Ensure local MW versions match expected deployment on mw2218 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:13] PROBLEM - Ensure local MW versions match expected deployment on mw2241 is CRITICAL: CRITICAL: 3 mismatched 
wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:15] PROBLEM - Ensure local MW versions match expected deployment on mw2295 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:15] PROBLEM - Ensure local MW versions match expected deployment on mw2322 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:15] PROBLEM - Ensure local MW versions match expected deployment on mw2309 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:15] PROBLEM - Ensure local MW versions match expected deployment on mw2318 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:15] PROBLEM - Ensure local MW versions match expected deployment on mw2323 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:15] PROBLEM - Ensure local MW versions match expected deployment on mw2307 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:15] PROBLEM - Ensure local MW versions match expected deployment on mw2302 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:16] PROBLEM - Ensure local MW versions match expected deployment on mw2360 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:16] PROBLEM - Ensure local MW versions match expected deployment on mw2370 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:17] PROBLEM - Ensure local MW versions match expected deployment on mw2339 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:17] PROBLEM - Ensure local MW versions match expected 
deployment on mw1304 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:19] PROBLEM - Ensure local MW versions match expected deployment on mw1394 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:21] PROBLEM - Ensure local MW versions match expected deployment on mw2206 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:21] PROBLEM - Ensure local MW versions match expected deployment on mw2190 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:23] PROBLEM - Ensure local MW versions match expected deployment on mw2221 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:25] PROBLEM - Ensure local MW versions match expected deployment on mw2354 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:25] PROBLEM - Ensure local MW versions match expected deployment on mw2301 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:37] PROBLEM - Ensure local MW versions match expected deployment on mw1408 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:37] PROBLEM - Ensure local MW versions match expected deployment on wtp1042 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:37] PROBLEM - Ensure local MW versions match expected deployment on wtp1045 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:37] PROBLEM - Ensure local MW versions match expected deployment on mw1351 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers 
[07:41:37] PROBLEM - Ensure local MW versions match expected deployment on mw1296 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:37] PROBLEM - Ensure local MW versions match expected deployment on mw2283 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:37] PROBLEM - Ensure local MW versions match expected deployment on wtp2012 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:38] PROBLEM - Ensure local MW versions match expected deployment on wtp2005 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:38] PROBLEM - Ensure local MW versions match expected deployment on wtp2007 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:39] PROBLEM - Ensure local MW versions match expected deployment on wtp2018 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:41] PROBLEM - Ensure local MW versions match expected deployment on mw1392 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:41] PROBLEM - Ensure local MW versions match expected deployment on mw1355 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:45] PROBLEM - Ensure local MW versions match expected deployment on mw1401 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:45] PROBLEM - Ensure local MW versions match expected deployment on mw1374 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:45] PROBLEM - Ensure local MW versions match expected deployment on mw1375 is CRITICAL: CRITICAL: 3 mismatched 
wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:45] PROBLEM - Ensure local MW versions match expected deployment on mw1333 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:45] PROBLEM - Ensure local MW versions match expected deployment on mw1273 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:45] PROBLEM - Ensure local MW versions match expected deployment on mw1270 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:45] PROBLEM - Ensure local MW versions match expected deployment on mw1299 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:46] PROBLEM - Ensure local MW versions match expected deployment on mw1345 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:46] PROBLEM - Ensure local MW versions match expected deployment on mw1272 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:47] PROBLEM - Ensure local MW versions match expected deployment on mw1312 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:47] PROBLEM - Ensure local MW versions match expected deployment on mw2335 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:48] PROBLEM - Ensure local MW versions match expected deployment on mw2352 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:48] PROBLEM - Ensure local MW versions match expected deployment on mw2312 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:49] PROBLEM - Ensure local MW versions match expected 
deployment on mw2316 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:49] PROBLEM - Ensure local MW versions match expected deployment on mw2305 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:50] PROBLEM - Ensure local MW versions match expected deployment on mw2261 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:50] PROBLEM - Ensure local MW versions match expected deployment on mw2284 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:51] PROBLEM - Ensure local MW versions match expected deployment on mw2234 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:51] PROBLEM - Ensure local MW versions match expected deployment on mw2228 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:52] PROBLEM - Ensure local MW versions match expected deployment on mw2192 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:52] PROBLEM - Ensure local MW versions match expected deployment on mw2189 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:53] PROBLEM - Ensure local MW versions match expected deployment on mw2229 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:53] PROBLEM - Ensure local MW versions match expected deployment on mw2142 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:55] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) 
timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [07:41:57] PROBLEM - Ensure local MW versions match expected deployment on wtp1046 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:57] PROBLEM - Ensure local MW versions match expected deployment on mw1410 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:41:57] PROBLEM - Ensure local MW versions match expected deployment on mw1281 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:07] PROBLEM - Ensure local MW versions match expected deployment on mw1385 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:07] PROBLEM - Ensure local MW versions match expected deployment on mw1302 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:07] PROBLEM - Ensure local MW versions match expected deployment on mw1342 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:07] PROBLEM - Ensure local MW versions match expected deployment on wtp1047 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:07] PROBLEM - Ensure local MW versions match expected deployment on mw1305 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:09] PROBLEM - Ensure local MW versions match expected deployment on mw2313 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:09] PROBLEM - Ensure local MW versions match expected deployment on wtp2020 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:29] RECOVERY - Restbase edge 
codfw on text-lb.codfw.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [07:42:29] PROBLEM - Ensure local MW versions match expected deployment on mw1407 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:29] PROBLEM - Ensure local MW versions match expected deployment on mw1390 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:29] PROBLEM - Ensure local MW versions match expected deployment on wtp1044 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:29] PROBLEM - Ensure local MW versions match expected deployment on mw2226 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:33] PROBLEM - Ensure local MW versions match expected deployment on mw1320 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:33] PROBLEM - Ensure local MW versions match expected deployment on mw1271 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:33] PROBLEM - Ensure local MW versions match expected deployment on mw1323 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:33] PROBLEM - Ensure local MW versions match expected deployment on mw1298 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:33] RECOVERY - restbase endpoints health on restbase2015 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:42:33] PROBLEM - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/featured/{year}/{month}/{day} (retrieve title of the featured article for April 29, 2016) timed out before a response was 
received https://wikitech.wikimedia.org/wiki/Wikifeeds [07:42:33] PROBLEM - Ensure local MW versions match expected deployment on mw2372 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:35] PROBLEM - Ensure local MW versions match expected deployment on mw2233 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:35] PROBLEM - restbase endpoints health on restbase2014 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:42:37] PROBLEM - Ensure local MW versions match expected deployment on mw2289 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:37] PROBLEM - Ensure local MW versions match expected deployment on mw2268 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:43] PROBLEM - Ensure local MW versions match expected deployment on mw1391 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:43] PROBLEM - Ensure local MW versions match expected deployment on mw2145 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:43] PROBLEM - Ensure local MW versions match expected deployment on mw2214 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:43] PROBLEM - Ensure local MW versions match expected deployment on wtp2004 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:45] PROBLEM - Ensure local MW versions match expected deployment on wtp2015 is CRITICAL: CRITICAL: 3 mismatched wikiversions 
https://wikitech.wikimedia.org/wiki/Application_servers [07:42:45] PROBLEM - Ensure local MW versions match expected deployment on wtp2011 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:45] PROBLEM - Ensure local MW versions match expected deployment on wtp2002 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:45] PROBLEM - Ensure local MW versions match expected deployment on wtp2017 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:45] PROBLEM - Ensure local MW versions match expected deployment on mw1376 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:45] PROBLEM - Ensure local MW versions match expected deployment on mw1373 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:46] PROBLEM - Ensure local MW versions match expected deployment on mw2297 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:47] PROBLEM - Ensure local MW versions match expected deployment on mw2314 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:47] PROBLEM - Ensure local MW versions match expected deployment on mw2357 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:47] PROBLEM - Ensure local MW versions match expected deployment on mw2243 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:49] uh oh... 
[07:42:57] RECOVERY - restbase endpoints health on restbase2013 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:42:57] PROBLEM - Ensure local MW versions match expected deployment on wtp1034 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:57] PROBLEM - Ensure local MW versions match expected deployment on mw1347 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:42:57] PROBLEM - Ensure local MW versions match expected deployment on mw1361 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:01] PROBLEM - Ensure local MW versions match expected deployment on wtp1030 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:01] PROBLEM - Ensure local MW versions match expected deployment on wtp1035 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:01] PROBLEM - Ensure local MW versions match expected deployment on mw2327 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:01] PROBLEM - Ensure local MW versions match expected deployment on mw2358 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:01] PROBLEM - Ensure local MW versions match expected deployment on mw2253 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:01] PROBLEM - Ensure local MW versions match expected deployment on mw2273 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:01] RECOVERY - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase 
[07:43:05] PROBLEM - Ensure local MW versions match expected deployment on mw1400 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:07] PROBLEM - Ensure local MW versions match expected deployment on wtp1027 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:07] PROBLEM - Ensure local MW versions match expected deployment on mw1295 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:07] PROBLEM - Ensure local MW versions match expected deployment on mw1274 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:07] PROBLEM - Ensure local MW versions match expected deployment on mw1288 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:07] PROBLEM - Ensure local MW versions match expected deployment on mw2351 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:07] PROBLEM - Ensure local MW versions match expected deployment on mw2224 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:07] PROBLEM - Ensure local MW versions match expected deployment on mw2215 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:08] PROBLEM - Ensure local MW versions match expected deployment on wtp2001 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:09] PROBLEM - Ensure local MW versions match expected deployment on mw1412 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:09] PROBLEM - Ensure local MW versions match expected deployment on mw1371 is CRITICAL: CRITICAL: 3 mismatched 
wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:09] PROBLEM - Ensure local MW versions match expected deployment on mw2291 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:10] PROBLEM - Ensure local MW versions match expected deployment on mw2252 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:10] PROBLEM - Ensure local MW versions match expected deployment on mw2260 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:11] PROBLEM - Ensure local MW versions match expected deployment on mw2138 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:15] PROBLEM - Ensure local MW versions match expected deployment on mw2368 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:15] PROBLEM - Ensure local MW versions match expected deployment on mw2247 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:17] PROBLEM - Ensure local MW versions match expected deployment on mw1388 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:17] PROBLEM - Ensure local MW versions match expected deployment on mw2245 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:21] PROBLEM - Ensure local MW versions match expected deployment on mw1387 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:21] PROBLEM - Ensure local MW versions match expected deployment on mw1382 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:21] PROBLEM - Ensure local MW versions match expected 
deployment on mw1356 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:21] PROBLEM - Ensure local MW versions match expected deployment on mw1348 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:25] PROBLEM - Ensure local MW versions match expected deployment on mw2282 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:29] PROBLEM - Ensure local MW versions match expected deployment on mw1365 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:29] PROBLEM - Ensure local MW versions match expected deployment on mw1332 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:29] PROBLEM - Ensure local MW versions match expected deployment on mw1287 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:29] PROBLEM - Ensure local MW versions match expected deployment on mw1297 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:29] PROBLEM - Ensure local MW versions match expected deployment on mw2355 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:29] PROBLEM - Ensure local MW versions match expected deployment on mw2299 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:31] PROBLEM - Ensure local MW versions match expected deployment on mw2139 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:33] PROBLEM - Ensure local MW versions match expected deployment on mw1409 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers 
[07:43:33] PROBLEM - Ensure local MW versions match expected deployment on mw1316 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:33] PROBLEM - Ensure local MW versions match expected deployment on mw1377 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:33] PROBLEM - Ensure local MW versions match expected deployment on mw1319 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:33] PROBLEM - Ensure local MW versions match expected deployment on mw1352 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:33] PROBLEM - Ensure local MW versions match expected deployment on mw1327 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:33] PROBLEM - Ensure local MW versions match expected deployment on mw2308 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:34] PROBLEM - Ensure local MW versions match expected deployment on mw2364 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:34] PROBLEM - Ensure local MW versions match expected deployment on mw2375 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:35] PROBLEM - Ensure local MW versions match expected deployment on mw2288 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:35] PROBLEM - Ensure local MW versions match expected deployment on mw2278 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:36] PROBLEM - Ensure local MW versions match expected deployment on mw2267 is CRITICAL: CRITICAL: 3 mismatched 
wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:36] PROBLEM - Ensure local MW versions match expected deployment on mw2274 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:37] PROBLEM - Ensure local MW versions match expected deployment on mw2223 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:37] PROBLEM - Ensure local MW versions match expected deployment on mw2239 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:38] PROBLEM - Ensure local MW versions match expected deployment on mw2246 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:38] PROBLEM - Ensure local MW versions match expected deployment on mw2251 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:39] PROBLEM - Ensure local MW versions match expected deployment on mw2193 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:39] PROBLEM - Ensure local MW versions match expected deployment on mw2187 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:40] PROBLEM - Ensure local MW versions match expected deployment on mw2144 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:40] PROBLEM - Ensure local MW versions match expected deployment on mw2195 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:41] PROBLEM - Ensure local MW versions match expected deployment on mw1399 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:41] PROBLEM - Ensure local MW versions match expected 
deployment on mw1368 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:42] PROBLEM - Ensure local MW versions match expected deployment on mw1413 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:42] PROBLEM - Ensure local MW versions match expected deployment on mw1300 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:43] PROBLEM - Ensure local MW versions match expected deployment on mw1336 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:43] PROBLEM - Ensure local MW versions match expected deployment on mw2292 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:44] PROBLEM - Ensure local MW versions match expected deployment on mw2298 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:44] PROBLEM - Ensure local MW versions match expected deployment on mw2338 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:45] PROBLEM - Ensure local MW versions match expected deployment on mw2293 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:45] PROBLEM - Ensure local MW versions match expected deployment on wtp1041 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:46] PROBLEM - Ensure local MW versions match expected deployment on mw2269 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:46] PROBLEM - Ensure local MW versions match expected deployment on mw2280 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers 
[07:43:47] PROBLEM - Ensure local MW versions match expected deployment on mw2248 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:47] PROBLEM - Ensure local MW versions match expected deployment on mw2236 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:48] PROBLEM - Ensure local MW versions match expected deployment on mw2230 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:48] PROBLEM - Ensure local MW versions match expected deployment on mw2199 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:49] PROBLEM - Ensure local MW versions match expected deployment on mw1406 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:49] PROBLEM - Ensure local MW versions match expected deployment on wtp1039 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:50] PROBLEM - Ensure local MW versions match expected deployment on mw2376 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:50] PROBLEM - Ensure local MW versions match expected deployment on mw2304 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:51] PROBLEM - Ensure local MW versions match expected deployment on mw2198 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:51] PROBLEM - restbase endpoints health on restbase1023 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:43:52] RECOVERY - 
restbase endpoints health on restbase2020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:43:59] PROBLEM - Ensure local MW versions match expected deployment on wtp1038 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:43:59] PROBLEM - Ensure local MW versions match expected deployment on wtp1037 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:05] PROBLEM - Ensure local MW versions match expected deployment on mwmaint1002 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:05] PROBLEM - Ensure local MW versions match expected deployment on mw2336 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:07] PROBLEM - Ensure local MW versions match expected deployment on wtp1028 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:07] PROBLEM - Ensure local MW versions match expected deployment on mw1350 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:07] PROBLEM - Ensure local MW versions match expected deployment on wtp1026 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:07] PROBLEM - Ensure local MW versions match expected deployment on mw1321 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:07] PROBLEM - Ensure local MW versions match expected deployment on mw1290 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:07] PROBLEM - Ensure local MW versions match expected deployment on mw1313 is CRITICAL: CRITICAL: 3 mismatched wikiversions 
https://wikitech.wikimedia.org/wiki/Application_servers [07:44:07] PROBLEM - Ensure local MW versions match expected deployment on mw1311 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:13] PROBLEM - Ensure local MW versions match expected deployment on mw1308 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:13] PROBLEM - Ensure local MW versions match expected deployment on mw1364 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:13] PROBLEM - Ensure local MW versions match expected deployment on mw1329 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:13] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:44:15] PROBLEM - Ensure local MW versions match expected deployment on mw1398 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:15] PROBLEM - Ensure local MW versions match expected deployment on mw1384 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:19] PROBLEM - Ensure local MW versions match expected deployment on mw1404 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:19] PROBLEM - Ensure local MW versions match expected deployment on mw2330 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:19] PROBLEM - Ensure local MW versions match expected deployment on mw2337 is CRITICAL: CRITICAL: 3 mismatched wikiversions 
https://wikitech.wikimedia.org/wiki/Application_servers [07:44:19] PROBLEM - Ensure local MW versions match expected deployment on mw2317 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:19] PROBLEM - Ensure local MW versions match expected deployment on mw2300 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:19] PROBLEM - Ensure local MW versions match expected deployment on mw2279 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:19] PROBLEM - Ensure local MW versions match expected deployment on mw2249 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:20] PROBLEM - Ensure local MW versions match expected deployment on mw2281 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:23] RECOVERY - restbase endpoints health on restbase2019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:44:25] PROBLEM - Ensure local MW versions match expected deployment on scandium is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:27] PROBLEM - Ensure local MW versions match expected deployment on mw1395 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:27] PROBLEM - restbase endpoints health on restbase1024 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:44:27] PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) 
timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:44:27] PROBLEM - restbase endpoints health on restbase1020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:44:29] PROBLEM - Ensure local MW versions match expected deployment on wtp1031 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:29] PROBLEM - Ensure local MW versions match expected deployment on mw1360 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:29] PROBLEM - Ensure local MW versions match expected deployment on mw1359 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:29] PROBLEM - Ensure local MW versions match expected deployment on mw1354 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:29] PROBLEM - Ensure local MW versions match expected deployment on mw1379 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:31] PROBLEM - Ensure local MW versions match expected deployment on wtp1032 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:33] PROBLEM - Ensure local MW versions match expected deployment on mw1306 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:35] PROBLEM - Ensure local MW versions match expected deployment on mw1334 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:35] PROBLEM - Ensure local MW versions match expected deployment on mw1337 is CRITICAL: CRITICAL: 3 
mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:35] PROBLEM - Ensure local MW versions match expected deployment on wtp1033 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:35] PROBLEM - Ensure local MW versions match expected deployment on mw1346 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:35] PROBLEM - Ensure local MW versions match expected deployment on mw1285 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:35] PROBLEM - Ensure local MW versions match expected deployment on mw1330 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:35] PROBLEM - Ensure local MW versions match expected deployment on mw1338 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:36] PROBLEM - Ensure local MW versions match expected deployment on mw1286 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:36] PROBLEM - Ensure local MW versions match expected deployment on snapshot1006 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:37] PROBLEM - Ensure local MW versions match expected deployment on mw1331 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:37] PROBLEM - Ensure local MW versions match expected deployment on mw1293 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:38] PROBLEM - Ensure local MW versions match expected deployment on mw2374 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:38] PROBLEM - Ensure local MW 
versions match expected deployment on mw2371 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:39] PROBLEM - Ensure local MW versions match expected deployment on mw2329 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:39] PROBLEM - Ensure local MW versions match expected deployment on mw2369 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:40] PROBLEM - Ensure local MW versions match expected deployment on mw2365 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:40] PROBLEM - Ensure local MW versions match expected deployment on mw2362 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:41] PROBLEM - Ensure local MW versions match expected deployment on mw2373 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:41] PROBLEM - Ensure local MW versions match expected deployment on mwdebug2002 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:42] PROBLEM - Ensure local MW versions match expected deployment on mw2272 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:42] PROBLEM - Ensure local MW versions match expected deployment on mw2290 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:43] PROBLEM - Ensure local MW versions match expected deployment on mw2256 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:43] PROBLEM - Ensure local MW versions match expected deployment on mw2266 is CRITICAL: CRITICAL: 3 mismatched wikiversions 
https://wikitech.wikimedia.org/wiki/Application_servers [07:44:44] PROBLEM - Ensure local MW versions match expected deployment on mw2276 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:44] PROBLEM - Ensure local MW versions match expected deployment on mw2240 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:45] PROBLEM - Ensure local MW versions match expected deployment on mw2220 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:45] PROBLEM - Ensure local MW versions match expected deployment on mw2222 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:46] PROBLEM - Ensure local MW versions match expected deployment on mw2286 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:46] PROBLEM - Ensure local MW versions match expected deployment on mw2270 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:47] PROBLEM - Ensure local MW versions match expected deployment on mw2204 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:47] PROBLEM - Ensure local MW versions match expected deployment on mw2188 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:48] PROBLEM - Ensure local MW versions match expected deployment on mw2210 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:48] PROBLEM - Ensure local MW versions match expected deployment on mw2194 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:49] PROBLEM - Ensure local MW versions match expected deployment 
on mw2191 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:53] PROBLEM - Ensure local MW versions match expected deployment on labweb1001 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:53] PROBLEM - Ensure local MW versions match expected deployment on mw1403 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:55] PROBLEM - Ensure local MW versions match expected deployment on mw2367 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:55] PROBLEM - Ensure local MW versions match expected deployment on mw2366 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:57] PROBLEM - Ensure local MW versions match expected deployment on mwmaint2001 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:57] PROBLEM - Ensure local MW versions match expected deployment on mw1367 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:57] PROBLEM - Ensure local MW versions match expected deployment on mw2211 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:44:57] PROBLEM - Ensure local MW versions match expected deployment on mw2141 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:03] PROBLEM - Ensure local MW versions match expected deployment on snapshot1008 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:03] PROBLEM - Ensure local MW versions match expected deployment on mw1380 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers 
[07:45:03] PROBLEM - Ensure local MW versions match expected deployment on mw1381 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:03] PROBLEM - Ensure local MW versions match expected deployment on mw2146 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:03] PROBLEM - Ensure local MW versions match expected deployment on mw2201 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:05] PROBLEM - Ensure local MW versions match expected deployment on mw2254 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:05] PROBLEM - Ensure local MW versions match expected deployment on mw2262 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:05] PROBLEM - Ensure local MW versions match expected deployment on mwdebug2001 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:05] PROBLEM - Ensure local MW versions match expected deployment on mw2196 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:05] PROBLEM - Ensure local MW versions match expected deployment on mw2200 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:05] PROBLEM - Ensure local MW versions match expected deployment on mw2140 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:06] PROBLEM - Ensure local MW versions match expected deployment on mw2203 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:07] PROBLEM - Ensure local MW versions match expected deployment on mw1289 is CRITICAL: CRITICAL: 3 mismatched 
wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:07] PROBLEM - Ensure local MW versions match expected deployment on mw1314 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:07] PROBLEM - Ensure local MW versions match expected deployment on mw1325 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:08] PROBLEM - Ensure local MW versions match expected deployment on mw1349 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:08] PROBLEM - Ensure local MW versions match expected deployment on mw2277 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:09] PROBLEM - Ensure local MW versions match expected deployment on mw2257 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:11] PROBLEM - Ensure local MW versions match expected deployment on wtp1048 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:11] PROBLEM - Ensure local MW versions match expected deployment on snapshot1007 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:11] PROBLEM - Ensure local MW versions match expected deployment on mw2350 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:11] PROBLEM - Ensure local MW versions match expected deployment on mw2294 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:11] PROBLEM - Ensure local MW versions match expected deployment on mw2325 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:12] PROBLEM - Ensure local MW versions match 
expected deployment on mw2232 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:15] PROBLEM - Ensure local MW versions match expected deployment on mw1397 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:15] PROBLEM - Ensure local MW versions match expected deployment on mw1396 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:21] PROBLEM - Ensure local MW versions match expected deployment on mw1309 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:21] PROBLEM - Ensure local MW versions match expected deployment on mw2334 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:21] PROBLEM - Ensure local MW versions match expected deployment on mw2275 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:23] PROBLEM - Ensure local MW versions match expected deployment on snapshot1010 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:25] PROBLEM - Ensure local MW versions match expected deployment on mw1353 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:25] PROBLEM - Ensure local MW versions match expected deployment on mw1393 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:25] PROBLEM - Ensure local MW versions match expected deployment on mw1411 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:25] PROBLEM - Ensure local MW versions match expected deployment on mw1294 is CRITICAL: CRITICAL: 3 mismatched wikiversions 
https://wikitech.wikimedia.org/wiki/Application_servers [07:45:25] PROBLEM - Ensure local MW versions match expected deployment on mw1315 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:25] PROBLEM - Ensure local MW versions match expected deployment on mw2238 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:25] PROBLEM - Ensure local MW versions match expected deployment on mw2258 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:26] PROBLEM - Ensure local MW versions match expected deployment on mw2285 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:26] PROBLEM - Ensure local MW versions match expected deployment on mw2202 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:27] PROBLEM - Ensure local MW versions match expected deployment on wtp2019 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:27] PROBLEM - Ensure local MW versions match expected deployment on mw1372 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:28] PROBLEM - Ensure local MW versions match expected deployment on mw2315 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:28] PROBLEM - Ensure local MW versions match expected deployment on mw2310 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:29] PROBLEM - Ensure local MW versions match expected deployment on mw1363 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:31] PROBLEM - Ensure local MW versions match expected deployment 
on wtp1043 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:31] PROBLEM - Ensure local MW versions match expected deployment on mw1339 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:31] PROBLEM - Ensure local MW versions match expected deployment on mw1369 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:31] PROBLEM - Ensure local MW versions match expected deployment on mw1283 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:31] PROBLEM - Ensure local MW versions match expected deployment on labweb1002 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:32] PROBLEM - Ensure local MW versions match expected deployment on snapshot1009 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:32] PROBLEM - Ensure local MW versions match expected deployment on mwdebug1002 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:33] PROBLEM - Ensure local MW versions match expected deployment on wtp1029 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:33] PROBLEM - Ensure local MW versions match expected deployment on mw1317 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:34] PROBLEM - Ensure local MW versions match expected deployment on wtp1040 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:34] PROBLEM - Ensure local MW versions match expected deployment on mw1340 is CRITICAL: CRITICAL: 3 mismatched wikiversions 
https://wikitech.wikimedia.org/wiki/Application_servers [07:45:35] PROBLEM - Ensure local MW versions match expected deployment on mw1335 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:35] PROBLEM - Ensure local MW versions match expected deployment on mw2361 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:36] PROBLEM - Ensure local MW versions match expected deployment on mw2353 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:36] PROBLEM - Ensure local MW versions match expected deployment on mw2331 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:37] PROBLEM - Ensure local MW versions match expected deployment on mw2332 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:37] PROBLEM - Ensure local MW versions match expected deployment on mw2321 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:38] PROBLEM - Ensure local MW versions match expected deployment on mw2306 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:38] PROBLEM - Ensure local MW versions match expected deployment on mw2259 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:39] PROBLEM - Ensure local MW versions match expected deployment on mw2265 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:39] PROBLEM - Ensure local MW versions match expected deployment on mw2209 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:40] PROBLEM - Ensure local MW versions match expected deployment 
on mw2212 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:40] RECOVERY - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [07:45:41] PROBLEM - Ensure local MW versions match expected deployment on mw2237 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:41] PROBLEM - Ensure local MW versions match expected deployment on wtp2010 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:42] PROBLEM - Ensure local MW versions match expected deployment on mw2136 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:42] PROBLEM - Ensure local MW versions match expected deployment on wtp2009 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:43] PROBLEM - Ensure local MW versions match expected deployment on wtp2016 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:43] PROBLEM - Ensure local MW versions match expected deployment on wtp2006 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:44] PROBLEM - Ensure local MW versions match expected deployment on wtp2003 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:44] PROBLEM - Ensure local MW versions match expected deployment on mw2147 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:45] PROBLEM - Ensure local MW versions match expected deployment on mw1303 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:47] PROBLEM - Ensure local MW versions 
match expected deployment on mw1389 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:47] PROBLEM - Ensure local MW versions match expected deployment on mw1301 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:47] PROBLEM - Ensure local MW versions match expected deployment on mw1267 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:47] PROBLEM - Ensure local MW versions match expected deployment on mw1326 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:47] PROBLEM - Ensure local MW versions match expected deployment on mw2303 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:48] PROBLEM - Ensure local MW versions match expected deployment on mw2319 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:48] PROBLEM - Ensure local MW versions match expected deployment on mw2333 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:49] PROBLEM - Ensure local MW versions match expected deployment on mw2359 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:49] PROBLEM - Ensure local MW versions match expected deployment on mw2250 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:50] PROBLEM - Ensure local MW versions match expected deployment on mw2219 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:50] PROBLEM - Ensure local MW versions match expected deployment on mw2242 is CRITICAL: CRITICAL: 3 mismatched wikiversions 
https://wikitech.wikimedia.org/wiki/Application_servers [07:45:51] PROBLEM - Ensure local MW versions match expected deployment on mw2208 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:51] PROBLEM - Ensure local MW versions match expected deployment on mw2225 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:52] PROBLEM - Ensure local MW versions match expected deployment on mw2197 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:52] PROBLEM - Ensure local MW versions match expected deployment on wtp2014 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:53] PROBLEM - Ensure local MW versions match expected deployment on mw2137 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:57] PROBLEM - Ensure local MW versions match expected deployment on wtp1025 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:57] PROBLEM - Ensure local MW versions match expected deployment on snapshot1005 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:45:57] PROBLEM - Ensure local MW versions match expected deployment on wtp2008 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:07] RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:46:07] PROBLEM - Ensure local MW versions match expected deployment on mw1322 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:07] PROBLEM - Ensure local MW versions match expected deployment on mw1318 is 
CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:07] PROBLEM - Ensure local MW versions match expected deployment on mw1366 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:09] PROBLEM - Ensure local MW versions match expected deployment on mw1324 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:11] PROBLEM - Ensure local MW versions match expected deployment on mw2235 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:13] PROBLEM - Ensure local MW versions match expected deployment on mw2287 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:15] RECOVERY - wikifeeds codfw on wikifeeds.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds [07:46:15] PROBLEM - Ensure local MW versions match expected deployment on mw1386 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:15] PROBLEM - Ensure local MW versions match expected deployment on mw1405 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:17] PROBLEM - Ensure local MW versions match expected deployment on mw2296 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:21] PROBLEM - Ensure local MW versions match expected deployment on mw1358 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:21] PROBLEM - Ensure local MW versions match expected deployment on mw2356 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:21] PROBLEM - Ensure local MW versions match expected 
deployment on mw2363 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:27] PROBLEM - Ensure local MW versions match expected deployment on mw2326 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:27] PROBLEM - Ensure local MW versions match expected deployment on mw2324 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:27] PROBLEM - Ensure local MW versions match expected deployment on mw2263 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:27] PROBLEM - Ensure local MW versions match expected deployment on mw2205 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:33] PROBLEM - Ensure local MW versions match expected deployment on mw1307 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:46:35] PROBLEM - Ensure local MW versions match expected deployment on mw1357 is CRITICAL: CRITICAL: 3 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [07:47:27] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [07:47:41] RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:48:05] RECOVERY - restbase endpoints health on restbase1021 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:48:09] RECOVERY - restbase endpoints health on restbase2018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:48:09] RECOVERY - restbase endpoints health on restbase1027 is OK: All endpoints are healthy 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:48:13] RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:48:13] RECOVERY - restbase endpoints health on restbase2021 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:48:13] RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:48:15] RECOVERY - restbase endpoints health on restbase1025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:48:15] RECOVERY - restbase endpoints health on restbase2022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:48:19] !log oblivian@deploy1001 Synchronized wmf-config/InitialiseSettings.php: revert forcehttps in an attempt to fix T257887 (duration: 01m 06s) [07:48:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:48:24] T257887: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 [07:48:31] <_joe_> ok my revert worked [07:48:48] <_joe_> James_F: you can proceed with the train if you want to [07:49:01] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [07:49:07] RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:49:09] RECOVERY - restbase endpoints health on restbase2017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:49:13] <_joe_> so the revert fixed wikifeeds it seems [07:49:19] RECOVERY - restbase endpoints health on restbase1019 is OK: All endpoints are healthy 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:49:19] RECOVERY - restbase endpoints health on restbase1023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:49:19] RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:49:45] RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:49:45] RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:49:45] RECOVERY - restbase endpoints health on restbase2016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:49:57] _joe_: Excellent, thanks. [07:49:57] RECOVERY - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds [07:49:57] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [07:49:57] RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:49:57] RECOVERY - restbase endpoints health on restbase1020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:49:57] RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:49:57] RECOVERY - restbase endpoints health on restbase1024 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:50:01] RECOVERY - restbase endpoints health on restbase2023 is OK: All endpoints are healthy 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:50:01] RECOVERY - restbase endpoints health on restbase2014 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:50:23] <_joe_> James_F: I suspect I can even remove my scap config tweak rn [07:50:26] !log jforrester@deploy1001 Started scap: Re-start full scap to push out wmf.41 and switch testwikis to it T256669 [07:50:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:50:31] T256669: 1.35.0-wmf.41 deployment blockers - https://phabricator.wikimedia.org/T256669 [07:50:43] _joe_: Well, maybe wait half an hour for this scap to run? [07:50:58] <_joe_> James_F: no I mean even now, given my revert [07:51:19] Yeah, I guess "my" scap is running with a copy of the code at this point so it won't disrupt? [07:51:29] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [07:51:54] <_joe_> James_F: so basically I reverted the change that caused mediawiki to redirect to https if it got XFP: http [07:52:07] PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - apaches_80: Servers mw1275.eqiad.wmnet, mw1371.eqiad.wmnet, mw1365.eqiad.wmnet, mw1367.eqiad.wmnet, mw1267.eqiad.wmnet, mw1322.eqiad.wmnet, mw1355.eqiad.wmnet, mw1323.eqiad.wmnet, mw1384.eqiad.wmnet, mw1327.eqiad.wmnet, mw1387.eqiad.wmnet, mw1354.eqiad.wmnet, mw1351.eqiad.wmnet, mw1270.eqiad.wmnet, mw1405.eqiad.wmnet, mw1329.eqiad.wmnet, mw1 [07:52:07] mw1352.eqiad.wmnet, mw1413.eqiad.wmnet, mw1326.eqiad.wmnet, mw1333.eqiad.wmnet, mw1393.eqiad.wmnet, mw1366.eqiad.wmnet, mw1324.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, mw1269.eqiad.wmnet, mw1321.eqiad.wmnet, mw1401.eqiad.wmnet, mw1403.eqiad.wmnet, mw1325.eqiad.wmnet, mw1274.eqiad.wmnet, 
mw1409.eqiad.wmnet, mw1411.eqiad.wmnet, mw1369.eqiad.wmnet, mw1328.eqiad.wmnet, mw1353.eqiad.wmnet, mw1368.eqiad.wmnet, mw1373.eqiad. [07:52:07] ad.wmnet, mw1332.eqiad.wmnet, mw1385.eqiad.wmnet, mw1330.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [07:52:19] <_joe_> uh [07:52:21] <_joe_> wat [07:52:26] <_joe_> James_F: ^^ [07:52:29] PROBLEM - PyBal backends health check on lvs2010 is CRITICAL: PYBAL CRITICAL - CRITICAL - apaches_80: Servers mw2255.codfw.wmnet, mw2233.codfw.wmnet, mw2313.codfw.wmnet, mw2225.codfw.wmnet, mw2271.codfw.wmnet, mw2301.codfw.wmnet, mw2256.codfw.wmnet, mw2227.codfw.wmnet, mw2197.codfw.wmnet, mw2371.codfw.wmnet, mw2238.codfw.wmnet, mw2196.codfw.wmnet, mw2312.codfw.wmnet, mw2353.codfw.wmnet, mw2325.codfw.wmnet, mw2190.codfw.wmnet, mw2 [07:52:29] mw2232.codfw.wmnet, mw2316.codfw.wmnet, mw2303.codfw.wmnet, mw2228.codfw.wmnet, mw2314.codfw.wmnet, mw2239.codfw.wmnet, mw2242.codfw.wmnet, mw2194.codfw.wmnet, mw2275.codfw.wmnet, mw2257.codfw.wmnet, mw2269.codfw.wmnet, mw2199.codfw.wmnet, mw2361.codfw.wmnet, mw2315.codfw.wmnet, mw2191.codfw.wmnet, mw2270.codfw.wmnet, mw2230.codfw.wmnet, mw2272.codfw.wmnet, mw2241.codfw.wmnet, mw2351.codfw.wmnet, mw2274.codfw.wmnet, mw2277.codfw. [07:52:29] fw.wmnet, mw2329.codfw.wmnet, mw2311.codfw.wmnet, mw2237.codfw.wmnet, mw2307.codfw.wmnet, mw2268.codfw.wmnet, mw2226.codfw.wmnet, mw2273.codfw.wmnet, mw2276.codfw.wmnet, mw2331.codfw.wm https://wikitech.wikimedia.org/wiki/PyBal [07:52:36] <_joe_> revert please [07:52:40] !log jforrester@deploy1001 sync aborted: Re-start full scap to push out wmf.41 and switch testwikis to it T256669 (duration: 02m 14s) [07:52:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:52:44] Aborted. 
[07:52:53] PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - apaches_80: Servers mw1275.eqiad.wmnet, mw1371.eqiad.wmnet, mw1365.eqiad.wmnet, mw1367.eqiad.wmnet, mw1267.eqiad.wmnet, mw1322.eqiad.wmnet, mw1333.eqiad.wmnet, mw1323.eqiad.wmnet, mw1384.eqiad.wmnet, mw1327.eqiad.wmnet, mw1328.eqiad.wmnet, mw1413.eqiad.wmnet, mw1364.eqiad.wmnet, mw1354.eqiad.wmnet, mw1351.eqiad.wmnet, mw1270.eqiad.wmnet, mw1 [07:52:53] mw1329.eqiad.wmnet, mw1269.eqiad.wmnet, mw1352.eqiad.wmnet, mw1326.eqiad.wmnet, mw1355.eqiad.wmnet, mw1393.eqiad.wmnet, mw1366.eqiad.wmnet, mw1324.eqiad.wmnet, mw1273.eqiad.wmnet, mw1370.eqiad.wmnet, mw1320.eqiad.wmnet, mw1401.eqiad.wmnet, mw1403.eqiad.wmnet, mw1325.eqiad.wmnet, mw1274.eqiad.wmnet, mw1373.eqiad.wmnet, mw1411.eqiad.wmnet, mw1369.eqiad.wmnet, mw1387.eqiad.wmnet, mw1353.eqiad.wmnet, mw1368.eqiad.wmnet, mw1409.eqiad. [07:52:53] ad.wmnet, mw1332.eqiad.wmnet, mw1385.eqiad.wmnet, mw1330.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [07:52:53] But scap was in a local-only step. [07:52:54] <_joe_> what the heck is happening [07:53:02] No idea. 
[07:53:11] PROBLEM - PyBal backends health check on lvs2009 is CRITICAL: PYBAL CRITICAL - CRITICAL - apaches_80: Servers mw2255.codfw.wmnet, mw2233.codfw.wmnet, mw2313.codfw.wmnet, mw2271.codfw.wmnet, mw2301.codfw.wmnet, mw2256.codfw.wmnet, mw2227.codfw.wmnet, mw2197.codfw.wmnet, mw2371.codfw.wmnet, mw2274.codfw.wmnet, mw2196.codfw.wmnet, mw2312.codfw.wmnet, mw2353.codfw.wmnet, mw2325.codfw.wmnet, mw2190.codfw.wmnet, mw2310.codfw.wmnet, mw2 [07:53:11] mw2316.codfw.wmnet, mw2303.codfw.wmnet, mw2228.codfw.wmnet, mw2314.codfw.wmnet, mw2275.codfw.wmnet, mw2242.codfw.wmnet, mw2188.codfw.wmnet, mw2194.codfw.wmnet, mw2239.codfw.wmnet, mw2257.codfw.wmnet, mw2195.codfw.wmnet, mw2269.codfw.wmnet, mw2199.codfw.wmnet, mw2361.codfw.wmnet, mw2315.codfw.wmnet, mw2191.codfw.wmnet, mw2270.codfw.wmnet, mw2230.codfw.wmnet, mw2272.codfw.wmnet, mw2241.codfw.wmnet, mw2238.codfw.wmnet, mw2277.codfw. [07:53:11] fw.wmnet, mw2329.codfw.wmnet, mw2311.codfw.wmnet, mw2237.codfw.wmnet, mw2307.codfw.wmnet, mw2268.codfw.wmnet, mw2226.codfw.wmnet, mw2273.codfw.wmnet, mw2276.codfw.wmnet, mw2331.codfw.wm https://wikitech.wikimedia.org/wiki/PyBal [07:57:32] (03PS1) 10Alexandros Kosiaris: Revert "http_status 302 expected after I80ca62643f5c" [puppet] - 10https://gerrit.wikimedia.org/r/612487 [07:58:08] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Revert "http_status 302 expected after I80ca62643f5c" [puppet] - 10https://gerrit.wikimedia.org/r/612487 (owner: 10Alexandros Kosiaris) [08:00:17] !log restart pybal on lvs2010 after merging https://gerrit.wikimedia.org/r/612487 [08:00:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:00:32] <_joe_> !log restart pybal on lvs1015 [08:00:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:00:41] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable 
https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [08:01:11] !log akosiaris@cumin1001 conftool action : set/pooled=inactive; selector: name=restbase2009.codfw.wmnet [08:01:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:01:21] RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [08:01:43] RECOVERY - PyBal backends health check on lvs2010 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [08:01:43] RECOVERY - Check systemd state on sodium is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:02:54] !log restart pybal on lvs2007 [08:02:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:03:47] <_joe_> !log restart pybal on lvs1016 [08:03:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:03:57] RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [08:04:03] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:04:21] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [08:05:06] !log restart pybal on lvs2009 [08:05:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:05:53] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:06:07] RECOVERY - PyBal backends health check on lvs2009 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [08:07:50] (03PS1) 10Marostegui: mariadb: Promote db1103 to x1 master [puppet] - 10https://gerrit.wikimedia.org/r/612474 (https://phabricator.wikimedia.org/T254871) [08:08:27] RECOVERY - Ensure local MW versions match expected deployment on mw1349 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:08:55] (03CR) 10Marostegui: [C: 04-2] "Wait for the failover day" [puppet] - 10https://gerrit.wikimedia.org/r/612474 (https://phabricator.wikimedia.org/T254871) (owner: 10Marostegui) [08:09:36] (03PS1) 10Marostegui: wmnet: Update x1 alias [dns] - 10https://gerrit.wikimedia.org/r/612475 (https://phabricator.wikimedia.org/T254871) [08:09:53] (03CR) 10Marostegui: [C: 04-2] "Wait for the failover day" [dns] - 10https://gerrit.wikimedia.org/r/612475 (https://phabricator.wikimedia.org/T254871) (owner: 10Marostegui) [08:13:37] !log jforrester@deploy1001 Started scap: Re-re-start full scap to push out wmf.41 and switch testwikis to it T256669 [08:13:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:13:42] T256669: 1.35.0-wmf.41 deployment blockers - https://phabricator.wikimedia.org/T256669 [08:18:47] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:19:59] <_joe_> uh [08:23:47] <_joe_> a lot of Invariant failed: Bad UTF-8 at end of string (2 byte sequence) [08:23:50] <_joe_> from parsoid [08:24:37] <_joe_> James_F: 
what version are you syncing? .41? [08:24:56] <_joe_> yes [08:25:02] <_joe_> so it's all good, it can continue [08:25:46] Yeah. [08:26:02] wmf.41 isn't pointed at by the production manifest yet. [08:26:07] So nothing will call it. [08:26:11] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:29:21] RECOVERY - Ensure local MW versions match expected deployment on mw2289 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:29:53] RECOVERY - Ensure local MW versions match expected deployment on mw2215 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:30:19] RECOVERY - Ensure local MW versions match expected deployment on mw1319 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:30:36] !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime [08:30:38] !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [08:30:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:30:41] 10Operations, 10ops-eqiad: upgrade memory in ganeti100[5-8].eqiad.wmnet - https://phabricator.wikimedia.org/T244530 (10ops-monitoring-bot) Icinga downtime for 2 days, 0:00:00 set by akosiaris@cumin1001 on 1 host(s) and their services with reason: Memory upgrade ` etcd1002.eqiad.wmnet ` [08:30:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:30:44] !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime [08:30:45] !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [08:30:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:30:49] 10Operations, 10ops-eqiad: upgrade memory in 
ganeti100[5-8].eqiad.wmnet - https://phabricator.wikimedia.org/T244530 (10ops-monitoring-bot) Icinga downtime for 2 days, 0:00:00 set by akosiaris@cumin1001 on 1 host(s) and their services with reason: Memory upgrade ` kubetcd1005.eqiad.wmnet ` [08:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:30:51] RECOVERY - Ensure local MW versions match expected deployment on mw1313 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:30:52] !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime [08:30:53] !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [08:30:55] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1005-cloudelastic-chi-eqiad on cloudelastic1005 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1005&panelId=37 [08:30:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:30:58] 10Operations, 10ops-eqiad: upgrade memory in ganeti100[5-8].eqiad.wmnet - https://phabricator.wikimedia.org/T244530 (10ops-monitoring-bot) Icinga downtime for 2 days, 0:00:00 set by akosiaris@cumin1001 on 1 host(s) and their services with reason: Memory upgrade ` ganeti1008.eqiad.wmnet ` [08:30:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:31:17] RECOVERY - Ensure local MW versions match expected deployment on mw1285 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:31:25] RECOVERY - Ensure local MW versions match expected deployment on mw2188 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:31:47] RECOVERY - Ensure local MW versions match expected deployment on mw2254 is OK: OKAY: wikiversions 
in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:32:35] 10Operations, 10ops-eqiad: upgrade memory in ganeti100[5-8].eqiad.wmnet - https://phabricator.wikimedia.org/T244530 (10akosiaris) [08:32:49] RECOVERY - Ensure local MW versions match expected deployment on mw1366 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:33:07] 10Operations, 10ops-eqiad: upgrade memory in ganeti100[5-8].eqiad.wmnet - https://phabricator.wikimedia.org/T244530 (10akosiaris) @Jclark-ctr ganeti1008 went faster than expected and it is ready for the memory upgrade. Downtimed and powered off. [08:33:33] RECOVERY - Ensure local MW versions match expected deployment on mw1268 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:33:43] RECOVERY - Ensure local MW versions match expected deployment on mw2309 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:33:43] RECOVERY - Ensure local MW versions match expected deployment on mw2360 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:34:03] RECOVERY - Ensure local MW versions match expected deployment on wtp2005 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:34:03] RECOVERY - Ensure local MW versions match expected deployment on wtp2007 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:34:11] RECOVERY - Ensure local MW versions match expected deployment on mw2312 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:34:11] RECOVERY - Ensure local MW versions match expected deployment on mw2189 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:34:11] RECOVERY - Ensure local MW versions match expected deployment on mw2192 is OK: OKAY: wikiversions in sync
https://wikitech.wikimedia.org/wiki/Application_servers [08:34:59] RECOVERY - Ensure local MW versions match expected deployment on mw1390 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:03] RECOVERY - Ensure local MW versions match expected deployment on mw1271 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:05] RECOVERY - Ensure local MW versions match expected deployment on mw1320 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:17] RECOVERY - Ensure local MW versions match expected deployment on mw1376 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:17] RECOVERY - Ensure local MW versions match expected deployment on mw1373 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:17] RECOVERY - Ensure local MW versions match expected deployment on mw2297 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:17] RECOVERY - Ensure local MW versions match expected deployment on wtp2017 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:17] RECOVERY - Ensure local MW versions match expected deployment on wtp2015 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:21] RECOVERY - Ensure local MW versions match expected deployment on mw2357 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:21] RECOVERY - Ensure local MW versions match expected deployment on mw2314 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:21] RECOVERY - Ensure local MW versions match expected deployment on mw2243 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:35] RECOVERY - Ensure local MW 
versions match expected deployment on mw2327 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:41] RECOVERY - Ensure local MW versions match expected deployment on mw1288 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:43] RECOVERY - Ensure local MW versions match expected deployment on mw2252 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:35:43] RECOVERY - Ensure local MW versions match expected deployment on mw2138 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:03] RECOVERY - Ensure local MW versions match expected deployment on mw1365 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:03] RECOVERY - Ensure local MW versions match expected deployment on mw1287 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:07] RECOVERY - Ensure local MW versions match expected deployment on mw2375 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:07] RECOVERY - Ensure local MW versions match expected deployment on mw2364 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:07] RECOVERY - Ensure local MW versions match expected deployment on mw2308 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:13] RECOVERY - Ensure local MW versions match expected deployment on mw1413 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:13] RECOVERY - Ensure local MW versions match expected deployment on mw1336 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:13] RECOVERY - Ensure local MW versions match expected deployment on mw2338 is OK: OKAY: wikiversions in sync 
https://wikitech.wikimedia.org/wiki/Application_servers [08:36:13] RECOVERY - Ensure local MW versions match expected deployment on mw2298 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:13] RECOVERY - Ensure local MW versions match expected deployment on mw2269 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:23] RECOVERY - Ensure local MW versions match expected deployment on mw2376 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:35] RECOVERY - Ensure local MW versions match expected deployment on wtp1038 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:39] RECOVERY - Ensure local MW versions match expected deployment on mw2336 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:41] RECOVERY - Ensure local MW versions match expected deployment on wtp1028 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:47] RECOVERY - Ensure local MW versions match expected deployment on mw1329 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:55] RECOVERY - Ensure local MW versions match expected deployment on mw2337 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:36:55] RECOVERY - Ensure local MW versions match expected deployment on mw2281 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:03] RECOVERY - Ensure local MW versions match expected deployment on mw1379 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:03] RECOVERY - Ensure local MW versions match expected deployment on mw1360 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:09] RECOVERY - Ensure local MW 
versions match expected deployment on mw1334 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:09] RECOVERY - Ensure local MW versions match expected deployment on mw1346 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:09] RECOVERY - Ensure local MW versions match expected deployment on snapshot1006 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:09] RECOVERY - Ensure local MW versions match expected deployment on mw2374 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:11] RECOVERY - Ensure local MW versions match expected deployment on mw2272 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:15] RECOVERY - Ensure local MW versions match expected deployment on mw2210 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:15] RECOVERY - Ensure local MW versions match expected deployment on mw2191 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:37] RECOVERY - Ensure local MW versions match expected deployment on mw2262 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:41] RECOVERY - Ensure local MW versions match expected deployment on mw1314 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:43] RECOVERY - Ensure local MW versions match expected deployment on wtp1048 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:47] RECOVERY - Ensure local MW versions match expected deployment on mw1396 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:55] RECOVERY - Ensure local MW versions match expected deployment on snapshot1010 is OK: OKAY: wikiversions in sync 
https://wikitech.wikimedia.org/wiki/Application_servers [08:37:55] RECOVERY - Ensure local MW versions match expected deployment on mw2334 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:57] RECOVERY - Ensure local MW versions match expected deployment on mw1315 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:57] RECOVERY - Ensure local MW versions match expected deployment on mw1372 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:57] RECOVERY - Ensure local MW versions match expected deployment on mw2315 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:37:57] RECOVERY - Ensure local MW versions match expected deployment on mw2285 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:05] RECOVERY - Ensure local MW versions match expected deployment on wtp1043 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:05] RECOVERY - Ensure local MW versions match expected deployment on snapshot1009 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:05] RECOVERY - Ensure local MW versions match expected deployment on mw1317 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:05] RECOVERY - Ensure local MW versions match expected deployment on wtp1029 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:09] RECOVERY - Ensure local MW versions match expected deployment on mw2306 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:09] RECOVERY - Ensure local MW versions match expected deployment on wtp2009 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:09] RECOVERY - Ensure 
local MW versions match expected deployment on wtp2016 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:25] RECOVERY - Ensure local MW versions match expected deployment on mw2250 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:27] RECOVERY - Ensure local MW versions match expected deployment on mw2225 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:27] RECOVERY - Ensure local MW versions match expected deployment on mw2197 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:30] (03PS1) 10Muehlenhoff: profile::piwik::webserver: Remove support for stretch [puppet] - 10https://gerrit.wikimedia.org/r/612506 [08:38:45] RECOVERY - Ensure local MW versions match expected deployment on mw2287 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:49] RECOVERY - Ensure local MW versions match expected deployment on mw1358 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:51] RECOVERY - Ensure local MW versions match expected deployment on mw2356 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:53] RECOVERY - Ensure local MW versions match expected deployment on mw2326 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:53] RECOVERY - Ensure local MW versions match expected deployment on mw2205 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:38:59] RECOVERY - Ensure local MW versions match expected deployment on mw1307 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:39:03] RECOVERY - Ensure local MW versions match expected deployment on mw1357 is OK: OKAY: wikiversions in sync 
https://wikitech.wikimedia.org/wiki/Application_servers [08:39:09] (03PS1) 10Jbond: prometheus::memcached_exporter: fix arguments hiera call [puppet] - 10https://gerrit.wikimedia.org/r/612507 [08:39:11] RECOVERY - Ensure local MW versions match expected deployment on mw2231 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:39:11] RECOVERY - Ensure local MW versions match expected deployment on mw2264 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:39:25] RECOVERY - Ensure local MW versions match expected deployment on mw1310 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:39:25] RECOVERY - Ensure local MW versions match expected deployment on mw1284 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:39:26] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/612507 (owner: 10Jbond) [08:39:33] PROBLEM - k8s API server requests latencies on argon is CRITICAL: instance=10.64.32.133 verb=PATCH https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api [08:39:33] RECOVERY - Ensure local MW versions match expected deployment on mw2302 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:39:35] RECOVERY - Ensure local MW versions match expected deployment on mw1304 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:39:53] RECOVERY - Ensure local MW versions match expected deployment on wtp2012 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:39:57] RECOVERY - Ensure local MW versions match expected deployment on mw1355 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:40:01] RECOVERY - Ensure local MW versions match expected deployment on mw1401 is OK: 
OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:40:01] RECOVERY - Ensure local MW versions match expected deployment on mw1374 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:40:01] RECOVERY - Ensure local MW versions match expected deployment on mw2335 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:40:01] RECOVERY - Ensure local MW versions match expected deployment on mw2228 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:40:07] RECOVERY - Ensure local MW versions match expected deployment on mw2229 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:40:07] RECOVERY - Ensure local MW versions match expected deployment on mw2142 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:40:09] PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133 operation=get https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api [08:40:25] RECOVERY - Ensure local MW versions match expected deployment on mw2313 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:40:33] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1135 for PDU upgrade T257871', diff saved to https://phabricator.wikimedia.org/P11895 and previous config saved to /var/cache/conftool/dbconfig/20200714-084033-marostegui.json [08:40:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:40:38] T257871: eqiad: PDU Upgrade in C8 (July 14, 2pm-4pm UTC)) - https://phabricator.wikimedia.org/T257871 [08:40:44] the etcd request latencies alert is probably etcd1002 being shut down [08:40:51] RECOVERY - Ensure local MW versions match expected deployment on mw1407 is OK: OKAY: wikiversions in sync
https://wikitech.wikimedia.org/wiki/Application_servers [08:40:51] RECOVERY - Ensure local MW versions match expected deployment on wtp1044 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:40:51] RECOVERY - Ensure local MW versions match expected deployment on mw1323 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:07] RECOVERY - Ensure local MW versions match expected deployment on mw1391 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:07] but etcd1001 and etcd1003 are up and running, so no harm done [08:41:07] RECOVERY - Ensure local MW versions match expected deployment on mw2145 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:23] RECOVERY - Ensure local MW versions match expected deployment on mw1347 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:25] RECOVERY - Ensure local MW versions match expected deployment on wtp1030 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:31] RECOVERY - Ensure local MW versions match expected deployment on mw1295 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:33] RECOVERY - Ensure local MW versions match expected deployment on mw2351 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:33] RECOVERY - Ensure local MW versions match expected deployment on mw2224 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:33] RECOVERY - Ensure local MW versions match expected deployment on wtp2001 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:35] RECOVERY - Ensure local MW versions match expected deployment on mw2291 is OK: OKAY: wikiversions in sync
https://wikitech.wikimedia.org/wiki/Application_servers [08:41:43] RECOVERY - Ensure local MW versions match expected deployment on mw2245 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:45] RECOVERY - Ensure local MW versions match expected deployment on mw1348 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:45] RECOVERY - Ensure local MW versions match expected deployment on mw1356 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:55] RECOVERY - Ensure local MW versions match expected deployment on mw1332 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:55] RECOVERY - Ensure local MW versions match expected deployment on mw2299 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:55] RECOVERY - Ensure local MW versions match expected deployment on mw2355 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:41:59] RECOVERY - Ensure local MW versions match expected deployment on mw1377 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:01] RECOVERY - Ensure local MW versions match expected deployment on mw2267 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:01] RECOVERY - Ensure local MW versions match expected deployment on mw2288 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:01] RECOVERY - Ensure local MW versions match expected deployment on mw2223 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:03] RECOVERY - Ensure local MW versions match expected deployment on mw2251 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:03] RECOVERY - Ensure local MW 
versions match expected deployment on mw2246 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:03] RECOVERY - Ensure local MW versions match expected deployment on mw2195 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:05] RECOVERY - Ensure local MW versions match expected deployment on mw2293 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:07] RECOVERY - Ensure local MW versions match expected deployment on mw2230 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:15] RECOVERY - Ensure local MW versions match expected deployment on mw1406 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:17] RECOVERY - Ensure local MW versions match expected deployment on mw2304 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:27] RECOVERY - Ensure local MW versions match expected deployment on wtp1037 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:33] RECOVERY - Ensure local MW versions match expected deployment on mw1350 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:38] GET /w/index.php?format=json&title=Special:EntityData&id=Q56787564&revision=961513465 HTTP/1.1 [08:42:39] RECOVERY - Ensure local MW versions match expected deployment on mw1308 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:41] RECOVERY - Ensure local MW versions match expected deployment on mw1384 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:46] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/612507 (owner: 10Jbond) [08:42:47] RECOVERY - Ensure local MW versions match expected deployment on mw2330 
is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:47] RECOVERY - Ensure local MW versions match expected deployment on mw2279 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:42:57] RECOVERY - Ensure local MW versions match expected deployment on mw1359 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:01] RECOVERY - Ensure local MW versions match expected deployment on mw1293 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:01] RECOVERY - Ensure local MW versions match expected deployment on mw1337 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:01] RECOVERY - Ensure local MW versions match expected deployment on mw1286 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:03] RECOVERY - Ensure local MW versions match expected deployment on mw2290 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:07] RECOVERY - Ensure local MW versions match expected deployment on mw2286 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:07] RECOVERY - Ensure local MW versions match expected deployment on mw2222 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:07] RECOVERY - Ensure local MW versions match expected deployment on mw2220 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:09] RECOVERY - Ensure local MW versions match expected deployment on mw2270 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:13] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1006-cloudelastic-chi-eqiad on cloudelastic1006 is CRITICAL: 100.7 gt 100 
https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1006&panelId=37 [08:43:21] RECOVERY - Ensure local MW versions match expected deployment on mw2366 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:25] RECOVERY - Ensure local MW versions match expected deployment on mw2141 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:29] RECOVERY - Ensure local MW versions match expected deployment on mw2146 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:29] RECOVERY - Ensure local MW versions match expected deployment on mw2203 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:29] RECOVERY - Ensure local MW versions match expected deployment on mw2201 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:33] RECOVERY - Ensure local MW versions match expected deployment on mw2277 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:47] RECOVERY - Ensure local MW versions match expected deployment on mw1309 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:49] RECOVERY - Ensure local MW versions match expected deployment on mw1411 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:49] RECOVERY - Ensure local MW versions match expected deployment on mw1353 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:51] RECOVERY - Ensure local MW versions match expected deployment on mw2202 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:51] 
RECOVERY - Ensure local MW versions match expected deployment on wtp2019 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:57] RECOVERY - Ensure local MW versions match expected deployment on mw1369 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:43:57] RECOVERY - Ensure local MW versions match expected deployment on mw1339 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:44:01] RECOVERY - Ensure local MW versions match expected deployment on mw2331 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:44:01] RECOVERY - Ensure local MW versions match expected deployment on mw2237 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:44:01] RECOVERY - Ensure local MW versions match expected deployment on mw2212 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:44:01] RECOVERY - Ensure local MW versions match expected deployment on wtp2010 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:44:11] RECOVERY - Ensure local MW versions match expected deployment on mw1303 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:44:13] Sorry for all the noise. 
:-( [08:44:17] RECOVERY - Ensure local MW versions match expected deployment on mw1301 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:44:18] (03CR) 10Elukey: [C: 03+2] profile::piwik::webserver: Remove support for stretch [puppet] - 10https://gerrit.wikimedia.org/r/612506 (owner: 10Muehlenhoff) [08:44:25] RECOVERY - Ensure local MW versions match expected deployment on wtp1025 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:44:43] RECOVERY - Ensure local MW versions match expected deployment on mw2363 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:01] RECOVERY - Ensure local MW versions match expected deployment on wtp2013 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:03] RECOVERY - Ensure local MW versions match expected deployment on mw1370 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:03] RECOVERY - Ensure local MW versions match expected deployment on mw2244 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:05] RECOVERY - Ensure local MW versions match expected deployment on mw2255 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:15] RECOVERY - Ensure local MW versions match expected deployment on mw1282 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:21] RECOVERY - Ensure local MW versions match expected deployment on mw2320 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:23] RECOVERY - Ensure local MW versions match expected deployment on mw2217 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:23] RECOVERY - Ensure local MW versions match expected deployment on mw2227 is OK: OKAY: 
wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:25] RECOVERY - Ensure local MW versions match expected deployment on mw2322 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:25] RECOVERY - Ensure local MW versions match expected deployment on mw2370 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:25] RECOVERY - Ensure local MW versions match expected deployment on mw2339 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:25] RECOVERY - Ensure local MW versions match expected deployment on mw2307 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:33] RECOVERY - Ensure local MW versions match expected deployment on mw1394 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:37] RECOVERY - Ensure local MW versions match expected deployment on mw2354 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:37] RECOVERY - Ensure local MW versions match expected deployment on mw2301 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:43] RECOVERY - Ensure local MW versions match expected deployment on mw1408 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:43] RECOVERY - Ensure local MW versions match expected deployment on wtp1045 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:45] RECOVERY - Ensure local MW versions match expected deployment on mw1296 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:53] RECOVERY - Ensure local MW versions match expected deployment on mw1312 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:53] 
RECOVERY - Ensure local MW versions match expected deployment on mw1333 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:53] RECOVERY - Ensure local MW versions match expected deployment on mw1345 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:53] RECOVERY - Ensure local MW versions match expected deployment on mw1272 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:53] RECOVERY - Ensure local MW versions match expected deployment on mw2316 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:53] RECOVERY - Ensure local MW versions match expected deployment on mw2305 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:45:53] RECOVERY - Ensure local MW versions match expected deployment on mw2261 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:46:03] RECOVERY - Ensure local MW versions match expected deployment on wtp1046 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:46:03] RECOVERY - Ensure local MW versions match expected deployment on mw1281 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:46:17] RECOVERY - Ensure local MW versions match expected deployment on mw1342 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:46:43] RECOVERY - Ensure local MW versions match expected deployment on mw2226 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:46:45] RECOVERY - Ensure local MW versions match expected deployment on mw1298 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:46:47] RECOVERY - Ensure local MW versions match expected deployment on mw2233 is OK: OKAY: 
wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:46:51] RECOVERY - Ensure local MW versions match expected deployment on mw2268 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:46:57] (03PS1) 10Jbond: role: noop style change [puppet] - 10https://gerrit.wikimedia.org/r/612510 [08:46:59] RECOVERY - Ensure local MW versions match expected deployment on mw2214 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:46:59] RECOVERY - Ensure local MW versions match expected deployment on wtp2002 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:47:15] RECOVERY - Ensure local MW versions match expected deployment on mw1361 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:47:15] RECOVERY - Ensure local MW versions match expected deployment on wtp1034 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:47:27] RECOVERY - Ensure local MW versions match expected deployment on mw1371 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:47:35] RECOVERY - Ensure local MW versions match expected deployment on mw1388 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:47:37] RECOVERY - Ensure local MW versions match expected deployment on mw1382 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:47:51] RECOVERY - Ensure local MW versions match expected deployment on mw1352 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:47:55] RECOVERY - Ensure local MW versions match expected deployment on mw2144 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:47:57] RECOVERY - Ensure local MW versions match expected deployment on mw2236 is 
OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:07] RECOVERY - Ensure local MW versions match expected deployment on mw2198 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:23] RECOVERY - Ensure local MW versions match expected deployment on wtp1026 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:29] RECOVERY - Ensure local MW versions match expected deployment on mw1364 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:36] (03CR) 10Jbond: [C: 03+2] role: noop style change [puppet] - 10https://gerrit.wikimedia.org/r/612510 (owner: 10Jbond) [08:48:37] RECOVERY - Ensure local MW versions match expected deployment on mw1404 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:39] RECOVERY - Ensure local MW versions match expected deployment on mw2300 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:47] RECOVERY - Ensure local MW versions match expected deployment on wtp1031 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:47] RECOVERY - Ensure local MW versions match expected deployment on wtp1032 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:51] RECOVERY - Ensure local MW versions match expected deployment on mw1306 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:51] RECOVERY - Ensure local MW versions match expected deployment on wtp1033 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:51] RECOVERY - Ensure local MW versions match expected deployment on mw1331 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:53] RECOVERY - Ensure local MW versions 
match expected deployment on mw2371 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:53] RECOVERY - Ensure local MW versions match expected deployment on mw2365 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:55] RECOVERY - Ensure local MW versions match expected deployment on mw2256 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:48:55] RECOVERY - Ensure local MW versions match expected deployment on mw2276 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:01] RECOVERY - Ensure local MW versions match expected deployment on mw2240 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:15] RECOVERY - Ensure local MW versions match expected deployment on mw1367 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:19] RECOVERY - Ensure local MW versions match expected deployment on mw1381 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:19] RECOVERY - Ensure local MW versions match expected deployment on mw1380 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:19] RECOVERY - Ensure local MW versions match expected deployment on mw2196 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:21] RECOVERY - Ensure local MW versions match expected deployment on mw2200 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:25] RECOVERY - Ensure local MW versions match expected deployment on mw2257 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:27] RECOVERY - Ensure local MW versions match expected deployment on mw2294 is OK: OKAY: wikiversions in sync 
https://wikitech.wikimedia.org/wiki/Application_servers [08:49:27] RECOVERY - Ensure local MW versions match expected deployment on mw2232 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:39] RECOVERY - Ensure local MW versions match expected deployment on mw1294 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:47] RECOVERY - Ensure local MW versions match expected deployment on mw1283 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:49] RECOVERY - Ensure local MW versions match expected deployment on mw1340 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:53] RECOVERY - Ensure local MW versions match expected deployment on mw2332 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:53] RECOVERY - Ensure local MW versions match expected deployment on mw2361 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:53] RECOVERY - Ensure local MW versions match expected deployment on mw2353 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:53] RECOVERY - Ensure local MW versions match expected deployment on mw2321 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:53] RECOVERY - Ensure local MW versions match expected deployment on mw2259 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:53] RECOVERY - Ensure local MW versions match expected deployment on mw2265 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:53] RECOVERY - Ensure local MW versions match expected deployment on wtp2006 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:49:57] RECOVERY - Ensure local MW 
versions match expected deployment on mw2147 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:50:07] RECOVERY - Ensure local MW versions match expected deployment on mw1389 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:50:09] RECOVERY - Ensure local MW versions match expected deployment on mw1267 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:50:11] RECOVERY - Ensure local MW versions match expected deployment on mw1326 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:50:11] RECOVERY - Ensure local MW versions match expected deployment on mw2303 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:50:11] RECOVERY - Ensure local MW versions match expected deployment on mw2319 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:50:11] RECOVERY - Ensure local MW versions match expected deployment on mw2333 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:50:11] RECOVERY - Ensure local MW versions match expected deployment on mw2242 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:50:25] RECOVERY - Ensure local MW versions match expected deployment on mw1324 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:50:27] RECOVERY - Ensure local MW versions match expected deployment on mw1386 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:50:27] RECOVERY - Ensure local MW versions match expected deployment on mw2235 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:50:47] RECOVERY - Ensure local MW versions match expected deployment on mw1378 is OK: OKAY: wikiversions in sync 
https://wikitech.wikimedia.org/wiki/Application_servers [08:50:55] RECOVERY - Ensure local MW versions match expected deployment on mw2135 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:50:55] RECOVERY - Ensure local MW versions match expected deployment on mw2143 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:03] RECOVERY - Ensure local MW versions match expected deployment on mw1344 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:07] RECOVERY - Ensure local MW versions match expected deployment on mw1266 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:07] RECOVERY - Ensure local MW versions match expected deployment on mw1269 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:07] RECOVERY - Ensure local MW versions match expected deployment on mw2328 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:09] RECOVERY - Ensure local MW versions match expected deployment on mw2207 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:15] RECOVERY - Ensure local MW versions match expected deployment on mw2241 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:17] RECOVERY - Ensure local MW versions match expected deployment on mw2295 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:17] RECOVERY - Ensure local MW versions match expected deployment on mw2318 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:17] RECOVERY - Ensure local MW versions match expected deployment on mw2323 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:21] (03CR) 10Elukey: "Nice catch! 
One qs - from what I can see in puppet, profile::prometheus::memcached_exporter::arguments is defined only for some memcached " [puppet] - 10https://gerrit.wikimedia.org/r/612507 (owner: 10Jbond) [08:51:25] RECOVERY - Ensure local MW versions match expected deployment on mw2206 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:35] RECOVERY - Ensure local MW versions match expected deployment on wtp1042 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:35] RECOVERY - Ensure local MW versions match expected deployment on mw2283 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:45] RECOVERY - Ensure local MW versions match expected deployment on mw1270 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:45] RECOVERY - Ensure local MW versions match expected deployment on mw1299 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:45] RECOVERY - Ensure local MW versions match expected deployment on mw1273 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:45] RECOVERY - Ensure local MW versions match expected deployment on mw2352 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:45] RECOVERY - Ensure local MW versions match expected deployment on mw2284 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:51:45] RECOVERY - Ensure local MW versions match expected deployment on mw2234 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:52:07] RECOVERY - Ensure local MW versions match expected deployment on wtp1047 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:52:07] RECOVERY - Ensure local MW versions match expected 
deployment on mw1305 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:52:07] RECOVERY - Ensure local MW versions match expected deployment on wtp2020 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:05] RECOVERY - Ensure local MW versions match expected deployment on mw2358 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:11] RECOVERY - Ensure local MW versions match expected deployment on mw1274 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:19] RECOVERY - Ensure local MW versions match expected deployment on mw2368 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:25] RECOVERY - Ensure local MW versions match expected deployment on mw1387 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:27] RECOVERY - Ensure local MW versions match expected deployment on mw2282 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:37] RECOVERY - Ensure local MW versions match expected deployment on mw1316 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:37] RECOVERY - Ensure local MW versions match expected deployment on mw1327 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:39] RECOVERY - Ensure local MW versions match expected deployment on mw2278 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:41] RECOVERY - Ensure local MW versions match expected deployment on mw1399 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:41] RECOVERY - Ensure local MW versions match expected deployment on mw2193 is OK: OKAY: wikiversions in sync 
https://wikitech.wikimedia.org/wiki/Application_servers [08:53:41] RECOVERY - Ensure local MW versions match expected deployment on mw2187 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:43] (03PS1) 10Legoktm: Add REL1_35 to ExtensionDistributor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612511 [08:53:45] RECOVERY - Ensure local MW versions match expected deployment on mw2248 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:45] RECOVERY - Ensure local MW versions match expected deployment on mw2280 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:45] RECOVERY - Ensure local MW versions match expected deployment on mw2199 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:53:53] RECOVERY - Ensure local MW versions match expected deployment on wtp1039 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:09] RECOVERY - Ensure local MW versions match expected deployment on mwmaint1002 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:11] RECOVERY - Ensure local MW versions match expected deployment on mw1321 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:11] RECOVERY - Ensure local MW versions match expected deployment on mw1311 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:21] RECOVERY - Ensure local MW versions match expected deployment on mw1398 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:27] RECOVERY - Ensure local MW versions match expected deployment on mw2249 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:33] RECOVERY - Ensure local MW versions match expected deployment on 
scandium is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:35] RECOVERY - Ensure local MW versions match expected deployment on mw1395 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:43] RECOVERY - Ensure local MW versions match expected deployment on mw1330 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:45] RECOVERY - Ensure local MW versions match expected deployment on mw2369 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:47] RECOVERY - Ensure local MW versions match expected deployment on mw2373 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:47] 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: PDU Upgrade in C8 (July 14, 2pm-4pm UTC)) - https://phabricator.wikimedia.org/T257871 (10Marostegui) db1135 has been depooled, just in case. [08:54:51] RECOVERY - Ensure local MW versions match expected deployment on mw2194 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:59] RECOVERY - Ensure local MW versions match expected deployment on labweb1001 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:54:59] RECOVERY - Ensure local MW versions match expected deployment on mw1403 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:03] RECOVERY - Ensure local MW versions match expected deployment on mw2367 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:03] RECOVERY - Ensure local MW versions match expected deployment on mwmaint2001 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:09] RECOVERY - Ensure local MW versions match expected deployment on snapshot1008 is OK: OKAY: wikiversions in sync 
https://wikitech.wikimedia.org/wiki/Application_servers [08:55:11] RECOVERY - Ensure local MW versions match expected deployment on mwdebug2001 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:11] RECOVERY - Ensure local MW versions match expected deployment on mw2140 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:15] RECOVERY - Ensure local MW versions match expected deployment on mw1325 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:15] RECOVERY - Ensure local MW versions match expected deployment on mw1289 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:17] RECOVERY - Ensure local MW versions match expected deployment on snapshot1007 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:17] RECOVERY - Ensure local MW versions match expected deployment on mw2350 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:23] RECOVERY - Ensure local MW versions match expected deployment on mw1397 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:29] RECOVERY - Ensure local MW versions match expected deployment on mw2275 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:31] RECOVERY - Ensure local MW versions match expected deployment on mw1393 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:41] RECOVERY - Ensure local MW versions match expected deployment on labweb1002 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:41] RECOVERY - Ensure local MW versions match expected deployment on mwdebug1002 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:41] RECOVERY 
- Ensure local MW versions match expected deployment on mw1335 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:55:45] RECOVERY - Ensure local MW versions match expected deployment on wtp2003 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:56:03] RECOVERY - Ensure local MW versions match expected deployment on mw2359 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:56:07] RECOVERY - Ensure local MW versions match expected deployment on wtp2008 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:56:15] RECOVERY - Ensure local MW versions match expected deployment on mw1318 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:56:15] RECOVERY - Ensure local MW versions match expected deployment on mw1322 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:56:19] RECOVERY - Ensure local MW versions match expected deployment on mw1405 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:56:25] RECOVERY - Ensure local MW versions match expected deployment on mw2324 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:56:25] RECOVERY - Ensure local MW versions match expected deployment on mw2263 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:56:43] RECOVERY - Ensure local MW versions match expected deployment on mw1341 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:56:43] RECOVERY - Ensure local MW versions match expected deployment on mw1343 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:56:43] RECOVERY - Ensure local MW versions match expected deployment on mw1383 is OK: OKAY: wikiversions in 
sync https://wikitech.wikimedia.org/wiki/Application_servers [08:56:45] RECOVERY - Ensure local MW versions match expected deployment on mw2216 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:56:53] PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133 operation=get https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api [08:56:57] RECOVERY - Ensure local MW versions match expected deployment on wtp1036 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:57:03] RECOVERY - Ensure local MW versions match expected deployment on mw2271 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:57:15] RECOVERY - Ensure local MW versions match expected deployment on mw2190 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:57:27] RECOVERY - Ensure local MW versions match expected deployment on wtp2018 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:57:33] RECOVERY - Ensure local MW versions match expected deployment on mw1392 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:57:37] RECOVERY - Ensure local MW versions match expected deployment on mw1375 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:57:55] RECOVERY - Ensure local MW versions match expected deployment on mw1385 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:57:55] RECOVERY - Ensure local MW versions match expected deployment on mw1302 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:58:25] RECOVERY - Ensure local MW versions match expected deployment on mw2372 is OK: OKAY: wikiversions in sync 
https://wikitech.wikimedia.org/wiki/Application_servers [08:58:39] RECOVERY - Ensure local MW versions match expected deployment on wtp2011 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:58:39] RECOVERY - Ensure local MW versions match expected deployment on wtp2004 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:58:55] RECOVERY - Ensure local MW versions match expected deployment on wtp1035 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:58:55] RECOVERY - Ensure local MW versions match expected deployment on mw2273 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:58:55] RECOVERY - Ensure local MW versions match expected deployment on mw2253 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:58:57] RECOVERY - Ensure local MW versions match expected deployment on mw1400 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:58:57] RECOVERY - Ensure local MW versions match expected deployment on wtp1027 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:01] RECOVERY - Ensure local MW versions match expected deployment on mw1412 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:03] RECOVERY - Ensure local MW versions match expected deployment on mw2260 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:09] RECOVERY - Ensure local MW versions match expected deployment on mw2247 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:21] RECOVERY - Ensure local MW versions match expected deployment on mw1297 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:23] RECOVERY - Ensure local 
MW versions match expected deployment on mw2139 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:27] RECOVERY - Ensure local MW versions match expected deployment on mw1409 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:27] RECOVERY - Ensure local MW versions match expected deployment on mw2274 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:31] RECOVERY - Ensure local MW versions match expected deployment on mw2239 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:33] RECOVERY - Ensure local MW versions match expected deployment on mw1368 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:33] RECOVERY - Ensure local MW versions match expected deployment on wtp1041 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:33] RECOVERY - Ensure local MW versions match expected deployment on mw1300 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:35] RECOVERY - Ensure local MW versions match expected deployment on mw2292 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [08:59:37] (03PS1) 10Muehlenhoff: Switch matomo to CAS (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/612512 [09:00:01] RECOVERY - Ensure local MW versions match expected deployment on mw1290 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:00:19] RECOVERY - Ensure local MW versions match expected deployment on mw2317 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:00:29] RECOVERY - Ensure local MW versions match expected deployment on mw1354 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:00:35] 
RECOVERY - Ensure local MW versions match expected deployment on mw1338 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:00:35] RECOVERY - Ensure local MW versions match expected deployment on mw2362 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:00:37] RECOVERY - Ensure local MW versions match expected deployment on mw2329 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:00:37] RECOVERY - Ensure local MW versions match expected deployment on mwdebug2002 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:00:37] RECOVERY - Ensure local MW versions match expected deployment on mw2266 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:00:43] RECOVERY - Ensure local MW versions match expected deployment on mw2204 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:01] RECOVERY - Ensure local MW versions match expected deployment on mw2211 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:11] RECOVERY - Ensure local MW versions match expected deployment on mw2325 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:23] RECOVERY - Ensure local MW versions match expected deployment on mw2310 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:23] RECOVERY - Ensure local MW versions match expected deployment on mw2258 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:25] RECOVERY - Ensure local MW versions match expected deployment on mw1363 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:27] RECOVERY - Ensure local MW versions match expected deployment on mw2238 is OK: OKAY: 
wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:33] RECOVERY - Ensure local MW versions match expected deployment on wtp1040 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:37] RECOVERY - Ensure local MW versions match expected deployment on mw2136 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:37] RECOVERY - Ensure local MW versions match expected deployment on mw2209 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:53] RECOVERY - Ensure local MW versions match expected deployment on mw2219 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:53] RECOVERY - Ensure local MW versions match expected deployment on mw2137 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:53] RECOVERY - Ensure local MW versions match expected deployment on mw2208 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:01:53] RECOVERY - Ensure local MW versions match expected deployment on wtp2014 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:02:01] RECOVERY - Ensure local MW versions match expected deployment on snapshot1005 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:02:11] RECOVERY - Ensure local MW versions match expected deployment on mw2296 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:02:35] RECOVERY - Ensure local MW versions match expected deployment on mw1328 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:02:47] RECOVERY - Ensure local MW versions match expected deployment on mw1402 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:02:49] 
RECOVERY - Ensure local MW versions match expected deployment on mw1362 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:02:51] RECOVERY - Ensure local MW versions match expected deployment on mw1275 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:02:55] RECOVERY - Ensure local MW versions match expected deployment on mw2311 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:02:55] RECOVERY - Ensure local MW versions match expected deployment on mw2218 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:03:07] RECOVERY - Ensure local MW versions match expected deployment on mw2221 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:03:10] 10Operations, 10RESTBase: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 (10Joe) p:05Unbreak!→03High a:03Joe Resetting to high since we've fixed the immediate problem by reverting the MediaWiki patch. Before we roll it out again we need: [] Fix wikifeeds to call the... 
[09:03:17] RECOVERY - Ensure local MW versions match expected deployment on mw1351 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:03:37] RECOVERY - Ensure local MW versions match expected deployment on mw1410 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [09:05:16] (03PS1) 10Giuseppe Lavagetto: wikifeeds: use the puppet CA if available, call the mw api via https [deployment-charts] - 10https://gerrit.wikimedia.org/r/612513 (https://phabricator.wikimedia.org/T257887) [09:05:18] !log jforrester@deploy1001 Finished scap: Re-re-start full scap to push out wmf.41 and switch testwikis to it T256669 (duration: 51m 41s) [09:05:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:05:24] T256669: 1.35.0-wmf.41 deployment blockers - https://phabricator.wikimedia.org/T256669 [09:05:27] Finally. [09:05:31] (03CR) 10jerkins-bot: [V: 04-1] wikifeeds: use the puppet CA if available, call the mw api via https [deployment-charts] - 10https://gerrit.wikimedia.org/r/612513 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [09:05:38] (03PS1) 10Jbond: profile::mediawiki::mcrouter_wancache: refactor [puppet] - 10https://gerrit.wikimedia.org/r/612514 [09:06:22] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/612514 (owner: 10Jbond) [09:07:20] (03PS2) 10Muehlenhoff: Switch matomo to CAS (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/612512 [09:09:15] (03PS2) 10Jbond: profile::mediawiki::mcrouter_wancache: refactor [puppet] - 10https://gerrit.wikimedia.org/r/612514 [09:09:37] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/612514 (owner: 10Jbond) [09:11:01] (03PS3) 10Jbond: profile::mediawiki::mcrouter_wancache: refactor [puppet] - 10https://gerrit.wikimedia.org/r/612514 [09:11:27] (03CR) 10JMeybohm: [C: 04-1] wikifeeds: use the puppet CA if available, call the mw api via https 
(031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/612513 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [09:11:29] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/612514 (owner: 10Jbond) [09:11:41] PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133 operation=get https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:11:54] <_joe_> jayme: yeah I just realized [09:12:08] <_joe_> values.yaml can't really use templating :P [09:12:29] <_joe_> akosiaris: also it seems etcd for k8s is not in good shape? all comes at the same time heh [09:13:00] _joe_: and then someone does channel-hopping as well :P [09:13:31] (03PS1) 10Jbond: P:idp::memcached: pass port directly to prometheus exporter [puppet] - 10https://gerrit.wikimedia.org/r/612516 [09:13:53] akosiaris: pointed out etcd somewhere above, hard to find though. 2 of 3 nodes are fine was the tenor [09:14:48] (03CR) 10jerkins-bot: [V: 04-1] P:idp::memcached: pass port directly to prometheus exporter [puppet] - 10https://gerrit.wikimedia.org/r/612516 (owner: 10Jbond) [09:15:15] (03PS2) 10Giuseppe Lavagetto: wikifeeds: use the puppet CA if available, call the mw api via https [deployment-charts] - 10https://gerrit.wikimedia.org/r/612513 (https://phabricator.wikimedia.org/T257887) [09:15:45] (03CR) 10Giuseppe Lavagetto: wikifeeds: use the puppet CA if available, call the mw api via https (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/612513 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [09:17:31] (03PS3) 10Giuseppe Lavagetto: wikifeeds: use the puppet CA if available, call the mw api via https [deployment-charts] - 10https://gerrit.wikimedia.org/r/612513 (https://phabricator.wikimedia.org/T257887) [09:23:06] (03CR) 10JMeybohm: [C: 03+1] "LGTM" [deployment-charts] - 10https://gerrit.wikimedia.org/r/612513 
(https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [09:23:11] (03PS3) 10Muehlenhoff: Switch matomo to CAS (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/612512 [09:24:15] (03CR) 10Giuseppe Lavagetto: [C: 03+2] wikifeeds: use the puppet CA if available, call the mw api via https [deployment-charts] - 10https://gerrit.wikimedia.org/r/612513 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [09:25:22] (03PS2) 10Jbond: P:idp::memcached: pass port directly to prometheus exporter [puppet] - 10https://gerrit.wikimedia.org/r/612516 [09:28:17] (03CR) 10Jbond: [C: 03+2] P:idp::memcached: pass port directly to prometheus exporter [puppet] - 10https://gerrit.wikimedia.org/r/612516 (owner: 10Jbond) [09:28:26] (03PS3) 10Jbond: P:idp::memcached: pass port directly to prometheus exporter [puppet] - 10https://gerrit.wikimedia.org/r/612516 [09:29:08] (03CR) 10Jbond: [C: 03+2] prometheus::memcached_exporter: fix arguments hiera call [puppet] - 10https://gerrit.wikimedia.org/r/612507 (owner: 10Jbond) [09:30:47] (03PS2) 10Jbond: prometheus::memcached_exporter: fix arguments hiera call [puppet] - 10https://gerrit.wikimedia.org/r/612507 [09:31:33] (03CR) 10Jbond: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/612507 (owner: 10Jbond) [09:31:42] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/612507 (owner: 10Jbond) [09:31:57] PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133 operation=get https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:35:20] (03PS1) 10Giuseppe Lavagetto: termbox: use https to reach the api, the puppet CA where needed. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/612521 (https://phabricator.wikimedia.org/T257887) [09:35:52] (03PS2) 10Jforrester: ExtensionDistributor: There are now REL1_35 dev snapshots [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612469 [09:35:57] (03CR) 10Jforrester: [C: 03+2] ExtensionDistributor: There are now REL1_35 dev snapshots [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612469 (owner: 10Jforrester) [09:36:27] (03PS2) 10Jforrester: Add REL1_35 to ExtensionDistributor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612511 (owner: 10Legoktm) [09:36:34] (03CR) 10Jforrester: [C: 03+2] Add REL1_35 to ExtensionDistributor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612511 (owner: 10Legoktm) [09:36:36] (03CR) 10Kormat: [C: 03+1] mariadb: Promote db1103 to x1 master [puppet] - 10https://gerrit.wikimedia.org/r/612474 (https://phabricator.wikimedia.org/T254871) (owner: 10Marostegui) [09:36:57] (03CR) 10Kormat: [C: 03+1] wmnet: Update x1 alias [dns] - 10https://gerrit.wikimedia.org/r/612475 (https://phabricator.wikimedia.org/T254871) (owner: 10Marostegui) [09:37:01] (03Merged) 10jenkins-bot: ExtensionDistributor: There are now REL1_35 dev snapshots [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612469 (owner: 10Jforrester) [09:37:04] (03PS1) 10Ema: VCL: log X-Cache-Int on cacheable Set-Cookie responses [puppet] - 10https://gerrit.wikimedia.org/r/612522 (https://phabricator.wikimedia.org/T256395) [09:37:28] (03Merged) 10jenkins-bot: Add REL1_35 to ExtensionDistributor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612511 (owner: 10Legoktm) [09:37:49] (03PS4) 10Jbond: profile::mediawiki::mcrouter_wancache: refactor [puppet] - 10https://gerrit.wikimedia.org/r/612514 [09:39:25] (03CR) 10Vgutierrez: [C: 03+1] VCL: log X-Cache-Int on cacheable Set-Cookie responses [puppet] - 10https://gerrit.wikimedia.org/r/612522 (https://phabricator.wikimedia.org/T256395) (owner: 10Ema) [09:39:48] !log jforrester@deploy1001 
Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: Add REL1_35 as a candidate release (duration: 01m 06s) [09:39:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:40:35] (03PS4) 10Muehlenhoff: Switch matomo to CAS (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/612512 [09:41:31] (03CR) 10Elukey: "Looks good to me, there is a lighter layer of auth after LDAP so even in case of errors we could easily rollback without the risk of leaki" [puppet] - 10https://gerrit.wikimedia.org/r/612512 (owner: 10Muehlenhoff) [09:41:53] (03CR) 10Alexandros Kosiaris: [C: 04-1] "helmfiles also need to reference private/general.yaml" [deployment-charts] - 10https://gerrit.wikimedia.org/r/612521 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [09:42:14] (03CR) 10Ema: [C: 03+2] VCL: log X-Cache-Int on cacheable Set-Cookie responses [puppet] - 10https://gerrit.wikimedia.org/r/612522 (https://phabricator.wikimedia.org/T256395) (owner: 10Ema) [09:43:38] (03CR) 10Jbond: "PCC https://puppet-compiler.wmflabs.org/compiler1002/23859/" [puppet] - 10https://gerrit.wikimedia.org/r/612514 (owner: 10Jbond) [09:43:54] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1075', diff saved to https://phabricator.wikimedia.org/P11896 and previous config saved to /var/cache/conftool/dbconfig/20200714-094354-marostegui.json [09:43:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:44:15] RECOVERY - k8s API server requests latencies on argon is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:44:50] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P11897 and previous config saved to /var/cache/conftool/dbconfig/20200714-094449-marostegui.json [09:44:53] RECOVERY - etcd request latencies on argon is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:44:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:45:09] (03PS3) 10Arturo Borrero Gonzalez: openstack: neutron: add NRPE plugin to check nf_conntrack status [puppet] - 10https://gerrit.wikimedia.org/r/612390 (https://phabricator.wikimedia.org/T257552) [09:45:23] (03PS2) 10Giuseppe Lavagetto: termbox: use https to reach the api, the puppet CA where needed. [deployment-charts] - 10https://gerrit.wikimedia.org/r/612521 (https://phabricator.wikimedia.org/T257887) [09:47:07] (03CR) 10Alexandros Kosiaris: "Not sure this is worth it tbh. We are moving recommendation-api in k8s and this is literally the only service. Perhaps it's better to do t" [puppet] - 10https://gerrit.wikimedia.org/r/612461 (https://phabricator.wikimedia.org/T244843) (owner: 10Giuseppe Lavagetto) [09:47:53] <_joe_> akosiaris: oh recommendation-api might be broken too [09:47:56] (03CR) 10Alexandros Kosiaris: [C: 03+1] termbox: use https to reach the api, the puppet CA where needed. [deployment-charts] - 10https://gerrit.wikimedia.org/r/612521 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [09:47:59] <_joe_> just working because it goes via the edge [09:48:08] <_joe_> same for mobileapps [09:48:19] <_joe_> it's still on scb correct? [09:48:25] (03PS2) 10Jforrester: Remove redundant beta config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/608805 (owner: 10Awight) [09:48:31] (03CR) 10Jforrester: [C: 03+2] Remove redundant beta config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/608805 (owner: 10Awight) [09:48:32] I think that mobileapps doesn't talk to the api, but let me verify [09:48:39] (03CR) 10Giuseppe Lavagetto: [C: 03+2] termbox: use https to reach the api, the puppet CA where needed. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/612521 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [09:49:18] (03Merged) 10jenkins-bot: Remove redundant beta config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/608805 (owner: 10Awight) [09:49:33] _joe_: config says that it does [09:49:34] but [09:49:35] mwapi_req: [09:49:35] method: post [09:49:35] uri: https://api-rw.discovery.wmnet/w/api.php [09:49:42] <_joe_> ok good [09:49:46] looks like it's only for updates and it uses HTTPS [09:49:48] (03Merged) 10jenkins-bot: termbox: use https to reach the api, the puppet CA where needed. [deployment-charts] - 10https://gerrit.wikimedia.org/r/612521 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [09:49:55] s/updates/edits/? [09:50:07] <_joe_> James_F: can you hold the deployment for a bit? [09:50:24] <_joe_> ok, so deploying termbox to staging first, then codfw, then eqiad [09:51:07] (03Abandoned) 10Jforrester: tests: Update local copy of SiteConfiguration.php to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/534546 (owner: 10Jforrester) [09:53:15] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [09:53:16] _joe_: Yeah, I'm not doing anything in prod right now. [09:55:05] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [09:56:52] (03PS5) 10Muehlenhoff: Switch matomo to CAS [puppet] - 10https://gerrit.wikimedia.org/r/612512 (https://phabricator.wikimedia.org/T159584) [09:57:19] (03CR) 10ArielGlenn: [C: 03+1] "Looks ok as far as the dumps worker profile change goes." 
[puppet] - 10https://gerrit.wikimedia.org/r/612514 (owner: 10Jbond) [10:00:10] (03PS1) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 [10:01:29] (03CR) 10Elukey: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1003/23861/matomo1002.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/612512 (https://phabricator.wikimedia.org/T159584) (owner: 10Muehlenhoff) [10:01:31] (03CR) 10jerkins-bot: [V: 04-1] P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 (owner: 10Jbond) [10:01:45] (03PS1) 10Giuseppe Lavagetto: termbox: actually use chart version 0.0.13 [deployment-charts] - 10https://gerrit.wikimedia.org/r/612524 [10:01:56] moritzm: shall we deploy and test matomo? [10:02:19] (03PS2) 10Giuseppe Lavagetto: termbox: remove chart version pinning [deployment-charts] - 10https://gerrit.wikimedia.org/r/598055 (owner: 10JMeybohm) [10:02:40] (03CR) 10Ladsgroup: [C: 03+1] termbox: remove chart version pinning [deployment-charts] - 10https://gerrit.wikimedia.org/r/598055 (owner: 10JMeybohm) [10:03:08] (03CR) 10Giuseppe Lavagetto: [C: 03+2] termbox: remove chart version pinning [deployment-charts] - 10https://gerrit.wikimedia.org/r/598055 (owner: 10JMeybohm) [10:03:31] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 179 probes of 648 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:03:44] (03PS5) 10Jbond: profile::mediawiki::mcrouter_wancache: refactor [puppet] - 10https://gerrit.wikimedia.org/r/612514 [10:04:15] (03Merged) 10jenkins-bot: termbox: remove chart version pinning [deployment-charts] - 10https://gerrit.wikimedia.org/r/598055 (owner: 10JMeybohm) [10:04:37] (03PS2) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 
10https://gerrit.wikimedia.org/r/612523 [10:05:07] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 83 probes of 564 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:06:05] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [10:06:09] (03CR) 10jerkins-bot: [V: 04-1] P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 (owner: 10Jbond) [10:06:37] (03CR) 10Giuseppe Lavagetto: [C: 03+2] termbox: actually use chart version 0.0.13 [deployment-charts] - 10https://gerrit.wikimedia.org/r/612524 (owner: 10Giuseppe Lavagetto) [10:06:45] (03CR) 10jerkins-bot: [V: 04-1] termbox: actually use chart version 0.0.13 [deployment-charts] - 10https://gerrit.wikimedia.org/r/612524 (owner: 10Giuseppe Lavagetto) [10:06:58] (03PS3) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 [10:07:01] _joe_: hold, CI -1ed you [10:07:55] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [10:08:57] (03CR) 10Jbond: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/23862/" [puppet] - 10https://gerrit.wikimedia.org/r/612514 (owner: 10Jbond) [10:09:19] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 5 probes of 648 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:10:13] !log oblivian@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' . [10:10:13] !log oblivian@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' . [10:10:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:57] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 44 probes of 564 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:12:48] (03PS4) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 [10:13:51] !log oblivian@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' . [10:13:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:14:55] !log Running AbuseFilter's updateVarDumps for group1 T246539 [10:14:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:00] T246539: Dry-run, then actually run updateVarDumps - https://phabricator.wikimedia.org/T246539 [10:15:24] <_joe_> why is a release taking this long? 
[10:16:04] (03PS4) 10Arturo Borrero Gonzalez: openstack: neutron: add NRPE plugin to check nf_conntrack status [puppet] - 10https://gerrit.wikimedia.org/r/612390 (https://phabricator.wikimedia.org/T257552) [10:16:29] RECOVERY - termbox codfw on termbox.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service [10:16:39] <_joe_> ^^ [10:16:42] <_joe_> \o/ [10:17:20] _joe_: 🎉 [10:18:23] !log oblivian@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' . [10:18:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:20:15] RECOVERY - termbox eqiad on termbox.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service [10:22:45] (03PS5) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 [10:22:57] (03PS5) 10Arturo Borrero Gonzalez: openstack: neutron: add NRPE plugin to check nf_conntrack status [puppet] - 10https://gerrit.wikimedia.org/r/612390 (https://phabricator.wikimedia.org/T257552) [10:23:06] (03CR) 10Jbond: "PCC (prod): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/23865" [puppet] - 10https://gerrit.wikimedia.org/r/612523 (owner: 10Jbond) [10:24:39] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 192 probes of 648 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:26:15] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 77 probes of 564 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:27:06] (03PS1) 10Elukey: Set BigTop for Hadoop test [puppet] - 
10https://gerrit.wikimedia.org/r/612531 [10:27:40] (03PS6) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 [10:27:42] (03CR) 10Elukey: [C: 03+2] Set BigTop for Hadoop test [puppet] - 10https://gerrit.wikimedia.org/r/612531 (owner: 10Elukey) [10:27:49] (03PS6) 10Arturo Borrero Gonzalez: openstack: neutron: add NRPE plugin to check nf_conntrack status [puppet] - 10https://gerrit.wikimedia.org/r/612390 (https://phabricator.wikimedia.org/T257552) [10:28:24] (03CR) 10Arturo Borrero Gonzalez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/612390 (https://phabricator.wikimedia.org/T257552) (owner: 10Arturo Borrero Gonzalez) [10:30:17] PROBLEM - Check systemd state on thanos-be1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:30:31] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 5 probes of 648 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:30:36] (03PS7) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 [10:32:03] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 44 probes of 564 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:32:38] !log elukey@cumin1001 START - Cookbook sre.hadoop.stop-cluster [10:32:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:32:59] (03PS1) 10Jbond: mcrouter: store defaults in module not in hiera [puppet] - 10https://gerrit.wikimedia.org/r/612532 [10:33:29] PROBLEM - Host wtp2005 is DOWN: PING CRITICAL - Packet loss = 100% 
[10:33:33] (03CR) 10jerkins-bot: [V: 04-1] mcrouter: store defaults in module not in hiera [puppet] - 10https://gerrit.wikimedia.org/r/612532 (owner: 10Jbond) [10:34:00] (03PS2) 10Jbond: mcrouter: store defaults in module not in hiera [puppet] - 10https://gerrit.wikimedia.org/r/612532 [10:36:04] * volans looking at wtp2005 [10:38:12] not looking good, opening a task [10:38:33] (03CR) 10Jbond: [C: 04-1] Switch matomo to CAS (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/612512 (https://phabricator.wikimedia.org/T159584) (owner: 10Muehlenhoff) [10:39:21] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) [10:39:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:30] !log elukey@cumin1001 START - Cookbook sre.hadoop.change-distro [10:39:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:40:01] (03PS1) 10JMeybohm: Remove cluster specific uri's (as they are not cluster specific) [deployment-charts] - 10https://gerrit.wikimedia.org/r/612535 (https://phabricator.wikimedia.org/T257887) [10:42:20] !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime [10:42:22] !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [10:42:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:42:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:42:36] (03PS8) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 [10:43:38] (03PS3) 10Jbond: mcrouter: store defaults in module not in hiera [puppet] - 10https://gerrit.wikimedia.org/r/612532 [10:43:49] (03CR) 10Jbond: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/23871/" [puppet] - 10https://gerrit.wikimedia.org/r/612532 (owner: 10Jbond) [10:44:00] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: neutron: add NRPE plugin to check nf_conntrack status [puppet] - 
10https://gerrit.wikimedia.org/r/612390 (https://phabricator.wikimedia.org/T257552) (owner: 10Arturo Borrero Gonzalez) [10:44:16] 10Operations, 10ops-codfw: wtp2005 hardware issue - https://phabricator.wikimedia.org/T257903 (10Volans) [10:44:32] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Remove cluster specific uri's (as they are not cluster specific) [deployment-charts] - 10https://gerrit.wikimedia.org/r/612535 (https://phabricator.wikimedia.org/T257887) (owner: 10JMeybohm) [10:45:02] !log depool wtp2005 [10:45:02] (03CR) 10Muehlenhoff: Switch matomo to CAS (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/612512 (https://phabricator.wikimedia.org/T159584) (owner: 10Muehlenhoff) [10:45:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:17] (03CR) 10JMeybohm: [C: 03+2] Remove cluster specific uri's (as they are not cluster specific) [deployment-charts] - 10https://gerrit.wikimedia.org/r/612535 (https://phabricator.wikimedia.org/T257887) (owner: 10JMeybohm) [10:45:38] !log jiji@cumin1001 conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet,service=parsoid [10:45:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:47] !log jiji@cumin1001 conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet,service=parsoid-php [10:45:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:46:28] (03Merged) 10jenkins-bot: Remove cluster specific uri's (as they are not cluster specific) [deployment-charts] - 10https://gerrit.wikimedia.org/r/612535 (https://phabricator.wikimedia.org/T257887) (owner: 10JMeybohm) [10:47:29] !log volans@cumin1001 conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet [10:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:48:23] (03CR) 10Jbond: "ready fore review" [puppet] - 10https://gerrit.wikimedia.org/r/612514 (owner: 10Jbond) [10:49:45] PROBLEM - MediaWiki exceptions and fatals 
per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [10:49:50] (03CR) 10Jbond: "Ready for review" [puppet] - 10https://gerrit.wikimedia.org/r/612523 (owner: 10Jbond) [10:50:09] 10Operations, 10ops-codfw: wtp2005 hardware issue - https://phabricator.wikimedia.org/T257903 (10Volans) Depoled from confctl and marked as failed on Netbox. [10:51:38] (03CR) 10Jbond: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/23871/" [puppet] - 10https://gerrit.wikimedia.org/r/612532 (owner: 10Jbond) [10:51:44] ACKNOWLEDGEMENT - Host wtp2005 is DOWN: PING CRITICAL - Packet loss = 100% Effie Mouzeli H/W issue T257903 [10:52:00] !log powerdown wtp2005, hardware issue - T257903 [10:52:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:52:05] T257903: wtp2005 hardware issue - https://phabricator.wikimedia.org/T257903 [10:52:43] (03PS1) 10Arturo Borrero Gonzalez: prometheus: node_neutron_namespace: disable explicit monitoring [puppet] - 10https://gerrit.wikimedia.org/r/612537 (https://phabricator.wikimedia.org/T257552) [10:52:48] (03PS1) 10JMeybohm: Include private/general.yaml for staging as well [deployment-charts] - 10https://gerrit.wikimedia.org/r/612538 (https://phabricator.wikimedia.org/T257887) [10:53:52] (03PS9) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 [10:54:02] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] prometheus: node_neutron_namespace: disable explicit monitoring [puppet] - 10https://gerrit.wikimedia.org/r/612537 (https://phabricator.wikimedia.org/T257552) (owner: 10Arturo Borrero Gonzalez) [10:54:53] RECOVERY - Check systemd state on thanos-be1003 is OK: OK - running: The system is fully operational 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:55:45] (03PS6) 10Muehlenhoff: Switch matomo to CAS [puppet] - 10https://gerrit.wikimedia.org/r/612512 (https://phabricator.wikimedia.org/T159584) [10:56:07] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Include private/general.yaml for staging as well [deployment-charts] - 10https://gerrit.wikimedia.org/r/612538 (https://phabricator.wikimedia.org/T257887) (owner: 10JMeybohm) [10:56:47] (03CR) 10JMeybohm: [C: 03+2] Include private/general.yaml for staging as well [deployment-charts] - 10https://gerrit.wikimedia.org/r/612538 (https://phabricator.wikimedia.org/T257887) (owner: 10JMeybohm) [10:56:50] !log volans@cumin1001 conftool action : set/pooled=inactive; selector: name=wtp2005.codfw.wmnet [10:56:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:00] (03Merged) 10jenkins-bot: Include private/general.yaml for staging as well [deployment-charts] - 10https://gerrit.wikimedia.org/r/612538 (https://phabricator.wikimedia.org/T257887) (owner: 10JMeybohm) [10:58:58] (03PS6) 10Jbond: profile::mediawiki::mcrouter_wancache: refactor [puppet] - 10https://gerrit.wikimedia.org/r/612514 [10:59:41] !log jayme@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' . [10:59:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: (Dis)respected human, time to deploy European mid-day backport window(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200714T1100). Please do the needful. [11:00:21] why disrespected :( [11:00:27] (looks like nothing to do) [11:00:40] We can use the time to rough up jouncebot :-) [11:00:52] Tsk. 
[11:00:59] (03PS1) 10Giuseppe Lavagetto: scap: check enwiki via https, not http [puppet] - 10https://gerrit.wikimedia.org/r/612540 (https://phabricator.wikimedia.org/T257887) [11:01:00] jouncebot: you wanna take this outside? [11:01:01] (03PS1) 10Giuseppe Lavagetto: pybal: check wikidata, not enwiki and expect 302 [puppet] - 10https://gerrit.wikimedia.org/r/612541 (https://phabricator.wikimedia.org/T257887) [11:01:06] Feel free to poke REL1_35 for issues if you have some spare time. :-) [11:01:11] <_joe_> uhm can we please pause the window? [11:01:13] (03PS7) 10Jbond: profile::mediawiki::mcrouter_wancache: refactor [puppet] - 10https://gerrit.wikimedia.org/r/612514 [11:01:17] (03PS10) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 [11:01:20] <_joe_> we're still in high water with the incident [11:01:22] _joe_: there’s nothing in the window anyways [11:01:23] <_joe_> sorry [11:01:25] (03CR) 10Muehlenhoff: "Updated PCC: https://puppet-compiler.wmflabs.org/compiler1002/23875/" [puppet] - 10https://gerrit.wikimedia.org/r/612512 (https://phabricator.wikimedia.org/T159584) (owner: 10Muehlenhoff) [11:01:26] <_joe_> oh great [11:02:05] (03PS4) 10Jbond: mcrouter: store defaults in module not in hiera [puppet] - 10https://gerrit.wikimedia.org/r/612532 [11:03:47] !log jayme@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . [11:03:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:49] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:06:25] (03CR) 10Muehlenhoff: mcrouter: store defaults in module not in hiera (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/612532 (owner: 10Jbond) [11:06:47] (03CR) 10Alexandros Kosiaris: [C: 03+1] scap: check enwiki via https, not http [puppet] - 10https://gerrit.wikimedia.org/r/612540 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [11:09:11] (03CR) 10Alexandros Kosiaris: [C: 03+1] pybal: check wikidata, not enwiki and expect 302 [puppet] - 10https://gerrit.wikimedia.org/r/612541 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [11:09:41] (03CR) 10Giuseppe Lavagetto: [C: 03+2] scap: check enwiki via https, not http [puppet] - 10https://gerrit.wikimedia.org/r/612540 (https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [11:09:57] (03PS8) 10Jbond: profile::mediawiki::mcrouter_wancache: refactor [puppet] - 10https://gerrit.wikimedia.org/r/612514 [11:10:09] (03PS11) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 [11:10:14] (03CR) 10Alexandros Kosiaris: [C: 03+1] "Across the fleet PCC is happy (https://puppet-compiler.wmflabs.org/compiler1001/23841/), +1ing" [puppet] - 10https://gerrit.wikimedia.org/r/609403 (owner: 10Alexandros Kosiaris) [11:13:18] (03PS12) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 [11:14:35] (03PS1) 10Arturo Borrero Gonzalez: openstack: monitor: neutron: nf_conntrack: run the NRPE check as root using sudo [puppet] - 10https://gerrit.wikimedia.org/r/612545 (https://phabricator.wikimedia.org/T257552) [11:14:59] (03CR) 10Giuseppe Lavagetto: [C: 03+2] pybal: check wikidata, not enwiki and expect 302 [puppet] - 10https://gerrit.wikimedia.org/r/612541 
(https://phabricator.wikimedia.org/T257887) (owner: 10Giuseppe Lavagetto) [11:15:05] !log jayme@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . [11:15:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:48] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: monitor: neutron: nf_conntrack: run the NRPE check as root using sudo [puppet] - 10https://gerrit.wikimedia.org/r/612545 (https://phabricator.wikimedia.org/T257552) (owner: 10Arturo Borrero Gonzalez) [11:15:53] PROBLEM - Host ripe-atlas-eqiad IPv6 is DOWN: PING CRITICAL - Packet loss = 100% [11:16:40] (03CR) 10Jbond: "> Patch Set 8:" [puppet] - 10https://gerrit.wikimedia.org/r/612523 (owner: 10Jbond) [11:16:47] (03PS5) 10Jbond: mcrouter: store defaults in module not in hiera [puppet] - 10https://gerrit.wikimedia.org/r/612532 [11:17:50] (03PS6) 10Jbond: mcrouter: store defaults in module not in hiera [puppet] - 10https://gerrit.wikimedia.org/r/612532 [11:17:57] (03CR) 10Jbond: "updated thanks" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/612532 (owner: 10Jbond) [11:18:35] 10Operations, 10serviceops: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10MoritzMuehlenhoff) [11:19:28] (03PS1) 10Alexandros Kosiaris: prometheus: Enable processes collector in k8s nodes [puppet] - 10https://gerrit.wikimedia.org/r/612546 (https://phabricator.wikimedia.org/T257679) [11:20:16] (03CR) 10Alexandros Kosiaris: [C: 03+1] "Makes sense. 
+1 then" [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/610849 (https://phabricator.wikimedia.org/T257333) (owner: 10JMeybohm) [11:22:23] <_joe_> !log restart pybal on lvs1016 [11:22:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:25:14] <_joe_> !log restart pybal on lvs1015 T257887 [11:25:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:25:19] T257887: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 [11:26:49] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:28:10] 10Operations, 10observability, 10Patch-For-Review, 10Performance-Team (Radar): Revisit Grafana/Icinga notification strategy - https://phabricator.wikimedia.org/T203485 (10akosiaris) >>! In T203485#6296804, @ema wrote: > My entirely uninformed opinion is that having Grafana send email/irc notifications woul... [11:30:51] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:31:32] <_joe_> !log restart pybal on lvs2010 T257887 [11:31:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:40] T257887: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 [11:32:50] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [11:33:34] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/612512 (https://phabricator.wikimedia.org/T159584) (owner: 10Muehlenhoff) [11:34:15] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [11:35:41] <_joe_> !log restart pybal on lvs2009 T257887 [11:35:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:36:12] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1130 for query plan checks T238966 ', diff saved to https://phabricator.wikimedia.org/P11898 and previous config saved to /var/cache/conftool/dbconfig/20200714-113612-marostegui.json [11:36:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:36:17] T238966: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 [11:40:35] 10Operations, 10RESTBase, 10Patch-For-Review: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 (10JMeybohm) >>! 
In T257887#6303600, @Joe wrote: > Resetting to high since we've fixed the immediate problem by reverting the MediaWiki patch. > > Before we roll it out again we... [11:41:32] (03PS1) 10Ema: ATS: fix X-Cache-Int for objects that failed revalidation [puppet] - 10https://gerrit.wikimedia.org/r/612550 [11:46:44] (03PS1) 10Kormat: mariadb: Promote es1021 to es4 master. [puppet] - 10https://gerrit.wikimedia.org/r/612551 (https://phabricator.wikimedia.org/T257847) [11:47:08] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Promote es1021 to es4 master. [puppet] - 10https://gerrit.wikimedia.org/r/612551 (https://phabricator.wikimedia.org/T257847) (owner: 10Kormat) [11:47:10] (03CR) 10Kormat: [C: 04-2] "Don't merge until switchover day." [puppet] - 10https://gerrit.wikimedia.org/r/612551 (https://phabricator.wikimedia.org/T257847) (owner: 10Kormat) [11:48:06] (03PS2) 10Kormat: mariadb: Promote es1021 to es4 master. [puppet] - 10https://gerrit.wikimedia.org/r/612551 (https://phabricator.wikimedia.org/T257847) [11:48:12] (03CR) 10Marostegui: [C: 04-1] mariadb: Promote es1021 to es4 master. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/612551 (https://phabricator.wikimedia.org/T257847) (owner: 10Kormat) [11:50:20] (03CR) 10Marostegui: [C: 03+1] mariadb: Promote es1021 to es4 master. [puppet] - 10https://gerrit.wikimedia.org/r/612551 (https://phabricator.wikimedia.org/T257847) (owner: 10Kormat) [11:50:52] (03CR) 10Marostegui: [C: 03+1] "Race condition between my review and your new patch :)" [puppet] - 10https://gerrit.wikimedia.org/r/612551 (https://phabricator.wikimedia.org/T257847) (owner: 10Kormat) [11:50:59] _joe_: Will it be OK to proceed with the train (to group0) in 10 mins, or should I delay? [11:54:49] James_F: _joe_ is out for lunch. We decided to postpone the re-revert of the forceHTTPS/cors stuff to "later". If that does not affect you, I think you're good to go [11:55:06] jayme: Excellent. I'll continue as planned, then, thanks. [11:57:11] James_F: Cool. 
If anything looks suspicious, feel free to ping me. [11:59:22] Sure. [11:59:33] (03PS1) 10Jforrester: group0 wikis to 1.35.0-wmf.41 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612555 [11:59:35] (03CR) 10Jforrester: [C: 03+2] group0 wikis to 1.35.0-wmf.41 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612555 (owner: 10Jforrester) [12:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200714T1200) [12:00:23] (03Merged) 10jenkins-bot: group0 wikis to 1.35.0-wmf.41 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612555 (owner: 10Jforrester) [12:01:54] !log jforrester@deploy1001 rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.41 [12:01:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:03:31] is it intentional that the new version adds "Retrieved from "https://www.mediawiki.org/w/index.php?title=MediaWiki_1.36&oldid=3769286"" to the bottom of all pages? [12:05:40] <_joe_> James_F: ^^ [12:06:20] that [12:06:31] that's only on vector [12:06:40] https://www.mediawiki.org/wiki/MediaWiki it differs per page, but yes, I'm wondering that aswell [12:06:51] Hmm. [12:07:18] Yeah, that's odd [12:07:18] !log disable puppet ro reboot puppetdb's [12:07:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:22] that's only visible on the printable version on wmf.40 wikis [12:07:27] Oh! [12:07:34] want me to file a ticket? [12:07:35] !log disable puppet fleet wide to reboot puppetdb's [12:07:35] Probably the printable work the Web team were rushing. [12:07:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:39] Majavah: Please. [12:07:42] I think it [12:07:44] sure, doing [12:07:46] blocker? [12:07:53] 's OK to leave (given it's group0), but yes, blocker UBN deffo. 
[12:08:47] <_joe_> Majavah: it doesn't show up in other skins AFAICS [12:09:05] _joe_: I found that out earlier, see scrollback :P [12:09:31] <_joe_> heh sorry I was testing stuff too :P [12:09:39] _joe_: Yeah, Vector-only printable work. [12:10:36] filed T257914 [12:10:37] T257914: Vector now shows "Retrieved from "https://www.mediawiki.org/w/index.php?title=Project:Sandbox&oldid=3963051"" on all page views - https://phabricator.wikimedia.org/T257914 [12:10:56] Thanks. [12:14:58] (03PS1) 10Kormat: db-eqiad.php: Depool cluster26 (es4) from writes. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612559 (https://phabricator.wikimedia.org/T257847) [12:15:46] (03CR) 10Kormat: [C: 04-2] "Don't merge before failover time." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612559 (https://phabricator.wikimedia.org/T257847) (owner: 10Kormat) [12:16:14] (03PS2) 10Kormat: db-eqiad.php: Depool cluster26 (es4) from writes. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612559 (https://phabricator.wikimedia.org/T257847) [12:16:49] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=jmx_puppetdb site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [12:17:11] (03CR) 10Alexandros Kosiaris: [C: 04-1] Kask: Use Releng Cassandra Image (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/609894 (https://phabricator.wikimedia.org/T224041) (owner: 10Jeena Huneidi) [12:18:02] (03PS1) 10Kormat: wmnet: Update es4-master alias [dns] - 10https://gerrit.wikimedia.org/r/612560 (https://phabricator.wikimedia.org/T257847) [12:18:37] (03CR) 10Kormat: [C: 04-2] "Don't merge until switchover happens." [dns] - 10https://gerrit.wikimedia.org/r/612560 (https://phabricator.wikimedia.org/T257847) (owner: 10Kormat) [12:18:41] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [12:19:07] !log re-enable puppet fleet [12:19:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:19:27] PROBLEM - Check systemd state on puppetdb1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:20:35] !log installing xen security updates (client-side tools/libs) [12:20:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:43] PROBLEM - Check systemd state on puppetdb2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:23:05] (03CR) 10Vgutierrez: [C: 03+1] ATS: fix X-Cache-Int for objects that failed revalidation [puppet] - 10https://gerrit.wikimedia.org/r/612550 (owner: 10Ema) [12:24:00] !log route ns0.wikimedia.org to codfw for reboot [12:24:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:26:08] (03CR) 10Alexandros Kosiaris: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/23878/kubernetes1001.eqiad.wmnet/index.html says ok, merging" [puppet] - 10https://gerrit.wikimedia.org/r/612546 (https://phabricator.wikimedia.org/T257679) (owner: 10Alexandros Kosiaris) [12:31:11] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0) [12:31:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:31:45] (03CR) 10Ema: [C: 03+2] ATS: fix X-Cache-Int for objects that failed revalidation [puppet] - 10https://gerrit.wikimedia.org/r/612550 (owner: 10Ema) [12:35:21] PROBLEM - Host authdns1001 is DOWN: PING CRITICAL - Packet loss = 100% [12:35:31] ^ me rebooting [12:35:55] (03CR) 10Elukey: "From pcc the change seems failing: 
https://puppet-compiler.wmflabs.org/compiler1001/23879/mc1019.eqiad.wmnet/change.mc1019.eqiad.wmnet.err" [puppet] - 10https://gerrit.wikimedia.org/r/612507 (owner: 10Jbond) [12:36:13] (03CR) 10Marostegui: [C: 03+1] wmnet: Update es4-master alias [dns] - 10https://gerrit.wikimedia.org/r/612560 (https://phabricator.wikimedia.org/T257847) (owner: 10Kormat) [12:36:19] RECOVERY - Host authdns1001 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [12:36:34] (03CR) 10Marostegui: [C: 03+1] db-eqiad.php: Depool cluster26 (es4) from writes. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612559 (https://phabricator.wikimedia.org/T257847) (owner: 10Kormat) [12:40:17] (03PS1) 10Elukey: Revert "Set BigTop for Hadoop test" [puppet] - 10https://gerrit.wikimedia.org/r/612491 [12:41:44] (03PS1) 10Alexandros Kosiaris: mobileapps: Add kubernetes nodes next to scb nodes [puppet] - 10https://gerrit.wikimedia.org/r/612567 (https://phabricator.wikimedia.org/T218733) [12:43:33] <_joe_> akosiaris: mobileapps will need the same https trick I guess [12:44:09] PROBLEM - Host authdns2001 is DOWN: PING CRITICAL - Packet loss = 100% [12:44:27] ^^ me reboot [12:44:49] RECOVERY - Host authdns2001 is UP: PING OK - Packet loss = 0%, RTA = 36.10 ms [12:45:38] (03CR) 10Giuseppe Lavagetto: "Before we do this we need to ensure mobileapps talks https to the appservers." [puppet] - 10https://gerrit.wikimedia.org/r/612567 (https://phabricator.wikimedia.org/T218733) (owner: 10Alexandros Kosiaris) [12:45:41] 10Operations, 10Patch-For-Review: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877 (10MoritzMuehlenhoff) [12:45:43] (03CR) 10Elukey: [C: 04-1] "Setting this to -1 just to double check, I am not getting one thing. 
The mcrouter::shards are used in profile::mediawiki::mcrouter_wancach" [puppet] - 10https://gerrit.wikimedia.org/r/612532 (owner: 10Jbond) [12:47:40] (03CR) 10Elukey: "Gerrit's UI tricked me, will check the related patches, sorry :)" [puppet] - 10https://gerrit.wikimedia.org/r/612532 (owner: 10Jbond) [12:48:24] (03PS1) 10Giuseppe Lavagetto: Revert "Revert "Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4)"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612492 [12:48:58] (03CR) 10Alexandros Kosiaris: [C: 03+1] Revert "Revert "Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4)"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612492 (owner: 10Giuseppe Lavagetto) [12:49:00] <_joe_> jouncebot: next [12:49:00] In 0 hour(s) and 10 minute(s): Mediawiki train - European+American Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200714T1300) [12:49:10] (03PS11) 10ZPapierski: Correct url and path for nginx OAuth 1.0a [puppet] - 10https://gerrit.wikimedia.org/r/609909 (https://phabricator.wikimedia.org/T251498) [12:49:13] <_joe_> James_F: can I go on with merging the re-revert? 
[12:49:18] (03CR) 10JMeybohm: [C: 03+1] Revert "Revert "Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4)"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612492 (owner: 10Giuseppe Lavagetto) [12:49:47] (03CR) 10Elukey: [C: 03+2] Revert "Set BigTop for Hadoop test" [puppet] - 10https://gerrit.wikimedia.org/r/612491 (owner: 10Elukey) [12:50:57] (03PS1) 10Muehlenhoff: Remove apt pin for stretch-backports for npm [puppet] - 10https://gerrit.wikimedia.org/r/612568 (https://phabricator.wikimedia.org/T256877) [12:51:10] <_joe_> I assume it's ok [12:51:30] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Revert "Revert "Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4)"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612492 (owner: 10Giuseppe Lavagetto) [12:52:16] (03Merged) 10jenkins-bot: Revert "Revert "Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4)"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612492 (owner: 10Giuseppe Lavagetto) [12:57:16] Clear from my end. 
[12:57:39] (^^ _joe_ ) [12:57:40] !log oblivian@deploy1001 Synchronized wmf-config/InitialiseSettings.php: revert forcehttps after fixing T257887 (duration: 01m 02s) [12:57:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:45] T257887: restbase: "featured" endpoint times out - https://phabricator.wikimedia.org/T257887 [12:57:49] PROBLEM - Host dns3001 is DOWN: PING CRITICAL - Packet loss = 100% [12:57:57] <_joe_> deploy done [12:57:59] RECOVERY - Host dns3001 is UP: PING OK - Packet loss = 0%, RTA = 83.48 ms [12:58:02] <_joe_> let's see if anything breaks [12:58:30] !log elukey@cumin1001 START - Cookbook sre.hadoop.stop-cluster [12:58:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:59:15] PROBLEM - BGP status on cr3-esams is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [12:59:25] 10Operations: FY2020-2021 Q1 codfw -> eqiad switchback - https://phabricator.wikimedia.org/T243318 (10Marostegui) [12:59:50] is anybody working on dns3001? [12:59:55] jbond42: --^ ? [12:59:59] (just to double check) [13:00:05] James_F and longma: May I have your attention please! Mediawiki train - European+American Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200714T1300) [13:00:05] elukey: yes i t just got rebooted did you see an issue? 
[13:00:21] oh sorry missed the alert [13:00:33] ah nono I just wanted to double check [13:00:36] thanks :) [13:01:05] RECOVERY - BGP status on cr3-esams is OK: BGP OK - up: 10, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [13:01:06] better to ask when the word "DNS" is returned by icinga :D [13:01:13] ack :) [13:01:29] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [13:01:30] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [13:01:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:53] !log rebooting dns3002 [13:01:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:14] 10Operations, 10ops-eqiad, 10DC-Ops: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (10Jclark-ctr) @Cmjohnson sorry spaced where flipped in netbox, netbox is correct now. an-test-worker1001 A5 30 WMF4833 an-test-worker1002 C5 34 WMF4834 an-te... 
[13:04:12] 10Operations, 10ops-eqiad, 10DC-Ops: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (10Jclark-ctr) @Cmjohnson correct racking an-test-master1001 A3 25 WMF4836 an-test-master1002 C3 33 WMF4837 an-test-coord1001 D3 18 WMF4838 [13:04:51] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job={bird,haproxy,pdnsrec} site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [13:05:45] PROBLEM - BFD status on cr2-esams is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [13:05:47] PROBLEM - BFD status on cr3-esams is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [13:06:03] ^^ looking, guessing it's related to the dns reboots [13:06:18] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) [13:06:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:43] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [13:07:37] RECOVERY - BFD status on cr2-esams is OK: OK: UP: 10 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [13:07:37] RECOVERY - BFD status on cr3-esams is OK: OK: UP: 8 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [13:09:09] !log elukey@cumin1001 START - Cookbook sre.hadoop.change-distro [13:09:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:15] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org [13:10:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:39] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [13:10:39] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [13:10:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:50] !log reboot dns2001 [13:10:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:13:11] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org [13:13:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:13:26] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org [13:13:35] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [13:13:36] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [13:13:39] !log reboot dns2002 [13:13:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:13:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:13:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:14:01] Logged the message 
at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:09] PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64605/IPv4: Connect - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [13:15:19] PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64605/IPv4: Connect - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [13:16:00] ^^ expected [13:16:10] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org [13:16:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:29] PROBLEM - BFD status on cr2-codfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [13:18:20] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org [13:18:21] RECOVERY - BFD status on cr2-codfw is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [13:18:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:28] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [13:18:28] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [13:18:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:42] !log reboot dns1002 [13:18:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:53] RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 53, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [13:19:03] RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 52, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [13:22:34] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=dns1002.wikimedia.org [13:22:37] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:40] !log jbond@cumin1001 conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org [13:22:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:23:03] PROBLEM - Disk space on webperf1002 is CRITICAL: DISK CRITICAL - free space: /srv 11465 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=webperf1002&var-datasource=eqiad+prometheus/ops [13:23:27] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [13:23:52] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime [13:23:53] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [13:23:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:23:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:03] !log reboot dns1001 [13:24:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:17] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [13:27:05] !log jbond@cumin1001 conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org [13:27:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:42] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11899 and previous config saved to /var/cache/conftool/dbconfig/20200714-132742-marostegui.json [13:27:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:28:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1112', diff saved to https://phabricator.wikimedia.org/P11900 and previous config saved to /var/cache/conftool/dbconfig/20200714-132823-marostegui.json [13:28:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:31:31] (03PS3) 10Jbond: prometheus::memcached_exporter: fix arguments hiera call [puppet] - 10https://gerrit.wikimedia.org/r/612507 [13:35:33] (03CR) 10Jbond: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/612507 (owner: 10Jbond) [13:38:13] (03PS7) 10Jbond: mcrouter: store defaults in module not in hiera [puppet] - 10https://gerrit.wikimedia.org/r/612532 (https://phabricator.wikimedia.org/T247956) [13:38:35] (03PS9) 10Jbond: profile::mediawiki::mcrouter_wancache: refactor [puppet] - 10https://gerrit.wikimedia.org/r/612514 (https://phabricator.wikimedia.org/T247956) [13:39:15] (03PS13) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 (https://phabricator.wikimedia.org/T247956) [13:39:35] (03PS14) 10Jbond: P:mediawiki::mcrouter_wancache: refactor parameters [puppet] - 10https://gerrit.wikimedia.org/r/612523 (https://phabricator.wikimedia.org/T247956) [13:39:50] (03PS8) 10Jbond: mcrouter: store defaults in module not in hiera [puppet] - 
10https://gerrit.wikimedia.org/r/612532 (https://phabricator.wikimedia.org/T247956) [13:42:06] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0) [13:42:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:42:12] \o/ [13:42:33] this time even the rollback worked [13:50:32] (03PS1) 10Matthias Mullie: Fix case of directory name [extensions/WikibaseMediaInfo] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612494 [13:51:29] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1001&panelId=37 [13:56:17] (03PS1) 10Giuseppe Lavagetto: cxserver: use https to talk to MediaWiki [deployment-charts] - 10https://gerrit.wikimedia.org/r/612583 [13:56:22] (03PS1) 10Elukey: sre.hadoop.change-distro.py: use a more specific name [cookbooks] - 10https://gerrit.wikimedia.org/r/612585 [14:00:51] (03CR) 10Volans: [C: 03+1] "Sure, as you see fit 😊" [cookbooks] - 10https://gerrit.wikimedia.org/r/612585 (owner: 10Elukey) [14:01:32] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single [14:01:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:35] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1001&panelId=37 [14:05:25] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [14:05:29] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:05:40] (03CR) 10Elukey: [C: 03+2] sre.hadoop.change-distro.py: use a more specific name [cookbooks] - 10https://gerrit.wikimedia.org/r/612585 (owner: 10Elukey) [14:05:42] (03CR) 10Alexandros Kosiaris: [C: 03+2] "@Kartik, we 'll deploy that, by FYI" [deployment-charts] - 10https://gerrit.wikimedia.org/r/612583 (owner: 10Giuseppe Lavagetto) [14:07:02] (03CR) 10Jforrester: "Want this deployed now?" [extensions/WikibaseMediaInfo] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612494 (owner: 10Matthias Mullie) [14:07:05] (03Merged) 10jenkins-bot: cxserver: use https to talk to MediaWiki [deployment-charts] - 10https://gerrit.wikimedia.org/r/612583 (owner: 10Giuseppe Lavagetto) [14:07:21] (03Merged) 10jenkins-bot: sre.hadoop.change-distro.py: use a more specific name [cookbooks] - 10https://gerrit.wikimedia.org/r/612585 (owner: 10Elukey) [14:08:22] (03PS7) 10Muehlenhoff: Switch matomo to CAS [puppet] - 10https://gerrit.wikimedia.org/r/612512 (https://phabricator.wikimedia.org/T159584) [14:08:53] (03CR) 10Matthias Mullie: "yes please :)" [extensions/WikibaseMediaInfo] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612494 (owner: 10Matthias Mullie) [14:09:41] (03PS1) 10DannyS712: Restore div wrapper around print footer [skins/Vector] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612495 (https://phabricator.wikimedia.org/T257914) [14:10:22] 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: PDU Upgrade in C8 (July 14, 2pm-4pm UTC)) - https://phabricator.wikimedia.org/T257871 (10Jclark-ctr) starting on pdu upgrade now [14:11:12] starting on pdu upgrade eqiad No downtime is expected. https://phabricator.wikimedia.org/T257871 [14:11:25] !log oblivian@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' . 
[14:11:27] (03CR) 10Jforrester: [C: 03+2] Fix case of directory name [extensions/WikibaseMediaInfo] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612494 (owner: 10Matthias Mullie) [14:11:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:47] 10Operations, 10DBA, 10OTRS, 10serviceops: Create a parallel OTRS database with a freezed snapshot of the production one - https://phabricator.wikimedia.org/T257928 (10jcrespo) [14:13:07] 10Operations, 10DBA, 10OTRS, 10serviceops: Create a parallel OTRS database with a freezed snapshot of the production one - https://phabricator.wikimedia.org/T257928 (10jcrespo) p:05Triage→03Medium [14:13:23] !log upgrading wikitech-static to mw 1.34.2 [14:13:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:57] !log oblivian@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' . [14:15:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:15:45] 10Operations, 10DBA, 10OTRS, 10serviceops: Create a parallel OTRS database with a freezed snapshot of the production one - https://phabricator.wikimedia.org/T257928 (10jcrespo) I was planning on doing this slowly with @akosiaris so at the same time he learned about streamlined db provisioning system, but I... 
[14:16:53] (03PS1) 10Alexandros Kosiaris: changeprop: Talk to the API over HTTPS [deployment-charts] - 10https://gerrit.wikimedia.org/r/612588 (https://phabricator.wikimedia.org/T257887) [14:17:06] 10Operations, 10DBA, 10OTRS, 10serviceops: Create a parallel OTRS database with a frozen snapshot of the production one - https://phabricator.wikimedia.org/T257928 (10Reedy) [14:18:30] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single [14:18:30] !log jmm@cumin2001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) [14:18:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:55] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single [14:18:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:20:46] 10Operations, 10DBA, 10OTRS, 10serviceops: Create a parallel OTRS database with a frozen snapshot of the production one - https://phabricator.wikimedia.org/T257928 (10jcrespo) [14:20:57] (03CR) 10Matthias Mullie: "Thanks!" [extensions/WikibaseMediaInfo] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612494 (owner: 10Matthias Mullie) [14:20:58] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [14:21:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:48] (03PS1) 10Muehlenhoff: Improve error handling if malformed host is given [cookbooks] - 10https://gerrit.wikimedia.org/r/612591 [14:21:53] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single [14:21:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:22:13] (03PS1) 10Ema: ATS: move Set-Cookie workaround to do_global_send_response [puppet] - 10https://gerrit.wikimedia.org/r/612592 (https://phabricator.wikimedia.org/T256395) [14:23:06] James_F are you available to deploy the backport for T257914 ? 
[14:23:07] T257914: Vector now shows "Retrieved from "https://www.mediawiki.org/w/index.php?title=Project:Sandbox&oldid=3963051"" on all page views - https://phabricator.wikimedia.org/T257914 [14:23:52] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [14:23:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:24:03] (03CR) 10Volans: [C: 03+1] "LGTM, wording nit inline." (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/612591 (owner: 10Muehlenhoff) [14:24:17] PROBLEM - Check that envoy is running on idp-test1001 is CRITICAL: CRITICAL - Expecting active but unit envoyproxy.service is failed https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23Envoy [14:24:47] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is OK: (C)100 gt (W)80 gt 77.29 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1001&panelId=37 [14:25:08] (03PS1) 10Alexandros Kosiaris: eventgate: Switch stream_config_url to https [deployment-charts] - 10https://gerrit.wikimedia.org/r/612594 (https://phabricator.wikimedia.org/T257887) [14:25:31] PROBLEM - Check systemd state on idp-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:26:45] !log oblivian@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' . 
[14:26:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:27:04] (03CR) 10Muehlenhoff: Improve error handling if malformed host is given (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/612591 (owner: 10Muehlenhoff) [14:27:21] RECOVERY - Check systemd state on idp-test1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:27:47] (03CR) 10Vgutierrez: [C: 03+1] ATS: move Set-Cookie workaround to do_global_send_response [puppet] - 10https://gerrit.wikimedia.org/r/612592 (https://phabricator.wikimedia.org/T256395) (owner: 10Ema) [14:27:57] RECOVERY - Check that envoy is running on idp-test1001 is OK: OK - envoyproxy.service is active https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23Envoy [14:28:08] 10Operations, 10DBA, 10OTRS, 10serviceops: Create a parallel OTRS database with a frozen snapshot of the production one - https://phabricator.wikimedia.org/T257928 (10jcrespo) https://en.wiktionary.org/wiki/freezed {icon hand-peace-o spin} [14:28:09] DannyS712: Yes, waiting for confirmation on Beta Cluster. [14:29:12] DannyS712: All looks good. Proceding. 
[14:29:29] (03CR) 10Jforrester: [C: 03+2] Restore div wrapper around print footer [skins/Vector] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612495 (https://phabricator.wikimedia.org/T257914) (owner: 10DannyS712) [14:31:29] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single [14:31:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:28] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [14:33:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:33] (03CR) 10Volans: [C: 03+1] Improve error handling if malformed host is given (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/612591 (owner: 10Muehlenhoff) [14:35:08] (03CR) 10Hnowlan: [C: 03+2] changeprop: Talk to the API over HTTPS [deployment-charts] - 10https://gerrit.wikimedia.org/r/612588 (https://phabricator.wikimedia.org/T257887) (owner: 10Alexandros Kosiaris) [14:36:13] (03Merged) 10jenkins-bot: changeprop: Talk to the API over HTTPS [deployment-charts] - 10https://gerrit.wikimedia.org/r/612588 (https://phabricator.wikimedia.org/T257887) (owner: 10Alexandros Kosiaris) [14:38:03] 10Operations, 10MediaWiki-Authentication-and-authorization, 10Security-Team, 10Traffic, 10Security: Investigate usefulness of SameSite cookies for logged-in accounts - https://phabricator.wikimedia.org/T158604 (10Krinkle) [14:39:30] (03Merged) 10jenkins-bot: Fix case of directory name [extensions/WikibaseMediaInfo] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612494 (owner: 10Matthias Mullie) [14:40:21] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' . [14:40:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:40:30] (03CR) 10ZPapierski: [C: 04-1] "/sparql handling is incorrect and causes issues after cookie expires." 
[puppet] - 10https://gerrit.wikimedia.org/r/609909 (https://phabricator.wikimedia.org/T251498) (owner: 10ZPapierski) [14:42:26] !log stopping db1117:3322 (m2) replication temp. for otrs db cloning T257928 [14:42:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:42:31] T257928: Create a parallel OTRS database with a frozen snapshot of the production one - https://phabricator.wikimedia.org/T257928 [14:44:55] (03PS1) 10Andrew Bogott: nova_fixed_multi: fix a search/replace issue that broke .eqiad.wmflabs entries [puppet] - 10https://gerrit.wikimedia.org/r/612598 [14:46:17] (03CR) 10Andrew Bogott: [C: 03+2] nova_fixed_multi: fix a search/replace issue that broke .eqiad.wmflabs entries [puppet] - 10https://gerrit.wikimedia.org/r/612598 (owner: 10Andrew Bogott) [14:46:39] (03PS1) 10Alexandros Kosiaris: mobileapps: Remove unused config stanza [deployment-charts] - 10https://gerrit.wikimedia.org/r/612599 [14:46:41] (03PS1) 10Alexandros Kosiaris: mobileapps: Talk to API over HTTPS [deployment-charts] - 10https://gerrit.wikimedia.org/r/612600 (https://phabricator.wikimedia.org/T257887) [14:48:00] !log rebooting apt1001 for kernel update [14:48:01] (03CR) 10Giuseppe Lavagetto: [C: 04-1] mobileapps: Talk to API over HTTPS (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/612600 (https://phabricator.wikimedia.org/T257887) (owner: 10Alexandros Kosiaris) [14:48:02] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single [14:48:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:55] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.41/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: Fix case of directory name (duration: 01m 05s) [14:48:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:49:22] (03PS2) 10Alexandros Kosiaris: mobileapps: Talk to API over HTTPS 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/612600 (https://phabricator.wikimedia.org/T257887) [14:49:49] (03CR) 10Jforrester: "Deployed." [extensions/WikibaseMediaInfo] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612494 (owner: 10Matthias Mullie) [14:50:08] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [14:50:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:41] (03PS1) 10Jbond: envoyproxy: add ability to also listen on IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/612603 [14:51:46] (03CR) 10jerkins-bot: [V: 04-1] envoyproxy: add ability to also listen on IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/612603 (owner: 10Jbond) [14:52:29] (03CR) 10Ema: [C: 03+2] ATS: move Set-Cookie workaround to do_global_send_response [puppet] - 10https://gerrit.wikimedia.org/r/612592 (https://phabricator.wikimedia.org/T256395) (owner: 10Ema) [14:53:05] !log hnowlan@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' . [14:53:38] 10Operations, 10Scap, 10serviceops: Make canary wait time configurable - https://phabricator.wikimedia.org/T217924 (10LarsWirzenius) 05Open→03Resolved --canary-wait-time has been included in a release and announced to the public and used on multiple trains now. Closing task. 
[14:53:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:44] (03Merged) 10jenkins-bot: Restore div wrapper around print footer [skins/Vector] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612495 (https://phabricator.wikimedia.org/T257914) (owner: 10DannyS712) [14:56:18] (03PS2) 10Jbond: envoyproxy: add ability to also listen on IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/612603 [14:57:11] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.41/skins/Vector/includes/SkinVector.php: T257914 Restore div wrapper around print footer (duration: 01m 03s) [14:57:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:21] T257914: Vector now shows printfooter on all page views - https://phabricator.wikimedia.org/T257914 [14:58:30] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic, 10Maps (Kartotherian): Geoshapes service is not sending 'access-control-allow-origin' header to some requests - https://phabricator.wikimedia.org/T241644 (10MSantos) On another note, the requests that are not failing are also missing cache `x-... [14:58:39] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1006-cloudelastic-chi-eqiad on cloudelastic1006 is OK: (C)100 gt (W)80 gt 79.32 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1006&panelId=37 [14:59:08] 10Operations, 10Traffic, 10Patch-For-Review: planet.wm.org missing from planet.discovery.wmnet Subject Alternative Name - https://phabricator.wikimedia.org/T257840 (10ema) >>! In T257840#6302796, @Dzahn wrote: > And I did not finish that yet.. what you describe is the next step, add it to the cert. I'll do t... 
[14:59:16] 10Operations, 10Traffic, 10Patch-For-Review: planet.wm.org missing from planet.discovery.wmnet Subject Alternative Name - https://phabricator.wikimedia.org/T257840 (10ema) p:05Triage→03Low [15:00:05] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1005-cloudelastic-chi-eqiad on cloudelastic1005 is OK: (C)100 gt (W)80 gt 71.19 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1005&panelId=37 [15:01:18] 10Operations, 10Performance-Team: webpref1002 server close to have /srv partition full - https://phabricator.wikimedia.org/T257931 (10jcrespo) [15:02:23] (03CR) 10Mholloway: [C: 03+2] "Thanks. I believe I mistakenly copypasted this from the wikifeeds chart (where the config value is actually used)." [deployment-charts] - 10https://gerrit.wikimedia.org/r/612599 (owner: 10Alexandros Kosiaris) [15:02:55] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single [15:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:17] PROBLEM - Host ps1-c8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [15:03:17] hi operations folks! 
I'd like to understand a bit more about DNS and redirects for donate.wiki[mp]edia.org [15:03:35] it looks like that wiki has been available at both domains for a while now [15:03:37] (03Merged) 10jenkins-bot: mobileapps: Remove unused config stanza [deployment-charts] - 10https://gerrit.wikimedia.org/r/612599 (owner: 10Alexandros Kosiaris) [15:04:12] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [15:04:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:16] just that the root URL at either donate.wikiMedia.org or donate.wikiPedia.org redirects to donate.wM.org/Special:FundraiserRedirector [15:04:59] That's all fine - we just need to serve the TY page from donate.wP.org so we can keep setting banner hide cookies on the wikiPedias for people that donate [15:05:13] and I want to be sure we're not exploiting some weird glitch [15:05:24] that makes the TY page available at donate.wP.org [15:05:55] also, to maybe make that dual-availability explicit in the documentation of that root URL redirect and dns [15:06:24] (TY page means thank you page, for context) [15:06:38] (03PS1) 10Jbond: idp: enable ipv6 for IDP roles [puppet] - 10https://gerrit.wikimedia.org/r/612605 [15:07:26] also, out of curiousity, other fundraising types would like to know how long it's been available on both domains (if that's an easy question to answer) [15:08:00] I've put down the bits I understand so far in this phab comment: https://phabricator.wikimedia.org/T251780#6303208 [15:08:01] (03CR) 10Jbond: "Ready for review" [puppet] - 10https://gerrit.wikimedia.org/r/612605 (owner: 10Jbond) [15:08:55] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:09:51] I don't see the Special:FundraiserRedirector (target of the root 
URL redirect) mentioned anywhere in the operations/mediawiki-config repo [15:10:01] but as I understand, that's just the PHP side of things [15:10:36] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single [15:10:37] so would that redirect be at the cache level? or in the apache / nginx site config? [15:10:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:10:46] which repos should I search to check for those? [15:11:28] Anyway, if anyone wants to answer async at this ticket it would be super helpful! https://phabricator.wikimedia.org/T251780 [15:12:22] ejegg: I believe it happens in apache redirects, configured via the puppet repo: https://gerrit.wikimedia.org/g/operations/puppet/+/57ac6c8378bf98e21a717856046930d82b2e3bca/modules/mediawiki/manifests/web/prod_sites.pp#106 [15:12:26] (03CR) 10Jbond: "ready for review" [puppet] - 10https://gerrit.wikimedia.org/r/612603 (owner: 10Jbond) [15:12:36] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [15:12:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:17] thanks cdanis! [15:13:54] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single [15:13:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:14:29] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:16:50] (03CR) 10Ottomata: [C: 03+1] eventgate: Switch stream_config_url to https [deployment-charts] - 10https://gerrit.wikimedia.org/r/612594 (https://phabricator.wikimedia.org/T257887) (owner: 10Alexandros Kosiaris) [15:17:49] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [15:17:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:22:48] (03PS1) 10Muehlenhoff: Stop including backports on Stretch production hosts [puppet] - 10https://gerrit.wikimedia.org/r/612612 (https://phabricator.wikimedia.org/T256881) [15:23:50] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single [15:23:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:03] (03CR) 10jerkins-bot: [V: 04-1] Stop including backports on Stretch production hosts [puppet] - 10https://gerrit.wikimedia.org/r/612612 (https://phabricator.wikimedia.org/T256881) (owner: 10Muehlenhoff) [15:25:52] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [15:25:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:06] (03PS2) 10Muehlenhoff: Remove stretch-backports from bootstrapvz config [puppet] - 10https://gerrit.wikimedia.org/r/610121 (https://phabricator.wikimedia.org/T256881) [15:32:49] (03PS1) 10Ema: VTC: override X-Cache iff the origin sends X-Cache-Int-Testing [puppet] - 10https://gerrit.wikimedia.org/r/612616 [15:33:38] 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: PDU Upgrade in C8 (July 14, 2pm-4pm UTC)) - https://phabricator.wikimedia.org/T257871 (10Jclark-ctr) [15:34:38] 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: PDU Upgrade in C8 (July 14, 2pm-4pm UTC)) - https://phabricator.wikimedia.org/T257871 (10Jclark-ctr) [15:36:13] (03PS2) 10Muehlenhoff: Improve error handling if malformed host is given [cookbooks] - 
10https://gerrit.wikimedia.org/r/612591 [15:37:02] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/612603 (owner: 10Jbond) [15:37:51] (03PS10) 10MSantos: charts for push-notification service [deployment-charts] - 10https://gerrit.wikimedia.org/r/602390 (https://phabricator.wikimedia.org/T250493) [15:37:53] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, but let's split this and first do idp-test?" [puppet] - 10https://gerrit.wikimedia.org/r/612605 (owner: 10Jbond) [15:39:25] (03PS2) 10Ema: VTC: override X-Cache iff the origin sends X-Cache-Int-Testing [puppet] - 10https://gerrit.wikimedia.org/r/612616 (https://phabricator.wikimedia.org/T256395) [15:39:42] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/612616 (https://phabricator.wikimedia.org/T256395) (owner: 10Ema) [15:40:39] !log hnowlan@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' . [15:40:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:40:57] PROBLEM - Host ms-be1024 is DOWN: PING CRITICAL - Packet loss = 100% [15:41:57] (03CR) 10Ema: [C: 03+2] VTC: override X-Cache iff the origin sends X-Cache-Int-Testing [puppet] - 10https://gerrit.wikimedia.org/r/612616 (https://phabricator.wikimedia.org/T256395) (owner: 10Ema) [15:45:25] RECOVERY - Host ps1-c8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.22 ms [15:49:25] PROBLEM - ps1-c8-eqiad-infeed-load-tower-A-phase-Z on ps1-c8-eqiad is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [15:49:33] PROBLEM - ps1-c8-eqiad-infeed-load-tower-A-phase-Y on ps1-c8-eqiad is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [15:49:37] PROBLEM - ps1-c8-eqiad-infeed-load-tower-B-phase-Y on ps1-c8-eqiad is CRITICAL: CRITICAL 
- Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [15:49:57] PROBLEM - ps1-c8-eqiad-infeed-load-tower-A-phase-X on ps1-c8-eqiad is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [15:50:13] PROBLEM - ps1-c8-eqiad-infeed-load-tower-B-phase-Z on ps1-c8-eqiad is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [15:50:19] PROBLEM - ps1-c8-eqiad-infeed-load-tower-B-phase-X on ps1-c8-eqiad is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [15:51:18] ^ expected, PDU upgrade today [15:51:37] 10Operations, 10PDF-Rendering, 10Product-Infrastructure-Team-Backlog, 10Proton, and 2 others: PDF renderer needs better CJK font - https://phabricator.wikimedia.org/T226633 (10Mholloway) 05Open→03Resolved a:03Mholloway Proton is now using fonts-noto-cjk and fonts-noto-cjk-extra. [15:52:01] cdanis: hello, if you have a while, could you puppet-merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/609549 for me, please? Thanks! [15:52:44] Urbanecm: I can grab that [15:52:49] ok, thanks! [15:52:59] (03CR) 10RLazarus: [C: 03+2] Update urbanecm's .gitconfig [puppet] - 10https://gerrit.wikimedia.org/r/609549 (owner: 10Urbanecm) [15:53:08] thanks rzl [15:53:14] Urbanecm: you should also look at https://phabricator.wikimedia.org/P8871 [15:53:16] :) [15:54:21] Urbanecm: {{done}} [15:54:28] thanks again :) [15:54:33] no problem :) [15:56:40] !log upgrade spark2 on stat100x to 2.4.4-bin-hadoop2.6-3 [15:56:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:04] mutante, qchris, and paladox: That opportune time is upon us again. 
Time for a Special Gerrit window(Gerrit SSH host keys will change) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200714T1600). [16:03:10] (03PS1) 10CRusnov: Upgrade Netbox to v2.8.7-wmf [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/612624 [16:03:33] (03CR) 10CRusnov: "This change is ready for review." [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/612624 (owner: 10CRusnov) [16:04:58] jouncebot: it's been postponed [16:05:47] 10Operations, 10Performance-Team: webpref1002 server close to have /srv partition full - https://phabricator.wikimedia.org/T257931 (10Krinkle) p:05Triage→03High Further zoomed out: jouncebot: refresh [16:05:55] I refreshed my knowledge about deployments. [16:06:16] 10Operations, 10Performance-Team: webpref1002 server close to have /srv partition full - https://phabricator.wikimedia.org/T257931 (10Krinkle) a:03dpifke [16:06:25] 10Operations, 10Performance-Team: webperf1002 server close to have /srv partition full - https://phabricator.wikimedia.org/T257931 (10Krinkle) [16:06:32] 10Operations, 10Arc-Lamp, 10Performance-Team: webperf1002 server close to have /srv partition full - https://phabricator.wikimedia.org/T257931 (10Krinkle) [16:18:27] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [16:21:49] 10Operations, 10vm-requests: eqiad: 1 VM request for testreduce - https://phabricator.wikimedia.org/T257940 (10Dzahn) [16:22:09] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [16:22:53] 10Operations, 10vm-requests: eqiad: 1 VM request for testreduce - https://phabricator.wikimedia.org/T257940 (10Dzahn) This is supposed to be a VM with buster, to host testreduce (nodejs, formerly on scandium, the parsoid test host on real hardware). More details on T257906 CPU and disk requirements still to... [16:23:07] 10Operations, 10vm-requests: eqiad: 1 VM request for testreduce - https://phabricator.wikimedia.org/T257940 (10Dzahn) a:03Dzahn [16:23:17] 10Operations, 10vm-requests: eqiad: 1 VM request for testreduce - https://phabricator.wikimedia.org/T257940 (10Dzahn) [16:23:19] 10Operations, 10serviceops, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10Dzahn) [16:29:11] PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [16:30:00] (03PS1) 10Chico Venancio: Update brwikimedia logo and add upscaled versions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612629 (https://phabricator.wikimedia.org/T257925) [16:34:07] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:35:59] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:38:53] (03Restored) 10Gergő Tisza: Empty change to test CI [extensions/Wikibase] (wmf/1.35.0-wmf.40) - 10https://gerrit.wikimedia.org/r/610112 (owner: 10Lucas Werkmeister (WMDE)) [16:39:01] (03CR) 10Gergő Tisza: "recheck" [extensions/Wikibase] (wmf/1.35.0-wmf.40) - 10https://gerrit.wikimedia.org/r/610112 (owner: 10Lucas Werkmeister (WMDE)) [16:41:34] Urbanecm: regarding T257925, are the added details sufficient or should I add more? [16:41:37] T257925: Update brwikimedia logo - https://phabricator.wikimedia.org/T257925 [16:41:41] (03CR) 10Majavah: [C: 04-1] "this needs to be split into two patches, see https://wikitech.wikimedia.org/wiki/Wikimedia_site_requests#Change_the_logo_of_a_Wikimedia_wi" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612629 (https://phabricator.wikimedia.org/T257925) (owner: 10Chico Venancio) [16:43:02] chicocvenancio: that helps, could you add a link to the new logo too, please? :) [16:44:35] 10Operations, 10serviceops, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10ssastry) [16:47:30] done, now looking at splitting into two as per Majavah's comment [16:48:01] (03PS3) 10Ahmon Dancy: Allow aptly::repo commands to run as alternate user [puppet] - 10https://gerrit.wikimedia.org/r/611457 (https://phabricator.wikimedia.org/T250157) [16:51:33] RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [16:54:50] am I missing something? Or are the examples in https://wikitech.wikimedia.org/wiki/Wikimedia_site_requests#Change_the_logo_of_a_Wikimedia_wiki in a single patch? 
[16:57:26] chicocvenancio: see the second-last bullet [16:57:38] the logos need to be in one patch and config changes in another [17:00:04] halfak and accraze: #bothumor I ♥ Unicode. All rise for Services – Graphoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200714T1700). [17:05:15] Majavah: yes, I'm not that proficient at gerrit so went to look at examples linked, specially since the "correct" order is not defined there. [17:05:39] am I correct to assume the order should be logos then config? [17:05:48] yes, logos then config [17:05:54] let me see if I can find a more recent example [17:06:11] 10Operations, 10vm-requests: eqiad: 1 VM request for testreduce - https://phabricator.wikimedia.org/T257940 (10ssastry) @Dzahn please note that my handle is @ssastry - I know it is confusing with multiple subbus and almost identical first names. Removing the other subbu from the task. [17:07:22] Majavah: thanks let me see if can wrangle that [17:08:16] one example is https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/592506 and https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/592507 [17:11:11] (03PS1) 10C. Scott Ananian: Bump Parsoid to v0.12.0-a22 [vendor] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612634 (https://phabricator.wikimedia.org/T252448) [17:15:42] 10Operations, 10Arc-Lamp, 10Performance-Team: webperf1002 server close to have /srv partition full - https://phabricator.wikimedia.org/T257931 (10dpifke) The growth is all in the raw log files (~270GB of logs, ~360MB of SVGs, ~310MB for XHGui), which seem to be increasing in size as of the start of this mont... [17:16:09] 10Operations, 10vm-requests: eqiad: 1 VM request for testreduce - https://phabricator.wikimedia.org/T257940 (10ssastry) So, here is how rt-testing works. There is a testreduce server which is nodejs code that needs a connection to a mysql db. There are testreduce clients that run nodejs code to fetch page titl... 
[17:17:04] 10Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 2 others: TY pages in a subdomain of wikipedia and set hide banner cookie - https://phabricator.wikimedia.org/T251780 (10DStrine) [17:17:19] 10Operations, 10Parsoid, 10vm-requests, 10Parsoid-Tests: eqiad: 1 VM request for testreduce - https://phabricator.wikimedia.org/T257940 (10ssastry) [17:17:28] 10Operations, 10Parsoid, 10vm-requests, 10Parsoid-Tests: eqiad: 1 VM request for testreduce - https://phabricator.wikimedia.org/T257940 (10Dzahn) >>! In T257940#6305264, @ssastry wrote: > @Dzahn please note that my handle is @ssastry - I know it is confusing with multiple subbus and almost identical first... [17:17:51] 10Operations, 10Parsoid, 10serviceops, 10Parsoid-Tests, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10ssastry) [17:22:03] (03PS2) 10Chico Venancio: Update brwikimedia logo and add upscaled versions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612629 (https://phabricator.wikimedia.org/T257925) [17:27:44] (03PS1) 10Chico Venancio: Update brwikimedia logo and add upscaled versions (config) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612636 (https://phabricator.wikimedia.org/T257925) [17:31:27] (03PS2) 10C. Scott Ananian: Bump Parsoid to v0.12.0-a22 [vendor] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612634 (https://phabricator.wikimedia.org/T252448) [17:37:58] (03CR) 10Subramanya Sastry: [C: 03+1] "+1 from me. I'll let you all figure out the ordering of operations." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/612568 (https://phabricator.wikimedia.org/T256877) (owner: 10Muehlenhoff) [17:38:37] (03PS4) 10Ahmon Dancy: Allow aptly::repo commands to run as alternate user [puppet] - 10https://gerrit.wikimedia.org/r/611457 (https://phabricator.wikimedia.org/T250157) [17:39:31] (03CR) 10jerkins-bot: [V: 04-1] Allow aptly::repo commands to run as alternate user [puppet] - 10https://gerrit.wikimedia.org/r/611457 (https://phabricator.wikimedia.org/T250157) (owner: 10Ahmon Dancy) [17:40:52] (03PS5) 10Ahmon Dancy: Allow aptly::repo commands to run as alternate user [puppet] - 10https://gerrit.wikimedia.org/r/611457 (https://phabricator.wikimedia.org/T250157) [17:47:08] (03PS3) 10Jdlrobson: Drop main page special casing on all projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612423 (https://phabricator.wikimedia.org/T32405) [17:55:06] (03PS4) 10Jdlrobson: Drop main page special casing on all projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612423 (https://phabricator.wikimedia.org/T32405) [17:55:37] (03PS3) 10Jforrester: Bump Parsoid to v0.12.0-a22 [vendor] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612634 (https://phabricator.wikimedia.org/T252448) (owner: 10C. Scott Ananian) [17:55:47] (03Abandoned) 10Gergő Tisza: Empty change to test CI [extensions/Wikibase] (wmf/1.35.0-wmf.40) - 10https://gerrit.wikimedia.org/r/610112 (owner: 10Lucas Werkmeister (WMDE)) [17:56:34] (03CR) 10Ahmon Dancy: "Puppet compiler results: https://puppet-compiler.wmflabs.org/compiler1001/23888/deployment-deploy01.deployment-prep.eqiad.wmflabs/index.ht" [puppet] - 10https://gerrit.wikimedia.org/r/611457 (https://phabricator.wikimedia.org/T250157) (owner: 10Ahmon Dancy) [17:57:02] (03CR) 10Jforrester: [C: 03+2] Bump Parsoid to v0.12.0-a22 [vendor] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612634 (https://phabricator.wikimedia.org/T252448) (owner: 10C. 
Scott Ananian) [17:59:09] RECOVERY - Host ms-be1024 is UP: PING WARNING - Packet loss = 33%, RTA = 0.16 ms [18:00:04] RoanKattouw, Niharika, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for Morning backport window(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200714T1800). [18:00:04] Jdlrobson: A patch you scheduled for Morning backport window(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:26] Jdlrobson: I can do it if you want? [18:00:40] Some nice symmetry in being shot of it for good. [18:01:44] (03PS5) 10Jforrester: Drop main page special casing on all projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612423 (https://phabricator.wikimedia.org/T32405) (owner: 10Jdlrobson) [18:01:48] (03CR) 10Jforrester: [C: 03+2] Drop main page special casing on all projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612423 (https://phabricator.wikimedia.org/T32405) (owner: 10Jdlrobson) [18:02:59] (03Merged) 10jenkins-bot: Drop main page special casing on all projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612423 (https://phabricator.wikimedia.org/T32405) (owner: 10Jdlrobson) [18:05:44] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T32405 T254287 Stop varying wgMFSpecialCaseMainPage (duration: 01m 05s) [18:05:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:52] T254287: Final warning: Mobile main page special casing will be disabled July - https://phabricator.wikimedia.org/T254287 [18:05:53] T32405: [EPIC] MobileFrontend extension should stop special-casing main page - https://phabricator.wikimedia.org/T32405 [18:07:31] !log jforrester@deploy1001 Synchronized multiversion/MWConfigCacheGenerator.php: T32405 T254287 Stop loading the mobilemainpagelegacy dblist (duration: 01m 05s) [18:07:37] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:07:37] here [18:07:40] thanks James_F [18:08:30] want me to verify? [18:08:57] Jdlrobson: All done, I think. [18:09:02] HURRAH [18:09:06] that feels good [18:09:09] Jdlrobson: Wikis which are broken should fix themselves. [18:09:12] Congratulations. [18:09:36] !log jforrester@deploy1001 Synchronized dblists/: T32405 T254287 Remove the mobilemainpagelegacy dblist (duration: 01m 04s) [18:09:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:10:33] Jdlrobson: Should I leave marking the tasks as Resolved to you? [18:13:02] !log all long-running elasticsearch reindex jobs are complete [18:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:04] (03Merged) 10jenkins-bot: Bump Parsoid to v0.12.0-a22 [vendor] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612634 (https://phabricator.wikimedia.org/T252448) (owner: 10C. Scott Ananian) [18:16:53] PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [18:21:55] 10Operations, 10Traffic: https://blog.wikimedia.org/ returning blank page? 
- https://phabricator.wikimedia.org/T257948 (10Nuria) [18:25:57] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CRITICAL: Slot 3: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Cache: Permanently Disabled - Battery count: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [18:26:00] ACKNOWLEDGEMENT - HP RAID on ms-be1024 is CRITICAL: CRITICAL: Slot 3: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Cache: Permanently Disabled - Battery count: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T257949 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Inform [18:26:04] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1024 - https://phabricator.wikimedia.org/T257949 (10ops-monitoring-bot) [18:29:32] (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/612591 (owner: 10Muehlenhoff) [18:33:09] 10Operations, 10Traffic: https://blog.wikimedia.org/ returning blank page? - https://phabricator.wikimedia.org/T257948 (10Dzahn) This is not hosted on WMF infrastructure anymore. blog.wikimedia.org is an alias for blog-wikimedia-org.go-vip.net. [18:33:30] RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [18:34:28] 10Operations, 10Traffic: https://blog.wikimedia.org/ returning blank page? - https://phabricator.wikimedia.org/T257948 (10Dzahn) But it does not seem to be a general outage at Wordpress VIP either, because techblog.wikimedia.org also points there and still works. 
[18:44:09] (03CR) 10Volans: [C: 03+1] "LGTM" [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/612624 (owner: 10CRusnov) [18:47:51] (03PS1) 10C. Scott Ananian: Bump Parsoid to v0.12.0-a23 [vendor] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612499 [18:48:38] (03CR) 10Jforrester: [C: 03+2] Bump Parsoid to v0.12.0-a23 [vendor] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612499 (owner: 10C. Scott Ananian) [18:50:56] 10Operations, 10Traffic: https://blog.wikimedia.org/ returning blank page? - https://phabricator.wikimedia.org/T257948 (10wkandek) slack conversation in #general indicates it is part of the blog migration and will be addressed soon. [18:52:00] 10Operations, 10Traffic: https://blog.wikimedia.org/ returning blank page? - https://phabricator.wikimedia.org/T257948 (10wkandek) Christopher Koerner:e-mail: 35 minutes ago Once I fix what is broken all the posts from blog.wikimedia.org will be at diff.wikimedia.org with redirects. Greg Varnum:black_squar... [18:53:34] (03PS1) 10Bstorm: cloud-nfs: Allow changing the nfs mount version [puppet] - 10https://gerrit.wikimedia.org/r/612647 (https://phabricator.wikimedia.org/T257945) [19:00:04] James_F and longma: My dear minions, it's time we take the moon! Just kidding. Time for Mediawiki train - European+American Version (secondary timeslot) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200714T1900). [19:00:38] longma: No need for the train (group0 is done), but the parsoid patch will need deploying. I'll do it in about an hour once I'm back. [19:01:39] Okay, thanks James_F. I'll be around then [19:07:25] (03Merged) 10jenkins-bot: Bump Parsoid to v0.12.0-a23 [vendor] (wmf/1.35.0-wmf.41) - 10https://gerrit.wikimedia.org/r/612499 (owner: 10C. 
Scott Ananian) [19:08:32] PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [19:10:22] RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [19:32:28] PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [19:33:43] (03Abandoned) 10C. Scott Ananian: Make GC in PHP 7.2 configurable in Parsoid, but don't change production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/610341 (https://phabricator.wikimedia.org/T257462) (owner: 10C. Scott Ananian) [19:52:42] RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [19:52:46] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.41/vendor/wikimedia/parsoid/: T252448 T255190 Bump Parsoid to v0.12.0-a23 (duration: 01m 06s) [19:52:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:52] T255190: HTML4BlockTags is a lie - https://phabricator.wikimedia.org/T255190 [19:52:52] T252448: Decide what to do with size of parsoid and its dependencies (langconv) vendor/tarballs - https://phabricator.wikimedia.org/T252448 [19:55:34] PROBLEM - Host db1131.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [19:58:11] (03PS1) 10Herron: mx: reject excessive web.de bounces [puppet] - 10https://gerrit.wikimedia.org/r/612653 [19:59:11] (03CR) 10Herron: [C: 03+2] mx: reject excessive web.de bounces [puppet] - 10https://gerrit.wikimedia.org/r/612653 (owner: 10Herron) [20:14:35] (03PS1) 10DCausse: [cirrus] use more neutral config var names [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612655 [20:14:44] (03CR) 10jerkins-bot: [V: 04-1] [cirrus] use more neutral config var names [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612655 (owner: 10DCausse) [20:22:57] 10Operations, 10Wikimedia-General-or-Unknown, 10Wikimedia-SVG-rendering, 10Documentation: Document how to request installing additional SVG and PDF fonts on Wikimedia servers - https://phabricator.wikimedia.org/T228591 (10Mholloway) I can't speak to SVG rendering, but PDF rendering at Wikimedia is actually... [20:44:20] PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [20:46:10] RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [20:50:14] RECOVERY - Host db1131.mgmt is UP: PING OK - Packet loss = 0%, RTA = 18.45 ms [20:50:14] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [21:05:02] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [21:12:19] (03CR) 10Cwhite: "There is a problem in the "Attempting to satisfy build-dependencies" build step on deneb. For some reason, pbuilder is unable to install " [debs/grafana-loki] (debian/sid) - 10https://gerrit.wikimedia.org/r/610864 (owner: 10Cwhite) [21:16:02] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [21:21:05] 10Operations, 10ops-eqiad: Degraded RAID on db1131 - https://phabricator.wikimedia.org/T257983 (10ops-monitoring-bot) [21:23:59] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1131 - https://phabricator.wikimedia.org/T257983 (10Majavah) [21:31:33] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1131 - https://phabricator.wikimedia.org/T257983 (10wiki_willy) [21:31:38] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Degraded RAID on db1131 - https://phabricator.wikimedia.org/T257253 (10wiki_willy) [21:31:52] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1131 - https://phabricator.wikimedia.org/T257983 (10wiki_willy) a:03Jclark-ctr Duplicate task of T257253 [21:32:50] 10Operations, 10ops-eqiad: Interface errors on asw2-b-eqiad:ge-5/0/35 (kubernetes1010) - https://phabricator.wikimedia.org/T257542 (10wiki_willy) a:03Cmjohnson [21:33:06] 10Operations, 10ops-eqiad: Interface errors on asw2-d-eqiad:xe-7/0/0 (ms-be1037) - https://phabricator.wikimedia.org/T257541 (10wiki_willy) a:03Cmjohnson [21:37:14] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1024 - https://phabricator.wikimedia.org/T257949 (10wiki_willy) a:03Cmjohnson @Cmjohnson - looks like this one is out of warranty. (purchased in June 2016) Let me know if you have any spares onsite or if you need a part ordered. Thanks, Willy [21:55:57] PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [21:59:39] RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [21:59:52] 10Operations, 10ops-codfw: wtp2005 hardware issue - https://phabricator.wikimedia.org/T257903 (10wiki_willy) a:03Papaul @akosiaris - looks like this server is past the 5yr server life cycle, and was due to be refreshed via T231255. Let us know if we can ignore this alert. Thanks, Willy [22:03:06] (03PS2) 10Bstorm: cloud-nfs: Allow changing the nfs mount version [puppet] - 10https://gerrit.wikimedia.org/r/612647 (https://phabricator.wikimedia.org/T257945) [22:04:19] (03CR) 10jerkins-bot: [V: 04-1] cloud-nfs: Allow changing the nfs mount version [puppet] - 10https://gerrit.wikimedia.org/r/612647 (https://phabricator.wikimedia.org/T257945) (owner: 10Bstorm) [22:07:47] (03PS3) 10Bstorm: cloud-nfs: Allow changing the nfs mount version [puppet] - 10https://gerrit.wikimedia.org/r/612647 (https://phabricator.wikimedia.org/T257945) [22:11:14] (03PS5) 10Addshore: Commons: Define entity sources configuration (take 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609987 (https://phabricator.wikimedia.org/T256906) [22:11:16] (03PS4) 10Addshore: Wikidata client wikis: Define entity sources configuration (take 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609988 (https://phabricator.wikimedia.org/T254315) [22:11:18] (03PS1) 10Addshore: Wikibase: Split localEntitySourceName config for repo and client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612666 (https://phabricator.wikimedia.org/T254315) [22:11:20] (03PS1) 10Addshore: Wikibase labs: All client "local" entity sources are wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612667 (https://phabricator.wikimedia.org/T254315) [22:11:22] (03PS1) 10Addshore: Wikibase test: Client local entity sources are always testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612668 (https://phabricator.wikimedia.org/T254315) [22:12:14] (03CR) 10jerkins-bot: [V: 
04-1] Wikidata client wikis: Define entity sources configuration (take 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609988 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:17:12] (03PS5) 10Addshore: Wikidata client wikis: Define entity sources configuration (take 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609988 (https://phabricator.wikimedia.org/T254315) [22:17:14] (03PS1) 10Addshore: Wikidata test: Split client db lists. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612669 (https://phabricator.wikimedia.org/T254315) [22:18:33] (03CR) 10jerkins-bot: [V: 04-1] Wikidata client wikis: Define entity sources configuration (take 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609988 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:18:35] (03CR) 10jerkins-bot: [V: 04-1] Wikidata test: Split client db lists. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612669 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:18:43] (03PS1) 10Addshore: Wikibase: remove wmgWikibaseLocalEntitySourceName [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612670 (https://phabricator.wikimedia.org/T254315) [22:19:44] (03CR) 10jerkins-bot: [V: 04-1] Wikibase: remove wmgWikibaseLocalEntitySourceName [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612670 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:19:49] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [22:21:12] (03PS2) 10Addshore: Wikidata test: Split client db lists. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/612669 (https://phabricator.wikimedia.org/T254315) [22:21:19] (03PS6) 10Addshore: Wikidata client wikis: Define entity sources configuration (take 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609988 (https://phabricator.wikimedia.org/T254315) [22:21:24] (03PS2) 10Addshore: Wikibase: remove wmgWikibaseLocalEntitySourceName [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612670 (https://phabricator.wikimedia.org/T254315) [22:30:55] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [22:32:37] (03PS1) 10Andrew Bogott: cloud puppetmaster frontend: don't double-include packages on VMs [puppet] - 10https://gerrit.wikimedia.org/r/612674 (https://phabricator.wikimedia.org/T242607) [22:33:23] (03PS3) 10Addshore: Wikidata test: Split client db lists. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/612669 (https://phabricator.wikimedia.org/T254315) [22:33:25] (03PS6) 10Addshore: Commons: Define entity sources configuration (take 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609987 (https://phabricator.wikimedia.org/T256906) [22:33:27] (03PS7) 10Addshore: Wikidata client wikis: Define entity sources configuration (take 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609988 (https://phabricator.wikimedia.org/T254315) [22:33:29] (03PS3) 10Addshore: Wikibase: remove wmgWikibaseLocalEntitySourceName [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612670 (https://phabricator.wikimedia.org/T254315) [22:34:13] (03CR) 10Andrew Bogott: [C: 03+2] cloud puppetmaster frontend: don't double-include packages on VMs [puppet] - 10https://gerrit.wikimedia.org/r/612674 (https://phabricator.wikimedia.org/T242607) (owner: 10Andrew Bogott) [22:34:35] (03PS1) 10RLazarus: Don't build and run the tests when compiling Envoy, just the binary. [debs/envoyproxy] - 10https://gerrit.wikimedia.org/r/612675 (https://phabricator.wikimedia.org/T256843) [22:36:21] (03CR) 10Addshore: [C: 03+1] "0 diff looks good 😎" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612666 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:37:18] (03CR) 10Addshore: [C: 03+1] "diff appears good to me" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612667 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:38:20] (03CR) 10Majavah: [C: 04-1] "Afaics this shouldn't touch portals" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612666 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:38:41] (03CR) 10Addshore: [C: 03+1] "dam rebase.. let me fix that.."
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/612666 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:39:28] (03PS2) 10Addshore: Wikibase: Split localEntitySourceName config for repo and client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612666 (https://phabricator.wikimedia.org/T254315) [22:39:58] (03CR) 10Addshore: [C: 03+1] "diff looks good, portals change fixed" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612666 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:40:07] (03PS2) 10Addshore: Wikibase labs: All client "local" entity sources are wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612667 (https://phabricator.wikimedia.org/T254315) [22:40:13] (03PS2) 10Addshore: Wikibase test: Client local entity sources are always testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612668 (https://phabricator.wikimedia.org/T254315) [22:40:18] (03PS4) 10Addshore: Wikidata test: Split client db lists. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612669 (https://phabricator.wikimedia.org/T254315) [22:40:25] (03PS7) 10Addshore: Commons: Define entity sources configuration (take 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609987 (https://phabricator.wikimedia.org/T256906) [22:40:30] (03PS8) 10Addshore: Wikidata client wikis: Define entity sources configuration (take 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609988 (https://phabricator.wikimedia.org/T254315) [22:40:36] (03PS4) 10Addshore: Wikibase: remove wmgWikibaseLocalEntitySourceName [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612670 (https://phabricator.wikimedia.org/T254315) [22:43:46] (03PS5) 10Addshore: Wikidata test: Split client db lists. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/612669 (https://phabricator.wikimedia.org/T254315) [22:43:48] (03PS3) 10Addshore: Wikibase test: Client local entity sources are always testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612668 (https://phabricator.wikimedia.org/T254315) [22:43:50] (03PS8) 10Addshore: Commons: Define entity sources configuration (take 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609987 (https://phabricator.wikimedia.org/T256906) [22:43:53] (03PS9) 10Addshore: Wikidata client wikis: Define entity sources configuration (take 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609988 (https://phabricator.wikimedia.org/T254315) [22:43:54] *changes order again* ... [22:43:55] (03PS5) 10Addshore: Wikibase: remove wmgWikibaseLocalEntitySourceName [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612670 (https://phabricator.wikimedia.org/T254315) [22:44:46] (03CR) 10Addshore: [C: 04-1] "Needs to be reviewed and altered as this is now after the wikidata client db list split" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612668 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:45:09] addshore: jerkins hates you [22:45:14] <3 [22:45:59] (03CR) 10Addshore: [C: 04-1] Wikidata test: Split client db lists. (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612669 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:46:42] (03PS6) 10Addshore: Wikidata test: Split client db lists. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612669 (https://phabricator.wikimedia.org/T254315) [22:49:23] (03CR) 10Addshore: [C: 03+1] "diff seems to correctly be showing some changes." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/612669 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:49:30] (03PS4) 10Addshore: Wikibase test: Client local entity sources are always testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612668 (https://phabricator.wikimedia.org/T254315) [22:50:31] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 53 probes of 566 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [22:51:58] (03PS1) 10Mstyles: add logout config for wcqs [puppet] - 10https://gerrit.wikimedia.org/r/612681 (https://phabricator.wikimedia.org/T257314) [22:52:11] (03PS5) 10Addshore: Wikibase test: Client local entity sources are always testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612668 (https://phabricator.wikimedia.org/T254315) [22:52:22] (03CR) 10Addshore: [C: 03+1] "Diff looks good" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612668 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:53:18] (03PS9) 10Addshore: Commons: Define entity sources configuration (take 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609987 (https://phabricator.wikimedia.org/T256906) [22:53:27] (03PS10) 10Addshore: Wikidata client wikis: Define entity sources configuration (take 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609988 (https://phabricator.wikimedia.org/T254315) [22:54:00] (03PS6) 10Addshore: Wikibase: remove wmgWikibaseLocalEntitySourceName [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612670 (https://phabricator.wikimedia.org/T254315) [22:54:53] (03CR) 10Addshore: [C: 03+1] "diff looks good" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609987 (https://phabricator.wikimedia.org/T256906) (owner: 10Addshore) [22:56:19] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 
is OK: OK - failed 45 probes of 566 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [22:57:35] (03PS11) 10Addshore: Wikidata client wikis: Define entity sources configuration (take 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609988 (https://phabricator.wikimedia.org/T254315) [22:59:30] (03CR) 10Addshore: [C: 03+1] "diff looks good" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609988 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [22:59:38] (03PS7) 10Addshore: Wikibase: remove wmgWikibaseLocalEntitySourceName [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612670 (https://phabricator.wikimedia.org/T254315) [23:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Evening backport window(Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200714T2300). [23:00:58] (03CR) 10Addshore: [C: 03+1] "diff looks good (no change)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612670 (https://phabricator.wikimedia.org/T254315) (owner: 10Addshore) [23:08:09] PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [23:15:33] RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [23:23:17] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [23:23:24] (03PS1) 10BryanDavis: cloud: [mwv] Use NFSv4 by default LXC+Vagrant [puppet] - 10https://gerrit.wikimedia.org/r/612682 (https://phabricator.wikimedia.org/T257855) [23:25:07] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [23:25:34] (03CR) 10BryanDavis: "Works in tandem with I10cd90b2d6fd5664333e45b4ca87975ec8d88b18 to tell Vagrant to use NFSv4 on a new Cloud VPS instance using role::labs::" [puppet] - 10https://gerrit.wikimedia.org/r/612682 (https://phabricator.wikimedia.org/T257855) (owner: 10BryanDavis) [23:32:15] 10Operations, 10FR-MW-Vagrant, 10Fundraising-Backlog, 10MediaWiki-Vagrant: Package XDebug 2.9 for apt.wikimedia.org - https://phabricator.wikimedia.org/T220406 (10Tgr) What can be done to get this moving? [23:37:47] PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [23:48:53] RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [23:58:23] 10Operations, 10ops-codfw, 10netops: (Need by: ) codfw:rack/setup/new management switches - https://phabricator.wikimedia.org/T253154 (10Papaul)