[00:16:00] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2326.codfw.wmnet'] ` and were **ALL** successful.
[00:18:11] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2327.codfw.wmnet'] ` and were **ALL** successful.
[01:07:51] 10serviceops, 10Release-Engineering-Team: Enable phpdbg on mwdebug* servers - https://phabricator.wikimedia.org/T244549 (10Jdforrester-WMF)
[01:08:39] 10serviceops, 10Release-Engineering-Team: Enable phpdbg on mwdebug* servers - https://phabricator.wikimedia.org/T244549 (10EBernhardson) In terms of actual deployment I think we can simply install the php-phpdbg package (available from our php7.2 deb component) and adjust MWScript.php to allow the 'phpdbg' SAP...
[01:10:45] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2328.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[01:13:18] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2329.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[01:32:42] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2328.codfw.wmnet'] ` and were **ALL** successful.
[01:57:44] 10serviceops, 10WMF-JobQueue, 10Core Platform Team Workboards (Clinic Duty Team), 10Patch-For-Review: Jobrunner monitoring still calles /rpc/runJobs.php - https://phabricator.wikimedia.org/T243096 (10Pchelolo) Let's hold on this one until we finish T220127 cause we will change how the job execution is call...
[02:13:33] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2329.codfw.wmnet'] ` Of which those **FAILED**: ` ['mw2329.codfw.wmnet'] `
[02:15:38] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2329.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[02:32:19] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2330.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[03:15:53] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2329.codfw.wmnet'] ` Of which those **FAILED**: ` ['mw2329.codfw.wmnet'] `
[03:25:15] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2329.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[03:32:35] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2330.codfw.wmnet'] ` Of which those **FAILED**: ` ['mw2330.codfw.wmnet'] `
[03:34:23] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2330.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[03:47:05] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2329.codfw.wmnet'] ` and were **ALL** successful.
[03:51:05] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2321.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[03:53:22] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2321.codfw.wmnet'] ` Of which those **FAILED**: ` ['mw2321.codfw.wmnet'] `
[03:56:07] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2331.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[03:57:21] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2330.codfw.wmnet'] ` and were **ALL** successful.
[03:59:29] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2332.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[04:18:10] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2331.codfw.wmnet'] ` and were **ALL** successful.
[04:21:26] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2332.codfw.wmnet'] ` and were **ALL** successful.
[04:31:28] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2333.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[04:33:21] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2334.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[04:56:02] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2333.codfw.wmnet'] ` and were **ALL** successful.
[05:33:53] 10serviceops, 10Operations, 10ops-codfw: rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2334.codfw.wmnet'] ` Of which those **FAILED**: ` ['mw2334.codfw.wmnet'] `
[08:20:15] 10serviceops, 10Wikifeeds, 10Patch-For-Review, 10Wikimedia-Incident: wikifeeds - fix the CPU limits so that it doesn't get starved - https://phabricator.wikimedia.org/T244535 (10akosiaris) p:05Triage→03High a:03akosiaris
[08:21:34] !log deploy https://gerrit.wikimedia.org/r/570726 T244535 to avoid CPU throttling of wikifeeds
[08:21:51] wrong channel
[08:27:28] 10serviceops, 10Wikifeeds, 10Patch-For-Review, 10Wikimedia-Incident: wikifeeds - fix the CPU limits so that it doesn't get starved - https://phabricator.wikimedia.org/T244535 (10akosiaris) https://grafana.wikimedia.org/d/35vIuGpZk/wikifeeds?orgId=1&from=1581018813182&to=1581025628155&var-dc=eqiad%20prometh...
[08:50:01] 10serviceops, 10Patch-For-Review: Add x-request-id to httpd (apache) logs - https://phabricator.wikimedia.org/T244545 (10ArielGlenn) Adding @Ottomata as a heads up that these log lines will have an additional element in them, in case that impacts analytics processing.
[09:31:23] 10serviceops, 10Operations, 10observability, 10vm-requests: Provision grafana VM in codfw - https://phabricator.wikimedia.org/T244357 (10fgiunchedi)
[09:32:03] 10serviceops, 10Operations, 10observability, 10vm-requests: Provision grafana VM in codfw - https://phabricator.wikimedia.org/T244357 (10fgiunchedi) >>! In T244357#5853220, @Dzahn wrote: > added vm-requests tag and pasted vm-request form. please add the missing data above. Done, thank you!
[09:39:28] 10serviceops, 10Wikifeeds, 10Wikimedia-Incident: wikifeeds - fix the CPU limits so that it doesn't get starved - https://phabricator.wikimedia.org/T244535 (10akosiaris) Limits have been increased to 2.5 cores. However the app is still mildly throttled [1]. Given the limits is 1.5 times more than the current...
[09:46:35] 10serviceops, 10Wikifeeds, 10Wikimedia-Incident: wikifeeds - fix the CPU limits so that it doesn't get starved - https://phabricator.wikimedia.org/T244535 (10akosiaris) We are definitely better than what we used to be, but I am still not happy. I 'll increase the capacity as well, from 4 pods to 6 pods, that...
[10:27:23] 10serviceops, 10Operations: No mw canary servers in codfw - https://phabricator.wikimedia.org/T242606 (10jijiki) Given we have 5 canary appservers in eqiad + 2 debug servers, I would recommend we add another 2 in codfw
[10:31:56] 10serviceops, 10Operations: Test and deploy mcrouter 0.41 - https://phabricator.wikimedia.org/T244476 (10jijiki) p:05Triage→03Medium
[10:39:54] 10serviceops, 10Operations: No mw canary servers in codfw - https://phabricator.wikimedia.org/T242606 (10Urbanecm) @jijiki Don't we have mwdebug2001 and mwdebug2002 in codfw too?
[10:42:55] 10serviceops, 10Operations: No mw canary servers in codfw - https://phabricator.wikimedia.org/T242606 (10jijiki) @Urbanecm they do not get user traffic, so they are good enough for testing, but not good enough for canary deploys
[10:44:42] 10serviceops, 10Operations: No mw canary servers in codfw - https://phabricator.wikimedia.org/T242606 (10Urbanecm) Is that different from what eqiad debug servers do? I'm trying to understand why you said "Given we have 5 canary appservers in eqiad //+ 2 debug servers//" (emphasis mine).
[10:51:17] 10serviceops, 10Operations: No mw canary servers in codfw - https://phabricator.wikimedia.org/T242606 (10jijiki) @Urbanecm yes, so that is a total of 7 canary app servers in eqiad, of which 5 get real user traffic. Since we will be switching to codfw, it makes sense to have a similar setup in codfw.
[10:52:08] 10serviceops, 10Operations: No mw canary servers in codfw - https://phabricator.wikimedia.org/T242606 (10Urbanecm) @jijiki Got it, thanks!
[10:57:49] 10serviceops, 10Operations, 10Scap: Make canary wait time configurable - https://phabricator.wikimedia.org/T217924 (10jijiki) shall we move this forward?
[11:05:44] 10serviceops, 10Wikifeeds, 10Wikimedia-Incident: wikifeeds - fix the CPU limits so that it doesn't get starved - https://phabricator.wikimedia.org/T244535 (10akosiaris) 05Open→03Resolved The capacity increase did not fix anything, neither did some efforts with increasing requests/limits more. In fact the...
[14:10:12] 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team): restrouter.svc.{eqiad,codfw}.wmnet in a failed state - https://phabricator.wikimedia.org/T242461 (10akosiaris)
[14:18:46] 10serviceops, 10Patch-For-Review: Add x-request-id to httpd (apache) logs - https://phabricator.wikimedia.org/T244545 (10Ottomata) Not I, but I'm all for x-request-id everywhere! Related: {T235773}
[14:53:13] _joe_ akosiaris apergos effie mark mutante: I assume we're postponing the meeting for the incident
[14:53:25] ugh that's in 7 mins isn't it
[14:53:40] can we do it on monday same time?
[14:53:56] yeah, that please ^
[14:54:00] and merge it with the status one?
[14:55:19] i can't make it on monday
[15:03:50] apergos: I am here but I can't join the meeting :/
[15:04:06] mutante- we are recovering from two outages
[15:04:10] mutante-: I think we're postponing, sorry
[15:04:12] and still doing postmortem
[15:04:13] Laptop shut down and don't have the power adapter
[15:04:18] Ugh.. ok
[15:04:47] Ok. I will go back home to fetch the adapter
[15:05:10] On phone with webchat right now
[15:24:15] <_joe_> so what's the plan?
[15:24:21] <_joe_> ok
[15:25:00] I would rather meet today than Monday, if we're still able
[15:25:07] but if we're meeting Monday, let's just keep our original timeslot
[15:25:45] +1 to today if possible
[15:34:19] i guess not today since joe is trying to rest a little
[15:34:44] if we go monday we go without ma rk
[15:35:07] yeah, we need both of them present to make a decision
[15:35:20] I guess it's Tuesday
[15:41:57] for a split second there I was about to type 'why don't we just use today's status meeting for it?' ....
[15:42:04] this friday is too long already
[15:42:32] _joe_: are you in for a meeting now, or next week?
[15:44:36] <_joe_> I'm ok either way
[15:44:47] <_joe_> as long as we keep it to 30 minutes or so
[15:45:27] definitely won't be longer than that -- but if you're drained from the outage, I'd rather wait and have you fresh
[15:45:28] up to you though
[15:45:53] i can meet now
[15:46:00] sounds like maybe we should
[15:46:05] and get it over with
[15:47:24] akosiaris: effie: up for that?
[15:47:50] if we keep it 30'
[15:47:59] I am up
[15:48:09] ok
[15:48:18] <_joe_> at what time?
[15:48:21] nowish
[15:48:25] as soon as we hear from alex?
[15:48:26] <_joe_> in 12 minutes?
[15:48:29] <_joe_> ok
[15:48:30] I am around
[15:48:32] in 12?
[15:48:37] I thought now
[15:48:46] if we're all here we can start now :)
[15:48:47] fine by me. Even now
[15:48:48] in 2' ?
[15:48:50] coom
[15:48:52] cool
[15:49:04] ok be there in a sec
[15:49:04] I'm getting into the Meet room from our 1500 meeting
[15:50:01] <_joe_> I had to relogin
[15:50:05] ditto
[15:50:08] it is me reuven and his keyboard
[15:50:10] where is my hangout tab damnit
[15:50:13] i have like a thousand open
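[editor's note] The wikifeeds CPU-throttling thread above (T244535) hinges on Kubernetes CFS bandwidth control: when a container hits its CPU limit within a scheduling period, the kernel throttles it and bumps counters in the cgroup's `cpu.stat` file. A minimal sketch of the arithmetic behind "the app is still mildly throttled", using made-up sample counters rather than real incident data (on a live host the file sits under the container's cgroup, e.g. `/sys/fs/cgroup/cpu,cpuacct/.../cpu.stat`):

```shell
# Sample cgroup v1 cpu.stat contents (illustrative values only).
cat > /tmp/cpu.stat <<'EOF'
nr_periods 132480
nr_throttled 4120
throttled_time 58329000000
EOF

# Percentage of CFS periods in which the container exhausted its CPU quota
# and was throttled: 100 * nr_throttled / nr_periods.
awk '/^nr_periods/ {p = $2} /^nr_throttled/ {t = $2} END {printf "%.1f%% of periods throttled\n", 100 * t / p}' /tmp/cpu.stat
```

The Grafana dashboard linked in the task presumably plots the same ratio via the cAdvisor metrics `container_cpu_cfs_throttled_periods_total` and `container_cpu_cfs_periods_total`; raising the limit lowers the ratio but, as the thread found, does not necessarily drive it to zero.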