[14:17:10] so serviceops meeting this week... at the usual thu slot tomorrow it's overlapping with the staff meeting, so that's not really an option
[14:17:20] i guess we should use our slot today for those who can make it
[14:19:01] do we need the full 45 minutes? are we doing more annual planning?
[14:19:08] serviceops, Analytics, Analytics-Kanban, Event-Platform: Create production and canary releases for existent eventgate helmfile services - https://phabricator.wikimedia.org/T245203 (akosiaris)
[14:19:14] or is the overlap more than usual this time?
[14:19:44] serviceops, Analytics, Analytics-Kanban, Event-Platform: Create production and canary releases for existent eventgate helmfile services - https://phabricator.wikimedia.org/T245203 (akosiaris) I've merged https://gerrit.wikimedia.org/r/576402 and manually cleaned the various IPVS services that we...
[14:20:53] what overlap?
[14:21:58] I can't really make it fwiw
[14:22:13] I'd be happy to do extra time tomorrow but can't do today
[14:23:02] "it's overlapping with the staff meeting" you said, normally the overlap is 15 minutes, is it more than that this time?
[14:23:17] otherwise can we make do with the half hour?
[14:25:32] it's overlapping with the technology all hands meeting tomorrow
[14:25:48] how about this
[14:25:55] for those who can make it, add agenda items to the pad now
[14:26:01] and then we decide if we want to skip this week or not?
[14:30:46] I can be there today, but I don't have any topics myself
[14:52:21] serviceops, Analytics, Analytics-Kanban, Event-Platform: Create production and canary releases for existent eventgate helmfile services - https://phabricator.wikimedia.org/T245203 (Ottomata) YES THANK YOU!
[14:53:39] serviceops, Parsing-Team, Parsoid, PHP 7.2 support: Parsoid-php doesn't get updated after a code deploy - https://phabricator.wikimedia.org/T236275 (cscott) >All concluded, I think we have the following options to solve the issue: > >1. Make scap3 restart php-fpm >2. Disable the realpath cache...
[15:22:13] o/
[15:22:27] Is there a way to limit the rate of a certain job execution?
[15:23:57] yes
[15:24:14] there is a yaml, I think, although not sure with how much granularity
[15:24:22] * addshore searches around puppet
[15:24:41] I know I asked to do some changes in the past, but I cannot remember the details
[15:25:07] people here will know more
[15:29:16] found it https://github.com/wikimedia/mediawiki-services-change-propagation-jobqueue-deploy/blob/master/scap/vars.yaml#L27 :D
[15:44:40] that's it, yes
[15:53:23] re meeting, count me as one more "I can make it but no topics to add"
[15:56:07] ok
[15:56:12] i think we can skip it then
[15:56:21] i will remove it from the calendar
[15:58:09] 👍
[16:00:42] we could... do tomorrow 5pm to 5:45 pm UTC (I am kicking myself for even suggesting this after back to back meetings already but it is possible)
[16:02:43] akosiaris: thak setting installed: false on the releases I want to destroy just makes them not known by helmfile
[16:03:12] instead, can I do e.g. helmfile --selector name=analytics destroy
[16:03:12] ?
[16:03:28] (oh pinged you in wrong channel before :p)
[16:04:16] <_joe_> oh you just freed my next hour, thanks :)
[16:04:18] ottomata: that should work as well. Whatever suits you. the installed: false idea just doesn't have you work through the flags of the helmfile incantation
[16:04:38] * akosiaris on the move
[16:04:50] hm, but how do I get rid of the pods if installed: false? ok will do destroy!
[16:13:05] serviceops, Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: Create production and canary releases for existent eventgate helmfile services - https://phabricator.wikimedia.org/T245203 (Ottomata)
[16:16:23] serviceops, Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: Create production and canary releases for existent eventgate helmfile services - https://phabricator.wikimedia.org/T245203 (Ottomata)
[16:21:17] serviceops, Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: Create production and canary releases for existent eventgate helmfile services - https://phabricator.wikimedia.org/T245203 (Ottomata)
[16:22:23] serviceops, Operations, ops-eqiad: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (Cmjohnson)
[16:22:35] serviceops, Analytics, Analytics-Kanban: Clarify multi-service instance concepts in helm charts and enable canary releases - https://phabricator.wikimedia.org/T242861 (Ottomata) @akosiaris for my purposes I'm satisfied, but I'm not sure we settled this totally for other charts. Most controversial wa...
[16:25:13] serviceops, Operations, ops-eqiad: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (Cmjohnson) @dzahn or whoever needs these, all of them with the exception of mw1403 is ready for service implementation. mw1403 is not installing and I a...
[16:32:58] serviceops, Operations, ops-eqiad: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` mw1403.eqiad.wmnet ` The log can be found in `...
[16:35:07] ok i'm at the add discovery entries for eventgate-analytics-external step
[16:35:10] looks simple enough
[16:35:30] confctl --object-type discovery select 'dnsdisc=eventgate-analytics-external' set/pooled=true
[16:35:40] and then merge and apply dns entries: https://gerrit.wikimedia.org/r/c/operations/dns/+/573367
[16:35:43] right?
[16:35:57] akosiaris or _joe_ ok if I proceed? ^
[16:36:52] <_joe_> +1 assuming the dnsdisc is correct
[16:48:55] _joe_: confctl from conf1004, yes?
[16:49:31] <_joe_> ottomata: rather from cumin1001
[16:49:34] ah k
[16:49:38] <_joe_> or puppetmaster1001
[16:54:13] serviceops, Operations, ops-eqiad: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1403.eqiad.wmnet'] ` and were **ALL** successful.
[16:55:10] serviceops, Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: Create production and canary releases for existent eventgate helmfile services - https://phabricator.wikimedia.org/T245203 (Ottomata)
[16:59:21] serviceops, Operations: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (Cmjohnson) a:Cmjohnson→jijiki @jijiki all servers are now ready for implementation. I am removing the ops-eqiad tag and assigned to you
[17:02:38] serviceops, Operations: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (Dzahn) a:jijiki→None Thanks @Cmjohnson! I'll take that as jijiki is currently away.
[17:02:49] serviceops, Operations: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (Dzahn) a:Dzahn
[17:27:49] hm, something not right with discovery
[17:30:52] confctl looks ok
[17:30:54] $ sudo confctl --object-type discovery select 'dnsdisc=eventgate-analytics-external' get
[17:30:54] {"codfw": {"references": [], "ttl": 300, "pooled": true}, "tags": "dnsdisc=eventgate-analytics-external"}
[17:30:54] {"eqiad": {"references": [], "ttl": 300, "pooled": true}, "tags": "dnsdisc=eventgate-analytics-external"}
[17:33:39] serviceops, Parsing-Team, Parsoid, PHP 7.2 support: Parsoid-php doesn't get updated after a code deploy - https://phabricator.wikimedia.org/T236275 (cscott) The underlying issues here might resurface as we reconfigure beta due to {T246833}/{T246854}. (This comment is mostly just to mention these...
[18:51:49] nm, it is working!
[18:51:59] i guess it just took a while to propagate
[22:11:18] serviceops, Operations, Patch-For-Review: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (ops-monitoring-bot) Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 7 host(s) and their services with reason: new_install ` mw[1393-1399].eqi...
[22:11:23] serviceops, Operations, Patch-For-Review: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (ops-monitoring-bot) Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 5 host(s) and their services with reason: new_install ` mw[1400-1404].eqi...
[22:14:11] serviceops, Core Platform Team, Performance-Team, Release-Engineering-Team-TODO, and 5 others: Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (Urbanecm)
[22:18:59] serviceops, Core Platform Team, Performance-Team, Release-Engineering-Team-TODO, and 5 others: Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (Jdforrester-WMF)
[22:56:49] serviceops, Operations, Patch-For-Review: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (ops-monitoring-bot) Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 7 host(s) and their services with reason: new_install ` mw[1393-1399].eqi...
[22:59:18] serviceops, Operations, Patch-For-Review: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (ops-monitoring-bot) Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 5 host(s) and their services with reason: new_install ` mw[1400-1404].eqi...
[23:37:48] when adding new eqiad appservers.. envoyproxy is in state failed
[23:38:27] looking now why that is
[23:49:48] serviceops, Operations, ops-codfw, Patch-For-Review: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (Papaul)