[05:41:18] serviceops, Deployments, Release-Engineering-Team, Performance-Team (Radar): Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (thcipriani) ## Simplified Recreation I was able to recreate this issue with a simplified example. This is a simpli...
[06:49:23] serviceops, Deployments, Release-Engineering-Team, Performance-Team (Radar): Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 (Joe) >>! In T236104#5910908, @thcipriani wrote: > Another possibility is to restart php-fpm whenever we update `Ini...
[11:08:00] serviceops, Operations, ops-eqiad: mw1280 crashed logging correctable memory errors - https://phabricator.wikimedia.org/T240187 (jijiki)
[11:42:35] serviceops, Release-Engineering-Team-TODO, Scap: Deploy scap 3.13.0-1 - https://phabricator.wikimedia.org/T245530 (jijiki) Open→Resolved Let me know if we come across any issues:)
[13:11:21] serviceops, Operations, Release-Engineering-Team: mcrouter proxies and scap proxies - https://phabricator.wikimedia.org/T245841 (jbond) p:Triage→Medium
[13:11:55] serviceops, Operations, Parsoid-PHP, SRE-Access-Requests: Give all members of the Parsing team production `deployment` access - https://phabricator.wikimedia.org/T245877 (jbond) p:Triage→Medium
[13:12:12] serviceops, Core Platform Team, DC-Ops, Operations: Rename wtp* servers to parsoid* (Parsoid PHP servers) - https://phabricator.wikimedia.org/T245888 (jbond) p:Triage→Medium
[15:41:32] serviceops, CPT Initiatives (Core REST API in PHP), Core Platform Team Workboards (Green): Move CORE REST API to be served from the MW API Cluster - https://phabricator.wikimedia.org/T246002 (WDoranWMF)
[15:41:43] serviceops, CPT Initiatives (Core REST API in PHP), Core Platform Team Workboards (Green): Move CORE REST API to be served from the MW API Cluster - https://phabricator.wikimedia.org/T246002 (WDoranWMF) p:Triage→Medium
[16:16:26] <_joe_> ottomata: https://wikitech.wikimedia.org/wiki/LVS#Add_a_new_load_balanced_service
[16:16:42] <_joe_> and https://wikitech.wikimedia.org/wiki/LVS#Remove_a_load_balanced_service
[16:29:37] COOOOL
[16:29:37] reading
[16:29:54] whiich reminds me mr. akosiaris helllloooooo :) https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/573624
[18:25:46] <_joe_> Notice: /Stage[main]/Envoyproxy/Systemd::Service[envoyproxy.service]/Service[envoyproxy.service]: Triggered 'refresh' from 1 event
[18:25:50] <_joe_> \o/
[18:26:06] ;)
[18:26:15] <_joe_> but sadly it fails to start
[18:26:32] <_joe_> the verification step works, but clearly doesn't catch all the corner cases
[18:27:38] <_joe_> unable to open file '/var/log/envoy/echostore.log': Permission denied
[18:28:09] <_joe_> sigh ofc
[18:28:31] aww. at least more obvious why i guess
[18:37:50] <_joe_> there must be some race condition I don't understand at the moment
[18:39:17] <_joe_> but when envoy starts from puppet it seems to create the log files owned by root:root, while if I just do sudo systemctl start envoyproxy.service the files get created with the correct user
[18:39:40] <_joe_> uh actually, just the first time
[18:42:26] deploying apache cluster config change.. adds gr.wikimedia.org to list of vhosts..not more
[18:45:42] <_joe_> I seem to have done something wrong with TLS configuration of upstream clusters
[18:46:00] <_joe_> rlazarus: if you want to take a look on say mwdebug2001 when you're back
[18:46:04] <_joe_> I'm going afk now
[18:47:45] duplicate cluster 'api-ro'
[18:47:58] is what it complains about when checking envoy config on mwdebug1001
[18:48:23] from [mwdebug1001:~] $ /usr/local/sbin/build-envoy-config -c '/etc/envoy'
[18:49:02] <_joe_> mutante: yeah I still need to clean up there
[18:49:20] *nod*
[18:49:21] <_joe_> so the problem is I set tls minimum protocol version to tls 1.3
[18:49:29] <_joe_> this can be fixed tomorrow though
[18:49:43] ok
[18:49:56] <_joe_> also debugged, as eventgate-main has envoy as a tls terminator so it should be able to speak tls 1.3
[18:50:04] <_joe_> if it doesn't we need to understand why
[18:51:04] <_joe_> the reason to use tls 1.3 internally is to reduce the overhead
[18:51:17] <_joe_> but also, not every cluster is still able to speak tls 1.3, heh
[18:51:27] <_joe_> like echostore and sessionstore probably can't
[18:52:35] <_joe_> so yeah this still needs some tuning, but overall it works(TM)
[18:53:48] ack. still great progress
[20:24:56] serviceops, Operations, ops-eqiad: (Need by: 2020-02-12) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (wiki_willy)
[21:11:16] serviceops, Operations, ops-eqiad: (Need by: TBD) rack/setup/install kubernetes10[07-14].eqiad.wmnet - https://phabricator.wikimedia.org/T241850 (RobH)
[21:50:03] serviceops, ChangeProp, Operations, Release Pipeline, and 5 others: Migrate cpjobqueue to kubernetes - https://phabricator.wikimedia.org/T220399 (Pchelolo)