[00:01:25] (PS1) Bstorm: kubeadm and toolforge: rearrange some things to get it working [puppet] - https://gerrit.wikimedia.org/r/597899 (https://phabricator.wikimedia.org/T211096)
[00:04:30] (CR) Bstorm: [C: +2] paws-k8s: set some volumes up [puppet] - https://gerrit.wikimedia.org/r/597884 (https://phabricator.wikimedia.org/T211096) (owner: Bstorm)
[00:05:19] Operations, Traffic, Performance-Team (Radar): Collect client network errors, deprecation, intervention and crash reports - https://phabricator.wikimedia.org/T207860 (CDanis) Resolved→Open a:Gilles→None SRE is interested in collecting at least network error reports.
[00:09:44] (CR) Bstorm: "PCC on a control plane host: https://puppet-compiler.wmflabs.org/compiler1002/22687/tools-k8s-control-1.tools.eqiad.wmflabs/index.html" [puppet] - https://gerrit.wikimedia.org/r/597899 (https://phabricator.wikimedia.org/T211096) (owner: Bstorm)
[00:17:32] Operations, InternetArchiveBot, Traffic: Support TLSv1.3 in IABot - https://phabricator.wikimedia.org/T251414 (Krinkle)
[00:26:29] (PS1) AntiCompositeNumber: engine.vips: Don't run vips if scaling_factor == 1 [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/597904 (https://phabricator.wikimedia.org/T218272)
[00:34:59] Operations, ops-eqiad, DC-Ops, cloud-services-team (Hardware): (Need By: TDB) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (bd808)
[00:41:47] Operations, ops-eqiad, DC-Ops, cloud-services-team (Hardware): (Need By: TDB) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (bd808)
[00:43:57] Operations, ops-eqiad, DC-Ops, netops, cloud-services-team (Hardware): (Need By: TBD) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (bd808)
[00:47:54] Operations, Traffic, Goal, Performance-Team (Radar), Wikimedia-Incident: Support TLSv1.3 - 
https://phabricator.wikimedia.org/T170567 (Krinkle) (See also [Navigation Timing metrics spec](https://www.w3.org/TR/navigation-timing-2/#processing-model).) | {icon arrow-circle-up color=red} request...
[00:47:56] Operations, ops-eqiad, DC-Ops, cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[04-15].wikimedia.org - https://phabricator.wikimedia.org/T251619 (bd808)
[00:48:50] Operations, ops-eqiad, DC-Ops, cloud-services-team (Hardware): (Need By: TDB) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (bd808) We did research in {T248425} and found that we could reduce the number of 10G ports from 2 to 1, but I'm not sure that...
[00:59:13] Operations, ops-eqiad, DC-Ops, netops, cloud-services-team (Hardware): (Need By: TBD) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (bd808)
[00:59:15] Operations, ops-eqiad, DC-Ops, cloud-services-team (Hardware): (Need By: TDB) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (bd808)
[00:59:18] Operations, ops-eqiad, DC-Ops, cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[04-15].wikimedia.org - https://phabricator.wikimedia.org/T251619 (bd808)
[01:02:50] Operations, ops-eqiad, DC-Ops, netops, cloud-services-team (Hardware): (Need By: TBD) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (bd808)
[01:02:53] Operations, ops-eqiad, DC-Ops, cloud-services-team (Hardware): (Need By: TDB) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (bd808)
[01:02:56] Operations, ops-eqiad, DC-Ops, cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[04-15].wikimedia.org - https://phabricator.wikimedia.org/T251619 (bd808)
[01:17:24] (CR) Bmansurov: "@Alexandros, I looked 
into the issue. It's happening because you don't have MySQL setup locally. I think it's a non-issue." [deployment-charts] - https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: Bmansurov)
[02:11:17] Operations, Thumbor, Wikimedia-SVG-rendering: Incorrect text positioning in SVG rasterization (scale/transform; font-size; kerning) - https://phabricator.wikimedia.org/T36947 (AntiCompositeNumber)
[02:20:24] Operations, Commons, MediaWiki-File-management, Thumbor, Traffic: Thumbnail rendering of complex SVG file leads to Error 500 or Error 429 instead of Error 408 - https://phabricator.wikimedia.org/T226318 (AntiCompositeNumber) As for the issue mentioned in the task title, I don't believe 408 wo...
[02:31:52] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[02:37:28] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:05:49] Operations, ops-codfw, serviceops: (Need by: TBD) rack/setup/install kubernetes20[07-14].codfw.wmnet and kubestage200[1-2].codfw.wmnet. - https://phabricator.wikimedia.org/T252185 (Papaul) @akosiaris please see below what i am getting from kubestage2001 and kubernetes2007 ` You may use the whole vol...
[04:12:34] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:12:38] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 62, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:34:19] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1081 - T252512', diff saved to https://phabricator.wikimedia.org/P11272 and previous config saved to /var/cache/conftool/dbconfig/20200522-043418-marostegui.json
[04:34:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:34:23] T252512: Productionize db114[1-9] - https://phabricator.wikimedia.org/T252512
[04:36:36] (PS1) Marostegui: mariadb: Place db1149 into s4 [puppet] - https://gerrit.wikimedia.org/r/597916 (https://phabricator.wikimedia.org/T252512)
[04:39:20] (PS2) Marostegui: mariadb: Place db1149 into s4 [puppet] - https://gerrit.wikimedia.org/r/597916 (https://phabricator.wikimedia.org/T252512)
[04:40:15] (CR) Marostegui: [C: +2] mariadb: Place db1149 into s4 [puppet] - https://gerrit.wikimedia.org/r/597916 (https://phabricator.wikimedia.org/T252512) (owner: Marostegui)
[05:50:19] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:51:41] PROBLEM - OSPF status on cr2-eqiad is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:57:46] so there seems to be two overlapping maintenance windows
[05:58:26] yes Zayo and Telia
[05:58:35] checking the circuit id just in case
[05:59:17] RECOVERY - OSPF status on cr2-eqiad is OK: OSPFv2: 
4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:59:34] yep both transports are a maintenance event scheduled for their circuit id
[06:07:16] (PS1) Elukey: Add druid100[7,8] to the druid_public_broker VIP [puppet] - https://gerrit.wikimedia.org/r/597918 (https://phabricator.wikimedia.org/T252771)
[06:13:40] (PS1) Elukey: profile::prometheus::alerts: update druid analytics monitor [puppet] - https://gerrit.wikimedia.org/r/597919 (https://phabricator.wikimedia.org/T252771)
[06:14:19] (CR) Elukey: [C: +2] profile::prometheus::alerts: update druid analytics monitor [puppet] - https://gerrit.wikimedia.org/r/597919 (https://phabricator.wikimedia.org/T252771) (owner: Elukey)
[06:16:09] PROBLEM - PHP opcache health on mw2350 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[06:18:39] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 133, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:36:29] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:36:33] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 64, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:49:33] (CR) Ayounsi: check_puppet_run_changes: remove staging hosts from this test (1 comment) [puppet] - https://gerrit.wikimedia.org/r/597014 (owner: Jbond)
[06:52:22] (CR) Elukey: [C: +2] Add druid100[7,8] to the druid_public_broker VIP [puppet] - https://gerrit.wikimedia.org/r/597918 (https://phabricator.wikimedia.org/T252771) (owner: 
Elukey)
[06:55:27] RECOVERY - PHP opcache health on mw2350 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200522T0700)
[07:01:11] (PS2) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[07:04:00] (PS2) Giuseppe Lavagetto: profile::etcd::tlsproxy: refresh code for modern puppet [puppet] - https://gerrit.wikimedia.org/r/597805 (https://phabricator.wikimedia.org/T97972)
[07:04:02] (PS3) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[07:04:04] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=druid1007.eqiad.wmnet
[07:04:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:04:37] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=druid1007.eqiad.wmnet
[07:04:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:04:52] <_joe_> elukey: you can set/pooled=yes:weight=10 to set two values at once
[07:05:12] <_joe_> and with weight zero it won't be inserted in the pool properly
[07:06:22] _joe_ yeah I used ";" instead of ":", my bad
[07:06:27] just fixed it
[07:07:32] !log elukey@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: name=druid1008.eqiad.wmnet
[07:07:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:52] goood
[07:07:57] I see traffic coming in
[07:08:26] Operations, Cassandra, Core Platform Team Workboards (Clinic Duty Team): Revisit default settings for c-foreach-restart - 
https://phabricator.wikimedia.org/T198787 (Naike) Open→Stalled
[07:15:07] (PS4) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[07:18:03] (PS1) Giuseppe Lavagetto: Add etcd password autogen seed [labs/private] - https://gerrit.wikimedia.org/r/597987
[07:18:28] (CR) Giuseppe Lavagetto: [V: +2 C: +2] Add etcd password autogen seed [labs/private] - https://gerrit.wikimedia.org/r/597987 (owner: Giuseppe Lavagetto)
[07:18:40] (CR) Dzahn: [C: +2] phabricator weekly changes email: List only open tasks by new contributors [puppet] - https://gerrit.wikimedia.org/r/597545 (owner: Aklapper)
[07:18:56] <_joe_> mutante: can you also merge my labs/private change?
[07:19:20] _joe_: yep, doing
[07:19:25] <_joe_> thanks :)
[07:20:01] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1149 and db1081', diff saved to https://phabricator.wikimedia.org/P11275 and previous config saved to /var/cache/conftool/dbconfig/20200522-072000-marostegui.json
[07:20:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:20:14] np, done
[07:20:34] (PS2) Dzahn: phabricator weekly changes email: List users with URLs in profile desc [puppet] - https://gerrit.wikimedia.org/r/597582 (owner: Aklapper)
[07:21:29] (PS5) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[07:21:39] (CR) jerkins-bot: [V: -1] phabricator weekly changes email: List users with URLs in profile desc [puppet] - https://gerrit.wikimedia.org/r/597582 (owner: Aklapper)
[07:22:21] (PS1) Marostegui: db1149: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/597988 (https://phabricator.wikimedia.org/T252512)
[07:23:00] (CR) Dzahn: "eh "Invalid commit message"" [puppet] - 
https://gerrit.wikimedia.org/r/597582 (owner: Aklapper)
[07:23:24] (CR) Marostegui: [C: +2] db1149: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/597988 (https://phabricator.wikimedia.org/T252512) (owner: Marostegui)
[07:23:38] (PS3) Dzahn: phabricator weekly changes email: List users with URLs in profile desc [puppet] - https://gerrit.wikimedia.org/r/597582 (owner: Aklapper)
[07:26:13] (CR) Dzahn: [C: +2] phabricator weekly changes email: List users with URLs in profile desc [puppet] - https://gerrit.wikimedia.org/r/597582 (owner: Aklapper)
[07:26:42] (CR) Dzahn: "currently empty result, but i think that's not unexpected" [puppet] - https://gerrit.wikimedia.org/r/597582 (owner: Aklapper)
[07:32:28] (PS1) Elukey: profile::prometheus::alerts: improve druid alerts [puppet] - https://gerrit.wikimedia.org/r/597990 (https://phabricator.wikimedia.org/T252771)
[07:34:00] (CR) Elukey: [C: +2] profile::prometheus::alerts: improve druid alerts [puppet] - https://gerrit.wikimedia.org/r/597990 (https://phabricator.wikimedia.org/T252771) (owner: Elukey)
[07:35:11] (PS3) Giuseppe Lavagetto: profile::etcd::tlsproxy: refresh code for modern puppet [puppet] - https://gerrit.wikimedia.org/r/597805 (https://phabricator.wikimedia.org/T97972)
[07:35:13] (PS6) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[07:46:20] (PS7) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[07:48:53] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1149 and db1081', diff saved to https://phabricator.wikimedia.org/P11276 and previous config saved to /var/cache/conftool/dbconfig/20200522-074853-marostegui.json
[07:48:55] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:52] (PS1) Elukey: superset: upgrade gunicorn app name for all envs [puppet] - https://gerrit.wikimedia.org/r/597991 (https://phabricator.wikimedia.org/T249495)
[07:57:39] (PS8) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[08:03:26] (CR) Gilles: [V: +2 C: +2] engine.vips: Don't run vips if scaling_factor == 1 [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/597904 (https://phabricator.wikimedia.org/T218272) (owner: AntiCompositeNumber)
[08:05:19] (PS9) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[08:06:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1149 and db1081', diff saved to https://phabricator.wikimedia.org/P11277 and previous config saved to /var/cache/conftool/dbconfig/20200522-080629-marostegui.json
[08:06:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:28] (CR) Elukey: [C: +2] superset: upgrade gunicorn app name for all envs [puppet] - https://gerrit.wikimedia.org/r/597991 (https://phabricator.wikimedia.org/T249495) (owner: Elukey)
[08:13:00] !log test hugepages allocator on ATS in cp2041
[08:13:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:25] Operations, Privacy Engineering, Research, Traffic, Privacy: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (Dzahn) Is there a specific thing we are waiting for?
[08:17:46] !log elukey@deploy1001 Started deploy [analytics/superset/deploy@59ba01d]: Upgrade Superset to 0.36
[08:17:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:52] (PS10) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[08:18:43] !log elukey@deploy1001 Finished deploy [analytics/superset/deploy@59ba01d]: Upgrade Superset to 0.36 (duration: 01m 01s)
[08:18:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:03] (CR) Aklapper: "Thanks. Yes, query result is intentionally empty." [puppet] - https://gerrit.wikimedia.org/r/597582 (owner: Aklapper)
[08:27:01] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1149 and db1081', diff saved to https://phabricator.wikimedia.org/P11278 and previous config saved to /var/cache/conftool/dbconfig/20200522-082700-marostegui.json
[08:27:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:40] (CR) Ema: [V: +2 C: +2] Pass JSON file and network address as CLI flags [software/prometheus-rdkafka-exporter] - https://gerrit.wikimedia.org/r/597574 (https://phabricator.wikimedia.org/T253197) (owner: Ema)
[08:34:43] (PS6) Dzahn: contint: fix git cloning of docroot for integration.wm.org [puppet] - https://gerrit.wikimedia.org/r/595525 (https://phabricator.wikimedia.org/T224591)
[08:34:49] (CR) Ema: [C: +2] varnish: new stap script post_body.stp [puppet] - https://gerrit.wikimedia.org/r/597471 (owner: Ema)
[08:36:20] Operations, Traffic: Stale nic firmware files on some hosts - https://phabricator.wikimedia.org/T253374 (fgiunchedi)
[08:39:16] (PS1) Filippo Giunchedi: hieradata: add cluster for thanos::backend role [puppet] - https://gerrit.wikimedia.org/r/597997
[08:41:17] !log reverting hugepages experiment on cp2041
[08:41:19] (CR) Filippo 
Giunchedi: [C: +2] hieradata: add cluster for thanos::backend role [puppet] - https://gerrit.wikimedia.org/r/597997 (owner: Filippo Giunchedi)
[08:41:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:55] (PS11) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[08:43:05] (PS1) Hashar: Jenkins job validation (DO NOT SUBMIT) [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/597998
[08:43:46] (CR) jerkins-bot: [V: -1] Jenkins job validation (DO NOT SUBMIT) [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/597998 (owner: Hashar)
[08:44:27] (PS1) KartikMistry: Update cxserver to 2020-05-22-083137-production [deployment-charts] - https://gerrit.wikimedia.org/r/597999 (https://phabricator.wikimedia.org/T246317)
[08:44:43] (CR) Dzahn: [C: +2] site: remove 13 old jobrunners from codfw rack C3 [puppet] - https://gerrit.wikimedia.org/r/597771 (https://phabricator.wikimedia.org/T247018) (owner: Dzahn)
[08:44:51] (PS3) Dzahn: site: remove 13 old jobrunners from codfw rack C3 [puppet] - https://gerrit.wikimedia.org/r/597771 (https://phabricator.wikimedia.org/T247018)
[08:45:12] (PS12) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[08:53:16] (PS7) Jbond: cookbook sre.hosts.rotate-pdu-password: rename [cookbooks] - https://gerrit.wikimedia.org/r/594436 (https://phabricator.wikimedia.org/T246890)
[08:53:41] (CR) Jbond: "Thanks updated" (6 comments) [cookbooks] - https://gerrit.wikimedia.org/r/594436 (https://phabricator.wikimedia.org/T246890) (owner: Jbond)
[08:55:03] (CR) jerkins-bot: [V: -1] cookbook sre.hosts.rotate-pdu-password: rename [cookbooks] - https://gerrit.wikimedia.org/r/594436 
(https://phabricator.wikimedia.org/T246890) (owner: Jbond)
[08:55:31] (PS2) Filippo Giunchedi: icinga: add --sni to check_http --ssl invocations [puppet] - https://gerrit.wikimedia.org/r/597765 (https://phabricator.wikimedia.org/T253292)
[08:55:33] (PS1) Filippo Giunchedi: idp: rename check_sso_redirect command into check_https_sso_redirect [puppet] - https://gerrit.wikimedia.org/r/598000 (https://phabricator.wikimedia.org/T253292)
[08:59:36] (PS8) Jbond: cookbook sre.hosts.rotate-pdu-password: rename [cookbooks] - https://gerrit.wikimedia.org/r/594436 (https://phabricator.wikimedia.org/T246890)
[09:01:03] (CR) Filippo Giunchedi: "The audit of current check_https usage can be found here: https://phabricator.wikimedia.org/T253292#6155573" [puppet] - https://gerrit.wikimedia.org/r/597765 (https://phabricator.wikimedia.org/T253292) (owner: Filippo Giunchedi)
[09:03:25] Operations, doxygen, Continuous-Integration-Config, Developer Productivity, and 3 others: Update Doxygen in CI to 1.8.17 or greater - https://phabricator.wikimedia.org/T242155 (hashar)
[09:05:32] (PS22) Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890)
[09:07:18] (CR) jerkins-bot: [V: -1] cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) (owner: Jbond)
[09:09:13] !log elukey@deploy1001 Started deploy [analytics/superset/deploy@be203c8]: Rollback superset to 0.35.2
[09:09:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:09:55] !log elukey@deploy1001 Finished deploy [analytics/superset/deploy@be203c8]: Rollback superset to 0.35.2 (duration: 00m 43s)
[09:09:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:45] (PS1) Elukey: Revert "superset: upgrade gunicorn app name for all envs" 
[puppet] - https://gerrit.wikimedia.org/r/598003
[09:11:46] what a lovely day
[09:13:16] (CR) Elukey: [C: +2] Revert "superset: upgrade gunicorn app name for all envs" [puppet] - https://gerrit.wikimedia.org/r/598003 (owner: Elukey)
[09:13:19] Operations, doxygen, Continuous-Integration-Config, Developer Productivity, and 3 others: Update Doxygen in CI to 1.8.17 or greater - https://phabricator.wikimedia.org/T242155 (hashar) The update is https://gerrit.wikimedia.org/r/#/c/operations/debs/doxygen/+/589416/ and is for Buster. I have tes...
[09:18:29] (PS13) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[09:22:57] !log jayme@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
[09:22:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:37] Operations, Wikimedia-Logstash: Upgrade ELK Stack - https://phabricator.wikimedia.org/T234854 (fgiunchedi) Since Elastic stack 7.7 has been released I think it'd make sense we upgrade to that before the switch, supposedly there have been improvements to memory usage!
[09:28:48] Operations, serviceops, Kubernetes, Patch-For-Review: Add TLS termination to services running on kubernetes - https://phabricator.wikimedia.org/T235411 (JMeybohm) >>! In T235411#6153075, @JMeybohm wrote: > TLS enabled mathoid is corrently deployed in staging and codfw k8s clusters but not in eqia...
[09:31:19] (PS4) Dzahn: site: remove 13 old jobrunners from codfw rack C3 [puppet] - https://gerrit.wikimedia.org/r/597771 (https://phabricator.wikimedia.org/T247018)
[09:32:08] (PS23) Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890)
[09:32:30] (CR) Jbond: "Thanks updated" (7 comments) [cookbooks] - https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) (owner: Jbond)
[09:33:39] (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/598000 (https://phabricator.wikimedia.org/T253292) (owner: Filippo Giunchedi)
[09:41:54] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[09:41:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:19] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[09:43:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:26] Operations, ops-codfw, decommission, serviceops, Patch-For-Review: codfw: decom at least 15 appservers(mw2158 through mw2172) in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin...
[09:49:11] Operations, serviceops: upgrade people.wikimedia.org backend to buster - https://phabricator.wikimedia.org/T247649 (Volans) Resolved→Open I think there are still some bits in the DNS repo that point to the old instance: ` templates/wmnet:people 5M IN CNAME people1001.eqiad.wmnet. te...
[09:49:11] PROBLEM - mediawiki-installation DSH group on mw2154 is CRITICAL: Host mw2154 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[09:49:13] Operations, Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (Volans)
[09:50:02] Operations, ops-eqsin, Traffic: cp5012 memory errors - https://phabricator.wikimedia.org/T251219 (Vgutierrez) Stalled→Resolved cp5012 seems stable, I'll reopen this task if I see any sign of memory issues. Thanks @RobH
[09:51:27] PROBLEM - mediawiki-installation DSH group on mw2156 is CRITICAL: Host mw2156 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[09:55:25] (PS4) Vgutierrez: Release 8.0.7-1wm10 [debs/trafficserver] - https://gerrit.wikimedia.org/r/597552
[09:56:49] Operations, serviceops: upgrade people.wikimedia.org backend to buster - https://phabricator.wikimedia.org/T247649 (Dzahn) These were changed in https://gerrit.wikimedia.org/r/c/operations/dns/+/595959/2/templates/wmnet
[09:56:58] Puppet, Analytics, Cassandra, observability, and 2 others: Upgrade prometheus-jmx-exporter on all services using it - https://phabricator.wikimedia.org/T192948 (fgiunchedi)
[09:57:01] Operations, Goal, User-fgiunchedi: Export Prometheus-compatible JVM metrics from JVMs in production - https://phabricator.wikimedia.org/T177197 (fgiunchedi)
[09:57:29] PROBLEM - mediawiki-installation DSH group on mw2155 is CRITICAL: Host mw2155 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[09:58:22] meh, that's me, handling it
[09:58:43] (DSH groups) and yes, i had already scheduled downtimes
[10:00:21] !log update pdns-recursor on dns recursors
[10:00:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:13] PROBLEM - mediawiki-installation DSH group 
on mw2151 is CRITICAL: Host mw2151 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[10:05:42] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[10:05:42] !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
[10:05:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:55] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[10:05:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:08:58] (PS9) Jbond: cookbook sre.hosts.rotate-pdu-password: rename [cookbooks] - https://gerrit.wikimedia.org/r/594436 (https://phabricator.wikimedia.org/T246890)
[10:09:06] (PS24) Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890)
[10:09:33] Operations, serviceops: upgrade people.wikimedia.org backend to buster - https://phabricator.wikimedia.org/T247649 (Volans) Open→Resolved @Dzahn my bad, I had a silent error during the update of my local git copy that lead to this mis-finding. FWIW it's also possible to ssh directly into the "ri...
[10:09:35] Operations, Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (Volans)
[10:09:51] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[10:09:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:57] Operations, ops-codfw, decommission, serviceops, Patch-For-Review: codfw: decom at least 15 appservers(mw2158 through mw2172) in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin...
[10:10:40] (CR) jerkins-bot: [V: -1] cookbook sre.hosts.rotate-pdu-password: rename [cookbooks] - https://gerrit.wikimedia.org/r/594436 (https://phabricator.wikimedia.org/T246890) (owner: Jbond)
[10:10:55] !log Stop event_scheduler on db1115 - T252331
[10:10:59] (CR) jerkins-bot: [V: -1] cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) (owner: Jbond)
[10:10:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:10:59] T252331: tendril_purge_global_status_log_5m and global_status_log needs more frequent purging - https://phabricator.wikimedia.org/T252331
[10:15:11] (CR) Filippo Giunchedi: [C: +2] idp: rename check_sso_redirect command into check_https_sso_redirect [puppet] - https://gerrit.wikimedia.org/r/598000 (https://phabricator.wikimedia.org/T253292) (owner: Filippo Giunchedi)
[10:15:51] (PS10) Jbond: cookbook sre.hosts.rotate-pdu-password: rename [cookbooks] - https://gerrit.wikimedia.org/r/594436 (https://phabricator.wikimedia.org/T246890)
[10:16:08] (PS1) Jcrespo: Add new pool DatabasesCodfw to backup data generated on eqiad [puppet] - https://gerrit.wikimedia.org/r/598005 (https://phabricator.wikimedia.org/T79922)
[10:16:51] (PS2) Jcrespo: Add new pool DatabasesCodfw to backup data generated on eqiad [puppet] - https://gerrit.wikimedia.org/r/598005 (https://phabricator.wikimedia.org/T79922)
[10:17:19] (PS11) Jbond: cookbook sre.hosts.rotate-pdu-password: rename [cookbooks] - https://gerrit.wikimedia.org/r/594436 (https://phabricator.wikimedia.org/T246890)
[10:17:23] (PS1) Ema: Wrap metrics in a struct [software/prometheus-rdkafka-exporter] - https://gerrit.wikimedia.org/r/598006 (https://phabricator.wikimedia.org/T253197)
[10:17:25] (PS1) Ema: Rename update.go to metrics.go [software/prometheus-rdkafka-exporter] - https://gerrit.wikimedia.org/r/598007 
(https://phabricator.wikimedia.org/T253197) [10:17:27] (03PS1) 10Ema: Basic testing with testify [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598008 (https://phabricator.wikimedia.org/T253197) [10:17:29] (03PS1) 10Ema: Packaging and copyring notices [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598009 (https://phabricator.wikimedia.org/T253197) [10:19:43] (03CR) 10Jcrespo: "@Akosiaris, I don't like this patch because I don't think this method scales for further modifications later on." [puppet] - 10https://gerrit.wikimedia.org/r/598005 (https://phabricator.wikimedia.org/T79922) (owner: 10Jcrespo) [10:19:56] (03PS1) 10Giuseppe Lavagetto: profile: remove unused profile::etcd::auth [puppet] - 10https://gerrit.wikimedia.org/r/598010 [10:20:59] (03PS25) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) [10:22:46] (03CR) 10jerkins-bot: [V: 04-1] cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [10:22:55] (03PS26) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) [10:23:32] (03CR) 10Volans: [C: 04-1] "Minor things to fix inline I think. General logic sounds good to me." 
(036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/597014 (owner: 10Jbond) [10:24:10] (03CR) 10JMeybohm: [C: 03+2] termbox: enable TLS with chart defaults [deployment-charts] - 10https://gerrit.wikimedia.org/r/597035 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [10:24:43] (03CR) 10jerkins-bot: [V: 04-1] cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [10:28:55] (03PS27) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) [10:29:27] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/22700/etcd1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/598010 (owner: 10Giuseppe Lavagetto) [10:29:55] (03PS2) 10Ema: Basic testing with testify [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598008 (https://phabricator.wikimedia.org/T253197) [10:29:57] (03PS2) 10Ema: Packaging and copyring notices [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598009 (https://phabricator.wikimedia.org/T253197) [10:32:12] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission [10:32:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:32:45] (03PS1) 10Dzahn: installserver/light: limit access to webserver to DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/598012 (https://phabricator.wikimedia.org/T252526) [10:32:49] (03CR) 10Volans: "Some comments inline. LGTM in general." 
(033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/593476 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [10:32:58] (03CR) 10jerkins-bot: [V: 04-1] installserver/light: limit access to webserver to DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/598012 (https://phabricator.wikimedia.org/T252526) (owner: 10Dzahn) [10:33:34] (03PS2) 10Dzahn: installserver/light: limit access to webserver to DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/598012 (https://phabricator.wikimedia.org/T252526) [10:33:53] (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/594173 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [10:34:25] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [10:34:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:34:33] 10Operations, 10ops-codfw, 10decommission, 10serviceops, 10Patch-For-Review: codfw: decom at least 15 appservers(mw2158 through mw2172) in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin... 
[10:34:40] (03CR) 10jerkins-bot: [V: 04-1] installserver/light: limit access to webserver to DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/598012 (https://phabricator.wikimedia.org/T252526) (owner: 10Dzahn) [10:34:48] (03CR) 10Jbond: "updated thanks" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/597014 (owner: 10Jbond) [10:34:59] (03PS7) 10Jbond: check_puppet_run_changes: remove staging hosts from this test [puppet] - 10https://gerrit.wikimedia.org/r/597014 [10:35:09] (03CR) 10Jcrespo: "I am thinking of moving job pool configuration to the job definition directly- making production the default pool, but being explicitly on" [puppet] - 10https://gerrit.wikimedia.org/r/598005 (https://phabricator.wikimedia.org/T79922) (owner: 10Jcrespo) [10:35:59] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission [10:36:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:38] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [10:37:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:44] 10Operations, 10ops-codfw, 10decommission, 10serviceops, 10Patch-For-Review: codfw: decom at least 15 appservers(mw2158 through mw2172) in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin... 
[10:38:42] (03CR) 10Jbond: cookbook sre.hosts.rotate-pdu-password: rename (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/594436 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [10:41:35] (03CR) 10Jbond: "Thanks, bump most of theses comments to later patches to avoid a rebase nightmare" (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/593476 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [10:42:57] (03PS3) 10Dzahn: installserver/light: limit access to webserver to DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/598012 (https://phabricator.wikimedia.org/T252526) [10:42:59] (03CR) 10Volans: "LGTM, minor nits inline." (036 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/594197 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [10:43:10] (03CR) 10jerkins-bot: [V: 04-1] installserver/light: limit access to webserver to DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/598012 (https://phabricator.wikimedia.org/T252526) (owner: 10Dzahn) [10:43:33] (03PS4) 10Dzahn: installserver/light: limit access to webserver to DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/598012 (https://phabricator.wikimedia.org/T252526) [10:44:00] jbond42: I'm wondering if maybe squashing all those commits before the rename into one wouldn't make it actually simpler to review :) [10:44:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1097:3314, db1097:3315 - T252512', diff saved to https://phabricator.wikimedia.org/P11281 and previous config saved to /var/cache/conftool/dbconfig/20200522-104437-marostegui.json [10:44:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:44:41] T252512: Productionize db114[1-9] - https://phabricator.wikimedia.org/T252512 [10:44:41] then a rename without changes as those are impossible to detect in gerrit [10:45:43] PROBLEM - Check correctness of the icinga configuration on icinga1001 is CRITICAL: Icinga configuration 
contains errors https://wikitech.wikimedia.org/wiki/Icinga [10:45:56] (03PS1) 10JMeybohm: termbox: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598013 (https://phabricator.wikimedia.org/T235411) [10:46:41] volans yes this patch set has got a bit messy to be honest, with some stuff fixed in the last patch that should really be in the earlier ones. I'll take a look at squashing and cleaning up [10:46:59] (03CR) 10Dzahn: [C: 03+2] installserver/light: limit access to webserver to DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/598012 (https://phabricator.wikimedia.org/T252526) (owner: 10Dzahn) [10:47:33] jbond42: I'm also ok to have all squashed into one commit and review just the final state [10:47:36] if that simplifies [10:48:11] ignoring the diffs completely [10:48:11] let me see how much trouble gerrit gives me :) [10:48:14] (03PS1) 10Marostegui: mariadb: Productionize db1144 into s4 and s5 [puppet] - 10https://gerrit.wikimedia.org/r/598014 (https://phabricator.wikimedia.org/T252512) [10:48:33] !log Stop MySQL on db1097:3314, db1097:3315 to clone db1144 - T252512 [10:48:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:48:54] godog: FYI icinga config issue ^^^ seems related to the rename check_sso_redirect command into check_https_sso_redirect [10:49:07] is that temporary due to the exported resources double puppet run?
[10:49:14] (03PS1) 10JMeybohm: zotero: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598015 (https://phabricator.wikimedia.org/T235411) [10:49:16] (03PS3) 10Ema: Packaging and copyring notices [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598009 (https://phabricator.wikimedia.org/T253197) [10:49:25] oh, and i was just wondering why certain checks did not disappear from icinga that i wanted to be gone [10:49:34] guess that will do it if config can't be reloaded [10:49:46] (03PS2) 10Marostegui: mariadb: Productionize db1144 into s4 and s5 [puppet] - 10https://gerrit.wikimedia.org/r/598014 (https://phabricator.wikimedia.org/T252512) [10:50:16] mmmh but that merge is more than 1h ago [10:50:22] let me check if puppet is disabled there [10:50:23] (03CR) 10Marostegui: [C: 03+2] mariadb: Productionize db1144 into s4 and s5 [puppet] - 10https://gerrit.wikimedia.org/r/598014 (https://phabricator.wikimedia.org/T252512) (owner: 10Marostegui) [10:50:37] on icinga? 
nah, i just ran it [10:50:58] and no errors/warnings now [10:51:09] no logstash hosts that generates the error [10:51:14] but no, run there too [10:51:52] for my part it is fixed now and stuff is gone that i expected to be gone [10:52:38] yeah the change on the 2 logstash hosts has been applied only 5 minutes ago [10:52:41] all good now [10:54:14] (03PS7) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: refactor [cookbooks] - 10https://gerrit.wikimedia.org/r/593476 (https://phabricator.wikimedia.org/T246890) [10:54:38] (03Abandoned) 10Jbond: cookbook sre.hosts.rotate-pdu-password: use request.Session and response.raise_for_status [cookbooks] - 10https://gerrit.wikimedia.org/r/594173 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [10:54:51] (03Abandoned) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: update to raise exceptions [cookbooks] - 10https://gerrit.wikimedia.org/r/594197 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [10:55:20] (03CR) 10Volans: cookbooks sre.hosts.rotate-pdu-password: reset SNMP (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [10:55:43] RECOVERY - Check correctness of the icinga configuration on icinga1001 is OK: Icinga configuration is correct https://wikitech.wikimedia.org/wiki/Icinga [10:58:30] cool, i see the recovery, thx volans [10:59:11] no prob mutante, and sorry for the earlier mis-reopen for the people task ;) [10:59:14] my bad [11:00:38] no worries at all, and i see your comment about using the fingerprint script, yea. that's right. could have mentioned it in my mail [11:01:01] that's mostly an SRE-only thing [11:01:09] yea, i figured not all people know it [11:01:11] I doubt anyone outside SRE would use that [11:01:28] though.. when i suggested to do something similar for bastion hosts (bast-eqiad, bast-codfw..) so people don't have to remember the number...
[11:01:30] (03CR) 10JMeybohm: "@Ppchelko FYI (because of I0db887959617bf9a81c2c3853f079a8bde55809a)" [deployment-charts] - 10https://gerrit.wikimedia.org/r/596141 (https://phabricator.wikimedia.org/T242461) (owner: 10JMeybohm) [11:01:50] it was abandoned again because of that issue that people would suddenly get new fingerprints [11:01:55] (03PS2) 10JMeybohm: restrouter: Remove chart and namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/596141 (https://phabricator.wikimedia.org/T242461) [11:03:44] also we don't really need to have people and peopleweb CNAMEs but one was added for.. well.. people.. and the other for envoy. i could change it some time [11:04:10] (03PS8) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: refactor [cookbooks] - 10https://gerrit.wikimedia.org/r/593476 (https://phabricator.wikimedia.org/T246890) [11:04:12] (03PS1) 10Jbond: sre.hosts.rotate-pdu-passwords: rename file [cookbooks] - 10https://gerrit.wikimedia.org/r/598018 [11:05:11] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/597805 (https://phabricator.wikimedia.org/T97972) (owner: 10Giuseppe Lavagetto) [11:07:49] (03PS9) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: refactor [cookbooks] - 10https://gerrit.wikimedia.org/r/593476 (https://phabricator.wikimedia.org/T246890) [11:08:39] (03Abandoned) 10Jbond: sre.hosts.rotate-pdu-passwords: rename file [cookbooks] - 10https://gerrit.wikimedia.org/r/598018 (owner: 10Jbond) [11:14:57] 10Operations, 10ops-codfw, 10decommission, 10serviceops, 10Patch-For-Review: codfw: decom at least 15 appservers in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (10Dzahn) [11:17:11] 10Operations, 10Cassandra, 10Core Platform Team Workboards (Clinic Duty Team): Revisit default settings for c-foreach-restart - https://phabricator.wikimedia.org/T198787 (10Aklapper) 05Stalled→03Open @Naike: The previous comments don't explain what/who exactly this task is stalled on (["If a 
report is wa... [11:17:21] (03PS1) 10Jbond: cookbook sre.hosts.rotate-pdu-password: rename [cookbooks] - 10https://gerrit.wikimedia.org/r/598020 [11:17:24] (03PS1) 10Jbond: sre.pdus.rotate-password: split generic functions out to __init__.py [cookbooks] - 10https://gerrit.wikimedia.org/r/598021 [11:18:50] 10Operations, 10Traffic: varnish warnings: Invalid conf pair: lg_dirty_mult/lg_chunk - https://phabricator.wikimedia.org/T253379 (10ema) [11:18:58] 10Operations, 10Traffic: varnish warnings: Invalid conf pair: lg_dirty_mult/lg_chunk - https://phabricator.wikimedia.org/T253379 (10ema) p:05Triage→03Medium [11:19:11] (03CR) 10jerkins-bot: [V: 04-1] sre.pdus.rotate-password: split generic functions out to __init__.py [cookbooks] - 10https://gerrit.wikimedia.org/r/598021 (owner: 10Jbond) [11:19:53] (03CR) 10Volans: "The change and compiler looks sane:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972) (owner: 10Giuseppe Lavagetto) [11:20:36] 10Operations, 10Traffic, 10conftool, 10Patch-For-Review, and 2 others: Figure out a security model for etcd - https://phabricator.wikimedia.org/T97972 (10Volans) +1 as the above schema of auth reflects mostly my old comment/proposal. What's the deploy strategy? 
[11:21:43] (03PS28) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) [11:23:27] (03CR) 10jerkins-bot: [V: 04-1] cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [11:29:49] (03PS2) 10Jbond: sre.pdus.rotate-password: split generic functions out to __init__.py [cookbooks] - 10https://gerrit.wikimedia.org/r/598021 [11:31:33] (03PS29) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) [11:35:10] (03PS30) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) [11:35:19] 10Operations, 10serviceops: No mw canary servers in codfw - https://phabricator.wikimedia.org/T242606 (10Dzahn) 05Resolved→03Open a:05jijiki→03Dzahn reopening because i am decom'ing servers in T247018 and that included some canaries. 
so we need to assign new ones [11:41:49] (03PS31) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) [11:43:14] (03PS3) 10Jbond: sre.pdus.rotate-password: split generic functions out to __init__.py [cookbooks] - 10https://gerrit.wikimedia.org/r/598021 [11:46:41] (03PS4) 10Jbond: sre.pdus.rotate-password: split generic functions out to __init__.py [cookbooks] - 10https://gerrit.wikimedia.org/r/598021 [11:46:43] (03PS32) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) [11:47:24] (03CR) 10Ayounsi: [C: 03+1] cookbook sre.hosts.rotate-pdu-password: rename (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/594436 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [11:48:10] (03PS1) 10Dzahn: site: decom mw2163 through mw2169 appservers [puppet] - 10https://gerrit.wikimedia.org/r/598025 (https://phabricator.wikimedia.org/T247018) [11:49:03] (03Abandoned) 10Jbond: cookbook sre.hosts.rotate-pdu-password: rename [cookbooks] - 10https://gerrit.wikimedia.org/r/594436 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [11:49:11] (03CR) 10jerkins-bot: [V: 04-1] cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [11:50:07] (03PS33) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) [11:51:51] (03CR) 10Dzahn: [C: 03+2] site: fix comment regarding servers in the wrong rack [puppet] - 10https://gerrit.wikimedia.org/r/597773 (owner: 10Dzahn) [11:51:59] (03PS2) 10Dzahn: site: fix comment regarding servers in the wrong rack [puppet] - 10https://gerrit.wikimedia.org/r/597773 [11:54:22] (03CR) 10JMeybohm: [C: 03+2] 
restrouter: Remove chart and namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/596141 (https://phabricator.wikimedia.org/T242461) (owner: 10JMeybohm) [11:54:46] (03Merged) 10jenkins-bot: restrouter: Remove chart and namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/596141 (https://phabricator.wikimedia.org/T242461) (owner: 10JMeybohm) [11:56:38] (03PS2) 10Dzahn: site: decom mw2163 through mw2169 appservers [puppet] - 10https://gerrit.wikimedia.org/r/598025 (https://phabricator.wikimedia.org/T247018) [11:57:50] (03CR) 10Dzahn: "these are also from rack C3" [puppet] - 10https://gerrit.wikimedia.org/r/598025 (https://phabricator.wikimedia.org/T247018) (owner: 10Dzahn) [11:59:58] (03CR) 10Dzahn: [C: 03+2] site: decom mw2163 through mw2169 appservers [puppet] - 10https://gerrit.wikimedia.org/r/598025 (https://phabricator.wikimedia.org/T247018) (owner: 10Dzahn) [12:00:19] (03CR) 10Jbond: check_puppet_run_changes: remove staging hosts from this test (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/597014 (owner: 10Jbond) [12:01:12] (03PS1) 10Hnowlan: changeprop: simplify config writing. make beta config write puppet-friendly YAML. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/598026 (https://phabricator.wikimedia.org/T251176) [12:03:53] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission [12:03:53] !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [12:03:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:03:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:04:00] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission [12:04:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:05:09] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/597014 (owner: 10Jbond) [12:06:01] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [12:06:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:06:09] 10Operations, 10ops-codfw, 10decommission, 10serviceops, 10Patch-For-Review: codfw: decom at least 15 appservers in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw[21... [12:07:54] 10Operations, 10Traffic: Stale nic firmware files on some hosts - https://phabricator.wikimedia.org/T253374 (10CDanis) Thanks for the report, I'll have to dig through Puppet logs I guess. Here's all eqiad hosts that have an mtime more than 10 minutes in the past: https://w.wiki/RYU In codfw it's just the tha... [12:09:08] volans: yeah temporary! 
[12:11:12] (03CR) 10Jbond: [C: 03+2] check_puppet_run_changes: remove staging hosts from this test [puppet] - 10https://gerrit.wikimedia.org/r/597014 (owner: 10Jbond) [12:12:00] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission [12:12:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:13:35] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [12:13:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:13:43] 10Operations, 10ops-codfw, 10decommission, 10serviceops, 10Patch-For-Review: codfw: decom at least 15 appservers in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw[21... [12:18:29] 10Operations, 10ops-codfw, 10decommission, 10serviceops, 10Patch-For-Review: codfw: decom at least 15 appservers in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (10Dzahn) @papaul 20 servers from rack C3 have been decom'ed. mw2150 through mw2169. (lower part of... 
[12:19:08] 10Operations, 10serviceops: move all 86 new codfw appservers into production (mw2[291-2377].codfw.wmnet) - https://phabricator.wikimedia.org/T247021 (10Dzahn) 05Stalled→03Open [12:19:11] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install 86 new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10Dzahn) [12:31:31] (03PS1) 10Jbond: check_puppet_run_changes.py: fix logic [puppet] - 10https://gerrit.wikimedia.org/r/598030 [12:31:42] (03PS1) 10Ssingh: aptrepo: add a component for dnsdist [puppet] - 10https://gerrit.wikimedia.org/r/598031 (https://phabricator.wikimedia.org/T252132) [12:33:04] (03CR) 10Jbond: [C: 03+2] check_puppet_run_changes.py: fix logic [puppet] - 10https://gerrit.wikimedia.org/r/598030 (owner: 10Jbond) [12:33:06] (03CR) 10Giuseppe Lavagetto: [C: 03+1] zotero: switch to common_templates v0.2 (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/598015 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [12:33:38] (03CR) 10Giuseppe Lavagetto: [C: 03+1] termbox: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598013 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [12:37:08] RECOVERY - Ensure hosts are not performing a change on every puppet run on puppetdb1002 is OK: OK: all nodes running as expected https://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes [12:41:24] (03PS2) 10JMeybohm: termbox: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598013 (https://phabricator.wikimedia.org/T235411) [12:42:08] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/598031 (https://phabricator.wikimedia.org/T252132) (owner: 10Ssingh) [12:42:26] (03CR) 10JMeybohm: [C: 03+2] "Rebased for index.yaml" [deployment-charts] - 10https://gerrit.wikimedia.org/r/598013 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [12:42:36] (03CR) 10Ssingh: [C: 03+2] 
aptrepo: add a component for dnsdist [puppet] - 10https://gerrit.wikimedia.org/r/598031 (https://phabricator.wikimedia.org/T252132) (owner: 10Ssingh) [12:42:46] (03Merged) 10jenkins-bot: termbox: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598013 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [12:48:54] (03PS1) 10Dzahn: site: decom mw2170 - mw2172 [puppet] - 10https://gerrit.wikimedia.org/r/598036 (https://phabricator.wikimedia.org/T247018) [12:51:20] (03PS2) 10Dzahn: site: decom mw2170 - mw2172 [puppet] - 10https://gerrit.wikimedia.org/r/598036 (https://phabricator.wikimedia.org/T247018) [12:51:37] (03PS1) 10JMeybohm: termbox: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598037 (https://phabricator.wikimedia.org/T235411) [12:51:39] 10Operations, 10Traffic: Stale nic firmware files on some hosts - https://phabricator.wikimedia.org/T253374 (10CDanis) P11283 contains syslog from one of the hosts where it didn't start properly. We see the service unit being installed, a systemctl daemon-reload, the timer unit being installed, then three sys... 
[12:53:54] (03PS2) 10JMeybohm: zotero: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598015 (https://phabricator.wikimedia.org/T235411) [12:54:14] (03CR) 10Giuseppe Lavagetto: [C: 03+1] termbox: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598037 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [12:54:43] (03PS1) 10Elukey: alternatives: add a class for the java use case [puppet] - 10https://gerrit.wikimedia.org/r/598038 [12:56:01] (03PS3) 10JMeybohm: zotero: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598015 (https://phabricator.wikimedia.org/T235411) [12:56:54] !log depool cp4032 for some ats tests [12:56:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:58:04] (03CR) 10JMeybohm: [C: 03+2] termbox: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598037 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [12:58:23] (03Merged) 10jenkins-bot: termbox: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598037 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [13:00:14] (03PS1) 10Dzahn: DHCP: remove recently decom'ed codfw appservers [puppet] - 10https://gerrit.wikimedia.org/r/598039 (https://phabricator.wikimedia.org/T247018) [13:00:26] (03PS4) 10JMeybohm: zotero: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598015 (https://phabricator.wikimedia.org/T235411) [13:02:43] (03PS2) 10Elukey: alternatives: add a class for the java use case [puppet] - 10https://gerrit.wikimedia.org/r/598038 [13:04:01] PROBLEM - Host mr1-ulsfo.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100% [13:04:08] (03PS1) 10Dzahn: mcrouter: replace proxy in codfw row C [puppet] - 10https://gerrit.wikimedia.org/r/598040 (https://phabricator.wikimedia.org/T247018) [13:04:28] (03CR) 
10JMeybohm: [C: 03+2] "Rebased for index.yaml" (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/598015 (https://phabricator.wikimedia.org/T235411) (owner: 10JMeybohm) [13:06:30] (03CR) 10Elukey: "Open questions:" [puppet] - 10https://gerrit.wikimedia.org/r/598038 (owner: 10Elukey) [13:07:08] !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1144:3314 and db1144:3315 to the list of hosts', diff saved to https://phabricator.wikimedia.org/P11284 and previous config saved to /var/cache/conftool/dbconfig/20200522-130707-marostegui.json [13:07:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:07:44] mutante: o/ [13:08:00] about mcrouter - is the codfw proxy already down? [13:08:06] the one that is going to be swapped [13:08:08] (03PS3) 10Dzahn: site: decom mw2170 - mw2172 [puppet] - 10https://gerrit.wikimedia.org/r/598036 (https://phabricator.wikimedia.org/T247018) [13:08:15] !log jayme@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' . [13:08:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:29] because I think it is causing https://grafana.wikimedia.org/d/000000549/mcrouter?panelId=9&fullscreen&orgId=1&var-source=eqiad%20prometheus%2Fops&var-cluster=All&var-instance=All&var-memcached_server=All [13:08:37] we don't have any failover for the codfw proxies [13:08:57] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 52 probes of 568 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:09:21] !log jayme@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' . [13:09:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:32] elukey: yes, it is. 
and here is the change https://gerrit.wikimedia.org/r/c/operations/puppet/+/598040/1/hieradata/common/mcrouter.yaml [13:09:39] RECOVERY - Host mr1-ulsfo.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 77.37 ms [13:09:53] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 57 probes of 568 (alerts on 50) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:10:00] mutante: sure but the change should have happened before the decom of the host [13:10:20] !log jayme@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' . [13:10:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:25] since now all traffic replicated to codfw hashed to the old decommed host is being TKOed [13:10:43] (03PS1) 10Marostegui: db1144: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/598041 (https://phabricator.wikimedia.org/T252512) [13:10:44] let's merge the change asap :) [13:10:49] (03PS1) 10CDanis: node_nic_firmware: add timer schedules [puppet] - 10https://gerrit.wikimedia.org/r/598042 (https://phabricator.wikimedia.org/T253374) [13:10:56] !log jayme@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' . [13:10:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:11:16] (03PS2) 10Dzahn: mcrouter: replace proxy in codfw row C [puppet] - 10https://gerrit.wikimedia.org/r/598040 (https://phabricator.wikimedia.org/T247018) [13:11:32] elukey: ok, rebasing it so it goes asap. 
next time i will check better before [13:12:22] (03CR) 10Dzahn: [C: 03+2] mcrouter: replace proxy in codfw row C [puppet] - 10https://gerrit.wikimedia.org/r/598040 (https://phabricator.wikimedia.org/T247018) (owner: 10Dzahn) [13:12:58] (03CR) 10Marostegui: [C: 03+2] db1144: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/598041 (https://phabricator.wikimedia.org/T252512) (owner: 10Marostegui) [13:14:06] (03PS5) 10JMeybohm: zotero: switch to common_templates v0.2 [deployment-charts] - 10https://gerrit.wikimedia.org/r/598015 (https://phabricator.wikimedia.org/T235411) [13:14:31] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 45 probes of 568 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:14:53] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1144:331[45] and db1097:331[45]', diff saved to https://phabricator.wikimedia.org/P11286 and previous config saved to /var/cache/conftool/dbconfig/20200522-131452-marostegui.json [13:14:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:37] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 45 probes of 568 (alerts on 50) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:16:28] elukey: does it need a puppet run? 
i dont see changes when doing a manual one on the appserver in question or a random mc codfw host [13:17:13] mutante: yes a puppet run is sufficient, on all mw1xxx hosts, and mcrouter will reload the config (uses inotify on the config file) [13:17:45] 10Operations, 10SRE-tools: E901 SyntaxError: invalid syntax is wrongly raised on using python's abc by jenkins python CI linter - https://phabricator.wikimedia.org/T152950 (10hashar) 05Stalled→03Resolved a:03jbond The patch https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/326155/ introduced a pytho... [13:17:47] (the mw1xxx are the ones contacting the codfw proxy to replicate traffic to codfw) [13:18:32] (03PS2) 10CDanis: node_nic_firmware: don't brick the timer on reboot [puppet] - 10https://gerrit.wikimedia.org/r/598042 (https://phabricator.wikimedia.org/T253374) [13:18:39] (03CR) 10CDanis: "PCC looks good https://puppet-compiler.wmflabs.org/compiler1003/22705/thanos-be2002.codfw.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/598042 (https://phabricator.wikimedia.org/T253374) (owner: 10CDanis) [13:21:05] elukey: running on mw-api-eqiad right now [13:21:25] (03CR) 10Ema: [C: 03+1] Release 8.0.7-1wm10 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/597552 (owner: 10Vgutierrez) [13:21:47] (03CR) 10Giuseppe Lavagetto: "The patch is correct, but I wonder if this is the right approach, see the code for my comment." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/598042 (https://phabricator.wikimedia.org/T253374) (owner: 10CDanis) [13:22:28] (03PS5) 10Jbond: docker build: update the build process to us docker [debs/mcrouter] - 10https://gerrit.wikimedia.org/r/596779 (https://phabricator.wikimedia.org/T251574) [13:23:56] (03CR) 10Jbond: "updated thanks" (033 comments) [debs/mcrouter] - 10https://gerrit.wikimedia.org/r/596779 (https://phabricator.wikimedia.org/T251574) (owner: 10Jbond) [13:25:14] (03CR) 10CDanis: node_nic_firmware: don't brick the timer on reboot (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/598042 (https://phabricator.wikimedia.org/T253374) (owner: 10CDanis) [13:28:37] (03PS2) 10Marostegui: dashboard.sql: Change storing time [software/tendril] - 10https://gerrit.wikimedia.org/r/597792 (https://phabricator.wikimedia.org/T252331) [13:29:01] mutante: also the appservers are showing the issue [13:29:04] (03CR) 10Marostegui: "> As long as it doesn't put down the mariadb server, I don't have any" [software/tendril] - 10https://gerrit.wikimedia.org/r/597792 (https://phabricator.wikimedia.org/T252331) (owner: 10Marostegui) [13:29:37] elukey: i ran puppet on mw-eqiad as well just some of them failed.. which i am re-enabling now [13:30:19] (03CR) 10CDanis: node_nic_firmware: don't brick the timer on reboot (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/598042 (https://phabricator.wikimedia.org/T253374) (owner: 10CDanis) [13:32:01] repeating the puppet run on mw-eqiad [13:32:23] tkos are now zero for apis [13:32:30] ok, good [13:32:33] but appservers are still affected [13:32:48] worst one is mw1264 [13:33:14] but should recover soon, just reloaded the config [13:33:20] puppet ran on that one 1 minute ago [13:35:59] elukey: all done. sorry! [13:36:04] (03CR) 10Gehel: "minor comment inline.
Otherwise LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/598038 (owner: 10Elukey) [13:36:05] https://grafana.wikimedia.org/d/000000549/mcrouter?panelId=9&fullscreen&orgId=1&var-source=eqiad%20prometheus%2Fops&var-cluster=All&var-instance=All&var-memcached_server=All&from=1590154025887&to=1590154535604 [13:40:05] mutante: can you open a task to create a monitor to alarm on sustained TKO rate? [13:40:11] I think we really need it [13:41:01] (03CR) 10Elukey: alternatives: add a class for the java use case (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/598038 (owner: 10Elukey) [13:41:17] elukey: ok [13:42:55] (03PS4) 10Ema: Packaging and copyright notices [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598009 (https://phabricator.wikimedia.org/T253197) [13:43:10] _joe_: I'm working on a 'proper' patch now [13:43:24] OnActiveSec solves our original problem 🤦 [13:43:34] er, --> #-sre [13:43:41] (03PS3) 10Elukey: alternatives: add a class for the java use case [puppet] - 10https://gerrit.wikimedia.org/r/598038 [13:43:58] 10Operations, 10LDAP-Access-Requests: LDAP access request - add Christian Aistleitner to "nda" (or "wmf") - https://phabricator.wikimedia.org/T252875 (10QChris) Thanks. Works like a cham now! [13:44:15] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100% [13:44:16] 10Operations, 10observability: add monitoring of sustained memcached TKO rates - https://phabricator.wikimedia.org/T253384 (10Dzahn) [13:44:57] <_joe_> uhm [13:45:38] <_joe_> XioNoX: I don't see you working on eqsin's oob router, so this is real? 
^^ [13:46:05] looking, v6 only, I'd bet on a provider issue [13:46:44] (03CR) 10Giuseppe Lavagetto: node_nic_firmware: don't brick the timer on reboot (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/598042 (https://phabricator.wikimedia.org/T253374) (owner: 10CDanis) [13:47:13] v4 and v6 replies to pings for me [13:48:03] 10Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 10User-DannyS712: 502 error on beta commons - https://phabricator.wikimedia.org/T250103 (10Aklapper) [13:48:13] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 72 probes of 568 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:48:16] <_joe_> yeah I asked because I don't have v6 at home :/ [13:48:29] <_joe_> and yeah, this seems related :) [13:49:28] (03CR) 10Jcrespo: [C: 03+1] dashboard.sql: Change storing time [software/tendril] - 10https://gerrit.wikimedia.org/r/597792 (https://phabricator.wikimedia.org/T252331) (owner: 10Marostegui) [13:50:31] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Wrap metrics in a struct [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598006 (https://phabricator.wikimedia.org/T253197) (owner: 10Ema) [13:52:12] looking more [13:52:25] hashar: wait!:p [13:53:05] (03CR) 10Giuseppe Lavagetto: Basic testing with testify (031 comment) [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598008 (https://phabricator.wikimedia.org/T253197) (owner: 10Ema) [13:55:15] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 211.40 ms [13:56:48] (03PS1) 10Alexandros Kosiaris: restrouter: Cleanup some leftover hiera entries [puppet] - 10https://gerrit.wikimedia.org/r/598047 (https://phabricator.wikimedia.org/T242461) [13:57:58] nothing looks wrong on our side [13:58:01] (03CR) 10Ema: [V: 03+2 C: 03+2] Wrap metrics in a struct 
[software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598006 (https://phabricator.wikimedia.org/T253197) (owner: 10Ema) [13:58:09] (03PS1) 10CDanis: systemd::timer::job: unkludge OnUnitInactiveSec/OnUnitActiveSec [puppet] - 10https://gerrit.wikimedia.org/r/598050 [13:58:13] (03CR) 10Ema: [V: 03+2 C: 03+2] Rename update.go to metrics.go [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598007 (https://phabricator.wikimedia.org/T253197) (owner: 10Ema) [13:58:14] looks like v4 took a small hit as well - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1&from=now-3h&to=now [13:58:33] probably an upstream issue [13:59:17] (03CR) 10jerkins-bot: [V: 04-1] systemd::timer::job: unkludge OnUnitInactiveSec/OnUnitActiveSec [puppet] - 10https://gerrit.wikimedia.org/r/598050 (owner: 10CDanis) [14:01:40] <_joe_> cdanis: stupid doubt: will OnActiveSec not fire after boot directly? [14:01:46] 10Operations, 10Release-Engineering-Team-TODO, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (10hashar) After discussion with @Dzahn we will d... 
[14:01:48] _joe_: hmmmmmm [14:01:50] 10Operations, 10observability, 10Sustainability (Incident Prevention): add monitoring of sustained memcached TKO rates - https://phabricator.wikimedia.org/T253384 (10Krinkle) [14:02:10] _joe_: I *suspect* it will, actually [14:02:30] <_joe_> lemme re-read the docs [14:02:30] because of the `WantedBy=multi-user.target` in the timer unit [14:03:01] that should translate to an activation of the timer unit, which then should trigger the OnActiveSec [14:03:12] <_joe_> yes [14:03:19] ok let's try it :D [14:03:32] <_joe_> I was about to say, OnActiveSec should start the unit at boot or at installation of the timer [14:03:49] <_joe_> assuming the timer gets activated on reboot, which is a different bag of problems in case [14:04:03] <_joe_> but also puppet will run on boot and ensure it's activated [14:04:08] <_joe_> so yes, it should work [14:05:37] yeah I concur [14:05:40] (03PS3) 10Ema: Basic testing with testify [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598008 (https://phabricator.wikimedia.org/T253197) [14:06:05] (03CR) 10Ema: Basic testing with testify (031 comment) [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598008 (https://phabricator.wikimedia.org/T253197) (owner: 10Ema) [14:07:58] (03PS2) 10CDanis: systemd::timer::job: unkludge OnUnitInactiveSec/OnUnitActiveSec [puppet] - 10https://gerrit.wikimedia.org/r/598050 (https://phabricator.wikimedia.org/T253374) [14:08:10] (03Abandoned) 10CDanis: node_nic_firmware: don't brick the timer on reboot [puppet] - 10https://gerrit.wikimedia.org/r/598042 (https://phabricator.wikimedia.org/T253374) (owner: 10CDanis) [14:08:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1144:331[45] and db1097:331[45]', diff saved to https://phabricator.wikimedia.org/P11288 and previous config saved to /var/cache/conftool/dbconfig/20200522-140847-marostegui.json [14:08:50] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:03] (03CR) 10jerkins-bot: [V: 04-1] systemd::timer::job: unkludge OnUnitInactiveSec/OnUnitActiveSec [puppet] - 10https://gerrit.wikimedia.org/r/598050 (https://phabricator.wikimedia.org/T253374) (owner: 10CDanis) [14:09:16] (03PS5) 10Ema: Packaging and copyright notices [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598009 (https://phabricator.wikimedia.org/T253197) [14:13:42] (03PS1) 10JMeybohm: termbox: remove chart version pinning [deployment-charts] - 10https://gerrit.wikimedia.org/r/598055 [14:13:46] !log upload dnsdist_1.4.0-1~deb10u1 to apt.wm.o (buster) - T252132 [14:13:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:13:50] T252132: Deploy Wikidough: Experimental DNS-over-HTTPS (DoH) public resolver - https://phabricator.wikimedia.org/T252132 [14:15:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1144:331[45] and db1097:331[45]', diff saved to https://phabricator.wikimedia.org/P11289 and previous config saved to /var/cache/conftool/dbconfig/20200522-141513-marostegui.json [14:15:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:16:19] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 48 probes of 568 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [14:20:42] (03CR) 10Dzahn: [C: 03+2] site: decom mw2170 - mw2172 [puppet] - 10https://gerrit.wikimedia.org/r/598036 (https://phabricator.wikimedia.org/T247018) (owner: 10Dzahn) [14:22:30] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission [14:22:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:24:05] (03PS2) 10Dzahn: DHCP: remove recently decom'ed codfw appservers [puppet] - 10https://gerrit.wikimedia.org/r/598039 
(https://phabricator.wikimedia.org/T247018) [14:24:18] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [14:24:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:24:26] 10Operations, 10ops-codfw, 10decommission, 10serviceops, 10Patch-For-Review: codfw: decom at least 15 appservers in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw[21... [14:29:52] (03CR) 10Giuseppe Lavagetto: [C: 04-1] systemd::timer::job: unkludge OnUnitInactiveSec/OnUnitActiveSec (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/598050 (https://phabricator.wikimedia.org/T253374) (owner: 10CDanis) [14:30:28] (03PS1) 10Hashar: zuul: use modern [connection] section in config [puppet] - 10https://gerrit.wikimedia.org/r/598057 (https://phabricator.wikimedia.org/T253263) [14:30:30] (03PS1) 10Hashar: zuul: add a connection to gerrit-test.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/598058 (https://phabricator.wikimedia.org/T253263) [14:30:56] (03CR) 10Dzahn: [C: 03+2] DHCP: remove recently decom'ed codfw appservers [puppet] - 10https://gerrit.wikimedia.org/r/598039 (https://phabricator.wikimedia.org/T247018) (owner: 10Dzahn) [14:33:10] (03CR) 10Hashar: "In the old times Zuul just add '[gerrit]' which has then be enhanced with the concept of connections. 
So we get something like:" [puppet] - 10https://gerrit.wikimedia.org/r/598057 (https://phabricator.wikimedia.org/T253263) (owner: 10Hashar) [14:33:57] (03CR) 10Ema: [V: 03+2 C: 03+2] Basic testing with testify [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598008 (https://phabricator.wikimedia.org/T253197) (owner: 10Ema) [14:34:06] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install 86 new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10Dzahn) 05Stalled→03Open Hi @Papaul 23 servers from rack C3 have been decom'ed. mw2150 through mw2172. (lower part of the rack) You can: - remove these p... [14:34:08] (03CR) 10Ema: [C: 03+2] Packaging and copyright notices [software/prometheus-rdkafka-exporter] - 10https://gerrit.wikimedia.org/r/598009 (https://phabricator.wikimedia.org/T253197) (owner: 10Ema) [14:34:14] (03PS3) 10CDanis: systemd::timer::job: unkludge OnUnitInactiveSec/OnUnitActiveSec [puppet] - 10https://gerrit.wikimedia.org/r/598050 (https://phabricator.wikimedia.org/T253374) [14:34:55] (03CR) 10Hashar: "Untested, although I have looked over the zuul code and it should be fine. 
We can check it live on Tuesday morning, I don't want to take a" [puppet] - 10https://gerrit.wikimedia.org/r/598058 (https://phabricator.wikimedia.org/T253263) (owner: 10Hashar) [14:35:41] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1144:331[45] and db1097:331[45]', diff saved to https://phabricator.wikimedia.org/P11290 and previous config saved to /var/cache/conftool/dbconfig/20200522-143541-marostegui.json [14:35:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:37:28] (03CR) 10CDanis: "PCC looks good https://puppet-compiler.wmflabs.org/compiler1001/22710/" [puppet] - 10https://gerrit.wikimedia.org/r/598050 (https://phabricator.wikimedia.org/T253374) (owner: 10CDanis) [14:38:43] 10Operations, 10observability, 10Sustainability (Incident Prevention): add monitoring of sustained memcached TKO rates - https://phabricator.wikimedia.org/T253384 (10elukey) The current use cases are (to simplify assume eqiad): 1) one or more shards among mc10xx go down, and mcrouters on mw1xxx failover to... [14:40:05] 10Operations, 10ops-codfw, 10decommission, 10serviceops, 10Patch-For-Review: codfw: decom at least 15 appservers in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (10Dzahn) Technically resolved because we made more than enough room for the 5 (not 15 anymore, 10 we... 
[14:43:07] (03PS1) 10Kormat: mariadb: Enable notifications for db2137 [puppet] - 10https://gerrit.wikimedia.org/r/598061 (https://phabricator.wikimedia.org/T252985) [14:43:30] marostegui: ^ [14:43:45] (03CR) 10Marostegui: [C: 03+1] mariadb: Enable notifications for db2137 [puppet] - 10https://gerrit.wikimedia.org/r/598061 (https://phabricator.wikimedia.org/T252985) (owner: 10Kormat) [14:43:56] (03CR) 10Gehel: [C: 04-1] "see inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/598038 (owner: 10Elukey) [14:43:59] (03CR) 10QChris: [C: 04-1] zuul: add a connection to gerrit-test.wikimedia.org (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/598058 (https://phabricator.wikimedia.org/T253263) (owner: 10Hashar) [14:44:08] (03CR) 10Kormat: [C: 03+2] mariadb: Enable notifications for db2137 [puppet] - 10https://gerrit.wikimedia.org/r/598061 (https://phabricator.wikimedia.org/T252985) (owner: 10Kormat) [14:46:09] (03CR) 10Elukey: alternatives: add a class for the java use case (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/598038 (owner: 10Elukey) [14:46:14] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install 86 new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10Papaul) @Dzahn Thanks [14:46:45] (03CR) 10Elukey: alternatives: add a class for the java use case (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/598038 (owner: 10Elukey) [14:47:05] (03CR) 10Hashar: zuul: add a connection to gerrit-test.wikimedia.org (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/598058 (https://phabricator.wikimedia.org/T253263) (owner: 10Hashar) [14:47:40] (03PS2) 10Hashar: zuul: add a connection to gerrit-test.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/598058 (https://phabricator.wikimedia.org/T253263) [14:50:13] (03CR) 10QChris: [C: 04-1] zuul: use modern [connection] section in config [puppet] - 10https://gerrit.wikimedia.org/r/598057 (https://phabricator.wikimedia.org/T253263) (owner: 
10Hashar) [14:50:50] (03CR) 10Alexandros Kosiaris: [C: 03+1] Add new pool DatabasesCodfw to backup data generated on eqiad [puppet] - 10https://gerrit.wikimedia.org/r/598005 (https://phabricator.wikimedia.org/T79922) (owner: 10Jcrespo) [14:51:12] (03CR) 10QChris: [C: 04-1] zuul: add a connection to gerrit-test.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/598058 (https://phabricator.wikimedia.org/T253263) (owner: 10Hashar) [14:51:34] !log reedy@deploy1001 Synchronized php-1.35.0-wmf.32/maintenance/blockUsers.php: (no justification provided) (duration: 01m 09s) [14:51:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:38] (03CR) 10QChris: [C: 03+1] "Sorry clicked wrong button before. :-(" [puppet] - 10https://gerrit.wikimedia.org/r/598058 (https://phabricator.wikimedia.org/T253263) (owner: 10Hashar) [14:52:20] (03PS4) 10Elukey: alternatives: add a class for the java use case [puppet] - 10https://gerrit.wikimedia.org/r/598038 [14:52:43] (03CR) 10QChris: [C: 03+1] "Sorry, clicked wrong button before :-(" [puppet] - 10https://gerrit.wikimedia.org/r/598057 (https://phabricator.wikimedia.org/T253263) (owner: 10Hashar) [14:53:13] (03PS5) 10Elukey: alternatives: add a class for the java use case [puppet] - 10https://gerrit.wikimedia.org/r/598038 [14:53:33] (03CR) 10jerkins-bot: [V: 04-1] alternatives: add a class for the java use case [puppet] - 10https://gerrit.wikimedia.org/r/598038 (owner: 10Elukey) [14:53:46] !log reedy@deploy1001 Synchronized php-1.35.0-wmf.31/maintenance/blockUsers.php: (no justification provided) (duration: 01m 08s) [14:53:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:15] (03CR) 10Alexandros Kosiaris: [C: 03+1] "> @Alexandros, I looked into the issue. It's happening because you don't have MySQL setup locally. I think it's a non-issue." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: 10Bmansurov) [14:55:19] (03PS14) 10Alexandros Kosiaris: Add recommendation-api chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: 10Bmansurov) [14:57:14] (03CR) 10Alexandros Kosiaris: [C: 03+2] Add recommendation-api chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: 10Bmansurov) [14:57:34] (03Merged) 10jenkins-bot: Add recommendation-api chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: 10Bmansurov) [14:59:05] 10Operations, 10Privacy Engineering, 10Research, 10Traffic, 10Privacy: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (10JFishback_WMF) The last things I see are: 1) Remove as much of the YUI and Bootstrap CSS and JS as poss... 
[15:01:21] !log kormat@cumin1001 dbctl commit (dc=all): 'Pool db2137 into s4+s5 T252985', diff saved to https://phabricator.wikimedia.org/P11292 and previous config saved to /var/cache/conftool/dbconfig/20200522-150120-kormat.json [15:01:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:01:24] T252985: Productionize db213[6-9] and db2140 - https://phabricator.wikimedia.org/T252985 [15:01:50] (03CR) 10RLazarus: [C: 03+1] site: fix comment regarding servers in the wrong rack [puppet] - 10https://gerrit.wikimedia.org/r/597773 (owner: 10Dzahn) [15:02:22] (03CR) 10Alexandros Kosiaris: "This was added back in I9f73926b913782a1c5ef5cf5ee03398b7d57758a by WMDE as a precaution against upgrading by mistake to the latest versio" [deployment-charts] - 10https://gerrit.wikimedia.org/r/598055 (owner: 10JMeybohm) [15:02:25] (03PS1) 10Volans: sre.hosts.decommission: check repositories [cookbooks] - 10https://gerrit.wikimedia.org/r/598065 [15:03:56] (03CR) 10Marostegui: [V: 03+2 C: 03+2] dashboard.sql: Change storing time [software/tendril] - 10https://gerrit.wikimedia.org/r/597792 (https://phabricator.wikimedia.org/T252331) (owner: 10Marostegui) [15:06:27] (03PS6) 10Elukey: alternatives: add a class for the java use case [puppet] - 10https://gerrit.wikimedia.org/r/598038 [15:06:29] !log Decrease tendril_purge_global_status_log_5m storing rows time from 2 days to 1 day T252331 [15:06:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:32] T252331: tendril_purge_global_status_log_5m and global_status_log needs more frequent purging - https://phabricator.wikimedia.org/T252331 [15:11:55] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/598050 (https://phabricator.wikimedia.org/T253374) (owner: 10CDanis) [15:12:14] (03CR) 10CDanis: [C: 03+2] systemd::timer::job: unkludge OnUnitInactiveSec/OnUnitActiveSec [puppet] - 10https://gerrit.wikimedia.org/r/598050 
(https://phabricator.wikimedia.org/T253374) (owner: 10CDanis) [15:12:25] (03CR) 10RLazarus: [C: 03+1] site: decom mw2163 through mw2169 appservers [puppet] - 10https://gerrit.wikimedia.org/r/598025 (https://phabricator.wikimedia.org/T247018) (owner: 10Dzahn) [15:15:32] (03CR) 10RLazarus: [C: 03+1] site: decom mw2170 - mw2172 [puppet] - 10https://gerrit.wikimedia.org/r/598036 (https://phabricator.wikimedia.org/T247018) (owner: 10Dzahn) [15:20:01] (03PS1) 10RLazarus: site: Move the D3 hosts under the D3 comment [puppet] - 10https://gerrit.wikimedia.org/r/598066 [15:22:51] 10Operations, 10Privacy Engineering, 10Research, 10Traffic, 10Privacy: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (10leila) @bmansurov please check JFishback_WMF's comment above and make the changes requested. [15:24:11] !log hnowlan@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [15:24:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:25:18] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . 
[15:25:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:25:45] (03PS1) 10Hashar: zuul: add convenience link to 'zuul' bin [puppet] - 10https://gerrit.wikimedia.org/r/598068 (https://phabricator.wikimedia.org/T224591) [15:25:47] !log fixing prometheus-nic-firmware-textfile.service wherever it is broken T253374 [15:25:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:25:51] T253374: Stale nic firmware files on some hosts - https://phabricator.wikimedia.org/T253374 [15:26:26] (03CR) 10Hashar: "With the zuul Debian package, that was in the PATH ;)" [puppet] - 10https://gerrit.wikimedia.org/r/598068 (https://phabricator.wikimedia.org/T224591) (owner: 10Hashar) [15:29:57] (03PS2) 10Privacybatm: transfer.py: Add information to --help option [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/597569 (https://phabricator.wikimedia.org/T253219) [15:30:04] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . 
[15:30:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:43] 10Operations, 10Traffic, 10Patch-For-Review: Stale nic firmware files on some hosts - https://phabricator.wikimedia.org/T253374 (10CDanis) 05Open→03Resolved Fixed, and shouldn't happen again for this or other similar systemd::timer::jobs [15:36:46] 10Operations, 10Traffic: track NIC firmware version numbers across the fleet - https://phabricator.wikimedia.org/T236744 (10CDanis) [15:37:30] (03PS4) 10Cwhite: mtail: update varnishrls compatibility with rc35 [puppet] - 10https://gerrit.wikimedia.org/r/594316 (https://phabricator.wikimedia.org/T251466) [15:38:50] (03PS2) 10Alexandros Kosiaris: restrouter: Cleanup some leftover hiera entries [puppet] - 10https://gerrit.wikimedia.org/r/598047 (https://phabricator.wikimedia.org/T242461) [15:38:52] (03PS1) 10Alexandros Kosiaris: install: Switch kubernetes/kubestage2XXX to stretch [puppet] - 10https://gerrit.wikimedia.org/r/598069 (https://phabricator.wikimedia.org/T252185) [15:40:31] (03CR) 10Alexandros Kosiaris: [C: 03+2] restrouter: Cleanup some leftover hiera entries [puppet] - 10https://gerrit.wikimedia.org/r/598047 (https://phabricator.wikimedia.org/T242461) (owner: 10Alexandros Kosiaris) [15:41:13] (03CR) 10Alexandros Kosiaris: [C: 03+2] install: Switch kubernetes/kubestage2XXX to stretch [puppet] - 10https://gerrit.wikimedia.org/r/598069 (https://phabricator.wikimedia.org/T252185) (owner: 10Alexandros Kosiaris) [15:41:33] (03PS2) 10Alexandros Kosiaris: install: Switch kubernetes/kubestage2XXX to stretch [puppet] - 10https://gerrit.wikimedia.org/r/598069 (https://phabricator.wikimedia.org/T252185) [15:41:35] (03CR) 10Cwhite: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/594316 (https://phabricator.wikimedia.org/T251466) (owner: 10Cwhite) [15:45:14] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . 
[15:45:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:30] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [15:45:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:06] (03CR) 10Volans: "A couple of things to fix, looks good otherwise." (036 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/593476 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [15:46:55] (03CR) 10Krinkle: [C: 03+1] "This will be good as it restores/maintains what we already have on contint1001, so that CI maintenance docs etc continue to be accurate on" [puppet] - 10https://gerrit.wikimedia.org/r/598068 (https://phabricator.wikimedia.org/T224591) (owner: 10Hashar) [15:47:49] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [15:47:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:58] (03PS1) 10Alexandros Kosiaris: Remove install[12]002 site.pp entries [puppet] - 10https://gerrit.wikimedia.org/r/598071 (https://phabricator.wikimedia.org/T224576) [15:50:52] (03CR) 10CDanis: [C: 03+1] "still LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/597765 (https://phabricator.wikimedia.org/T253292) (owner: 10Filippo Giunchedi) [15:52:35] (03CR) 10Volans: [C: 03+1] "LGTM for now. The whole diff_configs() could use a refactor, we're at 3 nested functions ;) But not urgent or a blocker." [software/conftool] - 10https://gerrit.wikimedia.org/r/597631 (https://phabricator.wikimedia.org/T253025) (owner: 10CDanis) [15:53:04] (03CR) 10Volans: [C: 03+1] "LGTM, one nit inline" (031 comment) [software/conftool] - 10https://gerrit.wikimedia.org/r/597634 (owner: 10CDanis) [15:53:36] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . 
[15:53:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:56:35] (03PS4) 10CDanis: dbctl: diffs: cleanup return value accumulation [software/conftool] - 10https://gerrit.wikimedia.org/r/597634 [15:57:02] (03CR) 10CDanis: [C: 03+2] dbctl: diffs: cleanup return value accumulation (031 comment) [software/conftool] - 10https://gerrit.wikimedia.org/r/597634 (owner: 10CDanis) [15:57:09] (03CR) 10CDanis: [C: 03+2] dbctl: diffs: recurse into complicated sub-sections [software/conftool] - 10https://gerrit.wikimedia.org/r/597631 (https://phabricator.wikimedia.org/T253025) (owner: 10CDanis) [15:57:11] (03PS10) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: refactor [cookbooks] - 10https://gerrit.wikimedia.org/r/593476 (https://phabricator.wikimedia.org/T246890) [15:57:32] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [15:57:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:43] (03CR) 10Jbond: "updated thanks" (036 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/593476 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [15:57:58] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [15:58:20] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [15:58:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:58:32] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . 
[15:58:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:58:47] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install kubernetes20[07-14].codfw.wmnet and kubestage200[1-2].codfw.wmnet. - https://phabricator.wikimedia.org/T252185 (10akosiaris) a:05Papaul→03akosiaris We debugged this with @papaul, patch above resolves it. Whil... [15:59:16] (03Merged) 10jenkins-bot: dbctl: diffs: recurse into complicated sub-sections [software/conftool] - 10https://gerrit.wikimedia.org/r/597631 (https://phabricator.wikimedia.org/T253025) (owner: 10CDanis) [15:59:18] (03Merged) 10jenkins-bot: dbctl: diffs: cleanup return value accumulation [software/conftool] - 10https://gerrit.wikimedia.org/r/597634 (owner: 10CDanis) [15:59:26] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [16:01:01] (03PS2) 10Jbond: cookbook sre.hosts.rotate-pdu-password: rename [cookbooks] - 10https://gerrit.wikimedia.org/r/598020 [16:03:29] (03PS1) 10Bstorm: icinga: switch bstorm-wmcs to bstorm-email [puppet] - 10https://gerrit.wikimedia.org/r/598072 [16:03:33] (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/593476 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [16:06:00] (03CR) 10Bstorm: [C: 03+2] icinga: switch bstorm-wmcs to bstorm-email [puppet] - 10https://gerrit.wikimedia.org/r/598072 (owner: 10Bstorm) [16:06:17] (03CR) 10Volans: [C: 03+1] "trivial, and +1 to the name change" [cookbooks] - 10https://gerrit.wikimedia.org/r/598020 (owner: 10Jbond) [16:08:59] (03PS5) 10Jbond: sre.pdus.rotate-password: split generic functions out to __init__.py [cookbooks] - 10https://gerrit.wikimedia.org/r/598021 [16:09:57] 10Operations, 10Striker, 10LDAP: Store Wikimedia unified account name (SUL) in LDAP directory - 
https://phabricator.wikimedia.org/T148048 (Aklapper) [16:11:13] (PS6) Jbond: sre.pdus.rotate-password: split generic functions out to __init__.py [cookbooks] - https://gerrit.wikimedia.org/r/598021 [16:11:54] (PS34) Jbond: cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) [16:13:05] (PS1) Ssingh: wikidough: install dnsdist and add configuration file [puppet] - https://gerrit.wikimedia.org/r/598073 (https://phabricator.wikimedia.org/T252132) [16:20:34] (CR) Bearloga: [C: +1] "@Elukey: Could you please help with +2ing this?" [puppet] - https://gerrit.wikimedia.org/r/596221 (https://phabricator.wikimedia.org/T252365) (owner: Bearloga) [16:20:43] (PS1) Hnowlan: changeprop-jobqueue: Set correct port, fix config indentation. [deployment-charts] - https://gerrit.wikimedia.org/r/598074 (https://phabricator.wikimedia.org/T220399) [16:29:39] (CR) Volans: "Is this still needed?" [cookbooks] - https://gerrit.wikimedia.org/r/595649 (https://phabricator.wikimedia.org/T206951) (owner: Ryan Kemper) [16:29:46] (CR) Krinkle: "He, I see. Nice :)" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/594316 (https://phabricator.wikimedia.org/T251466) (owner: Cwhite) [16:31:12] (PS1) JMeybohm: Readd wmf.chartid (.metadata.labels.chart) to all resources [deployment-charts] - https://gerrit.wikimedia.org/r/598076 [16:33:06] (CR) Bstorm: "Disabling puppet on toolsbeta-test-k8s-control-2/3 so I can validate this live on toolsbeta-test-k8s-control-1 without worrying it will bl" [puppet] - https://gerrit.wikimedia.org/r/597899 (https://phabricator.wikimedia.org/T211096) (owner: Bstorm) [16:37:41] (CR) Bstorm: "It's a noop on toolsbeta-test-k8s-control-1. Yay!" 
[puppet] - https://gerrit.wikimedia.org/r/597899 (https://phabricator.wikimedia.org/T211096) (owner: Bstorm) [16:37:43] (PS2) JMeybohm: Readd wmf.chartid (.metadata.labels.chart) to all resources [deployment-charts] - https://gerrit.wikimedia.org/r/598076 [16:37:49] (PS2) Ssingh: wikidough: install dnsdist and add configuration file [puppet] - https://gerrit.wikimedia.org/r/598073 (https://phabricator.wikimedia.org/T252132) [16:39:45] (CR) Volans: "LGTM, couple of nits inline" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/596389 (owner: Ayounsi) [16:40:16] (CR) Ssingh: "https://puppet-compiler.wmflabs.org/compiler1001/22713/malmok.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/598073 (https://phabricator.wikimedia.org/T252132) (owner: Ssingh) [16:40:55] (CR) Volans: "Also probably worth going to 2.8.4 directly (or wait for .5 ;) )" [software/netbox-deploy] - https://gerrit.wikimedia.org/r/595717 (owner: CRusnov) [16:41:46] (CR) Bstorm: [C: +2] "Because it's a noop on toolforge and I need this to bootstrap the cluster for paws, I'll merge it." 
[puppet] - https://gerrit.wikimedia.org/r/597899 (https://phabricator.wikimedia.org/T211096) (owner: Bstorm) [16:42:08] (CR) CRusnov: "> Patch Set 1:" [software/netbox-deploy] - https://gerrit.wikimedia.org/r/595717 (owner: CRusnov) [16:42:25] (PS3) Volans: wmf-auto-reimage: fix autodetected rename MGMT [puppet] - https://gerrit.wikimedia.org/r/595931 (https://phabricator.wikimedia.org/T214314) [16:43:59] (CR) Volans: [C: +2] wmf-auto-reimage: fix autodetected rename MGMT [puppet] - https://gerrit.wikimedia.org/r/595931 (https://phabricator.wikimedia.org/T214314) (owner: Volans) [16:56:21] PROBLEM - PHP opcache health on scandium is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [17:01:51] RECOVERY - PHP opcache health on scandium is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [17:08:14] (PS1) BryanDavis: toolforge: remove elasticsearch5 role and profile manifests [puppet] - https://gerrit.wikimedia.org/r/598082 (https://phabricator.wikimedia.org/T236606) [17:13:12] (CR) BryanDavis: "PCC output: https://puppet-compiler.wmflabs.org/compiler1001/22714/" [puppet] - https://gerrit.wikimedia.org/r/598082 (https://phabricator.wikimedia.org/T236606) (owner: BryanDavis) [17:14:25] (CR) Andrew Bogott: [C: +2] toolforge: remove elasticsearch5 role and profile manifests [puppet] - https://gerrit.wikimedia.org/r/598082 (https://phabricator.wikimedia.org/T236606) (owner: BryanDavis) [17:20:30] Operations, Core Platform Team, Traffic, serviceops, and 2 others: Reduce rate of purges emitted by MediaWiki - https://phabricator.wikimedia.org/T250205 (Krinkle) >>! In T250205#6154883, @aaron wrote: > I'm not fond of the idea of not sending purges for indirect edits Agreed. The proposal to st... 
[17:59:15] (CR) Cwhite: mtail: update varnishrls compatibility with rc35 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/594316 (https://phabricator.wikimedia.org/T251466) (owner: Cwhite) [17:59:26] (PS7) Jeena Huneidi: Automate deployments [deployment-charts] - https://gerrit.wikimedia.org/r/597653 (https://phabricator.wikimedia.org/T253264) [18:00:38] (PS8) Jeena Huneidi: Automate deployments [deployment-charts] - https://gerrit.wikimedia.org/r/597653 (https://phabricator.wikimedia.org/T253264) [18:01:33] Operations, Traffic, Patch-For-Review: check_http and SNI support - https://phabricator.wikimedia.org/T253292 (RLazarus) p:Triage→Medium [18:08:49] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some recent uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (AntiCompositeNumber) [18:23:00] (PS2) CRusnov: Upgrade Netbox to v2.8.4-wmf [software/netbox-deploy] - https://gerrit.wikimedia.org/r/595717 [18:23:13] Operations, Core Platform Team, Traffic: Whitelist x-wikimedia-debug header field (currently not allowed by Access-Control-Allow-Headers in preflight response) - https://phabricator.wikimedia.org/T252826 (RLazarus) Hi CPT, adding you back -- is it possible this is due to this section of [[ https://ge... [18:35:51] (PS1) Bstorm: toolforge-kubeadm: kubeadm 1.16 requires docker 18.09 [puppet] - https://gerrit.wikimedia.org/r/598093 (https://phabricator.wikimedia.org/T250866) [18:45:19] Operations, SRE-tools: wmf-auto-reimage-host: failed to resolve mgmt FQDN while renaming host - https://phabricator.wikimedia.org/T214314 (Volans) Open→Resolved The patch above should have fixed the issue. 
[18:51:22] (PS1) Alexandros Kosiaris: Add egress fixtures to all charts [deployment-charts] - https://gerrit.wikimedia.org/r/598098 (https://phabricator.wikimedia.org/T249927) [18:51:46] (CR) jerkins-bot: [V: -1] Add egress fixtures to all charts [deployment-charts] - https://gerrit.wikimedia.org/r/598098 (https://phabricator.wikimedia.org/T249927) (owner: Alexandros Kosiaris) [18:58:58] (PS2) Alexandros Kosiaris: Add egress fixtures to all charts [deployment-charts] - https://gerrit.wikimedia.org/r/598098 (https://phabricator.wikimedia.org/T249927) [18:59:00] (PS1) Alexandros Kosiaris: mobileapps/chromium-render: Fix indentation [deployment-charts] - https://gerrit.wikimedia.org/r/598099 [18:59:51] (CR) Alexandros Kosiaris: [C: +2] mobileapps/chromium-render: Fix indentation [deployment-charts] - https://gerrit.wikimedia.org/r/598099 (owner: Alexandros Kosiaris) [18:59:56] (CR) Alexandros Kosiaris: [C: +2] Add egress fixtures to all charts [deployment-charts] - https://gerrit.wikimedia.org/r/598098 (https://phabricator.wikimedia.org/T249927) (owner: Alexandros Kosiaris) [19:00:14] (Merged) jenkins-bot: mobileapps/chromium-render: Fix indentation [deployment-charts] - https://gerrit.wikimedia.org/r/598099 (owner: Alexandros Kosiaris) [19:00:20] (Merged) jenkins-bot: Add egress fixtures to all charts [deployment-charts] - https://gerrit.wikimedia.org/r/598098 (https://phabricator.wikimedia.org/T249927) (owner: Alexandros Kosiaris) [19:01:49] (PS2) Alexandros Kosiaris: zotero: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597783 (owner: Apakhomov) [19:02:07] (CR) jerkins-bot: [V: -1] zotero: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597783 (owner: Apakhomov) [19:03:09] Operations, Core Platform Team, Traffic, Developer Productivity: Whitelist x-wikimedia-debug header field (currently not allowed by 
Access-Control-Allow-Headers in preflight response) - https://phabricator.wikimedia.org/T252826 (Krinkle) [19:03:46] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some recent uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (Aklapper) [19:07:20] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some recent uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (CDanis) https://en.wikipedia.org/wiki/File:Lucille-Mareen_Mayr_(2018).jpg now works, and also did on every other wiki I... [19:11:57] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some recent uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (CDanis) `test test2 he ca en de it wikidat wvoyen 200 200 404 404 404 404 404 404 404 Fioan_Fiedler.jpg 200... [19:15:21] (PS3) Privacybatm: transfer.py: Add information to --help option [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/597569 (https://phabricator.wikimedia.org/T253219) [19:22:01] (PS1) Alexandros Kosiaris: common_templates: Deduplicate using symlinks [deployment-charts] - https://gerrit.wikimedia.org/r/598102 [19:23:09] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some recent uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (BMacZero) More affected files, in case they are helpful: - https://commons.wikimedia.org/wiki/File:Arlington-Reef-2... 
[19:24:07] (PS3) Alexandros Kosiaris: zotero: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597783 (owner: Apakhomov) [19:25:13] (CR) Alexandros Kosiaris: [C: +2] "https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/597783/ was +2ed after this and I 've tried a couple of helm packages ma" [deployment-charts] - https://gerrit.wikimedia.org/r/598102 (owner: Alexandros Kosiaris) [19:25:39] (Merged) jenkins-bot: common_templates: Deduplicate using symlinks [deployment-charts] - https://gerrit.wikimedia.org/r/598102 (owner: Alexandros Kosiaris) [19:25:51] (CR) Alexandros Kosiaris: [C: +2] zotero: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597783 (owner: Apakhomov) [19:26:13] (Merged) jenkins-bot: zotero: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597783 (owner: Apakhomov) [19:26:49] (PS2) Alexandros Kosiaris: termbox: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597778 (owner: Apakhomov) [19:27:22] (PS2) Alexandros Kosiaris: cxserver: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597752 (owner: Apakhomov) [19:27:43] (PS2) Alexandros Kosiaris: wikifeeds: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597782 (owner: Apakhomov) [19:28:02] (CR) Alexandros Kosiaris: [C: +2] termbox: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597778 (owner: Apakhomov) [19:28:24] (Merged) jenkins-bot: termbox: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597778 (owner: Apakhomov) [19:28:39] (PS3) Alexandros Kosiaris: cxserver: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597752 (owner: Apakhomov) [19:28:52] (CR) Alexandros Kosiaris: [C: +2] wikifeeds: added support 
egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597782 (owner: Apakhomov) [19:29:05] (PS3) Alexandros Kosiaris: wikifeeds: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597782 (owner: Apakhomov) [19:29:21] (CR) Alexandros Kosiaris: [V: +2 C: +2] wikifeeds: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597782 (owner: Apakhomov) [19:29:44] (PS2) Alexandros Kosiaris: citoid: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597750 (owner: Apakhomov) [19:29:46] (Merged) jenkins-bot: wikifeeds: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597782 (owner: Apakhomov) [19:33:12] (PS3) Alexandros Kosiaris: citoid: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597750 (owner: Apakhomov) [19:34:37] (PS4) Alexandros Kosiaris: citoid: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597750 (owner: Apakhomov) [19:36:01] (CR) Alexandros Kosiaris: [C: +2] citoid: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597750 (owner: Apakhomov) [19:36:25] (Merged) jenkins-bot: citoid: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597750 (owner: Apakhomov) [19:36:42] (PS4) Alexandros Kosiaris: cxserver: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597752 (owner: Apakhomov) [19:38:55] (CR) Alexandros Kosiaris: [C: +2] cxserver: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597752 (owner: Apakhomov) [19:39:21] (Merged) jenkins-bot: cxserver: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597752 (owner: Apakhomov) [19:39:49] (Abandoned) Alexandros Kosiaris: restrouter: added support egress rules [deployment-charts] - 
https://gerrit.wikimedia.org/r/597791 (owner: Apakhomov) [19:39:52] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some recent uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (CDanis) To get a rough idea of how broadly this is happening, I grabbed a list of the latest 50 uploads to Commons. 6... [19:40:38] (PS2) Alexandros Kosiaris: kask: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597776 (owner: Apakhomov) [19:44:04] (CR) Alexandros Kosiaris: [C: -1] kask: added support egress rules (3 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/597776 (owner: Apakhomov) [19:44:48] (PS2) Alexandros Kosiaris: parsoid: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597789 (owner: Apakhomov) [19:46:07] (CR) Alexandros Kosiaris: [C: -1] parsoid: added support egress rules (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/597789 (owner: Apakhomov) [19:46:37] (PS2) Alexandros Kosiaris: mediawiki-dev: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597787 (owner: Apakhomov) [19:47:20] (CR) Alexandros Kosiaris: [C: -1] mediawiki-dev: added support egress rules (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/597787 (owner: Apakhomov) [19:48:07] (CR) Alexandros Kosiaris: [C: -1] chromium-render: added support egress rules (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/597785 (owner: Apakhomov) [19:49:45] (CR) Alexandros Kosiaris: [C: -1] mathoid: added support egress rules (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/597777 (owner: Apakhomov) [19:50:08] (PS3) Alexandros Kosiaris: changeprop: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597749 (owner: Apakhomov) [19:51:55] (CR) Alexandros 
Kosiaris: [C: +2] changeprop: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597749 (owner: Apakhomov) [19:52:15] (PS2) Alexandros Kosiaris: eventgate: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597772 (owner: Apakhomov) [19:52:18] (Merged) jenkins-bot: changeprop: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597749 (owner: Apakhomov) [19:53:26] (CR) Alexandros Kosiaris: [C: -1] eventgate: added support egress rules (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/597772 (owner: Apakhomov) [19:54:01] (CR) Alexandros Kosiaris: [C: -1] eventstreams: added support egress rules (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/597774 (owner: Apakhomov) [20:09:33] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some recent uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (CDanis) For one file, https://commons.wikimedia.org/wiki/File:Terrible_scene_at_Manders.jpg, I iterated over every wiki... [20:16:30] (PS1) BryanDavis: .gitignore: add wmcs-package-build.py [software/tools-webservice] - https://gerrit.wikimedia.org/r/598108 [20:16:32] (PS1) BryanDavis: Remove validation of Kubernetes self-signed API cert [software/tools-webservice] - https://gerrit.wikimedia.org/r/598109 (https://phabricator.wikimedia.org/T253412) [20:16:47] (CR) BryanDavis: [C: +2] .gitignore: add wmcs-package-build.py [software/tools-webservice] - https://gerrit.wikimedia.org/r/598108 (owner: BryanDavis) [20:17:42] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some recent uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (CDanis) https://commons.wikimedia.org/wiki/File:Fioan_Fiedler.jpg is, uh, neatly partitioned along the alphabet in term... 
[20:18:11] (Merged) jenkins-bot: .gitignore: add wmcs-package-build.py [software/tools-webservice] - https://gerrit.wikimedia.org/r/598108 (owner: BryanDavis) [20:26:29] (Abandoned) Jforrester: Undeploy ParsoidBatchAPI [mediawiki-config] - https://gerrit.wikimedia.org/r/563461 (https://phabricator.wikimedia.org/T242430) (owner: Reedy) [20:39:21] (PS2) Krinkle: Clean up MWMultiVersion check in CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/579816 [20:45:46] (PS1) Andrew Bogott: profile::openstack::base::designate::service: tighten up firewall rules [puppet] - https://gerrit.wikimedia.org/r/598112 (https://phabricator.wikimedia.org/T251604) [20:46:00] (PS3) Krinkle: Clean up MWMultiVersion check in CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/579816 [20:46:50] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some recent uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (Maile66) Don't know if this is related, but thought I'd mention it. About the same time the issue for this ticket was... [20:53:47] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some recent uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (Multichill) Could T250767 be related @Tgr ? 
[20:56:14] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some recent uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (CDanis) p:Triage→High [21:08:24] (PS4) Krinkle: Clean up MWMultiVersion check in CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/579816 [21:09:04] (PS5) Krinkle: Clean up MWMultiVersion check in CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/579816 [21:09:56] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some recent uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (CDanis) [21:10:14] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some (recent?) uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (CDanis) [21:10:52] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading: Some (recent?) uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (CDanis) AIUI, @aaron suspects that (newly-added?) WANObjectCache memcache key coalescing support might be at fault h... [21:22:48] (CR) Jforrester: Clean up MWMultiVersion check in CommonSettings.php (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/579816 (owner: Krinkle) [21:27:30] (CR) Bstorm: "So, this solution works, but it is surprisingly annoying on the command line." [software/tools-webservice] - https://gerrit.wikimedia.org/r/598109 (https://phabricator.wikimedia.org/T253412) (owner: BryanDavis) [21:27:57] (CR) Bstorm: "It was that many warnings for a single restart command." 
[software/tools-webservice] - https://gerrit.wikimedia.org/r/598109 (https://phabricator.wikimedia.org/T253412) (owner: BryanDavis) [21:28:38] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, Patch-For-Review: Some (recent?) uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (Krinkle) >>! In T253405#6159451, @CDanis wrote: > AIUI, @aaron suspects that (newly-added?) WA... [21:30:02] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: Some (recent?) uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (Krinkle) a:aaron [21:30:39] (CR) BryanDavis: [C: -1] "Needs a fix for the warnings." [software/tools-webservice] - https://gerrit.wikimedia.org/r/598109 (https://phabricator.wikimedia.org/T253412) (owner: BryanDavis) [21:36:53] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: Some (recent?) uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (Mbch331) [21:43:56] (CR) Krinkle: Clean up MWMultiVersion check in CommonSettings.php (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/579816 (owner: Krinkle) [21:57:22] (PS2) Krinkle: CommonSettings.php: Move uncondition/no-sideeffect includes up [mediawiki-config] - https://gerrit.wikimedia.org/r/579814 [22:13:23] (CR) Krinkle: CommonSettings.php: Move uncondition/no-sideeffect includes up (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/579814 (owner: Krinkle) [22:14:56] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 3 others: Some (recent?) uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (Speravir) Info: I renamed `File:Fioan Fiedler.jpg` to [[https://commons.wikimedia.org/wiki/File:Fiona_... 
[22:31:09] * Krinkle takes deploy lock and staging on mwdebug1002 [22:32:05] AaronSchulz: staged on mwdebug1002 [22:32:25] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 3 others: Some (recent?) uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (CDanis) a:aaron→Krinkle [22:37:01] Krinkle: trivial check, it seems to fix https://en.wikipedia.org/wiki/File:Terrible_scene_at_Manders.jpg [22:37:06] Krinkle: seems fine to me in chrome [22:39:04] OK [22:40:05] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 3 others: Some (recent?) uploads to Commons are not avaliable on other wikis - https://phabricator.wikimedia.org/T253405 (Krinkle) Open→Resolved a:Krinkle→aaron [22:40:44] !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.32/includes/filerepo/: Ie19613ef7643a (duration: 01m 08s) [22:40:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:41:19] https://en.wikipedia.org/wiki/File:Fioan_Fiedler.jpg [22:41:43] oh well that's interesting. the description is on enwiki and commons, but the thumbnail only displays on commons [22:42:01] !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.31/includes/filerepo/: Ie19613ef7643a (duration: 01m 06s) [22:42:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:42:31] AaronSchulz: Krinkle: fixes all six files I've tried [22:42:55] AntiComposite: fix is being deployed right now I'm pretty sure. 
I see the thumb [22:42:56] on all wikis [22:43:07] yeah, works now [22:43:43] was just a weird transition state [22:57:48] (CR) Jforrester: Clean up MWMultiVersion check in CommonSettings.php (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/579816 (owner: Krinkle) [22:57:53] (CR) Jforrester: [C: +1] Clean up MWMultiVersion check in CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/579816 (owner: Krinkle) [23:02:49] (PS1) Subramanya Sastry: Bump rt-test clients to 24 [puppet] - https://gerrit.wikimedia.org/r/598131