[00:00:04] twentyafterfour: Your horoscope predicts another unfortunate Phabricator update deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200409T0000).
[01:33:46] PROBLEM - PHP opcache health on mw2368 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[01:35:36] RECOVERY - PHP opcache health on mw2368 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[03:05:14] PROBLEM - PHP opcache health on mw2311 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[03:26:57] (CR) Cwhite: [C: +1] profile::kibana: add the package_name parameter [puppet] - https://gerrit.wikimedia.org/r/587427 (https://phabricator.wikimedia.org/T246961) (owner: Elukey)
[03:40:10] RECOVERY - PHP opcache health on mw2311 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[04:30:08] PROBLEM - PHP opcache health on mw2316 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[04:35:40] RECOVERY - PHP opcache health on mw2316 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[04:58:27] (CR) Muehlenhoff: [C: +2] Create a repository component component/wmf-sre-laptop [puppet] - https://gerrit.wikimedia.org/r/587500 (owner: Muehlenhoff)
[05:04:33] Operations, SRE-Access-Requests: Requesting access to analytics for andrew-wmde - https://phabricator.wikimedia.org/T249733 (MoritzMuehlenhoff) p:Triage→Medium You have an updated NDA in our records, so that's covered. Adding @Nuria for approval on the Wikimedia Foundation end This will also ne...
[05:07:46] !log upload trafficserver 8.0.6-1wm6 to apt.wm.o (buster) - T249335
[05:07:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:07:51] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[05:08:50] !log Deploy schema change on db1123
[05:08:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:42] (PS1) Marostegui: install_server: Allow reimage of labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/587628 (https://phabricator.wikimedia.org/T249188)
[05:22:36] (PS1) Marostegui: install_server: Reimage pc2008 with buster and 10.4 [puppet] - https://gerrit.wikimedia.org/r/587629
[05:24:51] (CR) Marostegui: [C: +2] install_server: Reimage pc2008 with buster and 10.4 [puppet] - https://gerrit.wikimedia.org/r/587629 (owner: Marostegui)
[05:27:43] (PS1) Marostegui: install_server: Allow reimage pc2008 [puppet] - https://gerrit.wikimedia.org/r/587630
[05:29:24] (CR) Marostegui: [C: +2] install_server: Allow reimage pc2008 [puppet] - https://gerrit.wikimedia.org/r/587630 (owner: Marostegui)
[05:32:09] (PS1) Marostegui: db-codfw.php: Depool pc2008 [mediawiki-config] - https://gerrit.wikimedia.org/r/587631
[05:34:00] (CR) Marostegui: [C: +2] db-codfw.php: Depool pc2008 [mediawiki-config] - https://gerrit.wikimedia.org/r/587631 (owner: Marostegui)
[05:34:49] (PS1) Marostegui: pc2008: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/587632
[05:34:55] (Merged) jenkins-bot: db-codfw.php: Depool pc2008 [mediawiki-config] - https://gerrit.wikimedia.org/r/587631 (owner: Marostegui)
[05:36:24] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Depool pc2008 for upgrade (duration: 01m 08s)
[05:36:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:36:44] (CR) Marostegui: [C: +2] pc2008: Disable notifications [puppet] - 
https://gerrit.wikimedia.org/r/587632 (owner: Marostegui)
[05:37:51] !log Stop MySQL on pc2008 for upgrade to Buster and 10.4
[05:37:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:40:40] !log upgrade ats to version 8.0.6-1wm6 in cp[4025,4031,5005,5011] - T249335
[05:40:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:40:45] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[05:58:19] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (MoritzMuehlenhoff) This task evolved from an access request for analytics-privatedata-users tow...
[06:00:16] (PS1) Marostegui: Revert "install_server: Allow reimage pc2008" [puppet] - https://gerrit.wikimedia.org/r/587634
[06:01:19] (CR) Elukey: [C: +2] profile::kibana: add the package_name parameter [puppet] - https://gerrit.wikimedia.org/r/587427 (https://phabricator.wikimedia.org/T246961) (owner: Elukey)
[06:03:49] PROBLEM - PHP opcache health on mw2369 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[06:16:45] (PS1) Giuseppe Lavagetto: mediawiki::web: nice envoy with the same priority as php-fpm [puppet] - https://gerrit.wikimedia.org/r/587636
[06:16:47] (PS1) Giuseppe Lavagetto: services_proxy: reduce the number of requests per connection [puppet] - https://gerrit.wikimedia.org/r/587637
[06:18:18] (CR) Ayounsi: [C: +2] uRPF enable globally as log only [homer/public] - https://gerrit.wikimedia.org/r/587526 (https://phabricator.wikimedia.org/T244147) (owner: Ayounsi)
[06:20:58] RECOVERY - PHP opcache health on mw2369 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[06:21:14] !log push urpf log only to AMS - T244147
[06:21:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:41] (CR) jerkins-bot: [V: -1] mediawiki::web: nice envoy with the same priority as php-fpm [puppet] - https://gerrit.wikimedia.org/r/587636 (owner: Giuseppe Lavagetto)
[06:22:08] (CR) jerkins-bot: [V: -1] services_proxy: reduce the number of requests per connection [puppet] - https://gerrit.wikimedia.org/r/587637 (owner: Giuseppe Lavagetto)
[06:24:21] Operations, Patch-For-Review: Upgrade install servers to Buster - https://phabricator.wikimedia.org/T224576 (Marostegui) I have also experienced `autoinstall/` not being managed by puppet issues :-( ` root@install1002:/srv/autoinstall# grep -iR "no-srv-format.cfg" * | grep pc2007 netboot.cfg: db1...
[06:25:05] !log push urpf log only to eqsin - T244147
[06:25:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:35] !log push urpf log only to eqiad - T244147
[06:30:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:33:30] (PS8) ArielGlenn: weekly dump of machine vision tables from commonswiki [puppet] - https://gerrit.wikimedia.org/r/573351 (https://phabricator.wikimedia.org/T236431)
[06:35:04] (CR) ArielGlenn: [C: +2] weekly dump of machine vision tables from commonswiki [puppet] - https://gerrit.wikimedia.org/r/573351 (https://phabricator.wikimedia.org/T236431) (owner: ArielGlenn)
[06:36:06] !log disabling puppet on logstash host for CR deploy - T244147
[06:36:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:36:24] PROBLEM - Rate of JVM GC Old generation-s runs - logstash1011-production-logstash-eqiad on logstash1011 is CRITICAL: 102.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs 
https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1011&panelId=37
[06:36:35] (CR) Ayounsi: [C: +2] Logstash: parse Juniper PFE firewall syslog. Take 2 [puppet] - https://gerrit.wikimedia.org/r/587513 (https://phabricator.wikimedia.org/T244147) (owner: Ayounsi)
[06:37:42] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (Marostegui) For the DB testing host we have a different procedure - https://wikitech.wikimedia....
[06:37:46] (CR) Ayounsi: "> Patch Set 2: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/587513 (https://phabricator.wikimedia.org/T244147) (owner: Ayounsi)
[06:42:33] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime
[06:42:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:58] !log confirmed on one host that the change didn't break logstash. 
Re-enable Puppet on logstash hosts - T244147
[06:44:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:09] (PS1) Muehlenhoff: Sync jenkins to thirdparty/ci [puppet] - https://gerrit.wikimedia.org/r/587684 (https://phabricator.wikimedia.org/T224591)
[06:45:02] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[06:45:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:45] (PS4) Ema: prometheus: job definition for purged [puppet] - https://gerrit.wikimedia.org/r/587525 (https://phabricator.wikimedia.org/T249583)
[06:52:12] (CR) jerkins-bot: [V: -1] prometheus: job definition for purged [puppet] - https://gerrit.wikimedia.org/r/587525 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[06:57:03] (PS1) Ema: purged: use processorcount with explicit namespace [puppet] - https://gerrit.wikimedia.org/r/587686 (https://phabricator.wikimedia.org/T249583)
[06:59:37] !log upgrade ats to version 8.0.6-1wm7 in cp[4026,4032,5006,5012]
[06:59:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:02:47] (CR) Ema: [C: +2] purged: use processorcount with explicit namespace [puppet] - https://gerrit.wikimedia.org/r/587686 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[07:04:01] !log re-activate BGP to Zayo in eqiad
[07:04:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:08:20] (CR) Marostegui: [C: +2] Revert "install_server: Allow reimage pc2008" [puppet] - https://gerrit.wikimedia.org/r/587634 (owner: Marostegui)
[07:09:15] (PS5) Ema: prometheus: job definition for purged [puppet] - https://gerrit.wikimedia.org/r/587525 (https://phabricator.wikimedia.org/T249583)
[07:10:19] (PS1) Marostegui: Revert "dbproxy: Depool labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/587691
[07:10:21] !log switch urpf from log to syslog in ulsfo
[07:10:25] Logged 
the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:11:01] (PS2) Giuseppe Lavagetto: mediawiki::web: nice envoy with the same priority as php-fpm [puppet] - https://gerrit.wikimedia.org/r/587636
[07:11:09] (PS2) Giuseppe Lavagetto: services_proxy: reduce the number of requests per connection [puppet] - https://gerrit.wikimedia.org/r/587637
[07:11:19] (CR) Marostegui: [C: +2] Revert "dbproxy: Depool labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/587691 (owner: Marostegui)
[07:12:51] !log Repool labsdb1011
[07:12:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:14:19] (CR) Ema: [C: +2] prometheus: job definition for purged [puppet] - https://gerrit.wikimedia.org/r/587525 (https://phabricator.wikimedia.org/T249583) (owner: Ema)
[07:16:47] (CR) Muehlenhoff: [C: +2] Sync jenkins to thirdparty/ci [puppet] - https://gerrit.wikimedia.org/r/587684 (https://phabricator.wikimedia.org/T224591) (owner: Muehlenhoff)
[07:19:47] (CR) Filippo Giunchedi: autoinstall: fix kafka-jumbo.cfg for Buster (2 comments) [puppet] - https://gerrit.wikimedia.org/r/587560 (https://phabricator.wikimedia.org/T244506) (owner: Elukey)
[07:22:32] (PS1) Ayounsi: Junos firewall log parsing, fix logic [puppet] - https://gerrit.wikimedia.org/r/587692 (https://phabricator.wikimedia.org/T244147)
[07:24:01] (CR) Filippo Giunchedi: Junos firewall log parsing, fix logic (1 comment) [puppet] - https://gerrit.wikimedia.org/r/587692 (https://phabricator.wikimedia.org/T244147) (owner: Ayounsi)
[07:24:33] !log synched jenkins 222.1 to apt.wikimedia.org (buster-wikimedia, thirdparty/ci) T224591
[07:24:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:38] T224591: Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591
[07:25:10] (CR) Giuseppe Lavagetto: [C: +2] mediawiki::web: nice envoy with the same priority as php-fpm 
[puppet] - https://gerrit.wikimedia.org/r/587636 (owner: Giuseppe Lavagetto)
[07:26:04] (CR) Elukey: autoinstall: fix kafka-jumbo.cfg for Buster (2 comments) [puppet] - https://gerrit.wikimedia.org/r/587560 (https://phabricator.wikimedia.org/T244506) (owner: Elukey)
[07:29:11] (CR) Filippo Giunchedi: autoinstall: fix kafka-jumbo.cfg for Buster (1 comment) [puppet] - https://gerrit.wikimedia.org/r/587560 (https://phabricator.wikimedia.org/T244506) (owner: Elukey)
[07:34:47] (CR) Ayounsi: Junos firewall log parsing, fix logic (1 comment) [puppet] - https://gerrit.wikimedia.org/r/587692 (https://phabricator.wikimedia.org/T244147) (owner: Ayounsi)
[07:35:19] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (MoritzMuehlenhoff) helm-diff also needs to be...
[07:37:57] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (MoritzMuehlenhoff) So, the consensus is that access to stat1006 is not needed and instead acce...
[07:38:38] (CR) Filippo Giunchedi: [C: +1] Junos firewall log parsing, fix logic (1 comment) [puppet] - https://gerrit.wikimedia.org/r/587692 (https://phabricator.wikimedia.org/T244147) (owner: Ayounsi)
[07:38:50] Operations, Traffic, Security: HTTP MediaWiki API GET requests to Wikimedia wikis should not be redirected to HTTPS when they have a session cookie or Authorization header - https://phabricator.wikimedia.org/T247490 (MoritzMuehlenhoff) p:Triage→Medium
[07:39:25] (CR) Ayounsi: [C: +2] Junos firewall log parsing, fix logic [puppet] - https://gerrit.wikimedia.org/r/587692 (https://phabricator.wikimedia.org/T244147) (owner: Ayounsi)
[07:39:27] Operations, Mail, Wikimedia-Mailing-lists: Email to WikimediaUA mailing list from base-w[at]yandex.ru does not get delivered - https://phabricator.wikimedia.org/T247603 (MoritzMuehlenhoff) p:Triage→Medium
[07:40:30] Operations, OpenRefine, Traffic, serviceops, Core Platform Team Workboards (Clinic Duty Team): Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (MoritzMuehlenhoff) p:Triage→High
[07:41:02] Operations, Wikimedia-Mailing-lists: add oauth login to mailing lists - https://phabricator.wikimedia.org/T249678 (MoritzMuehlenhoff) p:Triage→Low
[07:45:06] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (Dzahn)
[07:46:30] Operations, Cloud-VPS, User-fgiunchedi, cloud-services-team (Kanban): CPU scaling governor audit - https://phabricator.wikimedia.org/T225713 (fgiunchedi) a:fgiunchedi→None Not actively working on this, respective service/hw owners assess the need/feasibility of this change
[07:53:29] Operations, Release-Engineering-Team-TODO, 
Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (Dzahn) [ ] E: Unable to locate package blubber...
[07:53:33] Operations, OpenRefine, Traffic, serviceops, Core Platform Team Workboards (Clinic Duty Team): Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (Joe) As far as I can see, apache (which sits beyond envoy) emits all cooki...
[07:54:05] Operations, observability: dropped packets to kafkamon 9000/tcp - https://phabricator.wikimedia.org/T238794 (fgiunchedi) kafkamon2001 shows up in the codfw config because it has the `kafka::monitoring` role applied, my recommendation would be to use a more specific `class_name` for that particular config...
[07:56:36] !log contint2001 - a2dismod mpm_event - then run puppet to let it enable php_mod_7.3 (race condition like mentioned in https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206)
[07:56:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:47] !log contint2001 - a2dismod mpm_event - then run puppet to let it enable php_mod_7.3 (race condition like mentioned in https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206) (T224591)
[07:56:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:53] T224591: Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591
[08:05:49] PROBLEM - Check systemd state on wtp1048 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:06:32] Operations, OpenRefine, Traffic, serviceops, Core Platform Team Workboards (Clinic Duty Team): Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (Pintoch) Switching on headers capitalization would be absolutely fantastic...
[08:19:34] RECOVERY - Check no envoy runtime configuration is left persistent on wtp1025 is OK: HTTP OK: HTTP/1.1 200 OK - 286 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23Envoy
[08:21:12] RECOVERY - Check systemd state on wtp1048 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:25:04] Operations, DBA, Wikimedia-Incident: investigate pc1008 for possible hardware issues / performance under high load - https://phabricator.wikimedia.org/T247787 (Marostegui) I will reimage pc1008 as Buster with 10.4 and next week I will repool it back after 3 weeks since this incident so the keys repli...
[08:25:55] (PS4) Elukey: autoinstall: fix kafka-jumbo.cfg for Buster [puppet] - https://gerrit.wikimedia.org/r/587560 (https://phabricator.wikimedia.org/T244506)
[08:27:16] (CR) Elukey: "Filippo: renamed the vg volumes and also allowed 20% of space free to future needs." 
[puppet] - https://gerrit.wikimedia.org/r/587560 (https://phabricator.wikimedia.org/T244506) (owner: Elukey)
[08:28:03] (PS1) Giuseppe Lavagetto: tlsproxy: allow capitalizing headers when connections are http/1.1 [puppet] - https://gerrit.wikimedia.org/r/587697 (https://phabricator.wikimedia.org/T249680)
[08:31:41] (CR) jerkins-bot: [V: -1] tlsproxy: allow capitalizing headers when connections are http/1.1 [puppet] - https://gerrit.wikimedia.org/r/587697 (https://phabricator.wikimedia.org/T249680) (owner: Giuseppe Lavagetto)
[08:36:15] (PS1) Dzahn: add install.wikimedia.org CNAME to install1003 [dns] - https://gerrit.wikimedia.org/r/587698 (https://phabricator.wikimedia.org/T224576)
[08:36:51] (CR) Filippo Giunchedi: [C: +1] "LGTM, thanks!" [puppet] - https://gerrit.wikimedia.org/r/587560 (https://phabricator.wikimedia.org/T244506) (owner: Elukey)
[08:41:26] (PS1) Dzahn: install_server: replace apt.wm.org with install.wm.org in autoinstall URLs [puppet] - https://gerrit.wikimedia.org/r/587699 (https://phabricator.wikimedia.org/T224576)
[08:42:38] (CR) Filippo Giunchedi: Add Thanos query (1 comment) [puppet] - https://gerrit.wikimedia.org/r/586314 (https://phabricator.wikimedia.org/T233956) (owner: Filippo Giunchedi)
[08:42:48] (PS3) Filippo Giunchedi: modules: add thanos-sidecar define and profile [puppet] - https://gerrit.wikimedia.org/r/586312 (https://phabricator.wikimedia.org/T233956)
[08:42:50] (PS3) Filippo Giunchedi: prometheus: add thanos-sidecar to prometheus@ops [puppet] - https://gerrit.wikimedia.org/r/586313 (https://phabricator.wikimedia.org/T233956)
[08:42:52] (PS4) Filippo Giunchedi: Add Thanos query [puppet] - https://gerrit.wikimedia.org/r/586314 (https://phabricator.wikimedia.org/T233956)
[08:42:54] (PS4) Filippo Giunchedi: prometheus: scrape thanos sidecar/query metrics [puppet] - https://gerrit.wikimedia.org/r/586315 (https://phabricator.wikimedia.org/T233956)
[08:42:57] 
(CR) Dzahn: "also we need to fix ferm rules to make this work in the first place.. another patch coming soon" [puppet] - https://gerrit.wikimedia.org/r/587699 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[08:47:18] PROBLEM - PHP opcache health on mw2372 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[08:50:21] (PS1) Dzahn: installserver: include preseed in apt role, not in light role [puppet] - https://gerrit.wikimedia.org/r/587701 (https://phabricator.wikimedia.org/T224576)
[08:51:14] (CR) Dzahn: "This is the alternative to the other 2 patches, include preseed profile where we already have a webserver setup." [puppet] - https://gerrit.wikimedia.org/r/587701 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[08:52:22] (CR) Dzahn: "..or ... we can do https://gerrit.wikimedia.org/r/c/operations/puppet/+/587701 and forget about changing these. Since we already have a w" [puppet] - https://gerrit.wikimedia.org/r/587699 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[08:52:40] (CR) Elukey: [C: +2] autoinstall: fix kafka-jumbo.cfg for Buster [puppet] - https://gerrit.wikimedia.org/r/587560 (https://phabricator.wikimedia.org/T244506) (owner: Elukey)
[08:57:41] Operations, ops-eqiad, Analytics, Patch-For-Review: (Need by: TBD) rack/setup/install kafka-jumbo100[789].eqiad.wmnet - https://phabricator.wikimedia.org/T244506 (elukey) Summary: - the partman recipe is fixed - 1009 seems good - 1007's mgmt is not reachable - 1008's mgmt works, but I can't pxe...
[08:58:06] PROBLEM - PHP opcache health on mw2361 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[09:04:13] (CR) Dzahn: [C: +2] installserver: include preseed in apt role, not in light role [puppet] - https://gerrit.wikimedia.org/r/587701 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[09:04:22] (PS2) Dzahn: installserver: include preseed in apt role, not in light role [puppet] - https://gerrit.wikimedia.org/r/587701 (https://phabricator.wikimedia.org/T224576)
[09:04:26] PROBLEM - PHP opcache health on mw2374 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[09:13:04] RECOVERY - PHP opcache health on mw2372 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[09:20:58] RECOVERY - PHP opcache health on mw2374 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[09:24:50] PROBLEM - PHP opcache health on mw2363 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[09:27:34] RECOVERY - PHP opcache health on mw2361 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[09:28:22] (PS2) Hnowlan: Enable TLS for kafka connections [deployment-charts] - https://gerrit.wikimedia.org/r/587573 (https://phabricator.wikimedia.org/T249644) (owner: Ppchelko)
[09:30:47] (PS1) Ayounsi: Offload traffic from eqiad's NTT [homer/public] - https://gerrit.wikimedia.org/r/587703 (https://phabricator.wikimedia.org/T249808)
[09:31:44] !log offload traffic from NTT eqiad - T249808
[09:31:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:23] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (hashar) > [ ] E: Unable to locate package zuul That one is intended...
[09:40:03] (PS1) Filippo Giunchedi: logstash: validate config files [puppet] - https://gerrit.wikimedia.org/r/587704 (https://phabricator.wikimedia.org/T221052)
[09:40:08] (PS1) Filippo Giunchedi: logstash: log safepoints only when running the daemon [puppet] - https://gerrit.wikimedia.org/r/587705 (https://phabricator.wikimedia.org/T221052)
[09:41:48] Operations, Traffic: varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809 (ema)
[09:41:55] Operations, Traffic: varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809 (ema) p:Triage→High
[09:43:59] Operations, Traffic: varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809 (ema)
[09:44:13] Operations, Traffic: cache_upload varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809 (ema)
[09:46:07] !log cp3051: disable transient storage limit and restart varnish-fe T249809
[09:46:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:13] T249809: cache_upload varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809
[09:49:02] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (hashar) https://gerrit.wikimedia.org/r/587706...
[09:51:10] PROBLEM - PHP opcache health on mw2370 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[09:51:26] (PS2) Ayounsi: Offload traffic from eqiad's NTT [homer/public] - https://gerrit.wikimedia.org/r/587703 (https://phabricator.wikimedia.org/T249808)
[09:52:56] (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1001/21806/" [puppet] - https://gerrit.wikimedia.org/r/587704 (https://phabricator.wikimedia.org/T221052) (owner: Filippo Giunchedi)
[09:54:39] Operations, MediaWiki-API, serviceops, Core Platform Team Workboards (External Code Reviews), MW-1.35-notes (1.35.0-wmf.27; 2020-04-07): CORS errors on commons on debug servers - https://phabricator.wikimedia.org/T249107 (Tgr) Open→Resolved a:Tgr The error can't be reproduced on C...
[09:57:42] (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1003/21807/" [puppet] - https://gerrit.wikimedia.org/r/587705 (https://phabricator.wikimedia.org/T221052) (owner: Filippo Giunchedi)
[10:02:10] RECOVERY - PHP opcache health on mw2370 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[10:03:14] RECOVERY - PHP opcache health on mw2363 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[10:05:05] (CR) Muehlenhoff: add install.wikimedia.org CNAME to install1003 (1 comment) [dns] - https://gerrit.wikimedia.org/r/587698 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[10:12:12] (CR) Giuseppe Lavagetto: "1 - I think you should read the puppet ca from disk, rather than pasting it in values.yaml" (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/587298 (https://phabricator.wikimedia.org/T249633) (owner: Hnowlan)
[10:17:50] (PS1) ArielGlenn: make sure machine vision 
dumps dir is created on all dumps servers [puppet] - https://gerrit.wikimedia.org/r/587709 (https://phabricator.wikimedia.org/T236431)
[10:19:59] (CR) ArielGlenn: [C: +2] make sure machine vision dumps dir is created on all dumps servers [puppet] - https://gerrit.wikimedia.org/r/587709 (https://phabricator.wikimedia.org/T236431) (owner: ArielGlenn)
[10:30:17] !log cp3051: re-enable transient storage limit, downgrade varnish to 5.1.3-1wm12 (no 0035-vbf_stp_condfetch_crash.patch) and restart varnish-fe T249809
[10:30:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:30:23] T249809: cache_upload varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809
[10:31:59] (PS4) Muehlenhoff: Setup idp-test2001 as IDP staging host [puppet] - https://gerrit.wikimedia.org/r/587429 (https://phabricator.wikimedia.org/T233930)
[10:40:50] (CR) Muehlenhoff: [C: +2] Setup idp-test2001 as IDP staging host [puppet] - https://gerrit.wikimedia.org/r/587429 (https://phabricator.wikimedia.org/T233930) (owner: Muehlenhoff)
[10:43:27] !log repool cp3051 T249809
[10:43:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:32] T249809: cache_upload varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809
[10:49:55] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[10:49:55] !log jmm@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[10:49:57] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[10:49:58] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[10:49:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:10] !log rolling upgrade to trafficserver 8.0.6-1mw7
[10:50:11] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:14] (Abandoned) Dzahn: install_server: replace apt.wm.org with install.wm.org in autoinstall URLs [puppet] - https://gerrit.wikimedia.org/r/587699 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[10:57:52] (CR) Dzahn: "I am close to abandoning it because https://gerrit.wikimedia.org/r/c/operations/puppet/+/587701 is the easier and better solution imho. On" [dns] - https://gerrit.wikimedia.org/r/587698 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[10:58:30] (Abandoned) Dzahn: add install.wikimedia.org CNAME to install1003 [dns] - https://gerrit.wikimedia.org/r/587698 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[10:59:34] (CR) Dzahn: "> Patch Set 14:" [puppet] - https://gerrit.wikimedia.org/r/587233 (https://phabricator.wikimedia.org/T238593) (owner: Dzahn)
[11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200409T1100).
[11:00:04] cormacparle and kart_: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:24] * kart_ is here
[11:02:16] Operations, netops, cloud-services-team (Kanban): CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606 (aborrero) Open→Stalled We are not planning on working on this anytime soon.
[11:03:00] * cormacparle__ here!
[11:03:07] (sorry I'm late)
[11:03:37] I'm first on the list - ok for me to go ahead?
[11:03:55] yes. 
Go ahead cormacparle__ [11:04:10] 👍 [11:05:22] PROBLEM - PHP opcache health on mw2355 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [11:06:39] o/ [11:07:37] (03CR) 10Cparle: [C: 03+2] Revert "Revert "Enable WikibaseQualityConstraints on commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585762 (owner: 10Cparle) [11:07:53] (03CR) 10jerkins-bot: [V: 04-1] Revert "Revert "Enable WikibaseQualityConstraints on commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585762 (owner: 10Cparle) [11:09:56] kart_: do you want to go ahead? I've got some merge conflicts that will take me a little while to fix ... [11:10:14] OK. Deploying my patch.. [11:10:45] (03PS2) 10KartikMistry: Enable ContentTranslation in Slovenian WP as a default tool [mediawiki-config] - 10https://gerrit.wikimedia.org/r/587257 (https://phabricator.wikimedia.org/T248836) [11:10:59] (03PS1) 10Muehlenhoff: Make totp profile parameters optional [puppet] - 10https://gerrit.wikimedia.org/r/587714 [11:12:31] (03CR) 10KartikMistry: [C: 03+2] "SWAT." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/587257 (https://phabricator.wikimedia.org/T248836) (owner: 10KartikMistry) [11:13:06] (03PS1) 10Arturo Borrero Gonzalez: toolforge: bastion: bump nproc limit to 250 [puppet] - 10https://gerrit.wikimedia.org/r/587715 (https://phabricator.wikimedia.org/T219070) [11:13:24] (03Merged) 10jenkins-bot: Enable ContentTranslation in Slovenian WP as a default tool [mediawiki-config] - 10https://gerrit.wikimedia.org/r/587257 (https://phabricator.wikimedia.org/T248836) (owner: 10KartikMistry) [11:14:59] (03PS2) 10Cparle: Revert "Revert "Enable WikibaseQualityConstraints on commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585762 [11:16:40] (03PS3) 10Cparle: Revert "Revert "Enable WikibaseQualityConstraints on commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585762 [11:17:37] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: bastion: bump nproc limit to 250 [puppet] - 10https://gerrit.wikimedia.org/r/587715 (https://phabricator.wikimedia.org/T219070) (owner: 10Arturo Borrero Gonzalez) [11:18:22] (03CR) 10Matthias Mullie: [C: 03+1] Revert "Revert "Enable WikibaseQualityConstraints on commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585762 (owner: 10Cparle) [11:18:40] kart_: ready to go now whenever you're done [11:18:58] cormacparle__: finishing 1st sync.. 
[11:19:16] !log kartik@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit|587257|Enable ContentTranslation as a default tool in Slovenian WP (T248836)]] (duration: 01m 07s) [11:19:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:22] T248836: Enable Content Translation in Slovenian Wikipedia as a default tool - https://phabricator.wikimedia.org/T248836 [11:20:43] !log kartik@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit|587257|Enable ContentTranslation as a default tool in Slovenian WP (T248836)]], take II (duration: 01m 06s) [11:20:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:01] cormacparle__: done. [11:21:09] kart_: thank you! [11:22:21] (03CR) 10Cparle: [C: 03+2] Revert "Revert "Enable WikibaseQualityConstraints on commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585762 (owner: 10Cparle) [11:23:15] (03Merged) 10jenkins-bot: Revert "Revert "Enable WikibaseQualityConstraints on commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585762 (owner: 10Cparle) [11:28:38] (03PS1) 10Cparle: Revert "Revert "Revert "Enable WikibaseQualityConstraints on commons""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/587718 [11:30:52] (03PS2) 10Cparle: Revert "Revert "Revert "Enable WikibaseQualityConstraints on commons""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/587718 [11:32:56] (03CR) 10Cparle: [C: 03+2] Revert "Revert "Revert "Enable WikibaseQualityConstraints on commons""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/587718 (owner: 10Cparle) [11:33:55] (03Merged) 10jenkins-bot: Revert "Revert "Revert "Enable WikibaseQualityConstraints on commons""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/587718 (owner: 10Cparle) [11:33:56] PROBLEM - PHP opcache health on mw2373 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [11:35:45] ok I'm done now too [11:45:44] RECOVERY - PHP opcache health on mw2355 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [11:51:01] Lucas_WMDE: we have something else we'd like to get out asap [11:51:04] https://commons.wikimedia.org/wiki/Special:SuggestedTags is broken [11:51:14] would it be ok to overrun the window a little? [11:51:39] (03CR) 10Dzahn: [C: 03+1] cescout: update metadb's data directory [puppet] - 10https://gerrit.wikimedia.org/r/587559 (owner: 10Ssingh) [11:53:00] (03CR) 10Ssingh: [C: 03+2] cescout: update metadb's data directory [puppet] - 10https://gerrit.wikimedia.org/r/587559 (owner: 10Ssingh) [11:53:24] RECOVERY - Rate of JVM GC Old generation-s runs - logstash1011-production-logstash-eqiad on logstash1011 is OK: (C)100 gt (W)80 gt 76.27 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1011&panelId=37 [11:56:02] (03CR) 1020after4: [C: 03+1] phabricator: remove firewall holes for port 80 from caches [puppet] - 10https://gerrit.wikimedia.org/r/569100 (owner: 10Dzahn) [11:57:00] !log offload more traffic from NTT eqiad - T249808 [11:57:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:13] (03PS3) 10Ayounsi: Offload traffic from eqiad's NTT [homer/public] - 10https://gerrit.wikimedia.org/r/587703 (https://phabricator.wikimedia.org/T249808) [12:02:23] 10Operations, 10SRE-Access-Requests, 10Developer-Advocacy (Apr-Jun 2020): Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (10Aklapper) I propose to resolve this task as I can access hive via SSH (and that was my goal). 
:) [12:03:16] 10Operations, 10SRE-Access-Requests, 10Developer-Advocacy (Apr-Jun 2020): Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (10Aklapper) @elukey: Regarding Superset only: >>! In T248905#6036328, @elukey wrote: >> ** The site constantly asks me to re-login after every ac... [12:03:23] we (cparle & me) are just waiting for another patch to pass CI and would like to deploy in ~half hour, if no-one objects [12:07:41] what patch is it? [12:08:54] (03PS1) 10Ssingh: cescout: additional changes for metadb sync [puppet] - 10https://gerrit.wikimedia.org/r/587723 (https://phabricator.wikimedia.org/T247273) [12:09:42] apergos: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MachineVision/+/587721 [12:10:24] RECOVERY - PHP opcache health on mw2373 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [12:11:20] apergos: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MachineVision/+/587721/ [12:11:40] ooh, sorry, didn't see matthiasmullie had already replied :) [12:11:51] not to be a dick about this, but is there a reason this isn't on the deployment calendar? i.e. is it a ubn? or can it wait for the next slot? [12:12:59] https://commons.wikimedia.org/wiki/Special:SuggestedTags#popular is broken [12:13:43] can you stick aroun to rollback immediately if there are issues? [12:13:46] *around [12:13:50] (03CR) 10jerkins-bot: [V: 04-1] cescout: additional changes for metadb sync [puppet] - 10https://gerrit.wikimedia.org/r/587723 (https://phabricator.wikimedia.org/T247273) (owner: 10Ssingh) [12:14:04] sure [12:14:20] I guess deploy to mwdeploy , check that it works fine there then sync on the rest ;) [12:14:31] all right, please let me know just before you deploy so I can be watching as well [12:16:59] hashar: re you giving the thumbs up from releng? 
I guess releng and sre should both give the ok, see https://wikitech.wikimedia.org/wiki/Deployments/Emergencies [12:16:59] it wasn't on the calendar because we only just discovered it this AM - if it must wait until next deployment slot we can live with it (but that time is very inconvenient for 2 euros, and tomorrow is friday, so would prefer now-ish :p) [12:17:54] but next slot works if too much trouble ;) [12:19:05] apergos: yeah +1 :) [12:19:22] I am very liberal with hotfix deploy [12:19:35] and that specific one seems really harmless ;) [12:19:47] I am happy to sign off on a task / change if needed [12:20:04] cool thanks :) [12:20:06] I don't think we need that level of formality [12:20:19] ace, thanks! [12:20:19] just that someone on both teams has given the go ahead [12:20:46] again, ping me before you push so I can be around [12:21:29] will do! [12:23:59] oh MachineVision is tested with Wikibase [12:24:48] hence the ~30min or so wait, a lot of tests to run :D [12:24:56] yeah :( [12:25:04] we could at least split the selenium tests to their own job [12:26:32] 10Operations, 10SRE-Access-Requests, 10Developer-Advocacy (Apr-Jun 2020): Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (10elukey) 05Open→03Resolved +1 for the new task!
[12:27:23] 10Operations, 10Traffic: cache_upload varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809 (10ema) [12:31:44] !log cp3051: upgrade varnish to 5.1.3-1wm13 once again, restart varnish-fe T249809 [12:31:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:31:50] T249809: cache_upload varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809 [12:31:53] cormacparle__: sorry, I wasn’t paying attention to IRC :( [12:32:15] that's grand - hashar and apergos say ok anyway :) [12:32:19] great :) [12:32:21] (03PS2) 10Ssingh: cescout: additional changes for metadb sync [puppet] - 10https://gerrit.wikimedia.org/r/587723 (https://phabricator.wikimedia.org/T247273) [12:34:16] 10Operations, 10Release-Engineering-Team-TODO, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Release-Engineering-Team (CI & Testing services): Rebuild helm/helm-diff for buster-wikimedia - https://phabricator.wikimedia.org/T249812 (10MoritzMuehlenhoff) [12:35:12] apergos: hashar: patch merged; ok to proceed with deployment? [12:35:28] I'm here, go ahead [12:40:13] seems to work just fine on mwdebug1001 - cparle testing to confirm [12:41:47] confirmed - will sync [12:43:38] !log mlitn@deploy1001 Synchronized php-1.35.0-wmf.27/extensions/MachineVision/: [MachineVision] Fix statement creation from suggestion (duration: 01m 09s) [12:43:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:42] all done! 
thanks for the flexibility ;) [12:45:52] (03PS1) 10Aklapper: phabricator weekly changes email: List projects with empty description [puppet] - 10https://gerrit.wikimedia.org/r/587725 (https://phabricator.wikimedia.org/T249805) [12:46:09] (03PS1) 10Elukey: admin: allow gpu-users to use radeontop [puppet] - 10https://gerrit.wikimedia.org/r/587726 [12:47:07] (03Abandoned) 10Elukey: Allow gpu-testers to run radeontop [puppet] - 10https://gerrit.wikimedia.org/r/518210 (https://phabricator.wikimedia.org/T220811) (owner: 10Muehlenhoff) [12:48:11] (03PS3) 10Ssingh: cescout: additional changes for metadb sync [puppet] - 10https://gerrit.wikimedia.org/r/587723 (https://phabricator.wikimedia.org/T247273) [12:50:03] there was a spike of 5xx errors which seems to have subsided, but keeping an eye on things nonetheless [12:51:40] RECOVERY - Maps - OSM synchronization lag - codfw on icinga1001 is OK: (C)2.592e+05 ge (W)1.764e+05 ge 1.183e+05 https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=12&fullscreen&orgId=1 [12:51:49] kk [12:52:44] apergos: LMK if the problem persists, I'm around! [12:53:02] thanks for staying available!
[13:05:57] things look ok, I'm going to stop watching the graphs now :-) [13:09:16] matthiasmullie: cormacparle__ congratulations :] [13:10:28] thanks [13:12:03] (03CR) 10Ottomata: Changeprop: add puppet CA cert to environment variables (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/587298 (https://phabricator.wikimedia.org/T249633) (owner: 10Hnowlan) [13:16:05] 10Operations, 10Maps, 10Toolforge, 10cloud-services-team (Kanban): maps: whitelist/reduce ratelimit from requests with toolforge.org referrer - https://phabricator.wikimedia.org/T249815 (10aborrero) [13:20:29] (03PS1) 10Urbanecm: whitelist ratelimit from requests with toolforge.org referrer [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) [13:21:48] (03PS1) 10Arturo Borrero Gonzalez: maps block: allow toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/587731 (https://phabricator.wikimedia.org/T249815) [13:23:06] (03Abandoned) 10Urbanecm: whitelist ratelimit from requests with toolforge.org referrer [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) (owner: 10Urbanecm) [13:23:38] (03PS2) 10Giuseppe Lavagetto: tlsproxy: allow capitalizing headers when connections are http/1.1 [puppet] - 10https://gerrit.wikimedia.org/r/587697 (https://phabricator.wikimedia.org/T249680) [13:23:40] (03PS1) 10Giuseppe Lavagetto: parsoid: allow retries for connection resets in envoy [puppet] - 10https://gerrit.wikimedia.org/r/587732 (https://phabricator.wikimedia.org/T249705) [13:23:42] (03CR) 10Hnowlan: [C: 03+2] Enable TLS for kafka connections [deployment-charts] - 10https://gerrit.wikimedia.org/r/587573 (https://phabricator.wikimedia.org/T249644) (owner: 10Ppchelko) [13:23:44] (03PS1) 10Giuseppe Lavagetto: tlsproxy: add the ability to define an idle timeout for upstream connections [puppet] - 10https://gerrit.wikimedia.org/r/587733 [13:23:46] (03PS1) 10Giuseppe Lavagetto: mediawiki: set idle timeout for TLS 
termination [puppet] - 10https://gerrit.wikimedia.org/r/587734 [13:23:48] (03Abandoned) 10Arturo Borrero Gonzalez: maps block: allow toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/587731 (https://phabricator.wikimedia.org/T249815) (owner: 10Arturo Borrero Gonzalez) [13:24:21] (03Restored) 10Urbanecm: whitelist ratelimit from requests with toolforge.org referrer [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) (owner: 10Urbanecm) [13:24:32] (03PS2) 10Urbanecm: maps: allow requests with toolforge.org as referrer [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) [13:24:52] (03Merged) 10jenkins-bot: Enable TLS for kafka connections [deployment-charts] - 10https://gerrit.wikimedia.org/r/587573 (https://phabricator.wikimedia.org/T249644) (owner: 10Ppchelko) [13:25:45] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] maps: allow requests with toolforge.org as referrer [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) (owner: 10Urbanecm) [13:27:00] (03CR) 10jerkins-bot: [V: 04-1] tlsproxy: allow capitalizing headers when connections are http/1.1 [puppet] - 10https://gerrit.wikimedia.org/r/587697 (https://phabricator.wikimedia.org/T249680) (owner: 10Giuseppe Lavagetto) [13:27:34] (03PS2) 10Aklapper: phabricator weekly changes email: List projects with empty description [puppet] - 10https://gerrit.wikimedia.org/r/587725 (https://phabricator.wikimedia.org/T249805) [13:28:07] (03CR) 10jerkins-bot: [V: 04-1] tlsproxy: add the ability to define an idle timeout for upstream connections [puppet] - 10https://gerrit.wikimedia.org/r/587733 (owner: 10Giuseppe Lavagetto) [13:29:30] (03PS3) 10Giuseppe Lavagetto: tlsproxy: allow capitalizing headers when connections are http/1.1 [puppet] - 10https://gerrit.wikimedia.org/r/587697 (https://phabricator.wikimedia.org/T249680) [13:31:18] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on 
namespace 'changeprop' for release 'staging' . [13:31:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:53] (03CR) 10Ssingh: "https://puppet-compiler.wmflabs.org/compiler1002/21811/cescout1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/587723 (https://phabricator.wikimedia.org/T247273) (owner: 10Ssingh) [13:32:57] (03CR) 10jerkins-bot: [V: 04-1] tlsproxy: allow capitalizing headers when connections are http/1.1 [puppet] - 10https://gerrit.wikimedia.org/r/587697 (https://phabricator.wikimedia.org/T249680) (owner: 10Giuseppe Lavagetto) [13:40:12] (03PS4) 10Giuseppe Lavagetto: tlsproxy: allow capitalizing headers when connections are http/1.1 [puppet] - 10https://gerrit.wikimedia.org/r/587697 (https://phabricator.wikimedia.org/T249680) [13:44:14] (03CR) 10CDanis: [C: 03+1] maps: allow requests with toolforge.org as referrer [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) (owner: 10Urbanecm) [13:45:15] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' . [13:45:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:20] (03PS2) 10Huji: Restore the 'reviewer' group for fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/587301 (https://phabricator.wikimedia.org/T249643) [13:47:29] (03CR) 10Huji: "Typo is fixed now." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/587301 (https://phabricator.wikimedia.org/T249643) (owner: 10Huji) [13:48:48] (03CR) 10Muehlenhoff: [C: 03+1] "Seems fine" [puppet] - 10https://gerrit.wikimedia.org/r/587726 (owner: 10Elukey) [13:50:11] 10Operations, 10Maps, 10Toolforge, 10Patch-For-Review, 10cloud-services-team (Kanban): maps: whitelist/reduce ratelimit from requests with toolforge.org referrer - https://phabricator.wikimedia.org/T249815 (10MoritzMuehlenhoff) p:05Triage→03High [13:52:06] (03PS4) 10Muehlenhoff: admin: Remove jpita account, only apply special privileges to Josepita [puppet] - 10https://gerrit.wikimedia.org/r/583720 (https://phabricator.wikimedia.org/T247722) (owner: 10Volans) [13:52:43] (03CR) 10Giuseppe Lavagetto: [C: 03+2] tlsproxy: allow capitalizing headers when connections are http/1.1 [puppet] - 10https://gerrit.wikimedia.org/r/587697 (https://phabricator.wikimedia.org/T249680) (owner: 10Giuseppe Lavagetto) [13:57:50] (03PS1) 10Andrew Bogott: wmf_sink: update conf settings [puppet] - 10https://gerrit.wikimedia.org/r/587741 (https://phabricator.wikimedia.org/T242766) [13:58:20] (03CR) 10Muehlenhoff: [C: 03+2] admin: Remove jpita account, only apply special privileges to Josepita [puppet] - 10https://gerrit.wikimedia.org/r/583720 (https://phabricator.wikimedia.org/T247722) (owner: 10Volans) [13:59:12] (03CR) 10Andrew Bogott: [C: 03+2] wmf_sink: update conf settings [puppet] - 10https://gerrit.wikimedia.org/r/587741 (https://phabricator.wikimedia.org/T242766) (owner: 10Andrew Bogott) [13:59:33] (03PS2) 10Muehlenhoff: admin: Complete remove all references to jpita [puppet] - 10https://gerrit.wikimedia.org/r/585437 (https://phabricator.wikimedia.org/T247722) (owner: 10Jcrespo) [14:03:53] (03CR) 10Alexandros Kosiaris: [C: 04-1] services_proxy: reduce the number of requests per connection (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/587637 (owner: 10Giuseppe Lavagetto) [14:04:48] (03CR) 10Muehlenhoff: [C: 03+2] admin: 
Complete remove all references to jpita [puppet] - 10https://gerrit.wikimedia.org/r/585437 (https://phabricator.wikimedia.org/T247722) (owner: 10Jcrespo) [14:04:54] (03CR) 10Andrew Bogott: [C: 03+1] "These all look right to me -- we'll want to keep a close eye on things after it merges though." [puppet] - 10https://gerrit.wikimedia.org/r/587556 (owner: 10Arturo Borrero Gonzalez) [14:10:20] 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: LDAP access to the wmf group for Pita - https://phabricator.wikimedia.org/T247722 (10MoritzMuehlenhoff) 05Open→03Resolved The old identity has been cleaned up, closing. [14:21:20] 10Operations, 10ops-codfw, 10DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (10Papaul) Server is still not about to read from partman recipe [14:22:02] (03PS1) 10Giuseppe Lavagetto: envoyproxy::tls_terminator: move option for non-SNI to the correct position [puppet] - 10https://gerrit.wikimedia.org/r/587773 (https://phabricator.wikimedia.org/T249680) [14:23:01] (03CR) 10Giuseppe Lavagetto: [V: 03+2 C: 03+2] envoyproxy::tls_terminator: move option for non-SNI to the correct position [puppet] - 10https://gerrit.wikimedia.org/r/587773 (https://phabricator.wikimedia.org/T249680) (owner: 10Giuseppe Lavagetto) [14:30:02] (03CR) 10Ayounsi: [C: 03+2] Offload traffic from eqiad's NTT [homer/public] - 10https://gerrit.wikimedia.org/r/587703 (https://phabricator.wikimedia.org/T249808) (owner: 10Ayounsi) [14:30:53] (03CR) 10RLazarus: [C: 03+1] maps: allow requests with toolforge.org as referrer [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) (owner: 10Urbanecm) [14:31:53] jouncebot: next [14:31:53] In 1 hour(s) and 28 minute(s): Puppet SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200409T1600) [14:32:03] (03PS12) 10Dzahn: zuul: provision the scap repository [puppet] - 
10https://gerrit.wikimedia.org/r/579587 (https://phabricator.wikimedia.org/T215458) (owner: 10Hashar) [14:32:48] !log disable down interfaces from fasw-c-codfw (mintaka) [14:32:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:01] Jeff_Green: ^ [14:33:18] fyi, interfaces down for more than 7 days alert [14:33:38] Xionox is there a way to ack the alert without disabling the interface? [14:34:26] I haven't put mintaka in decom yet because although we have the replacement built it's a stretch->buster change and I'm waiting for fr-tech to test and sign off on it. [14:34:55] Jeff_Green: do you have access to https://librenms.wikimedia.org/alerts ? [14:35:13] Jeff_Green: note that the interfaces can be re-enabled if the host is coming back online [14:36:26] XioNoX: ok [14:36:49] checking librems... [14:37:18] I do have access [14:37:55] Jeff_Green: can you click on the red icon in front of fasw-c-codfw? [14:37:58] Maybe we should sign me and Dallas up for alerts for the frack devices, that would help me remember to mute these things if we're working on them [14:38:22] (03CR) 10Giuseppe Lavagetto: [C: 03+1] maintenance: migrate cleanup_upload_stash to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/587327 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [14:38:24] I see only 9 entries and that one's not on the list [14:38:29] (03CR) 10Dzahn: [C: 03+2] zuul: provision the scap repository [puppet] - 10https://gerrit.wikimedia.org/r/579587 (https://phabricator.wikimedia.org/T215458) (owner: 10Hashar) [14:38:43] ah, it cleared since they got disabled [14:39:03] (03CR) 10Giuseppe Lavagetto: [C: 03+1] maintenance: Migrate parsercachepurging to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/587324 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [14:39:06] I see it under devices [14:39:19] I mean the alert itself [14:39:31] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on 
namespace 'changeprop' for release 'staging' . [14:39:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:41] yeah I understand, should I click on the asw one just to see what it looks like? [14:40:03] Jeff_Green: sure, it will ask for confirmation [14:40:06] ok [14:40:13] ok [14:40:27] is it possible to receive alerts for only specific devices? [14:40:35] librenms alerting is quite basic, duplicating alerting rules for frack might become hard to manage [14:40:55] (03PS1) 10Elukey: Absent spark refine jobs from an-coord1001 [puppet] - 10https://gerrit.wikimedia.org/r/587778 (https://phabricator.wikimedia.org/T249593) [14:40:57] (03PS1) 10Elukey: role::analytics_cluster::launcher: add Spark refine jobs [puppet] - 10https://gerrit.wikimedia.org/r/587779 (https://phabricator.wikimedia.org/T249593) [14:41:13] I would just add fr-tech-ops@ to the list of alert recipients for frack-relevant hosts [14:42:46] Jeff_Green: let me think if it can be done cleanly [14:43:02] ok [14:43:12] Jeff_Green: do you want the switch ports re-enabled now? [14:43:45] it's ok, I'm optimistic we won't need to put that box back online and will start decom within a week or so [14:43:56] trying to be strict on that otherwise it quickly becomes a mess :) [14:44:31] Cool, I like not causing messes! [14:45:16] PROBLEM - Keyholder SSH agent on deploy1001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder [14:45:21] haha thanks! [14:45:29] (03CR) 10jerkins-bot: [V: 04-1] Absent spark refine jobs from an-coord1001 [puppet] - 10https://gerrit.wikimedia.org/r/587778 (https://phabricator.wikimedia.org/T249593) (owner: 10Elukey) [14:46:02] is there a downtime equivalent in librenms? [14:46:19] i.e. if we're rebuilding a server to spare you alerts? 
[14:58:41] (03PS2) 10Elukey: Absent spark refine jobs from an-coord1001 [puppet] - 10https://gerrit.wikimedia.org/r/587778 (https://phabricator.wikimedia.org/T249593) [14:58:43] (03PS2) 10Elukey: role::analytics_cluster::launcher: add Spark refine jobs [puppet] - 10https://gerrit.wikimedia.org/r/587779 (https://phabricator.wikimedia.org/T249593) [14:58:59] (03CR) 10Hnowlan: [C: 03+2] Changeprop: Listen to mediawiki.page-suppress topic [deployment-charts] - 10https://gerrit.wikimedia.org/r/584672 (https://phabricator.wikimedia.org/T242025) (owner: 10Ppchelko) [15:03:21] (03PS1) 10Dzahn: zuul: fix dependency on /etc/zuul and package if on buster [puppet] - 10https://gerrit.wikimedia.org/r/587782 (https://phabricator.wikimedia.org/T224591) [15:04:51] (03PS1) 10Ema: Add 0036-VSV00004.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/587783 (https://phabricator.wikimedia.org/T249810) [15:07:18] (03PS3) 10Ppchelko: Changeprop: Listen to mediawiki.page-suppress topic [deployment-charts] - 10https://gerrit.wikimedia.org/r/584672 (https://phabricator.wikimedia.org/T242025) [15:07:40] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/587783 (https://phabricator.wikimedia.org/T249810) (owner: 10Ema) [15:08:50] (03PS2) 10Giuseppe Lavagetto: parsoid: allow retries for connection resets in envoy [puppet] - 10https://gerrit.wikimedia.org/r/587732 (https://phabricator.wikimedia.org/T249705) [15:10:36] (03CR) 10Ppchelko: [C: 03+2] "PS3 is a rebase." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/584672 (https://phabricator.wikimedia.org/T242025) (owner: 10Ppchelko) [15:10:57] (03Merged) 10jenkins-bot: Changeprop: Listen to mediawiki.page-suppress topic [deployment-charts] - 10https://gerrit.wikimedia.org/r/584672 (https://phabricator.wikimedia.org/T242025) (owner: 10Ppchelko) [15:14:26] (03PS2) 10Dzahn: zuul: fix dependency on /etc/zuul and package if on buster [puppet] - 10https://gerrit.wikimedia.org/r/587782 (https://phabricator.wikimedia.org/T224591) [15:14:49] (03PS1) 10Ayounsi: Logstash Junos PFE Firewall parsing, add PFE_FW_SYSLOG_ETH_IP6_TCP_UDP [puppet] - 10https://gerrit.wikimedia.org/r/587786 (https://phabricator.wikimedia.org/T244147) [15:16:46] (03CR) 10BryanDavis: maps: allow requests with toolforge.org as referrer (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) (owner: 10Urbanecm) [15:17:26] (03PS1) 10Giuseppe Lavagetto: services_proxy: retry on connect failures for parsoid [puppet] - 10https://gerrit.wikimedia.org/r/587787 (https://phabricator.wikimedia.org/T249705) [15:17:45] 10Operations, 10Maps, 10Toolforge, 10Patch-For-Review, 10cloud-services-team (Kanban): maps: whitelist/reduce ratelimit from requests with toolforge.org referrer - https://phabricator.wikimedia.org/T249815 (10bd808) I made a note on the proposed gerrit patch about adding wmcloud.org to the allow list now... 
[15:19:36] (03CR) 10Elukey: [C: 03+2] Absent spark refine jobs from an-coord1001 [puppet] - 10https://gerrit.wikimedia.org/r/587778 (https://phabricator.wikimedia.org/T249593) (owner: 10Elukey) [15:20:46] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1003/21818/" [puppet] - 10https://gerrit.wikimedia.org/r/587782 (https://phabricator.wikimedia.org/T224591) (owner: 10Dzahn) [15:20:52] 10Operations, 10CX-cxserver, 10Product-Infrastructure-Team-Backlog, 10Wikifeeds, and 4 others: service-runner apps (wikifeeds/cxserver at the least) running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459 (10Pginer-WMF) [15:22:42] (03PS3) 10Arturo Borrero Gonzalez: maps: allow requests with toolforge.org and wmcloud.org as referrer [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) (owner: 10Urbanecm) [15:22:53] (03CR) 10jerkins-bot: [V: 04-1] Add 0036-VSV00004.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/587783 (https://phabricator.wikimedia.org/T249810) (owner: 10Ema) [15:23:00] thanks arturo [15:23:17] 👍 [15:23:48] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] maps: allow requests with toolforge.org and wmcloud.org as referrer (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) (owner: 10Urbanecm) [15:24:57] (03CR) 10Elukey: [C: 03+2] role::analytics_cluster::launcher: add Spark refine jobs [puppet] - 10https://gerrit.wikimedia.org/r/587779 (https://phabricator.wikimedia.org/T249593) (owner: 10Elukey) [15:26:01] (03CR) 10BBlack: [C: 03+1] maps: allow requests with toolforge.org and wmcloud.org as referrer [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) (owner: 10Urbanecm) [15:26:41] (03CR) 10Hashar: [C: 03+1] "I like the intermediate variable ;) All that will be cleaned up as soon as both contint* machines are on Buster." 
[puppet] - 10https://gerrit.wikimedia.org/r/587782 (https://phabricator.wikimedia.org/T224591) (owner: 10Dzahn) [15:28:12] (03CR) 10Hashar: [C: 03+1] "Oh, the facts for contint2001 on the ppc hosts have to be refreshed I guess. They are most probably still having the ones from Jessie." [puppet] - 10https://gerrit.wikimedia.org/r/587782 (https://phabricator.wikimedia.org/T224591) (owner: 10Dzahn) [15:38:07] (03PS1) 10Elukey: role::analytics_cluster::launcher: use kerberos for data_check [puppet] - 10https://gerrit.wikimedia.org/r/587791 (https://phabricator.wikimedia.org/T249593) [15:41:26] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1003/21819/an-launcher1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/587791 (https://phabricator.wikimedia.org/T249593) (owner: 10Elukey) [15:41:28] (03CR) 10Elukey: [C: 03+2] role::analytics_cluster::launcher: use kerberos for data_check [puppet] - 10https://gerrit.wikimedia.org/r/587791 (https://phabricator.wikimedia.org/T249593) (owner: 10Elukey) [15:43:17] 10Operations, 10ops-codfw, 10fundraising-tech-ops: (Need by: TBD) codfw: rack/setup/install 3 new payments server for frack - https://phabricator.wikimedia.org/T244169 (10Jgreen) [15:46:20] 10Operations, 10Release-Engineering-Team-TODO, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (10thcipriani) >>! In T224591#6042651, @hashar wr... [15:49:52] (03PS1) 10Elukey: role::analytics_cluster::coordinator: absent camus jobs [puppet] - 10https://gerrit.wikimedia.org/r/587793 (https://phabricator.wikimedia.org/T249593) [15:51:45] 10Operations, 10SRE-Access-Requests: Requesting access to analytics for andrew-wmde - https://phabricator.wikimedia.org/T249733 (10Nuria) Approved on my end. 
[15:52:17] (03PS4) 10CDanis: maps: allow requests with toolforge.org and wmcloud.org as referrer [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) (owner: 10Urbanecm) [15:52:20] (03PS1) 10Cwhite: smart: make smart_data_dump importable for adding tests [puppet] - 10https://gerrit.wikimedia.org/r/587795 (https://phabricator.wikimedia.org/T199236) [15:54:03] (03PS2) 10Elukey: role::analytics_cluster::coordinator: absent camus jobs [puppet] - 10https://gerrit.wikimedia.org/r/587793 (https://phabricator.wikimedia.org/T249593) [15:54:05] (03PS1) 10Elukey: role::analytics_cluster::launcher: add Camus jobs [puppet] - 10https://gerrit.wikimedia.org/r/587796 (https://phabricator.wikimedia.org/T249593) [15:54:39] 10Operations, 10serviceops, 10Kubernetes: New Deployment charts should allow exposing services via TLS - https://phabricator.wikimedia.org/T236008 (10Joe) 05Open→03Resolved [15:54:43] 10Operations, 10serviceops, 10Kubernetes, 10Patch-For-Review: Add TLS termination to services running on kubernetes - https://phabricator.wikimedia.org/T235411 (10Joe) [15:55:05] 10Operations, 10Release-Engineering-Team, 10serviceops: Hundreds of tags for `wikimedia/mediawiki-core` image - https://phabricator.wikimedia.org/T242775 (10Joe) p:05High→03Low [15:56:51] (03PS5) 10CDanis: maps: allow requests with toolforge.org and wmcloud.org as referrer [puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) (owner: 10Urbanecm) [15:57:59] (03CR) 10CDanis: [C: 03+2] "Added VTC tests for the new whitelist entries; tests pass." 
[puppet] - 10https://gerrit.wikimedia.org/r/587730 (https://phabricator.wikimedia.org/T249815) (owner: 10Urbanecm) [15:59:55] (03CR) 10Elukey: [C: 03+2] role::analytics_cluster::coordinator: absent camus jobs [puppet] - 10https://gerrit.wikimedia.org/r/587793 (https://phabricator.wikimedia.org/T249593) (owner: 10Elukey) [16:00:04] godog and _joe_: Dear deployers, time to do the Puppet SWAT(Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200409T1600). [16:00:04] No GERRIT patches in the queue for this window AFAICS. [16:06:57] (03CR) 10Elukey: [C: 03+2] role::analytics_cluster::launcher: add Camus jobs [puppet] - 10https://gerrit.wikimedia.org/r/587796 (https://phabricator.wikimedia.org/T249593) (owner: 10Elukey) [16:10:55] (03CR) 10Dzahn: [C: 03+1] cescout: additional changes for metadb sync [puppet] - 10https://gerrit.wikimedia.org/r/587723 (https://phabricator.wikimedia.org/T247273) (owner: 10Ssingh) [16:13:19] (03PS1) 10Hnowlan: profile::kubernetes: add the puppet CA cert to general.yaml [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) [16:13:29] (03PS1) 10Jdlrobson: Drop unused config for main page css [mediawiki-config] - 10https://gerrit.wikimedia.org/r/587800 (https://phabricator.wikimedia.org/T243996) [16:15:43] (03CR) 10Bstorm: [C: 03+1] "So many of them 😲" [puppet] - 10https://gerrit.wikimedia.org/r/587556 (owner: 10Arturo Borrero Gonzalez) [16:15:51] (03CR) 10RLazarus: [C: 03+2] maintenance: migrate cleanup_upload_stash to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/587327 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [16:16:16] (03CR) 10RLazarus: [C: 03+2] maintenance: Migrate parsercachepurging to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/587324 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [16:17:49] (03PS2) 10Hnowlan: profile::kubernetes: add the puppet 
CA cert to general.yaml [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) [16:18:10] (03PS3) 10RLazarus: maintenance: Migrate parsercachepurging to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/587324 (https://phabricator.wikimedia.org/T211250) [16:19:33] (03CR) 10Ssingh: [C: 03+2] cescout: additional changes for metadb sync [puppet] - 10https://gerrit.wikimedia.org/r/587723 (https://phabricator.wikimedia.org/T247273) (owner: 10Ssingh) [16:21:01] sukhe: okay to merge yours? [16:21:05] rlazarus: yes, thank you! [16:22:26] ✅ [16:23:13] (03CR) 10RLazarus: [C: 03+2] maintenance: Migrate parsercachepurging to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/587324 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [16:25:24] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 7154 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [16:25:42] that's not great [16:25:50] nope [16:26:38] memcache traffic has been slightly elevated for a few minutes but doesn't look crazy https://grafana.wikimedia.org/d/000000316/memcache?orgId=1 [16:26:40] RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 46 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [16:27:12] (03PS1) 10Arturo Borrero Gonzalez: kubernetes: ingress: use HTTP 307 for canonical redirect [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/587807 (https://phabricator.wikimedia.org/T249843) [16:27:12] 🧐 [16:27:33] so from https://grafana.wikimedia.org/d/000000549/mcrouter?panelId=9&fullscreen&orgId=1&from=now-1h&to=now it seems that two hosts showed up problems [16:27:40] seems confirmed by logstash too [16:27:55] app server latency spiked 
but recovered [16:28:29] (03CR) 10Arturo Borrero Gonzalez: [C: 04-1] "untested patch!" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/587807 (https://phabricator.wikimedia.org/T249843) (owner: 10Arturo Borrero Gonzalez) [16:28:43] whatever this was, it sucked for like two minutes and then ended, I'd love to know why [16:29:01] (03CR) 10jerkins-bot: [V: 04-1] kubernetes: ingress: use HTTP 307 for canonical redirect [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/587807 (https://phabricator.wikimedia.org/T249843) (owner: 10Arturo Borrero Gonzalez) [16:29:08] rlazarus: I think it was bandwidth saturation for a mc shard [16:29:25] ohh I'd buy that [16:29:33] cc cdanis who wanted exactly that graph [16:29:52] so in the two nodes reporting tkos I see [16:29:54] ProxyDestination.cpp:453] 10.64.0.84:11211 unmarked TKO. Total hard TKOs: 0; soft TKOs: 0. Reply: mc_res_ok [16:29:57] that is mc1023 [16:30:24] (03PS3) 10Hnowlan: profile::kubernetes: add the puppet CA cert to general.yaml [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) [16:30:36] 10Operations, 10LDAP-Access-Requests: LDAP access to the wmf group for Matthew Williams - https://phabricator.wikimedia.org/T249844 (10mwilliams) [16:33:52] (03PS2) 10Arturo Borrero Gonzalez: kubernetes: ingress: use HTTP 307 for canonical redirect [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/587807 (https://phabricator.wikimedia.org/T249843) [16:33:57] prometheus shows mc1023 momentarily at 68.8mbyte/sec ~= 0.55 gigabit/second, but of course that's an average over about a minute [16:34:49] yeah [16:35:05] there was a peak of GETs at the same time, slab 98 [16:35:11] 10Operations, 10OpenRefine, 10Traffic, 10serviceops, and 2 others: Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (10Joe) 05Open→03Resolved a:03Joe From my tests, now we get all cookies correctly set: ` $
curl --http1.1 -sIL... [16:35:26] it looks like the tx bandwidth saturated [16:36:00] yeah I'd believe it [16:36:17] elukey: I'm hoping to have some proper monitoring for brief saturation events by next week btw [16:36:43] 10Operations, 10OpenRefine, 10Traffic, 10serviceops, and 2 others: Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (10Joe) @Pintoch the behaviour should be restored now. Can you confirm old versions of OpenRefine work correctly now? [16:37:09] cdanis: <3 [16:37:14] if you need help let me know [16:37:30] haha, if you're interested in code reviewing some python ;) [16:38:04] I'm always interested in code reviewing some python [16:38:09] (03CR) 10Bstorm: "This seems like a really good idea. Have we tried it live?" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/587807 (https://phabricator.wikimedia.org/T249843) (owner: 10Arturo Borrero Gonzalez) [16:38:16] cdanis: count me in! [16:38:20] ok cool [16:44:08] (03PS1) 10Cwhite: smart: add _check_output wrapper method and tests [puppet] - 10https://gerrit.wikimedia.org/r/587811 (https://phabricator.wikimedia.org/T199236) [16:44:17] (03CR) 10Hnowlan: "pcc run here: https://puppet-compiler.wmflabs.org/compiler1002/21826/deploy1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/587799 (https://phabricator.wikimedia.org/T249633) (owner: 10Hnowlan) [16:45:45] (03CR) 10jerkins-bot: [V: 04-1] smart: add _check_output wrapper method and tests [puppet] - 10https://gerrit.wikimedia.org/r/587811 (https://phabricator.wikimedia.org/T199236) (owner: 10Cwhite) [16:56:43] 10Operations, 10OpenRefine, 10Traffic, 10serviceops, and 2 others: Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (10Pintoch) Thanks a million, this is very kind of you! I can confirm this works, edits are coming through again fro... 
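The point made above about mc1023 (68.8 MB/s is roughly 0.55 Gbit/s, but only as an average over about a minute) can be illustrated with a small sketch. Everything here beyond the 68.8 MB/s figure is an assumption for illustration: the 1 Gbit/s link speed is not stated in the log, and the per-second samples are hypothetical, chosen only so that they average to the reported value.

```python
# Illustrative sketch only: LINK_GBIT and the per-second samples are assumptions,
# not values taken from Prometheus or from the hosts discussed in the log.

LINK_GBIT = 1.0  # assumed NIC speed in Gbit/s (not stated in the log)


def mbytes_to_gbits(mbytes_per_sec):
    """Convert MB/s to Gbit/s using decimal units (1 MB = 8 Mbit, 1000 Mbit = 1 Gbit)."""
    return mbytes_per_sec * 8 / 1000


def mean(samples):
    """Arithmetic mean of a non-empty list of samples."""
    return sum(samples) / len(samples)


# Hypothetical per-second tx rates in MB/s:
# 20 seconds pinned at the assumed line rate, then 40 seconds of quieter traffic.
line_rate_mb = LINK_GBIT * 1000 / 8  # 125 MB/s for a 1 Gbit/s link
samples = [line_rate_mb] * 20 + [40.7] * 40

avg_mb = mean(samples)
peak_mb = max(samples)

print(f"one-minute average: {avg_mb:.1f} MB/s = {mbytes_to_gbits(avg_mb):.2f} Gbit/s")
print(f"peak second:        {peak_mb:.1f} MB/s = {mbytes_to_gbits(peak_mb):.2f} Gbit/s")
```

Under these assumed numbers the one-minute average sits near 0.55 Gbit/s even though the link was full for a third of the window, which is why a coarse average alone cannot rule out brief saturation.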
[16:57:40] (03PS1) 10Hnowlan: changeprop: Use TLS port for Kafka [deployment-charts] - 10https://gerrit.wikimedia.org/r/587812 (https://phabricator.wikimedia.org/T249644) [16:58:02] (03PS1) 10Elukey: profile::analytics::refinery::job::refine: add failed flags checker [puppet] - 10https://gerrit.wikimedia.org/r/587813 (https://phabricator.wikimedia.org/T240230) [17:00:04] halfak and accraze: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Graphoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200409T1700). [17:00:34] (03PS2) 10Cwhite: smart: add _check_output wrapper method and tests [puppet] - 10https://gerrit.wikimedia.org/r/587811 (https://phabricator.wikimedia.org/T199236) [17:02:52] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1003/21827/an-launcher1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/587813 (https://phabricator.wikimedia.org/T240230) (owner: 10Elukey) [17:03:15] (03PS1) 10Mforns: analytics::refinery::eventlogging-saltrotate: Move state to HDFS [puppet] - 10https://gerrit.wikimedia.org/r/587815 [17:04:01] (03CR) 10Ppchelko: [C: 03+2] changeprop: Use TLS port for Kafka [deployment-charts] - 10https://gerrit.wikimedia.org/r/587812 (https://phabricator.wikimedia.org/T249644) (owner: 10Hnowlan) [17:05:08] 10Operations, 10MediaWiki-Parser, 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team), and 2 others: API action=parse should be poolcounter-limited if a re-parse is necessary - https://phabricator.wikimedia.org/T243803 (10Anomie) Next step is to determine the limits and implement them for Wiki... 
[17:07:29] (03PS1) 10Cwhite: abstract parsing from data gathering and add tests [puppet] - 10https://gerrit.wikimedia.org/r/587816 (https://phabricator.wikimedia.org/T199236) [17:07:48] 10Operations, 10Analytics, 10Analytics-Kanban, 10User-Elukey: Refactor Analytics POSIX groups in puppet to improve maintainability - https://phabricator.wikimedia.org/T246578 (10Nuria) 05Open→03Resolved [17:08:21] (03PS2) 10Cwhite: smart: abstract parsing from data gathering and add tests [puppet] - 10https://gerrit.wikimedia.org/r/587816 (https://phabricator.wikimedia.org/T199236) [17:08:56] (03CR) 10Elukey: analytics::refinery::eventlogging-saltrotate: Move state to HDFS (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/587815 (owner: 10Mforns) [17:09:24] (03CR) 10Mforns: "Now I'm thinking this still assumes the /etc/refinery/salts directory is present. Should we add a mkdir -p?" [puppet] - 10https://gerrit.wikimedia.org/r/587815 (owner: 10Mforns) [17:09:32] (03CR) 10jerkins-bot: [V: 04-1] smart: abstract parsing from data gathering and add tests [puppet] - 10https://gerrit.wikimedia.org/r/587816 (https://phabricator.wikimedia.org/T199236) (owner: 10Cwhite) [17:10:40] (03PS2) 10Hnowlan: changeprop: Use TLS port for Kafka [deployment-charts] - 10https://gerrit.wikimedia.org/r/587812 (https://phabricator.wikimedia.org/T249644) [17:11:03] (03CR) 10Hnowlan: [V: 03+2] changeprop: Use TLS port for Kafka [deployment-charts] - 10https://gerrit.wikimedia.org/r/587812 (https://phabricator.wikimedia.org/T249644) (owner: 10Hnowlan) [17:13:58] PROBLEM - Check systemd state on contint2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:14:04] (03CR) 10Hnowlan: changeprop: Use TLS port for Kafka [deployment-charts] - 10https://gerrit.wikimedia.org/r/587812 (https://phabricator.wikimedia.org/T249644) (owner: 10Hnowlan) [17:14:25] (03CR) 10Hnowlan: [V: 03+2 C: 03+2] changeprop: Use TLS port for Kafka [deployment-charts] - 10https://gerrit.wikimedia.org/r/587812 (https://phabricator.wikimedia.org/T249644) (owner: 10Hnowlan) [17:14:30] PROBLEM - zuul_merger_service_running on contint2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger https://www.mediawiki.org/wiki/Continuous_integration/Zuul [17:14:32] PROBLEM - git_daemon_running on contint2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/git-core/git-daemon --syslog https://www.mediawiki.org/wiki/Continuous_integration/Zuul [17:14:52] (03Merged) 10jenkins-bot: changeprop: Use TLS port for Kafka [deployment-charts] - 10https://gerrit.wikimedia.org/r/587812 (https://phabricator.wikimedia.org/T249644) (owner: 10Hnowlan) [17:15:09] mutante: expected? ^ [17:18:44] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' . [17:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:24:40] !log hnowlan@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' . 
[17:24:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:31:39] (03PS3) 10DCausse: [mwgrep] only query live indices [puppet] - 10https://gerrit.wikimedia.org/r/587586 (https://phabricator.wikimedia.org/T249435) [17:31:59] (03CR) 10DCausse: [mwgrep] only query live indices (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/587586 (https://phabricator.wikimedia.org/T249435) (owner: 10DCausse) [17:33:03] (03CR) 10jerkins-bot: [V: 04-1] [mwgrep] only query live indices [puppet] - 10https://gerrit.wikimedia.org/r/587586 (https://phabricator.wikimedia.org/T249435) (owner: 10DCausse) [17:38:16] (03PS3) 10Cwhite: smart: abstract parsing from data gathering and add tests [puppet] - 10https://gerrit.wikimedia.org/r/587816 (https://phabricator.wikimedia.org/T199236) [17:38:54] (03PS4) 10DCausse: [mwgrep] only query live indices [puppet] - 10https://gerrit.wikimedia.org/r/587586 (https://phabricator.wikimedia.org/T249435) [17:40:14] (03CR) 10jerkins-bot: [V: 04-1] [mwgrep] only query live indices [puppet] - 10https://gerrit.wikimedia.org/r/587586 (https://phabricator.wikimedia.org/T249435) (owner: 10DCausse) [17:40:30] !log hnowlan@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' . 
[17:40:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:42:48] (03PS5) 10DCausse: [mwgrep] only query live indices [puppet] - 10https://gerrit.wikimedia.org/r/587586 (https://phabricator.wikimedia.org/T249435) [17:47:59] (03PS2) 10Mforns: analytics::refinery::eventlogging-saltrotate: Bootstrap salts [puppet] - 10https://gerrit.wikimedia.org/r/587815 [17:49:54] rlazarus: yes that it is broken, no that icinga tells us about it [17:50:09] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [17:50:10] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime [17:50:11] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [17:50:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:50:18] 10Operations, 10Release-Engineering-Team-TODO, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (10ops-monitoring-bot) Icinga downtime for 1 day,... [17:50:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:50:21] rlazarus: re-downtimed. thanks [17:50:25] 10Operations, 10CX-cxserver, 10Product-Infrastructure-Team-Backlog, 10Wikifeeds, and 4 others: service-runner apps (wikifeeds/cxserver at the least) running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459 (10bearND) @Pchelolo Do we need to add `named_levels: true` to t...
[17:50:49] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime [17:50:50] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [17:50:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:50:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:50:58] 10Operations, 10Release-Engineering-Team-TODO, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (10ops-monitoring-bot) Icinga downtime for 4 days... [17:51:11] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid [17:52:08] 10Operations, 10CX-cxserver, 10Product-Infrastructure-Team-Backlog, 10Wikifeeds, and 4 others: service-runner apps (wikifeeds/cxserver at the least) running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459 (10Pchelolo) This is only needed for k8s, not for scap deploys.... [17:53:51] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [17:54:16] mutante: 👍 [17:54:47] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [17:56:55] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid [17:58:41] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [18:00:04] RoanKattouw, Niharika, and Urbanecm: Dear deployers, time to do the Morning SWAT(Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200409T1800). [18:00:05] Jdlrobson: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:05:20] 10Operations, 10CX-cxserver, 10Product-Infrastructure-Team-Backlog, 10Wikifeeds, and 4 others: service-runner apps (wikifeeds/cxserver at the least) running on kubernetes emit logs with log level 50 - https://phabricator.wikimedia.org/T239459 (10bearND) Good to know. So, no need to make changes to the depl... [18:11:16] here if anyone is able to deploy.. Urbanecm RoanKattouw niedzielski ? [18:11:19] Niharika: sorry :) [18:12:55] Jdlrobson: I can SWAT for you! [18:13:38] but not https://gerrit.wikimedia.org/r/c/587801/, that's a patch in master and needs to be code reviewed before it can be SWATted [18:17:19] Jdlrobson: if you want, I can do the second patch through [18:19:06] We wouldn't swat something like that anyway... [18:20:41] Yeah, that's not worth risking the site stability for. [18:20:54] Let it ride the train. 
[18:28:34] 10Operations, 10ops-eqiad, 10SRE-swift-storage: ms-be1023 crashed / Smart Storage Battery failure - https://phabricator.wikimedia.org/T249174 (10wiki_willy) a:03Jclark-ctr [18:29:25] 10Operations, 10ops-eqiad, 10DC-Ops: ganeti1011.mgmt is un-configured (was: Puppet resolves wrong IP for Icinga host config) - https://phabricator.wikimedia.org/T249314 (10wiki_willy) a:03Cmjohnson [18:33:41] (03CR) 10Herron: [C: 03+1] "Thanks for this, it will be a definite improvement!" [puppet] - 10https://gerrit.wikimedia.org/r/587704 (https://phabricator.wikimedia.org/T221052) (owner: 10Filippo Giunchedi) [18:35:16] (03PS4) 10Andrew Bogott: OpenStack designate: upgrade eqiad1 to version Rocky [puppet] - 10https://gerrit.wikimedia.org/r/587522 (https://phabricator.wikimedia.org/T248635) [18:43:45] (03CR) 10Herron: [C: 03+1] "LGTM, one optional minor comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/587705 (https://phabricator.wikimedia.org/T221052) (owner: 10Filippo Giunchedi) [18:44:02] (03PS1) 10Mholloway: Update multiple services' stdout logging to use named_levels [deployment-charts] - 10https://gerrit.wikimedia.org/r/587845 (https://phabricator.wikimedia.org/T239459) [18:52:02] Urbanecm: sorry something came up at home. Looks like i linked to the wrong thing too [18:52:08] i'll pick this up in the later swat window [18:52:34] sorry for the confusion Reedy James_F [18:52:42] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/587800 was supposed to be the sWAT link not the mobilefrontend thing :) [18:53:28] 10Operations, 10ops-eqiad, 10SRE-swift-storage: ms-be1023 crashed / Smart Storage Battery failure - https://phabricator.wikimedia.org/T249174 (10Jclark-ctr) @fgiunchedi @Volans i will be on site 4/14/2020. at 10am Est we have limited time on site can we schedule this? 
[18:53:49] (have updated swat calendar) [19:00:04] longma and James_F: I, the Bot under the Fountain, allow thee, The Deployer, to do Mediawiki train - American Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200409T1900). [19:00:22] * James_F waves. [19:01:04] !log deploying 1.35.0-wmf.27 to all wikis [19:01:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:01:24] (03CR) 10Ayounsi: [C: 03+1] logstash: validate config files [puppet] - 10https://gerrit.wikimedia.org/r/587704 (https://phabricator.wikimedia.org/T221052) (owner: 10Filippo Giunchedi) [19:04:31] James_F: looks like the patch has merged so I will proceed to sync [19:04:43] Cool. [19:04:54] wikibugs having issues? [19:05:22] not sure about wikibugs but I see a failure in one of the tests [19:05:23] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/587847 [19:05:49] That's fine, it's just saying that you didn't touch IS. [19:05:51] it says no change detected, so maybe fine? [19:05:53] Which is what we wanted. [19:06:02] 👍 [19:06:12] But for 90% of patches to that repo, people want to actually change config of a given wiki. [19:06:59] (03PS1) 10Jeena Huneidi: all wikis to 1.35.0-wmf.27 refs T247774 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/587847 [19:07:12] (03CR) 10Jeena Huneidi: [C: 03+2] all wikis to 1.35.0-wmf.27 refs T247774 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/587847 (owner: 10Jeena Huneidi) [19:07:18] (03PS9) 10Mholloway: Add recommendation-api chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: 10Bmansurov) [19:07:19] Helpful. [19:07:23] haha [19:07:57] (03CR) 10Mholloway: "Rebased and updated the stdout logging config per T239459."
[deployment-charts] - 10https://gerrit.wikimedia.org/r/565788 (https://phabricator.wikimedia.org/T241230) (owner: 10Bmansurov) [19:08:10] !log jhuneidi@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.27 refs T247774 [19:08:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:08:15] T247774: 1.35.0-wmf.27 deployment blockers - https://phabricator.wikimedia.org/T247774 [19:08:43] Quiet so far. [19:08:51] (03Merged) 10jenkins-bot: all wikis to 1.35.0-wmf.27 refs T247774 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/587847 (owner: 10Jeena Huneidi) [19:08:54] yeah [19:09:18] LGTM. [19:09:33] cool, thanks for your assistance! [19:17:10] 10Operations, 10ops-eqiad: msw-a2-eqiad missing from Netbox - https://phabricator.wikimedia.org/T249685 (10wiki_willy) a:03Jclark-ctr [19:19:38] (03PS1) 10Ssingh: aptrepo: update Postgres shell hook [puppet] - 10https://gerrit.wikimedia.org/r/587850 [19:23:06] (03CR) 10BearND: [C: 03+1] Update multiple services' stdout logging to use named_levels [deployment-charts] - 10https://gerrit.wikimedia.org/r/587845 (https://phabricator.wikimedia.org/T239459) (owner: 10Mholloway) [19:23:49] ACKNOWLEDGEMENT - Check systemd state on contint2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. amusso Being dist upgraded. T224591 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:23:49] ACKNOWLEDGEMENT - git_daemon_running on contint2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/git-core/git-daemon --syslog amusso Being dist upgraded. T224591 https://www.mediawiki.org/wiki/Continuous_integration/Zuul [19:23:49] ACKNOWLEDGEMENT - zuul_merger_service_running on contint2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger amusso Being dist upgraded. 
T224591 https://www.mediawiki.org/wiki/Continuous_integration/Zuul [19:25:15] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/587850 (owner: 10Ssingh) [19:26:30] (03CR) 10Ssingh: [C: 03+2] aptrepo: update Postgres shell hook [puppet] - 10https://gerrit.wikimedia.org/r/587850 (owner: 10Ssingh) [19:27:12] (03CR) 10Ppchelko: "Do we need to update the versions of the images as well in the same commit? The changes on it's own will not work unless a new image is pu" [deployment-charts] - 10https://gerrit.wikimedia.org/r/587845 (https://phabricator.wikimedia.org/T239459) (owner: 10Mholloway) [19:30:27] (03CR) 10Mholloway: "mobileapps and chromium render aren't actually deployed on the pipeline yet. i i'll add a follow-up patch to update the deploy image for w" [deployment-charts] - 10https://gerrit.wikimedia.org/r/587845 (https://phabricator.wikimedia.org/T239459) (owner: 10Mholloway) [19:33:42] (03PS1) 10Mholloway: Bump wikifeeds to 2020-04-09-185833-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/587854 (https://phabricator.wikimedia.org/T239459) [19:35:00] (03CR) 10Ppchelko: [C: 03+2] Bump wikifeeds to 2020-04-09-185833-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/587854 (https://phabricator.wikimedia.org/T239459) (owner: 10Mholloway) [19:38:15] (03CR) 10Mholloway: [C: 03+2] Update multiple services' stdout logging to use named_levels [deployment-charts] - 10https://gerrit.wikimedia.org/r/587845 (https://phabricator.wikimedia.org/T239459) (owner: 10Mholloway) [19:38:31] (03Merged) 10jenkins-bot: Update multiple services' stdout logging to use named_levels [deployment-charts] - 10https://gerrit.wikimedia.org/r/587845 (https://phabricator.wikimedia.org/T239459) (owner: 10Mholloway) [19:39:11] !log mholloway-shell@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' . 
[19:39:12] (03Merged) 10jenkins-bot: Bump wikifeeds to 2020-04-09-185833-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/587854 (https://phabricator.wikimedia.org/T239459) (owner: 10Mholloway) [19:39:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:41:11] !log mholloway-shell@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' . [19:41:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:43:43] !log mholloway-shell@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' . [19:43:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:16] (03CR) 10Andrew Bogott: [C: 03+2] OpenStack designate: upgrade eqiad1 to version Rocky [puppet] - 10https://gerrit.wikimedia.org/r/587522 (https://phabricator.wikimedia.org/T248635) (owner: 10Andrew Bogott) [20:08:32] 10Operations, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users and WMF/NDA restricted tickets for Jim Maddock - https://phabricator.wikimedia.org/T249873 (10jmads) [20:32:26] (03PS1) 10Hashar: ci: remove blubber Debian package [puppet] - 10https://gerrit.wikimedia.org/r/587862 (https://phabricator.wikimedia.org/T224591) [20:33:06] (03PS2) 10Hashar: ci: remove blubber Debian package [puppet] - 10https://gerrit.wikimedia.org/r/587862 (https://phabricator.wikimedia.org/T224591) [20:35:33] (03CR) 10Thcipriani: [C: 03+1] "As indicated in commit message, pipeline uses blubberoid now." [puppet] - 10https://gerrit.wikimedia.org/r/587862 (https://phabricator.wikimedia.org/T224591) (owner: 10Hashar) [20:40:41] 10Operations, 10Performance-Team: Occasional NIC Tx bandwidth saturation for mc1027 - https://phabricator.wikimedia.org/T248962 (10aaron) >>! In T248962#6038821, @elukey wrote: > @aaron one thing that it would be useful is, in my opinion, having instrumentation in MediaWiki about key size volume/bytes. 
Even pe... [20:47:37] (03PS1) 10Andrew Bogott: Openstack Designate: fix an encoding issue in the memcached tooz driver [puppet] - 10https://gerrit.wikimedia.org/r/587864 (https://phabricator.wikimedia.org/T248635) [20:52:59] (03PS2) 10Andrew Bogott: Openstack Designate: fix an encoding issue in the memcached tooz driver [puppet] - 10https://gerrit.wikimedia.org/r/587864 (https://phabricator.wikimedia.org/T248635) [20:55:30] (03CR) 10Andrew Bogott: [C: 03+2] Openstack Designate: fix an encoding issue in the memcached tooz driver [puppet] - 10https://gerrit.wikimedia.org/r/587864 (https://phabricator.wikimedia.org/T248635) (owner: 10Andrew Bogott) [20:58:53] (03PS1) 10Andrew Bogott: Openstack tooz hack: don't explicitly depend on python3-tooz [puppet] - 10https://gerrit.wikimedia.org/r/587866 (https://phabricator.wikimedia.org/T248635) [20:59:56] (03CR) 10jerkins-bot: [V: 04-1] Openstack tooz hack: don't explicitly depend on python3-tooz [puppet] - 10https://gerrit.wikimedia.org/r/587866 (https://phabricator.wikimedia.org/T248635) (owner: 10Andrew Bogott) [21:01:16] (03PS2) 10Andrew Bogott: Openstack tooz hack: don't explicitly depend on python3-tooz [puppet] - 10https://gerrit.wikimedia.org/r/587866 (https://phabricator.wikimedia.org/T248635) [21:02:00] (03CR) 10jerkins-bot: [V: 04-1] Openstack tooz hack: don't explicitly depend on python3-tooz [puppet] - 10https://gerrit.wikimedia.org/r/587866 (https://phabricator.wikimedia.org/T248635) (owner: 10Andrew Bogott) [21:03:07] (03PS3) 10Andrew Bogott: Openstack tooz hack: don't explicitly depend on python3-tooz [puppet] - 10https://gerrit.wikimedia.org/r/587866 (https://phabricator.wikimedia.org/T248635) [21:04:31] (03CR) 10Andrew Bogott: [C: 03+2] Openstack tooz hack: don't explicitly depend on python3-tooz [puppet] - 10https://gerrit.wikimedia.org/r/587866 (https://phabricator.wikimedia.org/T248635) (owner: 10Andrew Bogott) [21:09:27] (03PS1) 10Mholloway: Release wikifeeds 0.0.10 [deployment-charts] - 
https://gerrit.wikimedia.org/r/587868 (https://phabricator.wikimedia.org/T239459)
[21:13:25] (PS2) Mholloway: Release new charts for wikifeeds, mobileapps, chromium-render [deployment-charts] - https://gerrit.wikimedia.org/r/587868 (https://phabricator.wikimedia.org/T239459)
[21:14:39] (PS3) Mholloway: Release new charts for wikifeeds, mobileapps, chromium-render [deployment-charts] - https://gerrit.wikimedia.org/r/587868 (https://phabricator.wikimedia.org/T239459)
[21:24:39] PROBLEM - PHP opcache health on mw2321 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[21:29:15] Operations: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (MoritzMuehlenhoff)
[21:29:26] Operations: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (MoritzMuehlenhoff) Open→Resolved This is done!
[21:33:45] RECOVERY - PHP opcache health on mw2321 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[22:01:41] !log running initial metadb sync on cescout1001
[22:01:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:03:26] (PS1) Rxy: Add extendedconfirmed group and protection level at jawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/587874 (https://phabricator.wikimedia.org/T249820)
[22:04:48] (CR) jerkins-bot: [V: -1] Add extendedconfirmed group and protection level at jawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/587874 (https://phabricator.wikimedia.org/T249820) (owner: Rxy)
[22:06:36] (PS2) Rxy: Add extendedconfirmed group and protection level at jawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/587874 (https://phabricator.wikimedia.org/T249820)
[22:07:08] (CR) DannyS712: Add extendedconfirmed group and protection level at jawiki (1 comment) [mediawiki-config] -
https://gerrit.wikimedia.org/r/587874 (https://phabricator.wikimedia.org/T249820) (owner: Rxy)
[22:07:27] (CR) DannyS712: Add extendedconfirmed group and protection level at jawiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/587874 (https://phabricator.wikimedia.org/T249820) (owner: Rxy)
[22:18:23] jouncebot: next
[22:18:23] In 0 hour(s) and 41 minute(s): Evening SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200409T2300)
[22:19:54] (PS1) Cwhite: smart: add tests for _parse_smart_info and _parse_smart_attributes [puppet] - https://gerrit.wikimedia.org/r/587877 (https://phabricator.wikimedia.org/T199236)
[22:40:35] (CR) Cwhite: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/587704 (https://phabricator.wikimedia.org/T221052) (owner: Filippo Giunchedi)
[22:41:49] (CR) Cwhite: [C: +1] Add Thanos query [puppet] - https://gerrit.wikimedia.org/r/586314 (https://phabricator.wikimedia.org/T233956) (owner: Filippo Giunchedi)
[23:00:04] RoanKattouw, Niharika, and Urbanecm: Time to snap out of that daydream and deploy Evening SWAT(Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200409T2300).
[23:00:05] Jdlrobson and rxy: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:09] o/
[23:00:12] o/
[23:01:07] I can do the SWAT
[23:04:40] thanks RoanKattouw
[23:07:01] (CR) Catrope: [C: +2] Add extendedconfirmed group and protection level at jawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/587874 (https://phabricator.wikimedia.org/T249820) (owner: Rxy)
[23:07:22] is 1002 ?
[23:08:14] (Merged) jenkins-bot: Add extendedconfirmed group and protection level at jawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/587874 (https://phabricator.wikimedia.org/T249820) (owner: Rxy)
[23:09:04] rxy: Your change is on mwdebug1001, please test
[23:09:16] *1002, sorry
[23:09:34] (CR) Catrope: [C: +2] Drop unused config for main page css [mediawiki-config] - https://gerrit.wikimedia.org/r/587800 (https://phabricator.wikimedia.org/T243996) (owner: Jdlrobson)
[23:10:29] * rxy looks
[23:10:37] (PS2) Jforrester: Drop unused config for main page css [mediawiki-config] - https://gerrit.wikimedia.org/r/587800 (https://phabricator.wikimedia.org/T243996) (owner: Jdlrobson)
[23:10:43] (CR) Jforrester: [C: +2] "…" [mediawiki-config] - https://gerrit.wikimedia.org/r/587800 (https://phabricator.wikimedia.org/T243996) (owner: Jdlrobson)
[23:10:55] RoanKattouw: Gerrit wasn't going to fix the merge conflict.
[23:11:07] Oh whoops didn't realize there was one
[23:11:13] No problem.
[23:12:01] (Merged) jenkins-bot: Drop unused config for main page css [mediawiki-config] - https://gerrit.wikimedia.org/r/587800 (https://phabricator.wikimedia.org/T243996) (owner: Jdlrobson)
[23:12:31] a warning is ignorable? @ prod logstash 1002
[23:12:39] Looking
[23:13:22] looks good to me ; works correctly
[23:13:41] mwdebug1002.eqiad.wmnet
[23:15:10] Yes that warning looks like an issue with the blocks code in core, not related to this config change
[23:15:16] rxy: Are things working otherwies?
[23:15:26] please deploy to prod
[23:17:37] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Add extendedconfirmed group and protection level on jawiki (T249820) (duration: 00m 59s)
[23:17:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:17:44] T249820: Enable of extended confirmed protection at ja.wikipedia - https://phabricator.wikimedia.org/T249820
[23:18:33] server: mw1409.eqiad.wmnet ; works correctly. Thanks!
[23:18:45] Jdlrobson: Your first patch is on mwdebug1002, please test (to the extent that there is anything to test)
[23:18:52] sweet on it
[23:19:11] (PS4) Catrope: Drop fallback support for wgMobileFrontendLogo [mediawiki-config] - https://gerrit.wikimedia.org/r/584734 (https://phabricator.wikimedia.org/T248500) (owner: Jforrester)
[23:19:28] (CR) Catrope: [C: +2] Drop fallback support for wgMobileFrontendLogo [mediawiki-config] - https://gerrit.wikimedia.org/r/584734 (https://phabricator.wikimedia.org/T248500) (owner: Jforrester)
[23:20:21] LGTM RoanKattouw
[23:21:54] (Merged) jenkins-bot: Drop fallback support for wgMobileFrontendLogo [mediawiki-config] - https://gerrit.wikimedia.org/r/584734 (https://phabricator.wikimedia.org/T248500) (owner: Jforrester)
[23:21:55] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Drop unused config for main page CSS (T243996) (duration: 00m 58s)
[23:22:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:03] T243996: [Technical debt pay off] Remove MFMobileMainPageCss from MobileFrontend - https://phabricator.wikimedia.org/T243996
[23:23:20] Jdlrobson: Now the other one is on mwdebug1002 (MF logos)
[23:25:56] (on that too)
[23:26:34] RoanKattouw: also LGTM thanks!
[23:27:53] !log catrope@deploy1001 Synchronized wmf-config/mobile.php: Drop fallback support for wgMobileFrontendLogo (T248500) (duration: 00m 58s)
[23:27:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:27:59] T248500: Replace $wgMobileFrontendLogo with $wgLogos - https://phabricator.wikimedia.org/T248500
[23:30:02] All done!
[23:30:18] Thanks :D
[23:30:42] thanks RoanKattouw
[23:30:46] quick and painless :)
[23:43:18] RoanKattouw: Remember to dupe-sync.
[23:43:33] (Or I can?)
[23:43:45] Yeah please do
[23:43:58] Done.
[23:44:06] Didn't realize that that was standard practice now
[23:44:41] Sadly.
[23:44:50] We have a fix, but I've not deployed it yet.
[23:44:54] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
[23:44:56] I might do that next week.
[23:44:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log