[00:22:24] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=icinga site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:22:36] (PS1) Dzahn: releases: avoid adding rsync when source and dest are the same [puppet] - https://gerrit.wikimedia.org/r/620137 (https://phabricator.wikimedia.org/T247652)
[00:23:49] (CR) jerkins-bot: [V: -1] releases: avoid adding rsync when source and dest are the same [puppet] - https://gerrit.wikimedia.org/r/620137 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[00:24:37] (PS2) Dzahn: releases: avoid adding rsync when source and dest are the same [puppet] - https://gerrit.wikimedia.org/r/620137 (https://phabricator.wikimedia.org/T247652)
[00:25:47] (PS3) Dzahn: releases: avoid adding rsync when source and dest are the same [puppet] - https://gerrit.wikimedia.org/r/620137 (https://phabricator.wikimedia.org/T247652)
[00:25:50] (CR) jerkins-bot: [V: -1] releases: avoid adding rsync when source and dest are the same [puppet] - https://gerrit.wikimedia.org/r/620137 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[00:31:52] Operations, Gerrit-Privilege-Requests, User-Kormat: Request for Gerrit Managers permissions - https://phabricator.wikimedia.org/T260342 (Legoktm) Just to clarify, are there specific permissions you would like and are lacking, or you'd like to be a manager to fix whatever comes up (which is fine too)?...
[00:32:16] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:37:07] (PS1) Krinkle: noc: Improve phrasing of highlight.php error message [mediawiki-config] - https://gerrit.wikimedia.org/r/620138 (https://phabricator.wikimedia.org/T254646)
[00:40:12] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=icinga site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:47:44] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/24490/" [puppet] - https://gerrit.wikimedia.org/r/620137 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[00:49:32] (PS1) Krinkle: profiler: Update XHGui SERVER/GET key filter [mediawiki-config] - https://gerrit.wikimedia.org/r/620139
[01:08:02] (PS1) Dzahn: releases: ensure to have motd warnings on all secondary servers [puppet] - https://gerrit.wikimedia.org/r/620141 (https://phabricator.wikimedia.org/T247652)
[01:11:04] (CR) Dzahn: [C: +2] "adds missing warning on releases1001 to tell users to stop using it" [puppet] - https://gerrit.wikimedia.org/r/620141 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[01:13:40] (CR) Dave Pifke: [C: +1] profiler: Update XHGui SERVER/GET key filter (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/620139 (owner: Krinkle)
[01:14:46] (CR) Dave Pifke: [C: +1] webperf: remove the xhgui_old_host parameter [puppet] - https://gerrit.wikimedia.org/r/620128 (https://phabricator.wikimedia.org/T180761) (owner: Dzahn)
[01:19:19] (PS1) Dave Pifke: xhgui: remove MongoDB backend [mediawiki-config] - https://gerrit.wikimedia.org/r/620142 (https://phabricator.wikimedia.org/T180761)
[01:21:59] (CR) Krinkle: profiler: Update XHGui SERVER/GET key filter (1 comment) [mediawiki-config] -
https://gerrit.wikimedia.org/r/620139 (owner: Krinkle)
[01:22:29] (PS1) Dzahn: releases: update contents of the warning MOTD template [puppet] - https://gerrit.wikimedia.org/r/620143 (https://phabricator.wikimedia.org/T247652)
[01:23:29] (CR) Dzahn: [C: +2] webperf: remove the xhgui_old_host parameter [puppet] - https://gerrit.wikimedia.org/r/620128 (https://phabricator.wikimedia.org/T180761) (owner: Dzahn)
[01:23:33] (PS2) Krinkle: profiler: Update XHGui SERVER/GET key filter [mediawiki-config] - https://gerrit.wikimedia.org/r/620139
[01:25:13] (CR) Dzahn: "- ProxyPass /xhgui-old http://tungsten.eqiad.wmnet/xhgui" [puppet] - https://gerrit.wikimedia.org/r/620128 (https://phabricator.wikimedia.org/T180761) (owner: Dzahn)
[01:25:52] (PS1) Ottomata: Revert to anaconda 2020.02, also some activation improvements [debs/anaconda-wmf] (debian) - https://gerrit.wikimedia.org/r/620144
[01:26:11] (CR) Ottomata: [C: +2] Don't allow an env set CONDA_USER_ENV to override $1 in anaconda-activate-stacked-env [debs/anaconda-wmf] (debian) - https://gerrit.wikimedia.org/r/618990 (owner: Ottomata)
[01:26:14] (CR) Ottomata: [V: +2 C: +2] Don't allow an env set CONDA_USER_ENV to override $1 in anaconda-activate-stacked-env [debs/anaconda-wmf] (debian) - https://gerrit.wikimedia.org/r/618990 (owner: Ottomata)
[01:29:06] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/24492/" [puppet] - https://gerrit.wikimedia.org/r/620143 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[01:29:16] (CR) Dave Pifke: [C: +1] profiler: Update XHGui SERVER/GET key filter (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/620139 (owner: Krinkle)
[01:31:01] (PS3) Jeena Huneidi: [WIP] Script to update image versions [deployment-charts] - https://gerrit.wikimedia.org/r/619833 (https://phabricator.wikimedia.org/T255835)
[01:32:19] (CR) Jeena Huneidi: "> Patch Set 2:"
[deployment-charts] - https://gerrit.wikimedia.org/r/619833 (https://phabricator.wikimedia.org/T255835) (owner: Jeena Huneidi)
[01:36:17] (PS4) Jeena Huneidi: [WIP] Script to update image versions [deployment-charts] - https://gerrit.wikimedia.org/r/619833 (https://phabricator.wikimedia.org/T255835)
[01:39:11] (PS5) Jeena Huneidi: [WIP] Script to update image versions [deployment-charts] - https://gerrit.wikimedia.org/r/619833 (https://phabricator.wikimedia.org/T255835)
[02:04:06] Operations, MediaWiki-Shell: Update limit.sh to support systemd-based cgroup management - https://phabricator.wikimedia.org/T136603 (tstarling) MemoryLimit is now a deprecated property. The [[https://manpages.debian.org/buster/systemd/systemd.resource-control.5.en.html|documentation]] in stretch and bust...
[02:09:19] (PS11) Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781
[02:09:51] (CR) Krinkle: [C: +1] xhgui: remove MongoDB backend [mediawiki-config] - https://gerrit.wikimedia.org/r/620142 (https://phabricator.wikimedia.org/T180761) (owner: Dave Pifke)
[02:11:32] (CR) jerkins-bot: [V: -1] elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (owner: Ryan Kemper)
[02:15:20] (PS12) Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781
[02:18:08] (CR) jerkins-bot: [V: -1] elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (owner: Ryan Kemper)
[02:26:24] (PS13) Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781
[02:29:04] (CR) jerkins-bot: [V: -1] elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (owner: Ryan Kemper)
[02:32:29]
(PS14) Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781
[02:35:13] (CR) jerkins-bot: [V: -1] elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (owner: Ryan Kemper)
[02:36:04] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:41:54] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=icinga site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:44:40] PROBLEM - SSH on webperf2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[02:46:26] RECOVERY - SSH on webperf2002 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[03:11:16] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:25:04] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=icinga site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:32:58] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:36:56] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=icinga site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:35:54] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:39:52] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=icinga site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:50:19] (CR) Kosta Harlan: [C: +1] noc: Improve phrasing of highlight.php error message [mediawiki-config] - https://gerrit.wikimedia.org/r/620138 (https://phabricator.wikimedia.org/T254646) (owner: Krinkle)
[05:03:26] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[05:07:24] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=icinga site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:49:45] Operations, ops-codfw: mc2028 regular and mgmt interface down - https://phabricator.wikimedia.org/T260224 (elukey) I would be in favor of using it, how many cpus/cores and memory are available on the spare? I tried to check the procurement task but it is a little bit difficult with all the correspondence...
[07:00:04] Deploy window No deploys all day!
See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200814T0700)
[07:01:27] (PS12) Jcrespo: mariadb-backups: Reorganize files and update paths [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/619962 (https://phabricator.wikimedia.org/T165358)
[07:05:30] (PS13) Jcrespo: wmfbackups: Split WMFBackup into its logical components (backup methods) [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/619962 (https://phabricator.wikimedia.org/T165358)
[07:23:04] (PS1) Elukey: Improve logging about why a datapoint is not collected/supported. [software/druid_exporter] - https://gerrit.wikimedia.org/r/620288
[07:26:56] ACKNOWLEDGEMENT - Host mc2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Elukey T260224
[07:27:41] ACKNOWLEDGEMENT - Aggregate IPsec Tunnel Status eqiad on icinga1001 is CRITICAL: instance=mc1028 site=eqiad tunnel=mc2028_v4 Elukey T260224 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan https://grafana.wikimedia.org/d/B9JpocKZz/ipsec-tunnel-status
[07:28:24] PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:29:14] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 52, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:46:45] (PS15) Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781
[07:48:40] (CR) Ryan Kemper: "=== TOX STUFF ===" [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (owner: Ryan Kemper)
[07:49:27] (CR) jerkins-bot: [V: -1] elasticsearch: verify all write queues are empty [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (owner:
Ryan Kemper)
[08:07:26] (CR) Ryan Kemper: "Cleaning up the comments." (6 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (owner: Ryan Kemper)
[08:16:07] (PS2) Elukey: Improve logging about why a datapoint is not collected/supported. [software/druid_exporter] - https://gerrit.wikimedia.org/r/620288
[08:16:58] (CR) Elukey: [V: +2 C: +2] Improve logging about why a datapoint is not collected/supported. [software/druid_exporter] - https://gerrit.wikimedia.org/r/620288 (owner: Elukey)
[08:21:51] Operations, Gerrit-Privilege-Requests, User-Kormat: Request for Gerrit Managers permissions - https://phabricator.wikimedia.org/T260342 (Kormat) > Just to clarify, are there specific permissions you would like and are lacking, or you'd like to be a manager to fix whatever comes up (which is fine too)...
[08:26:58] (CR) JMeybohm: [C: -1] Resurrect fluent-bit image (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/619512 (https://phabricator.wikimedia.org/T251812) (owner: Ppchelko)
[08:27:44] (CR) Kormat: [C: +1] wmfbackups: Split WMFBackup into its logical components (backup methods) [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/619962 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[08:32:55] (CR) JMeybohm: [C: -1] "It might also be worth it to use buster as seed and an up to date fluent-bit package as 0.12.2 is from 2017 :)" [docker-images/production-images] - https://gerrit.wikimedia.org/r/619512 (https://phabricator.wikimedia.org/T251812) (owner: Ppchelko)
[08:35:54] RECOVERY - rpki grafana alert on icinga1001 is OK: OK: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is not alerting.
https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[08:38:40] (CR) JMeybohm: "> Patch Set 3:" [deployment-charts] - https://gerrit.wikimedia.org/r/619833 (https://phabricator.wikimedia.org/T255835) (owner: Jeena Huneidi)
[08:42:45] (PS14) Jcrespo: wmfbackups: Split WMFBackup into its logical components (backup methods) [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/619962 (https://phabricator.wikimedia.org/T165358)
[08:42:47] (PS1) Jcrespo: wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358)
[08:43:18] (PS2) Jcrespo: wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358)
[08:43:41] (CR) jerkins-bot: [V: -1] wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[08:47:33] (CR) Jcrespo: "The test only fails on CI (but works locally). Do you think we could enforce python >=3.7 for the development environment only?
It would s" [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[08:51:03] Operations, LDAP-Access-Requests, WMF-NDA-Requests: Add DVrandecic to group nda - https://phabricator.wikimedia.org/T260279 (Vgutierrez) p:Triage→Medium
[08:54:47] (PS3) Jcrespo: wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358)
[08:55:11] (CR) jerkins-bot: [V: -1] wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[09:04:20] (PS1) Fdans: modules/refine: bump jar version to fix pageview definition bug [puppet] - https://gerrit.wikimedia.org/r/620293 (https://phabricator.wikimedia.org/T257860)
[09:06:25] (PS4) Jcrespo: wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358)
[09:06:49] (CR) jerkins-bot: [V: -1] wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[09:08:14] (PS2) Elukey: profile::analytics::refinery::job::refine.pp: bump jar version to fix pageview definition bug [puppet] - https://gerrit.wikimedia.org/r/620293 (https://phabricator.wikimedia.org/T257860) (owner: Fdans)
[09:09:13] (PS3) Elukey: profile::analytics::refinery::job::refine: bump jar version [puppet] - https://gerrit.wikimedia.org/r/620293 (https://phabricator.wikimedia.org/T257860) (owner: Fdans)
[09:11:51] (CR) Elukey: [C: +2] profile::analytics::refinery::job::refine: bump jar version [puppet] - https://gerrit.wikimedia.org/r/620293
(https://phabricator.wikimedia.org/T257860) (owner: Fdans)
[09:17:40] (PS15) Jcrespo: wmfbackups: Split WMFBackup into its logical components (backup methods) [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/619962 (https://phabricator.wikimedia.org/T165358)
[09:17:50] (PS5) Jcrespo: wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358)
[09:18:15] (CR) jerkins-bot: [V: -1] wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[09:18:39] (CR) Jcrespo: [C: +2] "I fixed a linter issue with missin line break." [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/619962 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[09:19:06] (Merged) jenkins-bot: wmfbackups: Split WMFBackup into its logical components (backup methods) [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/619962 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[09:21:00] (CR) Jcrespo: "This is very much work in progress, but I wonder the performance impact of reading a file on each request.
Shouldn't be as impacting given" [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[09:28:25] (PS6) Jcrespo: wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358)
[09:28:33] (CR) jerkins-bot: [V: -1] wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[09:28:40] (PS7) Jcrespo: wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358)
[09:32:16] (CR) Vgutierrez: [C: +1] cache: remove '_ats' suffix from DC names [puppet] - https://gerrit.wikimedia.org/r/618975 (https://phabricator.wikimedia.org/T241239) (owner: Ema)
[09:36:25] (PS6) Lucas Werkmeister (WMDE): Enable Data Bridge on Catalan Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/595543 (https://phabricator.wikimedia.org/T232584)
[09:37:19] (PS1) Kormat: Fix lintian duplicate-conffile error [software/transferpy] - https://gerrit.wikimedia.org/r/620299
[09:39:49] (PS1) Kormat: Fix lintian duplicate-conffile error [software/transferpy] - https://gerrit.wikimedia.org/r/620300
[09:41:00] Operations, ops-eqiad, DC-Ops: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (elukey)
[09:41:07] Operations, ops-eqiad, DC-Ops: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (elukey)
[09:42:23] (Abandoned) Kormat: Fix lintian duplicate-conffile error [software/transferpy] - https://gerrit.wikimedia.org/r/620300 (owner: Kormat)
[09:42:52] (PS2) Kormat: Fix lintian duplicate-conffile
error [software/transferpy] - https://gerrit.wikimedia.org/r/620299
[09:43:42] Operations, ops-eqiad, Analytics-Clusters, decommission-hardware: Decommission analytics10[28-31,33-41] - https://phabricator.wikimedia.org/T227485 (elukey)
[09:47:11] (PS2) Kormat: Move RemoteExecution library to wmfmariadbpy [software/transferpy] - https://gerrit.wikimedia.org/r/619959 (https://phabricator.wikimedia.org/T259516)
[09:51:37] (CR) Kormat: [C: -2] "> Patch Set 1: Code-Review-2" [software/transferpy] - https://gerrit.wikimedia.org/r/619959 (https://phabricator.wikimedia.org/T259516) (owner: Kormat)
[10:01:08] (CR) JMeybohm: [V: +2 C: +2] golang:1.13-2, Add ca-certificates [docker-images/production-images] - https://gerrit.wikimedia.org/r/619731 (owner: Addshore)
[10:02:08] (CR) Jcrespo: [C: +2] Fix lintian duplicate-conffile error [software/transferpy] - https://gerrit.wikimedia.org/r/620299 (owner: Kormat)
[10:15:10] Operations, ops-codfw: mc2028 regular and mgmt interface down - https://phabricator.wikimedia.org/T260224 (elukey) Just to summarize, I think that we have two options: 1) if the specs of the spare are ok, we could swap it with mc2028 (dns, reimage, puppet config, etc..). This would allow us to have the...
[10:20:41] Operations, LDAP-Access-Requests, WMF-NDA-Requests: Add DVrandecic to group nda - https://phabricator.wikimedia.org/T260279 (Vgutierrez) From https://wikitech.wikimedia.org/wiki/LDAP/Groups: ` wmf - for WMF staff/contractors (documented below) ops - for operations people (see ops group in puppet mani...
[10:27:55] (PS1) JMeybohm: golang: Fix run template package list to be a string [docker-images/production-images] - https://gerrit.wikimedia.org/r/620302
[10:29:36] (CR) Addshore: [C: +1] golang: Fix run template package list to be a string [docker-images/production-images] - https://gerrit.wikimedia.org/r/620302 (owner: JMeybohm)
[10:30:26] (CR) JMeybohm: [V: +2 C: +2] golang: Fix run template package list to be a string [docker-images/production-images] - https://gerrit.wikimedia.org/r/620302 (owner: JMeybohm)
[10:31:25] (CR) Jcrespo: [C: +1] "With the previous warning about potential loss of downtimes/alert disabling." [puppet] - https://gerrit.wikimedia.org/r/619291 (owner: Kormat)
[10:38:27] (PS1) Giuseppe Lavagetto: [WIP] Introduce sre.host.reboot-cluster [cookbooks] - https://gerrit.wikimedia.org/r/620305
[10:39:37] (CR) jerkins-bot: [V: -1] [WIP] Introduce sre.host.reboot-cluster [cookbooks] - https://gerrit.wikimedia.org/r/620305 (owner: Giuseppe Lavagetto)
[10:50:47] (PS1) Jcrespo: [WIP] Add WMFBackup package creation [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620309 (https://phabricator.wikimedia.org/T165358)
[10:50:56] (CR) jerkins-bot: [V: -1] [WIP] Add WMFBackup package creation [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620309 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[10:52:05] (PS2) Jcrespo: [WIP] Add WMFBackup package creation [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620309 (https://phabricator.wikimedia.org/T165358)
[11:01:37] (CR) JMeybohm: [V: +2 C: +2] build: loki & ratelimit use golang:1.13-2 [docker-images/production-images] - https://gerrit.wikimedia.org/r/620015 (owner: Addshore)
[11:02:32] Operations, Platform Engineering, Wikidata, serviceops, Sustainability (Incident Followup): mw* servers memory leaks (12 Aug) - https://phabricator.wikimedia.org/T260281 (NullPointer) I
suggest setting this a security issue since this may cause people to //intentionally// make memory leaks to...
[11:03:15] (PS1) Hnowlan: api-gateway: strip cookie headers from requests and responses. [deployment-charts] - https://gerrit.wikimedia.org/r/620311 (https://phabricator.wikimedia.org/T259296)
[11:03:33] (PS8) Jcrespo: wmfmariadbpy: Load and provide a method for section to port assignment [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358)
[11:03:35] (PS3) Jcrespo: [WIP] Add WMFBackup package creation [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620309 (https://phabricator.wikimedia.org/T165358)
[11:08:45] (PS1) Jcrespo: [WIP] Change backup hosts into using the package version of scripts [puppet] - https://gerrit.wikimedia.org/r/620312 (https://phabricator.wikimedia.org/T165358)
[11:09:41] Operations, Platform Engineering, Wikidata, serviceops, Sustainability (Incident Followup): mw* servers memory leaks (12 Aug) - https://phabricator.wikimedia.org/T260281 (RhinosF1) >>! In T260281#6385334, @NullPointer wrote: > I suggest setting this a security issue since this may cause peopl...
[11:09:49] (PS4) Jcrespo: [WIP] Add WMFBackup package creation [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620309 (https://phabricator.wikimedia.org/T165358)
[11:09:57] (CR) jerkins-bot: [V: -1] [WIP] Change backup hosts into using the package version of scripts [puppet] - https://gerrit.wikimedia.org/r/620312 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[11:11:23] (PS2) Hnowlan: api-gateway: strip cookie headers from requests and responses.
[deployment-charts] - https://gerrit.wikimedia.org/r/620311 (https://phabricator.wikimedia.org/T259296)
[11:13:17] (CR) Hnowlan: "This review also implements T259294" [deployment-charts] - https://gerrit.wikimedia.org/r/620311 (https://phabricator.wikimedia.org/T259296) (owner: Hnowlan)
[11:15:30] (PS1) Jcrespo: backup_mariadb: Use path to find backup_mariadb.py [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620315 (https://phabricator.wikimedia.org/T165358)
[11:18:26] (PS1) Addshore: golang:1.13-3, Add git build-essential [docker-images/production-images] - https://gerrit.wikimedia.org/r/620316
[11:20:09] (PS1) Addshore: build: loki & ratelimit use golang:1.13-3 [docker-images/production-images] - https://gerrit.wikimedia.org/r/620317
[11:21:19] (PS2) Addshore: golang:1.13-3, Add git & build-essential [docker-images/production-images] - https://gerrit.wikimedia.org/r/620316
[11:21:25] (PS2) Addshore: build: loki & ratelimit use golang:1.13-3 [docker-images/production-images] - https://gerrit.wikimedia.org/r/620317
[11:23:08] (CR) Ladsgroup: "Do you think this can be merged now?" [puppet] - https://gerrit.wikimedia.org/r/617842 (https://phabricator.wikimedia.org/T256536) (owner: Ladsgroup)
[11:39:07] (PS1) Jcrespo: wmfmariadbpy: Add unit tests for resolve method [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620319
[12:03:28] (CR) Kormat: [C: -1] "> Patch Set 5: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/619291 (owner: Kormat)
[12:11:38] (CR) Hnowlan: [C: +1] "LGTM" [docker-images/production-images] - https://gerrit.wikimedia.org/r/620068 (owner: Ppchelko)
[12:15:41] (PS2) Jcrespo: [WIP] Change backup hosts into using the package version of scripts [puppet] - https://gerrit.wikimedia.org/r/620312 (https://phabricator.wikimedia.org/T165358)
[12:15:47] (CR) Kormat: "> The test only fails on CI (but works locally).
Do you think we could enforce python >=3.7 for the development environment only? It would" [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620291 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[12:16:52] (CR) jerkins-bot: [V: -1] [WIP] Change backup hosts into using the package version of scripts [puppet] - https://gerrit.wikimedia.org/r/620312 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[12:32:59] (CR) Hnowlan: [C: +1] "> Patch Set 2:" (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/619804 (https://phabricator.wikimedia.org/T254914) (owner: Ppchelko)
[13:00:17] (PS1) Kormat: Replace the old wmfmariadbpy metapackage [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620333
[13:01:45] (CR) Kormat: [C: +2] Replace the old wmfmariadbpy metapackage [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620333 (owner: Kormat)
[13:02:24] (Merged) jenkins-bot: Replace the old wmfmariadbpy metapackage [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/620333 (owner: Kormat)
[13:18:18] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 59 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:18:58] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 51 probes of 572 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:22:50] I can't ssh to bast1002 or 2002 from here.
something's out of whack
[13:24:14] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 46 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:24:33] apergos: 3way handshake is ok from here (nc -zv)
[13:24:47] might be a specific provider
[13:24:54] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 45 probes of 572 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[13:25:14] or it might be those recoveries ^^ (now back for me
[13:25:15] )
[13:41:05] (CR) JMeybohm: [V: +2 C: +2] golang:1.13-3, Add git & build-essential [docker-images/production-images] - https://gerrit.wikimedia.org/r/620316 (owner: Addshore)
[13:51:23] (CR) Hashar: "That is awesome and has been a pain point anytime we upgraded the Jenkins package (it would start the service unexpectedly on the agent) o" [puppet] - https://gerrit.wikimedia.org/r/619855 (owner: Dzahn)
[13:53:28] (PS5) Hashar: ci: switch integration.wikimedia.org to scap DocumentRoot [puppet] - https://gerrit.wikimedia.org/r/611369 (https://phabricator.wikimedia.org/T149924)
[13:54:29] (CR) Hashar: [C: +1] "Thanks for the parent changes.
We can do that anytime next week, I should be around in your morning :]" [puppet] - 10https://gerrit.wikimedia.org/r/611369 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar) [13:58:50] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [14:00:16] PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:00:46] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [14:25:53] (03Abandoned) 10Elukey: admin: remove krb flag from user neilpquinn-wmf [puppet] - 10https://gerrit.wikimedia.org/r/619269 (owner: 10Elukey) [14:44:51] (03PS6) 10Hashar: ci: switch integration.wikimedia.org to scap DocumentRoot [puppet] - 10https://gerrit.wikimedia.org/r/611369 (https://phabricator.wikimedia.org/T149924) [14:44:53] (03PS1) 10Hashar: doc: switch to scap DocumentRoot [puppet] - 10https://gerrit.wikimedia.org/r/620368 (https://phabricator.wikimedia.org/T149924) [14:46:01] (03CR) 10Hashar: [C: 04-1] "That is roughly how to migrate doc.wikimedia.org to a new document root. 
I will most probably split this change though and definitely nee" [puppet] - 10https://gerrit.wikimedia.org/r/620368 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar) [14:46:20] (03CR) 10jerkins-bot: [V: 04-1] doc: switch to scap DocumentRoot [puppet] - 10https://gerrit.wikimedia.org/r/620368 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar) [14:55:10] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, 10Sustainability (Incident Followup): Figure what change caused the ongoing memleak on mw appservers - https://phabricator.wikimedia.org/T260329 (10JMeybohm) Looking at the values today it's pretty clear that mw1382 wins and mw1381 takes the... [15:00:16] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [15:01:08] RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:04:12] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [15:05:06] RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [15:05:46] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 54, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [15:07:09] (03PS2) 10Ottomata: Revert to anaconda 2020.02, also some activation improvements [debs/anaconda-wmf] (debian) - 10https://gerrit.wikimedia.org/r/620144 [15:32:38] 10Operations, 10ops-codfw: mc2028 regular and mgmt interface down - https://phabricator.wikimedia.org/T260224 (10Papaul) HP ProLiant DL360 Gen9 with 64GB RAM, Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz each CPU has 8/8 cores; 16 threads takes SSD's [15:33:54] (03CR) 10Gehel: "recheck" [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (owner: 10Ryan Kemper) [15:34:37] 10Operations, 10ops-codfw: mc2028 regular and mgmt interface down - https://phabricator.wikimedia.org/T260224 (10elukey) The RAM size is what concerns me, since all memcached shards are running with ~90G of memory allocated (so we'd have some imbalance and possibly more evictions etc..). [15:35:16] (03CR) 10Gehel: "LGTM, except for the jenkins failure." [cookbooks] - 10https://gerrit.wikimedia.org/r/603731 (owner: 10Ryan Kemper) [15:37:58] 10Operations, 10ops-codfw: mc2028 regular and mgmt interface down - https://phabricator.wikimedia.org/T260224 (10Papaul) RAM is not problem we can take the RAM from the broken server and add it to the spare server [15:42:24] 10Operations, 10ops-codfw: mc2028 regular and mgmt interface down - https://phabricator.wikimedia.org/T260224 (10elukey) >>! 
In T260224#6385838, @Papaul wrote: > RAM is not problem we can take the RAM from the broken server and add it to the spare server Wonderful news, now it is looking definitely better,... [15:44:43] (03CR) 10Gehel: "The jenkins failure needs some investigation (I don't see how it is related to this CR)." (036 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (owner: 10Ryan Kemper) [15:45:07] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: ASAP) rack/setup/install clouddb10[13-20] - https://phabricator.wikimedia.org/T260441 (10RobH) [15:45:24] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: ASAP) rack/setup/install clouddb10[13-20] - https://phabricator.wikimedia.org/T260441 (10RobH) [16:07:29] (03PS1) 10CDanis: whitelist broken advertisements from Jio AS55836 [homer/public] - 10https://gerrit.wikimedia.org/r/620377 [16:24:41] (03PS2) 10CDanis: whitelist broken advertisements from Jio AS55836 [homer/public] - 10https://gerrit.wikimedia.org/r/620377 [16:27:22] 10Operations, 10ops-eqiad, 10Analytics, 10DC-Ops: (Need By: TBD) rack/setup/install an-worker10[18-40] - https://phabricator.wikimedia.org/T260445 (10RobH) [16:27:44] 10Operations, 10ops-eqiad, 10Analytics, 10DC-Ops: (Need By: TBD) rack/setup/install an-worker10[18-40] - https://phabricator.wikimedia.org/T260445 (10RobH) [16:28:57] (03CR) 10Ayounsi: [C: 03+1] "LGTM, we need eqsin + NS sites, but not worth the hastle to only skip ulsfo." 
[homer/public] - 10https://gerrit.wikimedia.org/r/620377 (owner: 10CDanis) [16:32:26] (03CR) 10CDanis: [C: 03+2] whitelist broken advertisements from Jio AS55836 [homer/public] - 10https://gerrit.wikimedia.org/r/620377 (owner: 10CDanis) [16:32:50] (03Merged) 10jenkins-bot: whitelist broken advertisements from Jio AS55836 [homer/public] - 10https://gerrit.wikimedia.org/r/620377 (owner: 10CDanis) [16:36:46] !log ❌cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr2-codfw*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8' [16:36:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:39:35] !log ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr1-codfw*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8' [16:39:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:40:22] 10Operations, 10ops-eqiad, 10Analytics, 10DC-Ops: (Need By: 2020-10-31) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (10RobH) [16:40:33] 10Operations, 10ops-eqiad, 10Analytics, 10DC-Ops: (Need By: 2020-10-31) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (10RobH) [16:44:22] !log ❌cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr2-esams*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8' [16:44:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:19] 10Operations, 10Traffic, 10netops: Users of Jio ISP (India, AS 55836) unable to reach Wikimedia sites - https://phabricator.wikimedia.org/T260449 (10CDanis) [16:50:32] 10Operations, 10Traffic, 10netops: Users of Jio ISP (India, AS 55836) unable to reach Wikimedia sites - https://phabricator.wikimedia.org/T260449 (10CDanis) [16:51:37] 10Operations, 10Traffic, 10netops: Users of Jio ISP (India, AS 55836) unable to reach Wikimedia sites - https://phabricator.wikimedia.org/T260449 (10CDanis) For posterity, relevant workaround patch and deployment 
thereof: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/620377 https://sal.toolforge... [16:52:10] 10Operations, 10Traffic, 10netops: Users of Jio ISP (India, AS 55836) unable to reach Wikimedia sites - https://phabricator.wikimedia.org/T260449 (10CDanis) [16:52:53] 10Operations, 10SRE-Access-Requests: Request for access to analytics-privatedata-users - https://phabricator.wikimedia.org/T260450 (10Cparle) [16:54:19] 10Operations, 10SRE-Access-Requests: Request for access to analytics-privatedata-users - https://phabricator.wikimedia.org/T260450 (10RhinosF1) Hi @Cparle, per the description of #sre-access-requests, you need to add the following information to the task: == Requestor provided information and prerequisites ==... [16:56:01] (03PS1) 10CDanis: add tracking task to comment [homer/public] - 10https://gerrit.wikimedia.org/r/620382 [16:58:54] !log done deploying 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8' to all routers T260449 [16:58:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:58:57] T260449: Users of Jio ISP (India, AS 55836) unable to reach Wikimedia sites - https://phabricator.wikimedia.org/T260449 [16:58:58] (03CR) 10CDanis: [C: 03+2] add tracking task to comment [homer/public] - 10https://gerrit.wikimedia.org/r/620382 (owner: 10CDanis) [16:59:22] (03Merged) 10jenkins-bot: add tracking task to comment [homer/public] - 10https://gerrit.wikimedia.org/r/620382 (owner: 10CDanis) [17:02:08] 10Operations, 10Traffic, 10netops: Users of Jio ISP (India, AS 55836) unable to reach Wikimedia sites - https://phabricator.wikimedia.org/T260449 (10CDanis) [17:02:29] 10Operations, 10Traffic, 10netops: Users of Jio ISP (India, AS 55836) unable to reach Wikimedia sites - https://phabricator.wikimedia.org/T260449 (10CDanis) 05Open→03Resolved a:03CDanis There's still an issue on Jio's side that needs to be fixed by them, but, we've put a temporary workaround in place,... 
[18:53:52] (PS1) Ssingh: Revert "whitelist broken advertisements from Jio AS55836" [homer/public] - https://gerrit.wikimedia.org/r/620386 (https://phabricator.wikimedia.org/T260452)
[19:26:30] (PS1) Hashar: doc: prepare /srv/doc as the new destination [puppet] - https://gerrit.wikimedia.org/r/620389 (https://phabricator.wikimedia.org/T149924)
[19:27:46] (CR) jerkins-bot: [V: -1] doc: prepare /srv/doc as the new destination [puppet] - https://gerrit.wikimedia.org/r/620389 (https://phabricator.wikimedia.org/T149924) (owner: Hashar)
[19:27:56] Operations, ops-eqiad, Analytics, DC-Ops: (Need By: 2020-09-15) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (elukey)
[19:29:21] (PS2) Hashar: doc: prepare /srv/doc as the new destination [puppet] - https://gerrit.wikimedia.org/r/620389 (https://phabricator.wikimedia.org/T149924)
[19:41:04] !log restart mwdebug1002
[19:41:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:18:48] Operations, serviceops, Growth-Team (Current Sprint), Patch-For-Review, User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (kostajh) > @kostajh -- are you asking whether we should deactivate EditorJourney in all wikis, so as to stop it from recordin...
[20:43:32] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:45:30] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:56:00] PROBLEM - PHP opcache health on mwdebug1002 is CRITICAL: CRITICAL: opcache free space is below 50 MB https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[21:41:27] ^ me, I will disable this alert
[21:45:31] ACKNOWLEDGEMENT - PHP opcache health on mwdebug1002 is CRITICAL: CRITICAL: opcache free space is below 50 MB Effie Mouzeli testing T253673 https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health