[08:22:07] (CR) Legoktm: [C: +1] "Duh," [mediawiki-config] - https://gerrit.wikimedia.org/r/660001 (owner: Urbanecm)
[08:22:56] (CR) Legoktm: [C: +1] Publish logos.php at noc.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/659433 (https://phabricator.wikimedia.org/T273330) (owner: Urbanecm)
[08:25:20] (CR) Legoktm: "Nice! Did everything work out OK? (besides the validate() issue you fixed in the other patch)" [mediawiki-config] - https://gerrit.wikimedia.org/r/660000 (https://phabricator.wikimedia.org/T273323) (owner: Urbanecm)
[08:25:45] (CR) Legoktm: [C: +2] "no-op in production" [mediawiki-config] - https://gerrit.wikimedia.org/r/660001 (owner: Urbanecm)
[08:26:38] (Merged) jenkins-bot: logos: Run validate after updating logos [mediawiki-config] - https://gerrit.wikimedia.org/r/660001 (owner: Urbanecm)
[08:39:20] (PS1) Elukey: Allow incompatible col type changes for Hive 2.x [puppet] - https://gerrit.wikimedia.org/r/660619 (https://phabricator.wikimedia.org/T268733)
[08:40:42] (PS1) Legoktm: [WIP] [mediawiki-config] - https://gerrit.wikimedia.org/r/660620
[08:40:54] (CR) jerkins-bot: [V: -1] Allow incompatible col type changes for Hive 2.x [puppet] - https://gerrit.wikimedia.org/r/660619 (https://phabricator.wikimedia.org/T268733) (owner: Elukey)
[08:41:13] (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27779/console" [puppet] - https://gerrit.wikimedia.org/r/660619 (https://phabricator.wikimedia.org/T268733) (owner: Elukey)
[08:44:49] (PS2) Elukey: Allow incompatible col type changes for Hive 2.x [puppet] - https://gerrit.wikimedia.org/r/660619 (https://phabricator.wikimedia.org/T268733)
[08:46:17] (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27780/console" [puppet] - https://gerrit.wikimedia.org/r/660619 (https://phabricator.wikimedia.org/T268733) (owner: Elukey)
[08:50:02] (CR) Elukey: [V: +1 C: +2] Allow incompatible col type changes for Hive 2.x [puppet] - https://gerrit.wikimedia.org/r/660619 (https://phabricator.wikimedia.org/T268733) (owner: Elukey)
[08:50:34] (PS2) Legoktm: logos: Update dewiki from Commons and recompress [mediawiki-config] - https://gerrit.wikimedia.org/r/660620
[08:50:36] (PS1) Legoktm: logos: Update frwiki from Commons and recompress [mediawiki-config] - https://gerrit.wikimedia.org/r/660621
[08:50:38] (PS1) Legoktm: logos: Update plwiki from Commons and recompress [mediawiki-config] - https://gerrit.wikimedia.org/r/660622
[08:50:40] (PS1) Legoktm: logos: Update itwiki from Commons and recompress [mediawiki-config] - https://gerrit.wikimedia.org/r/660623
[08:50:42] (PS1) Legoktm: logos: Update jawiki from Commons and recompress [mediawiki-config] - https://gerrit.wikimedia.org/r/660624
[09:09:23] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:11:43] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:20:34] (PS1) Elukey: presto: set kerberos enabled by default [puppet] - https://gerrit.wikimedia.org/r/660630
[10:22:59] (CR) Elukey: [V: +1] "PCC SUCCESS (NOOP 5): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27781/console" [puppet] - https://gerrit.wikimedia.org/r/660630 (owner: Elukey)
[10:24:46] (CR) Elukey: [V: +1 C: +2] presto: set kerberos enabled by default [puppet] - https://gerrit.wikimedia.org/r/660630 (owner: Elukey)
[10:25:50] (CR) Urbanecm: "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/660000 (https://phabricator.wikimedia.org/T273323) (owner: Urbanecm)
[13:07:40] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:09:52] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:10:56] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:15:40] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:41:42] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:46:26] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:46:49] (PS1) ArielGlenn: make snapshot1007 running buster a dumpsrunner and move testbed to 1005 [puppet] - https://gerrit.wikimedia.org/r/660634 (https://phabricator.wikimedia.org/T269377)
[15:51:55] (CR) ArielGlenn: [C: +2] make snapshot1007 running buster a dumpsrunner and move testbed to 1005 [puppet] - https://gerrit.wikimedia.org/r/660634 (https://phabricator.wikimedia.org/T269377) (owner: ArielGlenn)
[16:09:57] SRE, Dumps-Generation, Platform Engineering, serviceops, Patch-For-Review: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ArielGlenn) The prefetch runs went well. I ran a small wiki on snapshot1007 (buster) and then on snapshot1005 (stretch) on the same hardware...
[16:13:25] SRE, ops-eqiad, DC-Ops, Dumps-Generation: (Need By: 2021-03-31) rack/setup/install snapshot101[1-5] - https://phabricator.wikimedia.org/T272509 (ArielGlenn) It is extremely likely you'll be able to install directly with buster once these arrive. I should know within a week. The first reimaged host...
[16:19:34] SRE: (Need by Aug 1) rack/setup/install dumpsdata1003.eqiad.wmnet - https://phabricator.wikimedia.org/T234076 (ArielGlenn) Open→Resolved I wonder why this is still open. Woops! Host has been doing its job for quite some time...
[17:17:20] (PS1) Andrew Bogott: Update the nova-fullstack monitoring to expect python3 [puppet] - https://gerrit.wikimedia.org/r/660639 (https://phabricator.wikimedia.org/T272587)
[17:18:21] (CR) Andrew Bogott: [C: +2] Update the nova-fullstack monitoring to expect python3 [puppet] - https://gerrit.wikimedia.org/r/660639 (https://phabricator.wikimedia.org/T272587) (owner: Andrew Bogott)
[17:24:42] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:25:01] (PS1) Andrew Bogott: nova-fullstack: use keystoneauth1 session [puppet] - https://gerrit.wikimedia.org/r/660640 (https://phabricator.wikimedia.org/T273378)
[17:26:01] (CR) Andrew Bogott: [C: +2] nova-fullstack: use keystoneauth1 session [puppet] - https://gerrit.wikimedia.org/r/660640 (https://phabricator.wikimedia.org/T273378) (owner: Andrew Bogott)
[17:27:04] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:31:06] SRE, Commons, Wikimedia-General-or-Unknown: Upload to Commons fails with a common ADSL connection in Taiwan - https://phabricator.wikimedia.org/T205619 (Jidanni) Couldn't even upload this to this bug report: Upload Failure m.mp3 Server responded:
(CR) ArielGlenn: [C: +2] "Woops, the package is already out and deployed on a production server. Guess I'd better merge this then." [debs/mwbzutils] - https://gerrit.wikimedia.org/r/658923 (owner: ArielGlenn)
[17:57:37] (CR) Andrew Bogott: [C: +2] nova-fullstack: fix puppet cert cleanup check to use bastion [puppet] - https://gerrit.wikimedia.org/r/660641 (https://phabricator.wikimedia.org/T272587) (owner: Andrew Bogott)
[18:20:22] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:21:48] PROBLEM - Logstash Elasticsearch indexing errors #o11y on alert1001 is CRITICAL: 18.8 ge 8 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/3283cc1372b7df18f26128163125cf45 https://grafana.wikimedia.org/dashboard/db/logstash
[18:22:42] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:00:53] (PS1) Andrew Bogott: nova-fullstack: remove test for puppet cert cleanup [puppet] - https://gerrit.wikimedia.org/r/660643 (https://phabricator.wikimedia.org/T272587)
[20:01:35] (CR) jerkins-bot: [V: -1] nova-fullstack: remove test for puppet cert cleanup [puppet] - https://gerrit.wikimedia.org/r/660643 (https://phabricator.wikimedia.org/T272587) (owner: Andrew Bogott)
[20:04:52] (PS2) Andrew Bogott: nova-fullstack: remove test for puppet cert cleanup [puppet] - https://gerrit.wikimedia.org/r/660643 (https://phabricator.wikimedia.org/T272587)
[20:06:33] (CR) Andrew Bogott: [C: +2] nova-fullstack: remove test for puppet cert cleanup [puppet] - https://gerrit.wikimedia.org/r/660643 (https://phabricator.wikimedia.org/T272587) (owner: Andrew Bogott)
[20:19:30] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:21:42] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:18:10] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:20:30] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:18:59] (PS1) Urbanecm: Revert "Revert "Remove usages and hard deprecate User::changeable(By)Group"" [core] (wmf/1.36.0-wmf.28) - https://gerrit.wikimedia.org/r/660532 (https://phabricator.wikimedia.org/T273317)
[23:20:07] (Abandoned) Urbanecm: Revert "Revert "Remove usages and hard deprecate User::changeable(By)Group"" [core] (wmf/1.36.0-wmf.28) - https://gerrit.wikimedia.org/r/660532 (https://phabricator.wikimedia.org/T273317) (owner: Urbanecm)
[23:22:33] (Restored) Urbanecm: Revert "Revert "Remove usages and hard deprecate User::changeable(By)Group"" [core] (wmf/1.36.0-wmf.28) - https://gerrit.wikimedia.org/r/660532 (https://phabricator.wikimedia.org/T273317) (owner: Urbanecm)
[23:22:39] (PS2) Urbanecm: Revert "Revert "Remove usages and hard deprecate User::changeable(By)Group"" [core] (wmf/1.36.0-wmf.28) - https://gerrit.wikimedia.org/r/660532 (https://phabricator.wikimedia.org/T273317)
[23:22:57] (Abandoned) Urbanecm: Revert "Revert "Remove usages and hard deprecate User::changeable(By)Group"" [core] (wmf/1.36.0-wmf.28) - https://gerrit.wikimedia.org/r/660532 (https://phabricator.wikimedia.org/T273317) (owner: Urbanecm)
[23:24:02] (PS1) Urbanecm: Revert "Revert "Revert "Remove usages and hard deprecate User::changeable(By)Group""" [core] (wmf/1.36.0-wmf.28) - https://gerrit.wikimedia.org/r/660533 (https://phabricator.wikimedia.org/T273317)
[23:33:16] (PS1) Urbanecm: Revert "Move User::changeable(By)Groups methods to UserGroupManager" [core] (wmf/1.36.0-wmf.28) - https://gerrit.wikimedia.org/r/660649 (https://phabricator.wikimedia.org/T273296)
[23:36:44] (PS2) Urbanecm: Revert "Move User::changeable(By)Groups methods to UserGroupManager" [core] (wmf/1.36.0-wmf.28) - https://gerrit.wikimedia.org/r/660649 (https://phabricator.wikimedia.org/T273296)
[23:42:11] (CR) jerkins-bot: [V: -1] Revert "Move User::changeable(By)Groups methods to UserGroupManager" [core] (wmf/1.36.0-wmf.28) - https://gerrit.wikimedia.org/r/660649 (https://phabricator.wikimedia.org/T273296) (owner: Urbanecm)
[23:43:30] (PS3) Urbanecm: Revert "Move User::changeable(By)Groups methods to UserGroupManager" [core] (wmf/1.36.0-wmf.28) - https://gerrit.wikimedia.org/r/660649 (https://phabricator.wikimedia.org/T273296)
[23:49:37] (CR) jerkins-bot: [V: -1] Revert "Move User::changeable(By)Groups methods to UserGroupManager" [core] (wmf/1.36.0-wmf.28) - https://gerrit.wikimedia.org/r/660649 (https://phabricator.wikimedia.org/T273296) (owner: Urbanecm)
[23:50:12] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:50:39] (PS4) Urbanecm: Revert "Move User::changeable(By)Groups methods to UserGroupManager" [core] (wmf/1.36.0-wmf.28) - https://gerrit.wikimedia.org/r/660649 (https://phabricator.wikimedia.org/T273296)
[23:52:30] (PS1) Andrew Bogott: WMCS utils: add script to detect leaked VM puppet certs [puppet] - https://gerrit.wikimedia.org/r/660651 (https://phabricator.wikimedia.org/T273379)
[23:52:32] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:54:30] (CR) Andrew Bogott: [C: +2] WMCS utils: add script to detect leaked VM puppet certs [puppet] - https://gerrit.wikimedia.org/r/660651 (https://phabricator.wikimedia.org/T273379) (owner: Andrew Bogott)