[00:07:55] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:10:23] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:51:05] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_eventlogging_legacy_failure_flags.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:32:47] (Traffic bill over quota) firing: (2) Alert for device cr2-eqdfw.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org
[01:35:29] PROBLEM - puppet last run on cloudweb2001-dev is CRITICAL: CRITICAL: Puppet last ran 9 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[01:37:47] (Traffic bill over quota) firing: (3) Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org
[01:42:47] (Traffic bill over quota) resolved: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org
[01:47:51] SRE, Traffic: Wikimedia was temporarily unreachable via Esams - https://phabricator.wikimedia.org/T279809 (Peachey88)
[01:48:23] RECOVERY - puppet last run on cloudweb2001-dev is OK: OK: Puppet is currently disabled (experimenting with compression settings), not alerting.
Last run 6 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[02:07:35] PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The following units failed: package_builder_Clean_up_build_directory.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:07:57] RECOVERY - dump of es4 in eqiad on alert1001 is OK: Last dump for es4 at eqiad (es1022.eqiad.wmnet) taken on 2021-04-09 14:07:57 (1662 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[02:21:57] (PS1) Andrew Bogott: codfw1dev horizon: temporarily roll back release version to test T279699 [puppet] - https://gerrit.wikimedia.org/r/678297
[02:22:38] (CR) Andrew Bogott: [C: +2] codfw1dev horizon: temporarily roll back release version to test T279699 [puppet] - https://gerrit.wikimedia.org/r/678297 (owner: Andrew Bogott)
[02:31:04] (PS1) Andrew Bogott: Revert "codfw1dev horizon: temporarily roll back release version to test T279699" [puppet] - https://gerrit.wikimedia.org/r/678298
[02:31:37] (CR) Andrew Bogott: [C: +2] Revert "codfw1dev horizon: temporarily roll back release version to test T279699" [puppet] - https://gerrit.wikimedia.org/r/678298 (owner: Andrew Bogott)
[03:15:39] PROBLEM - WDQS SPARQL on wdqs1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:20:23] RECOVERY - WDQS SPARQL on wdqs1012 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.070 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[04:04:01] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out.
(read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[04:06:21] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1005 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: active_primary_shards: 937, timed_out: False, unassigned_shards: 0, relocating_shards: 0, active_shards_percent_as_number: 100.0, number_of_nodes: 6, number_of_data_nodes: 6, number_of_pending_tasks: 0, active_shards: 1877, status: green, cluster_name: cloudelastic-chi-eqiad, task_max_waiting_in_qu
[04:06:21] layed_unassigned_shards: 0, initializing_shards: 0, number_of_in_flight_fetch: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[04:07:39] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 133.2 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[04:13:40] (PS1) Palak199: Check for server version and compare with xtrabackup [software/wmfbackups] - https://gerrit.wikimedia.org/r/678299 (https://phabricator.wikimedia.org/T253959)
[05:17:57] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job={netbox_device_statistics,swagger_check_citoid_cluster_eqiad} site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[05:20:23] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[05:30:31] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 76.27 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[05:56:05] (PS1) Ladsgroup: lists: Add option to enable mailman3 on lists [puppet] - https://gerrit.wikimedia.org/r/678300 (https://phabricator.wikimedia.org/T278612)
[05:59:59] PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on alert1001 is CRITICAL: 59.79 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[06:02:25] RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[07:00:05] Deploy window No deploys all day! See [[Deployments/Emergencies]] if things are broken.
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210410T0700)
[07:03:43] (PS2) Ladsgroup: lists: Add option to enable mailman3 on lists [puppet] - https://gerrit.wikimedia.org/r/678300 (https://phabricator.wikimedia.org/T278612)
[07:08:34] (CR) Ladsgroup: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/678300 (https://phabricator.wikimedia.org/T278612) (owner: Ladsgroup)
[07:12:13] (CR) Ladsgroup: "PCC diff: https://puppet-compiler.wmflabs.org/compiler1001/712/lists1001.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/678300 (https://phabricator.wikimedia.org/T278612) (owner: Ladsgroup)
[07:13:01] SRE, Wikimedia-Mailing-lists, Patch-For-Review: Install mailman3 and mailman2 at the same time on the cloud - https://phabricator.wikimedia.org/T278612 (Ladsgroup) https://polymorphic.lists.wmcloud.org/mailman3/postorius/lists/ ^^
[07:18:12] (CR) DharmrajRathod98: "Thank you for pointing me out "return is_valid_datetime". Should i put only one test case in tests? as you said having tests > no tests.
e" [software/wmfbackups] - https://gerrit.wikimedia.org/r/673693 (https://phabricator.wikimedia.org/T277754) (owner: DharmrajRathod98)
[07:23:43] (PS12) DharmrajRathod98: Improved: regex-validation in cli/recover-dump and added unit test file in test/unit [software/wmfbackups] - https://gerrit.wikimedia.org/r/673693 (https://phabricator.wikimedia.org/T277754)
[08:06:24] SRE, Traffic: Wikimedia was temporarily unreachable via Esams (2021-04-09 22:10 UTC) - https://phabricator.wikimedia.org/T279809 (Aklapper)
[10:03:11] (CR) DharmrajRathod98: "if this task is over then i am really enjoying the proccess of reviewing and pushing the code.that would be great if you can suggest anoth" [software/wmfbackups] - https://gerrit.wikimedia.org/r/673693 (https://phabricator.wikimedia.org/T277754) (owner: DharmrajRathod98)
[10:05:51] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is OK: (C)100 gt (W)80 gt 70.17 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[10:20:43] SRE, Traffic: Wikimedia was temporarily unreachable via Esams (2021-04-09 22:10 UTC) - https://phabricator.wikimedia.org/T279809 (Majavah) Likely related: ` [22:11:33] PROBLEM - Host lvs3005 is DOWN: PING CRITICAL - Packet loss = 100% [22:12:01] RECOVERY - Host lvs3005 is UP: PI...
[10:21:43] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out.
(read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:26:27] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1006 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: delayed_unassigned_shards: 0, initializing_shards: 0, active_shards: 1877, active_shards_percent_as_number: 100.0, cluster_name: cloudelastic-chi-eqiad, number_of_nodes: 6, relocating_shards: 0, number_of_data_nodes: 6, number_of_pending_tasks: 0, task_max_waiting_in_queue_millis: 0, timed_out: Fal
[10:26:27] , active_primary_shards: 937, unassigned_shards: 0, number_of_in_flight_fetch: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:04:23] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:18:19] SRE, netops: Lumen link between cr2-eqiad and cr2-esams down - https://phabricator.wikimedia.org/T279820 (elukey)
[11:29:55] SRE, netops: Lumen link between cr2-eqiad and cr2-esams down - https://phabricator.wikimedia.org/T279820 (ayounsi) Thanks! > Ticket ID 21038766 has been successfully created.
[11:31:28] SRE, netops: Lumen link between cr2-eqiad and cr2-esams down - https://phabricator.wikimedia.org/T279820 (ayounsi) a:ayounsi
[11:47:03] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:49:31] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:34:47] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 83, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:37:55] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[12:41:07] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is OK: (C)100 gt (W)80 gt 75.25 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1001&panelId=37
[14:08:24] !log andrew@deploy1002 Started deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for T279699
[14:08:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:34] T279699: Horizon does not allow sorting tables anymore - https://phabricator.wikimedia.org/T279699
[14:08:35] !log andrew@deploy1002 Finished deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for T279699 (duration: 00m 11s)
[14:08:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:44] !log andrew@deploy1002 Started deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for T279699
[14:08:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:05] !log andrew@deploy1002 Finished deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for T279699 (duration: 02m 21s)
[14:11:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:25] (PS1) Anjali041: Changed mistakes in 6 files [puppet] -
https://gerrit.wikimedia.org/r/678320 (https://phabricator.wikimedia.org/T201491)
[14:17:12] !log andrew@deploy1002 Started deploy [horizon/deploy@ee1be56]: fix for T279699
[14:17:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:20] T279699: Horizon does not allow sorting tables anymore - https://phabricator.wikimedia.org/T279699
[14:21:24] !log andrew@deploy1002 Finished deploy [horizon/deploy@ee1be56]: fix for T279699 (duration: 04m 12s)
[14:21:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:47] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[14:37:51] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 238, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:20:03] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:22:33] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:59:41] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:02:09] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:17:01] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:19:27] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:16:45] PROBLEM - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is CRITICAL: /api/rest_v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[17:19:05] RECOVERY - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[17:52:16] (PS1) Ladsgroup: Don't do strict equal condition check [extensions/FlaggedRevs] (wmf/1.36.0-wmf.38) - https://gerrit.wikimedia.org/r/678347 (https://phabricator.wikimedia.org/T279750)
[17:53:34] (CR) Jforrester: [C: +1] "Good to deploy on Monday."
[extensions/FlaggedRevs] (wmf/1.36.0-wmf.38) - https://gerrit.wikimedia.org/r/678347 (https://phabricator.wikimedia.org/T279750) (owner: Ladsgroup)
[19:15:23] (PS1) Luke081515: Enable Wikidata description override on dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678332 (https://phabricator.wikimedia.org/T279829)
[19:45:08] (Abandoned) Jforrester: Enable Wikidata description override on dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678332 (https://phabricator.wikimedia.org/T279829) (owner: Luke081515)
[20:00:42] (PS1) Ladsgroup: systemd: Make timer send email to MAILTO if specified instead of root@ [puppet] - https://gerrit.wikimedia.org/r/678336 (https://phabricator.wikimedia.org/T273673)
[20:00:46] (PS1) Ladsgroup: systemd: Add ability to set working directory in the timer job [puppet] - https://gerrit.wikimedia.org/r/678337 (https://phabricator.wikimedia.org/T273673)
[20:00:48] (PS1) Ladsgroup: snapshot: Migrate cronjobs in pagetitles to systemd timer [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673)
[20:01:47] (CR) jerkins-bot: [V: -1] snapshot: Migrate cronjobs in pagetitles to systemd timer [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[20:02:15] (CR) jerkins-bot: [V: -1] systemd: Make timer send email to MAILTO if specified instead of root@ [puppet] - https://gerrit.wikimedia.org/r/678336 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[20:07:20] (PS2) Ladsgroup: systemd: Make timer send email to MAILTO if specified instead of root@ [puppet] - https://gerrit.wikimedia.org/r/678336 (https://phabricator.wikimedia.org/T273673)
[20:07:22] (PS2) Ladsgroup: systemd: Add ability to set working directory in the timer job [puppet] - https://gerrit.wikimedia.org/r/678337 (https://phabricator.wikimedia.org/T273673)
[20:07:25] (PS2) Ladsgroup: snapshot:
Migrate cronjobs in pagetitles to systemd timer [puppet] - https://gerrit.wikimedia.org/r/678338 (https://phabricator.wikimedia.org/T273673)
[20:07:35] (PS1) Zabe: Add 'abusefilter-maintainer' to wmgPrivilegedGlobalGroups [mediawiki-config] - https://gerrit.wikimedia.org/r/678339 (https://phabricator.wikimedia.org/T279835)
[20:08:11] (CR) Zabe: [C: -1] "on hold" [mediawiki-config] - https://gerrit.wikimedia.org/r/678339 (https://phabricator.wikimedia.org/T279835) (owner: Zabe)
[20:12:30] (CR) Zoranzoki21: Add 'abusefilter-maintainer' to wmgPrivilegedGlobalGroups (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/678339 (https://phabricator.wikimedia.org/T279835) (owner: Zabe)
[20:16:13] (PS2) Zabe: Add 'abusefilter-maintainer' to wmgPrivilegedGlobalGroups [mediawiki-config] - https://gerrit.wikimedia.org/r/678339 (https://phabricator.wikimedia.org/T279835)
[20:16:43] (CR) Zabe: Add 'abusefilter-maintainer' to wmgPrivilegedGlobalGroups (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/678339 (https://phabricator.wikimedia.org/T279835) (owner: Zabe)
[20:16:51] (CR) Zabe: [C: -1] "still on hold" [mediawiki-config] - https://gerrit.wikimedia.org/r/678339 (https://phabricator.wikimedia.org/T279835) (owner: Zabe)
[20:23:14] (PS1) Zabe: Replace 'ombudsman' with 'ombuds' in wmgPrivilegedGlobalGroups [mediawiki-config] - https://gerrit.wikimedia.org/r/678341 (https://phabricator.wikimedia.org/T256299)
[20:47:43] (PS1) Jforrester: [wikitech] Update logo to mirror the new MediaWiki logo [mediawiki-config] - https://gerrit.wikimedia.org/r/678342 (https://phabricator.wikimedia.org/T279087)
[20:54:39] (CR) Ladsgroup: [C: +1] [wikitech] Update logo to mirror the new MediaWiki logo [mediawiki-config] - https://gerrit.wikimedia.org/r/678342 (https://phabricator.wikimedia.org/T279087) (owner: Jforrester)
[21:50:14] (PS1) Zabe: Add extendedconfirmed on svwiki [mediawiki-config]
- https://gerrit.wikimedia.org/r/678366 (https://phabricator.wikimedia.org/T279836)
[21:52:20] (CR) jerkins-bot: [V: -1] Add extendedconfirmed on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678366 (https://phabricator.wikimedia.org/T279836) (owner: Zabe)
[21:54:01] (PS2) Zabe: Add extendedconfirmed on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/678366 (https://phabricator.wikimedia.org/T279836)
[22:01:36] (CR) Urbanecm: [C: +1] Replace 'ombudsman' with 'ombuds' in wmgPrivilegedGlobalGroups [mediawiki-config] - https://gerrit.wikimedia.org/r/678341 (https://phabricator.wikimedia.org/T256299) (owner: Zabe)
[22:11:45] (CR) Zoranzoki21: [C: +1] Add 'abusefilter-maintainer' to wmgPrivilegedGlobalGroups [mediawiki-config] - https://gerrit.wikimedia.org/r/678339 (https://phabricator.wikimedia.org/T279835) (owner: Zabe)
[22:47:53] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:50:15] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:16:17] SRE, MediaWiki-General, serviceops, MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), and 3 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (Jdforrester-WMF) No, I believe the final step post-upgrade hasn't yet be...