[00:03:43] Operations, MediaWiki-General, serviceops, Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (tstarling) @holger.knust told me by email that most page m...
[00:22:19] RECOVERY - Ensure hosts are not performing a change on every puppet run on puppetdb1002 is OK: OK: all nodes running as expected https://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes
[00:29:37] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[00:31:27] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[00:34:29] PROBLEM - PHP opcache health on scandium is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[00:43:41] RECOVERY - PHP opcache health on scandium is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[01:38:30] (PS3) Reedy: Preemptively revoke administrators' ability to check if 2FA is enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/579817 (https://phabricator.wikimedia.org/T209749) (owner: DannyS712)
[01:39:57] (CR) Reedy: [C: +2] Preemptively revoke administrators' ability to check if 2FA is enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/579817 (https://phabricator.wikimedia.org/T209749) (owner: DannyS712)
[01:40:56] Reedy I thought deployments had to wait until https://phabricator.wikimedia.org/T250881
[01:41:08] (Merged) jenkins-bot: Preemptively revoke administrators' ability to check if 2FA is enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/579817 (https://phabricator.wikimedia.org/T209749) (owner: DannyS712)
[01:41:10] James_F said `I'm guessing blocking patches to our main skin might be UBN worthy and block any SWAT deploys`
[01:41:34] He didn't
[01:41:50] Oops, wrong name, Jdlrobson
[01:42:08] Patches to mw-config don't run tests against Vector
[01:42:40] oh, okay; so its just swat deploys of code against core and extensions?
[01:43:23] !log reedy@deploy1001 Synchronized wmf-config/CommonSettings.php: T209749 (duration: 01m 01s)
[01:43:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:43:31] T209749: Allow privileged accounts to determine if an account has enrolled in 2FA - https://phabricator.wikimedia.org/T209749
[01:43:36] I can't help wonder why it's using phab not gerrit
[01:46:54] !log milimetric@deploy1001 Started deploy [analytics/refinery@30facc4]: Analytics: another follow-up on the train, jar version bump
[01:46:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:51:03] !log milimetric@deploy1001 deploy aborted: Analytics: another follow-up on the train, jar version bump (duration: 04m 08s)
[01:51:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:51:10] !log milimetric@deploy1001 Started deploy [analytics/refinery@30facc4]: Analytics: another follow-up on the train, jar version bump
[01:51:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:54:04] !log milimetric@deploy1001 Finished deploy [analytics/refinery@30facc4]: Analytics: another follow-up on the train, jar version bump (duration: 02m 54s)
[01:54:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:54:24] !log milimetric@deploy1001 Started deploy [analytics/refinery@30facc4]: Analytics: another follow-up on the train, jar version bump (take 2, analytics1030 keeps failing)
[01:54:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:55:06] !log milimetric@deploy1001 Finished deploy [analytics/refinery@30facc4]: Analytics: another follow-up on the train, jar version bump (take 2, analytics1030 keeps failing) (duration: 00m 42s)
[01:55:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:33:25] Operations, MediaWiki-General, serviceops, Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (DannyS712) >>! In T219279#6077286, @tstarling wrote: > @ho...
[03:22:24] Operations, MediaWiki-General, serviceops, Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (tstarling) isSystemUser() is not used for authorization, a...
[05:24:58] !log elukey@deploy1001 Started deploy [analytics/refinery@30facc4]: log
[05:25:00] !log elukey@deploy1001 deploy aborted: log (duration: 00m 02s)
[05:25:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:25:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:27:14] (checking a bad deployment and didn't execute "deploy-log" but "deploy log", good morning Luca)
[05:45:30] !log elukey@deploy1001 Started deploy [analytics/refinery@30facc4]: Test of new scap settings
[05:45:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:13] !log elukey@deploy1001 Finished deploy [analytics/refinery@30facc4]: Test of new scap settings (duration: 04m 42s)
[05:50:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:04] Deploy window No Deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200422T0700)
[07:33:47] Operations, netops: OSPF metrics - https://phabricator.wikimedia.org/T200277 (ayounsi) The idea here is that once we have a sound logic behind the OSPF metrics, we can: 1/ Create the following Netbox circuits custom fields: * `latency`: in ms, this will not change often * `status`: multiple choice betw...
[08:49:14] (CR) Urbanecm: noc.wikimedia.org: highlight.php should not append .txt to dblist URLs (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852) (owner: Urbanecm)
[08:55:29] !log Move User:Wikipedia:Introduction (historical) --> Wikipedia:Introduction (historical) at enwiki using moveBatch.php, on-wiki interface was time-outing
[08:55:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:17] (PS1) RhinosF1: Set $wgArticleCount to 'any' on trwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/591771
[09:20:02] (PS2) RhinosF1: Set $wgArticleCount to 'any' on trwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/591771 (https://phabricator.wikimedia.org/T248747)
[10:45:41] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[10:49:11] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.0 200 OK - 22719 bytes in 0.255 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[11:26:03] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[11:35:13] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is OK: HTTP OK: HTTP/1.0 200 OK - 22733 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:02:23] Operations, Services, Service-deployment-requests, artificial-intelligence: New Service Request 'open_nsfw' - https://phabricator.wikimedia.org/T250110 (Lazy-restless) Best of luck!
[13:51:54] PROBLEM - Check systemd state on maps2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:56:37] Urbanecm: https://phabricator.wikimedia.org/T250583
[14:03:12] Operations, ops-ulsfo, DC-Ops: fix newly imported cable data in ulsfo - https://phabricator.wikimedia.org/T250408 (ayounsi) Adding to the TODO: mr1-ulsfo:ge-0/0/1 cable (to msw1) mr1-ulsfo:ge-0/0/3 cable (to msw2) Are missing the cable ID, color and remote port#
[15:01:55] Operations, netbox: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (ayounsi)
[15:03:41] Operations, netbox: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (ayounsi)
[15:03:42] Operations, ops-ulsfo, DC-Ops: fix newly imported cable data in ulsfo - https://phabricator.wikimedia.org/T250408 (ayounsi)
[15:03:45] Operations, netbox: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (ayounsi)
[15:07:28] Operations, netbox: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (ayounsi)
[15:23:09] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:32:57] PROBLEM - Check systemd state on maps2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:36:15] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is OK: HTTP OK: HTTP/1.0 200 OK - 22736 bytes in 0.275 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:57:10] (PS1) Ottomata: Re-enable webrequest data deletion job [puppet] - https://gerrit.wikimedia.org/r/591935 (https://phabricator.wikimedia.org/T248600)
[16:01:28] (CR) Ottomata: [C: +2] Re-enable webrequest data deletion job [puppet] - https://gerrit.wikimedia.org/r/591935 (https://phabricator.wikimedia.org/T248600) (owner: Ottomata)
[16:28:13] (CR) Nuria: [C: +1] Re-enable webrequest data deletion job [puppet] - https://gerrit.wikimedia.org/r/591935 (https://phabricator.wikimedia.org/T248600) (owner: Ottomata)
[17:36:26] (CR) Reedy: "recheck" [puppet] - https://gerrit.wikimedia.org/r/591335 (https://phabricator.wikimedia.org/T250806) (owner: Reedy)
[17:54:45] Operations, MediaWiki-General, serviceops, Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (suffusion_of_yellow) @tstarling: I just set [[https://en....
[23:04:10] PROBLEM - Check systemd state on an-launcher1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:32:48] Operations, Cognate, ContentTranslation, DBA, and 10 others: Restart extension1 (x1) database primary master (db1120) - https://phabricator.wikimedia.org/T250701 (Addshore) Hmm, Cognate should be in the lists in the description? Or am I confusing something?