[01:19:06] (PS1) Bstorm: breakfix: set the label selector to a subset of actual labels [software/tools-webservice] - https://gerrit.wikimedia.org/r/549990 (https://phabricator.wikimedia.org/T237836)
[01:19:54] (PS2) Bstorm: breakfix: set the label selector to a subset of actual labels [software/tools-webservice] - https://gerrit.wikimedia.org/r/549990 (https://phabricator.wikimedia.org/T237836)
[01:22:08] (CR) Reedy: [C: -1] breakfix: set the label selector to a subset of actual labels (1 comment) [software/tools-webservice] - https://gerrit.wikimedia.org/r/549990 (https://phabricator.wikimedia.org/T237836) (owner: Bstorm)
[01:23:26] (CR) Bstorm: breakfix: set the label selector to a subset of actual labels (1 comment) [software/tools-webservice] - https://gerrit.wikimedia.org/r/549990 (https://phabricator.wikimedia.org/T237836) (owner: Bstorm)
[01:24:52] (PS3) Bstorm: breakfix: set the label selector to a subset of actual labels [software/tools-webservice] - https://gerrit.wikimedia.org/r/549990 (https://phabricator.wikimedia.org/T237836)
[01:27:39] (CR) Bstorm: breakfix: set the label selector to a subset of actual labels (1 comment) [software/tools-webservice] - https://gerrit.wikimedia.org/r/549990 (https://phabricator.wikimedia.org/T237836) (owner: Bstorm)
[01:29:00] (CR) BryanDavis: [C: +1] breakfix: set the label selector to a subset of actual labels [software/tools-webservice] - https://gerrit.wikimedia.org/r/549990 (https://phabricator.wikimedia.org/T237836) (owner: Bstorm)
[01:29:30] (CR) Bstorm: [C: +2] breakfix: set the label selector to a subset of actual labels [software/tools-webservice] - https://gerrit.wikimedia.org/r/549990 (https://phabricator.wikimedia.org/T237836) (owner: Bstorm)
[05:42:43] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 269, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:42:43] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:52:21] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:55:23] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 133, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:57:11] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 271, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:58:35] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 135, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:00:25] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:01:09] PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:02:01] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:02:47] RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:10:45] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/page/title/{title} (Get rev by title from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[06:12:17] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[14:24:35] (CR) Urbanecm: [C: +1] Add right "abusefilter-log-private" to usergroup "rollbacker" at ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/549987 (https://phabricator.wikimedia.org/T237830) (owner: Tks4Fish)
[16:17:25] RECOVERY - Check systemd state on labtestpuppetmaster2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:22:13] PROBLEM - Check systemd state on labtestpuppetmaster2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:18:43] (CR) Jforrester: "Beautiful." [mediawiki-config] - https://gerrit.wikimedia.org/r/549936 (https://phabricator.wikimedia.org/T161553) (owner: Andrew Bogott)
[18:00:11] (PS1) Reedy: Remove a lot of orphaned OpenStackManager config [mediawiki-config] - https://gerrit.wikimedia.org/r/550020
[18:00:43] (Abandoned) Reedy: Remove a lot of orphaned OpenStackManager config [mediawiki-config] - https://gerrit.wikimedia.org/r/550020 (owner: Reedy)
[18:02:01] (CR) Reedy: [C: +2] wikitech: remove OSM settings related to OpenStack (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/549936 (https://phabricator.wikimedia.org/T161553) (owner: Andrew Bogott)
[18:02:09] (CR) Reedy: "Damn it gerrit" [mediawiki-config] - https://gerrit.wikimedia.org/r/549936 (https://phabricator.wikimedia.org/T161553) (owner: Andrew Bogott)
[18:02:37] (CR) Reedy: [C: -1] "Couple of other bits needed" [mediawiki-config] - https://gerrit.wikimedia.org/r/549936 (https://phabricator.wikimedia.org/T161553) (owner: Andrew Bogott)
[18:03:32] (CR) Reedy: [C: -1] wikitech: remove OSM settings related to OpenStack (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/549936 (https://phabricator.wikimedia.org/T161553) (owner: Andrew Bogott)
[18:05:27] (PS3) Reedy: wikitech: remove OSM settings related to OpenStack [mediawiki-config] - https://gerrit.wikimedia.org/r/549936 (https://phabricator.wikimedia.org/T161553) (owner: Andrew Bogott)
[18:06:21] Reedy: Note that the train isn't running for the next fortnight…
[18:06:30] SAD
[18:06:32] We don't run a Wikitech in Beta Cluster, I suppose?
[18:08:38] James_F: no, but we do have https://labtestwikitech.wikimedia.org/. We could hack things up there if we really wanted to.
[18:08:57] Sounds messy.
[18:09:00] yeah
[18:09:09] Especially as it's in "production".
[18:09:34] isolated hardware, etc, but yes
[18:09:47] Does it have its own SSL cert?
[18:09:53] (PS4) Reedy: wikitech: remove OSM settings related to OpenStack [mediawiki-config] - https://gerrit.wikimedia.org/r/549936 (https://phabricator.wikimedia.org/T161553) (owner: Andrew Bogott)
[18:10:10] Yes. Good.
[18:10:22] James_F: no, it's behind the misc TLS/cache
[18:11:48] Yeah, but not the *.wikipedia.org one.
[18:13:20] (Abandoned) BryanDavis: wikitech: Update hostnames for OpenStack endpoints [mediawiki-config] - https://gerrit.wikimedia.org/r/542506 (https://phabricator.wikimedia.org/T223907) (owner: BryanDavis)
[18:52:21] PROBLEM - traffic_server tls process restarted on cp1088 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=eqiad+prometheus/ops&var-instance=cp1088&var-layer=tls
[21:55:39] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:58:43] (CR) Masumrezarock100: [C: +1] "The rights system on ptwiki is pretty messed up, I have to say. Even rollbackers have the block right." [mediawiki-config] - https://gerrit.wikimedia.org/r/549987 (https://phabricator.wikimedia.org/T237830) (owner: Tks4Fish)
[22:03:05] PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[22:06:47] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:11:10] (CR) Volans: [C: -1] "The script fails to run, even after having fixed a bunch of typos and errors, marking it as WIP." (4 comments) [software/netbox-deploy] - https://gerrit.wikimedia.org/r/539013 (https://phabricator.wikimedia.org/T233183) (owner: CRusnov)
[22:13:39] RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[22:25:23] (PS2) CRusnov: netbox: Set URL in report alert to the URL of the report [puppet] - https://gerrit.wikimedia.org/r/549959
[22:29:02] (CR) CRusnov: netbox: Set URL in report alert to the URL of the report (2 comments) [puppet] - https://gerrit.wikimedia.org/r/549959 (owner: CRusnov)
[22:47:32] (PS1) Reedy: Use extension.json in extension-list for LdapAuthentication and OSM [mediawiki-config] - https://gerrit.wikimedia.org/r/550037
[22:47:34] (PS1) Reedy: Use wfLoadExtension() for LdapAuthentication and OSM [mediawiki-config] - https://gerrit.wikimedia.org/r/550038
[23:04:37] RECOVERY - Long running screen/tmux on netbox1001 is OK: OK: Tmux detected but not long running. https://wikitech.wikimedia.org/wiki/Monitoring/Long_running_screens
[23:05:22] (PS3) Tim Starling: Enable REST API on all WMF wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/549219 (https://phabricator.wikimedia.org/T237555)
[23:07:23] (CR) Tim Starling: Enable REST API on all WMF wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/549219 (https://phabricator.wikimedia.org/T237555) (owner: Tim Starling)
[23:28:47] Operations, wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (Reedy)
[23:31:54] Operations, Horizon, Traffic, Upstream, cloud-services-team (Kanban): Horizon Designate dashboard not allowing creation of NS records - https://phabricator.wikimedia.org/T204013 (Krenair) (Stein got released on 10th April, Wikimedia probably won't have it for a while though)