[02:02:15] (PS1) PipelineBot: mathoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/653264
[03:14:23] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[03:32:41] RECOVERY - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is OK: 2 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting%23Nova-fullstack
[06:35:17] PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:45:01] PROBLEM - Check whether ferm is active by checking the default input chain on thanos-fe1001 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[07:01:53] RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:15:35] RECOVERY - Check whether ferm is active by checking the default input chain on thanos-fe1001 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[07:42:41] PROBLEM - Host ms-be2050 is DOWN: PING CRITICAL - Packet loss = 100%
[07:43:23] RECOVERY - Host ms-be2050 is UP: PING OK - Packet loss = 0%, RTA = 33.39 ms
[09:57:08] Operations: Provide an option menu when booting via PXE - https://phabricator.wikimedia.org/T191018 (elukey) Hello @fgiunchedi, I'd need to boot an-coord1002 with d-i in rescue mode to execute `grub-install` on a raid-1 disk (that doesn't have it), is there a procedure for a one-off that I can use/test ?
[10:11:29] PROBLEM - SSH on logstash2005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[10:19:39] RECOVERY - SSH on logstash2005 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[10:21:28] Operations, SRE-swift-storage, netops: ms-be2050 shows network errors - https://phabricator.wikimedia.org/T271041 (elukey)
[10:26:05] Operations, SRE-swift-storage: Re-deleting a Commons file: "Error deleting file: The file "mwstore://local-multiwrite/local-deleted/..." is in an inconsistent state within the internal storage backends". - https://phabricator.wikimedia.org/T270994 (RhinosF1) Is this not the same as {T244567}
[11:21:13] PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[11:52:23] PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[12:26:21] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:27:57] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:00:13] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir3002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86388 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[13:00:15] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir4002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86386 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[13:00:23] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir3001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86378 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[13:00:23] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir4001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86378 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[13:00:31] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir5001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86371 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[13:00:35] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir1001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86366 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[13:00:57] RECOVERY - mediawiki originals uploads -hourly- for codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[13:01:05] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir2002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86337 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[13:01:33] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir1002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86309 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[13:01:33] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir2001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86308 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[13:01:43] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir5002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86299 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[13:11:11] RECOVERY - mediawiki originals uploads -hourly- for eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[15:00:11] PROBLEM - HTTPS non-canonical-redirect-2 on ncredir2001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikimania.com has 86391 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[15:00:41] PROBLEM - HTTPS non-canonical-redirect-2 on ncredir5001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikimania.com has 86361 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[15:00:43] PROBLEM - HTTPS non-canonical-redirect-2 on ncredir1001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikimania.com has 86359 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[15:00:45] PROBLEM - HTTPS non-canonical-redirect-2 on ncredir2002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikimania.com has 86357 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[15:01:05] PROBLEM - HTTPS non-canonical-redirect-2 on ncredir1002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikimania.com has 86337 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[15:01:09] PROBLEM - HTTPS non-canonical-redirect-2 on ncredir3002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikimania.com has 86332 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[15:01:09] PROBLEM - HTTPS non-canonical-redirect-2 on ncredir4001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikimania.com has 86332 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[15:01:17] PROBLEM - HTTPS non-canonical-redirect-2 on ncredir4002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikimania.com has 86324 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[15:01:27] PROBLEM - HTTPS non-canonical-redirect-2 on ncredir3001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikimania.com has 86314 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[15:01:37] PROBLEM - HTTPS non-canonical-redirect-2 on ncredir5002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikimania.com has 86304 seconds left https://wikitech.wikimedia.org/wiki/Ncredir
[18:40:19] PROBLEM - exim queue on mx1001 is CRITICAL: CRITICAL: 4448 mails in exim queue. https://wikitech.wikimedia.org/wiki/Exim
[18:42:05] PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[18:52:01] RECOVERY - mediawiki originals uploads -hourly- for codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[18:56:25] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) is CRITICAL: Test Retrieve announcements returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[18:58:03] RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[19:03:19] (PS1) PipelineBot: mathoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/653525
[19:27:07] !log restart acme-chief on acmechief1001
[19:27:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:28:33] RECOVERY - HTTPS non-canonical-redirect-3 on ncredir5002 is OK: SSL OK - OCSP staple validity for www.wikipedia.bg has 581490 seconds left:Certificate *.wikipedia.bg valid until 2021-02-28 12:01:39 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:28:55] RECOVERY - HTTPS non-canonical-redirect-2 on ncredir5002 is OK: SSL OK - OCSP staple validity for www.wikimania.com has 588666 seconds left:Certificate *.wikimania.com valid until 2021-03-21 14:01:30 +0000 (expires in 77 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:30:07] RECOVERY - HTTPS non-canonical-redirect-2 on ncredir2001 is OK: SSL OK - OCSP staple validity for www.wikimania.com has 588594 seconds left:Certificate *.wikimania.com valid until 2021-03-21 14:01:30 +0000 (expires in 77 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:30:17] RECOVERY - HTTPS non-canonical-redirect-3 on ncredir2002 is OK: SSL OK - OCSP staple validity for www.wikipedia.bg has 581384 seconds left:Certificate *.wikipedia.bg valid until 2021-02-28 12:01:39 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:30:37] RECOVERY - HTTPS non-canonical-redirect-2 on ncredir2002 is OK: SSL OK - OCSP staple validity for www.wikimania.com has 588563 seconds left:Certificate *.wikimania.com valid until 2021-03-21 14:01:30 +0000 (expires in 77 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:30:47] RECOVERY - HTTPS non-canonical-redirect-3 on ncredir2001 is OK: SSL OK - OCSP staple validity for www.wikipedia.bg has 581355 seconds left:Certificate *.wikipedia.bg valid until 2021-02-28 12:01:39 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:31:11] RECOVERY - HTTPS non-canonical-redirect-3 on ncredir1001 is OK: SSL OK - OCSP staple validity for www.wikipedia.bg has 581331 seconds left:Certificate *.wikipedia.bg valid until 2021-02-28 12:01:39 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:32:07] RECOVERY - HTTPS non-canonical-redirect-3 on ncredir1002 is OK: SSL OK - OCSP staple validity for www.wikipedia.bg has 581273 seconds left:Certificate *.wikipedia.bg valid until 2021-02-28 12:01:39 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:32:09] RECOVERY - HTTPS non-canonical-redirect-2 on ncredir1001 is OK: SSL OK - OCSP staple validity for www.wikimania.com has 588473 seconds left:Certificate *.wikimania.com valid until 2021-03-21 14:01:30 +0000 (expires in 77 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:32:23] RECOVERY - HTTPS non-canonical-redirect-3 on ncredir5001 is OK: SSL OK - OCSP staple validity for www.wikipedia.bg has 581259 seconds left:Certificate *.wikipedia.bg valid until 2021-02-28 12:01:39 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:32:31] RECOVERY - HTTPS non-canonical-redirect-2 on ncredir1002 is OK: SSL OK - OCSP staple validity for www.wikimania.com has 588452 seconds left:Certificate *.wikimania.com valid until 2021-03-21 14:01:30 +0000 (expires in 77 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:33:05] RECOVERY - HTTPS non-canonical-redirect-2 on ncredir5001 is OK: SSL OK - OCSP staple validity for www.wikimania.com has 588417 seconds left:Certificate *.wikimania.com valid until 2021-03-21 14:01:30 +0000 (expires in 77 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:34:33] RECOVERY - HTTPS non-canonical-redirect-2 on ncredir3002 is OK: SSL OK - OCSP staple validity for www.wikimania.com has 588328 seconds left:Certificate *.wikimania.com valid until 2021-03-21 14:01:30 +0000 (expires in 77 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:34:37] RECOVERY - HTTPS non-canonical-redirect-3 on ncredir3002 is OK: SSL OK - OCSP staple validity for www.wikipedia.bg has 581124 seconds left:Certificate *.wikipedia.bg valid until 2021-02-28 12:01:39 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:34:51] RECOVERY - HTTPS non-canonical-redirect-3 on ncredir3001 is OK: SSL OK - OCSP staple validity for www.wikipedia.bg has 581111 seconds left:Certificate *.wikipedia.bg valid until 2021-02-28 12:01:39 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:34:53] RECOVERY - HTTPS non-canonical-redirect-2 on ncredir3001 is OK: SSL OK - OCSP staple validity for www.wikimania.com has 588308 seconds left:Certificate *.wikimania.com valid until 2021-03-21 14:01:30 +0000 (expires in 77 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:36:09] RECOVERY - HTTPS non-canonical-redirect-2 on ncredir4001 is OK: SSL OK - OCSP staple validity for www.wikimania.com has 588232 seconds left:Certificate *.wikimania.com valid until 2021-03-21 14:01:30 +0000 (expires in 77 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:36:23] RECOVERY - HTTPS non-canonical-redirect-3 on ncredir4001 is OK: SSL OK - OCSP staple validity for www.wikipedia.bg has 581018 seconds left:Certificate *.wikipedia.bg valid until 2021-02-28 12:01:39 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:37:47] RECOVERY - HTTPS non-canonical-redirect-3 on ncredir4002 is OK: SSL OK - OCSP staple validity for www.wikipedia.bg has 580934 seconds left:Certificate *.wikipedia.bg valid until 2021-02-28 12:01:39 +0000 (expires in 56 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:38:03] RECOVERY - HTTPS non-canonical-redirect-2 on ncredir4002 is OK: SSL OK - OCSP staple validity for www.wikimania.com has 588118 seconds left:Certificate *.wikimania.com valid until 2021-03-21 14:01:30 +0000 (expires in 77 days) https://wikitech.wikimedia.org/wiki/Ncredir
[19:59:57] (PS1) Daimona Eaytoy: Temporarily add alternative path for AbuseFilter script [puppet] - https://gerrit.wikimedia.org/r/653539
[22:48:09] (PS3) Urbanecm: hrwiki: Restrict changetags permissions to sysop and bot group [mediawiki-config] - https://gerrit.wikimedia.org/r/653173 (https://phabricator.wikimedia.org/T270996)