[00:06:23] <icinga-wm>	 PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:54:09] <icinga-wm>	 PROBLEM - Check systemd state on centrallog1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:12:35] <wikibugs>	 Operations, Traffic, Wikispore, HTTPS: Make Wikispore HTTPS-only - https://phabricator.wikimedia.org/T260701 (Tgr) Sure, theft of a Wikispore account is not particularly damaging, but I doubt there are many people who cannot access Wikipedia and its sister projects but would want to access Wikisp...
[07:43:01] <wikibugs>	 Operations, MediaWiki-Parser: Varnish 503 errors on page with large number of flag icons. - https://phabricator.wikimedia.org/T267804 (Izno) It is not normal for a PEIS max to cause a 503, in my experience. While it is an actual problem due to the artificial limit, we more-or-less always see the page as...
[08:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201114T0800)
[08:16:12] <wikibugs>	 Operations, MediaWiki-Parser: Varnish 503 errors on page with large number of flag icons. - https://phabricator.wikimedia.org/T267804 (Mjroots) Izno, there are no issues with my internet service, the rest of the net is working fine without any issues accessing websites and web pages. I don't understand m...
[08:51:51] <wikibugs>	 Operations, MediaWiki-Parser: Varnish 503 errors on page with large number of flag icons. - https://phabricator.wikimedia.org/T267804 (Izno) I could not reproduce with Firefox, Win10, latest version (82?).  >>! In T267804#6622146, @Mjroots wrote: > Izno, there are no issues with my internet service, the...
[09:09:02] <wikibugs>	 Operations, MediaWiki-Parser: Varnish 503 errors on page with large number of flag icons. - https://phabricator.wikimedia.org/T267804 (Mjroots) Izno - under preferences > editing I have the following ticked:-   General options Show the difference between the latest accepted version and the latest pending...
[10:15:47] <icinga-wm>	 PROBLEM - Maps tiles generation on alert1001 is CRITICAL: CRITICAL: 100.00% of data under the critical threshold [5.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=8&fullscreen&orgId=1
[14:56:52] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Traffic, HTTPS: The certificate for upload.beta.wmflabs.org expired on November 13, 2020. - https://phabricator.wikimedia.org/T267858 (Krenair) Cert was renewed: ` root@deployment-acme-chief03:~# openssl x509 -in /var/lib/acme-chief/certs/unified/live/rsa-...
[15:01:35] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Traffic, HTTPS: The certificate for upload.beta.wmflabs.org expired on November 13, 2020. - https://phabricator.wikimedia.org/T267858 (Krenair) For some reason I had to do a full restart of the `trafficserver-tls` service on the cache-upload06 VM but it ha...
[15:08:08] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Traffic, HTTPS: The certificate for upload.beta.wmflabs.org expired on November 13, 2020. - https://phabricator.wikimedia.org/T267858 (Krenair) a:Krenair @Vgutierrez FYI in case this could happen in prod too, I haven't been keeping track of changes lat...
[15:17:34] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Traffic, HTTPS: The certificate for upload.beta.wmflabs.org expired on November 13, 2020. - https://phabricator.wikimedia.org/T267858 (AlexisJazz) @hashar Could there be a relation with T267561? (very wild guess)
[16:40:21] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 107 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:41:59] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 5 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:03:31] <icinga-wm>	 PROBLEM - Disk space on Hadoop worker on an-worker1098 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/c 15 GB (0% inode=99%): https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[19:03:45] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:15:25] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:18:35] <wikibugs>	 (CR) Ahmon Dancy: [C: +1] profile::lvs::realserver: use poolcounter for guarding service restarts [puppet] - https://gerrit.wikimedia.org/r/640928 (https://phabricator.wikimedia.org/T266055) (owner: Giuseppe Lavagetto)
[19:20:10] <icinga-wm>	 PROBLEM - Disk space on Hadoop worker on an-worker1100 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/n 15 GB (0% inode=99%): https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[19:46:27] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on alert1001 is CRITICAL: 55.19 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[19:49:29] <icinga-wm>	 PROBLEM - varnish-http-requests grafana alert on alert1001 is CRITICAL: CRITICAL: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is alerting: 70% GET drop in 30min alert. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[19:51:07] <icinga-wm>	 RECOVERY - varnish-http-requests grafana alert on alert1001 is OK: OK: Varnish HTTP Requests ( https://grafana.wikimedia.org/d/000000180/varnish-http-requests ) is not alerting. https://phabricator.wikimedia.org/project/view/1201/ https://grafana.wikimedia.org/d/000000180/
[19:51:23] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on alert1001 is OK: (C)60 le (W)70 le 97.35 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[20:12:44] <wikibugs>	 Operations, Traffic: INMARSAT geolocates to the UK, leading to requests going to esams - https://phabricator.wikimedia.org/T209785 (Reedy) Open→Declined
[20:19:59] <icinga-wm>	 PROBLEM - Disk space on Hadoop worker on an-worker1100 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/n 15 GB (0% inode=99%): https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration