[00:12:04] <wikibugs>	 Operations, Wikimedia-Mailing-lists: New WikiJournal_CoC@lists.wikimedia.org - https://phabricator.wikimedia.org/T223861 (Aklapper) Open→Resolved Unfortunately no reply by @Thomas_Shafee, hence assuming everything is fine.
[00:21:02] <wikibugs>	 Operations, observability, User-CDanis: Find links to grafana.wikimedia.org and change them to use the new URL format - https://phabricator.wikimedia.org/T211982 (Aklapper)
[00:24:05] <wikibugs>	 Operations, observability, User-CDanis: Find links to grafana.wikimedia.org and change them to use the new URL format - https://phabricator.wikimedia.org/T211982 (Aklapper)
[01:31:41] <icinga-wm>	 PROBLEM - HHVM rendering on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[01:32:59] <icinga-wm>	 RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 75918 bytes in 0.126 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:01:07] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to deployment cluster for awight - https://phabricator.wikimedia.org/T225062 (greg) Approved on my side. Thanks Adam!
[05:02:26] <wikibugs>	 (PS1) ArielGlenn: revoke dzahn's ssh keys [puppet] - https://gerrit.wikimedia.org/r/515988 (https://phabricator.wikimedia.org/T225371)
[05:03:23] <wikibugs>	 (CR) jerkins-bot: [V: -1] revoke dzahn's ssh keys [puppet] - https://gerrit.wikimedia.org/r/515988 (https://phabricator.wikimedia.org/T225371) (owner: ArielGlenn)
[05:05:19] <wikibugs>	 (PS2) ArielGlenn: revoke dzahn's ssh keys [puppet] - https://gerrit.wikimedia.org/r/515988 (https://phabricator.wikimedia.org/T225371)
[05:06:39] <wikibugs>	 (CR) ArielGlenn: [C: +2] revoke dzahn's ssh keys [puppet] - https://gerrit.wikimedia.org/r/515988 (https://phabricator.wikimedia.org/T225371) (owner: ArielGlenn)
[05:21:05] <icinga-wm>	 PROBLEM - puppet last run on mwmaint1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): User[dzahn]
[05:32:44] <apergos>	 ^^ handled
[05:48:19] <icinga-wm>	 RECOVERY - puppet last run on mwmaint1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:29:27] <icinga-wm>	 PROBLEM - puppet last run on restbase1023 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[06:30:39] <icinga-wm>	 PROBLEM - puppet last run on logstash1007 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[06:33:35] <icinga-wm>	 PROBLEM - puppet last run on mw1308 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/safe-service-restart]
[06:38:07] <icinga-wm>	 PROBLEM - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/cron - 177 bytes in 0.009 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[06:45:25] <icinga-wm>	 RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.013 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[06:56:41] <icinga-wm>	 RECOVERY - puppet last run on restbase1023 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[06:57:57] <icinga-wm>	 RECOVERY - puppet last run on logstash1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:00:47] <icinga-wm>	 RECOVERY - puppet last run on mw1308 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[08:32:35] <icinga-wm>	 RECOVERY - Host lvs4007 is UP: PING OK - Packet loss = 0%, RTA = 74.31 ms
[09:26:51] <icinga-wm>	 PROBLEM - mailman_queue_size on fermium is CRITICAL: CRITICAL: 1 mailman queue(s) above limits (thresholds: bounces: 25 in: 25 virgin: 25) https://wikitech.wikimedia.org/wiki/Mailman
[09:29:43] <icinga-wm>	 RECOVERY - mailman_queue_size on fermium is OK: OK: mailman queues are below the limits. https://wikitech.wikimedia.org/wiki/Mailman
[10:40:36] <wikibugs>	 (PS1) Andrew Bogott: Removed dzahn's root keys [labs/private] - https://gerrit.wikimedia.org/r/516049 (https://phabricator.wikimedia.org/T225371)
[10:41:24] <wikibugs>	 (CR) Andrew Bogott: [V: +2 C: +2] Removed dzahn's root keys [labs/private] - https://gerrit.wikimedia.org/r/516049 (https://phabricator.wikimedia.org/T225371) (owner: Andrew Bogott)
[11:41:45] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on db2097 is OK: OK slave_sql_lag Replication lag: 0.36 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[11:52:40] <wikibugs>	 (PS1) Gergő Tisza: Fix import group name [mediawiki-config] - https://gerrit.wikimedia.org/r/516053
[12:06:34] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): WDQS internal servers started lagging behind - https://phabricator.wikimedia.org/T224829 (Mathew.onipe) Open→Invalid
[14:36:54] <wikibugs>	 Operations, ops-codfw, DBA: db2097 (codfw s1&s6 source backups) mariadb@s6 *process* (10.1.39) crashed on 2019-06-08 - https://phabricator.wikimedia.org/T225378 (Marostegui) Looks like it might be related to a HW memory issue that has been going on for a few days: `   /system1/log1/record16   Targets...
[14:40:23] <wikibugs>	 Operations, DNS, Matrix, Traffic, Wikimedia-Apache-configuration: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (Tgr)
[14:56:42] <wikibugs>	 (PS1) Gergő Tisza: Add .well-known/matrix for wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/516055 (https://phabricator.wikimedia.org/T223835)
[14:58:44] <wikibugs>	 (PS1) Gergő Tisza: Revert "Matrix wikimedia.org IDs domain authorization" [dns] - https://gerrit.wikimedia.org/r/516056 (https://phabricator.wikimedia.org/T223835)
[15:03:34] <icinga-wm>	 PROBLEM - Check systemd state on stat1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[16:01:24] <icinga-wm>	 RECOVERY - Check systemd state on stat1007 is OK: OK - running: The system is fully operational
[16:11:29] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): dologmsg: move this little script out of toolforge profile (2 comments) [puppet] - https://gerrit.wikimedia.org/r/515104 (owner: Bstorm)
[16:48:29] <wikibugs>	 Operations, Continuous-Integration-Infrastructure, MediaWiki-Core-Testing, HHVM: Re-add complete URL parsing fix from 3.18.7 release - https://phabricator.wikimedia.org/T185024 (Krinkle)
[19:02:25] <wikibugs>	 Operations, Prod-Kubernetes, Release Pipeline, Documentation, Release-Engineering-Team (Next): TEC3:O6:O:6.1:Q3:  Deployment Pipeline Documentation - https://phabricator.wikimedia.org/T213090 (thcipriani)
[19:02:30] <wikibugs>	 Operations, Prod-Kubernetes, Documentation, Release Pipeline (Blubber), Release-Engineering-Team (Next): Update Blubber documentation - https://phabricator.wikimedia.org/T213198 (thcipriani) Open→Resolved a:thcipriani >>! In T213198#4961595, @LarsWirzenius wrote: > https://wikitec...
[19:36:57] <icinga-wm>	 PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[19:58:09] <icinga-wm>	 PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[20:04:05] <icinga-wm>	 RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[20:30:47] <icinga-wm>	 RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[20:42:45] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[20:44:05] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[22:00:35] <icinga-wm>	 PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[22:27:45] <icinga-wm>	 RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[22:43:07] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[22:47:21] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[22:56:07] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received: / (spec from root) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[23:01:34] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 on zotero.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[23:01:49] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[23:06:15] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[23:08:38] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on zotero.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 138 bytes in 0.009 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[23:09:03] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid