[00:05:57] <wikibugs>	 (PS2) Dzahn: acme_chief: add Icinga notes_url [puppet] - https://gerrit.wikimedia.org/r/509475 (https://phabricator.wikimedia.org/T197873)
[00:06:10] <wikibugs>	 (PS1) Dzahn: labstore: add Icinga notes_urls [puppet] - https://gerrit.wikimedia.org/r/509545 (https://phabricator.wikimedia.org/T197873)
[00:24:33] <icinga-wm>	 PROBLEM - Disk space on actinium is CRITICAL: DISK CRITICAL - free space: / 339 MB (3% inode=90%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[00:28:37] <icinga-wm>	 PROBLEM - puppet last run on labmon1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:52:13] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2013 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:56:17] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2013 is OK: OK - running: The system is fully operational
[00:58:17] <wikibugs>	 (CR) Alex Monk: "Should probably do Ifc7d8290 first" [puppet] - https://gerrit.wikimedia.org/r/506672 (https://phabricator.wikimedia.org/T220894) (owner: Alex Monk)
[01:00:50] <icinga-wm>	 RECOVERY - puppet last run on labmon1002 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[01:03:42] <wikibugs>	 (PS7) Alex Monk: base::firewall: Send (almost) all special host groups via single parameter [puppet] - https://gerrit.wikimedia.org/r/505793
[01:06:00] <wikibugs>	 Operations, Traffic, Performance-Team (Radar): Support brotli compression - https://phabricator.wikimedia.org/T137979 (Krinkle) >>! In T137979#4118215, @BBlack wrote: > Re-reading above: probably the better blend of options would be to swap gzip for brotli in Varnish one-for-one (without the whole st...
[01:06:11] <wikibugs>	 Operations, Traffic, Performance-Team (Radar): Support brotli compression - https://phabricator.wikimedia.org/T137979 (Krinkle) p:Low→Normal
[01:11:02] <wikibugs>	 (PS2) Krinkle: Set "secret" field in $wgLBFactoryConf for ChronologyProtector HMACs [mediawiki-config] - https://gerrit.wikimedia.org/r/509357 (owner: Aaron Schulz)
[01:20:56] <wikibugs>	 (PS1) Dzahn: mariadb: set some more Icinga notes URLs for nrpe checks [puppet] - https://gerrit.wikimedia.org/r/509552 (https://phabricator.wikimedia.org/T197873)
[01:20:58] <wikibugs>	 Operations, Core Platform Team (PHP7 (TEC4)), Core Platform Team Kanban (Doing), HHVM, and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Reedy)
[01:32:19] <icinga-wm>	 PROBLEM - Disk space on ms-be1015 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdb1 is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[01:35:48] <wikibugs>	 (PS1) Dzahn: nrpe: add Icinga notes_url for systemd_unit_state check [puppet] - https://gerrit.wikimedia.org/r/509553 (https://phabricator.wikimedia.org/T197873)
[01:38:18] <wikibugs>	 Operations, ops-codfw, media-storage, Patch-For-Review, User-fgiunchedi: decom ms-be201[345] - https://phabricator.wikimedia.org/T221068 (Dzahn)
[01:46:30] <wikibugs>	 Operations, ops-eqiad, media-storage: ms-be1015 - sdb1 failed - https://phabricator.wikimedia.org/T222991 (Dzahn)
[01:46:43] <wikibugs>	 Operations, ops-eqiad, media-storage: ms-be1015 - sdb1 failed - https://phabricator.wikimedia.org/T222991 (Dzahn) p:Triage→High
[01:47:38] <icinga-wm>	 ACKNOWLEDGEMENT - Disk space on ms-be1015 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdb1 is not accessible: Input/output error daniel_zahn https://phabricator.wikimedia.org/T222991 https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[01:48:49] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Discovery: Memory correctable errors -EDAC- elastic1029 - https://phabricator.wikimedia.org/T214283 (Dzahn) Invalid→Open It's back:  https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=elastic1029&service=Memory+correctable+errors+-EDAC-  Cur...
[01:49:23] <icinga-wm>	 ACKNOWLEDGEMENT - Memory correctable errors -EDAC- on elastic1029 is CRITICAL: 4.001 ge 4 daniel_zahn https://phabricator.wikimedia.org/T214283 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=elastic1029&var-datasource=eqiad+prometheus/ops
[01:56:29] <icinga-wm>	 PROBLEM - Mediawiki Cirrussearch update rate - codfw on icinga1001 is CRITICAL: CRITICAL: 30.00% of data under the critical threshold [50.0] https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[01:57:13] <icinga-wm>	 PROBLEM - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is CRITICAL: CRITICAL: 40.00% of data under the critical threshold [50.0] https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[01:57:31] <icinga-wm>	 PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[mountpoint-/srv/swift-storage/sdb1]
[01:58:18] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[mountpoint-/srv/swift-storage/sdb1] daniel_zahn https://phabricator.wikimedia.org/T222991
[02:01:38] <mutante>	 !log actinium - low disk space - apt-get clean - gzip /var/log/squid3/access.log.1
[02:01:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:02:01] <icinga-wm>	 PROBLEM - Disk space on actinium is CRITICAL: DISK CRITICAL - free space: / 253 MB (2% inode=90%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[02:03:25] <icinga-wm>	 RECOVERY - Disk space on actinium is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[02:06:08] <wikibugs>	 Operations, ops-codfw, Discovery-Search (Current work): elastic2038 CPU/memory errors - https://phabricator.wikimedia.org/T217398 (Dzahn) it's fully down now:  https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=elastic2038
[02:06:23] <wikibugs>	 Operations, ops-codfw, Discovery-Search (Current work): elastic2038 DOWN (CPU/memory errors ) - https://phabricator.wikimedia.org/T217398 (Dzahn)
[02:07:28] <icinga-wm>	 ACKNOWLEDGEMENT - Host elastic2038 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T217398
[02:13:15] <librenms-wmf>	 Warning Alert for device cr1-codfw.wikimedia.org - Inbound interface errors
[02:21:13] <icinga-wm>	 RECOVERY - Mediawiki Cirrussearch update rate - codfw on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[02:21:55] <icinga-wm>	 RECOVERY - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[02:25:31] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on ms-be2017 is CRITICAL: cluster=swift device=cciss,1 instance=ms-be2017:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2017&var-datasource=codfw+prometheus/ops
[02:57:07] <wikibugs>	 (Abandoned) Reedy: Revert "striker: Disable developer account creation" [puppet] - https://gerrit.wikimedia.org/r/508944 (https://phabricator.wikimedia.org/T222844) (owner: Reedy)
[03:24:25] <icinga-wm>	 PROBLEM - puppet last run on db1089 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:26:13] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on ms-be2017 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2017&var-datasource=codfw+prometheus/ops
[03:51:15] <icinga-wm>	 RECOVERY - puppet last run on db1089 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[03:53:16] <wikibugs>	 Operations, Commons, media-storage: Upload fails at Wikimedia Commons "Internal error: Server failed to store temporary file." - https://phabricator.wikimedia.org/T222994 (Peachey88)
[04:06:27] <icinga-wm>	 RECOVERY - EDAC syslog messages on db1068 is OK: (C)4 ge (W)2 ge 1 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=db1068&var-datasource=eqiad+prometheus/ops
[05:12:59] <wikibugs>	 Operations, DBA, Patch-For-Review: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Marostegui)
[05:13:05] <wikibugs>	 Operations, DBA, Patch-For-Review: correctable memory errors db1068 (commons primary master database) - https://phabricator.wikimedia.org/T213664 (Marostegui) Open→Resolved It recovered again, needs replacement though as I'm sure it will become critical again soonish Closing for now again unt...
[05:33:15] <librenms-wmf>	 Warning Alert for device cr1-codfw.wikimedia.org - Inbound interface errors
[06:30:17] <icinga-wm>	 PROBLEM - puppet last run on db1092 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled]
[06:31:51] <icinga-wm>	 PROBLEM - puppet last run on dbmonitor2001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[06:37:52] <elukey>	 !log restart eventlogging on eventlog1002 - huge kafka consumer lag accumulated (T222941)
[06:37:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:57] <stashbot>	 T222941: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941
[06:57:09] <icinga-wm>	 RECOVERY - puppet last run on db1092 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:58:45] <icinga-wm>	 RECOVERY - puppet last run on dbmonitor2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:16:24] <wikibugs>	 (PS1) Elukey: profile::prometheus::alerts: add EL processors kafka consumer lag alert [puppet] - https://gerrit.wikimedia.org/r/509566 (https://phabricator.wikimedia.org/T222941)
[07:17:37] <wikibugs>	 (CR) Elukey: [C: +2] profile::prometheus::alerts: add EL processors kafka consumer lag alert [puppet] - https://gerrit.wikimedia.org/r/509566 (https://phabricator.wikimedia.org/T222941) (owner: Elukey)
[07:22:18] <wikibugs>	 (PS1) Elukey: profile::prometheus::alerts: tune EL kafka consumer lag [puppet] - https://gerrit.wikimedia.org/r/509567 (https://phabricator.wikimedia.org/T222941)
[07:23:16] <wikibugs>	 (CR) Elukey: [C: +2] profile::prometheus::alerts: tune EL kafka consumer lag [puppet] - https://gerrit.wikimedia.org/r/509567 (https://phabricator.wikimedia.org/T222941) (owner: Elukey)
[07:30:08] <wikibugs>	 (PS1) Elukey: profile::prometheus::alerts: fix dashboard URL [puppet] - https://gerrit.wikimedia.org/r/509568 (https://phabricator.wikimedia.org/T222941)
[07:31:01] <wikibugs>	 (CR) Elukey: [C: +2] profile::prometheus::alerts: fix dashboard URL [puppet] - https://gerrit.wikimedia.org/r/509568 (https://phabricator.wikimedia.org/T222941) (owner: Elukey)
[07:31:33] <icinga-wm>	 PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:33:37] <elukey>	 fixing it --^
[07:34:41] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[07:36:35] <icinga-wm>	 RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:39:13] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[07:40:17] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[07:40:47] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[08:00:39] <icinga-wm>	 PROBLEM - Disk space on ms-be1014 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdl1 is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[08:11:48] <wikibugs>	 (PS1) ArielGlenn: make tox happy again on some subdirs so changes to others will pass ci [software] - https://gerrit.wikimedia.org/r/509571
[08:12:50] <wikibugs>	 (CR) jerkins-bot: [V: -1] make tox happy again on some subdirs so changes to others will pass ci [software] - https://gerrit.wikimedia.org/r/509571 (owner: ArielGlenn)
[08:12:57] <icinga-wm>	 PROBLEM - puppet last run on ms-be1014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[mountpoint-/srv/swift-storage/sdl1]
[08:23:01] <wikibugs>	 (PS2) ArielGlenn: make tox happy again on some subdirs so changes to others will pass ci [software] - https://gerrit.wikimedia.org/r/509571
[08:23:27] <icinga-wm>	 RECOVERY - Memory correctable errors -EDAC- on db1068 is OK: (C)4 ge (W)2 ge 1 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=db1068&var-datasource=eqiad+prometheus/ops
[08:30:57] <wikibugs>	 (CR) ArielGlenn: [C: +2] make tox happy again on some subdirs so changes to others will pass ci [software] - https://gerrit.wikimedia.org/r/509571 (owner: ArielGlenn)
[09:23:15] <librenms-wmf>	 Device cr1-codfw.wikimedia.org recovered from Inbound interface errors
[09:29:44] <wikibugs>	 (PS1) ArielGlenn: remove salt-misc dir, scripts no longer used [software] - https://gerrit.wikimedia.org/r/509576
[09:34:58] <wikibugs>	 (CR) Gehel: [C: -1] "as discussed" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/509172 (https://phabricator.wikimedia.org/T141324) (owner: Dzahn)
[09:55:39] <icinga-wm>	 PROBLEM - HHVM rendering on mw1225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[09:58:17] <icinga-wm>	 RECOVERY - HHVM rendering on mw1225 is OK: HTTP OK: HTTP/1.1 200 OK - 77183 bytes in 1.071 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[10:00:59] <wikibugs>	 (CR) Gehel: [C: -1] "Looks reasonable, see comments inline" (8 comments) [puppet] - https://gerrit.wikimedia.org/r/509542 (owner: Paladox)
[10:58:56] <wikibugs>	 (CR) Jcrespo: [C: -1] "I don't think these are appropriate. elukey will have better documentation for eventlogging, and haproxy may and should have a better page" [puppet] - https://gerrit.wikimedia.org/r/509552 (https://phabricator.wikimedia.org/T197873) (owner: Dzahn)
[11:17:49] <icinga-wm>	 PROBLEM - Disk space on ms-be2013 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdb1 is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[11:21:50] <wikibugs>	 (CR) ArielGlenn: [C: +2] remove salt-misc dir, scripts no longer used [software] - https://gerrit.wikimedia.org/r/509576 (owner: ArielGlenn)
[11:24:03] <icinga-wm>	 PROBLEM - puppet last run on ms-be2013 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[mountpoint-/srv/swift-storage/sdb1]
[14:05:40] <wikibugs>	 (Abandoned) Giuseppe Lavagetto: mediawiki::web::beta_sites: convert wikibooks to vhost [puppet] - https://gerrit.wikimedia.org/r/439894 (https://phabricator.wikimedia.org/T196968) (owner: Giuseppe Lavagetto)
[14:51:59] <wikibugs>	 Operations, Commons, MediaWiki-File-management, Multimedia, media-storage: Upload fails at Wikimedia Commons "Internal error: Server failed to store temporary file." - https://phabricator.wikimedia.org/T222994 (Reedy)
[15:26:23] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[15:27:01] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[15:28:21] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[15:28:51] <icinga-wm>	 PROBLEM - HHVM rendering on mw1268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[15:30:05] <icinga-wm>	 RECOVERY - HHVM rendering on mw1268 is OK: HTTP OK: HTTP/1.1 200 OK - 77176 bytes in 0.133 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[15:33:15] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[15:33:53] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[15:35:11] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[15:38:15] <librenms-wmf>	 Warning Alert for device cr1-codfw.wikimedia.org - Inbound interface errors
[16:27:29] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[16:28:09] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[16:29:37] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[16:30:11] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[16:30:15] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[16:30:53] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[16:31:37] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[16:32:26] <marostegui>	 all the DCs with 500s?
[16:35:41] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[16:36:31] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[16:37:09] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[16:45:28] <wikibugs>	 (PS1) Framawiki: Enable SandboxLink extension on zhwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/509593 (https://phabricator.wikimedia.org/T223006)
[16:55:08] <wikibugs>	 (CR) Framawiki: [C: +1] "I80e054a2134ca was merged a month ago, I think this patch can be merged." [mediawiki-config] - https://gerrit.wikimedia.org/r/496677 (https://phabricator.wikimedia.org/T218363) (owner: Varnent)
[17:11:05] <wikibugs>	 (PS1) Framawiki: Enable wmgProofreadPageShowHeaders on pawikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/509594 (https://phabricator.wikimedia.org/T222740)
[17:28:55] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqiad on icinga1001 is CRITICAL: job=varnish-text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[17:29:21] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[17:30:01] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[17:30:03] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[17:30:10] <wikibugs>	 (CR) Framawiki: [C: +1] Set wgArticleCountMethod='any' for bgwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/506943 (https://phabricator.wikimedia.org/T222044) (owner: Ammarpad)
[17:30:19] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[17:30:27] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[17:30:41] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[17:30:49] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[17:31:59] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[17:33:03] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[17:33:04] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[17:33:21] <wikibugs>	 (CR) Framawiki: [C: +1] Add namespace aliases on zhwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/506892 (https://phabricator.wikimedia.org/T222024) (owner: DannyS712)
[17:33:29] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[17:33:33] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[17:34:09] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[17:34:35] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[17:37:34] <wikibugs>	 (CR) Framawiki: [C: -1] "The task desc asks a change for 9 namespaces, there is only five here. If it is wanted, please explain why, thanks!" [mediawiki-config] - https://gerrit.wikimedia.org/r/503680 (https://phabricator.wikimedia.org/T220881) (owner: DannyS712)
[17:38:19] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[17:38:53] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[17:39:01] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[18:13:45] <wikibugs>	 (PS1) Alex Monk: deployment-prep: Move to working Mathoid service [puppet] - https://gerrit.wikimedia.org/r/509595 (https://phabricator.wikimedia.org/T221654)
[18:14:18] <wikibugs>	 (PS1) Alex Monk: deployment-prep: Move to working Mathoid service [mediawiki-config] - https://gerrit.wikimedia.org/r/509596 (https://phabricator.wikimedia.org/T221654)
[18:14:59] <wikibugs>	 (PS2) Alex Monk: deployment-prep: Move to working Mathoid service [mediawiki-config] - https://gerrit.wikimedia.org/r/509596 (https://phabricator.wikimedia.org/T221654)
[18:20:34] <wikibugs>	 (CR) Krinkle: [C: +1] Set wgArticleCountMethod='any' for bgwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/506943 (https://phabricator.wikimedia.org/T222044) (owner: Ammarpad)
[18:23:27] <wikibugs>	 (PS1) Reedy: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/509597
[18:23:29] <wikibugs>	 (CR) Reedy: [C: +2] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/509597 (owner: Reedy)
[18:25:05] <wikibugs>	 (Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/509597 (owner: Reedy)
[18:26:04] <logmsgbot>	 !log reedy@deploy1001 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 57s)
[18:26:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:14] <wikibugs>	 (CR) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/509597 (owner: Reedy)
[18:39:37] <wikibugs>	 (CR) Krinkle: [C: +2] deployment-prep: Move to working Mathoid service [mediawiki-config] - https://gerrit.wikimedia.org/r/509596 (https://phabricator.wikimedia.org/T221654) (owner: Alex Monk)
[18:40:43] <wikibugs>	 (Merged) jenkins-bot: deployment-prep: Move to working Mathoid service [mediawiki-config] - https://gerrit.wikimedia.org/r/509596 (https://phabricator.wikimedia.org/T221654) (owner: Alex Monk)
[18:40:57] <wikibugs>	 (CR) jenkins-bot: deployment-prep: Move to working Mathoid service [mediawiki-config] - https://gerrit.wikimedia.org/r/509596 (https://phabricator.wikimedia.org/T221654) (owner: Alex Monk)
[20:14:07] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s8 on db1116 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 840.93 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[20:48:29] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp3035 is CRITICAL: CRITICAL: expiry mailbox lag is 2065004 https://wikitech.wikimedia.org/wiki/Varnish
[21:00:35] <wikibugs>	 (PS42) Mathew.onipe: icinga: create and apply cirrus config check [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932)
[21:00:42] <wikibugs>	 (CR) Mathew.onipe: icinga: create and apply cirrus config check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: Mathew.onipe)
[21:01:53] <wikibugs>	 (CR) jerkins-bot: [V: -1] icinga: create and apply cirrus config check [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: Mathew.onipe)
[21:14:39] <wikibugs>	 (PS43) Mathew.onipe: icinga: create and apply cirrus config check [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932)
[21:38:51] <wikibugs>	 (CR) Mathew.onipe: "PCC output is Ok: https://puppet-compiler.wmflabs.org/compiler1002/16474/" [puppet] - https://gerrit.wikimedia.org/r/507045 (https://phabricator.wikimedia.org/T218932) (owner: Mathew.onipe)
[21:40:28] <wikibugs>	 (PS8) Mathew.onipe: wdqs: add WDQS restart cookbook [cookbooks] - https://gerrit.wikimedia.org/r/507347 (https://phabricator.wikimedia.org/T221832)
[22:21:47] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp3035 is OK: OK: expiry mailbox lag is 238285 https://wikitech.wikimedia.org/wiki/Varnish
[22:23:15] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s8 on db1116 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[22:53:15] <librenms-wmf>	 Warning Alert for device cr1-codfw.wikimedia.org - Inbound interface errors
[22:57:11] <wikibugs>	 (PS1) Framawiki: quarry: nginx conf for custom 50x error pages [puppet] - https://gerrit.wikimedia.org/r/509608 (https://phabricator.wikimedia.org/T223018)
[22:59:57] <wikibugs>	 (PS2) Framawiki: quarry: nginx conf for custom 50x error pages [puppet] - https://gerrit.wikimedia.org/r/509608 (https://phabricator.wikimedia.org/T223018)
[23:29:37] <icinga-wm>	 PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.