[03:08:22] RECOVERY - exim queue on mx2001 is OK: OK: Less than 1000 mails in exim queue.
[03:28:43] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 877.56 seconds
[03:33:12] PROBLEM - puppet last run on mw2172 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[03:35:33] PROBLEM - puppet last run on analytics1055 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-ISP.mmdb.gz]
[03:57:53] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 264.37 seconds
[04:01:02] RECOVERY - puppet last run on analytics1055 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[04:03:42] RECOVERY - puppet last run on mw2172 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:32:35] PROBLEM - puppet last run on labvirt1017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled]
[06:54:12] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 34 probes of 316 (alerts on 25) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[06:58:04] RECOVERY - puppet last run on labvirt1017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:59:13] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 316 (alerts on 25) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[07:04:13] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[07:08:42] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[07:14:53] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:27:03] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:56:41] Operations, OTRS: Upgrade to OTRS version 5.0.30 - https://phabricator.wikimedia.org/T205540 (Framawiki) Thank you for updating so quickly !
[09:55:14] Operations: Add which ldap groups can login on netbox login form - https://phabricator.wikimedia.org/T203840 (Framawiki) Thanks all !
[10:04:03] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[10:10:33] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[10:52:52] PROBLEM - HTTP availability for Varnish at ulsfo on einsteinium is CRITICAL: job=varnish-upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[10:56:12] RECOVERY - HTTP availability for Varnish at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[11:39:10] (PS2) ArielGlenn: make path to MWScript.php configurable for xml/sql dumps [puppet] - https://gerrit.wikimedia.org/r/461650 (https://phabricator.wikimedia.org/T204962)
[11:42:01] (CR) ArielGlenn: [C: +2] make path to MWScript.php configurable for xml/sql dumps [puppet] - https://gerrit.wikimedia.org/r/461650 (https://phabricator.wikimedia.org/T204962) (owner: ArielGlenn)
[12:31:10] (PS2) ArielGlenn: make location of MWScript.php configurable for xml/sql dumps [dumps] - https://gerrit.wikimedia.org/r/461651 (https://phabricator.wikimedia.org/T204962)
[12:38:47] (CR) ArielGlenn: [C: +2] make location of MWScript.php configurable for xml/sql dumps [dumps] - https://gerrit.wikimedia.org/r/461651 (https://phabricator.wikimedia.org/T204962) (owner: ArielGlenn)
[12:38:56] (PS3) ArielGlenn: make location of MWScript.php configurable for xml/sql dumps [dumps] - https://gerrit.wikimedia.org/r/461651 (https://phabricator.wikimedia.org/T204962)
[12:41:50] !log ariel@deploy1001 Started deploy [dumps/dumps@26aaee6]: make location of MWScript.php configurable
[12:41:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:54] !log ariel@deploy1001 Finished deploy [dumps/dumps@26aaee6]: make location of MWScript.php configurable (duration: 00m 03s)
[12:41:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:15] (PS2) ArielGlenn: make 'misc cron dumps' use a configured path to MWScript.php [puppet] - https://gerrit.wikimedia.org/r/461667 (https://phabricator.wikimedia.org/T204962)
[13:20:38] (CR) jerkins-bot: [V: -1] make 'misc cron dumps' use a configured path to MWScript.php [puppet] - https://gerrit.wikimedia.org/r/461667 (https://phabricator.wikimedia.org/T204962) (owner: ArielGlenn)
[13:22:35] (PS3) ArielGlenn: make 'misc cron dumps' use a configured path to MWScript.php [puppet] - https://gerrit.wikimedia.org/r/461667 (https://phabricator.wikimedia.org/T204962)
[14:04:32] (PS2) GTirloni: shinken - Tweak Puppet thresholds [puppet] - https://gerrit.wikimedia.org/r/463581 (https://phabricator.wikimedia.org/T161898)
[14:22:10] (CR) ArielGlenn: [C: +2] make 'misc cron dumps' use a configured path to MWScript.php [puppet] - https://gerrit.wikimedia.org/r/461667 (https://phabricator.wikimedia.org/T204962) (owner: ArielGlenn)
[14:31:13] (CR) Andrew Bogott: [C: +1] shinken - Tweak Puppet thresholds [puppet] - https://gerrit.wikimedia.org/r/463581 (https://phabricator.wikimedia.org/T161898) (owner: GTirloni)
[14:31:45] Operations, Wikimedia-Mailing-lists, Patch-For-Review: Mailman issues a "403 Forbidden" error when subscribing to a list - https://phabricator.wikimedia.org/T195750 (valerio.bozzolan) Hello, same error here. Tried to subscribe myself in https://lists.wikimedia.org/mailman/listinfo/mediawiki-l just now.
[15:02:22] PROBLEM - HTTP availability for Varnish at ulsfo on einsteinium is CRITICAL: job=varnish-upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[15:03:32] RECOVERY - HTTP availability for Varnish at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[15:30:13] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[15:56:12] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[16:09:19] Operations, Mail, Toolforge, Patch-For-Review, Security: Forward security@tools.wmflabs.org to security@wikimedia.org - https://phabricator.wikimedia.org/T182812 (valhallasw) To add some context to the current situation -- most of the email sent to security@tools.wmflabs.org is: - from openb...
[17:37:05] Warning Alert for device cr4-ulsfo.wikimedia.org - Inbound interface errors
[17:48:06] Device cr4-ulsfo.wikimedia.org recovered from Inbound interface errors
[18:10:05] Operations, Wikimedia-Mailing-lists, Patch-For-Review: Mailman issues a "403 Forbidden" error when subscribing to a list - https://phabricator.wikimedia.org/T195750 (Aklapper) Does the problem still happen after waiting more than 30 minutes?
[21:04:13] PROBLEM - Restbase root url on restbase2003 is CRITICAL: HTTP CRITICAL - No data received from host
[21:05:22] RECOVERY - Restbase root url on restbase2003 is OK: HTTP OK: HTTP/1.1 200 - 16081 bytes in 0.127 second response time
[21:08:29] Operations, Scap, Datacenter-Switchover-2018, Patch-For-Review, and 2 others: Scap is checking canary servers in dormant instead of active-dc - https://phabricator.wikimedia.org/T204907 (hashar)
[21:40:29] (PS3) GTirloni: shinken - Tweak Puppet thresholds [puppet] - https://gerrit.wikimedia.org/r/463581 (https://phabricator.wikimedia.org/T161898)
[21:53:22] (CR) GTirloni: [C: +2] shinken - Tweak Puppet thresholds [puppet] - https://gerrit.wikimedia.org/r/463581 (https://phabricator.wikimedia.org/T161898) (owner: GTirloni)
[22:50:43] PROBLEM - Filesystem available is greater than filesystem size on ms-be2041 is CRITICAL: cluster=swift device=/dev/sdi1 fstype=xfs instance=ms-be2041:9100 job=node mountpoint=/srv/swift-storage/sdi1 site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2041&var-datasource=codfw%2520prometheus%252Fops
[22:56:22] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[22:58:33] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen