[00:28:39] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[02:48:13] RECOVERY - puppet last run on install2002 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[02:57:49] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 239, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:58:01] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:48:53] PROBLEM - Check systemd state on ms-be2036 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:19:17] RECOVERY - Check systemd state on ms-be2036 is OK: OK - running: The system is fully operational
[04:54:43] PROBLEM - Disk space on ms-be2015 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdf1 is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[05:01:43] PROBLEM - puppet last run on ms-be2015 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[mountpoint-/srv/swift-storage/sdf1]
[05:28:07] PROBLEM - Check systemd state on ms-be1030 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:31:18] !log Disable notifications for db1116:s8 Slave LAG check as this is a snapshot source
[05:31:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:40:54] (PS1) ArielGlenn: remove checkhosts, depends on salt, very obsolete [software] - https://gerrit.wikimedia.org/r/509618
[05:41:24] (CR) jerkins-bot: [V: -1] remove checkhosts, depends on salt, very obsolete [software] - https://gerrit.wikimedia.org/r/509618 (owner: ArielGlenn)
[05:47:29] RECOVERY - Check systemd state on ms-be1030 is OK: OK - running: The system is fully operational
[06:39:16] (PS2) ArielGlenn: remove checkhosts, depends on salt, very obsolete [software] - https://gerrit.wikimedia.org/r/509618
[07:30:05] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[07:35:05] (CR) ArielGlenn: [C: +2] remove checkhosts, depends on salt, very obsolete [software] - https://gerrit.wikimedia.org/r/509618 (owner: ArielGlenn)
[07:42:12] (Abandoned) ArielGlenn: [WIP] audit ssh key use on production cluster [software] - https://gerrit.wikimedia.org/r/174408 (owner: ArielGlenn)
[07:42:40] (Abandoned) ArielGlenn: first try at jenkins plugin to check prod vs ldap ssh keys [puppet] - https://gerrit.wikimedia.org/r/175442 (owner: ArielGlenn)
[07:43:21] (Abandoned) ArielGlenn: script to generate lists of db hosts by shard and/or dc [software] - https://gerrit.wikimedia.org/r/299006 (https://phabricator.wikimedia.org/T104459) (owner: ArielGlenn)
[07:43:55] (Abandoned) ArielGlenn: clean up arg parsing, this db host checker will have a number of args [software] - https://gerrit.wikimedia.org/r/299180 (owner: ArielGlenn)
[07:44:09] (Abandoned) ArielGlenn: limit list of db hosts to be checked by shards and or dcs [software] - https://gerrit.wikimedia.org/r/299181 (owner: ArielGlenn)
[07:45:15] (Abandoned)
ArielGlenn: generate separate mysql config with list of private wikis [puppet/mariadb] - https://gerrit.wikimedia.org/r/325765 (https://phabricator.wikimedia.org/T152100) (owner: ArielGlenn)
[07:45:43] (Abandoned) ArielGlenn: for sanitarium hosts, include a separate mysql cnf with private wikis [puppet] - https://gerrit.wikimedia.org/r/325766 (https://phabricator.wikimedia.org/T152100) (owner: ArielGlenn)
[07:52:19] PROBLEM - Host es1019.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[07:56:59] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:31:02] Operations: IPMI Audit 2018-04 - https://phabricator.wikimedia.org/T193155 (Marostegui)
[08:31:09] Operations, observability, Patch-For-Review: Several hosts return "internal IPMI error" in the check_ipmi_temp check - https://phabricator.wikimedia.org/T167121 (Marostegui)
[08:31:11] Operations, ops-eqiad, Patch-For-Review: es1019 IPMI and its management interface are unresponsive (again) - https://phabricator.wikimedia.org/T213422 (Marostegui) Resolved→Open This has happened again - I guess a cold reset is needed?: ` 09:52:20 <+icinga-wm> PROBLEM - Host es1019.mgmt is DO...
[08:42:36] Operations, Commons, MediaWiki-File-management, Multimedia, media-storage: Upload fails at Wikimedia Commons "Internal error: Server failed to store temporary file." - https://phabricator.wikimedia.org/T222994 (Rodhullandemu) It's been happening to me as well, except the message I get is "Unk...
[08:43:55] RECOVERY - Device not healthy -SMART- on labstore1003 is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=labstore1003&var-datasource=eqiad+prometheus/ops
[10:14:09] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:14:31] PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:15:19] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:15:53] PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[10:16:31] PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[10:17:27] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[10:18:15] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:18:39] RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:19:27] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[10:22:45] RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[10:24:15] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[10:24:41] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[11:53:49] (Abandoned) ArielGlenn: explicitly start wikidata entity dumps on the 1st and 20th of each month [puppet] - https://gerrit.wikimedia.org/r/498164 (https://phabricator.wikimedia.org/T216160) (owner: ArielGlenn)
[12:14:19] !log restart eventlogging on eventlog1002 - all processors stuck due to kafka python (T222941)
[12:14:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:14:25] T222941: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941
[13:00:45] (PS3)
ArielGlenn: db query checking scripts for auditing WikiExporter (dumps) queries [software] - https://gerrit.wikimedia.org/r/478708 (https://phabricator.wikimedia.org/T207628)
[13:01:12] (CR) jerkins-bot: [V: -1] db query checking scripts for auditing WikiExporter (dumps) queries [software] - https://gerrit.wikimedia.org/r/478708 (https://phabricator.wikimedia.org/T207628) (owner: ArielGlenn)
[13:11:06] (PS4) ArielGlenn: db query checking scripts for auditing WikiExporter (dumps) queries [software] - https://gerrit.wikimedia.org/r/478708 (https://phabricator.wikimedia.org/T207628)
[13:11:32] (CR) jerkins-bot: [V: -1] db query checking scripts for auditing WikiExporter (dumps) queries [software] - https://gerrit.wikimedia.org/r/478708 (https://phabricator.wikimedia.org/T207628) (owner: ArielGlenn)
[13:14:40] (PS5) ArielGlenn: db query checking scripts for auditing WikiExporter (dumps) queries [software] - https://gerrit.wikimedia.org/r/478708 (https://phabricator.wikimedia.org/T207628)
[13:18:00] (Abandoned) ArielGlenn: allow the display of only index or column differences for db table checker [software] - https://gerrit.wikimedia.org/r/506136 (owner: ArielGlenn)
[13:18:04] Operations, Commons, MediaWiki-File-management, Multimedia, media-storage: Upload fails at Wikimedia Commons "Internal error: Server failed to store temporary file."
- https://phabricator.wikimedia.org/T222994 (Framawiki) p:Triage→High
[13:18:10] (Abandoned) ArielGlenn: script to show section/dbhost info by asking mediawiki for it [software] - https://gerrit.wikimedia.org/r/506137 (owner: ArielGlenn)
[13:18:18] (Abandoned) ArielGlenn: allow section, list of dbs or list of wikis stand alone as arg [software] - https://gerrit.wikimedia.org/r/506225 (owner: ArielGlenn)
[13:18:24] (Abandoned) ArielGlenn: move from MySQLdb to pymysql [software] - https://gerrit.wikimedia.org/r/506410 (owner: ArielGlenn)
[13:18:32] (Abandoned) ArielGlenn: allow table checking to work with specified section [software] - https://gerrit.wikimedia.org/r/506411 (owner: ArielGlenn)
[13:18:38] (Abandoned) ArielGlenn: for checking tables per section, do only so many wikis, not all [software] - https://gerrit.wikimedia.org/r/506412 (owner: ArielGlenn)
[13:18:46] (Abandoned) ArielGlenn: ability to check tables on default section, by looking up databases served [software] - https://gerrit.wikimedia.org/r/506413 (owner: ArielGlenn)
[13:18:52] (Abandoned) ArielGlenn: show host info will now show the largest n wikis per requested section [software] - https://gerrit.wikimedia.org/r/506414 (owner: ArielGlenn)
[14:06:27] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is CRITICAL: cluster=cache_upload site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[14:06:47] PROBLEM - HTTP availability for Varnish at eqsin on icinga1001 is CRITICAL: job=varnish-upload site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[14:07:07] PROBLEM - Upload HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5
[14:08:39] PROBLEM - Eqsin HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5
[14:09:33] RECOVERY - HTTP availability for Varnish at eqsin on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[14:10:33] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[14:13:59] RECOVERY - Upload HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5
[14:15:31] RECOVERY - Eqsin HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5
[14:53:55] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 241, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[14:55:49] PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[14:57:11] RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP
https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[14:57:29] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[15:14:25] PROBLEM - High CPU load on API appserver on mw1222 is CRITICAL: CRITICAL - load average: 49.53, 26.09, 17.39
[15:15:49] RECOVERY - High CPU load on API appserver on mw1222 is OK: OK - load average: 28.57, 25.18, 17.81
[15:15:52] Operations, Patch-For-Review: uwsgi's logsocket_plugin.so causes segfaults during log rotation - https://phabricator.wikimedia.org/T212697 (elukey) Upstream commit: https://github.com/unbit/uwsgi/commit/d642e635b3d558ce91e80442c74f4d16b9d81146 Next step is to open a bug to Debian to ask a patch to the u...
[15:32:43] !log rollback python-kafka on eventlog1002 to 1.4.1-1~stretch1 - T222941
[15:32:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:32:48] T222941: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941
[15:50:41] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[15:50:59] PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[15:51:07] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[15:51:33] PROBLEM - Text HTTP 5xx reqs/min on graphite1004
is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[15:53:25] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[15:58:25] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[15:59:11] RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[15:59:21] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[16:25:52] (CR) Zhuyifei1999: [C: +1] quarry: nginx conf for custom 50x error pages [puppet] - https://gerrit.wikimedia.org/r/509608 (https://phabricator.wikimedia.org/T223018) (owner: Framawiki)
[16:30:35] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[16:31:30] PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[16:32:17] PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[16:32:23] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[16:34:45] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[16:37:53] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[16:39:07] RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[16:39:45] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[18:05:06] (CR) Framawiki: "Upstream commit I7f0eacf146d09dca8871170975c43e209a22299d was merged, https://quarry.wmflabs.org/static/error/502.html is live."
[puppet] - https://gerrit.wikimedia.org/r/509608 (https://phabricator.wikimedia.org/T223018) (owner: Framawiki)
[18:06:54] Can someone with +2 rights on puppet repo merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509608/? thanks
[20:33:33] Hi, why can't I upload a new patchset on Gerrit? See https://snag.gy/B5EHUJ.jpg
[20:39:17] (CR) Mathew.onipe: Add postgres slave init cookbook (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/504570 (https://phabricator.wikimedia.org/T220946) (owner: Mathew.onipe)
[22:44:43] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[22:46:01] PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[22:46:35] PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[22:54:15] RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[22:54:19] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0]
https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[22:54:49] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[22:58:13] (PS1) Vladis13: Enable webfonts for ru,uk,be of wiki,wikisource, and for sourceswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/509739
[22:58:15] (CR) Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [mediawiki-config] - https://gerrit.wikimedia.org/r/509739 (owner: Vladis13)
[23:06:47] (CR) Zoranzoki21: [C: +1] "Should be ok, is this done per some task on Phabricator or?" [mediawiki-config] - https://gerrit.wikimedia.org/r/509739 (owner: Vladis13)
[23:07:10] (CR) Zoranzoki21: [C: +1] "Oh, I see now. You need to add Bug: T220752 at commit message." [mediawiki-config] - https://gerrit.wikimedia.org/r/509739 (owner: Vladis13)
[23:07:42] (CR) jerkins-bot: [V: -1] Enable webfonts for ru,uk,be of wiki,wikisource, and for sourceswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/509739 (owner: Vladis13)
[23:13:22] (PS2) Vladis13: Enable webfonts for ru,uk,be of wiki,wikisource, and for sourceswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/509739 (https://phabricator.wikimedia.org/T220752)