[02:16:58] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [02:19:08] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [02:36:54] Operations, docker-pkg, Patch-For-Review: Allow selecting which images to build - https://phabricator.wikimedia.org/T186416 (Legoktm) I just discovered this. It's awesome <3 Is there anything else left to do here or should it be closed as resolved? [02:45:57] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [02:50:58] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 17 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [03:28:58] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 759.40 seconds [03:35:47] PROBLEM - puppet last run on mw2277 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz] [03:36:38] PROBLEM - puppet last run on mw2162 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz] [03:54:28] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [03:57:27] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [03:58:59] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [04:01:07] RECOVERY - puppet last run on mw2277 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:01:28] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 285.64 seconds [04:02:07] RECOVERY - puppet last run on mw2162 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:02:28] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 17 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [04:14:47] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [04:19:48] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [04:31:58] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [04:36:58] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [04:49:08] PROBLEM - IPv6 ping to eqiad on 
ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [04:54:17] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [05:06:37] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [05:11:37] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 17 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [05:49:28] PROBLEM - Disk space on elastic1018 is CRITICAL: DISK CRITICAL - free space: /srv 52788 MB (10% inode=99%) [06:12:08] RECOVERY - Disk space on elastic1018 is OK: DISK OK [06:29:18] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/apparmor.d/abstractions/ssl_certs] [06:30:08] PROBLEM - puppet last run on analytics1028 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/etc/default/ferm] [06:59:39] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:00:38] RECOVERY - puppet last run on analytics1028 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [07:36:47] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [07:41:08] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [08:38:00] (PS1) Framawiki: Remove mhs.ox.ac.uk from $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/459365 (https://phabricator.wikimedia.org/T203904) [10:21:47] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [10:31:57] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [10:44:18] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [10:49:28] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [10:56:38] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [11:21:58] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - 
https://atlas.ripe.net/measurements/1790947/#!map [11:29:17] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [11:34:27] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [11:46:47] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [11:52:58] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on einsteinium is CRITICAL: cluster={cache_text,cache_upload} site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [11:53:08] PROBLEM - HTTP availability for Varnish at eqsin on einsteinium is CRITICAL: job=varnish-upload site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [11:54:58] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [11:55:17] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5 [11:55:18] PROBLEM - Eqsin HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5 [11:55:18] RECOVERY - HTTP 
availability for Varnish at eqsin on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [11:56:18] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [11:56:58] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [11:58:04] Warning Alert for device cr1-eqsin.wikimedia.org - Traffic on tunnel link [12:01:37] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [12:01:48] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5 [12:01:58] RECOVERY - Eqsin HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=eqsin&var-cache_type=All&var-status_type=5 [12:04:17] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [12:08:04] Warning Device cr1-eqsin.wikimedia.org recovered from Traffic on tunnel link [12:09:18] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [12:21:28] PROBLEM - 
IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [12:26:37] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [12:33:48] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [12:44:07] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [12:56:28] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [13:16:47] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [13:29:08] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [13:39:18] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [13:46:37] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [13:51:38] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [14:04:08] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [14:09:17] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes 
of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [14:21:37] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [14:26:38] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [14:33:58] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [14:39:07] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [14:46:49] (CR) Mathew.onipe: elasticsearch shard size check * Checks shard size and sends alert if more than 30gb. (10 comments) [puppet] - https://gerrit.wikimedia.org/r/458891 (owner: Mathew.onipe) [14:47:24] (PS2) Mathew.onipe: elasticsearch shard size check * Checks shard size and sends alert if more than 30gb. [puppet] - https://gerrit.wikimedia.org/r/458891 [14:48:07] (CR) jerkins-bot: [V: -1] elasticsearch shard size check * Checks shard size and sends alert if more than 30gb. 
[puppet] - https://gerrit.wikimedia.org/r/458891 (owner: Mathew.onipe) [14:51:28] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [14:55:18] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [14:56:37] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [14:59:47] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [15:06:08] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [15:14:58] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [15:19:08] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [15:21:47] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [15:23:58] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] 
https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [15:24:18] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [15:36:37] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [16:01:58] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [16:09:27] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [16:14:28] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [16:21:48] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [16:26:48] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [16:28:07] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100% [16:28:07] PROBLEM - Host mr1-eqsin.oob is DOWN: PING CRITICAL - Packet loss = 100% [16:32:23] hmm [16:33:18] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 223.15 ms [16:33:18] RECOVERY - Host mr1-eqsin.oob is UP: PING OK - Packet loss = 0%, RTA = 222.32 ms [16:39:08] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [16:49:18] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - 
https://atlas.ripe.net/measurements/1790947/#!map [16:56:28] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [17:06:47] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [17:14:08] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [17:19:08] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [17:26:37] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [17:35:18] PROBLEM - Filesystem available is greater than filesystem size on ms-be2040 is CRITICAL: cluster=swift device=/dev/sdc1 fstype=xfs instance=ms-be2040:9100 job=node mountpoint=/srv/swift-storage/sdc1 site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2040&var-datasource=codfw%2520prometheus%252Fops [17:39:33] (PS1) Urbanecm: New throttle rule for Czech school [mediawiki-config] - https://gerrit.wikimedia.org/r/459383 (https://phabricator.wikimedia.org/T203909) [17:40:13] (CR) jerkins-bot: [V: -1] New throttle rule for Czech school [mediawiki-config] - https://gerrit.wikimedia.org/r/459383 (https://phabricator.wikimedia.org/T203909) (owner: Urbanecm) [17:41:31] Anybody who can deploy throttle rule? 
T203909 [17:41:32] T203909: Allow IP for creating account for school project for 30 days - https://phabricator.wikimedia.org/T203909 [17:41:33] BTW, jenkins failed due to lack of memory [17:50:57] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [18:01:07] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 17 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [18:02:08] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [18:07:28] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [18:09:27] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [18:14:08] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [18:14:28] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [18:36:57] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [18:43:07] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [18:52:07] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - 
https://atlas.ripe.net/measurements/1791212/#!map [18:53:18] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [19:05:37] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [19:10:37] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [19:17:58] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [19:20:57] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [19:22:58] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [19:25:18] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [19:35:18] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [19:45:38] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [19:52:57] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [19:57:58] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 
probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [20:05:08] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [20:10:18] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [20:23:48] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [20:33:58] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [20:41:18] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [20:46:18] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [20:53:37] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [20:55:58] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 22 probes of 317 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [20:56:18] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [21:01:07] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 16 probes of 317 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [21:01:27] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - 
https://atlas.ripe.net/measurements/1791212/#!map [21:03:47] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [21:11:07] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [21:16:08] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [21:28:37] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [21:33:38] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [21:40:57] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [21:41:58] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [21:44:17] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [21:45:58] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [21:53:08] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [22:02:00] (CR) Gehel: [C: -1] "Minor comments inline. 
Ping me when you have time and we can go over the puppet side of that CR." (4 comments) [puppet] - https://gerrit.wikimedia.org/r/458891 (owner: Mathew.onipe) [22:04:28] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [22:08:18] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 20 probes of 317 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map [22:09:38] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 19 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [22:13:18] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 17 probes of 317 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map [22:18:57] PROBLEM - HTTP availability for Varnish at ulsfo on einsteinium is CRITICAL: job=varnish-text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [22:21:08] RECOVERY - HTTP availability for Varnish at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [22:26:15] Do ops people work Sunday in the middle of the night? [22:27:23] Seddon: Depends, what's up [22:27:30] If it's important enough, we can find people [22:28:07] Reedy: Its not important enough to find people. Do you know this EU copyright website thats been built? [22:28:15] Yeah? 
[22:28:30] There's people in SF who it wouldn't be an unreasonable time to ping etc [22:28:38] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [22:28:58] Reedy: I just want to know who I should notify that we are going to be throwing a fuck ton of traffic at it [22:29:05] Starting tomorrow [22:29:12] Oh don't worry about that too much [22:29:17] Ops are aware, up high [22:29:42] Reedy: Okay good [22:29:45] Just checking [22:30:18] Seddon: You know that wikipedia site... [22:35:25] Reedy: shush. [22:35:32] * Reedy pets Seddon [22:35:57] Reedy: Its been a long few months working on this [22:36:44] Seddon: Might be worth an email to the ops list as a reminder though [22:37:17] Just so it's fresh and in everyones mind rather than "why are we getting a lot more traffic suddenly?" [22:37:36] Reedy: Email? [22:40:58] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [22:46:08] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [22:53:28] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [22:54:55] Seddon: ops@lists.wikimedia.org [22:55:37] Seddon: but I can't imagine that it would get more traffic than...English Wikipedia? 
[22:56:25] Operations, Services (watching): Create Debian packages for Node.js 10 upgrade - https://phabricator.wikimedia.org/T203239 (Krinkle) [22:58:37] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [23:32:52] (CR) Tim Starling: "> Patch Set 3:" [mediawiki-config] - https://gerrit.wikimedia.org/r/458623 (https://phabricator.wikimedia.org/T97192) (owner: Tim Starling) [23:37:48] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [23:39:40] (PS4) Tim Starling: Set PHP time limit [mediawiki-config] - https://gerrit.wikimedia.org/r/458623 (https://phabricator.wikimedia.org/T97192) [23:47:58] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [23:50:39] (CR) Tim Starling: [C: 2] Set PHP time limit [mediawiki-config] - https://gerrit.wikimedia.org/r/458623 (https://phabricator.wikimedia.org/T97192) (owner: Tim Starling) [23:52:23] (Merged) jenkins-bot: Set PHP time limit [mediawiki-config] - https://gerrit.wikimedia.org/r/458623 (https://phabricator.wikimedia.org/T97192) (owner: Tim Starling) [23:53:18] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 20 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [23:55:08] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [23:58:27] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 17 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map